The paper introduces an R Package of multivariate Fay-Herriot models for small area estimation named msae. This package implements four types of Fay-Herriot models, including univariate Fay-Herriot model (model 0), multivariate Fay-Herriot model (model 1), autoregressive multivariate Fay-Herriot model (model 2), and heteroskedastic autoregressive multivariate Fay-Herriot model (model 3). It also contains some datasets generated based on multivariate Fay-Herriot models. We describe and implement functions through various practical examples. Multivariate Fay-Herriot models produce a more efficient parameter estimation than direct estimation and univariate model.
Survey sampling is a data collection method by observing several units of observation to obtain information from the entire population. Compared to other data collection methods, survey sampling has advantages in cost, time, and human resources. Survey sampling is designed for a certain size of the domain, commonly a large area. However, data demand for small areas is increasing and has become high issue (Ghosh and Rao 1994). The inadequate sample size causes a large standard error of parameter estimates. This problem is overcome by indirect estimation, namely Small Area Estimation (SAE).
Rao and Molina (2015) said that SAE increases the effectiveness of sample size using the strength of neighboring areas and information on other variables that are related to the variable of interest. There are some estimation methods in SAE, namely Best Linear Unbiased Predictors (BLUP), Empirical Best Linear Unbiased Predictors (EBLUP), Hierarchical Bayes (HB), and Empirical Bayes (EB). The most common SAE estimator is the EBLUP (Krieg et al. 2015). EBLUP has advantages over EB and HB methods. It is a development of the BLUP method that minimizes the MSE among other unbiased linear estimators (Ghosh and Rao 1994). Area level of EBLUP application for the continuous response variable, called Fay-Herriot model, was firstly employed for estimating log per-capita income (PCI) in small places in the US (Rao and Molina 2015).
The Fay-Herriot model has extended into a multivariate Fay-Herriot model, which is a model with several correlated response variables. Datta et al. (1991) firstly applied the multivariate model to estimate the median income of four-person families in the US states. Benavent and Morales (2016) developed multivariate Fay-Herriot models with the EBLUP method and introduced four estimation models based on the estimated variance matrix structure. Ubaidillah et al. (2019) also implemented the multivariate Fay-Herriot model and indicated that the multivariate Fay-Herriot model produces a more efficient parameter estimation than the univariate model.
On the Comprehensive R Archive Network (CRAN), there are several packages implementing small area estimation. Some of them are included in the Small Area Estimation subsection of The CRAN Task View: Official Statistics & Survey Methodology (Templ 2014), including sae (Molina and Marhuenda 2018), rsae (Schoch 2014), nlme (Pinheiro et al. 2020), hbsae (Boonstra 2012), JoSAE (Breidenbach 2018), and BayesSAE (Chengchun Shi Developer 2018). Other popular SAE packages not included in that subsection are mme (Lopez-Vizcaino et al. 2019) and saery (Lefler et al. 2014).
In this paper, we introduce our R package of multivariate Fay-Herriot models for small area estimation, named msae. This package and its details are available on CRAN at http://CRAN.R-project.org/package=msae. Functions in this package implement four Fay-Herriot Models, namely Model 0, Model 1, Model 2, and Model 3, as proposed by Benavent and Morales (2016).
The paper is structured as follows. First, we explain multivariate Fay-Herriot models in Section 2. Then, we describe msae package and illustrate the use of this package for SAE estimation by employing simulation studies and applying it to a real dataset in the next Sections 3 and 4. Finally, we provide a conclusion in Section 8.
The multivariate Fay-Herriot model is an extension of the Fay-Herriot model, which utilizes some correlated responses. Fay-Herriot is a combination of two model components. The first component is called the sampling model, and the second component is called the linking model.
Suppose we want to estimate characteristics of R variables in D areas, \(\mu_{d} = (\mu_{1d}, \dotsc ,\mu_{Rd})^T\), with \(d = 1, \dotsc ,D\). Let \(y_d= (y_{1d}, \dotsc ,y_{Rd})^T\), be a direct estimator of \(\mu_d\). The first component, i.e., sampling model is \[y_d=\mu_d+e_d, \quad e_d \sim N(0,V_{e_d}), \quad d=1,\dotsc,D\] , where \(e_d\) is sampling error with a covariance matrix, \(V_{e_d}\) , that is assumed to be known. In the second component, we assume that \(\mu_d\) is linearly related with \(p_d\) area-specific auxiliary variables \(X_d= (X_1,\dotsc,X_{p_d})^T\) as follows: \[\mu_d=X_d \beta_d + u_d, \quad u_d \sim N(0,V_{u_d}), \quad d=1,\dotsc,D\] , where \(u_d\) is area random effects, and \(\beta_d\) is a vector of regression coefficient corresponding with \(X_d\). This second component is called the linking model. The combination of the two components forms a multivariate linear mixed model as follows: \[y_d=X_d\beta_d + u_d + e_d,\quad e_d \sim N(0,V_{e_d}),\quad d=1,\dotsc,D\] , where \(u\) and \(e\) are independent.
Benavent and Morales (2016) proposed Fay-Herriot models using four different variance matrices, Model 0, Model 1, Model 2, and Model 3. Model 0 is a univariate Fay-Herriot Model, of which the sampling error and the random effect of target variables are independent. Sampling error and random effect variance matrix are written as follows: \[V_{ud} = diag_{a \leq r \leq R}(\sigma_{ur}^2)\]
\[V_{ed} = diag_{a \leq r \leq R}(\sigma_{edr}^2)\] , \[\text{where} \: d=1,\dotsc,D\] .
Model 1, Model 2, and Model 3 are multivariate Fay-Herriot models, of which the variance matrices are no longer diagonal matrices. Model 1 is a multivariate form of Model 0, where the random effect variance of Model 1 is still a diagonal matrix. Model 2 is called the autoregressive multivariate Fay-Herriot model (AR(1)), in which the random variance matrix is written as follows : \[V_{ud} = \sigma_{ur}^2 \Omega_d (\rho)\]
\[\Omega_d (\rho) = \frac{1}{1- \rho^2} \begin{bmatrix} 1 & \rho & \cdots & \rho^{R-1}\\ \rho & 1 & & \rho^{R-2}\\ \vdots & & \ddots & \vdots \\ \rho^{R-1} & \rho^{R-2} & \cdots & 1 \end{bmatrix}\] Model 3 is called heteroskedastic autoregressive multivariate Fay-Herriot model (HAR(1)), which the element of random error is written as follows:
\[u_{dr} = \rho u_{dr-1} + a_{dr}\]
\[u_{d0} \sim N(0, \sigma_{u0}^2) \quad \text{and} \quad a_{dr} \sim N(0, \sigma_{r}^2)\] , where \(\sigma_{u0}^2=1 , a_{dr}, \text{ and } u_{d0}\) are independent. The element of random variance matrix is written as follow: \[\sigma_{drii} = \sum_{k=1}^{i} \rho^{2k} \sigma_{i-k}^2\]
\[\sigma_{drij} = \sum_{k=0}^{|i-j|} \rho^{2k+|i-j|} \sigma_{|i-j|-k}^2\] .
The best linear unbiased prediction (BLUP) of \(\mu\) is \[\hat{\mu} = X \hat{\beta} + Z^T \hat{V_u} Z \Omega^{-1} (y-X \hat{\beta)}\] , where \(\hat{\beta} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} y\) is the best linear unbiased estimator (BLUE) of \(\beta\) with the covariance matrix \(cov(\hat{\beta})=(X^T \Omega^{-1} X)^{-1}\). BLUP estimator depends on the random effect variance that is usually unknown. Using Restricted Maximum Likelihood (REML), we estimate and substitute random effect variance estimator for obtaining the multivariate EBLUB estimator. The estimation formula with EBLUP is written as follows: \[\hat{\mu} = X \hat{\beta} + Z^T \hat{V_u} Z \hat{\Omega}^{-1} (y-X\hat{\beta})\]
\[\hat{\Omega} = Z^T \hat{V_u} Z+ V_e\] , where \(\hat{\beta}=(X^T \hat{\Omega}^{-1} X)^{-1} X^T \hat{\Omega}^{-1} y\) is the best linear unbiased estimator (BLUE) of \(\beta\) with the covariance matrix \(cov(\hat{\beta})=(X^T \hat{\Omega}^{-1} X)^{-1}\).
Benavent and Morales (2016) also proposed MSE estimation for the multivariate Fay-Herriot models as follows: \[mse(\hat{\mu}) = g_{1i}(\hat{\sigma}_u^2) + g_{2i}(\hat{\sigma}_u^2) +g_{3i}(\hat{\sigma}_u^2)\] , where each component can be described as follows: \[g_{1i}(\hat{\sigma}_u^2) = \Gamma V_e\] , \[g_{2i}(\hat{\sigma}_u^2) = (1-\Gamma)X(X^T \Omega^{-1} X)^{-1} X^T (I - \Gamma)^T\] , \[g_{3i}(\hat{\sigma}_u^3) \approx \Sigma\Sigma cov(\hat{\sigma}_{uk}^2 , \hat{\sigma}_{ul}^2) \Gamma_{(k)} \Omega \Gamma^T_{(k)} , k,l = 1,2,\dotsc, q\] , where \(\Gamma = Z^T\hat{V_u} Z\), \(\Gamma_{(k)} = \frac{\delta \Gamma}{\delta \sigma_u^2}\), and \(cov(\hat{\sigma}_{uk}^2 , \hat{\sigma}_{ul}^2)\) is the inverse of the Fisher information matrix in the estimation of REML.
The R Package msae implements multivariate Fay-Herriot models for small area estimation. Here are some functions and the descriptions at a glance:
eblupUFH
: This function gives the EBLUP and MSE based on the
univariate Fay-Herriot model (Model 0).eblupMFH1
: This function gives the EBLUP and MSE based on the
multivariate Fay-Herriot model (Model 1).eblupMFH2
: This function gives the EBLUP and MSE based on the
autoregressive multivariate Fay-Herriot model (Model 2).eblupMFH3
: This function gives the EBLUP and MSE based on the
heteroskedastic autoregressive multivariate Fay-Herriot model (Model
3).Those functions return a list of five elements:
eblup
is a data frame of EBLUPs for each variable.MSE
is a data frame of the estimated MSEs of the EBLUPs.randomEffect
is a data frame of random effect estimators.Rmatrix
is a block diagonal matrix composed of sampling variances.fit
is a list containing the following objects:
method
shows the type of fitting method.convergence
shows the convergence of the Fisher Scoring
algorithm.estcoef
shows estimated model coefficients and their
significance.refvar
shows the estimated random effect variance.refvarTest
(only for eblupMFH3
) shows homogeneity of random
effect variance test based on Model 3.rho
(only for eblupMFH2
and eblupMFH3
) shows the estimated
\(\rho\) of random effect variance and their parameter test.informationFisher
is a matrix of information Fisher.This package also provides datasets generated for each multivariate model. The datasets are generated based on Model 1, Model 2, and Model 3 following steps:
Generate sampling error \(e\) and auxiliary variables « \(X1, X2\) ». Set the parameter as follows:
Generate random effect \(u\), where \(u \sim N_3(0,V_u)\). For each dataset, parameter for generating random effect \(u\) are as follows:
Set \(\beta_1=5\) and \(\beta_2=10\) to calculate direct estimation « \(Y1,\: Y2, \text{ and } Y3\) », where \(Y_i = X\beta + u_i + e_i\).
Auxiliary variables « \(X1\) and \(X2\) », direct estimates « \(Y1, Y2,\)
and \(Y3\) », and sampling variance-covariance «
\(v1, v2, v3, v12, v13,\) and \(v23\) » are combined into a data frame
called datasae1
for Model 1, datasae2
for Model 2, and
datasae3
for Model 3.
datasae1
, which is generated based on Model 1, contains 50
observations on the following 11 variables: 3 dependent variables «
\(Y1, Y2,\) and \(Y3\) », 2 auxiliary variables « \(X1\) and \(X2\) », and 6
variance-covariance of direct estimation « \(v1, v2, v3, v12, v13,\) and
\(v23\) ». The procedures for generating such datasets are provided in the
previous section. The following R commands are run to obtain EBLUPs of
the univariate Fay-Herriot model (Model 0) and the multivariate
Fay-Herriot model (Model 1), plot the EBLUPs of the univariate and
multivarate model, and plot the MSEs of EBLUPs in the univariate and
multivarate model:
data('datasae1')
# model specifications
<- list(f1=Y1~X1+X2,
Fo f2=Y2~X1+X2,
f3=Y3~X1+X2)
<- c("v1", "v2", "v3", "v12", "v13", "v23")
vardir
# EBLUP based on Model 0 and Model 1
<- eblupUFH(Fo, vardir, data=datasae1) # Model 0
u <- eblupMFH1(Fo, vardir, data=datasae1) #Model 1
m1
# Figure 1: EBLUPs under Model 0 and Model 1
par(mfrow=c(1,3))
plot(u$eblup$Y1, type = "o", col = "blue", pch = 15, xlab = "area", ylab = "Y1",
cex.axis = 1.5, cex.lab = 1.5, xaxt = "n")
points(m1$eblup$Y1, type = "o", col = "red", pch = 18)
axis(1, at=1:50, labels = 1:50)
legend("topleft",legend=c("Model 0","Model 1"),ncol=2 , col=c("blue","red"),
pch=c(15,18), inset=c(0.617,-0.1), xpd=TRUE)
plot(u$eblup$Y2, type = "o", col = "blue", pch = 15, xlab = "area", ylab = "Y2",
cex.axis = 1.5, cex.lab = 1.5, xaxt = "n")
points(m1$eblup$Y2, type = "o", col = "red", pch = 18)
axis(1, at=1:50, labels = 1:50)
legend("topleft",legend=c("Model 0","Model 1"),ncol=2 , col=c("blue","red"),
pch=c(15,18), inset=c(0.617,-0.1), xpd=TRUE)
plot(u$eblup$Y3, type = "o", col = "blue", pch = 15, xlab = "area", ylab = "Y3",
cex.axis = 1.5, cex.lab = 1.5, xaxt = "n")
points(m1$eblup$Y3, type = "o", col = "red", pch = 18)
axis(1, at=1:50, labels = 1:50)
legend("topleft",legend=c("Model 0","Model 1"),ncol=2 , col=c("blue","red"),
pch=c(15,18), inset=c(0.617,-0.1), xpd=TRUE)
# Figure 2: MSE of Model 0 and Model 1
par(mfrow=c(1,3))
plot(u$MSE$Y1, type = "o", col = "blue", pch = 15, xlab = "area", ylab = "MSE of Y1",
cex.axis = 1.5, cex.lab = 1.5, xaxt = "n", ylim=c(0.038,0.12))
points(m1$MSE$Y1, type = "o", col = "red", pch = 18)
axis(1, at=1:50, labels = 1:50)
legend("topleft",legend=c("Model 0","Model 1"),ncol=2 , col=c("blue","red"),
pch=c(15,18), inset=c(0.617,-0.1), xpd=TRUE)
plot(u$MSE$Y2, type = "o", col = "blue", pch = 15, xlab = "area", ylab = "MSE of Y2",
cex.axis = 1.5, cex.lab = 1.5, xaxt = "n", ylim=c(0.05,0.28))
points(m1$MSE$Y2, type = "o", col = "red", pch = 18)
axis(1, at=1:50, labels = 1:50)
legend("topleft",legend=c("Model 0","Model 1"),ncol=2 , col=c("blue","red"),
pch=c(15,18), inset=c(0.617,-0.1), xpd=TRUE)
plot(u$MSE$Y3, type = "o", col = "blue", pch = 15, xlab = "area", ylab = "MSE of Y3",
cex.axis = 1.5, cex.lab = 1.5, xaxt = "n", ylim=c(0.1,0.42))
points(m1$MSE$Y3, type = "o", col = "red", pch = 18)
axis(1, at=1:50, labels = 1:50)
legend("topleft",legend=c("Model 0","Model 1"),ncol=2 , col=c("blue","red"),
pch=c(15,18), inset=c(0.617,-0.1), xpd=TRUE)
Figure 1 illustrates the EBLUPs based on Model 0 (univariate model) and Model 1 (multivariate model) for all variables of interest. Figure 2 shows the MSEs of Model 1 compared with the MSEs of Model 0. It can be seen that the estimates of both methods show a similar pattern. Meanwhile, EBLUPs based on Model 1 has a lower MSE than Model 0. From this example, we can conclude that the multivariate Fay-Herriot model (Model 1) is more efficient than the univariate Fay-Herriot model (Model 0).
datasae2
, which was generated based on Model 2, contains 50
observations on the following 11 variables: 3 dependent variables «
\(Y1, Y2\) and \(Y3\) », 2 auxiliary variables « \(X1\) and \(X2\) », and 6
variance-covariance of direct estimation « \(v1, v2, v3, v12, v13,\) and
\(v23\) ». We compare the effectiveness of the univariate model (Model 0)
and autoregressive multivariate Fay-Herriot Model (Model 2) by their
MSE. We use eblupMFH2()
to estimate parameters based on Model 2. Then,
we plot the EBLUP and MSE of these methods to compare them.
data('datasae2')
# model specifications
<- list(f1=Y1~X1+X2,
Fo f2=Y2~X1+X2,
f3=Y3~X1+X2)
<- c("v1", "v2", "v3", "v12", "v13", "v23")
vardir
# EBLUP based on Model 0 and Model 2
<- eblupUFH(Fo, vardir, data=datasae2) # Model 0
u <- eblupMFH2(Fo, vardir, data=datasae2) # Model 2 m2
The EBLUPs based on Model 0 (univariate model) and Model 1 (autoregressive multivariate model) are shown in Figure 3. As it can be seen, both methods show an almost similar result. However, EBLUPs based on Model 2 has a lower MSE than the EBLUPs based on Model 0, as shown in Figure 4. In this example, the autoregressive multivariate Fay-Herriot model (Model 2) seems to be more efficient than the univariate Fay-Herriot model (Model 0).
datasae3
, which was generated based on Model 3, is structured the same
as datasae1
and datasae2
. We compare the effectiveness of the
univariate model (Model 0) and heteroskedastic autoregressive
multivariate Fay-Herriot Model (Model 3) by their MSE. We use
eblupMFH3()
to estimate parameters based on Model 3. Then, we plot the
EBLUP and MSE of these methods to compare them.
data('datasae3')
# model specifications
<- list(f1=Y1~X1+X2,
Fo f2=Y2~X1+X2,
f3=Y3~X1+X2)
<- c("v1", "v2", "v3", "v12", "v13", "v23")
vardir
# EBLUP based on Model 0 and Model 3
<- eblupUFH(Fo, vardir, data=datasae3) # Model 0
u <- eblupMFH3(Fo, vardir, data=datasae3) # Model 3 m3
Figure 5 shows EBLUPs based on Model 3 compared to Model 0. The MSEs of both methods are shown in Figure 6. It can be seen that the multivariate EBLUPs will follow the pattern of univariate EBLUPs with smaller MSE values. It can be concluded that the heteroskedastic autoregressive multivariate Fay-Herriot model (Model 3) is more efficient than the univariate model (Model 0).
In this section, we use incomedata
dataset, which is provided in
library sae. The dataset contains unit-level data on income and other
related variables in Spain. We will demonstrate how to estimate the
EBLUP of Foster-Greer-Thorbecke (FGT) poverty index for each province
using the multivariate Fay-Herriot Models. The index consists of three
indicators, i.e., poverty proportion (p0), poverty gap (p1), and poverty
severity (p2).
library(sae)
data("incomedata")
Firstly, we obtain area-level data by calculating poverty indicators for each unit and aggregating it by province based on Benavent and Morales (2016). We use poverty line z=6557.143 (Molina and Marhuenda 2015). These following R commands are run to obtain \(p0\), \(p1\), and \(p2\) as variables of interest.
library(tidyverse)
<- rep(6557.143, dim(incomedata)[1]) # poverty line
pov.line
# calculate unit indictators
$y <- (pov.line - incomedata$income)/pov.line
incomedata= incomedata %>% mutate(poverty = ifelse(incomedata$y > 0, TRUE, FALSE),
incomedata y0 = ifelse(incomedata$y > 0, incomedata$y^0, 0),
y1 = ifelse(incomedata$y > 0, incomedata$y^1, 0),
y2 = ifelse(incomedata$y > 0, incomedata$y^2, 0))
# estimated domain size
<- aggregate(incomedata$weight, list(incomedata$prov), sum)[,2]
est.Nd
## estimate P0 P1 dan P2
= incomedata %>% group_by(prov) %>%
prov.est summarise(p0.prov = sum(weight*y0),
p1.prov = sum(weight*y1),
p2.prov = sum(weight*y2)) %>%
mutate(p0.prov = p0.prov / est.Nd,
p1.prov = p1.prov / est.Nd,
p2.prov = p2.prov / est.Nd)
<- incomedata %>% left_join(prov.est, by = c("prov" = "prov")) incomedata
We also need variance and covariance of variables of interest to estimate using the multivariate Fay-Herriot model. The following R commands are run to obtain variance and covariance of direct estimation based on Benavent and Morales (2016).
# estimate direct estimation variance-covariance
= incomedata %>%
prov.variance mutate(v11 = ifelse(incomedata$poverty,
*(weight-1))*(y0-p0.prov)*(y0-p0.prov), 0),
(weightv12 = ifelse(incomedata$poverty,
*(weight-1))*(y0-p0.prov)*(y1-p1.prov), 0),
(weightv13 = ifelse(incomedata$poverty,
*(weight-1))*(y0-p0.prov)*(y2-p2.prov), 0),
(weightv22 = ifelse(incomedata$poverty,
*(weight-1))*(y1-p1.prov)*(y1-p1.prov), 0),
(weightv23 = ifelse(incomedata$poverty,
*(weight-1))*(y1-p1.prov)*(y2-p2.prov), 0),
(weightv33 = ifelse(incomedata$poverty,
*(weight-1))*(y2-p2.prov)*(y2-p2.prov), 0)) %>%
(weightgroup_by(prov) %>%
summarise_at(c("v11","v12", "v13", 'v22',"v23","v33"), sum) %>%
mutate_at(c("v11","v12", "v13", 'v22',"v23","v33"),
function(x){x/est.Nd^2})
We use six explanatory variables selected by stepwise method, i.e., an indicator of age group 50-64 (\(age4\)), an indicator of age group >=65 (\(age5\)), an indicator of education level 1 (\(educ1\)), an indicator of education level 2 (\(educ2\)), an indicator of education level 3 (\(educ3\)), and an indicator of Spanish nationality (\(nat1\)). The model specifications are written as follow:
<- list(f1=p0.prov~age4+age5+educ1+educ2+educ3+nat1,
formula f2=p1.prov~age4+age5+educ1+educ2+educ3+nat1,
f3=p2.prov~age4+age5+educ1+educ2+educ3+nat1)
<- c("v11", "v22", "v33", "v12", "v13", "v23") vardir
Next, we select the most suitable multivariate Fay-Herriot model using variance homogeneity test and random effect \(\rho\) parameter test. In the variance homogeneity test, we test \(H0 : \hat{\sigma}_{ui}^2 = \hat{\sigma}_{uj}^2 \: ; \: i,j=1,2,3\) using model 3. We obtain a non-convergent model, the p-values are 0.98674 and 0.98996. It shows that the difference between variance of random effects is statistically not significant. After that, we test \(H0 : \rho = 0\) using model 2. We get the t-statistics value of 19.26 with p-value of 0.00. It shows that there is a correlation between random effects. These results indicate the model that fits the data is Model 2.
The following codes are run to obtain the EBLUPs (under Model 2), to plot the direct estimates and the EBLUPs, and to plot the MSEs of direct estimates and the MSEs of EBLUPs ordered by sample size:
# EBLUP based on Model 2
<- eblupMFH2(formula, vardir, data=prov.data)
eblup.pov
# Dataframe of Result
= data.frame(prov = prov.data$prov,
result_eblup est.Nd = est.Nd,
p0 = prov.data$p0.prov, p0.eblup = eblup.pov$eblup$p0.prov,
p1 = prov.data$p1.prov, p1.eblup = eblup.pov$eblup$p1.prov,
p2 = prov.data$p2.prov, p2.eblup = eblup.pov$eblup$p2.prov,
v11 = prov.data$v11, p0.mse = eblup.pov$MSE$p0.prov,
v22 = prov.data$v22, p1.mse = eblup.pov$MSE$p1.prov,
v33 = prov.data$v33, p2.mse = eblup.pov$MSE$p2.prov)
= result_eblup %>% arrange(est.Nd)
result_eblup $prov = as.factor(result_eblup$prov)
result_eblup$id = 1:nrow(result_eblup)
result_eblup
# Figure 7: Direct estimates estimates and EBLUPs (under Model 2) ordered by sample size.
par(mfrow=c(1,3))
plot(result_eblup$id, result_eblup$p0, type = "o", col = "blue", pch = 15, xlab = "area",
ylab = "p0", cex.axis = 1.5, cex.lab = 1.5, xaxt = "n")
points(result_eblup$p0.eblup, type = "o", col = "red", pch = 18)
axis(1, at=result_eblup$id, labels = result_eblup$prov)
legend("topright",legend=c("Direct estimates","Model 2"),col=c("blue","red"), pch=c(15,18))
plot(result_eblup$id, result_eblup$p1, type = "o", col = "blue", pch = 15, xlab = "area",
ylab = "p1", cex.axis = 1.5, cex.lab = 1.5, xaxt = "n")
points(result_eblup$p1.eblup, type = "o", col = "red", pch = 18)
axis(1, at=result_eblup$id, labels = result_eblup$prov)
legend("topright",legend=c("Direct estimates","Model 2"),col=c("blue","red"), pch=c(15,18))
plot(result_eblup$id, result_eblup$p2, type = "o", col = "blue", pch = 15, xlab = "area",
ylab = "p2", cex.axis = 1.5, cex.lab = 1.5, xaxt = "n")
points(result_eblup$p2.eblup, type = "o", col = "red", pch = 18)
axis(1, at=result_eblup$id, labels = result_eblup$prov)
legend("topright",legend=c("Direct estimates","Model 2"),col=c("blue","red"), pch=c(15,18))
# Figure 8: MSE of Direct estimates and MSE of EBLUPs (under Model 2) ordered by sample size.
par(mfrow=c(1,3))
plot(result_eblup$id, result_eblup$v11, type = "o", col = "blue", pch = 15, xlab = "area",
ylab = "p0", cex.axis = 1.5, cex.lab = 1.5, xaxt = "n")
points(result_eblup$p0.mse, type = "o", col = "red", pch = 18)
axis(1, at=result_eblup$id, labels = result_eblup$prov)
legend("topright",legend=c("Direct estimates","Model 2"),col=c("blue","red"), pch=c(15,18))
plot(result_eblup$id, result_eblup$v22, type = "o", col = "blue", pch = 15, xlab = "area",
ylab = "p1", cex.axis = 1.5, cex.lab = 1.5, xaxt = "n")
points(result_eblup$p1.mse, type = "o", col = "red", pch = 18)
axis(1, at=result_eblup$id, labels = result_eblup$prov)
legend("topright",legend=c("Direct estimates","Model 2"),col=c("blue","red"), pch=c(15,18))
plot(result_eblup$id, result_eblup$v33, type = "o", col = "blue", pch = 15, xlab = "area",
ylab = "p2", cex.axis = 1.5, cex.lab = 1.5, xaxt = "n")
points(result_eblup$p2.mse, type = "o", col = "red", pch = 18)
axis(1, at=result_eblup$id, labels = result_eblup$prov)
legend("topright",legend=c("Direct estimates","Model 2"),col=c("blue","red"), pch=c(15,18))
Variable of Interest | Statistic | Direct Estimation | Model 2 |
---|---|---|---|
p0 | Min | 0.05244 | 0.04601 |
Quartil 1 | 0.17296 | 0.18947 | |
Median | 0.21850 | 0.21937 | |
Mean | 0.22659 | 0.22020 | |
Quartil 3 | 0.27753 | 0.25468 | |
Max | 0.43588 | 0.35011 | |
Standard Deviation | 0.08173 | 0.056597 | |
p1 | Min | 0.01891 | 0.02551 |
Quartil 1 | 0.05336 | 0.05969 | |
Median | 0.06810 | 0.06866 | |
Mean | 0.07567 | 0.07407 | |
Quartil 3 | 0.09208 | 0.08866 | |
Max | 0.15973 | 0.14023 | |
Standard Deviation | 0.03227 | 0.02506 | |
p2 | Min | 0.005449 | 0.008481 |
Quartil 1 | 0.025819 | 0.026867 | |
Median | 0.032344 | 0.033600 | |
Mean | 0.039042 | 0.038179 | |
Quartil 3 | 0.051379 | 0.049755 | |
Max | 0.099308 | 0.087194 | |
Standard Deviation | 0.02083 | 0.017137 |
We will compare EBLUPs based on Model 2 with the direct estimates. Both of the estimation results can be seen in Table 1. On the median value, the result of estimated poverty indicators are relatively similar, ranging from 0.218-0.219 for p0, 0.0681-0.0687 for p1, and 0.0323-0.0336 for p2. Model 2 has a lower range and smaller standard deviation than the direct estimation. It means that, in general, the multivariate Fay-Herriot model has lower variability of small area estimates than direct estimation.
The results of direct estimates and EBLUPs under Model 2 ordered by sample size are shown in Figure 7. The patterns of small area estimates for both methods are almost the same for all areas. The MSEs of the direct estimates and the EBLUP estimates ordered by sample size are shown in Figure 8. These plots show that the multivariate Fay-Herriot model has lower MSE than the direct estimation. Thus, we can conclude that the multivariate Fay-Herriot model is more efficient than direct estimation.
This paper introduces the first R package of multivariate Fay-Herriot
model for small area estimation named msae. The package is available
on Comprehensive R Archive Network (CRAN) at
http://CRAN.R-project.org/package=msae. This package contains a number
of functions for estimating the EBLUP and MSE of EBLUP of each
Fay-Herriot Model. This package accommodates the univariate Fay-Herriot
model (model 0), multivariate Fay-Herriot model (model 1),
autoregressive multivariate Fay-Herriot model (model 2), and
heteroskedastic autoregressive multivariate Fay-Herriot model (model 3).
The functions are described and implemented using three examples of
datasets provided in msae package and a real dataset provided in sae
package, incomedata
. By these examples, we show that the multivariate
Fay-herriot models produce more efficient parameter estimates than
direct estimation and univariate model.
sae, rsae, nlme, hbsae, JoSAE, BayesSAE, mme, saery, msae
Bayesian, ChemPhys, Econometrics, Environmetrics, Finance, MixedModels, OfficialStatistics, Psychometrics, Spatial, SpatioTemporal
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Permatasari & Ubaidillah, "msae: An R Package of Multivariate Fay-Herriot Models for Small Area Estimation", The R Journal, 2021
BibTeX citation
@article{RJ-2021-096, author = {Permatasari, Novia and Ubaidillah, Azka}, title = {msae: An R Package of Multivariate Fay-Herriot Models for Small Area Estimation}, journal = {The R Journal}, year = {2021}, note = {https://rjournal.github.io/}, volume = {13}, issue = {2}, issn = {2073-4859}, pages = {111-122} }