Small Area Estimation of Per-Capita Expenditure in Banyuwangi with Hierarchical Bayes and Empirical Bayes Methods

Per capita income is one of the economic indicators most widely used to measure prosperity and welfare. However, accurate income data are difficult to obtain, so Susenas approximates this quantity with per capita expenditure. This study applies Hierarchical Bayes (HB) and Empirical Bayes (EB) methods within Small Area Estimation (SAE) to estimate per capita expenditure in Banyuwangi. The results show that indirect estimation with HB and EB produces smaller RMSE values than direct estimation, and that the HB method produces smaller RMSE values than the EB method. This research therefore recommends the HB method, rather than the direct estimation currently in use, for estimating per capita expenditure in Banyuwangi.

I. INTRODUCTION
The development of a region is considered successful if welfare improves across the entire region, down to the level of individuals and households. One of the most widely used economic indicators for measuring prosperity and welfare is per capita income. However, accurate income data are difficult to obtain, so in Susenas activities this quantity is approximated using household expenditure data. Household expenditure, which consists of food and non-food expenditure, shows how the population allocates resources to household needs.
Badan Pusat Statistik (BPS) conducts the Survei Sosial Ekonomi Nasional (Susenas), which collects data on per capita expenditure. The survey is designed to collect social data on the population for relatively broad domains, namely the district/city level. If its results are used to produce estimates for smaller domains, such as sub-districts or villages, the estimates are likely to have large bias and variance, because the sample in each small domain is too small to represent its population. Data sources are usually constrained to a relatively small number of samples. One remedy is to increase the sample size, but this is often expensive. Another is to make better use of the available data through Small Area Estimation (SAE).
Small Area Estimation (SAE) is a statistical technique for estimating parameters of subpopulations whose sample sizes are small [1]. It uses data available at a larger scale to estimate parameters at a smaller scale, including areas that are not sampled. A simple estimator for a small area based on the sampling design is called direct estimation. Direct estimation cannot guarantee accuracy when the sample size in a small area is small, because the resulting statistics have a large variance; when an area is not sampled at all, no direct estimate can be made because the area is not represented in the survey [2]. SAE has been implemented in several countries. Ndeng'e [3] built a poverty map of Kenya by combining information from the 1997 Welfare Monitoring Survey (a household survey) with the 1999 Population Census. In Indonesia, Kurnia and Notodiputro [4] performed data simulations to evaluate several standard SAE techniques and applied indirect SAE methods to poverty data from West Java. Wardani [5], in a case study of per capita expenditure estimation in Bogor City, concluded that the empirical Bayes method with a jackknife approach produced a smaller Relative Root Mean Square Error (RRMSE) than the EBLUP method. Rumiati [6] studied SAE under unequal probability sampling for binomial and multinomial responses using Empirical Bayes (EB).
Fausi [7] used the EB method to estimate per capita expenditure at the sub-district level in Sumenep, distinguishing between mainland and island groups, and Darsyah [8] analysed the same data with a Kernel-Bootstrap estimation approach. In both studies, despite their different approaches, indirect estimation produced more precise estimates than direct estimation based on MSE values.
Various SAE methods have been developed, especially model-based methods as alternatives to direct estimation. These include Empirical Best Linear Unbiased Prediction (EBLUP), Empirical Bayes (EB), and Hierarchical Bayes (HB). The EBLUP method uses the predictor that minimizes the Mean Square Error among linear unbiased predictors, with the unknown variance components replaced by variance estimators computed from the sample data. In the EB method, the model parameters are estimated from the marginal distribution of the data, and inference is then based on the estimated posterior distribution. In the HB method, inference is based on the posterior distribution: parameters are estimated by their posterior means, and precision is measured by their posterior variances [9].
The EB and HB methods are more general, since they can handle continuous, binary, and count data. Therefore, this study compares two SAE models, one using the HB method and one using the EB method, for estimating per capita expenditure in each sub-district of Banyuwangi.

A. Small Area Estimation
There are two types of SAE models: area-level models and unit-level models. An area-level model is used when supporting (auxiliary) data are available only at the area level. Let $\mathbf{x}_i = (x_{1i}, x_{2i}, \dots, x_{pi})^T$ be the auxiliary data for area $i$, and let $\theta_i$, the parameter to be estimated for area $i$, be related to $\mathbf{x}_i$ through the linking model
$$\theta_i = \mathbf{x}_i^T\boldsymbol{\beta} + b_i v_i, \quad i = 1, 2, \dots, m,$$
where $v_i \sim N(0, \sigma_v^2)$ is a normally distributed random area effect. The area-level model also assumes that a direct survey estimator $\hat{\theta}_i$ of $\theta_i$ satisfies the sampling model $\hat{\theta}_i = \theta_i + e_i$, where the sampling error $e_i \sim N(0, \psi_i)$ with $\psi_i$ known. Combining the two models gives equation (1), a linear mixed area-level model known as the Fay-Herriot model [10]:
$$\hat{\theta}_i = \mathbf{x}_i^T\boldsymbol{\beta} + b_i v_i + e_i, \quad i = 1, 2, \dots, m, \tag{1}$$
where $b_i$ is a known positive constant.
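To make the two levels of model (1) concrete, the sketch below draws synthetic data from it: first the true area parameters from the linking model, then direct estimates from the sampling model. It is a minimal illustration assuming NumPy; the function name `simulate_fay_herriot` and all numeric values are hypothetical and are not the paper's data.

```python
import numpy as np


def simulate_fay_herriot(X, beta, sigma_v2, psi, b=None, seed=0):
    """Draw (theta, theta_hat) from the Fay-Herriot model (1).

    X        : (m, p) array whose rows are the auxiliary vectors x_i^T
    beta     : (p,) regression coefficients
    sigma_v2 : variance of the random area effect v_i
    psi      : (m,) known sampling variances psi_i
    b        : (m,) known positive constants b_i (all 1 if omitted)
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    b = np.ones(m) if b is None else np.asarray(b, dtype=float)
    v = rng.normal(0.0, np.sqrt(sigma_v2), size=m)   # random area effects v_i
    theta = X @ beta + b * v                         # linking model
    e = rng.normal(0.0, np.sqrt(psi))                # sampling errors e_i
    theta_hat = theta + e                            # sampling model
    return theta, theta_hat


# Hypothetical example: 20 areas, intercept plus one covariate.
m = 20
X = np.column_stack([np.ones(m), np.linspace(-1.0, 1.0, m)])
theta, theta_hat = simulate_fay_herriot(X, np.array([7.7, 1.0]), 1.7, np.full(m, 0.5))
```

Separating the two draws mirrors the model: the spread of `theta` around `X @ beta` is governed by $\sigma_v^2$, while the spread of `theta_hat` around `theta` is governed by the known $\psi_i$.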

B. Hierarchical Bayes
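The HB approach to the area-level model (1), for the case where $\sigma_v^2$ is known, places a 'flat' prior on $\boldsymbol{\beta}$, $f(\boldsymbol{\beta}) \propto 1$, and rewrites model (1) as the HB model
$$\text{(i)}\ \hat{\theta}_i \mid \theta_i \overset{ind}{\sim} N(\theta_i, \psi_i); \quad \text{(ii)}\ \theta_i \mid \boldsymbol{\beta}, \sigma_v^2 \overset{ind}{\sim} N(\mathbf{x}_i^T\boldsymbol{\beta}, b_i^2\sigma_v^2); \quad \text{(iii)}\ f(\boldsymbol{\beta}) \propto 1. \tag{2}$$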
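When $\sigma_v^2$ is known, the flat prior makes the posterior available in closed form. It is a standard result for this model that the HB posterior mean of $\theta_i$ coincides with the BLUP, $\gamma_i\hat{\theta}_i + (1-\gamma_i)\mathbf{x}_i^T\tilde{\boldsymbol{\beta}}$ with $\gamma_i = \sigma_v^2 b_i^2/(\sigma_v^2 b_i^2 + \psi_i)$, and the posterior variance is $\gamma_i\psi_i$ plus a term for the uncertainty in $\boldsymbol{\beta}$. A minimal NumPy sketch, assuming the rows of `X` are the $\mathbf{x}_i^T$ (with a column of ones for an intercept); the name `hb_known_sigma` is hypothetical.

```python
import numpy as np


def hb_known_sigma(theta_hat, X, psi, sigma_v2, b=None):
    """Posterior mean and variance of theta_i under model (2), sigma_v^2 known."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    psi = np.asarray(psi, dtype=float)
    b = np.ones(len(theta_hat)) if b is None else np.asarray(b, dtype=float)
    w = 1.0 / (psi + b**2 * sigma_v2)                 # marginal precision of theta_hat_i
    XtWX = X.T @ (w[:, None] * X)
    beta_tilde = np.linalg.solve(XtWX, X.T @ (w * theta_hat))   # posterior mean of beta
    gamma = sigma_v2 * b**2 / (sigma_v2 * b**2 + psi)           # shrinkage factor
    post_mean = gamma * theta_hat + (1 - gamma) * (X @ beta_tilde)
    g1 = gamma * psi                                  # variance given beta
    g2 = (1 - gamma) ** 2 * np.einsum("ij,jk,ik->i", X, np.linalg.inv(XtWX), X)
    return post_mean, g1 + g2                         # g2: uncertainty in beta
```

The two limits are a useful check: as $\sigma_v^2 \to 0$ every area is shrunk fully to the regression fit, and as $\sigma_v^2 \to \infty$ the estimate returns to the direct estimate with variance $\psi_i$.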
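For the case where $\sigma_v^2$ is unknown, part (iii) of equation (2) becomes
$$\text{(iii)}\ f(\boldsymbol{\beta}, \sigma_v^2) = f(\boldsymbol{\beta})\, f(\sigma_v^2) \propto f(\sigma_v^2), \tag{3}$$
with parts (i) and (ii) unchanged.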

C. For $\sigma_v^2$ Unknown
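In the case where $\sigma_v^2$ is unknown, Gibbs sampling is used for the area-level model given by (i) and (ii) of equation (2). The priors on $\boldsymbol{\beta}$ and $\sigma_v^2$ in equation (3) are a flat prior on $\boldsymbol{\beta}$ and a gamma distribution with shape parameter $a$ and rate parameter $b$ on the precision $\sigma_v^{-2}$:
$$\sigma_v^{-2} \sim G(a, b), \quad a > 0,\ b > 0. \tag{4}$$
Equivalently, $\sigma_v^2$ follows an inverse gamma distribution $IG(a, b)$ with density
$$f(\sigma_v^2) \propto (\sigma_v^2)^{-(a+1)} \exp\left(-\frac{b}{\sigma_v^2}\right).$$
The positive constants $a$ and $b$ are set very small, so that the prior is vague. The Gibbs conditionals are:
(i) $\boldsymbol{\beta} \mid \boldsymbol{\theta}, \sigma_v^2, \hat{\boldsymbol{\theta}} \sim N_p\left(\boldsymbol{\beta}^*, \sigma_v^2 \left(\sum_{i=1}^m \tilde{\mathbf{x}}_i \tilde{\mathbf{x}}_i^T\right)^{-1}\right)$;
(ii) $\theta_i \mid \boldsymbol{\beta}, \sigma_v^2, \hat{\boldsymbol{\theta}} \overset{ind}{\sim} N\left(\hat{\theta}_i^B(\boldsymbol{\beta}, \sigma_v^2), \gamma_i \psi_i\right)$;
(iii) $\sigma_v^{-2} \mid \boldsymbol{\beta}, \boldsymbol{\theta}, \hat{\boldsymbol{\theta}} \sim G\left(\frac{m}{2} + a, \frac{1}{2}\sum_{i=1}^m (\tilde{\theta}_i - \tilde{\mathbf{x}}_i^T\boldsymbol{\beta})^2 + b\right)$,
where $\tilde{\theta}_i = \theta_i / b_i$, $\tilde{\mathbf{x}}_i = \mathbf{x}_i / b_i$, $\boldsymbol{\beta}^* = \left(\sum_i \tilde{\mathbf{x}}_i \tilde{\mathbf{x}}_i^T\right)^{-1} \sum_i \tilde{\mathbf{x}}_i \tilde{\theta}_i$, $\gamma_i = \sigma_v^2 b_i^2 / (\sigma_v^2 b_i^2 + \psi_i)$, and $\hat{\theta}_i^B(\boldsymbol{\beta}, \sigma_v^2) = \gamma_i \hat{\theta}_i + (1 - \gamma_i)\mathbf{x}_i^T\boldsymbol{\beta}$. All Gibbs conditionals have closed forms, so MCMC samples can be generated directly from conditionals (i)-(iii).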
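In the HB approach, the posterior mean $E(\theta_i \mid \hat{\boldsymbol{\theta}})$ is used as the point estimate of $\theta_i$, and the posterior variance $V(\theta_i \mid \hat{\boldsymbol{\theta}})$ as the measure of its uncertainty. The Gibbs sampler [11] with the Metropolis-Hastings algorithm [12] can be used to compute the posterior mean and variance. Denote the MCMC samples by $\{(\boldsymbol{\beta}^{(k)}, \boldsymbol{\theta}^{(k)}, \sigma_v^{2(k)}),\ k = d+1, \dots, d+D\}$, where the first $d$ draws are discarded as burn-in. The posterior mean and variance are estimated by
$$\hat{\theta}_i^{HB} = \frac{1}{D}\sum_{k=d+1}^{d+D} \theta_i^{(k)}, \qquad \hat{V}(\theta_i \mid \hat{\boldsymbol{\theta}}) = \frac{1}{D-1}\sum_{k=d+1}^{d+D} \left(\theta_i^{(k)} - \hat{\theta}_i^{HB}\right)^2.$$
More efficient estimators can be obtained by exploiting the closed-form results of equation (3) for $\sigma_v^2$ known:
$$\hat{\theta}_i^{HB} = \frac{1}{D}\sum_{k=d+1}^{d+D} \hat{\theta}_i^B(\boldsymbol{\beta}^{(k)}, \sigma_v^{2(k)}), \qquad \hat{V}(\theta_i \mid \hat{\boldsymbol{\theta}}) = \frac{1}{D}\sum_{k=d+1}^{d+D} \gamma_i^{(k)}\psi_i + \frac{1}{D-1}\sum_{k=d+1}^{d+D}\left(\hat{\theta}_i^B(\boldsymbol{\beta}^{(k)}, \sigma_v^{2(k)}) - \hat{\theta}_i^{HB}\right)^2,$$
where $\hat{\theta}_i^B(\boldsymbol{\beta}^{(k)}, \sigma_v^{2(k)})$ and $\gamma_i^{(k)}\psi_i$ are the mean and variance of conditional (ii) evaluated at the $k$-th draw.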
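The Gibbs conditionals (i)-(iii) and the more efficient (Rao-Blackwellized) posterior summaries can be sketched directly in NumPy. This is not the authors' code (the application uses WinBUGS); it is a minimal sketch assuming `a0 = b0 = 0.001` as the "very small" gamma constants, and defaults for iterations, burn-in, and thinning chosen to mirror the values reported in the application (20,000, 50, 10). The name `fh_gibbs` is hypothetical.

```python
import numpy as np


def fh_gibbs(theta_hat, X, psi, b=None, a0=1e-3, b0=1e-3,
             n_iter=20000, burn_in=50, thin=10, seed=1):
    """Gibbs sampler for the HB Fay-Herriot model with sigma_v^2 unknown.

    Priors: flat on beta and sigma_v^{-2} ~ Gamma(shape=a0, rate=b0), as in (3)-(4).
    Returns Rao-Blackwellized posterior means and variances of theta_i.
    """
    rng = np.random.default_rng(seed)
    theta_hat = np.asarray(theta_hat, dtype=float)
    psi = np.asarray(psi, dtype=float)
    m, _ = X.shape
    b = np.ones(m) if b is None else np.asarray(b, dtype=float)
    Xt = X / b[:, None]                                    # x~_i = x_i / b_i
    XtX_inv = np.linalg.inv(Xt.T @ Xt)
    theta, sigma_v2 = theta_hat.copy(), np.var(theta_hat)  # starting values
    cond_means, cond_vars = [], []
    for k in range(n_iter):
        # (i) beta | theta, sigma_v^2 ~ N_p(beta*, sigma_v^2 (sum x~ x~^T)^{-1})
        beta_star = XtX_inv @ Xt.T @ (theta / b)
        beta = rng.multivariate_normal(beta_star, sigma_v2 * XtX_inv)
        # (ii) theta_i | beta, sigma_v^2, theta_hat ~ N(theta_i^B, gamma_i psi_i)
        gamma = sigma_v2 * b**2 / (sigma_v2 * b**2 + psi)
        theta_B = gamma * theta_hat + (1 - gamma) * (X @ beta)
        theta = rng.normal(theta_B, np.sqrt(gamma * psi))
        # (iii) sigma_v^{-2} | beta, theta ~ Gamma(m/2 + a0, rate = SS/2 + b0)
        ss = np.sum((theta / b - Xt @ beta) ** 2)
        sigma_v2 = 1.0 / rng.gamma(m / 2 + a0, 1.0 / (0.5 * ss + b0))  # numpy takes scale
        if k >= burn_in and (k - burn_in) % thin == 0:
            # Rao-Blackwellization: store conditional (ii) moments at the current draw
            g = sigma_v2 * b**2 / (sigma_v2 * b**2 + psi)
            cond_means.append(g * theta_hat + (1 - g) * (X @ beta))
            cond_vars.append(g * psi)
    cond_means, cond_vars = np.array(cond_means), np.array(cond_vars)
    hb = cond_means.mean(axis=0)
    post_var = cond_vars.mean(axis=0) + cond_means.var(axis=0, ddof=1)
    return hb, post_var
```

Averaging the conditional means instead of the raw `theta` draws removes the Monte Carlo noise that conditional (ii) would otherwise add, which is why these estimators are more efficient.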

D. Empirical Bayes
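The Empirical Bayes (EB) method is one of the approaches that can be used in SAE and is based on the Bayes method. The first step of the Bayes method is to obtain the posterior distribution of the parameter of interest, $f(\theta_i \mid \hat{\theta}_i, \boldsymbol{\beta}, \sigma_v^2)$, assuming that $\boldsymbol{\beta}$ and $\sigma_v^2$ are known. In the EB method, however, inference is based on an estimate of the posterior distribution of $\theta_i$, obtained by plugging in estimates of $\boldsymbol{\beta}$ and $\sigma_v^2$. Data on supporting (auxiliary) variables are included in the model. When the supporting data are available only at the area level, $\mathbf{x}_i = (x_{1i}, x_{2i}, \dots, x_{pi})^T$, the EB approach uses model (1), the Fay-Herriot model, with $b_i = 1$, $v_i \sim N(0, \sigma_v^2)$ and $e_i \sim N(0, \psi_i)$, where $v_i$ and $e_i$ are independent. $\boldsymbol{\beta}$ and $\sigma_v^2$ are unknown, while $\psi_i$ is assumed known. Writing $A = \sigma_v^2$ and $D_i = \psi_i$, the Bayes estimator of $\theta_i$ follows from the Bayes model:
i. $\hat{\theta}_i \mid \theta_i \sim N(\theta_i, D_i)$;
ii. $\theta_i \sim N(\mathbf{x}_i^T\boldsymbol{\beta}, A)$ is the prior distribution for $\theta_i$, $i = 1, 2, \dots, m$.
The densities of the Bayes model are
$$f(\hat{\theta}_i \mid \theta_i) = \frac{1}{\sqrt{2\pi D_i}} \exp\left[-\frac{(\hat{\theta}_i - \theta_i)^2}{2D_i}\right], \qquad f(\theta_i) = \frac{1}{\sqrt{2\pi A}} \exp\left[-\frac{(\theta_i - \mathbf{x}_i^T\boldsymbol{\beta})^2}{2A}\right],$$
so that, for $\boldsymbol{\theta} = (\theta_1, \theta_2, \dots, \theta_m)^T$ and $\hat{\boldsymbol{\theta}} = (\hat{\theta}_1, \hat{\theta}_2, \dots, \hat{\theta}_m)^T$, $f(\boldsymbol{\theta} \mid \hat{\boldsymbol{\theta}}) \propto \prod_{i=1}^m f(\hat{\theta}_i \mid \theta_i) f(\theta_i)$. Consider the sum of the two exponents, ignoring the factor $-1/2$:
$$\frac{(\hat{\theta}_i - \theta_i)^2}{D_i} + \frac{(\theta_i - \mathbf{x}_i^T\boldsymbol{\beta})^2}{A} = \frac{A + D_i}{A D_i}\left[\theta_i - \left(\hat{\theta}_i - \frac{D_i}{A + D_i}(\hat{\theta}_i - \mathbf{x}_i^T\boldsymbol{\beta})\right)\right]^2 + c_1^*,$$
where $c_1^*$ is a constant that does not contain $\theta_i$, so that
$$\theta_i \mid \hat{\theta}_i, \boldsymbol{\beta}, A \sim N\left(\hat{\theta}_i - B_i(\hat{\theta}_i - \mathbf{x}_i^T\boldsymbol{\beta}),\ (1 - B_i) D_i\right), \qquad B_i = \frac{D_i}{A + D_i}.$$
Based on this result, the Bayes estimator of $\theta_i$ is
$$\hat{\theta}_i^B = \hat{\theta}_i - B_i(\hat{\theta}_i - \mathbf{x}_i^T\boldsymbol{\beta}).$$
When $A$ is known, $\boldsymbol{\beta}$ in this formula can be estimated by Maximum Likelihood, giving $\hat{\boldsymbol{\beta}} = \left(\sum_{i} \frac{\mathbf{x}_i\mathbf{x}_i^T}{A + D_i}\right)^{-1}\sum_{i} \frac{\mathbf{x}_i \hat{\theta}_i}{A + D_i}$. In practice, however, $A$ is unknown, and it is also estimated by Maximum Likelihood Estimation (MLE) or Restricted/Residual Maximum Likelihood (REML). The REML estimator remains consistent even when the normality assumption is violated [13]. Replacing $\boldsymbol{\beta}$ and $A$ by their estimates gives the Empirical Bayes estimator
$$\hat{\theta}_i^{EB} = \hat{\theta}_i - \hat{B}_i(\hat{\theta}_i - \mathbf{x}_i^T\hat{\boldsymbol{\beta}}), \qquad \hat{B}_i = \frac{D_i}{\hat{A} + D_i}.$$
Under the Bayes model, the MSE of the Bayes estimator is
$$MSE(\hat{\theta}_i^B) = E(\hat{\theta}_i^B - \theta_i)^2 = g_{1i}(A) = \frac{A D_i}{A + D_i}.$$
Plugging $\hat{A}$ into $g_{1i}$ underestimates the MSE of $\hat{\theta}_i^{EB}$, because it ignores the variability from estimating $\boldsymbol{\beta}$ and $A$. This can be corrected with the jackknife approach, one of the most frequently used methods in surveys because of its simple concept [14].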
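The EB steps above (estimate $A$ by REML, estimate $\boldsymbol{\beta}$ given $\hat{A}$, then shrink each direct estimate by $\hat{B}_i$) can be sketched as follows. The section does not say which optimizer was used for REML; Fisher scoring on the REML log-likelihood, with truncation at zero, is assumed here. Function names are hypothetical and `y` holds the direct estimates $\hat{\theta}_i$.

```python
import numpy as np


def reml_A(y, X, D, tol=1e-8, max_iter=100):
    """REML estimate of A = sigma_v^2 in the Fay-Herriot model by Fisher scoring."""
    A = max(np.var(y) - np.mean(D), 0.01)          # crude starting value
    for _ in range(max_iter):
        Vinv = np.diag(1.0 / (A + D))
        XtVinv = X.T @ Vinv
        P = Vinv - Vinv @ X @ np.linalg.solve(XtVinv @ X, XtVinv)
        Py = P @ y
        score = -0.5 * np.trace(P) + 0.5 * Py @ Py   # d l_REML / dA
        info = 0.5 * np.trace(P @ P)                 # expected information
        A_new = max(A + score / info, 0.0)           # truncate at zero
        if abs(A_new - A) < tol:
            return A_new
        A = A_new
    return A


def eb_estimates(y, X, D):
    """EB estimates theta_i^EB = y_i - B_i (y_i - x_i^T beta_hat)."""
    y, D = np.asarray(y, dtype=float), np.asarray(D, dtype=float)
    A = reml_A(y, X, D)
    w = 1.0 / (A + D)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))  # ML/GLS beta given A
    B = D / (A + D)                                                # shrinkage factor B_i
    theta_eb = y - B * (y - X @ beta)
    g1 = A * D / (A + D)                                           # naive MSE g_1i(A_hat)
    return theta_eb, A, beta, g1
```

Because $0 \le \hat{B}_i \le 1$, each EB estimate lies between the direct estimate and the regression fit, and areas with larger $D_i$ are pulled harder toward the fit.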

E. Mean Square Error (MSE) Jackknife
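The procedure for applying the jackknife approach to estimate the MSE of $\hat{\theta}_i^{EB}$ is as follows:
1. Calculate the estimator of $M_{1i}$ as
$$\hat{M}_{1i} = g_{1i}(\hat{A}) - \frac{m-1}{m}\sum_{l=1}^{m}\left[g_{1i}(\hat{A}_{-l}) - g_{1i}(\hat{A})\right],$$
where $\hat{A}_{-l}$ is obtained by deleting the $l$-th observation from the full data set.
2. Calculate the estimator of $M_{2i}$ as
$$\hat{M}_{2i} = \frac{m-1}{m}\sum_{l=1}^{m}\left(\hat{\theta}_{i,-l}^{EB} - \hat{\theta}_i^{EB}\right)^2,$$
where $\hat{\theta}_{i,-l}^{EB}$ is the EB estimator for area $i$ computed with $\hat{\boldsymbol{\beta}}$ and $\hat{A}$ re-estimated after deleting the $l$-th observation from the full data set.
3. Calculate the jackknife estimator of the MSE as
$$mse_J(\hat{\theta}_i^{EB}) = \hat{M}_{1i} + \hat{M}_{2i}. \tag{23}$$
The jackknife method developed by Jiang, Lahiri, and Wan can be applied to a wide range of SAE models, including mismatched models and cases that are not normally distributed [1].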
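Steps 1-3 translate into a leave-one-area-out loop. The sketch below is self-contained, so it repeats a compact REML fit (Fisher scoring, an assumed choice) before the jackknife; the function names are hypothetical and `y` holds the direct estimates.

```python
import numpy as np


def fit_fh(y, X, D, tol=1e-8, max_iter=100):
    """REML (Fisher scoring) for A, then ML/GLS for beta, in the Fay-Herriot model."""
    A = max(np.var(y) - np.mean(D), 0.01)
    for _ in range(max_iter):
        Vinv = np.diag(1.0 / (A + D))
        XtVinv = X.T @ Vinv
        P = Vinv - Vinv @ X @ np.linalg.solve(XtVinv @ X, XtVinv)
        Py = P @ y
        A_new = max(A + (Py @ Py - np.trace(P)) / np.trace(P @ P), 0.0)
        done = abs(A_new - A) < tol
        A = A_new
        if done:
            break
    w = 1.0 / (A + D)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return A, beta


def eb(y, X, D, A, beta):
    """theta_i^EB = y_i - B_i (y_i - x_i^T beta) with B_i = D_i / (A + D_i)."""
    return y - D / (A + D) * (y - X @ beta)


def g1(A, D):
    """g_1i(A) = A D_i / (A + D_i)."""
    return A * D / (A + D)


def jackknife_mse(y, X, D):
    """Jiang-Lahiri-Wan jackknife MSE of the EB estimator (steps 1-3, equation (23))."""
    y, D = np.asarray(y, dtype=float), np.asarray(D, dtype=float)
    m = len(y)
    A, beta = fit_fh(y, X, D)
    theta_eb = eb(y, X, D, A, beta)
    sum1, sum2 = np.zeros(m), np.zeros(m)
    for l in range(m):
        keep = np.arange(m) != l                             # delete the l-th area
        A_l, beta_l = fit_fh(y[keep], X[keep], D[keep])
        sum1 += g1(A_l, D) - g1(A, D)                        # step 1 summand, every area i
        sum2 += (eb(y, X, D, A_l, beta_l) - theta_eb) ** 2   # step 2 summand
    M1 = g1(A, D) - (m - 1) / m * sum1
    M2 = (m - 1) / m * sum2
    return theta_eb, M1 + M2                                 # mse_J, equation (23)
```

Note that in step 2 each area keeps its own direct estimate; only the parameter estimates $\hat{A}_{-l}$ and $\hat{\boldsymbol{\beta}}_{-l}$ change when area $l$ is deleted.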

F. Per-Capita Expenditure
Average monthly per capita expenditure is the household's total monthly expenditure attributed to each household member, where a household is defined as a group of people who occupy all or part of a physical building, usually live together, and eat from one kitchen (BPS 2003). A household may consist of one, two, or more families, each with its own family head. Per capita expenditure is formulated as
$$y = \frac{E}{n},$$
where $y$ is per capita expenditure, $E$ is the total monthly household expenditure, and $n$ is the number of household members.

A. Exploration of Per-Capita Expenditure Data
The proposed methods are applied to per capita expenditure data for Banyuwangi Regency from Susenas 2015. The data cover 23 sub-districts, and only one sub-district, Siliragung, is not in the sample. Figure 1 shows that per capita expenditure in Banyuwangi Regency follows a normal distribution pattern. The Anderson-Darling normality test in EasyFit v5.5 gives AD = 0.4389, which is smaller than the critical value of 2.5018 at α = 5%. H0 therefore fails to be rejected, meaning that per capita expenditure is consistent with a normal distribution.

B. Exploration of Independent Variable Data
Per capita expenditure is estimated with the help of five independent variables, whose descriptive statistics are presented in Table 2. Table 2 shows that the average population density ($x_1$) in Banyuwangi is 665 people/km², meaning that on average each square kilometre is inhabited by 665 residents. The most densely populated sub-district is Banyuwangi, with 3594 people/km², and the least densely populated is Tegaldlimo, with 46 people/km². The average percentage of poor people ($x_2$) is 16.06%, meaning that on average about 16 of every 100 residents are poor; the largest percentage is in Licin sub-district (30.27%) and the smallest is in Gambiran sub-district (7.38%). For education, the average number of residents attending school ($x_3$) in Banyuwangi is 12194 people per sub-district. For welfare, measured by access to electricity from PLN ($x_4$), each sub-district has on average 18980 PLN customers. The average number of household members ($x_5$) is 3.02, i.e., about 3 members per household.
To determine whether each independent variable has a linear relationship with per capita expenditure in Banyuwangi, a correlation test is carried out with hypotheses H0: ρ = 0 versus H1: ρ ≠ 0 at a significance level of 5% (α = 0.05). The results are presented in Table 3. Table 3 shows that only population density ($x_1$) has a p-value below α = 0.05, which means that population density has a significant linear relationship with per capita expenditure in Banyuwangi.
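The test of H0: ρ = 0 can be sketched with the usual t statistic $t = r\sqrt{(n-2)/(1-r^2)}$ on $n-2$ degrees of freedom. The section does not name the correlation coefficient used; Pearson's is assumed here, and `correlation_test` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def correlation_test(x, y, alpha=0.05):
    """Two-sided test of H0: rho = 0 using the Pearson correlation t statistic."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t_stat = r * np.sqrt((n - 2) / (1 - r**2))
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
    return r, p_value, p_value < alpha          # True means H0 is rejected
```

The p-value from this statistic agrees with `scipy.stats.pearsonr`, so in practice either can be used to fill a table such as Table 3.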

C. SAE Model on Expenditure Per-Capita Using HB Method
The Small Area Estimation method with the HB approach is used to estimate per capita expenditure at the sub-district level in Banyuwangi Regency. The estimation is carried out with the WinBUGS software. To estimate $\theta_i$, we first estimate $\boldsymbol{\beta}$ and $\sigma_v^2$ by MCMC with the Gibbs sampling algorithm.
In this study, the Markov chain converged after a burn-in of 50 out of the 20,000 iterations performed, with a thinning interval of 10. The trace plots indicate that the Markov chain has converged, because the parameter estimates show no upward or downward pattern. The posterior density plot of parameter β is approximately normal, in accordance with its full conditional distribution, and the density plot of $\sigma_v^2$ is smooth. The autocorrelation plot cuts off immediately after lag 0, indicating that the MCMC samples are approximately independent. The estimates of β and $\sigma_v^2$ obtained from the MCMC iterations are shown in Table 4. Table 4 shows that the parameters $\hat{\beta}_0$ and $\hat{\beta}_1$ have a significant effect on per capita expenditure, since their 95% credible intervals do not contain zero.
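The chain diagnostics described above can be reproduced on any stored chain with a few helpers: discarding burn-in and thinning, computing the sample autocorrelation function, and reading off an equal-tailed 95% credible interval. This is a generic sketch, not WinBUGS output; the helper names are hypothetical. The autocorrelation at lag 0 is always 1 by construction, so the cut-off is judged from lag 1 onward.

```python
import numpy as np


def thin_chain(chain, burn_in=50, thin=10):
    """Drop the first `burn_in` draws, then keep every `thin`-th draw."""
    return np.asarray(chain)[burn_in::thin]


def autocorr(x, max_lag=10):
    """Sample autocorrelation function of a 1-D chain for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval; a parameter is 'significant' if it excludes 0."""
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi
```

With 20,000 iterations, a burn-in of 50, and a thinning interval of 10, `thin_chain` keeps 1,995 draws for the posterior summaries.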

D. SAE Model on Expenditure Per-Capita Using EB Method
For the Empirical Bayes model, we first estimate the variance of the random effect, $A$, using the Restricted Maximum Likelihood (REML) method, which gives $\hat{A} = 1.6975$. Next, $\hat{\boldsymbol{\beta}}$ is estimated by Maximum Likelihood Estimation. The Small Area Estimation model using the Empirical Bayes method, based on (7), is as follows:
$$\hat{\theta}_i^{EB} = 7.7352 + 1.0044\,x_{1i} + (1 - \hat{B}_i)\left[\hat{\theta}_i - (7.7352 + 1.0044\,x_{1i})\right], \tag{24}$$
where $\hat{B}_i = D_i / (1.6975 + D_i)$.
$D_i$ is the sampling error variance, which is assumed to be known. It is estimated by $s_i^2 / n_i$, the ratio of the sample variance of per capita expenditure to the number of samples in sub-district $i$.
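Equation (24) and the estimate $D_i = s_i^2/n_i$ can be combined as below. This is a minimal sketch that hard-codes the reported estimates (7.7352, 1.0044, and $\hat{A} = 1.6975$). It assumes the direct estimate for a sub-district is the unweighted sample mean of its sampled per capita expenditures, which ignores Susenas survey weights, and it takes $x_1$ and the response on whatever scale the paper used (the scale is not stated in this section). The function names and example inputs are hypothetical.

```python
import numpy as np

# Estimates reported for equation (24).
BETA0, BETA1, A_HAT = 7.7352, 1.0044, 1.6975


def direct_estimate(sample):
    """Direct estimate (sample mean) and its sampling variance D_i = s_i^2 / n_i."""
    sample = np.asarray(sample, dtype=float)
    return sample.mean(), sample.var(ddof=1) / len(sample)


def eb_equation_24(theta_hat, D, x1):
    """Equation (24): synthetic part plus (1 - B_i) times the direct-estimate residual."""
    synthetic = BETA0 + BETA1 * x1
    B = D / (A_HAT + D)
    return synthetic + (1 - B) * (theta_hat - synthetic)
```

A sub-district with $D_i = 0$ keeps its direct estimate, while one with $D_i = \hat{A}$ is placed halfway between the direct estimate and the regression line.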
After the model is obtained, the next step is to estimate per capita expenditure for the surveyed sub-districts. The following is a general overview of per capita expenditure estimates using the Empirical Bayes method.

E. Comparison of Estimation Results Between HB Method and EB Method of Per-Capita Expenditure
After estimating per capita expenditure with both direct and indirect estimators (HB and EB methods), the next step is to estimate the MSE of the resulting estimates. This study applies the jackknife method to correct the bias of the MSE estimator. Figure 5 shows the MSE values of direct estimation and of indirect estimation (HB and EB methods). Based on Figure 5, the MSE of direct estimates tends to be higher than that of indirect estimates, so the indirect HB (MSE_HB) and EB (MSE_EB) estimates are more precise than the direct estimates (MSE_D). The same pattern appears in the box plots comparing the MSE values of direct and indirect estimates in Figure 6. Based on Figure 6, the MSE of indirect estimation is generally smaller than that of direct estimation. The direct-estimation MSEs include one large outlier, the MSE of the estimated per capita expenditure in Giri sub-district, whereas the indirect-estimation MSEs contain no outliers. This indicates that the indirect estimates are more precise than the direct estimates.
The three estimation approaches (direct, HB, and EB) are evaluated by comparing their RMSE values after applying the jackknife method; a smaller RMSE indicates better accuracy. Figure 7 shows that the RMSE of the indirect estimators is smaller than that of the direct estimator, and that the HB method gives a smaller RMSE than the EB method in all sampled sub-districts.
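The comparisons in Figures 5-7 reduce to a few per-area operations: taking square roots of the MSEs, flagging box-plot outliers, and counting the areas in which one method beats another. The sketch below assumes the common box-plot convention of flagging points beyond 1.5 times the interquartile range (Tukey's rule); the section does not state which rule its plotting software used. The helper names are hypothetical.

```python
import numpy as np


def rmse(mse):
    """RMSE = sqrt(MSE), computed area by area."""
    return np.sqrt(np.asarray(mse, dtype=float))


def boxplot_outliers(values, k=1.5):
    """Indices of points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's box-plot rule)."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.quantile(v, [0.25, 0.75])
    iqr = q3 - q1
    return np.flatnonzero((v < q1 - k * iqr) | (v > q3 + k * iqr))


def share_smaller(rmse_a, rmse_b):
    """Fraction of areas in which method a has a smaller RMSE than method b."""
    return np.mean(np.asarray(rmse_a) < np.asarray(rmse_b))
```

A result of 1.0 from `share_smaller(rmse_hb, rmse_eb)` corresponds to the statement that HB has the smaller RMSE in every sampled sub-district.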
In the 2015 Susenas data, Siliragung sub-district is not sampled, so its per capita expenditure is estimated with the best model, the HB model. Following Rao [1], synthetic estimation can be used to estimate per capita expenditure in a sub-district that is not surveyed, under the assumption that all sub-districts in Banyuwangi follow the same model (the same regression coefficients). Since the expected value of the Small Area Estimation model is $E(\theta_i) = \mathbf{x}_i^T\boldsymbol{\beta}$, per capita expenditure is calculated as
$$\hat{\theta}_i = \mathbf{x}_i^T\hat{\boldsymbol{\beta}}. \tag{25}$$
The estimated per capita expenditure in Siliragung sub-district is Rp 805,675.7.
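For a non-sampled area, the synthetic estimate (25) only needs the area's auxiliary vector and the HB point estimate of $\boldsymbol{\beta}$, which is the posterior mean of the MCMC draws. A minimal sketch with hypothetical inputs; the name `synthetic_estimate` is illustrative, and `x_new` must include a leading 1 if the model has an intercept.

```python
import numpy as np


def synthetic_estimate(x_new, beta_draws):
    """Synthetic estimate (25) for a non-sampled area from posterior draws of beta."""
    beta_hat = np.mean(np.asarray(beta_draws, dtype=float), axis=0)  # posterior mean of beta
    return float(np.asarray(x_new, dtype=float) @ beta_hat)
```

Because the area has no direct estimate, there is nothing to shrink toward, so the estimate is the regression prediction alone.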

IV. CONCLUSION
Small area estimation of per capita expenditure using the HB and EB approaches with jackknife MSE estimation gave more accurate results than direct estimation, so these two indirect estimators can be used in place of the traditional direct estimator. The improvement in RMSE over the direct estimator is substantial, even though the sampling error variances are not homogeneous and the sub-districts are highly diverse. Of the two indirect methods, HB estimation with the jackknife approach produces smaller RMSE values than EB estimation with the jackknife approach in predicting per capita expenditure in each sub-district of Banyuwangi.