Semiparametric Spline Truncated Regression on Modelling AHH in Indonesia

Life expectancy (AHH) is an indicator that reflects the health status of a region in terms of health infrastructure, access, and quality. As one of the dimensions of the Human Development Index (HDI), AHH deserves global attention. Today, AHH growth in developing countries is slow, even slower than in underdeveloped countries. In Indonesia, life expectancy at birth continues to increase, but the national AHH still lags behind that of neighboring countries. In addition, as Indonesia is an archipelagic country, AHH still shows disparities among provinces. This study models the factors that determine AHH with a semiparametric truncated spline regression approach. The optimal knots are selected using the Generalized Cross Validation (GCV) method. The best model uses three knots and has a coefficient of determination of 84.70 percent. The significant variables are the percentage of poor population, the percentage of households using clean drinking water, the percentage of the population who have health complaints, the percentage of under-fives who have been completely immunized, and Mean Years of Schooling (MYS). The results of this study are expected to serve as input for the Government in formulating policy to improve the national AHH as a whole.
Keywords: Life Expectancy, Semiparametric, Spline Truncated, Likelihood Ratio Test (LRT).


I. INTRODUCTION
Regression analysis is a statistical method that is often used to determine the relationship between a response variable and predictor variables. If the pattern of the relationship between the response variable and the predictor variables is partly known and partly unknown, a semiparametric regression approach is advisable [1]. One semiparametric regression approach is the spline. Spline models have a distinctive and appealing statistical and visual interpretation, and they handle well data whose behavior changes across specified sub-intervals [2]. For this reason, spline methods have been developed extensively in the last decade. Budiantara [3] developed a spline estimator in nonparametric regression using a family of spline basis functions. The truncated approach, which uses the truncated spline basis family, makes the mathematical calculations easier and simpler, and its optimization involves no penalty, that is, it is a Least Squares (LS) optimization. Engle [4] introduced semiparametric regression to estimate the relationship between weather and electricity sales using a linear spline approach. Ruliana [5] studied simultaneous hypothesis testing for spline models in nonlinear Structural Equation Modeling (SEM).
This study models the factors that influence AHH in Indonesia using a truncated spline model in semiparametric regression. Population is one of a nation's assets. Indonesia is among the five most populous countries in the world; in 2015 it ranked fourth after China, India, and the United States. As a large nation, Indonesia should be able to carry out development by using the components of the HDI both to measure the level of public health and as a benchmark for the success of development. According to WHO observations, AHH growth in developing countries is slow, even slower than in underdeveloped countries. The Global Burden of Disease (GBD) study, conducted under the supervision of the Institute for Health Metrics and Evaluation (IHME), reported that developing countries currently face serious challenges from lifestyle-related and deadly diseases, namely heart disease, stroke, and diabetes, which can affect all walks of life. Despite the national increase in life expectancy, Indonesia's AHH remains below that of neighboring countries. Drawing on United Nations publications, BPS recorded the AHH of several countries in the world for the period 1990-2015. Over that period, Indonesia's AHH was still below that of Singapore (82.2), Malaysia (74.9), Thailand (74.3), and Cambodia (71.6). Along with the national increase, the provinces of Indonesia have also experienced positive AHH growth. However, there is a disparity in AHH between provinces.

A. Spline Truncated in Semiparametric Regression
Given paired data $(x_i, t_i, y_i)$, $i = 1, 2, \ldots, n$, where $y_i$ is the response variable, $x_i$ is the predictor variable that follows the parametric pattern, and $t_i$ is the predictor variable that follows the nonparametric pattern, the relationship between $x_i$, $t_i$, and $y_i$ can be expressed by the regression model

\[ y_i = x_i'\beta + f(t_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{1} \]

Furthermore, if the regression curve $f(t_i)$ is approximated by a spline regression curve with knots $K_1, K_2, \ldots, K_r$, then

\[ f(t_i) = \sum_{j=1}^{m} \gamma_j t_i^{j} + \sum_{k=1}^{r} \gamma_{m+k} (t_i - K_k)_+^{m} \tag{2} \]

where $\gamma$ denotes the parameters of the nonparametric component and the truncated function $(t_i - K_k)_+^{m}$ is

\[ (t_i - K_k)_+^{m} = \begin{cases} (t_i - K_k)^{m}, & t_i \ge K_k \\ 0, & t_i < K_k \end{cases} \tag{3} \]

The regression curve $f(t_i)$ is a truncated spline nonparametric regression curve of degree $m$ with $r$ knots, where $m$ is the degree of the polynomial. The knots $K_1, K_2, \ldots, K_r$ are the points at which the behavior of the curve changes between sub-intervals, with $K_1 < K_2 < \cdots < K_r$. The truncated spline semiparametric regression in equation (1) therefore becomes

\[ y_i = x_i'\beta + \sum_{j=1}^{m} \gamma_j t_i^{j} + \sum_{k=1}^{r} \gamma_{m+k} (t_i - K_k)_+^{m} + \varepsilon_i \tag{4} \]

The model above consists of a response variable with one or more parametric predictor variables and only one nonparametric predictor variable. If the semiparametric regression has more than one predictor variable in both the parametric and the nonparametric components, with data $(x_{1i}, \ldots, x_{pi}, t_{1i}, \ldots, t_{qi}, y_i)$, then the relationship between $(x_{1i}, \ldots, x_{pi}, t_{1i}, \ldots, t_{qi})$ and $y_i$ can be written as

\[ y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ji} + \sum_{s=1}^{q} \left( \sum_{h=1}^{m} \gamma_{sh} t_{si}^{h} + \sum_{k=1}^{r} \gamma_{s(m+k)} (t_{si} - K_{ks})_+^{m} \right) + \varepsilon_i \tag{5} \]

Equation (5) can also be written in matrix form as

\[ \tilde{y} = [\, X \;\; T(\tilde{K}) \,]\, \tilde{\theta} + \tilde{\varepsilon} \tag{6} \]

where $X$ is the matrix of predictors of the parametric component, of size $n \times (p + 1)$, $T(\tilde{K})$ is the matrix of predictors of the nonparametric component, of size $n \times ((m + r)q)$, which depends on the knots $\tilde{K}$, $\tilde{K}$ is the vector of knots, $\tilde{\theta}$ is the parameter vector of size $((p + 1) + (m + r)q) \times 1$, and $\tilde{\varepsilon}$ is the error vector.
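The truncated function in the model above can be illustrated with a short Python sketch. It builds the nonparametric design columns for a single predictor; the function name `truncated_power_basis`, the sample values, and the choice of degree one are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def truncated_power_basis(t, knots, degree=1):
    """Return the n x (degree + len(knots)) truncated spline basis for one predictor."""
    t = np.asarray(t, dtype=float)
    # Polynomial part: t, t^2, ..., t^degree (no intercept column here).
    poly = np.column_stack([t ** h for h in range(1, degree + 1)])
    # Truncated part: (t - K)_+^degree, which is zero to the left of each knot K.
    trunc = np.column_stack([np.clip(t - k, 0.0, None) ** degree for k in knots])
    return np.hstack([poly, trunc])

# Hypothetical values of one nonparametric predictor and two hypothetical knots.
t = np.array([40.0, 55.0, 70.0, 90.0])
B = truncated_power_basis(t, knots=[50.0, 80.0], degree=1)
```

Each row of `B` holds the linear term followed by one truncated term per knot, so an observation at 90 with knots at 50 and 80 contributes the row (90, 40, 10). Stacking such blocks for every nonparametric predictor gives a matrix with (m + r)q columns, matching the size stated for T.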

B. Parameter Estimation
The parameters under the full parameter space (Ω) are estimated using the Maximum Likelihood Estimation (MLE) method: the likelihood function under Ω, equation (7), is differentiated partially with respect to each parameter to obtain the estimators. The parameters under the null hypothesis (ω) are estimated using the Lagrange Multiplier (LM) method, in which the LM function incorporates the restriction imposed by the null hypothesis and yields the restricted estimator.
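As a sketch of the unrestricted estimation step, assume independent normal errors with common variance $\sigma^2$ and write the stacked model as $y = Z\theta + \varepsilon$. The symbols $Z$ (the parametric and truncated spline columns placed side by side) and $\theta$ are notation introduced here for illustration and may differ from the paper's own equation (7).

```latex
L(\theta, \sigma^2) = (2\pi\sigma^2)^{-n/2}
  \exp\left\{ -\frac{1}{2\sigma^2} (y - Z\theta)'(y - Z\theta) \right\}

\frac{\partial \ln L}{\partial \theta}
  = \frac{1}{\sigma^2} Z'(y - Z\theta) = 0
  \;\Rightarrow\; \hat{\theta} = (Z'Z)^{-1} Z'y

\frac{\partial \ln L}{\partial \sigma^2}
  = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} (y - Z\theta)'(y - Z\theta) = 0
  \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{n} (y - Z\hat{\theta})'(y - Z\hat{\theta})
```

Under these assumptions the MLE of $\theta$ coincides with the least squares estimator, provided $Z'Z$ is invertible.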

C. Formulation of Partial Hypothesis Testing
The partial hypothesis is used to test the significance of each individual parameter in the semiparametric truncated spline regression model. Two parameter spaces are defined for the test: the full parameter space (Ω) and the parameter space under the null hypothesis (ω).

D. Statistics Test and Rejection Area for Partial Hypothesis
The test statistic for the hypothesis in equation (12) is derived using the Likelihood Ratio Test (LRT).

E. Selection of Optimal Knot Point
An important step in semiparametric truncated spline regression is the selection of the optimal knots. A commonly used method for choosing them is Generalized Cross Validation (GCV): the optimal knots are those that give the minimum GCV value.
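A minimal sketch of GCV-based knot selection is given below. It assumes the commonly used form GCV(K) = MSE(K) / (trace(I − A(K)) / n)², where A(K) is the hat matrix of the design built with knots K; the paper's own formula is not reproduced in this text, so that form, the helper names `gcv` and `design`, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def gcv(y, Z):
    """GCV = MSE / (trace(I - A) / n)^2, with A the hat matrix of design Z."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Hat matrix via the pseudo-inverse, which tolerates near-collinear columns.
    A = Z @ np.linalg.pinv(Z)
    resid = y - A @ y
    mse = resid @ resid / n
    return mse / ((n - np.trace(A)) / n) ** 2

def design(t, knot):
    """Intercept, linear term, and one truncated term (t - knot)_+."""
    return np.column_stack([np.ones_like(t), t, np.clip(t - knot, 0.0, None)])

# Synthetic data with a slope change at t = 5 (illustrative only, not the paper's data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 60)
y = 2.0 + 0.5 * t + 1.5 * np.clip(t - 5.0, 0.0, None) + rng.normal(0.0, 0.1, t.size)

# Evaluate GCV on a grid of candidate knots and keep the minimizer.
candidates = np.linspace(1.0, 9.0, 17)
scores = [gcv(y, design(t, k)) for k in candidates]
best_knot = candidates[int(np.argmin(scores))]
```

The same loop extends to two or three knots per variable by iterating over ordered tuples of candidate knots, at the cost of a larger search grid.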

F. Life Expectancy (AHH)
According to the World Bank, life expectancy at birth is the average number of years that a group of people born in the same year is expected to live, assuming that mortality at each age remains constant in the future [6].

A. Source Data and Research Variables
The data used in this research are secondary data from the 2015 National Socio-Economic Survey (Susenas), published by Statistics Indonesia (BPS) in Welfare Statistics 2015, together with education data published in the Human Development Index 2015. The units of observation are all provinces in Indonesia.

B. Variables Used
The response variable in this study is life expectancy (AHH) in Indonesia. The predictor variables are the percentage of poor population, the percentage of households using clean drinking water, the percentage of the population who had health complaints, the percentage of under-fives who have been completely immunized, and Mean Years of Schooling (MYS).

C. Step of Analysis
The stages of the analysis are as follows:
1. Plot the response variable against each predictor variable.
2. Determine which variables belong to the parametric component and which to the nonparametric component.
3. Model the relationship between the response variable and the predictor variables using the semiparametric truncated spline estimator with one knot, two knots, three knots, and combinations of knots.
4. Choose the optimal knots based on the GCV method.
5. Test the significance of the parameters simultaneously.
6. Test the significance of the parameters partially.
7. Check the assumptions that the residuals are independent, identical, and normally distributed.
8. Interpret the semiparametric truncated spline regression model of AHH in Indonesia.

IV. RESULTS AND DISCUSSION
A. Descriptive Analysis
The characteristics of each variable, both response and predictor, are shown in Table 1. In 2015, the highest percentage of poor population in Indonesia was in Papua Province and the lowest, 3.93 percent, was in DKI Jakarta Province. The average percentage of poor population in Indonesia in 2015 was 11.83 percent, with a standard deviation of 6.16.
The lowest percentage of households using clean drinking water in 2015 was in Bengkulu Province, at 41.08 percent, and the highest was in DKI Jakarta Province, at 93.40 percent, so the range is 52.32 percentage points. The average percentage of households using clean drinking water in Indonesia was 68.62 percent, with a standard deviation of 11.04.
The average percentage of the population who had health complaints in 2015 was 28.42 percent, with a standard deviation of 6.11. The range is 22.87 percentage points: the highest value was in DIY Province, at 39.58 percent, and the lowest was in North Maluku Province, at 16.71 percent.
On average, the population aged 15 years and over in 2015 had 8 years of schooling, which means that this population in general was educated up to the junior high school level. The MYS range in Indonesia is 4.71 years, from 5.99 years to 10.70 years. MYS was lowest in Papua Province and highest in DKI Jakarta Province.
Figure 2 contains five scatter plots of the response variable against each predictor variable: plot (a) shows AHH against the percentage of poor population, plot (b) AHH against the percentage of households using clean drinking water, plot (c) AHH against the percentage of the population who had health complaints, plot (d) AHH against the percentage of under-fives who have been completely immunized, and plot (e) AHH against Mean Years of Schooling (MYS). Only plot (a) tends to form a definite pattern, in which the relationship follows a straight line, that is, it is linear. Plots (b), (c), (d), and (e) do not form a specific pattern, and the relationship appears to change behavior across sub-intervals. The resulting assignment of variables to the parametric and nonparametric components is shown in Table 2.

C. Modeling AHH with Semiparametric Spline Truncated
The semiparametric truncated spline model of AHH depends on the knots used. Models with one knot, two knots, three knots, and combinations of knots were tried. The minimum GCV value obtained with one knot is 5.04. The knots are 61.37 ($K_{11}$) for the percentage of households using clean drinking water ($t_1$), 25.58 ($K_{12}$) for the percentage of the population who had health complaints ($t_2$), 54.18 ($K_{13}$) for the percentage of under-fives who have been completely immunized ($t_3$), and 7.82 ($K_{14}$) for MYS ($t_4$). The GCV values obtained with two knots are presented in Table 4, where the minimum GCV value is 4.76. The knots are 47.49 ($K_{11}$) and 52.83 ($K_{21}$) for the percentage of households using clean drinking water ($t_1$), 19.51 ($K_{12}$) and 21.84 ($K_{22}$) for the percentage of the population who had health complaints ($t_2$), 44.73 ($K_{13}$) and 48.36 ($K_{23}$) for the percentage of under-fives who have been completely immunized ($t_3$), and 6.57 ($K_{14}$) and 7.05 ($K_{24}$) for MYS ($t_4$). The GCV values obtained with three knots are presented in Table 5. Table 7 shows that the minimum GCV value, 4.19, is obtained with three knots. The best semiparametric truncated spline regression model is therefore the model with three knots on each of the four nonparametric variables: the percentage of households using clean drinking water, the percentage of the population who had health complaints, the percentage of under-fives who have been completely immunized, and MYS. The resulting semiparametric truncated spline regression model is as follows:

D. Simultaneous Testing of Parameter Significance
A hypothesis test is first carried out to assess the significance of the parameters simultaneously. The results of the partial hypothesis tests are shown in Table 9. The null hypothesis is rejected if the test statistic exceeds its critical value or if the p-value is less than $\alpha$. Table 9 shows that 11 of the 18 parameters are significant. All predictor variables affect the response variable.

E. Interpretation
The truncated spline semiparametric regression model with three knots fulfills the residual assumptions of being independent, identical, and normally distributed (IIDN), so the model can be interpreted further. The variables that significantly influence AHH in the best model are interpreted as follows:

a. The percentage of poor population, with the other variables held constant: $\hat{y} = 68.65 - 0.23 x_1$. If the percentage of poor population increases by one percent, AHH decreases by 0.23 years.

b. The percentage of households using clean drinking water, with the other variables held constant. This variable has four sub-intervals with different behavior. For a percentage of households using clean drinking water between 60.30 and 82.7 percent, each increase of one percent raises AHH by 0.18 years.

c. The percentage of the population who had health complaints, with the other variables held constant. For provinces where this percentage is between 25.11 percent and 34.91 percent, each increase of one percent lowers AHH by 0.27 years. Where the percentage is more than 35.38 percent, each increase of one percent lowers AHH by 2.84 years.

d. The percentage of under-fives who have been completely immunized, with the other variables held constant. This variable has four sub-intervals with different behavior. For a percentage between 53.46 percent and 68.73 percent, each increase of one percent raises AHH by 0.21 years. There are 21 provinces in Indonesia with this behavior. For a percentage above 69.46 percent, each increase of one percent raises AHH by 2.2 years. The areas with this behavior are Central Java, Bangka Belitung, Bali, and DIY.

e. The MYS, with the other variables held constant. This variable has four sub-intervals with different behavior. For MYS below 7.72 years, each one-year increase in MYS raises AHH by 3.63 years. The areas with this pattern are the provinces of Papua, West Nusa Tenggara, East Nusa Tenggara, West Kalimantan, West Sulawesi, West Papua, Central Java, Gorontalo, East Java, Bangka Belitung, Lampung, and South Sulawesi. Policies that aim to improve quality of life, as measured by AHH, through improved education are appropriate in these provinces. Meanwhile, in one province, DKI Jakarta, each one-year increase in MYS raises AHH by 15.34 years.
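The four sub-intervals reported for each nonparametric variable follow from having three knots. Assuming a linear (degree one) truncated spline, the slope in each sub-interval is the linear coefficient plus the coefficients of all knots already passed. The sketch below shows that bookkeeping; the function name `segment_slopes` and the coefficient values are hypothetical and are not the fitted values of the paper.

```python
import numpy as np

def segment_slopes(linear_coef, knot_coefs):
    """Slope of a linear truncated spline on each of the len(knot_coefs) + 1 sub-intervals."""
    # Left of the first knot only the linear term is active; each knot passed
    # adds its own coefficient to the slope.
    return np.concatenate([[linear_coef], linear_coef + np.cumsum(knot_coefs)])

# Hypothetical coefficients, chosen only for illustration (not the fitted values).
slopes = segment_slopes(0.5, [-0.25, 0.75, -2.0])
```

With these made-up coefficients the four slopes are 0.5, 0.25, 1.0, and -1.0, which shows how the effect of a one-unit increase can change sign or size from one sub-interval to the next.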

V. CONCLUSION
The best model is the model with three knots. All five variables used have a significant effect in the model. The coefficient of determination ($R^2$) obtained is 84.70 percent, so the model is suitable for use.