Application of Confidence Intervals for Parameters of Nonparametric Spline Truncated Regression on the Gender Development Index in East Java

The Gender Development Index (GDI) measures the achievement of basic human capability development in the health, education, and economic sectors of a region while taking equality between men and women into account. In this research, GDI and the factors suspected to affect it are modeled using nonparametric spline truncated regression, because the scatterplots of GDI against the predictor variables do not follow a particular pattern. Predictor variables that significantly influence GDI are identified using confidence intervals for the regression parameters. The results show that the high school enrollment rate of the female population, the morbidity of the female population, the percentage of births last attended by medical personnel, and the female labor-force participation rate significantly influence GDI in East Java.
Keywords: Gender Development Index (GDI), Nonparametric Regression, Spline Truncated, Confidence Interval.


I. INTRODUCTION
Regression analysis is a method used to determine the relationship between a response variable and one or more predictor variables. There are three regression approaches: parametric, semiparametric, and nonparametric. The parametric approach requires assumptions such as a known curve shape, normally distributed errors, and homogeneous variance. The nonparametric approach is used when the functional form of the relationship between the response and the predictors is unknown; the function is only assumed to be smooth, in the sense that it belongs to a particular function space. This makes nonparametric regression very flexible in modeling data patterns [1]. The semiparametric approach combines parametric and nonparametric components.
Several nonparametric regression methods have been developed, including kernel, spline, local polynomial, and Fourier series estimators. Splines have several advantages, including a simple and sound statistical interpretation and a good visual representation.
Univariable spline regression is a nonparametric regression analysis with one response variable and one predictor variable. With one response variable and more than one predictor variable, it is called multivariable spline regression [2].
One of the most important parts of statistical inference is the confidence interval. Confidence intervals for the parameters of nonparametric spline truncated regression can be used to determine which predictor variables significantly influence the response variable. The conclusion depends on whether the confidence interval for a parameter contains zero: if it does, the corresponding predictor variable has no significant effect on the response variable.
National development is the development of Indonesian people and society in a fair and equitable manner. Achieving this faces several problems, including the gap in development achievement between women and men, as well as the low quality of life and limited role of women in development. The gender gap in various areas of development is also marked by women's limited opportunities for work and their low access to economic resources such as technology, information, markets, credit, and working capital. The different gender roles that exist in Indonesia are a matter of social injustice that places women as the main victims. Forms of gender inequality and injustice are known as gender disparities, which give rise to gender issues [3].
To improve gender equality, women's basic needs such as health, education, employment, and participation must be considered. These basic needs reflect the quality of human resources. The government has strived to realize gender equality and justice in the life of society and the state through several policies and programs, but in practice many obstacles and challenges remain [4].
Whether development already accommodates gender aspects can be evaluated using related indicators such as the Gender Development Index (GDI). GDI was introduced by the United Nations Development Programme (UNDP) in the 1995 Human Development Report. The GDI figure is expected to provide information on development outcomes that already accommodate gender aspects [6]. Based on UNDP data released in BPS publications, Indonesia's GDI is still low compared with other ASEAN countries (excluding Vietnam and Myanmar): it is the third lowest, after Timor Leste and Cambodia.
In 2014, 20 districts/cities in East Java had a GDI below the provincial GDI and 18 districts/cities had a GDI above it. This condition shows that many districts/cities still need to improve programs that lead to gender mainstreaming. In 2014 there was also a disparity in GDI between provinces in Indonesia. Nationally, East Java ranked 16th out of 34 provinces in GDI achievement. Within Java, East Java had the second lowest position, after West Java. From 2010 to 2014, the GDI of East Java increased.
[5] discussed gender differences in East Java and concluded that the factors affecting the gender gap are the junior secondary enrollment rate of the female population, the percentage of the female population with junior secondary education, and the percentage of the female population employed in the formal sector. [6] studied the components of the Gender Development Index in East Kalimantan and South Kalimantan provinces in 2011. That study found that the factors affecting the GDI components by sex in the two provinces were population density, the ratio of health facilities, the percentage of the population educated above junior high school, and the unemployment rate, while the factor affecting the female GDI component was the percentage of the population educated above junior high school.
Based on this description, GDI and the factors suspected to affect it are modeled using nonparametric spline truncated regression, because the scatterplots of GDI against the predictor variables, namely the high school enrollment rate of the female population, the morbidity of the female population, the percentage of births last attended by medical personnel, and the female labor-force participation rate, do not follow a particular pattern. The variables that significantly influence GDI are determined using confidence intervals.

A. Nonparametric Spline Truncated Regression
A spline is a piecewise polynomial, that is, a polynomial with segmented yet continuous properties. Splines are highly flexible and can estimate data whose behavior tends to differ across intervals [1]. In the spline truncated model, $K_1, K_2, \ldots, K_r$ are the knot points, which mark changes in the behavior of the function across sub-intervals.
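To make the piecewise structure concrete, the sketch below builds the design matrix of a truncated power basis for a single predictor. It assumes the commonly used form $f(x) = \sum_{j=0}^{p} \beta_j x^j + \sum_{k=1}^{r} \beta_{p+k}(x - K_k)_+^p$ with $(u)_+ = \max(u, 0)$; the function name `truncated_basis` and the example knots are hypothetical and are not taken from the paper.

```python
import numpy as np

def truncated_basis(x, knots, degree=1):
    """Design matrix [1, x, ..., x^p, (x-K_1)_+^p, ..., (x-K_r)_+^p]."""
    x = np.asarray(x, dtype=float)
    # Polynomial part: columns x^0, x^1, ..., x^degree
    poly = np.vander(x, degree + 1, increasing=True)
    # Truncated part: (x - K)_+^degree is zero to the left of each knot
    trunc = np.column_stack(
        [np.maximum(x - k, 0.0) ** degree for k in knots]
    )
    return np.hstack([poly, trunc])

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = truncated_basis(x, knots=[1.5, 3.0], degree=1)
print(X.shape)  # (5, 4): intercept, x, and one column per knot
```

Each truncated column only becomes active to the right of its knot, which is what lets the fitted slope change from one sub-interval to the next.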

B. Selection of Optimal Knot Points
A knot point is a joining point at which the behavior of the function changes between intervals [7]. One method for selecting the optimal knot points is Generalized Cross Validation (GCV) [8]. The best spline model is the one with the smallest GCV value. In the GCV criterion, $K = (K_1, K_2, \ldots, K_r)$ denotes the knot points and the matrix $A(K_1, K_2, \ldots, K_r) = X(X'X)^{-1}X'$ [9].
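The GCV criterion can be sketched numerically as follows. The code assumes the commonly used form $\mathrm{GCV}(K) = \mathrm{MSE}(K) / \left(n^{-1}\,\mathrm{trace}[I - A(K)]\right)^2$ with $\mathrm{MSE}(K) = n^{-1}\sum_i (y_i - \hat{y}_i)^2$; the data are synthetic, and the helper names `gcv` and `design` are hypothetical.

```python
import numpy as np

def gcv(X, y):
    """GCV = MSE / (trace(I - A) / n)^2 with hat matrix A = X (X'X)^-1 X'."""
    n = len(y)
    # pinv keeps the sketch usable even if X'X is ill-conditioned
    A = X @ np.linalg.pinv(X.T @ X) @ X.T
    resid = y - A @ y
    mse = np.mean(resid ** 2)
    return mse / (np.trace(np.eye(n) - A) / n) ** 2

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 40)
# Synthetic data whose slope changes at x = 5 (not the East Java data)
y = 2.0 + 0.5 * x + 1.5 * np.maximum(x - 5.0, 0.0) + rng.normal(0.0, 0.3, x.size)

def design(knot):
    """Linear truncated spline design matrix with a single knot."""
    return np.column_stack([np.ones_like(x), x, np.maximum(x - knot, 0.0)])

gcv_true_knot = gcv(design(5.0), y)
gcv_wrong_knot = gcv(design(1.0), y)
print(gcv_true_knot < gcv_wrong_knot)
```

Because both candidate models have the same number of parameters, the comparison here reduces to comparing their mean squared errors; the trace term matters when models with different numbers of knots are compared.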

C. Estimation of Parameters of Nonparametric Spline Truncated Regression
This section describes the estimation of the parameters of nonparametric spline truncated regression. The nonparametric spline truncated regression model with $r$ knots, equation (5), can be expressed in matrix form. Based on equations (7) and (8), $y$ is normally distributed with mean $X\beta$ and variance $\sigma^2 I$. One method for obtaining a point estimate of $\beta$ is Maximum Likelihood Estimation (MLE). The probability distribution of $y$, equation (9), yields the likelihood function, equation (10). Taking the logarithm of equation (10) and setting its partial derivative with respect to $\beta$ equal to zero gives the estimator $\hat{\beta} = (X'X)^{-1}X'y$.
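The steps of this derivation can be written out as follows. This is a sketch of the standard maximum likelihood argument, assuming the matrix form $y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I)$ and $n$ observations; the notation is an assumption, since the paper's numbered equations are not reproduced here.

```latex
\begin{align*}
L(\beta, \sigma^2) &= (2\pi\sigma^2)^{-n/2}
  \exp\!\left\{-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right\} \\
\ln L(\beta, \sigma^2) &= -\frac{n}{2}\ln(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta) \\
\frac{\partial \ln L}{\partial \beta} &= \frac{1}{\sigma^2}\left(X'y - X'X\beta\right) = 0
  \quad\Longrightarrow\quad \hat{\beta} = (X'X)^{-1}X'y
\end{align*}
```

Under normal errors the maximum likelihood estimator of $\beta$ therefore coincides with the least squares estimator.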

D. Confidence Intervals for Parameters of Nonparametric Spline Truncated Regression
Confidence intervals for the parameters of nonparametric spline truncated regression take two forms, depending on whether $\sigma^2$ is known or unknown. When $\sigma^2$ is known, the interval is based on the standard normal distribution. When $\sigma^2$ is unknown, it is estimated from the residuals and the interval is based on the Student $t$ distribution.
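For the unknown-variance case, the interval can be sketched in code. The sketch assumes the usual normal linear model result $\hat{\beta}_j \pm t_{\alpha/2,\,n-m}\sqrt{\mathrm{MSE}\,c_{jj}}$, where $c_{jj}$ is the $j$-th diagonal element of $(X'X)^{-1}$, $m$ is the number of parameters, and MSE is the residual sum of squares divided by $n - m$. The paper's own equation is not reproduced, so this notation, the function name `coef_confidence_intervals`, and the synthetic data are all assumptions. The critical value is passed in by the caller and could be taken from a $t$ table or from `scipy.stats.t.ppf`.

```python
import numpy as np

def coef_confidence_intervals(X, y, crit):
    """Intervals beta_hat_j +/- crit * sqrt(MSE * c_jj), sigma^2 unknown.

    crit is the upper alpha/2 quantile of the t distribution with
    n - m degrees of freedom, supplied by the caller.
    """
    n, m = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    mse = resid @ resid / (n - m)          # unbiased estimate of sigma^2
    se = np.sqrt(mse * np.diag(XtX_inv))   # standard error of each coefficient
    lower, upper = beta_hat - crit * se, beta_hat + crit * se
    # A parameter is judged significant when its interval excludes zero
    significant = (lower > 0) | (upper < 0)
    return beta_hat, lower, upper, significant

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
X = np.column_stack([np.ones_like(x), x, np.maximum(x - 5.0, 0.0)])
y = 1.0 + 2.0 * x - 3.0 * np.maximum(x - 5.0, 0.0) + rng.normal(0.0, 0.5, x.size)

# t quantile for alpha = 0.05 and 30 - 3 = 27 degrees of freedom (t-table value)
beta_hat, lower, upper, significant = coef_confidence_intervals(X, y, crit=2.052)
for b, lo, hi, s in zip(beta_hat, lower, upper, significant):
    print(f"{b:7.3f}  [{lo:7.3f}, {hi:7.3f}]  significant={s}")
```

The `significant` flag applies the decision rule used in this paper: a parameter whose interval contains zero is treated as not significant.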

E. Checking the Assumption of Residuals
The residual assumptions checked in this research are independence, identical variance, and normality.

F. The Assumption of Independence Residuals
The independence check is used to detect correlation between residuals. Residuals are independent when the covariance between $\varepsilon_i$ and $\varepsilon_j$ ($i \neq j$) is zero. The assumption can be checked from the autocorrelation function (ACF) plot: if every autocorrelation lies within the significance bounds, approximately $\pm 2/\sqrt{n}$ where $n$ is the number of observations, there is no autocorrelation [9].
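A minimal sketch of this check computes the sample autocorrelations of the residuals and compares them with an approximate 95% bound of $\pm 2/\sqrt{n}$. The bound is the common large-sample approximation and is an assumption here, as are the function name `residual_acf` and the synthetic residuals.

```python
import numpy as np

def residual_acf(resid, max_lag=10):
    """Sample autocorrelations r_1..r_max_lag and the approximate 95% bound."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    n = e.size
    denom = e @ e
    acf = np.array([(e[k:] @ e[:-k]) / denom for k in range(1, max_lag + 1)])
    bound = 2.0 / np.sqrt(n)   # approximate 95% significance limit
    return acf, bound

rng = np.random.default_rng(2)
resid = rng.normal(0.0, 1.0, 38)   # synthetic residuals, one per district/city
acf, bound = residual_acf(resid)
outside = np.flatnonzero(np.abs(acf) > bound) + 1
print("bound:", round(bound, 3), "lags outside the bounds:", outside.tolist())
```

An empty list of lags outside the bounds corresponds to the conclusion that the residuals are not autocorrelated.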

G. The Assumption of Identical Residuals
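The Glejser test is one of the methods that can be used to detect heterogeneity of the residual variance [10]. The hypotheses are $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 = \sigma^2$ (residuals identical) and $H_1$: at least one $\sigma_i^2 \neq \sigma^2$, $i = 1, 2, \ldots, n$ (residuals not identical).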
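The idea behind the Glejser test can be sketched as a regression of the absolute residuals on the predictors, followed by an overall F test. The sketch below returns only the F statistic and its degrees of freedom; a p-value could be obtained from the F distribution, for example with `scipy.stats.f.sf`. The function name `glejser_f_statistic` and the synthetic residuals are assumptions, and the paper's exact test statistic is not reproduced.

```python
import numpy as np

def glejser_f_statistic(X, resid):
    """F statistic from regressing |residuals| on the columns of X.

    X must include an intercept column. Under H0 (identical variance) the
    statistic follows an F distribution with (m - 1, n - m) degrees of freedom.
    """
    n, m = X.shape
    abs_e = np.abs(resid)
    coef, *_ = np.linalg.lstsq(X, abs_e, rcond=None)
    fitted = X @ coef
    ss_reg = np.sum((fitted - abs_e.mean()) ** 2)   # explained sum of squares
    ss_err = np.sum((abs_e - fitted) ** 2)          # residual sum of squares
    f_stat = (ss_reg / (m - 1)) / (ss_err / (n - m))
    return f_stat, (m - 1, n - m)

rng = np.random.default_rng(3)
x = np.linspace(1.0, 10.0, 60)
X = np.column_stack([np.ones_like(x), x])
resid_const = rng.normal(0.0, 1.0, x.size)        # constant variance
resid_grow = rng.normal(0.0, 1.0, x.size) * x     # spread grows with x

f_const, df = glejser_f_statistic(X, resid_const)
f_grow, _ = glejser_f_statistic(X, resid_grow)
print(df, round(f_const, 2), round(f_grow, 2))
```

A large F statistic relative to the critical value suggests that the spread of the residuals depends on the predictors, which would lead to rejecting $H_0$.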

H. The Assumption of Normality Residuals
The normality check is used to determine whether the residuals are normally distributed.
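One way to quantify departure from normality is the Kolmogorov-Smirnov distance between the empirical distribution of the residuals and a normal distribution. The sketch below computes only that distance. It assumes the mean and standard deviation are estimated from the residuals (strictly a Lilliefors-type setting), does not compute a p-value, and uses synthetic residuals; the function name `ks_normal_statistic` is hypothetical.

```python
import math
import numpy as np

def ks_normal_statistic(resid):
    """Kolmogorov-Smirnov distance between residuals and a fitted normal CDF."""
    e = np.sort(np.asarray(resid, dtype=float))
    n = e.size
    z = (e - e.mean()) / e.std(ddof=1)
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    # Largest gap between the empirical step function and the normal CDF
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus)

rng = np.random.default_rng(4)
resid = rng.normal(0.0, 2.0, 38)   # synthetic residuals, not the paper's
d = ks_normal_statistic(resid)
print(round(d, 4))
```

A small distance is consistent with normality; the formal decision compares the statistic, or its p-value, with the chosen significance level.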

I. Gender Development Index
The Gender Development Index (GDI) is an indicator that measures the success of development achievements that already accommodate gender issues. GDI is a direct measure of gender inequality in Human Development Index (HDI) achievement: it is the ratio of female HDI to male HDI.

$$\mathrm{GDI} = \frac{\mathrm{HDI}_P}{\mathrm{HDI}_L} \qquad (16)$$
where $\mathrm{HDI}_P$ is the female HDI and $\mathrm{HDI}_L$ is the male HDI. The closer GDI is to 100, the more balanced gender development is; the further it is from 100, the more unequal development is between the sexes. The dimensions used to measure quality of life in GDI are a long and healthy life, knowledge, and a decent standard of living [3]. In this study, the long and healthy life dimension is measured by the morbidity of the female population [4] and the percentage of births last attended by medical personnel, the knowledge dimension by the senior high school enrollment rate of the female population [11], and the decent standard of living dimension by the female labor-force participation rate.

A. Data Source
The data used in this research are secondary data obtained from publications of Badan Pusat Statistik (BPS) of East Java Province. The units of observation are the 29 regencies and nine cities of East Java Province in 2014.

B. Research Variables
The response variable is GDI ($y$). The predictor variables are the high school enrollment rate of the female population ($x_1$), the morbidity of the female population ($x_2$), the percentage of births last attended by medical personnel ($x_3$), and the labor-force participation rate of the female population ($x_4$).

C. Steps of Analysis
The steps of analysis are as follows: (1) create scatterplots of the response variable against each predictor variable; (2) model the data using nonparametric spline truncated regression with one, two, three, and a combination of knot points; (3) determine the best model using the GCV method; (4) calculate the coefficient of determination $R^2$; (5) check the residual assumptions; (6) determine the confidence intervals for the parameters of the nonparametric spline truncated regression; and (7) draw conclusions about which variables have a significant influence, based on the confidence intervals.

A. Modeling Using Nonparametric Spline Truncated Regression
We first discuss the characteristics of each variable. In 2014, 20 districts/cities in East Java had a GDI below the provincial GDI and 18 districts/cities had a GDI above it. The spread of GDI across East Java Province is shown in Figure 1, where the 20 regions with a GDI below the provincial GDI are shown in red and the regions with a GDI above the provincial GDI are shown in yellow. The scatterplots of GDI against each variable expected to influence it are shown in Figure 2.
Figure 2. Scatterplots of GDI against the predictor variables
Figure 2 shows that the scatterplots of GDI against (a) the high school enrollment rate of the female population, (b) the morbidity of the female population, (c) the percentage of births last attended by medical personnel, and (d) the female labor-force participation rate do not follow a particular pattern. The first step in modeling with nonparametric spline truncated regression is to select the optimal knot points using one knot point, two knot points, three knot points, and a combination of knot points.

1) Selection of optimal knot point with one knot point
This section discusses the selection of the optimal knot point for GDI and the four predictor variables suspected to affect it. A nonparametric spline truncated regression model with one knot point for each of the four predictor variables is fitted, and the resulting GCV values are shown in Table 2. Based on Table 2, the minimum GCV value with one knot point is 8.16.
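The one-knot search can be sketched as a grid search that evaluates the GCV for every combination of candidate knots and keeps the smallest value. The sketch uses two synthetic predictors rather than the four used in this study, an additive linear truncated spline, and an evenly spaced candidate grid; these choices and the helper names `design_matrix` and `gcv` are assumptions, not the paper's procedure.

```python
import itertools
import numpy as np

def design_matrix(X_pred, knots):
    """Additive linear truncated spline with one knot per predictor."""
    cols = [np.ones(X_pred.shape[0])]
    for j, k in enumerate(knots):
        cols.append(X_pred[:, j])                        # linear term
        cols.append(np.maximum(X_pred[:, j] - k, 0.0))   # truncated term
    return np.column_stack(cols)

def gcv(X, y):
    """GCV = MSE / (trace(I - A) / n)^2 with hat matrix A."""
    n = len(y)
    A = X @ np.linalg.pinv(X.T @ X) @ X.T
    mse = np.mean((y - A @ y) ** 2)
    return mse / (np.trace(np.eye(n) - A) / n) ** 2

rng = np.random.default_rng(5)
n = 38
X_pred = rng.uniform(0.0, 10.0, size=(n, 2))   # two synthetic predictors
y = (1.0 + 0.8 * X_pred[:, 0] - 1.2 * np.maximum(X_pred[:, 0] - 4.0, 0.0)
     + 0.5 * X_pred[:, 1] + rng.normal(0.0, 0.3, n))

# Candidate knots: interior points spread over each predictor's range
grids = [np.linspace(X_pred[:, j].min(), X_pred[:, j].max(), 12)[1:-1]
         for j in range(X_pred.shape[1])]

results = [(gcv(design_matrix(X_pred, knots), y), knots)
           for knots in itertools.product(*grids)]
best_gcv, best_knots = min(results, key=lambda item: item[0])
print("minimum GCV:", round(best_gcv, 4), "at knots", np.round(best_knots, 2))
```

The same loop extends to two or three knots per predictor by adding more truncated columns, at the cost of a much larger search grid.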

2) Selection of optimal knot point with two knot points
After obtaining the minimum GCV with one knot point, the optimal knots are selected using two knot points. A nonparametric spline truncated regression model with two knot points for each of the four predictor variables is fitted, and the resulting GCV values are shown in Table 3. Based on Table 3, the minimum GCV value with two knot points is 8.49.

3) Selection of optimal knot point with three knot points
After obtaining the minimum GCV with two knot points, the optimal knots are selected using three knot points. The nonparametric spline truncated regression model with three knot points using four predictor variables is as follows: $\hat{y} = \hat{\beta}_{01} + \hat{\beta}_{11} x_1 + \hat{\beta}_{12}(x_1 - \ldots$
4) Selection of optimal knot point with combination knot point
Based on Table 5, the minimum GCV value with a combination of knot points is 6.13.

5) Modeling with the Optimal Knot Points
After obtaining the minimum GCV values for one, two, three, and a combination of knot points, the best model is selected by comparing these minimum GCV values and choosing the smallest. Table 6 shows the minimum GCV value for each number of knot points. The nonparametric spline truncated regression model with the combination of 3, 2, 2, 1 knot points, equation (21), has an $R^2$ of 87.48%, meaning that the model explains 87.48% of the variation in GDI.

B. Confidence Interval for Parameters of Nonparametric Spline Truncated Regression
Once the nonparametric spline truncated regression model is obtained, a 95% confidence interval is constructed for each parameter using the formula in equation (13). If a confidence interval contains zero, the parameter does not significantly affect the model. Based on Table 6, 8 of the 13 parameters significantly influence the model. Overall, however, the four predictor variables, namely the high school enrollment rate of the female population, the morbidity of the female population, the percentage of births last attended by medical personnel, and the female labor-force participation rate, significantly influence the response variable (GDI).

C. Checking the Assumption of Residuals
1) The Assumption of Independence Residuals
The ACF plot of the residuals is shown in Figure 3. Figure 3 shows that the residual autocorrelations all lie within the significance bounds; in other words, no lag falls outside the bounds. It can therefore be concluded that there is no correlation between residuals.

3) The Assumption of Normality Residual
The hypotheses are:
H0: the residuals are normally distributed
H1: the residuals are not normally distributed
The Kolmogorov-Smirnov normality test gives a p-value of 0.27 > α = 0.05, so H0 is not rejected. It can therefore be concluded that the residuals are normally distributed.

V. CONCLUSION
The best nonparametric regression model is the spline truncated regression model with the combination of 3, 2, 2, 1 knot points.

Figure 1. GDI in East Java

TABLE 1. DESCRIPTIVE STATISTICS OF GDI AND THE PREDICTOR VARIABLES
From Table 1, the average GDI ($y$) in East Java Province in 2014 was 90.06. The highest GDI, 98.23, was in Blitar, while Sumenep had the lowest, 76.63.

TABLE 2. GCV VALUES WITH ONE KNOT POINT

TABLE 4. GCV VALUES WITH THREE KNOT POINTS

TABLE 6. COMPARISON OF GCV VALUES

TABLE 6
The best model has an $R^2$ value of 87.48%. Based on the confidence intervals, the high school enrollment rate of the female population, the morbidity of the female population, the percentage of births last attended by medical personnel, and the female labor-force participation rate significantly influence GDI in East Java.