Nested Generalized Linear Model with Ordinal Response for Correlated Data

In this paper, we discuss the generalized linear models with ordinal response for correlated data in nested area. Some basic concepts are described, that is nested spatial, threshold model, and cumulative link function. Due to correlated data used for this modeling, Generalized Estimating Eequation (GEE) is used as model parameters estimation method. Nested is shown by the model building and its application on nested spatially data. In this method, some Working Correlation Matrices (WCM) are able to be specified depend on the nature and type of the data. In this study, 3 types of WCM and 2 types of parameters estimation covariance are used to see the results of parameters estimation from these combinations. As a conclusion, independent WCM is appropriate to the data. Keywords—nested generalized linear model, ordinal response, working correlation matrix, correlated Abstrak—Makalah ini membahas generalized linear models dengan respon ordinal untuk data berkorelasi pada area tersarang. Beberapa konsep dasar dibahas, yaitu sedikit pendahuluan mengenai spatial tersarang, model threshold, dan fungsi penghubung kumulatif. Karena ada indikasi data berkorelasi, Generalized Estimating Equation (GEE) digunakan untuk pendugaan parameter model. Pembentukan model disesuaikan dengan kondisi tersarang dan diaplikasikan pada data spasial tersarang. Pada metode pendugaan parameter GEE, beberapa Working Correlation Matrices (WCM) dapat ditentukan tergantung dari kondisi data. Tiga struktur WCM dan 2 jenis pendugaan digunakan untuk melihat pengaruhnya pada hasil pendugaan parameter. Hasil perhitungan memberikan kesimpulan bahwa WCM independent paling sesuai untuk data yang digunakan. Kata Kunci—nested generalized linear model, respon ordinal, working correlation matrix, berkorelasi


I. INTRODUCTION 2
s the starting consideration of the nested Generalized Linear Mixed Models (nested GLMM) for ordinal response, this paper works through about nested Generalized Linear Models (nested GLMs) for ordinal response, sub topics about parameter estimation method, and implementation of the model to the data.
Related to the evaluation of regions on poverty alleviated program, comparison among regions is needed.In this work, the score in ordinal scale is prefer than numeric to simplify the interpretation [1].This study uses ordinal response for modelling, and the unit of observation is sub district.Connected to this region and certain multilevel spatial survey, assumed the regions (e.g., districts or 'kabupaten') of one area (e.g., province) are similar but not identical for another area.Such an arrangement is called a nested, with levels of district nested under the levels of province.For example, consider the government has a goal to reduce poverty and modeling is used to know the factors that contribute to determine poverty level.The question is: do these factors have the same effects on poverty level in all provinces?This question will be answered through the nested modeling.
There are some districts available from each province.The situation is depicted by Figure 1, which in this problem, a district from particular province has different nature from districts of another province [2].Every Yekti Widyaningsih is with Department of Mathematics, FMIPA, Universitas Indonesia, Depok, 16424, Indonesia.E-mail:yekti@sci.ui.ac.id.
province has a particular nature and policy especially for a specific province such as 'Daerah Istimewa'.This situation has an effect on the correlation matrix and parameter estimation.Based on this effect, the nature of the spatial component should be considered especially when a modelling is needed to analyze the effects of districts and province Spatial data can be viewed as realizations of a spatial stochastic process {Z(s): sD} where s is the location from which the data is observed and D is a random set in d dimensional Euclidean space [3].Lattice data is defined as follows.Denote that Z(s 1 ),…• ,Z(s n ) are lattice data observed at n sites.D is a fixed subset of R d and it is partitioned into a finite number of lattices (or areal units), while site index s varies continuously over D [4].
Generalized estimating equations as parameters estimation method, were developed to extend generalized linear models to accommodate correlated longitudinal and/or clustered data [5].In statistics, a Generalized Estimating Equation (GEE) is used to fit the parameters of a generalized linear model where unknown correlation between observations in a cluster is present.This method is usually used for the models of the clustering or longitudinal data.GEE was introduced by [6] as a method of regression model parameters estimation when dealing with correlated or clustered data.To define a regression model using the GEE methodology, one needs to define the following principles: the distribution of dependent variable (which must be a member of the exponential family), the monotonic link function, the independent variables, and the correlation or covariance structure of the repeated (clustered) measurements.
Analog to the concept of four calcium content measurements on a leaf [7], unit of observation in this research is sub-district.Analogy of sub district is the point of calcium contents measurement, analogy of A district is the leaf, and analogy of province is the tree where the leaf come from.
Sub district is presented as repeated measurement in a district, and district as level 1 is nested in province as level 2. Ordinal measurement is made on each subdistrict within a district [1]; or repeated ordinal measurements are made on each district at different subdistricts.For instance, in a poverty study, a district may be represented by several sub districts at a given value of covariates, furthermore, these districts can be classified as "worst", "moderate", or "mild" in poverty [8].
Clusters are an example to represent the correlated observations: assumed that there is a correlation between observations in a cluster, while there is no correlation between observations from different cluster.This structure can be used as an example of longitudinal data or panel data as well as the data of family studies, or data with spatial structure [9].
To capture some of the beneficial aspects of quasilikelihood estimation in the context of models for correlated data, [6] and [7] established GEE method.Beside robust in misspecification of covariance matrix, estimation using GEE is often easier to quantify than the maximum likelihood estimation [10].
In the parlance of the GEE approach, V i is known as a "working" covariance matrix to distinguish it from the true underlying covariance among the Y i .That is, the term "working" acknowledges our uncertainty about the assumed model for the variances and within-subject associations; unless they have been correctly modeled, our model for the covariance matrix may not be correct.The GEE approach allows the modeler to specify an incorrect structure [5].The objective of this paper is to build generalized linear model of ordinal poverty response in nested area using GEE as the method to estimate the model parameters.

A. Study Area
The study area (D) consist of 3 provinces with 3 districts in every province, and n si sub districts in district i, i = 1, 2, 3 of province s, s=1,2,3.The districts are chosen random independently in every province, with a uniform distribution.In other words, simple random sampling is used to select 3 districts from every province (without Banten, DKI, and DIY). Figure 2 shows study area in this research.

B. Data
Based on the report of BAPPENAS, there is a relation between the level of severity (poverty) of a region with a number of farmer families, the number of education centers (schools), the number of medical centers and/or health personnel, as well as the number of cases of malnutrition or bad nutrition [11].To find out how these variables affects poverty level, through the modeling.
Nested GLM in this study are applied to the data on poverty with response variable is the poverty (severity) level of sub district, which has levels (worst, moderate, mild) [1].The explanatory variables in this modeling are the number of schools, farmer families, health personnel, and malnutrition hotspots status.All explanatory variables are divided into 3 levels (low, moderate, and high), except the hotspot has 2 levels (hotspot = 1, non hotspot = 0).The model is shown by Equation 7.

A. Threshold Model
Threshold is a latent variable at the model that made the difference between linear models with ordinal response and linear models with non-ordinal responses.Threshold model is explained as follows.In logistic and probit regression models, there are assumptions about an unobserved latent variable (y) associated with the actual responses through the concept of threshold [12].For dichotomy model, it is assumed there is a threshold value, while for ordinal model with K categories (polytomy), it is assumed there are K-1 threshold values, namely γ 1 , γ 2 , … , γ K-1 .Response occurs in category k (Z = k), if the latent response y is greater than  k-1 and smaller than  k .Assumed Y j is unobserved, and the j-th observation is in a category, say category Z j , j = 1, …, N. The relationship between Y j and Z j is taken to be, where k  {1, , K},  0 = -,  K = + and γ 1 , γ 2 , … , γ K-1 are unknown boundary points that define a partitioning of the real line into K intervals.Thus, when the realized value of Y j belongs to the k-th interval, we observe that z j = k.Under that assumptions, the probability-mass function of Z 1 , , This model is called the threshold model [13].
These models can also be interpreted in terms of a latent variable.Specifically, suppose that the manifest response Z j results from grouping an underlying continuous variable Y i using cut-points  1 <  2 <  <  K- 1 , so that Z j takes the value 1 if Y j is below  1 , the value 2 if Y j is between  1 and  2 , and so on, taking the value K if Y j is above  K-1 .Figure 3 illustrates this idea for the case of five response categories.

B. Cumulative Link Models
All of the models to be considered in this research arise from focusing on the cumulative distribution of the response.Let  jk = Pr{Z j = k} denote the probability that the response of an individual with characteristics x j falls in the k-th category, and let p jk denote the corresponding cumulative probability, that the response falls in the k-th category or below, so, ) denote a link function mapping probabilities to the real line.Then the class of models that will be considered, assumes that the transformed cumulative probabilities are a linear function of the predictors, of the form, In this formula,  k is a constant representing the baseline value of the transformed cumulative probability for category k, xj is a row vector of covariates of j-th observation and β is a column vector, represents the effect of the covariates on the transformed cumulative probabilities.Since it is written as the constant explicitly, we assume that the predictors do not include a column of ones.
Suppose further the underlying continuous variable follows a linear model of the form where the error term  j has c.d.f.F( j ).Then, the probability response of the j-th individual will fall in the k-th category or below, given x j , satisfies the equation and therefore follows the general form in Equation ( 4) with link given by the inverse of the c.d.f. of the error term It is assumed that the predictors x j do not include a column of ones, because the constant is absorbed in the cut-points [15].

C. Model Building
This part is about Nested Generalized Linear Models for ordinal response.Model building in this study is based on spatial concept: the closer the observation, the larger the correlation [3].Based on this concept, the idea was expanded to the nested of location or area.Furthermore, as the data in the observation is not always continue nor has normal distribution, the model should be in the general form is the expected value of Y s(i) , response of observation in province s and district i; X s(i) β S is the linear predictor, a linear combination of unknown parameters  s ; is a link function.
Modelling in this research uses ordinal scale as response variable and some categorical (ordinal) variables as covariates.Index j is for sub district (repeated observations in district i), i is for district (nested in province s), and s is an index for province.District as level-1 is nested in province as the level-2.Link function for multinomial ordinal response is the cumulative logit model [16][17].

 
) Where  k is threshold.'s are the fixed effect at the transformed cumulative probabilities, y s(i(j)) = the response variable of jth sub subject (sub district), in ith subject (district), in sth center (province), that could be continuous, binary, count, or category., x s(i(j)) is the value of covariates x of sub district j, in district i and province s.Let S = number of centers, I = number of subject n si = the number of repeated measurements (observations or sub-subjects) of the response on the subject.The response on the ith district of province sth can be grouped into a n si x 1 vector.
are assumed to be independent of one another.
)) ( where X size is, . is a spatially unstructured random effect assumed identically independently normally distributed of sub district j in district i and province s.This assumption is based on a theorem, that standardized of Pearson residual Moran's I convergence in distribution to N(0,1) [18].
In this research, the model is developed using ordinal response variable with multinomial distribution for spatially nested area.As the data is nested and correlated, GEE is used as parameters estimation method for GLM.The models are implemented for poverty data in Java Island.Study area comprises of 3 provinces, 9 districts.

D. GEE for Ordinal Response Data
Generalized linear models were first introduced by Nelder and Wedderburn [17] and later expanded by McCullagh and Nelder [16].The following discussion is based on their works and an extension of GEE from Liang & Zeger [6] for ordinal categorical responses data.
Suppose we have a multinomial response, say z.And for this response, there are K ordered categories with corresponding probabilities π 1 , π 2 , …, π K , that is Pr(z = k) = π k .The proportional odds model is based on the cumulative probabilities,  k = π 1 + π 2 + …+ π k , for k = 1 to K-1.Logit link function is used to relate  k to a linear function of p covariates X.Now let's take a look at the repeated situation.Suppose we have a sample of I subjects.Let z ij be the ordinal response (with K levels) for the ith subject (i =1 to I) at point j (j = 1 to n i ).Form of a (K-1)×1 vector yij = (y ij1 , y ij2 , …,y ijK-1 )`where Y ijk = 1 if z ij =k, and 0 otherwise.
The objective of this part is to model the π ijk as a function of x ij and the regression parameters θ = γ 1 , γ 2 , … , γ K-1 ,β)' where γ k are intercept or cut-point parameters (threshold) and β is a p×1 vector of regression parameters.Let φ ijk = π ij1 + π ij2 + …+ π ijk denote the cumulative probabilities.Then the proportional odds model at sub subject j is: log it(φ ijk ) = γ k + x ij β.To establish notation, let y i = (y i1, … ,y ini )', and π i = ( π i1, … , π ini )'.Then θ can be estimated through solving the estimating equation as follows where, )), ( ( and for all j = 1, … n si , 1 ,..., ), ( ( Generalized inverse is used if V s(i) is singular or almost singular due to redundant data.Here with little mathematical operations, from equation ( 10), the working correlation matrix, R s For simplifying, index j = 1, …, n si is not inside parenthesis, but the same meaning with written before is maintained.
Note that there is a subscript s(i) in R s(i) (α) which means each subject has different working correlation matrix.In fact, only the diagonal blocks are different for different subjects, the off-diagonal blocks will be the same for all subjects.The diagonal blocks of are specified entirely by π s(i(j)) .In particular, the diagonal elements of which are not constant and depend on the categories k and m at measurement j.

E. Working Correlation Matrices
This part describes about Working Correlation Matrix (WCM) for ordinal multinomial model.Y sij ,k and  sij ,k are described in part D in this section.WCM uses Pearson-like residuals that defined as follows . The following structures are available [19].
Some provisions in choosing the working correlation matrix are: (1) if I is small and data are balanced and complete, then an unstructured matrix is recommended, (2) if observations are repeated, then use a structure that accounts for correlation as function of time (stationary, or auto-regressive), (3) if observations are clustered (i.e.no logical ordering) then exchangeable may be appropriate, (4) if the number of clusters is small, then independent may be the best [20].

F. Algorithm for GEE Parameter Estimation
The algorithm for estimating model parameters using GEEs is outlined below [19,6].The standard iterative procedure to fit GEE, based on Liang and Zeger is : 1) Compute initial estimates for θ, θ (0) using conventional GLM, i.e., assuming independence.2) Compute the working correlation R(α) based on θ (0) , Pearson residuals and a specified working correlation structure.Check if R(α) is positive definite for exchangeable and unstructured structures.If it is not, revise it to be equal to ), where I is an identity matrix and ς is a ridge value such that the adjusted matrix is positive definite.If a fixed correlation matrix is specified by the researchers and it is not positive definite, then cannot continue.As an alternative, then compute the initial estimate of the covariance matrix of , the generalized estimating equation  (0) , and generalized Hessian matrix H (0) (see formulae below) based on θ (0) and .

3) Initialize
is a positive integer, update the working correlation, checking for positive definiteness as above.6) Compute an estimate of the covariance matrix of yj and its generalized inverse

 
For the ordinal multinomial model, replace R(α) with in the above equations.

7) Revise and based on and
Check the convergence criteria.If they are met or the maximum number of iterations is reached, then stop.The final vector of estimates is denoted by .Otherwise, go back to step (4) [19].

Convergence Criteria
Let  and are given as tolerance levels, then the criteria for parameter convergence can be written as follows: [19] (may be specified by researcher) after the loglikelihood or parameter convergence criterion has been satisfied.

Parameter Estimate Covariance Matrix, Correlation
Matrix and Standard Errors Two parameter estimate covariance matrices can be calculated, that is model-based and robust estimators.In the generalized linear model, the consistency of the model-based parameter estimate covariance depends on the correct specification of the mean and variance of the response (including correct choice of the working correlation matrix).However, the robust parameter estimate covariance is still consistent even when the specification of the working correlation matrix is incorrect as we often expect [20].
Model-based parameter estimate covariance is is the generalized inverse of : The robust parameter estimate covariance is: Note that model-based parameter estimate covariance will be affected by how the scale parameter is handled, but the robust parameter estimate covariance will not be affected by the estimate of the scale parameter because is cancelled in different terms [19].

Parameter Estimate Standard Error
For the ordinal multinomial model, let , k = 1, 2, …, K-1, be threshold parameter estimates and , r = 1, … p denote non-redundant regression parameter estimates.Their standard errors are the square root of the rth diagonal element of Σ r : and , respectively [19].

G. Wald Statistics
For ordinal multinomial model, the more general test matrix L= (L( ), L(β)), where L(γ)= (I 1 , … ,I K-1 ) consists of columns corresponding to threshold parameters and L(β) be the part of L corresponding to regression parameters.Consider matrix L o = (l o , L(β)),, where the column vectors corresponding to threshold parameters are replaced by their sum Then Lθ is estimable if and only if L o = L o Hω, where Hω = (X 1 'ΩX 1 ) -X 1 'ΩX 1 is a (1+p) x (1+p) matrix constructed using X 1 = (1,-X).Ω is the scale weight matrix with ith diagonal element ωi and such that L i θ is estimable.The Wald statistic for testing Lθ =K, where L is a r x (K-1+p) full row rank hypothesis matrix and K is a r×1 resulting vector, is defined by is the GEE estimate and Σ is the estimated covariance matrix (Σ could be the model based or robust estimator).The asymptotic distribution of S is , where r c = rank (LL) [10].

Wald Confidence Intervals
For the ordinal multinomial model, the 100(1 -α)% Wald confidence interval for parameter is given by

III. RESULTS AND DISCUSSION
Appendix 1 shows the parameter estimates and their standard errors and Appendix 2 presents significance of the parameters.The Following is the explanation about the output.

A. Standard Error of Parameters Estimate
Figure 4 shows the standard errors of model based GLM parameters.Standard errors unstructured tend smaller than standard errors independent and exchangeable, while standard errors independent are the highest.
Moreover, according to Figure 5, there is no particular pattern between exchangeable, unstructured, and independent for robust parameter estimates.For some parameters (prov2, farm11, farm21, sch12, sch22), standard errors unstructured are extremely high.
From Figure 6, robust estimation with independent WCM has the lowest standard error, but according to the nature of the data, unstructured WCM is the most appropriate due to high correlation between sub districts in the same district.
It is desirable to compare the fit of different working correlation structures within a GEE for Nested GLM.An informal comparison is to compare the standard error of the robust (SE R ) and model-based (SE M ).There are no guidelines regarding the size of the ratio, but higher ratios reflect poorer model fit.This comparison is qualitative, but it is the best approach available at this time [21].Table 1 presents the averages of this ratios (SE R /SE M ).It shows independent WCM has the smallest ratio.It means independent WCM is most appropriate to the data.

B. Signifinace (p-values) (Appendix 2)
From Appendix 2, it is believed that province 2 (West Java) is different from province 3 (East Java) for all models (p-values  0.000).Number of farmer families in province 2 (Central Java) is significant as a contribution to determine the poverty level.Furthermore, number of school is also significant as a contribution to determine the poverty level in West Java.

Some interpretation of parameters
School1 in prov1 with model based and unstructured WCM: in West Java, probability a sub district with category of school1 to be the more severe is exp(2.05)= 7.77 times of sub district with category of school3, with other values of variables are fixed.

IV. CONCLUSION
Even though based on the nature of the data, unstructured WCM is the most appropriate due to high correlation between sub districts in the same district, but according to the result of modeling, independent WCM is the most appropriate to the data.Ratio of SE R /SE M is the smallest, 0.56 and percentage of the true classification is the highest, 60.9% Using robust estimation with unstructured WCM should be avoided, due to unstable standard error, but model based unstructured is fine (stable).
In general, parameters estimate of model based are more stable than those of robust.
Related to the nested area using in the modeling, the significance results (p-values) give the indication that characteristic of Central Java is different from West Java and East Java, which is appropriate to research of "Civilization Java" by Rahardjo [22].Rahardjo said, characteristic of geographical areas in Central Java is more closed than in East Java.In addition, number of farmer families in Central Java is significant as contribution to poverty level, but not significant in other provinces.In addition, number of schools in West Java is significant to determine the poverty level, but not significant in East Java.
It is recommended to conduct simulation data to look the differences among some working correlation matrices.Furthermore nested area based on geographical conditions, such as northern coast, inland, and southern coast is still a challenge as an open problem to be studied.ACKNOWLEDGEMENT We wish to give our gratitude to DP2M DG DIKTI DEPDIKNAS who has supported this research through Hibah Penelitian Pascasarjana Nasional 2009-2011.

TABLE 1 AVERAGES
OF SER/SEM OF NESTED GLM PARAMETER ESTIMATES Threshold), prov, farm(prov), school(prov), medis(prov), ULS(prov) a. Set to zero because this parameter is redundant.b.Hessian matrix singularity is caused by this parameter.The parameter estimate at the last iteration is displayed.