Variables Selection Affecting Indonesian Human Development Index Using LASSO

Etis Sunandi, Titin Siswantining

Abstract


According to Statistics Indonesia, the Human Development Index (HDI) measures the level of human development achieved in a region along three basic dimensions: a long and healthy life, knowledge, and a decent standard of living. Many factors are suspected to influence HDI in Indonesia. On the other hand, parameter estimation by the Least Squares Method breaks down when the number of independent variables exceeds the number of observations. One method that can overcome this problem is the Least Absolute Shrinkage and Selection Operator (LASSO). The purpose of this study is to select the variables that affect Indonesia's HDI in 2023 using LASSO, a method that selects independent variables while also addressing multicollinearity. Ridge regression is used as the comparison model. The results show that LASSO performs better than ridge regression: the Mean Squared Error of Prediction (MSEP) of LASSO (0.34) is smaller than that of ridge regression (3.61). In addition, LASSO has the higher R-squared value, at 97.6%.
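The comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the authors' analysis: the data set, the penalty values (`l1=0.3`, `l2=1.0`), and the helper names (`fit_penalized`, `msep`, `r_squared`) are all assumptions made for the example. It fits both models by coordinate descent in a setting with more independent variables than observations, then reports MSEP and R-squared on held-out data.

```python
import random


def soft_threshold(rho, t):
    """Soft-thresholding: sign(rho) * max(|rho| - t, 0)."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def fit_penalized(X, y, l1=0.0, l2=0.0, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + l1*||b||_1 + (l2/2)*||b||^2.

    l1 > 0, l2 = 0 gives LASSO; l1 = 0, l2 > 0 gives ridge (approximately,
    after a fixed number of sweeps). No intercept is fitted, so the data
    are assumed to be centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # holds y - X @ beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that excludes beta_j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, l1) / (col_sq[j] + l2)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def predict(X, beta):
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def msep(y_true, y_pred):
    """Mean squared error of prediction."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Simulated data (hypothetical): 30 predictors, only the first 3 matter,
# and just 20 training observations, so p > n.
rng = random.Random(0)
n_train, n_test, p = 20, 200, 30
true_beta = [3.0, -2.0, 1.5] + [0.0] * (p - 3)


def simulate(n):
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0.0, 0.5) for row in X]
    return X, y


X_train, y_train = simulate(n_train)
X_test, y_test = simulate(n_test)

lasso_beta = fit_penalized(X_train, y_train, l1=0.3)
ridge_beta = fit_penalized(X_train, y_train, l2=1.0)

for name, beta in (("LASSO", lasso_beta), ("Ridge", ridge_beta)):
    pred = predict(X_test, beta)
    n_selected = sum(b != 0.0 for b in beta)
    print(name, "selected:", n_selected,
          "MSEP:", round(msep(y_test, pred), 3),
          "R2:", round(r_squared(y_test, pred), 3))
```

The L1 penalty sets some coefficients exactly to zero, which is what makes LASSO a variable selection method, whereas the L2 penalty of ridge regression only shrinks coefficients toward zero. In practice the penalty values would be chosen by cross-validation rather than fixed by hand as they are here.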

Keywords


LASSO, MSEP, R-Squared, Ridge Regression


DOI: http://dx.doi.org/10.12962/j27213862.v8i2.22891

Creative Commons License
Inferensi by Department of Statistics ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://iptek.its.ac.id/index.php/inferensi.

ISSN:  0216-308X

e-ISSN: 2721-3862
