The Continuum Regression Analysis with Preprocessed Variable Selection LASSO and SIR-LASSO

Adzkar Adlu Hasyr Suruddin; Erfiani Erfiani; I Made Sumertajaya

doi:10.12962/j27213862.v8i1.21658

Abstract

Analyzing high-dimensional data is a considerable challenge in statistics and data science. Multicollinearity and outliers often arise, leading to unstable coefficient estimates and reduced model performance. Continuum regression (CR) is well suited to calibration models because it handles multicollinearity while reducing the dimensionality of the data. The method condenses the predictors into independent latent variables, which yields a more stable, precise, and reliable model while retaining the important information in the original data. Reducing the number of dimensions through variable selection in an initial preprocessing phase is crucial. This study aims to build and evaluate CR calibration models that use LASSO and SIR-LASSO variable selection as preprocessing. LASSO performs variable selection by penalizing the regression coefficients, which reduces the influence of less significant or redundant variables. SIR-LASSO integrates sliced inverse regression (SIR) with the variable selection capability of LASSO in order to handle high-dimensional data by identifying relevant low-dimensional structures. This integration improves SIR's effectiveness on high-dimensional data, enhances model stability and interpretability, and is intended to address multicollinearity and model instability. We conducted simulations on both low-dimensional and high-dimensional datasets to assess the performance of CR LASSO and CR SIR-LASSO. The analysis was carried out in R version 4.1.3 (RStudio). The "MASS" package was used to generate data from a multivariate normal distribution, the "glmnet" package was used for LASSO variable selection, and the "LassoSIR" package was used for SIR-LASSO variable selection.
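The LASSO selection step described above can be sketched in a few lines. The example below is a minimal illustration, not the study's implementation: it uses plain Python with cyclic coordinate descent in place of "glmnet", a shared-factor construction in place of a general multivariate normal generator, and hypothetical names and settings (`simulate`, `centre`, `lasso_coordinate_descent`, `n=100`, `p=8`, `corr=0.3`, `lam=0.1`) chosen only for illustration. It minimizes (1/(2n))·||y − Xb||² + lam·||b||₁ and keeps the predictors whose coefficients are not shrunk to zero.

```python
import random


def simulate(n, p, corr, beta_true, seed):
    """Draw n rows of p equicorrelated standard-normal predictors and a noisy response."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor inducing pairwise correlation `corr`
        row = [corr ** 0.5 * shared + (1.0 - corr) ** 0.5 * rng.gauss(0.0, 1.0)
               for _ in range(p)]
        X.append(row)
        y.append(sum(b * x for b, x in zip(beta_true, row)) + rng.gauss(0.0, 1.0))
    return X, y


def centre(X, y):
    """Centre every column of X and the response, so no intercept is needed."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    return [[row[j] - means[j] for j in range(p)] for row in X], [v - y_mean for v in y]


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))*||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X @ beta, with beta = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j's own fit
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta


# Illustrative settings only (assumptions, not the study's simulation design)
beta_true = [2.0, -1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
X_raw, y_raw = simulate(n=100, p=8, corr=0.3, beta_true=beta_true, seed=1)
X, y = centre(X_raw, y_raw)

beta_hat = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta_hat) if abs(b) > 1e-8]

# Any penalty above max_j |x_j' y| / n shrinks every coefficient to zero
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(len(y)))) / len(y)
              for j in range(len(X[0])))
beta_null = lasso_coordinate_descent(X, y, lam=1.01 * lam_max)

print("selected predictor indices:", selected)
```

In a workflow like the one described, the columns listed in `selected` would be the ones passed on to the continuum regression step; in practice the penalty would typically be chosen by cross-validation rather than fixed in advance.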
In the simulations, LASSO outperforms SIR-LASSO as a variable selection step, yielding the lowest root mean square error of prediction (RMSEP) in every scenario. SIR-LASSO becomes less stable as the number of dimensions increases, which suggests that it is sensitive to large changes in the variables. CR LASSO generally predicts better than CR SIR-LASSO, as shown by lower median RMSEP values across sample sizes and scenarios. The RMSEP distributions for LASSO are consistently tighter, indicating more stable and reliable performance than SIR-LASSO, whose RMSEP values show more outliers and greater variation. LASSO maintains its advantage as the sample size grows, particularly when the value is set to 0.5. SIR-LASSO, although occasionally competitive, generally yields more variable results, particularly with larger sample sizes. Overall, LASSO appears to be the more reliable option for a CR model with variable selection preprocessing.
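The comparison above rests on RMSEP and on the centre and spread of its distribution across simulation replicates. The sketch below shows one way such a summary could be computed, using the usual definition RMSEP = sqrt(mean((y − ŷ)²)). The function names (`rmsep`, `summarise`) are hypothetical, and the two lists of RMSEP values are made-up numbers for illustration only; they are not results from the study.

```python
import math
import statistics


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def summarise(rmsep_values):
    """Median and interquartile range of RMSEP across simulation replicates."""
    q1, _, q3 = statistics.quantiles(rmsep_values, n=4)
    return {"median": statistics.median(rmsep_values), "iqr": q3 - q1}


# Made-up RMSEP values for two hypothetical methods (NOT results from the study)
method_a = [1.0, 1.1, 0.9, 1.0, 1.05]
method_b = [1.2, 0.8, 2.5, 1.0, 1.6]

summary_a = summarise(method_a)
summary_b = summarise(method_b)

print("method A:", summary_a)
print("method B:", summary_b)
```

A lower median indicates better typical predictive accuracy, and a smaller interquartile range indicates a tighter, more stable RMSEP distribution, which are the two properties the comparison relies on.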

Keywords

continuum regression; high-dimensional; LASSO; SIR-LASSO; variable selection

Inferensi by Department of Statistics ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://iptek.its.ac.id/index.php/inferensi.

ISSN: 0216-308X

e-ISSN: 2721-3862
