Comparison of Ensemble Learning Methods in Classifying Unbalanced Data on the Bank Marketing Dataset
Abstract
The banking industry is experiencing rapid growth, particularly in the use of telemarketing strategies to increase product and service sales. Despite their widespread use, these strategies have low success rates, and the resulting data are imbalanced, with fewer customers accepting offers than rejecting them. This study evaluates ensemble learning algorithms, including Random Forest, Gradient Boosting, Extra Trees, and AdaBoost, with and without handling imbalanced data using the Random Over-Sampling Examples (ROSE) method. The evaluation covers accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC). Results indicate that Random Forest and AdaBoost consistently perform well, with Random Forest maintaining a high accuracy of 91.00% after handling imbalanced data. Gradient Boosting and Extra Trees improve in precision after oversampling. All models exhibit high AUC values, close to 0.94, demonstrating excellent discrimination between the positive and negative classes. The study concludes that addressing data imbalance enhances model performance, making these models suitable for effective telemarketing strategies in the banking sector.
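The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it uses synthetic data from scikit-learn rather than the Bank Marketing dataset, and it balances the training set by plain random duplication of minority-class rows, which is a simpler stand-in for ROSE (ROSE generates new synthetic examples instead of copying existing ones). The helper names `random_oversample`, `make_models`, and `evaluate`, as well as all hyperparameters, are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def random_oversample(X, y, seed=0):
    """Duplicate rows of the smaller classes at random until all classes
    have the same size. Plain random over-sampling, NOT the ROSE method."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for cls, n in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=target - n, replace=True)
        parts.append(np.concatenate([idx, extra]))
    order = rng.permutation(np.concatenate(parts))
    return X[order], y[order]


def make_models(seed=0):
    """Fresh instances of the four ensembles (illustrative settings)."""
    return {
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=seed),
        "Extra Trees": ExtraTreesClassifier(n_estimators=50, random_state=seed),
        "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=seed),
    }


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit each model and compute the five metrics named in the abstract."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        score = model.predict_proba(X_test)[:, 1]  # probability of class 1
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred, zero_division=0),
            "f1": f1_score(y_test, pred, zero_division=0),
            "auc": roc_auc_score(y_test, score),
        }
    return results


# Synthetic imbalanced data: roughly 12% positives (an assumed ratio).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.88, 0.12], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Oversample the training split only, so the test set keeps its real ratio.
X_bal, y_bal = random_oversample(X_train, y_train)

baseline = evaluate(make_models(), X_train, y_train, X_test, y_test)
balanced = evaluate(make_models(), X_bal, y_bal, X_test, y_test)

for name in baseline:
    print(name)
    print("  without oversampling:", {k: round(v, 3) for k, v in baseline[name].items()})
    print("  with oversampling:   ", {k: round(v, 3) for k, v in balanced[name].items()})
```

Oversampling is applied after the train/test split so that duplicated rows cannot leak into the test set. The numbers printed by this sketch come from synthetic data and are not comparable to the results reported in the study.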
DOI: http://dx.doi.org/10.12962/j27213862.v8i1.20569
Inferensi by Department of Statistics ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://iptek.its.ac.id/index.php/inferensi.
ISSN: 0216-308X
e-ISSN: 2721-3862