Forecasting Tourist Arrivals in Bali: A Grid Search-Tuned Comparative Study of Random Forest, XGBoost, and a Hybrid RF-XGBoost Model

Kadek Jemmy Waciko, Leni Anggraini Susanti, Muayyad Muayyad, Rifqi Nur Fakhrurozi

Abstract


Accurate forecasting of international tourist arrivals is essential for tourism planning, infrastructure growth, and economic stability. This study presents an extensive comparative evaluation of Random Forest (RF), Extreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), and a novel Hybrid RF-XGBoost model for predicting monthly international tourist arrivals. A complete monthly time series covering 2014–2024 from the Central Bureau of Statistics of Bali was used to train and test the models. Hyperparameter optimization using Grid Search with cross-validation (Grid Search CV) was applied to all machine learning models to obtain the best predictive performance. Two robust metrics, Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE), were used to assess forecasting accuracy. Results show that the Random Forest model outperforms all competitors, with the lowest RMSE (41,772.68) and MAPE (6.30%), indicating high forecasting precision and robustness, especially during structural breaks such as the COVID-19 pandemic. The hybrid model also performs well, whereas LSTM shows higher error rates, illustrating its limitations on small-to-medium-sized tourism time series. In addition, the study provides six-month-ahead forecasts (January–June 2025) with 95% prediction intervals, which indicate a continuing recovery trend. The findings affirm the advantage of bagging-based ensemble methods over boosting-based and deep learning approaches in capturing nonlinearity, seasonality, and exogenous shocks in tourist demand. The study contributes to the growing body of data-driven tourism analytics by offering a reproducible, high-precision forecasting framework for developing countries and seasonally driven destinations.
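To make the tuning and evaluation pipeline described in the abstract concrete, the minimal Python sketch below tunes a Random Forest and an XGBoost regressor with Grid Search CV on lagged monthly arrivals and scores them with RMSE and MAPE. It illustrates the general approach only and is not the authors' code: the file name bali_arrivals.csv, the column names, the 12-lag feature construction, the 24-month holdout, and the parameter grids are all assumptions.

# Illustrative sketch only (not the authors' implementation): Grid Search CV
# tuning of RF and XGBoost on a monthly tourist-arrival series, evaluated
# with RMSE and MAPE.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from xgboost import XGBRegressor

def make_lag_features(series, n_lags=12):
    """Turn a univariate monthly series into a supervised-learning frame."""
    df = pd.DataFrame({"y": series})
    for lag in range(1, n_lags + 1):
        df[f"lag_{lag}"] = df["y"].shift(lag)
    df["month"] = df.index.month  # simple seasonality feature
    return df.dropna()

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    # assumes no zero-valued months in the evaluation window
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# hypothetical CSV of monthly arrivals, 2014-2024, with columns "month" and "arrivals"
arrivals = pd.read_csv("bali_arrivals.csv", index_col="month", parse_dates=True)["arrivals"]
data = make_lag_features(arrivals)
X, y = data.drop(columns="y"), data["y"]
X_train, X_test = X.iloc[:-24], X.iloc[-24:]   # hold out the last 24 months
y_train, y_test = y.iloc[:-24], y.iloc[-24:]

cv = TimeSeriesSplit(n_splits=5)               # cross-validation that respects temporal order
candidates = {
    "RF": (RandomForestRegressor(random_state=42),
           {"n_estimators": [200, 500], "max_depth": [None, 10, 20]}),
    "XGBoost": (XGBRegressor(objective="reg:squarederror", random_state=42),
                {"n_estimators": [200, 500], "max_depth": [3, 6],
                 "learning_rate": [0.05, 0.1]}),
}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv,
                          scoring="neg_root_mean_squared_error", n_jobs=-1)
    search.fit(X_train, y_train)
    pred = search.predict(X_test)
    print(f"{name}: RMSE={rmse(y_test.values, pred):,.2f}  "
          f"MAPE={mape(y_test.values, pred):.2f}%  best={search.best_params_}")

The hybrid RF-XGBoost model, the LSTM baseline, and the 95% prediction intervals reported in the abstract are not reproduced in this sketch.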

Keywords


Grid Search CV; LSTM; Machine Learning; Random Forest; XGBoost

Full Text:

PDF

References


Y. Qian and Y. Zhang, “Long-term forecasting in asset pricing: Machine learning models’ sensitivity to macroeconomic shifts and firm-specific factors,” North American Journal of Economics and Finance, vol. 78, May 2025, doi: 10.1016/J.NAJEF.2025.102423.

M. A. Afrianto and M. Wasesa, “The impact of tree-based machine learning models, length of training data, and quarantine search query on tourist arrival prediction’s accuracy under COVID-19 in Indonesia,” Current Issues in Tourism, vol. 25, no. 23, pp. 3854–3870, 2022, doi: 10.1080/13683500.2022.2085079.

F. Antolini and S. Cesarini, “Predicting Domestic Tourists’ Length of Stay in Italy leveraging Regression Decision Tree Algorithms,” Electronic Journal of Applied Statistical Analysis, vol. 17, no. 3, pp. 621–635, 2024, doi: 10.1285/I20705948V17N3P621.

Z. Chen, C. Ye, H. Yang, P. Ye, Y. Xie, and Z. Ding, “Exploring the impact of seasonal forest landscapes on tourist emotions using Machine learning,” Ecol Indic, vol. 163, Jun. 2024, doi: 10.1016/J.ECOLIND.2024.112115.

Z. Marzak, R. Benabbou, S. Mouatassim, and J. Benhra, “Forecasting Multivariate Time Series with Trend and Seasonality: A Random Forest Approach,” Communications in Computer and Information Science, vol. 2373 CCIS, pp. 128–144, 2025, doi: 10.1007/978-3-031-80775-6_9.

J. Zhao, C. Der Lee, G. Chen, and J. Zhang, “Research on the Prediction Application of Multiple Classification Datasets Based on Random Forest Model,” 2024 IEEE 6th International Conference on Power, Intelligent Computing and Systems, ICPICS 2024, pp. 156–161, 2024, doi: 10.1109/ICPICS62053.2024.10795875.

L. Liu, J. Liu, and X. Zhu, “An augmentation of random forest through using the principle of justifiable granularity,” p. 218, Jul. 2024, doi: 10.1117/12.3031228.

S. Bhadula et al., “Optimizing Random Forest Algorithms for Large-Scale Data Analysis,” Proceedings of International Conference on Contemporary Computing and Informatics, IC3I 2024, pp. 1673–1678, 2024, doi: 10.1109/IC3I61595.2024.10829145.

Z. Li and T. Lu, “Prediction of Multistation GNSS Vertical Coordinate Time Series Based on XGBoost Algorithm,” Lecture Notes in Electrical Engineering, vol. 910 LNEE, pp. 275–286, 2022, doi: 10.1007/978-981-19-2576-4_24.

F. A. Nahid, M. N. Jahangir, H. M. Chowdhury, and K. Akter, “Evaluation and Performance Metrics for Forecasting Renewable Power Generation, Demand, and Electricity Price,” Forecasting Methods for Renewable Power Generation, pp. 173–218, Jan. 2025, doi: 10.1002/9781394249466.CH7.

T. O. Hodson, “Root-mean-square error (RMSE) or mean absolute error (MAE): when to use them or not,” Geosci Model Dev, vol. 15, no. 14, pp. 5481–5487, Jul. 2022, doi: 10.5194/GMD-15-5481-2022.

Z. Schwartz, J. Ma, and T. Webb, “The MSapeMER: a symmetric, scale-free and intuitive forecasting error measure for hospitality revenue management,” International Journal of Contemporary Hospitality Management, vol. 36, no. 6, pp. 2035–2048, Apr. 2024, doi: 10.1108/IJCHM-01-2023-0088.

“Badan Pusat Statistik Provinsi Bali.” Accessed: Jul. 28, 2025. [Online]. Available: https://bali.bps.go.id/id

W. Limpornchitwilai, P. Suksompong, C. Charoenlarpnopparut, L. O. Kovavisaruch, T. Sanpechuda, and M. Nakatani, “Improvement of the RF-Based and XGBoost-Based Visitor Data Prediction for Colocated Museums,” 19th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, ECTI-CON 2022, 2022, doi: 10.1109/ECTI-CON54298.2022.9795605.

D. T. Andariesta and M. Wasesa, “Machine learning models for predicting international tourist arrivals in Indonesia during the COVID-19 pandemic: a multisource Internet data approach,” Journal of Tourism Futures, 2022, doi: 10.1108/JTF-10-2021-0239.

E. Brilliandy, H. Lucky, A. Hartanto, D. Suhartono, and M. Nurzaki, “Using Regression to Predict Number of Tourism in Indonesia based of Global COVID-19 Cases,” 2022 3rd International Conference on Artificial Intelligence and Data Sciences: Championing Innovations in Artificial Intelligence and Data Sciences for Sustainable Future, AiDAS 2022 - Proceedings, pp. 310–315, 2022, doi: 10.1109/AIDAS56890.2022.9918731.

Y. Manzali and M. Elfar, “Random Forest Pruning Techniques: A Recent Review,” Operations Research Forum, vol. 4, no. 2, Jun. 2023, doi: 10.1007/S43069-023-00223-6.

A. Khanna, D. Goyal, N. Chaurasia, and T. H. Sheikh, “Forecasting Financial Success App: Unveiling the Potential of Random Forest in Machine Learning-Based Investment Prediction,” Lecture Notes in Networks and Systems, vol. 788 LNNS, pp. 279–292, 2023, doi: 10.1007/978-981-99-6553-3_22.

A. Sharma, A. M. Gupta, and S. Das, “Enhancing liver cirrhosis diagnosis: A comparative analysis of XGBoost, SVM, and random forest classifiers for optimal predictive analysis,” Computational Methods in Science and Technology - Proceedings of the 4th International Conference on Computational Methods in Science and Technology, ICCMST 2024, vol. 1, pp. 130–135, 2025, doi: 10.1201/9781003501244-22.

S. K. Khaleelullah, H. Kumar, G. Rahul, R. Naik, and S. Teja, “DietRx: Machine Learning Enhanced Disease Specific Nutrition and Precautions,” Proceedings - 2024 8th International Conference on Inventive Systems and Control, ICISC 2024, pp. 340–344, 2024, doi: 10.1109/ICISC62624.2024.00065.

K. Maji, S. Gupta, and P. K. Dutta, “Enhancing Heart Disease Prediction Accuracy: Comprehensive Analysis of XGBoost and AdaBoost,” IET Conference Proceedings, vol. 2024, no. 37, pp. 130–136, 2024, doi: 10.1049/ICP.2025.0833.

R. Sibindi, R. W. Mwangi, and A. G. Waititu, “A boosting ensemble learning based hybrid light gradient boosting machine and extreme gradient boosting model for predicting house prices,” Engineering Reports, vol. 5, no. 4, Apr. 2023, doi: 10.1002/ENG2.12599.

H. Xiao, “Enhanced separation of long-term memory from short-term memory on top of LSTM: Neural network-based stock index forecasting,” PLoS One, vol. 20, no. 6 June, Jun. 2025, doi: 10.1371/JOURNAL.PONE.0322737.

J. Peeperkorn, S. vanden Broucke, and J. De Weerdt, “Can recurrent neural networks learn process model structure?,” J Intell Inf Syst, vol. 61, no. 1, pp. 27–51, Aug. 2023, doi: 10.1007/S10844-022-00765-X.

S. Boriratrit and R. Chatthaworn, “Improvement of Long Short-Term Memory via CEEMDAN and Logistic Maps for the Power Consumption Forecasting,” 2023 15th International Conference on Advanced Computational Intelligence, ICACI 2023, 2023, doi: 10.1109/ICACI58115.2023.10146172.

Q. Kang, D. Yu, K. H. Cheong, and Z. Wang, “Deterministic convergence analysis for regularized long short-term memory and its application to regression and multi-classification problems,” Eng Appl Artif Intell, vol. 133, Jul. 2024, doi: 10.1016/J.ENGAPPAI.2024.108444.

Y. Ying, H. Qi, and L. Na, “Feature Selection and Efficient Disease Early Warning Based on Optimized Ensemble Learning Model: Case Study of Geriatric Depression and Anxiety,” Data Analysis and Knowledge Discovery, vol. 7, no. 7, pp. 74–88, Jul. 2023, doi: 10.11925/INFOTECH.2096-3467.2022.0718.

Y. A. Jatmiko, R. A. Rahayu, and G. Darmawan, “Perbandingan Keakuratan Hasil Peramalan Produksi Bawang Merah Metode Holt-Winters dengan Singular Spectrum Analysis (SSA),” Jurnal Matematika “Mantik”, vol. 3, no. 1, pp. 13–23.




DOI: http://dx.doi.org/10.12962/j27213862.v8i3.23334





Inferensi by Department of Statistics ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://iptek.its.ac.id/index.php/inferensi.

ISSN: 0216-308X

e-ISSN: 2721-3862
