Application of the Synthetic Minority Oversampling Technique to Data on Child Smokers in West Nusa Tenggara, 2021

Rahma Mutiara Sari, Achmad Prasetyo


Indonesia ranks as the country with the highest number of young smokers in Southeast Asia. This is a serious concern because smoking can cause a range of health problems and can lead to death. In 2021, West Nusa Tenggara Province had the highest percentage of children who smoke in Indonesia, at 2.28%. Data on children's smoking status are imbalanced because children who do not smoke far outnumber those who do. Binary logistic regression combined with the Synthetic Minority Oversampling Technique (SMOTE) was therefore applied to handle this problem. This study aims to give an overview of children's smoking behavior in West Nusa Tenggara in 2021 and to identify the variables that influence it, together with their tendencies. The data are secondary data from the 2021 National Socio-Economic Survey, with children aged 5 to 17 years in West Nusa Tenggara as the unit of analysis. The results show that gender, economic status, age, status of the region of residence, education level of the head of household, and schooling status influenced children's smoking behavior in West Nusa Tenggara in 2021, with children who did not attend school having the greatest tendency to smoke.
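As a rough illustration of the oversampling step named above, the following minimal Python sketch generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours, which is the core idea of SMOTE. It is not the authors' implementation: the function name `smote`, the parameter names, and the toy numeric features are all hypothetical, and the abstract does not state how the survey's categorical predictors were prepared before oversampling.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself is excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data (two numeric features per child); not taken from the survey.
smokers = [(15.0, 0.8), (16.0, 1.1), (17.0, 0.9), (14.0, 1.0)]
non_smoker_count = 40

# Oversample the minority class until both classes have the same size.
new_points = smote(smokers, n_synthetic=non_smoker_count - len(smokers), k=3)
balanced_minority = smokers + new_points
print(len(balanced_minority))
```

In a workflow like the one the abstract describes, the balanced sample would then be passed to a binary logistic regression; in practice, synthetic points are usually generated from training data only, so that evaluation data remain untouched.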


imbalanced data; child smoking behavior; logistic regression; SMOTE








Inferensi by Department of Statistics ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

ISSN:  0216-308X

e-ISSN: 2721-3862
