Comparative Analysis of Feature Selection Method to Predict Customer Loyalty

Heni Sulistiani, Aris Tjahyanto


The Fast Moving Consumer Goods (FMCG) industry is still showing double-digit growth, and Indonesia has become a potential market for FMCG products, so competition between companies will be intense. To survive, companies must, among other things, maintain customer loyalty. Data mining techniques can be used to predict customer loyalty. In data mining preprocessing, feature selection is an important step: it reduces the number of features, removes irrelevant, redundant, or noisy data, and brings immediate benefits for applications, namely speeding up the data mining algorithm and improving mining performance such as prediction accuracy and the comprehensibility of the result. This paper aims to identify the relevant factors that affect the classification performance for customer loyalty using several feature selection methods, and to compare the classification performance of those methods in predicting customer loyalty for FMCG products. Data were obtained from questionnaires completed by FMCG customers about several brands of instant noodles in Lampung that were ranked in the TOP Brand Award Phase 1 2016, using a nonprobability sampling method with a convenience sampling technique. The chi-square feature selection method with a threshold > 0.01 gave the best results, as indicated by the highest accuracy of the random forest classification algorithm: 83.2% with thirteen features.
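As an illustration of the chi-square filter described above, the following is a minimal sketch in plain Python and is not the authors' implementation. It assumes that the threshold is applied to the Pearson chi-square statistic of each categorical feature against the loyalty label, which the abstract does not specify. The feature names, the toy answers, and the helper functions `chi_square_score` and `select_features` are hypothetical.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the class label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # joint counts per (value, class) cell
    feature_totals = Counter(feature)          # row totals
    label_totals = Counter(labels)             # column totals
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n   # count expected under independence
            score += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return score


def select_features(columns, labels, threshold=0.01):
    """Return the feature names whose score exceeds the threshold, plus all scores.

    Assumption: the threshold is compared against the raw chi-square statistic.
    """
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    selected = [name for name, score in scores.items() if score > threshold]
    return selected, scores


# Hypothetical toy questionnaire answers (not data from the paper).
columns = {
    "taste": ["good", "good", "bad", "bad", "good", "bad", "good", "bad"],
    "packaging": ["red", "red", "red", "red", "blue", "blue", "blue", "blue"],
}
loyal = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]

selected, scores = select_features(columns, loyal)
print(selected)  # features kept for the downstream classifier
```

In this toy data, "taste" separates loyal from non-loyal customers perfectly, while "packaging" is independent of the label and is filtered out. The retained features would then be passed to a classifier such as random forest.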


Classification, Customer Loyalty, Feature Selection








Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.