Model Evaluation for Logistic Regression and Support Vector Machines in Diabetes Problem

Baiq Siska Febriani Astuti, Neni Alya Firdausanti, Santi Wulan Purnami

Abstract


Machine learning is a family of computational methods for solving problems from data that are already available in a database. Classification is one of the most important supervised learning tasks in machine learning. Support Vector Machines (SVMs) and logistic regression are supervised learning methods that can be used for both classification and regression. In the data mining process, preprocessing is an important step before further analysis, and within preprocessing, feature selection and the division of the data into training and testing sets are important steps. This research compares several model evaluation methods for dividing data into training and testing sets, namely Random Repeated Holdout, Stratified Repeated Holdout, Random Cross-Validation, and Stratified Cross-Validation. The evaluation methods are applied to logistic regression and SVMs. The analysis shows that feature selection improves the classification accuracy of logistic regression but has the opposite effect for SVMs. Among the training and testing data partition methods, none can be identified as clearly better, because each one relies on random selection. In this particular case, the choice of model evaluation method cannot be said to improve the performance of the SVM model.
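
As an illustration of the four partition schemes named above, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the 30% test fraction, the number of repeats, and the number of folds are all assumptions chosen for the example. The only difference between the random and stratified variants is whether indices are shuffled over the whole data set or separately within each class.

```python
import random
from collections import defaultdict


def _group_by_class(labels):
    """Map each class label to the list of sample indices carrying it."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    return list(by_class.values())


def holdout_split(labels, test_frac=0.3, stratified=False, rng=None):
    """Return (train_idx, test_idx) for a single holdout split."""
    rng = rng or random.Random()
    # Random: one group holding every index. Stratified: one group per class.
    groups = _group_by_class(labels) if stratified else [list(range(len(labels)))]
    train, test = [], []
    for members in groups:
        members = members[:]
        rng.shuffle(members)
        n_test = round(test_frac * len(members))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def repeated_holdout(labels, repeats=10, test_frac=0.3, stratified=False, seed=0):
    """Repeat the holdout split several times with fresh random draws."""
    rng = random.Random(seed)
    return [holdout_split(labels, test_frac, stratified, rng) for _ in range(repeats)]


def kfold_split(labels, k=5, stratified=False, seed=0):
    """Return k (train_idx, test_idx) pairs; every index is tested exactly once."""
    rng = random.Random(seed)
    groups = _group_by_class(labels) if stratified else [list(range(len(labels)))]
    folds = [[] for _ in range(k)]
    pos = 0
    for members in groups:
        members = members[:]
        rng.shuffle(members)
        # Deal indices round-robin; carrying the position across classes
        # keeps the overall fold sizes within one sample of each other.
        for i in members:
            folds[pos % k].append(i)
            pos += 1
    all_idx = set(range(len(labels)))
    return [(sorted(all_idx - set(fold)), sorted(fold)) for fold in folds]


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 70 negatives and 30 positives.
    labels = [0] * 70 + [1] * 30
    for name, strat in (("random", False), ("stratified", True)):
        splits = repeated_holdout(labels, repeats=3, stratified=strat)
        positives = [sum(labels[i] for i in test) for _, test in splits]
        print(name, "holdout, positives per test set:", positives)
```

With stratification the class proportions in every test set match those of the full data set, while the random variants let them vary from draw to draw. In both cases the assignment of individual samples is still random, which is consistent with the observation that no partition method is clearly better.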

Keywords


Classification; Cross-validation; Feature Selection; Logistic Regression; Preprocessing; Repeated Holdout; Support Vector Machine

DOI: http://dx.doi.org/10.12962/j27213862.v1i2.6728

Creative Commons License
Inferensi by Department of Statistics ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://iptek.its.ac.id/index.php/inferensi.

ISSN:  0216-308X

e-ISSN: 2721-3862
