Comparisons of Logistic Regression and Support Vector Machines in Classification of Echocardiogram Dataset

Neni Alya Firdausanti, Ratih Ardiati Ningrum, Siti Qomariyah

Abstract


Echocardiography is a test that uses sound waves to produce an image of the heart; the image is called an echocardiogram. This paper uses the Echocardiogram Dataset, where the task is to classify, from 7 features, whether a patient will survive. Two classification methods suited to a categorical response variable are compared: logistic regression and Support Vector Machines (SVM). Accuracy was estimated using holdout and cross-validation. Before classification, several preprocessing procedures were applied to the dataset: missing-value imputation using the median, univariate and multivariate outlier detection, and feature selection using the backward method. The results showed that SVM with unstratified holdout gave the best accuracy, 91.54%.
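The evaluation workflow described in the abstract (median imputation, then accuracy estimated by holdout and by cross-validation) can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the helper names (`impute_median`, `holdout_indices`, `kfold_indices`, `fit_logistic`, `evaluate`), the 30% test fraction, and the 5 folds are all assumptions, and only a small gradient-descent logistic regression is fitted. An SVM, outlier detection, and backward feature selection are omitted; in practice a library such as scikit-learn would normally replace the hand-written parts.

```python
import math
import random
import statistics

def impute_median(rows):
    """Replace None entries in each column with that column's median."""
    cols = list(zip(*rows))
    medians = [statistics.median(v for v in col if v is not None) for col in cols]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the logistic loss; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def accuracy(w, X, y):
    preds = [1 if w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)) > 0 else 0
             for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def holdout_indices(y, test_frac, rng, stratified):
    """Return (train_idx, test_idx). If stratified, sample each class separately
    so the test set keeps roughly the same class proportions as the full data."""
    groups = {}
    for i, label in enumerate(y):
        groups.setdefault(label if stratified else 0, []).append(i)
    test = []
    for idx in groups.values():
        idx = idx[:]
        rng.shuffle(idx)
        test += idx[: round(len(idx) * test_frac)]
    test_set = set(test)
    train = [i for i in range(len(y)) if i not in test_set]
    return train, sorted(test)

def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]

def evaluate(X, y, train, test):
    w = fit_logistic([X[i] for i in train], [y[i] for i in train])
    return accuracy(w, [X[i] for i in test], [y[i] for i in test])

# Synthetic stand-in data (NOT the Echocardiogram Dataset): 3 features,
# an imbalanced binary label, and about 10% of rows with one missing value.
rng = random.Random(0)
rows, y = [], []
for _ in range(120):
    label = 1 if rng.random() < 0.3 else 0
    row = [rng.gauss(1.5 * label, 1.0) for _ in range(3)]
    if rng.random() < 0.1:
        row[rng.randrange(3)] = None
    rows.append(row)
    y.append(label)

X = impute_median(rows)
tr_u, te_u = holdout_indices(y, 0.3, rng, stratified=False)
tr_s, te_s = holdout_indices(y, 0.3, rng, stratified=True)
acc_unstrat = evaluate(X, y, tr_u, te_u)
acc_strat = evaluate(X, y, tr_s, te_s)

folds = kfold_indices(len(y), 5, rng)
cv_scores = []
for fold in folds:
    held_out = set(fold)
    train = [i for i in range(len(y)) if i not in held_out]
    cv_scores.append(evaluate(X, y, train, fold))
acc_cv = sum(cv_scores) / len(cv_scores)

print(f"unstratified holdout: {acc_unstrat:.3f}")
print(f"stratified holdout:   {acc_strat:.3f}")
print(f"5-fold CV mean:       {acc_cv:.3f}")
```

A single holdout estimate depends on which rows happen to land in the test set, so on a small dataset it can differ noticeably from the cross-validated mean; comparing the three printed values on the same data gives a rough sense of that spread.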

Keywords


Classification; Cross Validation; Echocardiogram; Holdout; Logistic Regression; SVM


DOI: http://dx.doi.org/10.12962/j27213862.v5i2.14121

Creative Commons License
Inferensi by Department of Statistics ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://iptek.its.ac.id/index.php/inferensi.

ISSN:  0216-308X

e-ISSN: 2721-3862
