Ensemble Oversampling For Financial Fraud Classification Of Imbalanced Data

Agus Budi Raharjo, Moch Deny Pratama, Diana Purwitasari

Abstract


Financial fraud classification tasks, such as credit card fraud and bitcoin fraud detection, involve highly imbalanced data, so oversampling the fraud class is necessary. Financial transactions can have different attributes. In a credit card transaction, the attributes could represent a nominal amount, transaction period information, the status of deposits or other transaction types such as withdrawals or refunds, and more detailed information. In a bitcoin transaction, the attributes could represent the number of nodes, transaction fee, output volume, and aggregated figures. Because attribute characteristics vary across financial fraud data, an adaptable oversampling method is required for the classification model to perform well. An Ensemble Oversampling method is proposed as a general-purpose approach to financial fraud classification for both credit cards and bitcoin. The proposed method combines a generative approach (GAN) with traditional approaches (SMOTE and ADASYN). In the classification step, Deep Learning algorithms, namely CNN and LSTM, are applied to provide better performance, and a genetic algorithm is used to optimize the Deep Learning hyperparameters. The evaluation compared four scenarios covering the original data without oversampling, single-method oversampling with GAN, SMOTE, or ADASYN, and Ensemble Oversampling. The combined oversampling of GAN and SMOTE with the CNN classifier produced the highest evaluation score of all scenarios, with an average F1-Score of 0.995 and a Kappa Statistic of 0.990. This result suggests that the quality of the augmented data affects prediction performance and that the Ensemble Oversampling technique can be considered for improving classifier performance on financial fraud data.
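
The idea of pooling synthetic minority samples from more than one generator can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`smote_like`, `gaussian_stand_in`, `ensemble_oversample`), the `share_smote` parameter, and the toy data are assumptions made for illustration. In particular, a simple per-feature Gaussian sampler stands in for a trained GAN, and the interpolation routine only reproduces the core step of SMOTE.

```python
import math
import random


def smote_like(minority, n_new, k=3, rng=None):
    """Core SMOTE idea: interpolate between a minority row and one of its
    k nearest minority neighbours. Needs at least two minority rows."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority rows
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def gaussian_stand_in(minority, n_new, rng=None):
    """Hypothetical placeholder for a generative model (NOT a GAN): draws
    each feature independently from a normal fitted to the minority class."""
    if rng is None:
        rng = random.Random(1)
    columns = list(zip(*minority))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        for col, m in zip(columns, means)
    ]
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n_new)]


def ensemble_oversample(X, y, minority_label=1, share_smote=0.5, seed=0):
    """Balance the classes by pooling synthetic rows from both generators.
    `share_smote` (an assumed knob) sets the fraction taken from smote_like."""
    minority = [row for row, label in zip(X, y) if label == minority_label]
    deficit = (len(y) - len(minority)) - len(minority)
    if deficit <= 0:
        return list(X), list(y)  # already balanced, nothing to add
    rng = random.Random(seed)
    n_smote = round(deficit * share_smote)
    new_rows = smote_like(minority, n_smote, rng=rng)
    new_rows += gaussian_stand_in(minority, deficit - n_smote, rng=rng)
    return list(X) + new_rows, list(y) + [minority_label] * len(new_rows)
```

Calling `ensemble_oversample(X, y)` on a list of feature rows `X` and binary labels `y` returns the original rows followed by enough synthetic fraud-class rows to equalize the class counts. In practice the stand-in sampler would be replaced by samples drawn from a trained generative model, and the balanced set would be applied to the training split only.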

Keywords


Financial Fraud Classification; Imbalanced Data; Ensemble Oversampling; Deep Learning

DOI: http://dx.doi.org/10.12962/j20882033.v34i3.17183

Creative Commons License

IPTEK Journal of Science and Technology by Lembaga Penelitian dan Pengabdian kepada Masyarakat, ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://iptek.its.ac.id/index.php/jts.