SMOTE-Based Oversampling for Imbalanced Digital Fraud Risk Classification
Abstract
Digital fraud risk among university students is an important issue, yet classifying it from survey-based indicators is complicated by class imbalance. This study examined whether the Synthetic Minority Oversampling Technique (SMOTE) improves Digital Fraud Risk classification among Universitas Terbuka students. The study used primary survey data from 498 respondents and five predictors: financial literacy, digital financial literacy, monthly gross income, age, and job tenure. The evaluated models were Gaussian Naive Bayes, Random Forest, a calibrated linear Support Vector Machine (SVM), a Radial Basis Function (RBF) SVM, and XGBoost. Model performance was evaluated using the confusion matrix, accuracy, balanced accuracy, precision, recall, F1 score, ROC-AUC, PR-AUC, the Matthews correlation coefficient (MCC), and Kappa. Without oversampling, some models achieved higher nominal accuracy but zero recall for the High risk class, which shows that accuracy alone is insufficient for model selection under class imbalance. In contrast, SMOTE increased recall for the High risk class across all models and improved PR-AUC in several cases. The SMOTE-based Random Forest achieved the highest test PR-AUC (0.415), whereas the SMOTE-based RBF SVM achieved the highest recall (0.659). Diagnostic analyses of the selected SMOTE-based Random Forest provided evidence of a non-random predictive signal, although overall discriminative performance remained moderate.
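The two central ideas of the abstract, that nominal accuracy can hide zero recall for the High risk class and that SMOTE rebalances the minority class by interpolating between neighbouring minority samples, can be illustrated with a short pure-Python sketch. It is not the authors' implementation: it treats the task as binary (High risk coded as 1, everything else as 0), assumes "Kappa" means Cohen's kappa, uses hypothetical toy data rather than the study's survey data, and the function names `smote` and `binary_metrics` are hypothetical. In practice a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`) would normally be used. A common practice, assumed here rather than stated in the abstract, is to oversample only the training split so that test metrics reflect the original class distribution. ROC-AUC and PR-AUC are omitted because they require predicted scores rather than hard labels.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE-style oversampler (hypothetical helper, pure Python).

    Each synthetic point is x + u * (nn - x): x is a randomly chosen
    minority sample, nn is one of its k nearest minority neighbours
    (Euclidean distance), and u is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Nearest minority neighbours of x, excluding x itself.
        others = sorted((p for j, p in enumerate(minority) if j != i),
                        key=lambda p: math.dist(x, p))
        nn = rng.choice(others[:k])
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, nn)])
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics with the minority ("High") class as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement vs chance agreement from the marginals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {
        "accuracy": accuracy,
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical toy labels (NOT the study's data): 80 Low (0), 20 High (1).
    y_true = [0] * 80 + [1] * 20
    always_low = [0] * 100  # a classifier that never predicts High
    print(binary_metrics(y_true, always_low))
    # accuracy 0.8, yet recall, MCC and kappa are all 0.0

    # Oversample a hypothetical 2-feature minority training set to parity.
    rng = random.Random(1)
    minority = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)]
    new_points = smote(minority, n_synthetic=60, k=5)
    print(len(minority) + len(new_points))  # 80, matching the majority count
```

Running the script shows an accuracy of 0.8 for the always-Low classifier while recall, MCC, and Kappa are all 0.0 and balanced accuracy is 0.5, the same pattern the abstract reports for some models without oversampling. Because every synthetic point lies on a segment between two real minority samples, SMOTE adds variety inside the minority region rather than duplicating rows, which is why it can raise minority recall.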
Pages: 2106-2117
Copyright (c) 2026 Ika Nur Laily Fitriana, Fonda Leviany, Kurnia Sari Kasmiarno, Mohammad Okky Mabruri

This work is licensed under a Creative Commons Attribution 4.0 International License.