Application of Supervised Learning Methods and Resampling Techniques for Predicting Financial Transaction Fraud


  • Elven Constancio, Universitas Sriwijaya, Palembang, Indonesia
  • Ken Ditha Tania *, Universitas Sriwijaya, Palembang, Indonesia
  • (*) Corresponding Author
Keywords: Financial Transaction Fraud; SMOTE; ADASYN; Undersampling; XGBoost

Abstract

Financial transaction fraud can severely damage a company's stability and cause large losses for shareholders, the industry, and the wider market. As such fraud becomes more common, effective methods are needed to detect and prevent it accurately. This study compares the performance of five machine learning models (Random Forest, K-Nearest Neighbors (KNN), Decision Tree, XGBoost, and Extra Trees) in detecting financial transaction fraud on an imbalanced dataset. To address the class imbalance, three resampling techniques are applied: Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), and undersampling. Experiments were run with two train/test split ratios, 70:30 and 80:20. The evaluation showed that XGBoost was the most consistent model, reaching the highest ROC AUC of 99%, especially after resampling. The 80:20 split produced a more balanced distribution and better performance on the minority class, particularly after resampling. The study concludes that XGBoost combined with resampling is highly effective at handling data imbalance.
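
The resampling and evaluation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses only the Python standard library, toy data, and simplified stand-ins for the real techniques. The helper names (`stratified_split`, `random_undersample`, `smote_like`, `roc_auc`) and all parameter values are hypothetical and chosen for illustration only. It also assumes that resampling is applied to the training portion only, which is common practice but is not stated in the abstract.

```python
import random


def stratified_split(labels, test_ratio, seed=0):
    """Split row indices into train/test while keeping class proportions similar."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_ratio))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def random_undersample(indices, labels, seed=0):
    """Randomly drop rows from larger classes until every class matches the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(rows) for rows in by_class.values())
    kept = []
    for rows in by_class.values():
        kept.extend(rng.sample(rows, n_min))
    return sorted(kept)


def smote_like(minority_points, n_new, k=3, seed=0):
    """Simplified SMOTE idea: interpolate between a minority point and one of
    its k nearest minority neighbours. Needs at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_points)
        others = [p for p in minority_points if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy imbalanced labels: 95 legitimate (0) and 5 fraudulent (1) transactions.
    labels = [0] * 95 + [1] * 5
    train_idx, test_idx = stratified_split(labels, test_ratio=0.2)  # 80:20 split
    balanced = random_undersample(train_idx, labels)
    print(len(train_idx), len(test_idx), len(balanced))

    fraud_points = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
    print(smote_like(fraud_points, n_new=2))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In practice, libraries such as imbalanced-learn and scikit-learn provide tested implementations of SMOTE, ADASYN, undersampling, and ROC AUC. ADASYN differs from this SMOTE-style sketch in that it generates more synthetic points around minority samples that are harder to classify.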

Article History
Submitted: 2024-10-21
Published: 2024-12-03
How to Cite
Constancio, E., & Tania, K. (2024). Penerapan Metode Supervised Learning dan Teknik Resampling untuk Prediksi Penipuan Transaksi Keuangan. Building of Informatics, Technology and Science (BITS), 6(3), 1427-1439. https://doi.org/10.47065/bits.v6i3.6110