Undersampling and K-Fold Random Forest for Imbalanced Class Classification
Abstract
Classification in data mining is a modelling process that describes and distinguishes data classes in order to predict the class of an object whose class is unknown. Classification is applied in many fields, so a large number of classification algorithms have been developed over time. A problem often encountered in classification is data imbalance. Class imbalance is a condition in which the number of observations differs significantly from one class to another. Most classification datasets do not have the same number of observations in each class, and the imbalance is not a problem when the class proportions are similar. When the difference is large and left untreated, however, the model's predictions tend toward the majority class, so the minority class contributes little to the model. Resampling is one of the approaches most often used to handle imbalanced classes. The purpose of this research is to apply Random Forest with undersampling and Random Forest with K-fold undersampling to the Breast Cancer Diagnostic dataset from the UCI Machine Learning Repository. Undersampling was chosen because it produces better accuracy than oversampling. Recall is 83% for the Random Forest algorithm with 10-fold cross-validation and 65% for the Random Forest algorithm with undersampling.
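The ideas named in the abstract (random undersampling, K-fold splitting, and recall) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_undersample`, `k_fold_indices`, and `recall` are hypothetical, the toy data are synthetic rather than the Breast Cancer Diagnostic dataset, and no Random Forest is trained here. In practice a library classifier (for example scikit-learn's `RandomForestClassifier`) would be fitted on each undersampled training fold.

```python
import random
from collections import Counter, defaultdict


def random_undersample(X, y, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Synthetic toy data (assumption): 90 majority rows (0), 10 minority rows (1).
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10

# A model that always predicts the majority class looks accurate
# but never finds a minority case.
always_majority = [0] * len(y)
accuracy = sum(t == p for t, p in zip(y, always_majority)) / len(y)
print("accuracy:", accuracy, "recall:", recall(y, always_majority))

# Undersampling balances the class counts.
X_bal, y_bal = random_undersample(X, y)
print("balanced counts:", Counter(y_bal))

# Inside cross-validation, undersample only the training part of each fold
# so that the test fold keeps the original class proportions.
for train, test in k_fold_indices(len(y), k=10):
    X_tr, y_tr = random_undersample([X[i] for i in train], [y[i] for i in train])
    # A classifier would be fitted on (X_tr, y_tr) and scored on the test fold here.
```

Undersampling inside each training fold, rather than once before splitting, is a design choice made in this sketch; the abstract does not state which order the study used.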
Pages: 1967–1974
Copyright (c) 2023 Laila Qadrini

This work is licensed under a Creative Commons Attribution 4.0 International License.