The Utilization of Resampling Techniques and the Random Forest Method in Data Classification
Abstract
Random forest is one of several methods available for data classification. It handles non-linear data well, is robust to outliers and noise, and is easy to apply while still producing high-quality classification results. Class imbalance, where the classes differ markedly in their number of instances, is a common problem. When data are imbalanced, most classification models tend to favor the majority class, which can lead to overfitting and unsatisfactory classification results. Resampling techniques can be applied to address this. One such technique is SMOTE, an oversampling method that augments the minority class by generating synthetic data points. This research evaluates the accuracy of data classification with the random forest method and assesses the effect of combining resampling with random forest on classification performance. The study uses simulated breast cancer data and real-world low birth weight (LBW) patient data from Puskesmas Banggae I, Kabupaten Majene. For the breast cancer data, the analysis yields an accuracy of 94.74%, a sensitivity of 93.33%, and an F1-score of 95.89%. For the LBW data, the accuracy is 73.75%, the sensitivity 77.63%, and the F1-score 84.89%.
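The oversampling idea and the reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote_oversample` and `classification_metrics`, the toy points, and the confusion-matrix counts are all hypothetical and chosen only for illustration. The sketch assumes numeric features and Euclidean distance, and it picks base points at random rather than generating a fixed number of synthetic points per minority sample.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE-style sketch: create synthetic minority points by
    interpolating between a minority point and one of its k nearest
    minority neighbors (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity (recall), and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, f1


# Hypothetical toy data, not taken from the study.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_oversample(minority, n_synthetic=6, k=3)

# Hypothetical confusion-matrix counts, not the study's results.
accuracy, sensitivity, f1 = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

Because each synthetic point is a convex combination of two existing minority points, it always lies on the line segment between them. In practice the synthetic points would be added to the training set only, and a random forest would then be fitted on the rebalanced data and evaluated on untouched test data.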
Pages: 252-259
Copyright (c) 2023 Laila Qadrini

This work is licensed under a Creative Commons Attribution 4.0 International License.