The Application of The Neighborhood Cleaning Rule in Conjunction with Random Forest, K-Fold Cross-Validation, and Grid Search for Addressing Imbalanced Datasets
Abstract
Classification in data mining is the process of finding a model that describes and distinguishes data classes so that the class of an unlabeled item can be predicted. Because classification applies to a wide range of problems, numerous methods have been developed. A common issue, however, is class imbalance: real-world classification datasets rarely contain an equal number of examples in each class. Imbalance is not a problem when the class distributions differ only slightly, but when the difference is large, prediction models tend to skew toward the majority class, the minority class contributes little to the model, and predictive performance suffers. Resampling is a frequently used strategy for addressing class imbalance. This study applies a resampling algorithm, the Neighborhood Cleaning Rule (NCL), together with Random Forest, K-Fold Cross-Validation, and Grid Search tuning, to data comprising cases of Low Birth Weight Infants (BBLR) in Majene Regency and breast cancer diagnoses posted on the UCI website. NCL is a data-cleaning method that removes noisy examples from datasets used for modeling or analysis. The resulting model achieves good F1-Score, G-Mean, Accuracy, and Sensitivity values.
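The abstract names four evaluation measures without defining them. The sketch below shows how they are commonly computed from a binary confusion matrix, assuming the minority class is treated as the positive class and that G-Mean is the geometric mean of sensitivity and specificity. These definitions, the function names, and the example counts are assumptions for illustration and are not taken from the study.

```python
import math


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def imbalance_metrics(tp, fn, fp, tn):
    """Compute Accuracy, Sensitivity, F1-Score and G-Mean from confusion counts.

    tp, fn: minority (positive) examples predicted correctly / incorrectly.
    fp, tn: majority (negative) examples predicted incorrectly / correctly.
    """
    sensitivity = safe_div(tp, tp + fn)   # recall on the minority class
    specificity = safe_div(tn, tn + fp)   # recall on the majority class
    precision = safe_div(tp, tp + fp)
    accuracy = safe_div(tp + tn, tp + fn + fp + tn)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "f1": f1,
        "g_mean": g_mean,
    }


if __name__ == "__main__":
    # Hypothetical counts: 50 minority and 150 majority examples.
    for name, value in imbalance_metrics(tp=40, fn=10, fp=20, tn=130).items():
        print(f"{name}: {value:.4f}")
```

Accuracy alone can look high on an imbalanced dataset even when most minority examples are missed, which is why Sensitivity, F1-Score, and G-Mean are usually reported alongside it.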
Pages: 286-293
Copyright (c) 2023 Laila Qadrini, Muh Hijrah, Laelatul Hikmah, Handayani Handayani

This work is licensed under a Creative Commons Attribution 4.0 International License.