Application of the GA-RU Method to the Random Forest Algorithm to Address Class Imbalance in KIP-Kuliah Scholarship Data

Febrian Nor Rahman; Taghfirul Azhima Yoga Siswa; Rudiman Rudiman

doi:10.47065/bits.v6i4.6757

Febrian Nor Rahman Universitas Muhammadiyah Kalimantan Timur, Samarinda, Indonesia
Taghfirul Azhima Yoga Siswa * Universitas Muhammadiyah Kalimantan Timur, Samarinda, Indonesia
Rudiman Rudiman Universitas Muhammadiyah Kalimantan Timur, Samarinda, Indonesia

(*) Corresponding Author

Keywords: Genetic Algorithm; Random Undersampling; Random Forest; Classification

Abstract

Class imbalance is a common challenge in data analysis: the majority class significantly outnumbers the minority class. Classification models trained on such data tend to favor the majority class, which lowers accuracy on the minority class. This study applies a Genetic Algorithm (GA) combined with Random Undersampling (RU) to the Random Forest algorithm to address class imbalance in a dataset of Indonesia Smart Card (KIP) scholarship recipients at Universitas Muhammadiyah Kalimantan Timur. The dataset comprises 1,080 records with 37 features describing the socio-economic circumstances of the recipients; 1,075 records remained after data cleaning. Random Undersampling improved the accuracy of the Random Forest model from 84.27% to 85.06%, a gain of 0.79 percentage points. Although the improvement is modest, it is meaningful because it reflects more stable classification of the minority class, which previously had low accuracy. The combination of GA and RU proved effective in enhancing model performance. This study is expected to contribute to more accurate and efficient scholarship selection systems and to serve as a reference for research in data mining and machine learning.
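
The Random Undersampling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_undersample`, the toy rows, and the class labels "accepted" and "rejected" are hypothetical, and the sketch assumes the common variant in which every class is reduced to the size of the smallest class. The abstract does not say how the Genetic Algorithm is combined with this step, so that part is left out.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop records from the larger classes until every class
    has as many records as the smallest class (assumed variant)."""
    rng = random.Random(seed)

    # Group record indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class is cut down to the size of the smallest class.
    target = min(len(idxs) for idxs in by_class.values())

    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, target))
    keep.sort()  # preserve the original record order

    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: 8 majority-class and 2 minority-class records.
rows = [[i, i * 2] for i in range(10)]
labels = ["accepted"] * 8 + ["rejected"] * 2

X_bal, y_bal = random_undersample(rows, labels, seed=42)
print(Counter(y_bal))
```

After balancing, both classes contain the same number of records, and a classifier such as Random Forest would be trained on `X_bal` and `y_bal` instead of the original data. The trade-off is that majority-class records are discarded, which matters on a dataset of roughly a thousand records.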
