Application of the GA-CBU Method to the Logistic Regression Algorithm to Address Class Imbalance in KIP-Kuliah Scholarship Data


  • Ahmad Nugraha Poernamawan, Universitas Muhammadiyah Kalimantan Timur, Samarinda, Indonesia
  • Taghfirul Yoga Azhima Siswa (*), Universitas Muhammadiyah Kalimantan Timur, Samarinda, Indonesia
  • Rudiman Rudiman, Universitas Muhammadiyah Kalimantan Timur, Samarinda, Indonesia
  • (*) Corresponding Author
Keywords: classification; class imbalance; LR; GA; CBU

Abstract

Class imbalance is a common challenge in data analysis: the majority class contains significantly more instances than the minority class. As a result, classification models tend to be biased toward predicting the majority class and identify the minority class with low accuracy. This research implements the Logistic Regression (LR) algorithm combined with Clustering-Based Undersampling (CBU) as the undersampling technique and a Genetic Algorithm (GA) for feature selection and optimization, in order to classify KIP-Kuliah scholarship data at Muhammadiyah University of East Kalimantan and to overcome the class imbalance in that data. Model performance is evaluated with 10-Fold Cross Validation and a Confusion Matrix. The data consists of 1075 records with 37 features related to the socio-economic factors of scholarship recipients. Applying the CBU method increased the accuracy of the Logistic Regression model from 62.51% to 67.68%. Furthermore, the combination of GA and CBU provided more stable results in classifying the minority class. It is hoped that this research can contribute to the development of a more accurate and efficient scholarship recipient selection system and serve as a reference for future studies in data mining and machine learning.
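
The undersampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which CBU variant the authors used, so this example assumes one common form: cluster the majority class into as many groups as there are minority samples, then keep the real majority sample nearest to each centroid. The function names `kmeans` and `cluster_based_undersample` and the synthetic two-feature data are hypothetical, and the GA feature selection and Logistic Regression steps are left out.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over a list of feature tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def cluster_based_undersample(majority, minority, seed=0):
    """Assumed CBU variant: keep one real majority sample per cluster centroid."""
    if len(majority) < len(minority):
        raise ValueError("majority class must be at least as large as minority")
    centroids = kmeans(majority, k=len(minority), seed=seed)
    remaining = list(majority)
    kept = []
    for c in centroids:
        nearest = min(remaining, key=lambda p: math.dist(p, c))
        kept.append(nearest)
        remaining.remove(nearest)  # never pick the same sample twice
    return kept


# Synthetic, illustrative data only (not the scholarship dataset).
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(8)]

kept = cluster_based_undersample(majority, minority)
print(len(kept), len(minority))  # both classes now have the same size
```

Keeping real samples rather than the centroids themselves means the classifier is trained only on observed records; other CBU variants use the centroids directly or sample randomly within each cluster.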


Article History
Submitted: 2025-01-15
Published: 2025-03-01
How to Cite
Poernamawan, A., Siswa, T., & Rudiman, R. (2025). Penerapan Metode GA-CBU Pada Algoritma Logistic Regression Untuk Mengatasi Class Imbalance Data Beasiswa KIP-Kuliah. Building of Informatics, Technology and Science (BITS), 6(4), 2322-2334. https://doi.org/10.47065/bits.v6i4.6747