Oversampling, Undersampling, SMOTE, SVM, and Random Forest in the Classification of Bidikmisi Recipients throughout East Java in 2017
Abstract
Bidikmisi is government tuition assistance for graduates of senior high school (SMA) or its equivalent who have good academic potential but limited economic means. It differs from scholarships, which focus on awards or financial support for those who excel; the achievement requirements for Bidikmisi are instead intended to ensure that recipients are selected from those who truly have the potential and the willingness to complete higher education. Because the assistance must reach the right people, this study classifies the 2017 Bidikmisi recipients in East Java. The data are imbalanced: the "Accepted" class contains more records than the "Not accepted" class. When data are imbalanced, almost all classification algorithms produce much higher accuracy for the majority class than for the minority class, so the class imbalance is handled before classification. Three resampling techniques are used, namely Oversampling, Undersampling, and SMOTE, together with two classification methods, namely SVM and Random Forest. Oversampling was chosen because it does not reduce the amount of data; instead it adds records to the under-represented minority class. The oversampling algorithm used is the Synthetic Minority Over-sampling Technique (SMOTE), chosen from among several resampling algorithms because it produces good accuracy and is effective on imbalanced classes: generating synthetic minority records, rather than duplicating existing ones, reduces overfitting.
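The three resampling techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`random_oversample`, `random_undersample`, `smote`), the two-feature toy data, the 90/10 class split, and the choice of k = 5 neighbours are all assumptions made for illustration, and the SVM and Random Forest steps are omitted. The sketch uses only the Python standard library; in practice a library such as imbalanced-learn would normally be used.

```python
import math
import random


def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority rows until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra


def random_undersample(majority, target_size, rng):
    """Keep a random subset of the majority rows, drawn without replacement."""
    return rng.sample(majority, target_size)


def smote(minority, n_synthetic, k, rng):
    """Create synthetic rows by interpolating between minority neighbours."""
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base row (the row itself excluded)
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 90 "Accepted" rows (majority), 10 "Not accepted" rows (minority)
rng = random.Random(0)
accepted = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(90)]
not_accepted = [(rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)) for _ in range(10)]

over = random_oversample(not_accepted, len(accepted), rng)
under = random_undersample(accepted, len(not_accepted), rng)
new_rows = smote(not_accepted, len(accepted) - len(not_accepted), k=5, rng=rng)
smoted = not_accepted + new_rows

print(len(over), len(under), len(smoted))  # 90 10 90
```

Random oversampling only repeats existing minority rows, undersampling discards majority rows, and SMOTE places each new row on the line segment between a minority row and one of its nearest minority neighbours. In an actual experiment the resampling would be applied to the training split only, so that the test split keeps the original class ratio.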
Pages: 386-391
Copyright (c) 2022 Laila Qadrini, Hikmah Hikmah, Megasari Megasari

This work is licensed under a Creative Commons Attribution 4.0 International License.