Oversampling, Undersampling, SMOTE, SVM, and Random Forest for Classifying Bidikmisi Recipients Across East Java in 2017


  • Laila Qadrini (*) Universitas Sulawesi Barat, Majene, Indonesia
  • Hikmah Hikmah Universitas Sulawesi Barat, Majene, Indonesia
  • Megasari Megasari Universitas Sulawesi Barat, Majene, Indonesia
  • (*) Corresponding Author
Keywords: Oversampling; Undersampling; SMOTE

Abstract

Bidikmisi is a government tuition-assistance program for graduates of senior high school (SMA) or equivalent who have good academic potential but limited economic means. Unlike scholarships, which focus on rewarding or financially supporting high achievers, Bidikmisi sets achievement requirements to ensure that recipients are selected from among those who genuinely have the potential and willingness to complete higher education. Because Bidikmisi must reach the right recipients, this study classifies the 2017 Bidikmisi recipients in East Java. The data are imbalanced: the "Accepted" class is larger than the "Not accepted" class. On imbalanced data, almost all classification algorithms achieve much higher accuracy on the majority class than on the minority class, so the class imbalance must be handled. This study compares three resampling techniques, Oversampling, Undersampling, and SMOTE, combined with two classification methods, SVM and Random Forest. Oversampling was chosen because it does not discard any data; instead, it adds samples to the under-represented minority class. The oversampling algorithm used is the Synthetic Minority Over-sampling Technique (SMOTE), selected from among several resampling algorithms because it yields good accuracy and handles class imbalance effectively while reducing the overfitting that simply duplicating minority samples can cause.
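The resampling ideas above can be sketched in plain Python. This is a minimal illustration only, not the authors' implementation. It assumes a hypothetical two-feature toy dataset (not the study's data) and uses hypothetical helper names (`random_oversample`, `random_undersample`, `smote`, `majority_baseline_accuracy`). It also shows why raw accuracy misleads on imbalanced data: a model that always predicts the majority class scores 0.75 here while missing every "Not accepted" case.

```python
import random
from collections import Counter

# Hypothetical toy data: (feature vector, label). Values are illustrative only,
# NOT the 2017 East Java Bidikmisi data. "Accepted" is the majority class, as in the study.
DATA = [
    ([3.1, 0.9], "Accepted"), ([2.8, 1.1], "Accepted"), ([3.5, 0.7], "Accepted"),
    ([3.0, 1.3], "Accepted"), ([2.9, 0.8], "Accepted"), ([3.3, 1.0], "Accepted"),
    ([1.2, 2.9], "Not accepted"), ([1.5, 3.2], "Not accepted"),
]


def split_by_class(data):
    """Group feature vectors by class label, preserving first-seen order."""
    groups = {}
    for x, y in data:
        groups.setdefault(y, []).append(x)
    return groups


def majority_baseline_accuracy(data):
    """Accuracy of a trivial model that always predicts the majority class."""
    counts = Counter(y for _, y in data)
    return counts.most_common(1)[0][1] / len(data)


def random_oversample(data, rng):
    """Duplicate random samples of each smaller class until it matches the largest class."""
    groups = split_by_class(data)
    target = max(len(xs) for xs in groups.values())
    out = list(data)
    for y, xs in groups.items():
        out += [(list(rng.choice(xs)), y) for _ in range(target - len(xs))]
    return out


def random_undersample(data, rng):
    """Keep a random subset of every class, sized to the smallest class."""
    groups = split_by_class(data)
    target = min(len(xs) for xs in groups.values())
    return [(x, y) for y, xs in groups.items() for x in rng.sample(xs, target)]


def smote(data, minority, k, rng):
    """SMOTE-style oversampling (after Chawla et al., 2002): each synthetic sample lies
    on the segment between a random minority sample and one of its k nearest
    minority-class neighbours. Assumes the minority class has at least 2 samples."""
    groups = split_by_class(data)
    minority_xs = groups[minority]
    n_new = max(len(xs) for xs in groups.values()) - len(minority_xs)
    out = list(data)
    for _ in range(n_new):
        x = rng.choice(minority_xs)
        # Other minority samples, sorted by squared Euclidean distance to x
        others = sorted(
            (z for z in minority_xs if z is not x),
            key=lambda z: sum((a - b) ** 2 for a, b in zip(x, z)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        out.append(([a + gap * (b - a) for a, b in zip(x, neighbour)], minority))
    return out


rng = random.Random(0)
print("majority-only accuracy:", majority_baseline_accuracy(DATA))  # 6/8 = 0.75
for name, resampled in [
    ("original", DATA),
    ("oversampling", random_oversample(DATA, rng)),
    ("undersampling", random_undersample(DATA, rng)),
    ("SMOTE", smote(DATA, "Not accepted", k=1, rng=rng)),
]:
    print(name, dict(Counter(y for _, y in resampled)))
```

Every resampled set comes out balanced: 6/6 after random oversampling and SMOTE, and 2/2 after undersampling. Random oversampling only copies existing minority points. SMOTE instead creates new points between minority neighbours, which is the usual argument for why it overfits less than duplication. In a real pipeline, resampling would normally be applied to the training split only. The balanced data would then be passed to classifiers such as scikit-learn's `SVC` and `RandomForestClassifier`. The imbalanced-learn library provides ready-made `RandomOverSampler`, `RandomUnderSampler`, and `SMOTE` classes for the same three techniques.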

Article History
Submitted: 2022-08-20
Published: 2022-09-04