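Implementasi Metode Resampling Dalam Menangani Data Imbalance Pada Klasifikasi Multiclass Penyakit Thyroid (Implementation of Resampling Methods for Handling Data Imbalance in Multiclass Classification of Thyroid Disease)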
Abstract
It is estimated that at least 17 million Indonesians suffer from thyroid disorders, and nearly 60% of people living with a thyroid disorder remain undiagnosed. Research that applies methods for predicting thyroid disease is therefore needed. An accurate prediction model depends on a sound classification method, and optimal classification results in turn require balanced data. Data imbalance is a condition in which the classes in a dataset are unevenly represented, which can bias the resulting model. The main objective of this research is to improve the accuracy of early detection of thyroid disease by addressing data imbalance and applying appropriate classification algorithms. The research began with the collection and analysis of a dataset of 9172 data points. Preprocessing then yielded 5321 training data points and 1331 test data points. The testing phase combined 7 classification algorithms with 7 resampling methods and evaluated the results using a confusion matrix. The highest accuracy, 98%, was obtained by combining the Random Forest algorithm with the Random Over Sampling method. It can be concluded that combining Random Forest with the Random Over Sampling resampling method can improve the accuracy of early detection of thyroid disease.
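As an illustration of the two ideas the abstract relies on, the following minimal sketch shows random over-sampling (duplicating randomly chosen minority-class rows until every class matches the largest one) and a multiclass confusion matrix. It is not the authors' code: the function names, the three class labels, and the toy 6:2:1 data are assumptions made up for this example, and it uses only the Python standard library rather than a resampling package. In practice, resampling would normally be applied to the training split only, so that the test data keep their original class distribution.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def confusion_matrix(y_true, y_pred, classes):
    """matrix[i][j] counts samples of classes[i] predicted as classes[j]."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy, made-up data: three assumed classes with a 6:2:1 imbalance.
rows = [[i] for i in range(9)]
labels = ["negative"] * 6 + ["hypothyroid"] * 2 + ["hyperthyroid"]

balanced_rows, balanced_labels = random_oversample(rows, labels, seed=42)
balanced_counts = Counter(balanced_labels)

# A model that always predicts the majority class still looks fairly
# accurate on the imbalanced data, while missing every minority sample.
classes = ["negative", "hypothyroid", "hyperthyroid"]
always_majority = ["negative"] * len(labels)
cm = confusion_matrix(labels, always_majority, classes)
accuracy = sum(cm[i][i] for i in range(len(classes))) / len(labels)
```

The confusion matrix makes the bias visible: in this toy case every minority-class sample lands in the first column, which a single accuracy figure would hide.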
Pages: 890−900
Copyright (c) 2024 Najmi Cahaya Nugraha
This work is licensed under a Creative Commons Attribution 4.0 International License.