Comparison of the C4.5, Naive Bayes, K-Nearest Neighbor, and Random Forest Algorithms for Predicting the Causal Factors of Diabetes
Abstract
Diabetes is a chronic metabolic disease characterized by elevated blood glucose levels; it can cause serious complications and contributes to high mortality rates worldwide. A central problem in managing diabetes is classifying patient status accurately from laboratory test data so that appropriate treatment can be given. This study compares the performance of the C4.5, Naive Bayes, K-Nearest Neighbor (KNN), and Random Forest algorithms in classifying diabetes patient data. The dataset was drawn from Electronic Health Records (EHRs) of patients at Rantauprapat Regional General Hospital and comprises 10,000 records with eight attributes and one class attribute: 859 diabetic and 9,141 non-diabetic patients. The data were split into training and testing sets at ratios of 90:10, 80:20, and 70:30. Model performance was evaluated using accuracy and the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC). C4.5 and Random Forest produced higher accuracy than Naive Bayes and KNN, especially at the 90:10 and 70:30 ratios. In the ROC evaluation, Random Forest obtained the highest AUC values: 0.972 at the 70:30 ratio and 0.970 at the 80:20 ratio. These results indicate that C4.5 and Random Forest perform relatively better than the other two algorithms and are nearly equivalent to each other in classifying diabetes, as measured by accuracy and AUC.
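The evaluation protocol described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the abstract does not say whether the splits were stratified, so the stratified split here is an assumption, and the function names, the seed, and the majority-class baseline are hypothetical additions for illustration. Only the class counts (859 and 9,141) and the three split ratios come from the abstract. The sketch shows why AUC is a useful companion to accuracy on data this imbalanced: a model that labels every patient as non-diabetic already scores about 91.4% accuracy on each test split, while its AUC stays at 0.5.

```python
import random

# Class counts and split ratios as stated in the abstract.
N_POS, N_NEG = 859, 9141
RATIOS = [(90, 10), (80, 20), (70, 30)]


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), keeping class proportions in both parts.

    Stratification is an assumption; the abstract does not specify it.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation; tied scores share a rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Synthetic label vector with the stated class balance (1 = diabetic).
labels = [1] * N_POS + [0] * N_NEG

results = {}
for train_pct, test_pct in RATIOS:
    train_idx, test_idx = stratified_split(labels, test_pct / 100)
    y_test = [labels[i] for i in test_idx]
    # Hypothetical baseline: predict "non-diabetic" with a constant score.
    baseline_pred = [0] * len(y_test)
    baseline_score = [0.0] * len(y_test)
    results[(train_pct, test_pct)] = (
        len(test_idx),
        accuracy(y_test, baseline_pred),
        roc_auc(y_test, baseline_score),
    )
    n, acc, auc = results[(train_pct, test_pct)]
    print(f"{train_pct}:{test_pct}  test n={n}  baseline acc={acc:.3f}  AUC={auc:.3f}")
```

Any of the four classifiers could be slotted in where the baseline predictions are built, with its predicted labels passed to `accuracy` and its predicted scores passed to `roc_auc`.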
Pages: 2118-2126
Copyright (c) 2025 Muhammad Bagus Fadli, Irwan Purnama, Rohani Rohani

This work is licensed under a Creative Commons Attribution 4.0 International License.