Comparison of the C4.5, Naive Bayes, K-Nearest Neighbor, and Random Forest Algorithms for Predicting the Factors That Cause Diabetes


  • Muhammad Bagus Fadli (*), Universitas Labuhanbatu, Rantauprapat, Indonesia
  • Irwan Purnama, Universitas Labuhanbatu, Rantauprapat, Indonesia
  • Rohani Rohani, Universitas Labuhanbatu, Rantauprapat, Indonesia
  • (*) Corresponding Author
Keywords: Diabetes; Decision Tree; C4.5; Naive Bayes; K-Nearest Neighbor; Random Forest

Abstract

Diabetes is a chronic metabolic disease characterized by elevated blood glucose levels; it can cause serious complications and contributes to high mortality worldwide. A central problem in managing diabetes is classifying patient status accurately from laboratory test data so that appropriate treatment can be given. This study compares the performance of the C4.5, Naive Bayes, K-Nearest Neighbor (KNN), and Random Forest algorithms in classifying diabetes patient data. The dataset was drawn from Electronic Health Records (EHRs), with research subjects from Rantauprapat Regional General Hospital, and comprises 10,000 records with eight attributes and one class attribute: 859 records of patients with diabetes and 9,141 records of patients without diabetes. The data were split into training and testing sets at ratios of 90:10, 80:20, and 70:30. Model performance was evaluated using accuracy and the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC). C4.5 and Random Forest achieved higher accuracy than Naive Bayes and KNN, especially at the 90:10 and 70:30 ratios. In the ROC evaluation, Random Forest obtained the highest AUC values: 0.972 at the 70:30 ratio and 0.970 at the 80:20 ratio. These results indicate that C4.5 and Random Forest perform better than the other two algorithms and are nearly equivalent to each other in classifying diabetes, in terms of both accuracy and AUC.
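The split ratios and the two evaluation measures described in the abstract can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the abstract does not say which tool was used or whether the splits were stratified, and the helper names `stratified_split`, `accuracy`, and `roc_auc` are hypothetical. Only the class counts (859 and 9,141) and the three ratios come from the abstract. The sketch also shows why AUC is a useful companion to accuracy here: with these class counts, a model that always predicts "non-diabetes" already scores about 0.914 accuracy.

```python
import random

def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), keeping each class's share in both parts.
    Stratification is an assumption; the abstract does not state it."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC via the rank-sum identity: the probability that a random
    positive is scored above a random negative (ties count as 0.5)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Class counts from the abstract: 1 = diabetes, 0 = non-diabetes.
labels = [1] * 859 + [0] * 9141

for test_fraction in (0.10, 0.20, 0.30):
    train_idx, test_idx = stratified_split(labels, test_fraction)
    y_test = [labels[i] for i in test_idx]
    # Majority-class baseline: always predict "non-diabetes".
    baseline = accuracy(y_test, [0] * len(y_test))
    print(f"test={test_fraction:.0%} train_n={len(train_idx)} "
          f"test_n={len(test_idx)} baseline_accuracy={baseline:.3f}")
```

In a real comparison, each classifier's predicted labels would be passed to `accuracy` and its predicted scores or probabilities to `roc_auc`, once per split ratio.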

Article History
Submitted: 2025-11-11
Published: 2025-12-31
How to Cite
Fadli, M., Purnama, I., & Rohani, R. (2025). Komparasi Perbandingan Algoritma C4.5, Naive Bayes, K-Nearest Neighbor, Random Forest Untuk Prediksi Faktor Penyebab Penyakit Diabetes. Building of Informatics, Technology and Science (BITS), 7(3), 2118-2126. https://doi.org/10.47065/bits.v7i3.8683