Performance Comparison of the K-Nearest Neighbors and Decision Tree Algorithms for Diabetes Classification


  • Amar Haris Yunianto (*), Universitas Dian Nuswantoro, Semarang, Indonesia
  • Egia Rosi Subhiyakto, Universitas Dian Nuswantoro, Semarang, Indonesia
  • (*) Corresponding Author
Keywords: Diabetes; Classification; K-Nearest Neighbors; Decision Tree; SMOTE

Abstract

Diabetes is a chronic metabolic disease and a major global health concern because of its rising prevalence, including in Indonesia, and its significant impact on individual health and on health systems. This study compares the performance of the K-Nearest Neighbors (KNN) and Decision Tree (DT) algorithms for diabetes classification on the Pima Indians Diabetes Database (PIDD). The method comprises data collection, pre-processing, missing-value handling, outlier detection and handling, and class balancing with the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance in the dataset. Model parameters are tuned with GridSearchCV, and performance is evaluated using the accuracy, precision, recall, and F1 score metrics. The results show that DT outperforms KNN in accuracy both without and with SMOTE. Without SMOTE, DT achieved 85.71% accuracy, while KNN reached 83.12%. With SMOTE, both algorithms improved substantially: DT achieved 92% accuracy, 94% precision, 90.38% recall, and a 92.16% F1 score, while KNN achieved 91% accuracy, 96.59% recall, and a 90.43% F1 score. The study shows that SMOTE effectively improved model performance on imbalanced data, and that DT delivered the more stable performance of the two algorithms. These findings are expected to contribute to the development of more accurate prediction models for diabetes diagnosis and to broaden insight into the application of machine learning in healthcare.
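The four evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The sketch below is a minimal, standard-library illustration of those formulas and is not the authors' code. The function names and the confusion-matrix counts are assumptions: the counts are hypothetical values chosen only because they reproduce the DT figures reported with SMOTE, and the abstract does not state the actual confusion matrix or test-set size.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative,
    and true-negative counts for the positive (diabetic) class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts (an assumption, not taken from the paper).
metrics = classification_metrics(tp=47, fp=3, fn=5, tn=45)

for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

With these assumed counts, the reported DT precision (94%) and recall (90.38%) combine to the reported F1 score (92.16%), so the three figures are mutually consistent under the usual definition of F1 as the harmonic mean of precision and recall.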


Article History
Submitted: 2024-12-27
Published: 2025-03-16
How to Cite
Yunianto, A., & Subhiyakto, E. (2025). Perbandingan Kinerja Algoritma K-Nearest Neighbors dan Decision Tree untuk Klasifikasi Diabetes. Building of Informatics, Technology and Science (BITS), 6(4), 2601-2611. https://doi.org/10.47065/bits.v6i4.6550