Evaluation of KNN and Logistic Regression for Diabetes Classification with Standardized Preprocessing: The Trade-off between Performance and Interpretability


  • Alif Zayyin Kamandani, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Egia Rosi Subhiyakto (*), Universitas Dian Nuswantoro, Semarang, Indonesia
  • (*) Corresponding Author
Keywords: Diabetes Mellitus; K-Nearest Neighbors; Logistic Regression; Model Interpretability; ROC-AUC

Abstract

Although K-Nearest Neighbors (KNN) and Logistic Regression have been widely used for diabetes classification, few studies combine a standardized preprocessing pipeline (median imputation, feature standardization, and stratified data splitting) with a systematic evaluation of the trade-off between predictive performance and model interpretability. This study compares both algorithms for classifying diabetes status on the Pima Indians Diabetes dataset, which contains 768 samples with eight numerical attributes. The research stages comprise data exploration, median imputation of missing values, feature standardization with StandardScaler, and a stratified 80:20 train-test split. The models are evaluated using accuracy, precision, recall, F1-score, the confusion matrix, and ROC-AUC. In the experiments, KNN with the optimal parameter K=21 achieves an accuracy of 75.97%, an F1-score of 61.86%, and a ROC-AUC of 0.8120, while Logistic Regression achieves an accuracy of 70.78%, an F1-score of 54.55%, and a ROC-AUC of 0.8130. KNN therefore performs better on the threshold-dependent metrics (accuracy and F1-score), whereas the two models are virtually tied on ROC-AUC. Logistic Regression, in turn, offers interpretability through its coefficients: the coefficients for Glucose (β=1.1825) and BMI (β=0.6887) identify these variables as the main predictors of diabetes risk. These findings indicate a trade-off between accuracy and interpretability: KNN is more suitable for tasks that prioritize predictive accuracy, while Logistic Regression is more appropriate in clinical contexts that require transparency and model accountability.
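
The two preprocessing steps named in the abstract, median imputation and standardization, can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' code. It assumes, as is common practice for the Pima data, that a value of 0 in columns such as Glucose or BMI marks a missing reading. The helper names and the toy BMI column are invented for the example. The z-score divides by the population standard deviation, which is also what scikit-learn's StandardScaler uses.

```python
from statistics import mean, median, pstdev


def median_impute(values, missing=0.0):
    """Replace every entry equal to `missing` with the median of the other entries."""
    observed = [v for v in values if v != missing]
    fill = median(observed)
    return [fill if v == missing else v for v in values]


def standardize(values):
    """Z-score a column: subtract the mean, divide by the population std (ddof=0)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Toy BMI column (invented); 0.0 is assumed to encode a missing reading.
bmi = [33.6, 26.6, 0.0, 28.1, 43.1, 0.0, 31.0]
bmi_imputed = median_impute(bmi)       # zeros become the median of the observed values
bmi_scaled = standardize(bmi_imputed)  # mean 0, population std 1

print(bmi_imputed)  # → [33.6, 26.6, 31.0, 28.1, 43.1, 31.0, 31.0]
print([round(z, 3) for z in bmi_scaled])
```

In a leakage-free variant, the median, mean, and standard deviation would be computed on the training split only and then reused to transform the test split.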
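
A stratified 80:20 split keeps the class ratio roughly the same in the training and test sets. The sketch below uses a hypothetical helper `stratified_split` and an arbitrary seed. It shuffles each class separately and moves 20% of that class to the test set. The labels reproduce the Pima class counts (500 non-diabetic, 268 diabetic) but contain no feature data.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return sorted (train_idx, test_idx), sampling test_size of each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # shuffle within the class only
        n_test = round(len(indices) * test_size)  # per-class test count
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Labels only, with the Pima class counts: 500 non-diabetic (0), 268 diabetic (1).
labels = [0] * 500 + [1] * 268
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))     # → 614 154
print(sum(labels[i] for i in test_idx))  # → 54
```

With these counts the test set holds 154 rows, 54 of them positive, so the positive rate stays close to 35% in both parts.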
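
KNN labels a standardized patient vector by looking at its K closest training vectors. This is why feature scaling matters: without it, large-valued columns such as Insulin would dominate the Euclidean distance. The sketch below is a plain-Python illustration with hypothetical function names and invented two-feature toy data, not the study's implementation. The class-1 probability is the fraction of the K neighbors labelled diabetic. An odd K, such as the reported K=21, rules out tied votes in a two-class problem. `math.dist` requires Python 3.8 or later.

```python
import math


def knn_predict_proba(train_X, train_y, x, k=21):
    """Share of the k nearest training rows (Euclidean distance) labelled 1."""
    # Sorting (distance, label) pairs breaks distance ties in favor of class 0.
    ranked = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    nearest_labels = [label for _, label in ranked[:k]]
    return sum(nearest_labels) / k


def knn_predict(train_X, train_y, x, k=21):
    """Majority vote; with an odd k and two classes the share can never be exactly 0.5."""
    return int(knn_predict_proba(train_X, train_y, x, k) > 0.5)


# Invented toy data: two standardized features (think z-scored Glucose and BMI).
train_X = [(-1.0, -1.0), (-0.8, -0.5), (-1.2, -0.7), (-0.5, -1.1),
           (1.0, 0.9), (1.2, 1.1), (0.8, 0.6), (0.9, 1.3)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

# The toy set has only 8 rows, so k=3 stands in for the study's K=21.
for query in [(1.0, 1.0), (-1.0, -0.8), (0.0, 0.0)]:
    p = knn_predict_proba(train_X, train_y, query, k=3)
    print(query, round(p, 3), knn_predict(train_X, train_y, query, k=3))
```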
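
The evaluation metrics can be computed directly from the confusion-matrix counts. The sketch below uses hypothetical helper names and invented toy labels and scores. It also computes ROC-AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. ROC-AUC depends only on how the scores rank the cases, whereas accuracy and F1-score depend on the 0.5 decision threshold. Two models can therefore differ on accuracy and F1-score yet have almost the same ROC-AUC, as KNN and Logistic Regression do in this study.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 = diabetic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from the confusion-matrix counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy test set: true labels and predicted probabilities of class 1.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.4, 0.3, 0.55, 0.2, 0.7, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # hard labels at a 0.5 threshold

print(confusion_counts(y_true, y_pred))  # → (3, 2, 2, 1)
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))           # → 0.8125
```

The pairwise ROC-AUC loop is quadratic in the number of cases, which is acceptable for a test set of this size.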
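
Because the features are standardized before fitting, each Logistic Regression coefficient can be read as the change in the log-odds of diabetes for a one-standard-deviation increase in that feature, with the other features held fixed. Here p is the predicted probability of diabetes and z_j is the standardized value of attribute j. Assuming the reported β values are on this standardized scale, exponentiating them gives odds ratios:

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{8} \beta_j z_j,
\qquad z_j = \frac{x_j - \mu_j}{\sigma_j}

\mathrm{OR}_j = e^{\beta_j}, \qquad
\mathrm{OR}_{\text{Glucose}} = e^{1.1825} \approx 3.26, \qquad
\mathrm{OR}_{\text{BMI}} = e^{0.6887} \approx 1.99
```

Under that assumption, consider a patient whose glucose is one standard deviation above the mean. That patient has roughly 3.3 times the odds of diabetes of an otherwise identical patient at mean glucose.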

Article History
Submitted: 2026-03-14
Published: 2026-03-31
How to Cite
Kamandani, A., & Subhiyakto, E. (2026). Evaluasi KNN dan Logistic Regression untuk Klasifikasi Diabetes dengan Preprocessing Terstandarisasi: Trade-off Kinerja dan Interpretabilitas. Building of Informatics, Technology and Science (BITS), 7(4), 2679-2689. https://doi.org/10.47065/bits.v7i4.9534
Issue: Vol. 7 No. 4 (2026)
Section: Articles