Performance Analysis of K-Nearest Neighbor with F1-Score Optimization and the SMOTE Technique in Heart Attack Risk Classification

Fikri Luqman Pratama; Muhamad Akrom

doi:10.47065/bits.v7i4.9493

Fikri Luqman Pratama (*), Universitas Dian Nuswantoro, Semarang, Indonesia
Muhamad Akrom, Universitas Dian Nuswantoro, Semarang, Indonesia

(*) Corresponding Author


Keywords: K-Nearest Neighbor; SMOTE; Heart Disease Classification; Medical Machine Learning; Class Imbalance

Abstract

Heart attack is one of the leading causes of death worldwide, making early risk prediction essential for improving patient outcomes. However, many medical datasets suffer from class imbalance: high-risk cases are far fewer than normal cases. This imbalance can bias machine learning models toward the majority class and reduce their ability to detect high-risk patients. This study analyzes the performance of the K-Nearest Neighbor (KNN) algorithm, optimized for F1-score and combined with the Synthetic Minority Over-sampling Technique (SMOTE), for heart attack risk classification. The dataset used is the Heart Attack Dataset, which contains numerical and categorical features. The study takes an experimental approach, building a machine learning pipeline that comprises data preprocessing, missing-value handling, feature standardization, oversampling with SMOTE, and hyperparameter optimization through GridSearchCV with F1-score as the main evaluation metric. Models are evaluated using Stratified 5-Fold Cross-Validation with accuracy, precision, recall, F1-score, and ROC-AUC. The baseline KNN model achieves an accuracy of 98.50%, a precision of 95.27%, a recall of 81.47%, and a ROC-AUC of 0.9278. The KNN model integrated with SMOTE attains a recall of 87.27% and a ROC-AUC of 0.9484, indicating improved detection of heart attack cases and a 31% reduction in false negatives, although precision decreases to 72.15%. These findings show that combining SMOTE with hyperparameter optimization improves model sensitivity, making the model more suitable for medical applications that prioritize patient safety.
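The reported drop in false negatives can be checked from the two recall figures alone. The sketch below is illustrative, not the authors' code: it assumes both models are scored on the same positive cases, so the miss rate is simply 1 minus recall. The function name `false_negative_reduction` is hypothetical.

```python
def false_negative_reduction(recall_before: float, recall_after: float) -> float:
    """Relative drop in the miss rate (1 - recall).

    Assumes the number of actual positive cases is the same for both
    models, so false negatives scale with (1 - recall).
    """
    miss_before = 1.0 - recall_before
    miss_after = 1.0 - recall_after
    return (miss_before - miss_after) / miss_before


# Recall values as reported in the abstract.
baseline_recall = 0.8147  # baseline KNN
smote_recall = 0.8727     # KNN with SMOTE

reduction = false_negative_reduction(baseline_recall, smote_recall)
print(f"Relative reduction in false negatives: {reduction:.1%}")
```

Under that assumption the miss rate falls from 18.53% to 12.73%, a relative reduction of roughly 31%, which matches the figure stated in the abstract.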


