Classification of Diabetes Diseases Based on Medical Features Using Optimized Support Vector Machine
Abstract
Diabetes mellitus is a chronic disease caused by impaired glucose metabolism, and it has become a global health threat whose prevalence rises steadily each year. According to the WHO and the IDF, the number of people living with diabetes is projected to reach 783 million by 2045. This trend calls for an accurate and efficient early detection system to support medical decision-making. This study develops an optimized Support Vector Machine (SVM) classification model to improve the accuracy and interpretability of diabetes prediction. The dataset used is the Pima Indians Diabetes Dataset, which consists of eight medical features, including glucose level, blood pressure, and body mass index (BMI). The research stages are data preprocessing, class balancing with the Synthetic Minority Over-sampling Technique (SMOTE), hyperparameter optimization with GridSearchCV, and interpretability analysis with SHapley Additive exPlanations (SHAP). The optimized SVM model with the Radial Basis Function (RBF) kernel achieved an accuracy of 82%, and recall for the diabetes class rose from 0.564 to 0.83 after optimization. An Area Under the Curve (AUC) value of 0.871 indicates that the model separates the positive and negative classes effectively. The SHAP analysis shows that Glucose, Age, BMI, and Diabetes Pedigree Function are the most influential features in the predictions. These findings indicate that combining normalization, class balancing, hyperparameter optimization, and interpretability analysis yields a reliable and transparent SVM model. The model has strong potential for use in Clinical Decision Support Systems (CDSS) for accurate and explainable early diabetes detection.
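The stages named in the abstract (normalization, class balancing, grid search over an RBF-kernel SVM, and evaluation by accuracy, recall, and AUC) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (eight features as in the abstract; the row count and class ratio are assumed), `smote_like_oversample` is a hypothetical simplified stand-in for SMOTE written with scikit-learn only, `StandardScaler` is an assumed choice of normalization, and the parameter grid and scoring metric are guesses. The SHAP step is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def smote_like_oversample(X, y, k=5, seed=0):
    """Balance a binary problem by interpolating between minority neighbours.

    Hypothetical, simplified stand-in for SMOTE (not the imbalanced-learn code).
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X, y
    X_min = X[y == minority]
    # Ask for k + 1 neighbours and drop the first column (the point itself).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    # Place each synthetic point at a random position on the segment
    # between a minority sample and one of its k nearest neighbours.
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


# Synthetic stand-in data: 8 features; row count and class ratio are assumptions.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           n_redundant=1, weights=[0.65, 0.35], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Oversample the training split only, so the test set keeps its real balance.
X_bal, y_bal = smote_like_oversample(X_train, y_train)

# Scaling lives inside the pipeline so it is refit on every CV training fold.
pipeline = Pipeline([("scale", StandardScaler()),
                     ("svm", SVC(kernel="rbf"))])
param_grid = {"svm__C": [0.1, 1, 10],            # assumed grid
              "svm__gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(pipeline, param_grid, scoring="recall", cv=5)
search.fit(X_bal, y_bal)

y_pred = search.predict(X_test)
scores = search.decision_function(X_test)  # signed margins, used for AUC
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "recall_positive": recall_score(y_test, y_pred, pos_label=1),
    "auc": roc_auc_score(y_test, scores),
}
print(search.best_params_)
print(metrics)
```

Because the oversampling here runs before cross-validation, synthetic points can sit in a validation fold next to the real samples they were interpolated from, which tends to make the cross-validated score optimistic. Using `SMOTE` from imbalanced-learn inside an `imblearn.pipeline.Pipeline` would resample within each training fold instead. The numbers printed by this sketch come from synthetic data and are not comparable to the results reported in the abstract.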
Pages: 2035-2045
Copyright (c) 2025 Ita Arfyanti, Rizki Galang Rahmadani, Bartolomius Harpad

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).