Heart Disease Detection Using SVM and XGBoost with SHAP Interpretability and LLM Integration

Raihan Al Aziz; Egia Rosi Subhiyakto

doi:10.47065/bits.v8i1.9722

Raihan Al Aziz, Universitas Dian Nuswantoro, Semarang, Indonesia
Egia Rosi Subhiyakto (*), Universitas Dian Nuswantoro, Semarang, Indonesia

(*) Corresponding Author


Keywords: Explainable AI; Heart Disease; Large Language Models; Machine Learning; Support Vector Machine

Abstract

Cardiovascular disease remains the leading cause of death globally, which makes accurate early detection essential, yet limited access to specialist medical personnel in developing countries often delays diagnosis. This study addresses the gap between the high accuracy that machine learning models achieve in academic research and their minimal adoption in clinical practice by developing a safe and trustworthy hybrid AI-based triage system for heart disease. The proposed method uses a dual-model architecture in which a Support Vector Machine (SVM) serves as the primary prediction model and Extreme Gradient Boosting (XGBoost) as a second-opinion model. Both models use the SMOTE oversampling technique to handle class imbalance, and SHAP is applied to make their black-box decisions transparent. Dynamic Prompt Engineering on the Mistral-7B Large Language Model then translates the numerical probabilities into safe, personalized, and empathetic medical narratives. Experimental results show that the SVM with an RBF kernel performs best, with an accuracy of 90.22% and a sensitivity of 94.12%, the latter being crucial for minimizing false negatives in medical screening; the XGBoost model reached an accuracy of 88.04%. The interpretability analysis identified chest pain type, cholesterol level, and maximum heart rate as the primary risk indicators, which is consistent with standard cardiology guidelines. A dual safety validation mechanism, combining programmed risk thresholds with control of the language generation temperature, ensures that the system does not produce harmful diagnostic hallucinations. Implemented as a FastAPI-based microservice, the system proved technically feasible with low latency, offering an accurate, transparent, and communicative early screening solution that supports healthcare service efficiency.
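
The dual safety validation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the threshold values, the disagreement gap, the temperature, the prompt wording, and every name (`risk_tier`, `build_request`, `TriageRequest`) are hypothetical placeholders chosen for illustration. The idea shown is only that the risk tier is fixed by programmed thresholds before the LLM is called, so the model phrases the message but does not decide the risk level, and that a low generation temperature is passed along with the prompt.

```python
from dataclasses import dataclass

# All constants below are assumed values for illustration only;
# the abstract does not report the thresholds or temperature used.
LOW_RISK_MAX = 0.30      # assumed upper bound of the "low" tier
HIGH_RISK_MIN = 0.70     # assumed lower bound of the "high" tier
DISAGREEMENT_GAP = 0.20  # assumed gap at which the two models disagree
SAFE_TEMPERATURE = 0.2   # assumed low sampling temperature for the LLM

# Hypothetical tier-specific guidance inserted into the prompt.
GUIDANCE = {
    "low": "Reassure the patient and suggest routine check-ups.",
    "moderate": "Recommend scheduling a consultation with a doctor.",
    "high": "Urge the patient to seek medical evaluation promptly.",
}


@dataclass(frozen=True)
class TriageRequest:
    prompt: str
    temperature: float
    tier: str
    needs_review: bool


def risk_tier(probability: float) -> str:
    """Map a predicted probability to a fixed, programmed risk tier."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= HIGH_RISK_MIN:
        return "high"
    if probability > LOW_RISK_MAX:
        return "moderate"
    return "low"


def build_request(svm_prob: float, xgb_prob: float,
                  top_features: list[str]) -> TriageRequest:
    """Build a prompt whose risk tier is set by code, not by the LLM."""
    tier = risk_tier(svm_prob)  # the primary model decides the tier
    # Second opinion: flag the case when the two probabilities diverge.
    needs_review = abs(svm_prob - xgb_prob) >= DISAGREEMENT_GAP
    prompt = (
        "You are a screening assistant, not a doctor. "
        "Do not state a diagnosis.\n"
        f"Risk tier: {tier} (estimated probability {svm_prob:.0%}).\n"
        f"Main contributing factors: {', '.join(top_features)}.\n"
        f"Guidance: {GUIDANCE[tier]}"
    )
    if needs_review:
        prompt += "\nState that the result is uncertain and needs clinical review."
    return TriageRequest(prompt, SAFE_TEMPERATURE, tier, needs_review)


# Example call with made-up probabilities and the indicators named above.
request = build_request(
    0.85, 0.80, ["chest pain type", "cholesterol level", "maximum heart rate"]
)
print(request.tier, request.temperature, request.needs_review)
```

In a sketch like this, the returned `prompt` and `temperature` would be passed to whatever client serves the language model; that call is omitted here because the abstract does not specify how Mistral-7B is hosted.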


