Effective Coronary Artery Disease Prediction Using Bayesian Optimization Algorithm and Random Forest

Muhammad Syiarul Amrullah; Anny Yuniarti

doi:10.47065/bits.v6i2.5554

Muhammad Syiarul Amrullah * Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia https://orcid.org/0009-0001-3366-5063
Anny Yuniarti Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia

(*) Corresponding Author

Keywords: Random Forest; Bayesian Optimization; Coronary Artery Disease; Machine Learning; Feature Selection

Abstract

Coronary artery disease (CAD) remains a major global health problem and calls for more effective diagnostic techniques. This study presents a framework for CAD detection that integrates data preprocessing, feature engineering, and model optimization to improve diagnostic accuracy. The methodology comprises data cleansing to eliminate inconsistencies, transformations for better feature representation, feature reduction to highlight relevant variables, data augmentation to balance the class distribution, and optimization strategies to boost model performance. We trained a random forest classifier using 5-fold cross-validation to obtain a robust model. The model was evaluated in two experiments: first, by comparing its performance on preprocessed versus raw data, and second, by comparing it against previous studies. The model trained on preprocessed data clearly outperforms the one trained on raw data, achieving an accuracy of 93.00% compared with 86.16%. Compared with existing research, our random forest model also performs well, with an accuracy of 93.00%, an F1 score of 93.00%, and a recall of 94.00%. Although the Hybrid PSO-EmNN model reported in other research achieves higher precision, our results are promising and underscore the potential of advanced feature engineering to further improve CAD detection models. The study concludes that careful data preprocessing and model optimization are crucial for improving CAD diagnostics. Future research should incorporate more sophisticated feature engineering techniques and expand the dataset to improve the model’s precision and overall diagnostic capability.
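The abstract reports accuracy, F1 score, recall, and precision. As a reading aid, the minimal sketch below shows how these four metrics are conventionally derived from the counts of a binary confusion matrix. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the study and do not reproduce its results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives. Undefined ratios fall back to 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not data from the study).
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because recall penalizes missed positive cases and precision penalizes false alarms, a model can lead on one metric while trailing on the other, which is consistent with the abstract's remark that another model achieves higher precision.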

References

R. Moretti et al., “Common Shared Pathogenic Aspects of Small Vessels in Heart and Brain Disease,” Biomedicines, vol. 10, no. 5, 2022, doi: 10.3390/biomedicines10051009.

P.-S. Huang et al., “An artificial intelligence-enabled ECG algorithm for the prediction and localization of angiography-proven coronary artery disease,” Biomedicines, vol. 10, no. 2, p. 394, 2022.

M. M. Ghiasi, S. Zendehboudi, and A. A. Mohsenipour, “Decision tree-based diagnosis of coronary artery disease: CART model,” Comput Methods Programs Biomed, vol. 192, p. 105400, 2020.

F. Bodendorf, M. Sauter, and J. Franke, “A mixed methods approach to analyze and predict supply disruptions by combining causal inference and deep learning,” Int J Prod Econ, vol. 256, p. 108708, 2023.

S. S. Alotaibi et al., “Automated prediction of Coronary Artery Disease using Random Forest and Naïve Bayes,” in 2020 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 2020, pp. 109–114. doi: 10.1109/ICACSIS51025.2020.9263159.

A. H. Shahid and M. P. Singh, “A Novel Approach for Coronary Artery Disease Diagnosis using Hybrid Particle Swarm Optimization based Emotional Neural Network,” Biocybern Biomed Eng, vol. 40, no. 4, pp. 1568–1585, 2020, doi: 10.1016/j.bbe.2020.09.005.

J. H. Joloudari et al., “Coronary Artery Disease Diagnosis; Ranking the Significant Features Using a Random Trees Model,” Int J Environ Res Public Health, vol. 17, no. 3, 2020, doi: 10.3390/ijerph17030731.

R. Valarmathi and T. Sheela, “Heart disease prediction using hyper parameter optimization (HPO) tuning,” Biomed Signal Process Control, vol. 70, p. 103033, 2021, doi: 10.1016/j.bspc.2021.103033.

B. Kolukisa and B. Bakir-Gungor, “Ensemble feature selection and classification methods for machine learning-based coronary artery disease diagnosis,” Comput Stand Interfaces, vol. 84, p. 103706, 2023, doi: 10.1016/j.csi.2022.103706.

S. F. Pane and M. S. Amrullah, “Systematic Literature Review: Analisa Sentimen Masyarakat terhadap Penerapan Peraturan ETLE,” Journal of Applied Computer Science and Technology, vol. 4, no. 1, pp. 65–74, Jul. 2023, doi: 10.52158/jacost.v4i1.493.

M. Hosseinzadeh et al., “Data cleansing mechanisms and approaches for big data analytics: a systematic study,” J Ambient Intell Humaniz Comput, pp. 1–13, 2023.

X. Wu, W. Zheng, X. Xia, and D. Lo, “Data quality matters: A case study on data label correctness for security bug report prediction,” IEEE Transactions on Software Engineering, vol. 48, no. 7, pp. 2541–2556, 2021.

B. Dastjerdy, A. Saeidi, and S. Heidarzadeh, “Review of applicable outlier detection methods to treat geomechanical data,” Geotechnics, vol. 3, no. 2, pp. 375–396, 2023.

D. P. Misra, O. Zimba, and A. Y. Gasparyan, “Statistical data presentation: a primer for rheumatology researchers,” Rheumatol Int, vol. 41, no. 1, pp. 43–55, 2021.

B. Diène, J. J. P. C. Rodrigues, O. Diallo, E. L. H. M. Ndoye, and V. V. Korotaev, “Data management techniques for Internet of Things,” Mech Syst Signal Process, vol. 138, p. 106564, 2020.

X. Zhang, Y. Han, W. Xu, and Q. Wang, “HOBA: A novel feature engineering methodology for credit card fraud detection with a deep learning architecture,” Inf Sci (N Y), vol. 557, pp. 302–316, 2021.

D. L. Mann, D. P. Zipes, P. Libby, and R. O. Bonow, Braunwald’s Heart Disease E-Book: A Textbook of Cardiovascular Medicine. Elsevier Health Sciences, 2014. [Online]. Available: https://books.google.co.id/books?id=1R44BAAAQBAJ

V. Barannik, S. Sidchenko, N. Barannik, and V. Barannik, “Development of the method for encoding service data in crypto-compression image representation systems,” Eastern-European Journal of Enterprise Technologies, vol. 3, no. 9, p. 111, 2021.

V. N. G. Raju, K. P. Lakshmi, V. M. Jain, A. Kalidindi, and V. Padma, “Study the influence of normalization/transformation process on the accuracy of supervised classification,” in 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), IEEE, 2020, pp. 729–735.

E. Alshdaifat, D. Alshdaifat, A. Alsarhan, F. Hussein, and S. M. F. S. El-Salhi, “The effect of preprocessing techniques, applied to numeric features, on classification algorithms’ performance,” Data (Basel), vol. 6, no. 2, p. 11, 2021.

N. Pudjihartono, T. Fadason, A. W. Kempa-Liehr, and J. M. O’Sullivan, “A review of feature selection methods for machine learning-based disease risk prediction,” Frontiers in Bioinformatics, vol. 2, p. 927312, 2022.

P. Ghosh et al., “Efficient prediction of cardiovascular disease using machine learning algorithms with relief and LASSO feature selection techniques,” IEEE Access, vol. 9, pp. 19304–19326, 2021.

Y. Zhou, F. Dong, Y. Liu, Z. Li, J. Du, and L. Zhang, “Forecasting emerging technologies using data augmentation and deep learning,” Scientometrics, vol. 123, pp. 1–29, 2020.

D. Elreedy, A. F. Atiya, and F. Kamalov, “A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning,” Mach Learn, pp. 1–21, 2023.

S. Studer et al., “Towards CRISP-ML (Q): a machine learning process model with quality assurance methodology,” Mach Learn Knowl Extr, vol. 3, no. 2, pp. 392–413, 2021.

E. Izquierdo-Verdiguier and R. Zurita-Milla, “An evaluation of Guided Regularized Random Forest for classification and regression tasks in remote sensing,” International Journal of Applied Earth Observation and Geoinformation, vol. 88, p. 102051, 2020.

L. Yang and A. Shami, “On hyperparameter optimization of machine learning algorithms: Theory and practice,” Neurocomputing, vol. 415, pp. 295–316, 2020.

S. Nematzadeh, F. Kiani, M. Torkamanian-Afshar, and N. Aydin, “Tuning hyperparameters of machine learning algorithms and deep neural networks using metaheuristics: A bioinformatics study on biomedical and biological cases,” Comput Biol Chem, vol. 97, p. 107619, 2022.

E. Elgeldawi, A. Sayed, A. R. Galal, and A. M. Zaki, “Hyperparameter tuning for machine learning algorithms used for Arabic sentiment analysis,” in Informatics, MDPI, 2021, p. 79.

A. Tharwat, “Classification assessment methods,” Applied Computing and Informatics, vol. 17, no. 1, pp. 168–192, 2021.

B. J. Erickson and F. Kitamura, “Magician’s corner: 9. Performance metrics for machine learning models,” Radiology: Artificial Intelligence, vol. 3, no. 3. Radiological Society of North America, p. e200126, 2021.

J. Miao and W. Zhu, “Precision–recall curve (PRC) classification trees,” Evol Intell, vol. 15, no. 3, pp. 1545–1569, 2022.

D. Chicco and G. Jurman, “The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation,” BMC Genomics, vol. 21, pp. 1–13, 2020.
