Predictive Analysis of Under-Five Mortality Factors Using Logistic Regression, Random Forest, and XGBoost
Abstract
The Under-Five Mortality Rate (UFMR) is a critical public health issue in Indonesia that requires data-driven interventions. This study aims to develop a predictive model that identifies the most influential risk factors for under-five mortality in Bandung City and to compare the performance of three machine learning algorithms. It uses secondary data from the Bandung City Open Data portal for the period 2019-2021. The method is a comparative analysis of Logistic Regression, Random Forest, and XGBoost. To address the significant class imbalance in the data, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training data. The evaluation results show that all three models achieve high accuracy; however, performance on the minority class (mortality cases) remains weak, as indicated by low F1-scores (0.12 for Random Forest and 0.17 for XGBoost). Nonetheless, the feature importance analysis from the Random Forest model identified 'other causes' (penyebab_LAIN-LAIN), 'fever' (penyebab_DEMAM), and the availability of healthcare professionals (PERAWAT, nurses; BIDAN, midwives) as the most significant predictors. This study highlights the value of feature importance analysis for identifying risk factors in imbalanced medical data, providing a basis for more targeted health policy recommendations.
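The two technical ideas in the abstract, oversampling the minority class in the training data and judging a model by its minority-class F1-score rather than by accuracy, can be illustrated with a small standard-library sketch. This is not the authors' pipeline: the `smote` and `f1_for_class` helpers, the toy two-feature data, and the 95/5 class split are all assumptions made for illustration, and the hand-written `smote` function only mimics the core interpolation step of SMOTE.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def f1_for_class(y_true, y_pred, positive=1):
    """F1-score of the `positive` class from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced TRAINING set: 95 majority rows, 5 minority rows.
data_rng = random.Random(42)
train_majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(95)]
train_minority = [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(5)]

# Oversample the minority class until both classes have the same size.
# Only training rows are resampled; a test set would be left untouched.
synthetic = smote(train_minority, len(train_majority) - len(train_minority), k=3)
balanced_minority = train_minority + synthetic

# Hypothetical TEST predictions: 5 true mortality cases among 100 rows.
# The imagined model finds 1 of the 5 cases and raises 5 false alarms.
y_true = [1] * 5 + [0] * 95
y_pred = [1, 0, 0, 0, 0] + [1] * 5 + [0] * 90

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
minority_f1 = f1_for_class(y_true, y_pred, positive=1)

print(f"minority rows after oversampling: {len(balanced_minority)}")
print(f"accuracy = {accuracy:.2f}, minority-class F1 = {minority_f1:.2f}")
```

In this made-up example the accuracy is 0.91 while the minority-class F1-score is 2/11, about 0.18, which shows how a model can look accurate overall and still miss most of the rare cases. In practice a library implementation of SMOTE and of the classifiers would replace these helpers.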
Pages: 692-698
Copyright (c) 2025 Aqila Kharismawardani, Denny Ganjar Purnama

This work is licensed under a Creative Commons Attribution 4.0 International License.