Performance Comparison of the CatBoost, XGBoost, LightGBM, and Random Forest Algorithms in Predicting AIDS Infection Risk in a Health Dataset


  • Pramudya Ridwan Yulianto, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Yani Parti Astuti *, Universitas Dian Nuswantoro, Semarang, Indonesia
  • (*) Corresponding Author
Keywords: AIDS; Machine Learning; SMOTE; CatBoost; Medical Classification

Abstract

This study investigates the prediction of AIDS infection risk using four tree-based algorithms (CatBoost, XGBoost, LightGBM, and Random Forest) applied to a medical and demographic dataset of 2,139 observations and 23 variables. The research process comprises data exploration, cleaning, handling of extreme values with the interquartile range (IQR) method, normalization with RobustScaler, and class balancing with the Synthetic Minority Over-sampling Technique (SMOTE). Because the dataset is imbalanced, model evaluation emphasizes not only accuracy but also recall, F1-score, and AUC-ROC, which better reflect detection of the infected class. Before SMOTE was applied, all models achieved high accuracy but relatively low recall for the positive class. After resampling, CatBoost showed the largest improvement: recall rose from 63% to 77% and F1-score from 72% to 79%, with an overall accuracy of 90%. By comparison, XGBoost reached an accuracy of 88.63% with a more moderate recall improvement, while LightGBM and Random Forest showed consistent but smaller gains, indicating that the combination of SMOTE and CatBoost is more effective at reducing false negatives in AIDS infection cases. The main contribution of this study is the integration of robust outlier handling, feature normalization, and class balancing within a structured experimental framework, with a specific emphasis on optimizing sensitivity to make early detection more reliable in clinical screening contexts.
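
The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names, the 1.5 × IQR fences, clipping as the outlier treatment, and the neighbour count `k` are all assumptions made for illustration. In practice one would more likely use `RobustScaler` from scikit-learn and `SMOTE` from imbalanced-learn.

```python
import math
import random
import statistics


def iqr_bounds(values, k=1.5):
    """Return the fences (Q1 - k*IQR, Q3 + k*IQR); k=1.5 is an assumed default."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Clip values to the IQR fences (one possible way to handle extremes)."""
    lo, hi = iqr_bounds(values, k)
    return [min(max(v, lo), hi) for v in values]


def robust_scale(values):
    """Centre on the median and divide by the IQR, the idea behind RobustScaler."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # avoid division by zero for constant features
    return [(v - med) / iqr for v in values]


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    random minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def recall_f1(y_true, y_pred, positive=1):
    """Recall and F1 for the positive (infected) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


if __name__ == "__main__":
    print(clip_outliers([1, 2, 3, 4, 100]))
    print(robust_scale([1, 2, 3, 4, 5]))
    print(recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Recall is the metric tied most directly to false negatives: every infected case a model misses lowers it. Oversampling is normally applied to the training split only, so that the test set keeps the original class distribution; the abstract does not state how the split was handled, so this is noted as common practice rather than as the study's procedure.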

Article History
Submitted: 2025-12-18
Published: 2026-03-06
How to Cite
Yulianto, P., & Astuti, Y. (2026). Perbandingan Kinerja Algoritma CatBoost, XGBoost, LightGBM dan Random Forest Dalam Memprediksi Risiko Infeksi Aids Dalam Dataset Kesehatan. Building of Informatics, Technology and Science (BITS), 7(4), 2383–2393. https://doi.org/10.47065/bits.v7i4.8975