Comparison of XGBoost and Random Forest Using ANOVA-MI Feature Selection in Cardiotocography Fetal Health Classification


  • Abednego Destyo Amanda Universitas Teknokrat Indonesia, Bandar Lampung, Indonesia
  • Angga Bayu Santoso * Universitas Teknokrat Indonesia, Bandar Lampung, Indonesia
  • (*) Corresponding Author
Keywords: Fetal Health; Cardiotocography; Random Forest; XGBoost; Feature Selection; ADASYN

Abstract

This study compares the performance of the Random Forest and XGBoost algorithms in classifying fetal health from Cardiotocography (CTG) data. The CTG dataset poses three main problems: the classes are imbalanced, some features are only weakly relevant, and the Suspect class is hard to identify because its characteristics lie between those of the Normal and Pathological classes. This matters because the Suspect class represents an early stage of fetal health risk, which determines further medical treatment. To address these problems, the study applies ANOVA and Mutual Information feature selection together with ADASYN oversampling to balance the data. In addition, Random Search is used to tune the model hyperparameters. Unlike previous studies, which generally focus on improving accuracy, this study also emphasizes the model's ability to detect minority classes, especially the Suspect class. In almost every test scenario, XGBoost outperforms Random Forest. XGBoost achieved its best accuracy, 95.51%, with the combination of ANOVA, ADASYN, and hyperparameter tuning. Meanwhile, Mutual Information combined with ADASYN and tuning was more effective at identifying the Suspect class, with a higher recall of 81%. However, because the attributes of the Suspect class lie between those of the Normal and Pathological classes, the model still has difficulty separating them. Overall, the study shows that combining appropriate feature selection, imbalance handling, and parameter optimization in a single pipeline yields more balanced model performance. This research is expected to support more objective medical decision-making, especially in detecting fetal risk conditions at an early stage.
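The distinction the abstract draws between overall accuracy and recall on the Suspect class can be illustrated with a minimal sketch. The labels below are hypothetical toy values chosen for illustration only; they are not the study's data or results, and the helper names `accuracy` and `per_class_recall` are assumptions introduced here.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Map each class to (correctly predicted samples) / (true samples of that class)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Hypothetical, imbalanced toy labels (NOT taken from the study):
# 14 Normal, 4 Suspect, 2 Pathological.
y_true = ["Normal"] * 14 + ["Suspect"] * 4 + ["Pathological"] * 2

# A hypothetical classifier that mislabels half of the Suspect cases as Normal.
y_pred = (
    ["Normal"] * 14
    + ["Suspect", "Suspect", "Normal", "Normal"]
    + ["Pathological"] * 2
)

acc = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)

print(f"accuracy: {acc:.2f}")
for cls, value in recall.items():
    print(f"recall[{cls}]: {value:.2f}")
```

In this toy case the overall accuracy looks high while half of the Suspect cases are missed, which is why a per-class recall is worth reporting alongside accuracy on imbalanced data.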


Article History
Submitted: 2026-04-17
Published: 2026-06-05
How to Cite
Amanda, A., & Santoso, A. (2026). Perbandingan XGBoost dan Random Forest Menggunakan Seleksi Fitur ANOVA-MI Dalam Klasifikasi Kesehatan Janin Cardiotocography. Building of Informatics, Technology and Science (BITS), 8(1), 108-120. https://doi.org/10.47065/bits.v8i1.9688