Trilobite Fossil Period Prediction Using XGBoost with Geological–Geospatial Feature Selection and Hyperparameter Tuning


  • Naufal Rizky Ramadhan, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Elkaf Rahmawan Pramudya (*), Universitas Dian Nuswantoro, Semarang, Indonesia
  • (*) Corresponding Author
Keywords: XGBoost; RandomizedSearchCV; Imbalanced Data; Trilobite; Digital Paleontology; Geospatial

Abstract

This study investigates the use of the Extreme Gradient Boosting (XGBoost) algorithm to predict the time period of trilobite fossils from geological and geospatial data. The main challenges are the high complexity of paleontological data, the presence of missing values, and class imbalance in the target variable time_period, all of which can degrade predictive performance. The objective is to develop an accurate and robust fossil-period prediction model through systematic data preprocessing, feature selection, and model optimization. The dataset was obtained from Kaggle and uses longitude, latitude, lithology, environment, and collection_type as the main features. The workflow comprises data cleaning, missing-value imputation, categorical feature encoding, a stratified train–test split, and class-weight adjustment to handle class imbalance. The XGBoost model was trained on the training set, and its hyperparameters were then optimized with RandomizedSearchCV. On the test set, the tuned XGBoost model achieved an accuracy of 95%, a precision of 90%, a recall of 93%, and an F1-score of 91%, outperforming the model without hyperparameter tuning. These results indicate that combining geological–geospatial feature selection with XGBoost hyperparameter tuning is effective for improving trilobite fossil period prediction. The approach is intended to serve as computational support in paleontology, helping to determine fossil periods in a more objective, efficient, and data-driven way.
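The abstract names class-weight adjustment, a stratified train–test split, and precision/recall/F1 evaluation without giving their formulas. The Python sketch below illustrates those three steps using only the standard library; it is not the authors' code and does not train XGBoost. Assumptions: the weights follow the "balanced" heuristic n_samples / (n_classes × class_count) used by scikit-learn (the abstract does not say which weighting was used), the metrics are macro-averaged (the abstract does not state the averaging), and the period labels and class counts are hypothetical.

```python
from collections import Counter, defaultdict
import random


def balanced_class_weights(labels):
    """Map each class to n_samples / (n_classes * class_count).

    Rare classes receive weights above 1 and frequent classes below 1,
    so every class contributes the same total weight (n_samples / n_classes).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split sample indices per class so train and test keep similar class shares.

    Simplified sketch: each class's test count is rounded independently,
    which is not exactly how scikit-learn's train_test_split allocates samples.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # Classes with two or more samples get at least one test sample.
        n_test = max(1, round(len(idx) * test_fraction)) if len(idx) > 1 else 0
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (0 on zero division)."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical, deliberately imbalanced time_period labels (not the Kaggle data).
periods = ["Cambrian"] * 80 + ["Ordovician"] * 15 + ["Devonian"] * 5

weights = balanced_class_weights(periods)
print({c: round(w, 3) for c, w in weights.items()})
# {'Cambrian': 0.417, 'Ordovician': 2.222, 'Devonian': 6.667}

train_idx, test_idx = stratified_split(periods, test_fraction=0.2, seed=0)
print(len(train_idx), len(test_idx))  # 80 20

# A baseline that always predicts the majority class looks good on accuracy only.
baseline = macro_scores(periods, ["Cambrian"] * len(periods))
print({m: round(v, 3) for m, v in baseline.items()})
# {'accuracy': 0.8, 'precision': 0.267, 'recall': 0.333, 'f1': 0.296}
```

The majority-class baseline scores 0.8 accuracy but a macro F1 of about 0.30, which is why accuracy alone is a weak guide on an imbalanced target such as time_period. In a scikit-learn/XGBoost pipeline, the same per-sample weights could be obtained with `sklearn.utils.class_weight.compute_sample_weight("balanced", y)` and passed to `XGBClassifier.fit(..., sample_weight=...)`, and `RandomizedSearchCV(..., scoring="f1_macro")` would tune against macro F1 rather than accuracy; whether the authors configured their search this way is not stated.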


Article History
Submitted: 2025-12-06
Published: 2026-03-05
How to Cite
Ramadhan, N., & Pramudya, E. (2026). Prediksi Periode Fosil Trilobita Menggunakan XGBoost dengan Seleksi Fitur Geologi–Geospasial dan Hyperparameter Tuning [Trilobite fossil period prediction using XGBoost with geological–geospatial feature selection and hyperparameter tuning]. Building of Informatics, Technology and Science (BITS), 7(4), 2181–2192. https://doi.org/10.47065/bits.v7i4.8862
Section: Articles