Trilobite Fossil Period Prediction Using XGBoost with Geological–Geospatial Feature Selection and Hyperparameter Tuning
Abstract
This study applies the Extreme Gradient Boosting (XGBoost) algorithm to predict the geological period of trilobite fossils from geological and geospatial data. The work addresses three challenges: the complexity of paleontological data, missing values, and class imbalance in the target variable time_period, all of which can degrade predictive performance. The objective is to build an accurate and robust period prediction model through systematic data preprocessing, feature selection, and model optimization. The dataset was obtained from Kaggle, and the attributes longitude, latitude, lithology, environment, and collection_type serve as the main features. The workflow comprises data cleaning, missing-value imputation, categorical feature encoding, a stratified train–test split, and class imbalance handling through class weight adjustment. The XGBoost model was trained on the training set and then optimized with RandomizedSearchCV to select the best-performing hyperparameter configuration. On the test set, the tuned XGBoost model achieved an accuracy of 95%, precision of 90%, recall of 93%, and an F1-score of 91%, outperforming the model without hyperparameter tuning. These results indicate that combining geological–geospatial feature selection with hyperparameter tuning improves the performance of XGBoost in predicting trilobite fossil periods. The approach is expected to serve as computational support in paleontology, helping to determine fossil periods in a more objective, efficient, and data-driven manner.
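The stratified split and class weight adjustment named in the workflow can be illustrated with a minimal standard-library sketch. It is not the authors' code: the abstract does not state which weighting formula was used, so the common "balanced" heuristic (n_samples / (n_classes × class_count)) is assumed here, and the labels, the 20% test fraction, and the helper names `balanced_class_weights` and `stratified_split` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class separately
    so that class proportions are preserved in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Classes with a single member stay entirely in the training set.
        if len(indices) > 1:
            n_test = max(1, round(len(indices) * test_fraction))
        else:
            n_test = 0
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up, imbalanced period labels (not taken from the study's dataset).
labels = ["Cambrian"] * 60 + ["Ordovician"] * 30 + ["Devonian"] * 10

train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
train_labels = [labels[i] for i in train_idx]

# Weights are computed on the training labels only, to avoid using test data.
class_weights = balanced_class_weights(train_labels)
sample_weights = [class_weights[label] for label in train_labels]

print(Counter(train_labels))
print(class_weights)
```

Under this heuristic, rare classes receive proportionally larger weights and the weights sum to the number of training samples. Per-sample weights of this kind are typically passed to a classifier at training time, for example through the `sample_weight` argument of a `fit` method.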
Pages: 2181-2192
Copyright (c) 2026 Naufal Rizky Ramadhan, Elkaf Rahmawan Pramudya

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).