Hybrid Machine Learning Approaches for Atmospheric CO₂ Prediction: Evaluating Regression and Ensemble Models with Advanced Feature Engineering
Abstract
Accurate prediction of atmospheric CO₂ concentrations is essential for understanding climate change dynamics and developing effective environmental policies. This study evaluates the predictive capabilities of ensemble-based regressors (Random Forest, Gradient Boosting, and XGBoost) alongside traditional regression models (Support Vector Regression (SVR), Ridge, and Lasso). The dataset, derived from meteorological observations, was preprocessed with three feature scaling techniques (StandardScaler, MinMaxScaler, and RobustScaler), followed by feature engineering with polynomial transformation and Principal Component Analysis (PCA) to enhance predictive accuracy. Model performance was assessed using the coefficient of determination (R²) under cross-validation. The results indicate that tree-based models, including Random Forest and XGBoost, failed to generalize: they produced negative R² values, meaning their predictions were worse than a constant prediction of the mean, which the study attributes to overfitting and an inability to capture the temporal dependencies in CO₂ variations. SVR emerged as the best-performing model, though its predictive power remained limited. Computational complexity analysis revealed that tree-based methods incurred high processing costs, while linear models such as Ridge and Lasso had lower computational complexity but failed to capture non-linear dependencies. The study highlights the challenges of CO₂ prediction using conventional machine learning techniques and underscores the need for deep learning approaches, such as hybrid Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models, to better capture spatial and temporal dependencies. Future research should explore integrating external environmental factors alongside such architectures to improve predictive performance.
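To make the negative R² values mentioned in the abstract concrete, the following minimal sketch computes the coefficient of determination from its definition, R² = 1 − SS_res / SS_tot. The CO₂ values and the three sets of predictions are hypothetical numbers chosen for illustration; they are not data or results from the study, and the helper name `r2_score` is simply a local function, not the authors' code.

```python
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out CO2 concentrations in ppm (illustrative only).
y_test = [410.0, 412.0, 414.0, 416.0, 418.0]

# Baseline: always predict the mean of the held-out values.
baseline = [fmean(y_test)] * len(y_test)

# Assumed failure case: predictions stuck at a level far from the test data.
stale = [400.0] * len(y_test)

# Assumed well-fitting case: predictions within 0.5 ppm of the truth.
close = [410.5, 411.5, 414.5, 415.5, 418.5]

r2_baseline = r2_score(y_test, baseline)  # 0.0: no better than the mean
r2_stale = r2_score(y_test, stale)        # -24.5: worse than the mean
r2_close = r2_score(y_test, close)        # 0.96875: most variance explained

print(r2_baseline, r2_stale, r2_close)
```

An R² of zero corresponds to predicting the mean of the evaluation data, so any negative score means the model's squared error exceeds that of this constant baseline. This is one way a model that fits its training data closely can still score below zero on held-out data.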
Pages: 2747-2755
Copyright (c) 2025 Gregorius Airlangga

This work is licensed under a Creative Commons Attribution 4.0 International License.