Pendekatan Machine Learning dengan Teknik Stacking untuk Memprediksi Kualitas Air Minum (A Machine Learning Approach with the Stacking Technique for Predicting Drinking Water Quality)


  • Ishak Bintang D (*), Universitas Dian Nuswantoro, Semarang, Indonesia
  • Pulung Nurtantio Andono, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Ricardus Anggi Pramunendar, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Agus Winarno, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Aditya Aqil Darmawan, Universitas Dian Nuswantoro, Semarang, Indonesia
  • (*) Corresponding Author
Keywords: Water Quality; Machine Learning; Classification; Stacking Ensemble; Water Potability

Abstract

Safe drinking water is essential for public health, yet environmental pollution has significantly degraded water quality. Manual assessment methods such as the Water Quality Index (WQI) and STORET are inefficient, so this study proposes a machine learning-based classification system for more accurate assessment of water potability. The study uses the Water Potability dataset from Kaggle, which contains 3,276 samples with nine key parameters. Preprocessing comprises data imputation, normalization, feature engineering, and oversampling with SMOTE. The models applied are LGBM, Random Forest, GBM, and XGBoost, tuned using Bayesian techniques and combined in a stacking ensemble to improve accuracy. The stacking ensemble achieves an accuracy of 85.38%, precision of 88.02%, recall of 85.38%, and F1-score of 85.23%, outperforming the individual models. The system enables real-time water quality monitoring with faster and more accurate results, supporting decision-making on sanitation policy and clean water availability.
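The abstract reports a recall equal to the accuracy (both 85.38%). This pattern arises whenever per-class scores are averaged with weights proportional to class support, because support-weighted recall is arithmetically identical to accuracy. The abstract does not state the averaging scheme, so treating it as weighted averaging is an assumption here. The sketch below illustrates the identity in plain Python; the function name `weighted_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Assumption: per-class scores are weighted by class support (the number
    of true samples in each class). The paper may average differently.
    """
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label, count in support.items():
        # True positives and total predictions for this class.
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)

        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0

        weight = count / n  # share of samples belonging to this class
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels (1 = potable, 0 = not potable); not data from the study.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

scores = weighted_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

On these toy labels the weighted recall matches the accuracy while the weighted precision and F1 differ from it. The reported figures show the same qualitative pattern, which is consistent with weighted averaging but does not confirm it.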




Article History
Submitted: 2025-02-19
Published: 2025-03-13
How to Cite
D, I., Andono, P., Pramunendar, R., Winarno, A., & Darmawan, A. (2025). Pendekatan Machine Learning dengan Teknik Stacking untuk Memprediksi Kualitas Air Minum. Building of Informatics, Technology and Science (BITS), 6(4), 2546-2558. https://doi.org/10.47065/bits.v6i4.7014