Implementation of Grid Search CV KNN with Z-Score Outlier Removal Preprocessing for a Pregnancy Risk Prediction System


  • Ivan Maulana Anggita Universitas Dian Nuswantoro, Semarang, Indonesia
  • Muhammad Naufal (*) Universitas Dian Nuswantoro, Semarang, Indonesia
  • Farrikh Al Zami Universitas Dian Nuswantoro, Semarang, Indonesia
  • (*) Corresponding Author
Keywords: K-Nearest Neighbors; GridSearchCV; Pregnancy Risk; Machine Learning; Standard Scaling

Abstract

This study aims to optimize the K-Nearest Neighbors (KNN) algorithm for predicting pregnancy risk levels using the Maternal Health Risk dataset from the UCI Machine Learning Repository. The methodology includes data preprocessing through Z-score-based outlier detection and removal, feature standardization with Standard Scaling, and categorical encoding of the target labels. Hyperparameter tuning is performed with GridSearchCV to identify the best combination of KNN parameters (number of neighbors, neighbor weighting scheme, and distance metric). The untuned KNN model achieved an accuracy of only 69.46%, whereas the optimized model reached 82.00%, with a macro-averaged precision of 81.91%, recall of 82.89%, and F1-score of 82.23%. The confusion matrix likewise shows a marked improvement, especially for the high-risk class. The optimized model was deployed as a Flask web application, containerized with Docker and hosted on Hugging Face Spaces, enabling real-time online pregnancy risk prediction. These findings indicate that combining KNN with GridSearchCV tuning and feature standardization substantially improves prediction performance and is of practical use in healthcare decision support systems.
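The preprocessing steps named in the abstract can be sketched without external libraries. This is a minimal illustration under stated assumptions, not the authors' code: the |z| > 3 cut-off, the two feature columns in the toy rows, and the sorted-order label encoding (which mirrors what scikit-learn's `LabelEncoder` does) are assumptions, and the helper names `zscores`, `remove_outliers`, `fit_standard_scaler`, `transform` and `encode_labels` are hypothetical. Outliers are removed first and the scaler is fitted afterwards, so the extreme rows do not distort the mean and standard deviation used for scaling.

```python
from statistics import mean, pstdev


def zscores(column):
    """Z-score of every value in one numeric column (population std, ddof=0)."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0] * len(column)  # a constant column has no outliers
    return [(x - mu) / sigma for x in column]


def remove_outliers(rows, labels, threshold=3.0):
    """Drop every row in which any feature has |z| > threshold (assumed cut-off)."""
    z_by_col = [zscores(col) for col in zip(*rows)]
    keep = [
        i for i in range(len(rows))
        if all(abs(z_col[i]) <= threshold for z_col in z_by_col)
    ]
    return [rows[i] for i in keep], [labels[i] for i in keep]


def fit_standard_scaler(rows):
    """Per-feature (mean, population std); a zero std is replaced by 1.0."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]


def transform(rows, params):
    """Standardize each feature: (x - mean) / std."""
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, params)] for row in rows]


def encode_labels(labels):
    """Map class names to integers in sorted order (assumed encoding)."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


# Toy rows (Age, SystolicBP): made-up values, NOT taken from the UCI dataset.
rows = [[20, 110], [22, 120], [24, 130], [26, 110], [28, 120], [30, 130],
        [20, 110], [22, 120], [24, 130], [26, 110], [28, 120], [100, 130]]
labels = ["low risk"] * 5 + ["mid risk"] * 4 + ["high risk"] * 3

clean_rows, clean_labels = remove_outliers(rows, labels)  # drops the Age=100 row
scaler = fit_standard_scaler(clean_rows)
X = transform(clean_rows, scaler)
y, classes = encode_labels(clean_labels)
```

In a train/test setting, the scaler parameters would normally be fitted on the training split only and then reused to transform the test split, which keeps test statistics out of training.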
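The GridSearchCV step can be illustrated with a dependency-free sketch that scores every combination of the three tuned KNN parameters (number of neighbors, weighting scheme, distance metric) by mean k-fold accuracy and keeps the best one. The grid values, the two metrics, and the plain interleaved folds are assumptions for illustration; scikit-learn's `GridSearchCV` uses stratified 5-fold splitting for classifiers by default, and the abstract does not state the authors' actual grid. `distance`, `knn_predict` and `grid_search_knn` are hypothetical helper names.

```python
import math
from collections import defaultdict
from itertools import product


def distance(a, b, metric):
    """Distance between two feature vectors for the supported metrics."""
    if metric == "euclidean":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if metric == "manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    raise ValueError(f"unsupported metric: {metric}")


def knn_predict(train_X, train_y, x, n_neighbors, weights, metric):
    """Predict the class of x by a (weighted) vote of its n_neighbors nearest points."""
    # Sort (distance, label) pairs; equal distances are ordered by label.
    nearest = sorted(
        (distance(x, xi, metric), yi) for xi, yi in zip(train_X, train_y)
    )[:n_neighbors]
    votes = defaultdict(float)
    exact = [label for d, label in nearest if d == 0]
    if weights == "distance" and exact:
        # Exact matches take all the weight, following scikit-learn's convention.
        for label in exact:
            votes[label] += 1.0
    elif weights in ("uniform", "distance"):
        for d, label in nearest:
            votes[label] += 1.0 if weights == "uniform" else 1.0 / d
    else:
        raise ValueError(f"unsupported weights: {weights}")
    # Highest vote wins; ties go to the smallest label.
    return max(sorted(votes), key=votes.get)


def grid_search_knn(X, y, param_grid, n_folds=5):
    """Exhaustive search: mean k-fold accuracy for every parameter combination."""
    keys = sorted(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        fold_scores = []
        for f in range(n_folds):
            test_idx = list(range(f, len(X), n_folds))  # plain interleaved folds
            held_out = set(test_idx)
            train_X = [X[i] for i in range(len(X)) if i not in held_out]
            train_y = [y[i] for i in range(len(y)) if i not in held_out]
            correct = sum(
                knn_predict(train_X, train_y, X[j], **params) == y[j]
                for j in test_idx
            )
            fold_scores.append(correct / len(test_idx))
        score = sum(fold_scores) / n_folds
        if score > best_score:  # strict ">" keeps the first best combination
            best_params, best_score = params, score
    return best_params, best_score


# Usage on two made-up, well-separated clusters (not the UCI data).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
grid = {
    "n_neighbors": [1, 3],  # hypothetical grid values
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}
best_params, best_score = grid_search_knn(X, y, grid, n_folds=4)
```

Because KNN relies on distances, such a search is only meaningful on standardized features; otherwise features with large numeric ranges, such as blood pressure, would dominate every distance computation.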
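The macro-averaged metrics reported in the abstract are unweighted means of per-class scores derived from the confusion matrix. The sketch below shows that computation on made-up predictions; the integer class coding and the convention of scoring an undefined ratio as 0 are assumptions, and `confusion_matrix`, `accuracy` and `macro_scores` are hypothetical helper names.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts samples of true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def macro_scores(cm):
    """Unweighted mean of per-class precision, recall and F1 (macro average)."""
    n = len(cm)
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum
        actual = sum(cm[c])                           # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Made-up predictions for three classes (0, 1, 2); not results from the study.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
precision, recall, f1 = macro_scores(cm)
```

Because macro F1 averages the per-class F1 values, it need not equal the harmonic mean of macro precision and macro recall. With the reported 81.91% and 82.89%, that harmonic mean would be about 82.40%, while the reported macro F1 is 82.23%, which is consistent with per-class averaging.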

Article History
Submitted: 2025-08-11
Published: 2025-09-04
How to Cite
Anggita, I., Naufal, M., & Zami, F. (2025). Implementasi Grid Search CV KNN dengan Preprocessing Z-Score Outlier Removal untuk Sistem Prediksi Risiko Kehamilan. Building of Informatics, Technology and Science (BITS), 7(2), 1202-1213. https://doi.org/10.47065/bits.v7i2.8206
Section
Articles
