Implementation of Grid Search CV KNN with Z-Score Outlier Removal Preprocessing for a Pregnancy Risk Prediction System
Abstract
This study optimizes the K-Nearest Neighbors (KNN) algorithm for predicting pregnancy risk levels using the Maternal Health Risk dataset from the UCI Machine Learning Repository. Preprocessing consists of outlier detection and removal using the Z-score, normalization with standard scaling, and categorical encoding of the target labels. Hyperparameters are tuned with GridSearchCV to identify the best combination of KNN parameters (number of neighbors, weighting scheme, and distance metric). The untuned KNN model achieved an accuracy of only 69.46%, whereas the tuned model reached 82.00% accuracy, with a macro-averaged precision of 81.91%, recall of 82.89%, and F1-score of 82.23%. Evaluation with a confusion matrix also showed a clear improvement, especially for the high-risk category. The optimized model was deployed as a web application built with the Flask framework and packaged with Docker on Hugging Face Spaces, enabling real-time online pregnancy risk prediction. These findings indicate that combining KNN with GridSearchCV and data normalization substantially improves prediction performance and is practical for healthcare decision support systems.
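The workflow described in the abstract can be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the authors' code: the synthetic data stands in for the real dataset, and the Z-score threshold of 3, the 80/20 split, the 5-fold cross-validation, and the parameter grid values are all assumptions, since the abstract does not state them. The helper `remove_outliers_zscore` is a hypothetical name introduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler


def remove_outliers_zscore(X, y, threshold=3.0):
    """Drop rows where any feature is `threshold` or more standard
    deviations from that feature's mean (threshold is an assumption)."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    z = np.abs((X - X.mean(axis=0)) / std)
    keep = (z < threshold).all(axis=1)
    return X[keep], y[keep]


# Synthetic stand-in for the dataset: 6 numeric features, 3 classes.
X, y_int = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
# Illustrative string labels, so the encoding step has something to do.
labels = np.array(["low risk", "mid risk", "high risk"])[y_int]

# Categorical encoding of the target labels.
y = LabelEncoder().fit_transform(labels)

# Z-score outlier removal.
X_clean, y_clean = remove_outliers_zscore(X, y)

X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y_clean, test_size=0.2, stratify=y_clean, random_state=0
)

# Scaling lives inside the pipeline so it is refit on each CV training fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Assumed grid over the three hyperparameters named in the abstract.
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],
    "knn__metric": ["euclidean", "manhattan"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
macro_f1 = f1_score(y_test, y_pred, average="macro")

print("best parameters:", search.best_params_)
print("test accuracy:", round(accuracy, 4))
print("macro F1:", round(macro_f1, 4))
```

Placing `StandardScaler` inside the `Pipeline` is a design choice made for this sketch: it keeps validation folds from influencing the scaling statistics. Whether the study scaled before or inside cross-validation is not stated in the abstract.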
Pages: 1202-1213
Copyright (c) 2025 Ivan Maulana Anggita, Muhammad Naufal, Farrikh Al Zami

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).