Evaluation of Genetic Algorithm Feature Selection for Stroke Classification Using K-Nearest Neighbor on an Overlapping Dataset
Abstract
Stroke is a disease that disrupts brain function due to obstructed blood flow, and it requires early detection to reduce the risk of disability and death. One of the classification methods widely used in stroke prediction is K-Nearest Neighbor (KNN), but this algorithm is sensitive to irrelevant features and complex data distributions. This study evaluates the effect of feature selection with a Genetic Algorithm (GA) on the performance of KNN classification on a stroke prediction dataset characterized by high overlap between classes. The dataset consists of 15,000 patient records with 21 initial attributes, which grew to 38 attributes after preprocessing. Model evaluation was carried out using 10-fold stratified cross-validation at three train-test split ratios, namely 90:10, 80:20, and 70:30, with K = 13. The results show that GA was effective in reducing the data dimensionality substantially, to 15–17 selected features, and in increasing the average internal validation fitness value by around 3%. However, evaluation on the test data shows that the performance of both the KNN and GA-KNN models remains stagnant in an accuracy range of 48–52%, with ROC-AUC values approaching 0.500. This indicates that GA tends to find locally optimal solutions that are sensitive to how the training data are partitioned, making it difficult for the model to generalize to new data. The extreme level of overlap in this dataset is further demonstrated through comparative experiments using feature engineering and the Random Forest algorithm, both of which stalled at around 50% accuracy. These results indicate that GA feature selection successfully reduces data complexity, but it cannot by itself solve the class separability problem when overlap is an inherent characteristic of the data.
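The gap the abstract describes, where internal GA fitness improves while test accuracy stays near chance, can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic coin-flip labels standing in for a fully overlapping dataset, the folds are plain rather than stratified, and the population size, generation count, mutation rate, and all function names are assumptions chosen to keep the example small. Only K = 13 is taken from the abstract.

```python
import random

K = 13  # neighbor count reported in the abstract


def make_overlapping_data(n, d, rng):
    """Synthetic stand-in (assumption): labels are coin flips independent of features."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def knn_predict(train_X, train_y, x, mask, k=K):
    """Majority vote among the k nearest rows, using only the masked-in features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in cols), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def accuracy(train_X, train_y, test_X, test_y, mask):
    hits = sum(knn_predict(train_X, train_y, x, mask) == t
               for x, t in zip(test_X, test_y))
    return hits / len(test_y)


def cv_fitness(X, y, mask, folds=5):
    """Mean k-fold accuracy on the training split (plain folds, not stratified)."""
    if not any(mask):
        return 0.0  # an empty feature set is treated as worthless
    scores = []
    for f in range(folds):
        tr = [i for i in range(len(y)) if i % folds != f]
        te = [i for i in range(len(y)) if i % folds == f]
        scores.append(accuracy([X[i] for i in tr], [y[i] for i in tr],
                               [X[i] for i in te], [y[i] for i in te], mask))
    return sum(scores) / folds


def ga_select(X, y, rng, pop_size=8, generations=4, p_mut=0.1):
    """Tiny GA over feature bitmasks: tournament selection, uniform crossover, elitism."""
    d = len(X[0])
    # Seed with the all-features mask so the result never scores below that baseline.
    pop = [[1] * d] + [[rng.randint(0, 1) for _ in range(d)]
                       for _ in range(pop_size - 1)]
    cache = {}

    def fit(m):
        key = tuple(m)
        if key not in cache:
            cache[key] = cv_fitness(X, y, m)
        return cache[key]

    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        nxt = [pop[0][:]]  # elitism: carry the current best forward unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 2), key=fit)  # tournament of size 2
            b = max(rng.sample(pop, 2), key=fit)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit flips
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fit)
    return best, fit(best)


rng = random.Random(0)
X, y = make_overlapping_data(150, 8, rng)
X_tr, y_tr, X_te, y_te = X[:100], y[:100], X[100:], y[100:]

all_features = [1] * 8
best_mask, best_fit = ga_select(X_tr, y_tr, rng)
baseline_fit = cv_fitness(X_tr, y_tr, all_features)
test_acc_all = accuracy(X_tr, y_tr, X_te, y_te, all_features)
test_acc_ga = accuracy(X_tr, y_tr, X_te, y_te, best_mask)

print("internal CV fitness, all features:", baseline_fit)
print("internal CV fitness, GA subset:   ", best_fit)
print("held-out accuracy, all features:  ", test_acc_all)
print("held-out accuracy, GA subset:     ", test_acc_ga)
```

Because the GA keeps whichever mask scores best on noisy cross-validation estimates, its internal fitness can only match or exceed the all-features baseline, even though the labels here carry no signal. The held-out accuracy is computed on rows the GA never saw, so it is the number to compare against chance level.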
Pages: 233-243
Copyright (c) 2026 Aqmal Syarif Fadilah, Siska Kurnia Gusti, Iwan Iskandar, Iis Afrianty

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).


