Data Mining Dalam Clusterisasi Risiko Tinggi Obesitas Menggunakan Metode K-Means Clustering

Anzila Hasby; Budianto Bangun; Masrizal Masrizal

doi:10.47065/bits.v7i1.7462

Anzila Hasby*, Universitas Labuhanbatu, Rantauprapat, Indonesia
Budianto Bangun, Universitas Labuhanbatu, Rantauprapat, Indonesia
Masrizal Masrizal, Universitas Labuhanbatu, Rantauprapat, Indonesia

(*) Corresponding Author

Keywords: Obesity; K-Means Clustering; Cluster Analysis; Women; Risk Factors

Abstract

Obesity is a condition of excess body fat caused by an imbalance between calorie intake and expenditure. It has become a global epidemic, including in Indonesia, with serious effects on physical, mental, and social health. Women are more susceptible to obesity because of biological factors and lifestyle choices; data from one community health centre show that 76.6% of central obesity patients were women. This study develops an obesity risk segmentation model for women using the K-Means Clustering algorithm on secondary data from Kaggle (n=898), with variables covering age, family history, dietary patterns, physical activity level, and mode of transportation. After preprocessing and StandardScaler normalisation, two clusters were found to be optimal (Silhouette Score: 0.267). Cluster 1 (young, mean age 24.53 years; family history of obesity 1.91; fast food consumption 1.84; low physical activity 2.71) carries a higher risk than Cluster 0 (mean age 41.41 years, with a healthier lifestyle), which reveals a significant interaction between genetic factors and lifestyle as the main triggers. These findings provide a scientific basis for group-based interventions, such as targeted nutrition education programmes for the young population, and demonstrate the effectiveness of data mining approaches in public health for classifying the risk of non-communicable diseases.
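
The pipeline summarised in the abstract (standardise the features, fit K-Means, score the result with the silhouette coefficient) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the data are synthetic rather than the Kaggle dataset, the five feature names and their numeric encoding are hypothetical, and picking the number of clusters by the highest silhouette score over k = 2 to 6 is an assumption, since the abstract reports the score but not the selection procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the study's dataset): two loose groups over
# five hypothetical, already numerically encoded features.
rng = np.random.default_rng(0)
feature_names = ["age", "family_history", "fast_food",
                 "physical_activity", "transport"]
group_a = rng.normal(loc=[25, 1.9, 1.8, 2.7, 3.0],
                     scale=[4, 0.3, 0.4, 0.6, 1.0], size=(120, 5))
group_b = rng.normal(loc=[41, 1.5, 1.4, 2.0, 2.0],
                     scale=[6, 0.5, 0.5, 0.7, 1.0], size=(80, 5))
X = np.vstack([group_a, group_b])

# Standardise each feature to zero mean and unit variance so that age
# (tens of years) does not dominate the Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means for several candidate k and record each silhouette score.
# The candidate range 2..6 is an assumption made for this sketch.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)

# Refit with the selected k and describe each cluster by its per-feature
# mean on the original (unscaled) data, which is easier to interpret.
final_labels = KMeans(n_clusters=best_k, n_init=10,
                      random_state=0).fit_predict(X_scaled)
cluster_means = {
    int(c): dict(zip(feature_names, X[final_labels == c].mean(axis=0)))
    for c in np.unique(final_labels)
}
```

Reading `cluster_means` side by side gives the kind of per-cluster profile the abstract reports (for example, a younger cluster with higher fast food consumption). A silhouette score near 0.267 would indicate clusters that overlap considerably, so such profiles are better treated as tendencies than as sharp boundaries.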
