Cluster Optimization in K-Means Clustering Using Dataset Dimensionality Reduction with the Gini Index

Muhammad Imam Zarkasyi; Herman Mawengkang; Opim Salim Sitompul

doi:10.47065/bits.v4i3.2458

Muhammad Imam Zarkasyi * Universitas Sumatera Utara, Medan, Indonesia
Herman Mawengkang Universitas Sumatera Utara, Medan, Indonesia
Opim Salim Sitompul Universitas Sumatera Utara, Medan, Indonesia

(*) Corresponding Author

Keywords: Dimensional Reduction; Clustering; K-Means Clustering; Gini Index; Sum of Square Error

Abstract

In K-Means Clustering, the number of attributes in a dataset can affect the number of iterations needed to group the data. One way to address this problem is to reduce the dimensionality of the dataset. In this study, the authors apply the Gini Index to the dataset to remove attributes that have no effect on it before clustering with K-Means Clustering. The test dataset is Absenteeism at work, obtained from the UCI Machine Learning Repository, with 20 attributes, 740 data records, and 4 attribute classes. In the tests, conventional K-Means (without attribute reduction) required 9 iterations, while K-Means with Gini Index attribute reduction required 6 iterations. Clustering quality was evaluated using the Sum of Square Error (SSE). The SSE of conventional K-Means Clustering (without attribute reduction) is 1391.613, while the SSE of K-Means Clustering with Gini Index attribute reduction is 440.912. These results indicate that the proposed method reduces the clustering error and the number of iterations in K-Means Clustering by reducing the dimensions of the dataset with the Gini Index.
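
The pipeline summarized in the abstract (score attributes with the Gini Index, drop the uninformative ones, run K-Means, evaluate with SSE) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the Gini Index was computed or which cutoff was used, so the sketch assumes a Gini gain over distinct attribute values, a hypothetical cutoff `min_gain`, centroids seeded from the first k records, and a small made-up dataset rather than Absenteeism at work.

```python
from collections import Counter, defaultdict

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(column, labels):
    """Impurity drop when records are partitioned by the attribute's values."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    weighted = sum(len(g) / len(labels) * gini_impurity(g) for g in groups.values())
    return gini_impurity(labels) - weighted

def select_attributes(rows, labels, min_gain=0.01):
    """Indices of attributes whose Gini gain exceeds min_gain (assumed cutoff)."""
    return [j for j in range(len(rows[0]))
            if gini_gain([r[j] for r in rows], labels) > min_gain]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Lloyd's K-Means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assign = None
    for iteration in range(1, max_iter + 1):
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:  # no record changed cluster: converged
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign, iteration

def sse(points, centroids, assign):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assign))

# Hypothetical toy data: attributes 0 and 1 follow the class, attribute 2 does not.
rows = [(1, 1, 0), (8, 9, 0), (1, 2, 7), (2, 1, 0),
        (1, 2, 7), (9, 8, 7), (8, 9, 0), (9, 8, 7)]
labels = ["a", "b", "a", "a", "a", "b", "b", "b"]

kept = select_attributes(rows, labels)
reduced = [tuple(r[j] for j in kept) for r in rows]

cents_full, assign_full, iters_full = kmeans(rows, k=2)
cents_red, assign_red, iters_red = kmeans(reduced, k=2)
sse_full = sse(rows, cents_full, assign_full)
sse_reduced = sse(reduced, cents_red, assign_red)

print("kept attributes:", kept)
print("all attributes: iterations =", iters_full, "SSE =", sse_full)
print("reduced:        iterations =", iters_red, "SSE =", sse_reduced)
```

Treating each distinct value as its own category is a simplification that suits only discrete attributes; continuous attributes would need binning or threshold splits first. The toy data is meant to show the mechanics only and does not reproduce the iteration counts or SSE values reported in the abstract.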
