Optimization in Time and Score using IID Algorithm for K-Modes Clustering

Farah Yulianti; Tjong Wan Sen

doi:10.47065/bits.v4i4.2791

Farah Yulianti * President University, Bekasi, Indonesia
Tjong Wan Sen President University, Bekasi, Indonesia

(*) Corresponding Author

Keywords: Clustering Algorithm; K-Modes; InterIntra-Cluster; Dissimilarities

Abstract

Nowadays, there are numerous methods for analyzing data, one of which is cluster analysis. Because most practical data contains categorical attributes, categorical data clustering has recently received considerable attention. Categorical data is typically clustered with unsupervised machine learning techniques that use frequency-based methods, such as K-Modes clustering. The K-Modes algorithm measures the dissimilarity between data points as their total number of mismatched attribute values. The lower the dissimilarity, the more similar the data points, and thus the better the cluster. This paper aims to improve the performance of K-Modes clustering by incorporating the intercluster and intracluster dissimilarity measure, or IID measure, into the K-Modes algorithm rather than using only the standard simple-matching method, with the goal of increasing accuracy and reducing execution time. The combined algorithm improves both the accuracy and the execution time of the K-Modes algorithm. As a result, it can be used as an alternative for clustering categorical data more effectively.
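
To make the quantities named in the abstract concrete, the following is a minimal Python sketch of the standard simple-matching dissimilarity (total mismatches), the frequency-based cluster mode, and one K-Modes assignment step. The abstract does not give the formula of the IID measure, so the `intra_cluster` and `inter_cluster` helpers are only illustrative assumptions about what intracluster and intercluster dissimilarity could look like; they are not the authors' definition. All function names and the toy records are hypothetical.

```python
from collections import Counter


def simple_matching(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Frequency-based center: the most common category of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def assign(records, modes):
    """Assign each record to the mode with the fewest mismatches."""
    return [
        min(range(len(modes)), key=lambda k: simple_matching(r, modes[k]))
        for r in records
    ]


def intra_cluster(records, labels, modes):
    """ASSUMPTION (not the paper's formula): mean mismatch count between
    each record and the mode of its own cluster. Lower is tighter."""
    total = sum(simple_matching(r, modes[l]) for r, l in zip(records, labels))
    return total / len(records)


def inter_cluster(modes):
    """ASSUMPTION (not the paper's formula): mean mismatch count between
    every pair of cluster modes. Higher is better separated."""
    pairs = [(i, j) for i in range(len(modes)) for j in range(i + 1, len(modes))]
    return sum(simple_matching(modes[i], modes[j]) for i, j in pairs) / len(pairs)


# Toy categorical data set (hypothetical): color, size, shape.
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]

# Use the first and third records as initial modes.
modes = [records[0], records[2]]
labels = assign(records, modes)

print(labels)                                  # [0, 0, 1, 1]
print(intra_cluster(records, labels, modes))   # 0.5
print(inter_cluster(modes))                    # 3.0
```

A full K-Modes run would alternate `assign` with recomputing each cluster's center through `cluster_mode` until the labels stop changing. How the IID measure enters that loop is specified in the paper itself, not in this sketch.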
