Implementation of K-Means as a Self-Labeling Mechanism in an Ensemble Voting Classifier Architecture for Sales Prediction of Micro, Small, and Medium Enterprises (MSMEs) on Unlabeled Data
Abstract
Sales forecasting in the Micro, Small, and Medium Enterprises (MSME) sector is difficult because transaction data are noisy and lack the class labels needed to train supervised learning models. This study proposes a sequential hybrid architecture in which the K-Means algorithm serves as a self-labeling mechanism that automatically converts raw transaction data into class labels (“Low” and “High”). The resulting synthetic labels are then used to train an Ensemble Voting Classifier that aggregates predictions from XGBoost, LightGBM, and CatBoost. In the experimental evaluation, the single XGBoost model achieves slightly higher accuracy (96.24%) than the Ensemble model (96.07%), but the hybrid Ensemble Voting model is better calibrated, achieving the lowest Loss value of 0.1532 against 0.1646 for XGBoost and 0.1772 for LightGBM, which indicates more reliable and stable prediction confidence. The model is also well balanced, with an F1-Score of 0.95 and a Recall of 0.96 for the majority class. These results indicate that the hybrid approach is effective in reducing uncertainty in MSME stock management.
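The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: K-Means is run on a single sales-quantity feature with two clusters, the ensemble combines its members by soft voting (averaging class probabilities), and the reported Loss is read as binary log loss. The three base learners are stood in for by hand-written placeholder probabilities rather than trained XGBoost, LightGBM, and CatBoost models, and all function names and data values are hypothetical.

```python
import math


def kmeans_1d_two_clusters(values, iterations=100):
    """Lloyd's algorithm for k=2 on 1-D data; returns (low, high) centroids."""
    # Assumption: centroids start at the minimum and maximum values.
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_group, high_group = [], []
        # Assignment step: each value joins its nearest centroid.
        for v in values:
            if abs(v - high) < abs(v - low):
                high_group.append(v)
            else:
                low_group.append(v)
        # Update step: each centroid moves to the mean of its group.
        new_low = sum(low_group) / len(low_group) if low_group else low
        new_high = sum(high_group) / len(high_group) if high_group else high
        if (new_low, new_high) == (low, high):
            break  # converged
        low, high = new_low, new_high
    return low, high


def self_label(values):
    """Turn unlabeled sales quantities into synthetic 'Low'/'High' labels."""
    low, high = kmeans_1d_two_clusters(values)
    return ["High" if abs(v - high) < abs(v - low) else "Low" for v in values]


def soft_vote(model_probs):
    """Average P(High) per sample across models (soft voting)."""
    return [sum(ps) / len(ps) for ps in zip(*model_probs)]


def log_loss(labels, probs, eps=1e-15):
    """Binary log loss of predicted P(High) against 'Low'/'High' labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -math.log(p) if y == "High" else -math.log(1 - p)
    return total / len(labels)


# Hypothetical daily sales quantities with no labels attached.
daily_sales = [3, 5, 4, 40, 42, 6, 38, 2]
labels = self_label(daily_sales)

# Placeholder P(High) outputs standing in for three trained base models.
model_probs = [
    [0.10, 0.20, 0.10, 0.90, 0.95, 0.30, 0.80, 0.05],
    [0.20, 0.10, 0.20, 0.80, 0.90, 0.20, 0.90, 0.10],
    [0.15, 0.30, 0.10, 0.85, 0.90, 0.40, 0.70, 0.10],
]
ensemble_probs = soft_vote(model_probs)
ensemble_loss = log_loss(labels, ensemble_probs)
```

A lower log loss means the predicted probabilities sit closer to the synthetic labels. Under the assumptions above, that is the sense in which an averaged ensemble can be better calibrated than a single model that is slightly more accurate.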
Pages: 2006-2016
Copyright (c) 2025 Muhammad Aqil Fahmi, Defri Kurniawan

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).