Public Sentiment Analysis of the Temporary National Data Center (Pusat Data Nasional Sementara) Breach Using the Random Forest and Support Vector Machine Algorithms
Abstract
A ransomware attack on Indonesia’s Temporary National Data Center (Pusat Data Nasional Sementara, PDNS) in June 2024 triggered major public concern over data security and government preparedness. This study analyzes public sentiment toward the incident by applying Aspect-Based Sentiment Analysis to 2,700 Indonesian-language tweets collected from the X platform. The research follows the SEMMA (Sample, Explore, Modify, Model, Assess) methodology: text preprocessing, aspect extraction using part-of-speech tagging and named entity recognition, feature representation using Term Frequency-Inverse Document Frequency (TF-IDF), and aspect refinement through semantic coherence. Extracted aspects are grouped into five categories: data security, institutions, infrastructure, politics and economy, and impact. Sentiment classification is carried out using the IndoBERTweet model. The results show a strong dominance of negative sentiment, particularly in the infrastructure and institutions categories, with no positive sentiment recorded in the politics and economy category. To address class imbalance in the sentiment distribution, the Synthetic Minority Oversampling Technique (SMOTE) is applied during model training. Of the two algorithms evaluated, Random Forest and Support Vector Machine, Random Forest performs better, achieving 96% accuracy on a 70:30 data split and 99.05% average accuracy under 10-fold cross-validation. These findings highlight the effectiveness of aspect-based sentiment analysis and indicate that Random Forest outperformed Support Vector Machine on this imbalanced sentiment classification task.
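The feature representation and oversampling steps named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the toy English texts and labels are invented for illustration, the helper names `tfidf` and `smote_like` are hypothetical, tokenization is plain whitespace splitting, and the oversampler is a simplified SMOTE-style interpolation rather than a reference implementation.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocab, matrix) with weights tf * log(N / df). Hypothetical helper."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    # Document frequency: number of documents that contain each token.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    n_docs = len(docs)
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        matrix.append([
            counts[tok] / length * math.log(n_docs / doc_freq[tok])
            for tok in vocab
        ])
    return vocab, matrix


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE-style oversampling (an assumption, not the paper's code).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy, invented examples standing in for labelled tweets.
docs = [
    "data center leak is terrible",
    "government backup failed badly",
    "ransomware attack exposes weak security",
    "recovery team restored service quickly",
    "good response from recovery team",
]
labels = ["negative", "negative", "negative", "positive", "positive"]

vocab, X = tfidf(docs)
minority = [x for x, y in zip(X, labels) if y == "positive"]
n_majority = sum(1 for y in labels if y == "negative")

# Add synthetic positives until both classes have the same size.
synthetic = smote_like(minority, n_new=n_majority - len(minority))
balanced_X = X + synthetic
balanced_y = labels + ["positive"] * len(synthetic)
```

In a real experiment the oversampling would normally be fitted on the training portion only, so that synthetic points derived from test tweets do not leak into evaluation.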
Pages: 960-971
Copyright (c) 2025 Faishal Khairi Basri, M Afdal, Angraini Angraini, Nesdi Evrilyan Rozanda

This work is licensed under a Creative Commons Attribution 4.0 International License.