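Public Sentiment Analysis of the Temporary National Data Center Breach Using Random Forest and Support Vector Machine Algorithms (original title: Analisis Sentimen Masyarakat Terhadap Kebocoran Pusat Data Nasional Sementara Menggunakan Algoritma Random Forest dan Support Vector Machine)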

Faishal Khairi Basri; M Afdal; Angraini Angraini; Nesdi Evrilyan Rozanda

doi:10.47065/bits.v7i2.7473

Faishal Khairi Basri * Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
M Afdal Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
Angraini Angraini Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
Nesdi Evrilyan Rozanda Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia

(*) Corresponding Author

Keywords: Aspect-Based Sentiment Analysis; Data Breach; Random Forest; Sentiment Analysis; Support Vector Machine

Abstract

A ransomware attack on Indonesia’s Temporary National Data Center (PDNS) in June 2024 triggered major public concern over data security and government preparedness. This study analyzes public sentiment toward the incident using Aspect-Based Sentiment Analysis (ABSA) on 2,700 Indonesian-language tweets collected from the X platform. The research follows the SEMMA (Sample, Explore, Modify, Model, Assess) methodology: text preprocessing, aspect extraction using part-of-speech (POS) tagging and named entity recognition (NER), feature representation using Term Frequency-Inverse Document Frequency (TF-IDF), and aspect refinement based on semantic coherence. The extracted aspects are grouped into five categories: data security, institutions, infrastructure, politics and economy, and impact. Sentiment classification is carried out using the IndoBERTweet model. The results show a strong dominance of negative sentiment, particularly in the infrastructure and institutions categories, and no positive sentiment was recorded in the politics and economy category. To address the class imbalance in the sentiment distribution, the Synthetic Minority Oversampling Technique (SMOTE) is applied during model training. A comparison of two classifiers, Random Forest and Support Vector Machine (SVM), shows that Random Forest performs better, achieving 96% accuracy on a 70:30 train-test split and an average accuracy of 99.05% under 10-fold cross-validation. These findings highlight the effectiveness of aspect-based sentiment analysis and demonstrate Random Forest's advantage over SVM on this imbalanced sentiment classification task.
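The abstract compresses two feature-level steps that are easy to misread: TF-IDF turns each preprocessed tweet into a weighted term vector, and SMOTE balances the classes by interpolating new minority-class vectors instead of duplicating existing ones. The following is a minimal standard-library sketch under stated assumptions, not the authors' implementation: the token lists and labels are hypothetical, `tfidf` and `smote` are illustrative helper names, and the idf uses the smoothed variant that scikit-learn's `TfidfVectorizer` applies by default. Because the abstract states that SMOTE is applied during model training, oversampling should only touch the training portion (the 70% split, or each training fold in cross-validation), never the test data.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) of L2-normalised TF-IDF vectors.

    Assumption: smoothed idf(t) = ln((1 + n) / (1 + df(t))) + 1,
    matching scikit-learn's default; the paper's exact variant is not stated.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority vectors by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself), Euclidean distance
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        nb = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment x -> nb, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical toy tweets (already preprocessed into tokens) and sentiment labels.
docs = [
    ["pdns", "diretas", "data", "bocor"],
    ["pemerintah", "lalai", "data", "bocor"],
    ["server", "pdns", "lumpuh", "layanan", "terganggu"],
    ["kominfo", "harus", "tanggung", "jawab"],
    ["ransomware", "pdns", "memalukan"],
    ["pemulihan", "layanan", "cepat", "bagus"],
    ["apresiasi", "tim", "pemulihan", "data"],
]
labels = ["neg", "neg", "neg", "neg", "neg", "pos", "pos"]

vocab, X = tfidf(docs)
neg = [x for x, y in zip(X, labels) if y == "neg"]
pos = [x for x, y in zip(X, labels) if y == "pos"]
pos_new = smote(pos, n_new=len(neg) - len(pos), k=3)
X_bal = X + pos_new
y_bal = labels + ["pos"] * len(pos_new)
print(Counter(y_bal))  # Counter({'neg': 5, 'pos': 5})
```

Each synthetic vector lies on the segment between a minority sample and one of its nearest minority neighbours, so it stays inside the region the minority class already occupies. In a full pipeline, the balanced `X_bal` and `y_bal` would then be passed to the Random Forest and SVM classifiers for training.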
