Sentiment Classification on a Limited Dataset Using Random Forest and Word2Vec


  • Dina Deswara Fitri, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
  • Surya Agustian (*), Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
  • Pizaini Pizaini, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
  • Suwanto Sanjaya, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
(*) Corresponding Author
Keywords: Sentiment Classification; Random Forest; Word2Vec; Limited Dataset; Social Media

Abstract

Measuring public sentiment on social media is essential for understanding how society views various issues, including public figures and political events. This research examines the effectiveness of the Random Forest algorithm with Word2Vec-based word representations for sentiment classification on a limited dataset. The case study uses tweets about Kaesang Pangarep as Chairman of PSI, supplemented by external data on Covid-19 and general topics. The dataset was preprocessed with cleaning, case folding, stopword removal, stemming, and tokenization. Words were represented using a Word2Vec model with the Continuous Bag of Words (CBOW) architecture and a vector dimension of 500. Random Forest was then used to classify sentiment into positive, negative, or neutral categories. In the initial phase, the model was trained on 300 samples per label, which gave unsatisfactory performance: an F1-score of 49.00% and an accuracy of 50.00%. To improve performance, the dataset was expanded by adding 900 samples from the Kaesang data and 1,080 samples from external topics. The final model reached an F1-score of 49.89%, an accuracy of 58.29%, a precision of 49.16%, and a recall of 56.47%. These results indicate that Random Forest with Word2Vec word representations can improve sentiment classification performance even with a limited dataset, with the gain appearing mainly in accuracy (50.00% to 58.29%) while the F1-score rose only from 49.00% to 49.89%. The work contributes to the development of sentiment analysis techniques in the field of machine learning.
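
The abstract reports accuracy, precision, recall, and F1-score for a three-class problem but does not say how the per-class scores were averaged. The sketch below assumes macro-averaging (an unweighted mean over the positive, negative, and neutral classes), which is one common choice and not confirmed by the source. The function names and the toy labels are hypothetical and are not the study's data. It may help explain why a reported F1-score need not equal the harmonic mean of the reported precision and recall.

```python
# Minimal sketch: macro-averaged metrics for 3-class sentiment labels.
# Assumption: macro-averaging; the abstract does not state the averaging used.
# The labels below are toy values, not data from the study.

LABELS = ("positive", "negative", "neutral")


def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_report(y_true, y_pred, labels=LABELS):
    """Accuracy plus the unweighted mean of per-class precision, recall, F1."""
    scores = [per_class_scores(y_true, y_pred, lab) for lab in labels]
    n = len(labels)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "precision": sum(s[0] for s in scores) / n,
        "recall": sum(s[1] for s in scores) / n,
        "f1": sum(s[2] for s in scores) / n,
    }


# Toy gold labels and predictions (hypothetical).
y_true = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 2
y_pred = [
    "positive", "positive", "positive", "negative",
    "negative", "negative", "neutral", "neutral",
    "neutral", "positive",
]

report = macro_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

On this toy input the macro F1-score (about 0.574) falls below the harmonic mean of macro precision and macro recall (about 0.583), because the mean of per-class F1 values is computed class by class before averaging.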


Article History
Submitted: 2024-11-13
Published: 2024-11-26