Comparison of FastText and TF-IDF with Random Forest for Sentiment Analysis of IKN on YouTube
Abstract
The development of Indonesia's new capital city (Ibu Kota Negara, IKN) is a major national policy that has drawn diverse public responses, particularly on social media platforms such as YouTube. This study analyzes public sentiment toward the IKN project and compares two text feature extraction methods, FastText and Term Frequency-Inverse Document Frequency (TF-IDF), each paired with the Random Forest algorithm. The primary objective is to identify which method captures the nuances of Indonesian-language public opinion more effectively. The dataset comprises 4,093 YouTube comments related to IKN, collected through the YouTube Data API v3 in August 2025. Comments were labeled as positive or negative; neutral comments were removed to minimize model bias. Labeling was done manually and validated by a linguistic expert, followed by pre-processing: data cleaning, case folding, normalization, tokenization, stopword removal, and stemming. The FastText vector dimension of 200 and the TF-IDF limit of 5,000 features were chosen on the basis of previous sentiment analysis research, which reports that these configurations give more stable classification performance than other settings because they filter out irrelevant features without losing semantic information. Model performance was evaluated with 10-fold cross-validation and a confusion matrix, using accuracy, precision, recall, and F1-score. FastText achieved an accuracy of 83.67%, precision of 84.01%, recall of 83.72%, and an F1-score of 80.83%, while TF-IDF yielded an accuracy of 80.53%. These findings suggest that FastText represents the context and semantic meaning of Indonesian YouTube comments about IKN more effectively than TF-IDF. In addition, FastText balances pattern recognition against the precision of its sentiment classifications. The research helps stakeholders and researchers understand public opinion toward IKN more accurately.
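The four evaluation metrics named in the abstract can all be derived from confusion-matrix counts. The following is a minimal sketch of that derivation, not the authors' evaluation code: the function names, the toy labels, and the choice of macro-averaging over the two classes are assumptions made for illustration, since the abstract does not state which averaging scheme was used.

```python
# Sketch: accuracy, precision, recall and F1 from confusion-matrix counts.
# Assumption: metrics are macro-averaged over the two sentiment classes;
# the toy labels below are invented and are not the study's data.

def confusion_counts(y_true, y_pred, target):
    """Count TP, FP, FN, TN when `target` is treated as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == target and truth == target:
            tp += 1
        elif pred == target:
            fp += 1
        elif truth == target:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a class is never predicted or never occurs."""
    return numerator / denominator if denominator else 0.0


def class_metrics(tp, fp, fn):
    """Precision, recall and F1 for a single class."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


def evaluate(y_true, y_pred, labels=("positive", "negative")):
    """Accuracy plus macro-averaged precision, recall and F1."""
    per_class = []
    for label in labels:
        tp, fp, fn, _ = confusion_counts(y_true, y_pred, label)
        per_class.append(class_metrics(tp, fp, fn))
    correct = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    macro = [sum(m[i] for m in per_class) / len(per_class) for i in range(3)]
    return {
        "accuracy": safe_div(correct, len(y_true)),
        "precision": macro[0],
        "recall": macro[1],
        "f1": macro[2],
    }


# Toy example: 6 positive and 4 negative comments, with one error in each class.
y_true = ["positive"] * 6 + ["negative"] * 4
y_pred = ["positive"] * 5 + ["negative"] + ["negative"] * 3 + ["positive"]

result = evaluate(y_true, y_pred)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

In a 10-fold cross-validation setting, a function like `evaluate` would be applied to the held-out fold in each of the ten rounds and the scores averaged. Under macro-averaging, F1 is the mean of the per-class F1 values, so it does not have to fall between the averaged precision and recall.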
Pages: 2288-2301
Copyright (c) 2026 Fadhil Irsyad Ramadhani, Taghfirul Azhima Yoga, Naufal Azmi Verdhika

This work is licensed under a Creative Commons Attribution 4.0 International License.