Optimal Number Data Trains in Hoax News Detection of Indonesian using SVM and Word2Vec
Abstract
As technology advances, information spreads very quickly on social media, especially Twitter. Not all of the news circulating on Twitter is accurate; much of it is hoax news spread by irresponsible individuals. In this research, the authors build a system to determine the optimal amount of training data for hoax news classification, using a Support Vector Machine (SVM) with Word2Vec to classify Indonesian news as hoax or non-hoax. Five experiments were carried out with training sets of 5,000, 10,000, 15,000, 20,000, and 25,000 samples, using unigrams combined with full-token selection. The resulting accuracies were 77.28% with 5,000 training samples, 79.68% with 10,000, 79.892% with 15,000, 80.416% with 20,000, and 81.184% with 25,000. The research aims to build a hoax detection system that can determine the optimal amount of training data to use, and to evaluate the performance of SVM with Word2Vec in detecting hoax news.
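The comparison between training-set sizes can be made explicit with a short worked example. The sketch below takes the five accuracies reported in the abstract, computes the gain in percentage points (pp) between consecutive training-set sizes, and normalizes each gain per 1,000 added samples. The names `REPORTED` and `marginal_gains` are hypothetical helpers introduced only for illustration. This is not the authors' code, and it does not re-run the SVM/Word2Vec experiments; it only does arithmetic on the published numbers.

```python
# Hypothetical helper: accuracies (percent) as reported in the abstract,
# keyed by training-set size. Not the authors' code.
REPORTED = {
    5000: 77.28,
    10000: 79.68,
    15000: 79.892,
    20000: 80.416,
    25000: 81.184,
}


def marginal_gains(results):
    """Return (size, gain_pp, gain_per_1000) for each step between consecutive sizes."""
    sizes = sorted(results)
    steps = []
    for prev, cur in zip(sizes, sizes[1:]):
        # Accuracy gained by moving from the previous size to the current one.
        gain = round(results[cur] - results[prev], 3)
        # Same gain expressed per 1,000 additional training samples.
        per_1000 = round(gain / ((cur - prev) / 1000), 4)
        steps.append((cur, gain, per_1000))
    return steps


if __name__ == "__main__":
    for size, gain, per_1000 in marginal_gains(REPORTED):
        print(f"{size:>6}: +{gain:.3f} pp ({per_1000:.4f} pp per 1,000 samples)")
    best = max(REPORTED, key=REPORTED.get)
    print("highest reported accuracy at", best, "training samples")
```

On these figures, the largest single gain (2.4 pp) comes from moving from 5,000 to 10,000 training samples; each later step adds roughly 0.2 to 0.8 pp, and the highest reported accuracy is at 25,000 samples.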
Pages: 21–28
Copyright (c) 2023 Muhammad Sulthon Asramanggala, Sri Suryani Prasetyowati, Yuliant Sibaroni

This work is licensed under a Creative Commons Attribution 4.0 International License.