Optimal Number Data Trains in Hoax News Detection of Indonesian using SVM and Word2Vec


  • Muhammad Sulthon Asramanggala Telkom University, Bandung, Indonesia
  • Sri Suryani Prasetyowati Telkom University, Bandung, Indonesia
  • Yuliant Sibaroni * Telkom University, Bandung, Indonesia
  • (*) Corresponding Author
Keywords: Hoax; Classification; Support Vector Machine; Word2Vec; Twitter

Abstract

Technology continues to advance with the times. Information spreads very quickly on social media, especially Twitter, but not all news circulating on Twitter is accurate. Much of the information that spreads is hoax news disseminated by irresponsible individuals. In this research, the authors build a system to determine the optimal amount of training data for hoax news classification. The system uses the Support Vector Machine (SVM) algorithm with Word2Vec to classify news as hoax or non-hoax. Five experiments were carried out with training sets of 5,000, 10,000, 15,000, 20,000, and 25,000 samples. Using a combination of unigrams and full token selection, the resulting accuracies were 77.28% with 5,000 training samples, 79.68% with 10,000, 79.892% with 15,000, 80.416% with 20,000, and 81.184% with 25,000. The research aims to build a hoax detection system that can determine the optimal amount of training data to use, and to assess the performance of the Support Vector Machine algorithm with Word2Vec in detecting hoax news.
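The reported accuracies can be compared directly to see how much each additional 5,000 training samples contributes. The following is a minimal sketch of that arithmetic only, not the authors' system; the function names are hypothetical, and reading "optimal" as "highest reported accuracy" is an assumption made here for illustration.

```python
# Accuracies (%) reported in the abstract, keyed by number of training samples.
reported_accuracy = {
    5000: 77.28,
    10000: 79.68,
    15000: 79.892,
    20000: 80.416,
    25000: 81.184,
}


def marginal_gains(accuracy_by_size):
    """Map each training size to its accuracy gain (percentage points)
    over the next smaller size. Hypothetical helper for illustration."""
    sizes = sorted(accuracy_by_size)
    return {
        larger: round(accuracy_by_size[larger] - accuracy_by_size[smaller], 3)
        for smaller, larger in zip(sizes, sizes[1:])
    }


def best_size(accuracy_by_size):
    """Return the training size with the highest accuracy.
    Assumption: "optimal" is taken to mean highest reported accuracy."""
    return max(accuracy_by_size, key=accuracy_by_size.get)


gains = marginal_gains(reported_accuracy)
best = best_size(reported_accuracy)

for size, gain in gains.items():
    print(f"{size:>6} samples: +{gain:.3f} points over the previous size")
print(f"Highest reported accuracy at {best} training samples")
```

On these figures, the largest single gain (2.40 points) comes from moving from 5,000 to 10,000 samples. The later steps add 0.212, 0.524, and 0.768 points, and accuracy is still rising at 25,000 samples, the largest size tested.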




Article History
Submitted: 2023-05-26
Published: 2023-06-28
Abstract View: 1258 times
PDF Download: 641 times
How to Cite
Asramanggala, M., Prasetyowati, S., & Sibaroni, Y. (2023). Optimal Number Data Trains in Hoax News Detection of Indonesian using SVM and Word2Vec. Building of Informatics, Technology and Science (BITS), 5(1), 21–28. https://doi.org/10.47065/bits.v5i1.3516