Topic Detection on Twitter using GloVe with Convolutional Neural Network and Gated Recurrent Unit
Abstract
Twitter is a social media platform that lets users share thoughts and information publicly. Because tweets are limited to 280 characters, users often write with abbreviations, slang, and nonstandard grammar. As a result, topic detection on tweets often suffers from low accuracy, and one way to address this is feature expansion. Feature expansion semantically enriches a short text by adding related terms to its original words, so that it behaves more like a longer document, which reduces word mismatches. This study applies GloVe-based feature expansion together with the Convolutional Neural Network (CNN) and Gated Recurrent Unit (GRU) classification methods. The results show that the topic detection system with GloVe feature expansion and hybrid CNN-GRU classification achieves an accuracy of 94.41%.
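The idea of feature expansion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for pretrained GloVe embeddings, the binary feature weighting replaces whatever weighting the study used, and the names `TOY_VECTORS`, `build_similarity_corpus`, and `expand_features` are hypothetical. The sketch assumes that a vocabulary term absent from a tweet is switched on when one of its nearest neighbours in embedding space appears in the tweet.

```python
import math

# Made-up 3-dimensional vectors standing in for pretrained GloVe embeddings.
TOY_VECTORS = {
    "football": [0.9, 0.1, 0.0],
    "soccer": [0.85, 0.15, 0.05],
    "goal": [0.7, 0.3, 0.1],
    "election": [0.0, 0.9, 0.2],
    "vote": [0.05, 0.85, 0.25],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_similarity_corpus(vectors, top_n=1):
    """Map every word to its top_n most similar other words."""
    corpus = {}
    for word, vec in vectors.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in vectors.items()
            if other != word
        ]
        scored.sort(reverse=True)  # highest similarity first
        corpus[word] = [other for _, other in scored[:top_n]]
    return corpus


def expand_features(tokens, vocabulary, similarity_corpus):
    """Binary feature vector over vocabulary, with expansion.

    A term counts as present if it occurs in the tweet, or if any of
    its similar words (from similarity_corpus) occurs in the tweet.
    """
    present = set(tokens)
    features = []
    for term in vocabulary:
        similar_words = similarity_corpus.get(term, [])
        hit = term in present or any(w in present for w in similar_words)
        features.append(1 if hit else 0)
    return features


vocabulary = ["football", "election", "goal"]
similarity = build_similarity_corpus(TOY_VECTORS, top_n=1)
tweet_tokens = ["soccer", "tonight"]

# Without expansion no vocabulary term matches the tweet.
plain = [1 if term in tweet_tokens else 0 for term in vocabulary]
expanded = expand_features(tweet_tokens, vocabulary, similarity)
```

With these toy vectors, the tweet contains none of the vocabulary terms, so the plain vector is all zeros, while the expanded vector activates "football" because its nearest neighbour "soccer" occurs in the tweet. In a full pipeline, expanded vectors like this would be the input to a classifier such as the CNN-GRU hybrid.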
Pages: 386−396
Copyright (c) 2023 Moh Adi Ikfini M, Erwin Budi Setiawan
This work is licensed under a Creative Commons Attribution 4.0 International License.