Comparison of Random Forest and Decision Tree Methods for Emotion Classification based on Social Media Posts

Muhammad Abiyyu Tsaqif; Warih Maharani

doi:10.47065/bits.v6i4.6677

Muhammad Abiyyu Tsaqif (*), Telkom University, Bandung, Indonesia
Warih Maharani, Telkom University, Bandung, Indonesia

(*) Corresponding Author


Keywords: Bag of Words; Decision Tree; Emotion Classification; Random Forest; TF-IDF

Abstract

Social media platforms such as X (formerly Twitter) have become essential channels for expressing emotions and opinions, which makes emotion classification a critical task with applications in mental health, public sentiment monitoring, and customer feedback analysis. This study compares the Random Forest and Decision Tree algorithms for classifying emotions such as joy, sadness, anger, and fear in social media posts. Data were collected by crawling tweets and labeling them manually. Preprocessing included tokenization, stemming, and stopword removal, and features were extracted with TF-IDF and Bag of Words. The experimental scenarios varied the data split ratio, applied resampling to balance the classes, and tuned model parameters. The Decision Tree parameters were criterion (gini, entropy), max_depth (None or fixed values), min_samples_split (2, 5), and min_samples_leaf (1, 2). The Random Forest parameters were n_estimators (100–400), max_depth (None or fixed values), min_samples_split (2, 5, 10), and min_samples_leaf (1, 2). Random Forest reached a maximum accuracy of 76.17%, outperforming the Decision Tree's 72.62%. For both models, combining TF-IDF and Bag of Words features gave the highest accuracy. This study underscores the importance of preprocessing, balanced datasets, and parameter optimization for effective emotion classification. The findings offer insights for advancing sentiment analysis and natural language processing, with practical applications in public sentiment tracking, customer experience enhancement, and crisis management.
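The tuning setup summarized above can be sketched with scikit-learn. This is a minimal, hypothetical illustration, not the authors' code. It assumes scikit-learn, concatenates TF-IDF and Bag of Words features with FeatureUnion, balances classes by random oversampling of the training split (the abstract does not name the resampling method), and grid-searches each model over the listed parameter ranges. The fixed max_depth values (10, 20), the 100-step spacing of n_estimators, the 0.25 test split, and the toy posts are placeholders. Stemming and stopword removal are assumed to happen upstream.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.tree import DecisionTreeClassifier

# Grids follow the abstract; the fixed max_depth values (10, 20) and the
# n_estimators steps are hypothetical placeholders, not the paper's values.
DT_GRID = {
    "clf__criterion": ["gini", "entropy"],
    "clf__max_depth": [None, 10, 20],
    "clf__min_samples_split": [2, 5],
    "clf__min_samples_leaf": [1, 2],
}
RF_GRID = {
    "clf__n_estimators": [100, 200, 300, 400],
    "clf__max_depth": [None, 10, 20],
    "clf__min_samples_split": [2, 5, 10],
    "clf__min_samples_leaf": [1, 2],
}

# Tiny hypothetical dataset (not the paper's data), deliberately imbalanced.
TOY = {
    "joy": ["so happy today", "what a great day", "love this so much",
            "feeling wonderful and happy", "best news ever", "happy smiles all around"],
    "sadness": ["feeling so sad", "i miss you so much", "crying all night",
                "such a lonely day", "sad news today"],
    "anger": ["this makes me so angry", "i hate this traffic",
              "so furious right now", "stop lying to me"],
    "fear": ["scared of the dark", "so afraid of the exam",
             "terrified of what comes next", "nervous and scared tonight"],
}
TEXTS = [text for _, items in TOY.items() for text in items]
LABELS = [label for label, items in TOY.items() for _ in items]


def oversample(texts, labels, seed=0):
    """Randomly duplicate minority-class examples until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in sorted(by_class.items()):
        padded = items + rng.choices(items, k=target - len(items))
        out_texts.extend(padded)
        out_labels.extend([label] * target)
    return out_texts, out_labels


def build_pipeline(clf):
    """Concatenate TF-IDF and Bag of Words features, then classify."""
    features = FeatureUnion([
        ("tfidf", TfidfVectorizer()),
        ("bow", CountVectorizer()),
    ])
    return Pipeline([("features", features), ("clf", clf)])


def compare(texts, labels, test_size=0.25, seed=0,
            dt_grid=DT_GRID, rf_grid=RF_GRID, cv=3, n_jobs=None):
    """Grid-search both models on a balanced training split; return test accuracy per model."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed)
    # Balance only the training split so duplicated posts never reach the test set.
    x_train, y_train = oversample(x_train, y_train, seed=seed)
    results = {}
    for name, clf, grid in [
        ("decision_tree", DecisionTreeClassifier(random_state=seed), dt_grid),
        ("random_forest", RandomForestClassifier(random_state=seed), rf_grid),
    ]:
        search = GridSearchCV(build_pipeline(clf), grid, cv=cv,
                              scoring="accuracy", n_jobs=n_jobs)
        search.fit(x_train, y_train)
        results[name] = accuracy_score(y_test, search.predict(x_test))
    return results
```

For example, compare(TEXTS, LABELS) returns a dict mapping "decision_tree" and "random_forest" to test-set accuracy. With the full grids the Random Forest search fits 72 settings per fold, so passing n_jobs=-1 helps on real data. Because oversampling here happens before cross-validation, duplicated posts can fall into both a training fold and its validation fold; resampling inside the cross-validation loop (for example with imbalanced-learn's pipeline) avoids that optimistic bias.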


