Comparison of Random Forest and Decision Tree Methods for Emotion Classification based on Social Media Posts
Abstract
Social media platforms such as X (formerly Twitter) have become essential channels for expressing emotions and opinions, making emotion classification a critical task with applications in mental health, public sentiment monitoring, and customer feedback analysis. This study compares the Random Forest and Decision Tree algorithms for classifying emotions such as joy, sadness, anger, and fear in social media posts. Data were collected by crawling tweets and labeling them manually. Preprocessing included tokenization, stemming, and stopword removal, and features were extracted using TF-IDF and Bag of Words. The experimental scenarios varied the data split ratio, applied resampling to balance the classes, and tuned the model parameters. The Decision Tree parameters were criterion (gini, entropy), max depth (none or fixed values), min samples split (2, 5), and min samples leaf (1, 2). The Random Forest parameters were n_estimators (100–400), max depth (none or fixed values), min samples split (2, 5, 10), and min samples leaf (1, 2). Random Forest achieved a maximum accuracy of 76.17%, outperforming the Decision Tree's 72.62%. For both models, combining TF-IDF and Bag of Words features gave the highest accuracy. These results underscore the importance of preprocessing, balanced datasets, and parameter optimization for effective emotion classification, and they offer insights for sentiment analysis and natural language processing, with practical applications in public sentiment tracking, customer experience enhancement, and crisis management.
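The preprocessing and feature-extraction steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stopword list and the suffix-stripping stemmer are hypothetical stand-ins (the abstract names neither, nor the language of the tweets), TF-IDF uses the simple variant tf = count / document length and idf = log(N / df), and the "combination" of TF-IDF and Bag of Words is assumed to mean concatenating the two vectors.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the study's list is not given.
STOPWORDS = {"i", "am", "me", "so", "the", "a", "at", "is", "this", "and", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Toy suffix stripper standing in for a real stemmer (an assumption)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stopwords, then stem each remaining token."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]


def build_vocab(docs):
    """Sorted list of every distinct term across the tokenized documents."""
    return sorted({term for doc in docs for term in doc})


def bag_of_words(doc, vocab):
    """Raw term counts, one entry per vocabulary term."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def tf_idf(doc, docs, vocab):
    """tf = count / doc length; idf = log(N / df). One common variant."""
    n_docs = len(docs)
    counts = Counter(doc)
    vector = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)  # >= 1 because vocab comes from docs
        tf = counts[term] / len(doc) if doc else 0.0
        vector.append(tf * math.log(n_docs / df))
    return vector


def combined_features(doc, docs, vocab):
    """Concatenate BoW counts and TF-IDF weights (assumed combination)."""
    return bag_of_words(doc, vocab) + tf_idf(doc, docs, vocab)


tweets = [
    "I am so happy and smiling today",
    "This news is making me sad",
    "So angry at the delayed train",
]
docs = [preprocess(t) for t in tweets]
vocab = build_vocab(docs)
features = [combined_features(d, docs, vocab) for d in docs]
```

Each tweet ends up as a vector of length 2 × |vocabulary|. In practice a library vectorizer, such as scikit-learn's `CountVectorizer` and `TfidfVectorizer`, would usually replace these hand-written helpers; note that its default TF-IDF weighting differs from the simple variant used here.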
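The data-split and class-balancing scenarios can be illustrated with a small standard-library sketch. The abstract does not say which resampling method was used, so random oversampling (duplicating randomly chosen samples from smaller classes until every class matches the largest one) is an assumption here, as are the helper names `split_data` and `random_oversample` and the fixed seeds. In this sketch resampling is applied only to the training portion so that duplicated tweets cannot leak into the test set; whether the study did the same is not stated.

```python
import random
from collections import Counter, defaultdict


def split_data(samples, labels, test_ratio, seed=0):
    """Shuffle indices with a fixed seed and hold out test_ratio of them."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_test = round(len(indices) * test_ratio)

    def pick(idx):
        return [samples[i] for i in idx], [labels[i] for i in idx]

    # Returns (train_x, train_y), (test_x, test_y).
    return pick(indices[n_test:]), pick(indices[:n_test])


def random_oversample(samples, labels, seed=0):
    """Duplicate random samples of each smaller class until all classes
    are as large as the biggest one (hypothetical resampling choice)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy imbalanced data: 6 joy, 3 sadness, 1 fear (illustrative only).
samples = [f"tweet {i}" for i in range(10)]
labels = ["joy"] * 6 + ["sadness"] * 3 + ["fear"]
(train_x, train_y), (test_x, test_y) = split_data(samples, labels, test_ratio=0.2)
balanced_x, balanced_y = random_oversample(train_x, train_y)
print(Counter(train_y), Counter(balanced_y))
```

Changing `test_ratio` (for example 0.2 versus 0.3) reproduces the idea of comparing split ratios; which ratios the study tested is not listed in the abstract.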
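The `criterion` parameter tuned for the Decision Tree controls how candidate splits are scored. The sketch below computes the two standard impurity measures named in the abstract, Gini impurity 1 - sum_k p_k^2 and entropy -sum_k p_k log2 p_k, together with the impurity decrease that a tree maximizes when it chooses a split. It is a textbook illustration on toy labels, not code from the study.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum_k p_k * log2(1 / p_k)."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def impurity_decrease(parent, left, right, criterion):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)
    return criterion(parent) - weighted


uniform = ["joy", "sadness", "anger", "fear"]
parent = ["joy", "joy", "sadness", "sadness"]
left, right = parent[:2], parent[2:]
print(gini(uniform), entropy(uniform))  # 0.75 2.0
print(impurity_decrease(parent, left, right, gini))  # 0.5
print(impurity_decrease(parent, left, right, entropy))  # 1.0
```

Both measures are zero for a pure node and largest when the emotion classes are evenly mixed; they usually rank splits similarly, which is one reason the choice between them is treated as a tunable parameter.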
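The tuning grids listed in the abstract can be written out explicitly and expanded into individual configurations, as a grid search would do. The dictionary keys follow scikit-learn's `DecisionTreeClassifier` and `RandomForestClassifier` parameter names, which match the abstract's wording, although the abstract does not state which library was used. It also gives no numbers for the "fixed values" of max depth or for the step within the 100-400 range of n_estimators, so the values 10 and 20 and the step of 100 below are hypothetical placeholders.

```python
from itertools import product

# Decision Tree grid from the abstract. The fixed max_depth values are
# not listed there; 10 and 20 are hypothetical placeholders.
DT_GRID = {
    "criterion": ["gini", "entropy"],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

# Random Forest grid. The step of 100 inside the 100-400 range is assumed,
# and the max_depth values are again hypothetical placeholders.
RF_GRID = {
    "n_estimators": [100, 200, 300, 400],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2],
}


def expand_grid(grid):
    """Return every combination of the grid's values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


dt_configs = expand_grid(DT_GRID)
rf_configs = expand_grid(RF_GRID)
print(len(dt_configs), len(rf_configs))  # 24 72
```

With scikit-learn, each dictionary in `dt_configs` could be unpacked as `DecisionTreeClassifier(**config)`, or the grid dictionary could be passed to `GridSearchCV` as `param_grid`. With these placeholder values the Decision Tree grid has 24 configurations and the Random Forest grid 72, which illustrates how quickly the Random Forest search grows.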
Pages: 2240-2248
Copyright (c) 2025 Muhammad Abiyyu Tsaqif, Warih Maharani

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).