Mental Health Sentiment Analysis on Twitter using Ensemble Learning Algorithm

Kemal Aziz; Bambang Ari Wahyudi; Irma Palupi

doi:10.47065/bits.v7i2.7763

Kemal Aziz * Telkom University, Bandung, Indonesia
Bambang Ari Wahyudi Telkom University, Bandung, Indonesia
Irma Palupi Telkom University, Bandung, Indonesia

(*) Corresponding Author

Keywords: Ensemble Learning; Machine Learning; Mental Health; Sentiment Analysis; Social Media

Abstract

Mental health problems have become an important health issue around the world. Poor understanding and low awareness of mental health remain obstacles to mental health recovery efforts. Meanwhile, social media has become a platform where people express their feelings and emotions. This study uses a dataset of 20,000 English tweets, equally divided into 10,000 depressed and 10,000 non-depressed tweets, which were cleaned, preprocessed, and converted into features using Term Frequency-Inverse Document Frequency (TF-IDF). The proposed method is an ensemble learning framework that combines Naïve Bayes, Support Vector Machine, and Random Forest classifiers and produces its prediction by majority voting. Each classifier was tuned to its best-performing parameters, and the models were validated with 5-fold cross-validation. The experimental results show that Naïve Bayes with α = 1 achieved an accuracy of 76.23%, Random Forest with 5,000 trees achieved 76.77%, and Support Vector Machine with a linear kernel achieved 75.32%. By combining these classifiers, the ensemble model reached the highest accuracy, 77.88%, demonstrating the effectiveness of combining multiple models to improve performance.
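The majority-voting step described above can be illustrated with a minimal, standard-library-only sketch. It assumes that each of the three classifiers (Naïve Bayes, SVM, Random Forest) has already been trained on TF-IDF features and has produced a 0/1 label per tweet. The function names `majority_vote` and `accuracy`, the label encoding (1 = depressed, 0 = non-depressed), and the six-tweet predictions are hypothetical. They are not taken from the paper and are not its actual implementation.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Hard (majority) voting over several classifiers.

    predictions[m][i] is the label classifier m assigns to sample i
    (hypothetical encoding: 1 = depressed, 0 = non-depressed).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must label the same samples")
    ensemble = []
    for i in range(n_samples):
        # Count how many classifiers voted for each label on sample i.
        votes = Counter(p[i] for p in predictions)
        # With three voters and two classes, a strict majority always exists.
        label, _ = votes.most_common(1)[0]
        ensemble.append(label)
    return ensemble


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical true labels and per-classifier predictions for six tweets.
    y_true = [1, 0, 1, 1, 0, 0]
    nb = [1, 0, 0, 1, 0, 1]   # 4 of 6 correct
    svm = [1, 1, 1, 0, 0, 0]  # 4 of 6 correct
    rf = [0, 0, 1, 1, 1, 0]   # 4 of 6 correct
    ens = majority_vote([nb, svm, rf])
    print(ens)                    # → [1, 0, 1, 1, 0, 0]
    print(accuracy(y_true, ens))  # → 1.0
```

In this toy case, each classifier is correct on only four of the six tweets. Because their errors fall on different tweets, the vote recovers all six labels. In practice the gain is usually far smaller, as in the reported rise to 77.88%, because classifiers trained on the same features tend to make correlated errors.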
