Speech Emotion Classification Using MFCC Feature Extraction and Bagging-Based Ensemble Learning

Ivan Haristyawan; Eka Arriyanti; Wahyuni Wahyuni

doi:10.47065/bits.v7i3.8878

Ivan Haristyawan * STMIK Widya Cipta Dharma, Samarinda, Indonesia
Eka Arriyanti STMIK Widya Cipta Dharma, Samarinda, Indonesia
Wahyuni Wahyuni STMIK Widya Cipta Dharma, Samarinda, Indonesia

(*) Corresponding Author

Keywords: Speech Emotion Classification; MFCC; Bagging Algorithm; Decision Tree; Ensemble Learning

Abstract

Speech emotion classification, also known as Speech Emotion Recognition (SER), has become increasingly important with the growing prevalence of human–machine interaction, particularly in the domains of healthcare, online education, and customer service. This study aims to develop a robust speech emotion classification system by employing Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and a Decision Tree–based Bagging algorithm for classification. The proposed approach is designed to address the challenges of low classification accuracy, especially under speaker-independent conditions and limited availability of labeled emotional speech data. The research workflow includes speech signal preprocessing, MFCC feature extraction, dataset partitioning through bootstrapping, ensemble model training, and performance evaluation using accuracy, precision, recall, and F1-score metrics. Experimental results on a balanced dataset comprising five emotion classes (anger, disgust, fear, happy, and sad) demonstrate that the proposed model achieves an overall accuracy of 61.04%. While the fear and happy emotions are classified effectively with recall values of 0.75, the anger class exhibits the lowest performance with an F1-score of 0.49. Confusion matrix analysis further reveals substantial acoustic overlap among several emotion categories, particularly the frequent misclassification of sad as disgust or anger. In conclusion, the integration of MFCC features with the Bagging algorithm improves model stability and robustness; however, further optimization of acoustic features and hyperparameters is required to enhance overall classification accuracy.
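
The bootstrapping, ensemble training, and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the tree depth, the number of estimators, or the MFCC settings, so `max_depth`, `n_estimators`, `seed`, and the helper names below are assumptions. The two-dimensional toy vectors from the hypothetical `make_toy_data` helper only stand in for MFCC feature vectors, and the three labels are borrowed from the paper's classes purely for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a small decision tree. A leaf is a label; a node is (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every row is identical, so no split is possible
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_bagging(X, y, n_estimators=25, max_depth=3, seed=0):
    """Train each tree on a bootstrap sample: len(X) rows drawn with replacement."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        ensemble.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                   max_depth=max_depth))
    return ensemble


def predict_bagging(ensemble, row):
    """Aggregate the trees' predictions by majority vote."""
    votes = Counter(predict_tree(tree, row) for tree in ensemble)
    return votes.most_common(1)[0][0]


def per_class_metrics(y_true, y_pred):
    """Precision, recall, and F1-score for every class seen in either list."""
    metrics = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def make_toy_data():
    """Hypothetical stand-in for MFCC vectors: three well-separated 2-D clusters."""
    centers = {"fear": (0.0, 0.0), "happy": (10.0, 0.0), "sad": (0.0, 10.0)}
    offsets = [-0.4, -0.2, 0.0, 0.2, 0.4]
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for dx in offsets:
            for dy in offsets:
                X.append([cx + dx, cy + dy])
                y.append(label)
    return X, y
```

A typical call would be `X, y = make_toy_data()`, then `ensemble = fit_bagging(X, y)`, followed by `per_class_metrics(y, [predict_bagging(ensemble, row) for row in X])`. In a real pipeline the rows of `X` would be MFCC vectors extracted from preprocessed speech, and the metrics would be computed on a held-out test split rather than on the training data.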
