Development of a Convolutional Neural Network Algorithm for Analyzing Speech Emotion Using Mel-Spectrograms

Iqlima Sabila Zakka; Abdul Rakhman; Lindawati Lindawati

doi:10.47065/bits.v7i2.7875

Iqlima Sabila Zakka, Politeknik Negeri Sriwijaya, Palembang, Indonesia
Abdul Rakhman, Politeknik Negeri Sriwijaya, Palembang, Indonesia
Lindawati Lindawati (*), Politeknik Negeri Sriwijaya, Palembang, Indonesia

(*) Corresponding Author

Keywords: Mel-Spectrogram; CNN; Neural Network; Speech Emotion Recognition; Accuracy

Abstract

Speech Emotion Recognition (SER) still struggles with accuracy, especially when distinguishing acoustically similar emotions. Conventional features such as Mel-Frequency Cepstral Coefficients (MFCC) often fail to capture the emotional nuances of the voice. To address this, this study develops a Convolutional Neural Network (CNN) model based on the Spec-ResNet architecture that takes Mel-spectrograms as input, with the aim of improving the system's ability to extract and recognize emotional signatures in speech signals. A further objective is to evaluate classification performance on the primary emotions in the RAVDESS dataset and to measure model consistency through 5-fold cross-validation. Spec-ResNet is an adaptation of the ResNet architecture that uses residual learning to support multi-stage feature extraction. Experiments were conducted on the RAVDESS dataset, which contains 1,440 voice samples across six primary emotions: neutral, happy, sad, angry, fearful, and surprised. The results show a marked increase in accuracy, with the macro score reaching 92%, up from 83% for the MLP/SVM baseline. Neutral and happy were classified very well (F1-scores of 93% and 90%, respectively), but fear and surprise remained difficult to distinguish because of their similar vocal patterns. 5-fold cross-validation yielded an average accuracy of 91.5% ± 0.8%. The study demonstrates the potential of Mel-spectrograms for SER while underscoring the need for more advanced approaches, such as attention mechanisms, to handle ambiguous emotions.
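
The evaluation protocol named in the abstract (per-class F1, a macro-averaged score, and 5-fold cross-validation reported as mean ± standard deviation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the stratified fold assignment, the balanced 240-samples-per-class toy labels, the use of the sample standard deviation, and the `noisy_stand_in` predictor (a synthetic placeholder for the trained Spec-ResNet) are all hypothetical. The numbers it prints are synthetic and are not the paper's results.

```python
import random
import statistics

# Label strings are illustrative; the abstract lists six primary emotions.
EMOTIONS = ["neutral", "happy", "sad", "angry", "fearful", "surprised"]


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_validate(labels, predict_fn, k=5, seed=0):
    """Return the accuracy on each held-out fold.

    predict_fn(train_idx, test_idx) must return one predicted label per test index.
    """
    folds = stratified_folds(labels, k, seed)
    accuracies = []
    for held_out in folds:
        train_idx = [i for fold in folds if fold is not held_out for i in fold]
        preds = predict_fn(train_idx, held_out)
        correct = sum(1 for i, p in zip(held_out, preds) if labels[i] == p)
        accuracies.append(correct / len(held_out))
    return accuracies


def noisy_stand_in(labels, classes, error_rate=0.1, seed=1):
    """Hypothetical placeholder for a trained model.

    With probability error_rate it replaces the true label by a random class;
    it ignores the training indices entirely.
    """
    rng = random.Random(seed)

    def predict(train_idx, test_idx):
        return [
            rng.choice(classes) if rng.random() < error_rate else labels[i]
            for i in test_idx
        ]

    return predict


# Assumption: 1,440 samples split evenly over six classes (240 each).
toy_labels = [emotion for emotion in EMOTIONS for _ in range(240)]
fold_accuracies = cross_validate(toy_labels, noisy_stand_in(toy_labels, EMOTIONS))

mean_acc = statistics.mean(fold_accuracies)
std_acc = statistics.stdev(fold_accuracies)  # sample standard deviation (assumed)
print(f"5-fold accuracy: {mean_acc:.3f} ± {std_acc:.3f}")
```

Macro averaging gives every emotion the same weight regardless of how many samples it has, which is why weak classes such as fear and surprise pull the macro score down more visibly than they would pull down overall accuracy. In a real pipeline, `predict_fn` would train the model on `train_idx` and predict on the held-out fold.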
