Development of a Convolutional Neural Network Algorithm for Analyzing Speech Emotion Using Mel-Spectrograms
Abstract
Speech Emotion Recognition (SER) still faces challenges in accuracy, especially in distinguishing acoustically similar emotions. Conventional features such as Mel-frequency cepstral coefficients (MFCC) are often ineffective at capturing the emotional nuances of the voice. To address this, this study develops a Convolutional Neural Network (CNN) model based on the Spec-ResNet architecture that takes Mel-spectrograms as input, with the goal of improving the system's ability to extract and recognize emotional signatures from speech signals. A second objective is to evaluate classification performance on the primary emotions in the RAVDESS dataset and to measure model consistency through 5-fold cross-validation. Spec-ResNet is an adaptation of the ResNet architecture that uses residual learning to strengthen multi-stage feature extraction. Experiments were conducted on the RAVDESS dataset, which contains 1,440 voice samples covering six primary emotions: neutral, happy, sad, angry, fearful, and surprised. The results showed a significant increase in accuracy, with a macro-averaged score of 92%, up from 83% for the MLP/SVM baseline. Neutral and happy emotions were classified very well (F1-scores of 93% and 90%, respectively), but fear and surprise remained difficult to distinguish because of the similarity of their vocal patterns. 5-fold cross-validation yielded an average accuracy of 91.5% ± 0.8%. This study demonstrates the strong potential of Mel-spectrograms in SER, while also underscoring the need for more advanced approaches, such as attention mechanisms, to handle ambiguous emotions.
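The evaluation figures quoted in the abstract (a macro-averaged score, per-class F1-scores, and a 5-fold mean ± spread) can be illustrated with a minimal sketch. It is not the authors' code. It assumes the macro score is a macro-averaged F1, that the ± value is the sample standard deviation across folds, and that folds are contiguous and unshuffled; the abstract states none of these. The function names and the toy labels are hypothetical.

```python
from statistics import mean, stdev


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous (train, validation) index pairs.

    Assumption: no shuffling or stratification; a real experiment would
    usually shuffle and stratify by emotion label.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, val))
        start += size
    return folds


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (every class counts equally)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def summarize(fold_scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return mean(fold_scores), stdev(fold_scores)


if __name__ == "__main__":
    # Fold sizes for a dataset of 1,440 samples, the size given in the abstract.
    folds = k_fold_indices(1440, k=5)
    print([len(val) for _, val in folds])

    # Hypothetical toy labels, NOT data from the study.
    y_true = ["neutral", "happy", "sad", "happy", "neutral", "sad"]
    y_pred = ["neutral", "happy", "happy", "happy", "neutral", "sad"]
    print(macro_f1(y_true, y_pred, ["neutral", "happy", "sad"]))
```

Because the macro average weights every class equally, a weak class such as fear or surprise lowers the score as much as a strong class raises it, which is why per-class F1-scores are reported alongside it.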
Pages: 1143-1152
Copyright (c) 2025 Iqlima Sabila Zakka, Abdul Rakhman, Lindawati Lindawati

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).