Transfer learning based feature selection for feedforward neural network for speech emotion classifier
https://doi.org/10.21122/2309-4923-2025-1-38-43
Abstract
This work discusses speech emotion recognition based on custom feature engineering and feature selection, using mel-frequency cepstral coefficients (MFCCs) as the initial audio features. The proposed transfer learning approach employs the backward stepwise selection algorithm with statistical learning classifiers to select features; the resulting feature subset is then used to train feedforward neural networks. This technique allowed us to significantly reduce the size of the initial feature vector while increasing the models' prediction quality. We used the TESS and RAVDESS datasets to estimate the performance of the proposed method. Unweighted average recall (UAR) was used to evaluate model quality. Experimental results demonstrate promising accuracy (UAR = 82 % for TESS and UAR = 53 % for RAVDESS), showcasing the potential of this approach for applications such as virtual agents, voice assistants, and mental health diagnostics.
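The pipeline summarized in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes scikit-learn, uses synthetic data in place of MFCC-derived features, picks logistic regression as a stand-in for the unspecified statistical learning classifier, and uses arbitrary values for the feature counts and network size. UAR is computed as macro-averaged recall, which scikit-learn also exposes as the `balanced_accuracy` scorer.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic features stand in for MFCC-derived statistics.
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=5, n_redundant=2,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 1: backward stepwise selection driven by a simple classifier.
# Assumption: logistic regression and a target of 6 features are illustrative.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=6,
    direction="backward",
    scoring="balanced_accuracy",  # equals UAR (macro-averaged recall)
    cv=3,
)
selector.fit(X_train_s, y_train)
selected = selector.get_support()  # boolean mask over the 12 input features

# Step 2: reuse the selected subset to train a feedforward neural network.
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
mlp.fit(X_train_s[:, selected], y_train)

# Step 3: evaluate with unweighted average recall on held-out data.
y_pred = mlp.predict(X_test_s[:, selected])
uar = recall_score(y_test, y_pred, average="macro")
print(f"kept {int(selected.sum())} of {X.shape[1]} features, UAR = {uar:.2f}")
```

The selection step is the expensive part, since every candidate removal is scored by cross-validation; running it with a lightweight classifier and reusing the result for the network keeps that cost manageable.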
About the Authors
D. V. Krasnoproshin
Belarus
Krasnoproshin D.V., PhD Student at the Department of Electronic Computing Facilities
Minsk
M. I. Vashkevich
Belarus
Vashkevich M. I., PhD, Professor at the Department of Electronic Computing Facilities
Minsk
References
1. Issa D. Speech emotion recognition with deep convolutional neural networks / D. Issa, M. Demirci, A. Yazici // Biomedical Signal Processing and Control. – Vol. 59. – 2020. – P. 1-11.
2. Baruah M., Banerjee B. Speech emotion recognition via generation using an attention-based variational recurrent neural network // Proceedings of the INTERSPEECH. – 2022. – P. 4710-4714.
3. Krasnoproshin D.V., Vashkevich M.I. Speech emotion recognition method based on support vector machine and suprasegmental acoustic features // Doklady BGUIR. – 2024. – Vol. 22. – № 3. – P. 93-100. (In Russ.)
4. Flach P. Machine Learning: The Art and Science of Algorithms that Make Sense of Data. – Cambridge University Press, 2012. – 410 p.
5. Tsanas A. et al. Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease // IEEE transactions on biomedical engineering. – 2012. – Vol. 59. – № 5. – P. 1264-1271.
6. Huang S. H. Supervised feature selection: A tutorial // Artif. Intell. Res. – 2015. – Vol. 4. – № 2. – P. 22-37.
7. James G. et al. An Introduction to Statistical Learning: With Applications in R / G. James, T. Hastie, R. Tibshirani, D. Witten // Springer, 2013. – 426 p.
8. Pichora-Fuller, M. Kathleen, and Kate Dupuis. Toronto Emotional Speech Set (TESS). Borealis, 2020. https://doi.org/10.5683/SP2/E8H2MF
9. Livingstone, Steven R., and Frank A. Russo. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Zenodo, 2018. https://doi.org/10.5281/zenodo.1188975.
10. Luna-Jiménez C. Multimodal emotion recognition on RAVDESS dataset using transfer learning / C. Luna-Jiménez, D. Griol, Z. Callejas, R. Kleinlein, J. M. Montero, F. Fernández-Martínez // Sensors. – 2021. – Vol. 22. – P. 1–29.
For citations:
Krasnoproshin D.V., Vashkevich M.I. Transfer learning based feature selection for feedforward neural network for speech emotion classifier. «System analysis and applied information science». 2025;(1):38-43. https://doi.org/10.21122/2309-4923-2025-1-38-43