
HMM vs Deep Learning for Speech Emotion Recognition (SER)


I have to build a Speech Emotion Recognition (SER) system and I am unsure which approach to use: a Hidden Markov Model (HMM) or a deep learning (RNN-LSTM) approach. Which of the two would be better? If there are better models than these two, please let me know.


Solution

  • HMM- and RNN-LSTM-based solutions are not considered highly accurate for SER. I believe the top-ranking approach to date is one based on Deep Retinal Convolution Neural Networks (DRCNNs). See "Speech emotion recognition using Deep Retinal Convolution Neural Networks" by Niu, Yafeng; Zou, Dongsheng; Niu, Yadong; He, Zhongshi; and Tan, Hua, published in July 2017. The authors report an average accuracy of over 99% on the following databases: IEMOCAP, EMO-DB, and SAVEE.