Robust Excitation-based Feature for Automatic Speech Recognition
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Drugman, T., Stylianou, Y., Chen, L., Chen, X., & Gales, M. (2015). Robust Excitation-based Feature for Automatic Speech Recognition. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4664-4668. https://doi.org/10.1109/ICASSP.2015.7178855
In this paper we investigate the use of noise-robust features characterizing the speech excitation signal as complementary features to the commonly used vocal-tract-based features for automatic speech recognition (ASR). The features are tested in a state-of-the-art Deep Neural Network (DNN) based hybrid acoustic model for speech recognition. The suggested excitation features expand the set of excitation features previously considered for ASR, with the expectation that they help better discriminate the broad phonetic classes (e.g., fricatives, nasals, vowels, etc.). Relative improvements in the word error rate are observed on the AMI meeting transcription system, with greater gains (about 5%) when PLP features are combined with the suggested excitation features. For Aurora 4, significant improvements are observed as well. Combining the suggested excitation features with filter banks, a word error rate of 9.96% is achieved.
Keywords: neural networks, automatic speech recognition, speech excitation signal
External DOI: https://doi.org/10.1109/ICASSP.2015.7178855
This record's URL: https://www.repository.cam.ac.uk/handle/1810/247427