Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languages
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Rath, S., Knill, K., Ragni, A., & Gales, M. (2014). Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languages. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 835-839. https://doi.org/10.17863/CAM.26571
Copyright © 2014 ISCA.
In recent years there has been significant interest in Automatic Speech Recognition (ASR) and KeyWord Spotting (KWS) systems for low resource languages. One of the driving forces for this research direction is the IARPA Babel project. This paper examines the performance gains that can be obtained by combining two forms of deep neural network ASR systems, Tandem and Hybrid, for both ASR and KWS using data released under the Babel project. Baseline systems are described for the five option period 1 languages: Assamese, Bengali, Haitian Creole, Lao, and Zulu. All the ASR systems share common attributes, for example deep neural network configurations and decision trees based on rich phonetic questions and state-position root nodes. The baseline ASR and KWS performance of the Hybrid and Tandem systems is compared for both the "full" (approximately 80 hours of training data) and "limited" (approximately 10 hours of training data) language packs. By combining the two systems, consistent performance gains can be obtained for KWS in all configurations.
This record's DOI: https://doi.org/10.17863/CAM.26571
This record's URL: https://www.repository.cam.ac.uk/handle/1810/279191