Language independent and unsupervised acoustic models for speech recognition and keyword spotting
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Knill, K., Gales, M., Ragni, A., & Rath, S. (2014). Language independent and unsupervised acoustic models for speech recognition and keyword spotting. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 16-20. https://doi.org/10.17863/CAM.26568
Copyright © 2014 ISCA.
Developing high-performance speech processing systems for low-resource languages is very challenging. One approach to addressing the lack of resources is to make use of data from multiple languages. A popular direction in recent years has been to train a multi-language bottleneck DNN. Language dependent and/or multi-language (all training languages) Tandem acoustic models (AMs) are then trained. This work considers a particular scenario in which the target language is unseen in multi-language training and has limited language model training data, a limited lexicon, and acoustic training data without transcriptions. First, a zero acoustic resources case is described, in which a multi-language AM is applied directly to an unseen language as a language independent AM (LIAM). Second, in an unsupervised approach, a LIAM is used to obtain hypothesised transcriptions of the target-language acoustic data, which are then used to train a language dependent AM. Three languages from the IARPA Babel project are used for assessment: Vietnamese, Haitian Creole and Bengali. The performance of the zero acoustic resources system is found to be poor, with keyword spotting performance at best 60% of that of the language dependent system. Unsupervised language dependent training yields performance gains. For one language (Haitian Creole), the Babel target is achieved on the in-vocabulary data.
This record's DOI: https://doi.org/10.17863/CAM.26568
This record's URL: https://www.repository.cam.ac.uk/handle/1810/279188