Repository logo
 

Language independent and unsupervised acoustic models for speech recognition and keyword spotting

Accepted version
Peer-reviewed

Type

Conference Object

Change log

Authors

Knill, KM 
Gales, MJF 
Ragni, A 
Rath, SP 

Abstract

Copyright © 2014 ISCA. Developing high-performance speech processing systems for low-resource languages is very challenging. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to train a multi-language bottleneck DNN. Language dependent and/or multi-language (all training languages) Tandem acoustic models (AM) are then trained. This work considers a particular scenario where the target language is unseen in multi-language training and has limited language model training data, a limited lexicon, and acoustic training data without transcriptions. A zero acoustic resources case is first described where a multilanguage AM is directly applied, as a language independent AM (LIAM), to an unseen language. Secondly, in an unsupervised approach a LIAM is used to obtain hypotheses for the target language acoustic data transcriptions which are then used in training a language dependent AM. 3 languages from the IARPA Babel project are used for assessment: Vietnamese, Haitian Creole and Bengali. Performance of the zero acoustic resources system is found to be poor, with keyword spotting at best 60% of language dependent performance. Unsupervised language dependent training yields performance gains. For one language (Haitian Creole) the Babel target is achieved on the in-vocabulary data.

Description

Keywords

speech recognition, low resource, multilingual

Journal Title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference Name

Interspeech 2014

Journal ISSN

2308-457X
1990-9772

Volume Title

Publisher

Sponsorship
IARPA (4912046943)