Investigation of multilingual deep neural networks for spoken term detection

Knill, KM; Gales, MJF; Rath, SP; Woodland, PC; Zhang, C; Zhang, SX

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/279189

Repository DOI

https://doi.org/10.17863/CAM.26569

Files

Accepted version (162.33 KB)

Type

Conference Object

Authors

Knill, KM

Gales, MJF

Rath, SP

Woodland, PC

Zhang, C

Zhang, SX

Abstract

The development of high-performance speech processing systems for low-resource languages is a challenging area. One approach to addressing the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to use bottleneck features, or hybrid systems, trained on multilingual data for speech-to-text (STT) systems. This paper presents an investigation into the application of these multilingual approaches to spoken term detection. Experiments were run using the IARPA Babel limited language pack corpora (∼10 hours/language) with four languages for initial multilingual system development and an additional held-out target language. STT gains achieved through using multilingual bottleneck features in a Tandem configuration are shown to also apply to keyword search (KWS). Further improvements in both STT and KWS were observed by incorporating language questions into the Tandem GMM-HMM decision trees for the training set languages. Adapted hybrid systems performed slightly worse on average than the adapted Tandem systems. A language-independent acoustic model test on the target language showed that, at a minimum, retraining or adapting the acoustic models to the target language is currently needed to achieve reasonable performance. © 2013 IEEE.

Keywords

Multilingual, speech recognition, spoken term detection, keyword search, neural networks

Journal Title

2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings

Conference Name

2013 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

Publisher

IEEE

Publisher DOI

https://doi.org/10.1109/ASRU.2013.6707719

Rights

http://www.rioxx.net/licenses/all-rights-reserved

Collections

Cambridge University Research Outputs