
Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languages

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Rath, SP 
Knill, KM 
Ragni, A 
Gales, MJF 

Abstract

Copyright © 2014 ISCA. In recent years there has been significant interest in Automatic Speech Recognition (ASR) and KeyWord Spotting (KWS) systems for low resource languages. One of the driving forces for this research direction is the IARPA Babel project. This paper examines the performance gains that can be obtained by combining two forms of deep neural network ASR system, Tandem and Hybrid, for both ASR and KWS using data released under the Babel project. Baseline systems are described for the five option period 1 languages: Assamese, Bengali, Haitian Creole, Lao, and Zulu. All the ASR systems share common attributes, for example deep neural network configurations, and decision trees based on rich phonetic questions and state-position root nodes. The baseline ASR and KWS performance of the Hybrid and Tandem systems is compared for both the "full" language packs (approximately 80 hours of training data) and the "limited" language packs (approximately 10 hours of training data). By combining the two systems, consistent performance gains can be obtained for KWS in all configurations.

Keywords

keyword spotting, deep neural network, Tandem, Hybrid

Journal Title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference Name

Interspeech 2014

Journal ISSN

2308-457X
1990-9772

Sponsorship

IARPA (4912046943)