Joint decoding of tandem and hybrid systems for improved keyword spotting on low resource languages

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Wang, H 
Ragni, A 
Gales, MJF 
Knill, KM 
Woodland, PC 

Abstract

Copyright © 2015 ISCA. Keyword spotting (KWS) for low-resource languages has drawn increasing attention in recent years. The state-of-the-art KWS systems are based on lattices or Confusion Networks (CN) generated by Automatic Speech Recognition (ASR) systems. It has been shown that considerable KWS gains can be obtained by combining the keyword detection results from different forms of ASR systems, e.g., Tandem and Hybrid systems. This paper investigates an alternative combination scheme for KWS using joint decoding. This scheme treats a Tandem system and a Hybrid system as two separate streams, and makes a linear combination of individual acoustic model log-likelihoods. Joint decoding is more efficient as it requires just a single pass of decoding and a single pass of keyword search. Experiments on six Babel OP2 development languages show that joint decoding is capable of providing consistent gains over each individual system. Moreover, it is possible to efficiently rescore the joint decoding lattices with Tandem or Hybrid acoustic models, and further KWS gains can be obtained by merging the detection posting lists from the joint decoding lattices and rescored lattices.
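
The stream combination described in the abstract can be illustrated with a minimal sketch. It assumes each system supplies a matrix of per-frame, per-state acoustic log-likelihoods of the same shape. The function name `joint_log_likelihood`, the equal stream weights, and the toy scores are hypothetical choices for illustration only; the record does not state the weights or the decoder used in the paper.

```python
import numpy as np


def joint_log_likelihood(tandem_ll, hybrid_ll, w_tandem=0.5, w_hybrid=0.5):
    """Linearly combine per-frame, per-state log-likelihoods from two streams.

    The weights are illustrative assumptions, not values from the paper.
    """
    tandem_ll = np.asarray(tandem_ll, dtype=float)
    hybrid_ll = np.asarray(hybrid_ll, dtype=float)
    if tandem_ll.shape != hybrid_ll.shape:
        raise ValueError("streams must share the same (frames, states) shape")
    # Weighted sum in the log domain: one combined score per frame and state.
    return w_tandem * tandem_ll + w_hybrid * hybrid_ll


# Toy scores: 2 frames x 2 states for each stream (made-up numbers).
tandem = np.array([[-2.0, -4.0], [-3.0, -1.0]])
hybrid = np.array([[-1.0, -5.0], [-6.0, -2.0]])

joint = joint_log_likelihood(tandem, hybrid)
# Highest-scoring state per frame under the combined score.
best_states = joint.argmax(axis=1)
```

Because the combined scores come from a single function applied frame by frame, a decoder consuming `joint` would need only one decoding pass, which is the efficiency argument made in the abstract.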

Keywords

keyword spotting, joint decoding, deep neural network, Tandem, Hybrid

Journal Title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference Name

Interspeech 2015

Journal ISSN

2308-457X
1990-9772

Volume Title

2015-January

Sponsorship

IARPA (4912046943)