Speaker adaptation and adaptive training for jointly optimised tandem systems

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Wang, Y 
Zhang, C 
Gales, MJF 
Woodland, PC 

Abstract

Speaker independent (SI) Tandem systems trained by joint optimisation of bottleneck (BN) deep neural networks (DNNs) and Gaussian mixture models (GMMs) have been found to produce word error rates (WERs) similar to those of Hybrid DNN systems. A key advantage of using GMMs is that existing speaker adaptation methods, such as maximum likelihood linear regression (MLLR), can be used to account for diverse speaker variations and improve system robustness. This paper investigates speaker adaptation and adaptive training (SAT) schemes for jointly optimised Tandem systems. Adaptation techniques investigated include constrained MLLR (CMLLR) transforms based on BN features for SAT, as well as MLLR and parameterised sigmoid functions for unsupervised test-time adaptation. Experiments using English multi-genre broadcast (MGB3) data show that CMLLR SAT yields a 4% relative WER reduction over jointly trained Tandem and Hybrid SI systems, and further reductions in WER are obtained by system combination.
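The CMLLR adaptation mentioned in the abstract applies a per-speaker affine transform to the feature space. As a minimal illustrative sketch (not the paper's implementation): the transform matrix A and bias b below are placeholders; in practice they would be estimated by maximising the GMM likelihood of the target speaker's data.

```python
# Hedged sketch of applying a CMLLR-style feature-space transform
# x' = A x + b to bottleneck (BN) features, row-wise over frames.
# A and b are illustrative placeholders, not estimated transforms.
import numpy as np

def apply_cmllr(features, A, b):
    """Apply the affine transform x' = A x + b to each feature frame."""
    return features @ A.T + b

rng = np.random.default_rng(0)
bn_features = rng.standard_normal((100, 39))  # 100 frames of 39-dim BN features
A = np.eye(39)                                # placeholder: identity transform
b = np.zeros(39)                              # placeholder: zero bias

adapted = apply_cmllr(bn_features, A, b)
```

With the identity placeholder the features are unchanged; a real CMLLR transform would shift and rotate the feature space towards the speaker-independent model.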

Keywords

Speech recognition, Tandem system, joint training, speaker adaptive training

Journal Title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference Name

Interspeech 2018

Journal ISSN

2308-457X
1990-9772

Volume Title

2018-September

Publisher

ISCA