
General sequence teacher-student learning

Accepted version
Peer-reviewed

Type

Article

Abstract

In automatic speech recognition, performance gains can often be obtained by combining an ensemble of multiple models. However, this can be computationally expensive when performing recognition. Teacher-student learning alleviates this cost by training a single student model to emulate the combined ensemble behaviour. Only this student needs to be used for recognition. Previously investigated teacher-student criteria often limit the forms of diversity allowed in the ensemble, and only propagate information from the teachers to the student at the frame level. This paper addresses both of these issues by examining teacher-student learning within a sequence-level framework, and assessing the flexibility that these approaches offer. Various sequence-level teacher-student criteria are examined in this work, to propagate sequence posterior information. A training criterion based on the KL-divergence between context-dependent state sequence posteriors is proposed that allows for a diversity of state cluster sets to be present in the ensemble. This criterion is shown to be an upper bound to a more general KL-divergence between word sequence posteriors, which places even fewer restrictions on the ensemble diversity, but whose gradient can be expensive to compute. These methods are evaluated on the AMI meeting transcription and MGB-3 television broadcast audio tasks.
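The upper-bound relationship stated in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's implementation: it assumes a small, hypothetical set of four state sequences (`s1` to `s4`), each mapping deterministically onto one of two word sequences (`w_a`, `w_b`), with made-up teacher and student posteriors. Under that assumption, the KL-divergence between the fine-grained (state sequence) posteriors is never smaller than the KL-divergence between the marginalised (word sequence) posteriors, which follows from the log-sum inequality.

```python
import math
from collections import defaultdict


def kl_divergence(p, q):
    """KL(p || q) for two dicts mapping outcomes to probabilities."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0.0)


def marginalise(dist, to_word):
    """Sum the probabilities of state sequences that share a word sequence."""
    out = defaultdict(float)
    for seq, prob in dist.items():
        out[to_word[seq]] += prob
    return dict(out)


# Hypothetical toy setup: all names and numbers are illustrative assumptions.
to_word = {"s1": "w_a", "s2": "w_a", "s3": "w_b", "s4": "w_b"}
teacher = {"s1": 0.4, "s2": 0.2, "s3": 0.3, "s4": 0.1}  # combined ensemble posterior
student = {"s1": 0.1, "s2": 0.4, "s3": 0.2, "s4": 0.3}  # student posterior

# KL at the state sequence level (the finer-grained criterion).
kl_state = kl_divergence(teacher, student)

# KL at the word sequence level, after summing over state sequences.
kl_word = kl_divergence(
    marginalise(teacher, to_word), marginalise(student, to_word)
)

print(f"state-sequence KL: {kl_state:.4f}")
print(f"word-sequence KL:  {kl_word:.4f}")
```

In a real recogniser the sequence posteriors cannot be enumerated like this and would have to be obtained some other way, for example from lattices or a lattice-free forward-backward pass, so the sketch only shows the direction of the inequality, not how either criterion is computed in practice.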

Keywords

Automatic speech recognition, ensemble, lattice-free, random forest, teacher-student

Journal Title

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Journal ISSN

2329-9290
2329-9304

Volume

27

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Rights

All rights reserved

Sponsorship

Cambridge Assessment (unknown)
This research was partly funded under the ALTA Institute, University of Cambridge. Thanks to Cambridge Assessment English, University of Cambridge, for supporting this research.