Improving lightly supervised training for broadcast transcription

Long, Y; Gales, MJF; Lanchantin, P; Liu, X; Seigel, MS; Woodland, PC

Repository URI

https://www.repository.cam.ac.uk/handle/1810/245711

Files

yanhualong-IS2013-lightsupv.pdf (120.86 KB)

Type

Article

Authors

Long, Y

Gales, MJF

Lanchantin, P

Liu, X

Seigel, MS

Woodland, PC

Abstract

This paper investigates improving lightly supervised acoustic model training for an archive of broadcast data. Standard lightly supervised training uses decoding hypotheses that are derived automatically with a biased language model. However, because the actual speech can deviate significantly from the supplied programme scripts, the quality of these standard lightly supervised hypotheses can be poor. To address this issue, word- and segment-level combination approaches are applied between the lightly supervised transcripts and the original programme scripts, yielding improved transcriptions. Experimental results show that systems trained on these improved transcriptions consistently outperform those trained only on the original lightly supervised decoding hypotheses, for both maximum likelihood and minimum phone error trained systems.
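
The idea of combining a programme script with a lightly supervised decoding hypothesis can be illustrated with a minimal sketch. This is not the method from the paper: the function names `combine_words` and `select_segment`, the thresholds, and the use of Python's `difflib` for alignment are all assumptions made for illustration only. The sketch assumes each segment is a list of word strings and that the decoder supplies one confidence score per hypothesis word.

```python
from difflib import SequenceMatcher


def combine_words(script, hyp, hyp_conf, threshold=0.8):
    """Hypothetical word-level combination.

    Words on which script and hypothesis agree are kept. Where they differ,
    the hypothesis words are used only if their mean confidence reaches the
    (assumed) threshold; otherwise the script words are kept.
    """
    out = []
    matcher = SequenceMatcher(a=script, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(script[i1:i2])
            continue
        conf = hyp_conf[j1:j2]
        mean_conf = sum(conf) / len(conf) if conf else 0.0
        if mean_conf >= threshold:
            out.extend(hyp[j1:j2])
        else:
            out.extend(script[i1:i2])
    return out


def select_segment(script, hyp, max_mismatch=0.3):
    """Hypothetical segment-level combination.

    The script is kept for the whole segment when it is close enough to the
    hypothesis; otherwise the hypothesis is used. The mismatch measure
    (1 - difflib similarity ratio) is an assumption, not the paper's metric.
    """
    ratio = SequenceMatcher(a=script, b=hyp, autojunk=False).ratio()
    return script if 1.0 - ratio <= max_mismatch else hyp


if __name__ == "__main__":
    script = ["the", "cat", "sat"]
    hyp = ["the", "dog", "sat"]
    print(combine_words(script, hyp, [0.9, 0.95, 0.9]))
    print(select_segment(script, hyp, max_mismatch=0.5))
```

In this sketch the word-level function trusts the decoder only where it is confident, while the segment-level function makes one decision per segment; both thresholds would need tuning on held-out data.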

Keywords

lightly supervised training, speech recognition, confidence scores

Journal Title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Journal ISSN

2308-457X
1990-9772

Publisher

ISCA

Publisher DOI

https://doi.org/10.21437/interspeech.2013-516

Rights

http://www.rioxx.net/licenses/all-rights-reserved

Sponsorship

The research leading to these results was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology).

Collections

Scholarly Works - Engineering
Symplectic mapped items for data match