Improving Lightly Supervised Training for Broadcast Transcription
Publication Date
2013-08-25
Journal Title
Interspeech 2013
ISSN
2308-457X
Publisher
ISCA
Pages
2187-2191
Language
English
Type
Article
Citation
Long, Y., Gales, M., Lanchantin, P., Liu, X., Seigel, M., & Woodland, P. (2013). Improving Lightly Supervised Training for Broadcast Transcription. Interspeech 2013, 2187-2191. http://www.isca-speech.org/archive/interspeech_2013/i13_2187.html
Abstract
This paper investigates improving lightly supervised acoustic model training for an archive of broadcast data. Standard lightly supervised training uses automatically derived decoding hypotheses using a biased language model. However, as the actual speech can deviate significantly from the original programme scripts that are supplied, the quality of standard lightly supervised hypotheses can be poor. To address this issue, word and segment level combination approaches are used between the lightly supervised transcripts and the original programme scripts which yield improved transcriptions. Experimental results show that systems trained using these improved transcriptions consistently outperform those trained using only the original lightly supervised decoding hypotheses. This is shown to be the case for both the maximum likelihood and minimum phone error trained systems.
Keywords
lightly supervised training, speech recognition, confidence scores
Sponsorship
The research leading to these results was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology).
Identifiers
This record's URL: https://www.repository.cam.ac.uk/handle/1810/245711
Rights
Licence:
http://www.rioxx.net/licenses/all-rights-reserved