Improving Multiple-Crowd-Sourced Transcriptions Using a Speech Recogniser


Abstract

This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. Such transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Multiple crowd-sourced transcriptions are therefore often combined to form one transcription of higher quality. However, the state of the art is essentially a form of majority voting, which requires at least three transcriptions for each utterance. This paper shows how to refine this approach to work with only two transcriptions. It then introduces a method that uses a speech recogniser (bootstrapped on a simple combination scheme) to combine transcriptions. When only two crowd-sourced transcriptions are available, on a noisy data set this reduces the word error rate measured against gold-standard transcriptions by 21% relative.
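
The abstract reports its result as a relative change in word error rate (WER) against gold-standard transcriptions. As a reading aid, the following minimal Python sketch shows how WER and a relative improvement are commonly computed. It is not taken from the paper: the function names, the example sentences and the example error rates are illustrative assumptions, and the paper's own scoring setup may differ (for instance in text normalisation).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, deletions, insertions)
    divided by the number of words in the reference transcription."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")

# Hypothetical error rates chosen only to illustrate a relative change
# of roughly 21 %; they are not figures from the paper.
gain = relative_improvement(0.20, 0.158)
```

A relative improvement compares the change with the baseline error rate rather than giving the absolute difference in percentage points, so a 21% relative reduction is consistent with many different pairs of absolute error rates.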

Journal Title

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Conference Name

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Journal ISSN

1520-6149

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Rights and licensing

Except where otherwise noted, this item's license is described as http://www.rioxx.net/licenses/all-rights-reserved

Sponsorship

Cambridge Assessment (unknown)

This paper reports on research supported by Cambridge English, University of Cambridge.