Improving multiple-crowd-sourced transcriptions using a speech recogniser


Van Dalen, RC 
Knill, KM 
Tsiakoulis, P 
Gales, MJF 


This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. These transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Often, multiple crowd-sourced transcriptions are combined to form one transcription of higher quality. However, the state of the art is to use essentially a form of majority voting, which requires at least three transcriptions for each utterance. This paper shows how to refine this approach to work with only two transcriptions. It then introduces a method that uses a speech recogniser (bootstrapped on a simple combination scheme) to combine transcriptions. When only two crowd-sourced transcriptions are available, on a noisy data set this improves the word error rate to gold-standard transcriptions by 21 % relative.
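
To illustrate why plain majority voting needs at least three transcriptions, the following is a minimal sketch and not the paper's method. It assumes the transcriptions have already been aligned word by word into equal-length lists; the function name `majority_vote` and the use of `None` to mark an unresolved slot are hypothetical choices made for this example.

```python
from collections import Counter


def majority_vote(aligned):
    """Pick the most frequent word in each aligned slot.

    `aligned` is a list of transcriptions, each a list of words of the
    same length (assumption: alignment was done beforehand).
    A slot with no unique winner is returned as None.
    """
    result = []
    for slot in zip(*aligned):
        counts = Counter(slot).most_common()
        top_word, top_count = counts[0]
        # A tie for first place means voting cannot decide this slot.
        tied = len(counts) > 1 and counts[1][1] == top_count
        result.append(None if tied else top_word)
    return result


three = [["the", "cat", "sat"], ["the", "cap", "sat"], ["a", "cat", "sat"]]
two = [["the", "cat", "sat"], ["the", "cap", "sat"]]

print(majority_vote(three))
print(majority_vote(two))
```

With three transcribers, every disagreement in this toy input has a two-to-one winner. With two, any disagreement is a tie, which is the gap that the refinement and the recogniser-based combination described in the abstract are meant to address.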



Automatic speech recognition, crowd-sourcing, transcription combination

Journal Title

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Conference Name

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Cambridge Assessment (unknown)
This paper reports on research supported by Cambridge English, University of Cambridge.