
N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Ma, R 
Gales, MJF 
Knill, KM 
Qian, M 

Abstract

Error correction models form an important part of Automatic Speech Recognition (ASR) post-processing to improve the readability and quality of transcriptions. Most prior work uses the 1-best ASR hypothesis as input and can therefore only perform correction by leveraging the context within one sentence. In this work, we propose a novel N-best T5 model for this task, which is fine-tuned from a T5 model and utilizes ASR N-best lists as model input. By transferring knowledge from the pretrained language model and obtaining richer information from the ASR decoding space, the proposed approach outperforms a strong Conformer-Transducer baseline. Another issue with standard error correction is that the generation process is not well guided. To address this, a constrained decoding process, based on either the N-best list or an ASR lattice, is used, which allows additional information to be propagated.
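
The two ideas in the abstract (feeding the N-best list to the model, and restricting the output to the ASR decoding space) can be illustrated with a minimal sketch. This is not the authors' implementation. The separator string, the function names, and the scoring callback are all hypothetical, and the constrained step is reduced to its simplest form: choosing one hypothesis from the N-best list by score.

```python
from typing import Callable, List

# Hypothetical separator; the abstract does not say how hypotheses are joined.
SEP = " <sep> "


def build_nbest_input(hypotheses: List[str], n: int = 5) -> str:
    """Join the top-n ASR hypotheses into a single model input string."""
    return SEP.join(h.strip() for h in hypotheses[:n])


def constrained_pick(hypotheses: List[str], score: Callable[[str], float]) -> str:
    """Simplest N-best constrained decoding: the output must be one of the
    given hypotheses, selected by a caller-supplied score (an assumption
    standing in for a model's sequence score)."""
    if not hypotheses:
        raise ValueError("empty N-best list")
    return max(hypotheses, key=score)
```

In practice the `score` callback would be replaced by a score from the fine-tuned model for each candidate, and the joined string from `build_nbest_input` would be tokenized and passed to the model as its input.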

Journal Title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference Name

INTERSPEECH 2023

Journal ISSN

2308-457X
1990-9772

Publisher

ISCA

Sponsorship

Cambridge University Press & Assessment