Increasing Context for Estimating Confidence Scores in Automatic Speech Recognition

Accepted version
Peer-reviewed

Type

Article

Abstract

Accurate confidence measures for predictions from machine learning techniques play a critical role in the deployment and training of many speech and language processing applications. For example, confidence scores are important when automatically generated transcriptions are used to train automatic speech recognition (ASR) systems, as well as in downstream applications such as information retrieval and conversational assistants. Previous work on improving confidence scores for these systems has focused on two main directions: designing features correlated with improved confidence prediction, and employing sequence models to account for the importance of contextual information. Few studies, however, have explored incorporating contextual information more broadly, such as from the future in addition to the past, or making use of multiple alternative hypotheses in addition to the most likely one. This article introduces two general approaches for encapsulating contextual information from lattices. Experimental results illustrating the importance of increasing contextual information for estimating confidence scores are presented on a range of limited-resource languages with word error rates between 30% and 60%. The results show that the novel approaches provide significant gains in the accuracy of confidence estimation.

Keywords

Lattices, Hidden Markov models, Speech processing, Estimation, History, Feature extraction, Stability criteria, Attention, confidence, graph structures, recurrent neural network, speech recognition

Journal Title

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Journal ISSN

2329-9290
2329-9304

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
Sponsorship
Cambridge Assessment (unknown)
All authors were supported in part by the ALTA Institute, Cambridge University. A. Ragni and M. Gales were also supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via Air Force Research Laboratory (AFRL) contract # FA8650-17-C-9117.