Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition
Publication Date
2019-09-01
Journal Title
IEEE/ACM Transactions on Audio Speech and Language Processing
ISSN
2329-9290
Publisher
IEEE Advancing Technology for Humanity
Volume
27
Issue
9
Pages
1444-1454
Type
Article
This Version
AM (Accepted Manuscript)
Citation
Chen, X., Liu, X., Wang, Y., Ragni, A., Wong, J., & Gales, M. (2019). Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition. IEEE/ACM Transactions on Audio Speech and Language Processing, 27 (9), 1444-1454. https://doi.org/10.1109/TASLP.2019.2922048
Abstract
Language modelling is a crucial component in a wide range of applications, including speech recognition. Language models (LMs) are usually constructed by splitting a sentence into words and computing the probability of each word based on its word history. This sentence probability calculation, making use of conditional probability distributions, assumes that there is little impact from the approximations used in the LMs, including the word history representations and the approaches used to handle finite training data. This motivates examining models that make use of additional information from the sentence. In this work, future word information, in addition to the history, is used to predict the probability of the current word. For recurrent neural network LMs (RNNLMs) this information can be encapsulated in a bi-directional model. However, if used directly, this form of model is computationally expensive to train on large quantities of data and can be problematic when used with word lattices. This paper proposes a novel neural network language model structure, the succeeding-word RNNLM (su-RNNLM), to address these issues. Instead of using a recurrent unit to capture the complete future word context, a feed-forward unit is used to model a fixed, finite number of succeeding words. This is more efficient to train than bi-directional models and can be applied to lattice rescoring. The generated lattices can be used for downstream applications, such as confusion network decoding and keyword search. Experimental results on speech recognition and keyword spotting tasks illustrate the empirical usefulness of future word information and the flexibility of the proposed model in representing this information.
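As a rough illustration of the architecture described in the abstract, the sketch below combines a recurrent encoder of the word history with a feed-forward encoder over a fixed number of succeeding words to predict the current word. This is a minimal sketch assuming PyTorch; the layer choices, names, sizes, and number of succeeding words are illustrative assumptions and are not taken from the paper.

```python
# Minimal su-RNNLM-style sketch (illustrative; hyper-parameters and layer
# choices are assumptions, not the authors' exact configuration).
import torch
import torch.nn as nn

class SuRNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256, num_future=3):
        super().__init__()
        self.num_future = num_future
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Recurrent unit summarises the complete word history w_1 .. w_{t-1}.
        self.history_rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        # Feed-forward unit summarises a fixed window of succeeding words
        # w_{t+1} .. w_{t+k}, avoiding a full backward recurrence.
        self.future_ff = nn.Sequential(
            nn.Linear(num_future * emb_dim, hidden_dim),
            nn.Tanh(),
        )
        self.output = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, history, future):
        # history: (batch, hist_len) word ids; future: (batch, num_future) word ids
        _, h = self.history_rnn(self.embed(history))         # h: (1, batch, hidden)
        f = self.future_ff(self.embed(future).flatten(1))    # f: (batch, hidden)
        logits = self.output(torch.cat([h[-1], f], dim=-1))  # (batch, vocab)
        return torch.log_softmax(logits, dim=-1)

# Example: predict word w_t from its history and 3 succeeding words.
model = SuRNNLM(vocab_size=1000)
hist = torch.randint(0, 1000, (2, 5))   # two sequences, 5 history words each
futr = torch.randint(0, 1000, (2, 3))   # 3 succeeding words (padded if fewer remain)
log_probs = model(hist, futr)           # (2, 1000) log-probabilities over the vocabulary
```

Because the future context is restricted to a fixed number of succeeding words handled by a feed-forward unit, each prediction depends only on a bounded look-ahead rather than the whole remaining sentence, which is what makes training cheaper than a bi-directional model and makes application to word lattices practical.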
Sponsorship
Cambridge Assessment (unknown)
Identifiers
External DOI: https://doi.org/10.1109/TASLP.2019.2922048
This record's URL: https://www.repository.cam.ac.uk/handle/1810/293315
Rights
All rights reserved