Efficient Training and Evaluation of Recurrent Neural Network Language Models for Automatic Speech Recognition

Accepted version
Peer-reviewed

Type

Article

Authors

Chen, X 
Liu, X 
Wang, Y 
Gales, MJF 
Woodland, PC 

Abstract

© 2014 IEEE. Recurrent neural network language models (RNNLMs) are becoming increasingly popular for a range of applications including automatic speech recognition. An important issue that limits their possible application areas is the computational cost incurred in training and evaluation. This paper describes a series of new efficiency-improving approaches that allow RNNLMs to be trained more efficiently on graphics processing units (GPUs) and evaluated more efficiently on CPUs. First, a modified RNNLM architecture with a non-class-based, full output layer structure (F-RNNLM) is proposed. This modified architecture facilitates a novel spliced sentence bunch mode of parallelization for F-RNNLM training on a GPU using large quantities of data. Second, two efficient RNNLM training criteria, based on variance regularization and noise contrastive estimation, are explored to specifically reduce the computation associated with the softmax normalization term at the RNNLM output layer. Finally, a pipelined training algorithm utilizing multiple GPUs is used to further improve the training speed. RNNLMs were initially trained on a moderately sized dataset of 20M words from a large-vocabulary conversational telephone speech recognition task. RNNLM training time was reduced by up to a factor of 53 on a single GPU compared with the standard CPU-based RNNLM toolkit, and a 56-fold speed-up in test-time evaluation on a CPU was obtained over the baseline F-RNNLMs. Consistent improvements in both recognition accuracy and perplexity were also obtained over conventional class-based RNNLMs (C-RNNLMs). Experiments on Google's one billion word corpus also reveal that RNNLM training scales well to larger quantities of data.
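
The two output-layer training criteria named in the abstract can be summarised in their standard forms (a sketch in generic notation; the paper's exact symbols, weighting, and noise distribution may differ):

```latex
% RNNLM output: P(w_t \mid h_t) = \exp(v_{w_t}^\top h_t + b_{w_t}) / Z_t,
% with normaliser Z_t = \sum_{w} \exp(v_w^\top h_t + b_w).

% Variance regularisation: penalise the variance of \ln Z_t so that, at
% test time, \ln Z_t can be treated as a constant and the softmax sum skipped:
J_{\mathrm{VR}}(\theta) = J_{\mathrm{CE}}(\theta)
  + \frac{\gamma}{2N} \sum_{t=1}^{N} \left( \ln Z_t - \overline{\ln Z} \right)^2

% Noise contrastive estimation: discriminate each target word w_t from k
% noise samples \hat{w}_{t,1..k} drawn from a noise distribution P_n:
J_{\mathrm{NCE}}(\theta) = -\frac{1}{N} \sum_{t=1}^{N} \Big[
    \ln P(D{=}1 \mid w_t, h_t)
  + \sum_{j=1}^{k} \ln P(D{=}0 \mid \hat{w}_{t,j}, h_t) \Big],
\quad
P(D{=}1 \mid w, h_t) = \frac{p_\theta(w \mid h_t)}{p_\theta(w \mid h_t) + k\,P_n(w)}
```

Both criteria attack the same bottleneck: variance regularisation makes ln Z_t nearly constant across contexts, and NCE trains with the normaliser fixed, so test-time evaluation can replace the full softmax sum with a single dot product per word.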
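As a concrete illustration of the NCE idea applied to an RNNLM output layer, here is a minimal, hypothetical PyTorch sketch; the class and parameter names (NCEOutputLayer, noise_probs, ln_z) are illustrative inventions, not the authors' toolkit, and the constant value for ln Z is an assumption:

```python
import torch
import torch.nn.functional as F


class NCEOutputLayer(torch.nn.Module):
    """Hypothetical sketch of an NCE-trained RNNLM output layer (not the
    authors' code). Each target word is discriminated against k noise
    words drawn from a unigram distribution, and the softmax
    log-normalisation term ln Z is frozen at a constant, so the cost per
    training word is O(k) instead of O(|V|)."""

    def __init__(self, hidden_size, vocab_size, noise_probs, k=10, ln_z=9.0):
        super().__init__()
        self.out = torch.nn.Linear(hidden_size, vocab_size)
        # Unigram noise distribution P_n(w) over the vocabulary, shape [V].
        self.register_buffer("noise_probs", noise_probs)
        self.k = k
        self.ln_z = ln_z  # constant log-normaliser; the value is illustrative

    def _log_p_model(self, hidden, words):
        # Un-normalised log-probability v_w . h + b_w - ln Z (no softmax sum).
        v = self.out.weight[words]              # [B, n, H]
        b = self.out.bias[words]                # [B, n]
        return (v * hidden.unsqueeze(1)).sum(-1) + b - self.ln_z

    def forward(self, hidden, target):
        # hidden: [B, H] recurrent states; target: [B] next-word ids.
        batch = target.size(0)
        noise = torch.multinomial(
            self.noise_probs.expand(batch, -1), self.k, replacement=True)
        log_p_t = self._log_p_model(hidden, target.unsqueeze(1))  # [B, 1]
        log_p_n = self._log_p_model(hidden, noise)                # [B, k]
        # Log-odds of "data" vs "noise": ln p_model(w|h) - ln(k * P_n(w)).
        d_t = log_p_t - (self.k * self.noise_probs[target].unsqueeze(1)).log()
        d_n = log_p_n - (self.k * self.noise_probs[noise]).log()
        # J_NCE = -E[ ln P(D=1|w_t) + sum_j ln P(D=0|w_hat_j) ].
        return -(F.logsigmoid(d_t).squeeze(1) + F.logsigmoid(-d_n).sum(1)).mean()
```

In use, the layer would consume the recurrent hidden state at each step (e.g. `loss = layer(hidden, target); loss.backward()`); note that no sum over the vocabulary appears anywhere in training, which is the source of the speed-ups the abstract reports.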

Keywords

GPU, language models, noise contrastive estimation, pipelined training, recurrent neural network, speech recognition, variance regularisation

Journal Title

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Journal ISSN

2329-9290 (print)
2329-9304 (electronic)

Volume Title

24

Publisher

Institute of Electrical and Electronics Engineers (IEEE)