Paraphrastic language models

Liu, X; Gales, MJF; Woodland, PC

doi:10.1016/j.csl.2014.04.004

Repository URI

https://www.repository.cam.ac.uk/handle/1810/245484

Files

Main article (525.21 KB)

Type

Article

Authors

Liu, X

Gales, MJF

Woodland, PC

Abstract

Natural languages are known for their expressive richness. Many sentences can be used to represent the same underlying meaning. Only modelling the observed surface word sequence can result in poor context coverage and generalization, for example, when using n-gram language models (LMs). This paper proposes a novel form of language model, the paraphrastic LM, that addresses these issues. A phrase-level paraphrase model, statistically learned from standard text data with no semantic annotation, is used to generate multiple paraphrase variants. LM probabilities are then estimated by maximizing their marginal probability. Multi-level language models estimated at both the word level and the phrase level are combined. An efficient weighted finite state transducer (WFST) based paraphrase generation approach is also presented. Significant error rate reductions of 0.5–0.6% absolute were obtained over the baseline n-gram LMs on two state-of-the-art recognition tasks, English conversational telephone speech and Mandarin Chinese broadcast speech, using a paraphrastic multi-level LM modelling both word and phrase sequences. When this model is further combined with word and phrase level feed-forward neural network LMs, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and neural network LMs respectively.
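The estimation idea in the abstract (generate paraphrase variants, then estimate LM probabilities from their marginal) can be illustrated with a toy sketch. This is not the authors' implementation: the paraphrase table `PARAPHRASES`, the pre-segmented input, and the helper names are all hypothetical, there is no WFST and no smoothing, and it only assumes that each variant contributes bigram counts weighted by its paraphrase probability.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy phrase-level paraphrase table: phrase -> [(variant, probability)].
# In the paper this model is learned from text; here it is hand-written.
PARAPHRASES = {
    "a lot of": [("a lot of", 0.6), ("many", 0.4)],
}

def variants(segments):
    """Yield (word_list, weight) for every combination of phrase substitutions."""
    options = [PARAPHRASES.get(seg, [(seg, 1.0)]) for seg in segments]
    for combo in product(*options):
        weight = 1.0
        words = []
        for phrase, prob in combo:
            weight *= prob
            words.extend(phrase.split())
        yield words, weight

def fractional_bigram_counts(corpus):
    """Accumulate bigram counts over all variants, weighted by variant probability."""
    counts = defaultdict(float)
    for segments in corpus:
        for words, weight in variants(segments):
            padded = ["<s>"] + words + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                counts[(prev, cur)] += weight
    return counts

def bigram_prob(counts, prev, cur):
    """Unsmoothed relative-frequency estimate from the fractional counts."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts.get((prev, cur), 0.0) / total if total else 0.0

# One training sentence, already segmented into phrases (an assumption of this sketch).
corpus = [["i", "have", "a lot of", "books"]]
counts = fractional_bigram_counts(corpus)
print(bigram_prob(counts, "have", "many"))
```

In this toy setting the bigram "have many" receives probability mass even though it never appears in the observed sentence, which is the kind of context-coverage gain the abstract describes.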

Keywords

46 Information and Computing Sciences, 4602 Artificial Intelligence, 4611 Machine Learning

Journal Title

Computer Speech & Language

Journal ISSN

0885-2308
1095-8363

Volume Title

28

Publisher

Elsevier

Publisher DOI

https://doi.org/10.1016/j.csl.2014.04.004

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 2.0 UK: England & Wales

Sponsorship

The research leading to these results was supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) program.

Collections

Scholarly Works - Engineering
Symplectic mapped items for data match
