The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction



Type

Thesis

Authors

Abstract

With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, and syntactic parsing, but they are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent advances in contextual word embeddings such as BERT report state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser had little in common, as systems were tailored much more closely to the task at hand.

At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements that tasks often have in practice. This often goes along with a sharp break between deep learning methods and previous research in the specific area. This work can be understood as an antithesis to this paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk of common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models, such as spell checkers and morphology databases, help neural grammatical error correction. We also focus on language models, which often do not play a role in vanilla end-to-end approaches, and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems. Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as a derivation in a formal grammar.

Description

Date

2019-09-05

Advisors

Byrne, Bill

Keywords

neural machine translation, natural language processing, statistical machine translation, grammatical error correction, language models, operation sequence model, text normalization

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

EPSRC grant EP/L027623/1; EPSRC Tier-2 capital grant EP/P020259/1