Similarity-Augmented Prediction Methods for Neural Machine Translation
Abstract
Neural language models (LMs) are now the dominant approach to most tasks in natural language processing (NLP), including machine translation (MT). Despite their success, studies have shown systematic problems in these models, such as the dispersal of probability mass across a vast number of similar sequences and the excessive allocation of probability mass to short, inadequate translations. Other studies have shown that these properties are especially prevalent in LMs trained for one-to-many language tasks such as MT. These issues limit the effectiveness of probability-maximizing search for prediction, e.g. beam search, and of probability-based uncertainty quantification (UQ), e.g. Shannon entropy.
In this thesis, we study a class of methods, which we call similarity-augmented prediction methods, that measure semantic similarity between elements of the LM output distribution. The best-known instance is minimum Bayes risk (MBR) prediction, which returns the sequence with the highest expected similarity to the LM output distribution. MBR addresses these flaws of beam search and outperforms it across many NLP tasks.
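To make the decision rule concrete, here is a minimal sketch of sampling-based MBR (an illustration under assumed inputs, not the thesis's implementation): given candidate hypotheses and pseudo-references sampled from the model, it returns the hypothesis with the highest average similarity to the references. The `utility` argument is a placeholder for any sentence-level similarity metric, e.g. chrF or a neural metric.

```python
# Minimal sampling-based MBR sketch (illustrative only).
# `hypotheses` and `references` are lists of candidate strings sampled
# from the model; `utility(h, r)` is any sentence-level similarity metric.

def mbr_decode(hypotheses, references, utility):
    """Return the hypothesis maximizing a Monte Carlo estimate of
    expected utility against the sampled pseudo-references."""
    def expected_utility(h):
        return sum(utility(h, r) for r in references) / len(references)
    return max(hypotheses, key=expected_utility)
```

Because every hypothesis is compared against every reference, the naive estimator costs a number of utility calls quadratic in the sample size, which is what motivates the faster algorithm described below.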
Our contributions are as follows. First, we propose a general-purpose MBR algorithm that is significantly faster than the conventional approach. Second, we propose a similarity-based uncertainty quantification method for MT, which we show to be a better indicator of model confidence than previous UQ methods that do not consider similarity. Lastly, we propose using Bayesian optimization with Gaussian processes for prediction reranking, using similarity between candidates to make strategic search choices, which significantly speeds up reranking.
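The following are illustrative sketches of the latter two ideas under stated assumptions, not the exact formulations from the thesis. One plausible instantiation of similarity-based UQ is one minus the mean pairwise similarity among model samples, so that semantically scattered samples signal low confidence; `similarity` is again a placeholder metric.

```python
def similarity_uncertainty(samples, similarity):
    """Hypothetical similarity-based uncertainty: one minus the mean
    pairwise similarity among samples drawn from the model."""
    n = len(samples)
    if n < 2:
        return 0.0
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += similarity(samples[i], samples[j])
            count += 1
    return 1.0 - total / count
```

Likewise, Bayesian-optimization reranking can be sketched as a Gaussian-process upper-confidence-bound loop over candidate embeddings that scores only a small budget of candidates instead of all of them; the embeddings, the `score_fn` oracle, and the UCB acquisition rule here are assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def bo_rerank(embeddings, score_fn, budget, kappa=1.0):
    """Hypothetical GP-based reranking sketch: fit a GP on the
    candidates scored so far and pick the next candidate by an
    upper-confidence bound, spending only `budget` oracle calls.
    `embeddings` is an (n, d) array of candidate representations;
    `score_fn(i)` returns the (expensive) reranking score of candidate i."""
    scored = {0: score_fn(0)}  # seed with an arbitrary first candidate
    for _ in range(budget - 1):
        idx = sorted(scored)
        gp = GaussianProcessRegressor().fit(
            embeddings[idx], [scored[i] for i in idx])
        mean, std = gp.predict(embeddings, return_std=True)
        ucb = mean + kappa * std
        ucb[idx] = -np.inf  # never re-score a scored candidate
        nxt = int(np.argmax(ucb))
        scored[nxt] = score_fn(nxt)
    return max(scored, key=scored.get)  # index of the best scored candidate
```

The design intuition is that candidates whose embeddings are similar tend to receive similar scores, so the GP can rule out large regions of the candidate set without ever scoring them.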
LMs are not simply distributions over token sequences, but distributions over possible meanings. We introduce new methods for analyzing them as such, and since these methods usually require more computation than their predecessors, we also introduce statistically principled ways to reduce their cost.