Similarity-Augmented Prediction Methods for Neural Machine Translation
Abstract
Neural language models (LMs) are now the dominant approach to most tasks in natural language processing (NLP), including machine translation (MT). Despite their success, studies have shown systematic problems in these models, such as the dispersal of probability mass across a vast number of similar sequences and the excessive allocation of probability mass to short, inadequate translations. Other studies have shown that these properties are especially prevalent in LMs trained for one-to-many language tasks such as MT. These issues limit the effectiveness of probability-maximizing search for prediction, e.g. beam search, and of probability-based uncertainty quantification (UQ), e.g. Shannon entropy.
In this thesis, we study a class of methods, which we call similarity-augmented prediction methods, that measure semantic similarity between elements of the LM output distribution. The best-known instance is minimum Bayes risk (MBR) prediction, which returns the sequence with the highest expected similarity to the LM output distribution. MBR addresses these flaws of beam search and outperforms it across many NLP tasks.
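To make the decision rule concrete, here is a minimal sketch of sampling-based MBR (an illustration under assumed inputs, not the thesis's implementation): given candidate hypotheses and pseudo-references sampled from the model, it returns the hypothesis with the highest average similarity to the references. The `utility` argument is a placeholder for any sentence-level similarity metric, e.g. chrF or a neural metric.

```python
# Minimal sampling-based MBR sketch (illustrative only).
# `hypotheses` and `references` are lists of candidate strings sampled
# from the model; `utility(h, r)` is any sentence-level similarity metric.

def mbr_decode(hypotheses, references, utility):
    """Return the hypothesis maximizing a Monte Carlo estimate of
    expected utility against the sampled pseudo-references."""
    def expected_utility(h):
        return sum(utility(h, r) for r in references) / len(references)
    return max(hypotheses, key=expected_utility)
```

Because every hypothesis is compared against every reference, the naive estimator costs a number of utility calls quadratic in the sample size, which is what motivates the faster algorithm described below.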
Our contributions are as follows. First, we propose a general-purpose MBR algorithm that is significantly faster than the conventional approach. Second, we propose a similarity-based uncertainty quantification method for MT, which we show to be a better indicator of model confidence than previous UQ methods that do not consider similarity. Lastly, we propose using Bayesian optimization with Gaussian processes for prediction reranking, using similarity between candidates to make strategic search choices, which significantly speeds up reranking.
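The following are illustrative sketches of the latter two ideas under stated assumptions, not the exact formulations from the thesis. One plausible instantiation of similarity-based UQ is one minus the mean pairwise similarity among model samples, so that semantically scattered samples signal low confidence; `similarity` is again a placeholder metric.

```python
def similarity_uncertainty(samples, similarity):
    """Hypothetical similarity-based uncertainty: one minus the mean
    pairwise similarity among samples drawn from the model."""
    n = len(samples)
    if n < 2:
        return 0.0
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += similarity(samples[i], samples[j])
            count += 1
    return 1.0 - total / count
```

Likewise, Bayesian-optimization reranking can be sketched as a Gaussian-process upper-confidence-bound loop over candidate embeddings that scores only a small budget of candidates instead of all of them; the embeddings, the `score_fn` oracle, and the UCB acquisition rule here are assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def bo_rerank(embeddings, score_fn, budget, kappa=1.0):
    """Hypothetical GP-based reranking sketch: fit a GP on the
    candidates scored so far and pick the next candidate by an
    upper-confidence bound, spending only `budget` oracle calls.
    `embeddings` is an (n, d) array of candidate representations;
    `score_fn(i)` returns the (expensive) reranking score of candidate i."""
    scored = {0: score_fn(0)}  # seed with an arbitrary first candidate
    for _ in range(budget - 1):
        idx = sorted(scored)
        gp = GaussianProcessRegressor().fit(
            embeddings[idx], [scored[i] for i in idx])
        mean, std = gp.predict(embeddings, return_std=True)
        ucb = mean + kappa * std
        ucb[idx] = -np.inf  # never re-score a scored candidate
        nxt = int(np.argmax(ucb))
        scored[nxt] = score_fn(nxt)
    return max(scored, key=scored.get)  # index of the best scored candidate
```

The design intuition is that candidates whose embeddings are similar tend to receive similar scores, so the GP can rule out large regions of the candidate set without ever scoring them.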
LMs are not simply distributions over token sequences, but distributions over possible meanings. We introduce new methods for analyzing them as such, and since these methods usually require more computation than their predecessors, we also introduce statistically principled ways to reduce their cost.