Data-Driven Language Understanding for Spoken Dialogue Systems
Spoken dialogue systems provide a natural conversational interface to computer applications. In recent years, substantial improvements in the performance of speech recognition engines have shifted the research focus to the next component of the dialogue system pipeline: the one in charge of language understanding. The role of this module is to translate user inputs into accurate representations of the user goal, in a form that the system can use to interact with the underlying application. The challenges include modelling linguistic variation, speech recognition errors and the effects of dialogue context. Recently, the focus of language understanding research has moved to making use of word embeddings induced from large textual corpora using unsupervised methods. The work presented in this thesis demonstrates how these methods can be adapted to overcome the limitations of language understanding pipelines currently used in spoken dialogue systems.
The thesis starts with a discussion of the pros and cons of language understanding models used in modern dialogue systems. Most models in use today are based on the delexicalisation paradigm, in which exact string matching, supplemented by a list of domain-specific rephrasings, is used to recognise users' intents and update the system's internal belief state. This is followed by an attempt to use pretrained word vector collections to automatically induce the domain-specific semantic lexicons that are typically hand-crafted to handle lexical variation and mitigate a plethora of system failure modes. The results highlight the deficiencies of distributional word vectors that must be overcome to make them useful for downstream language understanding models.
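The delexicalisation paradigm described above can be illustrated with a minimal sketch. The lexicon contents and the uppercase placeholder format are assumptions made for illustration, not the exact representation used in the thesis:

```python
# A minimal sketch of delexicalisation, assuming a toy hand-crafted semantic
# lexicon that maps ontology values to lists of rephrasings (illustrative).

def delexicalise(utterance, lexicon):
    """Replace known value mentions with generic uppercase placeholders."""
    out = utterance.lower()
    for value, rephrasings in lexicon.items():
        # Match longer rephrasings first so they are not shadowed by substrings.
        for phrase in sorted(rephrasings, key=len, reverse=True):
            out = out.replace(phrase, f"<{value.upper()}>")
    return out

lexicon = {
    "cheap": ["cheap", "inexpensive", "affordable"],
    "north": ["north", "northern part of town"],
}

print(delexicalise("i want an affordable place in the northern part of town", lexicon))
# -> i want an <CHEAP> place in the <NORTH>
```

A delexicalised utterance lets a single classifier generalise across slot values, but any value or rephrasing missing from the lexicon simply goes unmatched; that brittleness is the limitation the thesis targets.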
The thesis next shifts focus to overcoming the language understanding models' dependency on semantic lexicons. To this end, the proposed Neural Belief Tracking (NBT) model forsakes the one-hot n-gram representations standard in Natural Language Processing in favour of distributed representations of user utterances, dialogue context and domain ontologies. The NBT model makes use of external lexical knowledge embedded in semantically specialised word vectors, obviating the need for hand-crafted domain-specific semantic lexicons.
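The core idea of matching in distributed space, rather than by exact strings, can be sketched as follows. This is not the NBT architecture itself (which learns the similarity function and conditions on dialogue context); the tiny word vectors below are made-up values used only to show how embedding similarity subsumes lexicon lookups:

```python
import numpy as np

# Toy word vectors standing in for a pretrained, semantically specialised
# word vector space (values are invented for illustration).
vectors = {
    "cheap":      np.array([0.90, 0.10, 0.00]),
    "affordable": np.array([0.85, 0.15, 0.05]),
    "expensive":  np.array([-0.80, 0.20, 0.10]),
    "restaurant": np.array([0.00, 0.90, 0.30]),
}

def embed(words):
    """Mean of the available word vectors (unknown words are skipped)."""
    vecs = [vectors[w] for w in words if w in vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

utterance = "an affordable restaurant".split()
for value in ["cheap", "expensive"]:
    print(value, round(cosine(embed(utterance), vectors[value]), 3))
```

Even though the word "cheap" never appears in the utterance, the candidate value "cheap" scores higher than "expensive", which is the behaviour a hand-crafted lexicon would otherwise have to encode explicitly.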
Subsequent work focuses on semantic specialisation, presenting an efficient method for injecting external lexical knowledge into word vector spaces. The proposed Attract-Repel algorithm boosts the semantic content of existing word vectors while simultaneously inducing high-quality cross-lingual word vector spaces. Finally, NBT models powered by specialised cross-lingual word vectors are used to train multilingual belief tracking models. These models operate across many languages at once, providing an efficient method for bootstrapping language understanding for lower-resource languages with limited training data.
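The attract/repel intuition behind this kind of semantic specialisation can be sketched with simple gradient steps: synonym pairs are pulled together and antonym pairs pushed apart. This toy version omits what the actual Attract-Repel algorithm relies on in practice, namely margin-based mini-batch costs and a regularisation term that keeps vectors close to their original distributional positions:

```python
import numpy as np

# Random starting vectors standing in for a pretrained space
# (dimensionality and constraint pairs are illustrative).
rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=4) for w in ["cheap", "inexpensive", "good", "bad"]}

def step(a, b, attract, lr=0.1):
    """Move the two vectors towards (attract) or away from each other."""
    diff = vecs[b] - vecs[a]
    sign = 1.0 if attract else -1.0
    vecs[a] = vecs[a] + sign * lr * diff
    vecs[b] = vecs[b] - sign * lr * diff

def dist(a, b):
    return float(np.linalg.norm(vecs[a] - vecs[b]))

syn_before, ant_before = dist("cheap", "inexpensive"), dist("good", "bad")
for _ in range(20):
    step("cheap", "inexpensive", attract=True)   # synonym constraint
    step("good", "bad", attract=False)           # antonym constraint

print(dist("cheap", "inexpensive") < syn_before)  # -> True
print(dist("good", "bad") > ant_before)           # -> True
```

Because the same update rule works regardless of which language the constrained word pairs come from, cross-lingual constraints (e.g. translation pairs) can tie monolingual spaces together, which is how specialisation supports the multilingual belief tracking described above.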