Show simple item record

dc.contributor.author: Ponti, Edoardo
dc.description.abstract: Most of the world's languages suffer from a paucity of annotated data. This curbs the effectiveness of supervised learning, the most widespread approach to modelling language. An alternative paradigm could instead take inspiration from children's propensity to acquire language from limited stimuli, in order to enable machines to learn any new language from a few examples. The abstract mechanisms underpinning this ability include 1) a set of inborn inductive biases and 2) the deep entrenchment of language in other perceptual and cognitive faculties, combined with the ability to transfer and recombine knowledge across these domains. The main contribution of my thesis is giving concrete form to both these intuitions. Firstly, I argue that endowing a neural network with the correct inductive biases is equivalent to constructing a prior distribution over its weights and its architecture (including connectivity patterns and non-linear activations). This prior is inferred by "reverse-engineering" a representative set of observed languages and harnessing the typological features documented by linguists. Thus, I provide a unified framework for cross-lingual transfer and architecture search by recasting them as hierarchical Bayesian neural models. Secondly, the skills relevant to different language varieties and to different tasks in natural language processing are deeply intertwined. Hence, the neural weights modelling the data for each of their combinations can be imagined as lying in a structured space. I introduce a Bayesian generative model of this space, factorised into latent variables representing each language and each task. By virtue of this modular design, predictions can generalise to unseen combinations by extrapolating from the data of observed combinations.
The proposed models are empirically validated on a spectrum of language-related tasks (character-level language modelling, part-of-speech tagging, named entity recognition, and common-sense reasoning) and on a typologically diverse sample of about a hundred languages. Compared to a series of competitive baselines, they achieve better performance in new languages in zero-shot and few-shot learning settings. In general, they hold promise to extend state-of-the-art language technology to under-resourced languages through sample efficiency and robustness to cross-lingual variation.
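The modular design described in the abstract can be illustrated with a minimal sketch: one latent variable per language, one per task, and a shared mapping that generates the weights for any (language, task) combination, so that unseen combinations follow from factors learned elsewhere. All names, dimensions, and the linear decoder below are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: latent size per factor and weight-vector size.
LATENT_DIM, WEIGHT_DIM = 4, 8

# One latent variable per language and per task (the factorised space).
languages = {lang: rng.normal(size=LATENT_DIM) for lang in ["en", "fi", "sw"]}
tasks = {task: rng.normal(size=LATENT_DIM) for task in ["pos", "ner"]}

# A shared decoder maps the combined latents to combination-specific weights.
decoder = rng.normal(size=(2 * LATENT_DIM, WEIGHT_DIM))

def weights_for(lang: str, task: str) -> np.ndarray:
    """Generate the weight vector for a (language, task) combination."""
    z = np.concatenate([languages[lang], tasks[task]])
    return z @ decoder

# Zero-shot extrapolation: even if ("sw", "ner") was never observed jointly,
# its weights follow from the language and task latents inferred elsewhere.
w = weights_for("sw", "ner")
print(w.shape)  # (8,)
```

In the actual model the latents and decoder would be inferred jointly from the observed (language, task) datasets; this sketch only shows why the factorisation permits extrapolation to unseen pairs.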
dc.description.sponsorship: ERC Consolidator Grant 648909 (LEXICAL); Google Research Faculty Award 2018
dc.rights: Attribution 4.0 International
dc.subject: Linguistic Typology
dc.subject: Multilingual Natural Language Processing
dc.subject: Sample Efficiency
dc.subject: Systematic Generalisation
dc.subject: Inductive Bias
dc.subject: Bayesian Models
dc.subject: Neural Networks
dc.subject: Deep Learning
dc.title: Inductive Bias and Modular Design for Sample-Efficient Neural Language Learning
dc.type.qualificationname: Doctor of Philosophy (PhD)
dc.publisher.institution: University of Cambridge
dc.contributor.orcid: Ponti, Edoardo [0000-0002-6308-1050]
dc.publisher.college: St John's
dc.type.qualificationtitle: PhD in Theoretical and Applied Linguistics
pubs.funder-project-id: EC H2020 EUROPEAN RESEARCH COUNCIL (ERC) (648909)
cam.supervisor: Korhonen, Anna
cam.supervisor: Vulic, Ivan
