Improving Parameter-Efficient Cross-Lingual Transfer for Low-Resource Languages



Abstract

The rapid development and real-world adoption of natural language processing models in recent years underscore the imperative to develop such models in different languages. This endeavour aims to give a broad spectrum of individuals access to emerging technologies, irrespective of their language. The most challenging scenario is arguably that of low-resource languages, which often lack labelled data and have only a limited amount of unlabelled data. The open question is how best to use available data sources to achieve good performance across varying degrees of resource scarcity. In this thesis, this question is addressed from two different perspectives in the context of modular and parameter-efficient approaches to cross-lingual transfer. Firstly, we propose different strategies for adapting to a low-resource target language within the existing zero-shot cross-lingual transfer paradigm. This involves leveraging unlabelled data in the target language in conjunction with other standard data sources to improve model performance for a specific task in that target language. Despite the underlying assumption of the absence of labelled data, our findings demonstrate that even exclusive reliance on unlabelled data improves task performance in the target language. In addition, we study the trade-offs between modularity and performance across the proposed methods. Secondly, we explore several approaches to combining various data sources in a few-shot setting, assuming the availability of a limited amount of labelled data in the target language. Our results illustrate that combining this data with larger amounts of lower-quality labelled data acquired through translation, along with unlabelled data, yields large performance gains for low-resource target languages when integrated with existing cross-lingual transfer tools.
The outcomes of this research show the feasibility of refining existing methods for cross-lingual transfer through the implementation of different training procedures and data sources. We hope this thesis will provide valuable insights into cross-lingual transfer and serve as an inspiration for further advancements in models designed for low-resource scenarios.

Date

2023-12-05

Advisors

Korhonen, Anna

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as All Rights Reserved

Sponsorship

Trinity College