Out-of-distribution generalisation in machine learning
Machine learning has proven extremely useful in many applications in recent years. However, many of these success stories stem from evaluating algorithms on data very similar to the data they were trained on. When applied to a new data distribution, machine learning algorithms often fail. Given the non-stationary and heterogeneous nature of real-world data, a better grasp of out-of-distribution generalisation is needed for algorithms to be widely deployed and trusted.
My thesis presents three research studies that aim to investigate and develop the field of out-of-distribution generalisation. The central goal of these research efforts is to produce new tools, such as algorithms, theoretical results, experimental results and datasets, that improve the understanding and performance of machine learning methods in the face of distribution shift. The high-level idea driving these research efforts across three machine learning scenarios is modularity -- the quality of consisting of separate parts that form a whole when combined. Modular approaches are hypothesised to steer machine learning methods away from rigid memorisation of examples and towards more flexible, `more intelligent' learning that supports generalisation.
In my first contribution, I approach the thesis goal from the perspective of learning from multiple training distributions. The contribution to this line of research is twofold. First, I present a new standardised suite of tasks for evaluating and comparing out-of-distribution generalisation algorithms. Second, I establish a set of new theoretical results that fill an existing gap between data-centric and algorithmic approaches to out-of-distribution generalisation. These theoretical findings guide a new set of practical recommendations on how to employ the algorithmic approach.
In the second contribution, I tackle generalisation in the common learning setup of supervised image recognition. In this context, I first investigate the effect of multi-level feature aggregation on generalisation, and demonstrate that augmentation with one of the considered methods consistently improves performance. Second, I propose a set of simple image datasets that can serve as a stepping stone for evaluating and comparing image classification methods in terms of out-of-distribution generalisation.
Finally, I delve into learning scenarios in which multiple neural networks communicate to solve a shared task. This work supports the thesis goal in two ways. First, I propose a new environment, graph referential games, and present results on the influence of data representation, and of the corresponding data representation learning methods, on out-of-distribution generalisation. These results connect the previously disjoint fields of graph representation learning and emergent communication. Second, I tackle the challenging domain of population-based communication grounded in realistic images.
The datasets, algorithms, theorems and experimental results in this thesis represent a few steps towards understanding and improving out-of-distribution generalisation in machine learning. They provide researchers with new tools and results intended to foster research in this field, some of which have already proved useful to the research community. Finally, this work suggests important future directions in the machine learning subfields of learning from multiple distributions, image classification and multi-agent communication.