Advances in Meta-Learning, Robustness, and Second-Order Optimisation in Deep Learning


Abstract

In machine learning, we are concerned with developing algorithms that are able to learn, that is, to accumulate knowledge about how to perform a task without having been programmed specifically for that purpose. This thesis approaches learning from two perspectives: the domains to which we may apply efficient machine learners, and the ways in which we can improve learning by solving the underlying optimisation problem more efficiently.

Machine learning methods are typically very data hungry. Although modern machine learning has been hugely effective in solving real-world problems, these success stories are largely limited to settings where an enormous amount of domain-relevant data is available. The field of meta-learning aims to improve sample efficiency by creating models that “learn how to learn”, i.e. models that can adapt rapidly to new tasks when presented with a relatively small number of examples. In this thesis, we are concerned with amortised meta-learners, which perform task adaptation using hypernetworks to generate a task-adapted model. These learners are very cost-efficient, requiring just a single forward pass through the hypernetwork to learn how to perform a new task. We show that amortised meta-learners can be leveraged in novel ways that extend beyond their typical use in the few-shot learning setting.
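
To make this adaptation mechanism concrete, the following is a minimal sketch of an amortised meta-learner in PyTorch. It is an illustrative assumption rather than the architecture used in the thesis: the class name, layer sizes, and the simplification of conditioning only on the support inputs (a practical amortised learner would also encode the support labels) are all hypothetical.

```python
import torch
import torch.nn as nn

class AmortisedMetaLearner(nn.Module):
    """Illustrative hypernetwork-based learner: one forward pass over the
    support set yields the weights of a task-adapted linear classifier."""

    def __init__(self, input_dim=128, embed_dim=64, n_classes=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, embed_dim), nn.ReLU())
        # Hypernetwork: maps a pooled support-set embedding to classifier parameters.
        self.hypernet = nn.Linear(embed_dim, embed_dim * n_classes + n_classes)
        self.embed_dim, self.n_classes = embed_dim, n_classes

    def adapt(self, support_x):
        """Generate task-adapted classifier weights from the support set."""
        z = self.encoder(support_x).mean(dim=0)   # permutation-invariant pooling
        params = self.hypernet(z)
        W = params[: self.embed_dim * self.n_classes].view(self.n_classes, self.embed_dim)
        b = params[self.embed_dim * self.n_classes:]
        return W, b

    def forward(self, support_x, query_x):
        W, b = self.adapt(support_x)              # "learning" is a single forward pass
        return self.encoder(query_x) @ W.t() + b  # query-set logits
```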

We develop a set-based poisoning attack against amortised meta-learners, which allows us to craft colluding sets of inputs that are tailored to fool the system’s learning algorithm when used as training data to adapt to new tasks (i.e. as a support set). Such jointly crafted adversarial inputs can collude to manipulate a classifier, and are especially easy to compute for amortised learners with differentiable adaptation mechanisms. We also employ amortised learners in the field of explainability to perform “dataset debugging”: we develop a data valuation, or sample importance, strategy called Meta-LOO that can be used to detect noisy or out-of-distribution data, or to distil a set of examples down to its most useful elements.
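
Because such adaptation mechanisms are differentiable, a support-set poisoning attack of this kind can be sketched as gradient ascent on the adapted model's query loss with respect to the support inputs. The function below is a hypothetical illustration in the style of projected gradient ascent, not the attack developed in the thesis, and it assumes a model exposing the differentiable forward(support_x, query_x) interface of the previous sketch.

```python
import torch
import torch.nn.functional as F

def poison_support_set(model, support_x, query_x, query_y,
                       steps=100, step_size=0.01, epsilon=0.1):
    """Illustrative set-based poisoning: jointly perturb the support inputs so
    that the classifier adapted from them performs poorly on the query set."""
    delta = torch.zeros_like(support_x, requires_grad=True)
    for _ in range(steps):
        logits = model(support_x + delta, query_x)   # adapt, then classify queries
        loss = F.cross_entropy(logits, query_y)      # the loss we want to increase
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()         # ascend the query loss
            delta.clamp_(-epsilon, epsilon)          # keep the colluding perturbations small
    return (support_x + delta).detach()
```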

From our second perspective, machine learning and optimisation are intimately linked; indeed, learning can be formulated as minimisation of the training loss with respect to the model’s parameters, though in practice we also require our algorithms to generalise, which is not a concern of optimisation more broadly. The chosen optimisation strategy affects both the speed at which algorithms learn and the quality of the solutions (i.e. model parameters) found. By studying optimisation, we may improve how well and how quickly our models are able to learn.
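
As a concrete illustration of this framing (standard empirical risk minimisation rather than a formulation specific to the thesis), training seeks parameters

$$
\theta^{\star} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_{\theta}(x_i),\, y_i\big),
$$

where $f_{\theta}$ is the model, $\ell$ the loss, and $\{(x_i, y_i)\}_{i=1}^{N}$ the training data; generalisation then asks how the resulting $\theta^{\star}$ behaves on data outside this sum.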

In this thesis we take a two-pronged approach towards this goal. First, we develop an online hypergradient-based hyperparameter optimisation strategy that improves on the state of the art by supporting a wide range of hyperparameters while remaining tractable at scale. Notably, our method supports hyperparameters of the optimisation algorithm itself, such as learning rates and momentum, which similar approaches in the literature do not. Second, we develop a second-order optimisation strategy that is applicable to the non-convex loss landscapes of deep learning. Our algorithm approximates a saddle-free version of the Hessian, for which saddle points are repulsive rather than attractive, in a way that scales to deep learning problems.
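
In generic form (these are the standard constructions from the literature; the thesis's contribution lies in approximating them tractably at deep-learning scale), the hypergradient differentiates a validation loss through the training procedure with respect to the hyperparameters $\lambda$, and the saddle-free second-order step replaces the Hessian's eigenvalues with their absolute values:

$$
\frac{\mathrm{d}\mathcal{L}_{\mathrm{val}}}{\mathrm{d}\lambda}
  = \frac{\partial \mathcal{L}_{\mathrm{val}}}{\partial \theta_{T}}
    \frac{\partial \theta_{T}}{\partial \lambda},
\qquad
H = Q \Lambda Q^{\top}, \quad
\Delta\theta = -\big(Q\,\lvert\Lambda\rvert\,Q^{\top}\big)^{-1} \nabla_{\theta} \mathcal{L},
$$

where $\theta_{T}$ denotes the parameters after $T$ optimiser steps (themselves functions of $\lambda$, e.g. of the learning rate), and taking $\lvert\Lambda\rvert$ makes negative-curvature directions at a saddle point push the update away from the saddle rather than towards it.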

Date

2023-03-31

Advisors

Turner, Richard

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as All Rights Reserved