
Probabilistic Continual Learning using Neural Networks


Abstract

Neural networks are being increasingly used in society due to their strong performance at a large scale. They excel when they have access to all data at once, requiring multiple passes through the data. However, standard deep-learning techniques are unable to continually adapt as the environment changes: either they forget old data or they fail to sufficiently adapt to new data. This limitation is a major barrier to applications in many real-world settings, where the environment is often changing, and also in stark contrast to humans, who continuously learn over their lifetimes. The study of learning systems in these settings is called continual learning: data examples arrive sequentially and predictions must be made online.

In this thesis we present new algorithms for continual learning using neural networks. We use the probabilistic approach, which maintains a distribution over beliefs, naturally handling continual learning by recursively updating from priors to posteriors. Although previous work has been limited by approximations to this idealised scheme, we scale our probabilistic algorithms to large-data settings and show strong empirical performance. We also theoretically analyse why our algorithms perform well in continual learning.

We start with a variational approximation over neural network weights in Chapter 3. Previous weight-prior algorithms converge slowly, and we speed up convergence by using natural-gradient updates, allowing us to scale to large-data settings such as ImageNet for the first time. However, we find there is still room for improving continual learning performance.

We argue that ultimately we are only interested in model outputs, and this motivates us to view neural networks in function-space and regularise their outputs directly in Chapter 4. We approximate a term in the variational objective with its function-space alternative, leading to FROMP. FROMP identifies and regularises on a few memorable past examples to avoid forgetting, and performs very well on existing continual learning benchmarks. However, we find that FROMP is not exact in simple settings such as Generalised Linear Models (GLMs).

We fix this in Chapter 5 with a method called Knowledge-adaptation priors (K-priors), a generalisation of FROMP and weight-priors that can be exact on GLMs. K-priors achieve quick and accurate adaptation across many adaptation tasks, including adding data (as in continual learning) but also removing data, changing the regulariser, and changing the model. We use K-priors to provide insight into why our previous methods achieve good performance, and we suggest improvements to them.

Overall, in this thesis we provide a comprehensive probabilistic framework for continual learning using neural networks, and provide thorough evaluation of instances of this framework.

Date

2022-01-18

Advisors

Turner, Richard

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as All Rights Reserved

Sponsorship

Engineering and Physical Sciences Research Council (1950609)