
The Neural Processes Family: Translation Equivariance and Output Dependencies


Type

Thesis

Authors

Requeima, James 

Abstract

Most contemporary machine learning approaches use a model trained from scratch on a particular task and a learning algorithm designed by hand. This approach has worked very well with the advent of deep learning and in the presence of very large datasets (Goodfellow et al., 2016). Recently, meta-learning has emerged as a machine learning approach to learn both a model and a learning algorithm (Hospedales et al., 2021; Schmidhuber, 1987) directly from data. Neural processes (Garnelo et al., 2018a,b) are a family of meta-learning models which combine the flexibility of deep learning with the uncertainty awareness of probabilistic models. Training using meta-learning allows neural processes to apply deep neural networks to applications with smaller training sets where they would typically overfit. Neural processes produce well-calibrated predictions, enable fast inference at test time, and have flexible data-handling properties that make them a good candidate for messy real-world datasets and applications.

However, neural processes have shortcomings in practice. This thesis addresses two of them: i) it incorporates translation equivariance into the architecture of neural processes, rather than requiring the model to learn this inductive bias directly from data, and ii) it develops methods for neural processes to parametrize rich predictive distributions that model dependencies between output-space variables and produce coherent samples.

This thesis makes four main contributions to the family of neural process models. First, we introduce the convolutional conditional neural process (ConvCNP). The ConvCNP builds translation equivariance into its modelling assumptions by using convolutional neural networks, improving training-data efficiency and performance when the data is approximately stationary. Second, we propose a latent-variable version of the ConvCNP, the convolutional latent neural process (ConvLNP), which can model epistemic uncertainty and output-space dependencies and produce coherent function samples. We also propose an approximate maximum-likelihood training procedure for the ConvLNP, improving upon the standard variational inference technique used by latent neural processes at the time. Third, we propose the Gaussian neural process (GNP), which models the predictive distribution with a full-covariance Gaussian. The GNP can model joint output-space dependencies like the ConvLNP, but avoids the issues associated with latent variables. Training GNPs is also much simpler than training the ConvLNP, since it uses the same maximum-likelihood technique as standard conditional neural processes. Fourth, we introduce the autoregressive neural process (AR NP). Rather than proposing a new neural process architecture, this method produces predictions at test time by evaluating existing neural process models autoregressively via the product rule of probability. It allows existing, potentially already trained neural processes to model non-Gaussian predictive distributions and produce coherent samples without any modifications to the architecture or training procedure.
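The autoregressive idea in the fourth contribution can be sketched in a few lines: factorise the joint predictive distribution with the product rule, sample the target points one at a time, and feed each sample back into the context before predicting the next point. The sketch below is illustrative only and is not code from the thesis; `cnp_predict` is a hypothetical stand-in (a toy kernel smoother) for a trained conditional neural process with Gaussian marginals.

```python
import numpy as np


def cnp_predict(ctx_x, ctx_y, x):
    """Stand-in for a trained conditional neural process: returns a
    Gaussian predictive mean and standard deviation at input x.
    A toy kernel smoother plays that role here (hypothetical model)."""
    if len(ctx_x) == 0:
        return 0.0, 1.0
    w = np.exp(-0.5 * (np.asarray(ctx_x) - x) ** 2)
    w /= w.sum()
    mean = float(w @ np.asarray(ctx_y))
    # Uncertainty shrinks as nearby context accumulates (toy heuristic).
    std = float(1.0 / (1.0 + w.max() * len(ctx_x)))
    return mean, std


def ar_sample(ctx_x, ctx_y, target_x, rng):
    """Autoregressive rollout: p(y_1, ..., y_n | x, ctx) is factorised
    via the product rule, and each sampled y_i is appended to the
    context so that later predictions depend on it."""
    ctx_x, ctx_y = list(ctx_x), list(ctx_y)
    samples = []
    for x in target_x:
        mean, std = cnp_predict(ctx_x, ctx_y, x)
        y = rng.normal(mean, std)
        samples.append(y)
        ctx_x.append(x)  # condition subsequent points on this sample
        ctx_y.append(y)
    return samples
```

Because each point is sampled conditionally on the previous samples, the rollout yields coherent joint samples (and non-Gaussian joint marginals) even though the underlying model only ever outputs Gaussian one-point predictives.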

The efficacy of each of these methods is demonstrated through a series of synthetic and real-world experiments in climate science, population modelling, and medical science. Across these applications, incorporating translation equivariance as a modelling assumption and producing predictive distributions that model output-space dependencies improves predictive performance.

Date

2022-12-01

Advisors

Turner, Richard

Keywords

meta-learning, neural processes

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge