Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks



Abstract

Large neural networks trained on large datasets have become the dominant paradigm in machine learning. These systems rely on maximum likelihood point estimates of their parameters, precluding them from expressing model uncertainty. This may result in overconfident predictions, and it prevents the use of deep learning models for sequential decision-making.

This thesis develops scalable methods to equip neural networks with model uncertainty. To achieve this, we do not try to fight progress in deep learning but instead borrow ideas from this field to make probabilistic methods more scalable. In particular, we leverage the linearised Laplace approximation to equip pre-trained neural networks with the uncertainty estimates provided by their tangent linear models. This turns the problem of Bayesian inference in neural networks into one of Bayesian inference in conjugate Gaussian-linear models. Alas, the cost of this remains cubic in either the number of network parameters or in the number of observations times output dimensions. By assumption, neither is tractable.
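In standard notation (the symbols below are ours, not taken from the abstract), the tangent linear model underlying the linearised Laplace approximation is obtained by a first-order Taylor expansion of the network around its trained weights:

```latex
% Tangent linear model at the trained weights \theta_*:
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_*) + J_{\theta_*}(x)\,(\theta - \theta_*),
\qquad
J_{\theta_*}(x) = \left.\partial_\theta f(x;\theta)\right|_{\theta=\theta_*}.
% With a Gaussian prior \theta \sim \mathcal{N}(\theta_*, \Lambda^{-1}) and a
% Gaussian (or Gauss-Newton-approximated) likelihood, inference is conjugate:
% the posterior precision is J^\top \Sigma^{-1} J + \Lambda. Forming or
% inverting this matrix costs O(d^3) in the number of parameters d, or, via the
% dual (Gaussian process) formulation, cubic in the number of observations
% times output dimensions --- the two intractable costs noted above.
```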

We address this intractability by using stochastic gradient descent (SGD)---the workhorse algorithm of deep learning---to perform posterior sampling in linear models and their convex duals: Gaussian processes. With this, we turn back to linearised neural networks, finding the linearised Laplace approximation to present a number of incompatibilities with modern deep learning practices---namely, stochastic optimisation, early stopping and normalisation layers---when used for hyperparameter learning. We resolve these and construct a sample-based EM algorithm for scalable hyperparameter learning with linearised neural networks.
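One well-known way SGD can produce posterior samples in conjugate Gaussian-linear models is the "sample-then-optimize" (randomised least-squares) construction: perturb the targets and the prior mean with the right Gaussian noise, and the minimiser of the resulting objective is an exact draw from the posterior. The sketch below illustrates this idea on a toy linear model with plain gradient descent standing in for SGD; all variable names and sizes are illustrative, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate Gaussian-linear model: y = Phi @ theta + noise.
n, d = 50, 5            # observations, parameters
sigma2, s2 = 0.1, 1.0   # noise variance, prior variance
Phi = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = Phi @ theta_true + np.sqrt(sigma2) * rng.normal(size=n)

# Randomised objective: perturb the targets and the prior mean.
# The minimiser is then an exact sample from the Gaussian posterior.
eps = np.sqrt(sigma2) * rng.normal(size=n)   # target perturbation
theta0 = np.sqrt(s2) * rng.normal(size=d)    # draw from the prior

def grad(theta):
    """Gradient of the perturbed regularised least-squares objective."""
    return (-Phi.T @ (y + eps - Phi @ theta) / sigma2
            + (theta - theta0) / s2)

theta = np.zeros(d)
lr = 1e-3
for _ in range(20000):  # full-batch descent; SGD would subsample rows of Phi
    theta -= lr * grad(theta)

# Closed-form minimiser of the same objective, for comparison.
A = Phi.T @ Phi / sigma2 + np.eye(d) / s2
theta_exact = np.linalg.solve(A, Phi.T @ (y + eps) / sigma2 + theta0 / s2)
```

Averaging many such optimisation runs, each with fresh perturbations, recovers the posterior mean and covariance without ever forming the d-by-d precision matrix, which is what makes the approach compatible with large parameter counts.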

We apply the above methods to perform linearised neural network inference with ResNet-50 (25M parameters) trained on ImageNet (1.2M observations and 1000 output dimensions). To the best of our knowledge, this is the first time Bayesian inference has been performed at this real-world scale without assuming some degree of independence across network weights. Additionally, we apply our methods to estimate uncertainty for 3D tomographic reconstructions obtained with the deep image prior network, also a first. We conclude by using the linearised deep image prior to adaptively choose sequences of scanning angles that produce higher-quality tomographic reconstructions while applying a lower radiation dosage.

Date

2024-02-09

Advisors

Hernandez-Lobato, Jose Miguel

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as All Rights Reserved

Sponsorship

Engineering and Physical Sciences Research Council (2275741)
Javier Antorán acknowledges support from Microsoft Research, through its PhD Scholarship Programme, and from the EPSRC.