Improved Sampling and Variational Inference Methods for Neural Networks
Abstract
Bayesian Neural Networks (BNNs), the application of Bayesian inference to neural networks, offer an alternative approach to training. Instead of committing to a single weight setting, they combine multiple weight settings, each compatible with the training data, and thereby quantify the uncertainty about the network's weights. While Bayesian inference applied to neural networks may not always provide more accurate predictions, it is more robust to over-fitting and provides estimates of epistemic uncertainty: the diversity of predictions produced by the individual weight settings can indicate when users should be cautious about their accuracy. Because applying Bayesian inference directly to neural networks is computationally intractable, approximate methods are required.
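As an illustration of how this works (a toy example of my own, not taken from the thesis), the spread of predictions across sampled weight settings can serve as an epistemic uncertainty signal:

```python
# Toy illustration (not from the thesis): epistemic uncertainty from weight samples.
import numpy as np

rng = np.random.default_rng(0)

def predict(x, w):
    """Toy one-hidden-layer network: tanh(x @ W1) @ w2."""
    W1, w2 = w
    return np.tanh(x @ W1) @ w2

# Stand-ins for posterior weight samples (here just random draws for illustration).
weight_samples = [(rng.normal(size=(1, 8)), rng.normal(size=8)) for _ in range(50)]

x = np.linspace(-3.0, 3.0, 101).reshape(-1, 1)
preds = np.stack([predict(x, w) for w in weight_samples])  # shape (samples, inputs)

predictive_mean = preds.mean(axis=0)   # Bayesian model average
epistemic_std = preds.std(axis=0)      # disagreement between weight samples -> caution signal
```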
Two well-established classes of approximate inference are applied in practice. The first is Markov Chain Monte Carlo (MCMC), which constructs a chain that samples directly from the posterior distribution over the network's weights, giving unbiased but high-variance estimates. The second comprises approximate inference methods that construct or optimise an approximate posterior distribution, giving biased estimates, usually with low variance; Variational Inference (VI) is a commonly used method in this class. This thesis aims to improve the predictive performance of approximate inference applied to neural networks in both of these classes, by developing improvements to sampling-based methods and to variational inference, with a particular focus on scalability to large neural networks.
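For context, VI typically fits the approximate posterior by maximising the evidence lower bound (ELBO), a standard objective that is not specific to this thesis:

$$\mathcal{L}(q) \;=\; \mathbb{E}_{q(\mathbf{w})}\!\left[\log p(\mathcal{D} \mid \mathbf{w})\right] \;-\; \mathrm{KL}\!\left(q(\mathbf{w}) \,\|\, p(\mathbf{w})\right) \;\le\; \log p(\mathcal{D}),$$

where $q(\mathbf{w})$ is the approximate posterior, $p(\mathbf{w})$ the prior and $p(\mathcal{D} \mid \mathbf{w})$ the likelihood; the bias of VI stems from restricting $q$ to a tractable family.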
Chapter 2 introduces a computationally efficient method for sampling from an approximate posterior distribution. The approach constructs a Markov chain whose acceptance probabilities are approximated using a Taylor expansion of the log density ratio at the chain's current state. Sampling from this chain is compatible with mini-batching, and simulating the chain is comparable in speed to optimisation algorithms applied to maximum likelihood learning.
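To make the idea concrete, the following sketch (my own simplified illustration, not the exact algorithm of Chapter 2) runs a random-walk chain for a toy linear model in which the acceptance test uses a first-order Taylor expansion of the log density ratio, evaluated with a mini-batch gradient:

```python
# Hypothetical sketch: Metropolis-style sampling with an approximate,
# Taylor-expanded, mini-batched acceptance test.
import numpy as np

rng = np.random.default_rng(1)

def stochastic_grad_log_post(w, X, y, idx, n_total):
    """Mini-batch estimate of the gradient of the log posterior for a toy linear
    model with unit-variance Gaussian likelihood and a standard normal prior."""
    resid = y[idx] - X[idx] @ w
    return (n_total / len(idx)) * (X[idx].T @ resid) - w

def approx_mh_step(w, X, y, batch_size=32, prop_scale=1e-2):
    n_total = len(y)
    idx = rng.choice(n_total, size=batch_size, replace=False)
    g = stochastic_grad_log_post(w, X, y, idx, n_total)
    proposal = w + prop_scale * rng.normal(size=w.shape)  # symmetric random-walk proposal
    # Approximate acceptance: first-order Taylor expansion of
    # log p(proposal | data) - log p(w | data) around the current state w,
    # using the mini-batch gradient in place of the full-data gradient.
    approx_log_ratio = g @ (proposal - w)
    return proposal if np.log(rng.uniform()) < approx_log_ratio else w

# Toy usage: sample weights of a 5-dimensional linear model.
X = rng.normal(size=(1000, 5))
y = X @ np.ones(5) + 0.1 * rng.normal(size=1000)
w = np.zeros(5)
for _ in range(2000):
    w = approx_mh_step(w, X, y)
```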
Chapter 3 introduces a method that reduces the under-fitting affecting mean-field variational posteriors optimised in large BNNs. The method develops an efficient optimisation scheme for an augmented BNN model in which inference is applied to the weights as well as to their prior means and variances.
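One way to picture such an augmented model (a rough sketch under assumed notation, not the parameterisation used in Chapter 3) is a mean-field Gaussian posterior whose KL term is computed against a Gaussian prior whose per-weight means and variances are themselves part of the optimisation:

```python
# Rough sketch (assumed parameterisation, not the thesis's): the KL part of the ELBO
# for a mean-field Gaussian posterior against a prior with learnable means/variances.
import numpy as np

def gaussian_kl(q_mu, q_log_var, p_mu, p_log_var):
    """Sum of elementwise KL( N(q_mu, exp(q_log_var)) || N(p_mu, exp(p_log_var)) )."""
    return 0.5 * np.sum(
        p_log_var - q_log_var
        + (np.exp(q_log_var) + (q_mu - p_mu) ** 2) / np.exp(p_log_var)
        - 1.0
    )

# Variational parameters of the weights ...
q_mu, q_log_var = np.zeros(10), np.full(10, -4.0)
# ... and prior means/variances that are optimised jointly with them.
p_mu, p_log_var = np.zeros(10), np.zeros(10)

kl = gaussian_kl(q_mu, q_log_var, p_mu, p_log_var)
```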
Chapter 4 develops an approach for efficiently optimising richer Gaussian variational posteriors in BNNs, in which the covariance matrix of the intra-layer weights has a low-rank-plus-diagonal structure. The method extends previously developed variance reduction techniques for BNNs, and the resulting posteriors lead to better predictive distributions than standard mean-field variational inference.
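Drawing a reparameterised sample from such a posterior can be done with a standard construction; the sketch below (illustrative names and shapes, not those of Chapter 4) samples from a Gaussian whose covariance is a diagonal matrix plus a low-rank factor:

```python
# Illustrative sketch: reparameterised sample from N(mu, diag(d) + U U^T),
# the low-rank-plus-diagonal covariance structure described above.
import numpy as np

rng = np.random.default_rng(2)

n_weights, rank = 1000, 5
mu = np.zeros(n_weights)                       # posterior mean
d = np.full(n_weights, 1e-2)                   # diagonal variances (positive)
U = 0.1 * rng.normal(size=(n_weights, rank))   # low-rank covariance factor

eps_diag = rng.normal(size=n_weights)
eps_rank = rng.normal(size=rank)
w_sample = mu + np.sqrt(d) * eps_diag + U @ eps_rank  # covariance = diag(d) + U U^T
```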
Finally, Chapter 5 draws overall conclusions, summarises the developed algorithms and describes possible future work.
