Natural gradient methods in statistics and machine learning


Abstract

Many inference problems can be expressed as optimisations. These optimisations can often be challenging, requiring many iterations to converge using standard methods, making them computationally demanding. In this thesis, we present methods for performing such optimisations efficiently, drawing heavily on the use of natural gradient methods (Amari, 1998).

In the first main contribution chapter of this thesis, we present a novel natural-gradient-based method for optimising the parameters of probability distributions where a direct application of natural gradients would be computationally demanding. We apply this method to maximum likelihood estimation and variational inference tasks involving a number of distributions. These include: skew-elliptical distributions, which can model multivariate real-valued data with characteristics beyond those which can be captured by the Gaussian family, such as asymmetry or heavy-tailedness; elliptical copulas, which are commonly used for modelling correlation structure in high-dimensional statistics; and various mixture distributions, which represent complex probability distributions as combinations of simpler distributions, allowing for features such as multimodality, which may not be possible to represent in the component distributions alone. Our method expands the set of distribution families that can efficiently be targeted with natural gradients and, as we demonstrate, can result in significantly faster convergence than standard methods.

In the second main contribution chapter, we use a novel natural-gradient interpretation of the expectation propagation (EP) algorithm of Minka (2001) to motivate two new natural-gradient-based EP variants that have particularly desirable properties in black-box inference settings. Black-box inference methods allow practitioners to answer questions of inference without requiring expert knowledge of the underlying inference techniques, and typically place few restrictions on the model of interest. EP has several desirable computational properties in black-box settings, but existing EP variants face multiple challenges. Our new variants have several advantages over their predecessors that allow them to address these challenges. Namely, they converge faster, are easier to tune, and do not make use of debiasing estimators. By facilitating the use of EP in such settings, our advances have the potential to reduce the computational demands of performing black-box inference.

We hope that our contributions will prove to be useful in their own right, and also that they may facilitate or inspire further advances in statistics, machine learning, or indeed any other field in which problems of inference are to be found.

Date

2025-04-29

Advisors

Turner, Richard E

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International (CC BY 4.0)

Sponsorship

Harding Distinguished Postgraduate Scholars Programme