Scalable Gaussian Processes: Advances in Iterative Methods and Pathwise Conditioning
Abstract
Driven by advances in hardware for massively-parallel computation, machine learning models trained on large amounts of data have become capable of accomplishing complex tasks, such as generating realistic images or maintaining conversations in natural language. However, their inability to recognise when they do not know something often leads to overconfidence and hallucinations.
Gaussian processes are a powerful framework for uncertainty-aware function approximation and sequential decision-making. Unfortunately, their classical formulation does not scale gracefully to large amounts of data and modern hardware for massively-parallel computation, prompting many researchers to develop techniques which improve their scalability.
This dissertation focuses on the powerful combination of iterative methods and pathwise conditioning to develop methodological contributions which facilitate the use of Gaussian processes in modern large-scale settings. By combining these two techniques synergistically, expensive computations are expressed as solutions to systems of linear equations and obtained by leveraging iterative linear system solvers. This drastically reduces memory requirements, facilitating application to significantly larger amounts of data, and introduces matrix multiplication as the main computational operation, which is ideal for modern hardware.
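To make the combination described above concrete, the following is a minimal sketch, not the dissertation's implementation. It draws one posterior sample by pathwise conditioning (Matheron's rule): a prior sample f is corrected by K(·, X) v, where v solves (K(X, X) + σ²I) v = y − f(X) − ε with ε ~ N(0, σ²I). The linear system is solved with conjugate gradients, a standard iterative solver chosen here purely for illustration, so the kernel matrix is only accessed through matrix-vector products. The function names, the squared exponential kernel, and the jitter value are assumptions made for this example. The prior sample is drawn with a dense Cholesky factorisation only to keep the sketch short, which would not scale to large datasets.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared exponential kernel between 1-D input arrays (illustrative choice)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def conjugate_gradients(matvec, b, tol=1e-8, max_iters=1000):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b.copy()  # residual b - A x for the initial guess x = 0
    p = r.copy()  # search direction
    rs = r @ r
    for _ in range(max_iters):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def posterior_sample(x_train, y_train, x_test, noise_var, rng):
    """Draw one posterior sample at x_test via pathwise conditioning."""
    n = len(x_train)
    x_all = np.concatenate([x_train, x_test])
    K = rbf_kernel(x_all, x_all)

    # Joint prior sample at training and test inputs.
    # Dense Cholesky with jitter: for brevity only, this step does not scale.
    chol = np.linalg.cholesky(K + 1e-6 * np.eye(len(x_all)))
    f_all = chol @ rng.standard_normal(len(x_all))
    f_train, f_test = f_all[:n], f_all[n:]

    # Simulated observation noise for the prior sample.
    eps = np.sqrt(noise_var) * rng.standard_normal(n)

    # Solve (K_xx + noise_var * I) v = y - f(X) - eps using matvecs only.
    K_xx = K[:n, :n]
    v = conjugate_gradients(
        lambda u: K_xx @ u + noise_var * u, y_train - f_train - eps
    )

    # Pathwise update: prior sample plus data-dependent correction.
    return f_test + K[n:, :n] @ v
```

In this sketch the solver never factorises the kernel matrix, so its cost is dominated by matrix-vector products, which is the property the paragraph above attributes to iterative methods.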
In particular, this dissertation introduces stochastic gradient algorithms as a computationally efficient method to solve linear systems iteratively. To this end, custom optimisation objectives, stochastic gradient estimators, and variance reduction techniques are developed and analysed. Empirically, the proposed methods achieve state-of-the-art performance on large-scale regression, Bayesian optimisation, and molecular binding affinity prediction tasks.
Additionally, generic improvements, which are applicable to any iterative linear system solver in the context of Gaussian processes, are contributed, leading to computational speed-ups of up to 72x compared to established approaches. Furthermore, iterative methods and pathwise conditioning are combined with structured linear algebra techniques to attain even greater scalability, which is demonstrated on real-world datasets with up to five million examples, including robotics, automated machine learning, and climate modelling applications.
