Hierarchical Inference in Gaussian Processes
Abstract
Hierarchical modelling is a fundamental theme in probabilistic machine learning that relies on the Bayesian interpretation of probability. The starting requirement for these models is that all unknowns are treated as random variables with their own probability distributions. The central idea behind hierarchical modelling is that the parameters of these distributions are themselves endowed with probability distributions, governed by hyperparameters, and the construction recurs. This recursive construction is succinctly referred to as a hierarchy, denoting the multiple levels of abstraction away from the data. Probability distributions naturally encode uncertainty, and an important aspect of learning and inference in these models is the propagation of uncertainty through the different levels down to predictions. Hierarchical Bayesian models (HBMs) have been a mainstay of statistical modelling in several domains of science and engineering. This work considers hierarchical constructions in the context of Gaussian processes.
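To make the recursive construction concrete, a generic three-level hierarchy (an illustrative sketch, not a specific model from this work) can be written as:

```latex
\begin{align*}
  y_i \mid \theta  &\sim p(y_i \mid \theta)   && \text{observations (likelihood)} \\
  \theta \mid \phi &\sim p(\theta \mid \phi)  && \text{parameters (prior)} \\
  \phi             &\sim p(\phi)              && \text{hyperparameters (hyperprior)}
\end{align*}
```

Prediction then propagates uncertainty through every level, for instance \( p(y_{*} \mid \mathbf{y}) = \int p(y_{*} \mid \theta)\, p(\theta, \phi \mid \mathbf{y})\, \mathrm{d}\theta\, \mathrm{d}\phi \).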
Gaussian processes (GPs) are flexible distributions over functions and exemplify the key attributes of probabilistic machine learning. They can encode a diverse range of statistical structures through the choice of a kernel function and account for prediction uncertainty by design. Importantly, they lend themselves to hierarchical modelling, opening up avenues for increasing their statistical power and flexibility, albeit at an extra computational cost.
The principal thrust of this thesis is inference in hierarchical constructions of Gaussian process models. Endowing the kernel hyperparameters of a Gaussian process with hyperprior distributions is the most traditional manifestation of a hierarchical GP. Selecting a kernel function and adapting its hyperparameters together encapsulate the model selection problem; hence, learning in hierarchical Gaussian processes can be seen as an exercise in model selection. Hierarchical constructions of Gaussian processes are not new; they have been used indirectly in several works, but few have made them a topic of stand-alone study. The aim of this work is to do exactly that. We present the mathematical desiderata for these models systematically, covering both supervised and unsupervised learning. The latter is based on the development of Gaussian process latent variable models, where the hierarchical prior is placed on the unobserved inputs X. The aim throughout is to understand and illustrate the specific conditions under which hierarchical Gaussian processes offer a distinct advantage over traditional inference.
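As a minimal illustration of this traditional setting, the sketch below (assuming log-normal hyperpriors over RBF kernel hyperparameters; not the thesis implementation) performs MAP inference over hyperparameters by maximising the GP log marginal likelihood plus a log hyperprior:

```python
# Minimal sketch of the classical hierarchical GP: RBF kernel hyperparameters
# receive log-normal hyperpriors, and we find their MAP estimate.
# The kernel choice, priors, and toy data are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

def rbf_kernel(X1, X2, lengthscale, signal_var):
    # Squared-exponential kernel k(x, x') = s^2 exp(-||x - x'||^2 / (2 l^2)).
    sqdist = cdist(X1, X2, "sqeuclidean")
    return signal_var * np.exp(-0.5 * sqdist / lengthscale**2)

def log_joint(log_params, X, y):
    # Unnormalised log posterior over hyperparameters,
    # log p(y | X, theta) + log p(theta), parameterised in log space.
    lengthscale, signal_var, noise_var = np.exp(log_params)
    n = X.shape[0]
    K = rbf_kernel(X, X, lengthscale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # GP log marginal likelihood.
    log_lik = (-0.5 * y @ alpha
               - np.sum(np.log(np.diag(L)))
               - 0.5 * n * np.log(2 * np.pi))
    # Standard-normal priors on the log hyperparameters (log-normal hyperpriors).
    log_prior = -0.5 * np.sum(log_params**2)
    return log_lik + log_prior

# Toy data: the optimiser returns MAP hyperparameters rather than ML-II ones.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
result = minimize(lambda p: -log_joint(p, X, y), x0=np.zeros(3), method="L-BFGS-B")
print("MAP hyperparameters (lengthscale, signal var, noise var):", np.exp(result.x))
```

A fully Bayesian treatment would instead integrate over the hyperparameters (for example with MCMC), which is where the propagation of uncertainty discussed above becomes material.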
We explore hierarchical incarnations of Gaussian process models in the context of non-parametric regression and latent variable modelling. While the earlier chapters focus largely on the traditional setting of hyperparameter inference, we also develop a novel hierarchical construction leveraging modern neural networks for the task of kernel learning. Using transformer neural networks, we build a prediction engine that identifies a suitable kernel for a high-dimensional data set (of observed inputs and outputs) with sub-second prediction times. Building on existing work, we then shift our focus to modelling functions of inhomogeneous smoothness by placing non-parametric priors over hyperparameters, opening the door to truly adaptive kernel design. The final chapter presents a provocative application of large-scale unsupervised dimensionality reduction using Gaussian processes on single-cell RNA (scRNA) data. The custom formulation builds upon the models presented in earlier chapters, adapting the kernel design to incorporate random effects and technical and biological confounders, such as batch effects, into the hierarchical construction. The low-dimensional latent space revealed biologically relevant clusters, and in comparison to existing techniques analysed on the same data set, the proposed method was significantly faster (a 9x speed-up) with virtually no degradation in results.
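For concreteness, one plausible form of such a kernel (an illustrative assumption, not the exact formulation used in the thesis) augments a smooth kernel over the latent space with an indicator term for batch membership:

```latex
k\big((\mathbf{x}_i, b_i), (\mathbf{x}_j, b_j)\big)
  = k_{\mathrm{SE}}(\mathbf{x}_i, \mathbf{x}_j)
  + \sigma_b^{2}\, \mathbb{1}[b_i = b_j],
```

where \(b_i\) denotes the batch label of cell \(i\) and the second term acts as a random effect shared within a batch.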