Non-parametric Bayesian models for structured output prediction

Bratières, Sébastien

Non-parametric Bayesian models for structured output prediction

Repository URI

https://www.repository.cam.ac.uk/handle/1810/274973

Repository DOI

https://doi.org/10.17863/CAM.22124

Files

Thesis (19.52 MB)

Type

Thesis

Authors

Bratières, Sébastien

https://orcid.org/0000-0002-8117-0153

Abstract

Structured output prediction is a machine learning tasks in which an input object is not just assigned a single class, as in classification, but multiple, interdependent labels. This means that the presence or value of a given label affects the other labels, for instance in text labelling problems, where output labels are applied to each word, and their interdependencies must be modelled.

Non-parametric Bayesian (NPB) techniques are probabilistic modelling techniques which have the interesting property of allowing model capacity to grow, in a controllable way, with data complexity, while maintaining the advantages of Bayesian modelling. In this thesis, we develop NPB algorithms to solve structured output problems.

We first study a map-reduce implementation of a stochastic inference method designed for the infinite hidden Markov model, applied to a computational linguistics task, part-of-speech tagging. We show that mainstream map-reduce frameworks do not easily support highly iterative algorithms.

The main contribution of this thesis consists in a conceptually novel discriminative model, GPstruct. It is motivated by labelling tasks, and combines attractive properties of conditional random fields (CRF), structured support vector machines, and Gaussian process (GP) classifiers. In probabilistic terms, GPstruct combines a CRF likelihood with a GP prior on factors; it can also be described as a Bayesian kernelized CRF.

To train this model, we develop a Markov chain Monte Carlo algorithm based on elliptical slice sampling and investigate its properties. We then validate it on real data experiments, and explore two topologies: sequence output with text labelling tasks, and grid output with semantic segmentation of images. The latter case poses scalability issues, which are addressed using likelihood approximations and an ensemble method which allows distributed inference and prediction.

The experimental validation demonstrates: (a) the model is flexible and its constituent parts are modular and easy to engineer; (b) predictive performance and, most crucially, the probabilistic calibration of predictions are better than or equal to that of competitor models, and (c) model hyperparameters can be learnt from data.

Date

2017-04-06

Advisors

Ghahramani, Zoubin

Keywords

machine learning, Bayesian models, probability

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights

Collections

Theses - Engineering