dc.contributor.author: Brouwer, Thomas Alexander
dc.date.accessioned: 2017-12-05T12:12:59Z
dc.date.available: 2017-12-05T12:12:59Z
dc.date.issued: 2017-12-01
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/269921
dc.description.abstract: In recent years the amount of biological data has increased exponentially. Most of these data can be represented as matrices relating two different entity types, such as drug-target interactions (relating drugs to protein targets), gene expression profiles (relating drugs or cell lines to genes), and drug sensitivity values (relating drugs to cell lines). Not only is the size of these datasets increasing, but so is the number of different entity types that they relate. Furthermore, typically not all values in these datasets are observed, and some datasets are very sparse. Matrix factorisation is a popular group of methods that can be used to analyse these matrices. The idea is that each matrix can be decomposed into two or more smaller matrices, such that their product approximates the original one. This factorisation of the data reveals patterns in the matrix, and gives us a lower-dimensional representation. Not only can we use this technique to identify clusters and other biological signals, we can also predict the unobserved entries, allowing us to prune biological experiments. In this thesis we introduce and explore several Bayesian matrix factorisation models, focusing on how to best use them for predicting these missing values in biological datasets. Our main hypothesis is that matrix factorisation methods, and in particular Bayesian variants, are an extremely powerful paradigm for predicting values in biological datasets, as well as for other applications, and especially for sparse and noisy data. We demonstrate the competitiveness of these approaches compared to other state-of-the-art methods, and explore the conditions under which they perform best. We consider several aspects of the Bayesian approach to matrix factorisation. Firstly, we consider the effect on predictive performance of the inference approaches used to find the factorisation. Secondly, we identify different likelihood and Bayesian prior choices that we can use for these models, and explore when they are most appropriate. Finally, we introduce a Bayesian matrix factorisation model that can be used to integrate multiple biological datasets, and hence improve predictions. This hybrid model combines different matrix factorisation models and Bayesian priors. Through these models and experiments we support our hypothesis and provide novel insights into the best ways to use Bayesian matrix factorisation methods for predictive purposes.
dc.description.sponsorship: UK Engineering and Physical Sciences Research Council (EPSRC), grant reference EP/M506485/1.
dc.language.iso: en
dc.rights: All Rights Reserved [en]
dc.rights.uri: https://www.rioxx.net/licenses/all-rights-reserved/ [en]
dc.subject: Matrix factorisation
dc.subject: Machine learning
dc.subject: Bayesian statistics
dc.subject: Bioinformatics
dc.title: Bayesian matrix factorisation: inference, priors, and data integration
dc.type: Thesis
dc.type.qualificationlevel: Doctoral
dc.type.qualificationname: Doctor of Philosophy (PhD)
dc.publisher.institution: University of Cambridge
dc.publisher.department: Computer Laboratory
dc.date.updated: 2017-12-01T16:49:17Z
dc.identifier.doi: 10.17863/CAM.16797
dc.publisher.college: Homerton
dc.type.qualificationtitle: PhD in Computer Science
cam.supervisor: Lio, Pietro
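
Illustrative note on the abstract: the idea that a partially observed matrix can be approximated by the product of two smaller matrices, and that this product then supplies predictions for the unobserved entries, can be sketched in a few lines of Python. This is a minimal sketch only. It uses plain regularised (non-Bayesian) factorisation fitted by stochastic gradient descent, which is a stand-in and not one of the models from the thesis. The function names `factorise` and `predict`, the toy matrix, and all parameter values are assumptions made for illustration.

```python
import random


def factorise(R, rank, steps=5000, lr=0.01, reg=0.02, seed=0):
    """Approximate R by U * V^T, fitting only the observed entries.

    R is a list of rows; None marks an unobserved entry.
    Hypothetical helper: plain SGD with L2 regularisation, not a Bayesian model.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(R), len(R[0])
    # Small positive random initial factors.
    U = [[rng.uniform(0, 1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(0, 1) for _ in range(rank)] for _ in range(n_cols)]
    observed = [
        (i, j, R[i][j])
        for i in range(n_rows)
        for j in range(n_cols)
        if R[i][j] is not None
    ]
    for _ in range(steps):
        for i, j, value in observed:
            err = value - sum(U[i][k] * V[j][k] for k in range(rank))
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on the squared error plus the L2 penalty.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


def predict(U, V, i, j):
    """Predicted value for entry (i, j): dot product of row i of U and row j of V."""
    return sum(u * v for u, v in zip(U[i], V[j]))


if __name__ == "__main__":
    # Toy rank-1 matrix (outer product of [1, 2, 3] with itself), one entry hidden.
    R = [[1, 2, 3], [2, 4, 6], [3, 6, None]]
    U, V = factorise(R, rank=1)
    print("prediction for the hidden entry:", predict(U, V, 2, 2))
```

Because the toy matrix has rank 1, the hidden entry is determined by the observed ones, so the prediction should land close to the true value of 9, slightly shrunk by the regularisation term. A Bayesian treatment would instead place priors on `U` and `V` and infer a posterior over them.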

