A Bayesian mixture modelling approach for spatial proteomics.
PLoS Comput Biol
Public Library of Science (PLoS)
Crook, O. M., Mulvey, C. M., Kirk, P. D., Lilley, K. S., & Gatto, L. (2018). A Bayesian mixture modelling approach for spatial proteomics. PLoS Comput Biol, 14 (11), e1006516. https://doi.org/10.1371/journal.pcbi.1006516
Analysis of the spatial sub-cellular distribution of proteins is of vital importance to fully understand context-specific protein function. Some proteins can be found at a single location within a cell, but up to half of proteins may reside in multiple locations, can dynamically re-localise, or reside within an unknown functional compartment. These considerations lead to uncertainty in associating a protein with a single location. Currently, mass spectrometry (MS) based spatial proteomics relies on supervised machine learning algorithms to assign proteins to sub-cellular locations based on common gradient profiles. However, such methods fail to quantify the uncertainty associated with sub-cellular class assignment. Here we reformulate the framework on which we perform statistical analysis. We propose a Bayesian generative classifier based on Gaussian mixture models to assign proteins probabilistically to sub-cellular niches, so that each protein has a probability distribution over sub-cellular locations. Bayesian computation is performed using the expectation-maximisation (EM) algorithm, as well as Markov-chain Monte-Carlo (MCMC). Our methodology allows proteome-wide uncertainty quantification, thus adding a further layer to the analysis of spatial proteomics. Our framework is flexible, allowing many different systems to be analysed, and reveals new modelling opportunities for spatial proteomics. We find our methods perform competitively with current state-of-the-art machine learning methods, whilst simultaneously providing more information. We highlight several examples where classification based on the support vector machine is unable to draw any conclusions, while uncertainty quantification using our approach provides biologically intriguing results. To our knowledge this is the first Bayesian model of MS-based spatial proteomics data.
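The probabilistic assignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the mixture weights, component means and covariances are already known (in the paper these are inferred by EM or MCMC), places no priors on them, and uses hypothetical toy values and function names throughout. It shows only how a Gaussian mixture turns a protein's quantitative profile into a probability distribution over components, here standing in for sub-cellular niches.

```python
import numpy as np


def log_gaussian_pdf(x, mean, cov):
    """Log density of a multivariate normal evaluated at each row of x."""
    d = mean.shape[0]
    diff = x - mean
    chol = np.linalg.cholesky(cov)
    # Solve L z = diff^T; the column-wise sum of z^2 is the squared Mahalanobis distance.
    z = np.linalg.solve(chol, diff.T)
    maha = np.sum(z ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


def membership_probabilities(x, weights, means, covs):
    """Posterior probability of each mixture component for each row of x."""
    log_post = np.stack(
        [np.log(w) + log_gaussian_pdf(x, m, c)
         for w, m, c in zip(weights, means, covs)],
        axis=1,
    )
    # Subtract the row maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


# Hypothetical toy parameters: two components in a two-dimensional profile space.
weights = np.array([0.5, 0.5])
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
covs = [np.eye(2), np.eye(2)]

# Three hypothetical protein profiles: near component 0, midway, near component 1.
profiles = np.array([[0.1, -0.2], [2.0, 2.0], [4.2, 3.9]])
probs = membership_probabilities(profiles, weights, means, covs)
```

Each row of `probs` sums to one. A profile lying midway between two components receives an even split rather than a forced single label, which is the kind of uncertainty a hard classifier does not report.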
Algorithms, Animals, Bayes Theorem, Embryonic Stem Cells, Machine Learning, Mice, Models, Theoretical, Proteomics, Reproducibility of Results, Subcellular Fractions, Uncertainty
LG was supported by the BBSRC Strategic Longer and Larger grant (Award BB/L002817/1) and the Wellcome Trust Senior Investigator Award 110170/Z/15/Z awarded to KSL. PDWK was supported by the MRC (project reference MC_UP_0801/1). CMM was supported by a Wellcome Trust Technology Development Grant (Grant number 108467/Z/15/Z). OMC is a Wellcome Trust Mathematical Genomics and Medicine student supported financially by the School of Clinical Medicine, University of Cambridge. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Biotechnology and Biological Sciences Research Council (BB/N023129/1)
Wellcome Trust (110170/Z/15/Z)
Wellcome Trust (108467/Z/15/Z)
Biotechnology and Biological Sciences Research Council (BB/L002817/1)
Biotechnology and Biological Sciences Research Council (BB/L018497/1)
External DOI: https://doi.org/10.1371/journal.pcbi.1006516
This record's URL: https://www.repository.cam.ac.uk/handle/1810/286957
Attribution 4.0 International
Licence URL: https://creativecommons.org/licenses/by/4.0/