Protein Condensate Atlas from predictive models of heteromolecular condensate composition
Published version
Peer-reviewed
Abstract
Biomolecular condensates help cells organise their content in space and time. Cells harbour a variety of condensate types with diverse composition, and many are likely yet to be discovered. Here, we develop a methodology to predict the composition of biomolecular condensates. We first analyse available proteomics data of cellular condensates and find that the biophysical features that determine protein localisation into condensates differ from known drivers of homotypic phase separation processes, with charge-mediated protein-RNA and hydrophobicity-mediated protein-protein interactions playing a key role in the former process. We then develop a machine learning model that links protein sequence to its propensity to localise into heteromolecular condensates. We apply the model across the proteome and find that many of the top-ranked targets outside the original training data localise into condensates, as confirmed by orthogonal immunohistochemical staining imaging. Finally, we segment the condensation-prone proteome into condensate types based on an overlap with biomolecular interaction profiles to generate a Protein Condensate Atlas. Several condensate clusters within the Atlas closely match the composition of experimentally characterised condensates or regions within them, suggesting that the Atlas can be valuable for identifying additional components within known condensate systems and discovering previously uncharacterised condensates.
Description
Acknowledgements: We would like to acknowledge the Schmidt Science Fellowship in partnership with the Rhodes Trust (K.L.S.), St. John’s College Research Fellowship (K.L.S.), the National Institutes of Health Oxford-Cambridge Scholars Programme (L.L.G.), the Cambridge Trust’s Cambridge International Scholarship (L.L.G.), the Intramural Research Programme of the National Institute of Diabetes and Digestive and Kidney Diseases at the National Institutes of Health (L.L.G.), and the European Research Council (T.P.J.K.). The authors gratefully acknowledge funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme through the ERC grant DiProPhys (agreement ID 101001615).
Funder: European Research Council