
Zero shot molecular generation via similarity kernels

Published version
Peer-reviewed

Abstract

Generative modelling aims to accelerate the discovery of novel chemicals by directly proposing structures with desirable properties. Recently, score-based, or diffusion, generative models have significantly outperformed previous approaches. Key to their success is the close relationship between the score and physical force, which allows the use of powerful equivariant neural networks. However, the behaviour of the learnt score is not yet well understood. Here, we analyse the score by training an energy-based diffusion model for molecular generation. We find that during generation the score initially resembles a restorative potential and, at the end, a quantum-mechanical force, exhibiting special properties in between that enable the building of large molecules. Building upon these insights, we present Similarity-based Molecular Generation (SiMGen), a new zero-shot molecular generation method. SiMGen combines a time-dependent similarity kernel with local many-body descriptors to generate molecules without any further training. Our approach allows shape control via point cloud priors. Importantly, it can also act as guidance for existing trained models, enabling fragment-biased generation. We also release an interactive web tool, ZnDraw, for online SiMGen generation (https://zndraw.icp.uni-stuttgart.de).
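
To make the idea of a time-dependent similarity kernel acting as a force more concrete, the following is a toy sketch only, not the SiMGen implementation. It assumes a Gaussian kernel on raw 2D coordinates in place of the local many-body descriptors mentioned in the abstract, and a geometric schedule for the kernel width. The names kernel_force and generate, and all parameter values, are hypothetical and chosen for illustration.

```python
import numpy as np


def kernel_force(x, refs, sigma):
    """Gradient of log sum_k exp(-|x - r_k|^2 / (2 sigma^2)) with respect to x.

    x: (n, d) current points; refs: (m, d) reference points; sigma: kernel width.
    Toy assumption: similarity is measured on raw coordinates, not descriptors.
    """
    diff = refs[None, :, :] - x[:, None, :]        # (n, m, d) vectors r_k - x_i
    sq_dist = np.sum(diff**2, axis=-1)             # (n, m)
    log_w = -sq_dist / (2.0 * sigma**2)
    log_w -= log_w.max(axis=1, keepdims=True)      # stabilise the softmax
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)              # normalised kernel weights
    return np.einsum("nm,nmd->nd", w, diff) / sigma**2


def generate(refs, n_points=32, n_steps=200, sigma_start=2.0, sigma_end=0.05, seed=0):
    """Move random points along the kernel force while the width shrinks."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=sigma_start, size=(n_points, refs.shape[1]))
    for sigma in np.geomspace(sigma_start, sigma_end, n_steps):
        # A step of 0.5 * sigma^2 moves each point halfway to its
        # kernel-weighted mean of the references.
        x = x + 0.5 * sigma**2 * kernel_force(x, refs, sigma)
    return x


if __name__ == "__main__":
    square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(generate(square)[:4])
```

In this sketch a wide kernel averages over all references, so the force pulls points towards the bulk of the reference set, while a narrow kernel is dominated by the nearest reference. The update is deterministic; a practical sampler would typically also inject noise at each step.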

Description

Acknowledgements: We thank Tamás K. Stenczel for initially proposing the idea of using similarity kernels for molecular generation, J Harry Moore for carrying out the docking of the Octa-acid binders, and Lars L. Schaaf for helpful discussions. R.E. and I.B. acknowledge support by the University of Cambridge Harding Distinguished Postgraduate Scholars Programme. S.W.N. acknowledges support from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions (Grant Agreement 945357) as part of the DESTINY PhD programme, as well as support from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement 957189 (BIG-MAP). C.H. and F.Z. acknowledge support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) in the framework of the priority programme SPP 2363, “Utilisation and Development of Machine Learning for Molecular Applications - Molecular Machine Learning” Project No. 497249646. C.H. and F.Z. acknowledge further funding through the DFG under Germany’s Excellence Strategy - EXC 2075 - 390740016 and the Stuttgart Centre for Simulation Science (SimTech). Access to CSD3 was obtained through a University of Cambridge EPSRC Core Equipment Award EP/X034712/1. We acknowledge funding from UKRI under the UK Car-Parrinello HEC Consortium grant, with number EP/X035891/1.


Funder: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020); doi: https://doi.org/10.13039/100010661


Funder: RCUK | Engineering and Physical Sciences Research Council (EPSRC); doi: https://doi.org/10.13039/501100000266

Journal Title

Nature Communications

Journal ISSN

2041-1723

Volume Title

16

Publisher

Springer Science and Business Media LLC

Rights and licensing

Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by/4.0/

Sponsorship

Engineering and Physical Sciences Research Council (EP/X035891/1)
European Commission Horizon 2020 (H2020) Research Infrastructures (RI) (957189)
EPSRC (EP/X034712/1)
University of Cambridge Harding Distinguished Postgraduate Scholars Programme
UKRI, UK Car-Parrinello HEC Consortium grant (EP/X035891/1)
European Union's Horizon 2020 research and innovation program, Marie Skłodowska-Curie Actions (Grant Agreement 945357), as part of the DESTINY PhD program
European Union's Horizon 2020 research and innovation program (Grant Agreement 957189, BIG-MAP)
Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), priority program SPP 2363, “Utilization and Development of Machine Learning for Molecular Applications - Molecular Machine Learning” (Project No. 497249646)
DFG under Germany's Excellence Strategy - EXC 2075 - 390740016 and the Stuttgart Center for Simulation Science (SimTech)

Relationships

Is supplemented by: