Zero shot molecular generation via similarity kernels
Abstract
Generative modelling aims to accelerate the discovery of novel chemicals by directly proposing structures with desirable properties. Recently, score-based, or diffusion, generative models have significantly outperformed previous approaches. Key to their success is the close relationship between the score and physical force, allowing the use of powerful equivariant neural networks. However, the behaviour of the learnt score is not yet well understood. Here, we analyse the score by training an energy-based diffusion model for molecular generation. We find that during the generation the score resembles a restorative potential initially and a quantum-mechanical force at the end, exhibiting special properties in between that enable the building of large molecules. Building upon these insights, we present Similarity-based Molecular Generation (SiMGen), a new zero-shot molecular generation method. SiMGen combines a time-dependent similarity kernel with local many-body descriptors to generate molecules without any further training. Our approach allows shape control via point cloud priors. Importantly, it can also act as guidance for existing trained models, enabling fragment-biased generation. We also release an interactive web tool, ZnDraw, for online SiMGen generation (https://zndraw.icp.uni-stuttgart.de).
Description
Acknowledgements: We thank Tamás K. Stenczel for initially proposing the idea of using similarity kernels for molecular generation, J Harry Moore for carrying out the docking of the Octa-acid binders, and Lars L. Schaaf for helpful discussions. R.E. and I.B. acknowledge support by the University of Cambridge Harding Distinguished Postgraduate Scholars Programme. S.W.N. acknowledges support from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions (Grant Agreement 945357) as part of the DESTINY PhD programme, as well as support from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement 957189 (BIG-MAP). C.H. and F.Z. acknowledge support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) in the framework of the priority programme SPP 2363, “Utilisation and Development of Machine Learning for Molecular Applications - Molecular Machine Learning” Project No. 497249646. C.H. and F.Z. acknowledge further funding through the DFG under Germany’s Excellence Strategy - EXC 2075 - 390740016 and the Stuttgart Centre for Simulation Science (SimTech). Access to CSD3 was obtained through a University of Cambridge EPSRC Core Equipment Award EP/X034712/1. We acknowledge funding from UKRI under the UK Car-Parrinello HEC Consortium grant, with number EP/X035891/1.
Funder: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020); doi: https://doi.org/10.13039/100010661
Funder: RCUK | Engineering and Physical Sciences Research Council (EPSRC); doi: https://doi.org/10.13039/501100000266
Journal ISSN
2041-1723
Sponsorship
European Commission Horizon 2020 (H2020) Research Infrastructures (RI) (957189)
EPSRC (EP/X034712/1)