MISATO: machine learning dataset of protein-ligand complexes for structure-based drug discovery.

Published version
Peer-reviewed

Authors

Benassou, Sabrina 
Merdivan, Erinc 

Abstract

Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.

Description

Acknowledgements: This work received funding from BMWi ZIM KK 5197901TS0 (T.S., F.M., G.M.P.) and BMBF, SUPREME, 031L0268 (T.S., F.M., G.M.P.). This work was supported by the Helmholtz Association’s Initiative and Networking Fund on the HAICORE@FZJ partition. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.


Funder: BMWi ZIM, KK 5197901TS0

Keywords

Machine Learning, Ligands, Drug Discovery, Proteins, Molecular Dynamics Simulation, Quantum Theory

Journal Title

Nat Comput Sci

Journal ISSN

2662-8457

Volume Title

4

Publisher

Springer Science and Business Media LLC

Sponsorship

Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research) (SUPREME, 031L0268)