Protein-ligand binding affinity prediction exploiting sequence constituent homology.
Published version
Peer-reviewed
Repository URI
Repository DOI
Change log
Authors
Abstract
MOTIVATION: Molecular docking is a commonly used approach for estimating binding conformations and their resultant binding affinities. Machine learning has been successfully deployed to enhance such affinity estimations. Many methods of varying complexity have been developed making use of some or all the spatial and categorical information available in these structures. The evaluation of such methods has mainly been carried out using datasets from PDBbind. Particularly the Comparative Assessment of Scoring Functions (CASF) 2007, 2013, and 2016 datasets with dedicated test sets. This work demonstrates that only a small number of simple descriptors is necessary to efficiently estimate binding affinity for these complexes without the need to know the exact binding conformation of a ligand. RESULTS: The developed approach of using a small number of ligand and protein descriptors in conjunction with gradient boosting trees demonstrates high performance on the CASF datasets. This includes the commonly used benchmark CASF2016 where it appears to perform better than any other approach. This methodology is also useful for datasets where the spatial relationship between the ligand and protein is unknown as demonstrated using a large ChEMBL-derived dataset. AVAILABILITY AND IMPLEMENTATION: Code and data uploaded to https://github.com/abbiAR/PLBAffinity.
Description
Keywords
Journal Title
Conference Name
Journal ISSN
1367-4811
Volume Title
Publisher
Publisher DOI
Sponsorship
EPSRC (EP/W004801/1)