Deep imputation on large-scale drug discovery data
Abstract
More accurate predictions of the biological properties of chemical compounds would guide the selection and design of new compounds in drug discovery and help to address the enormous cost and low success rate of pharmaceutical R&D. However, this domain presents a significant challenge for AI methods due to the sparsity of compound data and the noise inherent in results from biological experiments. In this paper, we demonstrate how data imputation using deep learning provides substantial improvements over quantitative structure-activity relationship (QSAR) machine learning models that are widely applied in drug discovery. We present the largest-to-date successful application of deep-learning imputation to datasets that are comparable in size to the corporate data repository of a pharmaceutical company (678,994 compounds by 1166 endpoints). We demonstrate this improvement for three areas of practical application linked to distinct use cases: (a) target activity data compiled from a range of drug discovery projects, (b) a high-value and heterogeneous dataset covering complex absorption, distribution, metabolism, and elimination properties, and (c) high-throughput screening data, testing the algorithm's limits on early-stage, noisy, and very sparse data. Achieving median coefficients of determination, R², of 0.69, 0.36, and 0.43, respectively, across these applications, the deep learning imputation method offers an unambiguous improvement over random forest QSAR methods, which achieve median R² values of 0.28, 0.19, and 0.23, respectively. We also demonstrate that robust estimates of the uncertainties in the predicted values correlate strongly with the accuracies in prediction, enabling greater confidence in decision-making based on the imputed values.
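For readers unfamiliar with the headline metric, the following is a minimal sketch of how a median coefficient of determination might be computed on a sparse compound-by-endpoint matrix. It assumes that R² is evaluated separately for each endpoint (column), using only the entries that have a measurement, and that the median is then taken over endpoints; the abstract does not spell out this procedure. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination, using only entries with a measurement.

    Missing measurements are assumed to be encoded as NaN (an assumption
    of this sketch, not a convention stated in the paper).
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = ~np.isnan(observed)
    y, y_hat = observed[mask], predicted[mask]
    if y.size < 2:
        return float("nan")  # too few measurements to define R^2
    ss_tot = np.sum((y - y.mean()) ** 2)
    if ss_tot == 0.0:
        return float("nan")  # constant endpoint: R^2 is undefined
    ss_res = np.sum((y - y_hat) ** 2)
    return float(1.0 - ss_res / ss_tot)

def median_r_squared(observed, predicted):
    """Median of per-endpoint (per-column) R^2, skipping undefined columns."""
    scores = [
        r_squared(observed[:, j], predicted[:, j])
        for j in range(observed.shape[1])
    ]
    return float(np.nanmedian(scores))

# Toy matrix: 4 compounds (rows) by 2 endpoints (columns), NaN = not measured.
observed = np.array([
    [1.0, np.nan],
    [2.0, 4.0],
    [3.0, 6.0],
    [np.nan, 8.0],
])
predicted = np.array([
    [1.0, 0.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [0.0, 6.0],
])

print(median_r_squared(observed, predicted))
```

In this toy case the first endpoint is predicted exactly (R² = 1) and the second has R² = 0.5, so the median is 0.75. Predictions for unmeasured entries do not affect the score, because there is nothing to compare them against.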
Journal ISSN
2689-5595
Sponsorship
Royal Society (URF\R\201002)