Imputation versus prediction: applications in machine learning for drug discovery

Irwin, Benedict WJ 
Mahmoud, Samar 
Whitehead, Thomas M 
Conduit, Gareth J 
Segall, Matthew D 

Imputation is a powerful statistical method that is distinct from the predictive modelling techniques more commonly used in drug discovery. Imputation uses sparse experimental data in an incomplete dataset to predict missing values by leveraging correlations between experimental assays. This contrasts with quantitative structure–activity relationship methods that use only descriptor–assay correlations. We summarize three recent imputation strategies – heterogeneous deep imputation, assay profile methods and matrix factorization – and compare these with quantitative structure–activity relationship methods, including deep learning, in drug discovery settings. We comment on the value added by imputation methods when used in an ongoing project and find that imputation produces stronger models, earlier in the project, over activity and absorption, distribution, metabolism and elimination end points.

33 Built Environment and Design, 3404 Medicinal and Biomolecular Chemistry, 34 Chemical Sciences, 3303 Design, 8.4 Research design and methodologies (health services), 8 Health and social care services research, Generic health relevance
Journal Title
Future Drug Discovery
Publisher
Future Science Ltd