Improved In Silico Methods for Target Deconvolution in Phenotypic Screens

Mervin, Lewis

Repository URI

https://www.repository.cam.ac.uk/handle/1810/283004

Repository DOI

https://doi.org/10.17863/CAM.30369

Files

Thesis (27.91 MB)

Type

Thesis

Authors

Mervin, Lewis

https://orcid.org/0000-0002-7271-0824

Abstract

Target-based screening projects for bioactive (orphan) compounds have in many cases proven insufficiently predictive of in vivo efficacy, leading to attrition in clinical trials. Partly for this reason, phenotypic screening has undergone a renaissance in both academia and the pharmaceutical industry. One key shortcoming of this paradigm shift is that the modulated protein targets must be elucidated afterwards, which is often a costly and time-consuming procedure. In this work, we explored both improved methods and real-world case studies of how computational methods can aid target elucidation in phenotypic screens. One limitation of previous methods has been their limited ability to assess the applicability domain of the models, that is, whether the assumptions made by a model are fulfilled and for which input chemicals its predictions are reliable. Hence, a major focus of this work was to explore methods for calibrating machine learning algorithms using Platt scaling, isotonic regression scaling and Venn-Abers predictors, since the probabilities from well-calibrated classifiers can be interpreted as confidence levels and predictions can be specified at an acceptable error rate. Additionally, many current protocols only offer probabilities for affinity; another key area for development was therefore to extend the target prediction models with functional prediction (activation or inhibition). This extra level of annotation is important since the activation or inhibition of a target may positively or negatively impact the phenotypic response in a biological system. Furthermore, many existing methods do not utilize the wealth of bioactivity information held for orthologue species. We therefore also focused on an in-depth analysis of orthologue bioactivity data and its relevance and applicability towards expanding compound and target bioactivity space for predictive studies.
The realized protocol, trained on 13,918,879 compound-target pairs and comprising 1,651 targets, has been made available for public use on GitHub. The methodology was subsequently applied to aid the target deconvolution of AstraZeneca phenotypic readouts, in particular the rationalization of cytotoxicity and cytostaticity in the High-Throughput Screening (HTS) collection. Results from this work highlighted which targets are frequently linked to the cytotoxicity and cytostaticity of chemical structures, and provided insight into which compounds to select or remove from the collection for future screening projects. Overall, this project has furthered the field of in silico target deconvolution by improving the performance and applicability of current protocols and by rationalizing cytotoxicity, which has been shown to influence attrition in clinical trials.

Date

2017-10-06

Advisors

Bender, Andreas
Engkvist, Ola

Keywords

Cheminformatics, Mode of action, In silico, Protein Target Prediction, Orthologue, Chemical space, AstraZeneca, Chemistry Connect, Bioactivity data, Target deconvolution, Target prediction, MoA, ChEMBL, PubChem, Functional prediction, Sphere exclusion, Random Forest, Naive Bayes, SVM, Support Vector Machine, AD-AUC, Activation, Inhibition, Functional Effects, Mechanism-of-action, Mode-of-action, Mechanism of action, Phenotypic screens, High throughput screens, High content screens, PR-AUC, Applicability domain, Venn Abers, Platt scaling, Isotonic regression scaling, Python, Scikit-learn, RDKit

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights

Attribution 4.0 International (CC BY 4.0)

Sponsorship

BBSRC, AstraZeneca

Collections

Theses - Chemistry