Computational approaches to predicting drug induced toxicity

Marchese Robinson, Richard Liam

Repository URI

http://www.dspace.cam.ac.uk/handle/1810/244242
https://www.repository.cam.ac.uk/handle/1810/244242

Repository DOI

https://doi.org/10.17863/CAM.16292

Files

Thesis_v140.pdf (6.38 MB)

Final_PhD_Thesis_SI_RLMarcheseRobinson.zip (92.38 MB)

README.txt (417 B)

Errors and Clarifications (489.04 KB)

Type

Thesis

Authors

Marchese Robinson, Richard Liam

Abstract

Novel approaches and models for predicting drug-induced toxicity in silico are presented. Typically, these were based on Quantitative Structure-Activity Relationships (QSAR). The following endpoints were modelled: mutagenicity, carcinogenicity, inhibition of the hERG ion channel and the associated arrhythmia, Torsades de Pointes. A consensus model was developed based on Derek for Windows™ and Toxtree, and was used to filter compounds as part of a collaborative effort that identified potential starting points for anti-tuberculosis drugs.
Based on the careful selection of data from the literature, binary classifiers were generated for the identification of potent hERG inhibitors. These were found to perform competitively with, or better than, those computational approaches previously presented in the literature.
Some of these models were generated using Winnow, in conjunction with a novel proposal for encoding molecular structures as required by this algorithm. The Winnow models were found to perform comparably to models generated using the Support Vector Machine and Random Forest algorithms. These studies also emphasised the variability in results which may be obtained when applying the same approaches to different train/test combinations.
Novel approaches to combining chemical information with Ultrafast Shape Recognition (USR) descriptors are introduced: Atom Type USR (ATUSR) and a combination of a proposed Atom Type Fingerprint (ATFP) and USR (USR-ATFP). These were applied to the task of predicting protein-ligand interactions, including the prediction of hERG inhibition. Whilst, for some of the datasets considered, either ATUSR or USR-ATFP was found to perform marginally better than all other descriptor sets to which they were compared, most differences were not statistically significant. Further work is warranted to determine the advantages which ATUSR and USR-ATFP might offer with respect to established descriptor sets.
The first attempts to construct QSAR models for Torsades de Pointes using predicted cardiac ion channel inhibitory potencies as descriptors are presented, along with the first evaluation of experimentally determined inhibitory potencies as an alternative to, or complement to, standard descriptors. No evidence was found that 'predicted' 'IC-descriptors' improve performance, and no clear evidence was found that 'experimental' 'IC-descriptors' do so. However, their value may lie in the greater interpretability they could confer upon the models.
Building upon the work presented in the preceding chapters, this thesis ends with specific proposals for future research directions.

Description

Errors and issues requiring clarification are included as a separate document (January 2016).

Keywords

Cheminformatics, Toxicology, Machine learning, QSAR

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights

Attribution 2.0 UK: England & Wales

Collections

Theses - Chemistry