
Using Heterogeneous Information Sources for Understanding and Predicting Biological Effects of Compounds


Type

Thesis

Authors

Trapotsi, Maria-Anna 

Abstract

Understanding a compound’s biological effects, such as its Mechanism of Action (MoA) and safety profile, is a challenging task in the drug discovery process. However, this understanding can facilitate drug discovery and provide an early warning of potential risks. It has been significantly advanced by progress in Machine Learning (ML) and bioinformatic approaches, and by the increasing deposition of high-throughput data in public databases. Different types of data can be used, and as the volume of these data increases, so too does their potential to deepen our understanding. Key questions therefore remain around which ML methodologies and which data types to use. This thesis aimed to answer two questions: which data and methods to use for understanding compounds’ MoA, and how to explore the safety profile of new drug modalities such as PROteolysis TArgeting Chimeras (PROTACs).

In the first chapter, “Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty”, a novel algorithm was evaluated and benchmarked. A limiting factor in target prediction for MoA understanding is the experimental variability in the bioactivity data used to train target prediction models. Applying this novel algorithm, a modification of the long-established Random Forest (RF), and comparing it with the classic RF revealed a benefit in predicting compounds whose measured activity lies close to the classification threshold.
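
To illustrate why experimental uncertainty matters most near the classification threshold, the following minimal sketch converts a measured activity value into a probability of being active instead of a hard 0/1 label. It is not taken from the thesis: the function name probability_active, the assumption of normally distributed measurement error, and the example threshold and standard deviation are all hypothetical choices made for illustration.

```python
import math


def probability_active(measured_value, threshold, sd):
    """Probability that the true activity is at or above the threshold.

    Assumes (for illustration only) that the true value is normally
    distributed around the measurement with standard deviation `sd`.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    # Standardised distance of the measurement from the threshold.
    z = (measured_value - threshold) / sd
    # Standard normal cumulative distribution function, via math.erf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical activity values on a log scale, threshold 5.0, sd 0.3.
    for value in (4.0, 4.9, 5.0, 5.1, 7.0):
        p = probability_active(value, threshold=5.0, sd=0.3)
        print(f"measured={value:.1f}  P(active)={p:.3f}")
```

Under these assumptions, measurements far from the threshold map to probabilities near 0 or 1, so a hard label loses little information. Measurements close to the threshold map to probabilities near 0.5, which is where a model that accepts uncertain labels has the most room to differ from a classic RF trained on binary labels.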

The next chapter, “Comparison of Structural Chemical and Cell Morphology Information for Multitask Bioactivity Predictions”, provided insights into which type of compound information is more useful for target prediction across 224 targets. The comparison was performed using cell morphology information (in the form of CellProfiler features) from a Cell Painting assay and chemical structure information in the form of Extended Connectivity Fingerprints. It revealed that some targets, such as β-catenin, were better predicted by cell morphology information, while others, such as proteins belonging to the G-protein-coupled receptor 1 family, were better predicted by chemical structure information.

The final chapter, “Mitochondrial Toxicity Prediction using Cell Painting Assay on a PROTACs dataset”, explored the profiling of a novel drug modality (PROTACs) with the Cell Painting assay and evaluated whether this profiling can be used to understand the safety of these novel compounds. Cell morphology features (in the form of CellProfiler features) successfully predicted mitochondrial toxicity in a PROTACs dataset. This work resulted in the first ML model to predict PROTACs’ mitochondrial toxicity using Cell Painting-based features and expanded our knowledge of PROTACs’ safety profile prediction.

In summary, the work described in this thesis has furthered the field of in-silico target deconvolution and PROTACs’ mitochondrial toxicity prediction. Firstly, the work showed that using the Probabilistic Random Forest is beneficial when there is a degree of experimental uncertainty in bioactivity data close to the classification threshold. Secondly, it highlighted targets for which compounds’ cell morphology information was beneficial for target prediction. Finally, it showed that PROTACs’ cell morphology information can be used for mitochondrial toxicity prediction.

Date

2021-12-24

Advisors

Bender, Andreas
Engkvist, Ola
Barrett, Ian

Keywords

mechanism of action, target prediction, cell painting, PROTAC, PROTACs

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

Biotechnology and Biological Sciences Research Council (1944644)
BBSRC and AstraZeneca