On The Dissemination Of Novel Chemistry And The Process Of Optimising Compounds In Drug Discovery Projects

Ashenden, Stephanie

Repository URI

https://www.repository.cam.ac.uk/handle/1810/291051

Repository DOI

https://doi.org/10.17863/CAM.38232

Files

Thesis (10.26 MB)

Type

Thesis

Authors

Ashenden, Stephanie

Abstract

Optimising the drug discovery process remains one of the largest challenges in medicine. Learning from previous compound-target associations, as well as from the process of optimising compounds, will allow for a more targeted and knowledge-based approach. The aim of the first research chapter of this thesis is to understand where novel chemistry is first published. It is well established that the number of publications of novel small molecule modulators, and their associated targets, has increased over the years. This work examines publishing trends over the years, focusing on the comparison between patents and scientific literature as accessed via the ChEMBL and GOSTAR databases. More precisely, the patents and scientific literature associated with bioactive molecules and their target annotations have been compared to identify where novelty (in the sense of the first modulator of a protein target) originated. Comparing the publication dates of the first small molecule modulator published in literature and in patents for a target (with the modulators having either identical or different structures) shows that modulators are usually published in both scientific literature and patents (45%), or in scientific literature alone (51%), but rarely in patents only. When first modulators are published in both sources, they are disseminated in the literature first 65% of the time. Finally, when analysing just the novel small molecule modulators, regardless of the protein targets they have been published with, those structures representing novel chemistry tend to be published in patents first (61% of the time). It is concluded that novel chemistry, when associated with a target, is primarily published in the literature; therefore, when exploring known chemistry for a specific known target, that chemistry should be identified from the literature.
Following this, it is important to understand how chemists optimise compounds. To this end we use matched molecular pairs (MMPs), which allow us to compare two compounds that differ by only one chemical transformation with respect to the properties that are important for a compound to succeed as a drug. In this part of the thesis, we statistically analyse the most frequently observed MMPs within drug discovery projects, using compound registration dates to determine the order in which compounds were made within projects, and aggregate the findings over all internal projects at AstraZeneca. For those MMPs that are commonly observed in projects, we compare their frequency to that of the reverse structural change, to determine whether there are preferences in the chemical changes made in projects over time. Furthermore, we analyse the neighbouring environments of the position at which the molecule has changed. In total, 957 unique MMPs were found to occur at least 100 times across projects, comprising 81 unique molecular fragments as starting points and 197 unique molecular fragments as end points of MMPs. The most frequently occurring MMPs, as well as the most frequently occurring atomic environments, differ between aliphatic and aromatic systems. Overall, this study provides a data-driven method to analyse the order in which molecular fragments are incorporated into molecules in drug discovery projects. This knowledge can be used to help guide decisions in future compound design. Finally, relating these MMP findings to the measured assay results gives an overview of how the compounds themselves evolve throughout a project. MMPs are used when designing new compounds to exploit existing knowledge of the effect of a molecular transformation on compound properties (such as binding, solubility and logD) and to apply it to new compounds with the expectation of seeing the same outcome.
The effects on physicochemical properties, as measured in assays, of transformations on specific atomic environments since the year 2000 have been analysed via a time course analysis. This allows us to observe the effect of the transformations over time. In total, 453 unique transformations were analysed. The analysis highlights that changes can be observed even when simply comparing aromatic and aliphatic systems at a higher level, and shows that consideration of the atomic environment is essential when designing a compound. These results can be used to identify the structural change that would improve the profile of a compound going through the design process, saving time, resources and money. Additionally, specific examples have been extracted for discussion, notably those considered extreme outliers, which generally refer to transformations involving a very large change in a compound property (±4 standard deviations). These extreme outliers highlight the need to always consider outliers in the analysis, as they may be of importance, although retaining them within a study may obscure additional results. Therefore, it is suggested to acknowledge these outliers but not to include them in the main study. Furthermore, case studies are given that show unexpected changes in property values when logD increases, such as solubility also increasing; these are shown to result from the chemistry surrounding the atomic environment.

Date

2018-09-28

Advisors

Bender, Andreas
Engkvist, Ola
Kogej, Thierry

Keywords

cheminformatics, data science, drug discovery, Matched Molecular Pairs

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights

Sponsorship

BBSRC and AstraZeneca

Collections

Theses - Chemistry