
Drug discovery for misfolding diseases using structure-based iterative learning






Computational methods such as machine learning hold the promise of reducing the costs and failure rates of conventional drug discovery pipelines. This need is pressing for neurodegenerative diseases, where the development of disease-modifying drugs has been particularly challenging. Attrition is especially severe in Parkinson’s disease, for which no disease-modifying drugs have yet been approved. Numerous clinical trials targeting α-synuclein aggregation, a process implicated in Parkinson’s disease and other synucleinopathies, have failed, at least in part because of the difficulty of identifying potent compounds in preclinical investigations. To address this problem, in Chapter 2 I describe machine learning approaches for identifying small molecule inhibitors of α-synuclein aggregation. Because the proliferation of α-synuclein aggregates takes place through autocatalytic secondary nucleation from fibril surfaces, I aim to identify compounds that bind the catalytic sites on the surface of the mature fibrillar aggregates (the end point polymers of the aggregation process). Such binding prevents the formation of the toxic intermediate aggregate species, termed misfolded oligomers. Fibrils assume different structural polymorphs depending on the synucleinopathy, likely because these diseases arise in different regions of the nervous system. Each tissue presents a specific set of conditions that likely shape the final structure of the aggregates. Targeting these pathogenic polymorphs may help ameliorate disease progression more effectively than prior efforts. To achieve this goal, I use structure-based machine learning in an iterative manner, first to identify and then to progressively optimise secondary nucleation inhibitors. Training data for aggregation inhibition were obtained from an assay that specifically isolates secondary nucleation, the major mechanism of toxic oligomer production. My results demonstrate that this approach leads to the facile identification of compounds that are two orders of magnitude more potent than previously reported ones.

This initial work formed the basis of subsequent efforts both to expand the chemical space explored and to explore it more effectively, through the application of generative modelling linked with reinforcement learning. In Chapter 3, I also increased the number of molecular parameters considered during inhibitor optimisation, accounting for aspects of pharmacokinetics as well as potency. This work addressed a number of shortcomings of the initial approach, including its restricted chemical space and its focus on potency alone. The initial method was reminiscent of the early stages of drug development, where large compound libraries are typically screened to identify compounds of promising potency against the chosen targets. Often, however, these compounds have a poor drug metabolism and pharmacokinetics (DMPK) profile, a liability that may be difficult to eliminate. To address this, the updated machine learning approach combines generative modelling and reinforcement learning to identify small molecules that perturb the kinetics of aggregation, thus reducing the production of oligomeric species, while also having high predicted blood-brain barrier penetrance. This approach resulted in the identification of small molecules with good pharmacokinetic properties and potency against secondary nucleation.

Misfolded protein oligomers generated via secondary nucleation are of central importance in both the diagnosis and treatment of Alzheimer’s and Parkinson’s diseases. All the methods described here are designed to counter their formation, yet accurate high-throughput methods to detect and quantify oligomer populations are still needed. Invariably, bulk aggregation is the metric that is tracked, and the oligomer population is then inferred from it. In Chapter 4, I present a novel single-molecule approach to the detection and quantification of oligomeric species. The approach uses solid-state nanopores and multiplexed DNA barcoding to identify and characterise oligomers from multiple samples. I study α-synuclein oligomers in the presence of several small molecule inhibitors of α-synuclein aggregation, to illustrate the potential of this method to assist the development of diagnostic and therapeutic methods for Parkinson’s disease.

Finally, having created these pipelines for the development of α-synuclein aggregation inhibitors, I sought to demonstrate their generalisability by applying them to another protein misfolding system, as described in Chapter 5. The aggregation of tau into amyloid fibrils is associated with Alzheimer’s disease and related tauopathies. As with synucleinopathies, different tauopathies are characterised by the formation of distinct tau fibril polymorphs. Brain homogenates were used to seed the generation of tau fibrils, with the aim of creating fibrils that replicate the polymorph formed in Alzheimer’s disease and thus mirror the pathological aggregation mechanisms as closely as possible. Fibrils recovered from these efforts were capable of converting recombinant 0N3R tau into an Alzheimer’s fibril polymorph in a kinetic assay, as verified through cryo-EM structural analysis. Using this kinetic assay, I illustrate the application of the iterative machine learning drug discovery method to tau aggregation in Alzheimer’s disease.





Vendruscolo, Michele


Aggregation, Aggregation inhibition, Alzheimer's Disease, Amyloid, Diagnostics, Machine learning, Nanopore, Neurodegeneration, Parkinson's Disease, PET tracers, Protein oligomers, Single molecule, Small molecules, Therapeutics


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
Horizon Europe UKRI Underwrite Innovate (10059436)
Horizon Europe UKRI Underwrite Innovate (10061100)