
Upgrading the Protein Formulation Toolbox through Predictive and Explanatory Models





Warner, Nina 


Native proteins are metastable molecules; slight perturbations in environmental conditions (temperature, moisture, pH, etc.) can quickly, and often irreversibly, degrade these biological structures. This instability has direct implications for a myriad of diseases, biologic processing costs and development times, waste production, and even humanitarian issues such as the global availability of life-saving medicines. Engineering more stable protein variants is thus one of the most active, exciting areas of modern biological research, accelerated by computational approaches. In contrast, protein formulation, the task of developing stabilising excipient matrices, has received comparatively little attention and remains dominated by heuristics that compromise both the efficiency and efficacy of this approach.

This thesis details the development of a series of predictive and explanatory models designed to advance both the practical tools and the knowledge available to the modern protein formulation chemist. Models are developed via a combined experimental-computational approach. Experimentally, protein degradation is studied via a wide range of techniques, including fluorescence (intrinsic/extrinsic), SDS-PAGE, CD, DSC, FTIR, DLS, and activity assays. Computationally, response surface methodology, conventional machine learning (ML), few-shot learning, molecular dynamics, and graph theory are applied to protein formulation problems in both the solid and solution states. A broad overview of models developed in this thesis is presented in the figure below. [Figure 1] For the first time, ML is used to predict the excipient-dependent binary thermal stability of a model protein (phytase) in freeze-dried formulations without experimental input; a generalisable formulation encoding method and formulation-specific engineered features are introduced to aid future application of ML to formulation tasks. Furthermore, these results are generalised to new proteins and phases through the use of few-shot learning. From an explanatory standpoint, osmolyte-induced protein stability changes are re-contextualised through graph theory. A protein graph descriptor is identified as a highly sensitive metric for observing osmolyte-induced protein stability changes at room temperature, outperforming traditional metrics such as mean RMSF. Graph analysis of simulated osmolyte-protein





Scherman, Oren


protein, formulation, excipient, protein stability, biologic, biologic formulation, protein formulation, osmolyte


Awarding Institution

University of Cambridge
AB Agri