
Accelerating process development of complex chemical reactions





Amar, Yehia 


Process development of new complex reactions in the pharmaceutical and fine chemicals industries is challenging and expensive. The field is beginning to bridge fundamental first-principles investigations and data-driven statistical methods such as machine learning. Nonetheless, process development and optimisation in these industries are still mostly driven by trial and error and by experience. Approaches that move beyond these are limited to the well-developed optimisation of continuous variables, and often do not yield physical insights. This thesis describes several new methods developed to address research questions related to this challenge.

First, we investigated whether utilising physical knowledge could aid statistics-guided self-optimisation of a C-H activation reaction, in which the optimisation variables were continuous. We then considered algorithmic treatment of the more challenging discrete variables, focussing on solvents. We parametrised a library of 459 solvents with physically meaningful molecular descriptors. Our case study was a homogeneous Rh-catalysed asymmetric hydrogenation to produce a chiral γ-lactam, with conversion and diastereoselectivity as objectives. We adapted a state-of-the-art multi-objective machine learning algorithm, based on Gaussian processes, to utilise the descriptors as inputs, and to create a surrogate model for each objective. The aim of the algorithm was to determine a set of Pareto solutions with a minimum experimental budget, whilst simultaneously addressing model uncertainty. We found that descriptors are a valuable tool for Design of Experiments, and can produce predictive and interpretable surrogate models.
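The notion of a set of Pareto solutions for two objectives can be illustrated with a minimal sketch. It assumes both objectives (here labelled conversion and diastereoselectivity) are to be maximised; the function name `pareto_front` and the numerical values are hypothetical, invented purely for illustration, and are not results or code from the thesis.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming every objective is maximised."""
    front = []
    for i, p in enumerate(points):
        # p is dominated if some other point q is at least as good in every
        # objective and strictly better in at least one.
        dominated = any(
            all(qk >= pk for qk, pk in zip(q, p))
            and any(qk > pk for qk, pk in zip(q, p))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (conversion, diastereoselectivity) pairs, NOT data from the thesis.
results = [(0.90, 0.60), (0.70, 0.85), (0.60, 0.50), (0.95, 0.40), (0.70, 0.80)]
print(pareto_front(results))
# → [(0.9, 0.6), (0.7, 0.85), (0.95, 0.4)]
```

In a surrogate-assisted workflow of the kind described above, a filter like this would be applied to measured or predicted objective values; the choice of which experiment to run next, and the treatment of model uncertainty, are outside the scope of this sketch.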

Subsequently, a physical investigation of this reaction led to the discovery of an efficient catalyst-ligand system, which we studied by operando NMR and for which we identified a parametrised kinetic model. Turning then to ligands for asymmetric hydrogenation, we calculated versatile empirical descriptors, based on the similarity of atomic environments, for 102 chiral ligands in order to predict diastereoselectivity. Whilst the model fit was good, it failed to accurately predict the performance of an unseen ligand family, due to analogue bias. Physical knowledge then guided the selection of symmetrised physico-chemical descriptors. This produced more accurate predictive models for diastereoselectivity, including for an unseen ligand family. The contribution of this thesis is the development of novel and effective workflows and methodologies for process development. These open the door for process chemists to save time and resources, freeing them from routine work to focus instead on creatively designing new chemistry for future real-world applications.
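Testing a model on an unseen ligand family amounts to a grouped hold-out: every ligand from one family is withheld from training and used only for evaluation. The following minimal sketch shows such a split. The function name `leave_one_family_out` and the family labels are hypothetical and serve only as an illustration; the abstract does not specify the validation protocol actually used.

```python
from collections import defaultdict


def leave_one_family_out(families):
    """Yield (held_out_family, train_indices, test_indices) for each family.

    `families` holds one family label per ligand, in dataset order.
    """
    by_family = defaultdict(list)
    for idx, fam in enumerate(families):
        by_family[fam].append(idx)
    for fam, test_idx in by_family.items():
        # Train on every ligand that does not belong to the held-out family.
        train_idx = [i for i, f in enumerate(families) if f != fam]
        yield fam, train_idx, test_idx


# Hypothetical family labels for five ligands, NOT taken from the thesis.
families = ["A", "A", "B", "C", "B"]
splits = list(leave_one_family_out(families))
for fam, train_idx, test_idx in splits:
    print(fam, train_idx, test_idx)
```

A model that fits well under random splits but performs poorly under this kind of split is showing the behaviour the abstract attributes to analogue bias.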





Lapkin, Alexei


molecular descriptors, design of experiments, asymmetric hydrogenation, machine learning, process development


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge