Computational Analysis of Molecules and Chemical Reactions from Cheminformatic Databases


Abstract

A large number of organic reactions feature post-transition state bifurcations. Selectivities in such reactions are difficult to analyse because they cannot be determined by comparing the energies of competing transition states. Molecular dynamics approaches can provide answers but are computationally very expensive. In Chapter 1, a new ‘ValleyRidge’ algorithm has been developed which predicts the major products in bifurcating organic reactions at negligible computational cost. The method requires two transition states, two product geometries and no additional information. The algorithm correctly predicts the major product for about 90% of the organic reactions investigated. For the remaining 10% of the reactions, the algorithm returns a warning that the conclusion may be uncertain. The method also reproduces the experimental or molecular dynamics product ratios to within 15% error for more than 80% of the reactions. The method has been successfully applied to a trifurcating organic reaction, a carbocation rearrangement and the solvent-dependent Pummerer-like reactions, demonstrating the power of the algorithm to analyse complex reactions.

In Chapter 2, the ‘ValleyRidge’ algorithm has been extended to the ‘VRAI-Selectivity’ algorithm to model selectivities controlled by reaction dynamics rather than by the energies of competing transition states. Such reactions are difficult to analyse using transition state theory because the approach often does not capture the subtlety of the energy landscapes the reaction trajectories traverse, and so it cannot accurately predict the selectivities. The upgraded ‘VRAI-Selectivity’ algorithm can predict the major product and the selectivity for a wide range of potential energy surfaces where product distributions are influenced by reaction dynamics. The method requires the transition states, the intermediate (if present) and the product geometries from the reaction profile as the calculation input. The algorithm is quick and simple to run and, except for two reactions with long alkyl chains, calculates selectivity more accurately than transition state theory alone.
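For context, the transition state theory baseline that the abstract compares against can be sketched as a Boltzmann weighting of competing free-energy barriers. This is the standard textbook estimate, not the ‘ValleyRidge’ or ‘VRAI-Selectivity’ method, whose details the abstract does not give. The function name and the example barriers below are hypothetical.

```python
import math

# Gas constant in kcal mol^-1 K^-1
R_KCAL = 1.987204e-3


def tst_product_fractions(barriers_kcal, temperature_k=298.15):
    """Convert competing free-energy barriers (kcal/mol) into product fractions.

    Each pathway is weighted by exp(-dG / RT), measured relative to the
    lowest barrier so that the exponentials stay well conditioned.
    """
    lowest = min(barriers_kcal)
    rt = R_KCAL * temperature_k
    weights = [math.exp(-(g - lowest) / rt) for g in barriers_kcal]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: two transition states 1 kcal/mol apart
fractions = tst_product_fractions([20.0, 21.0])
print([round(f, 2) for f in fractions])
```

With a 1 kcal/mol difference at room temperature this gives roughly an 84:16 ratio. When both products are reached through a single transition state, as in a bifurcating reaction, there is no barrier difference to feed into this estimate, which is the gap the dynamics-aware algorithms are meant to fill.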

The use of machine learning techniques in computational chemistry has gained momentum since large molecular databases are now readily available. Predicting molecular properties with trained machine learning models is computationally less expensive than traditional quantum mechanical calculations, in many cases without loss of accuracy. In Chapter 3, a new explainable molecular representation based on bonds, angles and dihedrals has been developed. The machine learning models trained with this representation can accurately predict the electronic energies and the free energies of small organic molecules with atom types C, H, N and O, with a mean absolute error of 1.2 kcal mol⁻¹. The models are robust to extrapolation to larger organic molecules, with an average error of less than 3.7 kcal mol⁻¹ for 10 or fewer heavy atoms, which represent a chemical space two orders of magnitude larger. Rapid energy predictions for multiple molecules, up to 7 times faster than previous ML models of similar accuracy, have been achieved by sampling geometries around the potential energy surface minima. The structures around the minima have been sampled by randomly distorting the structures in the dataset. Therefore, the input geometries do not have to be fully optimised; accurate density functional theory electronic energy predictions can be made from force-field optimised geometries with a mean absolute error of 2.5 kcal mol⁻¹.
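As an illustration of the ingredients named above, the sketch below computes a bond length, a bond angle and a dihedral from Cartesian coordinates and applies a random distortion to a geometry. It is not the thesis's representation or sampling scheme: the function names, the Gaussian noise model, the 0.05 Å scale and the four-atom test geometry are all assumptions made for this example.

```python
import numpy as np


def bond_length(xyz, i, j):
    """Distance between atoms i and j."""
    return float(np.linalg.norm(xyz[i] - xyz[j]))


def bond_angle(xyz, i, j, k):
    """Angle i-j-k in degrees, with j as the central atom."""
    a = xyz[i] - xyz[j]
    b = xyz[k] - xyz[j]
    cos_t = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip guards against rounding slightly outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))


def dihedral(xyz, i, j, k, l):
    """Signed dihedral i-j-k-l in degrees, in the range (-180, 180]."""
    b0 = xyz[i] - xyz[j]
    b1 = xyz[k] - xyz[j]
    b2 = xyz[l] - xyz[k]
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the j-k axis
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def distort(xyz, scale=0.05, rng=None):
    """Add Gaussian noise to every coordinate (assumed noise model and scale)."""
    rng = np.random.default_rng() if rng is None else rng
    return xyz + rng.normal(0.0, scale, size=xyz.shape)


# Hypothetical four-atom chain with unit bond lengths (arbitrary units)
xyz = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
])

features = [bond_length(xyz, 0, 1), bond_angle(xyz, 0, 1, 2), dihedral(xyz, 0, 1, 2, 3)]
distorted = distort(xyz, rng=np.random.default_rng(0))
```

Internal coordinates of this kind do not change when the molecule is translated or rotated, which is one reason they are a natural basis for an interpretable energy model.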

Chapter 4 combines the selectivity analysis concepts from Chapters 1 and 2 and the database-based approaches from Chapter 3. At present, no cheminformatic study has investigated how frequently nonstatistical dynamics-driven selectivity might be observed. Chapter 4 investigates a Diels-Alder reaction that has many potential synthetic applications but shows selectivity controlled by nonstatistical dynamics. A new workflow has been developed that can automate the transition state optimisations and the reaction profile calculations. Many new variant Diels-Alder energy profiles have been generated through this workflow. Out of the 260 full reaction profiles calculated, 173 reactions could potentially have their selectivity governed by nonstatistical dynamics. Automating the transition state optimisations and the selectivity predictions leads to a much wider exploration of chemical reaction space. Chapter 4 illustrates how the cheminformatics and the molecular modelling approaches can be combined to investigate the origins of the selectivities in key organic reactions.

Date

2022-06-30

Advisors

Goodman, Jonathan

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Sponsorship
Trinity Henry-Barlow Scholarship