Reactions with many steps can be represented by a single XML-based table of the atoms, bonds and electrons. For each step, the complete Chemical Markup Language (CML)1 representation of all components is given. These snapshots can then be combined to give an animated description of the complete reaction, both as "2D" chemical structure diagrams and in three dimensions. Here we demonstrate the method with enzymatic reactions.
The problem of how to represent chemical reactions was first addressed by Ingold2 in the 1930s. Ingold proposed the now familiar "curly arrow" representation of electron movement during a reaction, and he also introduced the mechanistic nomenclature known to every organic chemist. Whilst the systematic nomenclature of organic compounds has been addressed in great detail, most notably by IUPAC, chemical reactions have never been subjected to the same rigorous treatment.
Ingold's mechanistic representation has remained largely unchanged over the past seven decades and was designed, not surprisingly, for reactions on paper. In the computer age, we felt there was scope for an animated method of describing the movement of electrons, bonds and atoms during a reaction, and with this in mind we created CMLSnap.
Our MACiE Database3 captures detailed mechanisms of enzymatic reactions from the published literature. These can have any number of steps, often involving multiple chemical species. The steps normally connect local minima along the reaction path (transition states are not currently included). Steps are conventionally represented as "paper-based" 2D diagrammatic reaction mechanisms with "curly arrows". Such diagrams (Figure 1) are not "machine-understandable" and are occasionally ambiguous or incorrect. It is not always easy to see which functional groups (including amino acid residues) are modified during the steps and which are "spectators".
In a sequential reaction, the products of one step normally include some of the reactants of the next. We present a representation in which all reaction components and processes are unambiguously defined and the fate and involvement of every species is clear. The method is extensible and can also support 3D models and thermodynamic and kinetic quantities; here we concentrate on the "2D reaction diagram".
During a single step some or all of the following changes may occur in the chemical representation:
A sequential multi-step reaction has an initial representation (snap1) and snapshots after each step:
snap1 [step1] snap2 [step2] snap3 ... snapN
Each snapshot describes the complete configuration of all components (atomic positions, bond orders, formal charges and formal electrons) at one stage in the reaction. The snaps normally correspond to local minima on the reaction surface, and each is defined by a collection of "molecules". These molecules can come from computation, from interpolation of 3D structures (e.g. in protein-ligand complexes) or from schematic "2D" diagrams created with chemical editing tools.
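The snapshot sequence above can be modelled as an ordered list in which each step is simply the pair of consecutive snapshots it connects. The following Python sketch is illustrative only: the class and field names (`Snapshot`, `charges`, `coords`, `bonds`) are hypothetical and are not taken from the CMLSnap or JUMBO code.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Complete configuration of all components at one stage of a reaction (hypothetical model)."""

    name: str
    charges: dict = field(default_factory=dict)  # atom id -> formal charge
    coords: dict = field(default_factory=dict)   # atom id -> (x, y) position in the 2D diagram
    bonds: dict = field(default_factory=dict)    # (atom id, atom id) -> bond order; 0 = no bond


def steps(snaps):
    """Pair consecutive snapshots: step i turns snaps[i] into snaps[i + 1]."""
    return [(before.name, after.name) for before, after in zip(snaps, snaps[1:])]


reaction = [Snapshot("snap1"), Snapshot("snap2"), Snapshot("snap3")]
print(steps(reaction))  # [('snap1', 'snap2'), ('snap2', 'snap3')]
```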
Each step can be thought of as a row in a table, with information for each component. As CML can hold any number of such components (2D coordinates, 3D coordinates, spin state, lone pairs, wedge/hatch stereochemistry, physical properties, etc.), many features of reaction mechanisms can now be conveniently tabulated. This tabulation can be used for systematic analysis of reaction mechanisms or, as we demonstrate here, for animation.
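Because every snapshot carries the same set of components, a reaction can be flattened into a table with one row per snapshot and one column per property. A minimal sketch, assuming snapshots are plain dictionaries of formal charges and bond orders (a hypothetical layout, not the CML schema), writes such a table as CSV. The `SN1` data are illustrative: `snap2` follows the table given later in the text, while `snap1` and `snap3` are the usual start and end states of an SN1 substitution.

```python
import csv
import io

# Illustrative SN1 snapshots: snap1 is R-Br plus Cl-, snap2 the ion triple, snap3 R-Cl plus Br-.
SN1 = [
    {"name": "snap1", "charges": {"Br": 0, "R": 0, "Cl": -1}, "bonds": {("Br", "R"): 1, ("R", "Cl"): 0}},
    {"name": "snap2", "charges": {"Br": -1, "R": 1, "Cl": -1}, "bonds": {("Br", "R"): 0, ("R", "Cl"): 0}},
    {"name": "snap3", "charges": {"Br": -1, "R": 0, "Cl": 0}, "bonds": {("Br", "R"): 0, ("R", "Cl"): 1}},
]


def tabulate(snaps):
    """Return CSV text: one row per snapshot, one column per atom charge or bond order."""
    columns = ["snap"]
    for snap in snaps:
        names = [f"{atom} charge" for atom in snap["charges"]]
        names += [f"{a}-{b} order" for a, b in snap["bonds"]]
        columns += [n for n in names if n not in columns]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for snap in snaps:
        row = {"snap": snap["name"]}
        row.update({f"{atom} charge": q for atom, q in snap["charges"].items()})
        row.update({f"{a}-{b} order": n for (a, b), n in snap["bonds"].items()})
        writer.writerow(row)  # properties absent from a snapshot are left blank
    return out.getvalue()


print(tabulate(SN1))
```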
Each snapshot contains all the atoms in the system, whether or not they are mechanistically involved in the current step. Explicit bonds are given for covalently linked atoms. If a bond is broken during a step, it is automatically assigned an order of 0 in the resulting snapshot. Explicit mechanistic electrons can either be included manually or deduced from changes in bond orders.
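Comparing bond orders in consecutive snapshots is enough to find which bonds a step breaks or forms, and hence where mechanistic electron pairs must have moved. The sketch below is a simplified heuristic of our own, not the CMLSnap algorithm: a bond missing from a snapshot is treated as order 0, each unit of bond-order change is counted as one electron pair, and the direction of electron flow is not inferred.

```python
def _normalise(bonds):
    """Key each bond by its sorted atom pair so ("R", "Br") and ("Br", "R") match."""
    return {tuple(sorted(pair)): order for pair, order in bonds.items()}


def bond_changes(before, after):
    """List bonds whose order differs between two snapshots; a missing bond has order 0."""
    before, after = _normalise(before), _normalise(after)
    changes = []
    for pair in sorted(set(before) | set(after)):
        old, new = before.get(pair, 0), after.get(pair, 0)
        if new > old:
            kind = "formed" if old == 0 else "increased"
        elif new < old:
            kind = "broken" if new == 0 else "decreased"
        else:
            continue  # unchanged: a spectator bond in this step
        # Heuristic: each unit of bond order gained or lost is one electron pair moved.
        changes.append((kind, pair, abs(new - old)))
    return changes


# Step 1 of an SN1 reaction: heterolysis of R-Br.
print(bond_changes({("Br", "R"): 1, ("R", "Cl"): 0}, {("Br", "R"): 0, ("R", "Cl"): 0}))
# [('broken', ('Br', 'R'), 1)]
```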
Here we illustrate this with the representation of a 2-step SN1 reaction. Consider:
The configuration includes three atoms (Br, the pseudoatom R, and Cl), two "bonds" (Br-R and R-Cl) and two mechanistic electron pairs (e1, e2) along a schematic x-axis. We create three snaps:
| Snap | Br charge | Br coordinate | e1 position | Br-R bond order | R charge | R coordinate | e2 position | R-Cl bond order | Cl charge | Cl coordinate |
|---|---|---|---|---|---|---|---|---|---|---|
| Br- R+ Cl- | -1 | -∞ | Br | 0 | +1 | 0 | Cl | 0 | -1 | ∞ |
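The table row above can be encoded directly and checked for internal consistency. In this illustrative sketch (the dictionary layout and `check_snapshot` are hypothetical, not part of CMLSnap), positions lie on the schematic x-axis with infinities marking fully separated ions, and each electron pair is placed either on an atom or in a listed bond. The net charge of -1 matches that of the starting state, R-Br plus Cl-.

```python
import math

# The intermediate snapshot from the table (Br- R+ Cl-). x positions lie on the
# schematic axis; -inf / +inf stand for fully separated ions.
INTERMEDIATE = {
    "charges": {"Br": -1, "R": 1, "Cl": -1},
    "x": {"Br": -math.inf, "R": 0.0, "Cl": math.inf},
    "bonds": {("Br", "R"): 0, ("R", "Cl"): 0},
    "electron_pairs": {"e1": "Br", "e2": "Cl"},
}


def check_snapshot(snap):
    """Check internal references of one snapshot and return its net charge (illustrative)."""
    atoms = set(snap["charges"])
    for a, b in snap["bonds"]:
        if a not in atoms or b not in atoms:
            raise ValueError(f"bond {a}-{b} refers to an unknown atom")
    for pair, site in snap["electron_pairs"].items():
        # A pair is localised either on an atom ("Br") or in a listed bond (("Br", "R")).
        if site not in atoms and site not in snap["bonds"]:
            raise ValueError(f"{pair} is placed on an unknown site {site!r}")
    return sum(snap["charges"].values())


print(check_snapshot(INTERMEDIATE))  # -1
```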
A multi-step reaction can be fully represented by a single CMLSnap XML document. It provides the complete topology of the reaction scheme, and allows for branched and cyclic reaction schemes to be represented.
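As a rough illustration of holding several snapshots in one XML document, the sketch below writes each snapshot as a CML `molecule` with `atomArray`/`atom` and `bondArray`/`bond` children in the CML namespace, using Python's standard `xml.etree.ElementTree`. The overall layout (all molecules directly under one `cml` root) is our assumption and may differ from the actual CMLSnap format; `elementType="R"` for the pseudoatom is a placeholder, and order `0` for a broken bond follows the convention described in the text.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)  # write CML elements without a prefix


def q(tag):
    """Qualify a tag name with the CML namespace."""
    return f"{{{CML_NS}}}{tag}"


def snapshots_to_cml(snaps):
    """Write every snapshot as a CML <molecule> inside a single <cml> root (assumed layout)."""
    root = ET.Element(q("cml"))
    for snap in snaps:
        mol = ET.SubElement(root, q("molecule"), id=snap["name"])
        atom_array = ET.SubElement(mol, q("atomArray"))
        for atom_id, element, charge in snap["atoms"]:
            ET.SubElement(atom_array, q("atom"), id=atom_id,
                          elementType=element, formalCharge=str(charge))
        bond_array = ET.SubElement(mol, q("bondArray"))
        for a, b, order in snap["bonds"]:
            # Order "0" marks a bond broken in this step, following the convention in the text.
            ET.SubElement(bond_array, q("bond"), atomRefs2=f"{a} {b}", order=str(order))
    return ET.tostring(root, encoding="unicode")


snap2 = {
    "name": "snap2",
    "atoms": [("a1", "Br", -1), ("a2", "R", 1), ("a3", "Cl", -1)],
    "bonds": [("a1", "a2", 0), ("a2", "a3", 0)],
}
print(snapshots_to_cml([snap2]))
```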
By transforming the XML into SVG (Scalable Vector Graphics), an animated diagram of a reaction can be created. Our MACiE example is a Class A beta-lactamase4,5,6 mechanism consisting of five steps, shown as an animated reaction in Figure 2. To view the animations, all that is required is a web browser with an SVG plugin (e.g. from Adobe) or a native SVG viewer that supports animation (e.g. Batik from Apache).
To view the animation from the beginning, simply refresh the page or open it in a new window.
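The CMLSnap XML-to-SVG transformation itself is not described here. As a hedged illustration of the underlying idea only, SVG `<animate>` elements can move each atom label through its position in successive snapshots; with the default linear interpolation and no `keyTimes`, the listed values are spaced evenly over the duration, so each step gets the same time. The function name, canvas size and coordinates are hypothetical, and bonds, charges and electrons are not animated in this sketch.

```python
from xml.sax.saxutils import escape


def animated_svg(frames, seconds_per_step=1.0):
    """Build an SVG in which each atom label moves through its position in every snapshot.

    frames: one {label: (x, y)} map per snapshot; every map must use the same labels.
    """
    if len(frames) < 2:
        raise ValueError("need at least two snapshots to animate")
    dur = (len(frames) - 1) * seconds_per_step  # values are spread evenly, one interval per step
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="400" height="200">']
    for label in frames[0]:
        xs = ";".join(str(frame[label][0]) for frame in frames)
        ys = ";".join(str(frame[label][1]) for frame in frames)
        x0, y0 = frames[0][label]
        parts.append(
            f'  <text x="{x0}" y="{y0}">{escape(label)}'
            f'<animate attributeName="x" values="{xs}" dur="{dur}s" fill="freeze"/>'
            f'<animate attributeName="y" values="{ys}" dur="{dur}s" fill="freeze"/>'
            "</text>"
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical positions for the SN1 example: Br drifts away, then Cl approaches R.
frames = [
    {"Br": (50, 100), "R": (150, 100), "Cl": (350, 100)},
    {"Br": (20, 100), "R": (150, 100), "Cl": (350, 100)},
    {"Br": (20, 100), "R": (150, 100), "Cl": (250, 100)},
]
print(animated_svg(frames))
```

Setting `fill="freeze"` keeps each label at its final position once the animation ends, so the last snapshot remains on screen.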
The CMLSnap software can be downloaded from http://wwmm.ch.cam.ac.uk/moin/CMLSnap. It is based on the JUMBO toolkit for CML7, which is Open Source, and is bundled here as a single jar file for convenience.
The CMLSnap distribution contains about 10 multistep examples, chosen from a range of disciplines. These already contain the snapshots used to create the final animations. The software requirements, information on creating the snapshots and details of how to run the animation process can be found in the accompanying tutorial.
CMLSnap has several advantages over conventional methods of publishing and storing reaction mechanisms.
Conventional tools (JChemPaint, Marvin, JME or ISIS/Draw+OpenBabel) can be used to create snapshots, which are then edited incrementally by changing bonds, atom positions and charges. We find this approach easy to use, and the resulting snapshots can also be used to analyze the reaction scheme.
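The incremental editing described above, where each new snapshot starts as a copy of the previous one and only the bonds, positions and charges touched by the step are changed, can be sketched as follows. The function `next_snapshot` and the dictionary layout are hypothetical stand-ins for edits made interactively in a structure editor; they are not part of any of the tools named.

```python
import copy


def next_snapshot(prev, name, bonds=None, charges=None, coords=None):
    """Derive the next snapshot by copying the previous one and changing only what the step changes."""
    snap = copy.deepcopy(prev)  # the previous snapshot stays unchanged
    snap["name"] = name
    snap["bonds"].update(bonds or {})  # e.g. set a broken bond's order to 0
    snap["charges"].update(charges or {})
    snap["coords"].update(coords or {})
    return snap


snap1 = {
    "name": "snap1",
    "charges": {"Br": 0, "R": 0, "Cl": -1},
    "coords": {"Br": (50, 100), "R": (150, 100), "Cl": (350, 100)},
    "bonds": {("Br", "R"): 1, ("R", "Cl"): 0},
}
# Step 1 of the SN1 example: break R-Br heterolytically and move Br away.
snap2 = next_snapshot(snap1, "snap2", bonds={("Br", "R"): 0},
                      charges={"Br": -1, "R": 1}, coords={"Br": (20, 100)})
```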
We thank the EPSRC for financial support of this project and Unilever for their support of the Centre for Molecular Science Informatics.