About this collection
The WorldWideMolecularMatrix, an Open collection of information on small molecules
About 1-2 million new chemical compounds are published every year in primary scientific journals. The properties of a molecule determine what it looks like, how it behaves, and what it might be used for. Of particular recent interest are the safety aspects of chemicals (e.g. the European REACH program). Many molecular properties can now be calculated on ordinary computers (e.g. teaching machines that are idle at night or during vacations).
The "Matrix" in WWMM is influenced by William Gibson's vision of a cyberinfrastructure where all knowledge is accessible. The WWMM is an experiment to see how far this can be taken for chemical compounds. Although much of the information for a given compound has been Openly published, very little is available in Open electronic collections. The WWMM aims to catalyse this approach for chemistry, and the current collection is made available under the Budapest Open Access Initiative (BOAI, http://www.soros.org/openaccess/read.shtml).
To seed the approach, this collection will contain properties, calculated using semi-empirical quantum-mechanical methods, for over 200,000 Open molecules provided by the US National Cancer Institute (NCI). Properties include heat of formation, 3-dimensional structure, dipole moment and ionization potential. Each molecule is stored in Chemical Markup Language (CML) as a separate entry, indexed by its NCI NSC number. The latest version of the CML schema is available here. We have used the recent IUPAC/NIST chemical identifier (InChI) to provide searches by chemical structure.
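As a rough illustration of what one entry might look like to a program, the Python sketch below reads a small CML fragment and pulls out the identifier, the 3-dimensional coordinates and the scalar properties. The sample entry is hypothetical: the id, the `example:` dictionary and unit references, and all numeric values are placeholders and not taken from the collection. The actual entries may be organised differently, so treat this as a sketch under those assumptions.

```python
import xml.etree.ElementTree as ET

# CML namespace; the prefix "cml" is only a local alias for XPath lookups.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical entry for methane. The id, the "example:" dictRef and units
# names, and every number are illustrative placeholders, not WWMM results.
SAMPLE_ENTRY = """<molecule xmlns="http://www.xml-cml.org/schema" id="nsc-example">
  <identifier convention="iupac:inchi" value="InChI=1/CH4/h1H4"/>
  <atomArray>
    <atom id="a1" elementType="C" x3="0.0" y3="0.0" z3="0.0"/>
    <atom id="a2" elementType="H" x3="0.63" y3="0.63" z3="0.63"/>
    <atom id="a3" elementType="H" x3="-0.63" y3="-0.63" z3="0.63"/>
    <atom id="a4" elementType="H" x3="-0.63" y3="0.63" z3="-0.63"/>
    <atom id="a5" elementType="H" x3="0.63" y3="-0.63" z3="-0.63"/>
  </atomArray>
  <property dictRef="example:heatOfFormation">
    <scalar units="example:kcal.mol-1">-13.0</scalar>
  </property>
  <property dictRef="example:dipoleMoment">
    <scalar units="example:debye">0.0</scalar>
  </property>
</molecule>"""


def read_entry(cml_text):
    """Parse one CML molecule and return its id, InChI, atoms and properties."""
    root = ET.fromstring(cml_text)

    # Pick the identifier that follows the InChI convention, if present.
    inchi = None
    for ident in root.findall("cml:identifier", CML_NS):
        if ident.get("convention") == "iupac:inchi":
            inchi = ident.get("value")

    # Each atom becomes (element symbol, x, y, z).
    atoms = [
        (a.get("elementType"), float(a.get("x3")), float(a.get("y3")), float(a.get("z3")))
        for a in root.findall("cml:atomArray/cml:atom", CML_NS)
    ]

    # Each scalar property becomes dictRef -> (value, units).
    properties = {}
    for prop in root.findall("cml:property", CML_NS):
        scalar = prop.find("cml:scalar", CML_NS)
        if scalar is not None:
            properties[prop.get("dictRef")] = (float(scalar.text), scalar.get("units"))

    return {"id": root.get("id"), "inchi": inchi, "atoms": atoms, "properties": properties}


entry = read_entry(SAMPLE_ENTRY)

# A structure search can then be reduced to a lookup from InChI to entry id.
inchi_index = {entry["inchi"]: entry["id"]}
```

Keying the lookup on the InChI string means two entries describing the same structure resolve to the same key, which is the property that makes an identifier useful for structure searches.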
We intend to expand this collection in two main ways:
- adding more molecules from Open sources
- adding additional Open properties (experimental and calculated)
We wish to avoid any copyrighted material and do not extract molecules or data from copyrighted collections, including websites. We have developed robotic methods for extracting data from published papers and theses and are keen to use them. However, it is unclear whether it is permissible to use robots to extract data from primary journals; we believe it is, and have argued this in a paper accepted for publication by the Royal Society of Chemistry. We strongly urge authors to indicate that the data in their papers is extractable under the BOAI.
The WWMM collection and software are Open, and we invite others to clone the collection and the management tools. We would urge them to make any additional data available in the same manner.
Further details can be found at http://wwmm.ch.cam.ac.uk/
Note: the WWMM breaks new ground for scholarly archiving in several ways, so there will be teething problems:
- the collection is large
- we wish to update it robotically
- the content is non-textual and uses a markup language
- specialist tools can be used for searching and display