WWMM

The WorldWideMolecularMatrix, an Open collection of information on small molecules



About 1-2 million new chemical compounds are published every year in primary scientific journals. The properties of a molecule determine what it looks like, how it behaves, and what it might be used for. Of particular recent interest are the safety aspects of chemicals (e.g. the European REACH program). Many molecular properties can now be calculated on ordinary computers (e.g. teaching machines that sit idle at night or during vacations).

The "Matrix" in WWMM is influenced by William Gibson's vision of a cyberinfrastructure where all knowledge is accessible. The WWMM is an experiment to see how far this can be taken for chemical compounds. Although much of the information for a given compound has been Openly published, very little is available in Open electronic collections. The WWMM aims to catalyse this approach for chemistry, and the current collection is made available under the Budapest Open Access Initiative (BOAI) (http://www.soros.org/openaccess/read.shtml).

To seed the approach, this collection will contain properties of over 200,000 Open molecules provided by the US National Cancer Institute (NCI), calculated using semi-empirical quantum-mechanical methods. Properties include heat of formation, 3-dimensional structure, dipole moment and ionization potential. Each molecule is stored in Chemical Markup Language (CML) as a separate entry indexed by its NCI NSC number. The latest version of the CML schema is available here. We have used the recent IUPAC/NIST chemical identifier (InChI) to provide searches by chemical structure.
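As a rough illustration of how one CML entry per molecule can be read and made searchable by InChI, here is a minimal Python sketch. It is not the WWMM software. The sample entry, its id, the `example:` dictionary references, the units and the numeric values are all hypothetical, and the element and attribute names should be checked against the CML schema before use.

```python
import xml.etree.ElementTree as ET

# CML namespace; the entry below is a hypothetical illustration, not WWMM data.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

SAMPLE_ENTRY = """\
<molecule xmlns="http://www.xml-cml.org/schema" id="NSC000001">
  <identifier convention="iupac:inchi" value="InChI=1S/CH4/h1H4"/>
  <property dictRef="example:heatOfFormation">
    <scalar dataType="xsd:double" units="example:kcal.mol-1">-12.5</scalar>
  </property>
  <property dictRef="example:dipoleMoment">
    <scalar dataType="xsd:double" units="example:debye">0.0</scalar>
  </property>
</molecule>
"""


def parse_entry(xml_text):
    """Extract the id, InChI and scalar properties from one CML entry."""
    root = ET.fromstring(xml_text)
    inchi_el = root.find("cml:identifier", CML_NS)
    properties = {}
    for prop in root.findall("cml:property", CML_NS):
        scalar = prop.find("cml:scalar", CML_NS)
        if scalar is not None and scalar.text:
            properties[prop.get("dictRef")] = float(scalar.text)
    return {
        "id": root.get("id"),
        "inchi": inchi_el.get("value") if inchi_el is not None else None,
        "properties": properties,
    }


def build_inchi_index(entries):
    """Map each InChI string to the ids of the entries that carry it."""
    index = {}
    for e in entries:
        if e["inchi"]:
            index.setdefault(e["inchi"], []).append(e["id"])
    return index


entry = parse_entry(SAMPLE_ENTRY)
index = build_inchi_index([entry])
print(entry["properties"])
print(index.get("InChI=1S/CH4/h1H4"))
```

Because an InChI is a text string derived from the structure, an exact-match dictionary lookup like this is enough for "find this molecule" queries; substructure or similarity searching would need specialist chemistry tools.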

We intend to expand this collection in two main ways:

  • adding more molecules from Open sources
  • adding additional Open properties (experimental and calculated)


We wish to avoid any copyrighted material and do not extract molecules or data from copyrighted collections, including websites. We have developed robotic methods for the extraction of data from published papers or theses and are keen to use them. However, it is unclear whether it is allowable to use robots to extract data from primary journals; we believe it is, and have argued this in a paper accepted for publication by the Royal Society of Chemistry. We strongly urge authors to indicate that the data in their papers is extractable under the BOAI.

The WWMM collection and software are Open and we invite others to clone the collection and the management tools. We would urge them to make any additional data available in the same manner.

Further details can be found on http://wwmm.ch.cam.ac.uk/

Note: the WWMM breaks new ground for scholarly archiving in several ways, so there will be teething problems:


  • the collection is large
  • we wish to update it robotically
  • the content is non-textual and uses a markup language
  • specialist tools can be used for searching and display

Recent Submissions

Now showing 1 - 20 of 175356
  • Item (Open Access)
    NSC383508
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:41Z) US National Cancer Institute
  • Item (Open Access)
    NSC383504
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:38Z) US National Cancer Institute
  • Item (Open Access)
    NSC383503
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:35Z) US National Cancer Institute
  • Item (Open Access)
    NSC383502
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:32Z) US National Cancer Institute
  • Item (Open Access)
    NSC383501
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:29Z) US National Cancer Institute
  • Item (Open Access)
    NSC383500
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:26Z) US National Cancer Institute
  • Item (Open Access)
    NSC383499
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:24Z) US National Cancer Institute
  • Item (Open Access)
    NSC383498
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:21Z) US National Cancer Institute
  • Item (Open Access)
    NSC383497
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:18Z) US National Cancer Institute
  • Item (Open Access)
    NSC383496
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:15Z) US National Cancer Institute
  • Item (Open Access)
    NSC383493
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:12Z) US National Cancer Institute
  • Item (Open Access)
    NSC383492
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:09Z) US National Cancer Institute
  • Item (Open Access)
    NSC383491
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:06Z) US National Cancer Institute
  • Item (Open Access)
    NSC383488
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:03Z) US National Cancer Institute
  • Item (Open Access)
    NSC383487
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:23:00Z) US National Cancer Institute
  • Item (Open Access)
    NSC383486
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:22:57Z) US National Cancer Institute
  • Item (Open Access)
    NSC383485
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:22:53Z) US National Cancer Institute
  • Item (Open Access)
    NSC383484
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:22:50Z) US National Cancer Institute
  • Item (Open Access)
    NSC383483
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:22:48Z) US National Cancer Institute
  • Item (Open Access)
    NSC383482
    (Unilever Center for Molecular Informatics, Cambridge University, 2006-05-05T17:22:45Z) US National Cancer Institute