Visions of a Semantic Molecular Future

Recent Submissions

Now showing 1 - 20 of 24
  • ItemOpen Access
    Visions of a Semantic Molecular Future
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-05) Murray-Rust, Peter
    The event looks forward. Scholarship (universities, research, teaching, publishing) has been slow to take up the opportunities of this digital century. This is an opportunity to identify and build the future.
  • ItemOpen Access
    Visions of a Semantic Molecular Future
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-05) Murray-Rust, Peter; Brooks, Brian; Bolton, Charlotte
    The event looks forward. Scholarship (universities, research, teaching, publishing) has been slow to take up the opportunities of this digital century. This is an opportunity to identify and build the future.
  • ItemOpen Access
    WWMM/CML Framework
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-05) Murray-Rust, Peter; Rzepa, Henry S; Blue Obelisk; Quixote Project; eMinerals Project
    The World-Wide Molecular Matrix (2001-) is a design for capture and re-use of chemical information using Open semantic tools. There is no centre; scientists publish to the Matrix and re-use data from it. All data is Open. It interacts with the Linked Open Data Cloud.
  • ItemOpen Access
    The Quixote Project
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-05) Adams, Sam; Beke, Tamas; Echenique, Pablo; Estrada, Jorge; Hanwell, Marcus D; Murray-Rust, Peter; Thomas, Jens; Townsend, Joseph A; Westerhoff, Lance
    The Quixote Project is an Open Source, Open Data, international collaboration to develop the infrastructure to organise, share and query computational chemistry data. It has no centralised structure, is internet-based and is run entirely by motivated scientists. It aims to create a useful infrastructure and consolidate the model around the tools (the "if you build it, they will come" approach). Collaboration is managed using Skype conferences, wikis, etherpads and mailing lists, and the project has started to attract funding and collaborators. [http://quixote.wikispot.org/]
  • ItemOpen Access
    Open Bibliography
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-05) Murray-Rust, Peter; Pollock, Rufus; MacGillivray, Mark; O'Steen, Ben; Waites, William
    More research is published currently than can be understood or followed by a researcher without the aid of a computer. We need Open shareable information on research publications, an Open Bibliography, to build the services that enable researchers to explore their field and discover the research they need. Producers of bibliographic data such as libraries, publishers, universities, scholars or social reference management communities have an important role in supporting the advance of humanity's knowledge. For society to reap the full benefits from bibliographic endeavours, it is imperative that bibliographic data be made open - that is, available for anyone to use and re-use freely for any purpose.
  • ItemOpen Access
    Climate Code Foundation
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-05) Barnes, Nick; Jones, David
    Climate Code Foundation - who are we? A non-profit organisation founded in August 2010. Our goal is to promote the public understanding of climate science: by increasing the visibility and clarity of the software used in climate science and encouraging climate scientists to do the same; by encouraging good software development and management practices among climate scientists; and by encouraging the publication of climate science software as Open Source. [http://www.climatecode.org/]
  • ItemOpen Access
    Chem# - Semantically Enriched Linked Open Chemical Data
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-05) Adams, Sam; Murray-Rust, Peter; Brooks, Brian; Downing, Jim; Day, Nick
    The problem: Vast quantities of chemical data (e.g. crystal structures, NMR spectra, experimental reports) are generated every day. The majority of this data is never published, and the data that is published is fragmented, trapped in legacy formats and difficult to discover. The solution: Semantically Enriched Linked Open Chemical Data: browsable, searchable, discoverable and interpretable by humans and machines alike, using standardized extensible data formats (Chemical Markup Language) and technologies (HTTP, RDF).
  • ItemOpen Access
    Chemistry Add-In for Word
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-05) Townsend, Joseph A
    The Chemistry Add-In for Word is an Open Source program that allows chemists to create, edit and manipulate chemistry (labels and 2D structures) in the Word environment. The on-screen representation is backed by semantic data in Chemical Markup Language (CML). Combined with domain-aware libraries we enable novel functionality in data checking during the authoring process, chemistry-centric article reading support and data-mining applications. [http://research.microsoft.com/chem4word/] [http://chem4word.codeplex.com/]
  • ItemOpen Access
    The Blue Obelisk Community
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-05) O'Boyle, Noel M; Murray-Rust, Peter
    The Internet has brought together a group of chemists who are driven by wanting to do things better, but are frustrated with the Closed systems that chemists currently have to work with. They share a belief in the concepts of Open Data, Open Standards and Open Source. And they express this in software, data, algorithms, specifications, tutorials, demonstrations, articles and anything that helps get the message across. [http://www.blueobelisk.org/]
  • ItemOpen Access
    Ami-The Chemist's Amanuensis
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-05) Thorn, Adam; Smith, Matthew; Matthews, Peter; Chen, Shaoming; O'Steen, Ben; Brooks, Brian
    The Ami project was a six-month Rapid Innovation project sponsored by JISC to explore the Virtual Research Environment space. The project brainstormed with chemists and decided to investigate ways to facilitate monitoring and collection of experimental data. A frequently encountered use-case was identified in which the chemist reaches the end of an experiment but finds an unexpected result. The ability to replay events can significantly help make sense of how things progressed. The project therefore concentrated on collecting a variety of dimensions of ancillary data – data that would not normally be collected due to practicality constraints. There were three main areas of investigation: 1) Development of a monitoring tool using infrared and ultrasonic sensors; 2) Time-lapse motion video capture (for example, videoing 5 seconds in every 60); and 3) Activity-driven video monitoring of the fume cupboard environs. The Ami client application was developed to control these separate logging functions. The application builds up a timeline of the events in the experiment and around the fume cupboard. The videos and data logs can then be reviewed after the experiment in order to help the chemist determine the exact timings and conditions used. The project experimented with ways in which a Microsoft Kinect could be used in a laboratory setting. Investigations suggest that it would not be an ideal device for controlling a mouse, but it shows promise for uses such as manipulating virtual molecules.
  • ItemOpen Access
    Semantic science and its communication – a personal view
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-04) Murray-Rust, Peter
    The articles in this special issue represent the culmination of about 15 years working with the potential of the web to support chemical and related subjects. The selection of papers arises from a symposium held in January 2011 (‘Visions of a Semantic Molecular Future’) which gave me an opportunity to invite many people who shared the same vision. I have asked them to contribute their papers and most have been able to do so. They cover a wide range of content, approaches and styles and apart from the selection of the speakers (and hence the authors) I have not exercised any control over the content.
  • ItemOpen Access
    Adventures in Public Data
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-04) Zaharevitz, Daniel W
  • ItemOpen Access
    Openness as Infrastructure
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-04) Wilbanks, John
  • ItemOpen Access
    Three stories about the conduct of science: Past, future, and present.
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-04) Neylon, Cameron
    In this piece I would like to tell a few stories; three stories to be precise. Firstly I want to explain where I am, where I've come from and what has led me to the views that I hold today. I find myself at an interesting point in my life and career at the same point as the research community is undergoing massive change. The second story is one of what the world might look like at some point in the future. What might we achieve? What might it look like? And what will be possible? Finally I want to ask the question of how we get there from here. What is the unifying idea or movement that actually has the potential to carry us forward in a positive way? At the end of this I'm going to ask you, the reader, to commit to something as part of the process of making that happen.
  • ItemOpen Access
    Open Bibliography for Science, Technology, and Medicine
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-04) Jones, Richard; MacGillivray, Mark; Murray-Rust, Peter; Pitman, Jim; Sefton, Peter; O'Steen, Ben; Waites, William
    The concept of Open Bibliography in science, technology and medicine (STM) is introduced as a combination of Open Source tools, Open specifications and Open bibliographic data. An Openly searchable and navigable network of bibliographic information and associated knowledge representations, a Bibliographic Knowledge Network, across all branches of Science, Technology and Medicine, has been designed and initiated. For this large scale endeavour, the engagement and cooperation of the multiple stakeholders in STM publishing - authors, librarians, publishers and administrators - is sought. BibJSON, a simple structured text data format (informed by BibTeX, Dublin Core, PRISM and JSON) suitable for both serialisation and storage of large quantities of bibliographic data, is presented. BibJSON, and the companion bibliographic software systems BibServer and OpenBiblio, promote the quantity and quality of Openly available bibliographic data, and encourage the development of improved algorithms and services for processing the wealth of information and knowledge embedded in bibliographic data across all fields of scholarship. Major providers of bibliographic information have joined in promoting the concept of Open Bibliography and in working together to create prototype nodes for the Bibliographic Knowledge Network. These contributions include large-scale content from PubMed and arXiv, data available from Open Access publishers, and bibliographic collections generated by the members of the project. The concept of a distributed bibliography (BibSoup) is explored.
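The abstract above describes BibJSON as a simple structured text format informed by BibTeX and JSON. As a rough illustration only, the sketch below builds one record for this item and round-trips it through JSON. The field names (`title`, `author`, `year`) and the helper `make_record` are assumptions modelled on BibTeX-style keys; they are not taken from the BibJSON specification, which should be consulted for the actual conventions.

```python
import json


def make_record(title, authors, year):
    """Build a BibJSON-style record as a plain dict.

    The key names here are illustrative assumptions, not the
    authoritative BibJSON vocabulary.
    """
    return {
        "title": title,
        # Each author is an object so that further keys could be added later.
        "author": [{"name": name} for name in authors],
        "year": str(year),
    }


record = make_record(
    "Open Bibliography for Science, Technology, and Medicine",
    [
        "Jones, Richard",
        "MacGillivray, Mark",
        "Murray-Rust, Peter",
        "Pitman, Jim",
        "Sefton, Peter",
        "O'Steen, Ben",
        "Waites, William",
    ],
    2011,
)

# Serialise to text for storage or exchange, then parse it back.
serialised = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(serialised)

print(serialised)
```

Because the record is an ordinary JSON object, any JSON-aware tool can store, index or merge collections of such records without a dedicated parser.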
  • ItemOpen Access
    Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on.
    (Springer Science and Business Media LLC, 2011-10-14) O'Boyle, Noel M; Guha, Rajarshi; Willighagen, Egon L; Adams, Samuel E; Alvarsson, Jonathan; Bradley, Jean-Claude; Filippov, Igor V; Hanson, Robert M; Hanwell, Marcus D; Hutchison, Geoffrey R; James, Craig A; Jeliazkova, Nina; Lang, Andrew Sid; Langner, Karol M; Lonie, David C; Lowe, Daniel M; Pansanel, Jérôme; Pavlov, Dmitry; Spjuth, Ola; Steinbeck, Christoph; Tenderholt, Adam L; Theisen, Kevin J; Murray-Rust, Peter
    BACKGROUND: The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards. RESULTS: This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry. CONCLUSIONS: We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to development of many useful resources freely available to the chemistry community.
  • ItemOpen Access
    The Quixote project: Collaborative and Open Quantum Chemistry data management in the Internet age
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-04) Adams, Sam; de Castro, Pablo; Echenique, Pablo; Estrada, Jorge; Hanwell, Marcus D; Murray-Rust, Peter; Sherwood, Paul; Thomas, Jens; Townsend, Joseph A
    Computational Quantum Chemistry has developed into a powerful, efficient, reliable and increasingly routine tool for exploring the structure and properties of small to medium sized molecules. Many thousands of calculations are performed every day, some offering results which approach experimental accuracy. However, in contrast to other disciplines, such as crystallography, or bioinformatics, where standard formats and well-known, unified databases exist, this QC data is generally destined to remain locally held in files which are not designed to be machine-readable. Only a very small subset of these results will become accessible to the wider community through publication. In this paper we describe how the Quixote Project is developing the infrastructure required to convert output from a number of different molecular quantum chemistry packages to a common semantically rich, machine-readable format and to build repositories of QC results. Such an infrastructure offers benefits at many levels. The standardised representation of the results will facilitate software interoperability, for example making it easier for analysis tools to take data from different QC packages, and will also help with archival and deposition of results. The repository infrastructure, which is lightweight and built using Open software components, can be implemented at individual researcher, project, organisation or community level, offering the exciting possibility that in future many of these QC results can be made publicly available, to be searched and interpreted just as crystallography and bioinformatics results are today. Although we believe that quantum chemists will appreciate the contribution the Quixote infrastructure can make to the organisation and exchange of their results, we anticipate that greater rewards will come from enabling their results to be consumed by a wider community. As the repositories grow they will become a valuable source of chemical data for use by other disciplines in both research and education. The Quixote project is unconventional in that the infrastructure is being implemented in advance of a full definition of the data model which will eventually underpin it. We believe that a working system which offers real value to researchers based on tools and shared, searchable repositories will encourage early participation from a broader community, including both producers and consumers of data. In the early stages, searching and indexing can be performed on the chemical subject of the calculations, and well defined calculation meta-data. The process of defining more specific quantum chemical definitions, adding them to dictionaries and extracting them consistently from the results of the various software packages can then proceed in an incremental manner, adding additional value at each stage. Not only will these results help to change the data management model in the field of Quantum Chemistry, but the methodology can be applied to other pressing problems related to data in computational and experimental science.
  • ItemOpen Access
    CMLLite: a design philosophy for CML
    (Murray-Rust group, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, 2011-07-04) Townsend, Joseph A; Murray-Rust, Peter
    CMLLite is a collection of definitions and processes which provide strong and flexible validation for a document in Chemical Markup Language (CML). It consists of an updated CML schema (schema3), conventions specifying rules in both human and machine-understandable forms and a validator available both online and offline to check conformance. This article explores the rationale behind the changes which have been made to the schema, explains how conventions interact and how they are designed, formulated, implemented and tested, and gives an overview of the validation service.
  • ItemOpen Access
    Mining chemical information from open patents.
    (Springer Science and Business Media LLC, 2011-10-14) Jessop, David M; Adams, Sam E; Murray-Rust, Peter
    Linked Open Data presents an opportunity to vastly improve the quality of science in all fields by increasing the availability and usability of the data upon which it is based. In the chemical field, there is a huge amount of information available in the published literature, the vast majority of which is not available in machine-understandable formats. PatentEye, a prototype system for the extraction and semantification of chemical reactions from the patent literature, has been implemented and is discussed. A total of 4444 reactions were extracted from 667 patent documents that comprised 10 weeks' worth of publications from the European Patent Office (EPO), with a precision of 78% and recall of 64% with regard to determining the identity and amount of reactants employed and an accuracy of 92% with regard to product identification. NMR spectra reported as product characterisation data are additionally captured.
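The figures quoted in the abstract above can be combined into two derived numbers that are not stated in the source: the F1 score (the harmonic mean of precision and recall) for reactant extraction, and the average number of reactions extracted per patent. The sketch below is a worked calculation only; the function name `f1_score` is illustrative, and F1 is one common summary measure, not necessarily one the authors report.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for reactant identity and amount.
reported_precision = 0.78
reported_recall = 0.64

# Derived summary figure (an addition here, not a reported result).
f1 = f1_score(reported_precision, reported_recall)

# Average yield of the extraction: 4444 reactions from 667 patents.
reactions_per_patent = 4444 / 667

print(f"F1 = {f1:.3f}")
print(f"reactions per patent = {reactions_per_patent:.2f}")
```

With the reported values this gives an F1 of roughly 0.70 and between six and seven extracted reactions per patent document.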
  • ItemOpen Access
    OSCAR4: a flexible architecture for chemical text-mining.
    (Springer Science and Business Media LLC, 2011-10-14) Jessop, David M; Adams, Sam E; Willighagen, Egon L; Hawizy, Lezan; Murray-Rust, Peter
    The Open-Source Chemistry Analysis Routines (OSCAR) software, a toolkit for the recognition of named entities and data in chemistry publications, has been developed since 2002. Recent work has resulted in the separation of the core OSCAR functionality and its release as the OSCAR4 library. This library features a modular API (based on reduction of surface coupling) that permits client programmers to easily incorporate it into external applications. OSCAR4 offers a domain-independent architecture upon which chemistry specific text-mining tools can be built, and its development and usage are discussed.