
Fit for purpose? A metascientific analysis of metabolomics data in public repositories


Type

Thesis

Authors

Abstract

Metabolomics is the study of metabolites and metabolic processes. Due to the diversity of structures and polarities of metabolites, no single analytical technique is able to measure the entire metabolome; instead, a varied set of experimental designs and instrumental technologies is used to measure specific portions. This has led to the development of many distinct data analysis and processing methods and software. There is hope that metabolomics can be utilised for clinical applications, in toxicology and to measure the exposome. However, for these applications to be realised, data must be of high quality, sufficiently standardised and annotated, and FAIR (Findable, Accessible, Interoperable and Reusable). For this purpose, it is also important that standardised, FAIR software workflows are available. There has also recently been much concern over the reproducibility of scientific research, which FAIR and open data and workflows can help to address. To this end, this thesis aims to assess current practices and standards of sharing data within the field of metabolomics, using metascientific approaches. The types of functions of software for processing and analysing metabolomics data are also assessed. Reporting standards are designed to ensure that the minimum information required to understand and interpret the results of an analysis is reported. However, poor reporting standards are ignored and not complied with. Compliance with the Metabolomics Standards Initiative (MSI) biological context guidelines was examined in order to investigate their timeliness. The state of open data within the metabolomics community was examined by investigating how much publicly available metabolomics data there is and where it has been deposited. To explore whether journal data sharing policies are driving open metabolomics data, the journals that publish articles whose underlying data has been made open were also examined.
However, open data alone is not inherently useful: if data is incomplete, lacking in quality or missing crucial metadata, it is not valuable. Conversely, if data is reused, this can demonstrate the worth of public data archiving. Levels of reuse of public metabolomics data were therefore examined. With more than 250 software tools specific to metabolomics, practitioners face a daunting task in selecting the best tools for data collection and analysis. To help educate researchers about what software is available, a taxonomy of metabolomics software tools and a GitHub Pages wiki, which provides extensive details about all included software, have been developed.

Description

Date

2018-07-30

Advisors

Steinbeck, Christoph

Keywords

Metabolomics, Metabolome, Public Data, Open Data, Metascience, Metascientific analysis, Data sharing, Metabolite, FAIR Data Principles, Reporting Standards, Metabolomics Standards Initiative (MSI), Metadata, Data Archiving

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

European Molecular Biology Laboratory (EMBL) PhD studentship