ThaleMine: A Warehouse for Arabidopsis Data Integration and Discovery.

Accepted version
Peer-reviewed

Type

Article

Authors

Krishnakumar, Vivek 
Contrino, Sergio 
Cheng, Chia-Yi 
Belyaeva, Irina 
Ferlanti, Erik S 

Abstract

ThaleMine (https://apps.araport.org/thalemine/) is a comprehensive data warehouse that integrates a wide array of genomic information on the model plant Arabidopsis thaliana. The data collection currently includes the latest structural and functional annotation from the Araport11 update, the Col-0 genome sequence, RNA-seq and microarray expression, co-expression, protein interactions, homologs, pathways, publications, alleles, germplasm and phenotypes, all collected from a wide variety of public resources. Users can browse gene-specific data through Gene Report pages, identify and create gene lists based on experiments or indexed keywords, and run GO enrichment analysis to investigate the biological significance of selected gene sets. Developed by the Arabidopsis Information Portal project (Araport, https://www.araport.org/), ThaleMine uses the InterMine software framework, which builds well-structured data warehouses and provides powerful query and analysis functionality. Users can access the warehoused data through graphical interfaces as well as programmatically via web services. Here we describe recent developments in ThaleMine, including new features and extensions, and discuss future improvements. InterMine has been broadly adopted by model organism research communities, including those for nematode, rat, mouse, zebrafish and budding yeast, as well as by the modENCODE project, and it is also used for human data. ThaleMine is the first InterMine developed for a plant model. As new plant InterMines are developed by the legume and other plant research communities, cross-organism integrative data analysis will be further enabled.

Keywords

Arabidopsis thaliana; InterMine; data integration; data warehouse; genomics; web services; Arabidopsis; Arabidopsis Proteins; Computational Biology; Databases, Genetic; Gene Expression Profiling; Gene Expression Regulation, Plant; Gene Ontology; Genomics; Information Storage and Retrieval; Internet; Protein Interaction Mapping; Protein Interaction Maps; Reproducibility of Results; Sequence Analysis, RNA

Journal Title

Plant and Cell Physiology

Journal ISSN

0032-0781
1471-9053

Volume

58

Publisher

Oxford University Press (OUP)

Sponsorship
Biotechnology and Biological Sciences Research Council (BB/L027151/1)