Phylogeny-Aware Analysis of Metagenome Community Ecology Based on Matched Reference Genomes while Bypassing Taxonomy.
Authors
Gonzalez, Antonio
Yu, Julian
Kuczynski, Justin
Publication Date
2022-04-26
Journal Title
mSystems
ISSN
2379-5077
Publisher
American Society for Microbiology
Volume
7
Issue
2
Language
eng
Type
Article
This Version
VoR
Citation
Zhu, Q., Huang, S., Gonzalez, A., McGrath, I., McDonald, D., Haiminen, N., Armstrong, G., et al. (2022). Phylogeny-Aware Analysis of Metagenome Community Ecology Based on Matched Reference Genomes while Bypassing Taxonomy. mSystems, 7 (2) https://doi.org/10.1128/msystems.00167-22
Description
Funder: Emerald Foundation
Funder: Emil Aaltosen Säätiö (Emil Aaltonen Foundation)
Funder: Sydäntutkimussäätiö (Finnish Foundation for Cardiovascular Research)
Funder: Suomen Lääketieteen Säätiö (Finnish Medical Foundation)
Abstract
We introduce the operational genomic unit (OGU) method, a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent of taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary analytical protocols for community ecology, differential abundance, and supervised learning while supporting phylogenetic methods, such as UniFrac and phylofactorization, that are seldom applied to shotgun metagenomics despite being prevalent in 16S rRNA gene amplicon studies. As demonstrated in two real-world case studies, the OGU method produces biologically meaningful patterns from microbiome data sets. Such patterns further remain detectable at very low metagenomic sequencing depths. Compared with taxonomic unit-based analyses implemented in currently adopted metagenomics tools, and the analysis of 16S rRNA gene amplicon sequence variants, this method shows superiority in informing biologically relevant insights, including stronger correlation with body environment and host sex on the Human Microbiome Project data set and more accurate prediction of human age by the gut microbiomes of Finnish individuals included in the FINRISK 2002 cohort. We provide Woltka, a bioinformatics tool to implement this method, with full integration with the QIIME 2 package and the Qiita web platform, to facilitate adoption of the OGU method in future metagenomics studies.
IMPORTANCE Shotgun metagenomics is a powerful, yet computationally challenging, technique compared to 16S rRNA gene amplicon sequencing for decoding the composition and structure of microbial communities. Current analyses of metagenomic data are primarily based on taxonomic classification, which is limited in feature resolution. To solve these challenges, we introduce operational genomic units (OGUs), which are the individual reference genomes derived from sequence alignment results, without further assigning them taxonomy. The OGU method advances current read-based metagenomics in two dimensions: (i) providing maximal resolution of community composition and (ii) permitting use of phylogeny-aware tools. Our analysis of real-world data sets shows that it is advantageous over currently adopted metagenomic analysis methods and the finest-grained 16S rRNA analysis methods in predicting biological traits. We thus propose the adoption of OGUs as an effective practice in metagenomic studies.
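The core idea in the abstract, counting alignment hits per reference genome rather than per taxon, can be illustrated with a minimal Python sketch. This is not Woltka's implementation or API. The function name `build_ogu_table`, the input layout (sample mapped to a list of read and genome pairs), the genome IDs, and the choice to split a multi-mapped read equally among the genomes it hits are all assumptions made for illustration.

```python
from collections import defaultdict


def build_ogu_table(alignments):
    """Build a per-sample table of counts per reference genome (OGU).

    `alignments` is a hypothetical input layout: a dict mapping a sample ID
    to an iterable of (read_id, genome_id) alignment hits.
    """
    table = {}
    for sample, hits in alignments.items():
        # Gather the distinct genomes that each read aligned to.
        read_to_genomes = defaultdict(set)
        for read_id, genome_id in hits:
            read_to_genomes[read_id].add(genome_id)

        # Assumption: a read hitting k genomes contributes 1/k to each,
        # so every read adds exactly one count to the sample in total.
        counts = defaultdict(float)
        for genomes in read_to_genomes.values():
            share = 1.0 / len(genomes)
            for genome_id in genomes:
                counts[genome_id] += share
        table[sample] = dict(counts)
    return table


# Toy data with made-up read and genome identifiers.
demo_alignments = {
    "sample_A": [("r1", "G001"), ("r2", "G001"), ("r3", "G002"), ("r3", "G003")],
    "sample_B": [("r1", "G002"), ("r2", "G002")],
}
ogu_table = build_ogu_table(demo_alignments)
```

Because no taxonomy is assigned, the features of the resulting table are the reference genome IDs themselves. Provided those IDs match the tips of a reference phylogenomic tree, such a table could in principle be passed to phylogeny-aware methods such as UniFrac.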
Keywords
Metagenomics, Supervised Learning, UniFrac, Taxonomy Independent, Operational Genomic Unit, Reference Phylogeny
Sponsorship
National Science Foundation (NSF) (BIO150043, RAPID 2038509)
Alfred P. Sloan Foundation (APSF) (G-2017-9838)
HHS | National Institutes of Health (NIH) (K12GM068524, F30CA243480, P30DK120515, DP1AT010885, U19AG063744, U24CA248454)
NCI NIH HHS (U24 CA248454, F30 CA243480)
International Business Machines Corporation (IBM) (A1770534)
NCCIH NIH HHS (DP1 AT010885)
NIGMS NIH HHS (K12 GM068524)
Academy of Finland (AKA) (295741, 321351)
NIA NIH HHS (U19 AG063744)
DOD | Defense Advanced Research Projects Agency (DARPA) (JUMP/CRISP)
Crohn's and Colitis Foundation (CCF) (675191)
NIDDK NIH HHS (P30 DK120515)
Identifiers
PMID: 35369727, PMCID: PMC9040630
External DOI: https://doi.org/10.1128/msystems.00167-22
This record's URL: https://www.repository.cam.ac.uk/handle/1810/336863