Recovery and quality estimation of metagenomic assembled genomes of eukaryotes
Abstract
Microorganisms are found in virtually all environments. Although most microorganisms are prokaryotes (globally, prokaryotic biomass is estimated to be about six times that of fungi; Bar-On et al., 2018), eukaryotes are also important constituents of microbial communities, for example on human skin. Shotgun metagenomics provides culture-independent access to the combined genetic information of a community. Together with de novo assembly and post-processing methods, it can yield so-called metagenomic assembled genomes (MAGs), which give contextualised access to the genes of these elusive organisms. As most studies have focused on the recovery of prokaryotic MAGs, I first examined the limitations and gaps of existing tools with respect to recovering microbial eukaryotic genomes. This led to the development of EukCC, a software tool that estimates the completeness and contamination of eukaryotic MAGs. Evaluation showed that EukCC is well suited to the fully automated recovery of eukaryotic MAGs, and the resulting workflow was applied to datasets from several biomes to recover eukaryotic MAGs. However, I also demonstrate that eukaryotic MAGs can sometimes be fragmented, and I developed a merging algorithm to create merged MAGs (mMAGs). After implementing this algorithm in EukCC 2, I searched a large number of MGnify datasets for known and novel eukaryotic MAGs. To complete the eukaryotic MAG recovery process, I discuss how species-level dereplication of eukaryotes can be approached using genetic information alone. In summary, I show that the recovery of eukaryotic MAGs is challenging but can be largely automated, enabling large-scale studies.