The genome of Euglena gracilis: Annotation, function and expression
Euglena gracilis is a unicellular photosynthetic flagellate that inhabits aquatic ecosystems. E. gracilis belongs to the supergroup Excavata, is an important component of the global biosphere, has biotechnological potential, and is a useful biological model owing to its evolutionary history and complex biology. Whilst the evolutionary position of E. gracilis is now clear, its relationship with other protists such as Naegleria, Giardia and the kinetoplastids remains to be investigated in detail. Understanding the biology of this complex organism is a promising way to approach many evolutionary puzzles, including secondary endosymbiotic events and, because of its relationship with the kinetoplastids, the evolution of parasitism. Here, I report a draft genome for E. gracilis, together with a high-quality transcriptome and proteomic analysis. The estimated genome size is ~2 Gbp, with a GC content of ~50% and a predicted protein-coding potential of 36,526 open reading frames (ORFs). Less than 25% of the genome is single-copy sequence, indicating extensive repeat structure. There is evidence for large numbers of paralogs within specific gene families, indicating expansions and possible polyploidy, as well as extensive sharing of genes with other photosynthetic and non-photosynthetic eukaryotes, including red and green algae, trypanosomes and other excavates. Functional analysis of several biological systems reveals multiple similarities with the trypanosomatids in terms of orthology, paralogy, relatedness and complexity. Several biological systems, such as nuclear architecture (e.g. chromosome segregation, the nuclear pore complex, nuclear lamins), protein trafficking, translation and the cell surface, consist of both conserved and divergent components.
For instance, several gene families likely associated with the cell surface and signal transduction possess very large numbers of lineage-specific paralogs, suggesting great flexibility in environmental monitoring and, together with divergent mechanisms for metabolic control, novel solutions for adaptation to extreme environments. I also demonstrate that control of protein expression levels is predominantly post-transcriptional, with an absence of transcriptional regulation, despite the presence of conventional introns. These data are a major advance in the understanding of the nuclear genome of euglenids and provide a platform for investigating the contributions of E. gracilis and its relatives to the biosphere.