Duplication is a prominent mechanism of recent gene birth in Caenorhabditis elegans

Authors

Type
Thesis
Abstract

The large number of reference genomes now available for different species, and their comparison, has enabled the elucidation of gene birth mechanisms that act over long evolutionary timescales. However, the lack of multiple reference-quality genomes for different individuals of the same species has hampered the study of the mechanisms behind evolutionarily younger gene births. Despite the high throughput of second-generation sequencing technologies, their short read length has limited the study of genetic diversity to single nucleotide polymorphisms (SNPs) and short indels. To study gene-level events, we need to characterise the genetic diversity of a species comprehensively, including structural variants (SVs; > 50 bp). I present the most comprehensive set of genomes and SVs for Caenorhabditis elegans. I have assembled a high-quality genome for each of 20 wild isolates of the nematode using long- and short-read sequencing. I show that 1,587 transcripts are deleted among the wild isolates and thus sketch the first definition of the core genome of C. elegans. I present the case of a highly proliferative transposon harbouring a transcription factor binding site (TFBS) and use it to address the question of transposon co-option in this model organism. Finally, using this dataset, I show that tandem gene duplication is a prominent gene birth mechanism, whereas horizontal gene transfer (HGT) played little or no role in the birth of recent C. elegans genes. Additionally, I show that G protein-coupled receptors (GPCRs) have high levels of presence/absence variation (PAV) and discuss the significance of this finding in light of the ecology of this little worm.

Date
2021-06-01
Advisors
Hemberg, Martin
Miska, Eric Alexander
Keywords
genomics, biology, sequencing, DNA, evolution, gene birth, PacBio, Pacific Biosciences, long reads, genomes, genome assembly, bioinformatics
Qualification
Doctor of Philosophy (PhD)
Awarding Institution
University of Cambridge
Sponsorship
Wellcome