Integrative Analysis of the Human Gut Phageome Using a Metagenomics Approach
Bacteriophages (or phages; viruses that infect bacteria and archaea) profoundly influence microbial communities. Given the impact of gut microbiome composition and function on human health, there is a growing focus on the phages that inhabit the gut ecosystem. However, the extent of viral diversity, and the biology and worldwide epidemiology of gut phages, remain largely unknown. In this thesis, I carry out a comprehensive genomic analysis of gut phages by harnessing the largest collection of phage genomes, gut bacterial isolates, and human gut metagenomes.
I begin by introducing the Gut Phage Database (GPD), the largest genomic resource of human gut phage genomes to date and the product of mining 28,060 faecal metagenomes and 2,898 gut bacterial isolate genomes. I use machine learning to improve the quality of the phage predictions and investigate ways to organise the viral diversity so that gut phages can be better characterised in downstream analyses.
Next, I describe common functions and auxiliary metabolic genes encoded by human gut phages. I also highlight instances of hypervariable domains, which may indicate the presence of phage receptor-binding proteins. I then shift the focus to two clades of gut phages, namely the Gubaphage and the Picovirinae subfamily. The Gubaphage is a novel phage clade uncovered in this work that is highly prevalent across the world. Picovirinae was the most commonly predicted phage taxon in the GPD. Host assignment allows me to study patterns of phage diversity across bacterial clades of the human gut and to investigate phage host range.
Finally, I analyse global patterns of the human gut phageome and its association with lifestyle and bacterial composition. I assess the idea of a core virome and the extent to which my data agree with this concept.