Phylogenetic approaches for quantifying the genotypic diversity of influenza viruses
Authors
Parker, Edyth
Advisors
Wood, James
Russell, Colin
Date
2020-09-29
Awarding Institution
University of Cambridge
Qualification
Doctor of Philosophy (PhD)
Type
Thesis
Citation
Parker, E. (2020). Phylogenetic approaches for quantifying the genotypic diversity of influenza viruses (Doctoral thesis). https://doi.org/10.17863/CAM.65996
Abstract
Wild aquatic birds are thought to be the main reservoir for Influenza A viruses, hosting the
largest burden of influenza diversity generated by high frequencies of co-infection and segment
reassortment. The recently emerged avian influenza viruses that spilled over into the human
population were all produced by dynamic reassortment between wild-bird viruses and
poultry-adapted H9N2 viruses. The recruitment of poultry-adapted internal gene cassettes (i.e. the
polymerase, nucleoprotein, matrix and non-structural genes) has been shown to increase the
fitness of wild-bird viruses in domestic poultry populations. It is unclear whether acquisition of
poultry-adapted genes by wild-bird viruses increases the probability of human infection through
increased fitness in domestic poultry, and the associated increase in transmission risk at the
human-animal interface, or whether it mediates improved adaptation to mammalian hosts. It is also
unclear whether the recent H9N2 genotype that emerged in human infections of H7N9, H10N8
and H5N6 is the only major set of internal genes prevalently facilitating the genesis of novel
reassortants of pandemic concern, or whether there are similar internal genotypes circulating.
There is therefore a need to characterize the observed and unobserved genotypic diversity of
the internal genes of avian influenza viruses across the global influenza ecosystem. However,
this effort has been limited by the lack of a pansubtypic nomenclature to partition and describe
the complex lineage distribution of the internal genes across all HA/NA-defined subtypes
resulting from dynamic reassortment. The ecological and evolutionary processes that structure
reassortment dynamics have also been incompletely investigated across subtypes and reservoir
and non-reservoir hosts, with questions remaining regarding constraints on reassortment
frequency and co-segregation bias for segments in wild birds relative to domestic birds and
swine populations. The current work set out to address these questions, centrally depending on
the development of an internal gene genotyping framework based on phylogenetic clustering
of the respective internal gene phylogenies. A new phylogenetic clustering algorithm,
PhyCLIP, was developed and described in Chapter 2 to overcome current methods’ limiting
reliance on arbitrary genetic distance thresholds for cluster definition. PhyCLIP operates on
the distribution of all branch lengths in the phylogeny, using this global patristic distance
distribution as a pseudo-null distribution against which the within-cluster distance distribution
of putative clusters is tested. PhyCLIP was validated on the WHO H5Nx clade nomenclature to
identify evolutionarily informative clusters in viral phylogenies. In Chapter 3, PhyCLIP was
applied to develop a pansubtypic genotyping nomenclature for the internal genes of avian
influenza, using a globally representative dataset of n=14 428 sequences and n=120 subtypes.
The system designated 4763 genotypes, with their diversity quantified across spatiotemporal,
host and subtype scales. Genotypic diversity was significantly unevenly distributed, with wild
birds in North America accounting for 45% of all designated genotypes and subtypes H4N6
and H3N8 for 11% each. Approximately 69% of the genotypes were singletons, reflecting the
high reassortment frequency in the natural reservoir. The evolutionary pathways generating
genotypes infecting humans were also described using the lineage assignment of the new
system, allowing for more complete tracing of progenitor genotypes across subtypes and
identification of lineage distinctions between human viruses. Chapter 4 quantified reassortment
frequencies in the pansubtypic dataset of avian and swine influenza with a new algorithm,
DeviantChild. DeviantChild quantifies phylogenetic incongruency as a measure of
reassortment frequency based on PhyCLIP’s phylogenetic clustering and patristic distance
distributional shift testing. DeviantChild detected extensive reassortment in the avian influenza
dataset and no strong evidence of reassortment bias among segments on a population scale,
supporting evidence of predominantly free reassortment of the internal genes. Reassortment
was twice and three times as high in the natural reservoir Anseriformes hosts relative to
shorebirds and domestic gallinaceous poultry, respectively, with the lowest reassortment frequencies
reported for H5Nx, H7Nx and H9Nx viruses in gallinaceous poultry. There was evidence of
segregated gene flow for the H13 and H16 subtypes in shorebird populations, supported by
evidence from the genotypic diversity distribution of gull-restricted genotypes. Chapter 5
used the genotype distribution and a comprehensive suite of diversity measurements,
accounting for sampling heterogeneity, to characterise the patterns of unobserved diversity
across HA-NA subtype, geographic region and host order. It identified a subset of low
pathogenic viruses including H4N6, H3N8, H1N1, H6N1 and H6N2 estimated to have very
high levels of undetected diversity in wild bird hosts. It also identified wild birds in China,
Guatemala and Japan as major sources of undersampled diversity, as well as domestic poultry
in Bangladesh and Pakistan, and H5N2 viruses in the USA.
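The abstract does not name the specific diversity measurements used in the thesis. As a rough illustration of how singleton-heavy genotype counts relate to estimates of unobserved diversity, the sketch below computes the Shannon index and the classic Chao1 richness estimator from a list of genotype assignments. The choice of estimators, the function names and the toy genotype labels are all assumptions made for illustration, not the author's method or data.

```python
from collections import Counter
import math


def genotype_counts(assignments):
    """Count how many sampled viruses carry each genotype label."""
    return Counter(assignments)


def singleton_fraction(counts):
    """Fraction of observed genotypes that were sampled exactly once."""
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


def shannon_index(counts):
    """Shannon diversity (natural log) of the genotype frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def chao1(counts):
    """Classic Chao1 estimate of total richness (observed plus unobserved).

    Uses the number of singletons (f1) and doubletons (f2); falls back to the
    bias-corrected form when no doubletons are present.
    """
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    if f2 > 0:
        return s_obs + (f1 * f1) / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical genotype labels for eight sampled viruses (illustrative only).
toy_assignments = ["G1", "G1", "G1", "G2", "G2", "G3", "G4", "G5"]
counts = genotype_counts(toy_assignments)

print("observed genotypes:", len(counts))
print("singleton fraction:", singleton_fraction(counts))
print("Shannon index:", round(shannon_index(counts), 3))
print("Chao1 richness estimate:", chao1(counts))
```

In this toy example most genotypes are singletons, so the Chao1 estimate is well above the observed genotype count. That is the general pattern by which a high singleton fraction signals substantial unsampled diversity.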
Keywords
Influenza, Genomic epidemiology, Evolutionary biology, Phylogenetics
Identifiers
This record's DOI: https://doi.org/10.17863/CAM.65996