Inferring Determinants of Viral Transmission using Short-Read Sequence Data

Lumby, Casper Kaalø

In order to spread, pathogens must not only be able to grow within an infected host, but also transmit to found new infections. In this thesis, I present a new population genetic framework that draws insights into viral transmission events from genome sequence data collected before and after transmission. Previous attempts at bottleneck estimation have neglected the underlying genetic structure of viruses, relying instead on less informative single-locus statistics.

I first examine the problem of constructing reliable haplotypes from short-read sequence data, comparing how well exhaustive and minimal approaches capture the linkage characteristics of the viral population. I then present a simple method for bottleneck inference, set in a multi-locus context and supported by haplotype inference.
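To make the idea of a multi-locus bottleneck concrete, the following is a minimal sketch and not the thesis's own implementation. It assumes that donor haplotype frequencies are already known (for example, from haplotype reconstruction), that the recipient is founded by N viruses drawn multinomially from those frequencies, and that recipient reads are a multinomial sample of the founder haplotype frequencies. Within-host growth, selection and additional sequencing noise are ignored. All function names are hypothetical, and the sum over founder compositions is only tractable for small N and few haplotypes.

```python
import math
from typing import Iterator, List, Sequence


def compositions(total: int, parts: int) -> Iterator[List[int]]:
    """Yield every way of splitting `total` founders among `parts` haplotypes."""
    if parts == 1:
        yield [total]
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield [first] + rest


def log_multinomial(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Log multinomial pmf; -inf if a positive count has zero probability."""
    log_p = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        if c == 0:
            continue  # contributes nothing to the log-pmf
        if p == 0.0:
            return -math.inf
        log_p += c * math.log(p) - math.lgamma(c + 1)
    return log_p


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(values))), ignoring -inf entries."""
    finite = [v for v in values if v != -math.inf]
    if not finite:
        return -math.inf
    m = max(finite)
    return m + math.log(sum(math.exp(v - m) for v in finite))


def bottleneck_log_likelihood(n_founders: int, donor_freqs: Sequence[float],
                              recipient_reads: Sequence[int]) -> float:
    """Hypothetical likelihood of recipient read counts given bottleneck size,
    marginalising over the unknown haplotype composition of the founders."""
    terms = []
    for founders in compositions(n_founders, len(donor_freqs)):
        log_prior = log_multinomial(founders, donor_freqs)  # transmission step
        if log_prior == -math.inf:
            continue
        founder_freqs = [k / n_founders for k in founders]
        # Reads are modelled as a multinomial sample of founder frequencies.
        terms.append(log_prior + log_multinomial(recipient_reads, founder_freqs))
    return log_sum_exp(terms)


def estimate_bottleneck(donor_freqs: Sequence[float],
                        recipient_reads: Sequence[int], max_n: int = 20) -> int:
    """Maximum likelihood bottleneck size over a grid 1..max_n."""
    return max(range(1, max_n + 1),
               key=lambda n: bottleneck_log_likelihood(n, donor_freqs, recipient_reads))


# Usage: a donor carrying two haplotypes at equal frequency, and a recipient
# in which only the first haplotype is observed.
estimate = estimate_bottleneck([0.5, 0.5], [100, 0])
```

When only one of two equally common donor haplotypes appears in the recipient, the likelihood favours a single founder. A recipient that retains both haplotypes cannot have been founded by one virus, so it pushes the estimate above one.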

I next develop this model to incorporate selection for increased transmissibility, the effects of within-host growth, and noise arising from the sequencing process. Central to the method is a probabilistic model in which unknown variables are marginalised over using compound distributions. Model selection uses a maximum likelihood scheme, and I devise a machine-learning approach, referred to as adaptive BIC, to interpret the resulting likelihood statistics. I rigorously validate the performance of my model, identifying regimes in which selection inference is feasible, and benchmark it against current state-of-the-art bottleneck inference algorithms, demonstrating that my approach offers greater realism and specificity.
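As an illustration of marginalising an unknown variable with a compound distribution, the sketch below uses the Dirichlet-multinomial. This is a standard choice, but it is an assumption here; it is not necessarily the specific compound distribution used in the thesis. Unknown haplotype frequencies p ~ Dirichlet(alpha) are integrated out of a multinomial read-count model, which gives a closed-form likelihood. The sketch also shows the standard Bayesian information criterion. The adaptive BIC described above is the thesis's own development and is not reproduced here. Function names and the example concentration parameters are hypothetical.

```python
import math
from typing import Sequence


def dirichlet_multinomial_logpmf(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of `counts` after integrating out multinomial
    probabilities p ~ Dirichlet(alpha), i.e. a compound distribution."""
    total = sum(counts)
    a_sum = sum(alpha)
    log_p = math.lgamma(total + 1) + math.lgamma(a_sum) - math.lgamma(total + a_sum)
    for n_i, a_i in zip(counts, alpha):
        log_p += math.lgamma(n_i + a_i) - math.lgamma(a_i) - math.lgamma(n_i + 1)
    return log_p


def bic(log_likelihood: float, num_params: int, num_obs: int) -> float:
    """Standard Bayesian information criterion (lower values are preferred)."""
    return num_params * math.log(num_obs) - 2.0 * log_likelihood


# Usage: score observed haplotype read counts under two hypothetical
# concentration settings (alpha = scale * donor frequencies), then compare
# fits with the standard BIC.
reads = [60, 40]
donor = [0.5, 0.5]
for scale in (2.0, 50.0):
    alpha = [scale * q for q in donor]
    ll = dirichlet_multinomial_logpmf(reads, alpha)
    score = bic(ll, num_params=1, num_obs=sum(reads))
```

A larger concentration makes the recipient frequencies cluster more tightly around the donor frequencies. The compound form lets the likelihood account for that uncertainty without enumerating every possible frequency vector.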

I then extend the transmission model to account for further aspects such as selection for within-host viral adaptation, yielding a more realistic description of within-host growth processes. Accounting for within-host selection, I apply my transmission model to an experimental influenza transmission dataset in ferrets, providing novel quantitative insights.

I further explore limitations inherent to my model and consider regimes in which the neutral version of my algorithm may be applied. I define and infer effective within-host selection for an influenza transmission study in pigs, and use my model to show that the transmission bottleneck in these animals is generally narrow.

Finally, I consider an influenza human challenge study and compute an effective single-segment within-host selection profile from an existing multi-segment characterisation. I discuss how human challenge studies relate to influenza infections acquired in a natural context.

Advisor: Illingworth, Christopher
Keywords: Influenza, virus, transmission, evolution, population genetics, bottleneck, haplotype, sequence data, selection
Qualification: Doctor of Philosophy (PhD)
Awarding Institution: University of Cambridge
My PhD was funded by a Wellcome Trust Studentship with grant number 105365/Z/14/Z.