Latent feature models and non-invasive clonal reconstruction
Intratumoural heterogeneity complicates the molecular interpretation of biopsies, as multiple distinct tumour genomes are sampled and analysed at once. Ignoring these subpopulations can lead to erroneous conclusions, so a correct analysis must account for the clonal structure of the sample. Several methods have been proposed to reconstruct tumour clonality from sequencing data, ranging from methods that do not consider phylogenetic constraints to methods that posit a perfect phylogeny. Models of the first type are typically latent feature models that describe the observed data flexibly, but whose results may not be reconcilable with a phylogeny. Models of the second type, by contrast, are generally non-parametric mixture models that make strict assumptions about the tumour's evolutionary process. This dissertation focuses on the development of a phylogenetic latent feature model that combines the advantages of these two approaches while allowing deviations from a perfect phylogeny. The work is presented through three statistical models of increasing complexity. First, I present a non-parametric model based on the Indian Buffet Process prior, and highlight the need for phylogenetic constraints. Second, I develop a finite, phylogenetic extension of this model, and show that it can outperform competing methods. Third, I generalise the phylogenetic model to arbitrary copy-number states. I present Markov chain Monte Carlo algorithms to perform inference. The models are tested on synthetic data, controlled biological data, and clinical data. In particular, the copy-number generalisation is applied to longitudinal circulating tumour DNA samples.
Liquid biopsies that leverage circulating tumour DNA require sensitive techniques to detect mutations at low allele fractions. One method that enables sensitive mutation calling is tagged-amplicon deep sequencing (TAm-Seq). I present bioinformatic tools that improve both the design of TAm-Seq amplicon panels and the analysis of the resulting sequencing data. Finally, I present an enhancement of this method and show that it can detect mutations de novo, in a multiplexed manner, at allele fractions below 0.1%.