A scalable analytical approach from bacterial genomes to epidemiology.

Published version




Parkhill, Julian 


Recent years have seen a remarkable increase in the practicality of sequencing whole genomes from large numbers of bacterial isolates. The availability of these data has huge potential to deliver new insights into the evolution and epidemiology of bacterial pathogens, but the scalability of the analytical methodology has been lagging behind that of the sequencing technology. Here we present a step-by-step approach for such large-scale genomic epidemiology analyses, from bacterial genomes to epidemiological interpretations. A central component of this approach is the dated phylogeny, which is a phylogenetic tree with branch lengths measured in units of time. The construction of dated phylogenies from bacterial genomic data needs to account for the disruptive effect of recombination on phylogenetic relationships, and we describe how this can be achieved. Dated phylogenies can then be used to perform fine-scale or large-scale epidemiological analyses, depending on the proportion of cases for which genomes are available. A key feature of this approach is computational scalability, and in particular the ability to process hundreds or thousands of genomes within a matter of hours. This is a clear advantage of the step-by-step approach described here. We discuss other advantages and disadvantages of the approach, as well as potential improvements and avenues for future research. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.


Funder: National Institute for Health Research (NIHR)


bacterial genomics, dated phylogeny, infectious disease epidemiology, recombination, Bacteria, Genome, Bacterial, Genomics, Phylogeny

Journal Title

Philos Trans R Soc Lond B Biol Sci


The Royal Society
Public Health Research Programme (NIHR200892)