Introduction
Mycobacterium bovis (M. bovis), a member of the Mycobacterium tuberculosis complex (MTBC) and a pathogen with zoonotic potential [1], is the main causative agent of bovine tuberculosis (bTB), a significant source of morbidity and mortality in the global cattle industry. In the United Kingdom (UK), the estimated annual cost of managing this disease is £120 million [2].
M. bovis has a broad host range with different wildlife reservoirs depending on geographic location: in Britain and Ireland the Eurasian badger is the predominant wildlife host, in France wild boar and deer, and in New Zealand the introduced brush-tail possum [3–5]. The presence of wildlife reservoirs makes the control and potential elimination of bTB challenging even in countries such as the UK with extensive testing and slaughter of cattle identified as bTB reactors, as well as movement restrictions imposed on herds with new bTB incidents termed breakdowns [6].
The Randomised Badger Culling Trial (RBCT) was a large-scale ecological field experiment carried out between 1998 and 2005 in England with the aim of quantifying the impact of culling badgers on the incidence of bTB breakdowns in nearby cattle herds [7]. Ten trial areas within the southwest of England and English Midlands, each of approximately 100 km2, were selected on the basis of high bTB incidence (defined as areas with the highest number of herd breakdowns in the previous three years). Each trial area was divided into triplets of randomly allocated interventions: proactive culling (widespread and repeated culling across the trial areas), reactive culling (badgers culled if breakdowns detected in nearby herds) and control or survey-only areas (no badger culling). Approximately 9,000 badgers were culled and sampled in proactive areas between 1998 and 2005 though culling was suspended between May 2001 and January 2002 due to a national foot and mouth disease epidemic. The mean prevalence of bTB infection in badgers in the first year of proactive culling was 11.3% (range 1.6% - 37.2%) [7].
A number of previous studies have established epidemiological links between badgers and nearby cattle although extent of transmission between the two host species remains uncertain [8,9]. Analyses making use of whole genome sequencing (WGS), which offers much higher resolution for strain characterisation and tracking transmission than classical genotyping methods such as Mycobacterial interspersed repetitive unit-variable number tandem repeat (MIRU-VNTR) and spoligotyping, have confirmed the close genetic relatedness of M. bovis isolates from sympatric cattle and badger populations but, due to the low genomic variability of the M. bovis genome and a lack of balanced sampling between the different host species, have not been able to adequately address the direction of transmission [10,11]. The first direct estimate of the extent and directionality of transmission between cattle and badgers suggested that transmission was up to ten times higher from badgers to cattle than vice versa in a dataset chosen for the presence of the same strain type (spoligotype SB0263) [12]. A subsequent study in Cumbria estimated that cattle to badger transmission was at least an order of magnitude higher than badger to cattle transmission [13].
This Eradication of bovine tuberculosis (ERADbTB) project was set up with the aim of using WGS data obtained from M. bovis isolates collected as part of the RBCT to 1) characterise the population structure of the bacterium within the trial areas, 2) attempt to quantify levels and directionality of M. bovis transmission between cattle and badgers and, 3) track the longer-term persistence of genetic lineages of the bacterium. Approximately 2,000 M. bovis isolates available from the RBCT were selected for sequencing with the final dataset consisting of 1,442 genomes (690 from badgers and 750 from cattle found to be infected in proactive cull trial areas respectively).
Methods
Sample selection, culturing and sequencing
A total of 2,137 M. bovis isolates from cattle (n = 1,011) and badgers (n = 1,126) were selected for culturing (all available badger and cattle isolates collected within the proactive trial areas), of which 1,838 isolates were located in the frozen archives maintained by the Animal and Plant Health Agency (APHA). All cattle isolates were collected as part of routine bTB surveillance; isolates collected after the end of the RBCT (post-2005; n = 200) were chosen if they were from herds from in or near the trial areas that had ongoing or recurrent breakdowns. Isolates were re-cultured and grown for up to six weeks or until sufficient growth was observed (n = 1,651). Isolates were heat killed in hot blocks at 80°C for 30 minutes. An adapted library construction protocol using an increased number of sixteen PCR cycles was used to generate Illumina libraries which were then sequenced at the Wellcome Sanger Institute using the Illumina HiSeq X10 platform to generate 2 x 150 bp paired-end reads. Metadata for the sequenced isolates is available on pubMLST (https://pubmlst.org/organisms/mycobacteria-spp) [14,15]. A map of the geographical locations of isolate collection (latitude and longitude) was constructed using the R v 3.5.1 [16] library ggmap [17].
Sequence QC
FastQC v0.11.9 [18] was used to generate basic quality control metrics for the raw sequence data. Sequencing reads were prefiltered using Kraken v0.10.6 [19] against a database containing all RefSeq bacterial and archeal nucleotide sequences to identify reads with similarity to Mycobacterium species. Further sequence matching was done on the Kraken results using Bracken v1.0 [20]. Samples with < 70% reads mapping to a Mycobacterium species were excluded from further analyses (n = 183).
In silico genotyping
SpoTyping v2.0 [21] was used to extract the binary representation of spoligotype patterns from the sequence reads and the M. bovis spoligotype database (https://www.mbovis.org/database.php) was used to assign SB numbers. Novel spoligotype patterns were submitted to the database to generate new SB numbers. Clonal complexes were assigned to samples using RD-analyzer v1.0 [22] with samples not identified as belonging to previously described clonal complexes (Eu1, Eu2, Af1, Af2) designated as “Other” [23–26]. Further assignment of isolates marked as “Other” to clonal complex was based on the phylogenetic lineages recently identified by Loiseau et al. [27].
Mapping and phylogenetics
Sequence reads were trimmed using Trimmomatic v0.33 [28] and mapped to the Mycobacterium bovis AF2122/97 reference genome (NC0002945) using BWA mem v0.7.17 (minimum and maximum insert sizes of 50 and 1000 respectively; S2 Table) [29]. Single nucleotide polymorphisms (SNPs) were called using SAMtools v1.2 mpileup and BCFtools v1.2 (minimum base call quality of 50 and minimum root squared mapping quality of 30) as previously described [30]. Samples with reads mapping to less than 90% of the AF2122/97 reference were excluded (n = 26). Genomic regions consisting of repetitive highly-GC-rich sequences such as proline-proline-glutamate (PPE) proteins and repeats were masked in the resulting alignment using previously published coordinates [31]. Gaps in the alignment were excluded and variant sites (n = 3565) in the subsequent masked alignment were extracted using snp-sites v 2.5.1 [32]. Maximum likelihood phylogenetic trees were constructed using IQ-tree v1.6.5 accounting for constant sites (-fconst; determined using snp-sites -C) with the built-in model testing (-m MFP) to determine the best phylogenetic model (GTR+F+R2) and 1000 ultrafast bootstraps (-bb 1000) [33]. Pairwise SNP distances were calculated for all pairs of isolates from the SNP alignment using pairsnp v1.0 (https://github.com/gtonkinhill/pairsnp).
To provide a global context for the isolates sequenced in this study, a published clonal complex Eu1 dataset (n = 2,842; S3 Table) spanning fourteen countries was assembled [10–12,31,34–47]. Sequence data were downloaded from the European Nucleotide Archive (ENA). Read trimming, sample QC, spoligotype assignment, mapping and phylogenetic tree construction were performed as above (Table A in S1 Data). The tree was rooted with a Mycobacterium caprae isolate (SRR7617662).
Transmission clusters
The R library iGRAPH [48] was used to define putative transmission clusters using a pairwise SNP distance between any two samples of 15 as the threshold. This threshold was chosen as it would allow for the possible identification of older transmission events but also allow for any variance in the rates of mutation amongst the sampled isolates, and has been previously used in a similar analysis of a human Mycobacterium tuberculosis dataset [49]. A total of 21 clusters containing more than one isolate were defined; to ensure sufficient resolution for the analysis of transmission, clusters with more than 30 isolates were retained. Visual inspection of the phylogenetic tree with the transmission clusters overlaid allowed large clusters to be manually divided into smaller clusters on the basis of clear phylogenetic divisions within the original clusters (Clusters 5–6 and Clusters 8–12). A single cluster of 47 isolates failed to achieve convergence when being analysed with TransPhylo (see below) and was removed along with a smaller cluster containing 32 isolates which left twelve clusters with more than 50 isolates for further analyses. New alignments were generated for each cluster as described above (Table A in S1 Data).
The presence of a temporal signal in each transmission cluster was investigated by plotting the root to tip distance for each isolate, calculated using the R library phytools [50], against its sampling date (Fig A in S1 Data). The slope, x-intercept (most recent common ancestor; MRCA), correlation coefficient and R2 value were calculated for each dataset in R. BEAST v1.8.4 [51] was run on each SNP alignment, using tip sampling dates for calibration. In order to identify the most suitable model for the data being analysed a selection of different molecular clock and effective population size models were tested (a HKY substitution model was used for all models): strict or relaxed molecular clock and constant or exponential population size and growth. Three runs of 108 Markov chain Monte Carlo (MCMC) iterations were performed (12 separate runs) for each transmission cluster. The performance of each model was assessed through the comparison of posterior marginal likelihood estimates [52,53] and the model with the highest Bayes factor [54] (strict clock/constant population size) was selected for each transmission cluster (Table B in S1 Data). The three selected MCMC runs were combined using LogCombiner v1.8.4 (10% burnin) and convergence was assessed (posterior effective sample size (ESS) > 200 for each parameter). A maximum clade creditability tree summarizing the posterior sample of trees in the combined MCMC runs was produced using TreeAnnotator v1.8.4. To confirm the temporal signal in each tree generated, the R library TIPDATINGBEAST [55] was used to resample tip dates from each alignment to generate 20 new datasets with randomly assigned dates. BEAST was then run on each new dataset using the same strict clock priors (Fig B in S1 Data). If the estimated substitution rates in the observed data did not overlap with the estimated substitution rates in the randomized data then the temporal signal observed in the observed data was considered not to be obtained by chance.
Transmission reconstruction was performed on each cluster using the R library TransPhylo [56] which allows for unsampled cases and within-host diversity. The same parameters (gamma shape = 1.6; scale = 3.5) were used for the infection and generation time prior distributions. The TransPhylo algorithm was run three times for 107 MCMC iterations sampling every 200,000 states and a burnin of 10% on each cluster using the MCC trees generated previously. The R library coda [57] was used to assess convergence (Gelman and Rubin’s Convergence Diagnostic < 1.05) and ESS values > 100 for within-host diversity, reproductive rate and sampling proportion (Table C in S1 Data). Post processing of each TransPhylo run was performed in R.
The BEAST2 [58] package BASTA (Bayesian Structured coalescent Approximation) [59] was used to estimate transmission rates between badgers and cattle, defined as demes, in each transmission cluster. A strict clock/equal population size model was used and the BASTA analysis was repeated three times and run for 3 x 108 MCMC iterations with 10% burnin. Convergence was assessed as above. Post processing of the BASTA analysis was performed in R.
SNP-scaled phylogenetic trees were calculated for each transmission cluster using pyjar (https://github.com/simonrharris/pyjar) [60] and plotted using the R libraries treeio and ggtree [61,62].
Cattle movements
bTB metadata was extracted from APHA’s Sam database that records all statutory bTB testing information. Cattle movement metadata were extracted from APHA’s copy of the Department of Environment, Food and Rural Affairs’ (DEFRA)’s Cattle Tracing System (CTS). Movement data were extracted for 727/752 cattle where the ear tag could be matched to the Sam database (it only became a legal requirement to record cattle movement in the CTS after January 2001 so movement data may be missing for the early part of the RBCT). Movements of TB test reactor cattle that were not subjected to laboratory culture and/or sequencing of M. bovis, but may have contributed to the spread of infection, were extracted from the CTS using the following criteria: the animals passed through the same location as an animal with a sequenced isolate, the animals were born before 2009 and the animals were classified as “reactors”. Animals were classified as reactors if they had a positive tuberculin test result, had an inconclusive test result but were slaughtered and culture positive for M. bovis, were culture positive for M. bovis following detection by routine meat inspection at a slaughterhouse, or were culture-negative reactors that led to a breakdown with other tuberculin test positive animals. UK grid coordinates were extracted from the Sam database by matching to location IDs. Past breakdown history was extracted by matching herds using county-parish-holding (CPH) numbers. Where multiple herds had the same CPH number, the active dates of the herds were checked and the individual animal test records were used to identify the correct entries. Short stay locations and locations with missing coordinates were excluded by creating animal records that removed missing locations or stays of fewer than eight days. Where subsequent movements occurred, these were connected to the previous movements to create a continuous record. Where the cattle ear tag IDs of sequenced isolates could not be matched to the database, the CPH was used to identify the final location and coordinates for plotting. The final herd of the animals with sequenced isolates was determined as the location closest to death where the length of stay was at least seven days. The data were queried and extracted from the CTS using PostgreSQL. Pairwise geographic distances between each isolate in kilometres were calculated using the distHaversine function from the R library geosphere [63]. Herd and badger locations were randomly shifted by up to 1 km in the horizontal and vertical planes for plotting using the R libraries maps and mapdata [64].
Discussion
The RBCT was set up to assess the impact of badger culling on the incidence of bTB in nearby cattle herds. In this study, a subset of the resulting badger and cattle isolates were sequenced. Thus, for the first time, WGS of large numbers of M. bovis isolates from co-located populations of both species collected contemporaneously in well-defined geographical areas can be used to address key questions surrounding bTB transmission in the high-risk regions of England.
A total of 60 unique spoligotype patterns were identified in the dataset though the majority of the isolates (1320/1442; 91.5%) were made up of the six most prevalent spoligotypes confirming that there is relatively little genetic heterogeneity at this level amongst M. bovis in the high prevalence areas of England. The observed prevalence of spoligotypes in this study closely matched previously published data, generated using traditional typing methods, from the same time period [65], showing that the dataset accurately reflects the known population structure of M. bovis at that time.
Genotyping methods such as spoligotyping are used to infer close relationships between isolates (low genetic diversity) and assume monophyly. However, it is clear from plotting spoligotypes on the phylogenetic tree in Fig 1B, that some of the spoligotypes, in particular SB0140, are polyphyletic. It has been recognised for a long time that spoligotypes can be homoplasic and identical spoligotype patterns can be found in phylogenetically unrelated strains [65]. Despite this, spoligotyping, along with MIRU-VNTR analysis, has continued to be the most commonly applied method for genotyping M. bovis as it is cheap and comparatively straightforward to implement in the laboratory. However, given the much higher resolution offered by WGS and the fact that national bodies such as APHA and the United States Department of Agriculture (USDA) are now moving towards routinely sequencing all cases of M.bovis, it may now be time to move towards a SNP-based method of typing M. bovis isolates similar to the Coll method adopted for typing M. tuberculosis sensu stricto [66].
The predominance of clonal complex Eu1 in our dataset was unsurprising as previous work, including the study that first defined Eu1 [24], has shown that this clonal complex is ubiquitous in Great Britain, Northern Ireland and the Republic of Ireland whilst being uncommon in mainland Europe, where M. bovis genetic diversity is much greater [42]. Previous work using both PCR and genomics to assign clonal complex shows that Eu1 is likely the most predominant globally-circulating M. bovis lineage and it has been found in many countries that have historically traded cattle with the UK such as New Zealand, the USA, Mexico and Uruguay [34–36,41]. The presence of this clonal complex and its spoligotypes such as SB0140 and SB0263 in these countries suggests that Eu1 has been present in the UK for at least 200 years or more [67]. This is supported by the high level of pairwise SNP diversity observed within SB0140.
The Unknown7 isolates in the dataset were found in trial areas A3, D3 and G2 and were a maximum of 12 SNPs apart. The small number of isolates suggest that it is uncommon in the UK and the low level of genetic diversity suggests that it may be part of a single, recent introduction. This clonal complex has also been found in France, Mali and the USA [27,34,42] and given the geographical proximity of France and the ongoing trade of cattle between the two countries this may be the most likely origin for this lineage.
The prevailing hypothesis for the population history of M. bovis, in particular Eu1, in the UK is that there was a single introduction followed by long term endemicity and a population bottleneck due to effective control measures beginning in the 1930s [65]. However, the observed phylogenetic structure of isolates when combined with a global collection of Eu1 isolates, indicates that there have likely been multiple, perhaps as many as four, introductions of Eu1 into England. Whilst we did not attempt to date these introductions in this study, the availability of archived isolates from the 1980s along with contemporaneous isolates will be used alongside the RBCT dataset and other published UK datasets in future work to provide estimated dates for these introductions.
Due to the large size and clear phylogenetic and geographical structure of the dataset as well as our study aims, we defined transmission clusters using a conservative pairwise SNP threshold of 15 SNPs. There is currently no consensus as to what is the best threshold to apply to Mycobacterium genome datasets with previous studies using thresholds between three and fifteen SNPs [40,49]. We chose the 15 SNP threshold as it would allow for the possible identification of older transmission events but also allow for any variance in the rates of mutation amongst the sampled isolates. We chose to use the software package TransPhylo as it integrates dates of isolate collection and genetic relatedness and allows for within-host diversity and unsampled cases. The first step of the analysis was to generate molecular dated phylogenies for each of the transmission clusters. Assessing the presence of temporal signal in a genome dataset is typically done in two ways: examining the linear relationship between root to tip distance and sampling date (under a perfect clocklike behaviour, then R2 = 1 [68]), and dated tip randomization (DTR) analysis. In DTR, the dates of sampling are repeatedly shuffled amongst the taxa and the clock rates between the observed and random data calculated and compared [69]. If there is no overlap between the estimated substitution rates of the observed data and the randomized datasets then we can conclude that the observed dataset has a stronger temporal signal than expected by chance [69]. We obtained very low or negative values for R2 for all our transmission clusters which is normally interpreted as evidence for a lack of temporal signal or else overdispersion in lineage-specific clock rates [70] (Fig A in S1 Data). The previously reported slow substitution rate of M. bovis [10,35] and the short window of sampling (twelve years) may explain the lack of association between root to tip distances and sampling dates. As root to tip regression is only a tool for exploratory analysis [70] we performed DTR on all of the transmission clusters (Fig B in S1 Data). From this, we observed that there was strong evidence of temporal signal in 5/12 of the transmission clusters, moderate temporal signal in 5/12 transmission clusters and weak temporal signal in two clusters (clusters 2 and 4; Fig B in S1 Data). To assess the impact on the overall results by including these two clusters, we reanalysed the data without Clusters 2 and 4 (S1 Data); this showed that the exclusion of these two clusters did not have a significant effect on the results or conclusions drawn when all twelve clusters were included. So, despite the especially weak evidence of temporal signal in Cluster 2, we decided to include them in our analyses as, given the close relatedness of all of the clusters, it is highly likely that the mutational process, and therefore the molecular clock, will be similar in each of them. As well as being used as input for TransPhylo molecular dating of each of the transmission clusters provided additional insights into the dataset.
Firstly, we were able to calculate substitution rates for each transmission cluster which ranged between 0.5 and 6 SNPs per genome per year (Table D in S1 Data). Published estimates for the median substitution rate of M. bovis vary between 0.15 and 0.53 substitutions per genome per year [10,35]. We were unable to provide a good explanation for why the substitution rates varied so much between the different transmission clusters, but previous work has shown that there are lineage and study specific differences in the substitution rates within the Mycobacterium tuberculosis complex (MTBC) [71]. This analysis showed that higher substitution rates were found in smaller datasets with narrow sample date ranges, which may explain the much higher substitution rates of up to six substitutions per genome per year we observed in our analyses. Future work will examine the effect sample size, sampling strategy and date range has on estimating evolutionary rates such as substitution rate. Secondly, we were able to infer the date of the MRCA of each transmission cluster (Fig 3C). The short time period (4–15 years) between the inferred date of the MRCAs and the earliest sample collection dates suggests that the transmission clusters are likely the result of recent seeding events and not a consequence of endemic disease in the form of long-term maintenance within herds or an endemic wildlife reservoir, in this case badgers. The introduction of a compulsory test and slaughter scheme in the UK in the 1950s saw a sustained decline in the annual number of infected animals removed as TB test reactors and infected cattle herds with only a few hundred reactors being detected annually in the early 1980s [65]. However, incidence increased from the late 1980s, with the UK now having one of the highest incidences of bTB in Europe. From our analyses, it is clear that the dates for the MRCA of each of the transmission clusters overlap with this expansion in the M. bovis population.
From the TransPhylo analysis we were able to estimate what proportion of infected hosts we managed to sample for each transmission cluster (Fig D in S1 Data). Sampling of all hosts infected with a disease is never complete due to a range of factors such as detection, failure to culture and in the case of genomics, issues related to sequence quality. In this study we estimate that we managed to sequence a median of 43.2% of cases across the transmission clusters though this varied from less than 30% to as high as 75% depending on the cluster. Obviously, the success of sampling has an impact on the types and quality of the inferences we are able to make. For instance, we were able to confidently identify and confirm a superspreading event in Cluster 12 due to having sequenced 38/39 of the confirmed cases in a breakdown (see below).
The incubation period of TB in cattle is generally believed to be several months and potentially years, although there is some evidence of much shorter incubation periods in other mammalian species such as cats [72]. There is also typically a lag period (occult period) between infection and detection where infections are undetectable to the standard tuberculin test [73]. The organism may also persist for several years within infected animals before they are detected (latency) and reactivation has not been demonstrated in cattle. To date, there are no firm estimates for either the duration of the occult period or of epidemiological latency, which is problematic for fitting transmission models [74] and predicting the impact of control polices [75]. Based on our analysis using TransPhylo, we can provide estimates for how long both badgers and cattle were infected before sampling. The analysis showed that, on average, badgers were infected for twice as long as cattle before sampling. The median period of infection for cattle of 0.56 years is consistent with the annual testing schedule imposed on cattle during the RBCT. Whilst there was a wide range of estimates for the length of infection the upper 95% HPD for both badgers (2.26 years) and cattle (3.36 years) were within the typical lifespans for both badgers (mean 5 years) and dairy and beef breeding cattle (up to 6 years) [76,77].
We were able to identify a small number of highly supported direct transmission events, defined as transmission pairs that had a posterior probability greater than 0.5 (Table 3). Although the majority (60/84) of these transmission events occurred between the same species, there were also 24 inter-species transmission pairs across the transmission clusters with pairwise SNP distances varying between 1 and 5 SNPs. To date, there is limited evidence of badgers and cattle directly interacting and the majority of transmission is considered to be indirect i.e. through the environment [78]. Given the inferred number of unsampled cases and small number of highly supported transmission pairs, more intensive sampling would need to be performed to better establish transmission dynamics between the different bTB host species. Despite the logistical challenges around detecting and culturing M. bovis in environmental samples, the inclusion of samples from faeces and feed troughs and other potential hosts such as rodents and cervids should be an integral part of any future work.
One of the aims of this study was to assess and quantify the directionality of transmission between cattle and badgers. For this we used a Bayesian evolutionary tool, BASTA (Bayesian Structured coalescent Approximation), to estimate the inter-species transmission rates in each of the transmission clusters. BASTA was designed to estimate evolutionary dynamics in structured populations and account for sampling biases. For the majority of our transmission clusters, badger to cattle transmission occurred more frequently even in clusters with approximately equal numbers of cattle and badgers (Fig 3E). It is worth noting that the estimated transmission rates were very low with the median number of badger to cattle transmissions across all transmission clusters estimated as 0.05 transmissions per lineage per year and the median number of cattle to badger transmissions estimated as 0.02 transmissions per lineage per year (Fig 3E). Whilst BASTA does not directly estimate intra-species transmission rates we could calculate the number of transmission events between each host species from the posterior log and tree files. These are conservative counts of the minimum number of transitions between sampled animals and their ancestors but do allow us to compare the number of inter- and intra-species transmissions. From this we were able to demonstrate that inter-species transmission occurs much less frequently than intra-species transmission in our transmission clusters and cattle to cattle transmission is more common than badger to badger transmission (Fig 3F). Two previous studies, each on small geographically localized populations, have used BASTA to estimate rates of transmission between badgers and cattle. The first estimated that badger to cattle transmission was 10.4 times more frequent than cattle to badger transmission and the second estimated that cattle to badger transmission was at least an order of magnitude higher than badger to cattle transmission [12–13]. These results, along with those described in this study, suggest that the directionality of transmission may vary between sampling area although badger to cattle transmission does appear to be more frequent. What is consistent across all the studies, however, is that intra-species transmission occurs much more frequently than inter-species transmission.
Beyond the original aims of the project such as characterising the population structure of M. bovis isolates collected as part of the RBCT and investigating inter-species transmission, the combination of genomics and the extensive cattle tracing database allowed us to characterise examples of recurrence, superspreading and long-distance transmission within the dataset. Previous work has shown that prior history of bTB within a herd is an important predictor of breakdown: 38% of herds that clear movement restrictions experience another breakdown within 24 months [79]. This suggests that infection is being maintained within herds despite repeated testing and it is estimated that between 24% and 50% of recurrent breakdowns are due to persistence within the herd [74]. We were able to use pairwise genome comparisons to identify near identical isolates that were collected up to four years apart and which were part of confirmed herd breakdowns. Examination of the cattle movements confirmed that some of these isolates were collected from animals that arrived subsequent to the dates of slaughter of infected animals as part of previous breakdowns. The similarity of these more recent isolates to the earlier isolates would suggest that the animals were infected after their arrival in the new location and that control measures following the prior breakdowns were insufficiently effective.
TransPhylo allowed us to generate plausible transmission networks where star like nodes representative of potential superspreaders (individual hosts that have a disproportionate effect on the spread of infection) could be identified (Fig 4C and 4D). We were then able to incorporate data from the CTS to identify the cattle likely acting as the source of the infections. Whilst previous work using modelling or network analysis has highlighted the importance of small numbers of farms or herds as hubs of transmission which act as superspreaders of infection [80,81], we provide the first evidence, based on genomics and cattle movement data, that particular animals within herds may also act as superspreaders potentially contributing to increased transmission between different locations if these animals are not identified before being moved. We were unable to identify any superspreaders amongst any of the sampled badgers.
From the temporal analysis of the transmission clusters we showed that these clusters mostly emerged in the 1980’s, showing that they were likely seeded 15 to 20 years before the RBCT was conducted. The most likely mechanism for this is the movement of infected cattle into a location followed by subsequent onward transmission within the herd and into the local badger populations. Given the median estimate of the MRCA of the transmission clusters was eight years before sampling began, this precluded any possibility of us sampling the index case for any of the transmission clusters. However, by incorporating cattle movement information with our transmission clusters, we were able to identify cattle infected with a particular lineage in one trial area moving to a trial area further away, highlighting the potential for long distance transmission events to seed new transmission clusters (Fig 4E and 4F). This was also recently demonstrated by Rossi et al. who identified an imported infected animal or animals as being responsible for a bTB outbreak in a region of England with no previously known wildlife infections [13]. This has important implications for infection control; even with the limited sampling we conducted, the combination of genomics and cattle movements still allowed us to identify these potential seeding events. More targeted testing and sequencing before animals are moved, particularly to lower incidence areas, would potentially identify these likely sources of infection before they are able to become established in other locations.
Potential limitations of our analysis were the choice and number of samples included in the study and known issues surrounding the lack of a strong temporal signal in M. bovis that may affect the results of any analyses based on molecular dating. Any sampling strategy we selected would not have been perfect; ideally, we would have tried to sequence all samples collected as part of the RBCT; however, this was not possible due to cost and manpower constraints so we chose to sequence only the badger and cattle isolates collected from proactive triplets excluding isolates from infected badgers culled in reactive triplets and infected cattle culled as part of contemporaneous breakdowns. From our TransPhylo analysis we estimated that we managed to sample approximately 40% of infected cases across our transmission clusters. Despite this, the size of the dataset was still large enough to generate several large transmission clusters that allowed us to draw robust conclusions about transmission, notably directionality of transmission between badgers and cattle. Comparison of the spoligotype distribution in our study to earlier work confirmed that our dataset was representative of the known population structure during the RBCT.
We know from previous work that the lack of a strong temporal signal is a potential issue when attempting to accurately date the origin of particular lineages [71]. The results of the dated tip randomization analysis indicated that there was moderate or strong temporal signal in nearly all of our transmission clusters; however, two of our transmission clusters notably Cluster 2 had a weak temporal signal. The range of substitution rates we estimated for some of our transmission clusters was also higher than previously observed which may have affected the estimated dates of those transmission cluster’s MRCAs. Overall, however, even if individual clusters such as Cluster 2 with little or no temporal signal or Cluster 3 with a high substitution rate are of concern, the conclusions we have drawn are based on considering the results from twelve different transmission clusters composed of over 1,200 genomes and thus can be considered robust.
Multiple previous studies have shown that bTB transmission is complicated, unlikely to be driven by a single mechanism and is strongly associated with the setting and host dynamics of the system being studied. Here we used the largest single country genome dataset alongside the national cattle movement database to attempt to address key questions around bTB transmission in a multi-host, intensive setting. Whilst both the TransPhylo and BASTA results support inter-species transmission with some evidence that there is broadly more badger to cattle transmission than in the opposite direction, it is clear that the majority of ongoing transmission is occurring within cattle herds and within the badger populations. Spillover in either direction could then be considered to be occurring at a low level and, based on the dates of their MRCAs, the transmission clusters we defined are likely to have been the result of recent seeding events and are primarily being maintained by within-species transmission. We have also provided the first genomics-based estimates for the length of time that badgers and cattle are infected with bTB before sampling. Finally, we were able to characterise recurrence, superspreading and long-distance transmission within our transmission clusters.