Genomic evolution, transmission and pathogenesis of
These authors contributed equally: Chrispin Chaguza, Madikay Senghore
These authors jointly supervised this work: Martin Antonio, Stephen D. Bentley, Brenda A. Kwambana-Adams
Nasopharyngeal colonisation facilitates the evolution and transmission of the pneumococcus and other respiratory tract pathogens; therefore, it is a key determinant of the strain population dynamics
In this work, we investigate the within-host dynamics, genomic diversity, and microevolution of pneumococcal strains during natural colonisation in newborn infants in The Gambia, Sub-Saharan Africa (SSA), a relevant setting with a high burden of invasive pneumococcal disease (IPD) and colonisation rates of up to ≈97% in infants <1 year old
We recovered

The newborn infants were recruited into the study at birth, and nasopharyngeal swabs were taken within the first week after birth, every two weeks until six months of age, and then monthly until the infants were one year old, at which point sampling stopped. The analysis of these longitudinal data involved fitting multi-state and other models to determine colonisation dynamics in the infants during the first year of life, and whole-genome analysis to assess the within-host genetic diversity, recombination and mutation rate of the isolates. The map of The Gambia was generated by the authors in R using the ggmap v3.0.0 package (
We defined transient and extended colonisation episodes as the detection of an isolate of the same serotype at a single sampling point and at consecutive sampling points, respectively (Fig.
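The episode definition above can be illustrated with a minimal sketch. It assumes one serotype call per sampling point (co-colonisation is ignored) and treats any break in detection as the end of an episode; the function name and the example visit sequence are hypothetical and are not taken from the study's analysis code.

```python
from itertools import groupby

def classify_episodes(serotype_by_visit):
    """Group consecutive sampling points with the same serotype into episodes.

    serotype_by_visit: list with one entry per consecutive sampling point,
    holding the detected serotype, or None when no pneumococcus was isolated.
    Returns a list of (serotype, number_of_sampling_points, kind) tuples.
    """
    episodes = []
    for serotype, run in groupby(serotype_by_visit):
        n_points = len(list(run))
        if serotype is None:
            continue  # uncolonised sampling points do not form an episode
        # Single detection = transient; consecutive detections = extended.
        kind = "transient" if n_points == 1 else "extended"
        episodes.append((serotype, n_points, kind))
    return episodes

# Hypothetical sequence of swab results for one infant.
visits = ["19A", "19A", "19A", None, "6B", None, "19A", "19A"]
episodes = classify_episodes(visits)
for episode in episodes:
    print(episode)
```

In this toy sequence the second run of 19A is counted as a separate episode of the same serotype, which mirrors the episode counter in the A:B:C naming scheme used in the tables.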
Of the 1553 pneumococcal samples collected from the infants, 1074 isolates had a whole-genome sequence available and were analysed to infer the within-host genetic diversity of strains during extended colonisation episodes (Supplementary Data
We then conducted an in-depth analysis of the within-host genetic diversity of the strains in each episode. The mean number of SNPs between consecutively sampled isolates of the same serotype and ST from the same episode (two weeks apart) was 14.8 (range: 3–150), but the mean number of SNPs between all the isolates in an episode ranged from 3 to 27.5 across serotypes (Fig.

Strip charts, box plots and violin plots showing the number of SNPs between isolates of the same serotype and ST within the same episode. Isolates sampled five or fewer weeks apart are coloured light blue, while those sampled more than six weeks apart are shown in darker blue.

For some serotypes, for example 11A, 16F, 19A, 23F, 6A, 6B and NT, the genetic diversity of some strains was much higher than that of the other strains in the episode, which suggested the occurrence of evolutionary processes other than random substitution, particularly genomic recombination. The
Homologous recombination is the major driver of evolution in bacterial pathogens

Episodes with high intra-episode recombination rate during natural colonisation.

| Episode | ST | (unlabelled in source) | (unlabelled in source) | Recombination blocks: SNPs inside | Recombination blocks: SNPs outside | Recombination blocks: Frequency |
|---|---|---|---|---|---|---|
| INF55:21:1 | ST11730 | 8.29 (0,30) | 0.01 (0,0.07) | 2.14 (0,15) | 6.14 (0,15) | 0.14 (0,1) |
| INF71:20:1 | ST10625 | 8.38 (0,21) | 0.05 (0,1) | 0.86 (0,12) | 7.52 (0,21) | 0.1 (0,1) |
| INF67:19A:1 | ST847 | 8.71 (0,28) | 0.02 (0,0.15) | 2.14 (0,15) | 6.57 (0,13) | 0.29 (0,2) |
| INF74:19A:1 | ST847 | 8.78 (0,27) | 0.02 (0,0.12) | 2.78 (0,19) | 6 (0,15) | 0.22 (0,1) |
| INF42:9V:1 | ST11758 | 9 (0,25) | 0.01 (0,0.09) | 1.46 (0,10) | 7.54 (0,16) | 0.15 (0,1) |
| INF65:18A:1 | ST241 | 9 (0,26) | 0.01 (0,0.11) | 1 (0,17) | 8 (0,22) | 0.06 (0,1) |
| INF11:23B:1 | ST5706 | 9.08 (0,21) | 0.01 (0,0.12) | 1 (0,13) | 8.08 (0,21) | 0.08 (0,1) |
| INF84:15A:1 | ST10618 | 9.31 (0,31) | 0.01 (0,0.12) | 0.69 (0,5) | 8.62 (0,26) | 0.15 (0,1) |
| INF73:9L:1 | ST11705 | 9.44 (0,41) | 0.08 (0,0.5) | 4 (0,16) | 5.44 (0,26) | 0.56 (0,2) |
| INF89:6A:1 | ST10801 | 9.71 (0,22) | 0.01 (0,0.06) | 0.71 (0,5) | 9 (0,22) | 0.14 (0,1) |
| INF19:35B:2 | ST11721 | 10 (0,41) | 0.03 (0,0.17) | 4.11 (0,27) | 5.89 (0,17) | 0.33 (0,1) |
| INF47:13:1 | ST11710 | 10.14 (0,50) | 0.02 (0,0.12) | 4.71 (0,33) | 5.43 (0,17) | 0.29 (0,2) |
| INF56:19A:1 | ST847 | 10.56 (0,29) | 0.05 (0,0.2) | 2 (0,8) | 8.56 (0,27) | 0.33 (0,1) |
| INF26:23F:1 | ST2174 | 12.26 (0,159) | 0.01 (0,0.11) | 7.89 (0,150) | 4.37 (0,19) | 0.05 (0,1) |
| INF85:13:2 | ST11711 | 12.29 (0,33) | 0.03 (0,0.12) | 3.86 (0,16) | 8.43 (0,21) | 0.57 (0,2) |
| INF63:19A:1 | ST2174 | 13 (0,31) | 0.01 (0,0.05) | 1.86 (0,8) | 11.14 (0,23) | 0.29 (0,1) |
| INF20:19A:1 | ST11691 | 15.15 (0,129) | 0.03 (0,0.33) | 10.46 (0,123) | 4.69 (0,13) | 0.23 (0,2) |
| INF61:11A:2 | ST5902 | 15.91 (0,107) | 0.03 (0,0.25) | 9.18 (0,83) | 6.73 (0,24) | 0.64 (0,6) |
| INF57:11A:1 | ST5902 | 16.62 (0,175) | 0.03 (0,0.25) | 13.31 (0,169) | 3.31 (0,7) | 0.15 (0,1) |
| INF59:6BE:1 | ST5516 | 121.89 (0,1075) | 0.04 (0,0.33) | 118.11 (0,1063) | 3.78 (0,12) | 0.44 (0,4) |

The episode name is shown in the format A:B:C, where A, B and C represent the infant ID, serotype and number of episodes with the serotype, respectively. The value of
Multiple isolates of the same serotype but different STs were also detected in some episodes. Such co-existence of highly divergent isolates with the same serotype but different STs occurred during 14 episodes (Supplementary Table
We then assessed the overall contribution of recombination to within-host pneumococcal diversity during the episodes with >2 sequenced isolates of the same serotype and ST (Table
We then used 60 extended episodes with >4 sequenced genomes to infer within-host substitution rates. We estimated the number of accrued substitutions, and the time taken to accumulate them, in each episode using the onset strain of the episode as the baseline. To assess whether the accumulation of substitutions was time-dependent, that is, consistent with molecular-clock evolution, we fitted a linear regression model of the number of accrued substitutions against the corresponding time (Fig.

Episodes where a molecular-clock signal was evident were analysed. Serotypes with >4 sequenced genomes per individual were included in the analysis. The episode name is shown in the format A:B:C, where A, B and C represent the infant ID, serotype and number of episodes with the serotype, respectively. The linear relationship between sampling time and the number of SNPs accrued relative to the reference genome sequenced at the onset of the episode was assessed using linear regression. The nucleotide substitution rate (

Within-host nucleotide substitution rates during natural colonisation.

| Episode | ST | (unlabelled in source) | (unlabelled in source) | Estimate ( | Substitution rate ( | SNPs year^−1 | (unlabelled in source) | (unlabelled in source) |
|---|---|---|---|---|---|---|---|---|
| INF19:35B:1 | 11721 | 0.94 | 0.91 | 1.06 | 2.49 × 10^−5 | 55 | 15.8 | 2.99 × 10^−2 |
| INF5:34:1 | 7319 | 0.76 | 0.72 | 0.43 | 1.00 × 10^−5 | 22 | 10.5 | 2.35 × 10^−3 |
| INF55:19A:1 | 10542 | 1 | 1 | 0.13 | 2.93 × 10^−6 | 7 | 29.7 | 2.60 × 10^−16 |
| INF59:6B:1 | 5516 | 1 | 1 | 1.5 | 3.52 × 10^−5 | 78 | 72.2 | 1.33 × 10^−16 |
| INF66:34:1 | 1778 | 0.92 | 0.89 | 1.44 | 3.38 × 10^−5 | 75 | 2.63 | 9.88 × 10^−3 |
| INF7:22A:1 | 10600 | 0.78 | 0.70 | 1.11 | 2.60 × 10^−5 | 58 | 3.39 | 4.80 × 10^−2 |
| INF73:9L:1 | 11705 | 0.90 | 0.86 | 2.75 | 6.46 × 10^−5 | 143 | 1.22 | 4.92 × 10^−2 |
| INF76:19A:1 | 4029 | 0.37 | 0.30 | 0.16 | 3.81 × 10^−6 | 9 | 12.6 | 3.68 × 10^−2 |
| INF85:13:2 | 11711 | 1.0 | 1.0 | 0.89 | 2.10 × 10^−5 | 47 | 13.3 | 1.47 × 10^−2 |
| INF90:6A:1 | 11700 | 0.62 | 0.56 | 0.92 | 2.17 × 10^−5 | 48 | 3.27 | 2.05 × 10^−2 |
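The regression of accrued SNPs against time can be sketched as follows. This is an illustrative least-squares fit on hypothetical data, not the authors' analysis code. The genome length of 2.2 Mb is an assumption used only to convert SNPs per year into a per-site rate, and the factor of 52 weeks per year is likewise an assumed conversion (the tabulated values are consistent with it, for example 1.06 × 52 ≈ 55, but the text does not state it).

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

# Assumption: approximate pneumococcal genome length in base pairs.
GENOME_LENGTH_BP = 2_200_000
WEEKS_PER_YEAR = 52

# Hypothetical episode: weeks since the onset isolate and SNPs accrued
# relative to that onset isolate.
weeks = [0, 2, 4, 6, 8]
accrued_snps = [0, 1, 2, 3, 4]

slope, intercept, r_squared = fit_line(weeks, accrued_snps)
snps_per_year = slope * WEEKS_PER_YEAR
rate_per_site_per_year = snps_per_year / GENOME_LENGTH_BP

print(f"slope = {slope:.2f} SNPs/week, R^2 = {r_squared:.2f}")
print(f"{snps_per_year:.0f} SNPs/year, {rate_per_site_per_year:.2e} per site per year")
```

A slope that is clearly positive with a high coefficient of determination is what the text refers to as a molecular-clock signal; episodes without such a signal would not yield a meaningful rate from this kind of fit.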
The probability of a parallel SNP occurring at any random location in the pneumococcal genome is extremely low (≈2.46 × 10^−12 within a year and ≈9.07 × 10^−16 within a week), which implies that the occurrence of such mutations reflects adaptive evolution. Since
The most common parallel genic SNPs occurred in genes encoding the penicillin-binding protein

Type of parallel SNP is shown by different panels in the figure as follows;
We assessed the frequency of SNPs and compared the ratio of non-synonymous to synonymous SNPs in the genes mutated during extended colonisation episodes. The highest number of SNPs were found in
Our findings provide compelling evidence that within-host genetic diversification of pneumococcal strains is rapid and adaptive during extended natural colonisation. Since our study was conducted in an African setting, where carriage rates in infants <1 year old ranging from 72 to 97% are among the highest globally
The average pairwise genetic distance between isolates sampled from the same host during extended natural colonisation was higher than would be expected assuming
Strain interactions are vital for pneumococcal colonisation
Our results suggest that within-host evolution is adaptive since the occurrence of parallel mutations is unlikely to be due to chance alone
Our findings show rapid within-host microevolution of
One thousand five hundred and fifty-three nasopharyngeal swabs were collected from 98 infants from 21 villages in rural areas via the Sibanor Nasopharyngeal Microbiome study in the Gambia, West Africa, between November 2008 and April 2009
To investigate colonisation dynamics of the strains, we defined a multi-state model with two intermittently observed states: colonised and uncolonised. The uncolonised state referred to a swab that yielded no pneumococcal isolates. We defined a colonisation episode as the detection of the same serotype from its acquisition to its clearance, similarly to Turner et al.
Genomic DNA was extracted from pure pneumococcal colonies
The genetic distance between a pair of isolates was estimated as the number of SNPs distinguishing them based on the whole-genome sequence alignment using snp-dists v0.6.3 (
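As a rough illustration of this distance measure, the sketch below counts SNP differences between pairs of sequences in an alignment. It is written for clarity rather than speed, and it is not snp-dists itself. Skipping positions where either sequence has a gap or an ambiguous base is an assumption made here, and the isolate names and sequences are hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")

def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences carry an unambiguous
    base (A, C, G or T) and the bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )

def pairwise_snp_distances(alignment):
    """alignment: dict mapping isolate name -> aligned sequence.
    Returns a dict mapping (name_i, name_j) -> SNP distance."""
    return {
        (i, j): snp_distance(alignment[i], alignment[j])
        for i, j in combinations(sorted(alignment), 2)
    }

# Hypothetical toy alignment of three isolates from one episode.
alignment = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "ACNTTCGA",
}
distances = pairwise_snp_distances(alignment)
for pair, d in distances.items():
    print(pair, d)
```

Averaging these pairwise counts over the isolates of one episode gives the kind of per-episode mean SNP distance reported in the results.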
To detect the occurrence of recombination, natural selection, and parallel evolution within extended colonisation episodes, we selected strains from episodes with >3 sequenced genomes. We assessed the distribution of SNPs in the affected genes using the crude ratio of the number of non-synonymous substitutions per kilobase pair (d
Further information on research design is available in the
We would like to thank the study participants and guardians. We acknowledge support from the Research Molecular Microbiology Team at Medical Research Council (MRC) Unit The Gambia at the London School of Hygiene and Tropical Medicine, and the Sequencing and Pathogen Informatics, and Genomics of Pneumonia and Meningitis (and Neonatal Sepsis) teams at the Wellcome Sanger Institute. We would also like to thank Dr Bernard Beall and Dr Allen S. Craig at the Centers for Disease Control and Prevention (CDC) for critically reviewing the manuscript. The study was funded by the Medical Research Council (MRC) Unit The Gambia at London School of Hygiene and Tropical Medicine and the Bill and Melinda Gates Foundation (award no. OPP1034556 to K.P.K., R.F.B., L.M.G. and S.D.B.). C.C. and S.D.B. were funded by the Joint Programme Initiative for Antimicrobial Resistance (JPIAMR). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript, and the findings do not necessarily reflect the views and policies of the authors’ institutions and funders.
B.A.K.A., M.A. and R.A. conducted the Sibanor Nasopharyngeal Microbiome study, including the field activities and sample collection. The Global Pneumococcal Sequencing (GPS) project was led by K.P.K., R.F.B., L.M.G. and S.D.B. C.C., M.S., B.A.K.A. and S.D.B. planned the genomic analysis. P.E.T., E.F.N., R.E.B., F.C. and C.O. performed bacteriology work. R.A.G., S.W.L. and S.D.B. performed genome sequencing, MLST and genome-based serotyping. A.W. performed data management and quality checks. C.C., M.S. and E.B. performed whole genome and statistical analysis. C.C., M.S., M.A., S.D.B. and B.A.K.A. drafted the manuscript. M.B. contributed to discussions and data interpretation. All the authors have reviewed and approved the manuscript.
The whole-genome sequences (reads) were deposited into the European Nucleotide Archive (ENA) and are publicly available under the accession numbers provided in Supplementary Data
The authors declare no competing interests.