Distinct bacterial trophic networks exist in the gut microbiota of individuals in industrialized and non-industrialized countries. In particular, non-industrialized gut microbiomes tend to be enriched with
Analysis of gut microbiomes from Gambian infants reveals bacterial trophic networks based around
These authors contributed equally: Marcus De Goffau, Amadou T. Jallow, Dora I.A. Pereira, Josef Wagner.
Development of the gut microbiome in infants (aged 0 to 1 year) and toddlers (aged 1–3 years) has been an area of intense research but has mainly focused on European and US populations. Development of the infant gut microbiota in pre-industrial populations is therefore of interest. Several important studies have shown that pre-industrial microbiotas are dominated by
An interesting concept within the human gut microbiome is the enterotype hypothesis which clusters the gut microbiome compositions of individuals, from a simplified point of view, into those dominated by
In the current study, we utilized data from an iron intervention trial in The Gambia, West Africa
We performed 16S ribosomal RNA amplification and sequencing on a total of 1,546 faecal samples from a secondary study outcome as a part of a double-blind, randomized iron intervention trial (
After quality filtering and applying exclusion criteria, 1,389 samples remained. A detailed description of the participants, exclusion criteria and sampling framework is available in the Methods. Children were enroled between the ages of 7 and 37 months, and samples were collected at three different time points (day 1, day 15 and day 85) during an iron intervention trial in The Gambia
Multivariate and univariate analyses were conducted to identify whether children from the treatment arm (iron supplementation) and from the placebo group could be analysed together. Multidimensional scaling using principal coordinates analysis (PCoA) did not cluster children differently based on treatment at the individual time point (Supplementary Fig.
The alpha-diversity indexes for Fisher’s alpha parameter, Simpson’s index, Chao1 richness index and Richness index (observed richness) were not significantly different between the treatment and placebo group (Extended Data Fig.
The PCoA of the gut microbiome stratified into three age group (7–12 months, 1–2 years and 2+ years, age taken at time of sampling) showed distinctive clusters with the youngest and oldest groups separated most from each other (Extended Data Fig.
The R package MaAsLin2 was used to find taxa significantly associated with age, season (wet or dry) and treatment group. In the combined time point analysis, the individual subject identifier was used as a random effect to account for repeat sampling of participants. We also analysed the three individual time points separately. All statistically significant taxa (
In the combined time point analysis, 16 taxa were negatively associated with increasing age including
To account for the association between age groups and time point, a mixed-effects linear regression model was utilized. These analyses were restricted to the top 50 taxa with a minimum abundance of 0.2% (91% of all 16S rRNA reads). Importantly, no adjustments for multiple comparisons were made to the 95% confidence intervals. First, the effect of time point on total sum scaling transformed and cumulative sum scaling) log2 normalized data (TSS + CSS) was assessed using a model that adjusted for season, site and age at enrolment. Repeated sampling was included as a random effect. These results indicated that no species showed notable changes at day 15 or day 85 when compared with day 1, either by including the three age-group variables (Supplementary Fig.
The largest mean changes (just over one unit of TSS + CSS log2 data), were negligible in comparison with the changes between age groups (results below).
After establishing that time point had a minimal effect on abundances, we also estimated the effect of age group at sampling using all available data, while simultaneously adjusting for season and site (geographic location) with repeated sampling included as a random effect. The mean changes for all 50 taxa were between −4 and 3.4 unit of TSS + CSS log2 data for the 1–2 years middle age group compared with the 7–12 months young age group, and between −9.5 and 6 for the >2 years old age group compared with the 7–12 months age group (Extended Data Fig.
A detailed presentation of the top ten taxa with a minimum abundance of 1% showing changes between day 1, day 15 and day 85 in the young, middle and old age groups are shown in Fig. Top ten taxa with a minimum abundance of 1% identified through mixed-effect linear regression associated with the three age groups stratified by the three sampling time point. Relative abundances (%) are plotted on the
Scatterplots of the ten most abundant taxa across the 11 3-month age groups visualize how all of these taxa either increase or decrease significantly in abundance over time (Fig. Significantly differentially abundant bacterial taxa on the TSS + CSS log2 data. The box plot shows the minimum, first quartile, median, third quartile and maximum values. The
In a second analysis, we plotted the bacterial taxa most strongly associated with age according to MaAsLin2 analysis (Supplementary Table
Clusters of co-associated bacteria can be caused purely by temporal or, for example, medication-induced patterns, depending on the type of dataset, but they can also reflect food webs of metabolically interdependent organisms
The largest subcluster identified (teal box) encompassing 9 of the top 50 taxa, includes
The clusters below the blue line in the middle of the heatmap are less coherent because they are mainly positively correlated with one another as they all decrease in abundance over time. In one of these subclusters, indicated with a brown box, various small intestinal species like
The most dominant species,
A summary of the bacterial network is shown in Fig.
Principal component analysis (PCA) combined with Spearman’s rho correlation analysis using clusters from Fig.
In this study, we utilized a large cohort of children from an iron intervention trial in a rural region of The Gambia. This large study gave us an excellent opportunity to analyse in detail the microbiome of children during the first 40 months of life, a critical time frame in the development of the gut microbiome of humans in both industrialized and non-industrialized societies
This study does not aim to disentangle the discussion about enterotypes yet it does provide insights regarding what has thus far been regarded as the
We have summarized several published and well-known studies on the importance of
The trophic network of which
Similar to infant microbiotas in industrialized countries, the genus
The high abundance of
An additional interesting detail is the abundance of
In conclusion, our study has provided insights into the development of a
The stool samples used in this study were taken from the IHAT-GUT ‘The Iron Hydroxide Adipate Tartrate’ trial (
The study area included 45 villages in the Wuli and Sandu districts, situated approximately 400 km east of the capital Banjul, on the north bank of the Gambia River. All villages had access to borehole tap water at central places and are typical of rural sub-Saharan Africa. A detailed description of the study design, child cohort, recruitment, screening, intervention and ethnic statement are present in Gates Open Research
The trial was conducted in accordance with the ethical principles that have their origin in the Declaration of Helsinki, and that are consistent with the International Conference on Harmonisation (ICH) requirements for Good Clinical Practice (GCP), and the applicable regulatory requirements. The study sponsor was the London School of Hygiene and Tropical Medicine (LSHTM) and the study was conducted at the Medical Research Council (MRC) Unit The Gambia at LSHTM (MRCG).
The URR from which the cohort was recruited has an approximate population of 200,000 and only one major town, Basse; it is otherwise typical of rural sub-Saharan Africa. The URR has the highest mortality rate of children under 5 years in the country (92 deaths per 1,000 livebirths), the highest percentage of severely malnourished children (7–11%), and the highest prevalence of malaria and anaemia in children under 5 years (4.5% and 82.5%, respectively)
For this large clinical trial, the Kato-Katz method was used to collect information about helminth parasite egg count and faecal calprotectin. Very few children had helminths because there is a national programme of anti-helmitic metronidazole every 6 months in The Gambia. Children were not given anti-helminths at the start of the study. Regarding gut inflammation, the Gambia team did measure faecal calprotectin. This was a secondary outcome of the trial and will be reported in the main trial outcome paper.
The sample size was chosen so that the IHAT-GUT study was adequately powered for the first primary objective: determining whether IHAT was non-inferior to FeSO4 on the day 85 response outcome. It was assumed based on previous evidence that the proportion of children who were responders with FeSO4 at day 85 would be 0.3. The non-inferiority margin was an odds ratio of 0.583 (equivalent to a 0.1 absolute difference in response probability). Because any significant result would be tested in a subsequent pivotal (phase III) study, a 10% one-sided type I error rate was used. A sample size of 200 per arm provides 89% power to demonstrate non-inferiority when the two arms have the same response probability.
As described further in the protocol, the sample size of 200 per arm also provides: (1) 90% power (10% one-sided type I error rate) for testing the superiority of IHAT over FeSO4 for the prevalence of diarrhoea when prevalence is 0.15 in the IHAT arm and 0.25 in the FeSO4 arm; (2) 93% power (10% one-sided type I error rate) for testing the non-inferiority of IHAT versus placebo for diarrhoea prevalence when it is 0.15 in the IHAT and placebo arms with a 0.1 absolute non-inferiority margin; (3) 90% power (10% one-sided type I error rate) to find a reduction in the incidence density of diarrhoea in IHAT versus FeSO4 assuming 1.28 episodes per child over the 85 days in the FeSO4 arm and rate ratio of 0.8. For the secondary outcomes, the trial (
Randomization was performed using a stratified block design to achieve group balance in terms of age (6–11 months, 12–23 months and 24–37 months) and baseline haemoglobin concentration (above and below median, calculated for each cohort separately) at pre-enrolment (day 0). Within each of the six resulting strata, children were randomly assigned to one of the three study treatment arms (1:1:1 ratio) using a computer program written by the trial statistician and a block randomization approach with fixed block size of six was used.
The Gambia is a low-income country in West Africa, where food availability and nutritional status in rural areas are poor, are strongly influenced by season, and a chronically marginal diet is exacerbated by a ‘hungry season’ (July to September), when food stocks from the previous harvest season are depleted. Infants in rural Gambia are breastfed to 2 years of age, with fewer than half of infants being exclusively breastfed to 6 months of age as per WHO recommendation
We performed 16S rRNA amplification and sequencing on a total of 1,546 samples (1,466 stool samples, 46 negative controls and 34 positive controls). In total, we generated 146,603,504 bacterial V1V2 16S rRNA gene reads. Fifty-six stool samples had fewer than 1,000 high-quality reads and were removed from further analysis; this was mainly due to watery samples containing very low biomass which did not provide sufficient DNA for amplification. Of the remaining 1,410 samples, 1,372 were subsampled to 20,000 high-quality reads per sample and for 38 samples (with 1,000–20,000 sequence reads) all reads were kept. Of the 1,410 samples left after quality filtering, five more samples were removed by the Oligotyping filtering step (reads <1,000), leaving 1,405 samples. These samples were collected at three different time points during an iron intervention trial in The Gambia
Children with severe malnutrition were excluded from the trial (
Extraction of total genomic DNA was conducted on stool samples collected on visit days 1, 15 and 85, using the MO BIO Laboratories (now Qiagen) DNeasy PowerLyzer PowerSoil Kit (catalogue number: 12855-100). Each extraction was done with 24 samples (23 study samples and one reagent blank). About 250 μl of the OMNIGENE (OMNIgene•GUT|OM-200; DNA Genotek) sample mix (from a total of 2 ml of sample plus stabilizing liquid mix) was aliquoted into a labelled PowerLyzer glass bead tube (0.1 mm; catalogue number: 13118-100-GBT) and then mixed gently with 750 μl of PowerSoil Bead Solution (catalogue number: 12855-100-BS). About 60 μl of solution C1 (catalogue number: 12888-100-1) was then added and vortexed briefly. The samples were then homogenized for 45 s at 3000 r.p.m. using a Mo Bio PowerLyzer24 bead beater (catalogue number: 13155). About 400–500 μl of supernatant was transferred to a clean 2-ml collection tube (catalogue number: 12888-100-T) following centrifugation of the bead tubes at 10,000
The bacterial 16S rRNA V1V2 variable region of extracted DNA was amplified with Illumina adapter and indexed PCR primers using a dual-index sequencing strategy to target the bacterial 16S rRNA gene
The forward and reverse fastq files of each sample were processed according to the MOTHUR MiSeq SOP with some modifications (MOTHUR wiki at
Oligotyping was used for clustering the high-quality filtered fasta sequences from the MOTHUR pipeline. Oligotyping is a computational method to investigate the diversity of closely related by distinct bacterial organisms in final operational taxonomic units identified in environmental data sets through 16S rRNA gene data by the canonical approaches. For oligotyping we used the ‘Minimum Entropy Decomposition’ (MED) option for sensitive partitioning of high-throughput marker gene sequences from the oligotyping pipeline
ARB-generated short species abbreviations were then correlated with the full taxonomic path from species to phyla. The 10,152 redundant ARB species were then consolidated to non-redundant 524 species which were present in the 1,410 samples with a minimum substance abundance of an oligotype per node of 100 (-M setting from above). Consolidation was performed using the ‘Consolidate’ option in Excel for Mac v.16.16.14; Microsoft). In cases in which a species could not be classified, we reported the genus name and in few cases the family name. For some obvious beneficial or pathogenic genera, we combined all species within the same genus, for example, for the purposes of ecological functionality, pattern recognition and visualisation of associations we combined all
Alpha-diversity indexes for Fisher’s alpha parameter, Simpson’s index, Chao1 richness index, and Richness index (observed richness) were calculated using the online web portal Calypso v.8.84 (ref.
Multivariate beta-diversity analysis was performed using PERMANOVA, and one- and two-way ANOSIM in the statistical software package PAST4 (v.3.20)
For the visualization of microbial compositional differences between environmental variables (age group, time point, treatment, season and geographic location) we plotted the microbial variances using a multivariate method called PCoA for taxon level ‘species’. For the visualization of microbial differences between the three young, middle and old age groups, we analysed the datasets at the species, genus and family levels using PCoA. The PCoA was done using the online web portal Calypso v.8.84 (ref.
To determine which taxa are associated with age, season and treatment groups, we use “a multivariable statistical framework for finding associations between clinical metadata and potentially high-dimensional microbial multi-omics data” through the R package MaAsLin2 v.1.5.1 (from The Huttenhower Lab:
Before analysis in MaAsLin2, absolute taxa abundancies were normalized to proportional abundance (TSS) using the R function ‘make_relative’ from the ‘funrar’ R package. TheTSS normalized data were then used for a mixed-effect modelling using ‘Maaslin2’ function from the ‘MaAsLin2’ R package. Age, treatment group (iron supplement or placebo) and season (wet or try) were used as ‘fixed effects’ and subject (for repeated sampling) was used as a random effect. Subject is unique to each child from which we had either one, two, or three samples. The minimum abundance was set to 0.01 and the minimum prevalence set to 0. Please note that the ‘subject’ variable (individual identifier for longitudinal models) was included only as a ‘random_effects’ parameter when all time point samples were analysed together. We also analysed samples from day 1, day 15 and day 85 collections separately. The R code is: fit_data = Maaslin2(input_data = countdata_TSS, input_metadata = metadata, output = ‘output’, fixed_effects = c(‘group’,‘age”,‘season’), random_effects = c(‘subject’), min_abundance = 0.01, min_prevalence = 0, normalization = ‘none’, transform = ‘none’)
From MaAsLin2 analysis, we report all taxa that were statistically significantly different with a
Data were plotted in GraphPad Prism 9 for macOS and by using macOS Keynote 11.1.
For statistical analysis, data were tested to check whether they follow a Gaussian distribution using the available tests including Anderson-Darling and the D’Agostino & Pearson test in GraphPad Prism 9. The non-parametric Kruskal–Wallis test was used for three or more groups when not normally distributed. For two-group comparison, the two-tailed unpaired non-parametric Mann–Whitney test was used for non-normally distributed data and normally distributed data were tested with a two-tailed unpaired parametric
Statistical analysis was performed in R, PAST4 or in Calypso Web portal for the ALDEx2 test. The
Mixed-effect linear regression was done in R using the lmer function from R package lme4 v.1.1-27.1. The model estimates the mean difference between categories for each taxon; for example, the difference between day 15 and day 1 as well as between day 85 and day 1 in the three time point analysis. Units are TSS + CSS log2(transformed and normalized abundances).
The heatmap used to visualize bacterial trophic networks was generated using the online web portal Calypso v.8.84 (ref.
Further information on research design is available in the
We acknowledge the staff of the Medical Research Council (MRC) Unit The Gambia at the London School of Hygiene and Tropical Medicine (LSHTM) and, in particular, the committed IHAT-GUT field team, for their support of, and their invaluable contribution to, this research. We thank the staff of the Yorrobawol Health Centre, Darsilami Community Health Post, Konkuba Community Health Post, Taibatu Health Post and Chamoi Health Centre for welcoming our team and supporting our study. We thank the local communities and the many study participants and their families in the Wuli and Sandu districts of the Upper River Region, The Gambia, who have been so willing to contribute to this study. This IHAT-GUT study was supported by a Bill & Melinda Gates Foundation Grand Challenges New Interventions in Global Health award [OPP1140952]. The Nutrition Group of the MRC Unit The Gambia at LSHTM are supported by core funding MC-A760-5QX00 to the MRC Unit The Gambia/MRC International Nutrition Group by the UK MRC and the UK Department for the International Development (DFID) under the MRC/DFID Concordat agreement. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
J.W. prepared the original draft. J.P. set up the initial study and critically reviewed the manuscript. D.I.A.P. and A.M.P. conceived of the study and acquired funding. D.I.A.P., C.S. and A.M.P. were responsible for the study methodology. C.S. and A.T.J. performed the investigation. J.W. and M.D.G. performed the data analysis. D.I.A.P. curated and visualized the data and undertook the project administration. P.A.R. confirmed that the data visualization was correct. D.I.A.P., A.T.J. and A.M.P supervised the study. A.T.J. validated the study. D.I.A.P. and C.S. wrote the manuscript. M.D.G., D.I.A.P., C.S., A.T.J., P.A.R. and A.M.P. reviewed and edited the manuscript. N.M. and D.J.P. provided statistical analysis specific for mixed-effect linear regression and critically reviewed all statistical analyses.
The metadata and bacterial 16S count data used for the analysis are available in Supplementary Table
For this analysis no custom code was used.
D.I.A.P. is one of the inventors of the IHAT iron supplementation technology for which she could receive future awards to inventors through the MRC Awards to Inventor scheme. D.I.A.P. has served as a consultant for Vifor Pharma UK, Shield Therapeutics, Entia Ltd, Danone Nutritia, UN Food and Agriculture Organization (FAO) and Nemysis Ltd. D.I.A.P. has since moved to full employment with Vifor Pharma UK, but all work pertaining to this study was conducted independently of Vifor Pharma. Notwithstanding, the authors declare no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
The Fisher’s alpha diversity index (
The Fisher’s alpha diversity index (
The Fisher’s alpha diversity index, the Chao1 estimated richness, and the observed richness were significantly different across the 11 3-month age groups as analysed by the non-parametric Kruskal-Wallis test. Data were tested for normality using the Anderson-Darling and the D’Agostino & Pearson test in Prism 9 for macOS and were non-normally distributed. The Simpson’s index was only significantly different in the day 1 dataset. Data were considered to be statistically significant with a confidence level of 95%. Data were plotted using the Scatter dot plot function in Prism 9 for macOS with a line drawn at the mean with error bars extending to the standard deviation. In the day 1 dataset there were 520 samples, in the day 15 dataset there were 412 samples, in the day 85 dataset there were 457 samples.
PCoA based on Bray-Curtis distance matrix performed on three age group categories showed that the structure of bacterial communities differs between the young (7 to 12 mths), the middle (1 to 2 years), and the old (plus 2 years) age groups for the combined time-point dataset (
PCoA based on Bray-Curtis distance matrix performed on 11 3-months age group categories showed that the structure of bacterial communities differs between the age groups for the combined time point dataset (
Mixed-effect linear regression was conducted to examining the effect of the three different age groups. The mean changes (95% CI) in TSS + CSSlog2 transformed + normalized data of one to two years old samples and >2-year-old samples compared to seven months to 12 months samples are shown. The analysis was adjusted for season, site (fixed effect) and child ID (multiple sampling = random effect). The taxa are sorted from largest negative mean change to largest positive mean change on the x axis. The top ten taxa with a minimum abundance of 1% high highlighted in bold. Mixed-effects linear regression analysis was restricted to the top 50 taxa with a minimum abundance of 0.2%. Importantly, no adjustments for multiple comparisons were made to the 95% confidence intervals. Subject identifier (child ID) was included as a random-effect to account for repeated measures. In this analysis, all 1389 samples across the three sampling timepoints were combined. The black points shows the estimated change and the error bars show the 95% confidence interval.
Gap statistic using the R function “clusGap” from the R package “cluster” version 2.1.2 was used to calculate a goodness of clustering measure, the “gap” statistic. The “k.max” parameter was set to 10, the bootstrap “B” parameter was set to 100, and the analysis was done with two different “FUNcluster” method including cluster:fanny and kmeans. The analysis was restricted to the top 50 TSS transformed taxa with a minimum abundance of 0.2%.
Child cohort characteristics available for microbiome analysis Child cohort characteristics available for microbiome analysis The participant cohort characteristics from 1389 stool samples collected from 633 children over three time points (Day 1, Day 15, and Day 85) are shown. Participant’s characteristics are shown for all patients (N = 633) and separated for the three different age groups (youngest age N = 88, middle age N = 282, oldest age N = 263). Sample characteristics are shown for the three different sampling time points, for the treatment arms, for the geographic location and for the seasons. Abbreviations: min: minimum, max: maximum, mths = months, HAZ: height-for-age Z score, WAZ: weight-for-age Z score, WHZ: weight-for-height Z score, SD: standard deviation.
Beta diversity analysis (PERMANOVA and ANOSIM) for the three age group comparison Beta diversity analysis (PERMANOVA and ANOSIM) for the three age group comparison Beta diversity analysis (PERMANOVA and ANOSIM) for the three age group comparison. PERMANOVA test and ANOSIM test were performed in PAST (Paleontological Statistics software package for education and data analysis), version 4.04. From the PERMANOVA test the Bonferroni corrected P value and F value is reported. From the ANOSIM test, the Bonferroni corrected P value and the R value is reported. Both tests were done separately for the Day 1 (N = 520), Day 15 (N = 412) and Day 85 (N = 457) samples. All Bonferroni corrected P value < 0.05 are highlighted in blue.
Selected studies with Selected studies with A representative study from Malawi with 631 children confirmed the high relative abundance of
Flow diagram showing the samples taken for the secondary study outcome (microbiome analysis). Alpha-diversity analysis. Beta-diversity analysis. Supplementary Figs. 1–7. Supplementary Fig. 1 PCoA for treatment and placebo. Supplementary Fig. 2 PCoA for gender and location. Supplementary Fig. 3 PCoA for wet and try season. Supplementary Fig. 4 Mixed effect linear regression enrolment 3 age groups time effect. Supplementary Fig. 5 Mixed effect linear regression enrolment 11 age groups time effect. Supplementary Fig. 6 Heatmap bacterial trophic network day 1 top 50 taxa. Supplementary Fig. 7 Heatmap bacterial trophic network day 85 top 50 taxa. Reporting Summary Peer Review Information Supplementary Table 1 Statistical analysis to identify discriminant taxa between the treatment and placebo groups. Supplementary Table 2 Alpha diversity multiple group comparison with Kruskal–Wallis Dunn’s test. Supplementary Table 3 Beta-diversity analysis (PERMANOVA and ANOSIM) for 11 age group comparison. Supplementary Table 4 Beta-diversity analysis (PERMANOVA and ANOSIM) for the wet and try season groups. Supplementary Table 5 Statistically significant associated bacterial taxa with age, season (wet or try), and treatment group (iron supplement or placebo). Supplementary Table 6 Metadata and count data for all 1,389 samples.
is available for this paper at
The online version contains supplementary material available at