- Research
- Open access
- Published:
Shared environments complicate the use of strain-resolved metagenomics to infer microbiome transmission
Microbiome volume 13, Article number: 59 (2025)
Abstract
Background
In humans and other social animals, social partners have more similar microbiomes than expected by chance, suggesting that social contact transfers microorganisms. Yet, social microbiome transmission can be difficult to identify based on compositional data alone. To overcome this challenge, recent studies have used information about microbial strain sharing (i.e., the shared presence of highly similar microbial sequences) to infer transmission. However, the degree to which strain sharing is influenced by shared traits and environments among social partners, rather than transmission per se, is not well understood.
Results
Here, we first use a fecal microbiota transplant dataset to show that strain sharing can recapitulate true transmission networks under ideal settings when donor-recipient pairs are unambiguous and recipients are sampled shortly after transmission. In contrast, in gut metagenomes from a wild baboon population, we find that demographic and environmental factors can override signals of strain sharing among social partners.
Conclusions
We conclude that strain-level analyses provide useful information about microbiome similarity, but other facets of study design, especially longitudinal sampling and careful consideration of host characteristics, are essential for inferring the underlying mechanisms of strain sharing and resolving true social transmission network.
Video Abstract
Introduction
Animals acquire their microbiomes from a combination of vertical, environmental, and horizontal transmission — including, in social species, through social interactions with their conspecifics. In the last 10 years, studies in a wide variety of social animals — from humans and other mammals to birds and insects — have shown that social group residency and social network structure have significant explanatory power for the composition and genetic content of animal microbiomes, especially in the gut [1,2,3,4,5]. These observations suggest that social transmission may be an important mechanism shaping animal microbiomes, with potential consequences for microbiome stability and diversity, host infectious disease risk and immune function [6, 7], and the evolution of host behaviors to facilitate and/or avoid microbial transmission [8]. From the perspective of microorganisms, social transmission may also generate selection pressures to specialize on the substrates and conditions available in the host environment, causing the fitness of socially transmitted microbes to become more closely aligned with the fitness of their hosts [9]. Indeed, an early hypothesis proposed that some social behaviors, such as trophallaxis and coprophagy, evolved in part to transmit specialized, beneficial microbes that can no longer tolerate environmental intermediates [8, 10].Consistent with this idea, several studies in the more recent era of microbiome sequencing report that anaerobic and non-spore-forming species (i.e., those with low predicted environmental persistence) are more likely to be socially shared [2, 11, 12].
Together, these studies argue that social transmission, either directly between interacting animals or indirectly through intermediate substrates (e.g., shared nesting material, deposited scent marks), may play an important role in host microbiome structure and function. This idea is bolstered by experimental work: for example, evidence from captive populations of rhesus macaques, goats, pigs, and vampire bats all show that moving individuals from separate to shared housing causes their microbiomes to converge [13,14,15]. However, although social transmission has become a favored explanation for socially structured microbiomes, it can be very difficult to distinguish from alternative mechanisms also at play in natural populations. Many microbial species are widely distributed in nature [16]. Individuals may therefore independently acquire lineages of the same species, especially if they share similar environments. For example, many animals forage in groups with, or share food preferentially with, close social partners, leading to correlations between social group membership, the strength of social bonds, and diet [17, 18]. Social groups may also encounter microbial species in the water, soil, and other substrates in the environment that differ based on territory or space use [19]. In some species, animals associate preferentially with their close genetic relatives and/or with individuals of a similar age [20, 21]. If these characteristics affect which microbial species can establish and persist in hosts, then age, kin, or shared environmental effects could be mistaken for social transmission, inflating estimates of social effects on the microbiome [22, 23]. Few (if any) studies in natural populations, including humans, can completely eliminate these alternative explanations, and direct experimental evidence for transmission is difficult to obtain. Consequently, the relative importance of social transmission in explaining socially structured microbiomes remains an open question.
One proposed solution is to leverage computational approaches that classify microbial reads at subspecies or strain levels from shotgun metagenomic data [24,25,26]. Current strain profiling pipelines either use sequence variation in species-specific marker genes to construct strain-level phylogenies [24, 25, 27] or align short reads to a set of reference microbial genomes to identify variants throughout the genome [26, 28, 29]. Samples can be classified as “sharing a strain” if the lineages they carry are close in the phylogeny [30, 31], have highly similar marker gene sequences [25], or have high genome-wide nucleotide or structural identity [26, 32]. Individuals with a higher proportion of strains in common are then typically assumed to engage more frequently in transmission. Strain profiling in humans has revealed elevated strain sharing rates among mothers and infants [33,34,35], interpreted as vertical transmission, and among household members [28, 36] and village residents [30, 31], which has been interpreted as support for pervasive horizontal transmission through social networks. Strain-resolved approaches have not yet been widely applied in natural populations of other social animals, but early evidence in wild baboons also demonstrates elevated rates of strain sharing within versus between social groups [37], suggesting that findings in humans will likely generalize to other host species.
Elevated strain sharing rates are often assumed to be the direct result of transmission, based on the assumption that individuals are otherwise unlikely to acquire the same strain independently [36, 38]. Consequently, socially structured strain sharing has been treated as evidence for the importance of direct person-to-person transmission, but the validity of this assumption is uncertain. Some pairs of individuals might frequently transmit strains among each other but consume different enough diets (for instance) that they retain little of what they receive. Others may exchange strains only occasionally but retain a high proportion of socially acquired strains, resulting in similar microbiomes overall. The dual processes of transmission and retention/persistence make it difficult to evaluate whether strain sharing rates are elevated among social partners because they truly exchange microbes or simply because they experience similar environments that shape their microbiomes in parallel.
Here, we evaluate the explanatory power of strain sharing rates in two gut microbiome data sets: an experimental fecal microbiota transplant with a known underlying transmission network, generated by Ianiro and colleagues [39], and a wild baboon population where social structuring of the gut microbiome has been described in previous work [2]. The baboon population has been the subject of continuous study by the Amboseli Baboon Research Project for over 50 years, and extensive data are available on the behavior, ecology, life histories, and gut microbiomes of individually recognized study subjects, making it an ideal setting for disentangling drivers of microbial strain sharing [40].
We use the fecal microbiota transplant to first ask whether, in a setting in which the transmission network is completely understood, strain-level resolution improves our ability to infer the transmission network compared to coarser, compositional resolution. We test whether species that follow the true transmission network are more likely to come from specialized, host-associated taxa, as predicted by evolutionary theory [8]. We also take advantage of this data set to evaluate alternative criteria for defining a strain sharing event as a case of true transmission. We find that transmission dynamics can be more reliably resolved in strains that are private to a single individual at an earlier time point and then spread to other individuals, whereas strains that are already widespread provide less robust information about transmission [11]. Using the baboon data set, we then consider cases of strain sharing under natural conditions — the setting of greatest interest in studies of social transmission [41]. We evaluate “background” rates of strain sharing among individuals with nonoverlapping lifespans (i.e., cases where social transmission is impossible) and compare those rates to both strain sharing levels in longitudinal samples from the same individual and close social partners. Finally, we test whether environmental or demographic characteristics provide alternative explanations to social transmission. Together, our analyses suggest that although strain-resolved metagenomics has substantial value for understanding microbial transmission in the gut microbiome, elevated strain sharing rates are, by themselves, insufficient to infer direct social transmission.
Methods
Fecal microbiota transplant
We reanalyzed publicly available metagenomic data from a study of healthy human donors (n = 5) and patients with either recurrent Clostridium difficile infection or mild-to-moderately active inflammatory bowel disease (n = 8), living in Rome, Italy [39]. While the sample size for this study was small, the transplant recipients experienced a discrete, strong perturbation (i.e., antibiotic treatment followed by transplant), each patient was treated with a fecal transplant from a single donor, and participants were sampled very shortly afterwards, making a strong test case to demonstrate transmission. Each patient was treated with a fecal transplant from a single donor. Stool samples were collected and sequenced from patients immediately before the transplant and 15–30 days after the transplant. DNA extraction was performed by the original authors using the DNeasy PowerSoil Pro Kit (Qiagen). Libraries were prepared using the Illumina DNA Prep (M) Tagmentation kit and sequenced on the Illumina NovaSeq 6000 platform.
We downloaded raw reads for the fecal transplant study from the European Nucleotide Accession (PRJEB47909) and filtered reads using Trimmomatic [42], requiring a minimum length of 70 bp and a minimum quality score of 20 within a 4-bp sliding window. Next, we aligned reads to 4644 species-representative microbial genomes from the Unified Human Gastrointestinal Genome database [43] using bowtie2 [44]. Read counts and mapping statistics are available in Table S1. We conducted strain-level population genetic comparisons using the profile and compare functions of inStrain [26]. To minimize bias against low-coverage strains while still controlling the rate of false positives, we considered a strain to be “present” in a pair of samples if at least 25% of its genome was represented with at least 5 × coverage in both samples (we note that the developers recommend a range between 25 and 50% for this threshold [26]; our choice of 25% allows us to retain more rare, low-abundance strains). We calculated the average nucleotide identity (ANI) of present strains using a microdiversity-aware approach that calls a substitution only when no alleles (major or minor) are shared between the two samples. We considered two samples to share a strain if their strains had 99.999% ANI.
Sample collection from the baboon field study
The newly generated sequences in this study originated from a population of wild baboons (admixed between Papio cynocephalus and Papio anubis, with P. cynocephalus the majority ancestry [45]) inhabiting the Amboseli basin in southern Kenya. The population has been under continuous study since 1971; the present study includes fecal samples collected between 2007 and 2017. After collection, fecal samples were stored in 95% ethanol at 4 °C for up to 2 weeks and then freeze-dried as follows: (1) All ethanol was evaporated from samples under a fume hood, (2) tubes were cooled for 30 min at − 20 °C, and (3) tubes were placed in a freeze-dryer (< − 50 °C, vacuum at 30 mTorr). The resulting powders were stored at − 80 °C.
We selected fecal samples for strain sharing analysis that satisfied one or more of four criteria, focusing our efforts on adult females. First, we selected 20 pairs of fecal samples from female baboons whose lives never overlapped. Second, we selected 23 pairs of fecal samples from baboons living in different social groups at the same approximate time (i.e., collected less than 150 days apart). Note that females do not disperse in this species, meaning that females could not have transferred between distinct social groups prior to sample collection. Third, we selected fecal samples that belonged to close social partners in the same social group, sampled less than 4 days apart. To identify close social partners, we calculated dyadic sociality indices (DSI, a measure of social bond strength [46]) between females based on all observed grooming interactions between 2007 and 2017 and then sampled 26 unique dyadic pairs from the top quartile of the DSI distribution. Finally, we selected longitudinally collected fecal samples from 22 individuals for whom repeated samples were available within a 120–150-day interval. Note that the same individual could be represented in multiple categories with different partners, but not multiple dyads within the same category (Table S2). To ensure that representation of the same individuals in multiple categories did not bias our results, we also iteratively downsampled our dataset to include only a single randomly selected sample from each individual and repeated the analysis. The effects from the full dataset remained qualitatively consistent across 100 iterations of downsampling, even considering the reduced sample size in the downsampled datasets (Fig. S1).
Metagenome sequencing of baboon samples
Gut metagenomes were generated from the 126 fecal samples selected as described above. First, we extracted microbial DNA using the MoBio PowerSoil Kit with a modified protocol optimized for freeze-dried samples. Specifically, we increased the PowerBead solution to 950 µL/well and incubated the plates at 60 °C for 10 min after lysis in order to increase the hydration levels of freeze-dried samples and minimize the risk of plate clogging. We prepared libraries using the seqWell purePlex DNA library prep kit and sequenced the libraries on a NovaSeq X at the University of Chicago DNA Sequencing Facility to a median depth of 34 million read pairs per sample (using a paired end, 150-bp read length design). We removed adapters and filtered raw reads using Trimmomatic [42], requiring a minimum length of 70 bp and a minimum quality score of 20 within a 4-bp sliding window. Read counts after quality control are available in Table S3. Raw data are available on the Sequence Read Archive (accession PRJNA1135081).
As database coverage of nonhuman metagenomes can be low, we created a custom microbial genome database by supplementing the Unified Human Gastrointestinal Genome (UHGG) database with an additional 2985 genomes assembled directly from the metagenomes of nonhuman primates [47], which includes 985 genomes assembled from metagenomes from the baboon population analyzed in this study. The full set of genomes was filtered using dRep [48] to minimize mis-mapping between closely related genomes. Briefly, genomes were clustered into bins based on 95% average nucleotide identity. Representative genomes were selected based on completeness, contamination, and centrality to other genomes in the cluster, with an additional weight using the − extra_weight_table flag to favor genomes from nonhuman primates over those from humans. The final database contained 4712 genomes. The custom database substantially improved read alignment compared to the standard UHGG database (paired t = 29.85, p < 0.001, Fig. S2a).
We aligned reads to the custom database using bowtie2 [44] (mapping statistics available in Fig. S2b and Table S3). We conducted strain-level population genetic comparisons using the profile and compare functions of inStrain. As with the fecal microbiota transplant, we considered a strain to be present in a pair of samples if at least 25% of its genome was represented with at least 5 × coverage in both samples, and we considered samples to be strain sharing if their strains had ≥ 99.999% ANI.
To compare the major conclusions of the study with an alternative analysis pipeline, we additionally profiled metagenomes at the species level using MetaPhlAn [49]. We extracted clade-specific marker genes using the extract_markers function of StrainPhlAn [27]. We used the strainphlan function to build species-level phylogenetic trees, requiring each species to be present with 10 or more markers in at least 4 individuals, and then extracted the pairwise phylogenetic distances with the tree_pairwisedists function of StrainPhlAn. We considered samples to be strain sharing if the normalized phylogenetic distance between them was ≤ 0.1.
Behavioral, demographic, and ecological data from the baboon field study
We collected data on the baboons’ diets using point sampling at 1-min intervals within 10-min focal samples by recording the food type if the individual was feeding during the point sample. Because of the sparseness of individual-level diet data, especially in large groups (focal data are collected via rotating in a random order across individuals in each social group [50]), and because diet tends to be similar for members of the same social group [49, 50], we aggregated these data by social group. An individual’s value therefore represents the average diet of her social group in the year of sampling. Comparisons were thus only possible for dyads from different social groups or dyads who never lived at the same time. We generated a Bray–Curtis dissimilarity matrix based on the compositional data using the R package vegan and measured the relationship between dietary (dis)similarity and bacterial strain sharing. Daily rainfall values were collected using a rain gauge established at the nearby field camp of the Amboseli Baboon Research Project. Ages of all individuals in our sample were known within a few days as they were followed since birth.
Taxonomic and functional annotations of shared species
We used GTDB-TK to assign taxonomic classifications to the metagenome-assembled genomes in our custom database based on their sequence similarity to the bacterial and archaeal reference trees available in the Genome Taxonomy Database (GTDB) [51]. We then annotated the oxygen tolerance and known host(s) of each species-representative genomes in the Unified Human Gastrointestinal Genome database using the Genomes Online Database [52]. If all entries in the containing genus were recorded as obligate aerobes, we considered the species aerobic. If all entries in the containing genus were recorded as obligate anaerobes, we considered the species anaerobic. If the genus contained a mixture of obligate aerobes, obligate anaerobes, and/or facultative aerobes or anaerobes, we assigned it the label “mixed.” All genera in the Unified Human Gastrointestinal Genome database had been previously reported in humans at a minimum; some had also been detected in other animals. We thus annotated each entry as “Homo sapiens only” or “multiple hosts.”
Quantification and statistical analysis
We assessed differences in strain sharing between matched and mismatched pairs (FMT dataset) or among categories (baboon dataset) using t-tests and Benjamini–Hochberg correction of p-values. We confirmed that variation in strain sharing rates was not explained by variation in either sample sequencing depth or mapping rates to bacterial reference genomes, measured as the minimum, maximum, or average for the sample dyad (all r < 0.2 and p > 0.05). We also assessed taxonomic differences between matched and mismatched pairs (FMT dataset) or among categories (baboon dataset) using Fisher’s exact tests, followed by calculation of family-wise log odds ratios if the Fisher’s test was significant.
We assessed differences in anaerobic metabolism and host specificity in the FMT dataset using permutation tests, because the filtering criteria imposed on the FMT dataset generated subsets of the full set of strain sharing events and the phenotypes were therefore not independent of the full set or directly comparable to the full set. For the permutation tests, we sampled from the full list of strain sharing without replacement to a size matching the number of strain sharing events detected following a given set of filtering criteria. The p-value represents the proportion of permutations (out of 1000) for which the simulated proportions of anaerobic or host-specific bacteria were as or more extreme as the observed proportions. Information about test statistics, degrees of freedom, and p-values are available in the main text and figure captions.
We evaluated the contributions of diet and environmental variables using a linear model with the percentage of shared strains between sample dyads as the response variable and either the dietary similarity or the amount of total rainfall as the predictor variable.
Results
Strain sharing in a known transmission network
To assess whether bacterial strain sharing rates can act as a reliable indicator of microbiome transmission, we first identified a sample set with known transmission dynamics. Here, we drew on fecal metagenomic sequences from a fecal microbiota transplant (FMT) study, which tracked both healthy donors and FMT recipients (patients with inflammatory bowel disease or chronic Clostridioides difficile infection) living in Rome, Italy [39]. Each patient underwent a 3-day vancomycin regimen and then received a transplant prepared from the fecal sample of a single donor. Stool samples were collected from donors immediately before FMT and from recipients immediately before FMT and 15–30 days after FMT (n = 21 total samples from 13 subjects, including 5 healthy donors and 8 FMT recipients, Fig. 1A). This study design allowed us to compare strain sharing in actual donor-recipient pairs (“matched pairs”) to background rates of strain sharing in all other pairs of “mismatched” donors and recipients (Fig. 1B).
Strain sharing among donors and recipients in a fecal microbiota transplant cohort. A Transmission network in the FMT study. Stool samples from five healthy donors and eight recipients enrolled in a clinical study were collected for metagenomic sequencing and analysis [39]. B Visualization of criteria used to define a putative transmission event. Each box shows examples of strain sharing events that would be considered transmission (included) or not considered transmission (excluded) based on increasingly stringent criteria concerning the prevalence of the strain in the population. All criteria were cumulative; e.g., the list of strain sharing events absent in pre-FMT recipients was used as the starting point to further select strain sharing events that were unique to a single donor. Inferred transmission events under each successive set of criteria are shown using solid lines to connect samples; x’s mark the presence of shared strains. C Species sharing across increasingly stringent definitions of transmission. The number of shared microbial species among matched donor-recipient dyads (blue) and all other donor-recipient comparisons (mismatched: red) is shown following serially more stringent filtering criteria. D Strain sharing across varying definitions of transmission. The percentage of strain sharing events among matched donor-recipient dyads (blue) and all other comparisons (red) is shown following serially more stringent filtering criteria. Boxplots are bounded by the lower and upper quartiles, with the middle line showing the median value and whiskers extending to 1.5 times the interquartile range. Asterisks represent significant differences between matched and mismatched cohorts based on a t-test followed by Benjamini-Hochberg correction: ***p < 0.001, **0.001 ≤ p < 0.01, *0.01 ≤ p < 0.05. E Subsets of strain sharing events detected under varying criteria for transmission. The log-scaled proportion of total strain sharing events that were classified as transmission under varying criteria. Strain sharing between matched pairs is represented in blue, and strain sharing between mismatched pairs is represented in red
We measured strain sharing using the inStrain pipeline, which calculates the average nucleotide identity (ANI) of bacterial species shared between two individuals [26]. We considered pairs of donors and recipients to share the same strain of a species if their respective sequence alignments met or exceeded a threshold of 99.999% ANI. This threshold was set based on developer recommendations and is estimated to discriminate between strains that diverged within the last 2.2 years [26]. We first calculated strain sharing rates for all microbial species and in all post-FMT donor-recipient pairs (both matched true donor-recipient pairs and mismatched pairs). We identified cases of strain sharing for 131 of the 755 species we detected in the data set. An average of 15% of species shared between a given donor-recipient pair (matched or mismatched) met the sequence identity threshold to be considered the same strain. Consistent with our expectations, strain sharing rates were significantly higher among matched donor-recipient pairs (i.e., true links in the underlying transmission network) than mismatched pairs, even though matched pairs did not share significantly more species in common (strain sharing: 40% between matched donor recipient pairs versus 8% between mismatched pairs, Benjamini–Hochberg adjusted p = 0.002; species sharing: mean = 44.9 shared species in matched pairs versus 31.4 in mismatched pairs, adjusted p = 0.253; Fig. 1C, D).
We then asked whether imposing more stringent criteria on strain sharing events would improve our ability to differentiate matched from mismatched pairs and therefore the concordance between the strain sharing network and the true underlying transmission network. First, for each pair (whether matched or mismatched), we excluded strains shared prior to the FMT from further consideration. This approach controls for background rates of strain sharing between donors and recipients but had little effect on strain sharing patterns in practice, suggesting that background strain sharing between donors and recipients was generally low (Fig. 1D, “strains previously absent from recipient”). Next, we reasoned that widespread strains might be more easily acquired by recipients from sources other than their FMT donor, while strains with limited distribution in the population are more likely to reflect the FMT transmission network. Further restricting the dataset to exclude strains detected in more than one donor prior to FMT was more effective at distinguishing the true transmission network. Although the number of shared species under consideration decreased with this filter (Fig. 1C), the proportion of shared species with 99.999% or higher ANI increased to 52% in matched donor-recipient pairs, compared to only 6% in mismatched pairs (adjusted p = 0.01; Fig. 1D, “strains unique to one donor”; out of an average of 5.0 and 2.2 shared species respectively). Further restriction of the dataset to additionally exclude strains that were detected in more than one recipient after FMT moderately improved this result (61% between matched pairs versus 7% between mismatched pairs, adjusted p = 0.01, Fig. 1D, “strains unique to one donor + unique to one recipient,” out of an average of 3.2 and 0.6 shared species respectively). This filter reduces the possibility that recipients exchanged the strain with one another, independently from the FMT.
Naturally, the increasingly stringent criteria for transmission excluded strain sharing events among matched donor-recipient pairs as well. For example, the most stringent threshold excluded ~ 95% of strain sharing events in mismatched donor and recipient pairs from being categorized as true transmission but also excluded ~ 91% of strain sharing events that occurred in matched pairs (13 strain sharing events in matched pairs and 5 strain sharing events in mismatched pairs remained at this most stringent threshold, Fig. 1E). The focus on strains that are unique in donor and/or recipient cohorts is therefore useful for obtaining the clearest representation of the true transmission network, but more relaxed criteria may be appropriate for studying other aspects of strain sharing in a broader and more representative set of gut microbial species. However, while other investigators may choose to set different thresholds, our finding that true transmission events can be captured using strain-resolved metagenomics, with filtering to reduce false positives, is likely to generalize across datasets. Our analysis also suggests a set of criteria, from less conservative to more conservative, which can be applied to other datasets.
Candidate transmission events in FMT are enriched for rare, anaerobic, and host-specific bacterial taxa
Having established that matched donors and recipients shared more bacterial strains than mismatched pairs, we next asked whether they shared different bacterial taxa as well. Species sharing events in matched donor-recipient pairs involved subtly different sets of bacteria taxa than species sharing events in mismatched pairs (Fisher’s exact test, p = 0.024; Fig. S3a). Species from the Erysipelotrichaceae family were enriched in matched pairs compared to mismatched pairs (log2 odds ratio = 2.73, p < 0.001, Fig. S3b), consistent with previous observations of engraftment of this family following fecal microbiota transplants [53]. Strain sharing events among matched and mismatched pairs also involved different sets of bacteria taxa, though no families were sufficiently enriched among matched pairs to detect in family-by-family analyses (Fisher’s exact test, p = 0.048; Fig. 2A, Fig. S3c).
Taxonomic and functional characteristics of bacteria that exhibit strain sharing between donors and recipients of fecal microbiota transplants. B Family-level structure of bacterial taxa that were shared at the strain level between subjects. Bar chart of species, grouped into families, that exhibit strain sharing between matched (left) or mismatched (right) pairs, based on a 99.999% ANI threshold. Colors reflect different bacterial families and are scaled to represent the relative proportions of strain sharing events in each category. B Population prevalence in donors and pre-FMT recipients. Each strain sharing event was annotated according to the initial prevalence of the species to which it belonged. The x-axis represents the number of donors or pre-FMT recipients containing the species (of a maximum possible of 13); the bar color represents the dyad type. C Anaerobic metabolism across varying definitions of transmission. Each strain sharing event between matched pairs was annotated, at a species level, as aerobic, anaerobic, or mixed. D Host specificity across varying definitions of transmission. Each strain sharing event between matched pairs was annotated as a genus reported only in humans or reported in multiple species
There was a slight difference in the initial population prevalence of strains shared in matched versus mismatched pairs, with strain sharing in matched pairs coming from rarer species (i.e., those present in a smaller number of hosts prior to FMT) (t = 2.36, p = 0.019; Fig. 2B). Strains shared by matched pairs were also more likely to come from obligately anaerobic genera (permutation test, p < 0.001, Fig. 2C), and the proportion of strain sharing events involving obligately anaerobic taxa increased across progressively stricter definitions of transmission. Similarly, with stricter criteria for transmission, strain sharing events were increasingly likely to involve bacterial genera that were reported in human-derived microbiome samples, but not in any other host species, in the Genomes OnLine Database [52]. Under the strictest transmission criteria, for example, 78% of strain sharing events involved human-specific taxa, compared to only 31% in the strain sharing dataset as a whole (permutation test; unique to one donor, p = 0.002; unique to one donor and recipient, p = 0.007; Fig. 2D).
Strain sharing in a natural primate population
The FMT data show that strain sharing patterns can, in principle, reflect the underlying transmission network in a short-term, experimental cohort. However, the primary interest in strain sharing across social and/or transmission networks focuses on unmanipulated groups or populations, where strain sharing may arise due to transmission, genetic effects, and/or shared environments. We therefore next asked whether strain sharing is a reliable indicator of transmission in natural populations, using fecal samples collected from baboons in the Amboseli ecosystem of Kenya [54].
In the Amboseli baboon population, as in any natural setting, there is no “known” transmission network. However, we reasoned that strain sharing due to transmission should be more likely among some pairs of individuals than others. For example, baboons who live in different social groups throughout their entire lives would have fewer opportunities for microbial transmission than close social partners who interact frequently. To test this prediction, we generated metagenomes from a set of 126 fecal samples collected from 93 individual baboons between 2007 and 2017 (mean sequencing depth = 37 million read pairs ± 22 million SD; Fig. 3A). We compared the following: (i) 23 pairs of baboons whose lives never overlapped (i.e., the first one died before the second was born), (ii) 20 pairs of baboons whose lives overlapped but lived in different social groups the entire time, and (iii) 26 pairs of baboons that were close social partners living in the same social group and sampled within 4 days of each other. Because high rates of strain sharing are expected from samples collected from the same individual over time [55, 56], relative to the first three categories, we also included (iv) 22 pairs of longitudinal samples from the same individual, collected 4–5 months apart.
Species and strain sharing rates in the gut microbiomes of wild baboons. A Individuals included in this dataset. Each line represents an individual baboon, starting from the year it was born. The line terminates either in an (x) to represent death or an arrow to represent that the animal was alive as of December 31, 2023. Points represent sampling events. Segments are colored according to social group membership when the individual was born, while points are colored according to social group membership when the corresponding samples were taken. As females do not typically disperse in this species, animals that belonged to two or more groups during their lifetimes represent group fission or fusion events. B Species sharing across dyad types. The number of shared species between each dyad based on inStrain profiling (95% popANI). C Strain sharing rates across dyad types. The percentage of shared strains between each dyad (99.999% popANI). Only repeated samples from the same individual differed from any of the other categories (Tukey HSD; same individual — close social partners p = 0.045, same individual — different social groups p = 0.001, same individual — not alive at same time p = 0.0027). Boxplots are bounded by the lower and upper quartiles, with the middle line showing the median value and whiskers extending to 1.5 times the interquartile range
If co-residency and social interactions facilitate transmission of gut microbes, then species and strain sharing should be low in categories (i) and (ii) and higher in category (iii). In contrast to this expectation, there were no systematic differences in species sharing across categories (Fig. 3B). Within shared species, strain sharing rates also did not differ among baboons whose lives never overlapped (5.9%), baboons who lived in different social groups (5.4%), and baboons who lived in the same social group and interacted closely (7.4%). Only repeated samples from the same individual had significantly elevated strain sharing rates compared to the other categories (11.1%; Fig. 3C). There were also no differences in the family-level taxonomic composition of strains shared across categories (Fisher’s exact test, p = 0.463, Fig. S4a) or the proportion of strain sharing events involving anaerobic bacteria (Fisher’s exact test, p = 0.353, Fig. S4b). Strain sharing events among close social partners were more likely to involve rare species compared to strain sharing across social groups (Tukey HSD, p = 0.035), but not when compared to baboons that lived at different times entirely (Tukey HSD, p = 0.145, Fig. S4c).
These results were somewhat surprising in light of the strong tendency in the literature to interpret strain sharing as evidence for direct transmission or specifically social transmission [36, 38]. We therefore asked whether similarity in other characteristics, such as age, diet, or time of sampling, inflated strain sharing among baboons who never interacted directly. Among baboons who lived in different social groups at the same time, those with more similar diets in the year they were sampled had more shared strains (linear regression; slope = 19.83, df = 18, p = 0.004; Fig. 4A). This relationship was directionally similar, though weaker and not statistically significant, when diet data were aggregated by month of sampling rather than year (linear regression; slope = 3.95, df = 18, p = 0.388; Fig. S5a). Baboons had elevated strain sharing rates if they were sampled at similar times of year as well (linear regression, slope = − 1.31, df = 18, p = 0.017; Fig. 4B), possibly due to seasonal variation in their diets and environments (Fig. S5b).
Diet, sample month, and rainfall predict strain sharing among non-coresiding baboons. A Dietary similarity predicts strain sharing. Dietary similarity was calculated using the Jaccard similarity index based on the dietary compositions of each pair of baboons that lived in different social groups at similar times. Insets display dietary data from selected baboon pairs on either x-axis extreme. B Time of year predicts strain sharing. Minimum distance between the numeric representations of the months (1 = Jan, 2 = Feb, etc.), regardless of the year in which they were sampled, for pairs of baboons that lived in different social groups at similar times. No animals from this set were sampled in directly opposing months (i.e., distance = 6). C Rainfall predicts strain sharing. Rainfall was measured daily using a rain gauge and summed by month for samples taken from baboons that lived at different times
Among baboons whose lives never overlapped, pairs shared more strains if they were sampled during rainier months (considering strain sharing across both dominant and minor strains; linear regression; slope = 0.065, df = 21, p = 0.009; Fig. 4C). Notably, this pattern was undetectable when strain sharing was based only on sharing the same dominant strain of each species (linear regression, slope = 0.002, df = 21, p = 0.791; Fig. S6). During rainy periods, baboons therefore appear to be more likely to harbor multiple strains of the same species and share more nondominant strains.
Neither the number of years between the two sampling events nor the age difference between individuals at the times of sampling predicted strain sharing levels (years between sampling: linear regression, slope = − 0.424, df = 21, p = 0.532; age difference: linear regression, slope = 0.076, df = 67, p = 0.424; Fig. S7a, b).
Concordant findings across two strain sharing pipelines
Our primary approach above uses average nucleotide identity across the observed fraction of each species’ genome to measure strain sharing [26]. An alternative method is to construct a phylogeny based on a set of clade-specific marker genes and classify the closest tips of the phylogeny as the “same strain” rather than using an absolute distance measure [27]. To assess whether these alternative approaches influence strain sharing patterns in our datasets, we therefore also aligned metagenomic reads to the MetaPhlAn marker gene database [49] and then used StrainPhlAn [27] to build phylogenetic trees for all bacterial species present in four or more individuals. Following previous work, individuals with normalized branch lengths less than or equal to 0.1 (i.e., the 10th percentile of pairwise distances for the species) were considered to share a strain [35].
In the fecal microbiota transplant dataset, per-dyad estimates of strain sharing were highly correlated between inStrain and StrainPhlAn (Pearson’s correlation, r = 0.901, p < 0.001 for pairs with 3 or more shared species; Fig. S8a). Matched donor-recipient pairs shared significantly more strains than mismatched pairs (adjusted p = 0.040; Fig. S8b). The additional filtering criteria were not useful for further resolving the transmission network with StrainPhlAn, as the number of shared strains remaining quickly declined to zero for most pairs (Fig. S8b).
In the Amboseli baboon population, inStrain and StrainPhlAn also generated correlated estimates of dyad-wise strain sharing rates (Pearson’s correlation, r = 0.753, p < 0.001 for pairs with three or more shared species; Fig. S9a). As before, strain sharing rates did not significantly differ among baboons in the same social group, baboons in different social groups, and baboons whose lives did not overlap. In this case, however, even longitudinal samples did not differ from any of the other categories (Fig. S9b). This may be the result of poor representation of baboon gut-associated species in the MetaPhlAn/StrainPhlAn database. StrainPhlAn was only able to identify (and therefore compare) an average of 2.4 species per pair of samples and 98 species total. inStrain profiled an average of 52.0 species per pair and 369 species total, likely because it readily accommodates the addition of microbial genomes assembled directly from this baboon population and other nonhuman primate species (metagenome-assembled genomes or MAGs) [47].
Discussion
Strain-resolved metagenomic analysis has been proposed as a way to reliably track microbial transmission through host populations [11, 30, 31, 38, 57]. Tracking transmission is essential to understanding how contact networks and shared surfaces contribute to microbial dispersal, how transmissibility varies among microbial taxa, and how newly arrived strains establish and interact with the resident microbiome following transmission events [41, 57]. We found that strain sharing closely correlated with the true transmission network in a cohort of human patients undergoing a fecal microbiota transplant. Strain-level information significantly improved the correspondence to the transmission network compared to species-level sharing, which was not appreciably higher in matched (true) donor-recipient pairs than among other individuals in the study. Focusing on strains that were limited in prevalence further improved correspondence to the transmission network, suggesting that generally widespread strains may be more likely to be independently acquired by processes other than direct transmission.
Yet in many respects, data from microbiota transplants represent a best-case scenario for detecting microbiome transmission. In the data set we analyzed, the patients had taken antibiotics shortly before the transplant, potentially increasing the probability that the transmitted strains would establish and reach detectable levels in the gut community [39, 58, 59]. The transplant, which occurred through colonoscopy, bypassed normal transmission routes for strains that might have otherwise been poor colonizers [7]. Recipients were sampled shortly after the transplant, possibly allowing detection of short-term or unstable colonizers before they went extinct [60]. Further, the recipients all experienced gut dysbiosis prior to the FMT and may have therefore been particularly vulnerable to colonization by new bacterial strains [61]. While the FMT data set therefore provides important proof of principle that strain-resolved metagenomics can recover true transmission events, it does not show that this method does reliably reflect transmission networks in natural populations, where individuals have the simultaneous potential to be donors and recipients at all times.
Our analysis in the Amboseli baboons points to the much higher complexity of this problem. In contrast to the FMT dataset, background strain sharing in this population was often high among individuals that had never co-resided, especially if they shared other environmental characteristics. For example, baboons that ate similar diets (based on group-level data on diet in the year of sample collection) often had strain sharing rates exceeding those of close grooming partners living in the same group. This result is qualitatively consistent with a large body of work showing that diet predicts gut microbiome species composition [62,63,64] but suggests that diet shapes microbiome similarity at an even finer genetic resolution than previously appreciated. Among baboons whose lives never overlapped at all, strain sharing patterns were explained in part by rainfall. The savannah ecosystem of Amboseli is characterized by a 5-month-long dry season (June to October), followed by a 7-month period of highly variable rainfall. When baboons were sampled in months with more rainfall, they harbored more within-species genetic diversity and were more likely to share low-abundance strains. We speculate that this elevated strain-level diversity may be a result of interacting with (and eating) more diverse types of vegetation in rainier periods [65] or by rain-driven activation of dormant microbial populations in the soil [66, 67]. The latter explanation may be particularly important for explaining how animals living in different time periods nevertheless can take up nearly identical strains.
In summary, strain sharing patterns in the Amboseli baboon population are not strongly driven by known patterns of social interaction, at least in this temporally and environmentally heterogeneous sample. This result contrasts with the fecal microbiota transplant data, where strain sharing clearly recapitulated transmission pathways. One possible explanation for this difference is that social interactions are not a significant pathway for microbiome transmission, and that host-to-host transmission was detected in the FMT because the transplant method bypassed normal colonization routes. However, other work on this population has reported elevated microbiome similarity among close social partners, both at the species level [2] and the strain level [37]. In these previous analyses, all samples were collected close in time (i.e., within a 2-month period) during a period of relatively little environmental variation. We therefore suspect that a more important explanation relates to the fact that strain-level compositional patterns in microbiomes are the product of both transmission and persistence. The distribution of a given strain may reflect its transmission history for a short time after transmission, but in the long term, other ecological processes such as selection, priority effects, and demographic stochasticity will determine whether it persists or goes extinct within each host [68,69,70].
Following this reasoning, the best strategy for identifying transmission networks in natural populations may be to focus on recently acquired strains. This approach in turn underscores the importance of longitudinal sampling, which provides the key information needed to identify newly acquired strains. Another reason to sample repeatedly within short time intervals is to minimize the possibility of misclassifying strain sharing events as transmission (e.g., due to independent acquisition of microbes from environmental reservoirs). The genetic similarity threshold recommended for the inStrain pipeline, which we deployed here, is designed to discriminate between strains that diverged as recently as 2.2 years, but this calculation is based on substitution rates in the human gut [26, 71]. Bacteria grow and evolve more slowly in many other environments, including soil [72]. In our study, strain sharing across nonoverlapping baboon generations was elevated if one or both of individuals in a pair were sampled during a rainy month. If rain revives dormant bacterial populations in environmental reservoirs [66, 67], then these strain sharing events may represent long periods of little evolutionary change between hosts rather than continuous transmission of actively evolving bacterial lineages.
Another conclusion from this study is the value of sampling social networks as completely as possible to obtain information about bacterial distributions at the population level. Our analyses support the idea that some bacterial species are more likely to mirror true transmission networks than others. In the FMT data set, strains shared by true donor-recipient pairs were enriched for bacteria with otherwise limited prevalence in the population, obligately anaerobic bacteria that are unlikely to survive in the environment, and bacteria reported only in human hosts. This result is consistent with reports in several animal populations that socially shared microbial species are enriched for traits associated with host dependence [2, 11, 12], as well as theory predicting that reliance on transmission between hosts will select for such traits [8, 73]. Our study therefore suggests that filtering out widely shared, generalist strains that may be acquired independently from host-to-host transmission can improve the signal of true transmission networks.
Another consideration is the selection of an appropriate strain profiling method and reference database, especially when working in nonhuman or non-model systems. We asked whether the main conclusions of our study would hold using an alternative, phylogeny-based strain inference approach. Estimates of strain sharing rates were highly correlated across pipelines, and conclusions based on strain sharing patterns across categories/dyad types were qualitatively consistent. Both pipelines performed well at classifying reads from human-derived metagenomes, but in the baboon data set, StrainPhlAn was limited by its reliance on a reference database in which nonhuman microbes are still poorly represented. In contrast, inStrain can easily accommodate user-provided genomes, which greatly improved database coverage of the baboon dataset (Fig. S8a). However, it is still possible that inStrain misses signals of social transmission in the remaining unmapped reads. Our analyses indicate that differences in sequencing depth and/or mapping rates across samples did not explain variation in strain sharing rates. Nevertheless, if socially transmitted species are disproportionately baboonspecific (and therefore missing from human-centric databases) or low abundance (therefore missing from the metagenome-assembled genomes we used to expand the standard database [47]), our findings could underestimate the signal of social transmission overall. Whether either explanation is important in this dataset is a question that only additional metagenome- and culture-based assembly of new microbial genomes can help answer.
Finally, our study strongly supports the importance of considering host traits and environmental conditions, beyond social interaction itself. In the Amboseli baboon data set, strain sharing was elevated among individuals who spent their lives in different social groups if they were eating similar diets at the time of sampling. Diet is frequently confounded with sociality in species where social partners forage together, share or steal food, or interact with similar parts of a heterogeneous landscape [19, 74, 75]. Other characteristics with the potential to affect microbiome composition, such as age and host genetics, are often more similar among social partners as well [20, 76]. Studies that simply compare strain sharing within and across social units without considering these additional confounds are at risk of overestimating the contribution of social transmission. More generally, our findings argue that social transmission should not be treated as the default explanation for observations of species or strain sharing among interacting individuals.
In summary, strain-resolved metagenomic analyses have clear value for resolving microbiome transmission networks beyond the resolution of species- or genus-level profiling. However, the insights achievable from strain sharing analyses can be improved by careful study design. The proportion of strains shared between individuals is the result of three processes: (i) how many strains they have exchanged via transmission, (ii) how many strains they have independently acquired (e.g., from the environment or other hosts), and (iii) how many of those strains they have both independently retained since transmission occurred. The latter two processes are facilitated by shared environments and can inflate estimates of transmission among social partners. As an alternative approach to coarse estimates of strain sharing rates, we recommend using repeated longitudinal sampling to identify strains that move between hosts within a short period of time. These recently acquired strains are more likely to be reliable indicators of the transmission network, as they have not yet been subjected to extended selection pressures that would alter their distribution in the population. In addition to sampling design, it is also critical to consider ecological, behavioral, genetic, and demographic characteristics that shape microbiome composition, especially when those characteristics are more similar among social partners. Importantly, as in the case of our unexpected finding of greater sharing across wetter months—even when temporally separated by years—taking these additional factors into account can also suggest interesting new routes for mapping the transmission landscapes of microbiomes in the wild.
Data availability
This paper analyzes a combination of existing, publicly available data and newly generated metagenomic sequences. The newly generated sequences are available on the NCBI Sequence Read Archive (accession PRJNA1135081). All original code is available at the Github repository reenadebray/microbiome-strain-sharing.
References
Theis KR, Schmidt TM, Holekamp KE. Evidence for a bacterial mechanism for group-specific social odors among hyenas. Sci Rep. 2012;2:1–8.
Tung J, Barreiro LB, Burns MB, Grenier J-C, Lynch J, Grieneisen LE, et al. Social networks predict gut microbiome composition in wild baboons. eLife. 2015;4: e05224.
Vernier CL, Chin IM, Adu-Oppong B, Krupp JJ, Levine J, Dantas G, et al. The gut microbiome defines social group membership in honey bee colonies. Science Advances. 2020;6: eabd3431.
Grieves LA, Gloor GB, Quinn JS. Symbiotic microbiota vary with breeding group membership in a highly social joint-nesting bird. Behav Ecol. 2023;34:653–61.
Musciotto F, Dobon B, Greenacre M, Mira A, Chaudhary N, Salali GD, et al. Agta hunter–gatherer oral microbiomes are shaped by contact network structure. Evol Hum Sci. 2023;5:e9.
Snyder-Mackler N, Burger JR, Gaydosh L, Belsky DW, Noppert GA, Campos FA, et al. Social determinants of health and survival in humans and other animals. Science. 2020;368:eaax9553.
Browne HP, Neville BA, Forster SC, Lawley TD. Transmission of the gut microbiota: spreading of health. Nat Rev Microbiol. 2017;15:531–43.
Troyer K. Microbes, herbivory and the evolution of social behavior. J Theor Biol. 1984;106:157–69.
Chomicki G, Toby Kiers E, Renner SS. the evolution of mutualistic dependence. Annu Rev Ecol Evol Syst. 2020;51:409–32.
Lombardo MP. Access to mutualistic endosymbiotic microbes: an underappreciated benefit of group living. Behav Ecol Sociobiol. 2007;62:479–97.
Nayfach S, Rodriguez-Mueller B, Garud N, Pollard KS. An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography. Genome Res. 2016;26:1612–25.
Raulo A, Bürkner P-C, Finerty GE, Dale J, Hanski E, English HM, et al. Social and environmental transmission spread different sets of gut microbes in wild mice. Nat Ecol Evol. 2024;8:972–85.
Amaral WZ, Lubach GR, Proctor A, Lyte M, Phillips GJ, Coe CL. Social influences on Prevotella and the gut microbiome of young monkeys. Psychosom Med. 2017;79:888.
Zhang T, Li M, Shi T, Yan Y, Niyazbekova Z, Wang X, et al. Transmission of the gut microbiome in cohousing goats and pigs. Front Microbiol. 2022;13:948617.
Yarlagadda K, Razik I, Malhi RS, Carter GG. Social convergence of gut microbiomes in vampire bats. Biol Lett. 2021;17:20210389.
Delgado-Baquerizo M, Oliverio AM, Brewer TE, Benavent-González A, Eldridge DJ, Bardgett RD, et al. A global atlas of the dominant bacteria found in soil. Science. 2018;359:320–5.
Wilkinson GS. Social grooming in the common vampire bat. Desmodus rotundus Anim Behav. 1986;34:1880–9.
Lamprecht J. The function of social hunting in larger terrestrial carnivores. Mamm Rev. 1981;11:169–79.
Michel A, Minocher R, Niehoff P-P, Li Y, Nota K, Gadhvi MA, et al. Isolated Grauer’s gorilla populations differ in diet and gut microbiome. Mol Ecol. 2023;32:6523–42.
Wey TW, Blumstein DT. Social cohesion in yellow-bellied marmots is established through age and kin structuring. Anim Behav. 2010;79:1343–52.
Franks, Victoria R., Ewen, John G., McCready, Mhairi, Rowcliffe, J. Marcus, Smith D, Thorogood R. Analysing age structure, residency and relatedness uncovers social network structure in aggregations of young birds. Anim Behav. 2020;166: 73–84.
Yatsunenko T, Rey FE, Manary MJ, Trehan I, Dominguez-Bello MG, Contreras M, et al. Human gut microbiome viewed across age and geography. Nature. 2012;486:222–7.
Bonder MJ, Kurilshikov A, Tigchelaar EF, Mujagic Z, Imhann F, Vila AV, et al. The effect of host genetics on the gut microbiome. Nat Genet. 2016;48:1407–12.
Truong DT, Tett A, Pasolli E, Huttenhower C, Segata N. Microbial strain-level population structure and genetic diversity from metagenomes. Genome Res. 2017;27:626–38.
Podlesny D, Arze C, Dörner E, Verma S, Dutta S, Walter J, et al. Metagenomic strain detection with SameStr: identification of a persisting core gut microbiota transferable by fecal transplantation. Microbiome. 2022;10:1–15.
Olm MR, Crits-Christoph A, Bouma-Gregson K, Firek BA, Morowitz MJ, Banfield JF. inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains. Nat Biotechnol. 2021;39:727–36.
Beghini F, McIver LJ, Blanco-Míguez A, Dubois L, Asnicar F, Maharjan S, et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife. 2021;10: e65088.
Brito IL, Gurry T, Zhao S, Huang K, Young SK, Shea TP, et al. Transmission of human-associated microbiota along family and social networks. Nat Microbiol. 2019;4:964–71.
Enav H, Paz I, Ley RE. Strain tracking in complex microbiomes using synteny analysis reveals per-species modes of evolution. Nature Biotechnology. 2024; 1–11.
Valles-Colomer M, Blanco-Míguez A, Manghi P, Asnicar F, Dubois L, Golzato D, et al. The person-to-person transmission landscape of the gut and oral microbiomes. Nature. 2023;614:125–35.
Beghini F, Pullman J, Alexander M, Shridhar SV, Prinster D, Singh A, et al. Gut microbiome strain-sharing within isolated village social networks. Nature. 2024;637:167–75.
Aggarwala V, Mogno I, Li Z, Yang C, Britton GJ, Chen-Liaw A, et al. Precise quantification of bacterial strains after fecal microbiota transplantation delineates long-term engraftment and explains outcomes. Nat Microbiol. 2021;6:1309–18.
Lou YC, Olm MR, Diamond S, Crits-Christoph A Firek Brian, Baker R, Morowitz MJ, et al. Infant gut strain persistence is associated with maternal origin, phylogeny, and traits including surface adhesion and iron acquisition. Cell Rep Med. 2021;2: 100393.
Selma-Royo M, Dubois L, Manara S, Armanini F, Cabrera-Rubio R, Valles-Colomer M, et al. Birthmode and environment-dependent microbiota transmission dynamics are complemented by breastfeeding during the first year. Cell Host Microbe. 2024;32:996-1010.e4.
Ferretti P, Pasolli E, Tett A, Asnicar F, Gorfer V, Fedi S, et al. Mother-to-infant microbial transmission from different body sites shapes the developing infant gut microbiome. Cell Host Microbe. 2018;24:133-145.e5.
Carter, Matthew M., Olm, Matthew R., Merrill, Bryan D., Dahan, Dylan, Tripathi S, Spencer SP, Yu FB, Jain S, Neff N, et al. Ultra-deep sequencing of Hadza hunter-gatherers recovers vanishing gut microbes. Cell. 2023;186: 3111–3124.e13.
Debray R, Tung J, Archie EA. Ecology and evolution of the social microbiome. Annual Reviews of Ecology, Evolution, and Systematics. 2024;55.
Asnicar F, Manara S, Zolfo M, Truong DT, Scholz M, Armanini F, et al. Studying vertical microbiome transmission from mothers to infants by strain-level metagenomic profiling. mSystems. 2017;2: e00164–16.
Ianiro G, Punčochář M, Karcher N, Porcari S, Armanini F, Asnicar F, et al. Variability of strain engraftment and predictability of microbiome composition after fecal microbiota transplantation across different diseases. Nat Med. 2022;28:1913–23.
Alberts SC, Altmann J. The Amboseli Baboon Research Project: 40 years of continuity and change. Long-Term Field Studies of Primates. 2012; 261–287.
Sarkar A, Harty S, Johnson KV-A, Moeller AH, Archie EA, Schell LD, et al. Microbial transmission in animal social networks and the social microbiome. Nature Ecology & Evolution. 2020;4: 1020–1035.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20.
Almeida A, Nayfach S, Boland M, Strozzi F, Beracochea M, Shi ZJ, et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat Biotechnol. 2020;39:105–14.
Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nat Methods. 2012;9:357–9.
Vilgalys TP, Fogel AS, Anderson JA, Mututua RS, Kinyua Warutere J, Long’ida Siodi I, et al. Selection against admixture and gene regulatory divergence in a long-term primate field study. Science. 2022;377: 635–641.
Campos FA, Villavicencio F, Archie EA, Colchero F, Alberts SC. Social bonds, social status and survival in wild baboons: a tale of two sexes. Philos Trans R Soc Lond B Biol Sci. 2020;375:20190621.
Manara S, Asnicar F, Beghini F, Bazzani D, Cumbo F, Zolfo M, et al. Microbial genomes from non-human primate gut metagenomes expand the primate-associated bacterial tree of life with over 1000 novel species. Genome Biol. 2019;20:1–16.
Olm MR, Brown CT, Brooks B, Banfield JF. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 2017;11:2864–8.
Blanco-Míguez A, Beghini F, Cumbo F, McIver LJ, Thompson KN, Zolfo M, et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat Biotechnol. 2023;41:1633–44.
Alberts S, Archie E, Altmann J, Tung J. Monitoring guide for the Amboseli Baboon Research Project: protocols for long-term monitoring and data collection. 2016.
Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics. 2022;38:5315–6.
Mukherjee S, Stamatis D, Bertsch J, Ovchinnikova G, Sundaramurthi JC, Lee J, et al. Genomes OnLine Database (GOLD) vol 8: overview and updates. Nucleic Acids Res. 2020;49:D723–33.
Goloshchapov OV, Olekhnovich EI, Sidorenko SV, Moiseev IS, Kucher MA, Fedorov DE, et al. Long-term impact of fecal transplantation in healthy volunteers. BMC Microbiol. 2019;19:1–13.
Björk JR, Dasari MR, Roche K, Grieneisen L, Gould TJ, Grenier J-C, et al. Synchrony and idiosyncrasy in the gut microbiome of wild baboons. Nat Ecol Evol. 2022;6:955–64.
Wolff R, Shoemaker W, Garud N. Ecological stability emerges at the level of strains in the human gut microbiome. MBio. 2023;14:e02502-e2522.
Chen L, Wang D, Garmaeva S, Kurilshikov A, Vila AV, Gacesa R, et al. The long-term genetic stability and individual specificity of the human gut microbiome. Cell. 2021;184:2302-2315.e12.
Brito IL, Alm EJ. Tracking strains in the microbiome: insights from metagenomics and models. Front Microbiol. 2016;7:175285.
Freitag TL, Hartikainen A, Jouhten H, Sahl C, Meri S, Anttila V-J, et al. Minor effect of antibiotic pre-treatment on the engraftment of donor microbiota in fecal transplantation in mice. Front Microbiol. 2019;10:483132.
Gopalakrishnan V, Dozier EA, Glover MS, Novick S, Ford M, Morehouse C, et al. Engraftment of bacteria after fecal microbiota transplantation is dependent on both frequency of dosing and duration of preparative antibiotic regimen. Microorganisms. 2021;9:1399.
Damman CJ, Brittnacher MJ, Westerhoff M, Hayden HS, Radey M, Hager KR, et al. Low level engraftment and improvement following a single colonoscopic administration of fecal microbiota to patients with ulcerative colitis. PLoS ONE. 2015;10:e0133925.
McDonald D, Ackermann G, Khailova L, Baird C, Heyland D, Kozar R, et al. Extreme dysbiosis of the microbiome in critical illness. mSphere. 2016;1:10.1128: msphere.00199–16.
Turnbaugh PJ, Ridaura VK, Faith JJ, Rey FE, Knight R, Gordon JI. The effect of diet on the human gut microbiome: a metagenomic analysis in humanized gnotobiotic mice. Sci Transl Med. 2009;1:6ra14.
David LA, Maurice CF, Carmody RN, Gootenberg DB, Button JE, Wolfe BE, et al. Diet rapidly and reproducibly alters the human gut microbiome. Nature. 2013;505:559–63.
de Jonge N, Carlsen B, Christensen MH, Pertoldi C, Nielsen JL. The gut microbiome of 54 mammalian species. Front Microbiol. 2022;13:886252.
Post DG. Feeding behavior of yellow baboons (Papio cynocephalusin) the Amboseli National Park. Kenya Int J Primatol. 1982;3:403–30.
Barnard RL, Osborne CA, Firestone MK. Responses of soil bacterial and fungal communities to extreme desiccation and rewetting. ISME J. 2013;7:2229–41.
Joergensen RG, Wichern F. Alive and kicking: why dormant soil microorganisms matter. Soil Biol Biochem. 2018;116:419–30.
Adair KL, Douglas AE. Making a microbiome: the many determinants of host-associated microbial community composition. Curr Opin Microbiol. 2017;35:23–9.
Debray R, Herbert RA, Jaffe AL, Crits-Christoph A, Power ME, Koskella B. Priority effects in microbiome assembly. Nat Rev Microbiol. 2022;20:109–21.
Hayashi I, Fujita H, Toju H. Deterministic and stochastic processes generating alternative states of microbiomes. ISME Commun. 2024;4:ycae007.
Zhao S, Lieberman TD, Poyet M, Kauffman KM, Gibbons SM, Groussin M, et al. Adaptive evolution within gut microbiomes of healthy people. Cell Host Microbe. 2019;25:656-667.e8.
Caro TA, McFarlin J, Jech S, Fierer N, Kopf S. Hydrogen stable isotope probing of lipids demonstrates slow rates of microbial growth in soil. Proc Natl Acad Sci. 2023;120:e2211625120.
Obeng N, Bansept F, Sieber M, Traulsen A, Schulenburg H. Evolution of microbiota–host associations: the microbe’s perspective. Trends Microbiol. 2021;29:779–87.
Jaeggi AV, Van Schaik CP. The evolution of food sharing in primates. Behav Ecol Sociobiol. 2011;65:2125–40.
Moeller AH, Foerster S, Wilson ML, Pusey AE, Hahn BH, Ochman H. Social behavior shapes the chimpanzee pan-microbiome. Sci Adv. 2016;2:e1500997.
Ross KG. Molecular ecology of social behaviour: analyses of breeding systems and genetic structure. Mol Ecol. 2001;10:265–84.
Acknowledgements
In Kenya, our research was approved by the Wildlife Research Training Institute (WRTI), Kenya Wildlife Service (KWS), the National Commission for Science, Technology, and Innovation (NACOSTI), and the National Environment Management Authority (NEMA). We also thank the University of Nairobi, the Kenya Institute of Primate Research (KIPRE), the National Museums of Kenya, the members of the Amboseli-Longido pastoralist communities, the Enduimet Wildlife Management Area, Ker & Downey Safaris, Air Kenya, and Safarilink for their cooperation and assistance in the field. Particular thanks go to the Amboseli Baboon Project long-term field team (R.S. Mututua, S. Sayialel, J.K. Warutere, I.L. Siodi, I.L., and L. Musembei), and to T. Wango and V. Oudu for their assistance in Nairobi. The baboon project database, Babase, is expertly managed by N. Learn, J. Gordon, and W. Wilbur. Database design and programming are provided by K. Pinc. For a complete set of acknowledgments of funding sources, logistical assistance, and data collection and management, please visit http://amboselibaboons.nd.edu/acknowledgements/.
Funding
Open Access funding enabled and organized by Projekt DEAL. We gratefully acknowledge the support of the National Science Foundation and the National Institutes of Health for the majority of the data represented here, currently through R01AG071684, R01AG075914, and R61AG078470, as well as prior funding from the National Science Foundation that supported microbiome data generation in Amboseli, including IOS-1053461 and DEB-01840223. Current support for field-based data collection also comes from the Max Planck Institute for Evolutionary Anthropology, and we thank Duke University, Princeton University, and the University of Notre Dame for financial and logistical support.
Author information
Authors and Affiliations
Contributions
Conceptualization: RD and JT; Methodology: RD, SEW, CD, EA and JT; Investigation: RD, SEW, CD; Formal Analysis: RD, CD; Resources: EA, JT; Data Curation: RD, SEW, CD, EA, JT; Writing – Original Draft: RD, JT; Writing – Review & Editing: SEW, CD, EA, JT; Visualization: RD; Supervision: EA, JT; Project Administration: EA, JT; Funding Acquisition: EA, JT.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
This research was approved by the IACUC at Duke University and the University of Notre Dame and the Ethics Council of the Max Planck Society and adhered to all the laws and guidelines of Kenya (Research Permit: NACOSTI-P-23–28575).
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
40168_2025_2051_MOESM1_ESM.xlsx
Supplementary Material 1. Supplementary tables: Supplementary Table 1: Read counts and mapping statistics for the human fecal microbiota transplant dataset. Supplementary Table 2: Pairwise comparisons in baboon gut metagenome dataset. Supplementary Table 3: Read counts and mapping statistics for baboon gut metagenome dataset.
40168_2025_2051_MOESM2_ESM.docx
Supplementary Material 2. Supplementary figures: Figure S1. Representation of individuals in multiple dyads does not bias results. We iteratively downsampled the full dataset to include only a single randomly selected sample from each individual. The adjusted p-values as reported by a Tukey’s HSD Test were generally consistent with the results of the full dataset. Comparisons that were not significant in the full dataset (e.g., Not alive at same time – Different social groups) never rose to significance in our permutations, while comparisons that were significant in the full dataset (e.g., Not alive at same time – Longitudinal) typically remained significant despite the reduced sample size. Dashed line indicates padj=0.05. Figure S2. Mapping rates in the baboon gut metagenome dataset. (a) Database customization with metagenome-assembled genomes. Percentage of reads after quality filtering that aligned to microbial genomes in either the Unified Human Gastrointestinal Genome (UHGG) database or the UHGG database plus metagenome-assembled genomes from non-human primate fecal samples. (b) Mapping rates by month of sample collection. Mapping rates varied slightly by sampling month (ANOVA, p=0.053), with a tendency for samples from the dry season to map better than samples from the wet season. (c) Mapping rates by year of sample collection. Mapping rates did not vary by year. Figure S3. Bacterial taxa shared between donors and recipients of fecal microbiota transplants. (a) Family-level structure of bacterial taxa that were shared at the species level between subjects. Bar chart of species (aggregated into families) that exhibit species sharing between matched (left) or mismatched (right) pairs. Colors reflect different bacterial families and are scaled to represent the relative proportions of strain-sharing events in each category. (b) Species-level sharing of families among matched pairs relative to mismatched pairs. The log2 odds ratio represents the relative likelihood that bacterial families were shared at the species level between matched donor-recipient pairs, compared to their species sharing rates between mismatched pairs. (c) Strain-level sharing of families among matched pairs relative to mismatched pairs. The log2 odds ratio represents the relative likelihood that bacterial families were shared at the strain level between matched donor-recipient pairs, compared to their strain sharing rates between mismatched pairs. Error bars represent 95% confidence intervals. Error bars overlapping 0 (dashed line) are not significantly enriched or depleted in matched pairs. Figure S4. Taxonomic and functional characteristics of bacterial taxa in which strain sharing was detected between baboons in a wild population. (a) Family-level structure of bacterial taxa that were shared at the strain level between individuals. Bar chart of species (aggregated into families) that exhibit strain-sharing between pairs of baboons, based on a 99.999% ANI threshold. Colors reflect different bacterial families and are scaled to represent the relative proportions of strain-sharing events in each category. (b) Anaerobic metabolism across varying definitions of transmission. Strain sharing events were annotated based on phenotype information available in the Genomes OnLine database. (c) Population prevalence of bacterial species. Each strain sharing event was annotated according to the prevalence of the species to which it belonged, excluding later longitudinal samples to avoid double-counting individuals. The x-axis represents the number of baboons containing the species (of a maximum possible of 93). Figure S5. Diet and seasonality in Amboseli. (a) Dietary similarity between social groups. Dietary similarity was calculated using the Jaccard similarity index based on the dietary compositions of each pair of baboons that lived in different social groups at similar times, aggregated by month. (b) Variation in diets throughout the year. Colors reflect different foods consumed by the baboons between 2007-2017 and are scaled to represent the relative proportions consumed in each month. Figure S6. A consensus-based approach does not detect an effect of rainfall on strain sharing. Rainfall was measured daily using a rain gauge and summed by month for samples taken from baboons that lived at different times. The y-axis represents strain sharing based on the consensus average nucleotide identity (conANI) value calculated by inStrain. This metric considers two genomes to differ at a given site if their consensus alleles are different (i.e., it ignores minor allele sharing). (b) Microdiversity-aware approach (reproduced from Figure 4b). This metric considers two genomes to differ at a given site if they do not share any alleles at that site (major or minor). Figure S7. Host characteristics that did not predict strain sharing rates. (a) Number of years between samples. Plot includes only baboon pairs that lived at different times, as the remaining dyad types were intentionally sampled within short time periods. (b) Difference in ages at times of sampling. Chronological ages were known with high confidence because all subjects were born in regularly censused study groups. Figure S8. Strain sharing analysis of FMT dataset with StrainPhlAn pipeline. (a) Correlation between inStrain and StrainPhlAn estimates of strain sharing. Each point represents a donor-recipient pair (either matched or mismatched) in the FMT dataset. Samples with fewer than three shared species (i.e., the denominator in the calculation of strain sharing rates) are excluded from this visualization. (b) Strain sharing across varying definitions of transmission. The percentage of strain sharing events among matched donor-recipient dyads (blue) and all other comparisons (red) is shown following serially more stringent filtering criteria. Asterisks represent significant differences between matched and mismatched cohorts based on t-test and Benjamini-Hochberg correction: (***) p<0.001; (**) 0.001≤p<0.01; (*) 0.01≤p<0.05. Figure S9. Strain sharing analysis of baboon dataset with StrainPhlAn pipeline. (a) Correlation between inStrain and StrainPhlAn estimates of strain sharing. Each point represents a donor-recipient pair in the FMT dataset. Samples with fewer than three shared species (i.e., the denominator in the calculation of strain sharing rates) are excluded from this visualization. (b) Strain sharing rates across dyad types. The percentage of shared strains between each dyad (≤0.1 normalized phylogenetic distance).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Debray, R., Dickson, C.C., Webb, S.E. et al. Shared environments complicate the use of strain-resolved metagenomics to infer microbiome transmission. Microbiome 13, 59 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s40168-025-02051-8
Received:
Accepted:
Published:
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s40168-025-02051-8