Diverse and specialized metabolic capabilities of microbes in oligotrophic built environments
Microbiome volume 12, Article number: 198 (2024)
Abstract
Background
Built environments (BEs) are typically considered to be oligotrophic and harsh environments for microbial communities under normal, non-damp conditions. However, the metabolic functions of microbial inhabitants in BEs remain poorly understood. This study aimed to shed light on the functional capabilities of microbes in BEs by analyzing 860 representative metagenome-assembled genomes (rMAGs) reconstructed from 738 samples collected from BEs across the city of Hong Kong and from the skin surfaces of human occupants. The study specifically focused on the metabolic functions of rMAGs that are either phylogenetically novel or prevalent in BEs.
Results
The diversity and composition of BE microbiomes were primarily shaped by the sample type, with Micrococcus luteus and Cutibacterium acnes being prevalent. The metabolic functions of rMAGs varied significantly based on taxonomy, even at the strain level. A novel strain affiliated with the Candidatus class Xenobia in the Candidatus phylum Eremiobacterota and two novel strains affiliated with the superphylum Patescibacteria exhibited unique functions compared with their close relatives, potentially aiding their survival in BEs and on human skin. The novel strain in the class Xenobia possessed genes for transporting nitrate and nitrite as nitrogen sources and for mitigating the nitrosative stress induced by nitric oxide during denitrification. The two novel Patescibacteria strains both possessed a broad array of genes for amino acid and trace element transport, while one of them carried genes for carotenoid and ubiquinone biosynthesis. The globally prevalent M. luteus in BEs displayed a large and open pangenome, with high infraspecific genomic diversity contributed by 11 conspecific strains recovered from BEs in a single geographic region. The versatile metabolic functions encoded in the large accessory genomes of M. luteus may contribute to its global ubiquity and specialization in BEs.
Conclusions
This study illustrates that the microbial inhabitants of BEs possess metabolic potentials that enable them to tolerate and counter different biotic and abiotic conditions. Additionally, these microbes can efficiently utilize various limited residual resources from occupant activities, potentially enhancing their survival and persistence within BEs. A better understanding of the metabolic functions of BE microbes will ultimately facilitate the development of strategies to create a healthy indoor microbiome.
Introduction
Built environments (BEs) are habitats for many microbes, including bacteria, fungi, and viruses [1]. The diversity and composition of BE microbiomes are influenced by a multitude of factors (e.g., geography [2], ventilation strategy [3], and humidity [4]). Some microbes in BEs may be actively persisting or proliferating, while others are inactive, dormant, or dying, depending on their compatibility with the environmental conditions (permissive or restrictive) [5]. Due to exchanges with occupants and nearby surroundings, bacteria and fungi associated with humans (e.g., Micrococcus and Malassezia) and soils (e.g., Bacillus and Pseudomonas) are among the most prevalent in BEs [6].
Unlike nutrient-rich ecosystems such as the human gut and soils, BEs that do not have dampness or other environmental issues are considered oligotrophic, but microbes from diverse lineages may be metabolically active under such conditions [7]. In general, when faced with nutrient-depleted or unfavorable conditions, some of the metabolically active taxa may exist in a non- or slow-growing state [8, 9] and synthesize macromolecules to maintain a basal metabolism [10]. Variation in habitat conditions not only affects gene expression in a microbial lineage but also shapes its gene contents through selection and drift after mutations and after gene flow, resulting in genetic variability even within a species [11]. The specific conditions (e.g., temperature and antimicrobial substances) of BEs can vary; thus, a ubiquitous microbial species might display genetic diversification across different types and locations of BEs. Similar genetic variation across locations can be found in many other ecosystems [12]. The evolution, niche adaptation, and even infectious potential of a species in diverse habitats can be revealed by a pangenome analysis of its essential genes (i.e., core genome) and dispensable genes (i.e., accessory genome) [13, 14].
The ability to reconstruct nearly complete metagenome-assembled genomes (MAGs) from environmental metagenomes has facilitated the recovery of many previously uncharacterized microbes and the expansion of the phylogenetic and functional diversity of uncultivated Bacteria and Archaea [15, 16]. Through reconstructed MAGs, members of the newly defined superphylum Patescibacteria, consisting of uncultured, deeply branching lineages in bacteria, have been found in a variety of environments, including groundwater [17], soils [18], marine sediments [19], and the human oral cavity [20]. Patescibacteria typically have an ultra-small genome, a parasitic lifestyle, and streamlined functions [21]. Similarly, bacteria identified using MAGs in the novel Candidatus phylum Eremiobacterota have gained attention due to their ability to perform carbon fixation by oxidizing trace gases from the atmosphere as energy and carbon sources [22]. This phylum encompasses diverse species that are metabolically versatile, enabling them to thrive under harsh environmental conditions [23]. However, the specific metabolic functions of bacteria from the Patescibacteria and Eremiobacterota phyla in BE settings, which are frequently influenced by human presence, have yet to be reported.
In this study, we performed an analysis of 738 metagenomic samples collected from diverse BEs and the skin surfaces of human occupants in Hong Kong with the goal of investigating the metabolic functions of the microbial inhabitants in these settings. A total of 860 non-redundant representative MAGs (rMAGs) were reconstructed, of which 373 were considered to represent novel species. One novel strain affiliated with Eremiobacterota and two novel strains affiliated with Patescibacteria were identified, and their metabolism was found to be closely linked to the conditions present in BEs, which are significantly influenced by human activities. The pangenome of Micrococcus luteus, a species prevalent in local and global BEs and on human skin, revealed a high infraspecific diversity and functional specificity even within a single geographic location. This study provides new insights into the functional characteristics of microbes inhabiting BEs, highlighting the potential influences of occupant activities on microbial metabolic potentials.
Results
Diversity and composition of BE microbiomes varied by sample type
The majority of the reads in all of the samples were assigned to Bacteria (average 99.6 ± 0.35%), with minor proportions of Archaea (0.14 ± 0.11%) and viruses (0.27 ± 0.32%), while the lack of fungal taxa was due to the annotation database used. At the phylum level, Actinobacteriota dominated on skin and residential surfaces, while Proteobacteria dominated on pier surfaces and in subway air and surfaces (Fig. 1a). The 10 most abundant species, each with an average relative abundance > 1%, were prevalent in all of the samples, collectively accounting for a total average relative abundance of 33.9 ± 20.4% (Fig. S1). Among these, M. luteus and Cutibacterium acnes were the most dominant species across sample types (Fig. 1b). Indicator species analysis showed that bacteria associated with marine environments (i.e., Vibrio alginolyticus and Gloeocapsa sp. PCC 7428) were indicative of pier surfaces, while the human skin-associated bacterium Staphylococcus capitis was indicative of subway air (Table S1). The microbial composition significantly differed by sample type (permutational multivariate analysis of variance [PERMANOVA], p = 0.001, pseudo-F = 39.5, R² = 0.21, Table S2), although a significant dispersion effect was also detected between some sample types in pairwise comparisons (Table S2), highlighting the influence of sample type on the composition of BE microbiomes. The accumulation curve showed that species richness increased with sample size and reached a plateau (Fig. S2), suggesting that the species identified in this meta-analysis were representative. The subway air microbiome contained the highest number of species, the pier surface microbiome was the most diverse and even, and the skin microbiome was the least diverse and most uneven (Fig. 1c).
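The three alpha diversity measures compared across sample types (species richness, the Shannon diversity index, and Pielou's evenness) can be illustrated with a minimal sketch. The function name and the two abundance profiles below are hypothetical and are not data from this study; the sketch only shows how an uneven community, such as one dominated by a single species, scores lower on the Shannon index and evenness than an even community of the same richness.

```python
import math

def alpha_diversity(counts):
    """Return (richness, Shannon index, Pielou's evenness) for one sample."""
    present = [c for c in counts if c > 0]
    if not present:
        return 0, 0.0, 0.0
    total = sum(present)
    richness = len(present)
    # Shannon index: H' = -sum(p_i * ln p_i) over the observed species.
    shannon = -sum((c / total) * math.log(c / total) for c in present)
    # Pielou's evenness: H' divided by its maximum possible value, ln(richness).
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness

# Hypothetical read counts per species (illustrative only, not study data).
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

for name, counts in [("even", even_sample), ("skewed", skewed_sample)]:
    richness, shannon, evenness = alpha_diversity(counts)
    print(f"{name}: richness={richness}, Shannon={shannon:.3f}, evenness={evenness:.3f}")
```

Both hypothetical samples have the same richness, so the difference in their Shannon and evenness values reflects dominance alone.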
Because skin-associated bacteria (e.g., Cutibacterium species) were detected in different sample types, and considering the urban setting of this study along with Hong Kong’s high population density, the extent to which skin microbiomes contributed to the BEs was investigated. A pairwise comparison between samples revealed that the highest similarity was between the skin and subway air microbiomes, while the lowest was between the skin and pier surface microbiomes (Fig. 1d). These results corroborate previous findings that human occupants are major contributors to indoor BE microbiomes, especially in locations with high occupancy (e.g., subways), while the human contribution was significantly reduced in outdoor BEs (e.g., piers).
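The pairwise comparison above rests on the Bray–Curtis dissimilarity, which ranges from 0 (identical composition) to 1 (no shared taxa). The following is a minimal sketch of that measure; the function name and the three abundance profiles are hypothetical values chosen for illustration and do not reproduce the study's data.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles."""
    taxa = set(sample_a) | set(sample_b)
    # Sum of the lesser abundance of each taxon across the two samples.
    shared = sum(min(sample_a.get(t, 0), sample_b.get(t, 0)) for t in taxa)
    total = sum(sample_a.values()) + sum(sample_b.values())
    return 1 - 2 * shared / total if total else 0.0

# Hypothetical relative abundances (percent), not values from this study.
skin = {"Cutibacterium acnes": 60, "Micrococcus luteus": 30, "Staphylococcus capitis": 10}
subway_air = {"Cutibacterium acnes": 30, "Micrococcus luteus": 40,
              "Staphylococcus capitis": 10, "other taxa": 20}
pier_surface = {"Vibrio alginolyticus": 70, "Micrococcus luteus": 10, "other taxa": 20}

print(round(bray_curtis(skin, subway_air), 2))    # lower dissimilarity
print(round(bray_curtis(skin, pier_surface), 2))  # higher dissimilarity
```

In this toy example the skin profile is closer to the subway air profile than to the pier surface profile, mirroring the direction of the reported result without implying its magnitude.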
Taxonomy and diversity of BE microbiomes. a Top 10 phyla in BE microbiomes across the six sample types. All other phyla are grouped into the “Others” category. The samples from each sample type are arranged based on decreasing relative abundance of Proteobacteria from left to right. b Density plot of the two most abundant species, Cutibacterium acnes and Micrococcus luteus, in all of the samples across sample types. The mean relative abundance of each core species is indicated by a vertical dashed line for each sample type. c Alpha diversity of the microbiomes across different sample types as assessed by Pielou’s evenness, species richness, and the Shannon diversity index. d Pairwise comparison of microbiomes between skin and other types of BE based on the Bray–Curtis dissimilarity
rMAGs reconstructed from BEs clustered by their functional potentials
Following genome assembly, binning, and dereplication, 860 rMAGs were obtained, comprising 736 and 124 that were of medium and high quality, respectively. Taxonomic annotation showed that all of the rMAGs belonged to Bacteria, with the majority of them assigned to the phyla Proteobacteria (n = 400) and Actinobacteriota (n = 335), consistent with the short-read-based taxonomic classification (Fig. 1a). Among the rMAGs, 363 reached their deepest level of taxonomic classification at the genus rank, exhibiting an average nucleotide identity (ANI) value of < 95% to the closest representative in the Genome Taxonomy Database (GTDB) [24], thus being considered as novel species. One medium-quality unclassified rMAG with a completeness reaching 84.1% was assigned to the Candidatus class Xenobia in the Candidatus phylum Eremiobacterota (formerly WPS-2), a terrestrial bacterial clade known for its acid-tolerant adaptation [23]. In addition, two high-quality unclassified rMAGs were assigned to the family Saccharimonadaceae in the superphylum Patescibacteria, a mostly uncultivated group almost exclusively known through (meta)genomics [21]. The novel Xenobia rMAG exhibited a significantly higher relative abundance on indoor surfaces and occupant skin in one of the residences (average 0.54 ± 1.1%, post hoc Kruskal–Wallis test p < 0.05 for all pairwise comparisons between residences), while its presence in the other environments was negligible (average 0.003 ± 0.04%) (Fig. S3). Similarly, the two novel Patescibacteria rMAGs displayed enrichment on occupant skin in two residences compared with other residences and other BEs (average relative abundance 0.09 ± 0.20% and 0.083 ± 0.15% in residence 3 and residence 4, respectively, post hoc Kruskal–Wallis test p < 0.05 for all pairwise comparisons between residences) (Fig. S3).
To investigate whether the rMAGs exhibited clustering based on their metabolic functions, the unsupervised k-means clustering method was applied to the pairwise binary Jaccard distance calculated based on the presence or absence of KEGG Orthology (KO) groups, which resulted in two distinct clusters (Fig. 2a). Cluster B comprised rMAGs exclusively from the phylum Actinobacteriota, whereas cluster A comprised rMAGs from diverse phyla. However, rMAGs from the same phylum in cluster A tended to cluster together, suggesting a higher functional similarity within phyla than between phyla. The strong correlation between the phylogenetic distance and metabolic function distance of the rMAGs (Fig. 2b, Procrustes test, p = 0.001, m² = 0.48) underscored how phylogeny influenced the metabolic functions of genomes. As a result, specific metabolic functions were particularly prevalent in one or a number of phyla. For example, rMAGs affiliated with Cyanobacteria contained specific functions such as carbon fixation and selenite and arsenate reduction (Fig. 2c). Some of the phylogenetically close rMAGs could also be assigned to different clusters (Fig. 2d), indicating that metabolic functions in the same phylogenetic group could differ. In particular, the 335 rMAGs affiliated with Actinobacteriota were assigned to two clusters (91 and 244 in clusters A and B, respectively), each with distinct enriched functions. This difference between functions could be in part driven by the prevalence of the acetate oxidation function in all of the 140 M. luteus rMAGs in cluster B (Fig. S4).
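The distance underlying this clustering can be sketched as follows. This is a minimal illustration of the binary Jaccard distance between KO presence/absence profiles, not the pipeline used in the study; the rMAG names and KO labels are placeholders, and the clustering step itself (k-means on the resulting distances) is omitted.

```python
def jaccard_distance(kos_a, kos_b):
    """Binary Jaccard distance between two sets of KO identifiers."""
    union = kos_a | kos_b
    if not union:
        return 0.0
    return 1 - len(kos_a & kos_b) / len(union)

def pairwise_distances(profiles):
    """Return {(name_a, name_b): distance} for every unordered pair of genomes."""
    names = sorted(profiles)
    return {
        (a, b): jaccard_distance(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Placeholder KO sets per genome (hypothetical labels, not real annotations).
profiles = {
    "rMAG_1": {"K1", "K2", "K3"},
    "rMAG_2": {"K1", "K2", "K4"},
    "rMAG_3": {"K5", "K6"},
}

distances = pairwise_distances(profiles)
for pair, d in distances.items():
    print(pair, d)
```

Genomes sharing most KO groups yield small distances and would tend to fall into the same functional cluster, whereas genomes with no KO groups in common are at the maximum distance of 1.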
Metabolic functions of rMAGs. a Principal component analysis of Jaccard distance results based on the presence or absence of KEGG Orthology (KO) groups in each rMAG. The ellipses represent the 95% confidence of data points assigned to the functional clusters, assuming a multivariate normal distribution (dashed line) and a multivariate t-distribution (solid line). Four phyla (i.e., Chloroflexota, Eremiobacterota, Myxococcota, and Planctomycetota) each with only one rMAG are excluded. The number of rMAGs in each cluster is indicated in the brackets. b Procrustes analysis showing congruence between the phylogeny and functions (based on KO groups) of rMAGs. c Functional differences between the two clusters at the phylum level. Gene presence and absence for each function were assessed for each rMAG. Prevalence represents the percentage of rMAGs in a phylum that contain the genes associated with a function (0% and 100% denote absence of the genes in all genomes and presence of the genes in all genomes, respectively). The number of rMAGs in each phylum is indicated in the brackets following the phylum names. d An unrooted neighbor-joining phylogenetic tree of rMAGs
Putative biosynthetic gene clusters in rMAGs differed by taxonomy
The putative biosynthetic gene clusters (BGCs) present in the rMAGs were predicted to understand their secondary biosynthetic potential. Putative BGCs for the synthesis of terpene and non-ribosomal peptide synthetase (NRPS) were the most abundant (n = 1011 and 757, respectively) and prevalent (100% for both) across all of the rMAGs (Fig. S5a). These two types of BGCs are recognized for their diverse biological functions in aquatic and terrestrial environments [25], including signaling and communication, stress responses, and competition and defense such as antibiotic production [26, 27]. In addition, a large number of NRPS-like gene clusters (n = 471) were identified; these share similarities with NRPS in terms of genetic organization but may exhibit altered functionality, offering the potential for producing peptides with novel functions and thus providing bacteria with additional capabilities to engage in diverse ecological interactions [28].
The number of putative BGCs in an rMAG was significantly and positively correlated with its genome size (Fig. S5b, Spearman’s rho = 0.51, p-value = 2.1 × 10⁻⁵⁹). The novel rMAG affiliated with Eremiobacterota, with a relatively large genome size of 5.5 Mbp, was predicted to harbor diverse types of putative BGCs (n = 13). In contrast, only one putative BGC for terpene synthesis was identified in one of the rMAGs affiliated with the superphylum Patescibacteria, which might be due to its ultra-small genome size (812 kbp) and streamlined metabolic functions [21]. The composition of putative BGCs in an rMAG was primarily influenced by its taxonomy rather than its sample type (Fig. S5c). At the phylum level, Proteobacteria harbored abundant putative BGCs associated with homoserine lactone synthesis, while Actinobacteriota harbored abundant BGCs associated with ectoine synthesis. Homoserine lactone is used for quorum sensing and regulating bacterial behaviors within a community [29], whereas ectoine functions as an osmoprotectant to aid bacterial survival under osmotic stresses (e.g., desiccation) [30].
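The rank-based correlation reported above can be sketched in a few lines. The implementation below is a plain illustration of Spearman's rho (Pearson correlation of average ranks), not the software used in the study. Of the five data points, only the smallest (about 0.8 Mbp, 1 BGC) and the largest (5.5 Mbp, 13 BGCs) echo values mentioned in the text; the remaining three are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Genome sizes (Mbp) and BGC counts; the middle three pairs are hypothetical.
genome_sizes_mbp = [0.8, 2.5, 3.1, 4.0, 5.5]
bgc_counts = [1, 3, 2, 6, 13]

print(round(spearman_rho(genome_sizes_mbp, bgc_counts), 2))
```

Because the measure depends only on ranks, a very large genome with many BGCs influences the result no more than any other correctly ordered pair.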
Survival strategies for members of Eremiobacterota and Patescibacteria in BEs
Species affiliated with the novel Eremiobacterota and Patescibacteria phyla have been shown to have specialized functions that are finely tuned to a specific environment [21, 23]. Similarly, the novel rMAGs from these two phyla detected in the BEs were also found to carry a number of novel metabolic functions that were absent in their closest relatives, and these functions could potentially aid their survival.
The novel rMAG in the phylum Eremiobacterota was affiliated with the Candidatus class Xenobia, and its closest phylogenetic relative was a strain in the genus JAEXNA01, which was sourced from human fecal samples (Fig. 3a and Table S3). Similar to other close relatives, the Candidatus Xenobia rMAG featured a repertoire of genes for scavenging trace gases such as hydrogen, carbon monoxide, and carbon dioxide in ambient air and other trace organic compounds (e.g., alcohol) as energy and carbon sources [23]. These genes include those for the [NiFe]-hydrogenase (hydA and hydC), type I [MoCu]-carbon monoxide dehydrogenase (coxL, coxM, and coxS), type II carbon monoxide dehydrogenase (cooC), phosphoenolpyruvate carboxylase (ppc), and alcohol and aldehyde dehydrogenases (adh gene families and aldH) (Figs. 3a, c and 4, and Table S3). The rMAG also harbored genes (LivF, LivG, and LivH) that encode an ABC transporter for taking up branched-chain amino acids (BCAAs) such as leucine, isoleucine, and valine, which are essential for cellular growth. Furthermore, it contained comprehensive pathways for sulfur and ammonia metabolism, spanning from extracellular transport into the cell to intracellular assimilation, that are crucial for synthesizing sulfur-containing amino acids (e.g., cysteine and methionine) as well as glutamine and glutamate (Fig. 4). As part of nitrogen metabolism, a newly acquired gene (narK) that encodes a transmembrane protein for nitrate/nitrite transport [31] was found, but the assimilated nitrite may be reduced to nitric oxide (a toxic intermediate) by a nitrite reductase (nirK) to potentially induce nitrosative stress [32] (Fig. 3c). However, this stress could be effectively mitigated by multiple copies of the newly acquired genes encoding NnrS proteins for detoxification [33] (Figs. 3c and 4, and Table S3). 
Additionally, a newly acquired sapC gene, which encodes a permease for transporting peptides and potentially conferring resistance to antimicrobial peptides [34] released by microbial competitors or host organisms, was identified (Fig. 3c).
The maximum-likelihood phylogenetic trees of the three novel rMAGs and related rMAGs. The phylogeny of (a) the novel Xenobia rMAG (SL346106_bin.7) and (b) the two novel Patescibacteria rMAGs (SL336691_bin.7 and SL346115_bin.3) identified in BEs. Genes associated with selected key functions (e.g., trace gas metabolism in the Xenobia rMAG and nutrition transport and stress response in the Patescibacteria rMAGs) are shown on the outer rings, with the color intensity representing the gene copy number. Genes that were absent (corresponding to a zero copy number) are shown in white. The branches of the novel rMAGs and their closest relatives are colored in red (class Xenobia of phylum Eremiobacterota), green (family UBA2112 of Patescibacteria), and orange (genus Saccharimonas of Patescibacteria). Due to the large number of families associated with the reference genomes within the class Saccharimonadia of the Patescibacteria in b, only the families of interest have been highlighted, including the one that contains the two novel strains. The evolutionary history of selected genes in (c) the novel Xenobia rMAG and (d–e) the two novel Patescibacteria rMAGs. Only the novel rMAG and its closest relatives that were used to analyze gene evolution are shown in the heatmaps, with the color intensity indicating the gene copy number (genes that were absent are shown in white)
Proposed metabolic capacity of the novel Xenobia strain identified in BEs. Selected key metabolic pathways in the novel Candidatus Xenobia strain of the phylum Eremiobacterota. Genes and compounds associated with sulfur metabolism (green shading and lines) comprise sulfate adenylyltransferase (SAT, encoded by cysD and cysN), also known as ATP sulfurylase (ATPS); adenosine 5′-phosphosulfate (APS), a key intermediate in the sulfur assimilation pathway; adenylyl-sulfate kinase (APSK, encoded by cysC), also known as APS kinase; 3′-phosphoadenosine-5′-phosphosulfate (PAPS), another key intermediate in the sulfur assimilation pathway; and PAPS reductase (encoded by cysH). Genes and compounds associated with carbon metabolism (red shading and lines) comprise the high-affinity carbon uptake protein (Hat/HatR); carbon monoxide dehydrogenases (CODH, encoded by cox gene families and cooC); phosphoenolpyruvate (PEP); PEP carboxylase (PEPC); oxaloacetate (OAA); tricarboxylic acids (TCA); alcohol dehydrogenases (ADH, encoded by adh); aldehyde dehydrogenases (ALDH, encoded by aldH); and acyl-CoA synthetase (ACS, encoded by fadD), also known as fatty acyl-CoA synthetase. Genes and compounds associated with nitrogen metabolism (blue shading and lines) comprise nitrate/nitrite transporters (encoded by narK); nitrate reductase enzyme (encoded by narH); copper-containing nitrite reductase (encoded by nirK); ammonia transporter (encoded by amtB); glutamine synthetase (encoded by glnA); glutamate dehydrogenase (encoded by gdh); and glutamate synthase (GOGAT, encoded by glt gene families). Genes and compounds associated with stress response (purple shading and lines) comprise the sensitive to antimicrobial peptides (SAP) transport system (encoded by sapC) and antimicrobial peptides (AMPs)
Phylogenetic analysis showed that one of the Patescibacteria rMAGs identified was closest to a species of the genus Saccharimonas, which was sourced from wastewater samples (Table S4), while the other rMAG was assigned to the family UBA2112 (Fig. 3b). Similar to other Patescibacteria genomes [21], the two Patescibacteria rMAGs had an unusually compact genome (~ 700 kb) and a reduced metabolic capability that was characterized by a limited number of genes for essential cellular functions such as transcription, DNA repair, and amino acid and fatty acid biosynthesis (Table S4). Nevertheless, both rMAGs either contained the equivalent number of or acquired additional copies of genes involved in BCAA transportation (LivM protein) and protein degradation (clpB and clpC) (Fig. 3d and e). This suggests a potential strategy to acquire essential nutrients directly from bacterial hosts, considering the symbiotic lifestyle of Patescibacteria [17]. Furthermore, similar to their close relatives, both rMAGs possessed a diverse array of genes encoding aminoacyl-tRNA synthetases, i.e., enzymes that function to attach the appropriate amino acids, including alanine (alaS), glutamate (gltX), glycine (glyQS), proline (proS), serine (serS), threonine (thrS), and tyrosine (tyrS), to their corresponding transfer RNAs (tRNAs). Additionally, the presence of genes associated with stress response (usp) and MacAB-TolC efflux pump (macB) [35] in the two rMAGs highlights the capability of tolerating environmental stresses such as nutrient scarcity and antibiotics (Fig. 3d and e).
The rMAG affiliated with Saccharimonas had acquired additional or new genes involved in the metabolism of essential trace elements, such as manganese transportation (mntH) and potassium ion efflux (kefC), and in the synthesis of carotenoids (crtB, crtE, and crtYf) and ubiquinone (ubiA) (Figs. 3d and 5a), which could be used as antioxidants to protect cells against oxidative damage [36]. The rMAG affiliated with the family UBA2112 contained twice the number of genes for type IV pili formation (pilA) compared with most of its closest relatives (Fig. 3e), which may facilitate attachment and interaction with its hosts. In addition, this rMAG had acquired unique genes related to toxic metal resistance and transport, which could be vital for coping with the high levels of heavy metals often found in urban BEs due to construction materials and industrial pollution [37]. While human skin generally experiences lower exposures to metals, it can still encounter these substances through environmental factors, cosmetic products, or occupational sources [38]. Therefore, genes related to cadmium (cadD) and copper (copB) resistance could help mitigate these metal-induced stresses (Figs. 3e and 5b). A newly acquired gene associated with sporulation (spoVR) was also identified (Figs. 3e and 5b), potentially enabling spore formation to endure adverse environmental conditions such as desiccation and chemical disinfectants in BEs [39] and the dry, acidic, and nutrient-poor conditions of human skin [40]. Additionally, another newly acquired gene, cadA, encodes lysine decarboxylase, an enzyme that converts lysine into cadaverine (Figs. 3e and 5b), thus contributing to bacterial survival and virulence, especially under acidic conditions [41].
Proposed metabolic capacity of the novel Patescibacteria strains identified in BEs. Selected key metabolic pathways in the novel genomes of (a) the genus Saccharimonas and (b) the family UBA2112. Genes and compounds associated with carbon metabolism (blue shading and lines), namely glucose 6-phosphate (G6P); glucose 1-phosphate (G1P); fructose 6-phosphate (F6P); trehalose-6-phosphate (T6P); uridine diphosphate glucose (UDP-glucose); dihydroxyacetone phosphate (DHAP); glyceraldehyde-3-phosphate (G3P); 1,3-bisphosphoglycerate (1,3-BPG); 3-phosphoglycerate (3-PG); methylcrotonyl-CoA carboxylase (MCC); propionyl coenzyme A (propionyl-CoA); 6-phosphogluconolactone (6PGL); ribulose-5-phosphate (Ru5P); ribose-5-phosphate (R5P); and 2-methylcitrate (2-MC). Genes and compounds associated with amino acid transport and antioxidant biosynthesis (red shading and lines), namely geranylgeranyl pyrophosphate (GGPP), which is synthesized by GGPP synthase (encoded by crtE); branched-chain amino acids (BCAAs); and 4-hydroxybenzoate polyprenyltransferase (4-HB polyprenyltransferase). Genes and compounds associated with stress tolerance (green shading and lines), namely MacB (encoded by macB). MacB is a transmembrane protein located in the bacterial inner membrane and forms a complex with MacA (a periplasmic protein) and TolC (an outer membrane channel), spanning the entire cell envelope in Gram-negative bacteria. The MacAB–TolC macrolide efflux system can transport a variety of substances, including several types of antibiotics and toxic substances. The forespore is a developmental precursor to the bacterial endospore, and the spore coat is its protective armor. The Stage V sporulation protein R encoded by the gene spoVR is essential for proper coat formation
Characterization of M. luteus pangenome
With C. acnes and M. luteus being prevalent and highly abundant in all of the samples, their infraspecific diversity at the genome level was investigated. After dereplication, only one high-quality C. acnes rMAG was obtained, suggesting a low level of infraspecific diversity within this species. In contrast, 11 high-quality M. luteus rMAGs were obtained after dereplication, all of which were present in > 70% of the samples, especially those from skin and indoor BEs that are associated with humans (Fig. S6).
To explore the extensive infraspecific diversity of M. luteus, a pangenome analysis was conducted by integrating the 11 high-quality rMAGs from this study with an additional 35 high-quality rMAGs from global BEs [42] and 15 complete or near-complete genomes isolated from various sources, including air, soils, seawater, and humans (Table S5). Pairwise genome comparison showed that the 11 rMAGs recovered from Hong Kong displayed ANI values above the species-level threshold of 95%, yet below the strain-level threshold of 99.99% [43], suggesting that all of them differed from other genomes at the strain level. However, the M. luteus rMAGs recovered from Hong Kong did not form a separate cluster in a phylogenetic tree of all the M. luteus genomes, suggesting that they are conspecific strains that add to the infraspecific diversity of the species (Fig. 6a).
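The two ANI thresholds used in this comparison translate into a simple decision rule, sketched below. The function name is hypothetical, and the treatment of values that fall exactly on a threshold is an assumption, since the text only describes values above 95% and below 99.99%.

```python
SPECIES_ANI = 95.0   # species-level threshold (%) stated in the text
STRAIN_ANI = 99.99   # strain-level threshold (%) stated in the text

def ani_relationship(ani_percent):
    """Interpret a pairwise ANI value (in percent) for two genomes."""
    if ani_percent < SPECIES_ANI:
        return "different species"
    # Assumption: values exactly at a threshold are assigned to the higher class.
    if ani_percent < STRAIN_ANI:
        return "same species, different strains"
    return "same strain"

# Hypothetical ANI values, for illustration only.
for ani in (93.2, 97.8, 99.995):
    print(ani, "->", ani_relationship(ani))
```

Under this rule, every pair of genomes with an ANI between the two thresholds is counted as conspecific but distinct at the strain level, which is the situation described for the 11 Hong Kong rMAGs.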
Functional differences between the core and accessory genomes of Micrococcus luteus. a Genome comparison between M. luteus from different sources (Hong Kong (HK) BEs, global BEs, and the NCBI database). The ANI value was used as the genome similarity index. b COG categories of the genes in the core and accessory genomes of M. luteus. c Types of biosynthetic gene clusters (BGCs) predicted for the core and accessory genomes of M. luteus
The genes in all of the M. luteus genomes were grouped into core (hardcore and softcore) and accessory (shell and cloud) genomes based on their prevalence. The M. luteus pangenome contained 9513 genes, of which 7.8% belonged to the core genome (consisting of 167 hardcore and 576 softcore genes), and 92.2% belonged to the accessory genome (comprising 1939 shell and 6831 cloud genes). Compared to the small core genome, the accessory genome of M. luteus was considered large. Based on the pangenome and core genome profile curves, the size of the pangenome increased while the size of the core genome decreased with the addition of each genome (Fig. S7). A subsequent analysis using non-linear regression of the pangenome profile curve resulted in a saturation coefficient of 0.4, indicating that the pangenome of M. luteus is open [44]. The genes found in the core and accessory genomes of M. luteus were different. The core genome was enriched with genes encoding functions in the clusters of orthologous groups (COG) categories [C] (energy production and conversion), [E] (amino acid transport and metabolism), [F] (nucleotide transport and metabolism), and [H] (coenzyme transport and metabolism), whereas the accessory genome was enriched with genes for categories [K] (transcription) and [P] (inorganic ion transport and metabolism). In addition, the accessory genomes recovered from terrestrial environments such as soils and outdoor air were mostly enriched with genes for the COG category [L] (replication, recombination, and repair), while the accessory genomes of particular rMAGs reconstructed from the BEs were enriched in genes for categories [J] (translation, ribosomal structure, and biogenesis) and [I] (lipid transport and metabolism) (Fig. 6b).
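The partition of the pangenome can be sketched as follows. The gene counts are those reported above, and the code simply reproduces the core and accessory percentages from them. The prevalence cut-offs in `classify_gene` are hypothetical defaults chosen for illustration, because the text states only that genes were grouped by prevalence and does not give the thresholds.

```python
def classify_gene(prevalence, hard=1.0, soft=0.95, shell=0.15):
    """Assign a gene to a pangenome compartment by the fraction of genomes carrying it.

    The cut-offs are hypothetical; the thresholds used in the study are not stated.
    """
    if prevalence >= hard:
        return "hardcore"
    if prevalence >= soft:
        return "softcore"
    if prevalence >= shell:
        return "shell"
    return "cloud"

# Gene counts per compartment, as reported in the text.
counts = {"hardcore": 167, "softcore": 576, "shell": 1939, "cloud": 6831}
total = sum(counts.values())
core = counts["hardcore"] + counts["softcore"]
accessory = counts["shell"] + counts["cloud"]
core_pct = round(100 * core / total, 1)
accessory_pct = round(100 * accessory / total, 1)
print(total, core_pct, accessory_pct)

# 61 genomes were analysed (11 + 35 + 15); the carrier counts below are hypothetical.
n_genomes = 11 + 35 + 15
for carriers in (61, 59, 20, 2):
    print(carriers, classify_gene(carriers / n_genomes))
```

The arithmetic confirms that the four compartments sum to 9513 genes, with 743 core genes (7.8%) and 8770 accessory genes (92.2%).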
Putative BGCs in the M. luteus genomes were predicted in order to elucidate the cellular secondary biosynthetic potential (Fig. 6c). In most of the M. luteus genomes, the putative BGC for terpene synthesis was the only one detected in both the core and accessory genomes. In addition, the core genes possessed a putative BGC for synthesizing NRPS-like fragments, which was exclusively found in the accessory genome of one rMAG reconstructed from the global BEs. Other types of putative BGCs involved in synthesizing siderophores, ribosomally synthesized and post-translationally modified peptide product (RiPP) recognition elements, and ectoine were also largely conserved in the accessory genes. Interestingly, specific types of putative BGCs were identified in the accessory genes of specific genomes. For example, M. luteus strain R17 recovered from gamma-ray-irradiated soils harbored a putative BGC for synthesizing non-alpha poly-amino acids like ε-polylysine exclusively in its accessory genes. Putative BGCs for synthesizing linaridins, lipolanthines, and lantipeptide class III were only found in the accessory genes of a small number of rMAGs reconstructed from the global BEs. In contrast, none of the M. luteus rMAGs recovered from the Hong Kong BEs contained unique types of putative BGCs (Fig. 6c).
Discussion
BEs are man-made structures that passively receive microbial input from human occupants and nearby surroundings [1]. Most of the microbes in BEs are free-living environmental species, while some are human commensals with a symbiotic or mutualistic lifestyle [45]. BEs without dampness or other environmental issues are typically oligotrophic, and the stresses imposed by anthropogenic factors (e.g., disinfectants) make them a unique habitat for microbes. Understanding the functional capabilities of microbial inhabitants under such conditions can provide new insights into BE ecology. In this study, we found significant variations in microbial composition across different sample types, suggesting that each sample type may harbor a distinct microbial assemblage. This is supported by the presence of indicator taxa that are strongly associated with a specific sample type. Genome analysis further revealed that members of a specific phylum encode unique functions that may aid their survival and persistence in BEs. For example, the genomes of photosynthetic bacteria in the phylum Cyanobacteria [46], detected predominantly on pier surfaces, encode functions for selenite and arsenate reduction, which could be a detoxification mechanism to maintain cellular processes and ensure survival. The reduction of these two toxic compounds could be accelerated in nutrient-depleted environments [47,48,49].
Similar to the enrichment of Cyanobacteria on pier surfaces, the novel Candidatus Xenobia strain was only found on human palms and indoor surfaces in one residence. This strain may have been sourced from outdoors, as members of the phylum Candidatus Eremiobacterota are typically associated with terrestrial environments such as soils [23]. However, the presence of this novel strain on the occupants’ palms is likely due to the acidic nature of human skin, which has a pH range of 4.5 to 5.5 [50] and thus creates a favorable environment for the survival of this strain, consistent with the known preference of members of the Candidatus Eremiobacterota for acidic conditions [23]. Although the presence of this novel strain on indoor surfaces could be attributed to direct transfer through hand-surface contact or from skin shedding, its enrichment on infrequently touched indoor surfaces suggests that this strain is capable of surviving the unique conditions within BEs. This possibility is supported by the strain’s apparent ability to utilize ammonium ions as a nitrogen source. One potential source of ammonium ions could be cleaning agents, such as quaternary ammonium compounds, that may not have been completely removed after cleaning. Moreover, the presence of genes encoding alcohol and aldehyde dehydrogenases in the rMAG suggests that alcohols, which are also potentially residuals from cleaning agents, could be used as a source of carbon and energy. Similar to other Candidatus Xenobia species [22], the novel Candidatus Xenobia strain showed the potential ability to metabolize trace gases. Thus, although the conditions in BEs and on human skin are oligotrophic, the residuals from cleaning products and the trace gases in ambient air could render them suitable for members of Candidatus Xenobia.
The two Patescibacteria strains identified in BEs in this study have ultra-small genomes and streamlined functionality, which is consistent with other reported Patescibacteria strains [21]. As both Patescibacteria strains were recovered from human palms, they are likely to live as episymbionts of commensal bacteria inhabiting occupants’ skin [20]. As a result, these novel Patescibacteria strains may rely on their bacterial hosts to obtain vital nutrients such as amino acids and trace elements via transporters [20]. In return, the Patescibacteria strains could confer resistance against antibiotics and toxic substances, which may provide competitive advantages to their bacterial hosts in challenging or hostile environments [51]. Besides having bacterial hosts, the Patescibacteria strains may also closely interact with human hosts. The components of human sweat include ions (e.g., sodium and potassium), amino acids, and lipids. The potassium efflux pumps encoded by the kefC gene may help Patescibacteria strains regulate intracellular potassium levels, preventing excessive accumulation. Essential amino acids like alanine in sweat may be assimilated by the alanine-tRNA ligase (encoded by alaS) for protein synthesis, while lipids from human or bacterial hosts and other dead cells can serve as an external lipid supply [17]. Unlike humans, who rely on dietary intake for carotenoids, Patescibacteria strains can synthesize these compounds [52]. The synthesized carotenoids may function as antioxidants, not only benefiting the Patescibacteria strains but also protecting human skin cells against oxidative damage [36]. Taken together, the Patescibacteria strains in BEs appeared to engage in complex relationships with both bacterial and human hosts to enable their survival.
While Cyanobacteria and the three novel strains (one from Candidatus Xenobia and two from Patescibacteria) may prefer specific sample types, M. luteus was ubiquitous in both indoor and outdoor BEs, as its genomes could be assembled from samples from Hong Kong and 59 other cities worldwide [42]. The ubiquity of a species is often positively correlated with the size of its core genome [44]. However, M. luteus is highly prevalent across diverse ecosystems despite having a relatively small core genome. This deviation may be attributable to its evolutionary adaptation within stable environments provided by animal hosts, leading to the reduction of unnecessary genetic material [53, 54]. Nevertheless, the accessory genome of M. luteus is considered large and encodes versatile functions that can be used to counteract various stresses (e.g., starvation) in different environments. For example, the genes associated with inorganic ion transport and metabolism could be useful for acquiring essential trace elements [55], and genes associated with transcription could be used to accumulate or maintain ribosomal RNA during periods of low metabolic activity to provide a competitive advantage when favorable conditions arise [56]. Furthermore, the accessory genomes harbored two putative BGCs for producing siderophores and ectoines, which were present in almost all of the M. luteus strains and may contribute to their survival and persistence in a wide range of BEs. A few M. luteus strains also possessed specific functions, such as the ability to synthesize metabolites (e.g., lipolanthines and class III lantipeptides) that may exhibit diverse antimicrobial properties, which could play important roles in the survival of M. luteus by allowing it to inactivate competitors [57, 58].
Ubiquitous species often show significant infraspecific diversity, as exemplified by the 11 unique conspecific M. luteus strains in the Hong Kong BEs. This infraspecific diversity may also be attributable to the expansive and flexible nature of the M. luteus pangenome. The accessory genome of a species evolves more rapidly than its core genome [11], and genes unsuited for a specific ecological niche are prone to being eventually lost [59]. Consequently, the accessory genome serves as a gene reservoir for the pangenome, enabling the acquisition of new ecological functions or access to otherwise inaccessible niches [60]. Therefore, species with a large accessory genome undergo frequent genetic variations, resulting in substantial infraspecific diversity [11]. Collectively, the relatively large and open accessory genome of M. luteus may bestow important functions for its ubiquity and geographic speciation, thereby contributing to the significant infraspecific diversity even at a city-wide scale.
While this study has characterized the functional capabilities of microbial taxa in BEs, there are a number of limitations. First, the results of this study are based on metagenomes, which can only reveal potential metabolic functions. Metagenomic analysis does not account for the viability or activity of microbial taxa, leaving it unclear under which conditions specific taxa will become active and which of their genes will be expressed. A combination of metatranscriptomic, metaproteomic, and culture-based experiments under controlled settings will be required in the future to better understand the physiology of microbes in BEs. Second, fungal taxa were not investigated in this study, as the MiniKraken database used for sequence annotation lacked fungal reference sequences. Future studies should utilize more comprehensive databases that include fungal sequences to better explore the diversity and functions of fungal communities in BEs. Third, although the genomes of the three novel strains of interest have a completeness of at least 84.1%, they are still incomplete, which limits our ability to obtain a comprehensive understanding of their metabolism. A higher sequencing depth needs to be applied in the future to enable the recovery of complete genomes, thereby facilitating the functional profiling of genomes from these lineages and potentially other bacterial and fungal taxa. Fourth, the concentrations of disinfectants and other xenobiotics on surfaces were not measured, so the survival strategies adopted by microbes to counter abiotic stresses could not be fully interpreted. Lastly, although this study examined various types of BEs, the city-wide analysis was limited in scope, with a small sample of only four residential households. To develop a more comprehensive understanding of microbial metabolic capabilities in varied and selective conditions, future research should expand the investigation by increasing the sample size for each BE type and incorporating more specialized BEs, such as hospitals, across city-wide and even broader geographical scales.
In summary, this study provides insights into the metabolic capabilities of microbes found in a variety of BEs and on human occupants’ skin. Based on our findings, three main categories of survival strategies were identified. First, microbial specialists, such as Patescibacteria strains, cannot survive alone in oligotrophic BEs due to their potentially mutualistic lifestyle, and their presence in BEs is likely a result of a passive exchange with human occupants. Second, microbial generalists, such as M. luteus, can persist in a wide variety of biotic (e.g., skin) and abiotic (e.g., BEs) habitats with varying nutrient availability and stress levels using the versatile functions encoded in their large accessory gene pools. Third, versatile microbes, such as Candidatus Xenobia strains, can exploit limited resources, such as residual anthropogenic constituents on surfaces, as nutrient and carbon sources for survival in BEs. Overall, the generally oligotrophic conditions in BEs and on human skin impose selective pressures on microbes, presenting them with challenges that require specialized strategies. The ability of microbes to successfully survive and persist in BEs largely depends on their capability to tolerate these harsh conditions and utilize the limited resources resulting from occupant activities.
Materials and methods
Collection of BE samples and metagenome sequencing
A total of 738 samples collected previously from various BEs and the skin surfaces of human occupants in Hong Kong [2, 61, 62] were analyzed. These samples comprised 161 samples from 84 subway stations (80 air samples from the platform and 81 surface samples from ticket kiosks), 268 samples from four residences with a single occupant (134 surface samples from doorknobs and bed headboards and 134 skin samples from the left and right palms and forearms of occupants), 134 samples from eight urban public facilities (69 and 65 surface samples from park handrails and subway exit handrails, respectively), and 175 samples from nine piers (45 samples from floors and 40, 45, and 45 surface samples from bollards, handrails, and poles, respectively). The indoor BEs, including residences and subway stations, were operating under normal conditions without signs of dampness or other environmental issues that could have promoted microbial growth. Similarly, the outdoor BEs, including urban public facilities and piers, were also considered to be in normal condition, as they were fully operational and free from construction activities during the sampling period. In addition, all four human occupants were healthy and did not have any reported skin diseases at the time of sampling. The collected samples were classified into six types: skin, residential surfaces, subway air, subway surfaces, urban public surfaces, and pier surfaces. In addition, 12 sterile swabs and eight sterile filters were processed in parallel as negative controls. All of the surface and air samples were collected using the same method. Detailed information about the samples and the material of the surfaces (metal, plastic, or concrete) has been reported previously [63]. 
The same genomic DNA extraction method was applied to process all of the respective surface and air samples [2, 62], and all the purified genomic DNA was sequenced on an Illumina HiSeq X Ten System (Illumina Inc., San Diego, CA, USA) at HudsonAlpha Genome Center (Huntsville, AL, USA) [42].
Quality control of sequence reads
Quality control of the raw sequence reads was performed as described previously [61]. Briefly, adapters were removed using AdapterRemoval (v2.2.2) [64], and human DNA was filtered out using KneadData (https://bitbucket.org/biobakery/kneaddata/wiki/Home, v0.7.6) by mapping reads against the human genome hg38 as the reference. The reads from the respective 12 negative surface controls and eight negative air controls were separately co-assembled using MetaWRAP (v1.2.1) [65] with the default assembly method MEGAHIT and a minimum contig length of 1000 bp. Any reads in a surface or air sample that could be mapped to the contigs in the corresponding negative controls were removed. After contamination removal, an average of 5.1 ± 3.0 million paired-end reads per sample was retained for downstream analysis.
Taxonomic classification and diversity analysis
The taxonomy of the paired-end clean reads was annotated against the MiniKraken_v1_8GB database (April 2019 version) using Kraken2 (v2.0.7-beta) [66] with the parameter “--report-zero-counts.” The MiniKraken database only contains complete bacterial, archaeal, and viral genomes, so no fungal taxa could be identified in this study. The Kraken2 outputs were consolidated using the Python (v3.7) script kraken-multiple.py, and the taxonomic information was determined using the Python script kraken-multiple-taxa.py. On average, 18.9 ± 9.7% of the clean reads across all the samples could be classified at the species level.
Prior to alpha-diversity analysis, all samples were rarefied to a depth of one million reads using the function “rarefy” in the R (v3.6.1) package “vegan” (v2.5–6) (65 samples with insufficient reads were removed). Alpha-diversity metrics, namely Pielou’s evenness, richness (number of species), and the abundance-based Shannon diversity index, were calculated for the rarefied species-level taxa using the function “diversity” in the R package “vegan.” Bray–Curtis dissimilarity was calculated for the unrarefied species-level taxa using the function “vegdist” in the R package “vegan.” The species accumulation curve was generated using the function “specaccum” in the R package “vegan” with 10 permutations. Indicator species in different types of samples were identified using the function “multipatt” in the R package “indicspecies” with 999 permutations.
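The diversity metrics named above follow standard definitions, which can be illustrated with a short Python sketch. This is not the code used in the study (the analysis was run with the R package “vegan”); it only shows the textbook formulas for richness, the Shannon index, Pielou’s evenness, and Bray–Curtis dissimilarity. The two count vectors are invented toy data, and returning 0.0 for evenness when a sample has a single taxon is an assumption of this example.

```python
import math


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S); returns 0.0 for a single taxon (assumption)."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den


# Toy species counts for two hypothetical samples (not data from this study).
sample_a = [50, 30, 20, 0]
sample_b = [10, 10, 40, 40]
print("richness:", richness(sample_a))
print("Shannon:", round(shannon(sample_a), 3))
print("Pielou:", round(pielou_evenness(sample_a), 3))
print("Bray-Curtis:", round(bray_curtis(sample_a, sample_b), 3))
```

Because richness and the Shannon index depend on sequencing depth, they are computed here on counts that are assumed to have already been rarefied, as described in the text.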
MAG reconstruction and functional annotation
The reconstruction of MAGs from each sample was performed as described previously [61]. Briefly, assembly of the filtered reads into contigs was performed for each sample using the default assembly method MEGAHIT in MetaWRAP (v1.2.1) [65]. The contigs (> 1000 bp) in each sample were binned into MAGs using MetaWRAP, and refinement of the resulting MAGs was performed using the “bin_refinement” function in MetaWRAP. A total of 1663 MAGs were dereplicated using the “dereplicate” function with the default setting in dRep (v2.5.4) [67]. CheckM (v1.1.2) [68] was used to assess the quality of the rMAGs, utilizing a default universal set of 104 bacterial marker genes to evaluate the non-Patescibacteria rMAGs, whereas a custom set of 43 single-copy genes [69] was utilized to assess the two Patescibacteria rMAGs. Medium-quality (completeness ≥ 50% and < 90%, and contamination < 10%) and high-quality (completeness ≥ 90% and contamination < 5%) rMAGs were defined as described previously [70, 71]. The taxonomy of rMAGs was annotated against GTDB [24] using the function “gtdbtk classify_wf” of GTDB-Tk (v1.5.0) [72], and an unrooted approximately maximum-likelihood phylogenetic tree of rMAGs was inferred using the function “gtdbtk infer.” An ANI of ≥ 99.99% was used to determine whether any two rMAGs belonged to the same strain of a species [43].
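The quality tiers and the strain-level ANI cutoff described above amount to simple threshold rules. The Python sketch below is a hypothetical illustration of those rules as they are worded in the text; the function names are invented for this example, and the inputs are assumed to be completeness and contamination percentages such as those reported by CheckM. Note that, read literally, a genome with completeness ≥ 90% and contamination between 5% and 10% meets neither definition, so the sketch labels it “unclassified” (an assumption).

```python
def mag_quality(completeness, contamination):
    """Classify a MAG using the completeness/contamination thresholds in the text."""
    if completeness >= 90 and contamination < 5:
        return "high"
    if 50 <= completeness < 90 and contamination < 10:
        return "medium"
    # Anything else (including >= 90% complete with 5-10% contamination)
    # is outside both definitions as written.
    return "unclassified"


def same_strain(ani_percent, threshold=99.99):
    """Two genomes are treated as the same strain when ANI >= 99.99%."""
    return ani_percent >= threshold


# Hypothetical values for illustration only.
print(mag_quality(95.0, 2.0))
print(mag_quality(84.1, 3.0))
print(same_strain(99.995), same_strain(99.5))
```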
The open reading frames in the contigs were predicted using Prokka (v1.11) [73], and functional annotation was performed using EggNOG-mapper (v2.0.1) [74]. Pairwise binary Jaccard distances between all rMAGs were calculated based on the KO terms using the function “vegdist” in the R package “vegan.” Clustering of metabolic functions in rMAGs based on KO groups was performed using the unsupervised clustering method of k-means using the function “kmeans” in the R package “stats” (v3.6.1). The optimal number of clusters was predicted using the function “prediction.strength” in the R package “fpc” (v2.2–9) with the default settings. The pairwise genomic distance between all the rMAGs was calculated using Mash (v2.0) [75]. Subsequently, an unrooted neighbor-joining phylogenetic tree of rMAGs was constructed based on the genomic distances derived from Mash using mashtree (v0.12) [76], and it was visualized using tvBOT [77]. Secondary metabolites were predicted for each rMAG using antiSMASH (v6.1.1) [78], and the metabolic and biogeochemical functional traits of each rMAG were characterized using METABOLIC (v4.0) [79].
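To make the functional distance concrete, the binary Jaccard distance between two genomes can be written directly in terms of their sets of KO terms. The Python sketch below is a minimal illustration of that definition, not the study's code (which used “vegdist” in R). The genome names and KO identifiers are arbitrary placeholders, and treating two empty profiles as identical is an assumption.

```python
from itertools import combinations


def jaccard_distance(kos_a, kos_b):
    """Binary Jaccard distance: 1 - |A ∩ B| / |A ∪ B|."""
    a, b = set(kos_a), set(kos_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty profiles are treated as identical
    return 1 - len(a & b) / len(union)


def pairwise_distances(profiles):
    """Return {(name_i, name_j): distance} for every unordered pair of genomes."""
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical KO profiles; names and identifiers are placeholders only.
profiles = {
    "rMAG_A": {"K00001", "K00002", "K00003"},
    "rMAG_B": {"K00002", "K00003", "K00004"},
    "rMAG_C": {"K00005"},
}
for pair, dist in pairwise_distances(profiles).items():
    print(pair, round(dist, 3))
```

Only the presence or absence of each KO term matters here, so gene copy number does not affect the distance.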
Pangenome analysis of M. luteus
A total of 61 high-quality (using the same completeness and contamination levels as indicated above) rMAGs or complete or near-complete genomes of M. luteus were included in the pangenome analysis. Eleven were rMAGs reconstructed from the dataset in this study. Fifteen were complete or near-complete (at most two contigs) genomes downloaded from the NCBI Assembly database (as of June 19, 2022), and their genomic traits, including genome completeness and contamination, GC content, number of contigs, and N50 of contigs, were determined using CheckM (v1.1.2) while the taxonomy was verified using GTDB-Tk. Thirty-five were rMAGs retrieved from a global study on urban BEs in 59 cities (excluding Hong Kong) [42]. The similarity between the 61 M. luteus genomes was calculated based on the ANI value using the function “anvi-compute-genome-similarity” in Anvi’o (v6.2) [80] with the Python module PyANI [81]. The origin and genomic traits of the 61 M. luteus genomes included in the pangenome analysis are summarized in Table S5. The core and accessory genes of the aforementioned 61 M. luteus genomes were analyzed using Roary (v3.13.0) [82], in which hardcore genes are those found in > 99% of the genomes, softcore genes in 95–99% of the genomes, shell genes in 15–95% of the genomes, and cloud genes in < 15% of the genomes. Hardcore and softcore genes make up the core genes, while shell and cloud genes make up the accessory genes. The non-linear least squares method, implemented using the package “nlstools” (v2.1–0) [83] in R (v4.0.2), was used to model the total number of genes in the M. luteus pangenome and core genome using Eqs. 1 and 2 [44], respectively, to estimate the openness of the pangenomes.
where G is the number of genes, N is the number of genomes, k and c are constants, and r is the saturation coefficient. In Eq. (1), when r ≤ 0, the pangenome is closed, whereas when 0 < r ≤ 1, the pangenome is open.
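The interpretation of the saturation coefficient can be illustrated with a small fitting example. The Python sketch below rests on two stated assumptions: first, that the pangenome curve follows a power law of the form G = k·N^r, which is a commonly used model but is not confirmed here as the exact form of Eq. 1; second, that ordinary least squares on log-transformed values is an acceptable stand-in for the non-linear least squares fit described in the text. The input curve is synthetic and is not the study's data.

```python
import math


def fit_power_law(n_genomes, n_genes):
    """Fit G = k * N**r by least squares on log-transformed values; return (k, r).

    This log-log regression is a simplification, not the nlstools procedure.
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(g) for g in n_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sxx
    k = math.exp(mean_y - r * mean_x)
    return k, r


def pangenome_state(r):
    """Apply the rule stated in the text: r <= 0 is closed, 0 < r <= 1 is open."""
    if r <= 0:
        return "closed"
    if r <= 1:
        return "open"
    return "outside the range covered by the text"


# Synthetic curve generated with k = 2000 and r = 0.4 (illustrative values only).
genomes = list(range(1, 62))
genes = [2000 * n ** 0.4 for n in genomes]
k_hat, r_hat = fit_power_law(genomes, genes)
print(round(k_hat), round(r_hat, 2), pangenome_state(r_hat))
```

On real accumulation curves the two fitting approaches can give somewhat different estimates, because the log transform changes how the residuals are weighted.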
The sequences of the 61 M. luteus genomes were aligned using MUMmer4 (v4.0.0) [84], and the conserved core gene sequences were identified using SPINE (v0.3.1) [13]. The core gene sequences were then mapped to individual M. luteus genomes using AGEnt (v0.3.1) [13], allowing the identification of accessory (i.e., non-core) gene sequences in each M. luteus genome. The core and accessory gene sequences were functionally annotated using EggNOG-mapper (v2.0.1) and antiSMASH (v6.1.1).
Functional characterization of three novel strains
Three novel rMAGs were subjected to detailed analysis, namely one rMAG belonging to the Candidatus phylum Eremiobacterota and two rMAGs from the superphylum Candidatus Patescibacteria. The relative abundance of each of these rMAGs in each sample was calculated using CoverM (v0.5.0). The phylogenetic relationships between each novel genome and its respective reference genomes obtained from the GTDB (v214.1) [24] based on the 92 genes in the bacterial core gene set were inferred using the UBCG pipeline (v3.0) [85]. The analysis included 200 publicly available strains within the phylum Eremiobacterota and 495 strains within the class Saccharimonadia of the Patescibacteria. A maximum-likelihood phylogenetic tree of the novel genomes and their corresponding reference genomes was constructed using FastTree [86] and visualized using tvBOT [77].
By using OrthoFinder (v2.5.4) [87], the gene families in the novel rMAGs and the reference genomes used in the phylogenetic tree analysis were clustered into orthologs and gene duplication events were identified, enabling the estimation of the evolutionary history of gene families within these novel genomes. The closest relatives to the novel rMAGs were determined to be those reference genomes positioned within the deepest branch of the novel genomes. The evolutionary history encompassed four events, namely “Gain” (i.e., the acquisition of genes associated with a feature in the novel rMAG that were absent in its close relatives), “Expand” (i.e., a significant increase in the copy number of genes associated with a feature in the novel rMAG compared with its close relatives), “Have” (i.e., an equivalent copy number of genes associated with a feature in both the novel rMAG and its close relatives), and “Reduce” (i.e., a significant reduction in the copy number of the genes associated with a specific feature in the novel rMAG compared with its close relatives).
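The four event labels can be summarized as a comparison of gene copy numbers between the novel rMAG and a close relative. The Python sketch below is a deliberately simplified illustration: the text refers to significant increases or reductions, but no threshold is given in this section, so the sketch treats any difference in copy number as a change. That simplification, the function name, and the extra “Absent” label for features missing from both genomes are all assumptions of this example.

```python
def classify_event(novel_copies, relative_copies):
    """Label a feature by comparing copy numbers in the novel rMAG and a close relative.

    Simplification: any copy-number difference counts as a change,
    with no significance test applied.
    """
    if novel_copies > 0 and relative_copies == 0:
        return "Gain"
    if novel_copies > relative_copies:
        return "Expand"
    if novel_copies == relative_copies and novel_copies > 0:
        return "Have"
    if novel_copies < relative_copies:
        return "Reduce"  # includes complete loss in the novel rMAG
    return "Absent"  # missing from both genomes; not one of the four events


# Hypothetical copy numbers for illustration only.
features = {
    "feature_1": (2, 0),
    "feature_2": (3, 1),
    "feature_3": (1, 1),
    "feature_4": (1, 4),
}
for name, (novel, relative) in features.items():
    print(name, classify_event(novel, relative))
```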
The potential functions of gene families in the three novel rMAGs were characterized using Rapid Annotations using Subsystems Technology (RAST) [88] and EggNOG-mapper (v2.1.12) [74] at all available hierarchy levels of the respective database.
Statistics
PERMANOVA, utilizing the “adonis2” function from the R package “vegan,” was employed to evaluate the impact of sample type on the composition of BE microbiomes at the species level. The significance of differences between each pair of sample types was determined using the “pairwise.adonis” function from the R package “pairwiseAdonis” (v0.4.1). Permutational analysis of multivariate dispersion (PERMDISP) was conducted to analyze the variance in microbial community dispersion across different sample types. This was executed using the “betadisper” function in the “vegan” R package. For a further detailed analysis of pairwise comparisons in dispersion between sample types, the “permutest” function in “vegan” was used, setting the “pairwise” parameter to TRUE. The Spearman correlation between genome size and the number of BGCs was calculated using the “cor.test” function in the R package “stats.” A Procrustes test was performed to compare the Jaccard-based distance (based on the presence of KO groups in rMAGs) with the genomic distance of rMAGs derived from Mash, using the function “procrustes” in the R package “vegan,” with 999 permutations.
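As an illustration of the correlation analysis, Spearman's coefficient is the Pearson correlation of rank-transformed values, with tied values given the mean of their ranks. The Python sketch below shows only the coefficient itself; it does not compute the p-value that “cor.test” also reports, and the genome sizes and BGC counts are invented toy values rather than data from this study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data: genome size (Mb) and number of BGCs for five hypothetical genomes.
genome_size = [1.0, 2.5, 3.1, 4.8, 6.0]
bgc_count = [0, 2, 2, 5, 7]
print(round(spearman_rho(genome_size, bgc_count), 3))
```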
Data availability
The raw DNA-sequencing data for subway air (PRJNA561080), human skin, residential surfaces, and urban public surfaces (PRJNA671748), and pier surfaces (PRJNA722771) have been deposited in the NCBI Sequence Read Archive under their respective BioProject accession numbers as indicated, and the subway surface raw DNA-sequencing data can be accessed at https://pngb.io/metasub-2021.
References
Leung MH, Tong X, Lee PK. Indoor microbiome and airborne pathogens. Compr Biotechnol. 2019;6:96–106.
Leung M, Tong X, Bøifot KO, Bezdan D, Butler DJ, Danko DC, et al. Characterization of the public transit air microbiome and resistome reveals geographical specificity. Microbiome. 2021;9(1):1–19.
Knudsen SM, Gunnarsen L, Madsen AM. Airborne fungal species associated with mouldy and non-mouldy buildings–effects of air change rates, humidity, and air velocity. Build Environ. 2017;122:161–70.
Dannemiller KC, Weschler CJ, Peccia J. Fungal and bacterial growth in floor dust at elevated relative humidity levels. Indoor Air. 2017;27(2):354–63.
National Academies of Sciences, Engineering, and Medicine. Microbiomes of the built environment: a research agenda for indoor microbiology, human health, and buildings. Washington (DC): National Academies Press (US); 2017.
Leung MH, Lee PK. The roles of the outdoors and occupants in contributing to a potential pan-microbiome of the built environment: a review. Microbiome. 2016;4(1):1–15.
Zhou Y, Leung MH, Tong X, Lai Y, Tong JC, Ridley IA, et al. Profiling airborne microbiota in mechanically ventilated buildings across seasons in Hong Kong reveals higher metabolic activity in low-abundance bacteria. Environ Sci Tech. 2020;55(1):249–59.
Bergkessel M, Basta DW, Newman DK. The physiology of growth arrest: uniting molecular and environmental microbiology. Nat Rev Microbiol. 2016;14(9):549–62.
Gray DA, Dugar G, Gamba P, Strahl H, Jonker MJ, Hamoen LW. Extreme slow growth as alternative strategy to survive deep starvation in bacteria. Nat Commun. 2019;10(1):1–12.
Yin L, Ma H, Nakayasu ES, Payne SH, Morris DR, Harwood CS. Bacterial longevity requires protein synthesis and a stringent response. mBio. 2019;10(5):e02189-19.
Van Rossum T, Ferretti P, Maistrenko OM, Bork P. Diversity within species: interpreting strains in microbiomes. Nat Rev Microbiol. 2020;18(9):491–506.
Ryšánek D, Hrčková K, Škaloud P. Global ubiquity and local endemism of free-living terrestrial protists: phylogeographic assessment of the streptophyte alga Klebsormidium. Environ Microbiol. 2015;17(3):689–98.
Ozer EA, Allen JP, Hauser AR. Characterization of the core and accessory genomes of Pseudomonas aeruginosa using bioinformatic tools Spine and AGEnt. BMC Genet. 2014;15(1):1–17.
Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome.” Proc Natl Acad Sci U S A. 2005;102(39):13950–5.
Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. Shotgun metagenomics, from sampling to analysis. Nat Biotechnol. 2017;35(9):833–44.
Nayfach S, Roux S, Seshadri R, Udwary D, Varghese N, Schulz F, et al. A genomic catalog of Earth’s microbiomes. Nat Biotechnol. 2021;39(4):499–509.
Castelle CJ, Brown CT, Anantharaman K, Probst AJ, Huang RH, Banfield JF. Biosynthetic capacity, metabolic variety and unusual biology in the CPR and DPANN radiations. Nat Rev Microbiol. 2018;16(10):629–45.
Nascimento Lemos L, Manoharan L, William Mendes L, Monteiro Venturini A, Satler Pylro V, Tsai SM. Metagenome assembled-genomes reveal similar functional profiles of CPR/Patescibacteria phyla in soils. Environ Microbiol Rep. 2020;12(6):651–5.
Zhao R, Farag IF, Jørgensen SL, Biddle JF. Occurrence, diversity, and genomes of “Candidatus Patescibacteria” along the early diagenesis of marine sediments. Appl Environ Microbiol. 2022;88(24):e01409-22.
He X, McLean JS, Edlund A, Yooseph S, Hall AP, Liu S-Y, et al. Cultivation of a human-associated TM7 phylotype reveals a reduced genome and epibiotic parasitic lifestyle. Proc Natl Acad Sci U S A. 2015;112(1):244–9.
Tian R, Ning D, He Z, Zhang P, Spencer SJ, Gao S, et al. Small and mighty: adaptation of superphylum Patescibacteria to groundwater environment drives their genome simplicity. Microbiome. 2020;8:1–15.
Ji M, Greening C, Vanwonterghem I, Carere CR, Bay SK, Steen JA, et al. Atmospheric trace gases support primary production in Antarctic desert surface soil. Nature. 2017;552(7685):400–3.
Ji M, Williams TJ, Montgomery K, Wong HL, Zaugg J, Berengut JF, et al. Candidatus Eremiobacterota, a metabolically and phylogenetically diverse terrestrial phylum with acid-tolerant adaptations. ISME J. 2021;15(9):2692–707.
Parks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil P-A, Hugenholtz P. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 2022;50(D1):D785–94.
Avalos M, Garbeva P, Vader L, van Wezel GP, Dickschat JS, Ulanova D. Biosynthesis, evolution and ecology of microbial terpenoids. Nat Prod Rep. 2022;39(2):249–72.
Mahizan NA, Yang SK, Moo CL, Song AAL, Chong CM, Chong CW, et al. Terpene derivatives as a potential agent against antimicrobial resistance (AMR) pathogens. Molecules. 2019;24(14):2631.
Agrawal S, Acharya D, Adholeya A, Barrow CJ, Deshmukh SK. Nonribosomal peptides from marine microbes and their antimicrobial and anticancer potential. Front Pharmacol. 2017;8:828.
Shi J, Xu X, Liu PY, Hu YL, Zhang B, Jiao RH, et al. Discovery and biosynthesis of guanipiperazine from a NRPS-like pathway. Chem Sci. 2021;12(8):2925–30.
Parsek MR, Greenberg EP. Acyl-homoserine lactone quorum sensing in gram-negative bacteria: a signaling mechanism involved in associations with higher organisms. Proc Natl Acad Sci U S A. 2000;97(16):8789–93.
Pastor JM, Salvador M, Argandoña M, Bernal V, Reina-Bueno M, Csonka LN, et al. Ectoines in cell stress protection: uses and biotechnological production. Biotechnol Adv. 2010;28(6):782–801.
Fukuda M, Takeda H, Kato HE, Doki S, Ito K, Maturana AD, et al. Structural basis for dynamic mechanism of nitrate/nitrite antiport by NarK. Nat Commun. 2015;6(1):7097.
Rogstam A, Larsson JT, Kjelgaard P, von Wachenfeldt C. Mechanisms of adaptation to nitrosative stress in Bacillus subtilis. J Bacteriol. 2007;189(8):3063–71.
Tosques IE, Shi J, Shapleigh JP. Cloning and characterization of nnrR, whose product is required for the expression of proteins involved in nitric oxide metabolism in Rhodobacter sphaeroides 2.4.3. J Bacteriol. 1996;178(16):4958–64.
Shelton CL, Raffel FK, Beatty WL, Johnson SM, Mason KM. Sap transporter mediated import and subsequent degradation of antimicrobial peptides in Haemophilus. PLoS Pathog. 2011;7(11):e1002360.
Crofts AA, Giovanetti SM, Rubin EJ, Poly FM, Gutiérrez RL, Talaat KR, et al. Enterotoxigenic E. coli virulence gene regulation in human infections. Proc Natl Acad Sci U S A. 2018;115(38):E8968–76.
Garcia-Caparros P, De Filippis L, Gul A, Hasanuzzaman M, Ozturk M, Altay V, et al. Oxidative stress and antioxidant metabolism under adverse environmental conditions: a review. Bot Rev. 2021;87:421–66.
Kumar M, Gogoi A, Kumari D, Borah R, Das P, Mazumder P, et al. Review of perspective, problems, challenges, and future scenario of metal contamination in the urban environment. J Hazard Toxic Radioact Waste. 2017;21(4):04017007.
Hostýnek JJ, Hinz RS, Lorence CR, Price M, Guy RH. Metals and the skin. Crit Rev Toxicol. 1993;23(2):171–235.
Hu J, Ben Maamar S, Glawe AJ, Gottel N, Gilbert JA, Hartmann EM. Impacts of indoor surface finishes on bacterial viability. Indoor Air. 2019;29(4):551–62.
Bewick S, Gurarie E, Weissman JL, Beattie J, Davati C, Flint R, et al. Trait-based analysis of the human skin microbiome. Microbiome. 2019;7:1–15.
Han L, Yuan J, Ao X, Lin S, Han X, Ye H. Biochemical characterization and phylogenetic analysis of the virulence factor lysine decarboxylase from Vibrio vulnificus. Front Microbiol. 2018;9:3082.
Danko D, Bezdan D, Afshin EE, Ahsanuddin S, Bhattacharya C, Butler DJ, et al. A global metagenomic map of urban microbiomes and antimicrobial resistance. Cell. 2021;184(13):3376-93.e17.
Rodriguez-R LM, Conrad RE, Viver T, Feistel DJ, Lindner BG, Venter SN, et al. An ANI gap within bacterial species that advances the definitions of intra-species units. mBio. 2024;15(1):e02696-23.
Maistrenko OM, Mende DR, Luetge M, Hildebrand F, Schmidt TS, Li SS, et al. Disentangling the impact of environmental and phylogenetic constraints on prokaryotic within-species diversity. ISME J. 2020;14(5):1247–59.
Tong X, Leung MH, Wilkins D, Lee PK. City-scale distribution and dispersal routes of mycobiome in residences. Microbiome. 2017;5(1):1–13.
Berman-Frank I, Lundgren P, Falkowski P. Nitrogen fixation and photosynthetic oxygen evolution in Cyanobacteria. Res Microbiol. 2003;154(3):157–64.
Zhou C, Huang J-C, Gan X, He S, Zhou W. Selenium uptake, volatilization, and transformation by the cyanobacterium Microcystis aeruginosa and post-treatment of Se-laden biomass. Chemosphere. 2021;280:130593.
Kora AJ. Bacillus cereus, selenite-reducing bacterium from contaminated lake of an industrial area: a renewable nanofactory for the synthesis of selenium nanoparticles. Bioresour Bioprocess. 2018;5(1):1–12.
Zhang S, Rensing C, Zhu Y-G. Cyanobacteria-mediated arsenic redox dynamics is regulated by phosphate in aquatic environments. Environ Sci Tech. 2014;48(2):994–1000.
Finnegan M, Duffy E, Morrin A. The determination of skin surface pH via the skin volatile emission using wearable colorimetric sensors. Sens Bio-Sens Res. 2022;35:100473.
Maatouk M, Ibrahim A, Rolain JM, Merhej V, Bittar F. Small and equipped: the rich repertoire of antibiotic resistance genes in Candidate Phyla Radiation genomes. mSystems. 2021;6(6):e00898-21.
Eggersdorfer M, Wyss A. Carotenoids in human nutrition and health. Arch Biochem Biophys. 2018;652:18–26.
Cobo-Simón M, Tamames J. Relating genomic characteristics to environmental preferences and ubiquity in different microbial taxa. BMC Genet. 2017;18(1):1–11.
Li Y, Sun Z-Z, Rong J-C, Xie B-B. Comparative genomics reveals broad genetic diversity, extensive recombination and nascent ecological adaptation in Micrococcus luteus. BMC Genet. 2021;22(1):1–14.
Bergkessel M, Delavaine L. Diversity in starvation survival strategies and outcomes among heterotrophic Proteobacteria. Microb Physiol. 2021;31(2):146–62.
Blazewicz SJ, Barnard RL, Daly RA, Firestone MK. Evaluating rRNA as an indicator of microbial activity in environmental communities: limitations and uses. ISME J. 2013;7(11):2061–8.
Wiebach V, Mainz A, Siegert MAJ, Jungmann NA, Lesquame G, Tirat S, et al. The anti-staphylococcal lipolanthines are ribosomally synthesized lipopeptides. Nat Chem Biol. 2018;14(7):652–4.
Wang H, Van Der Donk WA. Biosynthesis of the class III lantipeptide catenulipeptin. ACS Chem Biol. 2012;7(9):1529–35.
Hottes AK, Freddolino PL, Khare A, Donnell ZN, Liu JC, Tavazoie S. Bacterial adaptation through loss of function. PLoS Genet. 2013;9(7):e1003617.
Brockhurst MA, Harrison E, Hall JP, Richards T, McNally A, MacLean C. The ecology and evolution of pangenomes. Curr Biol. 2019;29(20):R1094–103.
Tong X, Leung MH, Shen Z, Lee JY, Mason CE, Lee PK. Metagenomic insights into the microbial communities of inert and oligotrophic outdoor pier surfaces of a coastal city. Microbiome. 2021;9:1–15.
Wilkins D, Tong X, Leung MH, Mason CE, Lee PK. Diurnal variation in the human skin microbiome affects accuracy of forensic microbiome matching. Microbiome. 2021;9(1):129.
Du S, Tong X, Lai AC, Chan CK, Mason CE, Lee PK. Highly host-linked viromes in the built environment possess habitat-dependent diversity and functions for potential virus-host coevolution. Nat Commun. 2023;14(1):1–15.
Schubert M, Lindgreen S, Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes. 2016;9:1–7.
Uritskiy GV, DiRuggiero J, Taylor J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome. 2018;6(1):158.
Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20(1):257.
Olm MR, Brown CT, Brooks B, Banfield JF. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 2017;11(12):2864–8.
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25(7):1043–55.
Brown CT, Hug LA, Thomas BC, Sharon I, Castelle CJ, Singh A, et al. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature. 2015;523(7559):208–11.
Wibowo MC, Yang Z, Borry M, Hübner A, Huang KD, Tierney BT, et al. Reconstruction of ancient microbial genomes from the human gut. Nature. 2021;594(7862):234–9.
Parks DH, Chuvochina M, Chaumeil P-A, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;38(9):1079–86.
Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2020;36(6):1925–7.
Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30(14):2068–9.
Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol. 2021;38(12):5825–9.
Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016;17(1):1–14.
Katz LS, Griswold T, Morrison SS, Caravas JA, Zhang S, den Bakker HC, et al. Mashtree: a rapid comparison of whole genome sequence files. J Open Source Softw. 2019;4(44):1–7.
Xie J, Chen Y, Cai G, Cai R, Hu Z, Wang H. Tree Visualization By One Table (tvBOT): a web application for visualizing, modifying and annotating phylogenetic trees. Nucleic Acids Res. 2023;51(W1):W587–92.
Blin K, Shaw S, Kloosterman AM, Charlop-Powers Z, Van Wezel GP, Medema MH, et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 2021;49(W1):W29–35.
Zhou Z, Tran PQ, Breister AM, Liu Y, Kieft K, Cowley ES, et al. METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks. Microbiome. 2022;10(1):1–22.
Eren AM, Esen ÖC, Quince C, Vineis JH, Morrison HG, Sogin ML, et al. Anvi’o: an advanced analysis and visualization platform for ‘omics data. PeerJ. 2015;3:e1319.
Pritchard L, Glover RH, Humphris S, Elphinstone JG, Toth IK. Genomics and taxonomy in diagnostics for food security: soft-rotting enterobacterial plant pathogens. Anal Methods. 2016;8(1):12–24.
Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MT, et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics. 2015;31(22):3691–3.
Baty F, Ritz C, Charles S, Brutsche M, Flandrois J-P, Delignette-Muller M-L. A toolbox for nonlinear regression in R: the package nlstools. J Stat Softw. 2015;66:1–21.
Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: a fast and versatile genome alignment system. PLoS Comput Biol. 2018;14(1):e1005944.
Na S-I, Kim YO, Yoon S-IS-I, Ha S-m, Baek I, Chun J. UBCG: up-to-date bacterial core gene set and pipeline for phylogenomic tree reconstruction. J Microbiol. 2018;56:280–5.
Price MN, Dehal PS, Arkin AP. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490.
Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20:1–14.
Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: rapid annotations using subsystems technology. BMC Genet. 2008;9(1):1–15.
Acknowledgements
We thank the participants who were involved in the sample collection.
Funding
This research was supported by the Hong Kong Research Grants Council Research Impact Fund (R1016-20F) and the General Research Fund (11214721) to P.K.H.L. Support was also provided to X.T. by the Jiangsu Science and Technology Programme (BK20230230).
Author information
Contributions
X.T. performed bioinformatic analysis, data analysis, and interpretation and wrote the manuscript. D.L., J.Y.Y.L., and Z.S. performed bioinformatic analysis. D.L., M.H.Y.L., and C.E.M. provided guidance on data interpretation. W.J. performed data analysis. P.K.H.L. conceived the study and supervised the research.
Ethics declarations
Ethics approval and consent to participate
All participants were fully informed of the details of the study and provided written informed consent. The study was approved by the City University of Hong Kong Human Subjects Ethics Sub-Committee (ref: H001553).
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
40168_2024_1926_MOESM2_ESM.xlsx
Supplementary Material 2: Table S2. Pairwise comparisons of the BE microbiome by sample type based on the permutational multivariate analysis of variance and permutational analysis of multivariate dispersion.
40168_2024_1926_MOESM3_ESM.xlsx
Supplementary Material 3: Table S3. Evolutionary history and functional annotations of gene families in the novel Xenobia rMAG.
40168_2024_1926_MOESM4_ESM.xlsx
Supplementary Material 4: Table S4. Evolutionary history and functional annotations of gene families in the two novel Patescibacteria rMAGs.
40168_2024_1926_MOESM6_ESM.pdf
Supplementary Material 6: Figure S1. Density plot of the 10 most abundant species in all the samples across sample types. The average relative abundance of each species was >1% in all the samples.
40168_2024_1926_MOESM7_ESM.pdf
Supplementary Material 7: Figure S2. A species accumulation curve. The light blue shade indicates the 95% confidence interval of 10 permutations.
40168_2024_1926_MOESM8_ESM.pdf
Supplementary Material 8: Figure S3. Relative abundance of the three novel rMAGs in all of the samples. The relative abundance of one rMAG affiliated with the Xenobia class (SL346106_bin.7) and two rMAGs affiliated with the Patescibacteria phylum (SL336691_bin.7 and SL346115_bin.3) in each of the four residences and other BEs (grouped into “Others”) is shown. The colors indicate the six sample types.
40168_2024_1926_MOESM9_ESM.pdf
Supplementary Material 9: Figure S4. Functional differences between the two clusters of Actinobacteriota genomes. Gene presence and absence for each function were assessed for each rMAG. Prevalence represents the percentage of rMAGs in the Actinobacteriota phylum that contains the genes associated with a function (0% and 100% denote absence of the genes in all genomes and presence of the genes in all genomes, respectively). The number of rMAGs in each Actinobacteriota species is indicated in the brackets following the species names, and any rMAGs that are unclassified at the species level are excluded from the figure. Micrococcus luteus (x-axis) and the acetate oxidation function (y-axis) are highlighted in red text.
40168_2024_1926_MOESM10_ESM.pdf
Supplementary Material 10: Figure S5. Secondary biosynthetic potentials in the rMAGs. (a) Number of the top 12 types of BGCs in all of the rMAGs (other BGC types are grouped into “Others”). (b) Correlation between genome size and the number of BGCs in a genome. Genomes without any BGCs are also included in the figure and statistical test. (c) Composition of BGC types based on phylum and sample types. The number of rMAGs in each phylum or sample type is indicated in the brackets.
40168_2024_1926_MOESM11_ESM.pdf
Supplementary Material 11: Figure S6. Relative abundance of the Micrococcus luteus rMAGs recovered from Hong Kong BEs in all of the samples across six sample types. After dereplication of a total of 18 M. luteus MAGs, 11 rMAGs were retained. The panel is colored based on the sample type from which each M. luteus rMAG was reconstructed.
40168_2024_1926_MOESM12_ESM.pdf
Supplementary Material 12: Figure S7. The size of the pangenome and core genome of M. luteus in relation to the number of genomes included. Boxes represent the range of one standard deviation around the median number of genes, while the whiskers indicate the range of two standard deviations. The best-fit curves for the pangenome and core genome obtained using non-linear regression are shown in blue and red, respectively. The equations for the pangenome and core genome are \(G=1254N^{0.46}+1017\) and \(1672e^{-0.11N}+243\), respectively, where N represents the number of genomes.
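The two fitted curves reported for Figure S7 can be evaluated directly to see how they behave as genomes are added. The sketch below is illustrative only: the function names are hypothetical, the coefficients are copied from the caption, and the core-genome expression is assumed to give a gene count in the same way as G does for the pangenome. It is not the authors' fitting code, which used non-linear regression on the observed gene counts.

```python
import math


def pangenome_size(n_genomes: float) -> float:
    """Power-law fit from the Figure S7 caption: G = 1254 * N^0.46 + 1017."""
    return 1254 * n_genomes ** 0.46 + 1017


def core_genome_size(n_genomes: float) -> float:
    """Exponential-decay fit from the Figure S7 caption: 1672 * exp(-0.11 * N) + 243."""
    return 1672 * math.exp(-0.11 * n_genomes) + 243


# Print both fitted values for a few genome counts (rounded to whole genes).
for n in (1, 5, 11, 50):
    print(n, round(pangenome_size(n)), round(core_genome_size(n)))
```

Because the exponent of the power-law term is positive, the fitted pangenome size keeps increasing with N, which is consistent with the open pangenome described for M. luteus. The exponential term in the core-genome fit shrinks toward zero, so that curve levels off at its constant term of 243 genes.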
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Tong, X., Luo, D., Leung, M.H.Y. et al. Diverse and specialized metabolic capabilities of microbes in oligotrophic built environments. Microbiome 12, 198 (2024). https://doi.org/10.1186/s40168-024-01926-6
DOI: https://doi.org/10.1186/s40168-024-01926-6