Cell Reports Medicine. 2023 Sep 26;4(10):101205. doi: 10.1016/j.xcrm.2023.101205

Expanded microbiome niches of RAG-deficient patients

Ryan A Blaustein 1,6, Zeyang Shen 1, Sara Saheb Kashaf 1, ShihQueen Lee-Lin 1, Sean Conlan 1; NISC Comparative Sequencing Program 2, Marita Bosticardo 3, Ottavia M Delmonte 3, Cassandra J Holmes 4, Monica E Taylor 4, Glenna Banania 4, Keisuke Nagao 4, Dimana Dimitrova 5, Jennifer A Kanakry 5, Helen Su 3, Steven M Holland 3, Jenna RE Bergerson 3, Alexandra F Freeman 3, Luigi D Notarangelo 3,7, Heidi H Kong 4,7, Julia A Segre 1,7,8
PMCID: PMC10591041  PMID: 37757827

Summary

The complex interplay between microbiota and immunity is important to human health. To explore how altered adaptive immunity influences the microbiome, we characterize skin, nares, and gut microbiota of patients with recombination-activating gene (RAG) deficiency—a rare genetically defined inborn error of immunity (IEI) that results in a broad spectrum of clinical phenotypes. Integrating de novo assembly of metagenomes from RAG-deficient patients with reference genome catalogs provides an expansive multi-kingdom view of microbial diversity. RAG-deficient patient microbiomes exhibit inter-individual variation, including expansion of opportunistic pathogens (e.g., Corynebacterium bovis, Haemophilus influenzae), and a relative loss of body site specificity. We identify 35 and 27 bacterial species derived from skin/nares and gut microbiomes, respectively, which are distinct to RAG-deficient patients compared to healthy individuals. Underscoring IEI patients as potential reservoirs for viral persistence and evolution, we further characterize the colonization of eukaryotic RNA viruses (e.g., Coronavirus 229E, Norovirus GII) in this patient population.

Keywords: skin, human microbiome, RAG deficiency, bacteria, RNA virus, nares, inborn error of immunity, antimicrobial resistance, metagenomic assembled genome


Highlights

  • Patients with RAG deficiency harbor distinct skin, nasal, and gut microbiomes

  • Metagenome-assembled genomes colonizing RAG-deficient patients expand microbial diversity

  • Enriched antimicrobial resistance gene content among microbiota of RAG-deficient patients


Blaustein et al. explore microbiome diversity associated with recombination-activating gene (RAG) deficiency. Patients harbor highly individual microbial communities with expanded diversity and a relative loss in body site specificity. Increased bacterial and viral diversity, including potential pathogens, is observed in this immunodeficient population.

Introduction

Interactions of the microbiome and host immunity across epithelial surfaces play a crucial role in human health and disease. While mechanistic studies have explored how the microbiome can tune host immune responses,1,2 how human immunity shapes microbial community dynamics remains less clear. Studying patients with rare genetically defined inborn errors of immunity (IEIs) provides a unique opportunity to understand the critical contributions of the immune system in protecting the host.3,4

Recombination-activating gene 1 (RAG1) and RAG2 proteins initiate the process of V(D)J recombination in developing lymphocytes during the early stages of T and B cell maturation, which critically builds the vast repertoire of antigen-specific lymphocytes required for effective adaptive immunity. Null mutations in the RAG genes cause severe combined immune deficiency with lack of T and B cells and early susceptibility to life-threatening infections. Hypomorphic RAG mutations result in a combination of immunodeficiency and immune dysregulation with defects of central and peripheral tolerance, and they manifest with a broad spectrum of clinical phenotypes, ranging from recurrent and/or severe infections to autoimmunity.5,6,7 RAG-deficient patients have an increased propensity for infections sustained by a broad range of microbial species (viruses, bacteria, fungi). Investigating the microbiome of RAG-deficient patients provides an understanding of how impaired adaptive immunity impacts microbial colonization, which may further provide insights into the effects of secondary immunodeficiencies, including immunosuppression with cancer therapy or transplantation.

Shotgun metagenomic sequencing of clinical samples serves as a powerful method used for microbiome taxonomic and functional profiling, as well as discovery of novel taxa. Read-based analytical frameworks built upon reference genomes8,9 provide standard taxonomic assignments of the sample. However, the newer analytical approach of de novo assembly and binning of reads into metagenome-assembled genomes (MAGs) enhances characterization of microbial diversity within complex systems, including human skin and gut microbiomes.10,11,12,13 Microbiome analyses have the potential to identify both putative pathogens and the community in which they reside. Constructing a genome collection that spans the microbiota associated with RAG deficiency is an innovative way to uncover novel aspects of microbial diversity associated with this unique patient population.

The COVID-19 pandemic has highlighted the urgent need for predicting and tracking emergence and evolution of RNA viruses that colonize and infect humans to protect public health. While the field of microbiome science has traditionally and disproportionately focused on DNA sequence analysis,14 understanding evolutionary dynamics of RNA viruses is gaining substantial attention as crucial for pandemic preparedness.15,16,17 Since RAG deficiency and other forms of IEI lead to increased susceptibility to recurrent and/or severe infections,5 including granulomatous lesions associated with persistence of the RA27/3 rubella vaccine RNA virus, such patients could be an important source to identify RNA viruses that colonize humans as prototype pathogens for vaccine development.15

Here, we sought to investigate the human skin, nares, and gastrointestinal tract microbiomes associated with RAG deficiency to explore how adaptive immunity surveils and shapes microbiota colonization and community dynamics.

Results

Microbiome analyses of RAG-deficient patients

We aimed to investigate the human microbiome in RAG-deficient patients who are known to have profound immunodeficiencies complicated by severe persistent infections. Patients with RAG deficiency (four children, four adults; median age 18 years) who were evaluated at the National Institutes of Health Clinical Center were enrolled onto our institutional review board-approved protocol (NCT02471352) to undergo sample collection from skin, nares, and stool. Three of the patients were followed during their allogeneic hematopoietic cell transplant with collection of longitudinal stool samples. Half of the patients were characterized as delayed onset combined immunodeficiency with granulomas and/or autoimmunity (CID-G/AI), two patients with combined immunodeficiency (CID), one patient with severe combined immunodeficiency (SCID), and one patient with leaky SCID (LS)18 (Tables S1 and S2).

To explore microbial diversity associated with RAG deficiency, we collected a total of 86 samples across multiple body sites in the eight patients (Figure S1 and Table S3), from which we obtained 2.9 billion reads, or 432.8 gigabases (Gb) of microbial DNA sequence data. Since our goal included characterizing novel microbes that colonize RAG-deficient patients, we identified taxa de novo with assembly and binning from the shotgun metagenomes. From the skin and nares, 470 prokaryotic MAGs (representing 136 species), 29 eukaryotic MAGs (representing six species), and 1,447 non-redundant viral contigs were recovered. From the stool, 673 prokaryote MAGs (representing 214 prokaryote species) and 1,812 non-redundant viral contigs were recovered. The collective set of prokaryote MAGs primarily mapped to the major bacterial phyla Actinobacteriota, Bacteroidota, Firmicutes, and Proteobacteria (Figure S2). All eukaryote MAGs (recovered from skin/nares only) mapped to Malassezia (Table S4). Viral sequences primarily mapped to phage, though eukaryotic viruses such as Papillomaviridae (n = 76 contigs) were also recovered from skin/nares.

To assess the potential novelty of MAGs and contigs recovered from RAG-deficient patients, we compared them with microbial genomes that have been recovered from healthy volunteers and children, including the Skin Microbial Genome Collection10 (SMGC; i.e., 622 prokaryote genomes, 12 eukaryote genomes, and 6,395 viral contigs), the Unified Human Gastrointestinal Genome collection13 (UHGG; i.e., 4,729 prokaryote genomes), and the Early Life Gastrointestinal Genome collection19 (ELGG; i.e., 2,172 prokaryote genomes). A comprehensive genome catalog was generated by clustering MAGs and viral contigs recovered from the RAG-deficient patients with those recovered from healthy volunteers (HVs) and healthy children (HCs)10,20,21 (Figure S1), as well as the composite genomes in the SMGC, UHGG, and ELGG. Relative to the species-level taxa in HVs/HCs, SMGC, UHGG, and ELGG, the skin and nares of the RAG-deficient patients contained 33 non-redundant MAGs that were distinct or of higher quality (primarily derived from pediatric patients; i.e., 75.8%) and two distinct fungal MAGs (both from adult patients), and the stool of the RAG-deficient patients contained 27 non-redundant bacterial MAGs that were distinct or of higher quality (primarily derived from adult patients; i.e., 77.8%). The full catalog of 14,799 non-redundant genomes (5,406 prokaryote genomes, 15 eukaryote genomes, and 9,378 viral contigs) was used for downstream analysis.

Skin microbiome diversity, individuality, and novelty of RAG-deficient patients

Given the association of skin infections and atypical skin lesions in patients with RAG deficiency, we sought to examine the skin microbiome. Our assembly-based approach enabled robust characterization of microbial diversity across the patients with RAG deficiency. Mapping the RAG-deficient metagenome sequenced reads against our constructed genome catalog resulted in classification of 92.7% ± 5.0% of reads from skin/nares samples (n = 69) and 95.6% ± 0.6% of reads from stool samples (n = 17) (mean ± standard deviation) (Figure S3). These values were significantly greater than the fraction of reads classified using a traditional read-based approach: Kraken2 (ref. 8), run with the default NCBI database (accessed July 2022) as the reference database, classified 50.8% ± 16.9% of reads for skin/nares samples (p < 0.001) and 64.3% ± 15.7% of reads for stool samples (p < 0.001) (mean ± standard deviation).

As previously shown, and confirmed here, similarity among microbiomes of HV skin (sampled at the antecubital crease [Ac], retroauricular crease [Ra], and volar forearm [Vf]) and nares (N) is driven by individual (PERMANOVA R2 = 0.339, p = 0.008) and body site (PERMANOVA R2 = 0.303, p = 0.001). By contrast, similarity of skin and nares microbiomes of adult patients with RAG deficiency across HV-matching sites (Ac, Ra, Vf, N) was primarily driven by subject (PERMANOVA R2 = 0.473, p = 0.001), with body site not significant (PERMANOVA R2 = 0.171, p = 0.703) (Table S5). While RAG deficiency subtype (CID, CID-G/AI, LS, SCID) correlated with skin microbiome diversity (PERMANOVA R2 = 0.212, p = 0.001), such associations would need to be validated in larger cohorts of distinct subtypes (Figure S4).
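To make the PERMANOVA R2 values above easier to interpret, the following is a minimal sketch of a one-factor PERMANOVA computed from a pairwise distance matrix. It is an illustration only, not the software or model specification used in the study; the function name `permanova` and the toy one-dimensional data in the usage note are hypothetical. R2 is the share of the total sum of squared distances explained by the grouping factor, and the p value is the fraction of label permutations whose pseudo-F is at least as large as the observed one.

```python
import random
from itertools import combinations


def _scaled_ss(dist, idx):
    """Sum of squared pairwise distances among samples in idx, divided by len(idx)."""
    idx = list(idx)
    return sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)


def permanova(dist, labels, n_perm=999, seed=0):
    """One-factor PERMANOVA on a square distance matrix (list of lists).

    Returns (R2, pseudo-F, permutation p value).
    Assumes at least two groups and non-zero within-group dispersion.
    """
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = _scaled_ss(dist, range(n))

    def stats(lab):
        ss_within = sum(
            _scaled_ss(dist, [i for i in range(n) if lab[i] == g]) for g in groups
        )
        ss_among = ss_total - ss_within
        f = (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))
        return ss_among / ss_total, f

    r2, f_obs = stats(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between samples and group labels
        if stats(shuffled)[1] >= f_obs:
            hits += 1
    return r2, f_obs, (hits + 1) / (n_perm + 1)
```

For example, with six hypothetical samples placed at positions 0, 1, 2 (group A) and 10, 11, 12 (group B) and absolute differences as distances, nearly all of the variation lies between groups, so R2 is close to 1. A real analysis would use a community dissimilarity such as Bray-Curtis and an established implementation.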

While Cutibacterium acnes was the most prominent taxon for all HVs (Figures 1A and S5), for specific RAG-deficient patients, distinct taxa predominated, e.g., QFRN01/“Candidatus” pellibacterium (RAG_pt06), Corynebacterium bovis and Staphylococcus hominis (RAG_pt07), and Papillomaviridae (RAG_pt08) (Figures 1A and S6). Additional lower abundance taxa enriched in the skin microbiomes of adults with RAG deficiency included Escherichia coli (linear mixed-effects [LME] model p = 0.008, q = 0.024) and Corynebacterium pseudogenitalium (LME model p = 0.013, q = 0.030) (Table S6). Trends for expansions of most of these taxa, as well as Haemophilus influenzae and Malassezia restricta, were further observed in the nasal microbiomes (Figure 1B). The relative lack of C. acnes predominance on the skin and nares of adult RAG-deficient patients was consistent with microbiota re-structuring that occurred in HV controls receiving trimethoprim-sulfamethoxazole (TMP-SMX, a broad-spectrum antibiotic commonly used as prophylaxis in patients with immunodeficiencies) (Figure S7).20 Overall, RAG-deficient skin microbiome diversity appeared to be patient specific with microbial signatures for each individual, potentially related to the expanded niche in an immunodeficient population.3

Figure 1.


Composition of skin and nasal microbiomes of patients with RAG deficiency

(A) Relative abundances of microbial taxa in the skin microbiomes (averages across body sites: Ac, antecubital crease, Ra, retroauricular crease, Vf, volar forearm) of patients with RAG deficiency (adult and children) compared to healthy volunteers (HVs) and healthy children (HCs). Each bar represents a distinct subject.

(B) Relative abundances of microbial taxa in the nasal microbiomes (N) of patients with RAG deficiency (adult) compared to healthy volunteers (HVs). Each bar represents a distinct subject.

(C) Compositional similarity or beta diversity of Ac, Ra, Vf, and N microbiome samples. See also Figures S5 and S6 and Table S6.

The skin microbiomes of children with RAG deficiency, compared with HCs, were enriched in C. acnes (LME model p = 0.005, q = 0.070), as well as a variety of lower abundance taxa (Figure 1A and Table S6). Patient-specific trends for expansions of E. coli (RAG_pt01) and Papillomaviridae (RAG_pt01, RAG_pt02) on skin and H. influenzae in nares (RAG_pt02) were also observed (Figure S6). Compositions of the skin/nares microbiomes of RAG-deficient children and adults clustered distinctly from respective HCs/HVs (Figure 1C and Table S5).

Gut microbiome of RAG-deficient patients

Because RAG-deficient patients commonly manifest with persistent gastrointestinal infections, we next explored the gut microbiomes. Bacterial communities within stool samples collected from adults with RAG deficiency were different from those of HVs (PERMANOVA R2 = 0.121, p = 0.045) (Figure 2A), with trends for higher relative abundances of Parabacteroides distasonis and Alistipes onderdonkii (Figure 2B). Gut microbiomes of three patients were further monitored longitudinally following allogeneic hematopoietic cell transplantation (HCT). The trajectory of each patient’s gut microbiome was highly individual, e.g., expansions of P. distasonis and Enterocloster bolteae in RAG_pt01, several Firmicutes species in RAG_pt05, and Bacteroides fragilis in RAG_pt06 (Figure S8). Notably, all patients harbored elevated E. coli at different time points as well (Figure S8). Longitudinal studies on RAG-deficient patient microbiome dynamics parallel to those post HCT would be needed to confirm whether the subject-specific microbiome trajectories reflect antibiotic prophylaxis and immunologic reconstitution associated with HCT and/or natural drift.

Figure 2.


Composition of gut microbiomes of patients with RAG deficiency

(A) Compositional similarity or beta diversity of baseline stool microbiome samples.

(B) Relative abundances of microbial taxa in RAG-deficient patients and HVs. Each bar represents a distinct subject. See also Figure S8 and Table S6.

Microbial species distinguishing RAG-deficient patients

To further investigate the particular microbiota associated with RAG deficiency, we focused on genomes present in the RAG-deficient patients but absent from HVs/HCs. Setting an average relative abundance threshold of 0.1%, there were 38 genomes unique to the skin (Ac, Ra, Vf) and nares (N) of RAG-deficient patients relative to HV/HC counterparts (Figure 3A). The most abundant species were C. bovis on RAG_pt07 (average relative abundance of 36.4%), Corynebacterium xerosis on RAG_pt03 (avg. 5.2%), E. coli on RAG_pt01 (avg. 11.3%), and H. influenzae on RAG_pt02 (avg. 18.0%) and RAG_pt06 (avg. 16.0%). Both RAG_pt02 and RAG_pt06 had previously received H. influenzae type b (Hib) vaccinations, though RAG_pt02 had been reported to have a poor response, while RAG_pt06 had demonstrated positive anti-Hib titers. Several genomes were found to colonize skin and nares of at least half of the RAG-deficient patient population, yet they were absent or at trivial abundances in HVs/HCs, including E. coli, Prevotella melaninogenica, and an uncharacterized species of Streptococcus (Figure 3A). We note that P. melaninogenica was among the five bacterial species from this set that also emerged on the skin of HV controls receiving TMP-SMX.20
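The presence/absence rule described above (group-average relative abundance above 0.1% in patients and below 0.1% in healthy controls) can be sketched as a simple filter. This is a hypothetical illustration, not the authors' code: the function name `group_specific_genomes`, the input layout (one list of per-subject relative abundances, in percent, per genome), and the placeholder genome names in the usage note are all assumptions.

```python
from statistics import mean


def group_specific_genomes(patients, controls, threshold=0.1):
    """Return genomes whose mean relative abundance (%) exceeds `threshold`
    in patients while staying below it in controls.

    patients, controls: dict mapping genome name -> list of per-subject
    relative abundances in percent. A genome missing from `controls`
    is treated as having zero abundance there.
    """
    selected = []
    for genome, values in patients.items():
        patient_mean = mean(values)
        control_mean = mean(controls.get(genome, [0.0]))
        if patient_mean > threshold and control_mean < threshold:
            selected.append((genome, patient_mean))
    # Most abundant patient-specific genomes first
    return sorted(selected, key=lambda item: item[1], reverse=True)
```

With invented inputs such as `{"genome_A": [5.0, 0.0, 1.0], "genome_B": [0.05, 0.0, 0.1]}` for patients and `{"genome_A": [0.0, 0.01]}` for controls, only `genome_A` passes. Because the rule uses group averages, a genome that is highly abundant in a single patient can pass the filter, which is consistent with the patient-specific expansions reported here.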

Figure 3.


Bacterial species that colonize the skin/nares (A) and gut (B) of patients with RAG deficiency, not found on healthy individuals

Purple shading of boxes in heatmap corresponds to MAG relative abundance in skin/nares microbiome (average relative abundance across Ac, Ra, Vf, and N body sites) and stool microbiome for each RAG-deficient patient. Colonization in RAG-deficient patients and absence in healthy controls is determined as above/below an average relative abundance threshold of 0.1% across respective subject groups. Font color of bacterial species corresponds to phylum.

Gut microbiomes of RAG-deficient patients contained 54 MAGs that were on average above 0.1% relative abundance and below that threshold in HV/HC counterparts. While Klebsiella ornithinolytica [Raoultella planticola] was the most abundant of these, high-level colonization appeared specific to RAG_pt07 (relative abundance of 14.6%). Gut-associated MAGs that uniquely colonized at least half of the RAG-deficient patients included strains of Alistipes finegoldii, Alistipes shahii, Bacteroides fragilis_A, Enterocloster aldenensis, Enterocloster lavalensis, Hungatella effluvii, Oscillobacter welbionis, Parabacteroides johnsonii, Prevotella corporis, and Ruthenibacterium lactatiformans (Figure 3B). Overall, a variety of bacterial species that are otherwise absent in normal flora appeared to colonize patients with RAG deficiency.

Antimicrobial resistance is elevated in microbiomes of patients with RAG deficiency

Antimicrobial resistance genes (ARGs) were widely present throughout microbiomes from the RAG-deficient patient population (Table S7). We identified 119 and 134 different ARGs from skin (Ac, Vf, Ra body sites) and nares metagenomes of RAG-deficient pediatric and adult patients, respectively. In contrast, only about half as many were detected in HCs (n = 67 ARGs), with roughly the same total but different types being present in HVs (n = 129 ARGs). The gut microbiomes of adults with RAG deficiency and those of HVs contained 107 and 84 ARGs, respectively.

The 12 ARGs with enriched reads per kilobase per million sequenced reads (RPKMs) among the children with RAG deficiency were primarily those that confer resistance to penams/beta-lactams (CfxA6, mecI, mecR1) and macrolides (ErmB, ErmF, ErmX, mefA, and mphC) (Figure 4A), which was consistent with the antibiotic regimens the children were actively or recently prescribed (Table S2). The skin/nares of adults with RAG deficiency were significantly enriched in 11 ARGs (Figure 4B), including some consistent trends with the children (mecI, mecR1, and mupA) and additional putative resistance to aminoglycosides (APH(6)-Id and ANT(2″)-Ia), fosfomycin (murA), and phenicols (cmx). We note that several ARGs enriched in HVs compared to patients with RAG deficiency are endogenous genes of commensal bacteria that may confer resistance based on specific point mutations (i.e., gyrA, gyrB, and norA associated with HV-enriched C. acnes and Staphylococcus epidermidis) (Figure 4B). Based on genomic predictions, it remains unclear whether such genes confer the specific resistance phenotype.
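For readers unfamiliar with the unit, RPKM normalizes a gene's read count by both gene length and sequencing depth, so that long genes and deeply sequenced samples do not appear artificially enriched. The sketch below shows the standard formula; the function name `rpkm` and the example numbers are hypothetical and are not taken from the study's data or pipeline.

```python
def rpkm(mapped_reads, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million sequenced reads.

    RPKM = mapped_reads / (gene_length_bp / 1e3) / (total_reads / 1e6)
         = mapped_reads * 1e9 / (gene_length_bp * total_reads)
    """
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total reads must be positive")
    return mapped_reads * 1e9 / (gene_length_bp * total_reads)
```

For instance, 500 reads mapped to a 1,000-bp gene in a sample of 10 million reads gives an RPKM of 50. Doubling the gene length or the sequencing depth while holding the read count fixed halves the value.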

Figure 4.


Antimicrobial resistance genes (ARGs) enriched in microbiomes of patients with RAG deficiency

(A) Abundances (RPKMs) of ARG families with significant differences (p < 0.05) in skin microbiomes (Ac, Ra, Vf) of pediatric patients. p value calculated with linear mixed-effects model with fixed effects for group type (RAG vs. HV) and random effects for body site (N, Ac, Ra, Vf).

(B) Abundances (RPKMs) of ARG families with significant differences (p < 0.05) in skin and nares microbiomes (Ac, Ra, Vf, N) of adult patients. p value calculated with linear mixed-effects model with fixed effects for group type (RAG vs. HV) and random effects for body site (N, Ac, Ra, Vf).

(C) Abundances (RPKMs) of ARG families with significant differences (p < 0.05) in stool microbiome of adult patients. Vf, volar forearm; Ac, antecubital crease; Ra, retroauricular crease; N, nares; St, stool. ARG color corresponds to antimicrobial class. p value calculated with Mann-Whitney U test. See also Table S7.

To evaluate whether RAG deficiency-enriched ARGs may reflect responses to antibiotic treatments received by the patients, we analyzed a published skin microbiome dataset in which HVs received TMP-SMX.20 There were 84 ARGs identified from skin microbiomes of the healthy volunteers across the time span of the study. Similar to adult patients with RAG deficiency, the microbiomes of HVs at 2 weeks post antibiotic receipt contained higher abundances of aminoglycoside resistance (APH(3″)) and drug export (patA), yet lower abundances of gyrA (Figure S9), the latter of which is associated with relative loss in C. acnes (Figure S7). Thus, several ARGs enriched in patients with RAG deficiency were consistent with those of HVs receiving similar antibiotics.

The gut microbiomes from adults with RAG deficiency were enriched in nine ARGs, including those associated with sulfonamide (sul1) and tetracycline (tetX) resistance (Figure 4C). Notably, the presence of the tetX gene, which confers resistance to tigecycline, deserves clinical attention because tigecycline is regarded as a last-line antibiotic to treat severe infections caused by multidrug-resistant pathogens.22

Eukaryotic viral classification across RAG deficiency

Given that viruses can predominate in some IEI disorders, we sought to study viral taxa present in RAG-deficient patients. Papillomaviridae was the most abundant microbial taxon enriched across the skin of adult RAG-deficient patients (15.7% ± 7.3%) compared to HVs (0.018% ± 0.003%) (mean ± standard error; LME model p = 0.022, q = 0.080). All major human papillomavirus (HPV) phylogenetic groups—Alphapapillomavirus, Betapapillomavirus, and Gammapapillomavirus—were enriched in the adult patients (Figure 5A). The two latter HPV genera were also enriched in children relative to HCs (Figure 5A). We assembled 33 “high-quality” HPV contigs (i.e., predicted by CheckV23 to have 100% completeness and 0% contamination) from the RAG-deficient patients, which included 13 strains of Gammapapillomavirus, with <80% sequence identity to L1 of the closest relative HPV, i.e., potentially new species (Figure 5B and Table S8).
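The novelty criterion above (L1 sequence identity below 80% to the closest known HPV) can be illustrated with a small percent-identity calculation on two sequences that have already been aligned. This is a sketch under stated assumptions, not the study's method: the function names are hypothetical, the alignment step is assumed to have been done elsewhere, and identity is computed here over columns where neither sequence has a gap, which is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Columns where either sequence has a gap are ignored (an assumption;
    other definitions count gaps as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def is_putatively_novel(l1_query, l1_closest, cutoff=80.0):
    """Flag a query whose L1 identity to its closest relative is below `cutoff`."""
    return percent_identity(l1_query, l1_closest) < cutoff
```

A query that matches its closest relative at 8 of 10 compared positions sits exactly at 80% and is not flagged, whereas one matching at 7 of 10 positions is flagged as potentially novel.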

Figure 5.


Skin microbiomes of patients with RAG deficiency are enriched with human papillomaviruses (HPVs)

(A) Abundances (reads per kilobase per million reads; RPKMs) of HPV groups detected by HPViewer.

(B) Phylogenetic tree of novel HPV genomes recovered from patients in the context of known species. Strains with L1 gene <80% sequence identity to that of closest HPV relative are considered novel. See also Table S8.

RNA viruses colonize nares and gut of patients with RAG deficiency

To explore the full extent of the microbial colonization of patients with RAG deficiency, we subjected nares and stool samples to RNA sequencing to identify RNA viruses. A variety of novel and known RNA viruses were identified, including different strains of Norovirus found in stool samples of two patients (RAG_pt01 and RAG_pt06), of which only one (RAG_pt06) had a known history of Norovirus infections and positive laboratory tests (Figure 6A and Table S2). The other patient (RAG_pt01) did not have a history of Norovirus infection and was asymptomatic.

Figure 6.


RNA viruses colonize nares and stool of patients with RAG deficiency

(A) RNA-seq read counts mapped to reference genomes of RNA viruses for nares (green) and stool (blue) samples. “∗” indicates that a contig was successfully assembled from the sample, matching with the corresponding virus.

(B) A potentially new strain of Cosavirus D discovered in the feces of a patient. Phylogenetic tree of VP1 protein sequence compares the contig assembled from the patient and published Cosavirus from NCBI. See also Table S9.

Some samples contained more than one type of RNA virus, such as Norovirus and Picobirnavirus in the stool of RAG_pt01 and Coronavirus 229E and Rhinovirus in the nares of RAG_pt07. While RAG_pt07 was asymptomatic at the time of routine nasopharyngeal respiratory panel, the panel was negative for human rhinovirus/enterovirus and positive for Coronavirus, and computed tomography scan showed lung infiltrates. This indicated that clinical testing detected only one of the two respiratory viruses detected by RNA sequencing. Two patients also demonstrated multi-body site viral colonizations. For RAG_pt07, Rhinovirus was highly abundant in the nares and additionally detected in the stool. For RAG_pt08 whose gastrointestinal symptoms reported at the time of sampling were limited to constipation, Cosavirus was highly abundant in stool and also identified in the nares. RNA viruses appear to widely colonize the patient population with RAG deficiency, sometimes with co-colonization of multiple viruses and tropism of the same virus across multiple body sites.

To characterize the genotype of the RNA viruses, we performed comparative genomic analysis on the contigs assembled from the shotgun RNA-seq data (Table S9). The Norovirus strains found in the two patients were both categorized as genotype GII.4[P16].24 Alignment of the VP1 protein sequence25 of the Cosavirus contig recovered from RAG_pt08 with NCBI reference genomes placed this strain as most similar to Cosavirus D (Figure 6B). Similarly, we compared the contig of Coronavirus 229E recovered from RAG_pt07 to all Coronavirus 229E genomes in GenBank by aligning the spike protein sequences (Figure 7A). Sequence alignment of the spike surface protein sequence demonstrated that the strain colonizing RAG_pt07 was most similar to those previously discovered in the United States. However, RAG_pt07’s strain also possessed four novel missense mutations that are not found in any Coronavirus 229E sequences in GenBank, along with seven missense mutations that are shared with only a few of the most closely related strains. To add context for Coronavirus associations with IEI, we extended our analysis to another IEI cohort of DOCK8-deficient patients whose skin microbial communities were evaluated in a previous study.4 Here, we re-processed the skin and nasal samples from 15 DOCK8-deficient patients with a more robust RNA-seq protocol. We detected Coronavirus HKU1 at three body sites of one DOCK8-deficient patient and Coronavirus NL63 in one other patient (Figure S10A). Comparative analysis of the spike protein sequence of Coronavirus HKU1 indicated three novel missense mutations in the strain from the DOCK8-deficient patient (Figure S10B).
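The comparison described above, identifying amino acid changes in a patient-derived spike sequence that are absent from every database sequence, can be sketched as a set difference over substitutions called against a common reference. The code below is a hypothetical illustration, not the authors' workflow: the function names are invented, all sequences are assumed to come from one shared alignment, and the short peptides in the usage note are toy data rather than real spike sequences.

```python
def substitutions(ref_aln, query_aln):
    """Amino acid substitutions of the query relative to the reference.

    Both strings must come from the same alignment (equal length, '-' = gap).
    Each substitution is written like 'K2R': reference residue, 1-based
    reference position, query residue. Gaps are skipped, so insertions and
    deletions are not reported.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("sequences must be aligned to equal length")
    found = set()
    position = 0
    for ref_res, query_res in zip(ref_aln, query_aln):
        if ref_res == "-":
            continue  # insertion relative to the reference: no reference position
        position += 1
        if query_res != "-" and query_res != ref_res:
            found.add(f"{ref_res}{position}{query_res}")
    return found


def novel_substitutions(ref_aln, patient_aln, database_alns):
    """Substitutions in the patient sequence not seen in any database sequence."""
    seen = set()
    for seq in database_alns:
        seen |= substitutions(ref_aln, seq)
    return substitutions(ref_aln, patient_aln) - seen
```

With a toy reference `MKTAYIA`, a patient sequence `MRTAYLA`, and a database containing `MRTAYIA`, the patient carries K2R and I6L, of which only I6L is novel because K2R already occurs in the database.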

Figure 7.


Novel allele of Coronavirus 229E identified and inflammatory response elicited in nares of a patient with RAG deficiency

(A) Amino acid comparison of the coronavirus spike protein from the patient compared with representative Coronavirus 229E sequences from GenBank. Annotation of amino acid changes from the GenBank reference Coronavirus 229E (NP_073551.1) to the patient-specific coronavirus are indicated above the chart with amino acid position defined below the chart. Four novel amino acid changes are underlined.

(B) Gene Ontology (GO) terms significantly enriched among the human transcripts in the nasal RNA-seq data of this patient. Dot size indicates the number of genes detected in each GO category. See also Figures S10 and S11.

Overall, patients with RAG deficiency were colonized by a variety of novel bacteria and viruses, including potential pathogens. Human-derived gene transcripts from the nares of RAG_pt07, colonized by Coronavirus 229E and C. bovis, showed an enrichment of genes associated with Gene Ontology terms for inflammatory response, coronavirus disease, and response to bacterium (Figures 7B and S11). Similarly, human transcripts from the nares of the DOCK8-deficient patient colonized by Coronavirus HKU1 were enriched with genes related to immune response (Figure S10C). In general, DNA and RNA sequencing analyses identified potential pathogens in patients who did not exhibit overt symptoms and who had not been tested for these potential pathogens.

Discussion

The host microbiome contributes to the shaping of, and is itself shaped by, host immunity in a homeostatic balance. However, disruptions in host immunity in immunodeficient patients can affect the characteristic body and skin site-specific microbial community patterns observed in healthy individuals.26,27 Microbiomes of patients with RAG deficiency exhibited substantial inter-individual variation with loss of body site specificity, perhaps reflecting a range of defects in central and peripheral tolerance observed among patients with RAG deficiency.6 While normal skin flora is typically predominated by C. acnes,10,28,29 putative pathogens such as C. bovis and H. influenzae were found at high relative abundances in specific patients with RAG deficiency. Likewise, Candidatus pellibacterium/QFRN01, which is typically prevalent at sebaceous skin sites,10 was also highly abundant at dry skin sites (i.e., Hp, Vf) for RAG_pt06. Additionally, nearly half of the bacterial genomes (MAGs) found to uniquely colonize the skin of patients with RAG deficiency were also present in the gut microbiomes. The alterations in host immunity resulted in increased permissivity of the microbiota that can typically persist on and in human skin, nares, and gut, as evidenced by the more frequent presence of typical gut bacteria (e.g., E. coli) on skin in this patient population, the identification of C. bovis in a patient’s nares, and the chronic gut infections with Norovirus. Extending this work to predict bacterial or viral epitopes of these expansive taxa that are recognized by adaptive immune cells could provide critical insights to host-microbiome interactions mediated by IEI.30 In addition, future in vivo experiments conducted with hypomorphic or immunodeficient mouse models31,32,33 could yield mechanistic understanding of RAG gene-mediated surveillance of microbiota colonization.

Given the high infection risk, patients with RAG deficiency are closely monitored and tested for potential infections. Results from clinical testing for Norovirus gut infections and common respiratory viruses were generally consistent with the DNA and RNA sequencing data analyses. However, in a few instances, sequencing methods detected the asymptomatic presence of Norovirus in the gut of one child, the presence of Cosavirus in an adult with gastrointestinal symptoms, and the presence of Rhinovirus in a patient whose clinical testing was positive for Coronavirus 229E and negative for human Enterovirus/Rhinovirus. Interestingly, sequencing also demonstrated the presence of H. influenzae via MAG recovery and read-mapping in two patients who had been previously vaccinated for H. influenzae type b.

While microbial colonization patterns associated with RAG deficiency were mediated by clinical and immunologic phenotypes, the antibiotic prophylaxis and intensive treatment given to manage infection susceptibility in this immunodeficient population played a critical role as well. It is well established that antibiotics and other therapeutics can modify microbiome diversity and select for persistent antibiotic-resistant strains.20,34,35,36,37,38 Specific forms of antibiotic resistance enriched in the RAG-deficient patients, including tetracycline and cephamycin resistance, have been reported to emerge in longitudinal studies of resistomes of healthy individuals receiving antibiotics.20,36 Persistence of ARGs is of potential clinical concern due to the risk of multi-drug-resistant infections, particularly in a vulnerable patient population with immunodeficiency. Delineating long-term effects on the microbiome from antibiotic receipt versus those associated with IEI, as well as potential interactions, would not be possible in human patients given the potential risks.

The present study uncovered wide colonization of eukaryotic viruses on patients with RAG deficiency, including those associated with severe acute respiratory infection such as Coronavirus 229E and Rhinovirus. In a study of 1,000 participants, Coronavirus was detected in 2%–3% of immunologically normal patients with severe acute respiratory infection and no underlying diseases.39 In addition, coinfection with Coronavirus and Rhinovirus, as found in a patient in our cohort, has been associated in other studies with more severe symptoms than single infections.40,41 Consistent with DOCK8 deficiency and other rare disorders, patients with RAG deficiency also harbored expansions in HPVs, which underscores the importance of immune surveillance in mediating eukaryotic viral colonization.4,42 Overall, IEI patients could be an important new source for identifying RNA and DNA viruses that infect humans, including those with signature mutations or genetic rearrangements, which could serve as important examples of prototype pathogens for pandemic preparedness.15,16 Moreover, considering the novel variations of Coronavirus 229E found in our sample and novel variations of Coronavirus HKU1 found in a previously studied cohort of DOCK8-deficient patients, investigating viral evolution in IEI patients may provide deeper understanding of hosts as potential reservoirs and of how natural immunity is required to fully clear an infection.43

Limitations of the study

Key limitations to this study and others focusing on the microbiome in rare diseases center around small population size. The limited number of patients with RAG deficiency presented challenges for distinguishing microbial diversity patterns associated with RAG subtypes and clinical manifestations of IEI. In addition, while inter-individual microbiome variation may reflect host-specific immune surveillance, such effects are confounded by potential impacts of intensive antibiotic use. Moreover, since many of the putatively novel MAGs associated with RAG-deficient patient skin microbiomes were sourced from children, it remains unclear whether novelty may reflect genome database limitations.

Consortia

Members of NISC Comparative Sequencing Program are Jim Thomas, James Mullikin, Alice Young, Gerry Bouffard, Betty Barnabas, Shelise Brooks, Chloe Buchter, Juyun Crawford, Joel Han, Shi-ling Ho, Richelle Legaspi, Quino Maduro, Holly Marfani, Casandra Montemayor, Karen Schandler, Brian Schmidt, Christina Sison, Mal Stantripop, Sean Black, Mila Dekhtyar, Cathy Masiello, Jenny McDowell, Morgan Park, Pam Thomas, and Meg Vemulapalli.

STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Biological samples

Skin swabs, nasal swabs, and stool samples of patients with RAG deficiency | National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH | N/A

Chemicals, peptides, and recombinant proteins

Yeast Cell Lysis Buffer | Lucigen | Cat# MPY80200
SUPERase⋅In RNase Inhibitor | ThermoFisher | Cat# AM2694
Ready-Lyse Lysozyme Solution | Lucigen | Cat# R1804M
DNA-IQ Bucket | Promega | Cat# V1225
Trizol LS | ThermoFisher | Cat# 10296010
AMPure XP beads | Agencourt | Cat# A63881

Critical commercial assays

RNeasy PowerMicrobiome kits | Qiagen | Cat# 26000-50
Monarch Nucleic Acid Purification kit | NEB | Cat# T2030L
Stranded total RNA prep kit | Illumina | Cat# 20040529
FastSelect -5S/16S/23S kit | Qiagen | Cat# 335927
FastSelect -rRNA H/M/R kit | Qiagen | Cat# 334386

Deposited data

Raw sequencing data for shotgun metagenomics sequencing and RNA-seq | This paper | NCBI Bioproject: PRJNA872116
Code that may be used to reproduce our analyses | This paper | https://github.com/skinmicrobiome/Blaustein_RAG_2023

Software and algorithms

SPAdes v3.15.0 | Bankevich et al.44 | https://github.com/ablab/spades
MetaWRAP v1.2.2 | Uritskiy et al.45 | https://github.com/bxlab/metaWRAP
GUNC v1.0.5 | Orakov et al.46 | https://github.com/grp-bork/gunc
dRep v2.6.2 | Olm et al.47 | https://drep.readthedocs.io/en/latest/overview.html
GTDB-Tk v1.3.0 | Chaumeil et al.48 | https://github.com/Ecogenomics/GTDBTk
FastTree v2.1.10 | Price et al.49 | http://www.microbesonline.org/fasttree/
Ape v5.6 | Paradis et al.50 | https://rdrr.io/cran/ape/
EukCC v2.1.1 | Saary et al.51 | https://github.com/Finn-Lab/EukCC
MASH v2.3 | Ondov et al.52 | https://github.com/marbl/Mash
MUMmer v4.0.0beta2 | Kurtz et al.53 | https://github.com/mummer4/mummer
VirFinder v1.1 | Ren et al.54 | https://github.com/jessieren/VirFinder
Virsorter2 v2.1 | Guo et al.55 | https://github.com/jiarong/VirSorter2
CheckV v0.7.0 | Nayfach et al.23 | https://bitbucket.org/berkeleylab/checkv
CD-HIT v4.6.8 | Fu et al.56 | https://www.bioinformatics.org/cd-hit/
DemoVir | GitHub | https://github.com/feargalr/Demovir
prodigal v2.6.3 | Hyatt et al.57 | https://github.com/hyattpd/Prodigal
hmmer v3.3.2 | Potter et al.58 | http://hmmer.org/
MUSCLE v5.0.1428 | Edgar et al.59 | http://www.drive5.com/muscle/
BWA-MEM v0.7.17 | Li et al.60 | http://bio-bwa.sourceforge.net/
SAMtools v1.15 | Li et al.60 | http://www.htslib.org/
Kraken2 v2.1.2 | Wood et al.8 | https://ccb.jhu.edu/software/kraken2/
HPViewer | Hao et al.61 | https://github.com/yuhanH/HPViewer
ShortBRED v0.9.5 | Kaminski et al.62 | https://github.com/biobakery/shortbred
R v4.1.1 | CRAN | https://www.r-project.org
Bowtie2 v2.4.5 | Langmead et al.63 | http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
BLAST v2.8.0 | Altschul et al.64 | https://blast.ncbi.nlm.nih.gov/Blast.cgi
RagTag v2.1.0 | Alonge et al.65 | https://github.com/malonge/RagTag
Norovirus typing tool v2.0 | Kroneman et al.66 | https://www.rivm.nl/mpf/norovirus/typingtool
ggtree v3.2.1 | Yu et al.67 | https://doi.org/10.18129/B9.bioc.ggtree
Clustal Omega | Madeira et al.68 | https://www.ebi.ac.uk/Tools/msa/clustalo/
STAR v2.7.8a | Dobin et al.69 | https://github.com/alexdobin/STAR
HOMER v4.11.1 | Heinz et al.70 | http://homer.ucsd.edu/homer/

Resource availability

Lead contact

Further information and requests for resources should be directed to Julia A. Segre (jsegre@nhgri.nih.gov).

Materials availability

This study did not generate new unique reagents.

Experimental model and study participant details

Patients with biallelic hypomorphic mutations in the recombination-activating gene 1 (RAG1) or RAG2 (referred to as RAG genes) were recruited to participate in a study approved by the institutional review board of the National Institute of Arthritis and Musculoskeletal and Skin Diseases (ClinicalTrials.gov identifier NCT02471352; https://clinicaltrials.gov/ct2/show/NCT02471352). Patients were initially diagnosed with RAG gene deficiency based on targeted gene sequencing or whole exome sequencing, and the RAG variants were confirmed by Sanger sequencing. Clonality of the T cell receptor (TCR) repertoire was assessed by evaluating the proportion of TCRβ Variable families expressed on the surface of circulating CD4 and CD8 cells as previously described.71 Written informed consent was obtained from all adult patients and parents or guardians of participating children, including, when age appropriate, assent from children.

Method details

Subject sampling

For shotgun metagenomics sequencing, stool, nares, and eight skin sites representing diverse physiological niches were sampled: dry (hypothenar palm, Hp; volar forearm, Vf); moist (antecubital crease, Ac; inguinal crease, Ic; popliteal crease, Pc; plantar heel, Ph); sebaceous (manubrium, Mb; retroauricular crease, Ra). Longitudinal stool samples were collected for three patients who received allogeneic hematopoietic cell transplantation (HCT). Patient metadata are presented in Table S1 and Table S2. Negative control air samples were collected during patient visits and included for processing and sequencing.

Sample processing and sequencing

From clinical samples, DNA isolation and library preparation to generate shotgun metagenomic sequence data were performed as previously described26 with a target of 35 million 2 × 151-bp reads per sample on an Illumina NovaSeq 6000. Sequence preprocessing to trim and remove adapter sequences, low-quality sequences, and human reads (GRCh38) was performed with Cutadapt v.4.0,72 PRINSEQ v.0.20.4,73 and Bowtie2 v.2.4.5.63 In total, for DNA metagenomics sequencing of stool, nares, and skin, we analyzed 86 samples and 2.9 billion (or 432.8 Gb) nonhuman, quality-filtered, paired-end reads (median 12.8 million reads or 1.9 Gb per skin/nares sample; median 93.7 million reads or 14.1 Gb per stool sample). Previously published shotgun metagenomics data from healthy volunteers (HV) and children (HC), including those receiving TMP-SMX antibiotics for 0–4 weeks, were incorporated in our analysis.10,20,21 The HV/HC metagenomes used in the present study as ‘healthy controls’ are available from the respective original studies under NCBI BioProjects PRJNA46333, PRJNA604820, and PRJNA694925 (Table S3).

Total RNA was isolated from 11 stool samples (6 baseline; 5 post-HCT for two patients) and 8 nares swabs (baseline) (Table S3). Stool samples were soaked in liquid N2 and ground into fine powder with an ice-cold pulverizer. RNeasy PowerMicrobiome kits (Qiagen, 26000-50) were used to isolate RNA. Nares swabs were thawed in 2 mL Eppendorf Safe-Lock Biopur tubes (#022600044) on ice with 250 μl of pre-aliquoted Yeast Cell Lysis Buffer (Lucigen, MPY80200) with 5% (v/v) SUPERase⋅In RNase inhibitor (ThermoFisher, AM2694). An additional 100 μl of Yeast Cell Lysis Buffer with RNase inhibitor and Ready-Lyse (Lucigen, R1804M) was added, then tubes were incubated at 60°C with shaking at 1,000 rpm for 15 min. Excess buffer was collected from swabs by spinning down in a DNA-IQ Bucket (Promega, V1225) at 3,000 rpm for 10 s, three times. Sterile 2 × 5 mm stainless-steel beads and 1,200 μl Trizol LS (ThermoFisher, 10296010) were then added to each tube, and, following vortexing and incubation at RT for 5 min, sample tubes were processed with a tissue-lyser shaking at 15 Hz for 1 min, twice. We then added 320 μl chloroform, and tubes were centrifuged at 13,000 x g at 4°C for 20 min. The aqueous phase was carefully transferred to a new 5 mL tube, and 0.8 mL of isopropanol with 1 μl glycogen was added. Tubes were incubated at RT for 10 min and then centrifuged at 12,000 x g at 4°C for 10 min. Pellets were washed twice with 1.6 mL of 75% ethanol at 12,000 x g at 4°C for 5 min, then air-dried at RT for 15 min and resuspended in 50 μl of RNase-free water. Total RNA from both stool and nares samples was processed for residual DNA removal by adding 0.1 volumes of 10X DNase I buffer and 1 μl of rDNase (ThermoFisher, AM1906) and incubating at 37°C for 30 min. Reactions were inactivated by adding 0.1 volumes of DNase inactivation reagent. Tubes were then vortexed, incubated at RT for 2 min, and centrifuged at 10,000 x g for 2 min. The supernatant (RNA) was transferred to a fresh tube and cleaned with the Monarch Nucleic Acid Purification kit (NEB, T2030L). RNA quality was assessed by Qubit and Agilent 2100 Bioanalyzer.

For RNA-seq, libraries were prepared using the stranded total RNA prep kit (Illumina, 20040529) with AMPure XP beads (Agencourt, A63881), following an initial removal of rRNA with the FastSelect -rRNA H/M/R kit (Qiagen, 334386). Unique-dual indexes (Illumina, 20627581) were ligated to the cDNA libraries. Following quality assessment by Qubit and Bioanalyzer, libraries were sent for sequencing. In total, 21 libraries were sequenced and pre-processed as described above, yielding 9.47 billion (or 1430 Gb) nonhuman, quality-filtered, paired-end reads (median 625 million reads or 94 Gb per stool/nares sample).

Metagenome-assembled genome (MAG) recovery

Assembly of metagenomic reads from each sample was performed as previously described.10,74 In brief, metaSPAdes v.3.15.044 was used to assemble reads from each sample, as well as to co-assemble concatenated reads from all skin samples of individual patients. MetaWRAP v.1.2.245 was used for binning (with CONCOCT, MaxBin2, and MetaBAT2) and bin refinement, and GUNC v1.0.546 was used to identify and remove chimeric MAGs. Refined bins with predicted completeness greater than 50% and contamination less than 5% were dereplicated at the species level using dRep v2.6.247 with parameters set for average nucleotide identity (ANI) of 95% and minimum overlap of 30%.
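The quality thresholds described above can be illustrated with a minimal Python sketch. It assumes that completeness and contamination estimates are already available for each bin as percentages; the `Bin` record, its field names, and the example values are hypothetical and are not taken from the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    """Hypothetical per-bin quality record (field names are illustrative)."""
    name: str
    completeness: float   # predicted completeness, percent
    contamination: float  # predicted contamination, percent
    chimeric: bool        # assumed flag from a separate chimerism check


def passes_quality(b, min_completeness=50.0, max_contamination=5.0):
    """Keep non-chimeric bins with completeness > 50% and contamination < 5%."""
    return (
        not b.chimeric
        and b.completeness > min_completeness
        and b.contamination < max_contamination
    )


# Made-up example bins, for illustration only
bins = [
    Bin("bin.1", 92.3, 1.2, False),  # passes
    Bin("bin.2", 48.0, 0.5, False),  # too incomplete
    Bin("bin.3", 75.0, 6.1, False),  # too contaminated
    Bin("bin.4", 88.0, 2.0, True),   # flagged as chimeric
]
kept = [b.name for b in bins if passes_quality(b)]
print(kept)
```

Only bins that pass this filter would move on to species-level dereplication.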

Taxonomic classification of prokaryotic MAGs was performed using the GTDB-Tk v1.3.048 classify workflow using GTDB database release 95. The protein sequence alignments that were generated by GTDB were used to construct a phylogenetic tree via FastTree v.2.1.10.49 Tree visualization was performed in R with the ape package.50

Eukaryotic MAGs were also recovered from the initial set of CONCOCT bins. EukCC v2.1.151 was used for quality assessment, and MAGs with at least 50% completeness and less than 5% contamination were dereplicated using dRep with an ANI threshold of 95%. Taxonomic classification of the dereplicated set was performed by screening against a mash v.2.352 sketch of all GenBank fungal genomes. A species-level match was defined as the best mash hit, confirmed by sharing at least 95% ANI with the reference genome according to MUMmer 4.0.0beta2.53

Viral genome recovery

SPAdes-assembled contigs of at least 5 kb were screened for DNA viruses. Putative viral contigs were identified with VirFinder v.1.154 as sequences detected with an adjusted p < 0.05 and Virsorter2 v.2.155 with options to include only DNA viruses and require a hallmark gene. Downstream processing of viral contigs was performed as previously described.10 In brief, CheckV v0.7.023 was used for quality control, CD-HIT v4.6.856 was used to cluster contigs at a 75% alignment coverage threshold, and DemoVir (https://github.com/feargalr/Demovir) was used for taxonomic classification.
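As a rough illustration of this screening step, the Python sketch below applies a length cutoff and combines calls from two predictors. It assumes that a contig counts as a VirFinder hit when its adjusted p value falls below 0.05, and that a call from either tool is sufficient; the text does not state whether both tools had to agree, so this is exposed as the hypothetical `require_both` parameter. The function name and example values are illustrative only.

```python
def is_candidate_viral_contig(length_bp, virfinder_padj, virsorter2_hit,
                              min_length_bp=5000, alpha=0.05,
                              require_both=False):
    """Return True if a contig passes the length cutoff and predictor calls.

    Assumption: a VirFinder hit means adjusted p < alpha. Whether the two
    tools are combined as a union or an intersection is not stated in the
    text, so `require_both` makes that choice explicit.
    """
    if length_bp < min_length_bp:
        return False
    virfinder_hit = virfinder_padj < alpha
    if require_both:
        return virfinder_hit and virsorter2_hit
    return virfinder_hit or virsorter2_hit


# Made-up contigs: (name, length in bp, VirFinder adjusted p, VirSorter2 call)
contigs = [
    ("contig_1", 12000, 0.001, False),
    ("contig_2", 3000, 0.001, True),   # shorter than 5 kb
    ("contig_3", 8000, 0.40, True),
    ("contig_4", 9000, 0.40, False),   # no support from either tool
]
candidates = [
    name for name, length, padj, vs2 in contigs
    if is_candidate_viral_contig(length, padj, vs2)
]
print(candidates)
```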

Genomes belonging to Papillomaviridae were clustered (CD-HIT) against human papillomaviruses (HPVs) from the SMGC10 and the full set of HPVs downloaded from the PaVe database.75 Papillomaviridae genomes that were found to be unique to the RAG-deficient patients and <10 kb were further processed for phylogenetic characterization based on the L1 gene. Following translation with prodigal v.2.6.3,57 L1 amino acid sequences were extracted from assembled contigs using a hidden Markov model search (hmmer v.3.3.258) against L1 amino acid sequences from PaVe, then aligned with MUSCLE v.5.0.1428.59 Strain-level taxonomic classification for the novelty screen was further performed using the L1 taxonomy tool (https://pave.niaid.nih.gov/#analyze/l1_taxonomy_tool) (Table S8).4 Strains whose L1 shared <80% sequence identity with that of the closest HPV relative were considered novel.

Read mapping of microbial genomes

To determine presence and abundance of microbial species within the metagenomes, first, a genome catalog was created for the comprehensive non-redundant set of prokaryote and eukaryote MAGs and viral contigs (i.e., dereplicated and clustered) assembled from the RAG-deficient patients and HVs/HCs (Table S4), along with all genomes from the SMGC,10 UHGG,13 and ELGG.19 Metagenomic reads from individual samples were mapped to the genome catalog using BWA-MEM v0.7.17.60 SAMtools v1.1576 module ‘samtools flagstat’ was used to determine total mapped and properly-paired reads. Species presence was indicated by genome breadth of coverage of at least 30% for bacterial or fungal species and 75% for viruses. Uniquely mapped reads were determined by filtering total read counts using the ‘samtools view’ module with options ‘-q 1 -f 2’. Relative abundances in each microbiome sample were calculated from the unique read count table.
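The presence and abundance rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: covered bases, genome lengths, and uniquely mapped read counts are taken as already computed, relative abundance is calculated over the species called present (the text does not specify this detail), and all names and numbers are hypothetical.

```python
# Breadth-of-coverage thresholds stated in the text
BREADTH_THRESHOLDS = {"bacteria": 0.30, "fungi": 0.30, "virus": 0.75}


def breadth_of_coverage(covered_bases, genome_length):
    """Fraction of genome positions covered by at least one read."""
    return covered_bases / genome_length


def is_present(kind, covered_bases, genome_length):
    """Call a genome present if breadth meets the threshold for its kind."""
    breadth = breadth_of_coverage(covered_bases, genome_length)
    return breadth >= BREADTH_THRESHOLDS[kind]


def relative_abundances(unique_read_counts):
    """Convert unique read counts into fractions that sum to 1."""
    total = sum(unique_read_counts.values())
    if total == 0:
        return {name: 0.0 for name in unique_read_counts}
    return {name: n / total for name, n in unique_read_counts.items()}


# Made-up genomes: name -> (kind, covered bases, genome length, unique reads)
genomes = {
    "species_A": ("bacteria", 800_000, 2_000_000, 600),  # breadth 0.40
    "species_B": ("bacteria", 100_000, 2_500_000, 50),   # breadth 0.04
    "virus_C": ("virus", 6_000, 7_500, 400),             # breadth 0.80
}
present_counts = {
    name: reads
    for name, (kind, covered, length, reads) in genomes.items()
    if is_present(kind, covered, length)
}
abundances = relative_abundances(present_counts)
print(abundances)
```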

Reference-based microbial taxonomic and functional classification

Taxonomic classification by read-mapping was compared with a standard approach using Kraken2 v.2.1.28 with option ‘--confidence 0.1’ (database downloaded on 11 February 2020). In addition, HPViewer61 was used to classify abundances of HPV strains from the PaVe database based on reads per kilobase per million reads (RPKMs). Antimicrobial resistance gene (ARG) RPKMs were determined with ShortBRED v0.9.562; shortbred_quantify was run with shortbred_identify markers (marker length >30 amino acids) constructed from the Comprehensive Antibiotic Resistance Database v3.0.277 and UniRef9078 as reference.
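For readers unfamiliar with the unit, the generic RPKM definition can be written as a short Python function. This sketch shows only the standard formula (reads per kilobase of feature per million reads); the exact normalization applied internally by HPViewer or ShortBRED may differ, and the function name and example numbers are hypothetical.

```python
def rpkm(mapped_reads, feature_length_bp, total_reads):
    """Reads per kilobase of feature per million total reads.

    Generic definition: mapped_reads / (length in kb * total reads in
    millions), written here as a single expression.
    """
    if feature_length_bp <= 0 or total_reads <= 0:
        raise ValueError("feature length and total reads must be positive")
    return mapped_reads * 1e9 / (feature_length_bp * total_reads)


# Made-up example: 500 reads on a 1,000-bp marker in a 10-million-read sample
example_rpkm = rpkm(500, 1_000, 10_000_000)
print(example_rpkm)
```

Because the value is normalized by both feature length and sequencing depth, it can be compared across genes of different sizes and samples of different depths.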

RNA virus discovery from RNA-seq data

To estimate RNA viral abundances, RNA-seq reads were mapped to all RefSeq viral genomes using Bowtie2 v2.4.5.63 The number of mapped reads was then aggregated at the genus level. SPAdes v.3.15.044 with parameter “--rnaviral” was used to assemble RNA viral contigs, which were compared against the nr/nt database using BLASTn64 and compared to all RefSeq viral genomes using MASH v.2.3.52 Contigs sharing the top significant hit from BLASTn and the closest match from MASH were aligned to the respective RNA virus genome from RefSeq using RagTag v2.1.0 with scaffold option65 and concatenated to obtain a more complete genome. To assess the quality of the concatenated contigs, we used the CheckV ‘end-to-end’ pipeline to calculate predicted completeness and contamination. The same data analysis pipeline was applied to the re-processed samples of DOCK8-deficient patients.
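The genus-level aggregation step can be sketched as follows. The Python example assumes a per-genome table of mapped read counts and a genome-to-genus lookup are already in hand; the genome identifiers, the lookup, and the "unclassified" fallback label are hypothetical and serve only to illustrate the summation.

```python
from collections import Counter


def aggregate_by_genus(reads_per_genome, genus_of):
    """Sum mapped read counts over all reference genomes of the same genus.

    Genomes missing from the lookup are pooled under "unclassified"
    (an assumption made for this sketch).
    """
    totals = Counter()
    for genome, n_reads in reads_per_genome.items():
        totals[genus_of.get(genome, "unclassified")] += n_reads
    return dict(totals)


# Made-up read counts and genus assignments
reads_per_genome = {"genome_1": 120, "genome_2": 30, "genome_3": 45, "genome_4": 7}
genus_of = {
    "genome_1": "Norovirus",
    "genome_2": "Norovirus",
    "genome_3": "Alphacoronavirus",
}
genus_counts = aggregate_by_genus(reads_per_genome, genus_of)
print(genus_counts)
```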

Genotype classification of RNA viruses

To classify the two strains of Norovirus assembled from our samples, we used the Norovirus typing tool (https://www.rivm.nl/mpf/norovirus/typingtool).66 For the Cosavirus and Coronavirus contigs, we extracted the VP1 and spike protein sequences, respectively. We then used Clustal Omega68 with default parameters to obtain the multiple sequence alignment and neighbor-joining phylogenetic tree of the protein sequences from our samples and from GenBank (accessed on May 26, 2022). The phylogenetic tree was visualized using the R package ggtree.67 For the DOCK8-deficient patient with Coronavirus HKU1, the spike protein sequence was derived from a relatively complete Coronavirus contig from oropharynx and compared to the most closely related genomes from GenBank identified by CoVdb79 with identity >90%.

Human transcriptomic analysis

To obtain human transcripts from rRNA-depleted RNA-seq data, we mapped all the reads to hg38 using STAR v2.7.8a with default parameters69 and created tag directories based on the mapping results using makeTagDirectory of HOMER v4.11.1.70 We then quantified gene expression as transcripts per million (TPM) values using the HOMER analyzeRepeats.pl script with parameters “rna -count exons -condenseGenes -tpm”. Principal component analysis was performed based on the TPM values of expressed genes of all RNA samples from this study using the Python package scikit-learn.80 Differentially expressed genes from RAG_pt07 (harboring Coronavirus 229E) were identified as having 4-fold higher expression than the mean expression of the other nasal samples and 4-fold higher expression than the second highest nasal sample. Gene ontology enrichment analysis was performed using Metascape,81 and GO terms with q-value smaller than 0.05 were considered significant. A similar analysis to identify and characterize differentially expressed genes was applied to the nasal sample of the DOCK8-deficient patient with Coronavirus HKU1 by comparing to the other nasal samples of the DOCK8-deficient patients where Coronavirus was not detected.
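The fold-change rule for the single-sample comparison can be expressed as a small Python sketch. It assumes a gene-by-sample table of TPM values, treats "4-fold higher" as greater than or equal to four times the comparison value, and takes the second highest sample to be the highest of the other samples (the target being the highest for any gene that passes). The gene names, comparison sample names, TPM values, and the `min_tpm` expression floor are hypothetical.

```python
from statistics import mean


def flag_genes(tpm, target, fold=4.0, min_tpm=0.0):
    """Flag genes where the target sample is >= `fold` times both the mean
    and the maximum of the other samples.

    `min_tpm` is an assumed floor so that unexpressed genes (TPM of zero
    everywhere) are never flagged.
    """
    flagged = []
    for gene, by_sample in tpm.items():
        target_value = by_sample[target]
        others = [v for s, v in by_sample.items() if s != target]
        if not others or target_value <= min_tpm:
            continue
        if target_value >= fold * mean(others) and target_value >= fold * max(others):
            flagged.append(gene)
    return flagged


# Made-up TPM table: gene -> {sample -> TPM}
tpm = {
    "GENE_A": {"RAG_pt07": 80, "nasal_1": 10, "nasal_2": 5, "nasal_3": 15},
    "GENE_B": {"RAG_pt07": 50, "nasal_1": 20, "nasal_2": 2, "nasal_3": 2},
    "GENE_C": {"RAG_pt07": 5, "nasal_1": 10, "nasal_2": 10, "nasal_3": 10},
    "GENE_D": {"RAG_pt07": 0, "nasal_1": 0, "nasal_2": 0, "nasal_3": 0},
}
flagged = flag_genes(tpm, "RAG_pt07")
print(flagged)
```

GENE_B shows why the second condition matters: it exceeds four times the mean of the other samples but not four times the next highest sample, so a single moderately high comparison sample prevents it from being flagged.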

Quantification & statistical analysis

All statistics and data visualizations were completed in R v.4.1.1. Total metagenomic reads classified by read-mapping with BWA-MEM were compared to those classified by Kraken2 with the Mann-Whitney test. PERMANOVA was applied to evaluate dissimilarity across microbiome taxonomic profiles. To determine microbial taxon and ARG enrichments across skin/nares microbiomes, linear mixed-effects (LME) models with fixed effects for group type (RAG vs. HV) and random effects for body site (e.g., matching sites, such as N, Ac, Ra, Vf for adults) were applied, and p values were adjusted with Benjamini-Hochberg correction. To determine microbial taxon and ARG enrichments across gut microbiomes, the Mann-Whitney test was utilized.
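For reference, the Benjamini-Hochberg adjustment mentioned above can be written out in plain Python. The study's statistics were run in R; this standalone sketch only illustrates the step-up procedure on made-up p values and is not the authors' code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Each p value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downward enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p values from four hypothetical tests
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
print(adj_p)
```

In this example, the second and third tests end up with the same adjusted value because the running minimum carries the smaller scaled value down to the lower-ranked test.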

Acknowledgments

We thank Dr. You Che for discussions on the antimicrobial resistance gene data analysis and interpretation; the other members of the Segre and Kong labs for their underlying contributions; and the patients, their families, and the healthy volunteers and children for participating in these studies and making this research possible. This study utilized the computational resources of the NIH HPC Biowulf Cluster (http://hpc.nih.gov). This work was supported by the Intramural Research Programs of the National Human Genome Research Institute, National Institute of Allergy and Infectious Diseases, National Institute of Arthritis and Musculoskeletal and Skin Diseases, and National Cancer Institute, National Institutes of Health.

Author contributions

L.D.N., H.H.K., and J.A.S. designed the research. M.B., K.N., O.M.D., D.D., J.A.K., S.M.H., J.R.E.B., A.F.F., and L.D.N. characterized the clinical features of the patients. C.J.H., M.E.T., G.B., and H.H.K. supervised the collection of patient samples. S.L.-L. prepared samples for DNA and RNA sequencing. NISC performed quality controls and sequencing. R.A.B. and Z.S. analyzed the data with contributions from S.S.K. R.A.B., Z.S., H.H.K., and J.A.S. wrote the manuscript. All authors approved the final version.

Declaration of interests

The authors declare that they have no competing interests.

Published: September 26, 2023

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.xcrm.2023.101205.

Supplemental information

Document S1. Figures S1–S11 and Tables S1, S5, and S9
Table S2. Patient-specific clinical metadata, related to STAR Methods
Table S3. Metadata for metagenome samples analyzed in this study, related to STAR Methods
Table S4. Dereplicated set of eukaryote and prokaryote MAGs recovered from metagenomes of RAG-deficient patients, related to STAR Methods and Figure S2

MAGs were further dereplicated from the SMGC [S2], UHGG [S3], and ELGG [S4], and the study with the best representative genome is reported in the last column.

Table S6. Relative abundances of microbial species detected in the metagenomes, related to Figures 1 and 2
Table S7. Reads per kilobase per million sequences (RPKMs) of antimicrobial resistance genes (ARGs) detected in the metagenomes, related to Figure 4
Table S8. HPV sequences assembled from metagenomes of RAG-deficient patients, related to Figure 5
Document S2. Article plus supplemental information

Data and code availability

  • Raw reads from shotgun metagenome and metatranscriptome sequencing, along with MAGs recovered from patients with RAG deficiency, are available under NCBI BioProject accession number PRJNA872116.

  • Data and code that may be used to reproduce our analyses are available at https://github.com/skinmicrobiome/Blaustein_RAG_2023.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

References

  • 1. Zheng D., Liwinski T., Elinav E. Interaction between microbiota and immunity in health and disease. Cell Res. 2020;30:492–506. doi: 10.1038/s41422-020-0332-7.
  • 2. Belkaid Y., Hand T.W. Role of microbiota in immunity and inflammation. Cell. 2014;157:121–141. doi: 10.1016/j.cell.2014.03.011.
  • 3. Oh J., Freeman A.F., NISC Comparative Sequencing Program. Park M., Sokolic R., Candotti F., Holland S.M., Segre J.A., Kong H.H. The altered landscape of the human skin microbiome in patients with primary immunodeficiencies. Genome Res. 2013;23:2103–2114. doi: 10.1101/gr.159467.113.
  • 4. Tirosh O., Conlan S., Deming C., Lee-Lin S.Q., Huang X., NISC Comparative Sequencing Program. Su H.C., Freeman A.F., Segre J.A., Kong H.H., et al. Expanded skin virome in DOCK8-deficient patients. Nat. Med. 2018;24:1815–1821. doi: 10.1038/s41591-018-0211-7.
  • 5. Delmonte O.M., Villa A., Notarangelo L.D. Immune dysregulation in patients with RAG deficiency and other forms of combined immune deficiency. Blood. 2020;135:610–619. doi: 10.1182/BLOOD.2019000923.
  • 6. Delmonte O.M., Schuetz C., Notarangelo L.D. RAG Deficiency: Two Genes, Many Diseases. J. Clin. Immunol. 2018;38:646–655.
  • 7. Notarangelo L.D., Kim M.S., Walter J.E., Lee Y.N. Human RAG mutations: Biochemistry and clinical implications. Nat. Rev. Immunol. 2016;16:234–246. doi: 10.1038/nri.2016.28.
  • 8. Wood D.E., Lu J., Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20:257. doi: 10.1186/s13059-019-1891-0.
  • 9. Truong D.T., Franzosa E.A., Tickle T.L., Scholz M., Weingart G., Pasolli E., Tett A., Huttenhower C., Segata N. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods. 2015;12:902–903. doi: 10.1038/nmeth.3589.
  • 10. Saheb Kashaf S., Proctor D.M., Deming C., Saary P., Hölzer M., NISC Comparative Sequencing Program. Taylor M.E., Kong H.H., Segre J.A., Almeida A., Finn R.D. Integrating cultivation and metagenomics for a multi-kingdom view of skin microbiome diversity and functions. Nat. Microbiol. 2022;7:169–179. doi: 10.1038/s41564-021-01011-w.
  • 11. Parks D.H., Rinke C., Chuvochina M., Chaumeil P.A., Woodcroft B.J., Evans P.N., Hugenholtz P., Tyson G.W. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2017;2:1533–1542. doi: 10.1038/s41564-017-0012-7.
  • 12. Hug L.A., Baker B.J., Anantharaman K., Brown C.T., Probst A.J., Castelle C.J., Butterfield C.N., Hernsdorf A.W., Amano Y., Ise K., et al. A new view of the tree of life. Nat. Microbiol. 2016;1. doi: 10.1038/nmicrobiol.2016.48.
  • 13. Almeida A., Nayfach S., Boland M., Strozzi F., Beracochea M., Shi Z.J., Pollard K.S., Sakharova E., Parks D.H., Hugenholtz P., et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 2021;39:105–114. doi: 10.1038/s41587-020-0603-3.
  • 14. Liwinski T., Leshem A., Elinav E. Breakthroughs and Bottlenecks in Microbiome Research. Trends Mol. Med. 2021;27:298–301. doi: 10.1016/j.molmed.2021.01.003.
  • 15. Graham B.S., Sullivan N.J. Emerging viral diseases from a vaccinology perspective: Preparing for the next pandemic. Nat. Immunol. 2018;19:20–28. doi: 10.1038/s41590-017-0007-9.
  • 16. Graham B.S., Corbett K.S. Prototype pathogen approach for pandemic preparedness: World on fire. J. Clin. Invest. 2020;130:3348–3349. doi: 10.1172/JCI139601.
  • 17. Shi M., Lin X.D., Chen X., Tian J.H., Chen L.J., Li K., Wang W., Eden J.S., Shen J.J., Liu L., et al. The evolutionary history of vertebrate RNA viruses. Nature. 2018;561:E6. doi: 10.1038/s41586-018-0310-0.
  • 18. Dvorak C.C., Haddad E., Heimall J., Dunn E., Buckley R.H., Kohn D.B., Cowan M.J., Pai S.Y., Griffith L.M., Cuvelier G.D.E., et al. The diagnosis of severe combined immunodeficiency (SCID): The Primary Immune Deficiency Treatment Consortium (PIDTC) 2022 Definitions. J. Allergy Clin. Immunol. 2023;151:539–546. doi: 10.1016/j.jaci.2022.10.022.
  • 19. Zeng S., Patangia D., Almeida A., Zhou Z., Mu D., Paul Ross R., Stanton C., Wang S. A compendium of 32,277 metagenome-assembled genomes and over 80 million genes from the early-life human gut microbiome. Nat. Commun. 2022;13:5139. doi: 10.1038/s41467-022-32805-z.
  • 20. Jo J.H., Harkins C.P., Schwardt N.H., Portillo J.A., NISC Comparative Sequencing Program. Zimmerman M.D., Carter C.L., Hossen M.A., Peer C.J., Polley E.C., et al. Alterations of human skin microbiome and expansion of antimicrobial resistance after systemic antibiotics. Sci. Transl. Med. 2021;13:eabd8077. doi: 10.1126/scitranslmed.abd8077.
  • 21. Byrd A.L., Deming C., Cassidy S.K.B., Harrison O.J., Ng W.-I., Conlan S., NISC Comparative Sequencing Program. Belkaid Y., Segre J.A., Kong H.H. Staphylococcus aureus and Staphylococcus epidermidis strain diversity underlying pediatric atopic dermatitis. Sci. Transl. Med. 2017;9:eaal4651. doi: 10.1126/scitranslmed.aal4651.
  • 22. Stein G.E., Craig W.A. Tigecycline: A Critical Analysis. Clin. Infect. Dis. 2006;43:518–524. doi: 10.1086/505494.
  • 23. Nayfach S., Camargo A.P., Schulz F., Eloe-Fadrosh E., Roux S., Kyrpides N.C. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat. Biotechnol. 2021;39:578–585. doi: 10.1038/s41587-020-00774-7.
  • 24. Chhabra P., de Graaf M., Parra G.I., Chan M.C.W., Green K., Martella V., Wang Q., White P.A., Katayama K., Vennema H., et al. Updated classification of norovirus genogroups and genotypes. J. Gen. Virol. 2019;100:1393–1406. doi: 10.1099/JGV.0.001318.
  • 25. Kapusinszky B., Phan T.G., Kapoor A., Delwart E. Genetic diversity of the genus cosavirus in the family picornaviridae: A new species, recombination, and 26 new genotypes. PLoS One. 2012;7. doi: 10.1371/journal.pone.0036685.
  • 26. Oh J., Byrd A.L., Deming C., Conlan S., NISC Comparative Sequencing Program. Kong H.H., Segre J.A., Blakesley R., Bouffard G., Brooks S., et al. Biogeography and individuality shape function in the human skin metagenome. Nature. 2014;514:59–64. doi: 10.1038/nature13786.
  • 27. Human Microbiome Project Consortium. Gevers D., Knight R., Abubucker S., Badger J.H., Chinwalla A.T., Creasy H.H., Earl A.M., Fitzgerald M.G., Fulton R.S., et al. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486:207–214. doi: 10.1038/nature11234.
  • 28. Oh J., Byrd A.L., Park M., NISC Comparative Sequencing Program. Kong H.H., Segre J.A. Temporal Stability of the Human Skin Microbiome. Cell. 2016;165:854–866. doi: 10.1016/j.cell.2016.04.008.
  • 29. Byrd A.L., Belkaid Y., Segre J.A. The human skin microbiome. Nat. Rev. Microbiol. 2018;16:143–155. doi: 10.1038/nrmicro.2017.157.
  • 30. Pedersen T.K., Brown E.M., Plichta D.R., Johansen J., Twardus S.W., Delorey T.M., Lau H., Vlamakis H., Moon J.J., Xavier R.J., Graham D.B. The CD4+ T cell response to a commensal-derived epitope transitions from a tolerant to an inflammatory state in Crohn’s disease. Immunity. 2022;55:1909–1923.e6. doi: 10.1016/j.immuni.2022.08.016.
  • 31. Huang X., Hurabielle C., Drummond R.A., Bouladoux N., Desai J.V., Sim C.K., Belkaid Y., Lionakis M.S., Segre J.A. Murine model of colonization with fungal pathogen Candida auris to explore skin tropism, host risk factors and therapeutic strategies. Cell Host Microbe. 2021;29:210–221.e6. doi: 10.1016/j.chom.2020.12.002.
  • 32. Ott de Bruin L.M., Bosticardo M., Barbieri A., Lin S.G., Rowe J.H., Poliani P.L., Ching K., Eriksson D., Landegren N., Kämpe O., et al. Hypomorphic Rag1 mutations alter the preimmune repertoire at early stages of lymphoid development. Blood. 2018;132:281–292. doi: 10.1182/blood-2017-12-820985.
  • 33. Rigoni R., Fontana E., Dobbs K., Marrella V., Taverniti V., Maina V., Facoetti A., D’Amico G., Al-Herz W., Cruz-Munoz M.E., et al. Cutaneous barrier leakage and gut inflammation drive skin disease in Omenn syndrome. J. Allergy Clin. Immunol. 2020;146:1165–1179.e11. doi: 10.1016/j.jaci.2020.04.005.
  • 34. Montassier E., Gastinne T., Vangay P., Al-Ghalith G.A., Bruley Des Varannes S., Massart S., Moreau P., Potel G., de La Cochetière M.F., Batard E., Knights D. Chemotherapy-driven dysbiosis in the intestinal microbiome. Aliment. Pharmacol. Ther. 2015;42:515–528. doi: 10.1111/apt.13302.
  • 35. van Schaik W. The human gut resistome. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2015;370:20140087. doi: 10.1098/rstb.2014.0087.
  • 36. Anthony W.E., Wang B., Sukhum K.V., D’Souza A.W., Hink T., Cass C., Seiler S., Reske K.A., Coon C., Dubberke E.R., et al. Acute and persistent effects of commonly used antibiotics on the gut microbiome and resistome in healthy adults. Cell Rep. 2022;39. doi: 10.1016/j.celrep.2022.110649.
  • 37. Palleja A., Mikkelsen K.H., Forslund S.K., Kashani A., Allin K.H., Nielsen T., Hansen T.H., Liang S., Feng Q., Zhang C., et al. Recovery of gut microbiota of healthy adults following antibiotic exposure. Nat. Microbiol. 2018;3:1255–1265. doi: 10.1038/s41564-018-0257-9.
  • 38. Gasparrini A.J., Wang B., Sun X., Kennedy E.A., Hernandez-Leyva A., Ndao I.M., Tarr P.I., Warner B.B., Dantas G. Persistent metagenomic signatures of early-life hospitalization and antibiotic treatment in the infant gut microbiota and resistome. Nat. Microbiol. 2019;4:2285–2297. doi: 10.1038/s41564-019-0550-2.
  • 39.Zhang S.F., Tuo J.L., Huang X.B., Zhu X., Zhang D.M., Zhou K., Yuan L., Luo H.J., Zheng B.J., Yuen K.Y., et al. Epidemiology characteristics of human coronaviruses in patients with respiratory infection symptoms and phylogenetic analysis of HCoV-OC43 during 2010-2015 in Guangzhou. PLoS One. 2018;13 doi: 10.1371/journal.pone.0191789. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Matsuno A.K., Gagliardi T.B., Paula F.E., Luna L.K.S., Jesus B.L.S., Stein R.T., Aragon D.C., Carlotti A.P.C.P., Arruda E. Human coronavirus alone or in co-infection with rhinovirus C is a risk factor for severe respiratory disease and admission to the pediatric intensive care unit: A one-year study in Southeast Brazil. PLoS One. 2019;14 doi: 10.1371/journal.pone.0217744. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Le Glass E., Hoang V.T., Boschi C., Ninove L., Zandotti C., Boutin A., Bremond V., Dubourg G., Ranque S., Lagier J.-C., et al. 2021. Incidence and Outcome of Coinfections with SARS-CoV-2 and Rhinovirus. Viruses 13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.de Jong S.J., Imahorn E., Itin P., Uitto J., Orth G., Jouanguy E., Casanova J.L., Burger B. Epidermodysplasia verruciformis: Inborn errors of immunity to human beta-papillomaviruses. Front. Microbiol. 2018;9:1222. doi: 10.3389/fmicb.2018.01222. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Nussenblatt V., Roder A.E., Das S., de Wit E., Youn J.-H., Banakis S., Mushegian A., Mederos C., Wang W., Chung M., et al. Yearlong COVID-19 Infection Reveals Within-Host Evolution of SARS-CoV-2 in a Patient With B-Cell Depletion. J. Infect. Dis. 2022;225:1118–1123. doi: 10.1093/infdis/jiab622. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Bankevich A., Nurk S., Antipov D., Gurevich A.A., Dvorkin M., Kulikov A.S., Lesin V.M., Nikolenko S.I., Pham S., Prjibelski A.D., et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012;19:455–477. doi: 10.1089/cmb.2012.0021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Uritskiy G.V., DiRuggiero J., Taylor J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome. 2018;6:158. doi: 10.1186/s40168-018-0541-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Orakov A., Fullam A., Coelho L.P., Khedkar S., Szklarczyk D., Mende D.R., Schmidt T.S.B., Bork P. GUNC: detection of chimerism and contamination in prokaryotic genomes. Genome Biol. 2021;22:178. doi: 10.1186/s13059-021-02393-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Olm M.R., Brown C.T., Brooks B., Banfield J.F. DRep: A tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 2017;11:2864–2868. doi: 10.1038/ismej.2017.126. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Chaumeil P.A., Mussig A.J., Hugenholtz P., Parks D.H. GTDB-Tk: A toolkit to classify genomes with the genome taxonomy database. Bioinformatics. 2020;36:1925–1927. doi: 10.1093/bioinformatics/btz848. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Price M.N., Dehal P.S., Arkin A.P. FastTree 2 - Approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5 doi: 10.1371/journal.pone.0009490. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Paradis E., Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35:526–528. doi: 10.1093/bioinformatics/bty633. [DOI] [PubMed] [Google Scholar]
  • 51.Saary P., Mitchell A.L., Finn R.D. Estimating the quality of eukaryotic genomes recovered from metagenomic analysis with EukCC. Genome Biol. 2020;21:244. doi: 10.1186/s13059-020-02155-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Ondov B.D., Treangen T.J., Melsted P., Mallonee A.B., Bergman N.H., Koren S., Phillippy A.M. Mash: Fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016;17:132. doi: 10.1186/s13059-016-0997-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Kurtz S., Phillippy A., Delcher A.L., Smoot M., Shumway M., Antonescu C., Salzberg S.L. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:R12. doi: 10.1186/gb-2004-5-2-r12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Ren J., Ahlgren N.A., Lu Y.Y., Fuhrman J.A., Sun F. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome. 2017;5:69. doi: 10.1186/s40168-017-0283-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Guo J., Bolduc B., Zayed A.A., Varsani A., Dominguez-huerta G., Delmont T.O., Pratama A.A., Gazitúa M.C., Vik D., Sullivan M.B., Roux S. VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome. 2021;9:37. doi: 10.1186/s40168-020-00990-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Fu L., Niu B., Zhu Z., Wu S., Li W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–3152. doi: 10.1093/bioinformatics/bts565. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Hyatt D., Chen G.L., LoCascio P.F., Land M.L., Larimer F.W., Hauser L.J. Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinf. 2010;11:119. doi: 10.1186/1471-2105-11-119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Potter S.C., Luciani A., Eddy S.R., Park Y., Lopez R., Finn R.D. HMMER web server: 2018 update. Nucleic Acids Res. 2018;46:W200–W204. doi: 10.1093/nar/gky448. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Edgar R.C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Hao Y., Yang L., Galvao Neto A., Amin M.R., Kelly D., Brown S.M., Branski R.C., Pei Z. HPViewer: Sensitive and specific genotyping of human papillomavirus in metagenomic DNA. Bioinformatics. 2018;34:1986–1995. doi: 10.1093/bioinformatics/bty037. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Kaminski J., Gibson M.K., Franzosa E.A., Segata N., Dantas G., Huttenhower C. High-Specificity Targeted Functional Profiling in Microbial Communities with ShortBRED. PLoS Comput. Biol. 2015;11 doi: 10.1371/journal.pcbi.1004557. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2. [DOI] [PubMed] [Google Scholar]
  • 65.Alonge M., Lebeigle L., Kirsche M., Aganezov S., Wang X., Lippman Z.B., Schatz M.C., Soyk S. Automated assembly scaffolding elevates a new tomato system for high-throughput genome editing. bioRxiv. 2021 doi: 10.1101/2021.11.18.469135. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Kroneman A., Vennema H., Deforche K., v d Avoort H., Peñaranda S., Oberste M.S., Vinjé J., Koopmans M. An automated genotyping tool for enteroviruses and noroviruses. J. Clin. Virol. 2011;51:121–125. doi: 10.1016/j.jcv.2011.03.006. [DOI] [PubMed] [Google Scholar]
  • 67.Yu G., Smith D.K., Zhu H., Guan Y., Lam T.T.-Y. ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 2017;8:28–36. doi: 10.1111/2041-210X.12628. [DOI] [Google Scholar]
  • 68.Madeira F., Pearce M., Tivey A.R.N., Basutkar P., Lee J., Edbali O., Madhusoodanan N., Kolesnikov A., Lopez R. Search and sequence analysis tools services from EMBL-EBI in 2022. Nucleic Acids Res. 2022;50:W276–W279. doi: 10.1093/nar/gkac240. gkac240. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Heinz S., Benner C., Spann N., Bertolino E., Lin Y.C., Laslo P., Cheng J.X., Murre C., Singh H., Glass C.K. Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities. Mol. Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Van Der Geest K.S.M., Abdulahad W.H., Horst G., Lorencetti P.G., Bijzet J., Arends S., Van Der Heiden M., Buisman A.M., Kroesen B.J., Brouwer E., Boots A.M.H. Quantifying distribution of flow cytometric TCR-Vβ usage with economic statistics. PLoS One. 2015;10 doi: 10.1371/journal.pone.0125373. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Martin M.B. Cutadapt removes sequences from high-throughput sequencing reads. EMBnet J. 2013;18:1–2. [Google Scholar]
  • 73.Schmieder R., Edwards R. Quality control and preprocessing of metagenomic datasets. Bioinformatics. 2011;27:863–864. doi: 10.1093/bioinformatics/btr026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Saheb Kashaf S., Almeida A., Segre J.A., Finn R.D. Recovering prokaryotic genomes from host-associated, short-read shotgun metagenomic sequencing data. Nat. Protoc. 2021;16:2520–2541. doi: 10.1038/s41596-021-00508-2. [DOI] [PubMed] [Google Scholar]
  • 75.Van Doorslaer K., Li Z., Xirasagar S., Maes P., Kaminsky D., Liou D., Sun Q., Kaur R., Huyen Y., McBride A.A. The Papillomavirus Episteme: A major update to the papillomavirus sequence database. Nucleic Acids Res. 2017;45:D499–D506. doi: 10.1093/nar/gkw879. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Alcock B.P., Raphenya A.R., Lau T.T.Y., Tsang K.K., Bouchard M., Edalatmand A., Huynh W., Nguyen A.L.V., Cheng A.A., Liu S., et al. CARD 2020: Antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 2020;48:D517–D525. doi: 10.1093/nar/gkz935. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Suzek B.E., Wang Y., Huang H., McGarvey P.B., Wu C.H., UniProt Consortium UniRef clusters: A comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics. 2015;31:926–932. doi: 10.1093/bioinformatics/btu739. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Zhu Z., Meng K., Liu G., Meng G. A Database Resource and Online Analysis Tools for Coronaviruses on a Historical and Global Scale. Database 00. 2020 doi: 10.1093/database/baaa070. baaa070. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830. [Google Scholar]
  • 81.Zhou Y., Zhou B., Pache L., Chang M., Khodabakhshi A.H., Tanaseichuk O., Benner C., Chanda S.K. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 2019;10:1523. doi: 10.1038/s41467-019-09234-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Document S1. Figures S1–S11 and Tables S1, S5, and S9
mmc1.pdf (3MB, pdf)
Table S2. Patient-specific clinical metadata, related to STAR Methods
mmc2.xlsx (14KB, xlsx)
Table S3. Metadata for metagenome samples analyzed in this study, related to STAR Methods
mmc3.xlsx (22.6KB, xlsx)
Table S4. Dereplicated set of eukaryote and prokaryote MAGs recovered from metagenomes of RAG-deficient patients, related to STAR Methods and Figure S2

MAGs were further dereplicated from the SMGC [S2], UHGG [S3], and ELGG [S4], and the study with the best representative genome is reported in the last column.

mmc4.xlsx (39.1KB, xlsx)
Table S6. Relative abundances of microbial species detected in the metagenomes, related to Figures 1 and 2
mmc5.xlsx (4.1MB, xlsx)
Table S7. Reads per kilobase per million sequences (RPKMs) of antimicrobial resistance genes (ARGs) detected in the metagenomes, related to Figure 4
mmc6.xlsx (202.1KB, xlsx)
Table S8. HPV sequences assembled from metagenomes of RAG-deficient patients, related to Figure 5
mmc7.xlsx (11KB, xlsx)
Document S2. Article plus supplemental information
mmc8.pdf (8.7MB, pdf)

Data Availability Statement

  • Raw reads from shotgun metagenome and metatranscriptome sequencing, along with MAGs recovered from patients with RAG deficiency, are available under NCBI BioProject accession number PRJNA872116.

  • Data and code that may be used to reproduce our analyses are available at https://github.com/skinmicrobiome/Blaustein_RAG_2023.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.


Articles from Cell Reports Medicine are provided here courtesy of Elsevier
