Abstract
Dysbiosis of skin microbiota drives the progression of atopic dermatitis (AD). The contribution of bacteriophages to bacterial community compositions in normal and inflamed skin is unknown. Using shotgun metagenomics from skin swabs of healthy individuals and patients with AD, we found 13,586 potential viral contiguous DNA sequences, which could be combined into 164 putative viral genomes including 133 putative phages. The Shannon diversity index for the viral metagenome-assembled genomes (vMAGs) did not correlate with AD. In total, we identified 28 vMAGs that differed significantly between normal and AD skin. Quantitative polymerase chain reaction validation of three complete vMAGs revealed their independence from host bacterium abundance. Our data indicate that normal and inflamed skin harbor distinct phageomes and suggest a causative relationship between changing viral and bacterial communities as a driver of skin pathology.
Analysis of skin metagenomes identifies bacteriophages as a yet unidentified key player in atopic dermatitis.
INTRODUCTION
Atopic dermatitis (AD) is a common chronic inflammatory skin disease, thought to be triggered by environmental factors in genetically susceptible individuals (1, 2). The underlying pathomechanisms involve aberrations in T cell–mediated immunity, epidermal barrier function, and changes in the skin microbiota (1). Over the past decade, detailed knowledge about the cutaneous microbiome in healthy (3–7) and diseased skin (8) has been gathered. It is now well established that the microbiome shifts from diverse bacterial communities in healthy skin to Staphylococcus aureus–dominated communities in AD (8–10). Comparatively, little is known about other skin-inhabiting taxonomic kingdoms such as viruses and bacteriophages, fungi, and archaea.
Bacteriophages (phages) are small, ubiquitous viruses that infect bacteria. Phages are thus thought to shape the composition of commensal and pathogenic microbiomes in human organs and barrier surfaces, such as the gut. Biologically, phages are believed to account for 20 to 40% of bacterial lysis (11, 12). Alternatively, in their lysogenic life cycle, they integrate into bacterial genomes, thereby potentially contributing to genetic exchange between bacteria (13). It is estimated that there are 10^31 different phage species infecting microbial communities (14); however, currently, only a small proportion of complete phage genomes is available. A major challenge is the identification of phage DNA from noncultured samples because, unlike bacteria, phages do not have a common shared genomic feature (13, 15). Recent advances in machine learning have, however, made robust phage identification from large-scale metagenomic datasets possible (16, 17). This has enabled first insights into the interplay between phage and bacterial communities, for example, in ocean and soil samples (18–20).
Most knowledge about phages in the context of the human microbiome is based on the analysis of gut phages (13). In the human intestine, 10^8 phages/ml of fecal filtrate were found (21). The human gut phageome exhibits a high fraction of prophages, contains a large proportion of “viral dark matter” that cannot be assigned to any taxa, and exhibits high interindividual variation but seems to be stable over time in the same host. In patients with Crohn’s disease, the distribution of phage families shifts from Microviridae toward Caudovirales (22). Phages also potentially hold great promise in therapeutic settings. Phage therapy was, for instance, used in inflammatory bowel disease (23), via targeting Klebsiella pneumoniae strains in fracture-related infection (24) and via targeting Mycobacterium abscessus to treat cystic fibrosis (25, 26). Another study successfully used phages against a multidrug-resistant Mycobacterium chelonae strain (27).
On human skin, a recent study identified a large set of viral metagenome-assembled genomes (vMAGs) on several body sites including a cluster of jumbo phages on the foot (6). However, little is known about the precise phageome composition and phage identities in normal and diseased human skin. Given that the cutaneous bacterial microbiome undergoes significant changes in AD progression, we hypothesized that phages may undergo similar species shifts in this setting and may thereby contribute to disease pathogenesis. Using shotgun metagenomic sequencing, we characterized the phageome of normal skin from healthy individuals and inflamed skin from patients with AD. We identified a large number of previously unidentified (cutaneous) vMAGs, some of which displayed an inflammation-dependent abundance. Notably, the fraction of certain integrated S. aureus–specific vMAGs is increased within their bacterial hosts, indicating that they provide a survival benefit for infected bacteria. Since the skin is easily accessible for topical therapy, modification of the microbiome in AD may be achieved with phage therapy in the future.
RESULTS
To understand the relationship between bacteriophages and bacterial microbiota in healthy and AD skin, we took swabs from the ventral aspects of the elbows from 7 healthy individuals and 10 patients with AD. Patient demographics are shown in table S1. After human DNA (host) depletion, microbial DNA was isolated and subjected to shotgun sequencing (Fig. 1A). On average, 28 million reads per sample were produced (fig. S1), whereof 47.12% (±17.21%) were not aligned to the human genome and were thus used for further analysis.
Fig. 1. Study workflow and bacterial communities in healthy skin and AD.
(A) Study workflow. (B) Ridge plot displaying the distribution of samples depending on the fraction of covered bacterial genome. x axis is the fraction bacterial genome covered (0% indicates that the bacterial genome was not present in the sample, while 100% indicates that the entire bacterial genome was covered). Each subpanel represents a bacterial genome. y axis is the proportion of patient samples that showed the indicated genome coverage rate. Pink are AD samples, and turquoise are healthy controls. (C) Heatmap of all reads mapped against the top abundant skin bacteria according to Byrd et al. (5) and Staphylococcus strains frequently found in AD (full read table is given in table S6). This plot includes samples from our dataset and the dataset from (8). Samples are ordered according to an arbitrary clinical inflammation score, where red indicates high inflammation and blue indicates no inflammation in patients with AD or healthy controls.
For the purpose of quality control of bioinformatically identified contigs (i.e., continuous DNA stretches created from next-generation sequencing reads), we aligned them against abundant skin bacteria (5) and Staphylococcus strains frequently found on the skin of patients with AD. In concordance with published literature, we observed a genome coverage of more than 80% of Propionibacterium acnes in five of seven healthy controls (Fig. 1B). The abundance of P. acnes in patients with AD was reduced as a result of the overgrowth of S. aureus (9), where P. acnes was not detected or below 5% genome coverage in eight patients with AD and above 75% in two patients. The coverage of the S. aureus genome in patients with AD ranged from 17 to 100%, with 6 of 10 patients having over 80% coverage of the S. aureus genome. Notably, in some patients with AD, we found the presence of Staphylococcus schweitzeri, a staphylococcal strain closely related to S. aureus, which has previously not been associated with AD. There was a strong negative correlation (Pearson ρ = −0.71) between covered P. acnes and S. aureus genomes.
In addition to the patient cohort enrolled at the Medical University of Vienna, we included metagenomic data from National Center for Biotechnology Information (NCBI)’s sequence read archive, specifically datasets from Byrd et al. (8). The latter included 422 skin metagenome specimens from different sites of the body taken from 11 children with AD and 7 healthy children (8). We then mapped metagenomic data from both studies against the bacterial reference genomes (Fig. 1C and table S5). Again, S. aureus was correlating with the extent of inflammation in patients with AD (Fig. 1C).
Next, we set out to identify viral DNA sequences within our shotgun libraries. We applied two machine learning algorithms (16, 17), which compare coding sequences (CDS) found in the contigs from shotgun sequencing to hallmark viral features present in several publicly available viral databases (see Supplementary Materials). We found a total of 13,586 potentially viral contigs distributed over 182 metagenome samples, 9765 of which were identified in our dataset, and 3821 were identified in the publicly available dataset published by Byrd et al. (8). A total of 112 contigs were identified by the DeepVirFinder algorithm (16), 4478 by the VirSorter2 approach (17), and 1996 with both approaches (table S7). There were no numerical differences in viral contigs between healthy controls and patients with AD (6859 contigs were found in patients with AD, and 6727 were found in healthy controls). The analysis of phage contig length per participant revealed that an average of 16.1% of the total contig length originated from a viral contig. There was no difference in viral contig length fraction between healthy controls and patients with AD or between the two datasets (figs. S5 and S6).
After combining highly similar contigs found in individual participants (dereplication), we identified 164 unique vMAGs. Of the 164 vMAGs identified, 11 were complete genomes, i.e., they were flanked by a direct or indirect terminal repeat or had a clear bacteria/phage border (28). We also found 16 high-quality genomes with more than 90% of the phage genome complete (Fig. 2A and table S8). Of the 27 top tiered genomes, eight were classified as provirus. We applied a sequence similarity threshold of 95% to compare the 164 vMAGs found in our study to RefSeq phages (table S9) and two large-scale public datasets, totaling 2.479 million viral contigs (Fig. 2D). We found that 32 vMAGs were already present in the publicly available datasets, of which 11 overlapping vMAGs were complete or high-quality genomes. A total of 132 vMAGs have not been reported before (Fig. 2D).
Fig. 2. Overview of normal skin and AD phageome.
(A) Genome quality of found viral contigs based on covered length on reference genomes of several sources. High-quality genomes have a completeness of more than 90%, medium-quality genomes have a completeness between 80 and 90%, and low-confidence genomes have a completeness smaller than 80% (28). ITR, inverted terminal repeat; DTR, direct terminal repeat. (B) Heatmap representation of sample distances based on sequence similarity of RefSeq phage sequences and skin phages found in this study. For simplicity, we removed RefSeq phages with no overlap to the found vMAGs or less than five partial overlaps within the RefSeq database from the heatmap representation. A full color legend is given in fig. S10. (C) UMAP representation of multidimensional scaling of the sequence similarity–based distances (PCoA) of the vMAG sequences found in this study. Annotation is based on the available match as given in more detail in table S11. PCoA stratified according to study is given in fig. S12. (D) Overlaps between phage contigs found in this study (n = 164), RefSeq phages (n = 4246), a gut phage database (21) (n = 142,809), and IMG/VR v3 (19) (n > 2 million).
To further understand the identity of our sequences, we assessed the sequence similarity between the newly found vMAGs and RefSeq phages (table S9). For this, we calculated the pairwise sequence similarities and represented them as a distance measure (see Methods) to give a visual overview (Fig. 2B). The previously unidentified phage sequences were diverse, and there was no predominant phylum or order of phages or phage hosts. In addition, within NCBI’s RefSeq phages, we observed high sequence diversity within phage phyla and orders (Fig. 2B).
We next performed multidimensional scaling based on sequence similarity distances to arrive at a visualization of the β diversity of our vMAG dataset (Fig. 2C). To annotate the resulting principal coordinates analysis (PCoA) plot, we used the best hit available to us. Our best hit was based on sequence similarity mappings to the RefSeq database, several large databases (Fig. 2D), CheckV results, and CRISPR mapping (table S11). For 21 of the 164 vMAGs, our best hit was Staphylococcus phage, for 11 Streptococcus phages, for 5 Actinomycetota including Corynebacteria phages, for 11 Moraxellaceae, and for 5 Campylobacterota phages. A total of 111 vMAGs did not provide a conclusive mapping to a host bacterium. The vMAGs within the three satellite clusters shown in the Uniform Manifold Approximation and Projection plots in Fig. 2C did not correlate with genome quality. Specifically, we found 12.5% complete or high-quality genomes in satellite clusters and 14.3% complete or high-quality genomes in the main cluster (Fig. 2C). However, we found all prophage vMAGs in the central cluster, and vMAGs in the central cluster showed a 10 times higher average coverage than vMAGs in satellite clusters. Thus, similar to other studies (13, 29), we suspect that vMAGs from lytic phages are present at higher density in the satellite clusters of PCoA plots (Fig. 2C). Analyzing the virus taxonomy of the 164 vMAGs, we observed 133 different Caudovirales, 7 nucleocytoplasmic large DNA viruses, 7 Papillomaviridae (table S11), and 17 viral dark matter metagenomic assemblies that have never been observed before.
We then calculated the relative abundances of phage contigs in the total dataset and plotted the 15 most abundant contigs. This unsupervised visualization already showed shifts in vMAG abundances correlating with AD-induced inflammation (fig. S13). We calculated the Shannon index across all vMAGs in our study to evaluate the α diversity in the sample (Fig. 3A). There was a strong intersample variability in phage composition; however, the Shannon index did not correlate with severity of AD-induced inflammation (Fig. 3A). Thus, unlike bacterial communities, the α diversity of phages was not reduced in patients with AD.
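The α diversity measure used above can be illustrated with a minimal sketch. It assumes the Shannon index is computed as H' = −Σ p_i ln p_i over the relative abundances p_i of vMAGs within one sample; the logarithm base and the example read counts are assumptions made for illustration and are not taken from the study.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over non-zero abundances.

    `counts` holds read counts (or any non-negative abundances) per vMAG
    for a single sample. Natural log is an assumption of this sketch.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical per-vMAG read counts for two samples (illustration only).
even_sample = [25, 25, 25, 25]   # four equally abundant vMAGs
skewed_sample = [97, 1, 1, 1]    # one dominant vMAG

print(shannon_index(even_sample))
print(shannon_index(skewed_sample))
```

A sample dominated by a single vMAG yields a lower index than a sample with evenly distributed vMAGs, which is the property that makes the index usable as an α diversity summary.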
Fig. 3. Phage contig–associated AD-induced inflammation.
(A) Shannon index per sample represented as bar on top of a heatmap of Bonferroni-significant AD-associated contigs (table S12). Red indicates increased, and blue indicates decreased reads per contig. The track on top of the heatmap gives an inflammation score (see Methods), where red indicates strong skin inflammation, pink and turquoise indicate moderate, and brown indicates no inflammation or healthy control. The annotation track below shows dataset source, where green indicates samples from Byrd et al. (8) and blue indicates samples from this study. (B) Read counts given as log CPM (counts per million) for selected phage contigs (vMAG4, vMAG14, and vMAG8). Boxplots are stratified for inflammation score with the color code described in (A). RPKM, reads per kilobase of transcript per million mapped reads. (C) qPCR validation results. Middle panel of each plot (barplot) represents the fraction of samples where we could detect the phage contig (vMAG4, vMAG14, and vMAG8). Top panel in each plot (boxplot) is based on samples where a phage contig was detected. Boxplots give the fraction of bacteria with prophage present in case of vMAG14 or give the proportion of detected phages relative to the suspected host bacterium using the ΔΔCT method. Bottom panel describes the amount of suspected host bacteria per sample for each phage.
When we regressed the read hit data against AD-induced inflammation, 28 phage contigs showed Bonferroni-significant association (Fig. 3A and table S12). Of the Bonferroni-significant vMAGs from the combined analysis, 73% were nominally significant in both studies separately (fig. S14). When we included all association data without correction for multiple testing (P < 0.05; table S12), we found 88% of prophages associated with AD. Similarly, we observed more S. aureus and Staphylococcus epidermidis phages in patients with AD, where 47% of S. aureus phages (excluding prophages) were associated with AD (Fig. 3 and table S12).
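The distinction between Bonferroni-significant and nominally significant associations can be sketched as follows. The sketch assumes one test per contig and the conventional α = 0.05; the contig names and P values are hypothetical and do not come from table S12.

```python
def bonferroni_split(p_values, alpha=0.05):
    """Classify tests as Bonferroni-significant or only nominally significant.

    `p_values` maps a (hypothetical) contig name to its raw P value.
    The Bonferroni threshold is alpha divided by the number of tests.
    """
    threshold = alpha / len(p_values)
    bonferroni = sorted(k for k, p in p_values.items() if p < threshold)
    nominal_only = sorted(
        k for k, p in p_values.items() if threshold <= p < alpha
    )
    return threshold, bonferroni, nominal_only


# Hypothetical raw P values for four contigs (illustration only).
example = {
    "contig_A": 1.0e-6,
    "contig_B": 2.0e-2,
    "contig_C": 4.0e-1,
    "contig_D": 1.0e-3,
}

threshold, bonferroni, nominal_only = bonferroni_split(example)
print(threshold, bonferroni, nominal_only)
```

With four tests, the corrected threshold is 0.05 / 4 = 0.0125, so a raw P value of 0.02 counts as nominally significant but does not survive correction.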
Table 1. Overview of vMAGs selected for qPCR validation.
Completeness and genome quality were estimated by CheckV software. Coef indicates regression coefficient, stderr is standard error, and N is number of samples in regression analysis. P value is the raw P value not corrected for multiple testing. An overview of all regression results is given in table S12.
| ID | Best hit host | Provirus | Completeness | Genome quality | Coef | Stderr | N | P value |
|---|---|---|---|---|---|---|---|---|
| vMAG4 | Human: Oral^a | No | High quality | High quality | −0.44 | 0.10 | 89 | 4.33 × 10^−5 |
| vMAG14 | Staphylococcus | Yes | Complete | High quality | 0.49 | 0.11 | 89 | 7.00 × 10^−5 |
| vMAG8 | Campylobacteraceae | No | High quality | High quality | 0.46 | 0.10 | 89 | 1.68 × 10^−5 |
| vMAG6 | Erysipelotrichaceae | Yes | Medium quality | Genome fragment | −0.73 | 0.13 | 89 | 8.73 × 10^−8 |
| vMAG49 | Human: Gut^a | Yes | Medium quality | Genome fragment | −0.35 | 0.09 | 89 | 2.75 × 10^−4 |
| vMAG33 | Propionibacteriales | No | High quality | High quality | −0.67 | 0.17 | 89 | 1.49 × 10^−4 |
| vMAG12 | Staphylococcus | Yes | Complete | High quality | 0.06 | 0.14 | 89 | 6.58 × 10^−1 |
| vMAG16 | Staphylococcus | Yes | Complete | High quality | 0.44 | 0.13 | 89 | 3.08 × 10^−4 |
| vMAG25 | Staphylococcus | Yes | Complete | High quality | 0.47 | 0.12 | 89 | 1.79 × 10^−4 |
^a Host bacteria found on humans.
In the next step, we aimed to corroborate the presence of vMAGs in skin swabs identified by NGS using quantitative polymerase chain reaction (qPCR). On the basis of P value and genome completeness, a total of 12 vMAGs (Table 1, table S13, and fig. S16) were selected for qPCR validation (Fig. 3B and fig. S15) in at least 10 clinical samples each (see barplots in Fig. 3C and fig. S16). qPCR results from vMAGs were normalized separately in each sample against total bacteria. This was done using qPCR values from a 16S ribosomal RNA primer to control for variation in input DNA amounts and primers to quantify suspected host bacteria. For example, phage genome vMAG4 was first normalized to 16S ribosomal RNA, followed by normalization against S. epidermidis (ΔΔCT method; see Supplementary Materials). Using this approach, we could not only replicate phage abundance patterns from the NGS dataset but also provide unbiased abundance measures of the previously unidentified vMAGs in the analyzed samples. This approach thus allows robust decoupling of absolute phage abundance from abundance of their respective host bacteria (Fig. 3C).
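The two-step normalization described above can be sketched in a few lines. This is a simplified illustration, not the analysis code of the study: it assumes 100% amplification efficiency (a doubling per cycle) for all primer pairs, and the function name and Ct values are hypothetical.

```python
def ddct_phage_vs_host(ct_phage, ct_host, ct_16s):
    """Relative quantities from Ct values, assuming a doubling per cycle.

    Step 1: normalize phage and host Ct values to the 16S rRNA Ct
            (controls for the amount of bacterial input DNA).
    Step 2: subtract the normalized host value from the normalized
            phage value (delta-delta Ct) to express phage per host.
    """
    delta_phage = ct_phage - ct_16s
    delta_host = ct_host - ct_16s
    ddct = delta_phage - delta_host
    return {
        "phage_per_total_bacteria": 2.0 ** (-delta_phage),
        "host_per_total_bacteria": 2.0 ** (-delta_host),
        "phage_per_host": 2.0 ** (-ddct),
    }


# Hypothetical Ct values for one sample (illustration only).
result = ddct_phage_vs_host(ct_phage=25.0, ct_host=22.0, ct_16s=18.0)
print(result)
```

Within a single sample, the 16S term cancels in the phage-per-host ratio; the intermediate values normalized to 16S remain useful on their own because they describe phage and host abundance relative to total bacteria, which is what allows the two quantities to be inspected separately.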
Five of the nine evaluated vMAGs showed a changed direction of effect when normalized to suspected host bacterium. The other four showed a consistent direction of effect after normalization against their host bacterium, and three of these showed an abundance pattern correlating to inflammation severity score (see boxplots in Fig. 3C and fig. S16). We think that this high fraction of vMAGs with a changed direction in qPCR validation is due to normalization against their host bacterium. For quantitative analysis of the phageome using shotgun sequencing data, we corrected for differences in total read number. However, analyzing these large-scale datasets, we did not see a feasible way to correct candidate vMAGs for their host bacterium, and we think that this has to be curated carefully and performed individually for disease-associated vMAGs. Thus, this result also highlights the need for careful validation of phage read mapping analysis.
For vMAGs with consistent direction of effect in both datasets NGS and qPCR where abundance pattern correlated to severity of inflammation, we performed an in-depth annotation of their metagenomic assembled genomes (Fig. 4). vMAG4 numbers, for instance, decreased with increasing inflammation level in AD, while its host bacterium, S. epidermidis, increased in abundance (Fig. 3). vMAG4 was rated high quality by CheckV with a hidden Markov model (HMM)–based estimated completeness of 90% (table S8). This 123,818–nucleotide (nt)–long stretch of DNA contained 141 protein CDS (Fig. 4). Of the 141 CDS, 13 could be mapped to a cluster of orthologous groups according to EggNOG database (30). Translated amino acids were compared to NCBI’s nonredundant protein database using BLAST (Basic Local Alignment Search Tool) software. We found nine CDS with high similarity (at least 60-nt-long match and significant E value) to bacterial proteins and six with high similarity to phage proteins. Less conservative HMM search provided significant sequence similarity E values for 106 CDS of the 141 predicted CDS when compared to several phage databases (Fig. 4, HMM sequence similarity track). We found several known phage-associated genes such as phage integrases, phage lysis module proteins, and phage minor tail protein L. However, 106 CDS in vMAG4 contained the phrase “hypothetical” or “unknown function” in their free text description (see table S15 for full list).
Fig. 4. Candidate vMAGs that associate with AD.
Each circos plot represents genome properties of one candidate phage genome. Plots are organized in circular tracks. Outermost tracks give the position and length of putative phage genomes. Next track gives protein CDS as squares, which are colored according to their cluster of orthologous groups. Green color indicates no match to EggNOG database, brown indicates a protein of unknown function, and pink indicates proteins involved in cell wall and membrane biogenesis (see legend for further classification). This track also indicates whether CDS was predicted on forward or reverse strand. Third track shows guanine-cytosine (GC) content of individual CDS (given in more detail in tables S15 to S17). Next track (pink and green vertical lines) reflects a P value from BLAST sequence comparison. The higher the bar, the more similarity was present. No bars at any given CDS position indicate that no match with a sequence length above 60 nt was found. Green indicates matches in phage databases, and pink indicates matches in bacteria databases. Next track (colors scaled from yellow to blue) indicates sequence similarity based on HMM models to phage databases, where yellow indicates no match or low similarity and blue indicates high similarity. Innermost track (filled line graph) reflects coverage in single-base resolution at the given genomic position. Pink indicates coverage in patients with AD, and brown indicates coverage in healthy controls. Additional track inside of coverage track in vMAG14 gives P values from sequence comparison to Newman phage (31).
vMAG14 is an S. aureus prophage. The fraction of S. aureus carrying this prophage increased with inflammation severity in patients with AD, suggesting a growth advantage of the bacterium carrying this prophage (Fig. 3). The vMAG14 genome is 45,409 nt long after trimming bacterial DNA from the initially identified contig. The putative phage genome was identified as a complete prophage genome with high confidence and a high overlap to Newman phages (Fig. 4, vMAG14, innermost track) (31). The putative genome consists of 65 protein CDS, of which 30 were classified as proteins with unknown function in the EggNOG database. Within this putative genome, we found not only many structural phage proteins such as tail and head proteins but also proteins involved in phage DNA packaging and replication (table S16). Of note is that 72% of CDS showed a high sequence similarity when compared to the bacterial gene collection (Fig. 4 and table S16).
Although the genome of vMAG8 was scored as high quality (table S11), host mapping was not conclusive, suggesting that this is a yet undescribed phage. Our best hit was Campylobacteria as a host. According to our qPCR analysis, the amount of Campylobacteria remained at a similarly low level in uninflamed and inflamed skin. vMAG8 phage amounts, however, increased with inflammation severity. The putative genome is 205,548 nt long, and the Prodigal tool could predict 199 protein CDS (Fig. 4 and table S17). A total of 183 CDS could be classified into clusters of orthologous groups with EggNOG mapper. Fifty-three protein sequences were classified as unknown function. Among better described database entries, we found not only phage proteins, such as holin protein, endolysin, transcriptional regulators, and prophage lysogenic conversion modules, but also structural proteins such as phage tail proteins (table S17). Similar to vMAG14, a high percentage (47.7%) of CDS could also be mapped to bacterial genes.
DISCUSSION
While the phageome has been linked to certain chronic inflammatory diseases in humans, primarily in the gut, little is known about bacteriophages in normal and inflamed skin. Our study is the first culture-free attempt to systematically map phage abundances in a chronic skin disease, namely, AD. Although we observed high interindividual variability, it was possible to identify robust marker phages that presumably provide a growth advantage for S. aureus, a bacterium intimately linked to AD pathogenesis and progression. On the other hand, the abundance of certain phages decreases with increasing severity of inflammation in patients with AD, which may indicate a skin protective function of these prokaryotic viruses. These findings underpin the central role of phages in bacterial communities (12, 32) and, thus, hold potential promise for extended experimental phage therapies, specifically for S. aureus and its multidrug-resistant variants.
Our study also showcases the interplay between the microbiome and the phage community in AD. One might expect that the dominating bacterial species (i.e., S. aureus in AD) correlates with increased abundance of predatory phages, which seems to be the case for some habitats (32, 33) including self-limiting cholera outbreaks in Bangladesh (34). However, this dynamic seems to be out of balance in AD. Our observation rather supports the theory that phages can switch from lytic to temperate lifestyle, if their host bacterium is dominating the environment (35). This notion is supported by our observation of 88% of previously unidentified prophages being associated with inflammation in AD.
Our qPCR validation shows the importance of careful validation of read hit data from phage metagenomic analysis. In five of the nine evaluated vMAGs, we observed a change in direction of effect of their AD association. Reasons for this changed direction of effect may be not only the lower detection limit of qPCR but also the fact that many phages might simply reflect abundance of their bacterial host. This is especially important when interpreting the biological effect of a phage. For example, vMAG25 (fig. S15) showed increased abundance with increasing inflammation in NGS data; however, the opposite effect was observed when normalized against S. aureus showing a similar expression pattern as vMAG4.
Despite the additional caution that must be taken when evaluating abundance patterns of phages, our data also encourage further search for suitable phage therapy candidates in skin diseases because, unlike the reduction of α diversity (Shannon index; Fig. 3A) in the altered microbiome in AD (36, 37), we find that the phage α diversity in our AD samples was not reduced. Thus, we think that this adds a new layer to therapeutic research efforts for AD (36).
In our study, we find two phage vMAG types that may represent main drivers of cutaneous bacterial communities. First, integrated prophages (vMAG14) seem to increase the fitness of S. aureus and other pathogenic bacterial strains in AD. There are several ways in which prophages have been shown to increase the fitness of their host bacterium. They might protect against infection with other phages (11, 37, 38), transfer antibiotic resistance, especially Newman prophages (31, 39, 40), or transfer auxiliary metabolic genes (41), all of which may play a role in the progression of AD. Comparison of S. aureus–specific reads between patients with AD and clinical S. aureus isolates corroborates this notion (fig. S17). Second, dysregulated lytic phages may prey on dominant bacterial strains. If these phages show a decreased abundance in AD microbiomes such as vMAG4, then this could make them suitable candidates for phage therapy because they could be complemented via topical application to exploit their natural potential to remove pathogenic bacteria.
Our study also highlights a hallmark challenge in this field: ambiguity of phage gene annotations (Fig. 4). Because of the ability of prophages to integrate in bacterial genomes, many genes and genome stretches are annotated as bacterial genes and phage genes alike; combined with the limited knowledge about phage gene function, it makes robust identification and evaluation of phage genomes difficult. This genetic mosaicism, which leads to abrupt transitions between highly similar regions and seemingly unrelated regions in phage genomes, is currently one of the main limitations in microbiome and phage studies, including ours. More studies are needed to untangle stable phage-acquired genetic elements from transient phage DNA in bacterial genomes. We would like to acknowledge that the study presented does not consider RNA phages, which are now beginning to have large-scale reference sets available (42). This exclusion is due to the limitations in wet laboratory techniques, where simultaneously assessing both the bacterial and viral kingdoms becomes more challenging.
While our study provides first insights into the phage bacteria dysbiosis in AD, a combination of culture-free observational studies with phage in vitro culturing studies across different skin diseases is necessary to speed up a well-informed discovery of candidates for phage therapy and to increase understanding of phage genomics and genome evolution.
MATERIALS AND METHODS
Participant enrolment and ethics
This study was conducted in accordance with the Helsinki Declaration. Ethics approval was granted by the Ethics Committee of the Medical University of Vienna (EK-Nr. 2275/2019). Healthy participants and patients with atopic dermatitis (AD) between the age of 18 and 60 years who were attending the Department of Dermatology, Medical University of Vienna were invited to participate in the study. Only participants with written informed consent were enrolled in the study. The following participants were excluded: patients who received systemic antibiotic or immunosuppressive therapies within the past 6 months; patients with topical treatment of AD within the past 2 weeks; patients with any underlying chronic disease (except AD); patients with AD with additional inflammatory skin diseases; patients who were participating in another clinical study; and healthy participants with previous inflammatory skin diseases. An inflammation score from 1 to 3 was assigned to patients on the basis of visual inspection of the skin in the antecubital fossa (inner arm). The score was based on the EASI intensity score. The following parameters were scored: skin redness, thickening, scratch marks, and lichenification. Mild intensity was scored with 1, moderate intensity with 2, and severe intensity with 3. The overall mean inflammation score that was assigned to the patient was calculated as follows:
Decimal numbers have been rounded using the round half up method.
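The round half up rule differs from the default rounding in several programming languages, so a short sketch may help. It assumes, for illustration only, that the overall score is the arithmetic mean of the four parameter scores; the formula itself is not reproduced in the text above, and the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def mean_inflammation_score(redness, thickening, scratch_marks, lichenification):
    """Mean of four intensity scores (1 to 3), rounded half up to an integer.

    ASSUMPTION: the arithmetic mean of the four parameters is used here
    purely for illustration; the original formula is not shown in the text.
    """
    scores = [redness, thickening, scratch_marks, lichenification]
    mean = Decimal(sum(scores)) / Decimal(len(scores))
    # quantize with ROUND_HALF_UP sends x.5 to the next higher integer.
    return int(mean.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Hypothetical parameter scores (illustration only): mean is 2.5.
print(mean_inflammation_score(2, 3, 2, 3))
# Python's built-in round() uses round-half-to-even instead.
print(round(2.5))
```

The `decimal` module is used because the built-in `round` rounds ties to the nearest even integer, which would map a mean of 2.5 to 2 rather than 3.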
Sampling, DNA preparation, and next-generation sequencing
Samples were collected between 2020 and 2022. Two skin swabs per participant were taken from a 5-cm by 5-cm area at the ventral aspect of the elbows, and DNA was isolated using the QIAamp DNA Microbiome Kit (QIAGEN, Germany) with some protocol modifications. Briefly, cotton swabs (Raucotupf, AT) were prewetted in sterile saline (0.9% NaCl; Braun, Germany) before the skin was sampled. Both swabs were swirled in an Eppendorf tube containing 1 ml of PBS and 0.5 ml of eukaryotic cell lysis buffer, which lyses human cells but leaves bacteria and viruses unharmed. Samples were rotated for 2 hours at room temperature, followed by nuclease treatment to deplete human DNA that was released by eukaryotic cell lysis. All further steps were performed according to the manufacturer’s protocol. Isolated DNA was submitted to the Joint Microbiome Facility for library preparation and metagenomic shotgun sequencing. Briefly, extracted DNA was fragmented enzymatically, followed by adapter and index ligation to prepare the library (NEB Next Ultra II DNA Library Prep Kit) for sequencing. Next-generation sequencing (NGS) was performed on a NovaSeq 6000 SP flowcell (1 lane) using a 100-bp paired-end protocol.
NGS data quality control and removal of human DNA sequences
FASTQ files from NGS were quality controlled with the FastQC tool (table S2). Raw paired-end FASTQ files were then trimmed with Trimmomatic-0.35: adapters were removed using default settings, a sliding window of 4:20 was used, and the minimum read length was set to 50 (43). Host sequence contamination was removed with a custom script. Briefly, we used Bowtie2 to align the FASTQ files against the human reference genome (GRCh38). Subsequently, we used SAMtools to split the reads into those that could not be mapped to the human genome and those that matched the human reference genome and/or mouse genome (host sequence depletion). We used reads that could not be mapped to the human or mouse genome for all further analysis. On average, 47.12% (±17.21%) of total sequenced reads did not align to the human genome and could be used for analysis. The mean total library size of analyzed skin swabs was 28.8 million reads per sample (fig. S1). An overview of the NGS quality control (QC) process including FastQC output is given in table S2.
Inclusion of NGS data from BioProject PRJNA46333
We downloaded 2132 FASTQ files from NCBI using the SRA Toolkit. The files contained paired-end reads from BioProject PRJNA46333, which comprises skin metagenomes from 11 patients with AD and 7 healthy controls described in greater detail in Byrd et al. (44). For each participant in that study, metagenomic samples from various skin sites were available. We reanalyzed a total of 422 skin metagenome samples. Where necessary, we combined multiple FASTQ files per sample into one file per sample before running metaSPAdes software (45) as described in the “Assembly of contigs” section. Details on QC of downloaded FASTQ files and contigs are given in tables S2 and S3. All bioinformatic analyses for samples generated by us and samples from Byrd et al. were done in parallel with identical analysis settings.
Assembly of contigs
We used metaSPAdes (45) with default settings to assemble reads into contigs. This software constructs a de Bruijn graph of all reads within one sample and then attempts to simplify this graph into long genomic fragments, termed contigs. We kept contigs with a minimum length of 2 kb for further analysis. Contigs assembled with metaSPAdes served as input for the binning procedure, which followed a protocol by Saheb Kashaf et al. (46), and for the phage discovery analysis (i.e., creation of the AD phage database). Integrity of contigs was assessed with MetaQUAST software (47). An overview is given in table S4, where we stratified the contigs by length, provided the total length of contigs per sample, and reported N50 values (the shortest contig length that needs to be included to cover 50% of the total length of contigs per sample) as well as the number of N (unknown bases) per 100 kb.
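As an illustration of the N50 definition given above, the following minimal Python sketch computes N50 from a list of contig lengths. It is not the MetaQUAST implementation; the function name is hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bp).

    Contigs are sorted from longest to shortest and summed until the running
    total reaches half of the overall assembly length; the length of the
    contig at which this happens is the N50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length
    raise ValueError("N50 is undefined for an empty list of contigs")
```

For contigs of 8, 6, 4, and 2 kb (20 kb in total), the two longest contigs already cover more than 10 kb, so the N50 is 6 kb.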
Analysis of bacterial communities
We mapped the contigs against bacterial reference genomes. For this, we used the top 10 most abundant skin bacteria as reported by Byrd et al. (48). For this analysis, we applied a two-stage binning strategy as implemented in the metaWRAP pipeline (49). In this process, we first binned the contigs using three independent software programs: bins were created with CONCOCT (50), MaxBin (51), and MetaBAT (52). Then, the bin predictions were refined by combining all binning results and incorporating CheckM (53) results of the bins to generate one dereplicated set of high-confidence bins per sample. Final QC of high-confidence bins was done with CheckM, after which we compared the bins to published RefSeq genomes using Mash (54).
For simplicity, we restricted the samples shown in Fig. 1C to patients with AD with inflammation for the dataset generated in this study and to samples with at least 100 contigs for the PRJNA46333 dataset. This restriction reduced the sample count to 265 samples for this analysis. Read mapping was performed as described in the “Read hit count and normalization” section. Reads were then transformed to fragments per million and visualized as a heatmap (Fig. 1C).
Read hit count and normalization
We used Bowtie2 software with the “very sensitive” setting, retrieving the top 10 mappings to our vMAG database or to selected bacterial genomes (Fig. 1, B and C). Reads mapping to multiple genomes or contigs were discarded from quantitative analysis. Mapped reads were counted and averaged with the “jgi_summarize_bam_depth” function as implemented in the MetaBAT package (52). This provides the sum of exactly aligned bases, which is then divided by the contig length. We used this value to calculate a CPM-like value. CPM in this context is defined as mapped bases normalized to contig length times $10^{6}$ divided by the total number of mapped bases per sample.
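The CPM-like normalization can be sketched as follows. This is a minimal Python illustration of the definition given in the text, not the code used in the study; the function and argument names are hypothetical, and the input is assumed to be the per-contig sum of aligned bases.

```python
def cpm_like(aligned_bases, contig_lengths):
    """Compute a CPM-like value per contig for one sample.

    aligned_bases  : dict mapping contig name to the sum of aligned bases
    contig_lengths : dict mapping contig name to contig length in bp

    Value = (aligned bases / contig length) * 1e6 / total aligned bases.
    """
    total_bases = sum(aligned_bases.values())
    if total_bases == 0:
        # No mapped bases in this sample: report zero for every contig.
        return {name: 0.0 for name in aligned_bases}
    return {
        name: (bases / contig_lengths[name]) * 1e6 / total_bases
        for name, bases in aligned_bases.items()
    }
```

With 3000 bases aligned to a 3-kb contig and 1000 bases aligned to a 2-kb contig, the two contigs receive values of 250 and 125, respectively, so the length normalization halves the weight of the less densely covered contig.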
Identification of viral contigs
We used VirSorter2 and DeepVirFinder to identify contigs that potentially contain viral sequences (55, 56). For both approaches, we used contigs of at least 1500 bp in length that were assembled with metaSPAdes. VirSorter2 identifies viral coding sequences (CDSs) in input DNA sequences with Prodigal software (57). The predicted CDSs are annotated with HMMER3, a sequence comparison tool based on hidden Markov models, against the Pfam database (58) and a database developed by the VirSorter developers, who extracted viral hallmark features, such as viral structural genes and genes regulating viral gene expression, to create a feature table and trained their random forest classifier on high-quality NCBI RefSeq viral genomes. Unknown sequences are classified on the basis of this pretrained model. We used all sequences identified with this classifier as input for our viral contig QC. DeepVirFinder works similarly but uses slightly different databases and runs its classification workflow with a neural network instead of a random forest classifier. DeepVirFinder returns a posterior probability of viral origin for every contig; we applied a cutoff of 0.95.
Dereplication
Contigs containing viral sequences as identified by VirSorter2 and DeepVirFinder in each individual sample were combined for dereplication as implemented in dRep software (59). Briefly, a pairwise sequence comparison was performed to produce a cluster dendrogram based on sequence similarity of all viral contigs across all samples. vMAGs were grouped on the basis of an average nucleotide identity (ANI) cutoff of 95%. Last, one representative vMAG from each cluster was taken forward to further analysis.
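The idea of dereplication can be sketched in a few lines of Python. This is a simplified illustration and not the dRep algorithm: it assumes that pairwise ANI values are already available, groups contigs by single linkage at the 95% cutoff, and picks the longest contig of each cluster as representative. The linkage method and the choice of representative are assumptions made for this sketch, and all names are hypothetical.

```python
def dereplicate(contig_lengths, pairwise_ani, cutoff=0.95):
    """Cluster contigs by ANI and return one representative per cluster.

    contig_lengths : dict mapping contig name to length in bp
    pairwise_ani   : dict mapping (name_a, name_b) to ANI in the range 0-1
    cutoff         : contigs linked by ANI >= cutoff share a cluster
    """
    parent = {name: name for name in contig_lengths}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Single linkage: any pair above the cutoff merges two clusters.
    for (name_a, name_b), ani in pairwise_ani.items():
        if ani >= cutoff:
            parent[find(name_a)] = find(name_b)

    clusters = {}
    for name in contig_lengths:
        clusters.setdefault(find(name), []).append(name)

    # Assumption: the longest member represents the cluster (ties by name).
    return sorted(
        max(members, key=lambda n: (contig_lengths[n], n))
        for members in clusters.values()
    )
```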
QC of vMAGs
The dereplicated viral contigs (“skin phage database”) were evaluated for quality and completeness using CheckV software (60). To determine completeness and quality for every contig, CDSs are compared to viral and bacterial databases and then classified as either microbial, viral, or not annotated. In combination with GC content analysis, this allows CheckV to propose boundaries between host and viral DNA in the analyzed contigs. Once nonviral regions are removed, completeness of genomes is estimated. First, closed genomes are identified by direct terminal repeats (DTR), inverted terminal repeats (ITR), or a provirus that is suggested via a sequence of host-viral-host genes. Next, to estimate the expected genome length, the sequence of each contig is compared to several virus databases. Database genomes with ANI-based sequence similarities of 95% are considered as references when calculating the completeness of the analyzed contig. If a contig has not been seen before in the NCBI GenBank virus database or the CheckV database, a gene-based method is applied to estimate its completeness: the length of the viral genomes with the highest gene-wise HMM sequence comparison match is used as reference. This process yields a five-tier model: complete genomes and high-quality genomes have a completeness of at least 90%, medium-quality genomes have a completeness between 80 and 90%, and low-confidence genomes have a completeness smaller than 80%. If no mapping to the RefSeq or CheckV database can be made, then vMAGs are classified as not determined. In addition, we annotated viral genes using the DRAM pipeline (61), which compares viral CDSs against Pfam (58), VOGDB (62), the UniProt database, and viral RefSeq genomes.
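The five-tier assignment can be written as a small decision function. The Python sketch below encodes the completeness cutoffs exactly as stated in this section and keeps them as parameters, so that they can be aligned with the tier definitions of the CheckV version actually used. It is an illustration only and does not call CheckV; the function name and arguments are hypothetical.

```python
def quality_tier(completeness, is_closed=False, high_cutoff=90.0, medium_cutoff=80.0):
    """Assign a quality tier from estimated completeness (in percent).

    completeness : float, or None if no completeness estimate was possible
    is_closed    : True if DTR, ITR, or host-viral-host (provirus) evidence
                   indicates a closed genome
    Cutoffs default to the values stated in the text (assumption: boundary
    values belong to the higher tier).
    """
    if is_closed:
        return "complete"
    if completeness is None:
        return "not determined"
    if completeness >= high_cutoff:
        return "high quality"
    if completeness >= medium_cutoff:
        return "medium quality"
    return "low confidence"
```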
Host assignment via CRISPR spacer
We downloaded all NCBI RefSeq bacterial genomes available on 6 July 2021. These genomes were used to predict CRISPR spacer sequences using CRISPRCasFinder-2.0.3 (63). Spacer sequences present in CRISPR arrays with an evidence level of 3 or 4 were used to map phage sequences to bacterial hosts. We screened for spacers in our viral contigs by blasting the viral contigs against our custom-made BLAST database of the identified RefSeq bacterial spacers. A bacterial host was only assigned if a section of the viral contig matched the spacer sequence extracted from RefSeq bacteria with 100% identity across the entire length of the spacer.
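The host assignment criterion (a perfect match over the full spacer length) can be illustrated with a plain string search on both strands. The Python sketch below replaces the BLAST search used in the study with exact substring matching, which is equivalent only for this strict criterion; the function names and the input layout are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def assign_hosts(contig_sequence, spacers):
    """Return the set of hosts whose spacers match the contig perfectly.

    contig_sequence : viral contig as a DNA string
    spacers         : iterable of (spacer_sequence, host_name) pairs
    A host is assigned only if the complete spacer occurs in the contig
    without mismatch, on either strand.
    """
    contig = contig_sequence.upper()
    hosts = set()
    for spacer, host in spacers:
        spacer = spacer.upper()
        if spacer in contig or reverse_complement(spacer) in contig:
            hosts.add(host)
    return hosts
```

Real spacers are typically a few dozen nucleotides long; the short sequences in a toy example only serve to show the matching logic.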
MASH distance matrix overlap to existing databases
We used MASH software (64) to generate distance measures between viral contigs, RefSeq viral genomes, and two large-scale phage databases. To this end, we retrieved all complete NCBI viral genomes that had bacteria listed as their host. Furthermore, we downloaded the GutPhage database described by Camarillo-Guerrero et al. (65) and the complete IMG/VR v3, a collection of uncultivated viral genomes across all earth habitats (66). We generated MASH sketch tables from the data and ran mash screen with the “winner takes all” flag to reduce redundancy. This produces identity values indicating how much of the query sequence (AD phage DB) is shared with a specific reference DB (here, RefSeq, GutPhages, and IMG/VR v3). Identities above 95% were interpreted as an overlap in the upset plot (Fig. 2D). For distance estimation, we restricted our analysis to RefSeq phages and skin phages (Fig. 2, B and C). We used the mashtree convenience function to obtain a matrix of sample distances. These sample distances are based on DNA sequence similarity. Simplified, for two DNA sequences, the Jaccard distance estimation is based on the fraction of intersecting genomic sequences compared to the total sequence length. We excluded RefSeq genomes that had no overlapping sequence with any skin phage found in this study and plotted the resulting matrix as a heatmap. We used the same matrix of distance measures to perform multidimensional scaling. On the basis of the scree plot (fig. S11), we selected the top 60 PCs to generate a UMAP representation of the multidimensional scaling results.
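The Jaccard-based distance can be illustrated on exact k-mer sets. The Python sketch below is a simplification: Mash estimates the Jaccard index from MinHash sketches of canonical k-mers, whereas this example compares all k-mers of one strand directly. The conversion from the Jaccard index j to a distance follows the published Mash distance, D = −(1/k) ln(2j/(1 + j)); the function names are hypothetical.

```python
import math


def kmer_set(sequence, k):
    """Return the set of all k-mers in a DNA sequence (single strand)."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_index(sequence_a, sequence_b, k):
    """Fraction of shared k-mers among all k-mers of both sequences."""
    kmers_a, kmers_b = kmer_set(sequence_a, k), kmer_set(sequence_b, k)
    union = kmers_a | kmers_b
    if not union:
        raise ValueError("both sequences are shorter than k")
    return len(kmers_a & kmers_b) / len(union)


def mash_distance(jaccard, k):
    """Convert a Jaccard index into a Mash-style distance between 0 and 1."""
    if jaccard <= 0:
        return 1.0  # no shared k-mers: maximal distance
    if jaccard >= 1:
        return 0.0  # identical k-mer sets
    return min(1.0, -math.log(2 * jaccard / (1 + jaccard)) / k)
```

Two sequences that share four of five distinct 4-mers have a Jaccard index of 0.8, which corresponds to a small but nonzero distance.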
vMAG read hit count
We removed stretches of bacterial genomes from prophage contigs and then used the phage database as reference. Similar to the analysis of read hits against bacterial genomes (see the “Read hit count and normalization” section), we used Bowtie2 software and calculated CPM values as well as relative abundance values for each sample with at least 50 reads mapped against the phage database (n = 93). Relative abundance was calculated by summing up the CPM values for each sample and dividing every CPM value by this total. We plotted the most abundant phage contigs (fig. S13) and calculated the Shannon-Wiener index, $H' = -\sum_i p_i \ln(p_i)$, where $p_i$ is the proportional abundance of phage contig $i$. Regression analyses were performed on CPM data using MaAsLin 2 software (67). Count data were log-transformed and regressed against inflammation score, in which skin swabs from unaffected or healthy individuals were rated 0. SCORAD inflammation scores from Byrd et al. were binned into a score ranging from 1 to 3: a SCORAD from 1 to 11.5 was assigned an AD inflammation score of 1, a SCORAD above 11.5 and up to 33.5 was assigned 2, and a SCORAD above 33.5 was assigned 3. We performed regression on the full dataset and on the two individual studies. A detailed overview of the regression results is given in table S12. The Bonferroni threshold for multiple testing in this study was 0.05/164 (i.e., $3 \times 10^{-4}$).
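The diversity and binning calculations of this section can be summarized in a short Python sketch. It is an illustration under stated assumptions and not the analysis code of the study: relative abundance is taken as each CPM value divided by the per-sample CPM total, the SCORAD bin boundaries are treated as inclusive on the upper side, and a missing SCORAD stands for a healthy or unaffected swab. All function names are hypothetical.

```python
import math


def relative_abundance(cpm_values):
    """Divide each CPM value by the per-sample CPM total."""
    total = sum(cpm_values.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {name: value / total for name, value in cpm_values.items()}


def shannon_index(cpm_values):
    """Shannon-Wiener index H' = -sum(p_i * ln(p_i)) over contigs with p_i > 0."""
    proportions = relative_abundance(cpm_values).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)


def inflammation_bin(scorad):
    """Map a SCORAD value to the 0-3 inflammation score used for regression.

    None (assumption: healthy or unaffected skin) -> 0,
    up to 11.5 -> 1, above 11.5 and up to 33.5 -> 2, above 33.5 -> 3.
    """
    if scorad is None:
        return 0
    if scorad <= 11.5:
        return 1
    if scorad <= 33.5:
        return 2
    return 3


# Bonferroni threshold for 164 tested vMAGs at a family-wise alpha of 0.05.
BONFERRONI_THRESHOLD = 0.05 / 164
```

A sample dominated by a single phage contig has an index close to 0, whereas two equally abundant contigs give ln(2), approximately 0.69.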
qPCR validation of vMAGs
Many Bonferroni-significant contigs were incomplete or low-quality genome fragments; thus, we relaxed the P value threshold for qPCR marker selection. Candidates for qPCR were selected on the basis of association with inflammation score and contig quality metrics, prioritizing proviruses over complete genomes and complete genomes over high-quality genomes. qPCR primers for normalization to bacterial genomes were, where possible, taken from PubMLST or designed using the Primer3 web tool. Phage contig primers were designed to avoid overlaps with bacterial genome sequences (horizontal gene transfer), or, where possible, we selected phage-specific genes (table S13). qPCR validation was performed on the samples used for shotgun sequencing together with seven additional healthy controls and four additional patients with AD (14 samples each). Five-microliter aliquots of isolated DNA were amplified using the whole-genome amplification (WGA) kit REPLI-g (QIAGEN, Germany) according to the manufacturer’s protocols, and a 1:40 dilution of amplified DNA in PCR-grade water (Roche, CH) was used for qPCR. qPCRs were performed using the iTaq Universal SYBR Green SuperMix Master Mix (Bio-Rad, USA) on a StepOnePlus Real-Time PCR System (Applied Biosystems, USA) according to the manufacturer’s protocols. Primers were used at a final concentration of 500 nM together with 4 μl of diluted template (1:40 dilution of the WGA-amplified genomic DNA) in a final reaction volume of 10 μl.
qPCR data analysis was done as follows: We excluded any cycle threshold (CT) value whose corresponding product showed multiple melting points. We determined the melting point of each PCR-amplified product by visual inspection. Next, we defined a range of ±1.5°C around this product-specific melting point and excluded each CT value whose product melted outside this range. CT values were then normalized to total bacterial DNA per sample. For this, we subtracted the 16S rRNA primer CT value of each sample from the CT values of all other measured genomic loci. Depending on the suspected host bacterium, we further normalized the phage contig CT values to the amount of host bacteria. On the basis of this analysis, we counted the number of samples in which a given phage contig was present (Fig. 3C, bar plot) and applied the ΔΔCT method ($2^{-\Delta\Delta C_T}$) to estimate the fraction of bacteria carrying a given integrated prophage or the phage content normalized to the suspected host bacterium (Fig. 3C, boxplot).
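The qPCR normalization steps can be sketched in Python as follows. This is one reading of the procedure and not the original analysis script: ΔCT is taken as the CT of a locus minus the 16S rRNA CT of the same sample, ΔΔCT as the phage ΔCT minus the host ΔCT, and a primer efficiency of 100% (doubling per cycle) is assumed. The function names are hypothetical.

```python
def melting_point_ok(observed_tm, expected_tm, tolerance=1.5):
    """Keep a CT value only if the product melts within +/- tolerance (deg C)."""
    return abs(observed_tm - expected_tm) <= tolerance


def delta_ct(ct_locus, ct_16s):
    """Normalize a locus to total bacterial DNA via the 16S rRNA CT value."""
    return ct_locus - ct_16s


def phage_per_host(ct_phage, ct_host, ct_16s):
    """Relative amount of phage contig per host genome, as 2 ** (-ddCT).

    Assumes perfect amplification efficiency for all primer pairs.
    """
    ddct = delta_ct(ct_phage, ct_16s) - delta_ct(ct_host, ct_16s)
    return 2.0 ** (-ddct)
```

For a phage contig with a CT of 24, a host marker with a CT of 22, and a 16S CT of 15, ΔΔCT is 2 and the phage signal corresponds to one quarter of the host signal. Because the same 16S value is subtracted from both loci of a sample, it cancels in ΔΔCT; it matters for the comparison of ΔCT values between samples.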
In depth annotation of candidate vMAGs
For the three vMAGs that showed association with AD in sequencing and qPCR data, we performed an in-depth annotation using several specialized tools such as EggNOG (68), DRAM (61), and multiPhATE2 (69). Briefly, protein CDSs of the candidate genomes were first predicted with Prodigal (57). Then, the Prodigal-called CDSs were compared to a series of databases: KEGG, Pfam (58), VOGDB (62), the EggNOG database (68), pVOG (70), Swiss-Prot, CAZy (71), and NCBI’s RefSeq nonredundant proteins database (NR). Similarity between database entries and Prodigal-predicted protein CDSs was calculated with blastn, which compares the actual sequences and evaluates gaps and overlaps to arrive at a bit score that is then corrected for sequence length and reported as a P value (E value), and/or with a hidden Markov model (HMM) that models protein sequence profiles and is thus faster and more sensitive (72). Results from BLAST searches and HMM searches are summarized for each CDS entry of the candidate phage genomes in tables S15 to S17.
To give an overview of genome coverage (Fig. 4), we aligned all reads from healthy controls and all reads from patients with AD to the candidate genomes separately. We then created a SAMtools mpileup file giving coverage at single-nucleotide resolution. HMM sequence similarity P values were −log10-transformed, as in a Manhattan plot. Because the range of the transformed P values was still very wide (1 to 245), we applied an additional log2 transformation to arrive at our color scale. All untransformed P values are given in tables S15 to S17. We noticed a high similarity between vMAG14 and Newman prophages (73). Therefore, we aligned these two sequences separately (blastn) and plotted the BLAST P values on the innermost track of vMAG14 (Fig. 4).
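The two-step transformation behind the color scale can be written compactly. The Python sketch below only illustrates the arithmetic described above; the function name is hypothetical, and P values of 1 or larger are rejected because their −log10 value is not positive and has no log2.

```python
import math


def color_scale_value(p_value):
    """Apply -log10 followed by log2 to a similarity P value.

    A -log10 range of 1 to 245 is compressed to roughly 0 to 7.9.
    """
    if not 0 < p_value < 1:
        raise ValueError("P value must lie strictly between 0 and 1")
    neg_log10 = -math.log10(p_value)
    return math.log2(neg_log10)
```

A P value of 1e-8, for example, has a −log10 value of 8 and therefore a color scale value of 3.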
Acknowledgments
The computational results presented were achieved using the Vienna Scientific Cluster (VSC). We thank J. Schwarz and G. Kohl for assistance in the laboratory.
Funding: The project was funded by the “Land Niederösterreich” (Danube ARC).
Author contributions: Conceptualization: W.W. and K.P. Methodology: K.P., M.W., D.S., M.B., P.B., C.B., K.J., B.S., P.P., and B.W. Investigation: M.W., K.P., and W.W. Visualization: M.W. Funding acquisition: W.W. Project administration: W.W. and K.P. Supervision: W.W. Writing—original draft: M.W. and W.W. Writing—review and editing: K.P., M.W., D.S., P.B., C.B., K.J., M.B., B.S., P.P., B.W., and W.W.
Competing interests: The authors declare that they have no competing interests.
Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additionally, raw sequence read data and analysis codes are available via Dryad: https://doi.org/10.5061/dryad.qz612jmmt.
Supplementary Materials
This PDF file includes:
Materials and Methods
Supplementary Text
Figs. S1 to S17
Legends for tables S1 to S17
Other Supplementary Material for this manuscript includes the following:
Tables S1 to S17
REFERENCES AND NOTES
- 1. L. Brunello, Atopic dermatitis. Nat. Rev. Dis. Primers 4, 2 (2018).
- 2. S. M. Langan, A. D. Irvine, S. Weidinger, Atopic dermatitis. Lancet 396, 345–360 (2020).
- 3. J. Oh, A. L. Byrd, C. Deming, S. Conlan; NISC Comparative Sequencing Program, H. H. Kong, J. A. Segre, Biogeography and individuality shape function in the human skin metagenome. Nature 514, 59–64 (2014).
- 4. J. Oh, A. L. Byrd, M. Park; NISC Comparative Sequencing Program, H. H. Kong, J. A. Segre, Temporal stability of the human skin microbiome. Cell 165, 854–866 (2016).
- 5. A. L. Byrd, Y. Belkaid, J. A. Segre, The human skin microbiome. Nat. Rev. Microbiol. 16, 143–155 (2018).
- 6. S. Saheb Kashaf, D. M. Proctor, C. Deming, P. Saary, M. Hölzer; NISC Comparative Sequencing Program, M. E. Taylor, H. H. Kong, J. A. Segre, A. Almeida, R. D. Finn, Integrating cultivation and metagenomics for a multi-kingdom view of skin microbiome diversity and functions. Nat. Microbiol. 7, 169–179 (2022).
- 7. E. Pasolli, F. Asnicar, S. Manara, M. Zolfo, N. Karcher, F. Armanini, F. Beghini, P. Manghi, A. Tett, P. Ghensi, M. C. Collado, B. L. Rice, C. DuLong, X. C. Morgan, C. D. Golden, C. Quince, C. Huttenhower, N. Segata, Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176, 649–662.e20 (2019).
- 8. A. L. Byrd, C. Deming, S. K. B. Cassidy, O. J. Harrison, W. I. Ng, S. Conlan; NISC Comparative Sequencing Program, Y. Belkaid, J. A. Segre, H. H. Kong, Staphylococcus aureus and Staphylococcus epidermidis strain diversity underlying pediatric atopic dermatitis. Sci. Transl. Med. 9, eaal4651 (2017).
- 9. J. A. Geoghegan, A. D. Irvine, T. J. Foster, Staphylococcus aureus and atopic dermatitis: A complex and evolving relationship. Trends Microbiol. 26, 484–497 (2018).
- 10. J. M. Hanifin, J. L. Rogge, Staphylococcal infections in patients with atopic dermatitis. Arch. Dermatol. 113, 1383–1386 (1977).
- 11. K. N. LeGault, S. G. Hays, A. Angermeyer, A. C. McKitterick, F.-T. Johura, M. Sultana, T. Ahmed, M. Alam, K. D. Seed, Temporal shifts in antibiotic resistance elements govern phage-pathogen conflicts. Science 373, eabg2166 (2021).
- 12. A. Chevallereau, B. J. Pons, S. van Houte, E. R. Westra, Interactions between bacterial and phage communities in natural environments. Nat. Rev. Microbiol. 20, 49–62 (2022).
- 13. M. B. Dion, F. Oechslin, S. Moineau, Phage diversity, genomics and phylogeny. Nat. Rev. Microbiol. 18, 125–138 (2020).
- 14. W. B. Whitman, D. C. Coleman, W. J. Wiebe, Prokaryotes: The unseen majority. Proc. Natl. Acad. Sci. U.S.A. 95, 6578–6583 (1998).
- 15. F. L. Nobrega, M. Vlot, P. A. de Jonge, L. L. Dreesens, H. J. E. Beaumont, R. Lavigne, B. E. Dutilh, S. J. J. Brouns, Targeting mechanisms of tailed bacteriophages. Nat. Rev. Microbiol. 16, 760–773 (2018).
- 16. J. Ren, K. Song, C. Deng, N. A. Ahlgren, J. A. Fuhrman, Y. Li, X. Xie, R. Poplin, F. Sun, Identifying viruses from metagenomic data using deep learning. Quant. Biol. 8, 64–77 (2020).
- 17. J. Guo, B. Bolduc, A. A. Zayed, A. Varsani, G. Dominguez-Huerta, T. O. Delmont, A. A. Pratama, M. C. Gazitúa, D. Vik, M. B. Sullivan, S. Roux, VirSorter2: A multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome 9, 37 (2021).
- 18. D. Paez-Espino, E. A. Eloe-Fadrosh, G. A. Pavlopoulos, A. D. Thomas, M. Huntemann, N. Mikhailova, E. Rubin, N. N. Ivanova, N. C. Kyrpides, Uncovering Earth’s virome. Nature 536, 425–430 (2016).
- 19. S. Roux, D. Páez-Espino, I. M. A. Chen, K. Palaniappan, A. Ratner, K. Chu, T. B. K. Reddy, S. Nayfach, F. Schulz, L. Call, R. Y. Neches, T. Woyke, N. N. Ivanova, E. A. Eloe-Fadrosh, N. C. Kyrpides, IMG/VR v3: An integrated ecological and evolutionary framework for interrogating genomes of uncultivated viruses. Nucleic Acids Res. 49, D764–D775 (2021).
- 20. B. Al-Shayeb, R. Sachdeva, L.-X. Chen, F. Ward, P. Munk, A. Devoto, C. J. Castelle, M. R. Olm, K. Bouma-Gregson, Y. Amano, C. He, R. Méheust, B. Brooks, A. Thomas, A. Lavy, P. Matheus-Carnevali, C. Sun, D. S. A. Goltsman, M. A. Borton, A. Sharrar, A. L. Jaffe, T. C. Nelson, R. Kantor, R. Keren, K. R. Lane, I. F. Farag, S. Lei, K. Finstad, R. Amundson, K. Anantharaman, J. Zhou, A. J. Probst, M. E. Power, S. G. Tringe, W.-J. Li, K. Wrighton, S. Harrison, M. Morowitz, D. A. Relman, J. A. Doudna, A.-C. Lehours, L. Warren, J. H. D. Cate, J. M. Santini, J. F. Banfield, Clades of huge phages from across Earth’s ecosystems. Nature 578, 425–431 (2020).
- 21. L. F. Camarillo-Guerrero, A. Almeida, G. Rangel-Pineros, R. D. Finn, T. D. Lawley, Massive expansion of human gut bacteriophage diversity. Cell 184, 1098–1109.e9 (2021).
- 22. J. M. Norman, S. A. Handley, M. T. Baldridge, L. Droit, C. Y. Liu, B. C. Keller, A. Kambal, C. L. Monaco, G. Zhao, P. Fleshner, T. S. Stappenbeck, D. P. B. McGovern, A. Keshavarzian, E. A. Mutlu, J. Sauk, D. Gevers, R. J. Xavier, D. Wang, M. Parkes, H. W. Virgin, Disease-specific alterations in the enteric virome in inflammatory bowel disease. Cell 160, 447–460 (2015).
- 23. S. Federici, S. Kredo-Russo, R. Valdés-Mas, D. Kviatcovsky, E. Weinstock, Y. Matiuhin, Y. Silberberg, K. Atarashi, M. Furuichi, A. Oka, B. Liu, M. Fibelman, I. N. Weiner, E. Khabra, N. Cullin, N. Ben-Yishai, D. Inbar, H. Ben-David, J. Nicenboim, N. Kowalsman, W. Lieb, E. Kario, T. Cohen, Y. F. Geffen, L. Zelcbuch, A. Cohen, U. Rappo, I. Gahali-Sass, M. Golembo, V. Lev, M. Dori-Bachash, H. Shapiro, C. Moresi, A. Cuevas-Sierra, G. Mohapatra, L. Kern, D. Zheng, S. P. Nobs, J. Suez, N. Stettner, A. Harmelin, N. Zak, S. Puttagunta, M. Bassan, K. Honda, H. Sokol, C. Bang, A. Franke, C. Schramm, N. Maharshak, R. B. Sartor, R. Sorek, E. Elinav, Targeted suppression of human IBD-associated gut microbiota commensals by phage consortia for treatment of intestinal inflammation. Cell 185, 2879–2898.e24 (2022).
- 24. A. Eskenazi, C. Lood, J. Wubbolts, M. Hites, N. Balarjishvili, L. Leshkasheli, L. Askilashvili, L. Kvachadze, V. van Noort, J. Wagemans, M. Jayankura, N. Chanishvili, M. de Boer, P. Nibbering, M. Kutateladze, R. Lavigne, M. Merabishvili, J. P. Pirnay, Combination of pre-adapted bacteriophage therapy and antibiotics for treatment of fracture-related infection due to pandrug-resistant Klebsiella pneumoniae. Nat. Commun. 13, 302 (2022).
- 25. R. M. Dedrick, C. A. Guerrero-Bustamante, R. A. Garlena, D. A. Russell, K. Ford, K. Harris, K. C. Gilmour, J. Soothill, D. Jacobs-Sera, R. T. Schooley, G. F. Hatfull, H. Spencer, Engineered bacteriophages for treatment of a patient with a disseminated drug-resistant Mycobacterium abscessus. Nat. Med. 25, 730–733 (2019).
- 26. J. A. Nick, R. M. Dedrick, A. L. Gray, E. K. Vladar, B. E. Smith, K. G. Freeman, K. C. Malcolm, L. E. Epperson, N. A. Hasan, J. Hendrix, K. Callahan, K. Walton, B. Vestal, E. Wheeler, N. M. Rysavy, K. Poch, S. Caceres, V. K. Lovell, K. B. Hisert, V. C. de Moura, D. Chatterjee, P. de, N. Weakly, S. L. Martiniano, D. A. Lynch, C. L. Daley, M. Strong, F. Jia, G. F. Hatfull, R. M. Davidson, Host and pathogen response to bacteriophage engineered against Mycobacterium abscessus lung infection. Cell 185, 1860–1874.e12 (2022).
- 27. J. S. Little, R. M. Dedrick, K. G. Freeman, M. Cristinziano, B. E. Smith, C. A. Benson, T. A. Jhaveri, L. R. Baden, D. A. Solomon, G. F. Hatfull, Bacteriophage treatment of disseminated cutaneous mycobacterium chelonae infection. Nat. Commun. 13, 2313 (2022).
- 28. S. Nayfach, A. P. Camargo, F. Schulz, E. Eloe-Fadrosh, S. Roux, N. C. Kyrpides, CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat. Biotechnol. 39, 578–585 (2021).
- 29. G. Lima-Mendez, J. Van Helden, A. Toussaint, R. Leplae, Reticulate representation of evolutionary and functional relationships between phage genomes. Mol. Biol. Evol. 25, 762–777 (2008).
- 30. C. P. Cantalapiedra, A. Hernandez-Plaza, I. Letunic, P. Bork, J. Huerta-Cepas, eggNOG-mapper v2: Functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
- 31. T. Bae, T. Baba, K. Hiramatsu, O. Schneewind, Prophages of Staphylococcus aureus Newman and their contribution to virulence. Mol. Microbiol. 62, 1035–1047 (2006).
- 32. M. Breitbart, C. Bonnain, K. Malki, N. A. Sawaya, Phage puppet masters of the marine microbial realm. Nat. Microbiol. 3, 754–766 (2018).
- 33. G. Hevroni, J. Flores-Uribe, O. Beja, A. Philosof, Seasonal and diel patterns of abundance and activity of viruses in the Red Sea. Proc. Natl. Acad. Sci. U.S.A. 117, 29738–29747 (2020).
- 34. S. M. Faruque, M. J. Islam, Q. S. Ahmad, A. S. G. Faruque, D. A. Sack, G. B. Nair, J. J. Mekalanos, Self-limiting nature of seasonal cholera epidemics: Role of host-mediated amplification of phage. Proc. Natl. Acad. Sci. U.S.A. 102, 6119–6124 (2005).
- 35. B. Knowles, C. B. Silveira, B. A. Bailey, K. Barott, V. A. Cantu, A. G. Cobián-Güemes, F. H. Coutinho, E. A. Dinsdale, B. Felts, K. A. Furby, E. E. George, K. T. Green, G. B. Gregoracci, A. F. Haas, J. M. Haggerty, E. R. Hester, N. Hisakawa, L. W. Kelly, Y. W. Lim, M. Little, A. Luque, T. McDole-Somera, K. McNair, L. S. de Oliveira, S. D. Quistad, N. L. Robinett, E. Sala, P. Salamon, S. E. Sanchez, S. Sandin, G. G. Z. Silva, J. Smith, C. Sullivan, C. Thompson, M. J. A. Vermeij, M. Youle, C. Young, B. Zgliczynski, R. Brainard, R. A. Edwards, J. Nulton, F. Thompson, F. Rohwer, Lytic to temperate switching of viral communities. Nature 531, 466–470 (2016).
- 36. T. Bieber, Atopic dermatitis: An expanding therapeutic pipeline for a complex disease. Nat. Rev. Drug Discov. 21, 21–40 (2022).
- 37. N. Frazao, A. Sousa, M. Lassig, I. Gordo, Horizontal gene transfer overrides mutation in Escherichia coli colonizing the mammalian gut. Proc. Natl. Acad. Sci. U.S.A. 116, 17906–17915 (2019).
- 38. K. D. Seed, D. W. Lazinski, S. B. Calderwood, A. Camilli, A bacteriophage encodes its own CRISPR/Cas adaptive response to evade host innate immunity. Nature 494, 489–491 (2013).
- 39. J. Haaber, J. J. Leisner, M. T. Cohn, A. Catalan-Moreno, J. B. Nielsen, H. Westh, J. R. Penadés, H. Ingmer, Bacterial viruses enable their host to acquire antibiotic resistance genes from neighbouring cells. Nat. Commun. 7, 13333 (2016).
- 40. J. Chen, N. Quiles-Puchalt, Y. N. Chiang, R. Bacigalupe, A. Fillol-Salom, M. S. J. Chee, J. R. Fitzgerald, J. R. Penadés, Genome hypermobility by lateral transduction. Science 362, 207–212 (2018).
- 41. M. R. Mangalea, D. Paez-Espino, K. Kieft, A. Chatterjee, M. E. Chriswell, J. A. Seifert, M. L. Feser, M. K. Demoruelle, A. Sakatos, K. Anantharaman, K. D. Deane, K. A. Kuhn, V. M. Holers, B. A. Duerkop, Individuals at risk for rheumatoid arthritis harbor differential intestinal bacteriophage communities with distinct metabolic potential. Cell Host Microbe 29, 726–739.e5 (2021).
- 42. U. Neri, Y. I. Wolf, S. Roux, A. P. Camargo, B. Lee, D. Kazlauskas, I. M. Chen, N. Ivanova, L. Zeigler Allen, D. Paez-Espino, D. A. Bryant, D. Bhaya, M. Krupovic, V. V. Dolja, N. C. Kyrpides, E. V. Koonin, U. Gophna, A. B. Narrowe, A. J. Probst, A. Sczyrba, A. Kohler, A. Séguin, A. Shade, B. J. Campbell, B. D. Lindahl, B. K. Reese, B. M. Roque, C. DeRito, C. Averill, D. Cullen, D. A. C. Beck, D. A. Walsh, D. M. Ward, D. Wu, E. Eloe-Fadrosh, E. L. Brodie, E. B. Young, E. A. Lilleskov, F. J. Castillo, F. M. Martin, G. R. LeCleir, G. T. Attwood, H. Cadillo-Quiroz, H. M. Simon, I. Hewson, I. V. Grigoriev, J. M. Tiedje, J. K. Jansson, J. Lee, J. S. VanderGheynst, J. Dangl, J. S. Bowman, J. L. Blanchard, J. L. Bowen, J. Xu, J. F. Banfield, J. W. Deming, J. E. Kostka, J. M. Gladden, J. Z. Rapp, J. Sharpe, K. D. McMahon, K. K. Treseder, K. D. Bidle, K. C. Wrighton, K. Thamatrakoln, K. Nusslein, L. K. Meredith, L. Ramirez, M. Buee, M. Huntemann, M. G. Kalyuzhnaya, M. P. Waldrop, M. B. Sullivan, M. O. Schrenk, M. Hess, M. A. Vega, M. A. O’Malley, M. Medina, N. E. Gilbert, N. Delherbe, O. U. Mason, P. Dijkstra, P. F. Chuckran, P. Baldrian, P. Constant, R. Stepanauskas, R. A. Daly, R. Lamendella, R. J. Gruninger, R. M. McKay, S. Hylander, S. L. Lebeis, S. P. Esser, S. G. Acinas, S. S. Wilhelm, S. W. Singer, S. S. Tringe, T. Woyke, T. B. K. Reddy, T. H. Bell, T. Mock, T. McAllister, V. Thiel, V. J. Denef, W. T. Liu, W. Martens-Habbena, X. J. Allen Liu, Z. S. Cooper, Z. Wang, Expansion of the global RNA virome reveals diverse clades of bacteriophages. Cell 185, 4023–4037.e18 (2022).
- 43.A. M. Bolger, M. Lohse, B. Usadel, Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. A. L. Byrd, C. Deming, S. K. B. Cassidy, O. J. Harrison, W.-I. Ng, S. Conlan, NISC Comparative Sequencing Program, Y. Belkaid, J. A. Segre, H. H. Kong, Staphylococcus aureus and Staphylococcus epidermidis strain diversity underlying pediatric atopic dermatitis. Sci. Transl. Med. 9, (2017).
- 45. S. Nurk, D. Meleshko, A. Korobeynikov, P. A. Pevzner, metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
- 46. S. Saheb Kashaf, A. Almeida, J. A. Segre, R. D. Finn, Recovering prokaryotic genomes from host-associated, short-read shotgun metagenomic sequencing data. Nat. Protoc. 16, 2520–2541 (2021).
- 47. A. Mikheenko, V. Saveliev, A. Gurevich, MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 32, 1088–1090 (2016).
- 48. A. L. Byrd, Y. Belkaid, J. A. Segre, The human skin microbiome. Nat. Rev. Microbiol. 16, 143–155 (2018).
- 49. G. V. Uritskiy, J. DiRuggiero, J. Taylor, MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6, 158 (2018).
- 50. J. Alneberg, B. S. Bjarnason, I. de Bruijn, M. Schirmer, J. Quick, U. Z. Ijaz, L. Lahti, N. J. Loman, A. F. Andersson, C. Quince, Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
- 51. Y. W. Wu, Y. H. Tang, S. G. Tringe, B. A. Simmons, S. W. Singer, MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome 2, 26 (2014).
- 52. D. D. Kang, F. Li, E. Kirton, A. Thomas, R. Egan, H. An, Z. Wang, MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
- 53. D. H. Parks, M. Imelfort, C. T. Skennerton, P. Hugenholtz, G. W. Tyson, CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
- 54. B. D. Ondov, T. J. Treangen, P. Melsted, A. B. Mallonee, N. H. Bergman, S. Koren, A. M. Phillippy, Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 17, 132 (2016).
- 55. J. Guo, B. Bolduc, A. A. Zayed, A. Varsani, G. Dominguez-Huerta, T. O. Delmont, A. A. Pratama, M. Consuelo Gazitúa, D. Vik, M. B. Sullivan, S. Roux, VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome 9, 37 (2021).
- 56. J. Ren, K. Song, C. Deng, N. A. Ahlgren, J. A. Fuhrman, Y. Li, X. Xie, R. Poplin, F. Sun, Identifying viruses from metagenomic data using deep learning. Quant. Biol. 8, 64–77 (2020).
- 57. D. Hyatt, G.-L. Chen, P. F. L. Cascio, M. L. Land, F. W. Larimer, L. J. Hauser, Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
- 58. J. Mistry, S. Chuguransky, L. Williams, M. Qureshi, G. A. Salazar, E. L. L. Sonnhammer, S. C. E. Tosatto, L. Paladin, S. Raj, L. J. Richardson, R. D. Finn, A. Bateman, Pfam: The protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
- 59. M. R. Olm, C. T. Brown, B. Brooks, J. F. Banfield, dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J 11, 2864–2868 (2017).
- 60. S. Nayfach, A. P. Camargo, F. Schulz, E. Eloe-Fadrosh, S. Roux, N. C. Kyrpides, CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat. Biotechnol. 39, 578–585 (2021).
- 61. M. Shaffer, M. A. Borton, B. B. McGivern, A. A. Zayed, S. L. La Rosa, L. M. Solden, P. Liu, A. B. Narrowe, J. Rodríguez-Ramos, B. Bolduc, M. Consuelo Gazitúa, R. A. Daly, G. J. Smith, D. R. Vik, P. B. Pope, M. B. Sullivan, S. Roux, K. C. Wrighton, DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res. 48, 8883–8900 (2020).
- 62. J. Thannesberger, H.-J. Hellinger, I. Klymiuk, M.-T. Kastner, F. J. J. Rieder, M. Schneider, S. Fister, T. Lion, K. Kosulin, J. Laengle, M. Bergmann, T. Rattei, C. Steininger, Viruses comprise an extensive pool of mobile genetic elements in eukaryote cell cultures and human clinical samples. FASEB J. 31, 1987–2000 (2017).
- 63. D. Couvin, A. Bernheim, C. Toffano-Nioche, M. Touchon, J. Michalik, B. Néron, E. P. C. Rocha, G. Vergnaud, D. Gautheret, C. Pourcel, CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res. 46, W246–W251 (2018).
- 64. B. D. Ondov, G. J. Starrett, A. Sappington, A. Kostic, S. Koren, C. B. Buck, A. M. Phillippy, Mash Screen: high-throughput sequence containment estimation for genome discovery. Genome Biol. 20, 232 (2019).
- 65. L. F. Camarillo-Guerrero, A. Almeida, G. Rangel-Pineros, R. D. Finn, T. D. Lawley, Massive expansion of human gut bacteriophage diversity. Cell 184, 1098–1109.e9 (2021).
- 66. S. Roux, D. Páez-Espino, I.-M. A. Chen, K. Palaniappan, A. Ratner, K. Chu, T. B. K. Reddy, S. Nayfach, F. Schulz, L. Call, R. Y. Neches, T. Woyke, N. N. Ivanova, E. A. Eloe-Fadrosh, N. C. Kyrpides, IMG/VR v3: an integrated ecological and evolutionary framework for interrogating genomes of uncultivated viruses. Nucleic Acids Res. 49, D764–D775 (2021).
- 67. H. Mallick, A. Rahnavard, L. J. McIver, S. Ma, Y. Zhang, L. H. Nguyen, T. L. Tickle, G. Weingart, B. Ren, E. H. Schwager, S. Chatterjee, K. N. Thompson, J. E. Wilkinson, A. Subramanian, Y. Lu, L. Waldron, J. N. Paulson, E. A. Franzosa, H. C. Bravo, C. Huttenhower, Multivariable association discovery in population-scale meta-omics studies. PLoS Comput. Biol. 17, e1009442 (2021).
- 68. C. P. Cantalapiedra, A. Hernandez-Plaza, I. Letunic, P. Bork, J. Huerta-Cepas, eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
- 69. C. L. Ecale Zhou, J. Kimbrel, R. Edwards, K. M. Nair, B. A. Souza, S. Malfatti, MultiPhATE2: code for functional annotation and comparison of phage genomes. G3 (Bethesda) 11, jkab074 (2021).
- 70. A. L. Grazziotin, E. V. Koonin, D. M. Kristensen, Prokaryotic Virus Orthologous Groups (pVOGs): a resource for comparative genomics and protein family annotation. Nucleic Acids Res. 45, D491–D498 (2017).
- 71. L. Wang, G. Zhang, H. Xu, H. Xin, Y. Zhang, Metagenomic Analyses of Microbial and Carbohydrate-Active Enzymes in the Rumen of Holstein Cows Fed Different Forage-to-Concentrate Ratios. Front. Microbiol. 10, 649 (2019).
- 72. J. Soding, Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951–960 (2005).
- 73. T. Bae, T. Baba, K. Hiramatsu, O. Schneewind, Prophages of Staphylococcus aureus Newman and their contribution to virulence. Mol. Microbiol. 62, 1035–1047 (2006).
Supplementary Materials
Materials and Methods
Supplementary Text
Figs. S1 to S17
Legends for tables S1 to S17
Tables S1 to S17