Abstract
Bacteriophages, viruses that infect bacteria, have great specificity for their bacterial hosts at the strain and species level. However, the relationship between the phageome and associated bacterial population dynamics is unclear. Here we generated a computational pipeline to identify sequences associated with bacteriophages and their bacterial hosts in cell-free DNA from plasma samples. Analysis of two independent cohorts, including a Stanford Cohort of 61 septic patients and 10 controls and the SepSeq cohort of 224 septic patients and 167 controls, reveals a circulating phageome in the plasma of all sampled individuals. Moreover, infection is associated with overrepresentation of pathogen-specific phages, allowing for identification of bacterial pathogens. We find that information on phage diversity enables identification of the bacteria that produced these phages, including pathovariant strains of Escherichia coli. Phage sequences can likewise be used to distinguish between closely related bacterial species such as Staphylococcus aureus, a frequent pathogen, and coagulase-negative Staphylococcus, a frequent contaminant. Phage cell-free DNA may have utility in studying bacterial infections.
Bacterial culture methods developed over a century ago remain the standard of care for identifying bacterial pathogens in sepsis and other settings. Unfortunately, these methods are often imprecise and can be confounded by both false positives and negatives. Improving the identification and subsequent study of bacterial pathogens is a critical medical need1.
Next-generation sequencing (NGS) of cell-free DNA (cfDNA) holds promise for filling this gap. cfDNA is made up of DNA fragments, typically 50–200 bp long, found in circulation2. cfDNA is primarily human in origin with a small but rich microbial compartment2–4. With high-throughput sequencing, cfDNA lends itself to non-invasive testing and has transformed diagnostics in perinatal testing, autoimmune disorders, cancer staging and transplant rejection5–8. There have been efforts to develop cfDNA infection diagnostics in sepsis9–12 and other settings13,14.
Retrospective cohort studies show that these approaches failed to identify the pathogen in many sepsis cases and do not establish thresholds distinguishing colonization from infection15,16. Other molecular approaches such as polymerase chain reaction-based diagnostics yield similarly mixed results17 and have seen limited adoption. There is an urgent need for better methods to study bacterial infection ecology and facilitate downstream diagnostic improvements.
Bacteriophages (phages), or viruses that infect bacteria, may provide insight into the bacterial ecology underlying sepsis and opportunistic infections. Phages are present in all compartments of the human body, including the skin and gut. There, they infect bacterial populations and are an active and critical component of the human microbiome18,19. Though abundant in the human body, they have historically been challenging to identify in samples and have been termed microbial ‘dark matter’20,21.
Nonetheless, aspects of phage biology make them uniquely well suited for interrogating bacterial infections. Many phages have narrow host ranges limited to a few closely related bacteria, though information is sparse given the vast diversity of phage22–24. Composition of phages may therefore reflect bacterial populations at the species and strain level. Additionally, some phages translocate across epithelia and into the bloodstream18,25,26. Despite structural and genetic diversity, most phages within humans are non-enveloped DNA viruses27,28 and any phage DNA entering circulation can be sequenced using existing NGS methods and detected in cfDNA3,29.
Targeted analysis of phages in metagenomic data has historically been limited by lack of available phage sequences20,30–32. Additionally, many metagenomic annotation software packages require upstream metagenome assembly that is frequently impossible in low-microbial-biomass specimens. Fortunately, the rapid and continuing growth in genomes submitted to the National Center for Biotechnology Information (NCBI)’s GenBank has facilitated improved identification of these phages as more of these ‘dark matter’ genomes become known. There are also ongoing efforts to better classify phages taxonomically via genomic methods rather than the tradition of grouping only by phage particle morphology33.
In this Article, we describe an approach to confidently identify and interpret DNA phage sequences in cfDNA. We apply this workflow to plasma samples collected from two independent cohorts, each composed of patients with infection-driven sepsis as well as healthy controls. One of these cohorts involves previously published metagenomic and culture data34, while the other was generated as part of this work. We reveal that phageomes reflect infection through pathogen-specific overrepresentation. Furthermore, we show that phages can distinguish between closely related bacterial species where bacterial cfDNA may fall short. Our data suggest that circulating phage sequences in plasma provide a non-invasive approach to studying bacteriophage ecology in infection.
The circulating phageome reflects infection aetiology
We sought to determine if there is a circulating phageome in healthy subjects and whether it is disrupted in patients with blood culture positive infections. To this end, we analysed plasma cfDNA from 61 patients with sepsis seen in the Stanford University Emergency Department as well as 10 asymptomatic controls (Fig. 1a). The sepsis samples represent 20 different bacterial infections confirmed by positive blood cultures, and three of the samples have positive cultures for multiple bacterial genera (Table 1). The samples were analysed using the workflow depicted in Fig. 1a. Briefly, cfDNA was collected from the samples and sequenced on an Illumina platform to an average depth of 18.67 million reads. Raw data were quality controlled and trimmed using standard bioinformatics tools, FASTQC35 and Trimmomatic36. Human reads were subtracted by mapping to the human reference genome GRCh38 via Bowtie2.
Fig. 1 |. The circulating phageome reflects infection aetiology.

a, Schematic of sample numbers, sample processing and analytical approach. b, Proportions of human and non-human reads. c, Distribution of non-human reads’ identity by archaeal, eukaryotic, bacterial, mammalian viral and bacteriophage categories. Minimum, maximum, range, mean, standard deviation and standard error of mean for asymptomatic and sepsis samples are presented in Extended Data Table 2. d, Distribution of bacteriophages by family. e, SDI in asymptomatic and septic patient phageomes is not significantly different (asymptomatic: mean 4.41, range 0.0655, sepsis: mean 4.38, range 2.992, P = 0.943 per Mann–Whitney test) shown by violin plot with median and quartiles shown by dashed lines. NS, not significant. f, Average distribution of unique phages by bacterial host genus in asymptomatic and septic patient phageomes. Descriptive statistics (mean and standard deviation) for both asymptomatic and sepsis samples and summary statistics of Mann–Whitney tests are available in Extended Data Table 1. g, Average proportions of unique phage by bacterial host in samples with single identified pathogen, by cultured pathogen (Streptococcus, Staphylococcus, Pseudomonas, Klebsiella and Escherichia), as well as three polymicrobial infections with bar corresponding to row’s infectious pathogen outlined in black. h–k, Infection-specific phage proportion shown as violin plots with median and quartiles shown by dashed lines. Summary statistics of Mann–Whitney tests available in Extended Data Table 3: E. coli infection status (P = 0.0017) (h); Streptococcus infection status (P = 0.0018) (i); Staphylococcus infection status (P = 0.0094) (j); Klebsiella infection status (P = 0.0056) (k). l–o, Infection-specific SDI calculated and shown as violin plots with median and quartiles shown by dashed lines. Summary statistics of Mann–Whitney tests available in Extended Data Table 3: E. coli infection status (P = 0.0051) (l); Streptococcus infection status (P = 0.0025) (m); Staphylococcus infection status (P = 0.0192) (n); Klebsiella infection status (P = 0.0272) (o). All statistics were performed using unpaired, two-tailed Mann–Whitney tests. *P < 0.05, **P < 0.01. Cx, culture.
Table 1 |.
Distribution of blood-culture-positive bacterial identifications in the sequenced sepsis cohort (N= 61)
| Blood culture identification | Samples (n) |
| Cronobacter sakazakii | 1 |
| Enterobacter aerogenes | 2 |
| Enterobacter cloacae | 1 |
| Enterococcus sp. | 1 |
| Escherichia coli | 14 |
| Escherichia coli and Klebsiella pneumoniae | 1 |
| Escherichia coli and Streptococcus sp. | 1 |
| Gram-positive cocci | 1 |
| Klebsiella pneumoniae | 5 |
| Moraxella sp. | 1 |
| Paenibacillus sp. | 1 |
| Proteus vulgaris | 1 |
| Pseudomonas aeruginosa | 6 |
| Salmonella enterica | 1 |
| Serratia marcescens | 1 |
| Staphylococcus aureus | 5 |
| Staphylococcus aureus and coagulase-negative Staphylococcus sp. | 1 |
| Staphylococcus aureus and Klebsiella pneumoniae | 1 |
| Staphylococcus sp., coagulase-negative | 5 |
| Streptococcus sp. | 11 |
| Streptococcus pneumoniae | (2) |
| Streptococcus mitis | (2) |
| Streptococcus pyogenes | (1) |
Plasma cfDNA from both asymptomatic controls and septic patients was composed of mostly human reads (Fig. 1b), consistent with previous studies2–4. A BLAST search utilizing the full NCBI Nucleotide database revealed that non-human reads, despite making up a relatively small proportion of all cfDNA reads, represented a rich microbial community (Fig. 1c). Bacterial hits include many genera. In asymptomatic individuals the greatest single genus represented was Cutibacterium, a known skin commensal37 (Extended Data Fig. 1a). Less than 1% of reads corresponded to mammalian viruses, while considerably more reads mapped to bacteriophage: 8.80% and 9.62% of non-human reads in asymptomatic and septic patient samples, respectively (Fig. 1c, Extended Data Fig. 1b and Extended Data Table 1), and all samples contained phage reads.
A first-pass BLAST search38,39 was performed against our curated phage database (CPD), which was constructed from the NCBI Nucleotide database. Reads with significant hits were subjected to a secondary and more stringent human sequence removal, in which all reads with significant BLAST hits to any human sequence deposited in the NCBI nuccore were removed and annotations were made again using the CPD.
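The two-pass logic described above (keep reads with a significant phage hit, then discard any of those reads that also have a significant human hit) can be illustrated with a minimal Python sketch. It assumes BLAST tabular output (`-outfmt 6`, with the query ID in column 1 and the e-value in column 11). The function names and the e-value cutoff are hypothetical, since the text does not state the significance threshold that was used.

```python
import csv
import io

# Hypothetical cutoff: the text says "significant hits" without giving a value.
MAX_EVALUE = 1e-5


def significant_read_ids(blast_tsv, max_evalue=MAX_EVALUE):
    """Return the IDs of reads with at least one hit at or below max_evalue.

    Expects BLAST tabular text (-outfmt 6): query ID in column 1,
    e-value in column 11.
    """
    ids = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if len(row) >= 11 and float(row[10]) <= max_evalue:
            ids.add(row[0])
    return ids


def two_pass_phage_reads(phage_hits_tsv, human_hits_tsv):
    """Reads with a significant phage hit and no significant human hit."""
    phage_ids = significant_read_ids(phage_hits_tsv)
    human_ids = significant_read_ids(human_hits_tsv)
    return phage_ids - human_ids
```

In this sketch the surviving read IDs would then be annotated again against the phage database; that re-annotation step is not shown.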
A common challenge in phage research is linking phages with the known bacterial host(s). The CPD pulls from virushostDB40, naming conventions of bacteriophages with clearly identified host genera, and NCBI nucleotide source host fields for bacteriophage sequence entries to include available bacterial host metadata. Taxonomic classifications for both phages and their bacterial host(s) are also included as metadata in the CPD, which we have made publicly available (https://doi.org/10.5281/zenodo.7154236). Additional human depletion steps and the CPD allow for clean phageome annotations that can be meaningfully analysed with respect to infection aetiology.
Using more stringent human read depletion, fewer human reads interfered with phage calls (Extended Data Fig. 1c) and annotations had fewer phages without known bacterial hosts (Extended Data Fig. 1d). Remaining annotations reflect a broad variety of phage families (Fig. 1d), though the tailed phage families Myoviridae and Siphoviridae predominated. There was no statistical difference in phage family representation within the circulating phageomes of asymptomatic controls and septic patients (Fig. 1d). The Shannon Diversity Index (SDI), a common ecological quantification of species richness and evenness41 of circulating phages, was not significantly different between the groups (Fig. 1e), though septic patients did have a wider range in phage SDI. These findings indicate that there is no broad ‘Sepsis Phageome’ signature.
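For readers unfamiliar with the SDI, the following minimal Python sketch computes it from per-phage counts in one sample as H = −Σ p_i ln p_i, where p_i is the fraction of counts assigned to phage i. The use of the natural logarithm and of raw hit counts as the abundance measure are assumptions; the text does not specify either.

```python
import math


def shannon_diversity(counts):
    """Shannon Diversity Index, H = -sum(p_i * ln(p_i)).

    counts: iterable of non-negative abundances, one per phage.
    Zero counts are ignored; an empty or all-zero sample returns 0.0.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)
```

A sample dominated by one phage scores near 0, while a sample with many evenly represented phages scores higher, with ln(N) as the maximum for N phages.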
Of identified phages with known bacterial hosts in the CPD, most corresponded to Pseudomonas, Escherichia, Klebsiella, Cutibacterium and Streptococcus genera (Fig. 1f). Bacterial host proportions were not significantly different by sepsis status (Fig. 1f, Extended Data Fig. 2a and Extended Data Table 2). There was inter-individual variation with most samples dissimilar from all others on a per-phage basis and the majority of identified phages present in less than 10% of samples (Extended Data Fig. 2b–f). Many phages have unknown bacterial hosts, accounting for 47.2% and 37.4% of phages in the asymptomatic and septic patient groups, respectively (Fig. 1f). A portion of these phages were uncharacterized gut phages from gut metagenome sequencing, suggesting potential contribution of gut phages to the circulating cfDNA pool. Furthermore, uncharacterized phages in asymptomatic individuals were composed of higher proportions of these gut phages than in individuals with sepsis (Extended Data Fig. 3a).
As sepsis is commonly caused by a single bacterial strain42, we investigated circulating phages on a per-infection basis, hypothesizing that bacteriophages pertinent to the causative pathogen in a sample will be overrepresented. We found that patients with sepsis due to E. coli, Streptococcus spp., Staphylococcus spp. and Klebsiella spp. infections had an overrepresentation in the proportion of phages corresponding to the causative pathogen (Fig. 1g–k and Extended Data Table 3). However, analysis of proportion alone does not account for overall number of phages per sample or the composition of the phage population associated with those pathogens across all samples. We therefore calculated the ecological diversity of phages. Applying SDI to each infection aetiology in our dataset with at least seven samples (a sufficient number for us to detect a 75% SDI increase with 80% power), we found increases in the SDI of the pathogen-specific phage population in patients with sepsis due to E. coli, Streptococcus spp., Staphylococcus spp. and Klebsiella spp. infections (Fig. 1l–o and Extended Data Table 3). There were three samples with positive cultures from multiple bacterial genera that demonstrated varying degrees of representation of both pathogens (Fig. 1g and Extended Data Table 4).
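The per-sample proportion of unique phages by bacterial host can be sketched as follows. This is an illustration only: the `host_of` mapping stands in for host metadata such as that held in the CPD, the phage labels in the usage note are placeholders rather than real annotations, and labelling phages without a known host as "Unknown" is an assumption.

```python
from collections import Counter


def host_proportions(phages, host_of):
    """Fraction of a sample's unique phages assigned to each host genus.

    phages:  iterable of phage names detected in one sample.
    host_of: dict mapping phage name -> bacterial host genus
             (hypothetical stand-in for database host metadata).
    """
    unique = set(phages)
    if not unique:
        return {}
    counts = Counter(host_of.get(p, "Unknown") for p in unique)
    return {genus: n / len(unique) for genus, n in counts.items()}
```

For example, with `host_of = {"phage_A": "Escherichia", "phage_B": "Escherichia", "phage_C": "Klebsiella"}`, a sample containing `phage_A`, `phage_B`, `phage_C` and an unmapped `phage_D` yields 0.5 for Escherichia and 0.25 each for Klebsiella and Unknown. Comparing the pathogen's proportion between infected and non-infected groups would then be done with a separate statistical test.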
Taken together, the content of non-human cfDNA, including the bacterial, eukaryotic and viral DNA as well as the bacteriophage families, diversity and annotated bacterial hosts, was broadly similar in health and in sepsis. However, septic patients with positive blood cultures had phageomes more representative of the bacterial cause of their sepsis.
cfDNA phageomes reflect pathogens in another large study
We sought to validate our findings in a larger, more clinically diverse group of patients. The SepSeq study, a cohort study of 391 individuals with publicly available data34, investigated the use of bacterial cfDNA to diagnose the cause of sepsis in patients who triggered an emergency department sepsis alert and had a wide array of infectious aetiologies.
There are four patient populations in the SepSeq study: (1) patients with blood-culture-positive sepsis, (2) patients with blood-culture-negative sepsis and documented infection elsewhere, (3) patients with systemic inflammatory response syndrome (SIRS) who clinically appeared septic but in whom no infectious aetiology was ultimately identified, and (4) asymptomatic controls (Fig. 2a). The original study reported overall promising diagnostic statistics using bacterial cfDNA, but with notable gaps: under-sensitivity in samples with negative initial blood cultures, and oversensitivity to chronic infections and commensals. We sought to determine first, if the phageome trends we observed in our dataset were consistent in the SepSeq study in patients with bacterial infections and second, if phages present in cfDNA might offer additional useful information in the study of pathogenic bacterial ecology.
Fig. 2 |. cfDNA phageomes reflect pathogens in another large study.

a, SepSeq sample schematic. b, SDI of phages by sample category (mean Blood Cx (+): 4.05, mean Asymptomatic: 3.48, P < 0.001). c, Average proportions of unique phages by bacterial host by sample category. d, Per infection disruption of average proportions of unique phages with single identified pathogen, by cultured pathogen in Streptococcus, Staphylococcus, Pseudomonas, Klebsiella and Escherichia infection, with bar corresponding to row’s infectious pathogen outlined in black. e–h, Infection-specific phage proportion by infection category, all statistics performed by Kruskal–Wallis test with Dunn’s multiple comparisons shown as violin plots with median and quartiles shown by dashed lines. Summary statistics available in Extended Data Table 5: E. coli (mean SIRS: 0.152, mean other blood Cx: 0.096, mean blood Cx (−) site Cx (+): 0.347, mean blood Cx (+): 0.338, Kruskal–Wallis P = 3.55 × 10^−11) (e); Streptococcus (mean SIRS: 0.119, mean other blood Cx: 0.037, mean blood Cx (−) site Cx (+): 0.236, mean blood Cx (+): 0.375, Kruskal–Wallis P < 0.0001) (f); Staphylococcus (mean SIRS: 0.032, mean other blood Cx: 0.034, mean blood Cx (−) site Cx (+): 0.268, mean blood Cx (+): 0.456, Kruskal–Wallis P = 0.0001) (g); Klebsiella (mean SIRS: 0.034, mean other blood Cx: 0.031, mean blood Cx (+): 0.241, Kruskal–Wallis P = 0.0002) (h). i–l, Pathogen host phage SDI by infection category, all statistics performed by Kruskal–Wallis test with Dunn’s multiple comparisons shown as violin plots with median and quartiles shown by dashed lines. Summary statistics available in Extended Data Table 5: E. coli (mean SIRS: 1.45, mean other blood Cx: 1.07, mean blood Cx (−) site Cx (+): 3.35, mean blood Cx (+): 3.48, Kruskal–Wallis P = 2.20 × 10^−14) (i); Streptococcus (mean SIRS: 1.02, mean other blood Cx: 0.760, mean blood Cx (−) site Cx (+): 1.32, mean blood Cx (+): 3.03, Kruskal–Wallis P = 0.0024) (j); Staphylococcus (mean SIRS: 0.266, mean other blood Cx: 0.316, mean blood Cx (−) site Cx (+): 3.17, mean blood Cx (+): 4.16, Kruskal–Wallis P = 3.89 × 10^−11) (k); Klebsiella (mean SIRS: 0.510, mean other blood Cx: 1.00, mean blood Cx (+): 2.43, Kruskal–Wallis P = 6.01 × 10^−7) (l). NS, not significant. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.
Using our annotation workflow (Fig. 2a), we again found circulating phageomes were remarkably similar in health and sepsis. While overall phage diversity was higher in patients with sepsis and positive blood cultures (Fig. 2b), proportions of bacterial hosts represented by the identified phages (Fig. 2c) were largely stable. We found the most common bacterial hosts to be Enterobacter spp., Escherichia spp., Staphylococcus spp. and Streptococcus spp., even in healthy patients (Fig. 2c) and again found that uncharacterized gut phages contribute to the phage population in these patients (Extended Data Fig. 3b). Phages specific to the causal bacterial pathogen were again overrepresented in those phageomes compared with other groups (Fig. 2d and Extended Data Table 5). We found that in patients with blood-culture-positive E. coli, Streptococcus spp., Staphylococcus spp. and Klebsiella spp. infections, both the proportion (Fig. 2e–h and Extended Data Table 5) and diversity (Fig. 2i–l and Extended Data Table 5) of pathogen-associated phages were significantly higher than in individuals with another infection or no isolated pathogen (SIRS). Interestingly, for a single bacterial infection aetiology, phage proportion and diversity across patients did not vary between blood-culture-positive and blood-culture-negative groups, which indicates that circulating phage sequences are capable of reflecting infection outside the context of bacteraemia. One other important point is the non-zero background of phages in individuals with SIRS or other infection—especially in the case of Klebsiella phages, where individuals with non-Klebsiella sepsis have a higher diversity of Klebsiella phages compared with patients with SIRS (Fig. 2l). These findings may be indicative of potentially undiagnosed infections, of influence of infection on other bacterial communities in the body, or even of increased phage translocation due to inflammation.
The SepSeq study again demonstrated a healthy circulating phageome and no uniform ‘Sepsis Phageome’. The proportion and diversity of infection-associated phages, however, were consistently overrepresented in the context of bacterial infection. This suggests that plasma cfDNA sequencing provides an alternative to sequencing fluid from the site of infection for studying bacterial infection ecology.
E. coli phages reflect host strain characteristics
E. coli infections are common causes of sepsis, including in our cohorts, and E. coli phages are extensively studied. Here, E. coli phages were sporadic in asymptomatic patients, more common in patients with SIRS or non-E. coli sepsis, and ubiquitous in patients with E. coli sepsis (Fig. 3a). There was notable enrichment for E. coli phages from Siphoviridae, Podoviridae and Myoviridae phage families in E. coli sepsis, while Autographiviridae phages remain at stably low levels despite infection status (Fig. 3b–e). There was also slight but significant enrichment for E. coli phages from Siphoviridae, Podoviridae and Myoviridae phage families in patients with SIRS or non-E. coli sepsis (Fig. 3b–e). Genome-based viral taxonomies are gaining traction, and testing them using a new but incomplete database43 results in many of these phages being reclassified or flagged as unclassified (Extended Data Fig. 4a). Still, the trends observed with the old taxonomy persist (Extended Data Fig. 4b).
Fig. 3 |. E. coli phages reflect host strain characteristics.

a, In samples with detectable E. coli phages, E. coli bacteriophage numbers by family vary by infection status. b–e, Number of E. coli phages by phage families tested by Brown–Forsythe and Welch ANOVA with two-sided Dunnett’s T3 multiple comparison test: E. coli Siphoviridae phages (mean asymptomatic: 5.80, mean SIRS: 11.19, mean other sepsis: 6.40, mean E. coli sepsis: 45.67, Brown–Forsythe and Welch ANOVA P = 8.13 × 10^−34. Multiple comparisons: E. coli sepsis versus: asymptomatic P = 0.00, SIRS P = 0.00, other sepsis P = 0.00. Other sepsis versus: SIRS P = 0.11, asymptomatic P = 0.998. Asymptomatic versus SIRS P = 9.29 × 10^−3) (b); E. coli Podoviridae phages (mean asymptomatic: 0.44, mean SIRS: 3.08, mean other sepsis: 1.66, mean E. coli sepsis: 25.47, Brown–Forsythe and Welch ANOVA P = 1.65 × 10^−12. Multiple comparisons: E. coli sepsis versus: asymptomatic P = 4.65 × 10^−7, SIRS P = 4.3 × 10^−6, other sepsis P = 1.3 × 10^−6. Other sepsis versus: SIRS P = 0.76, asymptomatic P = 0.38. Asymptomatic versus SIRS P = 0.037) (c); E. coli Myoviridae phages (mean asymptomatic: 0.36, mean SIRS: 4.02, mean other sepsis: 0.98, mean E. coli sepsis: 16.67, Brown–Forsythe and Welch ANOVA P = 2.79 × 10^−5. Multiple comparisons: E. coli sepsis versus: asymptomatic P = 6.17 × 10^−3, SIRS P = 0.07, other sepsis P = 9.19 × 10^−3. Other sepsis versus: SIRS P = 0.24, asymptomatic P = 0.39. Asymptomatic versus SIRS P = 0.08) (d); E. coli Autographiviridae phages (Brown–Forsythe and Welch ANOVA P = 0.32) (e). f, Some individual E. coli phages are prevalent across multiple samples, with E. coli sepsis samples reflecting phages more highly represented across the group (mean 0.1524 representation across E. coli sepsis E. coli phages versus 0.01231 for asymptomatic, 0.03364 for SIRS, and 0.01631 for other sepsis samples. Kruskal–Wallis P = 3.25 × 10^−113 with two-sided Dunn’s multiple comparisons to E. coli sepsis group. E. coli sepsis versus: asymptomatic P = 1.70 × 10^−89, SIRS P = 1.53 × 10^−15, other sepsis P = 5.10 × 10^−73). g, Stacked bar plot of phage representation across samples by group of E. coli phages subset by representation level in E. coli sepsis. h, E. coli phage morphology by phage representation groups. All violin plots have median and quartiles shown by dashed lines. NS, not significant. *P < 0.05, **P < 0.01, ****P < 0.0001.
There was considerable diversity of E. coli phage hits, with more than 500 unique phages identified across the 36 patients with E. coli sepsis. We separated these phages into three groups based on each phage’s prevalence across samples. Most strikingly, there was a group of broadly represented, or ubiquitous, E. coli phages present in nearly all patients with E. coli sepsis but not other groups. A subset of moderately represented phages were present in many but not all patients with E. coli sepsis. The remaining phages were present in only a few samples; in fact, the majority of phages were present in only one or two patients (Fig. 3f). We then analysed the characteristics of the individual phages by the aforementioned prevalence-based groups. We found differences in the E. coli phage families (and therefore morphology) that corresponded to how common the phage was amongst patients. Broadly represented E. coli phages possess clear enrichment for Siphoviridae phages; Podoviridae phages predominate in the moderately represented subset of phage; and the rare phage subset was predominantly Myoviridae phages (Fig. 3g).
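The prevalence-based grouping can be sketched in Python as below. Prevalence is taken as the fraction of samples in which a phage is detected. The cutoffs (0.75 and 0.25) and the group labels are hypothetical; the text does not report the boundaries used to define the three groups.

```python
def prevalence_groups(samples, broad_min=0.75, moderate_min=0.25):
    """Assign each phage to a prevalence group across samples.

    samples: list of sets, each holding the phage names detected in one sample.
    broad_min, moderate_min: hypothetical cutoffs on the fraction of samples
        containing the phage (not taken from the source).
    Returns a dict mapping phage name -> "broad", "moderate" or "rare".
    """
    n = len(samples)
    if n == 0:
        return {}
    groups = {}
    for phage in set().union(*samples):
        prevalence = sum(phage in s for s in samples) / n
        if prevalence >= broad_min:
            groups[phage] = "broad"
        elif prevalence >= moderate_min:
            groups[phage] = "moderate"
        else:
            groups[phage] = "rare"
    return groups
```

Family or host-strain composition can then be tallied separately within each group.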
E. coli phages are well studied, and 65.56% of these phages have a clear bacterial host strain noted in the literature (Supplementary Data (Coliphage dictionary)). Using this rich literature captured in the CPD, we annotated bacterial host to the level of the bacterial strain for the E. coli phages in each prevalence subset. We found that lab strains of E. coli, most derived initially from patient samples, were common hosts across each phage subset. Phages associated with toxin-producing strains of E. coli (STEC and ETEC) were overrepresented in the broadly represented group of phages (Fig. 3h)—most patients with E. coli sepsis had hits to these phages. A more global view of all E. coli phages in patients with E. coli sepsis showed an overrepresentation of STEC-associated phages (Extended Data Fig. 5a) with a corresponding decrease in environment-associated bacterial strains (sewage, manure or water).
Together, these data indicate that E. coli phages provide an ecological view of E. coli infections in the context of sepsis—with phages from samples of true infection being associated with toxin-producing E. coli host pathovariants. This level of granularity may have utility in distinguishing true infection from colonization.
Phages specify bacterial species in Staphylococcus infections
One difficulty in plasma-based microbial analyses is that genomes of related bacterial species may be difficult to distinguish due to short cfDNA fragments and comparatively large genomes that may share many conserved genes. In the case of Staphylococcus infections, bacterial cfDNA can potentially underperform in distinguishing species within a bacterial genus. We asked whether phages can help regain some of the lost ‘resolution’ of bacterial DNA. We assessed whether bacteriophage ecology differed between S. aureus, the presence of which in blood culture is always considered pathogenic, and coagulase-negative Staphylococcus species (CoNS), the presence of which is often due to contamination. Both share many core genes44, which if detected in cfDNA are difficult to attribute to either confidently, though culture techniques adequately distinguish these species in clinical contexts.
We found that, when mapping to a S. aureus reference genome, bacterial cfDNA was unable to distinguish between S. aureus and CoNS in either our cohort (Fig. 4a) or the SepSeq cohort (Fig. 4b). Consistent with this, the corresponding receiver operating characteristic (ROC) plots have areas under the curve (AUCs) near 0.5 (Fig. 4c). Notably, however, the populations of phage that infect S. aureus and CoNS are distinct and the diversity of S. aureus-specific phages was informative as there was a significant increase in S. aureus phage diversity only in S. aureus infections (Fig. 4e). This is reflected in a ROC plot with AUC of 0.96 (Fig. 4f). While the AUC based on phage diversity was higher than that based on bacterial DNA (Fig. 4f), this increase did not reach significance in our smaller cohort (Fig. 4d) and calls for further validation in a separate, more highly powered cohort.
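The AUC reported for a single score such as phage SDI can be computed directly from the two groups of values, without tracing the ROC curve, because it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative sample (ties count as one half). The sketch below is a generic illustration of that identity, not the software used in this work. AUC relates to the Mann–Whitney U statistic as U / (n_pos × n_neg).

```python
def roc_auc(positives, negatives):
    """Area under the ROC curve for one score.

    positives: scores for the target group (for example, S. aureus cultures).
    negatives: scores for the comparison group (for example, CoNS cultures).
    Computed as P(positive > negative) + 0.5 * P(positive == negative).
    """
    if not positives or not negatives:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 means the score does not separate the groups, as with the bacterial read counts described above, while values near 1 indicate near-complete separation.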
Fig. 4 |. Phages specify bacterial species in Staphylococcus infections.

a, Reads mapping to S. aureus bacterial genome in Stanford sepsis cohort by infection category (mean coagulase-negative, or coag-negative Staphylococcus (CoNS): 3, mean S. aureus: 3, Mann–Whitney test P > 0.99). One sample with both S. aureus and CoNS cultures excluded. b, Reads mapping to S. aureus bacterial genome in SepSeq by infection category (mean CoNS: 2.57, mean S. aureus: 2.38, Mann–Whitney test P = 0.844). c, ROC plot for S. aureus bacterial genome in distinguishing S. aureus culture versus CoNS culture (AUC Stanford sepsis cohort 0.500, 95% confidence interval 0.146 to 0.854, P > 0.99, AUC SepSeq 0.544, 95% confidence interval 0.225 to 0.858, P = 0.772). d, S. aureus phage SDI by infection category in Stanford sepsis cohort (mean CoNS: 0.78, mean S. aureus: 1.78, Mann–Whitney test two-sided P = 0.42). One sample with both S. aureus and CoNS cultures excluded. e, S. aureus phage SDI by infection category in SepSeq (mean CoNS: 0.553, mean S. aureus: 3.54, Mann–Whitney test two-sided P = 0.0012). f, ROC plot for S. aureus phage SDI in distinguishing S. aureus culture from CoNS culture (AUC Stanford sepsis cohort 0.667, 95% confidence interval 0.339 to 0.995, P = 0.337, AUC SepSeq 0.964, 95% confidence interval 0.877 to 1, P = 0.0026). g, Correlation matrix by culture status, phage diversity and S. aureus mapping bacterial reads in SepSeq samples (S. aureus culture and S. aureus phage SDI Pearson R: 0.81, 95% confidence interval 0.736 to 0.861, two-sided P = 1.65 × 10^−29, CoNS culture and S. aureus phage SDI Pearson R: 0.15, 95% confidence interval −0.031 to 0.32, two-sided P = 0.10, S. aureus culture and S. aureus bacterial reads Pearson R: 0.08, 95% confidence interval −0.098 to 0.254, two-sided P = 0.37, S. epidermidis bacterial reads and S. aureus culture Pearson R: −0.43, 95% confidence interval −0.759 to 0.060, two-sided P = 0.08). Pearson R values for correlation of S. aureus infection with S. aureus phage SDI and bacterial reads are highlighted with a blue box. NS, not significant. **P < 0.01.
These data suggest that increased diversity of S. aureus phages is sensitive and specific for S. aureus infections. This relationship can also be seen in the correlation plot where S. aureus phage diversity correlates strongly with S. aureus infection, which was not the case for CoNS phage diversity and CoNS infection (Fig. 4g). Taken together, subgroups of phage, specifically S. aureus phage, are potential targets for comparing infections in related bacterial species where genetically similar pathogens complicate NGS analyses.
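A correlation between culture status and phage diversity of the kind summarized in the correlation plot can be illustrated with a plain Pearson coefficient in which culture status is coded as 0 or 1. The sketch below is generic; the binary coding and the function name are assumptions made for illustration, and confidence intervals and P values are not computed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    For a binary x (for example, 1 = S. aureus culture positive, 0 = not),
    this is the point-biserial correlation with the continuous y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```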
Discussion
We report here that bacteriophage sequences are present in cfDNA from plasma and that this information can be used to non-invasively study bacteriophage ecology in bacterial infection. We demonstrate the utility of this approach in two independent datasets as well as in their associated asymptomatic controls, all generated using conventional NGS methods.
The data presented here indicate that all individuals have a circulating phageome. The phageome is reflective of bacteria-associated skin and gut commensals—suggesting potential translocation of phages from the gastrointestinal tract into circulation. However, low biomass of phage versus human DNA in cfDNA limits more specific analyses, such as generating quality phage contigs that could help to distinguish between integrated prophage and free phage DNA. The origins and functional significance of phage cfDNA in the human body await further investigation.
We found little global alteration in the phageome in bacterial sepsis. We demonstrate that phages can be used to study bacteriophage ecology in infection and to distinguish between closely related bacterial species that cause true infections and contamination. This potential was evident in the group of patients with E. coli sepsis, where we showed that phages ubiquitous across patients with E. coli sepsis tend to associate with toxin-producing pathovariant hosts. We showed that bacteriophages may help recover lost resolution in plasma cfDNA when comparing closely related bacterial species such as S. aureus and CoNS. Future experimental studies in pre-clinical models are needed to inform our understanding of phage cfDNA in infection.
While the overrepresentation of phages corresponding to bacterial pathogens in sepsis appears robust (the parallel increase in bacterial cfDNA during sepsis is the foundation of various existing diagnostic platforms11,34,45–47), its biological meaning is unclear. It is important to note that some phages may have broader bacterial host ranges than assumed48—which may complicate analyses. Because bacteria may harbour multiple prophages49, our observed increase in diversity may reflect lysis by multiple phages due to rapid bacterial expansion or stress. However, other explanations are possible. Understanding phage dynamics over the course of an infection and in polymicrobial infections will increase robustness of phageome studies in infection. Further studies are needed to investigate how phages gain access to the bloodstream and how this varies by site of infection and bacterial burden. Larger studies to capture phage heterogeneity in the context of infection are needed before attempting to identify diagnostically useful phage targets.
This study has several limitations. One limitation is that both cohorts in this study were collected at the same location—the Stanford Emergency Department. However, sample preparation, year of acquisition, and sequencing methods differed between the groups. Additionally, because of the breadth of sepsis aetiologies, each individual type of infection had relatively low numbers of samples, making comparisons challenging. Furthermore, the asymptomatic control plasma samples—acquired commercially—were only from male donors, limiting the generalizability of the asymptomatic phageomes (Table 2). Another limitation is that low microbial cfDNA biomass, and the resulting sparsity of reads, restricts such analyses to known phages. Though microbial cfDNA biomass may be higher at infection sites, we believe that the trade-off of a non-invasive approach facilitates easier acquisition of patient samples and enables retrospective study in banked plasma samples from infected patients.
Table 2 |.
Plasma sample donor demographic characteristics
| Sepsis patient characteristics (N=61) as reported by donor at recruitment | No. |
|---|---|
| Age | |
| Mean (60.1 years), standard deviation (20.59 years), range (19–97 years) | |
| Sex | |
| Female | 22 |
| Male | 39 |
| Race | |
| Asian | 8 |
| Black or African American | 2 |
| Hispanic/Latino | 9 |
| White | 30 |
| More than one race | 9 |
| Unknown/unreported | 3 |
| Asymptomatic sample donor characteristics (N=10) as provided per commercial single-donor plasma source | |
| Age | |
| Mean (32.2 years), standard deviation (9.83 years), range (19–47 years) | |
| Sex | |
| Male | 10 |
| Race | |
| Black or African American | 3 |
| Hispanic/Latino | 4 |
| White | 3 |
In summary, we showed here that there is a circulating phageome detectable in both asymptomatic and infected individuals using NGS of cfDNA, and that phage cfDNA can help to investigate bacterial and phage ecology in infection at the population level. We believe there is great promise in using phages to non-invasively study bacterial infection ecology, and that further exploration of bacteriophage ecology and dynamics in the context of bacterial infections will lay the foundation for future efforts to utilize phages in non-invasive diagnostics.
Methods
Our research, which utilizes plasma samples from patients, complies with ethical regulations: the protocol and informed consent were reviewed by Stanford University’s institutional review board (32851).
Sample collection
The Emergency Department Sepsis Biobank study protocol and informed consent (32851) were reviewed and approved by Stanford University’s institutional review board. Samples were collected from patients presenting to the Stanford University Hospital Emergency Department who triggered a sepsis alert. Collection included a blood sample at the time of enrolment as well as the results from standard of care microbiological testing that patients received during the first seven days of their admission. Patients were eligible to enrol if they triggered a sepsis alert, were 18 years of age or older, had a temperature of >38 °C or <36 °C and met at least one SIRS criterion (heart rate >90 beats per minute; respiratory rate >20 breaths per minute or a partial pressure of carbon dioxide <32 mm Hg; white blood cell count of either >12,000 cells μl−1 or <4,000 cells μl−1 and >10% bands). No compensation was provided for participants.
Blood samples were collected from peripheral blood draw or indwelling venous catheter into BD Vacutainer EDTA blood collection tubes (Becton Dickinson) at the same time as the patients were undergoing blood draw for standard of care blood cultures. In the vast majority of patients, this occurred before antibiotic administration. Samples were then stored at 4 °C for no more than 72 h, centrifuged at 1,500g for 10 min to obtain plasma, and stored thereafter at −80 °C. Samples were de-identified using a unique code that could be linked to the patient identifier to later obtain gold-standard microbiological testing results for each patient. Plasma samples from 61 patients with positive blood culture results were randomly selected for further phageome analyses as described below. No statistical method was used to pre-determine sample size but our sample sizes are similar to those reported in previous publications15,46,47.
Asymptomatic control plasma was purchased from Innovative Research (IPLASK2E2ML Novi). Demographic information for samples from both the Stanford sepsis cohort and the asymptomatic control group is presented in Table 2.
DNA extraction, library preparation and NGS
DNA was extracted from 1 ml of each human plasma sample using the DNeasy Blood and Tissue kit (Qiagen, 69504) according to the manufacturer’s protocol and recommendations. Extracted DNA was delivered to SeqMatic for DNA library preparation, which was performed using the Nextera XT library preparation kit (Illumina, FC-131–1096), followed by sequencing on the NovaSeq 6000 platform (1 × 100 bp) at an average of 18.67 million reads per sample, which, due to low phage biomass, corresponds to a sequencing depth per phage of less than 1 for most phages detected.
Nuclease-free water and phosphate-buffered saline (PBS) were run through the DNeasy Blood and Tissue kit DNA extraction protocol and used as controls in the Nextera XT library preparation to characterize ambient DNA or contaminating DNA in kits used for DNA extraction or library preparation. After adapter trimming and human read removal (as described below), a blast search was done against the full nucleotide database. Blast output, including GI/subject sequence ID, taxonomic ID, name, per cent identical match, e-value, and query sequence, along with taxonomic category (human, eukaryotic, bacterial, viral, bacteriophage, Plantae, other/synthetic, and undescribed environmental) is provided in Supplementary Data (Negative Control Summary, PBS Control BLAST Hits, Water Control BLAST Hits).
The samples from which publicly available data were used (Karius Inc., accession PRJNA507824) were described in their original manuscript (https://doi.org/10.1038/s41564-018-0349-6) as being thawed, spiked with synthetic normalization molecule controls and centrifuged at 16,000g for removal of human cells. The cfDNA in the remaining plasma was extracted using the Mag-Bind cfDNA Kit in the Hamilton STAR liquid handling workstation. Libraries were created using the Ovation Ultralow System V2 kit and were sequenced on the Illumina NextSeq500 using a 75-cycle single end run at an average 24 million reads per sample. These samples were downloaded using the sratoolkit v2.10.9 using the accession list from PRJNA507824 with the ‘--gzip’ option.
Corresponding DNA template-free library preparations were used by Karius Inc. in the preparation of their SepSeq manuscript. They found that only the samples with IDs SRR8288617, SRR8288832 and SRR8288643 were in a batch with a single Lactococcus phage contaminant: one asymptomatic control and two samples of different infection aetiologies (Staphylococcus and Streptococcus).
Sequencing read QC
Raw sequencing read quality was assessed using the FASTQC software35. Trimming of adapter sequences and low-quality individual reads was done using Trimmomatic 0.39 (ref. 36) with the following settings: SE -phred33 -threads 8 ILLUMINACLIP:TruSeq3-SE.fa:2:30:10:2:keepBothReads LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36.
FASTQC reports were moved to a single directory and MultiQC50 was used on this directory with default options to assemble a quality report across all samples post-trimming to confirm average read quality above the default trimming threshold.
Human read removal
Reads passing quality control (QC) were then mapped to human reference genome GRCh38 with Bowtie2 v2.4.4 (ref. 51) with default options generating an output sam file.
Reads that did not map were considered to be putative non-human reads. The output sam file was converted to a bam file using the samtools52 view -bS command. The non-mapping reads were extracted using samtools view with the options ‘-b -f 0x4’. The non-human reads were converted from a bam file to a FASTQ file using the samtools bam2fq command.
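The role of the 0x4 flag in this step can be illustrated with a minimal Python sketch. It is not the pipeline itself, which used samtools; the function name `split_by_mapping` and the example SAM records are hypothetical and serve only to show how the unmapped bit in the SAM FLAG field separates putative non-human reads from human-mapping reads.

```python
UNMAPPED = 0x4  # SAM FLAG bit set when a read did not align to the reference


def split_by_mapping(sam_lines):
    """Return (mapped_ids, unmapped_ids) from SAM text lines, skipping headers."""
    mapped, unmapped = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        fields = line.rstrip("\n").split("\t")
        read_id, flag = fields[0], int(fields[1])
        # 'samtools view -f 0x4' keeps reads with this bit set; '-F 0x4' drops them
        (unmapped if flag & UNMAPPED else mapped).append(read_id)
    return mapped, unmapped


# Hypothetical records for illustration only
example_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tFFFF",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tFFFF",
    "read3\t16\tchr2\t500\t30\t4M\t*\t0\t0\tGGCC\tFFFF",
]
mapped_ids, unmapped_ids = split_by_mapping(example_sam)
print(unmapped_ids)
```

In this toy input only `read2` carries the unmapped bit, so it is the only read that would proceed to the non-human branch of the analysis.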
Seqtk v1.3 was used to convert FASTQ files to FASTA files53 using the command ‘seqtk seq -A input.fastq > output.fasta’.
These sequences were then subjected to an additional layer of human read removal: BLAST v2.13.0 (refs. 38,39) was then used to align putative non-human reads to all human-associated sequences (those with taxids 9606, 63221 or 741158) in NCBI Nucleotide (www.ncbi.nlm.nih.gov/nuccore), retrieved on 28 November 2021. The blast search was done with an output format of ‘6 qseqid sseqid pident length evalue stitle’ with a culling limit of 1 and an e-value cut-off of 0.0005.
All reads with hits to human sequences with an e-value below 0.0005 were discarded, and all remaining reads were considered to be non-human. This was done by using the command ‘awk '{print $1}' blast_output.txt > toremove.txt’, which extracted the sample read IDs (‘qseqid’) from the blast output tables. Seqkit v2.2.0 was then used to remove these reads and resulted in FASTA files that had been subjected to additional human sequence depletion, using the command ‘seqkit grep -v --pattern-file toremove.txt sample.fasta > depleted_sample.fasta’.
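The logic of this depletion step (collect the query IDs that have a BLAST hit, then drop those records from the FASTA file) can be sketched in Python as follows. This is an illustration rather than the commands used in the study. The function names, the example records and the explicit e-value check are assumptions; in the pipeline the cut-off was applied by BLAST itself and the awk step only extracted the first column.

```python
def ids_with_hits(blast_lines, evalue_col=4, max_evalue=0.0005):
    """Collect query IDs (first column) of tabular BLAST hits at or below max_evalue.

    evalue_col=4 matches the column order 'qseqid sseqid pident length evalue stitle'.
    """
    hits = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > evalue_col and float(fields[evalue_col]) <= max_evalue:
            hits.add(fields[0])
    return hits


def filter_fasta(fasta_lines, ids, invert=True):
    """Yield FASTA lines; invert=True drops records whose ID is in ids
    (like 'seqkit grep -v'), invert=False keeps only those records."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            read_id = line[1:].split()[0]
            keep = (read_id in ids) != invert
        if keep:
            yield line


# Hypothetical inputs for illustration only
blast = [
    "r1\tNC_000001.11\t98.0\t75\t1e-30\tHomo sapiens chromosome 1",
    "r3\tXX000000.1\t90.0\t40\t0.01\tweak hit above the cut-off",
]
fasta = [">r1 sample read", "ACGT", ">r2", "GGCC", ">r3", "TTAA"]

human_ids = ids_with_hits(blast)
depleted = list(filter_fasta(fasta, human_ids))
print(depleted)
```

Calling `filter_fasta` with `invert=False` gives the complementary behaviour, which corresponds to keeping only reads that have a hit.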
Superkingdom read annotation
Superkingdom distribution of non-human reads was determined by nucleotide BLAST search against the full NCBI Nucleotide database (retrieved on 28 November 2021 as above). A negative taxid list containing primate taxids (9606, 63221 and 741158) was provided to avoid any incorrect annotations that could result from residual human-origin reads not removed by the above rigorous human read depletion. A culling limit of 1 and an e-value cut-off of 0.0005 were used. Output format was set by ‘6 qseqid sseqid length pident evalue staxids stitle’.
The distribution of superkingdoms was assessed by calculating the proportions of all hit taxids that corresponded to eukaryotic, viral (mammalian or bacteriophage), and archaeal organisms. The lists of these taxids were obtained from the NCBI taxonomy database. The calculation of distribution was done in R Statistical Software (v4.1.2; R Core Team 2021 (ref. 54)): a summary table counting the number of reads corresponding to each superkingdom was created, and the corresponding proportions were then calculated for proportional visualization in GraphPad Prism.
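The summary-table step can be sketched as a simple count-then-normalize operation. The study performed this in R; the Python version below is only an illustration, and the function name and the example labels are hypothetical.

```python
from collections import Counter


def superkingdom_proportions(read_labels):
    """Map each superkingdom label to its share of all annotated reads."""
    counts = Counter(read_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical per-read annotations for a single sample
labels = ["Bacteria"] * 3 + ["Eukaryota"] * 5 + ["Phage"] * 2
proportions = superkingdom_proportions(labels)
print(proportions)
```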
Bacteriophage annotations
The CPD was constructed from phage genomes retrieved from NCBI Nucleotide on 12/07/2021 (NCBI, US National Library of Medicine, retrieved as above); the search was restricted to viral sequences and used terms ‘phage’ or ‘bacteriophage’. Duplicate sequences, or sequences of only open reading frames, genes, proteins, regions or modified genomes were excluded. This yielded 26,159 phage sequences, which were combined into a single FASTA file and used to generate a local phage BLAST database. Cleaned and processed reads were annotated by performing a BLAST alignment search of non-human reads against the CPD. The CPD BLAST database was used as the ‘-db’, word size was set to 28 and an e-value of 0.0005 and culling limit of 1 were set. Output format was set to ‘6 qseqid sseqid pident length evalue stitle’. The reads with hits to phage sequences were then extracted from the FASTA file. This was done by using the command ‘awk '{print $1}' blast_output.txt > tokeep.txt’, which extracted the sample read IDs (‘qseqid’) from the blast output tables.
Seqkit was then used to keep only these reads and resulted in FASTA files with only potential phage hits, using the command ‘seqkit grep --pattern-file tokeep.txt sample.fasta > subset_sample.fasta’.
All reads with bacteriophage hits below the significance threshold were then subjected to a BLAST alignment search limited to the subset of the NCBI Nucleotide database with human taxonomic IDs, to identify bacteriophage-matching reads with human homology. This was done using a BLAST database created from sequences with the taxid 9606 retrieved on 28 November 2021. The blast search was run with an e-value cut-off of 0.0005, culling limit of 1, 8 threads and word size of 28. Output format was set using ‘6 qseqid sseqid pident length evalue stitle’. The reads with both phage hits and hits to human sequences with an e-value of 0.0005 or less were then flagged for removal using the command ‘awk '{print $1}' blast_output.txt > toremove.txt’, which extracted the sample read IDs (‘qseqid’) from the blast output tables.
Seqkit was then used to remove these reads, resulting in FASTA files containing only sequences with phage hits and no human genome homology, using the command ‘seqkit grep -v --pattern-file toremove.txt subset_sample.fasta > depleted_subset_sample.fasta’.
The remaining sequences were subjected to a second BLAST search against the CPD with a word size of 28, use of 8 threads, an e-value cut-off of less than 0.0005, and a culling limit of 10 to capture potential identities of various bacteriophages for reads covering highly conserved sequences across bacteriophages. Output included qseqid, sseqid, stitle, pident, length and e-value.
The CPD, which includes phages and their bacterial host(s), was created using annotations from virus-host DB40, from the phage NCBI taxonomy entry, by phage name, and manual curation from manuscripts associated with the phage sequence submission. Phages with only genus-level host detail available were noted as such. Further characterization of host bacterial qualities was performed through literature search. Accessions from the Cenote Human Virome Database32 gut phages were used to identify proportions of annotated, uncharacterized phages that have been found to be gut phages. The CPD is publicly available at https://doi.org/10.5281/zenodo.7154236. Of note, the CPD is reflective of a bias in nucleotide database enrichment for human-associated pathogens and their phages, and as such underrepresents environmental phages.
Interpretation of phage annotations
Phage sequence hits were analysed using R. Phage accession IDs were extracted and the taxize v0.9.99 package was used to assign corresponding taxonomic lineage information55. Phage diversity was calculated using the SDI41 through the R package vegan v2.5-7 (ref. 56). R code for the use of the CPD along with blast outputs for interpretation of phageome representation of phage families and host genus/characteristics is available at https://doi.org/10.5281/zenodo.7734114.
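As an illustration of the diversity calculation, the sketch below computes a Shannon index, $H = -\sum_i p_i \ln p_i$, from per-phage hit counts. It assumes that SDI refers to the Shannon diversity index with natural logarithms, which is the default of the `diversity` function in vegan; the function name and the example counts are hypothetical, and the study's own R code is the authoritative implementation.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over phages with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per distinct phage in one sample
even_sample = [5, 5, 5, 5]   # four equally represented phages
single_phage = [10]          # all hits to one phage
print(shannon_diversity(even_sample), shannon_diversity(single_phage))
```

Under this definition a sample with hits to a single phage has an index of 0, and two equally represented phages give ln 2 ≈ 0.69, so the index rises with both the number of distinct phages and the evenness of their representation.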
A Pearson dissimilarity matrix was generated using the R package factoextra v1.0.6 (ref. 57). Genetically defined taxonomic lineages were used from a publicly available database43 for alternative analyses. The subset of E. coli phages was analysed with the genetically defined lineages from INPHARED database v1.7 (ref. 43) used in place of the phage family identities present in the CPD for creating family summary tables and subsequent comparisons in GraphPad Prism. A Sankey diagram of phage reclassification of identified E. coli phages was created using SankeyMATIC; it shows on the left the original NCBI taxonomy defined taxonomic family and on the right the taxonomic family identified in the INPHARED database.
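A Pearson dissimilarity between two samples can be sketched as one minus the Pearson correlation of their phage profiles. The Python below assumes that definition (correlation-based distances are commonly defined this way) and uses hypothetical sample names and profile values; it is not the factoextra code used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric profiles.

    Raises ZeroDivisionError if either profile has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_dissimilarity_matrix(profiles):
    """Nested dict of 1 - r for every pair of named sample profiles."""
    names = list(profiles)
    return {
        a: {b: 1.0 - pearson_r(profiles[a], profiles[b]) for b in names}
        for a in names
    }


# Hypothetical host-genus proportion profiles for three samples
profiles = {"s1": [1, 2, 3], "s2": [2, 4, 6], "s3": [3, 2, 1]}
dissimilarity = pearson_dissimilarity_matrix(profiles)
print(dissimilarity["s1"]["s2"], dissimilarity["s1"]["s3"])
```

Values range from 0 for perfectly correlated profiles to 2 for perfectly anti-correlated profiles.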
Bacterial mapping
Bacterial reads were assessed through mapping to the reference S. aureus bacterial genome NC_007795.1 using Bowtie2 v2.4.4 (ref. 51) with default settings. The number of mapping reads was obtained using the samtools view command with the flag ‘-F 0x04’, which excludes unmapped reads, applied to the SAM file output.
Statistics and reproducibility
Statistics were performed using Prism 9.3.1 (GraphPad Software). Data for pathogen-associated phage proportion and diversity were found to not have normal distribution by Shapiro–Wilk test, and as such, statistical tests not assuming Gaussian distribution were utilized. Statistical tests used included Mann–Whitney tests, Kruskal–Wallis test with Dunn’s multiple comparisons, ROC curve generation and Pearson correlations. Number of phage by morphology was assumed to be normal but not formally tested and was analysed using a Brown–Forsythe and Welch analysis of variance (ANOVA) test with multiple comparison. Proportion of gut-metagenome phages was assumed to be normal but not formally tested and was analysed using an unpaired t-test. Please see figure legends for specific statistical tests performed in each figure.
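The link between the Mann–Whitney test and the ROC analysis can be made concrete with a short sketch: the area under the ROC curve equals the Mann–Whitney U statistic divided by the number of positive–negative pairs. The Python below is illustrative only. Statistics in this study were computed in Prism, the function names are hypothetical, the example values are invented, and no P value is calculated.

```python
def mann_whitney_u(positives, negatives):
    """U statistic for the positive group: count of (positive, negative) pairs
    where the positive value is larger, with ties counted as 0.5."""
    u = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u


def roc_auc(positives, negatives):
    """AUC as the probability that a random positive outranks a random negative."""
    return mann_whitney_u(positives, negatives) / (len(positives) * len(negatives))


# Hypothetical phage diversity values for culture-positive and comparison samples
culture_positive = [3, 4, 5]
comparison = [1, 2, 3]
auc = roc_auc(culture_positive, comparison)
print(auc)
```

An AUC of 0.5 corresponds to no separation between groups, and an AUC of 1 to complete separation.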
No statistical method was used to pre-determine sample size, and these were based on availability of patient samples. No data were excluded from analyses. Plasma samples from both sepsis and asymptomatic groups were randomly selected for processing and downstream analysis. Investigators were blinded during processing of plasma samples (DNA extraction and library preparation) and generation of phageome annotations, but not blinded during analysis of final phageome annotations with respect to infection aetiology.
Materials
All commercially obtained materials used in this study have catalogue numbers noted in relevant Methods sections. The entire volumes of human plasma samples were used as described in Methods and therefore cannot be made available.
Extended Data
Extended Data Fig. 1 |. Non-Human reads in Asymptomatic and Septic individuals.

(A) Average proportion of bacterial hit genus in asymptomatic (N = 10) and septic (N = 61) nonhuman cfDNA as identified by BLAST search. (B) Violin plots of proportions of non-human read identities by BLAST search from both asymptomatic (N = 10) and septic (N = 61) individuals. Descriptive statistics are available in Extended Data Table 1. (C) Proportions of bacteriophage hits removed in secondary human sequence homolog removal step (mean 0.042 SD 0.038, N = 71). (D) Average distribution of unique phages by bacterial host genus with and without secondary human sequence homology removal (N = 71). Uncleaned Hits, mean proportions and SD (Pseudomonas: mean 0.029 SD 0.30, Enterobacter mean 0.021 SD 0.015, Escherichia mean 0.042 SD 0.038, Klebsiella mean 0.015 SD 0.020, Salmonella mean 0.013 SD 0.009, Not Annotated mean 0.603 SD 0.070, Staphylococcus mean 0.009 SD 0.011, Streptococcus mean 0.014 SD 0.036, Enterococcus mean 0.001 SD 0.002, Bacillus mean 0.006 SD 0.007, Other mean 0.245 SD 0.052), Cleaned Hits, mean proportions and SD (Pseudomonas: mean 0.047 SD 0.058, Enterobacter mean 0.050 SD 0.058, Escherichia mean 0.076 SD 0.065, Klebsiella mean 0.023 SD 0.036, Salmonella mean 0.014 SD 0.016, Not Annotated mean 0.35 SD 0.100, Staphylococcus mean 0.020 SD 0.040, Streptococcus mean 0.032 SD 0.076, Enterococcus mean 0.002 SD 0.004, Bacillus mean 0.012 SD 0.015, Other mean 0.373 SD 0.123). All violin plots are shown with individual data points with median and quartiles shown by dashed lines.
Extended Data Fig. 2 |. Bacterial host distribution does not change in sepsis, though individual variation remains.

(A) Violin plot of bacteriophage host genus proportions between Asymptomatic (N = 10) and Septic (N = 61) patient samples, associated statistics are in Extended Data Table 2. (B) Heatmap of Pearson dissimilarity matrix between patients with sepsis (N = 61). (C) Heatmap of Pearson dissimilarity matrix between asymptomatic controls (N = 10). (D) Histogram of prevalence across sequenced samples of each phage. (E) Phage bacterial host genus proportions per asymptomatic patient (N = 10). (F) Phage bacterial host genus proportions per septic patient (N = 61).
Extended Data Fig. 3 |. Proportion of ‘Not Annotated’ Phages from CHVD Gut Metagenome Phages.

(A) Proportion of ‘Not Annotated’ Phages from Gut Metagenome Phages in Stanford Sepsis Cohort (Asymptomatic mean: 0.413 SD: 0.060, N = 10. Sepsis mean: 0.300 SD:0.119, N = 61. Unpaired two-sided t test, P = 0.0046) (B) Proportion of ‘Not Annotated’ Phages from Gut Metagenome Phages in SepSeq cohort (Asymptomatic mean: 0.464 SD:0.271 N = 10. Sepsis mean: 0.418, SD:0.29, N = 61. Unpaired two sided t test, P = 0.101).
Extended Data Fig. 4 |. Number of E. coli phages by genetically classified taxonomic phage family.

(A) Sankey diagram of taxonomic family classifications from NCBI Taxonomy classification (Left) to genetically classified family (Right). (B) Number of E. coli phages by genetically classified taxonomic phage family tested by Brown-Forsythe and Welch ANOVA test with two-sided Dunnett’s T3 test multiple comparisons for Asymptomatic (N = 166), SIRS (N = 95), Other Sepsis (N = 55) and E. coli Sepsis (N = 36).
Extended Data Fig. 5 |. E. coli Phage Host Characteristic Proportions.

(A) Proportion of E. coli phage host characteristics in violin plots with individual data points with median and quartiles shown by dashed lines. Analyzed by two-way ANOVA with Sidak’s multiple comparisons only in samples with E. coli phage for Asymptomatic (N = 100), SIRS (N = 62), Other Sepsis (N = 36) and E. coli Sepsis (N = 36) patients. Phage characteristic source of variation P = 7.93E-292, Patient category source of variation P > 0.99, Interaction of Phage category and patient category P = 1.75E-50. Lab Strain Associated Phage (Mean: Asymptomatic 0.69, SIRS 0.58, Other Sepsis 0.45, E. coli Sepsis 0.36. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P = 4.37E-33, SIRS P = 4.45E-13, Other Sepsis P = 0.03. Other Sepsis vs: SIRS P = 5.88E-5, Asymptomatic P = 2.19E-18. Asymptomatic vs SIRS P = 1.80E-6), Unspecified Host Associated Phage (Mean: Asymptomatic 0.18, SIRS 0.17, Other Sepsis 0.27, E. coli Sepsis 0.17. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P > 0.99, SIRS P > 0.99, Other Sepsis P = 0.01. Other Sepsis vs: SIRS P = 2E-3, Asymptomatic P = 3E-3. Asymptomatic vs SIRS P = 0.994), STEC Associated Phage (Mean: Asymptomatic 0.05, SIRS 0.14, Other Sepsis 0.13, E. coli Sepsis 0.31. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P = 4.76E-22, SIRS P = 1.19E-8, Other Sepsis P = 5.22E-8. Other Sepsis vs: SIRS P = 0.998, Asymptomatic P = 0.02. Asymptomatic vs SIRS P = 1.79E-4), ETEC Associated Phage (Mean: Asymptomatic 0.03, SIRS 0.04, Other Sepsis 0.03, E. coli Sepsis 0.05. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P = 0.85, SIRS P = 0.99, Other Sepsis P = 0.98. Other Sepsis vs: SIRS P > 0.99, Asymptomatic P > 0.99. Asymptomatic vs SIRS P > 0.99), EPEC Associated Phage (Mean: Asymptomatic 3.67E-3, SIRS 0.01, Other Sepsis 6.78E-4, E. coli Sepsis 0.01. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P > 0.99, SIRS P > 0.99, Other Sepsis P > 0.99. Other Sepsis vs: SIRS P > 0.99, Asymptomatic P > 0.99. Asymptomatic vs SIRS P > 0.99), ExPEC Associated Phage (Mean: Asymptomatic 0.02, SIRS 0.04, Other Sepsis 0.04, E. coli Sepsis 0.05. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P = 0.89, SIRS P > 0.99, Other Sepsis P > 0.99. Other Sepsis vs: SIRS P > 0.99, Asymptomatic P = 0.99. Asymptomatic vs SIRS P > 0.99), Sewage/Manure/Water Associated Phage (Mean: Asymptomatic 0.02, SIRS 0.03, Other Sepsis 0.07, E. coli Sepsis 0.03. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P > 0.99, SIRS P > 0.99, Other Sepsis P = 0.73. Other Sepsis vs: SIRS P = 0.66, Asymptomatic P = 0.33. Asymptomatic vs SIRS P > 0.99).
Extended Data Table 1 |.
Asymptomatic vs septic non-human cfDNA proportions
| | Sepsis N = 61 | | | | | Asymptomatic N = 10 | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | Bacteria | Eukaryota | Archaea | Phage | Viruses | Bacteria | Eukaryota | Archaea | Phage | Viruses |
| Minimum | 0.00924 | 0.006629 | 0 | 0.03389 | 0 | 0 | 0.2485 | 0 | 0.0007375 | 0 |
| Maximum | 0.9098 | 0.9219 | 0.002253 | 0.3829 | 0.1069 | 0.6606 | 0.9602 | 0.001384 | 0.3333 | 0.05758 |
| Range | 0.9006 | 0.9152 | 0.002253 | 0.349 | 0.1069 | 0.6606 | 0.7117 | 0.001384 | 0.3326 | 0.05758 |
| Mean | 0.3697 | 0.5251 | 0.0007543 | 0.09615 | 0.00829 | 0.2549 | 0.6022 | 0.0003155 | 0.1283 | 0.01421 |
| Std. Deviation | 0.2653 | 0.2691 | 0.0004522 | 0.05743 | 0.01626 | 0.238 | 0.2144 | 0.0005755 | 0.1018 | 0.01798 |
| Std. Error of Mean | 0.03369 | 0.03417 | 0.00005742 | 0.007294 | 0.002065 | 0.07527 | 0.06781 | 0.000182 | 0.0322 | 0.005687 |
Extended Data Table 2 |.
Asymptomatic vs septic phage host proportion statistical summary
| Statistical Summary of Mann-Whitney test | | | | | | | | Descriptive Statistics Asymptomatic (N=10) | | Descriptive Statistics Sepsis (N=61) | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bacterial Genera | Below threshold? | P value | Mean rank of Asymptomatic | Mean rank of Sepsis (E. coli) | Mean rank diff. | Mann-Whitney U | Adjusted P Value | Mean | Std. Deviation | Mean | Std. Deviation |
| Campylobacter | No | 0.114 | 26.750 | 38.070 | −11.320 | 212.500 | 0.855 | 0.007 | 0.004 | 0.012 | 0.010 |
| Pseudomonas | No | 0.497 | 40.750 | 35.810 | 4.935 | 267.500 | 0.997 | 0.053 | 0.032 | 0.053 | 0.068 |
| Burkholderia | No | 0.776 | 34.750 | 36.780 | −2.032 | 292.500 | 0.998 | 0.005 | 0.005 | 0.009 | 0.014 |
| Acinetobacter | No | 0.176 | 28.100 | 37.850 | −9.755 | 226.000 | 0.933 | 0.006 | 0.004 | 0.015 | 0.019 |
| Escherichia | No | 0.004 | 19.150 | 39.300 | −20.150 | 136.500 | 0.067 | 0.052 | 0.038 | 0.124 | 0.097 |
| Klebsiella | No | 0.679 | 33.900 | 36.920 | −3.019 | 284.000 | 0.998 | 0.021 | 0.025 | 0.029 | 0.047 |
| Salmonella | No | 0.561 | 40.100 | 35.920 | 4.181 | 274.000 | 0.997 | 0.015 | 0.012 | 0.014 | 0.016 |
| Shigella | No | 0.421 | 32.100 | 37.210 | −5.110 | 266.000 | 0.997 | 0.001 | 0.002 | 0.005 | 0.010 |
| Aeromonas | No | 0.215 | 44.200 | 35.260 | 8.942 | 233.000 | 0.957 | 0.015 | 0.009 | 0.013 | 0.012 |
| Not_annotated | No | 0.045 | 48.800 | 34.520 | 14.280 | 187.000 | 0.539 | 0.478 | 0.072 | 0.415 | 0.111 |
| Synechococcus | No | 0.844 | 35.350 | 36.690 | −1.335 | 298.500 | 0.998 | 0.011 | 0.025 | 0.006 | 0.015 |
| Staphylococcus | No | 0.864 | 35.400 | 36.680 | −1.277 | 299.000 | 0.998 | 0.013 | 0.025 | 0.022 | 0.052 |
| Streptococcus | No | 0.655 | 39.100 | 36.080 | 3.019 | 284.000 | 0.998 | 0.021 | 0.032 | 0.028 | 0.061 |
| Lactococcus | No | 0.774 | 35.300 | 36.690 | −1.394 | 298.000 | 0.998 | 0.002 | 0.004 | 0.013 | 0.047 |
| Mycobacterium | No | 0.031 | 48.900 | 34.500 | 14.400 | 186.000 | 0.428 | 0.045 | 0.044 | 0.020 | 0.040 |
| Gordonia | No | 0.409 | 41.100 | 35.760 | 5.342 | 264.000 | 0.997 | 0.011 | 0.016 | 0.011 | 0.019 |
| Dickeya | No | 0.143 | 27.450 | 37.960 | −10.510 | 219.500 | 0.901 | 0.012 | 0.006 | 0.016 | 0.011 |
| Cutibacterium | No | 0.440 | 40.900 | 35.790 | 5.110 | 266.000 | 0.997 | 0.082 | 0.090 | 0.055 | 0.083 |
| Other | No | 0.336 | 42.500 | 35.530 | 6.968 | 250.000 | 0.993 | 0.149 | 0.038 | 0.138 | 0.054 |
Extended Data Table 3 |.
Stanford sepsis cohort pathogen associated phage Mann-Whitney summary
| Pathogen | Pathogen associated phage proportion | | | | | Pathogen associated phage diversity | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | Mean Cx (+) | Mean Cx (−) | Difference between means (Cx(+) − Cx(−)) ± SEM | 95% CI | Two sided P value | Mean Cx (+) | Mean Cx (−) | Difference between means (Cx(+) − Cx(−)) ± SEM | 95% CI | Two sided P value |
| E. coli Cx (+) N = 16 | 0.2016 | 0.09605 | 0.1056 ± 0.03420 | 0.03362 to 0.1775 | 0.0017 | 2.816 | 2.004 | 0.8123 ± 0.3156 | 0.1562 to 1.468 | 0.0051 |
| Streptococcus Cx (+) N = 12 | 0.0951 | 0.01237 | 0.08273 ± 0.01661 | 0.04950 to 0.1160 | 0.0018 | 1.89 | 0.5455 | 1.344 ± 0.3884 | 0.5671 to 2.122 | 0.0025 |
| Staphylococcus Cx (+) N = 12 | 0.06656 | 0.01143 | 0.05513 ± 0.01540 | 0.02430 to 0.08595 | 0.0094 | 1.587 | 0.5709 | 1.016 ± 0.3194 | 0.3770 to 1.655 | 0.0192 |
| Klebsiella Cx (+) N = 7 | 0.1046 | 0.0196 | 0.08503 ± 0.01557 | 0.05389 to 0.1162 | 0.0056 | 1.631 | 0.7888 | 0.8424 ± 0.4046 | 0.03285 to 1.652 | 0.0272 |
Extended Data Table 4 |.
Stanford sepsis cohort coinfected sample summary
| Coinfecting pathogen | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Escherichia coli, Streptococcus | Sample E. coli phage proportion | 0.36 | Sample E. coli phage SDI | 3.53 | Sample Streptococcus phage proportion | 0.01 | Sample Streptococcus phage SDI | 0.69 |
| | Average E. coli (+) sample phage proportion | 0.202 | Average E. coli (+) sample phage SDI | 2.82 | Average Streptococcus (+) sample phage proportion | 0.1 | Average Streptococcus (+) sample phage SDI | 1.89 |
| Staphylococcus aureus, Klebsiella pneumoniae | Sample S. aureus phage proportion | 0.03 | Sample S. aureus phage SDI | 0 | Sample Klebsiella phage proportion | 0.24 | Sample Klebsiella phage SDI | 2.08 |
| | Average S. aureus (+) sample phage proportion | 0.07 | Average S. aureus (+) sample phage SDI | 1.59 | Average Klebsiella (+) sample phage proportion | 0.1 | Average Klebsiella (+) sample phage SDI | 1.63 |
| Escherichia coli, Klebsiella pneumoniae | Sample E. coli phage proportion | 0.16 | Sample E. coli phage SDI | 2.16 | Sample Klebsiella phage proportion | 0.2 | Sample Klebsiella phage SDI | 2.27 |
| | Average E. coli (+) sample phage proportion | 0.202 | Average E. coli (+) sample phage SDI | 2.82 | Average Klebsiella (+) sample phage proportion | 0.1 | Average Klebsiella (+) sample phage SDI | 1.63 |
Extended Data Table 5 |.
SepSeq pathogen associated phage Dunn’s multiple comparison summary
| | Pathogen associated phage proportion | | | | | | | | | | | | Pathogen associated phage diversity | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pathogen | Mean Difference SIRS vs. Other Blood Cx | two sided p value | Mean Difference SIRS vs. Blood Cx (−) Site Cx (+) | two sided p value | Mean Difference SIRS vs. Blood Cx (+) | two sided p value | Difference Other Blood Cx vs. Blood Cx (−) Site Cx (+) | two sided p value | Mean Difference Other Blood Cx vs. Blood Cx (+) | two sided p value | Difference Blood Cx (−) Site Cx (+) vs. Blood Cx (+) | two sided p value | Mean Difference SIRS vs. Other Blood Cx | two sided p value | Mean Difference SIRS vs. Blood Cx (−) Site Cx (+) | two sided p value | Mean Difference SIRS vs. Blood Cx (+) | two sided p value | Difference Other Blood Cx vs. Blood Cx (−) Site Cx (+) | two sided p value | Mean Difference Other Blood Cx vs. Blood Cx (+) | two sided p value | Difference Blood Cx (−) Site Cx (+) vs. Blood Cx (+) | two sided p value |
| E. coli | 0.056 | 0.405 | −0.19 | 9E-05 | −0.19 | 1E-05 | −0.25 | 1E-06 | −0.24 | 7E-08 | 0.0083 | >0.9999 | 0.3752 | 0.6983 | −1.901 | 4E-06 | −2.027 | 2E-08 | −2.276 | 1E-07 | −2.402 | 6E-10 | −0.1263 | >0.9999 |
| Streptococcus | 0.08214 | 0.057 | −0.1173 | >0.9999 | −0.2556 | 0.016 | −0.1994 | 0.347 | −0.3377 | 4E-04 | −0.1383 | >0.9999 | 0.2554 | 0.9898 | −0.2993 | >0.9999 | −2.005 | 0.008 | −0.5546 | >0.9999 | −2.26 | 0.001 | −1.706 | 0.4431 |
| Staphylococcus | −0.00224 | >0.999 | −0.2363 | 0.0717 | −0.424 | 0.001 | −0.2341 | 0.05 | −0.4218 | 8E-04 | −0.1877 | >0.9999 | −0.05044 | >0.9999 | −2.903 | 1E-06 | −3.893 | 2E-06 | −2.853 | 7E-06 | −3.842 | 9E-06 | −0.9895 | >0.9999 |
| Klebsiella | 0.00326 | 0.029 | NA | NA | −0.2062 | 8E-04 | NA | NA | −0.2095 | 0.024 | NA | NA | −0.4927 | 0.0004 | NA | NA | −1.923 | 4E-05 | NA | NA | −1.43 | 0.012 | NA | NA |
Acknowledgements
We thank T. Blauwkamp (Karius Inc.), S. Bercovici (Karius Inc.) and N. Noll (Karius Inc.) for their assistance in providing additional metadata for the SepSeq dataset. We thank the funding sources supporting this work: NIH R01HL148184-01 (P.L.B.), NIH R01AI12492093 (P.L.B.), NIH R01DC019965 (P.L.B.), the Cystic Fibrosis Foundation (P.L.B.), a grant from the Emerson Collective (P.L.B.), NSF GRFP (N.L.H.), NIH T32HL129970-06 (L.J.B.), NIH R01AI148623 (A.S.B.), NIH R01AI143757 (A.S.B.), a Stand Up 2 Cancer grant (A.S.B.), the Allen Distinguished Investigator Award (A.S.B.), NIH R21GM147838 (S.Y. and P.L.B.), NIH R01AI153133 (S.Y.), NIH R01AI137272 (S.Y.) and NIH R01AI138978 (S.Y.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Footnotes
Competing interests
A.S.B. has consulted for biomX and is on the scientific advisory boards of ArcBio and Caribou Biosciences. The remaining authors declare no competing interests.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Code availability
The R code used to summarize BLAST phageome annotations with the CPD has been made publicly available at https://doi.org/10.5281/zenodo.7734114. This includes an R markdown file detailing processing of BLAST outputs to create phage hit tables across all samples, and subsequent use of the CPD to summarize representation of phage taxonomic families and known bacterial host characteristics. A phage hit table for our sequenced samples is available along with this R code and can be used to re-create phage summary tables as well as for calculation of diversity using the R package ‘vegan’. Processing of raw data, removal of human reads, and BLAST annotations were done using existing software and are described in the relevant Methods sections.
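As a rough illustration of the summary step described above, the following Python sketch (not the authors' R code) computes, for a single sample, the proportion of phage hits attributed to one bacterial host and the Shannon diversity index (SDI) of those phages. The hit counts, phage names and the `host_of` mapping are hypothetical, and the function names are introduced here for illustration only. It assumes a natural-log Shannon index, which is the default of `diversity` in ‘vegan’, and it assumes that diversity is calculated over the host-associated phages only.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = sum(p_i * ln(1 / p_i)) over non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(total / c) for c in counts if c > 0)


def host_phage_summary(hits, host_of, host):
    """Return (proportion, SDI) for phages attributed to `host` in one sample.

    hits:    dict mapping phage name -> number of hits in the sample
    host_of: dict mapping phage name -> annotated bacterial host (assumed layout)
    """
    total_hits = sum(hits.values())
    host_counts = [n for phage, n in hits.items() if host_of.get(phage) == host]
    proportion = sum(host_counts) / total_hits if total_hits else 0.0
    return proportion, shannon_index(host_counts)


# Hypothetical phage hit table for one sample (illustrative values only).
sample_hits = {"phage_A": 4, "phage_B": 4, "phage_C": 2, "phage_D": 10}
host_of = {
    "phage_A": "Escherichia",
    "phage_B": "Escherichia",
    "phage_C": "Klebsiella",
    "phage_D": "Cutibacterium",
}

proportion, sdi = host_phage_summary(sample_hits, host_of, "Escherichia")
print(f"Escherichia phage proportion: {proportion:.3f}, SDI: {sdi:.3f}")
```

In this sketch, a host represented by a single phage has an SDI of 0 regardless of how many hits that phage receives, so proportion and diversity carry different information about the same sample.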
Additional information
Extended data is available for this paper at https://doi.org/10.1038/s41564-023-01406-x.
Peer review information: Nature Microbiology thanks Bryan Kraft, Evelien Adriaenssens, Jeremy Barr, Paul Turner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at www.nature.com/reprints.
Supplementary information: The online version contains supplementary material available at https://doi.org/10.1038/s41564-023-01406-x.
Data availability
Sequencing data with human reads removed have been deposited into NCBI SRA under bioproject PRJNA860730. Publicly available data utilized: the SepSeq study data have been previously published under bioproject PRJNA507824. No new computational tools were developed as part of this study. The INPHARED v1.7 database was downloaded and used for analyses in this study (https://github.com/RyanCook94/inphared). Infection aetiology metadata associated with samples sequenced for this study are included in the Stanford sepsis cohort sheet of Supplementary Data. The CPD FASTA file used for creating the BLAST database is publicly available at https://doi.org/10.5281/zenodo.7154236. The Phage dictionary and Coliphage dictionary are additionally available as sheets in Supplementary Data. All associated supplementary files have additionally been made publicly available at https://doi.org/10.5281/zenodo.7644125.
References
- 1. Executive Board, 140. Improving the Prevention, Diagnosis and Clinical Management of Sepsis (The Secretariat, 2017).
- 2. Grabuschnig S et al. Putative origins of cell-free DNA in humans: a review of active and passive nucleic acid release mechanisms. Int. J. Mol. Sci. 21, 1–24 (2020).
- 3. Kowarsky M et al. Numerous uncharacterized and highly divergent microbes which colonize humans are revealed by circulating cell-free DNA. Proc. Natl Acad. Sci. USA 114, 9623–9628 (2017).
- 4. Cheng AP et al. Cell-free DNA profiling informs all major complications of hematopoietic cell transplantation. Proc. Natl Acad. Sci. USA 119, e2113476118 (2022).
- 5. Fan HC, Blumenfeld YJ, Chitkara U, Hudgins L & Quake SR. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc. Natl Acad. Sci. USA 105, 16266–16271 (2008).
- 6. De Vlaminck I et al. Circulating cell-free DNA enables non-invasive diagnosis of heart transplant rejection. Sci. Transl. Med. 6, 241ra77 (2014).
- 7. Snyder TM, Khush KK, Valantine HA & Quake SR. Universal noninvasive detection of solid organ transplant rejection. Proc. Natl Acad. Sci. USA 108, 6229–6234 (2011).
- 8. Schwarzenbach H, Hoon DSB & Pantel K. Cell-free nucleic acids as biomarkers in cancer patients. Nat. Rev. Cancer 11, 426–437 (2011).
- 9. Grumaz S et al. Next-generation sequencing diagnostics of bacteremia in septic patients. Genome Med. 8, 73 (2016).
- 10. Grumaz C et al. Rapid next-generation sequencing-based diagnostics of bacteremia in septic patients. J. Mol. Diagn. 22, 405–418 (2020).
- 11. Chen P et al. Rapid diagnosis and comprehensive bacteria profiling of sepsis based on cell-free DNA. J. Transl. Med. 18, 5 (2020).
- 12. Wang L et al. Plasma microbial cell-free DNA sequencing technology for the diagnosis of sepsis in the ICU. Front. Mol. Biosci. 8, 659390 (2021).
- 13. Eichenberger EM et al. Microbial cell-free DNA identifies the causative pathogen in infective endocarditis and remains detectable longer than conventional blood culture in patients with prior antibiotic therapy. Clin. Infect. Dis. https://doi.org/10.1093/CID/CIAC426 (2022).
- 14. Burnham P et al. Urinary cell-free DNA is a versatile analyte for monitoring infections of the urinary tract. Nat. Commun. 9, 2412 (2018).
- 15. Hogan CA et al. Clinical impact of metagenomic next-generation sequencing of plasma cell-free DNA for the diagnosis of infectious diseases: a multicenter retrospective cohort study. Clin. Infect. Dis. 72, 239–245 (2021).
- 16. Cheng HK et al. Combined use of metagenomic sequencing and host response profiling for the diagnosis of suspected sepsis. Preprint at bioRxiv https://doi.org/10.1101/854182 (2019).
- 17. Sinha M et al. Emerging technologies for molecular diagnosis of sepsis. Clin. Microbiol. Rev. 31, e00089–17 (2018).
- 18. Navarro F & Muniesa M. Phages in the human body. Front. Microbiol. 8, 566 (2017).
- 19. Barr JJ. A bacteriophages journey through the human body. Immunol. Rev. 279, 106–122 (2017).
- 20. Hatfull GF. Dark matter of the biosphere: the amazing world of bacteriophage diversity. J. Virol. 89, 8107–8110 (2015).
- 21. Shkoporov AN & Hill C. Bacteriophages of the human gut: the ‘known unknown’ of the microbiome. Cell Host Microbe 25, 195–209 (2019).
- 22. de Jonge PA, Nobrega FL, Brouns SJJ & Dutilh BE. Molecular and evolutionary determinants of bacteriophage host range. Trends Microbiol. 27, 51–63 (2019).
- 23. Flores CO, Meyer JR, Valverde S, Farr L & Weitz JS. Statistical structure of host–phage interactions. Proc. Natl Acad. Sci. USA 108, E288–E297 (2011).
- 24. Koskella B & Meaden S. Understanding bacteriophage specificity in natural microbial communities. Viruses 5, 806–823 (2013).
- 25. Nguyen S et al. Bacteriophage transcytosis provides a mechanism to cross epithelial cell layers. mBio 8, e01874–17 (2017).
- 26. Górski A et al. Bacteriophage translocation. FEMS Immunol. Med. Microbiol. 46, 313–319 (2006).
- 27. Manrique P et al. Healthy human gut phageome. Proc. Natl Acad. Sci. USA 113, 10400–10405 (2016).
- 28. Zhang T et al. RNA viral community in human feces: prevalence of plant pathogenic viruses. PLoS Biol. 4, 0108–0118 (2006).
- 29. Huang Y-F et al. Analysis of microbial sequences in plasma cell-free DNA for early-onset breast cancer patients and healthy females. BMC Med. Genomics 11, 16 (2018).
- 30. Nayfach S et al. Metagenomic compendium of 189,680 DNA viruses from the human gut microbiome. Nat. Microbiol. 6, 960–970 (2021).
- 31. Camarillo-Guerrero LF, Almeida A, Rangel-Pineros G, Finn RD & Lawley TD. Massive expansion of human gut bacteriophage diversity. Cell 184, 1098–1109.e9 (2021).
- 32. Tisza MJ & Buck CB. A catalog of tens of thousands of viruses from human metagenomes reveals hidden associations with chronic diseases. Proc. Natl Acad. Sci. USA 118, e2023202118 (2021).
- 33. Adriaenssens EM. Phage diversity in the human gut microbiome: a taxonomist’s perspective. mSystems 6, e0079921 (2021).
- 34. Blauwkamp TA et al. Analytical and clinical validation of a microbial cell-free DNA sequencing test for infectious disease. Nat. Microbiol. 4, 663–674 (2019).
- 35. Andrews S. FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc (Babraham Institute, 2010).
- 36. Bolger AM, Lohse M & Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
- 37. Kirschbaum JO & Kligman AM. The pathogenic role of Corynebacterium acnes in acne vulgaris. Arch. Dermatol. 88, 832–833 (1963).
- 38. Altschul SF, Gish W, Miller W, Myers EW & Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
- 39. Camacho C et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
- 40. Mihara T et al. Linking virus genomes with host taxonomy. Viruses 8, 66 (2016).
- 41. Shannon CE. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
- 42. Hotchkiss RS et al. Sepsis and septic shock. Nat. Rev. Dis. Primers 2, 16045 (2016).
- 43. Cook R et al. INfrastructure for a PHAge REference Database: identification of large-scale biases in the current collection of cultured phage genomes. PHAGE 2, 214–223 (2021).
- 44. Suzuki H, Lefébure T, Bitar PP & Stanhope MJ. Comparative genomic analysis of the genus Staphylococcus including Staphylococcus aureus and its newly described sister species Staphylococcus simiae. BMC Genomics 13, 38 (2012).
- 45. Gu W et al. Rapid pathogen detection by metagenomic next-generation sequencing of infected body fluids. Nat. Med. 27, 115–124 (2020).
- 46. Long Y et al. Diagnosis of sepsis with cell-free DNA by next-generation sequencing technology in ICU patients. Arch. Med. Res. 47, 365–371 (2016).
- 47. Barrett SLR et al. Cell free DNA from respiratory pathogens is detectable in the blood plasma of cystic fibrosis patients. Sci. Rep. 10, 6903 (2020).
- 48. Ross A, Ward S & Hyman P. More is better: selecting for broad host range bacteriophages. Front. Microbiol. 7, 1352 (2016).
- 49. Wang X et al. Cryptic prophages help bacteria cope with adverse environments. Nat. Commun. 1, 147 (2010).
- 50. Ewels P, Magnusson M, Lundin S & Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
- 51. Langmead B & Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
- 52. Danecek P et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).
- 53. Li H. Seqtk: a fast and lightweight tool for processing FASTA or FASTQ sequences. GitHub; https://github.com/lh3/seqtk (2013).
- 54. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2020).
- 55. Chamberlain SA & Szöcs E. Taxize: taxonomic search and retrieval in R. F1000Res. 2, 191 (2013).
- 56. Oksanen J et al. vegan: community ecology package. R package version 2.5–2. https://cran.r-project.org/package=vegan (2018).
- 57. Kassambara A et al. factoextra: extract and visualize the results of multivariate data analyses. R package factoextra version 1.0.7. https://cran.r-project.org/package=factoextra (2020).
