Proceedings of the National Academy of Sciences of the United States of America. 2018 Nov 27;115(52):E12353–E12362. doi: 10.1073/pnas.1809700115

Integrating host response and unbiased microbe detection for lower respiratory tract infection diagnosis in critically ill adults

Charles Langelier a,b,1, Katrina L Kalantar c,1, Farzad Moazed d, Michael R Wilson e,f, Emily D Crawford b,c, Thomas Deiss d, Annika Belzer d, Samaneh Bolourchi d, Saharai Caldera a,b, Monica Fung a, Alejandra Jauregui d, Katherine Malcolm g, Amy Lyden b, Lillian Khan c, Kathryn Vessel d, Jenai Quan b,c, Matt Zinter h, Charles Y Chiu a,i, Eric D Chow c, Jenny Wilson j, Steve Miller i, Michael A Matthay d,k,l, Katherine S Pollard b,m,n,o,p,q, Stephanie Christenson d, Carolyn S Calfee d,h,2, Joseph L DeRisi b,c,2,3
PMCID: PMC6310811  PMID: 30482864

Significance

Lower respiratory tract infections (LRTIs) are the leading cause of infectious disease-related deaths worldwide yet remain challenging to diagnose because of limitations in existing microbiologic tests. In critically ill patients, noninfectious respiratory syndromes that resemble LRTIs further complicate diagnosis and confound targeted treatment. To address this, we developed a metagenomic sequencing-based approach that simultaneously interrogates three core elements of acute airway infections: the pathogen, airway microbiome, and host response. We studied this approach in a prospective cohort of critically ill patients with acute respiratory failure and found that combining pathogen, microbiome, and host gene expression metrics achieved accurate LRTI diagnosis and identified etiologic pathogens in patients with clinically identified infections but otherwise negative testing.

Keywords: lower respiratory tract infection, pneumonia, next-generation sequencing, transcriptome, mechanical ventilation

Abstract

Lower respiratory tract infections (LRTIs) lead to more deaths each year than any other infectious disease category. Despite this, etiologic LRTI pathogens are infrequently identified due to limitations of existing microbiologic tests. In critically ill patients, noninfectious inflammatory syndromes resembling LRTIs further complicate diagnosis. To address the need for improved LRTI diagnostics, we performed metagenomic next-generation sequencing (mNGS) on tracheal aspirates from 92 adults with acute respiratory failure and simultaneously assessed pathogens, the airway microbiome, and the host transcriptome. To differentiate pathogens from respiratory commensals, we developed a rules-based model (RBM) and logistic regression model (LRM) in a derivation cohort of 20 patients with LRTIs or noninfectious acute respiratory illnesses. When tested in an independent validation cohort of 24 patients, both models achieved accuracies of 95.5%. We next developed pathogen, microbiome diversity, and host gene expression metrics to identify LRTI-positive patients and differentiate them from critically ill controls with noninfectious acute respiratory illnesses. When tested in the validation cohort, the pathogen metric performed with an area under the receiver-operating curve (AUC) of 0.96 (95% CI, 0.86–1.00), the diversity metric with an AUC of 0.80 (95% CI, 0.63–0.98), and the host transcriptional classifier with an AUC of 0.88 (95% CI, 0.75–1.00). Combining these achieved a negative predictive value of 100%. This study suggests that a single streamlined protocol offering an integrated genomic portrait of pathogen, microbiome, and host transcriptome may hold promise as a tool for LRTI diagnosis.


Lower respiratory tract infections (LRTIs) are a leading cause of mortality worldwide (1–3). Early and accurate determination of acute respiratory disease etiology is crucial for implementing effective pathogen-targeted therapies but is often not possible due to the limitations of current microbiologic tests in terms of sensitivity, speed, and spectrum of available assay targets (4). For instance, even with the best available clinical diagnostics, a contributory pathogen can be detected in only 38% of adults with community-acquired pneumonia, due to the low sensitivity and time requirements of culture, and the limited number of microbes detectable by serologic and PCR assays (4, 5).

In the absence of a definitive microbiologic diagnosis, clinicians may presume symptoms are due to a noninfectious inflammatory condition and initiate empiric corticosteroids, which can exacerbate an occult infection (6). Furthermore, even with negative microbiologic testing, providers often continue empiric antibiotics due to concerns of falsely negative results, a practice that drives emergence of antibiotic resistance and increases risk of Clostridium difficile infection (7). In the intensive care unit (ICU), LRTI diagnosis is particularly complex due to a high prevalence of noninfectious inflammatory conditions with overlapping clinical features (8) and a patient demographic that includes severely immunocompromised individuals who may exhibit atypical presentations of pulmonary infections.

Advancements in genome sequencing hold promise for overcoming these diagnostic challenges by affording culture-independent assessment of microbial genomes from microliter volumes of clinical samples (9, 10). Recent work has highlighted the utility of metagenomic next-generation sequencing (mNGS) for rapid and actionable diagnosis of complicated infections (6, 11–13). While these results are encouraging, most mNGS computational pipelines have been developed for analysis of sterile fluids or cultured bacterial isolates and have limited capacity to identify pathogens amid the complex background of commensal microbiota present in respiratory specimens (13–15).

Host transcriptional profiling from peripheral blood has emerged as a promising alternative to pathogen-based diagnostics that can distinguish viral from bacterial LRTIs as well as differentiate between patients with acute respiratory infections versus those with noninfectious illnesses (5, 16, 17). This approach, while highly promising, has not been well studied in ICU patients with respiratory failure or in severely immunocompromised subjects. Furthermore, host transcriptional profiling has not yet been coupled with simultaneous detection of pulmonary pathogens (5, 18), which could improve diagnostic accuracy and more precisely inform optimal antimicrobial treatment.

mNGS can extend both host gene expression assays and current microbe-based diagnostics by simultaneously detecting pathogens, the airway microbiome, and transcriptional biomarkers of the host’s immune response. Here, we address the need for better LRTI diagnostics by developing an mNGS-based method that integrates host response and unbiased microbe detection. We then evaluate the performance of this approach in a prospective cohort of critically ill patients with acute respiratory failure.

Results

We prospectively enrolled 92 adults admitted to the ICU with acute respiratory failure and collected tracheal aspirate (TA) samples within 72 h of intubation (Table 1). Patients underwent testing with clinician-ordered standard of care microbiologic diagnostics at the University of California, San Francisco, Moffitt–Long Hospital, a tertiary-care referral center. Subjects with LRTI were identified by two-physician adjudication using US Centers for Disease Control/National Healthcare Safety Network (CDC/NHSN) surveillance case definitions and retrospective electronic medical record review, with blinding to mNGS results (Dataset S2A) (19). Using this approach, patients were assigned to one of four groups: (i) LRTI defined by both clinical and microbiologic criteria (LRTI+C+M, n = 26); (ii) no evidence of LRTI and a clear alternative explanation for acute respiratory failure (no-LRTI, n = 18); (iii) LRTI defined by clinical criteria alone with negative conventional microbiologic testing (LRTI+C, n = 34); and (iv) respiratory failure due to unclear cause, infectious or noninfectious (unk-LRTI, n = 14).

Table 1.

Demographics and clinical characteristics of study cohort

Cohort characteristics Cohort overall LRTI+C+M no-LRTI P*
Patient characteristics
 Total enrolled 92 26 18
 Age, y 62 61 63 0.80
 Female gender 31 (34%) 6 9 0.13
 Race
  African American 5 (5%) 2 1 0.82
  Asian 26 (28%) 7 5 0.82
  Caucasian 50 (54%) 15 9 0.82
  Other 11 (12%) 2 3 0.82
  Hispanic ethnicity 8 (9%) 3 1 0.88
Comorbidities and outcomes
 Bacteremia 21 (23%) 6 3 0.90
 Nonpulmonary infections 29 (32%) 9 4 0.58
 COPD 12 (13%) 3 0 0.37
 Diabetes mellitus 6 (7%) 1 3 0.36
 Congestive heart failure 7 (8%) 1 1 1.00
 Current smoker 12 (13%) 5 1 0.39
 Immune suppression 41 (45%) 10 9 0.65
 Solid-organ transplantation 13 (14%) 1 5 0.07
 Prior antibiotic use 84 (91%) 22 18 0.23
 Community acquired pneumonia 42 (46%) 18
 Hospital acquired pneumonia 13 (14%) 5
 Ventilator associated pneumonia 3 (3%) 3
 30-d mortality 18 (20%) 6 1 0.25
Clinical metrics
 Max temperature, °C 37.8 38.1 38.0 0.33
 Max WBC count, 10⁶ cells/μL 14.3 13.8 12.8 0.58
 Max heart rate, bpm 110 111 107 0.50
 Max respiratory rate, breaths/min 36 35 35 0.74
 SIRS criteria, mean 3 3 3 0.54
 APACHE III score, mean 97 101 94 0.62
 Pneumonia severity index, mean 151 148 137 0.65

COPD, chronic obstructive pulmonary disease; LRTI+C+M, subjects who met both clinical and microbiologic criteria for LRTI; no-LRTI, subjects with a noninfectious etiology of acute respiratory failure; SIRS, systemic inflammatory response syndrome, defined as two or more abnormalities in white blood cell count (>12,000 or <4,000 cells per µL), temperature (>38 or <36 °C), heart rate (>90 beats per min), or respiratory rate (>20 breaths per min). APACHE III score predicts mortality and disease severity for critically ill patients. Pneumonia severity index score estimates mortality for adult patients with community-acquired pneumonia (65). Percentage of total cohort is shown in parentheses.

*P values for Patient characteristics and Comorbidities and outcomes, χ² test; P values for Clinical metrics, Wilcoxon rank sum test.

The age range of the cohort was 21–85+.

From extracted nucleic acid samples, we performed both metagenomic shotgun DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq). We first developed computational algorithms to sift respiratory pathogens from background commensal flora in an effort to enhance detection of LRTI etiology. To differentiate patients with LRTI from those with noninfectious critical respiratory illnesses, we next developed metrics of LRTI probability based on pathogen, airway microbiome diversity, and host gene expression (Fig. 1). To assess assay performance, we focused on the most unambiguously LRTI-positive and -negative subjects (LRTI+C+M and no-LRTI) by randomly dividing them into independent derivation (n = 20, used for model training) and validation cohorts (n = 24, used for model testing). Each metric (pathogen, microbiome, and host) was evaluated independently and then in combination.

Fig. 1.

Study overview and analysis workflow. Patients with acute respiratory failure were enrolled within 72 h of ICU admission, and TA samples were collected and underwent both RNA sequencing (RNA-seq) and shotgun DNA sequencing (DNA-seq). Post hoc clinical adjudication blinded to mNGS results identified patients with LRTI defined by clinical and microbiologic criteria (LRTI+C+M); LRTI defined by clinical criteria only (LRTI+C); patients with noninfectious reasons for acute respiratory failure (no-LRTI); and respiratory failure due to unknown cause (unk-LRTI). The LRTI+C+M and no-LRTI groups were divided into derivation and validation cohorts. To detect pathogens and differentiate them from a background of commensal microbiota, we developed two models: a rules-based model (RBM) and a logistic regression model (LRM). LRTI probability was next evaluated with (i) a pathogen metric, (ii) a lung microbiome diversity metric, and (iii) a 12-gene host transcriptional classifier. Models were then combined and optimized for LRTI rule out.

Pathogen Detection.

While many NGS platforms utilize only one nucleic acid type, we combined both RNA-seq and DNA-seq. This approach allowed for simultaneous host transcriptional profiling, permitted detection of RNA viruses, and enriched for actively transcribing microbes (versus latent or nonviable taxa). In addition, requiring concordant detection of microbes across both nucleic acid types reduced spurious alignments derived from reagent contaminants intrinsic to the library preparations of each nucleic acid type (20). From each TA sample, we generated a mean of 19.6 and 32.6 million paired-end sequencing reads, from DNA-seq and RNA-seq, respectively, of which the median fraction of microbial reads was 0.04% (interquartile range, 0.01–0.16%). Raw reads were analyzed using a rapid computational pipeline that aligns and classifies microbial taxa by nucleotide and peptide translation using the National Center for Biotechnology Information (NCBI) NT and NR databases, respectively (13, 20). RNA-seq yielded a greater abundance of sequences compared with DNA-seq for 78% of identified microbes, with a median of 2.2 times more reads per microbe.

We and others have previously developed NGS methodologies for “sterile site” clinical fluids such as cerebrospinal fluid (13, 14, 21). The lung, however, is not a sterile environment and in fact harbors microbial communities during states of both health and disease (22–25). Asymptomatic carriage of potentially pathogenic organisms is common (26, 27), and only in a subset of cases do these microbes overtake airway microbial communities and precipitate LRTI (28). As such, distinguishing legitimate pathogens from commensal or colonizing microbiota is a central challenge for LRTI diagnostics and adds complexity to the interpretation of metagenomic sequencing data. To this point, while we detected all 38 pathogens identified from clinician-ordered microbiologic tests in the 26 LRTI+C+M patients using mNGS (Dataset S3A), a 10-fold greater number of airway commensals were also identified. The most prevalent microbes in the no-LRTI patient group included well-known commensal taxa (Dataset S4). Thus, to distinguish probable pathogens from airway commensals, we developed two complementary algorithms: (i) a rules-based model (RBM) optimized for detecting well-established respiratory pathogens, and (ii) a more flexible logistic regression model (LRM) that also permitted novel pathogen detection (Fig. 1).

The goal of both models was to correctly identify pathogens amid abundant and heterogeneous populations of commensals. Microbes identified by clinician-ordered diagnostics plus all viruses with established respiratory pathogenicity in the LRTI+C+M group were categorized as pathogens (n = 12 in derivation cohort and n = 26 in validation cohort; Dataset S1). Any additional microbes identified by mNGS were considered commensals (n = 155 in derivation cohort; n = 174 in validation cohort). We accepted that this “practical” gold standard would provide an attenuated estimate of performance due to the sensitivity limitations of microbial culture in the setting of antibiotic preadministration (4).

In the RBM, respiratory microbes from each patient were assigned an abundance score based on the sum of log(RNA-seq) and log(DNA-seq) genus reads per million reads mapped (rpm) (Dataset S3A). After ranking microbes by this abundance score, the greatest score difference between sequentially ranked microbes was identified and used to distinguish the group of highest-scoring microbes within each patient (Fig. 2A and SI Appendix, Fig. S1). These high-scoring microbes plus all RNA viruses detected at a conservative threshold of >0.1 rpm were indexed against an a priori developed table of established lower respiratory pathogens derived from landmark surveillance studies and clinical guidelines (Dataset S2B) and, if present, were identified as putative pathogens by the RBM (4, 29–31).
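The ranking-and-gap rule described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' pipeline: the base-10 logarithm, the function names, the handling of RNA viruses as a separate input, and the example values are all assumptions.

```python
import math

def abundance_scores(rpm_by_genus):
    """Score = log10(RNA-seq rpm) + log10(DNA-seq rpm); base 10 is an assumption."""
    return {genus: math.log10(rna) + math.log10(dna)
            for genus, (rna, dna) in rpm_by_genus.items()
            if rna > 0 and dna > 0}  # keep genera detected in both nucleic acid types

def top_scoring_group(scores):
    """Genera ranked above the largest gap between consecutive abundance scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) < 2:
        return ranked
    gaps = [scores[ranked[i]] - scores[ranked[i + 1]] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps))  # first occurrence wins if gaps tie
    return ranked[:cut + 1]

def rbm_putative_pathogens(rpm_by_genus, rna_virus_rpm, reference_pathogens,
                           virus_threshold=0.1):
    """High-scoring microbes plus RNA viruses above threshold, kept only if listed."""
    candidates = top_scoring_group(abundance_scores(rpm_by_genus))
    candidates += sorted(v for v, rpm in rna_virus_rpm.items() if rpm > virus_threshold)
    return [m for m in candidates if m in reference_pathogens]

# Hypothetical patient: (RNA-seq rpm, DNA-seq rpm) per genus.
patient = {"Streptococcus": (1000, 100), "Prevotella": (10, 5),
           "Veillonella": (8, 2), "Rothia": (2, 1)}
viruses = {"Influenza A virus": 5.0, "Rhinovirus": 0.05}
reference = {"Streptococcus", "Haemophilus", "Influenza A virus", "Rhinovirus"}
print(rbm_putative_pathogens(patient, viruses, reference))
# ['Streptococcus', 'Influenza A virus']
```

In this toy case the largest gap falls between the first and second ranked genera, so only the dominant genus passes, and the low-abundance virus is excluded by the 0.1 rpm threshold.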

Fig. 2.

Workflow for distinguishing LRTI pathogens from commensal respiratory microbiota using an algorithmic approach. (A) Projection of microbial relative abundance in log reads per million reads sequenced (rpm) by RNA sequencing (RNA-seq) (x axis) versus DNA sequencing (DNA-seq) (y axis) for representative cases. In the LRTI+C+M group, pathogens identified by standard clinical microbiology (filled shapes) had higher overall relative abundance compared with other taxa detected by sequencing (open shapes). The largest score differential between ranked microbes (max Δrpm) was used as a threshold to identify high-scoring taxa, distinct from the other microbes based on abundance (line with arrows). Red indicates taxa represented in the reference list of established LRTI pathogens. (B) Receiver operating characteristic (ROC) curve demonstrating logistic regression model (LRM) performance for detecting pathogens versus commensal microbiota in both the derivation and validation cohorts. The gray ROC curve and shaded region indicate results from 1,000 rounds of training and testing on randomized sets of the derivation cohort. The blue and green lines indicate predictions from leave-one-patient-out cross-validation (LOPO-CV) on the derivation cohort and from validation on the validation cohort, respectively. (C) Microbes predicted by the LRM to represent putative pathogens. The x axis represents combined RNA-seq and DNA-seq relative abundance, and the y axis indicates pathogen probability. The dashed line reflects the optimized probability threshold for pathogen assignment. Red filled circles: microbes predicted by LRM to represent putative LRTI pathogens that were also identified by conventional microbiologic tests. Blue filled circles: microbes predicted to represent putative LRTI pathogens by LRM only. Blue open circles: microbes identified by NGS but not predicted by the LRM to represent putative pathogens. Red open circles: microbes identified using NGS and by standard microbiologic testing but not predicted to be putative pathogens. Dark red outlined circles: microbes detected as part of a polymicrobial culture.

The RBM achieved an accuracy for pathogen detection of 98.8% and 95.5% in the derivation and validation cohorts, respectively (Dataset S3A). In subjects whose respiratory cultures grew three or more different bacteria, mNGS was able to detect each of the species. In most cases, however, their abundance differed by several hundredfold, which confounded detection of the lower-abundance taxa (Dataset S3A). Given the unclear significance of single species in such polymicrobial cases with respect to pathogenicity (32), we performed a secondary analysis in which only the most abundant microbe was considered a pathogen, and this approach yielded an accuracy of 98.4%.

While the RBM performed well for identifying microbes with established pulmonary pathogenicity, we recognized the need to also detect novel or atypical species. We thus employed machine learning to distinguish respiratory pathogens from commensals using an LRM trained on microbes detected in the derivation cohort patients (n = 20) using the predictor variables of RNA-seq rpm, DNA-seq rpm, rank by RNA-seq rpm, established LRTI pathogen (yes/no), and virus (yes/no). These features were selected to favor highly abundant organisms with established pathogenicity in the lung, but still permit detection of uncommon taxa that could represent putative pathogens.

To evaluate LRM performance in the derivation cohort, we performed leave-one-patient-out cross-validation, in which all microbes from a single patient were held out in each round of cross-validation. This yielded an AUC of 0.90 (95% CI, 0.76–0.99). A final model was trained on all microbes from derivation cohort patients, and this achieved an AUC of 0.91 (95% CI, 0.83–0.97) for pathogen identification in the validation cohort (Fig. 2B and Dataset S3 A and B). At an optimized probability threshold of 0.36 (Methods), this translated to an accuracy of 96.4% and 95.5% in the derivation and validation cohorts, respectively. As with the RBM, LRM performance suffered in polymicrobial culture cases with species that differed by several orders of magnitude in abundance when assessed by mNGS. As such, when only the most abundant microbe identified by clinical microbiologic diagnostics per LRTI+C+M patient was considered as the etiologic pathogen, the AUC increased to 0.997 (95% CI, 0.99–1.00) in the validation cohort.
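A minimal sketch of leave-one-patient-out cross-validation follows: every microbe from one patient is held out together, so no patient contributes to both training and testing within a round. The `fit_midpoint` and `predict_sigmoid` helpers are a toy single-feature stand-in used only to make the example runnable; they are not the five-predictor LRM described above, and the records are invented.

```python
import math

def lopo_cv(records, fit, predict):
    """Hold out all microbes from one patient per round; return (label, probability) pairs."""
    patients = sorted({r["patient"] for r in records})
    scored = []
    for held_out in patients:
        train = [r for r in records if r["patient"] != held_out]
        test = [r for r in records if r["patient"] == held_out]
        model = fit(train)  # the model never sees the held-out patient
        scored += [(r["is_pathogen"], predict(model, r["x"])) for r in test]
    return scored

def accuracy(scored, threshold=0.36):
    """Fraction of microbes classified correctly at a probability threshold."""
    return sum((p >= threshold) == y for y, p in scored) / len(scored)

# Toy stand-in model (assumption): midpoint between class means of one feature.
def fit_midpoint(train):
    pos = [r["x"] for r in train if r["is_pathogen"]]
    neg = [r["x"] for r in train if not r["is_pathogen"]]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_sigmoid(midpoint, x):
    return 1 / (1 + math.exp(-(x - midpoint)))

# Hypothetical microbes; "x" stands for a log-abundance feature.
records = [
    {"patient": "A", "x": 5.0, "is_pathogen": True},
    {"patient": "A", "x": 1.0, "is_pathogen": False},
    {"patient": "A", "x": 0.5, "is_pathogen": False},
    {"patient": "B", "x": 4.0, "is_pathogen": True},
    {"patient": "B", "x": 1.5, "is_pathogen": False},
    {"patient": "C", "x": 1.0, "is_pathogen": False},
    {"patient": "C", "x": 1.8, "is_pathogen": False},
]
scored = lopo_cv(records, fit_midpoint, predict_sigmoid)
print(accuracy(scored))  # 1.0 on this toy data
```

Grouping the split by patient rather than by microbe matters because microbes from the same sample share sequencing depth and background, which would otherwise leak information between training and test sets.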

Combining the RBM and LRM identified more putative pathogens than either model alone and revealed a potential LRTI etiology in 62% (n = 21) of the LRTI+C patients with clinically adjudicated LRTI but negative microbiologic testing (Fig. 3, SI Appendix, Fig. S2, and Dataset S3A). Compared with clinician-ordered diagnostics, this permitted a microbiologic diagnosis in a greater number of LRTI-positive subjects (78% vs. 43%; P < 1.00 × 10⁻⁴ by McNemar’s test; Fig. 3). Putative new pathogens in a representative subset of the LRTI+C group patients (n = 11; 32%) were orthogonally confirmed by clinical multiplex respiratory virus PCR, influenza C PCR (33), or by 16S bacterial rRNA gene sequencing (Dataset S3A).

Fig. 3.

Distribution of respiratory pathogens identified in patients using clinician-ordered diagnostics versus mNGS. Number of subjects in whom each respiratory microbe was detected. All microbes detected by clinician-ordered diagnostics were detected by mNGS; however, pink bars indicate microbes misclassified as negative by either the RBM or LRM. Notably, all microbes identified by clinician-ordered diagnostics and misclassified by either the RBM or LRM (pink bars) were found in polymicrobial cultures, highlighting the presence of dominant pathogens by NGS that are not captured in the polymicrobial culture results. Red bars indicate microbes detected by clinician-ordered diagnostics and also predicted as pathogens by either the RBM or LRM. More detail on which model identified each microbe can be found in SI Appendix, Fig. S2. Dark red bars (LRTI+C+M and LRTI+C subjects) and gray bars (no-LRTI subjects) indicate number of cases with microbes detected only by mNGS.

Putative pathogens identified in the unk-LRTI group (n = 6, 43%) may have represented atypically presenting respiratory infections or incidental carriage in the respiratory tract (SI Appendix, Fig. S2 and Dataset S3A). Microbes identified in the no-LRTI group (n = 3; 17%) were present at lower abundance compared with microbes in LRTI+C+M subjects (P < 0.01 by Wilcoxon rank sum), LRTI+C (P < 0.01), and unk-LRTI subjects (P = 0.02), and included contextual pathogens such as Streptococcus pneumoniae and Haemophilus influenzae that colonize the airways of 20–50% of healthy individuals (32, 34, 35). Together, these findings highlighted the reality of asymptomatic carriage of potentially pathogenic species, emphasizing the need to contextualize microbial detection with respect to other key elements of an airway infection, in particular the airway microbiome and the host’s immune response (26, 36). We thus undertook further analytical development to predict LRTI status by calculating combined metrics based on pathogen, microbiome, and host transcriptional response.

LRTI Prediction Based on Pathogen.

We recognized that the highest per-patient LRM pathogen versus commensal probability value differed significantly between LRTI+C+M and no-LRTI subjects (P = 3.8 × 10⁻⁴ by Wilcoxon rank sum). As such, we hypothesized that this value might have utility not only for pathogen versus commensal prediction, but also for LRTI prediction in general. Testing this idea, we found that the maximum per-patient LRM probability value predicted LRTI status with an AUC of 0.97 (95% CI, 0.90–1.00) in the derivation cohort and 0.96 (95% CI, 0.86–1.00) in the validation cohort (SI Appendix, Fig. S3).
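As a sketch of how a per-microbe probability becomes a per-patient LRTI metric, the code below takes the maximum probability within each patient and computes the AUC as the probability that a randomly chosen LRTI-positive patient outscores a randomly chosen control (the rank-based definition of AUC). The patient labels and probabilities are invented for illustration.

```python
def max_probability_per_patient(microbe_probs):
    """Collapse (patient, probability) pairs to the highest probability per patient."""
    best = {}
    for patient, prob in microbe_probs:
        best[patient] = max(prob, best.get(patient, 0.0))
    return best

def auc(positive_scores, negative_scores):
    """Share of positive/negative pairs ranked correctly; ties count as half."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in positive_scores for n in negative_scores)
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical per-microbe LRM probabilities.
probs = [("P1", 0.9), ("P1", 0.1), ("P2", 0.6), ("P2", 0.2),
         ("P3", 0.3), ("P3", 0.05), ("P4", 0.7), ("P4", 0.1)]
lrti_positive = {"P1", "P2"}  # assumed adjudication labels for the example

per_patient = max_probability_per_patient(probs)
pos = [s for p, s in per_patient.items() if p in lrti_positive]
neg = [s for p, s in per_patient.items() if p not in lrti_positive]
print(auc(pos, neg))  # 0.75
```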

LRTI Prediction Based on Lung Microbiome Diversity.

Several studies have demonstrated reduced diversity of the airway microbiome in the setting of LRTI (20, 37–39). We measured intrapatient (α) diversity of airway genera using the Shannon diversity index (SDI) and found that LRTI+C+M subjects had significantly lower SDI compared with no-LRTI subjects when assessed by both RNA-seq (Fig. 4A; P = 1.3 × 10⁻⁴) and DNA-seq (SI Appendix, Fig. S4A; P = 8.9 × 10⁻³) (Dataset S5). We next examined interpatient (β) diversity (40) using the Bray–Curtis Index (41) and found that this also differed between LRTI+C+M and no-LRTI subjects, with assessment by RNA-seq again yielding a more significant difference versus DNA-seq [P = 5 × 10⁻³ versus P = 9 × 10⁻³ by permutation analysis of variance (PERMANOVA), respectively; Fig. 4B and SI Appendix, Fig. S4B]. We then tested whether diversity alone might predict LRTI and found that RNA-seq SDI differentiated LRTI+C+M from no-LRTI subjects with an AUC of 0.96 (95% CI, 0.89–1.00) in the derivation cohort and an AUC of 0.80 (95% CI, 0.63–0.96) in the validation cohort (Fig. 4C). DNA-seq SDI did not perform as well, with AUCs of 0.84 (95% CI, 0.66–1.00) and 0.53 (95% CI, 0.25–0.80) in the derivation and validation cohorts, respectively (SI Appendix, Fig. S4C). These findings suggested that genus diversity assessed by RNA-seq was a useful, albeit imperfect, biomarker of LRTI.
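For readers unfamiliar with the two diversity measures, the following sketch computes the Shannon diversity index, H = −Σ pᵢ ln pᵢ over genus proportions pᵢ, and the Bray–Curtis dissimilarity between two samples. The natural logarithm, the use of raw genus counts as input, and the example communities are assumptions; the text does not specify these details.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over genera with nonzero abundance."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared genera."""
    genera = set(a) | set(b)
    difference = sum(abs(a.get(g, 0) - b.get(g, 0)) for g in genera)
    return difference / (sum(a.values()) + sum(b.values()))

# Hypothetical genus-level abundances.
even = {"Prevotella": 25, "Veillonella": 25, "Rothia": 25, "Neisseria": 25}
dominated = {"Streptococcus": 97, "Prevotella": 1, "Veillonella": 1, "Rothia": 1}

print(round(shannon_diversity(even), 3))       # 1.386, i.e. ln(4)
print(shannon_diversity(dominated) < shannon_diversity(even))  # True
print(bray_curtis(even, even))                 # 0.0
```

A sample dominated by a single genus, as might occur when a pathogen overtakes the airway community, yields a much lower index than an evenly distributed community.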

Fig. 4.

Diversity of the transcriptionally active lung microbiome in patients with LRTI (LRTI+C+M) versus noninfectious respiratory illnesses (no-LRTI). (A) Box plots of Shannon diversity index (SDI) of the lung microbiome assessed by RNA-seq at the genus level (in the derivation cohort), which differed between LRTI+C+M and no-LRTI groups. (B) The β diversity assessed by PERMANOVA on Bray–Curtis dissimilarity values in the derivation cohort differed between LRTI+C+M and no-LRTI groups. (C) ROC curve demonstrating performance of SDI to distinguish LRTI+C+M from no-LRTI groups.

LRTI Prediction Based on Host Response.

In the setting of critical illness, systemic inflammatory responses due to diverse physiologic processes can make true LRTI clinically indistinguishable from noninfectious respiratory failure or severe extrapulmonary infection. Consistent with this, we found that the systemic inflammatory response syndrome (SIRS) criteria (temperature, white blood cell count, heart rate, respiratory rate) had limited utility for LRTI detection despite being widely used for infection assessment (Dataset S1). We thus hypothesized that transcriptional profiling, which has emerged as a promising and accurate host-based approach for assessing infection, might provide diagnostic insight in settings when clinical rules are uninformative (5, 16, 42).

As such, we examined differential gene expression between LRTI+C+M and no-LRTI subjects in the derivation cohort to define a host transcriptional signature of LRTI in patients with critical illness. Using a false-discovery rate (FDR) of <0.05, we identified a total of 882 differentially expressed genes, 414 of which were up-regulated in LRTI+C+M subjects (SI Appendix, Fig. S6 and Dataset S6A). Gene set enrichment analysis (43) identified up-regulation of pathways related to innate immune responses, NF-κB signaling, cytokine production, and the type I IFN response in LRTI+C+M subjects. In comparison, gene expression pathways in the no-LRTI group were enriched for oxidative stress responses and MHC class II receptor signaling (Dataset S6B). A subanalysis (SI Appendix, Methods) evaluating differences between viral and bacterial infections in known LRTI+C+M patients identified four differentially expressed genes (RSAD2, OAS3, CXCL2, DUSP2). Genes up-regulated in viral cases (RSAD2, OAS3) were related to the type I IFN and antiviral responses, reflecting biologically relevant differences in host response indicative of pathogen type, despite a relatively limited sample size within a heterogeneous cohort and a high proportion of immune-compromising conditions among patients with detected viruses.

We next sought to construct an airway-specific host transcriptional classifier that could differentiate LRTI+C+M patients from no-LRTI subjects by employing machine learning (Methods). Elastic net regularized regression in the derivation cohort identified a 12-gene classifier that was then used to score patients based on a weighted sum of scaled expression values (Fig. 5 A and B and Dataset S7A). We found that predictive classifier genes up-regulated in LRTI+C+M patients compared with no-LRTI patients included NFAT-5, which plays a role in T-cell function and inducible gene transcription during immune responses (44); ZC3H11A, which encodes a zinc-finger protein involved in the regulation of cytokine production and immune cell activation (45); and PRRC2C, which functions in RNA binding and may play a role in hematopoietic progenitor cell differentiation in response to infection (46). Genes up-regulated in no-LRTI patients compared with LRTI+C+M patients included the following: CD36, which encodes a macrophage phagocytic receptor involved in scavenging dying/dead cells and oxidized lipids (47, 48); BLVRB, which is involved in oxidative stress responses (49); EDF1, which contributes to the regulation of nitric oxide release in endothelial cells (50); and ENG, an integral membrane glycoprotein receptor that may modulate inflammation and angiogenesis (51).
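The scoring step, a weighted sum of scaled expression values, can be illustrated as below. Only the gene symbols come from the text; the weights, the scaling means and standard deviations, and the patient values are hypothetical, and the real classifier uses 12 genes with coefficients fitted by elastic net.

```python
def host_classifier_score(expression, weights, means, sds, intercept=0.0):
    """Weighted sum of z-scaled expression values for the classifier genes."""
    return intercept + sum(
        weight * (expression[gene] - means[gene]) / sds[gene]
        for gene, weight in weights.items()
    )

# Hypothetical weights: positive for genes up-regulated in LRTI+C+M,
# negative for genes up-regulated in no-LRTI.
weights = {"ZC3H11A": 1.0, "PRRC2C": 0.5, "CD36": -1.0, "BLVRB": -0.5}
means = {gene: 10.0 for gene in weights}  # assumed training-set means
sds = {gene: 2.0 for gene in weights}     # assumed training-set standard deviations

# Hypothetical normalized expression for one patient.
patient = {"ZC3H11A": 14.0, "PRRC2C": 12.0, "CD36": 8.0, "BLVRB": 10.0}
print(host_classifier_score(patient, weights, means, sds))  # 3.5
```

A patient with high expression of LRTI-associated genes and low expression of no-LRTI-associated genes receives a higher score, which is then compared against a threshold chosen in the derivation cohort.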

Fig. 5.

Host transcriptional profiling distinguishes patients with acute LRTI (LRTI+C+M) from those with noninfectious acute respiratory illness (no-LRTI). (A) Host classifier scores for all patients in the derivation and validation cohorts; each bar indicates a patient score and is colored as follows: LRTI+C+M, red; no-LRTI, blue. Orange dotted line indicates the host classifier threshold (score, −4) that achieved 100% sensitivity in the training set and was used to classify the test set samples. (B) Normalized expression levels, arranged by unsupervised hierarchical clustering, reflect overexpression (blue) or underexpression (turquoise) of classifier genes (rows) for each patient (columns). Twelve genes were identified as predictive in the derivation cohort and subsequently applied to predict LRTI status in the validation cohort. Column colors above the heatmap indicate whether a patient belonged to the derivation cohort (dark gray) or validation cohort (light gray) and whether they were adjudicated to have LRTI+C+M (red) or no-LRTI (blue). (C) ROC curves demonstrating host classifier performance for derivation (blue) and validation (green) cohorts.

Classifier performance assessed by leave-one-out cross-validation demonstrated an AUC of 0.90 (95% CI, 0.75–1.00) in the derivation cohort and an AUC of 0.88 (95% CI, 0.75–1.00) in the validation cohort (Fig. 5C). Covariates for immune suppression, concurrent nonpulmonary infection, antibiotic use, age, and gender were iteratively incorporated into the regression model, but none was significant enough to be maintained when sparsity was added by elastic net (Dataset S7B). We tested whether differences in host gene expression could be attributed to enrichment of specific cell types using CIBERSORT (52) (Dataset S7C) and found that only M2 macrophages were enriched in the no-LRTI group (P = 0.03 by Wilcoxon rank sum).

Finally, given our modest sample size, we tested the statistical power of our host classifier by computing learning curves (Methods). We observed that even with subsampling, the 12 classifier genes were continually represented. While the derivation cohort sample size approached the limit required for robust performance assessment, the analysis suggested that additional patients might lead to further improvement (SI Appendix, Fig. S6A). A similar analysis for the pathogen versus commensal LRM indicated that performance metrics had converged with the given microbial sample size, indicating robust performance assessment and sufficient training data (SI Appendix, Fig. S6B).

Evaluation of a Combined LRTI Metric.

Given the relative success of each independent metric (pathogen, microbiome, and host) for discerning the presence of infection, we asked whether combining them could enhance LRTI detection. We recognized the potential of mNGS to empower a data-driven assessment of a patient’s LRTI status during the critical time frame following ICU admission. As such, we developed a readily interpretable compilation of host and pathogen mNGS metrics in a rule-out model designed to maximize LRTI diagnostic sensitivity. This process, which involved optimizing intrametric LRTI positivity thresholds in the derivation cohort and calling positivity based on either the host or pathogen scores (Methods), achieved a sensitivity and specificity of 100% and 87.5%, respectively, in the validation cohort, equating to a negative predictive value of 100% (Fig. 6B). Despite the limitations of a small cohort, we investigated whether the rule-out model could help curb broad-spectrum antibiotic overuse in the ICU by performing a theoretical calculation in the no-LRTI group to estimate the impact of mNGS result availability at 48 h postenrollment. This estimate suggested that a significant reduction in unnecessary empiric antibiotic use could have been possible (78 versus 50 d of therapy; P = 0.03; SI Appendix, Methods).
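The reported performance figures can be reproduced with simple arithmetic. The sketch below assumes the confusion-matrix counts implied by the text for the validation cohort (16 LRTI+C+M and 8 no-LRTI subjects, 100% sensitivity, 87.5% specificity); the counts are inferred from those percentages, and the function name is illustrative only.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, negative predictive value)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)
    return sensitivity, specificity, npv

# Counts inferred from the reported percentages (assumption, not tabulated here):
# 16 LRTI+C+M subjects all called positive; 7 of 8 no-LRTI subjects called negative.
sens, spec, npv = diagnostic_metrics(tp=16, fn=0, tn=7, fp=1)
print(sens, spec, npv)  # 1.0 0.875 1.0

# Relative reduction in antibiotic days of therapy (78 versus 50 d).
reduction = (78 - 50) / 78
print(round(100 * reduction))  # 36
```

Because no LRTI+C+M subject is called negative, every negative call is a true negative, which is why the negative predictive value reaches 100% even with one false positive.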

Fig. 6.


Combined LRTI prediction metric integrating pathogen detection and host gene expression. (A) Scores per patient for each of the two components of this LRTI rule-out model are projected into a scatterplot (x axis represents the host metric; y axis represents the microbe score). The thresholds optimized for sensitivity in the derivation cohort are indicated by gray dashed lines. Each point represents one patient; those in the derivation cohort have no fill, and those in the validation cohort are filled. Red indicates LRTI+C+M, and blue indicates no-LRTI subjects. (B) LRTI rule-out model results for each patient are shown for both the derivation and validation cohorts, with study subjects shown in rows and metrics in columns. Dark gray indicates a metric exceeded the optimized LRTI threshold; light gray indicates it did not. Dark red indicates the subject was positive by the combined pathogen-plus-host metric, and thus was classified as having LRTI. White indicates missing data.

Discussion

Of all infectious disease categories, LRTIs impart the greatest mortality both worldwide and in the United States (1). Contributing to this is the rising rate of treatment failure due to antibiotic resistance (53) and the limited performance of existing diagnostics for identifying respiratory pathogens (4, 54). In this prospective cohort study, we describe the use of unbiased mNGS for respiratory infectious disease diagnosis in the ICU. We develop methods that advance pathogen-based genomic diagnostics as well as existing host transcriptional classifier platforms by simultaneously assessing respiratory pathogens, the airway microbiome, and the host transcriptome in a single test to predict LRTI and identify disease etiology. We find that host/pathogen mNGS accurately detects LRTI in patients with acute respiratory failure and can provide a microbiologic diagnosis in cases of unknown etiology.

Host transcriptional profiling has gained attention as a promising approach to LRTI diagnosis (5, 16) but is understudied in critically ill and immunocompromised patients, who may be the most likely to benefit from this technology. We addressed this gap by interrogating airway gene expression in a critically ill cohort with 45% immunocompromised patients to develop an accurate host transcriptional classifier. Unlike existing classifiers, host–microbe mNGS offers the advantage of simultaneous species-level microbial identification.

The role of commensal lung microbiota in health and disease is an area of active investigation. We corroborated prior findings demonstrating microbiome differences between subjects with respiratory infections and those with noninfectious airway disease (20, 37). More specifically, we found that LRTI was associated with reduced intrapatient α diversity of the airway microbiome and that, collectively, patients with LRTI differed significantly from those without in terms of β diversity and microbial sequence abundance. This diversity difference was more pronounced when assessed by RNA-seq, potentially due to inclusion of RNA viruses and transcripts from actively replicating pathogens in infected patients. As a biomarker, RNA-seq SDI had moderate utility for predicting LRTI; however, it did not enhance performance in combination with the other metrics, perhaps due to negative correlation with microbe score (r = −0.84 in the derivation cohort).

Discriminating respiratory pathogens from background commensal microbiota is a key challenge for LRTI diagnostics and is particularly relevant for sensitive molecular assays (55). We directly addressed this by developing two complementary algorithms (RBM and LRM) that parsed putative pathogens from airway commensals. When combined, these models enabled a microbiologic diagnosis in significantly more patients with LRTI compared with clinician-ordered diagnostics. The fact that the a priori selected model features successfully differentiated pathogens from commensals validated the underlying model assumptions related to pathogen dominance resulting in disruption of α diversity. Notably, both models also proved useful despite widespread antibiotic use before airway sampling (90% of subjects), a practice that occurs commonly and that can sterilize microbial cultures (56).

The capacity for mNGS to detect pathogens unidentifiable by standard clinical diagnostics was highlighted in several cases, including that of subject 254, who developed rapidly worsening respiratory failure and fever during a prolonged postsurgical admission. He was treated empirically for hospital-acquired pneumonia with linezolid, aztreonam, and metronidazole. Lower respiratory cultures returned negative, but mNGS identified influenza C, which is not available on most clinical multiplex viral PCR assays. Notably, 12% of subjects were found to have undetected and potentially transmissible respiratory viruses despite strict precautionary respiratory contact policies at the study site, a finding that suggests the potential value of mNGS for hospital infection control. Several cases also highlighted the potential for mNGS to enhance antibiotic stewardship, and we estimated that theoretical implementation of the rule-out model within 48 h could have reduced antibiotic days of therapy by 36% in the no-LRTI validation cohort patients.

Since at the time of ICU admission it is often difficult to distinguish infectious from noninfectious acute respiratory disease, a theoretical workflow for host/microbe mNGS could involve first employing the rule-out model to assess LRTI probability and complement clinical decision making regarding discontinuation of empiric antimicrobials. In cases where LRTI was ultimately suspected, a microbiologic diagnosis could then be obtained using a combination of the RBM and LRM to accurately screen for both well-established and uncommon respiratory pathogens. A principal advantage of mNGS is that all potential infectious agents can be simultaneously assessed, which avoids the need for ordering multiple individual tests for each different pathogen of concern. Future studies in a larger validation cohort can help optimize host and microbe LRTI rule-out thresholds and further assess test performance before deployment in a clinical setting.

Some limitations of host/microbe mNGS were apparent and included false-positive detection of pathobionts such as H. influenzae and S. pneumoniae in the no-LRTI group, and false positivity of the host-response metric in subjects including patient 349, who was diagnosed with α-1 antitrypsin deficiency-associated pulmonary disease. The relatively small sample size of our derivation and validation cohorts increased the potential for data overfitting and was a limitation of our study. Learning curve estimates, however, indicated that the sample size was sufficient for pathogen versus commensal prediction and approached adequacy for the host classifier, consistent with the estimate from an established sample size prediction tool for high-dimensional classifiers (57) (SI Appendix, Methods). Nonetheless, a larger cohort will be necessary to improve the robustness of model performance estimates and better assess synergy resulting from combining host and microbial metrics.

Strengths of this study include an innovative bioinformatics approach, detailed patient phenotyping, and a study population reflective of the true heterogeneity of ICU patients, including severely immunocompromised subjects and patients receiving broad-spectrum antibiotics. Future studies in a larger cohort can further validate these findings, strengthen the utility of these models, and assess the impact of mNGS on clinical outcomes. In summary, we report a multifaceted approach to LRTI diagnosis that integrates three central elements of airway infections: the pathogen, airway microbiome, and host’s response.

Methods

Study Design and Subjects.

This prospective observational study evaluated adults with acute respiratory failure requiring mechanical ventilation who were admitted to the University of California, San Francisco (UCSF) Moffitt–Long Hospital ICUs. Subjects were enrolled sequentially between July 25, 2013, and October 17, 2017, within the first 72 h of intubation for respiratory failure. The UCSF Institutional Review Board approved an initial waiver of consent for obtaining excess respiratory fluid, blood, and urine samples, and informed consent was subsequently obtained from patients or their surrogates for continued study participation according to CHR protocol 10-02701. For patients whose surrogates provided informed consent, follow-up consent was then obtained if patients survived their acute illness and regained the ability to consent. For subjects who died before consent could be obtained, a full waiver of consent was approved. For all surviving subjects, if consent was not eventually obtained from either patient or surrogate, all specimens were discarded.

Clinical Microbiologic Testing.

During the period of study enrollment, subjects received standard-of-care microbiologic testing ordered by the treating clinicians. Respiratory testing from TA, bronchial alveolar lavage (BAL), or mini-BAL included the following: bacterial and fungal stains and semiquantitative cultures (n = 90); AFB stains and cultures (n = 8); 12-target clinical multiplex PCR (Luminex) for influenza A/B, respiratory syncytial virus (RSV), human metapneumovirus (HMPV), human rhinovirus (HRV), adenovirus (ADV), and parainfluenza viruses (PIV) 1–4 (n = 23); Legionella culture (n = 1); Legionella pneumophila PCR (n = 4); cytomegalovirus (CMV) culture (n = 4); and cytology for Pneumocystis jirovecii (n = 4). Other microbiologic testing included blood culture (n = 89); urine culture (n = 87); serum cryptococcal antigen (n = 4); serum galactomannan (n = 1); and serum β-d-glucan (n = 1).

Definitions and Clinical Adjudication of LRTI.

Because admission diagnoses made by treating clinicians at the time of study enrollment were by necessity based on incomplete clinical, microbiologic, and treatment outcome information, a post hoc adjudication approach was carried out to enhance accuracy of LRTI diagnosis. For this, two attending physicians [one from infectious disease (C.L.) and one from pulmonary medicine (F.M.)], blinded to mNGS results, retrospectively reviewed each patient’s medical record following hospital discharge or death to determine whether they met the CDC/NHSN surveillance definition of pneumonia, with respect to clinical and/or microbiologic criteria (Dataset S1) (19). Chart review consisted of in-depth analysis of complete patient histories, including laboratory and radiographic results, inpatient notes, and postdischarge clinic notes. Using this approach, subjects were assigned to one of four groups, consistent with a recently described approach (16): (i) LRTI defined by both clinical and laboratory criteria (LRTI+C+M); (ii) no evidence of respiratory infection and with a clear alternative explanation for respiratory failure (no-LRTI); (iii) LRTI defined by clinical criteria only (LRTI+C); and (iv) unknown, LRTI possible (unk-LRTI). A determination of noninfectious etiology was made only if an alternative diagnosis could be established and results of standard clinical microbiological testing for LRTI were negative.

Host/Microbe mNGS.

Excess TA was collected on ice, mixed 1:1 with DNA/RNA Shield (Zymo), and frozen at −80 °C. RNA and DNA were extracted from 300 µL of patient TA using bead-based lysis and the Allprep DNA/RNA kit (Qiagen). RNA was reverse transcribed to generate cDNA and used to construct sequencing libraries using the NEBNext Ultra II Library Prep Kit (New England Biolabs). DNA underwent adapter addition and barcoding using the Nextera library preparation kit (Illumina) as previously described (20). Depletion of abundant sequences by hybridization (DASH) was employed to selectively deplete human mitochondrial cDNA, thus enriching for both microbial and human protein coding transcripts (58). The final RNA-seq and DNA-seq libraries underwent 125-nt paired-end Illumina sequencing on a HiSeq 4000.

Pathogen Detection Bioinformatics.

Detection of host transcripts and airway microbes leveraged a custom bioinformatics pipeline (20) that incorporated quality filtering using PRICESeqfilter (63) and alignment against the human genome (NCBI GRCh38) using the STAR (59) aligner to extract gene counts. To capture respiratory pathogens, additional filtering to remove Pan troglodytes (UCSC PanTro4) was performed using STAR, and removal of nonfungal eukaryotes, cloning vectors, and phiX phage was performed using Bowtie2 (60). The identities of the remaining microbial reads were determined by querying the NCBI nucleotide (NT) and nonredundant protein (NR) databases using GSNAP-L and RAPSEARCH2, respectively.

Microbial alignments detected by RNA-seq and DNA-seq were aggregated to the genus level and independently evaluated to determine genus α diversity as described below. The sequencing reads comprising each genus were then evaluated for taxonomic assignment at the species level based on species relative abundance as previously described (20). For each patient, the top 15 most abundant taxa by RNA rpm were identified and evaluated under the requirement that all bacteria, fungi, and DNA viruses had concordant detection of their genomes by DNA-seq and concordant alignments in NR and NT. RNA viruses did not require concordant DNA-seq reads (Fig. 2 and Dataset S3A). To differentiate putative pathogens from commensal microbiota, we developed RBM and LRM methods and benchmarked each on sequencing data from LRTI+C+M and no-LRTI subjects.

Statistical Analysis.

Statistical significance was defined as P < 0.05, using two-tailed tests of hypotheses. Categorical data were analyzed by χ² test, and nonparametric continuous variables were analyzed by Wilcoxon rank sum. For statistical validation in the pathogen versus commensal and LRTI prediction metrics, 10 LRTI+C+M and 10 no-LRTI cases were randomly assigned to create a derivation cohort. Model performance was assessed in an independent validation cohort consisting of 16 LRTI+C+M and 8 no-LRTI cases.

Pathogen Versus Commensal Models.

We found that all clinically confirmed LRTI pathogens were present within the top 15 most abundant microbes by RNA-seq rpm, which on average represented 99% of reads across all samples. We thus limited analysis to the 15 most abundant NGS-detected genera in each sample. For both models, microbes identified using clinician-ordered diagnostics and all viruses with established respiratory pathogenicity in the derivation cohort subjects were considered “pathogens.” Any additional microbes identified by mNGS in these subjects were considered “commensals.” This equated to 12 “pathogens” and 155 “commensals” in the 20 derivation cohort patients, and 26 “pathogens” and 174 “commensals” in the 24 validation cohort patients.

RBM.

This model leveraged previous findings demonstrating that microbial communities in patients with LRTI are characterized by one or more dominant pathogens present in high abundance (20, 39). Using either RNA-seq rpm alone (RNA viruses) or the combination of RNA-seq and DNA-seq rpm (all others), this model identified the subset of microbes with the greatest relative abundance in each sample. This subset consisted of a single microbe in cases of a dominant pathogen, or of several microbes present within a similar abundance range in cases of coinfection. All viruses detected by RNA-seq at >0.1 rpm and present within the a priori-developed reference index of established respiratory pathogens were considered putative pathogens in the model. The remaining taxa (bacteria, fungi, and DNA viruses) were then aggregated at the genus level, assigned an abundance score based on [log(RNA-seq rpm) + log(DNA-seq rpm)], and sorted in descending order by this score. The greatest change in abundance score between sequentially ranked microbes was identified, and all genera with an abundance score greater than this threshold were then evaluated at the species level, by identifying the most abundant species within each genus. If the species was present within the a priori-developed reference index of established respiratory pathogens, it was selected as a putative pathogen by the model (Fig. 2).
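The RBM logic described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' released code: the function name, the input layout, the choice of base-10 logarithms, and the example taxa and rpm values are all assumptions made for the sketch.

```python
import math

def rbm_putative_pathogens(genera, viruses, pathogen_index):
    """Sketch of the rules-based model for one sample.

    genera: list of (genus, most_abundant_species, rna_rpm, dna_rpm), rpm > 0
    viruses: list of (virus_name, rna_rpm)
    pathogen_index: set of established respiratory pathogens
    """
    # Rule 1: indexed viruses detected by RNA-seq at > 0.1 rpm.
    calls = [name for name, rna_rpm in viruses
             if rna_rpm > 0.1 and name in pathogen_index]

    # Rule 2: rank genera by log(RNA rpm) + log(DNA rpm), highest first.
    # Base 10 is an assumption; the text only says "log".
    scored = sorted(
        ((math.log10(rna) + math.log10(dna), species)
         for _genus, species, rna, dna in genera),
        reverse=True,
    )
    if not scored:
        return calls

    # The largest drop in score between neighboring ranks marks the cutoff.
    gaps = [scored[i][0] - scored[i + 1][0] for i in range(len(scored) - 1)]
    n_keep = gaps.index(max(gaps)) + 1 if gaps else 1

    # Keep genera above the cutoff whose top species is in the index.
    calls += [sp for _score, sp in scored[:n_keep] if sp in pathogen_index]
    return calls

# Hypothetical sample: one dominant genus and three low-abundance genera.
genera = [
    ("Streptococcus", "Streptococcus pneumoniae", 5000.0, 800.0),
    ("Prevotella", "Prevotella melaninogenica", 40.0, 10.0),
    ("Veillonella", "Veillonella parvula", 25.0, 8.0),
    ("Rothia", "Rothia mucilaginosa", 10.0, 2.0),
]
viruses = [("Influenza A virus", 12.0), ("Torque teno virus", 3.0)]
index = {"Streptococcus pneumoniae", "Haemophilus influenzae", "Influenza A virus"}

result = rbm_putative_pathogens(genera, viruses, index)
print(result)  # ['Influenza A virus', 'Streptococcus pneumoniae']
```

In this toy sample the largest score drop falls between the first and second ranked genera, so only the dominant genus is evaluated at the species level. If several genera sat above the largest drop, all of them would be evaluated, which is how the rule accommodates coinfections.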

LRM.

This model employed the Python (version 3.6.1) sklearn (version 0.18.1) package and was trained to distinguish “pathogens” from “commensals” using the following five input features: log(RNA-seq rpm), log(DNA-seq rpm), per-patient RNA-seq abundance rank, and two binary variables indicating whether the microbe could be identified in the established index of respiratory pathogens or was a virus. These features were selected in alignment with the observation that the pathogens identified in the LRTI+C+M group were more abundant and within the top-ranked microbes. Moreover, the individual features were significantly different between the pathogens and commensals (RNA-seq rpm, P = 2.44 × 10⁻⁴; DNA-seq rpm, P = 3.55 × 10⁻³; scoring rank, P = 3.51 × 10⁻⁶). Model performance was estimated in the derivation and validation cohorts, and learning curves were computed (SI Appendix, Methods). For identification of the etiologic pathogens reported (Fig. 3 and Datasets S2 and S3A), the threshold of 0.36 was used for consistency between the LRM for pathogen identification and LRTI detection.

LRTI Prediction Based on Pathogen.

Outside of identifying putative LRTI pathogens, we evaluated whether LRM microbial score alone could be used to classify subjects as LRTI positive or LRTI negative. To do so, we used the top LRM-derived pathogen probability score per patient and evaluated the performance of this value alone to predict likelihood of infection in the LRTI+C+M versus no-LRTI subjects.
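As an illustration of this per-patient pathogen metric, the sketch below takes the highest microbe-level probability for each patient and computes an AUC by direct pairwise comparison (equivalent to the Mann–Whitney formulation). The patient identifiers, scores, and labels are hypothetical, and the helper names are not taken from the study's code.

```python
def top_score_per_patient(microbe_scores):
    """Reduce each patient's microbe-level probabilities to the single top score."""
    return {pid: max(scores) for pid, scores in microbe_scores.items()}

def auc(pos, neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical LRM probabilities for the microbes detected in four patients.
microbe_scores = {
    "P1": [0.91, 0.10, 0.05],
    "P2": [0.62, 0.30],
    "P3": [0.20, 0.08],
    "P4": [0.70, 0.02],
}
is_lrti = {"P1": True, "P2": True, "P3": False, "P4": False}

top = top_score_per_patient(microbe_scores)
pos = [s for pid, s in top.items() if is_lrti[pid]]
neg = [s for pid, s in top.items() if not is_lrti[pid]]
print(auc(pos, neg))  # 0.75
```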

Lung Microbiome Diversity Analysis.

The α diversity of the respiratory microbiome for each subject was assessed by SDI and Simpson diversity index at the genus level using NT rpm and the Vegan (version 2.4.4) (61) package in R (version 3.4.0) (62). Richness (total number of genera) and genus-specific library sequence abundance (total number of microbial reads normalized per million reads sequenced) were also evaluated. Viral, bacterial, and fungal microbes were included in all diversity analyses, computed independently for RNA- and DNA-seq samples without requiring that taxa be concordant on both nucleic acids. Diversity values were then compared between patients with clinically adjudicated LRTI (LRTI+C+M) and those with respiratory failure due to noninfectious causes (no-LRTI) using the nonparametric Wilcoxon rank sum test. Evaluation of α diversity for prediction of LRTI status was performed using the SDI value. The β diversity was evaluated using the Bray–Curtis dissimilarity metric calculated at the genus level using NT rpm and the Vegan package in R. Statistical significance of the β diversity between LRTI+C+M and no-LRTI patients was assessed using PERMANOVA (999 permutations), and the results were visualized using nonmetric multidimensional scaling.
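For readers unfamiliar with these indices, the following sketch shows the underlying formulas in plain Python. It is not the Vegan implementation used in the study, although the definitions are intended to match Vegan's defaults (natural-log Shannon index, Simpson index reported as 1 − Σp², and Bray–Curtis as Σ|x − y| / Σ(x + y)). The genus names and rpm values are hypothetical.

```python
import math

def shannon(abundances):
    """Shannon diversity index (natural log) from genus-level abundances."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)

def simpson(abundances):
    """Simpson diversity index, reported as 1 - sum(p^2)."""
    total = sum(abundances)
    return 1 - sum((a / total) ** 2 for a in abundances)

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two samples given as {genus: rpm}."""
    taxa = set(x) | set(y)
    num = sum(abs(x.get(t, 0) - y.get(t, 0)) for t in taxa)
    den = sum(x.get(t, 0) + y.get(t, 0) for t in taxa)
    return num / den

# Hypothetical genus-level rpm: one sample dominated by a single genus, one even.
dominated = {"Streptococcus": 970, "Prevotella": 10, "Veillonella": 10, "Rothia": 10}
even = {"Streptococcus": 250, "Prevotella": 250, "Veillonella": 250, "Rothia": 250}

print(shannon(list(dominated.values())) < shannon(list(even.values())))  # True
print(simpson(list(even.values())))  # 0.75
print(bray_curtis(dominated, even))  # 0.72
```

The dominated sample has the lower Shannon index, which mirrors the reduced α diversity reported for LRTI subjects with a dominant pathogen.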

Host Gene Expression Analysis.

Following quality filtration with PRICESeqfilter (63), RNA transcripts were aligned to the ENSEMBL GRCh38 human genome build using STAR. Subsequently, genes were filtered to include only protein-coding genes that were expressed in at least 50% of patients. All samples used for host transcriptome analysis (both derivation and validation sets) ultimately included more than 95,000 protein-coding genes with an average of 734,844 transcripts per patient.

Differential Expression Analysis.

Gene count data were analyzed using the Bioconductor package DESeq2 (version 1.16.1) (64) in the R statistical programming environment. To avoid batch-related confounding and class imbalance, we limited our differential expression analysis to the derivation cohort of 10 LRTI+C+M and 10 no-LRTI samples, sequenced in the same batch. Differentially expressed genes with FDR < 0.05 were used as input to ToppGene (43) to evaluate for functional pathway enrichment.

Host Gene Expression Classifier for LRTI Prediction.

The derivation cohort was independently normalized using DESeq2 and log-transformed. The values for each gene in the derivation cohort were then scaled and centered by z score. A classifier was built using the elastic net regularized regression model implementation from the glmnet package (version 2.0.13) in the R Statistical Programming Language (version 3.4.0). Regularization parameter α = 0.5 was selected using leave-one-out cross-validation and optimizing for AUC. To account for heterogeneity in the cohort, the model included covariates of concurrent bloodstream infection, immunosuppression, and gender. No significant difference was seen in these parameters between LRTI+C+M and no-LRTI (Dataset S7B). These covariates were reduced to zero in the model-fitting stage. Genes with nonzero weights were used for classification. To obtain a single-value score for each patient, genes selected by the elastic net were evaluated for their correlation with each of the two groups. Genes for which the mean expression was greater in the LRTI+C+M group were assigned a weight of 1, and those with mean expression greater in the no-LRTI group were assigned a weight of −1. The normalized, scaled expression values for each patient were multiplied by the weight vector and summed across all genes. The total sum was used as a representative score, and the AUC was calculated. Given the importance of sensitivity in the context of diagnostics, the threshold used for analysis of the test cohort and combined metrics (score, −4) was the one that provided 100% sensitivity in the derivation cohort. The host gene expression classifier was then validated on the validation set, and learning curves were used to estimate the reliability of the performance metrics (SI Appendix, Methods).
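The scoring step described above (z-scoring, signed unit weights, and summation) can be sketched as follows. This is an illustrative reconstruction only: it starts from already normalized, log-transformed values, omits the elastic net gene selection, uses the sample standard deviation for scaling (an assumption), and the gene names and expression values are hypothetical.

```python
from statistics import mean, stdev

def zscore_genes(expr):
    """expr: {gene: [value per patient]} -> values centered and scaled per gene."""
    scaled = {}
    for gene, values in expr.items():
        mu, sd = mean(values), stdev(values)  # sample SD (assumption)
        scaled[gene] = [(v - mu) / sd for v in values]
    return scaled

def signed_weights(z, is_lrti):
    """+1 if mean expression is higher in LRTI+C+M, otherwise -1."""
    weights = {}
    for gene, values in z.items():
        lrti_mean = mean(v for v, pos in zip(values, is_lrti) if pos)
        ctrl_mean = mean(v for v, pos in zip(values, is_lrti) if not pos)
        weights[gene] = 1 if lrti_mean > ctrl_mean else -1
    return weights

def host_scores(z, weights):
    """Per-patient score: weighted sum of scaled expression across genes."""
    n_patients = len(next(iter(z.values())))
    return [sum(weights[g] * z[g][i] for g in z) for i in range(n_patients)]

# Hypothetical log-normalized expression for two genes in four patients.
expr = {
    "GENE_UP": [8.0, 9.0, 3.0, 2.0],    # higher in LRTI+C+M
    "GENE_DOWN": [1.0, 2.0, 7.0, 8.0],  # higher in no-LRTI
}
is_lrti = [True, True, False, False]

z = zscore_genes(expr)
weights = signed_weights(z, is_lrti)
scores = host_scores(z, weights)
print(weights)  # {'GENE_UP': 1, 'GENE_DOWN': -1}
print([round(s, 2) for s in scores])
```

With the 12 classifier genes, a patient would be called host-positive when the summed score exceeds the −4 threshold reported in the text.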

Classifier Combination.

To generate a readily interpretable compilation of host and microbial mNGS metrics that could enable a data-driven assessment of LRTI, we developed the rule-out model. In this model, we identified the score thresholds for the pathogen and host metrics required to achieve 100% sensitivity in the derivation cohort (pathogen > 0.36, and host > −4) and applied these to the validation cohort to predict LRTI using the following combinatorial rule: LRTI = (Host) positive OR (Microbe) positive.
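The combinatorial rule is simple enough to state directly in code. The sketch below uses the two thresholds reported in the text; the function name and the example scores are hypothetical.

```python
PATHOGEN_THRESHOLD = 0.36  # top LRM pathogen probability, from the text
HOST_THRESHOLD = -4.0      # host classifier score, from the text

def rule_out_call(pathogen_score, host_score):
    """Return True (LRTI not ruled out) if either metric exceeds its threshold."""
    return pathogen_score > PATHOGEN_THRESHOLD or host_score > HOST_THRESHOLD

# Hypothetical patients: (top pathogen probability, host score).
print(rule_out_call(0.90, -10.0))  # True  (pathogen metric positive)
print(rule_out_call(0.10, -2.0))   # True  (host metric positive)
print(rule_out_call(0.10, -6.0))   # False (both negative: LRTI ruled out)
```

Using OR rather than AND trades specificity for sensitivity, which is the intended behavior of a rule-out test: a patient is called negative only when both metrics fall at or below their thresholds.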

Identification and Mitigation of Environmental Contaminants.

To minimize inaccurate taxonomic assignments due to environmental contaminants, we processed negative water controls with each group of samples that underwent nucleic acid extraction, and included these, as well as positive control clinical samples, with each sequencing run. We directly subtracted alignments to those taxa in water control samples detected by both RNA-seq and DNA-seq analyses (Dataset S8) from the raw rpm values in all samples. To account for selective amplification bias of contaminants in water controls resulting from PCR amplification of metagenomic libraries to a fixed standard concentration across all samples, before direct subtraction we scaled taxa rpms in the water controls to the median percent microbial reads present across all samples (0.04%). In addition, we confirmed reproducibility of results by sequencing 10% of samples in triplicate and evaluated discrepancies between mNGS and standard diagnostics in a random subset of LRTI+C patients using clinically validated 16S bacterial rRNA gene sequencing and/or viral PCR testing, as described above.
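One possible reading of the background subtraction step is sketched below. It assumes that water-control rpm values are rescaled so that the control's total microbial fraction equals the 0.04% median before subtraction, and that corrected values are floored at zero; both the scaling formula and the flooring are assumptions, as are the function name and example values.

```python
MEDIAN_MICROBIAL_FRACTION = 0.0004  # 0.04%, as reported in the text

def subtract_water_background(sample_rpm, water_rpm, water_microbial_fraction):
    """Subtract scaled water-control rpm from a sample's raw rpm.

    sample_rpm, water_rpm: {taxon: rpm}; water_rpm holds only taxa detected
    in the water controls by both RNA-seq and DNA-seq.
    water_microbial_fraction: fraction of water-control reads that are microbial.
    """
    # Rescale the control so its microbial load matches a typical sample (assumption).
    scale = MEDIAN_MICROBIAL_FRACTION / water_microbial_fraction
    corrected = {}
    for taxon, rpm in sample_rpm.items():
        background = water_rpm.get(taxon, 0.0) * scale
        corrected[taxon] = max(rpm - background, 0.0)  # floor at zero (assumption)
    return corrected

# Hypothetical values: a common reagent contaminant and a genuine airway genus.
sample = {"Ralstonia": 300.0, "Streptococcus": 1200.0}
water = {"Ralstonia": 500000.0}
corrected = subtract_water_background(sample, water, water_microbial_fraction=0.8)
print(corrected)
```

Without the rescaling, the contaminant rpm in a low-biomass water control would be far larger than in any patient sample, and direct subtraction would remove every taxon that appears in the control regardless of its true abundance.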

Data Availability.

Raw microbial sequences are available via SRA BioProject accession ID SRP139967. Host transcript counts are tabulated in Dataset S9. Scripts for the classification algorithms are available on GitHub at https://github.com/DeRisi-Lab/Host-MicrobeLRTI.

Supplementary Material

Supplementary File
pnas.1809700115.sd02.xlsx (13.2KB, xlsx)
Supplementary File
pnas.1809700115.sd03.xlsx (19.9KB, xlsx)
Supplementary File
pnas.1809700115.sd01.xlsx (19.7KB, xlsx)
Supplementary File
pnas.1809700115.sd05.xlsx (10.6KB, xlsx)
Supplementary File
pnas.1809700115.sd06.xlsx (165.3KB, xlsx)
Supplementary File
pnas.1809700115.sd07.xlsx (21.7KB, xlsx)
Supplementary File
pnas.1809700115.sd09.xlsx (22.3MB, xlsx)

Acknowledgments

This work was supported by National Heart, Lung, and Blood Institute (NHLBI) Grant K23HL138461-01A1 (to C.L.); NHLBI Grant K23HL123778 (to S.C.); National Institute of Allergy and Infectious Diseases Grant P01AI091575; NHLBI Grant K23 HL136844 (to F.M.); and NHLBI Grants R01HL110969, K24HL133390, and R35HL140026 (to C.S.C.). This work was also supported by Chan Zuckerberg Biohub (J.L.D.) and Gladstone Institutes (K.S.P.).

Footnotes

The authors declare no conflict of interest.

Data deposition: The raw microbial sequences reported in this paper have been deposited in Sequence Read Archive BioProject (accession no. SRP139967). Host transcript counts are tabulated in Dataset S9. Scripts for the classification algorithms are available on GitHub at https://github.com/DeRisi-Lab/Host-MicrobeLRTI.

See Commentary on page 13148.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1809700115/-/DCSupplemental.

References

  • 1. World Health Organization (2017) The top 10 causes of death. Available at www.who.int/en/news-room/fact-sheets/detail/the-top-10-causes-of-death. Accessed October 1, 2018.
  • 2. US Centers for Disease Control and Prevention (2018) Deaths: Leading Causes for 2016. Available at https://www.cdc.gov/nchs/fastats/leading-causes-of-death.htm. Accessed October 1, 2018.
  • 3. El Bcheraoui C, et al. Trends and patterns of differences in infectious disease mortality among US counties, 1980–2014. JAMA. 2018;319:1248–1260. doi: 10.1001/jama.2018.2089.
  • 4. Jain S, et al.; CDC EPIC Study Team. Community-acquired pneumonia requiring hospitalization among U.S. adults. N Engl J Med. 2015;373:415–427. doi: 10.1056/NEJMoa1500245.
  • 5. Zaas AK, et al. The current epidemiology and clinical decisions surrounding acute respiratory infections. Trends Mol Med. 2014;20:579–588. doi: 10.1016/j.molmed.2014.08.001.
  • 6. Wilson MR, et al. Actionable diagnosis of neuroleptospirosis by next-generation sequencing. N Engl J Med. 2014;370:2408–2417. doi: 10.1056/NEJMoa1401268.
  • 7. Leffler DA, Lamont JT. Clostridium difficile infection. N Engl J Med. 2015;372:1539–1548. doi: 10.1056/NEJMra1403772.
  • 8. Ranzani OT, et al. New sepsis definition (Sepsis-3) and community-acquired pneumonia mortality. A validation and clinical decision-making study. Am J Respir Crit Care Med. 2017;196:1287–1297. doi: 10.1164/rccm.201611-2262OC.
  • 9. Bibby K. Metagenomic identification of viral pathogens. Trends Biotechnol. 2013;31:275–279. doi: 10.1016/j.tibtech.2013.01.016.
  • 10. Yozwiak NL, et al. Virus identification in unknown tropical febrile illness cases using deep sequencing. PLoS Negl Trop Dis. 2012;6:e1485. doi: 10.1371/journal.pntd.0001485.
  • 11. Fischer N, et al. Evaluation of unbiased next-generation sequencing of RNA (RNA-seq) as a diagnostic method in influenza virus-positive respiratory samples. J Clin Microbiol. 2015;53:2238–2250. doi: 10.1128/JCM.02495-14.
  • 12. Graf EH, et al. Unbiased detection of respiratory viruses by use of RNA sequencing-based metagenomics: A systematic comparison to a commercial PCR panel. J Clin Microbiol. 2016;54:1000–1007. doi: 10.1128/JCM.03060-15.
  • 13. Wilson MR, et al. Diagnosing Balamuthia mandrillaris encephalitis with metagenomic deep sequencing. Ann Neurol. 2015;78:722–730. doi: 10.1002/ana.24499.
  • 14. Wilson MR, et al. Chronic meningitis investigated via metagenomic next-generation sequencing. JAMA Neurol. 2018;75:947–955. doi: 10.1001/jamaneurol.2018.0463.
  • 15. Naccache SN, et al. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 2014;24:1180–1192. doi: 10.1101/gr.171934.113.
  • 16. Tsalik EL, et al. Host gene expression classifiers diagnose acute respiratory illness etiology. Sci Transl Med. 2016;8:322ra11. doi: 10.1126/scitranslmed.aad6873.
  • 17. Suarez NM, et al. Superiority of transcriptional profiling over procalcitonin for distinguishing bacterial from viral lower respiratory tract infections in hospitalized adults. J Infect Dis. 2015;212:213–222. doi: 10.1093/infdis/jiv047.
  • 18. Tsalik EL, McClain M, Zaas AK. Moving toward prime time: Host signatures for diagnosis of respiratory infections. J Infect Dis. 2015;212:173–175. doi: 10.1093/infdis/jiv032.
  • 19. US Centers for Disease Control and Prevention (2017) CDC/NHSN surveillance definitions for specific types of infections. Available at https://www.cdc.gov/nhsn/pdfs/pscmanual/17pscnosinfdef_current.pdf. Accessed October 1, 2018.
  • 20. Langelier C, et al. Metagenomic sequencing detects respiratory pathogens in hematopoietic cellular transplant patients. Am J Respir Crit Care Med. 2018;197:524–528. doi: 10.1164/rccm.201706-1097LE.
  • 21. Doan T, et al. Illuminating uveitis: Metagenomic deep sequencing identifies common and rare pathogens. Genome Med. 2016;8:90. doi: 10.1186/s13073-016-0344-6.
  • 22. Dickson RP, et al. Bacterial topography of the healthy human lower respiratory tract. MBio. 2017;8:e02287-16. doi: 10.1128/mBio.02287-16.
  • 23. Panzer AR, et al. Lung microbiota is related to smoking status and to development of acute respiratory distress syndrome in critically ill trauma patients. Am J Respir Crit Care Med. 2018;197:621–631. doi: 10.1164/rccm.201702-0441OC.
  • 24. Morris A, et al.; Lung HIV Microbiome Project. Comparison of the respiratory microbiome in healthy nonsmokers and smokers. Am J Respir Crit Care Med. 2013;187:1067–1075. doi: 10.1164/rccm.201210-1913OC.
  • 25. Segal LN, et al. Enrichment of the lung microbiome with oral taxa is associated with lung inflammation of a Th17 phenotype. Nat Microbiol. 2016;1:16031. doi: 10.1038/nmicrobiol.2016.31.
  • 26. Heinonen S, et al. Rhinovirus detection in symptomatic and asymptomatic children: Value of host transcriptome analysis. Am J Respir Crit Care Med. 2016;193:772–782. doi: 10.1164/rccm.201504-0749OC.
  • 27. Wertheim HFL, et al. The role of nasal carriage in Staphylococcus aureus infections. Lancet Infect Dis. 2005;5:751–762. doi: 10.1016/S1473-3099(05)70295-4.
  • 28. McCullers JA. The co-pathogenesis of influenza viruses with bacteria in the lung. Nat Rev Microbiol. 2014;12:252–262. doi: 10.1038/nrmicro3231.
  • 29. Magill SS, et al.; Emerging Infections Program Healthcare-Associated Infections and Antimicrobial Use Prevalence Survey Team. Multistate point-prevalence survey of health care-associated infections. N Engl J Med. 2014;370:1198–1208. doi: 10.1056/NEJMoa1306801.
  • 30. Kalil AC, et al. Management of adults with hospital-acquired and ventilator-associated pneumonia: 2016 clinical practice guidelines by the Infectious Diseases Society of America and the American Thoracic Society. Clin Infect Dis. 2016;63:e61–e111. doi: 10.1093/cid/ciw353.
  • 31. Mandell LA, et al.; Infectious Diseases Society of America; American Thoracic Society. Infectious Diseases Society of America/American Thoracic Society consensus guidelines on the management of community-acquired pneumonia in adults. Clin Infect Dis. 2007;44(Suppl 2):S27–S72. doi: 10.1086/511159.
  • 32.Cillóniz C, Civljak R, Nicolini A, Torres A. Polymicrobial community-acquired pneumonia: An emerging entity. Respirology. 2016;21:65–75. doi: 10.1111/resp.12663. [DOI] [PubMed] [Google Scholar]
  • 33.Pabbaraju K, et al. Detection of influenza C virus by a real-time RT-PCR assay. Influenza Other Respir Viruses. 2013;7:954–960. doi: 10.1111/irv.12099. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Dewhirst FE, et al. The human oral microbiome. J Bacteriol. 2010;192:5002–5017. doi: 10.1128/JB.00542-10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Chen C, et al. New microbiota found in sputum from patients with community-acquired pneumonia. Acta Biochim Biophys Sin (Shanghai) 2013;45:1039–1048. doi: 10.1093/abbs/gmt116. [DOI] [PubMed] [Google Scholar]
  • 36.Ichinohe T, et al. Microbiota regulates immune defense against respiratory tract influenza A virus infection. Proc Natl Acad Sci USA. 2011;108:5354–5359. doi: 10.1073/pnas.1019378108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Abreu NA, et al. Sinus microbiome diversity depletion and Corynebacterium tuberculostearicum enrichment mediates rhinosinusitis. Sci Transl Med. 2012;4:151ra124. doi: 10.1126/scitranslmed.3003783. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Dickson RP, et al. Analysis of culture-dependent versus culture-independent techniques for identification of bacteria in clinically obtained bronchoalveolar lavage fluid. J Clin Microbiol. 2014;52:3605–3613. doi: 10.1128/JCM.01028-14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Flanagan JL, et al. Loss of bacterial diversity during antibiotic treatment of intubated patients colonized with Pseudomonas aeruginosa. J Clin Microbiol. 2007;45:1954–1962. doi: 10.1128/JCM.02187-06. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Birtel J, Walser J-C, Pichon S, Bürgmann H, Matthews B. Estimating bacterial diversity for ecological studies: Methods, metrics, and assumptions. PLoS One. 2015;10:e0125356. doi: 10.1371/journal.pone.0125356. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Bray JR, Curtis JT. An ordination of the upland forest communities of southern Wisconsin. Ecol Monogr. 1957;27:325–349. [Google Scholar]
  • 42.Sweeney TE, Wong HR, Khatri P. Robust classification of bacterial and viral infections via integrated host gene expression diagnostics. Sci Transl Med. 2016;8:346ra91. doi: 10.1126/scitranslmed.aaf7165. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Chen J, Bardes EE, Aronow BJ, Jegga AG. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 2009;37:W305–W311. doi: 10.1093/nar/gkp427. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Macian F. NFAT proteins: Key regulators of T-cell development and function. Nat Rev Immunol. 2005;5:472–484. doi: 10.1038/nri1632. [DOI] [PubMed] [Google Scholar]
  • 45.Fu M, Blackshear PJ. RNA-binding proteins in immune regulation: A focus on CCCH zinc finger proteins. Nat Rev Immunol. 2017;17:130–143. doi: 10.1038/nri.2016.129. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Biswas K, et al. Differentially regulated host proteins associated with chronic rhinosinusitis are correlated with the sinonasal microbiome. Front Cell Infect Microbiol. 2017;7:504. doi: 10.3389/fcimb.2017.00504. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Stewart CR, et al. CD36 ligands promote sterile inflammation through assembly of a Toll-like receptor 4 and 6 heterodimer. Nat Immunol. 2010;11:155–161. doi: 10.1038/ni.1836. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Cohen TS, et al. S. aureus blocks efferocytosis of neutrophils by macrophages through the activity of its virulence factor alpha toxin. Sci Rep. 2016;6:35466. doi: 10.1038/srep35466. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Baranano DE, Rao M, Ferris CD, Snyder SH. Biliverdin reductase: A major physiologic cytoprotectant. Proc Natl Acad Sci USA. 2002;99:16093–16098. doi: 10.1073/pnas.252626999. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Leidi M, Mariotti M, Maier JAM. EDF-1 contributes to the regulation of nitric oxide release in VEGF-treated human endothelial cells. Eur J Cell Biol. 2010;89:654–660. doi: 10.1016/j.ejcb.2010.05.001. [DOI] [PubMed] [Google Scholar]
  • 51.Pousada G, Baloira A, Fontán D, Núñez M, Valverde D. Mutational and clinical analysis of the ENG gene in patients with pulmonary arterial hypertension. BMC Genet. 2016;17:72. doi: 10.1186/s12863-016-0384-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Newman AM, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12:453–457. doi: 10.1038/nmeth.3337. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Currie CJ, et al. Antibiotic treatment failure in four common infections in UK primary care 1991–2012: Longitudinal analysis. BMJ. 2014;349:g5493. doi: 10.1136/bmj.g5493. [DOI] [PubMed] [Google Scholar]
  • 54.Jain S, Finelli L. CDC EPIC Study Team Community-acquired pneumonia among U.S. children. N Engl J Med. 2015;372:2167–2168. doi: 10.1056/NEJMc1504028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Walter JM, Wunderink RG. Severe respiratory viral infections: New evidence and changing paradigms. Infect Dis Clin North Am. 2017;31:455–474. doi: 10.1016/j.idc.2017.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Sands KM, et al. Respiratory pathogen colonization of dental plaque, the lower airways, and endotracheal tube biofilms during mechanical ventilation. J Crit Care. 2017;37:30–37. doi: 10.1016/j.jcrc.2016.07.019. [DOI] [PubMed] [Google Scholar]
  • 57.Dobbin KK, Zhao Y, Simon RM. How large a training set is needed to develop a classifier for microarray data? Clin Cancer Res. 2008;14:108–114. doi: 10.1158/1078-0432.CCR-07-0443. [DOI] [PubMed] [Google Scholar]
  • 58.Gu W, et al. Depletion of Abundant Sequences by Hybridization (DASH): Using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications. Genome Biol. 2016;17:41. doi: 10.1186/s13059-016-0904-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Dobin A, et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Oksanen J, et al. 2016 vegan: Community Ecology Package. R Package, Version 2.3-5. Available at https://rdrr.io/rforge/vegan/. Accessed October 1, 2017.
  • 62.R Core Team 2013 R: A Language and Environment for Statistical Computing, Version 3.4.0 (R Foundation for Statistical Computing, Vienna). Available at www.R-project.org/. Accessed October 1, 2017.
  • 63.Ruby JG, Bellare P, Derisi JL. PRICE: Software for the targeted assembly of components of (Meta) genomic sequence data. G3 (Bethesda) 2013;3:865–880. doi: 10.1534/g3.113.005967. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Fine MJ, et al. A prediction rule to identify low-risk patients with community-acquired pneumonia. N Engl J Med. 1997;336:243–250. doi: 10.1056/NEJM199701233360402. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplementary File
pnas.1809700115.sd02.xlsx (13.2KB, xlsx)
Supplementary File
pnas.1809700115.sd03.xlsx (19.9KB, xlsx)
Supplementary File
pnas.1809700115.sd01.xlsx (19.7KB, xlsx)
Supplementary File
pnas.1809700115.sd05.xlsx (10.6KB, xlsx)
Supplementary File
pnas.1809700115.sd06.xlsx (165.3KB, xlsx)
Supplementary File
pnas.1809700115.sd07.xlsx (21.7KB, xlsx)
Supplementary File
pnas.1809700115.sd09.xlsx (22.3MB, xlsx)

Data Availability Statement

Raw microbial sequences are available via SRA BioProject accession ID SRP139967. Host transcript counts are tabulated in SI Appendix, Dataset S9. Scripts for the classification algorithms are available on GitHub at https://github.com/DeRisi-Lab/Host-MicrobeLRTI.


