ABSTRACT
Methicillin-resistant Staphylococcus aureus (MRSA) is an uncommon but serious cause of community-acquired pneumonia (CAP). A lack of validated MRSA CAP risk factors can result in overuse of empirical broad-spectrum antibiotics. We sought to develop robust machine learning models predicting the risk of MRSA CAP in a population-based sample of hospitalized patients with CAP admitted to either a tertiary academic center or a community teaching hospital. Cases were CAP patients with MRSA isolated from blood or respiratory cultures within 72 h of admission; controls did not have MRSA CAP. The Classification Tree Analysis algorithm was used for model development, and model predictions were evaluated in sensitivity analyses. A total of 21 of 1,823 patients (1.2%) had MRSA isolated within 72 h of admission. MRSA risk was higher among patients admitted to the intensive care unit (ICU) in the first 24 h who required mechanical ventilation than among ICU patients who did not require ventilatory support (odds ratio [OR], 8.3; 95% confidence interval [CI], 2.4 to 32). MRSA risk was lower among patients admitted to ward units than among those admitted to the ICU (OR, 0.21; 95% CI, 0.07 to 0.56) and lower among ICU patients without a history of antibiotic use in the last 90 days than among ICU patients with antibiotic use in the last 90 days (OR, 0.03; 95% CI, 0.002 to 0.59). The final machine learning model was highly accurate (receiver operating characteristic [ROC] area = 0.775) in training and jackknife validity analyses. We identified a relatively simple machine learning model that predicted MRSA risk in hospitalized patients with CAP within 72 h postadmission.
KEYWORDS: MRSA infection, antibiotic stewardship, community-acquired pneumonia, machine learning, predictive model
INTRODUCTION
Methicillin-resistant Staphylococcus aureus (MRSA) is a pathogen of concern among patients with pneumonia, particularly those with health care exposure. According to the Agency for Healthcare Research and Quality, pneumonia was the second most common clinical condition among hospitalized patients with MRSA (17%), after skin and soft tissue infections (42%) (1). Whereas hospital-acquired pneumonia (HAP) guidelines recommend empirical MRSA coverage when MRSA rates exceed 10 to 20% of S. aureus isolates, community-acquired pneumonia (CAP) guidelines recommend using epidemiology and risk factors to determine the need for MRSA coverage (2). However, prior research demonstrates that MRSA causes a very small proportion of CAP cases within the United States (approximately 0.7%) (3). Concurrently, empirical anti-MRSA therapy (e.g., vancomycin) is disproportionately prescribed for CAP, at a rate 30-fold higher than the observed prevalence of MRSA (3), without yielding improvement in clinical outcomes (4). Prior research has also shown that vancomycin is associated with a 10% attributable risk of acute kidney injury (AKI) (5), making avoidance of unnecessary vancomycin a major focus of antimicrobial stewardship programs. Yet, vancomycin remains the single most prescribed antibiotic across all hospitalized patients (6). Thus, a need exists to optimize antibiotic decision making for hospitalized patients with CAP, a population at high risk for unnecessary vancomycin use.
Patients with CAP are prescribed broad-spectrum antibiotics far too frequently (7, 8). Inappropriate antibiotic prescribing for CAP is associated with prolonged lengths of stay (9) and greater mortality (10). Optimizing antibiotic prescribing for patients with CAP is thus hypothesized to improve clinical outcomes and reduce the risk of developing vancomycin-induced AKI. In the empirical treatment of CAP, the need for anti-MRSA therapy may be difficult to discern. Models of MRSA infection risk that capture the small number of actual cases without leading to excess anti-MRSA antibiotic treatment are clearly needed. Several models of drug-resistant pathogen infection risk for patients with pneumonia have been published (10–12); however, the validity of each model for predicting MRSA infection risk in patients with CAP has not been robustly evaluated. Furthermore, MRSA risk models should be relatively simple to use—allowing patient-specific risk factors to be extracted from a hospital’s electronic health record (EHR) and to be integrated within pneumonia-specific clinical decision support tools. Accordingly, this study sought to develop and evaluate machine learning models capable of using information available in the EHR to predict the risk of MRSA CAP early in the course of hospital admission.
RESULTS
Included patients and prevalence of MRSA.
During the study period, the Northwestern University Electronic Data Warehouse (EDW) query retrieved 2,062 patient charts that met our CAP definition, representing 1,893 unique patients. A summary of patients meeting inclusion or exclusion criteria is provided in Fig. 1. Of these patients, 63 had a Gram-negative infection in the absence of MRSA and were excluded (n = 1,830 remaining). Of the remaining patients, seven had MRSA cultured from sites other than the blood or respiratory tract and were excluded. Thus, 1,823 unique patients were included (n = 1,341 from Northwestern Memorial Hospital [NMH] and n = 482 from Lake Forest Hospital [LFH]). MRSA nasal screening was not universally performed during the study period. MRSA nares colonization was 6.6% among patients admitted to the intensive care unit (ICU) within the first 24 h, compared to 1.3% among patients who were not initially admitted to the ICU. Within the first 72 h after admission, 21 patients (1.2%) had MRSA in blood (n = 4) and/or respiratory (n = 18) cultures; these 21 patients were classified as cases (Fig. 1), and the remaining 1,802 patients were classified as controls in the analysis.
FIG 1.
Overview of CAP cohort and distribution of MRSA cases and controls. GNR, resistant Gram-negative pathogen; LFH, Lake Forest Hospital; MRSA, methicillin-resistant Staphylococcus aureus; NMH, Northwestern Memorial Hospital.
Patient demographics.
Preadmission demographics and postadmission variables for cases and controls are summarized in Table S1 in the supplemental material. Cases were 48% male, had a median age of 61 years, and self-identified as Caucasian (57%), African American (33%), or Asian (5%). Controls were 50% male, had a median age of 67 years, and self-identified as Caucasian (60%), African American (22%), or Asian (3%). Patients were admitted from the emergency department (57% cases versus 39% controls), from home (38% cases versus 53% controls), or as an external transfer (4.8% cases versus 6.8% controls). MRSA cases were more likely to require ICU care within 24 h of admission (67% versus 30%) and more likely to have severe CAP (43% versus 9%) than controls. The median pneumonia severity index (PSI) was higher among MRSA cases than among controls (155 versus 119).
Table 1 summarizes univariate predictors identified by Optimal Discriminant Analysis (ODA). The following “leave-one-out” (LOO)-stable variables were identified as predictors of MRSA within 72 h of admission with an experiment-wise P value of <0.05: requirement of vasopressors within the first 24 h (effect strength for sensitivity [ESS] = 22.97), severe CAP (ESS = 33.92), ICU admission within the first 24 h (ESS = 37.09), requirement of mechanical ventilation within the first 24 h (ESS = 37.42), and history of mechanical ventilation in the prior year (ESS = 40.18).
TABLE 1.
Attributes that discriminated between patients with and without MRSA in training and validity analyses in univariate ODAᵃ
| Evaluation time | Pre- or postadmission clinical attribute | If attribute is | Then predict | Otherwise predict | ROC areaᵈ | LOO ESS (%)ᵇ | LOO P valueᶜ |
|---|---|---|---|---|---|---|---|
| ≤72 h | Mechanical ventilation (prior yr) | Yes | MRSA | No MRSA | 0.7009 | 40.18 | 0.000001* |
| | Mechanical ventilation (first 24 h) | Yes | MRSA | No MRSA | 0.6871 | 37.42 | 9.65E-07* |
| | ICU requirement (first 24 h) | Yes | MRSA | No MRSA | 0.6855 | 37.09 | 0.000527* |
| | Severe CAP | Yes | MRSA | No MRSA | 0.6697 | 33.92 | 0.000048* |
| | Antibiotics in prior 90 days | Yes | MRSA | No MRSA | 0.6653 | 33.06 | 0.00129 |
| | Antibiotics in prior 180 days | Yes | MRSA | No MRSA | 0.6389 | 27.78 | 0.008934 |
| | LTAC or inpatient rehabilitation admission (prior yr) | Yes | MRSA | No MRSA | 0.6306 | 26.12 | 0.000553 |
| | Nursing home residence (prior yr) | Yes | MRSA | No MRSA | 0.6263 | 25.26 | 0.009885 |
| | Vasopressor requirement (first 24 h) | Yes | MRSA | No MRSA | 0.6148 | 22.97 | 0.000936* |
| | Age (yr) | >49.5 | No MRSA | MRSA | 0.6108 | 22.17 | 0.012607 |
| | Systolic blood pressure (mm Hg, first 24 h) | >107.5 | No MRSA | MRSA | 0.5994 | 19.88 | 0.03224 |
| | Recent hospital admission (prior 90 days) | Yes | MRSA | No MRSA | 0.5983 | 19.66 | 0.037544 |
| | Altered mental status on admission | Yes | MRSA | No MRSA | 0.5959 | 19.17 | 0.033174 |
| | Antibiotics in prior 60 days | Yes | MRSA | No MRSA | 0.5936 | 18.73 | 0.037566 |
| | Received i.v. fluid resuscitation (first 24 h) | Yes | MRSA | No MRSA | 0.5750 | 15.00 | 0.010185 |
| | Requirement of tube feeding (prior yr) | Yes | MRSA | No MRSA | 0.5437 | 8.75 | 0.013765 |
ᵃ LOO, leave-one-out jackknife validity analysis; MRSA, methicillin-resistant Staphylococcus aureus infection; ODA, Optimal Discriminant Analysis; ROC, receiver operating characteristic; ESS, effect strength for sensitivity; LTAC, long-term acute care; i.v., intravenous.
ᵇ ESS for a binary outcome is equivalent to the corresponding ROC area adjusted for chance (ESS = 0 is accuracy expected by chance, ESS = 100 is perfect prediction, and −100 ≤ ESS < 0 is prediction worse than expected by chance). In line with prior research, we considered any (weighted) ESS values less than 25% to indicate a relatively weak effect, values of 25% to 50% to indicate a moderate effect, values of 50% to 75% to indicate a relatively strong effect, and values of 75% or greater to indicate a strong effect (13).
ᶜ Asterisks indicate that the comparison was significant at the experiment-wise level (i.e., Sidak-adjusted P < 0.05).
ᵈ For a binary outcome and unit-weighted data, the ROC area ranges from 0 (every observation misclassified) through 0.5 (prediction expected by chance) to 1 (perfect prediction); values below 0.5 indicate prediction worse than chance. The ROC area is equivalent to the mean predictive accuracy achieved across classes.
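The relation between ROC area and ESS stated in the Table 1 footnotes can be checked directly against the table. The minimal Python sketch below assumes the unit-weighted binary case, so that ESS = 200 × (ROC area − 0.5), and encodes the effect-strength bands from the ESS footnote; the function names are ours and are not part of the ODA software. Applied to the Table 1 ROC areas, it reproduces the reported ESS values to within rounding of the published ROC areas (e.g., 0.6855 gives 37.10 versus the reported 37.09).

```python
# Sketch: binary-outcome ROC area <-> ESS conversion and the effect-strength
# bands used in this study (ref. 13). Assumes unit-weighted binary outcomes.


def ess_from_roc(roc_area: float) -> float:
    """Chance-corrected accuracy (ESS, %) implied by a binary ROC area."""
    return 100.0 * (roc_area - 0.5) / 0.5


def roc_from_ess(ess: float) -> float:
    """ROC area implied by an ESS value (%); inverse of ess_from_roc."""
    return 0.5 + ess / 200.0


def effect_label(ess: float) -> str:
    """Map an ESS value (%) to the strength bands described in the footnote."""
    if ess < 0:
        return "worse than chance"
    if ess < 25:
        return "relatively weak"
    if ess < 50:
        return "moderate"
    if ess < 75:
        return "relatively strong"
    return "strong"


# (attribute, ROC area) pairs copied from Table 1.
for name, roc in [
    ("Mechanical ventilation (prior yr)", 0.7009),
    ("ICU requirement (first 24 h)", 0.6855),
    ("Requirement of tube feeding (prior yr)", 0.5437),
]:
    ess = ess_from_roc(roc)
    print(f"{name}: ESS = {ess:.2f} ({effect_label(ess)})")
```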
Globally optimal CTA model of MRSA risk.
Figure 2 presents a graphical depiction of the attributes and strata that comprised the globally optimal Classification Tree Analysis (CTA) model of MRSA risk. MRSA risk was higher among patients admitted to the ICU in the first 24 h who required mechanical ventilation than among ICU patients who did not require ventilatory support (odds ratio [OR], 8.3; 95% confidence interval [CI], 2.4 to 32), was lower among patients who were admitted to ward units than among ICU patients (OR, 0.21; 95% CI, 0.07 to 0.56), and was lower among ICU patients who did not have a history of antibiotic use in the last 90 days than among ICU patients with a history of antibiotic use in the last 90 days (OR, 0.03; 95% CI, 0.002 to 0.59). The risk of MRSA according to the model is summarized in Table 2, where each model endpoint (stratum) is ranked from low to high probability.
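The odds ratio point estimates above can be recovered from 2 × 2 counts. The sketch below uses case counts that we back-calculated from the stage sizes and percent MRSA in Table 2 (e.g., 8.65% of 104 ventilated ICU patients is 9 cases); these counts are implied by, not listed in, the tables, and the antibiotic comparison is taken within the nonventilated ICU branch of the tree, which is our reading of Fig. 2. Only point estimates are computed: a Haldane-Anscombe 0.5 correction is applied when a cell is zero, and the published 95% CIs were not necessarily obtained this way.

```python
# Sketch: odds ratio point estimates from back-calculated Table 2 counts.


def odds_ratio(cases_a: int, n_a: int, cases_b: int, n_b: int) -> float:
    """Odds of MRSA in group A divided by odds of MRSA in group B.

    Adds 0.5 to every cell (Haldane-Anscombe correction) only when some cell
    is zero, so the ratio stays finite.
    """
    a, b = cases_a, n_a - cases_a  # group A: cases, controls
    c, d = cases_b, n_b - cases_b  # group B: cases, controls
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a / b) / (c / d)


# (cases, n) per Table 2 stage; case counts back-calculated from percent MRSA.
ICU_NO_VENT_NO_ABX = (0, 323)  # stage 1
WARD = (7, 1276)               # stage 2
ICU_NO_VENT_ABX = (5, 120)     # stage 3
ICU_VENT = (9, 104)            # stage 4

icu_no_vent = (0 + 5, 323 + 120)
icu_all = (0 + 5 + 9, 323 + 120 + 104)

or_vent = odds_ratio(*ICU_VENT, *icu_no_vent)                  # ventilated vs not (ICU)
or_ward = odds_ratio(*WARD, *icu_all)                          # ward vs ICU
or_no_abx = odds_ratio(*ICU_NO_VENT_NO_ABX, *ICU_NO_VENT_ABX)  # no prior abx vs prior abx

print(f"OR, ventilated vs nonventilated ICU: {or_vent:.1f}")    # → 8.3
print(f"OR, ward vs ICU: {or_ward:.2f}")                        # → 0.21
print(f"OR, no prior antibiotics vs prior: {or_no_abx:.2f}")    # → 0.03
```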
FIG 2.
Classification tree model predicting methicillin-resistant Staphylococcus aureus (MRSA) in culture within 72 h postadmission. All included model attributes were considered predictors of case status up through but not beyond 72 h postadmission.
TABLE 2.
Staging table for predicting MRSA likelihood within 72 h postadmission
| Evaluation time frame | Stage | Admitted to ICU (first 24 h) | Mechanical ventilation (first 24 h) | Antibiotics within 90 days | n | Odds of MRSA | Percent MRSAᵃ |
|---|---|---|---|---|---|---|---|
| ≤72 h | 1 | Yes | No | No | 323 | <1:323 | 0.3ᵇ |
| | 2 | No | | | 1,276 | 1:181 | 0.55 |
| | 3 | Yes | No | Yes | 120 | 1:23 | 4.17 |
| | 4 | Yes | Yes | | 104 | 1:11 | 8.65 |
ᵃ Percent MRSA was empirically determined.
ᵇ Percent MRSA was based on minimum odds due to perfect prediction in this stage.
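Because the staging table is a short decision rule, it can be written as a function, for example inside an EHR-linked decision support tool. The sketch below is illustrative only: the function and argument names are hypothetical, while the branching and percentages come from Table 2. Treating stages 3 and 4 as the MRSA-predicted strata is our inference; it is consistent with the sensitivity and PPV reported in Table 3.

```python
# Sketch: the Table 2 staging rule as a function (names are hypothetical).

# Empirical percent MRSA per stage, from Table 2 (stage 1 is a minimum-odds bound).
PERCENT_MRSA = {1: 0.3, 2: 0.55, 3: 4.17, 4: 8.65}


def mrsa_stage(icu_24h: bool, mech_vent_24h: bool, abx_prior_90d: bool) -> int:
    """Return the Table 2 stage (ranked from lowest to highest observed MRSA risk)."""
    if not icu_24h:
        return 2  # ward admission; the other attributes are not used
    if mech_vent_24h:
        return 4  # ICU and mechanically ventilated in the first 24 h
    return 3 if abx_prior_90d else 1  # ICU, not ventilated: split on prior antibiotics


def predicts_mrsa(stage: int) -> bool:
    """Stages 3 and 4 treated as MRSA-predicted strata (our inference)."""
    return stage >= 3


stage = mrsa_stage(icu_24h=True, mech_vent_24h=False, abx_prior_90d=True)
print(stage, PERCENT_MRSA[stage], predicts_mrsa(stage))  # → 3 4.17 True
```

Ordering the checks as ICU first, then ventilation, then antibiotics mirrors the branch order of the tree, so each patient falls into exactly one stage.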
The performance metrics of the globally optimal CTA model are given in Table 3. The model negative predictive value was 99.6%, and the positive predictive value was 6.3%. The model sensitivity was 66.7%, and specificity was 88.4%. The ESS of this model was 55 (receiver operating characteristic [ROC] area = 0.775), with a D statistic of 3.27.
TABLE 3.
Performance summary for CTA model predicting MRSA within 72 h postadmission that met the minimum sample size requirement and was LOO stableᵃ
| Characteristic | Value |
|---|---|
| Evaluation timeᵈ | ≤72 h |
| Smallest stratum n | 104 |
| Control n | 1,802 |
| Case n | 21 |
| Model strataᵇ | 4 |
| Sensitivity, % | 66.7 |
| Specificity, % | 88.4 |
| PPV, % | 6.3 |
| NPV, % | 99.6 |
| ROC area | 0.775 |
| ESS | 55 |
| D | 3.27 |
| Exact P value (LOO)ᶜ | 0.00000000647 |
ᵃ CTA, Classification Tree Analysis; D, distance statistic calculated from the ESS and stratum number; ESS, effect strength for sensitivity calculated from the ROC area adjusting for the number of model outcomes; LOO, leave-one-out jackknife analysis; NPV, unweighted negative predictive value; PPV, unweighted positive predictive value; ROC area, area under the receiver operating characteristic curve, equal to mean predictive accuracy for a binary outcome.
ᵇ Model strata are the number of unique model outcome prediction groups (model endpoints); the smallest stratum n is the size of the smallest of these groups.
ᶜ Exact P value determined by permutation, assessing the statistical significance of the classification performance of the model prediction rule in the full sample and in one-sample leave-one-out (LOO) jackknife analysis; the LOO value is shown.
ᵈ Case or control status modeled within 72 h postadmission. Each model predicts case or control status.
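The Table 3 metrics follow from a single 2 × 2 classification table. The sketch below uses counts that we reconstructed from Table 2, assuming stages 3 and 4 are the MRSA-predicted strata: 14 of 21 cases and 210 of 1,802 controls fall in those stages. With these counts the metrics agree with Table 3 to within rounding (specificity computes to 88.35%, reported as 88.4%).

```python
# Sketch: recompute Table 3 metrics from a reconstructed 2 x 2 table.
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # cases predicted MRSA
    fn: int  # cases predicted no MRSA
    fp: int  # controls predicted MRSA
    tn: int  # controls predicted no MRSA

    def sensitivity(self) -> float:
        return 100 * self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return 100 * self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return 100 * self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return 100 * self.tn / (self.tn + self.fn)

    def roc_area(self) -> float:
        # Binary outcome: ROC area equals mean accuracy across the two classes.
        return (self.sensitivity() + self.specificity()) / 200

    def ess(self) -> float:
        # Chance-corrected accuracy: 0 = chance, 100 = perfect.
        return 100 * (self.roc_area() - 0.5) / 0.5


cta = Confusion(tp=14, fn=7, fp=210, tn=1592)
for metric in ("sensitivity", "specificity", "ppv", "npv", "ess"):
    print(f"{metric}: {getattr(cta, metric)():.2f}")
print(f"roc_area: {cta.roc_area():.3f}")
```

Dividing the PPV by the 1.2% prevalence (21/1,823) gives the roughly 5-fold enrichment over the base rate noted in the Discussion.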
Final model evaluation.
Model classifications (Table 2) were subjected to bootstrap resampling to estimate the 90% prediction interval (PI) of predictive performance. Figure 3 provides a visual comparison of the 90% PI of ESS and ROC area for the model shown in Table 2 at two discrete time intervals postadmission (24 to 48 h and 48 to 72 h). As applied to the available patient sample at 48 to 72 h (i.e., a worst-case scenario), the 90% PI for model ESS was 37.4 to 83.5, corresponding to a ROC area of 0.687 to 0.918. The 90% PI for chance ESS was −13.3 to 20.1, corresponding to a ROC area of 0.434 to 0.601. Thus, the 90% PI of the model predictions did not overlap the 90% PI of chance. Classifications were stable in jackknife analysis (experiment-wise LOO P < 0.05).
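A percentile bootstrap of ESS for a fixed classification rule can be sketched as follows. This is an illustration under stated assumptions, not the analysis behind Fig. 3: patient-level (outcome, prediction) pairs are rebuilt from the reconstructed full-sample 2 × 2 table (14 TP, 7 FN, 210 FP, 1,592 TN) rather than the 48- to 72-h subsample, the number of resamples and the seed are arbitrary, and the chance interval is not reproduced, so the resulting interval will not match the reported one.

```python
# Sketch: percentile bootstrap (5th to 95th) of ESS for fixed model predictions.
import random
import statistics
from collections import Counter


def ess(pairs):
    """ESS (%) for a sequence of (is_case, predicted_case) pairs."""
    c = Counter(pairs)
    sens = c[(True, True)] / (c[(True, True)] + c[(True, False)])
    spec = c[(False, False)] / (c[(False, False)] + c[(False, True)])
    return 100 * (sens + spec - 1)


def bootstrap_pi90(pairs, n_boot=1000, seed=1):
    """5th and 95th percentiles of ESS over bootstrap resamples of patients."""
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        sample = rng.choices(pairs, k=len(pairs))  # resample patients with replacement
        n_cases = sum(1 for is_case, _ in sample if is_case)
        if 0 < n_cases < len(sample):  # ESS needs both cases and controls
            stats.append(ess(sample))
    cuts = statistics.quantiles(stats, n=20, method="inclusive")
    return cuts[0], cuts[-1]


# Reconstructed classifications: 14 TP, 7 FN, 210 FP, 1,592 TN.
pairs = ([(True, True)] * 14 + [(True, False)] * 7
         + [(False, True)] * 210 + [(False, False)] * 1592)
lo, hi = bootstrap_pi90(pairs)
print(f"Point ESS {ess(pairs):.1f}; bootstrap 90% PI {lo:.1f} to {hi:.1f}")
```

With only 21 cases, most of the interval width comes from resampling variability in sensitivity, which is why the interval is wide even though specificity is estimated precisely.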
FIG 3.
Sensitivity analysis of model performance when applied to the available patient sample at discrete time intervals postadmission. Effect strength for sensitivity (ESS) for a binary outcome (A) is equivalent to the corresponding receiver operating characteristic curve (ROC) area (B) adjusted for chance where ESS = 0 is accuracy expected by chance, ESS = 100 is perfect prediction, and −100 ≤ ESS < 0 is prediction worse than expected by chance. Classification Tree Analysis (CTA) was used to identify the machine learning model.
DISCUSSION
We identified a machine learning model that predicted MRSA infection status with high sensitivity and excellent discrimination. The model predicted the presence or absence of MRSA within 72 h postadmission with high accuracy (ROC area = 0.775), had a positive predictive value over 5-fold higher than the base rate (i.e., 6.3% positive predictive value versus 1.2% MRSA prevalence), and had a negative predictive value of 99.6%. We internally evaluated this model and found that predictions were stable in multiple validity analyses, suggesting the model was robust to individual observation variation. The point estimate of chance-corrected model accuracy (ESS) for the final model was 55, indicating a relatively strong effect (13). The lower-bound 5th percentile ESS from bootstrap analysis did not overlap the upper-bound 95th percentile of chance effects. Thus, this model would be expected to prospectively yield at worst a statistically reliable moderate effect (13). Finally, LOO analysis suggested that this model would return consistent accuracy when used to classify an independent patient sample.
The globally optimal CTA model identified clinical factors that were strongly associated with MRSA CAP. ICU admission with the requirement of mechanical ventilation within the first 24 h of admission was strongly associated with MRSA in blood or respiratory cultures. Overall, 76% of patients within our study who developed MRSA during admission had a PSI risk class of 4 or 5. Similarly, Self et al. found that 73.3% of CAP patients with MRSA infection in the multicenter Etiology of Pneumonia in the Community (EPIC) study had a PSI risk class of 4 or 5 (3). Furthermore, our study identified a population for which empirical MRSA coverage was unnecessary: patients who did not require ICU-level care in the first 24 h of admission. Notably, Self et al. (3) also found that MRSA was rare among floor patients: only 2 of 1,777 patients (0.1%) had MRSA. Clinical severity of illness indicators such as the requirement of ICU-level care or mechanical ventilation were highly associated with MRSA infection in our study. Our model supports avoiding empirical MRSA coverage for most patients who do not initially require ICU-level care.
Our model also identified subpopulations of patients with MRSA in our sample who had different levels of health care exposure. Specifically, the risk of MRSA differed among ICU patients depending on prior antibiotic exposure in the past 90 days. Among ICU patients with no history of antibiotic use within 90 days and who did not require mechanical ventilation, the risk of MRSA was similar to that of patients admitted to the wards. These findings have implications for antimicrobial and diagnostic stewardship programs (ADSPs) that seek to assist clinicians with discontinuing unnecessary vancomycin.
Our study has several notable strengths. First, our study provides direct confirmation of prior research demonstrating that MRSA CAP is uncommon (3) and supports efforts to minimize use of unnecessary anti-MRSA antibiotics. Second, our machine learning methodology offered notable advantages over approaches based on conventional linear statistical models, such as logistic regression or discriminant analysis, which do not perform well for outcomes with low event rates. Third, in contrast to linear modeling approaches, the CTA algorithm is less likely to identify spurious effects (14–17). The ability of CTA to identify robust, reproducible statistical models within relatively small samples carries important theoretical and translational public health implications. Fourth, a major strength of our study lies in the translational value and interpretability of a risk model for patients with CAP. The second major branch point within our CTA model (Fig. 2) was the need for mechanical ventilation. Endotracheal intubation offers the opportunity to culture and/or use molecular techniques to definitively diagnose MRSA CAP (18). A definitive answer in this subgroup leaves only 120 patients (6.6% of all CAP admissions) as potential candidates for empirical anti-MRSA treatment. This combination of risk factors for specific diagnostic testing and limited empirical treatment is a valuable addition to the armamentarium of ADSPs.
Our study also has limitations that must be considered. First, our sample size of cases was relatively small, preventing split-half or k-fold cross-validation. Second, our internal validation provides an estimate of the upper bound of expected model cross-generalizability (19). A cross-generalizable estimate of our model’s accuracy should be independently obtained, so as to evaluate the prospective ability of our model to correctly predict MRSA. Third, prior antibiotic therapy was based on inpatient use and was included as a single factor in our models, whereas exposure to specific agents or classes may carry differential risks and should thus be explored in subsequent studies. Fourth, we relied on data available within the EHR at the time of admission through discharge, limiting our ability to discern all possible time-varying relationships. Nevertheless, we were able to identify a model that leveraged MRSA risk factors available within the first 72 h of admission. Fifth, our model is most generalizable to a patient population similar to ours wherein microbiologic diagnostic testing was frequently performed for patients with severe CAP and those in the ICU. The model we identified should be evaluated in external validation studies as a next step and may warrant further development for prospective use in predicting MRSA risk in patients.
Conclusions.
In summary, we identified a relatively simple machine learning model that predicted the risk of MRSA in hospitalized patients with CAP. Our model discriminated between cases and controls within 72 h postadmission. Our model underscores the need for optimization of ADSP efforts targeting patients with CAP with and without MRSA risk factors stratified by the clinical acuity of presentation. ADSPs should provide enhanced diagnostic stewardship for respiratory pathogens, a systematic approach to optimize antibiotic use in CAP, and targeted risk assessments that allow discontinuation of unnecessary therapies.
MATERIALS AND METHODS
Study design, setting, and participation.
We conducted a multicenter case-control study of hospitalized patients treated empirically for CAP. The study took place at Northwestern Memorial Hospital (NMH), an 897-bed tertiary academic medical center, and Northwestern Lake Forest Hospital, a 198-bed community hospital—both located in the Chicagoland area. Data were extracted from the medical records using the Northwestern University Electronic Data Warehouse (EDW). Cases included patients diagnosed with CAP with evidence of MRSA in a blood culture or respiratory sample collected within 72 h postadmission. Controls were CAP patients diagnosed with no evidence of MRSA during the same time period. The study protocol was reviewed and approved by the institutional review boards of Northwestern University (no. STU00206507) and Midwestern University (no. 3047).
(i) Inclusion and exclusion criteria. Patients were eligible for inclusion if they were hospitalized between 1 January 2014 and 3 March 2018, were adults aged 18 years or older, and received systemic antibiotics with an indication for pneumonia (therapeutic indication was a required field for providers to answer at the point of antibiotic ordering). Concordance between indication and diagnosis was previously found to be highly accurate within our center (20). The index admission was defined as the first hospitalization for CAP during the study period. Patients were included if they had a discharge diagnosis containing pneumonia and the clinician-designated antibiotic indication was CAP, or if they had a discharge diagnosis of CAP or health care-associated pneumonia and an antibiotic indication containing pneumonia. Patients were excluded if they had a discharge diagnosis of hospital-acquired pneumonia or a primary aspiration pneumonia according to the treating physician’s diagnosis. Additional exclusion criteria were a history of cystic fibrosis or frequent hospitalization (≥3 admissions within the prior 30 days), because CAP and HAP risk factors overlap in patients with prolonged prior hospitalization (21). Patients with a blood or respiratory culture growing MRSA were defined as cases. Patients with MRSA identified from other sites (i.e., nonblood and nonrespiratory samples) were excluded, as were patients with Gram-negative pathogens in culture in the absence of MRSA. These groups were excluded to avoid modeling non-MRSA pneumonia risk factors (22).
Study data and measurements.
(i) Identification of pneumonia pathogens and MRSA. Pneumonia pathogens were isolated from clinical blood or respiratory cultures during the study period (i.e., January 2014 to March 2018) using standard microbiological methods. Clinically relevant pathogens were identified at the species level using the Vitek MS or the Vitek 2 system (bioMérieux, Marcy-l’Étoile, France).
(ii) Data sources and retrieval. Patient data were retrieved from the EHR (Cerner, Millennium, North Kansas City, MO) by querying the Northwestern EDW (A. E. Pawlowski). Admission records were queried for hospital encounters matching the study inclusion criteria. Hospitalization characteristics (e.g., length of stay, antibiotic treatment duration, antibiotic selection, and diagnosis) and patient demographics (e.g., age, sex, comorbidities, prior hospitalization, and prior antibiotic receipt) were extracted electronically. Extracted data underwent quality assessments by two trained reviewers (N. J. Rhodes and R. Rohani) on a random subset of the total data set. The study end date was fixed due to a change in EHR systems starting March 2018.
(iii) Data preprocessing. Data elements were divided into two primary categories: (i) prehospitalization and (ii) postadmission. Prehospitalization variables included comorbidities and other clinical demographics available at or before the index hospitalization, including prior microbiology results. Postadmission variables included component and composite severity of illness measures—including the pneumonia severity index (PSI) (23) calculated on the day of hospital admission, the corresponding PSI risk class (24), admission vital signs and laboratory values, and presence of severe CAP (25). Additional categorical variables included the need for supplemental oxygen and the degree of oxygen support required (i.e., oxygen, ventilatory support, or noninvasive ventilation). Data sets were stored in the EDW.
Statistical analysis.
(i) Data analysis. Descriptive statistics were calculated for all variables. Data were analyzed using the ODA package (26) for R (27), a user-written front-end interface for the Optimal Discriminant Analysis (ODA) and Classification Tree Analysis (CTA) software programs (28–30). In every analysis, the composite of positive MRSA bloodstream or respiratory tract cultures was the class (i.e., “dependent”) variable. Missing data were coded as being missing in all analyses.
(ii) Classification accuracy metrics. Sensitivity, specificity, and positive and negative predictive values were calculated for every model. For every analysis, the effect strength for sensitivity (ESS), which is classification accuracy adjusted to remove the effect of chance, was used to guide model development. An ESS of 0 corresponds to the accuracy which is expected by chance (ROC area = 0.5), whereas an ESS of 100 corresponds to perfect accuracy (ROC area = 1) (30, 31). Consistent with prior research, we considered any ESS values less than 25% to indicate a relatively weak effect, values between 25% and less than 50% to indicate a moderate effect, between 50% and less than 75% to indicate a relatively strong effect, and 75% or greater to indicate a strong effect (13). For multivariable (CTA) models, the distance statistic, D, was used to adjust ESS for model complexity (30, 32). D is the number of additional effects with equivalent ESS needed to obtain perfect classification of the sample (33).
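The distance statistic can be made concrete with a short calculation. The formula in the sketch below, D = 100/(ESS/S) − S with S the number of model strata, is our statement of the definition given in refs. 30, 32, and 33: ESS/S is the mean ESS contributed per stratum, 100 divided by that value is the number of such effects needed for perfect classification, and subtracting S leaves the additional effects still required. With ESS = 55 and four strata it reproduces the D of 3.27 reported in Table 3.

```python
# Sketch of the distance statistic D, as we read refs. 30, 32, and 33:
# D = 100 / (ESS / S) - S, where S is the number of model strata.


def distance_statistic(ess: float, n_strata: int) -> float:
    """Additional effects of equal per-stratum ESS needed to reach ESS = 100."""
    if ess <= 0:
        raise ValueError("D is undefined for ESS <= 0")
    mean_ess_per_stratum = ess / n_strata
    return 100 / mean_ess_per_stratum - n_strata


print(f"D = {distance_statistic(55, 4):.2f}")  # Table 3: ESS 55, 4 strata → D = 3.27
```

Because lower D means fewer additional effects are needed, a more complex tree (more strata) must buy a proportionally larger ESS to be preferred, which is how D penalizes model complexity.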
(iii) Model development. In all analyses, prehospitalization (i.e., baseline) or immediate (i.e., first 24 h) posthospitalization variables were used to predict MRSA case status. Differences in each variable between groups were compared in univariate analysis using ODA (28). Statistically significant univariate models having the greatest ESS were reported. Differences in multiple variables between groups were compared in multivariate analysis using CTA (30). The “globally optimal” CTA model identified in analysis has the lowest D statistic, thereby satisfying the theoretical criterion of maximally efficient chance-adjusted accuracy (33). A comprehensive summary of the model build approach is included in the supplemental material.
We desired a sample with the greatest homogeneity for prediction. We selected MRSA within 72 h of admission for model building for two reasons: (i) it aligned with the specimen collection window used in the EPIC study (34), and (ii) the available sample size was sufficient to achieve statistical power to identify a moderate (or stronger) effect (see Fig. S1 in the supplemental material).
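For a single yes/no attribute, the core idea of univariate ODA reduces to choosing the rule direction ("yes predicts MRSA" or "yes predicts no MRSA") that yields the higher ESS. The sketch below is a simplified illustration, not the ODA software (refs. 26, 28), which also handles weighting, ordered attributes, and permutation testing. The example counts for ICU admission in the first 24 h are back-calculated by us from Table 2; they reproduce the ESS of 37.09 shown for that attribute in Table 1.

```python
# Simplified sketch of univariate ODA for one yes/no attribute.
# Not the ODA software: it only picks the rule direction with the higher ESS.


def fit_binary_oda(case_yes: int, case_no: int, ctrl_yes: int, ctrl_no: int):
    """Return (rule, ESS %) for the better of the two possible yes/no rules."""
    sens = case_yes / (case_yes + case_no)  # cases correct under "yes -> MRSA"
    spec = ctrl_no / (ctrl_yes + ctrl_no)   # controls correct under "yes -> MRSA"
    ess_yes = 100 * (sens + spec - 1)
    # The reversed rule ("yes -> no MRSA") has ESS equal to -ess_yes.
    if ess_yes >= 0:
        return "yes -> MRSA", ess_yes
    return "yes -> no MRSA", -ess_yes


# ICU admission in the first 24 h; counts back-calculated from Table 2.
rule, ess = fit_binary_oda(case_yes=14, case_no=7, ctrl_yes=533, ctrl_no=1269)
print(rule, round(ess, 2))  # → yes -> MRSA 37.09
```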
(iv) Model evaluation. First, we utilized two thresholds for statistical significance: a stringent criterion (i.e., Sidak-adjusted experiment-wise P value of <0.05) and a naive criterion (i.e., per-comparison P value of <0.05) (13). To maintain rigor and minimize the risk of committing type II errors in a relatively small sample of cases, we report both experiment-wise and per-comparison significance levels. Second, the reliability of reported effects was evaluated by assessing whether the 90% prediction interval (PI) (i.e., 5th to 95th percentiles) for the final model overlapped the corresponding exact discrete 90% PI for random chance when applied to discrete patient samples to define a worst-case scenario for model performance (35). To maximize reliability, only effects for which the 90% PI for the model did not overlap that for chance were considered for further evaluation (35, 36). Third, the upper bound of potential cross-generalizability was estimated via a one-sample (“leave-one-out” [LOO]) jackknife analysis (37). Use of k-fold cross-validation was impossible due to the small sample size of cases. To maximize reproducibility, only models with classification results which were stable in LOO analysis were considered for further evaluation (19, 30).
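Two of the evaluation steps can be sketched in a few lines: the Sidak per-comparison threshold that holds the experiment-wise error at 0.05, and a one-sample LOO jackknife for a yes/no attribute rule. Assumptions: the number of comparisons (k = 16, the count of attributes listed in Table 1) is a hypothetical choice, and the LOO example uses counts for ICU admission that we back-calculated from Table 2. Because patients in the same cell (case or control, attribute yes or no) are interchangeable, the rule only needs to be refit once per cell with one patient held out. A rule is LOO stable when its LOO ESS equals its training ESS, as happens here.

```python
# Sketch: Sidak per-comparison alpha and a LOO jackknife for a yes/no rule.


def sidak_alpha(alpha_experimentwise: float, k: int) -> float:
    """Per-comparison alpha keeping the experiment-wise type I error at the target."""
    return 1 - (1 - alpha_experimentwise) ** (1 / k)


def yes_means_mrsa(case_yes, case_no, ctrl_yes, ctrl_no) -> bool:
    """Training step: True if the rule 'yes -> MRSA' has ESS >= 0 on these counts."""
    return case_yes / (case_yes + case_no) + ctrl_no / (ctrl_yes + ctrl_no) - 1 >= 0


def loo_ess(counts):
    """LOO ESS (%) for counts = [case_yes, case_no, ctrl_yes, ctrl_no]."""
    is_case = (True, True, False, False)
    attr_yes = (True, False, True, False)
    correct = {True: 0, False: 0}  # correctly classified cases / controls
    for i, n in enumerate(counts):
        if n == 0:
            continue
        held_out = list(counts)
        held_out[i] -= 1  # refit without one patient from cell i
        predicted_case = attr_yes[i] == yes_means_mrsa(*held_out)
        if predicted_case == is_case[i]:
            correct[is_case[i]] += n  # same prediction for every patient in the cell
    sens = correct[True] / (counts[0] + counts[1])
    spec = correct[False] / (counts[2] + counts[3])
    return 100 * (sens + spec - 1)


print(f"Sidak threshold, k = 16: {sidak_alpha(0.05, 16):.4f}")
print(f"LOO ESS, ICU admission: {loo_ess([14, 7, 533, 1269]):.2f}")
```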
Data availability.
Data will be made available upon reasonable request.
ACKNOWLEDGMENTS
This study was supported by a New Investigator Award from the American Association of Colleges of Pharmacy to N. J. Rhodes. C. Qi, A. E. Pawlowski, and R. G. Wunderink were supported by the National Institutes of Health, grant number U19AI135964. The Electronic Data Warehouse is supported by the Northwestern University Clinical and Translational Science (NUCATS) Institute. Research reported in this publication was supported, in part, by the National Institutes of Health’s National Center for Advancing Translational Sciences, grant number UL1TR001422. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The sponsor played no role in the study.
N. J. Rhodes had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Author contributions were as follows: concept and design, N. J. Rhodes, S. H. Sutton, T. R. Zembower, R. G. Wunderink; acquisition, analysis, or interpretation of data, N. J. Rhodes, R. Rohani, P. R. Yarnold, A. E. Pawlowski, M. Malczynski, C. Qi, S. H. Sutton, T. R. Zembower, R. G. Wunderink; drafting of the manuscript, N. J. Rhodes, P. R. Yarnold; critical revision of the manuscript for important intellectual content, N. J. Rhodes, R. Rohani, P. R. Yarnold, A. E. Pawlowski, M. Malczynski, C. Qi, S. H. Sutton, T. R. Zembower, R. G. Wunderink; statistical analysis, N. J. Rhodes, P. R. Yarnold; obtained funding, N. J. Rhodes; administrative, technical, or material support, R. Rohani, A. E. Pawlowski, M. Malczynski, C. Qi; supervision, R. G. Wunderink.
N. J. Rhodes reported receiving grants from Paratek and consulting fees from Third Pole Therapeutics during the conduct of the study. R. Rohani reports receiving grants from Midwestern University during the conduct of the study. C. Qi reports receiving grants from the National Institutes of Health during the conduct of the study. A. E. Pawlowski reports receiving grants from the National Institutes of Health during the conduct of the study. R. G. Wunderink reports receiving grants from the National Institutes of Health during the conduct of the study. No other disclosures were reported.
Footnotes
Supplemental material is available online only.
Contributor Information
Nathaniel J. Rhodes, Email: nrhode@midwestern.edu.
Richard G. Wunderink, Email: r-wunderink@northwestern.edu.
REFERENCES
- 1. Sutton J, Steiner C. 2016. Hospital-, health care-, and community-acquired MRSA: estimates from California hospitals, 2013. Agency for Healthcare Research and Quality, Rockville, MD.
- 2. Metlay JP, Waterer GW, Long AC, Anzueto A, Brozek J, Crothers K, Cooley LA, Dean NC, Fine MJ, Flanders SA, Griffin MR, Metersky ML, Musher DM, Restrepo MI, Whitney CG. 2019. Diagnosis and treatment of adults with community-acquired pneumonia. An official clinical practice guideline of the American Thoracic Society and Infectious Diseases Society of America. Am J Respir Crit Care Med 200:e45–e67. 10.1164/rccm.201908-1581ST.
- 3. Self WH, Wunderink RG, Williams DJ, Zhu Y, Anderson EJ, Balk RA, Fakhran SS, Chappell JD, Casimir G, Courtney DM, Trabue C, Waterer GW, Bramley A, Magill S, Jain S, Edwards KM, Grijalva CG. 2016. Staphylococcus aureus community-acquired pneumonia: prevalence, clinical characteristics, and outcomes. Clin Infect Dis 63:300–309. 10.1093/cid/ciw300.
- 4. Jones BE, Ying J, Stevens V, Haroldsen C, He T, Nevers M, Christensen MA, Nelson RE, Stoddard GJ, Sauer BC, Yarbrough PM, Jones MM, Goetz MB, Greene T, Samore MH. 2020. Empirical anti-MRSA vs standard antibiotic therapy and risk of 30-day mortality in patients hospitalized for pneumonia. JAMA Intern Med 180:552–560. 10.1001/jamainternmed.2019.7495.
- 5. Wunderink RG, Niederman MS, Kollef MH, Shorr AF, Kunkel MJ, Baruch A, McGee WT, Reisman A, Chastre J. 2012. Linezolid in methicillin-resistant Staphylococcus aureus nosocomial pneumonia: a randomized, controlled study. Clin Infect Dis 54:621–629. 10.1093/cid/cir895.
- 6. Kelesidis T, Braykov N, Uslan DZ, Morgan DJ, Gandra S, Johannsson B, Schweizer ML, Weisenberg SA, Young H, Cantey J, Perencevich E, Septimus E, Srinivasan A, Laxminarayan R. 2016. Indications and types of antibiotic agents used in 6 acute care hospitals, 2009–2010: a pragmatic retrospective observational study. Infect Control Hosp Epidemiol 37:70–79. 10.1017/ice.2015.226.
- 7. Yi SH, Hatfield KM, Baggs J, Hicks LA, Srinivasan A, Reddy S, Jernigan JA. 2018. Duration of antibiotic use among adults with uncomplicated community-acquired pneumonia requiring hospitalization in the United States. Clin Infect Dis 66:1333–1341. 10.1093/cid/cix986.
- 8. Tomczyk S, Jain S, Bramley AM, Self WH, Anderson EJ, Trabue C, Courtney DM, Grijalva CG, Waterer GW, Edwards KM, Wunderink RG, Hicks LA. 2017. Antibiotic prescribing for adults hospitalized in the etiology of pneumonia in the community study. Open Forum Infect Dis 4:ofx088. 10.1093/ofid/ofx088.
- 9. Pereira JM, Goncalves-Pereira J, Ribeiro O, Baptista JP, Froes F, Paiva JA. 2018. Impact of antibiotic therapy in severe community-acquired pneumonia: data from the Infauci study. J Crit Care 43:183–189. 10.1016/j.jcrc.2017.08.048.
- 10. Shindo Y, Ito R, Kobayashi D, Ando M, Ichikawa M, Shiraki A, Goto Y, Fukui Y, Iwaki M, Okumura J, Yamaguchi I, Yagi T, Tanikawa Y, Sugino Y, Shindoh J, Ogasawara T, Nomura F, Saka H, Yamamoto M, Taniguchi H, Suzuki R, Saito H, Kawamura T, Hasegawa Y. 2013. Risk factors for drug-resistant pathogens in community-acquired and healthcare-associated pneumonia. Am J Respir Crit Care Med 188:985–995. 10.1164/rccm.201301-0079OC.
- 11. Maruyama T, Fujisawa T, Ishida T, Ito A, Oyamada Y, Fujimoto K, Yoshida M, Maeda H, Miyashita N, Nagai H, Imamura Y, Shime N, Suzuki S, Amishima M, Higa F, Kobayashi H, Suga S, Tsutsui K, Kohno S, Brito V, Niederman MS. 2019. A therapeutic strategy for all pneumonia patients: a 3-year prospective multicenter cohort study using risk factors for multidrug-resistant pathogens to select initial empiric therapy. Clin Infect Dis 68:1080–1088. 10.1093/cid/ciy631.
- 12. Webb BJ, Dascomb K, Stenehjem E, Vikram HR, Agrwal N, Sakata K, Williams K, Bockorny B, Bagavathy K, Mirza S, Metersky M, Dean NC. 2016. Derivation and multicenter validation of the drug resistance in pneumonia clinical prediction score. Antimicrob Agents Chemother 60:2652–2663. 10.1128/AAC.03071-15.
- 13. Yarnold PR, Soltysik RC. 2005. Optimal data analysis: a guidebook with software for Windows. APA Books, Washington, DC.
- 14. Linden A, Yarnold PR. 2019. Multi-layer perceptron neural net model identifies effect in random data. Optimal Data Anal 8:94–96.
- 15. Linden A, Yarnold PR. 2019. Effect of sample size on discovery of relationships in random data by classification algorithms. Optimal Data Anal 8:76–80.
- 16. Linden A, Yarnold PR. 2019. Some machine learning algorithms find relationships between variables when none exist – CTA doesn’t. Optimal Data Anal 8:64–67.
- 17. Linden A, Bryant FB, Yarnold PR. 2019. Logistic discriminant analysis and structural equation modeling both identify effects in random data. Optimal Data Anal 8:97–102.
- 18. Paonessa JR, Shah RD, Pickens CI, Lizza BD, Donnelly HK, Malczynski M, Qi C, Wunderink RG. 2019. Rapid detection of methicillin-resistant Staphylococcus aureus in BAL: a pilot randomized controlled trial. Chest 155:999–1007. 10.1016/j.chest.2019.02.007.
- 19. Yarnold PR. 2016. Determining jackknife ESS for a CTA model with chaotic instability. Optimal Data Anal 5:11–14.
- 20. Patel JA, Esterly JS, Scheetz MH, Postelnick MJ. 2012. An analysis of the accuracy of physician-entered indications on computerized antimicrobial orders. Infect Control Hosp Epidemiol 33:1066–1067. 10.1086/667746.
- 21. Song JU, Kim YH, Lee MY, Lee J. 2019. The association of prior hospitalization with clinical outcomes among patients admitted with pneumonia: a propensity score matching study. BMC Infect Dis 19:349. 10.1186/s12879-019-3961-z.
- 22. Yarnold PR. 1996. Characterizing and circumventing Simpson’s paradox for ordered bivariate data. Educ Psychol Meas 56:430–442. 10.1177/0013164496056003005.
- 23. Fine MJ, Hanusa BH, Lave JR, Singer DE, Stone RA, Weissfeld LA, Coley CM, Marrie TJ, Kapoor WN. 1995. Comparison of a disease-specific and a generic severity of illness measure for patients with community-acquired pneumonia. J Gen Intern Med 10:359–368. 10.1007/BF02599830.
- 24. Fine MJ, Auble TE, Yealy DM, Hanusa BH, Weissfeld LA, Singer DE, Coley CM, Marrie TJ, Kapoor WN. 1997. A prediction rule to identify low-risk patients with community-acquired pneumonia. N Engl J Med 336:243–250. 10.1056/NEJM199701233360402.
- 25. Mandell LA, Wunderink RG, Anzueto A, Bartlett JG, Campbell GD, Dean NC, Dowell SF, File TM, Jr, Musher DM, Niederman MS, Torres A, Whitney CG, Infectious Diseases Society of America, American Thoracic Society. 2007. Infectious Diseases Society of America/American Thoracic Society consensus guidelines on the management of community-acquired pneumonia in adults. Clin Infect Dis 44(Suppl 2):S27–S72. 10.1086/511159.
- 26. Rhodes NJ. 2020. ODA: a package and R-interface for the MegaODA software suite. R package version 1.0.1.3.
- 27. R Core Team. 2021. R: a language and environment for statistical computing, 4.1.2 ed. R Foundation for Statistical Computing, Vienna, Austria.
- 28. Yarnold PR, Soltysik RC. 1991. Theoretical distributions of optima for univariate discrimination of random data. Decis Sci 22:739–752. 10.1111/j.1540-5915.1991.tb00362.x.
- 29. Yarnold PR, Soltysik RC, Bennett CL. 1997. Predicting in-hospital mortality of patients with AIDS-related Pneumocystis carinii pneumonia: an example of hierarchically optimal classification tree analysis. Stat Med 16:1451–1463.
- 30. Yarnold PR, Soltysik RC. 2016. Maximizing predictive accuracy. ODA Books, Chicago, IL.
- 31. Yarnold PR. 2014. UniODA vs. ROC analysis: computing the “optimal” cut-point. Optimal Data Anal 3:117–120.
- 32. Yarnold PR. 2015. Distance from a theoretically ideal statistical classification model defined as the number of additional equivalent effects needed to obtain perfect classification for the sample. Optimal Data Anal 4:81–86.
- 33. Yarnold PR, Linden A. 2016. Theoretical aspects of the D statistic. Optimal Data Anal 5:171–174.
- 34. Jain S, Self WH, Wunderink RG, Fakhran S, Balk R, Bramley AM, Reed C, Grijalva CG, Anderson EJ, Courtney DM, Chappell JD, Qi C, Hart EM, Carroll F, Trabue C, Donnelly HK, Williams DJ, Zhu Y, Arnold SR, Ampofo K, Waterer GW, Levine M, Lindstrom S, Winchell JM, Katz JM, Erdman D, Schneider E, Hicks LA, McCullers JA, Pavia AT, Edwards KM, Finelli L, CDC EPIC Study Team. 2015. Community-acquired pneumonia requiring hospitalization among U.S. adults. N Engl J Med 373:415–427. 10.1056/NEJMoa1500245.
- 35. Rhodes NJ, Yarnold PR. 2020. Generating novometric confidence intervals in R: bootstrap analyses to compare model and chance ESS. Optimal Data Anal 9:172–177.
- 36. Rhodes NJ, Jozefczyk CC, Moore WJ, Yarnold PR, Harkabuz K, Maxwell R, Sutton SH, Silkaitis C, Qi C, Wunderink RG, Zembower TR. 2021. Characterizing risk factors for Clostridioides difficile infection among hospitalized patients with community-acquired pneumonia. Antimicrob Agents Chemother 65:e0041721. 10.1128/AAC.00417-21.
- 37. Yarnold PR. 2016. Using UniODA to determine the ESS of a CTA model in LOO analysis. Optimal Data Anal 5:3–10.
Supplementary Materials
Supplemental material: aac.01023-22-s0001.pdf (PDF, 336.7 KB).
Data Availability Statement
Data will be made available upon reasonable request.



