PLOS ONE. 2020 Aug 11;15(8):e0237419. doi: 10.1371/journal.pone.0237419

Development and validation of a model for individualized prediction of hospitalization risk in 4,536 patients with COVID-19

Lara Jehi 1,*, Xinge Ji 2, Alex Milinovich 2, Serpil Erzurum 3, Amy Merlino 4, Steve Gordon 5, James B Young 6, Michael W Kattan 2
Editor: Juan F Orueta
PMCID: PMC7418996  PMID: 32780765

Abstract

Background

Coronavirus Disease 2019 (COVID-19) is a pandemic that is straining healthcare resources, particularly hospital beds. Multiple risk factors for disease progression requiring hospitalization have been identified, but medical decision-making remains complex.

Objective

To characterize a large cohort of patients hospitalized with COVID-19 and their outcomes, and to develop and validate a statistical model that allows individualized prediction of future hospitalization risk for a patient newly diagnosed with COVID-19.

Design

Retrospective cohort study of patients with COVID-19 applying a least absolute shrinkage and selection operator (LASSO) logistic regression algorithm to retain the most predictive features for hospitalization risk, followed by validation in a temporally distinct patient cohort. The final model was displayed as a nomogram and programmed into an online risk calculator.

Setting

One healthcare system in Ohio and Florida.

Participants

All patients infected with SARS-CoV-2 between March 8, 2020 and June 5, 2020. Those tested before May 1 were included in the development cohort, while those tested on May 1 or later comprised the validation cohort.

Measurements

Demographic, clinical, social influencers of health, exposure risk, medical co-morbidities, vaccination history, presenting symptoms, medications, and laboratory values were collected on all patients, and considered in our model development.

Results

4,536 patients tested positive for SARS-CoV-2 during the study period. Of those, 958 (21.1%) required hospitalization. By day 3 of hospitalization, 24% of patients were transferred to the intensive care unit, and around half of the remaining patients were discharged home. Ten patients died. Hospitalization risk was increased with older age, black race, male sex, former smoking history, diabetes, hypertension, chronic lung disease, poor socioeconomic status, shortness of breath, diarrhea, and certain medications (NSAIDs, immunosuppressive treatment). Hospitalization risk was reduced with prior flu vaccination. Model discrimination was excellent with an area under the curve of 0.900 (95% confidence interval of 0.886–0.914) in the development cohort, and 0.813 (0.786, 0.839) in the validation cohort. The scaled Brier score was 42.6% (95% CI 37.8%, 47.4%) in the development cohort and 25.6% (19.9%, 31.3%) in the validation cohort. Calibration was very good. The online risk calculator is freely available and found at https://riskcalc.org/COVID19Hospitalization/.

Limitation

Retrospective cohort design.

Conclusion

Our study crystallizes published risk factors of COVID-19 progression, but also provides new data on the role of social influencers of health, race, and influenza vaccination. In a context of a pandemic and limited healthcare resources, individualized outcome prediction through this nomogram or online risk calculator can facilitate complex medical decision-making.

Introduction

Based on the latest estimates from the Centers for Disease Control (week ending June 6, 2020), hospitalization rates in the United States due to Coronavirus disease of 2019 (COVID-19) range from 5.6/100,000 population in patients 4 years or younger to 273.8/100,000 population in those 65 years or older, posing a significant capacity challenge to the healthcare system. Strategies to address this challenge have focused on imposing social distancing to reduce viral transmission and increasing hospital bed capacity by drastically reducing usual occupancy, eliminating elective surgical procedures, and creating makeshift surge hospitals [1]. Social distancing practices have indeed helped curb the acute need for hospital beds, at least momentarily, but the long-term healthcare capacity requirements remain unclear as strategies for lifting restrictions and resuming normal activities are in flux. Improving our understanding of the clinical outcomes of patients infected with COVID-19 is therefore paramount. In addition, we need predictive algorithms that identify the COVID-19 patients at highest risk of progressing to severe disease so that alternative approaches can be developed to manage them safely. These predictive algorithms could also be used at a population level to guide social distancing and other risk-limiting strategies in a focused fashion, rather than the blanket approach of shelter-in-place for society.

Older age [2, 3], smoking [4], and medical co-morbidities such as diabetes, hypertension, cardiovascular disease, chronic kidney disease, chronic lung disease [5], and cancer [5, 6] have been correlated with disease worsening in patients who are already hospitalized with COVID-19. It is unclear how these comorbidities, or other patient characteristics, factor into clinical worsening that leads to hospitalization. Translating their significance at an individual patient care level when faced with a decision to hospitalize a patient presenting with symptoms of COVID-19 is even more elusive. The end result is patients being sent home from the emergency room only to return much more ill and be admitted days later, or patients hospitalized for observation for several days without any significant clinical deterioration.

We present the clinical characteristics and outcomes of patients with COVID-19, including a subset who were hospitalized. We also develop and validate a statistical model that can assist with individualized prediction of hospitalization risk for a patient with COVID-19. This model allows us to generate a visual statistical tool (a nomogram) that can consider numerous variables to predict an outcome of interest for an individual patient [7].

Methods

Patient selection

We included all patients, regardless of age, who had positive COVID-19 testing at Cleveland Clinic between March 8, 2020 and June 5, 2020. The study cohort included all COVID-positive patients, whether hospitalized or not, from across the Cleveland Clinic health system, which includes >220 outpatient locations and 18 hospitals in Ohio and Florida. As testing demand increased, we adapted our organizational policies and protocols to reconcile demand with patient and caregiver safety. Prior to March 18, any primary care physician could order a COVID-19 test. After that date, testing resources were streamlined through a “COVID-19 Hotline” which followed recommendations from the Centers for Disease Control to focus on high-risk patients, defined by any of the following: age older than 60 years or younger than 36 months; on immune therapy; having comorbidities of cancer, end-stage renal disease, diabetes, hypertension, coronary artery disease, heart failure with reduced ejection fraction, lung disease, HIV/AIDS, or solid organ transplant; or contact with known COVID-19 patients. Physician discretion was still allowed.

Cleveland clinic COVID-19 registry

Demographics, co-morbidities, travel and COVID-19 exposure history, medications, presenting symptoms, socioeconomic measures, treatment, disease progression, and outcomes were collected. Registry variables were chosen to reflect available literature on COVID-19 disease characterization, progression, and proposed treatments, including medications thought to have benefits through drug-repurposing studies [8]. Capture of detailed research data was facilitated by the creation of standardized clinical templates that were implemented across the healthcare system as patients sought care for COVID-19-related concerns. Outcome capture was facilitated by a home monitoring program whereby patients who tested positive were called daily for 14 days after their test result to monitor their disease progression.

Data were extracted via previously validated automated feeds [9] from our electronic health record (Epic, Epic Systems Corporation) and manually by a study team trained on uniform sources for the study variables. The COVID-19 Research Registry team includes a “Reviewer” group and a “Quality Assurance” group. The reviewers were responsible for manually abstracting and entering a subset of variables (signs and symptoms upon presentation) that cannot be automatically extracted from the electronic health record, and for verifying high-priority variables (co-morbidities) that had been automatically pulled into the database from the electronic health record. The Quality Assurance group provided an independent second layer of review. Study data were collected and managed using REDCap electronic data capture tools hosted at Cleveland Clinic [10, 11]. REDCap (Research Electronic Data Capture) is a secure, web-based software platform designed to support data capture for research studies, providing 1) an intuitive interface for validated data capture; 2) audit trails for tracking data manipulation and export procedures; 3) automated export procedures for seamless data downloads to common statistical packages; and 4) procedures for data integration and interoperability with external sources.

This research was approved by the Cleveland Clinic Institutional Review Board (IRB# 20–283). Consent was waived by IRB.

COVID-19 testing protocols

Nasopharyngeal and oropharyngeal swab specimens were collected from all patients and pooled for testing by trained medical personnel. Given previous beliefs that co-infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and other respiratory viruses is rare [12, 13], a reflex testing algorithm was implemented to conserve resources. All patient specimens were first tested for the presence of influenza A/B and respiratory syncytial virus (RSV), and only those negative for influenza and RSV were subsequently tested for SARS-CoV-2.

Infection with SARS-CoV-2 was confirmed by laboratory testing using the Centers for Disease Control and Prevention (CDC) reverse transcription polymerase chain reaction (RT-PCR) SARS-CoV-2 assay that was validated in the Cleveland Clinic Robert J. Tomsich Pathology and Laboratory Medicine Institute. This assay uses Roche Magnapure extraction and ABI 7500 DX PCR instruments. Between March 8 and 13, the tests were sent out to LabCorp, Burlington, North Carolina. All testing was authorized by the Food and Drug Administration under an Emergency Use Authorization (EUA), and in accordance with the guidelines established by the CDC.

Statistical methods

Baseline data are presented as median (interquartile range [IQR]) and number (%). Continuous variables were compared using the Mann-Whitney U test, and categorical variables were compared using the Chi-square test. The outcome of interest was hospitalization anytime within three days of a positive COVID-19 test. The model was built using a development cohort (patients with a positive COVID-19 test resulted before May 1, 2020), and subsequently tested in a validation cohort (patients with a positive COVID-19 test resulted between May 1 and June 5, 2020). This allowed us to test the model’s validity over time. A full multivariable logistic model was initially constructed to predict hospital admission with COVID-19 based on demographic variables, comorbidities, immunization history, symptoms, travel history, laboratory variables, and medications that were identified pre-admission. For modeling purposes, methods of missing value imputation for laboratory variables were compared using median values and using values from multivariate imputation by chained equations (MICE) via the R package mice. Restricted cubic splines with 3 knots were applied to continuous variables to relax the linearity assumption. A least absolute shrinkage and selection operator (LASSO) logistic regression algorithm was performed to retain the most predictive features. A 10-fold cross-validation method was applied to find the regularization parameter lambda which gave the minimum mean cross-validated concordance index. Predictors with nonzero coefficients in the LASSO regression model were chosen for calculating predicted risk. The final model was internally validated by assessing discrimination and calibration with 1000 bootstrap resamples. Discrimination was measured with the concordance index [14]. Calibration was assessed visually by plotting the nomogram-predicted probabilities against the observed event proportions over a series of equally spaced values within the range of the predicted probabilities. The closer the calibration curve lies along the 45° line, the better the calibration. A scaled Brier score, called the index of predictive accuracy (IPA) [15], was also calculated, as this has some advantages over the more popular concordance index. The IPA ranges from -1 to 1, where a value of 0 indicates a useless model, and negative values imply a harmful model. We adhered to the TRIPOD checklist for reporting the prediction model [16].
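The scaled Brier score (IPA) described above can be illustrated in a few lines. This is a sketch of the metric's definition only, not the study's actual code (the analysis was done in R with the riskRegression package); all names are ours:

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(observed)

def ipa(predicted, observed):
    """Index of Predictive Accuracy: 1 - Brier(model) / Brier(null model),
    where the null model predicts the overall event rate for every patient.
    0 indicates a useless model; negative values imply a harmful model."""
    event_rate = sum(observed) / len(observed)
    null_brier = brier_score([event_rate] * len(observed), observed)
    return 1 - brier_score(predicted, observed) / null_brier
```

For a model that is both well calibrated and discriminating the IPA approaches 1; the study reports scaled Brier scores of 42.6% (development) and 25.6% (validation).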

We calculated sensitivity, specificity, positive predictive value, and negative predictive value at different cutoffs of predicted risk. We used R, version 3.5.0 (R Project for Statistical Computing) [17], with the tidyverse [18], mice [19], caret [20], and riskRegression [21] packages for all analyses. Statistical tests were 2-sided and used a significance threshold of P < .05. We included all COVID-positive patients during the study period in this model development and validation to optimize model performance; no specific sample size calculations were performed.
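The cutoff-based metrics above follow directly from a 2x2 confusion table at each threshold; a minimal illustrative sketch (not the authors' R code):

```python
def threshold_metrics(predicted, observed, cutoff):
    """Classify patients as high risk when predicted risk >= cutoff, then
    compute sensitivity, specificity, PPV, and NPV from the 2x2 table."""
    tp = sum(1 for p, y in zip(predicted, observed) if p >= cutoff and y == 1)
    fp = sum(1 for p, y in zip(predicted, observed) if p >= cutoff and y == 0)
    fn = sum(1 for p, y in zip(predicted, observed) if p < cutoff and y == 1)
    tn = sum(1 for p, y in zip(predicted, observed) if p < cutoff and y == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positives among events
        "specificity": tn / (tn + fp),  # true negatives among non-events
        "ppv": tp / (tp + fp),          # events among those flagged high risk
        "npv": tn / (tn + fn),          # non-events among those flagged low risk
    }
```

Raising the cutoff trades sensitivity for specificity, which is the pattern reported at the different predicted-risk cutoffs.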

Sensitivity analyses

An outcome of “hospitalized versus not” allows us to predict the likelihood that the patient is actually admitted to the hospital. This decision, however, is influenced by multiple “non-medical” factors including bed availability, regulatory systems, and individual physician preferences. To test the applicability of our model to a determination of whether a patient should have been admitted or not, we subdivided patients in our development and validation cohorts into 4 categories: A- hospitalized and not sent home within 24 hours; B- sent home (not initially hospitalized) but ultimately hospitalized within 1 week of being sent home; C- not hospitalized at all; D- hospitalized but sent home within 24 hours. In this construct, categories A and C represent patients who were “correctly managed”, and categories B and D represent those who were “incorrectly managed”. We then tested the discrimination of our model in each one of those categories separately.

No model recalibration was done.
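The four-way grouping above can be expressed as a simple classification rule; a hypothetical sketch with illustrative argument names (not fields from the study registry):

```python
def management_category(hospitalized, discharged_within_24h=False,
                        admitted_within_week_of_discharge=False):
    """Assign the sensitivity-analysis category described in the text.

    A: hospitalized and not sent home within 24 hours (correctly managed)
    B: sent home but hospitalized within 1 week (incorrectly managed)
    C: not hospitalized at all (correctly managed)
    D: hospitalized but sent home within 24 hours (incorrectly managed)
    """
    if hospitalized:
        return "D" if discharged_within_24h else "A"
    return "B" if admitted_within_week_of_discharge else "C"
```

Model discrimination can then be evaluated separately within each category, as the study does.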

Results

Patient characteristics and outcomes

4,536 patients tested positive during the study period, including 2,852 patients in the development cohort (DC), of whom 582 (20.4%) were hospitalized, and 1,684 patients in the validation cohort (VC), of whom 376 (22.3%) were hospitalized. Table 1 provides demographic, exposure, clinical, laboratory, social characteristics, and medication history of COVID-19 patients who were hospitalized versus those who completed their treatment on an outpatient basis in both the DC and VC. At the time of hospital admission, 260 patients were known to have COVID-19, while the results of the (RT-PCR) SARS-CoV-2 nasopharyngeal assay were still pending on 698. Six hundred sixty-five were admitted from the emergency room, 32 were transferred from other hospitals, and 261 were directly admitted from the outpatient areas. Overall outcomes illustrated in Fig 1 show the cumulative incidence of hospital discharge, transfer to intensive care unit, and death in our hospitalized cohort.

Table 1. Detailed descriptive statistics of demographic, exposure, clinical, laboratory, social characteristics, and medication history of COVID-19 positive patients who were hospitalized versus not.

Statistically significant variables (p-value<0.05) are bolded. The development data are from before 05/01 and the validation data are from between 05/01 and 06/05. The percentages presented are per row.

Development Cohort Validation Cohort
Not hospitalized hospitalized p-value Not hospitalized hospitalized p-value
N 2270 582 1308 376
Demographics:
Race (%) <0.001 <0.001
Asian 27 (77.1) 8 (22.9) 8 (53.3) 7 (46.7)
Black 498 (70.0) 213 (30.0) 422 (68.0) 199 (32.0)
Other 362 (93.5) 25 (6.5) 239 (90.5) 25 (9.5)
White 1383 (80.5) 336 (19.5) 639 (81.5) 145 (18.5)
Male (%) 1049 (76.5) 323 (23.5) <0.001 556 (75.3) 182 (24.7) 0.049
Ethnicity (%) <0.001 <0.001
Hispanic 326 (93.9) 21 (6.1) 99 (84.6) 18 (15.4)
Non-Hispanic 1677 (75.2) 553 (24.8) 925 (72.8) 345 (27.2)
Unknown 267 (97.1) 8 (2.9) 284 (95.6) 13 (4.4)
Smoking (%) <0.001 <0.001
Current Smoker 136 (78.6) 37 (21.4) 111 (71.2) 45 (28.8)
Former Smoker 642 (74.4) 221 (25.6) 301 (70.2) 128 (29.8)
No 1182 (79.6) 302 (20.4) 613 (78.8) 165 (21.2)
Unknown 310 (93.4) 22 (6.6) 283 (88.2) 38 (11.8)
Age (median [IQR]) Missing: 0.5% 50.57 [35.75, 64.40] 64.37 [54.83, 76.58] <0.001 45.57 [30.49, 65.93] 64.94 [52.45, 76.78] <0.001
Exposure history:
Exposed to COVID-19? YES (%) 1725 (81.6) 390 (18.4) <0.001 732 (78.3) 203 (21.7) 0.535
Family member with COVID-19? YES (%) 1565 (80.3) 383 (19.7) 0.161 557 (75.0) 186 (25.0) 0.021
Presenting symptoms:
Cough? Yes (%) 1889 (79.8) 478 (20.2) 0.576 662 (77.3) 194 (22.7) 0.781
Fever? Yes (%) 1534 (79.2) 403 (20.8) 0.472 505 (77.3) 148 (22.7) 0.838
Fatigue? Yes (%) 1479 (76.8) 446 (23.2) <0.001 531 (73.9) 188 (26.1) 0.001
Sputum production? Yes (%) 1042 (78.8) 280 (21.2) 0.365 458 (75.5) 149 (24.5) 0.114
Flu-like symptoms? Yes (%) 1711 (80.0) 429 (20.0) 0.439 659 (78.9) 176 (21.1) 0.245
Shortness of breath? Yes (%) 1098 (72.3) 421 (27.7) <0.001 379 (66.5) 191 (33.5) <0.001
Diarrhea? Yes (%) 995 (78.6) 271 (21.4) 0.256 370 (74.0) 130 (26.0) 0.022
Loss of appetite? Yes (%) 1222 (77.2) 360 (22.8) 0.001 464 (73.0) 172 (27.0) <0.001
Vomiting? Yes (%) 711 (81.7) 159 (18.3) 0.069 282 (75.2) 93 (24.8) 0.217
Co-morbidities:
BMI (median [IQR]) Missing: 52.7% 29.27 [25.73, 33.98] 30.30 [26.29, 35.46] 0.03 30.05 [25.71, 35.18] 29.02 [24.80, 34.95] 0.15
COPD/emphysema? Yes (%) 102 (58.0) 74 (42.0) <0.001 43 (50.6) 42 (49.4) <0.001
Asthma? Yes (%) 264 (67.9) 125 (32.1) <0.001 198 (75.6) 64 (24.4) 0.419
Diabetes? Yes (%) 358 (60.3) 236 (39.7) <0.001 193 (57.1) 145 (42.9) <0.001
Hypertension? Yes (%) 800 (65.6) 419 (34.4) <0.001 462 (64.7) 252 (35.3) <0.001
Coronary artery disease? Yes (%) 172 (57.3) 128 (42.7) <0.001 125 (61.3) 79 (38.7) <0.001
Heart failure? Yes (%) 122 (52.4) 111 (47.6) <0.001 80 (51.6) 75 (48.4) <0.001
Cancer? Yes (%) 230 (66.7) 115 (33.3) <0.001 124 (72.1) 48 (27.9) 0.079
Transplant history? Yes (%) 10 (41.7) 14 (58.3) <0.001 2 (22.2) 7 (77.8) <0.001
Multiple sclerosis? Yes (%) 23 (76.7) 7 (23.3) 0.863 11 (68.8) 5 (31.2) 0.576
Connective tissue disease? Yes (%) 165 (69.3) 73 (30.7) <0.001 47 (74.6) 16 (25.4) 0.658
Inflammatory Bowel Disease? Yes (%) 80 (72.1) 31 (27.9) 0.059 27 (75.0) 9 (25.0) 0.852
Immunosuppressive disease? Yes (%) 164 (59.0) 114 (41.0) <0.001 125 (59.8) 84 (40.2) <0.001
Vaccination history:
Influenza vaccine? Yes (%) 818 (72.5) 311 (27.5) <0.001 423 (67.6) 203 (32.4) <0.001
Pneumococcal polysaccharide vaccine? Yes (%) 264 (57.9) 192 (42.1) <0.001 178 (56.3) 138 (43.7) <0.001
Laboratory findings upon presentation:
Pre-testing platelets (median [IQR]) Missing: 67.3% 213.00 [163.00, 267.00] 190.00 [153.25, 241.75] <0.001 213.00 [171.00, 270.50] 207.00 [156.00, 273.00] 0.266
Pre-testing AST (median [IQR]) Missing: 72.0% 28.00 [21.00, 40.00] 36.00 [25.00, 52.00] <0.001 25.00 [20.00, 34.50] 31.50 [22.00, 47.00] <0.001
Pre-testing BUN (median [IQR]) Missing: 67.8% 13.00 [10.00, 19.00] 18.00 [12.00, 30.00] <0.001 13.00 [10.00, 18.00] 19.00 [12.00, 30.75] <0.001
Pre-testing Chloride (median [IQR]) Missing: 67.8% 100.00 [98.00, 103.00] 98.00 [95.00, 101.00] <0.001 101.00 [98.00, 103.00] 99.00 [96.00, 103.00] 0.004
Pre-testing Creatinine (median [IQR]) Missing: 67.7% 0.90 [0.74, 1.11] 1.10 [0.84, 1.57] <0.001 0.87 [0.71, 1.11] 1.05 [0.79, 1.48] <0.001
Pre-testing hematocrit (median [IQR]) Missing: 67.4% 40.60 [36.40, 44.12] 40.00 [36.30, 43.80] 0.421 39.35 [36.00, 42.50] 39.00 [34.60, 42.70] 0.285
Pre-testing Potassium (median [IQR]) Missing: 67.2% 4.00 [3.70, 4.20] 4.00 [3.70, 4.40] 0.034 3.90 [3.60, 4.20] 4.00 [3.70, 4.40] 0.005
Home medications:
Immunosuppressive treatment? Yes (%) 162 (67.8) 77 (32.2) <0.001 69 (63.9) 39 (36.1) 0.001
NSAIDS? Yes (%) 388 (66.2) 198 (33.8) <0.001 187 (61.1) 119 (38.9) <0.001
Steroids? Yes (%) 192 (67.1) 94 (32.9) <0.001 86 (63.7) 49 (36.3) <0.001
Carvedilol? Yes (%) 38 (50.7) 37 (49.3) <0.001 17 (53.1) 15 (46.9) 0.002
ACE inhibitor? Yes (%) 160 (62.5) 96 (37.5) <0.001 83 (62.9) 49 (37.1) <0.001
ARB? Yes (%) 128 (66.0) 66 (34.0) <0.001 38 (53.5) 33 (46.5) <0.001
Melatonin? Yes (%) 51 (60.0) 34 (40.0) <0.001 24 (48.0) 26 (52.0) <0.001
Social influencers of health:
Population Per Sq Km* (median [IQR]) 3.09 [2.68, 3.32] 3.04 [2.67, 3.31] 0.42 3.06 [2.60, 3.32] 3.15 [2.77, 3.38] <0.001
Median Income ($1000, median [IQR]) 57.85 [44.78, 76.40] 55.18 [36.27, 73.09] <0.001 50.80 [36.06, 65.76] 41.67 [29.38, 64.08] <0.001
Population Per Housing Unit (median [IQR]) 2.29 [1.99, 2.61] 2.15 [1.92, 2.40] <0.001 2.22 [1.93, 2.46] 2.06 [1.79, 2.31] <0.001

* transformed as log10(x+1)

Fig 1. This figure shows the cumulative incidence of each of the 3 outcomes (going home; transferred to ICU; death) following hospitalization in our COVID-19 cohort.


Values above the days from admission axis indicate numbers of patients at risk.

Prediction modeling results

Imputation methods were evaluated with 1000 repeated bootstrapped samples. We found that models based on median imputation appeared to outperform those based on MICE imputation, so median imputation was selected as the basis of the final model. Variables that we examined and that were not found to add value beyond those included in our final model for predicting hospitalization included exposure to COVID-19, other family members with COVID-19, fever, fatigue, sputum production, flu-like symptoms, recent international travel, coronary artery disease, heart failure, on immunosuppressive treatment, other heart disease, other lung disease, pneumococcal vaccine, BUN, angiotensin converting enzyme inhibitors, angiotensin receptor blockers, toremifene, and paroxetine. Model discrimination was excellent, with an area under the curve of 0.900 (95% confidence interval 0.886–0.914) in the development cohort and 0.813 (0.786, 0.839) in the validation cohort. The scaled Brier score was 42.6% (95% CI 37.8%, 47.4%) in the development cohort and 25.6% (19.9%, 31.3%) in the validation cohort. The nomogram is presented in Fig 2, and an online version of the statistical model (Fig 3) is available at https://riskcalc.org/COVID19Hospitalization/. The calibration curves are shown in Fig 4 and suggest that predicted risk matches observed proportions relatively well throughout the risk range. Table 2 shows the sensitivity, specificity, negative predictive value, and positive predictive value at different cutoffs of predicted risk.

Fig 2. A nomogram (graphical version of the model) is shown.


Line 1 is used to calculate the points associated with each of the predictor variables. Each subsequent line represents a predictor in the final model. The patient’s characteristic is found on each line, and from it a vertical line is drawn up to find the points associated with that value. All the points are then totaled and located on the second-to-last line. A vertical line is drawn down to the bottom line to locate the predicted risk of hospitalization produced by the model.
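A nomogram's total points are a rescaled linear predictor, which a logistic model maps to a probability. A generic sketch of that final step follows; the intercept and effect sum are invented for illustration and are not the published coefficients:

```python
import math

def predicted_risk(linear_predictor):
    """Apply the inverse-logit to a logistic-regression linear predictor
    (intercept plus the sum of coefficient * value terms) to obtain the
    predicted probability of hospitalization."""
    return 1 / (1 + math.exp(-linear_predictor))

# Hypothetical patient: intercept -3.0 plus effect terms summing to +1.0
risk = predicted_risk(-3.0 + 1.0)
```

The printed nomogram performs the same mapping graphically: the bottom axis is the inverse-logit of the rescaled point total.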

Fig 3. Online risk calculator for risk of hospitalization from COVID-19, found at https://riskcalc.org/COVID19Hospitalization/.


The example here is a 55-year-old white male, former smoker, who presented with cough, shortness of breath, and loss of appetite. He has diabetes, received no vaccinations this year, and is only on NSAIDs for chronic joint pain. No labs are available yet. His predicted risk of hospitalization is 8.56%. If race is changed to Black, with all other variables remaining constant, his predicted risk almost doubles to 17.22%.

Fig 4. Calibration curve for the model predicting likelihood of hospitalization.


The x-axis displays the predicted probabilities generated by the statistical model and the y-axis shows the fraction of patients with COVID-19 who were hospitalized at the given predicted probability. The 45° line therefore indicates perfect calibration, where, for example, a predicted probability of 0.2 corresponds to an observed proportion of 0.2. The solid black line indicates the model’s relationship with the outcome; the closer it is to the 45° line, the closer the model’s predicted probability is to the actual proportion. As demonstrated, there is excellent correspondence between the predicted probability of hospitalization and the observed frequency of hospitalization in COVID-19 positive patients.
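A curve like this can be constructed by binning patients on predicted probability and plotting each bin's mean prediction against its observed event proportion; an illustrative sketch, not the authors' plotting code:

```python
def calibration_points(predicted, observed, n_bins=10):
    """Group patients into equal-width bins of predicted probability and
    return (mean predicted, observed proportion) pairs for non-empty bins;
    points near the 45-degree line indicate good calibration."""
    points = []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = [(p, y) for p, y in zip(predicted, observed)
                  if lo <= p < hi or (b == n_bins - 1 and p == 1.0)]
        if in_bin:
            mean_pred = sum(p for p, _ in in_bin) / len(in_bin)
            obs_prop = sum(y for _, y in in_bin) / len(in_bin)
            points.append((mean_pred, obs_prop))
    return points
```

Plotting these pairs against the identity line yields a calibration curve of the kind shown in the figure.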

Table 2. Sensitivity, specificity, positive predictive value, and negative predictive value of the model in the validation dataset at different cutoffs of predicted hospitalization risk.

Cutoff Sensitivity Specificity PPV NPV
10% 0.769 0.726 0.447 0.916
30% 0.519 0.918 0.646 0.896
50% 0.388 0.963 0.749 0.846
70% 0.253 0.979 0.772 0.820
90% 0.117 0.992 0.800 0.796

Sensitivity analysis

Appropriately managed patients represented the majority of the cohort: 750 patients were hospitalized with a length of stay that exceeded 24 hours (431 in DC and 319 in VC), and 3549 patients were not hospitalized at all (2258 in DC and 1291 in VC). A minority of patients (237 patients, 5.4%) fell in the category of inappropriate initial management: 208 had been initially sent home from the emergency room but were then admitted within 1 week of emergency room visit (151 in DC, 57 in VC), and 29 patients were hospitalized but then discharged within 24 hours (12 in DC, and 17 in VC). When tested in each one of those categories, the predictive model performed very well in the appropriately managed subgroup (area under the curve of 0.821), but its performance was inadequate in the 5.4% of patients who fell in the inappropriate initial management category.

Discussion

Predictors of hospitalization

Our results confirm a higher risk of hospitalization with older age (median age of 65.5 years in hospitalized patients compared to 48.0 years in non-hospitalized patients), male sex (56.9% of hospitalized vs 48.3% of non-hospitalized), and medical co-morbidities, most prominently hypertension, diabetes, and immunosuppressive disease (variables significant on univariable analysis in Table 1, but also relevant in the final model). The significant association of shortness of breath and diarrhea with hospitalization may reflect the need for inpatient supportive care with these symptoms, regardless of etiology. Beyond the expected, our results provide some insights that advance the existing literature:

  1. Smoking: The World Health Organization warns of a higher morbidity for COVID-19 in smokers, and proposes multiple possible mechanisms including frequent touching of face and mouth during the act of smoking, sharing cigarettes, and underlying lung disease [22]. We found that former smokers rather than current smokers are at higher risk of COVID-related hospitalization (Table 1), favoring the underlying lung disease mechanism.

  2. Medications: We found a higher risk of hospitalization in COVID-19 patients who were on Angiotensin Converting Enzyme (ACE) inhibitors or angiotensin II type-I receptor blockers (ARBs) on univariable analysis [16, 23, 24]. However, being on these medications did not influence the final multivariable model, suggesting that prior associations of ACE inhibitors and ARBs with COVID-19 severity may be confounded by the underlying medical co-morbidities (hypertension and diabetes) that are linked to the highest COVID-19 hospitalization rates, and which are most often treated with these same drugs. ACE2 can also be increased by thiazolidinediones and ibuprofen, potentially explaining the higher hospitalization risk seen in our patients on non-steroidal anti-inflammatory drugs (NSAIDs); in fact, the latest FDA guidance cautions against the use of NSAIDs in COVID-19 patients [25]. Overall, we recommend caution in using retrospective data to draw robust conclusions assigning causation to drugs versus underlying co-morbidity versus genetically driven ACE2 polymorphism. We highlight the need for carefully designed, large observational studies or randomized clinical trials to address these critical questions.

  3. Race: African American race was correlated with a higher hospitalization risk (36.2% of hospitalized vs 21% of non-hospitalized patients). This is consistent with a recent look at hospitalizations for COVID-19 across 14 states from March 1 to 30 [26]. Race data, which were available for 580 of 1,482 patients, revealed that African Americans accounted for 33 percent of the hospitalizations but only 18 percent of the total population surveyed [26]. The authors proposed explanations such as higher rates of medical co-morbidities, higher exposure risks, and distrust of the medical community. Our data, however, show that the effect of race on the individualized hospitalization risk prediction far outweighs that of any medical co-morbidity (Fig 2). It is already known that race influences the effectiveness of an immune response [27]. A deeper exploration of the underlying genetics and biology of race in the defense against and response to SARS-CoV-2 infection is needed. This should be paired with a deeper exploration of social influencers of health, such as population per square kilometer and population per household, which were also relevant in our nomogram. In our online risk calculator, only the zip code entry is required: the relevant social influencers data are derived from the zip code by our program.

Why do we need a prediction tool?

Given the multitude of risk factors discussed, the nomogram and online risk calculator help overcome the challenge of translating complex information into patient-level clinical decision-making [28]. During a pandemic, with hospital beds in short supply, it is critical to empower front-line healthcare providers with tools that can supplement and support decision-making about whom to admit. Advances in tele-health can be leveraged for home monitoring to guide care delivery in an outpatient setting for those determined to be low risk based on the nomogram calculation. Models like ours, developed with data obtained through automated abstraction from the electronic health record (EHR), offer the promise of integration within the EHR to facilitate rapid and efficient implementation into the clinical workflow. Such a strategy is a pragmatic application of overdue calls for a Learning Health System [29].

How well does this nomogram perform?

Model performance, as measured by the concordance index, is excellent (c-statistic = 0.900). This level of discrimination is clearly superior to a coin toss or assuming all patients are at equivalent risk (both c-statistics = 0.5). The calibration of the model is excellent in both the DC and VC (see Fig 4). The metric that considers calibration, the IPA value, confirms that the model predicts substantially better than chance or no model at all. Overall, the model performs very well. Our next step will be to integrate this model into the clinical workflow.

How can this model be integrated in a clinical workflow?

Manually abstracting data and inputting it into an online calculator is cumbersome in a busy clinical practice. Interpreting the prediction without some frame of reference is complex. However, failing to see beyond these hurdles risks wasting opportunities to innovate and improve patient care. It is therefore imperative to develop a clear implementation strategy that aligns with the existing clinical needs and operations of a health organization. One could start by identifying the clinical problems that would benefit from this prediction tool, and reference the information in Table 2 on sensitivity, specificity, positive predictive value, and negative predictive value at different prediction cutoffs to provide a framework for clinical application. An illustrative example now being explored in our own health system is the use of this calculator to tailor the intensity of home monitoring for COVID-positive patients. Currently, every patient who tests positive for COVID-19 is called daily for 14 days to check on their symptoms and identify disease progression early enough for intervention. With only 20–30% of COVID-positive patients progressing to the point of requiring hospitalization, nurses can use our prediction tool to identify this high-risk group and call them daily, while reducing the intensity of follow-up for the rest.
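The home-monitoring triage described here reduces to thresholding the calculator's output; a hypothetical sketch (the 30% cutoff and the policy labels are illustrative choices, not study recommendations):

```python
def monitoring_plan(predicted_risk, cutoff=0.30):
    """Assign follow-up intensity for a COVID-positive outpatient based on
    the model's predicted hospitalization risk (illustrative policy only)."""
    if predicted_risk >= cutoff:
        return "daily nurse call for 14 days"
    return "reduced-frequency follow-up"
```

At the 30% cutoff, Table 2 reports sensitivity 0.519 and specificity 0.918, so roughly half of eventual admissions would be flagged for daily calls while most patients who are never admitted would avoid them.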

Limitations

This is not a multicenter study. It does, however, include all hospitals and outpatient facilities of the Cleveland Clinic Health System within the US (>220 outpatient locations and 18 hospitals in Ohio and Florida), creating a robust sampling of the COVID-19 population. As with any statistical model, other hospital systems may elect to validate this model internally for their specific patient populations as they contemplate options for integrating it into their workflows. Given the alternative of absent or constantly changing practice guidelines, implementing this nomogram into our clinical workflow will allow prospective evaluation of its impact on patient care and outcomes. Our model includes age as a predictor: this may limit our ability to identify risk factors for disease progression specific to the younger population, and may underestimate risk in younger patients with less severe disease who are less likely to seek medical care. Lastly, although our model performs very well in the majority of COVID-positive patients, more research is needed to optimize it for the subgroup (5.4% of the total cohort in our series) with either delayed or unnecessary admission.
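The temporal validation described in the abstract (patients tested before May 1, 2020 in the development cohort; May 1 and later in the validation cohort) is a pattern other systems could reuse when validating internally. A minimal sketch, with toy records and illustrative field names:

```python
# Toy records with hypothetical field names; the cutoff date is the
# one stated in the paper's abstract.
from datetime import date

patients = [
    {"id": 1, "test_date": date(2020, 3, 15), "hospitalized": True},
    {"id": 2, "test_date": date(2020, 4, 20), "hospitalized": False},
    {"id": 3, "test_date": date(2020, 5, 2),  "hospitalized": True},
    {"id": 4, "test_date": date(2020, 6, 1),  "hospitalized": False},
]

CUTOFF = date(2020, 5, 1)
# Fit on earlier patients, evaluate on later ones, so the validation
# set reflects how the model would face genuinely new cases.
development = [p for p in patients if p["test_date"] < CUTOFF]
validation = [p for p in patients if p["test_date"] >= CUTOFF]

print(len(development), len(validation))  # -> 2 2
```

A temporal split is stricter than a random split because shifts in testing policy or patient mix over time show up as degraded validation performance, as seen in the drop from 0.900 to 0.813 reported in the abstract.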

Conclusions

Drivers of disease progression and worsening in COVID-19 are multiple and complex. We developed a statistical model with excellent predictive performance (c-statistic of 0.900 in the development cohort) to individualize hospitalization risk assessment at the patient level. This could help guide clinical decision-making and resource allocation.

Supporting information

S1 Checklist

(PDF)

Data Availability

Data used for the generation of this risk prediction model include sensitive human research participant data that cannot be publicly shared due to legal and ethical restrictions imposed by Cleveland Clinic regulatory bodies, including the Institutional Review Board and legal counsel. In particular, variables such as the patient's address, date of testing, dates of hospitalization, date of ICU admission, and date of mortality are HIPAA-protected health information and legally cannot be publicly shared. Since these variables were critical to the generation and performance of the model, a partial dataset (everything except them) would not be fruitful either, because it would not support efforts of academic advancement such as model validation or application. We will make our datasets available upon request, under appropriate data use agreements with specific parties interested in academic collaboration. Requests for data access can be made to mascar@ccf.org.

Funding Statement

None of the authors report any conflicts of interest or have any relevant disclosures. LJ, SE, and MK were funded by NIH/NCATS UL1TR002548. https://ncats.nih.gov/ The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Tsai T, Jacobson B, Jha A. American Hospital Capacity And Projected Need for COVID-19 Patient Care. Health Affairs. https://www.healthaffairs.org/do/10.1377/hblog20200317.457910/full/ accessed April 10, 2020. [Google Scholar]
  • 2.Chen J, Qi T, Liu L, Ling Y, Qian Z, Li T, et al. Clinical progression of patients with COVID-19 in Shanghai, China. J Infect. 2020. March 19. pii: S0163-4453(20)30119-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Zhou F, Yu T, Du R, Fan G et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395(10229):1054 Epub 2020 Mar 11. 10.1016/S0140-6736(20)30566-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Liu W, Tao ZW, Lei W, Ming-Li Y, Kui L, Ling Z, et al. Analysis of factors associated with disease outcomes in hospitalized patients with 2019 novel coronavirus disease. Chin Med J (Engl). 2020. February 28 10.1097/CM9.0000000000000775 [Epub ahead of print] . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Wu Z, McGoogan JM. Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72314 Cases From the Chinese Center for Disease Control and Prevention. JAMA. 2020. [DOI] [PubMed] [Google Scholar]
  • 6.Liang W, Guan W, Chen R, et al. Cancer patients in SARS-CoV-2 infection: a nationwide analysis in China. Lancet Oncol. 2020;21(3):335 Epub 2020 Feb 14. 10.1016/S1470-2045(20)30096-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Kattan MW. Nomograms. Introduction. Semin Urol Oncol 2002; 20(2): 79–81. [PubMed] [Google Scholar]
  • 8.Zhou Y., Hou Y., Shen J. et al. Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2. Cell Discov 6, 14 (2020). 10.1038/s41421-020-0153-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Milinovich A, Kattan MW. Extracting and utilizing electronic health data from Epic for research. Ann Transl Med. 2018. February;6(3):42 10.21037/atm.2018.01.13 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG, Research electronic data capture (REDCap)– A metadata-driven methodology and workflow process for providing translational research informatics support, J Biomed Inform. 2009. April;42(2):377–81. 10.1016/j.jbi.2008.08.010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Harris PA, Taylor R, Minor BL, Elliott V, Fernandez M, O’Neal L, et al. , REDCap Consortium, The REDCap consortium: Building an international community of software partners, J Biomed Inform. 2019. May 9 10.1016/j.jbi.2019.103208 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Schuchat A; CDC COVID-19 Response Team. Public Health Response to the Initiation and Spread of Pandemic COVID-19 in the United States, February 24-April 21, 2020. MMWR Morb Mortal Wkly Rep. 2020;69(18):551‐556. Published 2020 May 8. 10.15585/mmwr.mm6918e2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Zwald ML, Lin W, Sondermeyer Cooksey GL, et al. Rapid Sentinel Surveillance for COVID-19—Santa Clara County, California, March 2020. MMWR Morb Mortal Wkly Rep. 2020;69(14):419‐421. Published 2020 Apr 10. 10.15585/mmwr.mm6914e3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Harrell F. E. Jr., Califf R. M., Pryor D. B., Lee K. L., & Rosati R. A. (1982). Evaluating the yield of medical tests. JAMA, 247(18), 2543–2546. [PubMed] [Google Scholar]
  • 15.Kattan M. W., & Gerds T. A. (2018). The index of prediction accuracy: an intuitive measure useful for evaluating risk prediction models. Diagn Progn Res, 2, 7 10.1186/s41512-018-0029-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement [published correction appears in Ann Intern Med. 2015 Apr 21;162(8):600]. Ann Intern Med. 2015;162(1):55‐63. 10.7326/M14-0697 [DOI] [PubMed] [Google Scholar]
  • 17.R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria: URL https://www.R-project.org/. [Google Scholar]
  • 18.Hadley Wickham (2017). tidyverse: Easily Install and Load the 'Tidyverse'. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse
  • 19.van Buuren Stef, Groothuis-Oudshoorn Karin (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. URL https://www.jstatsoft.org/v45/i03/. [Google Scholar]
  • 20.Max Kuhn. Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang, Can Candan and Tyler Hunt. (2019). caret: Classification and Regression Training. R package version 6.0–84. https://CRAN.R-project.org/package=caret
  • 21.Thomas Alexander Gerds and Brice Ozenne (NA). riskRegression: Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks. R package version 2019.11.03.
  • 22.WHO team. Department of Communications. Posted on March 24, 2020. https://www.who.int/news-room/q-a-detail/q-a-on-smoking-and-covid-19. Accessed April 11, 2020.
  • 23.Wan Y, Shang J, Graham R, et al. Receptor recognition by novel coronavirus from Wuhan: An analysis based on decadelong structural studies of SARS. J Virology 2020; published online Jan 29. 10.1128/JVI.00127-20 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Li XC, Zhang J, Zhuo JL. The vasoprotective axes of the renin-angiotensin system: physiological relevance and therapeutic implications in cardiovascular, hypertensive and kidney diseases. Pharmacol Res 2017;125: 21–38. 10.1016/j.phrs.2017.06.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.https://www.fda.gov/drugs/drug-safety-and-availability/fda-advises-patients-use-non-steroidal-anti-inflammatory-drugs-nsaids-covid-19. Content current as of: 03/19/2020. Accessed on 4/11/2020
  • 26.Garg S, Kim L, Whitaker M, et al. Hospitalization Rates and Characteristics of Patients Hospitalized with Laboratory-Confirmed Coronavirus Disease 2019—COVID-NET, 14 States, March 1–30, 2020. MMWR Morb Mortal Wkly Rep. ePub: 8 April 2020. 10.15585/mmwr.mm6915e3externalicon. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Quach H, Rotival M, Pothlichet J,et al. Genetic Adaptation and Neandertal Admixture Shaped the Immune System of Human Populations. Cell. 2016. October 20;167(3):643–656.e17. 10.1016/j.cell.2016.09.024 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Formica V, Minieri M, Bernardini S, et al. Complete blood count might help to identify subjects with high probability of testing positive to SARS-CoV-2 [published online ahead of print, 2020 Jul 2]. Clin Med (Lond). 2020;clinmed.2020–0373. 10.7861/clinmed.2020-0373 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Institute of Medicine (US) Roundtable on Evidence-Based Medicine; Olsen LA, Aisner D, McGinnis JM, editors. The Learning Healthcare System: Workshop Summary. Washington (DC): National Academies Press (US); 2007. Available from: https://www.ncbi.nlm.nih.gov/books/NBK53494/ 10.17226/11903 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Juan F Orueta

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

2 Jun 2020

PONE-D-20-11909

Characteristics, outcomes, and individualized prediction of hospitalization risk in 818 patients with COVID-19.

PLOS ONE

Dear Dr. Jehi,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised by the reviewers in their comments.

In my opinion, a crucial problem is that the risk calculator has been designed to identify patients who will be hospitalized, but not patients who require admission. As the authors clearly explain in the manuscript, there are patients discharged home from emergency rooms who return more ill and need to be admitted days later, and patients hospitalized for observation and sent home within three days with no intervention beyond supportive care. I wonder if the risk calculator could be useful to prospectively discriminate patients who need admission for treatment from those who can be safely managed as outpatients. Have the authors analyzed their model categorizing the patients into such subgroups?

In addition, the authors should revise the criteria of PLOS ONE for publication. Please, complete the TRIPOD checklist and include it in your submission.

Please submit your revised manuscript by Jul 17 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Juan F. Orueta, MD, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

3. For studies involving humans categorized by race/ethnicity, age, disease/disabilities, religion, sex/gender, sexual orientation, or other socially constructed groupings, authors should:

a) Explicitly describe their methods of categorizing human populations,

b) Define categories in as much detail as the study protocol allows,

c) Justify their choices of definitions and categories,

d) Explain whether (and if so, how) they controlled for confounding variables such as socioeconomic status, nutrition, environmental exposures, or similar factors in their analysis.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for allowing me the opportunity to review this paper. This paper has led to the development of an online risk calculator for COVID-19, which is very important in the current environment. Although I feel the paper has merit, some changes need to be made to improve the paper.

Author list:

- Please provide the affiliation details for author Merlino (Chief Medical Officer is not an affiliation)

Introduction:

Line 75 --> Clarify that this is in the United States, as the paper will be read by international readers

Line 101/102 --> Please remove the number of included patients. This should be left for the results section.

Lines 103-106 --> Belong in the methods sections, as you describe how the model was developed.

Methods:

Patient selection --> Please describe how these patients were recruited, in which departments were they tested? Also add if this only included patients that were admitted or also patients who had positive tests but were not admitted

Line 134 --> Some studies suggest that there is a difference in sensitivity and specificity between the nasopharyngeal and oropharyngeal swab. Did all patients have both swabs taken? If so, clarify that both specimens were collected in all patients.

Line 170 --> Add a reference for the TRIPOD checklist

Results:

General comment: Numbers below 10 should be written as words, everything else in numerals.

Lines 182 and 183: Is it known how many patients went home, fell ill again, but visited a different clinic to be admitted? Are you confident that all patients who fell ill came to the clinics included in your study? This is important to provide an accurate depiction.

Discussion:

Line 216: Does this also hold true for current smokers with COPD in your dataset?

Line 274: Should other hospitals also include this model in their clinical workflow, or should they wait for it to be validated? Please add some information on this in the limitations section.

Reviewer #2: The manuscript by Jehi L. et al. is focused on a crucial point in the decision making concerning patients affected by COVID 19. The ability to correctly stratify the risks could allow the physicians to perform a correct rule in/rule out or whenever patients need to be charged in the hospital to define the best care setting for each patient.

The manuscript is clearly written, data are well collected and the conclusions are well supported by novel and interesting results.

I think that the model proposed by the authors could be very useful. My concern regarding the model and the consequent risk calculator lies in their relative complexity, which could be a problem in some care settings.

Another methodological concern relates to the decision to test for SARS-CoV-2 only in patients negative for influenza, due, according to the authors, to an anecdotal belief that coinfection is rare. However, to my knowledge, no scientific report has supported this belief. If the authors can support this concept with a reference, they should add it; otherwise, they should test all suspected patients regardless of influenza A/B coinfection to avoid possible statistical bias.

Also, the idea to insert "age" in the model could be problematic. It is not a lab value, and it speaks to the biases in the population cohort: in a children's hospital, or any ED with a younger population (perhaps a suburban ED), the scores would automatically reflect a lower risk for COVID simply as a function of a younger-aged population. The disease does not selectively choose its patients based on age, although it does take a more aggressive course in older patients (who are then more likely to come to the ED). The net effect would be that false negatives would intuitively be highest in the younger population, which is not good. Maybe the authors should discuss this issue.

Concerning the data management, rigorous statistical methodologies were followed for the analysis and are described in depth in the methods section. However, a major drawback is the absence of external validation of the model; since the patient sample is relatively large, the authors should consider splitting the cohort into training and validation subsets.

Moreover, the authors state in the Conclusion that their model is able “to individualize this risk assessment at the patient level”. The only risk really assessed by the model is the hospitalization because, for example, no indication is given on the outcome of the patients. I think the authors should point out this issue in the Discussion Section.

Reviewer #3: Dear Editor,

Thank you for giving me the opportunity to review the article: ”Characteristics, outcomes, and individualized prediction of hospitalization risk in 818 patients with COVID-19” by Jehi et al.

The article describes characteristics of patients testing positive for SARS-CoV-2 and tries to find predictors for hospital admission. The authors have used several advanced statistical approaches and handle them with great skill. Nevertheless, I have several major concerns that must be addressed:

1. The authors describe that the aim is to identify factors that could help in selecting patients that should be admitted. As the authors state: “The end result is patients being told to go home from the emergency room only to return much more ill and be admitted days later, or patients hospitalized for observation for several days without any significant clinical deterioration”. However, the outcome analyzed in the article, as I understand it, is those patients that de facto were admitted. I think the correct outcome would be those patients that should have been admitted as the authors themselves imply in the statement above. Patients that actually are admitted of course depends on the attending physician and as the authors discuss there can be both over- and under-admittance.

The fraction of patients that actually are admitted also heavily depends on the location and regulatory system of the country (and even on the hospital beds available at the moment). In some countries, all persons positive for SARS-CoV-2 are admitted, while in others only those who really need oxygen or 24/7 care are admitted. Therefore, having admittance as an outcome will only reflect the judgment of the attending physician at that location. This could partly be overcome by analyzing patients who were sent home within 1-3 days of admittance without requiring oxygen and with no deterioration, and also by accounting for patients who were not admitted but came back and were later admitted (in the same or any other hospital) within e.g. 7 days (or died at home). For both these patient categories, the judgment to admit or not was "wrong", and the patients should be adjudicated to their "right" category, not the actual category used in this study.

I would preferably like to see such a sensitivity analysis to really be able to discuss actual risk factors. If such an analysis is not made, known risk factors will likely bias the results, since the attending physician will be more likely to admit patients with risk factors known from the literature, even if these are not true risk factors for outpatients.

There are some clues to this in the discussion (some patients were sent home and then later returned), but it is unclear to what category in the LASSO regression these patients were adjudicated: their "right" category, i.e. to be admitted, or the "wrong" category, i.e. to be sent home, which was done in the first place. The same is true for the opposite case. The authors describe that 50% of the admitted patients not requiring treatment in the ICU were sent home within 3 days. To what category were these patients adjudicated? Was it in the first place a correct medical judgment to admit them, since they most likely did not need medical care 24/7?

2. To me, as a non-American, it is unclear whether the cohort covers the whole population in the catchment area, both for patients with and without insurance, and whether there might have been differences in the likelihood of being sampled for SARS-CoV-2 or admitted on that basis.

3. If a patient were sent home and then deteriorated, came back, died at home, or sought medical care again, would such a patient always be recognized in the cohort, or could that patient seek another caregiver and not be accounted for? Are all deaths (no matter where) within 30 days of inclusion accounted for among the 818 included patients?

4. The cohort is based on those who were sampled for SARS-CoV-2. What were the sampling criteria? In some countries, persons are sampled as part of screening programs, whereas in other countries only patients admitted to the hospital are sampled, and no patients who are planned to go home from the ED or who see the GP are sampled. This will of course bias the results in different ways and needs to be discussed and clarified.

Minor comments

1. The results section of the abstract states that 11,686 patients were tested, and the abstract speaks about a large cohort. However, the data presented only represent (correctly) the 818 patients positive for COVID-19. It could be made a little clearer that this is actually the case. The complications, such as death or ICU admission, are only analyzed among the 232 patients who were admitted, making the cohort even smaller.

2. In the method section reference to R and the used packages are missing.

3. To me it is unclear how multiple imputation was handled with the LASSO regression. Was the LASSO run on each imputed data set, or were algorithms like MIRL (Multiple Imputation Random Lasso) used?

4. Some variables are log-transformed in table 1. How were these variables entered in the multiple imputation and in the LASSO regression?

5. Even if "modern" statistical measures such as the IPA were used, it would be nice to have the sensitivity, specificity, and AUC for the score in the bootstrapped cohorts.

6. Ideally, though I understand this is of course much more work, it would be nice to evaluate the score in a different cohort, preferably in another setting/country, to validate the findings; even more so since the outcome was admittance, with all the problems outlined above. The authors try to address this by bootstrapping the cohorts, but this does not fully address these potentially biasing factors.

Again, thank you for letting me review this article.

Reviewer #4: Lara et al. aimed to characterise a large cohort of patients hospitalised with COVID-19 and their outcomes, and to develop a statistical model that allows individualised prediction of future hospitalisation risk for a patient newly diagnosed with COVID-19.

Following are my detailed comments:

1. Temporal/external validation is a necessary step for the generalisation of any predictive model. One can perform temporal validation on the remaining data from the same retrospective cohort.

2. There is no guidance on what cutoff might be used in practice. Providing this would enhance the applicability of the equation in a clinical setting. It would be good to report sensitivity, specificity, PPV, and NPV at several thresholds to facilitate complex medical decision-making.

3. I appreciate that the nomogram and online risk calculator were developed to provide individualised hospitalisation risk for a patient newly diagnosed with COVID-19. It would be good to report the risk equation (at least as supplementary material), which would help future external validation of this equation.

4. I can see that LASSO logistic regression has been used to retain the most predictive features for hospitalisation risk, but no evidence is reported that the parsimonious model performed better than the full model.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: jacopo maria legramante

Reviewer #3: No

Reviewer #4: Yes: Muhammad Faisal

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Aug 11;15(8):e0237419. doi: 10.1371/journal.pone.0237419.r002

Author response to Decision Letter 0


23 Jun 2020

June 18, 2020

We thank the reviewers for their thorough and meticulous comments. Below are our point by point responses.

Best regards

Lara Jehi and Mike Kattan.

Reviewer #1: Thank you for allowing me the opportunity to review this paper. This paper has led to the development of an online risk calculator for covid-19, which is very important in the current environment. Although I feel the paper has merit, some changes need to be made to improve the paper.

Response: We thank Reviewer #1 for the kind comments. In addition to the point-by-point responses below, we want to make the reviewer aware of one additional major change in this revision, made to address concerns of Reviewers #2-4: we expanded our sample size to include 4,536 COVID-positive patients, and we divided those patients into a development cohort and a validation cohort.

Author list:

- Please provide the affiliation details for author Merlino (Chief Medical Officer is not an affiliation)

Response: Affiliation is now provided.

Introduction:

Line 75 --> Clarify that this is in the United States, as the paper will be read by international readers. Response: Clarified; this is in the United States.

Line 101/102 --> Please remove the number of included patients. This should be left for the results section. Response: We removed the number of included patients, as requested.

Lines 103-106 --> Belong in the methods sections, as you describe how the model was developed. Response: We removed lines 103-106 from the introduction and moved the information to methods, as recommended by this reviewer.

Methods:

Patient selection --> Please describe how these patients were recruited, and in which departments they were tested. Also add whether this included only patients who were admitted or also patients who had positive tests but were not admitted. Response: We significantly expanded the patient selection paragraph in this revision, as per the reviewer’s suggestion. We added the following clarifying section: “The study cohort thus included all COVID-positive patients, whether they were hospitalized or not. As testing demand increased, we adapted our organizational policies and protocols to reconcile demand with patient and caregiver safety. Prior to March 18, any primary care physician could order a COVID-19 test. After that date, testing resources were streamlined through a “COVID-19 Hotline” which followed recommendations from the Centers for Disease Control (recommending a focus on high-risk patients as defined by any of the following: age older than 60 years or less than 36 months; on immune therapy; having comorbidities of cancer, end-stage renal disease, diabetes, hypertension, coronary artery disease, heart failure with reduced ejection fraction, lung disease, HIV/AIDS, or solid organ transplant; contact with known COVID-19 patients; physician discretion was still allowed).”

Line 134 --> Some studies suggest that there is a difference in sensitivity and specificity between the nasopharyngeal and oropharyngeal swab. Did all patients have both swabs taken? If so, clarify that both specimens were collected in all patients. Response: We added the qualifier “in all patients”.

Line 170 --> Add a reference for the TRIPOD checklist. Response: We added the reference as per the reviewer’s request: Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement [published correction appears in Ann Intern Med. 2015 Apr 21;162(8):600]. Ann Intern Med. 2015;162(1):55‐63. doi:10.7326/M14-0697

Results:

General comment: Numbers below 10 should be written as words, everything else in numerals. Response: We apologize to the reviewer for this mis-step. We corrected the spelling of the numbers in this revision.

Lines 182 and 183: Is it known how many patients went home, fell ill again, but visited a different clinic to be admitted? Are you confident that all patients who fell ill came to the clinics included in your study? This is important to provide an accurate depiction. Response: We thank the reviewer for highlighting this important issue. It crystallized for us the need to provide more details on where the patients were admitted from, and their ultimate disposition.

All patients who test positive for COVID in our cohort are then followed in our home monitoring program (now described in lines 157-159 of Methods) and are called by our nursing staff daily for 14 days after their test result. Their clinical progression is therefore documented in detail in our medical records, and we feel confident about capturing their outcomes. We now added this sentence to the methods section: “Outcome capture was facilitated by a home monitoring program whereby patients who tested positive were called daily for 14 days after the test result to monitor their disease progression.”

In addition, we added a sensitivity analysis to clarify the different paths that patients took to get hospitalized. This was added to the methods section: “Sensitivity analyses: An outcome of “hospitalized versus not” allows us to predict the likelihood that the patient is actually getting admitted to the hospital. This decision, however, is influenced by multiple “non-medical” factors including bed availability, regulatory systems, and individual physician preferences. To test the applicability of our model towards a determination of whether a patient should have been admitted or not, we subdivided patients included in our development and validation cohorts into 4 categories: A- hospitalized and not sent home within 24 hours; B- sent home (not initially hospitalized) but ultimately hospitalized within 1 week of being sent home; C- not hospitalized at all; D- hospitalized but sent home within 24 hours. In this construct, categories A and C represent patients who were “correctly managed”, and categories B and D represent those who were “incorrectly managed”. We then tested the discrimination of our model in each one of those categories separately.”

This was added to the results section: “Sensitivity analysis: Appropriately managed patients represented the majority of the cohort: 750 patients were hospitalized with a length of stay that exceeded 24 hours (431 in DC and 319 in VC), and 3,549 patients were not hospitalized at all (2,258 in DC and 1,291 in VC). A minority of patients (237 patients, 5.4%) fell in the category of inappropriate initial management: 208 had been initially sent home from the emergency room but were then admitted within 1 week of the emergency room visit (151 in DC, 57 in VC), and 29 patients were hospitalized but then discharged within 24 hours (12 in DC, and 17 in VC). When tested in each one of those categories, the predictive model performed very well in the appropriately managed subgroup (area under the curve of 0.821), but its performance was inadequate in the 5.4% of patients who fell in the inappropriate initial management category.”

We hope that this additional level of details satisfies the reviewer’s questions.

Discussion

Line 216: Does this also hold true for current smokers with COPD in your dataset? Response: Yes. Each predictor that was included in the final model had independent predictive value regardless of the others. This means that smoking history was relevant, regardless of whether the patient had COPD or not.

Line 274: Should other hospitals also include this model then in their clinical workflow or should they wait for it to be validated? Please add some info on this in the limitation section. Response: We thank the reviewer for this important suggestion. We now added the following limitations section: “As with any other statistical model, other hospital systems may elect to validate this model internally for their specific patient populations as they contemplate options for integrating it in their workflow.”

Reviewer #2: The manuscript by Jehi L. et al. is focused on a crucial point in the decision making concerning patients affected by COVID 19. The ability to correctly stratify the risks could allow the physicians to perform a correct rule in/rule out or whenever patients need to be charged in the hospital to define the best care setting for each patient.

The manuscript is clearly written, data are well collected and the conclusions are well supported by novel and interesting results.

Response: We thank the reviewer for the kind and encouraging comments.

I think that the model proposed by the authors could be very useful. My concern regarding the model and the consequent risk calculator lies in its relative complexity, which for some care settings could be a problem. Response: We thank the reviewer for highlighting these important points. We now add a section called “How can this model be integrated in a clinical workflow?” to the discussion. We start it by highlighting the limitation due to complexity, but then go on to describe how to implement the calculator with an illustrative example.

“How can this model be integrated in a clinical workflow? Manually abstracting data and inputting it in an online calculator is cumbersome in a busy clinical practice. Interpreting the prediction without some frame of reference is complex. However, failing to see beyond these hurdles risks wasting opportunities to innovate and improve patient care. It is therefore imperative to develop a clear implementation strategy that aligns with the existing clinical needs and clinical operations of a health organization. One could start by identifying the clinical problems that would benefit from this prediction tool, and reference the information in Table 2 on sensitivity, specificity, positive predictive value, and negative predictive value at different prediction cutoffs to provide a framework for clinical application. An illustrative example now being explored from our own health system is the use of this calculator to tailor the intensity of home monitoring for COVID positive patients. Currently, every patient who tests positive for COVID is being called daily for 14 days to check on their symptoms and identify disease progression early enough for intervention. With only 20-30% of COVID positive patients progressing to the point of requiring hospitalization, the nurses can use our prediction tool to identify this high risk group and only call them daily, while reducing the intensity of follow-up with the rest.”

Another methodological concern lies in the decision to test for SARS-CoV-2 only in patients negative for influenza, due, according to the authors, to an anecdotal belief that coinfection is rare. However, to my knowledge, no scientific report has supported this belief. If the authors can support this concept with a reference, they should add it; otherwise, they should test all suspected patients regardless of the existence of an influenza A/B coinfection to avoid possible statistical bias. Response: We now add two references to support the low rates of coinfection:

Schuchat A; CDC COVID-19 Response Team. Public Health Response to the Initiation and Spread of Pandemic COVID-19 in the United States, February 24-April 21, 2020. MMWR Morb Mortal Wkly Rep. 2020;69(18):551‐556. Published 2020 May 8. doi:10.15585/mmwr.mm6918e2

Zwald ML, Lin W, Sondermeyer Cooksey GL, et al. Rapid Sentinel Surveillance for COVID-19 - Santa Clara County, California, March 2020. MMWR Morb Mortal Wkly Rep. 2020;69(14):419‐421. Published 2020 Apr 10. doi:10.15585/mmwr.mm6914e3

Also, the idea to insert "age" in the model could be problematic. That is not a lab value, and it speaks to the biases in the population cohort, meaning that in a children's hospital, or any ED with a younger population (perhaps a suburban ED), the scores would automatically reflect a lower risk for COVID simply as a function of a younger-aged population. The disease does not selectively choose its patients based on age, although it does take a more aggressive course in older patients (who then are more likely to come to the ED). The net effect would be that the false negatives would intuitively be highest in the younger population, which is not good. Maybe the authors should discuss this issue. Response: We thank the reviewer for raising this important question. It is indeed a delicate point: the risk for disease progression does increase with age, as shown in our data and by others, so this increased risk needs to be reflected in a risk calculator aimed at the general population; but, as the reviewer points out, the calculator would not then be specific to children and may underestimate aspects of disease severity that pertain to them. We now added this to the limitation section:

“Our model includes age as a predictor: this may mitigate our ability to identify risk factors for disease progression specific to the younger population, and may underestimate the risks in the younger population with less severe disease who are less likely to seek medical care.”

Concerning the data management, rigorous statistical methodologies were followed for the analysis, which are described in depth in the method section. However, a major drawback is the absence of external validation of the model, and since the patient sample is relatively large, the authors should think of splitting the cohort into a training and a validation subset. Response: We thank the reviewer for the suggestion, which we feel has greatly improved the quality of our manuscript. In this revision, as suggested, we now include a validation cohort (new Table 1). For the purposes of modeling performed in this paper, we divided the patients into a development cohort (COVID-positive test resulted before May 1, 2020) and a validation cohort (COVID-positive test resulted between May 1 and June 5, 2020). As shown in our results, the model performed extremely well in both the development and the validation cohorts: the area under the curve was 0.900 with a 95% confidence interval of (0.886, 0.914) in the development cohort, and 0.813 (0.786, 0.839) in the validation cohort. The IPA was similarly impressive at 42.6% (37.8%, 47.4%) in the development cohort and 25.6% (19.9%, 31.3%) in the validation cohort. We also show the calibration curves for both the development and validation cohorts. These additional findings increase our confidence in the stability and validity of our model’s performance over time.
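For readers implementing a similar temporal validation, the date-based split described in this response can be sketched as follows. This is a minimal Python illustration with hypothetical patient records (not the study data); only the May 1, 2020 cutoff is taken from the text.

```python
from datetime import date

# Hypothetical (patient_id, covid_test_result_date) records.
patients = [
    (1, date(2020, 3, 15)),
    (2, date(2020, 4, 20)),
    (3, date(2020, 5, 2)),
    (4, date(2020, 6, 1)),
]

CUTOFF = date(2020, 5, 1)

# Temporal split: tests resulted before May 1, 2020 form the
# development cohort; tests from May 1 onward form the validation cohort.
development = [p for p in patients if p[1] < CUTOFF]
validation = [p for p in patients if p[1] >= CUTOFF]

print(len(development), len(validation))  # → 2 2
```

Splitting on calendar time rather than at random preserves the forward-looking nature of the validation: the model is evaluated on patients it could not have "seen" during development.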

Moreover, the authors state in the Conclusion that their model is able “to individualize this risk assessment at the patient level”. The only risk really assessed by the model is hospitalization because, for example, no indication is given on the outcome of the patients. I think the authors should point out this issue in the Discussion section. Response: We thank the reviewer for pointing this out. We now change this claim to be more specific: “to individualize hospitalization risk assessment at the patient level”.

Reviewer #3: Dear Editor,

Thank you for giving me the opportunity to review the article: ”Characteristics, outcomes, and individualized prediction of hospitalization risk in 818 patients with COVID-19” by Jehi et al.

The article describes characteristics of patients testing positive for SARS-CoV-2 and tries to find predictors for hospital admission. The authors have used several advanced statistical approaches and handle them with great skill. Nevertheless, I have several major concerns that must be addressed:

1. The authors describe that the aim is to identify factors that could help in selecting patients that should be admitted. As the authors state: “The end result is patients being told to go home from the emergency room only to return much more ill and be admitted days later, or patients hospitalized for observation for several days without any significant clinical deterioration”. However, the outcome analyzed in the article, as I understand it, is those patients that de facto were admitted. I think the correct outcome would be those patients that should have been admitted as the authors themselves imply in the statement above. Patients that actually are admitted of course depends on the attending physician and as the authors discuss there can be both over- and under-admittance.

The fraction of patients that actually are admitted also heavily depends on the location and the regulatory system of the country (and even on available hospital beds at the moment). In some countries, all persons positive for SARS-CoV-2 are admitted, and in some countries only those that really need oxygen or 24/7 care are admitted. Therefore, having admittance as an outcome will only reflect the judgment of the attending physician at that location. This could partly be overcome by analyzing patients that were sent home after admittance without requiring oxygen within 1-3 days and with no deterioration, and also by accounting for patients that were not admitted but came back and were later admitted (in the same or any other hospital) within e.g. 7 days (or died at home). For both these patient categories the judgment to admit or not was “wrong”, and the patients should then be adjudicated to their “right” category and not their actual category, which was done in this study.

I would preferably like to see such a sensitivity analysis to really be able to discuss actual risk factors. If such an analysis is not made, known risk factors will likely bias the results, since the attending physician will be more likely to admit patients with known risk factors from the literature even if these are not true risk factors for outpatients.

There are some clues to this in the discussion, namely that some patients were sent home and then later returned, but it is unclear to what category in the LASSO regression these patients were adjudicated (their “right” category, i.e., to be admitted, or the “wrong” category, i.e., to be sent home, which was done in the first place). The same is true for the opposite. The authors describe that 50% of the admitted patients not requiring treatment in the ICU were sent home within 3 days. To what category were these patients adjudicated? Was it in the first place a correct medical judgment to admit them, since they most likely did not need medical care 24/7? Response: We appreciate the fascinating comments made here by the reviewer, and fully agree with every point made. We considered different approaches to address these comments and hope that what we did below is satisfactory:

1- We significantly expanded our sample size in this revision (up from 818 COVID positive patients to 4,536 COVID positive) allowing us to create both a development and validation cohort for our model (New table 1), and to get more granularity on patient trajectories.

2- In addition, we added a sensitivity analysis to clarify the different paths that patients took to get hospitalized. This was added to the methods section: “Sensitivity analyses: An outcome of “hospitalized versus not” allows us to predict the likelihood that the patient is actually getting admitted to the hospital. This decision, however, is influenced by multiple “non-medical” factors including bed availability, regulatory systems, and individual physician preferences. To test the applicability of our model towards a determination of whether a patient should have been admitted or not, we subdivided patients included in our development and validation cohorts into 4 categories: A- hospitalized and not sent home within 24 hours; B- sent home (not initially hospitalized) but ultimately hospitalized within 1 week of being sent home; C- not hospitalized at all; D- hospitalized but sent home within 24 hours. In this construct, categories A and C represent patients who were “correctly managed”, and categories B and D represent those who were “incorrectly managed”. We then tested the discrimination of our model in each one of those categories separately.”

3- This was added to the results section: “Sensitivity analysis: Appropriately managed patients represented the majority of the cohort: 750 patients were hospitalized with a length of stay that exceeded 24 hours (431 in DC and 319 in VC), and 3,549 patients were not hospitalized at all (2,258 in DC and 1,291 in VC). A minority of patients (237 patients, 5.4%) fell in the category of inappropriate initial management: 208 had been initially sent home from the emergency room but were then admitted within 1 week of the emergency room visit (151 in DC, 57 in VC), and 29 patients were hospitalized but then discharged within 24 hours (12 in DC, and 17 in VC). When tested in each one of those categories, the predictive model performed very well in the appropriately managed subgroup (area under the curve of 0.821), but its performance was inadequate in the 5.4% of patients who fell in the inappropriate initial management category.”

4- To highlight the challenges of accurate outcome prediction in the subgroup with delayed or unnecessary admission, we added the following to the limitations section: “Lastly, although our model performs very well in the majority of COVID-positive patients, more research is needed to optimize it for the subgroup (5.4% of the total cohort in our series) with either delayed or unnecessary admission.”

We hope that this additional level of detail satisfies the reviewer’s questions.

2. To me, as a non-American, it is unclear if the cohort covers the whole population in the catchment area, both for patients with and without any insurance, and if there might have been differences in the likelihood of being sampled for SARS-CoV-2 or admitted based on this. Response: The reviewer raises a very important question. In contrast to countries with nationalized or centralized healthcare, no health system in the United States covers the whole population in a catchment area. Our healthcare system (a nonprofit organization) nonetheless captures a sizable portion of this population: in Cleveland, Ohio, and Weston, Florida, our health system covers about 40-50% of the population; our patient demographics mirror those of the city population, and lack of health insurance does not prevent hospitalization. We therefore feel that our findings indeed reflect the biological drivers of disease progression requiring hospitalization.

3. If a patient were sent home and then deteriorated and came back or died at home or were seeking medical care again would such a patient always be recognized in the cohort or could that patient seek another caregiver and not be accounted for? Are all deaths (no matter where) within 30 days from inclusion among the 818 patients included accounted for? Response: All patients who test positive for COVID in our cohort are then followed in our home monitoring program (lines 157-159 of Methods) and are called by our nursing staff daily for 14 days after their test result. Their clinical progression is therefore documented in detail in our medical records and we feel confident about capturing their outcomes. We now added the sentence to the methods section:

“Outcome capture was facilitated by a home monitoring program whereby patients who tested positive were called daily for 14 days after the test result to monitor their disease progression.”

4. The cohort is based on those that were sampled for SARS-CoV-2. What were the sampling criteria? In some countries persons are sampled as part of screening programs, whereas in other countries only patients that are admitted to the hospital are sampled and no patients that are planned to go home from the ED or seek the GP are sampled. This will of course bias the result in different ways and needs to be discussed and clarified. Response: We appreciate the importance of this request. We added now in the methods section the paragraph below to clarify the criteria for testing in our institution:

“The study cohort thus included all COVID-positive patients, whether they were hospitalized or not. As testing demand increased, we adapted our organizational policies and protocols to reconcile demand with patient and caregiver safety. Prior to March 18, any primary care physician could order a COVID-19 test. After that date, testing resources were streamlined through a “COVID-19 Hotline” which followed recommendations from the Centers for Disease Control (recommending a focus on high-risk patients as defined by any of the following: age older than 60 years or less than 36 months; on immune therapy; having comorbidities of cancer, end-stage renal disease, diabetes, hypertension, coronary artery disease, heart failure with reduced ejection fraction, lung disease, HIV/AIDS, or solid organ transplant; contact with known COVID-19 patients; physician discretion was still allowed).”

Minor comments

1. The article presents in the results section of the abstract that 11,686 patients were tested, and the abstract speaks about a large cohort. However, the data presented only represent (correctly) the data of the 818 patients positive for COVID-19. It could be made a little bit clearer that this is actually the case. The complications, such as death or ICU admission, are only analyzed among those 232 that were admitted, making the cohort even smaller. Response: As suggested, we now deleted the number of patients who were tested, and only mention those who were positive.

2. In the method section, references to R and the packages used are missing. Response: We added the reference as per the reviewer’s request: We used R, version 3.5.0 (R Project for Statistical Computing), with the tidyverse, mice, caret, and riskRegression packages for all analyses. Statistical tests were 2-sided and used a significance threshold of P < .05.

3. To me it is unclear how the multiple imputation was handled with the LASSO regression. Was imputation made on each imputed data set or were any algorithms like MIRL (Multiple Imputation Random Lasso) used? Response: We created 10 imputed datasets using the mice package in R. All covariates, including those without missingness, were used in the imputation process. We conducted the LASSO regression on the 10 imputed datasets and calculated the average concordance index and compared with other imputation methods.
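As a rough illustration of the impute-then-fit workflow described in this response (not the authors' actual R/mice code): the Python sketch below uses scikit-learn's IterativeImputer, a chained-equations imputer in the spirit of mice, to draw 10 completed datasets from synthetic data, fits an L1-penalized (LASSO-type) logistic regression to each, and averages the resulting apparent AUCs. All data and parameter choices here are hypothetical stand-ins.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the covariate matrix: 200 patients, 6 features,
# with ~10% of values missing completely at random.
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan

N_IMPUTATIONS = 10
aucs = []
for i in range(N_IMPUTATIONS):
    # Each pass yields one completed dataset; sampling from the posterior
    # with a different random state mimics drawing multiple imputations.
    imputer = IterativeImputer(sample_posterior=True, random_state=i)
    X_imp = imputer.fit_transform(X_missing)

    # L1-penalized (LASSO-type) logistic regression on the completed data.
    model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    model.fit(X_imp, y)
    aucs.append(roc_auc_score(y, model.predict_proba(X_imp)[:, 1]))

# Performance is summarized as the average over the imputed datasets.
mean_auc = float(np.mean(aucs))
print(round(mean_auc, 3))
```

Averaging the performance metric across imputations, as the authors describe doing with the concordance index, keeps the uncertainty from the missing data from being hidden behind a single "lucky" imputation.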

4. Some variables are log-transformed in table 1. How were these variables entered in the multiple imputation and in the LASSO regression? Response: We used the log-transformed values in the multiple imputation and in the LASSO regression.

5. Even if “modern” statistical measures such as IPA et cetera were used it would be nice to have the sensitivity, specificity and AUC for the score in the bootstrapped cohorts. Response: We thank the reviewer for this very useful suggestion. We now added table 2 with the sensitivity, specificity, PPV, and NPV at several thresholds, as requested.

6. Ideally, although I understand this is of course much more work, it would be nice to evaluate the score in a different cohort, preferably in another setting/country, to validate the findings. Even more so since the outcome was admittance, with all the problems outlined above. The authors try to address this by bootstrapping the cohorts, but this does not fully address these potentially biasing factors. Response: We thank the reviewer for the suggestion. In this revision, we now include a validation cohort (new Table 1). For the purposes of modeling performed in this paper, we divided the patients into a development cohort (COVID-positive test resulted before May 1, 2020) and a validation cohort (COVID-positive test resulted between May 1 and June 5, 2020). As shown in our results, the model performed extremely well in both the development and the validation cohorts: the area under the curve was 0.900 with a 95% confidence interval of (0.886, 0.914) in the development cohort, and 0.813 (0.786, 0.839) in the validation cohort. The IPA was similarly impressive at 42.6% (37.8%, 47.4%) in the development cohort and 25.6% (19.9%, 31.3%) in the validation cohort. We also show the calibration curves for both the development and validation cohorts. These additional findings increase our confidence in the stability and validity of our model’s performance over time. We acknowledge that this is not the same as doing a validation in a cohort from another country, but this is the best that we could do within the scope of a revised submission.

Again, thank you for letting me review this article.

Reviewer #4: Lara et al. aimed to characterise a large cohort of patients hospitalised with COVID-19, their outcomes, and develop a statistical model that allows individualised prediction of future hospitalisation risk for a patient newly diagnosed with COVID-19.

Following are my detailed comments:

1. The model temporal/external validation is a necessary step for generalisation of any predictive model. One can do the temporal validation on the remaining data from the same retrospective cohort. Response: We thank the reviewer for this important comment. We now include a validation cohort as detailed in our responses to prior reviewers and shown in a new Table 1 and a new Figure 4.

2. There is no guidance what cutoff might be used in practice. This will enhance the applicability of this equation in a clinical setting. It would be good to report sensitivity, specificity, PPV, and NPV at several thresholds to facilitate complex medical decision-making. Response: We thank the reviewer for this very useful suggestion. We now added table 2 with the sensitivity, specificity, PPV, and NPV at several thresholds, as requested.
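For reference, the quantities reported at each cutoff in the new Table 2 follow directly from the 2x2 confusion table of predicted risk versus observed hospitalization. A minimal sketch (toy numbers, not the study data):

```python
def threshold_metrics(y_true, y_prob, threshold):
    """Sensitivity, specificity, PPV, and NPV when predicting
    hospitalization for every patient with risk >= threshold."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = prob >= threshold
        if pred and truth:
            tp += 1          # correctly flagged as high risk
        elif pred and not truth:
            fp += 1          # flagged high risk, not hospitalized
        elif not pred and not truth:
            tn += 1          # correctly flagged as low risk
        else:
            fn += 1          # missed hospitalization
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }

# Toy example: 4 patients with predicted risks and observed outcomes.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.7]
for cutoff in (0.25, 0.50, 0.75):
    print(cutoff, threshold_metrics(y_true, y_prob, cutoff))
```

Reporting these four metrics at several thresholds, as the authors did, lets a clinical team pick the cutoff whose trade-off (e.g. high NPV for safely reducing monitoring intensity) matches their use case.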

3. I appreciate that a nomogram and an online risk calculator are developed to provide individualised hospitalisation risk for a patient newly diagnosed with COVID-19. It would be good to report the risk equation (at least as supplementary material), as that will help future external validation of this equation. Response: We appreciate the interest but prefer not to publish the formula. The formula is intellectual property, and we would be open to collaborating with anyone who wants to validate it. We have repeatedly shared our models with academic institutions, but avoiding public posting of our formula allows us to control commercial access.

4. I can see LASSO logistic regression has been used to retain most predictive features for hospitalisation risk, but there is no evidence reported that the parsimonious model performed better than the full model. Response: In the validation data, the parsimonious model had a higher AUC and IPA (0.813, 25.6%) compared with the full model (0.802, 23.7%).

Attachment

Submitted filename: response to reviewers.docx

Decision Letter 1

Juan F Orueta

15 Jul 2020

PONE-D-20-11909R1

Characteristics, outcomes, and individualized prediction of hospitalization risk in 4,536 patients with COVID-19.

PLOS ONE

Dear Dr. Jehi,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet all PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript as soon as possible.

The authors have been responsive to the comments raised by the reviewers. However, a TRIPOD checklist form is lacking in the submission. In my opinion, this paper does not fulfill the whole set of TRIPOD recommendations. Please review the list of items, make any minor changes required in your manuscript, and complete the form.

Please submit your revised manuscript by Aug 29 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Juan F. Orueta, MD, PhD

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed my comments in a satisfactory manner. I have no further comments. Thank you

Reviewer #2: I would suggest only that the authors consider a recent paper which could further support the need to correctly stratify the risks in patients possibly affected by COVID-19 (see Formica V et al. Complete blood count might help to identify subjects with high probability of testing positive to SARS-CoV-2. Clinical Medicine 2020;20(4):1-6. doi:10.7861/clinmed.2020-0373)

Reviewer #4: The authors have adequately addressed my comments and I recommend the manuscript for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Jacopo M. Legramante

Reviewer #4: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Aug 11;15(8):e0237419. doi: 10.1371/journal.pone.0237419.r004

Author response to Decision Letter 1


16 Jul 2020

July 16, 2020

Response to reviewer:

PONE-D-20-11909R1

Characteristics, outcomes, and individualized prediction of hospitalization risk in 4,536 patients with COVID-19.

PLOS ONE

Dear Dr. Orueta,

Thank you for the favorable review of our revised manuscript. We submit here the requested minor revisions, addressing:

1- your request to add the TRIPOD checklist and make the corresponding minor revisions in the manuscript, and

2- the only suggestion from the reviewers (reviewer #2) to add a new reference (now reference #28).

We look forward to an expeditious final decision.

Best regards

Lara Jehi, MD, MHCDS and Mike Kattan, PhD

Attachment

Submitted filename: response to reviewers minor revision.docx

Decision Letter 2

Juan F Orueta

22 Jul 2020

PONE-D-20-11909R2

Characteristics, outcomes, and individualized prediction of hospitalization risk in 4,536 patients with COVID-19.

PLOS ONE

Dear Dr. Jehi,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but there are still two minor points that must be addressed before publishing the manuscript. We invite you to submit a revised version as soon as possible.

Firstly, the title of the manuscript does not follow the TRIPOD recommendations. Please amend it.

In addition, I have noticed some mistakes in Table 1 and the Results section. According to the title of Table 1, "the statistically significant variables (p value <0.05) are in bold." This was true in the first version of the manuscript but not in the subsequent ones. I also suspect there are several typos in the p-value columns, since there are large discrepancies for some variables (for example, diarrhea). Furthermore, the description on lines 257-8 ("The significant association of sputum production, shortness of breath and diarrhea with hospitalization ...") corresponds to the p-values in the first draft but not in the second one.

Please submit your revised manuscript by Sep 05 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Juan F. Orueta, MD, PhD

Academic Editor

PLOS ONE

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Aug 11;15(8):e0237419. doi: 10.1371/journal.pone.0237419.r006

Author response to Decision Letter 2


23 Jul 2020

July 23, 2020

Dear Dr. Orueta,

We are submitting here a revised manuscript based on your second editorial request for minor revisions.

1- We changed the title from "Characteristics, Outcomes, and individualized prediction of hospitalization risk in 4,536 patients with COVID-19" to "Development and validation of a model for individualized prediction of hospitalization risk in 4,536 patients with COVID-19". We hope this satisfies your concern about aligning the title with TRIPOD.

2- We bolded the significant p-values in Table 1. The percentages in the table are presented by row. We added a clarifying statement to that effect in the legend. We verified that the p-values are all correct. We deleted “sputum production” from line 258.

Thank you for your consideration; we appreciate your review of these minor edits.

Best

Lara Jehi and Mike Kattan.

Decision Letter 3

Juan F Orueta

28 Jul 2020

Development and validation of a model for individualized prediction of hospitalization risk in 4,536 patients with COVID-19.

PONE-D-20-11909R3

Dear Dr. Jehi,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Juan F. Orueta, MD, PhD

Academic Editor

PLOS ONE

Acceptance letter

Juan F Orueta

30 Jul 2020

PONE-D-20-11909R3

Development and validation of a model for individualized prediction of hospitalization risk in 4,536 patients with COVID-19.

Dear Dr. Jehi:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Juan F. Orueta

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Checklist

    (PDF)

    Attachment

    Submitted filename: response to reviewers.docx

    Attachment

    Submitted filename: response to reviewers minor revision.docx

    Data Availability Statement

    Data used for the generation of this risk prediction model include human research participant data that are sensitive and cannot be publicly shared due to legal and ethical restrictions imposed by the Cleveland Clinic regulatory bodies, including the Institutional Review Board and legal counsel. In particular, variables such as the patient's address, date of testing, dates of hospitalization, date of ICU admission, and date of mortality are HIPAA-protected health information and legally cannot be publicly shared. Since these variables were critical to the generation and performance of the model, a partial dataset (excluding them) would not be useful either, because it would not support efforts at academic advancement such as model validation or application. We will make our data sets available upon request, under appropriate data use agreements with the specific parties interested in academic collaboration. Requests for data access can be made to mascar@ccf.org.


    Articles from PLoS ONE are provided here courtesy of PLOS

    RESOURCES