PLoS One. 2020 Dec 9;15(12):e0242953. doi: 10.1371/journal.pone.0242953

Development and external validation of a prognostic tool for COVID-19 critical disease

Daniel S Chow 1,*, Justin Glavis-Bloom 1, Jennifer E Soun 1, Brent Weinberg 2, Theresa Berens Loveless 3, Xiaohui Xie 4, Simukayi Mutasa 5, Edwin Monuki 6, Jung In Park 7, Daniela Bota 8, Jie Wu 9, Leslie Thompson 9, Bernadette Boden-Albala 10, Saahir Khan 11,12, Alpesh N Amin 12, Peter D Chang 1,4
Editor: Itamar Ashkenazi
PMCID: PMC7725393  PMID: 33296357

Abstract

Background

The rapid spread of coronavirus disease 2019 (COVID-19) revealed significant constraints in critical care capacity. In anticipation of subsequent waves, reliable prediction of disease severity is essential for critical care capacity management and may enable earlier targeted interventions to improve patient outcomes. The purpose of this study is to develop and externally validate a prognostic model/clinical tool for predicting COVID-19 critical disease at presentation to medical care.

Methods

This is a retrospective study of a prognostic model for the prediction of COVID-19 critical disease where critical disease was defined as ICU admission, ventilation, and/or death. The derivation cohort was used to develop a multivariable logistic regression model. Covariates included patient comorbidities, presenting vital signs, and laboratory values. Model performance was assessed on the validation cohort by concordance statistics. The model was developed with consecutive patients with COVID-19 who presented to University of California Irvine Medical Center in Orange County, California. External validation was performed with a random sample of patients with COVID-19 at Emory Healthcare in Atlanta, Georgia.

Results

Of a total of 3208 patients tested in the derivation cohort, 9% (299/3208) were positive for COVID-19. Clinical data including past medical history and presenting laboratory values were available for 29% (87/299) of patients (median age, 48 years [range, 21–88 years]; 64% [56/87] male). The most common comorbidities included obesity (36%, 31/87), hypertension (37%, 32/87), and diabetes (24%, 24/87). Critical disease was present in 24% (21/87). After feature selection, the following factors were associated with the greatest increased risk of critical disease: number of comorbidities, body mass index, respiratory rate, white blood cell count, % lymphocytes, serum creatinine, lactate dehydrogenase, high sensitivity troponin I, ferritin, procalcitonin, and C-reactive protein. Of a total of 40 patients in the validation cohort (median age, 60 years [range, 27–88 years]; 55% [22/40] male), critical disease was present in 65% (26/40). Model discrimination in the validation cohort was high (concordance statistic: 0.94, 95% confidence interval 0.87–1.01). A web-based tool was developed to enable clinicians to input patient data and view the likelihood of critical disease.

Conclusions and relevance

We present a model that accurately predicted COVID-19 critical disease risk using comorbidities, presenting vital signs, and laboratory values, in derivation and validation cohorts from two different institutions. If further validated on additional cohorts of patients, this model/clinical tool may provide useful prognostication of critical care needs.

Introduction

The exponential spread of coronavirus disease 2019 (COVID-19) has revealed constraints in critical care capacity around the globe [1, 2]. While there are early indications that social distancing measures have resulted in decreased transmission (i.e., “flattening the curve”), there is concern that subsequent pandemic waves may occur. Accurate and rapid patient prognostication is essential for critical care utilization management. Early identification of patients likely to develop critical disease may facilitate prompt intervention and improve outcomes.

Early reports suggest severe disease and poor outcomes are associated with older age, male sex, and comorbidities including hypertension, diabetes, and coronary artery disease [3–6]. Recent case series from the United States and France have additionally reported that obesity is associated with hospitalization and worse COVID-19 disease [7–10]. Several attempts have been made to develop prognostic models for COVID-19 disease, largely based on early data from patient cohorts in China [11–18]. These models have used demographic features, including age, sex, and comorbidities, and a limited set of laboratory values, including lymphocyte count, lactate dehydrogenase (LDH), and C-reactive protein (CRP), which have been reported to be associated with more severe disease [19, 20]. These initial models are of variable quality, with a high likelihood of bias and limited numbers of variables, and their performance evaluation is limited by suboptimal reporting and limited validation [21].

This study describes the development and external validation of a multivariate regression model and associated clinical tool to predict risk of COVID-19 critical disease, presented utilizing TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) reporting guidelines [22].

Methods

Study design and population

After approval by the institutional review board of the University of California, Irvine Medical Center, a prognostic model was developed with data from a single-center retrospective observational cohort study of sequential patients with COVID-19 disease diagnosed by nucleic acid detection from nasopharyngeal or throat swabs at the University of California, Irvine Medical Center (UCI Health) from March 1, 2020 to April 30, 2020 (derivation cohort). UCI Health is a 411-bed academic medical center located in Orange County, California, which performed outpatient, emergency department, and inpatient COVID-19 testing throughout the study period.

The model was validated with a separate retrospective observational cohort of patients with COVID-19 disease at Emory Healthcare (validation cohort). Emory Healthcare is a multi-hospital 1500-bed academic system located in Atlanta, Georgia which performed outpatient, emergency department, and inpatient COVID-19 testing throughout the study period. Patients in the validation cohort were randomly selected from a radiology database of patients who underwent imaging with a clinical concern for COVID-19 disease from March 12, 2020 to April 7, 2020 and were diagnosed with COVID-19 by nucleic acid detection from nasopharyngeal swabs.

Data was obtained by manual chart review of the electronic health record. Clinical and laboratory values were obtained from the earliest documented result at the time of presentation. If a specific laboratory value was not initially available, the value obtained closest in time after presentation was used. If no value was obtained for a patient during the admission, it was marked as “missing”. Data collection and validation were performed in accordance with the Institutional Review Board at each institution. Only de-identified data was transmitted between institutions. This study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines [22].
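As a concrete illustration of this extraction rule, the sketch below selects the earliest documented result at or after presentation for each patient and laboratory test. It is a minimal Python/pandas sketch; the DataFrame layout and column names (patient_id, lab_name, value, result_time) are assumptions, not the study's actual data schema.

```python
# Minimal sketch of the "earliest available result" rule described above.
# Assumes a long-format DataFrame of lab results drawn at or after presentation;
# column names are illustrative.
import pandas as pd

def earliest_results(labs: pd.DataFrame) -> pd.DataFrame:
    """Keep, for each patient and lab test, the earliest result.

    Expected columns: patient_id, lab_name, value, result_time.
    """
    return (labs.sort_values("result_time")
                .groupby(["patient_id", "lab_name"], sort=False)
                .head(1))
```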

Outcome

The primary outcome was the likelihood of critical disease, defined as meeting the criteria of ICU admission, ventilation, and/or death. The initial index date for each patient was the date of COVID-19 diagnosis. All patients had follow-up of outcomes for a minimum of 10 days.

Statistical analysis

Developing the prediction model

We searched the literature for predictors of COVID-19 disease severity and identified the following candidate predictors: demographic characteristics (age, sex), presenting vital signs (temperature, heart rate, respiratory rate, systolic blood pressure, diastolic blood pressure, body mass index), past medical history (hypertension, diabetes, cardiovascular disease, coronary artery disease, asthma, chronic kidney disease, metabolic syndrome [as defined by consensus criteria] [23], and total number of these comorbidities), and presenting laboratory values (white blood cell count, lymphocyte percentage, serum creatinine, aspartate aminotransferase [AST], lactate dehydrogenase [LDH], C-reactive protein [CRP], procalcitonin, ferritin, troponin I, d-dimer, triglycerides, and high density lipoprotein [HDL]).

Variables were included in the model using a recursive feature selection technique: univariate statistical testing was applied to the cohort to identify the variables with the greatest differences in distribution when stratified by outcome. Starting with the variable with the largest difference, additional variables were added to the model one by one, in order of significance based on univariate testing, until model performance plateaued. In this study, the optimal number of variables based on this technique was 13. Model performance did not degrade with additional variables (overall performance instead reached a plateau); thus, the total number of variables used in this study represents the minimum number needed to approximate the performance of models using an arbitrarily large number of covariates.
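The sketch below illustrates this kind of forward selection driven by univariate testing, using scikit-learn's f_classif scores and cross-validated AUC as the performance measure. It is a simplified approximation under assumed names (X, y) and an assumed plateau tolerance, not the authors' code.

```python
# Illustrative forward feature selection: rank variables by univariate separation,
# then add them one by one until cross-validated AUC stops improving (plateau).
# X is a NumPy array of normalized covariates, y the critical-disease labels; both assumed.
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, tol=0.005):
    order = np.argsort(f_classif(X, y)[0])[::-1]   # variables ranked by univariate F-score
    chosen, best_auc = [], 0.0
    for idx in order:
        trial = chosen + [int(idx)]
        clf = LogisticRegression(penalty="l2", solver="lbfgs", max_iter=1000)
        auc = cross_val_score(clf, X[:, trial], y, cv=5, scoring="roc_auc").mean()
        if auc > best_auc + tol:                   # keep the variable while performance improves
            chosen, best_auc = trial, auc
        else:
            break                                  # performance has plateaued
    return chosen, best_auc
```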

Two-sided t-tests were used for continuous variables and Pearson’s chi-squared test (χ2) was used for categorical variables to assess for differences in each candidate predictor based on critical disease status. Based on these results and the prevalence of each candidate predictor, the top thirteen covariates were chosen and used to create a multivariable logistic regression model. For missing data, median imputation was performed based on underlying critical disease status. Each covariate was independently normalized to a scale of [0, 1] based on the minimum and maximum values present in the dataset. Normalization of the data to similar scales facilitates numeric stability during the algorithm training process and ensures that all variables are initialized with relatively equal contributions to the prediction. Patients missing more than 50% of data (i.e., having 6 or fewer variables present) were excluded from analysis.
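A minimal sketch of these preprocessing steps (exclusion of patients missing more than half the variables, outcome-stratified median imputation, and [0, 1] min-max normalization) is shown below; the DataFrame and column names are assumptions rather than the authors' implementation.

```python
# Sketch of the training-cohort preprocessing described above.
# df holds one row per patient; feature_cols and the outcome column name are assumed.
import pandas as pd

def preprocess_training(df: pd.DataFrame, feature_cols, outcome_col="critical"):
    # Exclude patients missing more than 50% of variables (6 or fewer of 13 present)
    df = df[df[feature_cols].notna().sum(axis=1) > len(feature_cols) // 2].copy()

    # Median imputation stratified by critical-disease status
    for col in feature_cols:
        df[col] = df.groupby(outcome_col)[col].transform(lambda s: s.fillna(s.median()))

    # Normalize each covariate to [0, 1] using the training minimum and maximum
    mins, maxs = df[feature_cols].min(), df[feature_cols].max()
    df[feature_cols] = (df[feature_cols] - mins) / (maxs - mins)
    return df, mins, maxs
```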

The model was implemented using L2 regularization and optimized using the limited memory Broyden–Fletcher–Goldfarb–Shanno (BFGS) technique. Finally, a Wald chi-squared test was used to evaluate the contribution from each variable.
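For reference, the snippet below shows the corresponding scikit-learn estimator (L2 penalty, lbfgs solver) fitted on placeholder arrays; the data and regularization strength are stand-ins rather than study values, and the Wald chi-squared evaluation of the coefficients is not part of this sketch.

```python
# L2-regularized logistic regression optimized with L-BFGS, as described above.
# The arrays here are random placeholders standing in for the normalized covariates.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.random((87, 13))             # placeholder: 87 patients x 13 normalized covariates
y_train = rng.integers(0, 2, size=87)      # placeholder: critical-disease labels (0/1)

model = LogisticRegression(penalty="l2", solver="lbfgs", max_iter=1000)
model.fit(X_train, y_train)
risk = model.predict_proba(X_train)[:, 1]  # predicted likelihood of critical disease
```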

Validating the prediction model

The predictive accuracy of the model was determined retrospectively in the external validation cohort with discrimination and calibration. For any given patient, missing data was imputed using population-derived median values from the training cohort. Additionally, all model inputs were clipped to the minimum and maximum values present in the training cohort. Model discrimination (i.e., the degree to which the model differentiates between individuals with critical and non-critical outcomes) was calculated with the C statistic. All analyses were conducted using the Python scikit-learn library (0.22.2) [24] and IBM SPSS Statistics Subscription, version 1.0.0.1012 (IBM Corp., Armonk, N.Y., USA).
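The validation-time handling described above could look like the sketch below, assuming the training-cohort medians, minima, and maxima were retained from model development; the function and variable names are illustrative.

```python
# Sketch of external-validation preprocessing and discrimination: impute with training
# medians, clip to the training range, re-apply the training normalization, and compute
# the C statistic as the area under the ROC curve. All names are assumptions.
from sklearn.metrics import roc_auc_score

def validate(model, val_df, feature_cols, train_medians, train_mins, train_maxs,
             outcome_col="critical"):
    X = val_df[feature_cols].fillna(train_medians)              # impute missing values
    X = X.clip(lower=train_mins, upper=train_maxs, axis=1)      # clip to training min/max
    X = (X - train_mins) / (train_maxs - train_mins)            # training-based normalization
    c_statistic = roc_auc_score(val_df[outcome_col], model.predict_proba(X)[:, 1])
    return c_statistic
```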

Developing a clinical tool

A web-based application was created in Python using a Flask server (1.1.1) to facilitate clinical implementation of the trained model.
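A minimal sketch of such a Flask endpoint is shown below; the route, field names, stored model file, and the placeholder handling of missing inputs are illustrative assumptions and do not describe the deployed tool.

```python
# Minimal Flask server wrapping a trained model. The pickle file, feature names, and
# fallback value for missing inputs are illustrative assumptions.
import pickle
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
with open("model.pkl", "rb") as f:        # previously fitted scikit-learn model (assumed)
    model = pickle.load(f)

FEATURES = ["age", "gender", "comorbidities", "bmi", "resp_rate", "wbc", "lymph_pct",
            "creatinine", "ldh", "troponin", "ferritin", "procalcitonin", "crp"]

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    # Placeholder for median imputation and normalization of missing/out-of-range inputs
    x = np.array([[float(payload.get(name, 0.5)) for name in FEATURES]])
    return jsonify({"critical_disease_probability": float(model.predict_proba(x)[0, 1])})

if __name__ == "__main__":
    app.run()
```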

Results

For the derivation cohort, a total of 3,208 COVID-19 tests were conducted over the study period, of which 9.3% (299/3208) were positive. Clinical data including past medical history and presenting laboratory values were available for 29.1% (87/299) of patients (median age, 48 years [range, 21–88 years]; 64.4% [56/87] male). Demographic details are provided in Table 1, and the cohort flow diagram is shown in Fig 1. The most common comorbidities included obesity (35.6%, 31/87), hypertension (36.8%, 32/87), and diabetes (24%, 24/87). Critical disease was present in 24.1% (21/87).

Table 1. Patient data.

Variable Critical (n = 21) Non-Critical (n = 66)
Demographics
 Age (yr) 55 46.5
 Male (count, %) 15 (71.4) 42 (62.1)
Presenting Vital Signs
 Respiratory Rate 22 18
 Body mass index (kg/m2) 33.2 27.5
Comorbidities
 Total Number 2.2 1.0
Laboratory Values
 White blood cell count (1000/mcl) 8.8 6.1
 % Lymphocytes 15.0 22.2
 Serum Creatinine (mg/dL) 1.6 0.9
 Lactate Dehydrogenase (U/L) 513.8 248.4
 Troponin I (ng/L) 43.7 8.7
 Ferritin (ng/mL) 1066.8 372.7
 Procalcitonin (ng/mL) 1.2 0.2
 C-reactive protein (mg/dL) 13.1 8.1

All statistics represent the mean value unless otherwise stated. See Table 2 for F-statistic and details regarding modeling weights.

Fig 1. Flow diagram of the derivation cohort.

For the derivation cohort, a total of 3,208 COVID-19 tests were conducted over the study period, of which 9.3% (299/3208) were positive. Of positive patients, laboratory data was available for 29.1% (87/299) patients.

Of a total of 40 patients in the validation cohort (median age, 60 years [range, 27–88 years]; 55% [22/40] male), critical disease was present in 65% (26/40). The most common comorbidities included obesity (53%, 21/40), hypertension (60%, 24/40), and diabetes (40%, 16/40). Compared with the derivation cohort, the validation cohort had a higher prevalence of comorbidities.

After feature selection, the following factors, associated with the greatest increased risk of critical disease, were used in model training: age, gender, total number of comorbidities (cardiovascular disease, coronary artery disease, chronic kidney disease, asthma/chronic obstructive pulmonary disease, diabetes mellitus, hypertension, and obesity), BMI, respiratory rate, white blood cell count, lymphocyte percentage, creatinine, lactate dehydrogenase (LDH), troponin I, ferritin, procalcitonin, and C-reactive protein (CRP) (Table 2).

Table 2. Predictive model for COVID-19 critical disease.

Variable Coefficient Standard Error Wald Statistic F-statistic
Demographics
 Age 0.07 1.98 0.001 1.08
 Gender 0.51 1.13 0.204 0.59
Presenting Vital Signs
 Respiratory Rate 0.80 2.80 0.081 27.74
 Body mass index (BMI) 1.07 2.49 0.185 13.91
Comorbidities
 Total Number 1.14 1.62 0.491 16.80
Laboratory Values
 White blood cell count (WBC) 0.14 2.68 0.003 7.87
 % Lymphocytes -0.38 2.74 0.019 7.48
 Serum Creatinine 0.24 4.75 0.002 6.89
 Lactate Dehydrogenase (LDH) 1.72 2.12 0.658 23.13
 Troponin I 0.60 3.38 0.032 7.26
 Ferritin 0.55 3.17 0.030 12.38
 Procalcitonin 0.67 4.38 0.023 7.59
 C-reactive protein (CRP) 1.48 2.00 0.548 6.89

Model discrimination in the derivation cohort was high (concordance statistic: 0.948, 95% confidence interval 0.900–0.997); with the best logistic regression score cut point at 30%, sensitivity was 90.4%, specificity was 89.4%, positive predictive value was 73.0%, and negative predictive value was 96.7%.

Model discrimination in the validation cohort was also high (concordance statistic: 0.940, 95% confidence interval 0.870–1.009); with the same 30% logistic regression cut point, sensitivity was 100%, specificity was 71.4%, positive predictive value was 86.7%, and negative predictive value was 100% (Fig 2). Procalcitonin was unavailable for all patients in the validation cohort. During training, only 30/87 patients had all thirteen lab values present; the remaining patients had at least one missing variable, and the distribution of available variables is provided in Table 3. In our cohort, no patient requiring ICU admission had fewer than 6 lab values. The average number of missing variables was 1.33 (range 1–3) for cases that were correctly predicted as critical or non-critical and 2.11 (range 1–5) for cases that were incorrectly predicted.
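For clarity, the operating characteristics quoted above follow directly from the predicted probabilities and the 30% threshold; the sketch below shows the computation on assumed arrays (y_true, y_prob), not the study data.

```python
# Sensitivity, specificity, PPV, and NPV at a probability threshold (default 30%).
import numpy as np

def operating_characteristics(y_true, y_prob, threshold=0.30):
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp), "npv": tn / (tn + fn)}
```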

Fig 2. Receiver operating characteristic curves.

Model discrimination was high for the derivation cohort (A; concordance statistic 0.948, 95% confidence interval 0.900–0.997) and the validation cohort (B; concordance statistic 0.940, 95% confidence interval 0.870–1.009).

Table 3. Variable distribution for patients.

Number of Variables Patient Count
7 2
8 37
9 2
10 5
11 5
12 6
13 30

A web-based tool was developed to enable clinicians to input patient data and view model output (Fig 3). The page accepts user input, outputs a likelihood of critical disease, and does not require all variables to be present.

Fig 3. Web-based clinical tool for COVID-19 critical disease prediction.

Discussion

In this study, we developed and externally validated a predictive model and clinical tool that can be used to prognosticate the likelihood of COVID-19 critical disease based on data available early in a patient’s presentation.

By using derivation and validation cohorts from separate institutions with different underlying patient characteristics, in particular a higher prevalence of comorbidities in the validation cohort, we achieved high calibration and discrimination. This model has the potential to be utilized by front-line healthcare providers to predict critical care demand and provide early indications of likelihood a patient’s condition may worsen. As therapeutic interventions become validated, this may enable early intervention in at-risk patients to improve outcomes. In particular, antiviral therapies may have increased efficacy if administered earlier in the disease course.

Compared with other earlier models, which were primarily single institution-based, were developed from patient cohorts in China, utilized only a few variables, and did not include subsequently identified risk factors such as number of comorbidities and obesity [7–10], this model may have greater relevance and predictive strength in cohorts of Western patients in whom obesity is more common. In particular, the inclusion of nearly 30 candidate variables in model derivation ensures sufficient consideration of numerous previously identified prognostic correlates.

Interestingly, variables which have previously been reported to be associated with worse COVID-19 disease, most notably including older age and hypertension, were less predictive in our sample than body mass index, total number of comorbidities and several laboratory values. The tool performed well in the validation set even though there was a higher rate of missing data for some values such as procalcitonin and ferritin, which were not frequently performed at the validation institution. In settings in which laboratory data is easily and rapidly acquired, this study suggests there may be value to establishing a panel of COVID-19-specific laboratory studies including lactate dehydrogenase, troponin I, ferritin, procalcitonin, and C-reactive protein (in addition to commonly acquired complete blood count and serum chemistries).

Front-line medical providers have been inundated with critically ill COVID-19 patients. A simple web-based tool utilized at patient presentation may facilitate decision making by simplifying integration of numerous clinical variables. Our model has a high negative predictive value, which can increase physician confidence in determining which patients may be discharged safely at presentation. This is of particular utility in settings of high healthcare utilization, especially when physicians are treating higher than expected numbers of patients and/or working outside of their standard practice. Our model has high positive predictive value, highlighting those patients for whom admission and close clinical monitoring may be appropriate.

The chosen cutoff point of 30%, based on the derivation cohort, performs with 100% sensitivity and 71.4% specificity in the validation cohort, which included more critical patients. In most circumstances, identifying all cases of critical disease is preferred even if some less critical patients are identified, but in certain situations, such as a surge in which critical care resources are limited, the cutoff point could be adjusted to a desired balance of sensitivity and specificity.

Limitations

This study has limitations. A small sample of patient data was reviewed retrospectively from two centers. As data was obtained retrospectively, there was no control over which laboratory data was collected, which varied with institutional practice patterns. However, the model performed well in a validation data set with incomplete laboratory values. Further testing on larger cohorts of patient data is needed. Conclusions may not be globally generalizable to different patient cohorts. Lastly, not all patients had complete data available. While imputation is an imperfect approximation of true lab values, the high performance on an external data set with missing data suggests that the approach is reasonable.

Conclusions

We present a predictive model and clinical tool which can be used to prognosticate the likelihood of COVID-19 critical disease based on data at patient presentation. Further testing is needed on larger patient cohorts to establish generalizability. In subsequent analyses, we intend to evaluate whether this model can be applied to daily trends of clinical data in admitted patients to predict patient disposition.

Data Availability

Data are available from the UCI Institutional Data Access / Ethics Committee (contact via Joy Chu at joy.chu@uci.edu) for researchers who meet the criteria for access to confidential data due to ethical restrictions involving potentially identifying information.

Funding Statement

This study was funded by an internal award at the University of California, Irvine through the COVID-19 Basic, Translational, and Clinical Research Funding Opportunity. None of the authors received salary support from this award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Xie J, Tong Z, Guan X, Du B, Qiu H, Slutsky AS. Critical care crisis and some recommendations during the COVID-19 epidemic in China. Intensive Care Med. 2020:6–9. doi: 10.1007/s00134-020-05979-7
  • 2. Grasselli G, Pesenti A, Cecconi M. Critical Care Utilization for the COVID-19 Outbreak in Lombardy, Italy. JAMA. 2020;19:1–2. doi: 10.1001/jama.2020.4031
  • 3. Wu Z, McGoogan JM. Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China. JAMA. 2020;323(13):1239. doi: 10.1001/jama.2020.2648
  • 4. Arentz M, Yim E, Klaff L, et al. Characteristics and Outcomes of 21 Critically Ill Patients With COVID-19 in Washington State. JAMA. 2020;4720:2019–2021. doi: 10.1001/jama.2020.4326
  • 5. Onder G, Rezza G, Brusaferro S. Case-Fatality Rate and Characteristics of Patients Dying in Relation to COVID-19 in Italy. JAMA. 2020;2019:2019–2020.
  • 6. Yang J, Zheng Y, Gou X, et al. Prevalence of comorbidities in the novel Wuhan coronavirus (COVID-19) infection: a systematic review and meta-analysis. Int J Infect Dis. 2020;94:91–95. doi: 10.1016/j.ijid.2020.03.017
  • 7. Richardson S, Hirsch JS, Narasimhan M, et al. Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area. JAMA. 2020;10022:1–8. doi: 10.1001/jama.2020.6775
  • 8. Petrilli CM, Jones SA, Yang J, et al. Factors associated with hospitalization and critical illness among 4,103 patients with COVID-19 disease in New York City. medRxiv. 2020:2020.04.08.20057794. doi: 10.1101/2020.04.08.20057794
  • 9. Lighter J, Phillips M, Hochman S, et al. Obesity in patients younger than 60 years is a risk factor for Covid-19 hospital admission. Clin Infect Dis. April 2020. doi: 10.1093/cid/ciaa415
  • 10. Simonnet A, Chetboun M, Poissy J, et al. High prevalence of obesity in severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) requiring invasive mechanical ventilation. Obesity. 2020. doi: 10.1002/oby.22831
  • 11. Bai X, Fang C, Zhou Y, et al. Predicting COVID-19 Malignant Progression with AI Techniques. SSRN Electron J. 2020. doi: 10.2139/ssrn.3557984
  • 12. Caramelo F, Ferreira N, Oliveiros B. Estimation of risk factors for COVID-19 mortality—preliminary results. medRxiv. 2020:2020.02.24.20027268. doi: 10.1101/2020.02.24.20027268
  • 13. Gong J, Ou J, Qiu X, et al. A Tool to Early Predict Severe 2019-Novel Coronavirus Pneumonia (COVID-19): A Multicenter Study using the Risk Nomogram in Wuhan and Guangdong, China. medRxiv. 2020:2020.03.17.20037515. doi: 10.1101/2020.03.17.20037515
  • 14. Lu J, Hu S, Fan R, et al. ACP Risk Grade: A Simple Mortality Index for Patients with Confirmed or Suspected Severe Acute Respiratory Syndrome Coronavirus 2 Disease (COVID-19) During the Early Stage of Outbreak in Wuhan, China. SSRN Electron J. 2020. doi: 10.1101/2020.02.20.20025510
  • 15. Qi X, Jiang Z, Yu Q, et al. Machine learning-based CT radiomics model for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection: A multicenter study. medRxiv. 2020:2020.02.29.20029603. doi: 10.21037/atm-20-3026
  • 16. Shi Y, Yu X, Zhao H, Wang H, Zhao R, Sheng J. Host susceptibility to severe COVID-19 and establishment of a host risk score: Findings of 487 cases outside Wuhan. Crit Care. 2020;24(1):2–5.
  • 17. Xie J, Hungerford D, Chen H, et al. Development and external validation of a prognostic multivariable model on admission for hospitalized patients with COVID-19. medRxiv. 2020:2020.03.28.20045997. doi: 10.1101/2020.03.28.20045997
  • 18. Yan L, Zhang H-T, Xiao Y, et al. Prediction of criticality in patients with severe Covid-19 infection using three clinical features: a machine learning-based prognostic model with clinical data in Wuhan. medRxiv. 2020:2020.02.27.20028027.
  • 19. Zhou F, Yu T, Du R, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;6736(20):1–9. doi: 10.1016/S0140-6736(20)30566-3
  • 20. Lippi G, Plebani M. Laboratory abnormalities in patients with COVID-2019 infection. Clin Chem Lab Med. March 2020. doi: 10.1515/cclm-2020-0198
  • 21. Wynants L, Van Calster B, Bonten MMJ, et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ. 2020;369:m1328. doi: 10.1136/bmj.m1328
  • 22. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. Ann Intern Med. 2015;162(1):55–63. doi: 10.7326/M14-0697
  • 23. Alberti KGMM, Eckel RH, Grundy SM, et al. Harmonizing the metabolic syndrome: A joint interim statement of the International Diabetes Federation Task Force on Epidemiology and Prevention; National Heart, Lung, and Blood Institute; American Heart Association; World Heart Federation; International Atherosclerosis Society; and International Association for the Study of Obesity. Circulation. 2009;120(16):1640–1645. doi: 10.1161/CIRCULATIONAHA.109.192644
  • 24. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12:2825–2830.

Decision Letter 0

Itamar Ashkenazi

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

30 Jul 2020

PONE-D-20-14464

Development and External Validation of a Prognostic Tool for COVID-19 Critical Disease

PLOS ONE

Dear Dr. Chow,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Two reviewers commented on your study. Their comments were not supportive of accepting your study for publication. Their specific comments are attached below and should be addressed if a revision is submitted. I want to add one more comment that should be addressed. This study was performed 4 months ago. If your tool has been used prospectively since then, this should be commented on, providing the observed reliability of the tool in identifying patients who will develop critical disease. If this request exceeds what was authorized by the institution's research ethics committee, please use this letter when approaching the IRB to authorize a change in the protocol.

Please submit your revised manuscript by Sep 13 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Itamar Ashkenazi

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following financial disclosure:

 [The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.].

At this time, please address the following queries:

  1. Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.

  2. State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

  3. If any authors received a salary from any of your funders, please state which authors and which funders.

  4. If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

3. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The concept of developing a risk score for which COVID-19 patients will require critical care is laudable. Prediction of which patients will come to need such care is very important for planning and allocation of resources.

The paper is written in clear and standard English.

The methodology is sound; however, this reviewer would be grateful if the authors could address the following:

The very limited sample size - 87 in the derivation cohort, of which 21 were critical, and 40 in the validation cohort, of which 26 were critical.

Why the derivation and validation cohorts were drawn from different institutions, particularly where the laboratory sampling in the validation group did not include important elements of the derivation group data.

What effect the retrospective data collection might have had on study results (this is briefly alluded to in the "limitations" section).

Reviewer #2: The authors made a prediction model for critical disease (defined as ICU admission, MV or death) in 87 COVID-19 patients and found a high C-statistic of 0.95 and 0.94 in the validation cohort and concluded that the model performed well.

I have some questions.

Effective sample size is 21 (out of 87); is it allowed to put 13 variables in the model (the full model consisted of even more variables), while one variable (total number of comorbidities) consists of another 8 variables?

If it is allowed, for clinical practice it is not useful. A practical clinical prediction model consists of 3-4 variables that are readily available at the bedside. Of course accuracy increases with more variables, but consider reducing the number of variables.

Out of 299 patients, 212 were excluded due to missing variables. The fact that data are missing in these patients may not be random. Furthermore, even in the 87 eligible patients data were not complete (the mean number of missing values was 1.33 [range 1–3] per patient); therefore imputation was performed. What were the results when missing values were treated as missing?

What is the necessity and effect of normalizing the data?

What were the criteria for including a variable in the model?

I miss a Table 1 with all baseline characteristics (and difference between yes/no critical disease) and perhaps a table 2 with univariate logistic regression analyses of these parameters.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: PV van Heerden

Reviewer #2: Yes: Walter m. van den Bergh

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Dec 9;15(12):e0242953. doi: 10.1371/journal.pone.0242953.r002

Author response to Decision Letter 0


18 Sep 2020

Ref: PONE-D-20-14464

Dr. Ashkenazi and PLOS ONE Reviewers,

Thank you for your review of our manuscript. We sincerely appreciate the comments provided. We have made edits to our manuscript based on these comments, addressing all of the issues that have been raised. In addition, we have responded directly to each of the reviewer remarks below. Please do not hesitate to contact us with additional comments or suggestions.

Reviewer #1:

1. The concept of developing a risk score for which COVID-19 patients will require critical care is laudable. Prediction of which patients will come to need such care is very important for planning and allocation of resources. The paper is written in clear and standard English.

Thank you for reviewing our manuscript and for your comments and suggestions.

2. The methodology is sound; however, this reviewer would be grateful if the authors could address the following:

a. The very limited sample size - 87 in the derivation cohort, of which 21 were critical, and 40 in the validation cohort, of which 26 were critical. Why the derivation and validation cohorts were drawn from different institutions, particularly where the laboratory sampling in the validation group did not include important elements of the derivation group data.

Early in the COVID-19 pandemic there were a limited number of patients from which to derive and validate our predictive model. Utilizing a cohort of patients from an outside institution was necessary, as we lacked sufficient numbers of patients at our institution to perform both derivation and validation. As laboratory variables were collected retrospectively, there were differences in clinical practice patterns which affected data availability. Additionally, the use of a completely external validation group increases the confidence in generalizability of the results. Though the external validation group lab values are not as comprehensive as the training set, the relatively robust performance suggests that the algorithm will nonetheless maintain accuracy even with missing data.

b. What effect the retrospective data collection might have had on study results (this is briefly alluded to in the "limitations" section).

The principal limitation is data availability. To overcome this limitation, we suggested the development of a panel of COVID-19 laboratory tests, which can help standardize clinical practice, and which we have subsequently implemented at our institution.

Reviewer #2:

3. The authors made a prediction model for critical disease (defined as ICU admission, MV or death) in 87 COVID-19 patients and found a high C-statistic of 0.95 and 0.94 in the validation cohort and concluded that the model performed well.

Thank you for reviewing our manuscript and for your comments and suggestions.

4. Effective sample size is 21 (out of 87); is it allowed to put 13 variables in the model (the full model consisted of even more variables), while one variable (total number of comorbidities) consists of another 8 variables?

Thank you for this suggestion. We agree that the use of many model parameters may increase the risk of model overfitting. Using a recursive feature selection process, one feature was added to the model at a time and the performance of the new model was assessed via a cross-validation technique. Despite the use of thirteen features, no significant overfitting was observed across each of the cross-validation folds during training. Furthermore, the high performance of the model on the external test set helps to validate this approach and give confidence to the use of all thirteen variables in the final predictive algorithm.

Regarding the composite variable for comorbidity: a composite variable that captures the total number of 8 comorbidities demonstrated strong predictive value (Wald score 0.491). We hypothesize that this is because COVID-19 is a multiorgan/multisystem disease and that overall patient health status, as captured by the total number of comorbidities, is more important than a specific comorbid condition.

5. If it is allowed, for clinical practice it is not useful. A practical clinical prediction model consists of 3-4 variables that are readily available at the bedside. Of course accuracy increases with more variables, but consider reducing the number of variables.

Thank you for this observation. While the full model can use up to 13 different variables, in fact there is no requirement that all variables be present for either training or prediction. During training, only 30/87 patients had all thirteen lab values present. The remaining patients had at least one missing variable, the distribution of which is shown here:

Number of variables Total patient count
7 2
8 37
9 2
10 5
11 5
12 6
13 30

During the prediction process, all missing data is accounted for using median imputation from population statistics from the training data. While imputation is an imperfect approximation to true lab value data, the high performance on an external data set with missing data suggests that the approach is reasonable. Additionally, as a surrogate for clinical utility, the clinical prediction model has been integrated into the clinical workflow at our hospital and is furthermore available as a public website at http://covidrisk.hs.uci.edu (Figure 3).

This table has been added to the results of the manuscript, and the related discussion has been added to the limitations section.

Out of 299 patients, 212 were excluded due to missing variables. The fact that data are missing in these patients may not be random. Furthermore, even in the 87 eligible patients data were not complete (the mean number of missing values was 1.33 [range 1–3] per patient); therefore imputation was performed. What were the results when missing values were treated as missing?

The distribution of data availability per patient is shown in the table above. Patients missing more than 50% of data (i.e., having 6 or fewer variables present) were excluded from analysis. In our cohort, no patient requiring ICU admission had fewer than 6 lab values.

This discussion has been added to the methods and results where appropriate.

We agree that the missing data in these patients may not be random. In general, such patients have a less severe clinical presentation and have been determined by a medical expert to require less laboratory testing. By contrast, this tool is designed to identify patients at high risk for decompensation.

As in the discussion above, median imputation is used during inference based on population statistics. This approach allows patients with fewer than 6 lab values to be analyzed by the tool, with the acknowledgement that the prediction may be less accurate than for patients with additional lab testing.

6. What is the necessity and effect of normalizing the data?

Normalization of the data to similar scales facilitates numeric stability during the algorithm training process and ensures that all variables are initialized with relatively equal contribution to the prediction. This has been added to the methods.

7. What were the criteria for including a variable in the model?

Using a recursive feature selection technique, univariate statistical testing was applied to the cohort to identify the variables with the greatest differences in distribution when stratified by outcome. Starting with the variable with the largest difference, additional variables were added to the model one by one, in order of significance based on univariate testing, until model performance plateaued. In this study, the optimal number of variables based on this technique was 13. It should be noted that model performance did not degrade with additional variables (overall performance instead reached a plateau); thus, the total number of variables used in this study represents the minimum number needed to approximate the performance of models using an arbitrarily large number of covariates. This has been added to the methods.

8. I miss a Table 1 with all baseline characteristics (and difference between yes/no critical disease) and perhaps a table 2 with univariate logistic regression analyses of these parameters.

Thank you for this comment. We have added Table 1 as well as the f-test data for table 2.

The ranked f-tests are below:

Resp: 27.739717, LDH: 23.125122, Comorbidities: 16.79656, BMI: 13.908264, Ferritin: 12.375533, WBC: 7.8667326, Procalcitonin: 7.5904236, Lymph: 7.47715, Troponin: 7.263967, CRP: 6.589011, Creatinine: 5.7942467, Age: 1.0775236, Gender: 0.5919638

Attachment

Submitted filename: Revision Letter.docx

Decision Letter 1

Itamar Ashkenazi

13 Nov 2020

Development and External Validation of a Prognostic Tool for COVID-19 Critical Disease

PONE-D-20-14464R1

Dear Dr. Chow,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Itamar Ashkenazi

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

The three reviewers differed in their opinions regarding this study. The main problem discussed with the reviewers was that this forecast model was constructed based on many variables and was validated on a rather small cohort. Another problem is that many of the parameters included in the forecast model rely on tests that are not in common use in the acute care setting. I believe a note on these issues should be posted with the manuscript to be published. The authors could then reply. If more data on this score have been accumulated in the authors' daily practice since this study was submitted for publication, presentation of these data would be appreciated.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

Reviewer #3: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: (No Response)

Reviewer #3: An important factor was ignored in the study—race. Disparity of COVID-19 between ethnic groups in the US is profound.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Walter M. van den Bergh

Reviewer #3: No

Acceptance letter

Itamar Ashkenazi

20 Nov 2020

PONE-D-20-14464R1

Development and External Validation of a Prognostic Tool for COVID-19 Critical Disease

Dear Dr. Chow:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Itamar Ashkenazi

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Revision Letter.docx

    Data Availability Statement

    Data are available from the UCI Institutional Data Access / Ethics Committee (contact via Joy Chu at joy.chu@uci.edu) for researchers who meet the criteria for access to confidential data due to ethical restrictions involving potentially identifying information.

