Table 5.
Study | Variables included in the final model (for mortality) | External validation | How are predictors combined? | AUC in derivation cohort | AUC in validation cohort | Limitations |
---|---|---|---|---|---|---|
Halalau (Halalau et al., 2021) | Age, male sex, congestive heart failure, end-stage renal disease, chronic pulmonary disease, DM, hypertension, obesity, nursing home residence, immunocompromised status, congenital heart disease, coronary artery disease, end-stage liver disease and pregnancy | Yes | Points-based score | Not available | 0.75 (0.71 – 0.78) | Selection bias: Excluded patients who were hospitalized beyond May 12, 2020. Data on how the score was developed not reported. Absence of an initial validation cohort. Uniform scoring weights of different risk factors. Complete case analysis. |
Fumagalli (Fumagalli et al., 2020) | Age, number of comorbidities (CV disease, hypertension, DM, depression, dementia and cancer), respiratory rate, PaO2/FiO2, serum creatinine and platelet count obtained on admission | No | Points-based score | 0.90 (0.87 – 0.93) | NA | Modest sample size. No external validation. Variables were selected by univariate analysis. Complete case analysis. |
Knight (Knight et al., 2020) | Age, sex, number of comorbidities (chronic cardiac disease, chronic respiratory disease excluding asthma, chronic renal disease defined as estimated glomerular filtration rate ≤ 30, mild to severe liver disease, dementia, chronic neurological conditions, connective tissue disease, DM, HIV or AIDS, and malignancy), respiratory rate, SpO2, level of consciousness, urea and CRP obtained on admission | Yes | Points-based score | 0.786 (0.781 – 0.790) | 0.767 (0.760 – 0.773) | Several potentially relevant comorbidities, such as hypertension, previous myocardial infarction, and stroke, were not included in data collection. The authors considered that inclusion of these comorbidities might have affected or improved the performance and generalizability of the 4C Mortality Score. Secondly, a proportion of recruited patients (3.3%) had incomplete episodes, so there is a possibility of selection bias if patients with incomplete episodes, such as those with prolonged hospital admission, had a different mortality risk from those with completed episodes. |
Liang (Liang et al., 2020) | Chest radiographic abnormality, age, hemoptysis, dyspnea, unconsciousness, number of comorbidities (COPD, hypertension, DM, coronary heart disease, chronic kidney disease, cancer, cerebrovascular disease, hepatitis B, immunodeficiency), cancer history, neutrophil-to-lymphocyte ratio, lactate dehydrogenase and direct bilirubin obtained on admission | Yes | Logistic Regression | 0.88 (0.85 – 0.91) | 0.88 (0.84 – 0.93) | Modest sample size for score development and a relatively small sample for validation. The data for score development and validation are entirely from China, which could potentially limit the generalizability of the risk score in other areas of the world. Mortality was quite low (3.2%). It is not clear whether patients with cancer gain points for both cancer history and number of comorbidities. |
Nicholson (Nicholson et al., 2020) | Age, sex, diabetes mellitus, chronic statin use, albumin, C-reactive protein, neutrophil-lymphocyte ratio, mean corpuscular volume, platelet count, and procalcitonin obtained on admission | Yes | Logistic Regression | 0.87 (0.83 – 0.91) | 0.80 (0.75 – 0.85) | Modest sample sizes in both the derivation and validation cohorts. The number of events in the derivation and validation cohorts separately was not reported (211 in total). Variables were selected by univariate analysis. Complete case analysis. |
Garibaldi (Garibaldi et al., 2021) | Age, nursing home residence, sex, BMI, Charlson Comorbidity Index, SaO2/FiO2 ratio obtained on admission | No | Cox regression analysis | Not available | Not available | Modest sample size. No external validation. Too many variables tested in the model for the number of events (24/131). To try to overcome that, authors tested variables "in blocks" |
Sourij (Sourij et al., 2020) | Age, arterial occlusive disease, CRP, estimated GFR and aspartate aminotransferase (AST) levels obtained on admission | No | Nomogram | 0.889 (0.837 – 0.941) | NA | Small sample size and number of events. Number of variables tested not clear. Complete case analysis, and predictors with > 20% missing values were excluded. No external validation. |
Gavelli (Gavelli et al., 2020) | Presence of comorbidity (any disease on active therapy), SpO2 and respiratory rate after a trial of 15 minutes with oxygen at a FiO2 0.5 | No | Points-based score | NA | Not reported | Score developed by consensus. Modest sample size. Number of events is not clear. Single-center study. No external validation. AUC and accuracy not presented. |
Kazemi (Kazemi et al., 2020) | Age, sex, comorbidity (cardiovascular and pulmonary), diffused distribution of CT abnormality, total CT-score and dyspnea at admission | No | Logistic Regression | 0.73 (95% CI not reported) | NA | Small sample size and number of events. Too many variables tested for the low number of events. Comorbidities were not well defined, the percentage of involvement included in the CT score is subjective, and peripheral involvement is not well defined. Complete case analysis. High risk of selection bias: all 3 hospitals were referral centers for COVID-19 patients, so the overall CT score of the patients in this study may not be representative of the general population. |
Núñez-Gil (Nunez-Gil et al., 2020) | Age, hypertension, obesity, renal insufficiency, any immunosuppressive condition, SpO2, CRP obtained on admission | No | Points-based score | 0.88 (0.85 – 0.91) | NA | No external validation. Variables were selected by univariate analysis. Complete case analysis. Variables included in the model not clearly defined. Authors reported that some incident events in the participating centers may not have been diagnosed and/or not been reported. The data analysis and modeling focused on only two countries (Italy and Spain) of the four initially considered, since as previously mentioned heterogeneity among countries with regard to clinical features and death-risk assessment could limit the representative nature of the sampling. |
Allenbach (Allenbach et al., 2020) | Age, WHO clinical scale, CRP and lymphocytes count obtained on admission | No | Points-based score (but AUC presented based on the logistic regression model) | 0.786 for the composite outcome and 0.803 for death (after correction for over-optimism; 95% CI not reported) | 0.787 for the composite outcome and 0.827 for death (after correction for over-optimism; 95% CI not reported) | Small sample size of both development and validation samples. Too many predictors tested for a small number of events. Complete case analysis. External validation sample not described. The external sample consisted of patients from a regional non-university hospital, which could explain the differences in catchment area and patient recruitment. In the acute context of the first SARS-CoV-2 epidemic wave in France, the authors relied on a sample prospectively defined by consecutive eligible patients in the study center. |
Kim (Kim et al., 2020) | Myocardial damage marker (creatine kinase-MB [CK-MB] or troponin-I > the 99th percentile upper reference limit) + Heart failure marker (NT-proBNP ≥ 125 pg/mL) + Electrical abnormality marker (first detected or newly developed supraventricular tachycardia, ventricular tachycardia, ventricular fibrillation, atrial fibrillation, bundle branch block, ST-segment elevation/depression, T-wave flattening/inversion, and QT interval prolongation on ECG) | No | Points-based score | Not reported | NA | Score developed by consensus. Small sample size and small number of events. Accuracy not assessed. The protocol for the evaluation of cardiac injury was not controlled. The attending physician decided which tests to perform according to the patient's condition at the time of management. When a test was not performed, it was assumed to be negative, because the physician considered it unnecessary or expected a negative result. |
Altschul (Altschul et al., 2020) | Age, sex, SpO2, MAP, INR, creatinine, BUN, interleukin-6 (IL-6), CRP and procalcitonin obtained on admission | Yes | Points-based score | 0.824 (0.814 to 0.851) | 0.798 (0.789 to 0.818) | Complete case analyses, variables selected by univariate analyses |
Hajifathalian (Hajifathalian et al., 2020) | Age, mean arterial pressure, serum creatinine and severity of hypoxia at hospital presentation. | Yes | Multivariate logistic regression | 7 days: 0.877 (95%CI 0.831–0.923); 14 days: 0.847 (95%CI 0.806–0.888) | 7 day (0.851 [0.781 to 0.921]); 14 day (0.825 [0.764 to 0.887]) | Modest sample size for development and validation, less than 100 events both in the development and validation cohorts, short follow-up time |
Wang (Wang et al., 2020) | Age, ferritin and D-dimer obtained on admission | Yes | Logistic regression and nomogram | 0.871 (based on its optimal cut-off value = 85) | Not available (link for supplemental material does not work) | Single-center study, with small sample for development and validation, less than 100 events both in the development and validation cohorts. Complete-case analysis. D-dimer assay not described. AUC for external validation not available to the readers. |
Zhou (Zhou et al., 2020) | Lactate dehydrogenase, albumin, BUN, NLR and D-dimer obtained on admission | No | Nomogram | 0.955 (95% CI not provided) | NA | Single-center study, with small sample size, including cases not confirmed by RT-PCR, and less than 100 events. Complete-case analysis and tests too many variables for the number of events. D-dimer assay not described. |
Gómez (Gomez et al., 2020) | Age, creatinine, glucose and white blood cells obtained on admission | No | Not clear | 0.874 (0.816–0.933) | NA | Single-center study, with small sample size, including cases not confirmed by RT-PCR, and less than 100 events. Complete-case analysis, and too many variables tested for the number of events. |
Galloway (Galloway et al., 2020) | Age, sex, ethnicity, DM, hypertension, chronic lung disease, SpO2, radiographic severity score, neutrophil count, respiratory rate, CRP, albumin, creatinine obtained on admission | No | Points-based score | 0.697 (0.652 – 0.741) | NA | Modest sample size. No external validation. Complete case analysis. AUC < 0.70. |
Bello-Chavolla (Bello-Chavolla et al., 2020) | Age, diabetes, obesity, CKD, COPD, hypertension, immunosuppression and COVID-19 pneumonia | Yes | Points-based score | 0.823 (95% CI not reported) | 0.830 (95% CI not reported) | Data were collected from a sentinel surveillance system, which raises concern about data quality. The same score was used for inpatients and outpatients, and no sensitivity analysis was performed to assess accuracy in hospitalized patients. Apparently, complete case analysis. |
Weng (Weng et al., 2020) | Age, neutrophil-to-lymphocyte ratio, D-dimer and C-reactive protein obtained on admission | Yes | Nomogram and logistic regression | 0.921 (0.835–0.968) | 0.975 (0.947–1.0) | Small sample size for development and validation, with < 100 events in both cohorts. Variables with > 10% missing values were excluded. D-dimer assay was not reported. |
Ko (Ko et al., 2020) | Lymphocytes, neutrophils, albumin, LDH, neutrophil count (?), CRP, prothrombin activity, calcium, urea, estimated GFR, monocytes, globulin, eosinophils, glucose, RDW, bicarbonate, RDW standard deviation, platelet count, mean platelet volume, platelet large-cell ratio, prothrombin time, total protein, platelet distribution width, aspartate aminotransferase, thrombocytocrit, eosinophil count, alkaline phosphatase, INR | Yes | AI model | Not reported | Not reported | Small sample size for development and validation, too many variables tested for the limited number of events, high mortality rate, with possibility of selection bias. Not clear whether only laboratory-confirmed COVID-19 patients were included. The number of predictors makes the model difficult to apply at the bedside. |
Xie (Xie et al., 2020) | Age, lymphocyte count, lactate dehydrogenase and SpO2 obtained on admission | Yes | Logistic regression and nomogram | 0.880 (95% CI not reported) | 0.980 (0.958–1.00) | High risk of selection bias: the cohort was conducted early in the pandemic, there was a high mortality rate (51.8% in development cohort and 47.6% in the validation cohort), and it may not accurately represent patients with mild or asymptomatic COVID-19 (as they were not being tested). Small sample size for development and validation, less than 100 events both. Complete case analysis. |
Yoo (Yoo et al., 2020) | Glasgow coma scale, oxygen support level, BUN, age, lymphocyte percentage, troponin | Yes | Points-based score | Not reported, as AUC was used to define the variables for the score. | At admission 0.81; maximum through admission 0.91; mean through admission 0.92 | The authors reported that documentation of all kinds was inconsistent during the first wave of COVID-19, and the environments at different hospitals varied substantially. While it is unlikely that a laboratory result or medication administration was missed, inconsistencies in flowsheet documentation during this period could mean that the timings of different modes of oxygen administration were not always accurately captured. The statistical test used to produce the score is not adequate according to TRIPOD and may lead to over-optimism. |
Zhang (Zhang et al., 2020) | DCS (demographic, comorbidities and symptoms): age, sex, chronic lung disease, DM, hypertension, immunosuppression, cancer, CKD, heart disease, cough, dyspnea, diarrhea; DCSL (demographic, comorbidities, symptoms and laboratory tests): age, sex, chronic lung disease, DM, cancer, cough, dyspnea, CRP, creatinine, platelets, neutrophils and lymphocytes counts; DL (demographic and laboratory tests): age, sex, CRP, creatinine, platelets, neutrophils and lymphocytes counts (around admission) | Yes | Logistic regression | DCS: 0.79; DCSL: 0.89; DL: 0.91 (95% CI not reported) | DL: 0.74 (95% CI not reported) | Authors reported that clinical datasets were collected when healthcare services were under severe strain. Data extraction sought to ensure consistency and accuracy, but there are missing data in both datasets, and the analysis was complete case based. Sample sizes for development and validation were small, with < 100 events. Clinical assessments at admission such as SpO2 were not available in either dataset. The external validation dataset has a very different case-mix, and only had follow-up to a fixed date (6–39 days). Although the Wuhan cohort includes many people with less severe disease, in the validation cohort most admitted patients are likely to have severe disease. Although the authors reported that all variables were included in the model, for most of the included ones the 95% CI of the OR included 1.0. |
Yadaw (Yadaw et al., 2020) | 17F: age, sex, ethnicity, encounter type, temperature, diastolic blood pressure, oxygen saturation at presentation, minimum oxygen saturation, smoking, asthma, COPD, obesity, DM, HIV, cancer; 3F: age, minimum oxygen saturation, and type of patient encounter, obtained the day of admission | Yes | Artificial intelligence (XGBoost) | 0.91 (95% CI not provided) | 0.91 (95% CI not provided) | As it includes inpatients and outpatients, important laboratory parameters were not tested. The authors reported that the clinical features available were limited to those routinely collected during hospital encounters, and they pointed out that development of even better prediction models should be possible using a richer set of features. |
Shang (Shang et al., 2020) | Age, coronary heart disease, % of lymphocytes, procalcitonin, D-dimer | Yes | Points-based score | 0.919 (95% CI 0.870–0.970) | 0.938 (95% CI 0.902–0.973) | Small sample size in development (113 participants) and validation cohorts, with < 100 events in the development one. Too many variables tested for the number of events. |
Faisal (Faisal et al., 2020) | CARMc19_N: 10 [age, sex, COVID-19 (yes/no), NEWS2 score and subcomponents] and CARMc19_NB: 18. All variables from CARMc19_N + 7 blood test results + AKI score | Yes | Points-based score | CARMc19_NB = 0.87 (95% CI 0.85–0.89) vs CARMc19_N 0.86 (95% CI 0.84–0.87) | CARMc19_NB = 0.88 vs CARMc19_N = 0.86 | Not exclusively for COVID-19 patients. COVID-19 was identified by ICD-10 code which depends on clinical judgment. Risk of selection bias, as only patients with NEWS2 recorded were included. Complete case analysis. |
Mei (Mei et al., 2020) | Age, NLR, admission body temperature, AST, total protein | Yes | Points-based score | 0.912 (95% CI 0.878–0.947) | VC1 = 0.928 (95% CI 0.884–0.971) and VC2 = 0.883 (0.815–0.952) | Risk of selection bias due to inclusion/exclusion criteria, included only patients from Wuhan. Small sample size for development and validation. Complete case analysis. |
Zhang (Zhang et al., 2020) | Age, LDH, NLR and direct bilirubin obtained on admission | Yes | Nomogram | 0.886 (95% CI 0.873–0.899) | 0.879 (95% CI 0.856–0.900) and 0.839 (95% CI 0.798–0.880), one for each of the hospitals | Small sample size for development and validation, < 100 events for both cohorts. The amount of missing data differed between the survivor and non-survivor groups. The study included a high proportion of patients who were severely ill; the authors pointed out that there may be selection bias when identifying the risk factors for mortality. |
Lu (Lu et al., 2020) | Age, CRP | No | Cox regression analysis, decision tree | Not reported | NA | Included both patients with confirmed and not confirmed disease, small sample size with < 100 events, number of potential predictors tested was not clear. No external validation. |
Soto-Mota (Soto-Mota et al., 2020) | Age, hypertension, white blood cell count, lymphocyte count, myocardial necrosis marker, creatinine, SpO2 (not clear at which moment) | No | Logistic regression | NA | Provided by different cut-offs, ranging from 0.61 to 0.90 (95% CI ranging from 0.59 to 0.93), with best AUC for 25 points (0.90 [95% CI 0.87–0.93]) | Score developed by consensus. It is not clear at which moment the score is meant to be used. Risk of selection bias, high mortality in the cohort (50%). |
Yan (Yan et al., 2020) | LDH, lymphocytes and CRP obtained at hospital admission | Yes | Multi-tree XGBoost model | 0.978 (95% CI not provided) | 0.951 (95% CI not provided) | Single-center study, with small sample for development and validation, less than 100 events in the validation cohort. Apparently, complete-case analysis. |
Williams (Williams et al., 2020) | Age, sex, history of cancer, COPD, diabetes, heart disease, hypertension, hyperlipidemia and kidney disease. | Yes | Points-based score | 0.896 (95% CI 0.72 – 0.90) | CUIMC database 0.820 (95% CI 0.796–0.840); HIRA database 0.898 (95% CI 0.857–0.940); SIDIAP 0.895 (95% CI 0.881–0.910); VA 0.717 (0.642–0.791) | The authors reported they were unable to develop a model on COVID-19 patient data due to the scarcity of databases that contain this information in sufficient numbers. Based on secondary data, with possibility of misclassification of predictors (diseases incorrectly recorded in a patient's history, incorrect recording of influenza or COVID-19). The authors were unable to include some suspected disease predictors, such as BMI/obesity, in the analysis because of the inconsistency with which these measures are collected and reported across the databases included in the study. Patients may die after 30 days, and this will be recorded as a non-event. Apparently, complete case analysis. |
Gue (Gue et al., 2020) | Age, sex, hypertension, coronary artery disease, heart failure, atrial fibrillation, oral anticoagulants, modified sepsis-induced coagulopathy (mSIC) score (INR, platelet count, qSOFA score) | No | Points-based score | 0.793 (95% CI 0.745–0.841) | NA | Small sample size from a single center, no external validation. Complete case analysis. Authors pointed out that patients at the highest risk may be deemed too sick for maximal intervention and may be denied ICU treatment; predictors and their assigned weights in the final model. |
Das (Das et al., 2020) | Age, sex, province (in South Korea) and exposure (nursing home, hospital, religious gathering, call center, community center, shelter and apartment, gym facility, overseas inflow, contact with patients and others) | No | Logistic regression (SMOTE) | 0.830 (95% CI not reported) | NA | Risk of selection bias (only patients with complete data were included), unavailability of crucial clinical information on symptoms, risk factors and clinical parameters. Less than 100 events. No external validation |
Levy (Levy et al., MedRxiv) | Age, length of stay, SpO2, neutrophil, RDW, sodium, urea (on admission and every 2 days) | Yes | Logistic regression | 0.86 (95% CI not reported) | 0.82 (95% CI not reported) | Data were imputed for variables with up to 50% missing values. Follow-up was too short (7 days), which causes a high risk of bias, as a significant proportion of patients may die after 7 days. Authors did not show how to calculate the score. |
Chen (Chen et al., 2020) | Age, coronary heart disease, cerebrovascular disease, dyspnea, procalcitonin, aspartate aminotransferase, total bilirubin upon admission | No | Nomogram | 0.91 (95% CI, 0.85–0.97) | NA | High risk of selection bias (20.8% patients with incomplete data were excluded), modest sample size, with < 100 events. No external validation. Complete case analysis. Authors did not show how to calculate the score. |
Sarkar (Sarkar and Chakrabarti, 2020) | Age, sex, from Wuhan, visit to Wuhan, days from symptom onset to hospitalization | No | RF classification algorithm | 0.97 (95% CI not reported) | NA | Small sample size, with < 100 events. High risk of selection bias: from 1085 patients, 652 (60.1%) were excluded due to missing values, and the model was developed using only 115 patients (10.6%). Data quality is questionable, as the study is based on an open-source database. |
Hu (Hu et al., 2020) | Age, CRP, D-dimer, lymphocyte count at admission | Yes | Points-based score | 0.895 (95% CI not reported) | 0.881 (95% CI not reported) | Small sample size of both development and validation samples, with < 100 events. Too many predictors tested for a small number of events. The authors did not exclude patients transferred from other hospitals (so the assessment was not the first hospital assessment in all patients). Single center study, patients from both derivation and validation sets were from Tongji Hospital, which is one of the hospitals with a high level of medical care in China (the authors reported that some critically ill patients who recovered there might die in other hospitals with suboptimal or typical levels of medical care). |
AUC: area under the curve; BMI: body mass index; CI: confidence interval; COPD: chronic obstructive pulmonary disease; CRP: C-reactive protein; CT: computed tomography; DLN: deep learning networks; DM: diabetes mellitus; GFR: glomerular filtration rate; ICU: intensive care unit; LASSO: least absolute shrinkage and selection operator logistic regression; NA: not applicable; RDW: red blood cell distribution width; PLS: partial least squares; RF: Random Forest; SF ratio: SpO2/FiO2 ratio; SVM: support-vector machine; XGBoost: eXtreme Gradient Boosting; WHO: World Health Organization.