Elsevier - PMC COVID-19 Collection
2022 Apr 9;98:107681. doi: 10.1016/j.compbiolchem.2022.107681

A composite ranking of risk factors for COVID-19 time-to-event data from a Turkish cohort

Ayse Ulgen, Sirin Cetin, Meryem Cetin, Hakan Sivgin, Wentian Li
PMCID: PMC8993420  PMID: 35487152

Abstract

Having a complete and reliable list of risk factors from routine laboratory blood tests for COVID-19 disease severity and mortality is important for patient care and hospital management. It is common to use meta-analysis to combine results from different studies to make them more reproducible. In this paper, we propose to run multiple analyses on the same set of data to produce a more robust list of risk factors. With our time-to-event survival data, the standard survival analysis was extended in three directions. The first is to extend from tests and corresponding p-values to machine learning methods and their prediction performance. The second is to extend from single-variable to multiple-variable analysis. The third is to expand from analyzing time-to-decease data, with death as the event of interest, to analyzing time-to-hospital-release data, treating early recovery as a meaningful event as well. Our extension of the types of analyses leads to ten ranking lists. We conclude that 20 out of 30 factors are reliably associated with faster death or faster recovery. Considering the correlation among factors, and as evidenced by stepwise variable selection in random survival forest, 10 to 15 factors seem to be enough to achieve the optimal prognosis performance. Our final list of risk factors contains calcium, white blood cell and neutrophil counts, urea and creatine, d-dimer, red cell distribution widths, age, ferritin, glucose, lactate dehydrogenase, lymphocyte, basophils, anemia-related factors (hemoglobin, hematocrit, mean corpuscular hemoglobin concentration), sodium, potassium, eosinophils, and aspartate aminotransferase.

Keywords: COVID-19, Survival analysis, Competing risks, Composite ranking


1. Introduction

The purpose of meta-analysis is to combine information from different datasets in multiple studies in order to provide robust and consistent conclusions on the effect of a factor on an outcome (Borenstein et al., 2009, Haidich, 2010). However, it is less common to attempt multiple analyses on the same set of data to extract robust information. For example, in investigating risk factors for COVID-19 infection susceptibility, disease severity (e.g. hospitalization), and mortality (Guan et al., 2020), the most common approach is to carry out one test to obtain a p-value (Tian et al., 2020, Tian et al., 2021, Liu et al., 2020, Liu et al., 2020, Rosenthal et al., 2020, Williamson et al., 2020, Fadl et al., 2021). The test can be a t-test/Wilcoxon test for continuous variables, or a χ²-test/Fisher’s test for discrete variables, comparing the value of a risk factor between samples in two groups. Alternatively, uni-variable logistic regression can be used, and the null hypothesis that the regression coefficient is zero is tested. For time-to-event data (survival data), the time from hospitalization to death of a COVID-19 patient can be used to examine which factors contribute to a faster death (a higher per-unit-time rate of death), which can be done by Cox regression (proportional hazard model). The null hypothesis of a zero regression coefficient is then tested.

In this paper, we extend the above common practice in three directions, using COVID-19 patient time-to-event data. The first is to use both p-value based measures and prediction performance based ones. The p-value based approach has advantages, in that its meaning is easy to understand and the result is easy to report, but it also has problems. The p-value itself, often treated as the “gold standard for statistical validity”, may not be so golden (Nuzzo, 2014). A change in the true prior probability of a signal will change the prediction error even when the p-value is the same (Nuzzo, 2014, Colquhoun, 2017). There have already been proposals of alternatives to the p-value in evaluating variables (Lu and Ishwaran, 2018, Halsey, 2019).

The second extension is to use multiple-variable methods as well as single-variable ones. Single-variable methods evaluate a variable in isolation from other variables. As a result, they cannot detect the conditional importance of a variable or its interaction with other variables. Inconsistency, or a larger confidence interval, between different datasets concerning the importance of a factor may well reflect the contextual heterogeneity in other variables (Ghahramani et al., 2020, Kermali et al., 2020). The multi-variable statistical/machine learning models (Strobl et al., 2008) are ideal to supplement the single-variable analyses, but in the case of COVID-19 data analysis they are mostly focused on prediction and diagnosis (An et al., 2020, Li et al., 2020, McCoy et al., 2021, Li et al., 2021, Li et al., 2021, Bennett et al., 2021, Karthikeyan et al., 2021, Aljameel et al., 2021, Mahdavi et al., 2021, Cornelius et al., 2021, Kocadagli et al., 2022, Malik et al., 2022), not on variable evaluation as in our work. There are more applications of machine learning and artificial intelligence in the context of COVID-19, ranging from drug repurposing to medical assistance (Zeng et al., 2020, Deepthi et al., 2021, Chen et al., 2022, Alafif et al., 2021, Khan et al., 2021, Piccialli et al., 2021, Dogan et al., 2021, Majeed and Hwang, 2022). On the other hand, over-interpreting specific multi-variable models (Yan et al., 2020) might not be good practice, as such a model may not be applicable to other data (Barish et al., 2021).

The third extension is specific to time-to-event data. For our inpatient data, not only do we have admission-to-death time information for deceased patients, but we also have a larger number of patients who completely recovered and were released. In a typical survival analysis, these patients’ time-to-release information would be treated as right-censored data. However, by treating release as the main event of interest, extra information might be obtained (Cetin et al., 2021b, Cetin et al., 2021c).

With our three extensions, we are able to carry out ten analyses on the same set of data. By combining these analyses to get a composite ranking of risk factors for COVID-19 faster death or faster recovery, we effectively run a meta-analysis on the same dataset. Besides the standard testing for unit hazard ratio from Cox regression (thus p-value based), we also used single-variable random survival forest (Breiman, 2001, Ishwaran et al., 2008) (thus prediction performance based), and multi-variable random survival forest and variable selection in regularized regression (thus multiple variable based). Two different measures of performance (discordance index and integrated Brier score) are used in single-variable random survival forest. All these analyses were then repeated for the time-to-release data, resulting in ten different sets of results. We will show that our composite ranking from ten runs results in a more robust list of risk factors for COVID-19 severity than the p-value based method alone, and that our selected factors are validated by being consistent with the literature.

2. Methods and data

2.1. COVID-19 patient data

Our COVID-19 patient data was collected from Tokat State Hospital (Turkey), with 3084 people and 35 potential risk factors. This study was carried out with the approval of Gaziosmanpaşa University Faculty of Medicine Non-Interventional Clinical Research Ethics Committee (decision No: 22-KAEK-051). The 2682 outpatients do not have time-to-event data and are not used in our survival analysis by RSF. For the remaining 402 inpatients, five factors have too much missing data (activated partial thromboplastin clotting time (aPTT), red blood cell (RBC) count, HbA1C, fibrinogen, and C-reactive protein (CRP)) and are not used. The remaining 30 factors are: age, gender, glucose, d-dimer, calcium, chloride, potassium, sodium, creatine, ferritin, urea, alanine aminotransferase (ALT), aspartate aminotransferase (AST), lactate dehydrogenase (LDH), white blood cell (WBC) or leukocyte count, neutrophils (NEU) cell count, lymphocyte (LYM) cell count, monocyte (MON) cell count, eosinophils (EOS) cell count, basophils (BAS) cell count, platelets (PLT) cell count, hemoglobin (HGB) count, hematocrit (HCT), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), red cell distribution widths (RDW.CV and RDW.SD), mean platelet volume (MPV), and platelet distribution width (PDW). Some of these factors still have missing data, but none of the missing rates exceeds 14%.

2.2. Assessing independent variables in their contribution to time-to-event dependent variable using four methods

Our data is of the following form: 30 independent variables, $x = \{x_1, x_2, \ldots, x_{30}\}$, and one dependent variable $y = \{T, \delta\}$, where $T$ is the time to the event, and the event status $\delta$ can take the value 1 (event 1, or death), 2 (event 2, or release from hospital), or 0 (right censored). We do not have any samples with right-censored time. Handling data with multiple event types is usually within the scope of “competing risks” survival analysis (example: death from heart attack versus death from stroke), though in our case the two events, one a risk and one a benefit, do not point in the same direction. How to analyze our special type of data will be discussed in the Results section. Here we simply reset $\delta = 2$ to $\delta = 0$ and summarize the existing methods.

All four of our analyses aim at finding which independent variables are associated with the dependent variable. We describe each method below. (1) In Cox regression, instead of modeling and fitting the survival function, an arbitrary baseline hazard function is assumed, and a change in an independent variable is assumed to lead to a constant multiple of the whole baseline curve (proportional hazard hypothesis):

$$\frac{h(t, x_i)}{h_0(t)} = e^{\beta_0 + \beta_i x_i} \quad (i = 1, 2, \ldots, 30) \tag{1}$$

where $h_0(t)$ is the arbitrary baseline hazard function, and $h(t, x_i)$ is the hazard function with one of the independent variables present. The p-value $p_i$ for testing $\beta_i = 0$ measures the statistical significance of the contribution of $x_i$ to the time-to-event data, and $e^{\beta_i}$ measures the hazard ratio for a one-unit change in $x_i$.

(2) Random survival forest (RSF) (Ishwaran et al., 2008) is an extension of Random Forest (RF) (Breiman, 2001) to handle time-to-event data. RF/RSF construct many decision trees (therefore “forest”), each of which splits the samples so that the dependent variable values in the two daughter nodes are separated as much as possible (for an introduction to RF, see, for example, Louppe, 2014, Fernández-Delgado et al., 2014). Once the splitting of nodes in a tree is done, the cumulative hazard function (CHF) in each external node can be estimated by the Nelson-Aalen estimator (Ishwaran et al., 2008). The CHF for an independent variable $x_i$ can be obtained by tracing the path on the tree, according to the $x_i$ value, to reach the external node (Ishwaran et al., 2008):

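$$\hat{H}_{\mathrm{node}}(t) = \sum_{t_{j,\mathrm{node}} < t} \frac{d_{j,\mathrm{node}}}{n_{\mathrm{node}}}, \qquad H(t \mid x) = \operatorname{mean}_{\mathrm{node},\,\mathrm{tree}} \hat{H}_{\mathrm{node}}(t) \ \ \text{(the node a bootstrap of the } x \text{ value leads to)} \tag{2}$$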
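As an illustration of the Nelson-Aalen step, the following minimal Python sketch estimates the cumulative hazard from the samples that fall into one external node. It assumes the textbook form of the estimator, in which each distinct event time up to and including t contributes (number of events at that time)/(number of samples still at risk); this convention, the function name `nelson_aalen`, and the toy data are assumptions made for illustration and are not taken from the randomForestSRC code.

```python
def nelson_aalen(times, events, t):
    """Textbook Nelson-Aalen cumulative hazard H(t) for the samples in one node.

    times  : observed times of the samples in the node
    events : 1 if the event occurred at that time, 0 if right-censored
    """
    chf = 0.0
    # Distinct times at which at least one event occurred, in increasing order.
    event_times = sorted({tm for tm, ev in zip(times, events) if ev == 1})
    for tj in event_times:
        if tj > t:
            break
        d_j = sum(1 for tm, ev in zip(times, events) if tm == tj and ev == 1)
        n_j = sum(1 for tm in times if tm >= tj)  # still at risk just before tj
        chf += d_j / n_j
    return chf


# Toy node with five samples (hypothetical values).
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
chf_at_5 = nelson_aalen(toy_times, toy_events, 5)  # 1/5 + 1/4 + 1/2
```

In a forest, the same quantity would be computed in the external node that a sample reaches in each tree and then averaged over trees.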

The performance of an RSF is measured on samples not used in the construction of a tree, which are called out-of-bag (OOB) samples: in a sample-with-replacement approach (bootstrap), only $1 - e^{-1} \approx 63.2\%$ of the data are used to build each tree. The CHF of an individual OOB sample can be obtained in a similar way by tracing the path along a tree according to its independent variable values, and averaging over trees (Ishwaran et al., 2008). Averaging over all OOB samples leads to an ensemble prediction of the CHF. As every sample will be OOB for some of the trees if the number of trees is large enough, an RSF will produce a predicted CHF for each sample.
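The 63.2% figure follows from the probability that a given sample is drawn at least once in n draws with replacement, 1 − (1 − 1/n)^n, which tends to 1 − 1/e. The short Python sketch below checks this both analytically and by simulation for n = 402 (the number of inpatients); the function names and the number of simulated bootstraps are illustrative choices.

```python
import math
import random


def expected_in_bag_fraction(n):
    """Probability that a given sample appears at least once in n draws with replacement."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def simulated_in_bag_fraction(n, rng):
    """Fraction of distinct samples in one simulated bootstrap of size n."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return len(drawn) / n


rng = random.Random(0)
n_samples = 402
analytic = expected_in_bag_fraction(n_samples)
simulated = sum(simulated_in_bag_fraction(n_samples, rng) for _ in range(200)) / 200
limit = 1.0 - math.exp(-1.0)  # about 0.632
```

The remaining fraction, roughly 36.8% of the samples per tree, is what serves as the OOB set for that tree.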

Using the predicted CHF from the RSF program, two errors can be used to measure the prediction performance. One is Harrell’s concordance index (C-index) (Harrell et al., 1982) whose complement can be called discordance index (D-index) (Cetin et al., 2021a). The C-index (D-index) is simply the number of sample pairs whose event time and predicted CHF are consistent (not consistent), divided by the number of permissible sample pairs:

$$C = \%\left\{ T_j < T_k \ \&\ H(t = T_j \mid x_j) > H(t = T_k \mid x_k) \right\}, \quad j, k \ \text{index for permissible sample pairs}; \qquad D = 1 - C \tag{3}$$
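A minimal Python sketch of the pair counting behind the C-index and D-index is given below. It makes three simplifying assumptions that are not stated in the text: each sample is summarized by a single predicted risk score (standing in for the predicted CHF), a pair is treated as permissible when the sample with the shorter time experienced the event, and tied risk scores count as half-concordant. The function name and toy values are hypothetical.

```python
def concordance_index(times, events, risk):
    """Fraction of permissible pairs in which the earlier event has the higher risk."""
    concordant = 0.0
    permissible = 0
    n = len(times)
    for j in range(n):
        for k in range(n):
            # Permissible pair: sample j has the shorter time and its event was observed.
            if times[j] < times[k] and events[j] == 1:
                permissible += 1
                if risk[j] > risk[k]:
                    concordant += 1.0
                elif risk[j] == risk[k]:
                    concordant += 0.5
    return concordant / permissible


toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risk)  # 4 of 5 pairs concordant
d_index = 1.0 - c_index
```

In this toy example the only discordant pair is the second and third samples, where the sample with the later (censored) time was given the higher risk.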

The second measure is (integrated) Brier score (IBS) (Brier, 1950), which is the squared difference between the actual value (e.g. binary indicator for survival) and the predicted value (e.g., predicted survival probability Prob(T > t)), integrated over available time-to-event points in the sample:

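$$BS(t) = \frac{1}{n} \sum_{j=1}^{n} \begin{cases} \left(\mathrm{Prob}(T > t)_j - 1\right)^2 & \text{if } t < T_j \\ \mathrm{Prob}(T > t)_j^{\,2} & \text{if } t \ge T_j,\ \delta_j = 1 \\ \mathrm{NA} & \text{if } t \ge T_j,\ \delta_j = 0 \end{cases} \qquad IBS \approx \frac{1}{n_T} \sum_{T_j} BS(T_j) \tag{4}$$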

where $n_T$ is the number of samples with $\delta = 1$ and the sum is over these samples.
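The Brier score and its integrated version can be sketched in a few lines of Python. This is an illustrative reading of Eq. 4, not the authors' R functions: NA terms are skipped and the average is taken over the remaining samples, the predicted survival probability is supplied through a hypothetical callable `surv_prob(j, t)`, and the exponential toy predictor is invented for the example.

```python
import math


def brier_score(t, times, events, surv_prob):
    """Brier score at time t; samples censored at or before t are skipped (NA)."""
    terms = []
    for j, (tj, dj) in enumerate(zip(times, events)):
        p = surv_prob(j, t)  # predicted Prob(T > t) for sample j
        if t < tj:
            terms.append((p - 1.0) ** 2)  # sample j is still event-free at t
        elif dj == 1:
            terms.append(p ** 2)  # sample j has had the event by t
    return sum(terms) / len(terms)


def integrated_brier_score(times, events, surv_prob):
    """Average of BS(T_j) over the observed event times (delta_j = 1)."""
    event_times = [tj for tj, dj in zip(times, events) if dj == 1]
    scores = [brier_score(tj, times, events, surv_prob) for tj in event_times]
    return sum(scores) / len(scores)


toy_times = [2, 5, 7]
toy_events = [1, 0, 1]
toy_rates = [0.5, 0.1, 0.2]  # hypothetical per-sample hazard rates


def toy_surv(j, t):
    return math.exp(-toy_rates[j] * t)


toy_ibs = integrated_brier_score(toy_times, toy_events, toy_surv)
```

A predictor that always returns 0.5 gives a score of 0.25 under this definition, which is a convenient reference point when reading IBS values.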

(3) RF/RSF also provides a performance-based evaluation of individual variables when all variables $x$ are used in RF/RSF (Breiman, 2001, Ishwaran et al., 2008). We refer to the permutation-based measure of variable importance as VIM (the other choices are based on external node purity, such as the Gini index (Nembrini et al., 2018)). In this approach, a variable is removed (version-1) or randomized by permuting its values among samples (version-2), and the RF/RSF performance before and after the change is compared:

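$$\text{ver1:}\ \mathrm{VIMP}(x_i) = \mathrm{error}(x \setminus x_i) - \mathrm{error}(x); \qquad \text{ver2:}\ \mathrm{VIMP}(x_i) = \mathrm{error}(x_1, x_2, \ldots, x_{i-1}, R(x_i), x_{i+1}, \ldots) - \mathrm{error}(x) \tag{5}$$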

where $R()$ refers to the random permutation step. Usually, it is version-2 that is implemented in commonly used programs. The variable that leads to the largest decrease in performance (or largest increase in error) is the most important variable. Note that this approach provides only a ranking of variables, and gives no cutoff to separate important from unimportant variables.

(4) The regularized (Hastie et al., 2009) regression LASSO (Tibshirani, 1996, Tibshirani, 1997) for Cox regression with multiple variables is:

$$\frac{h(t, x)}{h_0(t)} = \exp(\beta^{T} x) \quad \text{conditional on} \quad \sum_i |\beta_i| < c \tag{6}$$

When $c \to 0$, all regression coefficients are zero. As $c$ increases, the coefficients $\{\beta_i\}$ gradually change from zero to non-zero values one by one. LASSO is a variable selection technique, and the order in which variables are selected can be used to rank them. For example, the first variable with a non-zero coefficient is ranked no. 1. A LASSO plot (how the regression coefficient values change with the constraint) can be used to easily find the rank order.

2.3. Programs used

The R statistical platform (https://www.r-project.org) is used for our analysis. We use the R package randomForestSRC for random survival forest (Ishwaran and Kogalur, 2007), our own R functions for calculating IBS errors (http://github.com/wlicol/coxrsf/) (Cetin et al., 2021a), and the survival package for Cox regression and other survival analyses (Therneau and Grambsch, 2010). For LASSO on right-censored data, we use the R package glmnet (Simon et al., 2011). The Kendall correlation and the test of zero correlation are carried out by the R functions cor(…, method="kendall") and cor.test(…, method="kendall").

In the main RSF program we used, the rfsrc function from the randomForestSRC package, we use the default number of trees (ntree=1000), the default number of variables tried for splitting a branch (mtry = √30 ≈ 5), and the default minimum (average) number of samples in the external nodes (nodesize=15). We choose samptype = "swr" (sampling with replacement) and na.action = "na.impute" (imputing the missing values). The decision to use several default parameter settings is based on our experimenting with the parameter values (Probst et al., 2019), as has been done previously (Cetin et al., 2021a). Note that sampling with replacement is not the default setting in rfsrc, but is consistent with the original proposal by Breiman.

3. Results

3.1. Survival analyses with two types of mutually exclusive events

Our time-to-event data contains two events of very different nature. As early as the 1980s, it was suggested to apply a standard survival analysis program in a type-specific (cause-specific) way, by switching the second type of event to a right-censored event (i.e., converting $\delta = 2$ to $\delta = 0$) (Kalbfleisch and Prentice, 1980). The cause-specific (cs) hazard is defined as:

$$h_k^{cs}(t) = \lim_{\Delta t \to 0} \frac{\mathrm{Prob}(t \le T < t + \Delta t,\ K = k \mid T \ge t)}{\Delta t}, \quad k = 1, 2 \tag{7}$$

Although this definition clearly allows a cause-specific hazard for the second event, most studies are only concerned with the main event, death, and do not consider the second event type.

We argue here that running a survival analysis on the time-to-release data, while masking death as right-censored, actually provides valuable information. If $h_2^{cs}(x_i + 1)/h_2^{cs}(x_i) > 1$, then a higher value of $x_i$ leads to a faster release of a patient, and variable $i$ is protective. For that same variable, we would expect $h_1^{cs}(x_i + 1)/h_1^{cs}(x_i) < 1$ for the death event. The deceased samples should be more important in estimating $h_1^{cs}$. Similarly, the released samples should be more useful for the estimation of $h_2^{cs}(x_i)$. Therefore, we believe that running the same survival analysis twice, to find variables contributing to the two cause-specific hazard ratios, uses the dataset more fully.
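The two cause-specific runs differ only in how the event status is recoded before a standard survival routine is called. The following Python sketch shows that recoding under the coding used in this paper (1 = death, 2 = release, 0 = right censored); the function name and the toy status vector are hypothetical.

```python
def cause_specific_status(status, event_of_interest):
    """Keep the event of interest as 1 and treat every other status as censored (0)."""
    return [1 if s == event_of_interest else 0 for s in status]


# Toy status vector: 1 = death, 2 = release from hospital.
toy_status = [1, 2, 2, 1, 2]

death_status = cause_specific_status(toy_status, 1)    # used for the time-to-death run
release_status = cause_specific_status(toy_status, 2)  # used for the time-to-release run
```

The event times themselves are left unchanged; only the status vector passed to the survival analysis differs between the two runs.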

In competing risk survival analysis, there is another population approach called Fine-Gray model (Fine and Gray, 1999) which defines the following subdistribution hazard:

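$$h_k^{sd}(t) = \lim_{\Delta t \to 0} \frac{\mathrm{Prob}\left(t \le T < t + \Delta t,\ K = k \mid (T \ge t) \cup (T < t \cap K \ne k)\right)}{\Delta t}, \quad k = 1, 2 \tag{8}$$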

where the occurrence of a type-2 event has no impact on the calculation of $h_1^{sd}$, with the underlying assumption that a sample experiencing a type-2 event may continue to experience a type-1 event. However, this scenario is impossible in our example because our two types of events, release from hospital and death from COVID-19, are mutually exclusive. Using cause-specific survival analysis for mutually exclusive events is explicitly recommended (Allison, 2014).

3.2. Composite ranking of factors based on five measures (time-to-death)

The factors are ranked five times using five measures: D-index from single-variable RSF, IBS from single-variable RSF, p-value for testing unit hazard ratio by single-variable Cox regression, permutation based VIM from the full model RSF, the variable selection order in LASSO. We found that for full-model RSF, because of random components in a run (both samples and variables are randomly chosen in a tree formation), the ranking order may change from run to run, especially for low-ranking factors. Therefore, we run the full-model RSF 100 times and the average of these runs is used. Table 1 shows the results of these five measures. The rank for each method is within the parentheses, and the factors are listed by the overall rank order. It can be seen that some factors are consistently ranked high no matter what method is used (e.g., urea is ranked no.1 by all five methods, neutrophils and calcium are ranked within top 5 by all five methods). Other factors are not ranked consistently among methods: e.g., LDH, AST are ranked high in the three RSF based methods but ranked lower by Cox and LASSO; sodium is ranked lower in single-variable RSF measures; etc. The consistency is reassuring that the very top factors would be discovered by any method. The inconsistency or variation provides a basis for our approach of combining multiple rankings to improve the robustness of the result, even for one dataset.

Table 1.

Composite ranking of factors based on time-to-death of COVID-19. Column 1 (Col-1): composite rank from five analyses; Col-2: name of the factor; Col-3: two error measures of single-variable random survival forest (RSF): discordance index (D) and integrated Brier score (IBS). The number in parentheses is the rank (also for Cols 4–6); Col-4: p-value from single-variable Cox regression; Col-5: permutation-based variable importance (VIM) from the full RSF model, averaged over 100 runs; Col-6: ranking according to the variable selection order from LASSO (T for tied rank).

combined rank (time-to-death) | var | single-var RSF: D (rank) | single-var RSF: IBS (rank) | Cox: pv (rank) | full RSF: VIM × 10^3 (rank) | LASSO (rank)
1 | urea | .186 (1) | .121 (1) | 1.4E-19 (1) | 38.2 (1) | (1)
2 | calcium | .220 (3) | .131 (3) | 5.6E-14 (4) | 12.6 (2) | (3)
3 | NEU | .235 (4) | .141 (5) | 3.1E-16 (2) | 9.05 (4) | (2)
4 | WBC * | .252 (5) | .153 (8) | 2.2E-14 (3) | 5.05 (8) | (7)
5 | creatine * | .214 (2) | .127 (2) | 2.1E-12 (5) | 10.6 (3) | (19)
6 | LYM | .326 (12) | .167 (14) | 1.3E-9 (8) | 5.3 (7) | (5.5T)
7 | d-dimer | .3 (8) | .167 (16) | 1.1E-9 (7) | 1.13 (15) | (4)
8 | glucose | .281 (7) | .154 (9) | 4.4E-5 (14) | 2.69 (10) | (11)
9 | sodium | .367 (18) | .18 (21) | 4.2E-11 (6) | 2.6 (9) | (5.5T)
10 | BAS | .377 (20) | .14 (4) | 2.4E-6 (13) | 0.98 (17) | (9T)
11 | chloride | .35 (14) | .175 (20) | 4.24E-8 (9) | 1.32 (12) | (9T)
12 | RDW.SD | .313 (10) | .171 (18) | 5.8E-8 (10) | .56 (19) | (9T)
13 | LDH | .26 (6) | .15 (7) | .015 (23) | 7.56 (5) | (26)
14 | AST | .302 (9) | .15 (6) | .021 (24) | 5.54 (6) | (24)
15 | EOS | .365 (17) | .164 (11) | 4E-4 (17) | 1 (16) | (13.5T)
16 | potassium | .319 (11) | .167 (15) | 5.7E-4 (18) | 0.279 (23) | (15)
17 | age | .373 (19) | .171 (17) | 8.7E-8 (11) | 0.319 (21) | (13.5T)
18 | ALT * | .354 (15) | .165 (13) | .0403 (27) | 0.454 (20) | (12)
19 | RDW.CV * | .335 (13) | .165 (12) | 1.7E-4 (16) | 0.29 (22) | (25)
20 | ferritin | .38 (22) | .183 (22) | 3.6E-7 (12) | 0.186 (24) | (21)
21 | MPV | .378 (21) | .192 (24) | 1.5E-4 (15) | 0.0648 (25) | (16)
22 | PDW | .356 (16) | .174 (19) | .036 (25) | -0.0178 (26) | (21)
23 | HCT | .474 (27) | .214 (29) | .008 (22) | 1.16 (14) | (17)
24 | MCHC | .407 (23) | .193 (25) | .0011 (19) | -0.047 (27) | (18)
25 | PLT | .443 (25) | .2 (26) | .0019 (21) | 0.88 (18) | (23)
26 | HGB * | .416 (24) | .206 (27) | .0014 (20) | 1.22 (13) | (29)
27 | MON | .475 (28) | .208 (28) | .49 (28) | 1.35 (11) | (21)
28 | gender | .637 (30) | .155 (10) | .83 (29) | -0.06 (28) | (28)
29 | MCV | .461 (26) | .189 (23) | .037 (26) | -0.1 (29) | (30)
30 | MCH * | .561 (29) | .218 (30) | .86 (30) | -0.21 (30) | (27)
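The text does not spell out how the five individual ranks are combined into the composite rank. One simple rule, averaging the ranks and sorting by the average, is sketched below in Python as an assumption; for the five rows used here (ranks copied from Table 1) it reproduces the order shown in the table, but it is not confirmed to be the authors' exact procedure, and it leaves ties such as WBC and creatine unresolved. The function name is hypothetical.

```python
def composite_order(rank_table):
    """Sort factors by the mean of their individual ranks (smaller is better)."""
    mean_rank = {name: sum(ranks) / len(ranks) for name, ranks in rank_table.items()}
    return sorted(mean_rank, key=mean_rank.get), mean_rank


# Ranks from Table 1, in the order: D, IBS, Cox p-value, full-RSF VIM, LASSO.
table1_ranks = {
    "d-dimer": [8, 16, 7, 15, 4],
    "LYM": [12, 14, 8, 7, 5.5],
    "urea": [1, 1, 1, 1, 1],
    "NEU": [4, 5, 2, 4, 2],
    "calcium": [3, 3, 4, 2, 3],
}

order, mean_rank = composite_order(table1_ranks)
```

Other aggregation rules (for example, median rank or averaging 1/rank) would be equally easy to substitute and may order the lower-ranked factors differently.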

3.3. Composite ranking of factors based on five measures (time-to-release)

Table 2 shows the corresponding five rankings with time-to-release as the dependent variable (the factors are ordered by the composite ranking obtained from the five). It is interesting that different methods do not share a common top factor: the three RSF based methods pick ferritin as the top factor, whereas Cox regression picks age, and LASSO picks calcium. When the overall rank in Table 2 is combined with the overall rank in Table 1, we have a composite rank using 10 ranking lists (last column in Table 2).

Table 2.

Similar to Table 1 for time-to-release analyses.

combined rank (time-to-release) | var | single-var RSF: D (rank) | single-var RSF: IBS (rank) | Cox: pv (rank) | full RSF: VIM × 10^3 (rank) | LASSO (rank) | composite10 (rank)
1 | ferritin | .327 (1) | .099 (1) | 1.9E-17 (7) | 21.4 (1) | (2.5T) | (8)
2 | RDW.SD | .355 (4) | .103 (5) | 2.6E-19 (3) | 7.87 (3) | (4.5T) | (6)
3 | WBC | .361 (6) | .1 (2) | 8.8E-18 (5) | 5.8 (6) | (2.5T) | (2)
4 | age | .338 (2) | .108 (12) | 1.5E-27 (1) | 15.8 (2) | (4.5T) | (7)
5 | calcium | .354 (3) | .107 (9) | 3.8E-21 (2) | 4.08 (7) | (1) | (1)
6 | d-dimer | .368 (7) | .104 (6) | 3.6E-15 (9) | 6.56 (4) | (6.5T) | (5)
7 | NEU * | .359 (5) | .102 (3) | 1.1E-17 (6) | 6.48 (5) | (17.5T) | (3)
8 | HGB | .39 (11) | .105 (7) | 6.4E-19 (4) | 1.89 (10) | (6.5T) | (14)
9 | HCT * | .395 (12) | .105 (8) | 6.5E-19 (8) | 1.04 (18) | (9T) | (16)
10 | LDH | .38 (9) | .11 (14) | 8.9E-15 (10) | 2.85 (9) | (19) | (11)
11 | urea | .373 (8) | .109 (13) | 3.7E-14 (11) | 3.19 (8) | (26) | (4)
12 | glucose | .397 (13) | .112 (17) | 8.7E-8 (16) | 1.52 (11) | (11) | (9)
13 | BAS | .42 (17) | .103 (4) | 0.64 (29) | 1.35 (15) | (15.5T) | (13)
14 | MCHC | .441 (21) | .118 (26) | 8.1E-9 (14) | 1.4 (13) | (9T) | (21)
15 | LYM | .41 (15) | .107 (11) | 1.4E-4 (20) | 0.81 (19) | (21) | (12)
16 | RDW.CV * | .422 (18) | .114 (20) | 6.7E-10 (12) | 1.42 (14) | (27) | (18)
17 | gender | .541 (30) | .107 (10) | 0.002 (23) | 0.78 (20) | (9T) | (25)
18 | creatine * | .387 (10) | .115 (23) | 4.3E-9 (13) | 0.43 (21) | (24) | (10)
19 | sodium | .415 (16) | .117 (25) | 1.5E-5 (18) | 0.32 (23) | (12.5T) | (15)
20 | potassium | .409 (14) | .115 (22) | 7.7E-5 (19) | 0.29 (25) | (17.5T) | (17)
21 | MCV | .454 (26) | .118 (27) | 3.7E-6 (17) | 1.05 (17) | (15.5T) | (29)
22 | PLT | .452 (25) | .112 (16) | 0.217 (27) | 1.1 (16) | (20) | (27)
23 | MPV | .467 (27) | .119 (28) | 8E-8 (15) | 0.37 (22) | (14) | (24)
24 | EOS | .48 (28) | .115 (24) | 0.777 (30) | 1.48 (12) | (12.5T) | (19)
25 | PDW | .44 (20) | .114 (19) | 4.7E-4 (21) | 0.13 (28) | (22.5T) | (26)
26 | MON | .447 (24) | .111 (15) | 0.09 (26) | 0.24 (26) | (22.5T) | (28)
27 | ALT | .445 (23) | .112 (18) | 0.002 (24) | 0.21 (27) | (25) | (23)
28 | AST * | .425 (19) | .114 (21) | 0.0014 (22) | 0.12 (30) | (30) | (20)
29 | chloride | .444 (22) | .12 (29) | 0.03 (25) | 0.1 (29) | (29) | (22)
30 | MCH * | .509 (29) | .122 (30) | 0.22 (28) | 0.31 (24) | (28) | (30)

To compare the rankings obtained from time-to-death and time-to-release, we plot the two 1/ranks versus the composite-10 rank in Fig. 1(A). Generally speaking, the two are consistent. When the two are less consistent, a “bubble” is formed. We mark the name in black if a factor is ranked higher (by more than 3) in the time-to-death analyses, in blue if the factor is ranked higher in the time-to-release analyses, and in gray if the ranks are similar. We can see that neutrophils, urea, glucose, creatine, lymphocytes, etc. are ranked higher in time-to-death runs, whereas RDW.SD, age, ferritin, HGB, etc. are ranked higher in time-to-release runs. Fig. 1(B) and (C) also show whether the individual ranks within the time-to-death group and those within the time-to-release group are consistent. If a factor has a large variance (normalized by the mean rank) among individual ranks, it is marked in brown, otherwise in gray. All curves in Fig. 1 are decreasing functions, indicating a general agreement among all ranking lists. A factor with a star (*) is highly correlated with another higher ranked factor (see Table 3).

Fig. 1.

(A) Comparing the composite rank based on 5 time-to-death analyses (black) and the composite rank based on 5 time-to-release analysis (blue). The x-axis is the composite rank based on 10 analyses, and y-axis is 1/(composite rank using 5 analyses). (B) Comparing the five ranks obtained from five time-to-death analysis. The x-axis is the composite rank and y-axis is 1/(individual rank). (C) Similar to (B) for ranks from five time-to-release analyses.

Table 3.

Factor pairs that have very strong correlation (with R² > 0.6 in both the survived and deceased groups, either at the original level or at the log-transformed level). The correlation coefficient (cc) and the p-value (pv) for testing cc = 0 both refer to Spearman correlation.

factor 1 | factor 2 | deceased samples: n | R² (linear)/(log) | cc (Spearman) | pv (Spearman) | survived samples: n | R² (linear)/(log) | cc (Spearman) | pv (Spearman)
urea | creatine | 94 | 0.58/0.61 | 0.78 | 9.7E-21 | 308 | 0.42/0.61 | 0.57 | 1.6E-27
NEU | WBC | 89 | 0.99/0.98 | 0.99 | 6.8E-78 | 257 | 0.87/0.98 | 0.91 | 1.4E-97
AST | ALT | 94 | 0.62/0.73 | 0.79 | 6.5E-21 | 307 | 0.6/0.73 | 0.71 | 4.6E-48
RDW.CV | RDW.SD | 94 | 0.61/0.62 | 0.77 | 2.3E-19 | 290 | 0.66/0.62 | 0.71 | 9.3E-46
HGB | HCT | 94 | 0.96/0.96 | 0.98 | 4.9E-64 | 290 | 0.96/0.96 | 0.98 | 1.1E-210
MCV | MCH | 94 | 0.81/0.82 | 0.81 | 2.1E-23 | 290 | 0.86/0.82 | 0.92 | 1.5E-122

3.4. Correlated factors

Because collinearity in a regression model is a concern, we examine variable pairs that are correlated with each other. We use plots of the raw data, the correlation coefficient, the p-value for testing zero correlation, and the R² from regression to determine the correlation status. Several issues are considered: (1) we check the deceased samples and survived samples separately; (2) we check both the original data and log-transformed data; and for the same reason, the correlation coefficient and the test of zero correlation are based on the non-parametric Spearman method; (3) if the visual impression of the scatter plot is a guide, the R² from regression provides a better quantity to use than, e.g., the p-value for testing zero correlation.

We found six pairs of strongly correlated variables: urea and creatine, neutrophils and white blood cell, AST and ALT, RDW.CV and RDW.SD, HGB and HCT, MCV and MCH. The measures of their correlation are shown in Table 3. The lower ranked factor of a pair is marked with an asterisk in Tables 1 and 2. There are more correlated variable pairs than those shown in Table 3, e.g., sodium and chloride; these are not listed because we use a more conservative R² cutoff point and require the correlation to hold in both the survived and deceased groups.

3.5. Estimation of the number of independent factors that achieve the optimal prediction performance

Although we have the composite ranking order of factors (both composite-5 in Table 1 and Table 2 and composite-10 in the last column of Table 2), there is still the question of where to cut the list to select the relevant factors. To address this, we use a stepwise variable selection, similar to that in regression (e.g. Ryan, 2008), but in the framework of RSF. We first need to clarify the meaning of adding or removing a variable. There are two versions. The first is to actually add a variable starting from an empty field, or remove a variable starting from a full model. The second version is to keep all variables, but instead of an empty field with no variable, the null model refers to all variables being value-shuffled. Therefore, adding the first variable means retaining its values while keeping the other variables scrambled. The difference between the two versions might be written as (up to step-i of a variable selection):

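$$\text{actual:}\ (T, \delta) \sim RSF(x_{[1]}, x_{[2]}, \ldots, x_{[i]}); \qquad \text{virtual:}\ (T, \delta) \sim RSF(x_{[1]}, x_{[2]}, \ldots, x_{[i]}, R(x_{[i+1]}), R(x_{[i+2]}), \ldots) \tag{9}$$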

where the subscript [i] refers to the i-th variable selected, and operation R refers to random value-shuffling. Fig. 2 shows the OOB error IBS as a function of variable selection with this variable selection criterion (to stage-i):

$$[i] = \arg\min_j IBS\left(x_{[1]}, x_{[2]}, \ldots, x_{[i-1]}, x_j, R(x \setminus \{x_{[1]}, x_{[2]}, \ldots, x_{[i-1]}\})\right) \tag{10}$$

where the index j goes through all variables not already selected in stage-1,2, ⋯ , i − 1, and all the rest of variables not selected remain to be value-scrambled. We have run the stepwise variable selection three times each for both time-to-death RSF and time-to-release RSF. The horizontal line is the mean of IBS from 500 runs of the full model, and dashed lines are one standard deviation from the mean (for time-to-death data, IBS= 0.0945 ± 0.000514, and for time-to-release data IBS=0.0753 ± 0.000246).
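The virtual stepwise selection can be sketched as a greedy loop in Python. This is a simplified illustration under stated assumptions, not the authors' procedure: the error of a candidate set is supplied by a hypothetical callable `error_fn` (in the paper this role is played by the OOB IBS of a refitted RSF, averaged over repeated runs), and the toy data and fixed toy "model" below are invented so that the example runs on its own.

```python
import random


def shuffled(column, rng):
    """Return a value-shuffled copy of a column (the operation R in Eq. 9)."""
    out = list(column)
    rng.shuffle(out)
    return out


def virtual_stepwise(columns, error_fn, rng):
    """Greedy selection: at each stage keep one more variable unshuffled."""
    selected, errors = [], []
    remaining = list(range(len(columns)))
    while remaining:
        best_j, best_err = None, None
        for j in remaining:
            keep = set(selected) | {j}
            # Variables not kept stay in the data but are value-shuffled.
            trial = [col if i in keep else shuffled(col, rng)
                     for i, col in enumerate(columns)]
            err = error_fn(trial)
            if best_err is None or err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        remaining.remove(best_j)
        errors.append(best_err)
    return selected, errors


# Toy data: the outcome depends strongly on x0, weakly on x1, and not on x2.
rng = random.Random(1)
n = 200
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [3 * a + b for a, b in zip(x0, x1)]


def toy_error(cols):
    """Mean squared error of a fixed toy model (stand-in for the OOB error of an RSF)."""
    pred = [3 * a + b for a, b in zip(cols[0], cols[1])]
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


order, errors = virtual_stepwise([x0, x1, x2], toy_error, rng)
```

In this toy setting the variables are picked in order of their influence on the outcome, and the error stops decreasing once the two informative variables are retained.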

Fig. 2.

IBS from OOB samples in RSF runs with the stepwise variable selection (Eq. 10), for time-to-death data (A) and time-to-release data (B). For each variable at each stage, 10 RSF runs were carried out. The variable with the lowest mean IBS is selected, and its mean, with one standard deviation up and down, is shown as a vertical bar. The whole process is repeated three times (for (A) and for (B) separately). The larger IBS values with few variables (i < 5) are cut off in order to zoom in on the middle range of i.

There are several observations from these runs: (1) The full model is not the best performing model. This is related to a long debate on whether RF (or RSF here) needs variable selection or not (Díaz-Uriarte and De Andrés, 2006, Li, 2006). Fig. 2 shows that variable selection (reduction from the full model) is still needed for RF/RSF. However, we did not show the full range of IBS; if we did, it would be seen that the problem of overfitting from the full model is less severe compared to other methods. (2) At each stage-i, multiple variables may have very similar IBS values when $x_{[i]}$ is selected by Eq. 10. As a result, which particular variable is selected at stage-i may change from run to run (with the exception of perhaps the first few stages). Therefore, we cannot use this procedure to select risk factors. (3) On the other hand, because the three runs (per subplot) all reach the optimal performance in the middle, we can use Fig. 2 to roughly estimate the number of (independent) factors needed to achieve the best performance. This estimation will not be precise because different runs exhibit variations; however, 10 to 15 factors (10 from Fig. 2(A) and 15 from Fig. 2(B)) should be a reasonable range.

3.6. Final selection of list of risk factors

Because the factor order in Fig. 2 changes from run to run, we use the pre-determined rank order (column 1 in Tables 1 and 2) to check how the error decreases, i.e., the i-th variable added in stage-i is simply the rank-(i) variable in the ranking list: either the composite ranking order based on the 5 time-to-death analyses or that based on the 5 time-to-release analyses. We also use both IBS and D-index as measures of OOB prediction error. Furthermore, both the virtual and actual variable addition were used. The resulting error curves are shown in Fig. 3 (top: time-to-death runs, bottom: time-to-release runs; left: D-index, right: IBS; black: virtual addition of variables, red: actual addition of variables). It is also possible to remove the least important (low ranking) variables first (going through the ranking list backward), but the results were very similar (not shown).

Fig. 3.

OOB RSF error (D-index on left, IBS on right) for time-to-death (top) and time-to-release (bottom) data, as a function of i (stage-i of addition of the top-i ranked factors), with black for virtual variable addition and red for actual addition. The horizontal lines mark the mean, and one standard deviation away from the mean, of the full model errors (from 500 runs). The vertical bars represent one standard deviation away from the mean at stage-i, from 10 runs.

It has been suggested in the literature that IBS is a better measure than D-index because it is more practical (Longato et al., 2020) and more quantitative (Kattan and Gerds, 2018). We also prefer virtual variable addition over actual addition because the change in the error curves in Fig. 3 is smoother. Therefore, we use the black curve in Fig. 3(B) to choose the top 14 factors (as removing the number 15 factor does not visibly reduce the error further), and the black curve in Fig. 3(D) to choose the top 8 factors (again because removing the number 9 factor in Fig. 3(D) does not seem to reduce the error greatly). Using the corresponding error curves based on the VIM ranking order in the full RSF model (column 5 in Tables 1 and 2) (not shown), we obtain a similar list of selected top factors. All this information and more is presented in Table 4.
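The cutoffs above were chosen by visual inspection of the error curves. One way such a choice could be made reproducible is sketched below, under the assumption (ours, not part of the analysis) that a stage counts as "on the plateau" once its mean error is within a tolerance `tol` of the lowest mean error on the curve. The curve values are invented for the example.

```python
from typing import Sequence


def plateau_cutoff(mean_errors: Sequence[float], tol: float) -> int:
    """Smallest number of top-ranked factors whose mean error is within
    `tol` of the lowest mean error on the curve (1-based count)."""
    threshold = min(mean_errors) + tol
    for k, err in enumerate(mean_errors, start=1):
        if err <= threshold:
            return k
    return len(mean_errors)  # not reached when tol >= 0; kept for safety


# Hypothetical mean errors at stages 1..6.
example_curve = [0.20, 0.15, 0.12, 0.118, 0.119, 0.121]
cutoff = plateau_cutoff(example_curve, tol=0.005)
```

A natural choice for `tol` would be the standard deviation of the error across runs, so that the cutoff is the first stage that cannot be distinguished from the best one; a different tolerance will generally give a different cutoff.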

Table 4.

The risk factor selection worksheet. T2D/T2R: time-to-death/time-to-release data. VIM/rank5: rank order from variable importance of the full RSF model (column 5 of Tables 1 and 2)/composite rank order (column 1 of Tables 1 and 2). A factor is marked by + if it is selected either by being among the top-ranking variables according to the corresponding error curve, or by a p-value < 0.001 in Cox regression.

rank factor VIM/T2D rank5/T2D VIM/T2R rank5/T2R Cox/T2D Cox/T2R
1 calcium + + + + + +
2 WBC + + + + + +
3 NEU ( → WBC) + + + + + +
4 urea + + + + +
5 d-dimer + + + + +
6 RDW.SD + + + + +
7 age + + + +
8 ferritin + + + +
9 glucose + + + + +
10 creatine ( → urea) + + + +
11 LDH + + + +
12 LYM + + + +
13 BAS + +
14 HGB + + + +
15 sodium + + + +
16 HCT ( → HGB) + +
17 potassium + +
18 RDW.CV ( → RDW.SD) + + +
19 EOS + +
20 AST + +
21 MCHC + +
22 chloride + + +
23 ALT ( → AST)
24 MPV + +
25 gender
26 PDW +
27 PLT
28 MON + +
29 MCV
30 MCH ( → MCV)

In Table 4, we mark the factors selected by Fig. 3(B) and Fig. 3(D), and those selected by two other error curves not shown. The last two columns in Table 4 mark factors which would have been selected by the standard p-value approach (p-value < 0.001). In fact, the p-value threshold could be 0.01, 0.005, or 0.001, but we consider 0.001 to be a good choice (see, for example, Colquhoun, 2017, Ioannidis, 2018, Li et al., 2021). If we choose factors that contribute to a better prediction performance, 21–22 factors would be selected, of which 17–18 are independent. These are the factors above the horizontal partition line, except potassium. We may consider chloride a borderline choice as it ranks last in our list, and monocyte as a borderline possibility. Note that the Cox regression p-value based selection (at p = 0.001) would select MPV and PDW, which are not on our list.
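The marking rule behind a worksheet of this kind can be sketched as follows: a factor receives a + in an error-curve column if its rank is within the chosen cutoff, and a + in a Cox column if its p-value is below the threshold. The factor names, ranks, and p-values below are placeholders invented for the example and are not values from Table 4.

```python
from typing import Dict, List, Sequence


def mark_selected(
    ranking: Sequence[str],
    top_k: int,
    p_values: Dict[str, float],
    alpha: float = 0.001,
) -> List[Dict[str, str]]:
    """One worksheet row per factor, with '+' where a criterion selects it."""
    rows = []
    for rank, factor in enumerate(ranking, start=1):
        by_curve = rank <= top_k  # among the top-k of the ranking list
        by_cox = p_values.get(factor, 1.0) < alpha  # missing p-value: not selected
        rows.append({
            "factor": factor,
            "curve": "+" if by_curve else "",
            "cox": "+" if by_cox else "",
        })
    return rows


# Placeholder inputs (hypothetical).
demo_ranking = ["factor_A", "factor_B", "factor_C", "factor_D"]
demo_p = {"factor_A": 1e-6, "factor_B": 0.02, "factor_C": 0.2, "factor_D": 5e-4}
worksheet = mark_selected(demo_ranking, top_k=2, p_values=demo_p)
```

In this placeholder example `factor_B` is selected by the error curve only and `factor_D` by the p-value only, which is the kind of disagreement between criteria that the worksheet is meant to expose.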

The partition of factors in Table 4 into selected and not-selected ones should be considered ad hoc to some extent. We are more confident about the factors that are selected, since their selection is based on multiple lines of evidence, and less confident about the factors that are not selected. This does not imply that the factors not selected here are never relevant to COVID-19 severity/mortality, only that in our data they do not have the fullest level of evidence support.

4. Discussions

In this work, we have carried out a careful analysis of a single dataset to determine which factors contribute to either faster death or faster release from hospital. Through a literature search, we found that all our selected factors were addressed in other studies and were shown to be significantly associated with COVID-19 disease severity. Here is a partial summary:

Our “within-sample-meta-analysis” leads to a more robust conclusion concerning risk factors for COVID-19 severity. Judged against independent studies published by other groups, our cutoff in Table 4 may lead to a close-to-zero false discovery rate (FP/(FP+TP)) or, equivalently, close-to-100% precision (positive predictive value, TP/(FP+TP)).
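The two quantities quoted above are complementary, which follows directly from their definitions (TP and FP denote the numbers of selected factors that are, respectively, confirmed and not confirmed by independent studies):

```latex
\mathrm{FDR} = \frac{FP}{FP + TP}, \qquad
\mathrm{PPV} = \frac{TP}{FP + TP}, \qquad
\mathrm{FDR} + \mathrm{PPV} = \frac{FP + TP}{FP + TP} = 1 .
```

In particular, if every selected factor is confirmed elsewhere, then FP = 0, so FDR = 0 and PPV = 1.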

We may still underestimate the number of risk factors, for the following reason: the number of variables selected by Figs. 2 and 3 represents the number of variables sufficient to achieve optimal performance, and adding more correlated/collinear variables would not decrease the error further. Among the factors not selected in Table 4, several are related to platelets: PLT itself, and MPV and PDW, which are related to platelet size. Because platelets are a key regulator of thrombosis and inflammation, both present in severe COVID-19 patients, they are good candidates for risk factors (Barrett et al., 2021, Rohlfing et al., 2021, Delshad et al., 2021). However, it has been suggested that the platelet large cell ratio (PLCR) (Daniels et al., 2021) is a better marker than MPV and PDW, and that the platelet-to-lymphocyte ratio (PLR) (Asan et al., 2021) is a better marker than platelet count; see, however, the contrary conclusions in Aydinyilmaz et al., 2021, Lippi et al., 2021.

Other factors not selected by our composite ranking in Table 4 include gender, monocyte, and MCH/MCV (mean corpuscular hemoglobin/volume). Male gender seems to be a risk factor for severity/mortality (Jin et al., 2020), though the effect size can be weak (Ortolan et al., 2020, Mukherjee and Pahan, 2021). We also cannot exclude the possibility of an impact from gender-specific comorbidities (Ya’qoub et al., 2021). Monocytes usually make up a very small percentage of all white blood cells. Although they could play a role in COVID-19 severity (Vanderbeke et al., 2021), neutrophils, which make up a much larger percentage of the white cell population, should provide more signal. MCH/MCV tend to be less significant than other variables, as reported in Ballaz et al., 2021, Rahman et al., 2022. ALT is in a special situation because it is highly correlated with another selected factor (AST). There were indications that ALT could be a relevant biomarker by itself (Malik et al., 2021), or not (Qin et al., 2020), or only within some patients (Bertolini et al., 2020).

The fact that some factors are not selected in our data but are mentioned in other publications as potentially relevant might reflect two situations. The first is that these factors are actually only weakly associated with COVID-19 severity/mortality and we are not able to detect them. The second is that no two datasets are identical, and there is always a chance that heterogeneity, unmeasured covariates, and variations due to finite sample sizes lead to inconsistent conclusions.

In the calculation of VIM in RSF, one variable at a time is removed/shuffled from the full model, and the most important variable is listed first. It is tempting to extend this to a stepwise variable selection procedure by removing/shuffling the most important variable at each stage (currently we remove/shuffle the least important variable first). However, this procedure does not produce an error curve that reaches a plateau (result not shown). Therefore, we did not use this procedure for deciding the cutoff in our variable list.
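The shuffling idea behind VIM can be illustrated with a simplified sketch: shuffle one variable at a time, keep the fitted model fixed, and record how much the error increases. This is a generic permutation importance, not the RSF implementation: the fixed predictor `toy_predict`, the mean squared error, and the toy data are assumptions made so that the example is self-contained, whereas RSF works with OOB samples and a survival-specific error.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(
    rows: Sequence[Sequence[float]],
    targets: Sequence[float],
    predict: Callable[[Sequence[float]], float],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean increase in squared error when each variable is shuffled in turn."""
    rng = random.Random(seed)

    def mse(data: Sequence[Sequence[float]]) -> float:
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(targets)

    baseline = mse(rows)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between variable j and the outcome
            shuffled = [
                list(r[:j]) + [column[i]] + list(r[j + 1:])
                for i, r in enumerate(rows)
            ]
            increases.append(mse(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data: the outcome depends strongly on x0, weakly on x1, not at all on x2.
rows = [[float(i), float((7 * i) % 10), float((3 * i) % 10)] for i in range(10)]


def toy_predict(r: Sequence[float]) -> float:
    return 3.0 * r[0] + 0.5 * r[1]


targets = [toy_predict(r) for r in rows]
vim = permutation_importance(rows, targets, toy_predict)
ranking = sorted(range(3), key=lambda j: vim[j], reverse=True)
```

Sorting the variables by decreasing importance gives the kind of rank order referred to above, with the most important variable listed first; a variable the model does not use has an importance of exactly zero in this sketch.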

In conclusion, with growing interest in using blood tests for prognosis in COVID-19 patients (COMBAT Consortium, 2022), we have carried out a careful analysis of blood-test factors affecting COVID-19 hazard in time-to-event data from a Turkish cohort. We use multiple measures and methods to rank factors: some are traditional single-variable tests (Cox proportional hazards model), whereas others are multiple-variable machine learning techniques (random survival forest). A novel choice in our composite ranking is to utilize shorter hospital stay for released patients to discover protective factors, which should also be risk factors when the factor value changes in the opposite direction. This approach complements the use of time-to-death information for deceased patients. All of our top choices in the composite ranking list are confirmed by at least one other study as risk factors for COVID-19 severity and/or mortality, resulting in a 100% positive predictive value.

Conflict of Interest

There is no conflict of interest to declare.

Acknowledgments

WL thanks the support from the Robert S Boas Center for Genomics and Human Genetics.

References

  1. Ahmed S., Ahmed Z.A., Siddiqui I., Rashid N.H., Mansoor M., Jafri L. Evaluation of serum ferritin for prediction of severity and mortality in COVID-19- A cross sectional study. Ann. Med. Surg. 2021;63 doi: 10.1016/j.amsu.2021.02.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Alafif T., Tehame A.M., Bajaba S., Barnawi A., Zia S. Machine and deep learning towards COVID-19 diagnosis and treatment: survey, challenges, and future directions. Int. J. Environ. Res. Public Health. 2021;18:1117. doi: 10.3390/ijerph18031117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Aljameel S.S., Khan I.U., Aslam N., Aljabri M., Alsulmi E.S. Machine learning-based model to predict the disease severity and outcome in COVID-19 patients. Sci. Program. 2021;2021 [Google Scholar]
  4. Allison P. Event History Analysis: Regression for Longitudinal Event Data. 2nd edition. SAGE Publications; 2014. [Google Scholar]
  5. Aloisio E., Colombo G., Arrigo C., Dolci A., Panteghini M. Sources and clinical significance of aspartate aminotransferase increases in COVID-19. Clin. Chim. Acta. 2021;522:88–95. doi: 10.1016/j.cca.2021.08.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. An C., Lim H., Kim D.W., Chang J.H., Choi Y.J., Kim S.W. Machine learning prediction for mortality of patients diagnosed with COVID-19: a nationwide Korean cohort study. Sci. Rep. 2020;10:18716. doi: 10.1038/s41598-020-75767-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Asan A., Ustundag Y., Koca N., Şimşek A., Sayan H.E., Parildar H., Cilo B.D., Huysal K. Do initial hematologic indices predict the severity of COVID-19 patients? Turkish J. Med. Sci. 2021;51:39–44. doi: 10.3906/sag-2007-97. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Aydinyilmaz D., Aksakal E., Pamukcu H.E., Aydemir S., Dogan R., Sarac I., Aydin S.S., Kalkan K., Gulcu O., Tanboga I.H. Significance of MPV, RDW and PDW with the severity and mortality of COVID-19 and effects of acetylsalicylic acid use. Clin. Appl. Thromb. /Hemost. 2021;27:1–8. doi: 10.1177/10760296211048808. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Ballaz S.J., Pulgar-Sánchez M., Chamorro K., Fernández-Moreira E., Ramírez H., Mora F.X., Fors M. Common laboratory tests as indicators of COVID-19 severity on admission at high altitude: a single-center retrospective study in Quito (ECUADOR) Clin. Chem. Lab Med. 2021;59:e326–e329. doi: 10.1515/cclm-2021-0156. [DOI] [PubMed] [Google Scholar]
  10. Barish M., Bolourani S., Lau L.F., Shah S., Zanos T.P. External validation demonstrates limited clinical utility of the interpretable mortality prediction model for patients with COVID-19. Nat. Mach. Intell. 2021;3:25–27. [Google Scholar]
  11. Barrett T.J., Bilaloglu S., Cornwell M., Burgess H.M., Virginio V.W., Drenkova K., Ibrahim H., Yuriditsky E., Aphinyanaphongs Y., Lifshitz M., Liang F.X., Alejo J., Smith G., Pittaluga S., Rapkiewicz A.V., Wang J., Iancu-Rubin C., Mohr I., Ruggles K., Stapleford K.A., Hochman J., Berger J.S. Platelets contribute to disease severity in COVID-19. J. Thromb. Haemost. 2021;19:3139–3153. doi: 10.1111/jth.15534. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Bennett T.D., Moffitt R.A., Hajagos J.G., Amor B., Anand A., Bissell M.M., Bradwell K.R., Bremer C., Byrd J.B., Denham Alina, DeWitt P.E., Gabriel D., Garibaldi B.T., Gabriel A.T., Guinney J., Hill E.L., Hong S.S., Jimenez H., Kavuluru R., Kostka K., Lehmann H.P., Levitt E., Mallipattu S.K., Manna A., McMurry J.A, Morris M., Muschelli J., Neumann A.J, Palchuk M.M.B., Pfaff E.R., Qian Z., Qureshi N., Russell S., Spratt H., Walden A., Williams A.E., Wooldridge J.T., Yoo Y.J., Zhang X.T., Zhu R.L., Austin C.P., Saltz J.H., Gersing K.R., Haendel M.A., Chute C.G., for the National COVID Cohort Collaborative (NC) Consortium Clinical characterization and prediction of clinical severity of SARS-CoV-2 infection among US adults using data from the US National COVID Cohort Col. JAMA Netw. Open. 2021;4 doi: 10.1001/jamanetworkopen.2021.16901. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Benoit J.L., Benoit S.W, de Oliveira M.H.S., Lippi G., Henry B.M. Anemia and COVID-19: a prospective perspective. J. Med. Virol. 2021;93:708–711. doi: 10.1002/jmv.26530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Berger J.S., Kunichoff D., Adhikari S., Ahuja T., Amoroso N., Aphinyanaphongs Y., Cao M., Goldenberg R., Hindenburg A., Horowitz J., Parnia S., Petrilli C., Reynolds H., Simon E., Slater J., Yaghi S., Yuriditsky E., Hochman J., Horwitz L.I. Prevalence and outcomes of d-dimer elevation in hospitalized patients with COVID-19. Arterioscler. Thromb. Vasc. Biol. 2020;40:2539–2547. doi: 10.1161/ATVBAHA.120.314872. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Bertolini A., van de Peppel I.P., Bodewes F.A.J.A, Moshage H., Fantin A., Farinati F., Fiorotto R., Jonker J.W., Strazzabosco M., Verkade H.J., Peserico G. Abnormal liver function tests in patients with COVID-19: relevance and potential pathogenesis. Hepatology. 2020;72:1864–1872. doi: 10.1002/hep.31480. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Borenstein M., Hedges L.V., Higgins J.P.T., Rothstein H.R. Wiley; 2009. Introduction to Meta-Analysis. [Google Scholar]
  17. Breiman L. Random forests. Mach. Learn. 2001;45:5–32. [Google Scholar]
  18. Brier G.W. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 1950;78:1–3. [Google Scholar]
  19. Carubbi F., Salvati L., Alunno A., Maggi F., Borghi E., Mariani R., Mai F., Paoloni M., Ferri C., Desideri G., Cicogna S., Grassi Davide. Ferritin is associated with the severity of lung involvement but not with worse prognosis in patients with COVID-19: data from two Italian COVID-19 units. Sci. Rep. 2021;11:4863. doi: 10.1038/s41598-021-83831-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Carvalho H.De, Richard M.C., Chouihed T., Goffinet N., LeBastard Q., Freund Y., Kratz A., Dubroux M., Masson D., Figueres L., Montassier E. Electrolyte imbalance in COVID-19 patients admitted to the Emergency Department: a case-control study. Intern. Emerg. Med. 2021;16:1945–1950. doi: 10.1007/s11739-021-02632-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Cetin S., Ulgen A., Dede I., Li W. On fair performance comparison between Random Survival Forest and Cox regression: an example of colorectal cancer study. Sci. Med. J. 2021;3(1):66–76. [Google Scholar]
  22. Cetin S., Ulgen A., Sivgin H., Li W. Approximate reciprocal relationship between two cause-specific hazard ratios in COVID-19 data with mutually exclusive events. medRxiv. 2021 doi: 10.1101/2021.04.22.21255955. [DOI] [PubMed] [Google Scholar]
  23. Cetin S., Ulgen A., Sivgin H., Li W. A study on factors impacting length of hospital stay of COVID-19 inpatients. J. Contemp. Med. 2021;11:396–404. [Google Scholar]
  24. Cetin S., Ulgen A., Balci P.O., Sivgin H., Cetin M., Sirgin S., Li W. Survival analyses of COVID-19 patients in a Turkish cohort: comparison between using time to death and time to release. Sci. Med. J. 2021;3(Spec. Issue COVID-19):1–9. [Google Scholar]
  25. Chen J., Li K., Zhang Z., Li K., Yu P.S. A survey on applications of artificial intelligence in fighting against COVID-19. ACM Comp. Surveys. 2022;54:158. [Google Scholar]
  26. Cheng Y., Luo R., Wang K., Zhang M., Wang Z., Dong L., Li J., Yao Y., Ge S., Xu G. Kidney disease is associated with in-hospital death of patients with COVID-19. Kidney Int. 2020;97:829–838. doi: 10.1016/j.kint.2020.03.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Colquhoun D. The reproducibility of research and the misinterpretation of p-values. R. Soc. Open Sci. 2017;4 doi: 10.1098/rsos.171085. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. COMBAT Consortium, 2022. A blood atlas of COVID-19 defines hallmarks of disease severity and specificity, Cell, 185: 916–938. [DOI] [PMC free article] [PubMed]
  29. Coppelli A., Giannarelli R., Aragona M., Penno G., Falcone M., Tiseo G., Ghiadoni L., Barbieri G., Monzani F., Virdis A., Menichetti F., Del Prato S., on behalf of the Pisa COVID-19 Study Group Hyperglycemia at hospital admission is associated with severity of the prognosis in patients hospitalized for COVID-19: the Pisa COVID-19 study. Diabetes Care. 2020;43:2345–2348. doi: 10.2337/dc20-1380. [DOI] [PubMed] [Google Scholar]
  30. Cornelius E., Akman O., Hrozencik D. COVID-19 mortality prediction using machine learning-integrated Random Forest algorithm under varying patient frailty. Mathematics. 2021;9:2043. [Google Scholar]
  31. Daniels S., Wei H., Denning D.W. Platelet size as a predictor for severity and mortality in COVID-19 patients: a systematic review and meta-analysis. medRxiv. 2021 doi: 10.1101/2021.07.15.21260576. [DOI] [Google Scholar]
  32. De Carvalho H., Letellier T., Karakachoff M., Desvaux G., Caillon H., Papuchon E., Bentoumi-Loaec M., Benaouicha N., Canet E., Chapelet G., LeTurnier P., Montassier E., Rouhani A., Goffinet N., Figueres L. Hyponatremia is associated with poor outcome in COVID-19. J. Nephrol. 2021;34:991–998. doi: 10.1007/s40620-021-01036-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Deepthi K., Jereesh A.S., Liu Y. A deep learning ensemble approach to prioritize antiviral drugs against novel coronavirus SARS-CoV-2 for COVID-19 drug repurposing. Appl. Soft Comp. 2021;113(B) doi: 10.1016/j.asoc.2021.107945. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Delshad M., Safaroghli-Azar A., Pourbagheri-Sigaroodi A., Poopak B., Shokouhi S., Bashash D. Platelets in the perspective of COVID-19; pathophysiology of thrombocytopenia and its implication as prognostic and therapeutic opportunity. Int. Immunopharm. 2021;99 doi: 10.1016/j.intimp.2021.107995. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Díaz-Uriarte R., De Andreés S. Gene selection and classification of microarray data using random forest. BMC Bioinf. 2006;7:3. doi: 10.1186/1471-2105-7-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Ding Z.Y., Li G.X., Chen L., Chen X.P., Zhang B. Association of liver abnormalities with in-hospital mortality in patients with COVID-19. J. Hepatol. 2020;74:1295–1302. doi: 10.1016/j.jhep.2020.12.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Dogan O., Tiwari S., Jabbar M.A., Guggari S. A systematic review on AI/ML approaches against COVID-19 outbreak. Complex Intell. Syst. 2021;7:2655–2678. doi: 10.1007/s40747-021-00424-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Fadl N., Ali E., Salem T.Z. COVID-19: risk factors associated with infectivity and severity. Scand. J. Immunol. 2021;93 doi: 10.1111/sji.13039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Fernández-Delgado M., Cernadas E., Barro S. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 2014;15:3133–3181. [Google Scholar]
  40. Filippo Ldi, Formenti A.M., Rovere-Querini P., Carlucci M., Conte C., Ciceri F., Zangrillo A., Giustina A. Hypocalcemia is highly prevalent and predicts hospitalization in patients with COVID-19. Endocrine. 2020;68:475–478. doi: 10.1007/s12020-020-02383-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Fine J.P., Gray R.J. A proportional hazards model for the subdistribution of a competing risk. J. Am Stat. Assoc. 1999;94:496–509. [Google Scholar]
  42. Foy B.H., Carlson J.C.T., Reinertsen E., Valls R.P.I., Lopez R.P., Palanques-Tost E., Mow C., Westover M.B., Aguirre A.D., Higgins J.M. Association of red blood cell distribution width with mortality risk in hospitalized adults with SARS-CoV-2 infection. JAMA Netw. Open. 2020;3 doi: 10.1001/jamanetworkopen.2020.22058. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Ghahramani S., Tabrizi R., Lankarani K.B., Kashani S.M.A., Rezaei S., Zeidi N., Akbari M., Heydari S.T., Akbari H., Nowrouzi-Sohrabi P., Ahmadizar F. Laboratory features of severe vs. non-severe COVID-19 patients in Asian populations: a systematic review and meta-analysis. Eur. J. Med. Res. 2020;25:30. doi: 10.1186/s40001-020-00432-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Gu X., Li X., An X., Yang S., Wu S., Yang X., Wang H. Elevated serum aspartate aminotransferase level identifies patients with coronavirus disease 2019 and predicts the length of hospital stay. J. Clin. Lab Anal. 2021;34 doi: 10.1002/jcla.23391. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Guan W.J., Ni Z.Y., Hu Y., Liang W.H., Ou C.Q., He J.X., Liu L., Shan H., Lei C.L., Hui D.S.C., Du B., Li L.J., et al. for the China Medical Treatment Expert Group for Covid-19 Clinical characteristics of coronavirus disease 2019 in China. N. Eng. J. Med. 2020;382:1708–1720. doi: 10.1056/NEJMoa2002032. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Guarisco G., Fasolo M., Capoccia D., Morsello G., Carraro A., Zuccalá P., Marocco R., Del Borgo C., Pelle G., Iannarelli A., Orlando E., Spagnoli A., Carbone I., Lichtner M., Iacobellis G., Leonetti F., the COVID-19 Latina Study Group Blood glucose and epicardial adipose tissue at the hospital admission as possible predictors for COVID-19 severity. Endocrine. 2022;75:10–18. doi: 10.1007/s12020-021-02925-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Haidich A.B. Meta-analysis in medical research. Hippokratia. 2010;14(suppl 1):29–37. [PMC free article] [PubMed] [Google Scholar]
  48. Halsey L.G. The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum? Biol. Lett. 2019;15 doi: 10.1098/rsbl.2019.0174. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Han Y., Zhang H., Mu S., Wei W., Jin C., Tong C., Song Z., Zha Y., Xue Y., Gu G. Lactate dehydrogenase, an independent risk factor of severe COVID-19 patients: a retrospective and observational study. Aging. 2020;12:11245–11258. doi: 10.18632/aging.103372. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Hariyanto T.I., Kurniawan A. Anemia is associated with severe coronavirus disease 2019 (COVID-19) infection. Transf. Apher. Sci. 2020;59 doi: 10.1016/j.transci.2020.102926. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Harrell F.E., Jr., Califf R.M., Pryor D.B., Lee K.L., Rosati R.A. Evaluating the yield of medical tests. J. Am. Med. Assoc. 1982;247:2543–2546. [PubMed] [Google Scholar]
  52. Hastie T., Tibshirani R., Friedman J. Springer; 2009. The Elements of Statistical Learning, 2nd edition. [Google Scholar]
  53. Henry B.M., Aggarwal G., Wong J., Benoit S., Vikse J., Plebani M., Lippi G. Lactate dehydrogenase levels predict coronavirus disease 2019 (COVID-19) severity and mortality: a pooled analysis. Am. J. Emerg. Med. 2020;38:1722–1726. doi: 10.1016/j.ajem.2020.05.073. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Huang G., Kovalic A.J., Graber C.J. Prognostic value of leukocytosis and lymphopenia for coronavirus disease severity. Emerg. Infect. Dis. 2020;26:1839–1841. doi: 10.3201/eid2608.201160. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Ioannidis J.P.A. The proposal to lower P value thresholds to.005. J. Am. Med. Assoc. 2018;319:1429–1430. doi: 10.1001/jama.2018.1536. [DOI] [PubMed] [Google Scholar]
  56. Ishwaran H., Kogalur U.B. Random survival forests for R. Rnews. 2007;7:25–31. [Google Scholar]
  57. Ishwaran H., Kogalur U.B., Blackstone E.H., Lauer M.S. Random survival forest. Ann. Appl. Stat. 2008;2:841–860. [Google Scholar]
  58. Jansen J., Reimer K.C., Nagai J.S., Varghese F.S., Overheul G.J., de Beer M., Roverts R., Daviran D., Fermin L.A.S., Willemsen B., Beukenboom M., Djudjaj S., vonStillfried S., van Eijk L.E., Mastik M., Bulthuis M., denDunnen W., van Goor H., et al. SARS-CoV-2 infects the human kidney and drives fibrosis in kidney organoids. Cell Stem Cell. 2022;29:217–231. doi: 10.1016/j.stem.2021.12.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Jin J.M., Bai P., He W., Wu F., Liu X.F., Han D.M., Liu S., Yang J.K. Gender differences in patients with COVID-19: focus on severity and mortality. Front. Public Health. 2020;8:152. doi: 10.3389/fpubh.2020.00152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Jin Z.M., Shi J.C., Zheng M., Chen Q.L., Zhou Y.Y., Cheng F., Cai J., Jiang X.G. Increased levels of lactate dehydrogenase and hypertension are associated with severe illness of COVID-19. World J. Clin. Cases. 2022;10:128–136. doi: 10.12998/wjcc.v10.i1.128. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Kalbfleisch J.D., Prentice R.L. Wiley-Interscience; 1980. The Statistical Analysis of Failure Time Data. [Google Scholar]
  62. Karthikeyan A., Garg A., Vinod P.K., Priyakumar U.D. Machine learning based clinical decision support system for early COVID-19 mortality prediction. Front. Public Health. 2021;9:475. doi: 10.3389/fpubh.2021.626697. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Kattan M.W., Gerds T.A. The index of prediction accuracy: an intuitive measure useful for evaluating risk prediction models. Diagn. Progn. Res. 2018;2:7. doi: 10.1186/s41512-018-0029-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Kaushal K., Kaur H., Sarma P., Bhattacharyya A., Sharma D.J., Prajapat M., Pathak M., Kothari A., Kumar S., Rana S., Kaur M., Prakash A., Mirza A.A., Panda P.K., Vivekanandan S., Omar B.J., Medhi B., Naithani M. Serum ferritin as a predictive biomarker in COVID-19. A systematic review, meta-analysis and meta-regression analysis. J. Crit. Care. 2022;67:172–181. doi: 10.1016/j.jcrc.2021.09.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Kermali M., Khalsa R.K., Pillai K., Ismail Z., Harky A. The role of biomarkers in diagnosis of COVID-19 - A systematic review. Life Sci. 2020;254 doi: 10.1016/j.lfs.2020.117788. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Khan M., Mehran M.T., Haq Z.U., Ullah Z., Naqvi S.R., Ihsan M., Abbass H. Applications of artificial intelligence in COVID-19 pandemic: a comprehensive review. Exp. Syst. Appl. 2021;185 doi: 10.1016/j.eswa.2021.115695. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Kocadagli O., Baygul A., Gokmen N., Incir S., Aktan C. Clinical prognosis evaluation of COVID-19 patients: an interpretable hybrid machine learning approach. Curr. Res. Transl. Med. 2022;70 doi: 10.1016/j.retram.2021.103319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Lee K.K., Rahimi O., Lee C.K., Shafi A., Hawwass D. A meta-analysis: coronary artery calcium score and COVID-19 prognosis. Med. Sci. 2022;10:5. doi: 10.3390/medsci10010005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Li M., Zhang Z., Cao W., Liu Y., Du B., Chen C., Liu Q., Uddin M.N., Jiang S., Chen C., Zhang Y., Wang X. Identifying novel factors associated with COVID-19 transmission and fatality using the machine learning approach. Sci. Total Environ. 2021;764 doi: 10.1016/j.scitotenv.2020.142810. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Li W. The-more-the-better and the-less-the-better. Bioinformatics. 2006;22:2187–2188. doi: 10.1093/bioinformatics/btl189. [DOI] [PubMed] [Google Scholar]
  71. Li W., Shih A., Freudenberg-Hua Y., Fury W., Yang Y. Beyond standard pipeline and p < 0.05 in pathway enrichment analyses. Comp. Biol. Chem. 2021;92 doi: 10.1016/j.compbiolchem.2021.107455. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Li W.T., Ma J., Shende N., Castaneda G., Chakladar J., Tsai J.C., Apostol L., Honda C.O., Xu J., Wong L.M., Zhang T., Lee A., Gnanasekar A., Honda T.K., Kuo S.Z., Yu M.A., Chang E.Y., Rajasekaran M., Ongkeko W.M. Using machine learning of clinical data to diagnose COVID-19: a systematic review and meta-analysis. BMC Med. Inf. Dec. Mak. 2020;20:247. doi: 10.1186/s12911-020-01266-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Lin Z., Long F., Yang Y., Chen X., Xu L., Yang M. Serum ferritin as an independent risk factor for severity in COVID-19 patients. J. Infect. 2020;81:647–679. doi: 10.1016/j.jinf.2020.06.053. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Lindsley A.W., Schwartz J.T., Rothenberg M.E. Eosinophil responses during COVID-19 infections and coronavirus vaccination. J. Allergy Clin. Immun. 2020;146:1–7. doi: 10.1016/j.jaci.2020.04.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Ling P., Luo S., Zheng X., Cai G., Weng J. Elevated fasting blood glucose within the first week of hospitalization was associated with progression to severe illness of COVID-19 in patients with preexisting diabetes: a multicenter observational study. J. Diabetes. 2020;13:89–93. doi: 10.1111/1753-0407.13121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Lippi G., South A.M., Henry B.M. Electrolyte imbalances in patients with severe coronavirus disease 2019 (COVID-19) Ann. Clin. Biochem. 2020;57:262–265. doi: 10.1177/0004563220922255. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Lippi G., Henry B.M., Favaloro E.J. Mean platelet volume predicts severe COVID-19 illness. Semin. Thromb. Hemost. 2021;47:456–459. doi: 10.1055/s-0041-1727283. [DOI] [PubMed] [Google Scholar]
  78. Liu J., Han P., Wu J., Gong J., Tian D. Prevalence and predictive value of hypocalcemia in severe COVID-19 patients. J. Infect. Public Health. 2020;13:1224–1228. doi: 10.1016/j.jiph.2020.05.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Liu X., Zhou H., Zhou Y., Wu X., Zhao Y., Lu Y., Tan W., Yuan M., Ding X., Zou J., Li R., Liu H., Ewing R.M., Hu Y., Nie H., Wang Y. Risk factors associated with disease severity and length of hospital stay in COVID-19 patients. J. Infect. 2020;81:e95–e97. doi: 10.1016/j.jinf.2020.04.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Longato E., Vettoretti M., DiCamillo B. A practical perspective on the concordance index for the evaluation and selection of prognostic time-to-event models. J. Biomed. Inform. 2020;108 doi: 10.1016/j.jbi.2020.103496. [DOI] [PubMed] [Google Scholar]
  81. Louppe, G., 2014. Understanding random forests: from theory to practice, Ph.D Thesis (Department of Electrical Engineering and Computer Science, Université de Liège).
  82. Lu M., Ishwaran H. J. Thorac. Card. Surg. 2018;155:1130–1136. doi: 10.1016/j.jtcvs.2017.08.056. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Ma A., Cheng J., Yang J., Dong M., Liao X., Kang Y. Neutrophil-to-lymphocyte ratio as a predictive biomarker for moderate-severe ARDS in severe COVID-19 patients. Crit. Care. 2020;24:288. doi: 10.1186/s13054-020-03007-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Mahdavi M., Choubdar H., Zabeh E., Rieder M., Safavi-Naeini S., Jobbagy Z., Ghorbani A., Abedini A., Kiani A., Khanlarzadeh V., Lashgari R., Kamrani E. A machine learning based exploration of COVID-19 mortality risk. PLoS One. 2021;16 doi: 10.1371/journal.pone.0252384. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Majeed A., Hwang S.O. Data-driven analytics leveraging artificial intelligence in the era of COVID-19: an insightful review of recent developments. Symmetry. 2022;14:16. [Google Scholar]
  86. Malik M., Iqbal M.W., Shahzad S.K., Mushtaq M.T., Naqvi M.R., Kamran M., Khan B.A., Tahir M.U. Determination of COVID-19 patients using machine learning algorithms. Intell. Autom. Soft Comput. 2022;31:207–222. [Google Scholar]
  87. Malik P., Patel U., Mehta D., Patel N., Kelkar R., Akrmah M., Gabrilove J.L., Sacks H. Biomarkers and outcomes of COVID-19 hospitalisations: systematic review and meta-analysis. BMJ Evid. -Based Med. 2021;26:107–108. doi: 10.1136/bmjebm-2020-111536. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. McCoy D., Mgbara W., Horvitz N., Getz W.M., Hubbard A. Ensemble machine learning of factors influencing COVID-19 across US counties. Sci. Rep. 2021;11:11777. doi: 10.1038/s41598-021-90827-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Meizlish M.L., Pine A.B., Bishai J.D., Goshua G., Nadelmann E.R., Simonov M., Chang C.H., Zhang H., Shallow M., Bahel P., Owusu K., Yamamoto Y., Arora T., Atri D.S., Patel A., Gbyli R., Kwan J., Won C.H., Dela Cruz C., Price C., Koff J., King B.A., Rinder H.M., Wilson F.P., Hwa J., Halene S., Damsky W., van Dijk D., Lee A.I., Chun H.J. A neutrophil activation signature predicts critical illness and mortality in COVID-19. Blood Adv. 2021;5:1164–1177. doi: 10.1182/bloodadvances.2020003568. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Mertoglu C., Huyut M.T., Arslan Y., Ceylan Y., Coban T.A. How do routine laboratory tests change in coronavirus disease 2019? Scand. J. Clin. Lab. Investig. 2021;81:24–33. doi: 10.1080/00365513.2020.1855470. [DOI] [PubMed] [Google Scholar]
  91. Mo P., Xing Y., Xiao Y., Deng L., Zhao Q., Wang H., Xiong Y., Cheng Z., Gao S., Liang K., Luo M., Chen T., Song S., Ma Z., Chen X., Zheng R., Cao Q., Wang F., Zhang Y. Clinical characteristics of refractory COVID-19 pneumonia in Wuhan, China. Clin. Infect. Dis. 2020;2020 doi: 10.1093/cid/ciaa270. [DOI] [PMC free article] [PubMed] [Google Scholar]
  92. Mukherjee S., Pahan K. Is COVID-19 gender-sensitive? J. Neuroimmune Pharmacol. 2021;16:38–47. doi: 10.1007/s11481-020-09974-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. Nembrini S., König I.R., Wright M.N. The revival of the Gini importance? Bioinformatics. 2018;34:3711–3718. doi: 10.1093/bioinformatics/bty373. [DOI] [PMC free article] [PubMed] [Google Scholar]
  94. Nuzzo R. Scientific method: statistical errors: p values, the -?gold standard’ of statistical validity, are not as reliable as many scientists assume. Nature. 2014;506:150–152. doi: 10.1038/506150a. [DOI] [PubMed] [Google Scholar]
  95. Ortolan A., Lorenzin M., Felicetti M., Doria A., Ramonda R. Does gender influence clinical expression and disease outcomes in COVID-19? A systematic review and meta-analysis. Int. J. Infect. Dis. 2020;99:496–504. doi: 10.1016/j.ijid.2020.07.076. [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Pan L., Huang P., Xie X., Xu J., Guo D., Jiang Y. Metabolic associated fatty liver disease increases the severity of COVID-19: a meta-analysis. Dig. Liver Dis. 2021;53:153–157. doi: 10.1016/j.dld.2020.09.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. Piccialli F., diCola V.S., Giampaolo F., Cuomo S. The role of artificial intelligence in fighting the COVID-19 pandemic. Info Syst. Front. 2021;23:1467–1497. doi: 10.1007/s10796-021-10131-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  98. Probst P., Wright M.N., Boulesteix A.L. Hyperparameters and tuning strategies for random forest. WIREs Data Min. Knowl. Discov. 2019;9 [Google Scholar]
  99. Qin C., Wei Y., Lyu X., Zhao B., Feng Y., Li T., Cao H., Yang X., Zhou X., Wang W., You L., Wang Y. High aspartate aminotransferase to alanine aminotransferase ratio on admission as risk factor for poor prognosis in COVID-19 patients. Sci. Rep. 2020;10:16496. doi: 10.1038/s41598-020-73575-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  100. Rahman T., Khandakar A., Abir F.F., Faisal M.A.A., Hossain M.S., Podder K.K., Abbas T.O., Alam M.F., Kashem S.B., Islam M.T., Zughaier S.M., Chowdhury M.E.H. QCovSML: a reliable COVID-19 detection system using CBC biomarkers by a stacking machine learning model. Comp. Biol. Med. 2022;143 doi: 10.1016/j.compbiomed.2022.105284. [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Reusch N., De Domenico E., Bonaguro L., Schulte-Schrepping J., Bassler K., Schultze J.L., Aschenbrenner A.C. Neutrophils in COVID-19. Front. Immunol. 2021;12 doi: 10.3389/fimmu.2021.652470. [DOI] [PMC free article] [PubMed] [Google Scholar]
  102. Rodriguez L., Pekkarinen P.T., Lakshmikanth T., Tan Z., Consiglio C.R., Pou C., Chen Y., Mugabo C.H., Nguyen N.A., Nowlan K., Strandin T., Levanov L., Mikes J., Wang J., Kantele A., Hepojoki J., Vapalahti O., Heinonen S., Kekäläinen E., Brodin P. Systems-level immunomonitoring from acute to recovery phase of severe COVID-19. Cell Rep. Med. 2020;1 doi: 10.1016/j.xcrm.2020.100078. [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Rohlfing A.K., Rath D., Geisler T., Gawaz Meinrad. Platelets and COVID-19. Hamostaseologie. 2021;41:379–385. doi: 10.1055/a-1581-4355. [DOI] [PubMed] [Google Scholar]
  104. Rosenthal N., Cao Z., Gundrum J., Sianis J., Safo S. Risk factors associated with in-hospital mortality in a US national sample of patients with COVID-19. JAMA Netw. Open. 2020;3 doi: 10.1001/jamanetworkopen.2020.29058. [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Ryan T.P. Modern Regression Methods. 2nd edition. Wiley; 2008. [Google Scholar]
  106. Sayad B., Afshar Z.M., Mansouri F., Rahimi Z. Leukocytosis and alteration of hemoglobin level in patients with severe COVID-19: association of leukocytosis with mortality. Health Sci. Rep. 2020;3 doi: 10.1002/hsr2.194. [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Simon N., Friedman J., Hastie T., Tibshirani R. Regularization paths for Coxas proportional hazards model via coordinate descent. J. Stat. Softw. 2011;39:1–13. doi: 10.18637/jss.v039.i05. [DOI] [PMC free article] [PubMed] [Google Scholar]
  108. Singh A.K., Singh R. Hyperglycemia without diabetes and new-onset diabetes are both associated with poorer outcomes in COVID-19. Diabetes Res. Clin. Pract. 2020;167 doi: 10.1016/j.diabres.2020.108382. [DOI] [PMC free article] [PubMed] [Google Scholar]
  109. Sinha S., Rosin N.L., Arora R., Labit E., Jaffer A., Cao L., Farias R., Nguyen A.P., de Almeida L.G.N., Dufour A., Bromley A., McDonald B., Gillrie M.R., Fritzler M.J., Yipp B.G., Biernaskie J. Dexamethasone modulates immature neutrophils and interferon programming in severe COVID-19. Nat. Med. 2022;28:201–211. doi: 10.1038/s41591-021-01576-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  110. Skwiersky S., Rosengarten S., Change M., et al. Sugar is not always sweet: exploring the relationship between hyperglycemia and COVID-19 in a predominantly African. J. Endo. Soc. 2021;5:A350–A351. doi: 10.1136/bmjdrc-2021-002692. (suppl 1) [DOI] [PMC free article] [PubMed] [Google Scholar]
  111. Strobl C., Boulesteix A.L., Kneib T., Augustin T., Zeileis A. Conditional variable importance for random forests. BMC Bioinf. 2008;9:307. doi: 10.1186/1471-2105-9-307. [DOI] [PMC free article] [PubMed] [Google Scholar]
  112. Sun Y., Zhou J., Ye K. White blood cells and severe COVID-19: a Mendelian randomization study. J. Pers. Med. 2021;11:195. doi: 10.3390/jpm11030195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Szoke D., Caruso S., Aloisio E., Pasqualetti S., Dolci A., Panteghini M. Serum potassium concentrations in COVID-19. Clin. Chim. Acta. 2021;512:26–27. doi: 10.1016/j.cca.2020.11.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Tan L., Wang Q., Zhang D., Ding J., Huang Q., Tang Y.Q., Wang Q., Miao H. Lymphopenia predicts disease severity of COVID-19: a descriptive and predictive study. SIgnal Transd. Target. Ther. 2020;5:33. doi: 10.1038/s41392-020-0148-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Tan Y., Zhou J., Zhou Q., Hu L., Long Y. Role of eosinophils in the diagnosis and prognostic evaluation of COVID-19. J. Med. Virol. 2021;93:1105–1110. doi: 10.1002/jmv.26506. [DOI] [PubMed] [Google Scholar]
  116. Tang N., Li D., Wang X., Sun Z. Abnormal coagulation parameters are associated with poor prognosis in patients with novel coronavirus pneumonia. J. Thromb. Haemost. 2020;18:844–847. doi: 10.1111/jth.14768. [DOI] [PMC free article] [PubMed] [Google Scholar]
  117. Tao Z., Xu J., Chen W., Yang Z., Xu X., Liu L., Chen R., Xie J., Liu M., Wu J., Wang H., Liu J. Anemia is associated with severe illness in COVID-19: a retrospective cohort study. J. Med. Virol. 2021;93:1478–1488. doi: 10.1002/jmv.26444. [DOI] [PMC free article] [PubMed] [Google Scholar]
  118. Tavakolpour S., Rakhshandehroo T. Lymphopenia during the COVID-19 infection: what it shows and what can be learned. Immunol. Lett. 2020;225:31–32. doi: 10.1016/j.imlet.2020.06.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  119. Terra P.O.C., Donadel C.D., Oliveira L.C., Menegueti M.G., Auxiliadora-Martins M., Calado R.T., De Santis G.C. Neutrophil-to-lymphocyte ratio and D-dimer are biomarkers of death risk in severe COVID-19: a retrospective observational study. Health Sci. Rep. 2022;5 doi: 10.1002/hsr2.514. [DOI] [PMC free article] [PubMed] [Google Scholar]
  120. Thachil J., Tang N., Gando S., Falanga A., Cattaneo M., Levi M., Clark C., Iba T. ISTH interim guidance on recognition and management of coagulopathy in COVID-19. J. Thromb. Haemost. 2020;18:1023–1026. doi: 10.1111/jth.14810. [DOI] [PMC free article] [PubMed] [Google Scholar]
  121. Therneau T.M., Grambsch P.M. Springer; 2010. Modeling Survival Data: Extending the Cox Model. [Google Scholar]
  122. Tian J., Yuan X., Xiao J., Zhong Q., Yang C., Liu B., Cai Y., Lu Z., Wang J., Wang Y., Liu S., Cheng B., Wang J., Zhang M., Wang L., Niu S., Yao Z., Deng X., Zhou F., Wei W., Li Q., Chen X., Chen W., Yang Q., Wu S., Fan J., Shu B., Hu Z., Wang S., Yang X.P., Liu W., Miao X., Wang Z. Clinical characteristics and risk factors associated with COVID-19 disease severity in patients with cancer in Wuhan, China: a multicentre, retrospective, cohort study. Lancet. 2020;21:893–903. doi: 10.1016/S1470-2045(20)30309-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  123. Tian T., Zhang J., Hu L., Jiang Y., Duan C., Li Z., Wang X., Zhang H. Risk factors associated with mortality of COVID-19 in 3125 counties of the United States. Infect. Dis. Poverty. 2021;10:3. doi: 10.1186/s40249-020-00786-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  124. Tibshirani R. Regression Shrinkage and Selection via the lasso. J. Royal Stat. Soc. B. 1996;58:267–288. [Google Scholar]
  125. Tibshirani R. The lasso method for variable selection in the Cox model. Stat. Med. 1997;16:385–395. doi: 10.1002/(sici)1097-0258(19970228)16:4<385::aid-sim380>3.0.co;2-3. [DOI] [PubMed] [Google Scholar]
  126. Tzoulis P., Waung J.A., Bagkeris E., Hussein Z., Biddanda A., Cousins J., Dewsnip A., Falayi K., McCaughran W., Mullins C., Naeem A., Nwokolo M., Quah H., Bitat S., Deyab E., Ponnampalam S., Bouloux P.M., Montgomery H., Baldeweg S.E. Dysnatremia is a predictor for morbidity and mortality in hospitalized patients with COVID-19. J. Clin Endocrinol. Metab. 2021;106:1637–1648. doi: 10.1210/clinem/dgab107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  127. Ulgen A., Cetin S., Balci P.O., Sivgin H., Sivgin S., Cetin M., Li W. COVID-19 outpatients and surviving inpatients exhibit comparable blood test results that are distinct from non-surviving inpatients. Health Sci. Med. 2021;4(3):306–313. [Google Scholar]
  128. Vanderbeke L., Van Mol P., Van Herck Y., De Smet F., Humblet-Baron S., Martinod K., Antoranz A., Arijs I., Boeckx B., Bosisio F.M., Casaer M., Dauwe D., De Wever W., Dooms C., Dreesen E., Emmaneel A., Filtjens J., Gouwy M., Gunst J., Hermans G., Jansen S., Lagrou K., Liston A., Lorent N., Meersseman P., Mercier T., Neyts J., Odent J., Panovska D., Penttila P.A., Pollet E., Proost P., Qian J., Quintelier K., Raes J., Rex S., Saeys Y., Sprooten J., Tejpar S., Testelmans D., Thevissen K., Van Buyten T., Vandenhaute J., Van Gassen S., Velásquez Pereira L.C., Vos R., Weynand B., Wilmer A., Yserbyt J., Garg A.D., Matthys P., Wouters C., Lambrechts D., Wauters E., Wauters J. Monocyte-driven atypical cytokine storm and aberrant neutrophil activation as key mediators of COVID-19 disease severity. Nat. Commun. 2021;12:4117. doi: 10.1038/s41467-021-24360-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  129. Wang C., Zhang H., Cao X., Deng R., Ye Y., Fu Z., Gou L., Shao F., Li J., Fu W., Zhang X., Ding X., Xiao J., Wu C., Li T., Qi H., Li C., Lu Z. Red cell distribution width (RDW): a prognostic indicator of severe COVID-19. Ann. Transl. Med. 2020;8:120. doi: 10.21037/atm-20-6090. [DOI] [PMC free article] [PubMed] [Google Scholar]
  130. Wang D., Li R., Wang J., Jiang Q., Gao C., Yang J., Ge L., Hu Q. Correlation analysis between disease severity and clinical and biochemical characteristics of 143 cases of COVID-19 in Wuhan, China: a descriptive study. BMC Infect. Dis. 2020;20:519. doi: 10.1186/s12879-020-05242-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  131. Wang Z.H., Fu B.Q., Lin Y.W., Wei X.B., Geng H., Guo W.X., Yuan H.Q., Liao Y.W., Qin T.H., Li F., Wang S.H. Red blood cell distribution width: a severity indicator in patients with COVID-19. J. Med. Virol. 2022;94:2133–2138. doi: 10.1002/jmv.27602. [DOI] [PMC free article] [PubMed] [Google Scholar]
  132. Williamson E.J., Walker A.J., Bhaskaran K., Bacon S., Bates C., Morton C.E., Curtis H.J., Mehrkar A., Evans D., Inglesby P., Cockburn J., McDonald H.I., MacKenna B., Tomlinson L., Douglas I.J., Rentsch C.T., Mathur R., Wong A.Y.S., Grieve R., Harrison D., Forbes H., Schultze A., Croker R., Parry J., Hester F., Harper S., Perera R., Evans S.J.W., Smeeth L., Goldacre B. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020;584:430–436. doi: 10.1038/s41586-020-2521-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  133. Wu Y., Hou B., Liu J., Chen Y., Zhong P. Risk factors associated with long-term hospitalization in patients with COVID-19: a single-centered, retrospective study. Front. Med. 2020;7:315. doi: 10.3389/fmed.2020.00315. [DOI] [PMC free article] [PubMed] [Google Scholar]
  134. Xie G., Ding F., Han L., Yin D., Lu H., Zhang M. The role of peripheral blood eosinophil counts in COVID-19 patients. Allergy. 2021;76:471–482. doi: 10.1111/all.14465. [DOI] [PMC free article] [PubMed] [Google Scholar]
  135. Ya’qoub L., Elgendy I.Y., Pepine C.J. Sex and gender differences in COVID-19: More to be learned! Am. Heart J. Cardiol. Res. Pract. 2021;3 doi: 10.1016/j.ahjo.2021.100011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  136. Yan L., Zhang H.T., Goncalves J., Xiao Y., Wang M., Guo Y., Sun C., Tang X., Jing L., Zhang M., Huang X., Xiao Y., Cao H., Chen Y., Ren T., Wang F., Xiao Y., Huang S., Tan X., Huang N., Jiao B., Cheng C., Zhang Y., Luo A., Mombaerts L., Jin J., Cao Z., Li S., Xu H., Yuan Y. An interpretable mortality prediction model for COVID-19 patients. Nat. Mach. Intell. 2020;2:283–288. [Google Scholar]
  137. Ye B., Deng H., Zhao H., Liang J., Ke L., Li W. Association between an increase in blood urea nitrogen at 24-?h and worse outcomes in COVID-19 pneumonia. Ren. Fail. 2021;43:347–350. doi: 10.1080/0886022X.2021.1879855. [DOI] [PMC free article] [PubMed] [Google Scholar]
  138. Zeng X., Song X., Ma T., Pan X., Zhou Y., Hou Y., Zhang Z., Li K., Karypis G., Cheng F. Repurpose open data to discover therapeutics for COVID-19 using deep learning. J. Proteome Res. 2020;19:4624–4636. doi: 10.1021/acs.jproteome.0c00316. [DOI] [PubMed] [Google Scholar]
  139. Zhang B., Zhou X., Zhu C., Song Y., Feng F., Qiu Y., Feng J., Jia Q., Song Q., Zhu B., Wang J. Immune phenotyping based on the neutrophil-to-lymphocyte ratio and IgG level predicts disease severity and outcome for patients with COVID-19. Front. Mol. Biosci. 2020;7:157. doi: 10.3389/fmolb.2020.00157. [DOI] [PMC free article] [PubMed] [Google Scholar]
  140. Zhang W., Zhang Z., Ye Y., Luo Y., Pan S., Qi H., Yu Z., Qu J. Lymphocyte percentage and hemoglobin as a joint parameter for the prediction of severe and nonsevere COVID-19: a preliminary study. Ann. Transl. Med. 2020;8:1231. doi: 10.21037/atm-20-6001. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Computational Biology and Chemistry are provided here courtesy of Elsevier