Heliyon. 2024 Jun 23;10(13):e33337. doi: 10.1016/j.heliyon.2024.e33337

A novel clinical prediction model for in-hospital mortality in sepsis patients complicated by ARDS: A MIMIC IV database and external validation study

Ying Chen a,b,1,∗∗∗, Chengzhu Zong c,1, Linxuan Zou c, Zhe Zhang d,e, Tianke Yang d,e, Junwei Zong d,e,, Xianyao Wan a,⁎⁎
PMCID: PMC467048  PMID: 39027620

Abstract

Background

Sepsis complicated by ARDS significantly increases morbidity and mortality, underscoring the need for robust predictive models to enhance patient management.

Methods

We collected data on 6390 patients with ARDS-complicated sepsis from the MIMIC IV database. Following rigorous data cleaning, including outlier management, handling missing values, and transforming variables, we conducted univariate analysis and logistic multivariate regression. We employed the LASSO machine learning algorithm to identify risk factors closely associated with patient outcomes. These factors were then used to develop a new clinical prediction model. The model underwent preliminary assessment and internal validation, and its performance was further tested through external validation using data from 225 patients at a major tertiary hospital in China. This validation assessed the model's discrimination, calibration, and net clinical benefits.

Results

The model, illustrated by a concise nomogram, demonstrated significant discrimination with an area under the curve (AUC) of 0.711 in the internal validation set and 0.771 in the external validation set, outperforming conventional severity scores such as the SOFA and SAPS II. It also showed good calibration and net clinical benefits.

Conclusions

Our model serves as a valuable tool for identifying sepsis patients with ARDS at high risk of in-hospital mortality. This could enable the implementation of personalized treatment strategies, potentially improving patient outcomes.

Keywords: Sepsis, ARDS, Clinical prediction model, Mortality

1. Background

Sepsis, a life-threatening condition, remains a leading cause of mortality among critically ill patients worldwide [[1], [2], [3]]. Approximately 30 % of patients diagnosed with sepsis go on to develop Acute Respiratory Distress Syndrome (ARDS), which significantly increases mortality rates compared to other complications [4,5].

Clinical predictive models are innovative tools for risk assessment, estimating the likelihood of present conditions or future patient outcomes through specific formulas. These models deliver intuitive and evidence-based information to both healthcare professionals and patients [6]. Various models have been developed to evaluate the prognostic risks associated with sepsis and its complications. For instance, studies utilizing stepwise logistic regression have analyzed independent risk factors for sepsis-related liver injury, demonstrating the models' effectiveness through ROC curves with promising results [7]. Similarly, predictive models for sepsis-associated acute kidney injury have shown strong predictive performance in both training and validation cohorts, especially in elderly patients [8]. While numerous scoring models predict the severity and mortality associated with sepsis in the elderly, reports on models for in-hospital mortality, specifically in patients with sepsis-related ARDS, are limited.

The Medical Information Mart for Intensive Care (MIMIC) database is a vast, openly accessible repository containing de-identified health data from thousands of patients treated in intensive care units at Beth Israel Deaconess Medical Center from 2001 to 2019 [[9], [10], [11]]. This database offers detailed data on demographics, vital signs, lab results, procedures, medications, caregiver notes, imaging reports, and mortality outcomes. Utilizing the extensive clinical data from the MIMIC database provides robust and reliable support for predicting the prognostic risks in patients with sepsis complicated by ARDS [12].

This study aimed to address a research gap by analyzing data from the MIMIC database to identify patients with sepsis and ARDS and analyze the associated risk factors for in-hospital mortality. We utilized a combination of multivariate logistic regression modeling and machine learning algorithms to select key prognostic indicators of clinical relevance. Based on these indicators, we constructed a predictive model to estimate the risk of in-hospital mortality in patients with sepsis-associated ARDS, which was then internally validated.

Additionally, we retrospectively collected de-identified data from patients with sepsis and ARDS at a tertiary hospital in Liaoning Province, China. The newly developed predictive model was applied to this dataset, followed by external validation to assess its clinical applicability and value.

By leveraging comprehensive clinical data from the MIMIC database and conducting rigorous internal and external validations, this study aimed to develop a robust and clinically valuable predictive model for assessing the risk of in-hospital mortality in patients with sepsis-associated ARDS.

2. Methods

2.1. Database and study population

All clinical data for this study were obtained from two distinct sources: the MIMIC IV v2.0 [13] and the ICU of a large tertiary hospital in Liaoning Province, China. These two datasets are completely independent. Clinical data from the MIMIC database were used to construct and internally validate predictive models. Conversely, data from the ICU of the large tertiary hospital in Liaoning were used for external validation of the predictive model.

Access to the MIMIC IV database requires passing a qualifying test and obtaining approval from the MIMIC IV administration staff. Following the successful completion of the related training course, one author (Zong J) received permission to extract data for research purposes (certification number: 50766047). The Medical Ethics Committee of the hospital approved the data used for external validation of the model (PJ-KS-KY-2023-304).

Data from both the MIMIC and external validation datasets were selected based on the same inclusion and exclusion criteria. The external validation dataset was obtained from a tertiary hospital in Liaoning Province, China, between January 2016 and September 2022. Adult patients who met the criteria for sepsis-3 and ARDS were included in this study. The inclusion criteria were as follows: (1) diagnosed with sepsis-3 and ARDS, (2) first intensive care unit admission, and (3) age ≥18 years. The exclusion criterion was an ICU stay duration of less than 24 h.

2.2. Data extraction

We collected patient data that met the inclusion criteria, organized into the following categories: basic information (admission and ICU admission times, date of death if applicable, gender, age), vital signs (heart rate, respiratory rate, body temperature, systolic and diastolic blood pressure, mean arterial pressure), disease scores (SOFA, SAPS II [14], Charlson Comorbidity Index, CCI [15,16]), and laboratory tests (white blood cell count, hemoglobin, hematocrit, platelet count, bicarbonate, blood urea nitrogen, serum creatinine, sodium, potassium, blood glucose, prothrombin time, activated partial thromboplastin time, international normalized ratio, pH, oxygen saturation, arterial oxygen and carbon dioxide pressures, urine output). Additionally, we documented comorbidities (myocardial infarction, congestive heart failure, peripheral vascular disease, cerebrovascular disease, dementia, chronic lung disease, rheumatic disease, peptic ulcer disease, liver disease, diabetes, paralysis, kidney disease, malignancy, AIDS) and interventions (use of vasopressors and renal replacement therapy, RRT).

To ensure the accuracy and reliability of the data analysis, all variable values were selected based on the first measurements taken upon ICU admission. This approach was adopted to eliminate the potential effects of any interventions that occurred after ICU admission on the variable values.

2.3. Missing values and outlier handling

The first step involved assessing the extent of missing data within each variable. Variables that had more than 20 % of their observations missing were completely removed from the dataset. This threshold was set to maintain data integrity and ensure the robustness of the statistical analysis.

For variables where the missing data constituted less than 20 % of the observations, multiple imputation was employed to manage the missing entries [[17], [18], [19]]. Multiple imputation involves creating several different plausible imputations by replacing missing values with estimated values based on the other available data. The visualization and handling of missing values were conducted using the VIM and mice packages in R (Supplementary Fig. 1).

Next, we conducted outlier detection and processing. We examined the data distribution of each indicator and summarized it using quartiles, minimum and maximum values, and mean ± standard deviation (Supplementary Table 4). To mitigate the impact of outliers on data analysis, we retained only values between the 1st and 99th percentiles for indicators with extremely high or low values.
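The two screening rules described in this section (dropping variables with more than 20 % missing observations, and keeping values between the 1st and 99th percentiles) can be sketched as follows. This is a minimal illustration, not the R code used in the study: the function names are hypothetical, a variable with exactly 20 % missing is kept by assumption, out-of-range values are dropped rather than capped by assumption, and multiple imputation (done with the mice package in the study) is not reproduced here.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]  # None marks a missing observation


def missing_fraction(values: Column) -> float:
    """Share of observations that are missing in one variable."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_variables(data: Dict[str, Column], threshold: float = 0.20) -> Dict[str, Column]:
    """Keep only variables whose missing share does not exceed the threshold."""
    return {name: col for name, col in data.items() if missing_fraction(col) <= threshold}


def percentile(sorted_values: List[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted values."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def within_percentile_range(values: List[float], low: float = 1.0, high: float = 99.0) -> List[float]:
    """Return the values lying between the low and high percentiles (inclusive)."""
    observed = sorted(values)
    lower, upper = percentile(observed, low), percentile(observed, high)
    return [v for v in values if lower <= v <= upper]
```

Capping extreme values at the percentile bounds (winsorizing) instead of dropping them would be an equally plausible reading of the text; the section does not say which was used.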

2.4. Data processing

We transformed continuous variables exhibiting nonlinear relationships with prognosis into categorical variables based on their data distribution types and clinical significance. The variables that underwent this transformation included heart rate, mean arterial pressure (MAP), respiratory rate, body temperature, pH, urine output, and blood glucose.
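As an illustration of this transformation, the sketch below bins three of the variables using the cut points printed in Table 1. The function names are hypothetical, and the handling of values that fall between the printed bounds (for example a heart rate of 100.5) is an assumption.

```python
def categorize_heart_rate(bpm: float) -> str:
    """Table 1 bins: normal 60-100, moderate 101-120, severe <60 or >120."""
    if 60 <= bpm <= 100:
        return "normal"
    if 100 < bpm <= 120:
        return "moderate"
    return "severe"


def categorize_ph(ph: float) -> str:
    """Table 1 bins: normal 7.35-7.45, moderate 7.25-7.34 or 7.46-7.55, severe otherwise."""
    if 7.35 <= ph <= 7.45:
        return "normal"
    if 7.25 <= ph < 7.35 or 7.45 < ph <= 7.55:
        return "moderate"
    return "severe"


def categorize_map(mmhg: float) -> str:
    """Table 1 bins: normal 65-110 mmHg, abnormal <65 or >110 mmHg."""
    return "normal" if 65 <= mmhg <= 110 else "abnormal"
```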

Next, we split the clinical data obtained from the MIMIC database randomly, following a 7:3 ratio. The training set comprised 70 % of the total samples and was utilized for multiple regression analysis and model construction. The remaining 30 % of the case data constituted the internal validation set, used to validate the developed predictive model. Additionally, we employed de-identified data from sepsis patients with ARDS from a tertiary hospital in Liaoning Province, China, as the external validation set to further validate the predictive model using independent external data.
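A 7:3 random split of this kind can be sketched as below. The seed and the function name are hypothetical, and the split is a simple unstratified shuffle; whether the study stratified by outcome is not stated.

```python
import random
from typing import Iterable, List, Tuple


def split_train_validation(
    ids: Iterable[int], train_fraction: float = 0.7, seed: int = 2024
) -> Tuple[List[int], List[int]]:
    """Shuffle patient identifiers reproducibly and cut them at the training fraction."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # local generator, global state untouched
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# With the 6390 MIMIC IV patients this gives 4473 training and 1917 validation records.
train_ids, validation_ids = split_train_validation(range(6390))
```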

2.5. Variable selection and model construction

Patients were divided into two groups based on in-hospital mortality, and variables between the groups were compared. Sample size was evaluated based on the 10 events per variable (EPV) principle [20,21].
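As a rough worked check of the EPV rule, the snippet below divides the number of outcome events by 10. It assumes the event count is the 1880 in-hospital deaths reported for the full MIMIC IV cohort in Section 3.1; the number of deaths in the training set alone is not reported, so the bound for the training set would be lower.

```python
def max_candidate_predictors(n_events: int, events_per_variable: int = 10) -> int:
    """Largest number of candidate predictors allowed under an events-per-variable rule."""
    return n_events // events_per_variable


# 1880 deaths at 10 events per variable allow up to 188 candidate predictors.
epv_limit = max_candidate_predictors(1880)
```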

In the variable selection phase, univariate analysis was performed on all variables, using a threshold of p < 0.1 to select variables associated with in-hospital mortality. The selected variables were then entered into logistic regression models for multivariable analysis. Several logistic regression models were constructed (the full model, forward selection, backward elimination, and stepwise regression) and compared using the Akaike information criterion (AIC) [22]. Variables were retained at a threshold of p < 0.05.

Additionally, the LASSO machine learning algorithm was used for variable selection. The final set of variables for constructing the predictive model was then chosen by also considering the clinical significance of the selected variables.
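For reference, LASSO variable selection for a binary outcome is usually formulated as an L1-penalized logistic regression. The notation below is an assumption introduced for illustration and is not taken from the study: y_i is the in-hospital death indicator, x_i the predictor vector of patient i, and λ the penalty strength.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n}\sum_{i=1}^{n}
  \left[ y_i\left(\beta_0 + x_i^{\top}\beta\right)
       - \log\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

Larger values of λ shrink more coefficients exactly to zero, and the variables with nonzero coefficients are the ones retained. If the logarithm in the Fig. 2 caption is natural, log(lambda) = −4.398 corresponds to λ ≈ 0.0123.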

Internal and external validation of the model was conducted using the area under the receiver operating characteristic (ROC) curve (AUC) to assess discrimination, calibration curves to assess calibration, and decision curve analysis (DCA) to evaluate the clinical usefulness and net benefit of the model with the best diagnostic value. Furthermore, the predictive performance of the newly developed model was compared with that of the SOFA and SAPS II scores.
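Two of the quantities named above have compact definitions that may help the reader: the AUC is the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor, and the net benefit used in decision curve analysis is commonly defined as TP/n − (FP/n) × pt/(1 − pt) at threshold probability pt. The sketch below implements both directly. It is an illustration with hypothetical function names, not the R routines used in the study, and the pairwise AUC loop is quadratic in sample size.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a positive case outranks a negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the "treat all" and "treat none" strategies.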

2.6. Statistical analysis

Normality of continuous variables was assessed using the Kolmogorov-Smirnov test. Normally distributed continuous variables were summarized as mean ± standard deviation and compared using Student's t-test, while non-normally distributed continuous variables were summarized as median and compared using the Kruskal-Wallis H test. Categorical variables were expressed as numbers and percentages and compared using the chi-square test or Fisher's exact test, depending on sample sizes. All analyses were conducted using R software, with a significance level of p < 0.05, except for the univariate regression analysis, where a threshold of p < 0.1 was used for variable selection.

3. Results

3.1. Baseline characteristics

A total of 6390 patients from the MIMIC IV database who met the inclusion and exclusion criteria and had complete data were included in the study. Among these, there were 1880 in-hospital deaths and 4510 survivors. Comparative analysis of baseline characteristics revealed significant statistical differences (P < 0.05) between the two groups in terms of age, heart rate, respiratory rate, mean arterial pressure (MAP), Sequential Organ Failure Assessment (SOFA) score, Simplified Acute Physiology Score (SAPS) II, Charlson Comorbidity Index (CCI), white blood cell count (WBC), hemoglobin, hematocrit, platelet count, bicarbonate, blood urea nitrogen (BUN), serum creatinine, sodium, potassium, international normalized ratio (INR), prothrombin time (PT), activated partial thromboplastin time (APTT), pH, oxygen saturation (SaO2), urine output, peripheral vascular disease, cerebrovascular disease, diabetes mellitus, malignant tumors, use of vasoactive medications, and renal replacement therapy (RRT). However, no statistically significant differences were observed for the other variables. Fig. 1 presents a flowchart of the model development and validation processes used in this study, while Table 1 compares the baseline characteristics between the two groups.

Fig. 1.

Flowchart for patient selection and data processing analysis.

Table 1.

Baseline parameters comparisons of patients with sepsis and ARDS.

Variable Survival Death P-value
Number (sample size) 4510 1880
Baseline characteristics
 Gender (male) 2566 (56.9 %) 1048 (55.7 %) 0.413
 Age (year, mean ± SD) 64.2 ± 16.7 69.4 ± 15.3 <0.001
Vital signs
 Heart rate (times/min)
 Normal (60–100) 3281 (72.7 %) 1211 (64.4 %) <0.001
 Moderate (101–120) 931 (20.6 %) 515 (27.4 %)
 Severe (<60 or >120) 298 (6.6 %) 154 (8.2 %)
 Respiratory rate (times/min)
 Normal (15–20) 1990 (44.1 %) 645 (34.3 %) <0.001
 Moderate (<15 or 21–25) 1881 (41.7 %) 808 (43.0 %)
 Severe (>25) 639 (14.2 %) 427 (22.7 %)
 Temperature (°C)
 Normal (36–37.3) 2922 (64.8 %) 1270 (67.6 %) 0.0122
 Moderate (34–35.9 or 37.4–38.2) 1418 (31.4 %) 525 (27.9 %)
 Severe (<34 or >38.2) 170 (3.8 %) 85 (4.5 %)
 Mean Arterial Pressure (mmHg)
 Normal (65–110) 4162 (92.3 %) 1621 (86.2 %) <0.001
 Abnormal (<65 or >110) 348 (7.7 %) 259 (13.8 %)
Disease indices
 SOFA (mean ± SD) 3.86 ± 2.14 4.58 ± 2.63 <0.001
 SAPS II (mean ± SD) 42.7 ± 13.9 53.0 ± 15.4 <0.001
 Charlson Comorbidity Index (mean ± SD) 5.53 ± 2.92 6.58 ± 2.78 <0.001
Laboratory parameters
 WBC (×10⁹/L, mean ± SD) 16.0 ± 8.52 17.4 ± 10.2 <0.001
 Hemoglobin (g/dl, mean ± SD) 11.6 ± 2.30 11.2 ± 2.28 <0.001
 Hematocrit (%, mean ± SD) 35.6 ± 6.80 34.6 ± 6.89 <0.001
 Platelet (×10⁹/L, mean ± SD) 237 ± 116 222 ± 133 <0.001
 Bicarbonate (mmol/L, mean ± SD) 24.5 ± 5.06 23.1 ± 5.47 <0.001
 BUN (mg/dl, mean ± SD) 32.9 ± 24.0 41.7 ± 27.3 <0.001
 Creatinine (mg/dl, mean ± SD) 1.74 ± 1.46 2.05 ± 1.48 <0.001
 Sodium (mmol/L)
 Normal (135–145) 3306 (73.3 %) 1169 (62.2 %) <0.001
 Abnormal (<135 or >145) 1204 (26.7 %) 711 (37.8 %)
 Potassium (mmol/L)
 Normal (3.5–5.5) 3659 (81.1 %) 1448 (77.0 %) <0.001
 Abnormal (<3.5 or >5.5) 851 (18.9 %) 432 (23.0 %)
 Glucose (mmol/L)
 Normal (3.9–7.8) 2334 (51.8 %) 921 (49.0 %) 0.0471
 Abnormal (<3.9 or >7.8) 2176 (48.2 %) 959 (51.0 %)
 INR (mean ± SD) 1.64 ± 1.05 2.08 ± 1.45 <0.001
 PT (s, mean ± SD) 17.9 ± 10.8 22.5 ± 15.1 <0.001
 APTT (s, mean ± SD) 44.5 ± 30.7 53.3 ± 36.0 <0.001
 pH
 Normal (7.35–7.45) 2358 (52.3 %) 887 (47.2 %) <0.001
 Moderate (7.25–7.34 or 7.46–7.55) 1971 (43.7 %) 834 (44.4 %)
 Severe (<7.25 or >7.55) 181 (4.0 %) 159 (8.5 %)
 SaO2 (%, mean ± SD) 97.0 ± 2.27 96.3 ± 3.36 <0.001
 PaO2 (mmHg, mean ± SD) 200 ± 122 197 ± 120 0.314
 PaCO2 (mmHg, mean ± SD) 49.2 ± 15.5 48.8 ± 16.6 0.473
 Urine output (ml)
 Normal (1500–2500) 1276 (28.3 %) 344 (18.3 %) <0.001
 Moderate (500–1499 or 2501–3500) 2333 (51.7 %) 903 (48.0 %)
 Severe (<500 or >3500) 901 (20.0 %) 633 (33.7 %)
Comorbidities
 Myocardial infarct 677 (15.0 %) 341 (18.1 %) 0.0021
 Congestive heart failure 1404 (31.1 %) 595 (31.6 %) 0.706
 Peripheral vascular disease 445 (9.9 %) 231 (12.3 %) 0.00478
 Cerebrovascular disease 651 (14.4 %) 326 (17.3 %) 0.0037
 Dementia 207 (4.6 %) 80 (4.3 %) 0.602
 Chronic pulmonary disease 1409 (31.2 %) 578 (30.7 %) 0.718
 Rheumatic disease 128 (2.8 %) 68 (3.6 %) 0.117
 Peptic ulcer disease 122 (2.7 %) 58 (3.1 %) 0.451
 Liver disease 640 (14.2 %) 463 (24.6 %) <0.001
 Diabetes 1448 (32.1 %) 538 (28.6 %) 0.00869
 Paraplegia 207 (4.6 %) 88 (4.7 %) 0.926
 Renal disease 1006 (22.3 %) 419 (22.3 %) 1
 Malignant cancer 567 (12.6 %) 420 (22.3 %) <0.001
 AIDS 38 (0.8 %) 13 (0.7 %) 0.642
Interventions
 Vasopressor 636 (14.1 %) 767 (40.8 %) <0.001
 Renal replacement therapy, RRT 368 (8.2 %) 393 (20.9 %) <0.001

3.2. Feature selection combined logistic regression and LASSO

Based on the training set data, we conducted univariate analysis by including each variable in the model with in-hospital mortality as the outcome measure. Only a few demographic variables, a subset of blood test results, blood gas indicators, and some comorbidities did not reach statistical significance. Statistically significant differences (P < 0.05) were observed in age, heart rate, respiratory rate, MAP, SOFA score, SAPS II, CCI, WBC, hemoglobin (Hb), hematocrit (HCT), platelet count (PLT), bicarbonate, BUN, serum creatinine (Cre), sodium, potassium, INR, PT, APTT, pH, SaO2, and urine output. These differences indicated an association between abnormalities in these indicators and the risk of in-hospital mortality in patients with sepsis complicated by ARDS. Additionally, peripheral vascular disease, cerebrovascular disease, diabetes mellitus, malignant tumors, the need for vasoactive medications, and RRT were significantly associated with in-hospital mortality (P < 0.05) (Supplementary Table 1).

Further multivariate logistic analysis (Table 2) showed that several factors were independently associated with in-hospital mortality in patients with sepsis complicated by ARDS. These factors include age, respiratory rate, SAPS II, Hb, BUN, Cre, sodium, PT, APTT, SaO2, urine output, cerebrovascular disease, liver disease, diabetes mellitus, malignant tumors, the use of vasoactive medications, and RRT (P < 0.05).

Table 2.

Multivariate logistic regression analysis of predictors for patients with sepsis and ARDS.

Variable SE OR 95%CI Z P
Age 0.003 1.02 1.01–1.02 6.435 <0.001
Heart rate 0.061 1.09 0.97–1.23 1.444 0.149
Respiratory rate 0.054 1.24 1.12–1.38 4.032 <0.001
Mean Arterial Pressure 0.12 1.21 0.95–1.53 1.568 0.117
SAPS II 0.003 1.02 1.01–1.03 6.238 <0.001
Hemoglobin 0.017 0.96 0.93–0.99 −2.504 0.012
BUN 0.002 1.01 1.01–1.01 4.194 <0.001
Creatinine 0.041 0.83 0.76–0.90 −4.614 <0.001
Sodium 0.079 1.34 1.14–1.56 3.668 <0.001
PT 0.003 1.01 1.00–1.02 3.359 0.001
APTT 0.001 1.00 1.00–1.01 2.883 0.004
SaO2 0.015 0.96 0.93–0.99 −2.742 0.006
Urine output 0.058 1.20 1.08–1.35 3.195 0.001
Cerebrovascular disease 0.102 1.91 1.57–2.33 6.322 <0.001
Liver disease 0.1 1.84 1.52–2.24 6.137 <0.001
Diabetes 0.083 0.70 0.60–0.82 −4.307 <0.001
Malignant cancer 0.098 2.00 1.65–2.42 7.08 <0.001
Vasopressor 0.091 2.58 2.16–3.08 10.456 <0.001
CRRT 0.128 1.42 1.11–1.83 2.733 0.006
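The columns of Table 2 are linked by the usual Wald relationships for logistic regression: the coefficient is the natural logarithm of the odds ratio, Z is the coefficient divided by its standard error, and the 95 % confidence interval is exp(coefficient ± 1.96 × SE). The sketch below, with a hypothetical function name, recomputes these from the printed OR and SE. Because the printed OR is rounded to two decimals, the recomputed Z differs slightly from the tabulated value.

```python
import math
from typing import Dict


def wald_summary(odds_ratio: float, se: float, z_crit: float = 1.96) -> Dict[str, float]:
    """Recover the Wald Z statistic and 95 % CI of an odds ratio from OR and SE."""
    beta = math.log(odds_ratio)  # logistic regression coefficient
    return {
        "z": beta / se,
        "ci_low": math.exp(beta - z_crit * se),
        "ci_high": math.exp(beta + z_crit * se),
    }


# Vasopressor row of Table 2: OR 2.58, SE 0.091.
vasopressor = wald_summary(2.58, 0.091)
```

For the vasopressor row this reproduces the tabulated interval of 2.16–3.08 after rounding.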

Machine learning algorithms offer unique advantages in large-scale variable selection. We therefore also applied the LASSO machine learning algorithm, which retained 19 variables associated with in-hospital mortality in patients with sepsis complicated by ARDS: age, respiratory rate, MAP, SAPS II, CCI, Hb, HCT, BUN, sodium, PT, APTT, SaO2, urine output, cerebrovascular disease, liver disease, diabetes mellitus, malignant tumors, the use of vasoactive medications, and RRT (Fig. 2A–B).

Fig. 2.

A–B Further screening of risk factors for sepsis patients complicated by ARDS using the LASSO machine learning algorithm. At log(lambda) = −4.398, 19 variables were selected as survival-associated factors (age, respiratory rate, MAP, SAPS II, CCI, Hb, HCT, BUN, sodium, PT, APTT, SaO2, urine output, cerebrovascular disease, liver disease, diabetes mellitus, malignant tumors, use of vasoactive medications, and RRT).

In addition to the LASSO algorithm, we also employed the Random Forest algorithm for variable selection, obtaining results similar to those previously achieved (see Supplementary Fig. 2 and Supplementary Table 5).

3.3. Model construction

Based on the selection of variables using logistic regression and the LASSO machine learning algorithm and considering their clinical significance, the following variables were chosen to construct the Clinical Prediction Model (CPM): age, MAP, Hb, pH, sodium, SaO2, bicarbonate, PT, cerebrovascular disease, liver disease, diabetes mellitus, and malignant tumors. After these variables were fitted into the model and further analyses were conducted, the coefficients and detailed information are presented in Supplementary Table 2.

The constructed CPM is represented by the following equation: CPM = 0.025 × Age + 0.343 × MAP − 0.0045 × Hb + 0.149 × pH + 0.424 × Sodium − 0.094 × SaO2 − 0.043 × Bicarbonate + 0.018 × PT + 0.502 × Cerebrovascular Disease + 0.683 × Liver Disease − 0.333 × Diabetes Mellitus + 0.734 × Malignant Tumors + 7.113. Additionally, a corresponding visualization using a nomogram (Fig. 3) was developed to illustrate the model.
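If the CPM is read as the linear predictor (log-odds) of a logistic regression, which is an assumption here, the predicted probability of in-hospital mortality is 1 / (1 + exp(−CPM)). The sketch below evaluates the equation for a hypothetical patient. The variable coding is also assumed: categorical predictors (MAP, pH, sodium) are entered as 0 for the normal category, comorbidities as 0/1, and continuous predictors in the units of Table 1. The actual coding is given in Supplementary Table 2, which is not reproduced here.

```python
import math
from typing import Dict

# Coefficients copied from the CPM equation in the text.
COEFFICIENTS: Dict[str, float] = {
    "age": 0.025, "map": 0.343, "hb": -0.0045, "ph": 0.149, "sodium": 0.424,
    "sao2": -0.094, "bicarbonate": -0.043, "pt": 0.018,
    "cerebrovascular_disease": 0.502, "liver_disease": 0.683,
    "diabetes": -0.333, "malignancy": 0.734,
}
INTERCEPT = 7.113


def cpm_score(patient: Dict[str, float]) -> float:
    """Linear predictor: intercept plus the weighted sum of the 12 predictors."""
    return INTERCEPT + sum(COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS)


def mortality_probability(patient: Dict[str, float]) -> float:
    """Logistic transform of the score (assumes the CPM is on the log-odds scale)."""
    return 1.0 / (1.0 + math.exp(-cpm_score(patient)))


# Hypothetical patient: 65 years old, normal MAP, pH and sodium, no comorbidities.
example_patient = {
    "age": 65, "map": 0, "hb": 11.5, "ph": 0, "sodium": 0, "sao2": 97,
    "bicarbonate": 24, "pt": 18, "cerebrovascular_disease": 0,
    "liver_disease": 0, "diabetes": 0, "malignancy": 0,
}
example_risk = mortality_probability(example_patient)
```

Under these assumptions the example patient has a score of about −1.14 and a predicted risk of about 24 %, and setting `malignancy` to 1 raises the predicted risk, as the positive coefficient implies.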

Fig. 3.

Visual nomogram constructed based on repeatedly selected risk factors. The vertical axis represents the selected risk factors, and the horizontal axis represents the risk of mortality. A total score is obtained by calculating the corresponding scores for each risk factor, which is further used to assess the in-hospital mortality risk in sepsis-associated acute respiratory distress syndrome (ARDS) patients.

3.4. Internal validation and comparison with SOFA and SAPS II

Next, the model's predictive performance was validated using an internal validation dataset from the MIMIC IV database. The model's clinical discrimination, calibration, and net benefit were assessed using ROC curves, calibration plots, and decision curve analysis (DCA), respectively. The results, presented in Fig. 4, show good clinical discrimination with an AUC of 0.715 for the training set (Fig. 4A–C) and 0.711 for the validation set (Fig. 4D–F). The predicted probabilities closely aligned with the ideal line, indicating good model calibration. Additionally, the model exhibited favorable clinical applicability and net benefits.

Fig. 4.

Performance evaluation of the newly developed clinical prediction model. A-C represent ROC analysis, calibration curve analysis, and DCA of the training set, respectively. D-F represent ROC analysis, calibration curve analysis, and DCA of the internal validation set, respectively.

The SOFA and SAPS II scores have long been classic tools for assessing the severity of sepsis. We compared the newly developed predictive model (AUC = 0.715) with the SOFA (AUC = 0.577) and SAPS II (AUC = 0.688) scores to evaluate its performance and potential applications. The results revealed that the newly constructed model demonstrated significantly higher discriminative ability than the other two methods, as shown in Fig. 5, Table 3 and Table 4.

Fig. 5.

Comparison of the discrimination performance between the newly developed prediction model and SOFA and SAPS II. The new model shows an AUC value of 0.715, while SOFA has an AUC value of 0.577, and SAPS II has an AUC value of 0.688.

Table 3.

Model performance comparison with SOFA and SAPS II.

Models AUC 95%CI Accuracy Sensitivity Specificity
New model 0.714 0.698–0.731 0.680 0.607 0.711
SOFA 0.577 0.559–0.595 0.557 0.570 0.552
SAPS II 0.688 0.671–0.705 0.644 0.642 0.645
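The accuracy, sensitivity, and specificity columns of Table 3 are tied together by the identity accuracy = sensitivity × prevalence + specificity × (1 − prevalence). The sketch below checks this with a hypothetical helper, assuming the prevalence is the in-hospital mortality of the full MIMIC IV cohort (1880 of 6390). Which dataset Table 3 was computed on is not stated, so this is a consistency check under that assumption rather than a reconstruction of the analysis.

```python
def expected_accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Accuracy implied by sensitivity, specificity and outcome prevalence."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


PREVALENCE = 1880 / 6390  # in-hospital mortality in the full cohort (assumed basis)

implied = {
    "New model": expected_accuracy(0.607, 0.711, PREVALENCE),
    "SOFA": expected_accuracy(0.570, 0.552, PREVALENCE),
    "SAPS II": expected_accuracy(0.642, 0.645, PREVALENCE),
}
```

Rounded to three decimals, the implied accuracies match the tabulated values of 0.680, 0.557, and 0.644.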

Table 4.

DeLong's test for comparison of the different models.

Comparison Z P AUC (New model) AUC (SOFA) AUC (SAPS II)
New model vs SOFA 12.067 <0.001 0.714 0.577 –
New model vs SAPS II 2.838 0.005 0.714 – 0.688
SOFA vs SAPS II −10.521 <0.001 – 0.577 0.688
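The P values in Table 4 follow from the Z statistics through the standard normal distribution, assuming a two-sided test: P = 2 × (1 − Φ(|Z|)). A minimal sketch using only the standard library, with a hypothetical function name:

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided P value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


p_new_vs_saps = two_sided_p(2.838)    # new model vs SAPS II
p_new_vs_sofa = two_sided_p(12.067)   # new model vs SOFA
p_sofa_vs_saps = two_sided_p(-10.521)  # SOFA vs SAPS II
```

For Z = 2.838 this gives roughly 0.0045, in line with the reported 0.005, and the other two comparisons fall far below 0.001.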

3.5. External validation of models

In addition to using the MIMIC IV data as training and internal validation sets, we collected and compiled de-identified data from patients with sepsis and ARDS at a tertiary hospital in Liaoning Province, China. This dataset included 225 patients treated between January 2016 and September 2022 who met the inclusion and exclusion criteria and had complete data. Among these patients, 117 died during hospitalization, and 108 survived (Supplementary Table 3). This dataset was used to independently validate the newly developed predictive model.

The results, as shown in Fig. 6, indicate that the model demonstrated good clinical discrimination, with an AUC of 0.771 in the external validation set (Fig. 6A). Furthermore, the model exhibited favorable calibration (Fig. 6B), clinical applicability, and net benefits (Fig. 6C).

Fig. 6.

External validation of the newly developed clinical prediction model. A-C represent ROC analysis, calibration curve analysis, and DCA of the external validation set, respectively.

4. Discussion

Sepsis complicated with ARDS is associated with high mortality and significant morbidity [23,24]. Therefore, early identification and appropriate management can improve patient outcomes [25,26]. However, predictive models for this population are currently lacking. In this study, we developed and validated a new clinical prediction model using the MIMIC IV database and independent external data from a tertiary hospital in Liaoning Province, China. Our model demonstrated superior discriminatory ability compared to existing assessment tools, offering a potential strategy for identifying high-risk patients and guiding personalized treatment decisions.

Machine learning algorithms provide unique advantages in selecting variables from large samples [24,27,28]. In addition to traditional multivariable logistic regression, we employed the LASSO machine learning algorithm to enhance the accuracy and reliability of our variable selection. The selected variables, including age, MAP, Hb, pH, sodium, SaO2, bicarbonate, PT, cerebrovascular disease, liver disease, diabetes, and malignancy, represent a range of patient characteristics, physiological parameters, and clinical conditions commonly assessed in patients with sepsis and ARDS. This underscores the clinical applicability of the model, as the data used were readily obtainable. Each variable in the model contributes unique information about the patient's current condition and interacts synergistically within the model. Consideration of each variable, along with adjustments for others in the multivariate analysis, ensures the accuracy and clinical relevance of our model. Furthermore, the use of multiple data sources for internal and external validation demonstrates the robustness and generalizability of our model across different populations.

Age is a critical factor in the model, and numerous studies have identified it as an independent risk factor for prognosis in patients with sepsis-ARDS [29,30]. Advanced age is linked to decreased organ reserves and an increased susceptibility to infections [31,32]. In elderly patients, even minor insults can precipitate immune dysfunction, potentially exacerbating the development of ARDS and multiple organ dysfunction syndrome (MODS), leading to worsened clinical outcomes and increased mortality [33].

MAP, a critical indicator of cardiovascular function and blood perfusion, has been associated with reduced mortality when it reaches or exceeds a threshold of 65 mmHg [34,35].

In patients with septic shock, decreased Hb levels are frequently observed and are attributed to a systemic inflammatory response that suppresses erythropoiesis, along with hemolysis and bleeding [36]. Patients with reduced Hb levels are more prone to tissue hypoxia, and their systemic inflammatory responses tend to be more severe than those in patients with normal Hb levels [37]. Therefore, Hb-related indicators require further investigation.

PT is a significant independent risk factor, indicating impaired coagulation function, which also influences prognosis. Sepsis-induced complement activation can cause endothelial damage, leading to the activation of inflammatory and microthrombotic pathways, resulting in disseminated intravascular coagulation [38,39]. Sepsis-associated coagulation dysfunction (SAC) significantly affects patient outcomes, with higher mortality rates observed in patients with sepsis and severe SAC [40].

pH, which reflects metabolic disturbances and patient responses to treatment, is not only an early indicator of metabolic dysregulation but is also associated with patient outcomes [41,42].

In our study, cerebrovascular disease, liver disease, malignancy, and the need for vasopressor support or RRT were associated with significantly worse outcomes, whereas diabetes was associated with lower odds of in-hospital mortality in the multivariate analysis (OR 0.70, Table 2). Patients with sepsis-ARDS and underlying comorbidities have been reported to have poorer prognoses and higher mortality rates. Having three or more underlying comorbidities and advanced age were identified as independent risk factors for poor prognosis. Advanced age and the presence of multiple comorbidities greatly increase mortality risk in patients with sepsis.

A nomogram is a visual representation that simplifies a complex regression equation, making the results of the clinical prediction model more accessible for interpretation. It assists clinicians in making individualized decisions regarding the treatment and management of patients with sepsis, thereby reducing the risk of mortality [43,44]. We created a nomogram to visualize the model, with each variable represented by a line segment labeled with the corresponding scale indicating the range of values. The contribution of each factor to the outcome event is represented by the length of the line segment. The final row represents the total score obtained by adding the scores for each individual variable. The predicted probability of in-hospital mortality in patients with sepsis and ARDS is calculated based on the total score. This visual tool is concise and practical, facilitating its clinical application.

In developing our prediction model, we considered several variables, including age and interventions. Age is a fundamental factor in assessing disease risk and prognosis and is widely accepted for use in clinical decision support. However, the causal relationship between age and outcomes is not direct and may be influenced by various underlying factors such as overall health status, comorbidities, and lifestyle. Without a full understanding of how these factors interact, simply using age as a predictor could lead to bias in the model. Furthermore, the choice of intervention is often based on clinical judgment and the specific needs of patients, with complex and variable causal mechanisms underlying these decisions. For instance, in our analysis, the use of specific interventions (such as vasopressors or CRRT) was closely related to clinical outcomes. Understanding how these interventions affect the recovery process requires a deep knowledge of the pathophysiological changes before and after treatment, which is where causal inference methods, especially marginal structural models (MSMs), can play a crucial role. Within the framework of causal inference, we can more systematically analyze how these variables affect the clinical pathways of patients. For example, methods such as MSMs and inverse probability weighting (IPW) can control for the influence of confounding variables, thereby more accurately estimating the causal effects of specific treatments on outcomes [45].

However, we acknowledge certain limitations in using our model to predict in-hospital mortality for individual patients. These limitations primarily stem from differences in clinical data among patients from different countries or regions. Although we standardized the data from different units, variation in measured values due to the different reagents used for testing was unavoidable. Moreover, the external validation dataset had a smaller sample size than the MIMIC IV dataset, and the clinical indicators in the two datasets did not match completely. Additionally, due to data limitations, other important factors that may influence patient outcomes, such as treatment strategies, infection control measures, and other nursing interventions, were not considered. Therefore, our model should be regarded as a supplementary tool to be used alongside these factors rather than as the sole basis for decision-making.

5. Conclusions

The clinical prediction model we developed offers a valuable tool for identifying sepsis patients with ARDS who are at high risk of in-hospital mortality. This may contribute to enhancing the management of sepsis, especially in patients with concurrent ARDS, by facilitating the implementation of personalized treatment strategies and improving patient outcomes.

Funding

This study was supported by the Dalian Medical Science Research Program Project (2112013).

6. Data availability statement

The clinical data for this study were obtained from the MIMIC IV v2.0 database and from the ICU of a large hospital in China. The MIMIC data were used for model construction and internal validation, while the data from the hospital in Liaoning Province were used for external validation. Access to the MIMIC IV database required passing a qualifying test and obtaining approval (certification number: 50766047). The hospital's ethics committee approved the use of the external validation data (approval number: PJ-KS-KY-2023-304).

CRediT authorship contribution statement

Ying Chen: Writing – original draft, Methodology, Formal analysis, Data curation, Conceptualization. Chengzhu Zong: Writing – original draft, Data curation, Conceptualization. Linxuan Zou: Methodology, Data curation. Zhe Zhang: Data curation. Tianke Yang: Formal analysis. Junwei Zong: Supervision, Conceptualization. Xianyao Wan: Supervision, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors would like to thank MIMIC-IV for open access to their database. The opinions expressed in this study are those of the authors and do not represent the opinions of the Beth Israel Deaconess Medical Center.

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.heliyon.2024.e33337.

Contributor Information

Ying Chen, Email: yoursjane1934@126.com.

Junwei Zong, Email: aweizone@163.com.

Xianyao Wan, Email: wxy-1-1@163.com.

Appendix A. Supplementary data

The following are the supplementary data to this article.

Multimedia component 1
mmc1.pdf (366.6KB, pdf)
Multimedia component 2
mmc2.pdf (9KB, pdf)
Multimedia component 3
mmc3.docx (17.7KB, docx)
Multimedia component 4
mmc4.docx (15KB, docx)
Multimedia component 5
mmc5.docx (26.1KB, docx)
Multimedia component 6
mmc6.csv (6.1KB, csv)
Multimedia component 7
mmc7.csv (2.1KB, csv)

References

  • 1. Singer M., Deutschman C.S., Seymour C.W., Shankar-Hari M., Annane D., Bauer M., et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. 2016;315:801–810. doi: 10.1001/jama.2016.0287.
  • 2. Seymour C.W., Liu V.X., Iwashyna T.J., Brunkhorst F.M., Rea T.D., Scherag A., et al. Assessment of clinical criteria for sepsis: for the third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. 2016;315:762–774. doi: 10.1001/jama.2016.0288.
  • 3. Kellum J.A., Formeck C.L., Kernan K.F., Gómez H., Carcillo J.A. Subtypes and mimics of sepsis. Crit. Care Clin. 2022;38(2):195–211. doi: 10.1016/j.ccc.2021.11.013.
  • 4. Bos L.D.J., Ware L.B. Acute respiratory distress syndrome: causes, pathophysiology, and phenotypes. Lancet. 2022;400:1145–1156. doi: 10.1016/S0140-6736(22)01485-4.
  • 5. Bellani G., Laffey J.G., Pham T., Fan E., Brochard L., Esteban A., et al. Epidemiology, patterns of care, and mortality for patients with acute respiratory distress syndrome in intensive care units in 50 countries. JAMA. 2016;315:788–800. doi: 10.1001/jama.2016.0291.
  • 6. Steyerberg E.W., Moons K.G.M., van der Windt D.A., Hayden J.A., Perel P., Schroter S., et al. Prognosis Research Strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10 doi: 10.1371/journal.pmed.1001381.
  • 7. Xie T., Xin Q., Cao X., Chen R., Ren H., Liu C., et al. Clinical characteristics and construction of a predictive model for patients with sepsis related liver injury. Clin. Chim. Acta. 2022;537:80–86. doi: 10.1016/j.cca.2022.10.004.
  • 8. Xin Q., Xie T., Chen R., Wang H., Zhang X., Wang S., et al. Construction and validation of an early warning model for predicting the acute kidney injury in elderly patients with sepsis. Aging Clin. Exp. Res. 2022;34:2993–3004. doi: 10.1007/s40520-022-02236-3.
  • 9. Johnson A.E.W., Pollard T.J., Shen L., Lehman L.H., Feng M., Ghassemi M., et al. MIMIC-III, a freely accessible critical care database. Sci. Data. 2016;3 doi: 10.1038/sdata.2016.35.
  • 10. Bennett A.M., Ulrich H., van Damme P., Wiedekopf J., Johnson A.E.W. MIMIC-IV on FHIR: converting a decade of in-patient data into an exchangeable, interoperable format. J. Am. Med. Inf. Assoc. 2023;30:718–725. doi: 10.1093/jamia/ocad002.
  • 11. Pollard T.J., Johnson A.E.W., Raffa J.D., Celi L.A., Mark R.G., Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci. Data. 2018;5 doi: 10.1038/sdata.2018.178.
  • 12. Huang B., Liang D., Zou R., Yu X., Dan G., Huang H., et al. Mortality prediction for patients with acute respiratory distress syndrome based on machine learning: a population-based study. Ann. Transl. Med. 2021;9:794. doi: 10.21037/atm-20-6624.
  • 13. Johnson A.E.W., Bulgarelli L., Shen L., Gayles A., Shammout A., Horng S., et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data. 2023;10:1. doi: 10.1038/s41597-022-01899-x.
  • 14. Le Gall J.R., Lemeshow S., Saulnier F. A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study. JAMA. 1993;270:2957–2963. doi: 10.1001/jama.270.24.2957.
  • 15. Charlson M.E., Pompei P., Ales K.L., MacKenzie C.R. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J. Chron. Dis. 1987;40:373–383. doi: 10.1016/0021-9681(87)90171-8.
  • 16. Charlson M., Szatrowski T.P., Peterson J., Gold J. Validation of a combined comorbidity index. J. Clin. Epidemiol. 1994;47:1245–1251. doi: 10.1016/0895-4356(94)90129-5.
  • 17. Hamzah F.B., Hamzah F.M., Razali S.F.M., Samad H. A comparison of multiple imputation methods for recovering missing data in hydrological studies. CIVIL ENG. 2021;7:1608–1619.
  • 18. Van Ginkel J.R., Linting M., Rippe R.C.A., Van Der Voort A. Rebutting existing misconceptions about multiple imputation as a method for handling missing data. J. Pers. Assess. 2020;102:297–308. doi: 10.1080/00223891.2018.1530680.
  • 19. Grund S., Lüdtke O., Robitzsch A. Multiple imputation of missing data in multilevel models with the R package mdmb: a flexible sequential modeling approach. Behav Res. 2021;53:2631–2649. doi: 10.3758/s13428-020-01530-0.
  • 20. van Smeden M., Moons K.G., de Groot J.A., Collins G.S., Altman D.G., Eijkemans M.J., et al. Sample size for binary logistic prediction models: beyond events per variable criteria. Stat. Methods Med. Res. 2019;28:2455–2474. doi: 10.1177/0962280218784726.
  • 21. Peduzzi P., Concato J., Kemper E., Holford T.R., Feinstein A.R. A simulation study of the number of events per variable in logistic regression analysis. J. Clin. Epidemiol. 1996;49:1373–1379. doi: 10.1016/s0895-4356(96)00236-3.
  • 22. Vrieze S.I. Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychol. Methods. 2012;17:228–243. doi: 10.1037/a0027127.
  • 23. Hu Q., Hao C., Tang S. From sepsis to acute respiratory distress syndrome (ARDS): emerging preventive strategies based on molecular and genetic researches. Biosci. Rep. 2020;40 doi: 10.1042/BSR20200830.
  • 24. Zhao Q.-Y., Liu L.-P., Luo J.-C., Luo Y.-W., Wang H., Zhang Y.-J., et al. A machine-learning approach for dynamic prediction of sepsis-induced coagulopathy in critically ill patients with sepsis. Front. Med. 2020;7 doi: 10.3389/fmed.2020.637434.
  • 25. Luo M., Chen Y., Cheng Y., Li N., Qing H. Association between hematocrit and the 30-day mortality of patients with sepsis: a retrospective analysis based on the large-scale clinical database MIMIC-IV. PLoS One. 2022;17 doi: 10.1371/journal.pone.0265758.
  • 26. Hu C., Li L., Huang W., Wu T., Xu Q., Liu J., et al. Interpretable machine learning for early prediction of prognosis in sepsis: a discovery and validation study. Infect. Dis. Ther. 2022;11:1117–1132. doi: 10.1007/s40121-022-00628-6.
  • 27. Yue S., Li S., Huang X., Liu J., Hou X., Zhao Y., et al. Machine learning for the prediction of acute kidney injury in patients with sepsis. J. Transl. Med. 2022;20:215. doi: 10.1186/s12967-022-03364-0.
  • 28. Hou N., Li M., He L., Xie B., Wang L., Zhang R., et al. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. J. Transl. Med. 2020;18:462. doi: 10.1186/s12967-020-02620-5.
  • 29. Martin G.S., Mannino D.M., Moss M. The effect of age on the development and outcome of adult sepsis. Crit. Care Med. 2006;34:15–21. doi: 10.1097/01.ccm.0000194535.82812.ba.
  • 30. Liang S.Y. Sepsis and other infectious disease emergencies in the elderly. Emerg. Med. Clin. 2016;34:501–522. doi: 10.1016/j.emc.2016.04.005.
  • 31. Esme M., Topeli A., Yavuz B.B., Akova M. Infections in the elderly critically-ill patients. Front. Med. 2019;6:118. doi: 10.3389/fmed.2019.00118.
  • 32. Atamna H., Tenore A., Lui F., Dhahbi J.M. Organ reserve, excess metabolic capacity, and aging. Biogerontology. 2018;19:171–184. doi: 10.1007/s10522-018-9746-8.
  • 33. Brown R., McKelvey M.C., Ryan S., Creane S., Linden D., Kidney J.C., et al. The impact of aging in acute respiratory distress syndrome: a clinical and mechanistic overview. Front. Med. 2020;7 doi: 10.3389/fmed.2020.589553.
  • 34. Ko C.-H., Lan Y.-W., Chen Y.-C., Cheng T.-T., Yu S.-F., Cidem A., et al. Effects of mean artery pressure and blood pH on survival rate of patients with acute kidney injury combined with acute hypoxic respiratory failure: a retrospective study. Medicina. 2021;57:1243. doi: 10.3390/medicina57111243.
  • 35. Asfar P., Radermacher P., Ostermann M. MAP of 65: target of the past? Intensive Care Med. 2018;44:1551–1552. doi: 10.1007/s00134-018-5292-8.
  • 36. Vincent J.L., Baron J.-F., Reinhart K., Gattinoni L., Thijs L., Webb A., et al. Anemia and blood transfusion in critically ill patients. JAMA. 2002;288:1499–1507. doi: 10.1001/jama.288.12.1499.
  • 37. Hotchkiss R.S., Moldawer L.L., Opal S.M., Reinhart K., Turnbull I.R., Vincent J.-L. Sepsis and septic shock. Nat. Rev. Dis. Prim. 2016;2:1–21. doi: 10.1038/nrdp.2016.45.
  • 38. Iba T., Levi M., Levy J.H. Sepsis-induced coagulopathy and disseminated intravascular coagulation. Semin. Thromb. Hemost. 2020;46:89–95. doi: 10.1055/s-0039-1694995.
  • 39. Abe T., Kubo K., Izumoto S., Shimazu S., Goan A., Tanaka T., et al. Complement activation in human sepsis is related to sepsis-induced disseminated intravascular coagulation. Shock. 2020;54:198–204. doi: 10.1097/SHK.0000000000001504.
  • 40. Levi M., Toh C.H., Thachil J., Watson H.G. Guidelines for the diagnosis and management of disseminated intravascular coagulation. British Committee for Standards in Haematology. Br. J. Haematol. 2009;145:24–33. doi: 10.1111/j.1365-2141.2009.07600.x.
  • 41. Nolt B., Tu F., Wang X., Ha T., Winter R., Williams D.L., Li C. Lactate and immunosuppression in sepsis. Shock. 2018;49(2):120–125. doi: 10.1097/SHK.0000000000000958.
  • 42. Thomlinson P., Carpenter M., D'Alessandri-Silva C. The evaluation and treatment of metabolic acidosis. Curr Treat Options Peds. 2020;6:29–37.
  • 43. Ren Y., Zhang L., Xu F., Han D., Zheng S., Zhang F., et al. Risk factor analysis and nomogram for predicting in-hospital mortality in ICU patients with sepsis and lung infection. BMC Pulm. Med. 2022;22:17. doi: 10.1186/s12890-021-01809-8.
  • 44. Zeng Q., He L., Zhang N., Lin Q., Zhong L., Song J. Prediction of 90-day mortality among sepsis patients based on a nomogram integrating diverse clinical indices. BioMed Res. Int. 2021;2021 doi: 10.1155/2021/1023513.
  • 45. Zhang Z., Jin P., Feng M., Yang J., Huang J., Chen L., et al. Causal inference with marginal structural modeling for longitudinal data in laparoscopic surgery: a technical note. Laparoscopic, Endoscopic and Robotic Surgery. 2022;5:146–152.

Articles from Heliyon are provided here courtesy of Elsevier
