Abstract
Colonization by carbapenemase-producing Enterobacterales (CPE) on admission to an intensive care unit (ICU) poses a serious threat to infection control. Early detection is critical but remains challenging in real-world settings. We aimed to develop interpretable machine learning models for predicting CPE colonization at ICU admission to support clinical decision-making for early isolation of CPE carriers. We conducted a retrospective cohort study of adult ICU admissions at a tertiary hospital in South Korea from January 2022 to December 2023. CPE colonization was defined by rectal swab culture within 48 h of admission. Forty-two candidate variables were extracted from electronic medical records, and ten machine learning algorithms were evaluated. Of 4,915 ICU admissions, 453 (9.2%) were colonized with CPE at admission. Twelve predictors were retained for model development, including antibiotic exposure, device use, and medical condition. Logistic regression at a threshold of 0.45 achieved the best-balanced performance, with a sensitivity of 0.73, an ROC-AUC of 0.77, and a negative predictive value of 0.96. A web-based CPE prediction tool was developed based on the model; it lets clinicians enter 14 input items at ICU admission and instantly obtain an estimated risk of CPE colonization. Our machine learning–based tool for predicting CPE colonization at ICU admission appears to hold promise as a rule-out aid for CPE carriage.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-026-37927-8.
Keywords: Carbapenemase-producing Enterobacterales, Intensive care unit, Machine learning, Prediction model, SHAP, Logistic regression
Subject terms: Computational biology and bioinformatics, Diseases, Health care, Medical research, Microbiology, Risk factors
Introduction
Infections with carbapenemase-producing Enterobacterales (CPE) have become one of the leading clinical challenges worldwide because of the associated high morbidity and mortality1. The incidence of CPE infections in South Korea increased by 263.9% between 2018 and 20222. To mitigate transmission in healthcare settings, strict infection control measures are recommended, including contact isolation, hand hygiene, and environmental decontamination3.
Hospitalized patients admitted to intensive care units (ICU) are particularly vulnerable to CPE colonization and subsequent infection, both of which are associated with poor clinical outcomes4. Early identification of CPE carriers upon ICU admission is therefore crucial to enable timely implementation of isolation measures. However, no standardized guidelines exist for active CPE screening in ICU patients, and isolating all admissions until culture results become available is often impractical owing to limited resources.
Several studies have investigated risk factors for CPE rectal colonization in ICU patients5–7, and predictive scores for colonization by multidrug-resistant organisms (MDRO) have been proposed8,9. However, these tools were either developed for MDRO in general rather than for CPE specifically8, or focused on hospital-wide CPE colonization rather than ICU-specific circumstances9.
Recently, machine learning methods have been applied to predict colonization with carbapenem-resistant Enterobacterales (CRE)10–12. Liang et al. developed a machine learning model for ICU patients, but its interpretation was limited to feature importance and did not use more interpretable methods such as SHAP (SHapley Additive exPlanations) or LIME (Local Interpretable Model-Agnostic Explanations)10. Freire et al. focused on liver transplant recipients11 and McGuire et al. created a model for general inpatients12, which limits the generalizability of both models to broader ICU populations.
There is clearly a need for a machine learning model that is specifically developed for ICU patients and has improved interpretability to assist in clinical decision-making. Therefore, in this study we aimed to develop interpretable machine learning models for predicting CPE colonization upon ICU admission to support clinical decision-making in early isolation of CPE carriers.
Results
Baseline characteristics of the participants
A total of 4,915 patients with surveillance cultures at ICU admission were included during the study period from 2022 to 2023. Of these, 453 patients (9.2%) were colonized with CPE. Patient characteristics are presented in Table 1. In total, 41 features were extracted. Patients with CPE colonization were older (P = 0.032), had longer hospital stays before ICU admission (P < 0.001), and were more likely to have been hospitalized within the previous six months (P = 0.008). They were also more likely to have been admitted to a long-term care facility within one year (P < 0.001), to have undergone surgery within three months (P < 0.001), and to have diabetes mellitus (P = 0.003), chemotherapy within six months (P < 0.001), end-stage renal disease on renal replacement therapy (P < 0.001), immunosuppressant use within three months (P = 0.018), steroid use within three months (P < 0.001), or antibiotic use within three months (P < 0.001). Indwelling devices were more common in this group, including Foley catheter (P < 0.001), central venous catheter (P < 0.001), nasogastric tube (P < 0.001), endotracheal tube (P < 0.001), biliary drain (P < 0.001), and pigtail catheter (P < 0.001), as were endoscopy within one year (P < 0.001) and colonization or infection with vancomycin-resistant Enterococci (VRE) within six months (P < 0.001).
Table 1.
Characteristics of patients in the positive and negative CPE carriage groups at ICU admission.
| Characteristic | Negative CPE carriage (n = 4,462) | Positive CPE carriage (n = 453) | P value |
|---|---|---|---|
| Age (yr), median (IQR) | 72.0 (60–83) | 75.0 (63–83) | 0.032 |
| Male | 1,956 (43.8) | 189 (41.7) | 0.400 |
| Hospital days before ICU admission, median (IQR) | 2 (1–3) | 3 (1–4) | < 0.001 |
| Admission source | | | 0.800 |
| Emergency room | 508 (11.4) | 50 (11.0) | |
| General ward | 3,954 (88.6) | 403 (89.0) | |
| Previous hospitalization within 6 months | 1,993 (44.7) | 232 (51.2) | 0.008 |
| Admission to long-term care facility within 1 year | 290 (6.5) | 91 (20.1) | < 0.001 |
| Preexisting medical condition | 3,157 (70.8) | 363 (80.1) | < 0.001 |
| Previous surgery within 3 months | 2,620 (58.7) | 312 (68.9) | < 0.001 |
| Diabetes mellitus | 1,168 (26.2) | 148 (32.7) | 0.003 |
| Cardiovascular disease | 426 (9.5) | 47 (10.4) | 0.600 |
| Solid cancer | 208 (4.7) | 29 (6.4) | 0.100 |
| Chemotherapy within 6 months | 176 (3.9) | 36 (7.9) | < 0.001 |
| Chronic renal disease | 210 (4.7) | 39 (8.6) | < 0.001 |
| ESRD on renal replacement therapy | 175 (3.9) | 40 (8.8) | < 0.001 |
| Hematologic malignancy | 19 (0.4) | 5 (1.1) | 0.064 |
| Cerebrovascular disease | 15 (0.3) | 2 (0.4) | 0.700 |
| Aortic disease | 15 (0.3) | 1 (0.2) | > 0.9 |
| Solid organ transplant | 13 (0.3) | 1 (0.2) | > 0.9 |
| Liver cirrhosis | 6 (0.1) | 0 | > 0.9 |
| Chronic obstructive pulmonary disease | 4 (< 0.1) | 2 (0.4) | 0.100 |
| Immunosuppressant use within 3 months | 77 (1.7) | 15 (3.3) | 0.018 |
| Steroid use within 3 months | 432 (9.7) | 133 (29.4) | < 0.001 |
| Previous antibiotic within 3 months | 1,176 (26.4) | 259 (57.2) | < 0.001 |
| β-lactam/β-lactamase inhibitor | 543 (12.2) | 181 (40.0) | < 0.001 |
| Cephalosporin | 649 (14.5) | 146 (32.2) | < 0.001 |
| Fluoroquinolone | 299 (6.7) | 100 (22.1) | < 0.001 |
| Carbapenem | 159 (3.6) | 92 (20.3) | < 0.001 |
| Aminoglycoside | 14 (0.3) | 14 (3.1) | < 0.001 |
| Indwelling device | 3,014 (67.5) | 381 (84.1) | < 0.001 |
| Foley catheter | 2,272 (50.9) | 268 (59.2) | < 0.001 |
| Central venous catheter | 1,697 (38.0) | 278 (61.4) | < 0.001 |
| Nasogastric tube | 1,193 (26.7) | 225 (49.7) | < 0.001 |
| Endotracheal tube | 719 (16.1) | 133 (29.4) | < 0.001 |
| Hemovac | 245 (5.5) | 21 (4.6) | 0.400 |
| Chest tube | 43 (1.0) | 7 (1.5) | 0.200 |
| Pigtail catheter drain | 40 (0.9) | 13 (2.9) | < 0.001 |
| Biliary drainᵃ | 32 (0.7) | 14 (3.1) | < 0.001 |
| Cystostomy | 10 (0.2) | 3 (0.7) | 0.110 |
| Ostomy | 31 (0.7) | 6 (1.3) | 0.150 |
| Endoscopy within 1 year | 336 (7.5) | 75 (16.6) | < 0.001 |
| VRE within 6 monthsᵇ | 45 (1.0) | 30 (6.6) | < 0.001 |

Data are presented as number of patients (percentage), unless otherwise specified.
CPE, carbapenemase-producing Enterobacterales; ESRD, end-stage renal disease; ICU, intensive care unit; IQR, interquartile range; VRE, vancomycin-resistant Enterococci.
ᵃ Biliary drain was percutaneous transhepatic biliary drain.
ᵇ VRE colonization or infection within 6 months.
Performance of the models
Based on levels of statistical significance and clinical relevance, 12 variables were selected for model development. These included indicators of previous healthcare exposure (e.g., admission to a long-term care facility, hospital days before ICU admission, endoscopy), device use (e.g., central venous catheter, nasogastric tube, biliary drain), comorbidities (e.g., steroid use, end-stage renal disease on renal replacement therapy), microbiological history (e.g., VRE colonization or infection) and a composite measure of antibiotic exposure.
Table 2 summarizes the predictive performance of the machine learning models in the test set. Among the classifiers, logistic regression had the highest precision recall-area under the curve (PR-AUC, 0.36) and receiver operating characteristics-area under the curve (ROC-AUC, 0.77); at a threshold of 0.5, its sensitivity was 0.63 and its specificity was 0.77. At thresholds of 0.45 and 0.40, sensitivities were 0.73 and 0.77, respectively. The precision–recall curve and ROC curve of the final logistic regression model at a threshold of 0.45 are shown in Fig. 1, and the corresponding confusion matrix is presented in Supplementary Fig. 1. Of the 91 patients with CPE colonization in the test set, 66 (72.5%) were correctly identified, and 25 (27.5%) were missed. Of the 640 patients predicted to be negative CPE carriers, 615 (96.1%) were true negatives and 25 (3.9%) were false negatives.
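The counts reported above are enough to recompute the headline metrics by hand. The following sketch derives sensitivity and NPV from the stated true positives, false negatives, and true negatives at the 0.45 threshold. The false-positive count is not given in the text; the value of 277 used here is an assumption back-calculated from the specificity, PPV, and accuracy listed in Table 2, and the function name is illustrative only.

```python
def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# TP, FN, and TN are reported in the text.
# FP = 277 is an assumption back-calculated from Table 2, not a reported value.
metrics = confusion_metrics(tp=66, fn=25, tn=615, fp=277)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption, the recomputed values agree with the threshold 0.45 row of Table 2 to three decimal places.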
Table 2.
Predictive performance of machine learning models on the test set.
| Model | PR-AUC | ROC-AUC | Sensitivity | Specificity | PPV | NPV | F1-score | Accuracy |
|---|---|---|---|---|---|---|---|---|
| Logistic regression (Threshold 0.5) | 0.358 | 0.774 | 0.626 | 0.768 | 0.216 | 0.953 | 0.321 | 0.755 |
| Logistic regression (Threshold 0.45) | 0.358 | 0.774 | 0.725 | 0.689 | 0.192 | 0.961 | 0.304 | 0.693 |
| Logistic regression (Threshold 0.4) | 0.358 | 0.774 | 0.769 | 0.619 | 0.171 | 0.963 | 0.279 | 0.633 |
| Gradient Boosting | 0.303 | 0.757 | 0.099 | 0.991 | 0.529 | 0.915 | 0.167 | 0.908 |
| AdaBoost | 0.332 | 0.757 | 0.088 | 0.999 | 0.889 | 0.915 | 0.160 | 0.915 |
| SVM (Support Vector Machine) | 0.272 | 0.751 | 0.000 | 1.000 | 0.000 | 0.907 | 0.000 | 0.907 |
| Voting ensemble | 0.221 | 0.733 | 0.451 | 0.831 | 0.214 | 0.937 | 0.290 | 0.796 |
| LightGBM (Gradient Boosting Machine) | 0.251 | 0.714 | 0.560 | 0.780 | 0.206 | 0.946 | 0.302 | 0.760 |
| XGBoost | 0.219 | 0.648 | 0.462 | 0.789 | 0.183 | 0.935 | 0.262 | 0.759 |
| Decision Tree | 0.221 | 0.640 | 0.549 | 0.740 | 0.177 | 0.942 | 0.268 | 0.722 |
| Random Forest | 0.135 | 0.606 | 0.264 | 0.863 | 0.164 | 0.920 | 0.203 | 0.808 |
| Extra Trees | 0.143 | 0.600 | 0.396 | 0.834 | 0.196 | 0.931 | 0.262 | 0.793 |

NPV, negative predictive value; PPV, positive predictive value; PR-AUC, precision recall-area under the curve; ROC-AUC, receiver operating characteristics-area under the curve.
Fig. 1.
Precision–recall curve (left) and receiver operating characteristic (ROC) curve (right) of the final logistic regression model at a threshold of 0.45. The precision–recall (PR) area under the curve (AUC) of 0.358 reflects the model’s performance in identifying true positives, whereas the receiver operating characteristic (ROC) AUC of 0.774 indicates its ability to differentiate between CPE-positive and CPE-negative cases.
Model interpretation
Adjusted odds ratios (aORs) with 95% confidence intervals (CI) from the final logistic regression model are shown in Fig. 2 and Table 3. The following were significant risk factors for CPE colonization upon ICU admission: biliary drain (aOR, 4.96; 95% CI, 2.32–10.60; P < 0.001), admission to long-term care facility (aOR, 2.40; 95% CI, 1.72–3.34; P < 0.001), VRE (aOR, 1.85; 95% CI, 1.01–3.40; P = 0.047), nasogastric tube (aOR, 1.82; 95% CI, 1.41–2.35; P < 0.001), steroid use (aOR, 1.79; 95% CI, 1.32–2.41; P < 0.001), central venous catheter (aOR, 1.55; 95% CI, 1.20–2.00; P = 0.001), antibiotic risk (aOR, 1.36; 95% CI, 1.22–1.50; P < 0.001), and hospital days before ICU admission (aOR, 1.04; 95% CI, 1.01–1.06; P = 0.007).
Fig. 2.

Risk factors for CPE colonization upon ICU admission. CI, confidence interval; CPE, carbapenemase-producing Enterobacterales; ESRD, end-stage renal disease; ICU, intensive care unit; VRE, vancomycin-resistant Enterococci.
Table 3.
Multivariate analysis of risk factors for CPE colonization at ICU admission.
| Feature | Adjusted OR (95% CI) | P value |
|---|---|---|
| Biliary drain | 4.96 (2.32–10.60) | < 0.001 |
| Admission to long-term care facility | 2.40 (1.72–3.34) | < 0.001 |
| Aminoglycoside | 2.28 (0.95–5.47) | 0.066 |
| VRE | 1.85 (1.01–3.40) | 0.047 |
| Nasogastric tube | 1.82 (1.41–2.35) | < 0.001 |
| Steroid use | 1.79 (1.32–2.41) | < 0.001 |
| Central venous catheter | 1.55 (1.20–2.00) | 0.001 |
| ESRD on renal replacement therapy | 1.40 (0.91–2.16) | 0.129 |
| Endoscopy | 1.40 (0.99–1.97) | 0.056 |
| Carbapenem | 1.38 (0.90–2.12) | 0.137 |
| Antibiotic risk | 1.36 (1.22–1.50) | < 0.001 |
| Hospital days before ICU admission | 1.04 (1.01–1.06) | 0.007 |

CI, confidence interval; CPE, carbapenemase-producing Enterobacterales; ESRD, end-stage renal disease; ICU, intensive care unit; OR, odds ratio; VRE, vancomycin-resistant Enterococci.
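Because logistic regression is linear on the log-odds scale, each adjusted odds ratio is the exponential of a model coefficient, and the odds ratio for a continuous predictor compounds multiplicatively. The sketch below converts two aORs from Table 3 back to coefficients and shows what the per-day estimate would imply for longer pre-ICU stays. It assumes that the 1.04 estimate is expressed per additional hospital day and that the effect is log-linear over the range considered; neither point is stated explicitly in the text.

```python
import math

AOR_HOSPITAL_DAY = 1.04   # Table 3; assumed to be per additional hospital day
AOR_BILIARY_DRAIN = 4.96  # Table 3; binary predictor

# A logistic regression coefficient is the natural log of its odds ratio.
beta_day = math.log(AOR_HOSPITAL_DAY)
beta_drain = math.log(AOR_BILIARY_DRAIN)


def odds_ratio_for_days(days: int) -> float:
    """Odds ratio for `days` pre-ICU hospital days versus zero days."""
    return math.exp(beta_day * days)


for days in (1, 7, 14):
    print(f"{days:>2} days: OR = {odds_ratio_for_days(days):.2f}")
```

Under these assumptions, a 14-day pre-ICU stay corresponds to an odds ratio of about 1.73 relative to no prior hospital days, which is why a per-day estimate close to 1 can still matter clinically.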
SHAP (SHapley Additive exPlanations) values for the final logistic regression model are presented in Fig. 3. In this model, a positive result (red) increases the probability of a patient being colonized by CPE (Fig. 3a), and the mean SHAP value shows the importance of each feature (Fig. 3b). Antibiotic risk, nasogastric tube, and central venous catheter increased the probability of a patient being colonized by CPE. A prolonged duration of hospitalization prior to ICU admission was also associated with an increased risk of CPE colonization.
Fig. 3.
SHAP summary plot (a) and feature importance plot (b) illustrating the relative contribution of predictors to the logistic regression model. ESRD, end-stage renal disease; ICU, intensive care unit; VRE, vancomycin-resistant Enterococci. (a) Summary plot showing the impact of each feature on the model’s prediction, with colors indicating feature values. Positive values (red) indicate an increased probability of the patient being colonized by CPE, and negative values (blue) indicate decreased probability. (b) Bar plot of mean absolute SHAP values illustrating the relative importance of each feature.
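For a linear model such as logistic regression, SHAP values have a closed form when features are treated as independent: on the log-odds scale, the contribution of a feature is its coefficient multiplied by the difference between the patient's value and the background mean. The sketch below illustrates this for three predictors. The coefficients are the natural logs of the aORs in Table 3, and the background means are full-cohort prevalences computed from Table 1. Both choices are assumptions made for illustration, since the fitted coefficients, the training-set means, and any feature scaling used by the authors are not given here.

```python
import math

# Assumed coefficients: ln(aOR) from Table 3.
coef = {
    "nasogastric_tube": math.log(1.82),
    "central_venous_catheter": math.log(1.55),
    "long_term_care_facility": math.log(2.40),
}

# Assumed background means: full-cohort prevalences from Table 1.
background_mean = {
    "nasogastric_tube": (1193 + 225) / 4915,
    "central_venous_catheter": (1697 + 278) / 4915,
    "long_term_care_facility": (290 + 91) / 4915,
}


def linear_shap(x: dict) -> dict:
    """Per-feature contribution to the log-odds, relative to the background mean."""
    return {name: coef[name] * (x[name] - background_mean[name]) for name in coef}


# Hypothetical patient: nasogastric tube present, no central line, admitted from long-term care.
patient = {"nasogastric_tube": 1, "central_venous_catheter": 0, "long_term_care_facility": 1}
shap_values = linear_shap(patient)

for name, value in shap_values.items():
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the patient's log-odds and the log-odds at the background mean, which is the additivity property that makes SHAP summary plots comparable across features.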
Web interface
We developed a web-based CPE prediction tool based on the final logistic regression model using a threshold of 0.45 (www.cpepredictor.com). The application allows clinicians to enter 14 user-facing input items available at the time of ICU admission and instantly obtain an estimated risk of CPE colonization. For clinical usability, individual antibiotic classes are presented as separate yes/no inputs and are used internally to compute a composite antibiotic risk variable. Clinically important high-risk classes (carbapenem and aminoglycoside) are additionally retained as separate predictors. Accordingly, the final logistic regression model is based on the 12 predictors selected during model development. The user interface of this application is presented in Fig. 4.
Fig. 4.

User interface of the CPE prediction web application (www.cpepredictor.com). CPE, carbapenemase-producing Enterobacterales. Clinicians can input selected variables at ICU admission to obtain an estimated risk of CPE colonization.
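The input handling described for the web tool can be sketched as follows. The exact formula for the composite antibiotic risk variable is not given in the text, so this sketch assumes it is simply the number of antibiotic classes received; the class names, function names, and the use of "greater than or equal to" at the 0.45 threshold are likewise hypothetical. The predicted probability is taken as an input rather than computed, because the model intercept and coefficients are not reported here.

```python
# Hypothetical identifiers for the five antibiotic classes listed in Table 1.
ANTIBIOTIC_CLASSES = (
    "beta_lactam_inhibitor",
    "cephalosporin",
    "fluoroquinolone",
    "carbapenem",
    "aminoglycoside",
)
THRESHOLD = 0.45  # operating threshold reported for the final model


def antibiotic_risk(inputs: dict) -> int:
    """Assumed composite: count of antibiotic classes marked 'yes'."""
    return sum(1 for name in ANTIBIOTIC_CLASSES if inputs.get(name, False))


def classify(predicted_probability: float, threshold: float = THRESHOLD) -> str:
    """Map a predicted probability to a rule-out style label (assumed >= rule)."""
    if not 0.0 <= predicted_probability <= 1.0:
        raise ValueError("predicted_probability must be between 0 and 1")
    if predicted_probability >= threshold:
        return "predicted carrier: consider early precautions or rapid testing"
    return "predicted non-carrier: await surveillance culture"


answers = {"carbapenem": True, "cephalosporin": True, "fluoroquinolone": False}
print(antibiotic_risk(answers))
print(classify(0.52))
print(classify(0.20))
```

Keeping the yes/no inputs separate from the derived composite mirrors the description above, in which carbapenem and aminoglycoside exposure feed both the composite and their own predictors.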
Discussion
This study was conducted to develop a machine learning model that helps clinicians isolate CPE carriers before surveillance culture results become available. Given the low prevalence of CPE colonization and the potential consequences of missed carriers, we prioritized negative predictive value (NPV) and sensitivity, which are more clinically important indices for infection control decision-making. In our study, logistic regression at a threshold of 0.45 achieved a sensitivity of 0.73 and an NPV of 0.96, which are higher than or comparable to those of previous prediction models using machine learning10–12. Threshold selection was guided by the intended clinical use of the model as a rule-out decision support tool. A threshold sensitivity analysis demonstrated that lowering the threshold from 0.50 to 0.45 improved sensitivity from 0.63 to 0.73 while maintaining a high NPV (0.95 and 0.96, respectively). In contrast, further lowering the threshold to 0.40 raised sensitivity by only 0.04 (to 0.77) while reducing specificity from 0.69 to 0.62, which would markedly increase the number of patients requiring isolation. Therefore, a threshold of 0.45 was selected as a pragmatic balance between infection control safety and operational feasibility. Although other machine learning models, including Gradient Boosting, AdaBoost, and SVM, demonstrated higher specificity than logistic regression, they exhibited markedly low sensitivity, misclassifying a substantial proportion of true CPE carriers as negative. Such performance substantially limits their clinical applicability, as missed carriers may lead to delayed isolation and increased risk of nosocomial transmission.
Patients predicted not to be CPE carriers by the logistic regression model had a very low probability of CPE positivity, with an NPV of 96%. As universal active surveillance with empirical contact precautions until results are available has shown no significant effect on transmission13, our prediction model may assist clinicians in ruling out CPE carriage without the need for isolation. However, the relatively low positive predictive value (PPV) of 19% indicates that only 19% of patients predicted to be colonized by CPE were true positives, and applying this model as a stand-alone tool could lead to unnecessary isolation. Therefore, combining our model with other microbiological tools for rapid screening of CPE carriage, such as the Xpert Carba-R assay, which has been reported to enable robust infection control14, might assist clinical decision-making in patients predicted by our model to be positive CPE carriers.
In our study, central venous catheter, nasogastric tube, admission to long-term care facilities, previous antibiotic exposure, and prolonged hospital stay prior to ICU admission emerged as key predictors of CPE carriage upon ICU admission. In a previous study, Freire et al.11 developed prediction models for CRE carriage at the time of liver transplantation using machine-learning algorithms, and identified use of antibiotics and use of β-lactam/β-lactamase inhibitor as contributing factors, in line with our findings. Similarly, Liang et al. developed an early prediction model for carbapenem-resistant Gram-negative bacterial carriage in ICUs and reported invasive catheterization, operation history, and history of cephalosporin use as important variables10.
A key strength of our study is that we developed a freely accessible web application (www.cpepredictor.com), which allows clinicians to easily apply the model at the time of ICU admission. The tool is intended to be used during transfer from the emergency department or general ward to the ICU, or within the first hours of admission. It supports clinical decision-making regarding early empiric contact precautions for high-risk patients while awaiting rectal swab culture results, as well as prioritization of limited isolation rooms and additional rapid molecular testing, such as the Xpert Carba-R assay. Although a single operating threshold optimized for rule-out performance was used in this study, the predicted probability may be interpreted as a risk spectrum and adapted according to local CPE prevalence, available isolation capacity, and infection control policies. Periodic model recalibration and external validation are required to ensure sustained performance as local epidemiology changes.
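The dependence on local prevalence can be made concrete with Bayes' rule. The sketch below recomputes PPV and NPV from the sensitivity and specificity reported at the 0.45 threshold for a few prevalence levels. It assumes that sensitivity and specificity carry over unchanged to other settings, which would need external validation; the 2% and 20% prevalence values are hypothetical examples, and only 9.2% comes from this cohort.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


SENSITIVITY, SPECIFICITY = 0.725, 0.689  # Table 2, threshold 0.45

# 0.092 is the cohort prevalence; 0.02 and 0.20 are hypothetical settings.
for prevalence in (0.02, 0.092, 0.20):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.3f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

At the cohort prevalence this reproduces a PPV near 0.19 and an NPV near 0.96. In a lower-prevalence unit the NPV would rise and the PPV would fall further, while the opposite holds where CPE is more common.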
There are several limitations to our study. First, it was conducted at a single tertiary hospital in South Korea, which may limit its generalizability to settings with different levels of CPE prevalence. Second, its retrospective design carries the risk of unmeasured confounders and missing variables that could influence CPE colonization. Third, although the model achieved high NPV and acceptable sensitivity, 27.5% of true CPE carriers in the test set were still misclassified as negative. Therefore, the model should not be used to replace routine microbiological screening tests. Instead, it should be applied as a clinical decision support tool to assist in prioritizing early contact precautions for high-risk patients while maintaining universal screening upon ICU admission. In addition, the model’s PPV remained modest (0.19), indicating that a positive prediction does not establish colonization with CPE. Rather, such predictions should be interpreted as an indication to consider early isolation or additional rapid testing in conjunction with the clinical context. Finally, external and prospective validation in diverse healthcare settings is needed to increase generalizability. To further improve model performance, additional environmental factors, such as ICU colonization pressure and hospital-wide epidemic conditions with CPE seasonality, could be incorporated15,16.
Conclusion
We developed a machine learning–based tool to predict CPE colonization at ICU admission that shows a high NPV with acceptable sensitivity, supporting its use as a rule-out aid for CPE carriage. The model can assist clinical decision-making by helping to allocate resources for early isolation of high-risk CPE carriers before surveillance culture results become available.
Methods
Study design and population
This was a retrospective observational study conducted at Hallym University Sacred Heart Hospital, an 842-bed tertiary referral hospital in Anyang, South Korea, from January 2022 to December 2023. The study was conducted and reported in accordance with TRIPOD-AI (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis – Artificial Intelligence) guidelines.
The study population consisted of adult patients (aged ≥ 18 years) who were admitted to the ICU during the study period. Since 2013, our hospital has performed rectal swab surveillance cultures for all patients admitted to the ICU, and contact precautions are applied to patients colonized or infected with CPE. To avoid duplication, only the first ICU admission per patient was included. Patients were also excluded if CPE colonization had been confirmed prior to ICU admission or if no rectal swab was performed within 48 h of admission. All patient-identifying information was removed prior to analysis.
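The inclusion and exclusion criteria above can be expressed as a small filter. This is a minimal sketch, not the authors' extraction code: the `Admission` record and its field names are hypothetical, the first admission per patient is identified before the exclusions are applied, and "within 48 h of admission" is read as 0 to 48 hours after ICU admission. Each of those readings is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Admission:  # hypothetical record layout
    patient_id: str
    age: int
    icu_admit: datetime
    swab_time: Optional[datetime]  # None if no rectal swab was taken
    prior_cpe: bool                # CPE confirmed before ICU admission


def select_cohort(admissions: List[Admission]) -> List[Admission]:
    cohort, seen = [], set()
    for adm in sorted(admissions, key=lambda a: a.icu_admit):
        if adm.patient_id in seen:
            continue  # keep only the first ICU admission per patient
        seen.add(adm.patient_id)
        if adm.age < 18 or adm.prior_cpe or adm.swab_time is None:
            continue
        delay = adm.swab_time - adm.icu_admit
        if timedelta(0) <= delay <= timedelta(hours=48):
            cohort.append(adm)
    return cohort


admissions = [
    Admission("A", 70, datetime(2022, 1, 5, 10), datetime(2022, 1, 5, 12), False),
    Admission("A", 70, datetime(2022, 6, 1, 10), datetime(2022, 6, 1, 12), False),
    Admission("B", 17, datetime(2022, 2, 1, 9), datetime(2022, 2, 1, 10), False),
    Admission("C", 55, datetime(2022, 3, 1, 8), datetime(2022, 3, 4, 8), False),
    Admission("D", 60, datetime(2022, 4, 1, 8), None, False),
    Admission("E", 80, datetime(2022, 5, 1, 8), datetime(2022, 5, 1, 9), True),
]
cohort = select_cohort(admissions)
print([adm.patient_id for adm in cohort])
```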
Sample size
A total of 4,915 ICU admissions met the inclusion criteria. Clinical data were collected from electronic medical records (EMR). The dataset was split temporally, with patients admitted in 2022 used as the training dataset and patients admitted in 2023 used as the test dataset. Due to the retrospective nature of the study, no formal sample size calculation was conducted.
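A temporal split of this kind can be sketched in a few lines. The record layout and the `icu_admit_date` field name are hypothetical; only the rule itself (2022 admissions for training, 2023 admissions for testing) comes from the text.

```python
from datetime import date
from typing import Dict, List, Tuple


def temporal_split(records: List[Dict], date_key: str = "icu_admit_date") -> Tuple[List[Dict], List[Dict]]:
    """Split records by admission year: 2022 for training, 2023 for testing."""
    train = [r for r in records if r[date_key].year == 2022]
    test = [r for r in records if r[date_key].year == 2023]
    return train, test


# Hypothetical toy records; "cpe" is the binary outcome.
records = [
    {"icu_admit_date": date(2022, 3, 14), "cpe": 0},
    {"icu_admit_date": date(2022, 11, 2), "cpe": 1},
    {"icu_admit_date": date(2023, 1, 9), "cpe": 0},
]
train, test = temporal_split(records)
print(len(train), len(test))
```

Splitting by time rather than at random means the test set mimics prospective use, at the cost of sensitivity to any shift in case mix or CPE prevalence between the two years.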
Variables
Candidate variables were selected based on the literature, clinical relevance, and availability in the EMR system. A total of 42 variables were extracted from data available at the time of ICU admission. These included demographic characteristics (e.g., age, sex), underlying comorbidities (e.g., diabetes mellitus, previous surgery), prior healthcare exposures (e.g., previous hospitalization, admissions to long-term care facilities, and antibiotic use), and microbiological history (e.g., previous colonization with VRE). To facilitate model training, related variables were also grouped into composite features. Specifically, indwelling devices (e.g., central venous catheter, nasogastric tube, biliary drain), prior antibiotic exposures (e.g., fluoroquinolones, cephalosporins, carbapenems, β-lactam/β-lactamase inhibitor, aminoglycosides), and comorbidities (e.g., diabetes mellitus, chronic renal disease, cardiovascular disease, solid cancer, hematologic malignancy) were aggregated into summary risk scores. There were no missing values in the final dataset.
Outcome definition
The outcome of interest was presence of colonization with CPE at ICU admission. CPE colonization was defined as a positive rectal swab culture collected within 48 h of ICU admission. The outcome was treated as a binary variable: CPE colonization (1) or no colonization (0).
Feature selection
To improve model performance and reduce overfitting, all feature selection procedures were performed exclusively using the training dataset, with the test dataset kept fully untouched until final model evaluation. We first examined multicollinearity using variance inflation factors (VIF), and variables with high collinearity (VIF > 10) were excluded. Subsequently, we conducted a univariate logistic regression for each variable and retained those with P values < 0.1. Finally, backward elimination was applied to iteratively remove variables with higher P values, resulting in a final set of 12 predictors.
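The collinearity screen in the first step can be illustrated with a direct computation of the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing feature j on all other features. The sketch below uses NumPy and synthetic data; it is not the authors' code, and the helper name and toy variables are assumptions. Only the VIF > 10 cut-off is taken from the text.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X, regressing it on the other columns plus an intercept."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        residual = y - others @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Synthetic example: column c is nearly a copy of column a, column b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)
X = np.column_stack([a, b, c])

vifs = variance_inflation_factors(X)
flagged = vifs > 10  # cut-off stated in the text
print(np.round(vifs, 1), flagged)
```

Both near-duplicate columns are flagged here. A common practice, though not one confirmed by the text, is to drop the variable with the highest VIF and recompute, rather than removing every flagged variable at once.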
Machine learning models
We compared ten machine learning classifiers for predicting CPE colonization at ICU admission: logistic regression, decision tree, random forest, extra trees, gradient boosting, AdaBoost, support vector machine (SVM), XGBoost, LightGBM, and a voting ensemble. All models were trained using default hyperparameters without further tuning to avoid overfitting. All analyses were conducted in Python.
Model performance and evaluation
Model performance was evaluated using stratified 5-fold cross-validation on the training dataset. Evaluation metrics included precision recall-area under the curve (PR-AUC), receiver operating characteristics-area under the curve (ROC-AUC), sensitivity (recall), specificity, positive predictive value (PPV), negative predictive value (NPV), F1 score and accuracy.
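Stratification matters here because only about 9% of admissions are CPE-positive, so unstratified folds could end up with very few positives. The sketch below shows one simple way to build stratified folds using only the standard library. It is an illustration of the idea rather than the authors' implementation (libraries such as scikit-learn provide this as `StratifiedKFold`), and the toy label vector is hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Sequence


def stratified_kfold_indices(labels: Sequence[int], n_splits: int = 5, seed: int = 0) -> List[List[int]]:
    """Assign each sample index to one of n_splits folds, balancing classes across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal shuffled indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


# Hypothetical toy labels with roughly 9% positives.
labels = [1] * 46 + [0] * 454
folds = stratified_kfold_indices(labels)

for k, validation_idx in enumerate(folds):
    held_out = set(validation_idx)
    training_idx = [i for i in range(len(labels)) if i not in held_out]
    positives = sum(labels[i] for i in validation_idx)
    print(f"fold {k}: train = {len(training_idx)}, validation = {len(validation_idx)}, positives = {positives}")
```

Each fold serves once as the validation set while the remaining four are used for fitting, and the metrics are then averaged across folds.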
Model interpretation
To improve interpretability of the logistic regression model, we reported odds ratios with 95% confidence intervals. Feature importance was further analyzed using Shapley Additive exPlanations (SHAP) values. All code can be found at this GitHub address (https://github.com/hyeonji0831/CPE_machine_learning_prediction).
Statistical analysis
The Mann-Whitney U test was used to compare differences between continuous variables, and the Pearson chi-square test or Fisher’s exact test was used for the corresponding categorical variables, as appropriate. A two-tailed P value of < 0.05 was considered statistically significant. All statistical analyses were performed with R software (R Development Core Team, Vienna, Austria).
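As a worked example of the categorical comparison, the sketch below applies a Pearson chi-square test to the "previous hospitalization within 6 months" row of Table 1. The authors report using R; this Python version is only an illustration, and it assumes that no continuity correction was applied. For one degree of freedom the p-value can be obtained from the complementary error function, so no external library is needed.

```python
import math
from typing import Tuple


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Uncorrected Pearson chi-square test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Table 1: previous hospitalization within 6 months.
# Negative CPE carriage: 1,993 of 4,462; positive CPE carriage: 232 of 453.
chi2, p_value = pearson_chi2_2x2(1993, 4462 - 1993, 232, 453 - 232)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

Under the no-correction assumption, the result rounds to the P = 0.008 shown in Table 1 for this variable.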
Author contributions
Conceptualization: Hyeonji Seo, Methodology: Hyeonji Seo, Ji Hun Kim, Data curation: Hyeonji Seo, Software and formal analysis: Hyeonji Seo, Ji Hun Kim, Writing – original draft: Hyeonji Seo, Ji Hun Kim, Supervision: Yun Woo Lee, Eunmi Yang, Han-Sung Kim.
Funding
This research was supported by the Hallym University Research Fund 2023 (HURF-2023-02).
Data availability
The data supporting the findings of this study and the code used in this study are available in the GitHub repository (https://github.com/hyeonji0831/CPE_machine_learning_prediction).
Competing interests
The authors declare no competing interests.
Ethical approval
Our research involving human data was performed in accordance with the Declaration of Helsinki. This study was approved by the Institutional Review Board of Hallym University Sacred Heart Hospital (IRB No. 2023-07-004-001). Informed consent was waived by the ethics committee because no intervention was involved, and no patient-identifying information was included. All the data were anonymized, and identifying information in the electronic medical records was encrypted to ensure patient confidentiality.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. van Duin, D. & Doi, Y. The global epidemiology of carbapenemase-producing Enterobacteriaceae. Virulence 8, 460–469. 10.1080/21505594.2016.1222343 (2017).
- 2. Jeong, H., Hyun, J. & Lee, Y. K. Epidemiological characteristics of carbapenemase-producing Enterobacteriaceae outbreaks in the Republic of Korea between 2017 and 2022. Osong Public. Health Res. Perspect. 14, 312–320. 10.24171/j.phrp.2023.0069 (2023).
- 3. Guidelines for the prevention and control of carbapenem-resistant Enterobacteriaceae, Acinetobacter baumannii and Pseudomonas aeruginosa in health care facilities (2017). https://www.who.int/publications/i/item/9789241550178.
- 4. Dautzenberg, M. J. et al. The association between colonization with carbapenemase-producing Enterobacteriaceae and overall ICU mortality: an observational cohort study. Crit. Care Med. 43, 1170–1177. 10.1097/ccm.0000000000001028 (2015).
- 5. Kim, Y. A. et al. Risk factors for Carbapenemase-Producing enterobacterales infection or colonization in a Korean intensive care unit: a Case-Control study. Antibiot. (Basel Switzerl.) 9, 569. 10.3390/antibiotics9100680 (2020).
- 6. Yan, L., Sun, J., Xu, X. & Huang, S. Epidemiology and risk factors of rectal colonization of carbapenemase-producing Enterobacteriaceae among high-risk patients from ICU and HSCT wards in a university hospital. Antimicrob. Resist. Infect. Control. 9, 155. 10.1186/s13756-020-00816-4 (2020).
- 7. Papadimitriou-Olivgeris, M. et al. Risk factors for KPC-producing Klebsiella pneumoniae enteric colonization upon ICU admission. J. Antimicrob. Chemother. 67, 2976–2981 (2012).
- 8. Wang, L. et al. Predicting the occurrence of multidrug-resistant organism colonization or infection in ICU patients: development and validation of a novel multivariate prediction model. Antimicrob. Resist. Infect. Control. 9, 66. 10.1186/s13756-020-00726-5 (2020).
- 9. Papafotiou, C. et al. Predictive score for patients with carbapenemase-producing enterobacterales colonization upon admission in a tertiary care hospital in an endemic area. J. Antimicrob. Chemother. 77, 3331–3339. 10.1093/jac/dkac321 (2022).
- 10. Liang, Q., Zhao, Q., Xu, X., Zhou, Y. & Huang, M. Early prediction of carbapenem-resistant Gram-negative bacterial carriage in intensive care units using machine learning. J. Glob Antimicrob. Resist. 29, 225–231. 10.1016/j.jgar.2022.03.019 (2022).
- 11. Freire, M. P. et al. Prediction models for carbapenem-resistant enterobacterales carriage at liver transplantation: a multicenter retrospective study. Transpl. Infect. Dis.: Off. J. Transplant. Soc. 24, e13920. 10.1111/tid.13920 (2022).
- 12. McGuire, R. J. et al. A pragmatic machine learning model to predict carbapenem resistance. Antimicrob. Agents Chemotherapy 65, e0006321. 10.1128/aac.00063-21 (2021).
- 13. Jung, J. et al. Active surveillance testing to reduce transmission of carbapenem-resistant, gram-negative bacteria in intensive care units: a pragmatic, randomized cross-over trial. Antimicrob. Resist. Infect. Control. 12, 16. 10.1186/s13756-023-01222-2 (2023).
- 14. Park, S. H. et al. The impact of enhanced screening for carbapenemase-producing enterobacterales in an acute care hospital in South Korea. Antimicrob. Resist. Infect. Control. 12, 62. 10.1186/s13756-023-01270-8 (2023).
- 15. Kim, J. Y. et al. The seasonality of carbapenemase-producing enterobacterales in South Korea. J. Hosp. Infect. 140, 87–89. 10.1016/j.jhin.2023.07.010 (2023).
- 16. Logan, L. K. & Weinstein, R. A. The epidemiology of carbapenem-resistant enterobacteriaceae: the impact and evolution of a global menace. J. Infect. Dis. 215, S28–S36 (2017).