Communications Medicine. 2026 Jan 12;6:104. doi: 10.1038/s43856-025-01364-x

A matched case-control study on Escherichia coli factors contributing to sepsis and septic shock in bacteraemic patients

Natalia Maldonado 1,2,3,4,#, Inmaculada López-Hernández 1,2,3,4,#, John Karlsson Valik 5,6, Luis Eduardo López-Cortes 1,2,3,4, Pedro María Martínez Pérez-Crespo 7, Andrea García-Montaner 1,3,4,34, Manuel Alcalde-Rico 1,2,3,4, Adrián Sousa-Domínguez 8, Alfredo Jover Sáenz 9, Josune Goikoetxea 10, Ángeles Pulido-Navazo 11, Luis Buzón-Martín 12, Ana Isabel Aller 7, Lucía Boix-Palop 13, Alfonso del Arco-Jiménez 14, Alejandro Smithson-Amat 15, Juan Manuel Sánchez Calvo 16, Clara Natera-Kindelán 2,17, José Mª Reguera Iglesias 18, Carlos Armiñanzas-Castillo 2,19, Fátima Galán-Sánchez 20, Alberto Bahamonde 21, Isabel Gea-Lázaro 22, Cristian Castelló-Abietar 23,24, Inés Pérez-Camacho 25,35, Teresa Marrodán-Ciordia 26, Berta Becerril-Carral 27, Pontus Naucler 5,6, Álvaro Pascual-Hernández 1,2,3,4, Jesús Rodríguez-Baño 1,2,3,4, on behalf of the PROBAC REIPI/GEIRAS-SEIMC/SAMICEI Group
PMCID: PMC12895010  PMID: 41526674

Abstract

Background

One third of patients with Escherichia coli bacteraemia develop a dysregulated inflammatory response (sepsis/septic shock). Our objective was to investigate whether specific microbiological determinants of E. coli are associated with presentation with sepsis/shock.

Methods

A matched case-control study was performed; 101 case-patients with E. coli bacteraemia presenting with sepsis (SEPSIS-3 criteria) and 101 control-patients with E. coli bacteraemia without sepsis were matched by service, sex, age, Charlson index, acquisition and source of the bacteraemia and empirical treatment. Whole genome sequencing of E. coli isolates was performed (Illumina MiSeq). Sequence type, serotype, fimH type, virulence factors, antibiotic resistance genes, plasmid replicons, pathogenicity islands and prophages were determined. A multivariate model for presentation with sepsis/septic shock was built using conditional logistic regression. Its predictive capacity on the observed data was measured with the area under the ROC curve (AUROC) with 95% confidence intervals (CI).

Results

Here we show that in the multivariate model (adjusted OR; 95% CI), the ST69 clone (7.53; 1.06–35.05) and the pic gene (4.38; 1.53–12.54) are associated with presentation with sepsis/shock, while the genes papC (0.30; 0.12–0.74) and fdeC (0.18; 0.03–1.32) show a protective effect. The AUROC of this model is 0.81 (95% CI 0.74–0.87).

Conclusions

We identify E. coli bacterial factors associated with severe clinical presentation in patients with bacteraemia. Further studies are needed to evaluate these factors as potential preventive or therapeutic targets.

Subject terms: Bacterial pathogenesis, Bacterial infection, DNA sequencing

Plain language summary

Escherichia coli is the most common cause of invasive infections, including bacteraemia, which can progress to severe conditions such as sepsis or septic shock. While many host factors determine the severity of illness, this study examined the bacterial factors that may contribute to sepsis severity. We directly compared E. coli-infected patients with similar traits, with or without sepsis, to control for patient factors. Our analysis revealed that the ST69 clone and the presence of the pic gene were significantly associated with an increased risk of sepsis/septic shock, whereas the adhesion genes papC and fdeC were associated with a lower risk. These findings underscore a role for specific E. coli genetic factors in determining clinical severity and point to potential bacterial targets for improved diagnostics and novel preventive or therapeutic interventions.


Maldonado, López-Hernández, et al. use a matched case-control study to compare E. coli-infected patients with or without sepsis. Their analysis shows that the ST69 clone is associated with a higher risk of sepsis, whereas certain genetic factors, such as the adhesion genes papC and fdeC, are associated with a protective effect.

Introduction

Escherichia coli (E. coli) is the most common pathogen in invasive infections in Europe, representing around 40% of invasive isolates reported to EARS-Net1 in 2022, with an estimated incidence of bacteraemia of 48 per 100,000 person-years, which is much higher in older people2. Between 20 and 30% of patients with E. coli bacteraemia develop a dysregulated inflammatory response to infection, in the form of sepsis or septic shock3–5, which is associated with a much higher risk of death. While host genetic or epigenetic determinants, particularly those associated with inflammatory, metabolic and immune responses, are likely to play a decisive role in the development of a dysregulated response to infection6,7, the fact that the causative pathogen accounts for a significant degree of variability in the host response8 suggests that certain microbiological characteristics may also be important.

The association of E. coli genetic factors with the severity of clinical presentation in patients with invasive infection has been examined in several studies, with discrepant results. This is probably due to the heterogeneity of the populations and the wide diversity of E. coli isolates3, as well as to study limitations in sample size, design, number of virulence genes studied, sepsis/septic shock definition criteria, molecular techniques used for bacterial characterisation and lack of information on gene expression.

The objective of this study was to comprehensively analyse the contribution of E. coli microbiological determinants to presentation with sepsis/septic shock in patients with bacteraemia, using a design specifically intended to control for the confounding effect of host factors, as a first step towards identifying bacterial targets for the development of new diagnostic, preventive and therapeutic strategies.

The findings of this study reveal that specific microbiological factors of E. coli play a role in disease severity. Specifically, isolates from clone ST69 and those harbouring the pic gene are linked to a higher risk of developing sepsis or septic shock, whereas the presence of the adhesion genes papC and fdeC appears to confer a protective effect against this severe clinical outcome.

Methods

Study design and participants

This is a matched case-control study nested in the prospective multicentre PROBAC cohort (NCT03148769), which included patients with bacteraemia between October 2016 and October 2017 in 26 Spanish hospitals. The overall design of the PROBAC cohort was previously described9.

In the present study, patients with monomicrobial E. coli bacteraemia from the PROBAC cohort were eligible. Exclusion criteria were polymicrobial bacteraemia, neutropenia (<500 neutrophils/μL), solid organ transplantation and immunosuppression (HIV infection with ≤200 CD4 cells/µL, haematological cancer, primary immunodeficiency, or receipt of immunosuppressive drugs such as antineoplastic chemotherapy or classic or biological immunosuppressants). The first 225 patients reported in the database with E. coli bacteraemia meeting sepsis or septic shock criteria (see below) were eligible as cases. Patients with E. coli bacteraemia not presenting with sepsis or septic shock were eligible as controls and were matched 1:1 to a case according to the following criteria: hospital service, sex, age (±10 years), Charlson comorbidity index (±2 points), type of infection acquisition (nosocomial, healthcare-associated or community-acquired), source of bacteraemia and appropriateness of antibiotic treatment received in the first 24 h. When more than one matching control candidate was available, the control was randomly selected using an Excel function for binary randomisation.
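The matching rule above can be sketched as a greedy 1:1 procedure. This is a minimal illustration, not the authors' code (they report using an Excel randomisation function): the `Patient` record and its field names are hypothetical, and processing cases in reporting order and drawing controls without replacement are assumptions not stated in the text.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical record holding the seven matching criteria
    pid: str
    service: str           # hospital service
    sex: str
    age: int               # years
    charlson: int          # Charlson comorbidity index
    acquisition: str       # nosocomial, healthcare-associated or community-acquired
    source: str            # source of bacteraemia
    appropriate_24h: bool  # appropriate antibiotic treatment in the first 24 h


def is_match(case: Patient, control: Patient) -> bool:
    """Exact match on categorical criteria; age within ±10 years, Charlson within ±2 points."""
    return (
        case.service == control.service
        and case.sex == control.sex
        and abs(case.age - control.age) <= 10
        and abs(case.charlson - control.charlson) <= 2
        and case.acquisition == control.acquisition
        and case.source == control.source
        and case.appropriate_24h == control.appropriate_24h
    )


def match_pairs(cases, controls, seed=0):
    """Greedy 1:1 matching without replacement; ties broken by random selection."""
    rng = random.Random(seed)
    available = list(controls)
    pairs = []
    for case in cases:  # assumption: cases handled in the order they were reported
        candidates = [c for c in available if is_match(case, c)]
        if not candidates:
            continue  # no eligible control: the case stays unmatched
        chosen = rng.choice(candidates)
        available.remove(chosen)  # each control is used at most once
        pairs.append((case, chosen))
    return pairs
```

Greedy matching is order-dependent: a case processed early can take the only control that a later case could have used, which is one way the number of usable pairs (101 here) can fall below the number of eligible cases.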

Variables and definitions

The endpoint variable defining cases and controls was presentation with sepsis or septic shock or not. Sepsis was defined using SEPSIS-3 criteria10 as an increase in SOFA score of ≥ 2 points on the day the first positive blood cultures were obtained. Septic shock was defined as the requirement of vasopressors to maintain a mean arterial pressure of 65 mm Hg or greater during the first 24 h.
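The case definition can be written as a small decision rule. This sketch follows the operational definitions stated here; treating septic shock as a subset of sepsis (vasopressor use only counts when ΔSOFA ≥ 2) is an assumption, and the function and parameter names are hypothetical. Note that the Sepsis-3 consensus definition of septic shock also includes a serum lactate criterion (>2 mmol/L), which the definition given here does not mention and the code does not use.

```python
def classify_presentation(delta_sofa: int, vasopressors_for_map_65: bool) -> str:
    """Classify presentation using the study's stated operational definitions.

    delta_sofa: increase in SOFA score on the day the first positive blood
        cultures were obtained.
    vasopressors_for_map_65: vasopressors required to keep mean arterial
        pressure >= 65 mm Hg during the first 24 h.
    """
    if delta_sofa < 2:
        return "no sepsis"     # eligible as a control
    if vasopressors_for_map_65:
        return "septic shock"  # assumed to be a subset of sepsis
    return "sepsis"


def is_case(delta_sofa: int, vasopressors_for_map_65: bool) -> bool:
    """Cases are patients presenting with either sepsis or septic shock."""
    return classify_presentation(delta_sofa, vasopressors_for_map_65) != "no sepsis"
```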

Patients’ and infection-related features collected were previously detailed9 and included demographics, underlying conditions and Charlson comorbidity index, recent exposure to antimicrobials and invasive devices, acquisition type of the infection, source of bacteraemia, appropriateness of antibiotic therapy and 30-day all-cause mortality.

To identify potential confounders and guide the selection of covariates for statistical adjustment, a directed acyclic graph was created illustrating the hypothesised relationships between variables in the analysis of sepsis/septic shock in patients with E. coli bacteraemia (Fig. 1, Supplementary information).

Fig. 1. Phylogenomic reconstruction of bacteraemic isolates of Escherichia coli from patients presenting as sepsis/septic shock (cases) and without sepsis (controls).


Phylogenomic reconstruction of case-control Escherichia coli isolates based on a 99% core-genome alignment. The maximum likelihood (ML) phylogenomic tree, based on 1000 rapid bootstrap inferences under the GTR substitution model, was obtained with RAxML. A recombination-free ML tree was then generated with ClonalFrameML and visualised with the Tree of Life interactive tool (iTOL). Sequence types and phylogroups are represented by coloured square boxes. Isolates from 'control' patients are labelled with a brown star, and those from 'case' patients with a yellow star.

Human ethics

The PROBAC project received ethical approval from the Hospital Universitario Virgen Macarena Ethics Board (reference: FIS-AMO-2016-01) and all necessary participating institutional review boards. Because of the study’s entirely observational nature, the need for written informed consent was officially waived.

Microbiological procedures

E. coli isolates from blood cultures of PROBAC patients were stored at each participating site at −80 °C in trypticase soy broth containing 15% (v/v) glycerol and sent to the Microbiology Laboratory of the Hospital Universitario Virgen Macarena, where whole genome sequencing (WGS) and de novo assembly were performed using an Illumina MiSeq system and CLC Genomics Workbench v10, respectively. Antibiotic susceptibility was determined by the EUCAST broth microdilution method, using EUCAST version 10.0 clinical breakpoints for interpretation. All microbiological procedures have been described in detail previously11.

Bioinformatic analysis

E. coli genomes were annotated with Bakta12 (v1.6.1), and a multiple alignment with a 99% core-genome definition was performed with Roary13,14 (v3.13.0). The maximum likelihood (ML) phylogenomic tree was obtained with RAxML 8.2.12, executing 1000 rapid bootstrap inferences under the GTR substitution model. The final recombination-free ML tree was generated with ClonalFrameML15 (v1.12) and visualised with the Tree of Life (iTOL) v6 interactive tool16.

Bioinformatic analyses used to identify the microbiological factors of E. coli were performed as described before11. In summary, in silico characterisation of sequence type (ST), serotype, fimH type, virulence factors, antibiotic resistance genes and plasmid replicons was performed using online resources integrated in the Center for Genomic Epidemiology. Pathogenicity islands (PAI) were analysed using the PAI-DB database, and prophage sequences within bacterial genomes were identified and annotated with the PHASTER web server. For each isolate, a score was calculated for each category of microbiological factors (virulence genes, resistance genes, PAI, plasmids and prophage sequences) by summing the number of genes/elements detected in that category.
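The per-category scores described above amount to counting distinct detected elements. A minimal sketch, assuming the tool outputs have already been parsed into `(isolate, category, element)` tuples (that parsing step and the category labels are hypothetical) and that repeated hits for the same element count once:

```python
from collections import defaultdict

# Categories summed into per-isolate scores (labels are illustrative)
CATEGORIES = ("virulence", "resistance", "PAI", "plasmid", "prophage")


def score_isolates(hits):
    """Count distinct genes/elements per category for each isolate.

    hits: iterable of (isolate_id, category, element) tuples.
    Isolates with no hits at all do not appear in the result.
    """
    found = defaultdict(lambda: {c: set() for c in CATEGORIES})
    for isolate, category, element in hits:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        found[isolate][category].add(element)  # a set ignores duplicate hits
    return {
        isolate: {c: len(elements) for c, elements in cats.items()}
        for isolate, cats in found.items()
    }
```

For example, an isolate with hits for papC (twice), fdeC and the IncFIB (AP001918) replicon gets a virulence score of 2 and a plasmid score of 1.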

Statistics and reproducibility

We were able to match 101 pairs of cases and controls; this sample size allowed detection of a 20% difference in the risk of sepsis or septic shock between isolates with and without specific microbiological factors with 91.3% power, or a 17% difference with 81.8% power, at an alpha error of 5%.
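The text does not state the baseline proportions or the power formula used, so the following is only a generic sketch of how such a calculation works, not a reproduction of the reported figures. It uses the normal approximation for a two-sided test of two independent proportions, which ignores the matched design (a paired approach such as McNemar's test would be closer to a 1:1 study).

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two independent proportions.

    Uses the pooled variance under H0 and the unpooled variance under H1.
    Ignores matching, so it is only a rough guide for a 1:1 case-control study.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    # Probability of rejecting H0 in the direction of the true difference
    return NormalDist().cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)
```

Power depends on where the difference sits on the 0 to 1 scale (a 20-point difference around 50% needs more patients than one near 0% or 100%), which is why the assumed baseline proportion matters.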

Crude odds ratios (OR) with 95% confidence intervals (CI) for developing sepsis or septic shock were calculated for the different patient features and microbiological factors by conditional logistic regression. An exploratory analysis of the independent association of microbiological factors (phylogroups, STs, plasmids, prophage sequences, PAI, virulence genes, antibiotic resistance genes and antibiotic susceptibility) with sepsis/septic shock was performed by multivariate conditional logistic regression. To control for the potential confounding effect of variables not included as matching criteria while avoiding model overfitting, a propensity score (PS) for being a case or a control was calculated using a binary logistic regression model that included all available baseline patient and infection-related variables. The predictive capacity of the PS model on the observed data was estimated by its area under the receiver operating characteristic curve (AUROC) with 95% CI. The PS was included as a covariate in the multivariate models investigating the effect of microbiological determinants. Microbiological variables with a two-sided bivariate p < 0.2 were entered in groups (first phylogroups, then STs, PAI, virulence genes, antibiotic resistance genes and antibiotic susceptibility results, plasmids and finally prophage sequences) using a hierarchical forward stepwise procedure. Variables with a two-sided p < 0.1, together with the PS, were retained, and adjusted ORs with 95% CI were calculated. The predictive capacity of the final multivariate model on the observed data was estimated by its AUROC with 95% CI. Statistical analyses were performed using IBM SPSS Statistics for Windows (v21.0; IBM Corp., Armonk, NY).
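For 1:1 matched pairs, the conditional likelihood has a simple form: the probability that the case (rather than its control) is the case equals sigmoid(β·(x_case − x_control)). Conditional logistic regression is therefore a logistic regression without intercept on within-pair differences, and concordant pairs drop out. The authors used SPSS; the following is an independent NumPy sketch of that estimator (Newton-Raphson), not their code.

```python
import numpy as np


def conditional_logit_1to1(x_case, x_control, max_iter=50, tol=1e-10):
    """Conditional logistic regression for 1:1 matched pairs via Newton-Raphson.

    x_case, x_control: arrays of shape (n_pairs,) or (n_pairs, n_covariates).
    Returns (beta, se) on the log-odds scale; exp(beta) are the odds ratios.
    """
    d = np.asarray(x_case, dtype=float) - np.asarray(x_control, dtype=float)
    if d.ndim == 1:
        d = d[:, None]  # single covariate -> column vector
    beta = np.zeros(d.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-d @ beta))          # P(case is the case | pair)
        grad = d.T @ (1.0 - p)                       # score vector
        info = (d * (p * (1.0 - p))[:, None]).T @ d  # information matrix
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the information matrix at the final estimate
    p = 1.0 / (1.0 + np.exp(-d @ beta))
    info = (d * (p * (1.0 - p))[:, None]).T @ d
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se
```

For a single binary exposure the estimate reduces to the ratio of discordant pairs (pairs with only the case exposed divided by pairs with only the control exposed), with SE of log OR equal to sqrt(1/n10 + 1/n01). For an adjusted model such as the one described above, `x_case` and `x_control` would be two-column arrays (microbiological factor, PS) per pair, and Wald 95% CIs follow as `np.exp(beta ± 1.96 * se)`.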

In addition to the above analysis, and to reduce the dimensionality of the virulence genome, we performed principal coordinate analysis (PCoA) using a Euclidean distance matrix of all virulence genes. A k-means clustering algorithm was then applied to the PCoA coordinates to identify virulence clusters, with the optimal number of clusters (k) determined by the elbow method. Furthermore, we trained and assessed several random forest machine learning models to classify sepsis/septic shock using the caret package and the ranger implementation in R (v4.3.1; R Foundation for Statistical Computing). Random forests can handle datasets with a large number of predictors relative to the number of samples and are well suited to capturing interactions between variables. Individual virulence genes were included if detected in more than 2% but less than 98% of isolates, to reduce collinearity between predictors. Feature selection to improve model performance was conducted using the Boruta package in R (Fig. 2, Supplementary information). Each model was based on a different set of genomic markers as predictors (Table 1, Supplementary information). To evaluate model performance more robustly, the dataset was repeatedly divided into training (80%) and testing (20%) subsets; in each split, the model was trained and tuned on the training portion and its performance assessed on the held-out test set. This procedure was repeated 1000 times with different random seeds to account for variability in the data splits and to provide a more reliable estimate of model performance. Results are presented as the distribution of AUROC values across all test sets. The matched data structure was preserved in the training and testing datasets to prevent data leakage.
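Two preprocessing steps above can be sketched independently of the model: the 2%–98% prevalence filter and repeated 80/20 splits that keep each case and its matched control in the same subset. This is an illustrative Python sketch, assuming the split is made on pair identifiers (the text only says the matched structure was preserved); the function names are hypothetical and the random forest fitting itself (ranger via caret in R) is not shown.

```python
import random


def filter_by_prevalence(presence, lower=0.02, upper=0.98):
    """Keep genes detected in more than `lower` and less than `upper` of isolates.

    presence: dict mapping gene name -> list of 0/1 calls, one per isolate.
    """
    return {
        gene: calls
        for gene, calls in presence.items()
        if lower < sum(calls) / len(calls) < upper
    }


def pair_preserving_splits(pair_ids, n_repeats=1000, test_frac=0.2, base_seed=0):
    """Yield (train_pairs, test_pairs), assigning whole matched pairs to one subset.

    Splitting on pair identifiers rather than on patients keeps each case and
    its control together, so no pair straddles training and testing.
    """
    pairs = sorted(set(pair_ids))
    n_test = round(test_frac * len(pairs))
    for repeat in range(n_repeats):
        rng = random.Random(base_seed + repeat)  # a different seed per repeat
        shuffled = pairs[:]
        rng.shuffle(shuffled)
        yield set(shuffled[n_test:]), set(shuffled[:n_test])
```

With 101 pairs, each repeat places 20 pairs (40 patients) in the test set and 81 pairs in the training set.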

Fig. 2. Clustering of Escherichia coli isolates based on the virulence genome.


Principal coordinate analysis (PCoA) using a Euclidean distance matrix of all virulence genes shown in two dimensions. In total 32.4% of the variation between isolates was explained by the first two axes. Virulence clusters were identified using K-means clustering, with ellipses covering 95% of isolates in each cluster.
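PCoA on a Euclidean distance matrix is classical multidimensional scaling: double-centre the squared distances and take the leading eigenvectors. The sketch below is a generic NumPy implementation, not the authors' R code. With 0/1 virulence-gene calls, the squared Euclidean distance between two isolates is simply the number of genes in which they differ. The 32.4% reported for the first two axes corresponds to `explained.sum()` with `k=2`.

```python
import numpy as np


def pcoa(X, k=2):
    """Principal coordinate analysis on Euclidean distances between isolates.

    X: isolates x genes matrix (e.g. 0/1 presence of each virulence gene).
    Returns the first k coordinates and the share of total variation per axis.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n             # centring matrix
    B = -0.5 * J @ d2 @ J                           # double-centred (Gower) matrix
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1]                 # largest eigenvalues first
    evals = np.clip(evals[order], 0.0, None)        # drop tiny negative round-off
    evecs = evecs[:, order]
    coords = evecs[:, :k] * np.sqrt(evals[:k])
    explained = evals[:k] / evals.sum()
    return coords, explained
```

With Euclidean distances this is equivalent to PCA of the centred matrix; k-means (as described in the Methods) would then be run on `coords`.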

Table 1.

Distribution of matching variables in case and control patients with Escherichia coli bacteraemia

Variable Cases (n = 101) Controls (n = 101) OR (95% CI) P value
Charlson index, Me (IQR) 2 (0–3) 1 (0–2) 1.36 (0.98–1.89) 0.07
Age, Me (IQR) 77 (68–85) 78 (69–84) 1.09 (0.98–1.20) 0.11
Male sex 53 (52) 53 (52) NC NC
Acquisition of bacteraemia NC NC
 Community-acquired 66 (65) 66 (65)
 Healthcare associated 26 (26) 26 (26)
 Nosocomial 9 (9) 9 (9)
Source of bacteraemia NC NC
 Urinary tract infection 74 (73) 74 (73)
 Biliary intra-abdominal infection 19 (19) 19 (19)
 Non-biliary intra-abdominal infection 6 (6) 6 (6)
 Respiratory tract infection 1 (1) 1 (1)
 Unknown source 1 (1) 1 (1)
Type of hospitalisation service NC NC
 Emergencies 85 (84) 85 (84)
 Medical 10 (10) 10 (10)
 Surgical 3 (3) 3 (3)
 Intensive care unit 3 (3) 3 (3)
Active empirical treatment 99 (98) 99 (98)

Me median; IQR interquartile range; OR odds ratio, calculated by conditional logistic regression; CI confidence interval; NC not calculated. Data are n (%) unless otherwise indicated.

Inclusion and ethics statement

This article complies with the principles of Authorship, Inclusion and Ethics in Global Research. The work ensured the inclusive participation of local researchers across all project stages and contributors meeting authorship criteria have been appropriately credited. Furthermore, the study complied with established ethical recommendations, including obtaining local ethics approval.

Results

Clinical characteristics

One hundred and one case-patients with sepsis or septic shock were matched to 101 control-patients. Patients were recruited at 21 of the 26 participating centres in the PROBAC cohort. Regarding matching variables, 53 (52%) patients in both groups were male, 74 (73%) had a urinary tract infection as source of bacteraemia, 66 (65%) had a community-acquired infection and 99 (98%) received appropriate empirical therapy; because of the allowed matching margins, there were some differences between cases and controls in age (median, 77 years, interquartile range [IQR] 68–85 and 78 years [69–84] respectively, p = 0.11) and Charlson index (medians, 2 [IQR 0–3] and 1 [0–2], respectively, p = 0.07) (Table 1).

Regarding other baseline variables, the proportions of dementia, recurrent urinary tract infections, peripheral vascular disease, obstructive uropathy, chronic kidney disease, peptic ulcer disease and exposure to urinary and central venous catheters were higher among cases, while cancer and recent surgery were more frequent among controls (Table 2, Supplementary information). Mortality at 30 days was 26% (26 patients) in cases and 4% (4 patients) in controls.

Table 2.

Crude association between Escherichia coli phylogroup, ST and virulence genes and presentation as sepsis/septic shock in patients with bacteraemia

Variable Cases (n = 101) Controls (n = 101) OR (95% CI) P value
Phylogroup
 A 7 (7) 8 (8) 0.86 (0.29–2.55) 0.782
 B1 10 (10) 6 (6) 1.80 (0.60–5.37) 0.292
 B2 63 (62) 70 (69) 0.70 (0.37–1.32) 0.265
 C 4 (4) 3 (3) 1.33 (0.30–5.96) 0.706
 D 12 (12) 9 (9) 1.37 (0.55–3.42) 0.493
 F 0 (0) 5 (5) 0.02 (0.00–20.33) 0.255
 G 5 (5) 0 (0) 65.3 (0.05–86658.6) 0.255
 ExPEC (B2, D, F, G) 80 (79) 84 (83) 0.77 (0.37–1.57) 0.467
Predominant sequence types
 ST131 15 (15) 13 (13) 1.20 (0.52–2.78) 0.670
 ST73 15 (15) 12 (12) 1.33 (0.56–3.16) 0.514
 ST69 11 (11) 5 (5) 2.50 (0.78–7.97) 0.121
 ST95 8 (8) 14 (14) 0.57 (0.24–1.36) 0.207
 ST12 6 (6) 11 (11) 0.44 (0.14–1.44) 0.177
Predominant serogroups
 O25:H4 15 (15) 14 (14) 1.10 (0.47–2.59) 0.827
 O6:H1 9 (9) 7 (7) 1.29 (0.48–3.45) 0.618
 O1:H7 4 (4) 6 (6) 0.67 (0.19–2.36) 0.530
 O4:H1 2 (2) 6 (6) 0.33 (0.07–1.65) 0.178
Virulence genesᵃ
 cia 16 (16) 25 (25) 0.64 (0.34–1.20) 0.163
 cib 7 (7) 1 (1) 7.00 (0.86–56.89) 0.069
 clbB 30 (30) 39 (39) 0.64 (0.34–1.20) 0.163
 cvaC 20 (20) 30 (30) 0.58 (0.30–1.13) 0.109
 dhaK 7 (7) 14 (14) 0.46 (0.18–1.21) 0.117
 espY2 1 (1) 8 (8) 0.13 (0.02–1.00) 0.050
 etsC 20 (20) 33 (33) 0.52 (0.27–0.99) 0.046
 fdeC 93 (92) 99 (98) 0.25 (0.05–1.18) 0.080
 hha 62 (61) 51 (50) 1.55 (0.88–2.72) 0.127
 hlyF 21 (21) 35 (35) 0.48 (0.25–0.93) 0.030
 ibeA 7 (7) 14 (14) 0.46 (0.18–1.21) 0.117
 iroN 48 (48) 57 (56) 0.68 (0.38–1.22) 0.192
 iss 83 (82) 93 (92) 0.33 (0.12–0.92) 0.033
 iucC 58 (57) 69 (68) 0.59 (0.32–1.10) 0.097
 iutA 57 (56) 69 (68) 0.57 (0.31–1.06) 0.074
 kpsE 77 (76) 85 (84) 0.60 (0.29–1.23) 0.162
 kpsMII 16 (16) 23 (23) 0.59 (0.27–1.28) 0.183
 kpsMII-K1 16 (16) 26 (26) 0.57 (0.29–1.12) 0.100
 mchF 43 (43) 53 (52) 0.67 (0.38–1.17) 0.160
 neuC 19 (19) 28 (28) 0.60 (0.31–1.18) 0.143
 opmT 86 (85) 96 (95) 0.23 (0.07–0.81) 0.022
 papC 57 (56) 70 (69) 0.50 (0.26–0.97) 0.041
 pic 27 (27) 17 (17) 1.91 (0.92–3.96) 0.082
 shiB 32 (32) 43 (43) 0.65 (0.37–1.13) 0.127
 yfcV 63 (62) 75 (74) 0.56 (0.30–1.04) 0.068
 Virulence score, Me (IQR) 34 (28–40) 37 (31–42) 0.96 (0.93–1.00) 0.032

Me median; IQR interquartile range; OR odds ratio; CI confidence interval.

ᵃ Virulence genes with p < 0.2 in crude analysis are shown; for other genes, see Table 3 in Supplementary information.

Microbiological factors

Most E. coli isolates belonged to phylogenetic group B2 (63 [62%] case isolates and 70 [69%] control isolates); the most frequent serogroup was O25:H4 (15% of cases and 14% of controls). No significant differences were found in phylogroup or serogroup distribution between case and control isolates (Fig. 1 and Table 2). There were no significant differences in the overall distribution of sequence types (Table 2); the most frequent among cases were ST131 and ST73 (15 isolates, 15% each), while the most frequent among controls were ST95 (14 isolates, 14%) and ST131 (13, 13%). The only STs with p < 0.2 were ST69 (11 [11%] in cases and 5 [5%] in controls, p = 0.12) and ST12 (6 [6%] in cases and 11 [11%] in controls, p = 0.18).

Isolates from the control group had a significantly higher virulence score than those from the case group (median, 37 [IQR 31–42] vs. 34 [28–40], respectively), as well as higher proportions of most of the virulence genes tested. Notable exceptions were cib (production of the bacteriocin colicin Ib), found in 7% and 1% of case and control isolates, respectively, and pic (Enterobacteriaceae serine-protease autotransporter protein), present in 27% and 17% of case and control isolates, respectively (Table 2 and Supplementary information, Table 3).

Differences between E. coli isolates from cases and controls in the prevalence of other microbiological factors, including pathogenicity islands, plasmids, prophage sequences, antibiotic resistance and genes or mutations conferring antibiotic resistance, are described in Table 3 and Supplementary information, Tables 4–8. Although no statistically significant differences were found in median PAI, plasmid or prophage sequence scores between case and control isolates, the genomic islands HPI, OI-57 and TAI were more frequent in isolates from the case group, while AGI-1, PAI II APEC-O1 and PAI III APEC-O1 were more prevalent in isolates from patients without sepsis (Table 3 and Supplementary information, Table 4). The control group also had a higher proportion of isolates with the IncFIB plasmid replicon (Table 3 and Supplementary information, Table 5) and with the Entero Cdt-I and Salmon SEN34 prophage sequences (Table 3 and Supplementary information, Table 6).

Table 3.

Crude association between pathogenicity islands, plasmids, phages, antibiotic resistance and genes/mutations conferring resistance in Escherichia coli and presentation as sepsis/septic shock

Variable Cases (n = 101) Controls (n = 101) OR (95% CI) P value
Pathogenicity island (PAI)
 AGI-1 39 (39) 49 (49) 0.69 (0.40–1.18) 0.176
 HPI 28 (28) 19 (19) 1.69 (0.85–3.36) 0.133
 OI-57 39 (39) 29 (29) 1.48 (0.85–2.57) 0.168
 PAI II APEC-O1 39 (39) 49 (49) 0.69 (0.40–1.18) 0.176
 PAI III APEC-O1 2 (2) 6 (6) 0.33 (0.07–1.65) 0.178
 TAI 12 (12) 6 (6) 2.00 (0.75–5.33) 0.166
 PAI score, Me (IQR) 9 (4–12) 9 (4–12) 1.00 (0.94–1.06) 0.988
Plasmids
 IncFIB (AP001918) 62 (61) 73 (72) 0.54 (0.28–1.06) 0.075
 IncY 6 (6) 2 (2) 3.00 (0.61–14.86) 0.178
 Plasmid score, Me (IQR) 3 (1–4) 3 (2–4) 1.00 (0.84–1.20) 0.964
Prophage sequences
 Entero Cdt-I 1 (1) 5 (5) 0.20 (0.02–1.71) 0.142
 Salmon SEN34 3 (3) 9 (9) 0.33 (0.09–1.23) 0.099
 Salmon 118970 9 (9) 4 (4) 2.67 (0.71–10.05) 0.147
 Entero fiAA91 11 (11) 4 (4) 3.33 (0.92–12.11) 0.067
 Prophage sequences score, Me (IQR) 3 (2–3) 3 (2–4) 0.95 (0.80–1.14) 0.611
Antibiotic resistance
 Cefepime 10 (10) 4 (4) 3.00 (0.81–11.08) 0.099
 Cefotaxime/ceftriaxone 13 (13) 5 (5) 3.67 (1.02–13.14) 0.046
 Ceftazidime 17 (17) 4 (4) 7.50 (1.72–32.80) 0.007
 Cefuroxime 19 (19) 8 (8) 3.75 (1.24–11.30) 0.019
 Ciprofloxacin 43 (43) 29 (29) 2.08 (1.07–4.03) 0.030
 Piperacillin/tazobactam 8 (8) 3 (3) 2.67 (0.71–10.05) 0.147
 Trimethoprim-sulfamethoxazole 38 (38) 28 (28) 1.59 (0.87–2.91) 0.135
Resistance genes or mutations
 blaCTX-M-15 7 (7) 3 (3) 3.00 (0.61–14.86) 0.178
 gyrA (Asp87Asn) 32 (32) 23 (23) 1.64 (0.85–3.19) 0.143
 gyrA (Ser83Leu) 47 (47) 36 (36) 1.79 (0.93–3.44) 0.082
 sul3 6 (6) 2 (2) 5.00 (0.58–42.80) 0.142
 tetA 18 (18) 25 (25) 0.59 (0.27–1.28) 0.183
 Resistance score, Me (IQR) 4 (1–8) 3 (0–7) 1.06 (0.99–1.14) 0.114

Me median; IQR interquartile range; OR odds ratio; CI confidence interval.

Only microbiological factors with p < 0.2 in crude analysis are shown; the remaining comparisons are presented in Supplementary Tables 4–8.

Regarding antibiotic susceptibility, E. coli isolates from case-patients showed higher resistance rates to all antibiotics tested than isolates from control-patients, especially to third- and fourth-generation cephalosporins, ciprofloxacin and trimethoprim-sulfamethoxazole (Table 3 and Supplementary information, Table 7), as well as a higher prevalence of genes coding for antibiotic resistance mechanisms, such as blaCTX-M-15, and of gyrA mutations associated with fluoroquinolone resistance (Table 3 and Supplementary information, Table 8).

Multivariate analysis of clinical and microbiological factors

A PS for presentation as sepsis/septic shock was calculated from patients' clinical-epidemiological data, including age and Charlson index (both because of the margin allowed in their values for matching) and previous invasive devices and procedures. The AUROC of the PS was 0.74 (95% CI 0.67–0.81), indicating moderate discrimination on the observed data.
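The AUROC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes it this way (Mann-Whitney form) and adds an approximate CI using the Hanley and McNeil standard error. The text does not state which CI method SPSS used, so this is one possible method rather than the authors' procedure; it also treats cases and controls as independent, ignoring matching.

```python
import math
from statistics import NormalDist


def auroc(case_scores, control_scores):
    """AUROC as the probability that a case outscores a control (ties count 1/2)."""
    wins = 0.0
    for s in case_scores:
        for t in control_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def hanley_mcneil_ci(a, n_cases, n_controls, level=0.95):
    """Approximate CI for an AUROC using the Hanley-McNeil standard error."""
    q1 = a / (2.0 - a)
    q2 = 2.0 * a * a / (1.0 + a)
    var = (a * (1 - a) + (n_cases - 1) * (q1 - a * a)
           + (n_controls - 1) * (q2 - a * a)) / (n_cases * n_controls)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(var)
    return max(0.0, a - half), min(1.0, a + half)
```

Plugging in an AUROC of 0.74 with 101 cases and 101 controls gives an interval of about 0.67–0.81, which happens to agree with the interval reported above; that agreement does not establish which method was actually used.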

In a multivariate conditional logistic regression model, E. coli microbiological factors with a bivariate p < 0.2 for their association with sepsis/septic shock were added hierarchically in groups to the PS calculated from host factors; to avoid model instability, variables for which exposure was ≤1% in one of the groups (cases or controls) and ≤10% in both were not included (this was the case for the genes cib and espY2). Using this procedure, the following microbiological determinants were selected as independently associated with an increased likelihood of presenting as sepsis/septic shock in patients with bacteraemia (adjusted OR; 95% CI): isolate belonging to clone ST69 (7.53; 1.06–35.05) and presence of pic (serine-protease autotransporter protein) (4.38; 1.53–12.54), while the virulence genes papC (P pili biogenesis) (0.30; 0.12–0.74) and fdeC (E. coli adhesion factor) (0.18; 0.03–1.32) were protective for sepsis/septic shock presentation (Table 9, Supplementary information). The AUROC of this model was 0.81 (95% CI 0.74–0.87), higher than that of the PS model without microbiological determinants (0.74) (Fig. 3, Supplementary information).
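The sparsity rule above guards against near-empty cells, which produce unstable estimates such as the crude OR of 65.3 (95% CI 0.05–86658.6) for phylogroup G in Table 2. A minimal sketch of the rule as stated, with a hypothetical function name and percentages computed from counts out of 101 per group:

```python
def too_sparse(n_cases_exposed, n_controls_exposed, n_per_group=101):
    """Exclusion rule as stated: exposure <=1% in one group and <=10% in both."""
    pct_cases = 100 * n_cases_exposed / n_per_group
    pct_controls = 100 * n_controls_exposed / n_per_group
    return min(pct_cases, pct_controls) <= 1 and max(pct_cases, pct_controls) <= 10
```

Applied to Table 2, cib (7 vs. 1 isolates) and espY2 (1 vs. 8) are excluded, whereas pic (27 vs. 17) and fdeC (93 vs. 99) are retained.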

Fig. 3. Performance of machine learning models for classifying sepsis/septic shock using different sets of bacterial genomic predictors.


Boxplots (median, interquartile range [IQR], whiskers = 1.5 × IQR and outliers) showing the test-set performance of random forest models trained to classify sepsis/septic shock in patients with Escherichia coli bacteraemia. Each model is based on a different combination of bacterial genomic predictors. The plot displays the distribution of area under the receiver operating characteristic curve (AUROC) across 1000 random splits of the study population into training and test sets. The dashed line indicates the AUROC = 0.50 (random performance). Model 1: feature-selected predictors (genes or elements) (n = 11) from the full set of 225 available predictors, including virulence genes, pathogenicity islands, sequence types, phylogroups, plasmids and resistance genes. Model 2: feature-selected predictors (n = 7) from virulence genes and pathogenicity islands (subset of n = 109). Model 3: virulence clusters. Model 4: sequence types. Model 5: phylogroups. Model 6: plasmid types. Model 7: resistance genes and mutations. Model 8: all virulence genes and pathogenicity islands (n = 109). Model 9: all bacterial genomic predictors (n = 225).

Virulence clusters and discriminatory ability of bacterial factors

PCoA of virulence genes showed that the first two axes explained 32.4% of the variation between isolates, indicating diversity in the virulence genome (Fig. 2). The identified virulence clusters were linked to different STs (Table 10, Supplementary information). To assess the ability of genomic markers and virulence clusters to discriminate between patients with and without sepsis/septic shock, we fitted several random forest machine learning models using different combinations of predictors (Fig. 3). Models that included a feature selection algorithm as a preparatory step showed moderate predictive performance; when all genomic markers were included (model 1), the median AUROC was 0.72 (IQR: 0.67–0.76). This model comprised the following selected bacterial factors: cib, espY2, hlyF, iss, opmT, yehB, phylogroups F and G, ST117, etsC and aac(6’)-Ib-cr. On the other hand, when restricted to virulence genes and pathogenicity islands (model 2) the median AUROC was 0.68 (IQR: 0.63–0.72). This second model included cib, espY2, etsC, hlyF, opmT, pic and yehB. In contrast, models based on virulence clusters, sequence type, phylogroup, plasmids, antibiotic resistance genes, or a combination of all genomic markers all demonstrated poor performance in classifying sepsis/septic shock in bacteraemic patients.

Discussion

In this study, isolates from clone ST69 and those harbouring the pic gene (encoding the serine-protease autotransporter protein) were associated with presentation with a dysregulated response to infection, i.e. sepsis or septic shock, whereas the adhesion genes papC (P pili biogenesis) and fdeC (E. coli adhesion factor) were found to be protective factors. The model including these features modestly but significantly improved the predictive ability of the model including only host factors (AUROC 0.81 vs. 0.74). In addition, the random forest analyses of individual genomic markers and of PCoA-derived virulence clusters suggested a greater influence of specific virulence factors than of virulence clusters or sequence types.

When investigating the impact of microbial factors on the occurrence of a dysregulated response to infection, host-related variables are critical, since both genetic traits6 and chronic underlying conditions17 have been associated with an increased risk of sepsis. Considering their impact is therefore necessary to avoid overestimating the importance of microbiological features. We could not include host genetics in our study, but we used a matched design to control, efficiently and strictly, for the influence of patients' phenotypic characteristics on sepsis occurrence. This approach has clear advantages but raises other challenges, including the risk of overmatching and limited representativeness of the matched cases. However, from a pathophysiological perspective, we consider these limitations to be less critical: while overmatching may lead to underestimation of the associations for some variables related to sepsis, any variable that shows an association despite overmatching should be considered highly relevant.

To the best of our knowledge, this is the first study to use a matched case-control design to analyse the relationship between E. coli bacterial factors, characterised by WGS, and severe presentation in patients with bacteraemia. Previous data on the influence of bacterial factors studied by WGS on the severity of E. coli bacteraemia that take host features into account are scarce. Denamur et al.4 studied 912 E. coli isolates from patients with bacteraemia, 24% of whom developed septic shock according to SEPSIS-3 criteria10. They found no significant association between bacterial genetic factors and septic shock; this could be explained by non-differential information bias, because patients with sepsis were included in the non-septic-shock group, even though sepsis and septic shock are different degrees of the same pathophysiological process (i.e. a dysregulated response to infection). D’Onofrio et al.18 found genes mchB, mchC and mcmA (microcin production), cnf1, vat and clbB (toxins), and sfaD and focG (S-fimbriae and adhesins) to be associated with sepsis, while some genes related to iron uptake (senB, iucC, iutA) and the adhesion genes papA_fsiA_F16 and iha showed a protective effect; however, no adjustment for host features was performed, which might have led to overestimation of the relevance of these factors. Other studies evaluating microbial genetic factors in bacteraemic E. coli isolates did not use WGS. Jauréguy et al.19 found no association between phylogroups or nine putative VFs characteristic of extra-intestinal pathogenic E. coli and severe sepsis or shock, using the 1992 definitions20; Mora-Rillo et al.3, using the same definitions, studied phylogroups, STs, 25 virulence genes and four β-lactamase genes and found the presence of cnf (cytotoxic necrotising factor) and blaTEM to be associated with severe sepsis and septic shock. Some other studies included only antibiotic-resistant E. coli isolates21,22.
The discrepancies between the results of previous studies are probably due to differences in definitions, methods and populations studied.

In our study, despite the low precision of the estimate, clone ST69 showed a strong association with sepsis or septic shock. This clone, also known as clonal group A23, is the main clone within phylogroup D11 and one of the high-risk STs most frequently isolated from patients with bacteraemia11,24–28. In our collection of isolates from patients with sepsis/septic shock, ST69 showed resistance rates above 20% for ciprofloxacin and above 35% for amoxicillin/clavulanate, gentamicin and trimethoprim-sulfamethoxazole, and it ranked second, after ST131, in the frequency of antibiotic resistance genes (in particular aph(3’’)-Ib, aph(6)-Id, aadA5, blaTEM-1B, sul1, sul2 and dfrA17)11.
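The low precision for ST69 reflects how matched-pair odds ratios are estimated: with 1:1 matching and a binary exposure such as carrying an ST69 isolate, only discordant pairs carry information. The sketch below computes the conditional odds ratio b/c, which coincides with the conditional logistic regression estimate for a single binary exposure, with a Wald 95% CI on the log scale. The function name and counts are hypothetical, not the study's data.

```python
import math

def matched_pair_odds_ratio(b, c, z=1.959963984540054):
    """Conditional odds ratio for 1:1 matched pairs with a binary exposure.

    b: pairs in which only the case is exposed
    c: pairs in which only the control is exposed
    Concordant pairs carry no information about the odds ratio.
    Returns (OR, lower, upper) using a Wald interval on log(OR).
    """
    if b == 0 or c == 0:
        raise ValueError("OR is zero or infinite when a discordant cell is empty")
    log_or = math.log(b / c)
    se = math.sqrt(1 / b + 1 / c)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Same odds ratio, very different numbers of discordant pairs.
print(matched_pair_odds_ratio(8, 2))
print(matched_pair_odds_ratio(80, 20))
```

Both calls give an odds ratio of 4, but with 10 discordant pairs the interval includes 1, whereas with 100 it does not; a strong but imprecise association arises when few pairs are discordant for the exposure.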

In contrast, ST69 isolates showed a low content of virulence genes in these patients compared with other predominant clones such as ST73 and ST95. Clone ST69 is characterised by a specific virulence profile, with three genes (lpfA, eilA and air) significantly associated with it compared with ST131, ST95 and ST7311. The long polar fimbria (lpfA) not only acts as an adhesin but also modulates the host inflammatory response29. eilA (HilA-like transcriptional regulator) encodes a putative transcriptional regulator of the type III secretion system ETT2, which is widespread in bloodstream isolates of the ST69 lineage30 and has been shown to alter adhesion, motility, biofilm formation and serum bactericidal resistance31. eilA mutants are less adherent and have a reduced ability to form biofilm; the same phenotype was observed in an air (enteroaggregative immunoglobulin repeat protein) mutant, suggesting that this protein could act as an accessory adhesin/aggregin32. Notably, eilA was an independent risk factor for 30-day mortality in patients with sepsis/septic shock33, suggesting that it may be a virulence gene relevant to both the severity of clinical presentation and fatal outcome in E. coli bacteraemia.

The pic gene (protein involved in intestinal colonisation) was associated with sepsis/septic shock presentation. Pic is a member of the serine protease autotransporters of Enterobacteriaceae (SPATE) superfamily34. Several biological functions of this protein have been described, including proteolytic activity on mucin35 and induction of mucus hypersecretion36, which promote colonisation and growth in the presence of mucin in the enteroaggregative E. coli (EAEC) pathotype37. In addition, Pic has been shown to increase serum resistance by directly cleaving complement molecules such as C3, the central molecule of the complement cascade, as well as C3b and the C4 and C2 proteins of the classical and lectin pathways. Proteolytic assays using human serum as a source of complement proteins indicated that Pic is also active in a more physiological environment, thus favouring immune evasion and invasiveness of E. coli38.

In addition to promoting E. coli survival in the bloodstream, Pic can induce high host production of proinflammatory mediators and, concomitantly, cellular immunosuppression leading to sepsis, as demonstrated in a murine sepsis model39. Additional functions include haemagglutination, degradation of coagulation factor V and cleavage of leukocyte surface glycoproteins34,40,41. Given these biological functions, Pic is likely a key driver of the stronger response in the cytokine and systemic inflammation domains observed in critically ill patients with E. coli bloodstream infections8, and a promising target for investigating the virulence factors responsible for a dysregulated host response in E. coli bacteraemia.

The microbiological factors associated with sepsis/septic shock presentation in the present work, in particular pic as well as the eilA, lpfA and air genes frequently found in the ST69 clone, have all been characteristically described in the EAEC pathotype. This pathotype is known for its ability to form persistent biofilms mediated by adhesins, generally fimbriae, although other adhesion-encoding genes have been identified as potential contributors to this phenotype. Moreover, recent studies have documented EAEC/ExPEC hybrids and their association with extraintestinal infections such as bacteraemia42–45. Further studies will be necessary to elucidate the impact of the enteroaggregative pattern on clinical presentation in E. coli bacteraemia.

In this study, E. coli isolates from control patients had a significantly higher median virulence score. This observation leads to the hypothesis that many virulence genes may act as protective factors against sepsis or septic shock, as observed with papC and fdeC. The papC gene is located in the pap operon, which encodes several proteins responsible for the biogenesis of pyelonephritis-associated P pili, one of the most important adhesion-related virulence mechanisms of uropathogenic E. coli (UPEC) strains46. The PapC protein is involved in the export and assembly of pilus subunits across the outer membrane47, while direct binding to epithelial cells is mediated by the PapG adhesin. The papGII allele has previously been described as a protective factor for presentation with severe systemic inflammatory response syndrome in patients with extended-spectrum β-lactamase-producing E. coli bacteraemia, even after controlling for the source of bacteraemia48. This protective effect could be related to the immunogenicity of P fimbriae, which could contribute to an earlier or stronger immune response.
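This section does not define the virulence score or the test used to compare it. Assuming, for illustration only, that the score is the number of panel virulence genes detected in an isolate and that each case is compared with its matched control using a Wilcoxon signed-rank test, the comparison could be sketched as follows; the gene panel, isolate sets and function names are hypothetical.

```python
import math

def virulence_score(detected_genes, panel):
    """Hypothetical score: number of genes from a virulence panel detected in an isolate."""
    return sum(1 for gene in panel if gene in detected_genes)

def wilcoxon_signed_rank(differences):
    """Two-sided Wilcoxon signed-rank test on within-pair differences.

    Uses the normal approximation, drops zero differences, assigns average
    ranks to tied |differences| and applies the usual tie correction to the
    variance. Returns (W+, z, p).
    """
    d = [x for x in differences if x != 0]
    n = len(d)
    if n == 0:
        raise ValueError("all within-pair differences are zero")
    order = sorted(range(n), key=lambda k: abs(d[k]))
    ranks = [0.0] * n
    tie_correction = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1        # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_correction += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_correction / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p from the standard normal
    return w_plus, z, p

# Hypothetical panel and matched pairs (case isolate, control isolate).
panel = ["papC", "fdeC", "iss", "pic", "hlyF"]
case_isolates = [{"pic", "iss"}, {"papC"}, {"iss"}, {"pic"}, {"hlyF", "iss"}, {"fdeC"}]
control_isolates = [{"papC", "fdeC", "iss"}, {"papC", "fdeC"}, {"iss", "hlyF", "papC"},
                    {"fdeC", "iss"}, {"papC", "fdeC", "iss", "hlyF"}, {"fdeC"}]
diffs = [virulence_score(c, panel) - virulence_score(k, panel)
         for c, k in zip(case_isolates, control_isolates)]
print(diffs, wilcoxon_signed_rank(diffs))
```

A paired test is used here because it respects the matched structure; an unpaired comparison of medians would ignore which control belongs to which case.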

A similar mechanism could underlie the finding of the E. coli adhesion factor gene fdeC as a protective factor for presentation with sepsis/septic shock in the present study. This gene, which encodes the FdeC protein involved in adhesion to mammalian cells, contributes to colonisation of the bladder and kidney49. Interestingly, FdeC has been described as a promising vaccine candidate against ExPEC infections in a murine model, in which intranasal immunisation with FdeC provided considerable protection against experimental infection with two different UPEC strains50.

Limitations of this study include limited statistical power to detect factors with a smaller impact, despite the use of an efficient design; the possibility of an overmatching effect; the lack of gene expression analysis; and the absence of host genetic analysis. To explore whether the bacterial genomic markers carried discriminatory information related to sepsis/septic shock, we assessed several machine learning models using different combinations of predictors. These methods do not explicitly model the matched structure of the data, and it is uncertain whether the results are generalisable to the underlying source population. However, previous work suggests that the AUROC is typically underestimated in such settings, so our reported performance metrics are likely conservative estimates51. Furthermore, while informative, these results are not sufficient for prediction in a clinical setting without confirmation in external data sets to ensure model reliability.
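The point about matched structure can be made concrete. A pooled AUROC (the Mann–Whitney estimate, equal to the area under the empirical ROC curve) compares every case with every control, including controls from other matched sets, whereas a matched-pair concordance compares each case only with its own control. The sketch below computes both, plus the median and IQR used to summarise AUROCs across model runs. The scores are hypothetical, the resampling scheme behind the reported IQRs is an assumption, the function names are illustrative, and this is not the adjusted estimator of reference 51.

```python
import statistics

def auroc(case_scores, control_scores):
    """Mann-Whitney estimate of the AUROC: probability that a random case scores
    higher than a random control, counting ties as one half."""
    wins = 0.0
    for s_case in case_scores:
        for s_control in control_scores:
            if s_case > s_control:
                wins += 1.0
            elif s_case == s_control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def matched_pair_concordance(pairs):
    """Share of matched (case, control) pairs in which the case scores higher
    (ties count one half); only within-pair comparisons are used."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in pairs) / len(pairs)

def median_iqr(values):
    """Median and interquartile range (25th and 75th percentiles, linear interpolation)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), (q1, q3)

# Hypothetical predicted probabilities for 4 matched (case, control) pairs.
pairs = [(0.8, 0.3), (0.4, 0.6), (0.7, 0.2), (0.5, 0.5)]
print(auroc([a for a, _ in pairs], [b for _, b in pairs]))
print(matched_pair_concordance(pairs))

# Hypothetical AUROCs from repeated resampling runs of one model.
print(median_iqr([0.66, 0.70, 0.72, 0.74, 0.77]))
```

The two discrimination measures generally differ on the same scores, which is one reason performance estimated without the matched structure may not transfer directly to the source population.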

The strengths of this study are the matched case-control design; the analysis of a prospective, nationwide, clinically well-characterised cohort; the use of current consensus definitions of sepsis and septic shock; and the use of WGS.

In summary, we identified microbiological factors, such as ST69 and the pic gene, that were associated with increased odds of presentation with sepsis or septic shock in E. coli bacteraemia, although their overall impact may be somewhat limited. Further studies are needed to determine whether these microbiological features might be targets for preventive or therapeutic interventions.

Supplementary information

Supplementary Information (592.4KB, pdf)
43856_2025_1364_MOESM3_ESM.pdf (28.9KB, pdf)

Description of Additional Supplementary Files

Supplementary Data 1 (10KB, txt)
Supplementary Data 2 (49.5KB, xls)
Supplementary Data 3 (488.5KB, xls)

Acknowledgements

This study was funded by the Instituto de Salud Carlos III through grant PI16/01432 and co-funded by the European Union (European Regional Development Fund ‘A Way to Achieve Europe’). The funders had no role in the study design, data collection, analysis, writing of the manuscript or the decision to publish. We would like to thank all local clinical and microbiological researchers at participating hospitals, members of the PROBAC REIPI/GEIRAS-SEIMC/SAMICEI Group, who helped recruit patients and collect the data.

Author contributions

N.M., I.L.-H., J.R.-B. and A.P. were responsible for conceptualisation, formulation of the overall research questions, methodology, formal analysis and writing of the original draft. N.M., I.L.-H., A.G.-M. and M.A.-R. were responsible for bioinformatics analysis. L.E.L.-C., P.M.M.-P.C., A.S.-D., A.J.-S., J.G., A.P.-N., L.B.-M., A.A., L.B.-P., A.d.-J., A.S.-A., J.M.S.-C., C.N.-K., J.M.R.-I., C.A.-C., F.G.-S., A.B., I.G.-L., C.C.-A., I.P.-C., T.M.-C. and B.B.-C. reviewed the design, recruited patients and isolates, and thoroughly reviewed the manuscript. J.K.-V. and P.N. reviewed the design, performed analyses and reviewed the manuscript. All authors have seen and approved the submitted version of this manuscript and accept responsibility for the decision to submit for publication. I.L.-H., J.R.-B. and A.P. were responsible for funding acquisition, project administration, supervision and coordination of the study.

Peer review

Peer review information

Communications Medicine thanks Evdoxia Kyriazopoulou, Valentino D’Onofrio and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Data availability

Source data for Figs. 1, 2 and 3 are accessible from Supplementary Data 1, 2 and 3, respectively. The complete genome sequence data are publicly available in the European Nucleotide Archive (ENA) under Bioproject PRJEB62601. Researchers may gain access to an anonymised and de-identified version of the dataset presented in this article. To obtain access, a proposed use must first be approved by an independent review committee and the requestor must sign a data access agreement with the senior authors’ institution. Please submit all inquiries and proposals to jesusrb@us.es

Code availability

The custom source code used to generate the study’s main results is publicly available at https://data.mendeley.com/datasets/4j8tw9wnwb/1. This material is provided under the CC BY-NC 3.0 licence, meaning you are free to adapt, copy or redistribute the material provided you attribute it appropriately and do not use it for commercial purposes. Quality assessment of genome assemblies, genome annotation and pan-genome analysis were performed in QUAST v5.2.0, Bakta v1.6.1 and Roary v3.13.0, respectively. Phylogenomic trees were reconstructed as the best-scoring maximum-likelihood (ML) tree inferred from a DNA alignment in RAxML v8.2.12, and the branch lengths of this tree were corrected for recombination events in ClonalFrameML v1.12. Other bioinformatics analyses, including Principal Coordinate Analysis (PCoA), clustering based on PCoA, the Boruta algorithm and random forest models, were performed in R v4.3.1 using the following core packages: tidyverse (2.0.0), caret (6.0-94), Boruta (8.0.0), ranger (0.17.0), vegan (2.6-4) and pROC (1.18.5).

Competing interests

L.E.L.-C. reports consulting fees from Angelini Pharm and payments for presentations from Correvio Pharma Corp., Gilead Sciences, Inc. and ViiV Healthcare. L.B.-P. reports payments for presentations in educational events from Tillotts Pharma AG and Menarini Group and support for attending meetings and/or travel from Pfizer, Inc. All other authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A list of authors and their affiliations appears at the end of the paper.

These authors contributed equally: Natalia Maldonado, Inmaculada López-Hernández.

Supplementary information

The online version contains supplementary material available at 10.1038/s43856-025-01364-x.

References

  • 1. European Centre for Disease Prevention and Control. Antimicrobial Resistance in the EU/EEA (EARS-Net)-Annual Epidemiological Report 2022. (ECDC, 2023).
  • 2. Bonten, M. et al. Epidemiology of Escherichia coli bacteremia: a systematic literature review. Clin. Infect. Dis. 72, 1211–1219 (2021).
  • 3. Mora-Rillo, M. et al. Impact of virulence genes on sepsis severity and survival in Escherichia coli bacteremia. Virulence 6, 93–100 (2015).
  • 4. Denamur, E. et al. Genome wide association study of Escherichia coli bloodstream infection isolates identifies genetic determinants for the portal of entry but not fatal outcome. PLoS Genet. 18, e1010112 (2022).
  • 5. Brumwell, A. et al. Escherichia coli ST131 associated with increased mortality in bloodstream infections from urinary tract source. J. Clin. Microbiol. 61, e0019923 (2023).
  • 6. Lu, H. et al. Host genetic variants in sepsis risk: a field synopsis and meta-analysis. Crit. Care 23, 26 (2019).
  • 7. Zeng, X. et al. Screening of key genes of sepsis and septic shock using bioinformatics analysis. J. Inflamm. Res. 14, 829–841 (2021).
  • 8. Butler, J. M. et al. Pathogen-specific host response in critically ill patients with blood stream infections: a nested case-control study. EBioMedicine 117, 105799 (2025).
  • 9. Pérez-Crespo, P. M. M. et al. Revisiting the epidemiology of bloodstream infections and healthcare-associated episodes: results from a multicentre prospective cohort in Spain (PRO-BAC Study). Int. J. Antimicrob. Agents 58, 106352 (2021).
  • 10. Singer, M. et al. The Third International Consensus Definitions for sepsis and septic shock (sepsis-3). JAMA 315, 801–810 (2016).
  • 11. Maldonado, N. et al. Whole-genome characterisation of Escherichia coli isolates from patients with bacteraemia presenting with sepsis or septic shock in Spain: a multicentre cross-sectional study. Lancet Microbe 5, e390–e399 (2024).
  • 12. Schwengers, O. et al. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microb. Genom. 7, 000685 (2021).
  • 13. Page, A. J. et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31, 3691–3693 (2015).
  • 14. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
  • 15. Didelot, X. & Wilson, D. J. ClonalFrameML: efficient inference of recombination in whole bacterial genomes. PLoS Comput. Biol. 11, e1004041 (2015).
  • 16. Letunic, I. & Bork, P. Interactive Tree of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021).
  • 17. Wang, H. E. et al. Chronic medical conditions and risk of sepsis. PLoS ONE 7, e48307 (2012).
  • 18. D’Onofrio, V. et al. Virulence factor genes in invasive Escherichia coli are associated with clinical outcomes and disease severity in patients with sepsis: a prospective observational cohort study. Microorganisms 11, 1827 (2023).
  • 19. Jauréguy, F. et al. Host and bacterial determinants of initial severity and outcome of Escherichia coli sepsis. Clin. Microbiol. Infect. 13, 854–862 (2007).
  • 20. Bone, R. C. et al. Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. The ACCP/SCCM consensus conference committee. American College of Chest Physicians/Society of Critical Care Medicine. Chest 101, 1644–1655 (1992).
  • 21. Rodríguez-Baño, J. et al. Virulence profiles of bacteremic extended-spectrum β-lactamase-producing Escherichia coli: association with epidemiological and clinical features. PLoS ONE 7, e44238 (2012).
  • 22. Fröding, I. et al. Extended-spectrum-β-lactamase- and plasmid AmpC-producing Escherichia coli causing community-onset bloodstream infection: association of bacterial clones and virulence genes with septic shock, source of infection, and recurrence. Antimicrob. Agents Chemother. 64, e02351-19 (2020).
  • 23. Kocsis, B., Gulyás, D. & Szabó, D. Emergence and dissemination of extraintestinal pathogenic high-risk international clones of Escherichia coli. Life 12, 2077 (2022).
  • 24. de Lastours, V. et al. Mortality in Escherichia coli bloodstream infections: antibiotic resistance still does not make it. J. Antimicrob. Chemother. 75, 2334–2343 (2020).
  • 25. Lipworth, S. et al. Ten-year longitudinal molecular epidemiology study of Escherichia coli and Klebsiella species bloodstream infections in Oxfordshire, UK. Genome Med. 13, 144 (2021).
  • 26. Jauneikaite, E. et al. Bacterial genotypic and patient risk factors for adverse outcomes in Escherichia coli bloodstream infections: a prospective molecular epidemiological study. J. Antimicrob. Chemother. 77, 1753–1761 (2022).
  • 27. Kim, B., Kim, J. H. & Lee, Y. Virulence factors associated with Escherichia coli bacteremia and urinary tract infection. Ann. Lab. Med. 42, 203–212 (2022).
  • 28. Vanstokstraeten, R. et al. Molecular characterization of extraintestinal and diarrheagenic Escherichia coli blood isolates. Virulence 13, 2032–2041 (2022).
  • 29. Zhou, M. et al. Long polar fimbriae contribute to pathogenic Escherichia coli infection to host cells. Appl. Microbiol. Biotechnol. 103, 7317–7324 (2019).
  • 30. Fox, S. et al. A highly conserved complete accessory Escherichia coli type III secretion system 2 is widespread in bloodstream isolates of the ST69 lineage. Sci. Rep. 10, 4135 (2020).
  • 31. Wang, X. et al. Genetic distribution, characterization, and function of Escherichia coli type III secretion system 2 (ETT2). iScience 27, 109763 (2024).
  • 32. Sheikh, J. et al. EilA, a HilA-like regulator in enteroaggregative Escherichia coli. Mol. Microbiol. 61, 338–350 (2006).
  • 33. Maldonado, N. et al. Association of microbiological factors with mortality in Escherichia coli bacteraemia presenting with sepsis/septic shock: a prospective cohort study. Clin. Microbiol. Infect. 30, 1035–1041 (2024).
  • 34. Henderson, I. R. et al. Characterization of Pic, a secreted protease of Shigella flexneri and enteroaggregative Escherichia coli. Infect. Immun. 67, 5587–5596 (1999).
  • 35. Gutiérrez-Jiménez, J., Arciniega, I. & Navarro-García, F. The serine protease motif of Pic mediates a dose-dependent mucolytic activity after binding to sugar constituents of the mucin substrate. Microb. Pathog. 45, 115–123 (2008).
  • 36. Navarro-Garcia, F. et al. Pic, an autotransporter protein secreted by different pathogens in the Enterobacteriaceae family, is a potent mucus secretagogue. Infect. Immun. 78, 4101–4109 (2010).
  • 37. Harrington, S. M. et al. The Pic protease of enteroaggregative Escherichia coli promotes intestinal colonization and growth in the presence of mucin. Infect. Immun. 77, 2465–2473 (2009).
  • 38. Abreu, A. G. et al. The serine protease Pic from enteroaggregative Escherichia coli mediates immune evasion by the direct cleavage of complement proteins. J. Infect. Dis. 212, 106–115 (2015).
  • 39. Dutra, I. L. et al. Pic-producing Escherichia coli induces high production of proinflammatory mediators by the host leading to death by sepsis. Int. J. Mol. Sci. 21, 2068 (2020).
  • 40. Abreu, A. G. et al. The serine protease Pic as a virulence factor of atypical enteropathogenic Escherichia coli. Gut Microbes 7, 115–125 (2016).
  • 41. Ruiz-Perez, F. et al. Serine protease autotransporters from Shigella flexneri and pathogenic Escherichia coli target a broad range of leukocyte glycoproteins. Proc. Natl. Acad. Sci. USA 108, 12881–12886 (2011).
  • 42. Lara, F. B. et al. Virulence markers and phylogenetic analysis of Escherichia coli strains with hybrid EAEC/UPEC genotypes recovered from sporadic cases of extraintestinal infections. Front. Microbiol. 8, 146 (2017).
  • 43. Boll, E. J. et al. Emergence of enteroaggregative Escherichia coli within the ST131 lineage as a cause of extraintestinal infections. mBio 11, e00353-20 (2020).
  • 44. Del Carpio, A. M. G. et al. Genomic dissection of an enteroaggregative Escherichia coli strain isolated from bacteremia reveals insights into its hybrid pathogenic potential. Int. J. Mol. Sci. 25, 9238 (2024).
  • 45. Luiz, B. M. et al. Enteroaggregative Escherichia coli (EAEC) isolates obtained from non-diarrheic children carry virulence factor-encoding genes from extraintestinal pathogenic E. coli (ExPEC). Braz. J. Microbiol. 55, 3551–3561 (2024).
  • 46. Waksman, G. & Hultgren, S. J. Structural biology of the chaperone-usher pathway of pilus biogenesis. Nat. Rev. Microbiol. 7, 765–774 (2009).
  • 47. Norgren, M., Båga, M., Tennent, J. M. & Normark, S. Nucleotide sequence, regulation and functional analysis of the papC gene required for cell surface localization of Pap pili of uropathogenic Escherichia coli. Mol. Microbiol. 1, 169–178 (1987).
  • 48. Rodríguez-Baño, J. et al. Outcome of bacteraemia due to extended-spectrum β-lactamase-producing Escherichia coli: impact of microbiological determinants. J. Infect. 67, 27–34 (2013).
  • 49. Nesta, B. et al. FdeC, a novel broadly conserved Escherichia coli adhesin eliciting protection against urinary tract infections. mBio 3, e00010-12 (2012).
  • 50. Moriel, D. G. et al. Identification of protective and broadly conserved vaccine antigens from the genome of extraintestinal pathogenic Escherichia coli. Proc. Natl. Acad. Sci. USA 107, 9072–9077 (2010).
  • 51. Xu, H. et al. Estimating the receiver operating characteristic curve in matched case control studies. Stat. Med. 38, 437–451 (2019).

Articles from Communications Medicine are provided here courtesy of Nature Publishing Group
