PLOS One. 2026 Feb 23;21(2):e0342894. doi: 10.1371/journal.pone.0342894

Sequencing of the invasive E. coli strain BEN2908 isolated from poultry: A comparative investigation of genomic regions shared with intestinal and extraintestinal model E. coli strains

Tobias Weber Martins 1, Angélina Trotereau 2, Simone Lahnig-Jacques 1, Maxime Branger 2, Sébastien Houle 3, Charles M Dozois 3, Daniel Brisotto Pavanelo 1, Fabiana Horn 1,*, Catherine Schouler 2,*
Editor: Feng Gao 4
PMCID: PMC12928451  PMID: 41729883

Abstract

Extraintestinal pathogenic Escherichia coli (ExPEC) cause disease outside the gut and include avian pathogenic E. coli (APEC), a leading cause of bacterial infections in poultry. Among their highly diverse types, strain BEN2908 stands out for its significant invasive ability across various human and avian cell types. Aiming to investigate further aspects of this strain and its plasmid, we sequenced and assembled the complete genome of BEN2908 and compared it to 22 E. coli strains, including other invasive strains such as adherent and invasive E. coli (AIEC) LF82 and NRG857c, by constructing a phylogenetic tree and using web-based characterization software. With these results, we selected eight strains closely related to BEN2908 to perform a ring comparison, including two APEC (APEC O1 and IMT5155), two neonatal meningitis E. coli (NMEC; RS218 and IHE3034), two uropathogenic E. coli (UPEC; 78-Pyelo and CFT073), one commensal E. coli (MG1655) and one adherent-invasive E. coli (AIEC; LF82). This revealed 20 genomic regions (GRs) of interest which were then analysed by CD-Search, BLASTp and KEGG Pathway databases. Many of the genes in these GRs had no previous description but showed similarity to known genes involved in sugar uptake, nitrogen metabolism, and dicarboxylate transport and processing, among other functions. These results were tabulated and used to infer possible pathways that could be involved in ExPEC pathogenesis, highlighting candidate genes that have been overlooked in ExPEC research.

Introduction

Avian pathogenic Escherichia coli (APEC) causes avian colibacillosis, a prevalent bacterial infection that can affect birds of all ages and at all stages of poultry production. It manifests in a range of clinical forms, including omphalitis in embryos, salpingitis in laying hens, cellulitis, airsacculitis, perihepatitis, peritonitis, and septicaemia [1]. Avian colibacillosis results in high morbidity and mortality, leading to economic losses in the industry throughout the world (e.g., in the Netherlands, losses due to salpingitis were estimated at €0.4 million, €3.3 million and €3.7 million for the layer sector, the meat sector and poultry farming, respectively [2]). APEC strains are typical commensal inhabitants of the chicken intestine and exhibit high genetic diversity, reflected by the various serogroups and sequence types (ST) isolated from clinical cases. To cause an extraintestinal infection, though, an APEC strain would require siderophores, which permit it to survive in host fluids (poor in free iron), and protectins (such as the K1 capsule), with which it can evade host defenses. Yet, even though some virulence genes and strategies have been identified, the pathophysiology of avian colibacillosis remains incompletely understood, and additional determinants likely remain to be discovered. One such example is the dicarboxylate uptake regulator, DctR, which appears to contribute to biofilm formation, serum resistance, adherence and colonization in ducks [3]. With this in mind, many APEC genomes have been sequenced; to date (December 2024), of the 349,451 E. coli genomes available on Enterobase, 21,666 were isolated from poultry, of which 468 reportedly belong to ST95. ST95 strains exhibit a broad host range, and phylogenetic studies show that strains isolated from different species share numerous genes and SNPs, indicating a substantial degree of genomic overlap [4]. Among E. coli strains of this ST, BEN2908 (O2:H5:K1) is one of the most studied APEC strains. Moreover, the ST95 complex (STc95; an ST complex is defined as having at least three STs that differ from each other by no more than two of the seven loci) is also closely related to different types of ExPEC infection, being one of the five most common STc in ExPEC, at least for the past few decades [5–8].

Strain BEN2908 is a nalidixic acid resistant derivative of strain MT78, isolated in 1977 from the trachea of a one-day old chick in France [9]. BEN2908 is an efficient colonizer of the chicken intestine [10,11], and its inoculation either by the air sacs or intratracheally results in severe systemic infection [12,13]. In mice, transurethral inoculation of BEN2908 results in urinary infections comparable to those caused by human UPEC strains [14]. Remarkably, BEN2908 is able to metabolize short chain fructooligosaccharides, conferring a competitive advantage to colonize the intestine of chickens [10,11,15]. Carbohydrate metabolism also seems to play a role in colonization of the lungs and/or air sacs by BEN2908 [16]. Genes linked to sugar metabolism also contribute to bacterial fitness under stressful conditions such as oxygen restriction, the late stationary phase of growth, or growth in serum or in the intestinal tract [17,18]. Type 1 fimbriae expression, in particular, seems to be dependent on regular cytosolic levels of carbohydrates [18,19]. Further, this strain can also invade and survive within avian and human cells including avian fibroblasts [20] and hepatocytes (LHM), human pneumocytes (A549) [21], human brain microvascular endothelial cells (HBMEC) [22], and human intestinal cells (Intestine-407), at levels comparable to AIEC strains [23]. Moreover, BEN2908 has also been used as a model to study important ExPEC-specific genes such as ibeA, which contributes to adherence to host cells, resistance to oxidative stress and virulence in both avian and mammalian hosts [22,24,25]. Due to its ability to induce colibacillosis in poultry, cause urinary tract infections in murine models, and invade various types of human cells, BEN2908 represents a highly versatile and cross-host pathogenic E. coli.

Another subset of pathogenic E. coli includes those belonging to the AIEC pathotype, which are commonly associated with high adhesion and invasion rates causing inflammatory bowel diseases (IBD) in humans, such as ulcerative colitis or Crohn’s disease. Strain LF82 is an extensively studied AIEC reference strain and one of the first identified as capable of inducing IBD, followed by AIEC NRG857c, isolated from the ileum of a patient with Crohn’s disease [26,27]. In addition to their capacity to invade different intestinal cell lines, these AIEC strains display notable genomic similarity to some ExPEC strains, such as strain APEC O1 and UPEC strain UTI89 [28]. AIEC strains also have some well-known virulence factors that are essential for adherence, invasion and survival inside host cells, such as the serine autotransporter protease Vat, which induces vacuolization and cytoskeleton rearrangements in avian, murine and human cells [29–31]. Moreover, transcriptomics and Tn-seq analyses identified LF82-specific genes that might be implicated in the growth or survival of intracellular bacterial communities (IBCs). This investigation uncovered three noteworthy gene clusters: the High Pathogenicity Island (HPI), a putative type 6 secretion system (T6SS), and a region associated with carbohydrate metabolism [31]. Curiously, the functionality of those three clusters has already been correlated to ExPEC, especially APEC, pathogenesis by many different authors [32–36].

In this study, we perform a comparative genomic analysis of BEN2908 to ten ExPEC strains, ten intestinal pathogenic E. coli (InPEC) strains, and two commensal strains, to identify genomic features common to these different strains, that are potentially underlying their ability to cause pathogenesis. To achieve that, we: (i) demonstrated the close evolutionary relationship between BEN2908, AIEC LF82 and NRG857c and other ExPEC model strains through a phylogenetic analysis, complemented by different characterization programs, (ii) identified and analysed the content of 20 genomic regions (GRs) in common to these strains that were absent in the commensal E. coli K-12, and (iii) inferred the possible origin of some of these features by comparison to analogous genomic modules from other bacteria. Through this approach, our aim was to identify novel genes or gene modules that may have a functional role in E. coli pathogenesis due to their homology to other known genes. Moreover, the complete sequence of the chromosome and plasmid, identified, described, and analysed in this study will be useful to study more comprehensively the pathogenic mechanism of strain BEN2908 and other invasive E. coli.

Methods

BEN2908 DNA extraction, sequencing, assembly and annotation

Nanopore sequencing and assembly were performed by the Genome and Transcriptome Facility at Bordeaux, France. For the extraction, 5 µg of genomic DNA were sheared to 20 kb using Megaruptor 2 (Diagenode). Sheared DNA was end-repaired using Oxford Nanopore recommendations for 1D Ligation sequencing (SQK-LSK108), with minor modifications, as follows. 48 µL of sheared DNA were incubated with 7 µL of Ultra II End-prep reaction buffer and 3.5 µL of Ultra II End-prep enzyme mix (New England Biolabs) at 20 °C for 15 minutes and then at 65 °C for 15 minutes. The sample was then cleaned up using 1.0X of AMPure XP beads and barcoded using the NBD-103 kit (Oxford Nanopore Technologies). After that, 22.5 µL of clean repaired DNA and 25 µL of Blunt/TA Ligase Master Mix (New England Biolabs) were added to 2.5 µL of barcode and incubated at room temperature for 15 minutes. Barcoded samples were cleaned up again and quantified using a Qubit fluorometer (Invitrogen) for equimolar pooling. Then, 2.2 µg of pooled DNA were ligated to the AMX adapter and purified according to Oxford Nanopore recommendations. Afterwards, 13 µL of the library were loaded into a MinION Flow cell (FLO-MIN106 R9.4) and sequenced for 48 hours on a GridION x5. The obtained raw data (.fast5 files) were base called in high accuracy mode and filtered using Guppy (v. 4.0.11) by applying a minimum quality score of Q7. This resulted in 752,742 reads above the Q7 cutoff. Read quality and length distributions were then assessed using NanoPlot [37] and compared with the dataset prior to filtering.
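
As an illustration of the Q7 read filter described above, the following minimal Python sketch computes a per-read mean quality score and applies the cutoff. It assumes Phred+33 encoded FASTQ quality strings and assumes that the mean is taken over per-base error probabilities rather than over the raw Q values, as is common in Nanopore tooling. The function names are hypothetical and this is not the Guppy implementation.

```python
import math


def mean_qscore(quality_string, phred_offset=33):
    """Mean read quality, averaged over per-base error probabilities.

    Assumption: quality_string is a Phred+33 encoded FASTQ quality line.
    """
    if not quality_string:
        return 0.0
    # Convert each Phred score Q to an error probability p = 10^(-Q/10).
    error_probs = [10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string]
    mean_error = sum(error_probs) / len(error_probs)
    # Convert the mean error probability back to the Phred scale.
    return -10 * math.log10(mean_error)


def passes_filter(quality_string, min_q=7.0):
    """Return True when the read's mean quality reaches the cutoff (Q7 here)."""
    return mean_qscore(quality_string) >= min_q
```

Averaging error probabilities penalizes reads with a few very poor bases more strongly than averaging Q values would: a two-base read with Q20 and Q2 has an arithmetic mean of Q11 but fails the Q7 cutoff under this definition.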

Illumina paired-end sequencing was performed in 2015 at the Genome Québec facility at McGill University (Montreal, Quebec, Canada) on an Illumina MiSeq machine; sequencing generated 3,274,430 reads of 250 bp with 50% GC content. Illumina reads were trimmed for adapters and low-quality bases using Trimmomatic (v. 0.32) [38], with the following parameters: ILLUMINACLIP:TruSeq3-PE.fa:2:30:10; LEADING:30; TRAILING:30; HEADCROP:20; MINLEN:150. This resulted in 2,476,272 paired reads with a minimum length of 150 bp, which were further filtered with a Q20 cutoff using fastq_quality_filter (v. 1.0.0), available on the Galaxy platform (v. 25.0) [39]. After filtering, a total of 1,713,025 reads were retained, yielding a coverage depth of 282x and breadth of 99.86%, as assessed by BWA (v. 0.7.19) [40]. The files and logs of the programs mentioned above, including the FastQC (v. 0.12.1) scores (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), are available in the GitHub repository referred to in the Data availability section. Then, the Nanopore reads were assembled with Canu (v. 1.6) [41] and the assembly was polished with the Illumina reads using Pilon (v. 1.22) [42]; three runs were necessary until no further corrections were made. Finally, the annotation was made with the classic RAST workflow available online [43].
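
To make the effect of the quality-trimming parameters listed above concrete, here is a minimal Python sketch of the LEADING, TRAILING, HEADCROP and MINLEN steps applied in the stated order to a single read. It is only an approximation for illustration: the adapter-clipping step (ILLUMINACLIP) is deliberately omitted, the function name `trim_read` is hypothetical, and quality values are assumed to be already decoded to integer Phred scores.

```python
def trim_read(seq, quals, leading=30, trailing=30, headcrop=20, minlen=150):
    """Sketch of LEADING -> TRAILING -> HEADCROP -> MINLEN on one read.

    seq: base string; quals: list of integer Phred scores (same length).
    Returns (trimmed_seq, trimmed_quals), or None if the read is dropped.
    """
    # LEADING: drop bases from the 5' end while their quality is below the cutoff.
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    # TRAILING: drop bases from the 3' end while their quality is below the cutoff.
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # HEADCROP: remove a fixed number of bases from the start of what remains.
    start = min(start + headcrop, end)
    # MINLEN: discard reads that became shorter than the minimum length.
    if end - start < minlen:
        return None
    return seq[start:end], quals[start:end]
```

With these settings, a 250 bp read whose bases are all above Q30 is shortened to 230 bp by the head crop alone, and any read left with fewer than 150 bases is discarded.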

Genomic comparison and characterization of E. coli strains

In addition to BEN2908, described above, 22 other E. coli genomes used in this study were downloaded from NCBI’s database under the following accession numbers and were also submitted to RAST for annotation: 55989 (GCF_000026245.1), 11368 (GCF_000091005.1), 042 (GCF_008042015.2), H10407 (GCF_000210475.1), 2009EL-2050 (GCF_000299255.1), 11128 (GCF_000010765.1), E2348/69 (GCF_000026545.1), E24377A (GCF_000017745.1), O157:H7 EDL933 (GCF_000732965.1), O157:H7 Sakai (GCF_000008865.1), IMT5155 (GCA_000813165.1), APEC O1 (GCA_000014845.1), IHE3034 (GCA_000025745.1), RS218 (GCA_000800845.2), NRG857c (GCA_000183345.1), LF82 (GCA_000284495.1), CFT073 (GCA_014262945.1), χ7122 (GCA_000307205.1), 78-Pyelo (GCA_014131615.1), UTI89 (GCA_000013265.1), SCU-397 (GCA_013358385.1), and K-12 MG1655 (GCA_000005845.2). Regarding extraintestinal strains, these comprised three APEC (APEC O1, IMT5155 and χ7122) [44–46], two NMEC (RS218 and IHE3034) [47,48] and three UPEC strains (UTI89, CFT073 and 78-Pyelo) [49–51]. Regarding intestinal strains, these comprised four enterohemorrhagic E. coli (EHEC; O157:H7 Sakai, O157:H7 EDL933, 11368 and 11128) [52–55], three enteroaggregative E. coli (EAEC; 042, 2009EL-2050 and 55989) [56–58], two enterotoxigenic E. coli (ETEC; H10407 and E24377A) [59,60], two AIEC (LF82 and NRG857c) [27,28], one enteropathogenic (EPEC; E2348/69) [61], and two commensal strains (SCU-397 and MG1655) [62,63]. Multilocus sequence type (MLST) and serotype information of these strains were described in the above cited publications and information was verified using data from the Enterobase database [64] and the following software: ClermonTyping (v. 24.02) [65], SerotypeFinder (v. 2.0) [66], MLST (v. 2.0) [5,67,68], and FimTyper (v. 1.0) [69]. Other programs, such as VFanalyzer (v. 6.0) [70], CRISPRCasTyper (v. 1.8.0) [71], MinCED [72], SecReT6 (v. 3.0) [73], PHASTEST (v. 3.0) [74], Roary (v. 3.13.0) [75] and KEGG (release 116) [76] were also used to complement this characterization. The KEGG Pathway (KP) and KEGG Orthology (KO) databases were used by mapping the KO assignment numbers of the uncharacterized ORF homologs identified in this work onto KP maps. This allowed us to predict metabolic pathways potentially related to the molecular functions of the novel ORFs. However, these predictions are based on homology-derived KO assignments and require experimental validation of protein activity, regulation and specificity. All programs, except Roary, were run using the .fasta files of the 23 strains on their respective web-based platforms using default parameters. Roary was executed in a Linux Ubuntu 24.04.3 LTS terminal with default alignment configurations using the .gff files of each strain. The -r and -e parameters were used to generate R plots and align core genes using PRANK [77], respectively.

With the annotated genomes, Orthofinder (v. 2.5.4) [78] was used for the identification of orthogroups (i.e., groups of genes with an evolutionary relation). Afterwards, the files generated from this analysis were used for constructing an unrooted phylogenetic tree, described below.

RAxML unrooted tree generation and average amino acid identity (AAI) to AIEC LF82

Phylogenetic tree construction was based on two files generated with Orthofinder: (i) a .txt file identifying all the single copy orthogroups (i.e., all groups of homologous genes that possess only one allele corresponding to each strain) and (ii) a .csv file containing all the orthogroups found. Based on the information contained in these two files, Python scripts were written for the following operations: (i) generation of one .fasta file for each single copy orthogroup found; (ii) alignment of the homologous genes contained in each single copy orthogroup .fasta using MUSCLE (v. 5) [79]; and (iii) concatenation of all the aligned genes in a single .multifasta file with one entry for each strain. Afterwards, trimAl (v. 1.5) was used to trim ambiguous regions in the alignment, resulting in 23 sequences of 987,155 amino acids, which were converted into a .phy file using a custom Python script. Then, ModelFinder (from IQ-TREE v. 3.0.1) [80] was used to identify the best substitution matrix. After that, RAxML [81] was executed using a JTT substitution matrix incorporating a proportion of invariable sites and gamma-distributed rate heterogeneity. Node support was assessed with 1,000 nonparametric bootstrap replicates. To validate the tree topology, three additional phylogenies were generated under alternative substitution matrices and evaluated with bootstrap replicates. The trees and supporting files are available in the GitHub repository. Finally, the web tool iTOL (interactive Tree of Life; v. 7) [82] was used for tree visualization and image editing. To complement the result of the tree, the program EzAAI (v. 1.2.4) [83] was used to obtain AAI values and proteome coverage percentages of all strains to AIEC LF82.
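
The concatenation and .phy conversion steps described above can be sketched as follows. This is a minimal illustration, not the scripts used in this study: it assumes that each single copy orthogroup alignment is already available as a dictionary mapping strain name to aligned sequence, and the function names are hypothetical.

```python
def concatenate_alignments(alignments, strains):
    """Join per-orthogroup alignments into one sequence per strain.

    alignments: list of dicts {strain: aligned_sequence}, one dict per
    single copy orthogroup (assumed input layout).
    """
    parts = {strain: [] for strain in strains}
    for aln in alignments:
        # Every strain must contribute a sequence of the same aligned length.
        if len({len(aln[s]) for s in strains}) != 1:
            raise ValueError("aligned sequences must have equal length")
        for s in strains:
            parts[s].append(aln[s])
    return {s: "".join(chunks) for s, chunks in parts.items()}


def to_relaxed_phylip(supermatrix):
    """Write a concatenated alignment as relaxed sequential PHYLIP text."""
    names = list(supermatrix)
    n_sites = len(supermatrix[names[0]])
    lines = [f"{len(names)} {n_sites}"]
    for name in names:
        # Spaces would split the label from the sequence, so replace them.
        lines.append(f"{name.replace(' ', '_')} {supermatrix[name]}")
    return "\n".join(lines) + "\n"
```

Keeping a fixed strain order while appending guarantees that column i of the concatenated alignment refers to the same orthogroup position in every strain, which is what the downstream tree inference relies on.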

Ring image generation and CDS functional characterization

The ring comparison was generated using the software BRIG (BLAST Ring Image Generator; v. 0.95) [84]. The chromosomal ring comparison was made by setting BEN2908 as the reference genome against the genomes of the ExPEC/AIEC strains that clustered in the phylogenetic tree (grey arrows), except UTI89 and NRG857c, which presented the same alignment pattern as RS218 and LF82, respectively. These comprised two APEC (red tones; IMT5155 and APEC O1), one AIEC (orange; LF82), two NMEC (purple tones; RS218 and IHE3034), two UPEC (green tones; 78-Pyelo and CFT073), and one commensal E. coli (grey; K-12 MG1655). The plasmid ring comparison was made by setting pBEN2908 as the reference against the sequences of four other ColV-like plasmids screened using the criteria of Liu et al. (2018) [85], i.e., a plasmid was considered ColV-like if it carried at least one gene from each of four or more of the following gene clusters: (i) cvaABC and cvi (the ColV operon), (ii) iroBCDEN (the salmochelin operon), (iii) iucABCD and iutA (the aerobactin operon), (iv) etsABC, (v) ompT and hlyF, and (vi) sitABCD. The four plasmids comprised three APEC plasmids (red tones; p1ColV5155, pAPECO1Col-BM, and pAPEC-1) and one AIEC plasmid (orange; pO83-CORR). BLAST was run with BLAST+ (v. 2.12.0) using default BLASTn parameters (word_size = 11; reward = 2; penalty = −3; gapopen = 5; gapextend = 2; e-value = 10). Regions of at least 4 kb that were absent (coverage values below 50%) in the commensal K-12 strain were tagged as genomic regions (GR) of interest. The 4 kb cutoff was chosen because it approximately corresponds to the size of small functional gene clusters (around 3–4 genes, given the general E. coli gene density of about 1 gene per kb) and because the same threshold has precedent in genomic descriptions of related APEC strains (APEC O1 and IMT5155), which aids comparative interpretation [44,45]. S1 Table lists the GC content, nucleotide coverage, and nucleotide identity of each GR from BEN2908 against the genomes of the other strains, obtained using BLASTn optimized for highly similar sequences (megablast). Each GR range drawn in the ring extends from the end of the last upstream CDS common to all strains to the beginning of the first downstream CDS common to all strains. These limiting CDS are written in the “GR Limits” column in Table 3 and S2 Table. Further, the names of twenty-one previously identified regions, including four from the BEN2908 genome (GimA, GimB, AGI-1, and AGI-3) [16,86,87], were also added to the ring comparison.
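
The rule used to tag a genomic region of interest (at least 4 kb long and less than 50% covered in the commensal K-12 strain) can be expressed as a short Python sketch. It assumes, for illustration only, that BLAST hits of K-12 against a candidate BEN2908 region are available as 1-based inclusive (start, end) intervals on the reference; the function names are hypothetical and this is not how BRIG itself computes coverage.

```python
def covered_fraction(region_start, region_end, hits):
    """Fraction of a region covered by the union of hit intervals.

    Coordinates are 1-based and inclusive; hits is a list of (start, end).
    """
    # Keep only hits that overlap the region, clipped to its boundaries.
    clipped = sorted(
        (max(s, region_start), min(e, region_end))
        for s, e in hits
        if e >= region_start and s <= region_end
    )
    covered = 0
    current_end = region_start - 1
    for s, e in clipped:
        if e <= current_end:
            continue  # fully inside an interval that was already counted
        # Count only the part that extends beyond what is already covered.
        covered += e - max(s, current_end + 1) + 1
        current_end = e
    return covered / (region_end - region_start + 1)


def is_region_of_interest(region_start, region_end, hits,
                          min_len=4000, max_cov=0.5):
    """Apply the length (>= 4 kb) and coverage (< 50%) thresholds."""
    length = region_end - region_start + 1
    return length >= min_len and covered_fraction(region_start, region_end, hits) < max_cov
```

Taking the union of the hit intervals, rather than summing their lengths, avoids counting overlapping BLAST hits twice, which would otherwise inflate the coverage of repetitive regions.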

The predicted putative functions of some of the genes and operons within the GRs were verified using BLASTp and CD-Search (v. 3.21) [88,89], and were classified as follows: Sugar metabolism (SM), Prophages (Phg), Metabolism of iron and other metals (Met), Secretion Systems (SS), Adhesion and Invasion (A/I), and Defense mechanisms (Def). The other selected genes that did not correlate with these functions were grouped as General Metabolism (GM). The domains and genes reported in the “CD-Search prediction” and “BLASTp identity” columns in Table 5 and S2 Table were, respectively, the hit with the expect value closest to 0 and, preferentially, a reviewed entry from the UniProtKB database. Unreviewed entries were selected when an unreviewed homolog had a relevant mention in a previous paper (ORFs 36-56 [90]; ORF 234 [91]; ORF 534 [92]; and ORFs 934-1334 [93]) or when no reviewed entries were found (ORF 15 and ORF 16). Each novel ORF identified in this work, or previously reported CDS or region, was referred to as a “feature”. For prophage-containing GRs, the boundaries were defined by the region between the attachment sites identified in PHASTEST, except for GRs 11 and 14, where no attachment sites were found. S2 Table contains a summary of the features and their characterization in all 36 GRs.

Results and discussion

Assembly, characterization and phylogenetic analysis of BEN2908

Sequencing analysis indicated that the genome of BEN2908 comprised the chromosome and a single plasmid. The chromosome contig has 5,061,728 bp and 50.6% GC content, and the plasmid has 133,673 bp and 49.8% GC content. The characterization of BEN2908 and the strains used in this study was conducted using the web-based programs from CGE, as described in the Methods section. Strain BEN2908 (phylogroup B2; serotype O2:H5) exhibits a characteristic STc95 APEC profile, similar to APEC strain IMT5155 (B2; O2:H5:K1) and APEC O1 (B2; O1:H7) (Table 1).

Table 1. Characterization of the strains used in this study¹.

Group | Strain | Pathotype | Sequence Type | ST complex | Phylogroup | Serotype | fimH type
ExPEC | BEN2908 | APEC | ST95 | STc95 | B2 | O2:H5:K1 | fimH2343²
ExPEC | APEC O1 | APEC | ST95 | STc95 | B2 | O1:H7:K1 | fimH15
ExPEC | IMT5155 | APEC | ST140 | STc95 | B2 | O2:H5:K1 | fimH15
ExPEC | χ7122 | APEC | ST23 | STc23 | C | O78:H9:K80 | fimH35
ExPEC | RS218 | NMEC | ST95 | STc95 | B2 | O18:H7:K1 | fimH18
ExPEC | IHE3034 | NMEC | ST95 | STc95 | B2 | O18:H7:K1 | fimH18
ExPEC | UTI89 | UPEC | ST95 | STc95 | B2 | O18:H7 | fimH18
ExPEC | 78-Pyelo | UPEC | ST12 | STc12 | B2 | O21:H5 | fimH5
ExPEC | CFT073 | UPEC | ST73 | STc73 | B2 | O6:H1 | fimH10
InPEC | LF82 | AIEC | ST135 | STc135 | B2 | O83:H1 | fimH436
InPEC | NRG857c | AIEC | ST135 | STc135 | B2 | O83:H1 | fimH2
InPEC | O157:H7 Sakai | EHEC | ST11 | STc11 | E | O157:H7 | fimH36
InPEC | O157:H7 EDL933 | EHEC | ST11 | STc11 | E | O157:H7 | fimH36
InPEC | 11368 | EHEC | ST21 | STc29 | B1 | O26:H11 | fimH440
InPEC | 11128 | EHEC | ST16 | STc29 | B1 | O111:H8 | fimH86
InPEC | E2348/69 | EPEC | ST15 |  | B2 | O127:H6 | fimH57
InPEC | H10407 | ETEC | ST48 | STc10 | A | O78:H11 | fimH41
InPEC | E24377A | ETEC | ST1132 |  | B1 | O-:H28 | fimH54
InPEC | 042 | EAEC | ST410 | STc23 | C | O-:H21 | fimH24
InPEC | 2009EL-2050 | EAEC | ST678 |  | B1 | O104:H4 |
InPEC | 55989 | EAEC | ST678 |  | B1 | O104:H4 |
Commensal | SCU-397 | Commensal | ST38 | STc38 | D | O86:H18 | fimH5
Commensal | K-12 MG1655 | Commensal | ST10 | STc10 | A | O16:H48 | fimH27

¹ This analysis was conducted using the programs ClermonTyping, for defining the phylogroup, Enterobase, for defining the ST complex according to Wirth (2006) [5], and the characterization programs from the Center for Genomic Epidemiology (CGE), mentioned in the Methods section.

² The allele fimH2343 differs from fimH15 by a single nucleotide mutation (537 G > A), resulting in a non-synonymous substitution in the amino acid sequence (180 G > S).

To compare the genome of BEN2908 to other E. coli strains, we generated an unrooted phylogenetic tree using the 3,101 common orthologues identified by Orthofinder (Fig 1). The strains used in this tree were selected to evaluate which typing scheme best clusters BEN2908 and to elucidate which pathogenic pathotypes show greater affinity to the AIEC strains LF82 and NRG857c. For those reasons, in addition to the three strains mentioned above, 20 model E. coli strains from different phylogroups, sequence types, serotypes, and pathotypes were selected. Also, considering the importance of type 1 fimbriae (T1F) to invasion, strains of different fimH types were also selected. Gene fimH is commonly used as T1F typing gene because it directly mediates attachment to host cells, and it has been shown that its allelic variation alters affinity for mannosylated receptors, affecting adhesion and invasion [94,95]. In the phylogenetic tree (Fig 1), BEN2908 clustered with other STc95 strains, especially close to APEC strain IMT5155 which has the same serotype (O2:H5:K1), but a different ST due to one allelic differentiation in the adenylate kinase housekeeping gene (adk55 rather than adk13). Another noticeable aspect of the tree is that the AIEC strains LF82 and NRG857c are more closely related to the eight ExPEC strains (grey arrows), including BEN2908, than to any other InPEC strains analysed in this work. This is further supported by the AAI values and proteome coverage analysis, which shows that LF82 proteins have more identity and are more covered if compared to these ExPEC strains than to any InPEC, excepting AIEC NRG857c. These results suggest that STc135 AIEC orthologs closely resemble those of the ExPEC strains that clustered with BEN2908 (also known for its invasive capacity), implying that some genomic features may underlie their pathogenic traits.

Fig 1. Unrooted phylogenetic tree generated and AAI values of each strain to AIEC LF82.

Maximum likelihood phylogeny using the 3,101 common concatenated orthologues detected by OrthoFinder. The strains used are described in Table 1. Branches colored blue represent ExPEC, red represent InPEC, and green represent Commensal strains. The red dotted circle indicates that all nodes within have 100% bootstrap support. The grey arrows indicate the strains that clustered together and showed the highest AAI values and proteome coverage to AIEC LF82.

Supplementary characterization using VFanalyzer, CRISPRCasTyper, Roary and PHASTEST and comparison to AIEC strains

The genetic proximity shown above is further supported by the results of additional characterization tools, such as CRISPRCasTyper and MinCED (for Cas typing and CRISPR spacer identification), VFanalyzer (for screening of virulence-related genes), and Roary (for core genome comparisons). CRISPR/Cas analysis revealed that, with the exception of CFT073, 78-Pyelo (which does not possess a CRISPR/Cas system) and χ7122, all ExPEC and both AIEC harbour the same Cas subtype (I-F), in contrast to the I-E subtype found in every InPEC and commensal strain (S3 Table). Additionally, except for χ7122, at least half of the spacers from every ExPEC strain were identified in LF82 and at least one-third in NRG857c, whereas none were found in common with InPEC or commensal strains (S4 Table). This suggests a genetic relationship between these AIEC and ExPEC, since strains that share a recent evolutionary history often have identical spacers [96]. The presence of virulence and core genes also follows this trend, as a higher proportion of genes common to AIEC and ExPEC was found with both VFanalyzer and Roary. Roary results showed that AIEC strains have more genes in common with ExPEC (3,352) than with InPEC (3,119) strains, a difference of 233 genes (full output available on GitHub). VFanalyzer identified 85 virulence-related genes in both AIEC strains. Of these, six genes (toxins vat and usp, invasin ibeA, metal transport periplasmic binding protein sitA, and two T6SS components) were found exclusively in at least one ExPEC strain and were absent in all other InPEC strains, in contrast to only three genes (long polar fimbriae genes lpfBCE) exclusively shared with InPEC strains (full output in S5 Table). Despite the small numeric difference, the genes shared exclusively with ExPEC are not restricted to a single locus, unlike the lpfBCE genes, and their presence and functionality are usually related to ExPEC virulence traits, such as pyelonephritis, HBMEC invasion, and cytoskeletal modifications in different host cells [22,29,97–99]. In contrast, PHASTEST results diverged from this pattern, as the most common AIEC phages were almost evenly distributed among InPEC and ExPEC strains (S6 Table).
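
The spacer comparison summarised above (the share of one strain's CRISPR spacers that is also found in LF82 or NRG857c) can be illustrated with a minimal Python sketch. It assumes that spacers are plain DNA strings over A, C, G and T, and it treats a spacer and its reverse complement as the same sequence, which is an assumption made here for illustration and is not stated in the Methods. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(spacer):
    """Orientation-independent form: the smaller of a spacer and its reverse complement."""
    forward = spacer.upper()
    reverse = forward.translate(_COMPLEMENT)[::-1]
    return min(forward, reverse)


def shared_spacer_fraction(query_spacers, reference_spacers):
    """Fraction of distinct query spacers that also occur in the reference strain."""
    query = {canonical(s) for s in query_spacers}
    if not query:
        return 0.0
    reference = {canonical(s) for s in reference_spacers}
    return len(query & reference) / len(query)
```

A value of 0.5 or more for an ExPEC strain against LF82 would correspond to the "at least half of the spacers" observation reported above.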

pBEN2908 is a ColV-like plasmid

Colicin V (ColV) plasmids have been shown to play a pivotal role in the pathogenesis of extraintestinal pathogenic E. coli. Their name derives from the production of colicin V, a bacteriocin originally described by Gratia in 1925 as “factor V” [100]. These plasmids usually range from 80 to 180 kbp and were identified in a variety of E. coli hosts, with particular significance in poultry infections [101–104]. Besides being widespread among hosts, the pathogenicity acquired from those plasmids is also noteworthy: acquisition of ColV-like plasmids by commensal/environmental E. coli from phylogroup B1 was associated with a more than threefold increase in infection rates among patients at a hospital in Paris [8,105]. This pathogenic potential relies on their gene content, which encodes various virulence factors, especially metal transport and uptake systems, such as the siderophores encoded by the aerobactin (iuc/iut) and salmochelin (iro) genes, but also the sitABCD and hypothetical etsABC metal transport systems [106]. The iss (increased serum survival) gene is also present on ColV plasmids and may contribute collectively with other systems to resist the bactericidal activity of serum complement [107]. Other genes such as outer membrane vesicle regulator hlyF and outer membrane protease ompT, type II toxin-antitoxin system vapBC, SOS inhibition system psiAB, and the conjugative transfer system tra/trb are also commonly found on these plasmids. To screen for ColV-like plasmids, we applied the criteria defined by Liu et al. [85] to the plasmids from 18 strains in our selection, as five strains (78-Pyelo, CFT073, 55989, MG1655, and SCU-397) did not harbour any plasmids. Of the 41 plasmids tested (description in S7 Table), only five met the criteria: four from the APEC strains (including BEN2908) and one from the AIEC strain NRG857c (Table 2). To facilitate comparison, we generated a ring comparison containing only these five ColV-like plasmids and the ColV-associated genes described above, with pBEN2908 as reference (Fig 2). The pLF82 plasmid is distinct from ColV plasmids, closely resembling the cryptic pHCM2 of Salmonella enterica serovar Typhi CT18 (isolated from a typhoid fever patient in Vietnam) [28]. This cryptic plasmid harbours phage-derived regions, a parAB-like partitioning module, a suite of DNA replication genes (including helicases, ligases, and exonucleases), and coding sequences similar to genes rnhA (ribonuclease H), dhfR (dihydrofolate reductase), thyA (thymidylate synthase), and nrdAB (ribonucleotide diphosphate reductase), involved in nucleotide synthesis [108]. Despite this difference, finding a ColV-like plasmid uniquely in an AIEC strain supports the idea that these plasmids may act as a zoonotic pathogenic trait derived from APEC [85].
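
The ColV screening rule stated in the Methods (at least one gene from four or more of six gene clusters) lends itself to a compact Python sketch. The cluster membership below is taken from the Methods section; the function name, the dictionary keys and the assumption that each plasmid is represented by a set of annotated gene names are hypothetical and serve only as illustration.

```python
# Gene clusters used for ColV screening, as listed in the Methods section.
COLV_CLUSTERS = {
    "cva": {"cvaA", "cvaB", "cvaC", "cvi"},
    "iro": {"iroB", "iroC", "iroD", "iroE", "iroN"},
    "iuc/iut": {"iucA", "iucB", "iucC", "iucD", "iutA"},
    "ets": {"etsA", "etsB", "etsC"},
    "ompT/hlyF": {"ompT", "hlyF"},
    "sit": {"sitA", "sitB", "sitC", "sitD"},
}


def is_colv_like(plasmid_genes, min_clusters=4):
    """True when genes from at least `min_clusters` distinct clusters are present."""
    genes = set(plasmid_genes)
    # A cluster counts once, no matter how many of its genes are found.
    clusters_hit = sum(1 for members in COLV_CLUSTERS.values() if genes & members)
    return clusters_hit >= min_clusters
```

Counting clusters rather than genes means that a plasmid carrying a complete iro operon and nothing else is not classified as ColV-like, whereas one carrying a single gene from each of four clusters is.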

Table 2. ColV-like plasmids and the identity of their genes to the clusters from p1ColV5155¹.

ST | ST140 | ST95 | ST95 | ST23 | ST135
Strain | IMT5155 | BEN2908 | APEC O1 | χ7122 | NRG857c
ColV-like plasmid (Acc. Number) | p1ColV5155 (NZ_CP005931.1) | pBEN2908 (LR740777.2) | pAPEC-ColBM (DQ381420.1) | pAPEC-1 (CP000836.1) | pO83_CORR (CP001856.1)
Plasmid size (bp) | 194,170 | 133,673 | 174,241 | 103,275 | 147,060
PlasmidFinder best hit | IncFIB | IncFIB | IncFIB | IncFIB | IncQ1²
Cluster 1 (operon iro)
iroN (47259..49436) 100 99.91 99.91 99.91
iroE (49481..50437) 99.9 99.9 99.16 99.9
iroD (50522..51751) 100 100 99.35 100
iroC (51855..55514) 99.97 99.86 99.56 99.86
iroB (55654..56769) 100 99.91 99.91 99.91
Cluster 2 (operon ets)
etsC (68053..69423) 100 99.83
etsB (70647..72143) 100 99.83
etsA (72140..73327) 100 99.83
Cluster 3 (hlyF and ompT)
ompT (76295..77248) 100 99.69 99.79 99.79
hlyF (77681..78790) 100 99.73 99.73 99.73
Cluster 4 (operon sit)
sitA (85429..86343) 100 100 100 99.89
sitB (86343..87170) 99.28 99.88 100 100²
sitC (87167..88027) 98.61 99.54 99.65 99.65
sitD (88024..88882) 96.86 99.77 99.77 99.77
Cluster 5 (operon iut/iuc)
iucA (92214..93938) 100 100 100 100
iucB (93939..94886) 99.89 99.79 99.79 99.79
iucC (94886..96628) 100 100 100 100
iucD (96625..97958) 100 99.78 100 100
iutA (97984..100185) 100 99.91 99.91 99.86
Cluster 6 (operon cva)
cvaA (166828..168069) 100 99.76 99.76
cvaB (168062..170158) 99.95 100 99.95 100³
cvaC (170328..170639) 100 100 99.36
cvi (170617..170853) 100 100 99.16

¹ The plasmids shown are those from the 23 strains that met the criteria of Liu et al. [85] (see Methods section).

² Query did not fully cover the template length (529/796).

³ 11% alignment coverage.
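
As a rough guide to reading the identity values in Table 2, the sketch below converts a mismatch count into a percent identity for a gene of known length. It assumes a full-length, ungapped alignment, which is a simplification and not necessarily how the values in the table were computed; the function name `percent_identity` is hypothetical, and the gene length is derived from the iroN coordinates listed in the table.

```python
def percent_identity(aligned_length: int, mismatches: int) -> float:
    """Percent identity of an ungapped, full-length alignment (2 decimals)."""
    if aligned_length <= 0 or not 0 <= mismatches <= aligned_length:
        raise ValueError("invalid alignment length or mismatch count")
    return round(100.0 * (aligned_length - mismatches) / aligned_length, 2)


# iroN spans 47259..49436 in p1ColV5155 (inclusive coordinates).
iron_length = 49436 - 47259 + 1

# Identity if two positions differ over the whole gene (illustrative count).
print(iron_length, percent_identity(iron_length, 2))
```

Under these assumptions, a difference of only one or two nucleotides in a gene of this size is enough to move the identity from 100 to about 99.9.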

Fig 2. Plasmid comparison using the software BRIG, setting pBEN2908 as the reference strain.

From the innermost to the outermost ring, the following plasmids are shown: p1ColV5155 (NZ_CP005931.1; IMT5155), pAPEC-O1-ColBM (DQ381420.1; APEC O1), pAPEC-1 (CP000836.1; χ7122), and pO83-CORR (CP001856.1; NRG857c). Genes usually present in ColV-like plasmids and the four gene clusters used in Liu et al. for ColV plasmid screening are shown in the figure [85].

Overview and metabolic functions of the GRs in common between ExPEC and AIEC strains

While numerous genes have already been reported to contribute to virulence of extraintestinal E. coli [109,110], our research focused on identifying specific genomic regions possibly harbouring novel genes that could have a role in pathogenesis. Therefore, in an attempt to further analyse the genomic content of the strains that clustered with BEN2908 and AIEC LF82/NRG857c (Fig 1; grey arrows), we generated a ring comparison using the software BRIG. The BEN2908 genome was set as the reference and the following strains as subjects: two APEC (IMT5155 and APEC O1), two NMEC (RS218 and IHE3034), two UPEC (78-Pyelo and CFT073), one AIEC (LF82), and one commensal E. coli strain (MG1655).

The ring comparison identified 36 regions larger than 4 kbp in the BEN2908 chromosome that were absent (below 50% coverage) in the commensal strain MG1655 (Fig 3, S1 Table). To characterize the features within these regions, we used BLASTp and CD-Search. This analysis enabled us to classify the features into one or more of the following categories: “Sugar Metabolism” (SM), “Prophages,” “Metabolism of Iron and other metals” (Met), “Secretion Systems” (SS), “Adhesion and Invasion” (A/I), and “Defense Mechanisms” (Def). Features that did not fit into any of these categories were grouped as “General Metabolism” (GM). Among the 36 regions, 20 were also identified in all strains except the commensal MG1655, with 15 exhibiting more than 70% coverage (Table 3, S1 Table). To better infer the functions of the uncharacterized features in this section, we used the KEGG Pathway (KP) and KEGG Orthology (KO) databases to investigate whether any pathway maps could be related to the hypothetical molecular functions of the ORFs identified. The following subsections describe the content of these 20 GRs, which are summarised in Table 4 (Reported features) and Table 5 (Uncharacterized features), according to their respective categories, while also linking these findings to important ExPEC virulence traits.
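
The selection criteria described above can be sketched as a small filter. This is a minimal illustration rather than the pipeline used in this study: the `Region` class, the helper names and all coverage values are hypothetical, the coordinates of GR 27 are taken from Table 3, and treating exactly 70% coverage as total (T) is an assumption, since only values below and above that threshold are specified.

```python
from dataclasses import dataclass
from typing import Dict

MIN_REGION_BP = 4_000   # regions must be larger than 4 kbp
ABSENT_BELOW = 50.0     # coverage below 50% is treated as absent
TOTAL_FROM = 70.0       # assumption: 70% or more counts as total (T)


@dataclass
class Region:
    name: str
    start: int
    stop: int
    coverage: Dict[str, float]  # percent coverage per subject strain

    @property
    def length(self) -> int:
        return abs(self.stop - self.start) + 1


def is_candidate(region: Region, commensal: str = "MG1655") -> bool:
    """Large region of the reference that is absent from the commensal strain."""
    return (region.length > MIN_REGION_BP
            and region.coverage[commensal] < ABSENT_BELOW)


def completeness(coverage: float) -> str:
    """Label a coverage value as absent, partial (P) or total (T)."""
    if coverage < ABSENT_BELOW:
        return "absent"
    return "T" if coverage >= TOTAL_FROM else "P"


def shared_by_pathogens(region: Region, commensal: str = "MG1655") -> bool:
    """Candidate region present (P or T) in every strain except the commensal."""
    others = [c for s, c in region.coverage.items() if s != commensal]
    return is_candidate(region, commensal) and all(
        completeness(c) != "absent" for c in others
    )


# Hypothetical coverage values, for illustration only.
gr27 = Region("GR 27", 3991916, 3999198,
              {"APEC O1": 100.0, "LF82": 98.0, "MG1655": 3.0})
toy = Region("toy region", 1, 6000,
             {"APEC O1": 100.0, "LF82": 20.0, "MG1655": 0.0})

for region in (gr27, toy):
    labels = {s: completeness(c) for s, c in region.coverage.items()}
    print(region.name, region.length, shared_by_pathogens(region), labels)
```

In this toy input, the first region passes both filters, whereas the second is dropped because one non-commensal strain falls below the 50% threshold.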

Fig 3. Genome comparison using the software BRIG, setting BEN2908 as the reference strain.

From the innermost to the outermost ring, the following strains are shown: APEC IMT5155 and APEC O1 (red tones), NMEC RS218 and IHE3034 (purple tones), UPEC 78-Pyelo and CFT073 (green tones), AIEC LF82 (yellow) and commensal K-12 MG1655 (grey). Genomic regions (GRs) identified by the criteria described in the Methods section are indicated by numbers 1 to 36. The 20 regions marked in orange are common to all strains, except K-12 MG1655.

Table 3. Genomic regions (GRs) of BEN2908 with more than 50% coverage to the ExPEC strains from this study and AIEC LF82.

GR GR Limits Start..Stop¹ Completeness²
APEC O1 IMT5155 RS218³ IHE3034 78-Pyelo CFT073 LF82⁴
GR 3 LSU rRNA L31p..rclC 343380..357703 T T T T T T T
GR 4 ykgH..betA 365342..371538 T T T T T T T
GR 6 nagE..glnS 700554..710344 T T T T T T T
GR 8 icd..ybcV 1162715..1213864 T T T T T T P
GR 9 ymgE..treA 1243453..1250266 T T T T T T T
GR 11 yncG..yddH 1528246..1536627 P T T P T T P
GR 12 zinT..shiA 1992335..2062152 P T P P T T P
GR 17 xseA..yfgJ 2704047..2715732 T T T T T T T
GR 19 tRNA-Met-CAT..amiC 3049099..3085277 T T T T T T T
GR 20 tRNA-Phe-GAA..yghD 3235081..3253327 T T T T P P P
GR 21 ygiQ..ftsP 3318762..3326558 T T T T T T T
GR 22 ygiN..parE 3338966..3346267 T T T T T T T
GR 23 accC..yhdT 3563686..3572367 T T T T T T T
GR 24 glpD..glgP 3723141..3733104 T T T T T T T
GR 26 gor..yhiD 3816252..3828842 T T T T T T T
GR 27 yicH..yicI 3991916..3999198 T T T T T T T
GR 29 metE..ysgA 4228188..4244233 T T T T T T T
GR 31 metL..metF 4388343..4396133 T T T T T T T
GR 32 qorA..aphA 4533449..4560039 T T T T T T P

¹ Refers to the end of the gene immediately upstream and the beginning of the gene immediately downstream of the GR.

² GRs with coverage values below 70% were considered partially (P) complete and were underlined and written in italic. GRs with more than 70% coverage were considered totally (T) complete and were written in bold. All GRs had more than 95% sequence identity.

³ RS218 and UTI89 have the same pattern. See S1 Table.

⁴ LF82 and NRG857c have the same pattern. See S1 Table.

Table 4. Summary of Genomic Regions (GRs) present in both ExPEC and AIEC strains from this study containing Reported features.

Category GRs Feature Occurrence (size) Reported Features Reference
Sugar Metabolism GR 27 3991960..3999186 (7227 bp) frz operon [18]
Metabolism of Iron and other metals GR 8 1208072..1211398 (3327 bp) sit operon [98]
GR 9 1243503..1249884 (6382 bp) prrA-modD-yc73-fepC cluster [111]
GR 12 1992335..2062152 (69818 bp) HPI [35]
GR 22 3339011..3346203 (7193 bp) fit operon [112]
GR 26 3819781..3828790 (9010 bp) chu operon [113]
Adhesion and Invasion GR 3 344320..357543 (13224 bp) fdeC cluster [114]
GR 4 365526..370467 (4942 bp) PAI-X [115]
GR 17 2704116..2715523 (11408 bp) ila cluster [116]
GR 24 3723195..3732610 (9416 bp) auf operon [117]
Defense Mechanisms GR 20 3235318..3252268 (16951 bp) kps cluster [118]
Secretion Systems GR 11 1529090..1530652 (1563 bp) T6SS Effector module 1 [119]
GR 11 1533136..1534173 (1038 bp) T6SS Effector module 2
GR 19 3049735..3079488 (29754 bp) T6SS [120]
General Metabolism GR 29 4232707..4234704 (1998 bp) tkt1 [121]
GR 32 4540790..4542205 (1416 bp) dnaB [122]
GR 32 4542258..4543337 (1080 bp) alr [123]
GR 32 4544994..4546187 (1194 bp) tyrB [124]

Table 5. Genomic Regions (GRs) present in both ExPEC and AIEC strains from this study containing Uncharacterized features and their predicted identification.

Category GRs Feature Occurrence (size) ORF number CD-Search prediction (expect)¹ BLASTp identity (Acc. code)²
Sugar Metabolism GR 6 701923..700643 (1281 bp) ORF 1₆ CitT (5.48e-24) 88.50% to A0A285B4I9³
702407..701937 (471 bp) ORF 2₆ EbgC (3.73e-41) 32.69% to NanQ (P45424)
703573..702404 (1170 bp) ORF 3₆ COG4692 (3.05e-138) 46.34% to Q92X17³
705816..704929 (888 bp) ORF 4₆ DapA (2.68e-102) 42.46% to Q92X22³
706977..705820 (1158 bp) ORF 5₆ Fe-ADH (4.41e-129) 34.15% to Q92X15³
707147..708382 (1236 bp) ORF 6₆ OtnK (3.80e-126) 40.38% to DtnK (Q8ZRS5)
708375..709361 (987 bp) ORF 7₆ PdxA (0.0) 75.38% to PdxA2 (P58718)
709393..710124 (732 bp) ORF 8₆ GlpR (1.04e-89) 32.50% to YgbI (P52598)
GR 19 3080737..3079790 (948 bp) ORF 1₁₉ PGDH_like_2 (6.92e-130) 32% to SerA (P0A9T0)
3081405..3080809 (597 bp) ORF 2₁₉ GutQ (3.17e-88) 52.43% to KdsD (P45395)
3082583..3081408 (1176 bp) ORF 3₁₉ MalY (1.04e-165) 36.27% to MalY (P23256)
3084112..3082583 (1530 bp) ORF 4₁₉ PRK10110 (4.69e-155) 41.02% to MalX (P19642)
3085019..3084195 (825 bp) ORF 5₁₉ PRK09772 (7.66e-28) 30% to BglG (P11989)
GR 21 3320086..3321558 (1473 bp) ORF 2₂₁ MtlD (0.0) 49.2% to UxuB (P39160)
3321555..3322571 (1017 bp) ORF 3₂₁ Zn_ADH7 (1.72e-163) 55.3% to YjjN (P39400)
GR 23 3564824..3563844 (981 bp) ORF 1₂₃ Aldose_epim_Ec_c4013 (1.11e-145) 32.86% to GalM (P0A9C3)
3565783..3564821 (963 bp) ORF 2₂₃ KdgK (4.07e-104) 24.05% to KdgK (P37647)
3566794..3565805 (990 bp) ORF 3₂₃ AraH (4.09e-102) 42.31% to RbsC (P0AGI1)
3568294..3566795 (1500 bp) ORF 4₂₃ MglA (0.0) 44.06% to RbsA (P04983)
3569245..3568355 (891 bp) ORF 5₂₃ PBP1_ABC_sugar_binding-like (1.88e-108) 32.07% to RbsB (P02925)
3570135..3569281 (855 bp) ORF 6₂₃ GatY (0.0) 63.96% to GatY (P0C8J6)
3570492..3571307 (816 bp) ORF 7₂₃ GlpR (3.05e-50) 32.60% to GlpR (P0ACL0)
3571273..3572274 (1002 bp) ORF 8₂₃ KdgK (3.75e-91) 22.69% to KdgK (P37647)
GR 29 4230549..4231106 (558 bp) ORF 1₂₉ SIS_PHI (2.42e-80) 39.44% to HxlB (P42404)
4231148..4232656 (1509 bp) ORF 2₂₉ PTS-II-BC-glcB (0.0) 44.17% to PtsG (P69786)
4235579..4234734 (846 bp) ORF 4₂₉ RpiR (3.21e-74) 18.33% to RpiR (P0ACS7)
4236685..4235780 (906 bp) ORF 5₂₉ PRK11074 (1.15e-95) 22.03% to LysR (P03030)
Defense Mechanisms GR 19 3081405..3080809 (597 bp) ORF 2₁₉ GutQ (3.17e-88) 52.43% to KdsD (P45395)
General Metabolism GR 21 3319014..3319763 (750 bp) ORF 1₂₁ PRK10225 (6.60e-60) 24.68% to FadR (P0A8V6)
3322582..3323592 (1011 bp) ORF 4₂₁ AllD (1.29e-130) 40.43% to AllD (P77555)
3323665..3324648 (984 bp) ORF 5₂₁ PBP2_TRAP_SBP_like_3 (2.06e-146) 32.45% to DctP (P37735)
3324690..3325172 (483 bp) ORF 6₂₁ DctM (2.14e-32) 30.11% to DctQ (O07837)
3325183..3326487 (1305 bp) ORF 7₂₁ DctQ (8.69e-144) 36.30% to DctM (O07838)
GR 29 4237986..4236763 (1224 bp) ORF 6₂₉ SLC-NCS1sbd_CobB-like (9.35e-76) 26.50% to CodB (P0AA82)
4239374..4238418 (957 bp) ORF 8₂₉ PRK12686 (0.0) 47.16% to YqeA (Q46807)
4240791..4239367 (1425 bp) ORF 9₂₉ DUF1116 (5.27e-129) 45.82% to YahG (P77221)
4242347..4240788 (1560 bp) ORF 10₂₉ PRK06091 (3.02e-165) 40.48% to FdrA (Q47208)
4243976..4243308 (669 bp) ORF 12₂₉ PncA (1.92e-48) 31.82% to RutB (P75897)
GR 31 4389421..4388561 (861 bp) ORF 1₃₁ PRK15106 (2.14e-175) 67.32% to Tsx (P0A261)
4391117..4389498 (1620 bp) ORF 2₃₁ UshA (3.32e-148) 27.61% to YfkN (O34313)
4392708..4391158 (1551 bp) ORF 3₃₁ UshA (3.90e-152) 27.52% to YfkN (O34313)
4394198..4395751 (1554 bp) ORF 4₃₁ UshA (1.09e-145) 28.04% to YfkN (O34313)
GR 32⁴ 4546369..4549113 (2745 bp) ORF 11₃₂ SucA (0.0) 48.76% to SucA/Odo1 (P0AFG3)
4549146..4550300 (1155 bp) ORF 12₃₂ PRK05704 (0.0) 50.25% to SucB/Odo2 (P0AFG6)
4550312..4551730 (1419 bp) ORF 13₃₂ PRK06327 (0.0) 38.65% to LpdA (P0A9P0)
4551752..4552921 (1170 bp) ORF 14₃₂ SucC (0.0) 55.96% to SucC (P0A836)
4552934..4553806 (873 bp) ORF 15₃₂ SucD (0.0) 67.59% to SucD (P0AGE9)
4554018..4555517 (1500 bp) ORF 16₃₂ Na_sulph_symp (2.55e-87) 63.01% to Orf3 (Q07252)
4555529..4556602 (1074 bp) ORF 17₃₂ AllD (1.26e-112) 44.88% to Ldh (Q07251)
4557949..4556591 (1359 bp) ORF 18₃₂ AtoC (0.0) 52.26% to DctD (Q9HU19)
4559762..4557942 (1821 bp) ORF 19₃₂ PDC1_DGC_like (2.41e-10); COG4191 (1.23e-77) 32.09% to DctB (Q9HU20)

¹ Specific hit or non-specific hit with expect value closer to 0.

² Reviewed entries from the UniProtKB database were selected preferentially; unreviewed entries were selected when the unreviewed homolog was previously mentioned in the literature (ORFs 3₆, 4₆, and 5₆; [90]) or when no reviewed entry was found (ORF 1₆).

³ Unreviewed entry.

⁴ Only the CDSs common to AIEC strains are shown; for the complete annotation and characterization of GR 32, see S2 Table.
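
The "Feature Occurrence (size)" entries in Tables 4 and 5 follow a `start..stop` pattern with inclusive coordinates. The sketch below, with the hypothetical helper `parse_occurrence`, shows how the listed sizes can be recomputed from such a string. Reading a descending pair of coordinates as a feature on the reverse strand is an assumption made here for illustration.

```python
import re
from typing import NamedTuple


class Occurrence(NamedTuple):
    start: int     # lower coordinate
    stop: int      # higher coordinate
    strand: str    # "+" if listed ascending, "-" if listed descending
    size_bp: int   # inclusive length in base pairs


def parse_occurrence(text: str) -> Occurrence:
    """Parse a 'start..stop' string into coordinates, strand and size."""
    match = re.fullmatch(r"\s*(\d+)\.\.(\d+)\s*", text)
    if match is None:
        raise ValueError(f"not a 'start..stop' occurrence: {text!r}")
    first, second = int(match.group(1)), int(match.group(2))
    strand = "+" if first <= second else "-"  # assumption: order encodes strand
    low, high = min(first, second), max(first, second)
    return Occurrence(low, high, strand, high - low + 1)


# frz operon (Table 4) and ORF 1 of GR 6 (Table 5).
print(parse_occurrence("3991960..3999186"))
print(parse_occurrence("701923..700643"))
```

Both examples reproduce the sizes given in the tables (7227 bp and 1281 bp), which suggests that the listed sizes count both end coordinates.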

Sugar metabolism (SM).

Reported. Genomic regions 6, 19, 21, 23, 27 and 29 contain genes predicted to contribute to sugar metabolism (S2 Table), one of which (GR 27) has already been described. GR 27 contains a carbohydrate metabolic operon named frz [18]. In that work, the authors showed that the presence of the frz operon promoted fitness as well as adhesion to and internalization into different eukaryotic cell lines. Moreover, hybridization analysis of 151 ExPEC and 35 non-pathogenic avian E. coli strains showed that this operon is rarely present in non-pathogenic strains (5%) and that its prevalence increases with virulence, reaching 75% in the most virulent group [18].

Uncharacterized. The number of distinct GRs containing genes related to SM supports the importance of varied carbohydrate uptake and usage, enabling the bacteria to acquire various sugars that could be linked to other important metabolic functions. Four genomic regions identified in the ring comparison (GRs 19, 21, 23 and 29) encode genes predicted to be related to the transport of carbohydrates (Table 5), such as phosphotransferase systems (PTS: ORFs 4₁₉ and 2₂₉), ATP-binding cassette transporters (ABC-T: ORFs 3₂₃, 4₂₃, and 5₂₃) and tripartite ATP-independent periplasmic transporters (TRAP-T: ORFs 5₂₁, 6₂₁, and 7₂₁), as well as probable transcriptional regulators (ORFs 5₁₉, 1₂₁, 7₂₃, 4₂₉, and 5₂₉). Additionally, with the exception of GR 21, hypothetical epimerases or isomerases were also present in those GRs (ORF 2₁₉, ORF 1₂₃, and ORF 1₂₉), likely converting sugars into isomeric forms that other enzymes can process following uptake.

Besides transport systems, additional enzymes potentially involved in carbohydrate processing were also identified. GR 6 harbours a pair of CDSs (ORFs 6₆ and 7₆) structurally analogous to the PdxA2 and DUF1537 protein families. This gene pair is involved in the synthesis of pyridoxal-5’-phosphate, an important cofactor for many enzymes such as alanine aminotransferases, which catalyze the reversible transfer of an amino group from alanine to 2-oxoglutarate to generate glutamate and pyruvate [90,125]. Some of the homologs of the ORFs identified in this region also cluster together in other bacteria. In Sinorhizobium meliloti 1021, for instance, enzymes structurally analogous to a sialidase (ORF 3₆), a 4-hydroxy-tetrahydrodipicolinate synthase (dapA; ORF 4₆), and an alcohol dehydrogenase (adh; ORF 5₆) are encoded in a similar cluster. Additionally, S. meliloti carries an ABC transporter absent from the pathogenic E. coli strains. However, CD-Search of ORF 1₆ returned a modest match to a citrate permease (Table 5, S2 Table), suggesting a possible transporter role that would require more robust confirmation [90].

Other features were also noted in these regions. Region 19, for instance, encodes a probable phosphoglycerate dehydrogenase (ORF 1₁₉) and a beta-cystathionase (ORF 3₁₉) with structural similarity to MalY. MalY cleaves C-S linkages, producing central metabolic compounds such as pyruvate, and also represses the maltose regulon, thereby participating in catabolic regulation [126]. Additionally, GR 23 contains a gene encoding a hypothetical tagatose aldolase (ORF 6₂₃) with high similarity to the catalytic GatY aldolase subunit from E. coli K-12 (63.96% identity; Table 5). This enzyme plays an important role in glycolysis and gluconeogenesis, as it reversibly converts tagatose-1,6-bisphosphate to D-glyceraldehyde-3P (G3P) and dihydroxyacetone phosphate (DHAP) [127]. Interestingly, analysis of three KEGG pathway maps made it possible to trace a connection among ORFs 2₂₁ and 3₂₁ (GR 21) and ORFs 2₂₃ and 8₂₃ (GR 23) that may also lead to the production of these compounds (Fig 4).

Fig 4. Predicted sugar metabolism pathway based on KEGG maps and KEGG Orthology.

ORFs highlighted in purple represent putative novel coding sequences, while those in red correspond to genes located in other regions of the genome. ORFs from GRs 19, 21 and 23 appear to be involved in the conversion of different sugars to KDPG and then to pyruvate and G3P, two important intermediates of glycolytic pathways.

The predicted protein product of ORF 3₂₁ presented homology to YjjN (Table 5), an enzyme that converts L-galactonate to D-tagaturonate [128], a molecule present in both the “Ascorbate and Aldarate metabolism” and “Pentose and Glucuronate interconversions” KEGG pathway maps. On this second map, the homolog of ORF 2₂₁, named UxuB, catalyses the reversible interconversion of D-fructuronate and D-mannonate [129]. Through other enzymes encoded elsewhere on the genome (Fig 4, depicted in red: UxaA, UxaB, and UxuA), both reaction products (D-tagaturonate and D-mannonate) may be transformed to 2-dehydro-3-deoxy-D-gluconate (KDG). KDG is the substrate of the enzyme KdgK, to which the sequences encoded by ORFs 2₂₃ and 8₂₃ showed significant structural homology, containing all the specific ATP-binding sites and active sites (S2 Table). This enzyme phosphorylates KDG, producing 2-dehydro-3-deoxy-D-gluconate-6P (KDPG), which is degraded to pyruvate and G3P in the final step of the Entner-Doudoroff pathway of glucose oxidation (Fig 4).
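
The chain of reactions described above can be written as a small directed graph and searched for a route from a substrate to the final products. This is only a sketch of the predicted pathway, not a metabolic model: the `REACTIONS` list and `find_route` helper are hypothetical names, the D-altronate intermediate and the KDPG aldolase step are filled in from the canonical E. coli hexuronate and Entner-Doudoroff pathways rather than from the text, and reversible reactions are listed in one direction only.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# (substrate, product, enzyme); GR labels follow the predictions discussed above.
REACTIONS: List[Tuple[str, str, str]] = [
    ("L-galactonate", "D-tagaturonate", "YjjN homolog (GR 21)"),
    ("D-fructuronate", "D-mannonate", "UxuB homolog (GR 21)"),
    ("D-tagaturonate", "D-altronate", "UxaB"),   # assumption: canonical step
    ("D-altronate", "KDG", "UxaA"),              # assumption: canonical step
    ("D-mannonate", "KDG", "UxuA"),
    ("KDG", "KDPG", "KdgK homologs (GR 23)"),
    ("KDPG", "pyruvate + G3P", "KDPG aldolase"),  # enzyme not named in the text
]


def find_route(start: str, goal: str) -> Optional[List[Tuple[str, str, str]]]:
    """Breadth-first search for the shortest reaction route from start to goal."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for substrate, product, enzyme in REACTIONS:
        graph.setdefault(substrate, []).append((product, enzyme))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, steps = queue.popleft()
        if compound == goal:
            return steps
        for product, enzyme in graph.get(compound, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, steps + [(compound, enzyme, product)]))
    return None  # goal not reachable from start


for source in ("L-galactonate", "D-fructuronate"):
    print(source)
    for substrate, enzyme, product in find_route(source, "pyruvate + G3P"):
        print(f"  {substrate} -> {product}  [{enzyme}]")
```

Both entry points converge on KDG, which is why the two KdgK-like ORFs of GR 23 sit at the junction of the predicted routes.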

Metabolism of Iron and other metals (Met).

Reported. To obtain iron within host extraintestinal tissues and fluids, a bacterium requires siderophores or other high-affinity systems that take up iron-containing molecules such as haem or transferrin. Although enterobactin is the most widespread siderophore in E. coli, it is effective only in the intestinal lumen, as it is neutralized in extraintestinal fluids by albumin and siderocalins [130]. Consequently, ExPEC lineages rely on a variety of patho-specific siderophores that evade host innate defense proteins such as siderocalin.

Accordingly, different kinds of metal uptake systems contribute to the success of APEC strains in different types of infection, depending on the specific microenvironment inhabited by the pathogen [131]. All five regions in this category (GRs 8, 9, 12, 22, and 26) contained previously reported ABC transporters and associated metabolic enzymes: the sit operon (GR 8; related to Fe²⁺, but mainly Mn²⁺, uptake [98]), the prrA-modD-yc73-fepC cluster (GR 9; possibly mediating iron uptake [111]), the yersiniabactin cluster (GR 12; related to Fe³⁺ and Zn²⁺ uptake by the yersiniabactin siderophore system [132]), the fit operon (GR 22; related to Fe²⁺, Co²⁺, and Cd²⁺ uptake [112]) and the chu operon (GR 26; related to the capture of haem through haemophores [113]). The roles of these systems in APEC infection were previously described in a number of reports [101,133,134]. For instance, the most virulent APEC strains were able to grow in the presence of transferrin, in contrast to non-lethal strains, probably because these virulent strains contained additional iron uptake systems [135]. Also, APEC O1 upregulated 13 of 20 genes related to metal uptake when in contact with chicken serum, among them genes of the chu, ybt (yersiniabactin), sit and fep operons [136].

Adhesion and Invasion (A/I).

Reported. Four GRs previously reported in E. coli were linked to bacterial adhesion and invasion (GRs 3, 4, 17, and 24; Table 4). The capacity to adhere and invade is perhaps one of the main characteristics of strain BEN2908 [14,20–22]. An extensively studied factor involved in this capacity is the type 1 fimbria (T1F) encoded by the fim operon, present in most E. coli strains. This fimbria attaches to mannosylated host cell receptors, contributing directly to adhesion and invasion by various E. coli pathotypes [94]; T1F is considered a major virulence factor of ExPEC. Although the operon is conserved in most strains, some strains are unable to express T1F. T1F is regulated by a phase variation mechanism mediated by an invertible promoter switch, FimS, whose orientation can be flipped by a pair of recombinases called FimB and FimE [137]. While FimB inverts the fim switch in the on-to-off and off-to-on directions with similar efficiencies, FimE inverts it rapidly in the on-to-off direction [138]. Aside from these two recombinases, the gene fimX, present in GR 4, encodes an additional recombinase found in some strains. In the absence of fimE and fimB, fimX also plays a role in phase variation, rapidly turning the E. coli fimbriae to an “ON” state in vivo [139], which further increases adhesion and invasion properties.

In addition to T1F, the fimbrial system encoded by the auf operon, which has been linked to UPEC pathogenesis, is present in GR 24 [140]. The auf operon is structurally similar to the fim operon, also encoding an adhesin (AufG), major (AufA) and minor (AufE and AufD) subunits, two chaperones (AufB and AufF), and an usher protein (AufC). Expression of this operon, evaluated by RT-PCR, was observed in strain CFT073 at at least three time points during infection of mice (4 h, 24 h, and 48 h post-infection) [117]. Aside from fimbrial systems, other genomic features are also linked to bacterial adherence [14,16,18,141]. Two other previously reported adhesin-related loci were detected: the gene encoding the E. coli adherence factor, FdeC (GR 3; [114]), and the Intimin-like Adhesin (Ila) encoding gene cluster (GR 17; [116]). Gene fdeC codes for a single large 1,416-residue protein with 95% identity to the surface protein EaeH, which was shown to promote adhesion [142]. Confocal micrographs showed that a strain lacking fdeC was incapable of adhering to bladder cells from the UM-UC-3 line, suggesting that FdeC is involved in urinary tract infection (UTI) [114]. Moreover, the fdeC gene belongs to a locus containing eight other CDSs, five structurally related to reductases and three putative regulatory genes. The ila cluster comprises three genes (sinH, sinI, ratA) and is probably derived from the genetic island CS54 identified in Salmonella enterica, which contains five genes and is also linked to adherence and cell invasion. Many UPEC strains harbour the ila gene cluster, and deletion of these genes attenuated strains in a murine UTI model, owing to reduced bladder cell invasion and decreased capacity to ascend the urinary tract, depending on the gene deleted [116].
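
The directional bias of the two recombinases can be illustrated with a two-state (ON/OFF) switch model. The sketch below is a deliberately simplified assumption, not a model taken from the cited studies: FimB is given the same rate in both directions, FimE acts only in the on-to-off direction, FimX and all other regulatory inputs are ignored, and the rate values are arbitrary numbers chosen for illustration.

```python
def steady_state_on_fraction(k_fimb: float, k_fime: float) -> float:
    """Fraction of cells with the switch ON at steady state.

    Two-state model: OFF -> ON at rate k_fimb, ON -> OFF at rate k_fimb + k_fime.
    At steady state, ON / (ON + OFF) = k_on / (k_on + k_off).
    """
    if k_fimb <= 0 or k_fime < 0:
        raise ValueError("k_fimb must be positive and k_fime non-negative")
    k_on = k_fimb
    k_off = k_fimb + k_fime
    return k_on / (k_on + k_off)


# Arbitrary illustrative rates (per cell per generation), not measured values.
print(steady_state_on_fraction(k_fimb=1.0, k_fime=0.0))  # FimB alone
print(steady_state_on_fraction(k_fimb=1.0, k_fime=9.0))  # FimB plus a faster FimE
```

In this simplified picture, FimB alone leaves the population evenly split between the two states, whereas adding a fast on-to-off activity shifts most cells to the OFF state.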

Defense mechanisms.

Reported. Modification of the bacterial cell surface is also a common strategy against host defenses and antimicrobial peptides. GR 20 harbours a genetic cluster known as the kps cluster, responsible for the synthesis and transport of components of the bacterial capsule [118]. The bacterial capsule is a polysaccharide-rich layer that surrounds the cell, playing a key role in evading host immune responses, mediating surface adhesion, and resisting desiccation. In E. coli, approximately 80 different capsule serotypes have been identified, highlighting the significance of capsule usage and adaptation [143].

Uncharacterized. Gram-negative bacteria have evolved a second external membrane that selectively allows compounds to enter through protein pores with size-exclusion properties or via diffusion across its hydrophobic lipid bilayer. Structures in the outer leaflet of this bilayer, such as lipopolysaccharides (LPS), play a critical defensive role. LPS is a modified lipid comprising three parts: Lipid A, the core oligosaccharide, and the highly variable O antigen, all of which can undergo modifications in response to environmental pressures [144]. For instance, Lipid A can be modified by the addition of 4-amino-4-deoxy-L-arabinose (L-Ara4N), a process likely mediated by ORF 2₁₉ in GR 19, which shows high sequence homology with the KdsD arabinose isomerase responsible for this modification [145]. Such alterations can enhance bacterial resistance to positively charged particles, including cationic antimicrobial peptides (CAMP), thus contributing significantly to bacterial defense and survival.

Secretion systems.

Reported. The GRs encoding secretion systems all carried components of the type 6 secretion system (T6SS), which is widespread among Proteobacteria. The T6SS is a secretion apparatus, structurally related to a prophage injection apparatus, that bacteria use to transport effector molecules that usually disrupt the cell wall of the targeted cell. In E. coli, the T6SS has been described to be involved in interbacterial competition and in bacteria-host cell interactions, secreting different types of effectors depending on the targeted structure [146]. In addition, many reports have tested the role of the T6SS in different E. coli pathotypes, showing its involvement not only in altering the structural actin filaments of human brain microvascular endothelial cells (HBMEC), but also in promoting antibacterial effects through DNase activity [97].

To investigate the presence of T6SS genes in the strains studied, we used the SecReT6 web resource [147]. Two GRs containing T6SS components were identified: GRs 11 and 19. The latter contained more than fifteen CDSs of this secretion system and was classified as T6SS-2, which is among the most prevalent sets found in APEC [97,148].

In addition to this cluster, SecReT6 identified two additional copies of the tssI (vgrG) gene in GR 11. One complete copy, with 1563 bp, exhibited 99.6% identity to the tssI from cluster 1 of APEC strain DE719 [149], while the smaller copy, with only 960 bp, showed 49.2% identity to the same gene. It is not unusual to find several tssI/vgrG homologs in the same strain, as reported in different gram-negative bacteria such as P. aeruginosa and V. cholerae [120,150]. For example, different VgrG homologs from V. cholerae strain V52 have distinct impacts on virulence, as only one of the three copies (VgrG-1) was able to modify the cell actin structure. Moreover, these homologs interact with each other, forming various multimeric complexes that may affect the targeted cell differently. The multiple copies of the tssI/vgrG gene in E. coli may therefore have the potential for diverse functions, as demonstrated in other gram-negative bacteria.

General Metabolism (GM).

The BRIG analysis identified four GRs associated with general or accessory metabolic genes present in the strains phylogenetically close to BEN2908 and AIEC LF82/NRG857c. Of these, two (GRs 29 and 32) contained both reported and uncharacterized features, and two (GRs 21 and 31) consisted exclusively of uncharacterized features.

Reported. GR 29 is a 16 kb genomic island strongly associated with ExPEC from phylogroup B2 and first identified in the APEC O1 genome [44]. In that island, a putative transketolase named Tkt1 (ORF 3₂₉) showed 68% amino acid identity to TktA from Vibrio cholerae [121]. Notably, the Tkt1 protein could not complement the function of TktA involved in L-arabinose usage as a carbon source. Instead, it showed activity as an enzyme involved in peptide nitrogen extraction, since a mutant of this gene showed defects in the use of Pro-Ala or Phe-Ala as a nitrogen source. Interestingly, in GR 32, the genes alr and tyrB are both involved in the usage of some of the same residues tested by Li et al., since alr encodes an alanine racemase that interconverts the two stereoisomers of alanine, while tyrB encodes an aminotransferase that transfers the amino group of aromatic residues to 2-oxoglutarate and vice-versa [121]. In addition to these genes, GR 32 also harbours dnaB, which encodes the extensively studied main replicative DNA helicase, participating in initiation and elongation during chromosome replication.

Uncharacterized. Other hypothetical enzymes encoded by genes in GR 29 (Table 5) could also be related to nitrogen acquisition via carbamate hydrolysis. In the Rut pathway, nitrogen is obtained by cleaving the pyrimidine ring of a nitrogenous base, initially forming ureidoacrylate (product of RutA), followed by aminoacrylate and carbamate (products of RutB). Carbamate, in turn, is hydrolysed, yielding nitrogen (as ammonia) and carbon (as CO₂) [151]. Notably, while ORF 12₂₉ presented sequence homology to RutB, ORF 6₂₉ showed similarity to a cytosine permease named CodB [152], indicating the potential presence of a pyrimidine transporter in this region. ORFs 8₂₉ and 10₂₉, in turn, presented homology to a carbamate kinase (YqeA) and to an oxamate carbamoyltransferase (FdrA), both capable of performing their reactions bidirectionally [153]. While FdrA transforms oxalurate to oxamate and carbamoyl phosphate, YqeA catalyses the transfer of phosphate from carbamoyl phosphate to ADP, forming ATP and leaving carbamate as a by-product, which, as mentioned earlier, is hydrolysed.

Further expanding the network of nitrogen-related processes, the hypothetical enzymes encoded in GR 31 could be related to nucleotide processing or degradation. Despite displaying a modest (around 27%) amino acid identity to the B. subtilis 168 YfkN nucleotidase and no identity to the E. coli K-12 UshA nucleotidase, ORFs 2₃₁, 3₃₁, and 4₃₁ each exhibited highly significant CD-Search matches to the conserved domains of UshA, preserving all metal-binding and active sites (Table 5; S2 Table). UshA is a 5′-nucleotidase that cleaves phosphate groups from nucleotides (with secondary NAD-pyrophosphatase activity). Further, ORF 1₃₁ exhibited high sequence identity to the nucleoside-specific channel Tsx, known to form pores for nucleoside uptake under substrate-limiting conditions [154,155]. Together, these findings could suggest the presence of a coordinated system for nucleoside degradation and transport, probably facilitating the processing of nucleosides.

Furthermore, other ORFs identified in GR 32 may have a role in the synthesis of tricarboxylic acid (TCA) cycle intermediates, such as succinate. Plotting the KO numbers of the homologs of ORFs 11₃₂, 12₃₂, 13₃₂, 14₃₂ and 15₃₂ on the KEGG “Citrate Cycle (TCA Cycle)” map showed that they encode enzymes of a chain of reactions that leads to succinate production from 2-oxoglutarate (Fig 5). These ORFs exhibited high protein identity to SucA, SucB, LpdA, SucC, and SucD, respectively, and retained all the conserved functional sites (Table 5, S2 Table). Moreover, ORF 16₃₂ showed significant similarity to the Orf3 permease from Cupriavidus necator H16, a dicarboxylate transporter responsible for the translocation of TCA cycle intermediates, including citrate, alpha-ketoglutarate, and succinate. This ORF also displayed the 206 conserved residues of the transmembrane helices of the Solute Carrier 13 permease domain, characteristic of these transporters [156]. Other ORFs also presented homology to dicarboxylate transporters, such as ORF 1₆, which showed some structural similarity to CitT, a citrate transporter [157], and ORFs 5₂₁, 6₂₁, and 7₂₁, mentioned above in the Sugar Metabolism subsection, which encode a hypothetical TRAP transporter; the most studied TRAP transporter, dctPQM from R. capsulatus, is involved in C4-dicarboxylate uptake [158]. Notably, the last two ORFs from GR 32 presented similarity to the two-component regulatory system responsible for the expression of some dicarboxylate transporters: ORFs 18₃₂ and 19₃₂ presented significant homology to DctD and DctB from P. aeruginosa PAO1, respectively. When dicarboxylates are present, DctB autophosphorylates and transfers its phosphate group to DctD. DctD, in turn, binds to the promoter regions of dctA and dctPQM and interacts with the RpoN sigma factor, enabling their transcription and the subsequent uptake of TCA cycle intermediates [159].

Fig 5. Predicted dicarboxylate utilization pathway based on KEGG maps and KEGG Orthology.

ORFs highlighted in purple represent putative novel coding sequences. Some ORFs from GR 32 appear to be involved in the conversion of 2-oxoglutarate to succinate, forming a second gene module analogous to the canonical sucABCD from E. coli K-12 MG1655.

Genomic rearrangements as influential forces for pathogenic E. coli adaptation

Some studies have shown that horizontal gene transfer (HGT) and gene duplication are important differentiation events in E. coli [160,161]. These events not only directly augment the arsenal of virulence genes, but also may give rise to new copies of existing genes that can evolve to acquire novel functions (neofunctionalization), complementary functions (subfunctionalization) or equal functions but with a different stimulatory network or a different expressing dosage [162].

Many GRs (6, 19, 21, 23, 29, and 32) contained ORFs that showed significant structural and sequence homology to known transporters (Table 5, S2 Table). Transporters are one of the most common categories seen in HGT and duplication events, because they are encoded by fewer genes than other systems, and possess fewer pleiotropic constraints, favouring their posterior fixation on the genome [163].

In our report, many of the GRs identified had gene modules that probably originated from HGT events and subsequent gene rearrangements. In Fig 6A, for instance, the pdxA-like cluster from the pathogenic strains in this study is compared to one of the three pdxA-like clusters from S. meliloti 1021, described in [90] (Fig 2). Although the modular structure differs from that of all three S. meliloti 1021 clusters, the sequence identity between five of the genes supports the likelihood of a past transfer event. Curiously, S. enterica LT2 has a similar but smaller cluster that was capable of complementing the function of the canonical pdxA gene in vitro (4PHT dehydrogenase) [90]. Notably, three of the four protein sequences from this cluster showed an even higher identity than those from S. meliloti 1021 (40.38% to ORF 6₆, 75.38% to ORF 7₆, and 71.17% to ORF 8₆) (Table 5, S2 Table). This suggests that Salmonella enterica LT2 experienced sufficient selection pressure to discard some of the acquired genes, although more studies would be required to establish the timeline of events related to this cluster.
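For readers unfamiliar with how identity percentages of this kind are derived, the following is a minimal sketch of a BLAST-style calculation: identical positions divided by the full alignment length, gaps included. It assumes the reported values follow this convention; the function name `percent_identity` and the toy sequences are hypothetical and are not taken from the study's data.

```python
def percent_identity(aligned_query, aligned_subject, gap="-"):
    """Percent identity over the full alignment length (gap columns count
    towards the length but never as matches)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != gap
    )
    return 100.0 * matches / len(aligned_query)


# Toy pairwise alignment (illustrative residues only): 5 matches in 7 columns.
print(f"{percent_identity('MKT-LLA', 'MKTALVA'):.2f}")  # 71.43
```

Because the denominator is the alignment length, two hits with the same number of identical residues can report different percentages if one alignment contains more gaps.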

Fig 6. Comparison of operons with gene organization similar to that observed in GRs 6, 29 and 32.

Fig 6B shows the high resemblance of ORFs from GR 29 to genes from Photobacterium profundum 3TCK, a resemblance significantly higher than that to the homologous genes present in the E. coli K-12 genome (S2 Table). Photobacterium is a cosmopolitan genus of gram-negative marine bacteria that belongs to the family Vibrionaceae and is found in the oceans at different depths; its members are considered piezophilic (optimal growth at high hydrostatic pressure) or non-piezophilic, depending on the species or even the strain. To live at different depths, each strain requires a series of adaptations to different physicochemical parameters, such as light, hydrostatic pressure, and organic carbon or nitrogen availability [164]. P. profundum 3TCK is a non-piezophilic strain that was isolated from shallow waters in San Diego Bay; it grows optimally at atmospheric pressure and over a broad range of temperatures (0 to above 20 °C) [165]. The presence of this organism in the shallow waters of a populous bay may have facilitated a transfer event to Escherichia at some point in the past [166]. Hypothetically, because this organism comes from a marine environment, the gene module shown in Fig 6B may be expressed to facilitate nitrogen acquisition through pyrimidine degradation (as mentioned in the GM subsection) when the E. coli strains are exposed to sea-like conditions (high salinity, colder temperature, etc.). Curiously, a BIOLOG assay showed that another shallow-water Photobacterium, P. marinum J15, is capable of obtaining nitrogen from 41 of 95 different substrates, including the pyrimidines uridine and cytidine. Moreover, genomic analysis revealed that this strain also carries a carbamate kinase (similar to ORF 8₂₉), which is likely involved in the nitrogen processing pathway [167].

Unlike the cases depicted in Fig 6A (GR 6) and Fig 6B (GR 29), in GR 32 (Fig 6C) only 12 of the 19 ORFs identified in ExPEC were also present in AIEC strains. Among these, three are well-known genes: the helicase dnaB (ORF 6₃₂), the alanine racemase alr (ORF 7₃₂), and the aromatic aminotransferase tyrB (ORF 10₃₂). Five of the missing ORFs occur between the quinone reductase gene qorA (which is the upper limit of the GR) and dnaB, and the other two between alr and tyrB (for more information, see S2 Table). All the remaining ORFs (with the exception of ORFs 17₃₂ and 19₃₂) displayed more than 50% amino acid identity to homologous genes from different bacteria, suggesting that more than one genomic rearrangement has occurred over time. The first portion (Fig 6C, triangle) showed significant identity to the canonical sucABCD operon from E. coli K-12, although it had even greater similarity in sequence and operon arrangement to the sucAB-lpdA2-sucCD cluster from Chromobacterium violaceum ATCC 12472. C. violaceum is a gram-negative saprophyte found in the water and soil of tropical regions. Nonetheless, it can also act as an opportunistic pathogen, with infections typically arising from contact with contaminated water or exposure of skin lesions to infected soil or water. This bacterium has been isolated from cases of bacteraemia, septicaemia, and UTI, with some infections leading to death [168]. Curiously, the second (Fig 6C, circle) and third (Fig 6C, square) portions of the GR 32 gene module are absent in C. violaceum. Instead, they show significant homology to a probable dicarboxylate permease and a lactate dehydrogenase from the gram-negative soil bacterium Cupriavidus necator H16, as well as to the two-component regulatory system dctBD of the TRAP transporter dctPQM from the gram-negative opportunistic pathogen Pseudomonas aeruginosa PAO1. This suggests that the genomic rearrangements within GR 32 have formed a “mosaic” gene module [155]. Although some genes are absent in AIEC, the ORFs depicted in Fig 6C are conserved, preserving key modules for the catabolism of dicarboxylates: genes encoding proteins for transport (ORF 16₃₂), processing (ORFs 11₃₂, 12₃₂, 13₃₂, 14₃₂, and 15₃₂), and regulation (ORFs 18₃₂ and 19₃₂).
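The “mosaic” interpretation amounts to observing that consecutive ORFs in GR 32 have their closest described homologs in different source organisms. A minimal sketch of that grouping is given below, assuming the source assignments named in the text. The ORF encoding the lactate dehydrogenase homolog is omitted because the text does not give its number, and the names `CLOSEST_SOURCE`, `source_blocks`, and `is_mosaic` are hypothetical.

```python
from itertools import groupby

# (ORF position within GR 32, organism of the closest homolog named in the text)
CLOSEST_SOURCE = [
    (11, "Chromobacterium violaceum ATCC 12472"),
    (12, "Chromobacterium violaceum ATCC 12472"),
    (13, "Chromobacterium violaceum ATCC 12472"),
    (14, "Chromobacterium violaceum ATCC 12472"),
    (15, "Chromobacterium violaceum ATCC 12472"),
    (16, "Cupriavidus necator H16"),
    (18, "Pseudomonas aeruginosa PAO1"),
    (19, "Pseudomonas aeruginosa PAO1"),
]


def source_blocks(hits):
    """Collapse consecutive ORFs that share a source organism into
    (source, first_orf, last_orf) blocks, in genomic order."""
    blocks = []
    for source, run in groupby(sorted(hits), key=lambda item: item[1]):
        orfs = [orf for orf, _ in run]
        blocks.append((source, orfs[0], orfs[-1]))
    return blocks


def is_mosaic(hits):
    """A module is called mosaic here if its ORFs map to more than one source."""
    return len({source for _, source in hits}) > 1


for source, first, last in source_blocks(CLOSEST_SOURCE):
    print(f"ORFs {first}-{last}: {source}")
print(is_mosaic(CLOSEST_SOURCE))  # True
```

With these assignments the module resolves into three blocks, matching the triangle, circle, and square portions of Fig 6C.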

Conclusion

Despite some differences, such as the variation in phage-related regions (S6 Table), STc135 AIEC strains show strong genomic similarity to ExPEC strains, greater than to the InPEC strains included in the current comparative genomic analyses. The close phylogenetic relatedness (Fig 1), shared virulence gene profile (S5 Table), overlapping CRISPR spacers (S3 and S4 Tables) and greater core genome similarity support a recent common ancestry.

The ring comparison (Fig 3) revealed that, of the 36 genomic regions of BEN2908 absent in E. coli K-12, 20 were also present in the AIEC and ExPEC strains studied. Further investigation of these 20 regions reinforced known aspects of ExPEC pathogenesis, such as the importance of iron and sugar uptake and metabolism (as supported by the number of GRs containing features related to those aspects, especially transporters) and the importance of the T6SS (as the effectors and cluster identified are commonly studied in APEC pathogenesis). In addition, the analyses of some of the GR features performed with BLASTp, CD-Search, and the KEGG databases indicated that ORFs related to nitrogen processing (identified in GRs 29, 31, and 32) and to dicarboxylate uptake and metabolism (identified in GRs 6, 21, and 32) also appear to be relevant traits in these E. coli lineages. Although deletion of some of the ORF homologs from those GRs resulted in decreased virulence in other bacteria, such as sucA/ORF 11₃₂ in CFT073 [169], sucABCD/ORFs 11₃₂, 12₃₂, 14₃₂, 15₃₂ in Salmonella Typhimurium [170], yqeA/ORF 8₂₉ in E. coli O102-ST405 [171], and dctP/ORF 5₂₁ in Vibrio alginolyticus [172], none of these bacteria carried a second copy of the affected genes, as is the case for the strains studied here. Moreover, the architecture of some of the genomic modules explored in this paper clearly resembles the architecture described for bacteria from other genera, with several genes, particularly those from GRs 29 and 32 (Fig 6), presenting high amino acid identity to those in other bacterial species. Nevertheless, this is a predictive study, and experimental validation of these in silico analyses is necessary to define the true functionality of the novel ORFs identified.

Finally, it is interesting to consider that the years of isolation of BEN2908 (1977) and of the phylogenetically close strains studied span decades. The uncharacterized ORFs identified in this work (Table 5) have therefore remained conserved in those E. coli lineages over that period, possibly indicating relevant roles in fitness, host adaptation, and virulence.

Supporting information

S1 Table. Table with GC content, coverage and identity.

1 GC content calculated with the https://jamiemcgowan.ie/bioinf/gc_content.html web tool. 2 Bolded values have more than 70% coverage. Underscored values have between 50 and 70% coverage.

(XLSX)

pone.0342894.s001.xlsx (22.8KB, xlsx)
S2 Table. Table containing all GRs with reported and uncharacterized features.

1 Specific hit or non-specific hit with the expect value closest to 0. The conserved residues identified by CD-Blast are given as comments in the cells. 2 Sequencing performed by Schouler and Trotereau (2016), available at: https://www.ncbi.nlm.nih.gov/nuccore/AY395687.1. 3 Unreviewed entries selected because of their mention in the following papers: ORFs 3₆-5₆ (Thiaville et al., 2016); ORF 2₃₄ (Majumdar et al. 2004); ORF 5₃₄ (Gárcia-Sanchez et al., 2021); and ORFs 9₃₄-13₃₄ (Lim et al., 2015). 4 Unreviewed entry. No reviewed entries found. 5 PHASTEST attL (2101049.2101064) and attR (2167393.2167408) lie too far upstream and downstream, respectively, encompassing genes present in all strains, even those lacking phage content. Therefore, the region boundaries were defined using the same criteria applied to the other GRs: end of the first gene common to all strains, beginning of the last gene common to all strains.

(XLSX)

pone.0342894.s002.xlsx (55KB, xlsx)
S3 Table. Table with information regarding spacers and type of CRISPR/Cas system using MinCED and CRISPRCastyper.

1 CFT073, 78-Pyelo and E2348/69 do not possess a CRISPR/Cas system.

(XLSX)

pone.0342894.s003.xlsx (26.8KB, xlsx)
S4 Table. Table containing the number of spacers shared in each strain.

1 CFT073, 78-Pyelo and E2348/69 do not possess a CRISPR/Cas system. 2 Spacer sequence is present, but displays one to four nucleotide insertions located at its 5′ or 3′ end.

(XLSX)

pone.0342894.s004.xlsx (15.5KB, xlsx)
S5 Table. Table generated using VFAnalyzer (VFDB) with the 23 strains.

1 The 85 genes identified in AIEC strains by VFanalyzer are highlighted in colour, of which: Yellow represents the 55 genes present among at least one InPEC, one ExPEC, one Commensal, and LF82. Green represents the 18 genes present among at least one InPEC, one ExPEC, and LF82. Orange represents the 6 genes present between at least one ExPEC and LF82. Blue represents the 3 genes present between at least one ExPEC, one Commensal, and LF82. Red represents the 3 genes present between at least one InPEC and LF82.

(XLSX)

pone.0342894.s005.xlsx (73.7KB, xlsx)
S6 Table. PHASTEST summary statistics of the 23 strains from this study.

(XLSX)

pone.0342894.s006.xlsx (34.3KB, xlsx)
S7 Table. Plasmids statistics.

1 The strains IHE3034, CFT073, 55989, SCU-397, and K-12 do not harbour any plasmids. 2 Three plasmids from the strain χ7122 and the plasmid from LF82 were not available for download. 3 pLF82 information was extracted from Miquel et al. (2010) Table 2 and Supplemental S1 Table. 4 Small plasmid size, RAST annotation failed. Information obtained from Genbank and GC content calculator: https://jamiemcgowan.ie/bioinf/gc_content.html.

(XLSX)

pone.0342894.s007.xlsx (20.6KB, xlsx)

Acknowledgments

We thank Raul Simon Batista for assisting in editing the preliminary results.

Data Availability

The sequences of the BEN2908 chromosome and plasmid are available in GenBank under the following accession numbers: LR740776.1; LR740777.2. Raw reads were submitted to the Sequence Read Archive (SRA) under the following BioProject: PRJNA1359407. The Python scripts used, BRIG alignment files, FASTA sequences of each GR, their annotations generated by RAST, as well as EzAAI, RAxML, Orthofinder, Roary, and PHASTEST outputs and the sequencing files mentioned in the methods section, are available on GitHub at the following repository: https://github.com/Martins-TW/BEN2908_Genome_Analysis.git. The BEN2908 strain has been deposited at the International Center for Microbial Resources—Bacterial Pathogens (CIRM-BP) under the name CIRMBP-1386.

Funding Statement

This study has received funding from DGAL within the EcoAntibio2 call, COLIPHAVI project, France (C.S.), and from CNPq (Projeto Universal 423.902/2016‐4) and FAPERGS (PPSUS 21/2551-0000079-1), Brazil (F.H.). T.W.M. was the recipient of a CAPES Master studentship (DS 88887921543/2023-00). A.T. was supported by a training grant from the Fédération de recherche en infectiologie (FéRI). C.M.D. received funding from NSERC Discovery Grant 2019-06642. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Guabiraba R, Schouler C. Avian colibacillosis: still many black holes. FEMS Microbiol Lett. 2015;362(15):fnv118. doi: 10.1093/femsle/fnv118
  • 2.Landman WJM, van Eck JHH. The incidence and economic impact of the Escherichia coli peritonitis syndrome in Dutch poultry farming. Avian Pathol. 2015;44(5):370–8. doi: 10.1080/03079457.2015.1060584
  • 3.Zhang Y, Wang Y, Zhu H, Yi Z, Afayibo DJA, Tao C, et al. DctR contributes to the virulence of avian pathogenic Escherichia coli through regulation of type III secretion system 2 expression. Vet Res. 2021;52(1):101. doi: 10.1186/s13567-021-00970-6
  • 4.Jorgensen SL, Stegger M, Kudirkiene E, Lilje B, Poulsen LL, Ronco T. Diversity and population overlap between avian and human Escherichia coli belonging to Sequence Type 95. mSphere. 2019;4(1).
  • 5.Wirth T, Falush D, Lan R, Colles F, Mensa P, Wieler LH, et al. Sex and virulence in Escherichia coli: an evolutionary perspective. Mol Microbiol. 2006;60(5):1136–51. doi: 10.1111/j.1365-2958.2006.05172.x
  • 6.Nandanwar N, Janssen T, Kühl M, Ahmed N, Ewers C, Wieler LH. Extraintestinal pathogenic Escherichia coli (ExPEC) of human and avian origin belonging to sequence type complex 95 (STC95) portray indistinguishable virulence features. Int J Med Microbiol. 2014;304(7):835–42. doi: 10.1016/j.ijmm.2014.06.009
  • 7.Gordon DM, Geyik S, Clermont O, O’Brien CL, Huang S, Abayasekara C, et al. Fine-scale structure analysis shows epidemic patterns of Clonal Complex 95, a cosmopolitan Escherichia coli lineage responsible for extraintestinal infection. mSphere. 2017;2(3):e00168-17. doi: 10.1128/mSphere.00168-17
  • 8.Royer G, Darty MM, Clermont O, Condamine B, Laouenan C, Decousser J-W, et al. Phylogroup stability contrasts with high within sequence type complex dynamics of Escherichia coli bloodstream infection isolates over a 12-year period. Genome Med. 2021;13(1):77. doi: 10.1186/s13073-021-00892-0
  • 9.Dho M, Lafont JP. Escherichia coli colonization of the trachea in poultry: comparison of virulent and avirulent strains in gnotoxenic chickens. Avian Dis. 1982;26(4):787–97. doi: 10.2307/1589865
  • 10.Porcheron G, Chanteloup NK, Trotereau A, Brée A, Schouler C. Effect of fructooligosaccharide metabolism on chicken colonization by an extra-intestinal pathogenic Escherichia coli strain. PLoS One. 2012;7(4):e35475. doi: 10.1371/journal.pone.0035475
  • 11.Schouler C, Taki A, Chouikha I, Moulin-Schouleur M, Gilot P. A genomic island of an extraintestinal pathogenic Escherichia coli Strain enables the metabolism of fructooligosaccharides, which improves intestinal colonization. J Bacteriol. 2009;191(1):388–93. doi: 10.1128/JB.01052-08
  • 12.Horn F, Corrêa AMR, Barbieri NL, Glodde S, Weyrauch KD, Kaspers B, et al. Infections with avian pathogenic and fecal Escherichia coli strains display similar lung histopathology and macrophage apoptosis. PLoS One. 2012;7(7):e41031. doi: 10.1371/journal.pone.0041031
  • 13.Pourbakhsh SA, Boulianne M, Martineau-Doizé B, Dozois CM, Desautels C, Fairbrother JM. Dynamics of Escherichia coli infection in experimentally inoculated chickens. Avian Dis. 1997;41(1):221–33. doi: 10.2307/1592463
  • 14.Pavanelo DB, Houle S, Matter LB, Dozois CM, Horn F. The periplasmic trehalase Affects Type 1 fimbria production and virulence of Extraintestinal Pathogenic Escherichia coli strain MT78. Infect Immun. 2018;86(8):e00241–18. doi: 10.1128/IAI.00241-18
  • 15.Porcheron G, Kut E, Canepa S, Maurel M-C, Schouler C. Regulation of fructooligosaccharide metabolism in an extra-intestinal pathogenic Escherichia coli strain. Mol Microbiol. 2011;81(3):717–33. doi: 10.1111/j.1365-2958.2011.07725.x
  • 16.Chouikha I, Germon P, Brée A, Gilot P, Moulin-Schouleur M, Schouler C. A selC-associated genomic island of the extraintestinal avian pathogenic Escherichia coli strain BEN2908 is involved in carbohydrate uptake and virulence. J Bacteriol. 2006;188(3):977–87. doi: 10.1128/JB.188.3.977-987.2006
  • 17.Répérant M, Porcheron G, Rouquet G, Gilot P. The yicJI metabolic operon of Escherichia coli is involved in bacterial fitness. FEMS Microbiol Lett. 2011;319(2):180–6. doi: 10.1111/j.1574-6968.2011.02281.x
  • 18.Rouquet G, Porcheron G, Barra C, Répérant M, Chanteloup NK, Schouler C, et al. A metabolic operon in extraintestinal pathogenic Escherichia coli promotes fitness under stressful conditions and invasion of eukaryotic cells. J Bacteriol. 2009;191(13):4427–40. doi: 10.1128/JB.00103-09
  • 19.Klemberg VS, Pavanelo DB, Houle S, Dhakal S, Pokharel P, Iahnig-Jacques S, et al. The osmoregulated metabolism of trehalose contributes to production of type 1 fimbriae and bladder colonization by extraintestinal Escherichia coli strain BEN2908. Front Cell Infect Microbiol. 2024;14:1414188. doi: 10.3389/fcimb.2024.1414188
  • 20.Matter LB, Barbieri NL, Nordhoff M, Ewers C, Horn F. Avian pathogenic Escherichia coli MT78 invades chicken fibroblasts. Vet Microbiol. 2011;148(1):51–9. doi: 10.1016/j.vetmic.2010.08.006
  • 21.Chanteloup NK, Porcheron G, Delaleu B, Germon P, Schouler C, Moulin-Schouleur M, et al. The extra-intestinal avian pathogenic Escherichia coli strain BEN2908 invades avian and human epithelial cells and survives intracellularly. Vet Microbiol. 2011;147(3–4):435–9. doi: 10.1016/j.vetmic.2010.07.013
  • 22.Germon P, Chen Y-H, He L, Blanco JE, Brée A, Schouler C, et al. ibeA, a virulence factor of avian pathogenic Escherichia coli. Microbiology (Reading). 2005;151(Pt 4):1179–86. doi: 10.1099/mic.0.27809-0
  • 23.Martinez-Medina M, Garcia-Gil J, Barnich N, Wieler LH, Ewers C. Adherent-invasive Escherichia coli phenotype displayed by intestinal pathogenic E. coli strains from cats, dogs, and swine. Appl Environ Microbiol. 2011;77(16):5813–7. doi: 10.1128/AEM.02614-10
  • 24.Cortes MAM, Gibon J, Chanteloup NK, Moulin-Schouleur M, Gilot P, Germon P. Inactivation of ibeA and ibeT results in decreased expression of type 1 fimbriae in extraintestinal pathogenic Escherichia coli strain BEN2908. Infect Immun. 2008;76(9):4129–36. doi: 10.1128/IAI.00334-08
  • 25.Fléchard M, Cortes MAM, Répérant M, Germon P. New role for the ibeA gene in H2O2 stress resistance of Escherichia coli. J Bacteriol. 2012;194(17):4550–60. doi: 10.1128/JB.00089-12
  • 26.Darfeuille-Michaud A, Neut C, Barnich N, Lederman E, Di Martino P, Desreumaux P, et al. Presence of adherent Escherichia coli strains in ileal mucosa of patients with Crohn’s disease. Gastroenterology. 1998;115(6):1405–13. doi: 10.1016/s0016-5085(98)70019-8
  • 27.Nash JH, Villegas A, Kropinski AM, Aguilar-Valenzuela R, Konczy P, Mascarenhas M, et al. Genome sequence of adherent-invasive Escherichia coli and comparative genomic analysis with other E. coli pathotypes. BMC Genomics. 2010;11:667. doi: 10.1186/1471-2164-11-667
  • 28.Miquel S, Peyretaillade E, Claret L, de Vallée A, Dossat C, Vacherie B, et al. Complete genome sequence of Crohn’s disease-associated adherent-invasive E. coli strain LF82. PLoS One. 2010;5(9):e12714. doi: 10.1371/journal.pone.0012714
  • 29.Gibold L, Garenaux E, Dalmasso G, Gallucci C, Cia D, Mottet-Auselo B, et al. The Vat-AIEC protease promotes crossing of the intestinal mucus layer by Crohn’s disease-associated Escherichia coli. Cell Microbiol. 2016;18(5):617–31. doi: 10.1111/cmi.12539
  • 30.Parreira VR, Gyles CL. A novel pathogenicity island integrated adjacent to the thrW tRNA gene of avian pathogenic Escherichia coli encodes a vacuolating autotransporter toxin. Infect Immun. 2003;71(9):5087–96. doi: 10.1128/IAI.71.9.5087-5096.2003
  • 31.Prudent V, Demarre G, Vazeille E, Wery M, Quenech’Du N, Ravet A, et al. The Crohn’s disease-related bacterial strain LF82 assembles biofilm-like communities to protect itself from phagolysosomal attack. Commun Biol. 2021;4(1):627. doi: 10.1038/s42003-021-02161-7
  • 32.Feldmann F, Sorsa LJ, Hildinger K, Schubert S. The salmochelin siderophore receptor IroN contributes to invasion of urothelial cells by extraintestinal pathogenic Escherichia coli in vitro. Infect Immun. 2007;75(6):3183–7. doi: 10.1128/IAI.00656-06
  • 33.Huang SH, Chen YH, Kong G, Chen SH, Besemer J, Borodovsky M, et al. A novel genetic island of meningitic Escherichia coli K1 containing the ibeA invasion gene (GimA): functional annotation and carbon-source-regulated invasion of human brain microvascular endothelial cells. Funct Integr Genomics. 2001;1(5):312–22. doi: 10.1007/s101420100039
  • 34.Martinez-Jéhanne V, Pichon C, du Merle L, Poupel O, Cayet N, Bouchier C, et al. Role of the vpe carbohydrate permease in Escherichia coli urovirulence and fitness in vivo. Infect Immun. 2012;80(8):2655–66. doi: 10.1128/IAI.00457-12
  • 35.Schubert S, Picard B, Gouriou S, Heesemann J, Denamur E. Yersinia high-pathogenicity island contributes to virulence in Escherichia coli causing extraintestinal infections. Infect Immun. 2002;70(9):5335–7. doi: 10.1128/IAI.70.9.5335-5337.2002
  • 36.Zong B, Zhang Y, Wang X, Liu M, Zhang T, Zhu Y, et al. Characterization of multiple type-VI secretion system (T6SS) VgrG proteins in the pathogenicity and antibacterial activity of porcine extra-intestinal pathogenic Escherichia coli. Virulence. 2019;10(1):118–32. doi: 10.1080/21505594.2019.1573491
  • 37.De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018;34(15):2666–9. doi: 10.1093/bioinformatics/bty149
  • 38.Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20. doi: 10.1093/bioinformatics/btu170
  • 39.Galaxy Community. The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update. Nucleic Acids Res. 2024;52(W1):W83–94. doi: 10.1093/nar/gkae410
  • 40.Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60. doi: 10.1093/bioinformatics/btp324
  • 41.Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36. doi: 10.1101/gr.215087.116
  • 42.Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9(11):e112963. doi: 10.1371/journal.pone.0112963
  • 43.Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 2008;9:75. doi: 10.1186/1471-2164-9-75
  • 44.Johnson TJ, Kariyawasam S, Wannemuehler Y, Mangiamele P, Johnson SJ, Doetkott C, et al. The genome sequence of avian pathogenic Escherichia coli strain O1:K1:H7 shares strong similarities with human extraintestinal pathogenic E. coli genomes. J Bacteriol. 2007;189(8):3228–36. doi: 10.1128/JB.01726-06
  • 45.Zhu Ge X, Jiang J, Pan Z, Hu L, Wang S, Wang H, et al. Comparative genomic analysis shows that avian pathogenic Escherichia coli isolate IMT5155 (O2:K1:H5; ST complex 95, ST140) shares close relationship with ST95 APEC O1:K1 and human ExPEC O18:K1 strains. PLoS One. 2014;9(11):e112048. doi: 10.1371/journal.pone.0112048
  • 46.Dziva F, Hauser H, Connor TR, van Diemen PM, Prescott G, Langridge GC, et al. Sequencing and functional annotation of avian pathogenic Escherichia coli serogroup O78 strains reveal the evolution of E. coli lineages pathogenic for poultry via distinct mechanisms. Infect Immun. 2013;81(3):838–49. doi: 10.1128/IAI.00585-12
  • 47.Wijetunge DSS, Katani R, Kapur V, Kariyawasam S. Complete Genome Sequence of Escherichia coli Strain RS218 (O18:H7:K1), Associated with Neonatal Meningitis. Genome Announc. 2015;3(4):e00804-15. doi: 10.1128/genomeA.00804-15
  • 48.Moriel DG, Bertoldi I, Spagnuolo A, Marchi S, Rosini R, Nesta B, et al. Identification of protective and broadly conserved vaccine antigens from the genome of extraintestinal pathogenic Escherichia coli. Proc Natl Acad Sci U S A. 2010;107(20):9072–7. doi: 10.1073/pnas.0915077107
  • 49.Chen SL, Hung C-S, Xu J, Reigstad CS, Magrini V, Sabo A, et al. Identification of genes subject to positive selection in uropathogenic strains of Escherichia coli: a comparative genomics approach. Proc Natl Acad Sci U S A. 2006;103(15):5977–82. doi: 10.1073/pnas.0600938103
  • 50.Welch RA, Burland V, Plunkett G 3rd, Redford P, Roesch P, Rasko D, et al. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A. 2002;99(26):17020–4. doi: 10.1073/pnas.252529799
  • 51.Biggel M, Xavier BB, Johnson JR, Nielsen KL, Frimodt-Møller N, Matheeussen V, et al. Horizontally acquired papGII-containing pathogenicity islands underlie the emergence of invasive uropathogenic Escherichia coli lineages. Nat Commun. 2020;11(1):5968. doi: 10.1038/s41467-020-19714-9
  • 52.Strachan NJC, Rotariu O, Lopes B, MacRae M, Fairley S, Laing C, et al. Whole Genome Sequencing demonstrates that Geographic Variation of Escherichia coli O157 Genotypes Dominates Host Association. Sci Rep. 2015;5:14145. doi: 10.1038/srep14145
  • 53.Ohnishi M, Terajima J, Kurokawa K, Nakayama K, Murata T, Tamura K, et al. Genomic diversity of enterohemorrhagic Escherichia coli O157 revealed by whole genome PCR scanning. Proc Natl Acad Sci U S A. 2002;99(26):17043–8. doi: 10.1073/pnas.262441699
  • 54.Delannoy S, Mariani-Kurkdjian P, Webb HE, Bonacorsi S, Fach P. The mobilome; a major contributor to Escherichia coli stx2-Positive O26:H11 Strains Intra-Serotype Diversity. Front Microbiol. 2017;8:1625. doi: 10.3389/fmicb.2017.01625
  • 55.Ogura Y, Ooka T, Iguchi A, Toh H, Asadulghani M, Oshima K, et al. Comparative genomics reveal the mechanism of the parallel evolution of O157 and non-O157 enterohemorrhagic Escherichia coli. Proc Natl Acad Sci U S A. 2009;106(42):17939–44. doi: 10.1073/pnas.0903585106
  • 56.Chaudhuri RR, Sebaihia M, Hobman JL, Webber MA, Leyton DL, Goldberg MD, et al. Complete genome sequence and comparative metabolic profiling of the prototypical enteroaggregative Escherichia coli strain 042. PLoS One. 2010;5(1):e8801. doi: 10.1371/journal.pone.0008801
  • 57.Ahmed SA, Awosika J, Baldwin C, Bishop-Lilly KA, Biswas B, Broomall S, et al. Genomic comparison of Escherichia coli O104:H4 isolates from 2009 and 2011 reveals plasmid, and prophage heterogeneity, including shiga toxin encoding phage stx2. PLoS One. 2012;7(11):e48228. doi: 10.1371/journal.pone.0048228
  • 58.Alexander DC, Hao W, Gilmour MW, Zittermann S, Sarabia A, Melano RG, et al. Escherichia coli O104:H4 infections and international travel. Emerg Infect Dis. 2012;18(3):473–6. doi: 10.3201/eid1803.111281
  • 59.Crossman LC, Chaudhuri RR, Beatson SA, Wells TJ, Desvaux M, Cunningham AF, et al. A commensal gone bad: complete genome sequence of the prototypical enterotoxigenic Escherichia coli strain H10407. J Bacteriol. 2010;192(21):5822–31. doi: 10.1128/JB.00710-10
  • 60.Rasko DA, Rosovitz MJ, Myers GSA, Mongodin EF, Fricke WF, Gajer P, et al. The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J Bacteriol. 2008;190(20):6881–93. doi: 10.1128/JB.00619-08
  • 61.Iguchi A, Thomson NR, Ogura Y, Saunders D, Ooka T, Henderson IR, et al. Complete genome sequence and comparative genome analysis of enteropathogenic Escherichia coli O127:H6 strain E2348/69. J Bacteriol. 2009;191(1):347–54. doi: 10.1128/JB.01238-08
  • 62.Blattner FR, Plunkett G 3rd, Bloch CA, Perna NT, Burland V, Riley M, et al. The complete genome sequence of Escherichia coli K-12. Science. 1997;277(5331):1453–62. doi: 10.1126/science.277.5331.1453
  • 63.Stephens C, Arismendi T, Wright M, Hartman A, Gonzalez A, Gill M. F plasmids are the major carriers of antibiotic resistance genes in human-associated commensal Escherichia coli. mSphere. 2020;5(4).
  • 64.Zhou Z, Alikhan N-F, Mohamed K, Fan Y, Agama Study Group, Achtman M. The EnteroBase user’s guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny, and Escherichia core genomic diversity. Genome Res. 2020;30(1):138–52. doi: 10.1101/gr.251678.119
  • 65.Beghain J, Bridier-Nahmias A, Le Nagard H, Denamur E, Clermont O. ClermonTyping: an easy-to-use and accurate in silico method for Escherichia genus strain phylotyping. Microb Genom. 2018;4(7):e000192. doi: 10.1099/mgen.0.000192
  • 66.Joensen KG, Tetzschner AMM, Iguchi A, Aarestrup FM, Scheutz F. Rapid and easy in silico serotyping of Escherichia coli isolates by use of whole-genome sequencing data. J Clin Microbiol. 2015;53(8):2410–26. doi: 10.1128/JCM.00008-15
  • 67.Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421
  • 68.Larsen MV, Cosentino S, Rasmussen S, Friis C, Hasman H, Marvig RL, et al. Multilocus sequence typing of total-genome-sequenced bacteria. J Clin Microbiol. 2012;50(4):1355–61. doi: 10.1128/JCM.06094-11
  • 69.Roer L, Tchesnokova V, Allesøe R, Muradova M, Chattopadhyay S, Ahrenfeldt J, et al. Development of a Web Tool for Escherichia coli Subtyping Based on fimH Alleles. J Clin Microbiol. 2017;55(8):2538–43. doi: 10.1128/JCM.00737-17
  • 70.Liu B, Zheng D, Zhou S, Chen L, Yang J. VFDB 2022: a general classification scheme for bacterial virulence factors. Nucleic Acids Res. 2022;50(D1):D912–7. doi: 10.1093/nar/gkab1107
  • 71.Russel J, Pinilla-Redondo R, Mayo-Muñoz D, Shah SA, Sørensen SJ. CRISPRCasTyper: Automated Identification, Annotation, and Classification of CRISPR-Cas Loci. CRISPR J. 2020;3(6):462–9. doi: 10.1089/crispr.2020.0059
  • 72.Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, et al. CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics. 2007;8:209. doi: 10.1186/1471-2105-8-209
  • 73.Zhang J, Guan J, Wang M, Li G, Djordjevic M, Tai C, et al. SecReT6 update: a comprehensive resource of bacterial Type VI Secretion Systems. Sci China Life Sci. 2023;66(3):626–34. doi: 10.1007/s11427-022-2172-x
  • 74.Wishart DS, Han S, Saha S, Oler E, Peters H, Grant JR, et al. PHASTEST: faster than PHASTER, better than PHAST. Nucleic Acids Res. 2023;51(W1):W443–50. doi: 10.1093/nar/gkad382
  • 75.Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MTG, et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics. 2015;31(22):3691–3. doi: 10.1093/bioinformatics/btv421 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Kanehisa M, Furumichi M, Sato Y, Matsuura Y, Ishiguro-Watanabe M. KEGG: biological systems database as a model of the real world. Nucleic Acids Res. 2025;53(D1):D672–7. doi: 10.1093/nar/gkae909 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Löytynoja A. Phylogeny-aware alignment with PRANK. Methods Mol Biol. 2014;1079:155–70. doi: 10.1007/978-1-62703-646-7_10 [DOI] [PubMed] [Google Scholar]
  • 78.Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20(1):238. doi: 10.1186/s13059-019-1832-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7. doi: 10.1093/nar/gkh340 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14(6):587–9. doi: 10.1038/nmeth.4285 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3. doi: 10.1093/bioinformatics/btu033 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Letunic I, Bork P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49(W1):W293–6. doi: 10.1093/nar/gkab301 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Kim D, Park S, Chun J. Introducing EzAAI: a pipeline for high throughput calculations of prokaryotic average amino acid identity. J Microbiol. 2021;59(5):476–80. doi: 10.1007/s12275-021-1154-0 [DOI] [PubMed] [Google Scholar]
  • 84.Alikhan N-F, Petty NK, Ben Zakour NL, Beatson SA. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics. 2011;12:402. doi: 10.1186/1471-2164-12-402 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Liu CM, Stegger M, Aziz M, Johnson TJ, Waits K, Nordstrom L, et al. Escherichia coli ST131-H22 as a Foodborne Uropathogen. mBio. 2018;9(4):e00470–18. doi: 10.1128/mBio.00470-18 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Bonacorsi S, Clermont O, Houdouin V, Cordevant C, Brahimi N, Marecat A, et al. Molecular analysis and experimental virulence of French and North American Escherichia coli neonatal meningitis isolates: identification of a new virulent clone. J Infect Dis. 2003;187(12):1895–906. doi: 10.1086/375347 [DOI] [PubMed] [Google Scholar]
  • 87.Homeier T, Semmler T, Wieler LH, Ewers C. The GimA locus of extraintestinal pathogenic E. coli: does reductive evolution correlate with habitat and pathotype? PLoS One. 2010;5(5):e10877. doi: 10.1371/journal.pone.0010877 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Lu S, Wang J, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR, et al. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res. 2020;48(D1):D265–8. doi: 10.1093/nar/gkz991 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Wang J, Chitsaz F, Derbyshire MK, Gonzales NR, Gwadz M, Lu S, et al. The conserved domain database in 2023. Nucleic Acids Res. 2023;51(D1):D384–8. doi: 10.1093/nar/gkac1096 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Thiaville JJ, Flood J, Yurgel S, Prunetti L, Elbadawi-Sidhu M, Hutinet G, et al. Members of a novel kinase family (DUF1537) can recycle toxic intermediates into an essential metabolite. ACS Chem Biol. 2016;11(8):2304–11. doi: 10.1021/acschembio.6b00279 [DOI] [PubMed] [Google Scholar]
  • 91.Majumdar A, Ghatak A, Ghosh RK. Identification of the gene for the monomeric alkaline phosphatase of Vibrio cholerae serogroup O1 strain. Gene. 2005;344:251–8. doi: 10.1016/j.gene.2004.11.005 [DOI] [PubMed] [Google Scholar]
  • 92.García-Sánchez M, Souche M, Trives-Segura C, Plassard C. The grazing activity of Acrobeloides sp. drives phytate mineralisation within its trophic relationship with bacteria. J Nematol. 2021;53:e2021–21. doi: 10.21307/jofnem-2021-021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Lim S, Han A, Kim D, Seo HS. Transcriptional profiling of an attenuated Salmonella Typhimurium ptsI mutant strain under low-oxygen conditions using microarray analysis. J Bacteriol Virol. 2015;45(3):200–14. [Google Scholar]
  • 94.Dreux N, Denizot J, Martinez-Medina M, Mellmann A, Billig M, Kisiela D, et al. Point mutations in FimH adhesin of Crohn’s disease-associated adherent-invasive Escherichia coli enhance intestinal inflammatory response. PLoS Pathog. 2013;9(1):e1003141. doi: 10.1371/journal.ppat.1003141 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Schwartz DJ, Kalas V, Pinkner JS, Chen SL, Spaulding CN, Dodson KW, et al. Positively selected FimH residues enhance virulence during urinary tract infection by altering FimH conformation. Proc Natl Acad Sci U S A. 2013;110(39):15530–7. doi: 10.1073/pnas.1315203110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Koonin EV, Makarova KS. Origins and evolution of CRISPR-Cas systems. Philos Trans R Soc Lond B Biol Sci. 2019;374(1772):20180087. doi: 10.1098/rstb.2018.0087 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Ma J, Sun M, Bao Y, Pan Z, Zhang W, Lu C, et al. Genetic diversity and features analysis of type VI secretion systems loci in avian pathogenic Escherichia coli by wide genomic scanning. Infect Genet Evol. 2013;20:454–64. doi: 10.1016/j.meegid.2013.09.031 [DOI] [PubMed] [Google Scholar]
  • 98.Kehres DG, Janakiraman A, Slauch JM, Maguire ME. SitABCD is the alkaline Mn(2+) transporter of Salmonella enterica serovar Typhimurium. J Bacteriol. 2002;184(12):3159–66. doi: 10.1128/JB.184.12.3159-3166.2002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Lloyd AL, Rasko DA, Mobley HLT. Defining genomic islands and uropathogen-specific genes in uropathogenic Escherichia coli. J Bacteriol. 2007;189(9):3532–46. doi: 10.1128/JB.01744-06 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Gratia A. Sur un remarquable exemple d’antagonisme entre deux souches de colibacille. C R Soc Biol. 1925;93:1040–1. [Google Scholar]
  • 101.Caza M, Lépine F, Milot S, Dozois CM. Specific roles of the iroBCDEN genes in virulence of an avian pathogenic Escherichia coli O78 strain and in production of salmochelins. Infect Immun. 2008;76(8):3539–49. doi: 10.1128/IAI.00455-08 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Chagneau CV, Payros D, Goman A, Goursat C, David L, Okuno M, et al. HlyF, an underestimated virulence factor of uropathogenic Escherichia coli. Clin Microbiol Infect. 2023;29(11):1449.e1–1449.e9. doi: 10.1016/j.cmi.2023.07.024 [DOI] [PubMed] [Google Scholar]
  • 103.Murase K, Martin P, Porcheron G, Houle S, Helloin E, Pénary M, et al. HlyF produced by extraintestinal pathogenic Escherichia coli Is a virulence factor that regulates outer membrane vesicle biogenesis. J Infect Dis. 2016;213(5):856–65. doi: 10.1093/infdis/jiv506 [DOI] [PubMed] [Google Scholar]
  • 104.Quackenbush RL, Falkow S. Relationship between colicin V activity and virulence in Escherichia coli. Infect Immun. 1979;24(2):562–4. doi: 10.1128/iai.24.2.562-564.1979 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Reid CJ, Cummins ML, Börjesson S, Brouwer MSM, Hasman H, Hammerum AM, et al. A role for ColV plasmids in the evolution of pathogenic Escherichia coli ST58. Nat Commun. 2022;13(1):683. doi: 10.1038/s41467-022-28342-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Mellata M, Ameiss K, Mo H, Curtiss R 3rd. Characterization of the contribution to virulence of three large plasmids of avian pathogenic Escherichia coli chi7122 (O78:K80:H9). Infect Immun. 2010;78(4):1528–41. doi: 10.1128/IAI.00981-09 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Nolan LK, Horne SM, Giddings CW, Foley SL, Johnson TJ, Lynne AM, et al. Resistance to serum complement, iss, and virulence of avian Escherichia coli. Vet Res Commun. 2003;27(2):101–10. doi: 10.1023/a:1022854902700 [DOI] [PubMed] [Google Scholar]
  • 108.Kidgell C, Pickard D, Wain J, James K, Diem Nga LT, Diep TS, et al. Characterisation and distribution of a cryptic Salmonella typhi plasmid pHCM2. Plasmid. 2002;47(3):159–71. doi: 10.1016/s0147-619x(02)00013-6 [DOI] [PubMed] [Google Scholar]
  • 109.Denamur E, Clermont O, Bonacorsi S, Gordon D. The population genetics of pathogenic Escherichia coli. Nat Rev Microbiol. 2021;19(1):37–54. doi: 10.1038/s41579-020-0416-x [DOI] [PubMed] [Google Scholar]
  • 110.Schouler C, Schaeffer B, Brée A, Mora A, Dahbi G, Biet F, et al. Diagnostic strategy for identifying avian pathogenic Escherichia coli based on four patterns of virulence genes. J Clin Microbiol. 2012;50(5):1673–8. doi: 10.1128/JCM.05057-11 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Ye C, Xu J. Prevalence of iron transport gene on pathogenicity-associated island of uropathogenic Escherichia coli in E. coli O157:H7 containing Shiga toxin gene. J Clin Microbiol. 2001;39(6):2300–5. doi: 10.1128/JCM.39.6.2300-2305.2001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Ouyang Z, Isaacson R. Identification and characterization of a novel ABC iron transport system, fit, in Escherichia coli. Infect Immun. 2006;74(12):6949–56. doi: 10.1128/IAI.00866-06 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Suits MDL, Lang J, Pal GP, Couture M, Jia Z. Structure and heme binding properties of Escherichia coli O157:H7 ChuX. Protein Sci. 2009;18(4):825–38. doi: 10.1002/pro.84 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Nesta B, Spraggon G, Alteri C, Moriel DG, Rosini R, Veggi D, et al. FdeC, a novel broadly conserved Escherichia coli adhesin eliciting protection against urinary tract infections. mBio. 2012;3(2):e00010–12. doi: 10.1128/mBio.00010-12 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Bateman SL, Stapleton AE, Stamm WE, Hooton TM, Seed PC. The type 1 pili regulator gene fimX and pathogenicity island PAI-X as molecular markers of uropathogenic Escherichia coli. Microbiology (Reading). 2013;159(Pt 8):1606–17. doi: 10.1099/mic.0.066472-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Shea AE, Stocki JA, Himpsl SD, Smith SN, Mobley HLT. Loss of an intimin-like protein encoded on a uropathogenic E. coli pathogenicity island reduces inflammation and affects interactions with the urothelium. Infect Immun. 2022;90(2):e0027521. doi: 10.1128/IAI.00275-21 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.Buckles EL, Bahrani-Mougeot FK, Molina A, Lockatell CV, Johnson DE, Drachenberg CB, et al. Identification and characterization of a novel uropathogenic Escherichia coli-associated fimbrial gene cluster. Infect Immun. 2004;72(7):3890–901. doi: 10.1128/IAI.72.7.3890-3901.2004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118.Vimr ER. Map position and genomic organization of the kps cluster for polysialic acid synthesis in Escherichia coli K1. J Bacteriol. 1991;173(3):1335–8. doi: 10.1128/jb.173.3.1335-1338.1991 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119.Sun M, Gao X, Zhao K, Ma J, Yao H, Pan Z. Insight into the virulence related secretion systems, fimbriae, and toxins in O2:K1 Escherichia coli Isolated From Bovine Mastitis. Front Vet Sci. 2021;8:622725. doi: 10.3389/fvets.2021.622725 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120.Pukatzki S, Ma AT, Revel AT, Sturtevant D, Mekalanos JJ. Type VI secretion system translocates a phage tail spike-like protein into target cells where it cross-links actin. Proc Natl Acad Sci U S A. 2007;104(39):15508–13. doi: 10.1073/pnas.0706532104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Li G, Kariyawasam S, Tivendale KA, Wannemuehler Y, Ewers C, Wieler LH, et al. tkt1, located on a novel pathogenicity island, is prevalent in avian and human extraintestinal pathogenic Escherichia coli. BMC Microbiol. 2012;12:51. doi: 10.1186/1471-2180-12-51 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122.Nakayama N, Arai N, Bond MW, Kaziro Y, Arai K. Nucleotide sequence of dnaB and the primary structure of the dnaB protein from Escherichia coli. J Biol Chem. 1984;259(1):97–101. doi: 10.1016/s0021-9258(17)43626-x [DOI] [PubMed] [Google Scholar]
  • 123.Wu D, Hu T, Zhang L, Chen J, Du J, Ding J, et al. Residues Asp164 and Glu165 at the substrate entryway function potently in substrate orientation of alanine racemase from E. coli: Enzymatic characterization with crystal structure analysis. Protein Sci. 2008;17(6):1066–76. doi: 10.1110/ps.083495908 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Kuramitsu S, Inoue K, Ogawa T, Ogawa H, Kagamiyama H. Aromatic amino acid aminotransferase of Escherichia coli: nucleotide sequence of the tyrB gene. Biochem Biophys Res Commun. 1985;133(1):134–9. doi: 10.1016/0006-291x(85)91851-0 [DOI] [PubMed] [Google Scholar]
  • 125.McAllister CH, Good AG. Alanine aminotransferase variants conferring diverse NUE phenotypes in Arabidopsis thaliana. PLoS One. 2015;10(4):e0121830. doi: 10.1371/journal.pone.0121830 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126.Zdych E, Peist R, Reidl J, Boos W. MalY of Escherichia coli is an enzyme with the activity of a beta C-S lyase (cystathionase). J Bacteriol. 1995;177(17):5035–9. doi: 10.1128/jb.177.17.5035-5039.1995 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.Nobelmann B, Lengeler JW. Molecular analysis of the gat genes from Escherichia coli and of their roles in galactitol transport and metabolism. J Bacteriol. 1996;178(23):6790–5. doi: 10.1128/jb.178.23.6790-6795.1996 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 128.Kuivanen J, Richard P. The yjjN of E. coli codes for an L-galactonate dehydrogenase and can be used for quantification of L-galactonate and L-gulonate. Appl Biochem Biotechnol. 2014;173(7):1829–35. doi: 10.1007/s12010-014-0969-0 [DOI] [PubMed] [Google Scholar]
  • 129.Robert-Baudouy J, Portalier R, Stoeber F. Regulation of hexuronate system genes in Escherichia coli K-12: multiple regulation of the uxu operon by exuR and uxuR gene products. J Bacteriol. 1981;145(1):211–20. doi: 10.1128/jb.145.1.211-220.1981 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Ratledge C, Dover LG. Iron metabolism in pathogenic bacteria. Annu Rev Microbiol. 2000;54:881–941. doi: 10.1146/annurev.micro.54.1.881 [DOI] [PubMed] [Google Scholar]
  • 131.Porcheron G, Garénaux A, Proulx J, Sabri M, Dozois CM. Iron, copper, zinc, and manganese transport and regulation in pathogenic Enterobacteria: correlations between strains, site of infection and the relative importance of the different metal transport systems for virulence. Front Cell Infect Microbiol. 2013;3:90. doi: 10.3389/fcimb.2013.00090 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132.Perry RD, Fetherston JD. Yersiniabactin iron uptake: mechanisms and role in Yersinia pestis pathogenesis. Microbes Infect. 2011;13(10):808–17. doi: 10.1016/j.micinf.2011.04.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133.Caza M, Lépine F, Dozois CM. Secretion, but not overall synthesis, of catecholate siderophores contributes to virulence of extraintestinal pathogenic Escherichia coli. Mol Microbiol. 2011;80(1):266–82. doi: 10.1111/j.1365-2958.2011.07570.x [DOI] [PubMed] [Google Scholar]
  • 134.Li G, Laturnus C, Ewers C, Wieler LH. Identification of genes required for avian Escherichia coli septicemia by signature-tagged mutagenesis. Infect Immun. 2005;73(5):2818–27. doi: 10.1128/IAI.73.5.2818-2827.2005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 135.Dho M, Lafont JP. Adhesive properties and iron uptake ability in Escherichia coli lethal and nonlethal for chicks. Avian Dis. 1984;28(4):1016–25. doi: 10.2307/1590278 [DOI] [PubMed] [Google Scholar]
  • 136.Li G, Tivendale KA, Liu P, Feng Y, Wannemuehler Y, Cai W, et al. Transcriptome analysis of avian pathogenic Escherichia coli O1 in chicken serum reveals adaptive responses to systemic infection. Infect Immun. 2011;79(5):1951–60. doi: 10.1128/IAI.01230-10 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137.Klemm P. Two regulatory fim genes, fimB and fimE, control the phase variation of type 1 fimbriae in Escherichia coli. EMBO J. 1986;5(6):1389–93. doi: 10.1002/j.1460-2075.1986.tb04372.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138.Holden N, Totsika M, Dixon L, Catherwood K, Gally DL. Regulation of P-fimbrial phase variation frequencies in Escherichia coli CFT073. Infect Immun. 2007;75(7):3325–34. doi: 10.1128/IAI.01989-06 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 139.Hannan TJ, Mysorekar IU, Chen SL, Walker JN, Jones JM, Pinkner JS, et al. LeuX tRNA-dependent and -independent mechanisms of Escherichia coli pathogenesis in acute cystitis. Mol Microbiol. 2008;67(1):116–28. doi: 10.1111/j.1365-2958.2007.06025.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 140.Behzadi P. Classical chaperone-usher (CU) adhesive fimbriome: uropathogenic Escherichia coli (UPEC) and urinary tract infections (UTIs). Folia Microbiol (Praha). 2020;65(1):45–65. doi: 10.1007/s12223-019-00719-x [DOI] [PubMed] [Google Scholar]
  • 141.Matter LB, Spricigo DA, Tasca C, de Vargas AC. Invasin gimB found in a bovine intestinal Escherichia coli with an adherent and invasive profile. Braz J Microbiol. 2015;46(3):875–8. doi: 10.1590/S1517-838246320140621 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 142.Sheikh A, Luo Q, Roy K, Shabaan S, Kumar P, Qadri F, et al. Contribution of the highly conserved EaeH surface protein to enterotoxigenic Escherichia coli pathogenesis. Infect Immun. 2014;82(9):3657–66. doi: 10.1128/IAI.01890-14 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 143.Whitfield C. Biosynthesis and assembly of capsular polysaccharides in Escherichia coli. Annu Rev Biochem. 2006;75:39–68. doi: 10.1146/annurev.biochem.75.103004.142545 [DOI] [PubMed] [Google Scholar]
  • 144.Bertani B, Ruiz N. Function and biogenesis of lipopolysaccharides. EcoSal Plus. 2018;8(1). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 145.Sommaruga S, Gioia LD, Tortora P, Polissi A. Structure prediction and functional analysis of KdsD, an enzyme involved in lipopolysaccharide biosynthesis. Biochem Biophys Res Commun. 2009;388(2):222–7. doi: 10.1016/j.bbrc.2009.07.154 [DOI] [PubMed] [Google Scholar]
  • 146.Russell AB, Peterson SB, Mougous JD. Type VI secretion system effectors: poisons with a purpose. Nat Rev Microbiol. 2014;12(2):137–48. doi: 10.1038/nrmicro3185 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147.Li J, Yao Y, Xu HH, Hao L, Deng Z, Rajakumar K, et al. SecReT6: a web-based resource for type VI secretion systems found in bacteria. Environ Microbiol. 2015;17(7):2196–202. doi: 10.1111/1462-2920.12794 [DOI] [PubMed] [Google Scholar]
  • 148.Barret M, Egan F, O’Gara F. Distribution and diversity of bacterial secretion systems across metagenomic datasets. Environ Microbiol Rep. 2013;5(1):117–26. doi: 10.1111/j.1758-2229.2012.00394.x [DOI] [PubMed] [Google Scholar]
  • 149.Wang S, Dai J, Meng Q, Han X, Han Y, Zhao Y, et al. DotU expression is highly induced during in vivo infection and responsible for virulence and Hcp1 secretion in avian pathogenic Escherichia coli. Front Microbiol. 2014;5:588. doi: 10.3389/fmicb.2014.00588 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 150.Hachani A, Lossi NS, Hamilton A, Jones C, Bleves S, Albesa-Jové D, et al. Type VI secretion system in Pseudomonas aeruginosa: secretion and multimerization of VgrG proteins. J Biol Chem. 2011;286(14):12317–27. doi: 10.1074/jbc.M110.193045 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 151.Kim K-S, Pelton JG, Inwood WB, Andersen U, Kustu S, Wemmer DE. The Rut pathway for pyrimidine degradation: novel chemistry and toxicity problems. J Bacteriol. 2010;192(16):4089–102. doi: 10.1128/JB.00201-10 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 152.Danielsen S, Kilstrup M, Barilla K, Jochimsen B, Neuhard J. Characterization of the Escherichia coli codBA operon encoding cytosine permease and cytosine deaminase. Mol Microbiol. 1992;6(10):1335–44. doi: 10.1111/j.1365-2958.1992.tb00854.x [DOI] [PubMed] [Google Scholar]
  • 153.Kim NY, Kim OB. The ybcF Gene of Escherichia coli encodes a local orphan enzyme, catabolic carbamate kinase. J Microbiol Biotechnol. 2022;32(12):1527–36. doi: 10.4014/jmb.2210.10037 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 154.Wang L, Zhou YJ, Ji D, Lin X, Liu Y, Zhang Y, et al. Identification of UshA as a major enzyme for NAD degradation in Escherichia coli. Enzyme Microb Technol. 2014;58–59:75–9. doi: 10.1016/j.enzmictec.2014.03.003 [DOI] [PubMed] [Google Scholar]
  • 155.Alber A, Morris KM, Bryson KJ, Sutton KM, Monson MS, Chintoan-Uta C, et al. Avian Pathogenic Escherichia coli (APEC) strain-dependent immunomodulation of respiratory granulocytes and mononuclear phagocytes in CSF1R-reporter transgenic chickens. Front Immunol. 2020;10:3055. doi: 10.3389/fimmu.2019.03055 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 156.Pajor AM. Molecular properties of the SLC13 family of dicarboxylate and sulfate transporters. Pflugers Arch. 2006;451(5):597–605. doi: 10.1007/s00424-005-1487-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 157.Pos KM, Dimroth P, Bott M. The Escherichia coli citrate carrier CitT: a member of a novel eubacterial transporter family related to the 2-oxoglutarate/malate translocator from spinach chloroplasts. J Bacteriol. 1998;180(16):4160–5. doi: 10.1128/JB.180.16.4160-4165.1998 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 158.Rabus R, Jack DL, Kelly DJ, Saier MH Jr. TRAP transporters: an ancient family of extracytoplasmic solute-receptor-dependent secondary active transporters. Microbiology (Reading). 1999;145(Pt 12):3431–45. doi: 10.1099/00221287-145-12-3431 [DOI] [PubMed] [Google Scholar]
  • 159.Valentini M, Storelli N, Lapouge K. Identification of C(4)-dicarboxylate transport systems in Pseudomonas aeruginosa PAO1. J Bacteriol. 2011;193(17):4307–16. doi: 10.1128/JB.05074-11 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 160.Rodríguez-Beltrán J, Tourret J, Tenaillon O, López E, Bourdelier E, Costas C, et al. High Recombinant Frequency in Extraintestinal Pathogenic Escherichia coli Strains. Mol Biol Evol. 2015;32(7):1708–16. doi: 10.1093/molbev/msv072 [DOI] [PubMed] [Google Scholar]
  • 161.Didelot X, Méric G, Falush D, Darling AE. Impact of homologous and non-homologous recombination in the genomic evolution of Escherichia coli. BMC Genomics. 2012;13:256. doi: 10.1186/1471-2164-13-256 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 162.Glasner ME, Truong DP, Morse BC. How enzyme promiscuity and horizontal gene transfer contribute to metabolic innovation. FEBS J. 2020;287(7):1323–42. doi: 10.1111/febs.15185 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 163.Milner DS, Attah V, Cook E, Maguire F, Savory FR, Morrison M, et al. Environment-dependent fitness gains can be driven by horizontal gene transfer of transporter-encoding genes. Proc Natl Acad Sci U S A. 2019;116(12):5613–22. doi: 10.1073/pnas.1815994116 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 164.Lauro FM, Eloe-Fadrosh EA, Richter TKS, Vitulo N, Ferriera S, Johnson JH, et al. Ecotype diversity and conversion in Photobacterium profundum strains. PLoS One. 2014;9(5):e96953. doi: 10.1371/journal.pone.0096953 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 165.Campanaro S, Vezzi A, Vitulo N, Lauro FM, D’Angelo M, Simonato F, et al. Laterally transferred elements and high pressure adaptation in Photobacterium profundum strains. BMC Genomics. 2005;6:122. doi: 10.1186/1471-2164-6-122 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 166.Pang R, Xie T, Wu Q, Li Y, Lei T, Zhang J, et al. Comparative genomic analysis reveals the potential risk of Vibrio parahaemolyticus Isolated From Ready-To-Eat Foods in China. Front Microbiol. 2019;10:186. doi: 10.3389/fmicb.2019.00186 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 167.Roslan NN, Ngalimat MS, Leow ATC, Oslan SN, Baharum SN, Sabri S. Genomic and phenomic analysis of a marine bacterium, Photobacterium marinum J15. Microbiol Res. 2020;233:126410. doi: 10.1016/j.micres.2020.126410 [DOI] [PubMed] [Google Scholar]
  • 168.Yang C-H, Li Y-H. Chromobacterium violaceum infection: a clinical review of an important but neglected infection. J Chin Med Assoc. 2011;74(10):435–41. doi: 10.1016/j.jcma.2011.08.013 [DOI] [PubMed] [Google Scholar]
  • 169.Redford P, Roesch PL, Welch RA. DegS is necessary for virulence and is among extraintestinal Escherichia coli genes induced in murine peritonitis. Infect Immun. 2003;71(6):3088–96. doi: 10.1128/IAI.71.6.3088-3096.2003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 170.Spiga L, Winter MG, Furtado de Carvalho T, Zhu W, Hughes ER, Gillis CC, et al. An oxidative central metabolism enables Salmonella to utilize microbiota-derived succinate. Cell Host Microbe. 2017;22(3):291-301.e6. doi: 10.1016/j.chom.2017.07.018 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 171.Billard-Pomares T, Clermont O, Castellanos M, Magdoud F, Royer G, Condamine B, et al. The arginine deiminase operon is responsible for a fitness trade-off in extended-spectrum-β-lactamase-producing strains of Escherichia coli. Antimicrob Agents Chemother. 2019;63(8):e00635-19. doi: 10.1128/AAC.00635-19 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 172.Zhang Y, Tan H, Yang S, Huang Y, Cai S, Jian J, et al. The role of dctP gene in regulating colonization, adhesion and pathogenicity of Vibrio alginolyticus strain HY9901. J Fish Dis. 2022;45(3):421–34. doi: 10.1111/jfd.13571 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Feng Gao

22 Oct 2025

PONE-D-25-49371

Sequencing of the Invasive E. coli Strain BEN2908 Isolated from Poultry: A Comparative Investigation of Genomic Regions Shared with Other Invasive Model Strains

PLOS ONE

Dear Dr. Schouler,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 06 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Feng Gao

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include a complete copy of PLOS’ questionnaire on inclusivity in global research in your revised manuscript. Our policy for research in this area aims to improve transparency in the reporting of research performed outside of researchers’ own country or community. The policy applies to researchers who have travelled to a different country to conduct research, research with Indigenous populations or their lands, and research on cultural artefacts. The questionnaire can also be requested at the journal’s discretion for any other submissions, even if these conditions are not met. Please find more information on the policy and a link to download a blank copy of the questionnaire here: https://journals.plos.org/plosone/s/best-practices-in-research-reporting. Please upload a completed version of your questionnaire as Supporting Information when you resubmit your manuscript.

3. Thank you for stating the following financial disclosure:

“This study has received funding from DGAL within the EcoAntibio2 call, COLIPHAVI project, France (C.S.), and from CNPq (Projeto Universal 423.902/2016‐4) and FAPERGS (PPSUS 21/2551-0000079-1), Brazil (F.H.). T.W.M. was the recipient of a CAPES Master studentship (DS 88887921543/2023-00). A.T. was supported by a training grant from the Fédération de recherche en infectiologie (FéRI). C.M.D. received funding from NSERC Discovery Grant 2019-06642.”

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Funding Section of your manuscript:

“This study has received funding from DGAL within the EcoAntibio2 call, COLIPHAVI project, France (C.S.), and from CNPq (Projeto Universal 423.902/2016‐4) and FAPERGS (PPSUS 21/2551-0000079-1), Brazil (F.H.). T.W.M. was the recipient of a CAPES Master studentship (DS 88887921543/2023-00). A.T. was supported by a training grant from the Fédération de recherche en infectiologie (FéRI). C.M.D. received funding from NSERC Discovery Grant 2019-06642.”

We note that you have provided funding information that is currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“This study has received funding from DGAL within the EcoAntibio2 call, COLIPHAVI project, France (C.S.), and from CNPq (Projeto Universal 423.902/2016‐4) and FAPERGS (PPSUS 21/2551-0000079-1), Brazil (F.H.). T.W.M. was the recipient of a CAPES Master studentship (DS 88887921543/2023-00). A.T. was supported by a training grant from the Fédération de recherche en infectiologie (FéRI). C.M.D. received funding from NSERC Discovery Grant 2019-06642.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: I Don't Know

Reviewer #3: N/A

Reviewer #4: Yes

Reviewer #5: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Peer Review: “Sequencing of the Invasive E. coli Strain BEN2908 Isolated from Poultry: A Comparative Investigation of Genomic Regions Shared with Other Invasive Model Strains” PONE-D-25-49371

Recommendation: Major Revision

Reviewer Comments

This is a very complete and carefully executed study. The authors present a full genome assembly and comparative analysis of the APEC strain BEN2908, with integration of plasmid features and comparison to other ExPEC and AIEC strains. The manuscript is generally well organized and within scope for PLOS ONE.

That said, to fully meet the journal’s standards of technical rigor, reproducibility, and cautious interpretation, I recommend major revision.

Global Comments

The manuscript would benefit from a clearer description of the origin of strain BEN2908. Please specify the sample type, host species, geographic origin, year of isolation, and whether this strain was previously described or deposited in a public collection. These details are critical to contextualize the comparative analysis with other ExPEC and AIEC isolates.

The genomic methodology is described in commendable detail, but essential elements are still missing to evaluate assembly quality and reproducibility. In particular,

The Introduction is well written and easy to follow, but the narrative would be strengthened by briefly introducing the main virulence and metabolic genes that are later discussed in the Results (e.g., iron uptake systems, T6SS loci, dicarboxylate transporters). Providing this context early on would prepare the reader for the subsequent comparative analysis.

In the Methods, the description of the Illumina sequencing is too limited. While the platform (MiSeq), read number, length, and GC content are reported, key information is missing for reproducibility and assessment of data quality.

Please provide coverage statistics across the chromosome and plasmid, QC metrics (e.g., FastQC), and details on any trimming or filtering steps prior to assembly. Given that the data were generated in 2015, it would also be important to clarify whether quality was re-evaluated before use in this study. Finally, to meet PLOS ONE’s data availability requirements, the raw Illumina reads should be deposited in SRA with accession numbers included in the Data Availability Statement. Also, provide the version/date for the web-based tools to ensure reproducibility (especially the tools that use online databases).

The RAxML tree must include bootstrap support values. At present the conclusions about STc95 and STc135 are not supported.

Please specify the BLAST settings (identity, coverage thresholds, word size, or e-value) used to define the genomic regions (GRs), and justify the specific ≥4 kb cutoff, as this arbitrary threshold affects which loci are considered “GRs.”

In addition, while Tables S1–S2 summarize GC content, coverage, and identities, the underlying files (BRIG project, FASTA sequences of each GR, and CDS functional annotation tables) should be deposited in a public repository (for example GitHub) to allow replication. If available, scripts should be made public on GitHub or another public repository, or described in more detail in a supplementary methods section. In Table 2 and the Figure, some plasmids are analyzed but are not available for download. If pLF82 data were taken from an older publication, the comparative analysis is only partially reproducible; please provide the accession numbers or upload the sequence to a public repository.

For the functional assignments, please clarify the criteria for selecting “reviewed” vs “unreviewed” UniProtKB hits and provide the accession numbers of representative proteins. Without these details, other researchers cannot reproduce or verify the classification of features.

Please temper the VFAnalyzer/PHASTEST claims (from line 221) by specifying identity/coverage thresholds, depositing the full outputs (Tables S4–S5 with raw hits), and avoiding absolute statements like “shares all” without qualification. The same applies to the plasmid section (from line 237): the description of pBEN2908 is very clear, but the statement that “all genes related to a ColV-like plasmid are present” is too absolute without showing the evidence in detail. A table listing each ColV-associated locus (e.g., iuc/iut, iro, etc.) with accession, coverage, and identity values would be useful, along with the Inc or MOB type groups in the same table.

In the Sugar Metabolism section (from line 328), please distinguish more clearly between functions previously validated in the literature (e.g., frz operon, GimA) and novel pathway predictions based on low-identity ORFs, so that readers can separate established evidence from putative annotations. Similarly, in the identification of iron and metal uptake systems (from line 404), the phrasing “we identified systems related to metal uptake” should be revised to clarify that this refers to the detection of previously described operons through genome annotation, rather than novel experimental findings. In the Adhesion and Invasion section (line 412), please clarify that operons and loci (fim, auf, fdeC, ila) were detected through annotation rather than experimentally demonstrated, and temper the functional claims for ORF 1₁₁ as putative predictions based on moderate homology.

The Discussion is very detailed and in parts can be excessive. Several subsections (e.g., Sugar Metabolism, General Metabolism) repeat pathway-level explanations that could be streamlined or moved to Supplementary Material. A more concise discussion, with selective emphasis on novel findings, would greatly improve readability. Consider moving some of the descriptive detail into the Introduction (to set context) or into Supplementary Notes, while keeping the main text focused on the comparative insights.

In the Conclusion, please temper the statement on long-term conservation of uncharacterized ORFs, which is only inferred from two isolates 21 years apart, and explicitly clarify that all proposed functions remain hypothetical pending experimental validation.

Minor revisions:

The statement that these findings “reinforce the importance in government monitoring of bacterial genomic evolution” (line 276) introduces a policy-oriented opinion that goes beyond the scope of the presented data. I recommend rephrasing this sentence to remain focused on the scientific evidence (i.e., the role of ColV-like plasmids in virulence).

Replace “softwares” with “software” and shorten long passive sentences.

Reframe causal or absolute statements (“shares all genes,” “ancestral proximity”) into cautious, data-supported wording (“shares X genes above threshold Y,” “phylogenetic proximity suggested but requires bootstrap support”).

Specific Comments

• Line 71: Please make explicit that you are now referring to human disease context.

• Line 93: Be careful with the term “cassette gene.” Are you referring to integron cassettes or simply modular groups of mobilizable genes? Clarify the terminology.

• Line 104: Explain or cite the “minor modification” of the ONT nanopore protocol. Without details, it is not reproducible.

• Line 120: Please provide the RAST version.

• Lines 113–115: Wording issue: “The obtained raw data (.fast5 files) was base called…” → should be “The raw data were basecalled with Guppy v4.0.11 using [config].” Please also specify qscore filter and parameters.

• Line 210–215: The statement that STc95 is closer to AIEC STc135 than to APEC SCI-07 cannot stand without bootstrap support. Please provide node support and, ideally, ANI/AAI values.

• Line 222–229: When describing shared virulence genes with LF82, avoid absolute terms like “all 76 genes.” Instead, specify thresholds for identity/coverage and include a full gene list in Supplementary Materials.

• Line 300–311 (Table 3): Provide coverage/identity values for each “genomic region” absent or present, or the cutoff used to declare presence/absence.

• Line 461–466: The CRISPR comparison is interesting, but please provide a full spacer table with coordinates and system subtype classification to avoid confusion.

• Line 650–656 (Conclusion): The wording “attests the utility and importance” is too strong. Suggest softening to “suggests possible functional utility, subject to experimental validation.”

Recommendation: Major revision.

This is a very complex and beautifully crafted manuscript that will be a valuable reference once revised. With the additions of assembly QC, reproducibility deposits, phylogenetic support, careful terminology and parameters, the paper will fully meet the PLOS ONE publication criteria. Once these issues are addressed, the manuscript will meet the journal’s standards for transparency, reproducibility, and cautious interpretation.

Reviewer #2: I see the main contribution of this paper serving as a potentially valuable resource, providing a comprehensive genomic comparison of BEN2908 and related E. coli strains, and identifying candidate loci for virulence that warrant experimental validation. The current manuscript is mainly descriptive. Below, I list some comments that need addressing before I believe it is suitable for publication.

Major Concerns

[1] Besides the strain of interest (BEN2908), 13 other E. coli strains are used to construct a phylogenetic tree. Instead of using the same 13 strains for the following analysis, the ring images are generated by comparing BEN2908 versus three other APEC strains, one AIEC strain (LF82), and one K-12 strain. Why only select a subset of the 13 strains for ring comparison?

[2] More details are needed to ensure the robustness of the phylogenetic analysis. When using MUSCLE, state whether ambiguous regions were trimmed and whether only the single-copy orthologs were included. For RAxML, report detailed bootstrap support values and/or compare the resulting topology under different substitution models.

[3] When constructing the phylogenetic tree, is it possible to incorporate some enteric pathotypes, such as E2348/69 (EPEC reference strain) or EDL933 (EHEC reference strain)?

[4] Around 23 genomic regions are identified in common between STc95 model strains and LF82. Since NRG857c is often compared with LF82 as representatives of AIEC, how many of these 23 genomic regions are also detected in NRG857c?

[5] When describing gene functions, I found it hard to distinguish functions predicted by KEGG from those demonstrated by experiments. The authors could emphasize functions that have been confirmed by both bioinformatics and bench.

Minor Comments

[6] Citation numbers 1 to 5 are missing in the main manuscript.

[7] Line 136, the software name “Orthofinder” should be “OrthoFinder”.

[8] Low figure resolution renders some key features unclear.

[9] The bibliography is extensive but not necessarily relevant. The authors should prioritize conclusions and related bibliography to emphasize the most novel findings.

Reviewer #3: Introduction

line 38/39- a general definition of avian colibacillosis is needed to help appreciate the significance of the work. I understand that it is not well defined, but perhaps just a general definition/classification of the disease... AC is a major cause of mortality, systemic bacterial infection, affects X% of poultry, costs to poultry farmers, something like that. The authors give slightly more information in the second paragraph, but it would help the reader to move some of the information to the first paragraph.

line 65- misplaced comma after "although". There are a few awkward sentences in the introduction and it should be re-read carefully and edited for clarity and ease of reading.

Line 87- if the paper is about both BEN2908 and LF82, why is BEN2908 the only strain mentioned in the title and why is LF82 not mentioned more in the abstract?

General comment: I think the introduction could be restructured for better flow of information, especially for the benefit of genomic science/bioinformatics readers who are not E. coli experts. There are some instances where information is initially referenced, but the context is not added until later. For example, AIEC is first mentioned in line 63 by way of comparison with BEN2908, but the importance/clinical context of AIEC is not explained until the next paragraph, on line 71. Other examples: the ST complex information in lines 38 and 42-47 could be put into better context (a brief mention of the difference between STc and ST would be helpful for readers who are not intimately familiar with E. coli but are interested in the paper for its comparative genomic aspect), as could the relationship between AIEC and ExPEC strains (paragraph starting on line 71).

Methods

Genomic comparison: pathotype acronyms should be defined. APEC and AIEC are previously defined, but NMEC and UPEC are not.

Results

Line 202: what is the rationale for only looking at fimH and not other components of type 1 fimbriae, such as FimA?

Line 208: the mention of x7122's sequence similarity to K12 feels distracting from the manuscript's goal of comparing BEN2908 to the rest and from the stated goal of the paragraph (in line 198).

Line 212: missing word "strains" after STc95 and before "(fumC38, mdh17, and recA26)".

Line 224-226: awkward sentence structure (and grammatical error) starting with "if considered (considering) exclusively".

Line 234: "results from the PHASTEST program went in the other direction" should be rephrased to clearly state the result shows a lack of commonality rather than using casual language and asking the reader to infer the result.

Line 238: please state the plasmid sizes in the text in addition to their location in the table.

Line 238: it might be interesting to also state what genes are found on the larger plasmids that are not found on pBEN2908.

Line 239: define and describe the characteristics of a ColV-like plasmid. Colicins are not mentioned in the text prior to this point, nor are they addressed/explained in this paragraph in a meaningful way.

Line 240: the authors state that "all genes related to a ColV-like plasmid are present on pBEN2908", but these genes are not explicitly highlighted in the text or in Figure 2. Please add an indication of which genes these are, either in the text or in the figure.

Line 241: "APEC plasmids are also known to encode various virulence factors" please confirm in the text that these are not only "known to be encoded" but ARE, in fact, encoded in the plasmid of interest.

Line 265: "pO83-CORR plasmid of strain NRG857c also carries all of the ColV‐associated genes" is this expected based on its ST, etc? Please note the ST or pathotype or such in the text so that the reader does not have to frequently flip back to Table 1 to look at the most important details and understand why this specific strain is being discussed for comparison.

Line 273: this level of significance would be great earlier on in this section of the results to help explain the significance of ColV plasmids. Include something like this up front to address my earlier comment on line 239.

Line 328: remove "the" at the beginning of the sentence. Start with "Genomic regions....". This also happens throughout the next few paragraphs- the authors switch back and forth between referring to GRs as "the GR #" or "GR #". Please choose one convention and stick with it, though I recommend removing "The" for ease of reading.

Line 336: The reference to "the authors showed" on this line is not clearly connected to citation #21.

Line 338-341: awkward sentence phrasing.

Line 352: please change "Aside from those" to more conventionally professional language, such as "Additionally".

Line 355: What is the significance of the catabolism of threonic acid to this paper that warrants its specific inclusion? Please elaborate. This paragraph in general is missing an explanation of the named genes' functions, unlike the surrounding paragraphs which do include that information.

Line 380: "pathway" should be plural.

Line 382: Potential rephrase, instead of using vice-versa: "...named UxuB, catalyses the reversible interconversion of..."

Line 426: The discussion of the interaction between Fim proteins does not include the function of FimH. Why, then, is this the Fim protein being analyzed (see also the comment on line 202)?

Conclusion

The limitations of the study should be acknowledged when drawing conclusions.

General comment: Casual and/or clunky language is used throughout the text. It should be carefully reviewed to ensure professionalism in the writing. Overall, the science is sound and the analysis is satisfactory.

Instances of "[author] and coll." should be edited to "[author] et al"

Reviewer #4: The manuscript “Sequencing of the Invasive E. coli Strain BEN2908 Isolated from Poultry: A Comparative Investigation of Genomic Regions Shared with Other Invasive Model Strains” by Martins et al. provides insight into sequence similarity between the invasive strain BEN2908 from poultry and previously reported E. coli genomes. The study identified sequence diversity as well as conserved regions in the genome and their biological significance. The following are minor comments on the manuscript.

1. References for the genes that contribute to the virulence of E. coli (line 280) should be provided

2. In line 335, reference 21 appears in italics, and reference 153 is also italicized; please ensure consistent formatting.

3. The references numbered 154–159 are listed in the reference section but not cited in the main text.

4. At line 609, the temperature unit is incorrectly written as ‘Celsius’; it should be expressed in °C, consistent with the formatting used throughout the manuscript

5. Please provide a reference for the previously described GRs in E. coli mentioned in line 413.

6. Please include an appropriate reference to support the statement in line 531.

Reviewer #5: The authors have sequenced the BEN2908 genome, compared it with other model genomes of ExPEC strains, and identified some features unique to BEN2908 (that is, not shared with the other genomes) that may contribute to pathogenesis.

Introduction: In the introduction, the authors describe the limited understanding of the pathophysiology of avian colibacillosis. In this context, they have chosen ST95 strains, particularly BEN2908, a well-studied strain. However, it is unclear why they also selected strain LF82—an adherent-invasive E. coli (AIEC) commonly associated with Crohn’s disease in humans—for comparison. Given that the focus of the study is the characterization of an extraintestinal pathogenic E. coli (ExPEC), what was the rationale for including a strain that is primarily an intestinal pathogen?

Methods: If the genome sequencing was outsourced to the Genome and Transcriptome Facility at Bordeaux, France, technical details can be reduced to a minimum.

The genome of strain BEN2908 was submitted in 2022 (accession numbers LR740776.1 and LR740777.2). This study has compared the BEN2908 genome with 13 other genomes. The comparison was done with four APEC, two AIEC, two NMEC, three UPEC and two commensal E. coli strains. What was the rationale for comparing against just these categories?

Results & Discussion

The authors could have incorporated a larger set of genomes readily available from public databases, rather than limiting the analysis to just 14 genomes representing various ST types, pathotypes, and serotypes. This limited selection may reduce the resolution of the phylogenomic tree and the accuracy of virulome comparisons. The authors could have included a larger set of genomes and compared with a pangenome approach to get a wider and rather better picture on the virulome of strain BEN2908.

Fig 2 – Compares only plasmids p1ColV5155 (IMT5155), pAPEC-O1-ColBM (APEC O1), and pAPEC-1 (χ7122) with that of BEN2908. What about plasmids from other genomes?

Page 21: lines 284-286 - The BEN2908 genome was set as the reference and five other strains as subjects: three model APEC strains (IMT5155, APEC O1, χ7122), the commensal K-12 strain MG1655, and AIEC strain LF82. Why were only these genomes compared?

With the BRIG analysis, BEN2908 was used as the reference genome and other genomes were compared to the reference genome. Comparing BEN2908 with other reference genomes would have given better insight into what BEN2908 was lacking.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Florencia Martino, PhD

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

Reviewer #5: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures

You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation.

NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications.

Attachment

Submitted filename: PONE-D-25-49371_reviewed.pdf

pone.0342894.s008.pdf (100.4KB, pdf)
Attachment

Submitted filename: Review Report-PONE-D-49371.docx

pone.0342894.s009.docx (16.6KB, docx)
PLoS One. 2026 Feb 23;21(2):e0342894. doi: 10.1371/journal.pone.0342894.r002

Author response to Decision Letter 1


18 Dec 2025

Response to Reviewers comments:

Reviewer #1: Peer Review: “Sequencing of the Invasive E. coli Strain BEN2908 Isolated from Poultry: A Comparative Investigation of Genomic Regions Shared with Other Invasive Model Strains” PONE-D-25-49371

Recommendation: Major Revision

Reviewer Comments

This is a very complete and carefully executed study. The authors present a full genome assembly and comparative analysis of the APEC strain BEN2908, with integration of plasmid features and comparison to other ExPEC and AIEC strains. The manuscript is generally well organized and within scope for PLOS ONE.

That said, to fully meet the journal’s standards of technical rigor, reproducibility, and cautious interpretation, I recommend major revision.

We thank the reviewer for these insightful and constructive comments, which have helped us to significantly improve our manuscript.

Global Comments

1. The manuscript would benefit from a clearer description of the origin of strain BEN2908. Please specify the sample type, host species, geographic origin, year of isolation, and whether this strain was previously described or deposited in a public collection. These details are critical to contextualize the comparative analysis with other ExPEC and AIEC isolates.

BEN2908 has been described in the introduction with the requested information (lines 70-73) and the following text was added in the Data availability section (lines 260-261) regarding deposition in a public collection: “BEN2908 strain has been deposited at the International Center for Microbial Resources—Bacterial Pathogens (CIRM-BP) under name CIRMBP-1386.”

2. The genomic methodology is described in commendable detail, but essential elements are still missing to evaluate assembly quality and reproducibility. In particular, The Introduction is well written and easy to follow, but the narrative would be strengthened by briefly introducing the main virulence and metabolic genes that are later discussed in the Results (e.g., iron uptake systems, T6SS loci, dicarboxylate transporters). Providing this context early on would prepare the reader for the subsequent comparative analysis.

The Introduction was restructured to mention the importance of metal uptake in APEC infection, the recently shown relation of dicarboxylates in APEC virulence (lines 54-60), and the contribution of the T6SS to both APEC and AIEC pathogenicity (lines 102-106).

3. In the Methods, the description of the Illumina sequencing is too limited. While the platform (MiSeq), read number, length, and GC content are reported, key information is missing for reproducibility and assessment of data quality. Please, provide coverage statistics across the chromosome and plasmid, QC metrics (e.g., FastQC), and details on any trimming or filtering steps prior to assembly. Given that the data were generated in 2015, it would also be important to clarify whether quality was re-evaluated before use in this study. Finally, to meet PLOS ONE’s data availability requirements, the raw Illumina reads should be deposited in SRA with accession numbers included in the Data Availability Statement.

The requested information was added to the Methods “BEN2908 DNA extraction, sequencing, assembly and annotation” section (lines 143-151) and the sequencing files mentioned were submitted to GitHub, under the following repository: https://github.com/Martins-TW/BEN2908_Genome_Analysis.git

SRA files were submitted to BioProject PRJNA1359407 (lines 258-259).

4. Also, provide the version/date for the web-based tools to ensure reproducibility (especially the tools that use online databases).

All versions and releases were added to their respective Methods sections.

5. The RAxML tree must include bootstrap support values. At present the conclusions about STc95 and STc135 are not supported.

To improve phylogenetic support, we removed the draft genome of APEC SCI-07 and added 10 model InPEC strains. Our intent with this new selection is to show that AIEC STc135 and the ExPEC strains phylogenetically close to BEN2908 share more similar orthologues than AIEC STc135 and InPEC strains do. This is supported by the phylogenetic clustering of these strains, bootstrap values, Average Amino Acid Identity (AAI) comparisons, and consistent tree topology using different substitution matrices (available in the GitHub repository: link).

6. Please specify the BLAST settings (identity, coverage thresholds, word size, or e-value) used to define the genomic regions (GRs), and justify the specific ≥4 kb cutoff, as this arbitrary threshold affects which loci are considered “GRs.”

BLAST was run with BLAST+ using default blastn parameters (word_size=11; reward=2; penalty=−3; gapopen=5; gapextend=2; e-value=10). BRIG visualization thresholds were set to 90% (upper) and 70% (lower) identity; these thresholds control the colour intensity of the ring (lines 221-223). As suggested in the comment below, the BRIG alignment files containing coverage and other data were uploaded to the GitHub repository (link).
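For readers who want to rerun an alignment outside BRIG, the parameters listed above can be written out as an explicit BLAST+ command. The sketch below only assembles the argument list and prints it; the file names are placeholders, and the use of `-task blastn` is an assumption, made because the listed values match that task's defaults.

```python
import shlex

# Values as listed in the response above. They match the defaults of
# `-task blastn`; selecting that task here is an assumption.
BLASTN_PARAMS = {
    "word_size": 11,
    "reward": 2,
    "penalty": -3,
    "gapopen": 5,
    "gapextend": 2,
    "evalue": 10,
}

def blastn_command(query, subject, out):
    """Return the blastn argument list for one query/subject pair."""
    cmd = ["blastn", "-task", "blastn",
           "-query", query, "-subject", subject,
           "-outfmt", "6", "-out", out]
    for flag, value in BLASTN_PARAMS.items():
        cmd += [f"-{flag}", str(value)]
    return cmd

# Placeholder file names, not the authors' actual files.
cmd = blastn_command("BEN2908.fasta", "MG1655.fasta", "BEN2908_vs_MG1655.tsv")
print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch runnable without a BLAST+ installation; the list can be passed to `subprocess.run` once the real input files are in place.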

The ≥4 kb cutoff was chosen for the following reasons:

First, it corresponds roughly to 3-4 genes given the typical gene density in Escherichia coli (~1 gene per ~1 kb; some references on NCBI: K-12 (link), BEN2908 (link), LF82 (link)). Thus, a 4 kb region is large enough to capture small functional clusters such as operons or adjacent co-functional genes, while still considering intergenic spaces and genetic elements that could not be identified by the characterization programs we used. Second, a 4 kb threshold has precedent in analogous genomic reports (e.g., genome announcements for APECO1 (link) and IMT5155 (link)), which facilitates comparison with closely related E. coli strains (lines 224-228).

In addition, we note that some previously described genomic islands in E. coli (for example, GimB (GR 25) and PAI-X (GR 4)) are <5 kb in size (S2 Table), so adopting a cutoff larger than 4 kb could exclude these biologically relevant islands.
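The arithmetic behind this rationale can be sketched in a few lines. The density of ~1 gene per ~1 kb is the approximation quoted above; the 4,500 bp island is a hypothetical value chosen only to illustrate a region shorter than 5 kb, not a measured island size.

```python
def approx_gene_count(region_bp, bp_per_gene=1000):
    """Rough number of genes in a region, assuming ~1 gene per ~1 kb."""
    return region_bp / bp_per_gene

def passes_cutoff(region_bp, cutoff_bp=4000):
    """True if a region is long enough to be retained as a GR."""
    return region_bp >= cutoff_bp

# A 4 kb region holds roughly four genes at this density.
print(approx_gene_count(4000))

# Hypothetical island shorter than 5 kb: kept at 4 kb, lost at 5 kb.
island_bp = 4500
print(passes_cutoff(island_bp, cutoff_bp=4000))
print(passes_cutoff(island_bp, cutoff_bp=5000))
```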

7. In addition, while Tables S1–S2 summarize GC content, coverage, and identities, the underlying files (BRIG project, FASTA sequences of each GR, and CDS functional annotation tables) should be deposited in a public repository (for example Github, to allow replication). If it is available, scripts should be public on GitHub or public repositories, or more detailed in a section of supplementary methods.

The BRIG alignment files, FASTA sequences of each GR, their annotations generated by RAST, the python scripts used, as well as EzAAI, RAxML, Orthofinder, Roary, PHASTEST outputs and the sequencing files mentioned in methods section, are available on GitHub at the following repository (lines 257-259): https://github.com/Martins-TW/BEN2908_Genome_Analysis

8. In Table 2 and the Figure, some plasmids are analyzed but are not available for download. If pLF82 data were taken from an older publication, the comparative analysis is only partially reproducible; please provide the accession numbers or upload the sequence to a public repository.

The section was revised so that the analysis no longer relies on plasmid comparison with pLF82, which is not available for download. The information we included about pLF82 in Supplementary Table S7 was extracted from the description provided in the supplementary material of Miquel et al. (2010).

9. For the functional assignments, please clarify the criteria for selecting “reviewed” vs “unreviewed” UniProtKB hits and provide the accession numbers of representative proteins. Without these details, other researchers cannot reproduce or verify the classification of features.

A better description of the reviewed vs unreviewed criteria was added to the “Ring Image generation and CDS functional characterization” Methods section (lines 243-247). The accession numbers of representative proteins from the UniProt database are available in Table 5 and S2 Table.

10. Please temper the VFAnalyzer/PHASTEST claims (from lines 221) by specifying identity/coverage thresholds, depositing the full outputs (Tables S4–S5 with raw hits), and avoiding absolute statements like “shares all” without qualification.

PHASTEST full outputs were deposited in the GitHub repository and VFanalyzer full output is S5 Table with pathotype description and line coloring included. We also provided a description of how the programs were used in the “Genomic comparison and characterization of E. coli strains” Methods section (lines 179-186). The sentences were also modified to better attribute findings to the respective program outputs avoiding unqualified statements.

11. The same situation applies in the plasmid section (from line 237): the description of pBEN2908 is super clear, but the statement that “all genes related to a ColV-like plasmid are present” is too absolute without showing the evidence in detail. A table listing each ColV-associated locus (e.g., iuc/iut, iro, etc.) with accession, coverage, and identity values would be useful, with the Inc or MOB type groups in the same table.

As suggested, Table 2 now lists ColV-associated loci containing accession number, coverage, identity values, and PlasmidFinder identification. We also updated Figure 2 to include the same strains shown in Table 2.

12. In the Sugar Metabolism section (from line 328), please distinguish more clearly between functions previously validated in the literature (e.g., frz operon, GimA) and novel pathway predictions based on low-identity ORFs, so that readers can separate established evidence from putative annotations. Similarly, in the identification of iron and metal uptake systems (from line 404), the phrasing “we identified systems related to metal uptake” should be revised to clarify that this refers to the detection of previously described operons through genome annotation, rather than novel experimental findings.

In the Adhesion and Invasion section (line 412), please clarify that operons and loci (fim, auf, fdeC, ila) were detected through annotation rather than experimentally demonstrated, and temper the functional claims for ORF 1₁₁ as putative predictions based on moderate homology.

To clarify when the referred genes were already studied and when they were predicted, we divided each functional category (SM, Bf, A/I, …) into a Described and an Uncharacterized subsection. We also added a more careful statement when referring to low-identity ORFs, like ORF 16, mentioned in the Sugar Metabolism section (lines 466-470). GR 11 (and therefore ORF 1₁₁) was removed because we redid our alignments and found that commensal K-12 has more than 50% coverage and above 90% identity to GR 11. After this curation, 36 genomic regions remained absent from K-12 MG1655 and present in all strains from the ring (Fig. 3; S1 Table).

13. The Discussion is very detailed and in parts can be excessive. Several subsections (e.g., Sugar Metabolism, General Metabolism) repeat pathway-level explanations that could be streamlined or moved to Supplementary Material. A more concise discussion, with selective emphasis on novel findings, would greatly improve readability. Consider moving some of the descriptive detail into the Introduction (to set context) or into Supplementary Notes, while keeping the main text focused on the comparative insights.

As suggested, some of the text was removed from the discussion and some were modified to fit in the introduction.

14. In the Conclusion, please temper the statement on long-term conservation of uncharacterized ORFs, which is only inferred from two isolates 21 years apart, and explicitly clarify that all proposed functions remain hypothetical pending experimental validation.

As suggested, the statement was tempered and broadened to clarify that these GRs are also present in several ExPEC strains, but that our predictive analysis requires experimental validation (lines 769-772).

Minor revisions:

15. The statement that these findings “reinforce the importance in government monitoring of bacterial genomic evolution” (line 276) introduces a policy-oriented opinion that goes beyond the scope of the presented data. I recommend rephrasing this sentence to remain focused on the scientific evidence (i.e., the role of ColV-like plasmids in virulence).

Thanks for the comment, the suggested statement was removed.

16. Replace “softwares” with “software,” shorten long passive sentences.

done

17. Reframe causal or absolute statements (“shares all genes,” “ancestral proximity”) into cautious, data-supported wording (“shares X genes above threshold Y,” “phylogenetic proximity suggested but requires bootstrap support”).

Absolute statements were avoided, and wording was adjusted to reflect threshold-based interpretations.

Specific Comments

• Line 71: Please make explicit that you are now referring to human disease context.

added

• Line 93: Be careful with the term “cassette gene.” Are you referring to integron cassettes or simply modular groups of mobilizable genes? Clarify the terminology.

changed to genomic modules and gene modules

• Line 104: Explain or cite the “minor modification” of the ONT nanopore protocol. Without details, it is not reproducible.

Thank you for your comment. In fact, the minor changes are those indicated in the following sentences. The text has been amended as follows: Sheared DNA was End-Repaired using Oxford Nanopore recommendations for 1D Ligation sequencing (LSK-SQK 108), with minor modifications, as follows (line 129).

• Line 120: Please, provide the RAST version

done

• Lines 113–115: Wording issue: “The obtained raw data (.fast5 files) was base called…” → should be “The raw data were basecalled with Guppy v4.0.11 using [config].” Please also specify qscore filter and parameters.

The text was modified accordingly, and the Guppy mode and qscore filter were specified (lines 139-142). The NanoPlot file generated comparing read quality before and after filtering was submitted to GitHub in the following repository: https://github.com/Martins-TW/BEN2908_Genome_Analysis

• Line 210–215: The statement that STc95 is closer to AIEC STc135 than to APEC SCI-07 cannot stand without bootstrap support. Please provide node support and, ideally, ANI/AAI values.

To improve phylogenetic support, we removed the draft genome from APEC SCI-07 and added 10 model InPEC strains. The new tree is supported by the phylogenetic clustering of these strains, bootstrap values, Average Amino Acid Identity (AAI) comparisons, and consistent tree topology using different substitution matrices (available in the GitHub repository; link).

• Line 222–229: When describing shared virulence genes with LF82, avoid absolute terms like “all 76 genes.” Instead, specify thresholds for identity/coverage and include a full gene list in Supplementary Materials.

VFanalyzer full output is S5 Table with pathotype description and line coloring included. We also provided a description of how the programs were used in the “Genomic comparison and characterization of E. coli strains” Methods section (lines 179-186). As suggested, the sentences were also modified to avoid absolute terms.

• Line 300–311 (Table 3): Provide coverage/identity values for each “genomic region” absent or present, or the cutoff used to declare presence/absence.

S1 Table contains coverage and identity values for the 22 strains, and the cutoff was made explicit in the S1 Table comment and the Table 3 title (lines 420-424), as well as in the “Overview and metabolic functions of the GRs in common between ExPEC and AIEC strains” Results and Discussion section (lines 398-400).

• Line 461–466: The CRISPR comparison is interesting, but please provide a full spacer table with coordinates and system subtype classification to avoid confusion.

S3 Table was built to contain the requested information.

• Line 650–656 (Conclusion): The wording “attests the utility and importance” is too strong. Suggest softening to “suggests possible functional utility, subject to experimental validation.”

As suggested, we rephrased the conclusion to adopt a more tempered tone as follows: “In spite of that, this is a predictive work and experimental validation for these in silico analysis is necessary to define the true functionality of the novel ORFs identified” (lines 766-768).

Recommendation: Major revision. This is a very complex a

Attachment

Submitted filename: response reviewers PLOS ONE PONE-D-25-49371_final.docx

pone.0342894.s011.docx (3.1MB, docx)

Decision Letter 1

Feng Gao

11 Jan 2026

PONE-D-25-49371R1

Sequencing of the invasive E. coli strain BEN2908 isolated from poultry: a comparative investigation of genomic regions shared with intestinal and extraintestinal model E. coli strains

PLOS One

Dear Dr. Schouler,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 25 2026 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Feng Gao

Academic Editor

PLOS One

Journal Requirements:

If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: N/A

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Reviewer Report

I reviewed the revised submission together with the authors’ responses. Overall, the revision is clearly improved compared with the previous version. The manuscript reads more coherently, the methods are more reproducible (tool versions and key phylogenetic settings are now provided), and the comparative genomics results are presented in a clearer narrative. The authors have addressed most of the substantive concerns raised previously.

That said, there are still several points that should be corrected before acceptance.

Major points

1) Inconsistency in T6SS genomic region (GR) assignment.

In the “Secretion systems” section, the authors report T6SS in GR 11 and GR 19, but later state: “One set containing more than fifteen CDS … were found occurring in GR 20.” This does not match Table 4, where the larger T6SS cluster corresponds to GR 19 and modules are present in GR 11. Please verify and correct the GR numbering in the text (very likely GR 20 should be GR 19) so that the manuscript is internally consistent.

2) The Illumina read retention after filtering is extremely low and needs clarification.

The manuscript reports 32,274,430 MiSeq reads but only 1,713,025 reads retained after trimming and quality filtering. This is an unusually large reduction (approximately 95% discarded). This may reflect stringent filtering or an underlying quality issue, but the manuscript should explicitly state the filtering criteria (quality threshold, minimum length) and clarify whether counts refer to reads or read pairs. A short justification would help readers interpret this and reduce concern about data quality. Please clarify in methods the execution environment and provide version/parameters accordingly of “fastq_quality_filter”

3) Define explicitly what “absent in MG1655” means.

The ring comparison is built around calling genomic regions “absent in MG1655,” but the manuscript does not provide a clear operational definition of absence (for example, coverage and identity thresholds). Since the authors already define inclusion thresholds in Table 3 (>50% coverage; “partial” <70%), the same explicit definition should be given for “absence” (pages 27 and 36).

Minor points

Page 24: “Read quality and length distributions were then assessed using.” Has an extra “.”

Page 36: (line 389) Double colon (“subjects: : two APEC”)

Page 29: Double period in Data availability (“LR740777.2. . Raw reads”)

Harmonize strain and plasmid naming conventions (APEC O1 vs APECO1). The manuscript alternates between “APEC O1” and “APECO1” (including in table headers). Please adopt a single convention for strain names and plasmid names and apply it consistently throughout.

Page 30: Table 1, rephrase the fimH2343 footnote for clarity. The note stating that fimH2343 results in the fimH5 translated amino acid sequence is difficult to interpret as written and mixes allele nomenclature with protein identity. Please rewrite this footnote more precisely (nucleotide change and resulting amino acid equivalence).

Page 44: Maintain consistent framing of homology-based inferences. In a few places, the wording still suggests experimental identification rather than annotation and/or prediction. Minor edits (“annotated,” “predicted,” “detected operons involved in…”) would better align with the evidence (especially for metal uptake and pathway reconstruction).

Standardize the way the ring comparison strain set is counted. At one point the manuscript refers to “7 strains phylogenetically close,” while elsewhere it describes a comparison set including eight strains (plus a commensal reference). Please standardize the counting and wording to avoid confusion.

Add a brief limitation statement for the metabolic pathway reconstruction. The KEGG-based reconstruction is plausible, but it would be useful to add one sentence noting that this is inferred from homology-based KO assignments and requires experimental validation of regulation and transport specificity.

Recommendation

This is a strong revision and I think it can be accepted after the authors address the points above. The remaining issues are mainly about internal consistency (notably the T6SS GR numbering), interpretability/reproducibility (explicit thresholds; Illumina filtering and read counts), and copyediting (punctuation and terminology). Addressing these items will strengthen the manuscript and reduce the risk of additional questions in a subsequent round.

Reviewer #2: Thank you for addressing all my concerns and comments. Congratulations on the manuscript. I would appreciate it if the figures could be provided at a higher resolution.

Reviewer #3: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Katherine A. Innamorati, Ph.D.

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures

You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation.

NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications.

PLoS One. 2026 Feb 23;21(2):e0342894. doi: 10.1371/journal.pone.0342894.r004

Author response to Decision Letter 2


23 Jan 2026

Reviewer #1:

I reviewed the revised submission together with the authors’ responses. Overall, the revision is clearly improved compared with the previous version. The manuscript reads more coherently, the methods are more reproducible (tool versions and key phylogenetic settings are now provided), and the comparative genomics results are presented in a clearer narrative. The authors have addressed most of the substantive concerns raised previously.

That said, there are still several points that should be corrected before acceptance.

We thank you for your thorough review of our document, which has enabled us to correct the remaining errors.

Major points

1) Inconsistency in T6SS genomic region (GR) assignment.

In the “Secretion systems” section, the authors report T6SS in GR 11 and GR 19, but later state: “One set containing more than fifteen CDS … were found occurring in GR 20.” This does not match Table 4, where the larger T6SS cluster corresponds to GR 19 and modules are present in GR 11. Please verify and correct the GR numbering in the text (very likely GR 20 should be GR 19) so that the manuscript is internally consistent.

Indeed, GR 20 should have been GR 19; thank you for the comment. The text was modified accordingly (line 613).

2) The Illumina read retention after filtering is extremely low and needs clarification.

The manuscript reports 32,274,430 MiSeq reads but only 1,713,025 reads retained after trimming and quality filtering. This is an unusually large reduction (approximately 95% discarded). This may reflect stringent filtering or an underlying quality issue, but the manuscript should explicitly state the filtering criteria (quality threshold, minimum length) and clarify whether counts refer to reads or read pairs. A short justification would help readers interpret this and reduce concern about data quality. Please clarify in methods the execution environment and provide version/parameters accordingly of “fastq_quality_filter”.

Thank you for your observation. Indeed, the number was mistaken; it was actually 3,227,430 reads and not 32,227,430. To clarify each step of the processing, the text was modified to address your comments: “Illumina reads were trimmed for adapters and low-quality bases using Trimmomatic (v. 0.32)(38), with the following parameters: ILLUMINACLIP:TruSeq3-PE.fa:2:30:10; LEADING:30; TRAILING:30; HEADCROP:20; MINLEN:150. This resulted in 2,476,272 paired reads with minimum length of 150 bp, which were further filtered with a Q20 cutoff using fastq_quality_filter (v. 1.0.0), available on Galaxy platform (v. 25.0)(39). After filtering, a total of 1,713,025 reads were retained, yielding a coverage depth of 282x and breadth of 99.86%, as assessed by BWA (v. 0.7.19)(40).” (lines 148-153)
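The effect of the corrected read count on the retention figure can be checked with a short calculation. The sketch below uses only the counts quoted in this exchange; it assumes that all three counts are expressed in the same unit (reads rather than read pairs), which the quoted text does not state unambiguously.

```python
def retained_fraction(before, after):
    """Fraction of reads kept by a processing step."""
    if before <= 0:
        raise ValueError("read count before filtering must be positive")
    return after / before

# Counts quoted in the response above (assumed to share one unit).
raw_reads = 3_227_430
after_trimmomatic = 2_476_272
after_q20 = 1_713_025

steps = [
    ("Trimmomatic", raw_reads, after_trimmomatic),
    ("Q20 filter", after_trimmomatic, after_q20),
    ("Overall", raw_reads, after_q20),
]
for label, before, after in steps:
    print(f"{label}: {retained_fraction(before, after):.1%} retained")

# Count originally queried by the reviewer, for comparison.
reviewer_count = 32_274_430
print(f"Discarded under the original figure: "
      f"{1 - retained_fraction(reviewer_count, after_q20):.1%}")
```

Under the corrected count, roughly half of the reads survive both steps, compared with about 5% under the figure the reviewer queried.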

3) Define explicitly what “absent in MG1655” means.

The ring comparison is built around calling genomic regions “absent in MG1655,” but the manuscript does not provide a clear operational definition of absence (for example, coverage and identity thresholds). Since the authors already define inclusion thresholds in Table 3 (>50% coverage; “partial” <70%), the same explicit definition should be given for “absence” (pages 27 and 36).

An explicit definition of absence was added to lines 239 and 411.
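An operational definition of this kind amounts to a small threshold rule. The sketch below is illustrative only: the 50% and 70% coverage boundaries come from the reviewer's summary of Table 3, while treating everything at or below 50% coverage as "absent" is an assumption, since the definition added to the manuscript is not reproduced in this letter. The function name and labels are hypothetical.

```python
def classify_region(coverage_pct, present_min=50.0, partial_max=70.0):
    """Label a genomic region by its alignment coverage in one strain.

    Assumed rule: coverage at or below `present_min` is 'absent',
    coverage above it but below `partial_max` is 'partial',
    and anything higher is 'present'.
    """
    if not 0.0 <= coverage_pct <= 100.0:
        raise ValueError("coverage must be a percentage between 0 and 100")
    if coverage_pct <= present_min:
        return "absent"
    if coverage_pct < partial_max:
        return "partial"
    return "present"

# Hypothetical coverage values for one region across three strains.
for strain, coverage in [("strain A", 12.0), ("strain B", 63.5), ("strain C", 98.2)]:
    print(strain, classify_region(coverage))
```

An identity threshold could be added as a second argument in the same way if the definition in the manuscript also depends on it.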

Minor points

Page 24: “Read quality and length distributions were then assessed using.” Has an extra “.”

done

Page 36: (line 389) Double colon (“subjects: : two APEC”)

done

Page 29: Double period in Data availability (“LR740777.2. . Raw reads”)

done

Harmonize strain and plasmid naming conventions (APEC O1 vs APECO1). The manuscript alternates between “APEC O1” and “APECO1” (including in table headers). Please adopt a single convention for strain names and plasmid names and apply it consistently throughout.

done, APEC O1 was defined as the convention.

Page 30: Table 1, rephrase the fimH2343 footnote for clarity. The note stating that fimH2343 results in the fimH5 translated amino acid sequence is difficult to interpret as written and mixes allele nomenclature with protein identity. Please rewrite this footnote more precisely (nucleotide change and resulting amino acid equivalence).

We agree with your comment, and the footnote was edited as follows: “¹ The allele fimH2343 differs from fimH15 by a single nucleotide mutation (537 G>A), resulting in a non-synonymous substitution on the amino acid sequence (180 G>S).” (lines 293 and 294)

Page 44: Maintain consistent framing of homology-based inferences. In a few places, the wording still suggests experimental identification rather than annotation and/or prediction. Minor edits (“annotated,” “predicted,” “detected operons involved in…”) would better align with the evidence (especially for metal uptake and pathway reconstruction).

To make a clear distinction between predicted and experimental identification we modified some paragraphs and made wording edits (lines 458-468, 476-479, 484-485, 531, 544, 566, 581, 601, 630, 632, and 786).

Standardize the way the ring comparison strain set is counted. At one point the manuscript refers to “7 strains phylogenetically close,” while elsewhere it describes a comparison set including eight strains (plus a commensal reference). Please standardize the counting and wording to avoid confusion.

To avoid confusion regarding the phylogenetically close strains we made wording edits (lines 311, 323, and 407).

Add a brief limitation statement for the metabolic pathway reconstruction. The KEGG-based reconstruction is plausible, but it would be useful to add one sentence noting that this is inferred from homology-based KO assignments and requires experimental validation of regulation and transport specificity.

We agree with your comment, and an explanation was added to the “Genomic comparison and characterization of E. coli strains” Methods section: “The KEGG Pathway (KP) and KEGG Orthology (KO) databases were used by mapping the KO assignment numbers of the uncharacterized ORF homologs identified in this work onto KP maps. This allowed us to predict metabolic pathways potentially related to the molecular functions of the novel ORFs. However, these predictions are based on homology-derived KO assignments and require experimental validation of protein activity, regulation and specificity” (lines 187-192).

Recommendation

This is a strong revision and I think it can be accepted after the authors address the points above. The remaining issues are mainly about internal consistency (notably the T6SS GR numbering), interpretability/reproducibility (explicit thresholds; Illumina filtering and read counts), and copyediting (punctuation and terminology). Addressing these items will strengthen the manuscript and reduce the risk of additional questions in a subsequent round.

Reviewer #2:

Thank you for addressing all my concerns and comments. Congratulations on the manuscript. I would appreciate it if the figures could be provided at a higher resolution.

We thank the reviewer for the comment. Regarding the figure resolution, the PDF with the figures sent to you is generated by PLOS ONE during the submission process, and we have no control over the quality at this step. Nonetheless, as suggested, we improved the quality of the figures.

Attachment

Submitted filename: response reviewers PONE-D-25-49371R2 final.docx

Decision Letter 2

Feng Gao

29 Jan 2026

Sequencing of the invasive E. coli strain BEN2908 isolated from poultry: a comparative investigation of genomic regions shared with intestinal and extraintestinal model E. coli strains

PONE-D-25-49371R2

Dear Dr. Schouler,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager and clicking the ‘Update My Information’ link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Feng Gao

Academic Editor

PLOS One

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for the careful and thorough revision. I re-read the revised manuscript and confirm that you have addressed the substantive points raised previously, including correcting the T6SS genomic region numbering, clarifying the Illumina read counts and filtering parameters (with versions and thresholds), and defining the operational criterion for “absence in MG1655”.

The manuscript is now internally consistent and substantially clearer and more reproducible, and the remaining issues are minor copyediting details that can be handled during production. On this basis, I recommend acceptance for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

Acceptance letter

Feng Gao

PONE-D-25-49371R2

PLOS One

Dear Dr. Schouler,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS One. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Feng Gao

Academic Editor

PLOS One

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Table with GC content, coverage and identity.

    1 GC content calculated with the https://jamiemcgowan.ie/bioinf/gc_content.html web tool. 2 Bolded values have more than 70% coverage. Underscored values have between 50 and 70% coverage.

    (XLSX)

    pone.0342894.s001.xlsx (22.8KB, xlsx)
    S2 Table. Table containing all GRs with reported and uncharacterized features.

    1 Specific hit or non-specific hit with expect value closer to 0. Commented in the cells are the conserved residues identified by CD-Blast. 2 Sequencing performed by Schouler and Trotereau (2016), available at: https://www.ncbi.nlm.nih.gov/nuccore/AY395687.1. 3 Unreviewed entries selected because of their mention in the following papers: ORFs 3₆-5₆ (Thiaville et al., 2016); ORF 2₃₄ (Majumdar et al. 2004); ORF 5₃₄ (Gárcia-Sanchez et al., 2021); and ORFs 9₃₄-13₃₄ (Lim et al., 2015). 4 Unreviewed entry. No reviewed entries found. 5 PHASTEST attL (2101049.2101064) and attR (2167393.2167408) are too far upstream and downstream, respectively, encompassing genes present in all strains, even those lacking phage content. Therefore, the region boundaries were defined using the same criteria applied to the other GRs: end of the first gene common to all strains, beginning of the last gene common to all strains.

    (XLSX)

    pone.0342894.s002.xlsx (55KB, xlsx)
    S3 Table. Table with information regarding spacers and type of CRISPR/Cas system using MinCED and CRISPRCastyper.

    1 CFT073, 78-Pyelo and E2348/69 do not possess a CRISPR/Cas system.

    (XLSX)

    pone.0342894.s003.xlsx (26.8KB, xlsx)
    S4 Table. Table containing the number of spacers shared in each strain.

    1 CFT073, 78-Pyelo and E2348/69 do not possess a CRISPR/Cas system. 2 Spacer sequence is present but displays one to four nucleotide insertions located at its 5′ or 3′ end.

    (XLSX)

    pone.0342894.s004.xlsx (15.5KB, xlsx)
    S5 Table. Table generated using VFAnalyzer (VFDB) with the 23 strains.

    1 Colored cells mark the 85 genes identified in AIEC strains by VFanalyzer, of which: yellow represents the 55 genes present in at least one InPEC, one ExPEC, one commensal, and LF82; green represents the 18 genes present in at least one InPEC, one ExPEC, and LF82; orange represents the 6 genes present in at least one ExPEC and LF82; blue represents the 3 genes present in at least one ExPEC, one commensal, and LF82; and red represents the 3 genes present in at least one InPEC and LF82.

    (XLSX)

    pone.0342894.s005.xlsx (73.7KB, xlsx)
    S6 Table. PHASTEST summary statistics of the 23 strains from this study.

    (XLSX)

    pone.0342894.s006.xlsx (34.3KB, xlsx)
    S7 Table. Plasmid statistics.

    1 The strains IHE3034, CFT073, 55989, SCU-397, and K-12 do not harbour any plasmids. 2 Three plasmids from the strain χ7122 and the plasmid from LF82 were not available for download. 3 pLF82 information was extracted from Miquel et al. (2010) Table 2 and Supplemental S1 Table. 4 RAST annotation failed because of the small plasmid size. Information obtained from GenBank and GC content calculator: https://jamiemcgowan.ie/bioinf/gc_content.html.

    (XLSX)

    pone.0342894.s007.xlsx (20.6KB, xlsx)
    Attachment

    Submitted filename: PONE-D-25-49371_reviewed.pdf

    pone.0342894.s008.pdf (100.4KB, pdf)
    Attachment

    Submitted filename: Review Report-PONE-D-49371.docx

    pone.0342894.s009.docx (16.6KB, docx)
    Attachment

    Submitted filename: response reviewers PLOS ONE PONE-D-25-49371_final.docx

    pone.0342894.s011.docx (3.1MB, docx)
    Attachment

    Submitted filename: response reviewers PONE-D-25-49371R2 final.docx

    Data Availability Statement

    The sequences of the BEN2908 chromosome and plasmid are available in GenBank under the following accession numbers: LR740776.1; LR740777.2. Raw reads were submitted to the Sequence Read Archive (SRA) under the following BioProject: PRJNA1359407. The Python scripts used, BRIG alignment files, FASTA sequences of each GR, their annotations generated by RAST, as well as EzAAI, RAxML, Orthofinder, Roary, PHASTEST outputs and the sequencing files mentioned in the Methods section, are available on GitHub at the following repository: https://github.com/Martins-TW/BEN2908_Genome_Analysis.git. The BEN2908 strain has been deposited at the International Center for Microbial Resources—Bacterial Pathogens (CIRM-BP) under the name CIRMBP-1386.


    Articles from PLOS One are provided here courtesy of PLOS
