Abstract
Atypical enteropathogenic E. coli (aEPEC) belonging to the serotype O177 is a rare strain found in ruminants, especially cattle. Compared with Shiga toxin-producing E. coli (STEC) O157 and non-O157 STEC serotypes (O26, O45, O103, O104, O111, O121, and O145), the antimicrobial resistance, virulence factors, and genomic structure of E. coli O177 are poorly understood. Therefore, in this article, we present the whole genome sequence data of two aEPEC E. coli O177 isolates (E. coli O177_CF-154-A and E. coli O177_CF-335-B) generated using the Illumina MiSeq platform. The raw reads were cleaned using Trimmomatic and assembled using SPAdes. The assemblies were 5,112,402 and 5,460,435 bp in size, comprising 101 and 191 contigs with GC contents of 50.7% and 50.5% for E. coli O177_CF-154-A and E. coli O177_CF-335-B, respectively. The Prokaryotic Genome Annotation Pipeline (PGAP) and Rapid Annotation using Subsystem Technology (RAST) showed that the genome of E. coli O177_CF-154-A contained 5040 coding sequences (CDS), 5146 genes, 4896 proteins, 90 RNAs, and 78 tRNAs, while that of E. coli O177_CF-335-B contained 5463 CDS, 5570 genes, 5230 proteins, 92 RNAs, and 80 tRNAs. A total of 426 and 425 subsystem features with 5190 and 5662 CDS were obtained for E. coli O177_CF-154-A and E. coli O177_CF-335-B, respectively. Several genes encoding virulence and antimicrobial resistance were identified in both genomes. The genome sequence data of both isolates have been deposited in the National Center for Biotechnology Information (NCBI) GenBank database under accession numbers VMKH00000000 (E. coli O177_CF-154-A) and VMKG00000000 (E. coli O177_CF-335-B). These data can be used as a reference for determining the virulence and antimicrobial resistance in E. coli O177 isolates from different sample sources.
Keywords: Escherichia coli O177, Whole genome sequence, Genome annotation, Genomic data, Virulence and Antimicrobial resistance genes
Specifications Table
| Subject | Microbiology |
| Specific subject area | Molecular Microbiology and Bioinformatics |
| Type of data | Table, Figures, Excel Sheets |
| How the data were acquired | Whole genome sequencing was performed using the Illumina MiSeq platform. The FASTQ files were obtained and imported into the KBase platform (https://kbase.us/). The files were subjected to FASTQC (v.0.11.5) to assess read quality. Subsequently, raw data were processed using Trimmomatic (v0.36). Assembly was carried out using SPAdes (v3.13.0), and genome annotation was performed using the Prokaryotic Genome Annotation Pipeline (PGAP), Rapid Annotation using Subsystem Technology (RAST) and the Pathosystems Resource Integration Center (PATRIC). |
| Data format | Raw, filtered and analysed. |
| Description of data collection | Genomic DNA (gDNA) was extracted from two aEPEC O177 isolates (CF-154-A and CF-335-B) obtained from the Department of Microbiology at NWU. The gDNA was sequenced using the Illumina MiSeq platform, and FASTQ files were obtained. Raw reads were quality-checked with FASTQC (v.0.11.5) and assembled into contigs with SPAdes (v3.13.0). Genome annotation was carried out using PGAP (v.2.0) and RAST (v.2.0). The genome maps were drafted using PATRIC (v.3.6.2). |
| Data source location | Institution: North-West University; City/Town/Region: North-West Province; Country: South Africa |
| Data accessibility | Repository name: National Center for Biotechnology Information (NCBI), GenBank, and figshare. Data identification numbers: VMKH00000000 (E. coli O177_CF-154-A) and VMKG00000000 (E. coli O177_CF-335-B); BioProject PRJNA555014 and PRJNA554852, BioSample SAMN12288806 and SAMN12285021 (for E. coli O177_CF-154-A and E. coli O177_CF-335-B, respectively). Direct URL to data: https://www.ncbi.nlm.nih.gov/nuccore/VMKH00000000, https://www.ncbi.nlm.nih.gov/nuccore/VMKG00000000, https://figshare.com/s/e3a60e4a3d918527b572 |
| Related research article | P. K. Montso, C. C. Bezuidenhout, C. Mienie, Y. M. Somorin, O. A. Odeyemi, V. Mlambo, C. N. Ateba. Genetic diversity and whole genome sequence analysis data of multidrug resistant atypical enteropathogenic E. coli O177 strains: An assessment of food safety and public health implications. Int J Food Microbiol. 2022, https://doi.org/10.1016/j.ijfoodmicro.2022.109555. |
Value of the Data
- These data provide genomic features of the E. coli O177 serotype. Moreover, they give extensive information on the virulence and antimicrobial resistance profile of this serotype, which may improve scientific understanding of this pathogenic strain.
- The data may be used by researchers to develop new methods for detecting the E. coli O177 serotype in different environmental samples. In addition, these data can be used in public health to establish policy frameworks and strategies intended to curb antimicrobial resistance, especially in humans.
- These genomes can be used as a reference, especially for comparative genomic and epidemiological studies.
1. Data Description
Two atypical enteropathogenic E. coli O177 isolates (E. coli O177_CF-154-A and E. coli O177_CF-335-B) were obtained from cattle faeces in the North West province, South Africa (27° 00′ 0.00″ S, 26° 00′ 0.00″ E), Fig. 1. Genome sequencing was performed using the Illumina MiSeq platform, and a total of 576.5 Mb (CF-154-A) and 794.3 Mb (CF-335-B) of raw data were obtained. The genome characteristics of the two isolates are summarised in Table 1 and Fig. 2. The genome sizes were 5,112,402 and 5,460,435 bp, comprising 101 and 191 contigs with GC contents of 50.7% and 50.5% for E. coli O177_CF-154-A and E. coli O177_CF-335-B, respectively. The E. coli O177_CF-154-A genome contained 5040 coding sequences (CDS), 5146 genes, 4896 proteins, 90 RNAs, and 78 tRNAs, while the E. coli O177_CF-335-B genome contained 5463 CDS, 5570 genes, 5230 proteins, 92 RNAs, and 80 tRNAs. Furthermore, both genomes contained 2 CRISPR arrays. Based on RAST annotation, there were 426 and 425 subsystem feature counts with 5190 and 5662 CDS in E. coli O177_CF-154-A and E. coli O177_CF-335-B, respectively. As depicted in Fig. 2, carbohydrates; amino acids and derivatives; stress response; respiration; DNA metabolism; protein metabolism; membrane transport; and cofactors, vitamins, prosthetic groups, pigments were the most abundant subsystem features in both genomes. The circular draft genome map shown in Fig. 3 was constructed using CGView [1]. The Virulence and Resistance Gene Identifier revealed that both genomes contained several virulence and antimicrobial resistance genes (Figs. 3–7 and Excel sheets S1 and S2).
Fig. 4.
Distribution of antimicrobial resistance genes in genomes of two E. coli O177 isolates obtained from cattle faeces.
Fig. 5.
Distribution of antimicrobial resistance gene family in genomes of two E. coli O177 isolates obtained from cattle faeces.
Fig. 6.
Antimicrobial drug classes in genomes of two E. coli O177 isolates from cattle faeces.
Fig. 1.
An illustration of the North West province map. https://municipalities.co.za/provinces/view/8/north-west.
Table 1.
Features of draft genomes of two E. coli O177 isolates obtained from cattle faeces.
| Features | E. coli O177_CF-154-A | E. coli O177_CF-335-B |
|---|---|---|
| Genome size | 5,112,402 bp | 5,460,435 bp |
| Genome coverage depth | 124.7x | 162.128x |
| Total length | 5,111,092 bp | 5,459,908 bp |
| GC content (%) | 50.7 | 50.5 |
| Number of contigs | 101 | 191 |
| Contig N50 (bp) | 127,249 | 113,919 |
| Contig L50 | 14 | 15 |
| Number of scaffolds | 101 | - |
| Scaffold N50 (bp) | 130,301 | - |
| Scaffold L50 | 13 | - |
| Coding genes | 4896 | 5230 |
| Total genes | 5146 | 5570 |
| Total CDSs | 5040 | 5463 |
| Total proteins | 4896 | 5230 |
| rRNA (5S, 16S, 23S) | 8, 4, 6 | 7, 4, 6 |
| tRNA | 78 | 80 |
| ncRNA | 10 | 10 |
| CRISPR arrays | 2 | 2 |
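The contiguity and composition figures in Table 1 (total length, GC content, N50 and L50) follow standard definitions and can be recomputed from any contig FASTA file. The sketch below is a minimal illustration of those definitions, not the tooling used to produce the table; the function names `parse_fasta` and `assembly_stats` are hypothetical, and the toy sequences are invented for demonstration only.

```python
def parse_fasta(text):
    """Return the list of sequences found in FASTA-formatted text."""
    sequences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


def assembly_stats(contigs):
    """Compute total length, GC %, N50 and L50 for a list of contigs.

    N50 is the length of the contig at which the cumulative length of
    contigs, sorted longest first, reaches half of the assembly size;
    L50 is the number of contigs needed to reach that point.
    """
    lengths = sorted((len(c) for c in contigs), reverse=True)
    total = sum(lengths)
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    n50 = l50 = 0
    running = 0
    for rank, length in enumerate(lengths, start=1):
        running += length
        if running * 2 >= total:
            n50, l50 = length, rank
            break
    gc_percent = round(100 * gc / total, 1) if total else 0.0
    return {"total_length": total, "gc_percent": gc_percent,
            "n50": n50, "l50": l50}


# Toy example with three invented contigs (not real data).
toy = parse_fasta(">c1\nGGGG\nAAAA\n>c2\nCCCCCC\n>c3\nTTTT\n")
stats = assembly_stats(toy)
```

To apply this to a deposited assembly, the contig FASTA would be read from disk and its text passed to `parse_fasta`; small differences from Table 1 are possible because annotation pipelines may filter short contigs.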
Fig. 2.
Frequency distribution of gene categories in genomes of two E. coli O177 isolates obtained from cattle faeces.
Fig. 3.
The circular genome map of E. coli O177 isolates (CF-154-A and CF-335-B) obtained from cattle faeces. Circles display, from inside to outside: GC skew (light orange), GC content (light purple), drug targets (black), transporters (blue), virulence factor genes (yellow), antimicrobial resistance genes (red), non-CDS features (turquoise blue), CDS reverse strand (light purple) and CDS forward strand (green).
Fig. 7.
Resistance mechanisms in two E. coli O177 isolates from cattle faeces.
2. Experimental Design, Materials and Methods
2.1. Bacterial strain
Two atypical enteropathogenic E. coli O177 isolates were obtained from the Antimicrobial Resistance and Phage Biocontrol Laboratory, Department of Microbiology. The isolates were selected based on their virulence and antimicrobial resistance profiles, as described in previous studies [2,3]. The stock cultures were removed from −80 °C storage and revived on MacConkey agar. The plates were incubated at 37 °C for 24 hours. After incubation, a single colony was transferred into 15 Falcon tubes containing 10 mL nutrient broth. The tubes were incubated in a shaking incubator (150 rpm) at 37 °C for 24 hours.
2.2. Genomic DNA extraction and Sequencing
Genomic DNA was extracted from overnight cultures using the Zymo Research Genomic DNA™-Tissue MiniPrep Kit (Biolab, South Africa) following the manufacturer's instructions. The DNA concentration was determined using the NanoDrop™-Lite 1,000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). After fragmentation, DNA libraries were constructed using the Nextera XT DNA library prep kit (Illumina, USA) following the manufacturer's instructions. The fragmented DNA was amplified using 12 cycles of PCR, which adds the index sequences [index 1 (i7) and index 2 (i5)]. The PCR products were purified using 0.6 × Agencourt AMPure XP beads (Beckman Coulter), and their quality was determined on a 1.5% (w/v) agarose gel. Each library was diluted to 12 pmol. Samples were normalized to 4 nM using Nextera XT Library Normalization Beads (Illumina). Normalized libraries were pooled, and 150-base paired-end sequencing was performed with MiSeq Reagent V3 600-cycle kits on the MiSeq instrument (Illumina).
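The library concentrations quoted above (4 nM for normalization and a final dilution of 12) relate to each other through the usual molarity and C1V1 = C2V2 arithmetic. The sketch below illustrates that arithmetic only; it is not the bead-based normalization described in the text. It assumes the final figure denotes 12 pM, uses the common approximation of 660 g/mol per base pair of double-stranded DNA, and the function names, input concentration, fragment length and volumes are hypothetical.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA library concentration from ng/uL to nM.

    nM = (ng/uL * 1e6) / (660 g/mol/bp * mean fragment length in bp)
    """
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def volume_for_dilution(stock_conc, target_conc, final_volume):
    """Volume of stock needed so that final_volume is at target_conc.

    Uses C1 * V1 = C2 * V2; both concentrations must share one unit.
    """
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    return target_conc * final_volume / stock_conc


# Hypothetical library: 3.3 ng/uL with a mean fragment size of 500 bp.
library_nM = ng_per_ul_to_nM(3.3, 500)                    # about 10 nM
# Volume of that library needed for 20 uL at 4 nM.
to_4nM_ul = volume_for_dilution(library_nM, 4.0, 20.0)    # about 8 uL
# Volume of 4 nM (4000 pM) pool needed for 600 uL at 12 pM (assumed unit).
to_12pM_ul = volume_for_dilution(4000.0, 12.0, 600.0)     # about 1.8 uL
```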
2.3. Genome assembly, annotation and data analysis
Raw sequence data were generated and FASTQ files were obtained. The data were assessed for quality using FASTQC (v.0.11.5) [4] and filtered for low-quality reads and adapter regions using Trimmomatic (v.0.36) [5]. De novo genome assembly was carried out using SPAdes (v.3.13) [6]. Genome annotation was performed using NCBI PGAP (v.5.0), the Prokka pipeline (v.2.1.1), the RAST server (v.2.0) and the PATRIC online server (v.3.6.2) [7], [8], [9], [10]. Antimicrobial resistance genes were further mined using the Resistance Gene Identifier online tool of the Comprehensive Antibiotic Resistance Database (CARD; https://card.mcmaster.ca/analyze/rgi) with all parameters (‘Perfect and Strict hits’ and ‘High quality/Coverage’) set at default [11].
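The quality control, trimming and assembly steps above were run on the KBase platform, but the same three tools can be driven from the command line. The sketch below builds equivalent command lines as Python argument lists. It is an illustration under stated assumptions, not the authors' exact invocation: the source does not report trimming parameters, so the Trimmomatic steps shown are the generic paired-end example settings from the tool's manual, and all file paths, the jar location and the adapter file are hypothetical placeholders.

```python
def fastqc_cmd(read_files, outdir):
    """FastQC quality report for one or more FASTQ files."""
    return ["fastqc", "--outdir", outdir, *read_files]


def trimmomatic_pe_cmd(jar, r1, r2, prefix, adapters):
    """Trimmomatic paired-end run.

    Trimming steps are generic example values (assumption), not
    parameters reported in the source.
    """
    return [
        "java", "-jar", jar, "PE", "-phred33",
        r1, r2,
        f"{prefix}_R1_paired.fq.gz", f"{prefix}_R1_unpaired.fq.gz",
        f"{prefix}_R2_paired.fq.gz", f"{prefix}_R2_unpaired.fq.gz",
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "SLIDINGWINDOW:4:15",
        "MINLEN:36",
    ]


def spades_cmd(r1_paired, r2_paired, outdir):
    """SPAdes de novo assembly from trimmed paired-end reads."""
    return ["spades.py", "-1", r1_paired, "-2", r2_paired, "-o", outdir]


# Hypothetical file names for one isolate.
steps = [
    fastqc_cmd(["CF-154-A_R1.fastq.gz", "CF-154-A_R2.fastq.gz"], "qc"),
    trimmomatic_pe_cmd("trimmomatic-0.36.jar",
                       "CF-154-A_R1.fastq.gz", "CF-154-A_R2.fastq.gz",
                       "CF-154-A", "NexteraPE-PE.fa"),
    spades_cmd("CF-154-A_R1_paired.fq.gz", "CF-154-A_R2_paired.fq.gz",
               "CF-154-A_spades"),
]
```

Each list in `steps` can be passed to `subprocess.run(cmd, check=True)` on a machine where the tools are installed; keeping the commands as lists avoids shell quoting issues with file names.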
Ethics Statements
This study did not involve the use of human subjects or animal experiments.
CRediT authorship contribution statement
Peter Kotsoana Montso: Conceptualization, Methodology, Data curation, Writing – original draft, Visualization, Investigation, Software, Validation, Writing – review & editing. Victor Mlambo: Conceptualization, Methodology, Supervision, Software, Validation, Writing – review & editing. Collins Njie Ateba: Conceptualization, Methodology, Supervision, Software, Validation, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is based on research supported wholly by the National Research Foundation of South Africa [Grant number: 138545] and the Food Security and Safety Niche Area, North-West University.
Data Availability
References
- 1. Grant J.R., Stothard P. The CGView Server: a comparative genomics tool for circular genomes. Nucleic Acids Res. 2008;36:W181–W184. doi: 10.1093/nar/gkn179.
- 2. Montso P.K., Bezuidenhout C.C., Mienie C., Somorin Y.M., Odeyemi O.A., Mlambo V., Ateba C.N. Genetic diversity and whole genome sequence analysis data of multidrug resistant atypical enteropathogenic E. coli O177 strains: An assessment of food safety and public health implications. Int. J. Food Microbiol. 2022. doi: 10.1016/j.ijfoodmicro.2022.109555.
- 3. Montso P.K., Mlambo V., Ateba C.N. The first isolation and molecular characterization of Shiga Toxin-producing virulent multi-drug resistant atypical enteropathogenic E. coli O177 serogroup from South African Cattle. Front. Cell. Infect. Microbiol. 2019;9:333. doi: 10.3389/fcimb.2019.00333.
- 4. Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010. http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
- 5. Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 6. Bankevich A., Nurk S., Antipov D., Gurevich A.A., Dvorkin M., Kulikov A.S., Lesin V.M., Nikolenko S.I., Pham S., Prjibelski A.D., Pyshkin A.V. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012;19:455–477. doi: 10.1089/cmb.2012.0021.
- 7. Li W., O'Neill K.R., Haft D.H., DiCuccio M., Chetvernin V., Badretdin A., Coulouris G., Chitsaz F., Derbyshire M.K., Durkin A.S., Gonzales N.R. RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation. Nucleic Acids Res. 2021;49:D1020–D1028. doi: 10.1093/nar/gkaa1105.
- 8. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30:2068–2069. doi: 10.1093/bioinformatics/btu153.
- 9. Wattam A.R., Davis J.J., Assaf R., Boisvert S., Brettin T., Bun C., Conrad N., Dietrich E.M., Disz T., Gabbard J.L., Gerdes S. Improvements to PATRIC, the all-bacterial bioinformatics database and analysis resource center. Nucleic Acids Res. 2017;45:D535–D542. doi: 10.1093/nar/gkw1017.
- 10. Overbeek R., Olson R., Pusch G.D., Olsen G.J., Davis J.J., Disz T., Edwards R.A., Gerdes S., Parrello B., Shukla M., Vonstein V. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42:D206–D214. doi: 10.1093/nar/gkt1226.
- 11. Alcock B.P., Raphenya A.R., Lau T.T., Tsang K.K., Bouchard M., Edalatmand A., Huynh W., Nguyen A.L.V., Cheng A.A., Liu S., Min S.Y. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 2020;48:D517–D525. doi: 10.1093/nar/gkz935.