Data in Brief. 2022 Jun 23;43:108407. doi: 10.1016/j.dib.2022.108407

Genome sequencing data of extended-spectrum beta-lactamase-producing Escherichia coli INF191/17/A isolates of nosocomial infection

Nik Siti Hanifah Nik Ahmad a, Khor Bee Yin b, Nik Yusnoraini Yusof c
PMCID: PMC9253457  PMID: 35799858

Abstract

Infection with extended-spectrum beta-lactamase (ESBL)-producing Escherichia coli is associated with higher mortality, longer hospital stays, and increased costs compared with infection with antibiotic-susceptible E. coli. Here, the draft genome of an ESBL-producing E. coli strain circulating at a local hospital is reported. The strain was found to carry the antibiotic resistance genes TEM, CTX-M-1, and CTX-M-9. The 5,136,548-bp genome, with a GC content of 50.59%, comprises 4987 protein-coding genes, four ribosomal RNAs, and 66 transfer RNAs. ResFinder predicted fourteen antimicrobial resistance genes in the E. coli INF191/17/A genome. The sequence data have been deposited in the GenBank database under the accession number JAIEXV000000000, and the BioProject ID is PRJNA752944. The raw reads were generated on an Illumina MiSeq and submitted to the NCBI SRA database (SRX11797310), where they are publicly available.

Keywords: Escherichia coli, Genome sequencing, Extended-spectrum beta-lactamase, Antimicrobial resistant gene

Specifications Table

Subject | Health and medical sciences
Specific subject area | Microbiology and genomics. Genome sequencing of pathogenic bacteria by using next generation sequencing approach.
Type of data | Table; sequencing raw reads in FASTQ format text file; assembled draft genome of E. coli strain INF191/17/A in FASTA format text file; genome sequence data in FASTA and FASTQ format
How data were acquired | The Illumina MiSeq platform was used to generate paired-end reads of extended spectrum beta lactamase (ESBL)-producing E. coli strain INF191/17/A genome.
Data format | Raw data in FASTQ format; assembled data in FASTA format: GenBank assembly accession: GCA_019599325.1 (https://www.ncbi.nlm.nih.gov/assembly/GCA_019599325.1).
Parameters for data collection | Bacterial genomic DNA was extracted from a pure culture of ESBL-producing E. coli INF191/17/A. Nextera XT DNA library preparation kit was used for the whole-genome sequencing library preparation to generate 2 × 251 paired end reads data.
Description of data collection | Whole genome sequencing was performed using Illumina MiSeq system (Illumina®, USA). BBDuk (BBTools v36) was used to trim raw reads, and SPAdes v3.9.0 was used to assemble clean reads. Genome scaffolding was performed with Medusa v1.6. ResFinder software predicted the putative antimicrobial resistant genes.
Data source location | Institution: Institute for Research in Molecular Medicine (INFORMM); City/Town/Region: Kubang Kerian, Kelantan; Country: Malaysia; Latitude and longitude for collected samples/data: 6.10 N 102.28 E
Data accessibility | The data is hosted on a public repository.
  Bioproject: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA752944
  Biosample: https://www.ncbi.nlm.nih.gov/biosample/SAMN20668118
  NCBI GenBank Accession Number: JAIEXV000000000 (https://www.ncbi.nlm.nih.gov/nuccore/JAIEXV000000000)
  Repository name: NCBI SRA database
  Data identification number: SRR15497613
  Direct URL to data: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR15497613
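
For readers who want to retrieve the raw reads listed under Data accessibility, the following is a minimal sketch that assumes the NCBI SRA Toolkit (`prefetch` and `fasterq-dump`) is installed and on the PATH. The helper names `build_download_commands` and `download` and the output directory `raw_reads` are hypothetical and are not part of the original article; only the run accession comes from the table.

```python
# Sketch: build (and optionally run) SRA Toolkit commands for the deposited run.
# Assumption: `prefetch` and `fasterq-dump` from the SRA Toolkit are on PATH.
import subprocess
from pathlib import Path

RUN_ACCESSION = "SRR15497613"  # taken from the Data accessibility row


def build_download_commands(run: str, outdir: str) -> list[list[str]]:
    """Return two commands: fetch the run archive, then convert it to FASTQ."""
    return [
        ["prefetch", run],
        # --split-files writes forward and reverse reads to separate files
        ["fasterq-dump", "--split-files", "--outdir", outdir, run],
    ]


def download(run: str = RUN_ACCESSION, outdir: str = "raw_reads") -> None:
    """Run the commands in order; raises CalledProcessError if one fails."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    for cmd in build_download_commands(run, outdir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe offline.
    for cmd in build_download_commands(RUN_ACCESSION, "raw_reads"):
        print(" ".join(cmd))
```

Calling `download()` would need network access; printing the commands first makes it easy to check them before anything is fetched.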

Value of the Data

  • The whole-genome sequencing data provide insight into the genomic determinants of ESBL-producing E. coli strain INF191/17/A, including its antimicrobial resistance (AMR) genes.

  • Researchers and public health officers can use these data to support surveillance and control of ESBL-producing Gram-negative organisms and to help prevent the emergence of highly resistant strains, a serious problem worldwide.

  • The genome data of E. coli strain INF191/17/A support pathogenic microbial research, including comparative studies, pan-genome analysis, and studies of the evolution of non-ESBL and ESBL strains in different epidemiological settings.

  • Furthermore, a comprehensive understanding of the whole genome of this pathogen is critically important before biomarker discovery or drug or vaccine development.

1. Data Description

Escherichia coli INF191/17/A was identified as an extended-spectrum beta-lactamase (ESBL) strain carrying the antibiotic resistance genes TEM, CTX-M-1, and CTX-M-9 by polymerase chain reaction using ESBL-specific primers [1]. The 251 base-pair paired-end (2 × 251 bp) raw sequencing reads of the E. coli strain INF191/17/A genome were obtained from the Illumina MiSeq system (Illumina, CA, USA) [2]. The raw reads were pre-processed before genome assembly and annotation, and antimicrobial resistance genes were predicted using a curated public database. Genomic DNA extracted from E. coli strain INF191/17/A was sequenced to generate a total of 1,368,224 reads in a 500-cycle run. The paired-end dataset (191-17-A_R1.fastq and 191-17-A_R2.fastq) contained 329,238,355 bases in total (Table 1). Pre-processing of the raw reads, which comprised trimming adapter sequences and removing low-quality and short reads, retained 46.9% of the reads as clean reads. De novo assembly of the clean reads generated 314 contigs with a total size of 5.12 Mbp. Scaffolding resulted in 74 scaffolds, with a longest scaffold of 2,520,446 bases and an N50 scaffold length of 1,733,129 bases (Table 2). The average coverage of the assembled sequence is 66x, with a G+C content of 50.59%. Using PGAP, a total of 4987 coding sequences (CDS), four ribosomal RNAs, and 66 transfer RNAs were predicted (Table 3). Furthermore, ResFinder predicted fourteen antimicrobial resistance genes in E. coli INF191/17/A (Table 4).

Table 1.

Statistics of the raw and clean reads data including forward (191-17-A_R1.fastq) and reverse (191-17-A_R2.fastq) reads.

191-17-A R1 R2 Total
Total Raw Reads 684,112 684,112 1,368,224
Total Raw Reads Bases 164,465,730 164,772,625 329,238,355
Total Clean Reads 320,871 320,871 641,742
Total Clean Reads Bases 54,470,383 40,781,248 95,251,631
Clean Reads (%) 46.90 46.90 46.90
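
The totals and the clean-read percentage in Table 1 can be re-derived from the per-file counts. The sketch below only repeats that arithmetic, using the values from the table; the variable and function names are illustrative. It also gives the mean read length before and after cleaning (roughly 241 bp and 148 bp), which is not stated in the article but follows directly from the table.

```python
# Sketch: re-derive the Table 1 summary values from the per-file counts.
raw_reads = {"R1": 684_112, "R2": 684_112}
raw_bases = {"R1": 164_465_730, "R2": 164_772_625}
clean_reads = {"R1": 320_871, "R2": 320_871}
clean_bases = {"R1": 54_470_383, "R2": 40_781_248}


def total(counts: dict[str, int]) -> int:
    """Sum the forward and reverse values."""
    return sum(counts.values())


def percent_clean(clean: int, raw: int) -> float:
    """Share of reads kept after pre-processing, in percent (2 decimals)."""
    return round(100 * clean / raw, 2)


def mean_read_length(bases: int, reads: int) -> float:
    """Average number of bases per read."""
    return bases / reads


if __name__ == "__main__":
    print("total raw reads:", total(raw_reads))
    print("total raw bases:", total(raw_bases))
    print("clean reads (%):", percent_clean(total(clean_reads), total(raw_reads)))
    print("mean raw read length:",
          round(mean_read_length(total(raw_bases), total(raw_reads)), 1))
    print("mean clean read length:",
          round(mean_read_length(total(clean_bases), total(clean_reads)), 1))
```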

Table 2.

The statistics of the assembled draft genome of E. coli strain INF191/17/A.

Attributes Value
Number of scaffolds 74
Total size of scaffolds 5,136,548
Longest scaffold 2,520,446
Shortest scaffold 204
Number of scaffolds > 1 K nt 51 (68.9%)
Number of scaffolds > 10 K nt 21 (28.4%)
Number of scaffolds > 100 K nt 3 (4.1%)
Number of scaffolds > 1 M nt 2 (2.7%)
Number of scaffolds > 10 M nt 0 (0.0%)
Mean scaffold size 69,413
Median scaffold size 2736
N50 scaffold length 1,733,129
L50 scaffold count 2
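
N50 and L50 in Table 2 follow the usual definition: sort the scaffolds from longest to shortest and accumulate lengths until at least half of the total assembly size is reached; N50 is the length of the scaffold at which this happens and L50 is the number of scaffolds used. A minimal sketch is given below. The individual scaffold lengths are not listed in the article, so the example uses the reported longest scaffold and the N50 scaffold (which, with L50 = 2, is the second longest) and lumps the remaining 72 scaffolds into one placeholder value; that placeholder is an assumption for illustration only and does not affect the result here.

```python
# Sketch: N50 / L50 from a list of scaffold lengths.
def n50_l50(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for the given scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Two reported scaffolds, plus a placeholder for the other 72 scaffolds
    # (5,136,548 - 2,520,446 - 1,733,129 = 882,973 bases in total).
    illustrative_lengths = [2_520_446, 1_733_129, 882_973]
    print(n50_l50(illustrative_lengths))
```

With these inputs the cumulative length passes half of 5,136,548 at the second scaffold, which matches the N50 of 1,733,129 and the L50 of 2 reported in Table 2.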

Table 3.

The annotation of draft genome of E. coli INF191/17/A.

Attributes Value
Total number of genes 5062
Number of coding sequences 4987
Number of genes (coding) 4736
Total number of RNAs 75
Number of rRNAs 4
Number of tRNAs 66
Number of ncRNAs 5
Number of pseudogenes 251
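
The counts in Table 3 appear to be internally consistent. The sketch below checks three relationships that are inferred from the numbers themselves and from how annotation summaries of this kind are usually laid out; they are an assumption, not something the article states. The dictionary keys are illustrative names.

```python
# Sketch: consistency checks on the Table 3 annotation counts.
# Assumption: the three relationships below are inferred, not stated in the article.
TABLE_3 = {
    "total_genes": 5062,
    "cds_total": 4987,
    "coding_genes": 4736,
    "rnas_total": 75,
    "rrnas": 4,
    "trnas": 66,
    "ncrnas": 5,
    "pseudogenes": 251,
}


def check_annotation(a: dict[str, int]) -> dict[str, bool]:
    """Return each inferred relationship and whether the counts satisfy it."""
    return {
        "total genes = CDS + RNAs":
            a["total_genes"] == a["cds_total"] + a["rnas_total"],
        "RNAs = rRNAs + tRNAs + ncRNAs":
            a["rnas_total"] == a["rrnas"] + a["trnas"] + a["ncrnas"],
        "coding genes = CDS - pseudogenes":
            a["coding_genes"] == a["cds_total"] - a["pseudogenes"],
    }


if __name__ == "__main__":
    for rule, ok in check_annotation(TABLE_3).items():
        print(f"{rule}: {'ok' if ok else 'MISMATCH'}")
```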

Table 4.

Antimicrobial resistance genes and their corresponding antibiotics detected in the E. coli INF191/17/A.

AMR gene | Description | Resistance
mdf(A) | Multidrug transporter MdfA | Fluoroquinolone, Aminoglycoside, Tetracycline, Macrolide, Rifamycin, Phenicol
aph(3'')-Ib | Aminoglycoside resistance protein B | Streptomycin
aac(3)-IId | Aminoglycoside-(3)-N-acetyl-transferase (aacC2) gene | Apramycin, Gentamicin, Tobramycin, Dibekacin, Netilmicin, Sisomicin
aph(6)-Id | Inosamine-phosphate amidinotransferase | Streptomycin
aadA5 | Streptomycin and spectinomycin resistance aminoglycoside adenyltransferase | Spectinomycin, Streptomycin
tet(A) | Tetracycline efflux protein Tet(A) | Doxycycline, Tetracycline
mph(A) | Macrolide 2′-phosphotransferase I | Erythromycin, Azithromycin, Spiramycin, Telithromycin
sitABCD | Periplasmic binding protein (sitA), ATP-binding component (sitB), inner membrane component (sitC), inner membrane component (sitD) | Hydrogen peroxide
blaTEM-1B | Beta-lactamase TEM-1 | Amoxicillin, Ampicillin, Cephalothin, Piperacillin, Ticarcillin
blaCTX-M-27 | Beta-lactamase CTX-M-27 | Amoxicillin, Ampicillin, Aztreonam, Cefepime, Cefotaxime, Ceftazidime, Ceftriaxone, Piperacillin, Ticarcillin
sul2 | Dihydropteroate synthase type-2 | Sulfamethoxazole
sul1 | Dihydropteroate synthase type-1 | Sulfamethoxazole
dfrA17 | Dihydrofolate reductase | Trimethoprim
qacE | Quaternary ammonium compound-resistance protein QacE | Benzalkonium chloride, Ethidium bromide, Chlorhexidine, Cetylpyridinium chloride

2. Experimental Design, Materials and Methods

2.1. Sample Collection and Isolation of ESBL E. coli Strain INF191/17/A

E. coli strain INF191/17/A was isolated at a local hospital from a 45-year-old male patient who was suffering from a high fever. In brief, the sample was cultured in the Bactec 9240 blood culture system (Becton, Dickinson, USA) before biochemical testing and Gram staining [3]. The ESBL screening and disk confirmation tests were performed according to Clinical and Laboratory Standards Institute (CLSI) guidelines [4]. The 16S rRNA sequence of this strain was validated using E. coli-specific primers [5]. PCR was then conducted using ESBL-specific primers to confirm the ESBL type [1].

2.2. DNA Isolation, Genome Sequencing, Assembly, and Annotation

Genomic DNA was isolated using the NucleoSpin tissue DNA, RNA, and protein purification kit according to the manufacturer's instructions (Macherey-Nagel). The purified DNA was processed using the Nextera XT DNA library preparation kit following the manufacturer's instructions (Illumina, USA). Whole-genome sequencing was performed on the MiSeq platform (Illumina, USA) (2 × 251 bp). Adapter trimming, quality trimming, contaminant filtering, and read length filtering were performed using BBDuk (BBTools version 36) (http://jgi.doe.gov/data-and-tools/bbtools/). Low-quality bases (<Q30) were trimmed and short reads (<50 bp) were removed to produce a high-quality clean read dataset. The clean reads were assembled de novo using SPAdes v3.9.0 [6] to obtain contigs. The assembled contigs were then scaffolded against the closest reference genomes [3] using Medusa (Multi-Draft based Scaffolder) software [7] to produce a draft genome. The genome was annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) v4.10 [8].
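
To illustrate the two read-cleaning thresholds mentioned above (base quality below Q30 and read length below 50 bp), the following is a simplified sketch. It is not BBDuk and does not reproduce BBDuk's trimming algorithm or its adapter and contaminant filtering; it only trims low-quality bases from both ends of a read and drops a pair when either mate ends up too short. Phred+33 quality encoding is assumed, and all function names are hypothetical.

```python
# Sketch: end-trimming at Q30 and a 50-bp length filter for paired reads.
# Assumptions: Phred+33 encoding; simplified stand-in, not BBDuk's algorithm.
from typing import Optional

PHRED_OFFSET = 33
Read = tuple[str, str]  # (sequence, quality string)


def phred_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(char) - PHRED_OFFSET for char in quality]


def trim_read(seq: str, qual: str, min_q: int = 30,
              min_len: int = 50) -> Optional[Read]:
    """Trim bases below min_q from both ends; return None if too short."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality must have the same length")
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    if end - start < min_len:
        return None
    return seq[start:end], qual[start:end]


def filter_pairs(pairs: list[tuple[Read, Read]], min_q: int = 30,
                 min_len: int = 50) -> list[tuple[Read, Read]]:
    """Keep a pair only if both mates survive trimming."""
    kept = []
    for (seq1, qual1), (seq2, qual2) in pairs:
        mate1 = trim_read(seq1, qual1, min_q, min_len)
        mate2 = trim_read(seq2, qual2, min_q, min_len)
        if mate1 is not None and mate2 is not None:
            kept.append((mate1, mate2))
    return kept
```

Dropping both mates together is consistent with Table 1, where the forward and reverse files contain the same number of clean reads.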

2.3. Antimicrobial Resistant Genes Analysis

ResFinder (v4.1) [9] was used to screen for antimicrobial resistance genes. The assembled genome was searched against the curated Escherichia coli database using the default parameters. A predicted gene was accepted if the assembled sequence had at least 95% nucleotide identity and at least 80% coverage with a candidate gene in the database.
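
The acceptance rule above can be written as a simple filter. The sketch below does not call ResFinder or parse its output; it only applies the two thresholds to hit records. The `Hit` class, the function names, and the example hits (including their gene labels and values) are hypothetical and are not results from this study.

```python
# Sketch: apply the identity and coverage thresholds to candidate hits.
# Assumption: hits are already parsed into records; values below are made up.
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene: str
    identity: float  # percent nucleotide identity to the database gene
    coverage: float  # percent of the database gene covered by the match


def is_accepted(hit: Hit, min_identity: float = 95.0,
                min_coverage: float = 80.0) -> bool:
    """A hit passes only if it meets both thresholds."""
    return hit.identity >= min_identity and hit.coverage >= min_coverage


def filter_hits(hits: list[Hit]) -> list[Hit]:
    """Return the hits that pass the default thresholds."""
    return [hit for hit in hits if is_accepted(hit)]


if __name__ == "__main__":
    example_hits = [
        Hit("gene_a", identity=99.8, coverage=100.0),  # passes both
        Hit("gene_b", identity=93.0, coverage=100.0),  # identity too low
        Hit("gene_c", identity=98.5, coverage=62.0),   # coverage too low
    ]
    for hit in filter_hits(example_hits):
        print(hit.gene)
```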

Ethics Statement

The study protocol was approved by the ethics committee of the Universiti Sains Malaysia (USM/JEPeM/20030152).

CRediT authorship contribution statement

Nik Siti Hanifah Nik Ahmad: Software, Formal analysis, Writing – review & editing, Funding acquisition. Khor Bee Yin: Conceptualization, Software, Formal analysis, Data curation, Writing – original draft. Nik Yusnoraini Yusof: Conceptualization, Software, Methodology, Resources, Writing – review & editing, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.

Acknowledgment

This work was supported by a USM Short Term grant (304.CIPPM.6315337). We would like to thank the Hospital Universiti Sains Malaysia and Department of Microbiology and Parasitology, School of Medical Sciences, Universiti Sains Malaysia (USM), for providing the isolate.

References

  • 1. Alyamani E.J., Khiyami A.M., Booq R.Y., et al. The occurrence of ESBL-producing Escherichia coli carrying aminoglycoside resistance genes in urinary tract infections in Saudi Arabia. Ann. Clin. Microbiol. Antimicrob. 2017;16(1). doi: 10.1186/s12941-016-0177-6.
  • 2. Kim H.M., Jeon S., Chung O., Jun J.H., Kim H.S., Blazyte A., Lee H.Y., Yu Y., Cho Y.S., Bolser D.M., Bhak J. Comparative analysis of 7 short-read sequencing platforms using the Korean Reference Genome: MGI and Illumina sequencing benchmark for whole-genome sequencing. Gigascience. 2021;10(3):giab014. doi: 10.1093/gigascience/giab014.
  • 3. Ratmaazila W.M.W., Azlan M.M., Hassan N.H., Aziah I., Samsurizal N.H., Yusof N.Y. Draft genome sequence of the extended-spectrum β-lactamase-producing Escherichia coli isolate INF13/18/A, recovered from Kelantan, Malaysia. Microbiol. Resour. Announc. 2020;9(33). doi: 10.1128/MRA.01497-19.
  • 4. Sari R., Apridamayanti P., Puspita I.D. Sensitivity of Escherichia coli bacteria towards antibiotics in patient with diabetic foot ulcer. Pharm. Sci. Res. 2018;5:19–24. doi: 10.7454/psr.v5i1.3649.
  • 5. Al-Jamei S.A., Albsoul A.Y., Bakri F.G., Al-Bakri A.G. Extended spectrum beta-lactamase-producing E. coli in urinary tract infections: a two-center, cross-sectional study of prevalence, genotypes and risk factors in Amman, Jordan. J. Infect. Public Health. 2019;12:21–25. doi: 10.1016/j.jiph.2018.07.011.8.
  • 6. Bankevich A., Nurk S., Antipov D., Gurevich A.A., Dvorkin M., Kulikov A.S., et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012;19(5):455–477. doi: 10.1089/cmb.2012.0021.
  • 7. Bosi E., Donati B., Galardini M., Brunetti S., Sagot M.F., Lió P., et al. MeDuSa: a multi-draft based scaffolder. Bioinformatics. 2015;31(15):2443–2451. doi: 10.1093/bioinformatics/btv171.
  • 8. Li W., O'Neill K.R., Haft D.H., DiCuccio M., Chetvernin V., Badretdin A., Coulouris G., Chitsaz F., Derbyshire M.K., Durkin A.S., Gonzales N.R., Gwadz M., Lanczycki C.J., Song J.S., Thanki N., Wang J., Yamashita R.A., Yang M., Zheng C., Marchler-Bauer A., Thibaud-Nissen F. RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation. Nucl. Acids Res. 2021;49(D1):D1020–D1028. doi: 10.1093/nar/gkaa1105.
  • 9. Bortolaia V., Kaas R.F., Ruppe E., Roberts M.C., Schwarz S., Cattoir V., Philippon A., Allesoe R.L., Rebelo A.R., Florensa A.R., Fagelhauer L., Chakraborty T., Neumann B., Werner G., Bender J.K., Stingl K., Nguyen M., Coppens J., Xavier B.B., Malhotra-Kumar S., Westh H., Pinholt M., Anjum M.F., Duggett N.A., Kempf I., Nykäsenoja S., Olkkola S., Wieczorek K., Amaro A., Clemente L., Losch J.S., Ragimbeau C., Lund O., Aarestrup F.M. ResFinder 4.0 for predictions of phenotypes from genotypes. J. Antimicrob. Chemother. 2020;75(12):3491–3500. doi: 10.1093/jac/dkaa345.

