ABSTRACT
Shiga toxin-producing Escherichia coli (STEC) is an enteric foodborne pathogen that can cause mild to severe illness. Here, we report the availability of high-quality whole-genome sequences for 77 STEC strains generated using the PacBio sequencing platform.
GENOME ANNOUNCEMENT
Shiga toxin-producing Escherichia coli (STEC) is a major foodborne pathogen responsible for outbreaks and sporadic cases of diarrheal illness (1). Although the majority of reported STEC infections in the United States are caused by E. coli O157:H7, non-O157 serotypes have grown to be a public health concern both in the United States and internationally, as they can cause severe illness comparable to that caused by STEC O157 (2, 3). Non-O157 STEC has been linked to a range of clinical illnesses, from asymptomatic shedding and mild diarrhea to hemorrhagic colitis and potentially fatal hemolytic-uremic syndrome (HUS); more than 100 STEC serotypes have been linked to such human disease (4). Many of these non-O157 serotypes do not have publicly available PacBio-sequenced genomes.
Here, we report whole-genome sequences for 77 STEC strains representing 43 serotypes. The STEC cultures were grown overnight on blood agar plates at 37°C, and genomic DNA was extracted using ArchivePure (5 Prime, Gaithersburg, MD) according to the manufacturer’s protocol. The DNA was sheared to 20 kb by needle shearing, and large SMRTbell libraries were generated using the standard library protocols of the Pacific Biosciences DNA template preparation kit (Pacific Biosciences, Menlo Park, CA). The prepared libraries were further size selected using BluePippin (Sage Science, Beverly, MA). The finished libraries were bound to proprietary P6 version 2 polymerase and sequenced on a PacBio RS II platform using C4 chemistry for 360-min movies. Each strain was sequenced using one, two, or three single-molecule real-time (SMRT) cells. The sequence reads were then filtered and assembled de novo using Falcon, Canu, or the PacBio Hierarchical Genome Assembly Process version 3 (5–7). For 30 strains, whole-genome optical maps were generated using the Argus platform (OpGen, Gaithersburg, MD), and the sequence order was verified using the corresponding AflII and NcoI whole-genome maps.
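Verifying sequence order against an optical map amounts to comparing measured restriction fragment sizes with those predicted from the assembly. The prediction step can be sketched in Python as below. This is an illustration only, not the software used with the Argus platform: the function and variable names are hypothetical, the input is assumed to be a circular sequence held as a plain string, and the recognition sites are the standard ones for AflII (CTTAAG) and NcoI (CCATGG), both cut after the first base. Because both sites are palindromic, searching one strand is sufficient.

```python
# Recognition site and cut offset (bases from the start of the site).
ENZYMES = {"AflII": ("CTTAAG", 1), "NcoI": ("CCATGG", 1)}


def circular_fragments(seq, site, cut_offset):
    """Return fragment lengths from digesting a circular sequence.

    seq        -- DNA sequence of a circular molecule (string)
    site       -- recognition site, e.g. "CTTAAG"
    cut_offset -- cut position within the site (1 = after the first base)
    """
    seq = seq.upper()
    n = len(seq)
    # Append the first len(site) - 1 bases so sites spanning the origin are found.
    extended = seq + seq[: len(site) - 1]
    cuts = []
    start = extended.find(site)
    while start != -1:
        cuts.append((start + cut_offset) % n)
        start = extended.find(site, start + 1)
    cuts = sorted(set(cuts))
    if not cuts:
        return [n]  # uncut circle
    # Distance from each cut to the next one, wrapping around the origin.
    # A single cut yields one linear fragment of the full length.
    return [(cuts[(i + 1) % len(cuts)] - c) % n or n for i, c in enumerate(cuts)]


if __name__ == "__main__":
    toy = "CTTAAGAAAACTTAAGGG"  # toy sequence, not real data
    print(circular_fragments(toy, *ENZYMES["AflII"]))
```

The predicted fragment list for each enzyme would then be compared, in order, with the fragment sizes in the corresponding whole-genome map; a rearranged or misjoined contig shows up as a run of fragments that does not match.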
The detected serotypes, accession numbers, and assembly metrics for each genome are listed in Table 1. The average G+C content for all 77 chromosomal sequences was 50.6%. Coverage per genome ranged from 39.5× to 230.8×, with an average of 109×. All but nine chromosomal sequences were circularized and found to have overlapping ends. Of the nine genomes that could not be circularized due to collapsed or unresolved repeats, five (2014C-3741, 2014C-3716, 89-3506, 2013C-3996, and 2013C-3304) yielded a single chromosomal sequence. The remaining four genomes (2013C-3925, 03-3375, 2014C-4638, and 2012C-4196) had two or more chromosomal contigs. The average genome size of the 73 isolates with a single chromosomal sequence was 5,287,902 bp, ranging from 4,717,123 to 5,858,766 bp. The genomes contained up to seven plasmids each; four strains (2013C-3181, 95-3192, 95-3322, and 2012EL-2448) had no plasmids (Table 1).
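For readers who want to recompute metrics of this kind from a downloaded sequence, the following Python sketch shows one way to calculate G+C content and mean coverage. The function names are illustrative, and the definitions (G+C as a percentage of unambiguous bases; coverage as total read bases divided by genome size) are common conventions assumed here rather than the exact procedure used in this study.

```python
def gc_percent(seq):
    """G+C content as a percentage of unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def mean_coverage(read_lengths, genome_size):
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size


if __name__ == "__main__":
    print(gc_percent("GGCCAATT"))                 # toy sequence
    print(mean_coverage([1000, 2000, 3000], 600)) # toy read lengths
```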
TABLE 1.
Escherichia coli strain IDa | Serotype | Chromosomal GenBank accession no. | No. of contigs | Chromosome size (bp) | Associated plasmid size(s) (bp) (GenBank accession no.) |
---|---|---|---|---|---|
2015C-3163 | O103:H2 | CP027219 | 2 | 5,500,189 | 94,104 (CP027220) |
2015C-3101 | O111:H8 | CP027221 | 3 | 5,313,278 | 48,390 (CP027222), 72,543c (CP027223) |
2015C-3108 | O111:H8 | CP027307 | 3 | 5,364,442 | 63,664c (CP027308), 93,724 (CP027309) |
2014C-4135 | O113:H21 | CP027310 | 2 | 4,949,048 | 133,438 (CP027311) |
2013C-3181 | O113:H21 | CP027312 | 1 | 5,167,951 | No plasmids |
2014C-3550 | O118:H16 | CP027313 | 4 | 5,549,395 | 59,928 (CP027314), 88,840 (CP027315), 179,514 (CP027316) |
2015C-3107 | O121:H19 | CP027317 | 2 | 5,388,260 | 81,954 (CP027318) |
2014C-3084 | O145:H28 | CP027319 | 4 | 4,717,123 | 78,854c (CP027320), 84,276c (CP027321), 706,680 (CP027322) |
2013C-3033 | O146:H21 | CP027323 | 2 | 5,426,201 | 127,667 (CP027324) |
2013C-4830 | O165:H25 | CP027325 | 3 | 5,135,675 | 74,671 (CP027326), 93,170 (CP027327) |
2014C-3741 | O174:H8 | CP027328 | 3 | 5,394,679c | 128,345c (CP027329), 102,897c (CP027330) |
2013C-3277 | O26:H11 | CP027331 | 4 | 5,438,694 | 20,839c (CP027332), 46,866c (CP027333), 181,066c (CP027334) |
2014C-3716 | O26:H11 | CP027335 | 3 | 5,568,215c | 144,060c (CP027336), 62,870c (CP027337) |
2013C-3925 | O5:H9 | PVMF00000000 | 6 | 331,062,c 4,413,627,c 496,511c | 77,933,c 170,283,c 99,102c |
2014C-3051 | O71:H11 | CP027338 | 2 | 5,597,475 | 92,644 (CP027339) |
2015C-3121 | O91:H14 | CP027340 | 2 | 5,366,577 | 104,198c (CP027341) |
2014C-4587 | OUND:H19 | CP027342 | 2 | 5,040,163 | 131,410 (CP027343) |
2014C-3946 | O111:H8 | CP027344 | 3 | 5,264,938 | 22,197c (CP027345), 18,123c (CP027346) |
2013C-4361 | O111:H8 | CP027347 | 2 | 5,317,846 | 70,613c (CP027348) |
2014C-3655 | O121:H19 | CP027351 | 2 | 5,442,537 | 97,117c (CP027350) |
2012C-4606 | O26:H11 | CP027352 | 3 | 5,647,195 | 20,881c (CP027353), 57,720c (CP027354) |
2013C-4390 | O76:H19 | CP027484 | 2 | 5,353,719 | 147,394 (CP027485) |
2013C-4991 | O80:H2 | CP027355 | 4 | 5,367,251 | 71,714c (CP027356), 131,463c (CP027357), 110,001 (CP027358) |
2014C-4639 | O26:H11 | CP027361 | 3 | 5,325,246 | 54,873c (CP027359), 329,873 (CP027360) |
95-3192 | O145:H28 | CP027362 | 1 | 5,385,516 | No plasmids |
88-3001 | O165:H25 | CP027363 | 2 | 5,195,753 | 74,659 (CP027364) |
89-3156 | O174:H21 | CP027366 | 2 | 5,065,883 | 125,561 (CP027367) |
03-3375 | O145:H25 | PVMG00000000 | 4 | 5,199,239,c 40,965c | 30,901,c 83,963c |
2014C-3307 | O178:H19 | CP027368 | 3 | 4,965,987 | 109,641 (CP027369), 176,149c (CP027370) |
2015C-3905 | O181:H49 | CP027371 | 2 | 4,901,620 | 175,427c (CP027372) |
2014C-4638 | O26:H11 | PVMH00000000 | 4 | 261,681,c 2,112,842,c 3,317,231c | 88,223c |
05-3629 | O8:H16 | CP027373 | 3 | 4,904,151 | 91,648 (CP027374), 118,863c (CP027375) |
2013C-4404 | O91:H14 | CP027376 | 4 | 5,009,822 | 70,152 (CP027377), 113,102c (CP027378), 104,889 (CP027379) |
2013C-3250 | O111:H8 | CP027380 | 6 | 5,401,672 | 24,547c (CP027381), 36,491c (CP027382), 73,784c (CP027383), 27,224c (CP027384), 118,259 (CP027385) |
2014C-3057 | O26:H11 | CP027387 | 2 | 5,645,983 | 54,452c (CP027386) |
2011C-4251 | O45:H2 | CP027388 | 2 | 5,440,026 | 68,062c (CP027389) |
2015C-4944 | O26:H11 | CP027390 | 2 | 5,802,748 | 98,724 (CP027391) |
97-3250 | O26:H11 | CP027599 | 3 | 5,942,969 | 120,604 (CP027600), 92,590c (CP027601) |
2014C-3599 | O121:H19 | CP027435 | 2 | 5,400,138 | 83,611 (CP027436) |
2012C-4221b | O101:H6 | CP027437 | 3 | 5,012,557 | 74,904 (CP027438), 107,188 (CP027439) |
2012C-4502 | O185:H28 | CP027440 | 2 | 4,892,666 | 173,714 (CP027441) |
2013C-3252 | O69:H11 | CP027442 | 3 | 5,636,732 | 95,157 (CP027443), 91,399 (CP027444) |
2013C-3492b | O172:H25 | CP027445 | 2 | 5,196,105 | 74,269 (CP027446) |
2014C-3075 | O36:H42 | CP027447 | 2 | 5,168,620 | 170,848 (CP027448) |
2014C-3097b | O181:H49 | CP027449 | 3 | 5,077,228 | 34,867 (CP027450), 173,649 (CP027451) |
2014C-3338b | O183:H18 | CP027452 | 2 | 4,799,014 | 159,611 (CP027453) |
2014C-4423b | O121:H19 | CP027454 | 3 | 5,338,915 | 73,262 (CP027455), 79,682 (CP027456) |
88-3493b | O137:H41 | CP027457 | 2 | 5,001,754 | 107,796 (CP027458) |
90-3040b | O172:H25 | CP027459 | 2 | 5,253,712 | 74,247 (CP027460) |
95-3322b | O22:H5 | CP027461 | 1 | 5,095,223 | No plasmids |
07-4299b | O130:H11 | CP027462 | 2 | 4,847,172 | 125,059c (CP027463) |
2013C-4248 | O186:H2 | CP027464 | 8 | 5,243,827 | 113,063 (CP027465), 10,950c (CP027466), 62,602 (CP027467), 97,439c (CP027468), 62,881c (CP027469), 80,206 (CP027470), 243,267 (CP027471) |
2014C-3050b | O118:H16 | CP027472 | 2 | 5,671,594 | 81,624c (CP027473) |
89-3506b | O126:H27 | CP027520 | 3 | 5,178,386c | 160,231c (CP027521), 93,253c (CP027522) |
2013C-3264b | O103:H25 | CP027544 | 2 | 5,486,407 | 101,089 (CP027545) |
2013C-4187b | O71:H11 | CP027546 | 2 | 5,509,931 | 95,367 (CP027547) |
2014C-3061b | O156:H25 | CP027548 | 2 | 5,303,935 | 94,116 (CP027549) |
2014C-4705b | O112:H21 | CP027640 | 2 | 5,329,029 | 126,957 (CP027641) |
2015C-4136CT1b | O145:H34 | CP027550 | 2 | 4,836,918 | 162,810 (CP027551) |
2015C-4498b | O117:H8 | CP027552 | 2 | 5,434,442 | 67,055 (CP027553) |
2013C-3513b | O186:H11 | CP027555 | 3 | 5,584,939 | 70,129c (CP027554), 91,046c (CP027556) |
2013C-3996 | O26:H11 | CP027572 | 2 | 5,858,766c | 96,937 (CP027571) |
2013C-4081b | O111:H8 | CP027573 | 4 | 5,411,943 | 48,183 (CP027574), 95,952c (CP027575), 78,427 (CP027576) |
2013C-4225b | O103:H11 | CP027577 | 2 | 5,646,446 | 87,714c (CP027578) |
2013C-4282b | O77:H45 | CP027579 | 3 | 5,030,044 | 54,544c (CP027580), 118,822 (CP027581) |
2013C-4538b | O118:H16 | CP027582 | 2 | 5,680,428 | 88,339c (CP027583) |
2014C-3003b | O76:H19 | CP027672 | 3 | 5,234,640 | 88,529c (CP027673), 133,420 (CP027674) |
2015C-3125b | O145:H28 | CP027763 | 3 | 5,471,132 | 66,944c (CP027764), 66,388 (CP027765) |
00-3076b | O113:H21 | CP027584 | 2 | 4,997,979 | 160,576 (CP027585) |
2012C-4196 | O145:H25 | PVZZ00000000 | 5 | 3,847,435,c 1,375,699c | 26,290,c 111,344,c 65,126c |
2012EL-2448b | O91:H14 | CP027586 | 1 | 5,272,286 | No plasmids |
2013C-4974b | O5:H9 | CP027587 | 2 | 5,235,560 | 58,109c (CP027588) |
2014C-3011b | O177:H25 | CP027591 | 4 | 5,168,350 | 75,065c (CP027589), 92,449c (CP027590), 17,880c (CP027592) |
2013C-3304 | O71:H8 | CP027593 | 4 | 5,309,950c | 14,119c (CP027594), 36,845c (CP027595), 87,855 (CP027596) |
86-3153b | O5:H9 | CP027597 | 2 | 5,342,528 | 74,505c (CP027598) |
88-3510b | O172:H25 | CP027675 | 2 | 5,140,386 | 65,738c (CP027676) |
2013C-3342 | O117:H8 | CP027766 | 2 | 5,489,451 | 66,545 (CP027767) |
a. ID, identification.
b. Strain for which an optical map was generated and used to confirm the sequence order.
c. A linear sequence that could not be circularized due to unresolved or collapsed repeats.
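Because the footnote markers in Table 1 are attached directly to strain IDs and sizes, reusing the table programmatically requires separating them from the values. The Python sketch below is one way to do that for rows in the pipe-delimited layout shown above. The helper names and the returned dictionary keys are hypothetical, and the sketch assumes that sizes always use comma thousands separators and that a trailing lowercase "b" on a strain ID is the optical-map marker.

```python
import re

# A size such as "5,313,278", optionally followed by the linear-sequence marker "c"
# (written either as "72,543c" or as "331,062,c" in the table).
SIZE = re.compile(r"(\d{1,3}(?:,\d{3})*)(?:,?(c))?")


def parse_sizes(cell):
    """Return [(size_bp, is_linear), ...] from a size cell, ignoring accessions."""
    cell = re.sub(r"\([^)]*\)", "", cell)  # drop "(CP0xxxxx)" accession numbers
    return [(int(num.replace(",", "")), flag == "c") for num, flag in SIZE.findall(cell)]


def parse_row(line):
    """Split one table row into typed fields and strip footnote markers."""
    cells = [c.strip() for c in line.strip().rstrip("|").split("|")]
    strain, serotype, accession, n_contigs, chrom, plasmids = cells
    has_map = strain.endswith("b")  # footnote b: optical map available
    return {
        "strain": strain[:-1] if has_map else strain,
        "optical_map": has_map,
        "serotype": serotype,
        "accession": accession,
        "n_contigs": int(n_contigs),
        "chromosome": parse_sizes(chrom),  # more than one entry = fragmented chromosome
        "plasmids": parse_sizes(plasmids), # empty list for "No plasmids"
    }


if __name__ == "__main__":
    row = parse_row(
        "2015C-3101 | O111:H8 | CP027221 | 3 | 5,313,278 | "
        "48,390 (CP027222), 72,543c (CP027223) |"
    )
    print(row["strain"], row["chromosome"], row["plasmids"])
```

With every row parsed, summary figures such as the number of strains with optical maps or the mean size of single-contig chromosomes can be recomputed and checked against the text.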
Accession number(s). The whole-genome sequences have been deposited in DDBJ/ENA/GenBank under the accession numbers listed in Table 1. The versions described in this paper are the first versions.
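As a convenience for retrieving the deposited records, the sketch below builds an NCBI E-utilities `efetch` request for a nucleotide accession using only the Python standard library. The helper names are illustrative. The sketch assumes the accession is a finished record (the CP-prefixed entries in Table 1); whole-genome shotgun master records such as PVMF00000000 may need a different retrieval route. NCBI also asks users to limit their request rate, which this sketch does not enforce.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="fasta"):
    """Build an efetch URL for one nucleotide accession."""
    query = urlencode(
        {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def fetch_fasta(accession):
    """Download the FASTA record for an accession (requires network access)."""
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Prints the request URL only; call fetch_fasta("CP027219") to download.
    print(efetch_url("CP027219"))
```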
ACKNOWLEDGMENTS
This work was funded by federal appropriations to the Centers for Disease Control and Prevention, through the Advanced Molecular Detection Initiative line item.
The findings and conclusions of this article are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention. The use of trade names is for identification only and does not imply endorsement by the Centers for Disease Control and Prevention or by the U.S. Department of Health and Human Services.
Footnotes
Citation: Patel PN, Lindsey RL, Garcia-Toledo L, Rowe LA, Batra D, Whitley SW, Drapeau D, Stoneburg D, Martin H, Juieng P, Loparev VN, Strockbine N. 2018. High-quality whole-genome sequences for 77 Shiga toxin-producing Escherichia coli strains generated with PacBio sequencing. Genome Announc 6:e00391-18. https://doi.org/10.1128/genomeA.00391-18.
REFERENCES
1. Mead PS, Slutsker L, Dietz V, McCaig LF, Bresee JS, Shapiro C, Griffin PM, Tauxe RV. 1999. Food-related illness and death in the United States. Emerg Infect Dis 5:607–625. doi: 10.3201/eid0505.990502.
2. Bettelheim KA. 2007. The non-O157 Shiga-toxigenic (verocytotoxigenic) Escherichia coli; under-rated pathogens. Crit Rev Microbiol 33:67–87. doi: 10.1080/10408410601172172.
3. Brooks JT, Sowers EG, Wells JG, Greene KD, Griffin PM, Hoekstra RM, Strockbine NA. 2005. Non-O157 Shiga toxin-producing Escherichia coli infections in the United States, 1983–2002. J Infect Dis 192:1422–1429. doi: 10.1086/466536.
4. Hughes JM, Wilson ME, Johnson KE, Thorpe CM, Sears CL. 2006. The emerging clinical importance of non-O157 Shiga toxin-producing Escherichia coli. Clin Infect Dis 43:1587–1595. doi: 10.1086/509573.
5. Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, Turner SW, Korlach J. 2013. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10:563–569. doi: 10.1038/nmeth.2474.
6. Chin CS, Peluso P, Sedlazeck FJ, Nattestad M, Concepcion GT, Clum A, Dunn C, O'Malley R, Figueroa-Balderas R, Morales-Cruz A, Cramer GR, Delledonne M, Luo C, Ecker JR, Cantu D, Rank DR, Schatz MC. 2016. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods 13:1050–1054. doi: 10.1038/nmeth.4035.
7. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 27:722–736. doi: 10.1101/gr.215087.116.