KEYWORDS: foodborne disease, surveillance, public health laboratory, GalaxyTrakr, NCBI-PD, polyclonal source, whole-genome sequencing
ABSTRACT
Clostridium perfringens is the second leading cause of bacterial foodborne illness in the United States. The Wadsworth Center (WC) at the New York State Department of Health enumerates infectious dose from primary patient and food samples and, until recently, identified C. perfringens to the species level only. We investigated whether whole-genome sequence-based subtyping could benefit epidemiological investigations of this pathogen, as it has with other enteric organisms. We retrospectively sequenced 76 patient and food samples received between May 2010 and February 2020, including 52 samples linked epidemiologically to 13 outbreaks and 24 sporadic samples not linked to other samples. Phylogenetic trees were built using two Web-based platforms: National Center for Biotechnology Information Pathogen Detection (NCBI-PD) and GalaxyTrakr (a Galaxy instance supported by the GenomeTrakr initiative). For GalaxyTrakr analyses, single nucleotide polymorphism (SNP) matrices and maximum-likelihood (ML) trees were generated using three different reference genomes. Across the four separate analyses, phylogenetic clustering was generally concordant with epidemiologically identified outbreaks. SNP diversity among phylogenetically linked samples from an outbreak ranged from 0 to 20 SNPs, excepting one outbreak ranging from 4 to 62 SNPs. Importantly, four of the 13 outbreaks harbored one or more samples that were phylogenetic outliers, and for two outbreaks, no samples were closely related. Two specimens were found harboring two distinct genotypes. For samples below the CDC enumeration threshold, phylogenetic clustering was robust and linked patient and/or food samples.
We concluded that WGS phylogenetic clusters (i) are largely concordant with epidemiologically defined outbreaks, irrespective of analysis platform or reference genome we employed; (ii) have limited pairwise SNP diversity, allowing phylogenetic clusters to be distinguished from sporadic cases; and (iii) can aid in epidemiological investigations by identifying outlier and polyclonal samples.
INTRODUCTION
Clostridium perfringens, a widely distributed anaerobic Gram-positive spore-forming bacterium, is the second leading cause of bacterial foodborne illness in the United States (approximately 1,000,000 cases annually) and Great Britain and fourth leading cause in Europe (1–3). It inhabits soil and the intestinal tracts of animals, with outbreaks often associated with consumption of beef, poultry, pork and meat-containing products such as gravies and stews (4, 5). Ingested vegetative cells enter the small intestine, where sporulation releases enterotoxin, causing gastroenteritis and diarrhea. Ingestion of a few cells may not result in disease, but ingestion of 100,000 or more cells per gram of food can cause symptoms of C. perfringens food poisoning within 6 to 24 h (6–8). One study found that about half of asymptomatic healthy human subjects carried C. perfringens in their intestines, some with two genotypes (9). Furthermore, C. perfringens can harbor a wide array of toxigenic genes (2, 3, 10, 11) and has a highly divergent open pangenome (10, 12).
In the absence of typing below the species level, several issues complicate the linking of C. perfringens to specific outbreaks. The organism’s widespread distribution can lead to chance inclusion of isolates with different genotypes and provenance in the samples representing an outbreak. Likewise, asymptomatic carriage in healthy individuals (9) can lead to patient symptoms being linked to C. perfringens intoxication when there is another etiology. Finally, exclusion of samples below the CDC enumeration thresholds can lead to missed associations.
In the U.S., reporting and investigation of suspected outbreaks of C. perfringens are initiated regionally by local, state, and regional health department epidemiologists when clusters of two or more similar illnesses are linked to a common food. Positive laboratory findings are then reported to the CDC. The New York State Department of Health (NYSDOH) uses a coordinated three-pronged approach involving the Division of Epidemiology, the Bureau of Community Environmental Health and Food Protection, and the Wadsworth Center’s laboratories, all working together to identify pathogens and food vehicles driving an outbreak. Clinical and food specimens are collected for laboratory testing and confirmation of causative pathogens. Per CDC guidance, etiology is confirmed if clinical or food samples contain 10⁶ or 10⁵ CFU/g of C. perfringens, respectively (5). In NYS, samples that yield C. perfringens counts under CDC’s threshold are reported to NYSDOH Epidemiology and Environmental Health, and when possible, five isolated colonies are saved from the enumeration plate. Importantly, in the past NYS has not typed C. perfringens below the species level.
In contrast, case-based surveillance of other major enteric pathogens is coordinated at the national level by PulseNet (CDC). Recently, whole-genome-sequencing (WGS)-based phylogenetic analysis has become the gold standard for subtyping (13), replacing pulsed-field gel electrophoresis (PFGE) in 2019. PulseNet and GenomeTrakr have established sequencing quality metrics and minimal metadata standards and suggest thresholds of genomic divergence to aid state and federal laboratorians and epidemiologists in assigning samples to a phylogenetic cluster. While still evolving, this approach has been widely adopted because of the improved resolution of WGS over PFGE. The system has so far proven to be robust and of great benefit to the surveillance network in establishing outbreak sources (13–18).
To explore the potential for similar benefits from incorporating WGS phylogenetic analysis into locally driven epidemiological foodborne outbreak investigations of C. perfringens, we retrospectively sequenced 52 samples associated with 13 foodborne outbreaks and 24 sporadic samples collected from May 2010 to February 2020 in NYS. The resulting data set was used to address the following questions. Is WGS-based phylogenetic clustering concordant with the epidemiologically defined outbreaks? Can we start to define a single nucleotide polymorphism (SNP) threshold or range that is likely to indicate a single source to aid in source attribution? Can WGS-based surveillance aid epidemiologists by refining assignments of samples to an outbreak that otherwise might confound an investigation?
MATERIALS AND METHODS
Isolation and culture.
Food samples were diluted 1:10 in sterile phosphate-buffered saline (PBS) (pH 7.4) and homogenized in a stomacher (Seward, United Kingdom). Stool specimens were diluted 1:10 in 1% peptone water and heated at 75°C for 20 min to eliminate competitive bacterial organisms. Serial dilutions from 10⁻¹ to 10⁻⁵ of the homogenates were inoculated into tryptose sulfite cycloserine agar plates and incubated anaerobically for 48 to 72 h. Plates containing 30 to 300 black colonies were counted for enumeration. Five black colonies were isolated and identified using matrix-assisted laser desorption ionization–time of flight (MALDI-TOF) mass spectrometry with the MALDI Biotyper IVD system (Bruker Daltonics, Billerica, MA). The selected colonies were frozen at −80°C in glycerol for long-term storage.
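The enumeration arithmetic implied by this procedure can be sketched as follows. This is a minimal illustration, not the laboratory's protocol: the function name `cfu_per_gram` is hypothetical, the plated volume of 0.1 ml is an assumption (the text does not state it), and 1 ml of undiluted homogenate is treated as equivalent to 1 g of sample.

```python
def cfu_per_gram(colonies, total_dilution, plated_volume_ml=0.1):
    """Estimate CFU per gram of sample from a single countable plate.

    total_dilution is the dilution of the plated homogenate relative to the
    original sample (for example 1e-3). plated_volume_ml is an assumption;
    the plated volume is not stated in the text.
    """
    # Only plates within the countable range described in the text are used.
    if not 30 <= colonies <= 300:
        raise ValueError("only plates with 30 to 300 colonies are counted")
    if total_dilution <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution and plated volume must be positive")
    # CFU in the undiluted sample = colonies / (dilution x volume plated),
    # treating 1 ml of undiluted homogenate as 1 g of sample (assumption).
    return colonies / (total_dilution * plated_volume_ml)


if __name__ == "__main__":
    # Hypothetical plate: 150 black colonies on the 10^-4 dilution.
    print(f"{cfu_per_gram(150, 1e-4):.2e} CFU/g")
```

A plate outside the 30-to-300 range raises an error rather than returning an estimate, mirroring the counting rule above.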
DNA extraction.
DNA was extracted from overnight cultures using the QIAcube (Qiagen, Germantown, MD). Before being loaded on the QIAcube, samples were lysed in 100 mg/ml lysozyme for 30 min on a shaking thermomixer at 56°C; they were then placed in the QIAcube and run using the standard QIAamp DNA blood minikit protocol.
Library preparation, sequencing, quality assurance check, and data deposition.
Library preparation and sequencing were performed at the WC Advanced Genomic Technologies Center (AGTC). Library preparation followed standard Illumina protocols for Nextera XT or Nextera DNA Flex kits. We performed sequencing either on a MiSeq system using 2 × 250 chemistry and version 2 kits or on a NextSeq system using 2 × 150 chemistry and version 2 kits. NextSeq reads were demultiplexed using the Illumina BCL2FASTQ script (Illumina, San Diego, CA).
Read quality was assessed to ensure that minimum quality thresholds established by the Center for Food Safety and Applied Nutrition (CFSAN) were met using MicroRunQC implemented on the GalaxyTrakr platform of Galaxy (19). The mean and median estimated average coverages for all reads were 91× and 84×, respectively, and ranged from 26× to 202×. Q scores for all reads exceeded 32.5, and estimated genome sizes ranged from 2,724,566 to 3,778,059 bp (Table S1).
All samples were processed through the NCBI-Pathogen Detection (PD) (https://www.ncbi.nlm.nih.gov/pathogens/) (20) except SRR6443790, which did not meet NCBI-PD quality thresholds.
Phylogenetic analysis.
We used two Web-based bioinformatic platforms to perform four independent reference-based phylogenetic analyses: (i) NCBI-PD and (ii) GalaxyTrakr (https://galaxytrakr.org), using three different reference genomes. NCBI-PD is a Web-based pipeline that produces SNP-based trees for a number of pathogens in near real time. For Clostridium perfringens, NCBI-PD uses a two-pass clustering method, initial k-mer-based clustering followed by single linkage clustering of isolates that are within 50 SNPs of each other. The reference is chosen within the cluster, and the tree is reconstructed using a maximum-compatibility algorithm (21). At the time of this writing, there were 417 samples in 42 trees for C. perfringens in the NCBI-PD database.
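The single-linkage step described above can be illustrated with a short sketch. It is not the NCBI-PD implementation: it omits the initial k-mer pass, the function name `single_linkage_clusters` and the input layout are hypothetical, and treating "within 50 SNPs" as an inclusive bound (≤50) is an assumption.

```python
def single_linkage_clusters(samples, pair_distances, max_snps=50):
    """Group samples so that any two samples joined by a chain of pairwise
    distances <= max_snps end up in the same cluster (single linkage).

    pair_distances maps (sample_a, sample_b) tuples to SNP distances.
    """
    parent = {s: s for s in samples}

    def find(s):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), snps in pair_distances.items():
        if snps <= max_snps:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for s in samples:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Made-up distances: A-B and B-C are close, A-C is not, D is distant.
    distances = {
        ("A", "B"): 10, ("B", "C"): 40, ("A", "C"): 60,
        ("A", "D"): 500, ("B", "D"): 510, ("C", "D"): 520,
    }
    print(single_linkage_clusters(["A", "B", "C", "D"], distances))
```

In the made-up example, A and C are 60 SNPs apart yet share a cluster because each is within 50 SNPs of B. This chaining is characteristic of single linkage and is one reason intracluster distances can exceed the linking threshold.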
For GenomeTrakr laboratories, FDA-CFSAN has developed GalaxyTrakr (https://galaxytrakr.org), a Galaxy instance with curated tools and standardized bioinformatic workflows for analysis of foodborne bacteria. We uploaded C. perfringens WGS paired-end reads to GalaxyTrakr (v. 1.0.1) and used the CFSAN SNP pipeline on default settings to create a high-quality SNP (hq-SNP) pairwise matrix (22). To analyze branch reliability, we generated 500 rapid bootstrap data set replicates using RAxML (v. 8.2.4; general time-reversible maximum-likelihood [ML] model) (23). Three independent ML trees were created using three of the four available C. perfringens closed genomes from human sources at NCBI as references: the species type strain, ATCC 13124 (NC_008261.1) (10), from a gas gangrene patient, and two strains from cases of human food poisoning, strain SM101 (CP000312.1) (10) and strain FORC_025 (NZ_CP013101.1).
Data availability.
Reads and associated metadata were uploaded to the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA) and Biosample databases, respectively (BioProject no. PRJNA420718) and marked for immediate release. BioProject no. PRJNA420718 is linked to the GenomeTrakr umbrella BioProject.
RESULTS
From May 2010 to February 2020, 83 clinical and food samples from 13 epidemiologically defined foodborne outbreaks were collected by NYSDOH Bureau of Community Environmental Health and Food Protection and sent to the WC laboratory, which isolated C. perfringens from 52 of 83 samples without further subtyping. The pathogen was recovered from both food and clinical sources in seven outbreaks, clinical samples only in four, and food samples only in two (Table 1). Food sources included beef, poultry, pork, meat-containing foods, cabbage, corn, and potato. We retrospectively performed WGS-based phylogenetic clustering on all 52 specimens from the 13 epidemiologically defined outbreaks and an additional 24 sporadic isolates collected during the same time period (Table S2).
TABLE 1.
Outbreak sample types, intracluster pairwise SNP distance ranges, and distances to outliers
| Outbreak | Total samples and source (food/clinical) | No. (%) and source of phylogenetic outliers | Intracluster SNP rangeᵃ, NCBI_PDᵇ | Intracluster SNP rangeᵃ, FORC_025 | Intracluster SNP rangeᵃ, ATCC 13124 | Intracluster SNP rangeᵃ, SM101 | Outlier-to-cluster SNP rangeᵃ, FORC_025 | Outlier-to-cluster SNP rangeᵃ, ATCC 13124 | Outlier-to-cluster SNP rangeᵃ, SM101 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 4 roast beef/1 stool | 3 (60) beef | 3 | 3 | 2 | 5 | 795–800 | 847–857 | 491–496 |
| 2 | 1 corn/2 stool | 1 (33.3) stool | 20 | 19 | 4 | 62 | 239–242 | 175–176 | 1,626–1,634 |
| 3 | 1 roast beef/2 stool | 1 (33.3) stool | 8 | 0 | 1 | 8 | 646–647 | 694 | 467–468 |
| 4 | 1 pork, 1 chicken | NA | 12 | 2 | 1 | 9 | NA | NA | NA |
| 5 | 2 corn | NA | 3 | 2 | 0 | 0 | NA | NA | NA |
| 6 | 3 stool | NA | 5–6 | 0–3 | 1–2 | 4–5 | NA | NA | NA |
| 7 | 1 roast beef/3 stool | NA | 2–8 | 1–4 | 1–2 | 6–10 | NA | NA | NA |
| 9 | 1 potato, 1 gravy/8 stool | 1 (10) stool | 2–11 | 0–6 | 0–6 | 2–14 | 962–1,088 | 1,120–1,215 | 1,337–1,436 |
| 10 | 1 potato, 1 cabbage, 1 corned beef/2 stool | NA | 0–4 | 1–5 | 0–2 | 1–10 | NA | NA | NA |
| 11 | 2 stool | NA | 11 | 0 | 3 | 6 | NA | NA | NA |
| 12 | 3 stool | 3 (100) stool | No cluster | No cluster | No cluster | No cluster | 804–983ᶜ | 997–1,018ᶜ | 464–511ᶜ |
| 13 | 6 stool | 6 (100) stool | No cluster | No cluster | No cluster | No cluster | 793–1,018ᶜ | 939–1,293ᶜ | 438–668ᶜ |
| 14 | 1 beef stew/3 stool | NA | 0–1 | 1–2 | 1–4 | 1–2 | NA | NA | NA |
ᵃ Pairwise SNP distance ranges were calculated with the NCBI-PD pipeline and with the GalaxyTrakr CFSAN SNP pipeline using three reference genomes: ATCC 13124 (NC_008261.1), SM101 (CP000312.1), and FORC_025 (CP013101.1). NA, not applicable.
ᵇ Analysis was done on 29 July 2020.
ᶜ SNP diversity among all isolates in the cluster.
Currently, there are no standardized analysis methods or guidelines for clustering thresholds for WGS surveillance of C. perfringens. Furthermore, it has a high degree of intraspecies genomic diversity (10, 12), which should be considered when WGS cluster SNP thresholds indicative of a single source are being established. To ensure that our methods were accessible to the nonbioinformatician and that our selection of an analysis method and reference genomes yielded robust results, we used two Web-based bioinformatic platforms to perform four independent but related approaches for reference-based phylogenetic clustering. In one case, clustering was examined in hq-SNP trees built by the NCBI-PD pipeline. In a second, we used three different references to create ML trees using the CFSAN pipeline implemented in GalaxyTrakr.
Impact of methodology on cluster assignment and pairwise SNP diversity.
The NCBI-PD and the three CFSAN pipeline analyses, using different reference genomes, clustered the same outbreak samples together. Depending on the method used, intracluster pairwise SNP distances ranged from 0 to 20 SNPs for all outbreaks except outbreak 2, where we saw a 62-SNP distance between a corn sample and stool sample using the SM101 reference (Table 1). Pairwise SNP distance ranges for a given cluster were generally greater for the NCBI-PD and the SM101 reference. We identified 15 outbreak samples as phylogenetic cluster outliers using the CFSAN SNP pipeline with all three reference genomes (Table 1; Fig. 1). Outlier samples were highly divergent, 175 to 1,634 SNPs, depending on the reference genome used, from the other samples in the outbreak and usually resided in a separate, well-supported clade (Table 1; Fig. 1). The NCBI-PD also identified the same outlier samples, with the exception of one sample with low-quality metrics which was not analyzed (Table S2). The intercluster pairwise SNP distances for the GalaxyTrakr trees were 23 to 1,247, 32 to 1,315, and 43 to 2,314 for the FORC_025, ATCC 13124, and SM101 references, respectively. The NCBI-PD clustered each set of outbreak samples into unique NCBI-PD trees, indicating an intercluster distance of at least 50 SNPs (Table S2). For all outbreaks except 6 and 10, the NYS samples were the sole occupants of the NCBI-PD tree built for those isolates.
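The intracluster and outlier-to-cluster ranges reported in Table 1 are summaries of a pairwise SNP matrix. A minimal sketch of that summary step follows; the helper names and the nested-dictionary matrix layout are hypothetical and do not reflect the CFSAN SNP pipeline's output format. The example values are patterned on the outbreak 3 FORC_025 column of Table 1, and the assignment of 646 versus 647 SNPs to a particular cluster member is arbitrary.

```python
from itertools import combinations


def symmetric_matrix(pairs):
    """Build a nested dict so that matrix[a][b] == matrix[b][a]."""
    matrix = {}
    for (a, b), snps in pairs.items():
        matrix.setdefault(a, {})[b] = snps
        matrix.setdefault(b, {})[a] = snps
    return matrix


def pairwise_range(matrix, members):
    """Return (min, max) SNP distance over all pairs of cluster members."""
    distances = [matrix[a][b] for a, b in combinations(members, 2)]
    if not distances:
        raise ValueError("need at least two samples to form a pair")
    return min(distances), max(distances)


def outlier_to_cluster_range(matrix, outlier, members):
    """Return (min, max) SNP distance from one outlier to each cluster member."""
    distances = [matrix[outlier][m] for m in members]
    return min(distances), max(distances)


if __name__ == "__main__":
    matrix = symmetric_matrix({
        ("roast_beef", "stool_a"): 0,
        ("roast_beef", "stool_b"): 646,
        ("stool_a", "stool_b"): 647,
    })
    cluster = ["roast_beef", "stool_a"]
    print(pairwise_range(matrix, cluster))
    print(outlier_to_cluster_range(matrix, "stool_b", cluster))
```

Running the same functions on the matrix from each reference genome would yield one range per method, which is how the per-column entries of such a table can be compared.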
FIG 1.
An hq-SNP phylogenetic tree for all samples in this study using the FORC_025 (NZ_CP013101.1) closed genome as a reference. The tree was created using the CFSAN pipeline implemented at GalaxyTrakr and visualized on iTOL (32). Outbreaks are indicated by different colors (see legend), with phylogenetic clusters within an outbreak indicated by bars and outliers for a given outbreak indicated by arrows. Sporadic samples (samples not associated epidemiologically with any of the outbreaks in this study) are in black. Asterisks indicate branches with more than 75% bootstrap support; the plus signs indicate two sporadic samples that are closely related phylogenetically but were collected almost 8 years apart. Solid horizontal branch line lengths are proportional to the amount of inferred evolutionary change. rb, roast beef.
Epidemiological outbreak and phylogenetic cluster concordance.
Of the 52 samples linked to the 13 outbreaks, 37 samples resided in phylogenetic clusters concordant with epidemiological outbreaks, while six outbreaks (1, 2, 3, 9, 12, and 13) yielded a total of 15 samples that were clear outliers (175 to 1,634 SNPs from other outbreak samples) (Table 1; Fig. 1). In outbreak 1, a large roast beef portion was divided into four sections before sampling. Only section 3 yielded an isolate matching the single stool sample (2- to 5-pairwise-SNP distance range, depending on method used) (Table 1; Fig. 1). Depending on the reference used, the other three sections were all closely related (4 to 5 SNPs, 2 to 7 SNPs, and 13 to 18 SNPs for references ATCC 13124, FORC_025, and SM101, respectively) but were 491 to 857 SNPs distant from the stool/roast beef section 3 cluster (Table 1; Fig. 1). Outbreaks 2, 3, and 9 all had a single stool outlier sample which, depending on the method used and the cluster, was 175 to 1,634 SNPs away from other outbreak samples (Table 1; Fig. 1). NCBI-PD assigned no outlier samples to the outbreak tree, indicating that they were >50 SNPs away (Table S2). For samples from outbreaks 12 and 13, the CFSAN pipeline yielded pairwise SNP distance ranges of 438 to 1,293 SNPs, showing that none of the samples in these outbreaks were closely related (Table 1; Fig. 1). Furthermore, none of these samples were assigned to a tree by NCBI-PD, indicating a divergence of >50 SNPs between all samples for these two outbreaks (Table S2).
None of the 24 sporadic samples were placed in an NCBI-PD tree, indicating a ≥50-SNP divergence among these samples and all samples in the study or the NCBI-PD database (Table S2). Using the CFSAN pipeline, pairwise distances between sporadic and clustered samples (excluding outbreak clusters 12 and 13, with no clustered samples) ranged from 105 to 1,275 (FORC_025 reference), 89 to 1,530 (ATCC 13124 reference), or 198 to 2,257 (SM101 reference) SNPs, with the lowest interquartile range of 387 to 489 SNPs (cluster 11 SM101 reference) (Fig. 2). Interestingly, two sporadic samples, SRR7066572 and SRR10154212 (Fig. 1, plus symbol), collected 8 years apart from different locations, were not linked at NCBI-PD but were 14 (FORC_025 reference), 17 (ATCC 13124 reference), or 10 (SM101 reference) SNPs apart using the CFSAN pipeline.
FIG 2.
Pairwise SNP distances between all outbreak clusters (except 12 and 13) and all sporadic samples analyzed on GalaxyTrakr using the three reference genomes. Individual outbreak clusters are listed on the x axis. The y axis plots range, first, second, third, and fourth quartile, mean (×), and outlier (single dots) distances between all samples in a cluster and all sporadic samples.
Phylogenetic clustering is robust independent of infectious dose.
CDC and FDA protocols require enumeration of the infectious dose to support epidemiological investigations of C. perfringens and to confirm etiology. Thresholds of 10⁵ and 10⁶ CFU/g for food/environmental and clinical samples, respectively, are required for a sample to be considered for inclusion in an outbreak cluster. Twenty to 100% of samples in a given outbreak cluster failed to meet the enumeration threshold (39 of 52 samples) (Table 2). Overall, 74% of food and 78% of clinical samples failed to pass enumeration thresholds.
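The threshold check and the per-outbreak percentages can be expressed in a few lines. This is a sketch under stated assumptions: the names are hypothetical, a count exactly at the threshold is treated as meeting it, and a sample that could not be enumerated (`None`) is counted as below threshold. The sample values in the example are made up.

```python
# Thresholds in CFU/g as given in the text: 10^5 for food/environmental
# samples and 10^6 for clinical samples.
CDC_THRESHOLD_CFU_PER_G = {"food": 1e5, "clinical": 1e6}


def meets_threshold(sample_type, cfu_per_g):
    """True if the count reaches the threshold for this sample type.

    Assumption: None (sample could not be enumerated) is treated as not
    meeting the threshold.
    """
    if cfu_per_g is None:
        return False
    return cfu_per_g >= CDC_THRESHOLD_CFU_PER_G[sample_type]


def percent_below(samples):
    """Percentage of (sample_type, cfu_per_g) samples below threshold."""
    below = sum(1 for kind, count in samples if not meets_threshold(kind, count))
    return 100 * below / len(samples)


if __name__ == "__main__":
    # Hypothetical outbreak with one food and three clinical samples.
    outbreak = [("food", 2e5), ("clinical", 5e4), ("clinical", None), ("clinical", 3e6)]
    print(f"{percent_below(outbreak):.0f}% of samples below threshold")
```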
TABLE 2.
Samples that failed to meet enumeration thresholds
| Outbreak | Total samples and source (environmental/clinical) | No. of samples below CDC threshold, environmental/clinical (%) |
|---|---|---|
| 1 | 4 roast beef/1 stool | 1/0 (20) |
| 2 | 1 corn/2 stool | 1/0 (33) |
| 3 | 1 roast beef/2 stool | 0/1 (33) |
| 4 | 1 pork, 1 chicken | 2 (100) |
| 5 | 2 corn | 2 (100) |
| 6 | 3 stool | 3 (100) |
| 7 | 1 roast beef/3 stool | 3/1 (100) |
| 9 | 1 potato, 1 gravy/8 stool | 1/8 (90) |
| 10 | 1 potato, 1 cabbage, 1 corned beef/2 stool | 3/2 (100) |
| 11 | 2 stool | 2 (100) |
| 12 | 3 stool | 3 (100) |
| 13 | 6 stool | 5 (83) |
| 14 | 1 beef stew/3 stool | 0/1 (25) |
For one patient in outbreak 9, we received two samples collected 1 day apart (SRR6930169 and SRR6985953) (Table S2). The first specimen had 9.45 × 10⁷ CFU/g, meeting the enumeration threshold, but the second specimen had insufficient material to permit enumeration. Nonetheless, the NCBI-PD and CFSAN pipeline runs placed them within the same phylogenetic cluster at 1 to 7 SNPs apart, depending on the analytical method used.
Multiple genotypes from a single source.
Single clinical or environmental sources may harbor multiple strains of C. perfringens, as observed in the roast beef from outbreak 1. This may cause misidentification of an isolate as an outlier, confounding an investigation by suggesting that the isolate is unrelated to the outbreak. To examine this possibility, five isolates from the enumeration plates were sequenced from two samples each in outbreaks 1 and 9 that harbored phylogenetic outliers.
In outbreak 1, the single stool isolate matched the single roast beef isolate from section 3, but neither matched isolates from roast beef sections 1, 2, and 4 (Table 1; Table S2; Fig. 1). To determine if we could detect other roast beef strains in the stool or in roast beef section 3, four additional isolates from the enumeration plate were sequenced. All five isolates from roast beef section 3 had a pairwise SNP range of 0 to 14 but were 479 to 857 (depending on the reference genome) SNPs apart from isolates from sections 1, 2, and 4 (Table 3; Fig. 1). Four stool isolates had a pairwise SNP range of 0 to 11, but interestingly, isolate 3 was 537 to 983 SNPs from other stool isolates (Table 3; Fig. 1) and also not closely related to any roast beef isolates. Thus, no additional strains were detected in the stool or roast beef section 3 that matched the three roast beef outlier samples, although a unique outlier was isolated from the stool (Fig. 1; Table 3).
TABLE 3.
Pairwise SNP distance ranges for multiple isolates from a single sample
| Outbreak | Multiple isolates from a single sample | Multiple-isolate SNP rangeᵃ, NCBI_PDᵇ | Multiple-isolate SNP rangeᵃ, FORC_025 | Multiple-isolate SNP rangeᵃ, ATCC 13124 | Multiple-isolate SNP rangeᵃ, SM101 | Outlier-to-multiple-isolate-cluster SNP rangeᵃ, FORC_025 | Outlier-to-multiple-isolate-cluster SNP rangeᵃ, ATCC 13124 | Outlier-to-multiple-isolate-cluster SNP rangeᵃ, SM101 | Outlier sample(s) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Roast beef section 3, isolates 1–5 | 1–8 | 0–6 | 1–6 | 5–14 | 786–800 | 836–857 | 479–496 | Roast beef sections 1, 2, and 4 |
| 1 | Stool isolates 1, 2, 4, and 5 | 0–11 | 1–2 | 1–4 | 5–10 | 864–877 | 966–983 | 537–548 | Stool isolate 3 |
| 9 | Gravy isolates 1–5 | 4–8 | 1–8 | 0–2 | 4–17 | NA | NA | NA | NA |
| 9 | Stool isolates 1–5 | 1–12 | 1–5 | 0–2 | 0–5 | 962–1,091 | 1,115–1,217 | 1,326–1,436 | Cluster 9 |
ᵃ Pairwise SNP distance ranges were calculated with the NCBI-PD pipeline and with the GalaxyTrakr CFSAN SNP pipeline using three reference genomes: ATCC 13124 (NC_008261.1), SM101 (CP000312.1), and FORC_025 (CP013101.1). NA, not applicable.
ᵇ Analysis was done on 29 July 2020.
Among the outbreak 9 isolates, one stool isolate (SRR6986379) did not cluster with the other stool, gravy, and mashed potato isolates (962 to 1,436 SNPs distant) (Table 1; Table S2; Fig. 1). Four additional isolates were each sequenced from the enumeration plate for the gravy and the outlier stool sample. All 5 gravy and 5 stool isolates had pairwise SNP distance ranges of 0 to 17 SNPs and 0 to 12 SNPs, respectively, depending on the reference used (Fig. 1; Table 3). Thus, we did not recover any genotype that linked the outlier stool isolate to the outbreak.
DISCUSSION
For enteric organisms such as Escherichia coli, Listeria monocytogenes, and Salmonella enterica, nationally coordinated WGS-based surveillance implemented by a network of state and federal public health laboratories improves cluster identification and source traceback (13, 16, 18, 24–28). In contrast, C. perfringens foodborne outbreaks are generally identified through epidemiologically driven local outbreak investigations and typed only to the species level. This study of 13 outbreaks within a 10-year period showed that WGS phylogenetic analysis would be of benefit to these local investigations. Using two Web-based bioinformatic platforms, we demonstrated that WGS phylogenetic analyses are largely concordant with epidemiologically defined outbreaks, have a limited pairwise SNP diversity that allows clusters to be distinguished from sporadic cases, and can aid in refining epidemiological investigations.
Reference-based WGS phylogenetic clustering was generally concordant for 11 of 13 epidemiologically defined outbreaks. For all analyses, irrespective of bioinformatic platform or reference genome, phylogenetically related isolates within an outbreak had limited pairwise SNP distances (0 to 20 SNPs) except for outbreak 2, in which the SM101 reference genome yielded 62 SNPs between the stool and corn samples (Table 1). Furthermore, while the FORC_025 and ATCC 13124 references yielded pairwise SNP distances between outbreak clusters and sporadic samples that were very similar, SM101 was notably different (Fig. 2), reflecting a substantially different underlying tree structure. Based on these observations, SM101 may not be a good reference and will not be used in analysis going forward.
The pairwise SNP distance between any given sample in a phylogenetic cluster to a sporadic sample ranged from 89 to 2,257 SNPs, depending on the reference used, with cluster 11 displaying the lowest interquartile range of 387 to 489 using the SM101 reference (Fig. 2). Thus, based on pairwise SNP distances, phylogenetic clusters in this study are readily distinguished from sporadic samples. However, with continued WGS-based surveillance increasing the numbers of samples in the tree (either at NCBI or in our GalaxyTrakr analysis), we expect that these SNP differences between sporadic and outbreak samples will be reduced or disappear, emphasizing that phylogenetic relationships must be interpreted in the context of epidemiological data. For example, two putative sporadic samples, SRR7066572 and SRR10154212, were closely related according to the GenomeTrakr analysis pipeline but were collected 8 years apart. Such information might not be helpful in active outbreak investigations but could indicate long-term persistence of pathogens in the environment. As GenomeTrakr and NCBI databases expand, analysis of environmental persistence will be possible and should aid in source attribution and further understanding of C. perfringens ecology.
The difference in SNP pairwise distances within a cluster and those found in comparisons of clusters to sporadic samples permits us to refine outbreak cluster definitions and thus aid epidemiological investigations. For example, using the GalaxyTrakr pipeline, all 9 samples in outbreak clusters 12 and 13 had intracluster pairwise SNP distances that were much greater (464 to 1,018 and 438 to 1,293 SNPs, respectively) than the 0 to 20 pairwise SNP range associated with other clusters. Furthermore, a separate NCBI-PD analysis of these samples placed none of them in a phylogenetic tree, indicating they were >50 SNPs from any other samples in our data set or the NCBI-PD database. In such circumstances, epidemiologists could be alerted that the samples were unlikely to share a source. Similarly, phylogenetic outlier samples from outbreaks 1, 2, 3, and 9 were identified because their pairwise SNP distance from other outbreak samples was well outside the 0- to 20-SNP range, indicating that these isolates might stem from separate sources or a single polyclonal source. Such information could be used to guide an investigation.
Two large studies in England and France investigated the phylogenetic relationships, based on WGS data, for reported outbreaks of C. perfringens (2, 3). Both studies support our finding that WGS phylogenetic cluster analysis can be used to support and refine epidemiological studies of foodborne outbreaks of C. perfringens. In the French study, sequencing of 58 isolates linked to foodborne outbreaks yielded clusters with a mean pairwise SNP distance of 7 (3). In the British study, pairwise SNP diversity for foodborne outbreaks ranged from 0 to 21 SNPs (2). Both studies found outbreak outliers based on phylogenetic analysis.
Based on our findings, NYSDOH will continue using the NCBI-PD and GalaxyTrakr platforms for WGS surveillance of C. perfringens. For each outbreak, reports to our epidemiologists will include all pairwise SNP distances for all outbreak samples from each platform, noting isolates with pairwise distances of ≤20 SNPs as likely to have arisen from a single source. Nonoutbreak samples within 20 SNPs of outbreak samples, irrespective of the collection source, time, and location, will also be reported. If needed, additional bioinformatic and/or epidemiological data can be gathered in cases of high method discordance. Finally, as databases grow, additional epidemiological findings will be available to aid in discussions with our epidemiologists regarding refining pairwise SNP ranges that indicate a common source.
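The reporting rule described above (pairs within 20 SNPs noted as likely single source, with discordance between methods flagged for follow-up) can be sketched as follows. The function name, the input layout, and the status labels are hypothetical and do not describe NYSDOH's actual reporting software. The example uses the outbreak 2 corn/stool distances from Table 1.

```python
def single_source_report(distances_by_method, threshold=20):
    """Combine pairwise SNP distances from several methods into one report.

    distances_by_method maps a method name to {(sample_a, sample_b): snps}.
    Returns a list of (pair, {method: snps}, status) tuples sorted by pair.
    """
    combined = {}
    for method, pairs in distances_by_method.items():
        for pair, snps in pairs.items():
            key = tuple(sorted(pair))  # treat (a, b) and (b, a) as one pair
            combined.setdefault(key, {})[method] = snps

    rows = []
    for pair in sorted(combined):
        calls = [snps <= threshold for snps in combined[pair].values()]
        if all(calls):
            status = "likely single source"
        elif any(calls):
            status = "discordant between methods"
        else:
            status = "not linked"
        rows.append((pair, combined[pair], status))
    return rows


if __name__ == "__main__":
    # Outbreak 2 corn/stool intracluster distances from Table 1.
    outbreak_2 = {
        "NCBI-PD": {("corn", "stool"): 20},
        "FORC_025": {("corn", "stool"): 19},
        "ATCC 13124": {("corn", "stool"): 4},
        "SM101": {("corn", "stool"): 62},
    }
    for pair, by_method, status in single_source_report(outbreak_2):
        print(pair, by_method, status)
```

With all four methods included, the outbreak 2 pair is flagged as discordant because only the SM101 distance exceeds 20 SNPs; dropping SM101 leaves three concordant calls.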
All state and many regional public health laboratories in the United States now have WGS capabilities, due in large part to the CDC PulseNet and FDA GenomeTrakr networks’ adoption of WGS for surveillance (13, 16, 17). However, many still do not have in-house bioinformatic capabilities, access to GalaxyTrakr, or other WGS analysis tools. Any laboratory performing C. perfringens sequencing can submit their sequence and metadata for inclusion and analysis in the NCBI-PD to aid in detecting genomic clusters (29). Using this pipeline is straightforward (30), requires no bioinformatic expertise, and produces results that are highly concordant with other hq-SNP pipelines. Submitting laboratories are required only to provide read data with minimum-quality metrics and to make sequence data and limited metadata publicly available (31). Critically, NCBI-PD allows for surveillance outside a laboratory’s jurisdiction.
A major limitation for genomic epidemiology of C. perfringens surveillance is that there is no national surveillance system for C. perfringens. Outbreaks are generally local, single-source events (e.g., an event meal) and of short duration. Furthermore, because of this organism’s widespread distribution and asymptomatic carriage, there is no national diagnostic infrastructure, as exists for other major enteric organisms. Thus, investigations are initiated locally by epidemiologists. Nevertheless, as we show, such investigations would benefit from WGS-based subtyping. Additional limitations include the low numbers of samples currently curated in the NCBI-PD database (to date, 417). For the 13 NYS outbreaks, only outbreaks 6 and 10 were linked to non-NYS samples, which were from France and Italy, respectively, and collected a number of years before the NYS outbreaks. As more laboratories start performing WGS and uploading data for C. perfringens to NCBI-PD, phylogenetic matches to samples beyond the immediate investigation should be more common. Another limitation is that samples may be misidentified as outliers because of widespread distribution of the organism, increasing the possibility that a primary sample may harbor multiple genotypes, as observed by us and others (2, 3, 9). One solution is to sequence multiple isolates from primary samples that are epidemiologically linked to the outbreak, but because of the expense, this would be reserved for situations demonstrating a strong epidemiological need for additional genomic data. Finally, while clustering by the four approaches we employed was concordant and would be useful in supporting outbreak investigations, an analysis of the impact of each reference genome on the deeper tree structure was not undertaken. Therefore, these data and this approach cannot be used to infer the evolutionary history of these isolates until such analyses are performed.
CDC guidance recommends confirming C. perfringens etiology by enumeration. We found that phylogenetic analysis for all isolates was robust irrespective of the level of enumeration. Thus, by implementing WGS, laboratories may be able to eliminate enumeration, saving time and money as well as increasing the number of samples with subtyping data available for epidemiological follow-up.
We have demonstrated that WGS-based phylogenetic clustering is a powerful tool for refining epidemiological investigations into foodborne outbreaks of C. perfringens. Such data, while useful in ongoing investigations, will also lead to a better understanding of the clonal distribution of this pathogen and to the identification of its environmental reservoirs, so that effective intervention and food protection measures may be implemented.
ACKNOWLEDGMENTS
We thank the Wadsworth Center Advanced Genomic Technologies Center for sequencing and Wolfgang Haas, Samantha Wirth, and Ruth Timme for reviewing the manuscript.
The study was supported by Cooperative Agreements 5U18FD006229 and 1U18FD006763 with the FDA.
REFERENCES
- 1. Scallan E, Hoekstra RM, Angulo FJ, Tauxe RV, Widdowson MA, Roy SL, Jones JL, Griffin PM. 2011. Foodborne illness acquired in the United States—major pathogens. Emerg Infect Dis 17:7–15. doi: 10.3201/eid1701.p11101.
- 2. Kiu R, Caim S, Painset A, Pickard D, Swift C, Dougan G, Mather AE, Amar C, Hall LJ. 2019. Phylogenomic analysis of gastroenteritis-associated Clostridium perfringens in England and Wales over a 7-year period indicates distribution of clonal toxigenic strains in multiple outbreaks and extensive involvement of enterotoxin-encoding (CPE) plasmids. Microbial Genomics 5:e000297. doi: 10.1099/mgen.0.000297.
- 3. Mahamat A, Radomski N, Delannoy S, Djellal S, Le Négrate M, Hadjab K, Fach P, Hennekinne J, Mistou M, Firmesse O. 2019. Large-scale genomic analyses and toxinotyping of Clostridium perfringens implicated in foodborne outbreaks in France. Front Microbiol 10:777. doi: 10.3389/fmicb.2019.00777.
- 4. Wen Q, McClane BA. 2004. Detection of enterotoxigenic Clostridium perfringens type A isolates in American retail foods. Appl Environ Microbiol 70:2685–2691. doi: 10.1128/aem.70.5.2685-2691.2004.
- 5. Grass JE, Gould LH, Mahon BE. 2013. Epidemiology of foodborne disease outbreaks caused by Clostridium perfringens, United States, 1998–2010. Foodborne Pathog Dis 10:131–136. doi: 10.1089/fpd.2012.1316.
- 6. McClane BA. 2001. Clostridium perfringens, p 351–372. In Doyle MP, Beuchat LR, Montville TJ (ed), Food microbiology: fundamentals and frontiers, 2nd ed. ASM Press, Washington, DC.
- 7. Bennett SD, Walsh KA, Gould LH. 2013. Foodborne disease outbreaks caused by Bacillus cereus, Clostridium perfringens, and Staphylococcus aureus—United States, 1998–2008. Clin Infect Dis 57:425–433. doi: 10.1093/cid/cit244.
- 8. Labbe RG, Juneja VK. 2017. Clostridium perfringens, p 235–242. In Dodd CER, Aldsworth T, Stein RA, Cliver DO, Riemann HP (ed), Foodborne diseases, 3rd ed. Elsevier, Amsterdam, The Netherlands.
- 9. Carman RJ, Sayeed S, Li J, Genheimer CW, Hiltonsmith MF, Wilkins TD, McClane BA. 2008. Clostridium perfringens toxin genotypes in the feces of healthy North Americans. Anaerobe 14:102–108. doi: 10.1016/j.anaerobe.2008.01.003.
- 10. Myers GSA, Rasko DA, Cheung JK, Ravel J, Seshadri R, DeBoy RT, Ren Q, Varga J, Awad MM, Brinkac LM, Daugherty SC, Haft DH, Dodson RJ, Madupu R, Nelson WC, Rosovitz MJ, Sullivan SA, Khouri H, Dimitrov GI, Watkins KL, Mulligan S, Benton J, Radune D, Fisher DJ, Atkins HS, Hiscox T, Jost BH, Billington SJ, Songer JG, McClane BA, Titball RW, Rood JI, Melville SB, Paulsen IT. 2006. Skewed genomic variability in strains of the toxigenic bacterial pathogen, Clostridium perfringens. Genome Res 16:1031–1040. doi: 10.1101/gr.5238106.
- 11. Kiu R, Hall LJ. 2018. An update on the human and animal enteric pathogen Clostridium perfringens. Emerg Microbes Infect 7:1–15. doi: 10.1038/s41426-018-0144-8.
- 12. Kiu R, Caim S, Alexander S, Pachori P, Hall LJ. 2017. Probing genomic aspects of the multi-host pathogen Clostridium perfringens reveals significant pangenome diversity, and a diverse array of virulence factors. Front Microbiol 8:2485. doi: 10.3389/fmicb.2017.02485.
- 13. Kubota KA, Wolfgang WJ, Baker DJ, Boxrud D, Turner L, Trees E, Carleton HA, Gerner-Smidt P. 2019. PulseNet and the changing paradigm of laboratory-based surveillance for foodborne diseases. Public Health Rep 134:22S–28S. doi: 10.1177/0033354919881650.
- 14. Jackson BR, Tarr C, Strain E, Jackson KA, Conrad A, Carleton H, Katz LS, Stroika S, Gould LH, Mody RK, Silk BJ, Beal J, Chen Y, Timme R, Doyle M, Fields A, Wise M, Tillman G, Defibaugh-Chavez S, Kucerova Z, Sabol A, Roache K, Trees E, Simmons M, Wasilenko J, Kubota K, Pouseele H, Klimke W, Besser J, Brown E, Allard M, Gerner-Smidt P. 2016. Implementation of nationwide real-time whole-genome sequencing to enhance listeriosis outbreak detection and investigation. Clin Infect Dis 63:380–386. doi: 10.1093/cid/ciw242.
- 15. Allard MW, Strain E, Rand H, Melka D, Correll WA, Hintz L, Stevens E, Timme R, Lomonaco S, Chen Y, Musser SM, Brown EW. 2019. Whole genome sequencing uses for foodborne contamination and compliance: discovery of an emerging contamination event in an ice cream facility using whole genome sequencing. Infect Genet Evol 73:214–220. doi: 10.1016/j.meegid.2019.04.026.
- 16. Brown E, Dessai U, McGarry S, Gerner-Smidt P. 2019. Use of whole-genome sequencing for food safety and public health in the United States. Foodborne Pathog Dis 16:441–450. doi: 10.1089/fpd.2019.2662.
- 17. Ribot EM, Freeman M, Hise KB, Gerner-Smidt P. 2019. PulseNet: entering the age of next-generation sequencing. Foodborne Pathog Dis 16:451–456. doi: 10.1089/fpd.2019.2634.
- 18. Allard MW, Strain E, Melka D, Bunning K, Musser SM, Brown EW, Timme R. 2016. Practical value of food pathogen traceability through building a whole-genome sequencing network and database. J Clin Microbiol 54:1975–1983. doi: 10.1128/JCM.00081-16.
- 19. Timme RE, Venkata SLG, Balkey M, Randolph R, Wolfgang WJ, Strain EA. 2020. Assessing sequence quality in GalaxyTrakr V.2. ProtocolsIo doi: 10.17504/protocols.io.bdvfi63n.
- 20. NCBI Resource Coordinators. 2017. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 45:D12–D17. doi: 10.1093/nar/gkw1071.
- 21. Cherry JL. 2017. A practical exact maximum compatibility algorithm for reconstruction of recent evolutionary history. BMC Bioinformatics 18:127. doi: 10.1186/s12859-017-1520-4.
- 22. Davis S, Pettengill JB, Luo Y, Payne J, Shpuntoff A, Rand H, Strain E. 2015. CFSAN SNP Pipeline: an automated method for constructing SNP matrices from next-generation sequence data. PeerJ Comput Sci 1:e20. doi: 10.7717/peerj-cs.20.
- 23. Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313. doi: 10.1093/bioinformatics/btu033.
- 24. Brown EW, Gonzalez-Escalona N, Stones R, Timme R, Allard MW. 2017. The rise of genomics and the promise of whole genome sequencing for understanding microbial foodborne pathogens, p 333–351. In Gurtler J, Doyle M, Kornacki J (ed), Foodborne pathogens. Springer International Publishing, Cham, Switzerland.
- 25. Nadon C, Van Walle I, Gerner-Smidt P, Campos J, Chinen I, Concepcion-Acevedo J, Gilpin B, Smith AM, Kam KM, Perez E, Trees E, Kubota K, Takkinen J, Nielsen EM, Carleton H, FWD-NEXT Expert Panel. 2017. PulseNet International: vision for the implementation of whole genome sequencing (WGS) for global foodborne disease surveillance. Euro Surveill 22:30544. doi: 10.2807/1560-7917.ES.2017.22.23.30544.
- 26. Allard MW, Bell R, Ferreira CM, Gonzalez-Escalona N, Hoffmann M, Muruvanda T, Ottesen A, Ramachandran P, Reed E, Sharma S, Stevens E, Timme R, Zheng J, Brown EW. 2018. Genomics of foodborne pathogens for microbial food safety. Curr Opin Biotechnol 49:224–229. doi: 10.1016/j.copbio.2017.11.002.
- 27. Besser J, Carleton HA, Gerner-Smidt P, Lindsey RL, Trees E. 2018. Next-generation sequencing technologies and their application to the study and control of bacterial infections. Clin Microbiol Infect 24:335–341. doi: 10.1016/j.cmi.2017.10.013.
- 28. Chattaway MA, Dallman TJ, Larkin L, Nair S, McCormick J, Mikhail A, Hartman H, Godbole G, Powell D, Day M, Smith R, Grant K. 2019. The transformation of reference microbiology methods and surveillance for Salmonella with the use of whole genome sequencing in England and Wales. Front Public Health 7:317. doi: 10.3389/fpubh.2019.00317.
- 29. Timme RE, Wolfgang WJ, Balkey M, Laxmi S, Venkata G, Allard M, Strain E. 2020. Optimizing open data to support One Health: best practices to ensure interoperability of genomic data from microbial pathogens. One Health Outlook 2:20. doi: 10.1186/s42522-020-00026-3.
- 30. Timme RE, Balkey M, Randolph R, Venkata SLG, Wolfgang WJ, Strain EA. 2020. NCBI submission protocol for microbial pathogen surveillance V.2. ProtocolsIo doi: 10.17504/protocols.io.bdvii64e.
- 31. Timme RE, Balkey M, Venkata SLG, Randolph R, Wolfgang WJ, Strain EA. 2020. NCBI data curation protocol. ProtocolsIo doi: 10.17504/protocols.io.bacaiase.
- 32. Letunic I, Bork P. 2016. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res 44:W242–W245. doi: 10.1093/nar/gkw290.
Data Availability Statement
Reads and associated metadata were uploaded to the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA) and BioSample databases, respectively (BioProject no. PRJNA420718), and marked for immediate release. BioProject no. PRJNA420718 is linked to the GenomeTrakr umbrella BioProject.


