Applied and Environmental Microbiology. 2020 Aug 18;86(17):e00814-20. doi: 10.1128/AEM.00814-20

Municipal Wastewater Surveillance Revealed a High Community Disease Burden of a Rarely Reported and Possibly Subclinical Salmonella enterica Serovar Derby Strain

Sabrina Diemert, Tao Yan
Editor: Christopher A Elkins
PMCID: PMC7440783  PMID: 32591375

KEYWORDS: Wastewater epidemiology, comparative genomics, next-generation sequencing, salmonellosis

ABSTRACT

Clinical surveillance of enteric pathogens like Salmonella is integral to track outbreaks and endemic disease trends. However, clinic-centered disease monitoring biases toward detection of severe cases and underestimates the incidence of self-limiting gastroenteritis and asymptomatic strains. Monitoring pathogen loads and diversity in municipal wastewater (MW) can provide insight into asymptomatic or subclinical infections which are not reflected in clinical cases. Subclinical infection patterns may explain the unusual observation from a year-long sampling campaign in Hawaii: Salmonella enterica serovar Derby was the most abundant pulsotype in MW but was detected infrequently in clinics over the sampling period. Using whole-genome sequencing data of Salmonella isolates from MW and public databases, we demonstrate that the Derby serovar has lower virulence potential than other clinical serovars, particularly based on its reduced profile of genes linked with immune evasion and symptom production, suggesting its potential as a subclinical salmonellosis agent. Furthermore, MW had high abundance of a rare Derby sequence type (ST), ST-72 (rather than the more common ST-40). ST-72 isolates had higher frequencies of fimbrial adherence genes than ST-40 isolates; these are key virulence factors involved in colonization and persistence of infections. However, ST-72 isolates lack the Derby-specific Salmonella pathogenicity island 23 (SPI-23), which invokes host immune responses. In combination, ST-72’s genetic features may lead to appreciable infection rates without obvious symptom production, allowing for subclinical persistence in the community. This study demonstrated wastewater’s capability to provide community infectious disease information—such as background infection rates of subclinical enteric illness—which is otherwise inaccessible through clinical approaches.

IMPORTANCE Wastewater-based epidemiology (WBE) has been conventionally used to analyze community health via the detection of chemicals, such as legal and illicit drugs; however, municipal wastewater contains microbiological determinants of health and disease as well, including enteric pathogens. Here, we demonstrate that WBE can be used to examine subclinical community salmonellosis patterns. Derby was the most abundant Salmonella serovar detected in Hawaii wastewater over a year-long sampling study, with few corresponding clinical cases. Comparative genomics analyses indicate that the normally rare strain of S. Derby found in wastewater has a unique combination of genes which allow it to persist as a subclinical infection without producing symptoms of severe gastroenteritis. This study shows that WBE can be used to explore trends in community infectious disease patterns which may not be reflected in clinical monitoring, shedding light on overall enteric disease burden and rates of asymptomatic cases.

INTRODUCTION

Municipal wastewater (MW)—which contains waste products from all members of the connected community—has recently been recognized as a valuable tool for monitoring community health trends. Especially in separated sewer systems (in which stormwater and MW are conveyed through distinct pipe networks), wastewater contains a large percentage of human-derived microorganisms which can indicate overall illness patterns for the connected community (1). To date, wastewater-based epidemiology (WBE) has mostly involved chemical monitoring (for example, illicit and prescription pharmaceuticals and metabolites); however, due to increasing technological capabilities and data processing abilities, more applications of WBE for monitoring biologicals are emerging, including tracking human pathogens (2). We previously demonstrated this capability for infectious disease monitoring using MW: Salmonella enterica concentrations in separated sewer wastewater correlated with community salmonellosis cases, and many strains appeared simultaneously in health clinics and wastewater (3). However, the patterns of pathogens in MW are not expected to perfectly reflect results from clinics due to the underestimation of enteric illness prevalence in clinical data, particularly for gastroenteritis illnesses such as nontyphoidal salmonellosis (NTS) (4). Underestimation of disease incidence using clinical surveillance methods is well documented and can be broadly explained by underascertainment (affected individuals not seeking medical care) and underreporting (problems with the reporting process at the health provider level) (5). We have shown that MW surveillance can mitigate some of the issues with clinical underdetection: monitoring of wastewater Salmonella detected a resurgence of a salmonellosis outbreak that was missed by clinics (6).

Related to the issue of underascertainment, pathogen profiles may differ between clinics and municipal wastewater due to subclinical infections. Pathogens that cause low grades of illness, which are less likely to prompt a clinic visit and hence are not detected by clinical approaches, can still be shed significantly in feces (7) and are therefore able to be detected through MW monitoring. Despite the lack of symptoms, subclinical NTS infections can still have serious impacts on public health: subclinical carriers can increase the spread of disease, low-virulence pathogen strains may cause serious illnesses in immunocompromised people, and strains involved in asymptomatic persistent infection may spread antibiotic resistance genes to other, more virulent strains (8). Presently, little is known about the specific serovars that are associated with subclinical infections, although strains with reduced virulence factor (VF) genes may be associated with more minor illnesses (9, 10).

Subclinical infections may explain the high wastewater dominance and simultaneous clinical underreporting of Salmonella enterica serovar Derby in Honolulu, HI. S. Derby comprised approximately 21% of MW isolates from the year-long MW sampling campaign conducted at the Sand Island Wastewater Treatment Plant (SIWWTP) in Honolulu; this wastewater system uses separated conveyance pipes (i.e., stormwater has a separate piping network from MW and is not conveyed to the SIWWTP) and treats approximately 60% of Oahu’s MW (3). In contrast, S. Derby was detected in only 2% of Hawaii clinical cases from the same period (3) (see also Fig. S1 in the supplemental material). While not rare in the United States, S. Derby is not among the more common serovars reported to the Laboratory-based Enteric Disease Surveillance (LEDS) system (11). S. Derby has been postulated to have relatively low virulence as a serovar (9), and traceback investigations have found that S. Derby outbreaks can be largely asymptomatic, which can exacerbate the spread through secondary infection (12).

In order to understand the underlying mechanisms for S. Derby’s clinical underreporting and high abundance in wastewater, we conducted a comparative genomic investigation of the wastewater isolates against Salmonella strains selected from publicly accessible databases, focusing on virulence factor genes and phylogenetic diversity. VFs are molecules produced by pathogens which enable their growth and ability to cause disease in hosts, and they may be involved in colonizing and/or invading host cells, evading barriers and defense mechanisms, suppressing host immune responses, and acquiring nutrients more efficiently from hosts (13). The presence and variability of VF genes can provide an estimate of the virulence potential of a Salmonella genome and correspond with clinical presentation of the salmonellosis infection, including its host specificity (14) and overall virulence toward establishing infections (9). We hypothesized that S. Derby from MW is less virulent (i.e., has fewer virulence genes) than Salmonella serovars normally observed in clinics and in MW. We also sought to clarify whether this reduced virulence was due to S. Derby (as a serovar) lacking key VF genes compared to other Salmonella groups or whether the S. Derby from wastewater was phylogenetically distinct from other S. Derby isolates from clinical and environmental data sets.

RESULTS AND DISCUSSION

VF gene analysis in Salmonella data sets.

From the sequenced collection of Hawaii MW Salmonella isolates (n = 273), 63 isolates in total belonged to the Derby serovar. All 63 S. Derby MW isolates were included in the present study. For comparison, four other groups of isolates were examined: other serovars in MW (“other MW”), clinical S. Derby isolates from NCBI (“clinical S. Derby”), environmental/other S. Derby isolates from the NCBI (“environmental/other S. Derby”), and clinical isolates from NCBI that comprised isolates representative of the top 20 clinical serovars in the United States for 2010 and 2011 (“clinical serovars”) (15). These four groups were subsampled for 63 isolates each (except for clinical S. Derby isolates, for which only 54 isolates were available), and draft assemblies were constructed using bioinformatics pipelines. These assemblies were examined for the presence of virulence factor (VF) genes, and the frequencies of VF genes were compared using either chi-squared or Fisher’s exact test of independence.

Based on the chi-squared test of independence, there are significant differences in VF gene frequencies between different Salmonella categories (χ2 = 282.77, degrees of freedom [df] = 4, and P < 0.001). Post hoc testing (pairwise χ2 with Bonferroni correction) indicated that all categories had significantly different VF gene frequencies (adjusted P value [Padj] < 0.001 to 0.0049), with the exception of the two S. Derby groups from NCBI (environmental/other S. Derby and clinical S. Derby, shown in the red box in Fig. 1A). All S. Derby groups (MW S. Derby, clinical S. Derby, and environmental/other S. Derby) had significantly fewer VF genes (and consequently lower overall virulence potential) than other serovars found in MW as well as the clinical serovars in the NCBI database. Additionally, this post hoc testing revealed that the MW S. Derby group contained significantly more VF genes than the Derby groups surveyed in the NCBI database (clinical S. Derby and environmental/other S. Derby).
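The omnibus and post hoc comparisons described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes each category is summarized as a present/absent tally of VF gene calls (a 5 × 2 table, which gives df = 4), it applies no continuity correction, and the counts and the helper names `chi2_stat` and `pairwise_bonferroni` are hypothetical. Pairwise P values use the closed form of the chi-squared survival function for one degree of freedom.

```python
from itertools import combinations
from math import erfc, sqrt

def chi2_stat(table):
    """Return (chi-squared statistic, degrees of freedom) for an r x c table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / total  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def pairwise_bonferroni(counts):
    """Pairwise 2 x 2 tests (df = 1) with Bonferroni-adjusted P values."""
    pairs = list(combinations(counts, 2))
    out = {}
    for a, b in pairs:
        stat, _ = chi2_stat([counts[a], counts[b]])
        p = erfc(sqrt(stat / 2))  # survival function of chi-squared, df = 1
        out[(a, b)] = (stat, min(1.0, p * len(pairs)))
    return out

# Hypothetical (present, absent) gene-call tallies per category; not study data.
counts = {
    "MW S. Derby": (6000, 1500),
    "clinical S. Derby": (5000, 1500),
    "env/other S. Derby": (5850, 1750),
    "other MW": (6600, 900),
    "clinical serovars": (6700, 800),
}
omnibus_stat, omnibus_df = chi2_stat(list(counts.values()))
posthoc = pairwise_bonferroni(counts)
```

With five categories there are ten pairwise comparisons, so each raw P value is multiplied by 10 and capped at 1.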

FIG 1.

VF gene variation in subsampled Salmonella categories (ABRIcate against VFDB with 70% coverage and 90% identity). (A) Box plot with overlaid strip plot showing unique VF genes per isolate. The strip plot (black dots) indicates results from individual isolates within each group. The red rectangle indicates the pair that did not have significant differences in VF gene frequencies (null hypothesis not rejected for χ2 post hoc test, α = 0.05 with Bonferroni correction for multiple comparisons). (B) Heat map of VF gene frequency (grouped by functionality [54]). Asterisks indicate gene functional categories with significant omnibus tests at an α value of 0.05 (one asterisk for Fisher’s exact test and two asterisks for χ2 test of independence). (C) Box plots with overlaid strip plots for fimbrial determinant genes per isolate in different Salmonella categories. Red rectangles indicate pairs that did not have significant differences in VF gene frequencies (null hypothesis not rejected).
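The thresholds named in this caption amount to a simple filter over a per-hit table. The sketch below is illustrative only: it assumes tab-separated output with `#FILE`, `GENE`, `%COVERAGE`, and `%IDENTITY` columns, and the sample rows and the `unique_vf_genes` helper are hypothetical.

```python
import csv
import io
from collections import defaultdict

def unique_vf_genes(tsv_text, min_cov=70.0, min_id=90.0):
    """Map each input file to the set of gene names passing both thresholds."""
    genes = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if float(row["%COVERAGE"]) >= min_cov and float(row["%IDENTITY"]) >= min_id:
            genes[row["#FILE"]].add(row["GENE"])
    return dict(genes)

# Hypothetical hits for two assemblies; values are made up for illustration.
example = "\n".join([
    "#FILE\tGENE\t%COVERAGE\t%IDENTITY",
    "iso1.fasta\tlpfA\t100.00\t99.10",
    "iso1.fasta\tlpfA\t85.00\t95.00",   # same gene again: counted once
    "iso1.fasta\tfimH\t65.00\t99.00",   # fails the coverage threshold
    "iso2.fasta\tfimH\t98.00\t89.50",   # fails the identity threshold
    "iso2.fasta\tinvA\t100.00\t98.70",
])
per_isolate = unique_vf_genes(example)
vf_counts = {name: len(found) for name, found in per_isolate.items()}
```

Collecting gene names in a set per isolate is what makes the tally a count of unique genes rather than of hits.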

To explore functional explanations for these gene frequency differences, the VF genes were clustered into biochemical categories and visualized in a heat map showing the average proportion of full gene groups by Salmonella category (Fig. 1B). Post hoc testing revealed that for many gene groups, the MW S. Derby isolates were not significantly different from the NCBI S. Derby groups (clinical and environmental/other; Padj < 0.001). However, all Derby isolates (MW S. Derby, clinical S. Derby, and environmental/other S. Derby) differ drastically from the clinical serovars and other serovars in MW. Depending on the VF gene group, the S. Derby groups either lack all genes within the group (stress protein and antivirulence, toxin genes, and immune evasion genes) or contain significantly fewer genes (as for secretion system and nonfimbrial adherence determinant genes) than the clinical serovars and/or the other serovars subsampled in MW. These gene groups largely contribute to increased survival and persistence within hosts (i.e., serum resistance and immune evasion), as well as the production of symptoms (i.e., toxins). Comparatively, S. Derby as a serovar (as represented by the three groups tested in this study) has lower virulence potential in these functionalities. This is consistent with our hypothesis that S. Derby is a low-virulence serovar and with previous studies that noted S. Derby as a less pathogenic serovar lacking key virulence genes (9), which may result in a subclinical presentation.

Interestingly, post hoc testing also revealed that for one gene functional group—fimbrial adherence determinants—the MW S. Derby isolates showed significantly higher gene frequencies than S. Derby isolates from NCBI databases (clinical, χ2 = 35.3, n = 1, and Padj < 0.001; environmental/other, χ2 = 52.6, n = 1, and Padj < 0.001 [Fig. 1C]). MW S. Derby had higher frequencies of fimbrial genes than the NCBI S. Derby groups and lower frequencies than clinical serovars (χ2 = 46.0, n = 1, and Padj < 0.001); however, the fimbrial gene frequencies did not differ significantly from those of MW isolates belonging to other serovars (χ2 = 0.24, n = 1, and Padj > 0.001). Thus, while the MW Salmonella isolates (MW S. Derby and MW other) showed fewer fimbrial adherence determinants than the representative clinical serovars, they possessed more of these genes than environmental and clinical S. Derby isolates from NCBI databases. Fimbrial determinants are key virulence genes involved in adherence to host cells (16), and these trends indicate that the isolates obtained from Hawaii MW, including MW S. Derby, may have decreased adhesion ability compared with that of representative clinical isolates but increased ability compared with that of other S. Derby isolates from the NCBI (clinical and environmental/other).

MW S. Derby phylogeny and traditional multilocus sequence types.

In addition to the differences in gene frequencies, MW S. Derby exhibited a more bimodal distribution of unique VFs than other Salmonella categories (strip plot in Fig. 1A). These observations suggest an underlying population subgrouping in MW S. Derby, which is corroborated by phylogenetic analyses (Fig. 2). Pulsed-field gel electrophoresis (PFGE) analysis of MW S. Derby pulsotypes (compared with Hawaii Department of Health clinical isolates from the same sampling period) resulted in a dendrogram with three main clusters of MW S. Derby isolates (Fig. 2A). This overall clustering corresponds with the phylogenetic tree created via pangenome alignment (Fig. 2B), the core genome multilocus sequence type (cgMLST) minimum spanning tree (MST) (Fig. 2C), and the sequence types (STs) of the isolates, determined by 7-gene traditional MLST (see the legend to Fig. 2C). Two isolates which formed a cluster—HIY151 and HIY163—were found to belong to the Agona serovar (ST-13), not Derby, after applying the multilocus sequence typing and the Salmonella In Silico Typing Resource (SISTR) (17) for genetic serovar confirmation. S. Agona is sometimes incorrectly serotyped as S. Derby (and vice versa) due to similarities in antigenic structure between the serovars (18). The two isolates identified as S. Agona were not analyzed further, nor were they included in the previous VF analyses.

FIG 2.

Comparison of clustering methods for Derby pulsotypes from MW. (A) PFGE analysis of Derby isolates from MW and Hawaii clinics (black dots indicate clinical samples; not all MW samples are shown). The dendrogram splits isolates into three major clusters. (B) Generalized time-reversible/gamma maximum likelihood phylogenetic tree from alignment of core pangenome (generated from Roary [48] alignment via RAxML [49], visualized in iTOL [50]). (C) cgMLST minimum spanning tree for MW Derby pulsotypes. Branch lengths correspond with allelic differences over 3,002 loci in the EnteroBase cg-MLST scheme for Salmonella (45). The minimum spanning tree was generated with GrapeTree using the online web tool (46).
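The branch lengths in panel C rest on a pairwise count of differing alleles. A minimal sketch of such a count follows. Treating uncalled loci as uninformative is an assumption made here (tree-building tools differ in how they handle missing data), and the profiles and the `allelic_distance` name are hypothetical.

```python
def allelic_distance(profile_a, profile_b, missing=0):
    """Count loci where both profiles have a called allele and the alleles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a != missing and b != missing and a != b
    )

# Hypothetical five-locus profiles (the scheme cited above has 3,002 loci).
iso_x = [1, 4, 2, 7, 3]
iso_y = [1, 5, 2, 0, 9]   # 0 marks an uncalled locus
distance = allelic_distance(iso_x, iso_y)
```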

Based on this phylogenetic clustering, the MW S. Derby collection is composed of two distinct subpopulations which are delineated as traditional multilocus sequence types: ST-72 and ST-40 (Fig. 2C). This in itself is not surprising, as Derby is known to be a polyphyletic serovar with at least two distinct lineages (19–21) (in contrast to many Salmonella serovars with single common ancestors). However, the presence and abundance of ST-72 (in clinics, foodborne samples, environmental samples, or samples from other isolation environments) have not previously been reported. In this study, we found ST-72 to compose the majority of S. Derby isolates in MW (43/63 isolates, or 68% of MW S. Derby isolates, compared with 20 isolates of ST-40). In addition to ST-72’s abundance in MW, several Honolulu clinical isolates of ST-72 were recorded during the same period (indicated by circles in Fig. 2A and compared with MW detection in Fig. S1). This indicates that not only are the strains of ST-72 ubiquitous in the community, they also have the capacity to develop clinically relevant infections.

Contrasting with the abundance of ST-72 in Honolulu MW, ST-40 is the most commonly reported sequence type for the Derby serovar in literature (21), with isolates found in North America (22), Europe (19, 21), and Asia (23). The dominance of ST-40 can be observed in the distributions of the different Derby subsamples used in this study (clinical and environmental/other), as well as the EnteroBase database results for the Derby serovar as a whole (Fig. 3), and in the minimum spanning tree for these isolates (Fig. S2). ST-72 has been rarely discussed in literature: normally, it is noted only as a constituent of Derby STs in sequence typing schemes and not found in samples studied (23, 24). All ST-72 isolates currently available in EnteroBase were collected in Canada or the United States, and the majority of these are from the present study of Hawaiian MW. The only significant mentions of ST-72 consist of the publication of a closed reference genome (25) and the inclusion of a single ST-72 isolate in a key study on Derby lineages (19). ST-72 formed part of Derby lineage L2, with other isolates identified as ST-71, while lineage L1 contained mostly ST-40.

FIG 3.

Traditional 7-gene MLST distribution for different sample sets and for the overall EnteroBase database when selecting for isolates with Derby serovar using SISTR (17). STs were obtained on assemblies using mlst_check (51) for MW Hawaii Derby and subsampled Derby groups, and STs for EnteroBase were collected from the EnteroBase experimental data (45).

This increased detection of ST-72 in Hawaii MW may represent an endemic subclinical infection in Honolulu, or it may indicate that ST-72 is a strain of Salmonella which is more likely to cause subclinical infections in general. S. Derby has previously been noted to be a dominant serovar in California wastewater, suggesting a possible role as a subclinical strain (26); however, no other studies have characterized the traditional multilocus sequence types of Salmonella in MW, which limits the ability to determine whether ST-72 is commonly detected across different locations or is geographically restricted.

Comparison of ST-72 and ST-40 S. Derby isolates from MW and EnteroBase.

To conduct a more in-depth comparison between the ST-72 and ST-40 virulence gene composition, EnteroBase was queried for isolates matching these STs (Hawaii MW samples were excluded from this query). Only 25 ST-72 isolates with genomic sequence data were found in EnteroBase, and therefore, all samples were collected; the ST-40 data set was subsampled. These subsamples from EnteroBase were subjected to the same VF gene analyses and compared with the ST-72 and ST-40 isolates from the MW S. Derby sample set. Omnibus tests indicated that fimbrial adherence determinants (Fisher’s exact test, df = 4 and P < 0.001) were the only gene group that exhibited significantly different frequencies between the two STs, with ST-72 exhibiting significantly higher frequencies. To clarify the specific genes which varied, individual fimbrial adherence genes were analyzed for frequency differences between groups via Fisher’s exact test. Five genes—all components of the lpfABCDE operon—varied significantly (df = 4 and P < 0.001). Post hoc testing indicated that ST-72 isolates (regardless of source, i.e., MW versus EnteroBase subsamples) have significantly higher frequencies of lpfABCDE genes than ST-40 isolates. This pattern was consistent: all ST-72 isolates were found to contain the lpfABCDE genes, which are missing in all ST-40 isolates (Fig. 4).
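As a worked illustration of the post hoc comparison, the sketch below computes a two-sided Fisher's exact P value for a single 2 × 2 table built from the MW counts reported in this study (43 ST-72 isolates, all carrying the operon, and 20 ST-40 isolates, none carrying it). It is not the authors' code: the omnibus test in the text spans more groups (df = 4), whereas this covers one pairwise contrast, and the `fisher_exact_2x2` helper is a hypothetical name.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    denom = comb(n, c1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table that is no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Rows: ST-72, ST-40 (MW isolates); columns: operon present, operon absent.
p_value = fisher_exact_2x2(43, 0, 0, 20)
```

Because the observed table is the single most extreme arrangement of these margins, the P value reduces to one over the number of ways to choose 43 isolates from 63.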

FIG 4.

Combined core pan-genome maximum likelihood phylogenetic trees (constructed via Roary [48] and RAxML [49]) for ST-72 and ST-40 isolates from MW Derby and EnteroBase subsamples, with heat map of specific gene counts (visualized via Phandango [58]).

The presence of the lpfABCDE operon in ST-72 Derby isolates (in contrast to its absence in ST-40) is particularly interesting due to its role in host cell adhesion, invasion, and persistence and its implications for host specificity. The long polar fimbriae (encoded by lpf) mediate attachment to and invasion of epithelial cells (16) and have demonstrated preference for Peyer’s patches (intestinal cells) in mouse models (27). Salmonella strains contain, on average, 11 to 13 unique types of fimbrial gene clusters (FGCs), which provides some functional redundancy (28). Compared to core FGCs, which were detected in >80% of serovars studied, the lpfABCDE operon is only partially conserved (present in approximately 70% of serovars and <40% of individual strains) (28). Due to the partial functional redundancy of FGCs, the evolutionary loss of specific fimbrial genes may lead to only a moderate loss in virulence, compared to a major reduction of infectivity when multiple FGCs are mutated or lost (27, 29). However, the spectrum of FGCs genetically available to Salmonella strains affects their host preference and their ability to evade immune responses. Most Salmonella strains containing lpf genes exhibit a broad host range (28); this may be the case for ST-72 isolates, as lower host selectivity is often associated with reduced symptoms or subclinical infections (14). This supports the hypothesis that the MW S. Derby abundance may reflect subclinical salmonellosis infections in the community. Additionally, lpf operon expression is regulated through phase variation, which allows for cross-immunity evasion: lpf phase “off” variants do not develop surface long polar fimbriae, preventing antigenic recognition by hosts, but can retain the ability to develop these fimbriae for infection of different hosts (30). Since the ST-72 strains possess these lpf genes and the ST-40 strains do not, the ST-72 strains may have enhanced survival against immune responses and longer persistence in hosts, allowing them to establish as an endemic infection within a community.

In addition to general virulence genes, all ST-72 and ST-40 strains were investigated for genes indicative of the Derby-specific Salmonella pathogenicity island 23 (SPI-23) (31). The ST-72 isolates (from MW and EnteroBase subsampling) were found to lack all SPI-23 genes except for docB, and most ST-40 isolates contained all SPI-23 genes. This provides further evidence that ST-72 belongs to the proposed L2 lineage of Derby, along with ST-71, as this lineage lacks SPI-23 (19). SPI-23 encodes several putative type III secretion effector proteins; these secretion systems can modulate host immune responses and cellular functions and structures (32) and, in particular, play key roles in the induction of host inflammatory responses (33). Additionally, L1 Derby isolates containing SPI-23 genes (including ST-40 isolates) have demonstrated higher infection rates on porcine jejunum monolayers than isolates lacking these genes (31). While it is currently unknown whether these genes confer host-specific (porcine) or broad-range infectivity, their expression is associated with higher virulence and tissue tropism (31). The absence of SPI-23 in ST-72 isolates may explain their dominance in MW but lack of clinical representation: while ST-72 isolates may be able to establish infections and persist in hosts, they are less able to interfere with host cellular functions and promote inflammatory responses, potentially leading to reduced symptoms in infected hosts (and thus a decreased potential for clinical reporting).

Previous studies on S. Derby host specificity have indicated that members of the L1 Derby lineage (such as ST-40) may be better adapted for porcine hosts, while isolates from the L2 Derby lineage (such as ST-72) would have a higher propensity for avian hosts such as turkeys (19, 21). In contrast with this theory, a review of these EnteroBase-derived ST-72 and ST-40 samples indicated that for both sequence types, swine (either livestock or meat) was the primary host of the S. Derby isolates (Fig. S3). Due to the small available sample size, it is difficult to draw a definitive conclusion about host preference. If indeed ST-72 preferentially establishes infections in pigs, it could be posited that nonhuman impact (i.e., wastes from pigs) could be causing the increased ST-72 S. Derby populations in Hawaiian MW. However, this is unlikely to be the case, as there are no swine farms in the urban Honolulu area covered by SIWWTP. While there could be an impact of feral pig waste in stormwater runoff from the valleys, the wastewater collection piping is a separated sewage system, and thus stormwater is not conveyed to the SIWWTP. Therefore, it is more likely that the abundance of ST-72 in wastewater is due to the input of infected humans, rather than impact from pigs or other environmental sources.

Implications.

Taken together, our results show a previously underreported strain of Salmonella Derby (ST-72) with unique virulence characteristics to be highly abundant in MW but relatively rare in local health clinics. Based on virulence factor gene comparison with representative clinical Salmonella serovars and Derby isolates from clinics and environmental/other sources, Derby as a serovar has lower virulence potential than other clinical representative Salmonella serovars and thus may cause only a subclinical or low-grade infection. Furthermore, the ST-72 strain of Derby which was dominant in Hawaiian MW has a greater ability than most Derby strains to establish infections due to its additional fimbrial adherence system (encoded by the lpfABCDE operon) but is less virulent to the hosts due to its lack of SPI-23, which is found in other Derby isolates; this may aid its persistence in the community.

Relatively little is known about Salmonella strains responsible for subclinical salmonellosis, including their overall contribution to disease burden and their impact on environmental Salmonella reservoirs (7). This study suggests that wastewater-based public health monitoring can be used to determine the strains responsible for these clinically inaccessible “background levels” of enteric infectious disease in a community. Pathogens that form part of the overall disease burden, that may sporadically cause serious illness in healthy and immunocompromised community members, and that can contribute to the pathogenic resistome may be identified and explored using municipal wastewater.

MATERIALS AND METHODS

MW sample collection and WGS.

Salmonella isolates were collected as part of a weekly raw municipal wastewater (MW) sampling campaign (spanning 54 weeks from 2010 to 2011) at the Sand Island Wastewater Treatment Plant in Honolulu, HI, which was described in detail previously (3). Briefly, once weekly (on Sundays), 24-h composite samples of raw MW were collected from upstream of the primary clarifiers. Salmonella bacteria were enumerated in MW using a modified most-probable-number (MPN) method. For each sample that was positive for Salmonella, 5 to 10 colonies were picked at random from xylose lysine deoxycholate (XLD) plates, confirmed as Salmonella using biochemical testing and quantitative PCR (qPCR), and subjected to pulsed-field gel electrophoresis (PFGE) typing. In total, 378 Salmonella isolates were obtained for the MW isolate collection and stored in LB broth with 50% (vol/vol) glycerol at –80°C; from this MW isolate collection, 273 samples were randomly selected for whole-genome sequencing (WGS). Sequencing was conducted by the Hawaii Department of Health, following standard CDC PulseNet procedures for DNA extraction, Nextera XT library preparation, and Illumina MiSeq runs (34, 35). From the MW Salmonella data set, 65 isolates were identified as S. Derby according to PFGE analysis.

Genome draft assembly procedure.

Sequencing adapters were trimmed from raw paired-end FASTQ reads using BBDuk from the BBTools package (36). The Trim Galore wrapper (37) was then used to trim for quality bases (Phred scores of <20 removed) via Cutadapt (38) and confirm read quality via FastQC (39). Quality-trimmed paired-end FASTQ reads were assembled using the de novo assembler SPAdes (40). Assemblies were assessed for contamination using BlobTools at the phylum level (41); contigs not belonging to Proteobacteria or viral phages were removed from the assemblies. Depth of coverage and assembly statistics were determined using QUAST (42); results are summarized in Tables S1 to S7.
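To make the quality-trimming step concrete, the sketch below implements the running-sum 3′ trimming rule described in the Cutadapt documentation, with a cutoff of 20 to match the Phred threshold above. It is an illustrative reimplementation under that assumption, not the tool's own code, and the function names and the example read are hypothetical.

```python
def quality_trim_3prime(qualities, cutoff=20):
    """Return the number of leading bases to keep after 3' quality trimming."""
    running = 0
    lowest = 0
    keep = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:
            break  # the tail from here on is, on balance, above the cutoff
        if running < lowest:
            lowest = running
            keep = i  # cutting here removes the lowest-scoring tail seen so far
    return keep

def trim_read(seq, quality_string, cutoff=20, offset=33):
    """Trim a read given its Phred+33 quality string."""
    quals = [ord(ch) - offset for ch in quality_string]
    keep = quality_trim_3prime(quals, cutoff)
    return seq[:keep], quality_string[:keep]

# Hypothetical read: high-quality start (Phred 40 = 'I'), poor tail (Phred 2 = '#').
seq, qual = trim_read("ACGTACGTAC", "IIIIII####")
```

The running sum lets an isolated good base inside a poor tail be trimmed along with it, which a simple "cut at the first base below 20" rule would not do.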

Selection of S. Derby MW and other MW isolates.

Serovar designations were confirmed using the command-line version of SISTR (17). This analysis indicated that 63 of 65 isolates identified as Derby through PFGE belonged to the Derby serovar, while 2 isolates were identified as Agona. The two Agona isolates were excluded from subsequent WGS analyses, with the exception of pangenome analysis and cgMLST construction. All 63 confirmed S. Derby isolates from MW were selected for the “S. Derby MW” isolate group. All other MW isolates were scanned for serovar confirmation with SISTR, and no additional isolates were identified as Derby (results not shown). From the remaining MW isolates not identified as the Derby serovar (n = 210), 63 isolates were randomly selected to represent the “other MW” group.

Selection of isolate genomes from databases.

Metadata for all Salmonella isolates in the NCBI database were collected from the NCBI Pathogen Detection Isolates Browser (Beta) FTP site (ftp://ftp.ncbi.nlm.nih.gov/pathogen/Results/Salmonella/) on 31 July 2019 and imported into Jupyter for analysis with the pandas library (43) in Python. The data set was filtered for samples isolated in the United States (under the geo_loc_name flag) which had corresponding data uploaded to the NCBI Sequence Read Archive (SRA) (under the Run flag). Derby isolates were selected by sorting according to serovar (as identified by the submitting lab) after cleaning the serovar flag categories for naming consistency. Derby results were further filtered into clinical or “environmental/other” categories based on the epi_type flag, and only samples isolated from human hosts were retained for the clinical data set. Sample entries corresponding to MW isolates from the Hawaii Sand Island Wastewater Treatment Plant (SIWWTP) data set were excluded from the Derby environmental/other category. The Derby environmental/other data set was randomly subsampled (n = 63, to match the total number of sequenced MW Derby isolates) to retrieve a list of accession numbers; FASTQ files for these accession numbers were downloaded from the NCBI SRA. For the Derby clinical data set, only 63 samples met the filter conditions, so subsampling was not required and all accession numbers in the data set were downloaded from the SRA. FASTQ files were subjected to the assembly pipeline described above. For the Derby clinical data set, results from SISTR indicated that only 54/63 isolates belonged to the Derby serovar, and only this subset (n = 54) was subjected to further analyses. Accession numbers and assembly data are provided in Tables S3 (clinical) and S4 (environmental/other).
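
The filter-then-subsample logic described above can be sketched as follows. This is an illustrative plain-Python version (the authors used pandas; see the supplemental notebook for their code). The field names geo_loc_name, Run, epi_type, and serovar come from the text, whereas the host field name, the "USA" substring match, the seed, and the toy records are assumptions made for the example.

```python
import random


def select_derby(records, epi_type, n=63, seed=0, human_only=False):
    """Filter metadata rows for U.S. Derby isolates with SRA data, then subsample.

    `records` is a list of dicts, one per isolate. The `host` key and the
    "USA" substring match are assumptions about how the metadata are encoded.
    """
    keep = [
        r for r in records
        if "USA" in r.get("geo_loc_name", "")
        and r.get("Run")                                   # has an SRA run accession
        and r.get("serovar", "").strip().lower() == "derby"
        and r.get("epi_type") == epi_type
        and (not human_only or "homo sapiens" in r.get("host", "").lower())
    ]
    if len(keep) <= n:
        return keep                                        # too few to subsample
    return random.Random(seed).sample(keep, n)             # without replacement


if __name__ == "__main__":
    # Toy metadata: 100 environmental rows plus rows that should be filtered out.
    toy = [
        {"geo_loc_name": "USA:HI", "Run": f"SRR{i:07d}", "serovar": "Derby",
         "epi_type": "environmental/other"}
        for i in range(100)
    ]
    toy.append({"geo_loc_name": "USA:HI", "Run": "", "serovar": "Derby",
                "epi_type": "environmental/other"})        # no SRA data
    toy.append({"geo_loc_name": "Canada", "Run": "SRR9999999", "serovar": "Derby",
                "epi_type": "environmental/other"})        # not U.S.
    print(len(select_derby(toy, "environmental/other")))
```

A fixed seed makes the subsample reproducible; whether the original analysis fixed a seed is not stated in the text.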

The NCBI database was also queried to build a Salmonella data set representative of human clinical samples for 2010, the same year as the MW sampling campaign. The top 20 laboratory-confirmed serovars in U.S. clinics (as well as percentages of other serotypes, unknown, and partially serotyped isolates) were obtained from the CDC Laboratory-based Enteric Disease Surveillance report for 2010 (15). Proportions for each serovar (and categories of “other serotypes” and “unknown/partially serotyped isolates”) required to make a total sample set of 63 were calculated. For each serovar, the U.S. clinical category of NCBI isolates was randomly subsampled for the appropriate proportion. For the “other serotypes” category, serovars not matching the top 20 list were randomly subsampled; for the “unknown” category, isolates with no serovar information (or with “Missing” as the entry for the serovar flag) were randomly subsampled. Details are provided in Table S5. This representative clinical data set (n = 63) was subjected to the assembly pipeline and further analyses.
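
As an illustration of the proportional allocation step, the sketch below converts category shares into whole-number sample counts that sum to 63. The text does not state how fractional counts were rounded, so the largest-remainder rule used here is an assumption, and the shares in the example are hypothetical values rather than the CDC figures.

```python
def allocate(shares, total=63):
    """Split `total` samples across categories in proportion to `shares`.

    Uses largest-remainder rounding (an assumed rule) so that the counts
    always sum exactly to `total`.
    """
    scale = total / sum(shares.values())
    raw = {k: v * scale for k, v in shares.items()}
    counts = {k: int(x) for k, x in raw.items()}           # round down first
    leftover = total - sum(counts.values())
    # Give the remaining slots to the largest fractional parts.
    by_remainder = sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical percentages, for illustration only.
    example = {"Enteritidis": 22.0, "Typhimurium": 13.0, "Newport": 11.0,
               "other serotypes": 54.0}
    print(allocate(example))
```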

For comparison of specific sequence types (STs) of Derby, metadata were queried from the Salmonella database of EnteroBase (44) with SISTR serovar testing. Metadata for all publicly available Derby isolates (as identified via SISTR) from the United States were filtered to determine proportions of different STs, not including the Hawaiian MW isolates. Additionally, the EnteroBase Salmonella Derby U.S. data set was sampled for ST-40 and ST-72 isolates to compare with Derby isolates from Hawaiian MW (n for ST-40 = 20 and n for ST-72 = 43). Of the EnteroBase Salmonella Derby U.S. data set, only 25 isolates belonged to the ST-72 group (and thus all ST-72 isolates were sampled), whereas 957 isolates corresponded with the ST-40 group; 43 isolates were subsampled from the ST-40 group. These isolates were downloaded from the NCBI SRA and were subjected to the assembly pipeline and further analyses. Accession numbers and assembly data are provided in Tables S6 (ST-40) and S7 (ST-72). All code for metadata analyses is available in the supplemental material (Jupyter notebook).

Gene identification and comparative genomics.

All Derby pulsotypes in Hawaiian MW were subjected to core genome multilocus sequence typing (cgMLST) in EnteroBase (45), using the cgMLST V2 + HierCC V1 schemes. GrapeTree (46) was used to construct a minimum spanning tree (MST; publicly available at enterobase.warwick.ac.uk/ms_tree/28133). Another MST (Fig. S2) was constructed using all isolates in the EnteroBase database which were identified as Derby using SISTR (17) with release dates no later than 30 October 2019 (n = 1,107). This MST is publicly available at enterobase.warwick.ac.uk/ms_tree/33906.
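
To illustrate what a minimum spanning tree over cgMLST profiles represents, the sketch below links isolates by allelic distance (the number of loci with different alleles) using Prim's algorithm. It is a conceptual example only and does not reproduce GrapeTree's own tree-building methods or its handling of missing data; the four-locus profiles are invented for the example.

```python
def allelic_distance(a, b):
    """Number of loci at which two allele profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim's algorithm over pairwise allelic distances.

    `profiles` maps isolate name -> tuple of allele numbers (same locus order).
    Returns a list of edges (isolate_in_tree, new_isolate, distance).
    """
    names = sorted(profiles)
    in_tree = [names[0]]
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge joining the current tree to an isolate outside it.
        u, v, d = min(
            ((u, v, allelic_distance(profiles[u], profiles[v]))
             for u in in_tree for v in names if v not in in_tree),
            key=lambda edge: edge[2],
        )
        in_tree.append(v)
        edges.append((u, v, d))
    return edges


if __name__ == "__main__":
    toy = {"A": (1, 1, 1, 1), "B": (1, 1, 1, 2), "C": (1, 1, 2, 2), "D": (5, 5, 5, 5)}
    for edge in minimum_spanning_tree(toy):
        print(edge)
```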

MW draft assemblies were annotated using Prokka (47). The pan-genome of MW S. Derby isolates was constructed using Roary (48): the -e and --mafft flags were used to create a concatenated core genome alignment with MAFFT, which was then passed to RAxML (49) to infer a phylogenetic tree using the GTR substitution model with gamma-distributed rate heterogeneity among sites and 100 bootstrap iterations. This tree was visualized using the iTOL web tool (50).

The traditional 7-gene MLST Salmonella scheme was used to classify STs of draft assemblies computationally with mlst_check (51). Draft assemblies were scanned for virulence genes using ABRicate (52) and the Virulence Factors of Pathogenic Bacteria Database (VFDB) (53) with a minimum identity of 90% and minimum coverage of 50%. For confirmation of SPI-23, 8 marker genes previously described by Hayward et al. (31) (gooN, potR, talN, chlR, bigM, genE, tinY, and docB) were used to construct a local BLAST database. Assemblies were scanned against this database using ABRicate with a minimum identity of 90% and an increased minimum coverage of 70%.
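
The SPI-23 screen above amounts to thresholding hits on identity and coverage and then checking which marker genes remain. A minimal sketch follows, assuming each hit is a dict keyed by ABRicate's GENE, %IDENTITY, and %COVERAGE report columns; the toy hits are invented, and the text does not specify how many markers had to be present for SPI-23 to be called, so the function simply returns the markers found.

```python
SPI23_MARKERS = {"gooN", "potR", "talN", "chlR", "bigM", "genE", "tinY", "docB"}


def markers_found(hits, min_identity=90.0, min_coverage=70.0):
    """Return the SPI-23 marker genes whose hits pass both thresholds.

    `hits` is an iterable of dicts with the keys GENE, %IDENTITY, and
    %COVERAGE (for example, rows read from a tab-separated report with
    csv.DictReader).
    """
    return {
        h["GENE"] for h in hits
        if h["GENE"] in SPI23_MARKERS
        and float(h["%IDENTITY"]) >= min_identity
        and float(h["%COVERAGE"]) >= min_coverage
    }


if __name__ == "__main__":
    toy_hits = [
        {"GENE": "gooN", "%IDENTITY": "99.1", "%COVERAGE": "100.0"},
        {"GENE": "potR", "%IDENTITY": "95.0", "%COVERAGE": "65.0"},   # coverage too low
        {"GENE": "docB", "%IDENTITY": "88.0", "%COVERAGE": "100.0"},  # identity too low
    ]
    found = markers_found(toy_hits)
    print(sorted(found), f"{len(found)}/{len(SPI23_MARKERS)} markers")
```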

Statistics and visualization.

ABRicate summary tables were processed in Jupyter notebooks using pandas to construct a contingency table for overall unique VF gene frequencies per Salmonella isolate category (Derby in MW, other serovars in MW, clinical Derby, environmental/other Derby, and clinical representative serovars). Additionally, VF genes were separated into functional categories (54), and separate contingency tables were created.
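
A contingency table of this kind can be sketched without pandas as a simple count of (isolate category, gene) pairs. The layout assumed here (rows are isolate categories, columns are genes, cells are the number of isolates carrying the gene) is one plausible reading of the description above, and the input pairs are toy data.

```python
from collections import Counter


def contingency(pairs):
    """Build a category-by-gene count table.

    `pairs` holds one (isolate_category, gene) tuple per gene detected in
    each isolate. Returns (row_labels, column_labels, table).
    """
    counts = Counter(pairs)
    categories = sorted({c for c, _ in counts})
    genes = sorted({g for _, g in counts})
    table = [[counts[(c, g)] for g in genes] for c in categories]
    return categories, genes, table


if __name__ == "__main__":
    toy = [("Derby MW", "fimA"), ("Derby MW", "fimA"), ("Derby MW", "lpfA"),
           ("other MW", "fimA"), ("other MW", "sopE")]
    rows, cols, table = contingency(toy)
    print(cols)
    for label, row in zip(rows, table):
        print(label, row)
```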

The significance of gene frequency variations between Salmonella isolate categories was evaluated using P values from tests for nominal variables: the chi-squared (χ2) test of independence for overall unique VF genes, fimbrial adherence determinants, and secretion systems (as counts for these response variables exceeded 1,000) and Fisher’s exact test for other VF gene categories (55). If omnibus testing yielded significant results (at α = 0.05), post hoc multiple-comparison tests were conducted on all possible pairs of Salmonella isolate categories, with resulting P values adjusted using the Bonferroni correction. The chi-squared test (chisq.test) and Fisher’s exact test (fisher.test) were called in Python from the stats package in R (56) using rpy2 (https://rpy2.github.io/). Similar analyses were conducted on ST-72 versus ST-40 data sets from Derby MW and EnteroBase. Box plots of overall gene frequencies and heatmaps for gene proportions were visualized via Seaborn (57). All code for statistical analyses is included in the supplemental material (Jupyter notebook).
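
The post hoc step can be illustrated with a small standard-library sketch: a Pearson chi-squared test on each pair of categories, followed by Bonferroni adjustment. This is not the authors' rpy2 code. It assumes each category is summarized as (present, absent) counts so that every pairwise table is 2 x 2, and it applies no continuity correction, whereas R's chisq.test applies Yates' correction to 2 x 2 tables by default; the counts shown are invented.

```python
import math
from itertools import combinations


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) on [[a, b], [c, d]].

    Returns (statistic, p_value). All four margins must be nonzero.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def pairwise_bonferroni(counts):
    """Test every pair of categories and Bonferroni-adjust the P values.

    `counts` maps category -> (present, absent).
    Returns {(category1, category2): adjusted_p}.
    """
    pairs = list(combinations(sorted(counts), 2))
    adjusted = {}
    for x, y in pairs:
        _, p = chi2_2x2(*counts[x], *counts[y])
        adjusted[(x, y)] = min(1.0, p * len(pairs))   # multiply by number of tests
    return adjusted


if __name__ == "__main__":
    toy = {"Derby MW": (50, 13), "clinical Derby": (30, 24), "other MW": (48, 15)}
    for pair, p_adj in pairwise_bonferroni(toy).items():
        print(pair, round(p_adj, 4))
```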

Accession number(s).

WGS paired-end reads for MW Salmonella isolates are available under BioProject accession no. PRJNA274995. BioSamples and assemblies for MW Salmonella isolates are available under BioProject accession no. PRJNA507471; accession numbers and assembly data are provided in Tables S1 (S. Derby MW) and S2 (other MW). Accession numbers for other genomes are provided in Tables S3 to S7.

Supplementary Material

Supplemental file 1
AEM.00814-20-s0001.pdf (873.3KB, pdf)
Supplemental file 2
AEM.00814-20-sd002.xlsx (13.3KB, xlsx)

ACKNOWLEDGMENTS

We thank Pamela O’Brien (Hawaii Department of Health) for assistance with next-generation sequencing and PFGE analysis and Matthew R. Hayward (Harvard) for providing Derby assemblies from previous work for SPI-23 gene comparison.

This work was financially supported by a grant from the National Science Foundation (CBET-1507979 to T.Y.) and the Natural Sciences and Engineering Research Council of Canada (NSERC PGS-D to S.D.).

T.Y. conceived and supervised the project. S.D. planned the isolate data collection and performed the bioinformatics, conducted statistical analysis, interpreted results, and wrote the manuscript. T.Y. edited the manuscript.

We declare no competing financial interest.

Footnotes

Supplemental material is available online only.

REFERENCES

  • 1. O’Brien E, Xagoraraki I. 2019. A water-focused one-health approach for early detection and prevention of viral outbreaks. One Health 7:100094. doi: 10.1016/j.onehlt.2019.100094.
  • 2. Yang K, Pagaling E, Yan T. 2014. Estimating the prevalence of potential enteropathogenic Escherichia coli and intimin gene diversity in a human community by monitoring sanitary sewage. Appl Environ Microbiol 80:119–127. doi: 10.1128/AEM.02747-13.
  • 3. Yan T, O’Brien P, Shelton JM, Whelen AC, Pagaling E. 2018. Municipal wastewater as a microbial surveillance platform for enteric diseases: a case study for Salmonella and salmonellosis. Environ Sci Technol 52:4869–4877. doi: 10.1021/acs.est.8b00163.
  • 4. CDC. 2012. Principles of epidemiology in public health practice, 3rd ed. US Department of Health and Human Services, Atlanta, GA.
  • 5. Gibbons CL, Mangen MJJ, Plass D, Havelaar AH, Brooke RJ, Kramarz P, Peterson KL, Stuurman AL, Cassini A, Fèvre EM, Kretzschmar ME, Burden of Communicable diseases in Europe (BCoDE) consortium. 2014. Measuring underreporting and under-ascertainment in infectious disease datasets: a comparison of methods. BMC Public Health 14:147. doi: 10.1186/1471-2458-14-147.
  • 6. Diemert S, Yan T. 2019. Clinically unreported salmonellosis outbreak detected via comparative genomic analysis of municipal wastewater Salmonella isolates. Appl Environ Microbiol 85:e00139-19. doi: 10.1128/AEM.00139-19.
  • 7. Gal-Mor O. 2019. Persistent infection and long-term carriage of typhoidal and nontyphoidal salmonellae. Clin Microbiol Rev 32:e00088-18. doi: 10.1128/CMR.00088-18.
  • 8. Quilliam RS, Cross P, Williams AP, Edwards-Jones G, Salmon RL, Rigby D, Chalmers RM, Thomas DR, Jones DL. 2013. Subclinical infection and asymptomatic carriage of gastrointestinal zoonoses: occupational exposure, environmental pathways, and the anonymous spread of disease. Epidemiol Infect 141:2011–2021. doi: 10.1017/S0950268813001131.
  • 9. Litrup E, Torpdahl M, Malorny B, Huehn S, Christensen H, Nielsen EM. 2010. Association between phylogeny, virulence potential and serovars of Salmonella enterica. Infect Genet Evol 10:1132–1139. doi: 10.1016/j.meegid.2010.07.015.
  • 10. Marzel A, Desai PT, Goren A, Schorr YI, Nissan I, Porwollik S, Valinsky L, McClelland M, Rahav G, Gal-Mor O. 2016. Persistent infections by nontyphoidal Salmonella in humans: epidemiology and genetics. Clin Infect Dis 62:879–886. doi: 10.1093/cid/civ1221.
  • 11. Centers for Disease Control and Prevention. 2018. National enteric disease surveillance: Salmonella annual report, 2016. Centers for Disease Control and Prevention, Atlanta, GA.
  • 12. Sanders E, Sweeney FJ, Friedman EA, Boring JR, Randall EL, Polk LD. 1963. An outbreak of hospital-associated infections due to Salmonella Derby. JAMA 186:110–112.
  • 13. Zachary JF. 2017. Pathologic basis of veterinary disease, 6th ed, p 132–241.e1. Elsevier, Amsterdam, the Netherlands.
  • 14. Yue M, Han X, De Masi L, Zhu C, Ma X, Zhang J, Wu R, Schmieder R, Kaushik RS, Fraser GP, Zhao S, McDermott PF, Weill FX, Mainil JG, Arze C, Fricke WF, Edwards RA, Brisson D, Zhang NR, Rankin SC, Schifferli DM. 2015. Allelic variation contributes to bacterial host specificity. Nat Commun 6:8754. doi: 10.1038/ncomms9754.
  • 15. Centers for Disease Control and Prevention. 2013. National Salmonella surveillance annual report, 2010. Centers for Disease Control and Prevention, Atlanta, GA.
  • 16. Bäumler AJ, Tsolis RM, Heffron F. 1996. Contribution of fimbrial operons to attachment to and invasion of epithelial cell lines by Salmonella typhimurium. Infect Immun 64:1862–1865. doi: 10.1128/IAI.64.5.1862-1865.1996.
  • 17. Yoshida CE, Kruczkiewicz P, Laing CR, Lingohr EJ, Gannon VPJ, Nash JHE, Taboada EN. 2016. The Salmonella In Silico Typing Resource (SISTR): an open web-accessible tool for rapidly typing and subtyping draft Salmonella genome assemblies. PLoS One 11:e0147101. doi: 10.1371/journal.pone.0147101.
  • 18. Ashton PM, Nair S, Peters TM, Bale JA, Powell DG, Painset A, Tewolde R, Schaefer U, Jenkins C, Dallman TJ, de Pinna EM, Grant KA, Salmonella Whole Genome Sequencing Implementation Group. 2016. Identification of Salmonella for public health surveillance using whole genome sequencing. PeerJ 4:e1752. doi: 10.7717/peerj.1752.
  • 19. Hayward MR, Petrovska L, Jansen VAA, Woodward MJ. 2016. Population structure and associated phenotypes of Salmonella enterica serovars Derby and Mbandaka overlap with host range. BMC Microbiol 16:15. doi: 10.1186/s12866-016-0628-4.
  • 20. Beltran P, Musser JM, Helmuth R, Farmer JJ, Frerichs WM, Wachsmuth IK, Ferris K, McWhorter AC, Wells JG, Cravioto A, Selander RK. 1988. Toward a population genetic analysis of Salmonella: genetic diversity and relationships among strains of serotypes S. Choleraesuis, S. Derby, S. Dublin, S. Enteritidis, S. Heidelberg, S. Infantis, S. Newport, and S. Typhimurium. Proc Natl Acad Sci U S A 85:7753–7757. doi: 10.1073/pnas.85.20.7753.
  • 21. Sévellec Y, Vignaud ML, Granier SA, Lailler R, Feurer C, Le Hello S, Mistou MY, Cadel-Six S. 2018. Polyphyletic nature of Salmonella enterica serotype Derby and lineage-specific host-association revealed by genome-wide analysis. Front Microbiol 9:891. doi: 10.3389/fmicb.2018.00891.
  • 22. Wang B, Wang C, McKean JD, Logue CM, Gebreyes WA, Tivendale KA, O’Connor AM. 2011. Salmonella enterica in swine production: assessing the association between amplified fragment length polymorphism and epidemiological units of concern. Appl Environ Microbiol 77:8080–8087. doi: 10.1128/AEM.00064-11.
  • 23. Zheng H, Hu Y, Li Q, Tao J, Cai Y, Wang Y, Li J, Zhou Z, Pan Z, Jiao X. 2017. Subtyping Salmonella enterica serovar Derby with multilocus sequence typing (MLST) and clustered regularly interspaced short palindromic repeats (CRISPRs). Food Control 73:474–484. doi: 10.1016/j.foodcont.2016.08.051.
  • 24. Sévellec Y, Felten A, Radomski N, Granier S, Le Hello S, Petrovska L, Mistou M-Y, Cadel-Six S. 2019. Genetic diversity of Salmonella Derby from the poultry sector in Europe. Pathogens 8:46. doi: 10.3390/pathogens8020046.
  • 25. Bessonov K, Robertson JA, Lin JT, Liu K, Gurnik S, Kernaghan SA, Yoshida C, Nash J. 2018. Complete genome and plasmid sequences of 32 Salmonella enterica strains from 30 serovars. Microbiol Resour Announc 7:e01232-18. doi: 10.1128/MRA.01232-18.
  • 26. Berge ACB, Dueger EL, Sischo WM. 2006. Comparison of Salmonella enterica serovar distribution and antibiotic resistance patterns in wastewater at municipal water treatment plants in two California cities. J Appl Microbiol 101:1309–1316. doi: 10.1111/j.1365-2672.2006.03031.x.
  • 27. Bäumler AJ, Tsolis RM, Heffron F. 1996. The lpf fimbrial operon mediates adhesion of Salmonella typhimurium to murine Peyer’s patches. Proc Natl Acad Sci U S A 93:279–283. doi: 10.1073/pnas.93.1.279.
  • 28. Yue M, Rankin SC, Blanchet RT, Nulton JD, Edwards RA, Schifferli DM. 2012. Diversification of the Salmonella fimbriae: a model of macro- and microevolution. PLoS One 7:e38596. doi: 10.1371/journal.pone.0038596.
  • 29. Van Der Woude MW, Bäumler AJ. 2004. Phase and antigenic variation in bacteria. Clin Microbiol Rev 17:581–611. doi: 10.1128/CMR.17.3.581-611.2004.
  • 30. Nicholson TL, Bäumler AJ. 2001. Salmonella enterica serotype Typhimurium elicits cross-immunity against a Salmonella enterica serotype Enteritidis strain expressing LP fimbriae from the lac promoter. Infect Immun 69:204–212. doi: 10.1128/IAI.69.1.204-212.2001.
  • 31. Hayward MR, AbuOun M, La Ragione RM, Tchórzewska MA, Cooley WA, Everest DJ, Petrovska L, Jansen VAA, Woodward MJ. 2014. SPI-23 of S. Derby: role in adherence and invasion of porcine tissues. PLoS One 9:e107857. doi: 10.1371/journal.pone.0107857.
  • 32. Hayward MR, Jansen VAA, Woodward MJ. 2013. Comparative genomics of Salmonella enterica serovars Derby and Mbandaka, two prevalent serovars associated with different livestock species in the UK. BMC Genomics 14:365. doi: 10.1186/1471-2164-14-365.
  • 33. Wallis TS, Galyov EE. 2000. Molecular basis of Salmonella-induced enteritis. Mol Microbiol 36:997–1005. doi: 10.1046/j.1365-2958.2000.01892.x.
  • 34. CDC PulseNet. 2016. Laboratory standard operating procedure for PulseNet Nextera XT library prep and run setup for the Illumina MiSeq. PNL32. Centers for Disease Control and Prevention, Atlanta, GA.
  • 35. CDC PulseNet. 2015. PulseNet standard operating procedure for Illumina MiSeq data quality control. PNQ07. Centers for Disease Control and Prevention, Atlanta, GA.
  • 36. Bushnell B. 2018. BBMap. sourceforge.net/projects/bbmap/.
  • 37. Krueger F. 2017. Trim Galore v.0.4.5. www.bioinformatics.babraham.ac.uk/projects/trim_galore/.
  • 38. Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17:10–12. doi: 10.14806/ej.17.1.200.
  • 39. Andrews S. 2018. FastQC v.0.11.7. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
  • 40. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. doi: 10.1089/cmb.2012.0021.
  • 41. Laetsch DR, Blaxter ML. 2017. BlobTools: interrogation of genome assemblies [version 1; referees: 2 approved with reservations]. F1000Res 6:1287. doi: 10.12688/f1000research.12232.1.
  • 42. Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29:1072–1075. doi: 10.1093/bioinformatics/btt086.
  • 43. McKinney W. 2010. Data structures for statistical computing in Python, p 51–56. In Proceedings of the 9th Python in Science Conference. doi: 10.25080/Majora-92bf1922-00a.
  • 44. Zhou Z, Alikhan N-F, Mohamed K, the Agama Study Group, Achtman M. 2019. The user’s guide to comparative genomics with EnteroBase. Three case studies: micro-clades within Salmonella enterica serovar Agama, ancient and modern populations of Yersinia pestis, and core genomic diversity of all Escherichia. bioRxiv doi: 10.1101/613554.
  • 45. Alikhan N-F, Zhou Z, Sergeant MJ, Achtman M. 2018. A genomic overview of the population structure of Salmonella. PLoS Genet 14:e1007261. doi: 10.1371/journal.pgen.1007261.
  • 46. Zhou Z, Alikhan N-F, Sergeant MJ, Luhmann N, Vaz C, Francisco AP, Carrico JA, Achtman M. 2018. GrapeTree: visualization of core genomic relationships among 100,000 bacterial pathogens. Genome Res 28:1395–1404. doi: 10.1101/gr.232397.117.
  • 47. Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069. doi: 10.1093/bioinformatics/btu153.
  • 48. Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MTG, Fookes M, Falush D, Keane JA, Parkhill J. 2015. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31:3691–3693. doi: 10.1093/bioinformatics/btv421.
  • 49. Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313. doi: 10.1093/bioinformatics/btu033.
  • 50. Letunic I, Bork P. 2016. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res 44:W242–W245. doi: 10.1093/nar/gkw290.
  • 51. Page AJ, Taylor B, Keane JA. 2016. Multilocus sequence typing by blast from de novo assemblies against PubMLST. J Open Source Softw 1:118. doi: 10.21105/joss.00118.
  • 52. Seemann T. 2018. ABRicate v.0.8.7. github.com/tseemann/abricate.
  • 53. Chen L, Xiong Z, Sun L, Yang J, Jin Q. 2012. VFDB 2012 update: toward the genetic diversity and molecular evolution of bacterial virulence factors. Nucleic Acids Res 40:641–645.
  • 54. Thomas M, Fenske GJ, Antony L, Ghimire S, Welsh R, Ramachandran A, Scaria J. 2017. Whole genome sequencing-based detection of antimicrobial resistance and virulence in non-typhoidal Salmonella enterica isolated from wildlife. Gut Pathog 9:1–9. doi: 10.1186/s13099-017-0213-x.
  • 55. McDonald JH. 2014. Handbook of biological statistics, 3rd ed. Sparky House Publishing, Baltimore, MD.
  • 56. R Core Team. 2019. R: a language and environment for statistical computing. The R Foundation, Vienna, Austria.
  • 57. Waskom M, Botvinnik O, O’Kane D, Hobson P, Ostblom J, Lukauskas S, Gemperline DC, Augspurger T, Halchenko Y, Cole JB, Warmenhoven J, de Ruiter J, Pye C, Hoyer S, Vanderplas J, Villalba S, Kunter G, Quintero E, Bachant P, Martin M, Meyer K, Miles A, Ram Y, Brunner T, Yarkoni T, Williams ML, Evans C, Fitzgerald C, Brian, Qalieh A. July 2018. mwaskom/seaborn: v0.9.0 (July 2018). Zenodo doi: 10.5281/zenodo.1313201.
  • 58. Hadfield J, Croucher NJ, Goater RJ, Abudahab K, Aanensen DM, Harris SR. 2018. Phandango: an interactive viewer for bacterial population genomics. Bioinformatics 34:292–293. doi: 10.1093/bioinformatics/btx610.

Articles from Applied and Environmental Microbiology are provided here courtesy of American Society for Microbiology (ASM)