ABSTRACT
Escherichia coli is deposited into soil with feces and exhibits subsequent population decline with concomitant environmental selection. Environmentally persistent strains exhibit longer survival times during this selection process, and some strains have adapted to soil and sediments. A georeferenced collection of E. coli isolates was developed comprising 3,329 isolates from 1,428 soil samples that were collected from a landscape spanning the transition from the grasslands to the eastern deciduous forest biomes. The isolate collection and sample database were analyzed together to discover how land cover, site characteristics, and soil chemistry influence the prevalence of cultivable E. coli in surface soil. Soils from forests and pasture lands had equally high prevalences of E. coli. Edge interactions were also observed among land cover types, with proximity to forests and pastures affecting the likelihood of E. coli isolation from surrounding soils. E. coli is thought to be more prevalent in sediments with high moisture, but this was observed only in grass- or crop-dominated lands in this study. Because differing E. coli phylogroups are thought to have differing ecology profiles, isolates were also typed using a novel single-nucleotide polymorphism (SNP) genotyping assay. Phylogroup B1 was the dominant group isolated from soil, as has been reported in all other surveys of environmental E. coli. Although differences were small, isolates belonging to phylogroups B2 and D were associated with wooded areas, slightly more acidic soils, and soil sampling after rainfall events. In contrast, isolates from phylogroups B1 and E were associated with pasture lands.
IMPORTANCE The consensus is that complex niches or life cycles should select for complex genomes in organisms. There is much unexplained biodiversity in E. coli, and its cycling through complex extrahost environments may be a cause. In order to understand the evolutionary processes that lead to adaptation for survival and growth in soil, an isolate collection that associates soil conditions and isolate genome sequences is required. An equally important question is whether traits selected in soil or other extrahost habitats can be transmitted to E. coli residing in hosts via gene flow. The new findings about the distribution of E. coli in soil at the landscape scale (i) enhance our capability to study how extrahost environments influence the evolution of E. coli and other bacteria, (ii) advance our knowledge of the environmental biology of this microbe, and (iii) further affirm the emerging scientific consensus that E. coli in waterways originates from nonpoint sources not associated with human activity or livestock farming.
KEYWORDS: environmental E. coli, soil, landscape, geographic information systems, prevalence, Escherichia coli, land use, soil microbiology
INTRODUCTION
Escherichia coli is a widespread commensal and pathogenic bacterium that primarily resides in the intestines of warm-blooded animals and has been considered to be an indicator of fecal pollution in recreational and drinking water (1, 2). The fecal-oral transmission route of these organisms requires transient passage through extrahost habitats (i.e., secondary habitats), where E. coli must survive environmental stressors in order to colonize new hosts. It can be postulated that extrahost habitats select for persistent E. coli (3, 4). In some cases, E. coli has adapted for growth outside the host (5, 6). Thus, much of the biodiversity in the E. coli species may be generated and maintained via transient-to-semipermanent residence in extrahost habitats (7). This process can confound the use of E. coli as an indicator organism for fecal contamination in water quality testing (8, 9). Therefore, understanding the ecology of E. coli in extrahost habitats can inform not only our environmental and public health policy but also our concept of microbial species.
Estimates suggest that there are 10^20 E. coli individuals globally, and approximately half of those (5 × 10^19 individuals) are residing in extrahost habitats at any one time (2, 10). Extrahost habitats can include water, soil, sediments, mucus layers of algal mats, plant surfaces (including food crops), and various elements of the built environment (11–14). Environmental persistence of E. coli was previously attributed to exquisite gene regulation to cope with environmental changes between intestinal and extrahost environments (10, 15). However, isolates of E. coli from diverse extrahost habitats have now been shown to exhibit phenotypes and population structure differing from those of E. coli strains isolated from fecal deposits and from clinical samples and genetic reference strains (16, 17). For example, E. coli isolates from soil and spinach leaves have been shown to form denser biofilms than corresponding collections of fecal or clinical isolates, and this trait likely contributes to fitness differences among lineages residing in extrahost environments (12). The collected data over recent years suggest a model in which (i) E. coli populations in extrahost habitats undergo fast selection to enrich for the most adaptive persistence traits, and (ii) some of these subtypes may actually adapt to grow in extrahost habitats in a process called evolutionary rescue (18).
Soil presents a particularly interesting extrahost habitat. Due to its high degree of spatial and temporal heterogeneity near the surface (19), it can provide myriad selective pressures to act on diverse E. coli strains as these organisms enter the soil through the breakdown of fecal deposits. Numerous survival experiments have shown that commensal and pathogenic E. coli strains can persist in soil for months (20–23). Some studies have shown that E. coli is genetically structured by selection in the soil environment (11, 23, 24). Indeed, the host-soil-water cycle presents one mechanism by which E. coli can reach new hosts, because some E. coli strains will then be mobilized in overland or groundwater flow, leading to redeposition into new soil environments or entry into surface water, where they can transfer to hosts directly or via food (25–29). Due to its heterogeneity and its frequent role as an extrahost habitat, the soil environmental milieu probably selects and/or maintains much of the genomic biodiversity of E. coli. However, studies to date have been limited to field-scale studies focused on lands impacted by agricultural activity. As such, the larger patterns of how deposition in extrahost habitats acts to generate or maintain biodiversity in the E. coli species remain obscure (7).
In order to test hypotheses about the role of soil in the diversification of E. coli, a large well-curated set of E. coli isolates from soil, their genomes, and edaphic variables should be developed. Because E. coli phylogroups have been shown to differ in their ecophysiological traits in extrahost habitats (11, 12, 30), a sampling requirement should be that soil isolate collections are structured to include adequate representation of the dominant phylogroups observed in soil to account for the effects of selection in soil on divergent genetic backgrounds. Here, we report such a collection of E. coli strains isolated from 1,428 surface soil samples in the Upper Midwest of the United States and straddling the boundary between the grassland and eastern temperate forest biomes. Sampling across biomes contributes to a natural stratification in soil properties throughout the sample set. In the present study, we tested hypotheses about the environmental prevalence of E. coli among land cover types, soil properties, and meteorological variation throughout our sampling period (June to November 2015). This study was focused on four phylogroups, B1, B2, D, and E, that have been shown to be the dominant groups in extrahost habitats and together comprised over 87% of isolates obtained from soil in the present collection.
RESULTS
Soil sample collection.
Soil samples (n = 1,430) were collected from 143 sites in the area from 47.14°N, 96.81°W to 46.67°N, 95.70°W. The sampled landscape occupied a 20-km buffer surrounding the Buffalo River of Minnesota and measured 585,589 ha (Fig. 1). The composition of the landscape was analyzed using FragStats version 4.2. As represented in the U.S. Geological Survey (USGS) National Land Cover Data (NLCD) 2011 database, the landscape was 36% cropland, 29% forest (mainly deciduous), 10% open water, 6% wooded wetland, 6% pasture, 5% herbaceous wetland, 4% urban development, 3% grassland, and 1% scrubland. Because soil pH has been suggested to structure E. coli populations and is a key correlate of microbial community composition (11, 31, 32), a stratified-random spatial sampling design was implemented based on three strata of predicted soil pH (Fig. 1 and see Materials and Methods). Areas in the landscape with greater heterogeneity of predicted soil pH were sampled at more sites (see Fig. S1 in the supplemental material). Soil samples yielding E. coli isolates (see below) were submitted for chemical analysis. All soil properties varied by land cover type (Table 1). Of the 1,430 samples collected, more than 99% of soil samples were successfully processed for E. coli isolation. To maximize genetic diversity of isolates (and minimize competition among them), soil cultures were subsampled into 384 subcultures of 200 μl each. Five hundred eighty-one cultures from soil samples showed blue fluorescence consistent with the presence of non-O157:H7 E. coli in at least one of the 384 enrichment subcultures.
TABLE 1. Soil properties from chemical analysis (median [range]) by land cover type
Land cover type | No. of samples^a | pH | Organic matter (% LoI)^b | Nitrate content (lb/acre) | Phosphorus content (mg/kg) | Electrical conductivity (mmhos/cm) | Copper content (mg/kg) | Iron content (mg/kg) | Sodium content (mg/kg) | Calcium content (g/kg) | Magnesium content (mg/kg) |
---|---|---|---|---|---|---|---|---|---|---|---|
Cropland | 87 | 7.9 (5.5–8.6) | 5.4 (1.6–14.7) | 3 (1–61) | 27 (4–122) | 0.4 (0.1–2.8) | 0.8 (0.3–7.0) | 19 (4–117) | 12 (2–740) | 4.8 (0.8–10.2) | 700 (200–5,500) |
Deciduous forest | 317 | 7.2 (5.7–8.5) | 12.1 (0.4–60.0) | 4 (0–26) | 24 (5–136) | 0.4 (0.1–1.1) | 0.9 (0.1–3.9) | 50 (7–392) | 6 (0–480) | 5.1 (0.2–9.4) | 700 (200–1,900) |
Grassland | 49 | 7.7 (6.2–8.4) | 6.9 (2.3–20.8) | 2 (0–22) | 14 (3–59) | 0.3 (0.1–2.6) | 1.0 (0.1–3.2) | 18 (6–152) | 9 (2–560) | 5.2 (2.3–23.3) | 500 (200–2,100) |
Pasture | 44 | 7.7 (6.9–8.5) | 20.9 (5.0–69.1) | 2 (1–6) | 31 (8–48) | 1.2 (0.3–3.2) | 2.0 (0.5–4.0) | 145 (6–327) | 32 (18–1,250) | 7.0 (5.5–8.3) | 1,100 (800–2,700) |
Scrubland | 4 | 8.2 (7.6–8.3) | 15.1 (7.8–16.1) | 7 (0–7) | 10 (5–13) | 0.5 (0.3–0.5) | 0.8 (0.7–0.9) | 10 (8–12) | 8 (6–30) | 8.4 (4.1–8.9) | 1,300 (600–1,900) |
Herbaceous wetland | 32 | 7.9 (6.4–8.5) | 13.9 (3.3–46.2) | 1 (0–44) | 13 (2–53) | 0.5 (0.2–1.9) | 0.9 (0.5–5.1) | 63 (6–107) | 24 (6–188) | 9.7 (2.2–11.8) | 900 (400–2,300) |
Wooded wetland | 26 | 7.4 (7.0–7.9) | 15.9 (5.7–39.6) | 3 (0–6) | 38 (7–60) | 0.4 (0.2–0.6) | 1.3 (0.4–2.4) | 66 (22–252) | 16 (8–22) | 6.5 (2.9–8.1) | 700 (300–3,100) |
^a Only soils that yielded E. coli were submitted for soil chemical analysis.
^b LoI, loss on ignition.
Isolate collection.
The 581 culture-positive soil samples amounted to 41% overall prevalence of E. coli among all soil samples. As previous studies of E. coli in soil and plant surfaces have indicated that phylogroups B1, B2, D, and E are the most frequently isolated, we sought to determine the membership of our isolates in those focal phylogroups. Of the 3,329 isolates of E. coli that were obtained, 87% were found to belong to the focal phylogroups using quantitative PCR (qPCR) single-nucleotide polymorphism (SNP) genotyping assays: 43% B1, 19% B2, 18% D, and 7% E. This amounted to 1,427 isolates from phylogroup B1, 617 isolates from phylogroup B2, 590 isolates from phylogroup D, and 248 isolates from phylogroup E. The remaining 447 isolates (i.e., the remaining 13%) were not successfully typed by these assays and are presumptively not members of one of the four focal phylogroups.
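The percentages in this paragraph follow directly from the reported counts. As a minimal sketch (the variable names are illustrative and not part of the study's analysis pipeline), the arithmetic can be reproduced as follows:

```python
# Counts reported in the text; variable names are illustrative only.
soil_samples = 1428
culture_positive = 581
total_isolates = 3329
typed = {"B1": 1427, "B2": 617, "D": 590, "E": 248}

# Overall prevalence: fraction of soil samples that were culture positive.
prevalence = culture_positive / soil_samples

# Isolates assigned to a focal phylogroup versus those left untyped.
typed_total = sum(typed.values())
untyped = total_isolates - typed_total

# Share of all isolates belonging to each focal phylogroup.
shares = {group: count / total_isolates for group, count in typed.items()}

print(f"overall prevalence: {prevalence:.1%}")
print(f"typed isolates: {typed_total} ({typed_total / total_isolates:.1%})")
print(f"untyped isolates: {untyped}")
for group, share in shares.items():
    print(f"phylogroup {group}: {share:.1%}")
```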
Core genome alignment confirmed that the SNP assays were 96% accurate for phylogroup B1, 87% accurate for phylogroup B2, 89% accurate for phylogroup D, and 96% accurate for phylogroup E. Of the isolates that were erroneously typed by the SNP assays, members of phylogroups B2, D, E, and F were erroneously typed as B1 (3, 3, 7, and 1 isolate, respectively); members of phylogroups B1, D, E, and F were erroneously typed as B2 (22, 5, 5, and 7 isolates, respectively); members of phylogroups B1, B2, and F were erroneously typed as D (15, 9, and 10 isolates, respectively); and members of phylogroups B1 and D were erroneously typed as E (1 and 2 isolates, respectively). Although strong statistical inference is impossible with such low misclassification rates, the misclassification errors seem roughly proportional to the overall distribution of phylogroup membership and suggest that the misclassifications were due to approximately random errors in the SNP assay. The exception appears to be that phylogroup F isolates were more frequently misclassified as phylogroup D, and we attribute this to the more recent shared genetic ancestry between phylogroups D and F. The remaining 13% of isolates that remained untyped presumptively contain members of phylogroups A, C, F, and/or the cryptic Escherichia clades. Among the sequenced genomes, no members of cryptic Escherichia clades were detected.
E. coli varied in prevalence per sample site among land cover, with forests, pastures, and wooded wetlands exhibiting the greatest median values per site for all phylogroups (Table 2). Although the number of sampled pasture sites was limited (n = 6), the prevalence of E. coli bacteria of all phylogroups was greatest in pasture soil samples, followed by forests and wooded wetlands (Table 2). Such elevated prevalence could be caused by migration of E. coli into the soil at higher frequency due to more frequent fecal deposition (demographic rescue), lower selection pressure in those environments leading to enhanced survival (or growth), or a combination of the two. The four phylogroups studied were not evenly distributed among land cover types, and phylogroup B1 was isolated from 23% of all soil samples, making it the most prevalent phylogroup (Table 2). As a function of relative prevalence, phylogroup B1 was isolated from soil approximately 1.8 times more frequently than groups B2 and D and 3.8 times more frequently than E in the overall sample collection. All phylogroups exhibited disproportionately greater prevalence in forest and pasture lands. Phylogroup E was almost four times more prevalent in pasture soils than expected based on its overall prevalence, whereas the other phylogroups were approximately two times more prevalent in pastures. In pasture, B1 was isolated 2.3 times more frequently than B2 and D and 2.8 times more frequently than E, indicating that these groups form a larger proportion of the E. coli population in pasture land than in other land cover types. Phylogroup B1 alone exhibited relatively high prevalence in wetlands of both vegetation types. In contrast, B1 was isolated at somewhat lower relative frequency from forest soils, at 1.5 times more frequently than B2 and D and only 3.2 times more frequently than E.
TABLE 2. Prevalence of E. coli phylogroups by site (median [range]); overall prevalence in land cover^a
Land cover | No. of sample sites | No. of soil samples | All E. coli isolates | B1 | B2 | D | E |
---|---|---|---|---|---|---|---|
Cropland | 50 | 499 | 0.1 (0.0–0.8); 0.19 | 0.0 (0.0–1.0); 0.10 | 0.0 (0.0–0.4); 0.02 | 0.0 (0.0–0.4); 0.04 | 0.0 (0.0–1.0); 0.01 |
Deciduous forest | 44 | 440 | 0.7 (0.2–1.0); 0.73 | 0.4 (0.0–1.0); 0.38 | 0.2 (0.0–0.8); 0.24 | 0.2 (0.0–1.0); 0.27 | 0.1 (0.0–0.7); 0.12 |
Grassland | 29 | 290 | 0.1 (0.0–0.9); 0.19 | 0.0 (0.0–1.0); 0.08 | 0.0 (0.0–0.9); 0.05 | 0.0 (0.0–0.2); 0.04 | 0.0 (0.0–0.1); 0.02 |
Herbaceous wetland | 8 | 79 | 0.4 (0.2–0.8); 0.44 | 0.3 (0.0–0.7); 0.33 | 0.0 (0.0–0.2); 0.05 | 0.0 (0.0–0.3); 0.07 | 0.0 (0.0–0.6); 0.04 |
Pasture | 6 | 60 | 0.9 (0.3–1.0); 0.73 | 0.8 (0.2–1.0); 0.65 | 0.3 (0.0–0.6); 0.28 | 0.3 (0.0–0.6); 0.28 | 0.2 (0.0–0.6); 0.23 |
Scrubland | 2 | 20 | 0.2 (0.0–0.3); 0.15 | 0.2 (0.0–0.3); 0.15 | ND | 0.1 (0.0–0.1); 0.05 | 0.0 (0.0–0.1); 0.05 |
Wooded wetland | 4 | 40 | 0.6 (0.5–0.9); 0.65 | 0.5 (0.1–0.8); 0.40 | 0.0 (0.0–0.3); 0.15 | 0.2 (0.1–0.3); 0.20 | 0.1 (0.0–0.2); 0.10 |
Total | 143 | 1,428 | 0.41 | 0.23 | 0.12 | 0.13 | 0.06 |
^a Prevalence (given as a fraction of 1.0) by site measured among 10 soil samples within each site. ND, not detected in this land cover.
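The relative-prevalence comparisons in the text can be recomputed from the overall prevalence values in Table 2. The sketch below is illustrative only; the function names are hypothetical, and the inputs are the rounded table values, so the ratios agree with the text only to within rounding.

```python
# Overall prevalence values copied from Table 2 (rounded to two decimals).
overall = {"B1": 0.23, "B2": 0.12, "D": 0.13, "E": 0.06}
pasture = {"B1": 0.65, "B2": 0.28, "D": 0.28, "E": 0.23}
forest = {"B1": 0.38, "B2": 0.24, "D": 0.27, "E": 0.12}


def b1_ratios(prevalence):
    """How many times more often B1 was isolated than each other phylogroup."""
    return {
        group: prevalence["B1"] / value
        for group, value in prevalence.items()
        if group != "B1"
    }


def enrichment(land_cover, reference):
    """Prevalence in one land cover relative to the whole sample collection."""
    return {group: land_cover[group] / reference[group] for group in reference}


for name, table in (("overall", overall), ("pasture", pasture), ("forest", forest)):
    ratios = ", ".join(f"{g}: {r:.1f}" for g, r in b1_ratios(table).items())
    print(f"B1 relative to other groups ({name}): {ratios}")

pasture_vs_overall = enrichment(pasture, overall)
print({group: round(value, 1) for group, value in pasture_vs_overall.items()})
```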
Phylogroups B2 and D were isolated from soils with lower mean pH and calcium levels than phylogroups B1 and E.
Because soil chemical data were only obtained for samples that yielded E. coli isolates, analysis of variance (ANOVA) was used to detect variance in soil chemical properties among phylogroups. All soil properties were subjected to Levene's test of homogeneity of variance among phylogroups to confirm their suitability for ANOVA (all P values > 0.05). ANOVA indicated that pH had lower means in samples yielding phylogroups B2 and D than in samples yielding B1 (Fig. 2; F_pH = 3.84, Bonferroni-adjusted P < 0.05). Calcium also displayed differing means among phylogroups (Fig. 2; F_Ca = 4.03, Bonferroni-adjusted P < 0.05). Examination of contrasts using Tukey's honest significant difference tests indicated that B2 and D were isolated from soils that were approximately 0.1 to 0.2 pH units lower and 0.3 to 0.5 g/kg of soil lower in Ca concentration. Although these results suggest that soil chemistry may affect the distribution of phylogroups, calcium concentrations were moderately correlated with pH (R = 0.42; P < 0.05), as expected. Thus, the precise cause of the differences in distribution, whether pH or Ca concentration, remains undetermined.
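For readers unfamiliar with the F statistic reported above, the following is a minimal sketch of a one-way ANOVA computed from first principles. The pH values are hypothetical placeholders, not data from this study, and the function name is illustrative; Levene's test, the Bonferroni adjustment, and the Tukey contrasts are not reproduced here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups (e.g., phylogroups)
    n = len(all_values)   # total number of observations

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical soil pH values for samples yielding each phylogroup.
ph_by_phylogroup = {
    "B1": [7.6, 7.8, 7.5, 7.9, 7.7],
    "B2": [7.4, 7.6, 7.3, 7.5, 7.7],
    "D": [7.3, 7.5, 7.6, 7.4, 7.2],
    "E": [7.7, 7.8, 7.6, 7.9, 7.5],
}

f_ph = one_way_anova_f(list(ph_by_phylogroup.values()))
print(f"F statistic for hypothetical pH data: {f_ph:.2f}")
```

A larger F indicates that differences among group means are large relative to the variation within groups; the P value is then obtained from the F distribution with k − 1 and n − k degrees of freedom.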
Tests of differences in means of meteorological variables were only marginally significant. Soil samples yielding phylogroups B2 and D exhibited higher values of precipitation on the day before sampling (ANOVA F = 2.31, P < 0.1). The mean precipitation for soil samples yielding isolates in B2 and D was approximately 0.25 in. of rainfall (6.4 mm) on the day before sampling (overall median, 0.02 in.; overall range, 0 to 0.88 in.).
E. coli prevalence varied by land cover type, and there were edge interactions among forests, crop, and pasture lands.
In order to construct a model of the prevalence of E. coli in surface soils, random forest analysis was used to identify useful predictors of E. coli presence and absence in soil samples. Random forest analysis identified land cover type, distance to nearest forest, and percent forest cover as the best predictors of E. coli isolation from surface soil (Fig. 3). The best predictors of samples that did not yield E. coli, in order of importance, were percent cropland cover, percent forest cover, distance to the nearest pasture, distance to the nearest surface water, distance to the nearest forest, land cover type, and surface soil texture. Correlation tests indicated that percent forest cover and percent cropland cover were strongly inversely correlated (R = −0.72). Proximity to pasture was weakly, but significantly, correlated with proximity to impervious surfaces and proximity to surface water (R_Imperv. = 0.18, R_Water = 0.32), so percent forest cover and proximity to pasture were used as representatives of the main effects of those groups of variables. The out-of-bag (OOB) error rate for this random forest analysis was 24%. As expected, the power to predict soils that did not yield E. coli isolates was better than the power to predict soils that did. Class error proportion for negative samples was 20% and for positive samples was 30%. Thus, a sampling design based on criteria from this random forest outcome would still require researchers to collect several samples to ensure isolation. The OOB error rate was stable over orders of magnitude of class weighting schemes that were tested. Although this OOB error rate is marginally acceptable for many classification schemes, we present this as an improved prediction for sampling E. coli from soil and to support the increasing consensus that E. coli is not a useful fecal indicator for water quality assessments, because non-point sources of E. coli from overland flow include not only pastures or manured croplands, but also forests and wooded wetlands.
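The overall OOB error rate is the sample-weighted average of the two class error proportions. The sketch below illustrates this relationship under the assumption, not stated explicitly in the text, that all 1,428 processed samples (581 positive) entered the random forest analysis; because the class errors are rounded, the result is approximate.

```python
# Assumption: every processed soil sample was used in the random forest.
total_samples = 1428
positive_samples = 581
negative_samples = total_samples - positive_samples

# Class error proportions as reported (rounded) in the text.
error_negative = 0.20
error_positive = 0.30

# Expected number of misclassified samples in each class.
misclassified = (
    error_negative * negative_samples + error_positive * positive_samples
)

# Overall error is the total misclassified fraction across both classes.
overall_error = misclassified / total_samples
print(f"approximate overall OOB error: {overall_error:.1%}")
```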
The partial effects of each important variable on E. coli isolation were assessed (Fig. 4). Wooded areas and pastures exhibited greater prevalence of E. coli, as did samples collected in areas with greater forest cover (calculated in a 250-m radius) and areas within approximately 30 m of forest cover. For example, the per-sample probability of E. coli isolation from surface soil in croplands was almost 20-fold lower than in forest areas (Fig. 4A). Samples collected from areas within 30 m of forest were approximately 5-fold more likely to yield E. coli than those farther away (Fig. 4B), and samples collected in areas with less than 7% forest cover (250-m radius) were approximately 8-fold less likely to yield E. coli (Fig. 4C). The effect of close proximity to pasture was less clear, with a peak prevalence at approximately 400 m from pasture land and a trough at approximately 900 m from pasture (Fig. 4D). Given that proximity measurements were calculated from the NLCD basemap and that the rate of misclassification between cropland, grassland, and pasture in the NLCD is high, it seems likely that the lower prevalence at short distances from pasture (<400 m) is due to misclassification of some grassland areas as pasture in the NLCD. This demonstrates some of the limitations of using remotely sensed data to predict the prevalences of species.
Random forest analysis yielded the most important variables at the scale of the whole-sample collection, but tree models contain a hierarchical structure that is not easily extracted from the ensemble analysis. A conditional inference classification tree was generated, as implemented in the R/party package, to examine factors affecting the prevalence in portions of the sample collection and to devise a simple set of rules that can be used to maximize E. coli-positive samples in soils of the Upper Midwest based on the presented analysis (Fig. 5). As expected, the classification tree showed that E. coli was more prevalent in soils in wooded areas and pasture lands. Within land cover types, several other factors were useful on smaller scales. In wooded areas and pastures, soil moisture was associated with only a small difference in prevalence: soils saturated with water showed 66% prevalence, whereas dry or moist soils showed 72%. In croplands and grass-dominated areas, E. coli was much more prevalent where soils were saturated with water: prevalence was 41% in saturated soils (n = 69) compared to 19% in dry or moist soils (n = 755; Cochran-Mantel-Haenszel test, P < 0.05). When grassland and cropland soils were not saturated with water, cooler temperatures, represented by a minimum temperature below 9.2°C (48.6°F) on the day of sampling, were associated with increased prevalence. Under those conditions, the prevalence was 18% in samples collected on warmer days (n = 635) and 39% in samples collected on cooler days (n = 120). As minimum temperature and maximum temperature were highly correlated (R = 0.82), the maximum temperature on days of sampling was also examined. On days with temperature minima lower than 9.2°C, the maximum temperature ranged from 9.3 to 27.8°C, with a median of 15.6°C (60.1°F).
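The rules described in this paragraph can be summarized as a small lookup function. This is a simplified sketch, not the fitted conditional inference tree: the function name and land cover labels are hypothetical, and the grouping of deciduous forest and wooded wetland as "wooded areas" is an assumption made for illustration. The returned values are the prevalences reported in the text.

```python
# Assumption: "wooded areas" is taken to mean deciduous forest and wooded
# wetland; together with pasture these form the high-prevalence branch.
WOODED_OR_PASTURE = {"deciduous forest", "wooded wetland", "pasture"}


def expected_prevalence(land_cover, saturated, min_temp_c):
    """Return the reported prevalence for the matching branch of the tree."""
    if land_cover in WOODED_OR_PASTURE:
        # Soil moisture made only a small difference in these land covers.
        return 0.66 if saturated else 0.72
    # Croplands and grass-dominated areas.
    if saturated:
        return 0.41
    # Unsaturated soils: cooler sampling days showed higher prevalence.
    return 0.39 if min_temp_c < 9.2 else 0.18


print(expected_prevalence("cropland", saturated=False, min_temp_c=12.0))
print(expected_prevalence("cropland", saturated=False, min_temp_c=5.0))
print(expected_prevalence("pasture", saturated=True, min_temp_c=12.0))
```

Under these rules, a researcher seeking positive samples outside wooded areas and pastures would favor saturated soils or, failing that, cooler sampling days.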
DISCUSSION
E. coli has historically been used as an indicator of fecal pollution in recreational and drinking water, but much evidence has indicated that there are environmental sources of E. coli in water (5, 14, 33, 34). Although the literature shows evidence that E. coli can adapt to extrahost environments (6, 16), the study of the ecological and evolutionary processes leading to environmental adaptation in soil requires a sample set of E. coli that has been designed with selection pressures, dispersal processes, and natural history of E. coli in mind. The isolate collection presented here is designed to examine the effects of deposition in soil on the generation and maintenance of biodiversity in E. coli. It is a well-curated set of E. coli isolates from soil that can be used alone or in comparison with clinical and fecal isolates to search for evidence of environmental adaptation. Here, we analyzed environmental factors associated with isolation of E. coli from surface soils, detected some differences in the distributions of four phylogroups, and presented additional criteria that can be used to design future studies of E. coli in soil.
We did not describe a comparable set of fecal isolates during this study. Although it is indisputable that fecal deposition is frequent in some soils and that movement of some wild animal hosts among sites is rapid at spatial scales of 10^1 to 10^3 meters, this soil isolate collection can be especially useful to detect adaptation to soil environments, because (i) death of unfit E. coli genotypes after deposition is rapid, especially when soil communities are unperturbed (35–37); (ii) as a result of this rapid death, even seven-gene multilocus sequence typing (MLST) has been capable of detecting gene-environment interactions in other studies of E. coli in the environment in the absence of fecal isolates obtained simultaneously (11, 12, 24); (iii) fecal communities in wild animals fluctuate over time, and deposition by wild animals imposes an easily detectable spatial structure on E. coli in soil only at the scale of 10^2 m (11, 23); and thus, (iv) the best means of detecting the selective effect of the soil environment is to examine soil isolates in a study that comports with a landscape genomics analytical framework in which movements can be taken into account when seeking to detect environmental selection (38, 39).
All E. coli strains were most prevalent in forest and pasture soils.
Several papers have highlighted the potential for E. coli to adapt to soils in pastures (23, 40), but reports of E. coli persisting or growing in forest soils have been less frequent (13). We found that E. coli exhibited 73% prevalence in soils of both forests and pastures, albeit at abundances of only 10^1 to 10^3 CFU per g of soil. That abundance is consistent with prior studies of E. coli in surface soil (11, 13, 23). The low abundance of E. coli in these soils suggests that competition for niche space and/or predation are intense in surface soil. It further indicates that E. coli organisms are ruderal soil inhabitants that are most likely to colonize niches in perturbed soil communities. This is consistent with studies on E. coli O157:H7 in soil that showed that perturbing the native microbiota led to enhanced persistence (22, 35, 36). The finding that E. coli strains are present at similar prevalence and abundance in soils of both pasture and forest lands has implications for water quality, as forests draining into streams may be adding significant amounts of suspended sediments and E. coli to surface waterways. In a prior study, we were puzzled by our observation of unexpectedly high E. coli loads in roadside ditches that drained forest land. During the fall season in that study, roadside ditches adjacent to forests transported more E. coli than in the spring and summer and nearly the same amount of E. coli as agricultural fields that had been manured the previous spring (41).
E. coli was found at low frequency in cropland soils in this study. The prevalence, abundance, and survival of E. coli have been studied in croplands for potential application to food safety of fresh fruits and vegetables. Many studies have focused on the survival of E. coli O157:H7 after inoculation into soil with manure amendments. Two studies on generic E. coli identified recent manure application, on-site hygienic facilities for workers, and recent irrigation (especially with pond water) as factors that increased the likelihood of E. coli detection in fresh-produce farms (42, 43). Field surveys of E. coli in croplands have found similarly low prevalences. A prevalence of 6.6% was observed in croplands of Colorado and Texas in one sample collection (43). Two other studies found a prevalence of 48% in croplands located in or near riparian areas (P.W. Bergholz, unpublished data) and a prevalence of 65% at 248 days after natural flooding in other New York State farms (24).
Reports of the prevalence of E. coli in soils of pastures and forests compared to cropland are not entirely new (44, 45), though they have been sparse and focused on isolated single-use land areas (3, 11, 23, 24, 40). Here, we report a contemporary analysis of E. coli from 1,428 samples collected in an agricultural landscape totaling 585,589 ha. Thus, the observation of increased E. coli prevalence in forests has been considerably strengthened. The elevated prevalence and abundance of E. coli in forest and pasture soils have been attributed to elevated fecal input. However, it should also be noted that the plant cover and shading in these areas moderate perturbations and diurnal cycling in temperature or soil moisture. High levels of organic matter can also act as a buffer against pH change and nutrient depletion. Thus, a high prevalence of E. coli in those soils may be in part due to elevated fecal deposition (certainly in the case of pastures) but is also likely due to more stable soil conditions. In contrast, croplands exhibit high levels of diurnal and anthropogenic variation in soil conditions, and this may have contributed to lower prevalence. It is unclear what impacts soil amendment and/or diurnal environmental variation might have on the genomic biodiversity of E. coli strains isolated from croplands compared to those from other land cover types or on their potential for adaptation to soil. On the one hand, lower prevalence might be indicative of lower genetic diversity in these systems. On the other hand, genomic diversity might be increased in croplands due to the combination of greater variation in environmental conditions (over time) or greater diversity of immigrating E. coli with soil amendments.
E. coli was more prevalent in areas with greater forest cover and in samples collected in close proximity (0 to 38 m) of forests.
In addition to increased prevalence in forests and pastures, our classification analyses revealed an effect of proximity to forest cover. Increased prevalence was observed in soils of croplands or grasslands that occur near forest cover, suggesting an edge interaction in which forests influence the prevalence of E. coli in nearby soils. Such edge interactions could be due to the spread of E. coli out of forests and into surrounding lands, or they might be due to moderation of soil conditions in the vicinity by shading, forest vegetation, or soil. Such effects may accumulate over the landscape. In the current study, E. coli was up to 90% less prevalent when forest cover (250-m radius) was less than 7%, and the prevalence was predicted to approach the overall sample collection average at greater forest cover levels. Although we did not find previously published observations about forest edge interactions acting on E. coli, these results echo previous studies on Listeria species. Significant edge interactions have been observed for Listeria monocytogenes and other Listeria species; in one study, an increase of 100 m distance from a forest edge resulted in a 16% decrease in likelihood of Listeria sp. isolation in croplands (46).
Phylogroup B1 was the predominant phylogroup in all land cover types.
Phylogroup B1 has been previously reported to be the predominant phylogroup isolated from feces of domesticated and wild animals, as well as soil and surface water samples (2, 11, 23, 29, 30, 47). The predominance of phylogroup B1 may be due, in part, to its dominance in hosts (a source-sink hypothesis); to enhanced survival of phylogroup B1 isolates in extrahost environments due to a unique set of stress tolerance traits (a stress tolerance phenotype hypothesis) (12, 23, 30); or to the existence of some clades in B1 that appear to more readily adapt for growth in sediment and/or soil habitats, e.g., clade ET-1 (a genetic predisposition hypothesis) (33).
In the present study, phylogroup B1 was the predominant group isolated from all land cover types. In most land cover types, only one phylogroup was isolated from most soil samples. The exception to this is pasture soil samples. Inspection of our soil database revealed that 63% of E. coli-positive pasture soils contained at least two phylogroups, but only approximately 36%, 21%, and 13% of E. coli-positive forest, grassland, and cropland soils, respectively, contained at least two phylogroups. Not only were phylogroups E and B1 isolated more frequently than expected from pasture soil, they were frequently isolated from the same soil samples in pastures. In contrast, the prevalences of D and B2 in pastures were more similar to those in forests. Although we were only able to sample six active pastures as part of this study, our finding is consistent with findings of a recently published study on the population structure of environmental E. coli isolates from pastures in South Dakota that found phylogroups B1 and E to be more abundant in that soil (23). This same study indicated that phylogroups A, B2, and D were only rarely isolated from pasture land. In contrast, we observed phylogroups D and B2 in approximately the same proportion in pasture land as in the forest. One important difference between these two studies may be that the South Dakota study included pastures with only grassy cover, whereas four of the six pastures that were sampled in our study on the Buffalo River in Minnesota included soil under tree cover. It is tempting to speculate that the tree cover may have ameliorated any selection against D and B2 in our study, while also permitting the survival of B1 and E strains that were being deposited into the soil by cattle.
Phylogroups D and B2 were isolated from soils with lower average pH than phylogroups B1 and E.
Although the difference in soil pH was modest, the preponderance of the evidence from many studies together suggests that ecological differences among phylogroups are real. Phylogroups D and B2 are thought to be well adapted to extraintestinal habitats in hosts, including the urinary tract, where these phylogroups cause the majority of urinary tract infections (48). Recent studies have also suggested that many strains in phylogroup D have traits that are advantageous on plant surfaces and in soil (11, 12). Our finding that slightly more-acidic soils (on average) are associated with isolation of phylogroups B2 and D invites comparisons between those soils and urinary tract or skin environments, which also have moderately acidic environmental pH. We hypothesize that tolerance traits and growth in lower-pH environments originated in the host but may be of adaptive value in moderately acidic soils. Therefore, we expect that this adaptive trait for life outside the intestine is maintained during persistence or growth in soil and may permit strains from phylogroups D and B2 to displace phylogroup E in soil and in moderately low-pH host environments.
A recommended sampling scheme for E. coli in soil.
In this study, we developed a model that expands our ability to predict the prevalence of culturable E. coli in surface soils using map data that can be remotely sensed. Taken together with the other tests on prevalence, a set of recommendations can be developed for future studies of soil as a habitat for E. coli populations. Although limited data suggest that differing regions may have differing baseline E. coli prevalences in surface soil (24, 42, 43), we expect that these rules are useful in all landscapes after adjusting for baseline differences. In croplands and grassland areas, one should expect to collect approximately 10 to 20 soil samples for a high likelihood of detecting E. coli in at least one sample, but fewer could be collected in wooded areas and pastures, if needed. Frequent wild-ruminant activity may increase the frequency of isolates in grasslands, though our ability to sample protected lands with abundant evidence of deer activity was limited. One should expect that phylogroup B1 will comprise approximately 50% of isolates obtained from most soil sample collections. To maximize phylogroup diversity and the number of isolates from soil, sample surface soils with pH from 6.0 to 8.0 at locations within 30 m of wooded areas on a cool day with temperatures ranging from 47°F to 60°F and after a moderate rainfall event (0.25 to 0.9 in.). In soils from croplands and grasslands, collect samples preferentially where soils are moist or water saturated at the surface. Under those conditions, sampling in pasture land will maximize the isolation of phylogroups B1 and E, and sampling in forests or wooded wetlands will maximize the isolation of phylogroups B1, B2, and D. Such a sampling scheme can be derived from intersections between the USDA Soil Survey Geographic Database (SSURGO) maps of predicted soil properties and the USGS NLCD maps of land cover distributions and adapted for predicted meteorological conditions.
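As a rough illustration of the sample-size guidance above, the chance of recovering E. coli from at least one of n soil samples can be approximated as 1 − (1 − p)^n, assuming that each sample is an independent draw with the same per-sample prevalence p. The sketch below is not part of the study's analysis: the function names and the example prevalence values (15% to 25%) are hypothetical and are chosen only to show how a recommendation on the order of 10 to 20 samples can arise. The helper `matches_recommended_conditions` simply encodes the pH, distance, temperature, and rainfall ranges stated in the text.

```python
import math


def detection_probability(prevalence: float, n_samples: int) -> float:
    """Probability that at least one of n independent samples is positive."""
    return 1.0 - (1.0 - prevalence) ** n_samples


def samples_needed(prevalence: float, target: float = 0.95) -> int:
    """Smallest n for which detection_probability(prevalence, n) >= target."""
    if not 0.0 < prevalence < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("prevalence and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - prevalence))


def matches_recommended_conditions(soil_ph: float, dist_to_woods_m: float,
                                   air_temp_f: float, rainfall_in: float) -> bool:
    """True when a planned sample falls inside the ranges recommended in the text."""
    return (6.0 <= soil_ph <= 8.0
            and dist_to_woods_m <= 30.0
            and 47.0 <= air_temp_f <= 60.0
            and 0.25 <= rainfall_in <= 0.9)


# Hypothetical per-sample prevalences; these are NOT values reported by the study.
for assumed_prevalence in (0.15, 0.20, 0.25):
    n = samples_needed(assumed_prevalence)
    print(assumed_prevalence, n, round(detection_probability(assumed_prevalence, n), 3))
```

Independence between samples is a simplifying assumption; samples taken close together at one site are likely to be correlated, in which case more samples would be needed than this calculation suggests.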
Conclusion.
There has been some debate in the literature about the relative degrees of environmental adaptation among phylogroups. The repeated observation that phylogroup B1 predominates in most fecal samples and essentially all environmental samples is consistent across numerous papers (29, 30). However, some debate remains about other phylogroups, with some reports on waterways and pasture lands showing that phylogroups D and B2 are rare and other reports on forest, croplands, and plant surfaces showing that phylogroups D and B1 may have a selective advantage (11, 12). When phenotypes have been measured, including soil environment-gene interactions, biofilm formation, and nutrient utilization, phylogroups B1 and D have exhibited the greatest proportions of environmental adaptations (12, 30). Phylogroup D has exhibited the greatest gene-soil environment interactions, possibly owing to its greater genetic diversity than that of other phylogroups (11, 49). Phylogroup B1 exhibits many adaptive traits but only small gene-environment interactions (11, 29, 30). Synthesizing our current study with all of this prior work, we propose a phylogroup-dependent model of environmental adaptation in E. coli that can be tested using our sample collection. Environmental adaptation in E. coli should occur by evolutionary rescue, as E. coli sensu stricto is a commensal intestinal organism that can adapt to live outside a host (6, 18). The probability of evolutionary rescue events is dictated by two components: (i) the survival time of the population (a product of survival traits and initial population size) and (ii) preexisting genetic diversity in the population at the time of deposition in soil (50). We hypothesize that phylogroup D may have a tendency toward eventual evolutionary rescue in soils because of its high levels of preexisting genomic diversity. 
In contrast, we hypothesize that phylogroup B1, although not as genetically diverse, may possess enhanced stress tolerance traits and may be more persistent in soil. Such enhanced stress tolerance may predispose B1 to evolutionary rescue in soil.
MATERIALS AND METHODS
Study design.
A spatially explicit landscape sampling design was devised using the Sampling Design Tool extension for ArcMap 10.2 (51). Strata were generated by classifying predicted surface soil pH (0 cm depth) from the USDA SSURGO database into three strata ranging from 5.9 to 6.7, 6.7 to 7.3, and 7.3 to 8.2 for all locations within 20 km of the Buffalo River of Minnesota, a tributary of the Red River of the North (53; https://websoilsurvey.sc.egov.usda.gov/ [accessed 15 April 2015]). The sampling area was then subdivided into five zones based on spatial heterogeneity in predicted soil pH values (Fig. S1). Within each zone, each pH stratum was sampled equally. Zones with greater soil heterogeneity were sampled more intensively, as these zones provided soil samples that varied in environmental chemistry over short geographic distances (Fig. 1). At each site, 10 surface (0 to 4 in. depth) soil samples of approximately 300 g of soil were each collected into a 32-oz. Whirl-Pak bag (Nasco, Fort Atkinson, WI) using a sterile disposable 4-oz. plastic scoop. The locations of individual soil samples were pseudorandomly selected at the site, and the locations of all samples were recorded at a submeter scale using a Trimble GeoXT 3000 global positioning system (GPS) receiver (Trimble, Sunnyvale, CA). Geographic coordinates were postprocessed and corrected using local National Oceanic and Atmospheric Administration (NOAA) reference data as implemented in Trimble GPS Pathfinder Office version 5.60. Upon soil sample collection, sensory observations about the sample were immediately recorded in a spreadsheet on a Samsung Galaxy S6 tablet (Samsung, Seoul, South Korea), including sample number, land cover type, vegetation type, soil moisture (four levels), soil consolidation (four levels), and density of vegetation (four levels). Soils were kept at ambient temperature in a cooler until the following day. At that time, 8 g of soil was retained for isolation of E. coli, and the remainder of the soil sample was dried at 55°C and prepared for chemical analysis. Only soil samples found to contain E. coli were submitted for chemical analysis at the North Dakota State University (NDSU) Soil Analysis Laboratory. Soil chemical analysis included nitrate-nitrogen content (pound per acre), electrical conductivity (millimhos per centimeter), total organic carbon (% loss on ignition), phosphorus content (milligrams per kilogram), copper content (milligrams per kilogram), iron content (milligrams per kilogram), magnesium content (milligrams per kilogram), calcium content (milligrams per kilogram), and sodium content (milligrams per kilogram).
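The stratification step can be sketched as a simple binning of predicted surface soil pH into the three ranges given above. The example below is a minimal illustration and not the ArcMap Sampling Design Tool workflow used in the study. The function name is hypothetical, and the handling of values that fall exactly on a shared boundary (assigned here to the higher stratum) is an assumption, because the text does not specify it.

```python
from typing import Optional

# pH ranges of the three strata, as stated in the study design.
PH_STRATA = [(5.9, 6.7), (6.7, 7.3), (7.3, 8.2)]


def ph_stratum(predicted_ph: float) -> Optional[int]:
    """Return the stratum number (1 to 3), or None if the pH is outside 5.9 to 8.2.

    Assumption: a value on a shared boundary (6.7 or 7.3) goes to the higher
    stratum; the upper limit of the last stratum (8.2) is included.
    """
    if predicted_ph == PH_STRATA[-1][1]:
        return len(PH_STRATA)
    for number, (low, high) in enumerate(PH_STRATA, start=1):
        if low <= predicted_ph < high:
            return number
    return None


# Hypothetical predicted pH values for candidate sampling locations.
for ph in (6.1, 6.7, 7.9, 8.5):
    print(ph, ph_stratum(ph))
```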
Isolation and typing of E. coli isolates.
Isolation of E. coli was performed as previously described (11). Briefly, 2 g of soil was suspended in each of four 20-ml aliquots of E. coli (EC) medium with 4-methylumbelliferyl-β-d-glucuronide (MUG). Each 20-ml aliquot was subsampled in 96 180-μl aliquots in a microtiter plate for a total of 384 enrichment cultures per soil sample. Wells that fluoresced blue under a handheld UV-A emitter were subcultured by a triple streak of 10 μl onto EC-MUG agar plates. Up to 10 fluorescent wells were subcultured per soil sample. When the resulting colonies fluoresced blue under UV-A light, they were streaked for isolation onto EC-MUG agar plates. All incubations were performed without shaking at 37°C. Putative isolates were screened with a glutamate decarboxylase assay (54). Isolates that fluoresced under UV-A in the presence of MUG and were positive for glutamate decarboxylase were stored at −80°C in brain heart infusion (BHI) broth with 15% glycerol.
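The layout of the enrichment cultures can be checked with a short calculation. The sketch below only restates the quantities given above (four 20-ml aliquots with 2 g of soil each, and 96 wells of 180 μl per aliquot). The estimate of how much soil is effectively screened assumes that soil is evenly suspended in the medium, which is an idealization, and all names are hypothetical.

```python
SOIL_PER_ALIQUOT_G = 2.0       # grams of soil suspended in each aliquot
ALIQUOTS_PER_SAMPLE = 4        # 20-ml aliquots of EC-MUG medium per soil sample
ALIQUOT_VOLUME_UL = 20_000     # 20 ml expressed in microliters
WELLS_PER_ALIQUOT = 96         # microtiter wells filled from each aliquot
WELL_VOLUME_UL = 180           # microliters dispensed per well


def enrichment_layout() -> dict:
    """Summarize wells and soil mass per soil sample for the enrichment step."""
    total_wells = ALIQUOTS_PER_SAMPLE * WELLS_PER_ALIQUOT
    dispensed_ul = WELLS_PER_ALIQUOT * WELL_VOLUME_UL
    fraction_plated = dispensed_ul / ALIQUOT_VOLUME_UL
    soil_suspended_g = SOIL_PER_ALIQUOT_G * ALIQUOTS_PER_SAMPLE
    return {
        "total_wells": total_wells,
        "soil_suspended_g": soil_suspended_g,
        "dispensed_ul_per_aliquot": dispensed_ul,
        "fraction_of_aliquot_plated": fraction_plated,
        # Assumes soil is evenly distributed in the suspension.
        "soil_screened_g": soil_suspended_g * fraction_plated,
    }


for key, value in enrichment_layout().items():
    print(key, value)
```

Under these assumptions, the 384 wells account for the 8 g of soil retained per sample, of which roughly 86% of the suspension volume is actually dispensed into wells.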
Single-nucleotide polymorphism qPCR assays were used to putatively group isolates into E. coli phylogroup B1, B2, D, or E. These assays were based on SNPs that differentiate the phylogroups with at least 80% certainty based on genome sequence comparisons. The SNP assay probes and primers were developed using the Custom TaqMan Assay Design Tool with the sequence of E. coli K-12 MG1655 as the reference against which phylogroups (SNP variants) were delineated (Table 3). The five assays were conducted on crude genomic DNA preparations that were generated by incubation of single colonies of E. coli culture in 50 mM NaOH at 95°C for 15 min. Phylogroup D testing was conducted in 96-well plate format, and the remaining phylogroup typing assays were conducted in preprinted 384-well plates (Life Technologies, Carlsbad, CA). Each preprinted plate contained 96 wells each of the remaining four assays for phylogroups B1 (two assays), B2, and E. All assays were conducted using a reaction mixture containing 2× TaqMan genotyping PCR master mix, 40× prealiquoted custom assay probe and primer mix (final concentrations of 200 nM probe and 900 nM each primer), a 1:10 dilution of crude lysate, and sterile distilled water to a final volume of 5 μl. Although no control strains of a known phylogroup were included in the SNP assays, we did include reference genomes of every phylogroup in our core genome phylogeny (Table 4).
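The reagent volumes implied by the reaction mixture can be worked out from the stated concentration factors. The sketch below is an illustrative calculation under the assumption that an N× stock is diluted N-fold in the final 5-μl reaction. The text does not state how the remaining volume was divided between diluted lysate and water, so that split is left unresolved, and the function names are hypothetical.

```python
def reaction_volumes(final_volume_ul: float = 5.0,
                     master_mix_fold: float = 2.0,
                     assay_mix_fold: float = 40.0) -> dict:
    """Volumes of each stock needed so that it ends up at 1x in the reaction."""
    master_mix_ul = final_volume_ul / master_mix_fold
    assay_mix_ul = final_volume_ul / assay_mix_fold
    return {
        "master_mix_ul": master_mix_ul,
        "assay_mix_ul": assay_mix_ul,
        # Diluted lysate plus water; the split between them is not given in the text.
        "lysate_plus_water_ul": final_volume_ul - master_mix_ul - assay_mix_ul,
    }


def stock_concentration_nm(final_nm: float, fold: float = 40.0) -> float:
    """Concentration in the N-fold stock that yields final_nm in the reaction."""
    return final_nm * fold


print(reaction_volumes())
print("probe stock (nM):", stock_concentration_nm(200))
print("primer stock (nM):", stock_concentration_nm(900))
```

The sub-microliter assay-mix volume is consistent with the use of prealiquoted (preprinted) plates rather than manual pipetting of each well.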
TABLE 3.
Phylogroup | Gene locus tag^a | SNP position^b | Forward primer | Reverse primer | Reference probe (FAM)^c | Target probe (VIC) |
---|---|---|---|---|---|---|
B1 | b1756 | 156 | CCGTCTGGCTGTGGAAAATC | GCATGTCAATCCGTTGCTCATT | AAGAAAACTGTCCGGCCAG | CAAGAAAACTGTCCAGCCAG |
B1 | b3251 | 342 | GCCCGCGCGTTCTG | TGCGCGGCGTTCAAC | AACCGGCACACAAA | CCAACCGGTACACAAA |
B2 | b0722 | 60 | CCTCCGCATTAGGACGCAAT | GTAGAGCGTCAGGACGATAGC | CTCGTTCGCGCTACC | CCTCGTTCGTGCTACC |
D | b1288 | 504 | CGTTATGGGTCTGGCAAAAGC | GCGTTAACACGCACACCTT | TCTGGAAGCAAACGTG | TCTGGAAGCGAACGTG |
E | b0405 | 57 | ATGCGCGTTACCGATTTCTC | CAGCGACAGTAAACGACAGCTA | CGTTCAGGCATGGGAT | CGTTCAGGCATCGGAT |
^a Gene locus tag corresponds to E. coli K-12 MG1655.
^b SNP position indicates the SNP location in the nucleotide sequence of the indicated gene.
^c FAM, 6-carboxyfluorescein.
TABLE 4.
Accession no. | Strain | Phylogroup |
---|---|---|
CP000802 | E. coli HS | A |
CP000946 | E. coli C ATCC 8739 | A |
CP002729 | E. coli UMNK88 | A |
CP006636 | E. coli PCN061 | A |
CP007265 | E. coli ST540 | A |
CP007390 | E. coli ST540-A | A |
CP007391 | E. coli ST540-AN | A |
CP007594 | E. coli SEC470 | A |
CP009166 | E. coli 1303 | A |
CP010129 | E. coli C9 | A |
CP010137 | E. coli D2 | A |
CP010143 | E. coli D4 | A |
CP010152 | E. coli D9 | A |
CP010160 | E. coli H1 | A |
CP010163 | E. coli H2 | A |
CP014583 | E. coli CFSAN004176 | B1 |
CP014670 | E. coli CFSAN004177 | B1 |
CP014752 | E. coli PSUO103 | B1 |
CP015228 | E. coli 09-00049 | B1 |
CP015240 | E. coli 2011C-3911 | B1 |
CP015244 | E. coli RM7190 | B1 |
CP015912 | E. coli 210205630 | B1 |
CP015995 | E. coli S51 | B1 |
CP016034 | E. coli Co6114 | B1 |
CP016546 | E. coli O177:H21 | B1 |
CP016628 | E. coli FORC_041 | B1 |
CP016828 | E. coli FORC_043 | B1 |
CP018323 | E. coli 9 | B1 |
CP018840 | E. coli 64 | B1 |
CP018948 | E. coli Ecol_224 | B1 |
CP018965 | E. coli Ecol_517 | B1 |
CP019558 | E. coli 207 | B1 |
CP019560 | E. coli 1031 | B1 |
CP000468 | E. coli APEC O1 | B2 |
CP001855 | E. coli NRG 857C | B2 |
CP001969 | E. coli IHE3034 | B2 |
CP002167 | E. coli UM146 | B2 |
CP002211 | E. coli clone D i2 | B2 |
CP002212 | E. coli clone D i14 | B2 |
CP002797 | E. coli NA114 | B2 |
CP005930 | E. coli APEC IMT5155 | B2 |
CP006784 | E. coli JJ1886 | B2 |
CP006830 | E. coli APEC O18 | B2 |
CP007149 | E. coli RS218 | B2 |
CP007275 | E. coli NMEC O18 | B2 |
CP007799 | E. coli Nissle 1917 | B2 |
CP006632 | E. coli PCN033 | D |
CP007392 | E. coli ST2747 | D |
CP007393 | E. coli ST2747-A | D |
CP007394 | E. coli ST2747-AN | D |
CP010116 | E. coli C1 | D |
CP010121 | E. coli C4 | D |
CP018206 | E. coli MRSN346647 | D |
CP018770 | E. coli 2016C-3936C1 | D |
CP018957 | E. coli Ecol_316 | D |
CP018976 | E. coli Ecol_545 | D |
CP021202 | E. coli Z1002 | D |
CP022229 | E. coli WCHEC96200 | D |
CP023142 | E. coli CFSAN061770 | D |
CP023353 | E. coli 746 | D |
CP023364 | E. coli 144 | D |
CP023386 | E. coli 1190 | D |
CP023644 | E. coli M12 | D |
CP023960 | E. coli FDAARGOS_448 | D |
CP006027 | E. coli RM13514 | E |
CP006262 | E. coli RM13516 | E |
CP006736 | Shigella dysenteriae 1617 | E |
CP007133 | E. coli RM12761 | E |
CP007136 | E. coli RM12581 | E |
CP007592 | E. coli Santai | E |
CP008805 | E. coli SS17 | E |
CP008957 | E. coli EDL933 | E |
CP010304 | E. coli SS52 | E |
CP012802 | E. coli WS4202 | E |
CP014314 | E. coli JEONG-1266 | E |
CP015020 | E. coli 28RC1 | E |
CP015023 | E. coli SRCC 1675 | E |
CP015241 | E. coli 2013C-4465 | E |
CP015831 | E. coli 644-PT8 | E |
CP015832 | E. coli 180-PT54 | E |
CP015842 | E. coli FRIK2533 | E |
CP015843 | E. coli FRIK2455 | E |
CP015846 | E. coli FRIK2069 | E |
CP016625 | E. coli FRIK944 | E |
CP016755 | E. coli FORC_044 | E |
AP017617 | E. coli MRY15-117 | F1 |
AP017620 | E. coli MRY15-131 | F1 |
CP000970 | E. coli SMS-3-5 | F1 |
CP003034 | E. coli CE10 | F1 |
CP008697 | E. coli ST648 | F1 |
CP009859 | E. coli ECONIH1 | F1 |
CP015834 | E. coli MS6198 | F1 |
CP019029 | E. coli Ecol_881 | F1 |
CP022164 | E. coli M160133 | F1 |
CP023366 | E. coli 1428 | F1 |
CP023815 | E. coli IMT16316 | F1 |
CU928164 | E. coli IAI39 | F1 |
AP017610 | E. coli 20Ec-P-124 | F2 |
CP006834 | E. coli APEC O2 | F2 |
CP010157 | E. coli D10 | F2 |
CP012112 | E. coli PSUO78 | F2 |
CP013025 | E. coli 2009C-3133 | F2 |
CP019903 | E. coli MDR_56 | F2 |
CP015229 | E. coli 06-00048 | G |
AEJX00000000 | Escherichia sp. TW15838 | Cryptic lineage I |
AEJW00000000 | Escherichia sp. TW09231 | Cryptic lineage III |
AEJV00000000 | Escherichia sp. TW09276 | Cryptic lineage III |
AEJZ00000000 | Escherichia sp. TW14182 | Cryptic lineage IV |
AEME00000000 | Escherichia sp. TW09308 | Cryptic lineage V |
AEJU00000000 | Escherichia albertii TW08933 | E. albertii |
AEJY00000000 | Escherichia albertii TW15818 | E. albertii |
Whole-genome sequencing libraries were generated using Nextera XT library preparation kits (Illumina, San Diego, CA). Genomes were sequenced using a HiSeq 4000 platform with 96 genomes multiplexed per lane by the Macrogen Clinical Laboratory (Seoul, South Korea). Quality control and trimming of sequences were performed using FastQC version 0.11 and Trimmomatic version 0.32, respectively (55, 56). De novo assembly was performed using SPAdes version 3.10.1, and the resulting scaffold sequences were annotated with Prokka version 1.12 with the included Escherichia annotation database (57, 58). Core genes were identified via pangenome analysis using Roary version 3.8.2 (59). To assess SNP assay accuracy, Roary was run on 990 genome sequences and a collection of 104 publicly available E. coli genomes representing all phylogroups. The reference genomes were obtained from NCBI as nucleotide fasta files and reannotated with Prokka for the purposes of ensuring compatibility with Roary. A core genome phylogeny was generated using RAxML 8.2.11 with the GTRCAT model in order to confirm phylogroup membership as assigned via SNP assays (60).
Statistical analysis of distribution of E. coli among surface soil samples.
Statistical analyses were performed on the environmental distributions of all E. coli isolates and of the isolates from respective phylogroups. Because soil chemistry data were available for all soil samples from which E. coli strains were isolated, ANOVA was performed to test for differences between soil chemistry values associated with each phylogroup. When ANOVA P values were significant, contrasts were analyzed using Tukey's honest significant difference test. The distributions of all E. coli isolates and each phylogroup among land cover, vegetation density, vegetation type, and soil texture classes were assessed using Pearson's chi-square test.
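For readers less familiar with the test, Pearson's chi-square statistic for a contingency table of isolate counts (for example, phylogroup by land cover class) can be computed as in the sketch below. This is a minimal illustration in plain Python and not the software used in the study. The counts in the example are hypothetical, and converting the statistic to a P value requires the chi-square distribution, which is available in standard statistical packages (for instance `scipy.stats.chi2_contingency`).

```python
from typing import List, Tuple


def pearson_chi_square(table: List[List[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an r x c table.

    Expected counts are row_total * column_total / grand_total. Every row and
    column must contain at least one observation, or an expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical isolate counts: rows are two phylogroups, columns are two land covers.
example_counts = [[10, 20],
                  [20, 10]]
print(pearson_chi_square(example_counts))
```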
Environmental modeling of E. coli prevalence.
Environmental modeling of E. coli prevalence was conducted to determine if remotely sensed basemap data available from public databases could be useful to predict the presence of E. coli in surface soils. Random forest ensemble analysis was used similarly to previously published methods (24, 61). Random forest analysis has been a useful tool for the analysis of habitat preferences of species, including microbes. Random forest and classification tree methods are desirable in this scenario, because the tree building method tests for variables that can be used to partition samples containing E. coli from those that do not. The methods make minimal assumptions about response variable distributions, except that they can be successfully divided into discrete classes using the predictor variables as rules. However, heavily unbalanced sampling of positive and negative samples can affect the performance of the analysis algorithm (62). Briefly, Geographic Information Systems (GIS) data were obtained from the National Land Cover Database (63), the USDA SSURGO database of predicted soil properties via the Web Soil Survey (https://websoilsurvey.sc.egov.usda.gov/ [accessed 15 April 2015]), and the National Hydrography Dataset of hydrologic features (53). The data obtained included land cover type; proximity to nearest forest, impervious surface, hydrological features, and cropland; percent land cover at a 250-m scale for forest, impervious surface, hydrological features, and cropland; and predicted pH, surface soil texture, drainage class, depth to water table, and available water storage in the depth range from 0 to 25 cm. Random forest analysis was conducted by recursively constructing overfit classification trees with E. coli presence/absence binary data as the response variable and GIS basemap data as the predictors using the R package randomForest in R 3.3.1. Two thousand trees were constructed from the data, and each tree was constructed using a random selection of five predictor variables and two-thirds of available soil samples as the in-bag samples. The importance of each predictor variable for classifying E. coli-positive samples was assessed by measuring the loss of prediction accuracy when the values of that predictor were scrambled relative to soil samples. Unbalanced sampling of classes can affect classification accuracy in randomForest; here, positives were slightly undersampled compared to negatives, with positive soil samples making up 41% of the overall sample set. Attempts to adjust the weight given to positive and negative samples did not significantly change prediction accuracy in the randomForest analysis.
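The idea of measuring predictor importance by scrambling can be illustrated with a small, self-contained example. The sketch below is not the R randomForest analysis used in the study: it applies the same permutation logic to a single fixed classification rule rather than to out-of-bag predictions from an ensemble of trees. The rule (forest cover of at least 7% predicts E. coli presence) is a stand-in loosely inspired by the forest cover threshold discussed in this article, and the data, column names, and function names are all hypothetical.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, float]


def accuracy(predict: Callable[[Row], int], rows: List[Row], labels: List[int]) -> float:
    """Fraction of samples whose predicted class matches the observed class."""
    return sum(predict(row) == label for row, label in zip(rows, labels)) / len(rows)


def permutation_importance(predict: Callable[[Row], int], rows: List[Row],
                           labels: List[int], column: str,
                           n_repeats: int = 20, seed: int = 0) -> float:
    """Mean loss of accuracy when one predictor is shuffled across samples."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    losses = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        permuted = [dict(row, **{column: value}) for row, value in zip(rows, shuffled)]
        losses.append(baseline - accuracy(predict, permuted, labels))
    return sum(losses) / n_repeats


def toy_rule(row: Row) -> int:
    """Stand-in classifier: predict E. coli presence when forest cover >= 7%."""
    return int(row["forest_pct"] >= 7.0)


# Hypothetical samples: (percent forest cover within 250 m, distance to water in m).
values = [(1, 40), (2, 300), (3, 120), (4, 15), (5, 220),
          (10, 60), (20, 310), (30, 90), (40, 10), (50, 180)]
rows = [{"forest_pct": f, "dist_to_water_m": d} for f, d in values]
labels = [0] * 5 + [1] * 5  # 1 = E. coli isolated, 0 = not isolated

for name in ("forest_pct", "dist_to_water_m"):
    print(name, permutation_importance(toy_rule, rows, labels, name))
```

In this toy case the rule never consults distance to water, so scrambling that column cannot change any prediction and its importance is zero, whereas scrambling forest cover degrades accuracy.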
To produce a decision tree to support future sampling of E. coli from surface soils, a conditional inference classification tree (using the same formula as the randomForest) was generated that not only reflected the most important rules in the random forest but also extracted lower-order rules in the hierarchy of samples. The significance of split rules was evaluated using a quadratic chi-square-like multivariate test statistic as implemented in R/party in R 3.3.1 (64). A Bonferroni adjustment was applied to P values calculated at each split to account for nonindependent tests across the classification tree.
Accession number(s).
An MS Access database of samples was deposited into the georeferenced Knowledge Network for Biocomplexity with the data set identifier “knb.92105.1” (https://knb.ecoinformatics.org). Genome sequences are associated with BioProject no. PRJNA416911, where links to all Biosample entries and corresponding SRA archives can be found.
ACKNOWLEDGMENTS
David Lacher and Mark Mammel developed the SNP set used in our assays at the U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition in Laurel, MD. We thank 77 private landowners without whose permission we could not have completed this study. The North Dakota State Agricultural Experiment Station in Fargo granted permission for sampling in two locations. We similarly thank the Minnesota Department of Natural Resources and The Nature Conservancy for permission to sample nine sites at the Buffalo River State Park and Natural Area. We thank Oleksandr Maistrenko (NDSU), Steffan Stroh (NDSU), Elliot Welker (NDSU), Osama Mahdi (NDSU), Birgit Prüß (NDSU), Sara Anderson (MN State University-Moorhead [MSUM]), Vincent Anani (MSUM), and Nancy Castro-Borjas (MSUM) for assistance with soil sample collection.
This work was funded by National Science Foundation CAREER award no. DEB-1453397 to P.W.B. and Hatch Act Federal Formula Funds project no. ND02428 in association with the North Dakota Agricultural Experiment Station.
Footnotes
Supplemental material for this article may be found at https://doi.org/10.1128/AEM.02714-17.
REFERENCES
- 1. Jang J, Hur HG, Sadowsky MJ, Byappanahalli MN, Yan T, Ishii S. 2017. Environmental Escherichia coli: ecology and public health implications–a review. J Appl Microbiol 123:570–581. doi: 10.1111/jam.13468.
- 2. Tenaillon O, Skurnik D, Picard B, Denamur E. 2010. The population genetics of commensal Escherichia coli. Nat Rev Microbiol 8:207–217. doi: 10.1038/nrmicro2298.
- 3. Brennan FP, Abram F, Chinalia FA, Richards KG, O'Flaherty V. 2010. Characterization of environmentally persistent Escherichia coli isolates leached from an Irish soil. Appl Environ Microbiol 76:2175–2180. doi: 10.1128/AEM.01944-09.
- 4. Somorin Y, Abram F, Brennan F, O'Byrne C. 2016. The general stress response is conserved in long-term soil-persistent strains of Escherichia coli. Appl Environ Microbiol 82:4628–4640. doi: 10.1128/AEM.01175-16.
- 5. Ishii S, Ksoll WB, Hicks RE, Sadowsky MJ. 2006. Presence and growth of naturalized Escherichia coli in temperate soils from Lake Superior watersheds. Appl Environ Microbiol 72:612–621. doi: 10.1128/AEM.72.1.612-621.2006.
- 6. Luo C, Walk ST, Gordon DM, Feldgarden M, Tiedje JM, Konstantinidis KT. 2011. Genome sequencing of environmental Escherichia coli expands understanding of the ecology and speciation of the model bacterial species. Proc Natl Acad Sci U S A 108:7200–7205. doi: 10.1073/pnas.1015622108.
- 7. Blount ZD. 2015. The unexhausted potential of E. coli. Elife 4:e05826. doi: 10.7554/eLife.05826.
- 8. Santo Domingo JW, Bambic DG, Edge TA, Wuertz S. 2007. Quo vadis source tracking? Towards a strategic framework for environmental monitoring of fecal pollution. Water Res 41:3539–3552. doi: 10.1016/j.watres.2007.06.001.
- 9. Wanjugi P, Fox GA, Harwood VJ. 2016. The interplay between predation, competition, and nutrient levels influences the survival of Escherichia coli in aquatic environments. Microb Ecol 72:526–537. doi: 10.1007/s00248-016-0825-6.
- 10. Savageau MA. 1983. Escherichia coli habitats, cell-types, and molecular mechanisms of gene-control. Am Nat 122:732–744. doi: 10.1086/284168.
- 11. Bergholz PW, Noar JD, Buckley DH. 2011. Environmental patterns are imposed on the population structure of Escherichia coli after fecal deposition. Appl Environ Microbiol 77:211–219. doi: 10.1128/AEM.01880-10.
- 12. Méric G, Kemsley EK, Falush D, Saggers EJ, Lucchini S. 2013. Phylogenetic distribution of traits associated with plant colonization in Escherichia coli. Environ Microbiol 15:487–501. doi: 10.1111/j.1462-2920.2012.02852.x.
- 13. Byappanahalli MN, Whitman RL, Shively DA, Sadowsky MJ, Ishii S. 2006. Population structure, persistence, and seasonality of autochthonous Escherichia coli in temperate, coastal forest soil from a Great Lakes watershed. Environ Microbiol 8:504–513. doi: 10.1111/j.1462-2920.2005.00916.x.
- 14. Byappanahalli MN, Whitman RL, Shively DA, Ferguson J, Ishii S, Sadowsky MJ. 2007. Population structure of Cladophora-borne Escherichia coli in nearshore water of Lake Michigan. Water Res 41:3649–3654. doi: 10.1016/j.watres.2007.03.009.
- 15. Fremaux B, Prigent-Combaret C, Vernozy-Rozand C. 2008. Long-term survival of Shiga toxin-producing Escherichia coli in cattle effluents and environment: an updated review. Vet Microbiol 132:1–18. doi: 10.1016/j.vetmic.2008.05.015.
- 16. Brennan FP, Grant J, Botting CH, O'Flaherty V, Richards KG, Abram F. 2013. Insights into the low-temperature adaptation and nutritional flexibility of a soil-persistent Escherichia coli. FEMS Microbiol Ecol 84:75–85. doi: 10.1111/1574-6941.12038.
- 17. Bleibtreu A, Clermont O, Darlu P, Glodt J, Branger C, Picard B, Denamur E. 2014. The rpoS gene is predominantly inactivated during laboratory storage and undergoes source-sink evolution in Escherichia coli species. J Bacteriol 196:4276–4284. doi: 10.1128/JB.01972-14.
- 18. Carlson SM, Cunningham CJ, Westley PA. 2014. Evolutionary rescue in a changing world. Trends Ecol Evol 29:521–530. doi: 10.1016/j.tree.2014.06.005.
- 19. Jacobson AR, Dousset S, Andreux F, Baveye PC. 2007. Electron microprobe and synchrotron X-ray fluorescence mapping of the heterogeneous distribution of copper in high-copper vineyard soils. Environ Sci Technol 41:6343–6349. doi: 10.1021/es070707m.
- 20. Islam M, Doyle MP, Phatak SC, Millner P, Jiang X. 2004. Persistence of enterohemorrhagic Escherichia coli O157:H7 in soil and on leaf lettuce and parsley grown in fields treated with contaminated manure composts or irrigation water. J Food Prot 67:1365–1370. doi: 10.4315/0362-028X-67.7.1365.
- 21. Islam M, Morgan J, Doyle MP, Jiang X. 2004. Fate of Escherichia coli O157:H7 in manure compost-amended soil and on carrots and onions grown in an environmentally controlled growth chamber. J Food Prot 67:574–578. doi: 10.4315/0362-028X-67.3.574.
- 22. Semenov AV, Franz E, van Overbeek L, Termorshuizen AJ, van Bruggen AH. 2008. Estimating the stability of Escherichia coli O157:H7 survival in manure-amended soils with different management histories. Environ Microbiol 10:1450–1459. doi: 10.1111/j.1462-2920.2007.01558.x.
- 23. NandaKafle G, Seale T, Flint T, Nepal M, Venter SN, Brozel VS. 2017. Distribution of diverse Escherichia coli between cattle and pasture. Microbes Environ 32:226–233. doi: 10.1264/jsme2.ME17030.
- 24. Bergholz PW, Strawn LK, Ryan GT, Warchocki S, Wiedmann M. 2016. Spatiotemporal analysis of microbiological contamination in New York State produce fields following extensive flooding from Hurricane Irene, August 2011. J Food Prot 79:384–391. doi: 10.4315/0362-028X.JFP-15-334.
- 25. Guber AK, Shelton DR, Pachepsky YA. 2005. Effect of manure on Escherichia coli attachment to soil. J Environ Qual 34:2086–2090. doi: 10.2134/jeq2005.0039.
- 26. Stocker MD, Pachepsky YA, Hill RL, Shelton DR. 2015. Depth-dependent survival of Escherichia coli and enterococci in soil after manure application and simulated rainfall. Appl Environ Microbiol 81:4801–4808. doi: 10.1128/AEM.00705-15.
- 27. Kiefer LA, Shelton DR, Pachepsky Y, Blaustein R, Santin-Duran M. 2012. Persistence of Escherichia coli introduced into streambed sediments with goose, deer and bovine animal waste. Lett Appl Microbiol 55:345–353. doi: 10.1111/j.1472-765X.2012.03296.x.
- 28. Blaustein RA, Pachepsky Y, Hill RL, Shelton DR, Whelan G. 2013. Escherichia coli survival in waters: temperature dependence. Water Res 47:569–578. doi: 10.1016/j.watres.2012.10.027.
- 29. Tymensen LD, Pyrdok F, Coles D, Koning W, McAllister TA, Jokinen CC, Dowd SE, Neumann NF. 2015. Comparative accessory gene fingerprinting of surface water Escherichia coli reveals genetically diverse naturalized population. J Appl Microbiol 119:263–277. doi: 10.1111/jam.12814.
- 30. Berthe T, Ratajczak M, Clermont O, Denamur E, Petit F. 2013. Evidence for coexistence of distinct Escherichia coli populations in various aquatic environments and their survival in estuary water. Appl Environ Microbiol 79:4684–4693. doi: 10.1128/AEM.00698-13.
- 31. Fierer N, Jackson RB. 2006. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A 103:626–631. doi: 10.1073/pnas.0507535103.
- 32. Fierer N. 2017. Embracing the unknown: disentangling the complexities of the soil microbiome. Nat Rev Microbiol 15:579–590. doi: 10.1038/nrmicro.2017.87.
- 33. Walk ST, Alm EW, Calhoun LM, Mladonicky JM, Whittam TS. 2007. Genetic diversity and population structure of Escherichia coli isolated from freshwater beaches. Environ Microbiol 9:2274–2288. doi: 10.1111/j.1462-2920.2007.01341.x.
- 34. Ihssen J, Grasselli E, Bassin C, Francois P, Piffaretti JC, Koster W, Schrenzel J, Egli T. 2007. Comparative genomic hybridization and physiological characterization of environmental isolates indicate that significant (eco-)physiological properties are highly conserved in the species Escherichia coli. Microbiology 153:2052–2066. doi: 10.1099/mic.0.2006/002006-0.
- 35. van Elsas JD, Chiurazzi M, Mallon CA, Elhottova D, Kristufek V, Salles JF. 2012. Microbial diversity determines the invasion of soil by a bacterial pathogen. Proc Natl Acad Sci U S A 109:1159–1164. doi: 10.1073/pnas.1109326109.
- 36. van Elsas JD, Hill P, Chronakova A, Grekova M, Topalova Y, Elhottova D, Kristufek V. 2007. Survival of genetically marked Escherichia coli O157:H7 in soil as affected by soil microbial community shifts. ISME J 1:204–214. doi: 10.1038/ismej.2007.21.
- 37. Tymensen L, Booker CW, Hannon SJ, Cook SR, Zaheer R, Read R, McAllister TA. 2017. Environmental growth of enterococci and Escherichia coli in feedlot catch basins and a constructed wetland in the absence of fecal input. Environ Sci Technol 51:5386–5395. doi: 10.1021/acs.est.6b06274.
- 38.Manel S, Albert CH, Yoccoz NG. 2012. Sampling in landscape genomics, p 1–12. In Pompanon F, Bonin A (ed), Data production and analysis in population genomics: methods and protocols. Humana Press, New York, NY. [Google Scholar]
- 39.Fountain-Jones NM, Craft ME, Funk WC, Kozakiewicz C, Trumbo DR, Boydston EE, Lyren LM, Crooks K, Lee JS, VandeWoude S, Carver S. 2017. Urban landscapes can change virus gene flow and evolution in a fragmentation-sensitive carnivore. Mol Ecol 26:6487–6498. doi: 10.1111/mec.14375. [DOI] [PubMed] [Google Scholar]
- 40.Texier S, Prigent-Combaret C, Gourdon MH, Poirier MA, Faivre P, Dorioz JM, Poulenard J, Jocteur-Monrozier L, Moenne-Loccoz Y, Trevisan D. 2008. Persistence of culturable Escherichia coli fecal contaminants in dairy alpine grassland soils. J Environ Qual 37:2299–2310. doi: 10.2134/jeq2008.0028. [DOI] [PubMed] [Google Scholar]
- 41.Falbo K, Schneider RL, Buckley DH, Walter MT, Bergholz PW, Buchanan BP. 2013. Roadside ditches as conduits of fecal indicator organisms and sediment: implications for water quality management. J Environ Manage 128:1050–1059. doi: 10.1016/j.jenvman.2013.05.021. [DOI] [PubMed] [Google Scholar]
- 42.Park S, Navratil S, Gregory A, Bauer A, Srinath I, Szonyi B, Nightingale K, Anciso J, Jun M, Han D, Lawhon S, Ivanek R. 2015. Multifactorial effects of ambient temperature, precipitation, farm management, and environmental factors determine the level of generic Escherichia coli contamination on preharvested spinach. Appl Environ Microbiol 81:2635–2650. doi: 10.1128/AEM.03793-14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Park S, Navratil S, Gregory A, Bauer A, Srinath I, Jun M, Szonyi B, Nightingale K, Anciso J, Ivanek R. 2013. Generic Escherichia coli contamination of spinach at the preharvest stage: effects of farm management and environmental factors. Appl Environ Microbiol 79:4347–4358. doi: 10.1128/AEM.00474-13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Geldreich EE, Huff CB, Bordner RH, Kabler PW, Clark HF. 1962. Faecal coli-aerogenes flora of soils from various geographical areas. J Appl Bacteriol 25:87–93. doi: 10.1111/j.1365-2672.1962.tb01123.x. [DOI] [Google Scholar]
- 45.Faust MA. 1982. Relationship between land-use practices and fecal bacteria in soils. J Environ Qual 11:141–146. doi: 10.2134/jeq1982.00472425001100010031x. [DOI] [Google Scholar]
- 46.Weller D, Shiwakoti S, Bergholz P, Grohn Y, Wiedmann M, Strawn LK. 2016. Validation of a previously developed geospatial model that predicts the prevalence of Listeria monocytogenes in New York State produce fields. Appl Environ Microbiol 82:797–807. doi: 10.1128/AEM.03088-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Blyton MD, Gordon DM. 2017. Genetic attributes of E. coli isolates from chlorinated drinking water. PLoS One 12:e0169445. doi: 10.1371/journal.pone.0169445. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Salipante SJ, Roach DJ, Kitzman JO, Snyder MW, Stackhouse B, Butler-Wu SM, Lee C, Cookson BT, Shendure J. 2015. Large-scale genomic sequencing of extraintestinal pathogenic Escherichia coli strains. Genome Res 25:119–128. doi: 10.1101/gr.180190.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Turrientes MC, Gonzalez-Alba JM, del Campo R, Baquero MR, Canton R, Baquero F, Galan JC. 2014. Recombination blurs phylogenetic groups routine assignment in Escherichia coli: setting the record straight. PLoS One 9:e105395. doi: 10.1371/journal.pone.0105395. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Ramsayer J, Kaltz O, Hochberg ME. 2013. Evolutionary rescue in populations of Pseudomonas fluorescens across an antibiotic gradient. Evol Appl 6:608–616. doi: 10.1111/eva.12046. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Buja K, Menza C. 2016. NCCOS sampling design tool for ArcGIS, ESRI ArcGIS Add-in, ArcGIS ver. 10.0 service pack 3 or higher. NOAA/NOS National Centers for Coastal Ocean Science, Silver Spring, MD. [Google Scholar]
- 52.Reference deleted.
- 53.U.S. Geological Survey. 2017. National Hydrography Dataset (NHD) WMS from the National Map. U.S. Geological Survey, Reston, VA: https://viewer.nationalmap.gov/services/. [Google Scholar]
- 54.Rice EW, Johnson CH, Dunnigan ME, Reasoner DJ. 1995. Rapid glutamate decarboxylase assay for detection of Escherichia coli. Appl Environ Microbiol 61:847. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Andrews S. 2010. FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
- 57.Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. doi: 10.1089/cmb.2012.0021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069. doi: 10.1093/bioinformatics/btu153. [DOI] [PubMed] [Google Scholar]
- 59.Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MT, Fookes M, Falush D, Keane JA, Parkhill J. 2015. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31:3691–3693. doi: 10.1093/bioinformatics/btv421. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313. doi: 10.1093/bioinformatics/btu033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Chapin TK, Nightingale KK, Worobo RW, Wiedmann M, Strawn LK. 2014. Geographical and meteorological factors associated with isolation of Listeria species in New York State produce production and natural environments. J Food Prot 77:1919–1928. doi: 10.4315/0362-028X.JFP-14-132. [DOI] [PubMed] [Google Scholar]
- 62.Cutler DR, Edwards TC Jr, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88:2783–2792. doi: 10.1890/07-0539.1. [DOI] [PubMed] [Google Scholar]
- 63.Homer C, Dewitz J, Yang LM, Jin S, Danielson P, Xian G, Coulston J, Herold N, Wickham J, Megown K. 2015. Completion of the 2011 National Land Cover Database for the conterminous United States—representing a decade of land cover change information. Photogramm Eng Remote Sensing 81:345–354. [Google Scholar]
- 64.Hothorn T, Hornik K, Zeileis A. 2006. Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 15:651–674. doi: 10.1198/106186006X133933. [DOI] [Google Scholar]
Associated Data