Abstract
Human fecal pollution of recreational waters remains a public health concern worldwide. As a result, there is a growing interest in the application of human-associated fecal source identification quantitative real-time PCR (qPCR) technologies for water quality research and management. However, there are currently no standardized approaches for field implementation and interpretation of qPCR data. In this study, a standardized HF183/BacR287 qPCR method was combined with a water sampling strategy and a novel Bayesian weighted average approach to establish a human fecal contamination score (HFS) that can be used to prioritize sampling sites for remediation based on measured human waste levels. The HFS was then used to investigate 975 study design scenarios utilizing different combinations of sites with varying sampling intensities (daily to once per week) and number of qPCR replicates per sample (2–14 replicates). Findings demonstrate that site prioritization with HFS is feasible and that both sampling intensity and number of qPCR replicates influence reliability of HFS estimates. The novel data analysis strategy presented here provides a prescribed approach for the implementation and interpretation of human-associated HF183/BacR287 qPCR data with the goal of site prioritization based on human fecal pollution levels. In addition, information is provided for future users to customize study designs for optimal HFS performance.
Keywords: Microbial source tracking, Site prioritization, Human fecal pollution, qPCR
1. Introduction
Many environmental waters are routinely impaired based on general fecal indicator water quality standards such as E. coli or enterococci. Fecal pollution can originate from many sources due to a combination of wildlife, agricultural, natural, and human activities. However, general fecal indicator methods used for routine water quality monitoring do not discriminate between pollution sources, making it difficult to manage sites impacted by more than one source. As a result, many water quality managers are using fecal source identification technologies to complement general fecal indicator approaches. These host-associated methods are deliberately designed to characterize levels of fecal pollution in water samples from a specific animal group. Technologies that target human fecal pollution are of particular interest because exposure to human waste may represent a higher public health risk compared to exposure from most other animal feces such as gull, chicken, and swine (Soller et al., 2010).
There are many available human fecal source identification methods ranging from canine scent detection (Murray, 2011) to bacterial community approaches (Cao et al., 2013a, Cao et al., 2013b, Dubinsky et al., 2012, Fisher et al., 2015, Unno et al., 2010). However, only the HF183 quantitative real-time PCR (qPCR) (Bernhard and Field, 2000, Green et al., 2014, Haugland et al., 2010) method has been a consistent top performing technology across multiple validation study efforts (Boehm et al., 2013, Layton et al., 2013, Shanks et al., 2010), is a recommended water quality approach for use in the State of California (Griffith et al., 2013), and is under consideration by the United States Environmental Protection Agency for the development of a national standardized procedure (Shanks et al., 2016). Even though there is a growing precedent for the use of this technology, there are currently no standardized procedures for water quality implementation or interpretation of HF183 qPCR data. The lack of standardized approaches is, in part, due to the broad range of potential water quality applications possible with a human-associated fecal indicator, ranging from verification of sanitary survey findings (USEPA, 2012) to an indicator of public health risk (Boehm et al., 2015). The lack of a standardized procedure, in turn, prevents evaluation of sampling and laboratory study design choices, making implementation challenging.
Standardization of qPCR interpretation procedures is further complicated by differences in opinion among experts on key aspects of data analysis and experiment design, such as defining the lower limit of quantification (LLOQ), inclusion of data below the LLOQ, data acceptance metrics, and replicate sampling requirements (Stewart et al., 2013). As a first step towards the standardization of HF183 qPCR data interpretation, ten water quality experts participated in a formal Delphi exercise to identify and reach consensus regarding the use of data below the LLOQ, among other factors for prioritizing recreational sites based on human fecal pollution levels (Cao et al., 2013c). Results of this effort indicated that participating water quality experts unanimously agreed that the ideal HF183 qPCR data analysis approach should utilize all data, including both measurements within the range of quantification (ROQ) and any results below the LLOQ, including non-detections. This notion differs from the traditional qPCR absolute quantification strategy, where only samples yielding data within the ROQ are used to estimate the DNA target concentration in an unknown sample (LifeTechnologies, 2014).
This study seeks to introduce a novel metric to estimate the level of human fecal contamination across a series of sampling locations with the purpose of prioritizing sites for remediation. This metric uses the standardized HF183/BacR287 qPCR method combined with a prescribed water sampling strategy and novel Bayesian weighted average approach to establish a human fecal score (HFS) for each sampling site. In essence, HFS is an estimate of the level of human fecal contamination at a given site based on the average concentration of the HF183 gene in water samples collected over a defined period of time. Unlike traditional qPCR quantification strategies, HFS mathematically incorporates all data from water sample tests regardless of whether qPCR measurements are below or above the calibration model LLOQ. With this metric, 975 study design choices including sample intensity (number of samples tested over a fixed duration of time) and level of qPCR replication were evaluated to identify optimal conditions. Findings demonstrate the utility of HFS for site prioritization and provide key information necessary for future users to optimize study designs based on local field and laboratory capacities.
2. Materials and methods
2.1. HFS definition
HFS (copies per 100 mL) is defined as a weighted average utilizing all HF183/BacR287 qPCR measurements from a series of samples collected at a site over a designated period of time. All sample Cq (quantification cycle) values, regardless of whether they fall below the LLOQ threshold value, are used to estimate the HFS. The LLOQ threshold (35.03 Cq) is defined as the upper bound of the 95% credible interval corresponding to the HF183/BacR287 master calibration model at 10 copies per reaction. Prior to calculating HFS, the mean Cq for each sample (no amplification was set to 40 Cq) was classified into a ROQ group (if mean Cq < LLOQ) or MPN group (if mean Cq > LLOQ). After classification of each sample into either the ROQ or MPN groups, HFS is calculated as follows:
Let the number of samples in the ROQ group be $r$, and the number of replicates per sample be $n_0$. Suppose the average Cq and the standard deviation of the $i$th sample are $\overline{Cq}_i$ and $s_i$, $i = 1, 2, \ldots, r$. Then the standard deviation $s$ of the overall mean $\overline{Cq}$ of all $\overline{Cq}_i$'s is $s = \frac{1}{r}\sqrt{\sum_{i=1}^{r} s_i^2 / n_0}$. A normal distribution with mean $\overline{Cq}$ and variance $s^2$ is assumed for the true $Cq_0$ (i.e., $Cq_0 \sim N(\overline{Cq}, s^2)$). The posterior distribution of
$$\log_{10} C_1 = \frac{Cq_0 - \alpha}{\beta} \tag{1}$$
is used to estimate the mean concentration $C_1$ (in log10 base), where $\alpha$ and $\beta$ are the intercept and slope parameters, respectively, of the master calibration curve.
Out of the remaining m samples with n0 replicates per sample, which are in the MPN group, let the total number of positives be n1 and N = n0·m. The following Bayesian model is used to estimate the concentration C2 in the MPN range (Sivaganesan et al., 2011):
(2) |
Note that the above approach provides an estimate for log10 C2, even if n1 = 0 or N.
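The HFS (in log10 base) is defined as the weighted average of $\log_{10} C_1$ and $\log_{10} C_2$ and is given by:
$$\log_{10} \mathrm{HFS} = W \cdot \log_{10} C_1 + (1 - W) \cdot \log_{10} C_2 \tag{3}$$
where $W = r/(r + m)$. Please refer to the supplemental material for the WinBUGS code for HFS.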
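To make the weighting concrete, the following is a minimal point-estimate sketch of the classification and weighted-average steps, not the Bayesian WinBUGS implementation used in this study. The calibration parameters and LLOQ threshold are the values reported in the text. The function name `hfs_log10_point_estimate` is hypothetical, and the MPN-group estimate is an assumed stand-in (a smoothed Poisson presence/absence estimate) rather than the Bayesian model of Sivaganesan et al. (2011). Results are in log10 copies per reaction; conversion to copies per 100 mL is omitted.

```python
import math
from statistics import mean

# Values quoted in the text.
INTERCEPT = 37.3   # alpha of the master calibration curve
SLOPE = -3.56      # beta of the master calibration curve
LLOQ_CQ = 35.03    # Cq threshold between ROQ and MPN groups
NO_AMP_CQ = 40.0   # Cq assigned when no amplification occurs


def hfs_log10_point_estimate(samples):
    """Point estimate of log10 HFS in copies per reaction (illustrative only).

    samples: list of samples, each a list of replicate Cq values.
    """
    if not samples:
        raise ValueError("at least one sample is required")

    # Classify each sample by its mean Cq (ties sent to MPN: an assumption).
    roq = [s for s in samples if mean(s) < LLOQ_CQ]
    mpn = [s for s in samples if mean(s) >= LLOQ_CQ]
    r, m = len(roq), len(mpn)

    # ROQ group: convert the overall mean Cq with the calibration curve.
    log_c1 = 0.0
    if r:
        overall_mean_cq = mean(mean(s) for s in roq)
        log_c1 = (overall_mean_cq - INTERCEPT) / SLOPE

    # MPN group: ASSUMED stand-in for the Bayesian model. The proportion of
    # positive reactions is smoothed so that n1 = 0 or n1 = N stays finite,
    # then converted with the Poisson relation p = 1 - exp(-C2).
    log_c2 = 0.0
    if m:
        n_total = sum(len(s) for s in mpn)
        n_pos = sum(1 for s in mpn for cq in s if cq < NO_AMP_CQ)
        p_hat = (n_pos + 0.5) / (n_total + 1.0)
        log_c2 = math.log10(-math.log(1.0 - p_hat))

    # Weighted average with W = r / (r + m).
    w = r / (r + m)
    return w * log_c1 + (1.0 - w) * log_c2
```

With every sample in the ROQ group, W = 1 and the estimate reduces to the calibration-curve conversion; each additional MPN-group sample shifts weight toward the presence/absence estimate.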
2.2. Field sites and surface water sampling
A total of 42 surface water samples were collected from three Southern California coastal sites including Escondido (Esco; coordinates, 34.037745, −118.582138), Marie Canyon (Mcyn; coordinates, 34.03055, −118.71), and Topanga (Topa; coordinates, 34.02551, −118.765). Samples were collected on the same day in the morning (before 10 a.m.) on a weekly basis from June 26 to September 24, 2013. These sites were selected based on historical fecal indicator data suggesting the presence of fecal pollution at different levels (Topa < Mcyn < Esco; data not shown). Water samples were collected in acid-washed (10% HCl) 1-L containers from surface water and were immediately transported on ice to the Southern California Coastal Water Research Project (SCCWRP) laboratory for filtration (<6 h holding time). For each water sample, triplicate filtrations were prepared by passing 100 mL of water through a 0.45 μm pore size 47 mm diameter GE Osmonics™ polycarbonate filter (Thermo Fisher Scientific, Grand Island, NY) for each replicate. The same volume of phosphate saline buffer was used in place of sample water for filtration blanks. Filters were placed in sterile 2-mL screw cap tubes containing a silica bead mill matrix (GeneRite, North Brunswick, NJ), flash frozen in liquid nitrogen, and stored at −80 °C (<8 months) prior to overnight express shipping on dry ice to the U.S. EPA National Risk Management Research Laboratory (Cincinnati, OH).
2.3. DNA extractions
DNA extractions were performed with the DNA-EZ RW02 kit (GeneRite LLC, North Brunswick, NJ) according to manufacturer’s instructions as previously described (Kelty et al., 2012). For all filters including extraction blanks, 600 μL of 0.2 μg/mL salmon testes DNA diluted in AE buffer (Qiagen, Valencia, CA) was spiked into each bead milling tube prior to extraction (Haugland et al., 2010). Two extraction blanks were performed for each batch preparation. DNA extracts were stored at 4 °C in GeneMate Slick low-adhesion microcentrifuge tubes (ISC BioExpress, Kaysville, UT) until time of qPCR amplification (<24 h storage time).
2.4. Preparation of reference DNA materials
Reference DNA sources included two plasmid constructs (Integrated DNA Technologies, Coralville, IA) and salmon testes DNA (Sigma-Aldrich, St. Louis, MO). The plasmid constructs for calibration standards and the internal amplification control (IAC) contained target sequences for HF183/BacR287 and were prepared for qPCR testing as previously described (Green et al., 2014). For the calibration standard, the concentration was determined by droplet digital PCR (1.02 ± 0.19 × 10⁷ copies/2 μL) as described elsewhere (Cao et al., 2015, Cao et al., 2016) and diluted in 10 mM Tris and 0.1 mM EDTA (pH 8.0) to generate 10, 10², 10³, 10⁴, and 10⁵ copies/2 μL. The initial concentration of the IAC plasmid preparation was determined with a Quant-iT PicoGreen dsDNA Assay Kit (Thermo Fisher Scientific, Grand Island, NY) on a SpectraMax Paradigm Multi-Mode Microplate Detection Platform (Molecular Devices, Sunnyvale, CA) and diluted in 10 mM Tris and 0.1 mM EDTA (pH 8.0) to generate a 10² copies/2 μL stock. Salmon DNA working stocks containing 10 μg/mL were prepared by dilution of a commercially available 10 mg/mL solution. All reference DNA material preparations were stored in GeneMate Slick low-adhesion microcentrifuge tubes (ISC BioExpress, Kaysville, UT) at −80 °C (<30 days) prior to laboratory testing.
2.5. qPCR amplification
Multiplex reaction mixtures for HF183/BacR287 contained 1X TaqMan© Environmental Master Mix (Version 2.0), 0.2 mg/mL bovine serum albumin (Sigma-Aldrich, St. Louis, MO), 1 μM each primer, 80 nM 6-carboxyfluorescein (FAM)-labeled probe, and 80 nM VIC-labeled probe [internal amplification control (IAC)]. Multiplex reaction mixtures contained 10² copies of IAC template combined with either PCR grade water, 10 to 1 × 10⁶ target copies of reference calibration standard DNA, or 2 μL of DNA sample extract in a total reaction volume of 25 μL. Fourteen replicates were performed for each sample filter DNA extract preparation. All other reactions were performed in triplicate in MicroAmp optical 96-well reaction plates with MicroAmp 96-well optical adhesive film (Thermo Fisher Scientific, Grand Island, NY). The thermal cycling profile was as follows: 2 min at 95 °C followed by 40 cycles of 5 s at 95 °C and 30 s at 60 °C. The threshold was adjusted manually to 0.03 and Cq values were exported to Microsoft Excel. To monitor for potential sources of extraneous DNA during qPCR amplification, a minimum of six no-template amplifications (NTC) with purified water substituted for template DNA were performed for each instrument run. A master calibration model (Sivaganesan et al., 2008) was generated for HF183/BacR287 based on plasmid reference calibration standard measurements from six instrument runs (outliers removed; outliers were defined as absolute value of studentized residual > 3).
2.6. Amplification inhibition and sample processing efficiency controls
To screen for potential amplification interference in filter sample DNA extracts, each test reaction was spiked with 10² copies of IAC reference DNA material as previously described (Green et al., 2014). An amplification interference threshold (mean VIC NTC Cq + 1.5) was calculated for each instrument run containing sample filter DNA test reactions. For each sample filter, HF183/BacR287 VIC Cq values from triplicate reactions were used to calculate a sample filter mean HF183/BacR287 VIC Cq. Sample filter mean Cq values below the respective instrument run-specific interference threshold indicated no amplification inhibition. Variability in sample processing was monitored in all sample filters and extraction blanks with a sample processing control (SPC) consisting of a fixed concentration spike of salmon testes DNA (0.2 μg/mL) followed by amplification of 2 μL DNA extract with the Sketa22 qPCR assay as previously described (Haugland et al., 2010). A SPC acceptance threshold was calculated using Cq values from all extraction blanks (Sketa22 extraction blank mean Cq + 3). For each sample filter, Sketa22 Cq values from triplicate reactions were used to calculate a sample filter mean Sketa22 Cq. Sample filter mean Cq values below the SPC threshold indicated acceptable sample processing variability.
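The two acceptance rules above reduce to simple threshold comparisons. The sketch below illustrates them under the offsets stated in the text (+1.5 Cq for the IAC interference threshold and +3 Cq for the SPC threshold); the function names are hypothetical and no measured Cq values from this study are reproduced.

```python
from statistics import mean


def interference_threshold(ntc_vic_cqs):
    """Run-specific IAC threshold: mean VIC Cq of the NTC reactions + 1.5."""
    return mean(ntc_vic_cqs) + 1.5


def spc_threshold(extraction_blank_sketa22_cqs):
    """SPC acceptance threshold: mean Sketa22 Cq of extraction blanks + 3."""
    return mean(extraction_blank_sketa22_cqs) + 3.0


def passes_control(replicate_cqs, threshold):
    """A filter passes when its mean control Cq is below the threshold."""
    return mean(replicate_cqs) < threshold
```

For example, a run whose NTC VIC Cq values average 31.0 would have an interference threshold of 32.5, so a filter with a mean VIC Cq of 31.6 would show no evidence of inhibition, whereas a mean of 33.1 would be flagged.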
2.7. Field site data simulation
To investigate the influence of sampling intensity (presented as proportion of days sampled over a defined study period), number of replicate qPCR reactions per sample, and site data distributions on HFS estimates, five datasets were created to represent maximum sampling intensity (daily sampling over 105 days, the approximate length of beach recreational season) and maximum qPCR replication (14 replicates per sample). To help datasets better represent distributions of actual field measurements, they were simulated based on laboratory HF183/BacR287 measurements (14 sampling events x 3 filters/sampling event x 14 qPCR replicates/filter = 588 Cq values per site) from Esco, Mcyn, and Topa field sites. In addition, two hybrid sites (Esco:Mcyn and Mcyn:Topa) were created by randomly mixing respective field site data sets in equal proportions.
To simulate complete field data sets (105 days with 14 qPCR replicate values per day), first, 105 seed Cq measurements were randomly selected without replacement for each of the five sites from respective laboratory data measurements. If a selected seed Cq, say Cq.o, falls in the ROQ range, then a simulated set of 14 Cq values was generated by random sampling from a normal distribution N(Cq.o, σ²), where σ² was estimated from Esco and Mcyn laboratory data. A simple linear regression model was used to model ln(σ) as a function of Cq.o with the fitted equation: ln(σ) = −13.40221 + 0.3842 · Cq.o (model R² = 0.842, total number of available data points = 23). Note that Topa data were not used to estimate σ² because 99.8% of Cq measurements were ≥40.
In contrast, suppose a randomly selected Cq.o falls in the MPN range and the percentage of positive Cq measurements (positive = any Cq value < 40) was po = number of positives/14, for the corresponding filter f0. Depending on the value of po, different approaches were used to simulate 14 qPCR Cq measurements from the seed Cq.o. If 0 < po < 1, then 14 simulated probabilities of detection (p1, …, pi, …, p14) were generated by random sampling from a uniform distribution U(po − s, po + s), where s represents the standard deviation [s = sqrt(po · (1 − po)/14)]. Each of the above simulated probabilities of detection was used to generate a simulated qPCR binary measurement (i.e., 0 for negative and 1 for positive) from a Bernoulli(pi) distribution. Thus, 14 binary data points were simulated when a randomly selected Cq.o falls in the MPN range. If po = 0, i.e., all Cq values = 40, all 14 simulated values are set to 0. Note that, if po = 1, i.e., all Cq values < 40, Cq.o was assumed to fall in the ROQ range and random sampling was performed as described above.
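The replicate-generation rules described above can be sketched as follows. This is an illustrative reading of the text rather than the code used in the study: the function names are hypothetical, Python's standard `random` module stands in for whatever software generated the original draws, and clipping simulated probabilities to [0, 1] is an added guard that the text does not mention.

```python
import math
import random

LLOQ_CQ = 35.03   # Cq threshold between ROQ and MPN ranges (from the text)
N_REPS = 14       # simulated qPCR replicates per sampling day


def sigma_from_seed(cq_seed):
    """Fitted regression from the text: ln(sigma) = -13.40221 + 0.3842 * Cq.o."""
    return math.exp(-13.40221 + 0.3842 * cq_seed)


def simulate_roq(cq_seed, rng):
    """Draw 14 Cq values from N(Cq.o, sigma^2)."""
    sigma = sigma_from_seed(cq_seed)
    return [rng.gauss(cq_seed, sigma) for _ in range(N_REPS)]


def simulate_mpn(p_o, rng):
    """Draw 14 binary results (1 = positive) for a seed in the MPN range."""
    if p_o == 0:
        return [0] * N_REPS
    s = math.sqrt(p_o * (1.0 - p_o) / N_REPS)
    results = []
    for _ in range(N_REPS):
        p_i = rng.uniform(p_o - s, p_o + s)
        p_i = min(1.0, max(0.0, p_i))  # guard only; an assumption
        results.append(1 if rng.random() < p_i else 0)  # Bernoulli(p_i)
    return results


def simulate_replicates(cq_seed, p_o, rng):
    """Dispatch on the seed: ROQ if Cq.o < LLOQ or all replicates positive."""
    if cq_seed < LLOQ_CQ or p_o == 1:
        return simulate_roq(cq_seed, rng)
    return simulate_mpn(p_o, rng)
```

Passing a seeded generator such as `random.Random(0)` makes a simulated data set reproducible, which is convenient when comparing study design scenarios drawn from the same simulated site.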
2.8. Evaluation of study design choices on HFS
The influence of study design choices on HFS estimates was examined using simulated datasets focusing on the influence of sampling intensity, number of qPCR replicates per sample, and site data distributions. A total of 975 scenarios were evaluated representing different combinations of sampling intensity (N; 7 to 105 by increment of 7), qPCR replicate number (j; varied from 2 to 14 per sample by increment of 1) and site (Esco, Esco:Mcyn, Mcyn, Mcyn:Topa, or Topa). For each scenario, a new dataset was created from the simulated dataset for the respective site (105 sampling events with 14 qPCR replicates per sample) by randomly selecting N samples and j qPCR replicates per sample. Each scenario dataset was used to calculate HFS estimates with variability. This process was repeated for 100 iterations for each scenario.
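The scenario count follows directly from the design grid: 15 sampling intensities × 13 replicate levels × 5 sites = 975. The sketch below enumerates that grid and shows one way a scenario dataset could be drawn; the helper name `draw_scenario_dataset` is hypothetical, and selection without replacement is an assumption, since the text states only that samples and replicates were selected randomly.

```python
import itertools
import random

SITES = ["Esco", "Esco:Mcyn", "Mcyn", "Mcyn:Topa", "Topa"]
SAMPLE_SIZES = range(7, 106, 7)   # N = 7, 14, ..., 105 sampling days
REPLICATES = range(2, 15)         # j = 2, 3, ..., 14 qPCR replicates

# Every (site, N, j) combination evaluated in the study design analysis.
scenarios = list(itertools.product(SITES, SAMPLE_SIZES, REPLICATES))


def draw_scenario_dataset(site_data, n_samples, n_reps, rng):
    """Subsample N days and j replicates per day from a simulated site.

    site_data: list of 105 days, each a list of 14 replicate values.
    Sampling is without replacement (an assumption).
    """
    days = rng.sample(site_data, n_samples)
    return [rng.sample(day, n_reps) for day in days]
```

Repeating `draw_scenario_dataset` 100 times per scenario with independent random draws would mirror the iteration scheme described above.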
2.9. Prioritizing sites with HFS and study design optimization
Site prioritization is accomplished by comparing the HFS 95% Bayesian credible interval (BCI) from each site. Sites where HFS 95% BCI ranges do not overlap can be ranked into separate groups, whereas sites with overlapping HFS 95% BCI values are considered to have similar human fecal pollution levels. In practice, ranking is determined based on a single field sampling campaign (i.e., corresponding to one iteration of simulation) for a given study design scenario (i.e., a defined sampling intensity and level of qPCR replication), generating a single HFS 95% BCI estimate for each site. To characterize optimal study design choices for future site prioritization applications, HFS 95% BCI values generated from 100 iterations were used to approximate ranking outcomes. A cumulative HFS 95% BCI was constructed using the lowest lower BCI bound and the highest upper BCI bound among the 100 iterations for each site and study design combination. For a given study design combination, sites were ranked based on whether their cumulative 95% BCI ranges overlap. Ranking outcomes of various study design choices (i.e., qPCR replicate and sampling intensity combinations) were compared to identify study design choices that produce the same ranking outcome as the best case scenario (BCS; i.e., maximum sampling and analytic effort with 14 qPCR replicates at 100% sampling intensity).
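The cumulative interval and overlap-based grouping can be illustrated with a short sketch. The function names are hypothetical, and one detail is an assumption not specified in the text: when site A overlaps B and B overlaps C, all three are placed in the same group even if A and C do not overlap directly.

```python
def cumulative_bci(intervals):
    """Lowest lower bound and highest upper bound across all iterations."""
    lows, highs = zip(*intervals)
    return min(lows), max(highs)


def rank_sites(site_intervals):
    """Group sites with overlapping intervals, highest-scoring group first.

    site_intervals: dict mapping site name -> (lower, upper) cumulative BCI.
    """
    # Visit sites from the highest upper bound downward.
    ordered = sorted(site_intervals.items(),
                     key=lambda item: item[1][1], reverse=True)
    groups = []
    group_low = None  # lowest lower bound seen in the current group
    for site, (low, high) in ordered:
        if groups and high >= group_low:
            groups[-1].append(site)        # overlaps the current group
            group_low = min(group_low, low)
        else:
            groups.append([site])          # gap found: start a new group
            group_low = low
    return groups
```

As a usage example with made-up intervals (not study data), `rank_sites({"A": (45, 70), "B": (40, 60), "C": (10, 25)})` returns `[["A", "B"], ["C"]]`, meaning A and B cannot be distinguished but both rank above C.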
2.10. Calculations and statistics
The HFS representing the expected score for each site was calculated based on the complete data set (i.e., BCS with 105 days and 14 qPCR replicates per sample). Variability of HFS across 100 iterations of a particular scenario was also expressed as a HFS 95% inter-quantile range and relative range (HFS 95% inter-quantile range/average HFS of given scenario). Variability of HFS within each iteration was reported as either a 95% BCI or standard deviations. Sampling intensity was expressed as the sample size proportion (scenario sample size divided by 105 days). All statistics were performed with SAS software (Cary, NC), R (Version 3.2.0) or WinBUGS (https://www.mrc-bsu.cam.ac.uk/software/bugs/the-bugs-project-winbugs/).
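The summary statistics defined above are straightforward to compute. The following sketch assumes the 95% inter-quantile range is the distance between the 2.5th and 97.5th percentiles of the HFS values from the iterations of one scenario; the quantile interpolation rule is an assumption (the text does not state which was used), and the function names are hypothetical.

```python
from statistics import mean, quantiles

STUDY_DAYS = 105  # length of the simulated study period


def interquantile_95(values):
    """Distance between the 2.5th and 97.5th percentiles."""
    # n=40 yields cut points every 2.5%; the first and last are the ones needed.
    cuts = quantiles(values, n=40, method="inclusive")
    return cuts[-1] - cuts[0]


def relative_range(values):
    """95% inter-quantile range divided by the average HFS of the scenario."""
    return interquantile_95(values) / mean(values)


def sampling_intensity(n_samples):
    """Scenario sample size as a proportion of the 105-day study period."""
    return n_samples / STUDY_DAYS
```

For instance, 77 sampling days correspond to a sampling intensity of 77/105 ≈ 73.3%, and 70 days to 66.7%.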
3. Results
3.1. Laboratory and simulated field data distributions
Data distribution parameters for laboratory and simulated Cq measurements are shown in Table 1. Data distributions of simulated Esco, Mcyn and Topa closely mirrored laboratory generated data, while both hybrid sites yielded distributions reflecting a mixture between corresponding parent sites. The proportion of non-detections (Cq = 40) ranged from 44% (Esco) to 99.9% (Topa). The lowest Cq (27.2) was observed in the Esco data set.
Table 1.
Data distributions of HF183/BacR287 qPCR laboratory and simulated Cq measurements.
Site | n | Min Cq | Cq ≥ 40 | 35.03ᵃ < Cq < 40 | Cq < 35.03ᵃ |
---|---|---|---|---|---|
Laboratory Data | | | | | |
Esco | 588 | 31.1 | 44.0% | 26.0% | 29.9% |
Mcyn | 574 | 28.4 | 65.5% | 15.7% | 18.8% |
Topa | 588 | 39.1 | 99.8% | 0.2% | 0.0% |
Simulated Data | | | | | |
Esco | 1470 | 27.2 | 44.1% | 29.4% | 26.5% |
Esco:Mcyn | 1470 | 28.5 | 51.0% | 24.9% | 24.1% |
Mcyn | 1470 | 28.5 | 63.9% | 15.0% | 21.2% |
Mcyn:Topa | 1470 | 28.6 | 90.7% | 4.9% | 4.4% |
Topa | 1470 | 38.0 | 99.9% | 0.1% | 0.0% |
“n” shows the total number of Cq measurements (14 replicates per filter).
“Min Cq” indicates the lowest Cq value in the respective data set.
ᵃ Represents the Cq threshold between ROQ and MPN groups.
3.2. Influence of sampling intensity and number of qPCR replicates on HFS
Analyses designed to characterize the influence of sampling intensity and number of qPCR replicates on HFS were conducted for each simulated field site data set, resulting in the analysis of 975 scenarios. Average HFS (copies per 100 mL) calculated from 100 iterations for each scenario ranged from 0.09 (Topa, sampling intensity 80%, and 12 qPCR replicates) to 59.0 (Esco, sampling intensity 13.3%, and 14 qPCR replicates) and varied with sampling intensity to a higher degree than with the number of qPCR replicates per sample (Fig. 1, Panel A). The level of bias in HFS for each scenario (deviation from the dashed line, Fig. 1, Panel A) varied by site and was typically higher at smaller sampling intensities, but varied little by number of qPCR replicates per sample. The 95% inter-quantile range of HFS individual values for each scenario decreased rapidly with sampling intensity, but was minimally affected by the number of qPCR replicates (Fig. 1, Panel B).
Fig. 1.
Effect of sampling intensity and number of qPCR replicates (denoted by different colors) on HFS estimates across the five simulated field sites (Esco, Esco:Mcyn, Mcyn, Mcyn:Topa, and Topa). Sampling intensity was presented as the proportion of all 105 samples for a respective scenario. Panel A shows the average HFS calculated from all 100 iterations for each scenario. The HFS estimate for the best case scenario (BCS; 100% sampling intensity and 14 qPCR replicates) is indicated by a dashed line (Panel A). Panel B represents the 95% inter-quantile range of HFS values across 100 iterations. (Note: y-axis truncated to maximum of 150 copies/100 mL).
To characterize variability in HFS for each scenario, the standard deviation of HFS across 95 iterations (the five iterations with highest variability removed) is shown in Fig. 2. Overall, variability in HFS decreased with increasing sampling intensity and number of qPCR replicates. The magnitude of variability differed by field site, where sites with low HFS estimates (Mcyn:Topa and Topa) yielded much smaller standard deviations compared to other sites. The highest standard deviation was observed from the Esco scenario at a sampling intensity of 6.7% with two qPCR replicates.
Fig. 2.
Effect of sampling intensity and number of qPCR replicates (denoted by different colors) on HFS variability reported as the standard deviation of HFS values (top 5% removed) for each site and scenario across the five simulated field sites (Esco, Esco:Mcyn, Mcyn, Mcyn:Topa, and Topa). Sampling intensity was presented as the proportion of all 105 samples for a respective scenario.
3.3. Site prioritization using HFS
To characterize the utility of HFS to prioritize sites with different human fecal pollution levels, the cumulative 95% BCI across 100 iterations was calculated for each site based on the selected sampling intensity and qPCR replicate combination and used to generate ranking outcomes. Fig. 3 (Panel A) shows individual HFS with 95% BCI results from all 100 iterations at each site with 3 qPCR replicates and 73.3% sampling intensity, leading to a site ranking as follows: (Esco, Esco:Mcyn, and Mcyn) > (Mcyn:Topa and Topa), where the cumulative 95% BCI ranges do not overlap between site groupings. In this scenario (Fig. 3, Panel A), sites could only be categorized into two groups. In contrast, under the BCS, sites could be ranked as follows: (Esco and Esco:Mcyn) > Mcyn > (Mcyn:Topa and Topa) (Fig. 3, Panel B). To generate information for future users to design optimized studies, site ranking outcomes were then determined for each qPCR replicate and sampling intensity design scenario (n = 975) to identify any combinations that result in the same ranking outcomes as the BCS (Fig. 4; the 2.5% lowest and 2.5% highest HFS values removed).
Fig. 3.
Plot showing human fecal score with 95% BCI for all 100 iterations at each site under two different design choice scenarios. Panel A shows results with a sampling intensity of 73.3% and 3 qPCR replicates. Panel B depicts the best case scenario (BCS; 100% sampling intensity and 14 qPCR replicates). In the BCS, sites are ranked as follows: (Esco and Esco:Mcyn) > Mcyn > (Mcyn:Topa and Topa), where the cumulative 95% BCI ranges do not overlap between field site groups. Vertical gray lines represent 95% BCI for each individual HFS iteration.
Fig. 4.
Plot depicting sampling intensity (number of sampling days divided by 105 days) and qPCR replicates per sample (2–14 replicates) combinations for a Human Fecal Score (HFS) site prioritization application. Filled circles indicate qPCR replicate and sampling intensity combinations required to achieve the same site ranking outcome [(Esco and Esco:Mcyn) > Mcyn > (Mcyn:Topa, and Topa)] as the best case scenario (14 qPCR replicates at 100% sampling intensity).
3.4. Quality controls
The HF183/BacR287 qPCR master calibration model (y = −3.56X + 37.3) indicated an R² of 0.997 and amplification efficiency of 0.91 based on repeated measures from six instrument runs. The ROQ spanned 10 to 10⁶ copies of target DNA per reaction (entire range tested in study). Extraneous DNA controls indicated the absence of contaminant HF183/BacR287 targets in 99.7% of all amplifications [1 false positive (38.6 Cq) out of 316 reactions]. IAC quality assurance tests indicated no amplification inhibition in all 42 filter DNA extracts tested across 20 instrument runs (interference thresholds ranged from 32.0 to 33.6 Cq). SPC tests showed that 95% of filters passed (SPC threshold = 26.5 Cq). For digital PCR experiments (32 reactions), all reactions had > 10,000 accepted droplets (13,300 average) and all no template controls (n = 8) were negative.
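The reported amplification efficiency can be checked against the calibration slope using the standard relation E = 10^(−1/slope) − 1. The short sketch below applies it to the slope and intercept given above; the function names are hypothetical, and the expected Cq at 10 copies is the calibration line itself, not the LLOQ threshold of 35.03 Cq, which is the upper bound of the 95% credible interval at that concentration.

```python
import math

INTERCEPT = 37.3   # Cq at log10(copies) = 0, from the calibration model
SLOPE = -3.56      # Cq change per log10 increase in target copies


def amplification_efficiency(slope):
    """Standard qPCR efficiency from a calibration slope."""
    return 10 ** (-1.0 / slope) - 1.0


def expected_cq(copies):
    """Cq predicted by the master calibration line for a given copy number."""
    return INTERCEPT + SLOPE * math.log10(copies)
```

With the reported slope of −3.56, `amplification_efficiency(SLOPE)` evaluates to approximately 0.91, in agreement with the value stated above, and `expected_cq(10)` gives 33.74 Cq.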
4. Discussion
4.1. Influence of sampling intensity and number of qPCR replicates on HFS
To characterize the influence of two study design parameters, sampling intensity and number of qPCR replicates, on HFS estimates, we conducted an analysis investigating 975 design scenarios at five sites with different levels of human fecal pollution. Using Bayesian simulation, we estimated HFS results representing 100 water quality testing iterations for each sampling intensity and qPCR replication combination. Examination of the average HFS, range, and variability across and within iterations identified several key trends, providing important information on the relationship between study design choices and HFS estimation.
The average HFS values represent the central tendency of pollution level estimates (i.e. HFS) based on 100 simulated water quality testing iterations for each sampling intensity and qPCR replication combination. Findings indicate that increased sampling intensity leads to more accurate average HFS estimates (Fig. 1, Panel A). However, for field implementation, a HFS would be calculated based on a single sampling campaign (corresponding to a single iteration), making the range of possible HFS values across the 100 iterations important to consider (Fig. 1, Panel B). The smaller the HFS range is across the 100 iterations for each scenario, the more likely a HFS calculated from a single iteration or field sampling campaign will be close to the average HFS. Results indicate that increased sampling intensities lead to a smaller HFS range (Fig. 1, Panel B) suggesting that a considerable effort in field sample collection is necessary for reliable implementation. A different trend was observed when standard deviation of HFS estimates was considered (Fig. 2), where both sampling intensity and number of qPCR replicates influence variability in HFS estimates suggesting that these study design factors, together, play a role in optimal implementation of HFS for a site prioritization application.
In addition to sampling intensity and qPCR replication, analyses demonstrated the importance of the field site human fecal pollution level on HFS estimates. At each site, there is a clear pattern where increased sampling intensity and qPCR replication number lead to more accurate and precise HFS estimates. However, the magnitude of sampling intensity and qPCR replication number influence on HFS estimates varied by site, suggesting that the optimal field sampling and laboratory effort varies based on site human fecal pollution levels. This could present a challenge to future practitioners because human pollution levels are typically unknown prior to fecal source identification testing, making it difficult to select the optimal sample intensity and qPCR replicate combination. In this study, a bracketing strategy was employed where test sites were deliberately selected based on known human fecal pollution levels representing a range of human fecal pollution detection rates spanning 56% (Esco) to 0.2% (Topa). This range likely represents human fecal pollution levels at most recreational water locations. However, users anticipating higher fecal pollution levels may need to perform additional analyses to characterize the optimal study design for HFS prior to implementation.
4.2. Use of HFS to prioritize recreational sites
The goal of the HFS application reported here is to prioritize sites based on human fecal pollution levels. To demonstrate this approach, HFS estimates for each sampling intensity and qPCR replicate number design scenario were used for site ranking based on a 15-week recreational water beach season study period. Using the scenario where all data are available (100% sampling intensity; 14 qPCR replicates), referred to as the BCS, sites could be prioritized into three groups: (Esco and Esco:Mcyn) > Mcyn > (Mcyn:Topa and Topa) (Fig. 3, Panel B). The inability to separately rank each site using all available data provides important insights regarding the application of human fecal source identification technologies to prioritize water quality between recreational sites. Many monitoring efforts rely on monthly sampling or at best, weekly sampling. However, findings suggest that this low sampling intensity may not be sufficient to convincingly rank a series of impaired sites. Results presented here suggest that even monitoring water quality daily over an entire beach season combined with qPCR testing with many replicates (n = 14) does not guarantee complete site prioritization with the HFS approach. Nevertheless, successful ranking into three categories still offers extremely valuable information and could serve as a foundation for further remediation action prioritization.
Further comparison of BCS results to the 975 design scenarios of sampling intensity and qPCR replication indicates that the BCS outcome can be achieved with considerably less effort. For example, the BCS ranking outcome can be achieved with one-third fewer samples (66.7% sampling intensity; Fig. 4), suggesting that daily water quality testing is not necessary to successfully implement the HFS approach. In addition, study design optimization simulations indicate that as few as two qPCR replicates per sample can achieve a ranking outcome comparable to the BCS (Fig. 4).
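A reduced-effort design scenario can be mimicked by subsampling a complete data set. The sketch below is one simple way to do this, assuming random selection of sampling days and of qPCR replicates within each retained day. The function name `subsample_design`, the data layout, and the random selection scheme are assumptions for illustration and are not taken from the simulation procedure used in this study.

```python
import random
from typing import Dict, List


def subsample_design(
    data: Dict[int, List[float]],
    intensity: float,
    n_reps: int,
    seed: int = 0,
) -> Dict[int, List[float]]:
    """Return a reduced data set for one design scenario.

    data      -- maps a sampling-day index to its list of replicate results
    intensity -- fraction of sampling days to keep (for example 0.667)
    n_reps    -- number of qPCR replicates to keep per retained day
    """
    rng = random.Random(seed)  # seeded so a scenario can be reproduced
    days = sorted(data)
    n_keep = max(1, round(intensity * len(days)))
    kept_days = sorted(rng.sample(days, n_keep))
    # Draw replicates without replacement within each retained day.
    return {day: rng.sample(data[day], n_reps) for day in kept_days}


# Hypothetical full data set: 15 sampling days with 14 replicates each.
full = {day: [float(day * 100 + rep) for rep in range(14)] for day in range(15)}
reduced = subsample_design(full, intensity=2 / 3, n_reps=2, seed=1)
print(len(reduced), "days kept;", len(next(iter(reduced.values()))), "replicates per day")
```

Repeating the call with different seeds and recomputing the site score each time would give a sense of how stable a ranking is under a given design.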
4.3. Factors to consider for HFS implementation
The HFS approach is designed to provide a step-by-step implementation plan to prioritize site remediation based on the level of human fecal pollution. Users must select a defined study period (i.e., beach season), confirm HF183/BacR287 local specificity, identify appropriate sampling locations, collect samples in a prescribed fashion, and process samples and interpret findings with standardized procedures (Table 2). Successful implementation requires careful planning and laboratory preparation prior to initiating HFS testing.
Table 2.
Factors to consider for HFS implementation.
Factor | Recommendation |
---|---|
HF183/BacR287 Performance | Confirm specificity with local reference pollution sources; document any cross-reactivity |
Site Selection | Impaired sites based on local water quality standards; sites must be sampled on same day and time over study period; sample holding times ≤ 6 h; easy site accessibility |
Sampling Intensity and qPCR Replication Selection | Identify suitable confidence level; consult Fig. 4 to identify sampling intensity/qPCR replication requirements; customize based on local field sampling and laboratory constraints |
Laboratory Testing Conditions | Utilize single, centralized laboratory; use standardized protocol and DNA reference materials; include data acceptance criteria; demonstrate successful method proficiency |
The first step is to confirm performance of the HF183/BacR287 method in the local area of interest. The human-associated HF183/BacR287 method can be highly specific for human fecal pollution, but cross-reactivity with dog and chicken has been reported, albeit typically at much lower concentrations compared to human fecal sources (Green et al., 2014). In addition, although rare, it is possible that the HF183/BacR287 genetic target may not be shed by local human populations (Boehm et al., 2016). Thus, it is recommended that water quality managers conduct performance tests with reference pollution source materials collected from potential human and non-human sources in the same geographic area as water quality testing.
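Results from such a local performance test are commonly summarized as sensitivity (fraction of human reference samples in which the marker is detected) and specificity (fraction of non-human reference samples in which it is not detected). The following is a minimal sketch of that arithmetic. The function names and the example results are hypothetical, and any acceptance threshold would need to be set by the user.

```python
from typing import Sequence


def sensitivity(human_results: Sequence[bool]) -> float:
    """Fraction of human reference samples with the marker detected."""
    return sum(human_results) / len(human_results)


def specificity(non_human_results: Sequence[bool]) -> float:
    """Fraction of non-human reference samples with the marker NOT detected."""
    return sum(not detected for detected in non_human_results) / len(non_human_results)


# Hypothetical reference results (True = HF183/BacR287 detected).
human = [True] * 9 + [False]           # 10 local human sources, 1 non-detect
non_human = [False] * 18 + [True] * 2  # 20 local animal sources, 2 cross-reactions

print(f"sensitivity = {sensitivity(human):.2f}")
print(f"specificity = {specificity(non_human):.2f}")
```

Recording which animal groups produced the cross-reactions, and at what concentration, complements these two summary values.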
There are also several factors to consider when choosing sampling sites for HFS application. There should be evidence that sites are impacted by fecal pollution based on local water quality standard criteria. Because HFS is a weighted average across a defined sampling intensity, not a day-to-day measurement of human fecal pollution levels, each site should be sampled at a similar time of day, on the same days of the week over the study period. Sampling days should also be evenly spaced throughout the designated sampling period if possible. Thus, it is important to consider site accessibility, sample transport conditions, and holding times when selecting sampling site locations.
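A sampling calendar that keeps the same days of the week throughout the study period can be generated programmatically. The sketch below assumes a fixed start date, a study period expressed in whole weeks, and a user-chosen set of weekdays; the function name `sampling_dates`, the start date, and the Monday/Wednesday/Friday choice are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Set


def sampling_dates(start: date, n_weeks: int, weekdays: Set[int]) -> List[date]:
    """List every date in the study period that falls on a chosen weekday.

    weekdays uses Python's convention: Monday = 0 ... Sunday = 6.
    """
    period = (start + timedelta(days=offset) for offset in range(7 * n_weeks))
    return [day for day in period if day.weekday() in weekdays]


# Hypothetical 15-week season sampled every Monday, Wednesday, and Friday.
schedule = sampling_dates(date(2024, 6, 3), n_weeks=15, weekdays={0, 2, 4})
print(len(schedule), "sampling events, first:", schedule[0], "last:", schedule[-1])
```

Applying the same calendar to every site keeps sampling days aligned across sites, which matters because the score summarizes the whole period and not individual days.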
HFS optimization characterization under a large range of sampling intensity and qPCR replication study designs provides future users with the information to customize implementation efforts based on local laboratory and field sampling capacities (Fig. 4). For instance, a group with limited qPCR replicate testing capacity may elect to utilize a higher field sampling intensity, such as 93.3%–100% and a minimal number of qPCR replicates (n = 2). However, another group may have limited field sampling support and choose to reduce sampling intensity and maximize qPCR replicate testing using a study design with only 66.7% sampling intensity and 8 to 14 qPCR replicates (Fig. 4).
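The trade-off between field and laboratory effort can be made concrete with simple arithmetic. The sketch below compares the two example designs mentioned above, assuming a hypothetical 15 sampling events per site at 100% sampling intensity and five sites. The function name `design_effort` and both of those numbers are assumptions, and the reaction counts exclude standards and controls.

```python
from typing import NamedTuple


class Effort(NamedTuple):
    samples_per_site: int
    total_samples: int
    total_reactions: int  # sample reactions only; standards and controls excluded


def design_effort(full_events: int, intensity: float, n_reps: int, n_sites: int) -> Effort:
    """Count field samples and qPCR reactions implied by one study design."""
    samples_per_site = round(intensity * full_events)
    total_samples = samples_per_site * n_sites
    return Effort(samples_per_site, total_samples, total_samples * n_reps)


# Hypothetical: 15 sampling events per site at 100% intensity, 5 sites.
field_heavy = design_effort(full_events=15, intensity=1.0, n_reps=2, n_sites=5)
lab_heavy = design_effort(full_events=15, intensity=0.667, n_reps=14, n_sites=5)
print("field-heavy:", field_heavy)
print("lab-heavy:  ", lab_heavy)
```

Under these assumptions the field-heavy design requires more site visits but far fewer qPCR reactions, while the lab-heavy design reverses that balance.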
It is important to note that the HFS procedure presented here is specifically designed and optimized for ranking sites based on a standardized HF183/BacR287 protocol including strict data acceptance criteria (Green et al., 2014; Shanks et al., 2016) using data generated from a single, centralized laboratory by repeated sampling of ambient surface water samples collected over a defined period of time. It remains unclear how site ranking based on HFS could be influenced by changes in the protocol, such as filtration of larger volumes of water (>100 mL), testing larger volumes of DNA template, using different thermal cycling instrumentation, changing the definition of the calibration model LLOQ, changing qPCR reagent brands, running qPCR tests in a simplex rather than multiplex format, or generating data across multiple laboratories. Simply put, the HFS is a standardized approach. Standardization is deliberate: it facilitates implementation, enhances the quality of findings, and generates more meaningful data sets. It is likely that the HFS procedure will perform adequately with some minor changes, but additional research is recommended to confirm this prior to implementation.
4.4. Implications for future water quality management
The development of a standardized field implementation plan and data analysis strategy specifically designed to address a particular water quality challenge is vital for the successful public adoption of a fecal source identification technology. Here, we introduce a novel, standardized, step-by-step procedure for ranking sampling sites for remediation based on levels of human fecal pollution. Information is also provided to optimize future implementation efforts based on local field sampling and laboratory capacities. The ability to prioritize sites based on human fecal pollution levels should improve and focus water quality management responses to chronically impaired recreational water sites.
In addition to site prioritization, the HFS approach may lead to several other new management tools. Rather than a metric to rank a series of sites, HFS could serve as a benchmark for comparison of human fecal pollution levels at the same site over time. For instance, HFS could be measured before and after the installation of a best management practice to reduce human fecal pollution. A reduction in HFS after best management practice implementation would suggest that efforts to manage human fecal pollution were successful. However, no change or an increase in the HFS estimate may indicate that management efforts failed to reduce human fecal pollution levels. Recreational beaches could also benefit from an HFS tool to generate a report card system where recreational areas are graded based on levels of human fecal pollution. Future work is needed to identify any potential links between HFS and disease-causing pathogens in impaired recreational waters, as well as any relationships with public health risk. It may also be possible to apply the HFS approach to other host-associated qPCR technologies to develop site prioritization tools for non-human fecal pollution sources of concern such as avian, cattle, or swine animal groups.
5. Conclusion
The HFS approach introduced here represents the first attempt to develop a tailored procedure to solve a specific water quality problem using a standardized qPCR fecal source identification technology. Key contributions include:
- HFS combines a consensus data analysis approach with a standardized HF183/BacR287 human-associated fecal source identification method to provide a prescribed procedure to rank impaired recreational sites based on levels of human fecal pollution.
- HFS utilizes a novel mathematical framework for interpreting qPCR results that incorporates all data regardless of magnitude of measurement, including non-detections.
- Analysis of 975 implementation combinations provides crucial information for future users to optimize study designs based on field and laboratory capabilities.
- The HFS tool could have broad implications for the development of other water quality tools to measure utility of best management practices or recreational site report card systems based on human fecal pollution levels.
Future applications should reveal new and important findings to optimize this procedure and, hopefully, spark further development of novel strategies designed to address other common water management problems utilizing standardized qPCR methods.
Supplementary Material
Highlights.
- New standardized metric to rank sample sites based on human fecal pollution levels.
- Metric utilizes a standardized HF183/BacR287 qPCR procedure.
- All qPCR data are used regardless of concentration, including samples with no detection.
- Total of 975 implementation scenarios investigated to identify optimal study design for future use.
- Key implementation factors are discussed to prioritize sites for remediation.
Acknowledgements
Information has been subjected to U.S. EPA peer and administrative review and has been approved for external publication. Any opinions expressed in this paper are those of the authors and do not necessarily reflect the official positions and policies of the U.S. EPA. Any mention of trade names or commercial products does not constitute endorsement or recommendation for use.
References
- Bernhard and Field, 2000. Bernhard AE, Field KG. A PCR assay to discriminate human and ruminant feces on the basis of host differences in Bacteroides-Prevotella genes encoding for 16S rRNA. Appl. Environ. Microbiol., 66 (10) (2000), pp. 4571–4574
- Boehm et al., 2013. Boehm AB, Van De Werfhorst LC, Griffith JF, Holden PA, Jay JA, Shanks OC, Wang D, Weisberg SB. Performance of forty-one microbial source tracking methods: a twenty-seven lab evaluation study. Water Res., 47 (18) (2013), pp. 6812–6828
- Boehm et al., 2015. Boehm AB, Soller JA, Shanks OC. Human-associated fecal quantitative polymerase chain reaction measurements and simulated risk of gastrointestinal illness in recreational waters contaminated with raw sewage. Environ. Sci. Technol. Lett., 2 (10) (2015), pp. 270–275
- Boehm et al., 2016. Boehm AB, Wang D, Ercumen A, Shea M, Harris AR, Shanks OC, Kelty CA, Ahmed A, Mahmud ZH, Arnold BF, Chase C, Kullmann C, Colford JM, Luby SP, Pickering AJ. Occurrence of host-associated fecal markers on child hands, household soil, and drinking water in rural Bangladeshi households. Environ. Sci. Technol. Lett., 3 (2016), pp. 393–368
- Cao et al., 2013a. Cao Y, Van De Werfhorst LC, Scott EA, Raith MR, Holden PA, Griffith JF. Bacteroidales terminal restriction fragment length polymorphism (TRFLP) for fecal source differentiation in comparison to and in combination with universal bacteria TRFLP. Water Res., 47 (18) (2013), pp. 6944–6955
- Cao et al., 2013b. Cao Y, Van De Werfhorst LC, Dubinsky EA, Badgley BD, Sadowsky MJ, Andersen GL, Griffith JF, Holden PA. Evaluation of molecular community analysis methods for discerning fecal sources and human waste. Water Res., 47 (18) (2013), pp. 6862–6872
- Cao et al., 2013c. Cao Y, Hagedorn C, Shanks OC, Wang D, Ervin J, Griffith JF, Layton B, McGee C, Riedel T, Weisberg SB. Towards establishing a human fecal contamination index in microbial source tracking. Int. J. Environ. Sci. Eng. Res., 4 (2013), pp. 46–58
- Cao et al., 2015. Cao Y, Raith MR, Griffith JF. Droplet digital PCR for simultaneous quantification of general and human-associated fecal indicators for water quality assessment. Water Res., 70 (2015), pp. 337–349
- Cao et al., 2016. Cao Y, Raith MR, Griffith JF. A duplex digital PCR assay for simultaneous quantification of the Enterococcus spp. and the human fecal-associated HF183 markers in waters. J. Vis. Exp. (2016), p. e53611
- Dubinsky et al., 2012. Dubinsky EA, Esmaili L, Hulls JR, Cao Y, Griffith JF, Andersen GL. Application of phylogenetic microarray analysis to discriminate sources of fecal pollution. Environ. Sci. Technol., 46 (2012), pp. 4340–4347
- Fisher et al., 2015. Fisher JC, Eren AM, Green HC, Shanks OC, Morrison HG, Vineis JH, Sogin ML, McLellan SL. Comparison of sewage and animal fecal microbiomes using oligotyping reveals potential human fecal indicators in multiple taxonomic groups. Appl. Environ. Microbiol., 81 (2015), pp. 7023–7033
- Green et al., 2014. Green HC, Haugland R, Varma M, Millen HT, Borchardt MA, Field KG, Kelty CA, Sivaganesan M, Shanks OC. Improved HF183 quantitative real-time PCR assay for characterization of human fecal pollution in ambient surface water samples. Appl. Environ. Microbiol., 80 (10) (2014), pp. 3086–3094
- Griffith et al., 2013. Griffith JF, Layton BA, Boehm AB, Holden P, Jay J, Hagedorn C, McGee C, Weisberg SB. The California Microbial Source Identification Manual: a Tiered Approach to Identifying Fecal Pollution Sources to Beaches. Southern California Coastal Water Research Project, Costa Mesa, CA (2013)
- Haugland et al., 2010. Haugland RA, Varma M, Kelty CA, Peed L, Sivaganesan M, Shanks OC. Evaluation of genetic markers from the 16S rRNA gene V2 region for use in quantitative detection of selected Bacteroidales species and human fecal waste by real-time PCR. Syst. Appl. Microbiol., 33 (2010), pp. 348–357
- Kelty et al., 2012. Kelty CA, Varma M, Sivaganesan M, Haugland R, Shanks OC. Distribution of genetic marker concentrations for fecal indicator bacteria in sewage and animal feces. Appl. Environ. Microbiol., 78 (2012), pp. 4225–4232
- Layton et al., 2013. Layton BA, Cao Y, Ebentier DL, Hanley K, Ballesté E, Brandão J, Byappanahalli M, Converse R, Farnleitner AH, Gentry-Shields J, Gidley ML, Gourmelon M, Lee CS, Lee J, Lozach S, Madi T, Meijer WG, Noble R, Peed L, Reischer GH, Rodrigues R, Rose JB, Schriewer A, Sinigalliano C, Srinivasan S, Stewart J, Van De Werfhorst LC, Wang D, Whitman R, Wuertz S, Jay J, Holden PA, Boehm AB, Shanks O, Griffith JF. Performance of human fecal anaerobe-associated PCR-based assays in a multi-laboratory method evaluation study. Water Res., 47 (18) (2013), pp. 6897–6908
- LifeTechnologies, 2014. LifeTechnologies. Real-time PCR Handbook (third ed.) (2014), p. 11
- Murray, 2011. Murray J. Canine Scent and Microbial Source Tracking in Santa Barbara, CA (2011), p. U2R09
- Shanks et al., 2010. Shanks OC, White K, Kelty CA, Sivaganesan M, Blannon J, Meckes M, Varma M, Haugland RA. Performance of PCR-based assays targeting Bacteroidales genetic markers of human fecal pollution in sewage and fecal samples. Environ. Sci. Technol., 44 (16) (2010), pp. 6281–6288
- Shanks et al., 2016. Shanks OC, Kelty CA, Oshiro R, Haugland RA, Madi T, Brooks L, Field KG, Sivaganesan M. Data acceptance criteria for standardized human-associated fecal source identification quantitative real-time PCR methods. Appl. Environ. Microbiol., 82 (9) (2016), pp. 2773–2782
- Sivaganesan et al., 2008. Sivaganesan M, Seifring S, Varma M, Haugland RA, Shanks OC. A Bayesian method for calculating real-time quantitative PCR calibration curves using absolute plasmid DNA standards. BMC Bioinforma., 9 (2008), p. 120
- Sivaganesan et al., 2011. Sivaganesan M, Siefring S, Varma M, Haugland RA. MPN estimation of qPCR target sequence recoveries from whole cell calibrator samples. J. Microbiol. Methods, 87 (2011), pp. 343–349
- Soller et al., 2010. Soller JA, Schoen ME, Bartrand T, Ravenscroft JE, Ashbolt NJ. Estimated human health risks from exposure to recreational waters impacted by human and non-human sources of faecal contamination. Water Res., 44 (2010), pp. 4674–4691
- Stewart et al., 2013. Stewart JR, Boehm AB, Dubinsky EA, Fong T-T, Goodwin KD, Griffith JF, Noble RT, Shanks OC, Vijayavel K, Weisberg SB. Recommendations following a multi-laboratory comparison of microbial source tracking methods. Water Res., 47 (18) (2013), pp. 6829–6838
- Unno et al., 2010. Unno T, Jang J, Han D, Ha Kim J, Sadowsky MJ, Kim O, Chun J, Hur H. Use of barcoded pyrosequencing and shared OTUs to determine source of fecal bacteria in watersheds. Environ. Sci. Technol., 44 (2010), pp. 7777–7782
- USEPA, 2012. USEPA. Recreational Water Quality Criteria. Office of Water (2012)