2022 Aug 12;56(17):12126–12136. doi: 10.1021/acs.est.2c00470

Regional Scale Assessment of Shallow Groundwater Vulnerability to Contamination from Unconventional Hydrocarbon Extraction

Mario A Soriano Jr †,*, Nicole C Deziel, James E Saiers †,*
PMCID: PMC9454823  PMID: 35960643

Abstract

Concerns over unconventional oil and gas (UOG) development persist, especially in rural communities that rely on shallow groundwater for drinking and other domestic purposes. Given the continued expansion of the industry, regional (vs local scale) models are needed to characterize groundwater contamination risks faced by the increasing proportion of the population residing in areas that accommodate UOG extraction. In this paper, we evaluate groundwater vulnerability to contamination from surface spills and shallow subsurface leakage of UOG wells within a 104,000 km² region in the Appalachian Basin, northeastern USA. We test a computationally efficient ensemble approach for simulating groundwater flow and contaminant transport processes to quantify vulnerability with high resolution. We also examine metamodels, or machine learning models trained to emulate physically based models, and investigate their spatial transferability. We identify predictors describing proximity to UOG, hydrology, and topography that are important for metamodels to make accurate vulnerability predictions outside their training regions. Using our approach, we estimate that 21,000–30,000 individuals in our study area are dependent on domestic water wells that are vulnerable to contamination from UOG activities. Our novel modeling framework could be used to guide groundwater monitoring, provide information for public health studies, and assess environmental justice issues.

Keywords: unconventional oil and gas, hydraulic fracturing, risk assessment, groundwater protection, machine learning

Short abstract

The paper demonstrates efficient approaches for locating populations at risk of groundwater contamination from unconventional oil and gas development.

Introduction

Unconventional oil and gas (UOG) development has dramatically altered the energy portfolio of the United States of America (USA) and is projected to remain a significant component of its energy mix until 2050.1 Unconventional hydrocarbon extraction, which involves directional drilling and hydraulic fracturing, also has a substantial presence in several other countries.2,3 While the industry has generated measurable benefits, concerns about potential water contamination and public health risk persist.4 Such concerns are especially prevalent in areas hosting intense UOG operations where communities heavily rely on shallow aquifers for daily needs. Indeed, UOG-related activities, including site preparation, drilling, hydraulic fracturing, hydrocarbon production, and wastewater handling, have been linked to instances of local surface water and groundwater impairment.5–9

Surface spills of UOG-derived fluids, including hydraulic fracturing fluids and produced waters, have been identified as the most likely pathway for drinking water contamination.10–14 Produced water, which is continuously generated throughout UOG well production, is of particular concern given the current challenges for its proper treatment, reuse, or disposal.15 Previous analyses of these wastewaters have revealed the presence of contaminants with known detrimental effects on human health, including carcinogenicity, endocrine-disrupting activity, and reproductive and developmental toxicity.16,17

Groundwater contamination by UOG spills has been assessed with fate and transport models assuming the source location and properties of the spilled material can be specified. However, UOG spills may consist of complex chemical mixtures of unknown volume and composition for which data on the sorptive and degradative characteristics of the chemical constituents is sparse.18,19 Moreover, large-scale contaminant transport simulations in complex terrain and involving multiple potential sources typical in UOG-affected landscapes may be computationally intractable.20 Previous modeling studies have thus been limited to investigating the fate and transport of hypothetical spill scenarios considering individual compounds and idealized transport simulation domains.21–24 Drinking water contamination risks that are distributed nonuniformly over regional scales of UOG fields, where large numbers of water wells exist in proximity to UOG wells, remain underexplored.

One approach for circumventing these challenges is to employ the concept of vulnerability. Based on the source–pathway–receptor framework, groundwater vulnerability to contamination describes the likelihood of contaminants reaching a specified groundwater receptor after introduction at some known source location.25,26 Several methods have been adopted to operationalize this concept within the context of UOG risk assessment. Variations of the widely used DRASTIC index, which considers groundwater vulnerability as the weighted sum of seven rated parameters (depth to water table, recharge, aquifer media, soil media, topography, impact of vadose zone, and hydraulic conductivity), have been adapted to assess vulnerability in regions targeted for UOG in Canada and South Africa.27–30 Similar index-based methods were developed to evaluate vulnerability to contamination from hydrocarbon extraction operations in the United Kingdom and Spain.31,32 While these index-based methods facilitate rapid large-scale vulnerability assessments, they have been criticized for their dependence on subjective expert judgement in identifying relevant parameters and assigning ratings.33 Physically based modeling approaches are less subjective and are generally regarded as more scientifically defensible, but, as previously mentioned, the application of these models at large scales and high resolutions can be hampered by prohibitive computational costs.25

To reduce the computational burden, hybrid approaches combining index-based methods and physically based modeling in idealized domains have been proposed. Rosales-Ramirez et al. developed a workflow that started by mapping vulnerability across a 162,000 km² region in northeast British Columbia, Canada, based on the D–I–C (water table depth–impact of vadose zone–hydraulic conductivity) parameters from the DRASTIC index.30 The mapped D–I–C combinations were used to specify parameter values of physically based models for the migration of a UOG wastewater spill within a rectangular domain (0.16 km²). Simulated travel times and distances were matched back to the D–I–C zones, thus providing process-based information to the mapped vulnerability index throughout the region. Mallants et al. used a similar approach to map simulated dilution factors for 39 UOG-related chemicals across two large regions (29,000 and 139,000 km²) targeted for shale gas development in Australia.34 Their solute transport simulations were conducted for a 1 km² domain, with parameters estimated from mapped landscape classes and hydrogeologic properties. Both studies assumed that contaminant transport was radially symmetric from the spill site.

An alternative to the aforementioned hybrid approaches is the groundwater well vulnerability assessment framework proposed by Soriano et al., which utilizes the physically based concept of capture probability and actual locations of UOG and drinking water wells.35 In this spatially explicit approach, vulnerability is quantified from the number and locations of UOG contaminant sources inside a water well’s probabilistic capture zone, which is simulated using a groundwater flow and solute transport model. This approach accounts for ambient groundwater flow patterns that govern directions of contaminant transport rather than assuming radially symmetric transport. It also considers the cumulative effects of multiple UOG sources on individual water wells simultaneously. The authors noted the need for high-performance computing resources in their analysis of 316 domestic drinking water wells within a 190 km² watershed in northeast Pennsylvania, USA. While this physics-based approach was computationally demanding, its application to predict groundwater well vulnerability over larger spatial scales (2900 km²) was demonstrated to be feasible through physics-informed machine learning.36

In this paper, we evaluate groundwater vulnerability to contamination from UOG at a regional scale (104,000 km²) and with high resolution (250 m) using physics-based, computationally efficient approaches in the form of ensemble particle tracking and metamodeling. Efficient ensemble generation through iterative ensemble smoothing has recently enabled robust calibration and uncertainty analysis of large-scale physically based groundwater models.37,38 Meanwhile, metamodeling, where machine learning models are trained to learn generalizable input–output relationships from physically based models, has been demonstrated to successfully emulate groundwater models and identified as a promising approach to expedite the process of making realistic predictions in areas outside original model training regions.36,39,40 We illustrate the utility of these novel approaches for quantifying vulnerability in a large, multistate region of the northeastern USA overlying the Marcellus, Utica/Point Pleasant, and Upper Devonian shales where more than 10,000 UOG wells have been completed. In addition, we demonstrate the novel application of vulnerability for estimating the proportions of groundwater-dependent populations that are at risk of UOG contamination. Our analysis provides detailed insights into the spatial distribution and nature of water contamination risks at a regional scale. It also offers a prospective methodology for elucidating the role of groundwater exposure pathways (vs other environmental exposures, e.g., air) in observed associations between residential proximity to UOG and adverse public health outcomes.

Materials and Methods

Physically Based Modeling

The hydrologic model used in this study is based on a two-dimensional (single layer) MODFLOW-6 model of the surficial aquifer system of the coterminous USA.41–43 The surficial aquifer consists of unconfined groundwater systems above the shallowest confining units in areas with unconsolidated deposits as well as weathered and fractured regolith in areas with consolidated rocks. From this national scale groundwater flow model, we reconstructed and reanalyzed a domain covering parts of western Pennsylvania, eastern Ohio, and northern West Virginia (regional model domain in Figure 1), where intensive UOG development of the Upper Devonian, Marcellus, and Utica/Point Pleasant Shale formations continues to occur. Four adjacent HUC4 watersheds comprise the model domain: 0501 (Allegheny), 0502 (Monongahela), 0503 (Upper Ohio), and 0504 (Muskingum) (Figure S1). The 104,000 km² model domain encompasses 91 counties with a population of approximately 6.5 million across the three states. The domain was discretized into a 250 m resolution finite difference grid, corresponding to 1.7 × 10⁶ cells. Domain thickness ranged from 5 to 150 m, with a mean of 67 m. These aquifer thicknesses encompass the average depths of domestic water wells in the region (Figure S2). All MODFLOW inputs were carried over from the national scale model. The groundwater flow model was run at steady state, representing long-term average flow conditions.

Figure 1.

The regional model domain, covering ∼104,000 sq. km of Pennsylvania, Ohio, and West Virginia, delineates the boundaries of the physically based MODFLOW-6/MODPATH-7 grid. Boxes A–H are subdomains where machine learning models were trained to emulate the physically based model.

We recalibrated the regional model using the iterative ensemble smoother, PESTPP-IES, within the PEST++ Version 5 suite.44 The calibration generates an ensemble of hydraulic conductivity parameter fields that allow the model-simulated values to match calibration targets in a Monte Carlo-type evaluation. We implemented two approaches for parametrizing hydraulic conductivity to account for the uncertainties introduced by model parametrization.45 Hydraulic conductivity was parametrized by (a) delineating 55 zones of constant hydraulic conductivity based on the surficial geology of the study area (“zones”) and (b) distributing pilot points in a uniform 2500 m grid with additional points distributed according to guidelines from Doherty et al.,46 for a total of 16,867 pilot points, assigned within the aforementioned zones to account for intrazone subsurface heterogeneity (“ppoints”) (Text S1 and Figure S3). We prescribed 200 realizations in the ensemble for the zones parametrization and 2000 realizations for the ppoints parametrization. Calibration targets included 119 groundwater levels from USGS National Water Information System, as well as 11,647 land surface expressions of the water table interpreted from the National Hydrography Dataset and the National Wetland Inventory.47–49 Starting values for the parameters were derived from the previously published calibration results of Zell and Sanford.41

Through the calibration process, realizations that inflated model-observation mismatch were dropped, and only those that reduced these errors were retained for subsequent analysis. We then performed forward particle tracking using MODPATH-7 for each retained realization of the two parametrization approaches.50 Particles were initialized at UOG well locations with reported spud dates before September 2020 obtained from state databases (11,928 sites corresponding to 4007 unique grid cells).51–53 Pathlines were generated from these particle origins toward their exit point out of the groundwater flow system, delineating the transport paths taken by dissolved contaminants in the system under an advective transport regime. In other words, pathlines indicate where and how far contaminants released from sources are likely to travel based on groundwater flow, which can be used, for example, to guide setback distances needed to protect groundwater receptors. This particle-tracking approach models contamination from spills at the surface of well pads or leaks due to UOG well casing failure in the shallow subsurface. This large, regional scale analysis is limited to aqueous-phase contamination under ambient flow conditions. Multiphase transport, which, for example, occurs when methane is present at sufficiently high levels to partition between free-gas and dissolved phases, is not modeled.

For each grid cell location x, we operationalize vulnerability to contamination from UOG sites as

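$$V(x) = \frac{\text{number of realizations in which a particle track intersects } x}{\text{total number of retained realizations}} \qquad (1)$$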

where the numerator of (1) is the number of realizations in which a particle track intersects location x, while the denominator is the total number of retained realizations in the ensemble. V(x) ranges from 0 to 1, with higher values indicating greater agreement in the ensemble that a grid cell lies along an advective transport path from UOG sites. This definition of vulnerability can be viewed as a form of the forward location probability that describes the likely future position of contaminants in an aquifer within a stochastic ensemble framework.54 Areas with higher vulnerability are at greater risk of being impacted by contaminant releases from UOG sources.
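As a minimal sketch of this definition, the Python snippet below counts, for each grid cell, the share of retained realizations whose particle tracks cross it. The function name `vulnerability`, the integer cell ids, and the toy ensemble are illustrative assumptions, not part of the authors' MODPATH-7 workflow.

```python
def vulnerability(intersections):
    """Fraction of retained realizations whose particle tracks cross each cell.

    `intersections` holds one entry per retained realization; each entry is
    the collection of grid-cell ids crossed by any pathline in that run.
    Cells that are never crossed are omitted, which corresponds to V = 0.
    """
    n_retained = len(intersections)
    if n_retained == 0:
        raise ValueError("need at least one retained realization")
    counts = {}
    for cells in intersections:
        for cell in set(cells):  # count a cell at most once per realization
            counts[cell] = counts.get(cell, 0) + 1
    return {cell: n / n_retained for cell, n in counts.items()}


# Toy ensemble (hypothetical): 4 retained realizations, cells labeled 1-4.
ensemble = [{1, 2, 3}, {1, 2}, {1, 4}, {1, 2, 3}]
V = vulnerability(ensemble)
print(V)
```

In this toy case, cell 1 is crossed in every realization and therefore receives the maximum value of 1, while cell 4 is crossed in only one of the four realizations.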

This approach assumes rapid, vertical transport in the unsaturated zone beneath the well pad. Moreover, the two-dimensional groundwater model assumes predominantly horizontal flow in the aquifer, an approximation that has been shown to successfully preserve overall lateral flow patterns computed by fully 3D models.55,56 The vulnerability analysis focuses on these lateral flow patterns, such that V varies with space (x,y) but not with aquifer depth (z). This simplification is reasonable considering that domestic wells in our study areas are constructed with open boreholes that draw water and solutes from relatively large aquifer intervals. Our approach also assumes that contaminants are transported conservatively, a simplification necessitated by the fact that contaminant releases from UOG operations consist of complex mixtures of often undisclosed chemicals and that site-specific information on physicochemical conditions governing adsorption and biodegradation is unavailable. Collectively, the assumptions describe a worst-case contamination scenario, consistent with the precautionary principle, which places emphasis on protection from serious harm in the face of scientific uncertainty.57 As will be described later in the paper, the zones and ppoints parametrization schemes have very similar postcalibration model performance. Thus, further following the precautionary approach, we integrate their results by retaining the maximum value of V(x) at each grid cell (2).58

$$V_{\max}(x) = \max\left[V_{\text{zones}}(x),\; V_{\text{ppoints}}(x)\right] \qquad (2)$$
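The cell-wise maximum that integrates the two parametrizations can be sketched as follows. This is only an illustration under assumed inputs: `combine_max` and the dictionaries keyed by hypothetical cell ids are not from the paper, and cells missing from a dictionary are treated as having zero vulnerability.

```python
def combine_max(v_zones, v_ppoints):
    """Return the larger of the two vulnerability values at every grid cell."""
    cells = set(v_zones) | set(v_ppoints)
    return {
        cell: max(v_zones.get(cell, 0.0), v_ppoints.get(cell, 0.0))
        for cell in cells
    }


# Hypothetical per-cell vulnerabilities from the two calibrated ensembles.
v_zones = {1: 1.0, 2: 0.4}
v_ppoints = {2: 0.7, 3: 0.05}
v_max = combine_max(v_zones, v_ppoints)
print(sorted(v_max.items()))
```

Taking the maximum rather than the mean keeps any cell flagged by either ensemble, which matches the precautionary intent described in the text.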

Metamodeling

We trained machine learning models to emulate the general behavior of the physically based model in eight subdomains of the regional model (Figure 1 and Table S1). The subdomains represent different combinations of hydrogeologic conditions (e.g., geology and topography) and UOG well density. In each metamodel subdomain, we labeled grid cell x as vulnerable if Vmax(x) ≥ 0.001 and nonvulnerable otherwise. We assembled a catalog of 20 candidate predictor variables at each grid cell centroid. These predictors describe proximity to UOG wells, hydrologic position, and topography (Table S2 and Figure S4). Due to correlation in the predictors, specifically for the proximity metrics, we employed the conditional inference forest algorithm, a variant of the random forest algorithm that implements unbiased recursive partitioning and has been shown to make accurate predictions and robust assessments of variable importance in the presence of correlated predictors.59

The output of the conditional inference forest algorithm for each grid cell is its probability of being classified as vulnerable, P(vulnerable), that is, the probability that Vmax(x) ≥ 0.001. The final classification label is determined from a selected threshold of the probability P, which, by default, is 0.5. In other words, the metamodel labels a grid cell as vulnerable if P(vulnerable) > 0.5 and nonvulnerable otherwise. Metamodel performance was assessed using accuracy at the default 0.5 threshold (“accuracy”) and the area under the receiver operating characteristic curve (“AUC-ROC”), which characterizes model performance across all possible thresholds. AUC-ROC can range from 0.5 to 1, with 1 indicating a model with perfect predictive power and 0.5 indicating a model with no predictive power.60
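The two performance metrics can be illustrated with a short, dependency-free sketch. The authors report using the R package ROCR. The Python functions below are a stand-in written for illustration, with AUC-ROC computed through its rank interpretation (the probability that a randomly chosen vulnerable cell scores higher than a randomly chosen nonvulnerable cell). The labels and probabilities are made-up toy values.

```python
def accuracy(y_true, p_vuln, threshold=0.5):
    """Share of cells whose thresholded prediction matches the true label."""
    preds = [1 if p > threshold else 0 for p in p_vuln]
    return sum(int(a == b) for a, b in zip(y_true, preds)) / len(y_true)


def auc_roc(y_true, p_vuln):
    """AUC-ROC as the fraction of (vulnerable, nonvulnerable) pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, p_vuln) if y == 1]
    neg = [p for y, p in zip(y_true, p_vuln) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy labels (1 = vulnerable) and metamodel probabilities, for illustration.
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]
acc = accuracy(y, p)
auc = auc_roc(y, p)
print(round(acc, 3), round(auc, 3))
```

The pairwise loop is quadratic in the number of cells, so it suits small illustrations only. A rank-based implementation would be preferable for grids with millions of cells.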

For each metamodel subdomain, we used 70% of the data for training and the remaining 30% as a hold-out for internal testing. Five thousand inference trees were used for each conditional forest. We used a systematic variable selection and ranking approach based on backward elimination and fivefold cross validation to prevent overfitting.61–63 Predictors that do not contribute to improving model performance were dropped, and the remaining predictors were ranked based on computed variable importance. We also interpreted the metamodels’ learned dependence between vulnerability and the predictor variables using accumulated local effect plots.64

The internal consistency of the metamodels within their training regions was assessed using the aforementioned 70–30 split. Upon confirming such internal consistency, we then assessed the metamodels’ ability to make accurate predictions of vulnerability outside of their training region under two frameworks: round robin and leave one out. In the round-robin framework, we train metamodels on one subdomain and classify vulnerability at all other subdomains. This framework can represent a situation where time, expertise, and computational resources constrain physically based modeling to a domain with limited spatial coverage, but predictions are desired for many discrete locations distributed over large geographic regions (e.g., Soriano et al.36). In the leave-one-out framework, we train metamodels using seven subdomains and predict vulnerability in the remaining one. This can represent a situation where multiple physically based models covering noncontiguous domains are available, but predictions in previously unmodeled areas over contiguous regions are desired (e.g., Starn et al.39). In both cases, the aim of metamodeling is to facilitate rapid large-scale physically informed assessments as an aid to scientifically defensible decision making.
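The two transferability frameworks differ only in how subdomains are assigned to training and testing, which the sketch below enumerates for the eight subdomains A–H. The function names are illustrative assumptions, and no model is fitted here.

```python
SUBDOMAINS = list("ABCDEFGH")


def round_robin_pairs(subdomains):
    """Train on one subdomain, then test on each of the others in turn."""
    return [
        (train, test)
        for train in subdomains
        for test in subdomains
        if test != train
    ]


def leave_one_out_splits(subdomains):
    """Train on all subdomains except one, then test on the held-out one."""
    return [
        ([s for s in subdomains if s != held_out], held_out)
        for held_out in subdomains
    ]


rr = round_robin_pairs(SUBDOMAINS)
loo = leave_one_out_splits(SUBDOMAINS)
print(len(rr), "round-robin train/test pairs")
print(len(loo), "leave-one-out splits; first:", loo[0])
```

With eight subdomains, the round-robin framework yields 8 × 7 = 56 train/test pairs from eight single-subdomain metamodels, whereas leave-one-out yields eight metamodels, each trained on seven subdomains.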

Predictor variables were quantified using ESRI ArcGIS 10.9. Metamodels were developed using R v.3.6.3 with the packages party, caTools, ROCR, and ALEPlot.65–69

Estimating Vulnerable Populations

Because our definition of vulnerability V(x) ranges from 0 to 1, we can also interpret it as the proportion of the population relying on domestic groundwater wells that are vulnerable to contamination from UOG development activities within a grid cell. We combined the spatially explicit vulnerability predictions from the physically based regional model with estimates of the domestic groundwater well-dependent populations. The domestic groundwater-dependent populations were derived from published estimates using the net housing unit (NHU) method,70 the block group method (BGM),71 and the road-enhanced method (REM).71 We assessed the number of vulnerable populations in comparison to the total number of people using domestic groundwater wells in (a) the entire regional model domain, (b) counties with at least 100 UOG wells, and (c) census blocks with at least one UOG well.72 Note that the concept of vulnerable populations as defined in our work is distinct from other definitions of vulnerable populations, such as in the social sciences, where this term is used to indicate a community’s demographic characteristics and lack of socioeconomic resources that may impede its recovery from hazards.73
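Under this interpretation, the vulnerable population is a vulnerability-weighted sum of the domestic-well population over grid cells. The sketch below shows the arithmetic with hypothetical cell ids and population counts. It does not reproduce the NHU, BGM, or REM estimates.

```python
def vulnerable_population(v, population):
    """Sum V(x) times the domestic-well population of each grid cell."""
    return sum(v.get(cell, 0.0) * people for cell, people in population.items())


# Hypothetical inputs: vulnerability per cell and domestic-well users per cell.
v = {1: 1.0, 2: 0.5, 3: 0.0}
population = {1: 10, 2: 8, 3: 20, 4: 5}

n_vulnerable = vulnerable_population(v, population)
share = n_vulnerable / sum(population.values())
print(n_vulnerable, round(share, 3))
```

Restricting the `population` dictionary to cells inside a given county or census block would give the corresponding subregional totals and shares.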

Results and Discussion

Physically Based Model Performance and Vulnerability Assessment

Model computations of hydraulic head vary with land surface elevation, indicating the influence of topography on groundwater flow patterns (Figure 2a,b), with recharge from uplands generally discharging into adjacent valleys. The ensembles from the zones and the ppoints parametrizations achieved similar levels of agreement between target observations and their simulated equivalents (Figure 2c). The number of realizations retained in the calibrated ensembles was 168 for zones and 1998 for ppoints, indicating the consistency of the current calibration with the previously published work by Zell and Sanford.41 The mean residual in hydraulic head was −0.4 m for zones and −1.3 m for ppoints, suggesting that both parametrizations slightly overestimated observed heads. In terms of absolute value, the mean residual in hydraulic head was 2.2 m for zones and 2.3 m for ppoints. No trends were apparent from the spatial distribution of the residuals (Figure S5).

Figure 2.

Simulated hydraulic heads and histogram of average residuals. Shown are heads for one representative realization of the (a) zones parametrization and (b) ppoints parametrization. (c) Histogram of average residuals (observed − simulated) across retained realizations in the calibrated ensembles.

These calibrated ensembles of the steady-state flowfield were then used for particle tracking simulations from UOG sites (Figure 3). Across all retained realizations of the zones parametrization, the 25th percentile, median, and 75th percentile pathline lengths were 316, 556, and 903 m, while those for ppoints were 362, 686, and 1160 m. The maximum pathline lengths were 9043 and 9192 m for zones and ppoints, respectively. These pathline lengths generally exceed the regulatory setback distances (sbd) of UOG wells from water wells established for Pennsylvania (sbd = 152 m), Ohio (sbd = 15 m), and West Virginia (sbd = 76 m), which suggests that current mandates may require further consideration that could be informed, in part, by computations of our physics-based modeling framework.74–76

Figure 3.

Particle tracking results for metamodel subdomain A (see Figure 1). Particle tracking pathlines originate from UOG well locations and terminate at groundwater discharge cells. Shown are one representative realization of the (a) zones parametrization and (b) ppoints parametrization. (c) Histogram of pathline length in the regional model domain across retained realizations in the calibrated ensembles.

We also explore particle tracking results for a fixed time horizon of 25 years for both the zones and ppoints parametrizations, considering three scenarios: no retardation, a weakly adsorbing contaminant, and a strongly adsorbing contaminant (Figure S6). For the no retardation scenario, 86–90% of the 25-year pathlines exceeded the 152 m Pennsylvania setback distance. The no retardation scenario represents transport of conservative contaminants, such as chloride and bromide, which are known to be elevated in UOG wastewaters and have been used to attribute contamination incidents to UOG.77–79 For the weakly adsorbing scenario, 20–43% of the pathlines exceeded 152 m. For the strongly adsorbing scenario, no pathlines exceeded 152 m. The weakly and strongly adsorbing scenarios were representative of acrylamide (log Koc = 0.55) and bis-2-ethylhexyl phthalate (log Koc = 4.99), respectively, which correspond to the 25th and 75th percentile of organic carbon partition coefficients (Koc) from disclosed hydraulic fracturing compounds.80 These chemicals have been detected in UOG wastewater and domestic groundwater samples near UOG sites.13,81,82 The results demonstrate the variability of transport distances away from UOG sources, depending on the contaminant in consideration. In addition to retardation, biodegradation will also effectively reduce the transport distances of some organic contaminants, and dispersion will lower their concentrations. However, for many UOG-related chemicals, site-specific transport properties and relationships between concentration and toxicity remain unknown,17 such that a precautionary approach remains necessary for assessing contamination and human health risks.

The ensemble of particle tracks (computed for conservative advective transport) was translated into quantitative estimates of vulnerability at each grid cell using (1) and (2) (Figure 4). The maximum vulnerability computed was 1.0, corresponding to grid cells that were intersected by pathlines across all retained realizations. Most of the variability in particle tracks for the zones parametrization occurred at the subgrid scale, with intersected grid cells consistent across realizations. In comparison, greater variability in particle tracks occurred across realizations for the ppoints parametrization. As noted earlier, the similar postcalibration model performance of the zones and ppoints parametrization schemes precludes the preferential selection of one scheme over the other and justifies the use of Vmax as a worst-case, precautionary estimate of vulnerability.

Figure 4.

Vulnerability computed from the ensemble particle tracks for metamodel subdomain A (see Figure 1) using the (a) zones parametrization (Vzones), (b) ppoints parametrization (Vppoints), and (c) maximum value for each grid cell (Vmax).

The model results also reveal stream segments that are vulnerable to contamination because they receive groundwater discharge from flow paths that intercept UOG wells. This information can strengthen empirically based inferences on source attribution of surface water contamination. For example, Agarwal et al.77 used multiple upstream–downstream water samples to attribute elevated downstream levels of chloride to a spill at the Greene-1 UOG site, and our model results reinforce this interpretation by demonstrating that this stream is highly vulnerable to contamination from spills at this location (Figure S7).

Metamodel Evaluation

Metamodels displayed excellent performance within subdomains for the 70% training and 30% internal testing data sets, showing that they were able to extract the desired information from the physically based model and accurately generalize it in their surrounding locales (Table S3). Metamodel performance evaluated across other subdomains also supports their potential transferability to regions with similar hydrogeologic settings. In the round-robin approach, accuracy ranged from 0.62 to 0.87 (mean = 0.75) and AUC-ROC ranged from 0.81 to 0.94 (mean = 0.87), while in the leave-one-out approach, accuracy ranged from 0.71 to 0.86 (mean = 0.79) and AUC-ROC ranged from 0.83 to 0.94 (mean = 0.88) (Table S4). Within the round-robin framework, metamodels trained using larger subdomains generally outperformed those using smaller subdomains. Similarly, metamodels trained under the leave-one-out framework performed better than those trained under the round-robin approach. These findings suggest that better transferability and predictive performance can be attained by metamodels trained with a wider array of physical settings and predictor–outcome combinations, increasing the likelihood that the training will encounter conditions resembling those in other regions. Within the round-robin framework, the metamodel trained on subdomain D attained the highest performance metrics on average, suggesting that this subdomain had the greatest transferability and that it encompassed hydrogeologic conditions and UOG source distribution that most resembled those in the other subdomains. Within the leave-one-out framework, the metamodel trained on subdomains A–G for predicting vulnerability in subdomain H attained the highest performance metrics, suggesting conditions in subdomain H were encompassed by those in the other subdomains.

The high values of AUC-ROC confirm the metamodels’ ability to clearly distinguish vulnerable and nonvulnerable classes across different training–testing data set combinations, although the optimal separation between classes does not necessarily occur at the default 0.5 probability threshold where accuracy is computed (Figure S8). A more appropriate probability threshold may be identified depending on the desired model application, for example, to maximize the sensitivity to classifying vulnerable locations, and such threshold optimization has been suggested in other studies that employed machine learning for classification tasks. For example, Erickson et al. used the default 0.5 threshold for classifying groundwater samples with high manganese concentrations but adopted a lower threshold of 0.2 to optimize model sensitivity for classifying high arsenic samples in their training data.83 In identifying the appropriate threshold, one should also consider the anticipated similarity between the training and testing regions. This is relevant in the context of metamodels trained within one geographic region that are intended for making predictions in other geographic regions. Determining the optimal probability threshold for assigning class labels can be obviated by making inferences outside the training region based directly on the metamodel-predicted probabilities rather than predicted class labels, if suitable for the specific application at hand.83–85 For instance, in our application, grid cells with a true vulnerable label overall have higher metamodel-predicted P(vulnerable) than cells with a true nonvulnerable label across all combinations of the training and testing regions. Thus, we can infer that cells outside the training region that are predicted to have high P(vulnerable) are also likely to have high vulnerability (i.e., Vmax).
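The trade-off behind threshold selection can be seen in a small sketch that recomputes sensitivity and specificity at two thresholds. The labels, probabilities, and function name below are toy assumptions chosen for illustration. The thresholds 0.5 and 0.2 simply mirror the values mentioned in the text.

```python
def sensitivity_specificity(y_true, p_vuln, threshold):
    """Return (sensitivity, specificity) when labeling P > threshold as vulnerable."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, p_vuln):
        predicted = 1 if p > threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (1 = vulnerable) and predicted probabilities, for illustration.
y = [1, 1, 1, 1, 0, 0, 0, 0]
p = [0.9, 0.6, 0.4, 0.3, 0.45, 0.25, 0.15, 0.05]

default = sensitivity_specificity(y, p, 0.5)
lowered = sensitivity_specificity(y, p, 0.2)
print("threshold 0.5:", default)
print("threshold 0.2:", lowered)
```

In this toy case, lowering the threshold catches every vulnerable cell at the cost of more false alarms, which is the kind of trade-off an application focused on protecting receptors might accept.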

Three predictors were consistently ranked by the variable selection algorithm as among the most important for predicting vulnerability: dnrst (distance to the nearest UOG source), d_elev (difference in surface elevation between the nearest UOG source and the receptor), and idups (inverse distance to the nearest upgradient UOG source) (Figure 5 and Table S5). These predictors were also found to be informative for predicting vulnerability in a smaller-scale study in northeast Pennsylvania.36 The most informative predictor of vulnerability was dnrst. This predictor exhibits an inverse relationship with the log odds of being vulnerable, with the effect appearing to plateau at some distance > 1.5 km depending on the data set used for metamodel training. The inverse relationship learned by the metamodels is consistent with the source–pathway–receptor framework, where receptors nearer to contaminant sources are subject to greater contamination risk. The other two top predictors quantify the role of hydrologic connectivity and topography (which influences hydraulic gradients) on vulnerability. That is, receptors are unlikely to be vulnerable to aqueous-phase contamination unless the nearest UOG sources are upgradient based on flow direction (idups > 0) and at a higher elevation (d_elev > 0). The observed consistency in the predictor effects on vulnerability further supports the robustness of the relationships learned by the metamodels and their ability to successfully emulate the physical controls of proximity to sources, topography, and hydrology encapsulated in the physically based models.

Figure 5.

Accumulated local effect plots showing the top predictors of vulnerability from the leave-one-out metamodel analysis. Each curve label indicates the metamodel subdomain left out during training; for example, curve A represents a metamodel trained using data from subdomains B–H and tested for making predictions on subdomain A. The y-axis on these plots represents the natural logarithm of the ratio between the probability of being vulnerable and the probability of being nonvulnerable. The top predictors of vulnerability are (a) dnrst—distance to the nearest UOG source, (b) d_elev—difference in surface elevation between the nearest UOG source and the receptor, and (c) idups—inverse distance to the nearest upgradient UOG source. Note that the x-axis for idups is in log 10 scale; for plotting purposes, we added 0.0001 to all x values such that idups = 0 is shown at idups = 0.0001.
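The leave-one-out scheme described in the caption can be sketched in a few lines of Python. The subdomain labels A–H follow the caption; the function name is hypothetical, and model training and testing are omitted.

```python
def leave_one_subdomain_out(subdomains):
    """Yield (held_out, training) pairs, holding out one subdomain at a time."""
    for held_out in subdomains:
        training = [s for s in subdomains if s != held_out]
        yield held_out, training


subdomains = list("ABCDEFGH")
splits = list(leave_one_subdomain_out(subdomains))

for held_out, training in splits:
    # Each curve in the figure corresponds to one of these splits.
    print(f"curve {held_out}: train on {''.join(training)}, test on {held_out}")
```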

Vulnerable Populations Dependent on Domestic Groundwater

The population served by domestic groundwater wells that are vulnerable to contamination from UOG sources inside the study area was calculated to range from ∼21,000 to ∼30,000 individuals (Tables 1 and S6). Vulnerable populations are colocated with areas with the highest number of UOG sites (Figure S9). The largest vulnerable population estimates correspond to Vmax (as defined in eq 2), while the smallest correspond to Vzones. The vulnerable population is 1.4–2% of the total population dependent on domestic groundwater supplies within the regional model domain, 3.2–4.8% of the population supplied by domestic groundwater in counties with at least 100 UOG wells, and 15.8–20.8% of domestic groundwater-dependent populations in census blocks with at least one UOG well.

Table 1. Estimates of Population Served by Domestic Groundwater Wells That Are Vulnerable to Contamination from UOG Sources Inside the Regional Model Domain (Figure 1)^a

| | NHU | BGM | REM |
| A: vulnerable population in the entire regional model domain | 29,734 | 29,990 | 29,982 |
| B: vulnerable population in counties containing UOG (n ≥ 100) | 17,925 | 17,863 | 17,856 |
| C: vulnerable population in census blocks containing UOG (n ≥ 1) | 12,701 | 14,840 | 14,830 |
| D: population served by domestic groundwater in the entire regional model domain | 1.57 × 10^6 | 1.49 × 10^6 | 1.49 × 10^6 |
| E: population served by domestic groundwater in counties containing UOG (n ≥ 100) | 417,734 | 374,494 | 374,459 |
| F: population served by domestic groundwater in census blocks containing UOG (n ≥ 1) | 61,009 | 75,200 | 75,071 |
| vulnerable population as % of total (A/D) | 1.9% | 2.0% | 2.0% |
| vulnerable population as % of total (B/E) | 4.3% | 4.8% | 4.8% |
| vulnerable population as % of total (C/F) | 20.8% | 19.7% | 19.8% |

^a This table is based on the maximum vulnerability estimate Vmax. Calculations based on Vzones and Vppoints are given in Table S6.
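The percentage rows of Table 1 follow directly from rows A–F. As a worked check, the Python sketch below recomputes them from the tabulated values; the dictionary layout is an assumption for illustration, and row D uses the rounded values printed in the table (three significant figures), so results are reported to one decimal place only.

```python
# Values transcribed from Table 1 (Vmax-based estimates).
table = {
    "NHU": {"A": 29734, "B": 17925, "C": 12701, "D": 1.57e6, "E": 417734, "F": 61009},
    "BGM": {"A": 29990, "B": 17863, "C": 14840, "D": 1.49e6, "E": 374494, "F": 75200},
    "REM": {"A": 29982, "B": 17856, "C": 14830, "D": 1.49e6, "E": 374459, "F": 75071},
}


def percent(numerator, denominator):
    """Return the ratio as a percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


percentages = {
    scheme: {
        "A/D": percent(rows["A"], rows["D"]),
        "B/E": percent(rows["B"], rows["E"]),
        "C/F": percent(rows["C"], rows["F"]),
    }
    for scheme, rows in table.items()
}

for scheme, values in percentages.items():
    print(scheme, values)
```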

A previous study analyzed distances between domestic groundwater wells constructed from 2000 to 2014 and oil and gas wells hydraulically fractured in 2014.86 The analysis reported calculations for 10 counties that were within the regional model domain in the current study (Table S7). The previous study calculated that 12–32.2% of domestic wells in those 10 counties were within 2 km of oil and gas wells hydraulically fractured in 2014, and 1.6–13.3% were within 1 km. In comparison, our study calculated that 1.5–11.3% of the population served by domestic groundwater in those 10 counties was vulnerable to contamination from UOG. Despite including a larger number of UOG sources (all UOG wells drilled up to September 2020 compared to hydraulically fractured wells stimulated in 2014), our study resulted in lower percentages of populations or receptors that may be at risk. This finding supports the notion that while proximity is a primary driver of contamination risk, only a subset of the receptors close to sources would likely be impacted because topography and hydrogeologic connectivity determine the physical transport pathways of dissolved contaminants. Other studies quantifying proximity of populations to UOG operations as a proxy for risk did not directly deal with risks of groundwater contamination. One statewide study in Pennsylvania reported that 9.66% of the 2010 population, that is, 1,177,895 people, was at risk from hydraulic fracturing-related activities, with risk defined as the ratio of UOG well intensity to population intensity for a given area.87 Areas of “high” (≥100 UOG wells per 1000 people) and “very high” (≥1000 UOG wells per 1000 people) risk were clustered in the southwest and northeast portions of the state. Another study that identified 1632 UOG sites from aerial photographs taken throughout Pennsylvania between 2004 and 2010 reported that ∼30% of the mapped sites had more than 100 residents within 3 km.88 A nationwide study quantifying proximity of populations to both conventional and UOG operations estimated that 18 million people across the country were potentially at risk.89 Based on the results of our work, we can infer that the populations specifically at risk of groundwater contamination from UOG are likely lower than previous estimates based only on proximity.

Implications

The assessment of groundwater vulnerability to contamination can facilitate groundwater quality-monitoring efforts, identifying locales where UOG-derived contaminants are likely to be detected, should they be released through spills or leaks in the shallow subsurface.7,13,80,90,91 Such information is crucial given that groundwater quality data are not routinely collected in rural areas where UOG development tends to be most intense, especially in the absence of adequate resources and regulatory monitoring requirements.92

The term “vulnerability” has also been applied in the context of environmental justice concerns arising from disproportionate exposure of certain demographic groups to undesirable UOG impacts.93–95 In this case, socioeconomic characteristics of populations living nearer to UOG are compared to those of populations living farther away. Recent scholarship on environmental justice has identified counties in Appalachia as hotspots for issues of lower water access and degraded water quality relative to other parts of the USA,96 and these inequalities may be exacerbated by the colocation of UOG operations. Exposure has similarly been operationalized using proximity to UOG in public health studies.97 Populations living close to UOG are identified as potentially exposed to UOG-related contaminants, and these exposures are subsequently used to test associations with health outcomes. Living closer to UOG has been found to be associated with higher odds of adverse pregnancy outcomes, cancer incidences, and hospitalization rates, among others.98 The physically based concept of groundwater vulnerability to contamination can elucidate the role of groundwater as a potential exposure route for UOG-derived contaminants. The regional scale assessment framework presented in this paper offers a novel approach for future interdisciplinary research, exploring the intersection of hydrogeologic and socioeconomic vulnerability, environmental justice, and public health.

Acknowledgments

We are grateful for comments from Susan Brantley and an anonymous reviewer that led to improvements in this manuscript. This research was developed under Assistance agreement no. CR839249 awarded by the U.S. Environmental Protection Agency to Yale University. It has not been formally reviewed by EPA. The views expressed in this document are solely those of the authors and do not necessarily reflect those of the Agency. EPA does not endorse any products or commercial services mentioned in this publication. M.A.S. was also supported by the Yale Institute for Biospheric Studies Small Grants Program and the Geological Society of America Graduate Student Research grant no. 13136-21. We thank the Yale Center for Research Computing for use of the high-performance computing infrastructure.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.est.2c00470.

  • Regional model domain characteristics and physically based model performance; additional details for the parametrization schemes; analysis of 25 year particle tracks with retardation scenarios; metamodel subdomain characteristics and metamodel performance; and estimation of populations served by domestic water wells that are vulnerable to contamination (PDF)

Author Present Address

§ High Meadows Environmental Institute, Princeton University, Princeton, NJ 08544, United States

The authors declare no competing financial interest.

Supplementary Material

es2c00470_si_001.pdf (2.3MB, pdf)

References

  1. EIA. Annual Energy Outlook 2021; US Energy Information Administration: Washington, DC, 2021.
  2. Rosa L.; Rulli M. C.; Davis K. F.; D’Odorico P. The Water-Energy Nexus of Hydraulic Fracturing: A Global Hydrologic Analysis for Shale Oil and Gas Extraction. Earth’s Future 2018, 6, 745–756. DOI: 10.1002/2018ef000809.
  3. Zhong C.; Zolfaghari A.; Hou D.; Goss G. G.; Lanoil B. D.; Gehman J.; Tsang D. C. W.; He Y.; Alessi D. S. Comparison of the Hydraulic Fracturing Water Cycle in China and North America: A Critical Review. Environ. Sci. Technol. 2021, 55, 7167–7185. DOI: 10.1021/acs.est.0c06119.
  4. Mayfield E. N.; Cohon J. L.; Muller N. Z.; Azevedo I. M. L.; Robinson A. L. Cumulative environmental and employment impacts of the shale gas boom. Nat. Sustain. 2019, 2, 1122–1131. DOI: 10.1038/s41893-019-0420-1.
  5. EPA. Hydraulic Fracturing for Oil and Gas: Impacts from the Hydraulic Fracturing Water Cycle on Drinking Water Resources in the United States; Office of Research and Development, US Environmental Protection Agency: Washington DC, 2016.
  6. Bonetti P.; Leuz C.; Michelon G. Large-sample evidence on the impact of unconventional oil and gas development on surface waters. Science 2021, 373, 896–902. DOI: 10.1126/science.aaz2185.
  7. Llewellyn G. T.; Dorman F.; Westland J. L.; Yoxtheimer D.; Grieve P.; Sowers T.; Humston-Fulmer E.; Brantley S. L. Evaluating a groundwater supply contamination incident attributed to Marcellus Shale gas development. Proc. Natl. Acad. Sci. U.S.A. 2015, 112, 6325. DOI: 10.1073/pnas.1420279112.
  8. DiGiulio D. C.; Jackson R. B. Impact to Underground Sources of Drinking Water and Domestic Wells from Production Well Stimulation and Completion Practices in the Pavillion, Wyoming, Field. Environ. Sci. Technol. 2016, 50, 4524–4536. DOI: 10.1021/acs.est.5b04970.
  9. Brantley S. L.; Yoxtheimer D.; Arjmand S.; Grieve P.; Vidic R.; Pollak J.; Llewellyn G. T.; Abad J.; Simon C. Water resource impacts during unconventional shale gas development: The Pennsylvania experience. Int. J. Coal Geol. 2014, 126, 140–156. DOI: 10.1016/j.coal.2013.12.017.
  10. Shanafield M.; Cook P. G.; Simmons C. T. Towards Quantifying the Likelihood of Water Resource Impacts from Unconventional Gas Development. Groundwater 2019, 57, 547–561. DOI: 10.1111/gwat.12825.
  11. Vengosh A.; Jackson R. B.; Warner N.; Darrah T. H.; Kondash A. A Critical Review of the Risks to Water Resources from Unconventional Shale Gas Development and Hydraulic Fracturing in the United States. Environ. Sci. Technol. 2014, 48, 8334–8348. DOI: 10.1021/es405118y.
  12. Gross S. A.; Avens H. J.; Banducci A. M.; Sahmel J.; Panko J. M.; Tvermoes B. E. Analysis of BTEX groundwater concentrations from surface spills associated with hydraulic fracturing operations. J. Air Waste Manage. Assoc. 2013, 63, 424–432. DOI: 10.1080/10962247.2012.759166.
  13. Drollette B. D.; Hoelzer K.; Warner N. R.; Darrah T. H.; Karatum O.; O’Connor M. P.; Nelson R. K.; Fernandez L. A.; Reddy C. M.; Vengosh A.; Jackson R. B.; Elsner M.; Plata D. L. Elevated levels of diesel range organic compounds in groundwater near Marcellus gas operations are derived from surface activities. Proc. Natl. Acad. Sci. U.S.A. 2015, 112, 13184. DOI: 10.1073/pnas.1511474112.
  14. Cozzarelli I. M.; Skalak K. J.; Kent D. B.; Engle M. A.; Benthem A.; Mumford A. C.; Haase K.; Farag A.; Harper D.; Nagel S. C.; Iwanowicz L. R.; Orem W. H.; Akob D. M.; Jaeschke J. B.; Galloway J.; Kohler M.; Stoliker D. L.; Jolly G. D. Environmental signatures and effects of an oil and gas wastewater spill in the Williston Basin, North Dakota. Sci. Total Environ. 2017, 579, 1781–1793. DOI: 10.1016/j.scitotenv.2016.11.157.
  15. Robbins C. A.; Du X.; Bradley T. H.; Quinn J. C.; Bandhauer T. M.; Conrad S. A.; Carlson K. H.; Tong T. Beyond treatment technology: Understanding motivations and barriers for wastewater treatment and reuse in unconventional energy production. Resour. Conserv. Recycl. 2022, 177, 106011. DOI: 10.1016/j.resconrec.2021.106011.
  16. Elliott E. G.; Ettinger A. S.; Leaderer B. P.; Bracken M. B.; Deziel N. C. A systematic evaluation of chemicals in hydraulic-fracturing fluids and wastewater for reproductive and developmental toxicity. J. Expo. Sci. Environ. Epidemiol. 2017, 27, 90–99. DOI: 10.1038/jes.2015.81.
  17. Wollin K.-M.; Damm G.; Foth H.; Freyberger A.; Gebel T.; Mangerich A.; Gundert-Remy U.; Partosch F.; Röhl C.; Schupp T.; Hengstler J. G. Critical evaluation of human health risks due to hydraulic fracturing in natural gas and petroleum production. Arch. Toxicol. 2020, 94, 967–1016. DOI: 10.1007/s00204-020-02758-7.
  18. Maloney K. O.; Baruch-Mordo S.; Patterson L. A.; Nicot J.-P.; Entrekin S. A.; Fargione J. E.; Kiesecker J. M.; Konschnik K. E.; Ryan J. N.; Trainor A. M.; Saiers J. E.; Wiseman H. J. Unconventional oil and gas spills: Materials, volumes, and risks to surface waters in four states of the U.S. Sci. Total Environ. 2017, 581-582, 369–377. DOI: 10.1016/j.scitotenv.2016.12.142.
  19. Patterson L. A.; Konschnik K. E.; Wiseman H.; Fargione J.; Maloney K. O.; Kiesecker J.; Nicot J.-P.; Baruch-Mordo S.; Entrekin S.; Trainor A.; Saiers J. E. Unconventional Oil and Gas Spills: Risks, Mitigation Priorities, and State Reporting Requirements. Environ. Sci. Technol. 2017, 51, 2563–2573. DOI: 10.1021/acs.est.6b05749.
  20. Sreekanth J.; Moore C. Novel patch modelling method for efficient simulation and prediction uncertainty analysis of multi-scale groundwater flow and transport processes. J. Hydrol. 2018, 559, 122–135. DOI: 10.1016/j.jhydrol.2018.02.028.
  21. Cai Z.; Li L. How long do natural waters “remember” release incidents of Marcellus Shale waters: a first order approximation using reactive transport modeling. Geochem. Trans. 2016, 17, 6. DOI: 10.1186/s12932-016-0038-4.
  22. Mallants D.; Kirby J.; Golding L.; Apte S.; Williams M. Modelling the attenuation of flowback chemicals for a soil-groundwater pathway from a hypothetical spill accident. Sci. Total Environ. 2022, 806, 150686. DOI: 10.1016/j.scitotenv.2021.150686.
  23. Ma L.; Hurtado A.; Eguilior S.; Llamas Borrajo J. F. Forecasting concentrations of organic chemicals in the vadose zone caused by spills of hydraulic fracturing wastewater. Sci. Total Environ. 2019, 696, 133911. DOI: 10.1016/j.scitotenv.2019.133911.
  24. Shores A.; Laituri M.; Butters G. Produced Water Surface Spills and the Risk for BTEX and Naphthalene Groundwater Contamination. Water, Air, Soil Pollut. 2017, 228, 435. DOI: 10.1007/s11270-017-3618-8.
  25. Focazio M.; Reilly T.; Rupert M.; Helsel D. Assessing Ground-Water Vulnerability to Contamination: Providing Scientifically Defensible Information for Decision Makers; Reston, VA, 2003.
  26. NRC. Ground Water Vulnerability Assessment: Predicting Relative Contamination Potential under Conditions of Uncertainty; The National Academies Press: Washington, DC, 1993; p 224.
  27. Esterhuyse S. Developing a groundwater vulnerability map for unconventional oil and gas extraction: a case study from South Africa. Environ. Earth Sci. 2017, 76, 626. DOI: 10.1007/s12665-017-6961-6.
  28. Holding S.; Allen D. M.; Notte C.; Olewiler N. Enhancing water security in a rapidly developing shale gas region. J. Hydrol. Reg. 2017, 11, 266–277. DOI: 10.1016/j.ejrh.2015.09.005.
  29. Rivard C.; Lavoie D.; Lefebvre R.; Séjourné S.; Lamontagne C.; Duchesne M. An overview of Canadian shale gas production and environmental concerns. Int. J. Coal Geol. 2014, 126, 64–76. DOI: 10.1016/j.coal.2013.12.004.
  30. Rosales-Ramirez T. Y.; Kirste D.; Allen D. M.; Mendoza C. A. Mapping the Vulnerability of Groundwater to Wastewater Spills for Source Water Protection in a Shale Gas Region. Sustainability 2021, 13, 3987. DOI: 10.3390/su13073987.
  31. Loveless S. E.; Lewis M. A.; Bloomfield J. P.; Davey I.; Ward R. S.; Hart A.; Stuart M. E. A method for screening groundwater vulnerability from subsurface hydrocarbon extraction practices. J. Environ. Manage. 2019, 249, 109349. DOI: 10.1016/j.jenvman.2019.109349.
  32. Veiguela M.; Hurtado A.; Eguilior S.; Recreo F.; Roqueñi N.; Loredo J. A risk assessment tool applied to the study of shale gas resources. Sci. Total Environ. 2016, 571, 551–560. DOI: 10.1016/j.scitotenv.2016.07.021.
  33. Wachniew P.; Zurek A. J.; Stumpp C.; Gemitzi A.; Gargini A.; Filippini M.; Rozanski K.; Meeks J.; Kværner J.; Witczak S. Toward operational methods for the assessment of intrinsic groundwater vulnerability: A review. Crit. Rev. Environ. Sci. Technol. 2016, 46, 827–884. DOI: 10.1080/10643389.2016.1160816.
  34. Mallants D.; Doble R.; Beiraghdar Y. Fate and transport modelling framework for assessing risks to soil and groundwater from chemicals accidentally released during surface operations: An Australian example application from shale gas developments. J. Hydrol. 2022, 604, 127271. DOI: 10.1016/j.jhydrol.2021.127271.
  35. Soriano M. A.; Siegel H. G.; Gutchess K. M.; Clark C. J.; Li Y.; Xiong B.; Plata D. L.; Deziel N. C.; Saiers J. E. Evaluating Domestic Well Vulnerability to Contamination From Unconventional Oil and Gas Development Sites. Water Resour. Res. 2020, 56, e2020WR028005. DOI: 10.1029/2020wr028005.
  36. Soriano M. A.; Siegel H. G.; Johnson N. P.; Gutchess K. M.; Xiong B.; Li Y.; Clark C. J.; Plata D. L.; Deziel N. C.; Saiers J. E. Assessment of groundwater well vulnerability to contamination through physics-informed machine learning. Environ. Res. Lett. 2021, 16, 084013. DOI: 10.1088/1748-9326/ac10e0.
  37. Hunt R. J.; White J. T.; Duncan L. L.; Haugh C. J.; Doherty J. Evaluating Lower Computational Burden Approaches for Calibration of Large Environmental Models. Groundwater 2021, 59, 788–798. DOI: 10.1111/gwat.13106.
  38. White J. T. A model-independent iterative ensemble smoother for efficient history-matching and uncertainty quantification in very high dimensions. Environ. Model. Software 2018, 109, 191–201. DOI: 10.1016/j.envsoft.2018.06.009.
  39. Starn J. J.; Kauffman L. J.; Carlson C. S.; Reddy J. E.; Fienen M. N. Three-Dimensional Distribution of Groundwater Residence Time Metrics in the Glaciated United States Using Metamodels Trained on General Numerical Simulation Models. Water Resour. Res. 2021, 57, e2020WR027335. DOI: 10.1029/2020wr027335.
  40. Fienen M. N.; Nolan B. T.; Kauffman L. J.; Feinstein D. T. Metamodeling for Groundwater Age Forecasting in the Lake Michigan Basin. Water Resour. Res. 2018, 54, 4750–4766. DOI: 10.1029/2017wr022387.
  41. Zell W. O.; Sanford W. E. Calibrated Simulation of the Long-Term Average Surficial Groundwater System and Derived Spatial Distributions of its Characteristics for the Contiguous United States. Water Resour. Res. 2020, 56, e2019WR026724. DOI: 10.1029/2019wr026724.
  42. Zell W. O.; Sanford W. E. MODFLOW 6 Models Used to Simulate the Long-Term Average Surficial Groundwater System for the Contiguous United States; U.S. Geological Survey data release, 2020.
  43. Langevin C. D.; Hughes J. D.; Banta E. R.; Niswonger R. G.; Panday S.; Provost A. M. Documentation for the MODFLOW 6 Groundwater Flow Model. 6-A55; U.S. Geological Survey: Reston, VA, 2017.
  44. White J. T.; Hunt R. J.; Fienen M. N.; Doherty J. E. Approaches to Highly Parameterized Inversion: PEST++ Version 5, a Software Suite for Parameter Estimation, Uncertainty Analysis, Management Optimization and Sensitivity Analysis. 7-C26; U.S. Geological Survey: Reston, VA, 2020; p 64.
  45. Knowling M. J.; White J. T.; Moore C. R. Role of model parameterization in risk-based decision support: An empirical exploration. Adv. Water Resour. 2019, 128, 59–73. DOI: 10.1016/j.advwatres.2019.04.010.
  46. Doherty J.; Fienen M. N.; Hunt R. J. Approaches to Highly Parameterized Inversion: Pilot-point Theory, Guidelines, and Research Directions. Scientific Investigations Report 2010-5168; U.S. Geological Survey: Reston, VA, 2010; p 36.
  47. U.S. Fish & Wildlife Service. National Wetlands Inventory. https://www.fws.gov/wetlands/ (accessed Dec 19, 2021).
  48. U.S. Geological Survey. National Water Information System (USGS Water Data for the Nation). https://waterdata.usgs.gov/nwis/ (accessed Dec 19, 2021).
  49. U.S. Geological Survey. National Hydrography Dataset and Watershed Boundary Dataset. https://www.usgs.gov/national-hydrography/access-national-hydrography-products.
  50. Pollock D. W. User Guide for MODPATH Version 7: A Particle-Tracking Model for MODFLOW; 2016-1086; Reston, VA, 2016; p 41.
  51. ODNR. Ohio Oil & Gas Well Database. https://gis.ohiodnr.gov/mapviewer/?config=oilgaswells (accessed Dec 19, 2021).
  52. PADEP. Pennsylvania DEP Office of Oil and Gas Management Spud Data. http://www.depreportingservices.state.pa.us/ReportServer/Pages/ReportViewer.aspx?/Oil_Gas/Spud_External_Data (accessed Dec 19, 2021).
  53. WVDEP. West Virginia Oil & Gas Well Information. https://dep.wv.gov/oil-and-gas/databaseinfo/Pages/GIS-Data-Download-and-Information-Link.aspx (accessed Dec 19, 2021).
  54. Neupauer R.; Wilson J. Forward and backward location probabilities for sorbing solutes in groundwater. Adv. Water Resour. 2004, 27, 689–705. DOI: 10.1016/j.advwatres.2004.05.003.
  55. Anderson M. P.; Woessner W. W.; Hunt R. J. Applied Groundwater Modeling: Simulation of Flow and Advective Transport, 2nd ed.; Academic Press: San Diego, 2015; p 630.
  56. Haitjema H. The Role of Hand Calculations in Ground Water Flow Modeling. Groundwater 2006, 44, 786–791. DOI: 10.1111/j.1745-6584.2006.00189.x.
  57. Evensen D. Ethics and ‘fracking’: a review of (the limited) moral thought on shale gas development. Wiley Interdiscip. Rev.: Water 2016, 3, 575–586. DOI: 10.1002/wat2.1152.
  58. Sousa M. R.; Frind E. O.; Rudolph D. L. An integrated approach for addressing uncertainty in the delineation of groundwater management areas. J. Contam. Hydrol. 2013, 148, 12–24. DOI: 10.1016/j.jconhyd.2013.02.004.
  59. Hothorn T.; Hornik K.; Zeileis A. Unbiased Recursive Partitioning: A Conditional Inference Framework. J. Comput. Graph. Stat. 2006, 15, 651–674. DOI: 10.1198/106186006x133933.
  60. James G.; Witten D.; Hastie T.; Tibshirani R. An Introduction to Statistical Learning with Applications in R; Springer: New York, NY, 2013; p 607.
  61. Hapfelmeier A.; Ulm K. A new variable selection approach using Random Forests. Comput. Stat. Data Anal. 2013, 60, 50–69. DOI: 10.1016/j.csda.2012.09.020.
  62. Jiang H.; Deng Y.; Chen H.-S.; Tao L.; Sha Q.; Chen J.; Tsai C.-J.; Zhang S. Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC Bioinf. 2004, 5, 81. DOI: 10.1186/1471-2105-5-81.
  63. Speiser J. L.; Miller M. E.; Tooze J.; Ip E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 2019, 134, 93–101. DOI: 10.1016/j.eswa.2019.05.028.
  64. Apley D. W.; Zhu J. Visualizing the effects of predictor variables in black box supervised learning models. J. R. Stat. Soc. Series B Stat. Methodol. 2020, 82, 1059–1086. DOI: 10.1111/rssb.12377.
  65. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
  66. Hothorn T.; Hornik K.; Strobl C.; Zeileis A. Party: A Laboratory for Recursive Partytioning, R package version 1.3-9, 2021.
  67. Tuszynski J. caTools: Tools: Moving Window Statistics, GIF, Base64, ROC AUC, Etc, R package version 1.18.2, 2021.
  68. Sing T.; Sander O.; Beerenwinkel N.; Lengauer T.; Unterthiner T.; Ernst F. G. M. ROCR: Visualizing the Performance of Scoring Classifiers, R package version 1.0-11, 2020.
  69. Apley D. W. ALEPlot: Accumulated Local Effects (ALE) Plots and Partial Dependence (PD) Plots, R package version 1.1, 2018.
  70. Murray A.; Hall A.; Weaver J.; Kremer F. Methods for Estimating Locations of Housing Units Served by Private Domestic Wells in the United States Applied to 2010. J. Am. Water Resour. Assoc. 2021, 57, 828–843. DOI: 10.1111/1752-1688.12937.
  71. Johnson T. D.; Belitz K.; Lombard M. A. Estimating domestic well locations and populations served in the contiguous U.S. for years 2000 and 2010. Sci. Total Environ. 2019, 687, 1261–1273. DOI: 10.1016/j.scitotenv.2019.06.036.
  72. Manson S.; Schroeder J.; Van Riper D.; Kugler T.; Ruggles S. IPUMS National Historical Geographic Information System. Version 16.0: Minneapolis, MN, 2021.
  73. Cutter S. L.; Boruff B. J.; Shirley W. L. Social Vulnerability to Environmental Hazards*. Soc. Sci. Q. 2003, 84, 242–261. DOI: 10.1111/1540-6237.8402002.
  74. 58 PA Cons Stat § 3215, 2016, Well location restrictions. https://www.legis.state.pa.us/WU01/LI/LI/CT/HTM/58/00.032.015.000.HTM (accessed Jan 19, 2022).
  75. Ohio Rev Code § 1509.021, 2014, Surface locations of new wells. https://codes.ohio.gov/ohio-revised-code/section-1509.021 (accessed Jan 19, 2022).
  76. WV Code § 22-6A-12, 2015, Well location restrictions. https://code.wvlegislature.gov/22-6A-12/ (accessed Jan 19, 2022).
  77. Agarwal A.; Wen T.; Chen A.; Zhang A. Y.; Niu X.; Zhan X.; Xue L.; Brantley S. L. Assessing Contamination of Stream Networks near Shale Gas Development Using a New Geospatial Tool. Environ. Sci. Technol. 2020, 54, 8632–8639. DOI: 10.1021/acs.est.9b06761.
  78. Shih J.-S.; Saiers J. E.; Anisfeld S. C.; Chu Z.; Muehlenbachs L. A.; Olmstead S. M. Characterization and Analysis of Liquid Waste from Marcellus Shale Gas Development. Environ. Sci. Technol. 2015, 49, 9557–9565. DOI: 10.1021/acs.est.5b01780.
  79. Warner N. R.; Christie C. A.; Jackson R. B.; Vengosh A. Impacts of Shale Gas Wastewater Disposal on Water Quality in Western Pennsylvania. Environ. Sci. Technol. 2013, 47, 11849–11857. DOI: 10.1021/es402165b.
  80. Xiong B.; Soriano M. A.; Gutchess K. M.; Hoffman N.; Clark C. J.; Siegel H. G.; De Vera G. A.; Li Y.; Brenneis R. J.; Cox A. J.; Ryan E. C.; Sumner A. J.; Deziel N. C.; Saiers J. E.; Plata D. L. Groundwaters in northeastern Pennsylvania near intense hydraulic fracturing activities exhibit few organic chemical impacts. Environmental Science: Processes & Impacts 2022, 24, 252–264. DOI: 10.1039/d1em00124h.
  81. Maguire-Boyle S. J.; Barron A. R. Organic compounds in produced waters from shale gas wells. Environmental Science: Processes & Impacts 2014, 16, 2237–2248. DOI: 10.1039/c4em00376d.
  82. Rogers J. D.; Burke T. L.; Osborn S. G.; Ryan J. N. A Framework for Identifying Organic Compounds of Concern in Hydraulic Fracturing Fluids Based on Their Mobility and Persistence in Groundwater. Environ. Sci. Technol. Lett. 2015, 2, 158–164. DOI: 10.1021/acs.estlett.5b00090.
  83. Erickson M. L.; Elliott S. M.; Brown C. J.; Stackelberg P. E.; Ransom K. M.; Reddy J. E.; Cravotta C. A. Machine-Learning Predictions of High Arsenic and High Manganese at Drinking Water Depths of the Glacial Aquifer System, Northern Continental United States. Environ. Sci. Technol. 2021, 55, 5791–5805. DOI: 10.1021/acs.est.0c06740.
  84. Podgorski J.; Berg M. Global threat of arsenic in groundwater. Science 2020, 368, 845. DOI: 10.1126/science.aba1510.
  85. Tesoriero A. J.; Gronberg J. A.; Juckem P. F.; Miller M. P.; Austin B. P. Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification. Water Resour. Res. 2017, 53, 7316–7331. DOI: 10.1002/2016wr020197.
  86. Jasechko S.; Perrone D. Hydraulic fracturing near domestic groundwater wells. Proc. Natl. Acad. Sci. U.S.A. 2017, 114, 13138. DOI: 10.1073/pnas.1701682114.
  87. Meng Q. Spatial analysis of environment and population at risk of natural gas fracking in the state of Pennsylvania, USA. Sci. Total Environ. 2015, 515-516, 198–206. DOI: 10.1016/j.scitotenv.2015.02.030.
  88. Slonecker E. T.; Milheim L. E. Landscape Disturbance from Unconventional and Conventional Oil and Gas Development in the Marcellus Shale Region of Pennsylvania, USA. Environments 2015, 2, 200. DOI: 10.3390/environments2020200.
  89. Czolowski E. D.; Santoro R. L.; Srebotnjak T.; Shonkoff S. B. C. Toward Consistent Methodology to Quantify Populations in Proximity to Oil and Gas Development: A National Spatial Analysis and Review. Environ. Health Perspect. 2017, 125, 086004. DOI: 10.1289/ehp1535.
  90. Barth-Naftilan E.; Sohng J.; Saiers J. E. Methane in groundwater before, during, and after hydraulic fracturing of the Marcellus Shale. Proc. Natl. Acad. Sci. U.S.A. 2018, 115, 6970–6975. DOI: 10.1073/pnas.1720898115.
  91. Li Y.; Thelemaque N. A.; Siegel H. G.; Clark C. J.; Ryan E. C.; Brenneis R. J.; Gutchess K. M.; Soriano M. A.; Xiong B.; Deziel N. C.; Saiers J. E.; Plata D. L. Groundwater Methane in Northeastern Pennsylvania Attributable to Thermogenic Sources and Hydrogeomorphologic Migration Pathways. Environ. Sci. Technol. 2021, 55, 16413–16422. DOI: 10.1021/acs.est.1c05272.
  92. Bondu R.; Kloppmann W.; Naumenko-Dèzes M. O.; Humez P.; Mayer B. Potential Impacts of Shale Gas Development on Inorganic Groundwater Chemistry: Implications for Environmental Baseline Assessment in Shallow Aquifers. Environ. Sci. Technol. 2021, 55, 9657–9671. DOI: 10.1021/acs.est.1c01172.
  93. Ogneva-Himmelberger Y.; Huang L. Spatial distribution of unconventional gas wells and human populations in the Marcellus Shale in the United States: Vulnerability analysis. Appl. Geogr. 2015, 60, 165–174. DOI: 10.1016/j.apgeog.2015.03.011.
  94. Clark C. J.; Warren J. L.; Kadan-Lottick N.; Ma X.; Bell M. L.; Saiers J. E.; Deziel N. C. Community concern and government response: Identifying socio-economic and demographic predictors of oil and gas complaints and drinking water impairments in Pennsylvania. Energy Res. Social Sci. 2021, 76, 102070. DOI: 10.1016/j.erss.2021.102070.
  95. Kroepsch A. C.; Maniloff P. T.; Adgate J. L.; McKenzie L. M.; Dickinson K. L. Environmental Justice in Unconventional Oil and Natural Gas Drilling and Production: A Critical Review and Research Agenda. Environ. Sci. Technol. 2019, 53, 6601–6615. DOI: 10.1021/acs.est.9b00209.
  96. Mueller J. T.; Gasteyer S. The widespread and unjust drinking water and clean water crisis in the United States. Nat. Commun. 2021, 12, 3544. DOI: 10.1038/s41467-021-23898-z.
  97. Clark C. J.; Xiong B.; Soriano M. A. Jr; Gutchess K. M.; Siegel H. G.; Ryan E. C.; Johnson N. P.; Cassell K.; Elliott E. G.; Li Y.; Cox A. J.; Bugher N.; Glist L.; Brenneis R. J.; Sorrentino K. M.; Plano J.; Ma X.; Warren J. L.; Plata D. L.; Saiers J. E.; Deziel N. C. Assessing unconventional oil and gas exposure in the Appalachian Basin: Comparison of exposure surrogates and residential drinking water measurements. Environ. Sci. Technol. 2022, 56, 1091–1103. DOI: 10.1021/acs.est.1c05081.
  98. Deziel N. C.; Brokovich E.; Grotto I.; Clark C. J.; Barnett-Itzhaki Z.; Broday D.; Agay-Shay K. Unconventional oil and gas development and health outcomes: A scoping review of the epidemiological research. Environ. Res. 2020, 182, 109124. DOI: 10.1016/j.envres.2020.109124.

Articles from Environmental Science & Technology are provided here courtesy of American Chemical Society
