PLOS One. 2020 Aug 18;15(8):e0237325. doi: 10.1371/journal.pone.0237325

Exposure assessment of adults living near unconventional oil and natural gas development and reported health symptoms in southwest Pennsylvania, USA

Hannah N Blinn 1,2,*,#, Ryan M Utz 1,#, Lydia H Greiner 2, David R Brown 2
Editor: Min Huang 3
PMCID: PMC7446921  PMID: 32810134

Abstract

Recent research has shown relationships between health outcomes and residential proximity to unconventional oil and natural gas development (UOGD). The challenge of connecting health outcomes to environmental stressors requires ongoing research with new methodological approaches. We investigated UOGD density and well emissions and their association with symptom reporting by residents of southwest Pennsylvania. A retrospective analysis was conducted on 104 unique, de-identified health assessments completed from 2012–2017 by residents living in proximity to UOGD. A novel approach to comparing estimates of exposure was taken. Generalized linear modeling was used to ascertain the relationship between symptom counts and estimated UOGD exposure, while Threshold Indicator Taxa Analysis (TITAN) was used to identify associations between individual symptoms and estimated UOGD exposure. We used three estimates of exposure: cumulative well density (CWD), inverse distance weighting (IDW) of wells, and annual emission concentrations (AEC) from wells within 5 km of respondents’ homes. Using well emissions reported to the Pennsylvania Department of Environmental Protection, an air dispersion and screening model was applied to estimate an emissions concentration at residences. When controlling for age, sex, and smoker status, each exposure estimate predicted total number of reported symptoms (CWD, p<0.001; IDW, p<0.001; AEC, p<0.05). Akaike information criterion values indicated that CWD was the best predictor of adverse health symptoms in our sample. Two groups of symptoms (i.e., eyes, ears, nose, throat; neurological and muscular) constituted 50% of reported symptoms across exposures, suggesting these groupings of symptoms may be more likely to be reported by respondents when UOGD intensity increases. Our results do not confirm that UOGD was the direct cause of the reported symptoms but raise concern about the growing number of wells around residential areas. Our approach presents a novel method of quantifying exposures and relating them to reported health symptoms.

Introduction

Unconventional oil and natural gas development (UOGD) may represent a health risk due to exposure to chemicals used during the hydraulic fracturing process, on-site emissions, and/or a lack of strict regulations [1–4]. The UOGD process involves a combination of horizontal drilling across shale formations and the use of a heterogeneous fracturing fluid injected into wells at high pressure to fracture shale and release trapped oil and gas. Evidence suggesting associations between UOGD activity and adverse health effects has emerged from multiple studies. UOGD activity has been associated with adverse birth outcomes [5–7], increased rates of hospital use [8–10], asthma [11,12], and upper respiratory and neurologic symptoms [13–15]. These studies have used a variety of approaches to estimate exposure to UOGD, including inverse distance weighting (IDW), cumulative well count, cumulative well density (CWD), well activity metrics, spatiotemporal models, and direct water sampling [6–8,13,16,17].

Given the associations between UOGD and adverse health outcomes, but lack of resolution on questions pertaining to safe proximity of residency to wells, we sought to determine which variables related to UOGD are associated with a higher number of reported symptoms. For this study, two proximity metrics and one exposure variable constitute our exposure estimates and are referred to as exposure measures throughout this paper. This study was conducted to address the following questions: 1) Which exposure measure(s) best predicts the number of symptoms reported? and 2) Which individual symptoms are associated with increasing exposure as estimated by each exposure measure? Unlike prior studies, this analysis compares three estimates of exposure: CWD, an IDW measure, and annual emission concentrations (AEC) derived from estimated well emissions within 5 km of a residence. CWD is defined as the count of wells divided by a spatial scale in km² [8], while IDW, a similar measure, weights wells according to distance from a residence [6,7]. The AEC measure used publicly available data on wells to estimate concentrations of emission pollution at a residence. Bamber et al. [18] note that exposure to UOGD is poorly characterized, and this analysis, by comparing three estimates of exposure, attempts to address this concern. Though frequently used proximity and density metrics are included in this analysis, the methodological approach taken here has not been used to model emission concentrations at the home nor to predict symptom outcomes associated with increasing levels of exposure. The use of two methodologies applied here (i.e., statistical modeling to analyze the influence of different exposures on symptom reporting, and a technique to identify specific symptoms that might be indicative of exposure) suggests new techniques for studying relationships between health and exposure.

Materials and methods

Study sites & health outcomes

The Southwest Pennsylvania Environmental Health Project (hereafter referred to as EHP) is a nonprofit public health organization in Washington County, Pennsylvania (PA). Between February 1, 2012 and December 31, 2017, 135 children and adults completed health assessments at EHP. Individuals self-selected and approached EHP because of their concerns about exposure to UOGD. Health data were abstracted as described in Weinberger et al. [19] and the same data were used in this analysis.

As described by Weinberger et al. [19] the 135 de-identified health assessments were reviewed retrospectively by a team of health-care providers, including a board-certified occupational-health physician and at least one nurse practitioner. Records were excluded if the respondent was under 18 years old, worked in the oil-and-gas industry, lived outside of PA, or did not fully complete the assessment form (17 excluded). The remaining 118 health assessments were reviewed. Each symptom recorded in the assessment was reviewed and those symptoms that could be plausibly explained by co-occurring medical conditions, medical history, or work and/or social history were excluded. For this analysis, symptoms that remained were grouped into nine categories: general; lung and heart; skin; eyes, ears, nose, and throat (EENT); gastrointestinal (GI); nerves and muscle; reproductive; blood system; and psychological. For this analysis, we restricted the sample to residents of southwest PA with known latitude and longitude data for their residence (14 individuals excluded). The study population included individuals from eight counties: Washington, Greene, Beaver, Butler, Allegheny, Bedford, Fayette, and Westmoreland (Fig 1). This resulted in a convenience sample of 104 adults. This study was approved by the New England Institutional Review Board and the Chatham University Institutional Review Board.

Fig 1. Study area and active well locations.

Fig 1

Southwestern PA study location and active wells in 2016. No respondents lived in Lawrence County; however, a respondent in Butler County lived near the county border. Map was made with ArcGIS Desktop [20].

Exposure measures

Cumulative well density and inverse distance weighting

Home address was collected at the time of the health assessment. For this analysis, the address was used to determine the latitude and longitude coordinate of the residence of each respondent [21].

The PA Department of Environmental Protection (PA DEP) publishes active well locations and reported emissions on an open-access online portal [22]. The emissions inventory provides well location data in latitude and longitude coordinates and emissions data by pollutant type for each well. For assessments completed between February 1, 2012 and December 31, 2017, ArcGIS ArcMap 10.3 [20] was used to plot the latitude and longitude of each respondent’s residence alongside all active, unconventional wells within a 5-km radius around the residence during that year. A CWD was calculated for each respondent by dividing the number of wells within a 5-km radius of the home by the area of that circle (π × 5² ≈ 78.5 km²).
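The CWD calculation can be illustrated with a minimal Python sketch. The function name and the well count of 40 are hypothetical and chosen only for illustration; the study itself derived well counts in ArcGIS.

```python
import math


def cumulative_well_density(n_wells: int, radius_km: float = 5.0) -> float:
    """Return wells per km^2 inside a circular buffer of the given radius."""
    if radius_km <= 0:
        raise ValueError("radius_km must be positive")
    area_km2 = math.pi * radius_km ** 2  # area of the circular buffer
    return n_wells / area_km2


# Hypothetical residence with 40 active wells inside the 5-km buffer.
cwd = cumulative_well_density(40)
print(f"CWD = {cwd:.3f} wells/km^2")
```

Because the buffer area is constant at a fixed radius, CWD at 5 km is simply the well count rescaled by a constant; the density form matters mainly when comparing radii (1, 2, and 5 km).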

An IDW calculation was also applied as a second method for quantifying exposure intensity. This measure applies more weight to wells located closer to a residence than to those located farther away. The inverse distance of each well within a 5-km radius of a residence was calculated, and those values were summed into one IDW score per residence as shown in the following equation:

$$\mathrm{IDW}_{\mathrm{density}} = \sum_{i=1}^{n} \frac{1}{d_i} \tag{1}$$

where d_i is the distance in kilometers between well i and the respondent’s residence, and n is the number of wells within the 5-km radius [5,13]. For this analysis, only wells located within PA state lines were included in the calculations because data from neighboring states were not available. For four residences, the 5-km radius crossed into neighboring West Virginia. For these sites, the percentages of the buffer area outside of Pennsylvania were 0.6%, 4.4%, 10.7%, and 14.3%.
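As a rough illustration of Eq (1), the sketch below sums inverse distances for wells inside the buffer. It is not the authors' ArcGIS workflow: the haversine great-circle distance, the mean Earth radius of 6371 km, the function names, and the rule of skipping a well at zero distance are all assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idw_score(home, wells, radius_km=5.0):
    """Sum 1/d over wells within radius_km of the home (Eq 1)."""
    total = 0.0
    for lat, lon in wells:
        d = haversine_km(home[0], home[1], lat, lon)
        if 0 < d <= radius_km:  # assumption: ignore wells at zero distance
            total += 1.0 / d
    return total


# Hypothetical home with wells 1 km, 2 km, and 6 km due north.
home = (40.0, -80.0)
km_in_deg = math.degrees(1.0 / EARTH_RADIUS_KM)  # degrees of latitude per km
wells = [(40.0 + k * km_in_deg, -80.0) for k in (1, 2, 6)]
print(round(idw_score(home, wells), 3))
```

The well at 6 km falls outside the buffer and contributes nothing, while the well at 1 km contributes twice as much as the one at 2 km.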

Annual emissions concentration

Annual emissions inventories for 2012 through 2017 were exported from the PA DEP’s database. Sources reported on the emissions inventory included venting and blowdown, dehydration units, drill rigs, stationary engines, pneumatic pumps, fugitive emissions, and emissions produced during the well completion stage. Sources of emissions that are not represented in the inventory include flaring, off-gassing from contaminated water, and truck traffic. A review of the PA DEP’s emissions-inventory data revealed that six compounds had the highest reported quantities, expressed in tons/year: carbon monoxide, nitrogen oxides, particulate matter (PM2.5), aggregated volatile organic compounds (VOCs), methane, and carbon dioxide [22]. To estimate emissions at the residence, we used carbon monoxide, nitrogen oxides, PM2.5, and VOCs because they had known health effects at the expected level of exposure; methane and carbon dioxide did not, so they were excluded despite being among the six most-emitted compounds. For this study, tons/year was converted to grams/hour.
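The unit conversion can be sketched as follows. The text does not state which ton or year length was used; this example assumes a US short ton (907,184.74 g) and a 365-day year (8,760 hours), and the function name is hypothetical.

```python
GRAMS_PER_SHORT_TON = 907_184.74  # assumption: US short ton
HOURS_PER_YEAR = 365 * 24         # assumption: non-leap year, 8,760 hours


def tons_per_year_to_grams_per_hour(tons_per_year: float) -> float:
    """Convert an annual emission total to an average hourly rate."""
    return tons_per_year * GRAMS_PER_SHORT_TON / HOURS_PER_YEAR


# One ton/year corresponds to roughly 100 g/h under these assumptions.
print(f"{tons_per_year_to_grams_per_hour(1.0):.2f} g/h")
```

The conversion spreads annual emissions evenly across every hour of the year, which is the constant-emission assumption noted in the limitations.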

A complete explanation of how concentrations at a residence were estimated can be found in Brown et al. [23] and will briefly be described here. To estimate emissions concentration at a respondent’s residence, an atmospheric dispersion box model was used to determine air dilution downwind from emission sources (wells) and estimate the concentration of compounds at a residence. The model assumes a theoretical box, or volume, of air carries emissions downwind from a well. As the box moves away from the source, the size of the box increases, and the concentration of pollutants is proportionally diluted. The initial concentration is inversely proportional to the rate of speed with which the box moves over the source. The vertical and lateral expansion of the box as it moves downwind is determined by weather and wind speed. This screening model estimates the level of air dilution during dispersion using three parameters: 1) cloud cover, 2) wind speed, and 3) time of day. These parameters are taken from Pasquill [24]. His report identifies six stability classes and five wind speeds that characterize the meteorological conditions that define these classes [25,26]. Using these conditions, we applied hourly cloud cover and wind speed data retrieved from the National Oceanic and Atmospheric Administration (NOAA) for the years 2012 through 2017. To ensure a complete set of weather data for each year of the study, we chose to use data from one major airport in southwest PA, the Pittsburgh Allegheny County Airport in West Mifflin, PA, in the model [27]. We were able to establish hourly conditions over a year and apply the estimates to each residence in our sample, to determine an annual level of exposure for each residence. Estimates of annual average exposures were based on weather patterns for each year over the entire region.
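The dilution logic of a box model can be illustrated with a generic textbook form, in which concentration equals the emission rate divided by the volume of air swept past the source per unit time. This is only a sketch of the general idea and not the screening model of Brown et al. [23]; the function name and the box dimensions are hypothetical, and in the actual model the box growth depends on stability class.

```python
def box_concentration_ug_m3(emission_g_per_h, wind_m_per_s, width_m, height_m):
    """Generic box-model concentration: rate / (wind speed * width * height)."""
    if min(wind_m_per_s, width_m, height_m) <= 0:
        raise ValueError("wind speed and box dimensions must be positive")
    emission_ug_per_s = emission_g_per_h * 1e6 / 3600.0  # g/h -> ug/s
    return emission_ug_per_s / (wind_m_per_s * width_m * height_m)


# Hypothetical box dimensions near a source and farther downwind.
near = box_concentration_ug_m3(300.0, 2.0, 100.0, 50.0)
far = box_concentration_ug_m3(300.0, 2.0, 400.0, 200.0)
print(f"near: {near:.2f} ug/m^3, far: {far:.2f} ug/m^3")
```

Doubling the wind speed halves the concentration, and a larger box downwind dilutes it further, which matches the qualitative behavior described above.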

After our screening model was established, we used the weather data to calculate hourly concentrations from a reference well, assumed to emit 300 grams of a compound per hour, so that emissions from other wells could be expressed relative to a common reference [23]. Once hourly concentrations were computed for the reference case, we calculated a 90th percentile emissions concentration value (μg/m³) for distances of 0.5 km, 1 km, 2 km, 3 km, and 5 km in the four directional quadrants around the reference well. The resulting values represent varying exposure levels experienced at a residence located between 0.5 and 5 km from the reference well. The hourly emissions are assumed proportional to the 300 grams/hour reference. Using the PA DEP data for the year corresponding to the respondent’s health assessment, the emissions of carbon monoxide, nitrogen oxides, PM2.5, and VOCs in grams/hour were summed into one total for each well.
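A 90th percentile of hourly concentrations can be computed in several ways, and the text does not say which definition was used. The sketch below uses the nearest-rank definition as one plausible choice; the function name and the ten sample values are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical hourly concentrations (ug/m^3) at one distance and quadrant.
hourly = [0.2, 0.1, 0.4, 0.3, 0.9, 0.5, 0.7, 0.6, 0.8, 1.0]
print(percentile_nearest_rank(hourly, 90))
```

Using a high percentile rather than the median emphasizes the hours with the least dilution, which is the distinction the Discussion draws with Brown et al. [23].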

Well sites are ubiquitous around residences in these counties, so we used the model to first calculate a residence’s exposure for the four directional quadrants. Within a quadrant, the distance of each well from the residence was determined and, depending on the distance, the corresponding 90th percentile concentration value was assigned to that well. Then, the total emissions from the well, in grams/hour, were multiplied by the 90th percentile concentration value and divided by 300 grams/hour to scale the reference concentration to that well’s emissions. The outputs give μg/m³ per well for each directional quadrant in a 5-km radius. The estimated emission concentrations from each well, across all quadrants, were added together into an annual total exposure value per residence. The total exposure value was used as the AEC measure in the analysis.
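The scaling and summation steps can be sketched as below. The 300 g/h reference rate and the five reference distances come from the text; the reference concentrations in the lookup table are hypothetical placeholders, and the banding rule (use the smallest reference distance that is at least the well's distance) is an assumption, since the text does not specify how distances were matched.

```python
REFERENCE_RATE_G_PER_H = 300.0                      # from the text
REFERENCE_DISTANCES_KM = (0.5, 1.0, 2.0, 3.0, 5.0)  # from the text

# HYPOTHETICAL 90th percentile reference concentrations (ug/m^3) per quadrant,
# one value per reference distance. Real values come from the screening model.
REFERENCE_P90_UG_M3 = {
    "NE": (4.0, 2.0, 1.0, 0.6, 0.3),
    "SE": (3.5, 1.8, 0.9, 0.5, 0.2),
    "SW": (3.0, 1.5, 0.8, 0.5, 0.2),
    "NW": (3.8, 1.9, 1.0, 0.6, 0.3),
}


def reference_p90(quadrant, distance_km):
    """Look up the reference concentration for a well (assumed banding rule)."""
    for ref_d, conc in zip(REFERENCE_DISTANCES_KM, REFERENCE_P90_UG_M3[quadrant]):
        if distance_km <= ref_d:
            return conc
    raise ValueError("well lies outside the 5-km buffer")


def annual_emission_concentration(wells):
    """Sum scaled concentrations over (quadrant, distance_km, g_per_h) wells."""
    total = 0.0
    for quadrant, distance_km, emission_g_per_h in wells:
        scale = emission_g_per_h / REFERENCE_RATE_G_PER_H
        total += scale * reference_p90(quadrant, distance_km)
    return total


# Two hypothetical wells around one residence.
wells = [("NE", 0.8, 600.0), ("SW", 4.0, 150.0)]
print(f"AEC = {annual_emission_concentration(wells):.2f} ug/m^3")
```

A well emitting twice the reference rate contributes twice the reference concentration for its distance band, so AEC grows with both the number of wells and their reported emissions.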

Statistical analysis

All statistical analyses were executed in the R Project for Statistical Computing [28]. Model comparisons were made using glmulti version 1.0.7.1 [29], and TITAN analyses were run with TITAN2 version 2.1 [30].

The analysis consisted of two approaches to address the research questions: generalized linear models (GLMs) to test the association between the number of symptoms reported and the intensity of each exposure, and Threshold Indicator Taxa Analysis (TITAN) to predict which specific symptoms were most likely to be reported with increasing intensity of each exposure measure. Each individual symptom reported in the health assessment was coded as a binary variable per respondent (1 = yes, 0 = no). An alpha level of ≤ 0.05 was used as a threshold for significance in both tests.

Because the dependent variable followed a Poisson distribution, GLMs were used for modeling. For each exposure GLM, a tool was used to automate statistical model selection by generating all possible unique combinations of our demographic variables with each exposure measure to identify the best-fit statistical model for each exposure measure against total number of symptoms. Our demographic variables included age, sex, smoking status, and water source. All demographic variables were included in the selection tool and, by default, 100 potential models were generated a priori to determine the best-fitting models. To choose our model, Akaike information criterion (AIC) values, with a correction for small sample sizes, and the number of terms for each output model were compared [31]. Lower AIC values are associated with simpler models that exclude irrelevant terms, so when comparing models, the model with the lowest AIC is considered optimal [32,33]. For each exposure variable, the model with the lowest or second-lowest AIC score was taken as the best model and then assessed statistically [34]. Interactions between variables were excluded from the best model to increase model parsimony and to explore main effects only. Zero-inflation was not required for our data as only 15% of the sample reported no symptoms. To determine our radius distance around the home, we applied GLM analyses using three spatial scales of cumulative well density: 1, 2, and 5 km. AIC values were used to determine which scale to study.
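The small-sample correction and the comparison step can be sketched as follows. The AICc formula is the standard one, AICc = AIC + 2k(k+1)/(n − k − 1); the function names and the log-likelihood in the example are hypothetical, and the three AIC values are those reported in Table 2. The actual model selection was performed with glmulti in R.

```python
def aicc(log_likelihood, n_params, n_obs):
    """AIC with the standard small-sample correction."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("n_obs must exceed n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def delta_aic(scores):
    """Rank models by AIC and report each model's difference from the best."""
    best = min(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return {name: round(value - best, 2) for name, value in ranked}


# Hypothetical fit: log-likelihood of -385 with 5 parameters and 78 respondents.
print(round(aicc(-385.0, 5, 78), 2))

# AIC values reported in Table 2 for the three exposure models.
print(delta_aic({"CWD": 780.91, "IDW": 803.13, "AEC": 831.95}))
```

The correction term shrinks as the sample grows relative to the number of parameters, so it matters most in samples of this size.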

To assess how individual symptoms were related to changing density (CWD and IDW) and AEC, we applied the TITAN methodology. TITAN is a non-parametric analysis traditionally applied in the ecological sciences, but increasingly applied in environmental science [35], where the presence/absence of a species (also referred to as taxon) among different samples of communities is used to assess nonlinear community-scale responses, both positive and inverse, to changes in their environment. Environmental gradients are used in this process to express how an exposure is increasing in the studied environment. The primary goal in TITAN is to determine if there are levels of exposure along the gradient that influence a statistically significant positive or inverse response and are associated with the presence or absence of one or more specific species. The relationship of each species is assessed via an indicator value that ranges from 0 to 100, with 100 representing a perfect indication of species-specific association with the gradient. The TITAN analysis allows for the consideration of species that have low occurrence frequencies to identify those that possess high sensitivity to the environmental gradient. For example, Khamis et al. used the TITAN methodology to determine how reductions in glacier melting influence the presence and absence of certain aquatic species in rivers and lakes [36–38].
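The indicator value at the core of TITAN can be illustrated for a single candidate split of the gradient. The sketch below computes the Dufrêne-Legendre indicator value (relative abundance × relative frequency × 100) for presence/absence data on each side of one threshold. It is a simplification and not the TITAN2 implementation, which scans many candidate thresholds and uses permutations to obtain z-scores; the function name and the data are hypothetical.

```python
def indval_at_split(gradient, present, threshold):
    """Indicator values (0-100) for the low and high sides of one split.

    gradient: exposure value per respondent
    present:  1 if the symptom was reported by that respondent, else 0
    """
    low = [p for g, p in zip(gradient, present) if g <= threshold]
    high = [p for g, p in zip(gradient, present) if g > threshold]
    if not low or not high:
        raise ValueError("threshold must leave respondents on both sides")
    freq_low = sum(low) / len(low)     # share reporting the symptom, low side
    freq_high = sum(high) / len(high)  # share reporting the symptom, high side
    total = freq_low + freq_high
    if total == 0:
        return 0.0, 0.0
    # For binary data, mean abundance equals frequency, so
    # IndVal = 100 * (freq / total) * freq on each side.
    return 100 * freq_low ** 2 / total, 100 * freq_high ** 2 / total


# Hypothetical symptom reported mostly at higher exposure.
gradient = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
present = [0, 0, 1, 1, 1, 1]
print(indval_at_split(gradient, present, 0.3))
```

A symptom reported by everyone above the split and no one below it would score 100 on the high side, which is the "perfect indication" described above.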

For this study, we defined communities as individual respondents and species as the specific symptoms reported to identify the degree to which each symptom represented a statistically significant indicator of UOGD exposure (CWD, IDW, and AEC). To remove symptoms with frequencies too low to detect a pattern, we only included symptoms reported five or more times into the TITAN analysis (n = 50) [39]. Indicator values were considered statistically significant at an α of 0.05, and resulting symptoms were organized by those having a frequency greater than 10 and a z-score greater than or equal to 1. To our knowledge, this is the first use of TITAN methodology in public health research (S1 Appendix).
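The data preparation described here, binary coding of symptoms per respondent followed by removal of rarely reported symptoms, can be sketched as below. The function name, respondent identifiers, and symptom lists are hypothetical; the default cutoff of five reports follows the text.

```python
from collections import Counter


def presence_absence(reports, min_count=5):
    """Build a respondent-by-symptom 1/0 table, keeping symptoms with enough reports.

    reports: dict mapping respondent id -> list of reported symptoms
    """
    # Count each symptom once per respondent.
    counts = Counter(s for symptoms in reports.values() for s in set(symptoms))
    kept = sorted(s for s, c in counts.items() if c >= min_count)
    matrix = {
        rid: [1 if s in set(symptoms) else 0 for s in kept]
        for rid, symptoms in reports.items()
    }
    return kept, matrix


# Tiny hypothetical example with a cutoff of 2 instead of 5.
reports = {
    "r1": ["cough", "headache"],
    "r2": ["cough"],
    "r3": ["headache", "rash"],
}
kept, matrix = presence_absence(reports, min_count=2)
print(kept, matrix)
```

In TITAN terms, each row of the table is a "community" (respondent) and each retained column is a "species" (symptom).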

Results

Symptom reporting characteristics

In this convenience sample of 104 adults who presented health concerns about UOGD, 59% were female and the median age was 57. In this predominantly rural area, only a third reported using municipal water for household use, with the majority relying on private wells, cisterns, or springs. Smoking status was available for 78 of the 104; of those, 40% reported either current or former smoking. The number of symptoms reported per person ranged from 0 to 36, with a mean of 7 and a standard deviation of 7.7. Table 1 shows the most frequently reported symptoms.

Table 1. Ten most frequently reported symptoms by number and percent of respondents (n = 104).

| Symptom | n | % |
|---|---|---|
| Sore Throat | 34 | 33 |
| Headache | 34 | 33 |
| Difficulty Speaking | 34 | 33 |
| Cough | 32 | 31 |
| Itchy or Burning Eyes | 30 | 29 |
| Stress | 30 | 29 |
| Shortness of Breath/Difficulty Breathing | 26 | 25 |
| Anxiety/Worry | 26 | 25 |
| Fatigue | 21 | 20 |
| Sinus Infection | 20 | 19 |

Generalized linear models: Symptom total

Initial GLMs to test the three spatial scales against symptom total showed that models using 5 km as the radius had the lowest AIC value and were therefore selected in our study (1 km: AIC = 1095.26, 2 km: AIC = 1039.73, 5 km: AIC = 1027.65). Between the three exposure measures, Pearson correlation coefficients ranged from 0.03 to 0.60; thus, all three were tested independently against total reported symptoms. Final GLMs for each exposure measure included sex and smoker status as statistically significant individual predictors, while age was not found to be statistically significant. Sex and smoker status were modeled as categorical variables, while age was treated as continuous. Water source was excluded during the model selection process and was not included in the final models.

When controlling for age, sex, and smoker status the exposure measures produced the following results: CWD, IDW, and AEC predicted total reported symptoms (p<0.001, p<0.001, p<0.05 respectively). Based on comparisons of AIC values, CWD (AIC = 780.91) appeared to be more closely related to adverse health symptom reporting compared to IDW (AIC = 803.13) and AEC (AIC = 831.95; Table 2; Fig 2).

Table 2. GLM model results for each exposure variable against total reported symptoms.

| Model | Variable | Estimate | Std. Error | Z statistic | P value |
|---|---|---|---|---|---|
| CWD | Intercept | 1.339 | 0.257 | 5.220 | <0.001 |
| | Ever Smoked | 0.520 | 0.088 | 5.921 | <0.001 |
| | Sex | 0.486 | 0.094 | 5.156 | <0.001 |
| | CWD | 0.840 | 0.102 | 8.267 | <0.001 |
| | Age | -0.002 | 0.004 | -0.605 | 0.545 |
| | Residual degrees of freedom | 73 | | | |
| | AIC | 780.91 | | | |
| IDW Score | Intercept | 1.407 | 0.253 | 5.563 | <0.001 |
| | Ever Smoked | 0.492 | 0.088 | 5.615 | <0.001 |
| | Sex | 0.487 | 0.094 | 5.184 | <0.001 |
| | IDW Score | 0.015 | 0.002 | 6.245 | <0.001 |
| | Age | -0.002 | 0.004 | -0.461 | 0.645 |
| | Residual degrees of freedom | 73 | | | |
| | AIC | 803.13 | | | |
| AEC | Intercept | 1.508 | 0.250 | 6.029 | <0.001 |
| | Ever Smoked | 0.544 | 0.087 | 6.252 | <0.001 |
| | Sex | 0.550 | 0.094 | 5.855 | <0.001 |
| | AEC | 5.74 × 10⁻⁶ | 2.35 × 10⁻⁶ | 2.444 | <0.05 |
| | Age | -0.003 | 0.004 | -0.758 | 0.449 |
| | Residual degrees of freedom | 73 | | | |
| | AIC | 831.95 | | | |
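One way to read the estimates in Table 2 is as rate ratios. Assuming the models used the canonical log link for Poisson GLMs (the text does not state the link explicitly), exponentiating a coefficient gives the multiplicative change in expected symptom count per one-unit increase in the predictor. The sketch below applies this to the CWD model; the function names are hypothetical, and the Wald interval is a standard approximation rather than a result reported by the authors.

```python
import math


def rate_ratio(estimate):
    """Multiplicative change in expected count per unit increase (log link)."""
    return math.exp(estimate)


def wald_interval(estimate, std_error, z=1.96):
    """Approximate 95% Wald interval for the rate ratio."""
    return math.exp(estimate - z * std_error), math.exp(estimate + z * std_error)


# Estimates and standard errors from the CWD model in Table 2.
cwd_model = {"Ever Smoked": (0.520, 0.088), "Sex": (0.486, 0.094), "CWD": (0.840, 0.102)}
for name, (est, se) in cwd_model.items():
    low, high = wald_interval(est, se)
    print(f"{name}: RR = {rate_ratio(est):.2f} ({low:.2f} to {high:.2f})")
```

Under that assumption, the CWD estimate of 0.840 corresponds to a rate ratio of about 2.3 per additional well per km², holding the other terms constant.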

Fig 2. Exposure model plots.

Fig 2

Poisson distributed generalized linear model for total symptoms and a) CWD, b) IDW score, and c) AEC as the exposure measure. A 95% confidence interval was applied around the regression line.

TITAN analysis

The TITAN analysis identified multiple statistically significant symptoms along gradients of CWD, IDW, and AEC (α ≤ 0.05). The higher the indicator value, the more likely the symptom is to be seen with an increase in exposure. Twenty-two symptoms were associated with the gradient of CWD (Fig 3), with itchy or burning eyes as the strongest positive indicator along the gradient (indicator value = 59.31), followed by stress (indicator value = 47.17) and dry skin (indicator value = 44.44). Headache, difficulty sleeping, sore throat, stress, and itchy or burning eyes were the five most frequent symptoms in this gradient. Of the twenty-two statistically significant symptoms, approximately 27% were categorized as EENT symptoms and a further 27% as nerve and muscle symptoms. Four symptoms were inversely associated with the gradient. Although this is counterintuitive, given that 50 symptoms were assessed along each gradient, one would expect a small number of symptoms to be statistically significantly associated with gradients as type-I errors.

Fig 3. CWD TITAN results.

Fig 3

Individual symptoms by indicator value along the gradient of CWD. Indicator values range 0–100, with 100 being a perfect association with the gradient. Bar width represents symptom frequency.

Twenty-four symptoms were statistically significantly associated with the gradient of IDW (Fig 4), with difficulty sleeping as the strongest positive indicator (indicator value = 46.6), followed by stress (indicator value = 45.58) and headache (indicator value = 37.7), though headache was inversely associated with the gradient. In addition to headache, difficulty speaking and rash were also inversely associated with the gradient. The top five most frequent symptoms were the same as those in the gradient of CWD. Of the twenty-four statistically significant symptoms, approximately 25% were EENT symptoms, 25% were nerves and muscle symptoms, and 17% were psychological symptoms.

Fig 4. IDW TITAN results.

Fig 4

Individual symptoms by indicator value along the gradient of IDW. Indicator values range 0–100, with 100 being a perfect association with the gradient. Bar width represents symptom frequency.

Seventeen symptoms were statistically significantly associated with the gradient of AEC (Fig 5). Difficulty sleeping represented the strongest, positive indicator value (indicator value = 61.58), followed by anxiety/worry (indicator value = 44.29), and depressed mood (indicator value = 37.36) which were both positively associated. Two symptoms were significantly inversely associated with the gradient of AEC. The top five most frequent symptoms of this gradient were: difficulty sleeping, anxiety/worry, cough, stress, and shortness of breath (difficulty breathing). Of the seventeen significant symptoms, roughly 29% were lung and heart symptoms; 29% were psychological.

Fig 5. AEC TITAN results.

Fig 5

Individual symptoms by indicator value along gradient of AEC. Indicator values range 0–100, with 100 being a perfect association with the gradient. Bar width represents symptom frequency.

Discussion

Despite a high degree of inherent complexity in associations between health and UOGD, a growing body of evidence, including our findings, suggests that the impacts of UOGD are heterogeneous and consistently detectable even at distances considered safe by some regulations. The best method for quantifying UOGD intensity from a health standpoint is still unknown; however, we detected links between each exposure measure and total symptoms reported, including effects detected at a farther range (5 km) than reported in other studies [15,19]. UOGD operations vary in size, duration, and the chemicals used, which adds complexity when attempting to relate operations to health symptoms. Discerning other influences on health that are not UOGD related or interact with UOGD in ways that have not yet been studied is an additional challenge. Other environmental stressors compounded with UOGD, or the inclusion of other UOGD infrastructure like pipelines and compressor stations, add to this complexity. Amended IDW metrics, such as the one employed in Koehler et al. [40], attempt to expand IDW by including well development phases to better define exposure. Regardless, the consensus of studies reporting on health impacts around UOGD infrastructure suggests consistency between variables. The aggregate of these analyses suggests that regardless of how exposure to UOGD intensity is quantified, the impacts may occur at broad spatial scales and using distance to just the nearest UOGD facility may underrepresent risks to health.

The method of estimating UOGD intensity appears to affect the strength of associations between exposure and health outcomes in our study, but overall, a positive relationship was found between CWD, IDW, and AEC and total reported health symptoms within a 5-km radius of respondent homes. Brown et al. [23] did not find an association with the median AEC. This apparent inconsistency may be explained by their use of the median AEC, rather than the 90th percentile AEC used in this study.

Our model accounts for variation in the results that may be linked to our demographic variables. By doing so, our model terms related to exposure can account for the weight of UOGD after the variability of our demographic variables has been factored out. Relative to AEC and IDW measures, our findings indicate that CWD in proximity to residences, which constitutes a simpler measure, was more closely linked to total symptom reporting (Fig 2A). Exposure measures like CWD and IDW are considered proximity metrics and do not define an exact exposure pathway from source to residence; however, we hypothesize that adverse health symptoms could occur through inhalation of chemicals in UOGD emissions and that an increase in the density of wells would, together, create an exposure route. Given that both proximity and a better-defined exposure measure of AEC were significant, future studies should explore links between these measures on their own.

Our challenge to predict adverse health symptoms may reflect the general challenge of condensing well operations into a single, simple metric due to variation in each operation. Studies often apply only one metric for exposure, which could potentially overlook effects that may be seen if the measure were more precise and if more detailed UOGD data were readily available. Regardless of our findings, additional inquiries that compare health outcomes associated with exposure magnitude coupled with real-time live air monitoring are needed to determine which measure best quantifies exposure.

Our results also caution against limiting investigations of UOGD impacts on health within symptom categories due to the mixed suite of effects reported by respondents. For example, our model assessing the relationship between total symptoms and IDW, and total symptoms with AEC, suggested relatively limited predictability (Fig 2B & 2C). However, the respective TITAN analyses included nearly as many significant symptom associations compared to the CWD model (24 and 17 statistically significant indicators, respectively). Other studies have limited analyses to symptom categories, which may lead to underreporting of impacts to health across the literature, as individual symptoms have been classified under different categories [13,15,41]. A closer look at category composition in other studies revealed that itchy or burning eyes, sinus pain, fatigue, stress, and anxiety/worry are specific symptoms reported by individuals, consistent with our findings in the TITANs [14,15,42,43]. Psychological symptoms, such as stress and anxiety/worry, were included in the top five symptoms either together or separately in each of our models, with the highest percentage of psychological symptoms found in the gradient of AEC. Studies have found that increased air pollution can be linked to psychological distress, while others have found that increased stress, depression, and anxiety can be experienced by people living in communities with UOGD [14,15,42–44]. Furthermore, Albrecht [45] notes that environmental change can cause human distress, which is supported by Lai [46] who found that negative perceptions of UOGD were associated with negative psychological states. The individual symptom counts increased along exposure gradients (Figs 3–5), suggesting subtler effects when compared to aggregate symptom total (Fig 2).

Our results also caution against emphasizing a single symptom to represent detrimental health in association with UOGD. Given the suite of various chemicals applied in UOGD operations and statistically significant interactions between UOGD exposures and demographic variables as highlighted by our GLMs, substantial weight of evidence is needed to conclude that a single symptom is likely to increase with UOGD intensity. The TITAN analyses identified four, three, and two symptoms that were statistically inversely related to the gradients of CWD, IDW, and AEC, respectively. Regardless of these anomalies, 18 out of 22, 21 out of 24, and 15 out of 17 statistically significant indicator symptoms were positively associated with the gradients of CWD, IDW, and AEC, respectively, which contributes further evidence that UOGD impacts health in a heterogeneous manner.

Limitations & recommendations

As with any work attempting to relate the severity of health impacts to an environmental stressor, our study findings must be considered in the context of the study limitations. Our convenience sample consisted of individuals who presented to EHP because they had concerns about health effects associated with exposure to UOGD, limiting generalizability. Additionally, the health records lacked detailed information about symptom onset, duration, and severity, and about the nature of the symptom (i.e., episodic or chronic), which limits what can be inferred from the symptom data. The health records are also subject to recall bias, with the potential for over-reporting of symptoms, particularly since respondents presented due to concern about health impacts of UOGD. One mitigating factor is that at the time of reporting their symptoms the respondents did not know their records would be reviewed for this study, nor did they know the exposure measures that would be used. Future studies should collect detailed symptom data and exposure measures in real time to address these issues.

A further limitation of our study concerns available exposure data. Not all sources of emissions are included in data released by regulatory agencies, and activities such as flaring, off-gassing from contaminated water, and truck traffic may contribute to total emission rates but are not currently reported [47–49]. In addition, we were limited by available emissions data, which are reported on an annual basis. Some studies suggest that, of the development and production stages, the hydraulic fracturing phase of development and the flowback phase of production account for the highest levels of emissions [3,40,50], and future work should include developing exposure measures that capture and isolate these stages.

The air-and-exposure screening model may also have underestimated actual emission concentrations because the model assumes emissions are constant over a year for all sources and does not factor in varying levels of emissions associated with well development phase. Furthermore, our model treats the trajectory of each well’s emissions plume equally when summed into one AEC value. Future work should factor wind direction into the model to estimate and correct for the role wind direction plays in plume movement and concentration, and thereby improve upon the AEC value. Additionally, the box model does not correct for influences of topography [25], so we could not compare emission concentrations at various elevations. Finally, weather data were taken from only one airport for the entire sample.

Conclusion

This study was unique in its attempt to use an analytical tool taken from ecological research to determine specific symptom sensitivity to changes in CWD, IDW, and AEC from UOGD. The consistency in relationships between UOGD operations, regardless of how UOGD is quantified, and adverse health outcomes across the literature suggests that increases in symptoms could be related to higher exposure to emissions or chemicals used on the well pad [3,5,11,50]. The impact of fracking on health requires ongoing research because of continued industry growth, the relatively young age of the field, and the potential for chronic or latent illness, like cancer or developmental health impacts, to result from long-term exposure [1,51]. Our results do not confirm direct causal links between UOGD exposure and reported symptoms, but they do suggest that living in proximity to wells may be associated with health symptoms. Our findings suggest that an estimation of exposure that relies only on proximity may be simplistic, particularly in communities with increasing density of wells at 5-km scales, and that a deeper understanding of emissions composition and potency at the residence level is warranted. Future research should examine the question of how the aggregation of exposure affects health.

Supporting information

S1 Appendix. TITAN example code and explanation.

Lines 7–13 prepare a sample dataset of twenty potential symptoms and fifty individual respondents to mimic a subset of the data used in this study. For each respondent, each symptom was randomly assigned a 1 or a 0: a 1 means the respondent had that symptom, and a 0 means they did not. The result is a dataset of fifty respondents and the symptoms they did or did not have. Line 16 creates a randomized exposure value for each of the fifty respondents. In our study, each respondent had a measure of cumulative well density (CWD), an inverse distance weighting (IDW) score, and a measure of estimated annual emissions concentration (AEC). The exposure variable created in line 16 ranges from 0 to 50 (no units), with 0 being no exposure and 50 representing high exposure, though in our sample there was no limit to how high an exposure measure could go.

Line 19 uses titan() to run the TITAN analysis, taking the reported symptoms and exposure values to determine whether certain symptoms occur more or less often at different levels of exposure. For example, when the exposure measure reaches 12, the model looks for any symptoms that stand out as occurring more frequently at that exposure level. Indicator values (range 0–100) are used to score each symptom’s relationship to that exposure level, or gradient. A high indicator value shows a strong relationship with the gradient at a certain level. The model then determines whether that relationship is positive or inverse. In ecological studies, one might study how changes in dissolved oxygen (DO) in a pond ecosystem cause certain species to die off or thrive. When a certain species begins to appear in the pond, we can hypothesize that DO has also changed, since that species is an indicator of a certain threshold, or level, of DO.

Lines 22–29 take information from the TITAN analysis and create a table. In this table, each row represents a symptom, while the columns contain the indicator value, the frequency of the symptom, the p-value, whether the symptom is positively or inversely associated with the gradient, and the z-score. Using these parameters, we filter out symptoms that were infrequent (line 25) and can also filter out insignificant symptoms or symptoms with low z-scores (lines 40–41). The latter two filters were applied in our study but did not make sense for this sample data. Lines 34–36 construct the final plot we used to visualize the results of the TITAN analysis. In the plot, there are ten symptoms positively associated with the gradient, with indicator values ranging from 32 to 71. The same goes for the inversely associated symptoms. For the plots in our study, we added further characteristics, such as colors to group symptoms into categories and bar widths to represent the frequency with which each symptom was reported.
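The indicator-value scoring described above can be illustrated with a minimal Python sketch. It is not the R script from this appendix and does not reproduce the TITAN2 package, so the line numbers cited above do not apply to it. It assumes presence/absence symptom data and the standard indicator value (IndVal) definition of specificity × fidelity × 100. The function names `indval` and `best_split` and the `min_split` parameter are hypothetical, chosen only for illustration. The sketch also omits the permutation-based z-scores and bootstrapping that TITAN uses to judge significance.

```python
import random


def indval(present, in_group):
    """Indicator value (0-100) of one symptom for the respondents in a group.

    present  -- list of 0/1, whether each respondent reported the symptom
    in_group -- list of bools, whether each respondent belongs to the group
    """
    n_in = sum(in_group)
    n_out = len(in_group) - n_in
    if n_in == 0 or n_out == 0:
        return 0.0
    freq_in = sum(p for p, g in zip(present, in_group) if g) / n_in
    freq_out = sum(p for p, g in zip(present, in_group) if not g) / n_out
    if freq_in + freq_out == 0:
        return 0.0  # symptom never reported by anyone
    specificity = freq_in / (freq_in + freq_out)  # concentration in the group
    fidelity = freq_in                            # frequency within the group
    return 100.0 * specificity * fidelity


def best_split(exposure, present, min_split=5):
    """Scan candidate change points along the exposure gradient.

    Returns (change_point, indicator_value, direction) for the split with the
    highest indicator value. Simplification: the split is chosen by raw
    indicator value, with no permutation test.
    """
    order = sorted(range(len(exposure)), key=lambda i: exposure[i])
    xs = [exposure[i] for i in order]
    ys = [present[i] for i in order]
    best = (None, 0.0, None)
    for k in range(min_split, len(xs) - min_split + 1):
        if xs[k] == xs[k - 1]:
            continue  # cannot split between tied exposure values
        above = [i >= k for i in range(len(xs))]
        below = [not a for a in above]
        iv_pos = indval(ys, above)  # symptom more common at high exposure
        iv_neg = indval(ys, below)  # symptom more common at low exposure
        if iv_pos >= iv_neg:
            iv, direction = iv_pos, "positive"
        else:
            iv, direction = iv_neg, "inverse"
        if iv > best[1]:
            best = ((xs[k - 1] + xs[k]) / 2, iv, direction)
    return best


if __name__ == "__main__":
    # Random toy data loosely mirroring the appendix: fifty respondents,
    # unitless exposure between 0 and 50, a few random 0/1 symptoms.
    rng = random.Random(1)
    exposure = [rng.uniform(0, 50) for _ in range(50)]
    for s in range(3):
        present = [rng.randint(0, 1) for _ in range(50)]
        print(f"symptom {s + 1}:", best_split(exposure, present))
```

In this formulation, a symptom reported only by respondents above a given exposure level scores 100 with a positive direction, whereas a symptom reported at the same frequency on both sides of a split cannot score above 50 at that split.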

(R)

Acknowledgments

We thank Dr. Melissa Bednarek, PT, DPT, PhD, CCS, and Luke Curtis for proofreading.

Data Availability

Gas well location and emissions data are hosted on a PowerBI report controlled by the PA Department of Environmental Protection. To view only gas well data, filter by Facility Type. We additionally filtered by year, county, and pollutant as described in our methods. Data can then be exported to a .csv file: http://www.depgreenport.state.pa.us/powerbiproxy/powerbi/Public/DEP/AQ/PBI/Air_Emissions_Report

Climate data were retrieved from NOAA's local climatological database. To use the tool, select the state and county where the airport is located. We used data from the Pittsburgh Allegheny County Airport in Allegheny County, PA. Once the airport has been added to your cart, you can choose the date range you wish to download and request a .csv of the data: https://www.ncdc.noaa.gov/cdo-web/datatools/lcd

Health data cannot be shared publicly because some of the data we collect come from rural areas with sparse population, where it may be possible to identify participants using data such as GIS coding. Data are available from the Environmental Health Project Institutional Data Access / Ethics Committee (contact via Environmental Health Project, Sarah Rankin 724.260.5504) for researchers who meet the criteria for access to confidential data.

Funding Statement

The positions of DB and LG at the Southwest PA Environmental Health Project are funded by the Heinz Endowments E5450. The funders did not play a role in this study's design, analysis, decision to publish, or preparation of the manuscript. Their funding was used prior to this study, when the data were being collected. This study is a retrospective review of those data. HB and RU did not receive funding for this project.

References

  • 1. Elliott EG, Trinh P, Ma X, Leaderer BP, Ward MH, Deziel NC. Unconventional oil and gas development and risk of childhood leukemia: Assessing the evidence. Science of the Total Environment. 2017. January 15;576:138–47. doi: 10.1016/j.scitotenv.2016.10.072
  • 2. Shonkoff SB, Hays J, Finkel ML. Environmental public health dimensions of shale and tight gas development. Environmental health perspectives. 2014. April 16;122(8):787–95. doi: 10.1289/ehp.1307866
  • 3. McKenzie LM, Witter RZ, Newman LS, Adgate JL. Human health risk assessment of air emissions from development of unconventional natural gas resources. Science of the Total Environment. 2012. May 1;424:79–87. doi: 10.1016/j.scitotenv.2012.02.018
  • 4. Colborn T, Kwiatkowski C, Schultz K, Bachran M. Natural gas operations from a public health perspective. Human and ecological risk assessment: An International Journal. 2011. September 1;17(5):1039–56.
  • 5. Stacy SL, Brink LL, Larkin JC, Sadovsky Y, Goldstein BD, Pitt BR, et al. Perinatal outcomes and unconventional natural gas operations in Southwest Pennsylvania. PloS one. 2015. June 3;10(6):e0126425. doi: 10.1371/journal.pone.0126425
  • 6. Casey JA, Savitz DA, Rasmussen SG, Ogburn EL, Pollak J, Mercer DG, et al. Unconventional natural gas development and birth outcomes in Pennsylvania, USA. Epidemiology (Cambridge, Mass.). 2016. March;27(2):163.
  • 7. McKenzie LM, Guo R, Witter RZ, Savitz DA, Newman LS, Adgate JL. Birth outcomes and maternal residential proximity to natural gas development in rural Colorado. Environmental health perspectives. 2014. January 28;122(4):412–7. doi: 10.1289/ehp.1306722
  • 8. Denham A, Willis M, Zavez A, Hill E. Unconventional natural gas development and hospitalizations: evidence from Pennsylvania, United States, 2003–2014. Public health. 2019. March 1;168:17–25. doi: 10.1016/j.puhe.2018.11.020
  • 9. Peng L, Meyerhoefer C, Chou SY. The health implications of unconventional natural gas development in Pennsylvania. Health economics. 2018. June;27(6):956–83. doi: 10.1002/hec.3649
  • 10. Jemielita T, Gerton GL, Neidell M, Chillrud S, Yan B, Stute M, et al. Unconventional gas and oil drilling is associated with increased hospital utilization rates. PloS one. 2015. July 15;10(7):e0131093. doi: 10.1371/journal.pone.0131093
  • 11. Rasmussen SG, Ogburn EL, McCormack M, Casey JA, Bandeen-Roche K, Mercer DG, et al. Association between unconventional natural gas development in the Marcellus Shale and asthma exacerbations. JAMA internal medicine. 2016. September 1;176(9):1334–43. doi: 10.1001/jamainternmed.2016.2436
  • 12. Willis MD, Jusko TA, Halterman JS, Hill EL. Unconventional natural gas development and pediatric asthma hospitalizations in Pennsylvania. Environmental research. 2018. October 1;166:402–8. doi: 10.1016/j.envres.2018.06.022
  • 13. Elliott EG, Ma X, Leaderer BP, McKay LA, Pedersen CJ, Wang C, et al. A community-based evaluation of proximity to unconventional oil and gas wells, drinking water contaminants, and health symptoms in Ohio. Environmental research. 2018. November 1;167:550–7. doi: 10.1016/j.envres.2018.08.022
  • 14. Tustin AW, Hirsch AG, Rasmussen SG, Casey JA, Bandeen-Roche K, Schwartz BS. Associations between unconventional natural gas development and nasal and sinus, migraine headache, and fatigue symptoms in Pennsylvania. Environmental health perspectives. 2016. August 25;125(2):189–97. doi: 10.1289/EHP281
  • 15. Rabinowitz PM, Slizovskiy IB, Lamers V, Trufan SJ, Holford TR, Dziura JD, et al. Proximity to natural gas wells and reported health status: results of a household survey in Washington County, Pennsylvania. Environmental health perspectives. 2014. September 10;123(1):21–6. doi: 10.1289/ehp.1307732
  • 16. Wendt Hess J, Bachler G, Momin F, Sexton K. Assessing Agreement in Exposure Classification between Proximity-Based Metrics and Air Monitoring Data in Epidemiology Studies of Unconventional Resource Development. International journal of environmental research and public health. 2019. January;16(17):3055.
  • 17. Allshouse WB, Adgate JL, Blair BD, McKenzie LM. Spatiotemporal industrial activity model for estimating the intensity of oil and gas operations in Colorado. Environmental science & technology. 2017. September 5;51(17):10243–50.
  • 18. Bamber AM, Hasanali SH, Nair AS, Watkins SM, Vigil DI, Van Dyke M, et al. A systematic review of the epidemiologic literature assessing health outcomes in populations living near oil and natural gas operations: Study quality and future recommendations. International journal of environmental research and public health. 2019. January;16(12):2123.
  • 19. Weinberger B, Greiner LH, Walleigh L, Brown D. Health symptoms in residents living near shale gas activity: A retrospective record review from the Environmental Health Project. Preventive medicine reports. 2017. December 1;8:112–5. doi: 10.1016/j.pmedr.2017.09.002
  • 20. Environmental Systems Research Institute (ESRI). ArcGIS Release 10.3 [software]. 2014 [cited 2018 Nov 28]; Available from https://www.esri.com/en-us/home
  • 21. LatLong.net. Get latitude and longitude [Internet]. 2019 [cited 2018 Nov 28]; Available from https://www.latlong.net/
  • 22. PA Department of Environmental Protection (PA DEP); 2019 [cited 2018 Nov 28]. Database: Bureau of air quality: Air emissions report [Internet]. Available from http://www.depgreenport.state.pa.us/powerbiproxy/powerbi/Public/DEP/AQ/PBI/Air_Emissions_Report
  • 23. Brown DR, Greiner LH, Weinberger BI, Walleigh L, Glaser D. Assessing exposure to unconventional natural gas development: using an air pollution dispersal screening model to predict new-onset respiratory symptoms. Journal of Environmental Science and Health, Part A. 2019. August 26:1–7.
  • 24. Pasquill F. Atmospheric Diffusion: The Dispersion of Windborne Material from Industrial and other Sources. Chichester: Ellis Horwood Limited.
  • 25. Brown DR, Lewis C, Weinberger BI. Human exposure to unconventional natural gas development: a public health demonstration of periodic high exposure to chemical mixtures in ambient air. Journal of Environmental Science and Health, Part A. 2015. April 16;50(5):460–72.
  • 26. Leelőssy Á, Molnár F, Izsák F, Havasi Á, Lagzi I, Mészáros R. Dispersion modeling of air pollutants in the atmosphere: a review. Open Geosciences. 2014. September 1;6(3):257–78.
  • 27. National Oceanic and Atmospheric Administration (NOAA) National Climatic Data Center; 2005 [cited 2018 Nov 28]. Database: Local climatological data [Internet]. Available from https://www.ncdc.noaa.gov/cdo-web/datatools/lcd
  • 28. R Core Team. R: A language and environment for statistical computing. R for Windows 3.5.3, 2018 [cited 2018 Oct 1]; Available from https://www.R-project.org
  • 29. Calcagno V. Package ‘glmulti’ [Internet]. 2019 Apr 14 [cited 2018 Dec 15]; Available from https://cran.r-project.org/web/packages/glmulti/glmulti.pdf
  • 30. Baker ME, King RS, Kahle D. Glades. TITAN [Internet]. 2019 Aug 28 [cited 2018 Jan 5]; Available from https://cran.r-project.org/web/packages/TITAN2/TITAN2.pdf
  • 31. Hurvich CM, Tsai CL. Regression and time series model selection in small samples. Biometrika. 1989. June 1;76(2):297–307.
  • 32. Heinze G, Wallisch C, Dunkler D. Variable selection–a review and recommendations for the practicing statistician. Biometrical Journal. 2018. May;60(3):431–49. doi: 10.1002/bimj.201700067
  • 33. Dziak JJ, Coffman DL, Lanza ST, Li R, Jermiin LS. Sensitivity and specificity of information criteria. bioRxiv. 2019. January 1:449751.
  • 34. Calcagno V, de Mazancourt C. glmulti: an R package for easy automated model selection with (generalized) linear models. Journal of statistical software. 2010. May 31;34(12):1–29.
  • 35. Qian SS. Environmental and ecological statistics with R. Chapman and Hall/CRC; 2016. November 3.
  • 36. Khamis K, Hannah DM, Brown LE, Tiberti R, Milner AM. The use of invertebrates as indicators of environmental change in alpine rivers and lakes. Science of the Total Environment. 2014. September 15;493:1242–54. doi: 10.1016/j.scitotenv.2014.02.126
  • 37. Cardoso P, Rigal F, Fattorini S, Terzopoulou S, Borges PA. Integrating landscape disturbance and indicator species in conservation studies. PloS one. 2013. May 1;8(5):e63294. doi: 10.1371/journal.pone.0063294
  • 38. Baker ME, King RS. A new method for detecting and interpreting biodiversity and ecological community thresholds. Methods in Ecology and Evolution. 2010. March 1;1(1):25–37.
  • 39. King RS, Baker ME. Use, misuse, and limitations of Threshold Indicator Taxa Analysis (TITAN) for natural resource management. In Application of threshold concepts in natural resource decision making 2014. (pp. 231–254). Springer, New York, NY.
  • 40. Koehler K, Ellis JH, Casey JA, Manthos D, Bandeen-Roche K, Platt R, et al. Exposure assessment using secondary data sources in unconventional natural gas development and health studies. Environmental science & technology. 2018. April 26;52(10):6061–9.
  • 41. Steinzor N, Subra W, Sumi L. Investigating links between shale gas development and health impacts through a community survey project in Pennsylvania. NEW SOLUTIONS: A Journal of Environmental and Occupational Health Policy. 2013;23(1):55–83.
  • 42. Hirsch JK, Smalley KB, Selby-Nelson EM, Hamel-Lambert JM, Rosmann MR, Barnes TA, et al. Psychosocial impact of fracking: a review of the literature on the mental health consequences of hydraulic fracturing. International Journal of Mental Health and Addiction. 2018. February 1;16(1):1–5.
  • 43. Ferrar KJ, Kriesky J, Christen CL, Marshall LP, Malone SL, Sharma RK, et al. Assessment and longitudinal analysis of health impacts and stressors perceived to result from unconventional shale gas development in the Marcellus Shale region. International journal of occupational and environmental health. 2013. June 1;19(2):104–12. doi: 10.1179/2049396713Y.0000000024
  • 44. Sass V, Kravitz-Wirtz N, Karceski SM, Hajat A, Crowder K, Takeuchi D. The effects of air pollution on individual psychological distress. Health & place. 2017. November 1;48:72–9.
  • 45. Albrecht G, Sartore GM, Connor L, Higginbotham N, Freeman S, Kelly B, et al. Solastalgia: the distress caused by environmental change. Australasian psychiatry. 2007. January 1;15(sup1):S95–8.
  • 46. Lai PH, Lyons KD, Gudergan SP, Grimstad S. Understanding the psychological impact of unconventional gas developments in affected communities. Energy Policy. 2017. February 1;101:492–501.
  • 47. Garcia-Gonzales DA, Shonkoff SB, Hays J, Jerrett M. Hazardous air pollutants associated with upstream oil and natural gas development: a critical synthesis of current peer-reviewed literature. Annual review of public health. 2019. April 1;40:283–304. doi: 10.1146/annurev-publhealth-040218-043715
  • 48. Macey GP, Breech R, Chernaik M, Cox C, Larson D, Thomas D, et al. Air concentrations of volatile compounds near oil and gas production: a community-based exploratory study. Environmental Health. 2014. December;13(1):82.
  • 49. McCawley MA. Does increased traffic flow around unconventional resource development activities represent the major respiratory hazard to neighboring communities?: Knowns and unknowns. Current opinion in pulmonary medicine. 2017. March 1;23(2):161–6. doi: 10.1097/MCP.0000000000000361
  • 50. McCawley M. Air contaminants associated with potential respiratory effects from unconventional resource development activities. In Seminars in respiratory and critical care medicine 2015. June (Vol. 36, No. 03, pp. 379–387). Thieme Medical Publishers.
  • 51. Webb E, Bushkin-Bedient S, Cheng A, Kassotis CD, Balise V, Nagel SC. Developmental and reproductive effects of chemicals associated with unconventional oil and natural gas operations. Reviews on Environmental Health. 2014. December 6;29(4):307–18. doi: 10.1515/reveh-2014-0057

Decision Letter 0

Min Huang

21 Jan 2020

PONE-D-19-34629

Exposure assessment of adults living near unconventional oil and natural gas development and reported health symptoms in southwest Pennsylvania, USA

PLOS ONE

Dear Ms. Blinn,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

While all of the reviewers appreciate the ideas presented by the authors, more than one of them also raised some major concerns. I'd like to give the authors a chance to address the reviewers' comments.

We would appreciate receiving your revised manuscript by Mar 06 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Min Huang

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is a very interesting and well-written paper using a technique that is somewhat new for addressing the subject. Given that much has been written on the topic, I would suggest stressing your use of statistical modeling to try to better understand the relationships. While you use a convenience sample, the focus should be on testing the model. You acknowledge the limitations of your study, which is refreshing and helpful to the reader.

Reviewer #2: This manuscript looks at the relationship between three UOGD exposure metrics and symptoms among a population of 135 individuals in SW Pennsylvania who approached the Environmental Health Project given their concerns about UOGD. The study adds new techniques, including TITAN borrowed from ecology. They also estimate exposure to UOGD emissions at participant residence, which has been done rarely in the UOGD literature. Poisson models are used to assess associations between the exposure metrics and symptom counts. Models are inappropriately adjusted given the very small sample size and therefore extrapolate beyond the support of the data. Further, the study far oversteps its results in the discussion section. I provide some detailed comments below:

Major

1. The Hess et al. 2019 study had major flaws and I urge you not to cite it. If you want to continue citing it, please describe many of its limitations as discussed by Buonocore et al in their letter to the editor published this month. Koehler et al. also discuss exposure metrics in ES&T in 2018.

2. The number of excluded surveys is very high (~48%). It would be helpful to break this down by reason for excluding the survey, e.g., did most exclusions occur because the participant did not fully fill out the survey or because they lived out of state? If most surveys were excluded because of incompletion, could you consider doing a sensitivity analysis with this excluded group?

3. I am confused about the first metric. “A cumulative well density was calculated per respondent for the year their survey was completed by taking the total number of wells divided by 5 km.” Wouldn’t you either count the total number of wells within 5 km and use that as the exposure, or divide the total number by pi*5km*5km (true density of wells)? I can’t think of a time you would divide by 5 km.

4. Please explain why only a single station was used to derive wind speed and direction. This seems to be a major limitation of the model, given that the study area is over 100km wide. Wind could be dramatically different across this area. If you have reason to believe otherwise, please include. Also, why were emissions of 4 pollutants summed when they can have quite different health effects?

5. What is a model selection and averaging tool? Which demographic variables were considered? In addition, you have a relatively small sample size (N = 135) and it appears you run double stratified models (sex and smoking). This means some models have <33 observations. Given a rule of thumb to have 15 observations for each independent variable, you could really only include the exposure and a single confounding variable in these models. Otherwise, you are extrapolating far beyond your data. Therefore, I am not confident in the current modeling strategy. Further, did symptoms really follow a Poisson distribution or was 0-inflation required?

6. I commend the authors for trying something new, but the TITAN analysis is fairly confusing. Consider adding more information to the appendix on why this strategy was selected and how it was implemented, perhaps with a toy example.

7. Lines 327-330: this interpretation of the findings is extremely strong. You have a highly selected sample (people that were worried about UOGD) and a tiny sample size (<200 people). Claiming you have identified which metric to use is over-interpreting your results. Same issue lines 357-360. Tustin et al. had a much larger sample of people who did not enter the study based on UOGD concerns. Therefore, you are asking/answering vastly different questions.

Minor

• Line 69: should this read ambient air pollution?

• Figure 1 add year of active wells to the figure caption

• Please upload higher res figures, these are very difficult to read

• Figure 3: why are bars different widths? If this indicates something, please make clearer what

• Lines 134-136, you say you don’t have data on wells outside PA but then contradict yourself by saying four residences have wells outside PA. If you know about these wells, why not include?

• Line 163: and to be consistent with your other exposure metrics using 5km?

• “An alpha level of <= 0.5 was used as a threshold for significance in both tests” do you mean 0.05?

• First line of results: it wasn’t just a convenience sample but a sample of people reporting issues with UOGD, right?

• What does median age of +/-1 mean?

• Smoking status was never mentioned in methods, please add.

• Results: what was total number of symptoms queried?

• Table 1: how many people included in this table?

• Line 238: which correlation coefficient used?

• How was age modeled? Continuous? What other demographics were considered?

• Lines 316-317 see Koehler et al. 2018 re: compressors

Reviewer #3: Summary of Article

This study uses voluntary health surveys (taken across six years) and data on oil-and-gas (O&G) well locations and annual emissions to identify correlations between reported...[SEE ATTACHMENT FOR FULL COMMENTS]

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Chris Holder

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]


Attachment

Submitted filename: Review of PONE-D-19-34629.pdf

PLoS One. 2020 Aug 18;15(8):e0237325. doi: 10.1371/journal.pone.0237325.r002

Author response to Decision Letter 0


13 Mar 2020

This response is easier to read in the Response Letter submitted as a document.

Columns: Comments | Response and Action Taken to Comments | Line numbers in revised manuscript where action was taken

Reviewer 1

Overall Comments

This is a very interesting and well written paper using a technique that is somewhat new for addressing the subject. Given that much has been written on the topic, I would suggest stressing your use of statistical modeling to try to better understand the relationships. While you use a convenience sample, the focus should be on testing the model. You acknowledge the limitations of your study, which is refreshing and helpful to the reader. This has been addressed. 57-78

Reviewer 2

Overall Comments

This manuscript looks at the relationship between three UOGD exposure metrics and symptoms among a population of 135 individuals in SW Pennsylvania who approached the Environmental Health Project given their concerns about UOGD. The study adds new techniques, including TITAN borrowed from ecology. They also estimate exposure to UOGD emissions at participant residence, which has been done rarely in the UOGD literature. Poisson models are used to assess associations between the exposure metrics and symptom counts. Models are inappropriately adjusted given the very small sample size and therefore extrapolate beyond the support of the data. Further, the study far oversteps its results in the discussion section. The model selection tool used for the glm analysis adjusts for the small sample size using a corrected Akaike information criterion (AIC) value, also known as an AICc value. We decided to simplify the glm models for this analysis and remove all interaction terms to only look at main effect variables. Doing so, we can discuss how each metric was related to symptom total after accounting for the other covariates. We have modified the discussion to highlight the issue of small sample size and how that may have influenced our results. 196-206

327-329

Major Concerns

1. The Hess et al. 2019 study had major flaws and I urge you not to cite it. If you want to continue citing it, please describe many of its limitations as discussed by Buonocore et al in their letter to the editor published this month. Koehler et al. also discuss exposure metrics in ES&T in 2018. This has been addressed and the study removed. NA

2. The number of excluded surveys is very high (~48%). It would be helpful to break this down by reason for excluding the survey, e.g., did most excluded surveys result because the participant did not fully fill it out or because they lived out of state? If most surveys excluded because of incompletion, could you consider doing a sensitivity analysis with this excluded group? Only 23% of survey participants were excluded from the starting sample of 135, as our n for analysis was 104 adults. More detail on exclusion numbers was included. 91-101

3. I am confused about the first metric. “A cumulative well density was calculated per respondent for the year their survey was completed by taking the total number of wells divided by 5 km.” Wouldn’t you either count the total number of wells within 5 km and use that as the exposure or divide the total number by pi*5km*5km (true density of wells)? I can’t think of a time you would divide by 5 km. This has been addressed and updated in the calculations and text. 118-120
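The two alternatives the reviewer names (a raw count of wells within 5 km, or that count divided by the circle's area) can be sketched as follows. This is only an illustration, not the authors' calculation: the function names, the haversine distance, and the sample coordinates are assumptions introduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for this sketch)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def wells_within(home, wells, radius_km=5.0):
    """Option 1: count the wells inside the buffer around the residence."""
    return sum(
        1 for w in wells
        if haversine_km(home[0], home[1], w[0], w[1]) <= radius_km
    )

def well_density(home, wells, radius_km=5.0):
    """Option 2: wells per square kilometre of the circular buffer."""
    return wells_within(home, wells, radius_km) / (math.pi * radius_km ** 2)

# Hypothetical coordinates, for illustration only.
home = (40.0, -80.0)
wells = [(40.01, -80.0), (40.03, -80.02), (40.2, -80.0)]

count = wells_within(home, wells)
density = well_density(home, wells)
```

Because every respondent has the same 5-km buffer, the two options differ only by the constant factor pi*25, so they rank respondents identically.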

4. Please explain why only a single station was used to derive wind speed and direction. This seems to be a major limitation of the model, given that the study area is over 100km wide. Wind could be dramatically different across this area. If you have reason to believe otherwise, please include. Also, why were emissions of 4 pollutants summed when they can have quite different health effects? This has been addressed. We did not have day-to-day weather data for each residence in our study. Using annual weather data from a local weather station at the Allegheny County Airport was deemed appropriate by the research team since annual averages are more generalizable across airports in the region, compared to day-to-day values that would certainly fluctuate. We have included this in the limitations section.

You are correct, each pollutant can have different health effects. For this exploratory study, we chose to combine them. Future work will examine individual pollutants. 159-162

5. What is a model selection and averaging tool? Which demographic variables were considered? In addition, you have a relatively small sample size (N = 135) and it appears you run double stratified models (sex and smoking). This means some models have <33 observations. Given a rule of thumb to have 15 observations for each independent variable, you could really only include the exposure and a single confounding variable in these models. Otherwise, you are extrapolating far beyond your data. Therefore, I am not confident in the current modeling strategy. Further, did symptoms really follow a Poisson distribution or was 0-inflation required? The model selection tool uses AICc, a corrected AIC value for small sample sizes. The tool is used to run all the potential statistical models and determine the best fitting models per the AIC values. This does not mean random subsets of data were generated. We simplified the models by removing the interaction terms as possibilities in the final model. 0-inflation was not required for our data as only 15% of the sample reported no symptoms. 196-209
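For reference, the small-sample correction mentioned in this response has a standard closed form, AICc = AIC + 2k(k+1)/(n-k-1), where k is the number of estimated parameters and n the sample size. A minimal sketch follows; the function name and the AIC value in the example are hypothetical, and only n = 104 comes from the response letter.

```python
def aicc(aic, k, n):
    """Corrected AIC for small samples.

    aic: ordinary AIC of the fitted model
    k:   number of estimated parameters (including the intercept)
    n:   number of observations
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)

# Hypothetical AIC of 500.0 for a model with 4 parameters fit to n = 104.
example = aicc(500.0, k=4, n=104)
```

The correction term grows as k approaches n, which is how the criterion penalizes models with many terms relative to the sample size.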

6. I commend the authors for trying something new but the TITAN analysis is fairly confusing. Consider adding more information to the appendix on why this strategy was selected and how it was implemented, perhaps with a toy example. We have added supplemental documentation by way of sample code and a real-world example of TITAN being used in ecology. 223-225

S2 Appendix & submitted R code

7. Lines 327-330: this interpretation of the findings is extremely strong. You have a highly selected sample (people that were worried about UOGD) and a tiny sample size (<200 people). Claiming you have identified which metric to use is over-interpreting your results. Same issue lines 357-360. Tustin et al. had a much larger sample of people who did not enter the study based on UOGD concerns. Therefore, you are asking/answering vastly different questions. This has been addressed. We softened the language and discussed results relative to the characteristics of our sample being small and made up of concerned individuals. We clarified that Tustin et al. did indeed have a larger sample size and therefore isn’t directly comparable. 324-338

363-365

Minor Concerns

Line 69: should this read ambient air pollution? This has been addressed. Throughout the text we have standardized this and similar phrases to “annual emissions concentration” or AEC NA

Figure 1 add year of active wells to the figure caption This has been addressed. 104

Please upload higher res figures, these are very difficult to read Figures were submitted separately to the Journal. When they were included in the PDF submission, the quality was reduced. The Journal will have the highest resolution photos we can provide. NA

Figure 3: why are bars different widths? If this indicates something, please make clearer what This has been addressed. Figure 3 caption explains that the width of the bar is related to the frequency of that symptom being recorded. See the legend on the top right of the image. 278

Lines 134-136, you say you don’t have data on wells outside PA but then contradict yourself by saying four residences have wells outside PA. If you know about these wells, why not include? Fractracker.org provides a mapping tool of wells outside of PA but does not give the latitude/longitude of these wells or provide a downloadable dataset for us to map on our own. We could type in respondents’ addresses and see that there may be wells within their buffer but could not apply this data to what we did in ArcGIS.

We only used PA gas wells so removed reference to wells outside of the state since we are not able to quantify their emissions, and yes, this would underestimate their exposure. Furthermore, only 7 out of 104 subjects we assessed were near enough to state borders to result in a reduced areal extent, and these possessed sample areas >50% within PA. 130-131

Line 163: and to be consistent with your other exposure metrics using 5km? This has been addressed. 117

“An alpha level of <= 0.5 was used as a threshold for significance in both tests” do you mean 0.05? This has been addressed. 193

First line of results: it wasn’t just a convenience sample but a sample of people reporting issues with UOGD, right? This has been addressed and more language about the sample characteristics has been added. 237

What does median age of +/-1 mean? This has been addressed. 238

Smoking status was never mentioned in methods, please add. This has been addressed. 199

Results: what was total number of symptoms queried? 779 NA

Table 1: how many people included in this table? This has been addressed. 243

Line 238: which correlation coefficient used? Pearson – this has been addressed in the text. 246

How was age modeled? Continuous? What other demographics were considered? Continuous – this has been addressed in the text. 249-250

Lines 316-317 see Koehler et al. 2018 re: compressors This has been addressed. 318

Reviewer 3

Overall Comments

It is particularly interesting that the correlation with total reported symptoms is stronger with CWD than with IDW, and the authors make a good case for why that is: the authors did not have data on the particulars of each well’s activities (beyond location and annual emissions), and assigning higher weights to wells closer to a residence compounds that uncertainty. In my opinion, that points out a notable limitation in the methodology and conclusions of this study, which I believe the authors should be more straightforward in acknowledging. That is, the health concerns reported by residents are correlated only with annual data on well locations and emissions. It is not known if the health issues were transient or longer lasting (which the authors acknowledge), and it is not known exactly what was going on at the well pads within 5 km of their house. We know that wells under development can have highly variable emissions, perhaps by orders of magnitude, and some wells may only be under development for weeks before going into production mode, during which emissions are generally much smaller. The body of literature suggests that higher air concentrations resulting from O&G activities are much more likely to occur during development, and that reports from local residents of health issues and nuisances also tend to peak during development. Therefore, correlating one-time reports of health complaints with annual O&G data is missing an opportunity to more directly investigate possible connections between health complaints and O&G operations in real time. Can you say if new well development is active and thriving in these counties, mixed with wells in long-term production mode? Are there really no data on which wells were under development vs. in production, with sufficient time resolution to draw closer connections? We have added language in several sections to be clearer about these specific limitations. We acknowledge the annual emissions concentration does not account for day-to-day exposure at the household, given the different stages of well development have different lengths and emissions amounts. We added additional citations to support this claim. Any inclusion of well pad activity or development would have been an estimate on our part, as we do not have access to the start and end date of the various activities that occur on the well. Our dataset only includes the spud date for the well. We do know that in these counties, wells are continuously being drilled and some are being re-fracked. That is not tracked on a public database for us to use.

60-61

159-162

180-182

333-343

379-387

391-397

And if you are asserting that these health-symptom reports may be linked to inhaling chemicals emitted from well pads, then emissions and proximity are key to that exposure route. However, you should be clearer in your assumption that the respondents’ exposures are entirely at their residence (or at least that’s what this intensity metric represents) and that there is full chemical penetration into their home. I also found the “Ambient Air Emissions” methodology section to be rather unclear on a number of fronts, as I discuss in more detail below. This is the section that I advise the most revisions to. We added additional language to add clarity to this assumption and to be clear the exposure metric is calculated at the residence. 180-182

333-338

416-419

I also appreciate that the authors used several characteristics of respondents as part of the GLM application. However, the discussion of the role of those characteristics in correlating symptom reporting with changes in exposure intensity is non-existent. In my opinion, if you are making observations about statistical differences in males/females, smokers/non-smokers, age, water source, etc., then the discussion on them should be more complete and include speculations about the meaning of those differences. If you are not willing to speculate, then say why. The adjustments we made to simplify the glm models are meant to also address this point and we added language to the discussion about how our models do account for covariates like sex, age, and smoker status. Since we are controlling for the covariates, our discussion focuses on the relationship between exposure metrics and symptom total only (see revised Table 2). Starting at 245 under “Generalized Linear Models: Symptom Total”.

327-329

Major Concerns

Study Sites & Health Outcomes

Line 90: What does “Appendix A” refer to? The reference doesn’t have an Appendix A, and neither do you? Also, the link provided for reference 18 is broken, I think you’re missing a hyphen between “individual” and “heath”. An appendix has been added and supplemental materials provided separately. S1 Appendix

Cumulative Well Density and Inverse Distance Weighting

Line 119: You say that three radii were drawn initially. That implies something changed later…? This has been addressed. For this study, we only chose to look at wells within a 5-km buffer. The 1-km and 2-km buffers were initially included in the text to explain why we ended up choosing 5 km from a statistical approach (comparison of AIC value between buffer distances and CWD), but we have elected to instead cite Elliot et al. (2019) as they also looked at a 5-km buffer range. We removed reference to the smaller buffers as we did not run glm analyses on these distances. 118-120

Line 122: suggest updating this sentence to “Active, unconventional wells for the year of a completed health assessment were plotted within the three radii around the respondent’s home.” Though 5 km was the maximum radius, it’s probably more clear to say within the three radii. This has been addressed. 118-120

Line 123: update this sentence to “A cumulative well density was calculated per respondent for the year of their survey, equal to the total number of wells divided by the radius (in km).” Again, it’s using three radii with 5 km being the max, right? This has been addressed. 118-120

Line 135 about the four residences with wells outside of PA within their 5-km radius (insert hyphen there!): were they also outside of PA for their 1- and 2-km radii? Also, why not throw these respondents out of your study since you essentially had to remove part of their buffer areas, making their exposure measures underestimates? We only used PA gas wells so removed reference to wells outside of the state since we are not able to quantify their emissions, and yes, this would underestimate their exposure. We added information to describe the percent of these participants’ 5-km buffer that was in West Virginia. 130-131

Ambient Air Emissions

“Ambient Air Concentrations” is a more appropriate section title, as those are what you are deriving in this section. This has been addressed. Annual Emissions Concentration was the phrase chosen to standardize how we referred to this measure. Thank you for your input on this. 133

Line 139: should year 2012 have been removed from your assessment, given that 25% of the year’s emissions data were unavailable? It is our understanding that all 2012 data was reported to the PA DEP. Given the industry uses algorithms to estimate their yearly emissions, we question whether they would adjust the formula to exclude months from their calculations. If confusion is coming from Brown et al.’s statement “Since March 31, 2012, owners and operators of natural gas production and processing operations have been required to report air emissions to the PA DEP…” we want to point out that this statement does not reflect a belief that only data after March 2012 was included.

Brown DR, Greiner LH, Weinberger BI, Walleigh L, Glaser D. Assessing exposure to unconventional natural gas development: using an air pollution dispersal screening model to predict new-onset respiratory symptoms. Journal of Environmental Science and Health, Part A. 2019 Dec 6;54(14):1357-63. NA

Line 155: for the Pittsburgh meteorological data, can you speak to their representativeness of conditions across your study area? This has been addressed. We did not have day-to-day weather data for each residence in our study. Using annual weather data from a local weather station at the Allegheny County Airport was deemed appropriate by the research team since annual averages are more generalizable across airports in the region, compared to day-to-day values that would certainly fluctuate. We have included this in the limitations section. 159-162

392-393

Line 163: what is the significance of modeled concentrations being less than 10 μg/m³, and what was that based on? This has been removed. The rationale that we were keeping a buffer of 5 km across our metrics serves as a clearer justification. NA

Lines 171–173: an air concentration is not a “rate of emissions exposure”, it’s a concentration. You can then say (if correct) that you assume that it is the concentration at which residents are exposed (i.e., constant exposure to outdoor concentrations at their residence), and refer to it as an exposure concentration. What is meant by “total, or aggregated, emissions”—the one emissions rate you say earlier that you used in your calculations? Repeating that here makes it sound like there’s something more going on here at the end to determine individual exposure. This has been addressed. See section “Annual Emissions Concentration” starting on line 133

There is an inconsistency in terminology that can be easily rectified: you are using emissions of chemicals from well sites (which indeed are ambient air emissions, though “air emissions” is clear enough; and they are reported as rates, not volumes; a consistent use of emission “rate” rather than “amount” is desired here, too) and meteorological data to estimate concentrations (not “levels” or “emissions concentrations” or “air level value”) of those chemicals in the ambient air at various distances from the well. This has been addressed. Emissions concentration is the phrase we selected to replace the varying ways we previously were describing it. See section “Annual Emissions Concentration” starting on line 133

You kindly offer a brief summary of a box-model methodology more fully described in other papers, but the brief summary as it is currently written is inadequate and confusing. Pasquill used five wind-speed categories, along with cloud cover and time of day, to define six stability classes, not 30. I see that your reference [23] (Brown et al., 2019) has a Table 1 that defines 30 stability classes from A1 to D30, but I don’t recall seeing these from Pasquill’s work (certainly correct me if I’m wrong!) and I don’t see how that’s used in concert with Figure 1 of [23] which is the vertical mixing/stability/distance look-up chart showing just the six stability classes. You say you pulled data on wind direction from NOAA, but the box model does not utilize wind direction. Taking a step back from the details, it would be clearer to say (roughly) that the model utilizes atmospheric stability, wind speed, and an assumption about the size of a well-pad facility to estimate the size of a box in which the emissions are well mixed, which in turn is a measure of plume dilution, where the chemical concentration in the box is calculated as emission rate divided by box volume. (Hourly wind speed is used as part of the box volume calculation, right? That’s the “meters of air that pass over a site/minute” stated in [23]?) Then you can march through how you identified each of those parameters (Pasquill stability from hourly data on cloud cover and wind speed from NOAA; an assumed 100-m diameter of well pad; Pasquill assumptions on vertical mixing given stability and horizontal distance; an assumption of constant 300 g/h emissions). This has been addressed. We believe the confusion was coming from the use of stability classes and stability conditions, where they are properly called stability classes and meteorological conditions, based on solar radiation, that define correlated stability classes. In Brown et al. (2019), five wind speeds were used alongside 6 meteorological conditions to create the 30 correlated stability classes that we applied to hourly weather data from the Allegheny County Airport (their Table 1). Pasquill’s original correlated chart shows 5 meteorological conditions, but Brown et al. 2019 added a fourth daytime insolation category.

Brown DR, Greiner LH, Weinberger BI, Walleigh L, Glaser D. Assessing exposure to unconventional natural gas development: using an air pollution dispersal screening model to predict new-onset respiratory symptoms. Journal of Environmental Science and Health, Part A. 2019 Dec 6;54(14):1357-63. 144-162

The relationship between the reference emission rate used in the modeling (300 g/h) and the actual facility emission rates (variable) is not entirely clear. I think you’re telling me that you ran the model to get concentrations per unit emissions at five different distances from a well, based on a high-end metric of hourly concentrations in a year (why 300 g/h and not just 1 g/h?). Then you got a well’s real emissions and multiplied them by the modeled concentration per unit emissions. If that is correct, please consider updating the final paragraph of this section to be more clear about this. This has been addressed. You are correct, the 300 g/h is arbitrary, as any hypothetical emissions number above zero would give the concentration value at each distance. The model was run to determine the concentration at each distance from the well. When we determined the distance between the well and a home, we took that concentration value, multiplied it by the emissions rate from the DEP dataset, and then divided by 300 to get the estimated emissions concentration at that residence in µg/m³.

163-172
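The scaling step described in this response amounts to a simple proportion, sketched below. The function and parameter names are hypothetical, and the sketch assumes the reported emission rate has already been converted to the same unit (g/h) as the 300 g/h reference rate; the letter does not state how that conversion was handled.

```python
REFERENCE_RATE_G_PER_H = 300.0  # reference emission rate used in the model runs

def scale_concentration(modeled_conc_ug_m3, reported_rate_g_per_h,
                        reference_rate_g_per_h=REFERENCE_RATE_G_PER_H):
    """Scale a modeled concentration to a well's reported emission rate.

    modeled_conc_ug_m3:    concentration (ug/m^3) modeled at the well-to-home
                           distance for the reference emission rate
    reported_rate_g_per_h: the well's reported emission rate, assumed here
                           to be expressed in g/h
    """
    if reference_rate_g_per_h <= 0:
        raise ValueError("reference rate must be positive")
    return modeled_conc_ug_m3 * reported_rate_g_per_h / reference_rate_g_per_h

# Hypothetical numbers: a well emitting at half the reference rate
# contributes half of the modeled concentration.
estimate = scale_concentration(6.0, 150.0)
```

Because the modeled concentration is linear in the emission rate, the choice of reference rate cancels out, which is why any positive value would serve.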

The use of quadrants around a residence is not clear to me. How does this affect concentration in any way? This has been addressed. We factor in which directional quadrant the well falls in relative to the home; the appropriate 90th percentile concentration value for that quadrant and distance from the home is chosen and used to calculate the estimated emissions concentration from a given well. Then we add the concentrations calculated from the North, South, East, and West into one AEC value. The quadrants are a step in the calculation. 173-182
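One way to read the quadrant step is as a lookup followed by a sum, sketched below. Everything specific here is an assumption made for illustration: the lookup values are placeholders rather than model output, the 1-km distance bins and the rounding rule are guesses, and the function names are hypothetical.

```python
import math

REFERENCE_RATE_G_PER_H = 300.0

# Placeholder 90th-percentile concentrations (ug/m^3) by quadrant and
# distance bin (km). These numbers are invented for the example.
LOOKUP = {
    "N": {1: 8.0, 2: 4.0, 3: 2.0, 4: 1.0, 5: 0.5},
    "E": {1: 4.0, 2: 2.0, 3: 1.0, 4: 0.5, 5: 0.25},
    "S": {1: 2.0, 2: 1.0, 3: 0.5, 4: 0.25, 5: 0.125},
    "W": {1: 4.0, 2: 2.0, 3: 1.0, 4: 0.5, 5: 0.25},
}

def distance_bin(distance_km):
    """Round a distance up to the next whole-km bin, clamped to 1..5 (assumed rule)."""
    return min(5, max(1, math.ceil(distance_km)))

def annual_emissions_concentration(wells):
    """Sum scaled concentrations over wells, grouped by quadrant.

    wells: iterable of (quadrant, distance_km, emission_rate_g_per_h)
    Returns (total, per_quadrant_totals).
    """
    per_quadrant = {q: 0.0 for q in LOOKUP}
    for quadrant, distance_km, rate in wells:
        modeled = LOOKUP[quadrant][distance_bin(distance_km)]
        per_quadrant[quadrant] += modeled * rate / REFERENCE_RATE_G_PER_H
    return sum(per_quadrant.values()), per_quadrant

# Hypothetical wells around one residence.
wells = [("N", 0.8, 300.0), ("N", 2.5, 150.0), ("W", 4.2, 600.0)]
total, by_quadrant = annual_emissions_concentration(wells)
```

In this reading the quadrant matters only because each direction has its own 90th-percentile concentration, reflecting how often the wind carries emissions that way.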

Statistical Analysis

Lines 184 and 219: was the threshold of significance 0.5 or 0.05? This has been addressed. 193

Final paragraph: what is the purpose of only including symptoms reported 5+ times, and organizing symptoms with frequencies > 10? We used these exclusion parameters to remove symptoms that are too infrequent to do statistical analysis with. The numbers are arbitrary and simply provide a way for us to cut out infrequent symptoms. 228-230
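The frequency cutoff described here can be illustrated with a short filter. The data and function name below are hypothetical; only the threshold of five reports comes from the comment above.

```python
from collections import Counter

def frequent_symptoms(reports, min_count=5):
    """Keep symptoms reported by at least min_count respondents.

    reports: list of per-respondent symptom lists; a symptom repeated
             within one respondent's list is counted once.
    """
    counts = Counter(s for report in reports for s in set(report))
    return {s: c for s, c in counts.items() if c >= min_count}

# Hypothetical assessments: five respondents report headache and cough,
# two report rash.
reports = [["headache", "cough"]] * 5 + [["rash"]] * 2
kept = frequent_symptoms(reports)
```

Symptoms below the cutoff are dropped before analysis because too few observations exist to estimate a change point for them.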

Generalized Linear Models: Symptom Total

I must admit that Table 2 is difficult to follow. Would a strong statistician understand this? Under a given model, should I be looking to the line restating the model name as the variable, to find statistical significance (e.g., p<0.001 for Cumulative well density-Cumulative Well Density and for Aggregated emissions-Aggregated Emissions, but p=0.316 for IDW Score-IDW Score)? What’s the meaning of the “Cumulative Well Density: Age” type rows, and why is there only one of these kinds of rows for the IDW Score model, while three for Aggregated emissions model and five for Cumulative well density model? The interaction terms (e.g., Cumulative Well Density: Age) were removed to simplify these models. Table 2 has been updated to reflect these changes. Removing the interaction terms and adjusting the IDW metric as explained in the cover letter led the IDW metric to be a significant predictor in the study, and this has been updated throughout the text.

In the table, the name of the model is given in italics. That identifies which exposure model was being tested against symptom total (while adjusting for our demographic variables). When you read the table, you will see Cumulative Well Density statistically significantly predicted Symptom Total while adjusting for the covariates sex, age, and smoker status. The model was significant at p<0.001. Citation – 230

Table 2 – 256

Paragraph starting Line 250: I’m not sure what the second sentence is saying, the part about “occurred between smoker status”. It appears to me that there was little-to-no relationship between number of symptoms reported by current/former male smokers and magnitude of emissions, and for males not reporting their smoker status the model actually shows a declining number of reported symptoms as emissions increased. I think your declarations of these results could use qualifying terms like “generally” or “on average”: “Increased symptom reporting also generally occurred…”, “Females on average also reported more symptoms than males…”, and so on. Change “with” on line 254 to “between”. Final sentence: you probably should say that water source isn’t shown in Figure 2, for clarity? However, again going back to my confusion over Table 2, doesn’t it show p=0.049 (which is >0.001) for Cumulative Well Density: Water Source, and thus isn’t statistically significant? This has been addressed by simplifying the models to only show main effects. See Table 2. NA – text was omitted

Figure 2b’s X axis is better labeled as concentration rather than emissions, as covered earlier in my comments. This has been addressed. NA

TITAN Analysis

Lines 281–282: range and mean was the same as what? Same on line 295. This has been addressed and the confusing text removed to simplify the presentation of the results. NA

Figures need y-axis labels This has been addressed. NA

Minor Concerns

“Household” refers to the people in the house. I think in most cases you mean “residence” (i.e., the location of the house). This has been addressed. NA – updated throughout text

In many cases, you use “gas well” as shorthand for “oil and gas well” but it implies by omission that they’re not oil wells. Consider just using “well”. This has been addressed. NA – updated throughout text

If you estimate concentrations from emissions, then consider if your model names and results discussions should refer to a “concentration model” and “concentration intensity” and “concentration gradient” etc. rather than emissions model, intensity, gradient, etc. This has been addressed. NA – updated throughout text

Not defining duration of symptom (short periods vs long periods of symptom persistence) is a concern in terms of understanding if reported health issues are episodic versus chronic. Not correlating time of symptom with UOGD activity also weakens assumptions about correlations between well activities and health issues. This has been addressed.

380-382

I take some issue with calling the reported symptoms “health effects”, “health impacts”, and similar phrasing. These phrases imply cause (O&G emissions) and effect (itchy eyes, etc.). Perhaps terms like “negative health symptoms” are more appropriate? This has been addressed. Negative health symptoms used instead. NA – updated throughout text

I also think you should be more careful about referring to CWD and IDW as measurements of exposure. They’re metrics of proximity to wells, and that’s it. CONC is closer to an exposure metric, as you attempt to estimate air concentrations of O&G-emitted chemicals, at residences. I think at the least you should acknowledge this, and perhaps then establish that for convenience you will refer to them as metrics of potential exposure intensity (or something like that). This has been addressed and we clarify in the introduction that CWD and IDW are proximity metrics but will be referred to as exposure metrics. In the discussion we revisit this distinction. 60-61

333-336

Abstract

Lines 26–27: suggest rewording sentence to: “We investigated UOGD density and well emissions and their associations with symptom reporting by residents of southwest Pennsylvania.” This has been addressed. 24-25

Line 28: change “from 2012-2017” to “in 2012–2017” (en dash) This has been addressed. 26

Line 31: insert comma after “intensity” This has been addressed. NA – text modified or removed

Line 34: change “ambient air emissions” to “ambient-air emissions” This has been addressed. NA – text modified or removed

Line 35: a dispersion model does not quantify emissions This has been addressed. 34-35

Line 41: change “comprised of” to “constituted” This has been addressed. 39

Line 42: I think you should change “increased” to “increases”? This has been addressed. 41

Introduction

Lines 48–49: change “human health risk” to “human-health risk” This has been addressed. 46

Line 61: change “number” to “numbers” This has been addressed. 59

Lines 67–68: insert comma after the [8] citation, and change “inverse distance weighting” to “IDW” This has been addressed. 67

Lines 69–70: I think you should change “well emissions exposure” to “emissions exposure metric”? Also, change “calculate ambient air at the” to “calculate ambient-air concentrations at the”. Change “exposure metric comparison as well, however, their” to “exposure-metric comparison as well, but their” This has been addressed. 68-69

Line 72: I think you should put “[16]” after the Hess citation? This has been addressed and the study was removed. NA – text modified or removed

Lines 74–75: suggest revising as “...and this analysis—comparing three estimates of exposure, including reported emissions—attempts…” This has been addressed. 71-72

Final sentence starting on Line 78: suggest changing to “The aggregate of methodologies applied here—using statistical modeling to analyze the influence of different exposures on symptom reporting, and applying a technique to identify specific symptoms that might be indicative of exposure—is novel in UOGD research and provides insight into new techniques for studying relationships between health and exposure variables.” This has been addressed. 75-78

Study Sites & Health Outcomes

Line 87: change “Between” to “In” This has been addressed. 83

Line 95: I think you should change “Weinberger et al. the” to “Weinberger et al. [19], the”? This has been addressed. 87,91

Line 97: change “oil and gas industry” to “oil-and-gas industry” This has been addressed. 96

Line 98: suggest changing “complete the assessment form (n=118). The 118 health assessments” to “complete the assessment form (17 excluded). The remaining 118 health assessments” This has been addressed. 97-101

Line 99: change “health care providers” to “health-care providers”, and “occupational health physician” to “occupational-health physician” This has been addressed. 92

Line 103: change “one of eight counties” to just “eight counties” This has been addressed. 100

Figure 1 caption: remove comma in “Southwestern, PA”; change “Lawrence county” to “Lawrence County”; insert “County” after “Butler”; change the “[20]” citation to “[22]” This has been addressed. 103-105

Figure 1: suggest making county names more readable (move them on top of the well locations?) This has been addressed. New map image included with submission. NA

Cumulative Well Density and Inverse Distance Weighting

Citation numbering got messed up in a few spots. I think [20] in the first paragraph should be [21], and in the second paragraph [22] should also be [21] while [20] should be [22]. Please check. Also in the second paragraph, I think you should insert “[19]” after the Weinberger citation. This has been addressed. 111,113,115

Line 120: change “1km” to “1 km” (insert space) This has been addressed. NA – text modified or removed

Line 126: the “IDW” abbreviation was already established earlier. This has been addressed. NA – text modified or removed

Line 127: should “qualifying” be “quantifying”? Leading into the next line, change “closer to the respondents’ home” to “closer to a respondent’s home”. This has been addressed. NA – text modified or removed

Line 128: suggest updating this sentence to “The inverse distance of each well within 1-, 2-, and 5-km radii of a residence was calculated, and those values were summed into one IDW score per respondent, per radius, as shown in the following equation:” This has been addressed. NA – text modified or removed

Lines 132–133: change “respondents’ home, and n is the number of wells within the 5 km buffer” to “respondent’s home, and n is the number of wells within the radius” This has been addressed. 127
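The two proximity metrics discussed in these comments can be illustrated with a minimal sketch. It assumes that cumulative well density is taken as a simple count of wells inside the radius and that the IDW score is the sum of 1/d over those wells, as the suggested wording for Line 128 describes. The function name `well_metrics` and the distances are hypothetical and are not taken from the study data.

```python
def well_metrics(distances_km, radius_km=5.0):
    """Return (well_count, idw_score) for one residence.

    distances_km: distance from the residence to each well, in km.
    well_count:   number of wells within the radius (assumed stand-in
                  for cumulative well density).
    idw_score:    sum of inverse distances (1/d) for wells within the radius.
    """
    # Keep only wells inside the radius; skip non-positive distances
    # so that 1/d is always defined.
    in_radius = [d for d in distances_km if 0 < d <= radius_km]
    well_count = len(in_radius)
    idw_score = sum(1.0 / d for d in in_radius)
    return well_count, idw_score


# Hypothetical distances (km) from one residence to nearby wells.
example_distances = [0.5, 1.0, 2.0, 4.0, 7.5]
for radius in (1.0, 2.0, 5.0):
    count, idw = well_metrics(example_distances, radius)
    print(f"{radius:g} km: wells={count}, IDW={idw:.2f}")
```

With these made-up distances, a nearby well contributes far more to the IDW score than a distant one, while every well inside the radius adds the same amount to the count.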

Ambient Air Emissions

Line 139: why does it matter that emissions data after December 31, 2017 were unavailable? The health survey data stopped after 2017 anyway. This has been addressed. NA – text modified or removed

Line 140: insert hyphen: “emissions-inventory data” This has been addressed. 135

Line 141: define PM 2.5 This has been addressed. 136

Line 142: reference [22] should be [21] This has been addressed. It is back to being [22] simply because of the removal and addition of citations in the reference list; [22] reflects the citation for the PA DEP data. 137

Lines 144–146: what emission sources weren’t included? You mention this much later, but would be good to mention here, too. This has been addressed. 141-142

Line 154: “that was collected at the Pittsburgh Allegheny County Airport” is more clearly stated as “for the Pittsburgh Allegheny County Airport” This has been addressed. 157

Line 156: your study was through 2017, not 2016. This has been addressed. 162

Line 162: insert hyphen: “16-km radius” This has been addressed. NA – text modified or removed

Statistical Analysis

Line 181: insert comma after “intensity of each exposure” This has been addressed. 190

Line 185: insert “for” after “were used”? This has been addressed. 195

Line 186: change “a model selection and averaging tool” to “a tool for model selection and averaging” This has been addressed. 196

Line 187: change “best fit” to “best-fit” This has been addressed. 198

Line 189: change “one hundred” to “100” This has been addressed. 200

Line 190: I’m not sure what “to model assessment” means, and why 100 models? 100 is the default number of models that are run prior to the fitting of the best models. Out of 100 models, the model with the lowest AIC value is considered the best combination of variables. 205-206

Line 194: change “model” to “models” This has been addressed. 204

Line 202: insert comma after “emissions” This has been addressed. NA – text modified or removed

Symptom Reporting Characteristics

Line 227: is the median age better stated as “57 ± 1 standard deviation (SD)”? Note to put a space after ±. If this is the correct expression, then consider also updating lines 230–231 to say “with a mean of 7 ± 7.7 SD”. This has been addressed. NA – text modified or removed

Lines 229–230: is this sentence saying that some respondents reported 0 symptoms, while one or more respondents reported 36 symptoms, with an average of 7 reported per person? Consider rewording to be more clear. This has been addressed. 241-243

Generalized Linear Models: Symptom Total

First sentence: consider rewording to “Based on the initial GLMs (including demographic variables), models using a 5-km radius for cumulative well density had the lowest AIC value (relative to the 1- and 2-km radii), and 5 km was therefore used as the defining radius for cumulative well density as an exposure variable…” This has been addressed. NA – text modified or removed

Line 238: change “from 0.05 to 0.73, thus all three” to “from 0.05 to 0.73; thus, all three” This has been addressed. 247

Figure 2a’s X axis should have a space between 5 and km (“5 km”). This has been addressed. NA

Line 248: insert “did” before “emissions” This has been addressed. NA – text modified or removed

Line 250: revise to “In the cumulative-well-density model…” This has been addressed. NA – text modified or removed

TITAN Analysis

Lines 264–265: update “along the cumulative well density, IDW, and emissions gradients” to “along gradients of cumulative well density, IDW, and emissions”. Otherwise, proper grammar would suggest that you say “along the cumulative-well-density, IDW, and emissions gradients”, which is fine but some consider to be awkward. Consider this throughout this section (e.g., like 267 “cumulative-well-density gradient” or “gradient of cumulative well density”). This has been addressed. Language corrected starting at 262 and going through the “TITAN Analysis” section.

Figure 3 discussion: Suggest you introduce the figure on line 266: “Fig 3 shows results for cumulative well density, with the 23 significant symptoms displayed. Itchy or burning eyes…”. Then later on line 270, you can remove the “Of the twenty-three significant symptoms, ” preface and just begin “Roughly 26% of the symptoms were categorized…”. This has been addressed. 269

Lines 272–273: What is the meaning of negative associations? Increasing well density leading to decreasing reports of headaches, difficulty speaking, ringing in ears, and rash? That’s odd, isn’t it? It might be worth mentioning your “type-I error” hypothesis here? Earlier on line 266, when discussing the top three indicator values, you probably should include that they’re positive associations, just to be clear? These negative associations are considered type-I errors in our analysis. They do not necessarily mean that headaches decreased as well density increased; rather, with so many symptom variables entered into the model, we expect some to return as anomalies/errors. 271-274, 369-373

Lines 272 and 281: should you use “negatively associated” or “inversely associated”? This has been addressed. NA – text updated in multiple places in section

Lines 284–285: maybe you mean “followed by nerves and muscle symptoms and psychological symptoms, which comprised 21% of symptoms each”? This has been addressed. 286

Lines 285–286: This sentence might be better as “In addition to headaches, difficulty speaking was also negatively associated with the gradient.”? Similarly, lines 294–295: “In addition to headaches, rash and palpitations were also negatively associated with the gradient.” This has been addressed. 282

Lines 298–299: end of sentence might be better stated as “with psychological symptoms and nerve and muscle symptoms each at 20%.”? This has been addressed. 299

Discussion

Lines 309, 314, 318, and 323: “human-health standpoint”, “human-health symptoms”, “human-health impacts”, “human-health metrics” (insert hyphen) This has been addressed. NA – text updated in multiple places in section

Line 311: I’m not 100% certain but I think “further” should be “farther”? This has been addressed. 311

Line 314: change “non-UOGD” to “not UOGD” This has been addressed. 314-315

Line 320: remove “that” This has been addressed. NA – text modified or removed

Line 322: change “does” to “do” This has been addressed. NA – text modified or removed

Line 325: “5-km” (insert hyphen) This has been addressed. 326

Line 334: change “was” to “were” This has been addressed. NA – text modified or removed

Lines 335–336: “exposure-magnitude impacts (insert hyphen) This has been addressed. 343-344

Line 346: I think the “which may lead to underreporting of impacts to health across the literature” should be bounded by commas This has been addressed. 351-352

Lines 351–352: consider changing “in each model” to “in each of our models” for clarity This has been addressed. 357

Lines 365–366: consider “related to the gradients of cumulative well density, IDW, and emissions, respectively”, rather than “related to the cumulative well density, IDW, and emissions gradient respectively”. Similarly later on line 370: “associated with gradients of cumulative well density, IDW, and emissions, respectively, which contributes” This has been addressed. 355, 370

Last sentence: why would these aspects lead to underestimates of emissions? This has been addressed. 398-402

Limitations & Recommendations

Lines 375–376: “health-reporting data” (insert hyphen) This has been addressed. NA – text modified or removed

Line 378: should the Tustin and Rabinowitz citations have numbered indicators (and Elliot on line 379, Rabinowitz on line 385)? This has been addressed. NA – text modified or removed

Lines 391–392: consider revising sentence to “In future studies, other health indicators or metrics such as blood pressure, heart rate, and the number of days a symptom persisted could provide a more in-depth analysis and help define the severity of the symptoms experienced.” This has been addressed. NA – text modified or removed

Line 400: change to “The air-and-exposure screening model” (insert hyphens) This has been addressed. 398

Conclusion

Line 408: remove extra space before “an IDW metric” This has been addressed. NA – text modified or removed

Line 410: “detrimental health” is awkward, do you mean health complications? This has been addressed. Yes, health complications. 409

Line 414: insert “the” before “relatively young age” This has been addressed. 412

Line 416: change “but do raise” to “but they do raise” This has been addressed. 415

Line 418: change “oil and gas industry” to “oil-and-gas industry” This has been addressed. NA – text modified or removed

Lines 418–419: suggest rewording to “Our study suggests that the narrow consideration of exposure risk used in some locations—based only on proximity to an individual well—may warrant revision to assess the contribution of the

growing density of wells in and around communities at 5-km scales.” This has been addressed. 416-420

Attachment

Submitted filename: Response Letter.docx

Decision Letter 1

Min Huang

7 Apr 2020

PONE-D-19-34629R1

Exposure assessment of adults living near unconventional oil and natural gas development and reported health symptoms in southwest Pennsylvania, USA

PLOS ONE

Dear Ms. Blinn,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

As recognized by the reviewers, the manuscript has been much improved. The reviewers have further questions related to your methodology and discussions, and suggest to improve the presentation (language and table/figure quality). Please take this opportunity to address the reviewers' remaining comments.

We would appreciate receiving your revised manuscript by May 22 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Min Huang

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Partly

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: No

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Thank you for your thorough responses to the last round of edits, especially during this difficult time. I have a few remaining comments.

A few questions related to Brown et al. 2019 (the exposure metric paper).

1. Why does the present study omit formaldehyde but it was included in the earlier study?

2. The final sample size ended up being 87 in the 2019 paper, were you able to acquire additional data?

3. What is the distribution of years that the participants took the survey in the present study? No emissions data was available in 2017, so did these subjects receive estimates from 2016?

4. Switched from 2 to 5km buffer between the two studies, what was the rationale?

Other comments

1. Please add some explanation regarding the decision to combine 80+ (in many cases completely unrelated) symptoms into a single symptom count variable for the regression analysis.

2. How does the present study compare to Brown et al. 2019 where only respiratory symptoms were assessed and no association was found with air emissions? Please add some text to the discussion section on this topic.

3. The figures continue to be too low resolution to really read. Please update these to at least 300 dpi.

4. Add n (%) to table 1, please.

5. Discussion, page 21, line 429+: the CWD metric is most strongly associated with total symptom count. To me, this indicates a physical and psychological pathway between UNGD and health.

Some discussion of how perceived environmental change is associated with health might be helpful here. For example, the idea of solastalgia, feeling homesick even at home due to changed environmental conditions around the home (Albrecht). Further, in Lai 2017 “Understanding the psychological impact of unconventional gas developments in affected communities,” the authors find that negative perceptions of unconventional development was associated with negative psychological states.

6. It should be emphasized that this study provides evidence that among people aware and concerned about UNGD the CWD metric performs best, we have no idea if this would be the case with people not concerned or pro-UNGD. There is an interplay of perception, psychology, and health that should be more carefully discussed.

Reviewer #3: See comments in attachment.

Thank you for the opportunity to review your revised manuscript. Your revisions have satisfactorily addressed most of my concerns from my review of your initial submission. As with my first review, I currently have numerous minor concerns related to proofreading. Aside from those, I have a small number of less minor concerns related to your methodology, and one related to your conclusions, that should be addressed within the paper before I can recommend publication.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: Yes: Chris Holder

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-19-34629R1 - Reviewer 3 Comments.pdf

PLoS One. 2020 Aug 18;15(8):e0237325. doi: 10.1371/journal.pone.0237325.r004

Author response to Decision Letter 1


18 May 2020

Please see cover letter for best formatting of responses:

Comments Response

Reviewer 2

1. Why does the present study omit formaldehyde but it was included in the earlier study? Brown et al. (2019) calculated emissions from wells, processing plants, and compressor stations. Brown et al. established the 5 compounds with the highest reported mass and known health effects. This analysis calculated emissions from wells only; this analysis did not include formaldehyde because it was not one of the top 5 compounds emitted from wells.

See lines 140-147

2. The final sample size ended up being 87 in the 2019 paper, were you able to acquire additional data? The 2019 paper excluded those whose “residence was outside Pennsylvania and whose residence was outside of the county of interest”. In this paper, we excluded those whose residence was “outside of Pennsylvania” and included those residing in Washington, Greene, Beaver, Butler, Allegheny, Bedford, Fayette, and Westmoreland counties which resulted in a larger sample size.

See lines 87-100

3. What is the distribution of years that the participants took the survey in the present study? No emissions data was available in 2017, so did these subjects receive estimates from 2016? Participants were distributed among years, with 23, 9, 29, 20, and 17 in 2012, 2013, 2014, 2015, and 2016, respectively. The 2016 emissions data were used for the 6 participants from 2017.

See lines 119-120 & 179-180
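As a quick check on this response, the yearly counts can be tallied and each survey year mapped to the emissions year that would be used for it. This is only a sketch: the helper `emissions_year` is a hypothetical name, and the rule it encodes (fall back to the most recent available year) is inferred from the sentence about the 2017 participants.

```python
# Respondents per survey year, as stated in the response.
survey_counts = {2012: 23, 2013: 9, 2014: 29, 2015: 20, 2016: 17, 2017: 6}

# Last year with emissions data, per the response (assumption: later
# survey years simply reuse this year's emissions).
LAST_EMISSIONS_YEAR = 2016


def emissions_year(survey_year, last_available=LAST_EMISSIONS_YEAR):
    """Return the emissions-inventory year applied to a survey year."""
    return min(survey_year, last_available)


total_respondents = sum(survey_counts.values())

# Number of respondents whose exposure estimate draws on each emissions year.
emissions_counts = {}
for year, n in survey_counts.items():
    key = emissions_year(year)
    emissions_counts[key] = emissions_counts.get(key, 0) + n

print(total_respondents)   # 104, matching the sample size in the abstract
print(emissions_counts)
```

Under this rule, 23 respondents (17 from 2016 plus 6 from 2017) would be assigned 2016 emissions.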

4. Switched from 2 to 5km buffer between the two studies, what was the rationale? In our first submitted manuscript, we included text in the methods “We applied GLM analyses using three spatial scales of cumulative well density: 1, 2, and 5 km. AIC criterion was used to determine the appropriate spatial scale to study” and in the results “Initial GLMs for cumulative well density at the three spatial scales against symptom total, including demographic variables, showed that models using cumulative well density in 5 km had the lowest AIC value and was therefore used as the defined radius in cumulative well density as an exposure variable (1 km: AIC=1095.26, 2 km: AIC=1039.73, 5 km: AIC=1024.86).”

We have included this text back into the manuscript to explain why 5 km was chosen.

See lines 216-218 & 254-256
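The radius selection described in this response amounts to picking the spatial scale whose model has the lowest AIC. A minimal sketch using only the three AIC values quoted above follows; the variable names are illustrative, and the GLMs themselves are not refit here.

```python
# AIC values quoted in the response for the three spatial scales (km).
aic_by_radius_km = {1: 1095.26, 2: 1039.73, 5: 1024.86}

# The preferred radius is the one whose model has the lowest AIC.
best_radius_km = min(aic_by_radius_km, key=aic_by_radius_km.get)

# Delta-AIC relative to the best model shows how far behind the others are.
delta_aic = {
    radius: round(aic - aic_by_radius_km[best_radius_km], 2)
    for radius, aic in aic_by_radius_km.items()
}

print(best_radius_km)
print(delta_aic)
```

The 5-km model is selected, and the 1-km and 2-km models trail it by roughly 70 and 15 AIC units, respectively.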

1. Please add some explanation regarding the decision to combine 80+ (in many cases completely unrelated) symptoms into a single symptom count variable for the regression analysis. The health assessment was conducted using a standard clinical interview, which included a comprehensive list of symptoms (such as one might see at any clinical visit with a health care provider). We have edited lines 81-94 to more accurately reflect the process. The number of symptoms reported ranged from 0-36, as stated in the results on lines 249-250. Other studies have used this approach, i.e., counting the total number of symptoms reported as an indication of health (Rabinowitz, for example). It is also reasonable to expect that emissions of several compounds, each with specific health effects, might affect multiple body systems.

2. How does the present study compare to Brown et al. 2019 where only respiratory symptoms were assessed and no association was found with air emissions? Please add some text to the discussion section on this topic. This was added to the text:

Brown et al. 2019 did not find an association with the median air emissions. The most likely explanation for this inconsistency is that for this study, we used the 90th percentile, rather than the median.

See lines 336-338

3. The figures continue to be too low resolution to really read. Please update these to at least 300 dpi. We have used the PACE tool to provide the figures that meet journal requirements. Please download them and view on your computer to see if that enhances the quality compared to what is shown in the PDF.

4. Add n (%) to table 1, please. Column added to table.

5. Discussion, page 21, line 429+: the CWD metric is most strongly associated with total symptom count. To me, this indicates a physical and psychological pathway between UNGD and health. Some discussion of how perceived environmental change is associated with health might be helpful here. For example, the idea of solastalgia, feeling homesick even at home due to changed environmental conditions around the home (Albrecht). Further, in Lai 2017 “Understanding the psychological impact of unconventional gas developments in affected communities,” the authors find that negative perceptions of unconventional development was associated with negative psychological states. Thank you for this suggestion; we have edited the text.

See lines 371-375

6. It should be emphasized that this study provides evidence that among people aware and concerned about UNGD the CWD metric performs best, we have no idea if this would be the case with people not concerned or pro-UNGD. There is an interplay of perception, psychology, and health that should be more carefully discussed. While it is true that respondents in our sample were aware of and concerned about their health, they did not know about the exposure metrics our study explored, nor that their symptoms would be analyzed in this manner. It is safe to assume that the bias in our sample is minimized by that fact. Our exposure metrics were determined and designed years later. We assume respondents could not have over- or under-reported symptoms in relation to the exposure metrics, given that they did not know them. Additionally, we do not know if people who are concerned vs. not concerned (supportive vs. unsupportive) are different from each other, since we did not test our metrics with individuals who were not concerned about UOGD.

See lines 84-85, 245, & 389-399

Reviewer 3

Line 39 - remove “of” before “50%” Addressed

Line 40 - change “grouping” to “groupings” Addressed

Line 51 - remove hyphen in “human-health” Addressed

Lines 62-63 - think you mean “Which exposure measure(s) is the best predictor…” or “Which exposure measures are the best predictors…” Addressed

Line 76 - did you mean “different” instead of “difference”? Addressed

MAJOR COMMENT: Lines 98-99 - it’s not clear why your analysis was restricted to the eight counties Analysis was restricted to PA residents (who happened to be from just 8 counties) because we used PA DEP emission data.

See lines 96-100

Line 116 - change “were plotted” to “was plotted” Addressed

Line 119 - remove “was completed” Addressed

Line 136 - change “rate” to “rates” Addressed

Line 143 - you mean “concentrations” not “emissions” Addressed

Lines 146, 147, 151 - “down-wind” “downwind” - choose one Addressed

MAJOR COMMENT: Line 156 - I still don’t understand what part wind direction played in your modeling The box model does not take wind direction into account. Brown et al. 2019 examined wind direction data from NOAA and found that the wind blows from the north, south, and west 90% of the time, with 10% coming from the east.

Placing sources in cardinal-direction quadrants around a home was a broad way to think about wind direction, but the analysis did not actually use the NOAA wind direction data. This was updated in the text.

See lines 148-180

MAJOR COMMENT: Line 159 - I still think you don’t adequately address in the paper the representativeness of the data at the Pittsburgh airport to the region at large. I don’t think it’s the only hourly station in the eight-county area that reports the variables you use. Why did you use this one station? The approach we took was to choose one major airport where we felt the most complete set of weather data could be gathered and generalized over an area. Data from smaller airports is frequently incomplete even if they are closer to a home. We recognize this is a limitation of this exploratory study.

See lines 162-168 & 416-417

Line 162 - change “was” to “were” Addressed

Line 163 - remove “yearly” as it implies annual-average weather, when in fact you used hourly weather data Addressed

Line 172 - change “respondents’” to “respondent’s” and “was” to “were” Addressed

MAJOR COMMENT: Paragraph beginning Line 173 - Your methodology treats the trajectory of each well’s plume equally when summing the quadrant concentrations together. In fact, a wind rose analysis would tell you how often winds blew from each quadrant toward each residence, allowing you to weight the concentrations from some quadrants more than others. You should either address this limitation or correct for it. We have addressed this in the limitations section.

See lines 411-414
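
The quadrant weighting that the reviewer describes could be sketched as follows. This is an illustration only, not the method used in the study: the function name, the quadrant concentrations, and the wind frequencies are hypothetical placeholders, not values from the manuscript or from NOAA. Equal weights reproduce the case in which every quadrant is treated alike (an average that is proportional to the simple sum), while the assumed wind rose shifts weight away from the quadrant the wind rarely blows from.

```python
def wind_weighted_concentration(quadrant_conc, wind_freq):
    """Frequency-weighted average of per-quadrant concentrations.

    quadrant_conc: concentration contributed by wells in each quadrant
    wind_freq: share of hours the wind blows from each quadrant toward the home
    """
    if set(quadrant_conc) != set(wind_freq):
        raise ValueError("both inputs must cover the same quadrants")
    total = sum(wind_freq.values())
    if total <= 0:
        raise ValueError("wind frequencies must sum to a positive value")
    # Normalize the frequencies so the weights always sum to 1.
    return sum(quadrant_conc[q] * wind_freq[q] / total for q in quadrant_conc)


# Hypothetical inputs in arbitrary units; not taken from the study.
conc = {"N": 2.0, "E": 4.0, "S": 1.0, "W": 3.0}
equal = {q: 1.0 for q in conc}                   # every quadrant treated alike
rose = {"N": 0.3, "E": 0.1, "S": 0.3, "W": 0.3}  # assumed wind rose

print(wind_weighted_concentration(conc, equal))
print(wind_weighted_concentration(conc, rose))
```

With these placeholder inputs, the east quadrant has the highest concentration but the lowest assumed wind frequency, so the wind-weighted value is lower than the equally weighted one.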

Line 173 - suggest changing “surround” to “are ubiquitous around” Addressed

Line 176 - I think you mean “in which quadrant the well was located relative to the residence”? Addressed

Line 182 - change “of their survey and was used” to “of their survey, which was used” Addressed

Line 186 - should “compared” be changed to “made”? Addressed

Line 201 - should “to generation” be “for determining”? Addressed

Line 213 - “a species” is singular but “taxa” is plural - change “taxa” to “taxon”? Addressed

Line 214 - change “are” to “is” Addressed

Line 223 - change “uses” to “used” Addressed

Line 232 - change “z score” to “z-score” Addressed

Line 248 - change “measures” to “measure” Addressed

I don’t think Figure 2 is cited in the text? Addressed

Lines 306, 314 - remove hyphen in “human-health” Addressed

MAJOR COMMENT: Line 311 - you only evaluated wells within 5 km of a residence, so how did you detect effects beyond that? Our point is that our study used a greater range than studies such as [15,19] and detected effects beyond the distances those researchers examined. We did not look beyond 5 km. The text has been modified to help clarify that statement.

See lines 318-321

Line 333 - remove “primarily”, all respondents were concerned about UOGD exposure Addressed

Line 342 - end of line, change “was” to “were” Addressed

Line 343 - remove hyphen in “exposure-magnitude” Addressed

Line 395 - I believe development and production are the ONLY UOGD stages, and I believe the literature supports that emissions from development tend to be higher than those from production Addressed

MAJOR COMMENT: Line 401 - your model actually does account for day-to-day weather patterns, as it utilizes hourly weather data. Do you mean that your model does not correct for differences in weather observations at residences relative to the weather station site? Yes, that is correct: the model does account for day-to-day weather patterns for each day of the year. We have removed that line from the text. The main limitation is that the box model does not correct for topography.

Line 406 - I think you mean “emissions concentrations”. Air dispersion modeling does not quantify emissions, it uses them to quantify concentrations. Also, your study is not the first to use dispersion modeling for UOGD purposes. Addressed

Line 411 - remove hyphen in “human-health” Addressed

Lines 417-418 - I think you want to remove “in and around communities at” Addressed

Line 419 - remove hyphen in “residence-level” Addressed

Line 420 - remove hyphen in “human-health” Addressed

Line 577 - change “CDW” to “CWD” Addressed

Attachment

Submitted filename: response to reviewers PONE-D-19-34629.docx

Decision Letter 2

Min Huang

16 Jun 2020

PONE-D-19-34629R2

Exposure assessment of adults living near unconventional oil and natural gas development and reported health symptoms in southwest Pennsylvania, USA

PLOS ONE

Dear Dr. Blinn,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jul 31 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Min Huang

Academic Editor

PLOS ONE

Additional Editor Comments (if provided):

Thanks for preparing and submitting the revised manuscript. One of the reviewers has some remaining comments which must be addressed before the manuscript can be published.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: No

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

Reviewer #3: Please see my attached comments. The new revelation that 2016 well and weather data were matched to 2017 health data is a concern. Within my comments, you'll find a discussion on this. I will not recommend this for publication without addressing this issue. All other comments are minor/typographical.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: Yes: Chris Holder

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-19-34629_R2_Review.docx

PLoS One. 2020 Aug 18;15(8):e0237325. doi: 10.1371/journal.pone.0237325.r006

Author response to Decision Letter 2


17 Jul 2020

Comments Response

Reviewer 3

Line 51: change “adverse health” to “adverse health effects” (or outcomes, etc.) Addressed

Line 85: data is plural, so change “was abstracted” to “were abstracted” Addressed

Lines 95–96: convert “General” to lower case; suggest using semicolons in the list so that the EENT group is separated more clearly from skin; also, need to put the EENT group in the right order. So: “general; lung and heart; skin; eyes, ears, nose, and throat; gastrointestinal (GI); nerves and muscle; reproductive; blood system; and psychological” Addressed

Line 99: 14 individuals were excluded because you didn’t know their home lat/longs right? If so, move the parenthetical to the end of the previous sentence Addressed

Lines 105–106: If no respondents lived in Lawrence County, then why color it dark gray indicating “county with study residencies”? (did you mean “residences”?) Why does it matter that someone in Butler County lived near the Lawrence border? Lawrence County was colored dark gray because we included gas wells from the county and considered those wells part of the study. The individual living in Butler County lived near the Lawrence County border, and a well pad within their 5-km radius was located in Lawrence County, so we had to include those wells in the study. To avoid identifying the respondent, all Lawrence County wells were included in the analysis.

MAJOR COMMENT Lines 119–120 (and 168): You used 2016 well and weather data for comparison to 2017 health survey data? How is that appropriate? Your data are already of low temporal resolution (one-time reporting of symptoms related to annual well data), and with this mismatching of 2016/2017, I find myself wondering how much different your study outcomes would be if you just randomized which year of data were applied to each respondent. You need a discussion of this in your discussion/limitations, or remove 2017 from your assessment, or redo the 2017 assessment with 2017 well/weather data if available. We were able to use 2017 weather and gas well data for the 2017 health data and updated our models and findings accordingly. Now, each survey year has corresponding weather and gas well data from the same year. Each exposure metric remained significant, while the TITAN results changed slightly. New figures have been uploaded.

Lines 148–149: remove either “were derived” or “were estimated”, it’s redundant Addressed

Line 150: I think you should use “respondent’s” instead of “participant’s” for consistency. Addressed

Line 165: “hourly conditions for each hour” is redundant Addressed

Line 208: change “determining” to “determine” Addressed

Line 255: change “and was therefore selected” to “and were therefore selected” Addressed

Line 316: alter to “Despite a high degree of inherent complexity in associations between health and UOGD…” Addressed

Line 328: change “attempt” to “attempts” Addressed

Lines 354–356: this is an incomplete sentence Addressed

Line 397: I think you should use “respondents” instead of “participants” for consistency. Addressed – also addressed in S2 Appendix

Decision Letter 3

Min Huang

27 Jul 2020

Exposure assessment of adults living near unconventional oil and natural gas development and reported health symptoms in southwest Pennsylvania, USA

PONE-D-19-34629R3

Dear Dr. Blinn,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Min Huang

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Thanks to the authors and the reviewers for their efforts. In my view all reviewers' comments have been addressed.

Reviewers' comments:

Acceptance letter

Min Huang

5 Aug 2020

PONE-D-19-34629R3

Exposure assessment of adults living near unconventional oil and natural gas development and reported health symptoms in southwest Pennsylvania, USA

Dear Dr. Blinn:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Min Huang

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. TITAN example code and explanation.

    Lines 7–13 prepare a sample dataset of twenty potential symptoms and fifty individual respondents to mimic a subset of the data used in this study. For each respondent, 1s and 0s were assigned randomly for each symptom. A 1 means the respondent had that symptom; a 0 means they did not. Now we have a dataset of fifty respondents and which symptoms they did or did not have. Line 16 creates a randomized list of exposure values, one for each of the fifty respondents. In our study, each respondent had a measure of cumulative well density (CWD), an inverse distance weighting (IDW) score, and a measure of estimated annual emissions concentration (AEC). Line 16 creates an exposure variable that ranges from 0 to 50 (no units), with 0 being no exposure and 50 being representative of high exposure, though in our sample there was no limit to how high an exposure measure could go. Line 19 uses titan() to run the TITAN analysis, taking the reported symptoms and exposure values to determine if certain symptoms occur more or less at different levels of exposure. For example, when the exposure measure reaches 12, the model is looking for any symptoms that stand out as occurring more frequently at that exposure level. Indicator values (range 0–100) are used to score each symptom’s relationship to that exposure level, or gradient. A high indicator value shows a strong relationship with the gradient at a certain level. Then, the model determines if that relationship is positive or inverse. In ecological studies, one might study how changes in dissolved oxygen (DO) in a pond ecosystem cause certain species to die off or thrive as levels of DO change. When we begin to see a certain species appear in the pond, we can hypothesize that there may also be a change in DO, since that species is an indicator of a certain threshold, or level, of DO. Lines 22–29 take information from the TITAN analysis and create a table.
For this table, each row represents a symptom, while the columns hold the indicator value, the frequency of the symptom, the p-value, whether the symptom is positively or inversely associated with the gradient, and the z-score. Using these parameters, we begin to filter out symptoms that were infrequent (line 25) and can also filter out non-significant symptoms or symptoms with low z-scores (lines 40–41). The latter two filters were applied in our study but did not make sense for this sample data. Lines 34–36 construct the final plot we used to visualize the results of the TITAN analysis. In the plot, there are ten symptoms positively associated with the gradient, with indicator values ranging from 32 to 71. The same goes for the inversely associated symptoms. For the plots in our study, we added characteristics such as colors to group symptoms into categories and used the width of each bar to represent the frequency with which each symptom was reported.
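
To make the indicator value and z-score steps described above more concrete, the following is a minimal Python sketch of the core idea for a single symptom. It is not the appendix code and not the titan() function from the R package: the function names (`indval`, `candidate_splits`, `change_point`), the minimum group size of 5, the 100 permutations, and the simulated data are all assumptions made for illustration, and the bootstrap steps that TITAN uses to assess purity and reliability are omitted. For presence/absence data, the indicator value of a group is taken here as 100 × (frequency in that group)² / (sum of the frequencies in both groups).

```python
import random
import statistics


def indval(present, exposure, split):
    """Return (indicator value 0-100, direction) for one symptom at one split."""
    left = [p for p, x in zip(present, exposure) if x <= split]
    right = [p for p, x in zip(present, exposure) if x > split]
    freq_left = sum(left) / len(left)
    freq_right = sum(right) / len(right)
    if freq_left + freq_right == 0:
        return 0.0, "+"  # symptom never reported on either side
    iv_left = 100 * freq_left ** 2 / (freq_left + freq_right)
    iv_right = 100 * freq_right ** 2 / (freq_left + freq_right)
    # "+" means the symptom is more typical above the split, "-" below it.
    return (iv_right, "+") if iv_right >= iv_left else (iv_left, "-")


def candidate_splits(exposure, min_group=5):
    """Exposure values that leave at least min_group respondents on each side."""
    n = len(exposure)
    splits = []
    for value in sorted(set(exposure)):
        n_left = sum(x <= value for x in exposure)
        if n_left >= min_group and n - n_left >= min_group:
            splits.append(value)
    return splits


def change_point(present, exposure, n_perm=100, rng=None):
    """Pick the split whose indicator value stands out most from permutations."""
    rng = rng or random.Random(0)
    best = None
    for split in candidate_splits(exposure):
        observed, direction = indval(present, exposure, split)
        permuted = []
        for _ in range(n_perm):
            shuffled = list(present)
            rng.shuffle(shuffled)  # break any link between symptom and exposure
            permuted.append(indval(shuffled, exposure, split)[0])
        sd = statistics.pstdev(permuted)
        z = (observed - statistics.mean(permuted)) / sd if sd > 0 else 0.0
        if best is None or z > best["z"]:
            best = {"split": split, "indval": observed,
                    "direction": direction, "z": z}
    return best


# Simulated data: fifty respondents, exposure 0-50, one symptom that becomes
# more common above an exposure of 25 (all values are made up).
sim = random.Random(42)
exposure = [sim.uniform(0, 50) for _ in range(50)]
symptom = [int(sim.random() < (0.9 if x > 25 else 0.1)) for x in exposure]
print(change_point(symptom, exposure))
```

Running the same search over every symptom column and collecting the results would give a table comparable in layout to the one described above, with one row per symptom.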

    (R)

    Attachment

    Submitted filename: Review of PONE-D-19-34629.pdf

    Attachment

    Submitted filename: Response Letter.docx

    Attachment

    Submitted filename: PONE-D-19-34629R1 - Reviewer 3 Comments.pdf

    Attachment

    Submitted filename: response to reviewers PONE-D-19-34629.docx

    Attachment

    Submitted filename: PONE-D-19-34629_R2_Review.docx

    Data Availability Statement

    Gas well location and emissions data are hosted on a PowerBI report controlled by the PA Department of Environmental Protection. To view only gas well data, filter by Facility Type. We additionally filtered by year, county, and pollutant as described in our methods. Data can then be exported to a .csv file (http://www.depgreenport.state.pa.us/powerbiproxy/powerbi/Public/DEP/AQ/PBI/Air_Emissions_Report). Climate data were retrieved from NOAA's local climatological database. To use the tool, select the state and county where the airport is located. We used data from the Pittsburgh Allegheny County Airport in Allegheny County, PA. Once the airport has been added to your cart, you can choose the date range you wish to download and request a .csv of the data (https://www.ncdc.noaa.gov/cdo-web/datatools/lcd). Health data cannot be shared publicly because some of the data were collected in sparsely populated rural areas, where it may be possible to identify participants using data such as GIS coding. Data are available from the Environmental Health Project Institutional Data Access / Ethics Committee (contact via Environmental Health Project, Sarah Rankin 724.260.5504) for researchers who meet the criteria for access to confidential data.


    Articles from PLoS ONE are provided here courtesy of PLOS
