Abstract
Due to structural racism and income inequality, exposure to environmental chemicals is tightly linked to socioeconomic factors. In addition, exposure to psychosocial stressors, such as racial discrimination, as well as having limited resources, can increase susceptibility to environmentally induced disease. Yet, studies are often conducted separately in the fields of social science and environmental science, reducing the potential for holistic risk estimates. To address this gap, we developed the Chemical and Social Stressors Integration Technique (CASS-IT) to integrate environmental chemical and social stressor datasets. The CASS-IT provides a framework to identify distinct geographic areas based on combinations of environmental chemical exposure, social vulnerability, and access to resources. It incorporates two data dimension reduction tools: k-means clustering and latent profile analysis. Here, the CASS-IT was applied to North Carolina (NC) as a case study. Environmental chemical data included toxic metals (arsenic, manganese, and lead) in private drinking well water. Social stressor data were captured by the four domains of the CDC’s Social Vulnerability Index: socioeconomic status, household composition and disability, minority status and language, and housing type and transportation. Data on resources were derived from the Federal Emergency Management Agency’s (FEMA’s) Resilience and Analysis Planning Tool, which generated measures of health resources, social resources, and information resources. The results highlighted 31 NC counties where exposure to both toxic metals and social stressors is elevated and health resources are minimal; these are counties in which environmental justice is of utmost concern. A census-tract level analysis was also conducted to demonstrate the utility of CASS-IT at different geographical scales. The tract-level analysis highlighted specific tracts within counties of concern that are particularly high priority. 
In future research, the CASS-IT can be used to analyze United States-wide environmental datasets, providing guidance for targeted public health interventions and reducing environmental disparities.
Keywords: Clustering, Metals, Private wells, Social vulnerability, Geographic data
1. Introduction
Traditionally, exposures to environmental chemicals and social stressors are addressed separately in public health and social policy research. However, mounting data from the fields of epidemiology and toxicology demonstrate that harmful social and environmental factors have interactive effects (Wright, 2009). Several theoretical frameworks have been proposed for why researchers should integrate social stressors in environmental health research, generally sharing three foundational understandings (Clougherty and Rider, 2020; Flanagan et al., 2018; Gee and Payne-Sturges, 2004; Morello-Frosch and Shenassa, 2006; Rider et al., 2012). First, psychosocial stressors and lack of resources can increase susceptibility to the adverse effects of a chemical exposure. For example, poor maternal psychosocial status (i.e., high stress, low social support) has been shown to intensify the association between manganese (Mn) exposure and preterm birth risk as well as the association of maternal lead (Pb) exposure and 2-year neurodevelopmental outcomes in their offspring (Ashrap et al., 2021; Hu et al., 2006; Tamayo y Ortiz et al., 2017). Second, chemical and social exposures often co-occur due to common systemic root causes such as structural racism and income inequity. For instance, populations living proximate to Superfund sites have a higher proportion of minority residents and residents with less than a high school education compared to the general United States (US) population (US EPA, 2020; Maranville et al., 2009). Third, chemical and social exposures share biologic mechanisms of effect including disrupted epigenetic patterning, damage to endocrine and autonomic nervous systems, and increased oxidative stress and inflammation (Gee and Payne-Sturges, 2004; Morello-Frosch and Shenassa, 2006; Wright, 2009). 
For instance, inorganic arsenic (iAs) is a well-known epigenetic dysregulator of pathways such as the glucocorticoid pathway, which is a set of genes known to also be dysregulated by persistent stress and discrimination (Bailey and Fry, 2014; Meakin et al., 2020; Santos et al., 2018).
Although the theoretical underpinnings and epidemiologic evidence support integrating chemical and social exposures, to our knowledge, there is no analytical technique that researchers can apply to identify geographic areas at risk of combined chemical and social stressors. With increased interest in measuring and addressing social determinants of health, analytic strategies that incorporate the complexity of the social and physical environment are needed to inform public health policy.
North Carolina (NC) is an ideal state to study the interactive effects of harmful social and environmental chemical factors given the urgency of health disparities and enduring environmental injustices, particularly in relation to private drinking well water (Eaves et al., 2022; MacDonald Gibson and Pieper, 2017; NC Dept. of Health and Human Services, 2018). In addition, social vulnerability is high, with nearly half (43 %) of NC children living in poor or low-income homes (NC Child, 2019). Furthermore, compared to other states, NC has one of the highest proportions of the population relying on federally unregulated private well water, a group that is vulnerable to toxic chemical exposures via drinking water (Gibson et al., 2020; Sanders et al., 2012). iAs, Pb, and Mn are common metals found in private well water in NC; each is a known developmental toxicant and has been associated with an increased risk for birth defects in an NC-based study (Sanders et al., 2014). The ability to protect oneself from chemical exposure via private well water is intimately linked to social and economic resources. For example, testing and treatment are often prohibitively expensive, and access to the federally protected public water supply (under the Safe Drinking Water Act) is fraught with environmental racism (Leker and MacDonald, 2018; MacDonald Gibson and Pieper, 2017; Nigra, 2020; Stillo et al., 2019). Thus, NC is an ideal state to apply an analysis integrating chemical exposures via private wells and social stressors.
Addressing both chemical and social exposures holistically is complex due to high dimensionality in datasets, correlated chemical levels, and the challenge of simultaneously tracking chemical and non-chemical factors that can be integrated across data sources. To address these challenges, we developed the Chemical and Social Stressors Integration Technique (CASS-IT) framework and then deployed it to identify NC counties at risk of hazardous chemical and social factors. This technique builds upon the risk formula proposed by Flanagan et al.: $\text{Risk} = \text{Hazard} \times (\text{Vulnerability} - \text{Resources})$ (Flanagan et al., 2011; Flanagan et al., 2018). In our application of CASS-IT to NC, the hazard, defined by Flanagan et al. as “a condition posing the threat of harm”, is toxic metal exposure (iAs, Pb, and Mn) in NC private wells. Vulnerability, defined as “the extent to which persons or things are likely to be affected”, is captured by the four domains of the CDC’s Social Vulnerability Index (SVI): socioeconomic status, household composition and disability, minority status and language, and housing type and transportation. Lastly, resources, defined as “assets in place that will diminish the effects of hazards”, are measured by FEMA’s Resilience and Analysis Planning Tool (RAPT) and are categorized into health resources, social resources, and information resources.
Our primary goal in the present study was to identify counties of concern because, in NC, county health departments are responsible for well water user outreach and many critical social programs and policies are administered at the county level (NC DHHS, 2022; Wait et al., 2020). However, there is recognized heterogeneity in metal levels, social vulnerability, and resources within counties; therefore, we also deployed CASS-IT at the census-tract level to demonstrate that CASS-IT can be used at varied geographic resolutions. We hypothesized that: (1) individuals in NC are co-exposed to social vulnerability, inadequate resources, and chemical exposures, (2) variation in stressors exists across counties within the state, and across tracts within counties, and (3) this variation would be represented by distinct profiles of counties/census tracts. Further, we hypothesized that taking the CASS-IT approach would yield clusters of counties/census tracts for prioritization that would not emerge by examining these environmental and social stressors individually. We anticipate that the CASS-IT can be used directly in NC by public health professionals at the state level looking to prioritize resources and funding for private well water user programming, or at the county level to advocate for support and to prioritize certain areas within the county utilizing the tract-level findings. We also foresee CASS-IT being used by different states, for US-wide analyses, and/or for different exposures of interest.
2. Materials and methods
2.1. Description of the CASS-IT
The CASS-IT is a framework that builds upon the formula for risk developed by Flanagan et al. (Flanagan et al., 2011):
$$\text{Risk} = \text{Hazard} \times (\text{Vulnerability} - \text{Resources})$$
Fig. 1A details the conceptual framework behind the CASS-IT and Fig. 1B details the analytical process involved in generating risk profiles based on the CASS-IT. Note that in risk assessment, risk is commonly calculated as $\text{Risk} = \text{Hazard} \times \text{Exposure}$ (Brown, 2014). This latter equation can be transposed onto the Flanagan et al. formula when considering that propensity for exposure to hazard is related to one’s vulnerability and resources (i.e., $\text{Exposure} = \text{Vulnerability} - \text{Resources}$). The CASS-IT can be applied using different geographic units of analysis, depending on the goal of the application. Here, our primary analysis used counties as the unit of analysis; however, we also describe and present the results for a census tract-based analysis.
Fig. 1.

Overview of CASS-IT. A) Schematic of the conceptual framework underlying the CASS-IT for combining geographic data on environmental and social stressors to generate holistic risk estimates. B) Description of the analytical process for putting the framework into practice using CASS-IT.
2.2. Data
For the purposes of this study, we defined a stressor as a condition or exposure that causes biological strain or harm over time. We combined three different types of stressors: (1) chemical stressors, defined as high levels of or high likelihood of exposure to a toxic environmental chemical (the “hazard”); (2) social stressors, defined as socioeconomic forces that confer a disadvantage within the US, including being low-income, having a disability, or being non-white, among others (“vulnerability”); and (3) resource stressors, defined here as having low access to social, information or health-related resources, which would – in the counterfactual case of having access to them – be able to buffer the effects of chemical and social stressors (“resources”). A county-level dataset (n = 100) that included private well water test reports for iAs, Pb, and Mn (Eaves et al., 2022) was integrated with publicly-available data on social vulnerability (Flanagan et al., 2011; Flanagan et al., 2018) and social, health and information resources (FEMA, 2021a). The county-level dataset used to apply the CASS-IT is available in Table S1. A census tract-level dataset (n = 2195) was also generated using the same variable definitions and categories as the county-level dataset, which is available in Table S2.
2.2.1. Metals data
To capture a chemical stressor particularly relevant to environmental health in NC, metal concentrations in private well water were obtained from the NCWELL database (Eaves et al., 2022). The NCWELL database compiled private well water test samples collected by the NCDHHS Division of Public Health (DPH) State Laboratory of Public Health. Inductively coupled plasma mass spectrometry (ICP-MS) was used to characterize the contaminants present in each sample utilizing EPA method 200.8 Revision 5.4. The reports contain the metal concentration in mg/L (ppm), or an indication that the concentration was below a limit of reporting (LOR). The LOR is the minimum value below which no quantitative concentration is reported. The individual reports sent to well owners are publicly available on the NCDHHS website. These publicly available well water test reports were downloaded on May 23rd, 2019, from the NCDHHS website. Data were cleaned and then geocoded utilizing ArcGIS Desktop ESRI version 10.7 and ESRI’s Online World Geocoding Service address locator. The latitude and longitude coordinates in decimal degrees format were utilized in a spatial join to the census tract and county boundaries, based on the 2010 census reference data, to generate state, county, and census tract FIPS codes for each well water test. For all concentrations reported as below the LOR, measurements were imputed as . The final dataset included n = 71,698 well water test reports from tests taken between October 29th, 2009, and May 20th, 2019. Of these, n = 64,121 contained measurements of iAs, n = 64,315 contained measurements of Pb, and n = 64,128 contained measurements of Mn. This dataset was used to calculate the mean concentration of iAs, Pb, and Mn (in parts per billion, ppb) in each county and in each census tract. Due to non-normality of the mean values, each of the metal concentration variables was log-transformed before proceeding to statistical analysis. 
To provide context on the public health relevance for each county, the percentage of the county utilizing private well water was also calculated. To generate this percentage, the number of individuals using private wells in a county was obtained from the US Geological Survey database for 2015 and the population of the county was obtained from the US Census Bureau Population Estimates for 2018 (US Geological Survey, 2015).
2.2.2. Social vulnerability data
To capture a wide range of social stressors, we leveraged the Centers for Disease Control and Prevention’s Agency for Toxic Substances and Disease Registry (CDC/ATSDR) Social Vulnerability Index (SVI). The SVI was first reported in 2011 in the context of disaster management and at the time was used to understand the differential impact of Hurricane Katrina and associated recovery efforts (Centers for Disease Control and Prevention Agency for Toxic Substances and Disease Registry Geospatial Research Analysis and Services Program, 2018; Flanagan et al., 2011; Flanagan et al., 2018). The SVI was developed to aid local planners in identifying vulnerable communities and targeting resources for prevention and mitigation. The SVI consists of 15 variables collected through the American Community Survey, most recently the 2014–2018 5-year estimates. The 15 SVI variables are organized into four themes, which constitute the variable names henceforth: 1) socioeconomic status (which includes the variables: below poverty, unemployed, income, no high school diploma), 2) household composition and disability (which includes the variables: aged 65 or older, aged 17 or younger, civilian with a disability, single-parent households), 3) minority status and language (which includes the variables: minority, speaks English “less than well”), and 4) housing type and transportation (which includes the variables: multi-unit structures, mobile homes, crowding, no vehicle, group quarters). The SVI for each theme is generated by first ranking each of the 15 variables within the state so that each county/tract is assigned a percentile. The percentiles are then summed within a theme for each county/tract. Lastly, the summed percentiles are ranked to generate a ranking for each county/tract for each of socioeconomic status, household composition and disability, minority status and language, and housing type and transportation.
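The three-step theme construction described above (rank each variable, sum the percentiles within a theme, rank the sums) can be sketched as follows. This is a minimal illustration rather than the CDC's implementation: the percentile formula (rank minus one, divided by N minus one, with ties sharing the lowest rank), the function names, and the four-county input values are all assumptions made for the example.

```python
def percentile_ranks(values):
    """Percentile rank in [0, 1] for each value.

    Assumption: percentile = (rank - 1) / (N - 1), ties share the lowest rank.
    """
    n = len(values)
    ordered = sorted(values)
    # ordered.index(v) counts the values strictly smaller than v, i.e. rank - 1.
    return [ordered.index(v) / (n - 1) for v in values]


def theme_ranking(variables):
    """Combine several variables into one theme-level percentile ranking.

    variables: dict mapping a variable name to a list of values, one value per
    geographic unit, oriented so that higher means more vulnerable.
    """
    per_variable = [percentile_ranks(vals) for vals in variables.values()]
    # Sum the variable-level percentiles within the theme for each unit.
    summed = [sum(unit) for unit in zip(*per_variable)]
    # Rank the sums to obtain the theme-level ranking.
    return percentile_ranks(summed)


# Hypothetical values for four counties (illustration only, not study data).
minority_language_theme = {
    "minority": [10.0, 40.0, 25.0, 60.0],
    "limited_english": [1.0, 5.0, 3.0, 8.0],
}
theme_scores = theme_ranking(minority_language_theme)
```

In this toy input, the first county has the lowest value on both variables and the fourth county the highest, so they receive the lowest and highest theme rankings, respectively.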
2.2.3. Resources data
Datasets detailing county-level and census tract-level resources were identified from a Federal Emergency Management Agency (FEMA) report of community resilience indicators known as the Resilience and Analysis Planning Tool (RAPT) (FEMA, 2021a; FEMA, 2021b). RAPT includes suggested indicators for identifying social, health, and information resources in a county (FEMA, 2021c). From this, low social resources in a county/tract were measured using two indicators from which an average was calculated: 1) connection with civic or social organizations per 10,000 people and 2) the proportion of the population affiliated with a religion. The low health resources indicator represents a measure of hospital and medical capacity in the county, indicated by the average of 1) the number of hospitals per 10,000 population and 2) the number of diagnosing and treating practitioners per 1000 population. Low information resources was operationalized as the proportion of households with access to high-speed internet within a county. These data were obtained from the Federal Communications Commission’s (FCC) database (FCC, 2018). All resources variables were reverse-scored so that high values represent a worse status or lower resources, in order to ease interpretation by matching direction with the metals and social vulnerability variables.
2.2.4. Imputation of missing data
In the county-level dataset (n = 100), the low information resources variable was missing for seven counties where data were withheld to maintain firm confidentiality. In the census tract-level dataset (n = 2195), the following variables had missing values: socioeconomic status (n = 33), household composition and disability (n = 29), minority status and language (n = 24), housing type and transportation (n = 31), arsenic (n = 439), lead (n = 449), manganese (n = 451), low social resources (n = 477), low health resources (n = 477), and low information resources (n = 12). In both datasets, missing values were imputed with random forest modeling, utilizing the missForest R package (v1.4) (Stekhoven, 2013; Stekhoven and Bühlmann, 2012). In missForest, a random forest model is fit on the non-missing data and is used to predict the missing data. This is repeated, continuously updating the imputed data, until the stopping criterion is met, which occurs when the difference between the previous imputation result and the new imputation result increases for the first time (Stekhoven, 2013; Stekhoven and Bühlmann, 2012).
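The iterate-until-the-difference-increases logic can be sketched as below. This is not the missForest implementation: to stay self-contained, the random forest is replaced by a simple nearest-neighbor stand-in predictor, and the function names, the normalized squared-difference measure, and the toy data are assumptions made for illustration.

```python
def nearest_neighbor_predict(train_rows, target_row, column):
    """Stand-in predictor (NOT a random forest): copy the value from the
    training row that is closest to the target on all other columns."""
    def distance(row):
        return sum((a - b) ** 2
                   for c, (a, b) in enumerate(zip(row, target_row))
                   if c != column)
    return min(train_rows, key=distance)[column]


def iterative_impute(rows, predict=nearest_neighbor_predict, max_iter=10):
    """Iteratively impute None entries; stop when the change between
    successive imputations increases, keeping the previous imputation."""
    missing = {(i, j) for i, row in enumerate(rows)
               for j, v in enumerate(row) if v is None}
    n_cols = len(rows[0])
    # Initial guess: fill each gap with the mean of the observed column values.
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(sum(observed) / len(observed))
    current = [[col_means[j] if v is None else v for j, v in enumerate(row)]
               for row in rows]
    previous_diff = float("inf")
    for _ in range(max_iter):
        updated = [row[:] for row in current]
        for j in sorted({col for _, col in missing}):
            # Fit on rows where column j was observed, predict where it was not.
            train = [updated[i] for i in range(len(rows)) if (i, j) not in missing]
            for i in range(len(rows)):
                if (i, j) in missing:
                    updated[i][j] = predict(train, updated[i], j)
        changed = sum((updated[i][j] - current[i][j]) ** 2 for i, j in missing)
        scale = sum(updated[i][j] ** 2 for i, j in missing)
        diff = changed / scale if scale > 0 else 0.0
        if diff > previous_diff:
            break  # the difference increased: keep the previous imputation
        current, previous_diff = updated, diff
    return current


# Hypothetical two-variable data with one missing value (illustration only).
wells = [[1.0, 2.0], [2.0, 4.0], [2.9, None], [4.0, 8.0]]
completed = iterative_impute(wells)
```

Observed values are never altered; only the entries that were originally missing are updated between iterations.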
2.3. Analysis
The purpose of the CASS-IT is to determine whether groups of individual counties/tracts, or another geographic unit in other applications, can be identified using variables that combine data on chemical stressors (in this application, three variables: well water iAs, Mn, and Pb levels), social stressors (four variables: socioeconomic status, household composition and disability, minority status and language, and housing type and transportation), and resource stressors (three variables: low health resources, low information resources, and low social resources). Our assumption was that counties/tracts across the state of NC varied across all three dimensions but that some counties/tracts would be similar enough to each other to form “clusters” of counties/tracts. The goal of applying the CASS-IT was to identify the optimal number of clusters and interpret the generated clusters for substantive meaning.
The CASS-IT is a two-step analytic approach to identify clusters of counties/tracts. The first step is a k-means clustering analysis, which is followed by a latent profile analysis (LPA) as a sensitivity test. Both algorithms use a similar approach to analysis: observations are assigned to an increasing number of clusters or classes (k = 1, k = 2, k = 3, etc.), then diagnostics are examined to indicate whether a solution with k + 1 clusters is superior to the solution with k clusters. The key distinction is that LPA is a model-based approach that uses maximum likelihood estimation to assign probabilities of class membership (Jason and Glenwick, 2016; Masyn, 2013), while k-means clustering uses an optimization algorithm to minimize the distance to a cluster centroid (Hartigan and Wong, 1979; Rupp, 2013).
Given the different scales and distributions of the measures, all variables were z-score transformed prior to analysis. We conducted the two analyses (k-means and LPA) using the three metal variables alone (chemical stressor), the four social vulnerability variables alone (social stressor), the three resource variables alone (resources stressor), then applied the CASS-IT with the 10 combined chemical, social and resources stressor variables. All analyses were conducted in R (version 4.02).
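As a small illustration of the preprocessing, the sketch below log-transforms a metal variable and then z-scores it. The log base (base 10 here) and the use of the sample standard deviation (n − 1 denominator) are assumptions, since the text does not specify them, and the concentrations are hypothetical.

```python
import math
from statistics import mean, stdev


def z_scores(values):
    """Center on the mean and scale by the sample standard deviation."""
    center = mean(values)
    spread = stdev(values)  # n - 1 denominator (an assumption)
    return [(v - center) / spread for v in values]


# Hypothetical county-level mean concentrations in ppb (not study data).
county_mean_ppb = [1.0, 10.0, 100.0]
log_means = [math.log10(v) for v in county_mean_ppb]  # base 10 is an assumption
standardized = z_scores(log_means)
```

After this step every variable has mean 0 and unit standard deviation, so no single variable dominates the distance calculations used in clustering.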
2.3.1. k-Means clustering analysis
The k-means algorithm seeks to partition M points (in this case, 100 counties or 2195 census tracts) in N dimensions into k clusters (Hartigan and Wong, 1979). In this application, the N dimensions are the county-level or tract-level z-score transformed mean concentrations of metals, z-score transformed social vulnerability variables, and z-score transformed resources variables. In k-means analysis, k (the number of clusters) must be selected a priori, either based on pre-existing knowledge or through metrics assessing observable responses of the data to the clustering. To maximize interpretability of the clustering solutions, we set the a priori preferences for the optimal k to be >2 and <6 (i.e., k = 3, 4 or 5); however, we iterated through a larger range of k to observe trends in responses of the data to the clustering. To determine the optimal k for this dataset, we iterated through k = 1 to k = 99 and evaluated four metrics, while also considering interpretability. The four metrics were: (1) the number of single county/tract clusters; (2) the elbow point; (3) the Gap statistic, generated through the fviz_nbclust function within the factoextra package (v1.07) (Kassambara and Mundt, 2020); and (4) the majority rule of 30 different indices designed to determine the optimal k, generated through the NbClust function within the factoextra package (v1.07). First, we sought to minimize the number of single-county/single-tract clusters. While single-county/single-tract clusters may decrease the within-cluster variation, they are neither meaningful nor interpretable in relation to the goals of our analysis. The optimal k for the single-cluster metric is defined as the highest k before there are one or more clusters that contain only one county/tract. Following this, no solution with a single county/tract cluster was considered. Second, the elbow point was examined. 
The elbow point is determined visually by plotting the number of clusters on the x axis against the ratio of within-cluster variance to total variance on the y axis. The elbow point is the bend in the plot beyond which a further increase in the number of clusters does not yield a substantial reduction in within-cluster variability. Third, the Gap statistic was evaluated. The Gap statistic compares the total within-cluster variation for different values of k with their expected values under a null reference distribution of the data (Tibshirani et al., 2000). The best k is the one that maximizes the Gap statistic, which means that the solution is far from what would be achieved under a random uniform distribution of the dataset. Lastly, the majority rule of 30 different indices, generated using the NbClust function, was evaluated alongside the solution derived from the previous three metrics. In the current analysis, we did not apply a threshold for the number of metrics that must agree on the optimal number of clusters; instead, we evaluated all the metrics as a whole while considering the interpretability of the clusters generated. However, other users of CASS-IT may wish to set such a threshold a priori.
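Two of the four diagnostics, the count of single-member clusters and the within-to-total variance ratio plotted for the elbow point, can be sketched as follows. The sketch uses Lloyd's algorithm as a plain stand-in for the Hartigan and Wong routine cited above, does not cover the Gap statistic or the 30-index majority rule, and all function names and data points are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(dim) / len(points) for dim in zip(*points)]


def k_means(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = centroid(members)
    return labels


def within_to_total_ratio(points, labels):
    """Ratio of within-cluster to total sum of squares (y axis of the elbow plot)."""
    grand = centroid(points)
    total = sum(squared_distance(p, grand) for p in points)
    within = 0.0
    for j in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == j]
        c = centroid(members)
        within += sum(squared_distance(p, c) for p in members)
    return within / total


def singleton_clusters(labels):
    """Number of clusters that contain exactly one unit."""
    return sum(1 for j in set(labels) if labels.count(j) == 1)


# Hypothetical standardized data: two well-separated groups of three units.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels = k_means(points, k=2)
ratio = within_to_total_ratio(points, labels)
n_single = singleton_clusters(labels)
```

Repeating the last three lines over a range of k and plotting `ratio` against k gives the elbow plot, while `n_single` flags the first k at which a single-unit cluster appears.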
2.3.2. Latent profile analysis as a sensitivity test
Latent profile analysis (LPA) is another methodological approach to identifying clusters and was used in this study as a sensitivity test of the k-means analysis. LPA is considered a person-centered method (whereas k-means is variable-centered) that is appropriate for exploring unobserved heterogeneity or potential subgroups in samples (Chung et al., 2020; Kainz et al., 2018). Variable-centered methods focus on identifying relationships between variables in a population, with the assumption that the sample is drawn from a single population, whereas person-centered methods are useful for determining whether a given population is constituted by unobserved but emergent subgroups (Howard and Hoffman, 2018). In the present study, up to six latent profile solutions were estimated to identify the optimal solution. The Bayesian information criterion (BIC) was used as a measure of the relative fit across different profile solutions, with lower values indicating better relative model fit (Collins and Lanza, 2010; Schwarz, 1978). The Bootstrap Likelihood Ratio Test (BSLRT) was also used to contrast the fit of neighboring profile solutions (i.e., comparing the k-profiles model with the k − 1-profiles model) (Berlin et al., 2014). p-Values derived from the BSLRT were used to determine whether there was a statistically significant improvement in fit from the inclusion of an additional profile. Entropy and mean posterior probability values were also examined to assess the classification certainty associated with each profile solution; values closer to 1 reflect better classification certainty (Berlin et al., 2014).
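Two of these fit measures are simple to compute once a profile model has been estimated. The sketch below assumes the standard BIC definition (number of parameters times ln n, minus twice the log-likelihood) and the normalized entropy statistic commonly reported for mixture models; whether the software used in the study computes entropy in exactly this form is an assumption, and the inputs are hypothetical.

```python
import math


def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion; lower values indicate better relative fit."""
    return n_parameters * math.log(n_observations) - 2.0 * log_likelihood


def relative_entropy(posteriors):
    """Normalized entropy in [0, 1]; values near 1 mean confident classification.

    posteriors: one row per unit, holding its posterior probability of
    membership in each of the K profiles (each row sums to 1).
    """
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n * math.log(k))


# Hypothetical comparison: same log-likelihood, different model sizes.
simpler = bic(log_likelihood=-100.0, n_parameters=5, n_observations=100)
larger = bic(log_likelihood=-100.0, n_parameters=8, n_observations=100)
```

With equal log-likelihoods the model with fewer parameters has the lower BIC, which is the penalty that keeps additional profiles from being added without a corresponding gain in fit.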
3. Results
3.1. Summary of data
Data on metal contamination in private wells, social vulnerability and low health, information, and social resources were compiled at a county-level and a census tract-level across the state of NC. The county-level and census tract-level values for each of the variables are provided in Table S1 and Table S2, respectively. Table S1 also provides the percentage of each county predicted to be on private well water. Summary distributions across all 100 counties and all 2195 census tracts are provided in Table S3. The mean of the county-level concentrations for iAs was 4.04 ppb, for Pb was 5.20 ppb, and for Mn was 70.55 ppb. The maximum county-level mean for iAs was 15.26 ppb, for Pb was 33.58 ppb, and for Mn was 229.86 ppb, for Anson, Tyrrell, and Perquimans counties, respectively. For ease of interpretation of results, across all variables (metals, social vulnerability, and resources) a higher value corresponds to a worse status for human health, in other words, higher risk (Table S3).
The maximum county-level value (indicative of fewest resources in the state) for low health resources, low information resources, and low social resources was for Tyrrell, Pitt, and Greene counties, respectively. In contrast, the minimum county-level value (indicative of the highest resources in the state) for low health resources was for Durham county, for low information resources was for Avery county, for low social resources was for Tyrrell county. When looking at social stressors, Wake, Watauga, Stokes, and Camden counties had the minimum values for socioeconomic status, household composition & disability, minority status & language and housing type & transportation, respectively, indicating lowest levels of stressors in each domain. Greene, Washington, Durham, and Scotland counties had the maximum values for socioeconomic status, household composition & disability, minority status & language and housing type & transportation, respectively, indicating highest levels of stressors in each domain.
Numerous indicators across chemical, social and resource stressors were significantly correlated in pairwise Spearman rank correlation tests (Fig. 2, Table S4). Of the 45 unique pairings of stressors, 18 (40 %) were significantly correlated (p < 0.05). As expected, positive correlations were generally observed between low resources and social stressors. For example, low information resources was correlated with both socioeconomic status (Spearman coefficient: 0.66, p < 0.01) and household composition and disability (Spearman coefficient: 0.66, p < 0.01), and low health resources was correlated with socioeconomic status (Spearman coefficient: 0.59, p < 0.01). There were generally positive correlations within the indicators of social stressors; for example, socioeconomic status and minority status and language were both positively correlated with housing type and transportation.
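For readers who want to reproduce the pairwise coefficients, a minimal sketch of the Spearman rank correlation (the Pearson correlation of average ranks) is given below, along with the count of unique pairings among 10 variables. It computes coefficients only, not p-values, and the function names and input values are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = average
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five counties (illustration only, not study data).
low_information = [0.2, 0.5, 0.9, 0.4, 0.7]
socioeconomic = [0.1, 0.6, 0.8, 0.5, 0.4]
rho = spearman(low_information, socioeconomic)

# Ten variables give 45 unique pairings.
n_pairs = len(list(combinations(range(10), 2)))
```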
Fig. 2.

Pair-wise Spearman rank correlations between the 10 variables used in the CASS-IT analysis. *p< 0.05, **p< 0.01, ***p< 0.001.
3.2. Metals-only clustering
When evaluating iAs, Pb and Mn alone, four distinct county clusters were identified (k = 4) (Table S1, Fig. 3). k = 4 was the last solution before single-county clusters were formed; it also still represented a substantial decrease (>60 %) in the proportion of within-cluster variance (Table S5). Furthermore, k = 4 was the solution most commonly identified as optimal from the 30-indices summary metric (Table S5). Cluster 1 (green) represents counties in which multimetal contamination of both iAs and Mn is of concern: both iAs and Mn levels are substantially higher than the state averages. This cluster is the smallest, containing five counties, namely Union, Stanly, Alexander, Montgomery, and Anson, and is primarily centered in the south-central region of the state. Clusters 2 (41 counties) and 3 (12 counties) represent clusters in which single metal contamination is of concern, for Mn and Pb, respectively. Cluster 4 (pink) represents counties of low concern for iAs, Pb, or Mn contamination, as concentrations of all three are generally below the state average. This cluster contained the largest number of counties, 42 in total. The cluster assignments of all counties are listed in Table S1.
Fig. 3.

Results of k-means clustering for A) metals-only; B) vulnerability-only; C) resources-only. Top plots are maps of NC shaded by the county cluster assignments. Bottom plots are bar graphs demonstrating the mean of the z-score standardized mean for each variable for the counties within each of the clusters (y = 0 represents the state mean).
3.3. Social vulnerability-only clustering
We then evaluated the clustering patterns of the social vulnerability variables (the SVI’s four themes: socioeconomic status, household composition and disability, minority status and language, housing type and transportation) and also identified four distinct clusters (k = 4) (Table S1, Fig. 3). k = 4 represented a solution that reduced the within-cluster proportion of variance by over 60 % and did not contain any single-county clusters (Table S5). Moreover, the Gap statistic identified k = 4 as the optimal solution (Table S5). Cluster 1 (green) represents counties with low social vulnerability across all variables, compared to the rest of the state. The 20 counties comprising cluster 1 are generally dispersed on the coasts and borders of the state. In contrast, cluster 2 (orange) comprises counties with high social vulnerability across all four themes. The 25 counties within cluster 2 are generally located in the eastern part of the state. Clusters 3 (34 counties) and 4 (21 counties) represent counties with more varied profiles of social vulnerability. Cluster 3 (purple) represents counties with slightly higher vulnerability in socioeconomic status and household composition than the state average, although not close to the vulnerability represented in cluster 2 counties, and roughly state-average levels of minority status and language and housing type and transportation vulnerability. Lastly, cluster 4 (pink) comprises mostly urban and peri-urban counties with low socioeconomic, household composition and disability, and housing type and transportation vulnerability but high minority status and language vulnerability, given the diversity of their populations.
3.4. Resources-only clustering
Low health resources, low social resources, and low information resources were also evaluated for their clustering patterns across the state. k = 4 was again found to be the best solution for interpretability and accurate fitting of the data (Table S1, Fig. 3). Specifically, k = 4 did not contain any single-county clusters, was in a reasonable range of the elbow point, and was selected as the optimal solution by the 30-indices summary (Table S5). Note that in Fig. 3, the higher the bar, the greater the lack of resources. Cluster 1 (green) comprises 14 counties with high social resources, low health resources, and mid-level information resources. These counties are scattered across the state and generally represent more rural counties, areas where religious and civic organizations are strong but healthcare institutions are lacking. Cluster 2 (orange) contains 25 counties with high health and information resources and mid-level social resources; these are in general more urban counties. Cluster 3 (purple) includes 44 counties with mid-low health and social resources and approximately average information resources. Cluster 4 (pink) includes 17 counties that have low resources across all three definitions; these counties are generally in the eastern part of the state and represent high-priority counties with regard to resources.
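A multi-index summary of this kind is commonly read as a majority vote: each index recommends a number of clusters, and the k recommended most often is selected. The sketch below assumes that reading. The function name, the tie-breaking rule (smallest k), and the index recommendations are hypothetical and are not taken from Table S5.

```python
from collections import Counter

def majority_k(recommendations):
    """Return (k chosen by the most indices, vote tally); ties go to the smaller k."""
    tally = Counter(recommendations.values())
    best_k, _votes = max(tally.items(), key=lambda kv: (kv[1], -kv[0]))
    return best_k, dict(tally)

# Hypothetical recommendations from five validity indices (illustrative values only).
example = {
    "silhouette": 4,
    "gap": 4,
    "calinski_harabasz": 3,
    "davies_bouldin": 4,
    "dunn": 2,
}

chosen, votes = majority_k(example)
print(chosen, votes)
```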
3.5. CASS-IT results
When combining all three domains of metals, social vulnerability, and low resources, four clusters of counties across the state were identified (Fig. 4). k = 4 represented the solution identified as most optimal in the multi-indices summary and did not contain any single-county clusters (Table S5). While the reduction in the proportion of within-cluster variance with k = 4 was smaller than in the single-domain analyses, there was a >30 % reduction (Table S5). The four clusters of counties can be summarized as follows: cluster 1 (green): low metals-low resources-high social vulnerability; cluster 2 (orange): low metals-high information resources-low social vulnerability; cluster 3 (blue): high metals-high resources-low/mixed social vulnerability; cluster 4 (pink): high metals-low health resources-high social vulnerability. Thus, cluster 4 counties are those of most concern for synergistic effects of social and environmental stressors in the context of well water metals contamination, followed by cluster 3 counties. Cluster 4 contains 31 counties, primarily in the mid and coastal regions of the state. Cluster 1, on the other hand, is a collection of counties where there is great social need (high social vulnerability-low resources); however, the risk of exposure to well water-based iAs, Pb, or Mn is comparatively lower. Cluster 3, with high resources and lower social vulnerability, other than for minority status and language, represents more urban and peri-urban areas. While the higher levels of metals in cluster 3, compared to clusters 1 and 2, are worthy of attention, they are unlikely to be interacting with low resources and high social vulnerability in the same manner as in cluster 4 counties.
Fig.4.

Results of k-means clustering for the county-level analysis. A) A map of NC shaded by the county cluster assignments. B) A bar graph demonstrating the mean of the z-score standardized mean for each variable for the counties within each of the clusters (y = 0 represents the state mean).
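The quantity plotted in the bar graphs can be reproduced with a short sketch: each variable is z-score standardized across all areas, and the standardized values are then averaged within each cluster, so a bar at 0 corresponds to the state mean. The function and variable names and the toy values below are hypothetical, and the population standard deviation is used as an assumption (software that standardizes with the sample standard deviation will give slightly different heights).

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize a list of values to mean 0 and (population) SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def cluster_profiles(table, labels):
    """table: {variable: [value per area]}; labels: cluster id per area.

    Returns {cluster: {variable: mean z-score of the areas in that cluster}}.
    """
    z = {var: zscore(vals) for var, vals in table.items()}
    profiles = {}
    for c in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == c]
        profiles[c] = {var: mean([z[var][i] for i in idx]) for var in z}
    return profiles

# Hypothetical values for six areas assigned to two clusters.
toy_table = {
    "arsenic": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    "low_health_resources": [5.0, 5.0, 6.0, 1.0, 2.0, 1.0],
}
toy_labels = [1, 1, 1, 2, 2, 2]

for cluster, profile in cluster_profiles(toy_table, toy_labels).items():
    print(cluster, {var: round(val, 2) for var, val in profile.items()})
```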
3.6. LPA sensitivity assessment
In the LPA, the BIC values decreased from one profile to four profiles but increased from five profiles to six profiles, indicating that adding profiles improved model fit only up to four profiles. The BSLRT was not statistically significant (p = 0.525) when assessing the addition of the sixth profile, indicating that the six-profile solution was not optimal. The four-profile solution had a high entropy value of 0.896, indicating strong classification certainty. The mean posterior probability values for the four-profile solution ranged from 0.925 to 1.00, indicating stronger class separation compared to the three-profile (0.923) or five-profile (0.919) solutions (Asparouhov and Muthén, 2014). Thus, the four-profile solution was selected as optimal. The LPA identified the same number of profiles as the k-means analysis. There was substantive overlap in the interpretation of the profiles and clusters regarding metals exposure, resources, and social vulnerability. Specifically, 79 % of the counties were assigned to the same profile type in the LPA and the k-means analysis, indicating good congruence between the two methods (Table S7).
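For readers less familiar with these fit statistics, the following sketch shows the standard definitions of the BIC (lower values indicate better fit after penalizing the number of parameters) and of the relative entropy commonly reported for mixture models (values near 1 indicate that areas are assigned to profiles with near certainty). The function names and the input values are illustrative assumptions; they are not the model output reported above.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 * logL + p * ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def relative_entropy(posteriors):
    """Relative entropy of a classification, between 0 and 1.

    posteriors: one row per area; each row holds that area's posterior
    probabilities over the K profiles and sums to 1.
    """
    n, k = len(posteriors), len(posteriors[0])
    total = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (n * math.log(k))

# Hypothetical posterior probabilities for four areas and two profiles.
toy_posteriors = [
    [0.98, 0.02],
    [0.95, 0.05],
    [0.10, 0.90],
    [0.01, 0.99],
]

print(round(relative_entropy(toy_posteriors), 3))
print(round(bic(log_likelihood=-100.0, n_params=5, n_obs=100), 2))
```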
3.7. CASS-IT applied at the census tract-level
To demonstrate the application of CASS-IT at a different geographic resolution, we also conducted a census tract-level analysis. Utilizing the same metrics as in the county-level analysis, k = 3 was identified as the optimal solution in both the k-means analysis and the LPA. Only 236 (of 2195, 11 %) tracts did not match in the cluster profiles identified across the two techniques (Table S8). In the k-means analysis, cluster 1 (green) contains 913 tracts with comparatively lower metals levels, but high levels of social vulnerability and low resources (Fig. 5). Cluster 2 (orange) represents 1016 mostly urban tracts (located in and around cities such as Wilmington, Charlotte, Durham, Raleigh, and Greensboro, Fig. S1) in which metal levels are also comparatively low; however, unlike cluster 1, resources are high and social vulnerability is low (Fig. 5). In fact, a closer look at these urban counties reveals heterogeneity: some peri-urban census tracts are assigned to cluster 1 (high social vulnerability, low resources, and low metals) and others to cluster 3 (Fig. S1). Cluster 3 constitutes the 266 census tracts of most concern for cumulative social and chemical exposure, given the high levels of metals, low health and information resources, and high social vulnerability (Fig. 5). A number of these cluster 3 census tracts highlight peri-urban areas of concern (Fig. S1). The majority of these tracts, however, are located in non-urban counties, including Anson and Stanly counties. These two counties were in the high-priority cluster (cluster 4) in the county-level analysis, and the majority of tracts within these counties were also in the high-priority cluster (cluster 3) in the tract-level analysis. Specifically, of the 6 census tracts in Anson County, 4 (67 %) were in cluster 3. In Stanly County, 7 of 13 (54 %) census tracts were in cluster 3.
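From the counts reported above, the tract-level agreement between the two techniques is (2195 − 236)/2195 ≈ 89 %. Because cluster and profile numbers are arbitrary labels, computing such an agreement requires first matching each k-means cluster to an LPA profile. The source does not describe how this matching was done; the sketch below assumes the simplest approach of trying every one-to-one relabeling and keeping the one with the highest agreement. The function name and toy labels are hypothetical.

```python
from itertools import permutations

def best_agreement(kmeans_labels, lpa_labels):
    """Return (share of areas that agree, cluster-to-profile mapping) under the
    one-to-one relabeling that maximizes agreement."""
    clusters = sorted(set(kmeans_labels))
    profiles = sorted(set(lpa_labels))
    if len(clusters) != len(profiles):
        raise ValueError("expects the same number of clusters and profiles")
    best_share, best_map = -1.0, None
    for perm in permutations(profiles):
        mapping = dict(zip(clusters, perm))
        matches = sum(mapping[c] == p for c, p in zip(kmeans_labels, lpa_labels))
        share = matches / len(kmeans_labels)
        if share > best_share:
            best_share, best_map = share, mapping
    return best_share, best_map

# Hypothetical assignments for six areas from the two techniques.
toy_kmeans = [1, 1, 2, 2, 3, 3]
toy_lpa = ["c", "c", "a", "a", "b", "a"]

share, mapping = best_agreement(toy_kmeans, toy_lpa)
print(round(share, 3), mapping)

# Agreement implied by the tract-level counts reported in the text.
print(round((2195 - 236) / 2195, 3))
```

Exhaustive relabeling is practical only for small numbers of groups such as the three or four used here; with many groups an assignment algorithm would be the usual substitute.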
Fig.5.

Results of k-means clustering for the tract-level analysis. A) A map of NC shaded by the census tract cluster assignments. B) A bar graph demonstrating the mean of the z-score standardized mean for each variable for the census tracts within each of the clusters (y = 0 represents the state mean).
4. Discussion
Combined assessments of exposure to chemical and social stressors are increasingly necessary to take a holistic approach to public health promotion (Flanagan et al., 2018; Gee and Payne-Sturges, 2004; Morello-Frosch and Shenassa, 2006). Here we propose the CASS-IT to identify geographic areas of concern for environmental justice promotion. Specifically, these analyses identify areas where social vulnerability is high, resources are low, and there is a high level of exposure to an environmental toxicant. To demonstrate the utility of this technique, we integrated data from a database of private well water toxic metal contamination (iAs, Mn, Pb) with publicly available SVI and RAPT data and analyzed them using the CASS-IT, which entails a two-step approach of k-means clustering followed by latent profile analysis. We conducted a primary analysis at the county level and a secondary analysis at the census tract level in NC. There were three key findings of the study: first, that the CASS-IT provides additional insight as to which areas are at highest risk in NC; second, that CASS-IT results can yield information on environmental justice concerns; and third, that CASS-IT clustering can provide valuable direction for public health intervention efforts.
We recently published the NCWELL database and detailed high levels of exposure to toxic metals in NC via private well water (Eaves et al., 2022). In this previous work, we identified two critical clusters of counties for metal contamination of private wells: one cluster dominated by iAs and Mn contamination, and the other by Pb. However, what was missing from this previous work was the integration of other non-chemical stressors that contribute to the overall risk of environmentally induced disease (Flanagan et al., 2011; Flanagan et al., 2018; Morello-Frosch and Shenassa, 2006). When the NCWELL data was analyzed through the CASS-IT, all the high-metal counties in the previous analysis were found to be in either cluster 4 or cluster 3 of the CASS-IT. Cluster 4 represents the high metals-high social vulnerability-low health resources counties, while cluster 3 represents the high metals-low/mixed social vulnerability-high resources cluster. Thus, by incorporating previous environmental data with data on non-chemical stressors through the CASS-IT, distinctions can be made within these previously identified clusters of concern. Specifically, the high iAs-Mn cluster previously identified comprised Anson, Union, and Stanly counties (Eaves et al., 2022). Anson and Stanly were herein identified using the CASS-IT to be in the high-priority cluster 4 (high metals-high social vulnerability-low health resources) and Union in cluster 3 (high metals-low/mixed social vulnerability-high resources). In addition, the high Pb cluster previously identified included Guilford, Mecklenburg, Tyrrell, and Wake counties. Through the current CASS-IT analysis, Tyrrell was found to be in cluster 4, while Guilford, Mecklenburg, and Wake were located in cluster 3. 
As such, the CASS-IT provided additional context and revealed that residents of Anson, Stanly, and Tyrrell counties are not only exposed to higher levels of environmental contaminants but also have relatively low health resources and high social vulnerability. Furthermore, the application of CASS-IT at the census tract level provides both an additional layer of granularity for identifying areas of concern and confirmation of the county-level patterns found. For Anson and Stanly counties, the majority of census tracts were in the high-priority cluster of the tract-level analysis, thus mirroring the county-level findings. Note that Tyrrell County comprises only one tract, which was also located in the high-priority cluster. Thus, these counties may be ideal targets for public health intervention that requires coordinated environmental, social, and health policy action.
The findings from this study are particularly relevant when considering private well water contamination in NC using an environmental justice lens. Environmental justice is a social movement that opposes and fights against the phenomenon whereby people of color, indigenous people, ethnic minorities, and lower income communities face the disproportionate burden of environmental health hazards (Lee, 2002; Mohai et al., 2009). Private well metal contamination in NC is an environmental justice issue because the likelihood of being on private well water, and therefore not on federally regulated municipal water, is influenced by race and class (Leker and MacDonald, 2018; MacDonald Gibson and Pieper, 2017). Additionally, for those who do rely on private well water, income is a significant contributor to the ability to test and treat water to ensure its safety (Leker and MacDonald, 2018; MacDonald Gibson and Pieper, 2017; Stillo et al., 2019; Wait et al., 2020). When evaluating the county clusters generated from the CASS-IT, clusters 3 and 4 highlight counties of concern for potential environmental injustices due to the high presence of metal contaminants. While cluster 4 had an easily identifiable pattern of concern with high metal concentrations, low health resources, and high social vulnerability, cluster 3 is also significant because of the number of peri-urban and urban counties contained within the cluster. Many counties in cluster 3 have more resources and less social vulnerability compared to the state average at the county-level overall, but this is likely driven by cities within the counties. We know that peri-urban areas in NC are subject to municipal underbounding, in which city services, including water and service lines, are deliberately not extended to minority communities, thereby leaving these communities on private well water (Leker and MacDonald, 2018). 
In fact, this phenomenon was observed in the results from the census tract-level CASS-IT analysis in which numerous tracts surrounding major cities (Durham, Raleigh, Charlotte, and Greensboro, most notably) were located in the high priority cluster (cluster 3 in the tract-level analysis), even though the other tracts in the county were mostly in the lowest priority cluster (cluster 2 in the tract-level analysis). Therefore, counties in cluster 3, especially minority communities on the periphery of the major cities who rely on well water, need to be particularly prioritized for public health programming around private wells.
Of note, in addition to assisting with the identification of priority areas, the CASS-IT can also help inform planning for public health interventions. For instance, when deploying a specific environmental intervention (e.g., well water testing or filters), the success of the intervention could be improved by better understanding the unique combination of social factors that could act as barriers or facilitators to implementation (e.g., housing and transportation options, socioeconomic status). Similarly, the success of social interventions (e.g., economic benefits, mental health services) could be enhanced by understanding the environmental factors unique to that population. Further, the CASS-IT not only identifies vulnerability but can also highlight the strengths of communities, through their resources, that may inform interventions. This is unique because although most conceptualizations of social “risk” attempt to capture both negative and positive factors, the tendency is to measure only those factors that increase the risk of a negative outcome (Rodriguez et al., 2019). Specifically, cluster 4, including counties such as Anson and Stanly, which were high for metals and social vulnerability and low for health resources, actually had the highest social resources, a strength. Thus, public health efforts that partner with churches, other religious institutions, and civic organizations may be particularly effective. This additional nuance is important when tailoring interventions and priorities with scarce resources to reach the most vulnerable and under-resourced communities.
In recent years there has been growing attention to the interactive role of chemical and non-chemical stressors and numerous groups have developed tools to visualize trends across multiple stressors. For example, the EPA has developed the national EJIndex and EJScreen, the California Office of Environmental Health Hazard Assessment has created CalEnviroScreen, and our team recently launched ENVIROSCAN, a similar platform that focuses on NC (Cushing et al., 2015; Greenfield et al., 2017; UNC Superfund Research Program, 2022; US EPA, 2022). We foresee that CASS-IT can synergize with these tools as it offers a complementary framework to use data presented in these tools and to take the data a step further from visualization by generating categories of and patterns among areas that should be prioritized, a critical step in identifying areas in which to begin an intervention. In addition, CASS-IT provides a novel addition by including resources, thus also focusing on the strengths of communities.
While this is among the first studies to demonstrate an analytical framework to integrate social and environmental hazards, the findings of this study should be considered in light of several limitations. First, the NCWELL database, while the largest repository of geocoded well water tests for NC, is not without limitations. These include, but are not limited to: 1) not including metal exposures that might occur in public water, although this source of water is federally regulated and thus the exposure is generally lower (Gibson et al., 2020); 2) containing flushed draw samples, often at the well head, which are known to underestimate true lead exposure; and 3) likely being biased towards households in the socioeconomic position to pay for well water testing (Eaves et al., 2022). Second, one critical assumption made in the Flanagan et al. formula used as the basis of CASS-IT is that a unit increase in resources would nullify a unit increase in social vulnerability; whether this assumption holds under different conditions is unclear (Flanagan et al., 2011). Future work expanding on the current CASS-IT could investigate whether weighting its components differently yields different, and more or less meaningful, descriptions of risk. Third, we focused on toxic metal exposure in well water; however, there are many other critical environmental chemicals that could be considered as chemical stressors of relevance to NC and elsewhere, including proximity to confined animal feeding operations, air pollution, and per- and polyfluoroalkyl substances (PFAS) in drinking water. Future research could apply CASS-IT to these other exposures. Lastly, it is important to note that the cluster characteristics discussed herein were products of the NCWELL dataset analyzed through the CASS-IT, and that these characteristics may change when a different dataset is analyzed.
Thus, critical interpretation of the clusters produced by CASS-IT is necessary to produce robust results. Future directions of this research could include connecting CASS-IT-derived clusters to health outcomes, such as preterm birth or cancer rates, to evaluate the predictive capacity of CASS-IT.
In conclusion, in this study we developed and tested CASS-IT, an analytic approach that integrated stressors across three domains — chemical, social vulnerability and lack of resources. Many health outcomes, including maternal and child health outcomes such as preterm birth, have known risk factors in both the environmental and social domains of the human experience (Morello-Frosch and Shenassa, 2006). Preventing these outcomes and reducing the disparities that exist between racial and economic groups requires a more nuanced understanding of the combination of these domains. Analytic approaches such as the CASS-IT provide novel tools to improve the fit between our conceptual understanding of the complex determinants of health and the methods we use in research.
Supplementary Material
HIGHLIGHTS.
- Developed the CASS-IT to integrate geographic social and chemical stressor data.
- CASS-IT applied to metal contamination in private wells in North Carolina counties.
- Chemical- and social stressor-only analyses produced different results than CASS-IT.
- A 31-county high metal/high social vulnerability/low resource cluster was identified.
Funding
This research was funded in part by a grant from the National Institutes of Health (P42-ES031007).
Abbreviations:
- BIC: Bayesian information criterion
- BSLRT: Bootstrap Likelihood Ratio Test
- CDC: Centers for Disease Control and Prevention
- EPA: Environmental Protection Agency
- FCC: Federal Communications Commission
- FEMA: Federal Emergency Management Agency
- iAs: inorganic arsenic
- LOR: limit of reporting
- LPA: latent profile analysis
- Mn: manganese
- NC: North Carolina
- NCDHHS: North Carolina Department of Health and Human Services
- Pb: lead
- ppb: parts per billion
- RAPT: Resilience and Analysis Planning Tool
- SVI: social vulnerability index
Footnotes
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Lauren A. Eaves: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing-original draft preparation. Paul Lanier: Conceptualization, Methodology, Software, Data Curation, Writing-original draft preparation, Writing- review & editing, Funding acquisition; Adam E Enggasser: Writing-original draft preparation; Gerard Chung: Methodology, Software, Data Curation, Formal analysis; Toby Turla: Writing- review & editing; Julia E Rager: Conceptualization, Methodology, Writing- review & editing; Rebecca C. Fry: Conceptualization, Methodology, Writing- review & editing, Project administration, Funding acquisition.
Supplementary data to this article can be found online at https://doi.org/10.1016/j.scitotenv.2022.160409.
Data availability
Data and code are available on GitHub: https://github.com/UNCSRP/CASS-IT
References
- Ashrap P, Aker A, Watkins DJ, Mukherjee B, Rosario-Pabon Z, Velez-Vega CM, et al., 2021. Psychosocial status modifies the effect of maternal blood metal and metalloid concentrations on birth outcomes. Environ. Int. 149, 106418.
- Asparouhov T, Muthén B, 2014. Auxiliary variables in mixture modeling: three-step approaches using mplus. Struct. Equ. Model. Multidiscip. J. 21, 329–341.
- Bailey KA, Fry RC, 2014. Arsenic-associated changes to the epigenome: what are the functional consequences? Curr. Environ. Health Rep. 1, 22–34.
- Berlin KS, Parra GR, Williams NA, 2014. An introduction to latent variable mixture modeling (part 2): longitudinal latent class growth analysis and growth mixture models. J. Pediatr. Psychol. 39, 188–203.
- Brown VJ, 2014. Risk perception: it’s personal. Environ. Health Perspect. 122, A276–A279.
- Centers for Disease Control and Prevention Agency for Toxic Substances and Disease Registry Geospatial Research Analysis and Services Program, 2018. Social Vulnerability Index 2018. Database.
- Chung G, Phillips J, Jensen TM, Lanier P, 2020. Parental involvement and adolescents’ academic achievement: latent profiles of mother and father warmth as a moderating influence. Fam. Process 59, 772–788.
- Clougherty JE, Rider CV, 2020. Integration of psychosocial and chemical stressors in risk assessment. Curr. Opin. Toxicol. 22, 25–29.
- Collins LM, Lanza ST, 2010. Latent Class and Latent Transition Analysis: With Applications in the Social, Behavioral, and Health Sciences. John Wiley & Sons.
- Cushing L, Faust J, August LM, Cendak R, Wieland W, Alexeeff G, 2015. Racial/Ethnic disparities in cumulative environmental health impacts in California: evidence from a statewide environmental justice screening tool (CalEnviroScreen 1.1). Am. J. Public Health 105, 2341–2348.
- Eaves LA, Keil AP, Rager JE, George A, Fry RC, 2022. Analysis of the novel NCWELL database highlights two decades of co-occurrence of toxic metals in North Carolina private well water: public health and environmental justice implications. Sci. Total Environ. 812, 151479.
- US EPA, 2020. Population Surrounding 1,857 Superfund Remedial Sites.
- FCC, 2018. FCC County Connections Database.
- FEMA, 2021a. Resilience Analysis and Planning Tool (RAPT).
- FEMA, 2021b. Resilience Analysis and Planning Tool: CRIA Research Summary.
- FEMA, 2021c. Resilience Analysis and Planning Tool: Data Layers and Sources.
- Flanagan BE, Gregory EW, Hallisey EJ, Heitgerd JL, Lewis B, 2011. A social vulnerability index for disaster management. J. Homel. Secur. Emerg. Manag. 8.
- Flanagan BE, Hallisey EJ, Adams E, Lavery A, 2018. Measuring community vulnerability to natural and anthropogenic hazards: the Centers for Disease Control and Prevention’s social vulnerability index. J. Environ. Health 80, 34–36.
- Gee GC, Payne-Sturges DC, 2004. Environmental health disparities: a framework integrating psychosocial and environmental concepts. Environ. Health Perspect. 112, 1645–1653.
- Gibson JM, Fisher M, Clonch A, MacDonald JM, Cook PJ, 2020. Children drinking private well water have higher blood lead than those with city water. Proc. Natl. Acad. Sci. U. S. A. 117 (29), 16898–16907.
- Greenfield BK, Rajan J, McKone TE, 2017. A multivariate analysis of CalEnviroScreen: comparing environmental and socioeconomic stressors versus chronic disease. Environ. Health 16, 131.
- Hartigan JA, Wong MA, 1979. A K-means clustering algorithm. Appl. Stat. 28, 100–108.
- Howard MC, Hoffman M, 2018. Variable-centered, person-centered, and person-specific approaches. Organ. Res. Methods 21, 846–876.
- Hu H, Téllez-Rojo MM, Bellinger D, Smith D, Ettinger AS, Lamadrid-Figueroa H, et al., 2006. Fetal lead exposure at each stage of pregnancy as a predictor of infant mental development. Environ. Health Perspect. 114, 1730–1735.
- Jason L, Glenwick D, 2016. Handbook of Methodological Approaches to Community-based Research. Oxford University Press.
- Kainz K, Jensen T, Zimmerman S, 2018. Cultivating a research tool kit for social work doctoral education. J. Soc. Work. Educ. 54, 792–807.
- Kassambara A, Mundt F, 2020. factoextra: Extract and Visualize the Results of Multivariate Data Analyses.
- Lee C, 2002. Environmental justice: building a unified vision of health and the environment. Environ. Health Perspect. 110 (Suppl. 2), 141–144.
- Leker HG, MacDonald Gibson J, 2018. Relationship between race and community water and sewer service in North Carolina, USA. PLoS ONE 13.
- MacDonald Gibson J, Pieper KJ, 2017. Strategies to improve private-well water quality: a North Carolina perspective. Environ. Health Perspect. 125, 076001.
- Maranville AR, Ting T-F, Zhang Y, 2009. An Environmental Justice Analysis: Superfund Sites and Surrounding Communities in Illinois. Environmental Justice, p. 2.
- Masyn KE, 2013. Latent Class Analysis and Finite Mixture Modeling. Vol 2. Oxford University Press.
- Meakin CJ, Szilagyi JT, Avula V, Fry RC, 2020. Inorganic arsenic and its methylated metabolites as endocrine disruptors in the placenta: mechanisms underpinning glucocorticoid receptor (GR) pathway perturbations. Toxicol. Appl. Pharmacol. 409, 115305.
- Mohai P, Pellow D, Roberts JT, 2009. Environmental justice. Annu. Rev. Environ. Resour. 34, 405–430.
- Morello-Frosch R, Shenassa ED, 2006. The environmental “riskscape” and social inequality: implications for explaining maternal and child health disparities. Environ. Health Perspect. 114, 1150–1153.
- NC Child, 2019. Child Poverty in North Carolina: The Scope of the Problem.
- NC Dept. of Health and Human Services, 2018. Racial and Ethnic Health Disparities in North Carolina.
- NC DHHS, 2022. NC DHHS, Division of Social Services, General Information.
- Nigra AE, 2020. Environmental racism and the need for private well protections. Proc. Natl. Acad. Sci. U. S. A. 117, 17476–17478.
- Rider CV, Dourson ML, Hertzberg RC, Mumtaz MM, Price PS, Simmons JE, 2012. Incorporating nonchemical stressors into cumulative risk assessments. Toxicol. Sci. 127, 10–17.
- Rodriguez MY, DePanfilis D, Lanier P, 2019. Bridging the gap: social work insights for ethical algorithmic decision-making in human services. IBM J. Res. Dev. 63 (8), 1–8 8.
- Rupp AA, 2013. Clustering and Classification. Vol 2. Oxford University Press.
- Sanders AP, Desrosiers TA, Warren JL, Herring AH, Enright D, Olshan AF, et al., 2014. Association between arsenic, cadmium, manganese, and lead levels in private wells and birth defects prevalence in North Carolina: a semi-ecologic study. BMC Public Health 14.
- Sanders AP, Messier KP, Shehee M, Rudo K, Serre ML, Fry RC, 2012. Arsenic in North Carolina: public health implications. Environ. Int. 38, 10–16.
- Santos HP Jr., Nephew BC, Bhattacharya A, Tan X, Smith L, Alyamani RAS, et al., 2018. Discrimination exposure and DNA methylation of stress-related genes in Latina mothers. Psychoneuroendocrinology 98, 131–138.
- Schwarz G, 1978. Estimating the dimension of a model. Ann. Stat. 6, 461–464.
- Stekhoven DJ, 2013. missForest: Nonparametric Missing Value Imputation Using Random Forest. R Package.
- Stekhoven DJ, Bühlmann P, 2012. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics 28, 112–118.
- Stillo F, Bruine de Bruin W, Zimmer C, Gibson JM, 2019. Well water testing in African-American communities without municipal infrastructure: beliefs driving decisions. Sci. Total Environ. 10, 1220–1228.
- Tamayo y Ortiz M, Téllez-Rojo MM, Trejo-Valdivia B, Schnaas L, Osorio-Valencia E, Coull B, 2017. Maternal stress modifies the effect of exposure to lead during pregnancy and 24-month old children’s neurodevelopment. Environ. Int. 98, 191–197.
- Tibshirani R, Walther G, Hastie T, 2000. Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B Stat Methodol. 63, 411–423.
- UNC Superfund Research Program, 2022. North Carolina ENVIROSCAN.
- US EPA, 2022. EJScreen: Environmental Justice Screening and Mapping Tool.
- US Geological Survey, 2015. USGS Water Use Data for North Carolina for 2015 2022.
- Wait K, Katner A, Gallagher D, Edwards M, Mize W, Jackson CLP, et al., 2020. Disparities in well water outreach and assistance offered by local health departments: a North Carolina case study. Sci. Total Environ. 747, 141173.
- Wright RJ, 2009. Moving towards making social toxins mainstream in children’s environmental health. Curr. Opin. Pediatr. 21, 222–229.