GeoHealth. 2026 Jan 23;10(1):e2025GH001666. doi: 10.1029/2025GH001666

Surface Variable‐Based Machine Learning for Scalable Arsenic Prediction in Undersampled Areas

Shams Azad 1,2, Mason O Stahl 3, Melinda Erickson 4, Beck A DeYoung 3, Craig Connolly 2, Lawrence Chillrud 5, Kathrin Schilling 6, Ana Navas‐Acien 6, Anirban Basu 6, Brian Mailloux 7, Benjamin C Bostick 2, Steven N Chillrud 2
PMCID: PMC12828343  PMID: 41583008

Abstract

In the United States, private wells are not federally regulated, and many households do not test for arsenic (As). Chronic exposure is linked with multiple health outcomes, and risk can change sharply over short distances and with well depth. Coarse maps or sparse sampling often miss exceedances. Most existing models operate at ∼1 km resolution and use groundwater chemistry or detailed geologic logs, which limits their use in undersampled areas where improved guidance is most needed. We overcome these limitations by developing a machine learning model for Minnesota, USA, that predicts As exposure risk using only surficial variables from remote sensing and global data sets. Variables related to surface water hydrology and geomorphology are selected based on mechanistic links to the processes that control redox conditions and As mobilization. Local training was essential, and surficial geology variables that are more sensitive to local conditions were needed to maximize model accuracy. The resulting complete model was sufficiently sensitive to generate accurate and detailed risk maps and depth profiles of As concentrations above the 10 μg/L maximum contaminant level. Accuracy depended on local training data density. We identified a training data density of 0.07 wells/km² as a practical target for stable county‐level performance. Maps of exceedance probabilities highlight priority areas for testing, which are particularly important in rural communities that have received less sampling. These results support public health action by guiding where to install and test wells, how much new sampling is needed, and where treatment outreach is most urgent.

Keywords: drinking water, machine learning, hydrology, well water, public health, private wells, exposure, probability mapping

Plain Language Summary

Many Minnesotans use private wells, and these wells are not checked under federal rules. Arsenic in well water can cause serious health problems. Levels of arsenic can change quickly over short distances and with depth, so families may not know if their well is safe. To address this, we built 30‐m maps that show the chance that arsenic is higher than the drinking water limit of 10 μg/L. These maps use information from satellites and other surface data that are available everywhere. They do not require water chemistry tests or drilling logs, which makes them useful in areas with little existing data. The model works best for wells shallower than 100 m. We also estimated how many test results are needed in each county to make the maps more reliable. Local health agencies, Tribes, and well drillers can use the maps to plan testing, guide outreach, and choose safer well locations.

Key Points

  • A high‐resolution framework predicts groundwater arsenic using mainly surface data, scalable to undersampled regions

  • Maps capture household‐scale risk, account for well depth, and test transferability across geologic settings

  • Stable county‐level predictions require ≥0.07 wells/km², giving agencies a clear sampling target

1. Introduction

Geogenic As is widespread worldwide in groundwater and presents a significant human health hazard (Buschmann et al., 2007; Eisler, 2004; Fendorf et al., 2010; Rahman & Rahaman, 2018; Verma et al., 2023; Welch et al., 2000; Ying et al., 2017). Chronic As exposure, even at levels below the drinking water maximum contaminant level (MCL) of 10 μg/L, can lead to a variety of severe health outcomes, including dermatological, cardiovascular, and neurological impairments, and cancer (Lamm et al., 2021; Mohammed Abdul et al., 2015; O’Bryant et al., 2011; Singh et al., 2024; Sinha & Prasad, 2020; Tsuji et al., 2014). Arsenic concentrations in groundwater vary greatly across short distances (Van Geen et al., 2003) and with depth (Smedley & Kinniburgh, 2002). High As concentrations can occur sporadically, influenced by local geology and geochemistry. This heterogeneity challenges traditional mapping methods, which typically rely on interpolating sparse well measurements and often produce unreliable “bull's‐eye” patterns centered on data points (Donselaar et al., 2024; Pal et al., 2024; Sobel et al., 2021). These limitations have motivated alternative approaches, such as machine learning (ML) models, to better predict As distribution.

Over the past decade, ML models have been increasingly applied to predict groundwater As by recognizing patterns between known As occurrences and various environmental attributes. These models have been used at regional to global scales to highlight broad regions at risk (M. L. Erickson et al., 2021; Podgorski & Berg, 2020). By leveraging predictor variables ranging from climate to soil/water chemistry, ML models estimate As hazard in areas where sampling is sparse (Ayotte et al., 2006, 2016, 2017). Among various ML techniques, ensemble methods such as Random Forest (RF) and Boosted Regression Trees (BRT) have emerged as widely used approaches in As predictive modeling (M. L. Erickson et al., 2018; Lombard et al., 2021; Podgorski et al., 2020; Tan et al., 2020; Wu et al., 2021).

Despite extensive research on As risk, important uncertainties remain in both our understanding and our ability to map the spatial distribution of As accurately. Most existing models produce hazard maps at a coarse resolution (typically 1 km) (Podgorski & Berg, 2020). However, As levels can vary at much finer scales both horizontally and vertically (Smedley & Kinniburgh, 2002; Van Geen et al., 2003). This substantially limits their utility for identifying individual wells at risk, supporting agencies responsible for regulating drilling practices, and providing guidance to private well owners. It also complicates model validation, because a regional probability does not predict the As status of any individual well. In addition, many models depend on parameters such as in‐well water chemistry or detailed lithologic data from core logs (Amini et al., 2008; Bindal & Singh, 2019; Bretzler et al., 2017; Cao et al., 2021; Hossain & Piantanakulchai, 2013; Podgorski & Berg, 2020; Podgorski et al., 2020; Winkel et al., 2008), which biases them toward regions that have been more intensively studied and can lead to erroneous predictions in the undersampled areas where they are most needed. In particular, such models require site‐specific data that are often available only after a well has been drilled. As a result, model performance in data‐poor rural areas remains uncertain, even though those areas often face the highest As risks (Ayotte et al., 2017; Powers et al., 2019). In the United States, predicting high‐As (>10 μg/L) areas is particularly challenging because training data sets are dominated by samples with low As (≤10 μg/L) concentrations. This imbalance biases models toward the majority class and results in poor detection of high‐As locations. Thus, many current models achieve high overall accuracy but perform poorly in terms of sensitivity (Ayotte et al., 2006, 2016; M. L. Erickson et al., 2021; Saftner et al., 2023).

Geographic imbalance compounds this problem, because most test data come from densely sampled regions rather than from areas that lack sufficient data. Low sensitivity is concerning from a public health perspective because such a model misclassifies many unsafe areas as safe (i.e., it has a high false negative rate), particularly where As concentrations are near the MCL. This highlights the need for modeling strategies that place greater emphasis on reducing false negatives rather than simply maximizing overall accuracy.

Most existing models are developed within a specific geological context (Cho et al., 2011; R. Fan et al., 2024; Khatun et al., 2024; Stewart et al., 2025; Zhao et al., 2024). Because these models depend on geological parameters that can be compared across sites, they are assumed to be transferable at some scale, for example, to make predictions in undersampled regions of the same area. We are not aware of any study that has systematically evaluated their transferability. In other words, it remains unclear how a model trained in one region will perform in another with different subsurface conditions, such as sediment composition or redox state. This uncertainty in cross‐context performance is a major limitation in current As risk modeling. While general ML frameworks for environmental prediction provide useful guidance on model generalization, spatial dependence, and pseudo‐replication (Zhu et al., 2023), there is still no domain‐specific framework for geogenic As prediction. A comprehensive, empirically tested approach is still needed to evaluate how well models perform across regions and under varying depositional and redox conditions.

To address these challenges, we propose an approach that produces high‐resolution, transferable predictions using only widely available surface and subsurface variables. By excluding in situ water chemistry inputs, we aim to create a model that can be applied proactively in undersampled regions to predict As risk before new wells are drilled or extensive well testing is done. The mechanistic rationale is that many surface parameters shape where recharge occurs and what it contains, and thereby govern the subsurface processes that mobilize As. For example, clay‐rich confining layers and organic‐rich soils create anoxic groundwater conditions, which promote As release from iron oxides (McArthur et al., 2004; Mihajlov et al., 2020). Regions with alkaline soils and low recharge due to high evapotranspiration often exhibit higher groundwater pH, which drives As desorption. These linkages are well established in hydrogeochemical studies worldwide (Smedley & Kinniburgh, 2002). Thus, variables such as soil pH, flooding frequency, clay content, depth to water table, proximity to surface waters, and long‐term hydrologic balance can serve as proxies for conditions conducive to As mobilization. Incorporating these variables enables high‐resolution As prediction and constitutes a substantial improvement over previous models.

In this study, we implement and test a surface‐variable‐based ML model in Minnesota, USA, a region with known but unevenly distributed groundwater As contamination (M. Erickson & Barnes, 2004). The state contains high‐As and low‐As wells in close proximity and has a legacy of Pleistocene glacial deposition that created complex hydrogeology (M. Erickson & Barnes, 2004; M. L. Erickson & Barnes, 2005). Mandatory As testing of all new private wells in Minnesota (MN Department of Health, 2025) has generated a data set of more than 50,000 samples, which allows us both to train the model and rigorously evaluate its performance at a fine scale. Here, we address three key gaps in groundwater As prediction: (a) transferability, (b) scalability, and (c) sensitivity, with particular focus on optimization near the MCL. First, we evaluate the transferability of a surface hydrology–based model that successfully predicted groundwater As in Cambodia (Connolly et al., 2022) by applying it to Minnesota, where hydrogeologic conditions differ substantially from those of the original training region. We then gradually improve the model by adding local surface variables relevant to Minnesota. This step‐by‐step process helps us test transferability while also improving As prediction performance in a challenging setting where existing models are usually not sufficiently sensitive to resolve exceedances near the MCL.

2. Materials and Methods

2.1. Mechanistic Underpinning of Surface Variables to Predict Groundwater As

Many variables correlate with groundwater geochemical composition in dense data sets. To maximize transferability, we selected model variables based on three criteria: (a) Variables must be available as high‐resolution rasters to enable household‐scale predictions. (b) Each variable must have a potential mechanistic link to processes that mobilize As (e.g., redox conditions influenced by flooding, organic carbon, or pH‐driven sorption). For example, given the strong association of high‐As groundwater with periodic flooding (Connolly et al., 2022), variables should have direct or indirect relationships to surface hydrology and geomorphology. (c) Whenever possible, we prioritized variables derived from remotely sensed data to allow efficient site characterization without costly and labor‐intensive field sampling or well installation. Accordingly, we excluded point aqueous chemical data such as iron (Fe) or sulfur (S) concentrations, which are not available across the study domain, and subsurface geological data, such as sediment/rock type, which lack consistent, high‐resolution, well‐to‐well coverage. We excluded these variables not because they lack mechanistic relevance, but because they introduce two key limitations that reduce model transferability. First, variables such as dissolved Fe and sulfide can only be measured after a well is drilled, so they are not available for predicting arsenic risk in new or unsampled areas. Using them would require spatial interpolation (e.g., kriging) to create continuous maps (Cao et al., 2018), which would introduce additional uncertainty and potential bias into the model. Including such variables would therefore limit the model's practical use and reduce its reliability for large‐scale, pre‐drilling prediction. Second, these data are spatially sparse, often concentrated in previously studied regions, unevenly distributed, and variable in quality.

Low‐resolution or imprecise geologic inputs would have introduced additional uncertainty and biased predictions. To avoid these biases, we used surface and near‐surface proxies that represent the same underlying processes: (a) redox potential, through flooding metrics and shallow water‐table depth; (b) soil organic matter and moisture balance, which influence the development of reducing conditions in groundwater; (c) sorption and desorption controls, through soil pH; and (d) permeability and groundwater residence time, through soil texture (clay content) and topographic setting. This enables pre‐drilling risk assessment in data‐poor regions while still reflecting the Fe‐oxide reduction and sulfide formation pathways responsible for arsenic mobilization. We further prioritized high‐resolution, globally extensive data sets so that the model can scale to other undersampled regions, including areas with limited or no prior groundwater sampling.

We build on the process‐informed work of Connolly et al. (2022), which showed that surface flooding metrics capture groundwater‐surface connectivity and redox conditions. Connolly et al. developed a groundwater As prediction model for Cambodia using only five predictor variables: flooding duration and frequency, proximity to rivers, river width, and surface water fraction. The model performed well in Cambodia and transferred successfully without local training to Bangladesh and Vietnam, both of which contain large deltas (Connolly et al., 2022). When flooding regimes are compared across Cambodia, Bangladesh, and Minnesota (Figure 1), Cambodia and Bangladesh span wide ranges of flooding duration and frequency, and bins with elevated flooding correspond to high median As. By contrast, Minnesota falls within a narrow, low‐flooding range and shows low median As, consistent with the flooding‐As linkage identified by Connolly et al. (2022). However, when we restrict the Cambodia and Bangladesh data to the flooding conditions observed in Minnesota, As values center near the MCL but are more variable and have numerous high outliers. This result indicates that flooding alone cannot explain As levels near the MCL in Minnesota. Although hydrologic metrics that are effective in monsoonal alluvial environments do not fully translate to glaciated terrain, the partial overlap suggests that a hybrid approach holds promise. We hypothesize that surface variables remain reliable predictors of groundwater As in Minnesota, but only when trained on a wider range of local data.

Figure 1.


Flooding regime versus median groundwater As across regions. Top row: two‐dimensional histograms of flooding duration versus flooding frequency, colored by bin‐wise median As, for Cambodia, Bangladesh, and Minnesota. Right‐hand panels show Cambodia and Bangladesh restricted to the flooding range observed in Minnesota. Bottom row: boxplots of bin‐wise median As for each data set. Cambodia and Bangladesh occupy broad flooding regimes and contain many high‐As bins. Minnesota shows a narrow flooding range and low bin‐wise medians. Within Minnesota's flooding range, Cambodian and Bangladeshi bin medians were also low but remained higher than Minnesota's. The figure suggests that flooding variables alone (at least based on South and Southeast Asian data) will not accurately transfer to groundwater As predictions in Minnesota, and it motivates the inclusion of additional surficial predictors and local training.
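The bin-wise medians summarized in Figure 1 can be outlined with the dependency-free sketch below. It assumes both flooding metrics are percentages in [0, 100] and uses an arbitrary 10 × 10 grid; the paper does not report its bin edges, and the function names are hypothetical. `restrict_to_range` mimics the right-hand panels by keeping only samples that fall inside another region's observed flooding range.

```python
import math
from statistics import median


def binned_median(duration, frequency, arsenic, n_bins=10, lo=0.0, hi=100.0):
    """Median As per (duration, frequency) bin; empty bins are NaN.

    Assumes both flooding metrics are percentages in [lo, hi]; the bin count
    and range are illustrative, not taken from the paper.
    """
    width = (hi - lo) / n_bins
    cells = {}
    for d, f, a in zip(duration, frequency, arsenic):
        if not (lo <= d <= hi and lo <= f <= hi):
            continue  # skip values outside the assumed range
        # Clamp so a value equal to `hi` falls in the last bin.
        i = min(int((d - lo) // width), n_bins - 1)
        j = min(int((f - lo) // width), n_bins - 1)
        cells.setdefault((i, j), []).append(a)
    grid = [[math.nan] * n_bins for _ in range(n_bins)]
    for (i, j), values in cells.items():
        grid[i][j] = median(values)
    return grid


def restrict_to_range(duration, frequency, arsenic, d_range, f_range):
    """Keep only samples inside a reference region's flooding range."""
    kept = [
        (d, f, a)
        for d, f, a in zip(duration, frequency, arsenic)
        if d_range[0] <= d <= d_range[1] and f_range[0] <= f <= f_range[1]
    ]
    if not kept:
        return [], [], []
    d_kept, f_kept, a_kept = (list(col) for col in zip(*kept))
    return d_kept, f_kept, a_kept


# Toy example: two low-flooding samples and two high-flooding samples.
dur = [5, 5, 95, 100]
freq = [5, 5, 95, 100]
as_ug = [2, 4, 30, 50]
grid = binned_median(dur, freq, as_ug)
# grid[0][0] == 3 (median of 2 and 4); grid[9][9] == 40 (median of 30 and 50)
```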

To examine this, we first apply the Cambodia model in Minnesota as a transfer experiment, then add surficial predictors that better capture hydrogeochemical controls in glaciated terrain. Through this stepwise process, we develop and validate an optimized Minnesota model (see Section 2.2). Finally, we evaluate the model's ability to predict groundwater As concentrations in low‐sampled regions of the state.

2.2. Modeling Approach

We developed a series of RF classification models in Python using the scikit‐learn package (Pedregosa et al., 2011) to categorize samples by As concentration relative to the 10 μg/L MCL. In this binary classification, As concentrations >10 μg/L represent the event of interest, while concentrations ≤10 μg/L define the non‐event.
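Because the event is defined with a strict inequality, a sample at exactly 10 μg/L is labeled a non-event. A minimal sketch of this labeling rule follows; `exceeds_mcl` is a hypothetical helper, and the resulting 0/1 labels are the kind of target one would pass to scikit-learn's `RandomForestClassifier`.

```python
MCL_UG_PER_L = 10.0  # U.S. maximum contaminant level for arsenic, in ug/L


def exceeds_mcl(arsenic_ug_per_l, threshold=MCL_UG_PER_L):
    """Return 1 (event) if As > threshold, else 0 (non-event).

    A sample at exactly the MCL is a non-event, matching the strict ">"
    used in the text.
    """
    return 1 if arsenic_ug_per_l > threshold else 0


labels = [exceeds_mcl(c) for c in (0.5, 9.99, 10.0, 10.01, 42.0)]
# labels == [0, 0, 0, 1, 1]
```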

Our goal is to develop a prediction model that can be broadly applied, including in areas with limited sampling. To test transferability, we first trained a binary classification model for Cambodia using the data and parameters of Connolly et al. (2022). Because the original model was regression‐based, we constructed a new classification version with a threshold of 10 μg/L, consistent with our Minnesota framework. We then applied this Cambodia‐trained model to Minnesota samples to test its transferability across distinct hydrogeological settings. Next, we developed a Minnesota‐specific model using the same variables but trained on local hydrological data, and then refined it by incorporating additional soil‐based geological predictors to enhance accuracy.

This approach produced four binary classification models (Figure 2): (a) Cambodia‐trained with hydrological variables, tested on Cambodia data (Cam‐Cam‐Hydro); (b) Cambodia‐trained with hydrological variables, tested on Minnesota data (Cam‐Min‐Hydro); (c) Minnesota‐trained with hydrological variables, tested on Minnesota data (Min‐Min‐Hydro); and (d) Minnesota‐trained with both hydrological and surficial geological variables, tested on Minnesota data (Min‐Min‐Hydro‐Geo). This stepwise method allows us to systematically assess model transferability and the improvements gained by incorporating local predictors.

Figure 2.


Flow diagram illustrating the study approach. First, the study recreates the Cambodia model as a classification model, trained using hydrological variables. A prior study demonstrated that this model performs well in Cambodia. Next, we tested its performance in Minnesota and used it to predict As concentrations there. The third model was developed using Minnesota's hydrological data and tested in Minnesota. In the fourth model, we incorporated both hydrological and surface geological variables using Minnesota's data and tested it in Minnesota. The colored lines correspond to the modeling approach used for each of the models developed in this study.

For the binary Cambodia model (Cam‐Cam‐Hydro), we used the complete data set (n = 42,864) provided by Connolly et al. (2022). This data set was approximately balanced with respect to As concentrations, with 55% of samples (23,529 out of 42,864) exceeding the MCL. In contrast, the Minnesota data set showed a skewed distribution of As concentrations (Figure 3a): 88% of samples (51,000 out of 57,648) were below the MCL (majority class), while 12% (6,648 out of 57,648) were above it (minority class). Such an imbalance can bias binary classification results toward the majority class. To address this, we examined several resampling techniques, including random undersampling, random oversampling, SMOTE, and class weighting (discussed in Section 1 in Supporting Information S1). Random undersampling outperformed the other methods, so we adopted it to balance the data set. Specifically, we randomly selected 6,648 samples from the majority class to equalize the sample counts of the majority and minority classes, which produced a balanced data set of 13,296 samples.
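The undersampling arithmetic can be checked with a short sketch: keeping all 6,648 exceedances and drawing 6,648 of the 51,000 non-exceedances yields 13,296 samples and leaves 51,000 - 6,648 = 44,352 majority-class samples unused, the set later reused for additional evaluation. The function name and seed below are hypothetical; libraries such as imbalanced-learn provide an equivalent `RandomUnderSampler`.

```python
import random


def random_undersample(labels, seed=0):
    """Balance a 0/1 label list by randomly undersampling the majority class.

    Returns (balanced_indices, discarded_majority_indices). Every minority row
    is kept; an equal number of majority rows is drawn without replacement.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)  # fixed seed for reproducibility (an assumption)
    kept_majority = rng.sample(majority, len(minority))
    kept = set(kept_majority)
    discarded = [i for i in majority if i not in kept]
    return sorted(minority + kept_majority), discarded


# Minnesota class counts from the text: 51,000 at or below the MCL, 6,648 above.
labels = [0] * 51_000 + [1] * 6_648
balanced, held_out = random_undersample(labels)
# len(balanced) == 13_296 and len(held_out) == 44_352
```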

Figure 3.


Observed groundwater As concentrations, well depths, and population distribution in Minnesota. (a) Histogram of As concentration from groundwater samples, (b) spatial distribution of As concentration, (c) histogram of well depths, (d) spatial distribution of well depths, (e) population density map (persons/km²) developed using 1 km resolution gridded population data (Center for International Earth Science Information Network ‐ CIESIN ‐ Columbia University, 2018), (f) groundwater sample density map generated using kernel density estimation with a 10 km bandwidth and 100 m pixel size.
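Panel (f) of Figure 3 maps sampling density with kernel density estimation. The sketch below shows the idea under stated assumptions: coordinates in km on a projected grid, a Gaussian kernel (the caption gives only the 10 km bandwidth and 100 m pixel size, not the kernel shape), and density expressed in wells per km². The function names are hypothetical.

```python
import math


def gaussian_kde_density(x, y, wells, bandwidth_km=10.0):
    """Kernel density of sample locations at (x, y), in wells per km^2.

    Coordinates are in km on a projected grid. Each well contributes a 2-D
    Gaussian kernel that integrates to 1, so the whole surface integrates to
    the number of wells. The Gaussian kernel shape is an assumption.
    """
    h2 = bandwidth_km ** 2
    norm = 1.0 / (2.0 * math.pi * h2)
    total = 0.0
    for wx, wy in wells:
        d2 = (x - wx) ** 2 + (y - wy) ** 2
        total += norm * math.exp(-d2 / (2.0 * h2))
    return total


def density_grid(wells, xmin, ymin, ncols, nrows, pixel_km=0.1, bandwidth_km=10.0):
    """Density at each pixel center (100 m pixels by default).

    Row 0 is the southern (minimum-y) edge of the grid.
    """
    return [
        [
            gaussian_kde_density(
                xmin + (c + 0.5) * pixel_km,
                ymin + (r + 0.5) * pixel_km,
                wells,
                bandwidth_km,
            )
            for c in range(ncols)
        ]
        for r in range(nrows)
    ]
```

Evaluating every well at every pixel is quadratic in practice; GIS heatmap tools avoid this by truncating each kernel at a finite radius.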

We used an 80:20 split for the Cambodia data set (n = 42,864). This split retained a large training set while preserving a sufficiently large, independent test set. Sensitivity checks across 60:40, 70:30, 75:25, and 80:20 ratios produced performance metrics that were indistinguishable within sampling uncertainty (Table S2 in Supporting Information S1). For the smaller, balanced Minnesota data set (n = 13,296), we applied a 75:25 split to keep enough test samples for meaningful evaluation while preserving most of the data for training. A separate sensitivity analysis of train‐test ratios (Table S3 in Supporting Information S1) showed consistent area under the receiver operating characteristic curve (ROC‐AUC), accuracy, and sensitivity values across all splits. These results indicate that model performance is stable and insensitive to the choice of data partition.
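For reference, the two partitions translate into the following set sizes. This sketch assumes a simple shuffled split with the test count rounded up; the paper does not state whether its split was stratified, and `split_indices` is a hypothetical helper rather than the authors' code.

```python
import math
import random


def split_indices(n, test_fraction, seed=0):
    """Shuffle row indices 0..n-1 and split them into (train, test) lists.

    The test count is rounded up, an assumed convention.
    """
    n_test = math.ceil(test_fraction * n)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[n_test:], idx[:n_test]


train_mn, test_mn = split_indices(13_296, 0.25)    # 9,972 train / 3,324 test
train_cam, test_cam = split_indices(42_864, 0.20)  # 34,291 train / 8,573 test
```

Repeating the split with different `seed` values is one way to run the 10-iteration sensitivity check described in the next paragraph.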

For each model, we selected hyperparameters using 5‐fold cross‐validation, evaluating 288 candidate combinations for a total of 1,440 model fits. The optimal hyperparameters for the Cam‐Cam‐Hydro model were n_estimators = 500, max_depth = 16, min_samples_split = 2, min_samples_leaf = 1, and max_features = “sqrt”. For both the Min‐Min‐Hydro and Min‐Min‐Hydro‐Geo models, the optimal hyperparameters were n_estimators = 500, max_depth = 20, min_samples_split = 2, min_samples_leaf = 1, and max_features = “sqrt”. Because performance could depend on which samples fall into the training and test sets, we performed a sensitivity analysis by repeating the random train‐test split 10 times. The model produced consistent accuracy across all iterations. We therefore fixed the train and test sets from the final iteration so that all models could be evaluated on the same test data. The 44,352 majority‐class samples removed during undersampling were used for further model evaluation.
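The reported search size is consistent with a grid of 288 combinations, each fitted once per fold: 288 × 5 = 1,440 fits. The candidate values below are hypothetical (the text reports only the count and the selected optima), chosen so that the grid multiplies to 288 and contains both reported optima. In scikit-learn, an equivalent dictionary could be passed as `param_grid` to `GridSearchCV(RandomForestClassifier(), param_grid, cv=5)`.

```python
from itertools import product

# Hypothetical search space: only the total (288) and the optima are reported.
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 16, 20],
    "min_samples_split": [2, 5, 10, 20],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
}


def expand_grid(grid):
    """List every hyperparameter combination as a dict."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def kfold_slices(n, k=5):
    """(start, stop) bounds of k contiguous folds over n (pre-shuffled) rows.

    The first n % k folds receive one extra row.
    """
    base, extra = divmod(n, k)
    bounds, start = [], 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


candidates = expand_grid(param_grid)
# 9,972 = size of the Minnesota training set under a 75:25 split.
n_fits = len(candidates) * len(kfold_slices(9_972, k=5))  # 288 * 5 = 1,440
```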

To evaluate alternative algorithms, we compared RF with Extreme Gradient Boosting (XGB) and Support Vector Machine (SVM) models. All three models were trained and tested on the same data sets to ensure identical evaluation conditions. The results (Tables S4 and S5 in Supporting Information S1) show that RF consistently achieved the highest sensitivity, accuracy, and ROC‐AUC for both the Cambodia and Minnesota data sets, while also minimizing false negatives. We selected RF because it performed best in our comparison and has been widely used in previous groundwater arsenic studies, including the Cambodia model of Connolly et al. (2022). To evaluate potential overfitting, we examined learning curves showing training and cross‐validated recall as functions of training sample size for both the Cambodia and Minnesota models (Figure S1 in Supporting Information S1). In both cases, validation recall increased with more data while training recall decreased, indicating improved generalization and limited overfitting.
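Sensitivity, accuracy, and false-negative counts all derive from the confusion matrix, with exceedance (As > 10 μg/L) as the positive class. The stdlib sketch below uses hypothetical helper names; scikit-learn's `confusion_matrix` and `recall_score` compute the same quantities. The last lines illustrate why overall accuracy alone can mislead on imbalanced data.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN), with 1 = As > 10 ug/L (event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Sensitivity, specificity, accuracy, and false-negative count."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # recall: share of exceedances caught
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
        "false_negatives": fn,  # unsafe wells predicted as safe
    }


# A classifier that always predicts "safe" on an 88:12 split looks accurate
# but misses every exceedance.
always_safe = summarize([0] * 88 + [1] * 12, [0] * 100)
# always_safe["accuracy"] == 0.88, always_safe["sensitivity"] == 0.0
```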

2.3. Predictor Variables

In this study, most surface variable data were collected as raster files from Google Earth Engine (Gorelick et al., 2017), and the predictor data were extracted and processed using Quantum Geographic Information System (QGIS) software and Python. We included six hydrological variables: (a) distance from each sample to the nearest river, (b) width of the nearest river, (c) proportion of river area and (d) proportion of lake area within a 1 km radius buffer of each sample, (e) water occurrence (flooding duration), and (f) water recurrence (flooding frequency).

The distance to the nearest river and river width variables were derived from the global river width data set based on the Landsat river mask (Allen & Pavelsky, 2018). The data were downloaded directly from Google Earth Engine as a line shapefile (Gorelick et al., 2017). To determine river distance and width, we first converted the line feature into points in QGIS and then computed the distance between each well and its nearest river point, along with the river width associated with that point. To calculate the proportions of river and lake area, we first downloaded the 30‐m‐resolution river and lake raster files of the ASTER Global Water Bodies Database (NASA/METI/AIST/Japan Spacesystems & U.S./Japan ASTER Science Team, 2019) from Google Earth Engine. Then, in QGIS, we created a 1‐km buffer around each sample and computed the percentage of the buffer area covered by river and lake pixels. The water occurrence and recurrence rasters were obtained from the JRC Global Surface Water data set on Google Earth Engine. Because these variables were included to capture the influence of surface flooding on As concentration, all permanent water pixels (pixel value = 1) were masked out before aggregating values within each buffer.
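The nearest-river lookup amounts to a nearest-neighbor search from each well to the points obtained by converting river centerlines to points. The brute-force sketch below assumes projected coordinates in meters and a hypothetical `(x, y, width_m)` tuple per river point. For tens of thousands of wells, a spatial index such as SciPy's `cKDTree` would be much faster, and the spacing of points along each centerline controls how closely the result approximates the distance to the line itself.

```python
import math


def nearest_river(well_xy, river_points):
    """Distance to the closest river point and that point's river width.

    `river_points` is a list of (x, y, width_m) tuples produced by converting
    river centerlines to points; coordinates are in meters in a projected CRS.
    """
    best_d, best_w = math.inf, None
    wx, wy = well_xy
    for rx, ry, width in river_points:
        d = math.hypot(rx - wx, ry - wy)
        if d < best_d:
            best_d, best_w = d, width
    return best_d, best_w


river_points = [(0.0, 0.0, 50.0), (300.0, 400.0, 120.0), (1000.0, 0.0, 30.0)]
# nearest_river((300.0, 0.0), river_points) -> (300.0, 50.0)
# nearest_river((300.0, 300.0), river_points) -> (100.0, 120.0)
```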

The final model (Min‐Min‐Hydro‐Geo, based on local hydrology and geology) also included surface geology and other factors to account for their potential effects on surface chemical and hydrological processes. Soil variables include soil pH, soil clay content, and soil organic matter; these data originated from the 30‐m resolution Probabilistic Remapping of SSURGO soil properties data set (Chaney et al., 2016). Additionally, we included the annually averaged normalized difference vegetation index (NDVI) from Landsat 8 and an aridity index data set (Global‐AI_PET_v3) (Zomer et al., 2022) as additional metrics of surface water hydrology. NDVI, an index of the health and density of vegetation, is often limited by water availability, so it is potentially a sensitive indicator of water availability in areas that do not flood, whereas the aridity index (the ratio of precipitation to potential evapotranspiration) is a measure of the potential for recharge. We also included elevation from the Shuttle Radar Topography Mission (SRTM) and water table depth from a global data set (Y. Fan et al., 2013).
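Both added hydrologic indices are simple ratios: NDVI = (NIR - Red) / (NIR + Red), which for Landsat 8 OLI uses band 5 (NIR) and band 4 (red), and the aridity index = mean annual precipitation / potential evapotranspiration. The sketch below assumes that the annual NDVI is the mean of per-scene values; the paper does not say whether it averaged per-scene NDVI or composited reflectance first, and the function names are hypothetical.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index, (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def annual_mean_ndvi(scenes):
    """Mean of per-scene NDVI over a year; `scenes` holds (nir, red) pairs.

    Averaging per-scene values (rather than compositing first) is an assumption.
    """
    values = [ndvi(nir, red) for nir, red in scenes if nir + red > 0]
    return sum(values) / len(values) if values else math.nan


def aridity_index(precip_mm, pet_mm):
    """Aridity index: mean annual precipitation / potential evapotranspiration."""
    return precip_mm / pet_mm
```

Lower aridity-index values indicate drier conditions, where potential evapotranspiration exceeds precipitation and less water is available for recharge.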

Because the specific locations of groundwater recharge are uncertain, we aggregated all raster‐based variables within a 1 km buffer around each well; shallow wells (<50 m) usually draw water sourced within this distance (U.S. Geological Survey, 2016a). We also included the measured depth of each well as a predictor variable to assess its influence on As prediction. Table 1 summarizes the 14 predictor variables, their definitions, and the models in which they were used.
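The 1 km buffer aggregation can be sketched as a masked mean over raster cells whose centers fall inside the circle. This assumes a north-up projected raster in meters and a cell-center inclusion rule; QGIS zonal statistics may treat partially covered cells differently, and `buffer_mean` is a hypothetical helper. The `nodata` argument stands in for masking permanent water out of the flooding layers.

```python
import math


def buffer_mean(raster, origin_xy, cell_size, center_xy, radius=1000.0, nodata=None):
    """Mean of cells whose centers lie within `radius` of `center_xy`.

    `raster` is a list of rows; `origin_xy` is the upper-left corner, with y
    decreasing down the rows (north-up projected grid in meters). Cells equal
    to `nodata` are skipped, e.g. permanent water masked out of the flooding
    layers. Returns NaN if no valid cell falls inside the buffer.
    """
    x0, y0 = origin_xy
    cx, cy = center_xy
    nrows, ncols = len(raster), len(raster[0])
    # Restrict the search to the circle's bounding window.
    c_min = max(0, int((cx - radius - x0) // cell_size))
    c_max = min(ncols - 1, int((cx + radius - x0) // cell_size))
    r_min = max(0, int((y0 - (cy + radius)) // cell_size))
    r_max = min(nrows - 1, int((y0 - (cy - radius)) // cell_size))
    total, count = 0.0, 0
    for r in range(r_min, r_max + 1):
        yc = y0 - (r + 0.5) * cell_size
        for c in range(c_min, c_max + 1):
            xc = x0 + (c + 0.5) * cell_size
            value = raster[r][c]
            if nodata is not None and value == nodata:
                continue
            if math.hypot(xc - cx, yc - cy) <= radius:
                total += value
                count += 1
    return total / count if count else math.nan
```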

Table 1.

Predictor Variables Used in As Prediction Models

| Variable | Description | Model usage (a) |
| --- | --- | --- |
| Distance to nearest river | Euclidean distance from each well to the closest river segment | 1, 2, 3, 4 |
| Nearest river width | Width of the nearest river, derived from Landsat river mask | 1, 2, 3, 4 |
| Proportion of river | Percent of 1 km buffer area around a well covered by river pixels | 1, 2, 3, 4 |
| Proportion of lake | Percent of 1 km buffer area around a well covered by lake pixels | 1, 2, 3, 4 |
| Water occurrence (flooding duration) | Frequency of surface water presence over time (excluding permanent water) | 1, 2, 3, 4 |
| Water recurrence (flooding frequency) | Degree to which surface water appears repeatedly over time (excluding permanent water) | 1, 2, 3, 4 |
| Soil pH | Soil acidity/alkalinity at 30 m resolution (SSURGO‐derived data set) | 4 |
| Soil clay content | Proportion of clay in soil profile at 30 m resolution (SSURGO‐derived data set) | 4 |
| Soil organic matter | Organic matter content in soil at 30 m resolution (SSURGO‐derived data set) | 4 |
| NDVI | Vegetation index from Landsat 8: proxy for water availability and plant health | 4 |
| Aridity index | Global AI_PET_v3 data set: measure of potential recharge and water balance | 4 |
| Elevation | Elevation derived from Shuttle Radar Topography Mission (SRTM) | 4 |
| Water table depth | Depth to groundwater table from global data set | 4 |
| Well depth | Actual depth of each well, included to assess vertical variation in As | 4 |

(a) Model references: 1 = Cam‐Cam‐Hydro model (Cambodia‐trained, tested in Cambodia). 2 = Cam‐Min‐Hydro model (Cambodia‐trained, tested in Minnesota). 3 = Min‐Min‐Hydro model (Minnesota‐trained, tested in Minnesota). 4 = Min‐Min‐Hydro‐Geo model (Minnesota‐trained, tested in Minnesota).

We examined pairwise predictor relationships using Pearson correlation on the training set (Figure S2 in Supporting Information S1) and calculated variance inflation factors (VIF) (Table S6 in Supporting Information S1). Most variables showed low intercorrelation; however, the water occurrence and water recurrence variables were highly correlated. Because RF models use feature subsampling and tree averaging, they are less sensitive to multicollinearity. We retained both variables because they were used in Connolly et al. (2022) and have been identified as important predictors of groundwater As. We screened all predictor variables for outliers and implausible values using univariate histograms and summary statistics. The few extreme values identified were winsorized at the 0.5th and 99.5th percentiles to reduce their influence while retaining all valid observations.
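For two predictors, the variance inflation factor reduces to VIF = 1 / (1 - r²), where r is their Pearson correlation; with more predictors, r² is replaced by the R² from regressing one predictor on all the others. The sketch below shows this pairwise case, which is the relevant one for water occurrence and recurrence, together with winsorizing at the 0.5th and 99.5th percentiles. Percentiles use linear interpolation (the same convention as NumPy's default); the helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_vif(x, y):
    """VIF when only two predictors are present: 1 / (1 - r^2).

    With more predictors, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
    regressing predictor j on all the others.
    """
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of sorted data, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def winsorize(values, tail=0.005):
    """Clip values to the [tail, 1 - tail] percentiles (0.5th/99.5th by default)."""
    s = sorted(values)
    lo, hi = percentile(s, tail), percentile(s, 1.0 - tail)
    return [min(max(v, lo), hi) for v in values]
```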

2.4. Groundwater As Data

Groundwater As data were compiled from 57,648 unique samples in the USGS NWIS and Minnesota Department of Health data sets (U.S. Geological Survey, 2016b) for model development. Most wells in Minnesota had low As concentrations, with 52% of samples below 2 μg/L, producing a right‐skewed distribution (Figure 3a). Groundwater As showed a clear spatial pattern, with a hotspot of high As levels extending from the western to the central part of the state (Figure 3b), yet high‐ and low‐As wells often occurred adjacent to each other. Most wells in our data set were relatively shallow (Figure 3c), with a mean depth of 43 m and a median depth of 32 m. Well depths also displayed spatial structure, with wells in the southeastern region generally deeper than those elsewhere (Figure 3d). Groundwater sampling was uneven across the state, with higher densities in more populated regions (Figures 3e and 3f). Prediction models that estimate As levels across all areas are therefore especially useful in sparsely populated regions, where both population and well densities are low.
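Given the uneven sampling, the county-level target of 0.07 wells/km² from the Abstract translates directly into a count of additional wells per county. The sketch below uses made-up county names, well counts, and areas purely for illustration; it is not the authors' procedure, and it assumes that every well with an As result counts toward the density.

```python
import math

TARGET_DENSITY = 0.07  # wells per km^2, the county-level target in the Abstract


def sampling_gap(n_wells, area_km2, target=TARGET_DENSITY):
    """Return (current density in wells/km^2, additional wells to reach target)."""
    density = n_wells / area_km2
    # The small epsilon guards against floating-point noise in target * area
    # pushing the ceiling up by one well.
    required = math.ceil(target * area_km2 - 1e-9)
    return density, max(0, required - n_wells)


# Hypothetical counties: (name, wells with As results, land area in km^2).
counties = [("County A", 50, 1000.0), ("County B", 400, 2000.0)]
for name, n, area in counties:
    density, extra = sampling_gap(n, area)
    print(f"{name}: {density:.3f} wells/km^2, {extra} more wells needed")
```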

2.5. Generating Prediction Maps

To generate prediction maps at a given spatial resolution, we first created a grid of points covering the entire study area, with point spacing equal to the desired resolution. For example, for a 30 m resolution map, points were spaced at 30 m intervals, and for a 250 m resolution map, at 250 m intervals. Around each grid point, we created a buffer with a 1 km radius and calculated the mean value of each predictor variable within it. Although the buffer radius was fixed at 1 km, the buffer centroid shifted by one grid spacing (e.g., 30 m for the 30 m map) between successive grid points. Because the predictor variables are high resolution, each prediction point therefore retained a unique set of aggregated predictor values. We then used these averaged predictor values as inputs to our RF model to classify As contamination at each point. Finally, we converted the point predictions into raster maps at both 30 and 250 m resolution. Because the prediction resolution depends directly on grid spacing, this approach can generate prediction maps at any chosen spatial resolution.
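The grid-and-buffer aggregation described above can be sketched as follows. This is a hypothetical, stdlib-only illustration, not the authors' GIS workflow. It assumes a single predictor stored as a 2D list in projected metre coordinates, with cell centres at ((col + 0.5)·cell, (row + 0.5)·cell) and `None` marking NoData. A cell counts as "within the buffer" when its centre lies inside the 1 km circle. The names `buffer_mean` and `prediction_grid` are illustrative only.

```python
def buffer_mean(raster, cell_size, x, y, radius):
    """Mean of raster cells whose centres fall within `radius` metres of (x, y)."""
    n_rows, n_cols = len(raster), len(raster[0])
    # Only scan the circle's bounding box, clipped to the raster extent.
    c0 = max(0, int((x - radius) // cell_size))
    c1 = min(n_cols - 1, int((x + radius) // cell_size))
    r0 = max(0, int((y - radius) // cell_size))
    r1 = min(n_rows - 1, int((y + radius) // cell_size))
    total, count = 0.0, 0
    for r in range(r0, r1 + 1):
        cy = (r + 0.5) * cell_size
        for c in range(c0, c1 + 1):
            cx = (c + 0.5) * cell_size
            inside = (cx - x) ** 2 + (cy - y) ** 2 <= radius ** 2
            if inside and raster[r][c] is not None:  # skip NoData cells
                total += raster[r][c]
                count += 1
    return total / count if count else None


def prediction_grid(raster, cell_size, spacing, radius=1000.0):
    """Buffer-averaged predictor value at points spaced `spacing` metres apart.

    The grid spacing sets the output map resolution (e.g. 30 m or 250 m),
    while the buffer radius stays fixed at 1 km.
    """
    height = len(raster) * cell_size
    width = len(raster[0]) * cell_size
    points = []
    y = spacing / 2
    while y < height:
        x = spacing / 2
        while x < width:
            points.append((x, y, buffer_mean(raster, cell_size, x, y, radius)))
            x += spacing
        y += spacing
    return points


# Usage with a hypothetical constant 3 km x 3 km predictor raster at 30 m cells.
raster = [[5.0] * 100 for _ in range(100)]
points = prediction_grid(raster, cell_size=30.0, spacing=250.0)
print(len(points))  # 144 points (12 x 12) for a 250 m map
```

In practice each predictor layer would be aggregated this way, and the resulting feature vectors passed to the classifier. A convolution with a circular kernel is a faster equivalent for large rasters.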

3. Results and Discussion

We evaluated the performance of four models (see Section 2.2) in predicting groundwater As concentrations exceeding the 10 μg/L MCL across Minnesota. The following subsections detail model accuracy, variable importance, false negative analysis, depth‐dependent performance, and spatially resolved prediction maps. Model accuracy metrics are described in detail in Section 6 in Supporting Information S1.

3.1. Model Accuracy and Variable Importance

3.1.1. Cam‐Cam‐Hydro: Binary Model Trained and Tested on Cambodia Data

This binary classification model was developed for Cambodia to distinguish samples with As concentrations above or below 10 μg/L, using the data and variables from Connolly et al. (2022). The model included only the hydrological predictors from that study: flooding occurrence, flooding recurrence, proximity to rivers, proportion of rivers and lakes, and river width. The model was highly effective, performing similarly to the regression model reported in Connolly et al. (2022). It achieved an out‐of‐sample accuracy of 0.80 and an AUC (area under the ROC curve) of 0.87. Tree‐based feature importance analysis (Figure S4 in Supporting Information S1) and partial dependence plots (Figure S5 in Supporting Information S1) identified flooding recurrence and flooding occurrence as the most influential predictors of groundwater As levels in Cambodia.

3.1.2. Cam‐Min‐Hydro: Binary Model Trained on Cambodia and Tested on Minnesota

Given the robust predictive power of the hydrologic variables from Connolly et al. (2022) and their strong mechanistic connection to As mobilization, we evaluated the transferability of the Cambodia‐trained hydrologic model (Cam‐Cam‐Hydro) by applying it to Minnesota well sites (Cam‐Min‐Hydro). Cambodia experiences frequent and prolonged flooding, whereas flooding in Minnesota is less extensive (Figure 1; Figure S8 in Supporting Information S1). The model therefore predicted that nearly all wells in Minnesota would be low As, consistent with expectations for regions with limited flooding. However, when applied to Minnesota, the Cam‐Cam‐Hydro model was unable to identify wells with As concentrations above the MCL. It achieved an ROC‐AUC of 0.51 (Figure S3 in Supporting Information S1), indicating near‐random performance, and a sensitivity of only 0.01 for detecting wells exceeding 10 μg/L (Table 2). This poor performance shows that flooding alone cannot discriminate between Minnesota wells above and below the MCL, since the MCL lies at the very low end of concentrations typically observed in Cambodia. It also highlights the limits of model transferability and the need to train models with local data for reliable predictions. These findings are consistent with previous studies reporting limited cross‐regional applicability (M. L. Erickson et al., 2018; Fienen et al., 2016) and suggest that incorporating additional variables capturing Minnesota's distinct hydrological, geological, and geochemical conditions could improve performance.

Table 2.

Performance Metrics of Groundwater As Prediction Models on the Test Data Set

| Model | Sensitivity | Specificity | False negative rate | False positive rate | Positive predictive value (PPV) | Negative predictive value (NPV) | Accuracy | ROC‐AUC |
|---|---|---|---|---|---|---|---|---|
| Cam‐Cam‐Hydro | 0.85 | 0.74 | 0.15 | 0.26 | 0.80 | 0.79 | 0.80 | 0.87 |
| Cam‐Min‐Hydro | 0.01 | 0.98 | 0.99 | 0.01 | 0.35 | 0.50 | 0.50 | 0.51 |
| Min‐Min‐Hydro | 0.66 | 0.63 | 0.33 | 0.36 | 0.64 | 0.66 | 0.65 | 0.70 |
| Min‐Min‐Hydro‐Geo | 0.78 | 0.68 | 0.22 | 0.32 | 0.71 | 0.75 | 0.73 | 0.80 |

Note. The best model based on MN testing data is bolded for each fitting metric.
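Every column of Table 2 except ROC‐AUC follows from a single 2 × 2 confusion matrix. As a worked check, the Min‐Min‐Hydro‐Geo counts can be back‐calculated from the figures reported in Section 3.2 (1,663 exceedances in the test set, 370 false negatives, 1,821 wells predicted >10 μg/L, and 1,503 predicted ≤10 μg/L). The function name below is hypothetical. ROC‐AUC cannot be recovered this way because it requires the predicted probabilities, not just the labels.

```python
def classification_metrics(tp, fp, tn, fn):
    """Table 2-style metrics from confusion-matrix counts (positive = As > 10 μg/L)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Counts back-calculated from the text for Min-Min-Hydro-Geo.
fn = 370
tp = 1663 - fn  # 1,293 exceedances correctly flagged
fp = 1821 - tp  # 528 wells wrongly flagged
tn = 1503 - fn  # 1,133 wells correctly cleared
m = classification_metrics(tp, fp, tn, fn)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

The rounded values reproduce the Min‐Min‐Hydro‐Geo row: sensitivity 0.78, specificity 0.68, false negative rate 0.22, false positive rate 0.32, PPV 0.71, NPV 0.75, and accuracy 0.73.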

3.1.3. Min‐Min‐Hydro: Binary Model Trained and Tested on Minnesota Data

The Min‐Min‐Hydro model, trained on Minnesota hydrological variables similar to those used in the Cambodia model, performed better than a random model, with an out‐of‐sample accuracy of 0.65, an ROC‐AUC of 0.70, and a false negative rate of 0.33. These results suggest that this model, based solely on local hydrologic variables, can predict groundwater As exceedances in Minnesota, but with lower accuracy than the Cam‐Cam‐Hydro model achieved in Cambodia (Table 2). This indicates that while hydrologic variables are important predictors of As concentrations in Cambodia, their limited variability in Minnesota reduces their predictive power, as the partial dependence plots also show (Figure S6 in Supporting Information S1). Thus, additional factors beyond remotely sensed surface water conditions are needed to improve prediction accuracy for Minnesota.

3.1.4. Min‐Min‐Hydro‐Geo: Binary Model Trained and Tested on Minnesota Data

The Min‐Min‐Hydro‐Geo model combined hydrological variables with additional surface and soil attributes, including soil clay content, soil organic matter, soil pH, elevation, water table depth, vegetation coverage, and aridity index. In the RF classification, we applied a probability threshold of 0.47 instead of the conventional 0.50 to reduce false negatives while maintaining overall accuracy. This threshold was selected using Youden's J statistic (J = TPR − FPR) (Youden, 1950) derived from the ROC curve. This method identifies the cutoff that maximizes the difference between the true‐positive and false‐positive rates and thus provides the best balance between sensitivity and specificity. Incorporating these additional predictors and adjusting the threshold substantially improved performance compared to the hydrology‐only models: out‐of‐sample accuracy increased to 0.73 and the false negative rate declined to 0.22 (Table 2). The model achieved higher specificity than several As prediction models previously developed for the United States (Ayotte et al., 2006, 2016, 2017; M. L. Erickson et al., 2021; Lombard et al., 2021).
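The threshold selection described above can be sketched as follows. This is a minimal, stdlib-only illustration of Youden's J, not the authors' code. It assumes that each distinct predicted probability is tried as a cutoff with the rule "score ≥ cutoff ⇒ exceedance", which is the usual ROC convention. The labels and scores are made up, and the function names are hypothetical.

```python
def roc_points(y_true, scores):
    """(threshold, TPR, FPR) for each distinct score used as a cutoff.

    A sample is called positive (As > 10 μg/L) when score >= threshold.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, tp / pos, fp / neg))
    return points


def youden_threshold(y_true, scores):
    """Return the (threshold, TPR, FPR) triple that maximizes J = TPR - FPR."""
    return max(roc_points(y_true, scores), key=lambda p: p[1] - p[2])


# Hypothetical held-out labels (1 = exceedance) and predicted probabilities.
y = [0, 0, 0, 0, 1, 1, 1, 1]
s = [0.10, 0.20, 0.30, 0.60, 0.45, 0.50, 0.70, 0.90]
t, tpr, fpr = youden_threshold(y, s)
print(t, tpr, fpr)  # 0.45 1.0 0.25
```

Here J peaks at 0.75 for a cutoff of 0.45. Applied to the study's validation predictions, the same search is what yields a cutoff such as 0.47 rather than the default 0.50.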

Tree‐based feature importance analysis identified soil pH, aridity index, and clay content as the most influential predictors of groundwater As (Figure S4 in Supporting Information S1), and their marginal effects are illustrated in the partial dependence plots (Figure S7 in Supporting Information S1). Adding soil‐based variables substantially improved performance over the Minnesota model trained solely on hydrological variables. Soil pH and texture reflect underlying geological settings, which often shift sharply across landforms. For instance, sandy, well‐drained soils tend to lose bicarbonate and become more acidic, whereas clay‐rich, poorly drained soils retain carbonates and cations, resulting in higher pH. These contrasts are critical because pH, redox conditions, and texture all influence As fate and transport. Reducing conditions with high pH in poorly drained soils can release As from iron oxides, whereas this process is limited in well‐drained, oxidizing environments. Sandy soils, by allowing more drainage, also influence groundwater recharge and thereby shape As distribution in aquifers.
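Partial dependence plots such as those cited above are computed by fixing one predictor at a series of values across all samples and averaging the model's predicted probability. The following sketch is model‐agnostic and stdlib‐only. `toy_model` is a hypothetical stand‐in, not the fitted RF, and the feature names `soil_ph` and `clay_pct` are illustrative. The sketch shows only the averaging procedure; tree‐based (impurity) importance requires the fitted forest itself and is not reproduced here.

```python
def partial_dependence(predict_proba, rows, feature, grid):
    """Model-agnostic partial dependence of P(exceedance) on one feature.

    For each grid value, `feature` is overwritten in every row, and the
    model's predicted probabilities are averaged over all rows.
    """
    curve = []
    for value in grid:
        probs = [predict_proba({**row, feature: value}) for row in rows]
        curve.append((value, sum(probs) / len(probs)))
    return curve


# Hypothetical stand-in for a fitted classifier (NOT the paper's RF).
def toy_model(row):
    base = 0.6 if row["soil_ph"] >= 7.0 else 0.1
    return base + 0.001 * row["clay_pct"]


rows = [{"soil_ph": 6.0, "clay_pct": 10.0}, {"soil_ph": 8.0, "clay_pct": 30.0}]
curve = partial_dependence(toy_model, rows, "soil_ph", [6.0, 8.0])
for ph, p in curve:
    print(ph, round(p, 2))
```

Because the other predictors keep their observed values, the curve isolates the average marginal effect of the chosen variable. This is also why strongly correlated predictors can make partial dependence plots harder to interpret.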

To verify model stability, we performed five‐fold cross‐validation for all three models (Tables S7–S9 in Supporting Information S1). Performance across folds was consistent, and the mean ± SD accuracy and ROC‐AUC values closely matched the test‐set results in Table 2. This indicates that model performance is stable and not the result of a favorable or biased data partition.
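The five‐fold procedure can be sketched as below. This is a hypothetical, stdlib‐only illustration. The paper does not state whether its folds were stratified, so stratification by class is an assumption here. It keeps the balanced above/below‐MCL ratio in every fold. Each fold's metrics would then be summarized as mean ± SD, as in Tables S7–S9.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is spread evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal class members round-robin into folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def mean_sd(values):
    """Mean and sample standard deviation, as reported for fold metrics."""
    m = sum(values) / len(values)
    sd = (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return m, sd


# Usage on a hypothetical balanced label vector (1 = As > 10 μg/L).
labels = [0] * 50 + [1] * 50
for train_idx, test_idx in stratified_kfold(labels):
    print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
# prints "80 20 10" for each of the five folds
```

Training the classifier on each `train_idx` and scoring it on the matching `test_idx` gives five estimates whose spread indicates how sensitive performance is to the particular data partition.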

3.2. Examining False Negatives

False negatives represent a significant health threat because they misclassify wells with As > 10 μg/L as safe drinking water sources: the model predicts As ≤ MCL when the actual concentration exceeds the limit. Although the overall goal of these models is to maximize accuracy, minimizing false negatives is even more critical for public health.

Figure 4 presents the distribution of false negatives for the Min‐Min‐Hydro and Min‐Min‐Hydro‐Geo models. The Cam‐Min‐Hydro model is excluded from this analysis because of its poor accuracy and precision. Both models were evaluated on the same 3,324 samples, of which 1,663 had As concentrations above 10 μg/L. The Min‐Min‐Hydro model yielded a false negative rate of 33.6% (559 false negatives), whereas the Min‐Min‐Hydro‐Geo model reduced this to 22% (370 false negatives), making it the stronger performer from a health risk perspective. The histogram in Figure 4 shows that most misclassifications occur when observed As concentrations are between 10 and 12 μg/L. This range falls within the typical measurement variability (20%) estimated from repeated samples (Safarzadeh‐Amiri et al., 2011) and highlights the difficulty of predicting As concentrations near the MCL. When we excluded samples in this range, the false negative rate of the Min‐Min‐Hydro‐Geo model decreased to 18%.
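The recalculation that excludes the 10–12 μg/L band (the MCL plus its 20% measurement variability) can be sketched as follows. This is a hypothetical, stdlib‐only illustration. It treats the band as 10 < As ≤ 12 μg/L, an assumed reading of "between 10 and 12 μg/L", and the concentrations below are made up.

```python
def false_negative_rate(observed_ug_l, predicted_exceed, mcl=10.0, exclude=None):
    """FNR among wells whose observed As exceeds the MCL.

    exclude: optional (lo, hi) band; wells with lo < As <= hi are dropped,
    e.g. (10.0, 12.0) for concentrations within 20% above the MCL.
    """
    fn = tp = 0
    for conc, pred in zip(observed_ug_l, predicted_exceed):
        if conc <= mcl:
            continue  # not an exceedance, so it cannot be a false negative
        if exclude is not None and exclude[0] < conc <= exclude[1]:
            continue  # too close to the MCL to classify reliably
        if pred:
            tp += 1
        else:
            fn += 1
    return fn / (fn + tp) if fn + tp else None


# Hypothetical observed As (μg/L) and model exceedance calls.
observed = [11.0, 11.5, 15.0, 20.0, 30.0, 5.0]
predicted = [False, False, True, False, True, False]
print(false_negative_rate(observed, predicted))  # 0.6
print(round(false_negative_rate(observed, predicted, exclude=(10.0, 12.0)), 2))  # 0.33
```

Dropping the near‐MCL band removes both the false negatives and any true positives in that range. The rate therefore falls only if misses are concentrated near the threshold, as Figure 4 shows.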

Figure 4.

Figure 4

Distribution of observed As concentrations for the false negative predictions in the Minnesota models. (left) Min‐Min‐Hydro model, and (right) Min‐Min‐Hydro‐Geo model.

Figure 5 compares observed As concentrations with model predictions. The left subplots show the distribution of observed As concentrations for wells the model predicted as ≤10 μg/L, and the right subplots show the distribution for wells predicted as >10 μg/L. In each subplot, blue bars represent correct predictions and orange bars represent misclassifications. The orange bars in the left subplots therefore correspond to false negatives, whereas those in the right subplots correspond to false positives. Each subplot also includes a cumulative line indicating the percentage of correct predictions.

Figure 5.

Figure 5

Histograms of observed As concentrations for models evaluated on the same balanced test data set in Minnesota. Left panels show observed As concentrations for wells predicted as ≤10 μg/L, and right panels show observed concentrations for wells predicted as >10 μg/L. Blue bars indicate correct predictions, and orange bars indicate incorrect predictions. Cumulative lines show prediction accuracy within each subplot. “n” denotes the number of predictions in each group. All models were tested on the same test data set containing 3,324 samples.

We tested both models using the same test set consisting of 3,324 samples, which included 1,663 with actual concentrations above the MCL and the remaining 1,661 at or below the MCL. The Min‐Min‐Hydro model predicted 1,609 samples as at or below the MCL, with 66% correct (blue bar), and 1,715 samples above the MCL, with 64% correct (blue bar). The Min‐Min‐Hydro‐Geo model predicted 1,503 samples at or below the MCL, with 76% correct, and 1,821 samples above the MCL, with 70% correct. Overall, the Min‐Min‐Hydro‐Geo model outperforms the Min‐Min‐Hydro model by minimizing both false negatives and false positives.

3.3. Test With the Remaining Samples Not Used in the Model

As described in Section 2.2, random undersampling excluded 44,352 Minnesota samples with As concentrations below 10 μg/L from training and testing of the Min‐Min‐Hydro and Min‐Min‐Hydro‐Geo models. We used these samples to further evaluate model performance. The Min‐Min‐Hydro‐Geo model correctly predicted 32,349 of them (73%), consistent with its overall test‐set accuracy.

3.4. Model Accuracy at Different Well Depths

We examined the accuracy of the Min‐Min‐Hydro‐Geo model across different well depths (Figure 6a). Of the 3,324 samples in the test set, 1,022 were from shallow wells (0–25 m depth). For these samples, the model had a false positive rate of 0.24 and a false negative rate of 0.30, with an AUC and accuracy of 0.73. Only 168 samples came from deep wells (>100 m). For these samples, the false negative rate increased to 0.44, while the false positive rate dropped to 0.10, giving an AUC of 0.71 and an accuracy of 0.81. The model performed best at intermediate depths (50–75 m), where the false negative rate was only 0.13.
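The depth‐stratified evaluation above amounts to splitting the test set into depth bins and recomputing the error rates within each bin. The following is a hypothetical, stdlib‐only sketch. The 25 m bin edges follow the ranges named in the text, but the half‐open convention (lower edge inclusive, upper edge exclusive) is an assumption. Per‐bin AUC, which needs predicted probabilities, is omitted.

```python
DEPTH_BINS_M = [(0, 25), (25, 50), (50, 75), (75, 100), (100, float("inf"))]


def rate(num, den):
    """Safe ratio: None when a bin has no wells of the relevant class."""
    return num / den if den else None


def metrics_by_depth(depths_m, y_true, y_pred, bins=DEPTH_BINS_M):
    """Per-depth-bin counts and error rates; labels are 1 (>MCL) or 0 (<=MCL)."""
    results = {}
    for lo, hi in bins:
        tp = fp = tn = fn = 0
        for d, t, p in zip(depths_m, y_true, y_pred):
            if not (lo <= d < hi):
                continue
            if t == 1 and p == 1:
                tp += 1
            elif t == 1:
                fn += 1
            elif p == 1:
                fp += 1
            else:
                tn += 1
        n = tp + fp + tn + fn
        if n:  # skip empty bins
            results[(lo, hi)] = {
                "n": n,
                "fnr": rate(fn, tp + fn),
                "fpr": rate(fp, fp + tn),
                "accuracy": (tp + tn) / n,
            }
    return results


# Hypothetical test wells: depth (m), observed class, predicted class.
res = metrics_by_depth([10, 20, 110, 120], [1, 0, 1, 0], [1, 0, 0, 0])
for depth_bin, stats in res.items():
    print(depth_bin, stats)
```

Small bins such as the >100 m class produce noisy rates, which is one reason the deep‐well false negative rate should be read with caution.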

Figure 6.

Figure 6

(a) Min‐Min‐Hydro‐Geo model's performance at different well depths. The false negative rate increases markedly for wells deeper than 100 m. As a result, depth‐based model predictions should be restricted to depths <100 m, which excludes only 5% of wells. (b) Box plots of observed As concentrations at different depths within the test set. Deeper wells exhibited lower average As concentrations than shallow and medium‐depth wells.

Despite consistent AUC across depths, false negatives increased with well depth (Figure 6a). Several factors may explain this trend. First, the aquifer recharge area for deeper wells may fall outside the 1 km buffer used to aggregate surface variables. The model is therefore expected to be more reliable at shallower depths, where connections between surface water and groundwater are strongest. Second, training for deep wells (>100 m) was limited by sample size (only 450 training wells), so the model may not have effectively learned the relationship between surface features and As concentration as a function of depth. Balancing the training data by depth could improve predictions of As depth profiles, but this was not possible with the current limited data set. Figure 6b illustrates the distribution of observed As concentrations in the test set. Wells with depths between 25 and 75 m had median As concentrations above the MCL, whereas concentrations decreased substantially in deeper wells (>100 m), which had a median of only 1.5 μg/L.

3.5. Minnesota As Prediction Maps

The classification model predicts the likelihood that As concentrations are above or below the MCL across different well depths and locations. Figure 7 shows statewide predictions of As contamination from the Min‐Min‐Hydro‐Geo model at 30 and 100 m depth. The model predicts sharp depth gradients, with groundwater As levels decreasing as well depth increases. At 30 m, 49% of the state is predicted to have groundwater As levels above 10 μg/L, compared with only 31% at 100 m. These predictions capture the observed trend in As concentrations, with deep wells showing a lower risk of elevated As than shallow wells (Figure 6b). However, training data for wells deeper than 100 m are sparse, and most of those samples have low As. As a result, the model underestimates high As at greater depths, which increases the false negative rate in deep wells (Figure 6a). The model also predicts a higher risk of As contamination in the western part of the state than in the east (Figure 7), consistent with the observed spatial distribution of As concentrations (Figure 3).

Figure 7.

Figure 7

Groundwater As prediction maps for Minnesota. (a) Model‐estimated probability that As exceeds 10 μg/L at 30 m depth. (b) Binary classification of As exceedance at 30 m depth. (c) Model‐estimated probability that As exceeds 10 μg/L at 100 m depth. (d) Binary classification at 100 m depth. Panels (a) and (c) show continuous probabilities from 0 to 1, with dark red indicating a high probability of exceedance and blue a low probability. Panels (b) and (d) are derived from the probability maps using a decision threshold of 0.47: pixels with probability >0.47 are labeled >10 μg/L, and all others ≤10 μg/L. The 0.47 threshold is a classification rule chosen to reduce false negatives while maintaining overall accuracy.
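The step from the probability panels to the binary panels, and the "percent of the state above the MCL" figures quoted in the text, can be sketched as follows. This is a hypothetical, stdlib‐only illustration. It assumes the map is a 2D list of probabilities with `None` for pixels outside the state. It also assumes equal‐area pixels, so the share of valid pixels equals the share of state area. The function names are illustrative.

```python
THRESHOLD = 0.47  # decision threshold reported in the text


def classify(prob_map, threshold=THRESHOLD):
    """Binary exceedance map: True where P(As > 10 μg/L) > threshold; None = no data."""
    return [[None if p is None else p > threshold for p in row] for row in prob_map]


def exceedance_share(binary_map):
    """Fraction of valid pixels classified as exceeding the MCL."""
    valid = [c for row in binary_map for c in row if c is not None]
    return sum(valid) / len(valid)


# Hypothetical 2 x 3 probability map with one no-data pixel.
prob_map = [[0.20, 0.47, 0.48], [0.90, None, 0.10]]
binary = classify(prob_map)
print(binary)  # [[False, False, True], [True, None, False]]
print(exceedance_share(binary))  # 0.4
```

Note the strict inequality: a pixel exactly at 0.47 is labeled ≤10 μg/L, matching the "probability >0.47" rule in the caption.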

Figure 8 (left) shows prediction accuracy at the county level, calculated as the percentage of correct predictions among test samples. Prediction accuracy is lower in counties with low population density or low sample density. Water sampling density was generally higher in urban than in rural regions (Figures 3e and 3f). Because the model is intended to make predictions statewide, this imbalance increases the risk of underperformance in rural areas with sparse training data. We therefore analyzed the minimum sampling density required to achieve consistent prediction accuracy statewide.

Figure 8.

Figure 8

Prediction accuracy and sample data density. (left) Model prediction accuracy (out‐of‐sample) at the county level. (right) Relationship between model accuracy and training data density. Each point represents a county's training data density (samples per 100 km²), with color indicating the corresponding testing data density. Model accuracy stabilizes when training data density reaches seven samples per 100 km².

Figure 8 (right) shows the relationship between model accuracy (%) and training and testing data density (samples per 100 km²). Each point represents a county's training data density (x‐axis) and model accuracy (y‐axis), with colors indicating testing data density. A blue horizontal line marks 50% accuracy, the threshold for better‐than‐random predictions. When training data density falls below seven samples per 100 km² (0.07 samples/km²), county accuracy ranges from 0% to 100%, reflecting higher uncertainty due to insufficient and unevenly distributed training data. Consistent statewide prediction accuracy therefore requires evenly distributed training samples and adequate coverage in rural areas. The data suggest that a minimum density of 0.07 samples/km² is needed for spatially consistent prediction accuracy across the entire study region.
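The 0.07 samples/km² target (7 per 100 km²) translates directly into a per‐county sampling gap. The following hypothetical, stdlib‐only sketch computes a county's current training density and the number of additional wells needed to reach the target. The function name and the example county (1,500 km², 60 wells) are made up. It assumes that only the count matters, although the text also stresses that new samples should be evenly distributed.

```python
import math

TARGET_PER_KM2 = 0.07  # 7 training wells per 100 km², from the county analysis


def sampling_gap(n_training_wells, area_km2, target=TARGET_PER_KM2):
    """Return (current density in wells/km², additional wells needed to reach target)."""
    density = n_training_wells / area_km2
    # round() strips floating-point noise before ceil(), so an exact
    # product such as 105 is not pushed up to 106.
    required = math.ceil(round(target * area_km2, 9))
    return density, max(0, required - n_training_wells)


# Hypothetical county: 1,500 km² with 60 training wells.
density, extra = sampling_gap(60, 1500)
print(round(density, 3), extra)  # 0.04 45
```

A county of this size needs 105 wells in total to reach the target, so it would require 45 more. Ranking counties by this gap gives a simple way to prioritize new sampling in rural areas.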

Rural Native American communities are often undersampled (Lewis et al., 2017). We examined predictions at two spatial resolutions within the undersampled White Earth Reservation, a Native American community in north‐central Minnesota. Figure 9 shows As prediction maps at 250 and 30 m resolution for wells at 38 m depth, the typical well depth in this region. The two prediction maps were generated independently, and the side‐by‐side comparison highlights the importance of high‐resolution predictions. The 30 m resolution map delineates the boundaries between high and low As concentrations much more clearly than the 250 m resolution map. Such high‐resolution predictions are essential for household‐level As risk mapping and provide information useful to government agency staff, well drillers, and homeowners.

Figure 9.

Figure 9

Groundwater As prediction maps for White Earth Reservation area at 38 m depth. Predictions are shown at 250 m (left) and 30 m (right) resolution. Higher spatial resolution improves delineation of boundaries between risk categories.

3.6. Limitations

Our model is not based purely on direct surface observations, because several predictor variables are themselves derived from other models. For example, the surface flooding variables are derived from a classification model that predicts monthly surface water extent from Landsat (visible wavelengths) images (Pekel et al., 2016). The soil properties were taken from POLARIS, a probabilistic remapping of SSURGO soil properties (Chaney et al., 2016), and the aridity index data come from the Global‐AI_PET_v3 model (Zomer et al., 2022). These models have their own limitations, which may affect the accuracy of our model. A detailed evaluation of how the sensitivity and accuracy of the underlying predictor variables contribute to uncertainty in As predictions was beyond the scope of this study.

A key objective of this study was to test whether the Cambodia model (Cam‐Cam‐Hydro), trained in a distinct geological setting, could predict groundwater As levels in Minnesota without additional tuning. Our results showed that the existing Cambodia model performed poorly in Minnesota. Techniques such as transfer learning, feature normalization and standardization, and data augmentation could enhance transferability but were beyond the scope of this study.

4. Conclusion

We demonstrate that integrating local hydrological and geological data with ML significantly improves the prediction of groundwater As in undersampled regions. We make four contributions to groundwater As prediction. First, we showed that a Cambodia‐trained model, which performed well in Bangladesh and Vietnam, performed poorly in Minnesota. This underscores limited cross‐regional transferability and suggests the need for locally trained models when environments are radically different. Second, we developed and validated a scalable, high‐resolution (30 m) hazard model for Minnesota that uses only surface variables from remote sensing and globally available data sets, enabling application in undersampled regions. Third, we improved model sensitivity near the 10 μg/L drinking water standard, substantially reducing false negatives, even in sparsely sampled areas. Finally, we identified a minimum sampling density of 0.07 wells/km² as necessary for stable and reliable predictions, providing practical guidance for future monitoring and for extending this approach to other regions.

High‐resolution predictions are critical for delineating localized hotspots because As can vary sharply over short distances and with depth. The model achieved 73% overall accuracy. The false negative rate was 22%, which fell to 18% after accounting for measurement uncertainty in groundwater samples. Compared to previously published models for the U.S., this model demonstrates higher sensitivity for elevated As levels. Key predictors of elevated As in Minnesota include soil pH, clay content, and aridity index, which indicate geochemical controls on As mobilization and retention. Hydrological features such as flooding occurrence and recurrence, which were highly informative in the Cambodian data, contributed less in Minnesota because they vary little there. Prediction maps reveal pronounced vertical and horizontal As gradients, especially at shallow depths: at 30 m, approximately 49% of the state is predicted to exceed the MCL. However, prediction uncertainty increases with depth due to limited training data for deeper wells, indicating the need for additional sampling at greater depths. Although urban areas have higher sample densities, rural areas also require sufficient coverage to maintain spatially consistent accuracy.

We balanced model specificity with scalability. Using only surface‐based variables allows broad application, since all predictors are available globally or near‐globally. Adding subsurface geological data could improve accuracy, but such data are rarely available in undersampled regions, which would limit scalability. Overall, we provide a scalable, high‐resolution framework for predicting groundwater As risk to support water resource management and public health interventions in both data‐rich and underserved areas. The significance of this work is twofold: it produces the most detailed As risk maps for Minnesota to date at 30 m resolution, and it establishes a practical, transferable modeling approach for predictive mapping in data‐poor regions.

Conflict of Interest

The authors declare no conflicts of interest relevant to this study.

Supporting information

Supporting Information S1

Acknowledgments

Any use of trade, firm, or product names is for description purposes only and does not imply endorsement by the U.S. Government. This study is supported by the National Institute of Environmental Health Sciences (P42 ES033719 and P30 ES009089), Columbia Climate School postdoctoral fellowship, NSF fellowship, and the John Wesley Powell Center for Analysis and Synthesis as a part of the Characterizing Global Variability in Groundwater Arsenic Working Group that was funded by the U.S. Geological Survey. Use of generative AI tools: ChatGPT (OpenAI, GPT‐4o and GPT‐5) was used solely for copyediting purposes, such as grammar and phrasing improvements. All text was reviewed and edited by the authors.

Azad, S., Stahl, M. O., Erickson, M., DeYoung, B. A., Connolly, C., Chillrud, L., et al. (2026). Surface variable‐based machine learning for scalable arsenic prediction in undersampled areas. GeoHealth, 10, e2025GH001666. 10.1029/2025GH001666

Contributor Information

Benjamin C. Bostick, Email: bostick@ldeo.columbia.edu.

Steven N. Chillrud, Email: chilli@ldeo.columbia.edu.

Data Availability Statement

All data and code that support this study are openly available (Azad et al., 2025). The archive contains the materials needed to reproduce the analyses and figures.

References


1. Allen, G. H., & Pavelsky, T. M. (2018). Global extent of rivers and streams. Science, 361(6402), 585–588. 10.1126/science.aat0636
2. Amini, M., Abbaspour, K. C., Berg, M., Winkel, L., Hug, S. J., Hoehn, E., et al. (2008). Statistical modeling of global geogenic arsenic contamination in groundwater. Environmental Science and Technology, 42(10), 3669–3675. 10.1021/es702859e
3. Ayotte, J. D., Medalie, L., Qi, S. L., Backer, L. C., & Nolan, B. T. (2017). Estimating the high‐arsenic domestic‐well population in the conterminous United States. Environmental Science and Technology, 51(21), 12443–12454. 10.1021/ACS.EST.7B02881
4. Ayotte, J. D., Nolan, B. T., & Gronberg, J. A. (2016). Predicting arsenic in drinking water wells of the Central Valley, California. Environmental Science and Technology, 50(14), 7555–7563. 10.1021/acs.est.6b01914
5. Ayotte, J. D., Nolan, B. T., Nuckols, J. R., Cantor, K. P., Robinson, G. R., Baris, D., et al. (2006). Modeling the probability of arsenic in groundwater in New England as a tool for exposure assessment. Environmental Science and Technology, 40(11), 3578–3585. 10.1021/es051972f
6. Azad, S., Stahl, M., Erickson, M., Beck, A. D., Connelly, C., Chillrud, L., et al. (2025). Dataset for: Surface variable‐based machine learning for scalable arsenic prediction in undersampled areas (version 2) [Dataset]. Zenodo. 10.5281/zenodo.17556627
7. Bindal, S., & Singh, C. K. (2019). Predicting groundwater arsenic contamination: Regions at risk in highest populated state of India. Water Research, 159, 65–76. 10.1016/j.watres.2019.04.054
8. Bretzler, A., Lalanne, F., Nikiema, J., Podgorski, J., Pfenninger, N., Berg, M., & Schirmer, M. (2017). Groundwater arsenic contamination in Burkina Faso, West Africa: Predicting and verifying regions at risk. Science of the Total Environment, 584–585, 958–970. 10.1016/j.scitotenv.2017.01.147
9. Buschmann, J., Berg, M., Stengel, C., & Sampson, M. L. (2007). Arsenic and manganese contamination of drinking water resources in Cambodia: Coincidence of risk areas with low relief topography. Environmental Science and Technology, 41(7), 2146–2152. 10.1021/ES062056K
10. Cao, H., Xie, X., Wang, Y., & Deng, Y. (2021). The interactive natural drivers of global geogenic arsenic contamination of groundwater. Journal of Hydrology, 597, 126214. 10.1016/j.jhydrol.2021.126214
11. Cao, H., Xie, X., Wang, Y., Pi, K., Li, J., Zhan, H., & Liu, P. (2018). Predicting the risk of groundwater arsenic contamination in drinking water wells. Journal of Hydrology, 560, 318–325. 10.1016/j.jhydrol.2018.03.007
12. Center for International Earth Science Information Network ‐ CIESIN ‐ Columbia University. (2018). Gridded population of the world, version 4 (GPWv4): Basic demographic characteristics, revision 11. 10.7927/H46M34XX
13. Chaney, N. W., Wood, E. F., McBratney, A. B., Hempel, J. W., Nauman, T. W., Brungard, C. W., & Odgers, N. P. (2016). POLARIS: A 30‐meter probabilistic soil series map of the contiguous United States. Geoderma, 274, 54–67. 10.1016/j.geoderma.2016.03.025
14. Cho, K. H., Sthiannopkao, S., Pachepsky, Y. A., Kim, K. W., & Kim, J. H. (2011). Prediction of contamination potential of groundwater arsenic in Cambodia, Laos, and Thailand using artificial neural network. Water Research, 45(17), 5535–5544. 10.1016/J.WATRES.2011.08.010
15. Connolly, C. T., Stahl, M. O., DeYoung, B. A., & Bostick, B. C. (2022). Surface flooding as a key driver of groundwater arsenic contamination in Southeast Asia. Environmental Science and Technology, 56(2), 928–937. 10.1021/ACS.EST.1C05955
16. Donselaar, M. E., Khanam, S., Ghosh, A. K., Corroto, C., & Ghosh, D. (2024). Machine‐learning approach for identifying arsenic‐contamination hot spots: The search for the needle in the haystack. ACS ES&T Water, 4(8), 3110–3114. 10.1021/ACSESTWATER.4C00422
17. Eisler, R. (2004). Arsenic hazards to humans, plants, and animals from gold mining. Reviews of Environmental Contamination & Toxicology, 180, 133–165. 10.1007/0-387-21729-0_3
18. Erickson, M., & Barnes, R. (2004). Arsenic in groundwater: Recent research and implications for Minnesota. Center for Urban and Regional Affairs, University of Minnesota, 34(2), 1–7.
19. Erickson, M. L., & Barnes, R. J. (2005). Well characteristics influencing arsenic concentrations in ground water. Water Research, 39(16), 4029–4039. 10.1016/J.WATRES.2005.07.026
20. Erickson, M. L., Elliott, S. M., Brown, C. J., Stackelberg, P. E., Ransom, K. M., Reddy, J. E., & Cravotta, C. A. (2021). Machine‐learning predictions of high arsenic and high manganese at drinking water depths of the glacial aquifer system, northern continental United States. Environmental Science and Technology, 55(9), 5791–5805. 10.1021/acs.est.0c06740
21. Erickson, M. L., Elliott, S. M., Christenson, C. A., & Krall, A. L. (2018). Predicting geogenic arsenic in drinking water wells in glacial aquifers, north‐central USA: Accounting for depth‐dependent features. Water Resources Research, 54(12), 10172–10187. 10.1029/2018WR023106
22. Fan, R., Deng, Y., Du, Y., & Xie, X. (2024). Predicting geogenic groundwater arsenic contamination risk in floodplains using interpretable machine‐learning model. Environmental Pollution, 340, 122787. 10.1016/J.ENVPOL.2023.122787
23. Fan, Y., Li, H., & Miguez‐Macho, G. (2013). Global patterns of groundwater table depth. Science, 339(6122), 940–943. 10.1126/SCIENCE.1229881
24. Fendorf, S., Michael, H. A., & Van Geen, A. (2010). Spatial and temporal variations of groundwater arsenic in South and Southeast Asia. Science, 328(5982), 1123–1127. 10.1126/SCIENCE.1172974
25. Fienen, M. N., Nolan, B. T., & Feinstein, D. T. (2016). Evaluating the sources of water to wells: Three techniques for metamodeling of a groundwater flow model. Environmental Modelling & Software, 77, 95–107. 10.1016/J.ENVSOFT.2015.11.023
26. Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary‐scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27. 10.1016/J.RSE.2017.06.031
27. Hossain, M. M., & Piantanakulchai, M. (2013). Groundwater arsenic contamination risk prediction using GIS and classification tree method. Engineering Geology, 156, 37–45. 10.1016/j.enggeo.2013.01.007
28. Khatun, M. F., Reza, A. H. M. S., Sattar, G. S., Khan, A. S., & Khan, M. I. A. (2024). Prediction of arsenic concentration in groundwater of Chapainawabganj, Bangladesh: Machine learning‐based approach to spatial modeling. Environmental Science and Pollution Research, 31(33), 46023–46037. 10.1007/S11356-024-34148-2
29. Lamm, S. H., Boroje, I. J., Ferdosi, H., & Ahn, J. (2021). A review of low‐dose arsenic risks and human cancers. Toxicology, 456, 152768. 10.1016/J.TOX.2021.152768
30. Lewis, J., Hoover, J., & MacKenzie, D. (2017). Mining and environmental health disparities in Native American communities. Current Environmental Health Reports, 4(2), 130–141. 10.1007/S40572-017-0140-5
31. Lombard, M. A., Bryan, M. S., Jones, D. K., Bulka, C., Bradley, P. M., Backer, L. C., et al. (2021). Machine learning models of arsenic in private wells throughout the conterminous United States as a tool for exposure assessment in human health studies. Environmental Science and Technology, 55(8), 5012–5023. 10.1021/ACS.EST.0C05239
32. McArthur, J. M., Banerjee, D. M., Hudson‐Edwards, K. A., Mishra, R., Purohit, R., Ravenscroft, P., et al. (2004). Natural organic matter in sedimentary basins and its relation to arsenic in anoxic ground water: The example of West Bengal and its worldwide implications. Applied Geochemistry, 19(8), 1255–1293. 10.1016/J.APGEOCHEM.2004.02.001
33. Mihajlov, I., Mozumder, M. R. H., Bostick, B. C., Stute, M., Mailloux, B. J., Knappett, P. S. K., et al. (2020). Arsenic contamination of Bangladesh aquifers exacerbated by clay layers. Nature Communications, 11(1), 2244. 10.1038/S41467-020-16104-Z
  34. MN Department of Health . (2025). Arsenic in well water. Retrieved from https://www.health.state.mn.us/communities/environment/water/wells/waterquality/arsenic.html
  35. Mohammed Abdul, K. S. , Jayasinghe, S. S. , Chandana, E. P. S. , Jayasumana, C. , & De Silva, P. M. C. S. (2015). Arsenic and human health effects: A review. Environmental Toxicology and Pharmacology, 40(3), 828–846. 10.1016/J.ETAP.2015.09.016 [DOI] [PubMed] [Google Scholar]
  36. NASA/METI/AIST/Japan Spacesystems, & U.S./Japan ASTER Science Team . (2019). ASTER global water bodies database V001 [Dataset]. NASA EOSDIS Land Processes Distributed Active Archive Center. 10.5067/ASTER/ASTWBD.001 [DOI]
  37. O’Bryant, S. E. , Edwards, M. , Menon, C. V. , Gong, G. , & Barber, R. (2011). Long‐term low‐level arsenic exposure is associated with poorer neuropsychological functioning: A Project FRONTIER study. International Journal of Environmental Research and Public Health, 8(3), 861–874. 10.3390/IJERPH8030861 [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Pal, S. , Singh, S. K. , Singh, P. , Pal, S. , & Kashiwar, S. R. (2024). Spatial pattern of groundwater arsenic contamination in Patna, Saran, and Vaishali districts of Gangetic plains of Bihar, India. Environmental Science and Pollution Research, 31(41), 54163–54177. 10.1007/S11356-022-25105-Y [DOI] [PubMed] [Google Scholar]
  39. Pedregosa, F., Michel, V., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., et al. (2011). Scikit‐learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
  40. Pekel, J. F., Cottam, A., Gorelick, N., & Belward, A. S. (2016). High‐resolution mapping of global surface water and its long‐term changes. Nature, 540(7633), 418–422. 10.1038/nature20584
  41. Podgorski, J., & Berg, M. (2020). Global threat of arsenic in groundwater. Science, 368(6493), 845–850. 10.1126/science.aba1510
  42. Podgorski, J., Wu, R., Chakravorty, B., & Polya, D. A. (2020). Groundwater arsenic distribution in India by machine learning geospatial modeling. International Journal of Environmental Research and Public Health, 17(19), 1–17. 10.3390/ijerph17197119
  43. Powers, M., Yracheta, J., Harvey, D., O’Leary, M., Best, L. G., Black Bear, A., et al. (2019). Arsenic in groundwater in private wells in rural North Dakota and South Dakota: Water quality assessment for an intervention trial. Environmental Research, 168, 41–47. 10.1016/J.ENVRES.2018.09.016
  44. Rahman, A., & Rahaman, H. (2018). Contamination of arsenic, manganese and coliform bacteria in groundwater at Kushtia District, Bangladesh: Human health vulnerabilities. Journal of Water and Health, 16(5), 782–795. 10.2166/WH.2018.057
  45. Safarzadeh‐Amiri, A., Fowlie, P., Kazi, A. I., Siraj, S., Ahmed, S., & Akbor, A. (2011). Validation of analysis of arsenic in water samples using Wagtech Digital Arsenator. Science of The Total Environment, 409(13), 2662–2667. 10.1016/J.SCITOTENV.2011.03.016
  46. Saftner, D. M., Bacon, S. N., Arienzo, M. M., Robtoy, E., Schlauch, K., Neveux, I., et al. (2023). Predictions of arsenic in domestic well water sourced from alluvial aquifers of the Western Great Basin, USA. Environmental Science & Technology, 57(8), 3124–3133. 10.1021/acs.est.2c07948
  47. Singh, P., Varshney, G., & Kaur, R. (2024). Arsenic contamination in drinking water and health. Emerging Contaminants and Associated Treatment Technologies, 125–142. 10.1007/978-3-031-52614-5_7
  48. Sinha, D., & Prasad, P. (2020). Health effects inflicted by chronic low‐level arsenic contamination in groundwater: A global public health challenge. Journal of Applied Toxicology, 40(1), 87–131. 10.1002/JAT.3823
  49. Smedley, P. L., & Kinniburgh, D. G. (2002). A review of the source, behaviour and distribution of arsenic in natural waters. Applied Geochemistry, 17(5), 517–568. 10.1016/S0883-2927(02)00018-5
  50. Sobel, M., Sanchez, T. R., Zacher, T., Mailloux, B., Powers, M., Yracheta, J., et al. (2021). Spatial relationship between well water arsenic and uranium in Northern Plains native lands. Environmental Pollution, 287, 117655. 10.1016/J.ENVPOL.2021.117655
  51. Stewart, E. D., Fitzpatrick, W. A., & Stewart, E. K. (2025). Improved groundwater arsenic contamination modeling using 3‐D stratigraphic mapping, eastern Wisconsin, USA. Water, 17(13), 2024. 10.3390/W17132024
  52. Tan, Z., Yang, Q., & Zheng, Y. (2020). Machine learning models of groundwater arsenic spatial distribution in Bangladesh: Influence of Holocene sediment depositional history. Environmental Science & Technology, 54(15), 9454–9463. 10.1021/ACS.EST.0C03617
  53. Tsuji, J. S., Perez, V., Garry, M. R., & Alexander, D. D. (2014). Association of low‐level arsenic exposure in drinking water with cardiovascular disease: A systematic review and risk assessment. Toxicology, 323, 78–94. 10.1016/J.TOX.2014.06.008
  54. U.S. Geological Survey. (2016a). GAMACTT: Groundwater Age Mixtures and Contaminant Trends Tool (Version 1). Retrieved from https://ca.water.usgs.gov/projects/gamactt/
  55. U.S. Geological Survey. (2016b). National Water Information System data available on the World Wide Web (USGS Water Data for the Nation). 10.5066/F7P55KJN
  56. Van Geen, A., Zheng, Y., Versteeg, R., Stute, M., Horneman, A., Dhar, R., et al. (2003). Spatial variability of arsenic in 6000 tube wells in a 25 km² area of Bangladesh. Water Resources Research, 39(5), 1140. 10.1029/2002WR001617
  57. Verma, N., Rachamalla, M., Kumar, P. S., & Dua, K. (2023). Assessment and impact of metal toxicity on wildlife and human health. Metals in Water, 93–110. 10.1016/B978-0-323-95919-3.00002-1
  58. Welch, A. H., Westjohn, D. B., Helsel, D. R., & Wanty, R. B. (2000). Arsenic in ground water of the United States: Occurrence and geochemistry. Ground Water, 38(4), 589–604. 10.1111/J.1745-6584.2000.TB00251.X
  59. Winkel, L., Berg, M., Amini, M., Hug, S. J., & Johnson, A. A. (2008). Predicting groundwater arsenic contamination in Southeast Asia from surface parameters. Nature Geoscience, 1(8), 536–542. 10.1038/ngeo254
  60. Wu, R., Alvareda, E. M., Polya, D. A., Blanco, G., & Gamazo, P. (2021). Distribution of groundwater arsenic in Uruguay using hybrid machine learning and expert system approaches. Water, 13(4), 527. 10.3390/W13040527
  61. Ying, S. C., Schaefer, M. V., Cock Esteb, A., Li, J., & Fendorf, S. (2017). Depth stratification leads to distinct zones of manganese and arsenic contaminated groundwater. Environmental Science & Technology, 51(16), 8926–8932. 10.1021/ACS.EST.7B01121
  62. Youden, W. J. (1950). Index for rating diagnostic tests. Cancer, 3(1), 32–35. 10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3
  63. Zhao, Z., Kumar, A., & Wang, H. (2024). Predicting arsenic contamination in groundwater: A comparative analysis of machine learning models in coastal floodplains and inland basins. Water, 16(16), 2291. 10.3390/W16162291
  64. Zhu, J. J., Yang, M., & Ren, Z. J. (2023). Machine learning in environmental research: Common pitfalls and best practices. Environmental Science & Technology, 57(46), 17671–17689. 10.1021/ACS.EST.3C00026
  65. Zomer, R. J., Xu, J., & Trabucco, A. (2022). Version 3 of the global aridity index and potential evapotranspiration database. Scientific Data, 9(1), 1–15. 10.1038/s41597-022-01493-1

References From the Supporting Information

  1. Asniar, Maulidevi, N. U., & Surendro, K. (2022). SMOTE‐LOF for noise identification in imbalanced data classification. Journal of King Saud University ‐ Computer and Information Sciences, 34(6), 3413–3423. 10.1016/j.jksuci.2021.01.014
  2. Gnip, P., Vokorokos, L., & Drotár, P. (2021). Selective oversampling approach for strongly imbalanced data. PeerJ Computer Science, 7, 1–22. 10.7717/PEERJ-CS.604

Associated Data


Data Citations

  1. Azad, S., Stahl, M., Erickson, M., DeYoung, B. A., Connolly, C., Chillrud, L., et al. (2025). Dataset for: Surface variable‐based machine learning for scalable arsenic prediction in undersampled areas (version 2) [Dataset]. Zenodo. 10.5281/zenodo.17556627
  2. NASA/METI/AIST/Japan Spacesystems, & U.S./Japan ASTER Science Team. (2019). ASTER global water bodies database V001 [Dataset]. NASA EOSDIS Land Processes Distributed Active Archive Center. 10.5067/ASTER/ASTWBD.001

Supplementary Materials

Supporting Information S1

Data Availability Statement

All data and code that support this study are openly available (Azad et al., 2025). The archive contains the materials needed to reproduce the analyses and figures.


Articles from GeoHealth are provided here courtesy of Wiley
