Abstract
Lithium (Li) concentrations in drinking-water supplies are not regulated in the United States; however, Li is included in the 2022 U.S. Environmental Protection Agency list of unregulated contaminants for monitoring by public water systems. Li is used pharmaceutically to treat bipolar disorder, and studies have linked its occurrence in drinking water to human-health outcomes. An extreme gradient boosting model was developed to estimate geogenic Li in drinking-water supply wells throughout the conterminous United States. The model was trained using Li measurements from ∼13,500 wells and predictor variables related to its natural occurrence in groundwater. The model predicts the probability of Li in four concentration classifications: ≤4 μg/L, >4 to ≤10 μg/L, >10 to ≤30 μg/L, and >30 μg/L. Model predictions were evaluated using wells held out from model training and with new data and have an accuracy of 47–65%. Important predictor variables include average annual precipitation, well depth, and soil geochemistry. Model predictions were mapped at a spatial resolution of 1 km² and represent well depths associated with public- and private-supply wells. This model was developed by hydrologists and public-health researchers to estimate Li exposure from drinking water for comparison with national-scale human-health data, toward a better understanding of the dose–response to low (<30 μg/L) concentrations of Li.
Keywords: lithium, drinking water, groundwater, machine learning, extreme gradient boosting
Short abstract
Lithium is currently a nonregulated constituent in drinking water that may affect human health at low concentrations. A model is developed to estimate the occurrence of lithium in groundwater used as drinking water for the conterminous United States.
1. Introduction
Lithium (Li) is a naturally occurring alkali metal found in minerals and, as a monovalent cation, in groundwater and surface water. Li concentrations in drinking-water supplies are not currently regulated in the United States; therefore, its occurrence has not been commonly measured. Li is used as a medication to treat bipolar disorder and depression,1 and clinical doses typically range between 600 and 1800 mg per day, which is 2–3 orders of magnitude greater than typical drinking-water concentrations (<0.030 mg/L).2 Studies have linked the low-level occurrence (0.1–219 μg/L) of Li in drinking water to positive human-health outcomes such as reduced suicide mortality3−7 and other mental-health benefits2,8−11 as well as to potential negative outcomes such as autism12 and altered thyroid hormone levels.13,14 Further, side effects affecting the renal, neurological, dermal, cardiovascular, and endocrine systems can occur with clinical Li use, especially at higher doses (2.74–4.2 mg Li/kg of body weight per day).15 The U.S. Environmental Protection Agency (EPA) established a provisional reference dose (p-RfD) of 2 μg of Li per kilogram of body weight per day and reports that confidence in this value is low to medium because of a lack of dose–response information at subclinical concentrations.15 The U.S. Geological Survey developed a nonenforceable health-based screening level of 10 μg/L for Li in drinking water based on the p-RfD.16 Li is included in the most recent EPA list of unregulated contaminants to be monitored by public water systems as part of the fifth Unregulated Contaminant Monitoring Rule (UCMR5). This rule requires public water suppliers to measure Li concentrations in drinking water starting in 2023 to provide nationally distributed data on its occurrence.17 The inclusion of Li in UCMR5 indicates the potential for future regulation of Li in public drinking-water supplies.
To date, few studies have quantified and characterized the occurrence of geogenic Li in groundwater or drinking water at the national scale for the conterminous United States (CONUS).18−21 Historically, Li measurements have been reported in groundwater studies that measure numerous elements.18,19 In a study of water quality in domestic wells in the principal aquifers of the United States, Li was measured in approximately 97% of 662 wells sampled; however, its occurrence was neither discussed nor mapped.18 A subsequent national-scale study that examined trace elements and radon in groundwater across the United States reported Li in 94% of 936 wells sampled and observed that, in humid regions, Li concentrations were greater in shallower monitoring wells than in deeper drinking-water supply wells, whereas in dry regions, the opposite relationship was observed. The same study also found that Li concentrations were generally higher in unconsolidated sand and gravel aquifers than in other aquifers.19
Recently, two studies that focused solely on Li have offered more insight into its national occurrence. One study included data from groundwater used as drinking water throughout the United States,20 and another measured Li at 21 public water utilities across the United States that are supplied by surface water and groundwater.21 The groundwater-focused study determined that Li concentrations vary across the United States, with higher concentrations in arid regions and in older groundwater, and that the primary processes controlling Li in groundwater are cation exchange and mixing with saline water.20 The study of public water utilities reported regional differences in Li concentrations, with higher values in the arid southwestern and southern regions of the United States than in the Midwest, Southeast, and mid-Atlantic regions. A comparison of drinking water sourced from groundwater and from surface water found that Li concentrations were higher in groundwater in the arid southwestern and southern regions and similar between the two sources in the Southeast and Midwest.21
In this study, we developed a machine learning model to predict Li concentrations in groundwater used as drinking water throughout the CONUS. Machine learning (ML) methods are increasingly used in environmental science and can identify patterns in data that are not easily detected with traditional statistical methods.22 Several ML approaches, including random forest and boosted regression trees, have been applied to predict groundwater quality at regional and national scales.23−28 Recently, extreme gradient boosting (XGB) models have been developed to predict manganese concentrations in the North Atlantic Coastal Plain aquifer system29 and nitrate concentrations in groundwater across the CONUS.30 The nitrate study found that the XGB model outperformed a boosted regression tree model on the basis of the root-mean-square error from the cross-validation folds during model tuning. Extreme gradient boosting has been a highly successful modeling approach for a wide range of applications and has been the winning method in many machine learning competitions.31 This method builds an ensemble of decision trees with a regularized learning objective (absent from other decision tree methods); each added tree reduces a loss function that measures the difference between the predicted and actual values. Complex model structures are also penalized to avoid overfitting the model to the training data. XGB is also designed to handle sparse data sets and missing values in the predictor variables,31 a common occurrence in environmental data. We apply the XGB method in this study because of its robustness.
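For readers unfamiliar with the method, the regularized objective mentioned above can be written in the standard XGBoost form shown below (our notation, given as a reference sketch rather than reproduced from the cited sources). Here $l$ is a differentiable loss comparing observation $y_i$ with prediction $\hat{y}_i$, $f_k$ is the $k$th tree, $T$ is the number of leaves in a tree, and $w_j$ are its leaf weights. The penalty $\Omega$ discourages complex trees, and $\gamma$, $\lambda$, and $\alpha$ correspond to the hyperparameters of the same names discussed in Section 2.3. For a four-class model, one such tree sum is kept per class and the class scores are converted to probabilities with a softmax.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\hat{y}_i = \sum_{k=1}^{K} f_k\left(\mathbf{x}_i\right),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} + \alpha \sum_{j=1}^{T} \lvert w_j \rvert
```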
The purpose of this study is to develop a model to map estimated Li concentrations in groundwater used as drinking water in the United States. We used the XGB algorithm, whose predictions can be interpreted with Shapley-value-based tools, to explore the relationships between the predictor variables and Li occurrence. The resulting maps provide a nationally consistent estimate of Li occurrence in groundwater, which is especially useful for areas of the country lacking sampling data. The models were developed in collaboration with public-health scientists so that the results may be used in national-scale studies of associations between exposure to low-dose Li and human-health outcomes.
2. Methods
2.1. Lithium Data
Data on Li concentrations in groundwater from wells across the CONUS were compiled from the Water Quality Portal (https://www.waterqualitydata.us/), which contains data from the U.S. Geological Survey (USGS) National Water Information System database, the EPA, and the National Water Quality Monitoring Council, provided by Federal, Tribal, and local agencies.
After the Li data were compiled, some samples were excluded from further analyses to focus on geogenic Li in groundwater that had the potential to be used for drinking water. Wells less than 1 m deep or greater than 1500 m deep were excluded because they were unlikely to be used as drinking-water supply wells. Wells were also excluded if the water was considered nonpotable, with dissolved solids concentrations >1000 mg/L or pH values classified as acidic or alkaline (pH ≤ 3.5 or pH ≥ 10.5), or if they were readily identifiable as contaminated or associated with coal, oil, or gas by explicit identifiers in the site name, such as “coal,” “oil,” “gas,” “mining,” “contam,” or “operable unit.” Samples were also excluded if the reporting limit for Li analysis was >4 μg/L. For sites with multiple samples, only the most recent sample was retained. The final data set comprises 18,027 wells sampled between January 1989 and November 2020. Most of the sites are monitoring (33.4%), domestic-supply (28.4%), or public-supply (24.2%) wells. The well type is unknown for 6.2% of the sites, and the remaining 7.8% of sites are used for other miscellaneous purposes. A summary of the general characteristics of the wells is included in Table SI_1. Well locations are shown in Figure 1.
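The exclusion rules above can be sketched as a small screening step. This is a minimal illustration, not the workflow used in the study: the record fields (`site_id`, `site_name`, and so on), the whole-word keyword matching, screening before keeping the most recent sample, and keeping records with missing depth, dissolved solids, or pH are all our assumptions.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Site-name keywords from the text, matched as whole words (case-insensitive)
# so that, e.g., "Las Vegas" is not flagged by "gas"; "contam" matches as a prefix.
SITE_NAME_FLAGS = re.compile(
    r"\b(?:coal|oil|gas|mining|operable unit|contam\w*)\b", re.IGNORECASE
)


@dataclass
class Sample:
    # Hypothetical record layout for one Li sample
    site_id: str
    site_name: str
    sample_date: date
    li_reporting_limit_ug_l: float
    well_depth_m: Optional[float] = None
    dissolved_solids_mg_l: Optional[float] = None
    ph: Optional[float] = None


def passes_screen(s: Sample) -> bool:
    """Apply the depth, potability, site-name, and reporting-limit exclusions."""
    if s.well_depth_m is not None and not (1.0 <= s.well_depth_m <= 1500.0):
        return False
    if s.dissolved_solids_mg_l is not None and s.dissolved_solids_mg_l > 1000.0:
        return False
    if s.ph is not None and (s.ph <= 3.5 or s.ph >= 10.5):
        return False
    if SITE_NAME_FLAGS.search(s.site_name):
        return False
    return s.li_reporting_limit_ug_l <= 4.0


def screen_and_keep_latest(samples):
    """Keep only the most recent passing sample for each site."""
    latest = {}
    for s in samples:
        if not passes_screen(s):
            continue
        kept = latest.get(s.site_id)
        if kept is None or s.sample_date > kept.sample_date:
            latest[s.site_id] = s
    return list(latest.values())
```

Whole-word matching is a deliberate choice here: naive substring matching on short keywords such as "gas" or "oil" would also flag unrelated place names.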
Additionally, an independent set of samples was recently collected from private wells in Nevada and northeastern California for a separate study, and Li concentrations from that study were used to evaluate the regional accuracy of the private-well prediction map produced in the current study.32 Independent data (in contrast to a subset of data purposefully held back from a larger data set) are rarely used to evaluate models but can be useful because they can represent a different areal extent or time period and provide additional information on model performance.23,33
2.2. Predictor Variables
Predictor variables were selected that represent established associations with, or are proxies for, factors that influence the occurrence of geogenic lithium in groundwater. All variables were required to be available as geographic information system (GIS) layers for the CONUS. In general, predictor variables represent hydrologic, geologic, geochemical, and climatic characteristics associated with each well location. Categorical variables, such as the Köppen–Geiger climate classification, were converted to binary variables so that each category is assigned a value of one if it is present or zero if it is absent (i.e., one-hot encoded). Approximately 200 variables were tested in preliminary models, and groups of variables were dropped from consideration for the final model on the basis of low variable importance scores. After the final model was developed, additional variables with normalized variable importance scores less than 0.34 were excluded.
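One-hot encoding can be illustrated in a few lines. This sketch is ours: the climate codes are standard Köppen–Geiger abbreviations used only as example input, and the column-naming scheme (`koppen_<code>`) is hypothetical.

```python
def one_hot(values, prefix, categories=None):
    """Return {column_name: [0/1 per record]} with one binary column per category."""
    if categories is None:
        categories = sorted(set(values))
    return {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


# Example: climate class at four hypothetical well locations
encoded = one_hot(["BWh", "Cfa", "BWh", "Dfb"], prefix="koppen")
```

Passing an explicit `categories` list would keep the same set of columns when encoding the training wells and, later, the prediction grid, even if some category is absent from one of them.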
Li concentration and independent variable data used for predictive modeling are available in a U.S. Geological Survey Data Release.34
2.3. Model Development
An XGB classification model31 with four lithium concentration classes (≤4, >4 to ≤10, >10 to ≤30, and >30 μg/L) was developed. This algorithm was selected because it can handle missing data values (unlike random forest, for example) and, in a previous study, outperformed another tree-based method for predicting groundwater quality at the national scale.30 The boundaries for the Li concentration classes were chosen on the basis of the distribution of Li concentrations (Table SI_2) and nonregulatory health-based and regulatory values. The lowest class boundary of 4 μg/L was chosen because it is below the health-based screening level of 10 μg/L and did not exclude too many wells because of their reporting limits; a sample with a reporting limit greater than 4 μg/L could not be assigned to a class with certainty and therefore was not included. The 10 and 30 μg/L class boundaries are based on the health-based screening level and the Eurasian Economic Union limit for Li in drinking water, respectively.16,35
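A minimal sketch of the class assignment, including why a reporting limit above 4 μg/L prevents classification. The function names and the handling of censored (nondetect) results are our assumptions for illustration; the text states only that such samples were excluded.

```python
from bisect import bisect_left
from typing import Optional

# Upper boundaries (μg/L) of classes 1-3; anything above 30 μg/L is class 4.
# Intervals are closed on the right: class 1 is <=4, class 2 is >4 to <=10, etc.
BOUNDARIES_UG_L = (4.0, 10.0, 30.0)


def li_class(conc_ug_l: float) -> int:
    """Map a detected Li concentration (μg/L) to class 1-4."""
    # bisect_left places a value equal to a boundary in the lower class
    return bisect_left(BOUNDARIES_UG_L, conc_ug_l) + 1


def li_class_from_result(value_ug_l: float, censored: bool = False) -> Optional[int]:
    """Classify a result; for a nondetect ('<value'), value is the reporting limit."""
    if censored:
        # '<RL' is known to be in class 1 only when RL <= 4 μg/L; otherwise the
        # true concentration could lie in class 1 or in a higher class.
        return 1 if value_ug_l <= BOUNDARIES_UG_L[0] else None
    return li_class(value_ug_l)
```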
Lithium concentrations from a total of 18,027 wells were used to develop and evaluate the model. The well data were split into a model training data set of 13,522 wells (75%) and a model validation data set of 4505 wells (25%), stratified so that both data sets have the same distribution of Li concentrations (Table SI_2). The training data set was used to develop the model, and the validation data set was used to evaluate the predictive performance of the final model. As mentioned, the model was also tested with an independent data set from Nevada and northeastern California, which is discussed later in this manuscript.
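A class-stratified 75/25 split can be sketched as follows. This is an illustrative Python version under our own assumptions (random shuffling within each class, a fixed seed, and made-up labels); in R, `caret::createDataPartition` performs a comparable outcome-balanced split, but the text does not state which function was used.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=42):
    """Split record indices so each class appears in train and test in the same
    proportion. Returns (train_indices, test_indices)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Hypothetical class labels (1-4) for 100 wells
labels = [1] * 40 + [2] * 20 + [3] * 20 + [4] * 20
train_idx, test_idx = stratified_split(labels)
```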
Models were tuned using the R software environment, version 4.2.0, and the “xgboost” and “caret” packages. Tenfold cross-validation was run on a tuning grid consisting of 256 combinations of model hyperparameters. The hyperparameter ranges evaluated were step size shrinkage (η), 0.005 to 0.0125 by 0.0025; maximum depth of a tree (max_depth), 8 to 14 by 2; number of boosting rounds (nrounds), 500 to 2000 by 500; subsample ratio of columns when constructing each tree (colsample_bytree), 0.5 to 0.75 by 0.25; and subsample ratio of the training instances (subsample), 0.5 to 0.75 by 0.25. During initial model tuning, the following hyperparameters were held constant: the minimum loss reduction required to make a further partition on a leaf node of the tree (γ) at 0, the minimum sum of instance weight needed in a child (min_child_weight) at 1, the constraint on the updating step for each leaf output (max_delta_step) at 0, α (L1 regularization term) at 0, and λ (L2 regularization term) at 1.
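As a check on the grid size, the five tuned ranges can be enumerated directly (4 × 4 × 4 × 2 × 2 = 256). This Python sketch only builds the combinations; the keys mirror the xgboost parameter names used in the text, the helper `inclusive_range` is ours, and the hyperparameters held constant are omitted. Each combination would then be scored by 10-fold cross-validation.

```python
from itertools import product


def inclusive_range(start, stop, step, ndigits=6):
    """Evenly spaced values from start to stop inclusive, rounded to avoid float drift."""
    n_steps = int(round((stop - start) / step))
    return [round(start + k * step, ndigits) for k in range(n_steps + 1)]


tuning_grid = [
    {"eta": eta, "max_depth": depth, "nrounds": rounds,
     "colsample_bytree": colsample, "subsample": subsample}
    for eta, depth, rounds, colsample, subsample in product(
        inclusive_range(0.005, 0.0125, 0.0025),  # 0.005, 0.0075, 0.01, 0.0125
        range(8, 15, 2),                         # 8, 10, 12, 14
        range(500, 2001, 500),                   # 500, 1000, 1500, 2000
        inclusive_range(0.5, 0.75, 0.25),        # 0.5, 0.75
        inclusive_range(0.5, 0.75, 0.25),        # 0.5, 0.75
    )
]
```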
The most accurate model was identified via 10-fold cross-validation, and standard errors of the model accuracy were calculated. The simplest model within one standard error of the most accurate model was selected as the final model to avoid overfitting to the training data. Simpler models were identified as those with the smallest hyperparameter values, with preference given in the following order: η, max_depth, nrounds, colsample_bytree, and subsample. After the simpler model was selected, changes in secondary hyperparameters, including the regularization terms (α and λ), γ, and max_delta_step, were tested to assess their effect on the number of variables included in the final model and on prediction accuracy for the training data set. Each secondary hyperparameter was changed individually, while the primary hyperparameters were held constant and the other secondary hyperparameters were held at their default values. Secondary hyperparameter values tested were γ (0, 3, 5), λ (1, 3, 5), α (0, 3, 5), and max_delta_step (0, 1, 2, 5, 7). Finally, the number of variables in the model was evaluated on the basis of variable importance scores, and variables were retained if their normalized importance scores were at least 0.34.
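The one-standard-error rule described above can be sketched as follows. This is an illustration under stated assumptions: the per-fold accuracies are invented, the standard error is taken as the fold standard deviation divided by the square root of the number of folds, and "simpler" is implemented as a lexicographic comparison in the stated preference order.

```python
import math
from statistics import mean, stdev

# Order in which smaller values are preferred when ranking model simplicity
PREFERENCE_ORDER = ("eta", "max_depth", "nrounds", "colsample_bytree", "subsample")


def simplicity_key(params):
    """Lexicographic key: smaller eta first, then smaller max_depth, and so on."""
    return tuple(params[name] for name in PREFERENCE_ORDER)


def one_se_select(results):
    """results: list of (params, per-fold accuracies) from k-fold cross-validation.
    Returns the simplest params whose mean accuracy is within one standard error
    of the best mean accuracy."""
    summary = []
    for params, fold_acc in results:
        m = mean(fold_acc)
        se = stdev(fold_acc) / math.sqrt(len(fold_acc))
        summary.append((params, m, se))
    _, best_mean, best_se = max(summary, key=lambda t: t[1])
    threshold = best_mean - best_se
    eligible = [params for params, m, _ in summary if m >= threshold]
    return min(eligible, key=simplicity_key)
```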
Continuous maps of Li occurrence in groundwater used as drinking water were made for the CONUS by applying the final model in the R software environment, version 4.3.1. Map predictions were made at a 1 km² grid cell size by overlaying a common grid36 on each of the GIS files for the independent variables and extracting a value for each grid cell using the bilinear resampling technique in ArcMap v.10.8.1.37 Well depth values were extracted from two different GIS layers, representing typical private-well and public-supply well depths, which range from 1.5 to 1463 m and from 1.5 to 1596 m, respectively.38 The final model was used to predict a lithium concentration class at each grid cell, and two maps were created, one for each well-depth layer, representing private and public-supply wells.
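Bilinear resampling assigns each output cell a distance-weighted average of the four nearest input cell centers. The generic Python sketch below shows the arithmetic only; it is not ArcMap's implementation (cell alignment, NoData handling, and projection are ignored), and it assumes the query point lies inside the grid.

```python
import math


def bilinear(grid, x, y):
    """Interpolate grid[row][col] at fractional position (x = column, y = row),
    where integer coordinates fall on cell centers. Points on the last row or
    column reuse the edge cell."""
    x0, y0 = math.floor(x), math.floor(y)
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along columns in the upper and lower rows, then between rows
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


# Hypothetical 2 x 2 block of a continuous predictor (e.g., precipitation, mm)
block = [[0.0, 10.0],
         [20.0, 30.0]]
```

For example, the point midway between all four cell centers, `bilinear(block, 0.5, 0.5)`, is the plain average of the four values.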
2.4. Model Interpretation
SHapley Additive exPlanations (SHAP) dependence plots provide insights into the relationships between the predictor variables and the model outcomes,39 allowing for interpretation of the model. SHAP plots were generated using the xgboost package, and their patterns were compared to processes and conditions known to contribute to Li concentrations in groundwater.
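To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny hypothetical function by averaging each feature's marginal contribution over all feature orderings. It is a simplified illustration, not what xgboost does: xgboost computes SHAP values for trees with the TreeSHAP algorithm (in R, e.g., via `predict(..., predcontrib = TRUE)`), which treats absent features through the training data reaching each tree node rather than through a single baseline. The exhaustive enumeration here is only feasible for a handful of features.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution over all
    feature orderings, with 'absent' features held at their baseline values."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        z = list(baseline)          # start with every feature absent
        previous = model(z)
        for i in order:             # add features one at a time in this order
            z[i] = x[i]
            current = model(z)
            phi[i] += current - previous
            previous = current
    n_orders = factorial(n)
    return [p / n_orders for p in phi]


# Hypothetical toy model with one additive feature and one interaction
def toy_model(z):
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, x=[1, 2, 3], baseline=[0, 0, 0])
```

The values sum to the difference between the prediction and the baseline prediction (the additivity property that SHAP dependence plots rely on); here the interaction term is split equally between the two interacting features.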
3. Results and Discussion
Lithium concentrations in the data set used to develop the model range from <0.001 to 15,000 μg/L. The number of wells within each concentration class used in the classification model is listed in Table SI_2. The Li concentration class of ≤4 μg/L contains the most wells (n = 7025), whereas the class >30 μg/L contains the fewest (n = 2727). Based on the model tuning results, the hyperparameters of the final model (i.e., the simplest model within one standard error of the most accurate model) are η = 0.005, max_depth = 10, nrounds = 1000, colsample_bytree = 0.75, and subsample = 0.50. Using these hyperparameters, the secondary hyperparameters were varied, and the resulting changes were evaluated by assessing the effect on the number of variables included in the model and on model accuracy for the training data set. Changes to the secondary hyperparameters decreased model accuracy; therefore, the default values were used in the final model: γ = 0, λ = 1, α = 0, and max_delta_step = 0. The variable importance scores were then examined and used to determine the final number of variables in the model. The normalized variable importance scores dropped sharply, from 0.34 to 0.07, between the 20th and 21st variables. The model was run with the top 20 variables, and its predictions for the training data were compared with those of the model with 46 variables. The model with fewer variables had slightly greater overall prediction accuracy (92.66% vs 92.52%) and was chosen as the final model.
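The variable-reduction step can be sketched as a threshold on importance scores normalized to the largest score. The scores below are invented (only the variable names ppt_91_20 and WELL_DEPTH come from the text; `x3` and `x21` are placeholders), and an inclusive cutoff of 0.34 is assumed.

```python
def select_by_relative_importance(importance, threshold=0.34):
    """Normalize importance scores to the largest score and keep features whose
    normalized score is at least `threshold`, ordered from most to least important."""
    top = max(importance.values())
    relative = {name: score / top for name, score in importance.items()}
    ranked = sorted(relative.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, rel in ranked if rel >= threshold]


# Hypothetical raw importance scores
scores = {"ppt_91_20": 1.0, "WELL_DEPTH": 0.5, "x3": 0.4, "x21": 0.07}
selected = select_by_relative_importance(scores)
```

Because scores very close to the cutoff can be affected by floating-point rounding, a threshold placed inside a clear gap in the ranked scores (as between the 20th and 21st variables here) is more robust than one placed exactly on a score.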
When the model is run on the validation data set, it has an overall accuracy of 65.1%. Additional model prediction metrics for each Li concentration class include the sensitivity (true positive rate), specificity (true negative rate), and balanced accuracy (average of sensitivity and specificity) (Table 1). The predictions have the highest balanced accuracy and sensitivity for the lowest Li concentration class (Li ≤ 4 μg/L), which is expected because the greatest number of wells occur in this class. The specificity of the predictions in this class is the lowest across the classes (0.7905); however, the difference between specificity and sensitivity in this class (0.0945) is the smallest of all classes. The difference between sensitivity and specificity for the other classes ranges from 0.2324 to 0.5855. For the three higher Li concentration classes (>4 to ≤10 μg/L, >10 to ≤30 μg/L, and >30 μg/L), specificity is greater than sensitivity. Therefore, for these classes, the model is better at correctly identifying wells that do not belong to a class than at correctly identifying wells that do. The model does best at identifying wells with Li concentrations in a class other than >30 μg/L (specificity for >30 μg/L = 0.9548) and worst at identifying wells with Li concentrations from >4 to ≤10 μg/L (sensitivity for >4 to ≤10 μg/L = 0.3333).
Table 1. Model Prediction Metrics for the Validation Data Set Used to Estimate Geogenic Lithium (Li) in Drinking-Water Supply Wells throughout the Conterminous United States.
Li classification | class 1 (≤4 μg/L) | class 2 (>4 to ≤10 μg/L) | class 3 (>10 to ≤30 μg/L) | class 4 (>30 μg/L) |
---|---|---|---|---|
sensitivity | 0.8850 | 0.3333 | 0.6100 | 0.5580 |
specificity | 0.7905 | 0.9188 | 0.8424 | 0.9548 |
balanced accuracy | 0.8377 | 0.6261 | 0.7262 | 0.7564 |
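The per-class metrics in Table 1 follow from a one-vs-rest reading of a multiclass confusion matrix. A minimal sketch; the 3 × 3 matrix in the usage example is invented for illustration and is not the study's confusion matrix.

```python
def one_vs_rest_metrics(cm):
    """cm[i][j] = number of wells with observed class i predicted as class j.
    Returns a list of (sensitivity, specificity, balanced accuracy) per class."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    out = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # observed c, predicted other
        fp = sum(cm[r][c] for r in range(k)) - tp     # predicted c, observed other
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        out.append((sensitivity, specificity, (sensitivity + specificity) / 2))
    return out


def overall_accuracy(cm):
    """Share of all wells on the diagonal (predicted class == observed class)."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)


cm = [[8, 2, 0],
      [1, 6, 3],
      [0, 1, 9]]
metrics = one_vs_rest_metrics(cm)
```

For instance, averaging the class 2 values in Table 1, (0.3333 + 0.9188)/2, gives about 0.6261, matching the reported balanced accuracy.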
To examine the differences between the predicted and observed classifications of Li concentrations, the classes were assigned values of 1 through 4 (1 being the lowest concentration class), and the observed class was subtracted from the predicted class (Figure 2). A result of zero indicates that the model correctly predicts the observed class, whereas negative values indicate that the model underpredicts and positive values indicate that it overpredicts. The results of this analysis show that the model predicts the correct class for 65.1% of the validation data and predicts a class either one above (+1) or one below (−1) the correct class for 27.1% of the validation data. Overall, the model predicts the observed concentration class, or a class within one of it, 92.2% of the time. The model tends to underpredict rather than overpredict: negative differences account for 21.7% of the validation data and positive differences for 13.2%.
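The class-difference summary above can be reproduced with a short tally. A minimal sketch with invented class vectors; the function name and output keys are ours.

```python
from collections import Counter


def class_difference_summary(predicted, observed):
    """Shares of wells by (predicted class - observed class); negative = underprediction."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    n = len(predicted)
    diffs = Counter(p - o for p, o in zip(predicted, observed))
    return {
        "correct": diffs[0] / n,
        "within_one_class": (diffs[-1] + diffs[0] + diffs[1]) / n,
        "underpredicted": sum(c for d, c in diffs.items() if d < 0) / n,
        "overpredicted": sum(c for d, c in diffs.items() if d > 0) / n,
    }


# Hypothetical predicted and observed classes for eight wells
summary = class_difference_summary([1, 2, 3, 4, 1, 2, 3, 4],
                                   [1, 2, 3, 4, 2, 2, 1, 3])
```

As in the text, the correct, underpredicted, and overpredicted shares sum to 1 (65.1% + 21.7% + 13.2% = 100%).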
3.1. Predictor Variables
The final model contains 20 predictor variables (Figure 3 and Table SI_3). They include climatic variables such as average annual precipitation,40 geochemical variables representing soil chemistry,41 and hydrologic variables such as the lateral position of a well with respect to streams and their hydrologic divides42 and outputs from a national groundwater model.43 Variable importance within the model was determined using the function built into the xgboost package, which calculates the fractional contribution of each feature based on the total gain from the splits that use that feature.44 For the overall model, the relative variable importance, normalized to the variable with the greatest importance, is shown in Figure 3 and listed in Table SI_3. Average annual precipitation (ppt_91_20) and well depth (WELL_DEPTH) are the two most important variables. The lateral positions of a well within a watershed for ninth-, seventh-, and fifth-order streams are 3 of the top 10 variables, and soil geochemistry variables account for another 3 of the top 10. The relative variable importance was also determined for each Li concentration class in the model (Tables SI_4–SI_7); it differs slightly among the classes but is generally similar. For example, average annual precipitation is the most important variable overall and for all Li concentration classes except class 2, where it is the second most important variable. Similarly, well depth is among the top 2 variables overall and for Li concentration classes 1, 2, and 3; for class 4, well depth is the fifth most important variable.
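Gain-based importance can be sketched from a list of splits: each feature's total gain divided by the grand total gives its fractional contribution, and dividing by the largest total gives the relative importance plotted in Figure 3. This is our illustration of the calculation; as we understand it, the R function `xgb.importance` reports the fractional value in its `Gain` column. The split gains below are invented.

```python
from collections import defaultdict


def gain_importance(split_gains):
    """split_gains: iterable of (feature, gain) for every split in every tree.
    Returns {feature: (fractional_gain, relative_importance)}."""
    totals = defaultdict(float)
    for feature, gain in split_gains:
        totals[feature] += gain
    grand_total = sum(totals.values())
    top = max(totals.values())
    return {f: (g / grand_total, g / top) for f, g in totals.items()}


# Hypothetical split gains collected from a fitted ensemble
splits = [("ppt_91_20", 6.0), ("WELL_DEPTH", 3.0), ("ppt_91_20", 2.0), ("LP_9", 1.0)]
importance = gain_importance(splits)
```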
3.2. Model Interpretation
Here, we focus on the SHAP plots for model predictions of the highest Li concentration class (Li > 30 μg/L). These plots are easier to interpret because they relate the independent variables to the prediction that Li exceeds a single concentration, whereas the other classes represent Li concentrations below a value or between two values. We examine and discuss in detail the SHAP plots for four of the most important predictor variables (Figure 4).
The SHAP plot for average annual precipitation indicates that the probability of Li concentrations > 30 μg/L generally increases as average annual precipitation decreases. This overall pattern reflects the more frequent occurrence of high Li concentrations in the arid western region of the country (average annual rainfall < 500 mm) than in the more humid eastern regions, which tend to have lower Li concentrations (Figure 1a). This result agrees with two previous national-scale studies that report higher Li concentrations in groundwater and surface waters from dry climate regions than from humid regions.19,21 These higher concentrations in arid regions might be associated with playas (dry lake beds), whose salts contain elevated concentrations of lithium.45 However, an additional national-scale study focusing on Li concentrations in groundwater observed higher concentrations (>60 μg/L) in both arid and humid regions of the country.20 The less frequent high Li concentrations observed in parts of the more humid upper Midwest and Northeast may be driven by factors unrelated to climate.
Well depth is another important model variable, and the SHAP plot indicates a large and positive relationship between well depth and high Li concentrations (Figure 4b). The increase in Li concentration with well depth has been observed in other Li groundwater studies and is indicative of processes that contribute to Li in groundwater including mixing with deep brines, dissolution of lithium-bearing minerals, and cation exchange.20 The well depth relationship is also consistent with the previously reported relationship of higher Li with increasingly older groundwater.20
The lateral position (LP) of a well is a normalized, dimensionless value ranging from 0 to 10,000 that represents the relative location of a well between a stream of nth order and its drainage divide.42,46 The lower the value, the closer the well is to the stream; LP equals zero at the stream itself. Lateral positions have been calculated for the CONUS for first- through ninth-order streams (n = 1 through 9);42,46 we use 5 of the 9 LP variables (LP 1, 3, 5, 7, and 9) in our model to avoid over-reliance on these variables. The SHAP plot for the LP for ninth-order streams (LP_9), which has the highest relative importance of the LP variables, is shown in Figure 4c and indicates a nonlinear relationship in which SHAP values are greatest at low (<1000) and high (>9000) LP values and dip in between. This pattern could reflect several different processes and geologic associations linked with Li occurrence in groundwater. Ninth-order streams are the largest rivers in the conterminous United States and include the Missouri, Mississippi, Columbia, and Colorado Rivers. LP_9 values also approach zero along the oceanic coastlines and shores of the Great Lakes. LP_9 values are greatest along the continental divides of North America, including the Great Continental Divide in the western United States from New Mexico to Montana and the divide along the Appalachian Mountains in the eastern United States from Georgia north to Maine. The higher Li concentrations observed at low LP_9 values near ninth-order streams may be associated with groundwater that has a long residence time and, thus, greater chemical evolution, allowing Li to increase through mineral weathering and cation exchange along the flow path from recharge areas to discharge at streams. The high LP_9 values in the eastern United States coincide with mountainous regions that contain pegmatites, some of which are known to be enriched in Li.45,47
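To make the 0 to 10,000 scale concrete, the sketch below assumes that LP scales linearly with the well's share of the stream-to-divide distance. This formula is our assumption for illustration only; the exact definition and computation are given in the cited sources (refs 42 and 46).

```python
def lateral_position(dist_to_stream_m: float, dist_to_divide_m: float,
                     scale: float = 10_000) -> float:
    """Hypothetical LP: the well's share of the stream-to-divide distance,
    scaled so LP = 0 at the stream and LP = scale at the drainage divide."""
    if dist_to_stream_m < 0 or dist_to_divide_m < 0:
        raise ValueError("distances must be non-negative")
    total = dist_to_stream_m + dist_to_divide_m
    if total == 0:
        raise ValueError("stream and divide cannot both be at the well")
    return scale * dist_to_stream_m / total
```

Under this assumption, a well one quarter of the way from a stream to its divide (250 m from the stream, 750 m from the divide) has an LP of 2500.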
The concentration of Li in the C soil horizon is also an important predictor variable. The C horizon is the lowermost soil horizon, which sits above the bedrock and typically consists of partially weathered bedrock or other parent material. The soil chemistry information used in this study is from a national-scale study that collected soil samples throughout the CONUS and measured numerous geochemical and mineralogical constituents.41 The SHAP plot for Li in the C soil horizon indicates that higher concentrations of Li in the soil contribute more to the model prediction of Li > 30 μg/L than lower concentrations (Figure 4d). This suggests that dissolution of Li from minerals in the soil and bedrock contributes to the presence of Li in groundwater.
SHAP plots are useful tools for interpreting how individual variables contribute to model predictions and how the predictions relate to each variable. The SHAP plots for the Li prediction class ≤ 4 μg/L and the same variables shown in Figure 4 are consistent with those for the highest class: their patterns (Figure SI_1) are the opposite of the patterns for the Li prediction class > 30 μg/L (Figure 4). For example, the relationship between average annual precipitation and the predictions for the lowest Li class indicates that the probability of Li ≤ 4 μg/L is greatest at higher average annual precipitation; that is, Li concentrations are lowest where average annual precipitation is high. This is the same interpretation that can be inferred from Figure 4a. The SHAP plots for the two intermediate Li concentration classes are more difficult to interpret because most of them show no discernible pattern and are largely scattered around a SHAP value of zero (Figures SI_2 and SI_3). SHAP plots for all predictor variables, for the predictions of each Li concentration class, and for the overall model are included in Figures SI_1–SI_5.
3.3. Model Prediction Maps
Maps were made for the CONUS that show model predictions of the Li concentration class at estimated well depths for domestic- and public-supply wells (Figure 1b,c). A comparison of the observed and predicted maps indicates that the spatial pattern of observed Li concentrations (Figure 1a) is generally well represented by the prediction maps (Figure 1b,c). Li concentrations > 30 μg/L were measured throughout much of the western and southwestern states, including Montana, Wyoming, North Dakota, South Dakota, Colorado, Utah, Nevada, Arizona, New Mexico, and Texas, and also, to a lesser degree and extent, in the eastern and northeastern states. These areas of high Li are captured in the model prediction maps. A visual comparison of the prediction maps for domestic- and public-supply well depths shows that the two are very similar; noticeable differences occur primarily in the southeastern United States (Georgia, Alabama, Mississippi, and Louisiana), where the predicted Li concentration classes at public-supply well depths are slightly higher than those at domestic-supply well depths.
The mapped model predictions of Li concentration classes for domestic-supply wells were compared with newly collected, independent Li data from 253 wells that were part of a study on water quality in domestic-supply wells in Nevada and northeastern California.32 The mapped predictions for these wells have an accuracy of 45.5%, which is lower than the prediction accuracy for the model validation data set (65.1%). The mapped predictions for these wells underpredict the correct Li concentration class (41.5%) more often than they overpredict it (13.0%; Figure 5). This result indicates that the model is more accurate in some areas than in others, which may occur for several reasons, including (1) important local features, such as geologic formations or deposits, that are not represented by the predictor variables in our model; (2) the use of gridded predictions for the cells containing the wells rather than model point predictions at the specific well locations; and (3) geospatial variation in model uncertainty.
3.4. Model Prediction Uncertainty
The Li concentration class predicted for a grid cell is the class with the highest probability of occurrence in the model results, and the probabilities across the four classes sum to 1. Because the probabilities are spread across four possible classes, a class with a probability of only 26% could still have the highest probability if, for example, the probabilities across the four classes were 26, 24, 25, and 25%. The class with 26% would be selected as the most probable; however, that class has a much higher uncertainty than a class selected with a 90% probability. As the probabilities of all classes approach 0.25, uncertainty increases because the four classes become nearly equally likely. Figure 6a maps the highest class probability for each grid cell from the private-well depth predictions; areas with low values have higher uncertainty than those with high values. Areas of the country where the highest probability exceeds 75% (and prediction uncertainty is lowest) include the Pacific Northwest, north Texas, Nebraska, Wisconsin, the Upper Peninsula of Michigan, parts of New York, and the southeastern states (Figure 6a). Areas of the country with the lowest probabilities (and highest prediction uncertainty) are more widespread and include the Northeast, Midwest, and western states (Figure 6a). Therefore, one explanation for the lower prediction accuracy of the private-well map in Nevada and northeastern California is the low overall probabilities of the class predictions in that area of the country (Figure 6a). The corresponding map of highest probabilities for public-supply well depth predictions is very similar and is included in Figure SI_6.
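The selection rule above (predict the most probable class and report its probability as a confidence measure) can be sketched in a few lines. The function name and the sum-to-one check are our assumptions; the first usage value reproduces the 26, 24, 25, and 25% example from the text.

```python
def most_probable_class(probs):
    """probs: probabilities for classes 1..4 (should sum to 1).
    Returns (class, max_probability); a lower max probability means higher uncertainty."""
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("class probabilities must sum to 1")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best + 1, probs[best]


nearly_uniform = most_probable_class([0.26, 0.24, 0.25, 0.25])  # class 1, low confidence
confident = most_probable_class([0.1, 0.1, 0.7, 0.1])           # class 3, higher confidence
```

Mapping the second element of the returned pair for every grid cell gives a map of the kind described for Figure 6a.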
Figure 6b maps the modeled Li concentration class minus the observed Li concentration class for the model validation data and shows that most predictions are correct or within one class of the observed class. Comparing Figure 6b with Figure 6a indicates that areas with the highest classification probabilities, such as north Texas, tend to have correct predictions (a difference of zero in Figure 6b). The validation wells whose modeled classes differ most from the observed classes (−3 or +3 in Figure 6b) are dispersed across areas in the lower probability ranges shown in Figure 6a.
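The difference map in Figure 6b can also be summarized numerically by subtracting each observed class from the modeled class (values from −3 to +3) and counting how often the result is 0 or within one class. The sketch below assumes classes are coded 1–4; the function name and the example class lists are hypothetical and do not reproduce the validation data.

```python
from collections import Counter
from typing import Dict, Sequence


def class_difference_summary(modeled: Sequence[int], observed: Sequence[int]) -> Dict[str, object]:
    """Tally modeled-minus-observed class differences for classes coded 1-4."""
    if not modeled or len(modeled) != len(observed):
        raise ValueError("need equal, nonzero numbers of modeled and observed classes")
    diffs = Counter(m - o for m, o in zip(modeled, observed))
    n = len(modeled)
    return {
        # Negative values are underpredictions, positive values overpredictions.
        "counts": {d: diffs[d] for d in range(-3, 4)},
        "pct_correct": 100 * diffs[0] / n,
        "pct_within_one": 100 * (diffs[-1] + diffs[0] + diffs[1]) / n,
    }


# Hypothetical validation wells.
result = class_difference_summary([1, 2, 3, 4, 2, 1, 4, 3], [1, 2, 2, 4, 3, 4, 1, 3])
print(result["pct_correct"], result["pct_within_one"])  # 50.0 75.0
```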
4. Limitations and Future Directions
The model presented here provides a CONUS-scale representation of Li in drinking water from public and private groundwater sources with reasonable accuracy at the well scale and across CONUS-scale predictions. However, the predictions could be improved. For example, more dependent-variable data (i.e., Li concentrations) in sparsely represented areas (the Northwest and Central United States) would likely result in more refined and more accurate predictions. Additional or improved predictor variables, such as a refined representation of groundwater residence time or of the chemical evolution of groundwater, would also be useful. Estimates of modeled groundwater travel times or recharge ages at the depths of drinking-water supplies might also improve predictive accuracy.
Further, our model does not consider potential anthropogenic sources of Li to groundwater. These sources may become important to consider as the use and disposal of Li-containing products increase.48 Li has become increasingly important as an economic commodity because of its use in rechargeable batteries, especially for electric vehicles.49 This recent surge in the demand for and use of Li in consumer products is likely to increase its detection in groundwater and waterways through use and disposal. Our model may help discern whether Li detected in groundwater in a given area is from anthropogenic or geogenic sources. A study of Li concentrations and isotopes in South Korean streams found higher Li concentrations downstream of densely populated areas and, on the basis of Li isotope ratios, attributed the increase to anthropogenic activities.50 Studies have also reported that standard wastewater treatment methods do not decrease Li concentrations between influent and effluent.21,50 Additionally, our model considers only groundwater sources of drinking water, whereas many large cities throughout the United States rely on surface-water sources.51 Surface-water sources of drinking water are likely to contain Li in varying concentrations; however, data are currently sparse.17,21 As the UCMR5 rule goes into effect and more data become available, this model may be useful in future efforts to compare Li concentrations and occurrence in groundwater and surface water.
Given the scattered sampling of wells for Li across the CONUS, the model and maps developed in this study provide a current best estimate of Li concentrations in groundwater used as drinking water throughout the CONUS. These estimates will be used to calculate human exposure metrics and evaluate associations with various human-health outcomes. This is particularly important because the evidence is mixed on possible health benefits of higher Li concentrations in drinking water, even when exposures are several orders of magnitude lower than those in clinical settings. These potential benefits include reductions in suicide,52 violent crime,53 and dementia.9 This evidence has led some to call for trials evaluating Li supplementation of drinking water as a public-health intervention.54 Still, confirmatory evidence from new geographic areas would help clarify whether the health associations are attributable to Li exposure and whether they are beneficial or detrimental. Such studies will be supported by the model presented here, which enables epidemiologic investigation across the CONUS. The model also enables assessment of interactions with other exposures, such as lead, which may interact with Li in its effects on human health.2 Groundwater-quality models such as these provide a useful tool for identifying potential health effects of drinking-water contaminants, environmental justice issues, and public-health education and outreach needs.
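As one illustration of the exposure metrics mentioned above, a drinking-water concentration is commonly converted to a daily dose per unit body weight, which can then be compared with the EPA provisional reference dose (p-RfD) of 2 μg of Li per kilogram of body weight per day cited in the Introduction. The sketch below is not the authors' exposure method: the 2 L/day intake and 70 kg body-weight defaults are illustrative assumptions, and the function names are hypothetical. Because the model predicts concentration classes rather than concentrations, a representative concentration for each class would also have to be chosen.

```python
P_RFD_UG_PER_KG_DAY = 2.0  # EPA provisional reference dose for Li (ug/kg/day)


def daily_dose(conc_ug_per_l: float, intake_l_per_day: float = 2.0,
               body_weight_kg: float = 70.0) -> float:
    """Li dose from drinking water in ug per kg body weight per day.

    The default intake and body weight are illustrative assumptions only.
    """
    if intake_l_per_day < 0 or body_weight_kg <= 0:
        raise ValueError("intake must be nonnegative and body weight positive")
    return conc_ug_per_l * intake_l_per_day / body_weight_kg


def fraction_of_p_rfd(dose_ug_per_kg_day: float) -> float:
    """Ratio of an estimated dose to the p-RfD (values below 1 are under the p-RfD)."""
    return dose_ug_per_kg_day / P_RFD_UG_PER_KG_DAY


# A well at 30 ug/L (the boundary between the two highest classes), assumed defaults.
dose = daily_dose(30.0)
print(round(dose, 3), round(fraction_of_p_rfd(dose), 3))  # 0.857 0.429
```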
Acknowledgments
Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. The authors thank the U.S. Geological Survey Environmental Health Program for project funding, Kristin Romanov, USGS, for assistance with assembling data, and Katherine Knierim, USGS, for providing an initial review of the manuscript. M.A.L. also acknowledges the Shelia Seaman Women Geoscientists Writing Retreat.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.est.3c03315.
Well and water-quality characteristics from wells across the conterminous United States; number of wells in each lithium (Li) concentration class used in the classification model to estimate geogenic lithium in drinking-water supply wells throughout the conterminous United States; predictor variables and relative importance for the overall model used to estimate geogenic lithium in drinking-water supply wells throughout the conterminous United States; relative importance of variables for lithium (Li) classification 1–4; SHapley Additive exPlanation (SHAP) plots for prediction classification of lithium (Li); SHapley Additive exPlanation (SHAP) plots for the overall model; and highest probability for each grid classification for public well depth predictions (PDF)
The authors declare no competing financial interest.
Special Issue
Published as part of Environmental Science & Technology virtual special issue "The Exposome and Human Health".
References
- Curran G.; Ravindran A. Lithium for bipolar disorder: a review of the recent literature. Expert Rev. Neurother. 2014, 14 (9), 1079–1098. 10.1586/14737175.2014.947965.
- Brown E. E.; Gerretsen P.; Pollock B.; Graff-Guerrero A. Psychiatric benefits of lithium in water supplies may be due to protection from the neurotoxicity of lead exposure. Med. Hypotheses 2018, 115, 94–102. 10.1016/j.mehy.2018.04.005.
- Blüml V.; Regier M. D.; Hlavin G.; Rockett I. R. H.; König F.; Vyssoki B.; Bschor T.; Kapusta N. D. Lithium in the public water supply and suicide mortality in Texas. J. Psychiatr. Res. 2013, 47 (3), 407–411. 10.1016/j.jpsychires.2012.12.002.
- Helbich M.; Leitner M.; Kapusta N. D. Geospatial examination of lithium in drinking water and suicide mortality. Int. J. Health Geogr. 2012, 11 (1), 19. 10.1186/1476-072X-11-19.
- Helbich M.; Leitner M.; Kapusta N. D. Lithium in drinking water and suicide mortality: Interplay with lithium prescriptions. Br. J. Psychiatry 2015, 207 (1), 64–71. 10.1192/bjp.bp.114.152991.
- Ando S.; Suzuki H.; Matsukawa T.; Usami S.; Muramatsu H.; Fukunaga T.; Yokoyama K.; Okazaki Y.; Nishida A. Comparison of lithium levels between suicide and non-suicide fatalities: Cross-sectional study. Transl. Psychiatry 2022, 12 (1), 466. 10.1038/s41398-022-02238-9.
- Knudsen N. N.; Schullehner J.; Hansen B.; Jørgensen L. F.; Kristiansen S. M.; Voutchkova D. D.; Gerds T. A.; Andersen P. K.; Bihrmann K.; Grønbæk M.; Kessing L. V.; Ersbøll A. K. Lithium in Drinking Water and Incidence of Suicide: A Nationwide Individual-Level Cohort Study with 22 Years of Follow-Up. Int. J. Environ. Res. Public Health 2017, 14, 627. 10.3390/ijerph14060627.
- Ando S.; Shinsuke K.; Shimodera S.; Fujito R.; Sawada K.; Terao T.; Furukawa T.; Sasaki T.; Inoue S.; Asukai N.; Okazaki Y.; Nishida A. Lithium Levels in Tap Water and the Mental Health Problems of Adolescents: An Individual-Level Cross-Sectional Survey. J. Clin. Psychiatry 2017, 78 (3), e252–e256. 10.4088/JCP.15m10220.
- Kessing L. V.; Gerds T. A.; Knudsen N. N.; Jørgensen L. F.; Kristiansen S. M.; Voutchkova D.; Ernstsen V.; Schullehner J.; Hansen B.; Andersen P. K.; Ersbøll A. K. Association of Lithium in Drinking Water With the Incidence of Dementia. JAMA Psychiatry 2017, 74 (10), 1005–1010. 10.1001/jamapsychiatry.2017.2362.
- Eyre-Watt B.; Mahendran E.; Suetani S.; Firth J.; Kisely S.; Siskind D. The association between lithium in drinking water and neuropsychiatric outcomes: A systematic review and meta-analysis from across 2678 regions containing 113 million people. Aust. N. Z. J. Psychiatry 2021, 55 (2), 139–152. 10.1177/0004867420963740.
- Muronaga M.; Terao T.; Kohno K.; Hirakawa H.; Izumi T.; Etoh M. Lithium in drinking water and Alzheimer’s dementia: Epidemiological Findings from National Data Base of Japan. Bipolar Disord. 2022, 24 (8), 788–794. 10.1111/bdi.13257.
- Liew Z.; Meng Q.; Yan Q.; Schullehner J.; Hansen B.; Kristiansen S. M.; Voutchkova D. D.; Olsen J.; Ersbøll A. K.; Ketzel M.; Raaschou-Nielsen O.; Ritz B. R. Association Between Estimated Geocoded Residential Maternal Exposure to Lithium in Drinking Water and Risk for Autism Spectrum Disorder in Offspring in Denmark. JAMA Pediatrics 2023, 177 (6), 617–624. 10.1001/jamapediatrics.2023.0346.
- Broberg K.; Concha G.; Engström K.; Lindvall M.; Grandér M.; Vahter M. Lithium in Drinking Water and Thyroid Function. Environ. Health Perspect. 2011, 119 (6), 827–830. 10.1289/ehp.1002678.
- Harari F.; Bottai M.; Casimiro E.; Palm B.; Vahter M. Exposure to Lithium and Cesium Through Drinking Water and Thyroid Function During Pregnancy: A Prospective Cohort Study. Thyroid 2015, 25 (11), 1199–1208. 10.1089/thy.2015.0280.
- Provisional Peer Reviewed Toxicity Values for Lithium; EPA, 2008.
- Norman J. E.; Toccalino P. L.; Morman S. A. Health-Based Screening Levels for Evaluating Water-Quality Data; USGS, 2022.
- EPA. Revisions to the Unregulated Contaminant Monitoring Rule (UCMR 5) for Public Water Systems and Announcement of Public Meetings; 2021. https://www.govinfo.gov/content/pkg/FR-2021-12-27/pdf/2021-27858.pdf.
- DeSimone L. A. Quality of Water from Domestic Wells in Principal Aquifers of the United States, 1991–2004; USGS, 2009.
- Ayotte J. D.; Gronberg J. A. M.; Apodaca L. E. Trace Elements and Radon in Groundwater Across the United States, 1992–2003; USGS: Reston, VA, 2011.
- Lindsey B. D.; Belitz K.; Cravotta C. A.; Toccalino P. L.; Dubrovsky N. M. Lithium in groundwater used for drinking-water supply in the United States. Sci. Total Environ. 2021, 767, 144691. 10.1016/j.scitotenv.2020.144691.
- Sharma N.; Westerhoff P.; Zeng C. Lithium occurrence in drinking water sources of the United States. Chemosphere 2022, 305, 135458. 10.1016/j.chemosphere.2022.135458.
- Zhong S.; Zhang K.; Bagheri M.; Burken J. G.; Gu A.; Li B.; Ma X.; Marrone B. L.; Ren Z. J.; Schrier J.; Shi W.; Tan H.; Wang T.; Wang X.; Wong B. M.; Xiao X.; Yu X.; Zhu J. J.; Zhang H. Machine Learning: New Ideas and Tools in Environmental Science and Engineering. Environ. Sci. Technol. 2021, 55 (19), 12741–12754. 10.1021/acs.est.1c01339.
- Ayotte J. D.; Nolan B. T.; Gronberg J. A. Predicting Arsenic in Drinking Water Wells of the Central Valley, California. Environ. Sci. Technol. 2016, 50 (14), 7555–7563. 10.1021/acs.est.6b01914.
- Ransom K. M.; Nolan B. T.; Traum J. A.; Faunt C. C.; Bell A. M.; Gronberg J. A. M.; Wheeler D. C.; Rosecrans C. Z.; Jurgens B.; Schwarz G. E.; Belitz K.; Eberts S. M.; Kourakos G.; Harter T. A hybrid machine learning model to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA. Sci. Total Environ. 2017, 601–602, 1160–1172. 10.1016/j.scitotenv.2017.05.192.
- Erickson M. L.; Elliott S. M.; Brown C. J.; Stackelberg P. E.; Ransom K. M.; Reddy J. E.; Cravotta C. A., 3rd. Machine-Learning Predictions of High Arsenic and High Manganese at Drinking Water Depths of the Glacial Aquifer System, Northern Continental United States. Environ. Sci. Technol. 2021, 55 (9), 5791–5805. 10.1021/acs.est.0c06740.
- Knierim K. J.; Kingsbury J. A.; Belitz K.; Stackelberg P. E.; Minsley B. J.; Rigby J. R. Mapped Predictions of Manganese and Arsenic in an Alluvial Aquifer Using Boosted Regression Trees. Groundwater 2022, 60 (3), 362–376. 10.1111/gwat.13164.
- Lombard M. A.; Bryan M. S.; Jones D. K.; Bulka C.; Bradley P. M.; Backer L. C.; Focazio M. J.; Silverman D. T.; Toccalino P.; Argos M.; Gribble M. O.; Ayotte J. D. Machine Learning Models of Arsenic in Private Wells Throughout the Conterminous United States As a Tool for Exposure Assessment in Human Health Studies. Environ. Sci. Technol. 2021, 55 (8), 5012–5023. 10.1021/acs.est.0c05239.
- Rosecrans C. Z.; Belitz K.; Ransom K. M.; Stackelberg P. E.; McMahon P. B. Predicting regional fluoride concentrations at public and domestic supply depths in basin-fill aquifers of the western United States using a random forest model. Sci. Total Environ. 2022, 806 (Pt 4), 150960. 10.1016/j.scitotenv.2021.150960.
- DeSimone L. A.; Ransom K. M. Manganese in the Northern Atlantic Coastal Plain aquifer system, eastern USA—Modeling regional occurrence with pH, redox, and machine learning. J. Hydrol.: Reg. Stud. 2021, 37, 100925. 10.1016/j.ejrh.2021.100925.
- Ransom K. M.; Nolan B. T.; Stackelberg P. E.; Belitz K.; Fram M. S. Machine learning predictions of nitrate in groundwater used for drinking supply in the conterminous United States. Sci. Total Environ. 2022, 807 (Pt 3), 151065. 10.1016/j.scitotenv.2021.151065.
- Chen T.; Guestrin C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016; pp 785–794.
- Arienzo M. M.; Saftner D.; Bacon S. N.; Robtoy E.; Neveux I.; Schlauch K.; Carbone M.; Grzymski J. Naturally occurring metals in unregulated domestic wells in Nevada, USA. Sci. Total Environ. 2022, 851, 158277. 10.1016/j.scitotenv.2022.158277.
- Andy C. M.; Fahnestock M. F.; Lombard M. A.; Hayes L.; Bryce J. G.; Ayotte J. D. Assessing Models of Arsenic Occurrence in Drinking Water from Bedrock Aquifers in New Hampshire. J. Contemp. Water Res. Educ. 2017, 160 (1), 25–41. 10.1111/j.1936-704X.2017.03238.x.
- Lombard M. A. Data Used to Model and Map Lithium Concentrations in Groundwater Used as Drinking Water for the Conterminous United States; U.S. Geological Survey, 2023.
- Eurasian Economic Union. EAEU Technical Regulation on Safety of Packaged Potable Water including Natural Mineral Water (TR EAEU 044/2017), as translated by the U.S. Dept. of Agriculture, Foreign Agricultural Service, Global Agricultural Information Network Report Number RS1752, 2017.
- Clark B. R.; Barlow P. M.; Peterson S. M.; Hughes J. D.; Reeves H. W.; Viger R. J. National-Scale Grid to Support Regional Groundwater Availability Studies and a National Hydrogeologic Database; U.S. Geological Survey, 2018.
- ArcGIS Desktop: Release 10.8.1; Environmental Systems Research Institute: Redlands, CA, 2020.
- Degnan J. R.; Kauffman L. J.; Erickson M. L.; Belitz K.; Stackelberg P. E. Depth of Groundwater Used for Drinking-Water Supplies in the United States; USGS: Reston, VA, 2021.
- Molnar C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, 2nd ed.; BookDown, 2022.
- PRISM Climate Group. 30-Year Normals for Average Annual Precipitation. https://prism.oregonstate.edu.
- Smith D. B.; Cannon W. F.; Woodruff L. G.; Solano F.; Ellefsen K. J. Geochemical and Mineralogical Maps for Soils of the Conterminous United States; U.S. Geological Survey, 2014.
- Moore R.; Belitz K.; Arnold T. L.; Sharpe J. B.; Starn J. J. National Multi Order Hydrologic Position (MOHP) Predictor Data for Groundwater and Groundwater-Quality Modeling; U.S. Geological Survey, 2019.
- Zell W. O.; Sanford W. E. Calibrated Simulation of the Long-Term Average Surficial Groundwater System and Derived Spatial Distributions of its Characteristics for the Contiguous United States. Water Resour. Res. 2020, 56 (8), e2019WR026724. 10.1029/2019WR026724.
- Chen T.; He T.; Benesty M.; Khotilovich V.; Tang Y.; Cho H.; Chen K.; Mitchell R.; Cano I.; Zhou T.; Li M.; Xie J.; Lin M.; Geng Y.; Li Y.; Yuan J. Extreme Gradient Boosting, version 1.6.0.1; CRAN, 2022.
- Bradley D. C.; Stillings L. L.; Jaskula B. W.; Munk L.; McCauley A. D. Economic and Environmental Geology Prospects for Future Supply; U.S. Geological Survey, 2017.
- Belitz K.; Moore R. B.; Arnold T. L.; Sharpe J. B.; Starn J. J. Multiorder Hydrologic Position in the Conterminous United States: A Set of Metrics in Support of Groundwater Mapping at Regional and National Scales. Water Resour. Res. 2019, 55 (12), 11188–11207. 10.1029/2019WR025908.
- Kesler S. E.; Gruber P. W.; Medina P. A.; Keoleian G. A.; Everson M. P.; Wallington T. J. Global lithium resources: Relative importance of pegmatite, brine and other deposits. Ore Geol. Rev. 2012, 48, 55–69. 10.1016/j.oregeorev.2012.05.006.
- Yang X.; Wen H.; Lin Y.; Zhang H.; Liu Y.; Fu J.; Liu Q.; Jiang G. Emerging Research Needs for Characterizing the Risks of Global Lithium Pollution under Carbon Neutrality Strategies. Environ. Sci. Technol. 2023, 57 (13), 5103–5106. 10.1021/acs.est.3c01431.
- Bibienne T.; Magnan J.-F.; Rupp A.; Laroche N. From Mine to Mind and Mobiles: Society’s Increasing Dependence on Lithium. Elements 2020, 16 (4), 265–270. 10.2138/gselements.16.4.265.
- Choi H. B.; Ryu J. S.; Shin W. J.; Vigier N. The impact of anthropogenic inputs on lithium content in river and tap water. Nat. Commun. 2019, 10 (1), 5371. 10.1038/s41467-019-13376-y.
- Dieter C. A.; Maupin M. A.; Caldwell R. R.; Harris M. A.; Ivahnenko T. I.; Lovelace J. K.; Barber N. L.; Linsey K. S. Estimated Use of Water in the United States in 2015; USGS: Reston, VA, 2018; p 76.
- Barjasteh-Askari F.; Davoudi M.; Amini H.; Ghorbani M.; Yaseri M.; Yunesian M.; Mahvi A. H.; Lester D. Relationship between suicide mortality and lithium in drinking water: A systematic review and meta-analysis. J. Affective Disord. 2020, 264, 234–241. 10.1016/j.jad.2019.12.027.
- Kohno K.; Ishii N.; Hirakawa H.; Terao T. Lithium in drinking water and crime rates in Japan: cross-sectional study. BJPsych Open 2020, 6 (6), e122. 10.1192/bjo.2020.63.
- Araya P.; Martínez C.; Barros J. Lithium in Drinking Water as a Public Policy for Suicide Prevention: Relevance and Considerations. Front. Public Health 2022, 10, 805774. 10.3389/fpubh.2022.805774.