Author manuscript; available in PMC: 2024 Jan 1.
Published in final edited form as: Mar Pollut Bull. 2022 Dec 8;186:114456. doi: 10.1016/j.marpolbul.2022.114456

Assessing the relative importance of stressors to the benthic index, M-AMBI: An example from U.S. estuaries

Marguerite C Pelletier 1, Michael Charpentier 2
PMCID: PMC9813808  NIHMSID: NIHMS1858386  PMID: 36502776

Abstract

M-AMBI, a multivariate benthic index, has been used by European and American (U.S.) authorities to assess estuarine and coastal health and has been used in scientific studies throughout the world. It has been shown to be related to multiple pressures and stressors, but the relative importance of individual stressors within a multiple stressor context has not generally been assessed. In this study, we assembled data collected between 1999 and 2015 by the U.S. Environmental Protection Agency using consistent methods. These data included sediment and water quality measures and benthic invertebrate data which were used to calculate M-AMBI. We further assembled watersheds for all US estuaries with benthic data and calculated land use metrics. Random forest (RF) was used to identify those variables most strongly related to M-AMBI. Because RF is a compilation of multiple, nonlinear models, we then assessed which of these variables had a direct relationship with M-AMBI. The resulting variables were then assessed using RF to identify the subsets of variables that produced an effective and parsimonious model. This process was conducted at the national and ecoregional scale and the variables identified as being most important to predict M-AMBI were compared with literature reports of ecological patterns in a given area. At the national scale, better condition was correlated with clearer waters, lower amounts of agriculture in the watershed, and lower carbon and metal concentrations in estuarine sediments. Other stressors were identified as being important at the ecoregional scale, although sediment metal concentrations and watershed agriculture were identified as being important in most ecoregions. Our results suggest that this technique is useful to identify the most important variables impacting M-AMBI at broad spatial scales, even when the percentage of sites in Bad or Poor condition is low. 
This technique also provides an initial identification of important stressors that can be used to target more intensive local studies.

Keywords: Benthic index, M-AMBI, Random Forest, Statistics, Stressors

1. Introduction

Estuaries and coasts have high importance to humans due to their land-sea nexus. In these areas, freshwater, nutrients, sediment and organic matter from the watershed (Tesi et al., 2013; Chen et al., 2018; Chen et al., 2019; Oelsner and Stets, 2019; Paerl et al., 2020) mix with oceanic water, creating highly productive ecosystems characterized by environmental gradients and a mosaic of habitats, many of which act as nursery areas (Vasconcelos et al., 2011; Whitfield, 2020) for commercially and ecologically important species. The opportunities for fishing, commerce and recreation have drawn people to the shore for hundreds of years; >50 % of the world's cities and 38 % of the world's human population are located in these coastal areas (Small and Nicholls, 2003). With increasing human occupancy, estuaries and coasts have been adversely impacted by nutrient and organic carbon over-enrichment and by alteration of sediment and freshwater delivery (Gillanders and Kingsford, 2002; Diaz et al., 2008; Stein and Cadien, 2009; Borja et al., 2010; Liu et al., 2021).

In the United States, the Clean Water Act (CWA) was implemented in 1972 to “restore and maintain the chemical, physical and biological integrity of the Nation's waters.” Initially, monitoring in support of the CWA focused on funding state monitoring programs, but this approach did not allow a statistically robust assessment of U.S. waters or improvements due to management action, so an additional nationwide monitoring initiative was enacted (Whittier and Paulsen, 1992), now known as the National Aquatic Resource Surveys (NARS). In Europe, the 2000 Water Framework Directive (WFD) built on the CWA and European regional seas conventions and agreements but focused on using biological elements and ecological status rather than chemical measures to assess water quality (Hering et al., 2010).

Marine and estuarine benthic invertebrates have often been used to assess ecological condition (O'Brien et al., 2016) because they respond predictably to pollution (Pearson and Rosenberg, 1978), are relatively sedentary, and act as integrators of stress over months to years (Dauer, 1993). They are also ecologically important, affecting nutrient cycling and food web dynamics (Griffiths et al., 2017; Young et al., 2021). Different aspects of these communities are often summarized into individual metrics such as diversity and pollution tolerance that are then combined into a single number (a benthic index) that can be easily interpreted by environmental managers (Engle and Summers, 1999; Pinto et al., 2009). In Europe, the AZTI-Marine Biotic Index (AMBI; Borja et al., 2000) and its multivariate extension M-AMBI (Muxika et al., 2007) are benthic indices commonly used to assess ecological condition. AMBI is an abundance-weighted, tolerance value index. M-AMBI uses factor analysis to combine AMBI, diversity, and species richness into a single index, and is able to account for habitat differences. In the United States, M-AMBI has been adapted by using salinity zones as individual habitats and by enhancing the world-wide AMBI ecological groups list with U.S.-specific taxa (Gillett et al., 2015; Pelletier et al., 2018). This adapted index is currently being used to assess the condition of U.S. waters by the National Coastal Condition Assessment (NCCA), the coastal portion of NARS.

The US Environmental Protection Agency (EPA)'s nationwide surveys, i.e., NARS, have three major goals: to provide (1) a statistical assessment of the current condition of U.S. waters, (2) a statistical assessment of trends over time (i.e., whether conditions are improving or declining), and (3) a determination of which stressors are responsible for adverse condition. For most of the freshwater resources (rivers and streams, lakes, and wetlands), this third goal is addressed using a relative risk approach (Van Sickle et al., 2006), which assesses the relative importance of individual stressors to observed poor biological condition. This technique is not appropriate for the coastal survey because too few of the assessed estuarine sites are in poor condition (Van Sickle et al., 2006). In the most recent surveys, 31 % of lake area and 44 % of stream length were in poor condition based on macroinvertebrate condition (US EPA, 2016, US EPA, 2020), while 32 % of wetland area was in poor condition based on a vegetation index (US EPA, 2015). In contrast, only 7 % of estuaries were in poor condition, based on the M-AMBI benthic index (US EPA, 2021a). The higher condition of estuaries in this survey is a function of the survey design as well as the resource itself. NARS applies a probabilistic approach to site selection. While stratification is used to ensure that all coasts have an equal probability of inclusion and, in some cases, that specific estuaries are targeted (Kiddon et al., 2020), there is no attempt to target sampling in areas expected to be at higher risk for adverse impacts. Most of the randomly selected sites are located in more open, better flushed areas, rather than in small tributaries, near industrial discharges, cities or at the head of estuaries.
Although this survey design allows for an unbiased assessment of overall condition, the lack of sites in areas expected to be more impacted by anthropogenic stress makes it more difficult to develop relationships between the condition indicator, M-AMBI, and environmental stressors. To understand which stressors contributed most to observed biological impairment, an alternative to the relative risk approach was needed. In this study, we used random forest models to rank and assess the relative importance of individual estuarine stressors on M-AMBI in U.S. estuaries.

2. Materials and methods

2.1. Data assembly

In-estuary data were obtained from two EPA monitoring programs: the National Coastal Assessment (NCA, 1999–2006) and the NCCA (2010 and 2015). Both NCA and NCCA are conducted in partnership with states and tribes that collect data in a consistent manner to assess the condition of U.S. coastal waters (Kiddon et al., 2020). All data were collected during the summer, generally June through September. Most sites were sampled once during a given year, although a few sites were sampled twice during the summer, often early and late in the season to assess site variability. Water clarity was measured using a secchi disk, while water column physical and chemical variables were obtained using a CTD, water quality meter or water sampler. Sediments were collected using grabs for assessment of sediment chemistry and benthic community condition. Benthic invertebrates were sieved (0.5–1.0 mm) from the sediment and identified to the lowest possible taxonomic level, generally species. Samples were not replicated and were collected using a variety of grab types. However, the M-AMBI good/bad expectations were adjusted based on grab size (0.04 m² vs. 0.1 m²).

Benthic invertebrate data were summarized into the benthic index, M-AMBI (Muxika et al., 2007; Pelletier et al., 2018). This index is used to assess biological condition for EPA's national survey (NCCA, US EPA, 2021a). Briefly, the invertebrate taxa were categorized by ecological group (Gillett et al., 2015), and the AMBI index (Borja et al., 2000), Shannon's H′, species richness, and % oligochaetes were calculated. M-AMBI was calculated separately for each of five salinity zones (i.e., tidal freshwater, oligohaline, polyhaline, euhaline, hyperhaline), adjusted for grab size using the good and bad site expectations from Pelletier et al. (2018). M-AMBI is scaled from 0 to 1, with degraded sites being closer to zero and undegraded sites closer to one.

Sediment contaminant concentrations were summarized using mean Effect Range Median (ERM) quotients (mERMQ; Long et al., 1998, Long et al., 2006), which are calculated by dividing each contaminant concentration by its respective ERM value (Long et al., 1995) and averaging the resulting quotients. The mERMQ is one of the metrics used to assess sediment quality for EPA's national survey (NCCA, US EPA, 2021b). Prior to calculating the mERMQ, contaminants with zero values were replaced with a value of ½ the detection limit. Because many of the organic contaminants were below the detection limit, we also calculated the mERMQ for metals alone (metal mERMQ). We also compiled sediment percent total organic carbon (TOC). Secchi depth, bottom dissolved oxygen (DO), and surface chlorophyll a (chl) concentrations, all variables used by the EPA's national survey (NCCA, US EPA, 2021a) to assess water quality, were compiled. These water column and sediment variables were matched to stations where benthic invertebrates were collected and M-AMBI could be calculated. Missing values for water or sediment variables were filled with values from nearby stations using best professional judgement, if possible; otherwise, the entire station was deleted, resulting in a database of 5674 stations.
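The mERMQ calculation described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name, the contaminant labels, and all ERM values, detection limits and concentrations are hypothetical placeholders rather than the published ERM values of Long et al. (1995).

```python
def mean_erm_quotient(concentrations, erm_values, detection_limits):
    """Average of (concentration / ERM) across contaminants.

    Zero concentrations are replaced by half the detection limit
    before the quotient is taken, as described in the text.
    """
    quotients = []
    for name, erm in erm_values.items():
        conc = concentrations[name]
        if conc == 0:
            conc = 0.5 * detection_limits[name]
        quotients.append(conc / erm)
    return sum(quotients) / len(quotients)


# Hypothetical values, for illustration only (not real ERM thresholds).
erm = {"A": 10.0, "B": 200.0, "C": 50.0}
detection_limit = {"A": 0.2, "B": 1.0, "C": 0.5}
sample = {"A": 5.0, "B": 0.0, "C": 25.0}

merm_q = mean_erm_quotient(sample, erm, detection_limit)
print(round(merm_q, 4))
```

Restricting the `erm` dictionary to metals only would give the metal mERMQ under the same assumptions.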

Watershed boundaries for the 220 estuaries (Fig. 1) with in-estuary measures were created using the NHDPlus Version 2 (nhd.usgs.gov) watershed boundary dataset and 1:24,000 hydrography dataset in ArcGIS 10.8 (ESRI 1999–2020). These watersheds were used as the basis for calculating landscape metrics in ArcGIS. Land use/land cover (Anderson et al., 1976) and impervious cover were obtained from the National Land Cover dataset (NLCD; Homer et al., 2004) for 2001, 2006, 2011 and 2016. These measures were summarized by watershed and converted to percent cover. Land use was summarized into four main categories: developed (% low, medium, and high intensity development), forested (% forest + % woody wetlands), salt marsh (% emergent herbaceous wetland, Muñoz et al., 2019) and agriculture, which have been shown to be significantly related to benthos (Hale et al., 2004; King et al., 2005; Seitz et al., 2018). Forested lands are expected to be associated with good ecological condition, while developed and agricultural lands are often correlated with impacted aquatic communities (Dauer et al., 2000). Impervious surfaces are also expected to be associated with poor ecological condition due to their association with increased runoff and decreased soil processing, which would be expected to increase pollutant delivery (Arnold and Gibbons, 1996). Percent agriculture on steep (>9 %) slopes was calculated by creating a percent slope map using USGS 10 m elevation data (https://apps.nationalmap.gov/downloader/#/) and overlaying the agriculture cover from NLCD. Cropland on steep slopes increases the probability of erosion and associated delivery of sediment and nutrients to nearby water bodies (O’Neill et al., 1997), which may translate into poorer estuarine condition. Estuarine data collected in 1999–2003 were associated with the 2001 NLCD, while data collected in 2004–2006 were associated with the 2006 NLCD.
Estuary data collected in 2010 and 2015 were associated with the 2011 and 2016 NLCD, respectively.
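The pairing of sampling years with NLCD releases described above amounts to a simple lookup. The sketch below is illustrative only; the function name is hypothetical, and the handling of years outside the sampled periods (raising an error) is an assumption, since the text does not cover them.

```python
def nlcd_year_for(sample_year):
    """Return the NLCD release paired with an estuarine sampling year."""
    if 1999 <= sample_year <= 2003:
        return 2001
    if 2004 <= sample_year <= 2006:
        return 2006
    if sample_year == 2010:
        return 2011
    if sample_year == 2015:
        return 2016
    # Assumption: years not sampled by NCA/NCCA are rejected.
    raise ValueError(f"no NLCD release assigned for {sample_year}")


for year in (1999, 2003, 2004, 2006, 2010, 2015):
    print(year, "->", nlcd_year_for(year))
```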

Fig. 1.

Map of study area showing the location of the 220 estuaries included in this study and the 9 Marine Ecoregions of the World (MEOW) ecoregions.

Road density was calculated for each watershed using the 2011 TeleAtlas roads dataset (NAVSTREETS Navteq Streets, accessed June 2021). Roads can negatively impact aquatic communities by altering habitat and hydrology, acting as conduits for additional contaminant loads to systems and by facilitating increased human use of an area, which can result in additional changes in land use and hydrology (Trombulak and Frissell, 2000). Dasymetric population was acquired from the US EPA EnviroAtlas (https://www.epa.gov/enviroatlas, accessed June 2021) and associated with individual watersheds. Dasymetric population (Mennis, 2003) was determined by distributing census block population onto areas where population is expected to be located. The number of wastewater treatment facilities (WWTF) in each watershed was determined by obtaining the locations of sewage treatment plants (major dischargers) from EPA's Permit Compliance System (www.epa.gov/enviro/html/pcs) and associating each location with its watershed in ArcGIS.

2.2. Statistical analyses

Random forest regression models (randomForest package v.4.6-14; Breiman, 2001, Liaw and Wiener, 2002, R Core Team, 2020) were used to identify the independent variables (Table 1) that were the best predictors of M-AMBI. Random forest (RF) is a machine learning process that aggregates multiple regression trees of varying length created from random subsets of the data. Data not used to construct a given tree (“out of bag”) are used to evaluate performance. These models are resistant to overfitting and provide an unbiased estimate of model error (Breiman, 2001). The importance of each variable is determined by the % increase in prediction error (%IncMSE) for out-of-bag data when data for a given variable are permuted (Liaw and Wiener, 2002). Visually, the RF output displays the variables from most to least important. The most important variables were those with the highest %IncMSE. RF (ntree = 5000) was applied to the entire dataset and to individual ecoregions (Marine Ecoregions of the World (MEOW), Spalding et al., 2007, Fig. 1, Table 2). These ecoregions were selected as they were able to capture much of the raw observed geographic variability in M-AMBI, specifically detecting the better conditions in the southeast Gulf of Mexico, Southern California, and northern Washington (Fig. S1 in Supplementary (S) Material).
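The permutation idea behind %IncMSE can be illustrated with a short sketch. It is not the randomForest implementation, which works tree by tree on out-of-bag data and scales the result; here a hypothetical fixed predictor stands in for the forest, the importance is expressed as a simple percent change in mean squared error, and all data values and names are invented for illustration.

```python
import random


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, n_repeats=20, seed=0):
    """Mean % increase in MSE when each predictor column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    importance = {}
    for col in rows[0]:
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted_rows = [dict(r, **{col: v}) for r, v in zip(rows, shuffled)]
            permuted = mse(y, [predict(r) for r in permuted_rows])
            increases.append(100.0 * (permuted - baseline) / baseline)
        importance[col] = sum(increases) / n_repeats
    return importance


# Hypothetical data: the response depends on "secchi" but not on "unused".
rows = [{"secchi": float(s), "unused": float(u)}
        for s, u in zip(range(1, 11), [3, 1, 4, 1, 5, 9, 2, 6, 5, 3])]
y = [0.2 + 0.05 * r["secchi"] + (0.01 if i % 2 else -0.01)
     for i, r in enumerate(rows)]


def predict(row):
    # Stand-in for a fitted model; a real analysis would use the forest.
    return 0.2 + 0.05 * row["secchi"]


importance = permutation_importance(predict, rows, y)
for name in sorted(importance, key=importance.get, reverse=True):
    print(name, importance[name])
```

Shuffling a predictor that the model relies on inflates the error, while shuffling one it ignores leaves the error unchanged, which is why the ranking separates informative from uninformative variables.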

Table 1.

List of variables used in Random Forest models.

Variable | Description | Reference
secchi | Secchi depth (m) | US EPA, 2021a
chl | Surface chlorophyll a concentrations (μg/L) | US EPA, 2021b
DO | Bottom dissolved oxygen concentrations (mg/L) | US EPA, 2021a
TOC | Sediment total organic carbon (%) | US EPA, 2021b
mERMQ | Mean Effect Range Median Quotient | US EPA, 2021a, Long et al., 1998, Long et al., 2006
metal mERMQ | Mean Effect Range Median Quotient for metals | based on Long et al., 1998, Long et al., 2006
pop | Population in the watershed | www.census.gov
roads | Road density | Trombulak and Frissell, 2000
WWTF | Number of wastewater treatment facilities in the watershed | www.epa.gov/enviro/html/pcs
Imp | Watershed impervious surface (%) | Homer et al., 2004
Dev | Watershed development (%) | Homer et al., 2004
Ag | Agricultural land in the watershed (%) | Homer et al., 2004
Steep Ag | Agricultural land on steep slopes in the watershed (%) | Homer et al., 2004, O’Neill et al., 1997
Forest | Forested land in the watershed (%) | Homer et al., 2004
marsh | Salt marsh in the watershed (%) | Homer et al., 2004, Muñoz et al., 2019

Table 2.

Numbers of estuaries and stations in each Marine Ecoregions of the World (MEOW) Ecoregion.

Abbreviated name | Full name | Number of estuaries | Number of stations
Gulf of ME | Gulf of Maine/Bay of Fundy | 34 | 564
Virginian | Virginian | 62 | 1639
Carolinian | Carolinian | 35 | 784
Floridian | Floridian | 9 | 262
N. Gulf of MX | Northern Gulf of Mexico | 29 | 1570
S. CA Bight | Southern California Bight | 8 | 124
N. CA | Northern California | 11 | 219
OR, WA | Oregon, Washington, Vancouver Coast and Shelf | 22 | 379
Puget | Puget Trough/Georgia Basin | 9 | 133

The most important variables identified by the RF technique were assessed to determine how well they were related to M-AMBI. Each variable was correlated (Spearman's rho) with M-AMBI, and variables with a significant correlation were further assessed with Kruskal-Wallis tests to determine whether they were able to distinguish between the five M-AMBI condition classes. All analyses were conducted using SPSS v24, which was also used to produce boxplots of the significant relationships (p < 0.05). The variables identified as significant by both the correlation and the nonparametric analysis of variance tests were used in a separate RF to create reduced RF models. Overall variance explained by the reduced RF was compared to the variance explained by the full model using all variables. A further subset of these variables (those with the highest importance in the RF models as determined by %IncMSE) was combined and a secondary RF was conducted. As with the initial reduced model, the overall variance explained was compared to the overall variance explained in the full model. The reduced model that was able to explain a majority of the variance of the full model was assumed to include the variables primarily impacting benthic communities.
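For readers unfamiliar with the rank correlation used in this screening step, Spearman's rho can be sketched as the Pearson correlation of ranks, with tied values sharing their average rank. The study itself used SPSS; the function names and the TOC and M-AMBI values below are hypothetical and serve only to illustrate the calculation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical stations: sediment TOC (%) and M-AMBI scores.
toc = [1.0, 2.0, 3.0, 4.0, 5.0]
m_ambi = [0.9, 0.7, 0.8, 0.4, 0.2]
rho = spearman_rho(toc, m_ambi)
print(round(rho, 3))
```

A negative rho here would correspond to the pattern reported for TOC, with higher organic carbon associated with lower M-AMBI.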

3. Results

Based on M-AMBI scores, the largest share (48.8 %) of estuarine sites across all U.S. estuaries was in Good condition, while there were few sites in Bad (2.4 %) or Poor (9.9 %) condition. The remaining sites were in Moderate (21.8 %) or High (17.1 %) condition. The nationwide random forest model relating stressors (Table 1) to M-AMBI was able to explain 36.3 % of the variation in benthic condition. The top five variables explaining benthic condition were secchi depth, metal mERMQ, watershed agriculture (Ag), TOC, and salt marsh (Fig. 2). All five variables were significantly related to M-AMBI (α = 0.05). Secchi depth and salt marsh were positively correlated with M-AMBI, while metal mERMQ, Ag, and TOC were negatively correlated with M-AMBI (Table 3, Fig. S2). There were significant differences (α = 0.05) among M-AMBI categories for all five variables (Table 3, Table S1, Fig. S2). A random forest model using these five variables was able to explain the majority (90 %) of the variance explained by the model with all variables, while a further reduced model excluding the salt marsh variable explained 77 % of the variance explained by the model with all variables.

Fig. 2.

Importance of predictor variables from the national random forest model predicting M-AMBI at 5674 sites from U.S. estuaries sampled during the summer. %IncMSE (% increase in prediction error) measures the mean decrease in model accuracy when the values of a given variable are permuted. This model explained 36.33 % of the variance in M-AMBI.

Table 3.

Estuary and watershed variables identified as having the highest importance in the Random Forest Analysis. Variables significantly correlated (α = 0.05) with M-AMBI that were also able to distinguish between M-AMBI classes (Kruskal-Wallis, α = 0.05) are enclosed within solid boxes.

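Variable | MEOW Ecoregion | Spearman’s ρ | Sig. (2-tailed) | N
Agriculture | National | −0.150 | p<0.005 | 5674
Agriculture | Gulf of ME | −0.282 | p<0.005 | 564
Agriculture | Virginian | −0.138 | p<0.005 | 1639
Agriculture | Carolinian | −0.081 | p<0.023 | 784
Agriculture | Floridian | −0.273 | p<0.005 | 262
Agriculture | N. Gulf of MX | −0.279 | p<0.005 | 1570
Agriculture | N. CA | −0.454 | p<0.005 | 219
Agriculture | OR_WA | 0.183 | p<0.005 | 379
Chlorophyll a | N. Gulf of MX | −0.238** | p<0.005 | 1570
Chlorophyll a | N. CA | 0.005 | p=0.942 | 219
Development | Gulf of ME | −0.069 | p=0.102 | 564
Development | S. CA Bight | 0.048 | p=0.600 | 124
Development | OR_WA | 0.148 | p=0.004 | 379
Development | Puget | 0.294 | p=0.001 | 133
Dissolved oxygen | N. CA | −0.127 | p=0.060 | 219
Dissolved oxygen | OR_WA | 0.196 | p<0.005 | 379
Forest | Gulf of ME | −0.027 | p=0.526 | 564
Forest | Virginian | 0.098 | p<0.005 | 1639
Forest | Carolinian | −0.082 | p=0.021 | 784
Forest | S. CA Bight | −0.021 | p=0.814 | 124
Forest | OR_WA | −0.243 | p<0.005 | 379
Forest | Puget | −0.369 | p<0.005 | 133
Impervious | S. CA Bight | 0.037 | p=0.683 | 124
Impervious | Puget | 0.292 | p=0.001 | 133
Salt marsh | National | 0.056 | p<0.005 | 5674
Salt marsh | Gulf of ME | 0.050 | p=0.240 | 564
Salt marsh | Virginian | 0.020 | p=0.419 | 1639
Salt marsh | Carolinian | 0.111 | p=0.002 | 784
Salt marsh | Floridian | 0.213 | p=0.001 | 262
Salt marsh | S. CA Bight | −0.387 | p<0.005 | 124
metal mERMQ | National | −0.281 | p<0.005 | 5673
metal mERMQ | Gulf of ME | 0.262 | p<0.005 | 564
metal mERMQ | Virginian | 0.272 | p<0.005 | 1639
metal mERMQ | Carolinian | −0.242 | p<0.005 | 783
metal mERMQ | Floridian | 0.175 | p=0.004 | 262
metal mERMQ | N. Gulf of MX | −0.338 | p<0.005 | 1570
metal mERMQ | S. CA Bight | −0.415 | p<0.005 | 124
metal mERMQ | N. CA | −0.449 | p<0.005 | 219
metal mERMQ | OR_WA | −0.165 | p=0.005 | 379
metal mERMQ | Puget | −0.436 | p<0.005 | 133
mERMQ | Gulf of ME | −0.274 | p<0.005 | 564
mERMQ | Virginian | −0.251 | p<0.005 | 1639
mERMQ | Carolinian | −0.245 | p<0.005 | 783
mERMQ | Floridian | −0.143 | p=0.020 | 262
mERMQ | N. Gulf of MX | −0.34 | p<0.005 | 1570
mERMQ | S. CA Bight | −0.153 | p=0.090 | 124
mERMQ | N. CA | −0.416 | p<0.005 | 219
mERMQ | Puget | −0.384 | p<0.005 | 133
Population | N. CA | −0.475 | p<0.005 | 219
Road density | (no entries) | | |
Secchi depth | National | 0.290 | p<0.005 | 5674
Secchi depth | Gulf of ME | 0.502 | p<0.005 | 564
Secchi depth | Virginian | 0.275 | p<0.005 | 1639
Secchi depth | N. Gulf of MX | 0.222 | p<0.005 | 1570
Secchi depth | N. CA | 0.317 | p<0.005 | 219
Secchi depth | OR_WA | 0.131 | p=0.010 | 379
Ag on Steep Slopes | S. CA Bight | 0.088 | p=0.330 | 124
Total Org Carbon | National | −0.222 | p<0.005 | 5673
Total Org Carbon | Gulf of ME | −0.333 | p<0.005 | 564
Total Org Carbon | Virginian | −0.311 | p<0.005 | 1639
Total Org Carbon | Carolinian | −0.286 | p<0.005 | 783
Total Org Carbon | Floridian | −0.052 | p=0.399 | 262
Total Org Carbon | N. CA | −0.326 | p<0.005 | 219
Total Org Carbon | OR_WA | −0.212 | p<0.005 | 379
Total Org Carbon | Puget | −0.408 | p<0.005 | 133
Number of WWTF | N. CA | −0.452 | p<0.005 | 219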

Across all ecoregions, the average amount of variance in benthic condition explained was 33 %, similar to that seen in the national model, but the amount varied from 17 % in the Floridian ecoregion to 55 % in the southern California Bight ecoregion. These endmembers were the two ecoregions with the highest overall condition. A majority of stations in the Floridian ecoregion were in High (63.7 %) or Good (29.4 %) condition, with only 2.3 % of stations in Bad or Poor condition. In the southern California Bight ecoregion, most of the stations were in Good (63.7 %) or High (29.0 %) condition, with few in Bad or Poor condition (0.8 %). Four of the variables identified as important in the national model (secchi, metal ERMQ, Ag, and TOC) were identified as top variables of importance in more than half of the ecoregion models. Salt marsh was identified as a variable of importance in only 3 ecoregional models, and mERMQ, while not identified as a top variable in the national model, was identified as a variable of importance in most regional models (Table 3).

The RF model for the Gulf of Maine/Bay of Fundy (Gulf of ME) ecoregion was able to explain 36.5 % of the variance in M-AMBI. Of the top eight variables (Fig. 3A), five variables were significantly correlated with M-AMBI and significantly different among M-AMBI classes (Table 3, Table S1). Secchi depth was positively correlated with M-AMBI, while TOC, metal ERMQ, agricultural land and mERMQ were all negatively correlated with M-AMBI. A reduced model using only these 5 variables explained slightly more variance than the model using all variables, i.e., over 100 % of the full-model variance (full model = 36.5 %, reduced model = 37.4 % variance explained). A further reduced model including only secchi, TOC and metal ERMQ explained 80 % of the variance of the model using all variables.
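The comparison between reduced and full models is a simple ratio of variance explained. As a worked example using the two Gulf of Maine values reported above (the function name is hypothetical):

```python
def percent_of_full(reduced_var_explained, full_var_explained):
    """Reduced-model variance explained as a percent of the full model's."""
    return 100.0 * reduced_var_explained / full_var_explained


# Gulf of Maine/Bay of Fundy: full model 36.5 %, reduced model 37.4 %.
gulf_of_me = percent_of_full(37.4, 36.5)
print(round(gulf_of_me, 1))
```

A ratio above 100 % is possible because removing weakly related predictors can slightly improve out-of-bag performance.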

Fig. 3.

Importance of predictor variables in random forest models predicting M-AMBI in the individual Marine Ecoregions of the World (MEOW) ecoregions. %IncMSE (% increase in prediction error) measures the mean decrease in model accuracy when the values of a given variable are permuted. A. Gulf of Maine/Bay of Fundy (N = 564, 36.5 % variance explained), B. Virginian (N = 1639, 31.5 % variance explained), C. Carolinian (N = 784, 18.4 % variance explained), D. Floridian (N = 262, 16.6 % variance explained), E. Northern Gulf of Mexico (N = 1570, 34.3 % variance explained), F. Southern California Bight (N = 124, 55.3 % variance explained), G. Northern California (N = 219, 41.4 % variance explained), H. Oregon, Washington, Vancouver Coast and Shelf (N = 379, 28.2 % variance explained), I. Puget Trough/Georgia Basin (N = 133, 36.6 % variance explained).

The RF model for the Virginian ecoregion was able to explain 31.5 % of the variance in M-AMBI. Of the top seven variables (Fig. 3B), six variables were significantly correlated with M-AMBI and significantly different among M-AMBI classes (Table 3, Table S1). Secchi depth and forested land were positively correlated with M-AMBI, while TOC, metal ERMQ, agricultural land and mERMQ were all negatively correlated with M-AMBI. A reduced model using only these 6 variables was able to explain 85 % of the variance from the model using all variables. A further reduced model including secchi, TOC and metal ERMQ explained less (50 %) of the variance of the model using all variables.

The RF model for the Carolinian ecoregion was able to explain 18.4 % of the variance in M-AMBI. Of the top six variables (Fig. 3C), five variables were significantly correlated with M-AMBI and significantly different among M-AMBI classes (Table 3, Table S1). TOC, metal ERMQ, mERMQ, and agricultural land were all negatively correlated with M-AMBI, while salt marsh was positively correlated with M-AMBI. A reduced model using only these 5 variables was able to explain 76 % of the variance from the model using all variables. The further reduced model including only TOC, metal ERMQ, and mERMQ explained little (38 %) of the variance of the model using all variables.

The RF model for the Floridian ecoregion was able to explain 16.6 % of the variance in M-AMBI. Of the top five variables (Fig. 3D), two variables were significantly correlated with M-AMBI and significantly different among M-AMBI classes (Table 3, Table S1). Both mERMQ and agricultural land were negatively correlated with M-AMBI. A reduced model using only these 2 variables explained only 46 % of the variance of the model using all variables.

The RF model for the northern Gulf of Mexico (Gulf of MX) ecoregion was able to explain 34.3 % of the variance in M-AMBI. All five of the top variables (Fig. 3E) were significantly correlated with M-AMBI and significantly different among M-AMBI classes (Table 3, Table S1). Metal ERMQ, agricultural land, mERMQ, and chlorophyll were all negatively correlated with M-AMBI, while secchi depth was positively correlated with M-AMBI. A reduced model using only these 5 variables was able to explain 74 % of the variance from the model using all variables. The further reduced model including only agricultural land, mERMQ, and metal ERMQ explained 63 % of the variance of the model using all variables.

The RF model for the southern California Bight (S. CA Bight) ecoregion was able to explain 55.3 % of the variance in M-AMBI. Of the top seven variables (Fig. 3F), two variables were significantly correlated with M-AMBI and significantly different among M-AMBI classes (Table 3, Table S1). Both salt marsh and metal ERMQ were negatively correlated with M-AMBI. A reduced model using only these 2 variables was able to explain 86 % of the variance from the model using all variables.

The RF model for the northern California (N. CA) ecoregion was able to explain 41.6 % of the variance in M-AMBI. Of the top nine variables (Fig. 3G), seven variables were significantly correlated with M-AMBI and significantly different among M-AMBI classes (Table 3, Table S1). Secchi depth was positively correlated with M-AMBI, while agricultural land, TOC, metal ERMQ, mERMQ, number of WWTF and watershed population were all negatively correlated with M-AMBI. A reduced model using only these 7 variables was able to explain 73 % of the variance of the model using all variables. The further reduced model including only agricultural land, TOC, mERMQ, and metal ERMQ explained 75 % of the variance of the model using all variables. A further reduced model including only agricultural land and secchi explained less (49 %) of the variance of the model using all variables.

The RF model for the Oregon, Washington, Vancouver Coast and Shelf (OR, WA) ecoregion was able to explain 28.2 % of the variance in M-AMBI. Of the top seven variables (Fig. 3H), six variables were significantly correlated with M-AMBI and significantly different among M-AMBI classes (Table 3, Table S1). Dissolved oxygen, secchi depth and agricultural land were positively correlated with M-AMBI, while forested land, TOC, and metal ERMQ were negatively correlated with M-AMBI. A reduced model using only these 6 variables explained 100 % of the variance from the model using all variables. A further reduced model including only forested land, DO and TOC explained 72 % of the variance of the model using all variables.

The RF model for the Puget Trough/Georgia Basin (Puget) ecoregion was able to explain 36.6 % of the variance in M-AMBI. Of the top six variables (Fig. 3I), four variables were significantly correlated with M-AMBI and significantly different among M-AMBI classes (Table 3, Table S1). Metal ERMQ, forested land, mERMQ, and TOC were all negatively correlated with M-AMBI. A reduced model using only these 4 variables was able to explain 95 % of the variance of the model using all variables. A further reduced model including only forested land, metal ERMQ and mERMQ was able to explain 96 % of the variance of the model using all variables.

4. Discussion

This study was conducted to assess the relative importance of individual stressors to observed biological condition in estuarine waters of the United States. We chose variables used as indicators by NCCA (US EPA, 2021a, US EPA, 2021b) or land use variables associated with adverse estuarine benthic condition (Dauer et al., 2000; Hale et al., 2004; King et al., 2005; Seitz et al., 2018). Some of these land use variables have also been used to predict benthic stream condition using NARS data (Hill et al., 2017). The NARS assessments of rivers (US EPA, 2020) and lakes (US EPA, 2016) assess the impact of riparian land use on benthic invertebrates; however, Pelletier et al. (2019) found that riparian and watershed land use were highly correlated for U.S. estuaries, so watershed land use was used in this study. NCCA assesses sediment contamination and eutrophication (US EPA, 2021a), both of which are expected to impact benthos. Because benthic invertebrates live in or on sediments, sediment contaminants are expected to directly impact estuarine benthos. In contrast, eutrophication impacts can often be indirect. Nutrient enrichment in estuaries can stimulate algal blooms, which can senesce and accumulate in sediments, promoting the development of hypoxia (Cloern, 2001). Although the NARS river and lake surveys indicated that total nitrogen (TN) and total phosphorus (TP) were the most important stressors impacting benthos, in estuaries nutrient impacts are modulated by factors such as tides, residence time and the biomass of suspension feeders in a given system (Cloern, 2001). In addition, TN and TP were only routinely measured in estuaries nationwide by EPA starting in 2010. Thus, for this study we selected chlorophyll a as a measure of algal blooms/production and dissolved oxygen as a measure of hypoxia. TOC was selected as a measure of sediment organic enrichment (Hyland et al., 2005).
Water clarity (secchi depth) can be reduced by high algal production but can also be influenced by other factors (Testa et al., 2019). The landscape variables (percent agriculture, percent development, number of wastewater treatment plants, watershed population) are indirect but more stable indicators of stress to benthos. We also included land use variables expected to be related to better benthic condition (percent forest, percent salt marsh) as maintenance of natural lands is assumed to protect estuaries (Pittock et al., 2015).

We chose random forest regression, a powerful non-parametric technique that aggregates multiple models to produce a robust model that can provide a measure of variable importance (Breiman, 2001; Cutler et al., 2007). As with any model, the results depend upon which variables are input into the model. This approach does not indicate that these stressors are the most important stressors globally, nor that they are the most widespread stressors. The area-weighted assessment of NCCA provides an assessment of the extent of stressor impact (the 1st NARS goal). For example, in the 2015 NCCA report, 51 % of estuarine area nationwide was in fair condition based on the eutrophication index (US EPA, 2021a). In contrast, 76 % of estuarine area nationwide was in good condition based on the sediment quality index (US EPA, 2021a), suggesting that eutrophication is a more widespread issue than sediment contamination. The random forest regression, by comparison, indicates which of the stressors selected for inclusion were more strongly related to M-AMBI (the 3rd NARS goal). Because RF is an assemblage of models, it is difficult to extract the exact relationship between the response variable, M-AMBI, and the individual variables. To address this issue, we used the initial RF, built with all variables, to identify the most important variables, and then assessed whether each variable was significantly correlated with M-AMBI and whether it differed significantly among M-AMBI classes. The variables that met both criteria were used to develop reduced models. The most parsimonious reduced model explaining a substantial proportion (>70 %) of the variance explained by the model using all variables was assumed to have identified the variables most strongly responsible for observed benthic effects. This technique allowed us to identify important stressors impacting M-AMBI at the national scale and in individual biogeographic provinces.
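The final selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `pick_reduced_model`, the tuple layout, and the tie-breaking rule (prefer the larger share when two candidates have the same number of variables) are assumptions made for the example. The two candidates and their shares are the values reported in the Results for the Puget ecoregion.

```python
def pick_reduced_model(candidates, threshold=0.70):
    """Pick the most parsimonious candidate above the threshold.

    Each candidate is (variables, share), where share is the fraction of the
    full model's explained variance that the reduced model recovers.
    Returns None when no candidate clears the threshold.
    """
    eligible = [c for c in candidates if c[1] > threshold]
    if not eligible:
        return None
    # Fewest variables first; ties are broken by the larger share
    # (the tie-break is an assumption of this sketch).
    return min(eligible, key=lambda c: (len(c[0]), -c[1]))


# Shares reported in the text for the Puget ecoregion reduced models.
puget_candidates = [
    (("metal ERMQ", "forested land", "mERMQ", "TOC"), 0.95),
    (("forested land", "metal ERMQ", "mERMQ"), 0.96),
]

best = pick_reduced_model(puget_candidates)
print(best)
```

Under this rule the three-variable Puget model is selected, because it uses fewer variables while still exceeding the 70 % criterion.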

At the national scale, most of the variance in M-AMBI was explained by secchi depth, metal mERMQ, TOC and watershed agriculture. Secchi depth is a measure of water clarity, which in estuarine waters is influenced by colored dissolved organic matter, suspended particulate matter, and chlorophyll a (Harvey et al., 2019; Testa et al., 2019). Although benthic condition was better when secchi depth was greater, this relationship was likely an indirect response to overall better water quality and/or increased flushing. Benthic condition, as measured by M-AMBI, was worse when the metal mERMQ was higher. mERMQ is an empirically derived index based on the relationship between toxicity and sediment chemistry in field collected sediments (Long et al., 1998). Although mERMQ is based on toxicity tests, studies in individual US estuaries indicate that it is related to benthic community impacts, with increasing mERMQ corresponding to decreases in pollution-sensitive taxa and increases in pollution-tolerant taxa (Long et al., 2006). Borja et al. (2015, 2019) also showed that M-AMBI was negatively correlated with both mERMQ and metals. It is not surprising that increasing organic matter (TOC) corresponded with decreasing benthic condition, given that Pearson and Rosenberg (1978) demonstrated that high carbon results in a shallowing of the oxic zone and the benthos becoming dominated by small, opportunistic species. Eventually, increasing carbon loads can lead to defaunated sediments as oxygen decreases and ammonia and sulfide increase (Hyland et al., 2005). In worldwide meta-analyses, Borja et al. (2015) found that organic matter was one of the most common individual stressors shown to be related to M-AMBI, while Borja et al. (2019) indicated that both organic enrichment and organic matter were significantly and negatively correlated with M-AMBI. 
Agricultural practices in the watershed can lead to impaired benthic communities via delivery of nutrients, pesticides and sediments to nearby waterbodies (FAO and IWMI, 2017), which can then drain into local estuaries. A classic example of this can be seen in the Gulf of Mexico, where nutrients from the agricultural heartland of the US result in a large hypoxic ‘dead zone’ in near-coastal and shelf waters (Fry et al., 2015; Jarvis et al., 2021) that adversely impacts the benthos (Rabalais et al., 2002). Thus, on a nationwide basis, US benthic communities are primarily impacted by overall water quality, metals, organic enrichment and agricultural impacts.
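For readers unfamiliar with the index, a mean ERM quotient is conventionally computed by dividing each measured contaminant concentration by its effects range-median (ERM) guideline value and averaging the resulting quotients. The sketch below illustrates that arithmetic only. The contaminant names, guideline values, and concentrations are hypothetical placeholders rather than published ERMs, and treating the metal mERMQ as the same average restricted to metals is an assumption of this example.

```python
def mean_erm_quotient(concentrations, erm_values):
    """Average of concentration / ERM over contaminants present in both dicts."""
    shared = [name for name in concentrations if name in erm_values]
    if not shared:
        raise ValueError("no contaminant has both a concentration and an ERM")
    return sum(concentrations[n] / erm_values[n] for n in shared) / len(shared)


# Hypothetical guideline values and sample concentrations (same units within
# each contaminant); these are placeholders, not published ERMs.
erm = {"metal_A": 100.0, "metal_B": 50.0, "organic_C": 20.0}
sample = {"metal_A": 50.0, "metal_B": 10.0, "organic_C": 2.0}

# Overall index across every contaminant with a guideline value.
overall = mean_erm_quotient(sample, erm)

# Metals-only index (assumed here to be the same mean over the metal subset).
metals_only = mean_erm_quotient(
    {k: v for k, v in sample.items() if k.startswith("metal_")}, erm
)
print(round(overall, 3), round(metals_only, 3))
```

Because the index is an average of ratios, one strongly elevated contaminant can be diluted by many low ones, which is one reason the overall and metals-only values can rank sites differently.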

The amount of variance explained in many of the biogeographic province models was approximately the same as that seen in the national model, but the variables identified as being most important for predicting M-AMBI reflected local conditions. Three ecoregions are found on the U.S. Atlantic coast: Gulf of ME, Virginian, and Carolinian. Both the Gulf of ME and Virginian Ecoregions are included in the Cold Temperate Northwest Atlantic Province, while the Carolinian Ecoregion is part of the Warm Temperate Northwest Atlantic Province (Spalding et al., 2007). The Gulf of Maine is characterized by deep, cold waters, which are strongly affected by the Labrador Current, and high (2–6 m) tidal heights (Holland, 1990; Roman et al., 2000). This ecoregion has low population density, with most of the human development found in the southern half of the ecoregion, which includes the Boston, Massachusetts metropolitan area. The Virginian Ecoregion is characterized by large embayments including Buzzards Bay, Massachusetts; Narragansett Bay, Rhode Island; Long Island Sound, Connecticut & New York; Delaware Bay, Delaware & New Jersey; Chesapeake Bay, Maryland & Virginia; and Albemarle-Pamlico Sound, North Carolina. This ecoregion also includes the New York metropolitan region as well as several other urban centers. The Carolinian Ecoregion is characterized by barrier island and lagoonal systems as well as extensive coastal marshes (Holland, 1990).

TOC and metals were identified as being important variables impacting M-AMBI in all three Atlantic Coast ecoregions. However, the average TOC levels were highest in the Gulf of ME Ecoregion and lowest in the Carolinian Ecoregion (Fig. S3), likely reflecting the higher processing rates in warmer waters. Metals were lowest in the Carolinian Ecoregion (Fig. S3), which likely reflects the lack of a major coastal metropolitan area in this region. Secchi depth was positively related to M-AMBI in both the Gulf of ME and Virginian Ecoregions, but secchi depth was higher in the Gulf of ME (Fig. S3), which is not surprising given the higher energy and flushing in this area. M-AMBI was also negatively impacted by watershed agriculture and overall sediment contaminants (mERMQ). Like sediment metals, overall sediment contamination was lower in the Carolinian Ecoregion (Fig. S3). Finally, M-AMBI was positively related to natural features – forested land in the Virginian Ecoregion and salt marsh in the Carolinian Ecoregion. Forested lands act to maintain water quality and quantity by regulating water flow and processing nutrients and contaminants in forest soils (Neary et al., 2009), while marshes can act as a ‘filter,’ intercepting water from land. Nutrients can be taken up by marsh plants or stored or processed in marsh sediments (Nelson and Zavaleta, 2012), which can help to protect estuarine water quality.

The Floridian Ecoregion is part of the Tropical Northwestern Atlantic Province (Spalding et al., 2007) and covers south Florida, including Biscayne Bay, the Florida Keys and Florida Bay. Most stations in this ecoregion were classified as High based on M-AMBI, and over 90 % of the stations were in Good or High condition. The RF model for this ecoregion explained just over 15 % of the variance in M-AMBI. No reduced models could be produced, due to the low variance explained and the lack of a gradient in ecological condition.

The Northern Gulf of Mexico Ecoregion is part of the Warm Temperate Northwest Atlantic Province (Spalding et al., 2007) and includes most of the U.S. Gulf of Mexico coastline, with the Mississippi River discharging into the Gulf in the middle of the ecoregion. Major coastal cities within this ecoregion include Tampa, Florida and Houston, Texas. M-AMBI in this area was negatively related to metals, watershed agriculture, overall sediment contamination (mERMQ) and chlorophyll a concentration and positively related to secchi depth. This in part reflects the strong influence of the Mississippi-Atchafalaya River system, which delivers large quantities of freshwater, nutrients and suspended sediments as well as contaminants such as metals, pesticides and herbicides (Pereira et al., 1995; Walker, 1996; Shiller, 1997; Clark and Goolsby, 2000; Allison et al., 2012; White et al., 2014) to the Gulf of Mexico. The river plume generally moves westward along the Louisiana-Texas Shelf but can episodically move eastward as far as the Florida coast with high river discharge and strong easterly winds (Walker, 1996; da Silva and Castelao, 2018). The Gulf of Mexico is also known for its oil production, with oil platforms and refineries located on or off the coast of all states except Florida (Ruble, 2019). Contaminant loads from upstream sources via river discharge, petroleum spills, and local sources can adversely impact benthos, which is reflected in lower M-AMBI values. Agricultural inputs, both local and delivered by the Mississippi River, can provide nutrients that fuel productivity and higher chlorophyll concentrations. As these blooms senesce, they can sink to the bottom, which can increase oxygen demand and decrease benthic condition (Bricker et al., 1999). Nutrients discharged to shelf waters have also been shown to be upwelled to coastal waters, fueling local production and hypoxia (Jarvis et al., 2021). 
In contrast, better conditions (higher M-AMBI values) were seen in areas with clear water, without large anthropogenic inputs.

Four ecoregions are found on the U.S. Pacific Coast: S. CA Bight; N. CA; OR, WA; and Puget. The S. CA Bight Ecoregion is included within the Warm Temperate Northeast Pacific Province, while the N. CA, the OR, WA and the Puget Ecoregions are part of the Cold Temperate Northeast Pacific Province (Spalding et al., 2007). While the majority of the west coast of the US is influenced by upwelling, in the southern CA Bight, weaker winds and the presence of islands result in weaker upwelling and the presence of off-shore eddies (https://www.csulb.edu/geological-sciences/southern-california-bight-oceanography/circulation#:~:text=Identification%20of%20Eddies%20in%20the%20Southern%20California%20Bight&text=Eddies%20are%20more%20or%20lesstopographic%2C%20tidal%20and%20wind%20forcing, accessed 7/20/22). The S. CA Bight includes the metropolitan areas of Los Angeles, California and San Diego, California, and is one of the most highly populated coastal areas in the US (Stein and Cadien, 2009). The N. CA Ecoregion encompasses much of the State of California. Many of the estuaries in this ecoregion are small and not sampled by NCCA. However, this region also includes San Francisco Bay and its metropolitan area, where the majority of stations in this ecoregion were sampled. The OR, WA Ecoregion includes the northern fifth of California, all of Oregon State and the coastal areas of Washington State, including the Columbia River. This ecoregion also has the lowest population density of all the U.S. Pacific Coast ecoregions. The Puget Sound Ecoregion encompasses the Salish Sea: Puget Sound, the Juan de Fuca Strait, the Strait of Georgia and connecting waters. The metropolitan areas of Tacoma, Washington and Seattle, Washington are located in this ecoregion.

Overall sediment contamination (mERMQ) was an important variable negatively impacting M-AMBI in all Pacific coast ecoregions except the OR, WA Ecoregion. Metals (metal mERMQ) were also important to M-AMBI in the N. CA and Puget Ecoregions. These sediment contaminants likely reflect the influence of the surrounding metropolitan areas in these ecoregions; the Puget Ecoregion had lower concentrations than the S. CA Bight and N. CA Ecoregions (Fig. S4), reflecting development and population pressure. The S. CA Bight Ecoregion had higher watershed development than did the Puget Ecoregion, and the N. CA Ecoregion had higher watershed population. Organic matter (TOC) was an important variable in the N. CA and OR, WA Ecoregions, although TOC was higher in the N. CA Ecoregion than in the OR, WA Ecoregion (Fig. S4), likely reflecting differences in sediment grain size (Pelletier et al., 2011). In the OR, WA Ecoregion, sediments were sandy, while in the N. CA Ecoregion (San Francisco Bay), most sediments were muddy. Watershed agriculture was shown to be an important variable related to M-AMBI in the N. CA Ecoregion, which had the highest percentage of watershed agriculture of all the Pacific ecoregions. Dissolved oxygen was identified as an important variable in the OR, WA Ecoregion. It is unknown whether the hypoxia in this ecoregion is due to upwelling impacts, local sources, or both.

A surprising result of this study was the negative relationships between M-AMBI and salt marshes in the S. CA Bight Ecoregion, and between M-AMBI and forests in the OR, WA and Puget Ecoregions, rather than the positive ‘filtering’ relationship that we expected. However, it is known that under certain conditions, salt marshes can export nutrients (Odum, 2000) and pathogens (Jeong et al., 2008). It is further understood that salt marshes can be adversely impacted by urbanization (Lee et al., 2006), and that nutrient loading can impair salt marsh function (Krause et al., 2020). The S. CA Bight Ecoregion has a low amount of salt marsh relative to the amount of development, so it is likely that the urban influences can surpass the salt marsh carrying capacity, resulting in the release of contaminants and nutrients that can ultimately impact estuarine benthos. In the OR, WA and Puget Ecoregions the negative relationship between M-AMBI and forest is likely due to the influence of Red Alder, Alnus rubra, an important hardwood in the U.S. Pacific Northwest (US Forest Service, 2006). Red Alder has been shown to be a significant source of nutrients to both freshwater streams (Greathouse et al., 2014) and estuaries (Detenbeck et al., 2019) in the U.S. Pacific Northwest. In fact, alders provided a median of 71 % and up to 93 % of the nitrogen load to estuaries in the OR, WA Ecoregion and a median of 19 % and up to 40 % of the nitrogen load to estuaries in the Puget Ecoregion (Detenbeck et al., 2019). This suggests that the negative relationship observed between forest and M-AMBI in these two ecoregions is due to forests acting as a source rather than a sink for nutrients.

This study examined broad-scale relationships between M-AMBI and stressors in U.S. estuaries, focusing on NCCA stressors. As such, it does not address all possible stressors impacting estuarine benthos, nor does it address specific local processes. For example, this study did not address physical disturbance such as that from dredging (Dauvin et al., 2006; Hinchey et al., 2006), hydraulic alteration (Baeta et al., 2011; Van Diggelen and Montagna, 2016) and climate impacts (Scavia et al., 2002; Birchenough et al., 2015). Assuming sufficient temporal resolution, climate or hydraulic impacts may be incorporated using climate indices such as the North Atlantic Oscillation (NAO), or some measure of deviation from mean or historic conditions. Other disturbances may be more localized, and the impact of disturbance may be modulated by nearby populations (Cowie et al., 2000). Local habitat can act to decrease the impact of chemical pollution (Pitacco et al., 2021). Whether a given stressor should be included in an assessment such as this will depend upon data availability, the extent of the study (e.g., local vs. continental) and the goals of a given study. Even when metrics are available for individual stressors, they are not all-inclusive. For example, mERMQ is an index that provides a measure of when adverse impacts might be expected due to sediment contaminants. However, this metric is based on 23 legacy contaminants, including metals, PAHs, PCBs and pesticides. It does not include all possible contaminants, such as pharmaceuticals and personal care products, microplastics or nanoparticles. However, some of the unmeasured contaminants are likely correlated with those included in the mERMQ calculation. mERMQ also does not account for whether the sediment contaminants are bioavailable, and therefore able to cause benthic impairment. TOC, while shown to be linked to benthic impairment, is also influenced by hydrodynamics and sediment grain size. 
Watershed measures of land use do not account for best management practices or, as in the U.S. Pacific Northwest, species differences. To understand local processes, specific local studies will be needed, and the specific stressors targeted will vary based on local management concerns. However, our study provided a framework to allow association of estuary stressors to benthic condition, both in the U.S. and in other countries. Despite a relatively low percentage of poor sites, we were able to develop relationships between M-AMBI and land use and estuarine stressors for the entire country and for all ecoregions except one. The iterative and nonlinear nature of the random forest technique allowed all relationships to be rigorously examined, while the later screening of data allowed us to select the variables that were most directly related to M-AMBI. Development of multiple reduced models further refined our ability to identify the most important variables both nationally and in specific ecoregions. Finally, by comparing each variable identified as being statistically important to information about local patterns and processes, we were able to assess the ecological plausibility of the statistical results. Our study suggests that this simple technique can be applied to other nations and estuaries.

Supplementary Material

Supplement1

Acknowledgements

We would like to thank the NCA and NCCA field crews and laboratory analysts for providing the data used in this study and C. Wigand, J. Paul, L. Smith, and S. Paulsen, and two anonymous reviewers for helpful comments. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The views expressed in this manuscript are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency. Any mention of trade names, products, or services does not imply an endorsement by the U.S. Government or the U.S. Environmental Protection Agency. The EPA does not endorse any commercial products, services, or enterprises. This is ORD-049693.

Footnotes

CRediT authorship contribution statement

Marguerite C. Pelletier: Conceptualization, Methodology, Investigation, Formal analysis, Writing – original draft, Writing – review & editing, Visualization. Michael Charpentier: Investigation, Visualization, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data used were publicly available US EPA data.

References

  1. Adl SM, Leander BS, Simpson AGB, Archibald JM, Anderson OR, Bass D, Bowser SS, Brugerolle G, Farmer MA, Karpov S, Kolisko M, Lane CE, Lodge DJ, Mann DG, Meisterfeld R, Mendoza L, Moestrup Ø, Mozley- Standridge SE, Smirnov AV, Spiegel F, Collins T, Sullivan J, 2007. Diversity, nomenclature, and taxonomy of protists. Syst. Biol 56, 684–689. [DOI] [PubMed] [Google Scholar]
  2. Agrawal AA, Hastings AP, Johnson MTJ, Maron JL, Salminen J-P, 2012. Insect herbivores drive real-time ecological and evolutionary change in plant populations. Science 338, 113–116. [DOI] [PubMed] [Google Scholar]
  3. Alverson AJ, 2008. Molecular systematics and the diatom species. Protist 159, 339–353. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Amin SA, Parker MS, Armbrust EV, 2012. Interactions between diatoms and bacteria. Microbiol. Mol. Biol. Rev 76, 667–684. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Anderson CR, Kudela RM, Kahru M, Chao Y, Rosenfeld LK, Bahr FL, Anderson DM, Norris TA, 2016. Initial skill assessment of the California harmful algae risk mapping (C-HARM) system. Harmful Algae 59, 1–18. [DOI] [PubMed] [Google Scholar]
  6. Anderson CR, Siegel DA, Kudela RM, Brzezinski MA, 2009. Empirical models of toxigenic Pseudo-nitzschia blooms: potential use as a remote detection tool in the Santa Barbara Channel. Harmful Algae 8, 478–492. [Google Scholar]
  7. Anderson DM, Burkholder JM, Cochlan WP, Glibert PM, Gobler CJ, Heil CA, Kudela RM, Parsons ML, Rensel JEJ, Townsend DW, Trainer VL, Vargo GA, 2008. Harmful algal blooms and eutrophication: examining linkages from selected coastal regions of the United States. Harmful Algae 8, 39–53. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Assaf G, Hannon GJ, 2010. FASTX-toolkit. FASTX-Toolkit. [Google Scholar]
  9. Bates SS, Douglas DJ, Doucette GJ, Léger C, 1995. Enhancement of domoic acid production by reintroducing bacteria to axenic cultures of the diatom Pseudo-nitzschia multiseries. Nat. Toxins 3, 428–435. [DOI] [PubMed] [Google Scholar]
  10. Bates SS, Hubbard KA, Lundholm N, Montresor M, Leaw CP, 2018. Pseudo-nitzschia, Nitzschia, and domoic acid: new research since 2011. Harmful Algae 79, 3–43. [DOI] [PubMed] [Google Scholar]
  11. Bejarano AC, VanDola FM, Gulland FM, Rowles TK, Schwacke LH, 2008. Production and toxicity of the marine biotoxin domoic acid and its effects on wildlife: a review. Hum. Ecol. Risk Assess 14, 544–567. [Google Scholar]
  12. Bendall ML, Stevens SL, Chan L-K, Malfatti S, Schwientek P, Tremblay J, Schackwitz W, Martin J, Pati A, Bushnell B, Froula J, Kang D, Tringe SG, Bertilsson S, Moran MA, Shade A, Newton RJ, McMahon KD, Malmstrom RR, 2016. Genome-wide selective sweeps and gene-specific sweeps in natural bacterial populations. ISME J. 10, 1589–1601. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Bornet B, Antoine E, Françoise S, Baut CM, 2005. Development of sequence characterized amplified region markers from intersimple sequence repeat fingerprints for the molecular detection of toxic phytoplankton alexandrium catenella (dinophyceae) and pseudo-nitzschia pseudodelicatissima (bacillariophyceae) from french coastal waters1. J. Phycol 41, 704–711. [Google Scholar]
  14. Bruin A.de, Ibelings BW, Van Donk E, 2003. Molecular techniques in phytoplankton research: from allozyme electrophoresis to genomics. Hydrobiologia 491, 47–63. [Google Scholar]
  15. Bushnell B, 2014. BBMap: A Fast, Accurate, Splice-Aware Aligner (No. LBNL-7065E). Lawrence Berkeley National Lab. (LBNL), Berkeley, CA United States. [Google Scholar]
  16. Carlson MCG, McCary ND, Leach TS, Rocap G, 2016. Pseudo-nitzschia challenged with Co-occurring Viral communities display diverse infection phenotypes. Front. Microbiol 7, 527. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Casteleyn G, Leliaert F, Backeljau T, Debeer A-E, Kotaki Y, Rhodes L, Lundholm N, Sabbe K, Vyverman W, 2010. Limits to gene flow in a cosmopolitan marine planktonic diatom. Proc. Natl. Acad. Sci. U. S. A 107, 12952–12957. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Clark S, Hubbard KA, Anderson DM, McGillicuddy DJ, Ralston DK, Townsend DW, 2019. Pseudo-nitzschia bloom dynamics in the Gulf of Maine: 2012–2016. Harmful Algae 88, 101656. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Cochlan WP, Herndon J, Kudela RM, 2008. Inorganic and organic nitrogen uptake by the toxigenic diatom Pseudo-nitzschia australis (Bacillariophyceae). Harmful Algae 8, 111–118. [Google Scholar]
  20. Countway PD, Gast RJ, Savai P, Caron DA, 2005. Protistan diversity estimates based on 18S rDNA from seawater incubations in the Western North Atlantic. J. Eukaryot. Microbiol 52, 95–106. [DOI] [PubMed] [Google Scholar]
  21. Cuvelier ML, Allen AE, Monier A, McCrow JP, Messié M, Tringe SG, Woyke T, Welsh RM, Ishoey T, Lee J-H, Binder BJ, DuPont CL, Latasa M, Guigand C, Buck KR, Hilton J, Thiagarajan M, Caler E, Read B, Lasken RS, Chavez FP, Worden AZ, 2010. Targeted metagenomics and ecology of globally important uncultured eukaryotic phytoplankton. Proc. Natl. Acad. Sci. U. S. A 107, 14679–14684. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Davey JW, Blaxter ML, 2010. RADSeq: next-generation population genetics. Brief. Funct. Genomics 9, 416–423. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Evans KM, Bates SS, Medlin LK, Hayes PK, 2004. Microsatellite marker development and genetic variation in the toxic marine diatom pseudo-Nitzschia multiseries (Bacillariophyceae)1. J. Phycol 40, 911–920. [Google Scholar]
  24. Evans KM, Hayes PK, 2004. Microsatellite markers for the cosmopolitan marine diatom Pseudo-nitzschia pungens. Mol. Ecol. Notes 4, 125–126. [Google Scholar]
  25. Fehling J, Davidson K, Bolch CJ, Bates SS, 2004. Growth and domoic acid production by pseudo-nitzschia seriata (bacillariophyceae) under phosphate and silicate limitation. J. Phycol 40, 674–683. [Google Scholar]
  26. Fernandes LF, Hubbard KA, Richlen ML, Smith J, Bates SS, Ehrman J, Léger C, Mafra LL, Kulis D, Quilliam M, Libera K, McCauley L, Anderson DM, 2014. Diversity and toxicity of the diatom Pseudo-nitzschia Peragallo in the Gulf of Maine, Northwestern Atlantic Ocean. Deep Sea Res. Part 2 Top. Stud. Oceanogr 103, 139–162. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Fu L, Niu B, Zhu Z, Wu S, Li W, 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Gawad C, Koh W, Quake SR, 2016. Single-cell genome sequencing: current state of the science. Nat. Rev. Genet 17, 175–188. [DOI] [PubMed] [Google Scholar]
  29. Godhe A, Härnström K, 2010. Linking the planktonic and benthic habitat: genetic structure of the marine diatom Skeletonema marinoi. Mol. Ecol 19, 4478–4490. [DOI] [PubMed] [Google Scholar]
  30. Graham ED, Heidelberg JF, Tully BJ, 2017. BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation. PeerJ 5, e3035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Guannel ML, Haring D, Twiner MJ, Wang Z, Noble AE, Lee PA, Saito MA, Rocap G, 2015. Toxigenicity and biogeography of the diatom Pseudo-nitzschia across distinct environmental regimes in the South Atlantic Ocean. Mar. Ecol. Prog. Ser 526, 67–87. [Google Scholar]
  32. Hasle GR, Syvertsen EE, 1997. Chapter 2 - marine diatoms. In: Tomas CR (Ed.), Identifying Marine Phytoplankton. Academic Press, San Diego, pp. 5–385. [Google Scholar]
  33. Heisler J, Glibert P, Burkholder J, Anderson D, Cochlan W, Dennison W, Gobler C, Dortch Q, Heil C, Humphries E, Lewitus A, Magnien R, Marshall H, Sellner K, Stockwell D, Stoecker D, Suddleson M, 2008. Eutrophication and harmful algal blooms: a scientific consensus. Harmful Algae 8, 3–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Hohenlohe PA, Bassham S, Etter PD, Stiffler N, Johnson EA, Cresko WA, 2010. Population genomics of parallel adaptation in Threespine stickleback using sequenced RAD tags. PLos Genet. 6, e1000862. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Hong Z, Lai Q, Luo Q, Jiang S, Zhu R, Liang J, Gao Y, 2015. Sulfitobacter pseudonitzschiae sp. nov., isolated from the toxic marine diatom Pseudo-nitzschia multiseries. Int. J. Syst. Evol. Microbiol 65, 95–100. [DOI] [PubMed] [Google Scholar]
  36. Howard MDA, Cochlan WP, Ladizinsky N, Kudela RM, 2007. Nitrogenous preference of toxigenic Pseudo-nitzschia australis (Bacillariophyceae) from field and laboratory experiments. Harmful Algae 6, 206–217. [Google Scholar]
  37. Hubbard KA, Olson CH, Armbrust EV, 2014. Molecular characterization of Pseudo-nitzschia community structure and species ecology in a hydrographically complex estuarine system (Puget Sound, Washington, USA). Mar. Ecol. Prog. Ser 507, 39–55. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Hubbard KA, Rocap G, Armbrust VE, 2008. Inter- and intraspecific community structure within the diatom genus pseudo-nitzschia (bacillariophyceae). J. Phycol 44, 637–649. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Johansson ON, Pinder MIM, Ohlsson F, Egardt J, Töpel M, Clarke AK, 2019. Friends with benefits: exploring the Phycosphere of the marine diatom Skeletonema marinoi. Front. Microbiol 10, 1828. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Kofler R, Pandey RV, Schlötterer C, 2011. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27, 3435–3436. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Kudela RM, Bickel A, Carter ML, Howard MDA, Rosenfeld L, 2015. Chapter 5 - the monitoring of harmful algal blooms through ocean observing: the development of the California Harmful Algal bloom monitoring and alert program. In: Liu Y, Kerkering H, Weisberg RH (Eds.), Coastal Ocean Observing Systems. Academic Press, Boston, pp. 58–75. [Google Scholar]
  42. Kudela RM, Cochlan WP, Roberts A, 2002. Spatial and Temporal Patterns of Pseudo-nitzschia species in Central California related to Regional oceanography, in: Harmful Algae. Florida Fish and Wildlife Conservation Commission, Florida Institute of Oceanography, and Intergovernmental Oceanographic Commission of UNESCO, St. Petersburg, FL.
  43. Kudela RM, Lane JQ, Cochlan WP, 2008. The potential role of anthropogenically derived nitrogen in the growth of harmful algae in California, USA. Harmful Algae 8, 103–110.
  44. Kvitek RG, Goldberg JD, Smith GJ, Doucette GJ, Silver MW, 2008. Domoic acid contamination within eight representative species from the benthic food web of Monterey Bay, California, USA. Mar. Ecol. Prog. Ser 367, 35–47.
  45. Langlois G, Zubkousky-White V, Christen J, Rankin S, 2014. Marine Biotoxin Monitoring Program Annual Report. California Department of Public Health.
  46. Langmead B, Salzberg SL, 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359.
  47. Lelong A, Hégaret H, Soudant P, Bates SS, 2012. Pseudo-nitzschia (Bacillariophyceae) species, domoic acid and amnesic shellfish poisoning: revisiting previous paradigms. Phycologia 51, 168–216.
  48. Lim HC, Lim PT, Teng ST, Bates SS, Leaw CP, 2014. Genetic structure of Pseudo-nitzschia pungens (Bacillariophyceae) populations: implications of a global diversification of the diatom. Harmful Algae 37, 142–152.
  49. Lim HC, Tan SN, Teng ST, Lundholm N, Orive E, David H, Quijano-Scheggia S, Leong SCY, Wolf M, Bates SS, Lim PT, Leaw CP, 2018. Phylogeny and species delineation in the marine diatom Pseudo-nitzschia (Bacillariophyta) using cox1, LSU, and ITS2 rRNA genes: a perspective in character evolution. J. Phycol 54, 234–248.
  50. Li W, Godzik A, 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659.
  51. Lynch M, Milligan BG, 1994. Analysis of population genetic structure with RAPD markers. Mol. Ecol 3, 91–99.
  52. Maldonado MT, Hughes MP, Rue EL, Wells ML, 2002. The effect of Fe and Cu on growth and domoic acid production by Pseudo-nitzschia multiseries and Pseudo-nitzschia australis. Limnol. Oceanogr 47, 515–526.
  53. Manhart JR, Fryxell GA, Villac MC, Segura LY, 1995. Pseudo-nitzschia pungens and P. multiseries (Bacillariophyceae): nuclear ribosomal DNAs and species differences. J. Phycol 31, 421–427.
  54. Marie D, Le Gall F, Edern R, Gourvil P, Vaulot D, 2017. Improvement of phytoplankton culture isolation using single cell sorting by flow cytometry. J. Phycol 53, 271–282.
  55. Matz MV, 2018. Fantastic beasts and how to sequence them: ecological genomics for obscure model organisms. Trends Genet. 34, 121–132.
  56. Moore SK, Dreyer SJ, Ekstrom JA, Moore K, Norman K, Klinger T, Allison EH, Jardine SL, 2020. Harmful algal blooms and coastal communities: socioeconomic impacts and actions taken to cope with the 2015 U.S. West Coast domoic acid event. Harmful Algae 96, 101799.
  57. Moriarty ME, Tinker MT, Miller MA, Tomoleoni JA, Staedler MM, Fujii JA, Batac FI, Dodd EM, Kudela RM, Zubkousky-White V, Johnson CK, 2021. Exposure to domoic acid is an ecological driver of cardiac disease in southern sea otters. Harmful Algae 101, 101973.
  58. Orsini L, Procaccini G, Sarno D, Montresor M, 2004. Multiple rDNA ITS-types within the diatom Pseudo-nitzschia delicatissima (Bacillariophyceae) and their relative abundances across a spring bloom in the Gulf of Naples. Mar. Ecol. Prog. Ser 271, 87–98.
  59. Osuna-Cruz CM, Bilcke G, Vancaester E, De Decker S, Bones AM, Winge P, Poulsen N, Bulankova P, Verhelst B, Audoor S, Belisova D, Pargana A, Russo M, Stock F, Cirri E, Brembu T, Pohnert G, Piganeau G, Ferrante MI, Mock T, Sterck L, Sabbe K, De Veylder L, Vyverman W, Vandepoele K, 2020. The Seminavis robusta genome provides insights into the evolutionary adaptations of benthic diatoms. Nat. Commun 11, 3320.
  60. Parsons ML, Dortch Q, 2002. Sedimentological evidence of an increase in Pseudo-nitzschia (Bacillariophyceae) abundance in response to coastal eutrophication. Limnol. Oceanogr 47, 551–558.
  61. Prince EK, Irmer F, Pohnert G, 2013. Domoic acid improves the competitive ability of Pseudo-nitzschia delicatissima against the diatom Skeletonema marinoi. Mar. Drugs 11, 2398–2412.
  62. Puritz JB, Matz MV, Toonen RJ, Weber JN, Bolnick DI, Bird CE, 2014. Demystifying the RAD fad. Mol. Ecol 23, 5937–5942.
  63. R Core Team, 2017. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  64. Round FE, Crawford RM, Mann DG, 1990. Diatoms: Biology and Morphology of the Genera. Cambridge University Press.
  65. Rue E, Bruland K, 2001. Domoic acid binds iron and copper: a possible role for the toxin produced by the marine diatom Pseudo-nitzschia. Mar. Chem 76, 127–134.
  66. Rynearson TA, Newton JA, Armbrust EV, 2006. Spring bloom development, genetic variation, and population succession in the planktonic diatom Ditylum brightwellii. Limnol. Oceanogr 51, 1249–1261.
  67. Salazar G, Sunagawa S, 2017. Marine microbial diversity. Curr. Biol 27, R489–R494.
  68. Sayers EW, Cavanaugh M, Clark K, Ostell J, Pruitt KD, Karsch-Mizrachi I, 2019. GenBank. Nucl. Acids Res 47, D94–D99.
  69. Schlötterer C, Tobler R, Kofler R, Nolte V, 2014. Sequencing pools of individuals — mining genome-wide polymorphism data without big funding. Nat. Rev. Genet 15, 749–763.
  70. Schnetzer A, Jones BH, Schaffner RA, Cetinic I, Fitzpatrick E, Miller PE, Seubert EL, Caron DA, 2013. Coastal upwelling linked to toxic Pseudo-nitzschia australis blooms in Los Angeles coastal waters, 2005–2007. J. Plankton Res 35, 1080–1092.
  71. Schoch CL, Ciufo S, Domrachev M, Hotton CL, Kannan S, Khovanskaya R, Leipe D, Mcveigh R, O’Neill K, Robbertse B, Sharma S, Soussov V, Sullivan JP, Sun L, Turner S, Karsch-Mizrachi I, 2020. NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database 2020. 10.1093/database/baaa062.
  72. Scholin CA, Gulland F, Doucette GJ, Benson S, Busman M, Chavez FP, Cordaro J, DeLong R, De Vogelaere A, Harvey J, Haulena M, Lefebvre K, Lipscomb T, Loscutoff S, Lowenstine LJ, Marin R 3rd, Miller PE, McLellan WA, Moeller PD, Powell CL, Rowles T, Silvagni P, Silver M, Spraker T, Trainer V, Van Dolah FM, 2000. Mortality of sea lions along the central California coast linked to a toxic diatom bloom. Nature 403, 80–84.
  73. Seubert EL, Gellene AG, Howard MDA, Connell P, Ragan M, Jones BH, Runyan J, Caron DA, 2013. Seasonal and annual dynamics of harmful algae and algal toxins revealed through weekly monitoring at two coastal ocean sites off southern California, USA. Environ. Sci. Pollut. Res. Int 20, 6878–6895.
  74. Seymour JR, Amin SA, Raina J-B, Stocker R, 2017. Zooming in on the phycosphere: the ecological interface for phytoplankton–bacteria relationships. Nature Microbiol. 2, 1–12.
  75. Sison-Mangus MP, Jiang S, Tran KN, Kudela RM, 2014. Host-specific adaptation governs the interaction of the marine diatom, Pseudo-nitzschia and their microbiota. ISME J. 8, 63–76.
  76. Smith J, Connell P, Evans RH, Gellene AG, Howard MDA, Jones BH, Kaveggia S, Palmer L, Schnetzer A, Seegers BN, Seubert EL, Tatters AO, Caron DA, 2018a. A decade and a half of Pseudo-nitzschia spp. and domoic acid along the coast of southern California. Harmful Algae 79, 87–104.
  77. Smith J, Gellene AG, Hubbard KA, Bowers HA, Kudela RM, Hayashi K, Caron DA, 2018b. Pseudo-nitzschia species composition varies concurrently with domoic acid concentrations during two different bloom events in the Southern California Bight. J. Plankton Res 40, 29–45.
  78. Tatters AO, Fu F-X, Hutchins DA, 2012. High CO2 and silicate limitation synergistically increase the toxicity of Pseudo-nitzschia fraudulenta. PLoS ONE 7, e32116.
  79. Töpel M, Pinder MIM, Johansson ON, Kourtchenko O, Godhe A, Clarke AK, 2019. Whole Genome Sequence of Marinobacter salarius Strain SMR5, Shown to Promote Growth in its Diatom Host. J. Genomics 7, 60–63.
  80. Trainer VL, Adams NG, Bill BD, Stehr CM, Wekell JC, Moeller P, Busman M, Woodruff D, 2000. Domoic acid production near California coastal upwelling zones, June 1998. Limnol. Oceanogr 45, 1818–1833.
  81. Trainer VL, Bates SS, Lundholm N, Thessen AE, Cochlan WP, Adams NG, Trick CG, 2012. Pseudo-nitzschia physiological ecology, phylogeny, toxicity, monitoring and impacts on ecosystem health. Harmful Algae 14, 271–300.
  82. Trainer VL, Hickey BM, Lessard EJ, Cochlan WP, Trick CG, Wells ML, MacFadyen A, Moore SK, 2009. Variability of Pseudo-nitzschia and domoic acid in the Juan de Fuca eddy region and its adjacent shelves. Limnol. Oceanogr 54, 289–308.
  83. Trainer VL, Pitcher GC, Reguera B, Smayda TJ, 2010. The distribution and impacts of harmful algal bloom species in eastern boundary upwelling systems. Prog. Oceanogr 85, 33–52.
  84. Villac MC, Fryxell GA, 1998. Pseudo-nitzschia pungens var. cingulata var. nov. (Bacillariophyceae) based on field and culture observations. Phycologia 37, 269–274.
  85. Wang S, Meyer E, McKay JK, Matz MV, 2012. 2b-RAD: a simple and flexible method for genome-wide genotyping. Nat. Methods 9, 808–810.
  86. Wells ML, Trick CG, Cochlan WP, Hughes MP, Trainer VL, 2005. Domoic acid: the synergy of iron, copper, and the toxicity of diatoms. Limnol. Oceanogr 50, 1908–1917.
  87. Wessells CR, Miller CJ, Brooks PM, 1995. Toxic algae contamination and demand for shellfish: a case study of demand for mussels in Montreal. Mar. Resour. Econ 10, 143–159.
  88. Whittaker KA, Rynearson TA, 2017. Evidence for environmental and ecological selection in a microbe with no geographic limits to gene flow. Proc. Natl. Acad. Sci 114, 2651–2656.
  89. Xu N, Tang YZ, Qin J, Duan S, Gobler CJ, 2015. Ability of the marine diatoms Pseudo-nitzschia multiseries and P. pungens to inhibit the growth of co-occurring phytoplankton via allelopathy. Aquat. Microb. Ecol 74, 29–41.

Associated Data

Supplementary Materials

Supplement1

Data Availability Statement

The data used were publicly available US EPA data.