Abstract
There is a pressing need to understand the impact of contaminants on Arctic ecosystems; however, most toxicity tests are based on temperate species, and there are issues with reliability and relevance of bioassays in general. Together this may result in an underestimation of harm to Arctic organisms and contribute to significant uncertainty in risk assessments. To help address these concerns, a critical review to assess reported effects for these species, quantify methodological and endpoint relevance gaps, and identify future research needs for testing was performed. We developed uniform criteria to score each study, allowing an objective comparison across experiments to quantify their reliability and relevance. We scored a total of 48 individual studies, capturing 39 tested compounds, 73 unique Arctic test species, and 95 distinct endpoints published from 1975 to 2021. Our analysis shows that of 253 test substance and species combinations scored (i.e., a unique toxicity test), 207 (82%) failed to meet at least one critical study criterion that contributes to data reliability for use in risk assessment. Arctic‐focused toxicity testing needs to ensure that exposures can be analytically confirmed, include environmentally realistic exposure scenarios, and report test methods more thoroughly. Significant data gaps were identified as related to standardized toxicity testing with Arctic species, diversity of compounds tested with these organisms, and the inclusion of ecologically relevant sublethal and chronic endpoints assessed in Arctic toxicity testing. Overall, there needs to be ongoing improvement in test conduction and reporting in the scientific literature to support effective risk assessments in an Arctic context. Environ Toxicol Chem 2022;41:46–72. © 2021 The Authors. Environmental Toxicology and Chemistry published by Wiley Periodicals LLC on behalf of SETAC.
Keywords: Bioassays, Strength of methods, Marine toxicity tests, Aquatic toxicology, Reliability, Ecological risk assessment
INTRODUCTION
Environmental risk assessors and others who rely on the ecotoxicological peer‐reviewed literature to aid in regulatory decision‐making have identified data reliability and reporting as a critical issue preventing data uptake (see Borgert et al., 2016; Hanson et al., 2017; Harris & Sumpter, 2015; Thoré et al., 2019). In the present study, reliability refers to the completeness of reported methodologies—specifically, whether the experimental design is sufficiently described to ensure reproducibility of the method and accuracy of the results (Ågerstrand, Kuster, et al., 2011; Moermond et al., 2016), with the specific purpose of informing environmental risk assessment (ERA). Well‐designed studies with relevant species and endpoints contribute to the effectiveness of the ERA process (Ågerstrand, Breitholtz, & Rudén, 2011; Ågerstrand, Kuster, et al., 2011; Küster et al., 2009). High‐quality data and effective reporting reduce the uncertainty around the potential for effects and give risk assessors greater confidence in their decisions and recommendations. Furthermore, methodologically robust toxicity tests are vital in helping to reduce unnecessary animal testing within ecotoxicology (Burden et al., 2020; Vergauwen, 2018). Ultimately, well‐designed and ‐performed studies free up resources that could be used to perform additional studies that will further ERAs, instead of repeating studies to confirm or refute inconsistent observations (Hanson & Brain, 2021).
While many regulatory agencies have internal screening criteria for data inclusion in ERAs (e.g., Henry & Pease, 2016; US Environmental Protection Agency [USEPA], 2003), there is currently no single, standard approach used to critically evaluate the reliability of data being used in ERA. Several approaches have been developed and proposed for ad hoc screening of the peer‐reviewed literature (Ågerstrand et al., 2013; Durda & Preziosi, 2000; Hanson et al., 2019; Hobbs et al., 2005; Klimisch et al., 1997; Moermond et al., 2016; Schneider et al., 2009; Van Der Kraak et al., 2014). For example, Klimisch et al. (1997) proposed an evaluation system whereby data are sorted into one of four reliability categories for use in risk assessment, ranging from reliable without restriction to not reliable or assignable. Criteria regarding thorough reporting of methods, results, and inclusion of appropriate controls are outlined to aid in determining which category is appropriate for a particular study. This approach relies heavily on expert judgment and, thus, could be seen as relatively subjective in its application. Since then, several papers (e.g., Ågerstrand et al., 2013; Durda & Preziosi, 2000; Schneider et al., 2009) have expanded on the relatively brief descriptors outlined by Klimisch et al. (1997) and broadened the scope beyond acute studies. Schneider et al. (2009) offered an instrument called ToxRTool whereby scores are given to different groups of criteria resulting in the assignment of the study to one of the Klimisch categories. This technique helps address the criticisms of subjectivity and lack of transparency that have been identified when using Klimisch categories as a stand‐alone approach in determining data reliability (see Kase et al., 2016). Ultimately, attempts to objectively assess data quality rely on a combination of using such tools and expert judgment. 
Beyond assessing the quality of the data, a reasonable and neutral interpretation of results is crucial to ensure that ecotoxicology as a discipline maintains credibility in both the scientific and the public spheres (Brain & Hanson, 2021). Overemphasis of effects not only can have damaging implications for the reputation of the discipline but can also tie up funding and resources in addressing problems that potentially do not require resolution (Hanson & Brain, 2020).
It is important to note that there are numerous reasons why a study may not be recommended for risk‐assessment purposes, and these are not judgments that the research is wholly uninformative or does not contribute to our larger understanding of risk. Experimental design and execution could be excellent, but if the question being asked is not relevant to a risk scenario, the utility of the data for ERA is likely limited. Relevance in this context refers to the applicability of the observations to ERAs and the utility of the data in informing decision‐making (Ågerstrand, Kuster, et al., 2011; Moermond et al., 2016). Whether specific data are relevant to a risk scenario depends on the risk assessment in question, but the ecological relevance of various endpoints (i.e., mortality being more definitively tied to population‐level effects than biomarker responses) can be assessed independently of a specific risk scenario and was the focus in the present review.
Thorough documentation and transparency are needed to accurately assess the reliability of data, but these elements can be lacking in published studies, leading to difficulty in verifying the methods used and the results. Regulators often rely on studies conducted under good laboratory practices or according to standard methods (e.g., International Organization for Standardization, Organisation for Economic Co‐operation and Development, and USEPA), especially for plant protection products, because these guidelines have been developed to reduce uncertainty in toxicological testing (Borgert et al., 2016). However, it is not always possible to obtain these types of data for all compounds, and other published literature is often the source for deriving acceptable environmental values (Ågerstrand et al., 2013; Henry & Pease, 2016). This is especially true for tests and assessments for ecological risk where there is a lack of standard test development, such as the Arctic.
It has long been noted in the literature that there is a lack of testing and understanding around Arctic species' sensitivity to contaminants and that this lack of information may result in undue harm to this ecosystem (Chapman & Riddle, 2003). Of the available data, only a fraction may be reliable enough to provide accurate insight into environmental risk. This gap is in part due to the lack of standardized laboratory testing of Arctic species, as noted by others (Chapman & Riddle, 2005). The response of relatively numerous temperate aquatic species to a vast array of contaminants has been characterized (Chapman & Riddle, 2003; USEPA, 2021), but critical differences in the structure and function of Arctic and temperate ecosystems mean that extrapolating these results could be a source of significant uncertainty. For example, Arctic ecosystems lack the same functional redundancy that is present elsewhere because there may only be one or two different species that play a similar functional role, while in temperate, tropical, or boreal ecosystems, there may be tens or hundreds of species with similar functional niches (Aune et al., 2018). Therefore, assessing and supporting the generation of high‐quality Arctic ecotoxicology data is vital to Arctic ecosystem protection.
The presence of contaminants in Arctic marine ecosystems (Sonne et al., 2021) underlines the importance of generating high‐quality, Arctic‐specific toxicity data to provide accurate and robust insight for ERA in the region. Human activities in this region such as mineral extraction, hydrocarbon exploration, tourism, and shipping all pose potential threats to flora and fauna through the release of contaminants into aquatic ecosystems (Arctic Monitoring and Assessment Programme, 2017; Aulanier et al., 2017; Hsiao, 1978). For example, in 2008, the US Geological Survey estimated that up to 90 billion barrels of oil (~14 trillion L), 1669 trillion cubic feet (~47 trillion m³) of natural gas, and 44 billion barrels (~7 trillion L) of natural gas liquids remain undiscovered in the Arctic (US Department of the Interior, 2008). If and when these resources are extracted, the extraction processes will raise several contaminants of concern, namely the polycyclic aromatic hydrocarbon (PAH) compounds present in crude oil. As risk assessments move forward in the Arctic, a baseline of data gaps is needed to ensure proper test prioritization.
The objective of this critical review was to quantitatively characterize the current state of knowledge in Arctic ecotoxicological research regarding effects of contaminants on Arctic organisms. Specifically, we sought to objectively and transparently assess the methodological strengths and weaknesses (i.e., reliability) of available toxicity data found in the peer‐reviewed and accessible gray literature through the creation and application of a scoring rubric. The results from the scoring rubric were used to analyze trends in current research practices; identify gaps in study design, conduct, and reporting; and highlight the future work needed to address these deficiencies and contribute to the broader discussion of data reliability, both in Arctic ecotoxicology and beyond. Finally, this approach allowed for the highlighting of knowledge gaps in available Arctic species, endpoints, and contaminants with recommendations to address in future toxicity testing.
METHODS
Literature search and initial screening
An overview of the methodological process for this exercise is presented in Figure 1. Literature searches were conducted from January 2018 to June 2021 to gather relevant studies for assessment in the present review. The University of Manitoba Libraries' database search engines were used to conduct Boolean searches from the Web of Science, Wiley Online Library, TOXLINE, ProQuest, and ScienceDirect databases. A Boolean search was conducted using the search terms Arctic OR polar AND ecotoxicology, toxicology, toxic*, effects, contaminants, test, assay, exposure, NOT Antarctic. These search terms generated relevant articles that were then subjected to backward and forward searching, which entails reviewing relevant sources cited by the obtained article and sources that have cited the article itself (Brocke et al., 2009).
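As an illustration only, the search terms listed above could be assembled into a single Boolean string as sketched below. The grouping is an assumption: the text does not state operator precedence, so this sketch reads the comma-separated topic terms as alternatives joined by OR and combines them with the region terms using AND. The function name `build_query` is hypothetical, and individual databases may require different syntax.

```python
def build_query(region_terms, topic_terms, excluded_terms):
    """Combine term lists into one Boolean search string.

    Assumed grouping (not stated in the text):
    (region OR ...) AND (topic OR ...) NOT excluded
    """
    region = " OR ".join(region_terms)
    topic = " OR ".join(topic_terms)
    query = f"({region}) AND ({topic})"
    for term in excluded_terms:
        query += f" NOT {term}"
    return query


# Terms copied from the search description; the grouping is the assumption.
REGION_TERMS = ["Arctic", "polar"]
TOPIC_TERMS = ["ecotoxicology", "toxicology", "toxic*", "effects",
               "contaminants", "test", "assay", "exposure"]
EXCLUDED_TERMS = ["Antarctic"]

query = build_query(REGION_TERMS, TOPIC_TERMS, EXCLUDED_TERMS)
print(query)
```

Keeping the three term lists separate makes it straightforward to rerun the search with additional terms while leaving the exclusion of Antarctic studies in place.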
Figure 1.

Overview of the process whereby Arctic laboratory toxicity tests were evaluated for the reliability and relevance of their data for use in environmental risk assessment.
Initial screening of retrieved articles was completed to filter out any non‐English studies and those that were not peer‐reviewed or published in an indexed scientific journal (i.e., one with an assigned impact factor). The process of backward and forward searching highlighted data sets available from conference proceedings and other gray literature (e.g., Arctic and Marine Oil Spill Program [AMOP], government technical reports) that were highly relevant to this subject matter. Despite targeting peer‐reviewed publications for this exercise, the sporadic and sparse nature of Arctic species toxicity tests became apparent throughout the metadata collection phase. To address this, technical reports were included in the full evaluation (e.g., Department of the Environment Beaufort Sea Technical Report 11; Percy & Mullin, 1975) where they were publicly accessible through database searches or easily available from a public forum. Where technical reports were not accessible (e.g., US National Marine Fisheries Service technical reports), titles were gathered into a table as references (Supporting Information Table S1) but not scored. The AMOP abstracts from 1978 to 2018 were accessed through personal copies available from the 2019 AMOP conference. At the time of this writing, AMOP abstracts were only available to the authors up until 2018. The inconsistency of including gray literature was outweighed by the importance of incorporating this widely accessible and significant pool of Arctic‐related data. It is also important to note that these publications represented a mechanism that was designed to complement standard peer‐reviewed articles. 
A significant amount of Arctic ecotoxicology data have been gathered by Canadian government departments and programs (e.g., former Department of Indian Affairs and Northern Development, Newfoundland Oil Burn Experiment, Oil‐in‐Ice Joint Industry Program) where studies were not officially published or peer‐reviewed, but excluding these data on this basis alone would have detracted from the overall purpose of this assessment and resulted in eight fewer studies from which data points could be gathered. For this reason, abstracts that reported results were subsequently included in the analysis if they also met the additional inclusion criteria outlined in the following section.
Inclusion and exclusion criteria
Reports and literature that passed initial screening were organized into a table where studies were further evaluated for their relevance to the present review. The following criteria determined which studies proceeded to the scoring phase:
Must be an aquatic toxicity test (freshwater, marine, or estuarine); terrestrial tests were excluded;
Must be laboratory toxicity data or a deliberate field exposure (e.g., mesocosm studies); studies that measured effects in wild populations after accidental contaminant releases were excluded;
Must use a test species endemic to an Arctic region or be conducted in an Arctic context with a species that can be found in an Arctic region (see next paragraph for further details).
In summary, studies returned from the search were excluded if they did not include a deliberate exposure to a test species found in the aquatic Arctic environment. Toxicity tests with seabirds were included as aquatic toxicity tests because seabirds spend >50% of their lives at sea (Brown, 1980). This behavior means that they have the potential to be exposed to marine contaminants both directly (e.g., swimming through contaminated areas) and indirectly (e.g., biomagnification via feeding on contaminated fish; Braune et al., 2012; Campbell et al., 2005). Some retrieved studies included test species with Arctic distribution but that were not endemic to this region. For example, the blue mussel Mytilus edulis is commonly tested as a temperate species because it has geographical distribution throughout the North Atlantic Ocean (see Falfushynska et al., 2019); however, it has also been tested as an Arctic species because of its presence in the Arctic Ocean (see Thyrring et al., 2015). Similarly, the copepod Calanus finmarchicus exists geographically in both oceans and has been tested as both an Arctic (see Faksness et al., 2011; Grenvald et al., 2013) and a non‐Arctic (e.g., Hansen et al., 2013; Øverjordet et al., 2014) species for comparison. In the case of the blue mussel, the study was included if conducted in an Arctic context (e.g., based on study objective, environmental parameters, or exposure scenario) but excluded if conducted in a non‐Arctic context. The copepod C. finmarchicus was included from all papers because its prevalence and importance in Arctic regions are key to ecosystem processes (Hansen et al., 2009). Including all aquatic toxicity tests using a specific species solely because it is found in Arctic regions, although found elsewhere, would broaden the scope of the present review beyond its original purpose of describing the current state of knowledge of Arctic‐specific species within ecotoxicology.
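The inclusion logic described above can be sketched as a simple filter. This is a minimal illustration, not the authors' procedure: the `Study` fields and the function name `passes_inclusion` are hypothetical, and the judgment of whether a study was conducted "in an Arctic context" is reduced here to a single flag that a reviewer would have to set by hand.

```python
from dataclasses import dataclass

# Species included from all papers, per the text above.
ALWAYS_INCLUDED = {"Calanus finmarchicus"}


@dataclass
class Study:
    species: str
    aquatic: bool              # freshwater, marine, or estuarine test
    deliberate_exposure: bool  # laboratory test or deliberate field exposure
    arctic_endemic: bool       # species endemic to an Arctic region
    found_in_arctic: bool      # species occurs in the Arctic but also elsewhere
    arctic_context: bool       # Arctic objective, parameters, or exposure scenario


def passes_inclusion(study: Study) -> bool:
    """Return True if the study would proceed to the scoring phase."""
    if not (study.aquatic and study.deliberate_exposure):
        return False
    if study.arctic_endemic or study.species in ALWAYS_INCLUDED:
        return True
    # Non-endemic species (e.g., the blue mussel) need an Arctic context.
    return study.found_in_arctic and study.arctic_context


# Illustrative records only; they do not correspond to specific scored studies.
mussel_temperate = Study("Mytilus edulis", True, True, False, True, False)
mussel_arctic = Study("Mytilus edulis", True, True, False, True, True)
copepod = Study("Calanus finmarchicus", True, True, False, True, False)

for s in (mussel_temperate, mussel_arctic, copepod):
    print(s.species, s.arctic_context, passes_inclusion(s))
```

Under these assumptions the blue mussel record passes only when flagged as Arctic context, whereas the copepod record passes regardless, mirroring the two cases discussed in the text.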
Scoring
The objective of the scoring exercise was to assess the methodological strengths and weaknesses of published studies to determine the reliability of the data for use in ERA. To accomplish this, a scoring system was modified from Hanson et al. (2019), which was developed for primary producer toxicity tests and based on previously published data reliability guidelines and criteria from standard methods (Klimisch et al., 1997; Schneider et al., 2009; US Environmental Protection Agency, 2012). The aim of identifying scoring criteria a priori was to increase the transparency and objectivity of the process.
Information regarding the exposure scenario, test conditions, and results for each study (i.e., exposure durations, test vessels, environmental parameters, x% effect concentration [ECx] values, statistical analyses performed) was incorporated into a spreadsheet to facilitate scoring and comparison between studies. The spreadsheet required modifications over several iterations to ensure that pertinent information from all assessed studies could be captured (e.g., the inclusion of additional columns for media properties), but the basic scoring approach and weightings did not change. The system was composed of three groups of criteria based on (1) test substance, (2) test organism and experimental system, and (3) test design, statistics, and results. Within these groups, several key criteria were identified as critical study components (described in more detail in section Data handling), and overall study scores were multiplied by a factor of 0.5 for each critical criterion that was not met (i.e., where the uncertainty introduced by its absence is considered exceptional). Experimental procedures were numerically scored with binary responses according to 15 criteria (i.e., 1 if the criterion was met and 0 if it was not). An additional category called “reviewer judgment” was included when the method was outlined a priori, to cover cases where an unforeseen issue not captured in the rubric rendered a study unreliable (e.g., control performance was reported but would not be considered adequate, or controls were significantly contaminated with the test compound). This criterion ultimately did not have to be used for any of the studies assessed in the present review. The scoring rubric is presented in Table 1. For a study to be considered reliable for use in risk assessment, we set a minimum overall reliability score threshold of 7.5 out of 15, with all critical study criteria (those in bold in Table 1) being met. Tests that achieved exactly 50% (i.e., 7.5) were considered to be reliable.
Although the numerical value of 7.5 is arbitrary, an overall reliability score above this value means that the study did not achieve a score of 0 in any of the critical criteria, which was our minimum threshold for being considered reliable.
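The scoring arithmetic can be sketched as follows, assuming the overall score is the sum of the 15 binary criteria multiplied by 0.5 for each unmet critical criterion. The set `HYPOTHETICAL_CRITICAL` is a placeholder only: which criteria are critical is indicated in Table 1, and the numbers used here are not taken from it. The function names are likewise illustrative.

```python
N_CRITERIA = 15
THRESHOLD = 7.5

# Placeholder only: NOT the critical criteria identified in Table 1.
HYPOTHETICAL_CRITICAL = {2, 15}


def reliability_score(scores, critical):
    """scores maps criterion number (1..15) to 0 or 1; critical is a set of numbers."""
    if set(scores) != set(range(1, N_CRITERIA + 1)):
        raise ValueError("expected exactly one score for each of the 15 criteria")
    if any(value not in (0, 1) for value in scores.values()):
        raise ValueError("each criterion must be scored 0 or 1")
    raw = sum(scores.values())
    failed_critical = sum(1 for c in critical if scores[c] == 0)
    # Halve the raw score once per unmet critical criterion.
    return raw * 0.5 ** failed_critical


def is_reliable(scores, critical):
    """Reliable if all critical criteria are met and the score reaches the threshold."""
    all_critical_met = all(scores[c] == 1 for c in critical)
    return all_critical_met and reliability_score(scores, critical) >= THRESHOLD


all_met = {c: 1 for c in range(1, N_CRITERIA + 1)}
one_critical_failed = {**all_met, 2: 0}

print(reliability_score(all_met, HYPOTHETICAL_CRITICAL))              # 15.0
print(reliability_score(one_critical_failed, HYPOTHETICAL_CRITICAL))  # 7.0
print(is_reliable(one_critical_failed, HYPOTHETICAL_CRITICAL))        # False
```

The second example shows why the threshold works as described: a study that misses a single critical criterion can score at most 14 before the penalty and therefore 7.0 after it, which falls below 7.5 whichever criterion is involved.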
Table 1.
Scoring criteria, rationale for their use, and elaboration on scoring used to assess the methodological quality of available Arctic species ecotoxicological dataᵃ
| No. | Criterion | Rationale | Score of 1 | Score of 0 |
|---|---|---|---|---|
| Group A—Test substance | ||||
| 1 | >95% pure (or equivalent) | Using a test substance of high purity ensures that any toxic effects observed in the test are attributable to the active ingredient instead of other compounds in a formulated product. | If the source and percentage of the test substance were reported and the percentage was >95% OR the dispersant/oil type was identified | If the source and/or percentage of the test substance were not reported or the percentage was <95% OR the dispersant/oil type was not identified |
| 2 | Measured concentrations reported (any) | Analytically confirming tested concentrations rules out the possibility of test solution preparation errors or inconsistencies, giving greater confidence in the results. | If concentrations of any stock and/or test solutions were analytically confirmed and concentrations reported | If concentrations of any stock and/or test solutions were not analytically confirmed and no measured concentrations are reported |
| 3 | Measured concentrations reported—individual initial | Analytically confirming tested concentrations rules out the possibility of test solution preparation errors or inconsistencies, giving greater confidence in the results; additionally, measuring concentrations at the start of the exposure provides a baseline against which later measurements can be compared. | If concentrations of test solutions in individual test units at test initiation (pooled or separate) were analytically confirmed and concentrations reported | If concentrations of test solutions in individual test units at test initiation (pooled or separate) were not analytically confirmed and no measured concentrations reported |
| 4 | Measured concentrations reported—individual final | Analytically confirming tested concentrations rules out the possibility of test solution preparation errors or inconsistencies, giving greater confidence in the results; additionally, comparing concentrations at the end of the exposure to those at the start can demonstrate how the compound has changed over the duration of the exposure. This is particularly useful in oil spill research. | If concentrations of test solutions in individual test units at the end of the test (pooled or separate) were analytically confirmed and concentrations reported | If concentrations of test solutions in individual test units at the end of the test (pooled or separate) were not analytically confirmed and no measured concentrations reported |
| 5 | Number of concentrations tested (excluding control) | A minimum of three concentrations are recommended to effectively plot a concentration–response curve with confidence. | If there were three or more test concentrations, excluding control | If there were fewer than three test concentrations, excluding control |
| 6 | Ecological relevance | Data collected in laboratory toxicity tests are more useful to risk assessors if concentrations that can be found in the environment are included. | If at least one of the tested concentrations was less than or equal to environmentally relevant concentration for that substance (see Supporting Information, Table S2) | If none of the tested concentrations were less than or equal to the environmentally relevant concentration for that substance (see Supporting Information, Table S2) |
| Group B—Test organism and test system | ||||
| 7 | Strain or source identified | Specific locations of wild‐collected species may provide insight regarding potential previous exposures. Laboratory strains, depending on the species, may have adapted over time to culture conditions. Describing this information can help guide the interpretation of results based on these occurrences. | If the provenance of the test organism was identified (wild‐collected location, laboratory strain identifier) | If the provenance of the test organism was not identified (wild‐collected location, laboratory strain identifier) |
| 8 | Initial test organism characteristics described | Physical characteristics, especially size, can directly influence the toxicity of a given compound and relative responses between tests. | If initial characteristics relevant to the species (i.e., size, density, mass, and feeding protocols) were described | If no initial characteristics relevant to the species (i.e., size, density, mass, and feeding protocols) were described |
| 9 | Standard protocol followed | Standard methods allow more transparency in the data collection process by thoroughly describing methods and how results should be interpreted. Following these protocols also more readily facilitates intra‐ and interlaboratory comparisons of data. | If the procedures were based on standard protocols (e.g., USEPA, OECD, ASTM, and ISO) or modified from previous studies; deviations acceptable if described | If the procedures were not based on standard protocols (e.g., USEPA, OECD, ASTM, and ISO) or modified from previous studies; deviations acceptable if described |
| 10 | Test conditions | Environmental conditions appropriate for the test species ensure that any toxic effects observed are attributable to the presence of the test substance and not any other environmental factor. | If relevant environmental parameters specific to the test species and substance are identified (see Supporting Information, Table S4) | If relevant environmental parameters specific to the test species and substance are not identified (see Supporting Information, Table S4) |
| Group C—Test design, statistics, and results | ||||
| 11 | Replication | To perform meaningful statistical analysis of results, a minimum of three replicates is recommended to assess variation and contrast with controls. | Number of replicates is ≥3 | Number of replicates is <3 |
| 12 | Statistical methods described | Describing statistical methods allows other members of the scientific community to observe how results were derived and determine whether the statistical tests were appropriate for the data that were gathered. | If the statistical methods used to calculate results (i.e., NOEC/LOEC/ECx/LCx, statistical significance between treatments) were described and reported and appropriate controls were employed | If the statistical methods used to calculate results (i.e., NOEC/LOEC/ECx/LCx, statistical significance between treatments) were not described or reported or appropriate controls were not employed |
| 13 | Concentration–response | The inclusion of a concentration–response curve allows users of the data to see how the full toxicity profile is characterized at the full range of concentrations used in the test. | If a concentration–response model and parameters were reported in graph or formulaic expression | If a concentration–response model and parameters were not reported |
| 14 | Raw values | Including the raw data set increases the transparency of the study by demonstrating how well the test was performed and allowing other users of the data to perform different statistical approaches if required. | If raw values (average raw values acceptable but not percentage of control) were reported in tables and/or figures (all or average with error) | If raw values (average raw values acceptable but not percentage of control) were not reported in tables and/or figures |
| 15 | Control performance | All tests should have some form of control performance criteria to demonstrate that observed toxicity is a direct result of the compound instead of being attributed to poor husbandry or other factors. | If control values were reported and the reported values meet the control performance requirements | If control values were not reported or the reported values did not meet the control performance requirement |
ᵃ Criteria presented in bold are considered critical components of a study for reliable use in risk assessment. Each bold criterion scored with a 0 results in a multiplication factor of 0.5 on the overall reliability score of the study.
USEPA = US Environmental Protection Agency; OECD = Organisation for Economic Co‐operation and Development; ASTM = ASTM International; ISO = International Organization for Standardization; NOEC = no‐observed‐effect concentration; LOEC = lowest‐observed‐effect concentration; ECx = x% effect concentration; LCx = x% lethal concentration.
Each endpoint that was assessed in each study was additionally given a score based on relevance of the response, which was not a factor in the overall reliability score, but was assessed separately to gauge utility for risk assessment. Endpoints were classified into one of eight categories based on the general descriptor of the response (i.e., biochemical, cellular, physiological, behavioral, growth, mortality, reproduction, and ecosystem‐level) and scored based on potential impacts with increasing levels of biological organization. For example, endpoints that have more relevance to sustaining populations or communities were scored higher, while those with little or no linkage to higher‐level effects were scored lower (US Environmental Protection Agency, 2016). Though sublethal endpoints such as behavior and biomarker response are important to characterize and add value to our understanding of toxicity, they do not supersede endpoints that have more direct and definitive population‐level effects such as mortality and reproduction. This endpoint weighting approach differs from current Canadian regulatory pass or fail practices but has been integrated into assessments such as the US Natural Resource Damage Assessment (Office of the Federal Register, National Archives and Records Administration, 2021). A complete list of endpoints assessed in the present study and justification for their inclusion in each category are presented in Table 2.
Table 2.
Scores for relevance of laboratory responses of available Arctic ecotoxicological dataᵃ
| Score | General descriptor | Rationale | Endpoints observed: mammal | Endpoints observed: bird | Endpoints observed: fish | Endpoints observed: invertebrate | Endpoints observed: primary producer |
|---|---|---|---|---|---|---|---|
| 0 | No known linkage to survival, development, growth, and/or reproduction | If a response has no real or hypothetical linkage to higher‐level effects, then it has little to no value in elucidating ecological risk. | NA | NA | NA | NA | NA |
| 1 | Biomarker response such as genomic or metabolomic measures | While informative from a mechanistic perspective, the relevance of these responses to population‐, community‐, and ecosystem‐level effects is considered very low. In many cases, the responses characterized are regular processes to detoxify or adapt to a transient stressor, which in and of themselves are not adverse. | NA | NA | [endpoint list not recovered] | [endpoint list not recovered] | [endpoint list not recovered] |
| 2 | Biomarker responses such as enzymatic changes or general physiology | While informative from a mechanistic perspective, the relevance of these responses to population‐, community‐, and ecosystem‐level effects is considered very low. In many cases, the responses characterized are regular processes to detoxify or adapt to a transient stressor, which in and of themselves are not adverse. | [endpoint list not recovered] | NA | [endpoint list not recovered] | [endpoint list not recovered] | NA |
| 3 | Changes in behavior | Behavior, especially if related to reproduction, can have impacts on populations and communities of organisms. Other behavioral changes, such as predator avoidance, could result in increased mortality. | NA | NA | NA | [endpoint list not recovered] | NA |
| 4 | Changes in growth and development, such as mass, other morphometrics, and phenology | These responses are typically highly relevant to the success or sustaining of populations and communities in an ecological context. | NA | NA | [endpoint list not recovered] | [endpoint list not recovered] | [endpoint list not recovered] |
| 5 | Changes in reproduction, including malformation of young | These responses are typically highly relevant to the success or sustaining of populations and communities in an ecological context. | NA | NA | [endpoint list not recovered] | [endpoint list not recovered] | NA |
| 6 | Mortality and other surrogates | Loss of individuals is highly relevant to the success or sustaining of populations and communities in an ecological context. | NA | [endpoint list not recovered] | [endpoint list not recovered] | [endpoint list not recovered] | [endpoint list not recovered] |
Based on their linkage to effects on organisms at the population and community levels. Study number corresponding to each endpoint is reported in parentheses.
CYP1A/CYP330A1 = cytochrome P450 1A/330A1; GST = glutathione S‐transferase; mRNA = messenger RNA; EROD = 7‐ethoxyresorufin O‐deethylase; PAH = polycyclic aromatic hydrocarbon.
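The scoring scale in Table 2 can be expressed as a simple lookup. The sketch below is illustrative only: the short category keys and the `most_relevant` helper are hypothetical names, and picking the highest‐scoring endpoint for a test is an assumption made for the example, not a step described in the review.

```python
# Relevance scores from Table 2, keyed by hypothetical short labels.
RELEVANCE_SCORES = {
    "no_known_linkage": 0,
    "biomarker_genomic_or_metabolomic": 1,
    "biomarker_enzymatic_or_physiological": 2,
    "behavior": 3,
    "growth_and_development": 4,
    "reproduction": 5,
    "mortality": 6,
}


def most_relevant(endpoint_categories):
    """Return the (category, score) pair with the highest relevance score."""
    scored = [(name, RELEVANCE_SCORES[name]) for name in endpoint_categories]
    return max(scored, key=lambda pair: pair[1])


example = most_relevant(
    ["behavior", "mortality", "biomarker_enzymatic_or_physiological"]
)
print(example)
```

Because relevance is kept separate from reliability, a lookup of this kind would be applied alongside the reliability rubric and would not feed into it.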
The rubric for the scored studies can be found in the Supporting Information for this article. Other researchers are encouraged to modify criteria depending on their specific use and recalculate scores as needed if certain aspects of the sheet are more or less relevant to their particular scenario. A blank template is also included where criteria can be removed, added, or modified and used for additional studies.
Data handling
Metadata from this exercise were compiled into a spreadsheet for further consideration. The in vitro and in vivo studies followed the same scoring process, and in cases where a cell line was used, the test species and aquatic habitat (freshwater or marine) were categorized based on the source animal (e.g., ringed seal [Pusa hispida] cell tests were filed under marine mammal tests). Unidentified test species (e.g., a sediment sample with unidentified bacterial and algal communities in Petersen et al. [2008]) were counted as a discrete species category, as were test organisms identified only to genus (e.g., Gammarus sp. was differentiated from Gammarus oceanicus and counted as a different species).
Data are presented for individual papers (peer‐reviewed publications [see Olsen et al., 2013], abstracts [see Faksness et al., 2011], or theses [see Moore, 2016]) collectively under the term studies, as well as by unique toxicity tests (test substance and species combinations, e.g., Calanus glacialis exposed to pyrene [Jensen et al., 2008]) or experimental combinations (test substance/species/endpoint combinations, e.g., 7‐ethoxyresorufin O‐deethylase activity in Arctic cod exposed to produced water [Geraudie et al., 2014]). Test organisms were divided by organism group (i.e., invertebrate, vertebrate, or primary producer) and organism type as defined by the USEPA ECOTOX Knowledgebase (i.e., alga, bird, crustacean, fish, mammal, mollusk, or other invertebrate [US Environmental Protection Agency, 2021]), with the additional category of microbial consortium included based on findings in the present study. Test substances were organized into three main groups based on their structure and use: oil‐related contaminants, inorganic contaminants (e.g., metals), and organics (e.g., phenols and flame retardants). This grouping made it easier to identify broad classes of compounds for which testing was either strong or lacking. Last, endpoints were categorized into eight effect categories, again based on the USEPA ECOTOX Knowledgebase: behavioral, biochemical, cellular, ecosystem, growth, mortality, physiological, and reproduction (US Environmental Protection Agency, 2021). This categorization assisted in the assignment of relevance scores to individual endpoints.
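The distinction between unique toxicity tests and experimental combinations can be illustrated with a small counting sketch. The records below are hypothetical examples assembled for illustration (the endpoint labels and the Gammarus rows are not taken from any scored study); only the counting logic reflects the definitions given above.

```python
# Hypothetical records: (test substance, test species, endpoint).
records = [
    ("pyrene", "Calanus glacialis", "mortality"),
    ("pyrene", "Calanus glacialis", "behavior"),
    ("produced water", "Boreogadus saida", "EROD activity"),
    ("pyrene", "Gammarus sp.", "mortality"),
    ("pyrene", "Gammarus oceanicus", "mortality"),
]

# A unique toxicity test is a substance/species pair.
unique_tests = {(substance, species) for substance, species, _ in records}

# An experimental combination is a substance/species/endpoint triple.
experimental_combinations = set(records)

# Genus-only identifications are kept as their own species category.
species = {name for _, name, _ in records}

print(len(unique_tests), len(experimental_combinations), len(species))
```

Here the two pyrene records for Calanus glacialis collapse into one unique toxicity test but remain two experimental combinations, and Gammarus sp. is counted separately from Gammarus oceanicus.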
Oil‐related contaminants in this context are defined as any type or brand of fuel oil or gasoline, products designed to clean up or remediate sites contaminated with fuel oil or gasoline, and single compounds that are found in fuel oil or gasoline when tested in the context of fuel oil or gasoline contamination. Specific test substances grouped within oil‐related contaminants were difficult to categorize because of the extensive number of products (e.g., types of oil and oil spill–remediation technologies), preparation methods (e.g., oil slick, mechanically or chemically dispersed water accommodated fractions [WAFs], in situ burn residues), and additives (e.g., chemical dispersants or shoreline washing agents) used in this testing. To facilitate data analysis and assist in the understanding of knowledge gaps, crude oil was counted as a single test substance regardless of oil type, preparation method, or additives. Oil spill–remediation technologies were counted as a single test substance based on operational usage (e.g., Absorrep K212 and Bioversal both counted as shoreline washing agents, Dasic NS and Corexit 9500 both counted as dispersants).
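The substance‐grouping rule described above amounts to a normalization step before counting. The following is a minimal sketch under stated assumptions: the function name and the `is_crude_oil` flag are hypothetical, and only the four products named in the text are mapped.

```python
# Operational groupings named in the text; unlisted names are returned unchanged.
REMEDIATION_AGENTS = {
    "Absorrep K212": "shoreline washing agent",
    "Bioversal": "shoreline washing agent",
    "Dasic NS": "dispersant",
    "Corexit 9500": "dispersant",
}


def normalize_substance(name, is_crude_oil=False):
    """Collapse crude oils and spill-response products into counting categories."""
    if is_crude_oil:
        # Oil type, preparation method, and additives are ignored for counting.
        return "crude oil"
    return REMEDIATION_AGENTS.get(name, name)


tested = [
    normalize_substance("Alaska North Slope", is_crude_oil=True),
    normalize_substance("Troll", is_crude_oil=True),
    normalize_substance("Bioversal"),
    normalize_substance("Dasic NS"),
    normalize_substance("pyrene"),
]
print(sorted(set(tested)))
```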
To investigate whether there was a correlation between the impact factor of the journal in which a paper was published and the reliability of the data presented, historical impact factors were retrieved from the Web of Science InCites Journal Citation Report (Clarivate Analytics, 2021) for the year in which the study was published. Where no impact factor was available for the time of publication, the average from the most recent 5 years was calculated and used in its place.
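The fallback rule for missing impact factors can be sketched as follows. The function name and input format are hypothetical, and "most recent 5 years" is interpreted here as the five latest years for which a value is available, which is an assumption.

```python
def impact_factor_for(year, factors_by_year):
    """Return the impact factor for `year`, or the mean of the five most
    recent available years when none exists for that year.

    factors_by_year: dict mapping year -> impact factor (hypothetical input).
    """
    if not factors_by_year:
        raise ValueError("no impact factors available for this journal")
    if year in factors_by_year:
        return factors_by_year[year]
    recent_years = sorted(factors_by_year)[-5:]
    return sum(factors_by_year[y] for y in recent_years) / len(recent_years)


# Made-up values for illustration only.
example_factors = {2016: 2.0, 2017: 3.0, 2018: 4.0, 2019: 5.0, 2020: 6.0, 2021: 7.0}
print(impact_factor_for(2018, example_factors))
print(impact_factor_for(1980, example_factors))
```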
Although the scoring rubric was designed to be as objective as possible, some areas require clarification for full transparency. For example, Criterion 11 requires a minimum of three replicates per exposure to achieve a score of 1; however, the line between true replication and pseudoreplication is sometimes difficult to discern (e.g., replicate dishes being held in the same bath in Frantzen et al. [2012] or the inclusion of 12 eggs per dose group pooled together for survival calculations in Braune et al. [2012]). It could be argued that replicates prepared from the same stock or test solution, or experimental units incubated in the same growth chamber, are actually pseudoreplicates. To limit subjective differences in scoring between studies for this criterion, replicates were simply defined as unique exposure vessels.
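The working definition of a replicate as a unique exposure vessel lends itself to a simple check for Criterion 11. This is a hedged sketch: the function name and the vessel identifiers are hypothetical.

```python
def criterion_11_score(vessel_ids, minimum=3):
    """Score 1 if an exposure was run in at least `minimum` unique vessels."""
    return 1 if len(set(vessel_ids)) >= minimum else 0


# Three separate dishes at one concentration count as three replicates.
three_dishes = criterion_11_score(["dish_1", "dish_2", "dish_3"])

# Twelve organisms held in a single vessel count as one replicate.
one_vessel = criterion_11_score(["tank_A"] * 12)

print(three_dishes, one_vessel)
```

Under this definition the number of organisms per vessel does not affect the score; only the number of distinct vessels does.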
Environmental relevance of test substances (Criterion 6) is described in detail in Supporting Information, Table S2. Note that environmentally relevant concentration thresholds listed in this table are based on general literature values, but study‐specific justification may change the environmentally relevant value for a particular study. For example, a general environmentally relevant concentration of PAHs may be 189 µg/L based on data accumulated from an oil spill; however, Petersen et al. (2008) used a measured value of sediment porewater concentrations to justify environmental realism because their study mirrored this type of exposure and, as such, was scored as 1 in this category.
The purity and/or grade of some oil‐related contaminants are difficult to characterize and compare across studies (Criterion 1). Justification for purity or grade of these test substances can be found in Supporting Information, Table S3. Similarly, relevant environmental parameters to include when describing methodologies for toxicity tests can vary depending on test species and whether the test was conducted in vitro or in vivo. Justification for scoring in this criterion can be found in Supporting Information, Table S4.
Statistical analyses
For comparisons, scores from each group of criteria (test substance; test organism and system; test design, statistics, and results) were converted to percentages of the total achievable score in each group. These percentages were used for statistical analyses. Before assessing the data for statistically significant differences between organism groups, assumptions of normality were assessed using the Shapiro‐Wilk test and homogeneity of variances using Bartlett's test (base package in R; R Foundation for Statistical Computing, 2012). Then, the statistical significance of differences between organism groups was assessed through Kruskal‐Wallis rank tests. Post hoc Dunn tests were performed with a Bonferroni p adjustment to reveal where significant differences occurred. Spearman's correlation coefficients (rho) were calculated for non‐normally distributed data with unequal variance using linear regression models.
Scores are presented in the Results section as mean ± standard deviation.
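The rank‐based comparison described in this section can be sketched without statistical libraries. The analyses in the review were run in R; the code below is not that code but a standard‐library Python illustration of a Kruskal‐Wallis test followed by Dunn comparisons with a Bonferroni adjustment. It assumes exactly three groups (so the chi‐square tail has a closed form for 2 degrees of freedom), uses two‐sided Dunn p values, and runs on made‐up percentage scores.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def kruskal_wallis_dunn(groups):
    """groups: dict of exactly three names -> lists of percentage scores."""
    if len(groups) != 3:
        raise ValueError("the closed-form p value below assumes three groups")
    names = list(groups)
    pooled = [score for name in names for score in groups[name]]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    mean_rank, size, offset = {}, {}, 0
    for name in names:
        size[name] = len(groups[name])
        mean_rank[name] = sum(ranks[offset:offset + size[name]]) / size[name]
        offset += size[name]

    # Tie term shared by the H correction and the Dunn variance.
    tie_counts = {}
    for score in pooled:
        tie_counts[score] = tie_counts.get(score, 0) + 1
    tie_sum = sum(t ** 3 - t for t in tie_counts.values())

    h = 12 / (n_total * (n_total + 1)) * sum(
        size[name] * mean_rank[name] ** 2 for name in names
    ) - 3 * (n_total + 1)
    h /= 1 - tie_sum / (n_total ** 3 - n_total)
    p_overall = math.exp(-h / 2)  # chi-square upper tail, exact for df = 2

    pairs = list(combinations(names, 2))
    adjusted = {}
    for a, b in pairs:
        variance = (
            n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))
        ) * (1 / size[a] + 1 / size[b])
        z = (mean_rank[a] - mean_rank[b]) / math.sqrt(variance)
        p_two_sided = math.erfc(abs(z) / math.sqrt(2))
        adjusted[(a, b)] = min(1.0, p_two_sided * len(pairs))  # Bonferroni
    return h, p_overall, adjusted


# Hypothetical percentage scores, not values from the scored studies.
scores = {
    "invertebrate": [7, 13, 20],
    "primary producer": [27, 33, 40],
    "vertebrate": [47, 60, 80],
}
h_stat, p_value, dunn = kruskal_wallis_dunn(scores)
print(round(h_stat, 2), round(p_value, 4))
```

With more than three groups, such as the six organism types compared in the Results, the overall p value would require a general chi‐square distribution function, which this sketch deliberately omits.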
RESULTS
Summary of reviewed studies
The search process returned 180 studies, 47 of which passed screening criteria for scoring. One additional PhD thesis (Moore, 2016) was included because it was publicly available and included a large volume of test results from freshwater Arctic fish. This resulted in a total of 48 individual studies (40 peer‐reviewed and 8 gray literature). The four chapters of the unpublished thesis were assessed independently, giving a total of 51 separate scoring sheets in the Excel file (see Supporting Information, Study Scores). There were 253 unique toxicity tests (i.e., test substance and test species combinations), 73 unique Arctic test species (two unidentified groups of microbes not included in this total [Pančić et al., 2019; Petersen et al., 2008]), 39 tested substances, and 95 endpoints, for a total of 596 experimental combinations (i.e., test substance/test species/endpoint combinations).
The volume of research produced in this field steadily declined from the 1970s (n = 45 unique toxicity tests) through the 1980s (n = 18) and 1990s (n = 0) and began to increase again in the first decade of the 2000s (n = 5), finally experiencing a period of rapid growth in the 2010s (n = 185); however, 104 tests of the 185 reported in the 2010s were from a single study (Moore, 2016). Excluding that study, 81 tests were performed in this decade. The relationship between number of unique toxicity tests conducted and study year was not statistically significant (Supporting Information, Figure S1). Moore (2016) was also the driver of the notable increase in testing of inorganic contaminants in the 2010s (Supporting Information, Figure S2); however, with that study excluded, oil‐related contaminants comprised the majority of tests throughout this decade.
Test species
Vertebrates were tested in 141, invertebrates in 104, and primary producers in 8 out of 253 unique toxicity tests (Supporting Information, Figure S3). These organism groups were further broken down by organism type within each category (Table 3). Of these, 107 of 253 unique toxicity tests used freshwater organisms (42% of all unique toxicity tests), and 146 of 253 used marine organisms (58%; Supporting Information, Figure S4). Crustaceans were the most studied organism type in number of individual studies (n = 29, 60% of all studies), number of species tested (n = 31, 42% of all species), and number of endpoints studied (n = 26, 27% of total endpoints). They were also one of only two organism groups that encompassed both marine (n = 29 species) and freshwater (n = 2 species) organisms. Within this group, the class Malacostraca was the best studied (n = 41, 47% of crustacean tests) and the most diverse class in terms of number of species tested (n = 20, 23% of species within crustaceans). Hexanauplia was the second best‐studied class of crustaceans (n = 34, 39% of crustacean tests), and within Hexanauplia C. glacialis (n = 15, 44% of Hexanauplia tests) and C. finmarchicus (n = 14, 41% of Hexanauplia tests) were the primarily studied species. These two copepods were also the most frequently tested organisms of all crustaceans.
Table 3.
Summary of publicly available Arctic ecotoxicological laboratory tests that passed inclusion criteria for the present study as of June 2021
| Organism type | Individual studies (n = 48) | Unique toxicity tests (n = 253) | Species tested (n = 73)^a | Compounds tested (n = 39) | Endpoints assessed (n = 95) |
|---|---|---|---|---|---|
| Crustacean | 29 | 87 | 31 | 12 | 26 |
| Fish | 14 | 126 | 12 | 24 | 24 |
| Mammal | 4 | 13 | 8 | 8 | 12 |
| Alga | 3 | 6 | 6 | 3 | 3 |
| Mollusk | 5 | 12 | 11 | 5 | 13 |
| Other invertebrate | 4 | 5 | 3 | 2 | 10 |
| Microbial consortium | 2 | 2 | NA | 2 | 13 |
| Bird | 1 | 2 | 2 | 1 | 1 |
^a Two unidentified groups of microbes not included.
NA = not available.
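The organism‐group totals quoted in the text can be recovered from the unique toxicity test column of Table 3. In the sketch below the counts are copied from the table, whereas the mapping of organism types to groups is an assumption; in particular, placing microbial consortia with primary producers is inferred from the reported total of 8 and is not stated explicitly.

```python
# Unique toxicity tests per organism type, copied from Table 3.
TESTS_BY_TYPE = {
    "Crustacean": 87,
    "Fish": 126,
    "Mammal": 13,
    "Alga": 6,
    "Mollusk": 12,
    "Other invertebrate": 5,
    "Microbial consortium": 2,
    "Bird": 2,
}

# Assumed assignment of organism types to organism groups.
GROUP_OF = {
    "Crustacean": "invertebrate",
    "Mollusk": "invertebrate",
    "Other invertebrate": "invertebrate",
    "Fish": "vertebrate",
    "Mammal": "vertebrate",
    "Bird": "vertebrate",
    "Alga": "primary producer",
    "Microbial consortium": "primary producer",
}

totals = {}
for organism_type, n_tests in TESTS_BY_TYPE.items():
    group = GROUP_OF[organism_type]
    totals[group] = totals.get(group, 0) + n_tests

print(totals, sum(totals.values()))
```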
Fish were the second most frequently tested organism type by number of individual studies (n = 14, 29% of studies) but were the best studied in terms of unique toxicity tests (n = 126, 50% of tests). This group was the second most diverse in terms of number of species tested (n = 12, 16% of all tested species); however, fish tests were composed entirely of a single organism class (Actinopterygii, or ray‐finned fishes). Both marine (n = 7 species) and freshwater (n = 5 species, anadromous fish included) species made up this group; however, the majority of data points collected using fish came from a single study that tested a variety of metals, metal salts, and nitrogenous contaminants on freshwater fish (Moore, 2016). With that study included, Prosopium cylindraceum (round whitefish; n = 25, 20% of fish tests) and Salvelinus alpinus (Arctic charr; n = 25, 20%) were the most frequently tested species. With that study excluded, the keystone species polar cod (Boreogadus saida) was the most frequently tested (n = 11, 50% of remaining fish tests). Fish were tested against the widest variety of test substances compared to other organism groups (n = 24 unique test substances, 62% of all tested compounds), representing all three contaminant groups (oil‐related contaminants, inorganic contaminants, and organic contaminants).
Mammals were the third most frequently studied organism type based on number of individual studies (n = 4, 8% of studies) and number of unique toxicity tests (n = 13, 5% of unique tests). Twelve of the 13 mammalian tests were in vitro studies using orca (Orcinus orca), seal (Pusa hispida, Pagophilus groenlandicus), and polar bear (Ursus maritimus) cell lines to determine cellular, biochemical, and physiological effects. A single test (Engelhardt, 1981) was conducted using whole organisms to assess the clinical effects of exposure to oil pollution on polar bears.
Algal, mollusk, microbial, bird, and other invertebrate tests together made up only 11% of unique toxicity tests (n = 27 tests). Algal tests principally used Arctic diatoms as test species (i.e., Bacillariophyceae and Coscinodiscophyceae; n = 4, 67% of algal tests), but green algae were also tested (Chlorophyceae; n = 1, 17% of algal tests). A single test was conducted on an unidentified species of phytoplankton (Lemcke et al., 2018). The “other invertebrate” category is composed of invertebrates that did not fall under either of the other invertebrate organism types (i.e., crustacean or mollusk). This category includes the green sea urchin (Strongylocentrotus droebachiensis), sea spider (Nymphon gracile), and balloon jellyfish (Halitholus cirratus). The microbial consortium category captured two studies (Pančić et al., 2019; Petersen et al., 2008) that used an unidentified mixture of benthic algae and bacteria obtained from grab samples as test species. Seabirds were also tested in a single study (Braune et al., 2012). Scores from the studies testing seabirds and microbes were omitted in comparisons of scores between organism types to avoid skewing the data based on one and two studies, respectively, but were included in assessments of the data set as a whole (e.g., included when presenting number of tested compounds and endpoints among all studies but excluded in statistical analyses comparing overall reliability scores of different organism types).
Test substances
Contaminants used as test substances in Arctic ecotoxicological research were categorized into three groups: oil‐related contaminants, inorganic contaminants, and organic contaminants. The majority of experiments using inorganic contaminants were freshwater tests (n = 104 of 114 tests; 91%), whereas the other compound groups consisted mainly of marine tests (98% of oil‐related contaminant tests [n = 125 of 127 tests] and 92% of organic contaminant tests [n = 11 of 12 tests]; Supporting Information, Figure S5). Oil‐related contaminants were used in 50% of all toxicity tests (n = 127), with crude oil being the most frequently tested substance in this group (n = 63, 50% of tests using oil‐related contaminants). Crude oils from both Arctic (Alaska North Slope, Atkinson Point, Norman Wells, Cook Inlet, Kobbe, Troll, North Sea, and Prudhoe Bay) and non‐Arctic (Venezuela, Arabian Light, Lago Medio, Midale, and Pembina) regions were included and used as fresh and weathered, with and without suspended sediment, and in the presence and absence of ultraviolet light. In addition, specific compound groups found in oil (e.g., aromatic, asphaltic, or paraffinic compounds alone) were tested as individual test substances. Preparations of crude oil were mechanically dispersed (i.e., physically agitated to create a WAF of oil), chemically dispersed (i.e., crude oil used in conjunction with a chemical dispersant to create a chemically enhanced WAF of crude oil), or used after a period of natural attenuation by microorganisms, as with in situ burn residues or a slick in mesocosm studies. Single compounds found in oil (naphthalene, 2‐methyl‐naphthalene, and pyrene) were the second most frequently tested group of substances in this category, used in 27% of oil‐related contaminant tests (n = 34).
Spill response agents alone (i.e., no oil added) were used in 13% of oil‐related contaminant testing using either chemical dispersants (Corexit [9500, 9500A, or 9527], Dasic NS, Finasol OSR52, or Gamlen OD4000) or shoreline washing agents (Absorrep K212, Bios, Bioversal, Corexit 9580A, or Hela saneringsvaeske). The remaining tests in this category were conducted using heavy fuel oil (n = 2) and produced water (n = 7), an oil extraction by‐product. The vast majority of crustacean, algal, mollusk, other invertebrate, and microbial tests have been conducted with oil‐related contaminants (Figure 2); but there are some data for these substances across nearly every organism category.
Figure 2.

Number of unique Arctic species toxicity tests captured in the present review (n = 253 as of June 2021) by organism group, with test substance type within bars (number below each bar is total number of unique tests).
Inorganic contaminants were made up of mostly metals and salts (e.g., copper, zinc, aluminum, sodium chloride, and calcium chloride) and were the second most tested class of substances, used in 45% of all tests (n = 114). More than three‐quarters (n = 104) of all fish tests in particular used inorganic contaminants, most of which were tested in the same study (Moore, 2016). They were also the only class of substances tested on seabirds. This compound group was the most diverse and included 23 unique test substances.
Organic contaminants were the least frequently tested compounds, accounting for just under 5% of all toxicity tests (n = 12), all of which were conducted on mammalian cell lines (in vitro studies) with the exception of a single test each with crustaceans and fish. Half of the tests that used this group of substances were from a single study (Desforges et al., 2017) where contaminant cocktails were derived from the blubber of wild marine mammals. On analysis, they were determined to be composed primarily of polychlorinated biphenyls (PCBs) and organochlorine pesticides. The remaining compounds tested in this category were various PCB congeners, tetrachlorobiphenyl, and unidentified phenolic compounds from crude oil.
Endpoints
The 95 distinct endpoints were grouped into eight effect categories, as shown in Figure 3. Mortality was the endpoint most frequently assessed, with 54% of all data points being attributed to this effect category (n = 319 of 596 experimental combinations). Within this effect category, crustaceans accounted for the largest share (n = 134, 42% of mortality endpoints), followed by fish (n = 128, 40% of mortality endpoints), mollusks (n = 9), other invertebrates (n = 4), and birds (n = 2). All other effect categories were sublethal, chronic, and/or ecosystem‐based and together made up less than half of all data points observed (n = 277, 46% of all data points). Behavioral endpoints were the second most frequently assessed, comprising 14% of all tested combinations (n = 85). Within this effect category, crustaceans were again the most common test organism at 78% of all data points (n = 66), followed by other invertebrates (n = 11) and mollusks (n = 8). The third most frequently assessed endpoint was growth, which was the primary endpoint for algal studies (n = 13 of 17 algal data points; 25% of all growth endpoints). This endpoint was also assessed in crustaceans (n = 19 of 51 growth endpoints; 37% of all growth endpoints), fish (n = 17; 33%), and mollusks (n = 2; 4%) as test species. The remaining five effect categories comprised a total of only 24% of all data points (n = 141). In order of most frequent appearance, they included biochemical endpoints (n = 50; 8% of all endpoints), cellular (n = 37; 6% of all endpoints), physiological (n = 28; 5% of all endpoints), reproduction (n = 20; 3% of all endpoints), and ecosystem‐based (n = 6; 1% of all endpoints) responses.
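The shares reported for each effect category follow directly from the counts given in the text. The short sketch below recomputes them; the counts are taken from the paragraph above, and the variable names are illustrative.

```python
# Data points per effect category, as reported in the text.
COUNTS = {
    "mortality": 319,
    "behavioral": 85,
    "growth": 51,
    "biochemical": 50,
    "cellular": 37,
    "physiological": 28,
    "reproduction": 20,
    "ecosystem": 6,
}

total = sum(COUNTS.values())
shares = {name: round(100 * n / total) for name, n in COUNTS.items()}
non_mortality = total - COUNTS["mortality"]

print(total, non_mortality, shares)
```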
Figure 3.

Number of data points for Arctic species captured in the present review (n = 596 as of June 2021) by effect category, with organism type within bars (number below each bar is total number of reported responses for that category).
A period of monitoring postexposure, framed as either a recovery period or an opportunity to observe latent effects, was included in 25% of individual studies (n = 12 of 48 studies). The terminology used in each study was mirrored in the scoring spreadsheet to present the results for these time points in the same context as originally reported.
Overall methodology scores
Overall reliability scores calculated from the rubric are presented as vertebrates, invertebrates, or primary producers in Figure 4. Scores were variable within and between groups; however, the majority of means, medians, and first and third quartiles for every group were <50% of the total achievable score. Mean scores expressed as a percentage of the total achievable score were greatest for vertebrates (36 ± 15), followed by invertebrates (30 ± 20) and primary producers (30 ± 23). The difference in mean overall reliability scores between vertebrates and invertebrates was statistically significant (p < 0.05), whereas primary producers were not significantly different from either of the other two groups.
Figure 4.

Overall reliability scores (as a percentage out of 15) for all unique toxicity tests reviewed in the present study (n = 253 as of June 2021) by their status as vertebrates, invertebrates, or primary producers (number below each bar is total number of unique toxicity tests for that group). Median values are represented by a black line within each box; lower and upper hinges correspond to the 25th and 75th percentiles, respectively; and whiskers extend to the largest or smallest value within 1.5 × the interquartile range from the hinge. Letters indicate statistical significance between groups (p < 0.05) based on Kruskal‐Wallis rank sum tests followed by post hoc Dunn tests with Bonferroni p adjustments.
The individual toxicity test scores for Groups A (test substance), B (test organism and test system), and C (test design, statistics, and results) and overall are presented by organism type in Figure 5. Birds and microbes have been omitted from this figure and statistical analyses because of the small number of scores obtained from these groups of studies (n = 2 for each); however, mean scores for each criteria group for all organism types can be found in Table 4. Tests achieved the greatest score in Group B (74 ± 28), followed by Group A (55 ± 16), Group C (50 ± 15), and overall across all organism types (34 ± 18). The overall reliability scores were generally lower than the average scores for specific groups of criteria because of the high proportion of tests that did not meet one or more of the critical study criteria outlined in the rubric, resulting in a multiplication factor of 0.5 applied to the overall reliability score for each one not met.
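The way unmet critical criteria lower the overall reliability score can be sketched as follows. The function is a hypothetical illustration, not the authors' spreadsheet logic; it assumes 15 criteria each scored 0 or 1 and takes the four critical criteria to be Criteria 2, 5, 11, and 12, as named later in this section.

```python
CRITICAL_CRITERIA = {2, 5, 11, 12}  # critical criteria as named in the text
N_CRITERIA = 15


def overall_reliability(met):
    """Return the overall reliability score as a percentage.

    met: set of criterion numbers (1-15) that received a score of 1.
    The percentage of criteria met is halved once for every critical
    criterion that was not met.
    """
    base = 100 * len(met) / N_CRITERIA
    missed_critical = len(CRITICAL_CRITERIA - met)
    return base * 0.5 ** missed_critical


# Hypothetical test meeting 12 criteria, including all four critical ones.
all_critical_met = overall_reliability(set(range(1, 16)) - {3, 9, 13})

# The same test if it also lacked three replicates (Criterion 11).
one_critical_missed = overall_reliability(set(range(1, 16)) - {3, 9, 11, 13})

print(all_critical_met, round(one_critical_missed, 1))
```

In this example a single missed critical criterion takes a test from 80% to about 37%, which illustrates why overall scores sit well below the group means.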
Figure 5.

Scores (as a percentage out of 15) for all unique toxicity tests reviewed in the present study (n = 253 as of June 2021) by organism type. Median values are represented by a black line within each box; lower and upper hinges correspond to the 25th and 75th percentiles, respectively; and whiskers extend to the largest or smallest value within 1.5 × the interquartile range from the hinge. Letters indicate statistical significance between groups (p < 0.05) based on Kruskal‐Wallis rank sum tests followed by post hoc Dunn tests with Bonferroni p adjustments. No statistically significant differences were found for Group A (test substance).
Table 4.
Percentage of unique toxicity tests with a score of 1 for each criterion by organism type^a
| Group | Criterion | Crustacean | Fish | Alga | Mammal | Mollusk | Other invertebrate | Bird | Microbe | All unique toxicity tests |
|---|---|---|---|---|---|---|---|---|---|---|
| Group A | Criterion 1 | 79 | 95 | 100 | 54 | 33 | 60 | 100 | 50 | 84 |
| | Criterion 2 | 64 | 98 | 33 | 85 | 67 | 40 | 100 | 100 | 80 |
| | Criterion 3 | 15 | 7 | 17 | 0 | 33 | 20 | 0 | 50 | 11 |
| | Criterion 4 | 14 | 5 | 0 | 8 | 33 | 20 | 100 | 100 | 11 |
| | Criterion 5 | 71 | 94 | 100 | 92 | 92 | 100 | 100 | 50 | 86 |
| | Criterion 6 | 94 | 16 | 33 | 100 | 100 | 100 | 100 | 100 | 56 |
| | Mean score | 57 ± 19 | 52 ± 11 | 47 ± 22 | 56 ± 13 | 60 ± 26 | 57 ± 28 | 83 ± 0^b | 75 ± 12 | 55 ± 16 |
| Group B | Criterion 7 | 93 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 |
| | Criterion 8 | 79 | 97 | 83 | 100 | 100 | 100 | 100 | 100 | 91 |
| | Criterion 9 | 15 | 88 | 17 | 0 | 8 | 0 | 0 | 0 | 49 |
| | Criterion 10 | 23 | 89 | 0 | 92 | 8 | 0 | 100 | 50 | 57 |
| | Mean score | 53 ± 24 | 93 ± 18 | 50 ± 16 | 73 ± 7 | 54 ± 10 | 50 ± 0^b | 75 ± 0^b | 63 ± 18 | 74 ± 28 |
| Group C | Criterion 11 | 55 | 9 | 100 | 92 | 67 | 40 | 100 | 50 | 37 |
| | Criterion 12 | 84 | 100 | 33 | 92 | 100 | 100 | 100 | 100 | 93 |
| | Criterion 13 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 50 | 1 |
| | Criterion 14 | 38 | 90 | 83 | 8 | 17 | 20 | 100 | 50 | 61 |
| | Criterion 15 | 32 | 87 | 0 | 0 | 17 | 40 | 100 | 0 | 55 |
| | Mean score | 42 ± 17 | 57 ± 10 | 43 ± 8 | 38 ± 6 | 40 ± 17 | 40 ± 14 | 80 ± 0^b | 50 ± 14 | 50 ± 15 |
| Overall | Mean reliability score | 30 ± 21 | 35 ± 13 | 27 ± 26 | 48 ± 18 | 34 ± 16 | 22 ± 8 | 80 ± 0^b | 38 ± 31 | 34 ± 18 |
^a Bolded numbers indicate <50% of tests with a score of 1 for that criterion. Mean reliability scores ± standard deviation for each organism type are calculated per criterion group and presented in italics below each section.
^b All scores identical.
Within Group A, toxicity tests performed on mollusks had the greatest mean score (60 ± 26), although other invertebrates, crustaceans, mammals, fish, and algae were close behind, with mean scores ranging between 47 and 57%. Tests performed with algae scored the lowest in this criteria group (47 ± 22). Variability in scores was the greatest in this category for other invertebrates (coefficient of variation [CV] = 49%), while fish scores were the most consistent (CV = 21%).
Tests performed with fish had the greatest mean score among organism types for Group B (93 ± 18), followed by mammals (73 ± 7), mollusks (54 ± 10), crustaceans (53 ± 24), other invertebrates (50 ± 0), and algae (50 ± 16). Variability in scores for criterion Group B was greatest for tests using crustaceans (CV = 46%) with scores ranging from 0 to 100%. The least variability was found in the other invertebrate group (CV = 0%).
Within Group C, fish again had the greatest average score (57 ± 10). Algae, crustaceans, mollusks, and other invertebrates all scored in the 40% range (43 ± 8, 42 ± 17, 40 ± 17, and 40 ± 14, respectively); and mammals achieved the lowest average score of 38 ± 6. Variation in scores was greatest for mollusks (CV = 43%) and least for mammals (CV = 14%).
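The coefficients of variation quoted for each criteria group are the standard deviation expressed as a percentage of the mean. The sketch below recomputes two of the Group A values from the rounded means and standard deviations reported above; the function name is illustrative.

```python
def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation as a percentage of the mean."""
    return 100 * sd / mean


# Group A values as reported: other invertebrates 57 ± 28, fish 52 ± 11.
cv_other_invertebrates = coefficient_of_variation(57, 28)
cv_fish = coefficient_of_variation(52, 11)

print(round(cv_other_invertebrates), round(cv_fish))
```

Because the published means and standard deviations are themselves rounded, values recomputed this way can differ by about a percentage point from those reported (e.g., 53 ± 24 for Group B crustaceans gives roughly 45% rather than 46%).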
With regard to overall reliability scores (i.e., the sum of Groups A, B, and C multiplied by 0.5 for each critical criterion not met), mammalian tests scored highest with an average score of 48 ± 18 and had the greatest percentage of unique toxicity tests that were deemed most reliable for use in risk assessment (i.e., score >50%) according to the criteria outlined in the present review (n = 10 of 13 mammalian tests, 77%). Fish tests had the second highest overall score at 35 ± 13, followed by mollusks (34 ± 16), crustaceans (30 ± 21), algae (27 ± 26), and other invertebrates (22 ± 8). After mammals, algal tests had the second highest proportion of reliable studies with 33% of tests meeting the criteria, although only six unique toxicity tests were observed in total. Twenty of 87 crustacean tests were considered reliable for risk assessment (23%), as were 1 of 12 mollusk tests (8%), 10 of 126 fish tests (8%), and 0 of 5 tests using other invertebrates. In total, only 46 of 253 (18%) unique toxicity tests passed screening for data reliability for use in ERA.
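The proportions of tests deemed reliable can be recomputed from the counts given above. In this sketch the counts are taken from the text, and the dictionary layout and names are illustrative.

```python
# (reliable tests, total tests) per organism type, as reported in the text.
RELIABLE = {
    "Mammal": (10, 13),
    "Crustacean": (20, 87),
    "Mollusk": (1, 12),
    "Fish": (10, 126),
    "Other invertebrate": (0, 5),
}

percent_reliable = {
    organism_type: round(100 * passed / total)
    for organism_type, (passed, total) in RELIABLE.items()
}

# Overall figure across all 253 unique toxicity tests.
overall_percent = round(100 * 46 / 253)

print(percent_reliable, overall_percent)
```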
Finally, statistically significant correlations were found between score and study year for Groups B and C and overall reliability scores but not for Group A (Supporting Information, Figures S6–S9). The correlation between overall reliability score and the impact factor of the journal in which the study was published was also analyzed and found to be statistically significant (Supporting Information, Figure S10).
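A rank correlation of the kind used to relate scores to study year can be sketched in a few lines. This is a standard‐library illustration of Spearman's rho (Pearson correlation of average ranks), not the code used in the review; the years and scores are hypothetical, and no p value is computed.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(rank_x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    spread_x = sum((a - mean_x) ** 2 for a in rank_x)
    spread_y = sum((b - mean_y) ** 2 for b in rank_y)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical study years and overall reliability scores (%).
years = [1975, 1981, 2008, 2013, 2016]
scores = [27, 20, 60, 33, 87]
rho = spearman_rho(years, scores)
print(round(rho, 2))
```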
Individual criteria scores
Data pertaining to individual criteria organized by organism type are presented in Table 4 and are visually represented in Supporting Information, Figures S11–S13. Bold numbers indicate that <50% of tests achieved a score of 1 in that criterion across all organism types. Criterion 7 (strain or source of test organism identified) was met the most frequently (n = 247 of 253 unique toxicity tests; 98%); followed by Criterion 12 (statistical methods described; n = 234; 92%); Criterion 8 (initial test organism characteristics described; n = 230; 91%); Criterion 5 (three or more test concentrations included; n = 217; 86%); Criterion 1 (source and purity of test substance reported; n = 212; 84%); Criterion 2 (any analytical confirmation of tested concentrations; n = 206; 81%); Criterion 14 (key measured raw values included; n = 159; 63%); Criterion 10 (relevant environmental parameters identified and reported; n = 148; 58%); Criterion 15 (control values reported and performance criteria met; n = 143; 57%); Criterion 6 (at least one exposure concentration < environmental concentrations; n = 138; 55%); Criterion 9 (standard protocol followed; n = 126; 50%); Criterion 11 (minimum of three replicates per exposure; n = 90; 36%); Criterion 3 (individual initial analytical confirmation of tested concentrations; n = 29; 11%); Criterion 4 (individual final analytical confirmation of tested concentrations; n = 28; 11%); and Criterion 13 (concentration–response model and parameters provided; n = 2; 1%).
Several organism types achieved a score of 1 for specific criteria in 100% of unique toxicity tests. These were algal tests in Criteria 1, 5, 7, and 11; fish tests in Criteria 7 and 12; mammalian tests in Criteria 6, 7, and 8; mollusk tests in Criteria 6, 7, 8, and 12; and other invertebrate tests in Criteria 5, 6, 7, 8, and 12. In addition, almost all fish tests achieved a score of 1 in Criteria 2 and 8 (98 and 97%, respectively). In contrast, several organism types did not achieve a score of 1 in any test for certain criteria. No algal tests achieved a score of 1 in Criteria 4, 10, 13, or 15 and likewise for crustacean tests in Criterion 13; mammal tests in Criteria 3, 9, 13, or 15; mollusk tests in Criterion 13; and other invertebrate tests in Criteria 9, 10, or 13.
Supporting Information, Figure S14 shows the percentage of unique toxicity tests, across all organism types, that achieved a score of 1 in each of the four critical criteria outlined in the rubric (criteria in bold text in Table 1). Of these four criteria considered critical components of a laboratory toxicity test for reliable use in risk assessment, the inclusion of at least three replicates (Criterion 11) was met in the fewest unique toxicity tests (n = 90, 36%). Eighty‐one percent of tests (n = 206) included some analytical confirmation of test concentrations (i.e., measured concentrations in stock solution prior to diluting to create test solutions or in test solution stocks prior to allocating into individual experimental units; Criterion 2). The inclusion of a minimum of three test concentrations (Criterion 5) was met in 86% of tests (n = 217), and an appropriate description of statistical methods (Criterion 12) was met in 92% of tests (n = 234). These data are further broken down by organism type in Supporting Information, Figure S15. The strongest study assessed for each organism type and test substance category combination is presented in Table 5.
Table 5.
Arctic ecotoxicology studies with the greatest reliability scores for each organism type tested in the present studyᵃ
| Organism type | Test substance category | Study no. | Study | Test substance | Test species | Marine or freshwater | Endpoints | Overall reliability score (%) |
|---|---|---|---|---|---|---|---|---|
| Fish | Oil‐related contaminant | 34 | **Nahrgang et al. (2016)** | Crude oil | Boreogadus saida | Marine | Cardiac activity and arrhythmia; Hatching; Malformations; Mortality | 87 |
| Fish | Inorganic contaminant | 30–33 | Moore (2016) | Metals, metal salts, and nitrogenous contaminants | Coregonus clupeaformis; Prosopium cylindraceum; Salvelinus alpinus; Salvelinus namaycush; Thymallus arcticus | Freshwater | Mortality | 33ᵇ |
| Fish | Organic contaminant | 40 | **Palace et al. (2001)** | PCB congeners | Thymallus arcticus | Freshwater | EROD activity; Osmolality; Thyroid status; Vitamin status | 60ᵇ |
| Bird | Oil‐related contaminant | NA | NA | NA | NA | NA | NA | NA |
| Bird | Inorganic contaminant | 2 | **Braune et al. (2012)** | Methylmercury | Sterna paradisaea; Uria lomvia | Marine | Survival to 90% of development | 80ᵇ |
| Bird | Organic contaminant | NA | NA | NA | NA | NA | NA | NA |
| Crustacean | Oil‐related contaminant | 15a | **Gardiner et al. (2013)** | Crude oil | Calanus glacialis | Marine | Mortality | 80 |
| Crustacean | Inorganic contaminant | 39 | **Overjordet et al. (2014)** | Mercury | Calanus glacialis; Calanus finmarchicus | Marine | Gene expression (GST mRNA); Mortality | 73 |
| Crustacean | Organic contaminant | 45b | Riebell and Percy (1989) | Phenol | Mysis oculata | Marine | Loss of equilibrium; Lying on dorsal side; Mortality; No movement without prodding | 12ᵇ |
| Algae | Oil‐related contaminant | 4 | **Camus et al. (2015)** | Produced water | Porosira glacialis | Marine | Growth inhibition | 67 |
| Algae | Inorganic contaminant | NA | NA | NA | NA | NA | NA | NA |
| Algae | Organic contaminant | NA | NA | NA | NA | NA | NA | NA |
| Mollusk | Oil‐related contaminant | 4 | **Camus et al. (2015)** | Produced water | Mytilus edulis | Marine | Development | 67 |
| Mollusk | Inorganic contaminant | 48 | Thyrring et al. (2015) | Lead | Mytilus edulis | Marine | Fatty acid composition; Mortality | 30ᵇ |
| Mollusk | Organic contaminant | NA | NA | NA | NA | NA | NA | NA |
| Mammal | Oil‐related contaminant | 8 | Engelhardt (1981) | Crude oil | Ursus maritimus | Marine | Electrolyte changes; Enzyme and metabolite changes; Erythrocytic changes; Hormonal changes; Leukocytic changes; Nitrogenous compound changes | 6ᵇ |
| Mammal | Inorganic contaminant | 14a and 14b | Frouin et al. (2012) | Mercury | Delphinapterus leucas | Marine | Cell viability; Intracellular thiol levels; Lymphocyte proliferation; Metallothionein induction | 23ᵇ |
| Mammal | Organic contaminant | 6a and 6b | **Desforges et al. (2017)** | Blubber‐derived contaminant cocktail | Cetacean spp.; Orcinus orca; Pagophilus groenlandicus; Seal spp.; Ursus maritimus; Cystophora cristata | Marine | Natural killer cell activity; Cell viability; Lymphocyte proliferation; Phagocytosis | 60 |
| Microbe | Oil‐related contaminant | 44 | **Petersen et al. (2008)** | Pyrene | Unidentified microbial consortium | Marine | Algal 14C incorporation; Chlorophyll‐a content; Phosphate flux; Potential nitrification; Potential oxygen consumption; Silicate flux | 60 |
| Microbe | Inorganic contaminant | NA | NA | NA | NA | NA | NA | NA |
| Microbe | Organic contaminant | NA | NA | NA | NA | NA | NA | NA |
| Other invertebrate | Oil‐related contaminant | 9 | Engelhardt (1983) | Crude oil | Strongylocentrotus droebachiensis | Marine | Curling of tube feet; Loss of attachment ability; Loss of cover; Mortality; Retraction of tube feet; Spine droop; Spine rigidity; Stimulus response | 30 |
| Other invertebrate | Oil‐related contaminant | 42a | Percy and Mullin (1975) | Crude oil | Halitholus cirratus | Marine | Mortality | 30 |
| Other invertebrate | Inorganic contaminant | NA | NA | NA | NA | NA | NA | NA |
| Other invertebrate | Organic contaminant | NA | NA | NA | NA | NA | NA | NA |

ᵃ Studies presented in bold are considered reliable for use in ecological risk assessment (overall reliability score ≥50%). Study number corresponds to the alphabetically ordered list of assessed studies in Supporting Information, Table S1.
ᵇ The only study for this category.
PCB = polychlorinated biphenyl; EROD = 7‐ethoxyresorufin O‐deethylase; NA = not available; GST = glutathione S‐transferase; mRNA = messenger RNA.
Ecological relevance scores
Relevance scores for all experimental combinations assessed in the present study are presented by organism type in Table 6. The mean relevance scores for fish tests (the greatest; 5.3 ± 1.4) and mammalian tests (the least; 2.0 ± 0) were each significantly different from those of all other organism groups and from each other (p < 0.05).
Table 6.
Count of relevance scores for all reported responses assessed in the present study (i.e., test substance/test species/endpoint combinations, n = 596), except birds and microbes, by organism type
| Relevance score | Crustacean (n = 276) | Fish (n = 173) | Alga (n = 17) | Mammal (n = 35) | Mollusk (n = 23) | Other invertebrate (n = 15) |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 10 | 2 | 0 | 0 | 0 | 0 |
| 2 | 2 | 18 | 0 | 35 | 3 | 0 |
| 3 | 60 | 0 | 0 | 0 | 9 | 11 |
| 4 | 33 | 17 | 15 | 0 | 1 | 0 |
| 5 | 20 | 4 | 0 | 0 | 1 | 0 |
| 6 | 151 | 132 | 2 | 0 | 9 | 4 |
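The mean relevance scores reported in the text can be recomputed from the counts in Table 6. The sketch below is illustrative only: it assumes that each reported value is an arithmetic mean with a sample standard deviation, which the source does not state explicitly, and the names `TABLE_6`, `summarize`, and `SUMMARY` are introduced here for illustration.

```python
from statistics import mean, stdev

# Counts of relevance scores 0-6 (list index = score), transcribed from Table 6.
TABLE_6 = {
    "Crustacean": [0, 10, 2, 60, 33, 20, 151],
    "Fish": [0, 2, 18, 0, 17, 4, 132],
    "Alga": [0, 0, 0, 0, 15, 0, 2],
    "Mammal": [0, 0, 35, 0, 0, 0, 0],
    "Mollusk": [0, 0, 3, 9, 1, 1, 9],
    "Other invertebrate": [0, 0, 0, 11, 0, 0, 4],
}


def summarize(counts):
    """Expand score counts into individual scores; return (n, mean, sample SD)."""
    scores = [score for score, k in enumerate(counts) for _ in range(k)]
    # Assumption: the "±" values in the text are sample standard deviations.
    return len(scores), mean(scores), stdev(scores)


SUMMARY = {group: summarize(counts) for group, counts in TABLE_6.items()}

if __name__ == "__main__":
    for group, (n, m, sd) in SUMMARY.items():
        print(f"{group}: n = {n}, mean = {m:.1f}, SD = {sd:.1f}")
```

Under these assumptions the recomputed group sizes and rounded means agree with the column headers of Table 6 and with the values quoted in the text (e.g., 5.3 for fish and 2.0 for mammals).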
Crustacean tests achieved the second highest mean relevance score (4.8 ± 1.5), followed by mollusks (4.2 ± 1.6), algae (4.2 ± 0.7), and other invertebrates (3.8 ± 1.4). No statistically significant differences were found between any organism groups other than those described. When assessed by test substance type, mean relevance scores for inorganic contaminants (5.5 ± 1.2), oil‐related contaminants (4.7 ± 1.6), and organic contaminants (2.3 ± 1.0) were all significantly different from one another (Supporting Information, Figure S16). Figure 6 shows the relationship between overall reliability and relevance scores for the data set as a whole (n = 596 experimental combinations). The points in the lower left quadrant indicate both low overall and low relevance scores (20% of experimental combinations). The upper left quadrant contains studies that used highly relevant endpoints but did not score well overall (59% of experimental combinations). The lower right quadrant indicates high overall reliability scores but with minimal relevance for use in risk assessment (7% of experimental combinations). The upper right quadrant includes studies that are considered highly relevant, achieved a score of 1 in all the critical categories outlined, and were generally well designed, conducted, and reported (14%). This relationship is broken down further by organism group in Supporting Information, Figure S17.
Figure 6.

Strength of methods and relevance scores for all reported responses in the present study (n = 596). Size of circle corresponds to number of studies. Numerical values in each quadrant refer to percentage of data points with a relevance score between 0 and 3 (inclusive) and an overall reliability score of <50% (bottom left quadrant) or >50% (bottom right quadrant) and a relevance score between 4 and 6 (inclusive) and an overall reliability score of <50% (top left quadrant) or >50% (top right quadrant).
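The quadrant assignment described for Figure 6 can be sketched as a simple classification. This is a minimal sketch, not the authors' procedure: the function names and example values are hypothetical, and because the caption leaves a reliability score of exactly 50% unassigned, the sketch assumes it counts as reliable, in line with the ≥50% threshold used for Table 5.

```python
from collections import Counter

RELIABILITY_CUTOFF = 50.0  # percent; assumption: exactly 50% counts as reliable
RELEVANCE_CUTOFF = 4       # relevance scores 4-6 fall in the upper quadrants


def quadrant(reliability_pct, relevance):
    """Assign one reported response to a Figure 6 quadrant."""
    if not 0 <= relevance <= 6:
        raise ValueError("relevance score must be between 0 and 6")
    vertical = "upper" if relevance >= RELEVANCE_CUTOFF else "lower"
    horizontal = "right" if reliability_pct >= RELIABILITY_CUTOFF else "left"
    return f"{vertical} {horizontal}"


def quadrant_percentages(responses):
    """Percentage of (reliability %, relevance) pairs falling in each quadrant."""
    counts = Counter(quadrant(rel, rev) for rel, rev in responses)
    total = sum(counts.values())
    return {name: 100 * k / total for name, k in counts.items()}


# Hypothetical responses, not data from the assessed studies.
EXAMPLE = [(87, 6), (33, 6), (33, 5), (60, 2), (6, 2)]

if __name__ == "__main__":
    print(quadrant_percentages(EXAMPLE))
```

Applied to the full set of 596 responses, a tally of this kind would yield the four quadrant percentages reported in the figure.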
DISCUSSION
The present review supports the data reliability concerns that have been raised by the ecotoxicological community in recent years (e.g., Ågerstrand et al., 2013; Breitholtz et al., 2006; Durda & Preziosi, 2000; Harris et al., 2014) pertaining to Arctic species toxicity testing. A specific sensitivity comparison between Arctic and temperate species was not a focus of the present review; however, it is a question that needs to be answered for environmental risk assessors conducting ERAs. Overall, relatively few Arctic species toxicity tests assessed in the present study achieved an overall reliability score deemed minimally acceptable for use in risk assessment (43 of 259 unique toxicity tests; 17%) without significant uncertainty; therefore, there are limited data available to use in a sensitivity comparison. Of those that achieved acceptable scores, there are many important groups of test species, test substances, and endpoints that are highly underrepresented relative to their potential for exposure in the Arctic. The results of the present study highlight gaps in collected data, clearly demonstrate the need for more methodologically robust studies, and provide a set of collected data that could be used to address these relative sensitivity questions in the future. In addition, the present study provides an opportunity to further refine the scoring rubric.
General recommendations for improving the conduct of future toxicity studies
Several methodological deficiencies across studies came to light during the scoring process. First, appropriate controls should always be included with every test, regardless of prior evidence obtained that supports the decision to exclude them. A common approach in the studies assessed in the present study was to indicate that a solvent was used to dissolve the test substance but that no solvent control was included because of evidence in previous studies that such concentrations did not significantly impact the organism. Errors in solution preparation, solvent stock preparation, or increased sensitivity of a particular group of test organisms cannot be definitively ascertained without the inclusion of these controls. Ultimately, each test compound may have a unique interaction with a solvent control that needs to be accounted for in the test design (Christou et al., 2020).
Second, reporting of environmental conditions needs significant improvement to ensure that tests can be repeated and results reproduced. For example, studies often reported the target temperature or lighting of exposure chambers without reporting actual measured values. Issues can arise with growth chambers, from temperature and lighting swings and inconsistencies spatially to faulty light bulbs that stop working midway through exposure to power failures that may not be recognized if no monitoring is conducted. In some cases, monitoring may have been conducted but the results not reported in the publication itself. Although this information may have been retrievable through contacting the authors to confirm whether or not certain monitoring was done or whether measured values were within the predefined range, it is unrealistic and impractical to do this when it could easily be included in the original paper. Just as it is critical to analytically confirm exposure concentrations to have confidence in the results, the user of the data will not have confidence that test conditions were within the predefined range for the duration of exposure if target conditions were not measured as well as reported.
Third, low exposure volumes can make it impossible to analytically confirm exposure concentrations (e.g., algae and copepod tests in Camus et al. [2015]). The approach taken in the present study was to estimate measured concentrations by using the average ratio of nominal to measured concentrations that were reported for other tests in the same study. This likely provides a more accurate estimate of the actual exposure than nominal concentrations alone; however, it does not give risk assessors the same confidence in the results as measured concentrations reported for individual experimental units would. Including larger‐volume surrogate vessels that are exposed for the same duration and under the same conditions as experimental units can be an option for these scenarios.
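The ratio‐based estimation described above can be illustrated as follows. This is a sketch under stated assumptions rather than the exact calculation used in the present study: it expresses the ratio as measured/nominal (a "recovery") and averages the per‐pair ratios, and all function names and concentration values are hypothetical.

```python
def mean_recovery(pairs):
    """Average measured/nominal ratio from tests that reported both values.

    pairs: iterable of (nominal, measured) concentrations in the same units.
    """
    ratios = [measured / nominal for nominal, measured in pairs]
    if not ratios:
        raise ValueError("at least one nominal/measured pair is required")
    return sum(ratios) / len(ratios)


def estimate_measured(nominal_concs, recovery):
    """Scale nominal concentrations by the average recovery."""
    return [conc * recovery for conc in nominal_concs]


# Hypothetical (nominal, measured) pairs from other tests in the same study.
REPORTED = [(10.0, 8.0), (100.0, 70.0), (1000.0, 750.0)]
RECOVERY = mean_recovery(REPORTED)

# Hypothetical nominal concentrations from a low-volume test without analytics.
ESTIMATED = estimate_measured([5.0, 50.0, 500.0], RECOVERY)

if __name__ == "__main__":
    print(f"mean recovery = {RECOVERY:.2f}")
    print("estimated measured concentrations:", ESTIMATED)
```

Because the estimate inherits the variability of the tests it borrows from, it remains a substitute for, not an equivalent of, confirmation in the individual units.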
Finally, and perhaps most importantly, it is imperative to stress that not all of the studies assessed in the present study claimed to have been conducted with the objective of generating toxicity data for use in ERA. Understanding the biological traits of Arctic species through a variety of study designs is equally as critical to achieving the overall goal of environmental sustainability as is generating toxicity data; however, the aim of the present study was to identify studies exclusively for the latter purpose. A poor score achieved through the application of this rubric does not necessarily mean that the study was not well conducted or of high value to science; rather, it implies that the study is not fully reliable for the sole function outlined a priori in the present study. For example, Criterion 5 indicates that a minimum of three concentrations must be tested. This is the minimum number of concentrations required to plot a concentration–response curve to characterize the range of toxicity to an organism. This is useful for ERA but may not be required to produce scientifically valuable results. If a study were assigned a score of 0 for this criterion because only a single concentration was tested to assess the mode of action of a compound, the data may not carry as much weight in the risk‐assessment sphere, though the results are important to science. Understanding the mechanism of action of various contaminants on exposure to Arctic species in Arctic conditions or generating data that will assist in our knowledge of biomarkers of exposure in the field is extremely relevant. These studies contribute to the foundation of knowledge that is required to advance the field of ecotoxicology and ERA and would likely render toxicity data less useful if they were not gathered. 
We recommend that this type of data continue to be collected, but it would be beneficial if studies were more explicit in their objectives as to the purpose of the data collection so that data collected for the purpose of informing risk assessment could be easily identified.
Specific recommendations to advance Arctic toxicity testing
The present study has highlighted several knowledge gaps and deficiencies in reporting that should be addressed for future published work. The recommendations should be taken into consideration by anyone planning, performing, or reviewing ecotoxicological data including researchers, funding agencies, and risk assessors.
Standardization of Arctic‐specific test methods
Knowledge Gap 1: Lack of standard toxicity and recovery test methods for Arctic species.
Recommendation 1: Strive toward developing standard laboratory tests (whole organism and in vitro), and in the absence of standard tests, provide a thorough description of what was modified from standard methods.
Prior to discussing the data set as a whole, it is important to note that standard toxicity test methods designed specifically for Arctic aquatic species do not currently exist in the literature. There have been techniques introduced by researchers for modeling Arctic‐specific toxicity data that take into account environmental, geophysical, and physiological factors specific to Arctic regions that partially bridge this gap; however, they are novel in their application (see Fahd et al., 2019). Many of the studies scored in the present review followed standard test methods as closely as possible but with modifications to environmental parameters, test durations, or validity criteria as necessary to accommodate the requirements of Arctic test organisms (see Camus et al., 2015; Faksness et al., 2011; Hansen et al., 2012; Riebell & Percy, 1989). However, several studies reported that standard methods were followed but with “modifications,” without detailing what was modified from the original test conditions. This renders the experiment impossible to reproduce or score effectively. Researchers should at least report a brief summary of the standard method conditions instead of referencing a separate document with or without modifications.
Developing and validating standard in vivo tests for Arctic regions is necessary because data based on modeling or collected via in vitro methods need to be validated by in vivo studies to have confidence in the results. In addition, there are many questions that cannot be answered with tests that do not expose the whole organism, such as behavioral and physiological interactions within the organism. The current regulatory infrastructure relies on whole organism–based endpoints (e.g., mortality) to determine hazardous concentrations in the environment, and thus these data are enormously helpful to ERA. In short, both in vivo and in vitro tests are important to conduct, each offering a range of benefits and drawbacks that the other does not.
With respect to in vitro tests specifically, emphasis should be placed on developing methods where the sensitivity, specificity, and exposure conditions of the assays are clearly understood and defined (Rehberger et al., 2018). The purpose of these tests, including how the data can be extrapolated to the ecosystem level, should be plainly expressed by researchers to facilitate understanding by environmental risk assessors regarding how the data can be used in the context of ERA.
Finally, understanding potential latent effects and/or the capacity of test organisms to recover postexposure is a critical component in extrapolating ecosystem‐level effects from lab‐based assays (Beketov & Liess, 2009; Pechenik, 2006); however, only a quarter of tests assessed in the present study included postexposure monitoring. This may be even more important for Arctic test organisms because a variety of biotic and abiotic factors specific to this ecosystem result in significant uncertainty regarding time to effect (Chapman & Riddle, 2005; Olsen et al., 2011). For example, longer toxicity response times for polar invertebrates relative to temperate ones have been observed for metal exposures. It has been hypothesized that this can be partially attributed to the temperature‐driven low metabolic rate of Arctic species as well as the longer development time of polar organisms relative to their temperate and tropical counterparts. In addition, sea‐ice cover is known to prevent mixing of sea layers, which increases the probability of long exposure times for under‐ice organisms (Chapman & Riddle, 2005).
Test species
Knowledge Gap 2: Lack of toxicity data for Arctic‐focused risk assessments in general, especially for some ecologically important groups.
Recommendation 2: Generate toxicity data using organisms that are likely the most sensitive first (i.e., algae). Emphasize the development and refinement of models and surrogates (e.g., temperate species) using these data, and confirm protection with ongoing toxicity testing.
The diversity of Arctic species that have been tested in the studies assessed in the present study is not directly proportionate to the diversity of species that live in this region, though there are some parallels. The Conservation of Arctic Flora and Fauna (2013) reported that of the approximately 7630 Arctic marine species (mammals, seabirds, fish, invertebrates, and algae) reported in their 2013 Arctic Biodiversity Assessment, 65.5% were invertebrates. In the present study the distribution of tested species aligned with these data because approximately 67% of marine species tested were invertebrates (n = 43 of 67 marine species). In contrast, of all fish species reported in the Arctic, approximately 34% were freshwater and diadromous, while the remaining 66% were marine. In the present study, 42% of tested fish species were from freshwater or diadromous environments, and 58% were marine. This was largely driven by a study that focused on the impacts of mining contaminants on freshwater fish (Moore, 2016). Marine algae were underrepresented in the assessed literature relative to environmental populations, with only 6 of 146 marine tests being conducted on these organisms (4% of marine tests), while marine algae are the second most diverse species group in the Arctic based on the same report (~30% of all marine species). The preference to study some species groups over others (i.e., taxonomic chauvinism) has been observed in ecotoxicology for pesticides and may occur for a variety of reasons (Prosser et al., 2021). Within Arctic‐specific toxicology, the lack of data using algae as test species may be due to the limited availability of ecologically important species in commercial culture collections.
Tests conducted with fish were plentiful; however, the majority were conducted by a single group (Moore, 2016), and thus there is limited reproducibility of these results. Scores for fish were also driven by this single study and, as mentioned, could potentially have been vastly different on secondary review. Further, only 17% of fish tests (n = 21 of 126) were conducted using marine species, an area of research where more testing would be beneficial because of the abundance and ecological importance of marine fish in the Arctic, their role as a country food for Indigenous people of the north, and the anticipated increase in Arctic shipping that will increase their likelihood of exposure to contaminants (Arctic Council, 2020; Darnis et al., 2012).
Relatively few data points were collected from mammals, mollusks, other invertebrates, microbes, and birds. Of these, we believe the priority should be placed on collecting algal data. With a relatively low number of data points derived from algae in the present review (n = 25), they also achieved the lowest scores in two of the three criteria groups: Group A (test substance) and Group C (test design, statistics, and results). For algae, it is a case of both lack of data and issues with reliability that need to be addressed.
The reasoning for prioritizing algal testing is twofold. The first reason is their relative sensitivity and availability. Algae have demonstrated sensitivity to many substances (Staveley & Smrchek, 2005) and are frequently used by environmental risk assessors to help define regulatory thresholds of contaminants in temperate ecosystems. They can also be cultured year‐round in the laboratory with minimal cost and infrastructure. Second, the importance of ice algae in Arctic ecosystems cannot be overstated. For example, diatoms such as Nitzschia frigida and Melosira arctica are drivers of production in the Arctic Ocean, forming the foundation of the food web when they bloom in spring and providing a food source for copepods and subsequently fish, mammals, and humans (Fernández‐Méndez et al., 2014; Gosselin et al., 1997; Michel et al., 1996; Søreide et al., 2010). These resources are extremely important in sustaining the structure and function of Arctic ecosystems, and they are severely underrepresented in Arctic ecotoxicology.
Another option to address the gap in Arctic species' toxicity testing would be to place emphasis on modeling and other alternative methods to estimate toxicity. For example, the Interspecies Correlation Estimation model (Raimondo et al., 2015) predicts the toxicity of contaminants to untested organisms based on surrogate species and has great potential to be refined further to increase its reliability (Bejarano, 2019; Bejarano & Wheeler, 2020; Bejarano et al., 2017b). In addition, species sensitivity distributions and hazard concentrations for Arctic and non‐Arctic species have been compared in the literature and have been reported to be protective of Arctic organisms for select compounds (e.g., hydrocarbons [Bejarano et al., 2017a]). Further generation of high‐quality toxicity data for other compound groups would contribute to these models, thus allowing risk assessors more confidence in extrapolating results from such data.
Though there are limited data, some comparisons of Arctic versus temperate species sensitivity have been made in the literature. For example, Olsen et al. (2011) compared sensitivities of Arctic and temperate organisms to 2‐methylnaphthalene and found that they were similarly sensitive. Similarly, hazardous concentrations of artificially produced water to Arctic and temperate fish have been found to be comparable (Camus et al., 2015). Hansen et al. (2014) found that neither Arctic nor temperate organisms were consistently the most sensitive to oil spill–response chemicals. Taken together, these studies provide no clear, repeatable evidence that one group is more sensitive than, less sensitive than, or similarly sensitive to the other, underlining the importance of generating high‐quality data to facilitate these comparisons effectively.
Test substances
Knowledge Gap 3: Lack of diversity in compounds tested on Arctic organisms.
Recommendation 3: Design repeatable and reliable experiments using standard reference toxicants so that results can be compared across space and time.
Knowledge Gap 4: Inconsistencies in oil spill–related research have led to a lack of comparability between studies, Arctic or otherwise.
Recommendation 4: Focus research on developing, validating, and refining laboratory methods and predictive effects models.
A wider variety of test substances that are representative of contaminants found in Arctic environments should be incorporated into future research. The vast majority of our knowledge thus far is related to oil spills and oil spill–remediation methods. There have been no studies assessing the effect of multiple stressors on Arctic organisms or other emerging contaminants besides a single study on microplastics (Rodríguez‐Torres et al., 2020). For example, the impact of contaminants found in wastewater has not been determined via first‐tier toxicity tests (e.g., pharmaceuticals and personal care products), nor has that of persistent organic pollutants (e.g., pesticides and industrial chemicals). Further, because regulations for Arctic activities change on the political, economic, and social scales, new contaminants of concern will continue to emerge that will require action by the ecotoxicological community. Data have been collected from field monitoring exercises (e.g., tissue concentrations from naturally exposed marine mammals compared to biomarkers of exposure within the organism; see Supporting Information, Table S1); however, this does not provide the data that risk assessors require to inform regulatory decision‐making under the current framework.
More than half of all collected data uses oil‐related contaminants alone. The vast estimates of untapped hydrocarbons in Arctic regions and increased shipping traffic over the last decade provide sound reasons to study the impacts of potential future oil spills in the marine environment (Ostreng et al., 2013; US Department of the Interior, 2008). The potential for Arctic organisms to be exposed to both local and foreign oils as a result of Arctic hydrocarbon exploration and fuel transportation exists and was well captured in the studies assessed in the present review. As of January 2020, the International Maritime Organization (2019) has implemented a new limit on fuel oil sulfur content to reduce sulfur oxide emissions from ships, which introduces a new potential contaminant threat to Arctic organisms and presents an opportunity for identifying additional test substances to use in toxicity testing. This underscores the importance of continuously adapting toxicity tests to ensure that the data being collected are needed to answer current questions being asked by environmental risk assessors as regulations change.
Though the question of oil‐related contaminant toxicity has been partially answered with the tests assessed in the present review, inconsistent methodologies and reporting in oil spill research have limited our ability to compare and use the data generated by these experiments. This issue has been raised by researchers in the past, and some have provided guidance for improving consistency in the future (see Bejarano, 2019; Redman & Parkerton, 2015). Deviations from the standard methods are common, often with limited or no description of what said deviations were. Although protocols do exist to address some of these difficulties (e.g., the Chemical Response to Oil Spills Ecological Effects Research Forum method [Aurand & Coelho, 2005]), the complex nature of oil as a test substance continues to provide challenges in exposure media preparation, characterization, and reporting. The myriad of compounds that make up crude oil render it problematic to characterize the toxicity of every one, and a wide variety of environmental, physical, and biological factors can have extreme influence on how the compounds interact with one another and with the test organism (Wang et al., 2021). Predictive models have become more popular in the last decade (Carroll & Smit, 2011; Hansen et al., 2019; McGrath et al., 2018) and are a promising approach to addressing this issue.
There were many test substances that were used across multiple studies on the same or similar organisms (e.g., crude oil tested on crustaceans [Busdosh & Atlas, 1977; Gardiner et al., 2013; Hansen et al., 2009], produced water tested on fish [Camus et al., 2015; Geraudie et al., 2014]); however, most compounds where data exist for similar species across multiple studies are difficult or impossible to compare because the test substances are complex mixtures that differ depending on their source. Pyrene is one exception; this compound was tested on crustaceans in six studies (Grenvald et al., 2013; Hjorth & Nielsen, 2011; Jensen et al., 2008; Nørregaard et al., 2014; Toxværd et al., 2019) but with major differences in test organism and endpoint parameters (i.e., life stages, feeding strategies [starved/feeding], ECx values calculated at various time points that do not temporally align) that again render comparison difficult. This variability demonstrates that risk assessors would benefit from the generation of a bank of Arctic‐specific toxicity data that use standard reference toxicants so that tests can be shown to be repeatable. Environment Canada (1990) has outlined several criteria to determine the acceptability of various reference toxicants, including the ability to detect abnormal organisms, their solubility and stability over time, their ease of analysis, and whether they have an established toxicity database. The inclusion of reference toxicants in toxicity testing not only facilitates inter‐ and intralaboratory comparisons of results through space and time but also enables laboratories to determine the precision of their toxicity tests (Environment Canada, 1990). This will also ensure that results can be confidently compared to understand where the sensitivity of Arctic organisms lies relative to standard, temperate species when they are employed as surrogates.
Endpoints
Knowledge Gap 5: Lack of highly relevant sublethal and chronic endpoints assessed in Arctic toxicity testing.
Recommendation 5: Prioritize the incorporation of these endpoints into future Arctic toxicity test protocols.
Mortality was used as an endpoint in more than half of all experimental combinations. Although these data are necessary and useful to collect, sublethal and chronic responses are severely underrepresented. Understanding the threshold of concentrations where organisms will die (e.g., median lethal concentration values) is critical for risk assessment; however, it is equally, if not more, important to prevent contaminant concentrations from reaching these levels in the first place. Mortality may be the most highly relevant endpoint identified in this rubric for risk assessments, but chronic exposures provide information that allows risk assessors to remain as conservative as possible in their recommendations to regulatory decision‐makers. The USEPA (1994) has highlighted the ecological significance of laboratory‐derived sublethal effect data when extrapolated to field conditions. Growth, developmental, and reproductive changes from a population's norm have the capability to drastically alter a population's biomass and age structure. Examples of sublethal endpoints that have been recommended in the literature for future study range from the organism level (e.g., avoidance, courtship, migratory, and rearing behavior) to the community, ecosystem, and landscape levels (e.g., nutrient retention, decomposition rate, resilience and recovery [USEPA, 2016]). The only organism type from studies assessed in the present review that was primarily studied for nonlethal endpoints was the other invertebrate category, for which behavioral endpoints comprised most of the data; therefore, algae, birds, crustaceans, fish, mammals, microbes, and mollusks are lacking these data. Where some work has already been started on developing methods to test sublethal endpoints with Arctic species (e.g., reproductive tests with the copepod C. glacialis [Jensen et al., 2008] and the fish Mallotus villosus [Frantzen et al., 2012] or behavioral tests with the mollusk Mya truncata and the echinoderm Strongylocentrotus droebachiensis [Engelhardt, 1983]), these species would be excellent candidates for further developing and refining said methods to produce high‐quality, reliable data. A summary of the key knowledge gaps and recommendations can be found in Supporting Information, Table S5.
Critique of and modifications to the rubric
The scoring rubric used in the present study was modified from that described in Hanson et al. (2019) for the herbicide atrazine and underwent modifications based on the specific Arctic ecotoxicological data that had been collected. This ensured that the spreadsheet was appropriately designed for Arctic‐specific data and included space for a range of exposure scenarios. That said, there were still studies that did not fit the spreadsheet fully and could not be objectively scored with total confidence according to the criteria set out in advance (e.g., in vitro studies, abbreviated abstracts that purposefully did not include all relevant methodologies but were not published in full elsewhere, studies that did not include three replicates but were still statistically sound). These studies would benefit from a secondary or consensus review where expert judgment could be used to determine scores more accurately. Though potential issues and modifications to the spreadsheet for the future are noted, a full secondary review is outside of the scope of the present study.
First, a minimum of three replicates per exposure is necessary to meet Criterion 11. Achieving a score of 0 in this criterion, because it is considered critical to the study, results in the application of a multiplication factor of 0.5 to the overall reliability score. Although appropriate replication is in fact critical to the study, three replicates are not an absolute requirement, depending on study design (e.g., studies using a linear regression approach per the Environment Canada [2000] test method), to be deemed appropriate replication. For example, the fact that relatively few experiments met this criterion was largely driven by a single study that produced 104 of the 126 fish tests (Moore, 2016). That study followed the Environment Canada test method, which requires 20 fish to be placed in a single bucket with no replicate buckets for each concentration, a method that has been deemed statistically appropriate and acceptable because of the statistical power of the test. Because of the strictness of the criterion, that study resulted in an overall reliability score of 5/15 after the multiplication factor was applied for scoring 0 in Criterion 11, when on secondary review it may have been scored as 10/15. If the criterion indicated that a minimum of three replicates per exposure was required or that justification of an otherwise appropriate statistical approach could still warrant a score of 1, the study would have been considered reliable for use in risk assessment (i.e., overall reliability score >50%).
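The arithmetic behind this example can be sketched in a few lines of Python. This is a minimal illustration, not the rubric's actual spreadsheet logic: the function name and its arguments are hypothetical, and only the 0.5 multiplication factor, the 15-point scale, and the >50% reliability threshold are taken from the text above.

```python
def overall_reliability(raw_points, max_points=15, critical_failed=False, factor=0.5):
    """Return (adjusted score, fraction of maximum, considered reliable?).

    Assumption: the multiplication factor is applied once to the raw score
    when a critical criterion (here, Criterion 11) scores 0.
    """
    adjusted = raw_points * factor if critical_failed else raw_points
    fraction = adjusted / max_points
    # Threshold from the text: overall reliability score > 50%
    return adjusted, fraction, fraction > 0.5


# Hypothetical inputs mirroring the example in the text (raw score of 10/15)
strict = overall_reliability(10, critical_failed=True)    # factor applied
lenient = overall_reliability(10, critical_failed=False)  # replication accepted as justified

print(strict)
print(lenient)
```

Under these assumptions the same study moves from 5/15 (below the threshold) to 10/15 (above it) depending solely on how Criterion 11 is interpreted.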
Second, the rubric includes three separate criteria related to chemical analyses of exposure media: any analytical confirmation and reporting of tested concentrations (e.g., stock solutions), initial concentration in individual test units reported, and final concentrations in individual test units reported. While this is useful and sometimes necessary information to report, the necessity should be determined based on the test compound, with an opportunity to adjust the score if required. For example, many crude oils contain compounds that can volatilize relatively quickly, such as toluene and benzene; thus, exposure media using oil as a test substance can change significantly over the duration of the test, and these toxicity‐inducing compounds may no longer be present (Rajabi et al., 2020). In this scenario, it is necessary to chemically analyze the media at both test initiation and termination to have confidence in the exposure. However, metals such as copper and zinc have been used extensively in the ecotoxicological literature and have been recommended as reference toxicants by Environment Canada (1990) because of their demonstrated ability to maintain consistent concentrations over long periods of time, among other criteria. In this scenario, confirming concentrations at the start of exposure is likely sufficient for the reasons we have outlined. In future assessments of data reliability, a table could be created that outlines appropriate chemical analyses and reporting requirements for different classes of compounds based on best available scientific practice and need.
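One way to picture such a table is as a simple lookup keyed by compound class. The sketch below is hypothetical: the class names, field names, and the conservative default for unlisted classes are illustrative assumptions, and only the two cases discussed above (crude oil with volatile components, and stable metals such as copper and zinc) are filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisRequirement:
    initial: bool  # measure concentrations in test units at test initiation
    final: bool    # measure concentrations in test units at test termination


# Illustrative entries only, based on the two scenarios described in the text
REQUIREMENTS = {
    "crude_oil": AnalysisRequirement(initial=True, final=True),
    "stable_metal": AnalysisRequirement(initial=True, final=False),
}

# Assumption: compound classes not yet in the table default to the strictest requirement
DEFAULT = AnalysisRequirement(initial=True, final=True)


def required_analyses(compound_class):
    """Look up the analyses a study would need to report for a compound class."""
    return REQUIREMENTS.get(compound_class, DEFAULT)


print(required_analyses("crude_oil"))
print(required_analyses("stable_metal"))
```

A real table would need entries justified by best available scientific practice for each compound class, as the text notes.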
Third, the rubric did not lend itself well to the assessment and scoring of studies conducted in vitro. The purpose and approach of these tests can be significantly different from a whole‐organism toxicity test, resulting in fundamental differences in what criteria may be considered critical to the reliability of the study. For example, information required to be reported for whole‐organism toxicity tests may include environmental parameters such as temperature, light intensity, and photoperiod; husbandry practices including media renewal and feeding strategies; and control performance criteria. Most of these requirements are not applicable for tests conducted on cell lines because the test organism husbandry element has been removed, so objective scoring becomes difficult. In addition, in vitro tests typically compare treatments to unexposed cell lines used as controls (see Desforges et al., 2017). This allows results to be expressed as percentages of control values; however, there should be control performance criteria for in vitro tests that allow researchers to compare both treated and control cell lines to a predefined value. For this reason, these tests did not achieve a score of 1 in Criterion 15 (control values reported and performance criteria met) even though the control performance may have been acceptable for this type of test. In future applications of this scoring rubric, in vivo and in vitro studies should be assessed separately according to criteria that have been predefined and are specific to each group. Several tools exist for evaluating data collected via in vitro methods, including ToxRTool and SciRAP, which could be useful in informing this aspect of the rubric in the future (Roth et al., 2021; Schneider et al., 2009).
Last, it would be beneficial to develop more distinct criteria in future versions of this rubric or for scoring in general. For example, Criterion 12 (statistical methods described) indicates that the appropriate test must be reported and appropriate controls employed. This criterion could be split into two separate criteria so that each of these issues could be addressed separately. As it stands, if the test employed was not described appropriately or if appropriate controls were not included, the study does not achieve a score of 1 in that criterion when it may have actually met one or the other. Implementing these changes would give risk assessors a more accurate picture of the experiment with all relevant details easily identifiable from the rubric. Ultimately there is a balance that needs to be struck between interrogating all aspects of a study with expediency and a focus on those that speak most directly to methodological strength and reliability for risk assessment.
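The information lost by a combined criterion can be shown with a small sketch. The function and field names below are hypothetical and scoring is simplified to 0 or 1; the code only illustrates the proposed split and does not reproduce the rubric itself.

```python
def combined_criterion_12(test_described, controls_appropriate):
    """Current form: a score of 1 requires both conditions to be met."""
    return 1 if (test_described and controls_appropriate) else 0


def split_criterion_12(test_described, controls_appropriate):
    """Proposed form: each condition is scored on its own."""
    return {
        "statistical_test_described": int(test_described),
        "appropriate_controls": int(controls_appropriate),
    }


# A hypothetical study that describes its statistical test but lacks appropriate controls
print(combined_criterion_12(True, False))
print(split_criterion_12(True, False))
```

With the combined form, this study is indistinguishable from one that met neither condition; the split form keeps that distinction visible to the risk assessor.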
Despite the uncertainties resulting from the application of the scoring rubric discussed, the overall objective of the exercise was met. The transparent nature of defining criteria a priori meant that once the method was defined, it was followed for all studies to ensure consistency. This resulted in some unavoidable rigidity, and there could be changes to several scores on the implementation of the suggestions we have described; however, the original scores and rubric have been included to assist other researchers in assessing and modifying, as needed, the reliability of studies for use in ERA as they deem fit. The process of developing a method such as this is iterative and can only be improved after working with it and acknowledging uncertainties that may arise.
CONCLUSION
We found that application of the scoring rubric identified methodologically strong, reliable Arctic ecotoxicology studies that could be prioritized for use in ERA and regulatory guidelines. The exercise determined that the vast majority of studies have significant weaknesses and uncertainties and should likely be recognized as such when aiming to inform regulatory decisions. This highlights the need for risk assessors and ecotoxicology researchers in general to critically evaluate the reliability of data before any decision‐making occurs. To save time, energy, and resources, we encourage researchers to consult a rubric such as the one presented in the present study before designing their own experiments, both to confirm that the toxicity test itself is needed and to ensure that minimum experimental and reporting requirements for high‐quality data are met. Performing a new test that is poorer in design and methodology than past work does not move environmental protection forward but rather undermines the effort and should be avoided.
Disclaimer
The authors have no known conflicts of interest related to the present study.
Author Contributions Statement
Rebecca J. Eldridge, Benjamin P. de Jourdan, and Mark L. Hanson conceived and designed the review. Rebecca J. Eldridge performed the review and statistical analysis and wrote the manuscript. Benjamin P. de Jourdan and Mark L. Hanson provided editorial and technical assistance.
Supporting information
This article includes online‐only Supporting Information.
Acknowledgment
We thank the funders of the present review, specifically the Multi‐Partner Research Initiative of Fisheries and Oceans Canada, POLAR Knowledge Canada, and the Clayton H. Riddell Faculty of Environment Endowment Fund, as well as J. Anderson for helping to facilitate the scoring exercise. Finally, thank you to the two anonymous reviewers for taking their time to review our manuscript and provide helpful comments that contributed to the improvement of this manuscript.
Data Availability Statement
Data, associated metadata, and calculation tools are available from the corresponding author, as well as in the Supporting Information.
REFERENCES
- Ågerstrand, M., Breitholtz, M., & Rudén, C. (2011). Comparison of four different methods for reliability evaluation of ecotoxicity data—A case study of non‐standard test data used in environmental risk assessments of pharmaceutical substances. Environmental Sciences Europe, 23, Article 17.
- Ågerstrand, M., Edvardsson, L., & Rudén, C. (2013). Bad reporting or bad science? Systematic data evaluation as a means to improve the use of peer‐reviewed studies in risk assessments of chemicals. Human and Ecological Risk Assessment: An International Journal, 20(6), 1427–1445.
- Ågerstrand, M., Kuster, A., Bachmann, J., Breitholtz, M., Ebert, I., Rechenberg, B., & Rudén, C. (2011). Reporting and evaluation criteria as means toward a transparent use of ecotoxicity data for environmental risk assessment of pharmaceuticals. Environmental Pollution, 159(10), 2487–2492.
- Arctic Council. (2020). First Arctic shipping status report from PAME working group highlights increase in Arctic shipping traffic. https://arctic-council.org/en/news/first-arctic-shipping-status-report-increase-shipping-traffic/
- Arctic Monitoring and Assessment Programme. (2017). Adaptation actions for a changing Arctic: Perspectives from the Bering‐Chukchi‐Beaufort region. https://www.amap.no/documents/download/2993/inline
- Aulanier, F., Simard, Y., Roy, N., Gervaise, C., & Bandet, M. (2017). Effects of shipping on marine acoustic habitats in Canadian Arctic estimated via probabilistic modelling and mapping. Marine Pollution Bulletin, 125(1–2), 115–131.
- Aune, M., Aschan, M. M., Greenacre, M., Dolgov, A. V., Fossheim, M., & Primicerio, R. (2018). Functional roles and redundancy of demersal Barents Sea fish: Ecological implications of environmental change. PLoS One, 13(11), Article e0207451.
- Aurand, D., & Coelho, G. (Eds.). (2005). Cooperative aquatic toxicity testing of dispersed oil and the “Chemical Response to Oil Spills: Ecological Effects Research Forum (CROSERF)” (Technical Report 07‐03). Ecosystem Management & Associates.
- Bejarano, A. C. (2019). Further development and refinement of interspecies correlation estimation models for current‐use dispersants. Environmental Toxicology and Chemistry, 38(8), 1682–1691.
- Bejarano, A. C., Gardiner, W. W., Barron, M. G., & Word, J. Q. (2017a). Relative sensitivity of Arctic species to physically and chemically dispersed oil determined from three hydrocarbon measures of aquatic toxicity. Marine Pollution Bulletin, 122(1–2), 316–322.
- Bejarano, A. C., Raimondo, S., & Barron, M. G. (2017b). Framework for optimizing selection of interspecies correlation estimation models to address species diversity and toxicity gaps in an aquatic database. Environmental Science & Technology, 51(14), 8158–8165.
- Bejarano, A. C., & Wheeler, J. R. (2020). Scientific basis for expanding the use of interspecies correlation estimation models. Integrated Environmental Assessment and Management, 16(4), 528–530.
- Beketov, M. A., & Liess, M. (2009). Acute and delayed effects of the neonicotinoid insecticide thiacloprid on seven freshwater arthropods. Environmental Toxicology and Chemistry, 27(2), 461–470.
- Borgert, C. J., Becker, R. A., Carlton, B. D., Hanson, M., Kwiatkowski, P. L., Marty, M. S., McCarty, L. S., Quill, T. F., Solomon, K., van der Kraak, G., Witorsch, R. J., & Yi, K. D. (2016). Does GLP enhance the quality of toxicological evidence for regulatory decisions? Toxicological Sciences, 151(2), 206–213.
- Brain, R. A., & Hanson, M. L. (2021). The press sells newspapers, we should not sell ecotoxicology. Environmental Toxicology and Chemistry, 40(5), 1239.
- Braune, B. M., Scheuhammer, A. M., Crump, D., Jones, S., Porter, E., & Bond, D. (2012). Toxicity of methylmercury injected into eggs of thick‐billed murres and Arctic terns. Ecotoxicology, 21, 2143–2152.
- Breitholtz, M., Rudén, C., Hansson, S. O., & Bengtsson, B.‐E. (2006). Ten challenges for improved ecotoxicological testing in environmental risk assessment. Ecotoxicology and Environmental Safety, 63, 324–335.
- Brocke, J., Simons, A., Niehaves, B., & Reimer, K. (2009, June 8–10). Reconstructing the giant: On the importance of rigour in documenting the literature search process [Paper presentation]. 17th European Conference on Information Systems (ECIS), Verona, Italy.
- Brown, R. G. B. (1980). Seabirds as marine animals. In J. Burger, B. L. Olla, & H. E. Winn (Eds.), Behaviour of marine animals. Springer. https://doi.org/10.1007/978-1-4684-2988-6_1
- Burden, N., Benstead, R., Benyon, K., Clook, M., Green, C., Handley, J., Harper, N., Maynard, S. K., Mead, C., Pearson, A., Ryder, J., Sheahan, D., van Egmond, R., Wheeler, J. R., & Hutchinson, T. H. (2020). Key opportunities to replace, reduce, and refine regulatory fish acute toxicity tests. Environmental Toxicology and Chemistry, 39(10), 2076–2089.
- Busdosh, M., & Atlas, R. M. (1977). Toxicity of oil slicks to Arctic amphipods. Arctic, 30(2), 85–92.
- Campbell, L. M., Norstrom, R. J., Hobson, K. A., Muir, D. C. G., Backus, S., & Fisk, A. T. (2005). Mercury and other trace elements in a pelagic Arctic marine food web (Northwater Polynya, Baffin Bay). Science of the Total Environment, 351–352, 247–263.
- Camus, L., Brooks, S., Geraudie, P., Hjorth, M., Nahrgang, J., Olsen, G. H., & Smit, M. G. D. (2015). Comparison of produced water toxicity to Arctic and temperate species. Ecotoxicology and Environmental Safety, 113, 248–258.
- Carroll, J., & Smit, M. (2011, February 22–24). An integrated modeling framework for decision support in ecosystem management: Case study Lofoten/Barents Sea [Paper presentation]. SPE European Health, Safety, and Environmental Conference in Oil and Gas Exploration and Production, Vienna, Austria.
- Chapman, P. M., & Riddle, M. J. (2003). Missing and needed: Polar marine ecotoxicology. Marine Pollution Bulletin, 46(8), 927–928.
- Chapman, P. M., & Riddle, M. J. (2005). Polar marine toxicology—Future research needs. Marine Pollution Bulletin, 50(9), 905–908.
- Christou, M., Kavaliauskis, A., Ropstad, E., & Fraser, T. W. K. (2020). DMSO effects larval zebrafish (Danio rerio) behaviour, with additive and interaction effects when combined with positive controls. Science of the Total Environment, 709, Article 134490.
- Clarivate Analytics. (2021). Journal Citation Reports Sciences Edition.
- Conservation of Arctic Flora and Fauna. (2013). Arctic biodiversity assessment: Status and trends in Arctic biodiversity.
- Darnis, G., Robert, D., Pomerleau, C., Link, H., Archambault, P., Nelson, R. J., Geoffroy, M., Tremblay, J.‐E., Lovejoy, C., Ferguson, S. H., Hunt, B. P. V., & Fortier, L. (2012). Current state and trends in Canadian Arctic marine ecosystems: II. Heterotrophic food web, pelagic‐benthic coupling, and biodiversity. Climatic Change, 115(1), 179–205.
- Desforges, J.‐P., Levin, M., Jasperse, L., De Guise, S., Eulaers, I., Letcher, R. J., Acquarone, M., Nordoy, E., Folkow, L. P., Jensen, T. H., Grondahl, C., Bertelsen, M. F., Leger, J. S., Almunia, J., Sonne, C., & Dietz, R. (2017). Effects of polar bear and killer whale derived contaminant cocktails on marine mammal immunity. Environmental Science & Technology, 51, 11431–11439.
- Durda, J. L., & Preziosi, D. V. (2000). Data quality evaluation of toxicological studies used to derive ecotoxicological benchmarks. Human and Ecological Risk Assessment, 6(5), 747–765.
- Engelhardt, F. R. (1981). Oil pollution in polar bears: Exposure and clinical effects. In Proceedings of the Arctic Marine Oil Spill Program Technical Seminar, Edmonton, AB, June 16–18, 1981. Environment Canada, pp. 139–179.
- Engelhardt, F. R. (1983). Behavioural responses of benthic invertebrates exposed to dispersed crude oil. In Proceedings of the Arctic Marine Oil Spill Program Technical Seminar, Edmonton, AB, June 14–16, 1983. Environment Canada, pp. 32–51.
- Environment Canada. (1990). Guidance document on control of toxicity test precision using reference toxicants (Report EPS 1/RM/12).
- Environment Canada. (2000). Biological test method: Acute lethality of effluents to rainbow trout (Report EPS 1/RM/13).
- Fahd, F., Veitch, B., & Khan, F. (2019). Arctic marine fish “biotransformation toxicity” model for ecological risk assessment. Marine Pollution Bulletin, 142, 408–418.
- Faksness, L.‐G., Borseth, J. F., Baussant, T., Hansen, B. H., Altin, D., Tandberg, A. H. S., Ingvarsdottir, A., Aarab, N., & Nordtug, T. (2011). The effects of different oil spill cleanup technologies on body burden and biomarkers in Arctic marine organisms—A laboratory study. In Proceedings of the Arctic Marine Oil Spill Program Technical Seminar, Banff, AB, October 4–6, 2011. Environment Canada, pp. 738–758.
- Falfushynska, H., Sokolov, E. P., Haider, F., Opperman, C., Kragl, U., Ruth, W., Stock, M., Glufke, S., Winkel, E. J., & Sokolova, I. M. (2019). Effects of a common pharmaceutical, atorvastatin, on energy metabolism and detoxification mechanisms of a marine bivalve Mytilus edulis. Aquatic Toxicology, 208, 47–61.
- Fernández‐Méndez, M., Wenzhöfer, F., Peeken, I., Sørensen, H. L., Glud, R. N., & Boetius, A. (2014). Composition, buoyancy regulation and fate of ice algal aggregates in the central Arctic Ocean. PLoS One, 9(9), Article e107452.
- Frantzen, M., Falk‐Petersen, I.‐B., Nahrgang, J., Smith, T. J., Olsen, G. H., Hangstad, T. A., & Camus, L. (2012). Toxicity of crude oil and pyrene to the embryos of beach spawning capelin (Mallotus villosus). Aquatic Toxicology, 108, 42–52.
- Frouin, H., Loseto, L. L., Stern, G. A., Haulena, M., & Ross, P. S. (2012). Mercury toxicity in beluga whale lymphocytes: Limited effects of selenium protection. Aquatic Toxicology, 109, 185–193.
- Gardiner, W. W., Word, J. Q., Word, J. D., Perkins, R. A., McFarlin, K. M., Hester, B. W., Word, L. S., & Ray, C. M. (2013). The acute toxicity of chemically and physically dispersed crude oil to key arctic species under arctic conditions during the open water season. Environmental Toxicology and Chemistry, 32(10), 2284–2300.
- Geraudie, P., Nahrgang, J., Forget‐Leray, J., Minier, C., & Camus, L. (2014). In vivo effects of environmental concentrations of produced water on the reproductive function of polar cod (Boreogadus saida). Journal of Toxicology and Environmental Health, Part A, 77(9–11), 557–573.
- Gosselin, M., Levasseur, M., & Wheeler, P. A. (1997). New measurements of phytoplankton and ice algal production in the Arctic Ocean. Deep Sea Research Part II: Topical Studies in Oceanography, 44(8), 1623–1625, 1627–1644.
- Government of Canada. (2019). Environmental risk assessment. https://www.canada.ca/en/health-canada/services/consumer-product-safety/pesticides-pest-management/public/protecting-your-health-environment/pesticide-registration-process/reviews/environmental-risk-assessment.html
- Grenvald, J. C., Nielsen, T. G., & Hjorth, M. (2013). Effects of pyrene exposure and temperature on early development of two co‐existing Arctic copepods. Ecotoxicology, 22, 184–198.
- Hansen, B., Altin, D., Bonaunet, K., & Øverjordet, I. (2014). Acute toxicity of eight oil spill response chemicals to temperate, boreal, and Arctic species. Journal of Toxicology and Environmental Health, Part A, 77, 495–505.
- Hansen, B. H., Altin, D., Olsen, A. J., & Nordtug, T. (2012). Acute toxicity of naturally and chemically dispersed oil on the filter‐feeding copepod Calanus finmarchicus. Ecotoxicology and Environmental Safety, 86, 38–46.
- Hansen, B. H., Altin, D., Rørvik, S., Øverjordet, I., Jager, T., & Nordtug, T. (2013). Acute exposure of water soluble fractions of marine diesel on Arctic Calanus glacialis and boreal Calanus finmarchicus: Effects on survival and biomarker response. Science of the Total Environment, 449, 276–284.
- Hansen, B. H., Nordtug, N., Altin, D., Booth, A., Hessen, K. M., & Olsen, A. J. (2009). Gene expression of GST and CYP330A1 in lipid‐rich and lipid poor female Calanus finmarchicus (Copepoda: Crustacea) exposed to dispersed crude oil. Journal of Toxicology and Environmental Health, Part A, 72, 131–139.
- Hansen, B. H., Parkerton, T., Nordtug, T., Storseth, T. R., & Redman, A. (2019). Modeling the toxicity of dissolved crude oil exposures to characterize the sensitivity of cod (Gadus morhua) larvae and role of individual and unresolved hydrocarbons. Marine Pollution Bulletin, 138, 286–294.
- Hanson, M., Baxter, L., Anderson, J., Solomon, K., & Brain, R. (2019). Strength of methods assessment for aquatic primary producer toxicity data: A critical review of atrazine studies from the peer‐reviewed literature. Science of the Total Environment, 685, 1221–1239.
- Hanson, M., & Brain, R. (2020). Context and perspective in ecotoxicology. Environmental Toxicology and Chemistry, 39(9), 1655–1656.
- Hanson, M., & Brain, R. (2021). A method to screen for consistency of effect in laboratory toxicity tests: A case study anurans and the herbicide atrazine. Archives of Environmental Contamination and Toxicology, 81, 123–132.
- Hanson, M. L., Wolff, B. A., Green, J. W., Kivi, M., Panter, G. H., Warne, M. St. J., Ågerstrand, M., & Sumpter, J. P. (2017). How we can make ecotoxicology more valuable to environmental protection. Science of the Total Environment, 578, 228–235.
- Harris, C. A., Scott, A. P., Johnson, A. C., Panter, G. H., Sheahan, D., Roberts, M., & Sumpter, J. P. (2014). Principles of sound ecotoxicology. Environmental Science & Technology, 48, 3100–3111.
- Harris, C. A., & Sumpter, J. P. (2015). Could the quality of published ecotoxicological research be better? Environmental Science & Technology, 49(16), 9495–9496.
- Henry, T., & Pease, A. (2016). ET&C perspectives: A government perspective. Environmental Toxicology and Chemistry, 35(1), 16–18.
- Hjorth, M., & Nielsen, T. G. (2011). Oil exposure in the Arctic: Potential impacts on key zooplankton species. Marine Biology, 158(6), 1339–1347.
- Hobbs, D. A., Warne, M. S. J., & Markich, S. J. (2005). Evaluation of criteria used to assess the quality of aquatic toxicity data. Integrated Environmental Assessment and Management, 1, 174–180.
- Hsiao, S. C. (1978). Effects of crude oils on the growth of Arctic marine phytoplankton. Environmental Pollution, 17, 93–107.
- International Maritime Organization. (2019). IMO 2020—Cutting sulphur oxide emissions. https://www.imo.org/en/MediaCentre/HotTopics/Pages/Sulphur-2020.aspx
- Jensen, M. H., Nielsen, T. G., & Dahllöf, I. (2008). Effects of pyrene on grazing and reproduction of Calanus finmarchicus and Calanus glacialis from Disko Bay, West Greenland. Aquatic Toxicology, 87, 99–107.
- Kase, R., Korkaric, M., Werner, I., & Ågerstrand, M. (2016). Criteria for reporting and evaluating ecotoxicity data (CRED): Comparison and perception of the Klimisch and CRED methods for evaluating reliability and relevance of ecotoxicity studies. Environmental Sciences Europe, 28(1), 1–14.
- Klimisch, H. L., Andreae, M., & Tillmann, U. (1997). A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regulatory Toxicology and Pharmacology, 25, 1–5.
- Küster, A., Bachmann, J., Brandt, U., Ebert, I., Hickmann, S., Klein‐Goedicke, J., Maack, G., & Schmitz, S. (2009). Regulatory demands on data quality for the environmental risk assessment of pharmaceuticals. Regulatory Toxicology and Pharmacology, 55, 276–280.
- Lemcke, S., Holding, J., Moller, E. F., Thyrring, J., Gustavson, K., Juul‐Pedersen, T., & Sejr, M. K. (2018). Acute oil exposure reduces physiological process rates in Arctic phyto‐ and zooplankton. Ecotoxicology, 28, 26–36.
- McGrath, J. A., Fanelli, C. J., Di Toro, D. M., Parkerton, T. F., Redman, A. D., Paumen, M. L., Comber, M., Eadsforth, C. V., & den Haan, K. (2018). Re‐evaluation of target lipid model‐derived HC5 predictions for hydrocarbons. Environmental Toxicology and Chemistry, 37(6), 1579–1593.
- Michel, C., Legendre, L., Ingram, R. G., Gosselin, M., & Levasseur, M. (1996). Carbon budget of sea‐ice algae in spring: Evidence of a significant transfer to zooplankton grazers. Journal of Geophysical Research: Oceans, 101(C8), 18345–18360.
- Moermond, C., Beasley, A., Breton, R., Junghans, M., Laskowski, R., Solomon, K., & Zahner, H. (2016). Assessing the reliability of ecotoxicological studies: An overview of current needs and approaches. Integrated Environmental Assessment and Management, 13(4), 640–651.
- Moermond, C. T. A., Kase, R., Korkaric, M., & Ågerstrand, M. (2015). CRED: Criteria for reporting and evaluating ecotoxicity data. Environmental Toxicology and Chemistry, 35(5), 1297–1309.
- Moore, D. (2016). Toxicity of salts, metals, and nitrogenous contaminants to cold‐water fish under northern conditions [Unpublished doctoral dissertation]. University of Manitoba.
- Nahrgang, J., Dubourg, P., & Frantzen, M. (2016). Early life stages of an Arctic keystone species (Boreogadus saida) show high sensitivity to a water‐soluble fraction of crude oil. Environmental Pollution, 218, 605–614.
- Nørregaard, R. D., Nielsen, T. G., Moller, E. F., Strand, J., Espersen, L., & Mohl, M. (2014). Evaluating pyrene toxicity on Arctic key copepod species Calanus hyperboreus. Ecotoxicology, 23, 163–174.
- Office of the Federal Register, National Archives and Records Administration. (2021). 43 CFR 11—Natural resource damage assessments.
- Olsen, A. J., Nordtug, T., Altin, D., Lervik, M., & Hansen, B. J. (2013). Effects of dispersed oil on reproduction in the cold water copepod Calanus finmarchicus (Gunnerus). Environmental Toxicology and Chemistry, 32(9), 2045–2055.
- Olsen, G. H., Smit, M. G. D., Carroll, J., Jaeger, I., Smith, T., & Camus, L. (2011). Arctic versus temperate comparison of risk assessment metrics for 2‐methylnaphthalene. Marine Environmental Research, 72, 179–187.
- Øverjordet, I. B., Altin, D., Berg, T., Jenssen, B. M., Gabrielsen, G. W., & Hansen, B. H. (2014). Acute and sub‐lethal response to mercury in Arctic and boreal calanoid copepods. Aquatic Toxicology, 155, 160–165.
- Palace, V. P., Allen‐Gil, S. M., Brown, S. B., Evans, R. E., Metner, D. A., Landers, D. H., Lawrence, C. R., Klaverkamp, J. F., Baron, C. L., & Lockhart, W. L. (2001). Vitamin and thyroid status in arctic grayling (Thymallus arcticus) exposed to doses of 3,3′,4,4′‐tetrachlorobiphenyl that induce the phase I enzyme system. Chemosphere, 45(2), 185–193.
- Pančić, M., Köhler, E., Paulsen, M. L., Toxvaerd, K., Lacroix, C., Le Floch, S., Hjorth, M., & Nielsen, T. G. (2019). Effects of oil spill response technologies on marine microorganisms in the high Arctic. Marine Environmental Research, 151, Article 104785.
- Pechenik, J. A. (2006). Larval experience and latent effects—Metamorphosis is not a new beginning. Integrative and Comparative Biology, 46(3), 323–333.
- Percy, J. A., & Mullin, T. C. (1975). Effects of crude oils on Arctic marine invertebrates (Beaufort Sea Technical Report 11). Department of the Environment.
- Petersen, D. G., Reichenberg, F., & Dahllöf, I. (2008). Phototoxicity of pyrene affects benthic algae and bacteria from the Arctic. Environmental Science & Technology, 42, 1371–1376.
- Prosser, R. S., Deeth, L., Humeniuk, B., Jeyabalan, T., & Hanson, M. (2021). Taxonomic chauvinism in pesticide ecotoxicology. Environmental Toxicology and Chemistry. Advance online publication. https://doi.org/10.1002/etc.5227
- Raimondo, S., Lilavois, C. R., & Barron, M. G. (2015). Web‐based interspecies correlation estimation (WebICE) for acute toxicity: User manual (Ver 3.3, EPA/600/R‐15/192). US Environmental Protection Agency.
- Rajabi, H., Mosleh, M. H., Mandal, P., Lea‐Langton, A., & Sedighi, M. (2020). Emissions of volatile organic compounds from crude oil processing—Global emission inventory and environmental release. Science of the Total Environment, 727, Article 138654.
- Redman, A. D., & Parkerton, T. F. (2015). Guidance for improving comparability and relevance of oil toxicity tests. Marine Pollution Bulletin, 98, 156–170.
- Rehberger, K., Kropf, C., & Segner, H. (2018). In vitro or not in vitro: A short journey through a long history. Environmental Sciences Europe, 30(1), Article 23.
- Riebell, P. N., & Percy, J. A. (1989). Acute toxicity of petroleum hydrocarbons to the Arctic littoral mysid, Mysis oculata (Fabricus). In Proceedings of the Arctic Marine Oil Spill Program Technical Seminar, Calgary, AB, June 7–9, 1989. Environment Canada, pp. 973–1010.
- R Foundation for Statistical Computing. (2012). R: A language and environment for statistical computing.
- Rodríguez‐Torres, R., Almeda, R., Kristiansen, M., Rist, S., Winding, M. S., & Nielsen, T. G. (2020). Ingestion and impact of microplastics on Arctic Calanus copepods. Aquatic Toxicology, 228, Article 105631.
- Roth, N., Zilliacus, J., & Beronius, A. (2021). Development of the SciRAP approach for evaluating the reliability and relevance of in vitro toxicity data. Frontiers in Ecotoxicology, 3, Article 746430.
- Schneider, K. , Schwarz, M. , Burkholder, I. , Kopp‐Schneider, A. , Edler, L. , KinsnerOvaskainen, A. , Hartung, T. , & Hoffman, S. (2009). ToxRTool, a new tool to assess the reliability of toxicological data. Toxicology Letters, 189(2), 138–144. [DOI] [PubMed] [Google Scholar]
- Sonne, C. , Dietz, R. , Jenssen, B. M. , Lam, S. S. , & Letcher, R. J. (2021). Emerging contaminants and biological effects in Arctic wildlife. Trends in Ecology & Evolution, 36(5), 421–429. [DOI] [PubMed] [Google Scholar]
- Staveley, J. P. , Smrchek, J. C. , Blaise, C. , & Ferard, J.‐F. (Eds.). (2005). Algal Toxicity Test. Small‐scale freshwater toxicity investigations (Vol. 1). Springer, Dordrecht, pp. 181–202.
- Søreide, J. E. , Leu, E. V. A. , Berge, J. , Graeve, M. , & Falk‐Petersen, S. (2010). Timing of blooms, algal food quality and Calanus glacialis reproduction and growth in a changing Arctic. Global Change Biology, 16(11), 3154–3163. [Google Scholar]
- Thoré, E. S. J. , Steenaerts, L. , Philippe, C. , Grégoire, A. F. , Brendonck, L. , & Pinceel, T. (2019). Improving the reliability and ecological validity of pharmaceutical risk assessment: Turquoise killifish (Nothobranchius furzeri) as a model in behavioural ecotoxicology. Environmental Toxicology and Chemistry, 38(1), 262–270. [DOI] [PubMed] [Google Scholar]
- Thyrring, J. , Juhl, B. K. , & Holmstrup, M. (2015). Does acute lead (Pb) contamination influence membrane fatty acid composition and freeze tolerance in intertidal blue mussels in Arctic Greenland? Ecotoxicology, 24, 2036–2042. [DOI] [PubMed] [Google Scholar]
- Toxvӕrd, K. , Dinh, K. V. , Henriksen, O. , Hjorth, M. , & Nielsen, T. G. (2019). Delayed effects of pyrene exposure during overwintering on the Arctic copepod Calanus hyperboreus . Aquatic Toxicology, 217, Article 105332. [DOI] [PubMed] [Google Scholar]
- US Department of the Interior . (2008). Circum‐Arctic resource appraisal: Estimates of undiscovered oil and gas north of the arctic circle.
- US Environmental Protection Agency . (1994). Using toxicity tests in ecological risk assessment. Publication 9345‐0.05I. https://www.epa.gov/sites/default/files/2015-09/documents/v2no1.pdf
- US Environmental Protection Agency . (2003). A summary of general assessment factors for evaluating the quality of scientific and technical information (EPA 100/B‐03/001). https://www.epa.gov/risk/summary-general-assessment-factors-evaluating-quality-scientific-and-technical-information
- US Environmental Protection Agency (2012). Ecological effects test guidelines OCSPP 850.4500: Algal toxicity.
- US Environmental Protection Agency . (2016). General ecological assessment endpoints (GEAEs) for ecological risk assessment: Second edition with generic ecosystem services endpoints added (EPA/100/F15/05).
- US Environmental Protection Agency . (2021). ECOTOX User Guide: ECOTOXicology Knowledgebase System. Version 5.3. Available at http://www.epa.gov/ecotox
- Van der Kraak, G. J. , Hosmer, A. J. , Hanson, M. L. , Kloas, W. , & Solomon, K. R. (2014). Effects of atrazine in fish, amphibians, and reptiles: An analysis based on quantitative weight of evidence. Critical Reviews in Toxicology, 44(S5), 1–66. [DOI] [PubMed] [Google Scholar]
- Vergauwen, L. (2018). Application of adverse outcome pathways (AOPs) in ecotoxicology—A multi‐taxon AOP network to reduce animal testing. Toxicology Letters, 295(1), S36–S37. [Google Scholar]
- Wang, Z. , An, C. , Lee, K. , Owens, E. , Chen, Z. , Boufadel, M. , Taylor, E. , & Feng, Q. (2021). Factors influencing the fate of oil spilled on shorelines: A review. Environmental Chemistry Letters, 19, 1611–1628. [Google Scholar]
Supplementary Materials
This article includes online‐only Supporting Information.
Data Availability Statement
Data, associated metadata, and calculation tools are available from the corresponding author, as well as in the Supporting Information.
