Abstract
This study demonstrates how machine learning techniques can bridge data gaps in the ecotoxicological hazard assessment of chemical pollutants and illustrates how the results can be used in practice. The innovation herein consists of the prediction of the sensitivity of all species that were tested for at least one chemical for all chemicals based on all available data. As proof of concept, pairwise learning was applied to 3295 × 1267 (chemical,species) pairs of Observed LC50 data, where only 0.5% of the pairs have experimental data. This yielded more than four million Predicted LC50s for separate exposure durations. These were used to create (1) a novel Hazard Heatmap of Predicted LC50s, (2) Species Sensitivity Distributions (SSD) for all chemicals based on 1267 species each, as well as (3) for taxonomic groups separately, and (4) newly defined Chemical Hazard Distributions (CHD) for all species based on 3295 chemicals each. Validation results and graphical examples illustrate the utility of the results and highlight species and compound selection biases in the input data. The results are broadly applicable, ranging from Safe and Sustainable by Design (SSbD) assessments and setting protective standards to Life Cycle Assessment of products and assessing and mitigating impacts of chemical pollution on biodiversity.
Keywords: chemical, hazard, ecotoxicity, machine learning, pairwise learning, hazard heatmap, species sensitivity distribution, chemical hazard distribution, biodiversity, environmental quality standard, toxic pressure, safe and sustainable by design


Introduction
Chemical pollution and biodiversity loss are regarded as two of the three contemporary planetary crises. In response, the European Union has formulated a toxic-free environment as an aspirational goal, whereby the Safe and Sustainable by Design (SSbD) framework was introduced as a proactive approach to the design of safer chemicals. This expands on existing regulatory frameworks that aim to protect (e.g., REACH) or restore environmental quality (e.g., Water Framework Directive) and to lower the life cycle impacts of products (e.g., Product Environmental Footprint). All of these approaches require hazard data, but data availability is often a limiting factor in practice, since only approximately 3.5% of chemicals in trade have sufficient data to derive an SSD (there are approximately 350 thousand chemicals in trade, of which 12,000 have estimated SSDs). Research efforts focus on a relatively small subset of species. Moreover, SSbD assessments concern early innovation stages that are characterized by data scarcity. We aimed to support all assessment formats by developing a method to bridge the existing data gaps.
The method developed in the present study is based on the “lock-and-key analogy”. This analogy visualizes interactions between enzymes, toxic chemicals, or pharmaceuticals and their target sites of action within organism tissues as a key fitting in a lock. Missing ecotoxicological data for nontested (species,chemical) pairs can be generated by employing Machine Learning techniques on a large set of test data, based on the lock-and-key analogy (e.g., Viljanen et al., von Borries et al., and Kosnik et al.). In a subsequent step, the resulting data can be used in different formats for practical assessments. The approach differs from commonly employed approaches in that it predicts missing ecotoxicity data from all training data (across all compounds and species) rather than on a per-chemical basis. More specifically, we redefine the problem as a matrix-completion problem: for every chemical c and every species s, we would like to quantify an ecotoxicity end point, such as the LC50(c,s), while only a tiny fraction of the |s| × |c| matrix is tested experimentally. Instead of analyzing each chemical in isolation, we treat this as a pairwise-learning task, which corresponds to the supervised learning setting where the goal is to make predictions for pairs of objects. Pairwise learning is considered a relevant and powerful approach for this task and is applied to a wide variety of problems. The goal is to learn a function that takes an unseen chemical and species pair and predicts the LC50 value. By treating the chemical and the species as covariates of equal importance, pairwise learning naturally captures cross terms (the “lock–key” interactions) that are not considered in classical per-chemical modeling approaches, and does so in a parameter-efficient way.
In the present study, we employed a Bayesian matrix factorization approach adapted for pairwise learning following Rendle. We analyzed a large set of curated lethal effect data (LC50, the concentration lethal to 50% of test organisms), covering 3295 tested chemicals, 1267 tested species, and four exposure durations. LC50s were preferred in this research because the LC50 is a relatively uniform test end point compared with sublethal effect metrics. The Observed LC50 data matrix is sparsely populated (approximately 0.5% coverage). The pairwise learning yielded more than four million Predicted LC50s for each exposure duration, which were used to generate novel (1) Hazard Heatmaps for all (chemical,species) pairs, (2) all-species Species Sensitivity Distributions (SSD), (3) taxonomically split SSDs, and (4) Chemical Hazard Distributions (CHD). All-species SSDs are often-applied hazard assessment models, but the other use formats are new. All output formats were designed to yield useful insights for science and practice.
The main aim of this paper is to address the problem of gaps in ecotoxicological hazard data, describe the approaches and results, and discuss these in the context of practical utility. The following research and practice questions are central:
Question 1: Can pairwise learning produce a full matrix of Predicted (chemical,species)-pair LC50s, and what is the accuracy of the predicted values?
Question 2: In which ways can the matrix with Predicted LC50 values be used in practice? Here, we consider and illustrate four output formats.
Materials and Methods
Input Data: Observed LC50s
Data for the study were extracted from the ADORE data set, a benchmark database for the evaluation of machine learning approaches. Extracted data are LC50s (“processed/ecotox_mortality_processed.csv”) with specified test organism identity (“tax_gs”), test chemical identity (“test_cas”), test duration (“result_obs_duration_mean”), and Observed LC50 value (“result_conc1_mean_mol_log”), thus pragmatically adopting the standardized definitions (e.g., of the LC50) of ADORE. ADORE’s “mortality filtered” data set contains additional chemical and species features but was not used, as our model does not rely on those. The extracted data comprise a set of Observed LC50s for 3295 chemicals and 1267 species, representing 70,670 experiments with 18,966 unique (chemical,species) pairs (tested for some duration) out of 4,174,765 (3295 × 1267) possible (chemical,species) pairs. Given this sparse coverage (approximately 0.5%), the pairwise learning step must supply Predicted LC50s for the remaining 99.5% of pairs. Further data details are in the Supporting Information, Section 2.
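As a minimal illustration of this extraction step, the sketch below builds a tiny stand-in frame using the ADORE column names cited above (in practice, one would read the CSV directly); the example values and the pandas-based workflow are assumptions for illustration only.

```python
import pandas as pd

# In practice: df = pd.read_csv("processed/ecotox_mortality_processed.csv").
# Here, a tiny stand-in frame with the ADORE column names cited in the text;
# the CAS numbers, species, and values are purely illustrative.
df = pd.DataFrame({
    "test_cas": ["50-00-0", "50-00-0", "7440-50-8"],
    "tax_gs": ["Daphnia magna", "Danio rerio", "Daphnia magna"],
    "result_obs_duration_mean": [48, 96, 48],
    "result_conc1_mean_mol_log": [-4.2, -3.8, -6.1],
})

cols = {
    "test_cas": "chemical",                    # test chemical identity (CAS)
    "tax_gs": "species",                       # test organism (genus species)
    "result_obs_duration_mean": "duration_h",  # exposure duration (h)
    "result_conc1_mean_mol_log": "log_lc50",   # observed log10 LC50 (mol/L)
}
df = df[list(cols)].rename(columns=cols)

# Sparsity of the (chemical, species) matrix; for the real data set this
# evaluates to 18,966 / (3295 * 1267), i.e., approximately 0.5%.
n_chem = df["chemical"].nunique()
n_spec = df["species"].nunique()
n_pairs = df[["chemical", "species"]].drop_duplicates().shape[0]
coverage = n_pairs / (n_chem * n_spec)
```

On the toy frame, 3 of the 4 possible pairs are tested, giving a coverage of 0.75; the same computation on the full ADORE extraction reproduces the approximately 0.5% coverage quoted above.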
Step 1: Bridging Data Gaps and Validation
The input data were comprehensively (as a total set) subjected to pairwise learning, using Bayesian matrix factorization based on the “libfm” library by Rendle. Chemical identities (CAS number), species identities, and test durations (24, 48, 72, or 96 h of exposure) were treated as categorical variables. LC50s were expressed as logarithms and treated as a continuous variable. Features were encoded into a feature vector x, which is a concatenation of three one-hot-encoded indicator vectors: (1) chemicals were represented by a vector of length 3295, (2) species by a vector of length 1267, and (3) exposure durations by a vector of length 4. Predicted LC50s were again represented as a continuous variable. Each (chemical,species,duration) triplet defines an “Experiment”, and some triplets occur (vastly) more than once in the data set. We did not aggregate those, so as to characterize the magnitude of interexperimental variation as a key part of the modeling steps and analyses. Note that this differs from some regulatory-adopted approaches, where in such cases one value (e.g., the median LC50) is used.
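The encoding described above can be sketched as follows; the mapping of a particular chemical or species to a particular index position is a hypothetical bookkeeping choice for illustration.

```python
import numpy as np

# Feature vector x: concatenation of three one-hot blocks, as in the text.
# Sizes are from the paper; the index lookups are illustrative.
N_CHEM, N_SPEC, N_DUR = 3295, 1267, 4
DURATIONS = [24, 48, 72, 96]  # exposure durations in hours

def encode(chem_idx: int, spec_idx: int, duration_h: int) -> np.ndarray:
    """Return the sparse binary feature vector with exactly three 1-entries."""
    x = np.zeros(N_CHEM + N_SPEC + N_DUR)
    x[chem_idx] = 1.0                                        # chemical block
    x[N_CHEM + spec_idx] = 1.0                               # species block
    x[N_CHEM + N_SPEC + DURATIONS.index(duration_h)] = 1.0   # duration block
    return x

# Example: chemical #10, species #5, 48 h exposure.
x = encode(10, 5, 48)
```

The resulting vector has length 3295 + 1267 + 4 = 4566 and sums to exactly 3, matching the description of x as a sparse binary vector with one active entry per block.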
We predict the (log-transformed) LC50 value ŷ(x) for each experiment with a second-order Factorization-Machine model specified by

ŷ(x) = w₀ + Σᵢ₌₁ᵈ wᵢ xᵢ + Σᵢ₌₁ᵈ Σⱼ₌ᵢ₊₁ᵈ ⟨vᵢ, vⱼ⟩ xᵢ xⱼ

where d is the feature dimensionality and k is the dimensionality of the factorization (each latent vector vᵢ ∈ ℝᵏ). Here, x is a sparse binary vector that indicates which chemical, species, and exposure duration are active (only three entries are 1, the rest are 0).
The model parameters are
w₀: a global bias term, which captures the overall average log-LC50 in the data set, centering the model.
wᵢ: a species, chemical, or duration bias term, which enables learning the first-order effects, i.e., the mean effect of each species, chemical, and duration.
V ∈ ℝ^(d×k) (with rows vᵢ): a factorized matrix of all pairwise interactions between species, chemicals, and durations. This part of the model captures the “lock and key” effect, in which individual species interact uniquely with individual chemicals.
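A minimal numerical sketch of such a second-order Factorization Machine, using toy dimensions rather than the paper's d = 3295 + 1267 + 4 and k = 32, and random parameters in place of MCMC-fitted ones:

```python
import numpy as np

# Toy-sized Factorization Machine (after Rendle); parameters are random
# stand-ins, not fitted values.
rng = np.random.default_rng(0)
d, k = 10, 3                      # toy feature and latent dimensionalities
w0 = -4.0                         # global bias (illustrative value)
w = rng.normal(0.0, 0.1, d)       # first-order bias terms
V = rng.normal(0.0, 0.1, (d, k))  # latent factors; row v_i belongs to feature i

def fm_predict(x: np.ndarray) -> float:
    """Predict log-LC50 as w0 + <w,x> + sum_{i<j} <v_i,v_j> x_i x_j,
    using Rendle's O(d*k) reformulation of the pairwise term."""
    s = V.T @ x  # shape (k,): sum of active latent vectors
    pairwise = 0.5 * float(np.sum(s ** 2) - np.sum((V ** 2).T @ (x ** 2)))
    return float(w0 + w @ x + pairwise)

# Sparse binary input with exactly three active features
# (one chemical, one species, one duration), as in the text.
x = np.zeros(d)
x[[1, 4, 8]] = 1.0
y_hat = fm_predict(x)
```

With only three active entries, the pairwise term reduces to the three inner products ⟨v₁,v₄⟩, ⟨v₁,v₈⟩, and ⟨v₄,v₈⟩, which is exactly the "lock-key" cross-term structure described above.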
The “libfm” model (version 1.2.4; http://libfm.org/#publications) was run with Markov Chain Monte Carlo (MCMC) optimization. Satisfactory accuracy was reached when running the model with 2000 epochs (full iterations over the data set) and 32 latent factors. We defined four model runs:
a. Null model: learn only the global bias term. This implies learning the mean of the training data.
b. Mean model: learn the global bias term, species bias terms, chemical bias terms, and duration bias terms. This implies learning the overall species sensitivity, the overall chemical hazard, and the overall duration effect. Under this model, for example, the SSDs derived in the next step have the same shape and species order for every chemical, with a location corresponding to the toxicity of that chemical and the duration of the experiment.
c. Pairwise model: learn all the bias terms and the pairwise interactions between species, chemical, and duration. Here, different species may react differently to different chemicals, species may differ in their sensitivity to exposure duration, and chemical toxicity may be expressed uniquely depending on duration. Under this model, for example, any SSD shape and location can result for any chemical when the model is applied to a (chemical,species) matrix for a given duration.
d. “Ideal” model: fit a separate parameter to indicators of each (chemical,species,duration) experiment using ordinary linear regression on the test set. This represents a theoretical model that correctly predicts the mean of (repeated) outcomes of each experiment in the test set, which results in the minimum possible root mean squared error (RMSE). The error of this ideal model captures the limit that bias and variability in the experimental results impose on any model of this data set. Comparing the RMSE of the pairwise model to that of the ideal model makes clear what proportion of the error is due to inherent variability in the data set (characterized by the triplets with multiple data) and what is due to modeling error. Differently curated data sets could have lower or higher ceilings on model performance stemming from this variability.
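The two bracketing models above, the null model (a) and the "ideal" model (d), can be sketched directly on toy data; the values below are illustrative, not from ADORE.

```python
import numpy as np
import pandas as pd

# Toy experiments: one (chemical, species, duration) triplet is replicated
# three times with varying outcomes, two triplets occur once.
df = pd.DataFrame({
    "chemical": ["A", "A", "A", "B", "B"],
    "species":  ["x", "x", "x", "x", "y"],
    "duration": [48, 48, 48, 48, 48],
    "log_lc50": [-4.0, -5.0, -4.5, -3.0, -6.0],
})

# Null model (a): predict the global mean for every experiment.
null_pred = df["log_lc50"].mean()
rmse_null = np.sqrt(((df["log_lc50"] - null_pred) ** 2).mean())

# "Ideal" model (d): predict each triplet's own mean; its RMSE is the floor
# imposed by inter-experimental variability (0 only if replicates agree).
triplet_mean = (
    df.groupby(["chemical", "species", "duration"])["log_lc50"].transform("mean")
)
rmse_ideal = np.sqrt(((df["log_lc50"] - triplet_mean) ** 2).mean())
```

On this toy set the null RMSE is 1.0 while the ideal RMSE is about 0.32; the nonzero floor comes entirely from the replicated triplet, mirroring how the study's ideal-model RMSE of 0.51 quantifies irreducible experimental variability.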
The accuracy of Predicted LC50s for (chemical,species) pairs was evaluated via a 10-fold grouped cross-validation strategy, in which the data were split into train and test folds of Observed LC50s. This strategy gives confidence that a model with a good average RMSE across folds is actually performing well, rather than benefiting from a lucky train–test split, which can strongly bias apparent model performance. Low RMSE scores on the test sets indicate that the model is not overfitting the training data, and the restricted dimensionality of the latent factors acts as a regularizer, further forcing the model to generalize correctly.
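A grouped split of this kind can be sketched with scikit-learn's GroupKFold. Here we assume grouping by (chemical,species) pair, so that replicate experiments of the same pair never straddle the train-test boundary, and use 4 folds on toy data instead of the 10 folds used in the study.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy stand-in: 30 experiments distributed over 12 (chemical, species) pairs.
pairs = np.arange(30) % 12              # group id = (chemical, species) pair
rng = np.random.default_rng(0)
y = rng.normal(-4.0, 1.0, size=30)      # toy log-LC50 outcomes
X = np.zeros((30, 1))                   # placeholder features

gkf = GroupKFold(n_splits=4)
for train_idx, test_idx in gkf.split(X, y, groups=pairs):
    # No (chemical, species) pair appears in both train and test:
    assert set(pairs[train_idx]).isdisjoint(pairs[test_idx])
```

The grouping prevents information leakage from replicate measurements of a pair: without it, a model could score well simply by memorizing pairs seen in training.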
Step 2: Output Use Formats and Inspections
The outputs from step 1 were used to generate four output formats, which are visualized and inspected for selected chemicals and species. Depending on the context, researchers may refer to the LC50s as “species sensitivity data” or as “chemical hazard data”.
First, we characterize the whole (chemical,species) matrix and visualize its contents as a comprehensive Hazard Heatmap. The heatmap centroid was defined as the calculated mean (chemical,species) pair, which serves as a (virtual) anchor point for adding color shades to all data points. Rows and columns were sorted from most to least sensitive species and from most to least toxic chemical, based on the Predicted LC50s. The heatmap was color-coded in the array blue–green–yellow–orange–red, with shades bounded at three standard deviations of the Predicted LC50 values above and below the anchor point. This “rainbow” visualization is further referred to as a (Chemical) Hazard Heatmap in the present paper, given the customary practice of characterizing hazard differences among chemicals. The heatmap can also be referred to as a (Species) Sensitivity Heatmap if the research focus is on the relative sensitivity of species.
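The sorting and color anchoring can be sketched as follows on a toy matrix; the ±3σ clipping mirrors the description above, while the specific colormap named in the final comment is an assumption.

```python
import numpy as np

# Toy predicted log-LC50 matrix (species x chemicals); the real matrix is
# 1267 x 3295 per exposure duration. Values are illustrative.
rng = np.random.default_rng(1)
M = rng.normal(-4.5, 1.2, size=(40, 60))

# Sort species (rows) from most to least sensitive and chemicals (columns)
# from most to least toxic, using mean predicted log-LC50 (lower = more
# sensitive / more toxic).
M = M[np.argsort(M.mean(axis=1)), :]
M = M[:, np.argsort(M.mean(axis=0))]

# Color scale anchored at the matrix mean (the "centroid"), with shading
# clipped at +/- 3 standard deviations; 0 -> red (low LC50, high hazard),
# 1 -> blue (high LC50, low hazard).
mu, sigma = M.mean(), M.std()
shade = np.clip((M - mu + 3 * sigma) / (6 * sigma), 0.0, 1.0)
# `shade` can index a blue-green-yellow-orange-red colormap; e.g., with
# matplotlib: plt.imshow(M, cmap="RdYlBu", vmin=mu-3*sigma, vmax=mu+3*sigma).
```

After the two sorts, row and column means increase monotonically, which is what produces the smooth diagonal color gradient of the Hazard Heatmap.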
Second, we derive SSDs for each chemical. With LC50 data here considered as species sensitivity data points, SSDs describe the pattern in the sensitivity data across multiple species for each chemical separately, which allows one to rank chemicals in their potential to cause harm. We compare SSDs derived from Predicted LC50s (rainbow) with traditional SSDs derived from the Observed LC50s (black) and introduce a novel distribution format. This output format is solely referred to as the SSD (of each chemical), given the historical use of this term for this model format.
Third, given the number of Predicted LC50 data, we derive taxonomically split SSDs, to align with practices in applied ecology and to account for multimodal SSDs (which occur for chemicals with a specific mode of action). Split SSDs allow one to rank chemicals in their potential to cause harm to specific taxonomic groups. This output format is solely referred to as split SSD.
Fourth, we conceptualize and derive a novel distribution, namely, the chemical hazard distribution (CHD). With LC50 data here considered as chemical hazard data points, CHDs describe the pattern in the hazard data across multiple chemicals for each tested species separately, which allows us to rank species with regard to their relative sensitivity for exposure to chemicals. This output format is solely referred to as CHD (of each species), as a logical complement to the (historical) use of the SSD-term.
Data and Code
The ADORE data set was obtained from the ERIC (10.25678/0008C9) repository https://renkulab.io/gitlab/mltox/adore. The code is available on GitHub: https://github.com/rivm-syso/adore_pairwise_learning/. The matrix of Predicted LC50s for all (chemical,species) pairs is available on Zenodo: 10.5281/zenodo.14449272.
Results
Step 1: Bridging Data Gaps (Predicted LC50s)
The pairwise modeling resulted in a complete (chemical,species,duration) triplet matrix of 1267 × 3295 × 4 = 16,699,060 Predicted LC50s (10.5281/zenodo.14449272). These can be interpreted as point estimates of a comprehensive Quantitative Species Sensitivity Relationship (QSSR). The results of the model validation analyses are summarized in Figure and resemble outcomes obtained with other data. Prediction accuracy improved across the four model formats (a–d): RMSEs were 1.78, 0.96, 0.85, and 0.51, respectively. Based on the characteristics of the data set, we considered both variability (in repeated measurements for the same species/chemical pairs) and biases (some species and chemicals being tested more than others).
1.
Explained variance for various model specifications (left) and predicted–observed diagram (log10-scaled molar concentration units on the X and Y axes, right). The scatter plot shows model predictions and true values in the first validation setting: species and chemical interactions. Every dot is a (chemical,species)-pair observation. Deviations from the predicted values are characterized via orange diagonal bands (RMSE-based); assuming normally distributed errors, 31.7% (±1σ), 4.6% (±2σ), and 0.3% (±3σ) of observations are expected to fall outside the respective bands.
Regarding variability, the lowest RMSE of 0.51 means that even a perfect model for the current data would have an RMSE of this magnitude, rather than 0, attributable to the high variability/uncertainty in the data for the species/chemical pairs that have multiple observations. Accounting for the overall sensitivity of species and the overall toxicity of chemicals explains 70% of the variance, and the species and chemical interactions raise this to 77%. At most 92% of the variance can theoretically be explained, given the stochastic variability in the repeated experiments. Such variability is common and is considered to result from differences in test conditions, lineages of the species tested, exposure concentration steps, and natural variability. Regarding biases in the test data, we determined model outcomes after correcting for the unequal representation of species and chemicals in the data set, by assigning each data point a weight inversely proportional to the number of samples for that species/chemical pair. This did not alter the final RMSE scores (with or without this bias-controlling approach), which suggests that the bias in the data set does not negatively affect the model.
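The inverse-frequency weighting can be sketched as follows on toy data; the frame and values are illustrative stand-ins for the ADORE experiments.

```python
import numpy as np
import pandas as pd

# Toy experiments: pair (A, x) is heavily over-represented.
df = pd.DataFrame({
    "chemical": ["A", "A", "A", "A", "B", "C"],
    "species":  ["x", "x", "x", "x", "y", "z"],
})

# Weight each data point inversely to the number of samples for its
# (chemical, species) pair, so every pair contributes equally in total.
counts = df.groupby(["chemical", "species"])["chemical"].transform("size")
df["weight"] = 1.0 / counts

# Each pair now carries a total weight of 1.0 regardless of replication.
per_pair = df.groupby(["chemical", "species"])["weight"].sum()
```

Training with these weights down-weights heavily replicated pairs; the finding above that this leaves the RMSE unchanged is what supports the robustness claim.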
In short, the pairwise model reaches satisfactory results, and most of the explained variability comes from how sensitive the species are and how toxic the chemicals are overall. The higher-order models are a substantial improvement (over 70% of variance explained) over the null model of simply predicting the mean.
Step 2: Output Use Formats and Inspections
Hazard Heatmap
Observed and Predicted LC50 data were formatted as heatmaps, both with rainbow colors (Figure ). Zoomable heatmaps can be inspected via GitHub (https://github.com/rivm-syso/adore_pairwise_learning/), showing data sparsity and detailed patterns. Despite the overall sorting of chemicals and species, heatmap details (Figure , enlargements) show discontinuities in Observed and Predicted LC50s (as opposed to monotonic color changes). This reflects peculiar (hazard or sensitivity) features of specific (chemical,species) pairs, recognizing that some species are indeed particularly sensitive to some chemicals (e.g., snails to copper). The rainbow-heatmap format is useful for Safe and Sustainable by Design assessments, as a safer chemical (for a given function) is positioned further to the right (toward blue) in the heatmap.
2.
Hazard heatmaps of LC50 data (example: 24 h test duration data). Top: sparsely populated sorted Observed LC50s; matrix space covering 18,966 unique (chemical,species) pairs (observed for some duration). Bottom: sorted Predicted LC50s; matrix space filled with >4 million predicted values. Insets: vastly enlarged bottom-left areas of both heatmaps (manually adapted for illustration purposes only). Zoomable heatmaps: https://github.com/rivm-syso/adore_pairwise_learning/. Rainbow colors: red, low LC50 (high hazard, high sensitivity); blue, high LC50 (low hazard, low sensitivity).
Species Sensitivity Distributions (SSDs)
Observed and Predicted LC50 data were formatted as Species Sensitivity Distributions (SSDs). We show how the model can be used to derive a new kind of Species Sensitivity Distribution (SSD) for all species in the data set. In Figure , we compare side-by-side this novel SSD to a traditionally fitted SSD curve for five selected species at 48 h exposure. To enable detailed inspection of the traditional and novel SSD patterns, we provide an interactive version of this plot for 20 different chemicals at four different exposure durations online on the GitHub repository for this article (https://rivm-syso.github.io/adore_pairwise_learning/interactive_ssd_viewer.html).
3.
Illustration of Species Sensitivity Distribution results for two chemicals, with emphasis on data patterns and SSD positions and shapes (top row: the plasticizer dibutyl phthalate; bottom row: the insecticide cyfluthrin). Note the different LC50 ranges (X-axis). Black: Observed LC50s; rainbow: Predicted LC50s. Rainbow colors: red, low LC50 (high hazard, high sensitivity); blue, high LC50 (low hazard, low sensitivity, as in Figure ). Left panels: novel SSD format. Right panels: traditional SSD format. X-axis: LC50 values are in mol/L (log scale). Interactive versions of similar plots for 20 different chemicals at four different exposure durations are provided at https://rivm-syso.github.io/adore_pairwise_learning/interactive_ssd_viewer.html.
The novel SSDs are shown at the left of Figure . To build the novel SSD, we first let the factorization-machine model fill the chemical × species table, giving one predicted log-LC50 for the selected chemical for every one of the 1267 species at (in Figure ) 48 h exposure time. These values are sorted from the lowest (most sensitive species) to the highest and converted to a cumulative rank percentage, so each species occupies its natural place on a 0–100% vertical scale. We then plot every species at its own coordinate (x-axis: predicted log-LC50; y-axis: rank-based percentage), which yields the cumulative distribution of the data; this appeared as a visually smooth S-curve. In the figures, for the subset of species that have experimental LC50s, we place markers (square/triangle/circle) at the same y-positions but with their measured concentrations on the x-axis. No curve fitting is applied in the figures of the novel SSD format.
On the right of the figure, traditionally derived SSD curves are shown. These are constructed from the data of the species that were actually tested for the specific chemical. Their measured LC50 values are sorted from lowest to highest, and each is assigned a cumulative probability. A parametric log-normal curve is fitted to these points.
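The two constructions can be sketched side by side; the LC50 values below are illustrative, and the log-normal fit uses scipy's normal fit in log space as an assumed stand-in for the study's fitting procedure.

```python
import numpy as np
from scipy import stats

# Novel SSD: sort all predicted log-LC50s and assign rank-based percentages;
# no distribution is fitted. Values here are illustrative.
pred_log_lc50 = np.array([-7.1, -6.4, -5.9, -5.2, -4.8, -4.1, -3.5, -2.9])
ssd_x = np.sort(pred_log_lc50)
ssd_y = 100 * (np.arange(1, ssd_x.size + 1) - 0.5) / ssd_x.size  # rank %

# Traditional SSD: fit a log-normal (normal in log space) to the few
# observed log-LC50s and read off the fitted cumulative curve.
obs_log_lc50 = np.array([-6.2, -5.5, -4.9, -4.0])
mu, sigma = stats.norm.fit(obs_log_lc50)
# E.g., the log concentration at which 5% of species are affected:
hc5 = stats.norm.ppf(0.05, mu, sigma)
```

The contrast is the key point of the figure: the left panels plot (ssd_x, ssd_y) directly, one point per species, while the right panels show the fitted normal CDF in log space, whose shape is imposed rather than observed.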
The following observations can be made from the SSD pattern analyses. First, rainbow SSDs have a (nearly) sigmoidal shape. This is not the result of fitting a log-normal model to the Predicted LC50s. This finding corroborates observations from the 1980s that ecotoxicity data resemble a log-normal distribution. The sigmoidal shape is found for chemicals with both nonspecific and specific modes of action. A goodness-of-fit test (on the fit of the log-normal model to the data) would not have identified a misfit, and traditional log-normal SSD fits to the data (right panels) would not be rejected. However, the novel results (left panels) show that the tested species are not equally distributed over the sensitivity range, as shown by the subgroups of data points. That is, traditional SSDs may be biased due to species selection bias.
Second, rainbow SSDs embody a much broader representation of species than SSDs generated from Observed LC50s only, which bears on the key assumption that an SSD constructed from test data resembles the sensitivity distribution of the field species assemblage. Usually, that assumption hinges on only a few data points. The present results improve on this, although the true sensitivity distributions of field species assemblages remain unknown.
Third, the novel SSDs (left panels) allow for an improved interpretation. While quantifying the potentially affected fraction (Y) at a given exposure concentration (X) is feasible for both traditional and novel SSDs (an interpretation of degree), only in the latter case is the interpretation also one of kind. This relates to the Y-axis definition, which here is taxonomic rather than purely quantitative. Therefore, one can see (and count) whether an exposure affects all taxonomic groups equally or not. The clustering of Observed LC50 data (e.g., the cluster of insect test data in the lower tail for cyfluthrin) illustrates the presence of species selection bias and supports the derivation of split SSDs.
Fourth, the SSDs based on Predicted or Observed LC50s may be remarkably similar or very different (top and bottom right panels of Figure , respectively). In the latter case, the rainbow SSD predicts lower toxicity for the whole assemblage at a given exposure (X) than the traditional SSD derived from the available LC50 test data. Such nonoverlay is at least in part attributable to species selection bias (here, tests with insect species dominate).
Fifth, and most importantly, the method yields all-species SSDs (based on 1267 predicted data points each) for every chemical. Together with the results of the validation study (step 1) and the novel SSD format (overlays of Predicted and Observed data, Figure , left), this implies that the employed combination of methods (step 1 + step 2) bridges surprisingly well the data gap that often hampers hazard assessments.
Further all-species SSD analyses are in the Supporting Information Section 3, including a table with log-normal SSD models fitted to the data (SI Table 2) for all chemicals. Important results presented there are (1) that SSDs based on Predicted LC50s shift left for longer exposure times (as expected) and (2) that the similarity of SSDs based on Observed and Predicted data improves for SSDs based on higher numbers of Observed LC50s.
Species Sensitivity Distributions for Taxonomic Groups
Observed (black) and Predicted (rainbow) LC50 data were formatted as split SSDs, based on training the pairwise-learning model on all data (across the taxonomic groups). A split into algae, crustacea, and vertebrates was made for the photosynthesis inhibitor atrazine as an example (Figure , taxonomic Y-axis), for data subsets with ≥10 Observed LC50s (overall and per taxonomic group).
4.
Illustration of the outcomes of a split-SSD approach for the photosynthesis-inhibiting chemical atrazine, without splitting of the data (top left) and upon splitting the data to derive SSDs for specific taxonomic groups (Crustacea: top right; fish: bottom left; algae: bottom right). Rainbow colors: red, low LC50 (high hazard, high sensitivity); blue, high LC50 (low hazard, low sensitivity). Plotting principles are as in Figure (left panels).
The split-SSD analysis results in SSDs with distinct positions and slopes: algae test data points are positioned in the low-exposure range with few insensitive species, whereas Observed data on invertebrates and vertebrates are positioned in the higher-exposure ranges with some tested sensitive species. The rainbow distributions confirm the taxonomic differences, with algae showing the more sensitive color range. Note that the algae SSD deviates slightly from a sigmoid, with a near-vertical section at X ≅ −6, whereas the Crustacea and fish plots are nearly fully sigmoidal. The figures again show a surprising degree of overlay between the Observed LC50s and the Predicted LC50 patterns (in addition to the general validation results described in step 1). Additional arguments and motives for applying split SSDs are given in the Supporting Information, Section 4. The outcomes of the present study support extensive use of the split-SSD approach in practice.
Chemical Hazard Distributions
Observed (black) and Predicted (rainbow) LC50 data were formatted as novel Chemical Hazard Distributions (Figure ). CHDs summarize information about the sensitivity profile of each tested species for all tested chemicals. The CHD position marks highly sensitive, moderate, or less sensitive species for all chemicals; the CHD slope marks whether species are equally sensitive to many chemicals (steep) or not; and bi- or multimodality marks that some chemical groups are more hazardous for the species than others.
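A CHD for a single species can be sketched in the same rank-percentage fashion as the novel SSDs, but transposed: sorting over chemicals instead of species. The matrix values below are illustrative.

```python
import numpy as np

# Toy predicted log-LC50 matrix: rows = species, columns = chemicals
# (the real matrix is 1267 x 3295 per exposure duration).
rng = np.random.default_rng(2)
M = rng.normal(-4.5, 1.0, size=(5, 200))

def chd(species_row: np.ndarray):
    """Chemical Hazard Distribution for one species: sorted log-LC50s
    across all chemicals, with rank-based cumulative percentages."""
    x = np.sort(species_row)
    y = 100 * (np.arange(1, x.size + 1) - 0.5) / x.size
    return x, y

x, y = chd(M[0])
# CHD position (e.g., the median log-LC50 over all chemicals) marks how
# sensitive the species is overall; a small spread means a steep CHD,
# i.e., near-uniform sensitivity across chemicals.
position, spread = np.median(x), x.std()
```

Plotting (x, y) gives one CHD curve per species; position, slope, and multimodality then carry the interpretations described above.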
5.
Examples of Chemical Hazard Distributions of two frequently tested species (48 h exposure), formatted similarly to Figure and with emphasis on data patterns and CHD positions and shapes. Left: novel CHD plot format; right: traditional CHD plot format. Rainbow colors: red, low LC50 (high hazard, high sensitivity); blue, high LC50 (low hazard, low sensitivity). Note the similar LC50 ranges (X-axis).
The following observations can be made from the CHD pattern analyses (additional analyses of CHDs are given in the Supporting Information, Section 5). First, the outcomes illustrate that some species are tested far more frequently than others. This may relate to regulatory obligations to use certain standard test species (in turn related to practical issues such as complexity and costs).
Second, the rainbow distributions are sigmoidal, again (as for the rainbow SSDs) without any model being fitted to the data points. The color shades show that the two test species are characterized by a wide array of sensitivities to different chemicals.
Third, chemical selection bias can be present, as the set of tested chemicals for a species can be dominated by highly toxic or minimally toxic chemicals. As an example, the Observed LC50s for the fish are positioned in the high-impact range. Compound selection bias has similar types of effects (nonoverlay of black and colored data points) as the species selection bias discussed for the SSDs (right-hand panels).
Fourth, the novel visualization again generally resulted in an overlap of Observed and Predicted LC50s. Whereas species selection bias is an identified cause of nonoverlap for traditional versus predicted SSDs, it is likely (test) chemical selection bias that likewise causes nonoverlap for CHDs.
Fifth, and most importantly, inspection of the CHD patterns allowed us to test whether frequently tested species are evenly distributed over the heatmap or not. If they are not, the standard test species may be relatively sensitive or insensitive, either of which would be an argument to reject the critical (SSD) assumption that test data resemble sensitivity patterns of field species assemblages. Details of this analysis are in the Supporting Information, Section 5.4. The data analyses suggest, however, that often-tested species are neither specifically sensitive (all positioned on the left in the figure) nor insensitive (all positioned on the right); their data spread over the whole rank-ordered array of species sensitivities.
Discussion
General
Our results illustrate that data gaps in ecotoxicological hazard assessment can be bridged by pairwise learning, that the validity of that bridging can be judged by the outcomes of steps 1 and 2, and that the results can help to support practical assessments. The societal relevance of these outputs is high: potential uses range from SSbD assessments of newly designed chemicals, decisions on allowing chemicals on the market, the comparative life cycle assessment of products, and environmental footprints to environmental quality assessment and management. Beyond the stage of proof of principle, further analyses based on chronic and lower (field-relevant) exposure concentrations, as well as other improvements, are warranted. A next step should, among others, consider the characterization and presentation of confidence intervals for the Predicted LC50s (and the use formats). This is important because of the observed species and chemical testing biases. For the current study, the zoomable heatmaps (GitHub) illustrate that the data matrix is characterized by different data densities.
Feasibility, Validity, and Utility
The present study shows that the comprehensive assessment of all paired (chemical,species) data is feasible and that the obtained results matrix can be considered surprisingly valid (step 1, Figure ) as well as informative for practical purposes (step 2, Figures –). Although the data set used to train the model has inherent uncertainties/variability and biases, evaluations (e.g., equally weighting all species/compound pairs) suggested that the obtained results are robust. The validity is further underscored by the covariation of the SSDs derived in the present study with those derived in an earlier study. The detailed results of this comparison are in the Supporting Information (Section 3.6). The remark on the surprising degree of validity is an interpretation made in the context of the common use of Safety or Application Factors in protective regulatory contexts. Such factors can range up to a value of 1000, where an Observed ecotoxicity end-point value (under data-poor conditions) may be divided by 1000 to derive a proposed regulatory protective concentration. Compared to those high factors, the SSDs and CHDs show remarkable overlap between Observed and Predicted LC50s (left panels of the SSD, split-SSD, and CHD figures). Despite this overlap, it is also likely that an even higher accuracy can be obtained by using more input data or more features describing the tested species, test chemicals, and study conditions. This will be the subject of further model development.
The various use formats (step 2) are important for contemporary regulatory and decision-support applications. Briefly, the output matrix can simply be queried by entering the identity of the chemical(s) of concern, yielding summary hazard insight(s) as the result (see also Table S2 for SSDs of all chemicals). This has many uses. It can help to identify the chemicals of least concern in an SSbD assessment in the early innovation stages. It can be used to derive protective environmental quality standards for prospective (e.g., REACH) or retrospective (e.g., WFD) evaluations of chemical pollution threats. The results can also be used to derive ecotoxicological Effect Factors for chemicals in the context of product Life Cycle Assessments (www.usetox.org) and Product Environmental Footprinting. The results can be communicated visually to nonspecialist stakeholders, via formats such as the novel Hazard Heatmap, as traditional or novel SSDs (to summarize the relative hazard of different chemicals for tested species), or as novel CHDs (to summarize the relative sensitivity of different species for tested chemicals).
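The kind of lookup-and-summarize query described above can be sketched as follows. This is a toy illustration only: the chemical names, species set, and LC50 values are invented, and the actual output matrix of this study is queried via chemical identifiers such as CAS numbers.

```python
# Illustrative sketch (all names and values hypothetical): querying a
# predicted-LC50 matrix by chemical identity to obtain a hazard summary.
import statistics

# Toy matrix: chemical -> {species: predicted LC50 (mg/L)}
predicted_lc50 = {
    "chemical_A": {"Daphnia magna": 0.8, "Danio rerio": 2.5, "Chironomus riparius": 1.1},
    "chemical_B": {"Daphnia magna": 40.0, "Danio rerio": 15.0, "Chironomus riparius": 90.0},
}

def hazard_summary(chemical: str) -> dict:
    """Summarize the predicted sensitivity of all species to one chemical."""
    values = sorted(predicted_lc50[chemical].values())
    return {
        "n_species": len(values),
        "median_lc50": statistics.median(values),
        "most_sensitive": min(predicted_lc50[chemical], key=predicted_lc50[chemical].get),
    }

print(hazard_summary("chemical_A"))
```

In practice, such a query would return the SSD parameters per exposure duration (as in Table S2) rather than raw summary statistics.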
The outcomes of our study embody a major step forward for the hazard assessment of chemical pollutants vis-à-vis impacts on biodiversity. This holds especially in comparison to the methods and results of Gasser et al., who synchronously and independently utilized ADORE data and machine learning approaches in combination with SSD modeling. That study focused on LC50 data for fish species and fish-specific SSDs (as in Figure ) and used more chemical, species, and test condition features. These authors found that the range of Predicted LC50s was far smaller than that of Observed LC50s, that SSDs derived from both data sets differed, and that the distributions for the Predicted data sets were nonsigmoidal. Comparison with our results suggests that considering a wide set of training data (all species, all chemicals), even with more limited consideration of features, yielded outcomes that are in part fully as expected (similarity of traditional and novel SSDs), in part surprising (the sigmoid shape of many data patterns), and in part fully new (heatmap, some SSD formats, and the CHD formats). The comparison suggests that the fit of SSDs to Predicted LC50 data sets can be improved by expanding the input data to an all-species-group approach. The final conclusion formulated by Gasser et al., that “… species-specific sensitivities are not adequately distinguished by our models”, is opposite to ours and may be explained by these differences.
Additional Scientific and Practical Insights
SSDs are the best-known format to characterize distributions of ecotoxicity hazard data. SSDs are the applied-ecotoxicology application of the oldest known model in the history of risk: the normal distribution. Since their conceptualization for applied hazard assessment and policy making in the 1980s, SSDs have come into use around the globe. SSDs are used to derive regulatory protective standards, to decide on market entry of novel chemicals, to characterize environmental quality, to evaluate life cycle impacts of products, and to evaluate chemical footprints of areas. These uses are supported by studies showing that increased toxic pressure, derived by using SSDs and mixture modeling, covaries with biodiversity loss. The importance of SSD-type distributions for decision-making on preventing or reducing chemical pollution cannot be overstated.
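The SSD concept referred to above can be sketched in a few lines: fit a normal distribution to log10-transformed LC50s across species and read off a low percentile such as the HC5 (the concentration hazardous to 5% of species). The data below are hypothetical, and a regulatory derivation would use chronic no-effect data and account for fitting uncertainty.

```python
# Minimal sketch of the SSD concept: fit a normal distribution to
# log10-transformed LC50s across species and derive the HC5.
# All data are hypothetical.
import math
import statistics

lc50_mg_per_l = [0.5, 1.2, 3.0, 8.0, 20.0, 55.0]  # one chemical, six species
logs = [math.log10(x) for x in lc50_mg_per_l]

mu = statistics.mean(logs)
sigma = statistics.stdev(logs)          # sample SD of log10 LC50s
z05 = -1.6449                           # 5th percentile of the standard normal
hc5 = 10 ** (mu + z05 * sigma)

print(f"SSD: mu={mu:.2f}, sigma={sigma:.2f}, HC5={hc5:.3f} mg/L")
```

The HC5 necessarily falls in the lower tail, below the most sensitive tested species here, which is why SSD-based standards depend strongly on which species are tested, the selection-bias issue discussed below.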
Nonetheless, our results underscore that there is latitude to improve the derivation and use of SSDs. The first and major improvement concerns the bridging of data gaps for all of the aforementioned practical purposes. Many practical assessments are currently not feasible due to a lack of hazard data, as a consequence of working chemical-by-chemical, or because of data-filtering practices (where the focus is on chronic no-effect end points that pass criteria for study quality and relevance). The second improvement concerns the potential for recognizing and avoiding species selection bias. The results of the present study show that species selection bias may occur, as observed earlier, but also that such bias can now be recognized better. Specifically, the use of SSDs for separate taxonomic groups may be an important improvement option (Figure ), as the species selection bias phenomenon appears to be at least in part related to specific modes of action of chemicals and the apparently preferred testing of sensitive species (Figure ).
The present study also resulted in the proposal of the novel concept of CHDs. This is highly relevant for the evaluation of testing strategies and of the assumption that the tested species represent a field species assemblage. In the early stages of environmental regulation, researchers aimed to identify a single most sensitive species with which to test all chemicals and to protect all species based on the findings. Extensive studies followed, resulting in evidence that (a) no species is generally sensitive to all chemicals, (b) a test battery (covering various species) is needed, with hazard insights improving with battery size, (c) the presence of various test organisms in a battery is relevant to gain all necessary insights into chemical hazards, and (d) the risk of underestimating the toxicity of a chemical for aquatic life is considerable, even for test batteries with up to five or six species. Such findings eventually resulted in regulatory Guidance prescribing minimum numbers and taxonomic diversity of tested species when using SSDs (e.g., OECD, 2006). The CHD output format allows evaluations on this matter and points to chemical selection bias, meaning that a species is not tested across the entire range of chemicals. The opportunities to investigate what the novel CHD concept implies for research and practice have only partly been scrutinized (see further Supporting Information Section 5). However, further investigating the positions and shapes of CHDs vis-à-vis the Observed data and field data may prove highly relevant for moving toward obtaining and interpreting data sets that resemble the sensitivity characteristics of the species assemblages in the field.
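The relation between SSDs and CHDs can be pictured as two slicing directions through the same (chemical,species) prediction matrix: an SSD collects one chemical's predicted LC50s across species, a CHD collects one species' predicted LC50s across chemicals. A toy sketch (all names and values hypothetical):

```python
# Sketch of the relation between SSDs and CHDs on one prediction matrix
# (all values hypothetical): an SSD reads one chemical's row across
# species; a CHD reads one species' column across chemicals.
chemicals = ["chem_A", "chem_B", "chem_C"]
species = ["Daphnia magna", "Danio rerio"]

# Predicted LC50s (mg/L); rows = chemicals, columns = species
matrix = [
    [0.8, 2.5],
    [40.0, 15.0],
    [3.3, 7.1],
]

def ssd_data(chem: str) -> list:
    """All species' predicted LC50s for one chemical (row slice)."""
    return matrix[chemicals.index(chem)]

def chd_data(sp: str) -> list:
    """All chemicals' predicted LC50s for one species (column slice)."""
    j = species.index(sp)
    return [row[j] for row in matrix]

print(ssd_data("chem_A"))        # basis of an SSD
print(chd_data("Daphnia magna")) # basis of a CHD
```

In the present study, each row slice holds 1267 species and each column slice 3295 chemicals, so both distributions can be derived for every chemical and every species.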
In short, our results show that data gaps can be effectively bridged, that both species and chemical selection biases occur (and can be traced), that taxonomically split (or otherwise split) SSDs can be derived (to align applied ecotoxicology to applied ecology), and that practical assessments (even SSbD assessments under their typically data-poor conditions) can be supported. Nonetheless, extrapolation beyond the set of tested species and chemicals remains challenging.
Opportunities
There are many opportunities to develop the study beyond the proof-of-concept stage; a non-exhaustive list follows. First and foremost, the methods should be applied to chronic data to obtain results for realistic ambient exposure levels. Second, the methods should be expanded to enable characterization of hazards of nontested chemicals and species (extrapolation, rather than the interpolation applied so far). Third, in line with the approaches followed by others, it is likely that improved pairwise learning model outcomes can be obtained by accounting for more relevant species, chemical, and test condition features; the present research utilized a bare minimum of features, yet already with surprising prediction accuracy. Fourth, applied ecotoxicology and applied ecology can be aligned in regulatory practice through the systematic use of split SSDs (sensu Oginah et al.), which also (partly) solves issues such as species selection bias. Fifth, likewise, the results can be used to make (further) regionally relevant assessments, for example, by deriving waterbody-type-specific SSDs. Given that different water bodies are populated by different species pools, such methods would acknowledge the differences between vulnerable and less vulnerable ecosystems. Sixth, in contrast to splitting, the outputs can also be used to support chemical footprinting, up to evaluations at the scale of the planetary boundary for chemical pollution. Seventh, and of high practical importance, the outputs of this study can be understood intuitively by actors in environmental quality protection and management. That is, the heatmap (Figure ) can be made available as a webtool operated via a (chemical identity) lookup function. Such an application-oriented design of decision-support tools for assessing potential chemical pollution problems is important and appears feasible for the heatmap, the SSDs, and the CHDs (rainbow formatted).
In total, our results may be helpful to improve environmental protection, assessment, and management methods for chemical pollution and reduce impacts on biodiversity.
Supplementary Material
Acknowledgments
The research is based on combining long-term knowledge, experiences, and insights collated on chemical pollutant effects on different species (LP) with contemporary machine learning approaches and software development (MV, TP), shaped into application for predicting ecotoxicity metrics by MV/TP and co-workers in the project Predictive Toxicology coordinated by Pim Wassenaar. Funding agencies and colleagues involved in these long-term research lines are acknowledged for their inspiration and long-term support for sufficient “mass and focus” on this research subject. Lya Soeteman-Hérnandez, Jaap Slootweg, Elmer Swart, and Lisa Trostrams of RIVM are acknowledged for stimulating discussions and technical support on the development of the approach. Peter Fantke (DTU Copenhagen and substitute ApS), Mark Huijbregts (Radboud University Nijmegen), Max Keuken (RIVM), Christoph Schür (Eawag), and Glenn W. Suter (U.S. EPA, retired) proofread the manuscript and are acknowledged for their suggestions, which helped to improve the manuscript. Jaap Slootweg (RIVM) is acknowledged for screening the modeling scripts.
Glossary
Abbreviations
- CAS
Chemical Abstract Services
- CHD
Chemical Hazard Distribution
- LC50
50% Lethal Concentration
- MCMC
Markov Chain Monte Carlo
- QSSR
Quantitative Species Sensitivity Relationship
- RMSE
root mean squared error
- SSbD
Safe and Sustainable by Design
- SSD
Species Sensitivity Distribution
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.est.5c01289.
Motives for additional data analyses, methods used, and the results obtained, including additional graphical results and the importance of the findings in supporting the implications of the main article (PDF)
Characteristics of the set of LC50 ecotoxicity test data used in the data analyses, with specification on number of test data for (a) the four exposure durations, (b) the 3295 chemicals that were tested, and (c) the 1265 species that were used in the tests (XLSX)
SSD parameters (per exposure duration) for all chemicals, and of the data analyses underlying SI Figure 2 and SI Figure 3, for which the motives, approaches, and results are described in the Supporting Information Text file (XLSX)
CHD parameters (per exposure duration) for all studied species, and of the data analyses underlying SI Figure 6, SI Figure 7, and SI Figure 8, as well as for the data analyses for SI Table 4, for which the motives, approaches, and results are described in the Supporting Information Text file (XLSX)
L.P. and M.V. contributed equally. The manuscript was written through contributions of all authors. All authors have given approval to the definitive version of the manuscript. Conceptualization: L.P. and M.V. Data curation: M.V. Formal analysis: M.V. and T.P. Funding acquisition: L.P. Investigation: M.V., T.P., and L.P. Methodology: L.P. and M.V. Project administration: L.P. Resources: L.P. Software: M.V. and T.P. Supervision: L.P. Validation: M.V. and T.P. Visualization: M.V., T.P., and L.P. Writing (original draft): L.P. and M.V. Writing (review and editing): L.P., T.P., and M.V.
The project was funded by a stimulus fund for internationally relevant research networks under RIVM-project number (O/121000) and further supported by stimulus and feedback from the IRISS project (https://iriss-ssbd.eu/iriss/about-iriss/) on forwarding the SSbD approach. The IRISS project receives funding from the European Union’s HORIZON EUROPE research and innovation program under grant agreement no. 101058245.
The authors declare no competing financial interest.
References
- United Nations Environment Programme; International Science Council. Navigating New Horizons: A global foresight report on planetary health and human wellbeing; 2024. https://wedocs.unep.org/20.500.11822/45890 (accessed Oct. 30, 2024).
- European Commission. Chemicals Strategy for Sustainability. Towards a toxic-free environment; COM(2020) 667 final; European Commission: Brussels, 14.10.2020.
- Caldeira, C.; Farcal, R.; Garmendia Aguirre, I.; Mancini, L.; Tosches, D.; Amelio, A.; Rasmussen, K.; Rauscher, H.; Riego Sintes, J.; Sala, S. Safe and sustainable by design chemicals and materials - Framework for the definition of criteria and evaluation procedure for chemicals and materials; JRC; Publications Office of the European Union: Luxembourg, 2022.
- European Commission. Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH). Off. J. Eur. Union 2006, L 396, 1–848. https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:02006R01907-20140410&from=EN
- European Commission. Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000 establishing a framework for Community action in the field of water policy. Off. J. Eur. Communities 2000, L 327, 1–72. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L:2000:2327:TOC
- Sala, S.; Biganzoli, F.; Mengual, E. S.; Saouter, E. Toxicity impacts in the environmental footprint method: calculation principles. Int. J. Life Cycle Assess. 2022, 27 (4), 587–602. DOI: 10.1007/s11367-022-02033-0.
- Wang, Z.; Walker, G. W.; Muir, D. C. G.; Nagatani-Yoshida, K. Toward a Global Understanding of Chemical Pollution: A First Comprehensive Analysis of National and Regional Chemical Inventories. Environ. Sci. Technol. 2020, 54, 2575. DOI: 10.1021/acs.est.9b06379.
- Posthuma, L.; van Gils, J.; Zijp, M. C.; van de Meent, D.; de Zwart, D. Species sensitivity distributions for use in environmental protection, assessment, and management of aquatic ecosystems for 12 386 chemicals. Environ. Toxicol. Chem. 2019, 38 (4), 905–917. DOI: 10.1002/etc.4373.
- Kristiansson, E.; Coria, J.; Gunnarsson, L.; Gustavsson, M. Does the scientific knowledge reflect the chemical diversity of environmental pollution? A twenty-year perspective. Environ. Sci. Policy 2021, 126, 90–98. DOI: 10.1016/j.envsci.2021.09.007.
- Fischer, E. Einfluss der Configuration auf die Wirkung der Enzyme. Ber. Dtsch. Chem. Ges. 1894, 27, 2985–2993. DOI: 10.1002/cber.18940270364.
- McKinney, J. D. The molecular basis of chemical toxicity. Environ. Health Perspect. 1985, 61, 5–10. DOI: 10.1289/ehp.85615.
- Viljanen, M.; Minnema, J.; Wassenaar, P. N. H.; Rorije, E.; Peijnenburg, W. What is the ecotoxicity of a given chemical for a given aquatic species? Predicting interactions between species and chemicals using recommender system techniques. SAR QSAR Environ. Res. 2023, 34 (10), 765–788. DOI: 10.1080/1062936X.2023.2254225.
- von Borries, K.; Holmquist, H.; Kosnik, M.; Beckwith, K. V.; Jolliet, O.; Goodman, J. M.; Fantke, P. Potential for Machine Learning to Address Data Gaps in Human Toxicity and Ecotoxicity Characterization. Environ. Sci. Technol. 2023, 57 (46), 18259–18270. DOI: 10.1021/acs.est.3c05300.
- Kosnik, M. B.; Schuwirth, N.; Rico, A. Harnessing Computational Methods to Characterize Chemical Impacts on Biodiversity. Environ. Sci. Technol. Lett. 2024, 11, 185. DOI: 10.1021/acs.estlett.3c00865.
- NORMAN Network. Deriving Environmental Quality Standards for chemical substances in surface waters. https://www.norman-network.com/nds/ecotox/docs/Fact-sheet-EQS-Derivation.pdf (accessed April 22, 2025).
- von der Ohe, P. C.; Dulio, V.; Slobodnik, J.; De Deckere, E.; Kühne, R.; Ebert, R.-U.; Ginebreda, A.; De Cooman, W.; Schüürmann, G.; Brack, W. A new risk assessment approach for the prioritization of 500 classical and emerging organic microcontaminants as potential river basin specific pollutants under the European Water Framework Directive. Sci. Total Environ. 2011, 409 (11), 2064–2077. DOI: 10.1016/j.scitotenv.2011.01.054.
- Viljanen, M.; Airola, A.; Pahikkala, T. Generalized vec trick for fast learning of pairwise kernel models. Mach. Learn. 2022, 111 (2), 543–573. DOI: 10.1007/s10994-021-06127-y.
- Rendle, S. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol. 2012, 3 (3), 1–22. DOI: 10.1145/2168752.2168771.
- Schür, C.; Gasser, L.; Perez-Cruz, F.; Schirmer, K.; Baity-Jesi, M. A benchmark dataset for machine learning in ecotoxicology. Sci. Data 2023, 10 (1), 718. DOI: 10.1038/s41597-023-02612-2.
- Schür, C.; Schirmer, K.; Baity-Jesi, M. On the Comparability of Studies in Predictive Ecotoxicology. bioRxiv 2025. DOI: 10.1101/2025.03.04.641385.
- Oginah, S. A.; Posthuma, L.; Hauschild, M.; Slootweg, J.; Kosnik, M.; Fantke, P. To Split or Not to Split: Characterizing Chemical Pollution Impacts in Aquatic Ecosystems with Species Sensitivity Distributions for Specific Taxonomic Groups. Environ. Sci. Technol. 2023, 57, 14526–14538. DOI: 10.1021/acs.est.3c04968.
- Fox, D. R.; van Dam, R. A.; Fisher, R.; Batley, G. E.; Tillmanns, A. R.; Thorley, J.; Schwarz, C. J.; Spry, D. J.; McTavish, K. Recent Developments in Species Sensitivity Distribution Modeling. Environ. Toxicol. Chem. 2020, 40 (2), 293–308. DOI: 10.1002/etc.4925.
- Notenboom, J.; Vaal, M. A.; Hoekstra, J. A. Using comparative ecotoxicology to develop quantitative species sensitivity relationships (QSSR). Environ. Sci. Pollut. Res. 1995, 2, 242–243. DOI: 10.1007/BF02986776.
- Hickey, G. L.; Craig, P. S.; Luttik, R.; de Zwart, D. On the quantification of intertest variability in ecotoxicity data with application to species sensitivity distributions. Environ. Toxicol. Chem. 2012, 31 (8), 1903–1910. DOI: 10.1002/etc.1891.
- Gasser, L.; Schür, C.; Perez-Cruz, F.; Schirmer, K.; Baity-Jesi, M. Machine learning-based prediction of fish acute mortality: implementation, interpretation, and regulatory relevance. Environ. Sci.: Adv. 2024, 3, 1124. DOI: 10.1039/D4VA00072B.
- Brix, K. V.; Esbaugh, A. J.; Grosell, M. The toxicity and physiological effects of copper on the freshwater pulmonate snail, Lymnaea stagnalis. Comp. Biochem. Physiol., Part C: Toxicol. Pharmacol. 2011, 154 (3), 261–267. DOI: 10.1016/j.cbpc.2011.06.004.
- Suter, G. W. North American history of Species Sensitivity Distributions. In Species Sensitivity Distributions in Ecotoxicology; Posthuma, L., Suter, G. W., II, Traas, T. P., Eds.; Lewis Publishers, 2002; pp 11–18.
- Bernstein, P. L. Against the Gods: The Remarkable Story of Risk; John Wiley & Sons, Inc., 1996.
- Van Straalen, N. M.; Van Leeuwen, C. J. European history of species sensitivity distributions. In Species Sensitivity Distributions in Ecotoxicology; Posthuma, L., Suter, G. W., II, Traas, T. P., Eds.; Lewis Publishers, 2002; pp 211–254.
- Lemm, J. U.; Venohr, M.; Globevnik, L.; Stefanidis, K.; Panagopoulos, Y.; van Gils, J.; Posthuma, L.; Kristensen, P.; Feld, C. K.; Mahnkopf, J.; Hering, D.; Birk, S. Multiple stressors determine river ecological status at the European scale: Towards an integrated understanding of river status deterioration. Global Change Biol. 2021, 27 (9), 1962–1975. DOI: 10.1111/gcb.15504.
- Posthuma, L.; Zijp, M. C.; De Zwart, D.; Van de Meent, D.; Globevnik, L.; Koprivsek, M.; Focks, A.; Van Gils, J.; Birk, S. Chemical pollution imposes limitations to the ecological status of European surface waters. Sci. Rep. 2020, 10 (1), 14825. DOI: 10.1038/s41598-020-71537-2.
- Klimisch, H. J.; Andreae, M.; Tillmann, U. A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul. Toxicol. Pharmacol. 1997, 25 (1), 1–5. DOI: 10.1006/rtph.1996.1076.
- Moermond, C. T. A.; Kase, R.; Korkaric, M.; Ågerstrand, M. CRED: Criteria for reporting and evaluating ecotoxicity data. Environ. Toxicol. Chem. 2015, 35 (5), 1297–1309. DOI: 10.1002/etc.3259.
- Fox, D. R. Selection bias correction for species sensitivity distribution modeling and hazardous concentration estimation. Environ. Toxicol. Chem. 2015, 34 (11), 2555–2563. DOI: 10.1002/etc.3098.
- Cairns, J. The Myth of the Most Sensitive Species: Multispecies testing can provide valuable evidence for protecting the environment. BioScience 1986, 36 (10), 670–672. DOI: 10.2307/1310388.
- Blanck, H. Species Dependent Variation among Aquatic Organisms in Their Sensitivity to Chemicals. Ecol. Bull. 1984, 36, 107–119.
- Musset, L.; OECD. Current Approaches in the Statistical Analysis of Ecotoxicity Data; 2006. DOI: 10.1787/9789264085275-en.
- Birk, S.; Bonne, W.; Borja, A.; Brucet, S.; Courrat, A.; Poikane, S.; Solimini, A.; van de Bund, W.; Zampoukas, N.; Hering, D. Three hundred ways to assess Europe’s surface waters: An almost complete overview of biological methods to implement the Water Framework Directive. Ecol. Indic. 2012, 18, 31–41. DOI: 10.1016/j.ecolind.2011.10.009.
- Zijp, M. C.; Posthuma, L.; Van de Meent, D. Definition and applications of a versatile chemical pollution footprint methodology. Environ. Sci. Technol. 2014, 48, 10588–10597. DOI: 10.1021/es500629f.
- Kosnik, M. B.; Hauschild, M. Z.; Fantke, P. Toward Assessing Absolute Environmental Sustainability of Chemical Pollution. Environ. Sci. Technol. 2022, 56 (8), 4776–4787. DOI: 10.1021/acs.est.1c06098.