Reflected generalized concentration addition and Bayesian hierarchical models to improve chemical mixture prediction

Daniel Zilber; Kyle Messier

doi:10.1371/journal.pone.0298687

2024 Mar 28;19(3):e0298687. doi: 10.1371/journal.pone.0298687

Daniel Zilber¹, Kyle Messier¹,*

Editor: Y-h Taguchi²

PMCID: PMC10977799 PMID: 38547186

Abstract

Environmental toxicants overwhelmingly occur together as mixtures. The variety of possible chemical interactions makes it difficult to predict the danger of the mixture. In this work, we propose the novel Reflected Generalized Concentration Addition (RGCA), a piece-wise, geometric technique for sigmoidal dose-response inverse functions that extends the use of generalized concentration addition (GCA) for 3+ parameter models. Since experimental testing of all relevant mixtures is costly and intractable, we rely only on the individual chemical dose responses. Additionally, RGCA enhances the classical two-step model for the cumulative effects of mixtures, which assumes a combination of GCA and independent action (IA). We explore how various clustering methods can dramatically improve predictions. We compare our technique to the IA, CA, and GCA models and show in a simulation study that the two-step approach performs well under a variety of true models. We then apply our method to a challenging data set of individual chemical and mixture responses where the target is an androgen receptor (Tox21 AR-luc). Our results show significantly improved predictions for larger mixtures. Our work complements ongoing efforts to predict environmental exposure to various chemicals and offers a starting point for combining different exposure predictions to quantify a total risk to health.

1 Introduction

For many years, the field of toxicology focused on understanding how individual chemicals affect biological systems. However, it has become clear in recent years that individual chemicals can have a cumulative effect: exposure to safe levels of multiple chemicals can be toxic in aggregate [1, 2]. Even worse, durable “forever” chemicals accumulate in the environment and contribute to the growing list of chemicals people are exposed to every day [3]. Hence, being able to predict the toxicity of a mixture has moved from being an esoteric toxicological puzzle to an urgent challenge to understand looming health risks.

Mixture prediction is mainly a logistical problem for standard toxicology methods. For example, for a set of 10 chemicals, there are more than a thousand combinations of chemicals, and for each of those combinations there are countless ways to set the relative concentrations. How does one go about collecting all that data at reasonable cost, and what is to be done if a new chemical is added to the list? Realistically, there are thousands of chemicals that could potentially contribute to a mixture, so an experimental approach is infeasible. Recent spatial exposure maps such as those created in [4] also illustrate the variety of mixtures that people across the country are exposed to and present a compelling use-case for a predictive model. An observational study may seem like a natural work-around to the cost of experiments, but data collection is still tricky and multicollinearity makes the analysis difficult [5]. For these reasons, we focus on methodology that can predict a mixture effect directly from the individual chemical properties rather than a regression approach which requires experimental observations to fit a model.

The two most popular classical methods for predicting a mixture are concentration addition [6] and independent action [7]. Concentration addition (CA) models a mixture of chemicals as if they were different amounts of the same molecule; concentrations are scaled relative to their potency and added together as the name implies. The biological assumption is that all mixture components have the same mode of action. In contrast, independent action (IA) [8] assumes all modes of action are different, so a series of chemical responses can be added after an appropriate relative scaling. This is response addition, in contrast to the doses being scaled and added as in CA. Both models assume the chemicals are additive and do not synergize, a shortcoming that is not addressed in this work.

Often, neither CA nor IA is accurate [7], and a number of methods offer a middle path. Two-step or joint models such as those used in [9–11] apply CA to chemicals assumed to have similar modes of action and IA to combine the effects from different modes of action. Mode of action is not always known and quantitative structure-activity relationships (QSAR) can be used to inform the joint procedure [12, 13], but the relevant QSARs have to be selected either by an expert or through a procedure fitted to an observed mixture response. When the mixture response is available, techniques such as fuzzy membership models [14, 15] and integrated two-step models [16] can also be applied, but as mentioned we wish to avoid the implied experimental constraint.

When applying a two-step method, it is desirable to use the most accurate models for the individual dose responses. CA requires all chemicals to have equal maximum effects. Generalized concentration addition (GCA) is a powerful extension of CA [17] that can account for partial agonists, or chemicals that are less effective than others and serve to antagonize large effects if present at high concentrations. GCA requires strong simplifying assumptions to model individual dose effects, so [18] present an alternative to GCA that allows for more complex models by fixing the toxic effect of partial agonists at higher concentrations. While they have reasonably good empirical results, their algorithm is essentially making the same compromise as GCA: the true sill parameter is sacrificed to correctly model the slope parameter that is ignored in GCA.

We propose a new technique that provides uncertainty quantification and generalizes the approach of GCA when the individual toxic responses are modeled with a three-parameter Hill or logistic function. The parameters represent a maximum effect, a dose to reach 50% of the maximum effect, and a slope, and our method allows for all three parameters to vary in contrast to the fixed slope value of 1 required by GCA. A key observation is that the standard case with unit slope exhibits symmetry that can be applied in the general case by defining an inverse based on reflection in a piecewise fashion. Like GCA, there is an option to use this inverse within a two-step framework where chemicals can be grouped together. Chemicals within a group are combined into a single effective response with GCA and then the responses are combined with IA.

The rest of this paper is organized as follows. In section 2 we describe the details of our method, including the Bayesian model, the reflection argument for extending the inverse, and possible implementations of the two-step model with uncertainty quantification. In section 2.5 we describe the motivating mixtures problem with data from the Tox21 database [19]. In section 3 we describe the results of our method both on simulated data and on the real data, including diagnostics for the parameter estimation procedures. Section 4 concludes with a discussion and areas of improvement. Technical details are left to the supplement.

2 Materials and methods

Our goal in this work is to create a tool that can predict the toxicity of an arbitrary collection of chemicals for a specific assay. This is in contrast to a model that is fit to a specific mixture using the empirical mixture response. This means our first step is to fit a model to each individual chemical for the assay of interest. We then want to use the set of individual models to predict the mixture effect with uncertainty quantification, and when possible, validate against empirical data. A summary of our approach is shown as pseudocode in Algorithm 1.

Algorithm 1: Toxicological Mixture Prediction Algorithm

Data: Individual dose response data (c_ik, r_ijk) for chemicals i ∈ 1:N, replicate j, and concentration k; mixture dose C with component doses c₁, …, c_N

Result: (1) Individual dose-response curve parameters and uncertainty; (2) cluster assignment for dose-response groups; (3) total mixture dose-response predictions R_mix

(Step 1:) Estimate the individual fixed and random effect parameters for S1 Eq 7 using the BHM of S2.1 Section in S1 File

$$ r_{ijk} = \frac{α_{i} + u_{ij}}{1 + (\frac{θ_{i}}{c_{k}})^{β_{i}}} + v_{ij} + ϵ_{ijk} $$

(Step 2:) Compute grouping, if not random: K-means, sill, etc.

(Step 3a:) Sample Parameters

for b ← 1 to B do

Generate a random cluster assignment or use computed cluster: denote the grouping as G_b with K_b clusters;

Sample the remaining parameters α_b = α_1b:Nb, etc, from the BHM posterior;

Let $P_{b} = \{G_{b}, α_{b}, β_{b}, θ_{b}, u_{b}, v_{b}\}$ represent a set of parameters for the b’th sampled curve

end

(Step 3b:) Predict Response

foreach $P_{b}$ with b ← 1 to B do

Input: Mixture dose C = (c₁, …, c_N)

for Cluster k ← 1 to K_b do

Calculate the group response for the b’th sample curve R_b,k via RGCA using Eq 6 for the inverse $f_{i}^{-1}$:

$$ R_{b,k} = \arg\min_{R} | \sum_{q \in Q_{k}} \frac{c_{q}}{f_{q}^{-1}(R)} - 1 | $$

end

Calculate total mixture response for the sampled parameters $R_{b} = α_{\max} (1 - \prod_{k = 1}^{K} (1 - f_{k}(C_{k}) / α_{\max}))$, Eq 3, using f_k(C_k) = R_b,k from the previous step

end

Output: Median, 5th, and 95th quantiles for the collection {R_b} as the estimated mixture response R_mix and uncertainty

This section is structured as follows. Section 2.1 reviews the standard methods of Independent Action (IA) and Generalized Concentration Addition (GCA) and how they are combined into a two-step method. A novel extension to GCA is illustrated in Section 2.2, which we call reflected GCA (RGCA). This device is necessary when applying GCA or CA for slope values different from 1 and can also be used in a two-step method. The two-step approach requires parameter estimates and clustering, so the novel extension is followed by a description of the statistical models used for fitting in Section 2.3 and the interpretation of clustering in Section 2.4. Some information is provided about the motivating data for this work in Section 2.5 and we conclude with comments about simulation and validation in Section 2.6.

2.1 GCA, IA, and the two-step model

In our notation, r is a response, c a concentration of a chemical, α a maximum response or effect, θ the concentration to reach 50% of the maximum effect or EC₅₀, and β the slope. An upper case letter R or C represents the response or concentration of a mixture. Indexing such as r_i represents the i’th response out of a collection of N chemicals, while R_i represents the response of a mixture of chemicals with index depending on context. An index b = 1, …, B refers to a sample from the Bayesian posterior where we take B total samples. As summarized in Algorithm 1, R_b refers to the complete mixture response for a particular set of parameters $P_{b}$ , R_b,k refers to the response of the kth subcluster for $P_{b}$ , and R_mix refers to the final estimate of the mixture response. In this work we assume a Hill function for a toxic response:

$$ r = f(c | α, θ, β) = \frac{α}{1 + (\frac{θ}{c})^{β}} $$

(1)
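As a minimal sketch of Eq 1, the Hill function and its ordinary inverse can be written in a few lines of Python. The function names are hypothetical, and returning 0 at c = 0 is an assumption (the limiting value for θ > 0 and β > 0); the example parameters are illustrative only.

```python
def hill(c, alpha, theta, beta):
    """Hill response of Eq 1: alpha / (1 + (theta / c)**beta)."""
    if c == 0:
        return 0.0  # assumed limiting value as c -> 0 for theta, beta > 0
    return alpha / (1.0 + (theta / c) ** beta)


def hill_inverse(r, alpha, theta, beta):
    """Ordinary inverse of the Hill function, valid only for 0 < r < alpha."""
    return theta / (alpha / r - 1.0) ** (1.0 / beta)


# Illustrative parameters: sill 0.9, EC50 0.5, slope 0.5.
alpha, theta, beta = 0.9, 0.5, 0.5
half_max = hill(theta, alpha, theta, beta)  # response at c = EC50
round_trip = hill_inverse(hill(2.0, alpha, theta, beta), alpha, theta, beta)
```

At c = θ the response equals half the sill, which matches the definition of the EC₅₀, and applying the inverse to a response below the sill recovers the original concentration.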

The standard version of GCA is derived from concentration addition in [17]. The great advantage of this technique over CA is the ability to make realistic predictions in the case where one chemical has a smaller maximum effect than another. Theoretically, we expect that at high concentrations, the less toxic chemical actually prevents some of the damage from the more effective toxicant. This is the type of antagonism that GCA models, as we will illustrate. The GCA equation for a mixture of two chemicals is written:

$$ \frac{[A]}{f_{A}^{-1}(R)} + \frac{[B]}{f_{B}^{-1}(R)} = 1 $$

(2)

The method of [17] is very flexible in terms of the choice of f, the toxic response function, but the authors mainly work with the Hill model and we build on this specific case.
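For intuition, when every slope is 1 the Hill inverse is f⁻¹(R) = θ / (α/R − 1), and substituting it into Eq 2 and solving for R gives the closed form R = Σ(α_i c_i/θ_i) / (1 + Σ c_i/θ_i). The sketch below evaluates that expression and checks it against Eq 2; the function names and the example values are hypothetical.

```python
def gca_unit_slope(concs, alphas, thetas):
    """Closed-form GCA response for Hill curves with slope fixed to 1."""
    num = sum(a * c / t for a, c, t in zip(alphas, concs, thetas))
    den = 1.0 + sum(c / t for c, t in zip(concs, thetas))
    return num / den


def gca_residual(R, concs, alphas, thetas):
    """Left side of Eq 2 minus 1, using f^{-1}(R) = theta / (alpha / R - 1)."""
    return sum(c * (a / R - 1.0) / t
               for a, c, t in zip(alphas, concs, thetas)) - 1.0


# A full agonist (sill 1.0) mixed with a partial agonist (sill 0.4).
concs, alphas, thetas = [1.0, 2.0], [1.0, 0.4], [0.5, 1.0]
R = gca_unit_slope(concs, alphas, thetas)
residual = gca_residual(R, concs, alphas, thetas)
```

In this example the mixture response lies above the partial agonist's sill, so that chemical enters Eq 2 with a negative term, which is the antagonism described above.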

Recall that a two-step model in our context means that we cluster a set of chemicals and use GCA within clusters as step 1 and assume independent action (IA) across clusters as step 2. Eq 3 expresses IA when the index k corresponds to a single chemical and expresses the two-step model when k corresponds to the k’th set of chemicals. Note the introduction of the parameter α_max = max_i α_i which rescales the final prediction to the highest observed individual response.

$$ R = α_{\max} (1 - \prod_{k = 1}^{K} (1 - \frac{f_{k}(C_{k})}{α_{\max}})) $$

(3)
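Eq 3 can be sketched directly in Python. The function name is hypothetical; the inputs are the per-chemical (IA) or per-cluster (two-step) responses f_k(C_k) and the rescaling constant α_max, with illustrative values.

```python
def independent_action(group_responses, alpha_max):
    """Eq 3: combine per-group responses f_k(C_k) under independent action."""
    prod = 1.0
    for r in group_responses:
        prod *= 1.0 - r / alpha_max
    return alpha_max * (1.0 - prod)


# Two groups with responses 0.3 and 0.5, rescaled by alpha_max = 1.0.
combined = independent_action([0.3, 0.5], alpha_max=1.0)
# With a single group the formula returns that group's response unchanged.
single = independent_action([0.4], alpha_max=0.9)
```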

Existing theory suggests that chemicals with the same mode of action are expected to obey CA, while chemicals that have unrelated molecular targets are assumed to obey IA [7]. Unfortunately, mode of action is not well understood for many chemicals. Hence we compare the effectiveness of several clustering methods, including no clustering, for our two-step methodology. For completeness we also apply CA, using the same scaling idea that IA applies by resetting all sills to 1, computing the combined effect, and then multiplying by the largest sill. The slope parameters for CA do not have to be 1 as in GCA.

2.2 Reflected GCA

We now make a few observations about the Hill function. First, note that aside from a few special values of β such as 1 and 1/2, the inverse is not a real number when r exceeds the sill α:

$$ c = f^{-1}(r | α, θ, β) = \frac{θ}{(\frac{α}{r} - 1)^{1/β}} $$

(4)

An imaginary component makes the resulting estimate difficult to use and interpret [20]. Next, when β = 1, we have a formula describing a hyperbola:

$$ c = f^{-1}(r | α, θ, β = 1) = \frac{θ}{\frac{α}{r} - 1} $$

(5)

Studying either the plot or the formula, we make the key observation that we can reflect along the axes x = −θ and y = α to recover the “negative concentration” portion of the graph. This symmetry is a property of hyperbolas and is illustrated below in Fig 1. The existence of the Hill function for response values above the sill is what makes GCA possible, since this region returns negative concentrations to yield the partial agonist response for large effects.

Fig 1 — Reflections can be used to extend the Hill function with a slope ≠ 1, shown in green, to match the support of the standard GCA case, shown as a dashed black line. The green line (RGCA) has parameters (Sill, EC₅₀, Slope) = (0.9, 0.5, 0.5), while the shadow lines have slope values ranging from 0.1 to 4. The teal line is acquired by reflecting the green segment across the vertical line X = −EC₅₀ (intermediate step, gray) and then across the horizontal line Y = Sill. The Extension (red line) is found by flipping the pink box containing the RGCA line across the diagonal Y = −X (not shown) and using the EC₅₀ value as the effective sill. The Reflected Extension (purple line) is found by the same procedure as the Reflection (teal) line, starting with the Extension (red) line.

There is also a secondary symmetry that is crucial for full generality of our proposed approach. Focusing on the continuous portion of the curve that includes positive concentrations (Fig 1, pink box), there is a symmetry between the segment before and after the origin (x = y = 0). This region of symmetry is bounded by the reflection axes and X-Y axes as illustrated in Fig 1 with the pink and blue boxes. This additional symmetry is relevant for two reasons. First, the negative reflected component (teal line) of the defined curve (green line) described earlier does not extend across the full negative support (purple line). By reflection, the maximum defined effect is only two times the sill, so if some other chemical in the mixture has a maximum effect above this threshold, there is no defined value. Hence, we need to define the extension (red line) and reflect it as well (purple line). The second reason the extension is necessary is because the sill parameter in the observed dose-response can be negative. In this case, the relevant portion for inversion at positive effects is the extension shown as the red line, since the entire curve is flipped across the X-axis.

Our first novel contribution is to apply this reflection argument when the slope parameter β is not 1. The inverse function is specified for this domain differently from the regular inverse and the derivation is provided in the next section, resulting in Eq 6 when α > 0. Similar equations hold for the case α < 0 with appropriate sign changes for each region, see the supplement.

This inverse provides a wide enough support to satisfy the invertibility requirements of Eq 2, so we now have the ability to apply GCA to Hill models with non-unit slopes.

2.2.1 Derivation of RGCA

Here, we derive how to extend GCA to non-unit slopes. To ensure RGCA is a valid function that is fully-defined across the real-number domain, we utilize geometric techniques such as reflections, extensions, and substitutions. Starting with the three parameter Hill model in Eq 1 and its inverse, we define three additional functions as combinations of reflections and extensions.

Reflection. The first function is a reflection (teal line, Fig 1) along the axes defined by the parameters x = −θ and y = α. The calculations are straightforward since the axes are parallel to the standard X and Y axes. We first subtract the offset for each axis, then negate the sign for the corresponding variable, and then add back the offset. We represent the concentration as c and the reflected concentration as c′, and similarly for the response r and r′:

$$ c' = (-θ) - (c - (-θ)) = -2θ - c $$

$$ r' = α - (r - α) = 2α - r $$

To provide the inverse, we just plug in the new (c′, r′) into the relation f⁻¹(r) = c:

$$ -2θ - c = f^{-1}(2α - r) \Rightarrow c = -2θ - θ (\frac{α}{2α - r} - 1)^{-1/β} $$
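As a sanity check on the reflected branch (a worked step added here, not part of the original derivation), setting β = 1 should recover the hyperbola of Eq 5 for α < r < 2α:

$$ -2θ - θ (\frac{α}{2α - r} - 1)^{-1} = -2θ - \frac{θ (2α - r)}{r - α} = \frac{-θ r}{r - α} = \frac{θ}{\frac{α}{r} - 1} $$

So with unit slope the reflection reproduces the negative-concentration branch that standard GCA already relies on.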

Extension. Next we extend (red line, Fig 1) the Hill function to negative values of c between the origin and the asymptote c = −θ. The key observation is that, while it is not a direct reflection along an axis like y = x, it can be thought of as a separate Hill function where the role of the sill and EC₅₀ have been reversed along with c and r. Hence, the standard Hill function becomes the inverse, where the two parameters are swapped and the new sill (previously the EC₅₀) and input response are negated to account for the sign change of the support. While it is necessary to use the EC₅₀ as the sill (so that there is a vertical asymptote at c = −θ), it is not necessary to use the sill as the new EC₅₀.

$$ c = f_{ext}^{-1}(r) = \frac{-θ}{1 + (\frac{α}{-r})^{β}}, \quad r = f_{ext}(c) = -α (\frac{-θ}{c} - 1)^{-1/β} $$

Here the equation for c as a function of the response r has the form of a Hill function with slope β even though it represents the inverse; the equation for the response r as a function of c has the form of the Hill inverse. Finally we create a smooth transition by inverting the slope parameter to match the slope of the adjacent segment:

$$ c = f_{ext}^{-1}(r) = \frac{-θ}{1 + (\frac{α}{-r})^{1/β}} $$

If the slope is not inverted, the resulting curve will have a non-smooth kink at 0 and lead to situations where there is no solution to the RGCA equation.

Reflected Extension. Finally, we reflect the extension we just derived over the axes x = −θ and y = α (purple line, Fig 1). The method is the same as before, so we plug c′ and r′ into $f_{ext}^{-1}(r) = c$:

$$ -2θ - c = f_{ext}^{-1}(2α - r) \Rightarrow c = -2θ + \frac{θ}{1 + (\frac{α}{r - 2α})^{1/β}} $$

Combined Inverse. Putting the pieces together, we have a piecewise inverse for the four domains:

$$ c = f^{-1}(r | α > 0, θ, β > 0) = \begin{cases} \frac{-θ}{1 + (\frac{-α}{r})^{1/β}} & r \in (-\infty, 0) \\ θ (\frac{α}{r} - 1)^{-1/β} & r \in [0, α) \\ -2θ - θ (\frac{α}{2α - r} - 1)^{-1/β} & r \in (α, 2α) \\ -2θ + \frac{θ}{1 + (\frac{α}{r - 2α})^{1/β}} & r \in (2α, \infty) \end{cases} $$

(6)
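The four branches of Eq 6 translate into a short function. This is a sketch under stated assumptions rather than the authors' implementation: the function name is hypothetical, the values returned at r = 0 and r = 2α are the limits of the neighbouring branches (Eq 6 leaves r = 2α unassigned), and r = α is rejected because it is the asymptote.

```python
def rgca_inverse(r, alpha, theta, beta):
    """Piecewise inverse of Eq 6; assumes alpha > 0, theta > 0, beta > 0."""
    if alpha <= 0 or theta <= 0 or beta <= 0:
        raise ValueError("this sketch covers only alpha, theta, beta > 0")
    if r == alpha:
        raise ValueError("r = alpha is the asymptote; Eq 6 is undefined there")
    if r < 0:  # extension
        return -theta / (1.0 + (-alpha / r) ** (1.0 / beta))
    if r == 0:  # limit of the standard branch (avoids dividing by zero)
        return 0.0
    if r < alpha:  # standard Hill inverse
        return theta * (alpha / r - 1.0) ** (-1.0 / beta)
    if r < 2 * alpha:  # reflection
        return -2 * theta - theta * (alpha / (2 * alpha - r) - 1.0) ** (-1.0 / beta)
    if r == 2 * alpha:  # assumed value: shared limit of the adjacent branches
        return -2.0 * theta
    # reflected extension
    return -2 * theta + theta / (1.0 + (alpha / (r - 2 * alpha)) ** (1.0 / beta))


alpha, theta, beta = 0.9, 0.5, 0.5  # the (Sill, EC50, Slope) values quoted for Fig 1
at_half_sill = rgca_inverse(alpha / 2, alpha, theta, beta)
direct = rgca_inverse(0.3, alpha, theta, beta)
reflected = rgca_inverse(2 * alpha - 0.3, alpha, theta, beta)
far_right = rgca_inverse(1e9, alpha, theta, beta)
```

Here `at_half_sill` recovers the EC₅₀, `reflected` equals −2θ − `direct` as the reflection construction requires, and very large responses approach the vertical asymptote c = −θ.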

The resulting inverse maintains a coarse hyperbolic shape and continuity and is smooth at the transitions. This procedure is not limited to the Hill function and can be applied to any monotonic dose response function, but the resulting stability may vary. Note that negative slope parameters for the Hill function are not supported. The negative sill case α < 0 is shown in the appendix for completeness.

2.2.2 Limitations

An important property of GCA that partially carries over to RGCA is the existence and uniqueness of a solution. This means that a set of chemicals with arbitrary sill and EC₅₀ values will have a single solution for the equivalent response r in Eq 2. This stability is due to the fact that slope values are fixed to 1, and that it is extremely unlikely that the sills and EC₅₀ values will perfectly balance out. In the case of RGCA, it is possible that the solution may not be unique if any chemical’s sill is negative.

A proof sketch for the existence and uniqueness of the GCA solution is provided in the supplementary material. RGCA provably has a unique solution when all of the chemicals have positive sill values, which is applicable across a wide variety of problems in mixtures toxicology. The problem for RGCA arises when the slope values are not all 1 and some sill values are negative. In this case there can be a “battle” where one chemical dominates in one region and another chemical dominates in another. This complex behavior is shown in Fig 2, which shows an isobole plot (A) for two chemicals with a clear change in appearance near the bottom half. Chemical one (C1) has (sill, slope, EC₅₀) of (-2, 1.5, 1) and chemical two (C2) has parameters (1, 1, 0.6). The four smaller inset plots, B to E, show RGCA coefficients $1 / f_{i}^{-1}(R)$ from Eq 2 for specific concentrations pointed out by the black lines. The dotted horizontal line in each inset plot represents 1, the intersection with which provides the solution and matches the value of the contour in the isobole plot. Plot (E) in the top right, for concentrations (c1 = 4, c2 = 3), is similar to the result of GCA, which always has one solution. The bottom plot progression (B, C, D) shows how the RGCA solution changes from one unique value to three unique values as the proportion of chemical 1 in the total mixture increases. With chemical 2 at a fixed concentration, chemical 1 becomes the primary driver of the mixture and introduces a negative response. The boundary between regions is a bifurcation point where there is just enough chemical 1 to create a second solution (but not a third). Mathematically, the additional solutions are just the result of the slope parameters determining the rate at which particular functions go to zero or infinity.

Numerical precision can be an issue. For example, in the bottom right inset plot for (c1 = 4, c2 = 2), a local optimum at the positive boundary or a negative optimum may be found rather than the true positive optimum near the origin. This is the reason for the denser region in the lower half of Fig 2. Similar patterns can be seen in S1 Fig in S1 File in the supplement. Extending to larger mixtures is expected to yield similar results, as all positive sill terms can be added together and likewise for negative terms, reducing to the binary case shown.

In summary, when the sills of the chemicals have different signs, some combinations of parameters lead to multiple solutions. Fig 2 suggests a linear boundary beyond which multiple solutions and numerical issues may be a problem, but a general formula for when there are three solutions does not appear tractable. To avoid ambiguity, we could choose to always select the positive solution if available, but it is not clear that a positive solution near 0 is more correct than a negative solution with larger magnitude. Since the numerical issues are difficult to predict, our recommendation is to simply note any optimization messages or errors when using this method.
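For the well-behaved case where every sill is positive, the RGCA equation (Eq 2 with the inverse of Eq 6) can be solved by bracketing. The sketch below is an assumption-laden illustration, not the authors' solver: it uses plain bisection on (0, max sill), relying on each term c/f⁻¹(R) decreasing in R when the sill is positive, treats 1/f⁻¹ as 0 at the asymptote r = α, and has no safeguards for extreme slopes or for mixed-sign sills, where several roots can exist. All names are hypothetical.

```python
import math


def rgca_inverse(r, alpha, theta, beta):
    """Eq 6 for alpha > 0; returns inf at r = alpha so that c / inverse is 0."""
    if r < 0:
        return -theta / (1.0 + (-alpha / r) ** (1.0 / beta))
    if r == 0:
        return 0.0
    if r < alpha:
        return theta * (alpha / r - 1.0) ** (-1.0 / beta)
    if r == alpha:
        return math.inf
    if r < 2 * alpha:
        return -2 * theta - theta * (alpha / (2 * alpha - r) - 1.0) ** (-1.0 / beta)
    if r == 2 * alpha:
        return -2.0 * theta
    return -2 * theta + theta / (1.0 + (alpha / (r - 2 * alpha)) ** (1.0 / beta))


def rgca_residual(R, concs, params):
    """Sum of c_q / f_q^{-1}(R) minus 1; params holds (alpha, theta, beta) tuples."""
    return sum(c / rgca_inverse(R, *p) for c, p in zip(concs, params) if c != 0) - 1.0


def solve_rgca(concs, params, tol=1e-12):
    """Bisection for the group response, assuming all sills are positive."""
    if all(c == 0 for c in concs):
        return 0.0
    lo, hi = 0.0, max(alpha for alpha, _, _ in params)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rgca_residual(mid, concs, params) > 0:
            lo = mid  # residual still positive: the root lies above mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


concs = [1.0, 2.0]
unit_slopes = [(1.0, 0.5, 1.0), (0.4, 1.0, 1.0)]
mixed_slopes = [(1.0, 0.5, 0.5), (0.4, 1.0, 2.0)]
R_unit = solve_rgca(concs, unit_slopes)    # reduces to standard GCA
R_mixed = solve_rgca(concs, mixed_slopes)  # non-unit slopes need the Eq 6 inverse
```

With unit slopes the result agrees with standard GCA, which is a useful regression check before moving to non-unit slopes.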

2.3 Bayesian data model

As illustrated in Fig 3, chemicals have replicates and high variability between replicates. We express the Hill function as a statistical model by introducing random effect and noise terms (u, v, ϵ) and use additional indices to track the chemical (i) and replicate (j) and concentration sample (k):

$$ r_{ijk} = \frac{α_{i} + u_{ij}}{1 + (\frac{θ_{i}}{c_{k}})^{β_{i}}} + v_{ij} + ϵ_{ijk} $$

(7)

Fig 3 — Example dose response curves from the Tox21 data set with chemical name (CAS number). Clockwise from the top left, each plot shows replicates exhibiting: no effect, a random intercept effect, a random sill effect, and a consistent response without random effects. Eq 7 can account for all of the random effects, but we do not explicitly model a no-effect response with a null model because we include all chemicals when predicting a mixture response, even if there is no effect. A sharp decrease in response implies cytotoxicity and such points are removed before analysis.

The noise term ϵ represents the standard assumption of independent and identically distributed Gaussian noise and allows for negative values of the response R, which are observed in the data even though we might presume that negative effects are unrealistic. The term u_ij is a random effect that accounts for the replicates having variable maximum effects. The v_ij term acts as a y-intercept random effect; naturally when the concentration is 0 we expect a response of 0, but due to the data collection procedure this is not guaranteed.

Since this model does not have a closed form solution for the parameter estimates, and we care about the uncertainty in the estimates, we fit a Bayesian model using Markov Chain Monte Carlo; see [21] for a full reference. Note that the variance of the noise term is specified for each chemical rather than globally. This was done for two reasons: there are at least three replicates for each chemical, and a global variance parameter can be influenced by a chemical with a large sill, while chemicals with small but noticeable effects will get treated as noise. We run 25,000 iterations, discard a burn-in of 5,000, and thin the remaining 20,000 iterations by keeping every 20th sample, which leaves 1,000 posterior samples. The median of these samples is taken as a robust estimate of the center of the distribution. For sampling curves, we sample directly from the thinned set of posterior samples as described below in Section 2.4. The priors for the main parameters are Gaussian with mean 0 and variance 200 for the sill, Gamma with mean and variance 1 for the slope, and Gamma with mean 1 and variance 100 for the EC₅₀, respectively. The sill and EC₅₀ priors are weakly informative to avoid absurdly large values when the dose response curve is incomplete [22]. The slope parameters are given Gamma priors centered at 1 because our method requires a positive slope and a center of 1 reflects the baseline assumption from GCA. The full model specification is in the Supplement, S2 Section in S1 File, and R code is available on GitHub.
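The prior and sampler settings above involve a little arithmetic that is easy to check. The sketch below converts a Gamma mean and variance to the common shape/rate parameterization (mean = shape/rate, variance = shape/rate²) and counts the retained draws. Which parameterization the authors' R code uses, and which draw the thinning starts from, are not stated here, so both are assumptions, as are the names.

```python
def gamma_shape_rate(mean, variance):
    """Shape and rate of a Gamma distribution with the given mean and variance."""
    return mean ** 2 / variance, mean / variance


slope_prior = gamma_shape_rate(1.0, 1.0)    # Gamma with mean 1, variance 1
ec50_prior = gamma_shape_rate(1.0, 100.0)   # Gamma with mean 1, variance 100

# Thinning schedule: 25,000 iterations, 5,000 burn-in, keep every 20th draw.
n_iter, burn_in, thin = 25_000, 5_000, 20
kept_indices = range(burn_in, n_iter, thin)  # assumed to start right after burn-in
n_kept = len(kept_indices)
```

The slope prior works out to shape 1 and rate 1 (an exponential distribution), while the EC₅₀ prior has shape and rate 0.01, which is what makes it weakly informative.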

An immediate concern may be that our model does not allow for a null or no-effect chemical. This is a deliberate choice. In the context of CA, the natural thing to do is to assume extreme parameters, such as a very small sill, a very large EC₅₀, and fix the slope to 1, or simply ignore the no-effect chemicals. We make an assumption that all chemicals are relevant so we do not want to throw any out. We also intend to use the slope parameter for clustering, so the choice of a null model becomes consequential. Rather than attempt to justify a fixed null model, we leave it to the algorithm to find the best fit. We comment in the conclusion that this question is a direction for future work.

2.4 Clustering as interpolation

Clustering is a nuanced task that involves many decisions. Functional data such as dose response curves can be clustered according to the parameters or prior information, with multiple settings to adjust depending on what specific cluster method is used. In the context of the mixture problem, the benefit of clustering for a two-step or joint method is to serve as an interpolation between IA and CA [13]. A simple but motivating example is a mixture composed of androgen receptor (AR) agonists and estrogen receptor (ER) agonists. If this mixture is applied to an AR assay, one would expect the significant effect to come from the AR agonists. However, a naive application of CA on this kind of mixture will significantly underestimate the predicted response because the ER agonists have low or zero responses individually that dilute the effect.

In some cases like the example given, there is a clear candidate for clustering method. But as we will show in the Results, in practice this clustering is not enough to correctly recover the mixture response because the independence assumption ignores molecular dynamics such as binding affinity or synergy, and is also unaware of whatever grouping would result from knowing the true modes of action. As a secondary contribution of this work, we explore the effectiveness of a few different clustering methods to improve predictions when true modes of action are not known.

The first clustering approach we consider is to group chemicals on the binary condition of having a positive response versus a non-positive response. This is a simple clustering on the sill parameter. A second approach we consider is a partial random clustering. The chemicals are randomly assigned into one of two to five groups (hence partial, rather than allowing any possible group), along with one assignment representing the single group of CA and one assignment to represent the completely separated grouping of IA. The last cluster approach we explore is a K-means grouping based on all of the parameters, a logical extension of the argument that zero-effect chemicals should be grouped separately from positive-effect chemicals. The assignments are computed using the posterior median parameter estimates and a few reasonable options are taken based on the elbow criterion (the cluster count beyond which additional clusters do not significantly decrease the within-group variability).
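The first two clustering schemes are simple enough to sketch with the standard library. The function names are hypothetical, and the way the random scheme weights its options is an assumption: here the number of groups is drawn uniformly from {1 (the CA grouping), 2, 3, 4, 5, N (the IA grouping)}, and labels within a draw are uniform, so some groups may end up empty.

```python
import random


def sill_sign_clusters(sills):
    """Binary clustering: label 1 for a positive sill, 0 otherwise."""
    return [1 if a > 0 else 0 for a in sills]


def partial_random_clusters(n_chem, rng):
    """One random grouping; the uniform choice over group counts is assumed."""
    k = rng.choice([1, 2, 3, 4, 5, n_chem])
    if k == 1:
        return [0] * n_chem            # single group, as in CA
    if k == n_chem:
        return list(range(n_chem))     # every chemical alone, as in IA
    return [rng.randrange(k) for _ in range(n_chem)]


sign_labels = sill_sign_clusters([0.9, -0.1, 0.0, 0.3])
rng = random.Random(0)  # fixed seed only to make the sketch reproducible
random_labels = partial_random_clusters(18, rng)
```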

Our two-stage approach can now be summarized as follows (see Algorithm 1). We first fit Hill functions with random effects to the individual chemicals using a Bayesian model. To make a single prediction, we sample the parameters from the posterior distributions for each chemical, propagating variance by adding noise according to the estimated random effect. If the clustering method has any randomness (e.g., random clustering), we sample a cluster assignment. Given the parameters and a clustering assignment, we apply RGCA to the chemicals in each group to get a combined group effect, and then apply the IA model of Eq 3 to the set of group effects. GCA and CA predictions rely only on posterior mean parameter estimates for purposes of comparison, but could be adapted to use the Bayesian approach.

The uncertainty quantification for a predicted effect at a given concentration is achieved by generating multiple predictions (e.g., B = 100) and estimating point-wise quantiles. For comparison to existing methods, we compute curves for regular IA where every chemical gets its own cluster and GCA and CA where all chemicals are in one cluster. For GCA we fix the slope values to 1 while CA scales all sills to 1 and applies GCA, multiplying the final prediction by the largest sill. IA uses all of the same fitted parameters as our two-step method. For these standard methods, we do not compute uncertainty intervals to improve contrast with our Bayesian method. In practice, all of these techniques can be used with the Bayesian framework and will result in similar uncertainty intervals.
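The point-wise summary at a single mixture dose can be sketched as follows. The function name is hypothetical, the toy samples stand in for the B predicted responses R_b, and the interpolation rule (`method="inclusive"`) is an assumption, since the quantile estimator used in the original analysis is not specified here.

```python
import statistics


def summarize_predictions(samples):
    """Median and 5th/95th percentiles of predicted responses at one dose."""
    cuts = statistics.quantiles(samples, n=20, method="inclusive")  # 5% steps
    return {
        "q05": cuts[0],
        "median": statistics.median(samples),
        "q95": cuts[-1],
    }


# Toy stand-in for the collection {R_b}: 101 evenly spaced values on [0, 1].
toy_samples = [b / 100 for b in range(101)]
summary = summarize_predictions(toy_samples)
```

Repeating this at each concentration on a grid yields the median curve and the 5th to 95th percentile band.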

2.5 Tox21 data

Tox21 [23] is a library of 10,000 chemicals selected for their potential relevance to health and amenability to testing. Quantitative high throughput screening (qHTS) with specific assays is used to efficiently find which chemicals are relevant for pathways or targets of interest [19].

We focus on a subset of Tox21 that evaluates responses with the androgen receptor luciferase (AR-luc) assay. AR-luc is a cell line derived from human breast cancer cells that includes a gene to express a firefly luciferase protein. A promoter, MMTV, is introduced into the cell line DNA as the response element for the AR, while the luciferase gene is added as a reporter in the regulatory sequence with MMTV. In other words, MMTV is used as the piece of DNA (promoter) that the AR first attaches to in the nucleus (regulatory sequence) to begin transcription, and the luciferase gene is the subsequent section that is transcribed to create a measurable effect (reporter). This corresponds to the last step of the AR pathway, right before RNA transcription begins, so the data collected should be highly correlated with a count of how often a gene is expressed or protein produced. See [24, 25] for additional details on the analysis of Tox21 data for the AR-luc assay.

The input data consists of individual dose responses for 18 chemicals. Six of the chemicals are clear AR agonists, one is a weak agonist, and the remaining 11 are known estrogen receptor (ER) agonists but appear to have no effect on AR. Other assays in Tox21 suggest that some of these known ER agonists can act as AR antagonists. There are dose responses for 69 different mixtures as part of the same Tox21 data set, including binary, equipotent, equiconcentration, and combinations dominated by one or two chemicals. Some of the mixtures include only the AR or ER agonists. Many mixtures were designed by referring to their activity in a previous study, with chemical concentrations in the mixtures at different proportions of their EC₅₀ values [25]. As a part of quality control of the original data, doses that exhibited cytotoxic responses were flagged. These data points were subsequently excluded in our analysis.

Because our approach does not account for synergy or direct antagonism, the mixtures that are more sensitive to these interactions are expected to have greater prediction error. In fact, some binary mixtures were found to have antagonistic responses that entirely ignore the effect of one toxic compound. Furthermore, clustering for a binary mixture reduces to either GCA (i.e., one cluster) or IA (i.e., two clusters). Our focus is on “large” mixtures that contain proportions of many chemicals, which corresponds to a more realistic exposure scenario in which humans are exposed to a wide variety of complex chemicals and stressors through their life course, i.e., a complete exposome [26]. Additionally, with large, complex mixtures, the effects of synergy or antagonism are less likely to impact the whole mixture response [27].

A positive control chemical is used as a benchmark to define a response of 100 while a vehicle control defines a response of 0. The response does not have defined units because the response is a measure of relative fluorescence, and negative values simply mean less response than the vehicle control.

2.6 Simulation and validation

We conduct a simulation study to explore the relative performance of RGCA and/or a two-step approach. A single simulation run consists of generating Hill parameters and assigning a cluster grouping for 10 or 20 chemicals, treating this as the true mixture model. Mixture doses are computed as equipotent vectors for a range of dilutions. The models are then fit using the true parameters where possible and the clustering according to the method. For example, a GCA prediction will use the correct parameters but fix the slope values to 1 and assume a single group (all chemicals in one cluster). There are 50 simulation runs per model per mixture size. The set of true models is described in Table 1. The EC₅₀ parameters are sampled uniformly from 0.1 to 20 for all models. The models used for fitting include the true models along with CA and IA. When the fitted model is the true model, we expect 0 error, with the exception of the random clustering method, which always creates a random grouping. When the random model is used as the truth, only one random grouping is chosen to generate a mixture response; when used as a fitting procedure, ten random groupings are pre-generated, with four assignments having two clusters, four assignments having three clusters, and an assignment each for CA and IA. When GCA is the truth, RGCA should yield identical results because RGCA generalizes GCA. When RGCA is the truth, we expect a large error for all other methods when the slope parameter is different from 1. It is not clear how much relative error to expect when the two-step model is the true model or used for fitting.
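The parameter sampling and dose construction for one simulation run might look like the Python sketch below. It assumes that an equipotent dose vector means each chemical is dosed at the same multiple of its own EC₅₀; the dilution factors and the names `sample_true_model` and `equipotent_doses` are illustrative choices, not taken from the paper's code.

```python
import random


def sample_true_model(n_chem, slope_range, sill_range, rng):
    """Draw Hill parameters uniformly, with EC50 in [0.1, 20] for every model."""
    return [
        {
            "sill": rng.uniform(*sill_range),
            "ec50": rng.uniform(0.1, 20.0),
            "slope": rng.uniform(*slope_range),
        }
        for _ in range(n_chem)
    ]


def equipotent_doses(params, dilutions):
    """One dose vector per dilution: every chemical at the same fraction of its EC50."""
    return [[d * p["ec50"] for p in params] for d in dilutions]


rng = random.Random(1)
# Ranges for the "RGCA (slope ~ 1)" row of Table 1, with 10 chemicals.
true_params = sample_true_model(10, slope_range=(0.5, 1.5), sill_range=(1.5, 10.0), rng=rng)
# Hypothetical dilution factors spanning low to high total dose.
dilutions = [0.01, 0.1, 1.0, 10.0]
doses = equipotent_doses(true_params, dilutions)
```

Each row of `doses` can then be passed to the true model and to every fitted model, and the squared differences accumulated into the MSE for that run.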

Table 1. Simulation setup.

Parameters for the mixture response are sampled uniformly from the specified intervals according to the true model.

True Model	Slope Range	Sill Range	Clustering
GCA	1	[1.5, 10]	None
RGCA (slope ∼ 1)	U[0.5, 1.5]	U[1.5, 10]	None
RGCA (slope >> 1)	U[0.1, 10]	U[1.5, 10]	None
RGCA (sill < 0)	U[0.5, 1.5]	U[-10, 10]	Positive vs non-positive sills
RGCA (2-step)	U[0.5, 1.5]	U[1.5, 10]	Random, 2–3 clusters
RGCA (2-step KM)	U[0.5, 1.5]	U[1.5, 10]	K-Means on all parameters

We do not perform a simulation study for the mixed effect model, as this performance can be evaluated empirically and is compared to a simpler approach of maximum likelihood using an existing dose response curve fitting package called drc from [28].

For the data application, we evaluate the same methods we simulated but expand the random clustering from ten to 100 possible assignments to match the number of samples B = 100 used to generate uncertainty intervals: 1 each for GCA and IA, then 10 that group among 2 clusters, 20 that group among 3 clusters, 30 that group among 4 clusters, and 38 that group among 5 clusters. The number of assignments per cluster count is arbitrary and roughly reflects the fact that there are many more possible assignments when the number of clusters increases.
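The pool of 100 assignments described above can be sketched in Python as follows. The sketch assumes that each chemical is assigned to a group independently and uniformly at random, so an assignment "among k clusters" may occasionally use fewer than k; the function names are hypothetical.

```python
import random


def random_assignment(n_chem, n_clusters, rng):
    """Assign each chemical to one of n_clusters groups uniformly at random."""
    return [rng.randrange(n_clusters) for _ in range(n_chem)]


def build_assignments(n_chem, counts, rng):
    """counts maps a cluster count to the number of random assignments to draw."""
    assignments = [
        [0] * n_chem,         # single cluster: the GCA limit
        list(range(n_chem)),  # every chemical alone: the IA limit
    ]
    for n_clusters, n_draws in counts.items():
        for _ in range(n_draws):
            assignments.append(random_assignment(n_chem, n_clusters, rng))
    return assignments


# 1 + 1 + 10 + 20 + 30 + 38 = 100 assignments for the 18 Tox21 chemicals.
counts = {2: 10, 3: 20, 4: 30, 5: 38}
assignments = build_assignments(18, counts, random.Random(0))
```

Drawing one entry of `assignments` per predicted curve matches the pool size to the B = 100 samples used for the uncertainty intervals.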

Error can be quantified with mean squared error (MSE), log likelihood (LLH) or a continuous ranked probability score (CRPS). CRPS is a proper scoring rule: the true model will give the best score (but other models may give equally good scores) [29]. Intuitively, a CRPS measures how well the current distribution predicts the observed value in comparison to a perfect predictive distribution which puts all mass on the observed value. The empirical CRPS at a concentration x_j with observed response R is computed as

\begin{matrix} \mathrm{CRPS}(F_{j}, R) = \sum_{i=1}^{N} \left[ \hat{F}_{j}(y_{i}) - \mathbf{1}(y_{i} > R) \right]^{2}, \quad \hat{F}_{j}(y) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}(f_{i}(x_{j}) < y) \end{matrix}

(8)

Here \hat{F}_j is the empirical cumulative distribution function of the predicted responses y_i = f_i(x_j) at concentration x_j, where the index i runs over the N sampled curves and \mathbf{1}(·) is the indicator function.
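For concreteness, Eq 8 can be transcribed directly into Python as below. This is a literal reading of the formula as printed, with the sampled predictions playing the role of both the evaluation points y_i and the support of the empirical CDF; it is not the scoringRules implementation, whose estimator and normalization may differ. The function names are illustrative.

```python
def empirical_cdf(samples, y):
    """Fraction of sampled predictions strictly below y."""
    return sum(1 for s in samples if s < y) / len(samples)


def crps_eq8(samples, observed):
    """Sum of squared gaps between the empirical CDF and the step at the observation."""
    return sum(
        (empirical_cdf(samples, y) - (1.0 if y > observed else 0.0)) ** 2
        for y in samples
    )


# Four sampled predictions at one concentration, scored against two observations.
samples = [1.0, 2.0, 3.0, 4.0]
score_near = crps_eq8(samples, observed=2.5)  # observation inside the samples
score_far = crps_eq8(samples, observed=10.0)  # observation far above them
```

An observation in the middle of the sampled predictions yields a lower (better) score than one far outside them. Once the observation lies beyond every sample, the score no longer changes with its distance.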

When the prediction is far from the truth, the CRPS may give identical results for all methods, so we also include LLH and MSE. LLH is computed using a kernel density estimate rather than the empirical CDF used in the CRPS:

\begin{matrix} \mathrm{LL}(R) = -\log \frac{1}{N} \sum_{i} \phi(R \mid y_{i}, \sigma_{bw}) \end{matrix}

(9)

The kernel ϕ is a Gaussian centered at the value y_i with a computed bandwidth σ_bw. When there is only one data point, the LLH is not provided since it depends entirely on the bandwidth. In cases where the data is far from the truth, the LLH may be infinity (representing a likelihood that is effectively 0). MSE for our data is computed as
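Eq 9 can be sketched in Python as a Gaussian kernel density estimate evaluated at the observation. The bandwidth is passed in explicitly here because the rule used to compute it is not specified in the text; the function names are illustrative and this is not the scoringRules code.

```python
import math


def gaussian_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def neg_log_score(samples, observed, bandwidth):
    """Negative log of the kernel density estimate at the observed response."""
    density = sum(gaussian_pdf(observed, y, bandwidth) for y in samples) / len(samples)
    if density == 0.0:
        # The density underflows when the observation is far from every sample.
        return math.inf
    return -math.log(density)


samples = [95.0, 100.0, 102.0, 108.0]
score_close = neg_log_score(samples, observed=101.0, bandwidth=3.0)
score_far = neg_log_score(samples, observed=1000.0, bandwidth=3.0)
```

The second call returns infinity because every kernel evaluates to zero in floating point, which corresponds to the effectively zero likelihood mentioned above.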

\begin{matrix} MSE = \sum_{i, j} {(R_{i j} - f (x_{i j k}))}^{2} \end{matrix}

(10)

For the simulation study, we compare fits using MSE, since we are only testing the applicability of various models to each other rather than the statistical accuracy of a fitting procedure with noisy data. All three scores are computed for our predictions with the Tox21 data, which is used for validation. The R package scoringRules described in [30] is used to compute the empirical CRPS and LLH. Each score emphasizes a different quality. MSE ignores uncertainty and only indicates how closely we recovered the observations; CRPS emphasizes both centering and uncertainty; LLH is more lenient when the mean or center is wrong, as long as the uncertainty is wide enough to represent the data.

3 Results

3.1 Simulation study

Our simulation study, illustrated in Fig 4, shows that RGCA and two-step methods can be helpful for improving predictions when the true model is not GCA. We can make a number of qualitative comparisons because of the variety of true models we tested. For example, the error between GCA (as truth) and IA is roughly twice the error between GCA and a joint model, which is reasonable since the joint models interpolate between GCA and IA. When slope parameters are near 1, GCA and RGCA are nearly identical and the two-step methods have large errors; when slope parameters are allowed to vary from 0.1 to 10 (slope >> 1), the errors of GCA and the two-step methods become comparable. These observations suggest that extreme slopes have an interpolating effect similar to the two-step methods, decreasing the difference between GCA and IA. A similar conclusion seems to hold for simulations that allow negative sills, an edge case discussed in Section 2.2.2. When the two-step methods are the true responses, all of the methods have less error, again due to the interpolation effect. It is surprising that our implementation of CA has the lowest error among one-step approaches when the true model is two-step, especially in larger mixture sets.

The main conclusions from this simulation study are that 1) RGCA alone may not provide much benefit if the slopes are around 1 and sills have the same sign, and 2) a two-step approach using RGCA with random clusters tends to perform well in a variety of settings, including unknown groupings into modes of action. The benefits often increased with the size of the mixture, but our study looked at equipotent dose vectors. If a mixture is driven by a few potent chemicals, the effective mixture size and resulting errors may be smaller.

3.2 Parameter estimates and clustering

The parameter estimates for our Bayesian random effect model and for the maximum likelihood fit found with the R package drc are shown below in Table 2. For the chemicals with large responses, the models arrive at nearly identical results. The random effect model has different sill estimates compared to drc because all of the curve effects are averaged for drc, while our model takes just one replicate and adds offsets (realizations of random effects) for the other replicates. The differences between the parameters of the small-effect or no-effect chemicals are due to the regularizing effect of the priors, which were chosen to prevent extreme values and ambiguities in the likelihood. For example, note that the 17th chemical (benzyl butyl phthalate) has an estimated sill of 1.175 and an EC₅₀ of 2e-12 under our method, suggesting a no-effect chemical, while the drc package finds a sill of -7500 and an EC₅₀ of about 4e-3. Both parameter sets explain the data and offer similar likelihood, but extrapolated responses for higher doses could be more misleading with the drc parameters. In the supplement, see S2 Table in S1 File for additional parameter estimates, S2 Fig in S1 File for trace plots showing convergence of parameter samples to stationary distributions, and S3 Fig in S1 File for a visualization of the resulting curve fit.

Table 2. Fitted parameters for Tox21.

Comparison of parameter estimates from the random effect model and the maximum likelihood of drc.

CAS	Name	Sill	EC₅₀	Slope	Sill (drc)	EC₅₀ (drc)	Slope (drc)
107-15-3	Ethylenediamine	8.91	2.55e-01	1.12	0.60	2.35e-06	-0.05
143-50-0	Kepone	-18.10	1.16e-04	1.60	-10.07	4.99e-05	3.46
15972-60-8	Alachlor	-38.42	4.98e-04	0.81	-58.53	7.87e-04	0.77
17924-92-4	Zearalenone	-30.38	3.26e-04	0.70	-25.75	2.21e-04	0.68
34256-82-1	Acetochlor	-27.35	2.83e-04	0.85	1.02	2.24e-06	-4.51
434-07-1	Oxymetholone	90.34	6.20e-09	1.16	93.00	5.91e-09	1.14
50-02-2	Dexamethasone	357.85	5.52e-09	1.05	398.32	5.12e-09	0.90
50-29-3	p,p’-DDT	-25.05	1.07e-04	4.94	-64.83	1.63e-04	3.40
52806-53-8	Hydroxyflutamide	33.58	4.01e-06	1.83	21.85	3.49e-06	2.22
57-83-0	Progesterone	96.79	1.45e-06	0.88	98.23	1.47e-06	0.87
63-05-8	4-Androstene-3,17-dione	109.96	3.19e-08	0.89	110.67	3.25e-08	0.88
71-58-9	Medroxyprogesterone acetate	238.26	7.87e-09	0.59	265.92	6.76e-09	0.53
76-43-7	Fluoxymesterone	152.85	7.74e-09	0.17	274.83	5.14e-06	0.10
80-05-7	Bisphenol A	-17.06	5.67e-04	0.55	-4.95	1.31e-05	0.89
80-43-3	Dicumyl peroxide	-10.72	3.17e-03	0.51	0.86	1.77e-07	-1.89
84852-15-3	4-Nonylphenol, branched	-31.41	2.05e-03	0.52	-30.07	7.15e-04	0.59
85-68-7	Benzyl butyl phthalate	1.17	2.08e-12	0.78	-7502.20	4.24e-03	9.24
90-05-1	2-Methoxyphenol	20.11	8.27e-02	0.34	2.27	3.12e-05	0.43
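The extrapolation risk noted for benzyl butyl phthalate can be illustrated by evaluating both parameter sets from Table 2 in Python. The sketch assumes that both sets refer to the same Hill form, `sill / (1 + (ec50 / c) ** slope)`, and that the concentrations below are expressed in the same units as the EC₅₀ values; the concentrations themselves are illustrative and not taken from the assay design.

```python
def hill(conc, sill, ec50, slope):
    """Three-parameter Hill curve (assumed common parameterization)."""
    return sill / (1.0 + (ec50 / conc) ** slope)


# Benzyl butyl phthalate estimates copied from Table 2.
bayes = dict(sill=1.17, ec50=2.08e-12, slope=0.78)
drc = dict(sill=-7502.20, ec50=4.24e-03, slope=9.24)

# Illustrative concentrations: two well below the drc EC50 and one above it.
for conc in (1e-6, 1e-4, 1e-2):
    print(f"{conc:.0e}  random effect: {hill(conc, **bayes):10.3f}  drc: {hill(conc, **drc):10.3f}")
```

Well below the drc EC₅₀ both curves are essentially flat and close to zero, so they describe a no-effect chemical about equally well. Above it, the drc curve drops toward its sill of roughly -7500 while the random effect curve stays near 1.17.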

Because of identifiability or flat likelihood issues, some of the MCMC chains do not exhibit convergence to a stationary distribution. In brief, for no-effect curves, one could set the sill to 0 and the other parameters could be completely free. While this could pose an issue in our sampling by creating a sample curve with a non-zero sill and an extreme EC₅₀, we find that practically it doesn’t cause an issue because the chance of a bad combination of parameters is low. Very strong priors could be used to improve identifiability, but at a cost of introducing additional bias. See S1 Table and S2 Fig in the S1 File for an analysis of the MCMC convergence to stationarity.

3.3 Application results

For the 17 Tox21 mixtures tested, a joint or two-step method with RGCA was superior to GCA and CA for a majority of the cases. In particular, our predictions with RGCA were better for most of the mixtures tested using the CRPS metric and about half of the mixtures based on MSE, see S3 and S4 Tables in S1 File respectively. LLH is included for completeness in S5 Table in S1 File. When comparing just RGCA with random clustering to GCA, the RGCA joint method was superior in 14 out of the 17 mixtures by both CRPS and MSE. The best results relative to GCA and CA were with mixtures which include large concentrations of the ER agonist (4x EC₅₀) in addition to the AR agonists. We had less success with smaller mixtures, such as when the ER agonists were excluded, suggesting that mixtures with more chemicals may help cancel out synergistic or antagonistic effects. See S6 Table in S1 File for descriptions of the mixtures. We illustrate the scores in aggregate as boxplots in Fig 5.

Fig 5. Comparison of methods applied to the Tox21 dataset; lower scores are better. We compare a total of seven methods: CA, IA, GCA, RGCA, RGCA grouped by sill, clustering by K-means on all parameters, and random clustering. The scoring metrics are the continuous ranked probability score (CRPS), mean squared error from the median curve (MSE), and log likelihood (LLH). Score Summary A at top shows scores across all of the mixtures in S3–S5 Tables in S1 File. Summary B is restricted to mixtures with large doses of ER agonists: mix 12, 20, 43, 52, 55, 57, 62. Note the log-scaled y-axis. GCA, IA, and CA do not have LLH values since they were computed without uncertainty. IA and the two-step methods with RGCA perform best, suggesting that larger mixtures follow IA more than CA.

Although our methods are significantly better than GCA or CA, IA proved surprisingly effective at predicting the mixture response, especially when measured by MSE. When measured by CRPS, the IA results are comparable to our joint methods and occasionally inferior. This illustrates a trade-off in which a method with some error but with uncertainty quantification can do a better job of explaining all of the data than a method that predicts the center of the data well but does not help in understanding the spread. A series of representative figures, Figs 6–8, is shown below. See S4 and S5 Figs in S1 File for additional plots across the tested mixtures.

Fig 6. A representative plot showing how mixtures are predicted for various concentrations and methods. All chemicals are present in this mixture, which is described as Mix 20 in S6 Table in S1 File. Our recommended method, RGCA with random clustering, is labeled as Random RGCA. IA, CA, and GCA are shown in yellow, pink, and blue lines, respectively. IA is better centered while our method covers most of the data with the credible interval. RGCA without clustering underestimates the truth due to the ER agonists, shown on the right without stars, that have small or negative sills.

Fig 7. Only AR agonists are present in this mixture, which is described as Mix 10 in S6 Table in S1 File. RGCA with random clustering is labeled as Random RGCA. IA, CA, and GCA are shown in yellow, pink, and blue lines, respectively. GCA fits best at very small concentrations but then underestimates the response. RGCA is similar to GCA at high concentrations but overestimates low doses because the estimated slopes are less than 1. Random RGCA along with CA and IA may be predicting the correct sill but observations at higher doses are not available.

Fig 8. An example of poor predictive performance with a binary mixture. All methods fail to recover the true response. This mixture is exhibiting antagonism, which is not accounted for in any of the models.

In summary, our results suggest that the two-step method is more likely to cover the true mixture response when the truth is unknown.

4 Discussion

This work is motivated by the critical need for predicting a mixture response of an arbitrary set of chemicals from single chemical data only. First, we present Reflected Generalized Concentration Addition (RGCA), a piece-wise geometric extension of GCA that allows for non-unit slopes. In contrast to existing methods, there are no restrictions on effect sizes or concentrations and we can use a 3-parameter Hill model to capture more of the structure in the individual dose response curves. Second, we can account for uncertainty in the data and predict a range of possible mixture responses. This demonstrates the importance of uncertainty quantification in mixtures toxicity prediction. Our method works best when there are many chemicals and fails (along with GCA and IA) in binary and small mixtures. We attribute this to the presence of synergy, which can have a strong effect when there are two chemicals of similar concentrations and a much weaker effect when there are many other chemicals to interfere in the interaction. This situation has been described as a “funnel effect” by [27].

The joint or two-step methods were generally the best approaches for the mixtures tested. As the mixtures involved more chemicals at significant doses, the IA model did quite well despite the assumption of a single mode of action with the assay tested. On one hand, this explains why our methods perform better than GCA: if IA is the truth and the two-step method is interpolating between GCA and IA, then the two-step method will surely do better than GCA. On the other hand, perhaps the assumption of CA can be questioned: the experiment is cell based and the chemicals may be interfering with other processes that impinge on the AR receptor process, requiring IA rather than CA. In that case, we should be comparing to IA and will find that our method is not significantly better.

This leads to the main conclusion of our work: the two-step approach with RGCA and uncertainty quantification is most likely to cover the true mixture response when modes of action are not known. GCA and IA are likely to be the true boundary conditions for the prediction, but choosing just one carries a risk of large error. The two-step approach with random clusters or K-means clusters can reduce error and help the user determine which boundary is more relevant. The results of both the simulation study and the application support the use of the RGCA extension with random clustering when the slope parameters are far from 1 (i.e., outside of [0.5, 1.5]) and partial agonists are present. Even when CA is the correct model, forcing the slope parameters to be 1 when they are far from 1 can induce errors comparable to the interpolation error between a two-step method and GCA for moderately sized mixtures.

4.1 Limitations and future directions

Our reflection argument allows for any positive slope value, but we found that there are numerical issues for combinations of non-unit slopes and negative sills. Fig 2 shows how multiple solutions arise and lead to distortions in the isobolograms that cannot be smoothed away. The problem lies in the optimization process, which can be susceptible to the local optima and very steep functions that result from non-unit slopes, as shown in Fig 2. We describe a simple heuristic of always taking the positive solution when present, but it is not possible to specify which positive solution will be found when there are multiple solutions, and the optimization may simply fail to find the solution because it becomes trapped in a local optimum. Investigating better numerical methods or heuristics is an area of future work. Methods for dealing with stiff differential equations show promise.

Although our approach demonstrates the applicability of the two-step approach, the question of finding a true or correct grouping that reflects the actual modes of action is not addressed. We explore the use of simple random or K-means clustering and find that it is often better than using a single cluster, but the predictions are never entirely correct. In short, the mixture problem is not solved by interpolation. Furthermore, our methodology assumes a dose response that is increasing, which is not directly applicable for interpolating biphasic responses that account for effects including cytotoxicity, such as those studied in [31].

Related to the concept of interpolation is whether or not chemicals with no observed effect should even be included in the calculation. Common sense suggests they should be left out, especially when using a method like GCA in which the no-effect chemical is treated as a partial agonist that strongly attenuates the predicted response. However, when an individual dose response shows no effect, it is not clear if the chemical is actually binding to the receptor with zero efficacy or not binding at all. If the chemical is binding to the receptor, it could be a competitive antagonist and must be included if possible in a mixture prediction. The Tox21 database contains such antagonist assays and also includes mixtures that include and exclude zero-effect chemicals; a preliminary study suggested inclusion of zero-effect chemicals, but such a study is beyond the scope of the present work.

The questions about mode-of-action and no-effect chemicals are side effects of the main limitation of our work: the exclusion of synergy. The challenge with synergy is the introduction of additional parameters controlling which chemicals synergize or antagonize with others. These are difficult to determine from individual dose response curves but possible with quantitative structure-activity relationships (QSAR), which would not require experimental data of the mixture. QSAR and docking or affinity scores may also help address the issue of no-effect chemicals by distinguishing between chemicals that do not occupy a receptor dock and those that do. With the inclusion of QSAR, it may be possible to use a true null model but identify slope parameters or cluster inclusion within our approach; this is useful because the chemical may have no effect but still synergize or antagonize another chemical. In contrast to IA and GCA, our approach is flexible enough to incorporate additional parameters or algorithmic steps utilizing QSAR data to account for synergy, which represents a promising future direction of research.

Supporting information

S1 File

(PDF)

Acknowledgments

Thanks to Fred Parham and Mike DeVito for help with data.

Data Availability

All code written in support of this publication is publicly available at https://github.com/Spatiotemporal-Exposures-and-Toxicology/RGCA-DP. The Tox21 data is available at https://tripod.nih.gov//tox21/pubdata/.

Funding Statement

This work is supported by the National Institute of Environmental Health Sciences, Division of Translational Toxicology, Division of Intramural Research, and the Spatiotemporal Exposures and Toxicology group under project number ZIA ES103368-02. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Conley JM, Lambright CS, Evans N, Cardon M, Medlock-Kakaley E, Wilson VS, et al. A mixture of 15 phthalates and pesticides below individual chemical no observed adverse effect levels (NOAELs) produces reproductive tract malformations in the male rat. Environment International. 2021;156:106615. doi: 10.1016/j.envint.2021.106615 [DOI] [PMC free article] [PubMed] [Google Scholar]
2. Silva E, Rajapakse N, Kortenkamp A. Something from “nothing”- eight weak estrogenic chemicals combined at concentrations below NOECs produce significant mixture effects. Environmental science & technology. 2002;36(8):1751–1756. doi: 10.1021/es0101227 [DOI] [PubMed] [Google Scholar]
3. Kwiatkowski CF, Andrews DQ, Birnbaum LS, Bruton TA, DeWitt JC, Knappe DR, et al. Scientific basis for managing PFAS as a chemical class. Environmental Science & Technology Letters. 2020;7(8):532–543. doi: 10.1021/acs.estlett.0c00255 [DOI] [PMC free article] [PubMed] [Google Scholar]
4. Eccles KM, Karmaus AL, Kleinstreuer NC, Parham F, Rider CV, Wambaugh JF, et al. A geospatial modeling approach to quantifying the risk of exposure to environmental chemical mixtures via a common molecular target. Science of The Total Environment. 2023;855:158905. doi: 10.1016/j.scitotenv.2022.158905 [DOI] [PMC free article] [PubMed] [Google Scholar]
5. Billionnet C, Sherrill D, Annesi-Maesano I, et al. Estimating the health effects of exposure to multi-pollutant mixture. Annals of epidemiology. 2012;22(2):126–141. doi: 10.1016/j.annepidem.2011.11.004 [DOI] [PubMed] [Google Scholar]
6. Bliss CI. The toxicity of poisons applied jointly 1. Annals of applied biology. 1939;26(3):585–615. doi: 10.1111/j.1744-7348.1939.tb06990.x [DOI] [Google Scholar]
7. Cedergreen N, Christensen AM, Kamper A, Kudsk P, Mathiassen SK, Streibig JC, et al. A review of independent action compared to concentration addition as reference models for mixtures of compounds with different molecular target sites. Environmental Toxicology and Chemistry: An International Journal. 2008;27(7):1621–1632. doi: 10.1897/07-474.1 [DOI] [PubMed] [Google Scholar]
8. Loewe St, Muischnek H. Über Kombinationswirkungen: Mitteilung: Hilfsmittel der Fragestellung. Naunyn-Schmiedebergs Archiv für experimentelle Pathologie und Pharmakologie. 1926;114:313–326. doi: 10.1007/BF01952257 [DOI] [Google Scholar]
9. Teuschler LK, Rice GE, Wilkes CR, Lipscomb JC, Power FW. A feasibility study of cumulative risk assessment methods for drinking water disinfection by-product mixtures. Journal of Toxicology and Environmental Health, Part A. 2004;67(8-10):755–777. doi: 10.1080/15287390490428224 [DOI] [PubMed] [Google Scholar]
10. Rider CV, Furr JR, Wilson VS, Gray LE Jr. Cumulative effects of in utero administration of mixtures of reproductive toxicants that disrupt common target tissues via diverse mechanisms of toxicity. International journal of andrology. 2010;33(2):443–462. doi: 10.1111/j.1365-2605.2009.01049.x [DOI] [PMC free article] [PubMed] [Google Scholar]
11. Altenburger R, Schmitt H, Schüürmann G. Algal toxicity of nitrobenzenes: combined effect analysis as a pharmacological probe for similar modes of interaction. Environmental Toxicology and Chemistry: An International Journal. 2005;24(2):324–333. doi: 10.1897/04-032R.1 [DOI] [PubMed] [Google Scholar]
12. Altenburger R, Nendza M, Schüürmann G. Mixture toxicity and its modeling by quantitative structure-activity relationships. Environmental Toxicology and Chemistry: An International Journal. 2003;22(8):1900–1915. doi: 10.1897/01-386 [DOI] [PubMed] [Google Scholar]
13. Altenburger R, Walter H, Grote M. What contributes to the combined effect of a complex mixture? Environmental Science & Technology. 2004;38(23):6353–6362. doi: 10.1021/es049528k [DOI] [PubMed] [Google Scholar]
14. Wang Z, Chen J, Huang L, Wang Y, Cai X, Qiao X, et al. Integrated fuzzy concentration addition–independent action (IFCA–IA) model outperforms two-stage prediction (TSP) for predicting mixture toxicity. Chemosphere. 2009;74(5):735–740. doi: 10.1016/j.chemosphere.2008.08.023 [DOI] [PubMed] [Google Scholar]
15. Mwense M, Wang XZ, Buontempo FV, Horan N, Young A, Osborn D. Prediction of noninteractive mixture toxicity of organic compounds based on a fuzzy set method. Journal of chemical information and computer sciences. 2004;44(5):1763–1773. doi: 10.1021/ci0499368 [DOI] [PubMed] [Google Scholar]
16. Qin LT, Liu SS, Zhang J, Xiao QF. A novel model integrated concentration addition with independent action for the prediction of toxicity of multi-component mixture. Toxicology. 2011;280(3):164–172. doi: 10.1016/j.tox.2010.12.007 [DOI] [PubMed] [Google Scholar]
17. Howard GJ, Webster TF. Generalized concentration addition: a method for examining mixtures containing partial agonists. Journal of theoretical biology. 2009;259(3):469–477. doi: 10.1016/j.jtbi.2009.03.030 [DOI] [PMC free article] [PubMed] [Google Scholar]
18. Scholze M, Silva E, Kortenkamp A. Extending the applicability of the dose addition model to the assessment of chemical mixtures of partial agonists by using a novel toxic unit extrapolation method. PloS one. 2014;9(2):e88808. doi: 10.1371/journal.pone.0088808 [DOI] [PMC free article] [PubMed] [Google Scholar]
19. Tice RR, Austin CP, Kavlock RJ, Bucher JR. Improving the human hazard characterization of chemicals: a Tox21 update. Environmental health perspectives. 2013;121(7):756–765. doi: 10.1289/ehp.1205784 [DOI] [PMC free article] [PubMed] [Google Scholar]
20. Webster TF, Schlezinger JJ. Generalized concentration addition for ligands that bind to homodimers. Mathematical biosciences. 2019;316:108214. doi: 10.1016/j.mbs.2019.108214 [DOI] [PMC free article] [PubMed] [Google Scholar]
21. Robert CP, Casella G, Casella G. Monte Carlo statistical methods. vol. 2. Springer; 1999. [Google Scholar]
22. Wheeler MW. An investigation of non-informative priors for Bayesian dose-response modeling. Regulatory Toxicology and Pharmacology. 2023;141:105389. doi: 10.1016/j.yrtph.2023.105389
23. Huang R. A quantitative high-throughput screening data analysis pipeline for activity profiling. High-throughput screening assays in toxicology. 2016; p. 111–122. doi: 10.1007/978-1-4939-6346-1_12
24. Lynch C, Sakamuru S, Huang R, Stavreva DA, Varticovski L, Hager GL, et al. Identifying environmental chemicals as agonists of the androgen receptor by using a quantitative high-throughput screening platform. Toxicology. 2017;385:48–58. doi: 10.1016/j.tox.2017.05.001
25. Kleinstreuer NC, Ceger P, Watt ED, Martin M, Houck K, Browne P, et al. Development and validation of a computational model for androgen receptor activity. Chemical research in toxicology. 2017;30(4):946–964. doi: 10.1021/acs.chemrestox.6b00347
26. Vermeulen R, Schymanski EL, Barabási AL, Miller GW. The exposome and health: Where chemistry meets biology. Science. 2020;367(6476):392–396. doi: 10.1126/science.aay3164
27. Warne MSJ, Hawker DW. The number of components in a mixture determines whether synergistic and antagonistic or additive toxicity predominate: The funnel hypothesis. Ecotoxicology and Environmental Safety. 1995;31(1):23–28. doi: 10.1006/eesa.1995.1039
28. Ritz C, Baty F, Streibig JC, Gerhard D. Dose-Response Analysis Using R. PLOS ONE. 2015;10(12):e0146021. doi: 10.1371/journal.pone.0146021
29. Gneiting T, Raftery AE. Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association. 2007;102(477):359–378. doi: 10.1198/016214506000001437
30. Jordan A, Krüger F, Lerch S. Evaluating probabilistic forecasts with scoringRules. arXiv preprint arXiv:1709.04743. 2017.
31. Martin-Betancor K, Ritz C, Fernández-Piñas F, Leganés F, Rodea-Palomares I. Defining an additivity framework for mixture research in inducible whole-cell biosensors. Scientific Reports. 2015;5(1):17200. doi: 10.1038/srep17200

PLoS One. doi: 10.1371/journal.pone.0298687.r001

Decision Letter 0

Y-h Taguchi

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

22 Oct 2023

PONE-D-23-27967

Reflected Generalized Concentration Addition and Bayesian Hierarchical Models to Improve Chemical Mixture Prediction

PLOS ONE

Dear Dr. Messier,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 06 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Y-h. Taguchi, Dr. Sci.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections does not match.

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

"Thanks to Fred Parham and Mike DeVito for help with data. This work is supported by the National Institute of Environmental Health Sciences, Division of Translational Toxicology, Division of Intramural Research, and the Spatiotemporal Exposures and Toxicology group under project number ZIA ES103368-02."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

"No, The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The development of joint toxicity modeling has been one of the challenges in the field, to which the authors of this work have made their own contribution. However, some issues still require further explanation or expansion of the data by the authors.

Lines 21-22: GCA is proposed to compensate for the shortcomings of the CA model, so it is suggested that the authors also need to provide the predictions of the CA model for comparison.

Lines 24-25: The presentation is too general and the predictive power of the developed models should be described quantitatively. For example, how many times higher than the predictive power of traditional models?

Lines 28-29: This problem is difficult to solve; after all, the models developed are also based on the prerequisite of non-interaction between the mixed components. And, existing models require a clear mode of toxic action.

2.1 section: Personally, I think that there is no theoretical basis for the integration of these two models into one. The GCA model is developed on the basis of the CA model on the premise that the mixed components should have the same modes of toxic action, whereas the IA model is applied on the premise that the mixed components have different modes of toxic action.

Figure 6: Too many constraints may limit the predictive efficiency of the model.

2.3 section: The incorporation of Bayesian Data Model may reduce the predictive efficiency of the model.

Result section: Increasing the results of the developed model in comparison with the CA model is more convincing.

Lines 427-441: In addition to considering interactions between hybrid components, the authors need to consider applying additional data sets to validate the models developed. In addition, the model involves too many parameters, which requires a larger amount of data for validation. From a risk assessment perspective, the practical applicability of the model still requires further consideration.

In summary, the authors of this study are still working hard to try to advance the mixture toxicity model. After considering the above reviewers' comments, the work is worthy of publication.
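
As a minimal sketch of the distinction Reviewer #1 draws in the comment on section 2.1, the following Python snippet computes a concentration addition (CA) and an independent action (IA) prediction for the same mixture. It assumes full agonists described by two-parameter Hill curves; the function names and all parameter values are hypothetical and are not taken from the manuscript.

```python
from math import prod


def hill(c, K, beta):
    """Fractional effect (0 to 1) of a full agonist at concentration c."""
    if c <= 0:
        return 0.0
    return 1.0 / (1.0 + (K / c) ** beta)


def hill_inverse(r, K, beta):
    """Concentration that produces fractional effect r, for 0 < r < 1."""
    return K * (r / (1.0 - r)) ** (1.0 / beta)


def independent_action(concs, params):
    """IA: combine single-chemical effects as independent probabilities."""
    return 1.0 - prod(1.0 - hill(c, K, b) for c, (K, b) in zip(concs, params))


def concentration_addition(concs, params, tol=1e-12):
    """CA: bisect for the effect R at which the toxic units sum to one."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        toxic_units = sum(
            c / hill_inverse(mid, K, b) for c, (K, b) in zip(concs, params)
        )
        if toxic_units > 1.0:
            lo = mid  # doses exceed what effect `mid` requires, so R is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical "sham" mixture: one chemical (K = 1, slope = 2) split into halves.
params = [(1.0, 2.0), (1.0, 2.0)]
concs = [0.5, 0.5]
ca_effect = concentration_addition(concs, params)
ia_effect = independent_action(concs, params)
```

In this sham mixture CA recovers the single-chemical effect at the total concentration, whereas IA does not, which is one way to see that the two models rest on different premises.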

Reviewer #2: The manuscript refers to a specific mixture scenario where mixture components have differences in efficacy (ie, dose-response curve maxima) and, as a consequence, the calculation of an expected mixture effect according to dose additivity is restricted. Solutions for this problem have been published previously but rely on narrow, usually unrealistic model assumptions (Howard & Webster) or provide a range of worst-case mixture predictions (Toxic unit extrapolation, Scholze et al). Here the manuscript suggests a mathematically motivated, more flexible approach. This is integrated in a chemical grouping approach where the authors make the assumption that the model parameter describing the steepness of a dose-response relationship is capable of showing a toxicological mode of action.

As much as I would like to see the use of novel mathematical & statistic methods in mixture toxicology, I have serious problems with the authors’ methods as they rest on assumptions which are not in line with current knowledge in mixture toxicology and pharmacology:

• The steepness of a dose-response pattern is not indicative of the pharmacological/toxicological mode of action, a stubborn myth which experimental and theoretical toxicology have disproved now for decades. Compounds with nearly identical hazard profiles but different kinetic properties can result in data of very different dose-response shapes, compounds with different targets can produce nearly identical dose-response patterns, and this holds true for in vivo and in vitro study endpoints. So no, establishing cumulative mixture assessment groups according to the steepness of dose-response patterns is not an option anybody would suggest, and I cannot think about any application where a grouping of this specific model parameter under consideration of statistical variability would be useful.

• A receptor-based bioassay endpoint from eg the AR-luc assay is not capable of showing responses to more than one “mode of action”. In fact, most cell-based assay endpoints from the ToxCast library are annotated as MIE’s, and therefore considered as ideal for being in line with the pharmacological assumptions of Concentration addition. Also, factors like (i) nominal vs bioavailable (or target) concentrations, (ii) cytotoxicity at high concentrations, and (iii) lack of data support for weak assay responses and their impact on model complexity, can impact the curve estimate from an empirical dose-response model.

• Demonstrating by simulation that IA and GCA cannot describe a mixture scenario accurately that doesn’t fulfil the conceptual requirements of these two mixture models is trivial, so I don’t understand the meaning of paragraph 2.6, especially as it is promised to be a “validation”.

• Dose-response data from the AR-luc assay were used for 18 compounds, with 7 positively identified as AR antagonists and the remaining 11 known to be ER agonists. The authors say “there are 69 different mixtures” but without explaining the “where”, and it is unclear why and how these data sets were used for the simulations. For this endpoint only C(G)A is a reasonable mixture reference, and this has been demonstrated in numerous mixture studies. So why consider IA and RGCA+DP (I assume this refers to the proposed GCA/IA model)?

• Real data from experimental mixture studies are not used (or shown) to assess the approach, so I cannot see evidence in favour of this method.

• I cannot see how the extension of the GCA model by Howard & Webster works, as Eq 6 refers only to β=1, ie the regression model used by Howard & Webster. Where is the novelty?

My recommendations:

- The extension of the GCA model by Howard & Webster certainly deserves its own publication, but here I would expect a better and clearer mathematical presentation, the (explicit or implicit) mathematical equations for the case β≠1 (Eq 6 refers only to β=1, or?), statements about the model domain (what are the limitations?), and a real mixture scenario to demonstrate its applicability (as in Howard & Webster or Scholze et al). The reader must be in the position not only to use the method but also to repeat the calculation for any example shown.

- If I got it right, the Bayesian aspect refers mainly to the between-study variability in the dose-response modelling, which is not novel (eg, the recommended approach by EFSA/Europe in the derivation of a regulatory benchmark dose). Here I would suggest expanding this to the proposed GCA model; expressing the statistical uncertainty of a GCA prediction via a credibility interval is certainly something of great interest in the mixture field.

I would also strongly recommend to align with a mixture toxicologist.
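
For readers following the β=1 discussion above, the sketch below shows the closed-form GCA prediction that is available when every component follows a Hill curve with unit slope, which is the setting of Howard & Webster (reference 17). It is an illustration only: the function name and the parameter values are hypothetical, and it does not reproduce the manuscript's Eq 6 or its reflected extension to other slopes.

```python
def gca_unit_slope(concs, alphas, Ks):
    """GCA mixture effect for Hill curves f_i(c) = alpha_i * c / (K_i + c).

    With slope 1 the inverse is f_i^{-1}(R) = K_i * R / (alpha_i - R), so the
    GCA condition sum_i c_i / f_i^{-1}(R) = 1 rearranges to
    R = sum_i(alpha_i * c_i / K_i) / (1 + sum_i(c_i / K_i)).
    """
    weighted = sum(a * c / K for c, a, K in zip(concs, alphas, Ks))
    scaled = sum(c / K for c, K in zip(concs, Ks))
    return weighted / (1.0 + scaled)


# Hypothetical full agonist (alpha = 1) mixed with a partial agonist (alpha = 0.5).
mixture_effect = gca_unit_slope([1.0, 1.0], [1.0, 0.5], [1.0, 1.0])

# A one-component "mixture" reduces to the chemical's own Hill curve.
single_effect = gca_unit_slope([1.0], [0.5], [1.0])
```

The closed form works because the unit-slope inverse remains a real (negative) number when R exceeds a component's maximum effect alpha_i. For slopes other than 1 the inverse involves a fractional power of a negative quantity in that region, so no comparable closed form follows from this algebra.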

Reviewer #3: In their manuscript "Reflected Concentration Addition to Improve Chemical Mixture Prediction", Daniel Zilber and Kyle Messier suggest an adapted approach to predict effects of chemical mixtures from observed effects of the mixture components.

In my view the approach is very interesting and plausible and can be a very valuable contribution for the field of mixture toxicology. The manuscript suffers in part from some ambiguities (see below) which should be addressed before publication. Overall, I would recommend considering the manuscript for publication after moderate revisions.

In my view, the following major topics should be addressed in a revision:

1) Incomplete and biphasic concentration responses in Tox21 Data:

An accurate mixture prediction depends a lot on accurate concentration response models for the mixture components. The authors partly address this issue by integrating a set of noise terms in the Hill function used. Yet, it did not become clear to me, how the authors deal with

a) biphasic responses (e.g. due to cytotox) as it can be seen for progesterone or hydroxyflutamide in Figure 2. I guess the highest 2 (?) concentrations are just not considered for the model, but it is not mentioned anywhere. It might also have a large effect on the overall outcome which concentrations are selected for modeling.

There are some studies on adapted CA for biphasic models (e.g. http://www.nature.com/articles/srep17200) and it would be interesting to see how these compare to the proposed RGCA model. Since this could be quite a lot of additional work and thus not feasible, I think it would be really good to include this topic in the discussion.

b) "incomplete" concentration response/unknown maximum effect: The authors thoroughly address the topic of partial agonists/variable maximum effect. Especially for high-throughput data like Tox21, we regularly have the issue that effects do not reach a plateau, so we only have an incomplete curve and a maximum effect cannot be identified. It gets even worse for the case of cytotox, where the maximum effect is even more obscured. This does not matter for calculating Benchmark Concentrations, but is a very important limitation in mixture modeling. It would be very helpful if the authors make more explicit how the maximum effect is modelled. If feasible, they could make a very strong point if they could test the sensitivity of different alpha ranges on mixture prediction. I am aware that this might be too much outside the scope of the study, but it should be made explicit in the methods and included as a limitation in the discussion at least.

2) Benefits of the Bayesian Hierarchical Models in Mixture Modelling: As the authors point out, earlier mixture studies have shown that Observed Mixture Responses often end up somewhere between IA and CA. When reading the manuscript I wondered, if one could arrive at the same quality of prediction as with the Bayesian Clustering by just randomly assigning the clusters (maybe still based on slope, but e.g. a random cluster number). Would you dare to try it out? ;-)

3) I found the manuscript very interesting and inspiring to read, but I got the feeling that there should be some switching between SI and manuscript paragraphs. In some parts, established methods (like DPMM) were explained in detail - which is a nice service to the reader - but distracts a bit from the main adaptations and findings of this study. On the other hand, important information about the methods (e.g. how and why did we choose priors for single chemical modeling) are only implicitly mentioned or buried in the supplement. Maybe this is also a matter of taste, but I just want to encourage the authors to check their manuscript for this once more. Also make sure that references to the SI are clear and easy to follow.

Some more minor issues and questions in the following:

Abstract:

l. 13: "This is purely predictive..." This only becomes clear in the context of approaches that use mixture observations for modelling. Maybe reframe, e.g. "This predicts mixture responses based on observed responses of mixture components"...

l. 25: "significantly improved...": In my view, this is not in line with your discussion, where you state that CRPS is your recommended metric for drawing conclusions (l. 389) and CRPS showed "marginally better" results (l. 386).

l. 28: "Lastly..." I would recommend to leave out this sentence, since this is not scope of the study

Introduction:

ll. 34ff: I share your concerns regarding "forever" chemicals, but was not convinced, that this group specifically adds to the mixture problem...

ll. 39-51: I really liked this introduction into the world of mixture prediction!

ll. 52ff: Maybe consider to cite the original publications for CA and IA?

l. 58: Could be misleading to talk about "adding" effects. Maybe reframe to "...so a series of chemical proportional effects can be multiplied..."?

ll. 60ff.: It took me a while to understand which problem(s) you are actually addressing with your study, maybe because there are these two elements of a) clustering similar mode-of-action and b) reflected GCA, which both address different types of problems. Maybe this could be made even more explicit in the introduction

l. 74: This statement did not become clear to me, maybe you could elaborate a bit more on this

Material and Methods:

"Algorithm 1": This did not help me too much for understanding the approach. To be able to grasp this more intuitively, maybe a flowchart would be an option. Otherwise, it would at least help to explain/define all mentioned parameters and abbreviations, be a bit more consistent in the structure (e.g. "Step 1...Step 2... Step 3a...Step3b" in the "Result" and Approach). I could not find the referenced Equation 7 in the Supplement (Equations are not numbered there).

Equation 3: Please also define/explain parameters of this equation. How is alpha_max different from alpha (equation 1)? How do you deal with different alpha_max values in the IA part?

Equation 6: I got a bit confused here. You are talking about a general form of inverse for beta not 1, but in Equation 6 you state beta = 1?

l. 162: "iid Gaussian noise" - Typo?

l. 170: Please state and justify all priors here or in the supplement.

l. 180: Please explain how the data was filtered before modeling (see above).

Thanks for supplying your R code on Github. I saw a summary Rmd file there. Maybe it would be worth considering to include the rendered pdf as a supplement to the manuscript?

ll. 197-210: Nice description of the process, but maybe consider to move to supplements

l. 235: Very important point, that the slopes within a cluster are identical. Do you consider this when sampling the remaining parameters, that you only sample from the chains with the corresponding fixed slope?

Is the RGCA approach in general also applicable for differing slopes?

ll. 252-262: consider to move to supplements

l. 289-290: It did not become clear to me how you arrived at "realistic" Hill parameters, and what "feasible clustering conditionals" means. Please also state how many different mixtures you simulate, with how many chemicals included and how the mixture compositions are defined.

Results

l. 317: It would help to reiterate your assumptions here

Figure 3: Please state n. Would it be feasible to simulate larger mixtures?

l. 336: This is a very interesting finding. It would be nice, if you could plot the fits and the mentioned extrapolations, so one could judge how they differ and which would be more plausible. Include here or in the supplement.

l. 347: "below" Figure 4?

l. 384: Maybe you could draw a dot/ or boxplot showing CRPS score against the number of mixture components and different colors/shapes for the different prediction methods?

Figure 6: What are the large circles? replicate 1? It is confusing that they are larger than Replicate 2 and 3 and not mentioned in the legend. In my opinion more fits would deserve to be included in the main part of the manuscript. It would also be interesting to include the CRPS score for the different models in the plot so one could get a feeling for what the numbers mean. Consider to include at least a plot with a very low score/ a medium score and a high score...Include all the remaining fits in the supplement.

Discussion:

l. 380: "Fails with binary and small mixtures": When looking at Supplemental Figure 5, Binary AR2 and AR7, I get the impression, that your RGD approach actually performs not totally bad, since the measured responses fall in the (quite broad) credible interval? What did not become clear to me, why the credible interval is so wide for RGD exclusively, because the clustering cannot be so variable for 2 substances, right? Could you elaborate on this?

l. 422: Workaround would then be back to GCA?

ll. 427ff: Can you relate to any existing literature on this topic?
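
Several of the comments above (on l. 25, l. 384 and Figure 6) turn on what a CRPS value means. As a rough sketch, assuming the prediction is available as a set of posterior samples, the continuous ranked probability score can be estimated with the sample form of the identity CRPS = E|X - y| - 0.5 E|X - X'| discussed by Gneiting and Raftery (reference 29). The function name and the toy numbers are hypothetical, and this is not presented as the computation used in the manuscript.

```python
def crps_from_samples(samples, observed):
    """Sample estimate of CRPS = E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    n = len(samples)
    # Average distance from the predictive samples to the observation.
    spread_to_obs = sum(abs(x - observed) for x in samples) / n
    # Average distance between all pairs of predictive samples.
    spread_within = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return spread_to_obs - 0.5 * spread_within


# A prediction with no spread is scored by its absolute error.
point_forecast = crps_from_samples([1.0, 1.0, 1.0], 3.0)

# A two-member prediction centred on the observation.
two_member = crps_from_samples([0.0, 2.0], 1.0)
```

The score is in the units of the response, so a wide credible interval that covers the observation can score similarly to a narrow one that misses slightly.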

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Mar 28;19(3):e0298687. doi: 10.1371/journal.pone.0298687.r002

Author response to Decision Letter 0

27 Dec 2023

See attached response_to_reviewers pdf document.

Attachment

Submitted filename: Response_to_Reviewers.pdf

pone.0298687.s002.pdf (151.1 KB, PDF)

PLoS One. doi: 10.1371/journal.pone.0298687.r003

Decision Letter 1

Y-h Taguchi

30 Jan 2024

Reflected Generalized Concentration Addition and Bayesian Hierarchical Models to Improve Chemical Mixture Prediction

PONE-D-23-27967R1

Dear Dr. Messier,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Y-h. Taguchi, Dr. Sci.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

**********

6. Review Comments to the Author

Reviewer #1: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

PLoS One. doi: 10.1371/journal.pone.0298687.r004

Acceptance letter

Y-h Taguchi

19 Mar 2024

PONE-D-23-27967R1

PLOS ONE

Dear Dr. Messier,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Y-h. Taguchi

Academic Editor

PLOS ONE

Associated Data

Supplementary Materials

S1 File

(PDF)

pone.0298687.s001.pdf (2.2 MB, PDF)

Data Availability Statement

References

1. Conley JM, Lambright CS, Evans N, Cardon M, Medlock-Kakaley E, Wilson VS, et al. A mixture of 15 phthalates and pesticides below individual chemical no observed adverse effect levels (NOAELs) produces reproductive tract malformations in the male rat. Environment International. 2021;156:106615. doi: 10.1016/j.envint.2021.106615

2. Silva E, Rajapakse N, Kortenkamp A. Something from “nothing”- eight weak estrogenic chemicals combined at concentrations below NOECs produce significant mixture effects. Environmental Science & Technology. 2002;36(8):1751–1756. doi: 10.1021/es0101227

3. Kwiatkowski CF, Andrews DQ, Birnbaum LS, Bruton TA, DeWitt JC, Knappe DR, et al. Scientific basis for managing PFAS as a chemical class. Environmental Science & Technology Letters. 2020;7(8):532–543. doi: 10.1021/acs.estlett.0c00255

4. Eccles KM, Karmaus AL, Kleinstreuer NC, Parham F, Rider CV, Wambaugh JF, et al. A geospatial modeling approach to quantifying the risk of exposure to environmental chemical mixtures via a common molecular target. Science of The Total Environment. 2023;855:158905. doi: 10.1016/j.scitotenv.2022.158905

5. Billionnet C, Sherrill D, Annesi-Maesano I, et al. Estimating the health effects of exposure to multi-pollutant mixture. Annals of Epidemiology. 2012;22(2):126–141. doi: 10.1016/j.annepidem.2011.11.004

6. Bliss CI. The toxicity of poisons applied jointly 1. Annals of Applied Biology. 1939;26(3):585–615. doi: 10.1111/j.1744-7348.1939.tb06990.x

7. Cedergreen N, Christensen AM, Kamper A, Kudsk P, Mathiassen SK, Streibig JC, et al. A review of independent action compared to concentration addition as reference models for mixtures of compounds with different molecular target sites. Environmental Toxicology and Chemistry: An International Journal. 2008;27(7):1621–1632. doi: 10.1897/07-474.1

8. Loewe St, Muischnek H. Über Kombinationswirkungen: Mitteilung: Hilfsmittel der Fragestellung. Naunyn-Schmiedebergs Archiv für experimentelle Pathologie und Pharmakologie. 1926;114:313–326. doi: 10.1007/BF01952257

9. Teuschler LK, Rice GE, Wilkes CR, Lipscomb JC, Power FW. A feasibility study of cumulative risk assessment methods for drinking water disinfection by-product mixtures. Journal of Toxicology and Environmental Health, Part A. 2004;67(8-10):755–777. doi: 10.1080/15287390490428224

10. Rider CV, Furr JR, Wilson VS, Gray LE Jr. Cumulative effects of in utero administration of mixtures of reproductive toxicants that disrupt common target tissues via diverse mechanisms of toxicity. International Journal of Andrology. 2010;33(2):443–462. doi: 10.1111/j.1365-2605.2009.01049.x

11. Altenburger R, Schmitt H, Schüürmann G. Algal toxicity of nitrobenzenes: combined effect analysis as a pharmacological probe for similar modes of interaction. Environmental Toxicology and Chemistry: An International Journal. 2005;24(2):324–333. doi: 10.1897/04-032R.1

12. Altenburger R, Nendza M, Schüürmann G. Mixture toxicity and its modeling by quantitative structure-activity relationships. Environmental Toxicology and Chemistry: An International Journal. 2003;22(8):1900–1915. doi: 10.1897/01-386

13. Altenburger R, Walter H, Grote M. What contributes to the combined effect of a complex mixture? Environmental Science & Technology. 2004;38(23):6353–6362. doi: 10.1021/es049528k

14. Wang Z, Chen J, Huang L, Wang Y, Cai X, Qiao X, et al. Integrated fuzzy concentration addition–independent action (IFCA–IA) model outperforms two-stage prediction (TSP) for predicting mixture toxicity. Chemosphere. 2009;74(5):735–740. doi: 10.1016/j.chemosphere.2008.08.023

15. Mwense M, Wang XZ, Buontempo FV, Horan N, Young A, Osborn D. Prediction of noninteractive mixture toxicity of organic compounds based on a fuzzy set method. Journal of Chemical Information and Computer Sciences. 2004;44(5):1763–1773. doi: 10.1021/ci0499368

16. Qin LT, Liu SS, Zhang J, Xiao QF. A novel model integrated concentration addition with independent action for the prediction of toxicity of multi-component mixture. Toxicology. 2011;280(3):164–172. doi: 10.1016/j.tox.2010.12.007

17. Howard GJ, Webster TF. Generalized concentration addition: a method for examining mixtures containing partial agonists. Journal of Theoretical Biology. 2009;259(3):469–477. doi: 10.1016/j.jtbi.2009.03.030

18. Scholze M, Silva E, Kortenkamp A. Extending the applicability of the dose addition model to the assessment of chemical mixtures of partial agonists by using a novel toxic unit extrapolation method. PLoS ONE. 2014;9(2):e88808. doi: 10.1371/journal.pone.0088808

19. Tice RR, Austin CP, Kavlock RJ, Bucher JR. Improving the human hazard characterization of chemicals: a Tox21 update. Environmental Health Perspectives. 2013;121(7):756–765. doi: 10.1289/ehp.1205784

20. Webster TF, Schlezinger JJ. Generalized concentration addition for ligands that bind to homodimers. Mathematical Biosciences. 2019;316:108214. doi: 10.1016/j.mbs.2019.108214

21. Robert CP, Casella G. Monte Carlo statistical methods. vol. 2. Springer; 1999.

22. Wheeler MW. An investigation of non-informative priors for Bayesian dose-response modeling. Regulatory Toxicology and Pharmacology. 2023;141:105389. doi: 10.1016/j.yrtph.2023.105389

23. Huang R. A quantitative high-throughput screening data analysis pipeline for activity profiling. High-throughput screening assays in toxicology. 2016; p. 111–122. doi: 10.1007/978-1-4939-6346-1_12

24. Lynch C, Sakamuru S, Huang R, Stavreva DA, Varticovski L, Hager GL, et al. Identifying environmental chemicals as agonists of the androgen receptor by using a quantitative high-throughput screening platform. Toxicology. 2017;385:48–58. doi: 10.1016/j.tox.2017.05.001

25. Kleinstreuer NC, Ceger P, Watt ED, Martin M, Houck K, Browne P, et al. Development and validation of a computational model for androgen receptor activity. Chemical Research in Toxicology. 2017;30(4):946–964. doi: 10.1021/acs.chemrestox.6b00347

26. Vermeulen R, Schymanski EL, Barabási AL, Miller GW. The exposome and health: Where chemistry meets biology. Science. 2020;367(6476):392–396. doi: 10.1126/science.aay3164

27. Warne MSJ, Hawker DW. The number of components in a mixture determines whether synergistic and antagonistic or additive toxicity predominate: The funnel hypothesis. Ecotoxicology and Environmental Safety. 1995;31(1):23–28. doi: 10.1006/eesa.1995.1039

28. Ritz C, Baty F, Streibig JC, Gerhard D. Dose-Response Analysis Using R. PLOS ONE. 2015;10(12):e0146021. doi: 10.1371/journal.pone.0146021

29. Gneiting T, Raftery AE. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association. 2007;102(477):359–378. doi: 10.1198/016214506000001437

30. Jordan A, Krüger F, Lerch S. Evaluating probabilistic forecasts with scoringRules. arXiv preprint arXiv:1709.04743. 2017.

31. Martin-Betancor K, Ritz C, Fernández-Piñas F, Leganés F, Rodea-Palomares I. Defining an additivity framework for mixture research in inducible whole-cell biosensors. Scientific Reports. 2015;5(1):17200. doi: 10.1038/srep17200
