Author manuscript; available in PMC: 2021 Sep 1.
Published in final edited form as: Risk Anal. 2020 Jun 29;40(9):1706–1722. doi: 10.1111/risa.13537

Quantitative Risk Assessment: Developing a Bayesian Approach to Dichotomous Dose–Response Uncertainty

Matthew W Wheeler 1,*, Todd Blessinger 2, Kan Shao 3, Bruce C Allen 4, Louis Olszyk 5, J Allen Davis 6, Jeffrey S Gift 7
PMCID: PMC7722241  NIHMSID: NIHMS1613685  PMID: 32602232

Abstract

Model averaging for dichotomous dose–response estimation is preferred to estimating the benchmark dose (BMD) from a single model, but challenges remain before model averaging is feasible to use in many risk assessment applications: the methods must be implementable for general analyses, and there is little work on Bayesian methods that include informative prior information for both the models and the parameters of the constituent models. This article introduces a novel approach that addresses many of these challenges while providing a fully Bayesian framework. Furthermore, in contrast to methods that use Markov chain Monte Carlo, we approximate the posterior density using maximum a posteriori estimation. The approximation allows for an accurate and reproducible estimate while maintaining the speed of maximum likelihood, which is crucial in many applications, such as processing massive high-throughput data sets. We assess this method by applying it to empirical laboratory dose–response data and measuring the coverage of confidence limits for the BMD, and we compare the coverage of this method to that of other approaches using the same set of models. Through the simulation study, the method is shown to be markedly superior to the traditional approach of selecting a single preferred model (e.g., from the U.S. EPA BMD software) for the analysis of dichotomous data and is comparable or superior to the other approaches.

Keywords: Benchmark dose estimation, maximum a posteriori estimation, Monte Carlo simulation, quantitative risk estimation

1. INTRODUCTION

Model averaging (Buckland, Burnham, & Augustin, 1997; Claeskens & Hjort, 2008; Hoeting, Madigan, Raftery, & Volinsky, 1999; Raftery, Madigan, & Hoeting, 1997) is a technique for inference over multiple models that accounts for model uncertainty by estimating a predictor–response relationship as a weighted sum of individual model estimates. There are many different model averaging approaches for benchmark dose (BMD) estimation (Bailer, Noble, & Wheeler, 2005; Faes, Aerts, Geys, & Molenberghs, 2007; Shao & Gift, 2013; Shao & Shapiro, 2018; Simmons et al., 2015; M. Wheeler & Bailer, 2009; M. W. Wheeler & Bailer, 2007), because research shows that traditional quantitative risk assessments based upon a single “best model” have poor statistical properties (Simmons et al., 2015; West et al., 2012; M. Wheeler & Bailer, 2009) and that model averaging is superior to the single model approach (Shao & Gift, 2013; M. Wheeler & Bailer, 2009; M. W. Wheeler & Bailer, 2007). This has led some regulatory agencies to recommend model averaging (EFSA Scientific Committee, et al., 2017). Despite its superiority, some argue that there remain technical challenges that need to be addressed before model averaging can be adopted as a standard risk assessment practice. This article proposes a methodology that addresses these challenges while demonstrating its superiority over the single model selection method summarized in the U.S. EPA’s Benchmark Dose Technical Guidance (U.S. EPA, 2012) as well as other approaches currently in the literature (Guha et al., 2013; Shao & Shapiro, 2018).

The challenges seen in parametric model averaging for BMD estimation are manifold. For example, model averaging for BMD analysis is based upon individual parametric models chosen by the modeler, and the performance of the approach depends upon the models chosen. This was the conclusion of Wheeler and Bailer (2007) who showed that the statistical results differ depending on the number and type of parametric models included in the average. Though Shao and Gift (2013) showed the difference is minimal in practical terms, to maintain public confidence in risk assessment science, it is important to use a methodology that has a standard set of models that can be reliably used for any data set. This fact is further compounded as many of the models can degenerate into the same model during fitting. For instance, some models have bounds on parameters such that when the parameters are estimated as equal to a bound, the models degenerate into a single common model. As an example of this, the Weibull and the multistage two-degree models (see Table I below) can degenerate into the quantal linear model. This leads to concerns of implicit bias in the results by essentially including the same model multiple times, and results in ambiguities in construction of the model weights as well as inference.

Table I.

The Individual Models Used in the Model Averaging Method and Their Respective Parameter Priors

Model: Quantal linear
  p1(d) = γ + (1 − γ)(1 − exp[−βd])
  Constraints: β > 0; 0 ≤ γ ≤ 1
  Priors: log(β) ~ Normal(0, 1); logit(γ) ~ Normal(0, 2)

Model: Multistage
  p2(d) = γ + (1 − γ)(1 − exp[−β1d − β2d²])
  Constraints: β1 > 0; β2 > 0; 0 ≤ γ ≤ 1
  Priors: log(β1) ~ Normal(0, 0.25); log(β2) ~ Normal(0, 1); logit(γ) ~ Normal(0, 2)
  Note: The prior over the β1 parameter expresses the belief that the linear term should be positive if the quadratic term is positive in the two-hit model of carcinogenesis.

Model: Weibull
  p3(d) = γ + (1 − γ)(1 − exp[−βd^α])
  Constraints: β > 0; α > 0; 0 ≤ γ ≤ 1
  Priors: log(β) ~ Normal(0, 1); log(α) ~ Normal(log(2), 0.18); logit(γ) ~ Normal(0, 2)
  Note: The prior over α is designed so that there is only a low prior probability that the power parameter is less than 1 (i.e., biologically unrealistic supralinear dose–response curves). Supralinear models are still allowed, but conclusive data are required for α to move much below 1 and for the model to represent otherwise biologically unreasonable dose–response relationships.

Model: Gamma
  p4(d) = γ + [(1 − γ)/Γ(α)] ∫₀^{βd} t^(α−1) exp(−t) dt
  Constraints: β > 0; α > 0; 0 ≤ γ ≤ 1
  Priors: log(β) ~ Normal(0, 1); log(α) ~ Normal(log(2), 0.18); logit(γ) ~ Normal(0, 2)
  Note: As with the Weibull model, the prior over α places only a low prior probability on power parameters less than 1, with conclusive data required for the model to represent supralinear dose–response relationships.

Model: Dichotomous Hill
  p5(d) = γ + ν(1 − γ)/(1 + exp[−a − b·log(d)])
  Constraints: 0 ≤ γ ≤ 1; 0 ≤ ν ≤ 1; −∞ < a < ∞; b > 0
  Priors: a ~ Normal(0, 0.25); log(b) ~ Normal(log(10), 0.0625); logit(γ) ~ Normal(0, 2); logit(ν) ~ Normal(4, 2)

Model: Logistic
  p6(d) = 1/(1 + exp[−β0 − β1d])
  Constraints: −∞ < β0 < ∞; β1 > 0
  Priors: β0 ~ Normal(0, 1); log(β1) ~ Normal(0, 2)

Model: Log-Logistic
  p7(d) = γ + (1 − γ)/(1 + exp[−β0 − β1·log(d)])
  Constraints: −∞ < β0 < ∞; β1 > 0; 0 ≤ γ ≤ 1
  Priors: β0 ~ Normal(0, 1); log(β1) ~ Normal(log(2), 0.25); logit(γ) ~ Normal(0, 2)

Model: Probit
  p8(d) = Φ(β0 + β1d)
  Constraints: −∞ < β0 < ∞; β1 > 0
  Priors: β0 ~ Normal(0, 1); log(β1) ~ Normal(0, 1)

Model: Log-Probit
  p9(d) = γ + (1 − γ)Φ[β0 + β1·log(d)]
  Constraints: −∞ < β0 < ∞; β1 > 0; 0 ≤ γ ≤ 1
  Priors: β0 ~ Normal(0, 1); log(β1) ~ Normal(log(2), 0.25); logit(γ) ~ Normal(0, 2)

Note. logit(γ) = log(γ/(1 − γ)).

We develop a process using informative priors that prevents model degeneracy and allows a model to be included in an analysis even if there are more parameters than dose points. This allows development of a standard suite of models to be used in every risk assessment. We construct our model average weights based upon a Laplace approximation, which, unlike many previous strategies, is a consistent estimate for the weight. This guarantees that, in the case of degeneracy, the more complicated model will get a weight near zero relative to the model it degenerates into, thus preventing the same model from effectively being counted twice in the average.

In terms of computation, our approach differs from previous Bayesian approaches. Other methods (Fang et al., 2015; Shao & Shapiro, 2018; Simmons et al., 2015) use Markov chain Monte Carlo (MCMC) to compute the BMD and develop a set of priors based upon a reparametrized model set focused on the BMD. Because this approach requires expert analysis of the Markov chain to guarantee convergence, it may be difficult to guarantee the correctness of the analysis in general. Further, in a general Bayesian setting, informative priors for the model's parameters, as opposed to the focused BMD prior approach of Fang et al., who use informative priors on the BMD, have not been proposed. For example, Shao and Shapiro (2018) define priors over a uniform region and use approximate weighting strategies for model averaging. Uniform priors implicitly set parameter bounds based upon the size of the region chosen, which does not solve the issue of parameter bounds.

As an alternative to these Bayesian approaches, we define informative priors on the entire or positive real line and use asymptotic approximations instead of MCMC sampling. The approximation is deterministic, allowing for reproducible analyses and removing the need to analyze a Markov chain. These priors prevent estimates from falling exactly on a bound and avoid truncating distributions at a bound, which causes consistency problems when the true parameter is outside of the bound. Our method replaces strict bounds with "soft bounds" defined by prior densities for the individual parameters that put low prior probability on regions outside the typical parameter boundaries. For example, the U.S. EPA's BMD technical guidance (U.S. EPA, 2012) recommends constraining the shape parameter of the Weibull model to be greater than or equal to 1, because values less than 1 lead to an infinite slope of the dose–response curve at dose zero, which is often considered biologically implausible. Our proposed priors allow the shape parameter to take any positive value but place low prior probability on values less than 1. This model will still fit supralinear curves, but such shapes will only receive high weight if the data support them. In cases where there are limited data, the prior information shrinks the dose–response estimate toward shapes frequently seen in practice.
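As a concrete illustration of such a soft bound, the snippet below computes the prior probability that the Weibull shape parameter α falls below 1 under the Table I prior log(α) ~ Normal(log 2, 0.18). This is a minimal sketch that assumes the second Normal argument is a variance; if it were instead a standard deviation, the tail mass below 1 would be far smaller.

```python
from math import log, sqrt
from statistics import NormalDist

# Prior from Table I: log(alpha) ~ Normal(mean = log 2, variance = 0.18).
# (Assumption: the second argument is a variance, so sd = sqrt(0.18).)
mu, var = log(2.0), 0.18
prior = NormalDist(mu, sqrt(var))

# P(alpha < 1) = P(log(alpha) < 0): the "soft bound" replacing the hard
# constraint alpha >= 1 used in maximum-likelihood BMDS fits.
p_supralinear = prior.cdf(0.0)
print(f"Prior probability of a supralinear shape (alpha < 1): {p_supralinear:.3f}")
```

Under this reading of the prior, only about 5% of the prior mass lies on supralinear shapes, so conclusive data are needed before such a shape dominates the fit.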

In addition to their theoretical disadvantages, current standard model averaging approaches are in some cases impractical because they are computationally intensive. The approach of Wheeler and Bailer (2007) requires bootstrap resampling that can take minutes to complete an individual analysis, and Shao and Shapiro (2018) use MCMC, which can be equally computationally intensive. Though this may not matter for individual analyses, it can introduce computational bottlenecks when analyzing data from high-throughput studies that have thousands of dose–response curves. Our method uses an accurate approximation technique that can complete a single analysis in tenths of a second rather than tens of seconds, greatly reducing the time required for analyzing hundreds or thousands of dose–response curves in batch.

The article is organized as follows: Section 2 describes the Bayesian approximation method, the model averaging method, and the prior choices and the justifications for their use. Section 3 gives a numerical analysis of the method, and Section 4 analyzes several real dichotomous data sets. Finally, Section 5 outlines a comprehensive simulation study of the method and gives results comparing it to current practice.

2. BAYESIAN APPROXIMATION MODEL

In what follows, let π(·) stand for a prior distribution (i.e., what we know prior to observing data) and pr(·) for a posterior distribution (i.e., what we have learned after observing data) for a parameter of interest. In the exposition, the relevant parameter of interest is clear from context and is usually the BMD. As an example, π(a|b) is the prior distribution for parameter "a" given parameter "b," and pr(a|Y, b) is the posterior distribution of this parameter given data Y and parameter "b." Bayesian analysis learns about the relevant parameter (e.g., the parameter "a" in the example above) by assuming a data generating mechanism L(·), the likelihood in frequentist inference, and a prior π(·) on the parameter. One computes the posterior, pr(·), from L(·) and π(·) given data.

For our modeling, we explicitly define priors over parameters and dose–response models (e.g., the parameter β is thought to arise from a normal distribution, or the Weibull model is thought to be the true model with 30% probability), and this implicitly specifies priors over functional transformations of the model parameters. For risk analysis purposes, the main quantity of interest is the BMD, which is a complicated function of the dose–response parameters. As we concern ourselves with multi-model inference on the BMD, this value has a model-specific posterior distribution, i.e., a distribution given a specific dose–response model, and a multi-model posterior distribution, i.e., a distribution based upon all of the dose–response models considered. For simplicity in notation, the BMD is assumed to be the multi-model estimate, and BMDk is the estimate from model k.

2.1. Bayesian Modeling

From a Bayesian perspective, inference on the BMD proceeds by defining a data generating mechanism given a model M and its parameters θ. The likelihood L(Y|M, θ) represents a data generating mechanism (e.g., normal, binomial) based upon a model M (e.g., a Hill or Weibull dose–response curve) having the vector of parameters θ. For a suite of K models, let π(Mk, θk) = f(Mk)g(θk|Mk) be the prior probability of model k, k = 1,..., K, with f(·) a discrete probability measure over the K models, i.e., 0 ≤ f(Mk) ≤ 1 with Σ_{k=1}^{K} f(Mk) = 1, and g(θk|Mk) a density over Mk's parameters θk. When K = 1 and f(M1) = 1, inference is defined by Bayes' theorem,

pr(θ1, M1|Y) = L(Y|M1, θ1) π(M1, θ1) / ∫ L(Y|M1, θ1) π(M1, θ1) dθ1.

We are interested in estimating the BMD when K > 1. The BMD for the kth model, BMDk, can be written as a function of θk, i.e., h (θk) = BMDk, and pr(BMDk|Y) can be computed from pr(θk, Mk|Y). For multi-model inference, the posterior density of the BMD averaged over K models is

pr(BMD|Y) = Σ_{k=1}^{K} pr(Mk|Y) pr(BMDk|Mk, Y), (1)

where pr(Mk|Y) is the posterior probability of model Mk given the data,

pr(Mk|Y) = f(Mk) Ik / Σ_{i=1}^{K} f(Mi) Ii,

and Ik = ∫ L(Y|Mk, θk) g(θk|Mk) dθk. Analytic evaluation of Ik is often not possible; instead, numerical integration techniques may be more appropriate. Richardson and Green (1997) noted that inference on (1) can proceed using reversible jump MCMC; however, developing an efficient reversible jump MCMC algorithm is difficult, as it requires special tuning of a proposal algorithm for jumps between models. One can instead estimate the posterior distributions for the BMDk individually to approximate (1), but Ik must then be approximated directly. In what follows, we develop a fast approach for approximating both pr(BMDk|Y) and Ik.

For our algorithm, the BMD point estimate is computed from the maximum a posteriori (MAP) parameter estimate of the model, and the lower confidence bound on the BMD, the BMDL, is taken as the 100(1 − α) percentile for an appropriately chosen probability level α using the approximation method. Model posterior probabilities, pr(Mk|Y), and more specifically Ik, are approximated using a Laplace approximation (Hoeting et al., 1999) calculated from the MAP estimate of θk. Sections 2.2 and 2.3 below summarize the theoretical justification for this approximation; additional details can be found in Tierney and Kadane (1986).

2.2. Approximate Posterior Estimation of the BMD

The posterior densities of the individual models and the posterior model weights described in (1) are calculated using a Laplace approximation (Hsu, 1995; Leonard, Hsu, & Tsui, 1989). This approximation is similar to the model averaged profile likelihood (MAPL) approach of Fletcher and Turek (2012), but while MAPL relies only on the likelihood, our approach incorporates prior information in calculating the marginal profile density (Hsu, 1995; Hu, Ji, & Tsui, 2008) of the BMD. In other words, both the likelihood and the prior are used in the profile. The model-specific density is defined by treating profile density bounds as quantiles of a marginal posterior density for the parameter of interest. A detailed description and full motivation are given in the Supporting Information, with the approach being justified asymptotically (Hsu, 1995; Severini, 1991).

Our model-averaged BMD point estimate is the weighted average of the BMD MAP estimates from the individual models, weighted by the posterior model probabilities pr(Mk|Y). In the approximation, this is equivalent to the median of the approximate posterior density of the BMD. The BMDL and BMD upper bound (BMDU) estimates are obtained by integrating Equation (1) as follows. A 100(α)% BMDU estimate or 100(1 − α)% BMDL estimate is the value BMD_γ, γ = α or γ = 1 − α, such that

γ = ∫₀^{BMD_γ} pr(BMD|Y) dBMD,

which is

= Σ_{k=1}^{K} pr(Mk|Y) ∫₀^{BMD_γ} pr(BMD|Mk, Y) dBMD. (2)

For the case where BMD_γ < BMD̂_k, the integral in Equation (2) is approximated using the formula

∫₀^{BMD_γ} pr(BMD|Mk, Y) dBMD ≈ (1/2) Pr(2 log[pr(BMD̂_k|Mk, Y)] − 2 log[pr(BMD_γ|Mk, Y)] < χ²_{1,2α}), (3)

where pr(BMD̂_k|Mk, Y) is the maximum value of the posterior of model Mk evaluated at BMD̂_k = h_k(θ̂_k), θ̂_k is the MAP estimate for model Mk, χ²_{1,2α} is the 2α quantile of a chi-squared random variable with one degree of freedom, and Pr(·) is the evaluation of a probability statement about a random variable. For the case where BMD̂_k < BMD_γ, the right-hand side of (3) is replaced by

1 − (1/2) Pr(2 log[pr(BMD̂_k|Mk, Y)] − 2 log[pr(BMD_γ|Mk, Y)] < χ²_{1,2α}).

This approximation is analogous to the profile likelihood used when estimating the BMDL and BMDU by the method of maximum likelihood, but here pr(·|Mk, Y) is the posterior density for model Mk. For a full justification of the approximation, see the Supporting Information.

Remark: Although the estimate θ̂_k is the MAP of pr(θk, Mk|Y), the MAP is not transformation invariant, and BMD̂_k = h_k(θ̂_k) is not the maximum of pr(BMD|Mk, Y). Instead, by construction, BMD̂_k is the median of the distribution used to approximate pr(BMD|Mk, Y).
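Equations (1) and (2) define the BMDL and BMDU as quantiles of a weighted mixture of per-model posteriors, which can be inverted numerically. The sketch below assumes hypothetical lognormal per-model BMD posteriors (stand-ins for the profiled posteriors described above) and fixed posterior model weights, and solves the mixture CDF for a target quantile by bisection.

```python
from math import log
from statistics import NormalDist

# Hypothetical per-model BMD posteriors, summarized as lognormal CDFs
# (an illustrative assumption, not the paper's profiled densities), with
# posterior model probabilities as weights.
posteriors = [NormalDist(log(10.0), 0.40), NormalDist(log(14.0), 0.30)]
weights = [0.7, 0.3]

def mixture_cdf(bmd):
    """Equation (1) integrated up to bmd: weighted sum of model CDFs."""
    return sum(w * f.cdf(log(bmd)) for w, f in zip(weights, posteriors))

def mixture_quantile(gamma, lo=1e-6, hi=1e3, tol=1e-8):
    """Solve mixture_cdf(x) = gamma by bisection; the CDF is monotone."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# 5%, 50%, and 95% mixture quantiles: BMDL, BMD median, and BMDU.
bmdl, bmd, bmdu = (mixture_quantile(g) for g in (0.05, 0.50, 0.95))
print(f"BMDL={bmdl:.2f}, BMD={bmd:.2f}, BMDU={bmdu:.2f}")
```

Because the mixture CDF is monotone in the BMD, bisection always converges, regardless of how many models contribute to the average.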

2.3. Weight Calculation

In previous approaches to BMD inference using model averaging (e.g., Bailer et al., 2005), weights were calculated using either the Bayesian information criterion (BIC) or Akaike information criterion (AIC), the latter being used primarily in frequentist model averaging (Buckland et al., 1997; Claeskens & Hjort, 2008). Our approach generates weights using the Laplace approximation to the marginal density of the data. That is, for model Mk with parameter vector θk of dimension s × 1, one approximates the probability of model k as being proportional to

Ik = (2π)^{s/2} |Σ̂_k|^{1/2} L(Y|Mk, θ̂_k) g(θ̂_k|Mk), (4)

where Σ̂_k is the inverse of the negative Hessian matrix of the log posterior evaluated at θ̂_k, θ̂_k is the MAP estimate, L(Y|Mk, θ̂_k) is the likelihood of Mk given Y evaluated at θ̂_k, and g(θ̂_k|Mk) is the prior for θk evaluated at θ̂_k.

To estimate the model weights, the MAP estimate and Ik from Equation (4) are calculated for each model Mk. The posterior probability of the model is

pr(Mk|Y) = f(Mk) Ik / Σ_{i=1}^{K} f(Mi) Ii,

where f(Mk) is the prior probability of model Mk (e.g., 1/K if each of the K models is treated as equally plausible a priori). In practice, the discrete probabilities f(Mk) need not be equal, and such a case is considered in the simulation study.
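Equation (4) and the weight normalization can be sketched as follows. The MAP log-likelihood, log-prior, and negative-Hessian values below are illustrative numbers for two hypothetical models, not fits to real data; working on the log scale avoids numerical under- and overflow.

```python
import numpy as np

def log_laplace_I(log_lik_hat, log_prior_hat, neg_hessian):
    """log of Equation (4): (2pi)^(s/2) |Sigma|^(1/2) L(Y|M, theta_hat) g(theta_hat|M),
    where Sigma is the inverse of the negative Hessian of the log posterior
    evaluated at the MAP estimate theta_hat."""
    s = neg_hessian.shape[0]
    sigma = np.linalg.inv(neg_hessian)
    sign, logdet = np.linalg.slogdet(sigma)
    assert sign > 0, "negative Hessian must be positive definite at the MAP"
    return 0.5 * s * np.log(2 * np.pi) + 0.5 * logdet + log_lik_hat + log_prior_hat

def posterior_model_probs(log_Is, prior_probs):
    """pr(M_k|Y) = f(M_k) I_k / sum_i f(M_i) I_i, computed stably on the log scale."""
    a = np.log(prior_probs) + np.array(log_Is)
    a -= a.max()                     # log-sum-exp shift for numerical stability
    w = np.exp(a)
    return w / w.sum()

# Toy MAP summaries for two hypothetical models (illustrative numbers only):
# a 2-parameter model and a 3-parameter model.
log_I1 = log_laplace_I(-52.1, -3.2, np.array([[40.0, 5.0], [5.0, 12.0]]))
log_I2 = log_laplace_I(-53.8, -4.0, np.array([[30.0, 2.0, 1.0],
                                              [2.0, 18.0, 0.5],
                                              [1.0, 0.5, 9.0]]))
weights = posterior_model_probs([log_I1, log_I2], prior_probs=[0.5, 0.5])
print("posterior model probabilities:", weights)
```

Note how the determinant term automatically penalizes the higher-dimensional model, which is what drives a degenerate model's weight toward zero relative to the simpler model it collapses into.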

2.4. Dichotomous Data

Though our approach can be used with any data likelihood, this article focuses on application to dichotomous data. In that setting, consider an animal toxicology experiment with m doses d1,..., dm and n1,..., nm animals per dose group. Let Y = (y1,..., ym)′ be the vector of the number of positive responses observed in each dose group. It is frequently assumed that yi ~ Binomial(p(di), ni), where p(di) is the probability of adverse response at dose di. The dose–response function p(di), i = 1,..., m, is often assumed to be a parametric function of dose. For example, the current U.S. EPA Benchmark Dose Software (BMDS) (U.S. EPA, 2017) can estimate p(d) using any one of nine dose–response functions.

We apply our approach under the BMDS modeling framework, incorporating all nine models (K = 9). These models and the priors for their individual parameters are listed in Table I. We specify priors in relation to the interval [0, 1], which allows uniformity in the prior specification relative to dose. Doses are rescaled so that 1 represents the maximum tested dose in any study, and the rescaled doses are used in all fitting procedures.
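As a sketch of the fitting step for one constituent model, the code below computes the MAP estimate of the quantal linear model for the first synthetic data set of Table II, with doses rescaled so the top dose is 1. The prior standard deviations are an assumption loosely following Table I (the table does not state whether the second Normal argument is a variance or a standard deviation), so this is illustrative rather than a reproduction of the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Dose-response data from the first synthetic set in Table II.
doses = np.array([0.0, 20.0, 40.0, 60.0, 100.0])
y = np.array([2, 2, 4, 8, 10])      # positive responders
n = np.array([10, 10, 10, 10, 10])  # group sizes

d = doses / doses.max()             # rescale so the maximum tested dose is 1

def quantal_linear(theta, d):
    """p(d) = gamma + (1 - gamma)(1 - exp(-beta d)), parameters on unbounded scales."""
    gamma, beta = expit(theta[0]), np.exp(theta[1])
    return gamma + (1.0 - gamma) * (1.0 - np.exp(-beta * d))

def neg_log_posterior(theta):
    p = np.clip(quantal_linear(theta, d), 1e-10, 1 - 1e-10)
    log_lik = np.sum(y * np.log(p) + (n - y) * np.log1p(-p))  # binomial likelihood
    # Priors loosely following Table I (second argument read as a standard
    # deviation is an assumption): logit(gamma) ~ N(0, 2), log(beta) ~ N(0, 1).
    log_prior = -0.5 * (theta[0] / 2.0) ** 2 - 0.5 * theta[1] ** 2
    return -(log_lik + log_prior)

fit = minimize(neg_log_posterior, x0=np.zeros(2), method="BFGS")
gamma_hat, beta_hat = expit(fit.x[0]), np.exp(fit.x[1])

# BMD for a benchmark response (BMR) of 10% extra risk:
# (p(BMD) - gamma)/(1 - gamma) = 0.1  =>  BMD = -log(0.9)/beta, then unscale.
bmd = -np.log(0.9) / beta_hat * doses.max()
print(f"MAP: gamma={gamma_hat:.3f}, beta={beta_hat:.3f}, BMD={bmd:.1f}")
```

Because the parameters are optimized on logit and log scales, no estimate can land on a boundary, which is exactly the "soft bound" behavior described above.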

The priors have no strict bound conditions (except for numerical stability based upon finite computer arithmetic). In other words, parameters cannot be estimated to lie on a bound, and each model fit is unique and cannot collapse into a simpler model. Thus, the posterior weights in Equation (2) can be calculated without concerns about double counting the same model. If one model is arbitrarily close to another, its weight will approach zero relative to the simpler model as the data increase, and this will occur more rapidly than for BIC-based weights. The advantage is even greater relative to AIC-based weights such as those of Wheeler and Bailer (2007); in that approach, the ratio of the weights remains a constant if one model degenerates into a simpler model, regardless of sample size.

Despite the above advantages, developing priors for a general analysis is a difficult task, as priors may not generalize to all analyses. One may prefer simple approaches such as placing a parameter on a uniform range, as in Shao and Shapiro (2018), who carefully designed the boundaries of their uniform distributions. However, as shown in the Supporting Information Appendix 4, this approach may be undesirable when the uniform range is too large. If the limits are large, the priors place nearly 100% of their probability on dose–response relationships exhibiting "on/off behavior" (i.e., the response increases from 0% to 100% very rapidly). This can significantly bias an analysis when there is little or no information in the data to inform the parameter estimate. Furthermore, it may make the use of Bayes factors, and thus classical Bayesian model averaging, impossible, because these priors are often equivalent to improper priors, which cannot be used in Bayesian model averaging (Hoeting et al., 1999). In contrast, the priors used in the proposed approach were designed specifically for model averaging to place high probability on shapes commonly seen in practice (prior elicitation). These priors are similar to those proposed in Wheeler, Piegorsch, and Bailer (2019), and because they are unconstrained, they allow other dose–response shapes if there are enough data to support them. For example, when using the method of maximum likelihood to estimate the shape parameter, α, of the Weibull model, the EPA constrains the value to be less than 18 (U.S. EPA, 2017). Values near 18 result in a hockey stick–shaped dose–response curve that implies the probability of an adverse event goes from background to 100% within an extremely small dose range. Elicitation of experts suggests such behavior is not toxicologically reasonable, so the prior proposed in our method puts exponentially decreasing weight on larger values of α.
This results in a Bayesian estimate of α that is smaller than the equivalent maximum likelihood estimate, especially in cases where there are limited data on the dose–response curve. For large enough sample sizes and with enough dose groups, there is minimal difference between the Bayesian estimate and the method of maximum likelihood.

3. NUMERICAL APPROXIMATION FIDELITY FOR MODEL AVERAGING

Several numerical examples are considered to investigate the accuracy of the approximation approach; here, we compare it to estimates obtained using MCMC. For this comparison, a Metropolis–Hastings algorithm was used (Chib & Greenberg, 1995). The MCMC proceeded by block sampling the model parameters using a multivariate normal proposal distribution whose mean is the MAP estimate and whose covariance matrix is the inverse negative Hessian of the log posterior evaluated at the MAP, scaled by 1.5. This Metropolis–Hastings proposal allowed for fast convergence and low correlation between samples. Using this algorithm, a total of 50,000 samples were taken, with the first 1,000 discarded as burn-in. Analysis showed the mixing was excellent, and effective sample sizes were well above 2,000 for the BMD samples. For both the Laplace and MCMC approaches, posterior model probabilities were computed using the calculations described in Section 2.3. For the comparison, the 5%, 50%, and 95% quantiles of the BMD were estimated using both methods.
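The sampler described above can be sketched as an independence Metropolis–Hastings algorithm. In this minimal sketch, a toy two-parameter log posterior and an identity-based covariance stand in for a fitted model's MAP estimate and Hessian; the run length is shortened for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    """Toy log posterior standing in for log L(Y|M, theta) + log g(theta|M)."""
    return -0.5 * theta @ theta

# Independence proposal as described in Section 3: multivariate normal
# centered at the MAP with covariance 1.5 * Sigma_MAP (stand-in values here).
map_est = np.zeros(2)
cov = 1.5 * np.eye(2)
prec = np.linalg.inv(cov)

def log_q(theta):
    """Unnormalized proposal log density (normalizing constant cancels)."""
    diff = theta - map_est
    return -0.5 * diff @ prec @ diff

def sample(n_iter=50_000, burn_in=1_000):
    theta = map_est.copy()
    draws = []
    for i in range(n_iter):
        prop = rng.multivariate_normal(map_est, cov)
        # Independence-sampler acceptance ratio: pi(prop) q(cur) / [pi(cur) q(prop)].
        log_a = (log_post(prop) - log_post(theta)) + (log_q(theta) - log_q(prop))
        if np.log(rng.uniform()) < log_a:
            theta = prop
        if i >= burn_in:
            draws.append(theta.copy())
    return np.array(draws)

draws = sample(5_000, 500)  # shortened run for illustration
print("posterior mean estimate:", draws.mean(axis=0))
```

Because the proposal does not depend on the current state, consecutive draws are only weakly correlated when the proposal covers the posterior well, which is consistent with the high effective sample sizes reported above.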

Table II describes five data sets, having from 25 to 200 total observations with between 5 and 50 observations per dose group, and the corresponding quantile estimates from the Laplace approximation and the MCMC approach. The table shows that in most conditions the Laplace approximation's estimates are qualitatively identical to the MCMC estimates, even for data sets with as few as 25 total observations. Importantly, the Laplace 5% quantiles are often slightly less than the corresponding MCMC 5% quantiles. This suggests the approximate estimates may be slightly more conservative than the MCMC estimates in terms of coverage, and thus no worse from a public health standpoint.

Table II.

Numerical Examples of Synthetic Data Sets Comparing the Model Averaging Estimates Using the Laplace Approximation as Compared to Those Based Upon Markov Chain Monte Carlo

Data Dose Group MA Laplace Quantiles (5%, 50%, 95%) MA MCMC Quantiles (5%,50%,95%)
2/10, 2/10, 4/10, 8/10, 10/10 0, 20, 40, 60, 100 (6.0,13.9,29.5) (6.7, 13.4, 29.9)
1/50, 8/50, 11/50, 14/50 0, 25, 50, 100 (13.4,34.0,69.7) (14.3,34.7,69.2)
0/20, 1/20, 2/20, 11/20, 14/20 0,12.5, 25, 50, 100 (9.0, 18.0, 30.0) (9.7, 20.3, 31.5)
1/5, 2/5, 3/5, 5/5, 5/5 0, 10, 30, 60, 100 (2.2, 6.5, 18.0) (2.3, 7.6, 28.4)
1/10, 1/10, 1/10, 4/10, 10/10 0, 20, 40, 60, 100 (16.1, 32.2, 52.0) (12.0, 27.7, 50.2)

Note. The quantiles compared correspond to the BMD, BMDL, and BMDU used in most quantitative risk assessments.

In the one case where the 5% MCMC quantile is less than the Laplace estimate, the values are still comparable: an MCMC value of 12.0 versus 16.1 using the Laplace approximation. If used for risk assessment, the BMDLs would differ by a factor of 1.33, which is minimal in this context; this compares with a BMDL of 22.4 given by the model averaging approach of Wheeler and Bailer (2007). Thus, though the Laplace approximation deviates from the posterior in certain cases, the deviation is well within an acceptable range, and the method correlates well with other approaches.

4. DATA EXAMPLE

For illustration, we apply the approach to two data sets that present challenges to the risk assessor. The first data set (Fig. 1) exhibits a supralinear pattern, and the second (Fig. 2) exhibits a response different from controls only in the highest dose group. For these dose–response patterns, application of current methodologies often yields unreliable curve estimates due to overfitting and model parameters that hit bounds. However, the method proposed herein yields reasonable curve estimates, and parameters cannot hit bounds, as demonstrated below. All analyses were conducted using a benchmark response (BMR) of 10% extra risk, and BMDLs were calculated at the 95% confidence level. All models were given an equal prior probability of 1/9 of being the true model.

Fig 1. Model average estimate of the dose–response function for N-Nitrosomorpholine data. The model average is in black, and the other curves (shades of gray) represent the constituent curves in the model average. The darkness of the gray curves is proportional to the model weight, where darker gray curves receive higher weight. Error bars and central estimates are estimated using Bayesian estimation assuming a Beta(1/2,1/2) prior.

Fig 2. Model average estimate of the dose–response function for methyl isocyanate dose–response data. The model average is in black, and the other curves (shades of gray) represent the constituent curves in the model average. The darkness of the gray curves is proportional to the model weight, where darker gray curves receive higher weight. Error bars and central estimates are estimated using Bayesian estimation assuming a Beta(1/2,1/2) prior.

4.1. N-Nitrosomorpholine

A study in Ketkar, Holste, Preussmann, and Althoff (1983) explored the respiratory effects of N-Nitrosomorpholine in Syrian golden hamsters. In this study, four groups consisting of 50, 28, 30, and 30 animals were exposed to N-Nitrosomorpholine concentrations of 0, 1.36, 6.82, and 13.60 mg/kg/day, respectively, in their drinking water, and 0, 14, 16, and 22 of the animals exhibited respiratory hyperplasia. When modeled using the current BMDS system, none of the constrained models (e.g., the Weibull model with α ≥ 1) adequately describe the data, and some of the unconstrained models yield zero as the BMDL.

This data set was modeled using the proposed model averaging method; Fig. 1 displays the estimated model average dose–response (black) and constituent models (shades of gray, with darker shaded curves having more weight). The analysis yielded a BMD estimate of 0.16 mg/kg/day, with a BMDL of 0.01 mg/kg/day. The final weight assigned to the log-logistic model was 82.9%, and the quantal linear and log-probit models each received just under 6% of the final weight. From a visual inspection of the fit, the model average dose–response is well within the error bars (given as one standard deviation) of the data, indicating the model average adequately describes them. This is significant because none of the constrained models adequately fit the data (figure not shown). Further, this analysis provides a reasonable estimate of the BMDL (0.01 mg/kg/day), as opposed to the value of 0 produced by the unconstrained models.

4.2. Methyl Isocyanate

Dodd and Fowler (1986) conducted a subchronic study to explore the effects of exposing Fischer 344 rats to methyl isocyanate by vapor inhalation. In this study, four dose groups of 10 rats each were exposed to 0, 0.15, 0.60, and 3.1 ppm of methyl isocyanate in the air, and nonneoplastic lesions in the respiratory tract were observed with frequencies 0, 0, 0, and 10, respectively. Unlike the first example, the responses in this study increased from 0% to 100% with no intermediate responses. In addition, only the highest dose group had a positive response, possibly because there was a low probability of lesions occurring at low doses and the sample sizes were too small to elicit a response at those doses. For this type of dose–response pattern, non-Bayesian analyses using the Weibull and similar models force the shape parameter to be as large as possible. For the log-logistic, gamma, and Weibull models, a frequentist analysis in the BMDS system estimates the shape parameter at its upper bound, which results in dose–response curves that are essentially on/off switches. Though the BMDL is similar across these models, the BMD estimate is determined by the maximum bound programmed into the BMDS system and will tend toward 3.1 ppm as this bound is increased. In many cases, the bound is arbitrary and often set based upon computer precision.

This data set was modeled using the proposed model averaging method; Fig. 2 displays the model average (black line) and the corresponding individual model fits (gray lines, with darker shaded curves having more weight). The estimated curves are not step functions because the priors provided information that shrank the curve fits back toward the prior mean. Though different priors may produce different results, the priors used here were selected because they favor dose–response curves that do not increase arbitrarily rapidly. As a result, the BMD estimate was lower because the priors put more weight on dose–response curves with higher prior probability. Despite the differences between the two methods, the model average BMDL of 0.41 ppm was similar to the results from the frequentist BMDS analysis, where the BMDLs across the models ranged from 0.33 to 0.57 ppm.

5. SIMULATION

To investigate the performance of the proposed approach, we created simulations from 34 different dose–response curves assuming an experimental condition designed to mimic chronic bioassays. For each curve, we generated and analyzed 2,000 randomly generated data sets, each with four dose groups and 50 observations per group and with geometric spacing between the nonzero doses (0, 0.25, 0.5, and 1.0). Coverage (the percent of samples for which the BMDL was below the true BMD), relative bias (as a percent of the true BMD), and the BMD/BMDL ratio were investigated. Coverage reflects the ability of the method to accurately characterize the lower confidence limit on the BMD (BMDL). It is a key metric of comparison given that the BMDL with a 95% confidence limit is the preferred point of departure (POD) for the derivation of noncancer reference values and cancer slope factors by the U.S. EPA and other organizations. Bias reflects the ability of the method to accurately identify the true BMD and provides a comparison of the tendencies of methods to err low or high in their BMD central estimates. The BMD/BMDL ratio is a less important metric for comparison purposes but can be used to identify issues that may need further investigation. For example, if a method predicts BMD/BMDL ratios that are consistently low or high relative to other methods, then the method may produce unreliable BMD estimates.
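Each simulated data set amounts to binomial sampling at the four design doses. A minimal sketch of this generation step, assuming a hypothetical quantal-linear truth (the study itself used 34 different true curve shapes):

```python
import numpy as np

rng = np.random.default_rng(0)

doses = np.array([0.0, 0.25, 0.5, 1.0])   # design doses from the text
n_per_group = 50                          # observations per dose group

def true_curve(d):
    # illustrative quantal-linear truth (an assumption for this sketch;
    # the simulation study used 34 different true dose-response shapes)
    return 0.02 + 0.98 * (1.0 - np.exp(-1.5 * d))

def simulate_dataset(rng):
    # number of responders observed in each dose group
    return rng.binomial(n_per_group, true_curve(doses))

data = simulate_dataset(rng)
```

Repeating `simulate_dataset` 2,000 times per true curve yields the replicate data sets over which the three metrics are computed.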

Results are provided for the described MA approach using the priors defined in Table I (“Prior 1”) and for the MA approach using several alternative sets of model parameter priors. The alternative sets of priors considered are described in the Supporting Information Appendix 3. In addition, to investigate the sensitivity of the model to the choice of prior model weights, two schemes for the initial model weights were assessed. The first scheme, denoted as the MA “even” alternative in Supporting Information Tables SA3–1, SA3–2, and SA3–3, assumes all models are equally likely a priori. The second scheme, called the MAQ approach and denoted as the MA “QL = 0.5” alternative in the same tables, places 50% of the weight on the quantal linear model, with each of the remaining eight models given 6.25% weight. The MAQ approach is similar in spirit to the linearized multistage approach and favors the one-hit cancer model. The development of the second scheme follows from the literature, which suggests near-linear dose–response curves are the most difficult cases for model averaging approaches (Simmons et al., 2015; M. W. Wheeler & Bailer, 2007).
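In a Bayesian model average, these prior model weights combine with each model's marginal likelihood to give posterior model weights. A minimal sketch of that update with hypothetical log marginal likelihood values (in the proposed method these would come from the Laplace approximation):

```python
import numpy as np

# two prior weighting schemes from the text: equal weights, and MAQ
w_even = np.full(9, 1.0 / 9.0)
w_maq = np.array([0.5] + [0.0625] * 8)    # 50% on quantal linear, 6.25% each

def posterior_weights(prior_w, log_marginal):
    # posterior weight ∝ prior weight × marginal likelihood (in log space)
    log_w = np.log(prior_w) + log_marginal
    log_w -= log_w.max()                  # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# hypothetical log marginal likelihoods for the nine constituent models
lm = np.array([-12.1, -13.0, -12.5, -13.4, -13.2, -12.4, -12.6, -12.2, -13.8])
w_post = posterior_weights(w_maq, lm)
```

With identical data evidence, the MAQ scheme leaves more posterior weight on the quantal linear model than the even scheme does, which is what drives its behavior for near-linear truths.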

For comparison purposes, simulation results are also provided for the approach recommended in the U.S. EPA BMD technical guidance for the selection of a “best model” (U.S. EPA, 2012). For that simulation analysis, all models described above were fit, except the Hill model, which was excluded due to convergence difficulties. For each iteration, model selection was conducted as follows. Models that did not fit the data (i.e., having p-value < 0.1 using a Pearson chi-squared test) were excluded. From among the remaining models, the BMD and BMDL of the model with the lowest AIC were chosen if the range of BMDLs from adequately fitting models was no more than threefold; otherwise, the BMD and BMDL from the model with the lowest BMDL were chosen. Simulation results are also provided for a competing Bayesian model averaging method, denoted the MAKS approach, from Shao and Shapiro (2018), which uses uniform priors for the model parameters θ. This approach fits all of the models described above except the gamma model and uses the model averaging approach defined in that manuscript. Finally, for an additional comparison, simulation results are provided for the NP method of Guha et al. (2013).
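The “best model” selection rule just described can be sketched as follows (a simplified reading of the rule, not EPA's actual implementation; the dictionary field names are hypothetical):

```python
def select_bmds_best(fits, p_min=0.1, fold=3.0):
    """fits: list of dicts with keys 'p_value', 'aic', 'bmd', 'bmdl'.

    Mirrors the selection logic described in the text: drop models with
    Pearson chi-squared p-value < 0.1; if the remaining BMDLs agree within
    3-fold, take the lowest-AIC model, otherwise take the lowest BMDL.
    """
    ok = [f for f in fits if f["p_value"] >= p_min]   # adequately fitting models
    if not ok:
        return None
    bmdls = [f["bmdl"] for f in ok]
    if max(bmdls) / min(bmdls) <= fold:
        best = min(ok, key=lambda f: f["aic"])        # BMDLs agree: lowest AIC
    else:
        best = min(ok, key=lambda f: f["bmdl"])       # BMDLs disagree: lowest BMDL
    return best["bmd"], best["bmdl"]
```

For example, if one model fails the fit test and the remaining BMDLs are within 3-fold of each other, the rule reduces to picking the lowest-AIC model among the survivors.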

5.1. True Dose–Response Curves

To simulate a range of plausible dose–response relationships, we conducted the simulation analysis on 34 dose–response curves with a variety of shapes. The shapes varied from simple parametric forms and weighted averages of such models to smooth monotone curves generated from stochastic processes. Many of the shapes mimic curves that may realistically be seen in a dose–response analysis, but some are nonstandard and serve as benchmarks to diagnose possible problems with any particular method.

5.1.1. Single Parametric Models

To mimic a flexible parametric model that may be included in the model suite, we constructed models based on the multistage three-degree and log-logistic models, with varying parameter values to form various true shapes. The three-degree multistage model used is

$$p_{\mathrm{ms3}}(d) = \gamma + (1-\gamma)\left[1 - \exp\left(-\beta_1 d - \beta_2 d^2 - \beta_3 d^3\right)\right],$$

with the log-logistic being

$$p(d) = \gamma + \frac{1-\gamma}{1 + \exp\left(-\alpha - \beta\log(d)\right)}.$$
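The two generating models above are easy to evaluate directly. A small sketch with arbitrary illustrative parameter values (not the paper's simulation settings, which are in Table SA2–1):

```python
import numpy as np

def multistage3(d, g, b1, b2, b3):
    # p(d) = g + (1 - g)[1 - exp(-b1*d - b2*d^2 - b3*d^3)]
    d = np.asarray(d, dtype=float)
    return g + (1.0 - g) * (1.0 - np.exp(-b1 * d - b2 * d**2 - b3 * d**3))

def log_logistic(d, g, a, b):
    # p(d) = g + (1 - g) / (1 + exp(-a - b*log(d))); the limit at d = 0
    # is the background rate g
    d = np.asarray(d, dtype=float)
    safe = np.where(d > 0, d, 1.0)        # placeholder to avoid log(0)
    p = g + (1.0 - g) / (1.0 + np.exp(-a - b * np.log(safe)))
    return np.where(d > 0, p, g)

# illustrative parameter values
p_ms = multistage3([0.0, 0.5, 1.0], 0.02, 1.0, 0.5, 0.25)
p_ll = log_logistic([0.0, 0.5, 1.0], 0.02, 0.5, 2.0)
```

Both functions return the background rate at dose zero and increase monotonically, which is the behavior the simulation conditions rely on.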

The range of model (M) shapes considered is shown in Fig. 3 (M1–M14) and Fig. 5 (M24–M26). Though all dose–response curves are monotone, we include three nonstandard curves in this set of models. Models M3, M8, and M12 represent curves that increase in the low- and high-dose ranges but plateau somewhere in the mid-dose range. These curves, though not expected to represent dose–response curves routinely encountered in practice, give an indication of data sets where the methods may have difficulty fitting the data. The exact forms of the dose–response curves and their corresponding BMDs are given in Table SA2–1 of the Supporting Information.

Fig 3. Realized dose–response curves for simulation conditions M1–M14. Simulation conditions were generated using a single parametric model.

Fig 5. Realized dose–response curves for simulation conditions M27–M34. Simulation conditions were generated using monotone stochastic processes (M27–M32) or were generated from parametric models outside of the proposed model averaging approach (M33 and M34).

5.1.2. Convex Sum of Multiple Parametric Models

In addition to single parametric models, we analyzed cases where the true dose–response is a convex combination of the underlying dose–response curves. Although these dose–responses are representable by the proposed methodology, they are not directly contained in its model space. This is because, as the sample size goes to infinity, the model-averaged estimate is known to converge to the single model that minimizes the Kullback–Leibler divergence from the data-generating mechanism (Yao, Vehtari, Simpson, & Gelman, 2018), implying that for large enough n there is a single model the procedure will select with probability 1.

The models M15–M17 and M23 were constructed as a convex sum of

$$p_{s1}(d) = \frac{1}{1 + \exp(3 - 4d)}$$

and

$$p_{s2}(d) = 0.02 + 0.98\left[1 - \exp(-1.5d)\right],$$

which are versions of the logistic and quantal linear models, respectively. Table SA2–2 of the Supporting Information gives the different convex sums considered in these conditions and their corresponding BMDs, and Fig. 4 displays the range of these dose–response models.
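The BMD at a given extra-risk BMR for any such convex combination can be found by solving (p(d) − p(0))/(1 − p(0)) = BMR numerically. A sketch using the two constituent curves (written out explicitly, with signs following the standard logistic and quantal linear forms) and simple bisection; the 0.5/0.5 weighting is illustrative:

```python
import numpy as np

def logistic_s1(d):
    return 1.0 / (1.0 + np.exp(3.0 - 4.0 * d))

def quantal_linear_s2(d):
    return 0.02 + 0.98 * (1.0 - np.exp(-1.5 * d))

def convex_sum(d, w=0.5):
    # weighted average of the two constituent dose-response curves
    return w * logistic_s1(d) + (1.0 - w) * quantal_linear_s2(d)

def bmd_extra_risk(curve, bmr=0.10, lo=1e-8, hi=1.0):
    # solve (p(d) - p(0)) / (1 - p(0)) = bmr by bisection; the curve is
    # monotone increasing, so the root in (lo, hi) is unique
    p0 = curve(0.0)
    f = lambda d: (curve(d) - p0) / (1.0 - p0) - bmr
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

bmd10 = bmd_extra_risk(convex_sum)   # BMD at 10% extra risk
```

The same root-finding step yields the true BMDs tabulated in Table SA2–2 once the weights are set to the paper's values.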

Fig 4. Realized dose–response curves for simulation conditions M15–M26. Simulation conditions M15–M23 were generated using a convex sum of multiple parametric models. Simulation conditions M24–M26 were generated from a three-degree multistage model to test performance when a model is not in the model suite and has a higher background rate.

In addition to the two-model convex combination, the models M18–M22 were each constructed as a convex sum of the following four dose–response curves:

$$p_{s3}(d) = \Phi(-1.6 + 2.5d),$$
$$p_{s4}(d) = 0.02 + 0.98\left[1 - \exp(-1.6d)\right],$$
$$p_{s5}(d) = 0.02 + \frac{0.98}{1 + \exp\left[1.3 - 2\log(d)\right]},$$

and

$$p_{s6}(d) = 0.02 + 0.98\left[1 - \exp\left(-1.5d^{2.2}\right)\right],$$

which are versions of the probit, quantal linear, log-logistic, and Weibull models, respectively. Table SA2–3 of the Supporting Information gives the different convex sums considered in these models and their corresponding BMDs, and Fig. 4 displays these curves (M18–M22). These models form dose–response conditions similar to the quantal linear function (i.e., curves similar to $p_{s2}(d)$, $p_{s4}(d)$, and $p_{s6}(d)$), which Wheeler and Bailer (2007) found to be problematic cases for model averaging.

5.1.3. Models Outside the Model Suite

We also analyzed models not representable as any function in the model suite, denoted in Fig. 5 as simulation models M27–M34. Models M27–M32 were generated from a smooth monotone stochastic process over a basis set (e.g., see Higdon, 2002). In the simulations used to generate these models, random coefficients for each basis function were generated in a manner that guaranteed monotonicity. Each curve was then visually inspected to confirm it represented a plausible dose–response shape. In addition to the nonparametric (NP) curves M27–M32, two additional cases, M33 and M34, were considered. M33 uses an exponentially modified Gaussian distribution, which has a history in analytical chemistry (Pauls & Rogers, 1977). For M34, a multistage three-degree model was created to define a case of high-dose downturn. The code generating each curve is available in an R program (R Core Team, 2015) in the Supporting Information. Fig. 5 displays these functions (M27–M34) and shows that a large range of curvature was considered when constructing the simulations.
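One simple way to build smooth monotone curves over a basis set (a sketch of the general idea only, not the authors' exact construction) is to take a nonnegative random combination of increasing sigmoid basis functions and rescale it to a chosen background and maximum response:

```python
import numpy as np

def random_monotone_curve(rng, n_basis=6, background=0.02, max_p=0.8):
    # random centers/scales define the sigmoid bases; nonnegative gamma
    # coefficients guarantee the combination is monotone increasing
    centers = rng.uniform(0.0, 1.0, n_basis)
    scales = rng.uniform(2.0, 20.0, n_basis)
    coefs = rng.gamma(1.0, 1.0, n_basis)

    def curve(d):
        d = np.atleast_1d(np.asarray(d, dtype=float))
        basis = 1.0 / (1.0 + np.exp(-scales * (d[:, None] - centers)))
        raw = basis @ coefs
        raw0 = (1.0 / (1.0 + np.exp(scales * centers))) @ coefs
        raw1 = (1.0 / (1.0 + np.exp(-scales * (1.0 - centers)))) @ coefs
        # rescale so curve(0) = background and curve(1) = max_p
        return background + (max_p - background) * (raw - raw0) / (raw1 - raw0)

    return curve

curve = random_monotone_curve(np.random.default_rng(1))
vals = curve(np.linspace(0.0, 1.0, 11))
```

Different seeds produce different smooth monotone shapes, which is the property the M27–M32 conditions exploit; visual inspection (as in the text) would then screen out implausible realizations.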

5.2. Simulation Results

For the simulation, we calculated the observed coverage $\Pr(\mathrm{BMDL} < \mathrm{BMD}_{\mathrm{TRUE}})$, approximated as a relative frequency; the relative bias percentage, $100 \times E\left[\widehat{\mathrm{BMD}}/\mathrm{BMD}_{\mathrm{TRUE}}\right]\%$; and, as a measure of spread, the expected ratio of the lower bound estimate to the estimated BMD, $E\left[\widehat{\mathrm{BMDL}}/\widehat{\mathrm{BMD}}\right]$, where all values were estimated using arithmetic averages. We note that the more commonly suggested ratio of the lower and upper bound (BMDL/BMDU) was not used because the BMDU is not available for all methods investigated (e.g., the BMDS modeling results). The statistics were computed for BMRs of 1% and 10% extra risk. As there are 34 true dose–response curves, two BMRs for each curve, and five methods tested, not all results are presented here; the full set is available in Supporting Information Appendix 3.
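Given vectors of estimates across the simulated data sets, the three summary statistics reduce to arithmetic means. A direct transcription of the definitions above (array names are hypothetical):

```python
import numpy as np

def summarize(bmd_hat, bmdl_hat, bmd_true):
    # bmd_hat, bmdl_hat: per-data-set estimates; bmd_true: the known true BMD
    bmd_hat = np.asarray(bmd_hat, dtype=float)
    bmdl_hat = np.asarray(bmdl_hat, dtype=float)
    coverage = np.mean(bmdl_hat < bmd_true)           # Pr(BMDL < true BMD)
    rel_bias = 100.0 * np.mean(bmd_hat / bmd_true)    # percent of true BMD
    ratio = np.mean(bmdl_hat / bmd_hat)               # lower bound / estimate
    return coverage, rel_bias, ratio
```

For instance, a relative bias of 100% indicates that the central BMD estimates are, on average, equal to the true BMD.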

Tables II–V give the observed coverage for the three model averaging methods for a BMR of 10% extra risk. The simulation results for the 1% extra risk BMR, given in the Supporting Information, are similar to the 10% results and are not discussed further. Overall, the proposed model averaging approach and the NP approach of Guha et al. (2013) are similar for many of the simulations. These two approaches frequently achieve near-nominal coverage (i.e., 95%) across the simulations. In contrast, the current BMDS approach failed to achieve near-nominal coverage in most simulations. Unsurprisingly, all methods performed poorly for simulation conditions M3 and M12, which are conditions where the response increases, plateaus, and then increases again at higher doses. Note that for model M8 the MA and MAQ approaches reached nominal, though conservative, coverage. In many of these cases, the coverage is 0%, a result of the poor fit of the parametric models. Even the NP monotone approach performed poorly in these conditions because it linearly interpolates between observed points. In cases of concave dose–responses between dose groups, a linear interpolation will systematically underestimate the true dose–response curve and the corresponding BMD. For the NP approach, this pattern is also seen in simulations M1, M23, and M24, which all exhibit concavity between zero and the first tested dose of 0.25.
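The interpolation effect just described is easy to verify numerically: for a curve that is concave between two design points, the chord lies below the curve, so the interpolated response is underestimated. A small check with an illustrative concave curve:

```python
import numpy as np

def p(d):
    # an illustrative curve that is concave on [0, 0.25]
    return 1.0 - np.exp(-3.0 * d)

d0, d1 = 0.0, 0.25     # two adjacent design doses
mid = 0.125            # a dose between the two observed points

# linear interpolation between the responses at the two design points
p_interp = p(d0) + (p(d1) - p(d0)) * (mid - d0) / (d1 - d0)
# the interpolated value sits below the concave curve: p_interp < p(mid)
```

Because the interpolated curve sits below the truth, the dose solving the BMR equation is pushed upward, and the resulting BMDL can land above the true BMD, driving coverage toward zero.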

Table V.

Observed Coverage Probabilities for Test Conditions M27–M34 with BMR = 10%, 2,000 Simulated Data Sets per Condition, and a Nominal Coverage of 95%

Test Condition MA MAQ BMDS NP MAKS
M27 99.5% 99.8% 95.1% 99.6% 98.7%
M28 100.0% 100.0% 100.0% 100.0% 100.0%
M29 100.0% 100.0% 100.0% 100.0% 100.0%
M30 92.7% 97.3% 93.2% 99.8% 96.3%
M31 95.7% 99.0% 67.3% 56.1% 90.4%
M32 95.9% 100.0% 77.7% 100.0% 85.7%
M33 0.9% 36.4% 59.6% 96.9% 45.8%
M34 80.7% 99.8% 99.7% 98.9% 93.0%

Note. Here, MA is the proposed method with equal weighting; MAQ is the proposed method with 50% prior weight assigned to the quantal linear model; BMDS is the current algorithm and fitting procedure recommended by the U.S. EPA (U.S. EPA, 2012); NP is the nonparametric Bayesian procedure of Guha et al. (2013); and MAKS is the approach of Shao and Shapiro, which uses uniform priors (Shao & Shapiro, 2018).

The simulations also examined the effect of placing a priori weight of 0.5 on the quantal linear model (i.e., the MAQ approach). The results show that this weighting scheme can improve coverage for many dose–responses similar to those observed in practice. For example, for simulation conditions M15–M25, the MAQ approach improves coverage to nominal or near-nominal rates with little impact on the coverage for curves that are clearly sublinear (i.e., threshold-type models). This indicates that such a priori weighting schemes may help in modeling the BMD for most dose–response data sets seen in practice.

Coverage results for all 34 conditions (Supporting Information Table SA3–2) indicate median coverage percentages for the MA method of 95.2% and 94.9% for the 10% and 1% BMR, respectively. The 10% and 1% BMR median coverage percentages were lower than the nominal 95% rate for the BMDS (81.1% and 92.4%) and MAKS (88.1% and 88.6%) methods and higher for the MAQ (97.6% and 98.6%) and NP (98.6% and 99.9%) methods. Thus, relative to the MA method, the BMDS method resulted in a substantially greater frequency of lower-than-optimal coverage (i.e., BMDL below the true BMD at lower than the desired nominal 95% rate for the 10% BMR). The MAQ and NP methods resulted in higher-than-optimal (near-maximal) coverage (i.e., a greater tendency to estimate BMDL below the true BMD at higher than the desired nominal 95% rate) for a greater number of templates.

Although the model average (MA), MAQ, and NP approaches often obtain similar coverage, there are differences in performance for some models. For some models (M1, M23, M24, and M31), the MA approach yields much higher coverage than the NP approach. For example, for simulation M1 the observed coverage using the MA approach was 97.9%, compared to 0% using the NP approach. This discrepancy can be traced back to the linear interpolation performed in the NP approach. In cases where the NP approach clearly outperforms MA (M4, M13, and M33), the true dose–response is nearly linear, that is, directly proportional to dose. The constituent parametric models in the MA approach do not support this shape, whereas the linear interpolation of the NP approach models the true dose–response curve appropriately because it assumes linearity between observed doses. We note the MAQ approach overcomes many of the coverage problems seen in the MA and NP cases with minimal decrease in performance for cases that are not linear.

We also evaluated the bias of the methods by calculating median percentages across templates (Supporting Information Table SA3–2). The MAQ results exhibited less bias (101.5% and 140.5% for the 10% and 1% BMRs, respectively) than the MA approach (107.7% and 175.3%) and typically had less bias than the NP approach at the 10% BMR (96.3%), due to the NP method reporting more conservative (lower) point estimates for sublinear dose–response relationships. For example, conditions M7 and M32 had very sublinear dose–response functions; the NP approach had point estimates that were 35.4% and 34.9% of the true BMD, whereas the MA and MAQ approaches had point estimates that were 78.6% and 79.6% of the true BMD, respectively. The BMDS approach and the approach of Shao and Shapiro performed better with respect to bias than they did for coverage, but the results were not noticeably better than either the MA or MAQ approach. Though the MAQ weighting makes the BMDL more conservative relative to the MA, these results show that MAQ changes the point estimate very little and possibly makes it less biased, a point also seen in the ratio statistic and fully described in the Supporting Information. Nearly all the MAQ BMDLs are closer to the BMD, which, combined with the coverage results, argues that the MAQ weighting scheme also increases the stability of the estimates.

6. DISCUSSION

The proposed dichotomous model averaging method addresses various problems by defining a fast, reproducible, and fully Bayesian approach. As seen from the data examples in Section 4, MA allows the use of unconstrained models without the lower-bound estimation problems that frequently occur when model averaging over unconstrained models. In addition, the Bayesian approach with informative priors allows the fitting of models that have more parameters than there are data points, which permits the use of a consistent model suite across dose–response data sets.

The MA method using our proposed priors performs favorably against many current state-of-the-art methods. This was demonstrated in a comprehensive simulation study using several alternative sets of parameter priors and 34 plausible dose–response curves (templates) both in and out of the MA modeling suite. Some of the 34 dose–response templates employed are certainly more plausible than others, and we have discussed some of the implications of individual dose–response curve shapes. However, the summary statistics describing the distribution of results across templates assume that this variability in plausibility does not have a major impact on the results. The tested models were selected to broadly cover feasible dose–response curves. As the priors place the majority of the prior weight for the BMD at BMR = 10% between 0.05 and 0.75 of the maximum tested dose, they are not overly informative for the range of dose–responses normally considered, and we do not believe our approach is biased in favor of the proposed MA method.

Thus, the simulation results presented here indicate that our MA approach offers a significant improvement over single model selection and a slight improvement in coverage, with little difference in bias, when compared to the other methods. More appropriate priors can be developed in the future to produce better results in certain situations (Fang et al., 2015) as necessary. This is evident in the MAQ approach, in which a slight change in a priori weighting improved performance in situations representing one-hit cancer models without drastically altering performance in sublinear situations.

Finally, the proposed MA method has practical advantages. It promotes consistency by removing many of the decisions a risk assessor must make when analyzing a Markov chain or when manually fitting multiple individual models and choosing a “best model.” It also runs much faster than previously published MA approaches. Individual models are fit in at most a tenth of a second, with all model averaging results (BMD and BMDL estimates) computed within a second on a modern computer. This is in contrast to previous model averaging approaches such as Wheeler and Bailer (2007), which require half a minute to complete, or full MCMC-based approaches such as Shao and Shapiro (2018), which may require longer run times. This improvement in efficiency can be appreciable when many data sets are analyzed. While the accuracy of the Laplace approximation depends on sample size (Tierney & Kadane, 1986), it has been shown here to be sufficiently accurate in simulation studies for use with typical NTP chronic bioassays involving ~200 test animals. Further, numerical tests showed reasonably accurate results with as few as 25 animals (five animals per dose group).
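The speed described above comes from replacing MCMC with maximum a posteriori (MAP) estimation plus a Laplace approximation. A minimal sketch of that idea for a single constituent model (quantal linear), with hypothetical data and assumed normal priors on transformed parameters; the actual software uses its own parameterizations and the Table I priors:

```python
import numpy as np
from scipy.optimize import minimize

doses = np.array([0.0, 0.25, 0.5, 1.0])
n = np.full(4, 50)
y = np.array([1, 9, 17, 39])               # hypothetical responder counts

def neg_log_posterior(theta):
    g = 1.0 / (1.0 + np.exp(-theta[0]))    # background on the logit scale
    b = np.exp(theta[1])                   # slope kept positive via log scale
    p = np.clip(g + (1 - g) * (1 - np.exp(-b * doses)), 1e-10, 1 - 1e-10)
    loglik = np.sum(y * np.log(p) + (n - y) * np.log(1 - p))
    # assumed normal priors (illustrative values, not the paper's Table I)
    logprior = -0.5 * ((theta[0] + 2.0) / 2.0) ** 2 - 0.5 * (theta[1] / 1.5) ** 2
    return -(loglik + logprior)

# MAP estimate: the posterior mode
fit = minimize(neg_log_posterior, x0=np.zeros(2), method="Nelder-Mead")

def hessian(f, x, h=1e-4):
    # central finite-difference Hessian at the MAP estimate
    k = len(x)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei, ej = np.eye(k)[i] * h, np.eye(k)[j] * h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

# Laplace approximation: posterior ~ Normal(MAP, inverse Hessian)
cov = np.linalg.inv(hessian(neg_log_posterior, fit.x))
```

The approximate posterior then supports fast BMD/BMDL computation without sampling, which is why per-model fits complete in fractions of a second.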

Supplementary Material

Suppl material

Figure SA1-1: Comparison of the true posterior distribution (calculated using MCMC) of the BMD calculated at a BMR=0.01 for the Weibull model compared to the approximate density (red line) generated using the approximation method.

Table SA2-1: The true models, parameter values, and BMD-10s and BMD-01s for models M1-M14 and M24-M26. Here, MS3 is the multistage 3° model and LL is the log-logistic model.

Table SA2-2: The weighting schemes for simulation models M15-M17 and M23 and the corresponding BMD-10 and BMD-01. The exact forms of the logistic and quantal linear (QL) models are described in the main text.

Table SA2-3: The weighting schemes for simulation models M18-M22 and corresponding BMD-10 and BMD-01. The exact form of the probit (PRO), logistic (LOG) and Weibull (WEI) models are described in the main text.

Table SA3-1: Proposed and Alternative Priors

Table SA3-2: Coverage

Table SA3-3: Percent of True BMD

Table SA3-4: BMD/BMDL Ratio

Table III.

Observed Coverage Probabilities for Test Conditions M1–M12 with BMR = 10%, 2,000 Simulated Data Sets per Condition, and a Nominal Coverage of 95%

Test Condition MA MAQ BMDS NP MAKS
  1 97.9% 97.0% 99.1% 0.0% 92.1%
  2 99.7% 99.7% 90.8% 100.0% 93.6%
  3 0.0% 0.0% 0.0% 0.0% 0.0%
  4 52.1% 55.6% 36.2% 92.7% 32.9%
  5 98.8% 98.8% 77.8% 99.2% 83.7%
  6 100.0% 100.0% 91.4% 100.0% 93.4%
  7 100.0% 100.0% 100.0% 100.0% 100.0%
  8 100.0% 100.0% 97.0% 0.0% 66.2%
  9 87.4% 91.0% 88.0% 94.1% 83.4%
10 100.0% 100.0% 92.6% 99.7% 91.4%
11 100.0% 100.0% 100.0% 100.0% 99.9%
12 29.7% 6.2% 48.8% 33.5% 66.7%

Note. MA is the proposed method with equal weighting; MAQ is the proposed method with 50% prior weight assigned to the quantal linear model; BMDS is the current algorithm and fitting procedure recommended by the U.S. EPA (U.S. EPA, 2012); NP is the nonparametric Bayesian procedure of Guha et al. (2013); and MAKS is the approach of Shao and Shapiro, which uses uniform priors (Shao & Shapiro, 2018).

Table IV.

Observed Coverage Probabilities for Test Conditions M13–M26 with BMR = 10%, 2,000 Simulated Data Sets per Condition, and a Nominal Coverage of 95%

Test Condition MA MAQ BMDS NP MAKS
13 67.7% 83.5% 80.9% 98.8% 83.6%
14 94.9% 95.0% 91.6% 100.0% 92.8%
15 94.9% 95.0% 57.8% 97.2% 76.7%
16 88.2% 95.1% 56.3% 94.1% 78.7%
17 91.6% 96.9% 81.2% 89.2% 86.9%
18 91.6% 93.1% 65.6% 98.4% 82.3%
19 95.5% 98.3% 73.6% 97.1% 88.6%
20 97.2% 97.9% 76.2% 99.0% 91.9%
21 91.5% 92.7% 78.7% 99.2% 87.1%
22 92.7% 94.5% 61.6% 98.3% 82.2%
23 89.6% 90.5% 87.5% 83.7% 84.4%
24 97.1% 99.9% 67.7% 65.8% 87.6%
25 100.0% 100.0% 99.7% 100.0% 99.1%
26 95.8% 98.8% 53.1% 95.4% 90.9%

Note. Here, MA is the proposed method with equal weighting; MAQ is the proposed method with 50% prior weight assigned to the quantal linear model; BMDS is the current algorithm and fitting procedure recommended by the U.S. EPA (2012); NP is the nonparametric Bayesian procedure of Guha et al. (2013); and MAKS is the approach of Shao and Shapiro, which uses uniform priors (Shao & Shapiro, 2018).

Acknowledgments

The study was reviewed by the Center for Public Health and Environmental Assessment and approved for publication. Mention of trade names or commercial products does not constitute endorsement or recommendation for use. Views expressed in this article are the authors’ and do not necessarily reflect the US EPA’s views or policies. This research was supported (in part) by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences. Some of this research was conducted while the first author worked for the National Institute for Occupational Safety and Health.

Footnotes

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of the article.

References

  1. Bailer AJ, Noble RB, & Wheeler MW (2005). Model uncertainty and risk estimation for experimental studies of quantal responses. Risk Analysis, 25(2), 291–299. 10.1111/j.1539-6924.2005.00590.x
  2. Buckland ST, Burnham KP, & Augustin NH (1997). Model selection: An integral part of inference. Biometrics, 53(2), 603–618.
  3. Chib S, & Greenberg E (1995). Understanding the Metropolis–Hastings algorithm. American Statistician, 49, 327–335.
  4. Claeskens G, & Hjort NL (2008). Model selection and model averaging. Cambridge, England: Cambridge University Press.
  5. Dodd DE, & Fowler EH (1986). Methyl isocyanate subchronic vapor inhalation studies with Fischer 344 rats. Toxicological Sciences, 7, 502–522.
  6. Faes C, Aerts M, Geys H, & Molenberghs G (2007). Model averaging using fractional polynomials to estimate a safe level of exposure. Risk Analysis, 27(1), 111–123. 10.1111/j.1539-6924.2006.00863.x
  7. Fang Q, Piegorsch WW, Simmons SJ, Li X, Chen C, & Wang Y (2015). Bayesian model-averaged benchmark dose analysis via reparameterized quantal-response models. Biometrics, 71(4), 1168–1175. 10.1111/biom.12340
  8. Fletcher D, & Turek D (2012). Model-averaged profile likelihood intervals. Journal of Agricultural, Biological, and Environmental Statistics, 17(1), 38–51. 10.1007/s13253-011-0064-8
  9. Guha N, Roy A, Kopylev L, Fox J, Spassova M, & White P (2013). Nonparametric Bayesian methods for benchmark dose estimation. Risk Analysis, 33(9), 1608–1619. 10.1111/risa.12004
  10. Hardy A, Benford D, Halldorsson T, Jeger MJ, Knutsen KH, ... Ockleford C (2017). Update: Use of the benchmark dose approach in risk assessment. EFSA Journal, 15(1), e04658.
  11. Higdon D (2002). Space and space-time modeling using process convolutions. In Quantitative methods for current environmental issues (pp. 37–56). London: Springer.
  12. Hoeting JA, Madigan D, Raftery AE, & Volinsky CT (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4), 382–417.
  13. Hsu JSJ (1995). Generalized Laplacian approximations in Bayesian inference. Canadian Journal of Statistics, 23(4), 399–410. 10.2307/3315383
  14. Hu B, Ji Y, & Tsui KW (2008). Bayesian estimation of inverse dose response. Biometrics, 64(4), 1223–1230. 10.1111/j.1541-0420.2008.01013.x
  15. Ketkar MB, Holste J, Preussmann R, & Althoff J (1983). Carcinogenic effect of nitrosomorpholine administered in the drinking water to Syrian golden hamsters. Cancer Letters, 17, 333–338.
  16. Leonard T, Hsu JSJ, & Tsui KW (1989). Bayesian marginal inference. Journal of the American Statistical Association, 84(408), 1051–1058. 10.2307/2290082
  17. Pauls RE, & Rogers LB (1977). Band broadening studies using parameters for an exponentially modified Gaussian. Analytical Chemistry, 49(4), 625–628.
  18. R Core Team. (2015). R: A language and environment for statistical computing. Retrieved from https://www.R-project.org/
  19. Raftery AE, Madigan D, & Hoeting JA (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92, 179–191. 10.1080/01621459.1997.10473615
  20. Richardson S, & Green PJ (1997). On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society: Series B (Methodological), 59(4), 731–792.
  21. Severini T (1991). On the relationship between Bayesian and non-Bayesian interval estimates. Journal of the Royal Statistical Society: Series B (Methodological), 53(3), 611–618.
  22. Shao K, & Gift JS (2013). Model uncertainty and Bayesian model averaged benchmark dose estimation for continuous data. Risk Analysis, 34(1), 101–120. 10.1111/risa.12078
  23. Shao K, & Shapiro AJ (2018). A web-based system for Bayesian benchmark dose estimation. Environmental Health Perspectives, 126(1), 017002. 10.1289/ehp1289
  24. Simmons SJ, Chen C, Li X, Wang Y, Piegorsch WW, Fang Q, ... Dunn GE (2015). Bayesian model averaging for benchmark dose estimation. Environmental and Ecological Statistics, 22(1), 5–16. 10.1007/s10651-014-0285-4
  25. Tierney L, & Kadane JB (1986). Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81(393), 82–86. 10.1080/01621459.1986.10478240
  26. U.S. EPA. (2012). Benchmark dose technical guidance (EPA/100/R-12/001). Retrieved from https://www.epa.gov/risk/benchmark-dose-technical-guidance
  27. U.S. EPA. (2017). Benchmark dose software (BMDS), Version 2.7.0.4. Washington, DC: U.S. Environmental Protection Agency, National Center for Environmental Assessment. Retrieved from https://www.epa.gov/bmds
  28. West RW, Piegorsch WW, Pena EA, An L, Wu W, Wickens AA, ... Chen W (2012). The impact of model uncertainty on benchmark dose estimation. Environmetrics, 23(8), 706–716. 10.1002/env.2180
  29. Wheeler M, & Bailer AJ (2009). Comparing model averaging with other model selection strategies for benchmark dose estimation. Environmental and Ecological Statistics, 16(1), 37–51. 10.1007/s10651-007-0071-7
  30. Wheeler MW, & Bailer AJ (2007). Properties of model-averaged BMDLs: A study of model averaging in dichotomous response risk estimation. Risk Analysis, 27(3), 659–670. 10.1111/j.1539-6924.2007.00920.x
  31. Wheeler MW, Piegorsch WW, & Bailer AJ (2019). Quantal risk assessment database: A database for exploring patterns in quantal dose-response data in risk assessment and its application to develop priors for Bayesian dose-response analysis. Risk Analysis, 39(3), 616–629. 10.1111/risa.13218
  32. Yao Y, Vehtari A, Simpson D, & Gelman A (2018). Using stacking to average Bayesian predictive distributions. Bayesian Analysis, 13(3), 917–1007.
