Abstract
We present a novel nonparametric method for bioassay and benchmark analysis in risk assessment, which averages isotonic MLEs based on disjoint subgroups of dosages. The asymptotic theory for the methodology is derived, showing that the MISEs (mean integrated squared errors) of the estimates of both the dose-response curve $F$ and its inverse $F^{-1}$ achieve the optimal rate $O(N^{-4/5})$. We also compute the asymptotic distribution of the estimate of the effective dosage $\zeta_p = F^{-1}(p)$, which is shown to have an optimally small asymptotic variance.
Keywords: Monotone dose-response curve estimation, effective dosage, benchmark analysis, mean integrated square error, asymptotic normality
1. Introduction
The efficient estimation of effective dosage is an old but still very important problem in biology and medicine. In addition, concerns about the impact of pollutants in the environment have added a great sense of urgency to the development of good methods for the estimation of benchmarks in risk assessment (See, e.g., Piegorsch and Bailer (2005)). We present in this article the asymptotic theory of a new method. In a companion study based on extensive simulation and data analysis, to be presented elsewhere, it is shown that the method performs remarkably well even with small and moderate sample sizes (Bhattacharya and Lin (2010)).
Consider quantal dose-response experiments in bioassay where the response of a subject to a drug or a chemical agent is measured on a binary scale, 1 for response and 0 for non-response. Given a dosage x of the substance, let F(x) be the probability of response. The function F is called the dose-response curve, and it is assumed to be monotone increasing. The effective dosage for a targeted response (probability) p is defined as the ‘p-th quantile’ ζp or EDp,
(1.1) $\zeta_p = F^{-1}(p)$.
For the data, suppose that $n_i$ subjects are given a dosage $x_i$ ($i = 1, \ldots, m$), where $x_1 < \cdots < x_m$, with the total number of observations $N = \sum_{i=1}^{m} n_i$. One may assume, without loss of generality, that $0 = x_1 < \cdots < x_m = 1$. The number of responses observed at dosage $x_i$ is $r_i$ ($i = 1, \ldots, m$). The likelihood function for the estimation of $F(x_i)$, $1 \le i \le m$, is
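(1.2) $L(p_1, \ldots, p_m) = \prod_{i=1}^{m} \binom{n_i}{r_i} p_i^{r_i} (1 - p_i)^{n_i - r_i}$, where $p_i = F(x_i)$.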
The maximum likelihood estimator (MLE) of (p1, . . . , pm), under the monotonicity constraint, is given in Ayer et al. (1955) by the following PAV, or pool-adjacent-violators, algorithm (see also Barlow et al. (1972), p. 73, and Cran (1980)):
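(1.3) $\hat p_i = \max_{1 \le u \le i} \, \min_{i \le v \le m} \dfrac{\sum_{j=u}^{v} r_j}{\sum_{j=u}^{v} n_j}$, $i = 1, \ldots, m$.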
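The pooling step can be sketched in a few lines of Python. This is a minimal illustration of the pool-adjacent-violators idea for binomial counts, not the authors' implementation; the function name `pav` and the argument names `successes` and `trials` are hypothetical, and adjacent levels are pooled whenever the earlier pooled proportion strictly exceeds the later one.

```python
def pav(successes, trials):
    """Isotonic (nondecreasing) fit to binomial proportions by pooling.

    successes[i] = r_i responses out of trials[i] = n_i subjects at the
    i-th dose (doses assumed sorted in increasing order).
    Hypothetical helper written for illustration only.
    """
    # Each block stores [pooled responses, pooled subjects, levels merged].
    blocks = []
    for r_i, n_i in zip(successes, trials):
        blocks.append([float(r_i), float(n_i), 1])
        # Pool while the previous block's proportion exceeds the last one.
        # Cross-multiplication avoids dividing before the comparison.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            r_last, n_last, k_last = blocks.pop()
            blocks[-1][0] += r_last
            blocks[-1][1] += n_last
            blocks[-1][2] += k_last
    # Expand each block back to one fitted value per original dose level.
    fitted = []
    for r_b, n_b, k_b in blocks:
        fitted.extend([r_b / n_b] * k_b)
    return fitted
```

For instance, with five subjects per dose and response counts 1, 3, 2, 4, the raw proportions 0.6 and 0.4 violate monotonicity and are pooled to 5/10, while the outer two levels are left unchanged.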
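Bhattacharya and Kong (2007) proposed an estimate $\hat F$ of the dose-response curve $F$ by taking $\hat F(x_i) = \hat p_i$ at each $x_i$ and by linear interpolation in the interval $(x_i, x_{i+1})$:
$\hat F(x) = \hat p_i + (\hat p_{i+1} - \hat p_i)\,\dfrac{x - x_i}{x_{i+1} - x_i}$, $x_i \le x \le x_{i+1}$.
$\hat F$ is a continuous function whose inverse is the estimate of ED$_p$ as given by:
(1.4) $\hat\zeta_p = \hat F^{-1}(p) = x_i + \dfrac{p - \hat p_i}{\hat p_{i+1} - \hat p_i}\,(x_{i+1} - x_i)$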
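if $\hat p_i \le p \le \hat p_{i+1}$ with $\hat p_i < \hat p_{i+1}$ and, more generally, by $\hat F^{-1}(p) = \inf\{x \in [0, 1] : \hat F(x) \ge p\}$.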
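As an illustration of the interpolation and its inversion, the following Python sketch evaluates a piecewise-linear curve through monotone fitted values and reads off an effective dose. The names `dose_response_estimate` and `effective_dose` are hypothetical. Two conventions here are assumptions of the sketch rather than statements from the paper: on flat stretches the smallest dose at which the curve reaches p is returned, and values of p outside the range of the fitted proportions are mapped to the nearest end dose.

```python
import numpy as np

def dose_response_estimate(x, doses, p_hat):
    """Piecewise-linear curve through the points (doses[i], p_hat[i])."""
    return np.interp(x, doses, p_hat)

def effective_dose(p, doses, p_hat):
    """Smallest dose at which the interpolated curve reaches level p.

    p_hat must be nondecreasing (for example, the output of a PAV fit).
    Out-of-range p is clipped to an end dose (assumption of this sketch).
    """
    doses = np.asarray(doses, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    # First index i with p_hat[i] >= p, so that p_hat[i-1] < p <= p_hat[i].
    i = int(np.searchsorted(p_hat, p, side="left"))
    if i == 0:
        return float(doses[0])
    if i == len(p_hat):
        return float(doses[-1])
    # Invert the linear segment between levels i-1 and i (strictly increasing).
    fraction = (p - p_hat[i - 1]) / (p_hat[i] - p_hat[i - 1])
    return float(doses[i - 1] + fraction * (doses[i] - doses[i - 1]))
```

With doses 0, 1, 2, 3 and fitted values 0.2, 0.5, 0.5, 0.8, the level p = 0.5 is first reached at dose 1, at the left end of the flat stretch.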
From now on, we will assume, for simplicity, that there are m equidistant dosages and the same number n of i.i.d. 0 − 1 valued observations at each dosage. Assume n → ∞, m → ∞ and
(1.5) |
in Theorems 2.1, 2.2, 2.3 part (b), and in Theorem 2.3, part (c). Here means that the ratio of the two sides is bounded away from zero and infinity.
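Let $\tilde p_i = r_i/n$ denote the observed proportion of 1's at dosage $x_i$. Divide the observed proportions and dosages into $r$ groups, and consider the following application of the PAV algorithm to each of the $r$ groups of levels below, where $s(n) = m/r$:
(1.6)
Group 1: $(x_1, \tilde p_1), (x_{r+1}, \tilde p_{r+1}), \ldots, (x_{(s(n)-1)r+1}, \tilde p_{(s(n)-1)r+1}), (x_m, \tilde p_m)$;
Group $j$ ($2 \le j \le r-1$): $(x_1, \tilde p_1), (x_j, \tilde p_j), (x_{r+j}, \tilde p_{r+j}), \ldots, (x_{(s(n)-1)r+j}, \tilde p_{(s(n)-1)r+j}), (x_m, \tilde p_m)$;
Group $r$: $(x_1, \tilde p_1), (x_r, \tilde p_r), (x_{2r}, \tilde p_{2r}), \ldots, (x_{s(n)r}, \tilde p_{s(n)r}) = (x_m, \tilde p_m)$.
Note that Groups 2 through $r - 1$ each have $s(n) + 2$ levels, while Groups 1 and $r$ each have $s(n) + 1$ levels. Also, except for the smallest and the largest levels (with proportions $\tilde p_1$ and $\tilde p_m$), the sets of levels covered by the groups are disjoint. Together, they comprise all the $m = r\,s(n)$ different dosages.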
By linear interpolation, each Group j (j = 1, . . . , r) provides an estimate of the dose-response curve F on [0, 1], and an estimate of F−1. Note that while F−1 is defined on [F(0), F(1)], and below are defined on . Compute
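(1.7) $\hat F = \dfrac{1}{r} \sum_{j=1}^{r} \hat F_j$, $\widehat{F^{-1}} = \dfrac{1}{r} \sum_{j=1}^{r} \hat F_j^{-1}$,
where $\hat F_j$ and $\hat F_j^{-1}$ denote the Group $j$ estimates of $F$ and $F^{-1}$, and choose the values of $r$ for which the bootstrap estimates of the MISEs of $\hat F$ and $\widehat{F^{-1}}$ are the smallest. These we call the NAM estimates of $F$ and $F^{-1}$.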
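The grouping-and-averaging step can be sketched as follows for the equal-sample-size design, where unweighted pooling suffices. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names `pav_equal_weights`, `group_indices` and `nam_curve` are hypothetical, the groups are taken to be every r-th dose level with both end points adjoined (matching the level counts described above), and the bootstrap selection of r is not shown.

```python
import numpy as np

def pav_equal_weights(y):
    """Pool-adjacent-violators fit for equally weighted values."""
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # Pool while the previous block mean exceeds the last block mean.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, k = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += k
    return np.concatenate([np.full(k, s / k) for s, k in blocks])

def group_indices(m, r):
    """0-based index sets: every r-th level starting at j, plus both ends."""
    return [np.array(sorted(set(range(j, m, r)) | {0, m - 1}))
            for j in range(r)]

def nam_curve(x_grid, doses, props, r):
    """Average of the r group-wise isotonic, linearly interpolated curves."""
    doses = np.asarray(doses, dtype=float)
    props = np.asarray(props, dtype=float)
    curves = []
    for idx in group_indices(len(doses), r):
        fitted = pav_equal_weights(props[idx])          # isotonic fit in group
        curves.append(np.interp(x_grid, doses[idx], fitted))
    return np.mean(curves, axis=0)
```

Because every group contains the smallest and the largest dose, each group curve is defined on the whole dose range, and the average of nondecreasing curves is again nondecreasing. With r = 1 the sketch reduces to a single isotonic fit followed by linear interpolation.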
Among kernel based nonparametric methods for quantal bioassay, one may mention Müller and Schmitt (1988), Park and Park (2006), Dette et al. (2005) and Dette and Scheder (2010). A description of these methods may be found in the last two articles.
Remark 1.1. For the purpose of asymptotics, one may take the r groups in (1.6) to be disjoint, omitting from Group 1, and from Groups 2 through r − 1, and from Group r. As is shown in Bhattacharya and Kong (2007), outside a set Bn of negligible probability, . Given x ∈ (0, 1), if m, n are sufficiently large, and (see (2.2)), x belongs to the domain of , even if the curve is constructed with common points removed. Outside Bn, the curves so obtained would coincide, on their respective domains, with the curves constructed after adjoining the end points. On the other hand, for relatively small sample sizes one needs to construct with the groupings (1.6), so that each has domain [0, 1].
We now provide a summary of the rest of the article. The asymptotic theory of the NAM is derived in Section 2. Theorem 2.1 proves that the estimate of the dose-response curve has a MISE attaining the optimal rate $O(N^{-4/5})$ under the assumptions that $f = F'$ is strictly positive, $F''$ is bounded and $m = o(n^{3/2}/(\log n)^{5/2})$. Theorem 2.2 provides the same optimal MISE rate for the estimate of the quantile curve of interest, under the additional restriction . Theorem 2.3 shows that ζp is asymptotically Normal around with an asymptotic variance , under the same broad assumptions as in Theorem 2.1. However, for asymptotic Normality of around ζp, one needs the restriction $m = o(n^{2/3})$. For larger m, a bias correction of is thus called for. It will be shown in a companion paper (Bhattacharya and Lin (2010)), by extensive simulation and data analysis, that the method proposed here performs quite favorably in comparison with other leading nonparametric methods, including the new method due to Dette et al. (2005) and Dette and Scheder (2010).
2. Asymptotic Behavior
Let $\tilde p_i = r_i/n$ denote the sample proportion of responses to the dosage $x_i$ ($i = 1, \ldots, m$). For simplicity, we assume in this section that $n_i = n$ for all $i$ and that $x_{i+1} - x_i = 1/m$ for $i = 1, \ldots, m - 1$. Let $N = mn$ denote the total number of observations.
Theorem 2.1. Assume that the dose-response function F on [0, 1] is twice differentiable, f = F′ has a positive lower bound θ and that F″ is bounded.
(a) The mean integrated squared error (MISE) of $\hat F$ has the asymptotically optimal rate $O(N^{-4/5})$ as $N \to \infty$, if $r = O(1)$, .
(b) If $m/n^{1/4} \to \infty$, $m = o(n^{3/2}/(\log n)^{5/2})$, then also the MISE of $\hat F$ is $O(N^{-4/5})$, with a choice of $r$ satisfying .
Proof. (a) It follows from Bernstein's inequality, as in the proof of Theorem 1 in Bhattacharya and Kong (2007), that there exist appropriate positive constants c, c′ such that for n > 1,
(2.1) |
It follows that if
(2.2) |
then
(2.3) |
for some c″ > 0. Let Bn denote the union of the two sets within parentheses in (2.1) and (2.3). It is shown in Bhattacharya and Kong (2007), and simple to check using (2.1) to (2.3), that, on , .
Let x ∈ [xi, xi+1]. By linearity of on [xi, xi+1],
(2.4) |
Also, for some x* ∈ [xi, xi+1],
(2.5) |
where ε(x) = F′(x*) − F′(x**) for some x*, x** lying in [xi, xi+1], and M = sup{|F″(x)| : 0 ≤ x ≤ 1}. Thus, noting that F, are bounded by one,
(2.6) |
and
(2.7) |
(2.8) |
and, by subtracting (2.7) from (2.8) one gets
(2.9) |
Hence
(2.10) |
From (2.7) and (2.10) one obtains, on integration,
(2.11) |
If , then the MISE attains its optimal rate (noting that $mn = N$, or $n^{5/4} = O(N)$), .
(b) First observe that the r groups in (1.6) are essentially disjoint. Inclusion of (x1, ) and (xm, ) in each group ensures that (j = 1, . . . , r) is defined on all of [0, 1]. Note the strict inequality , on , since the assumption $m = o(n^{3/2}/(\log n)^{5/2})$ implies that (2.2) holds with m/r in place of m.
If one has $m/n^{1/4} \to \infty$, then using r essentially disjoint groups, and averaging, one has (see (2.7), (2.10))
(2.12) |
The optimal choice of r is given by the relation or, , yielding the optimal rate: .
We now turn to the estimation of the curve F−1.
Theorem 2.2. Assume the hypothesis of Theorem 2.1.
(a) If $m = O(n^{1/4})$ then, with r = 1, , one has .
(b) If $m/n^{1/4} \to \infty$, but , then , with .
Proof. (a) For $m = O(n^{1/4})$, one may consider r = 1 in (1.7). Then . Let $p \in [p_i, p_{i+1}]$, so that $x = F^{-1}(p) \in [x_i, x_{i+1}]$. Then, on ,
(2.13) |
First, consider, for an appropriate positive constant c1,
(2.14) |
say. Then on , F(x) and belong to . Using (2.13), the linearity of on and (2.8), and writing
(2.15) |
on , we get the following relation, noting that is bounded by 1:
(2.16) |
Here
(2.17) |
Note that, on , , say, , so that for all sufficiently large n.
The expectation of (2.16) equals
(2.18) |
Now Eεn,3 is the sum of the following:
(2.19) |
(2.20) |
(2.21) |
For the first relation in (2.21), use , and Eδn = 0. Hence the bias (of as an estimator of F−1(p)) is
(2.22) |
Subtracting (2.18) from (2.16), one obtains
(2.23) |
noting that for some constant $c'''$. The term $O_p(N^{-2})$ is bounded by $c^{iv} 1_{B_n}$, for some constant $c^{iv}$. Therefore,
(2.24) |
It is relatively simple to check that the contribution from , 1 ≤ i ≤ m−1, to is negligible compared with that from . It is useful, however, to show that for all p ∈ [0, 1], one has on , the relation
(2.25) |
where $\varepsilon_{n,4} = O_p(1/m)$. Indeed, on , for some $c^{v} > 0$. To establish (2.25), note first that if then, although (since $x \in [x_i, x_{i+1}]$), it may happen that F(x) belongs to or . On , there is no other possibility.
Now if , e.g., then, recalling that x = F−1(p),
(2.26) |
in view of the linearity of on both and , but with different slopes (given in curly brackets). But the second slope differs from the first by an amount $\varepsilon_{n,4}$ which is easily shown to be no more than $c^{v}/m$ on . The MISE of is then given by
(2.27) |
Once again, the optimal choice of m is , and then the MISE has the optimal rate
(2.28) |
(b) Next consider the case $m/n^{1/4} \to \infty$, i.e., $n = o(N^{4/5})$. Since , it is of larger order than $N^{-4/5}$, and hence the estimation is suboptimal. In this case, again consider r groups of essentially disjoint equidistant dosages. Then the average has bias and variance (see (2.22) and (2.24)) given by
(2.29) |
(2.30) |
Assume that m is not very large, i.e., . Then the optimal choice of r is , since the term $m/(rn)$ in (2.29) is not of larger order than $(r/m)^2$, and one equates the orders of $1/(rn)$ and $(r/m)^4$ to get the optimal r. This yields the optimal MISE of , namely, . □
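The balancing step described in the last paragraph of the proof can be written out as a short worked computation. This is a sketch that uses only the orders named in the text, namely $1/(rn)$, $(r/m)^4$, $m/(rn)$ and $(r/m)^2$; the symbol $\asymp$ (same order of magnitude) and the suppression of all constants are conventions of this sketch, and the displays (2.29) and (2.30) themselves are not reproduced.

```latex
% Orders only; all constants are suppressed. N = mn.
\begin{align*}
\frac{1}{rn} \asymp \Bigl(\frac{r}{m}\Bigr)^{4}
  &\iff r^{5} \asymp \frac{m^{4}}{n}
  \iff r \asymp m^{4/5}\, n^{-1/5}, \\[4pt]
\text{and then}\quad
\frac{1}{rn} &\asymp m^{-4/5}\, n^{-4/5} = (mn)^{-4/5} = N^{-4/5},
\qquad
\Bigl(\frac{r}{m}\Bigr)^{4} \asymp (mn)^{-4/5} = N^{-4/5}. \\[4pt]
\text{With this } r:\quad
\frac{m}{rn} &\asymp m^{1/5}\, n^{-4/5},
\qquad
\Bigl(\frac{r}{m}\Bigr)^{2} \asymp m^{-2/5}\, n^{-2/5}, \\[4pt]
\frac{m/(rn)}{(r/m)^{2}} &\asymp m^{3/5}\, n^{-2/5}
  = \Bigl(\frac{m}{n^{2/3}}\Bigr)^{3/5}.
\end{align*}
```

Under these conventions both balanced terms are of order $N^{-4/5}$, and the ratio in the last line stays bounded exactly when $m$ grows no faster than a constant multiple of $n^{2/3}$, which is one way to read the requirement that $m$ be "not very large".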
Finally, we arrive at the asymptotic distribution of .
Theorem 2.3. Let p ∈ (0, 1). In addition to the hypothesis of Theorem 2.1, assume $m/n^{1/4} \to \infty$. Then the following hold.
(a) With r = 1, , if $m < (2/(c\theta))\,(n/\log n)^{1/2}$, then
(2.31) |
where
(2.32) |
lies in [1/2,1], and x = F−1(p) = ζp.
(b) If $m = o(n^{3/2}/\log^{5/2} n)$, then with ,
(2.33) |
Here is the average of the r quantities , 1 ≤ j ≤ r, of the form (2.32), one for each subgroup with m/r dosages at a distance of r/m from each other.
(c) If $m = o(n^{2/3}/\log\log n)$, then with ,
(2.34) |
Proof. (a) It follows from (2.16) (and (2.25)) that for p ∈ [pi, pi+1) one has, outside Bn,
(2.35) |
Multiplying the two sides by , and noting that , the desired Normal approximation holds.
(b) By (2.23), one has, outside Bn,
(2.36) |
Using the analog of (2.36) for , one may apply Lyapunov's central limit theorem (see, e.g., Bhattacharya and Waymire (2007), p. 103) to the r summands ), 1 ≤ j ≤ r, and with m/r for m, to get the desired result. Note that the summands have zero means, variances bounded away from zero and infinity, and bounded third moments, since as $m = o(n^{3/2}/\log^{5/2} n)$, which also ensures that (see (2.2), (2.3)).
(c) One has (See (2.22))
(2.37) |
since , and .
Hence subtracting the bias from , (2.34) follows from (2.33).
Remark 2.1. Note that (2.33) implies that, with , the asymptotic variance of is $O(N^{-4/5})$.
Remark 2.2. Theorems 2.1-2.3 easily extend to the case of non-equal sample sizes ni, 1 ≤ i ≤ m, and non-equidistant dosages x1 < . . . < xm, provided (1) the ratio of min{ni : 1 ≤ i ≤ m} to max{ni : 1 ≤ i ≤ m} is bounded away from zero, and (2) the ratio of min{xi+1 − xi : 1 ≤ i ≤ m − 1} to max{xi+1 − xi : 1 ≤ i ≤ m − 1} is bounded away from zero.
Acknowledgments
The authors wish to thank Professor Walt Piegorsch for a careful reading of the paper and for helpful suggestions.
References
- 1. Ayer M, Brunk HD, Ewing GM, Reid WT, Silverman E. An empirical distribution function for sampling with incomplete information. Ann. Math. Statist. 1955;26:641–647.
- 2. Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression. Wiley; London: 1972.
- 3. Bhattacharya R, Kong M. Consistency and asymptotic normality of the estimated effective dose in bioassay. Journal of Statistical Planning and Inference. 2007;137:643–658.
- 4. Bhattacharya R, Lin L. Nonparametric benchmark analysis in risk assessment: a comparative study by simulation and data analysis. 2010. preprint.
- 5. Bhattacharya R, Waymire EC. A Basic Course in Probability Theory. Springer; New York: 2007.
- 6. Cran GW. AS149 amalgamation of means in the case of simple ordering. Appl. Statist. 1980;29(2):209–211.
- 7. Dette H, Neumeyer N, Pilz KF. A note on nonparametric estimation of the effective dose in quantal bioassay. J. Amer. Statist. Assoc. 2005;100:503–510.
- 8. Dette H, Scheder R. A finite sample comparison of nonparametric estimates of the effective dose in quantal bioassay. Journal of Statistical Computation and Simulation. 2010;80(5):527–544.
- 9. Müller HG, Schmitt T. Kernel and probit estimation in quantal bioassay. J. Amer. Statist. Assoc. 1988;83(403):750–759.
- 10. Park D, Park S. Parametric and nonparametric estimators of ED100α. Journal of Statistical Computation and Simulation. 2006;76(8):661–672.
- 11. Piegorsch WW, Bailer AJ. Analyzing Environmental Data. John Wiley & Sons; 2005.