Published in final edited form as: Stat Probab Lett. 2010 Dec 1;80(23-24):1947–1953. doi: 10.1016/j.spl.2010.08.024

An adaptive nonparametric method in benchmark analysis for bioassay and environmental studies

Rabi Bhattacharya*, Lizhen Lin
PMCID: PMC3027186  NIHMSID: NIHMS240739  PMID: 21278850

Abstract

We present a novel nonparametric method for bioassay and benchmark analysis in risk assessment, which averages isotonic MLEs based on disjoint subgroups of dosages. The asymptotic theory for the methodology is derived, showing that the MISEs (mean integrated squared errors) of the estimates of both the dose-response curve $F$ and its inverse $F^{-1}$ achieve the optimal rate $O(N^{-4/5})$. Also, we compute the asymptotic distribution of the estimate $\tilde\zeta_p$ of the effective dosage $\zeta_p = F^{-1}(p)$, which is shown to have an optimally small asymptotic variance.

Keywords: Monotone dose-response curve estimation, effective dosage, benchmark analysis, mean integrated square error, asymptotic normality

1. Introduction

The efficient estimation of effective dosage is an old but still very important problem in biology and medicine. In addition, concerns about the impact of pollutants in the environment have added a great sense of urgency to the development of good methods for the estimation of benchmarks in risk assessment (See, e.g., Piegorsch and Bailer (2005)). We present in this article the asymptotic theory of a new method. In a companion study based on extensive simulation and data analysis, to be presented elsewhere, it is shown that the method performs remarkably well even with small and moderate sample sizes (Bhattacharya and Lin (2010)).

Consider quantal dose-response experiments in bioassay, where the response of a subject to a drug or a chemical agent is measured on a binary scale, 1 for response and 0 for non-response. Given a dosage $x$ of the substance, let $F(x)$ be the probability of response. The function $x \mapsto F(x)$ is called the dose-response curve, and it is assumed to be monotone increasing. The effective dosage for a targeted response probability $p$ is defined as the $p$-th quantile $\zeta_p$, or $ED_p$,

$$\zeta_p = ED_p = F^{-1}(p), \quad 0 \le p \le 1; \qquad F^{-1}(p) := \inf\{x : F(x) \ge p\}. \tag{1.1}$$

For the data, suppose that $n_i$ subjects are given a dosage $x_i$ ($i = 1, \dots, m$), where $x_1 < \dots < x_m$, with the total number of observations $N = \sum_{i=1}^{m} n_i$. One may assume, without loss of generality, that $0 = x_1 < \dots < x_m = 1$. The number of responses observed at dosage $x_i$ is $r_i$ ($i = 1, \dots, m$). The likelihood function for the estimation of $F(x_i)$, $1 \le i \le m$, is

$$L(p_1, \dots, p_m) = \prod_{i=1}^{m} p_i^{r_i}(1 - p_i)^{n_i - r_i} \quad (0 \le p_1 \le \dots \le p_m \le 1); \qquad [p_i \equiv F(x_i)]. \tag{1.2}$$

The maximum likelihood estimator (MLE) of $(p_1, \dots, p_m)$, under the monotonicity constraint, is given in Ayer et al. (1955) by the following PAV, or pool-adjacent-violators, algorithm (see also Barlow et al. (1972), p. 73, and Cran (1980)):

$$\tilde p_i = \max_{1 \le u \le i}\ \min_{i \le v \le m}\ \frac{\sum_{j=u}^{v} r_j}{\sum_{j=u}^{v} n_j} \quad (1 \le i \le m). \tag{1.3}$$
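For concreteness, the max–min formula (1.3) can be computed in linear time by the pooling form of the PAV algorithm. The sketch below is ours, not the authors' code, and the function name is hypothetical; it assumes equal-length arrays of response counts and group sizes at increasingly sorted dosages:

```python
import numpy as np

def pav(resp, size):
    """Isotonic MLE (1.3) by pool-adjacent-violators.

    resp[i] = number of responses r_i, size[i] = number of subjects n_i
    at the i-th dosage (dosages assumed sorted increasingly)."""
    pools = []  # each pool: [sum of r_j, sum of n_j, number of dosages pooled]
    for ri, ni in zip(resp, size):
        pools.append([ri, ni, 1])
        # merge while the previous pooled proportion exceeds the current one
        while len(pools) > 1 and pools[-2][0] * pools[-1][1] > pools[-1][0] * pools[-2][1]:
            top = pools.pop()
            for k in range(3):
                pools[-1][k] += top[k]
    out = []
    for sr, sn, cnt in pools:
        out.extend([sr / sn] * cnt)
    return np.array(out)
```

For example, `pav([1, 0, 2, 3], [4, 4, 4, 4])` pools the violating first two dosages and returns `[0.125, 0.125, 0.5, 0.75]`.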

Bhattacharya and Kong (2007) proposed an estimate $\tilde F(x)$ of $F(x)$, the dose-response curve, by taking $\tilde F(x_i)$ to be $\tilde p_i$ and by linear interpolation in the interval $(x_i, x_{i+1})$:

$$\tilde F(x) = \begin{cases} \tilde p_i & \text{if } x = x_i, \\[4pt] \tilde p_i + \dfrac{\tilde p_{i+1} - \tilde p_i}{x_{i+1} - x_i}\,(x - x_i) & \text{if } x_i < x \le x_{i+1}. \end{cases}$$

$\tilde F$ is a continuous function whose inverse gives the estimate of $ED_p$:

$$\widetilde{ED}_p = \begin{cases} x_1 & \text{if } p \le \tilde p_1, \\[4pt] x_i + \dfrac{p - \tilde p_i}{\tilde p_{i+1} - \tilde p_i}\,(x_{i+1} - x_i) & \text{if } \tilde p_i < p \le \tilde p_{i+1} \text{ for some } i, \\[4pt] x_m & \text{if } p > \tilde p_m, \end{cases} \tag{1.4}$$

if $\tilde p_{i+1} > \tilde p_i$ and, more generally, by $\tilde F^{-1}(p) = \inf\{x : \tilde F(x) \ge p\}$.
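In code, both the interpolated curve and (1.4) reduce to one-dimensional linear interpolation. A minimal sketch (function names ours), assuming the isotonic fit $\tilde p$ is strictly increasing, which by the results below holds outside a set of negligible probability:

```python
import numpy as np

def F_tilde(x_eval, x, p_tilde):
    """Linearly interpolated dose-response estimate at x_eval."""
    return np.interp(x_eval, x, p_tilde)

def ED(p, x, p_tilde):
    """Estimated effective dosage (1.4); np.interp clamps to x_1 below
    p_tilde[0] and to x_m above p_tilde[-1], exactly as in (1.4)."""
    return np.interp(p, p_tilde, x)
```

For instance, with dosages `[0, 0.5, 1]` and fitted proportions `[0.1, 0.3, 0.7]`, `ED(0.5, x, p_tilde)` returns 0.75.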

From now on we will assume, for simplicity, that there are $m$ equidistant dosages and the same number $n$ of i.i.d. 0–1 valued observations at each dosage. Assume $n \to \infty$, $m \to \infty$ and

$$m = r\,s(n), \quad \text{with } r \ge 1,\ s(n) \text{ integers}, \tag{1.5}$$

where $r \asymp (m^4/n)^{1/5}$ in Theorems 2.1, 2.2 and 2.3 part (b), and $r \asymp (m^4/n)^{1/5}(\log\log n)^{-6/5}$ in Theorem 2.3, part (c). Here $f(m,n) \asymp g(m,n)$ means that the ratio of the two sides is bounded away from zero and infinity.

Let $\hat p_i$ denote the observed proportion of 1's at dosage $x_i$. Divide the observed proportions and dosages into $r$ groups, and consider the following application of the PAV algorithm to each of the $r$ groups of levels below:

$$\begin{aligned}
[\text{Group } 1]&: (x_1,\hat p_1),\ (x_{r+1},\hat p_{r+1}),\ (x_{2r+1},\hat p_{2r+1}),\ \dots,\ (x_m,\hat p_m);\\
[\text{Group } 2]&: (x_1,\hat p_1),\ (x_2,\hat p_2),\ (x_{r+2},\hat p_{r+2}),\ (x_{2r+2},\hat p_{2r+2}),\ \dots,\ (x_m,\hat p_m);\\
&\ \,\vdots\\
[\text{Group } r]&: (x_1,\hat p_1),\ (x_r,\hat p_r),\ (x_{2r},\hat p_{2r}),\ (x_{3r},\hat p_{3r}),\ \dots,\ (x_m,\hat p_m).
\end{aligned} \tag{1.6}$$

Note that Groups 2 through $r - 1$ each have $s(n) + 2$ levels, while Groups 1 and $r$ each have $s(n) + 1$ levels. Also, except for the smallest and the largest levels (with proportions $\hat p_1$ and $\hat p_m$), the sets of levels covered by the groups are disjoint. Together, they comprise all of the $m = r\,s(n)$ distinct dosages.
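The index pattern of (1.6) is easy to mechanize. A small helper (ours, with a hypothetical name), using 1-based dosage indices and assuming $m = r\,s(n)$:

```python
def nam_groups(m, r):
    """1-based dosage indices of the r subgroups in (1.6).

    The endpoints 1 and m are adjoined to every group so that each
    interpolated curve is defined on all of [0, 1]."""
    assert m % r == 0, "requires m = r * s(n)"
    return [sorted(set([1, m] + list(range(j, m + 1, r)))) for j in range(1, r + 1)]
```

For example, `nam_groups(6, 2)` gives `[[1, 3, 5, 6], [1, 2, 4, 6]]`; apart from the shared endpoints, the groups partition the $m$ dosages.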

By linear interpolation, each Group $j$ ($j = 1, \dots, r$) provides an estimate $\tilde F_j$ of the dose-response curve $F$ on $[0, 1]$, and an estimate $\tilde\zeta_{p,j}$ of $F^{-1}$. Note that while $F^{-1}$ is defined on $[F(0), F(1)]$, $\tilde\zeta_{p,j}$ and $\tilde\zeta_p$ below are defined on $[\tilde F(0), \tilde F(1)]$. Compute

$$\tilde F = \frac{1}{r}\sum_{1 \le j \le r}\tilde F_j, \qquad \tilde\zeta_p = \frac{1}{r}\sum_{1 \le j \le r}\tilde\zeta_{p,j}, \tag{1.7}$$

and choose the values of $r$ for which the bootstrap estimates of the MISEs of $\tilde F$ and $\tilde\zeta_p$ are the smallest. These we call the NAM estimates of $F$ and $F^{-1}$.
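Putting the pieces together, the averaged estimate (1.7) can be sketched as follows. This is our illustration, not the authors' code: `_pav` is a minimal equal-weight pool-adjacent-violators routine, and in practice $r$ would be chosen by minimizing a bootstrap estimate of the MISE, as described above:

```python
import numpy as np

def _pav(p):
    """Equal-weight pool-adjacent-violators on a vector of proportions."""
    pools = []  # each pool: [sum of proportions, count]
    for v in p:
        pools.append([v, 1])
        while len(pools) > 1 and pools[-2][0] * pools[-1][1] > pools[-1][0] * pools[-2][1]:
            s, c = pools.pop()
            pools[-1][0] += s
            pools[-1][1] += c
    return np.concatenate([[s / c] * c for s, c in pools])

def nam_curve(x, phat, r, grid):
    """Average over the r per-group interpolated isotonic fits, eq. (1.7).

    x: sorted dosage array; phat: observed proportions; grid: evaluation points."""
    m = len(x)
    idx_groups = [sorted(set([0, m - 1] + list(range(j, m, r)))) for j in range(r)]
    fits = [np.interp(grid, x[idx], _pav(phat[idx])) for idx in idx_groups]
    return np.mean(fits, axis=0)
```

With $r = 1$ this reduces to the single-group estimate of Bhattacharya and Kong (2007).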

Among kernel based nonparametric methods for quantal bioassay, one may mention Müller and Schmitt (1988), Park and Park (2006), Dette et al. (2005) and Dette and Scheder (2010). A description of these methods may be found in the last two articles.

Remark 1.1. For the purpose of asymptotics, one may take the $r$ groups in (1.6) to be disjoint, omitting $\hat p_m$ from Group 1, $\hat p_1$ and $\hat p_m$ from Groups 2 through $r - 1$, and $\hat p_1$ from Group $r$. As is shown in Bhattacharya and Kong (2007), outside a set $B_n$ of negligible probability, $\hat p_i < \hat p_{i+1}$ for all $i$. Given $x \in (0, 1)$, if $m, n$ are sufficiently large and $m/r = o((n/\log n)^{1/2})$ (see (2.2)), $x$ belongs to the domain of $\tilde F_j$ for all $j$, even if the curve $\tilde F_j$ is constructed with common points removed. Outside $B_n$, the curves so obtained would coincide, on their respective domains, with the curves constructed after adjoining the end points. On the other hand, for relatively small sample sizes one needs to construct $\tilde F_j$ with the groupings (1.6), so that each has domain $[0, 1]$.

We now provide a summary of the rest of the article. The asymptotic theory of the NAM is derived in Section 2. Theorem 2.1 proves that the estimate of the dose-response curve has a MISE attaining the optimal rate $O(N^{-4/5})$ under the assumptions that $f = F'$ is strictly positive, $F''$ is bounded and $m = o(n^{3/2}/(\log n)^{5/2})$. Theorem 2.2 provides the same optimal MISE rate for the estimate $p \mapsto \tilde\zeta_p$ of the quantile curve of interest, under the additional restriction $m = O(n^{2/3})$. Theorem 2.3 shows that $\tilde\zeta_p$ is asymptotically Normal around $E\tilde\zeta_p$ with an asymptotic variance $O(N^{-4/5}\log\log N)$, under the same broad assumptions as in Theorem 2.1. However, for asymptotic Normality of $\tilde\zeta_p$ around $\zeta_p$, one needs the restriction $m = o(n^{2/3})$. For larger $m$, a bias correction of $\tilde\zeta_p$ is thus called for. It will be shown in a companion paper (Bhattacharya and Lin (2010)), by extensive simulation and data analysis, that the method proposed here performs quite favorably in comparison with other leading nonparametric methods, including those due to Dette et al. (2005) and Dette and Scheder (2010).

2. Asymptotic Behavior

Let $\hat p_i = r_i/n_i$ denote the sample proportion of responses to the dosage $x_i$ ($i = 1, \dots, m$). For simplicity, we assume in this section that $n_i = n$ for all $i$ and that $x_{i+1} - x_i = 1/m$ for $i = 1, \dots, m - 1$. Let $N = mn$ denote the total number of observations.

Theorem 2.1. Assume that the dose-response function $F$ on $[0, 1]$ is twice differentiable, that $f = F'$ has a positive lower bound $\theta$, and that $F''$ is bounded.

(a) The mean integrated squared error (MISE) of $\tilde F$ has the asymptotically optimal rate $O(N^{-4/5})$ as $N \to \infty$, if $r = O(1)$ and $m \asymp n^{1/4}$.

(b) If $m/n^{1/4} \to \infty$ and $m = o(n^{3/2}/(\log n)^{5/2})$, then also the MISE of $\tilde F$ is $O(N^{-4/5})$, with a choice of $r$ satisfying $r \asymp (m^4/n)^{1/5}$.

Proof. (a) It follows from Bernstein's inequality, as in the proof of Theorem 1 in Bhattacharya and Kong (2007), that there exist appropriate positive constants c, c′ such that for n > 1,

$$P\Big(|\hat p_i - p_i| > c\sqrt{\tfrac{\log n}{n}}\ \text{for some } i,\ i = 1, \dots, m\Big) \le \frac{c'}{N^2}. \tag{2.1}$$

It follows that if

$$m < \Big(\frac{\theta}{2c}\Big)\sqrt{\frac{n}{\log n}}, \tag{2.2}$$

then

$$P(\hat p_i \ne \tilde p_i\ \text{for some } i,\ i = 1, \dots, m) \le \frac{c''}{N^2} \tag{2.3}$$

for some $c'' > 0$. Let $B_n$ denote the union of the two sets within parentheses in (2.1) and (2.3). It is shown in Bhattacharya and Kong (2007), and is simple to check using (2.1)–(2.3), that, on $B_n^c$, $\hat p_i < \hat p_{i+1}$ for all $i$.

Let x ∈ [xi, xi+1]. By linearity of F~ on [xi, xi+1],

$$\begin{aligned}
\tilde F(x) &= \Big(\frac{x_{i+1} - x}{x_{i+1} - x_i}\,\hat p_i + \frac{x - x_i}{x_{i+1} - x_i}\,\hat p_{i+1}\Big)\mathbf 1_{B_n^c} + \tilde F(x)\,\mathbf 1_{B_n}\\
&= \frac{x_{i+1} - x}{x_{i+1} - x_i}\,\hat p_i + \frac{x - x_i}{x_{i+1} - x_i}\,\hat p_{i+1} + \varepsilon_{n,1}\\
&= \hat p_i + \frac{x - x_i}{x_{i+1} - x_i}(\hat p_{i+1} - \hat p_i) + \varepsilon_{n,1} \qquad (|\varepsilon_{n,1}| \le 2\cdot\mathbf 1_{B_n} = O_p(N^{-2})).
\end{aligned} \tag{2.4}$$

Also, for some x* ∈ [xi, xi+1],

$$F(x) = F(x_i) + (x - x_i)F'(x^*) = p_i + (x - x_i)\Big[\frac{F(x_{i+1}) - F(x_i)}{x_{i+1} - x_i} + \varepsilon(x)\Big], \qquad |\varepsilon(x)| \le \frac{M}{m}, \tag{2.5}$$

where $\varepsilon(x) = F'(x^*) - F'(x^{**})$ for some $x^*, x^{**}$ lying in $[x_i, x_{i+1}]$, and $M = \sup\{|F''(x)| : 0 \le x \le 1\}$. Thus, noting that $F$, $\tilde F$ are bounded by one,

$$E\tilde F(x) = \frac{x_{i+1} - x}{x_{i+1} - x_i}\,p_i + \frac{x - x_i}{x_{i+1} - x_i}\,p_{i+1} + O(N^{-2}) = p_i + \frac{x - x_i}{x_{i+1} - x_i}(p_{i+1} - p_i) + O(N^{-2}), \tag{2.6}$$

and

$$E\tilde F(x) - F(x) = -(x - x_i)\,\varepsilon(x) + O(N^{-2}) = O\Big(\frac{1}{m^2}\Big). \tag{2.7}$$

From (2.4) and (2.5),

$$\tilde F(x) - F(x) = \hat p_i - p_i + \big[\hat p_{i+1} - \hat p_i - (p_{i+1} - p_i)\big]\frac{x - x_i}{x_{i+1} - x_i} - (x - x_i)\,\varepsilon(x) + \varepsilon_{n,1}, \tag{2.8}$$

and, by subtracting (2.7) from (2.8) one gets

$$\tilde F(x) - E\tilde F(x) = \hat p_i - p_i + \big[\hat p_{i+1} - \hat p_i - (p_{i+1} - p_i)\big]\frac{x - x_i}{x_{i+1} - x_i} + \varepsilon_{n,1} + O(N^{-2}) = \frac{x_{i+1} - x}{x_{i+1} - x_i}(\hat p_i - p_i) + \frac{x - x_i}{x_{i+1} - x_i}(\hat p_{i+1} - p_{i+1}) + \varepsilon_{n,1} + O(N^{-2}). \tag{2.9}$$

Hence

$$\mathrm{Var}(\tilde F(x)) = \Big(\frac{x_{i+1} - x}{x_{i+1} - x_i}\Big)^2\frac{p_i(1 - p_i)}{n} + \Big(\frac{x - x_i}{x_{i+1} - x_i}\Big)^2\frac{p_{i+1}(1 - p_{i+1})}{n} + O(N^{-2}) = O\Big(\frac{1}{n}\Big). \tag{2.10}$$

From (2.7) and (2.10) one obtains, on integration,

$$\mathrm{MISE}(\tilde F) = O\Big(\frac{1}{n}\Big) + O\Big(\frac{1}{m^4}\Big). \tag{2.11}$$

If $m \asymp n^{1/4}$, then the MISE attains its optimal rate (noting that $mn = N$, so that $n^{5/4} \asymp N$): $\mathrm{MISE}(\tilde F) = O(1/n) = O(N^{-4/5})$.

(b) First observe that the $r$ groups in (1.6) are essentially disjoint. Inclusion of $(x_1, \hat p_1)$ and $(x_m, \hat p_m)$ in each group ensures that $\tilde F_j$ ($j = 1, \dots, r$) is defined on all of $[0, 1]$. Note the strict inequality $\hat p_j < \hat p_{j+1}$ for all $j$ on $B_n^c$, since the assumption $m = o(n^{3/2}/(\log n)^{5/2})$ implies that (2.2) holds with $m/r$ in place of $m$.

If one has m/n1/4 → ∞, then using r essentially disjoint groups, and averaging, one has (See (2.7), (2.10))

$$\mathrm{MISE}(\tilde F) = \mathrm{MISE}\Big(\frac{1}{r}\sum_{1 \le j \le r}\tilde F_j\Big) = O\Big(\frac{1}{rn}\Big) + O\Big(\frac{1}{(m/r)^4}\Big). \tag{2.12}$$

The optimal choice of $r$ is given by the relation $(m/r)^{-4} \asymp (rn)^{-1}$, or $r \asymp (m^4/n)^{1/5}$, yielding the optimal rate $\mathrm{MISE}(\tilde F) = O((rn)^{-1}) = O(N^{-4/5})$.
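The balancing that determines $r$ is one line of algebra; writing it out (a restatement of the step above, with $N = mn$):

```latex
\frac{1}{(m/r)^{4}} \asymp \frac{1}{rn}
\;\Longleftrightarrow\; r^{5} \asymp \frac{m^{4}}{n}
\;\Longleftrightarrow\; r \asymp \Bigl(\frac{m^{4}}{n}\Bigr)^{1/5},
\qquad\text{whence}\qquad
rn \asymp m^{4/5} n^{4/5} = (mn)^{4/5} = N^{4/5}.
```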

We now turn to the estimation of the curve F−1.

Theorem 2.2. Assume the hypothesis of Theorem 2.1.

(a) If $m = O(n^{1/4})$ then, with $r = 1$ and $\tilde\zeta = \tilde F^{-1}$, one has $\mathrm{MISE}(\tilde\zeta) = O(N^{-4/5})$.

(b) If $m/n^{1/4} \to \infty$, but $m = O(n^{2/3})$, then $\mathrm{MISE}(\tilde\zeta) = O(N^{-4/5})$, with $r \asymp (m^4/n)^{1/5}$.

Proof. (a) For $m = O(n^{1/4})$, one may consider $r = 1$ in (1.7). Then $\tilde\zeta = \tilde F^{-1}$. Let $p \in [p_i, p_{i+1}]$, so that $x = F^{-1}(p) \in [x_i, x_{i+1}]$. Then, on $B_n^c$,

$$F^{-1}(p) - \tilde F^{-1}(p) = \tilde F^{-1}(\tilde F(x)) - \tilde F^{-1}(F(x)). \tag{2.13}$$

First, consider, for an appropriate positive constant c1,

$$p \in \Big[p_i + c_1\sqrt{\tfrac{\log n}{n}},\ p_{i+1} - c_1\sqrt{\tfrac{\log n}{n}}\Big] = D_{n,i}, \tag{2.14}$$

say. Then on $B_n^c$, $F(x)$ and $\tilde F(x)$ belong to $[\hat p_i, \hat p_{i+1}]$. Using (2.13), the linearity of $\tilde F^{-1}$ on $[\hat p_i, \hat p_{i+1}]$ and (2.8), and writing

$$\delta_n = \hat p_{i+1} - \hat p_i - (p_{i+1} - p_i), \qquad \frac{1}{\hat p_{i+1} - \hat p_i} = \frac{1}{p_{i+1} - p_i}\Big(1 - \frac{\delta_n}{\hat p_{i+1} - \hat p_i}\Big) \tag{2.15}$$

on $B_n^c$, we get the following relation, noting that $F^{-1}(p) - \tilde F^{-1}(p)$ is bounded by 1:

$$\begin{aligned}
F^{-1}(p) - \tilde F^{-1}(p) &= [\tilde F(x) - F(x)]\,\frac{x_{i+1} - x_i}{\hat p_{i+1} - \hat p_i}\,\mathbf 1_{B_n^c} + \varepsilon_{n,2} \qquad (|\varepsilon_{n,2}| \le \mathbf 1_{B_n} = O_p(N^{-2}))\\
&= \Big\{(\hat p_i - p_i)\frac{x_{i+1} - x_i}{\hat p_{i+1} - \hat p_i} + \frac{x - x_i}{\hat p_{i+1} - \hat p_i}\,\delta_n - \varepsilon(x)(x - x_i)\frac{x_{i+1} - x_i}{\hat p_{i+1} - \hat p_i}\Big\}\mathbf 1_{B_n^c} + \varepsilon_{n,2}\\
&= \Big\{(\hat p_i - p_i)\frac{x_{i+1} - x_i}{p_{i+1} - p_i}\Big(1 - \frac{\delta_n}{\hat p_{i+1} - \hat p_i}\Big) + \frac{(x - x_i)\,\delta_n}{p_{i+1} - p_i}\Big(1 - \frac{\delta_n}{\hat p_{i+1} - \hat p_i}\Big) - \varepsilon(x)\frac{(x - x_i)(x_{i+1} - x_i)}{p_{i+1} - p_i}\Big(1 - \frac{\delta_n}{\hat p_{i+1} - \hat p_i}\Big)\Big\}\mathbf 1_{B_n^c} + \varepsilon_{n,2}\\
&= \Big\{(\hat p_i - p_i)\frac{x_{i+1} - x_i}{p_{i+1} - p_i} + \frac{(x - x_i)\,\delta_n}{p_{i+1} - p_i} - \varepsilon(x)\frac{(x - x_i)(x_{i+1} - x_i)}{p_{i+1} - p_i}\Big\}\mathbf 1_{B_n^c} + \varepsilon_{n,3} + \varepsilon_{n,2}.
\end{aligned} \tag{2.16}$$

Here

$$\varepsilon_{n,3} = \Big\{-(\hat p_i - p_i)\frac{x_{i+1} - x_i}{p_{i+1} - p_i}\,\frac{\delta_n}{\hat p_{i+1} - \hat p_i} - \frac{x - x_i}{p_{i+1} - p_i}\,\frac{\delta_n^2}{\hat p_{i+1} - \hat p_i} + \varepsilon(x)\frac{(x - x_i)(x_{i+1} - x_i)}{p_{i+1} - p_i}\,\frac{\delta_n}{\hat p_{i+1} - \hat p_i}\Big\}\mathbf 1_{B_n^c}. \tag{2.17}$$

Note that, on $B_n^c$, $|\hat p_i - p_i| < c(\frac{\log n}{n})^{1/2} = \varepsilon_n$, say, for all $i$, so that $\hat p_{i+1} - \hat p_i \ge p_{i+1} - p_i - 2\varepsilon_n > \frac{\theta}{2m}$ for all sufficiently large $n$.

The expectation of (2.16) equals

$$F^{-1}(p) - E\tilde F^{-1}(p) = -\varepsilon(x)\frac{(x - x_i)(x_{i+1} - x_i)}{p_{i+1} - p_i} + E\varepsilon_{n,3} + O(N^{-2}). \tag{2.18}$$

Now $E\varepsilon_{n,3}$ is the sum of the following:

$$\begin{aligned}
-E\Big((\hat p_i - p_i)\frac{(x_{i+1} - x_i)\,\delta_n}{(p_{i+1} - p_i)(\hat p_{i+1} - \hat p_i)}\,\mathbf 1_{B_n^c}\Big) &= -E\Big((\hat p_i - p_i)\frac{(x_{i+1} - x_i)\,\delta_n}{(p_{i+1} - p_i)^2}\Big(1 - \frac{\delta_n}{\hat p_{i+1} - \hat p_i}\Big)\mathbf 1_{B_n^c}\Big)\\
&= \frac{p_i(1 - p_i)}{n}\,\frac{x_{i+1} - x_i}{(p_{i+1} - p_i)^2} + O\Big(\frac{m^2}{n^{3/2}}\Big) + O(N^{-2}) = O\Big(\frac{m}{n}\Big);
\end{aligned} \tag{2.19}$$
$$\begin{aligned}
-E\Big(\frac{(x - x_i)}{(p_{i+1} - p_i)}\,\frac{\delta_n^2}{\hat p_{i+1} - \hat p_i}\,\mathbf 1_{B_n^c}\Big) &= -\frac{x - x_i}{p_{i+1} - p_i}\,E\Big(\frac{\delta_n^2}{\hat p_{i+1} - \hat p_i}\,\mathbf 1_{B_n^c}\Big)\\
&= -\frac{x - x_i}{(p_{i+1} - p_i)^2}\,E\delta_n^2 + \frac{x - x_i}{(p_{i+1} - p_i)^2}\,E\Big(\frac{\delta_n^3}{\hat p_{i+1} - \hat p_i}\,\mathbf 1_{B_n^c}\Big) + O(N^{-2})\\
&= -\frac{x - x_i}{(p_{i+1} - p_i)^2}\cdot\frac{p_i(1 - p_i) + p_{i+1}(1 - p_{i+1})}{n} + O\Big(\frac{m^2}{n^{3/2}}\Big);
\end{aligned} \tag{2.20}$$
$$E\Big(\varepsilon(x)\frac{(x - x_i)(x_{i+1} - x_i)}{p_{i+1} - p_i}\,\frac{\delta_n}{\hat p_{i+1} - \hat p_i}\,\mathbf 1_{B_n^c}\Big) = -\varepsilon(x)\frac{(x - x_i)(x_{i+1} - x_i)}{p_{i+1} - p_i}\,E\Big(\frac{\delta_n^2}{(p_{i+1} - p_i)(\hat p_{i+1} - \hat p_i)}\,\mathbf 1_{B_n^c}\Big) + O(N^{-2}) = O\Big(\frac{1}{n}\Big). \tag{2.21}$$

For the first relation in (2.21), use $\frac{\delta_n}{\hat p_{i+1} - \hat p_i} = \frac{\delta_n}{p_{i+1} - p_i}\big(1 - \frac{\delta_n}{\hat p_{i+1} - \hat p_i}\big)$ and $E\delta_n = 0$. Hence the bias (of $\tilde F^{-1}(p)$ as an estimator of $F^{-1}(p)$) is

$$\mathrm{Bias}(\tilde F^{-1}(p)) = O\Big(\frac{m}{n}\Big) + O\Big(\frac{1}{m^2}\Big). \tag{2.22}$$

Subtracting (2.18) from (2.16), one obtains

$$E\tilde F^{-1}(p) - \tilde F^{-1}(p) = \Big\{(\hat p_i - p_i)\frac{x_{i+1} - x}{p_{i+1} - p_i} + (\hat p_{i+1} - p_{i+1})\frac{x - x_i}{p_{i+1} - p_i}\Big\}\mathbf 1_{B_n^c} + O_p\Big(\frac{m}{n}\Big) + O_p(N^{-2}), \tag{2.23}$$

noting that $E(\varepsilon_{n,3}^2) \le c'''\,\frac{m^2}{n^2}$ for some constant $c'''$. The term $O_p(N^{-2})$ is bounded by $c^{(iv)}\mathbf 1_{B_n}$, for some constant $c^{(iv)}$. Therefore,

$$\mathrm{Var}(\tilde F^{-1}(p)) = O\Big(\frac{1}{n}\Big) + O\Big(\frac{m^2}{n^2}\Big). \tag{2.24}$$

It is relatively simple to check that the contribution from $D_{n,i}^c$, $1 \le i \le m - 1$, to $\mathrm{MISE}(\tilde\zeta)$ is negligible compared with that from $D_n = \cup_{1 \le i \le m-1} D_{n,i}$. It is useful, however, to show that for all $p \in [0, 1]$ one has, on $B_n^c$, the relation

$$F^{-1}(p) - \tilde F^{-1}(p) = [\tilde F(x) - F(x)]\,\frac{x_{i+1} - x_i}{\hat p_{i+1} - \hat p_i}\,(1 + \varepsilon_{n,4}), \tag{2.25}$$

where $\varepsilon_{n,4} = O_p(1/m)$. Indeed, $|\varepsilon_{n,4}| \le \frac{c^{(v)}}{m}$ on $B_n^c$, for some $c^{(v)} > 0$. To establish (2.25), note first that if $p \in D_{n,i}^c \cap [p_i, p_{i+1}]$ then, although $\tilde F(x) \in [\hat p_i, \hat p_{i+1}]$ (since $x \in [x_i, x_{i+1}]$), it may happen that $F(x)$ belongs to $(\hat p_{i-1}, \hat p_i)$ or $(\hat p_{i+1}, \hat p_{i+2})$. On $B_n^c$, there is no other possibility.

Now if $F(x) \in (\hat p_{i-1}, \hat p_i)$, e.g., then, recalling that $x = F^{-1}(p)$,

$$\begin{aligned}
F^{-1}(p) - \tilde F^{-1}(p) &= \tilde F^{-1}(\tilde F(x)) - \tilde F^{-1}(F(x))\\
&= \tilde F^{-1}(\tilde F(x)) - \tilde F^{-1}(\tilde F(x_i)) + \tilde F^{-1}(\tilde F(x_i)) - \tilde F^{-1}(F(x))\\
&= (\tilde F(x) - \tilde F(x_i))\Big\{\frac{x_{i+1} - x_i}{\hat p_{i+1} - \hat p_i}\Big\} + (\tilde F(x_i) - F(x))\Big\{\frac{x_i - x_{i-1}}{\hat p_i - \hat p_{i-1}}\Big\}, \tag{2.26}
\end{aligned}$$

in view of the linearity of $\tilde F^{-1}$ on both $[\hat p_i, \hat p_{i+1}]$ and $[\hat p_{i-1}, \hat p_i]$, but with different slopes (given in curly brackets). But the second slope differs from the first by a relative amount $\varepsilon_{n,4}$, which is easily shown to be no more than $c^{(v)}/m$ on $B_n^c$. The MISE of $\tilde F^{-1}$ is then given by

$$\mathrm{MISE}(\tilde F^{-1}) = O\Big(\frac{m^2}{n^2}\Big) + O\Big(\frac{1}{m^4}\Big) + O\Big(\frac{1}{n}\Big). \tag{2.27}$$

Once again, the optimal choice of $m$ is $m \asymp n^{1/4}$, and then the MISE has the optimal rate

$$\mathrm{MISE}(\tilde F^{-1}) = O\Big(\frac{1}{n}\Big) = O(N^{-4/5}). \tag{2.28}$$

(b) Next consider the case $m/n^{1/4} \to \infty$, i.e., $n = o(N^{4/5})$. Since $\mathrm{MISE}(\tilde F^{-1}) = O(1/n)$, and $1/n$ is of larger order than $N^{-4/5}$, the estimation with $r = 1$ is suboptimal. In this case, again consider $r$ groups of essentially disjoint equidistant dosages. Then the average $\tilde\zeta_p = \frac{1}{r}\sum_{j=1}^{r}\tilde\zeta_{p,j}$ has bias and variance (see (2.22) and (2.24)) given by

$$\mathrm{Bias}(\tilde\zeta_p) = O\Big(\frac{m}{rn}\Big) + O\Big(\Big(\frac{r}{m}\Big)^2\Big), \tag{2.29}$$
$$\mathrm{Var}(\tilde\zeta_p) = O\Big(\frac{1}{rn}\Big) + O\Big(\frac{m^2}{r^3 n^2}\Big). \tag{2.30}$$

Assume that $m$ is not very large, i.e., $m = O(n^{2/3})$. Then the optimal choice of $r$ is $r \asymp (m^4/n)^{1/5}$, since the term $m/(rn)$ in (2.29) is then not of larger order than $(r/m)^2$, and one equates the orders of $1/(rn)$ and $(r/m)^4$ to get the optimal $r$. This yields the optimal MISE of $\tilde\zeta_p$, namely, $\mathrm{MISE}(\tilde\zeta_p) = O\big(\frac{1}{rn}\big) = O((mn)^{-4/5}) = O(N^{-4/5})$. □

Finally, we arrive at the asymptotic distribution of $\tilde\zeta_p$.

Theorem 2.3. Let $p \in (0, 1)$. In addition to the hypothesis of Theorem 2.1, assume $m/n^{1/4} \to \infty$. Then the following hold.

(a) With $r = 1$ and $\tilde\zeta_p = \tilde F^{-1}(p)$, if $m < (\theta/(2c))(n/\log n)^{1/2}$ (see (2.2)), then

$$\frac{\sqrt{n}\,(\tilde\zeta_p - \zeta_p)}{\delta(p)} \xrightarrow{\ \mathcal L\ } N\Big(0,\ \frac{p(1 - p)}{f^2(\zeta_p)}\Big), \tag{2.31}$$

where

$$\delta^2(p) = \sum_{i=1}^{m-1}\frac{(x_{i+1} - x)^2 + (x - x_i)^2}{(x_{i+1} - x_i)^2}\,\mathbf 1_{I_i}(p) \qquad (I_i = [p_i, p_{i+1})\ \text{for}\ 1 \le i \le m - 2,\ I_{m-1} = [p_{m-1}, p_m]), \tag{2.32}$$

lies in [1/2,1], and x = F−1(p) = ζp.

(b) If $m = o(n^{3/2}/(\log n)^{5/2})$, then with $r \asymp (m^4/n)^{1/5}$,

$$\frac{\sqrt{rn}\,(\tilde\zeta_p - E\tilde\zeta_p)}{\bar\delta(p)} \xrightarrow{\ \mathcal L\ } N\Big(0,\ \frac{p(1 - p)}{f^2(\zeta_p)}\Big). \tag{2.33}$$

Here $\bar\delta^2(p)$ is the average of the $r$ quantities $\delta_j^2(p)$, $1 \le j \le r$, of the form (2.32), one for each subgroup with $m/r$ dosages at a distance of $r/m$ from each other.

(c) If $m = o(n^{2/3}/\log\log n)$, then with $r \asymp (m^4/n)^{1/5}(\log\log n)^{-6/5}$,

$$\frac{\sqrt{rn}\,(\tilde\zeta_p - \zeta_p)}{\bar\delta(p)} \xrightarrow{\ \mathcal L\ } N\Big(0,\ \frac{p(1 - p)}{f^2(\zeta_p)}\Big). \tag{2.34}$$

Proof. (a) It follows from (2.16) (and (2.25)) that for p ∈ [pi, pi+1) one has, outside Bn,

$$\tilde F^{-1}(p) - F^{-1}(p) = -\Big\{(\hat p_i - p_i)\frac{x_{i+1} - x}{p_{i+1} - p_i} + (\hat p_{i+1} - p_{i+1})\frac{x - x_i}{p_{i+1} - p_i}\Big\} + O_p\Big(\frac{m}{n}\Big) + O\Big(\frac{1}{m^2}\Big). \tag{2.35}$$

Multiplying the two sides by $\sqrt n$, and noting that $\sqrt n\,\frac{m}{n} \to 0$ and $\frac{\sqrt n}{m^2} \to 0$, the desired Normal approximation holds.

(b) By (2.23), one has, outside Bn,

$$\tilde F^{-1}(p) - E\tilde F^{-1}(p) = -\Big\{(\hat p_i - p_i)\frac{x_{i+1} - x}{p_{i+1} - p_i} + (\hat p_{i+1} - p_{i+1})\frac{x - x_i}{p_{i+1} - p_i}\Big\} + O_p\Big(\frac{m}{n}\Big). \tag{2.36}$$

Using the analog of (2.36) for $\tilde F_j^{-1}(p) - E\tilde F_j^{-1}(p)$, one may apply Lyapunov's central limit theorem (see, e.g., Bhattacharya and Waymire (2007), p. 103) to the $r$ summands $\sqrt n\,(\tilde F_j^{-1}(p) - E\tilde F_j^{-1}(p))$, $1 \le j \le r$, with $m/r$ in place of $m$, to get the desired result. Note that the summands have zero means, variances bounded away from zero and infinity, and bounded third moments, since $\sqrt n\,\frac{m/r}{n} = \frac{m}{r\sqrt n} \asymp \frac{m^{1/5}n^{1/5}}{\sqrt n} = m^{1/5}n^{-3/10} \to 0$ as $m = o(n^{3/2}/(\log n)^{5/2})$, which also ensures that $m/r = o(\sqrt{n/\log n})$ (see (2.2), (2.3)).

(c) One has (See (2.22))

$$\mathrm{Bias}(\tilde\zeta_p) = O\Big(\frac{m}{rn}\Big) + O\Big(\frac{r^2}{m^2}\Big), \qquad \sqrt{rn}\ \mathrm{Bias}(\tilde\zeta_p) = O\Big(\frac{m}{\sqrt{rn}}\Big) + O\Big(\frac{r^{5/2}\sqrt n}{m^2}\Big) \to 0, \tag{2.37}$$

since $\frac{m}{\sqrt{rn}} \asymp \Big(\frac{m\,\log\log n}{n^{2/3}}\Big)^{3/5} \to 0$, and $\frac{r^{5/2}\sqrt n}{m^2} = O\Big(\frac{m^2}{\sqrt n\,(\log\log n)^3}\cdot\frac{\sqrt n}{m^2}\Big) = O\big((\log\log n)^{-3}\big) \to 0$.

Hence, the bias being negligible after scaling, (2.34) follows from (2.33). □

Remark 2.1. Note that (2.33) implies that, with $r \asymp (m^4/n)^{1/5}$, the asymptotic variance of $\tilde\zeta_p$ is $O(N^{-4/5})$.
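The weight $\delta^2(p)$ of (2.32) is straightforward to evaluate numerically. The sketch below is ours (not from the paper): it locates the cell $[p_i, p_{i+1})$ containing $p$ and, for illustration, takes $\zeta_p$ by linear interpolation of the true $(p_i, x_i)$ values:

```python
import numpy as np

def delta_sq(p, x, pvals):
    """delta^2(p) of (2.32) for p in [pvals[0], pvals[-1]]."""
    i = np.searchsorted(pvals, p, side="right") - 1
    i = min(max(i, 0), len(x) - 2)          # clamp p = p_m into I_{m-1}
    zeta = np.interp(p, pvals, x)           # zeta_p = F^{-1}(p), linearized
    h = x[i + 1] - x[i]
    return ((x[i + 1] - zeta) ** 2 + (zeta - x[i]) ** 2) / h ** 2
```

The value is 1 when $\zeta_p$ falls on a dosage and 1/2 at a midpoint, matching the bound $\delta^2(p) \in [1/2, 1]$ stated in Theorem 2.3.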

Remark 2.2. Theorems 2.1-2.3 easily extend to the case of non-equal sample sizes ni, 1 ≤ im, and non-equidistant dosages x1 < . . . < xm, provided (1) the ratio of min{ni : 1 ≤ im} to max{ni : 1 ≤ im} is bounded away from zero, and (2) the ratio of min{xi+1xi : 1 ≤ im − 1} to max{xi+1xi : 1 ≤ im − 1} is bounded away from zero.

Acknowledgments

The authors wish to thank Professor Walt Piegorsch for a careful reading of the paper and for helpful suggestions.

References

  • 1. Ayer M, Brunk HD, Ewing GM, Reid WT, Silverman E. An empirical distribution function for sampling with incomplete information. Ann. Math. Statist. 1955;26:641–647.
  • 2. Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression. Wiley; London: 1972.
  • 3. Bhattacharya R, Kong M. Consistency and asymptotic normality of the estimated effective dose in bioassay. Journal of Statistical Planning and Inference. 2007;137:643–658.
  • 4. Bhattacharya R, Lin L. Nonparametric benchmark analysis in risk assessment: a comparative study by simulation and data analysis. 2010. Preprint.
  • 5. Bhattacharya R, Waymire EC. A Basic Course in Probability Theory. Springer; New York: 2007.
  • 6. Cran GW. AS149 amalgamation of means in the case of simple ordering. Appl. Statist. 1980;29(2):209–211.
  • 7. Dette H, Neumeyer N, Pilz KF. A note on nonparametric estimation of the effective dose in quantal bioassay. J. Amer. Statist. Assoc. 2005;100:503–510.
  • 8. Dette H, Scheder R. A finite sample comparison of nonparametric estimates of the effective dose in quantal bioassay. Journal of Statistical Computation and Simulation. 2010;80(5):527–544.
  • 9. Müller HG, Schmitt T. Kernel and probit estimates in quantal bioassay. J. Amer. Statist. Assoc. 1988;83(403):750–759.
  • 10. Park D, Park S. Parametric and nonparametric estimators of ED100α. Journal of Statistical Computation and Simulation. 2006;76(8):661–672.
  • 11. Piegorsch WW, Bailer AJ. Analyzing Environmental Data. John Wiley & Sons; 2005.
