An adaptive nonparametric method in benchmark analysis for bioassay and environmental studies

Rabi Bhattacharya; Lizhen Lin

doi:10.1016/j.spl.2010.08.024

Author manuscript; available in PMC: 2011 Dec 1.

Published in final edited form as: Stat Probab Lett. 2010 Dec 1;80(23-24):1947–1953. doi: 10.1016/j.spl.2010.08.024


PMCID: PMC3027186 NIHMSID: NIHMS240739 PMID: 21278850

Abstract

We present a novel nonparametric method for bioassay and benchmark analysis in risk assessment, which averages isotonic MLEs based on disjoint subgroups of dosages. The asymptotic theory for the methodology is derived, showing that the mean integrated squared errors (MISEs) of the estimates of both the dose-response curve F and its inverse F⁻¹ achieve the optimal rate O(N^−4/5). We also compute the asymptotic distribution of the estimate ${\tilde{ζ}}_{p}$ of the effective dosage ζ_p = F⁻¹(p), which is shown to have an optimally small asymptotic variance.

Keywords: Monotone dose-response curve estimation, effective dosage, benchmark analysis, mean integrated square error, asymptotic normality

1. Introduction

The efficient estimation of effective dosage is an old but still very important problem in biology and medicine. In addition, concerns about the impact of pollutants in the environment have added a great sense of urgency to the development of good methods for the estimation of benchmarks in risk assessment (See, e.g., Piegorsch and Bailer (2005)). We present in this article the asymptotic theory of a new method. In a companion study based on extensive simulation and data analysis, to be presented elsewhere, it is shown that the method performs remarkably well even with small and moderate sample sizes (Bhattacharya and Lin (2010)).

Consider quantal dose-response experiments in bioassay where the response of a subject to a drug or a chemical agent is measured in a binary scale, 1 for response and 0 for non-response. Given a dosage x of the substance, let F(x) be the probability of response. The function $x \mapsto F (x)$ is called the dose-response curve, and it is assumed to be monotone increasing. The effective dosage for a targeted response (probability) p is defined as the ‘p-th quantile’ ζ_p or ED_p,

ζ_{p} = ED_{p} = F^{-1}(p), \quad 0 \leq p \leq 1; \qquad F^{-1}(p) := \inf\{x : F(x) \geq p\}.

(1.1)

For the data, suppose that n_i subjects are given a dosage x_i (i = 1, . . . , m), where x₁ < . . . < x_m, with the total number of observations $N = \sum_{1 \leq i \leq m} n_{i}$ . One may assume, without loss of generality, that 0 = x₁ < . . . < x_m = 1. The number of responses observed at dosage x_i is r_i (i = 1, . . . , m). The likelihood function for the estimation of F(x_i), 1 ≤ i ≤ m, is

L (p_{1}, \dots, p_{m}) = \prod_{i = 1}^{m} p_{i}^{r_{i}} {(1 - p_{i})}^{n_{i} - r_{i}} (0 \leq p_{1} \leq \dots \leq p_{m} \leq 1); [p_{i} ≔ F (x_{i})] .

(1.2)

The maximum likelihood estimator (MLE) of (p₁, . . . , p_m), under the monotonicity constraint, is given in Ayer et al. (1955) by the following max-min formula, which is computed by the PAV, or pool-adjacent-violators, algorithm (also see Barlow et al. (1972), p. 73, and Cran (1980)):

{\tilde{p}}_{i} = \max_{1 \leq u \leq i} \min_{i \leq v \leq m} \frac{\sum_{j = u}^{v} r_{j}}{\sum_{j = u}^{v} n_{j}} \quad (1 \leq i \leq m).

(1.3)
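The max-min formula in (1.3) is usually evaluated by pooling adjacent violators rather than by searching over all pairs (u, v). The following is a minimal Python sketch of that pooling step, not the authors' implementation. The function name `pav_binomial` and the example counts are hypothetical, and the doses are assumed to be sorted in increasing order.

```python
from typing import List, Sequence


def pav_binomial(r: Sequence[float], n: Sequence[float]) -> List[float]:
    """Nondecreasing MLE of response probabilities by pooling adjacent violators.

    r[i] is the number of responses and n[i] the number of subjects at the
    i-th dose (doses assumed sorted in increasing order).
    """
    # Each block holds [pooled responses, pooled subjects, number of doses pooled].
    blocks: List[list] = []
    for ri, ni in zip(r, n):
        blocks.append([float(ri), float(ni), 1])
        # Pool while the previous block's rate exceeds the last block's rate.
        # Cross-multiplying avoids a division: r1/n1 > r2/n2 <=> r1*n2 > r2*n1.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            r2, n2, k2 = blocks.pop()
            blocks[-1][0] += r2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    fitted: List[float] = []
    for rb, nb, kb in blocks:
        fitted.extend([rb / nb] * kb)  # every pooled dose gets the block rate
    return fitted


if __name__ == "__main__":
    # Hypothetical data: 5 doses with 10 subjects each. The raw proportions
    # 0.1, 0.3, 0.2, 0.5, 0.4 violate monotonicity at two adjacent pairs.
    print(pav_binomial([1, 3, 2, 5, 4], [10, 10, 10, 10, 10]))
```

In this example the second and third doses are pooled into one block, as are the fourth and fifth, so the fitted values are constant within each pooled pair.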

Bhattacharya and Kong (2007) proposed an estimate $\tilde{F} (x)$ of F(x), the dose-response curve, by taking $\tilde{F} (x)$ to be ${\tilde{p}}_{i}$ at x_i and by linear interpolation in the interval (x_i, x_{i+1}):

\tilde{F}(x) = \begin{cases} {\tilde{p}}_{i} & \text{if } x = x_{i}, \\ {\tilde{p}}_{i} + \frac{{\tilde{p}}_{i + 1} - {\tilde{p}}_{i}}{x_{i + 1} - x_{i}} (x - x_{i}) & \text{if } x_{i} < x \leq x_{i + 1}. \end{cases}

$\tilde{F}$ is a continuous function whose inverse is the estimate of ED_p as given by:

{\widetilde{ED}}_{p} = \begin{cases} x_{1} & \text{if } p \leq {\tilde{p}}_{1}, \\ x_{i} + \frac{p - {\tilde{p}}_{i}}{{\tilde{p}}_{i + 1} - {\tilde{p}}_{i}} (x_{i + 1} - x_{i}) & \text{if } {\tilde{p}}_{i} < p \leq {\tilde{p}}_{i + 1} \text{ for some } i, \\ x_{m} & \text{if } p > {\tilde{p}}_{m}, \end{cases}

(1.4)

if ${\tilde{p}}_{i + 1} > {\tilde{p}}_{i}$ and, more generally, by ${\tilde{F}}^{- 1} (p) = \inf \{x : \tilde{F} (x) \geq p\}$.
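The interpolated curve and its generalized inverse (1.4) can be sketched in a few lines of Python. This is an illustrative sketch only: the names `f_tilde` and `ed_tilde` and the example values are hypothetical, and the inputs are assumed to be increasing dosages `xs` together with nondecreasing fitted values `ps`.

```python
from typing import Sequence


def f_tilde(x: float, xs: Sequence[float], ps: Sequence[float]) -> float:
    """Piecewise-linear interpolant through the points (xs[i], ps[i])."""
    if x <= xs[0]:
        return ps[0]
    for i in range(len(xs) - 1):
        if xs[i] < x <= xs[i + 1]:
            slope = (ps[i + 1] - ps[i]) / (xs[i + 1] - xs[i])
            return ps[i] + slope * (x - xs[i])
    return ps[-1]  # x lies beyond the largest dosage


def ed_tilde(p: float, xs: Sequence[float], ps: Sequence[float]) -> float:
    """Generalized inverse inf{x : f_tilde(x) >= p}, following (1.4)."""
    if p <= ps[0]:
        return xs[0]
    for i in range(len(xs) - 1):
        if ps[i] < p <= ps[i + 1]:
            # On this branch ps[i + 1] > ps[i], so the division is safe.
            frac = (p - ps[i]) / (ps[i + 1] - ps[i])
            return xs[i] + frac * (xs[i + 1] - xs[i])
    return xs[-1]  # p exceeds every fitted value


if __name__ == "__main__":
    # Hypothetical fitted values with flat stretches, as PAV often produces.
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]
    ps = [0.1, 0.25, 0.25, 0.45, 0.45]
    print(f_tilde(0.125, xs, ps), ed_tilde(0.3, xs, ps))
```

Scanning for the first interval with `ps[i] < p <= ps[i + 1]` is what makes the inverse return the left end of a flat stretch, which matches the infimum in the general definition.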

From now on, we will assume, for simplicity, that there are m equidistant dosages and the same number n of i.i.d. 0 − 1 valued observations at each dosage. Assume n → ∞, m → ∞ and

m = r s (n), with r \geq 1, s (n) integers,

(1.5)

where $r ≍ {(m^{4} ∕ n)}^{1 ∕ 5}$ in Theorems 2.1, 2.2 and 2.3 part (b), and $r ≍ {(m^{4} ∕ n)}^{1 ∕ 5} ∕ {(log log n)}^{6 ∕ 5}$ in Theorem 2.3, part (c). Here $f (m, n) ≍ g (m, n)$ means that the ratio of the two sides is bounded away from zero and infinity.

Let ${\hat{p}}_{i}$ denote the observed proportion of 1's at dosage x_i. Divide the observed proportions and dosages into r groups, and consider the following application of the PAV algorithm to each of the r groups of levels below:

\begin{matrix} \text{[Group 1]}: & (x_{1}, {\hat{p}}_{1}), (x_{r + 1}, {\hat{p}}_{r + 1}), (x_{2 r + 1}, {\hat{p}}_{2 r + 1}), \dots, (x_{m}, {\hat{p}}_{m}); \\ \text{[Group 2]}: & (x_{1}, {\hat{p}}_{1}), (x_{2}, {\hat{p}}_{2}), (x_{r + 2}, {\hat{p}}_{r + 2}), (x_{2 r + 2}, {\hat{p}}_{2 r + 2}), \dots, (x_{m}, {\hat{p}}_{m}); \\ \dots & \dots \\ \text{[Group } r \text{]}: & (x_{1}, {\hat{p}}_{1}), (x_{r}, {\hat{p}}_{r}), (x_{2 r}, {\hat{p}}_{2 r}), (x_{3 r}, {\hat{p}}_{3 r}), \dots, (x_{m}, {\hat{p}}_{m}). \end{matrix}

(1.6)

Note that Groups 2 through r − 1 each have s(n) + 2 levels, while Groups 1 and r each have s(n) + 1 levels. Also, except for the smallest and the largest levels (with proportions ${\hat{p}}_{1}$ and ${\hat{p}}_{m}$), the sets of levels covered by the groups are disjoint. Together, they comprise all m = rs(n) distinct dosages.

By linear interpolation, each Group j (j = 1, . . . , r) provides an estimate ${\tilde{F}}_{j}$ of the dose-response curve F on [0, 1], and an estimate ${\tilde{ζ}}_{p, j}$ of F⁻¹. Note that while F⁻¹ is defined on [F(0), F(1)], ${\tilde{ζ}}_{p, j}$ and ${\tilde{ζ}}_{p}$ below are defined on $[\tilde{F} (0), \tilde{F} (1)]$ . Compute

\tilde{F} := (1 ∕ r) \sum_{1 \leq j \leq r} {\tilde{F}}_{j}, \qquad {\tilde{ζ}}_{p} := (1 ∕ r) \sum_{1 \leq j \leq r} {\tilde{ζ}}_{p, j},

(1.7)

and choose the value of r for which the bootstrap estimates of the MISEs of $\tilde{F}$ and $\tilde{ζ}$ are smallest. We call the resulting estimates the NAM estimates of F and F⁻¹.
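The grouping (1.6) and the averaging (1.7) can be sketched as follows for a fixed r. This is a hedged illustration under the equal-sample-size assumption, not the authors' code: all function names are hypothetical, indices are 0-based, and the bootstrap selection of r is not implemented because its details are not given here.

```python
from typing import List, Sequence, Tuple


def group_indices(m: int, r: int) -> List[List[int]]:
    """0-based dose indices of the r groups in (1.6); requires m = r * s."""
    if m % r != 0:
        raise ValueError("m must be a multiple of r")
    groups = []
    for j in range(r):
        idx = set(range(j, m, r))  # every r-th dosage, starting at x_{j+1}
        idx.update((0, m - 1))     # adjoin the end points x_1 and x_m
        groups.append(sorted(idx))
    return groups


def pav(y: Sequence[float], w: Sequence[float]) -> List[float]:
    """Weighted isotonic regression by pooling adjacent violators."""
    blocks: List[list] = []  # [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            y2, w2, k2 = blocks.pop()
            y1, w1, k1 = blocks[-1]
            blocks[-1] = [(y1 * w1 + y2 * w2) / (w1 + w2), w1 + w2, k1 + k2]
    return [b[0] for b in blocks for _ in range(b[2])]


def interp(x: float, xs: Sequence[float], ps: Sequence[float]) -> float:
    """Piecewise-linear interpolant through (xs[i], ps[i])."""
    for i in range(len(xs) - 1):
        if xs[i] < x <= xs[i + 1]:
            return ps[i] + (ps[i + 1] - ps[i]) * (x - xs[i]) / (xs[i + 1] - xs[i])
    return ps[0] if x <= xs[0] else ps[-1]


def inverse(p: float, xs: Sequence[float], ps: Sequence[float]) -> float:
    """Generalized inverse inf{x : interp(x) >= p}, as in (1.4)."""
    if p <= ps[0]:
        return xs[0]
    for i in range(len(xs) - 1):
        if ps[i] < p <= ps[i + 1]:
            return xs[i] + (p - ps[i]) * (xs[i + 1] - xs[i]) / (ps[i + 1] - ps[i])
    return xs[-1]


def nam_estimates(x: float, p: float, xs: Sequence[float],
                  p_hat: Sequence[float], n: int, r: int) -> Tuple[float, float]:
    """Averages over the r groups of the curve at x and the quantile at p, as in (1.7)."""
    f_sum = z_sum = 0.0
    for idx in group_indices(len(xs), r):
        gx = [xs[i] for i in idx]
        gp = pav([p_hat[i] for i in idx], [n] * len(idx))  # PAV within the group
        f_sum += interp(x, gx, gp)
        z_sum += inverse(p, gx, gp)
    return f_sum / r, z_sum / r


if __name__ == "__main__":
    xs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]      # hypothetical dosages, m = 6
    p_hat = [0.0, 0.3, 0.2, 0.6, 0.8, 1.0]   # hypothetical observed proportions
    print(group_indices(6, 3))
    print(nam_estimates(0.5, 0.5, xs, p_hat, n=10, r=2))
```

With m = 6 and r = 3 the sketch reproduces the level counts noted after (1.6): the first and last groups have s + 1 = 3 levels and the middle group has s + 2 = 4.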

Among kernel based nonparametric methods for quantal bioassay, one may mention Müller and Schmitt (1988), Park and Park (2006), Dette et al. (2005) and Dette and Scheder (2010). A description of these methods may be found in the last two articles.

Remark 1.1. For the purpose of asymptotics, one may take the r groups in (1.6) to be disjoint, omitting ${\hat{p}}_{m}$ from Group 1, ${\hat{p}}_{1}$ and ${\hat{p}}_{m}$ from Groups 2 through r − 1, and ${\hat{p}}_{1}$ from Group r. As is shown in Bhattacharya and Kong (2007), outside a set B_n of negligible probability, ${\hat{p}}_{i} < {\hat{p}}_{i + 1} \forall i$ . Given x ∈ (0, 1), if m, n are sufficiently large, and $m ∕ r = o (\sqrt{n ∕ log n})$ (see (2.2)), x belongs to the domain of ${\tilde{F}}_{j} \forall j$ , even if the curve ${\tilde{F}}_{j}$ is constructed with common points removed. Outside B_n, the curves so obtained would coincide, on their respective domains, with the curves constructed after adjoining the end points. On the other hand, for relatively small sample sizes one needs to construct ${\tilde{F}}_{j}$ with the groupings (1.6), so that each has domain [0, 1].

We now provide a summary of the rest of the article. The asymptotic theory of the NAM is derived in Section 2. Theorem 2.1 proves that the estimate of the dose-response curve has a MISE attaining the optimal rate O(N^−4/5) under the assumptions that f = F′ is strictly positive, F″ is bounded and m = o(n^3/2/(log n)^5/2). Theorem 2.2 provides the same optimal MISE rate for the estimate $p \to {\tilde{ζ}}_{p}$ of the quantile curve of interest, under the additional restriction $m ∕ n^{2 ∕ 3} ↛ \infty$. Theorem 2.3 shows that ${\tilde{ζ}}_{p}$ is asymptotically Normal around $E {\tilde{ζ}}_{p}$ with an asymptotic variance $O (N^{- 4 ∕ 5} \sqrt{log log N})$, under the same broad assumptions as in Theorem 2.1. However, for asymptotic Normality of ${\tilde{ζ}}_{p}$ around ζ_p, one needs the restriction m = o(n^2/3). For larger m, a bias correction of ${\tilde{ζ}}_{p}$ is thus called for. It will be shown in a companion paper (Bhattacharya and Lin (2010)), by extensive simulation and data analysis, that the method proposed here performs quite favorably in comparison with other leading nonparametric methods, including the new method due to Dette et al. (2005) and Dette and Scheder (2010).

2. Asymptotic Behavior

Let ${\hat{p}}_{i} = r_{i} ∕ n_{i}$ denote the sample proportion of responses to the dosage x_i (i = 1, . . . , m). For simplicity, we assume in this section that n_i = n for all i and that x_{i+1} − x_i = 1/m for i = 1, . . . , m − 1. Let N = mn denote the total number of observations.

Theorem 2.1. Assume that the dose-response function F on [0, 1] is twice differentiable, f = F′ has a positive lower bound θ and that F″ is bounded.

(a) The mean integrated squared error (MISE) of $\tilde{F}$ has the asymptotically optimal rate O(N^−4/5) as N → ∞, if r = O(1), $m ≍ n^{1 ∕ 4}$ .

(b) If m/n^1/4 → ∞, m = o(n^3/2/(log n)^5/2), then also the MISE of $\tilde{F}$ is O(N^−4/5), with a choice of r satisfying $r ≍ {(m^{4} ∕ n)}^{1 ∕ 5}$ .

Proof. (a) It follows from Bernstein's inequality, as in the proof of Theorem 1 in Bhattacharya and Kong (2007), that there exist appropriate positive constants c, c′ such that for n > 1,

P (∣ {\hat{p}}_{i} - p_{i} ∣ > c \sqrt{log n ∕ n} for some i, i = 1, \dots, m) \leq c^{'} N^{- 2} .

(2.1)

It follows that if

m < (θ ∕ 2 c) \sqrt{n ∕ log n},

(2.2)

then

P ({\hat{p}}_{i} \neq {\tilde{p}}_{i} for some i, i = 1, \dots, m) \leq c^{″} N^{- 2}

(2.3)

for some c″ > 0. Let B_n denote the union of the two sets within parentheses in (2.1) and (2.3). It is shown in Bhattacharya and Kong (2007), and simple to check using (2.1) to (2.3), that, on $B_{n}^{c}$ , ${\hat{p}}_{i} < {\hat{p}}_{i + 1} \forall i$ .

Let x ∈ [x_i, x_{i+1}]. By linearity of $\tilde{F}$ on [x_i, x_{i+1}],

\begin{aligned} \tilde{F}(x) &= \left(\frac{x_{i + 1} - x}{x_{i + 1} - x_{i}} {\hat{p}}_{i} + \frac{x - x_{i}}{x_{i + 1} - x_{i}} {\hat{p}}_{i + 1}\right) 1_{B_{n}^{c}} + \tilde{F}(x) 1_{B_{n}} \\ &= \frac{x_{i + 1} - x}{x_{i + 1} - x_{i}} {\hat{p}}_{i} + \frac{x - x_{i}}{x_{i + 1} - x_{i}} {\hat{p}}_{i + 1} + ε_{n, 1} \\ &= {\hat{p}}_{i} + \frac{x - x_{i}}{x_{i + 1} - x_{i}} ({\hat{p}}_{i + 1} - {\hat{p}}_{i}) + ε_{n, 1} \quad (|ε_{n, 1}| \leq 2 \cdot 1_{B_{n}} = O_{p}(N^{- 2})). \end{aligned}

(2.4)

Also, for some x* ∈ [x_i, x_{i+1}],

F(x) = F(x_{i}) + (x - x_{i}) F'(x^{*}) = p_{i} + (x - x_{i}) \left[\frac{F(x_{i + 1}) - F(x_{i})}{x_{i + 1} - x_{i}} + ε(x)\right], \quad |ε(x)| \leq M ∕ m,

(2.5)

where ε(x) = F′(x*) − F′(x**) for some x*, x** lying in [x_i, x_{i+1}], and M = sup{|F″(x)| : 0 ≤ x ≤ 1}. Thus, noting that F, $\tilde{F}$ are bounded by one,

E \tilde{F} (x) = \frac{x_{i + 1} - x}{x_{i + 1} - x_{i}} p_{i} + \frac{x - x_{i}}{x_{i + 1} - x_{i}} p_{i + 1} + O (N^{- 2}) = p_{i} + \frac{x - x_{i}}{x_{i + 1} - x_{i}} (p_{i + 1} - p_{i}) + O (N^{- 2}),

(2.6)

and

E \tilde{F} (x) - F (x) = - (x - x_{i}) ε (x) + O (N^{- 2}) = O (1 ∕ m^{2}) .

(2.7)

From (2.4) and (2.5),

\tilde{F}(x) - F(x) = {\hat{p}}_{i} - p_{i} + [{\hat{p}}_{i + 1} - {\hat{p}}_{i} - (p_{i + 1} - p_{i})] \frac{x - x_{i}}{x_{i + 1} - x_{i}} - (x - x_{i}) ε(x) + ε_{n, 1},

(2.8)

and, by subtracting (2.7) from (2.8) one gets

\begin{aligned} \tilde{F}(x) - E \tilde{F}(x) &= {\hat{p}}_{i} - p_{i} + [{\hat{p}}_{i + 1} - {\hat{p}}_{i} - (p_{i + 1} - p_{i})] \frac{x - x_{i}}{x_{i + 1} - x_{i}} + ε_{n, 1} \\ &= \frac{x_{i + 1} - x}{x_{i + 1} - x_{i}} ({\hat{p}}_{i} - p_{i}) + \frac{x - x_{i}}{x_{i + 1} - x_{i}} ({\hat{p}}_{i + 1} - p_{i + 1}) + ε_{n, 1}. \end{aligned}

(2.9)

Hence

\mathrm{Var}(\tilde{F}(x)) = {\left(\frac{x_{i + 1} - x}{x_{i + 1} - x_{i}}\right)}^{2} \frac{p_{i} (1 - p_{i})}{n} + {\left(\frac{x - x_{i}}{x_{i + 1} - x_{i}}\right)}^{2} \frac{p_{i + 1} (1 - p_{i + 1})}{n} + O(N^{- 2}) = O(1 ∕ n).

(2.10)

From (2.7) and (2.10) one obtains, on integration,

\mathrm{MISE}(\tilde{F}) = O(1 ∕ n) + O(1 ∕ m^{4}).

(2.11)

If $m ≍ n^{1 ∕ 4}$, then the MISE attains its optimal rate (noting that $N = mn ≍ n^{5 ∕ 4}$): $\mathrm{MISE}(\tilde{F}) = O(1 ∕ n) = O(N^{- 4 ∕ 5})$.

(b) First observe that the r groups in (1.6) are essentially disjoint. Inclusion of (x₁, ${\hat{p}}_{1}$) and (x_m, ${\hat{p}}_{m}$) in each group ensures that ${\tilde{F}}_{j}$ (j = 1, . . . , r) is defined on all of [0, 1]. Note the strict inequality ${\hat{p}}_{j} < {\hat{p}}_{j + 1}$, $\forall j$, on $B_{n}^{c}$, since the assumption m = o(n^3/2/(log n)^5/2) implies that (2.2) holds with m/r in place of m.

If one has m/n^1/4 → ∞, then using r essentially disjoint groups, and averaging, one has (See (2.7), (2.10))

\mathrm{MISE}(\tilde{F}) = \mathrm{MISE}\left(\frac{1}{r} \sum_{1 \leq j \leq r} {\tilde{F}}_{j}\right) = O(1 ∕ (r n)) + O(1 ∕ {(m ∕ r)}^{4}).

(2.12)

The optimal choice of r is given by the relation ${(m ∕ r)}^{4} ≍ r n$ or, equivalently, $r ≍ {(m^{4} ∕ n)}^{1 ∕ 5}$, yielding the optimal rate: $\mathrm{MISE}(\tilde{F}) = O({(r n)}^{- 1}) = O(N^{- 4 ∕ 5})$.
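The rate arithmetic behind this choice of r can be written out explicitly. The worked step below only substitutes $r ≍ (m^{4}/n)^{1/5}$ into the two terms of (2.12); the numerical values in the comment are hypothetical and chosen so that every power is an integer.

```latex
% Substituting r \asymp (m^4/n)^{1/5} into the two terms of (2.12), with N = mn:
r n \asymp m^{4/5} n^{-1/5} \cdot n = (mn)^{4/5} = N^{4/5},
\qquad
(m/r)^{4} \asymp \left(m^{1/5} n^{1/5}\right)^{4} = N^{4/5}.
% Hypothetical illustration: m = 32, n = 32 gives m^4/n = 2^{15}, so r = 8,
% and then r n = 256 = (m/r)^4 = N^{4/5} with N = 1024.
```

Both the variance term 1/(rn) and the squared-bias term (r/m)⁴ are therefore of order N^{−4/5}, which is the balance the relation (m/r)⁴ ≍ rn expresses.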

We now turn to the estimation of the curve F⁻¹.

Theorem 2.2. Assume the hypothesis of Theorem 2.1.

(a) If m = O(n^1/4) then, with r = 1, $\tilde{ζ} = {\tilde{F}}^{- 1}$, one has $\mathrm{MISE}(\tilde{ζ}) = O(N^{- 4 ∕ 5})$.

(b) If m/n^1/4 → ∞, but $m ∕ n^{2 ∕ 3} ↛ \infty$, then $\mathrm{MISE}(\tilde{ζ}) = O(N^{- 4 ∕ 5})$, with $r ≍ {(m^{4} ∕ n)}^{1 ∕ 5}$.

Proof. (a) For m = O(n^1/4), one may consider r = 1 in (1.7). Then $\tilde{ζ} = {\tilde{F}}^{- 1}$. Let p ∈ [p_i, p_{i+1}], so that x = F⁻¹(p) ∈ [x_i, x_{i+1}]. Then, on $B_{n}^{c}$,

F^{- 1}(p) - {\tilde{F}}^{- 1}(p) = {\tilde{F}}^{- 1}(\tilde{F}(x)) - {\tilde{F}}^{- 1}(F(x)).

(2.13)

First, consider, for an appropriate positive constant c₁,

p \in [p_{i} + c_{1} \sqrt{log (n) ∕ n}, p_{i + 1} - c_{1} \sqrt{log (n) ∕ n}] = D_{n, i},

(2.14)

say. Then on $B_{n}^{c}$ , F(x) and $\tilde{F} (x)$ belong to $[{\hat{p}}_{i}, {\hat{p}}_{i + 1}]$ . Using (2.13), the linearity of ${\tilde{F}}^{- 1}$ on $[{\hat{p}}_{i}, {\hat{p}}_{i + 1}]$ and (2.8), and writing

δ_{n} = {\hat{p}}_{i + 1} - {\hat{p}}_{i} - (p_{i + 1} - p_{i}), \frac{1}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}} = \frac{1}{p_{i + 1} - p_{i}} (1 - \frac{δ_{n}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}})

(2.15)

on $B_{n}^{c}$ , we get the following relation, noting that $F^{- 1} (p) - {\tilde{F}}^{- 1} (p)$ is bounded by 1:

\begin{aligned}
F^{- 1}(p) - {\tilde{F}}^{- 1}(p) &= [\tilde{F}(x) - F(x)] \frac{x_{i + 1} - x_{i}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}} 1_{B_{n}^{c}} + ε_{n, 2} \quad (|ε_{n, 2}| \leq 1_{B_{n}} = O_{p}(N^{- 2})) \\
&= \left\{({\hat{p}}_{i} - p_{i}) \frac{x_{i + 1} - x_{i}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}} + \frac{x - x_{i}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}} δ_{n} - ε(x) (x - x_{i}) \frac{x_{i + 1} - x_{i}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}}\right\} 1_{B_{n}^{c}} + ε_{n, 2} \\
&= \left\{({\hat{p}}_{i} - p_{i}) \frac{x_{i + 1} - x_{i}}{p_{i + 1} - p_{i}} \left(1 - \frac{δ_{n}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}}\right) + \frac{(x - x_{i}) δ_{n}}{p_{i + 1} - p_{i}} \left(1 - \frac{δ_{n}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}}\right) - \frac{ε(x) (x - x_{i}) (x_{i + 1} - x_{i})}{p_{i + 1} - p_{i}} \left(1 - \frac{δ_{n}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}}\right)\right\} 1_{B_{n}^{c}} + ε_{n, 2} \\
&= \left\{({\hat{p}}_{i} - p_{i}) \frac{x_{i + 1} - x_{i}}{p_{i + 1} - p_{i}} + \frac{(x - x_{i}) δ_{n}}{p_{i + 1} - p_{i}} - \frac{ε(x) (x - x_{i}) (x_{i + 1} - x_{i})}{p_{i + 1} - p_{i}}\right\} 1_{B_{n}^{c}} + ε_{n, 3} + ε_{n, 2}.
\end{aligned}

(2.16)

Here

ε_{n, 3} = - \left\{({\hat{p}}_{i} - p_{i}) \left(\frac{x_{i + 1} - x_{i}}{p_{i + 1} - p_{i}}\right) \frac{δ_{n}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}} + \left(\frac{x - x_{i}}{p_{i + 1} - p_{i}}\right) \frac{δ_{n}^{2}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}} + \frac{ε(x) (x - x_{i}) (x_{i + 1} - x_{i})}{p_{i + 1} - p_{i}} \left(\frac{- δ_{n}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}}\right)\right\} 1_{B_{n}^{c}}.

(2.17)

Note that, on $B_{n}^{c}$ , $∣ {\hat{p}}_{i} - p_{i} ∣ < c {(log n ∕ n)}^{1 ∕ 2} = ε_{n}$ , say, $\forall i$ , so that ${\hat{p}}_{i + 1} - {\hat{p}}_{i} \geq p_{i + 1} - p_{i} - 2 ε_{n} > θ ∕ 2 m$ for all sufficiently large n.

The expectation of (2.16) equals

F^{- 1} (p) - E {\tilde{F}}^{- 1} (p) = - \frac{ε (x) (x - x_{i}) (x_{i + 1} - x_{i})}{p_{i + 1} - p_{i}} + E ε_{n, 3} + O (N^{- 2}) .

(2.18)

Now Eε_n,3 is the sum of the following:

\begin{aligned}
- E\left(\frac{({\hat{p}}_{i} - p_{i}) (x_{i + 1} - x_{i}) δ_{n}}{(p_{i + 1} - p_{i}) ({\hat{p}}_{i + 1} - {\hat{p}}_{i})} 1_{B_{n}^{c}}\right) &= - E\left(({\hat{p}}_{i} - p_{i}) \frac{(x_{i + 1} - x_{i}) δ_{n}}{{(p_{i + 1} - p_{i})}^{2}} \left(1 - \frac{δ_{n}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}}\right) 1_{B_{n}^{c}}\right) \\
&= \frac{p_{i} (1 - p_{i})}{n} \frac{x_{i + 1} - x_{i}}{{(p_{i + 1} - p_{i})}^{2}} + O(m^{2} ∕ n^{3 ∕ 2}) + O(N^{- 2}) \\
&= O(m ∕ n);
\end{aligned}

(2.19)

\begin{aligned}
- E\left(\frac{x - x_{i}}{p_{i + 1} - p_{i}} \frac{δ_{n}^{2}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}} 1_{B_{n}^{c}}\right) &= - \frac{x - x_{i}}{p_{i + 1} - p_{i}} E\left(\frac{δ_{n}^{2}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}} 1_{B_{n}^{c}}\right) \\
&= - \frac{x - x_{i}}{{(p_{i + 1} - p_{i})}^{2}} E δ_{n}^{2} + \frac{x - x_{i}}{{(p_{i + 1} - p_{i})}^{2}} E\left(\frac{δ_{n}^{3}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}} 1_{B_{n}^{c}}\right) + O(N^{- 2}) \\
&= - \frac{x - x_{i}}{{(p_{i + 1} - p_{i})}^{2}} \frac{p_{i} (1 - p_{i}) + p_{i + 1} (1 - p_{i + 1})}{n} + O(m^{2} ∕ n^{3 ∕ 2});
\end{aligned}

(2.20)

E\left(\frac{ε(x) (x - x_{i}) (x_{i + 1} - x_{i})}{p_{i + 1} - p_{i}} \left(\frac{δ_{n}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}}\right) 1_{B_{n}^{c}}\right) = \frac{ε(x) (x - x_{i}) (x_{i + 1} - x_{i})}{p_{i + 1} - p_{i}} E\left(\frac{δ_{n}^{2}}{(p_{i + 1} - p_{i}) ({\hat{p}}_{i + 1} - {\hat{p}}_{i})} 1_{B_{n}^{c}}\right) = O(1 ∕ n).

(2.21)

For the first relation in (2.21), use $\frac{δ_{n}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}} = δ_{n} (1 - \frac{δ_{n}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}}) ∕ (p_{i + 1} - p_{i})$ , and Eδ_n = 0. Hence the bias (of ${\tilde{F}}^{- 1} (p)$ as an estimator of F⁻¹(p)) is

|\mathrm{Bias}({\tilde{F}}^{- 1}(p))| = O(m ∕ n) + O(1 ∕ m^{2}).

(2.22)

Subtracting (2.18) from (2.16), one obtains

E {\tilde{F}}^{- 1}(p) - {\tilde{F}}^{- 1}(p) = \left\{({\hat{p}}_{i} - p_{i}) \frac{x_{i + 1} - x}{p_{i + 1} - p_{i}} + ({\hat{p}}_{i + 1} - p_{i + 1}) \frac{x - x_{i}}{p_{i + 1} - p_{i}}\right\} 1_{B_{n}^{c}} + O(m ∕ n) + O_{p}(N^{- 2}),

(2.23)

noting that $E(ε_{n, 3}^{2}) \leq c''' m^{2} ∕ n^{2}$ for some constant c‴. The term O_p(N⁻²) is bounded by $c^{iv} 1_{B_{n}}$ for some constant $c^{iv}$. Therefore,

\mathrm{Var}({\tilde{F}}^{- 1}(p)) = O(1 ∕ n) + O(m^{2} ∕ n^{2}).

(2.24)

It is relatively simple to check that the contribution from $D_{n, i}^{c}$, 1 ≤ i ≤ m − 1, to $\mathrm{MISE}(\tilde{ζ})$ is negligible compared to that from $D_{n} = \cup_{1 \leq i \leq m - 1} D_{n, i}$. It is useful, however, to show that for all p ∈ [0, 1], one has, on $B_{n}^{c}$, the relation

F^{- 1} (p) - {\tilde{F}}^{- 1} (p) = [\tilde{F} (x) - F (x)] \frac{x_{i + 1} - x_{i}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}} (1 + ε_{n, 4}),

(2.25)

where ε_{n,4} = O_p(1/m). Indeed, $|ε_{n, 4}| \leq c^{v} ∕ m$ on $B_{n}^{c}$, for some constant $c^{v} > 0$. To establish (2.25), note first that if $p \in D_{n, i}^{c} \cap [p_{i}, p_{i + 1}]$ then, although $\tilde{F}(x) \in [{\hat{p}}_{i}, {\hat{p}}_{i + 1}]$ (since x ∈ [x_i, x_{i+1}]), it may happen that F(x) belongs to $({\hat{p}}_{i - 1}, {\hat{p}}_{i})$ or $({\hat{p}}_{i + 1}, {\hat{p}}_{i + 2})$. On $B_{n}^{c}$, there is no other possibility.

Now if $F (x) \in ({\hat{p}}_{i - 1}, {\hat{p}}_{i})$ , e.g., then, recalling that x = F⁻¹(p),

\begin{aligned}
F^{- 1}(p) - {\tilde{F}}^{- 1}(p) &\equiv {\tilde{F}}^{- 1}(\tilde{F}(x)) - {\tilde{F}}^{- 1}(F(x)) \\
&= {\tilde{F}}^{- 1}(\tilde{F}(x)) - {\tilde{F}}^{- 1}(\tilde{F}(x_{i})) + {\tilde{F}}^{- 1}(\tilde{F}(x_{i})) - {\tilde{F}}^{- 1}(F(x)) \\
&= (\tilde{F}(x) - \tilde{F}(x_{i})) \left\{\frac{x_{i + 1} - x_{i}}{{\hat{p}}_{i + 1} - {\hat{p}}_{i}}\right\} + (\tilde{F}(x_{i}) - F(x)) \left\{\frac{x_{i} - x_{i - 1}}{{\hat{p}}_{i} - {\hat{p}}_{i - 1}}\right\},
\end{aligned}

(2.26)

in view of the linearity of ${\tilde{F}}^{- 1}$ on both $[{\hat{p}}_{i}, {\hat{p}}_{i + 1}]$ and $[{\hat{p}}_{i - 1}, {\hat{p}}_{i}]$ , but with different slopes (given in curly brackets). But the second slope differs from the first by an amount ε_n,4 which is easily shown to be no more than c^v/m on $B_{n}^{c}$ . The MISE of ${\tilde{F}}^{- 1}$ is then given by

\mathrm{MISE}({\tilde{F}}^{- 1}) = O(m^{2} ∕ n^{2}) + O(1 ∕ m^{4}) + O(1 ∕ n).

(2.27)

Once again, the optimal choice of m is $m ≍ n^{1 ∕ 4}$ , and then the MISE has the optimal rate

\mathrm{MISE}({\tilde{F}}^{- 1}) = O(1 ∕ n) = O(N^{- 4 ∕ 5}).

(2.28)

(b) Next consider the case m/n^1/4 → ∞, i.e., n = o(N^4/5). Since $M I S E ({\tilde{F}}^{- 1}) = O (1 ∕ n)$ , it is of larger order than N^−4/5, and hence the estimation is suboptimal. In this case, again consider r groups of essentially disjoint equidistant dosages. Then the average ${\tilde{ζ}}_{p} = \frac{1}{r} \sum_{j = 1}^{r} {\tilde{ζ}}_{p, j}$ has bias and variance (See (2.22) and (2.24)) given by

\mathrm{Bias}({\tilde{ζ}}_{p}) = O(m ∕ (r n)) + O({(r ∕ m)}^{2}),

(2.29)

\mathrm{Var}({\tilde{ζ}}_{p}) = O(1 ∕ (r n)) + O(m^{2} ∕ (r^{3} n^{2})).

(2.30)

Assume that m is not very large, i.e., $m ∕ n^{2 ∕ 3} ↛ \infty$. Then the optimal choice of r is $r ≍ {(m^{4} ∕ n)}^{1 ∕ 5}$, since the term m/(rn) in (2.29) is not of larger order than (r/m)², and one equates the orders of 1/(rn) and (r/m)⁴ to get the optimal r. This yields the optimal MISE of ${\tilde{ζ}}_{p}$, namely, $\mathrm{MISE}({\tilde{ζ}}_{p}) = O(1 ∕ (r n)) = O({(m n)}^{- 4 ∕ 5}) = O(N^{- 4 ∕ 5})$. □
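The role of the restriction on m can be checked by comparing orders directly. The worked step below substitutes $r ≍ (m^{4}/n)^{1/5}$ into (2.29) and (2.30); it is an order-of-magnitude check under the stated assumptions, not an additional result.

```latex
% With r \asymp (m^4/n)^{1/5}: r^2 \asymp m^{8/5} n^{-2/5}, r^3 \asymp m^{12/5} n^{-3/5}.
% Ratio of the two bias terms in (2.29):
\frac{m/(rn)}{(r/m)^{2}} = \frac{m^{3}}{r^{3} n} \asymp \frac{m^{3}}{m^{12/5} n^{2/5}} = \left(\frac{m}{n^{2/3}}\right)^{3/5},
% Ratio of the two variance terms in (2.30):
\qquad
\frac{m^{2}/(r^{3} n^{2})}{1/(rn)} = \frac{m^{2}}{r^{2} n} \asymp \frac{m^{2}}{m^{8/5} n^{3/5}} = \left(\frac{m}{n^{3/2}}\right)^{2/5}.
```

The first ratio stays bounded exactly when m/n^{2/3} stays bounded, and the second tends to zero whenever m = o(n^{3/2}). Under both conditions the squared bias is of order (r/m)⁴ and the variance of order 1/(rn), which is the pair of terms equated above.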

Finally, we arrive at the asymptotic distribution of ${\tilde{ζ}}_{p}$ .

Theorem 2.3. Let p ∈ (0, 1). In addition to the hypothesis of Theorem 2.1, assume m/n^1/4 → ∞. Then the following hold.

(a) With r = 1, ${\tilde{ζ}}_{p} = {\tilde{F}}^{- 1}(p)$, if m < (θ/(2c))(n/log n)^{1/2} as in (2.2), then

\frac{\sqrt{n} ({\tilde{ζ}}_{p} - ζ_{p})}{δ (p)} \overset{L}{\to} N (0, \frac{p (1 - p)}{f^{2} (p)}),

(2.31)

where

\begin{aligned} δ^{2}(p) &\equiv \sum_{i = 1}^{m - 1} {(x_{i + 1} - x_{i})}^{- 2} \left\{{(x_{i + 1} - x)}^{2} + {(x - x_{i})}^{2}\right\} 1_{I_{i}}(p) \\ &(I_{i} = [p_{i}, p_{i + 1}) \text{ for } 1 \leq i \leq m - 2, \quad I_{m - 1} = [p_{m - 1}, p_{m}]), \end{aligned}

(2.32)

lies in [1/2,1], and x = F⁻¹(p) = ζ_p.

(b) If m = o(n^3/2/log^5/2n), then with $r ≍ {(m^{4} ∕ n)}^{1 ∕ 5}$ ,

\frac{\sqrt{r n} ({\tilde{ζ}}_{p} - E {\tilde{ζ}}_{p})}{\bar{δ}(p)} \overset{L}{\to} N\left(0, \frac{p (1 - p)}{f^{2}(p)}\right).

(2.33)

Here ${\bar{δ}}^{2}(p)$ is the average of the r quantities $δ_{j}^{2}(p)$, 1 ≤ j ≤ r, of the form (2.32), one for each subgroup with m/r dosages at a distance of r/m from each other.

(c) If m = o(n^2/3/log log n), then with $r ≍ {(m^{4} ∕ n)}^{1 ∕ 5} ∕ {(log log n)}^{6 ∕ 5}$ ,

\frac{\sqrt{r n} ({\tilde{ζ}}_{p} - ζ_{p})}{\bar{δ}(p)} \overset{L}{\to} N\left(0, \frac{p (1 - p)}{f^{2}(p)}\right).

(2.34)
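The claim in part (a) that δ²(p) lies in [1/2, 1] follows from a one-line computation. In the worked step below, t is an auxiliary symbol introduced only for this check and does not appear in the original text.

```latex
% For p \in I_i put t = (x - x_i)/(x_{i+1} - x_i) \in [0, 1], where x = F^{-1}(p).
% Only the i-th summand of (2.32) is nonzero, so
δ^{2}(p) = (1 - t)^{2} + t^{2} = 1 - 2t(1 - t),
\qquad
\tfrac{1}{2} \leq δ^{2}(p) \leq 1,
% with the minimum 1/2 at t = 1/2 (x midway between dosages)
% and the maximum 1 at t \in \{0, 1\} (x at a dosage).
```

The normalization by δ(p) in (2.31) therefore changes the scale by a factor between 1/√2 and 1, depending on where ζ_p falls between two neighbouring dosages.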

Proof. (a) It follows from (2.16) (and (2.25)) that for p ∈ [p_i, p_{i+1}) one has, outside B_n,

{\tilde{F}}^{- 1}(p) - F^{- 1}(p) = - \left\{({\hat{p}}_{i} - p_{i}) \frac{x_{i + 1} - x}{p_{i + 1} - p_{i}} + ({\hat{p}}_{i + 1} - p_{i + 1}) \frac{x - x_{i}}{p_{i + 1} - p_{i}}\right\} + O(m ∕ n) + O(1 ∕ m^{2}).

(2.35)

Multiplying the two sides by $\sqrt{n}$ , and noting that $m ∕ \sqrt{n} \to 0, \sqrt{n} ∕ m^{2} \to 0$ , the desired Normal approximation holds.

(b) By (2.23), one has, outside B_n,

{\tilde{F}}^{- 1}(p) - E {\tilde{F}}^{- 1}(p) = - \left\{({\hat{p}}_{i} - p_{i}) \frac{x_{i + 1} - x}{p_{i + 1} - p_{i}} + ({\hat{p}}_{i + 1} - p_{i + 1}) \frac{x - x_{i}}{p_{i + 1} - p_{i}}\right\} + O(m ∕ n).

(2.36)

Using the analog of (2.36) for ${\tilde{F}}_{j}^{- 1}(p) - E {\tilde{F}}_{j}^{- 1}(p)$, one may apply Lyapunov's central limit theorem (see, e.g., Bhattacharya and Waymire (2007), p. 103) to the r summands $\sqrt{n} ({\tilde{F}}_{j}^{- 1}(p) - E {\tilde{F}}_{j}^{- 1}(p))$, 1 ≤ j ≤ r, with m/r in place of m, to get the desired result. Note that the summands have zero means, variances bounded away from zero and infinity, and bounded third moments, since $\sqrt{n} \frac{m ∕ r}{n} = m ∕ (r \sqrt{n}) ≍ m^{1 ∕ 5} n^{1 ∕ 5} ∕ \sqrt{n} = m^{1 ∕ 5} ∕ n^{3 ∕ 10} \to 0$ as m = o(n^3/2/log^5/2n), which also ensures that $m ∕ r = o (\sqrt{n ∕ log n})$ (see (2.2), (2.3)).

(c) By (2.29), with r as in part (c),

\begin{aligned} \mathrm{Bias}({\tilde{ζ}}_{p}) &= O(m ∕ (r n)) + O(r^{2} ∕ m^{2}), \\ \sqrt{r n} \, \mathrm{Bias}({\tilde{ζ}}_{p}) &= O(m ∕ \sqrt{r n}) + O(r^{5 ∕ 2} \sqrt{n} ∕ m^{2}) \to 0, \end{aligned}

(2.37)

since $m ∕ \sqrt{r n} = {(\frac{m}{n^{2 ∕ 3}} log log n)}^{3 ∕ 5} \to 0$ , and $\frac{r^{5 ∕ 2} \sqrt{n}}{m^{2}} = O ((\frac{m^{2}}{\sqrt{n}} ∕ {(log log n)}^{3}) \frac{\sqrt{n}}{m^{2}}) \to 0$ .

Hence subtracting the bias from ${\tilde{ζ}}_{p}$ , (2.34) follows from (2.33).

Remark 2.1. Note that (2.33) implies that, with $r ≍ {(m^{4} ∕ n)}^{1 ∕ 5}$ , the asymptotic variance of ${\tilde{ζ}}_{p}$ is O(N^−4/5).

Remark 2.2. Theorems 2.1-2.3 easily extend to the case of non-equal sample sizes n_i, 1 ≤ i ≤ m, and non-equidistant dosages x₁ < . . . < x_m, provided (1) the ratio of min{n_i : 1 ≤ i ≤ m} to max{n_i : 1 ≤ i ≤ m} is bounded away from zero, and (2) the ratio of min{x_{i+1} − x_i : 1 ≤ i ≤ m − 1} to max{x_{i+1} − x_i : 1 ≤ i ≤ m − 1} is bounded away from zero.

Acknowledgments

The authors wish to thank Professor Walt Piegorsch for a careful reading of the paper and for helpful suggestions.

References

1. Ayer M, Brunk HD, Ewing GM, Reid WT, Silverman E. An empirical distribution function for sampling with incomplete information. Ann. Math. Statist. 1955;26:641–647.
2. Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression. Wiley; London: 1972.
3. Bhattacharya R, Kong M. Consistency and asymptotic normality of the estimated effective dose in bioassay. Journal of Statistical Planning and Inference. 2007;137:643–658.
4. Bhattacharya R, Lin L. Nonparametric benchmark analysis in risk assessment: a comparative study by simulation and data analysis. 2010. preprint.
5. Bhattacharya R, Waymire EC. A Basic Course in Probability Theory. Springer; New York: 2007.
6. Cran GW. AS149 amalgamation of means in the case of simple ordering. Appl. Statist. 1980;29(2):209–211.
7. Dette H, Neumeyer N, Pilz KF. A note on nonparametric estimation of the effective dose in quantal bioassay. J. Amer. Statist. Assoc. 2005;100:503–510.
8. Dette H, Scheder R. A finite sample comparison of nonparametric estimates of the effective dose in quantal bioassay. Journal of Statistical Computation and Simulation. 2010;80(5):527–544.
9. Müller HG, Schmitt T. Kernel and probit estimation in quantal bioassay. J. Amer. Statist. Assoc. 1988;83(403):750–759.
10. Park D, Park S. Parametric and nonparametric estimators of ED100α. Journal of Statistical Computation and Simulation. 2006;76(8):661–672.
11. Piegorsch WW, Bailer AJ. Analyzing Environmental Data. John Wiley & Sons; 2005.
