Optimal sampling ratios in comparative diagnostic trials

Ting Dong; Liansheng Larry Tang; William F Rosenberger

doi:10.1111/rssc.12043

Author manuscript; available in PMC: 2014 Jun 17.

Published in final edited form as: J R Stat Soc Ser C Appl Stat. 2014 Apr;63(3):499–514. doi: 10.1111/rssc.12043

PMCID: PMC4060451 NIHMSID: NIHMS583708 PMID: 24948841

Summary

A subjective sampling ratio between the case and the control groups is not always an efficient choice to maximize the power or to minimize the total required sample size in comparative diagnostic trials. We derive explicit expressions for an optimal sampling ratio based on a common variance structure shared by several existing summary statistics of the receiver operating characteristic curve. We propose a two-stage procedure to estimate adaptively the optimal ratio without pilot data. We investigate the properties of the proposed method through theoretical proofs, extensive simulation studies and a real example in cancer diagnostic studies.

Keywords: Area under the curve, Diagnostic accuracy, Partial area under the curve, Power, Receiver operating characteristic curve, Two-stage design

1. Introduction

Diagnostic trials evaluate the diagnostic accuracy of a marker or compare the diagnostic accuracy of two markers. For example, in a diagnostic trial by Hendrick et al. (2008), investigators compared the accuracy of digital mammography with screen film mammography. Pepe et al. (2001) referred to these trials as phase III diagnostic trials. In these trials, the true disease status of subjects is known. To evaluate the diagnostic accuracy of a binary marker, sensitivity and specificity are used. Sensitivity is the probability of having a positive test result for a case subject. Specificity is the probability of having a negative test result for a control subject. The false positive rate (FPR) is 1 − specificity. For continuous markers, we obtain the sensitivity and FPR on the basis of a threshold that distinguishes the test result as being positive or negative. Varying the threshold yields a set of sensitivities and corresponding FPRs. The receiver operating characteristic (ROC) curve plots sensitivities against FPRs for all thresholds (Zhou et al., 1998, 2002).

Typically the ratio between the number of cases versus the number of controls is fixed in advance. A lung cancer prevention trial used an equal ratio with 71 prostate cancer cases and 71 age-matched controls (Etzioni et al., 1999). Some studies use other case–control ratios; for example, the controls who were enrolled for prostate cancer screening were four times as many as the cases in the Physicians Health Study (Etzioni et al., 2003). In a cancer diagnostic trial from Goddard and Hinberg (1990), 135 cancer patients and 218 non-cancer patients were recruited. A traditional biomarker A and newly developed diagnostic biomarkers were used to test blood samples from each subject. The power for comparing biomarkers A and D is below 45% by using the original sampling ratio of 0.62. Thus, these ratios may not be the best choice to maximize the test power, and optimal sampling ratios need to be derived and utilized to improve the power.

Janes and Pepe (2006) provided the optimal ratio for evaluating a continuous marker to maximize the power for a fixed total sample size (SS). Their method is the first attempt to address the optimal sampling ratio in diagnostic trials. However, pilot data are required to estimate the optimal ratio. Without pilot data, some distributions must be assumed to perform the calculations. An optimal ratio from an incorrect distributional assumption may lead to an underpowered study. It is desirable to recalculate the optimal ratio when data become available during the trial. In addition, optimal ratios for comparative diagnostic trials have not been discussed in the literature.

We make the following methodological contributions in this paper:

(a) a design to update the optimal ratio for evaluating a single marker with questionable parametric assumptions;
(b) extension of Janes and Pepe (2006) to two continuous markers and ordinal markers;
(c) a design to update the optimal ratio for comparing two markers and rigorous proof of its properties.

In this paper we first derive a general expression for the optimal sampling ratio of cases to controls in diagnostic trials. The ratio proposed is based on a common variance structure that is shared among existing ROC summary statistics. Special cases of these statistics include the non-parametric area under the ROC curve statistic AUC that was proposed by DeLong et al. (1988) and the weighted AUC-statistic by Wieand et al. (1989). The method proposed can be used in evaluating one marker or comparing two markers. The rest of the paper is organized as follows. In Section 2, we start with the optimal ratios for diagnostic trials. In Section 3, we propose a two-stage method to incorporate the idea of internal pilot data to estimate adaptively the optimal sampling ratio. The method can be applied in trials that evaluate one marker or compare two markers. We show that, although the optimal ratio is updated during a diagnostic trial, the analysis at the end of the trial can be carried out in the same fashion as in the traditional trial without affecting the nominal type I error rate. Section 4 illustrates the increase in power and the savings on the overall required SS by using the proposed method through a cancer example. Section 5 investigates benefits of the proposed procedure through extensive simulation studies. Some discussion is provided in Section 6.

The data that are analysed in the paper can be obtained from http://wileyonlinelibrary.com/journal/rss-datasets

2. Optimal sampling ratio

Suppose that we have N subjects with m cases and n controls. Each subject is measured by diagnostic test l (l = 1, 2). We denote the result of test l for the ith case by X_li, where i = 1, . . . , m, and for the jth control by Y_lj, where j = 1, . . . , n. The joint survival function of the cases is S_d, i.e. (X_1i, X_2i) ~ S_d(x₁, x₂), and the joint survival function of the controls is $S_{\bar{d}}$, i.e. $(Y_{1j}, Y_{2j}) \sim S_{\bar{d}}(y_1, y_2)$. Their marginal survival functions are X_li ~ S_d,l(x) and $Y_{lj} \sim S_{\bar{d},l}(y)$ respectively. For the threshold c varying in (−∞, ∞), the sensitivity is S_d,l(c) = Pr(X_li > c) and the FPR is $S_{\bar{d},l}(c) = \Pr(Y_{lj} > c)$. Consequently, the ROC curve for test l is defined as $R_l(u) = S_{d,l}\{S_{\bar{d},l}^{-1}(u)\}$, where the FPR u falls within [0, 1].

Summary measures for a single ROC curve include the area under the ROC curve, AUC, the partial AUC, pAUC, and the weighted AUC, wAUC. wAUC for marker l, $\Omega_l = \int_0^1 R_l(u)\,dW(u)$, was given by Wieand et al. (1989), where W(u) is a probability measure. We let W(u) be a point mass at an FPR u₀ to obtain the sensitivity of a test at that FPR, or W(u) = u, where u ∈ (0, 1), to obtain AUC. When W(u) = (u − u₀)/(u₁ − u₀), where u ∈ (u₀, u₁), Ω_l gives the partial AUC.

The statistics for comparing markers might be parametric (Mazumdar and Liu, 2003), or non-parametric (DeLong et al., 1988; Wieand et al., 1989). Let θ be the parameter in the ROC comparison, and $\hat{θ}$ be the estimator. On the basis of the variance expressions for these ROC statistics, we identify the following common structure for the variance of the aforementioned ROC statistics when the sample sizes become large:

$$\operatorname{var}(\hat{\theta}) = \frac{v_x}{m} + \frac{v_y}{n}, \tag{1}$$

where v_x is the variance associated with measurements of case patients and v_y is the variance related to control patients. In this paper we use the non-parametric statistics of DeLong et al. (1988) and Wieand et al. (1989). We present the variance expressions for these statistics in Sections 2.1.2 and 2.2. A similar variance structure for a conventional binormal ROC statistic of Mazumdar and Liu (2003) is presented in Appendix A.

Given the variance structure in equation (1), the total required SS in a diagnostic trial can be minimized by using an optimal sampling ratio when the variance is fixed. Equivalently, for a fixed total SS, the power for comparing two markers is maximized by using this optimal sampling ratio. Suppose that the total required SS in the diagnostic trial is N = m + n and the sampling ratio is r = m/n. Let the variance of $\hat{\theta}$ be a fixed constant, a. Since m = rn = Nr/(1 + r), it follows that

$$\frac{v_x}{m} + \frac{v_y}{n} = \frac{(1 + r)(v_x/r + v_y)}{N} = a.$$

The total required SS can then be expressed as

$$N = \frac{(1 + r)(v_x/r + v_y)}{a}.$$

To minimize N, we take the first derivative with respect to r and equate it to 0. We obtain v_y/a − v_x/(ar²) = 0. By solving this equation, the optimal sampling ratio is obtained as

$$r^{*} = \sqrt{v_x / v_y}. \tag{2}$$

It is worth noting that the optimal sampling ratio is analogous to the Neyman allocation ratio for clinical trials, which has been widely used to reduce the overall SS for a fixed power. However, as will be seen from Sections 2.1 and 2.2, v_x and v_y in diagnostic trials take more complicated forms than those used in clinical trials, which are commonly the variances of response variables in treatment and control groups. Interested readers are referred to Jennison and Turnbull (2000) and Rosenberger and Lachin (2002).
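As a quick numerical check of equation (2), the following sketch evaluates the total SS N(r) = (1 + r)(v_x/r + v_y)/a on a grid of ratios and compares the grid minimizer with the closed form r* = √(v_x/v_y). The function names and the values of v_x, v_y and a are hypothetical and are chosen only for illustration.

```python
import math


def total_sample_size(r, v_x, v_y, a):
    """Total N = (1 + r)(v_x / r + v_y) / a needed so that var(theta_hat) = a."""
    return (1.0 + r) * (v_x / r + v_y) / a


def optimal_ratio(v_x, v_y):
    """Case-to-control ratio r* = sqrt(v_x / v_y), as in equation (2)."""
    return math.sqrt(v_x / v_y)


# Hypothetical variance components and target variance (illustration only).
v_x, v_y, a = 0.08, 0.02, 0.001
r_star = optimal_ratio(v_x, v_y)

# Crude grid search over r in (0, 10] as an independent check on the closed form.
grid = [0.01 * k for k in range(1, 1001)]
r_grid = min(grid, key=lambda r: total_sample_size(r, v_x, v_y, a))

print(f"closed form r* = {r_star:.2f}, grid minimizer = {r_grid:.2f}")
print(f"N at r*: {total_sample_size(r_star, v_x, v_y, a):.1f}, "
      f"N at r = 1: {total_sample_size(1.0, v_x, v_y, a):.1f}")
```

With these illustrative inputs the case-related variance is four times the control-related variance, so the sketch allocates twice as many cases as controls and needs fewer subjects in total than the equal allocation.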

2.1. Optimal sampling ratio for continuous markers

The difference Δ = Ω₁ − Ω₂ was used in Wieand et al. (1989) to compare the wAUCs for continuous data. Here the estimator $\hat{\Omega}_{l,m,n}$ of Ω_l, for l = 1, 2, is obtained by substituting the empirical function estimators Ŝ_d,l for S_d,l and Ŝ_d̄,l for S_d̄,l in Ω_l:

$$\hat{\Omega}_{l,m,n} = \int_0^1 \hat{S}_{d,l}\{\hat{S}_{\bar{d},l}^{-1}(u)\}\,dW(u).$$

The resulting Δ-statistic is given by $\hat{\Delta}_{m,n} = \hat{\Omega}_{1,m,n} - \hat{\Omega}_{2,m,n}$. Hereafter the subscripts m and n in $\hat{\Delta}_{m,n}$ will be omitted unless necessary and the notation $\hat{\Delta}$ will be used. We shall need differentiability of the ROC functions for our main theorem. The following assumption guarantees this property.

Assumption 1. S_d,l and S_d̄,l have continuous positive derivatives on $\mathbb{R}$. Let $S'_{d,l}$ and $S'_{\bar{d},l}$ denote their derivatives.

Let

$$w_{i,l} = \int_0^1 [R_l(u) - I\{X_{li} \leq S_{\bar{d},l}^{-1}(u)\}]\,dW(u)$$

and

$$v_{j,l} = \int_0^1 R_l'(u)\,[I\{Y_{lj} \leq S_{\bar{d},l}^{-1}(u)\} - u]\,dW(u),$$

for l = 1, 2, where $R_l'(u) = S'_{d,l}\{S_{\bar{d},l}^{-1}(u)\} / S'_{\bar{d},l}\{S_{\bar{d},l}^{-1}(u)\}$. The variances of w_i,l and v_j,l are

$$\operatorname{var}(w_{i,l}) = \int_0^1\!\int_0^1 R_l(s \wedge t)\,dW(s)\,dW(t) - \left\{\int_0^1 R_l(s)\,dW(s)\right\}^2$$

and

$$\operatorname{var}(v_{j,l}) = \int_0^1\!\int_0^1 R_l'(s) R_l'(t)\,(s \wedge t)\,dW(s)\,dW(t) - \left\{\int_0^1 R_l'(s)\,s\,dW(s)\right\}^2.$$

Let w_i = w_i,1 – w_i,2 and v_j = v_j,1 – v_j,2. Further denote ${\overset{‒}{w}}_{m} = \sum_{i = 1}^{m} w_{i} ∕ m$ and ${\overset{‒}{v}}_{n} = \sum_{j = 1}^{n} v_{j} ∕ n$ . Wieand et al. (1989) and Tang et al. (2008) studied the Δ-statistic and showed that, under assumption 1, $\hat{Δ}$ is ${\overset{‒}{w}}_{m} + {\overset{‒}{v}}_{n} + Ω_{1} - Ω_{2} + T_{m, n}^{*}$ , where $T_{m, n}^{*}$ is a small order term with $T_{m, n}^{*} {(m + n)}^{- 1 ∕ 2}$ converging to 0 in probability, as m, n → ∞ (Wieand et al. (1989), page 591). They also showed that

$$\operatorname{var}(w_i) = \sum_{l=1}^{2} \operatorname{var}(w_{i,l}) - 2\int_0^1\!\int_0^1 [S_d\{S_{\bar{d},1}^{-1}(s), S_{\bar{d},2}^{-1}(t)\} - R_1(s) R_2(t)]\,dW(s)\,dW(t) \tag{3}$$

and

$$\operatorname{var}(v_j) = \sum_{l=1}^{2} \operatorname{var}(v_{j,l}) - 2\int_0^1\!\int_0^1 R_1'(s) R_2'(t)\,[S_{\bar{d}}\{S_{\bar{d},1}^{-1}(s), S_{\bar{d},2}^{-1}(t)\} - st]\,dW(s)\,dW(t). \tag{4}$$

2.1.1. Optimal sampling ratio for evaluating one continuous marker

We start with one marker, say marker 1. It follows from the approximation of $\hat{\Delta}$ that $\hat{\Omega}_1$ is asymptotically equivalent to $\sum_{i=1}^{m} w_{i,1}/m + \sum_{j=1}^{n} v_{j,1}/n + \Omega_1$. The w_i,1s are independent, identically distributed random variables corresponding to measurements of cases and the v_j,1s are independent, identically distributed random variables related to measurements of controls. Following the general expression (2), we can see that the optimal sampling ratio for evaluating marker 1 on the basis of wAUC is given by $r_1^{*} = \sqrt{\operatorname{var}(w_{i,1}) / \operatorname{var}(v_{j,1})}$. This ratio includes existing results for AUC by Hanley and Hajian-Tilaki (1997) and for the sensitivity by Janes and Pepe (2006). wAUC $\hat{\Omega}_1$ estimates AUC when W(u) = u for 0 < u < 1. Consequently, the optimal ratio becomes

$$\sqrt{\frac{E\{I(X_{1i} > Y_{1j})\, I(X_{1i} > Y_{1l})\} - [E\{I(X_{1i} > Y_{1j})\}]^2}{E\{I(X_{1i} > Y_{1j})\, I(X_{1k} > Y_{1j})\} - [E\{I(X_{1i} > Y_{1j})\}]^2}},$$

i, k = 1, . . . , m, and j, l = 1, . . . , n. This can be written in terms of placement values, $\sqrt{[\operatorname{var}\{S_{\bar{d},1}(X_{1i})\} / \operatorname{var}\{S_{d,1}(Y_{1j})\}]}$, as shown in Janes and Pepe (2006). When W(u) = I{u = u₀} for 0 < u₀ < 1, $\hat{\Omega}_1$ estimates the sensitivity at the FPR u₀ and the optimal ratio can be shown to reduce to

$$r_{s,1}^{*} = \sqrt{\frac{R_1(u_0) - R_1^2(u_0)}{R_1'(u_0)^2 u_0 - R_1'(u_0)^2 u_0^2}} = \frac{\sqrt{R_1(u_0)\{1 - R_1(u_0)\} / \{u_0 (1 - u_0)\}}}{R_1'(u_0)}.$$

This has been derived in Janes and Pepe (2006).
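To illustrate the ratio for the sensitivity at a fixed FPR, the sketch below evaluates it under a binormal ROC curve R(u) = Φ{a + bΦ⁻¹(u)}, for which R′(u) = bφ{a + bΦ⁻¹(u)}/φ{Φ⁻¹(u)}. The binormal form, the parameter values a and b, and all function names are assumptions made for this example only; the formula above does not require them.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def binormal_roc(u, a, b):
    """Binormal ROC curve R(u) = Phi(a + b * Phi^{-1}(u)) (assumed model)."""
    return _STD.cdf(a + b * _STD.inv_cdf(u))


def binormal_roc_slope(u, a, b):
    """Derivative R'(u) of the binormal ROC curve."""
    z = _STD.inv_cdf(u)
    return b * _STD.pdf(a + b * z) / _STD.pdf(z)


def optimal_ratio_sensitivity(u0, a, b):
    """Case-to-control ratio for estimating the sensitivity at FPR u0."""
    sens = binormal_roc(u0, a, b)
    slope = binormal_roc_slope(u0, a, b)
    return math.sqrt(sens * (1.0 - sens) / (u0 * (1.0 - u0))) / slope


# Hypothetical setting: cases N(1.5, 1) and controls N(0, 1), so a = 1.5 and b = 1.
for u0 in (0.05, 0.10, 0.20):
    ratio = optimal_ratio_sensitivity(u0, a=1.5, b=1.0)
    print(f"FPR {u0:.2f}: optimal case-to-control ratio {ratio:.2f}")
```

For an uninformative marker (a = 0, b = 1) the ROC curve is the diagonal, R′(u₀) = 1, and the function returns a ratio of 1, which is a convenient sanity check.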

2.1.2. Optimal sampling ratio for comparing two continuous markers

Since the w_is are random variables corresponding to measurements of case patients and the v_js are also random variables related to measurements of control subjects, expression (2) gives the optimal ratio for comparing the difference between wAUCs:

$$r^{*} = \sqrt{\operatorname{var}(w_i) / \operatorname{var}(v_j)}, \tag{5}$$

where the variances are given in equations (3) and (4).

Since $\hat{\Delta}$ compares AUCs, partial AUCs or sensitivities at a particular FPR, we discuss the optimal ratios for these special cases by specifying corresponding weight functions. When we let the weight function be W(u) = u, for 0 < u < 1, $\hat{\Delta}$ compares the AUCs. The optimal ratio in equation (5) implies that the following ratio between the cases and the controls maximizes the power for comparing the AUCs: $r_A^{*} = \sqrt{v_x^A / v_y^A}$, where

$$v_x^A = \sum_{l=1}^{2} \left( E\{I(X_{li} > Y_{lj})\, I(X_{li} > Y_{ll})\} - [E\{I(X_{li} > Y_{lj})\}]^2 \right) - 2\left[ E\{I(X_{1i} > Y_{1j})\, I(X_{2i} > Y_{2l})\} - E\{I(X_{1i} > Y_{1j})\}\, E\{I(X_{2i} > Y_{2l})\} \right] \tag{6}$$

and

$$v_y^A = \sum_{l=1}^{2} \left( E\{I(X_{li} > Y_{lj})\, I(X_{lk} > Y_{lj})\} - [E\{I(X_{li} > Y_{lj})\}]^2 \right) - 2\left[ E\{I(X_{1i} > Y_{1j})\, I(X_{2k} > Y_{2j})\} - E\{I(X_{1i} > Y_{1j})\}\, E\{I(X_{2k} > Y_{2j})\} \right], \tag{7}$$

as shown in Appendix A. When W(u)= I{u = u₀} for 0 < u₀ < 1, $\hat{Δ}$ compares the sensitivities of two markers at the FPR u₀. The optimal ratio in equation (5) reduces to

$$r_s^{*} = \sqrt{\frac{\sum_{l=1}^{2} \{R_l(u_0) - R_l^2(u_0)\} - 2A}{\sum_{l=1}^{2} [R_l'(u_0)^2 u_0 - \{R_l'(u_0) u_0\}^2] - 2B}},$$

where $A = \Pr\{X_{1i} > S_{\bar{d},1}^{-1}(u_0), X_{2i} > S_{\bar{d},2}^{-1}(u_0)\} - R_1(u_0) R_2(u_0)$ and $B = R_1'(u_0) R_2'(u_0) [\Pr\{Y_{1j} > S_{\bar{d},1}^{-1}(u_0), Y_{2j} > S_{\bar{d},2}^{-1}(u_0)\} - u_0^2]$.

2.2. Optimal sampling ratio for ordinal markers

The variance of the $\hat{\Delta}$-statistic involves the first derivatives of the ROC curves. The optimal ratio in equation (5) cannot be readily applied to ordinal data, which often occur in radiology. We thus consider the non-parametric statistic by DeLong et al. (1988) to obtain the optimal ratio for comparing two ordinal markers, which are usually two imaging modalities in radiology. Let $\Omega_l^A = P(X_{li} > Y_{lj}) + P(X_{li} = Y_{lj})/2$ for marker l, and $\hat{\Omega}_l^A$ be its estimator. DeLong's statistic estimates $\Delta^D = \Omega_1^A - \Omega_2^A$ and is given as

$$\hat{\Delta}^D = \hat{\Omega}_1^A - \hat{\Omega}_2^A = \frac{1}{mn} \sum_{j=1}^{n} \sum_{i=1}^{m} (\psi_{ij}^{(1)} - \psi_{ij}^{(2)}),$$

where $\psi_{ij}^{(l)}$ equals 1 for Y_lj < X_li, $\frac{1}{2}$ for Y_lj = X_li and 0 for Y_lj > X_li, for marker l, l = 1, 2.

DeLong et al. (1988) showed that the large sample variance of $\hat{\Delta}^D$ has the form $\operatorname{var}(\hat{\Delta}^D) = v_x^D/m + v_y^D/n$, with

$$v_x^D = \sum_{l=1}^{2} [E(\psi_{ij}^{(l)} \psi_{il}^{(l)}) - \{E(\psi_{ij}^{(l)})\}^2] - 2\{E(\psi_{ij}^{(1)} \psi_{il}^{(2)}) - E(\psi_{ij}^{(1)})\, E(\psi_{il}^{(2)})\}$$

from the cases, and

$$v_y^D = \sum_{l=1}^{2} [E(\psi_{ij}^{(l)} \psi_{kj}^{(l)}) - \{E(\psi_{ij}^{(l)})\}^2] - 2\{E(\psi_{ij}^{(1)} \psi_{kj}^{(2)}) - E(\psi_{ij}^{(1)})\, E(\psi_{kj}^{(2)})\}$$

from the controls. Therefore, it follows from equation (2) that the ratio $r_D^{*} = \sqrt{v_x^D / v_y^D}$ maximizes the power for comparing two ordinal markers. For the problem of evaluating a single ordinal marker, the optimal ratio is reduced to

$$\sqrt{\frac{E(\psi_{ij}^{(1)} \psi_{il}^{(1)}) - \{E(\psi_{ij}^{(1)})\}^2}{E(\psi_{ij}^{(1)} \psi_{kj}^{(1)}) - \{E(\psi_{ij}^{(1)})\}^2}}.$$
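The case and control variance components of DeLong's statistic can be estimated from data through per-subject averages of the kernel ψ (often called structural components). The sketch below is one plug-in version of this idea: it averages ψ over controls for each case and over cases for each control, takes the between-marker differences, and uses their sample variances (n − 1 divisor) as estimates of v_x^D and v_y^D. The function names and the ratings are hypothetical and serve only as an illustration.

```python
from statistics import variance


def psi(x, y):
    """DeLong kernel: 1 if control < case, 1/2 if tied, 0 otherwise."""
    if y < x:
        return 1.0
    return 0.5 if y == x else 0.0


def structural_components(cases, controls):
    """Average psi over controls for each case, and over cases for each control."""
    n, m = len(controls), len(cases)
    per_case = [sum(psi(x, y) for y in controls) / n for x in cases]
    per_control = [sum(psi(x, y) for x in cases) / m for y in controls]
    return per_case, per_control


def ordinal_optimal_ratio(cases1, controls1, cases2, controls2):
    """Plug-in estimate of sqrt(v_x / v_y) for comparing two paired markers."""
    c1, k1 = structural_components(cases1, controls1)
    c2, k2 = structural_components(cases2, controls2)
    v_x = variance([p - q for p, q in zip(c1, c2)])  # case component
    v_y = variance([p - q for p, q in zip(k1, k2)])  # control component
    return (v_x / v_y) ** 0.5, v_x, v_y


# Hypothetical 5-point ratings; position i refers to the same subject in both markers.
cases_m1, cases_m2 = [4, 5, 3, 5, 2], [3, 5, 4, 4, 3]
controls_m1, controls_m2 = [1, 2, 3, 2, 4, 1], [2, 2, 3, 1, 3, 2]
ratio, v_x, v_y = ordinal_optimal_ratio(cases_m1, controls_m1, cases_m2, controls_m2)
print(f"v_x = {v_x:.4f}, v_y = {v_y:.4f}, estimated ratio = {ratio:.2f}")
```

Because ψ handles ties explicitly, the same functions apply to continuous markers without modification.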

3. Two-stage procedure to obtain the optimal ratio

We may assume a parametric model to obtain the variances and resulting optimal ratios derived in the preceding section. When a parametric model is correctly specified, the optimal ratio can be calculated from equation (2) for comparing ROC summary measures, and the SS to obtain a specified power can be subsequently derived. However, if the parametric model is misspecified, the SS calculated may not give the appropriate power. Fig. 1 shows the optimal ratios for comparing the AUCs and pAUCs with the case and control having different variances. The case and control observations are from the bivariate normal distributions with (X₁, X₂) ~ N{(2, 2), Σ_x} and (Y₁, Y₂) ~ N{(0, 0), Σ_y}, where Σ_x has diagonal elements 1 and off-diagonal elements 0.1, and Σ_y has diagonal elements σ_Y² and off-diagonal elements 0.1σ_Y². We see that the optimal ratio decreases as σ_Y increases from 0.8 to 1.3. This indicates that the variances of the case and the control play an important role in the optimal ratio. When the variance for the controls is larger than that for the cases, the optimal ratio becomes smaller than 1, indicating that sampling more controls than cases yields a better power to detect a difference between markers. Thus, the misspecification of parametric models at the planning stage may lead to an incorrect optimal ratio.

Fig. 1. Optimal sampling ratio for comparing (a) the AUCs or (b) pAUCs: the case and control observations are from the bivariate normal distributions with (X₁, X₂) ~ N{(2, 2), Σ_x} and (Y₁, Y₂) ~ N{(0, 0), Σ_y}, where Σ_x has diagonal elements 1 and off-diagonal elements 0.1, and Σ_y has diagonal elements σ_Y² and off-diagonal elements 0.1σ_Y²; the pAUCs are obtained over the FPR between 0 and 0.6

For a fixed sample two-sided hypothesis test, to detect the difference between ROC summary measures, the required SSs m and n with power 1 − β and the significance level of α are given by

$$m = rn = \frac{(z_{\alpha/2} + z_{\beta})^2 (v_x + r v_y)}{\Delta_1^2}, \tag{8}$$

where z_p denotes the upper pth quantile of the standard normal distribution and Δ₁ is the difference between ROC summary measures under the alternative hypothesis to be detected. The total required SS is N = m + n.
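The sample size calculation can be sketched as follows. The code starts from the variance structure in equation (1) with n = m/r, so that var = (v_x + r v_y)/m, and solves for m at the required power. The function name and the inputs in the usage lines are illustrative assumptions rather than part of the method.

```python
from statistics import NormalDist


def required_sizes(v_x, v_y, r, delta, alpha=0.05, beta=0.20):
    """Cases m and controls n = m / r for a two-sided test with power 1 - beta.

    Uses var(theta_hat) = v_x / m + v_y / n = (v_x + r * v_y) / m.
    """
    inv = NormalDist().inv_cdf
    z_sum = inv(1.0 - alpha / 2.0) + inv(1.0 - beta)  # upper quantiles
    m = z_sum ** 2 * (v_x + r * v_y) / delta ** 2
    return m, m / r


# Illustrative inputs: variance components 0.082 and 0.035, ratio 1.53, delta 0.05.
m, n = required_sizes(v_x=0.082, v_y=0.035, r=1.53, delta=0.05)
print(f"cases = {m:.1f}, controls = {n:.1f}, total = {m + n:.1f}")
```

In practice the returned values would be rounded up to whole subjects; the sketch leaves them unrounded so that the relationship m = rn holds exactly.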

Proschan (2005) introduced the concept of internal pilot data which often refers to accumulated data after a trial has been carried out for a certain period of time. To correct for the model misspecification at the beginning of the trial, we propose a two-stage procedure to use internal pilot data after some observations are available during the trial. Suppose that the total required SS N is fixed. Without loss of generality, we use a two-sided test in the procedure proposed. The procedure is given in the following steps.

Step 1: specify a parametric model to obtain v_x,0 and v_y,0, and the resulting initial optimal ratio $r_0^{*} = \sqrt{v_{x,0} / v_{y,0}}$.
Step 2: use the ratio $r_0^{*}$ together with v_x,0 and v_y,0 in the following SS formula to calculate initial SS m₀ and n₀ with power 1 − β and significance level α, $m_0 = (z_{\alpha/2} + z_{\beta})^2 (v_{x,0} + r_0^{*} v_{y,0}) / \Delta_1^2$, and n₀ = N − m₀, where Δ₁ is the difference between ROC summary measures under the alternative hypothesis.
Step 3: after sufficient marker measurements are available on m₁ cases and n₁ controls at the first stage, the variance expressions of either the Δ-statistic or DeLong's statistic are recalculated by using available data. These variance estimators, $\hat{v}_{x,1}$ and $\hat{v}_{y,1}$, are applied in equation (2) to recalculate the optimal ratio $\hat{r}^{*} = \sqrt{\hat{v}_{x,1} / \hat{v}_{y,1}}$.
Step 4: continue the trial by recruiting M₂ cases and N₂ controls, where M₂ and N₂ are given by
$$M_2 = \frac{N \hat{r}^{*}}{1 + \hat{r}^{*}} - m_1, \qquad N_2 = \frac{N}{1 + \hat{r}^{*}} - n_1. \tag{9}$$
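Steps 3 and 4 can be sketched in a few lines. The rounding to whole subjects and the safeguard that keeps both second-stage sizes non-negative are assumptions of this sketch and are not part of the procedure as stated; the function names are hypothetical. The usage lines take the first-stage values reported for the cancer example in Section 4, with the ratio rounded to two decimals as it is reported there.

```python
import math


def updated_ratio(v_x_hat, v_y_hat):
    """Step 3: re-estimated optimal ratio from the first-stage variance estimates."""
    return math.sqrt(v_x_hat / v_y_hat)


def second_stage_sizes(total_n, m1, n1, r_hat):
    """Step 4: numbers of cases and controls still to recruit, as in equation (9).

    Assumptions of this sketch: the target number of cases is rounded to the
    nearest integer and kept between m1 and total_n - n1, so that neither
    second-stage size is negative and the total stays equal to total_n.
    """
    target_cases = round(total_n * r_hat / (1.0 + r_hat))
    target_cases = min(max(target_cases, m1), total_n - n1)
    m2 = target_cases - m1
    n2 = total_n - target_cases - n1
    return m2, n2


r_hat = round(updated_ratio(0.082, 0.035), 2)
m2, n2 = second_stage_sizes(total_n=353, m1=60, n1=60, r_hat=r_hat)
print(f"updated ratio = {r_hat}, second-stage cases = {m2}, controls = {n2}")
```

With these inputs the sketch gives 153 further cases and 80 further controls, in line with the numbers quoted in Section 4.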

It was shown in Proschan (2005) that using the internal pilot data for comparing population means in clinical trials maintains the nominal type I error rate. The reason is that the sample variance that is obtained at the end of the first stage does not give information for the sample mean at the end of the trial. We show that it is also true in our case as m, n → ∞. Suppose that $\hat{\Delta}_{m,n}$ is the estimated Δ at the end of the stage with m cases and n controls. The variance estimators at the first stage are $\hat{v}_{x,1} = \sum_{i=1}^{m_1} (w_i - \bar{w}_{m_1})^2 / m_1$ and $\hat{v}_{y,1} = \sum_{j=1}^{n_1} (v_j - \bar{v}_{n_1})^2 / n_1$, where w_i and v_j are given in Section 2.1. We first state the results for w̄_m and v̄_n in the following theorem, and then state the result for the Δ-statistic in the subsequent corollary. The proof is provided in Appendix A.

Theorem 1. Let H₀: Ω₁ = Ω₂. Assume that m, n → ∞, m₁/m → λ₁, n₁/n → λ₂ and m/n → λ, where 0 < λ < ∞ and 0 < λ₁, λ₂ < 1. Then,

$$(\bar{w}_m, \hat{v}_{x,1})^{\mathrm{T}} \sqrt{m} \overset{d}{\to} N\{(0, \sigma_{v,x}^2), \Sigma_w\}, \tag{10}$$

where $σ_{v, x}^{2} = var ({\hat{v}}_{x, 1} \sqrt m)$ and $Σ_{v} = diag {var ({\overset{‒}{v}}_{n} \sqrt m), var ({\hat{v}}_{y, 1} \sqrt m)}$ . Also, under assumption 1,

$$(\bar{v}_n, \hat{v}_{y,1})^{\mathrm{T}} \sqrt{m} \overset{d}{\to} N\{(0, \sigma_{v,y}^2), \Sigma_v\}. \tag{11}$$

Theorem 1 implies that w̄_m is asymptotically independent of v̂_x,1, and v̄_n is asymptotically independent of v̂_y,1. We also observe that both w̄_m and v̂_x,1 are obtained on different subjects from v̄_n and v̂_y,1. Thus, we can obtain the following corollary for $\hat{\Delta}_{m,n}$ by ignoring the small order term $T_{m,n}^{*}$ in the approximation of $\hat{\Delta}$.

Corollary 1. Under the regularity conditions in theorem 1, ${\hat{Δ}}_{m, n}$ is asymptotically independent of v̂_x,1 and v̂_y,1 as m, n → ∞.

Therefore, the variance estimated at the first stage does not give information for the Δ-statistic at the end for large SSs. Thus, the resulting optimal ratio by using data from the first stage does not reveal information about the estimated difference between two ROC statistics obtained at the end of the second stage. Consequently, although the optimal ratio is updated during the trial, the analysis at the end of the trial can be carried out in the same fashion as in the trial without updating the optimal ratio. This is important in maintaining the proper type I error rate.

4. Application to the cancer diagnostic trial

We applied our method to the cancer diagnostic trial from Goddard and Hinberg (1990). Measurements from the blood samples are highly skewed for all biomarkers. We compared a new biomarker D and the reference biomarker A to illustrate the increment in power and the SS savings by using the procedure proposed. We assumed a contrast of Δ₁ =0.05 between AUCs and the type I error rate 0.05 for calculating power and SS from a two-sided alternative. The overall SS N is 353 by summing the numbers of cases and controls. At the first stage, we accrued data on m₁ = 60 cancer and n₁ = 60 non-cancer patients and obtained the variance estimates v̂_x,1 = 0.082 and v̂_y,1 = 0.035, which resulted in the optimal case–control ratio r̂* = 1.53, from equation (2). Using this optimal ratio in the expression (9) in step 4 of the procedure proposed, the numbers of the cases and controls to be recruited in the second stage were calculated to be 153 and 80 respectively. The power by using the optimal ratio was then 50.9% from the equation

$$1 - \beta = \Phi\left[\Delta_1 \sqrt{\frac{N \hat{r}^{*}}{(1 + \hat{r}^{*}) (\hat{v}_{x,1} + \hat{v}_{y,1} \hat{r}^{*})}} - z_{\alpha/2}\right]. \tag{12}$$

This power offers an increment of about 7 percentage points over the power of 43.8% that is calculated from equation (12) with r̂* replaced by the original ratio of 0.62. We also investigated the savings on the overall SS by using the procedure proposed. Using the original power 43.8% with the estimated optimal ratio r̂* = 1.53, the overall SS was calculated to be 292 with 177 cancer patients and 115 non-cancer patients. This offers savings of 61 patients over the original design with N = 353.
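The figures in this example can be retraced with a short calculation based on equation (12). The sketch below uses only the values quoted above (N = 353, Δ₁ = 0.05, v̂_x,1 = 0.082, v̂_y,1 = 0.035 and the ratios 1.53 and 0.62); the function names are hypothetical, and small differences from the reported percentages can arise from rounding.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def power(total_n, r, v_x, v_y, delta, alpha=0.05):
    """Approximate power from equation (12) for total size N and ratio r = m/n."""
    z_crit = STD.inv_cdf(1.0 - alpha / 2.0)
    m = total_n * r / (1.0 + r)            # number of cases
    se = math.sqrt((v_x + r * v_y) / m)    # standard error of the estimated difference
    return STD.cdf(delta / se - z_crit)


def total_n_for_power(target, r, v_x, v_y, delta, alpha=0.05):
    """Total N needed to reach the target power at ratio r (equation (12) inverted)."""
    z_sum = STD.inv_cdf(1.0 - alpha / 2.0) + STD.inv_cdf(target)
    m = z_sum ** 2 * (v_x + r * v_y) / delta ** 2
    return m * (1.0 + r) / r


v_x, v_y, delta, n_total = 0.082, 0.035, 0.05, 353
p_opt = power(n_total, 1.53, v_x, v_y, delta)     # estimated optimal ratio
p_orig = power(n_total, 0.62, v_x, v_y, delta)    # original ratio
n_needed = total_n_for_power(p_orig, 1.53, v_x, v_y, delta)

print(f"power at r = 1.53: {100 * p_opt:.1f}%")
print(f"power at r = 0.62: {100 * p_orig:.1f}%")
print(f"total SS at r = 1.53 for the original power: {n_needed:.0f}")
```

The same two functions can be used to explore how sensitive the gain is to the first-stage variance estimates, for instance by varying v_x and v_y around the values above.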

5. Simulation studies

In this section, we demonstrate the performance of our method for maximizing power or minimizing total SSs when comparing summary statistics of diagnostic tests in extensive simulation studies. We consider both continuous data and ordinal data.

5.1. Simulation studies based on continuous data

The biomarker results in the example used in Section 4 are highly skewed, and a log-normal distribution was used by Goddard and Hinberg (1990) as a possible approximation to the distribution of results. Thus, we consider bivariate log-normal distributions in the simulation studies. In addition, we simulate data from both bivariate normal distributions, which are commonly used for symmetrically distributed marker results, and bivariate exponential distributions, which can approximate survival biomarker results. The bivariate normal models have the forms (X₁, X₂) ~ N{(μ₁, μ₂), Σ_X} and (Y₁, Y₂) ~ N{(0, 0), Σ_Y}, where the diagonal elements of Σ_X and Σ_Y are 1 and 9 respectively, and the correlation parameter ρ is the same for both matrices. We choose ρ = 0.1 and ρ = 0.25 in our simulations. AUC is set to be 0.70 for marker 1, and 0.75 or 0.80 for marker 2. pAUC with the FPR in the range (0, 0.6) is set to be 0.30 for marker 1, and 0.35 or 0.40 for marker 2. The bivariate log-normal models have the forms exp(X₁, X₂) and exp(Y₁, Y₂) for cases and controls respectively. The AUCs and pAUCs remain the same as in the normal models. The log-normal distribution may also demonstrate the robustness of the aforementioned non-parametric methods. The performance of the methods is expected to be similar for the normal and log-normal distributions because the non-parametric estimators should remain invariant under monotone transformations.

According to the algorithm in Gumbel (1960), the joint distribution function of the bivariate exponential random variables takes the form

$$H(x, y) = H_1(x) H_2(y) [1 + 4\rho \{1 - H_1(x)\} \{1 - H_2(y)\}],$$

where H_l, l = 1, 2, are univariate exponential distribution functions, and ρ is in [−0.25, 0.25]. We set ρ to 0.1 or 0.25 here. The marginal survival functions are exp(−β_l1 x) and exp(−β_l2 y), so we can generate data from these two distributions. In the simulation, AUC is set to 0.70 for marker 1, and 0.75 or 0.80 for marker 2. pAUC with the FPR in the range (0, 0.6) is set to 0.30 for marker 1, and 0.35 or 0.45 for marker 2.
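One way to draw pairs from a joint distribution of this form is conditional inversion: draw the first margin, then invert the conditional distribution of the second margin given the first. The sketch below does this on the uniform scale and then applies the exponential quantile function. It is an illustration under stated assumptions, not necessarily the algorithm used in the simulations reported here; the function name, rates and seed are hypothetical.

```python
import math
import random


def gumbel_bivariate_exponential(size, rho, rate1, rate2, rng):
    """Draw pairs (x, y) with H(x, y) = H1 H2 [1 + 4 rho (1 - H1)(1 - H2)].

    H1 and H2 are exponential distribution functions with the given rates.
    """
    if not -0.25 <= rho <= 0.25:
        raise ValueError("rho must lie in [-0.25, 0.25]")
    alpha = 4.0 * rho
    pairs = []
    for _ in range(size):
        u, w = rng.random(), rng.random()
        # Conditional distribution of V given U = u is v * (1 + a * (1 - v)),
        # with a = alpha * (1 - 2u); solve it equal to w for v in [0, 1].
        a = alpha * (1.0 - 2.0 * u)
        b = 1.0 + a
        v = 2.0 * w / (b + math.sqrt(b * b - 4.0 * a * w))
        x = -math.log(1.0 - u) / rate1   # exponential quantile of u
        y = -math.log(1.0 - v) / rate2   # exponential quantile of v
        pairs.append((x, y))
    return pairs


rng = random.Random(2014)  # arbitrary seed, for reproducibility only
sample = gumbel_bivariate_exponential(5, rho=0.25, rate1=1.0, rate2=2.0, rng=rng)
for x, y in sample:
    print(f"{x:.3f}  {y:.3f}")
```

The root written as 2w/(b + √(b² − 4aw)) is the solution of the quadratic that lies in [0, 1], and it remains well defined when a = 0, where the two margins are conditionally independent.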

We compare the proposed two-stage procedure with the equal case–control ratio and the optimal ratio. We use DeLong's statistic for comparing the AUCs and the Δ-statistic for comparing the pAUCs. In our simulation, we first assume that our samples are from bivariate normal distributions; then we use equation (8) to calculate the initial total required SS. With the type I error rate 0.05 and power 80%, the initial total required SSs are N = 1421 or N = 326 to detect the difference of two pairs of AUCs of (0.70, 0.75) and (0.70, 0.80) respectively, with ρ = 0.1. When ρ = 0.25, total required SSs of N = 1207 or N = 278 are needed to detect the difference in these pairs. We consider three different sampling ratios:

the proposed two-stage optimal ratio;
the optimal ratio of 0.5 for the normal and log-normal distributions and the optimal ratio of 1.5 for the exponential distributions;
the equal sampling ratio.

To implement the method proposed, we let m₁ = n₁ = N/4. By substituting non-parametric variance estimates v̂_x,1 and v̂_y,1, the resulting optimal ratio is estimated by ${\hat{r}}^{*} = \sqrt ({\hat{v}}_{x, 1} ∕ {\hat{v}}_{y, 1})$ , and M₂ and N₂ are calculated by using equation (9). We then generate M₂ new observations for cases and N₂ observations for controls. Consequently, the null hypothesis of equal AUCs or pAUCs is rejected in favour of the alternative if the absolute value of the calculated Z-statistic is greater than or equal to z_0.025. The simulated power is then calculated as the percentage of times out of 5000 simulation runs that the null hypothesis is rejected. The simulated powers are presented in Table 1, which illustrates that the simulated powers of the two-stage method proposed are close to those of the optimal ratio and are greater than those of the equal sampling ratio in the normal settings. Since the optimal ratio for the exponential distribution specified is close to 1.5, we see that most of the powers of the method proposed are greater than those of fixed ratios.
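A compact Monte Carlo sketch of this two-stage simulation for the AUC comparison is given below. It assumes the bivariate normal setting described above (variance 1 for cases, 9 for controls, common correlation ρ), uses DeLong's statistic with sample variances of its structural components, rounds the target number of cases and keeps it between m₁ and N − n₁. These implementation details, the function names, the seed and the small number of runs are assumptions of the sketch rather than a description of the code behind Table 1.

```python
import math
import random
from bisect import bisect_left, bisect_right
from statistics import NormalDist, mean, variance

STD = NormalDist()


def placements(values, reference):
    """For each value: share of reference results below it, ties counted as 1/2."""
    ref = sorted(reference)
    return [(bisect_left(ref, v) + bisect_right(ref, v)) / (2.0 * len(ref))
            for v in values]


def delong_parts(cases, controls):
    """Estimated AUC difference and the case and control variance components."""
    per_case, per_control = [], []
    for k in (0, 1):
        x = [s[k] for s in cases]
        y = [s[k] for s in controls]
        per_case.append(placements(x, y))
        per_control.append([1.0 - p for p in placements(y, x)])
    d_case = [a - b for a, b in zip(*per_case)]
    d_control = [a - b for a, b in zip(*per_control)]
    return mean(d_case), variance(d_case), variance(d_control)


def draw(size, means, sd, rho, rng):
    """Bivariate normal pairs with common standard deviation sd and correlation rho."""
    out = []
    for _ in range(size):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append((means[0] + sd * z1,
                    means[1] + sd * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)))
    return out


def mean_for_auc(auc, var_case=1.0, var_control=9.0):
    """Case mean giving the requested binormal AUC when the control mean is 0."""
    return math.sqrt(var_case + var_control) * STD.inv_cdf(auc)


def two_stage_trial(total_n, case_means, rho, rng, alpha=0.05):
    """One simulated trial; returns (rejected, total cases, total controls)."""
    m1 = n1 = total_n // 4
    cases = draw(m1, case_means, 1.0, rho, rng)
    controls = draw(n1, (0.0, 0.0), 3.0, rho, rng)
    _, vx1, vy1 = delong_parts(cases, controls)
    r_hat = math.sqrt(vx1 / vy1)
    m = min(max(round(total_n * r_hat / (1.0 + r_hat)), m1), total_n - n1)
    n = total_n - m
    cases += draw(m - m1, case_means, 1.0, rho, rng)
    controls += draw(n - n1, (0.0, 0.0), 3.0, rho, rng)
    delta, vx, vy = delong_parts(cases, controls)
    z = delta / math.sqrt(vx / m + vy / n)
    return abs(z) >= STD.inv_cdf(1.0 - alpha / 2.0), m, n


def simulated_power(runs, total_n, auc1, auc2, rho, seed=0):
    rng = random.Random(seed)
    means = (mean_for_auc(auc1), mean_for_auc(auc2))
    hits = sum(two_stage_trial(total_n, means, rho, rng)[0] for _ in range(runs))
    return hits / runs


print(simulated_power(runs=500, total_n=326, auc1=0.70, auc2=0.80, rho=0.1))
```

With only a few hundred runs the estimate carries a Monte Carlo error of a few percentage points, so it should be read as a rough check rather than a replication of the tabulated values.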

Table 1.

Simulated power for comparing AUCs or pAUCs by using the two-stage method proposed and fixed ratios, over 5000 simulations†

ρ	Distribution	Powers (%) for comparing AUCs				Powers (%) for comparing pAUCs
		AUC for marker 2	Two-stage	Fixed ratio: equal	Fixed ratio: optimal	pAUC for marker 2	Two-stage	Fixed ratio: equal	Fixed ratio: optimal
0.10	BN	0.75	80.0	77.0	79.5	0.35	33.3	31.4	32.8
		0.80	80.4	74.6	80.2	0.45	88.1	86.2	88.9
	LN	0.75	79.1	74.5	78.3	0.35	34.0	31.9	32.1
		0.80	80.5	74.8	79.4	0.45	89.1	85.3	88.0
	BE	0.75	81.0	80.4	82.2	0.35	84.0	83.4	84.6
		0.80	81.6	80.0	82.7	0.45	85.0	84.0	84.4
0.25	BN	0.75	82.2	78.0	81.7	0.35	37.3	34.8	37.5
		0.80	81.0	77.6	80.1	0.45	91.8	89.4	92.1
	LN	0.75	82.0	78.7	81.5	0.35	37.0	34.2	36.7
		0.80	83.5	78.1	82.6	0.45	92.7	89.8	92.3
	BE	0.75	83.7	82.6	83.3	0.35	91.2	90.3	91.0
		0.80	83.6	82.8	84.4	0.45	90.9	90.7	90.8

† AUC for marker 1 is 0.70, and pAUC for marker 1 is 0.30. BN, bivariate normal; LN, bivariate log-normal; BE, bivariate exponential. ρ is the correlation coefficient of two markers. The optimal ratios for the bivariate normal and log-normal distributions are close to 0.5, and the optimal ratios for the bivariate exponential distribution are close to 1.5.

We also conduct simulation studies to illustrate that the method proposed reduces the total SS compared with the equal ratio. The aforementioned bivariate normal distribution is applied to simulate test results. We first calculate the initial total SS N with the equal ratio, type I error rate 0.05 and power 80%. At the end of stage I, with m₁ = n₁ simulated test results from two groups, we update the case–control ratio with the estimated optimal ratio from the interim data, and recalculate the total SS that is needed to achieve 80% power on the basis of the estimated ratio. Additional test results are then generated according to the updated SS in two groups, and the Z-statistic is computed. The null hypothesis of equal AUCs is rejected in favour of the alternative if the absolute value of the calculated Z-statistic is greater than or equal to z_0.025. The simulated power is given by the percentage of times out of 5000 simulation runs that the null hypothesis is rejected. The simulated power and the average updated total SS with m₁ = n₁ = N/5 and with m₁ = n₁ = N/7 are presented in Table 2, which illustrates that the two-stage method proposed reduces the total SS compared with the equal ratio. The simulated power of the two-stage method proposed is close to the nominal power for all parameterizations. In addition, the simulated power and updated SS vary little with different sizes at stage I.

Table 2.

Average updated total SS and simulated power for comparing AUCs by using the proposed two-stage method over 5000 simulations^†

ρ	Results for m₁ = n₁ = N/5				Results for m₁ = n₁ = N/7
	AUC for marker 2	Initial SS	Updated SS	Power (%)	AUC for marker 2	Initial SS	Updated SS	Power (%)
0.10	0.75	1744	1333	80.9	0.75	1744	1335	80.4
	0.80	405	311	80.6	0.80	405	313	79.1
0.25	0.75	1527	1160	80.3	0.75	1527	1161	80.2
	0.80	357	273	81.5	0.80	357	275	79.8


^†

The AUC for marker 1 is 0.70. ρ is the correlation coefficient of two markers.
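The sample size update at the end of stage I can be sketched as follows. This is a minimal illustration rather than the authors' code. It assumes that the variance of the estimated difference has the form v_x/m + v_y/n, in which case the ratio m/n that minimizes the total SS is the square root of v_x/v_y. The function names and the numerical inputs (v_x, v_y, delta) are hypothetical and are not taken from the simulation settings above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def optimal_ratio(v_x, v_y):
    """Case/control ratio m/n minimizing v_x/m + v_y/n for a fixed total m + n."""
    return sqrt(v_x / v_y)


def total_sample_size(v_x, v_y, delta, ratio, alpha=0.05, power=0.80):
    """Total SS for a two-sided level-alpha Z-test to detect a difference delta.

    Assumes var(estimate) = v_x/m + v_y/n with ratio = m/n (illustrative assumption).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n = z ** 2 * (v_x / ratio + v_y) / delta ** 2  # number of controls
    m = ratio * n                                   # number of cases
    return ceil(m) + ceil(n)


def update_design(v_x_hat, v_y_hat, delta, alpha=0.05, power=0.80):
    """Stage I update: plug interim variance estimates into the ratio and the SS."""
    r_hat = optimal_ratio(v_x_hat, v_y_hat)
    return r_hat, total_sample_size(v_x_hat, v_y_hat, delta, r_hat, alpha, power)


# Hypothetical inputs, chosen only for illustration.
v_x, v_y, delta = 1.0, 4.0, 0.5
initial_n = total_sample_size(v_x, v_y, delta, ratio=1.0)
ratio_hat, updated_n = update_design(v_x, v_y, delta)
print(initial_n, ratio_hat, updated_n)
```

Under these assumptions the updated total SS cannot exceed the equal-ratio SS by more than the rounding of m and n, because the estimated ratio minimizes the same variance expression that determines the SS.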

We also evaluate whether the two-stage procedure maintains the nominal type I error rate. We use N = 200, 400 and 500. We consider the parametric distributions and the three different sampling ratios that were used in the previous simulation. We assume equal AUCs or pAUCs, with the AUCs being 0.70, 0.75 and 0.80 and the pAUCs being 0.30, 0.35 and 0.40. The nominal type I error rate is 0.05. The simulated type I error rates over 10000 simulation runs are shown in Table 3. All these rates are close to the nominal level when the total sample size reaches 500.

Table 3.

Type I error rates for comparing the AUCs or pAUCs by using the two-stage method proposed, over 10000 simulations^†

ρ	Distribution	Error rates (%) for comparing the AUCs				Error rates (%) for comparing the pAUCs
		AUCs	N = 200	N = 400	N = 500	pAUCs	N = 200	N = 400	N = 500
0.1	BN	0.70	4.5	5.0	5.0	0.30	4.8	5.1	5.0
		0.75	5.1	5.0	4.9	0.35	5.0	4.9	5.0
		0.80	4.9	5.1	4.9	0.40	5.2	5.1	5.5
	LN	0.70	4.9	5.0	5.0	0.30	4.6	5.2	5.1
		0.75	5.1	4.9	5.1	0.35	4.6	5.1	5.0
		0.80	5.0	4.4	5.0	0.40	5.0	5.1	4.9
	BE	0.70	5.0	5.1	5.0	0.30	5.2	4.9	5.0
		0.75	5.0	4.9	4.9	0.35	5.3	5.0	5.1
		0.80	5.2	5.1	4.9	0.40	4.7	4.9	5.1
0.25	BN	0.70	4.9	4.8	4.7	0.30	5.1	5.0	4.9
		0.75	5.1	5.0	5.0	0.35	5.2	5.3	5.2
		0.80	5.2	5.1	5.0	0.40	4.9	5.1	5.1
	LN	0.70	5.0	5.1	5.0	0.30	5.1	5.3	5.2
		0.75	4.9	4.7	4.9	0.35	4.5	5.0	4.8
		0.80	4.8	3.9	5.0	0.40	4.6	4.8	5.0
	BE	0.70	4.2	5.0	5.2	0.30	5.0	5.0	4.8
		0.75	5.3	5.0	4.9	0.35	5.1	4.7	5.0
		0.80	4.2	5.1	4.9	0.40	4.9	4.7	5.0


^†

BN, bivariate normal distribution; LN, bivariate log-normal distribution; BE, bivariate exponential distribution.

N is the total required SS and ρ is the correlation coefficient of two markers.
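As a rough guide for reading Table 3 (an added illustration, not part of the original analysis), the Monte Carlo standard error of a rejection rate estimated from R independent runs is the square root of p(1 − p)/R. The sketch below evaluates it at the nominal rate p = 0.05 with R = 10000; the function name is hypothetical.

```python
from math import sqrt


def mc_standard_error(p, runs):
    """Standard error of a proportion p estimated from `runs` independent simulations."""
    return sqrt(p * (1 - p) / runs)


se = mc_standard_error(0.05, 10000)
print(f"Monte Carlo SE: {100 * se:.2f} percentage points")
```

Deviations from 5% of a few tenths of a percentage point are therefore within the range expected from simulation noise alone.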

5.2. Simulation studies based on ordinal data

We also conduct simulation studies to evaluate the simulated power of the proposed method on ordinal test results. We first use the aforementioned bivariate log-normal distributions and bivariate exponential distributions to simulate continuous results. We then use the 20th, 40th, 60th and 80th percentiles of the distributions to categorize the simulated continuous data as follows. A test result is recoded as 1 if it is less than the 20th percentile, 2 if it is between the 20th and 40th percentiles, 3 if it is between the 40th and 60th percentiles, 4 if it is between the 60th and 80th percentiles, and 5 if it is greater than the 80th percentile. The rest of the simulation settings are identical to those in the previous section on evaluating the power for continuous data. The results in Table 4 indicate that the simulated power of the proposed method is similar to that obtained with the optimal ratios and is higher than that obtained with the equal ratio.

Table 4.

Simulated power for ordinal data for comparing AUCs by using the two-stage method proposed and fixed ratios, over 5000 simulations^†

ρ	Distribution	AUC for marker 2	Two-stage power (%)	Fixed ratio power (%)	
				Equal	Optimal
0.10	LN	0.75	81.2	75.3	79.7
		0.80	82.0	77.3	83.2
	BE	0.75	87.1	84.7	88.6
		0.80	86.6	84.5	86.4
0.25	LN	0.75	80.9	77.6	80.0
		0.80	80.0	78.6	80.2
	BE	0.75	89.0	87.1	88.6
		0.80	88.1	87.2	88.9


^†

The AUC for marker 1 is 0.70. LN, bivariate log-normal distribution; BE, bivariate exponential distribution. ρ is the correlation coefficient of two markers.
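The recoding of continuous results into five ordinal categories described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the cutpoints are taken to be the theoretical percentiles of a single log-normal distribution with hypothetical parameters mu and sigma, whereas the text does not specify which distribution's percentiles were used. The function names are hypothetical.

```python
from bisect import bisect_right
from math import exp
from statistics import NormalDist


def percentile_cutpoints(mu=0.0, sigma=1.0, probs=(0.2, 0.4, 0.6, 0.8)):
    """Theoretical percentiles of a log-normal marker exp(N(mu, sigma^2))."""
    normal = NormalDist(mu, sigma)
    return [exp(normal.inv_cdf(p)) for p in probs]


def to_ordinal(values, cutpoints):
    """Recode continuous values as 1, ..., len(cutpoints) + 1.

    A value below the first cutpoint becomes 1 and a value above the last
    becomes len(cutpoints) + 1. Ties with a cutpoint (probability zero for
    continuous data) go to the higher category.
    """
    return [bisect_right(cutpoints, v) + 1 for v in values]


cuts = percentile_cutpoints()
scores = to_ordinal([0.1, 1.0, 100.0], cuts)
print(cuts, scores)
```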

6. Discussion

The optimal sampling ratio in diagnostic trials can maximize the test power or minimize the overall SS. The optimal sampling ratio that is discussed in this paper is analogous to the optimal allocation ratio in assigning treatments to patients in clinical trials. The optimal allocation ratio has been used in clinical trials for decades, but the importance of the optimal ratio in diagnostic trials has not been widely recognized. Implementation requires the calculation of complicated variances of frequently used ROC statistics. This paper discusses a common variance structure for ROC statistics and thereby introduces optimal sampling ratios in comparative diagnostic trials based on these statistics. Two popular non-parametric ROC statistics are used to illustrate the explicit forms of the optimal ratios because their variance expressions can be written as the sum of separate terms; one relates to the cases, and the other relates to the controls.

If preliminary studies are available before carrying out a comparative diagnostic trial, the variance can be estimated by using pilot data to obtain the optimal ratio for comparing specified ROC summary measures. The ratio can then be used to recruit patients in the trial, and recalculating the ratio may not be necessary during the trial. However, when medical practitioners do not have preliminary data for the markers and are not certain about the distributions of the marker results, the distribution assumption that is used for obtaining the optimal ratio may be far from the true underlying distributions for the marker results. This may result in less power or larger overall SSs than using the true optimal ratio. The two-stage procedure proposed is then particularly useful to ensure that the optimal ratio can be recalculated by using internal pilot data during the trial. The procedure proposed performs well in a large-scale simulation study. We also demonstrate that the procedure proposed maintains the nominal type I error rate in the simulation. We use an example in cancer diagnostic studies to illustrate the application of our method on maximizing the test power and saving overall SSs. The results indicate that, compared with the original sampling ratio, using the proposed two-stage procedure for a fixed overall sample size increased the test power. Alternatively, for the fixed test power, the procedure proposed reduces the overall SS by nearly 25%.

In some rare diseases, it may not be possible to recruit the required number of the cases. Suppose that only 135 cancer patients can be recruited in the aforementioned cancer diagnostic trial. If the calculated optimal ratio of 1.53 is maintained, then 89 non-cancer patients should be in the trial. This leads to the total SS of N =224. Using the power calculation formula (12) gives a power of 35.2%, which sacrifices 8% power while reducing the SS by 129. This indicates that, for a fixed number of cases, recruiting more controls may increase the power if the budget of a trial allows. This can be seen from the variance expression (1) since, when m is fixed in equation (1), the variance decreases as n increases. Thus, with the constraint of total 353 subjects and 135 cases, the original sampling ratio of 1.14 (135/118) gives the maximum power.
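The arithmetic in this paragraph can be retraced with a short sketch. The counts 135, 1.53 and 353 are the values quoted in the text; the variance function assumes that expression (1) has the form v_x/m + v_y/n, and the unit variance components used in the comparison are hypothetical.

```python
from math import ceil

cases = 135            # cases that can be recruited
optimal_ratio = 1.53   # cases per control, as quoted in the text
original_total = 353   # total SS quoted in the text

controls = ceil(cases / optimal_ratio)   # 135 / 1.53 is about 88.2, rounded up
total = cases + controls
reduction = original_total - total
print(controls, total, reduction)


def variance(m, n, v_x, v_y):
    """Assumed form of expression (1): v_x/m + v_y/n."""
    return v_x / m + v_y / n


# With m fixed, a larger n gives a smaller variance (hypothetical v_x = v_y = 1).
print(variance(cases, controls, 1.0, 1.0) > variance(cases, original_total - cases, 1.0, 1.0))
```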

The characteristics of subjects are often matched in case–control studies to minimize the confounding effects. Janes and Pepe (2008) illustrated that a ROC summary estimate without adjusting for covariates may be biased. If covariate information is available, matching should be considered for deriving the optimal sampling ratio. Future research on this topic is warranted.

Acknowledgements

The authors thank the Associate Editor, the Joint Editor and a referee for their constructive comments. The authors also thank their colleague Anand Vidyashankar for many useful suggestions that led to an improvement in this paper. The project described here was supported by award R15CA150698 from the National Cancer Institute under the American Recovery and Reinvestment Act of 2009 and by award H98230-11-1-0196 from the National Security Agency.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health.

Appendix A: Variance expressions of receiver operating characteristic statistics, variance deviation and proof of proposition 1

A.1. Variance expressions for a parametric receiver operating characteristic statistic

When measurements of markers have bivariate normal distributions, Mazumdar and Liu (2003) provided expressions for the variances, v_x and v_y. Suppose that (X_1i, X_2i) ~ N{(μ_1d, μ_2d), Σ_d} for i = 1, . . . , m and (Y_1j, Y_2j) ~ N{(μ_1d̄, μ_2d̄), Σ_d̄} for j = 1, . . . , n, where

Σ_{d} = (\begin{matrix} σ_{1 d}^{2} & ρ σ_{1 d} σ_{2 d} \\ ρ σ_{1 d} σ_{2 d} & σ_{2 d}^{2} \end{matrix})

and

Σ_{\overset{‒}{d}} = (\begin{matrix} σ_{1 \overset{‒}{d}}^{2} & ρ σ_{1 \overset{‒}{d}} σ_{2 \overset{‒}{d}} \\ ρ σ_{1 \overset{‒}{d}} σ_{2 \overset{‒}{d}} & σ_{2 \overset{‒}{d}}^{2} \end{matrix}) .

The statistic considered in Mazumdar and Liu (2003) is the partial AUC estimator given by

{\hat{θ}}_{p}^{b n} = \int_{c_{1}}^{c_{2}} Φ (α_{1} - β_{1} x) ϕ (x) d x - \int_{c_{1}}^{c_{2}} Φ (α_{2} - β_{2} x) ϕ (x) d x,

where α₁ = (μ_1d – μ_1d̄)/σ_1d, β₁ = σ_1d̄/σ_1d, α₂ = (μ_2d – μ_2d̄)/σ_2d and β₂ = σ_2d̄/σ_2d. Let

Γ_{1} (r, s; c_{1}, c_{2}) = \left. \frac{1}{\sqrt{s^{2} + 1}} ϕ \left\{ \frac{r}{\sqrt{s^{2} + 1}} \right\} Φ (x) \right|_{c_{1}^{'}}^{c_{2}^{'}}

and

Γ_{2} (r, s; c_{1}, c_{2}) = \left. - \frac{1}{s^{2} + 1} ϕ \left\{ \frac{r}{\sqrt{s^{2} + 1}} \right\} \left\{ ϕ (x) + \frac{r s}{\sqrt{s^{2} + 1}} Φ (x) \right\} \right|_{c_{1}^{'}}^{c_{2}^{'}},

with $c_{1}^{'} = c_{1} \sqrt{s^{2} + 1} + r s ∕ \sqrt{s^{2} + 1}$ and $c_{2}^{'} = c_{2} \sqrt{s^{2} + 1} + r s ∕ \sqrt{s^{2} + 1}$ .

Let

f_{l 1} = \frac{1}{σ_{l d}} Γ_{1} (\frac{μ_{l d} - μ_{l \overset{‒}{d}}}{σ_{l d}}, - \frac{σ_{l \overset{‒}{d}}}{σ_{l d}}; c_{1}, c_{2})

and f_l2 = –f_l1, for l = 1, 2. In addition, let

f_{l 3} = \frac{1}{2 σ_{l d}^{3}} \left\{ (μ_{l \overset{‒}{d}} - μ_{l d}) Γ_{1} (\frac{μ_{l d} - μ_{l \overset{‒}{d}}}{σ_{l d}}, - \frac{σ_{l \overset{‒}{d}}}{σ_{l d}}; c_{1}, c_{2}) + σ_{l \overset{‒}{d}} Γ_{2} (\frac{μ_{l d} - μ_{l \overset{‒}{d}}}{σ_{l d}}, - \frac{σ_{l \overset{‒}{d}}}{σ_{l d}}; c_{1}, c_{2}) \right\},

and

f_{l 4} = - \frac{1}{2 σ_{l d} σ_{l \overset{‒}{d}}} Γ_{2} (\frac{μ_{l d} - μ_{l \overset{‒}{d}}}{σ_{l d}}, - \frac{σ_{l \overset{‒}{d}}}{σ_{l d}}; c_{1}, c_{2}) .

The variances v_x and v_y for ${\hat{θ}}_{p}^{b n}$ can be written as

v_{x} = σ_{1 d}^{2} (f_{11}^{2} + 2 f_{13}^{2} σ_{1 d}^{2}) + σ_{2 d}^{2} (f_{21}^{2} + 2 f_{23}^{2} σ_{2 d}^{2}) - 2 ρ σ_{1 d} σ_{2 d} (f_{11} f_{21} + f_{13} f_{23})

and

v_{y} = σ_{1 \overset{‒}{d}}^{2} (f_{12}^{2} + 2 f_{14}^{2} σ_{1 \overset{‒}{d}}^{2}) + σ_{2 \overset{‒}{d}}^{2} (f_{22}^{2} + 2 f_{24}^{2} σ_{2 \overset{‒}{d}}^{2}) - 2 ρ σ_{1 \overset{‒}{d}} σ_{2 \overset{‒}{d}} (f_{12} f_{22} + f_{14} f_{24}) .
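As a numerical illustration of the statistic in Section A.1, the sketch below evaluates the two integrals of the form ∫ Φ(α − βx) ϕ(x) dx over (c₁, c₂) by composite Simpson quadrature and takes their difference. It is an added example, not the authors' implementation; the function names, the quadrature rule and the parameter values are assumptions. Over the whole real line the integral equals Φ{α/√(1 + β²)}, which gives a convenient check.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf is Phi, pdf is phi


def binormal_partial_area(alpha, beta, c1, c2, steps=2000):
    """Composite Simpson approximation of int_{c1}^{c2} Phi(alpha - beta*x) phi(x) dx."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = (c2 - c1) / steps

    def f(x):
        return _STD.cdf(alpha - beta * x) * _STD.pdf(x)

    total = f(c1) + f(c2)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(c1 + k * h)
    return total * h / 3


def theta_p_bn(mu_d, mu_dbar, sigma_d, sigma_dbar, c1, c2):
    """Difference of the two partial areas; each argument is a pair (marker 1, marker 2)."""
    areas = []
    for l in range(2):
        alpha = (mu_d[l] - mu_dbar[l]) / sigma_d[l]
        beta = sigma_dbar[l] / sigma_d[l]
        areas.append(binormal_partial_area(alpha, beta, c1, c2))
    return areas[0] - areas[1]


# Check against the closed form over (effectively) the whole real line.
full = binormal_partial_area(1.0, 1.0, -8.0, 8.0)
closed_form = _STD.cdf(1.0 / sqrt(2.0))
print(full, closed_form)

# Hypothetical parameter values for two markers.
print(theta_p_bn((1.0, 1.2), (0.0, 0.0), (1.0, 1.0), (1.0, 1.0), -1.0, 1.0))
```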

A.2. Derivation of $v_{x}^{A}$ and $v_{y}^{A}$

We can show that

\int_{0}^{1} \int_{0}^{1} S_{d} {S_{\overset{‒}{d}, 1}^{- 1} (s), S_{\overset{‒}{d}, 2}^{- 1} (t)} d s d t

can be expressed as

\int_{- \infty}^{\infty} \int_{- \infty}^{\infty} S_{d} (y_{1}, y_{2}) d S_{\overset{‒}{d}, 1} (y_{1}) d S_{\overset{‒}{d}, 2} (y_{2}) .

Let $S_{\overset{‒}{d}, 1}^{- 1} (s) = y_{1}$ and $S_{\overset{‒}{d}, 2}^{- 1} (t) = y_{2}$ ; then we have

\int_{0}^{1} \int_{0}^{1} S_{d} {S_{\overset{‒}{d}, 1}^{- 1} (s), S_{\overset{‒}{d}, 2}^{- 1} (t)} d s d t = E {I (X_{1 i} > Y_{1 j}) I (X_{2 i} > Y_{2 l})} .

Similarly, v_y becomes

v_{y} = \sum_{l = 1}^{2} \left[ \int_{0}^{1} \int_{0}^{1} R_{l}^{'} (s) R_{l}^{'} (t) (s \land t) d s d t - \left\{ \int_{0}^{1} R_{l}^{'} (s) s d s \right\}^{2} \right] - 2 \int_{0}^{1} \int_{0}^{1} R_{1}^{'} (s) R_{2}^{'} (t) \left[ S_{\overset{‒}{d}} \left\{ S_{\overset{‒}{d}, 1}^{- 1} (s), S_{\overset{‒}{d}, 2}^{- 1} (t) \right\} - s t \right] d s d t .

It follows that

\int_{0}^{1} \int_{0}^{1} R_{1}^{'} (s) R_{2}^{'} (t) S_{\overset{‒}{d}} \{ S_{\overset{‒}{d}, 1}^{- 1} (s), S_{\overset{‒}{d}, 2}^{- 1} (t) \} d s d t = \int_{0}^{1} \int_{0}^{1} \frac{S_{d, 1}^{'} \{ S_{\overset{‒}{d}, 1}^{- 1} (s) \}}{S_{\overset{‒}{d}, 1}^{'} \{ S_{\overset{‒}{d}, 1}^{- 1} (s) \}} \frac{S_{d, 2}^{'} \{ S_{\overset{‒}{d}, 2}^{- 1} (t) \}}{S_{\overset{‒}{d}, 2}^{'} \{ S_{\overset{‒}{d}, 2}^{- 1} (t) \}} S_{\overset{‒}{d}} \{ S_{\overset{‒}{d}, 1}^{- 1} (s), S_{\overset{‒}{d}, 2}^{- 1} (t) \} d s d t .

Let $S_{\overset{‒}{d}, 1}^{- 1} (s) = y_{1}$ and $S_{\overset{‒}{d}, 2}^{- 1} (t) = y_{2}$ ; then it follows that

\begin{matrix} \iint S_{d, 1}^{'} (y_{1}) S_{d, 2}^{'} (y_{2}) S_{\overset{‒}{d}} (y_{1}, y_{2}) d y_{1} d y_{2} & = E \{ I (X_{1 i} < Y_{1 j}) I (X_{2 k} < Y_{2 j}) \} \\ & = E [ \{ 1 - I (X_{1 i} > Y_{1 j}) \} \{ 1 - I (X_{2 k} > Y_{2 j}) \} ] \\ & = 1 - E \{ I (X_{1 i} > Y_{1 j}) \} - E \{ I (X_{2 k} > Y_{2 j}) \} + E \{ I (X_{1 i} > Y_{1 j}) I (X_{2 k} > Y_{2 j}) \} . \end{matrix}

Because

\int_{0}^{1} \int_{0}^{1} R_{1}^{'} (s) R_{2}^{'} (t) s t d s d t

can also be written as

1 - \Pr (X_{1 i} > Y_{1 j}) - \Pr (X_{2 k} > Y_{2 j}) + E {I (X_{1 i} > Y_{1 j})} E {I (X_{2 k} > Y_{2 j})},

the expressions for v_x and v_y are simplified to equations (6) and (7) respectively.
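The expansion of the product of indicators used in Section A.2 can be checked empirically. The sketch below is an added illustration with hypothetical normal markers: it draws a case i for marker 1, a different case k for marker 2 and a single control j measured on both markers, then compares the count of joint events {X_1i < Y_1j, X_2k < Y_2j} with the count implied by the right-hand side. For continuous data without ties the two counts agree exactly in every sample.

```python
import random


def indicator_counts(n_draws=20000, seed=0):
    """Return (count of both-less events, count implied by 1 - a - b + ab)."""
    rng = random.Random(seed)
    both_less = a_sum = b_sum = ab_sum = 0
    for _ in range(n_draws):
        x1 = rng.gauss(1.0, 1.0)              # marker 1, case i
        x2 = rng.gauss(1.0, 1.0)              # marker 2, a different case k
        y1 = rng.gauss(0.0, 1.0)              # marker 1, control j
        y2 = 0.5 * y1 + rng.gauss(0.0, 1.0)   # marker 2, same control j (correlated)
        a, b = x1 > y1, x2 > y2
        both_less += int(x1 < y1 and x2 < y2)
        a_sum += int(a)
        b_sum += int(b)
        ab_sum += int(a and b)
    return both_less, n_draws - a_sum - b_sum + ab_sum


lhs, rhs = indicator_counts()
print(lhs, rhs)
```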

A.3. Proof of theorem 1

Recall that ${\overset{‒}{w}}_{m} = Σ_{i = 1}^{m} w_{i} ∕ m$ , and ${\overset{‒}{v}}_{n} = Σ_{j = 1}^{n} v_{j} ∕ n$ are the sample means at the end of the trial and E(w̄_m) = E(v̄_n) = 0, where m and n are the SSs at the end of the trial. The variance estimators at the end of the first stage are ${\hat{v}}_{x, 1} = Σ_{i = 1}^{m_{1}} {(w_{i} - {\overset{‒}{w}}_{m_{1}})}^{2} ∕ m_{1}$ and ${\hat{v}}_{y, 1} = Σ_{j = 1}^{n_{1}} {(v_{j} - {\overset{‒}{v}}_{n_{1}})}^{2} ∕ n_{1}$ , where m₁ and n₁ are the SSs at the end of the first stage. Let $A_{m} = {\overset{‒}{w}}_{m} \sqrt m$ and $B_{m} = {\hat{v}}_{x, 1} \sqrt m$ . We shall show that

X_{m} ≔ (\begin{matrix} A_{m} \\ B_{m} - σ_{v, x}^{2} \end{matrix}) \overset{d}{\to} N_{2} (0, Σ_{w}),

(13)

where Σ_w is a diagonal matrix. For this, using the Cramér–Wold device, consider l^TX_m, where l = (l₁, l₂)^T. Since B_m can be expressed as

B_{m} = \sqrt m (\sum_{i = 1}^{m_{1}} w_{i}^{2} ∕ m_{1} - {\overset{‒}{w}}_{m_{1}}^{2}),

it follows that

\begin{matrix} l^{T} X_{m} = & (\sum_{i = 1}^{m} \frac{l_{1} w_{i}}{m} + \sum_{i = 1}^{m_{1}} \frac{l_{2} w_{i}^{2}}{m_{1}} - l_{2} {\overset{‒}{w}}_{m_{1}}^{2}) \sqrt m - σ_{v, x}^{2} \\ = & \frac{\sqrt m}{m_{1}} \sum_{i = 1}^{m_{1}} (\frac{m_{1}}{m} l_{1} w_{i} + l_{2} w_{i}^{2} - \frac{σ_{v, x}^{2}}{\sqrt m}) + \frac{\sqrt m}{m - m_{1}} \sum_{i = m_{1} + 1}^{m} \frac{m - m_{1}}{m} l_{1} w_{i} - l_{2} {\overset{‒}{w}}_{m_{1}}^{2} \sqrt m \\ = & T_{m} (1) + T_{m} (2) - T_{m} (3) . \end{matrix}

(14)

Since the w_i are bounded random variables, T_m(1) and T_m(2) have finite second moments. Also, T_m(1) and T_m(2) are independent since T_m(1) is based on the random variables {w_i : i = 1, . . . , m₁} and T_m(2) is based on the random variables {w_i : i = m₁ + 1, . . . , m}. Hence, by the central limit theorem and Slutsky's theorem (Serfling, 1980), it follows that, as m → ∞,

T_{m} (1) + T_{m} (2) \overset{d}{\to} N [0, var (λ_{1} l_{1} w_{1} + l_{2} w_{1}^{2}) + var {(1 - λ_{1}) l_{1} w_{1}}] .

Also, under hypothesis H₀, since $E (w_{1}^{3}) = 0$ , the limiting variance can be shown to reduce to $l_{1}^{2} σ_{1}^{2} + l_{2}^{2} σ_{2}^{2}$ , where $σ_{1}^{2} = {{(1 - λ_{1})}^{2} + 1}$ , and $σ_{2}^{2} = var (w_{1}^{2})$ . Now, returning to the last term on the right-hand side of equation (14), note that $T_{m} (3) = m^{- 1 ∕ 2} l_{2} {({\overset{‒}{w}}_{m_{1}} \sqrt m)}^{2}$ converges to 0 in probability, as m → ∞, by the central limit theorem. This completes the proof of expression (13) and hence expression (10).

We now turn to the proof of expression (11). Now, under assumption 1, R_l(u) is continuously differentiable, and it follows that

v_{j} = \int_{0}^{1} (R_{1}^{'} (u) [I {Y_{1 j} \leq S_{\overset{‒}{d}, 1}^{- 1} (u)} - u] - R_{2}^{'} (u) [I {Y_{2 j} \leq S_{\overset{‒}{d}, 2}^{- 1} (u)} - u]) d W (u) .

Now the proof can be completed along the lines of the proof of expression (10). This completes the proof of theorem 1.

References

DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–845.
Etzioni R, Kooperberg C, Pepe M, Smith R, Gann PH. Combining biomarkers to detect disease with application to prostate cancer. Biostatistics. 2003;4:523–538. doi: 10.1093/biostatistics/4.4.523.
Etzioni R, Pepe M, Longton G, Hu C, Goodman G. Incorporating the time dimension in receiver operating characteristic curves: a case study of prostate cancer. Med. Decsn Makng. 1999;19:242–251. doi: 10.1177/0272989X9901900303.
Goddard MJ, Hinberg I. Receiver operator characteristic (roc) curves and non-normal data: an empirical study. Statist. Med. 1990;9:325–337. doi: 10.1002/sim.4780090315.
Gumbel EJ. Bivariate exponential distributions. J. Am. Statist. Ass. 1960;55:698–707.
Hanley JA, Hajian-Tilaki KO. Sampling variability of nonparametric estimates of the areas under receiver operating characteristic curves: an update. Acad. Radiol. 1997;4:49–58. doi: 10.1016/s1076-6332(97)80161-4.
Hendrick RE, Cole EB, Pisano ED, Acharyya S, Marques H, Cohen MA, Jong RA, Mawdsley GE, Kanal KM, D’Orsi CJ, Rebner M, Gatsonis C. Accuracy of soft-copy digital mammography versus that of screen-film mammography according to digital manufacturer: ACRIN DMIST retrospective multireader study. Radiology. 2008;247:38–48. doi: 10.1148/radiol.2471070418.
Janes H, Pepe M. The optimal ratio of cases to controls in a case-control for estimating the classification accuracy of a biomarker. Biostatistics. 2006;7:456–468. doi: 10.1093/biostatistics/kxj018.
Janes H, Pepe MS. Matching in studies of classification accuracy: implications for analysis, efficiency, and assessment of incremental value. Biometrics. 2008;64:1–9. doi: 10.1111/j.1541-0420.2007.00823.x.
Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman and Hall; New York: 2000.
Mazumdar M, Liu A. Group sequential design for comparative diagnostic accuracy studies. Statist. Med. 2003;22:727–739. doi: 10.1002/sim.1386.
Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, Winget M, Yasui Y. Phases of biomarker development for early detection of cancer. J. Natn. Cancer Inst. 2001;93:1054–1061. doi: 10.1093/jnci/93.14.1054.
Proschan M. Two-stage sample size re-estimation based on nuisance parameter a review. J. Biopharm. Statist. 2005;15:559–574. doi: 10.1081/BIP-200062852.
Rosenberger WF, Lachin JM. Randomization in Clinical Trials Theory and Practice. Wiley; New York: 2002.
Serfling RJ. Approximation Theorems of Mathematical Statistics. Wiley; New York: 1980.
Tang L, Emerson SS, Zhou X. Nonparametric and semiparametric group sequential methods for comparing accuracy of diagnostic tests. Biometrics. 2008;64:1137–1145. doi: 10.1111/j.1541-0420.2008.01000.x.
Wieand S, Gail MH, James BR, James KL. A family of non-parametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika. 1989;76:585–592.
Zhou X, McClish DK, Obuchowski NA. Statistical Methods in Diagnostic Medicine. Wiley; New York: 2002.
Zou K, Tempany C, Fielding J, Silverman S. Original smooth receiver operating characteristic curve estimation from continuous data: statistical methods for analyzing the predictive value of spiral ct of ureteral stones. Acad. Radiol. 1998;5:680–687. doi: 10.1016/s1076-6332(98)80562-x.
