Summary
A subjective sampling ratio between the case and the control groups is not always an efficient choice to maximize the power or to minimize the total required sample size in comparative diagnostic trials. We derive explicit expressions for an optimal sampling ratio based on a common variance structure shared by several existing summary statistics of the receiver operating characteristic curve. We propose a two-stage procedure to estimate adaptively the optimal ratio without pilot data. We investigate the properties of the proposed method through theoretical proofs, extensive simulation studies and a real example in cancer diagnostic studies.
Keywords: Area under the curve, Diagnostic accuracy, Partial area under the curve, Power, Receiver operating characteristic curve, Two-stage design
1. Introduction
Diagnostic trials evaluate the diagnostic accuracy of a marker or compare the diagnostic accuracy of two markers. For example, in a diagnostic trial by Hendrick et al. (2008), investigators compared the accuracy of digital mammography with screen film mammography. Pepe et al. (2001) referred to these trials as phase III diagnostic trials. In these trials, the true disease status of subjects is known. To evaluate the diagnostic accuracy of a binary marker, sensitivity and specificity are used. Sensitivity is the probability of having a positive test result for a case subject. Specificity is the probability of having a negative test result for a control subject. The false positive rate (FPR) is 1 − specificity. For continuous markers, we obtain the sensitivity and FPR on the basis of a threshold that distinguishes the test result as being positive or negative. Varying thresholds allow a number of sensitivities and FPRs to be computed simultaneously. The receiver operating characteristic (ROC) curve plots sensitivities against FPRs for all thresholds (Zhou et al., 1998, 2002).
Typically the ratio between the number of cases versus the number of controls is fixed in advance. A lung cancer prevention trial used an equal ratio with 71 prostate cancer cases and 71 age-matched controls (Etzioni et al., 1999). Some studies use other case–control ratios; for example, the controls who were enrolled for prostate cancer screening were four times as many as the cases in the Physicians Health Study (Etzioni et al., 2003). In a cancer diagnostic trial from Goddard and Hinberg (1990), 135 cancer patients and 218 non-cancer patients were recruited. A traditional biomarker A and newly developed diagnostic biomarkers were used to test blood samples from each subject. The power for comparing biomarkers A and D is below 45% by using the original sampling ratio of 0.62. Thus, these ratios may not be the best choice to maximize the test power, and optimal sampling ratios need to be derived and utilized to improve the power.
Janes and Pepe (2006) provided the optimal ratio for evaluating a continuous marker to maximize the power for a fixed total sample size (SS). Their method is the first attempt to address the optimal sampling ratio in diagnostic trials. However, pilot data are required to estimate the optimal ratio. Without pilot data, some distributions must be assumed to perform the calculations. An optimal ratio from an incorrect distributional assumption may lead to an underpowered study. It is desirable to recalculate the optimal ratio when data become available during the trial. In addition, optimal ratios for comparative diagnostic trials have not been discussed in the literature.
We make the following methodological contributions in this paper:
a design to update the optimal ratio for evaluating a single marker with questionable parametric assumptions;
extension of Janes and Pepe (2006) to two continuous markers and ordinal markers;
a design to update the optimal ratio for comparing two markers and rigorous proof of its properties.
In this paper we first derive a general expression for the optimal sampling ratio of cases to controls in diagnostic trials. The ratio proposed is based on a common variance structure that is shared among existing ROC summary statistics. Special cases of these statistics include the non-parametric area under the ROC curve statistic AUC that was proposed by DeLong et al. (1988) and the weighted AUC-statistic by Wieand et al. (1989). The method proposed can be used in evaluating one marker or comparing two markers. The rest of the paper is organized as follows. In Section 2, we start with the optimal ratios for diagnostic trials. In Section 3, we propose a two-stage method to incorporate the idea of internal pilot data to estimate adaptively the optimal sampling ratio. The method can be applied in trials that evaluate one marker or compare two markers. We show that, although the optimal ratio is updated during a diagnostic trial, the analysis at the end of the trial can be carried out in the same fashion as in the traditional trial without affecting the nominal type I error rate. Section 4 illustrates the increase in power and the savings on the overall required SS by using the proposed method through a cancer example. Section 5 investigates benefits of the proposed procedure through extensive simulation studies. Some discussion is provided in Section 6.
The data that are analysed in the paper can be obtained from http://wileyonlinelibrary.com/journal/rss-datasets
2. Optimal sampling ratio
Suppose that we have N subjects with m cases and n controls. Each subject is measured by diagnostic test l (l = 1, 2). We denote the measurement on the ith case by $X_{li}$, where i = 1, . . . , m, and the measurement on the jth control by $Y_{lj}$, where j = 1, . . . , n. The case measurements have joint survival function $(X_{1i}, X_{2i}) \sim S_d(x_1, x_2)$ and the control measurements have joint survival function $(Y_{1j}, Y_{2j}) \sim S_{\bar d}(y_1, y_2)$. Their marginal survival distributions are $X_{li} \sim S_{d,l}$ and $Y_{lj} \sim S_{\bar d,l}$ respectively. For the threshold c varying in (−∞, ∞), the sensitivity is $S_{d,l}(c)$ and the FPR is $S_{\bar d,l}(c)$. Consequently, the ROC curve for test l is defined as $\mathrm{ROC}_l(u) = S_{d,l}\{S_{\bar d,l}^{-1}(u)\}$, where the FPR u falls within [0, 1].
Summary measures for a single ROC curve include the area under the ROC curve, AUC, the partial AUC, pAUC, and the weighted AUC, wAUC. wAUC for marker l, $\Omega_l = \int_0^1 \mathrm{ROC}_l(u)\,dW(u)$, was given by Wieand et al. (1989), where W(u) is a probability measure. We let W(u) be a point mass at u0, an FPR, to calculate the sensitivity of a test, or W(u) = u, where u ∈ (0, 1), to obtain AUC. When $W(u) = (u - u_0)/(u_1 - u_0)$, where u ∈ (u0, u1), $\Omega_l$ gives the partial AUC.
The statistics for comparing markers might be parametric (Mazumdar and Liu, 2003), or non-parametric (DeLong et al., 1988; Wieand et al., 1989). Let θ be the parameter in the ROC comparison, and $\hat\theta$ be its estimator. On the basis of the variance expressions for these ROC statistics, we identify the following common structure for the variance of the aforementioned ROC statistics when the sample sizes become large:
$$\operatorname{var}(\hat\theta) = \frac{v_x}{m} + \frac{v_y}{n}, \tag{1}$$
where vx is the variance associated with measurements of case patients and vy is the variance related to control patients. In this paper we use the non-parametric statistics of DeLong et al. (1988) and Wieand et al. (1989). We present the variance expressions for these statistics in Sections 2.1.2 and 2.2. A similar variance structure for a conventional binormal ROC statistic of Mazumdar and Liu (2003) is presented in Appendix A.
Given the variance structure in equation (1), the total required SS in a diagnostic trial can be minimized by using an optimal sampling ratio when the variance is fixed. In other words, the power for comparing two markers can be maximized by using this optimal sampling ratio. Suppose that the total required SS in the diagnostic trial is N = m + n and the sampling ratio is r = m/n. Let the variance of $\hat\theta$ be a fixed constant, a. Since m = rn = Nr/(1 + r), it follows that
$$n = \frac{1}{a}\left(\frac{v_x}{r} + v_y\right), \qquad m = rn = \frac{1}{a}\,(v_x + r v_y).$$
The total required SS can then be expressed as
$$N = m + n = \frac{1}{a}\left\{v_x\left(1 + \frac{1}{r}\right) + v_y\,(1 + r)\right\}.$$
To minimize N, we take the first derivative with respect to r and equate it to 0. We obtain $v_y/a - v_x/(a r^2) = 0$. By solving this equation, the optimal sampling ratio is obtained as
$$r^* = \sqrt{\frac{v_x}{v_y}}. \tag{2}$$
It is worth noting that the optimal sampling ratio is analogous to the Neyman allocation ratio for clinical trials which has been widely used to reduce the overall SS for a fixed power. However, as will be seen from Sections 2.1 and 2.2, vx and vy in diagnostic trials take more complicated forms than those used in clinical trials which are commonly the variances of response variables in treatment and control groups. Interested readers are referred to Jennison and Turnbull (2000) and Rosenberger and Lachin (2002).
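The minimization behind equation (2) can be checked numerically. The following is a minimal sketch, assuming the variance structure var = v_x/m + v_y/n with m = rn; the function names and the numerical values of v_x, v_y and a are hypothetical and chosen only for illustration.

```python
import math


def optimal_ratio(v_x: float, v_y: float) -> float:
    """Case-to-control ratio r* = sqrt(v_x / v_y), as in equation (2)."""
    return math.sqrt(v_x / v_y)


def total_sample_size(r: float, v_x: float, v_y: float, a: float) -> float:
    """Total N = m + n giving variance a = v_x/m + v_y/n when m = r * n."""
    n = (v_x / r + v_y) / a
    return (1.0 + r) * n


# Hypothetical variance components and target variance (illustration only).
v_x, v_y, a = 0.08, 0.02, 0.001
r_star = optimal_ratio(v_x, v_y)

# Brute-force check: scan ratios from 0.1 to 5.0 and keep the smallest N.
grid = [k / 1000 for k in range(100, 5001)]
r_grid = min(grid, key=lambda r: total_sample_size(r, v_x, v_y, a))

n_optimal = total_sample_size(r_star, v_x, v_y, a)
n_equal = total_sample_size(1.0, v_x, v_y, a)
print(r_star, r_grid, n_optimal, n_equal)
```

With these illustrative inputs the grid search and the closed form agree, and the optimal ratio needs fewer subjects in total than the equal ratio for the same variance.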
2.1. Optimal sampling ratio for continuous markers
The difference Δ = Ω1 − Ω2 was used in Wieand et al. (1989) to compare the wAUCs for continuous data. Here the estimator of $\Omega_l$, for l = 1, 2, is obtained by substituting the empirical function estimators $\hat S_{d,l}$ for $S_{d,l}$ and $\hat S_{\bar d,l}$ for $S_{\bar d,l}$ in $\Omega_l$:
$$\hat\Omega_l = \int_0^1 \hat S_{d,l}\{\hat S_{\bar d,l}^{-1}(u)\}\,dW(u).$$
The resulting Δ-statistic is given by $\hat\Delta_{m,n} = \hat\Omega_1 - \hat\Omega_2$. Hereafter the subscripts m and n in $\hat\Delta_{m,n}$ will be omitted unless necessary and the notation $\hat\Delta$ will be used. We shall need differentiability of the ROC functions for our main theorem. The following assumption guarantees this property.
Assumption 1. Sd,l and Sd̄,l have continuous positive derivatives on . Let and denote their derivatives.
Let
and
for l=1,2, where . The variances of wi,j and vj,l are
and
Let wi = wi,1 – wi,2 and vj = vj,1 – vj,2. Further denote and . Wieand et al. (1989) and Tang et al. (2008) studied the Δ-statistic and showed that, under assumption 1, is , where is a small order term with converging to 0 in probability, as m, n → ∞ (Wieand et al. (1989), page 591). They also showed that
| (3) |
and
| (4) |
2.1.1. Optimal sampling ratio for evaluating one continuous marker
We start with one marker, say marker 1. It follows from the approximation of the Δ-statistic that the centred estimator of Ω1 is asymptotically equivalent to a combination of two sample averages, one of the wi,1 over the cases and one of the vj,1 over the controls. The wi,1s are independent, identically distributed random variables corresponding to measurements of cases and the vj,1s are independent, identically distributed random variables related to measurements of controls. Following the general expression (2), we can see that the optimal sampling ratio for evaluating marker 1 on the basis of wAUC is given by r* = √{var(wi,1)/var(vj,1)}. This ratio includes existing results for AUC by Hanley and Hajian-Tilaki (1997) and for the sensitivity by Janes and Pepe (2006). wAUC estimates AUC when W(u) = u for 0 < u < 1. Consequently, the optimal ratio becomes
i, k = 1, . . . , m, and j, l = 1, . . . , n. This can be written in terms of placement values, as shown in Janes and Pepe (2006). When W(u) = I{u = u0} for 0 < u0 < 1, the estimator of Ω1 estimates the sensitivity at the FPR u0 and the optimal ratio can be shown to reduce to
or
This has been derived in Janes and Pepe (2006).
2.1.2. Optimal sampling ratio for comparing two continuous markers
Since the wis are random variables corresponding to measurements of case patients and the vjs are also random variables related to measurements of control subjects, expression (2) gives the optimal ratio for comparing the difference between wAUCs:
$$r^* = \sqrt{\frac{\operatorname{var}(w_i)}{\operatorname{var}(v_j)}}, \tag{5}$$
where the variances are given in equations (3) and (4).
Since the Δ-statistic compares AUCs, partial AUCs or sensitivities at a particular FPR, we discuss the optimal ratios for these special cases by specifying corresponding weight functions. When we let the weight function be W(u) = u, for 0 < u < 1, the Δ-statistic compares the AUCs. The optimal ratio in equation (5) implies that the following ratio between the case and the control maximizes the power for comparing the AUCs A: , where
| (6) |
and
| (7) |
as shown in Appendix A. When W(u) = I{u = u0} for 0 < u0 < 1, the Δ-statistic compares the sensitivities of two markers at the FPR u0. The optimal ratio in equation (5) reduces to
where and
2.2. Optimal sampling ratio for ordinal markers
The variance of the Δ-statistic involves the first derivatives of the ROC curves, so the optimal ratio in equation (5) cannot be readily applied to ordinal data, which often occur in radiology. We thus consider the non-parametric statistic of DeLong et al. (1988) to obtain the optimal ratio for comparing two ordinal markers, which are usually two imaging modalities in radiology. Let $A_l = P(Y_l < X_l) + \tfrac{1}{2}P(Y_l = X_l)$ be the AUC for marker l, and $\hat A_l$ be its estimator. DeLong's statistic estimates $A_1 - A_2$ and is given as
$$\hat A_1 - \hat A_2, \qquad \hat A_l = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\psi(X_{li}, Y_{lj}),$$
where $\psi(X_{li}, Y_{lj})$ equals 1 for $Y_{lj} < X_{li}$, 1/2 for $Y_{lj} = X_{li}$ and 0 for $Y_{lj} > X_{li}$, for marker l, l = 1, 2.
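As an illustration of the kernel-based AUC estimator for ordinal data, the sketch below computes the empirical AUC of each marker and a plug-in estimate of the sampling ratio. It assumes that the case and control variance terms are the sample variances of the between-marker differences of the per-subject averages of the kernel, following the structural components of DeLong et al. (1988). The function names and the ratings are hypothetical, and the two markers are assumed to be listed in the same subject order.

```python
import math
from statistics import mean, variance


def psi(x, y):
    """Kernel: 1 if control y is below case x, 1/2 for a tie, 0 otherwise."""
    return 1.0 if y < x else (0.5 if y == x else 0.0)


def auc_hat(cases, controls):
    """Average of psi over all m * n case-control pairs."""
    return mean(psi(x, y) for x in cases for y in controls)


def components(cases, controls):
    """Per-case and per-control averages of psi (structural components)."""
    per_case = [mean(psi(x, y) for y in controls) for x in cases]
    per_control = [mean(psi(x, y) for x in cases) for y in controls]
    return per_case, per_control


def estimated_ratio(cases1, controls1, cases2, controls2):
    """Plug-in sqrt(v_x / v_y) for comparing two markers (an assumption)."""
    c1, d1 = components(cases1, controls1)
    c2, d2 = components(cases2, controls2)
    v_x = variance([a - b for a, b in zip(c1, c2)])  # case part
    v_y = variance([a - b for a, b in zip(d1, d2)])  # control part
    return math.sqrt(v_x / v_y)


# Hypothetical 5-point ratings of two markers on 6 cases and 6 controls.
cases_1, controls_1 = [3, 4, 5, 5, 2, 4], [1, 2, 3, 2, 1, 4]
cases_2, controls_2 = [2, 4, 3, 5, 3, 2], [1, 3, 2, 2, 4, 3]

auc_1 = auc_hat(cases_1, controls_1)
auc_2 = auc_hat(cases_2, controls_2)
delta_hat = auc_1 - auc_2
r_hat = estimated_ratio(cases_1, controls_1, cases_2, controls_2)
print(auc_1, auc_2, delta_hat, r_hat)
```

Ties contribute 1/2 to the kernel, which is what makes the estimator usable for ordinal ratings.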
DeLong et al. (1988) showed that the large sample variance of DeLong's statistic has the form $v_x/m + v_y/n$, with
from the cases, and
from the controls. Therefore, it follows from equation (2) that the ratio $r^* = \sqrt{v_x/v_y}$ maximizes the power for comparing two ordinal markers. For the problem of evaluating a single ordinal marker, the optimal ratio is reduced to
3. Two-stage procedure to obtain the optimal ratio
We may assume a parametric model to obtain the variances and resulting optimal ratios derived in the preceding section. When a parametric model is correctly specified, the optimal ratio can be calculated from equation (2) for comparing ROC summary measures, and the SS to obtain a specified power can be subsequently derived. However, if the parametric model is misspecified, the SS calculated may not give the appropriate power. Fig. 1 shows the optimal ratios for comparing the AUCs and pAUCs with the case and control having different variances. The case and control observations are from the bivariate normal distributions with (X1, X2) ~ N{(2, 2), Σx} and (Y1, Y2) ~ N{(0, 0), Σy}, where Σx has diagonal elements 1 and off-diagonal elements 0.1, and Σy has diagonal elements σY² and off-diagonal elements 0.1σY². We see that the optimal ratio decreases as σY increases from 0.8 to 1.3. This indicates that the variances of the case and the control play an important role in the optimal ratio. When the variance for the control is larger than that for the case, the optimal ratio of cases to controls becomes smaller than 1, indicating that sampling more controls than cases yields a better power to detect a difference between markers. Thus, the misspecification of parametric models at the planning stage may lead to an incorrect optimal ratio.
Fig. 1.
Optimal sampling ratio for comparing (a) the AUCs or (b) pAUCs: the case and control observations are from the bivariate normal distributions with (X1, X2) ~ N{(2, 2), Σx} and (Y1, Y2) ~ N{(0, 0), Σy}, where Σx has diagonal elements 1 and off-diagonal elements 0.1, and Σy has diagonal elements σY² and off-diagonal elements 0.1σY²; the pAUCs are obtained over the FPR between 0 and 0.6
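For a fixed sample two-sided hypothesis test to detect the difference between ROC summary measures, the required SSs m and n with power 1 − β and significance level α are given by
$$m = \frac{(z_{\alpha/2} + z_{\beta})^2\,(v_x + r v_y)}{\Delta_1^2}, \qquad n = \frac{m}{r}, \tag{8}$$
where $z_p$ is the upper p-quantile of the standard normal distribution, r = m/n is the sampling ratio and Δ1 is the difference between ROC summary measures under the alternative hypothesis to be detected. The total required SS is N = m + n.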
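The sample size calculation can be sketched as follows, assuming the usual normal-approximation form m = (z_{α/2} + z_β)²(v_x + r v_y)/Δ1² with n = m/r, which follows from the variance structure (1). The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def required_sizes(v_x, v_y, r, delta_1, alpha=0.05, power=0.80):
    """Unrounded numbers of cases m and controls n at ratio r = m / n."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    m = z ** 2 * (v_x + r * v_y) / delta_1 ** 2
    return m, m / r


# Hypothetical variance components and target difference.
v_x, v_y, delta_1 = 0.08, 0.02, 0.05
r_star = math.sqrt(v_x / v_y)

m_eq, n_eq = required_sizes(v_x, v_y, 1.0, delta_1)
m_opt, n_opt = required_sizes(v_x, v_y, r_star, delta_1)
print(math.ceil(m_eq + n_eq), math.ceil(m_opt + n_opt))
```

For these inputs the optimal ratio needs 90% of the subjects that the equal ratio needs for the same power, because the total is proportional to (v_x + r v_y)(1 + 1/r).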
Proschan (2005) introduced the concept of internal pilot data which often refers to accumulated data after a trial has been carried out for a certain period of time. To correct for the model misspecification at the beginning of the trial, we propose a two-stage procedure to use internal pilot data after some observations are available during the trial. Suppose that the total required SS N is fixed. Without loss of generality, we use a two-sided test in the procedure proposed. The procedure is given in the following steps.
Step 1: specify a parametric model to obtain vx,0 and vy,0, and the resulting initial optimal ratio $r_0^* = \sqrt{v_{x,0}/v_{y,0}}$.
Step 2: use the ratio $r_0^*$ together with vx,0 and vy,0 in the following SS formula to calculate initial SS m0 and n0 with power 1 − β and significance level α, $m_0 = (z_{\alpha/2} + z_{\beta})^2 (v_{x,0} + r_0^* v_{y,0})/\Delta_1^2$, and n0 = N − m0, where Δ1 is the difference between ROC summary measures under the alternative hypothesis.
Step 3: after sufficient marker measurements are available on m1 cases and n1 controls at the first stage, the variance expressions of either the Δ-statistic or DeLong's statistic are recalculated by using available data. These variance estimators, $\hat v_{x,1}$ and $\hat v_{y,1}$, are applied in equation (2) to recalculate the optimal ratio $\hat r^* = \sqrt{\hat v_{x,1}/\hat v_{y,1}}$.
Step 4: continue the trial by recruiting M2 cases and N2 controls, where M2 and N2 are given by
$$M_2 = \frac{N \hat r^*}{1 + \hat r^*} - m_1, \qquad N_2 = \frac{N}{1 + \hat r^*} - n_1. \tag{9}$$
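The four steps can be sketched in code as below. This is only an illustration under stated assumptions: the second-stage sizes are taken as the updated allocation of the fixed total N minus the first-stage counts, sizes are rounded to integers, the initial controls are set as n0 = m0/r0, and a floor at zero is added in case the first stage already exceeds the updated allocation. The function names and all numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def initial_design(v_x0, v_y0, delta_1, alpha=0.05, power=0.80):
    """Steps 1-2: initial ratio and sizes from assumed variance components."""
    r0 = math.sqrt(v_x0 / v_y0)
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    m0 = math.ceil(z ** 2 * (v_x0 + r0 * v_y0) / delta_1 ** 2)
    n0 = math.ceil(m0 / r0)  # assumption: controls follow from the ratio
    return r0, m0, n0


def second_stage(total_n, m1, n1, v_x1, v_y1):
    """Steps 3-4: re-estimated ratio and remaining recruitment."""
    r_hat = math.sqrt(v_x1 / v_y1)
    m_total = round(total_n * r_hat / (1 + r_hat))
    n_total = total_n - m_total
    # Assumption: recruitment cannot be negative if stage 1 overshoots.
    return r_hat, max(m_total - m1, 0), max(n_total - n1, 0)


# Hypothetical planning values: equal variance components assumed at first.
r0, m0, n0 = initial_design(v_x0=0.04, v_y0=0.04, delta_1=0.05)
total_n = m0 + n0
m1 = n1 = total_n // 4

# Hypothetical interim variance estimates from the first stage.
r_hat, cases_stage2, controls_stage2 = second_stage(
    total_n, m1, n1, v_x1=0.09, v_y1=0.04
)
print(r0, total_n, r_hat, cases_stage2, controls_stage2)
```

In this illustration the interim data shift the allocation towards cases while the total number of subjects stays at the planned N.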
It was shown in Proschan (2005) that using the internal pilot data for comparing population means in clinical trials maintains the nominal type I error rate. The reason is that the sample variance that is obtained at the end of the first stage does not give information for the sample mean at the end of the trial. We show that it is also true in our case as m, n → ∞. Suppose that $\hat\Delta_{m,n}$ is the estimated Δ at the end of the second stage with m cases and n controls. The variance estimators at the first stage, v̂x,1 and v̂y,1, are computed from the wi and vj of Section 2.1 for the m1 cases and n1 controls. We first state the results for w̄m and v̄n in the following theorem, and then state the result for the Δ-statistic in the subsequent corollary. The proof is provided in Appendix A.
Theorem 1. Let H0: Ω1 = Ω2. Assume that m, n → ∞, m1/m → λ1, n1/n → λ2 and m/n → λ, where 0 < λ < ∞ and 0 < λ1, λ2 < 1. Then,
| (10) |
where and . Also, under assumption 1,
| (11) |
Theorem 1 implies that w̄m is asymptotically independent of v̂x,1, and v̄n is asymptotically independent of v̂y,1. We also observe that both w̄m and v̂x,1 are obtained on different subjects from v̄n and v̂y,1. Thus, we can obtain the following corollary for the Δ-statistic by ignoring the small order term in its approximation.
Corollary 1. Under the regularity conditions in theorem 1, the Δ-statistic is asymptotically independent of v̂x,1 and v̂y,1 as m, n → ∞.
Therefore, the variance estimated at the first stage does not give information for the Δ-statistic at the end for large SSs. Thus, the resulting optimal ratio by using data from the first stage does not reveal information about the estimated difference between two ROC statistics obtained at the end of the second stage. Consequently, although the optimal ratio is updated during the trial, the analysis at the end of the trial can be carried out in the same fashion as in the trial without updating the optimal ratio. This is important in maintaining the proper type I error rate.
4. Application to the cancer diagnostic trial
We applied our method to the cancer diagnostic trial from Goddard and Hinberg (1990). Measurements from the blood samples are highly skewed for all biomarkers. We compared a new biomarker D and the reference biomarker A to illustrate the increment in power and the SS savings by using the procedure proposed. We assumed a contrast of Δ1 =0.05 between AUCs and the type I error rate 0.05 for calculating power and SS from a two-sided alternative. The overall SS N is 353 by summing the numbers of cases and controls. At the first stage, we accrued data on m1 = 60 cancer and n1 = 60 non-cancer patients and obtained the variance estimates v̂x,1 = 0.082 and v̂y,1 = 0.035, which resulted in the optimal case–control ratio r̂* = 1.53, from equation (2). Using this optimal ratio in the expression (9) in step 4 of the procedure proposed, the numbers of the cases and controls to be recruited in the second stage were calculated to be 153 and 80 respectively. The power by using the optimal ratio was then 50.9% from the equation
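$$\text{power} = \Phi\left\{\frac{\Delta_1 \sqrt{N}}{\sqrt{(1 + \hat r^*)(\hat v_{x,1}/\hat r^* + \hat v_{y,1})}} - z_{\alpha/2}\right\}. \tag{12}$$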
This power offers a 7% increment over the power 43.8% calculated by using equation (12) by replacing r̂* with the original ratio of 0.62. We also investigated the savings on the overall SS by using the procedure proposed. Using the original power 43.8% with the estimated optimal ratio r̂* = 1.53, the overall SS was calculated to be 292 with 177 cancer patients and 115 non-cancer patients. This offers savings of 61 patients over the original ratio.
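The figures in this section can be approximately reproduced with a short calculation. The sketch below assumes the normal-approximation power Φ{Δ1/√(v_x/m + v_y/n) − z_{α/2}}, ignores the negligible opposite tail, rounds group sizes to the nearest integer and uses the ratio rounded to two decimals as reported. The function names are hypothetical, and small discrepancies from the published values can arise from the rounding of the variance estimates.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power(m, n, v_x, v_y, delta_1, alpha=0.05):
    """Approximate two-sided power, ignoring the opposite tail."""
    se = math.sqrt(v_x / m + v_y / n)
    return Z.cdf(delta_1 / se - Z.inv_cdf(1 - alpha / 2))


def sizes_for_power(target, r, v_x, v_y, delta_1, alpha=0.05):
    """Cases and controls needed for a target power at ratio r = m / n."""
    z = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(target)
    m = z ** 2 * (v_x + r * v_y) / delta_1 ** 2
    return round(m), round(m / r)


v_x, v_y, delta_1, total_n = 0.082, 0.035, 0.05, 353
r_hat = round(math.sqrt(v_x / v_y), 2)  # ratio rounded to two decimals

# Allocation of the fixed total, then subtract the 60 + 60 from stage 1.
m_total = round(total_n * r_hat / (1 + r_hat))
n_total = total_n - m_total
m2, n2 = m_total - 60, n_total - 60

power_opt = power(m_total, n_total, v_x, v_y, delta_1)
power_orig = power(135, 218, v_x, v_y, delta_1)

# Sample size that keeps the original power but uses the estimated ratio.
m_req, n_req = sizes_for_power(power_orig, r_hat, v_x, v_y, delta_1)
print(m2, n2, power_opt, power_orig, m_req, n_req)
```

Under these assumptions the second-stage sizes and the reduced total match the values quoted in the text, and the two powers are close to the reported 50.9% and 43.8%.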
5. Simulation studies
In this section, we demonstrate the performance of our method for maximizing power or minimizing total SSs when comparing summary statistics of diagnostic tests in extensive simulation studies. We consider both continuous data and ordinal data.
5.1. Simulation studies based on continuous data
The biomarker results in the example used in Section 4 are highly skewed, and a log-normal distribution was used by Goddard and Hinberg (1990) as a possible approximation to the distribution of results. Thus, we consider bivariate log-normal distributions in the simulation studies. In addition, we simulate data from both bivariate normal distributions which are commonly used for symmetrically distributed marker results and bivariate exponential distributions which can approximate survival biomarker results. The bivariate normal models have the forms (X1, X2) ~ N{(μ1, μ2), ΣX} and (Y1, Y2) ~ N{(0, 0), ΣY}, where the diagonal elements of ΣX and ΣY are 1 and 9 respectively, and the correlation parameter ρ is the same for the two matrices. We choose ρ = 0.1 and ρ = 0.25 in our simulations. AUC is set to be 0.70 for marker 1, and 0.75 or 0.80 for marker 2. pAUC with the FPR in the range (0, 0.6) is set to be 0.30 for marker 1, and 0.35 or 0.40 for marker 2. The bivariate log-normal models have the forms exp(X1, X2) and exp(Y1, Y2) for cases and controls respectively. The AUCs and pAUCs remain the same as in the normal models. The log-normal distribution may also demonstrate the robustness of the aforementioned non-parametric methods. The performance of the methods is expected to be similar for the normal and log-normal distributions because the non-parametric estimators should remain invariant under monotone transformations.
According to the algorithm in Gumbel (1960), the bivariate exponential random variables take the form
$$H(x, y) = H_1(x)\,H_2(y)\,[1 + 4\rho\{1 - H_1(x)\}\{1 - H_2(y)\}],$$
where Hl, l = 1, 2, are univariate exponential distribution functions, and ρ is in [−0.25, 0.25]. We set ρ to 0.1 or 0.25 here. The marginal survival functions are exp(−βl1x) and exp(−βl2y), so we could generate data from these two distributions. In the simulation, AUC is set to 0.70 for marker 1, and 0.75 or 0.80 for marker 2. pAUC with the FPR in the range (0, 0.6) is set to 0.30 for marker 1, and 0.35 or 0.45 for marker 2.
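One way to draw pairs from a Gumbel-type bivariate exponential distribution is conditional inversion. The sketch below assumes the joint distribution function H(x, y) = H1(x)H2(y)[1 + 4ρ{1 − H1(x)}{1 − H2(y)}] with exponential margins; it is not necessarily the generator used for the simulations reported here, and the function names, rates and seed are hypothetical.

```python
import math
import random


def sample_pair(rate_1, rate_2, rho, rng):
    """Draw (x, y) with exponential margins and dependence parameter 4 * rho."""
    alpha = 4.0 * rho
    u = rng.random()  # uniform value for the first margin
    w = rng.random()  # uniform value for the conditional distribution
    # Given u, the conditional CDF of v is v * (1 + a * (1 - v)).
    a = alpha * (1.0 - 2.0 * u)
    b = 1.0 + a
    # Root of a*v^2 - b*v + w = 0 in [0, 1], written to stay stable at a = 0.
    v = 2.0 * w / (b + math.sqrt(b * b - 4.0 * a * w))
    v = min(v, 1.0 - 2.0 ** -53)  # guard against log(0) from rounding
    x = -math.log(1.0 - u) / rate_1
    y = -math.log(1.0 - v) / rate_2
    return x, y


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((p - mx) * (q - my) for p, q in zip(xs, ys))
    sxx = sum((p - mx) ** 2 for p in xs)
    syy = sum((q - my) ** 2 for q in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2024)
pairs = [sample_pair(1.0, 2.0, 0.25, rng) for _ in range(20000)]
xs, ys = zip(*pairs)
sample_rho = pearson(xs, ys)
print(sum(xs) / len(xs), sum(ys) / len(ys), sample_rho)
```

For exponential margins this family has correlation equal to ρ, which is why ρ is restricted to [−0.25, 0.25]; the sample correlation printed above should be near the chosen ρ.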
We compare the proposed two-stage procedure with the equal case–control ratio and the optimal ratio. We use DeLong's statistic for comparing the AUCs and the Δ-statistic for comparing the pAUCs. In our simulation, we first assume that our samples were from bivariate normal distributions; then we use equation (8) to calculate the initial total required SS. With the type I error rate 0.05 and power 80%, the initial total required SSs are N = 1421 and N = 326 to detect the difference of two pairs of AUCs of (0.70, 0.75) and (0.70, 0.80) respectively, with ρ = 0.1. When ρ = 0.25, total required SSs of N = 1207 and N = 278 are needed to detect the difference in these pairs. There are three different sampling ratios:
the proposed two-stage optimal ratio;
the optimal ratio of 0.5 for the normal and log-normal distributions and the optimal ratio of 1.5 for the exponential distributions;
the equal sampling ratio.
To implement the method proposed, we let m1 = n1 = N/4. By substituting non-parametric variance estimates v̂x,1 and v̂y,1, the resulting optimal ratio is estimated by r̂* = √(v̂x,1/v̂y,1), and M2 and N2 are calculated by using equation (9). We then generate M2 new observations for cases and N2 observations for controls. Consequently, the null hypothesis of equal AUCs or pAUCs is rejected in favour of the alternative if the absolute value of the calculated Z-statistic is greater than or equal to z0.025. The simulated power is then calculated as the percentage of times out of 5000 simulation runs that the null hypothesis is rejected. The simulated powers are presented in Table 1, which illustrates that the simulated powers of the two-stage method proposed are close to those of the optimal ratio and are greater than those of the equal sampling ratio in the normal settings. Since the optimal ratio for the exponential distribution specified is close to 1.5, we see that most of the powers of the method proposed are greater than those of fixed ratios.
Table 1.
Simulated power for comparing AUCs or pAUCs by using the two-stage method proposed and fixed ratios, over 5000 simulations†
| ρ | Distribution | AUC for marker 2 | AUC power (%): two-stage | AUC power (%): fixed equal ratio | AUC power (%): fixed optimal ratio | pAUC for marker 2 | pAUC power (%): two-stage | pAUC power (%): fixed equal ratio | pAUC power (%): fixed optimal ratio |
|---|---|---|---|---|---|---|---|---|---|
| 0.10 | BN | 0.75 | 80.0 | 77.0 | 79.5 | 0.35 | 33.3 | 31.4 | 32.8 |
| | | 0.80 | 80.4 | 74.6 | 80.2 | 0.45 | 88.1 | 86.2 | 88.9 |
| | LN | 0.75 | 79.1 | 74.5 | 78.3 | 0.35 | 34.0 | 31.9 | 32.1 |
| | | 0.80 | 80.5 | 74.8 | 79.4 | 0.45 | 89.1 | 85.3 | 88.0 |
| | BE | 0.75 | 81.0 | 80.4 | 82.2 | 0.35 | 84.0 | 83.4 | 84.6 |
| | | 0.80 | 81.6 | 80.0 | 82.7 | 0.45 | 85.0 | 84.0 | 84.4 |
| 0.25 | BN | 0.75 | 82.2 | 78.0 | 81.7 | 0.35 | 37.3 | 34.8 | 37.5 |
| | | 0.80 | 81.0 | 77.6 | 80.1 | 0.45 | 91.8 | 89.4 | 92.1 |
| | LN | 0.75 | 82.0 | 78.7 | 81.5 | 0.35 | 37.0 | 34.2 | 36.7 |
| | | 0.80 | 83.5 | 78.1 | 82.6 | 0.45 | 92.7 | 89.8 | 92.3 |
| | BE | 0.75 | 83.7 | 82.6 | 83.3 | 0.35 | 91.2 | 90.3 | 91.0 |
| | | 0.80 | 83.6 | 82.8 | 84.4 | 0.45 | 90.9 | 90.7 | 90.8 |

†AUC for marker 1 is 0.70, and pAUC for marker 1 is 0.30. BN, bivariate normal; LN, bivariate log-normal; BE, bivariate exponential. ρ is the correlation coefficient of two markers. The optimal ratios for the bivariate normal and log-normal distributions are close to 0.5, and the optimal ratios for the bivariate exponential distribution are close to 1.5.
We also conduct simulation studies to illustrate that the method proposed reduces the total SS compared with the equal ratio. The aforementioned bivariate normal distribution is applied to simulate test results. We first calculate the initial total SS N with the equal ratio, type I error rate 0.05 and power 80%. At the end of stage I, with m1 = n1 simulated test results from the two groups, we update the case/control ratio with the estimated optimal ratio from the interim data, and recalculate the total SS that is needed to achieve 80% power on the basis of the estimated ratio. Additional test results are then generated according to the updated SS in the two groups, and the Z-statistic is calculated. The null hypothesis of equal AUCs is rejected in favour of the alternative if the absolute value of the calculated Z-statistic is greater than or equal to z0.025. The simulated power is given by the percentage of times out of 5000 simulation runs that the null hypothesis is rejected. The simulated power and the average updated total SS with m1 = n1 = N/5 and with m1 = n1 = N/7 are presented in Table 2, which illustrates that the two-stage method proposed reduces the total SS compared with the equal ratio. The simulated power of the two-stage method proposed is close to the nominal power for all parameterizations. In addition, the simulated power and updated SS vary little with different sizes at stage I.
Table 2.
Average updated total SS and simulated power for comparing AUCs by using the proposed two-stage method over 5000 simulations†
| ρ | AUC for marker 2 | Initial SS (m1 = n1 = N/5) | Updated SS (m1 = n1 = N/5) | Power (%) (m1 = n1 = N/5) | AUC for marker 2 | Initial SS (m1 = n1 = N/7) | Updated SS (m1 = n1 = N/7) | Power (%) (m1 = n1 = N/7) |
|---|---|---|---|---|---|---|---|---|
| 0.10 | 0.75 | 1744 | 1333 | 80.9 | 0.75 | 1744 | 1335 | 80.4 |
| | 0.80 | 405 | 311 | 80.6 | 0.80 | 405 | 313 | 79.1 |
| 0.25 | 0.75 | 1527 | 1160 | 80.3 | 0.75 | 1527 | 1161 | 80.2 |
| | 0.80 | 357 | 273 | 81.5 | 0.80 | 357 | 275 | 79.8 |

†The AUC for marker 1 is 0.70. ρ is the correlation coefficient of two markers.
We also evaluate the performance of the two-stage procedure to see whether the procedure maintains the nominal type I error rate. We use N = 200, 400, 500. We consider the parametric distributions and the three different sampling ratios that were used in the previous simulation. We assume equal AUCs or pAUCs with the AUCs being (0.70, 0.75, 0.80), and the pAUCs being (0.30, 0.35, 0.40). The nominal type I error rate is 0.05 in our simulation. The simulated type I error rates with 10000 simulation runs are shown in Table 3. All these rates are close to the nominal level when the sample size goes to 500.
Table 3.
Type I error rates for comparing the AUCs or pAUCs by using the two-stage method proposed, over 10000 simulations†
| ρ | Distribution | AUCs | AUC error rate (%): N = 200 | AUC error rate (%): N = 400 | AUC error rate (%): N = 500 | pAUCs | pAUC error rate (%): N = 200 | pAUC error rate (%): N = 400 | pAUC error rate (%): N = 500 |
|---|---|---|---|---|---|---|---|---|---|
| 0.10 | BN | 0.70 | 4.5 | 5.0 | 5.0 | 0.30 | 4.8 | 5.1 | 5.0 |
| | | 0.75 | 5.1 | 5.0 | 4.9 | 0.35 | 5.0 | 4.9 | 5.0 |
| | | 0.80 | 4.9 | 5.1 | 4.9 | 0.40 | 5.2 | 5.1 | 5.5 |
| | LN | 0.70 | 4.9 | 5.0 | 5.0 | 0.30 | 4.6 | 5.2 | 5.1 |
| | | 0.75 | 5.1 | 4.9 | 5.1 | 0.35 | 4.6 | 5.1 | 5.0 |
| | | 0.80 | 5.0 | 4.4 | 5.0 | 0.40 | 5.0 | 5.1 | 4.9 |
| | BE | 0.70 | 5.0 | 5.1 | 5.0 | 0.30 | 5.2 | 4.9 | 5.0 |
| | | 0.75 | 5.0 | 4.9 | 4.9 | 0.35 | 5.3 | 5.0 | 5.1 |
| | | 0.80 | 5.2 | 5.1 | 4.9 | 0.40 | 4.7 | 4.9 | 5.1 |
| 0.25 | BN | 0.70 | 4.9 | 4.8 | 4.7 | 0.30 | 5.1 | 5.0 | 4.9 |
| | | 0.75 | 5.1 | 5.0 | 5.0 | 0.35 | 5.2 | 5.3 | 5.2 |
| | | 0.80 | 5.2 | 5.1 | 5.0 | 0.40 | 4.9 | 5.1 | 5.1 |
| | LN | 0.70 | 5.0 | 5.1 | 5.0 | 0.30 | 5.1 | 5.3 | 5.2 |
| | | 0.75 | 4.9 | 4.7 | 4.9 | 0.35 | 4.5 | 5.0 | 4.8 |
| | | 0.80 | 4.8 | 3.9 | 5.0 | 0.40 | 4.6 | 4.8 | 5.0 |
| | BE | 0.70 | 4.2 | 5.0 | 5.2 | 0.30 | 5.0 | 5.0 | 4.8 |
| | | 0.75 | 5.3 | 5.0 | 4.9 | 0.35 | 5.1 | 4.7 | 5.0 |
| | | 0.80 | 4.2 | 5.1 | 4.9 | 0.40 | 4.9 | 4.7 | 5.0 |

†BN, bivariate normal distribution; LN, bivariate log-normal distribution; BE, bivariate exponential distribution. N is the total required SS and ρ is the correlation coefficient of two markers.
5.2. Simulation studies based on ordinal data
We also conduct simulation studies to evaluate the simulated power of the proposed method on ordinal test results. We first use the aforementioned bivariate log-normal distributions and bivariate exponential distributions to simulate continuous results. We then use the 20th, 40th, 60th and 80th percentiles of the distributions to categorize the simulated continuous data as follows. A test result is recoded as 1 if it is less than the 20th percentile, 2 if it is between the 20th and 40th percentiles, 3 if it is between the 40th and 60th percentiles, 4 if it is between the 60th and 80th percentiles, and 5 if it is greater than the 80th percentile. The rest of the simulated settings are identical to those in the previous section on evaluating the power for continuous data. The results in Table 4 indicate that the simulated power by using the method proposed is similar to that of the optimal ratios and is higher than for those parameterizations using the equal ratio.
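The recoding into five ordered categories can be sketched as below. The sketch assumes that the cut points are the theoretical 20th, 40th, 60th and 80th percentiles of a single log-normal distribution and that a value exactly equal to a cut point falls in the upper category; neither detail is specified in the text, and the function names and inputs are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import NormalDist


def lognormal_cut_points(mu=0.0, sigma=1.0, probs=(0.2, 0.4, 0.6, 0.8)):
    """Percentiles of exp(Z) for Z ~ N(mu, sigma^2)."""
    normal = NormalDist(mu, sigma)
    return [math.exp(normal.inv_cdf(p)) for p in probs]


def to_ordinal(values, cut_points):
    """Map each value to 1..len(cut_points) + 1 by counting cut points <= value."""
    return [bisect_right(cut_points, v) + 1 for v in values]


cuts = lognormal_cut_points()
ratings = to_ordinal([0.1, 0.5, 1.0, 1.5, 10.0], cuts)
print([round(c, 3) for c in cuts], ratings)
```

The resulting ratings can then be analysed with the kernel that scores ties as 1/2, since categorization creates many tied case-control pairs.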
Table 4.
Simulated power for ordinal data for comparing AUCs by using the two-stage method proposed and fixed ratios, over 5000 simulations†
| ρ | Distribution | AUC for marker 2 | Two-stage power (%) | Fixed equal ratio power (%) | Fixed optimal ratio power (%) |
|---|---|---|---|---|---|
| 0.10 | LN | 0.75 | 81.2 | 75.3 | 79.7 |
| | | 0.80 | 82.0 | 77.3 | 83.2 |
| | BE | 0.75 | 87.1 | 84.7 | 88.6 |
| | | 0.80 | 86.6 | 84.5 | 86.4 |
| 0.25 | LN | 0.75 | 80.9 | 77.6 | 80.0 |
| | | 0.80 | 80.0 | 78.6 | 80.2 |
| | BE | 0.75 | 89.0 | 87.1 | 88.6 |
| | | 0.80 | 88.1 | 87.2 | 88.9 |

†The AUC for marker 1 is 0.70. LN, bivariate log-normal distribution; BE, bivariate exponential distribution. ρ is the correlation coefficient of two markers.
6. Discussion
The optimal sampling ratio in diagnostic trials can maximize the test power or minimize the overall SS. The optimal sampling ratio that is discussed in this paper is analogous to the optimal allocation ratio in assigning treatments to patients in clinical trials. The optimal allocation ratio has been used in clinical trials for decades, but the importance of the optimal ratio in diagnostic trials has not been widely recognized. Implementation requires the calculation of complicated variances of frequently used ROC statistics. This paper discusses a common variance structure for ROC statistics and thereby introduces optimal sampling ratios in comparative diagnostic trials based on these statistics. Two popular non-parametric ROC statistics are used to illustrate the explicit forms of the optimal ratios because their variance expressions can be written as the sum of separate terms; one relates to the cases, and the other relates to the controls.
If preliminary studies are available before carrying out a comparative diagnostic trial, the variance can be estimated by using pilot data to obtain the optimal ratio for comparing specified ROC summary measures. The ratio can then be used to recruit patients in the trial, and recalculating the ratio may not be necessary during the trial. However, when medical practitioners do not have preliminary data for the markers and are not certain about the distributions of the marker results, the distribution assumption that is used for obtaining the optimal ratio may be far from the true underlying distributions for the marker results. This may result in less power or larger overall SSs than using the true optimal ratio. The two-stage procedure proposed is then particularly useful to ensure that the optimal ratio can be recalculated by using internal pilot data during the trial. The procedure proposed performs well in a large-scale simulation study. We also demonstrate that the procedure proposed maintains the nominal type I error rate in the simulation. We use an example in cancer diagnostic studies to illustrate the application of our method on maximizing the test power and saving overall SSs. The results indicate that, compared with the original sampling ratio, using the proposed two-stage procedure for a fixed overall sample size increased the test power. Alternatively, for the fixed test power, the procedure proposed reduces the overall SS by nearly 25%.
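The second-stage recruitment implied by a re-estimated ratio can be sketched as follows. This is an illustration under stated assumptions rather than the exact rule of the proposed procedure: the function name `stage_two_recruitment`, the first-stage sizes of 50 cases and 50 controls, and the rounding rule are all hypothetical. The total of 353 and the ratio of 1.53 are borrowed from the cancer example only to give the numbers some context.

```python
def stage_two_recruitment(total_n, m1, n1, ratio_hat):
    """Additional cases and controls to recruit after the first stage.

    total_n   : planned overall sample size
    m1, n1    : cases and controls already recruited in stage 1
    ratio_hat : cases-to-controls ratio estimated from the stage 1 data
    """
    if m1 + n1 > total_n:
        raise ValueError("stage 1 already exceeds the planned total")
    # Overall target number of cases implied by the estimated ratio.
    target_cases = round(total_n * ratio_hat / (1.0 + ratio_hat))
    # Subjects already recruited cannot be removed, so clamp the target.
    target_cases = min(max(target_cases, m1), total_n - n1)
    target_controls = total_n - target_cases
    return target_cases - m1, target_controls - n1

# Hypothetical stage 1 of 50 cases and 50 controls, overall total of 353.
extra_cases, extra_controls = stage_two_recruitment(353, 50, 50, 1.53)
print(extra_cases, extra_controls)  # 163 90
```

Clamping the target keeps the design feasible when the estimated ratio would call for fewer subjects in one group than the first stage has already enrolled.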
In some rare diseases, it may not be possible to recruit the required number of cases. Suppose that only 135 cancer patients can be recruited in the aforementioned cancer diagnostic trial. If the calculated optimal ratio of 1.53 is maintained, then 89 non-cancer patients should be in the trial, which leads to a total SS of N = 224. Using the power calculation formula (12) gives a power of 35.2%, which sacrifices 8% power while reducing the SS by 129. This indicates that, for a fixed number of cases, recruiting more controls may increase the power if the budget of a trial allows. This can be seen from the variance expression (1): when m is fixed, the variance decreases as n increases. Thus, under the constraint of 353 subjects in total and 135 cases, the original sampling ratio of 0.62 (135/218) gives the maximum power.
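The arithmetic in this example can be checked with a few lines of code. The sketch assumes that the number of controls is rounded up to the next integer, which reproduces the 89 quoted in the text, and that the variance has the two-term form vx/m + vy/n. The values vx = vy = 1 are hypothetical and serve only to show the direction of the effect.

```python
import math

cases = 135            # cancer patients available
optimal_ratio = 1.53   # calculated optimal cases-to-controls ratio
original_total = 353   # overall sample size of the original trial

# Controls needed to keep the optimal ratio, rounded up (assumed rounding rule).
controls = math.ceil(cases / optimal_ratio)
total = cases + controls
reduction = original_total - total
print(controls, total, reduction)  # 89 224 129

def variance(vx, vy, m, n):
    """Two-term variance structure assumed for this sketch."""
    return vx / m + vy / n

# With the number of cases fixed, adding controls can only lower the variance.
vx, vy = 1.0, 1.0  # hypothetical variance components
variances = [variance(vx, vy, cases, n) for n in (89, 150, 218)]
print(variances)
```

The variance falls as n grows from 89 to 218, which is why the larger control group yields more power when the number of cases cannot be increased.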
The characteristics of subjects are often matched in case–control studies to minimize the confounding effects. Janes and Pepe (2008) illustrated that a ROC summary estimate without adjusting for covariates may be biased. If covariate information is available, matching should be considered for deriving the optimal sampling ratio. Future research on this topic is warranted.
Acknowledgements
The authors thank the Associate Editor, the Joint Editor and a referee for their constructive comments. The authors also thank their colleague Anand Vidyashankar for many useful suggestions that led to an improvement in this paper. The project described here was supported by award R15CA150698 from the National Cancer Institute under the American Recovery and Reinvestment Act of 2009 and by award H98230-11-1-0196 from the National Security Agency.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health.
Appendix A: Variance expressions of receiver operating characteristic statistics, variance derivation and proof of theorem 1
A.1. Variance expressions for a parametric receiver operating characteristic statistic
When measurements of markers have bivariate normal distributions, Mazumdar and Liu (2003) provided expressions for the variances, vx and vy. Suppose that (X1i, X2i) ~ N{(μ1d, μ2d), Σd} for i = 1, . . . , m and (Y1j, Y2j) ~ N{(μ1d̄, μ2d̄), Σd̄} for j = 1, . . . , n, where
and
The statistic considered in Mazumdar and Liu (2003) is the partial AUC estimator given by
where α1 = (μ1d – μ1d̄)/σ1d, β1 = σ1d̄/σ1d, α2 = (μ2d – μ2d̄)/σ2d and β2 = σ2d̄/σ2d. Let
and
with and .
Let
and fl2 = –fl1, for l = 1, 2. In addition, let
and
The variances vx and vy for can be written as
and
A.2. Derivation of vx and vy
We can show that
can be expressed as
Let and ; then we have
Similarly, vy becomes
It follows that
Let and ; then it follows that
Because
can also be written as
the expressions for vx and vy are simplified to equations (6) and (7) respectively.
A.3. Proof of theorem 1
Recall that , and are the sample means at the end of the trial and E(w̄m) = E(v̄n) = 0, where m and n are the SSs at the end of the trial. The variance estimators at the end of the first stage are and , where m1 and n1 are SSs at the end of the first stage. Let and . We shall show that
| (13) |
where Σw is a diagonal matrix. For this, using the Cramer–Wold device, consider lTXm, where l = (l1, l2)T. Since the Bm can be expressed as
it follows that
| (14) |
Since the wi are bounded random variables, Tm(1) and Tm(2) have finite second moments. Also, Tm(1) and Tm(2) are independent since Tm(1) is based on the random variables {wi : i = 1, . . . , m1} and Tm(2) is based on the random variables {wi : i = m1 + 1, . . . , m}. Hence, by the central limit theorem and Slutsky's theorem (Serfling, 1980), it follows that, as m → ∞,
Also, under hypothesis H0, since , the limiting variance can be shown to reduce to , where , and . Now, returning to the last term on the right-hand side of equation (14), note that converges to 0 in probability, as m → ∞, by the central limit theorem. This completes the proof of expression (13) and hence expression (10).
We now turn to the proof of expression (11). Now, under assumption 1, Rl(u) is continuously differentiable, and it follows that
Now the proof can be completed along the lines of the proof of expression (10). This completes the proof of theorem 1.
References
- DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–845.
- Etzioni R, Kooperberg C, Pepe M, Smith R, Gann PH. Combining biomarkers to detect disease with application to prostate cancer. Biostatistics. 2003;4:523–538. doi: 10.1093/biostatistics/4.4.523.
- Etzioni R, Pepe M, Longton G, Hu C, Goodman G. Incorporating the time dimension in receiver operating characteristic curves: a case study of prostate cancer. Med. Decsn Makng. 1999;19:242–251. doi: 10.1177/0272989X9901900303.
- Goddard MJ, Hinberg I. Receiver operator characteristic (roc) curves and non-normal data: an empirical study. Statist. Med. 1990;9:325–337. doi: 10.1002/sim.4780090315.
- Gumbel EJ. Bivariate exponential distributions. J. Am. Statist. Ass. 1960;55:698–707.
- Hanley JA, Hajian-Tilaki KO. Sampling variability of nonparametric estimates of the areas under receiver operating characteristic curves: an update. Acad. Radiol. 1997;4:49–58. doi: 10.1016/s1076-6332(97)80161-4.
- Hendrick RE, Cole EB, Pisano ED, Acharyya S, Marques H, Cohen MA, Jong RA, Mawdsley GE, Kanal KM, D’Orsi CJ, Rebner M, Gatsonis C. Accuracy of soft-copy digital mammography versus that of screen-film mammography according to digital manufacturer: ACRIN DMIST retrospective multireader study. Radiology. 2008;247:38–48. doi: 10.1148/radiol.2471070418.
- Janes H, Pepe M. The optimal ratio of cases to controls in a case-control for estimating the classification accuracy of a biomarker. Biostatistics. 2006;7:456–468. doi: 10.1093/biostatistics/kxj018.
- Janes H, Pepe MS. Matching in studies of classification accuracy: implications for analysis, efficiency, and assessment of incremental value. Biometrics. 2008;64:1–9. doi: 10.1111/j.1541-0420.2007.00823.x.
- Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman and Hall; New York: 2000.
- Mazumdar M, Liu A. Group sequential design for comparative diagnostic accuracy studies. Statist. Med. 2003;22:727–739. doi: 10.1002/sim.1386.
- Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, Winget M, Yasui Y. Phases of biomarker development for early detection of cancer. J. Natn. Cancer Inst. 2001;93:1054–1061. doi: 10.1093/jnci/93.14.1054.
- Proschan M. Two-stage sample size re-estimation based on nuisance parameter a review. J. Biopharm. Statist. 2005;15:559–574. doi: 10.1081/BIP-200062852.
- Rosenberger WF, Lachin JM. Randomization in Clinical Trials Theory and Practice. Wiley; New York: 2002.
- Serfling RJ. Approximation Theorems of Mathematical Statistics. Wiley; New York: 1980.
- Tang L, Emerson SS, Zhou X. Nonparametric and semiparametric group sequential methods for comparing accuracy of diagnostic tests. Biometrics. 2008;64:1137–1145. doi: 10.1111/j.1541-0420.2008.01000.x.
- Wieand S, Gail MH, James BR, James KL. A family of non-parametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika. 1989;76:585–592.
- Zhou X, McClish DK, Obuchowski NA. Statistical Methods in Diagnostic Medicine. Wiley; New York: 2002.
- Zou K, Tempany C, Fielding J, Silverman S. Original smooth receiver operating characteristic curve estimation from continuous data: statistical methods for analyzing the predictive value of spiral ct of ureteral stones. Acad. Radiol. 1998;5:680–687. doi: 10.1016/s1076-6332(98)80562-x.

