Abstract
The area under the ROC curve (AUC) and partial area under the ROC curve (pAUC) are summary measures used to assess the accuracy of a biomarker in discriminating true disease status. The standard sampling approach used in biomarker validation studies is often inefficient and costly, especially when ascertaining the true disease status is costly and invasive. To improve efficiency and reduce the cost of biomarker validation studies, we consider a test-result-dependent sampling (TDS) scheme, in which subject selection for determining the disease state is dependent on the result of a biomarker assay. We first estimate the test-result distribution using data arising from the TDS design. With the estimated empirical test-result distribution, we propose consistent nonparametric estimators for AUC and pAUC and establish the asymptotic properties of the proposed estimators. Simulation studies show that the proposed estimators have good finite sample properties and that the TDS design yields more efficient AUC and pAUC estimates than a simple random sampling (SRS) design. A data example based on an ongoing cancer clinical trial is provided to illustrate the TDS design and the proposed estimators. This work can find broad applications in design and analysis of biomarker validation studies.
Keywords: Area under ROC curve (AUC), Empirical likelihood, Nonparametric, Partial area under ROC curve (pAUC), Simple random sampling, Test-result-dependent sampling
1 Introduction
The area under the ROC curve (AUC) and partial area under the ROC curve (pAUC) are two summary measures of accuracy of a diagnostic test or a biomarker [1, 2]. We use the terms “biomarker” and “diagnostic test” in a generic sense to mean any measure that could signal the onset of a disease or a disease condition. Let D be the true disease status with D = 1 for presence of disease and D = 0 for absence of disease. Let Y be a continuous measure for a biomarker, where a higher value is indicative of increased severity of the disease. The true and false positive rates at the threshold c are TPR(c) = Pr(Y ≥ c|D = 1) = Pr(YD ≥ c) and FPR(c) = Pr(Y ≥ c|D = 0) = Pr(YD̄ ≥ c), respectively, where D and D̄ in indices denote the sets {D = 1} and {D = 0}. An ROC curve is a plot of the entire set of possible true and false positive rates ROC(c) = {(FPR(c), TPR(c)), c ∈ (−∞, ∞)}. The AUC is defined over the entire ROC curve as $\mathrm{AUC} = -\int_{-\infty}^{\infty} \mathrm{TPR}(c)\, d\,\mathrm{FPR}(c)$. The larger the AUC value, the more accurate the diagnostic test is. However, in medical research, we often know a priori that a particular test would be useful only if its false-positive rate is in a restricted range of rates. Partial AUC (pAUC) is defined over the restricted range of false positive rates (t0, t1) with $\mathrm{pAUC}(t_0, t_1) = -\int_{q_0}^{q_1} \mathrm{TPR}(c)\, d\,\mathrm{FPR}(c)$, where q0 = FPR−1(t1) and q1 = FPR−1 (t0).
Validation of the predictive value of a biomarker or a diagnostic test usually involves large prospective studies, in which subjects are sampled randomly from a target population. Observing the true disease status can be costly, invasive, or time consuming, and the positivity rates for many markers can be low in the target population. Moreover, medical investigators are often interested in the performance of a diagnostic test in a narrow range of test results that corresponds to low false positive rates. For example, in cancer screening, where false positivity is associated with greater risk than benefit from aggressive intervention, partial AUC over the restricted false positive rates is of more interest [3, 4]. All of these factors make the conventional design, in which the true disease status is ascertained for a large number of randomly sampled subjects regardless of their test results, inefficient and costly. In this article, we consider test-result-dependent sampling (TDS) as an alternative to simple random sampling (SRS) for biomarker validation. Rather than sampling all subjects randomly from the target population, TDS allows a subset of subjects to be sampled using a test-result-dependent sampling scheme. Depending on the specific questions, the investigators can, at the design phase of the study, select more subjects from the range of test results that contains the most information for statistical estimation. By oversampling subjects within the chosen ranges of test results and undersampling subjects outside of these ranges, the TDS design has the potential to considerably improve the efficiency of estimating AUC (or pAUC) for a given number of subjects whose true disease status must be determined.
In this article, we consider test-result-dependent sampling (TDS) that consists of two sampling components, a simple random component of subjects from the targeted study population and a test-result-dependent component from the same population. As in the case of verification bias in diagnostic medicine [1, 2], the TDS design will introduce bias when standard methods are used to estimate AUC and pAUC based on TDS data. However, the existing methods of AUC and pAUC estimation for verification bias are not applicable to the TDS data. Verification bias occurs primarily in population-based studies, in which a diagnostic test is applied to all N subjects in the cohort, yielding a test result Y for all study units. As only a subset of patients is selected to observe (verify) true disease status D, unselected patients in the cohort are missing D. When D is missing at random (MAR) [5], the selection probability of observing D is directly estimable. The bias in estimating AUC can be corrected by the inverse probability weighting method for binary, ordinal or continuous diagnostic tests (e.g. [6, 7, 8, 9, 10, 11]). In the TDS design, patients are selected to observe D depending on their ranges of test result Y, but the size of the parent cohort from which the patients are accrued and the test results of unscreened patients are unknown to the investigators. Indeed, such situations occur in most cancer biomarker validation studies, in which subjects come from hospital-based settings and randomized clinical trials, and population-based study designs are not feasible. In the TDS design, both Y and D are observed for every subject in the selected sample. Since there are no missing variables, the TDS design is different from the verification bias problem and the inverse probability weighting (IPW) approach for missing data is not directly applicable.
For data with a TDS design structure, standard methods of estimating AUC and pAUC yield biased results. In this article, an empirical likelihood method is used to estimate the distribution of the test result Y conditional on disease status D. Unbiased estimators for AUC and pAUC are then obtained by reweighting the standard estimators with the estimated empirical distribution. The estimators are nonparametric in that no parametric model is assumed for the relationship between the test result and the true disease status. Our method is related to the general approach of outcome-dependent sampling. Weaver and Zhou [12] and Zhou et al. [13] discussed the utility of outcome-dependent sampling schemes in estimating the coefficients of regression models for epidemiologic studies. Zhou et al. [14] and Wang and Zhou [15] studied empirical likelihood-based estimators for regression models where the sample depends on the outcome or an auxiliary variable. Outcome-dependent sampling is also used in econometrics. Cosslett [16, 17] proposed a constrained likelihood estimator for regression coefficients of a categorical regression model. Morgenthaler and Vardi [18] studied a weighted likelihood method for data arising from length-biased sampling.
For AUC and pAUC estimation, nonparametric methods are generally preferable to their parametric counterparts since they require fewer assumptions. Bamber [19] showed that AUC is the probability that a randomly observed test result in diseased subjects is higher than a randomly observed test result in non-diseased subjects, i.e. Pr(YD > YD̄). Hanley and McNeil [20] showed that the AUC estimator based on the trapezoidal rule is equivalent to the Mann-Whitney U statistic. DeLong et al. [21] and Wieand et al. [22] studied nonparametric estimation of AUC and testing of AUC differences for paired biomarkers. Partial AUC (pAUC) can be considered an average of TPR(c) across all FPR(c) where FPR(c) ∈ (t0, t1). Thompson and Zucchini [23], McClish [24] and Jiang et al. [25] studied parametric pAUC estimation assuming bivariate normal models. Zhang et al. [26], Dodd and Pepe [4], and He and Escobar [27] studied nonparametric pAUC estimation by extending the approach of DeLong et al. [21]. In all these papers, the data are assumed to be a simple random sample of the targeted patient population.
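As a minimal numerical sketch of the equivalence noted above (not part of the original analysis), the following Python code computes the AUC of a simple random sample in two ways: as the Mann-Whitney proportion of (diseased, non-diseased) pairs with YD > YD̄, and as the trapezoidal area under the empirical ROC curve. The function names and the convention of counting ties as 1/2 are assumptions made for illustration.

```python
import numpy as np

def mann_whitney_auc(y, d):
    """Share of (diseased, non-diseased) pairs with Y_D > Y_Dbar; ties count 1/2."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    y_dis, y_non = y[d == 1], y[d == 0]
    greater = (y_dis[:, None] > y_non[None, :]).sum()
    ties = (y_dis[:, None] == y_non[None, :]).sum()
    return (greater + 0.5 * ties) / (y_dis.size * y_non.size)

def trapezoid_auc(y, d):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    # Thresholds from +inf (FPR = TPR = 0) down to -inf (FPR = TPR = 1).
    cuts = np.concatenate(([np.inf], np.sort(np.unique(y))[::-1], [-np.inf]))
    tpr = np.array([(y[d == 1] >= c).mean() for c in cuts])
    fpr = np.array([(y[d == 0] >= c).mean() for c in cuts])
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy data: diseased results {2, 4}, non-diseased results {1, 3}.
y_toy = [1, 2, 3, 4]
d_toy = [0, 1, 0, 1]
print(mann_whitney_auc(y_toy, d_toy), trapezoid_auc(y_toy, d_toy))
```

Both functions assume a simple random sample; neither corrects for a test-result-dependent selection mechanism.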
The rest of the paper is organized as follows. In Section 2, we specify the likelihood for data arising from a TDS design and propose a nonparametric empirical likelihood method to estimate the distribution of the test result conditional on the disease status. Section 3 presents the nonparametric AUC estimator under the TDS design and establishes its asymptotic properties. Section 4 studies the nonparametric pAUC estimator under the TDS design. Section 5 describes the finite sample properties of the proposed estimators under the TDS design. The efficiency gain of the proposed estimators under the TDS design relative to the standard estimators under the SRS design is also investigated. We illustrate in Section 6 the proposed method with an example. The article concludes with some remarks in Section 7. Proofs of the asymptotic properties for the proposed estimators are sketched in the Appendix.
2 Estimation of Test Result Distribution
We consider test-result-dependent sampling (TDS), which consists of a simple random component and a test-result-dependent component. Assume that the test result Y falls into one of K mutually exclusive intervals Ck = (ak−1, ak), k = 1, …, K where ak−1 < ak and a0 = −∞, aK = ∞. One observes a test-result-dependent component {Yki, Dki}, i = 1, …, nk conditional on whether Y belongs to stratum Ck as well as a simple random component {Y0i, D0i}, where i = 1, …, n0. The combined sample size of the two components is $n = \sum_{s=0}^{K} n_s$.
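Denote by f(Y, D) the joint density function of Y and D. Let psi = f (Ysi, Dsi), s = 0, …, K, i = 1, …, ns, and θk = Pr(Y ∈ Ck), k = 1, …, (K − 1), θ = (θ1, …, θK−1)T and $\theta_K = 1 - \sum_{k=1}^{K-1} \theta_k$. The empirical likelihood of {Ysi, Dsi} under the TDS design is then

$$L(\theta, p_{si}) = \prod_{i=1}^{n_0} p_{0i} \, \prod_{k=1}^{K} \prod_{i=1}^{n_k} \frac{p_{ki}}{\theta_k}.$$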
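Based on empirical likelihood theory [28, 29, 30], one can maximize log L(θ, psi) under the following constraints to find estimates of θ and psi:

$$p_{si} \ge 0, \qquad \sum_{s=0}^{K} \sum_{i=1}^{n_s} p_{si} = 1, \qquad \sum_{s=0}^{K} \sum_{i=1}^{n_s} p_{si} \{ I(Y_{si} \in C_k) - \theta_k \} = 0, \quad k = 1, \ldots, K - 1,$$

where I(·) is the indicator function. Using Lagrange multipliers, one can show that the maximum of log L(θ, psi) is attained at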
| (1) |
where and θ̂k can be obtained by solving ∂lp(θ)/∂θ = 0, where
| (2) |
Following Qin and Lawless [30], we can show θ̂ asymptotically follows a normal distribution. That is, under general regularity conditions, if ns/n → ρs as n → ∞ where 0 < ρs < 1, s = 0, …, K − 1, we have , where with ξsi is given in the Appendix.
3 Nonparametric Estimation of AUC
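The AUC is defined as $A = \Pr(Y_D > Y_{\bar{D}})$. Under simple random sampling, the nonparametric AUC estimator [19, 20] is

$$\hat{A}_{SRS} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} D_i (1 - D_j) I_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{n} D_i (1 - D_j)}, \qquad (3)$$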
where Iij = I(Yi > Yj), i, j = 1, …, n, and ÂSRS is known to have good properties under simple random sampling. However, ÂSRS yields biased estimates under the TDS design. To obtain unbiased AUC estimates for the TDS design, we reweight the standard AUC estimator with the estimates p̂si in (1) and propose the following nonparametric estimator under the TDS design:
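$$\hat{A}_{TDS} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \hat{p}_i \hat{p}_j D_i (1 - D_j) I_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{n} \hat{p}_i \hat{p}_j D_i (1 - D_j)}. \qquad (4)$$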
For notational simplicity, we have replaced the double subscripts si in p̂si with a single subscript i with i = 1, · · ·, n in p̂i. Let and
| (5) |
Assuming the U-process Un(A, θ) = n1/2 {Rn(A, θ) − E[Rn(A, θ)]} is equicontinuous, we are able to show that the proposed estimator ÂTDS for AUC converges asymptotically to a normal distribution, that is,
| (6) |
where Λn is given in the Appendix.
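The reweighting idea can be sketched in a few lines of Python. The sketch assumes that a vector of point masses p (one per observed subject) is already available; how those masses are obtained from the empirical likelihood is not shown here. The weighted Mann-Whitney form below is chosen to agree with the score Rij = p̃ip̃jDi(1 − Dj)(Iij − A) used in the Appendix, and the function name `weighted_auc` is hypothetical.

```python
import numpy as np

def weighted_auc(y, d, p):
    """Weighted Mann-Whitney AUC: solves sum_ij p_i p_j D_i (1 - D_j) (I_ij - A) = 0 for A."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    p = np.asarray(p, dtype=float)
    w_dis = p * d            # p_i * D_i: mass carried by diseased subjects
    w_non = p * (1 - d)      # p_j * (1 - D_j): mass carried by non-diseased subjects
    i_gt = (y[:, None] > y[None, :]).astype(float)   # I_ij = I(Y_i > Y_j)
    numerator = w_dis @ i_gt @ w_non
    denominator = w_dis.sum() * w_non.sum()
    return numerator / denominator

# With equal masses the estimator reduces to the standard Mann-Whitney form.
y_toy = [1, 2, 3, 4]
d_toy = [0, 1, 0, 1]
print(weighted_auc(y_toy, d_toy, [1, 1, 1, 1]))
# Unequal masses shift the estimate toward pairs that carry more mass.
print(weighted_auc(y_toy, d_toy, [1, 1, 1, 3]))
```

Only the relative sizes of the masses matter, since any common scaling cancels between numerator and denominator.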
4 Nonparametric Estimation of pAUC
Denote the partial AUC restricted in (t0, t1) by At. Let q0 = FPR−1(t1) and q1 = FPR−1(t0). By the definition of partial AUC, we have
Under simple random sampling, Dodd and Pepe (2003) proposed a nonparametric estimator for partial AUC: , or equivalently,
| (7) |
and they illustrated that is consistent and more robust than other estimators that make parametric assumptions. We propose the following nonparametric pAUC estimator for the TDS design:
| (8) |
where and . To estimate in formula (8), we first estimate , then limit to the range (t0, t1).
Besides , we may also consider a standardization recommended by McClish [24] and Jiang et al. [25]. Let τt = At/(t1 − t0) = Pr(YD > YD̄| FPR ∈ (t0, t1)). The τt may be interpreted as the average true positive rate over the range of false positive rates of concern. The estimator τ̂t is given by
We establish the asymptotic properties of by letting and FP̃Rj = Σip̃i(1 − Di)I(Yi > Yj)/Σip̃i(1 − Di) where p̃i is given in (5). Let , and be the number of elements of Δ falling into the sth stratum of the test result Y. Assuming the U-process is equicontinuous and as n → ∞, we can show that the proposed estimator for τt follows asymptotically a normal distribution, that is, , where Ωn is given in the Appendix.
The asymptotic properties of can be easily obtained from , that is,
| (9) |
where .
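A reweighted pAUC can be sketched along the same lines. The code below is an illustrative assumption rather than the exact estimator (8): it takes the point masses p as given, computes a weighted false positive rate for each subject in the form Σi pi(1 − Di)I(Yi > Yj)/Σi pi(1 − Di) stated above, and keeps only non-diseased subjects whose weighted FPR lies in [t0, t1). The half-open interval and the function name `weighted_pauc` are choices made for this sketch.

```python
import numpy as np

def weighted_pauc(y, d, p, t0, t1):
    """Return (pAUC, pAUC / (t1 - t0)) restricted to weighted FPR in [t0, t1)."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    p = np.asarray(p, dtype=float)
    w_dis = p * d
    w_non = p * (1 - d)
    i_gt = (y[:, None] > y[None, :]).astype(float)   # i_gt[i, j] = I(Y_i > Y_j)
    # fpr[j]: weighted share of non-diseased subjects with a result above Y_j.
    fpr = (w_non @ i_gt) / w_non.sum()
    in_range = (fpr >= t0) & (fpr < t1)
    pauc = (w_dis @ i_gt @ (w_non * in_range)) / (w_dis.sum() * w_non.sum())
    return pauc, pauc / (t1 - t0)

# Toy data: non-diseased results {1, 3, 5, 7}, diseased results {2, 6, 8}, equal masses.
y_toy = [1, 3, 5, 7, 2, 6, 8]
d_toy = [0, 0, 0, 0, 1, 1, 1]
p_toy = np.ones(7)
print(weighted_pauc(y_toy, d_toy, p_toy, 0.0, 0.5))
```

The second returned value corresponds to the standardized quantity τt = At/(t1 − t0). Over the full range (0, 1) the first value coincides with the weighted AUC.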
5 Simulation Study
Simulation studies were conducted to evaluate the performance of the AUC estimator ÂTDS and the pAUC estimator under the TDS design. First, we illustrate the unbiasedness of the estimators under the TDS design. Second, we assess the performance of the proposed variance estimators under different specifications of subject allocations and cutoff points for stratification. The structure of the TDS design is determined by several factors, including number of strata K, choice of cutoff points (a1, a2, · · ·, aK−1) and subject allocations among strata ns = (n0, n1, · · ·, nK). Last, we examine the efficiency gain of the use of the TDS design and the proposed estimators over the SRS design and the standard estimators.
In all simulations, we assume D ~ Bernoulli(0.3), and that YD and YD̄ follow the normal distributions listed in the tables, written as N(μ, σ) with mean μ and standard deviation σ. To generate the TDS data, we first generate a random sample of size n0, and then generate the test-result-dependent (TDS) component. The TDS component consists of samples of sizes n1, n2, n3 from the lower tail, the middle region and the upper tail of Y, respectively. These regions are defined by the cutoff points (a1, a2). For the purpose of illustration, we use K = 3 and (a1, a2) = (μY − α × σY, μY + α × σY ) for AUC, and (a1, a2) = (q0, q1) for partial AUC, where (q0, q1) are the quantiles of YD̄ at the pre-specified (t0, t1). All simulation studies are based on 5,000 independent runs.
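The data-generating steps described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the simulation code used for the tables: YD̄ is taken as N(0, 1), YD as normal with user-supplied mean and standard deviation, μY and σY are the theoretical mean and standard deviation of the resulting mixture, and the test-result-dependent component is drawn by rejection sampling from the same population. The function name `simulate_tds` and its arguments are hypothetical.

```python
import numpy as np

def simulate_tds(n_s, alpha, mu_d=1.0, sd_d=1.0, prev=0.3, seed=0):
    """Draw one TDS data set with K = 3 strata; n_s = (n0, n1, n2, n3)."""
    rng = np.random.default_rng(seed)
    n0, n_strata = n_s[0], n_s[1:]

    def draw(size):
        # D ~ Bernoulli(prev); Y | D = 1 ~ N(mu_d, sd_d); Y | D = 0 ~ N(0, 1).
        d = rng.binomial(1, prev, size)
        y = np.where(d == 1, rng.normal(mu_d, sd_d, size), rng.normal(0.0, 1.0, size))
        return y, d

    # Mean and standard deviation of the marginal (mixture) distribution of Y.
    mu_y = prev * mu_d
    var_y = prev * (sd_d ** 2 + mu_d ** 2) + (1.0 - prev) - mu_y ** 2
    sd_y = np.sqrt(var_y)
    cuts = np.array([mu_y - alpha * sd_y, mu_y + alpha * sd_y])   # (a1, a2)

    y0, d0 = draw(n0)                                  # simple random component
    ys, ds, src = [y0], [d0], [np.zeros(n0, dtype=int)]
    for k, n_k in enumerate(n_strata, start=1):        # stratum C_k
        yk, dk = np.empty(0), np.empty(0, dtype=int)
        while yk.size < n_k:                           # rejection sampling
            y, d = draw(4 * n_k + 100)
            keep = np.searchsorted(cuts, y) == k - 1   # index of interval containing y
            yk = np.concatenate([yk, y[keep]])
            dk = np.concatenate([dk, d[keep]])
        ys.append(yk[:n_k])
        ds.append(dk[:n_k])
        src.append(np.full(n_k, k))
    return np.concatenate(ys), np.concatenate(ds), np.concatenate(src), cuts

y, d, src, cuts = simulate_tds((180, 60, 60, 60), alpha=1.0)
print(y.size, cuts)
```

The returned label `src` is 0 for the simple random component and k for subjects sampled from stratum Ck, which makes it easy to check that oversampling is confined to the intended ranges.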
5.1 Unbiasedness of Proposed Estimators Under TDS
To illustrate the unbiasedness of the proposed estimator ÂTDS under the TDS design, we generated TDS data with K = 3, (a1, a2) = (μY − ασY, μY + ασY ) and ns = (n0, n1, n2, n3) = (180, 60, 60, 60), where α = 1 or 1.5, and μY and σY are the mean and standard deviation of Y. Table 1 lists the mean (Estimate) and the relative bias (Bias%) of ÂSRS (3) and ÂTDS (4) under the TDS design. The standard AUC estimator ÂSRS yields substantial bias under the TDS design, and the bias becomes larger as the cutoff points move away from the center of the test-result distribution. In contrast, for the same TDS data, the proposed estimator ÂTDS yields essentially unbiased estimates. To illustrate the unbiasedness of the proposed pAUC estimator (8), we chose the combinations (t0, t1) = (0, 0.1), ns = (180, 90, 90), and (t0, t1) = (0.1, 0.2), ns = (180, 60, 60, 60). Two methods were used to determine the cutoff points for stratification: either the true quantiles (q0, q1) of YD̄ or the quantiles (q̂0, q̂1) of the empirical distribution ŶD̄ at (t0, t1). The standard estimator (7) and the proposed estimator (8) were applied to the same TDS data. As seen in Table 2, the standard estimator (7) yields extreme bias under the TDS design, while the proposed estimator (8) for the TDS design performs well with essentially no bias.
Table 1.
Unbiasedness of ÂTDS under TDS
| Distributions | True AUC | Method | Estimate (α = 1.0) | Bias % (α = 1.0) | Estimate (α = 1.5) | Bias % (α = 1.5) |
|---|---|---|---|---|---|---|
| YD ~ N(0.5, 1.5), YD̄ ~ N(0, 1) | 0.6092 | ÂSRS | 0.6298 | 3.38 | 0.6383 | 4.78 |
| | | ÂTDS | 0.6080 | −0.20 | 0.6079 | −0.21 |
| YD ~ N(1, 1), YD̄ ~ N(0, 1) | 0.7602 | ÂSRS | 0.7969 | 4.83 | 0.8282 | 8.95 |
| | | ÂTDS | 0.7594 | −0.11 | 0.7587 | −0.20 |

Note: TDS data were generated with ns = (180, 60, 60, 60) and cutoff points (a1, a2) = (μY − ασY, μY + ασY). The ÂTDS and ÂSRS were applied to the same copies of the TDS data.
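The "True AUC" values in Table 1 can be checked with the standard closed form for two independent normal populations, AUC = Φ((μD − μD̄)/√(σD² + σD̄²)). The short sketch below assumes that the distribution symbol lost from the table is the normal distribution and that its second parameter is a standard deviation; under that reading the formula reproduces the tabulated values 0.6092 and 0.7602 to four decimals.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def binormal_auc(mu_d, sd_d, mu_n=0.0, sd_n=1.0):
    """Pr(Y_D > Y_Dbar) for independent normal test results."""
    return normal_cdf((mu_d - mu_n) / sqrt(sd_d ** 2 + sd_n ** 2))

print(round(binormal_auc(0.5, 1.5), 4))   # first scenario of Table 1
print(round(binormal_auc(1.0, 1.0), 4))   # second scenario of Table 1
```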
Table 2.
Unbiasedness of the proposed pAUC estimator under TDS
| Distributions | (t0, t1) | True pAUC | Method | Estimate, cutoffs (q0, q1) | Bias %, cutoffs (q0, q1) | Estimate, cutoffs (q̂0, q̂1) | Bias %, cutoffs (q̂0, q̂1) |
|---|---|---|---|---|---|---|---|
| YD ~ N(0.5, 1.5), YD̄ ~ N(0, 1) | (0, 0.1) | 0.0209 | Standard (7) | 0.0817 | 290.91 | 0.0759 | 263.16 |
| | | | Proposed (8) | 0.0206 | −1.44 | 0.0208 | −0.48 |
| | (0.1, 0.2) | 0.0358 | Standard (7) | 0.1167 | 225.98 | 0.0985 | 175.14 |
| | | | Proposed (8) | 0.0354 | −1.12 | 0.0358 | 0 |
| YD ~ N(1, 1), YD̄ ~ N(0, 1) | (0, 0.1) | 0.0243 | Standard (7) | 0.0747 | 207.41 | 0.0701 | 188.48 |
| | | | Proposed (8) | 0.0239 | −1.65 | 0.0241 | −0.82 |
| | (0.1, 0.2) | 0.0481 | Standard (7) | 0.1220 | 153.64 | 0.1043 | 116.84 |
| | | | Proposed (8) | 0.0474 | −1.46 | 0.0481 | 0 |

Note: The TDS data were generated with ns = (180, 90, 90) for (t0, t1) = (0, 0.1) and ns = (180, 60, 60, 60) for (t0, t1) = (0.1, 0.2). The cutoff points for stratification are either the quantiles (q0, q1) of YD̄ or the quantiles (q̂0, q̂1) of ŶD̄ at (t0, t1).
5.2 Performance of Variance Estimators under TDS
We examine the finite sample performance of the proposed variance estimators for ÂTDS and the proposed pAUC estimator (8) by simulating the TDS data with different structures. Tables 3 and 4 list the mean of estimates (Estimate), the relative bias (Bias%), the simulated standard error (SE), the mean of the estimated standard errors, and the coverage percentage of the 95% confidence interval (CP). Estimated standard errors were obtained by the proposed variance estimators, in which large sample quantities were replaced with finite sample quantities. As seen in Table 3, the proposed estimator ÂTDS has good finite sample properties under TDS data with several specifications of subject allocation and cutoff points. The estimated standard errors in all cases are very close to the simulated standard errors, and the empirical 95% coverage percentages are close to the nominal value. Different structures of the TDS design result in slightly different true standard errors, but the proposed estimators and their variance estimators perform fairly well in all these cases. Similar results are observed in Table 4 for the proposed estimator for partial AUC under various TDS designs. Simulation studies were conducted under many combinations of subject allocation and cutoff points, but we have not seen a systematic pattern in which a specific combination consistently yields the largest efficiency gain.
Table 3.
Performance of ÂTDS and Its Variance Estimator under TDS
| True AUC | ns | (α0, α1) | Estimate | Bias% | SE | Mean estimated SE | 95% CP |
|---|---|---|---|---|---|---|---|
| 0.7602 | (180, 60, 60, 60) | (−1.0, 1.0) | 0.7594 | −0.11 | 0.0274 | 0.0274 | 0.949 |
| | (180, 30, 60, 90) | (−1.0, 1.0) | 0.7596 | −0.08 | 0.0277 | 0.0277 | 0.951 |
| | (270, 30, 30, 30) | (−1.0, 1.0) | 0.7591 | −0.14 | 0.0270 | 0.0267 | 0.943 |
| 0.7602 | (180, 60, 60, 60) | (−1.5, 1.5) | 0.7587 | −0.20 | 0.0304 | 0.0302 | 0.945 |
| | (180, 60, 60, 60) | (−1.0, 0.5) | 0.7593 | −0.12 | 0.0277 | 0.0274 | 0.943 |

Note: The TDS data were drawn assuming YD ~ N(1, 1), YD̄ ~ N(0, 1) with (a1, a2) = (μY + α0σY, μY + α1σY ) and ns = (n0, n1, n2, n3).
Table 4.
Performance of the Proposed pAUC Estimator and Its Variance Estimator under TDS
| Cutoffs | (t0, t1) | True pAUC | ns | Estimate | Bias% | SE | Mean estimated SE | 95% CP |
|---|---|---|---|---|---|---|---|---|
| (q0, q1) | (0, 0.1) | 0.0243 | (180, 90, 90) | 0.0239 | −1.65 | 0.0039 | 0.0039 | 0.955 |
| | (0, 0.1) | 0.0243 | (180, 30, 150) | 0.0238 | −2.06 | 0.0036 | 0.0039 | 0.961 |
| (q̂0, q̂1) | (0, 0.1) | 0.0243 | (180, 90, 90) | 0.0241 | −0.82 | 0.0040 | 0.0039 | 0.950 |
| | (0, 0.1) | 0.0243 | (180, 30, 150) | 0.0240 | −1.23 | 0.0038 | 0.0039 | 0.956 |
| | (0, 0.05) | 0.0079 | (180, 90, 90) | 0.0078 | −1.27 | 0.0016 | 0.0017 | 0.961 |
| | (0, 0.2) | 0.0724 | (180, 90, 90) | 0.0721 | −0.41 | 0.0093 | 0.0087 | 0.938 |
| | (0.1, 0.2) | 0.0481 | (180, 60, 60, 60) | 0.0481 | 0 | 0.0057 | 0.0054 | 0.933 |

Note: The TDS data were drawn assuming YD ~ N(1, 1), YD̄ ~ N(0, 1).
5.3 Efficiency Comparison of TDS Design over SRS Design
One of the main reasons to use the TDS design over the standard SRS design in biomarker validation is that the TDS design improves efficiency in assessing the performance of a diagnostic test or biomarker with the same number of subjects. The strategy could be particularly useful for the estimation of the partial area under the ROC curve (pAUC). In this case, oversampling can be done by concentrating on those subjects with test results corresponding to the range of false-positive rates of interest. To illustrate this, we generated separate data for the SRS design and the TDS design of the same sample size n = 360. The efficiency comparison is made by applying the proposed estimator (8) to the TDS data and the standard estimator (7) to the SRS data. If simulation shows that the TDS design yields better efficiency than the SRS design in estimating pAUC, it would support the utility of the TDS design over the SRS design in biomarker validation. Table 5 lists the Estimate, Bias%, simulated standard error (SE), ratio of mean square error (RMSE), and subject proportion (nratio%) for the TDS data and the SRS data, where ns = (180, 90, 90) is taken for (t0, t1) = (0, 0.1). Table 5 shows considerable efficiency gain from using the TDS design to estimate partial AUC under various scenarios, regardless of whether the true quantiles (q0, q1) or the estimated quantiles (q̂0, q̂1) were used as cutoff points for stratification. The results indicate that a combined use of the TDS design and the proposed pAUC estimator can lead to more than 50% efficiency gain over the SRS design with the standard pAUC estimator. Based on similar simulations, the efficiency gain for estimating AUC under the TDS design is relatively small. This is understandable: the TDS design gains efficiency by oversampling patients in a range of interest, whereas AUC is an average across the full range of test results, so oversampling the two tails and undersampling in between largely offsets the benefit of the TDS design.
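The mean-square-error ratio can be illustrated with a short calculation. The sketch below assumes the usual decomposition MSE = bias² + SE² and forms the ratio of the TDS value to the SRS value; this definition is an assumption, since the text does not spell it out. The inputs are the rounded entries of Table 5 for YD ~ N(1, 1) with true cutoffs (q0, q1), so the result can only be expected to match the tabulated ratio approximately.

```python
def mse(estimate, true_value, se):
    """Mean square error as squared bias plus squared standard error."""
    return (estimate - true_value) ** 2 + se ** 2

def mse_ratio(tds, srs, true_value):
    """MSE under TDS relative to MSE under SRS; each design is (estimate, se)."""
    return mse(tds[0], true_value, tds[1]) / mse(srs[0], true_value, srs[1])

# Rounded Table 5 entries: true pAUC 0.0243, SRS (0.0243, 0.0062), TDS (0.0238, 0.0038).
ratio = mse_ratio(tds=(0.0238, 0.0038), srs=(0.0243, 0.0062), true_value=0.0243)
print(round(ratio, 2))
```

A ratio below 1 indicates that the TDS design with the proposed estimator has the smaller mean square error for the same total sample size.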
Table 5.
Efficiency Comparison of TDS Design and SRS Design for Estimating pAUC
| Distributions | Cutoffs | (t0, t1) | Design | nratio% | Method | True | Estimate | Bias % | SE | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| YD ~ N(0.1, 1), YD̄ ~ N(0, 1) | (q0, q1) | (0, 0.1) | SRS | (89, 11) | Standard (7) | 0.0061 | 0.0060 | −1.64 | 0.0023 | 1 |
| | | | TDS | (70, 30) | Proposed (8) | 0.0061 | 0.0059 | −3.28 | 0.0014 | 0.40 |
| | (q̂0, q̂1) | (0, 0.1) | SRS | (91, 9) | Standard (7) | 0.0061 | 0.0064 | 4.92 | 0.0024 | 1 |
| | | | TDS | (70, 30) | Proposed (8) | 0.0061 | 0.0060 | −1.64 | 0.0015 | 0.33 |
| YD ~ N(1, 1), YD̄ ~ N(0, 1) | (q0, q1) | (0, 0.1) | SRS | (84, 16) | Standard (7) | 0.0243 | 0.0243 | 0 | 0.0062 | 1 |
| | | | TDS | (67, 33) | Proposed (8) | 0.0243 | 0.0238 | −2.06 | 0.0038 | 0.38 |
| | (q̂0, q̂1) | (0, 0.1) | SRS | (77, 23) | Standard (7) | 0.0243 | 0.0252 | 3.70 | 0.005 | 1 |
| | | | TDS | (63, 37) | Proposed (8) | 0.0243 | 0.0241 | −0.82 | 0.004 | 0.62 |

Note: RMSE represents the ratio of mean square error relative to the SRS design. The standard estimator (7) was applied to the SRS data; the proposed estimator (8) was applied to the TDS data. The SRS and TDS designs have the same size n = 360. For TDS data nratio = (n01 + n1, n02 + n2, n03 + n3)/n, where n0k is the number of patients in each stratum with the cutoff points (q0, q1) or (q̂0, q̂1) based on the SRS component of the TDS data. For the SRS data nratio% is the percentage of patients falling in each stratum defined by the cutoff points for stratification.
6 Data Example
We have focused on the TDS design and emphasized its potential to improve efficiency over the SRS design in AUC and partial AUC estimation. In practice, biomarker data with the TDS structure may also come from existing studies, in which subjects are not a simple random sample from the target population and analyzing the available data ignoring this fact will lead to biased results. In this section, we use a randomized clinical trial to illustrate this idea and show how the proposed TDS methods can be applied. CALGB 30801 is an ongoing phase III lung cancer trial conducted at the Cancer and Leukemia Group B (CALGB) [31]. A total of 216 patients with positive COX-2 expression are to be randomized with equal allocation to a standard chemotherapy (arm A) or celecoxib combined with standard chemotherapy (arm B). The primary objective of the trial is to evaluate the survival benefit of celecoxib (a COX-2 inhibitor) among patients with positive COX-2 expression. COX-2 is a protein over-expressed in lung cancers and its intensity is measured on a continuous scale ranging from 0 to 10. Based on preliminary data [32], the proportions of patients with negative (COX-2 < 2), moderate (2 ≤ COX-2 < 4) and positive (COX-2 ≥ 4) expression are about 60%, 13% and 27% respectively in the target population. Therefore, approximately 800 patients are to be screened to accrue the 216 positive patients. As a secondary objective, the investigators are interested in validating the prognostic value of COX-2 for survival in those patients who receive the standard chemotherapy. To answer this question, data from patients who have a full range of COX-2 scores (negative, moderate, positive) and are treated by standard chemotherapy are needed. To avoid the costly option of treating and following the large number of COX-2 negative patients, the investigators decided to select about 1/4 of all negatives to treat and follow for long term survival. 
At the end of the trial, the data available for assessing the prognostic value of COX-2 have the structure of a TDS design. The simple random component (SRC) includes all negatives and all moderates enrolled by the time 1/4 of the positives are enrolled, together with the first 54 positives on arm A. The expected sizes of the three SRC strata are n0,1 = 120, n0,2 = 26, n0,3 = 54, respectively. The rest of the patients receiving standard chemotherapy, namely n1 = 0 negatives, n2 = 78 moderates and n3 = 54 positives, make up the test-result-dependent component (TDC) of the TDS design. For the purpose of illustration, we define D = 1 for patients surviving less than 6 years and D = 0 otherwise. Since very few patients are expected to drop out before 6 years of follow-up in this trial, we ignore the issue of censoring for those patients who are followed for less than 6 years. However, when censoring is non-ignorable and the predictive accuracy of the biomarker over time is of interest, an estimator that takes into account both the time-to-event endpoint and censoring is warranted. This is a research topic for the future.
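The expected stratum sizes quoted above follow from simple arithmetic on the screening numbers. The sketch below is one reading of the design, stated as an assumption: 800 patients are screened, the simple random component is the first quarter of each COX-2 category, positives are split equally between the two arms, and the test-result-dependent component is whatever remains on standard chemotherapy. Variable names are illustrative.

```python
screened = 800
proportions = {"negative": 0.60, "moderate": 0.13, "positive": 0.27}

# Expected number of screened patients per COX-2 category.
expected = {k: round(screened * p) for k, p in proportions.items()}

# Simple random component: the first quarter of each category.
src = {k: n // 4 for k, n in expected.items()}

# Positives are randomized 1:1, so half of them receive standard chemotherapy (arm A).
arm_a_positives = expected["positive"] // 2

# Test-result-dependent component: remaining patients on standard chemotherapy.
tdc = {
    "negative": 0,                                       # no further negatives are followed
    "moderate": expected["moderate"] - src["moderate"],
    "positive": arm_a_positives - src["positive"],
}

print(expected, src, tdc)
```

Under these assumptions the script reproduces the sizes given in the text, n0,1 = 120, n0,2 = 26, n0,3 = 54 for the simple random component and n1 = 0, n2 = 78, n3 = 54 for the test-result-dependent component.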
The CALGB trial is still open for patient accrual and the final data are not yet available for analysis. To illustrate the proposed AUC and pAUC estimators, we resampled a cohort of 800 patients with the full range of COX-2 measures using the data from a preliminary COX-2 study [32], and used the cohort to create a dataset with the same TDS data structure as CALGB 30801. The specific goal is to evaluate how well COX-2 expression levels predict death within 6 years among the patients who receive only standard chemotherapy. Figure 1 shows the estimated ROC curves by the standard method ignoring the TDS data structure (left) and the proposed method (right), and the shaded areas indicate the estimated pAUCs. For AUC, the vector of cutoff points is set to (−∞, 3, 4, ∞). For pAUC, the range of t is set to (0, 0.1), which approximately corresponds to cutoff points (−∞, 4, ∞), such that n1 = 0 and n2 = 54 in the notation for pAUC in Section 5. The estimated AUC using ÂTDS is 0.8052 with a 95% confidence interval [0.7445, 0.8544]. The biased estimator ÂSRS yields an AUC estimate of 0.7749 with a 95% confidence interval [0.7192, 0.8223]. When ÂSRS is applied only to the SRS component of the dataset, the estimated AUC is 0.8125 with a 95% confidence interval [0.7400, 0.8685]. Assuming that the investigators are interested in the performance of the biomarker within a low FPR range t ∈ (0, 0.1), the estimated pAUC using the proposed estimator (8) is 0.0216 with a 95% confidence interval [0.0159, 0.0291]. The biased standard estimator (7) yields a pAUC estimate of 0.0151. The estimated pAUC is 0.0115 when the standard estimator (7) is applied only to the SRS component of the dataset.
Figure 1. COX-2 ROC Curve Estimated by Standard Method and Proposed Method.
Note: The left ROC curve is estimated by the standard method for an SRS design; the right ROC curve is estimated by the proposed method for a TDS design. The shaded areas denote the partial area under the curve (pAUC) with t ∈ (0, 0.1).
7 Discussion
We propose nonparametric estimators for the area under the ROC curve (AUC) and the partial area under the ROC curve (pAUC) for data arising from a test-result-dependent sampling scheme (TDS). We demonstrate that the estimated empirical distribution of test results can be used to reweight the standard estimators of AUC and pAUC to provide valid inference for data arising from the TDS design. We establish the asymptotic properties of the proposed estimators under general regularity conditions. We also demonstrate good finite sample properties for the proposed estimators and their variance estimators under various specifications of the extent of separation of diseased vs. non-diseased subjects, cutoff points for stratification and subject allocation among strata. The proposed estimators are nonparametric in the same sense as the nonparametric AUC and pAUC estimators under the SRS design [4, 20]; and these estimators make no parametric assumptions about the relationship between test results and true disease status.
The test-result-dependent sampling (TDS) design is motivated by the need for better efficiency in biomarker validation studies with limited resources. It is particularly useful when determining the true disease status is expensive, invasive or time consuming. In this article, the data arising from the TDS design consist of two components: a simple random component and a test-result-dependent component. The determination of the true disease status of a subject in the test-result-dependent component is dependent on the test results. More specifically, the selection of the test-result-dependent component is dependent on the intervals defined by pre-chosen cutoff points for stratification on the test result.
Simulations confirm that the proposed estimators for AUC and partial AUC effectively correct the bias introduced by the TDS design. Simulations also show that efficiency gains can be achieved by adopting a TDS design over an SRS design, particularly when the estimation of partial AUC is of interest. It is not surprising that much better precision is achieved by the TDS design for partial AUC estimation, since more subjects fall into the false-positive range of interest than under the SRS design. In addition, greater efficiency gain occurs for pAUC estimation when (q0, q1) is unknown and has to be estimated than when (q0, q1) is known. The efficiency gain in estimating AUC under a TDS design is also understandable, since an SRS design leads to an uneven distribution of subjects in different segments of the test results and hence leads to lower precision in AUC estimation.
Acknowledgments
The research is supported by R03-CA131596 (XW, JM), UL1-RR024128 (XW, SLG) and P01-CA142538 (XW, SLG).
Appendix
Let , s = 0, · · ·, K, i = 1, · · ·, ns, k = 1, · · ·, K − 1, where .
Let
and
where PK−1 is the (K − 1)th-order matrix with all elements 1.
Proof for Asymptotic Distribution of θ̂
The proof follows Owen [28, 29] and Qin and Lawless [30]. Since ns/n, s = 0, · · ·, K − 1 converge to ρs as n → ∞, nK/n converges to . Using the Taylor expansion of lp(θ̂) (see (2)) at the true value θ and noticing , we have
where . Rewriting the sum as , we can apply the central limit theorem to each term, and it is easy to show converges as n tends to infinity. Thus n^{1/2}(θ̂ − θ) converges to a normal distribution as n tends to infinity. Let , we have
Proof for Asymptotic Distribution of ÂTDS
It is obvious that Rn(ÂTDS, θ̂) = 0. Denote e(A, θ) = E(Rn(A, θ)). Expanding Rn(ÂTDS, θ̂) at (A, θ), it follows from the condition that Un(A, θ) is equicontinuous that
where and
Rewrite ξsi in the proof of asymptotic distribution for θ̂ using single subscript as ξi. We have
where where Rij = p̃ip̃jDi(1 − Dj)(Iij − A). The last identity follows from the asymptotic property of θ̂ and the limit distribution theorem for U-statistics [33].
Rewrite using double subscripts as , then we can apply the central limit theorem to each term , s = 0, · · ·, K. Let , it follows that as n → ∞.
Proof of Asymptotic Distribution for τ̂t
Denote . Expanding at the true value (τt, θ), it follows from the condition that is equicontinuous that
where and
By the proof of the asymptotic distribution of θ̂, we have
where , and the second identity follows by the limit distribution theorem of U-statistic, similarly to the proof of Theorem 1. Rewrite using double subscripts as . Similarly, rewrite as . Applying the central limit theorem to and respectively. Let , we have .
References
- 1. Zhou XH, McClish DK, Obuchowski NA. Statistical Methods in Diagnostic Medicine. Wiley; New York: 2002.
- 2. Pepe MS. The statistical evaluation of medical tests for classification and prediction. Oxford University Press; New York: 2003.
- 3. Baker S, Pinsky P. A proposed design and analysis for comparing digital and analog mammography: Special receiver operating characteristic methods for cancer screening. Journal of the American Statistical Association. 2001;96:421–428.
- 4. Dodd LE, Pepe MS. Partial AUC estimation and regression. Biometrics. 2003;59:614–623. doi: 10.1111/1541-0420.00071.
- 5. Little RJA, Rubin DB. Statistical analysis with missing data. Wiley; New York: 1987.
- 6. Begg CB, Greenes RA. Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics. 1983;39:207–215.
- 7. Gray R, Begg C, Greenes R. Construction of receiver operating characteristic curves when disease verification is subject to selection bias. Medical Decision Making. 1984;4:151–164. doi: 10.1177/0272989X8400400204.
- 8. Zhou XH. A nonparametric maximum likelihood estimator for the receiver operating characteristic curve area in the presence of verification bias. Biometrics. 1996;52:299–305.
- 9. Zhou XH. Comparing accuracies of two screening tests in the presence of verification bias. Journal of the Royal Statistical Society, Series C. 1998;47:135–147.
- 10. Alonzo TA, Pepe MS. Assessing accuracy of a continuous screening test in the presence of verification bias. Applied Statistics. 2005;54:173–190.
- 11. Wang XF, Wu YG, Zhou HB. Outcome- and Auxiliary-Dependent Subsampling and Its Statistical Inference. Journal of Biopharmaceutical Statistics. 2009;19(6):1132–1150. doi: 10.1080/10543400903243025.
- 12. Weaver MA, Zhou HB. An estimated likelihood method for continuous outcome regression models with outcome-dependent sampling. Journal of the American Statistical Association. 2005;100:459–469.
- 13. Zhou HB, Chen JW, Rissanen T, Korrick S, Hu H, Salonen JT, Longnecker MP. An efficient sampling and inference procedure for studies with a continuous outcome. Epidemiology. 2007;18:461–468. doi: 10.1097/EDE.0b013e31806462d3.
- 14. Zhou HB, Weaver MA, Qin J, Longnecker MP, Wang MC. A semiparametric empirical likelihood method for data from an outcome dependent sampling scheme with a continuous outcome. Biometrics. 2002;58:413–421. doi: 10.1111/j.0006-341x.2002.00413.x.
- 15. Wang XF, Zhou HB. A semiparametric empirical likelihood method for biased sampling schemes in epidemiologic studies with auxiliary covariates. Biometrics. 2006;62:1149–1160. doi: 10.1111/j.1541-0420.2006.00612.x.
- 16. Cosslett SR. Efficient estimation of discrete-choice models. In: Manski C, McFadden D, editors. Structural Analysis of Discrete Data: with Econometric Applications. Cambridge MA: M.I.T. Press; 1981a. pp. 51–111.
- 17. Cosslett SR. Maximum likelihood estimator for choice-based samples. Econometrica. 1981b;49:1289–1316.
- 18. Morgenthaler S, Vardi Y. Choice-based samples: a nonparametric approach. Journal of Econometrics. 1986;32:109–125.
- 19. Bamber D. The area above the ordinal dominance graph and the area below the operating graph. Journal of Mathematical Psychology. 1975;12:387–415.
- 20. Hanley JA, McNeil BJ. The meaning and use of the area under the receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36. doi: 10.1148/radiology.143.1.7063747.
- 21. DeLong ER, DeLong D, Clarke-Pearson D. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics. 1988;44:837–845.
- 22. Wieand S, Gail M, James B, James K. A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika. 1989;76:585–592.
- 23. Thompson ML, Zucchini W. On the statistical analysis of ROC curves. Statistics in Medicine. 1989;8:1277–1290. doi: 10.1002/sim.4780081011.
- 24. McClish DK. Analyzing a portion of the ROC curve. Medical Decision Making. 1989;9:190–195. doi: 10.1177/0272989X8900900307.
- 25. Jiang Y, Metz CE, Nishikawa RM. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology. 1996;201:745–750. doi: 10.1148/radiology.201.3.8939225.
- 26. Zhang DZ, Zhou XH, Freeman DH, Freeman JL. A non-parametric method for the comparison of partial areas under ROC curves and its application to large health care data sets. Statistics in Medicine. 2002;21:701–715. doi: 10.1002/sim.1011.
- 27. He Y, Escobar M. Nonparametric statistical inference method for partial areas under receiver operating characteristic curves, with application to genomic studies. Statistics in Medicine. 2008;27:5291–5308. doi: 10.1002/sim.3335.
- 28. Owen AB. Empirical likelihood ratio confidence intervals for a single functional. Biometrika. 1988;75:237–249.
- 29. Owen AB. Empirical likelihood for confidence regions. Annals of Statistics. 1990;18:90–120.
- 30. Qin J, Lawless JF. Empirical likelihood and general estimating equations. Annals of Statistics. 1994;22:300–325.
- 31. Edelman MJ, Wang XF, Cheney R, Kratzke R, Vokes E. A randomized phase III double blind trial evaluating selective COX-2 inhibition in COX-2 expressing advanced non-small cell lung cancer. Cancer and Leukemia Group B Protocol. 2009.
- 32. Edelman M, Watson D, Wang XF, Morrison C, Kratzke R, Jewell S, Hodgson L, Mauer AM, Graziano SL, Masters GA, Bedor M, Green MJ, Vokes EE. Eicosanoid modulation in advanced lung cancer: COX-2 expression is a positive predictive factor for celecoxib + chemotherapy. Journal of Clinical Oncology. 2007;26:848–855. doi: 10.1200/JCO.2007.13.8081.
- 33. Van der Vaart AW. Asymptotic statistics. Cambridge University Press; 1998.

