Abstract
Diagnostic trials often require the use of a homogeneity test among several markers. Such a test may be necessary to determine the power both during the design phase and in the initial analysis stage. However, no formal method is available for the power and sample size calculation when the number of markers is greater than two and marker measurements are clustered in subjects. This article presents two procedures for testing the accuracy among clustered diagnostic markers. The first procedure is a test of homogeneity among continuous markers based on a global null hypothesis of the same accuracy. The result under the alternative provides the explicit distribution for the power and sample size calculation. The second procedure is a simultaneous pairwise comparison test based on weighted areas under the receiver operating characteristic curves. This test is particularly useful if a global difference among markers is found by the homogeneity test. We apply our procedures to the BioCycle Study designed to assess and compare the accuracy of hormone and oxidative stress markers in distinguishing women with ovulatory menstrual cycles from those without.
Keywords: ROC curve, biomarker, homogeneity test, sample size
1. Introduction
Diagnostic trials are often carried out to test and compare the diagnostic accuracy of markers. Elegant methods have been proposed to combine markers after the trials are finished (see, for example, [1, 2]). However, these methods may not be applicable when investigators need to determine the power during the design phase and in the initial analysis stage. A homogeneity test offers a better alternative. When more than two markers are measured on the same set of diseased and non-diseased subjects, obtaining an appropriate test statistic and its variance becomes challenging.
The BioCycle Study [3, 4] is a longitudinal study conducted at the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health. One of the primary objectives of the study is to assess and compare the accuracy of endogenous hormones (i.e., estrogen and progesterone) and oxidative stress markers during the menstrual cycle in distinguishing women with ovulatory menstrual cycles from those without. This helps further the understanding of the implications of oxidative stress in the risk of human infertility and the mechanisms by which oxidative stress may be associated with female infertility. The study enrolled healthy premenopausal women and measured both oxidative stress markers and endogenous hormones in serum on multiple days [3].
In this study, we initially classified menstrual cycles as anovulatory if the peak progesterone concentration across the cycle was ≤5 ng/mL. To minimize misclassification, we employed a more conservative definition of anovulation in which cycles with progesterone concentrations ≤5 ng/mL and an observed serum luteinizing hormone (LH) peak in the mid or late luteal phase visit were considered ovulatory. On the basis of this algorithm, 35 out of 259 women had at least one anovulatory cycle, and the rest of the women were ovulatory. The investigators were interested in comparing three hormones [estradiol (E2), follicle stimulating hormone (FSH), and LH] and an oxidative stress marker, F2-isoprostanes (F2iso), on the visits around ovulation (corresponding to the late follicular phase, LH/FSH surge, and predicted ovulation) to see whether these markers have different diagnostic accuracy and, if so, which of these markers are better. Figure 1 shows the histograms of the four markers' levels at the first visit around ovulation. We can see that it is difficult to make a reasonable parametric assumption about the distributions of marker measurements in our study. It is also impossible to transform the marker observations to follow some known distributions because part (a) in Figure 1 clearly indicates that observations from the anovulatory group and the ovulatory group tend to skew differently. Therefore, a nonparametric or semiparametric test that does not require any distributional assumptions would be more appropriate than parametric methods.
Figure 1.
Relative frequency histograms of ovulatory and anovulatory women for the markers estradiol (E2), follicle stimulating hormone (FSH), luteinizing hormone (LH), and F2-isoprostanes (F2iso).
The primary scientific questions to be answered in the BioCycle Study include the following: (1) testing homogeneity among markers and (2) calculating power and sample size for this study. However, to the best of our knowledge, no method is available to address these two questions. Although we may extend the semiparametric kernel ROC methods proposed in [5] to compare multiple markers by obtaining test statistics under the null hypothesis using resampling methods, it is difficult to carry out the resampling under the alternative. It is also infeasible to conduct resampling at the design stage when no data are available. In this paper, we introduce a formal homogeneity test for testing diagnostic accuracy among markers. The proposed procedure tests a global hypothesis. The χ² distribution derived under the alternative for the test statistic is useful for power analysis based on hypothesized differences among markers. When the number of markers is only two, the proposed global test reduces to the commonly used test by [6]. In addition, we propose a simultaneous pairwise comparison procedure based on generalized ROC summary statistics. This procedure is applicable if the test of homogeneity rejects the null hypothesis and a subsequent analysis is needed to search for the set of most accurate markers. The contributions of the proposed tests are not limited to the aforementioned example. The tests are also applicable in comparing multiple medical imaging modalities, which are of wide interest in radiological studies (see, for example, [7, 8]).
The rest of the article is organized as follows. In Section 2, we propose a global test based on the hypothesis of homogeneous accuracy and discuss its theoretical properties. In Section 3, we introduce a simultaneous pairwise comparison test if significance is found by the homogeneity test. We also investigate the theoretical property of the procedure. We apply the proposed procedures to the BioCycle Study data in Section 4. Section 5 reports simulation results to illustrate the small sample performance of the proposed method. We defer the proof of the asymptotic distributions to the Appendix.
2. A test of homogeneity
In this section, we introduce a test of homogeneity based on ROC summary statistics for clustered observations. We start with a brief introduction to ROC summary statistics. We then use these summary statistics to construct the test.
Let $X_{\ell i p}$ denote the test result of marker $\ell$ on day $p$ in diseased subject $i$, where $\ell = 1, \ldots, L$, $p = 1, \ldots, m_{\ell i}$, and $i = 1, \ldots, I$. Let $Y_{\ell j q}$ denote the test result of marker $\ell$ on day $q$ in non-diseased subject $j$, where $\ell = 1, \ldots, L$, $q = 1, \ldots, n_{\ell j}$, and $j = 1, \ldots, J$. Here, the total number of subjects is $N = I + J$. Define the joint survival function $(X_{\ell_1 i p_1}, X_{\ell_2 i p_2}) \sim S_{D,\ell_1,\ell_2}(x_1, x_2)$, $p_1 = 1, \ldots, m_{\ell_1 i}$, $p_2 = 1, \ldots, m_{\ell_2 i}$, for the diseased subjects with marginal survival functions $X_{\ell i p} \sim S_{D,\ell}(x)$. Similarly, define $(Y_{\ell_1 j q_1}, Y_{\ell_2 j q_2}) \sim S_{\bar D,\ell_1,\ell_2}(y_1, y_2)$, $q_1 = 1, \ldots, n_{\ell_1 j}$, $q_2 = 1, \ldots, n_{\ell_2 j}$, for the non-diseased subjects with marginal survival functions $Y_{\ell j q} \sim S_{\bar D,\ell}(y)$. Without loss of generality, we assume that measurements tend to be larger for the diseased than for the non-diseased. Let $u$ be the false positive rate (FPR), which is also 1 − specificity. The ROC curve for the $\ell$th marker is $\mathrm{ROC}_\ell(u) = S_{D,\ell}\{S_{\bar D,\ell}^{-1}(u)\}$, where $u \in [0, 1]$. The resulting $\ell$th weighted area under the curve (wAUC) is

$$\Omega_\ell = \int_0^1 S_{D,\ell}\{S_{\bar D,\ell}^{-1}(u)\}\, dW(u), \tag{1}$$

with a probability measure $W(u)$ defined on $u \in (0, 1)$. Included in this class of accuracy measures are the AUC (when $W(u) = u$ for $0 < u < 1$), the partial AUC (pAUC) between FPRs $u_1$ and $u_2$ (when $W(u) = (u - u_1)/(u_2 - u_1)$ for $0 < u_1 \le u \le u_2 \le 1$), and the sensitivity at a given level of FPR $u_0$ (when $W(u)$ is a point mass at $u_0$). The nonparametric wAUC estimator is given by $\hat\Omega_\ell = \int_0^1 \hat S_{D,\ell}\{\hat S_{\bar D,\ell}^{-1}(u)\}\, dW(u)$, where $\hat S_{D,\ell}$ and $\hat S_{\bar D,\ell}$ are the empirical estimates of $S_{D,\ell}$ and $S_{\bar D,\ell}$ defined by
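$$\hat S_{D,\ell}(x) = \frac{\sum_{i=1}^{I} \sum_{p=1}^{m_{\ell i}} 1\{X_{\ell i p} > x\}}{\sum_{i=1}^{I} m_{\ell i}}$$

and

$$\hat S_{\bar D,\ell}(y) = \frac{\sum_{j=1}^{J} \sum_{q=1}^{n_{\ell j}} 1\{Y_{\ell j q} > y\}}{\sum_{j=1}^{J} n_{\ell j}}.$$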
Denote Ω = (Ω1, Ω2,…, ΩL). By substituting ŜD,ℓ and ŜD̄,ℓ in Equation (1), the nonparametric estimator of Ω is given by Ω̂ = (Ω̂1, Ω̂2, …, Ω̂L).
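As a minimal sketch of the estimator for the ordinary AUC (the case W(u) = u), the snippet below pools all repeated measurements within each group and computes the proportion of (diseased, non-diseased) measurement pairs that are correctly ordered. For continuous data without ties this Mann-Whitney form agrees with the empirical wAUC. The function name `empirical_auc`, the nested-list data layout, the half-credit convention for ties, and the toy values are assumptions made for illustration; they are not BioCycle data.

```python
def empirical_auc(diseased, nondiseased):
    """Empirical AUC for one marker with clustered measurements.

    diseased, nondiseased: lists with one inner list per subject, holding
    that subject's repeated measurements of the marker (hypothetical layout).
    Larger values are assumed to indicate disease.
    """
    # Pool the repeated measurements across subjects within each group.
    xs = [x for subject in diseased for x in subject]
    ys = [y for subject in nondiseased for y in subject]
    # Count correctly ordered pairs; ties get half credit (an assumption).
    wins = sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)
    return wins / (len(xs) * len(ys))


# Toy data: two diseased subjects and two non-diseased subjects.
toy_diseased = [[3.0, 4.0], [5.0]]
toy_nondiseased = [[1.0], [2.0, 3.5]]
toy_auc = empirical_auc(toy_diseased, toy_nondiseased)
print(toy_auc)  # 8 of the 9 pairs are correctly ordered
```

Pooling gives the point estimate only; the within-subject correlation matters for the variance, which is why the covariance results of this section are needed.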
2.1. Comparing multiple diagnostic tests
We will consider the following null hypothesis of homogeneity among markers:

$$H_0: \Omega_1 = \Omega_2 = \cdots = \Omega_L.$$

This hypothesis is an important initial step in comparing multiple markers [9]. Rejecting this null hypothesis is indicative of a significant difference among some markers. Subsequent analysis such as pairwise comparisons can then be conducted to search for the optimal set of markers which have the best diagnostic accuracy. To construct a test statistic for the null hypothesis, we define a new vector as

$$\Omega_c = (\Omega_{c,1}, \Omega_{c,2}, \ldots, \Omega_{c,L-1})',$$

where $\Omega_{c,\ell} = \Omega_\ell - \Omega_L$, for $\ell < L$. In fact, any of the markers can serve as marker $L$ to construct the test statistic. This is due to the fact that the hypothesis of homogeneity is equivalent to testing $H_0: \Omega_c = 0$ vs. $H_a: \Omega_c \ne 0$. A consistent estimator of $\Omega_c$ is given by $\hat\Omega_c = (\hat\Omega_1 - \hat\Omega_L, \hat\Omega_2 - \hat\Omega_L, \ldots, \hat\Omega_{L-1} - \hat\Omega_L)'$. The relationship between $\hat\Omega$ and $\hat\Omega_c$ can be expressed by $\hat\Omega_c = M\hat\Omega$, where $M = (I_{L-1}, -1_{L-1})$, with an identity matrix $I_{L-1}$ and a vector of ones $1_{L-1} = (1, 1, \ldots, 1)'$.
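The mapping from the vector of accuracy estimates to the vector of contrasts can be sketched in a few lines. The example below builds the matrix M for L = 4 and applies it to the partial AUC estimates reported in Section 4, with the reference marker in the last position; the helper names `contrast_matrix` and `mat_vec` are hypothetical.

```python
def contrast_matrix(L):
    """Return M = (I_{L-1}, -1_{L-1}) as a list of L - 1 rows of length L."""
    return [[1.0 if col == row else -1.0 if col == L - 1 else 0.0
             for col in range(L)]
            for row in range(L - 1)]


def mat_vec(M, v):
    """Multiply the matrix M (list of rows) by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Partial AUC estimates from Section 4, ordered (E2, FSH, LH, F2iso).
omega_hat = [0.3590, 0.2286, 0.2196, 0.2026]
M = contrast_matrix(len(omega_hat))
omega_c_hat = mat_vec(M, omega_hat)
print([round(d, 4) for d in omega_c_hat])  # differences from the reference
```

The rounded output matches the differences (0.1564, 0.0260, 0.0170) quoted in Section 4.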
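Denote $m_\ell = \sum_{i=1}^{I} m_{\ell i}$ and $n_\ell = \sum_{j=1}^{J} n_{\ell j}$. Assume that $I/J \to \lambda < \infty$, as $I, J \to \infty$. Let

$$r_\ell(u) = \frac{S'_{D,\ell}\{S_{\bar D,\ell}^{-1}(u)\}}{S'_{\bar D,\ell}\{S_{\bar D,\ell}^{-1}(u)\}},$$

where $S'_{D,\ell}$ and $S'_{\bar D,\ell}$ are the first derivatives of $S_{D,\ell}$ and $S_{\bar D,\ell}$, respectively.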
Define
and
and r̂ℓ(u) is the kernel density estimate of rℓ(u). In [10], the authors used the Epanechnikov kernel function with a bandwidth of 4/max{min(I, J)^(4/5), 50}, where I and J are the sample sizes for the diseased group and the non-diseased group, respectively. The same setting was used in our simulation studies and the example. Other kernel functions such as the Gaussian kernel may also be used. A detailed discussion on kernel methods can be found in [11]. Denote
In the Appendix, we show that Ω̂c asymptotically follows a multivariate normal distribution given by
| (2) |
A nonnegative definite covariance estimator of Σc is
| (3) |
The asymptotic distribution of the test statistic is provided in the following theorem and is proved in the Appendix.
Theorem 1
With mild regularity conditions, Tc converges in distribution to a χ² distribution with L − 1 degrees of freedom under H0, and Tc converges in distribution to a noncentral chi-square distribution with the non-centrality parameter,
under the alternative hypothesis, as I, J → ∞.
For L = 2, Σc reduces to one element, var(Ω̂1 − Ω̂2). The asymptotic distribution of Tc = (Ω̂1 − Ω̂2)²/var(Ω̂1 − Ω̂2) reduces to a noncentral χ² distribution with one degree of freedom and non-centrality ϕ = c²/var(Ω̂1 − Ω̂2) under the alternative Ha: Ω1 − Ω2 = c. It implies that (Ω̂1 − Ω̂2)/{var(Ω̂1 − Ω̂2)}^(1/2) has an approximately normal distribution with mean c/{var(Ω̂1 − Ω̂2)}^(1/2) and variance 1 under Ha and has approximately a standard normal distribution under H0. This asymptotic normality has been studied by [12] and [13]. Their expression of var(Ω̂1 − Ω̂2) is the same as ours.
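To illustrate the quadratic form behind the test statistic, and the earlier remark that any marker can serve as the reference, the sketch below computes the contrast vector d = MΩ̂ and the statistic d′(MVM′)⁻¹d for each choice of reference, where V stands for an estimated covariance matrix of Ω̂. The matrix `V` used here is purely hypothetical (it is not the BioCycle estimate), the helper names are illustrative, and the small Gaussian elimination routine simply stands in for a linear algebra library.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for r in range(k + 1, n):
            factor = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= factor * aug[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(aug[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (aug[k][n] - tail) / aug[k][k]
    return x


def contrasts(L, ref):
    """Rows e_l - e_ref for every marker l other than the reference."""
    rows = []
    for l in range(L):
        if l != ref:
            row = [0.0] * L
            row[l], row[ref] = 1.0, -1.0
            rows.append(row)
    return rows


def wald_statistic(omega_hat, V, ref):
    """Quadratic form d' (M V M')^{-1} d with d = M omega_hat."""
    M = contrasts(len(omega_hat), ref)
    d = [sum(m * w for m, w in zip(row, omega_hat)) for row in M]
    Vc = mat_mul(mat_mul(M, V), transpose(M))
    return sum(di * xi for di, xi in zip(d, solve(Vc, d)))


omega_hat = [0.3590, 0.2286, 0.2196, 0.2026]   # pAUC estimates, Section 4
V = [[0.0040, 0.0010, 0.0008, 0.0005],         # hypothetical covariance of
     [0.0010, 0.0030, 0.0012, 0.0006],         # omega_hat, for illustration
     [0.0008, 0.0012, 0.0035, 0.0007],         # only
     [0.0005, 0.0006, 0.0007, 0.0025]]
stats = [wald_statistic(omega_hat, V, ref) for ref in range(len(omega_hat))]
print(stats)  # the same value for every choice of reference marker
```

The statistic is unchanged across references because every set of L − 1 contrasts against a common reference spans the same space of differences.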
When different subjects are measured with different markers, Ω̂ℓ’s are independent estimators, and we can derive an explicit form for the variance–covariance matrix of Ω̂c. Assume that I/mℓ → αℓ, and I/nℓ → βℓ, as I, J → ∞. Here, αℓ and βℓ are finite numbers. We also assume that , and , as I, J → ∞. It follows that
where 1 = (1, 1, …, 1)′, and vℓ = var(Ω̂ℓ), whose explicit form is given in Theorem 2 by letting h(·) = Ωℓ,
| (4) |
Given the special form of the variance–covariance matrix Σc, the inverse matrix is given by an explicit expression using results in matrix operations. Theorem 1 then gives a simplified distribution result of Tc, which is summarized in the following corollary.
Corollary 1
With mild regularity conditions, as I, J → ∞, Tc converges in distribution to a χ² distribution with L − 1 degrees of freedom under H0, and converges in distribution to a noncentral chi-square distribution with
under Ha.
2.2. Power and sample size
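The explicit forms of the χ² distributions under Ha in either Theorem 1 or Corollary 1 are useful for power analysis. The power under Ha is given by

$$P\left(T > \chi^2_{\alpha, L-1}\right),$$

where $T \sim \chi^2_{L-1}(\phi)$ is a noncentral chi-square random variable with $L - 1$ degrees of freedom and non-centrality $\phi$, and $\chi^2_{\alpha, L-1}$ is the upper α critical value of a chi-square distribution with L − 1 degrees of freedom. The non-centrality ϕ can be estimated by ϕ̂, obtained by evaluating the expression for ϕ in Theorem 1 at the hypothesized value of Ωc under the alternative together with the covariance estimate in (3). Asymptotically, ϕ̂ converges to its true value.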
For a given power 1 − β, we have the following relationship
| (5) |
where is the upper β critical value of a noncentral chi-square distribution with L − 1 degrees of freedom. By fixing the power 1 − β and type I error rate α, the non-centrality parameter ϕ can be solved numerically to satisfy the relationship in (5). Let the solution for ϕ be ϕα,1−β. By combining the expression of ϕ in Theorem 1 and the expression of Σc in (2), it follows that
| (6) |
The minimal relevant difference, Ωc, among ROC summary measures can be specified by investigators. Using either pilot studies or specific distribution assumptions, we can estimate and . By specifying a ratio between I and J, we can then solve (6) for the required sample sizes for the diseased and non-diseased patients.
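The two numerical steps of this section, evaluating the power for a given non-centrality and solving for the non-centrality that attains a target power, can be sketched with the standard library alone. The noncentral chi-square tail is written as a Poisson mixture of central chi-square tails, and bisection plays the role of a root finder. All function names are hypothetical, and this is an illustrative sketch under those assumptions rather than the authors' implementation.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (a + n)
        total += term
    log_front = -z + a * math.log(z) - math.lgamma(a)
    return min(1.0, total * math.exp(log_front))


def chi2_critical(alpha, df):
    """Upper-alpha critical value of the central chi-square, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power(phi, df, alpha=0.05):
    """P(noncentral chi-square(df, phi) > upper-alpha critical value)."""
    crit = chi2_critical(alpha, df)
    half = phi / 2.0
    if half == 0.0:
        return 1.0 - chi2_cdf(crit, df)
    j_max = int(half + 10.0 * math.sqrt(half) + 30)
    total = 0.0
    for j in range(j_max + 1):
        # Poisson(phi / 2) weight times a central tail with df + 2j d.f.
        log_w = -half + j * math.log(half) - math.lgamma(j + 1)
        total += math.exp(log_w) * (1.0 - chi2_cdf(crit, df + 2 * j))
    return total


def solve_noncentrality(target_power, df, alpha=0.05):
    """Smallest phi whose power reaches target_power, by bisection."""
    lo, hi = 0.0, 1.0
    while power(hi, df, alpha) < target_power:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if power(mid, df, alpha) < target_power:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative call: power for L = 3 markers (2 d.f.) and a hypothetical phi.
example_power = power(8.0, df=2, alpha=0.05)
print(example_power)
```

Once the target non-centrality is known, the remaining step is to solve (6) for I and J at a chosen ratio, which requires the variance components estimated from pilot data or distributional assumptions.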
3. Post hoc pairwise comparisons based on Δ-statistics
After the test finds a significant difference among markers, we can use a post hoc pairwise comparison to identify the most accurate markers by using differences between the weighted AUCs, Δℓk = Ωℓ − Ωk, for ℓ ≠ k. The estimator Δ̂ℓk = Ω̂ℓ − Ω̂k, for ℓ ≠ k can be obtained from the data. A general setting for Ω̂ℓ − Ω̂k is when mℓi, nℓj ≥ 1. In this paper, the multivariate normality of Ω̂c in (8) of the Appendix allows derivation of the explicit expression of the asymptotic variance of Δ̂ℓk.
Theorem 2
With mild regularity conditions,
where
and
where
Theorem 2 is a direct result obtained by combining the multivariate normal distribution of Ω̂ and the Cramer–Wold device [14, Theorem 1.5.2]. Special details are omitted here.
For the purpose of pairwise comparisons after the simultaneous test, L − 1 Δ-statistics can be defined to be the difference between markers. That is, Δ̂1L = Ω̂1 − Ω̂L, Δ̂2L = Ω̂2 − Ω̂L, …, Δ̂L−1,L = Ω̂L−1 − Ω̂L. By directly using results in Theorem 2, p-values can be obtained for these pairwise comparisons. By comparing these p-values to a certain threshold from multiple comparison procedures such as the Bonferroni test or false discovery rate method, markers which have better accuracy can be identified.
The pairwise comparison after the proposed homogeneity test in diagnostic trials is analogous to post hoc tests after the analysis of variance test in clinical trials. In our setting, we use an overall significance test to monitor the diagnostic trial, for example, to determine early stopping of the trial. After the trial is terminated on the basis of the global test, methods such as a step-down approach are then employed to identify individual biomarkers.
4. The BioCycle Study revisited
In the aforementioned BioCycle Study, 35 women were anovulatory, and 224 women were ovulatory. The markers were measured at three visits for all patients. The investigators were interested in comparing the accuracy of E2, FSH, LH, and F2iso in distinguishing between ovulatory and anovulatory menstrual cycles using ROC summary measures. The empirical ROC curves of these markers are shown in Figure 2(a). The ROC derivative function r(u), estimated by kernel density smoothing for each marker, is illustrated in Figure 2(b). We first compared these markers on the basis of their AUCs. We conducted a homogeneity test of all these markers using results from Theorem 1 with α = 0.05. We estimated the AUCs of E2, FSH, LH, and F2iso,
Figure 2.
Empirical ROC curves and their derivative functions of the markers estradiol (E2), follicle stimulating hormone (FSH), luteinizing hormone (LH), and F2-isoprostanes (F2iso).
In this example, we chose F2iso as the reference biomarker. Note that the results remain the same when any other biomarker is used as the reference. The differences between the individual markers and the reference marker were Ω̂c = (0.1915, 0.0130, 0.0100). The variance–covariance matrix Σc of Ω̂c was further estimated by (3):
The chi-square statistic was calculated, , under H0. The p-value given by for this test does not show a significant difference among these markers based on their AUCs.
As these markers were developed to screen a large population of women and distinguish between ovulatory and anovulatory menstrual cycles, the accuracy at a FPR less than 0.6 was also determined to be important. In fact, similar decisions to look at small FPRs are often made for cancer screening markers. We conducted a homogeneity test of all these markers on the basis of the pAUCs with α = 0.05. We calculated the partial AUC estimates of the biomarkers, (0.3590, 0.2286, 0.2196, 0.2026). This gave differences between individual markers and the reference marker of (0.1564, 0.0260, 0.0170). The variance–covariance matrix Σc of Ω̂c was further estimated by (3):
The chi-square statistic was under H0. The p-value is 0.04 for this test, indicating a significant difference among these markers based on their partial AUCs.
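The results we derived in Theorem 1 were used for power analysis. Suppose the variance–covariance matrix of Ω̂ remains the same under Ha. With a hypothesized Ωc and type I error rate 0.05, the power under Ha is given by

$$P\left(T > \chi^2_{0.05, 3}\right),$$

where $T \sim \chi^2_{3}(\hat\phi)$ is a noncentral chi-square random variable with three degrees of freedom, and $\chi^2_{0.05, 3}$ is the upper α critical value of a chi-square distribution with three degrees of freedom. Here, ϕ̂ is the estimate of the non-centrality parameter, obtained by evaluating the expression in Theorem 1 at the hypothesized Ωc and the estimated variance–covariance matrix. Table I gives the powers for various hypothesized differences between partial AUCs of the reference marker and the other markers.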
Table I.
Power of the homogeneity test.
| (Δ1L, Δ2L, Δ3L) | Power (%) |
|---|---|
| (0.15, 0.05, 0.05) | 44.49 |
| (0.15, 0.10, 0.05) | 60.27 |
| (0.15, 0.10, 0.10) | 62.49 |
| (0.20, 0.05, 0.05) | 41.87 |
| (0.20, 0.10, 0.05) | 85.25 |
We also illustrate the sample size calculation using the method proposed in Section 2.2. Let the hypothesized difference between partial AUCs of the reference marker and the other markers be (0.15, 0.05, 0.05). The power is 44.49% in Table I with 35 anovulatory women and 224 ovulatory women in the original study. We calculated the required sample sizes to obtain 80% power. With type I error rate 0.05, the non-centrality parameter was calculated to be ϕ0.05,0.80 ≈ 14.64. We then used the R function uniroot to solve equation (6) and obtained the required sample sizes, I = 96 and J = 48, with a ratio of 2:1 between anovulatory and ovulatory women. Compared with the original sample sizes, the required sample size for ovulatory women is only about one-fourth of the original sample size, yet the power of 80% is almost twice as large as the original power. Interestingly, the findings indicate that recruiting more anovulatory women and fewer ovulatory women may dramatically increase the test power.
Because a significant difference among pAUCs was found, we conducted post hoc pairwise comparisons, using the conservative Bonferroni criterion to adjust for multiple comparisons. The estimated pairwise differences between the reference marker and the other markers are given by Δ̂ℓL for ℓ = 1, …, 3, and the estimated variances of these differences are given by the diagonal elements of Σ̂c. We may also estimate the variances using the results in Theorem 2. In fact, the variance estimates given by Theorem 2 are similar to those in Σ̂c. As we are interested in whether the other markers have different accuracy from the reference marker, the z-statistics for the pairwise comparisons are (2.2785, 0.3972, 0.3140). The Bonferroni-adjusted threshold is given by z_{0.05/3} = 2.12. Comparing the z-statistics with this threshold, it can be seen that E2 has different accuracy from F2-isoprostanes in distinguishing women with and without ovulation, and there is no significant difference between the other markers and F2-isoprostanes.
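The Bonferroni step above can be checked with a short sketch. It takes the three z-statistics reported in the text, computes the upper 0.05/3 quantile of the standard normal distribution, and lists the markers whose statistic exceeds it. The dictionary layout and variable names are assumptions for illustration; the computed quantile is roughly 2.13, in line with the 2.12 quoted above.

```python
from statistics import NormalDist

# z-statistics for each marker versus the reference F2iso (from the text).
z_stats = {"E2": 2.2785, "FSH": 0.3972, "LH": 0.3140}
alpha = 0.05

# Bonferroni: split alpha across the three comparisons (upper quantile).
threshold = NormalDist().inv_cdf(1.0 - alpha / len(z_stats))
significant = [marker for marker, z in z_stats.items() if z > threshold]

print(round(threshold, 3), significant)
```

A less conservative procedure, such as a false discovery rate method, would change only how the threshold is derived from the ordered p-values.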
5. A simulation study
We report simulation studies to evaluate the simulated type I error rates of the proposed test procedure. We simulated 1000 datasets under multivariate normal and multivariate log-normal distributions. We simulated multivariate normal random variables X̃ ~ N(μX, ΣX) and Ỹ ~ N(μY, ΣY), where μX = (1, 1, 2, 2, 1.5, 1.5), μY = (0, …, 0), and ΣX = ΣY is the variance–covariance matrix with diagonal elements (1, 1, 4, 4, 2.25, 2.25) and correlation coefficient, ρ. Here, ρ gives within-subject correlation. We let L = 3 markers and K = 2 repeated measurements for each subject. We chose the correlation coefficient ρ to be −0.2, −0.1, 0, 0.2, or 0.5. We let I = J = (50, 100, 200). For comparing AUCs, we let W(u)= 1 for 0 < u < 1. For comparing partial AUCs, we let W(u) = 1 for 0 < u < 0.8. To simulate multivariate log-normal random variables, we applied exponential transformation, X = exp (X̃) and Y = exp (Ỹ), to get simulated log-normal data, where X̃ and Ỹ were simulated using the aforementioned settings.
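A quick check confirms that the first setting satisfies the null hypothesis. Under a binormal model the AUC of a marker is Φ((μX − μY)/(σX² + σY²)^(1/2)), and the exponential transformation is monotone, so the log-normal data share the same AUCs. The sketch below evaluates this formula for the three markers using the means and variances stated above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# One entry per marker; the two repeated measurements of a marker share
# the same marginal mean and variance in the simulation setting.
mu_x = [1.0, 2.0, 1.5]        # diseased means (non-diseased means are 0)
variances = [1.0, 4.0, 2.25]  # common variances for both groups

# Binormal AUC: Phi((mu_x - mu_y) / sqrt(var_x + var_y)).
aucs = [NormalDist().cdf(m / sqrt(2.0 * v)) for m, v in zip(mu_x, variances)]
print([round(a, 4) for a in aucs])  # identical across the three markers
```

Each marker has a standardized separation of 1/√2, so all three AUCs equal Φ(1/√2), approximately 0.76.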
We also simulated multivariate log-normal random variables under unequal sample sizes and unequal covariance matrices with L = 3 markers and K = 2 repeated measurements for each subject. In the simulation, we let I = 80, and J = (100, 200). We let μX = (0, …, 0), μY = (0, …, 0), and the covariance matrix for the diseased population have diagonal elements (1, 1, 4, 4, 2.25, 2.25) and a correlation coefficient, ρ, and let the covariance matrix for the non-diseased population have the same diagonal elements but with a different correlation coefficient ρ + 0.2. We again let ρ be −0.2, −0.1, 0, 0.2, or 0.5. We first simulated multivariate normal random variables X̃ ~ N(μX, ΣX) and Ỹ ~ N(μY, ΣY). The exponential transformation, X = exp (X̃) and Y = exp (Ỹ), was then applied to get simulated log-normal data.
It is clear that the null hypothesis of equal AUCs or equal partial AUCs is true under these simulation settings. We then applied the proposed simultaneous comparison procedure to the simulated datasets. For each simulated dataset, we estimated Ω̂c = (Ω̂2 − Ω̂1, Ω̂3 − Ω̂1) and its variance–covariance matrix Σ̂c. We then calculated the χ² statistic Tc. The null hypothesis was rejected if Tc exceeded the upper 0.05 critical value of the χ² distribution with two degrees of freedom. We counted the number of times that the null hypothesis of equal AUCs or pAUCs was rejected out of 1000 simulated datasets in each simulation setting. Tables II and III show the simulated rejection rates of our test procedure with the nominal level of 0.05. For sample sizes of 100 and 200, most rejection rates are within the 95% predictive interval.
Table II.
Rejection rates of the proposed procedure for equal sample sizes.
| ρ | AUC, I = J = 50 (%) | AUC, I = J = 100 (%) | AUC, I = J = 200 (%) | pAUC, I = J = 50 (%) | pAUC, I = J = 100 (%) | pAUC, I = J = 200 (%) |
|---|---|---|---|---|---|---|
| Multivariate normal | | | | | | |
| −0.2 | 7.90 | 6.20 | 5.80 | 8.20 | 6.50 | 6.00 |
| −0.1 | 6.80 | 6.20 | 6.20 | 6.80 | 6.80 | 6.20 |
| 0 | 6.40 | 5.90 | 6.20 | 6.70 | 5.70 | 5.90 |
| 0.2 | 8.20 | 5.80 | 5.80 | 8.30 | 5.70 | 5.80 |
| 0.5 | 5.60 | 5.30 | 5.20 | 5.90 | 5.50 | 5.20 |
| Multivariate log-normal | | | | | | |
| −0.2 | 7.10 | 5.00 | 5.70 | 6.90 | 4.90 | 6.10 |
| −0.1 | 6.90 | 4.70 | 4.90 | 7.20 | 5.10 | 5.10 |
| 0 | 6.90 | 6.10 | 6.00 | 7.20 | 5.80 | 5.80 |
| 0.2 | 7.50 | 6.10 | 5.80 | 7.20 | 5.60 | 3.90 |
| 0.5 | 6.00 | 6.80 | 5.80 | 5.90 | 4.70 | 5.60 |
The 95% predictive interval for the error rate is (3.5%, 6.5%).
Table III.
Rejection rates for unequal sample sizes with I = 80.
| ρ | AUC, J = 100 (%) | AUC, J = 200 (%) | pAUC, J = 100 (%) | pAUC, J = 200 (%) |
|---|---|---|---|---|
| −0.2 | 5.40 | 5.60 | 7.10 | 5.80 |
| −0.1 | 5.60 | 6.30 | 6.10 | 5.60 |
| 0 | 5.50 | 5.40 | 5.30 | 4.00 |
| 0.2 | 7.60 | 5.50 | 5.70 | 5.20 |
| 0.5 | 3.70 | 5.70 | 3.80 | 3.80 |
The 95% predictive interval for the error rate is (3.5%, 6.5%).
6. Discussion
This article provides formal methods for the homogeneity test of several markers. The proposed methods are nonparametric and have important applications in comparing diagnostic markers. The associated theoretical results are easy to apply to clustered ROC data and provide new insights for analyzing such data. Currently, statistical inference for evaluating complex ROC markers relies mainly on the bootstrap or other resampling methods. The results in this paper give closed-form expressions of the covariance structure both within and between empirical ROC curves for clustered data. We provided the formula for power calculation and the numerical method for sample size calculation. The R code for conducting the proposed tests is available from the first author.
Acknowledgments
The authors thank anonymous referees, the associate editor, and the editor for their constructive comments and useful suggestions. The work was supported in part with funding from the American Chemistry Council and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development. The project described here was supported by Award Number R15CA150698 from the National Cancer Institute under the American Recovery and Reinvestment Act of 2009, by Award Number H98230-11-1-0196 from the National Security Agency, and by Department of Veterans Affairs, Veterans Health Administration, Health Services Research and Development Service, project grant #XVA 61-036.
Appendix: Proofs
We follow similar lines to the proof of Theorem 1 in [15] and use the continuous mapping theorem [16, Theorem 1.3.6] to show that is asymptotically equivalent to
| (7) |
Note that the two summands of (7) are independent. Also, the first summation of (7) is the sum of independent mean zero random vectors, W̃i, and the second summation of (7) is the sum of independent mean zero random vectors, Ṽj. Thus, it follows from multivariate central limit theorem [14, Theorem 1.9.2B] that
| (8) |
The eigen-decomposition of Σc is given by $\Sigma_c = \sum_{\ell=1}^{L-1} \lambda_{(\ell)} Q_\ell Q_\ell'$, where λ(ℓ) is the ℓth largest eigenvalue out of L − 1 eigenvalues, and Qℓ is the corresponding orthonormal eigenvector. On the basis of (8), we have
| (9) |
where ||·|| is a norm on $\mathbb{R}^{(L-1)^2}$. Denote
ℓ= 1, …, L − 1. Using Cramer–Wold device, we can show that asymptotically, Zℓ’s are independent and follow standard normal. We substitute λ̂(ℓ) for λ(ℓ) and Q̂ℓ for Qℓ in Zℓ. Using the law of large numbers, it follows from (9) that
Because Tc is essentially , the proof of Theorem 1 is completed by applying Theorem 3.5 in [14].
Footnotes
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute, the National Institutes of Health, or the Department of Veterans Affairs.
References
1. Baker SG. Identifying combinations of cancer markers for further study as triggers of early intervention. Biometrics. 2000;56(4):1082–1087. doi: 10.1111/j.0006-341x.2000.01082.x.
2. McIntosh MW, Pepe MS. Combining several screening tests: optimality of the risk score. Biometrics. 2002;58(3):657–664. doi: 10.1111/j.0006-341x.2002.00657.x.
3. Wactawski-Wende J, Schisterman EF, Hovey KM, Howards PP, Browne RW, Hediger M, Liu A, Trevisan M, BioCycle Study Group. BioCycle Study: design of the longitudinal study of the oxidative stress and hormone variation during the menstrual cycle. Paediatric and Perinatal Epidemiology. 2009;23(2):171–184. doi: 10.1111/j.1365-3016.2008.00985.x.
4. Schisterman EF, Gaskins AJ, Mumford SL, Browne RW, Yeung E, Trevisan M, Hediger M, Zhang C, Perkins NJ, Hovey K, et al. Influence of endogenous reproductive hormones on F-2-isoprostane levels in premenopausal women. American Journal of Epidemiology. 2010;172(4):430–439. doi: 10.1093/aje/kwq131.
5. Zou K, Hall W, Shapiro D. Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Statistics in Medicine. 1997;16(19):2143–2156. doi: 10.1002/(sici)1097-0258(19971015)16:19<2143::aid-sim655>3.0.co;2-3.
6. DeLong ER, DeLong D, Clarke-Pearson D. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–845.
7. Obuchowski NA. Nonparametric analysis of clustered ROC curve data. Biometrics. 1997;53:567–578.
8. Zou KH. Comparison of correlated receiver operating characteristic curves derived from repeated diagnostic test data. Academic Radiology. 2001;8(3):225–233. doi: 10.1016/S1076-6332(03)80531-7.
9. Hochberg Y, Tamhane AC. Multiple Comparison Procedures. Wiley; New York: 1987.
10. Cai T, Pepe M. Semi-parametric ROC analysis to evaluate biomarkers for disease. Journal of the American Statistical Association. 2002;97:1099–1107.
11. Silverman BW. Density Estimation for Statistics and Data Analysis. Chapman and Hall/CRC; London: 1986.
12. Wieand S, Gail MH, James BR, James KL. A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika. 1989;76:585–592.
13. Tang L, Emerson SS, Zhou XH. Nonparametric and semiparametric group sequential methods for comparing accuracy of diagnostic tests. Biometrics. 2008;64:1137–1145. doi: 10.1111/j.1541-0420.2008.01000.x.
14. Serfling RJ. Approximation Theorems of Mathematical Statistics. Wiley; New York: 1980.
15. Hsieh F, Turnbull BW. Non- & semi- parametric estimation of the receiver operating characteristics (ROC) curve. Annals of Statistics. 1996;24:25–40.
16. Van der Vaart A, Wellner J. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer-Verlag; New York: 1996.


