Nonparametric ROC Summary Statistics for Correlated Diagnostic Marker Data

Liansheng Larry Tang; Aiyi Liu; Zhen Chen; Enrique F Schisterman; Bo Zhang; Zhuang Miao

doi:10.1002/sim.5654

Author manuscript; available in PMC: 2014 Jun 15.

Published in final edited form as: Stat Med. 2012 Oct 11;32(13):2209–2220. doi: 10.1002/sim.5654

PMCID: PMC3578098 NIHMSID: NIHMS418419 PMID: 23055248

Abstract

We propose efficient nonparametric statistics to compare medical imaging modalities in multi-reader multi-test data and to compare markers in longitudinal ROC data. The proposed methods are based on the weighted area under the ROC curve which includes the area under the curve and the partial area under the curve as special cases. The methods maximize the local power for detecting the difference between imaging modalities. The asymptotic results of the proposed methods are developed under a complex correlation structure. Our simulation studies show that the proposed statistics result in much better powers than existing statistics. We applied the proposed statistics to an endometriosis diagnosis study.

Keywords: ROC curve, Optimal weights, Wilcoxon statistics, Correlated data

1. Introduction

In medical imaging studies, one is concerned about whether a newly developed imaging modality is more accurate than traditional modalities to correctly discriminate a subject with abnormal lesions from a subject without such lesions. Imaging modalities are considered as an example of diagnostic markers, which are used to distinguish a subject with a particular condition (“the diseased”) from a subject without the condition (“the non-diseased”). For diagnostic markers that generate binary test results, their accuracy can be summarized in terms of sensitivity (probability of identifying a diseased subject when the disease truly exists) and specificity (probability of correctly ruling out a non-diseased subject when the disease is truly absent). For diagnostic markers that generate discrete or continuous test results, the receiver operating characteristic (ROC) curve is a standard statistical tool to describe and compare the accuracy of markers [1]. The ROC curve combines all possible pairs of sensitivities and 1–specificities from different decision thresholds and thus describes the accuracy of markers apart from decision thresholds.

For correlated results from two diagnostic markers, parametric and nonparametric methods have been proposed to compare ROC summary measures. Parametric methods for the area under the curve (AUC) assume distributions (e.g. negative exponential, normal, lognormal, gamma) on marker measurements [2, 3]. These methods may not perform well if the parametric assumptions are invalid. The semiparametric ROC estimation based on the logistic regression is proposed by [4]. As an alternative, nonparametric methods do not require distribution assumptions and are robust to model misidentification. Nonparametric methods to estimate and compare two AUCs have been proposed by [5], [6], and others. These methods are based on results for U-statistics because an empirical AUC statistic is essentially a Wilcoxon rank sum statistic [7]. However, if two ROC curves intersect, their AUCs may be equal and do not provide valid information for the comparison. Moreover, summarizing the entire ROC curve may include irrelevant information about the marker’s accuracy when one is only interested in some range of specificities. For example, acceptable specificities are high for early cancer detection tests. The partial area under the curve (pAUC), which summarizes part of the ROC curve in the range of desired specificities, may be a better alternative. Nonparametric methods to compare pAUCs are proposed by [8]. Utilizing the pAUCs is particularly important in comparing markers which are developed to screen a large population for certain diseases, for example, breast cancer [9]. A lower specificity for a large population leads to many more falsely classified non-diseased subjects who may have to undergo a more invasive test subsequently. It is thus desired to compare screening markers at a higher range of specificities.

In this paper we propose efficient nonparametric ROC statistics to analyze multi-reader multi-test ROC data and to nonparametrically summarize correlated longitudinal ROC data. The proposed method not only includes many nonparametric ROC summary measures as special cases, but also maximizes the local power for detecting the difference between markers. The rest of the article is organized as follows. In Section 2 we introduce the new statistics for multi-reader multi-test ROC data and longitudinal ROC data, and discuss the equivalence between our statistics and the generalized Wilcoxon statistics under specific assumptions. Section 3 gives the variance expressions for the proposed statistics. Section 4 reports simulation results to illustrate the small sample performance of the proposed ROC statistics and their theoretical variances. Section 5 applies the proposed method to a real example on the diagnosis of endometriosis. Section 6 concludes with a discussion.

2. Methods

2.1. Definition of nonparametric ROC summary statistics

We first define some notation. Suppose test result X_ℓip of marker ℓ is from the pth abnormal location in the diseased subject i, where ℓ = 1, …, L, p = 0, 1, …, m_ℓi, and i = 1, …, M. Test result Y_ℓjq of marker ℓ is from the qth normal location in the non-diseased subject j, where ℓ = 1, …, L, q = 0, 1, …, n_ℓj, and j = 1, …, J. Here the total number of subjects is N = M + J. The joint survival function of the pair (X_ℓ₁ip₁, X_ℓ₂ip₂) is taken to be S_{D,ℓ₁,ℓ₂}(x₁, x₂), p₁, p₂ = 1, …, m_ℓi, with marginal survival functions X_ℓip ~ S_D,ℓ(x). Similarly we define (Y_ℓ₁jq₁, Y_ℓ₂jq₂) ~ S_{D̄,ℓ₁,ℓ₂}(y₁, y₂), q₁, q₂ = 1, …, n_ℓj, with marginal survival functions Y_ℓjq ~ S_D̄,ℓ(y). The ROC curve for the ℓth marker is then given by $R O C_{ℓ} (u) = S_{D, ℓ} {S_{\overset{‒}{D}, ℓ}^{- 1} (u)}$ , where the false positive rate (FPR) u is in [0, 1]. The resulting ℓth weighted area under the curve (wAUC) is

Ω_{ℓ} = \int_{0}^{1} S_{D, ℓ} {S_{\overset{‒}{D}, ℓ}^{- 1} (u)} d W (u),

(1)

with a probability measure W(u) defined on u, for u ∈ [0, 1]. Included in this class of accuracy measures are AUC, pAUC between FPRs u₁ and u₂, and the sensitivity at a given level of FPR u₀. W(u) can also be defined as certain distribution functions, such as the beta cdf, to assign varying weight to the specificity. The detailed discussion is in [10].

By substituting the functions S_D,ℓ and S_D̄,ℓ with their respective empirical functions ${\hat{S}}_{D, ℓ}$ and ${\hat{S}}_{\overset{‒}{D}, ℓ}$ , the nonparametric wAUC estimator is given by ${\hat{Ω}}_{ℓ} = \int_{0}^{1} {\hat{S}}_{D, ℓ} {{\hat{S}}_{\overset{‒}{D}, ℓ}^{- 1} (u)} d W (u)$ . The empirical survival functions ${\hat{S}}_{D, ℓ}$ and ${\hat{S}}_{\overset{‒}{D}, ℓ}$ are defined as

\begin{matrix} {\hat{S}}_{D, ℓ} (x) & = \frac{1}{\sum_{i = 1}^{M} m_{ℓ i}} \sum_{i = 1}^{M} \sum_{p = 1}^{m_{ℓ i}} I (X_{ℓ i p} > x), \\ {\hat{S}}_{\overset{‒}{D}, ℓ} (x) & = \frac{1}{\sum_{j = 1}^{J} n_{ℓ j}} \sum_{j = 1}^{J} \sum_{q = 1}^{n_{ℓ j}} I (Y_{ℓ j q} > x) . \end{matrix}

(2)

Denote Ω = (Ω₁, Ω₂, …, Ω_L). By substituting ${\hat{S}}_{D, ℓ}$ and ${\hat{S}}_{\overset{‒}{D}, ℓ}$ in Equation (1), the nonparametric estimator of Ω is given by $\hat{Ω} = ({\hat{Ω}}_{1}, {\hat{Ω}}_{2}, \dots, {\hat{Ω}}_{L})$ .
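As a minimal sketch of the empirical survival functions in (2), the Python snippet below pools every measurement across subjects and returns the share that exceeds x. The function name `empirical_survival`, the nested-list data layout, and the toy values are illustrative assumptions, not part of the original analysis.

```python
def empirical_survival(clusters, x):
    """Pooled empirical survival function, in the spirit of equation (2).

    clusters: list of per-subject lists of measurements (hypothetical layout).
    Returns the proportion of all pooled measurements strictly greater than x.
    """
    pooled = [value for subject in clusters for value in subject]
    return sum(value > x for value in pooled) / len(pooled)


# Toy data: three diseased subjects with 2, 1 and 3 abnormal locations.
diseased = [[2.1, 3.4], [1.7], [2.9, 3.8, 4.0]]
print(empirical_survival(diseased, 2.5))  # 4 of the 6 pooled values exceed 2.5
```

Each location contributes equally to the pooled estimate, so subjects with more locations carry more weight; this is why the variance in Section 3 has to account for within-subject correlation.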

We define W(u) = u for 0 < u < 1 to obtain the nonparametric AUC estimator for the ℓth marker as follows

{\hat{Ω}}_{A, ℓ} = \frac{1}{\sum_{i = 1}^{M} m_{ℓ i} \sum_{j = 1}^{J} n_{ℓ j}} \sum_{i = 1}^{M} \sum_{p = 1}^{m_{ℓ i}} \sum_{j = 1}^{J} \sum_{q = 1}^{n_{ℓ j}} I (X_{ℓ i p} > Y_{ℓ j q}) .

(3)

The AUC statistic in (3) takes the form of the Wilcoxon rank-sum statistic. It essentially compares the measurements of abnormal locations with those of normal locations. To calculate this statistic, we obtain every possible pair of measurements from an abnormal location and a normal location. We assign 1 if the abnormal location’s measurement is larger than the normal location in the pair, and 0 otherwise. ${\hat{Ω}}_{A, ℓ}$ is then calculated by averaging the 1’s and 0’s over all possible pairs. Since the location within each subject is viewed as the unit of sampling, the inference based on the regular Wilcoxon rank-sum statistic is not valid here.
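The pairwise counting described above can be sketched in a few lines of Python. The function name `pooled_auc` and the toy data are assumptions made for illustration; the sketch only reproduces the point estimate in (3), not the clustered-data inference.

```python
from itertools import product


def pooled_auc(diseased, nondiseased):
    """Empirical AUC as in (3): share of (abnormal, normal) pairs with X > Y.

    diseased / nondiseased: lists of per-subject lists of measurements.
    """
    xs = [v for subject in diseased for v in subject]
    ys = [v for subject in nondiseased for v in subject]
    wins = sum(1 for x, y in product(xs, ys) if x > y)  # 1 if X > Y, else 0
    return wins / (len(xs) * len(ys))


# Hypothetical toy data: 6 abnormal and 5 normal locations in total.
diseased = [[2.1, 3.4], [1.7], [2.9, 3.8, 4.0]]
nondiseased = [[1.0, 2.5], [0.4, 1.9, 3.0]]
print(pooled_auc(diseased, nondiseased))  # 24 of the 30 pairs have X > Y
```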

When W (u) = (u - u₁)/(u₂ - u₁) for 0 < u₁ ≤ u ≤ u₂ < 1, ${\hat{Ω}}_{ℓ}$ empirically estimates the partial AUC (pAUC), and its explicit form is given by

\frac{1}{\sum_{i = 1}^{M} m_{ℓ i} \sum_{j = 1}^{J} n_{ℓ j}} \sum_{i = 1}^{M} \sum_{p = 1}^{m_{ℓ i}} \sum_{j = 1}^{J} \sum_{q = 1}^{n_{ℓ j}} I (X_{ℓ i p} > Y_{ℓ j q} ∣ Y_{ℓ j q} \in ({\hat{S}}_{\overset{‒}{D}, ℓ}^{- 1} (u_{2}), {\hat{S}}_{\overset{‒}{D}, ℓ}^{- 1} (u_{1}))) .

(4)

The pAUC statistic in (4) uses all measurements from the abnormal locations. Since the pAUC is specified to be in the range of (u₁, u₂), only measurements from the normal locations which fall in ( ${\hat{S}}_{\overset{‒}{D}, ℓ}^{- 1} (u_{2}), {\hat{S}}_{\overset{‒}{D}, ℓ}^{- 1} (u_{1})$ ) are used in (4). That is, we sort all measurements from the normal locations from the smallest to the largest, and obtain the order statistics $Y_{[(1 - u_{2}) Σ_{j = 1}^{J} n_{ℓ j}]}$ , and $Y_{[(1 - u_{1}) Σ_{j = 1}^{J} n_{ℓ j}]}$ , where [x] denotes the smallest integer greater than or equal to x. We then calculate the Wilcoxon rank-sum like statistic by comparing all X’s with Y’s which are between $Y_{[(1 - u_{2}) Σ_{j = 1}^{J} n_{ℓ j}]}$ and $Y_{[(1 - u_{1}) Σ_{j = 1}^{J} n_{ℓ j}]}$ . The pAUC statistic is useful in disease screening when a high FPR would lead to a large number of falsely diagnosed subjects. It is desirable to evaluate and compare the marker accuracy at the low FPRs rather than the entire range of FPRs. When we are interested in the sensitivity of the ℓth marker at a particular threshold, say c, we can specify the probability measure to be a point mass at $u_{0} = S_{\overset{‒}{D}, ℓ} (c)$ . The estimator ${\hat{Ω}}_{ℓ}$ then becomes

\frac{1}{\sum_{i = 1}^{M} m_{ℓ i}} \sum_{i = 1}^{M} \sum_{p = 1}^{m_{ℓ i}} I (X_{ℓ i p} > Y_{[(1 - u_{0}) \sum_{j = 1}^{J} n_{ℓ j}]}) .

(5)

The estimator in (5) is obtained by comparing all X’s with $Y_{[(1 - u_{0}) Σ_{j = 1}^{J} n_{ℓ j}]}$ .
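The order-statistic recipe for (4) and (5) can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, there are no ties among the normal measurements, and normal measurements are kept when they fall in the half-open interval (lower, upper], which is one way to make the empirical sum match the integral of the step-function ROC curve over (u₁, u₂).

```python
import math


def order_stat(sorted_ys, u):
    """Return Y_[ceil((1 - u) * n)] from ascending-sorted normal measurements."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    n = len(sorted_ys)
    # round() guards against floating-point noise before taking the ceiling
    rank = max(1, math.ceil(round((1.0 - u) * n, 9)))
    return sorted_ys[rank - 1]  # ranks are 1-based


def sensitivity_at_fpr(diseased, nondiseased, u0):
    """Estimator (5): share of abnormal measurements above the FPR-u0 cutoff."""
    xs = [v for s in diseased for v in s]
    ys = sorted(v for s in nondiseased for v in s)
    cutoff = order_stat(ys, u0)
    return sum(x > cutoff for x in xs) / len(xs)


def partial_auc(diseased, nondiseased, u1, u2):
    """Estimator (4): pairwise comparisons restricted to normal values in range."""
    xs = [v for s in diseased for v in s]
    ys = sorted(v for s in nondiseased for v in s)
    lower, upper = order_stat(ys, u2), order_stat(ys, u1)
    kept = [y for y in ys if lower < y <= upper]  # endpoint rule is an assumption
    wins = sum(x > y for x in xs for y in kept)
    return wins / (len(xs) * len(ys))


diseased = [[2.1, 3.4], [1.7], [2.9, 3.8, 4.0]]
nondiseased = [[0.4, 1.0], [1.9, 2.5]]
print(sensitivity_at_fpr(diseased, nondiseased, 0.25))
print(partial_auc(diseased, nondiseased, 0.25, 0.75))
```

With four normal measurements and u₀ = 0.25, the cutoff is the third smallest value, so exactly one normal measurement lies above it.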

In the following sections, we propose efficient nonparametric methods based on the nonparametric estimator of Ω to evaluate and compare multiple markers in multi-reader multi-test ROC data and longitudinal ROC data.

2.2. Multi-reader multi-test ROC data

One type of complex marker data arises frequently in medical imaging studies when radiological images of a patient are evaluated by several radiologists. [11] consider a mixed-effect ANOVA model while allowing for correlation among AUC estimators. Their model requires a specific covariance structure among the AUCs. [12] propose a pseudo-generalized estimating equation method and derive large sample theory for the estimators. Their method remains valid under the working-independence assumption.

In a multi-reader multi-test ROC study, suppose radiologist r, r = 1, …, R, rates images for M diseased subjects and J non-diseased subjects from L imaging devices. A radiologist can give one or more ratings to suspicious locations in each subject, that is, m_ℓi, n_ℓj ≥ 1. We consider L = 2. Denote Ω₁, …, Ω_R as the wAUCs from R readers for modality 1, and Ω_R+1, …, Ω_2R as the wAUCs from R readers for modality 2. Common nonparametric approaches for comparing imaging modalities take the difference Ω_r - Ω_R+r between two devices for reader r, and then average these differences over all readers [13]. We can see that such methods are a special case of the linear combination of the weighted AUC statistics for reader-modality combinations. Rather than the simple average of all Ω_r - Ω_R+r’s, we propose to use the following weighted linear combination to possibly achieve a higher power to compare markers

Δ_{m} = {(\sum_{r = 1}^{R} w_{r})}^{- 1} \sum_{r = 1}^{R} [w_{r} (Ω_{r} - Ω_{R + r})],

(6)

with positive and bounded weights $\tilde{W} = {(w_{1}, w_{2}, \dots, w_{R})}^{'}$ . The parameter Δ_m can be empirically estimated by

{\hat{Δ}}_{m} = {(\sum_{r = 1}^{R} w_{r})}^{- 1} \sum_{r = 1}^{R} [w_{r} ({\hat{Ω}}_{r} - {\hat{Ω}}_{R + r})],

which compares two modalities with multiple readers.

Various choices of weights exist in the ROC literature. W̃ need not depend on the data. For instance, if all readers are assumed to be homogeneous with regard to their accuracy of rating images, an equal weight w_r = 1/R can be assigned to reader r, r = 1, …, R. Then with m_ℓi = n_ℓj = 1 and W (u) = 1 at 0 < u < 1, ${\hat{Δ}}_{m}$ becomes the AUC statistic in [13]. When one has to estimate W̃ from the data, the estimated weights Ŵ must be consistent in probability for the derivation to hold. For instance, a set of optimal weights is introduced by [14] and further developed by [15], who argues that when readers’ experience varies greatly, using equal weights may yield a biased AUC estimate. Let the R × R covariance matrix of estimated AUC differences, ${({\hat{Ω}}_{1} - {\hat{Ω}}_{R + 1}, \dots, {\hat{Ω}}_{R} - {\hat{Ω}}_{2 R})}^{'}$ , be Σ_A, and its consistent estimator ${\hat{Σ}}_{A}$ . They then choose $\tilde{W} = {\hat{Σ}}_{A}^{- 1} 1$ to obtain a consistent estimator for the AUC difference, where 1 is an R-dimensional vector of ones. [14] and [15] show that this set of weights is optimal since it maximizes the local power to detect the AUC difference between imaging modalities. It is clear that by combining these weights with m_ℓi = n_ℓj = 1 and W (u) = 1 at 0 < u < 1, ${\hat{Δ}}_{m}$ becomes [15]’s statistic. To properly calculate the weights for the proposed statistic, we need to obtain the covariance matrix Σ of $\hat{Ω} = {({\hat{Ω}}_{1}, \dots, {\hat{Ω}}_{2 R})}^{'}$ . Since in practice Σ is unknown, its consistent estimator $\hat{Σ}$ can be obtained using the explicit expression (A.1) derived in the Appendix. Since Σ and Σ_A are related via

Σ_{A} = Σ A,

where the rth column of the 2R × R matrix A has 1’s at rth and (R + r)th rows and 0 at other rows, the estimated weights are given by

\hat{W} = {\hat{Σ}}^{- 1} A 1 .

(7)
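To make the weighting step concrete, the sketch below computes the weighted difference in (6) under equal weights and under weights of the form Σ̂_A⁻¹1, taking an estimated R × R covariance matrix of the reader-specific differences as given. All function names and numeric values are hypothetical; in particular, the covariance matrix is invented for illustration and is not computed from expression (A.1). Weights obtained this way are not guaranteed to be positive for every covariance matrix.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def optimal_weights(sigma_a):
    """Weights of the form Sigma_A^{-1} 1 for the reader-specific differences."""
    return solve(sigma_a, [1.0] * len(sigma_a))


def weighted_difference(differences, weights):
    """Equation (6): weighted average of reader-specific wAUC differences."""
    return sum(w * d for w, d in zip(weights, differences)) / sum(weights)


# Hypothetical inputs for R = 3 readers.
differences = [0.10, 0.20, 0.30]  # estimated wAUC differences per reader
sigma_a = [[0.004, 0.001, 0.001],
           [0.001, 0.006, 0.001],
           [0.001, 0.001, 0.008]]  # invented covariance estimate

print(weighted_difference(differences, [1 / 3] * 3))  # equal weights
print(weighted_difference(differences, optimal_weights(sigma_a)))
```

Readers whose differences are estimated more precisely receive larger weights, which is the mechanism behind the power gain discussed in the text.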

2.3. Longitudinal biomarker data

Another example of complex marker data comes from longitudinal studies when marker measurements are taken at several times during the studies. Most methodology for longitudinal ROC data relies on appropriate assumptions on the distributions of marker measurements [16]. In longitudinal ROC data, suppose L markers are measured on M diseased patients and J non-diseased patients at times t₁, t₂, …, t_K.

Suppose each subject is repeatedly measured for every marker at each time. Let X_ℓipk denote the test result of marker ℓ in the pth repetition on the diseased subject i at time t_k, where ℓ = 1, …, L, p = 1, …, m_ℓik, i = 1, …, M, and k = 1, …, K. Let Y_ℓjqk denote the test result of marker ℓ in the qth repetition on the non-diseased subject j at time t_k, where ℓ = 1, …, L, q = 1, …, n_ℓjk, j = 1, …, J, and k = 1, …, K. The nonparametric wAUC estimator for the ℓth marker is then given by ${\hat{Ω}}_{ℓ} = \int_{0}^{1} {\hat{S}}_{D, ℓ} {{\hat{S}}_{\overset{‒}{D}, ℓ}^{- 1} (u)} d W (u)$ , where ${\hat{S}}_{D, ℓ}$ and ${\hat{S}}_{\overset{‒}{D}, ℓ}$ are defined by

{\hat{S}}_{D, ℓ} (x) = \frac{1}{\sum_{i = 1}^{M} \sum_{k = 1}^{K} m_{ℓ i k}} \sum_{i = 1}^{M} \sum_{k = 1}^{K} \sum_{p = 1}^{m_{ℓ i k}} I (X_{ℓ i p k} > x), and {\hat{S}}_{\overset{‒}{D}, ℓ} (x) = \frac{1}{\sum_{j = 1}^{J} \sum_{k = 1}^{K} n_{ℓ j k}} \sum_{j = 1}^{J} \sum_{k = 1}^{K} \sum_{q = 1}^{n_{ℓ j k}} I (Y_{ℓ j q k} > x) .

(8)

By defining W (u) accordingly in the wAUC estimator, we obtain the nonparametric AUC estimator for the ℓth marker:

\frac{1}{\sum_{i = 1}^{M} \sum_{k = 1}^{K} m_{ℓ i k} \sum_{j = 1}^{J} \sum_{k = 1}^{K} n_{ℓ j k}} \sum_{i = 1}^{M} \sum_{k_{1} = 1}^{K} \sum_{p = 1}^{m_{ℓ i k_{1}}} \sum_{j = 1}^{J} \sum_{k_{2} = 1}^{K} \sum_{q = 1}^{n_{ℓ j k_{2}}} I (X_{ℓ i p k_{1}} > Y_{ℓ j q k_{2}}),

the partial AUC estimator:

\frac{\sum_{i = 1}^{M} \sum_{k_{1} = 1}^{K} \sum_{p = 1}^{m_{ℓ i k_{1}}} \sum_{j = 1}^{J} \sum_{k_{2} = 1}^{K} \sum_{q = 1}^{n_{ℓ j k_{2}}} I (X_{ℓ i p k_{1}} > Y_{ℓ j q k_{2}} ∣ Y_{ℓ j q k_{2}} \in ({\hat{S}}_{\overset{‒}{D}, ℓ}^{- 1} (u_{2}), {\hat{S}}_{\overset{‒}{D}, ℓ}^{- 1} (u_{1})))}{\sum_{i = 1}^{M} \sum_{k = 1}^{K} m_{ℓ i k} \sum_{j = 1}^{J} \sum_{k = 1}^{K} n_{ℓ j k}},

and the sensitivity estimator at the FPR of u₀,

\frac{1}{\sum_{i = 1}^{M} \sum_{k = 1}^{K} m_{ℓ i k}} \sum_{i = 1}^{M} \sum_{k = 1}^{K} \sum_{p = 1}^{m_{ℓ i k}} I (X_{ℓ i p k} > Y_{[(1 - u_{0}) \sum_{j = 1}^{J} \sum_{k = 1}^{K} n_{ℓ j k}]}) .

We define h to be a real-valued function of $\hat{Ω}$ . Here the function h is defined on $R^{L}$ , and has continuous partial derivatives of order 2. Let the ROC summary measure be Δ_h = h(Ω). Its empirical estimator is given by

{\hat{Δ}}_{h} \equiv h (\hat{Ω}) = h (\int_{0}^{1} {\hat{S}}_{D, 1} {{\hat{S}}_{\overset{‒}{D}, 1}^{- 1} (u)} d W (u), \dots, \int_{0}^{1} {\hat{S}}_{D, L} {{\hat{S}}_{\overset{‒}{D}, L}^{- 1} (u)} d W (u)) .

(9)

The statistic above can be used to compare two longitudinal markers when h is a linear contrast. ${\hat{Δ}}_{h}$ also includes a broad range of ROC statistics. It is the weighted AUC statistic in [17] and later in [10] for evaluating and comparing markers. When W (u) = 1 at 0 < u < 1 and h is a linear function, ${\hat{Δ}}_{h}$ is the generalized AUC statistic in [13]. When W (u) = 1 at 0 < u < 1, ${\hat{Δ}}_{h}$ is the AUC statistic in [18], assuming no correlation between X and Y, which allows for multiple observations per patient from each marker. When W (u) = (u - a)/(b - a) for 0 < a < u < b < 1 and h(Ω₁, Ω₂) = Ω₁ - Ω₂, ${\hat{Δ}}_{h}$ is the pAUC statistic in [8] for comparing two markers.

When there are two longitudinal markers in the study, the optimal combination for comparing the two markers can be obtained using steps similar to those in the aforementioned multi-reader multi-test studies. Suppose L = 2. Let Ω_ℓ,k be the wAUC of marker ℓ, ℓ = 1, 2, at time t_k and ${\hat{Ω}}_{ℓ, k}$ be its nonparametric estimator given by ${\hat{Ω}}_{ℓ, k} = \int_{0}^{1} {\hat{S}}_{D, ℓ, k} {{\hat{S}}_{\overset{‒}{D}, ℓ, k}^{- 1} (u)} d W (u)$ , where ${\hat{S}}_{D, ℓ, k}$ and ${\hat{S}}_{\overset{‒}{D}, ℓ, k}$ are defined by

{\hat{S}}_{D, ℓ, k} (x) = \frac{1}{\sum_{i = 1}^{M} m_{ℓ i k}} \sum_{i = 1}^{M} \sum_{p = 1}^{m_{ℓ i k}} I (X_{ℓ i p k} > x), and {\hat{S}}_{\overset{‒}{D}, ℓ, k} (x) = \frac{1}{\sum_{j = 1}^{J} n_{ℓ j k}} \sum_{j = 1}^{J} \sum_{q = 1}^{n_{ℓ j k}} I (Y_{ℓ j q k} > x) .

(10)

Note that the estimation of Ω_ℓ,k is based on each individual time point. One can take the difference of the wAUCs of the two markers, and simply average these differences over all time points. We may also use the following weighted linear combination to possibly achieve a higher power to compare markers

Δ_{ℓ} = {(\sum_{k = 1}^{K} w_{k})}^{- 1} \sum_{k = 1}^{K} [w_{k} (Ω_{1, k} - Ω_{2, k})],

(11)

with positive and bounded weights $\tilde{W} = {(w_{1}, w_{2}, \dots, w_{K})}^{'}$ . The parameter Δ_ℓ can be empirically estimated by

{\hat{Δ}}_{ℓ} = {(\sum_{k = 1}^{K} w_{k})}^{- 1} \sum_{k = 1}^{K} [w_{k} ({\hat{Ω}}_{1, k} - {\hat{Ω}}_{2, k})] .

As in the previous section, the 2K × 2K covariance matrix Σ of $\hat{Ω} = {({\hat{Ω}}_{1, 1}, \dots, {\hat{Ω}}_{1, K}, {\hat{Ω}}_{2, 1}, \dots, {\hat{Ω}}_{2, K})}^{'}$ can be estimated using the explicit expression in (A.1). Thus the estimated weights are given by the same expression as (7).
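A small sketch of the time-point-specific comparison in (10) and (11) follows, restricted to the AUC case. The data layout (one `(diseased, nondiseased)` pair of clustered lists per time point), the function names, and the toy values are assumptions for illustration; the weights are supplied by the user rather than estimated from (A.1).

```python
def auc(diseased, nondiseased):
    """Empirical AUC at one time point from clustered measurements."""
    xs = [v for subject in diseased for v in subject]
    ys = [v for subject in nondiseased for v in subject]
    return sum(x > y for x in xs for y in ys) / (len(xs) * len(ys))


def longitudinal_difference(marker1, marker2, weights):
    """Equation (11): weighted average over time of AUC differences.

    marker1, marker2: one (diseased, nondiseased) pair per time point.
    """
    diffs = [auc(*m1) - auc(*m2) for m1, m2 in zip(marker1, marker2)]
    return sum(w * d for w, d in zip(weights, diffs)) / sum(weights)


# Hypothetical data for K = 2 time points, 2 diseased and 2 non-diseased subjects.
marker1 = [([[3, 4], [5]], [[1], [2, 6]]),
           ([[5], [6]], [[1], [2]])]
marker2 = [([[2, 3], [1]], [[1.5], [2.5, 0.5]]),
           ([[1], [4]], [[2], [3]])]

print(longitudinal_difference(marker1, marker2, [1, 1]))  # simple average
print(longitudinal_difference(marker1, marker2, [1, 3]))  # more weight on t_2
```

In this toy example the markers differ only at the second time point, so shifting weight toward that time point enlarges the combined difference.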

3. Asymptotic variance expressions of the proposed statistics

In this section we derive the asymptotic variances for the proposed statistics in the multi-reader multi-test data and the longitudinal data. We first show the explicit variance expressions for ${\hat{Δ}}_{m}$ , and then show the variance expression for the more general statistic ${\hat{Δ}}_{h}$ in (9) for the longitudinal data.

The numbers of abnormal locations within a diseased subject may differ, and so may the numbers of normal locations within a non-diseased subject. Denote ${\tilde{m}}_{ℓ} = Σ_{i = 1}^{M} m_{ℓ i}$ , and ${\tilde{n}}_{ℓ} = Σ_{j = 1}^{J} n_{ℓ j}$ . Assume that S_D,ℓ and S_D̄,ℓ have continuous and positive derivatives, $S_{D, ℓ}^{'}$ , and $S_{\overset{‒}{D}, ℓ}^{'}$ . In the Appendix we show that the proposed statistic, ${\hat{Δ}}_{m}$ , for the multi-reader multi-test ROC data is asymptotically normal when sample sizes are large. The variance of ${\hat{Δ}}_{m}$ has the following expression when sample sizes get large:

\mathrm{var} ({\hat{Δ}}_{m}) = {\tilde{v}}_{X} + {\tilde{v}}_{Y},

(12)

with

{\tilde{v}}_{X} = \frac{1}{M {\tilde{m}}_{ℓ_{1}} {\tilde{m}}_{ℓ_{2}} {(\sum_{r = 1}^{R} w_{r})}^{2}} \sum_{1 \leq ℓ_{1}, ℓ_{2} \leq 2 R} \sum_{i = 1}^{M} {\tilde{m}}_{ℓ_{1} i} {\tilde{m}}_{ℓ_{2} i} {(- 1)}^{I (ℓ_{1}, ℓ_{2}) + 1} (\int \int [S_{D, ℓ_{1}, ℓ_{2}} {S_{\overset{‒}{D}, ℓ_{1}}^{- 1} (s), S_{\overset{‒}{D}, ℓ_{2}}^{- 1} (t)} - S_{D, ℓ_{1}} {S_{\overset{‒}{D}, ℓ_{1}}^{- 1} (s)} S_{D, ℓ_{2}} {S_{\overset{‒}{D}, ℓ_{2}}^{- 1} (t)}] d W (s) d W (t)),

and

{\tilde{v}}_{Y} = \frac{1}{M {\tilde{n}}_{ℓ_{1}} {\tilde{n}}_{ℓ_{2}} {(\sum_{r = 1}^{R} w_{r})}^{2}} \sum_{1 \leq ℓ_{1}, ℓ_{2} \leq 2 R} \sum_{j = 1}^{J} {\tilde{n}}_{ℓ_{1} j} {\tilde{n}}_{ℓ_{2} j} {(- 1)}^{I (ℓ_{1}, ℓ_{2}) + 1} (\int \int r_{ℓ_{1}} (s) r_{ℓ_{2}} (t) \times [S_{\overset{‒}{D}, ℓ_{1}, ℓ_{2}} {S_{\overset{‒}{D}, ℓ_{1}}^{- 1} (s), S_{\overset{‒}{D}, ℓ_{2}}^{- 1} (t)} - s t] d W (s) d W (t)),

where I(ℓ₁, ℓ₂) = 1, if |ℓ₂ - ℓ₁| < R, and 0, otherwise, and

r_{ℓ} (u) = S_{D, ℓ}^{'} {S_{\overset{‒}{D}, ℓ}^{- 1} (u)} ∕ S_{\overset{‒}{D}, ℓ}^{'} {S_{\overset{‒}{D}, ℓ}^{- 1} (u)}, for ℓ = 1, \dots, L .

The marginal and joint survivor functions can also be empirically estimated.

Denote $m_{ℓ} = Σ_{i = 1}^{M} Σ_{k = 1}^{K} m_{ℓ i k}$ , and $n_{ℓ} = Σ_{j = 1}^{J} Σ_{k = 1}^{K} n_{ℓ j k}$ . We show in the Appendix that the proposed statistic, ${\hat{Δ}}_{h}$ in (9), for the longitudinal data is also asymptotically normal, and the variance of ${\hat{Δ}}_{h}$ takes the following form when sample sizes are large,

\mathrm{var} ({\hat{Δ}}_{h}) = v_{X} + v_{Y},

(13)

where

v_{X} = \frac{1}{M m_{ℓ_{1}} m_{ℓ_{2}}} \sum_{1 \leq ℓ_{1}, ℓ_{2} \leq L} \sum_{i = 1}^{M} m_{ℓ_{1} i} m_{ℓ_{2} i} \frac{\partial h}{\partial Ω_{ℓ_{1}}} \frac{\partial h}{\partial Ω_{ℓ_{2}}} (\int \int [S_{D, ℓ_{1}, ℓ_{2}} {S_{\overset{‒}{D}, ℓ_{1}}^{- 1} (s), S_{\overset{‒}{D}, ℓ_{2}}^{- 1} (t)} - S_{D, ℓ_{1}} {S_{\overset{‒}{D}, ℓ_{1}}^{- 1} (s)} S_{D, ℓ_{2}} {S_{\overset{‒}{D}, ℓ_{2}}^{- 1} (t)}] d W (s) d W (t)),

and

v_{Y} = \frac{1}{M n_{ℓ_{1}} n_{ℓ_{2}}} \sum_{1 \leq ℓ_{1}, ℓ_{2} \leq L} \sum_{j = 1}^{J} n_{ℓ_{1} j} n_{ℓ_{2} j} \frac{\partial h}{\partial Ω_{ℓ_{1}}} \frac{\partial h}{\partial Ω_{ℓ_{2}}} (\int \int r_{ℓ_{1}} (s) r_{ℓ_{2}} (t) [S_{\overset{‒}{D}, ℓ_{1}, ℓ_{2}} {S_{\overset{‒}{D}, ℓ_{1}}^{- 1} (s), S_{\overset{‒}{D}, ℓ_{2}}^{- 1} (t)} - s t] d W (s) d W (t)),

where

r_{ℓ} (u) = S_{D, ℓ}^{'} {S_{\overset{‒}{D}, ℓ}^{- 1} (u)} ∕ S_{\overset{‒}{D}, ℓ}^{'} {S_{\overset{‒}{D}, ℓ}^{- 1} (u)}, for ℓ = 1, \dots, L .

Empirical or other types of smoothed estimators for the marginal and joint survivor functions S_D,ℓ, S_D̄,ℓ, S_{D,ℓ₁,ℓ₂}(x₁, x₂), and S_{D̄,ℓ₁,ℓ₂}(y₁, y₂) can be used to estimate v_X and v_Y. In the simulations and the example, we used the empirical estimators. That is, we estimate S_D,ℓ and S_D̄,ℓ using the expressions in (8), and we estimate S_{D,ℓ₁,ℓ₂}(x₁, x₂) and S_{D̄,ℓ₁,ℓ₂}(y₁, y₂) as follows:

{\hat{S}}_{D, ℓ_{1}, ℓ_{2}} (x_{1}, x_{2}) = \frac{1}{\sum_{i = 1}^{M} m_{ℓ i}^{2}} \sum_{i = 1}^{M} \sum_{p_{1} = 1}^{m_{ℓ_{1} i}} \sum_{p_{2} = 1}^{m_{ℓ_{2} i}} \sum_{k_{1} = 1}^{K} \sum_{k_{2} = 1}^{K} I (X_{ℓ_{1} i p_{1} k_{1}} > x_{1}, X_{ℓ_{2} i p_{2} k_{2}} > x_{2}),

{\hat{S}}_{\overset{‒}{D}, ℓ_{1}, ℓ_{2}} (y_{1}, y_{2}) = \frac{1}{\sum_{j = 1}^{J} n_{ℓ j}^{2}} \sum_{j = 1}^{J} \sum_{q_{1} = 1}^{n_{ℓ_{1} j}} \sum_{q_{2} = 1}^{n_{ℓ_{2} j}} \sum_{k_{1} = 1}^{K} \sum_{k_{2} = 1}^{K} I (Y_{ℓ_{1} j q_{1} k_{1}} > y_{1}, Y_{ℓ_{2} j q_{2} k_{2}} > y_{2}) .

Thus, when Δ’s are AUCs, v_X is given by

v_{X} = \frac{1}{M m_{ℓ_{1}} m_{ℓ_{2}}} \sum_{1 \leq ℓ_{1}, ℓ_{2} \leq 2 R} \sum_{i = 1}^{M} m_{ℓ_{1} i} m_{ℓ_{2} i} \frac{\partial h}{\partial Ω_{ℓ_{1}}} \frac{\partial h}{\partial Ω_{ℓ_{2}}} (E [I (X_{ℓ_{1} i p_{1} k_{1}} > Y_{ℓ_{1} j p_{1} k_{1}}) I (X_{ℓ_{2} i p_{1} k_{1}} > Y_{ℓ_{2} j p_{1} k_{1}})] - E [I (X_{ℓ_{1} i p_{1} k_{1}} > Y_{ℓ_{1} j p_{1} k_{1}})] E [I (X_{ℓ_{2} i p_{1} k_{1}} > Y_{ℓ_{2} j p_{1} k_{1}})]),

and v_Y is given by

v_{Y} = \frac{1}{M n_{ℓ_{1}} n_{ℓ_{2}}} \sum_{1 \leq ℓ_{1}, ℓ_{2} \leq 2 R} \sum_{j = 1}^{J} n_{ℓ_{1} j} n_{ℓ_{2} j} \frac{\partial h}{\partial Ω_{ℓ_{1}}} \frac{\partial h}{\partial Ω_{ℓ_{2}}} (E [I (X_{ℓ_{1} i p_{1} k_{1}} > Y_{ℓ_{1} j p_{1} k_{1}}) I (X_{ℓ_{2} i p_{1} k_{1}} > Y_{ℓ_{1} j p_{1} k_{1}})] - E [I (X_{ℓ_{1} i p_{1} k_{1}} > Y_{ℓ_{1} j p_{1} k_{1}})] E [I (X_{ℓ_{2} i p_{1} k_{1}} > Y_{ℓ_{2} j p_{1} k_{1}})]),

4. Simulation studies

We report simulation studies to evaluate the finite sample property of the proposed statistics. We simulated both multi-reader multi-test ROC data and longitudinal data. In multi-reader multi-test data, we considered the finite sample performance of the variance expression. More importantly, we compared the simulated powers of the equal weight and the optimal weight introduced in Section 2.2. We expect that the optimal weight results in better power than the equal weight. In longitudinal data we considered the general setting where each subject is diagnosed repeatedly at each time point and the number of repeated measures varies from subject to subject.

4.1. Multi-reader multi-test data

In the first simulation study we investigated the finite sample accuracy of the variance expression for multi-reader multi-test data. We let m_ℓi = n_ℓj = 1, R = 3, and L = 2. We simulated 1000 datasets under multivariate normal and lognormal distributions:

X ~ N(μ_X, Σ_X) and Y ~ N(μ_Y, Σ_Y), where μ_X = (1, …, 1), μ_Y = (0, …, 0) and Σ_X = Σ_Y is the variance-covariance matrix with diagonal elements (1, 1.5, 2, 1, 1.5, 2) and correlation coefficient, ρ;
X ~ LogNormal(μ_X, Σ_X) and Y ~ LogNormal(μ_Y, Σ_Y).
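One way to generate data under these two settings is sketched below in plain Python. It assumes that the covariance between components a and b is ρ times the product of their standard deviations, and that the lognormal case is obtained by exponentiating the multivariate normal draw (as is done for the longitudinal simulation in Section 4.2). The function names and the random seed are arbitrary choices, not taken from the original study.

```python
import math
import random


def covariance(variances, rho):
    """Covariance matrix with the given variances and a common correlation rho."""
    k = len(variances)
    return [[variances[a] if a == b
             else rho * math.sqrt(variances[a] * variances[b])
             for b in range(k)] for a in range(k)]


def cholesky(sigma):
    """Lower-triangular factor low with low @ low' = sigma (positive definite)."""
    k = len(sigma)
    low = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1):
            s = sum(low[a][c] * low[b][c] for c in range(b))
            if a == b:
                low[a][b] = math.sqrt(sigma[a][a] - s)
            else:
                low[a][b] = (sigma[a][b] - s) / low[b][b]
    return low


def draw(mean, low, rng, lognormal=False):
    """One subject's vector of R * L test results (optionally exponentiated)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    x = [mean[a] + sum(low[a][c] * z[c] for c in range(a + 1))
         for a in range(len(mean))]
    return [math.exp(v) for v in x] if lognormal else x


rng = random.Random(2012)  # arbitrary seed
variances = [1.0, 1.5, 2.0, 1.0, 1.5, 2.0]
low = cholesky(covariance(variances, rho=0.2))
diseased = [draw([1.0] * 6, low, rng) for _ in range(50)]
nondiseased = [draw([0.0] * 6, low, rng) for _ in range(50)]
print(len(diseased), len(diseased[0]))
```

Because each reader has the same variance under both modalities and the means do not differ by modality, the true AUC difference in this setting is zero.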

From the simulated data we used the proposed statistic in Section 2.2, ${\hat{Δ}}_{m} = Σ_{r = 1}^{3} ({\hat{Ω}}_{r} - {\hat{Ω}}_{R + r}) ∕ R$ , to estimate the AUC difference by defining the weight function W (u) = 1 for 0 < u < 1, and the pAUC difference by defining W (u) = 1 for 0 < u < 0.6 and 0 otherwise. A 95% confidence interval for Δ_m was obtained using the variance expression derived in (12). Table 1 shows biases, square roots of mean squared errors (RMSE), and simulated coverage of confidence intervals. The table shows that coverage levels are close to the nominal level, and biases for comparing AUCs or pAUCs are close to zero. This indicates good performance of our estimator and the associated asymptotic results.
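The three summaries reported in the table can be computed from the replicated estimates as sketched below. The sketch assumes Wald-type intervals of the form estimate ± z × standard error; the function name `summarize` and the toy inputs are hypothetical.

```python
import math
from statistics import NormalDist


def summarize(estimates, variances, truth, level=0.95):
    """Bias, RMSE and interval coverage over simulated replicates.

    estimates: one point estimate per simulated dataset
    variances: the matching estimated variances
    truth:     the true value of the parameter being estimated
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    errors = [e - truth for e in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err ** 2 for err in errors) / len(errors))
    covered = sum(abs(err) <= z * math.sqrt(v)
                  for err, v in zip(errors, variances))
    return bias, rmse, covered / len(errors)


# Three hypothetical replicates with true difference 0.
print(summarize([0.1, -0.1, 0.3], [0.01, 0.01, 0.01], truth=0.0))
```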

Table 1.

Bias, RMSE and coverage for simulated multi-reader multi-test data

			AUC			pAUC
	ρ	M (J)	Bias (in %)	RMSE	Coverage	Bias (in %)	RMSE	Coverage
Norm	−0.1	50	8.01E-02	0.0359	91.94%	3.17E-02	0.0304	92.52%
		100	3.43E-02	0.0483	89.47%	7.99E-02	0.0404	91.99%
		200	−1.93E-01	0.0481	92.18%	−1.00E-01	0.0396	94.40%
	0.2	50	−8.21E-02	0.0258	91.66%	−1.01E-01	0.0217	93.70%
		100	1.31E-01	0.0348	89.87%	1.03E-01	0.0296	91.20%
		200	−1.32E-01	0.0343	92.50%	−1.21E-01	0.0297	92.60%
	0.5	50	−6.38E-02	0.0175	94.12%	−2.01E-02	0.0151	95.70%
		100	−2.78E-02	0.0240	92.10%	−5.44E-02	0.0200	93.00%
		200	6.24E-02	0.0239	94.30%	−7.06E-03	0.0209	94.10%
LN	−0.1	50	−5.01E-02	0.0346	91.99%	1.69E-02	0.0354	92.29%
		100	7.77E-02	0.0478	89.21%	5.27E-02	0.0488	89.38%
		200	−1.38E-01	0.0493	91.98%	−8.07E-04	0.0464	92.59%
	0.2	50	−5.86E-02	0.0261	91.82%	−4.46E-02	0.0250	91.42%
		100	7.04E-02	0.0339	90.16%	7.59E-02	0.0352	89.39%
		200	3.88E-02	0.0340	92.40%	4.38E-02	0.0345	92.70%
	0.5	50	−5.39E-02	0.0169	94.43%	−3.60E-02	0.0172	93.93%
		100	−1.02E-01	0.0241	93.00%	−8.00E-02	0.0234	93.20%
		200	−4.62E-02	0.0239	94.40%	−5.02E-02	0.0243	93.80%

Norm denotes the normal distribution; LN denotes the lognormal distribution.

In the second simulation study we compared the performance of the proposed method with the parametric method by [3] and the semiparametric logistic regression method by [4] with regard to estimating the AUC. We used the same setting as the first simulation study except changing μ_X to (1, 1, 1, 1.5, 2, 2.5). The biases and RMSEs from the three methods are shown in Table 2. The results indicate that the proposed method and the semiparametric method perform much better than the parametric method when the distribution assumptions are violated. They also indicate that the semiparametric method performs as well as the proposed method. This is not surprising as can be seen from the description of the semiparametric method in Section 2 of [4]. The logistic regression fits the regression parameters based on the following equation:

\mathrm{logit} P (D = 1 ∣ Z) = β_{0} + β_{1} Z,

where D is the disease status (with 1 being the diseased, and 0 being the non-diseased), β₀ and β₁ are regression parameters, and Z is the test result. After the regression parameter estimators, ${\hat{β}}_{0}$ and ${\hat{β}}_{1}$ , are obtained, the empirical ROC curve is estimated based on the new score, $\tilde{Z} = {\hat{β}}_{0} + {\hat{β}}_{1} Z$ . Since the ROC curve is invariant to monotonic transformation, the empirical ROC curve based on the new score remains the same as the empirical ROC curve from the original test results.
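The invariance argument can be checked numerically with a short sketch. The coefficients below are hypothetical stand-ins for fitted values, not estimates from any real logistic regression; the point is only that a linear rescoring with a positive slope preserves the ordering of test results and hence the empirical AUC.

```python
def empirical_auc(xs, ys):
    """Share of (diseased, non-diseased) pairs in which the diseased score is larger."""
    return sum(x > y for x in xs for y in ys) / (len(xs) * len(ys))


def rescore(values, b0, b1):
    """Linear predictor b0 + b1 * Z used as the new score."""
    return [b0 + b1 * v for v in values]


xs = [2.1, 3.4, 1.7, 2.9]   # toy diseased test results
ys = [1.0, 2.5, 0.4, 3.0]   # toy non-diseased test results
b0, b1 = -1.3, 0.8          # hypothetical coefficients, slope assumed positive

original = empirical_auc(xs, ys)
transformed = empirical_auc(rescore(xs, b0, b1), rescore(ys, b0, b1))
print(original, transformed)  # the two values coincide
```

If the fitted slope were negative, the ordering would be reversed and, in the absence of ties, the empirical AUC of the new score would equal one minus the original value.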

Table 2.

Bias and RMSE of the proposed, parametric, and semiparametric methods

			Proposed Method		Semiparametric Method		Parametric Method
	ρ	M (J)	Bias	RMSE	Bias	RMSE	Bias	RMSE
Norm	−0.1	50	−0.0140	0.0329	−0.0123	0.0318	−0.0131	0.0326
		100	−0.0126	0.0251	−0.0144	0.0249	−0.0138	0.0246
		200	−0.0136	0.0202	−0.0132	0.0203	−0.0135	0.0198
	0.2	50	−0.0149	0.0247	−0.0155	0.0440	−0.0117	0.0423
		100	−0.0150	0.0331	−0.0139	0.0327	−0.0125	0.0317
		200	−0.0140	0.0451	−0.0147	0.0262	−0.0136	0.0241
	0.5	50	−0.0133	0.0455	−0.0153	0.0456	−0.0168	0.0446
		100	−0.0132	0.0252	−0.0130	0.0327	−0.0151	0.0330
		200	−0.0132	0.0333	−0.0139	0.0258	−0.0121	0.0239
LN	−0.1	50	−0.0152	−0.0158	−0.0122	0.0360	0.0689	0.0779
		100	−0.0131	−0.0129	−0.0120	0.0265	0.0758	0.0814
		200	−0.0131	−0.0145	−0.0127	0.0203	0.0799	0.0833
	0.2	50	−0.0158	0.0446	−0.0139	0.0499	0.0706	0.0817
		100	−0.0120	0.0232	−0.0141	0.0351	0.0754	0.0810
		200	−0.0136	0.0327	−0.0129	0.0249	0.0807	0.0846
	0.5	50	−0.0158	0.0460	−0.0156	0.0498	0.0705	0.0838
		100	−0.0129	0.0255	−0.0120	0.0344	0.0791	0.0877
		200	−0.0145	0.0343	−0.0134	0.0256	0.0826	0.0884

Norm denotes the normal distribution; LN denotes the lognormal distribution.

In the third simulation study we compared the simulated powers using the optimal weight versus the equal weight. We again let m_ℓi = n_ℓj = 1, R = 3, and L = 2. We simulated 1000 datasets under multivariate normal distributions: X ~ N(μ_X, Σ_X) and Y ~ N(μ_Y, Σ_Y), where μ_X = (2, 1, …, 1), μ_Y = (0, …, 0) and Σ_X = Σ_Y is the variance-covariance matrix with diagonal elements (1, 1.5, 2, 2, 3, 2) and correlation coefficient, ρ. We selected M = J in (50, 100), and ρ in (−0.1, 0.2, 0.5). For each simulated dataset, we estimated the weighted difference defined in Section 2.2:

h (Ω) = {(\sum_{r = 1}^{3} w_{r})}^{- 1} \sum_{r = 1}^{3} [w_{r} (Ω_{r} - Ω_{3 + r})],

with both equal weights (w_r = 1/3) and the optimal weights given in (7). The AUC was estimated by defining the weight function W (u) = 1 for 0 < u < 1, and the pAUC was estimated by defining W (u) = 1 for 0 < u < 0.6 and 0 otherwise. The simulated power was then calculated as the proportion of rejections among the 1000 simulated datasets. Table 3 shows the simulated powers for the comparison of AUCs and pAUCs. In every setting considered, the optimal weights give substantially higher power than the equal weights.
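The power calculation can be sketched as follows. The text does not spell out the test, so the sketch assumes a two-sided Wald test that rejects when the absolute value of the estimate divided by its standard error exceeds the normal critical value; the function name and the toy inputs are hypothetical.

```python
from statistics import NormalDist


def simulated_power(estimates, std_errors, alpha=0.05):
    """Proportion of simulated datasets in which the Wald test rejects.

    Assumes a two-sided test of 'no difference' at level alpha.
    """
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = sum(abs(est / se) > critical
                     for est, se in zip(estimates, std_errors))
    return rejections / len(estimates)


# Four hypothetical replicates: z-statistics are 2.5, 0.5, -3.0 and 1.0.
print(simulated_power([0.10, 0.02, -0.30, 0.05], [0.04, 0.04, 0.10, 0.05]))
```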

Table 3.

Simulated powers for comparing tests

AUC
	Equal Weight		Optimal Weight
ρ	M=J=50	M=J=100	M=J=50	M=J=100
−0.1	0.507	0.741	0.723	0.932
0.2	0.335	0.541	0.659	0.909
0.5	0.327	0.538	0.703	0.936

pAUC
	Equal Weight		Optimal Weight
ρ	M=J=50	M=J=100	M=J=50	M=J=100
−0.1	0.156	0.290	0.316	0.599
0.2	0.141	0.212	0.280	0.584
0.5	0.133	0.187	0.266	0.643
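
The quantities compared in Table 3 can be illustrated with a minimal sketch. It assumes continuous marker values (no ties) and uses hypothetical toy inputs; the function names `empirical_pauc` and `weighted_difference` are introduced here for illustration and are not the authors' code. The first function integrates the empirical ROC curve against W(u) = 1 on (0, upper), and the second evaluates h(Ω) for a given set of weights.

```python
def empirical_pauc(diseased, nondiseased, upper=1.0):
    """Area under the empirical ROC curve for false-positive rates in (0, upper).

    Corresponds to the weight function W(u) = 1 on (0, upper) and 0 elsewhere:
    upper = 1.0 gives the AUC, upper = 0.6 the pAUC described in the text.
    Assumes continuous marker values, so ties are ignored.
    """
    ys = sorted(nondiseased, reverse=True)  # thresholds, largest first
    n, m = len(ys), len(diseased)
    area = 0.0
    for k, y in enumerate(ys):
        lo, hi = k / n, min((k + 1) / n, upper)
        if hi <= lo:  # the remaining steps lie beyond the upper limit
            break
        tpr = sum(x > y for x in diseased) / m  # empirical ROC on [lo, hi)
        area += tpr * (hi - lo)
    return area


def weighted_difference(omega, weights):
    """h(Omega) = (sum_r w_r)^(-1) * sum_r w_r * (Omega_r - Omega_{R+r})."""
    n_readers = len(weights)
    if len(omega) != 2 * n_readers:
        raise ValueError("omega must hold R values per modality")
    total = sum(weights)
    return sum(
        w * (omega[r] - omega[n_readers + r]) for r, w in enumerate(weights)
    ) / total


# Hypothetical toy data, not taken from the simulation study.
diseased = [3.0, 4.0, 5.0]
nondiseased = [0.0, 1.0, 2.0]
auc = empirical_pauc(diseased, nondiseased)         # perfectly separated markers
pauc = empirical_pauc(diseased, nondiseased, 0.6)   # restricted to FPR < 0.6

# Hypothetical summary measures for R = 3 readers under two modalities.
omega = [0.80, 0.75, 0.70, 0.70, 0.70, 0.65]
h_equal = weighted_difference(omega, [1 / 3, 1 / 3, 1 / 3])
```

With upper = 1.0 the first function reduces to the usual Mann-Whitney form of the empirical AUC. The optimal weights in (7) are not reproduced here; any positive weights can be passed to `weighted_difference`.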


4.2. Longitudinal biomarker data

In this simulation study we generated correlated multivariate log-normal biomarker data by exponentiating multivariate normal data X_i ~ N(μ_X,i, Σ_X,i) and Y_j ~ N(0, Σ_Y,j), where μ_X,i = (2, …, 2, 1, …, 1), and Σ_X,i and Σ_Y,j are variance-covariance matrices. We let L = 2, K = 3, and M = J = (50, 200). To allow for varying cluster sizes, we let m_ℓik = 2 for the first half of the diseased subjects and m_ℓik = 4 for the other half. For the non-diseased subjects, we let n_ℓjk = 5 for the first half and n_ℓjk = 3 for the other half. We chose Σ_X,i = (1 − ρ)M_i + ρ 1_i1_i′, where M_i is the LKm_ℓik × LKm_ℓik identity matrix and 1_i is the LKm_ℓik × 1 vector with all elements equal to 1. A similar setting was used to define Σ_Y,j. Here ρ is the within-subject correlation. We let ρ = 0.4 for the diseased and ρ = 0.3 for the non-diseased. We simulated 1000 datasets for each sample size and obtained the estimate of the AUC difference between the two biomarkers, ${\hat{Δ}}_{ℓ}$, and its variance. Table 4 reports the biases, root mean squared errors (RMSE), and simulated coverage of the confidence intervals. The results again indicate good performance of our estimator for correlated biomarker data.

Table 4.

Bias, RMSE and coverage for simulated correlated data

		AUC			pAUC
	M (J)	Bias (in %)	RMSE	Coverage	Bias (in %)	RMSE	Coverage
Norm	50	−0.1182	1.0266	97.40%	0.0627	0.0184	97.40%
	100	0.0302	2.1682	96.60%	0.0931	0.0128	96.60%
	200	0.0038	1.5226	95.80%	0.0116	0.0090	96.00%
LN	50	−0.0768	0.0143	97.10%	0.0097	0.0125	97.10%
	100	−0.1126	0.0218	96.20%	0.0521	0.0093	96.80%
	200	−0.0445	0.0109	94.90%	0.0317	0.0188	95.00%


Norm denotes the normal distribution; LN denotes the lognormal distribution.
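
The data-generating step described above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names, the seed, and the assumption that the first half of μ_X,i equals 2 and the second half equals 1 are all introduced here. It relies on the fact that a vector with a shared standard normal component scaled by √ρ plus independent components scaled by √(1 − ρ) has covariance (1 − ρ)I + ρ11′ when 0 ≤ ρ < 1.

```python
import math
import random


def compound_symmetric(size, rho):
    """Return (1 - rho) * I + rho * 1 1' as a list of rows."""
    return [
        [(1.0 - rho) + rho if a == b else rho for b in range(size)]
        for a in range(size)
    ]


def correlated_lognormal(mean, rho, rng):
    """Draw exp(Z) with Z ~ N(mean, (1 - rho) I + rho 1 1'), for 0 <= rho < 1.

    Uses the shared-component representation
    Z_a = mean_a + sqrt(rho) * B + sqrt(1 - rho) * E_a,
    with B and E_a independent standard normals, which gives unit
    variances and a common correlation rho.
    """
    shared = rng.gauss(0.0, 1.0)
    return [
        math.exp(mu + math.sqrt(rho) * shared
                 + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
        for mu in mean
    ]


rng = random.Random(2012)  # arbitrary seed, chosen only for reproducibility
size = 2 * 3 * 2           # L * K * m for a diseased subject with cluster size 2
# Assumed layout of mu_X,i = (2, ..., 2, 1, ..., 1): first half 2, second half 1.
mean_x = [2.0] * (size // 2) + [1.0] * (size // 2)
sigma_x = compound_symmetric(size, 0.4)          # covariance for a diseased subject
x_i = correlated_lognormal(mean_x, 0.4, rng)     # one diseased subject's markers
```

The shared-component construction avoids a Cholesky factorization, but it only applies to this exchangeable covariance with non-negative ρ, which covers the values ρ = 0.4 and ρ = 0.3 used in this subsection.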

5. An example in the diagnosis of endometriosis

The proposed nonparametric ROC summary statistics are applied in this section to data from a study on endometriosis diagnosis. Endometriosis is a gynecological condition in which endometrial-like cells appear and flourish in areas outside the uterine cavity; it is typically seen in women of reproductive age. It has been estimated that endometriosis occurs in roughly 5%–10% of women. Despite its relatively high prevalence, substantive and methodological challenges exist, including diagnostic proficiency. The Physician Reliability Study (PRS), an add-on to the Endometriosis: Natural History, Diagnosis and Outcome (ENDO) Study [19], addressed this issue by investigating whether sequentially added clinical information on a subject can aid in diagnosing endometriosis more accurately. Detailed study designs of ENDO and PRS can be found in the aforementioned references. For demonstration purposes in this paper, we used the review results of 4 physicians (reviewers) in PRS on 150 participants. All 150 participants had recorded operative digital images of their pelvic organs as well as descriptive drawings and notes, both from the surgeons who conducted the laparoscopies on these women in the ENDO study. The reviewers conducted their reviews and diagnoses under two modalities. Modality one corresponds to the setting where the reviewers are presented with participants’ digital video/images, while modality two corresponds to the setting where both digital video/images and surgeons’ reports (drawings and notes) are presented. For each participant under each modality, the reviewer answered a series of questions on what they observed from the clinical information. These answers were later used to derive the rASRM scores [20], which we used as the diagnostic outcomes in this paper. The visualized diagnosis of these participants from the original ENDO study was used as the gold standard.

For the first modality, the estimated AUCs for the four reviewers are (0.71, 0.75, 0.63, 0.76); the corresponding values for the second modality are (0.83, 0.85, 0.75, 0.87). With equal weights w_r = 1/4, r = 1, …, 4, the Δ-statistic is ${\hat{Δ}}_{m} = - 0.1145$, and its variance estimate is 0.0007475. We used (7) to obtain the optimal weights (w₁, w₂, w₃, w₄) = (298.08, 401.16, 176.88, 560.48). Using these weights, the Δ-statistic is ${\hat{Δ}}_{m} = - 0.1115$, and its variance estimate is 0.0006961. The smaller variance indicates that the Δ-statistic is estimated more precisely with the optimal weights. The two-sided p-value using the optimal weights is 2.36 × 10⁻⁵, slightly smaller than the p-value of 2.82 × 10⁻⁵ using equal weights. Both p-values are close to zero, which indicates that these physicians diagnose endometriosis more accurately when reviewing both digital images and surgeons’ descriptive reports.
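
As a rough arithmetic check of the figures above, the following sketch recomputes the weighted differences from the rounded AUCs and the two-sided p-values from the reported Δ-statistics and variance estimates. It assumes a normal reference distribution for Δ̂ divided by its estimated standard error; the helper names are hypothetical. Because the reported AUCs are rounded to two decimals, the recomputed differences (about −0.1125 and −0.1105) only approximate the reported −0.1145 and −0.1115.

```python
import math

# Reported AUCs, rounded to two decimals in the text.
auc_modality_1 = [0.71, 0.75, 0.63, 0.76]
auc_modality_2 = [0.83, 0.85, 0.75, 0.87]
optimal_weights = [298.08, 401.16, 176.88, 560.48]  # reported weights from (7)


def weighted_delta(auc_a, auc_b, weights):
    """Weighted average of the reader-specific AUC differences."""
    total = sum(weights)
    return sum(w * (a - b) for w, a, b in zip(weights, auc_a, auc_b)) / total


def two_sided_p(delta, variance):
    """Two-sided p-value for z = delta / sqrt(variance) under N(0, 1)."""
    z = delta / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Differences recomputed from the rounded AUCs (approximate).
delta_equal = weighted_delta(auc_modality_1, auc_modality_2, [0.25] * 4)
delta_optimal = weighted_delta(auc_modality_1, auc_modality_2, optimal_weights)

# p-values recomputed from the reported statistics and variance estimates.
p_equal = two_sided_p(-0.1145, 0.0007475)
p_optimal = two_sided_p(-0.1115, 0.0006961)
```

Both recomputed p-values are of order 10⁻⁵ and close to the reported ones, with the optimal-weight p-value the smaller of the two.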

6. Discussion

The methods proposed in this paper are nonparametric and can be applied to evaluate and compare diagnostic markers in multi-reader multi-test data and in longitudinal data. As illustrated in the simulation studies and the example, the proposed weighted method for multi-reader multi-test data tends to have larger power than the existing methods. We also conducted simulation studies to investigate the finite-sample performance of the proposed method in the longitudinal data setting. More complex correlated data, in which both normal and abnormal locations may occur in the same subject, have been considered in [21] and [22]. Extending the proposed statistics to such a data setting is a topic for future research.

As pointed out by a reviewer, the proposed method is based on empirical distribution estimators and may not accommodate more complicated dependencies among observations in longitudinal data. For example, under autoregressive dependence, the empirical estimators may not converge to the target probabilities, especially when the autoregression coefficients are greater than one. More research is merited to extend the proposed method in this direction.

Acknowledgement

The authors would like to thank an associate editor and two referees for their constructive comments and suggestions. The project described here was supported in part by Award Number R15CA150698 from the National Cancer Institute under the American Recovery and Reinvestment Act of 2009 and by Award Number H98230-11-1-0196 from the National Security Agency. The work was also supported in part with funding from the American Chemistry Council and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health.

Appendix: Derivation of variance expression of Δ_h

Assume that S_D,ℓ and S_D̄,ℓ have continuous and positive derivatives, $S_{D, ℓ}^{'}$ and $S_{\bar{D}, ℓ}^{'}$. Suppose that M/m_ℓ → α_ℓ, M/n_ℓ → β_ℓ, M/J → λ, $\sum_{i = 1}^{M} m_{ℓ_{1} i} m_{ℓ_{2} i} / M^{2} \to η_{ℓ_{1}, ℓ_{2}}^{X}$, and $\sum_{j = 1}^{J} n_{ℓ_{1} j} n_{ℓ_{2} j} / M^{2} \to η_{ℓ_{1}, ℓ_{2}}^{Y}$, as M, J → ∞. Assume that α_ℓ, β_ℓ, $η_{ℓ_{1}, ℓ_{2}}^{X}$ and $η_{ℓ_{1}, ℓ_{2}}^{Y}$ are finite numbers. In addition, assume that the function h has continuous partial derivatives of order 2 at each point of an open set (Ω − ε, Ω + ε), for ε > 0. Define

r_{ℓ}(u) = S_{D, ℓ}^{'} \{S_{\bar{D}, ℓ}^{-1}(u)\} / S_{\bar{D}, ℓ}^{'} \{S_{\bar{D}, ℓ}^{-1}(u)\}, for ℓ = 1, \dots, L,

where $S_{D, ℓ}^{'}$ and $S_{\bar{D}, ℓ}^{'}$ are the first derivatives of S_D,ℓ and S_D̄,ℓ, respectively.

The asymptotic normality of $\hat{Ω}$ is derived using results from [18], which give that, for markers 1, …, L,

\sqrt{M} \begin{pmatrix}
\widehat{ROC}_{1}(u) - ROC_{1}(u) \\
\widehat{ROC}_{2}(u) - ROC_{2}(u) \\
\vdots \\
\widehat{ROC}_{L}(u) - ROC_{L}(u)
\end{pmatrix}
\overset{d}{\to}
\begin{pmatrix}
\sqrt{α_{1}} U_{1, 1} [S_{D, 1} \{S_{\bar{D}, 1}^{-1}(u)\}] - \sqrt{β_{1}} r_{1}(u) U_{2, 1}(u) \\
\sqrt{α_{2}} U_{1, 2} [S_{D, 2} \{S_{\bar{D}, 2}^{-1}(u)\}] - \sqrt{β_{2}} r_{2}(u) U_{2, 2}(u) \\
\vdots \\
\sqrt{α_{L}} U_{1, L} [S_{D, L} \{S_{\bar{D}, L}^{-1}(u)\}] - \sqrt{β_{L}} r_{L}(u) U_{2, L}(u)
\end{pmatrix},

where $U_{1, ℓ}$ and $U_{2, ℓ}$ are limiting Gaussian processes. Therefore, after some calculation, it follows that

\sqrt{M} (\hat{Ω} - Ω) \overset{d}{\to} N_{L} (0, Σ = Σ_{1} + Σ_{2}),

(A.1)

where the (ℓ₁, ℓ₂) element of Σ₁ is given by

α_{ℓ_{1}} α_{ℓ_{2}} η_{ℓ_{1}, ℓ_{2}}^{X} \int_{0}^{1} \int_{0}^{1} [S_{D, ℓ_{1}, ℓ_{2}} \{S_{\bar{D}, ℓ_{1}}^{-1}(s), S_{\bar{D}, ℓ_{2}}^{-1}(t)\} - S_{D, ℓ_{1}} \{S_{\bar{D}, ℓ_{1}}^{-1}(s)\} S_{D, ℓ_{2}} \{S_{\bar{D}, ℓ_{2}}^{-1}(t)\}] d W(s) d W(t),

(A.2)

and the (ℓ₁, ℓ₂) element of Σ₂ is

λ β_{ℓ_{1}} β_{ℓ_{2}} η_{ℓ_{1}, ℓ_{2}}^{Y} \int_{0}^{1} \int_{0}^{1} r_{ℓ_{1}}(s) r_{ℓ_{2}}(t) [S_{\bar{D}, ℓ_{1}, ℓ_{2}} \{S_{\bar{D}, ℓ_{1}}^{-1}(s), S_{\bar{D}, ℓ_{2}}^{-1}(t)\} - s t] d W(s) d W(t).

(A.3)

The Taylor expansion of ${\hat{Δ}}_{h}$ at Ω gives

{\hat{Δ}}_{h} - Δ_{h} \overset{d}{\to} {(\hat{Ω} - Ω)}^{'} \nabla h(Ω),

(A.4)

where ∇h(Ω) is the gradient of h evaluated at Ω. The asymptotic variance of the right-hand side of (A.4) is given by

\nabla h {(Ω)}^{'} var(\hat{Ω} - Ω) \nabla h(Ω).

It follows that

var({\hat{Δ}}_{h} - Δ_{h}) \overset{p}{\to} \sum_{ℓ_{1}, ℓ_{2}} \frac{\partial h}{\partial Ω_{ℓ_{1}}} \frac{\partial h}{\partial Ω_{ℓ_{2}}} cov({\hat{Ω}}_{ℓ_{1}} - Ω_{ℓ_{1}}, {\hat{Ω}}_{ℓ_{2}} - Ω_{ℓ_{2}}).

(A.5)

Substituting the covariance structures in (A.2) and (A.3) into (A.5), we can then obtain the asymptotic normality of ${\hat{Δ}}_{h}$ by combining (A.1) with the Cramér-Wold device [23].
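
For the linear function h(Ω) = (Σ_r w_r)^{-1} Σ_r w_r (Ω_r − Ω_{R+r}) used in the multi-reader setting, the quadratic form ∇h(Ω)′ var(Ω̂ − Ω) ∇h(Ω) can be written out explicitly. The following is a sketch under two assumptions introduced here for illustration: the weights are treated as fixed constants, and σ_{a,b} denotes the (a, b) element of var(Ω̂ − Ω).

```latex
\frac{\partial h}{\partial Ω_{r}} = \frac{w_{r}}{\sum_{s=1}^{R} w_{s}}, \qquad
\frac{\partial h}{\partial Ω_{R+r}} = -\frac{w_{r}}{\sum_{s=1}^{R} w_{s}}, \qquad r = 1, \dots, R,

\nabla h(Ω)^{'} \, var(\hat{Ω} - Ω) \, \nabla h(Ω)
  = \Big( \sum_{s=1}^{R} w_{s} \Big)^{-2}
    \sum_{r=1}^{R} \sum_{s=1}^{R} w_{r} w_{s}
    \big( σ_{r,s} - σ_{r,R+s} - σ_{R+r,s} + σ_{R+r,R+s} \big).
```

Each term in the double sum is the covariance between the estimated differences for readers r and s, so correlated readers contribute through the off-diagonal elements.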

References

1. Zhou XH, McClish DK, Obuchowski N. Statistical Methods in Diagnostic Medicine. Wiley; New York: 2002.
2. Zou K. Comparison of correlated receiver operating characteristic curves derived from repeated diagnostic test data. Academic Radiology. 2001;8(3):225–233. doi: 10.1016/S1076-6332(03)80531-7.
3. Molodianovitch K, Faraggi D, Reiser B. Comparing the areas under two correlated ROC curves: parametric and non-parametric approaches. Biometrical Journal. 2006;48:745–757. doi: 10.1002/bimj.200610223.
4. Copas JB, Corbett P. Overestimation of the receiver operating characteristic curve for logistic regression. Biometrika. 2002;89(2):315–331.
5. DeLong ER, DeLong D, Clarke-Pearson D. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics. 1988;44:837–845.
6. Obuchowski NA. Nonparametric analysis of clustered ROC curve data. Biometrics. 1997;53:567–578.
7. Bamber D. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology. 1975;12:387–415.
8. Zhang D, Zhou X, Freeman D, Freeman J. A non-parametric method for the comparison of partial areas under ROC curves and its application to large health care data sets. Statistics in Medicine. 2002;21(5):701–715. doi: 10.1002/sim.1011.
9. Baker S, Pinsky P. A proposed design and analysis for comparing digital and analog mammography: special receiver operating characteristic methods for cancer screening. Journal of the American Statistical Association. 2001;96:421–428.
10. Li J, Fine JP. Weighted area under the receiver operating characteristic curve and its application to gene selection. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2010;59(4):673–692. doi: 10.1111/j.1467-9876.2010.00713.x.
11. Obuchowski N, Rockette H. Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations. Communications in Statistics-Theory and Methods. 1995;24(2):285–308.
12. Song X, Zhou XH. A marginal model approach for analysis of multi-reader multi-test receiver operating characteristic (ROC) data. Biostatistics. 2005;6(2):303–312. doi: 10.1093/biostatistics/kxi011.
13. Lee MLT, Rosner BA. The average area under correlated receiver operating characteristic curves: A nonparametric approach based on generalized two-sample Wilcoxon statistics. Applied Statistics. 2001;50(3):337–344.
14. Wei LJ, Johnson WE. Combining dependent tests with incomplete repeated measurements. Biometrika. 1985;72(2):359–364.
15. Yang Y, Jin Z. Combining dependent tests to compare the diagnostic accuracies: non-parametric approach. Statistics in Medicine. 2006;25(7):1239–1250. doi: 10.1002/sim.2338.
16. Etzioni R, Pepe M, Longton G, Hu C, Goodman G. Incorporating the time dimension in receiver operating characteristic curves: A case study of prostate cancer. Medical Decision Making. 1999;19:242–251. doi: 10.1177/0272989X9901900303.
17. Wieand S, Gail MH, James BR, James KL. A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika. 1989;76:585–592.
18. Li G, Zhou K. A unified approach to nonparametric comparison of receiver operating characteristic curves for longitudinal and clustered data. Journal of the American Statistical Association. 2008;103:705–713. doi: 10.1198/016214508000000364.
19. Buck Louis GM, Hediger ML, Peterson CM, Croughan M, Sundaram R, Stanford J, Chen Z, Fujimoto VY, Varner MW, Trumble A, et al. Incidence of endometriosis by study population and diagnostic method: the ENDO study. Fertility and Sterility. 2011;96:360–365. doi: 10.1016/j.fertnstert.2011.05.087.
20. American Society for Reproductive Medicine. Revised American Society for Reproductive Medicine classification of endometriosis: 1996. Fertility and Sterility. 1997;67:817–821. doi: 10.1016/s0015-0282(97)81391-x.
21. Werner C, Brunner E. Rank methods for the analysis of clustered data in diagnostic trials. Computational Statistics & Data Analysis. 2007;51(10):5041–5054.
22. Konietschke F, Brunner E. Nonparametric analysis of clustered data in diagnostic trials: Estimation problems in small sample sizes. Computational Statistics & Data Analysis. 2009;53(3):730–741.
23. Serfling RJ. Approximation Theorems of Mathematical Statistics. Wiley; New York: 1980.
