Author manuscript; available in PMC: 2014 Apr 1.
Published in final edited form as: Lifetime Data Anal. 2012 Oct 10;19(2):257–277. doi: 10.1007/s10985-012-9233-5

ROC Analysis for Multiple Markers with Tree-Based Classification

Mei-Cheng Wang 1, Shanshan Li 1
PMCID: PMC3633731  NIHMSID: NIHMS413948  PMID: 23054242

Abstract

Multiple biomarkers are frequently observed or collected for detecting or understanding a disease. The research interest of this paper is to extend tools of ROC analysis from univariate marker setting to multivariate marker setting for evaluating predictive accuracy of biomarkers using a tree-based classification rule. Using an arbitrarily combined and-or classifier, an ROC function together with a weighted ROC function (WROC) and their conjugate counterparts are introduced for examining the performance of multivariate markers. Specific features of the ROC and WROC functions and other related statistics are discussed in comparison with those familiar properties for univariate marker. Nonparametric methods are developed for estimating the ROC and WROC functions, and area under curve (AUC) and concordance probability. With emphasis on population average performance of markers, the proposed procedures and inferential results are useful for evaluating marker predictability based on multivariate marker measurements with different choices of markers, and for evaluating different and-or combinations in classifiers.

Keywords: Concordance probability, Multiple markers, Prediction accuracy, U-statistics

1 INTRODUCTION

Receiver Operating Characteristic (ROC) analysis is widely used as a tool for assessing the discriminant performance of biomarkers. Based on a univariate or combined-to-univariate marker, the ROC curve is a plot of the true positive rate versus the false positive rate for each possible cut point; it summarizes the sensitivity and specificity of a binary classifier system when marker measurements are continuous. In nonparametric, semiparametric or parametric models, the ROC curve and its associated measures such as area under curve (AUC) or partial area under curve (pAUC) have been used as indices for evaluating the predictive accuracy of markers or diagnostic tests (Pepe, 2003). In the statistical literature, different measures have been developed to summarize and compare the predictive accuracy of biomarkers (Gu and Pepe, 2009; Baker et al., 2009; among others).

This paper considers situations in which multiple markers (M1, M2, …, Mk) are available for classification of disease state. The research interest is to establish criteria and tools for assessing predictive accuracy based on multivariate markers or multivariate test measurements, (M1, M2, …, Mk), from observed data or a training data set. The proposed work includes at least two types of applications: (i) to quantify the result of dual or multiple readings from a single diagnostic test, or readings from multiple tests; (ii) to evaluate the predictability of combined multiple markers for a disease, where each marker characterizes a specific biological function for the disease. For applications of the first type, (i), multiple readings are employed either to reduce uncertainty of test classification or to compare multiple diagnostic modalities (Metz and Shen, 1992; Hanley and McNeil, 1983). Applications of the second type, (ii), are important when multivariate markers are used as prognostic measurements for predicting or understanding the disease.

To analyze multiple marker data, several approaches have been developed to handle the correlation structure of marker measurements for different research goals. The most common approach is perhaps to combine multiple markers into a single composite score using logistic regression model, and evaluate the predictability of markers by the one-dimensional composite score (McIntosh and Pepe, 2002). For high-dimensional markers, or when markers come from different biological sources, it may not be analytically appropriate to combine the markers into a composite score and, in such situations, the tree-based regression model could serve as a good alternative for identifying a classification rule. The tree-based classification method is sometimes referred to as recursive partitioning, which is frequently used in data mining, machine learning and clinical practice as a predictive model (Breiman et al., 1984; Zhang et al., 1998). For example, Baker (2000) and Etzioni et al. (2003) considered discretized markers by keeping the marker values in multi-dimensional settings and proposed new definitions for ROC curves.

When markers are continuous, Jin and Lu (2009) considered bivariate markers and proposed to use the area under the upper boundary of ROC region to evaluate diagnostic utilities. Jin and Lu’s work can be viewed as an extension of Baker’s approach (Baker, 2000) from discrete markers to continuous markers. Wang and Li (2012) defined an ROC function for bivariate continuous markers via generalized inverse set of the quantile function FP, where the ROC function possesses a conditional expectation expression. In this paper, we generalize Wang and Li’s results from bivariate marker to multivariate marker setting, and develop methods and inference for ROC analysis.

Assume a k-dimensional marker vector (M1, M2, …, Mk) is available and the disease state is determined by a sequence of arbitrarily combined and-or classifiers with positivity specified in either direction of marker values; for example, I((M1 ≥ m1 or M2 < m2) and (M3 < m3 or M4 ≥ m4)). This extension links to potential applications related to classification trees with binary decision diagrams. The research interest is to establish criteria and tools for assessing predictive accuracy based on multivariate markers, (M1, M2, …, Mk). Specifically, the ROC function is extended from the univariate case to the multivariate case, and a weighted ROC (WROC) function is introduced for examining predictive accuracy with arbitrarily combined and-or classifiers.

Let (Ml1, Ml2, …, Mlk), l = 0, 1, be the marker vector for a non-diseased or diseased subject. Let the arbitrarily combined and-or classifier be expressed as I{(Ml1, Ml2, …, Mlk) ∈ D(m1, m2, …, mk)} with D(m1, m2, …, mk) ⊆ Rk defined as the region for marker-based positivity. To simplify notation and formulation, hereafter we shall use bold face m to represent the vector (m1, m2, …, mk), and let ml and Ml, l = 0, 1, represent the vectors (ml1, ml2, …, mlk) and (Ml1, Ml2, …, Mlk). Define the false and true positive rates respectively as

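$$\mathrm{FP}(\mathbf{m}) = P\{\mathbf{M}_0 \in D(\mathbf{m})\}, \qquad \mathrm{TP}(\mathbf{m}) = P\{\mathbf{M}_1 \in D(\mathbf{m})\}.$$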
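As a minimal sketch of these definitions, the following Python code evaluates a hypothetical and-or rule of the kind described in the Introduction and computes empirical false and true positive rates at one threshold vector. The function names, the inequality directions in `positive_region`, the normal data and the threshold values are all illustrative assumptions, not part of the paper.

```python
import numpy as np

def positive_region(markers, m):
    """Hypothetical and-or rule (directions are an assumption):
    (M1 >= m1 or M2 < m2) and (M3 < m3 or M4 >= m4)."""
    M = np.asarray(markers, dtype=float)
    left = (M[:, 0] >= m[0]) | (M[:, 1] < m[1])
    right = (M[:, 2] < m[2]) | (M[:, 3] >= m[3])
    return left & right

def empirical_rate(sample, m, rule=positive_region):
    """Fraction of the sample whose marker vector falls in D(m)."""
    return float(np.mean(rule(sample, m)))

# Illustrative data: non-diseased markers centred at 0, diseased markers
# shifted in the direction that the rule treats as positive.
rng = np.random.default_rng(0)
M0 = rng.normal(0.0, 1.0, size=(500, 4))
M1 = rng.normal(0.0, 1.0, size=(500, 4)) + np.array([1.0, -1.0, -1.0, 1.0])

m = np.array([0.5, -0.5, -0.5, 0.5])   # arbitrary threshold vector
fp_hat = empirical_rate(M0, m)         # estimate of FP(m)
tp_hat = empirical_rate(M1, m)         # estimate of TP(m)
```

Passing a different `rule` function changes the region D(m) without touching the rate calculation, which mirrors how FP and TP depend on the classifier only through D.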

The research interest is to extend rules and tools from univariate marker to multivariate marker setting for assessment of predictive accuracy of markers.

Using the US Alzheimer's Disease Neuroimaging Initiative (ADNI) data set as an example, the biomarkers of interest include measurements from different biological systems related to neuroimaging, genetics, CSF (cerebrospinal fluid) and cognition. As the k markers are identified from different biological sources, it may not be appropriate to combine them using, say, a linear combination of the measurements. The and-or classifier also signifies the importance of interaction between markers. For example, in an Alzheimer’s Disease study in which the authors are currently involved (the BIOCARD study at Johns Hopkins School of Medicine), decreases in CSF Amyloid beta-42 and/or increases in total tau or phosphorylated-tau (p-tau) are hypothesized to be strong predictors of AD or AD-related symptoms. It would be interesting to keep the k markers in a multivariate setting and explore their respective roles and interaction nonparametrically.

The paper is organized as follows. Section 2 briefly reviews some of the fundamental definitions and properties for univariate ROC analysis, where emphasis is placed on those which will be extended to multivariate setting. In sections 3 and 5, a set of ROC and ROC-related functions are introduced with discussion focused on contrasting features between univariate and multivariate cases. Section 4 considers nonparametric estimators for ROC-related functions, AUC and concordance probabilities. Simulation and a real data analysis are presented in section 6 to illustrate the applicability of the proposed procedures. Section 7 concludes the paper with a brief discussion.

2 UNIVARIATE MARKER CASE

In this section we consider the univariate marker case, k = 1. Suppose the disease outcome D takes binary values 0 or 1, and M is a continuous marker variable. Let M0 and M1 respectively represent the marker variable from the non-diseased (D = 0) and diseased (D = 1) groups. Define TP(m) = P(M1 > m) = P(M > m|D = 1) as the true positive rate (sensitivity), and FP(m) = P(M0 > m) = P(M > m|D = 0) as the false positive rate (1 − specificity). Assume M0 and M1 are independent. Define F0(m) = 1 − FP(m) and F1(m) = 1 − TP(m) respectively as the cumulative distribution functions of M0 and M1.

There are multiple ways to define the ROC function for a univariate marker. A mathematically simple definition, ROC(q) = TP[FP−1(q)], q ∈ [0, 1], evaluates the magnitude of the true positive rate at a controlled false positive rate through the inverse functional mapping between FP and TP. The comparison of two ROC functions from different markers should thus be interpreted as the comparison of TP values at the same FP rate. The partial area under the ROC curve for false positive rate at most p, 0 ≤ p ≤ 1, is defined as AUC(p) = ∫ I(0 ≤ q ≤ p) ROC(q) dq. The area under the ROC curve is defined as the total area with the FP rate ranging from 0 to 1, that is, AUC(1). Define the partial concordance probability as CON(p) = P(M1 > M0, FP(M0) ≤ p). For the univariate marker model, the quantile variable Q0 = FP(M0) is Uniform[0, 1] distributed and thus CON(p) can be calculated using the probability measure on (M1, Q0) and simplifies to

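$$\mathrm{CON}(p) = P\{M_1 > \mathrm{FP}^{-1}(Q_0),\ Q_0 \le p\} = \int_0^p \int I\{m_1 > \mathrm{FP}^{-1}(q)\}\, dF_1(m_1)\, dq = \int_0^p \mathrm{ROC}(q)\, dq = \mathrm{AUC}(p).$$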

Thus, an alternative way to define ROC(p) is to obtain it as the derivative of the partial concordance probability with respect to p, namely ROC(p) = CON′(p). By definition, CON(p) can also be expressed as

$$\mathrm{CON}(p) = \iint I(m_1 > m_0)\, I\{\mathrm{FP}(m_0) \le p\}\, dF_1(m_1)\, dF_0(m_0). \tag{1}$$

The equivalence between CON(p) and AUC(p) has led to the development of nonparametric approaches for estimating AUC(p) using the formula in (1). Dodd and Pepe (2003) showed that the partial area under curve possesses a concordance probability expression: let p0′ = FP(TP−1(p0)) and assume p0′ < p1; then

$$\int I(p_0' \le q < p_1)\, \mathrm{ROC}(q)\, dq = P\{M_1 > M_0,\ \mathrm{FP}^{-1}(p_1) < M_0 \le \mathrm{TP}^{-1}(p_0)\}. \tag{2}$$

Thus, the partial concordance probability coincides with the partial AUC restricted to the region in which the false positive rate is less than p1 and the true positive rate is at least p0. As proposed by Dodd and Pepe (2003), by plugging the empirical distributions of M0 and M1 into (1) and (2), the partial area under curve can be estimated by nonparametric U-statistics. The above properties will be extended to the multivariate marker case for further analytical developments.
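The plug-in idea for formula (1) can be sketched in a few lines of Python. This is an illustration under simple assumptions (no ties, empirical FP computed with a strict inequality), and the function name `partial_auc_univariate` is hypothetical rather than taken from Dodd and Pepe (2003).

```python
import numpy as np

def partial_auc_univariate(m0, m1, p=1.0):
    """Plug-in estimate of CON(p) = AUC(p) from formula (1).

    m0: marker values of non-diseased subjects
    m1: marker values of diseased subjects
    p : upper bound on the false positive rate
    """
    m0 = np.asarray(m0, dtype=float)
    m1 = np.asarray(m1, dtype=float)
    # Empirical FP at each control value: share of controls strictly above it.
    fp_hat = np.array([(m0 > x).mean() for x in m0])
    # concordant[i, j] is True when case j exceeds control i.
    concordant = m1[None, :] > m0[:, None]
    # Keep only controls whose empirical false positive rate is at most p.
    keep = (fp_hat <= p)[:, None]
    return float((concordant & keep).mean())

controls = [0.0, 1.0, 2.0, 3.0]
cases = [1.5, 2.5, 3.5, 4.5]
full_auc = partial_auc_univariate(controls, cases)            # 13 of 16 pairs
part_auc = partial_auc_univariate(controls, cases, p=0.25)    # 5 of 16 pairs
```

With p = 1 every control is kept and the estimate reduces to the usual Mann-Whitney form of the AUC.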

An alternative approach can be adopted by reversing the roles of true and false positive rates to define a function similar to the ROC function:

$$\mathrm{ROC}^*(q) = \mathrm{FP}[\mathrm{TP}^{-1}(q)], \qquad q \in (0, 1). \tag{3}$$

By property of composite function, it is seen that

$$\mathrm{ROC}^*(q) = \mathrm{ROC}^{-1}(q). \tag{4}$$

Clearly, since the mapping ROC(q) is one-to-one, the function ROC*(q) contains the same amount of information as ROC(q). Graphically, ROC(q) and ROC*(q) are symmetric with respect to the diagonal line which connects points (0, 0) and (1, 1). Thus, ROC(q) + ROC*(1 − q) = 1 and the sum of the area under the ROC curve and the area under the ROC* curve equals 1. In Section 5, for the multivariate marker model, a function parallel to ROC*(q) will be introduced and some relationships similar to or different from those of the univariate marker case will be explored.

3 MULTIVARIATE MARKERS: ROC, WROC AND AUC

Now consider continuous markers and classification rule in multivariate setting. Suppose M0 and M1 are independent k-dimensional marker vectors from non-diseased group (D = 0) and diseased group (D = 1) respectively. Define

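$$\mathrm{FP}(\mathbf{m}) = P\{\mathbf{M}_0 \in D(\mathbf{m})\}, \qquad \mathrm{TP}(\mathbf{m}) = P\{\mathbf{M}_1 \in D(\mathbf{m})\}.$$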

Let F0(m) = P(M01 ≤ m1, M02 ≤ m2, …, M0k ≤ mk) be the cumulative distribution function for the non-diseased population, and F1(m) = P(M11 ≤ m1, M12 ≤ m2, …, M1k ≤ mk) the cumulative distribution function for the diseased population. Define the quantile variable Q0 = FP(M0) and denote by H0 the distribution function of Q0. As an important feature of multivariate markers, in general Q0 is not uniformly distributed. The distribution of Q0 depends on the classifier as well as the probability structure of M0, and therefore varies from marker vector to marker vector.
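To illustrate why Q0 is generally not uniform, consider a special case that is assumed here only for illustration: the "and" classifier D(m) = {x : x1 > m1, …, xk > mk} with mutually independent continuous components of M0. Writing S0j(x) = P(M0j > x) for the marginal survival functions (notation introduced for this sketch), the density of Q0 can be worked out in closed form:

```latex
\begin{align*}
Q_0 = \mathrm{FP}(\mathbf{M}_0) &= \prod_{j=1}^{k} S_{0j}(M_{0j}) = \prod_{j=1}^{k} U_j,
  \qquad U_j \overset{\text{iid}}{\sim} \mathrm{Uniform}[0,1], \\
-\log Q_0 &= \sum_{j=1}^{k} (-\log U_j) \sim \mathrm{Gamma}(k, 1), \\
h_0(q) &= \frac{(-\log q)^{k-1}}{(k-1)!}, \qquad 0 < q < 1 .
\end{align*}
```

Under these assumptions k = 1 gives the uniform density h0(q) = 1, k = 2 gives h0(q) = −log q, and k = 3 gives h0(q) = (log q)²/2, so the probability mass of Q0 concentrates near q = 0 as markers are added.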

3.1 DEFINITION OF ROC FUNCTION

When marker measurements are multivariate, the function FP(M0) is not a one-to-one transformation, which implies that the ROC function for univariate marker, TP(FP−1(q)), can not be used for multivariate marker case. Wang and Li (2012) considered bivariate marker models and defined an ROC function via generalized inverse set of the quantile function FP, where the ROC function possesses a conditional expectation expression. For multivariate markers, instead of using the generalized inverse set to conceptualize the ROC function, the ROC function is defined as the average of the true positive rate conditioning on the set of marker values with false positive rate q, where the conditional average is calculated subject to the non-diseased population:

$$\mathrm{ROC}(q) = E[\mathrm{TP}(\mathbf{M}_0) \mid \mathrm{FP}(\mathbf{M}_0) = q]. \tag{5}$$

There are a few characteristics of ROC(q) in (5), which may or may not be similar to characteristics of the ROC function for univariate marker:

  • The value of the ROC function in (5) is bounded between 0 and 1.

  • The function ROC(q) may not be an increasing function in q, 0 ≤ q ≤ 1.

  • If the distributions of M0 and M1 are the same (i.e., the marker vector is non-predictive for disease), then for each Borel set D(m1, m2, …, mk), one has TP(m1, m2, …, mk) = FP(m1, m2, …, mk). This implies TP(M0) = FP(M0) with probability one and
    $$E[\mathrm{TP}(\mathbf{M}_0) \mid \mathrm{FP}(\mathbf{M}_0) = q] = q.$$

    Thus, if the markers are non-predictive for disease, the ROC function coincides with the diagonal line which connects points (0, 0) and (1, 1), which is similar to the ROC function for univariate marker.

  • When the markers are predictive subject to the classifier D(m1, m2, …, mk), it means that TP(m1, m2, …, mk) ≥ FP(m1, m2, …, mk) for each (m1, m2, …, mk) ∈ Rk, and this implies TP(M0) ≥ FP(M0) with probability one and
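    $$\mathrm{ROC}(q) = E[\mathrm{TP}(\mathbf{M}_0) \mid \mathrm{FP}(\mathbf{M}_0) = q] \ge E[\mathrm{FP}(\mathbf{M}_0) \mid \mathrm{FP}(\mathbf{M}_0) = q] = q,$$

    for 0 ≤ q ≤ 1. Thus, the ROC function is above the diagonal line if the markers are predictive for disease.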

3.2 WROC AND AUC

In use of the ROC function, a question of interest is whether the function in (5) can be used for comparisons of markers’ predictive accuracy at population level. To address the question, we recall that for univariate marker the area under ROC curve is calculated with uniform distribution on q-axis (i.e., FP-axis). For multivariate markers, the ROC function defined in (5) can be used to compare the performance of true positive rate locally by conditioning on FP(M0) = q. To evaluate multivariate markers’ predictability unconditionally, the evaluation should take into account the distribution of Q0 besides the use of the conditionally defined ROC function.

Using the probability distribution of Q0, the AUC can be naturally defined as the area under ROC curve subject to Lebesgue integration with measure H0 on q-axis, namely AUC = ∫ROC(q)dH0(q), or equivalently,

$$\mathrm{AUC} = \int_0^1 \mathrm{ROC}(q) \cdot h_0(q)\, dq, \tag{6}$$

where h0(q) is the derivative of H0(q), which is assumed to exist. Define

WROC(q)=ROC(q)·h0(q)

as the weighted ROC (WROC) function. Note that WROC(q) is the unconditional average of the true positive rate with fixed false positive rate q:

WROC(q)=E[TP(M0)I(FP(M0)=q)]. (7)

It is seen that AUC is interpreted as area under WROC curve with uniform measure over the unit interval [0, 1]. Subsequently, the partial area under WROC curve can be defined as

$$\mathrm{AUC}(p) = \int_0^p \mathrm{WROC}(q)\, dq, \tag{8}$$

which can be used for comparison of markers in terms of their population-average predictability.

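The concordance probability is naturally defined as CON = P(M1 ∈ D(M0)). Next we prove the equivalence between the concordance probability and the area under the WROC curve, which is an extension of a property for a univariate marker (Dodd and Pepe, 2003):

$$\begin{aligned}
\mathrm{CON} = P\{\mathbf{M}_1 \in D(\mathbf{M}_0)\}
 &= \iint I\{\mathbf{m}_1 \in D(\mathbf{m}_0)\}\, dF_1(\mathbf{m}_1)\, dF_0(\mathbf{m}_0)
  = \int \mathrm{TP}(\mathbf{m}_0)\, dF_0(\mathbf{m}_0) \\
 &= \int_0^1 E[\mathrm{TP}(\mathbf{M}_0) \mid Q_0 = q] \cdot h_0(q)\, dq
  = \int_0^1 \mathrm{WROC}(q)\, dq = \mathrm{AUC}.
\end{aligned} \tag{9}$$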

With an additional constraint on the false positive rate p, 0 ≤ p ≤ 1, the partial concordance probability can be expressed as

$$\mathrm{CON}(p) = P\{\mathbf{M}_1 \in D(\mathbf{M}_0),\ \mathrm{FP}(\mathbf{M}_0) \le p\},$$

where the full concordance probability corresponds to the special case p = 1. The partial concordance probability is

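$$\begin{aligned}
\mathrm{CON}(p) &= P\{\mathbf{M}_1 \in D(\mathbf{M}_0),\ \mathrm{FP}(\mathbf{M}_0) \le p\}
  = \iint I\{\mathbf{m}_1 \in D(\mathbf{m}_0)\}\, I\{\mathrm{FP}(\mathbf{m}_0) \le p\}\, dF_1(\mathbf{m}_1)\, dF_0(\mathbf{m}_0) \\
 &= \int \mathrm{TP}(\mathbf{m}_0)\, I\{\mathrm{FP}(\mathbf{m}_0) \le p\}\, dF_0(\mathbf{m}_0)
  = \int_0^p E[\mathrm{TP}(\mathbf{M}_0) \mid Q_0 = q] \cdot h_0(q)\, dq \\
 &= \int_0^p \mathrm{WROC}(q)\, dq = \mathrm{AUC}(p).
\end{aligned} \tag{10}$$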

The equivalence between CON(p) and AUC(p) is again an extension of the result from univariate marker model to multivariate marker model. Further, with the restrictions that the false positive rate is less than or equal to p and that the true positive rate is greater than q, the formula in (10) can be extended to

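$$\begin{aligned}
\mathrm{CON}(p, q) &= P\{\mathbf{M}_1 \in D(\mathbf{M}_0),\ \mathrm{FP}(\mathbf{M}_0) \le p,\ \mathrm{TP}(\mathbf{M}_1) > q\} \\
 &= \iint I\{\mathbf{m}_1 \in D(\mathbf{m}_0)\}\, I\{\mathrm{FP}(\mathbf{m}_0) \le p,\ \mathrm{TP}(\mathbf{m}_1) > q\}\, dF_1(\mathbf{m}_1)\, dF_0(\mathbf{m}_0),
\end{aligned}$$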

which is a useful formula for constructing a U-statistic in estimation of the concordance probability with two-sided constraints. It is also clear that CON(p, 0) = AUC(p).

4 NONPARAMETRIC ESTIMATION

Suppose the observations include independent samples of iid copies of M0 and iid copies of M1, where marker vectors are represented by {Mi,0: i = 1, …, n0} and {Mj,1: j = 1, …, n1}, and realization values by {mi,0: i = 1, …, n0} and {mj,1: j = 1, …, n1}, respectively from the non-diseased and diseased populations. In this section we consider nonparametric approaches for estimation of ROC, WROC, AUC and CON. Denote by TP^, FP^, F^1 and F^0 respectively the empirical counterparts of TP, FP, F1 and F0. For those p with FP^(mi,0) = p, initially one can use the crude empirical estimate TP^(mi,0) to estimate ROC(p). Alternatively, we can consider the ROC function in its form as a conditional expectation in (5), ROC(q) = E[TP(M0)|FP(M0) = q], and construct a kernel average estimate, which can be thought of as a smoothed version of the crude empirical estimate:

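$$\widehat{\mathrm{ROC}}(p)
 = \frac{\int \widehat{\mathrm{TP}}(\mathbf{m}_0)\, k\!\left(\frac{p - \widehat{\mathrm{FP}}(\mathbf{m}_0)}{b}\right) d\hat F_0(\mathbf{m}_0)}
        {\int k\!\left(\frac{p - \widehat{\mathrm{FP}}(\mathbf{m}_0)}{b}\right) d\hat F_0(\mathbf{m}_0)}
 = \frac{\sum_{i=1}^{n_0} \widehat{\mathrm{TP}}(\mathbf{m}_{i,0})\, k\!\left(\frac{p - \widehat{\mathrm{FP}}(\mathbf{m}_{i,0})}{b}\right)}
        {\sum_{i=1}^{n_0} k\!\left(\frac{p - \widehat{\mathrm{FP}}(\mathbf{m}_{i,0})}{b}\right)},$$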

where the kernel k(·) is a mean zero density function and b is a bandwidth (Green and Silverman, 1994).

Note that the ROC function in (5) is defined as the average of true positive rate given a fixed value of the false positive rate, where the calculation of the conditional expectation is through the two one-dimensional variables TP(M0) and FP(M0). Thus, the ‘curse of dimensionality’ does not occur when the ROC function is estimated nonparametrically. A nonparametric estimator of WROC(p) can be constructed by estimating the derivative of CON(p) in (10) using kernel estimation technique:

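$$\widehat{\mathrm{WROC}}(p)
 = \frac{1}{b} \int \widehat{\mathrm{TP}}(\mathbf{m}_0)\, k\!\left(\frac{p - \widehat{\mathrm{FP}}(\mathbf{m}_0)}{b}\right) d\hat F_0(\mathbf{m}_0)
 = \frac{1}{n_0 b} \sum_{i=1}^{n_0} \widehat{\mathrm{TP}}(\mathbf{m}_{i,0})\, k\!\left(\frac{p - \widehat{\mathrm{FP}}(\mathbf{m}_{i,0})}{b}\right),$$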

which is seen to be the same as the product of ROC^(p) and the kernel estimate of h0(p),

$$\frac{1}{b} \int k\!\left(\frac{p - \widehat{\mathrm{FP}}(\mathbf{m}_0)}{b}\right) d\hat F_0(\mathbf{m}_0).$$
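The kernel estimators above can be sketched as follows. The code assumes the "and" classifier I(M1 > m1, …, Mk > mk), a Gaussian kernel and an arbitrary bandwidth; these choices, the simulated data and all function names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def in_region(x, m):
    # Assumed classifier: positive when every marker exceeds its threshold.
    return np.all(x > m, axis=-1)

def rates_at_controls(M0, M1):
    """Empirical FP and TP evaluated at each control vector m_{i,0}."""
    thresholds = M0[:, None, :]                                  # shape (n0, 1, k)
    fp_hat = in_region(M0[None, :, :], thresholds).mean(axis=1)  # FP^(m_{i,0})
    tp_hat = in_region(M1[None, :, :], thresholds).mean(axis=1)  # TP^(m_{i,0})
    return fp_hat, tp_hat

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def roc_wroc_hat(p, fp_hat, tp_hat, b):
    """Kernel estimates of ROC(p), WROC(p) and h0(p) with bandwidth b."""
    w = gaussian_kernel((p - fp_hat) / b)
    n0 = len(fp_hat)
    roc = float(np.sum(tp_hat * w) / np.sum(w))     # kernel-weighted average of TP^
    wroc = float(np.sum(tp_hat * w) / (n0 * b))     # same numerator, density scaling
    h0 = float(np.sum(w) / (n0 * b))                # kernel density estimate of h0(p)
    return roc, wroc, h0

rng = np.random.default_rng(1)                      # arbitrary seed
M0 = rng.normal(0.0, 1.0, size=(200, 3))
M1 = rng.normal(1.0, 1.0, size=(200, 3))
fp_hat, tp_hat = rates_at_controls(M0, M1)
roc, wroc, h0 = roc_wroc_hat(0.1, fp_hat, tp_hat, b=0.05)
```

Only the one-dimensional quantities FP^(mi,0) and TP^(mi,0) enter the smoothing step, which is why the smoothing stays one-dimensional whatever the number of markers k.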

Based on the equivalence between AUC(p) and CON(p), a nonparametric estimator of AUC(p) can be obtained:

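$$\widehat{\mathrm{AUC}}(p) = \iint I\{\mathbf{m}_1 \in D(\mathbf{m}_0)\}\, I\{\widehat{\mathrm{FP}}(\mathbf{m}_0) \le p\}\, d\hat F_1(\mathbf{m}_1)\, d\hat F_0(\mathbf{m}_0). \tag{11}$$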

With the restriction that the false positive rate is less than or equal to p and the true positive rate greater than q, the formula in (11) can be extended to

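$$\begin{aligned}
\widehat{\mathrm{CON}}(p, q)
 &= \iint I\{\mathbf{m}_1 \in D(\mathbf{m}_0)\} \cdot I\{\widehat{\mathrm{FP}}(\mathbf{m}_0) \le p,\ \widehat{\mathrm{TP}}(\mathbf{m}_1) > q\}\, d\hat F_1(\mathbf{m}_1)\, d\hat F_0(\mathbf{m}_0) \\
 &= \frac{1}{n_0 n_1} \sum_{i=1}^{n_0} \sum_{j=1}^{n_1} I\{\mathbf{m}_{j,1} \in D(\mathbf{m}_{i,0})\} \cdot I\{\widehat{\mathrm{FP}}(\mathbf{m}_{i,0}) \le p,\ \widehat{\mathrm{TP}}(\mathbf{m}_{j,1}) > q\},
\end{aligned}$$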

where the estimator has the form of a U-statistic (Hoeffding, 1948).
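A minimal sketch of this double-sum estimator is given below, again assuming the "and" classifier with all markers pointing upward. The function name `con_hat` and the convention `q=None` for "no constraint on TP" are choices made for this example; the latter avoids dropping the case with empirical TP exactly equal to 0 when no TP restriction is intended.

```python
import numpy as np

def con_hat(M0, M1, p=1.0, q=None):
    """Empirical CON(p, q) for the assumed classifier D(m) = {x : x > m componentwise}.

    M0: (n0, k) array of non-diseased marker vectors
    M1: (n1, k) array of diseased marker vectors
    """
    M0 = np.asarray(M0, dtype=float)
    M1 = np.asarray(M1, dtype=float)
    # pos[i, j] is True when case j falls in D(m_{i,0}).
    pos = np.all(M1[None, :, :] > M0[:, None, :], axis=2)
    # FP^(m_{i,0}): share of controls in D(m_{i,0}).
    fp_hat = np.all(M0[None, :, :] > M0[:, None, :], axis=2).mean(axis=1)
    keep = (fp_hat <= p)[:, None]
    if q is not None:
        # TP^(m_{j,1}): share of cases in D(m_{j,1}).
        tp_hat = np.all(M1[None, :, :] > M1[:, None, :], axis=2).mean(axis=1)
        keep = keep & (tp_hat > q)[None, :]
    return float((pos & keep).mean())

M0_demo = np.array([[0.0, 0.0], [1.0, 1.0]])
M1_demo = np.array([[2.0, 2.0], [0.5, 0.5]])
auc_full = con_hat(M0_demo, M1_demo)             # 3 of 4 pairs are concordant
auc_p0 = con_hat(M0_demo, M1_demo, p=0.0)        # only the control with FP^ = 0 is kept
con_q = con_hat(M0_demo, M1_demo, q=0.25)        # only the case with TP^ > 0.25 is kept
```

Calling `con_hat` with `q=None` gives the estimator in (11).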

Theorem 4.1

Let N = n0 + n1. Assume 0 < limN→∞ n0/N = λ < 1. Then, for p, q ∈ [0, 1], (i) CON^(p, q) converges to CON(p, q) in probability as N → ∞, and (ii) √N {CON^(p, q) − CON(p, q)} converges in distribution to Normal(0, σ²), where σ² is specified in the Appendix.

The asymptotic results require that N be large and that n0/N converge to a limit λ with 0 < λ < 1. This condition is generally satisfied under random sampling, whether disease status D is random or fixed, which is respectively relevant in prospective and retrospective (case-control) studies. When D is random, N corresponds to the total sample size and n0/N converges to P(D = 0) = λ, 0 < λ < 1, with probability 1, and the asymptotic normality holds with the usual interpretation.

5 OTHER TYPES OF ROC AND WROC FUNCTIONS

Similar to considerations of using (3) in univariate marker case, for multivariate markers we may want to consider a function with the roles of true and false positive rates reversed. Define Q1 = TP(M1), and let H1 and h1 respectively be the distribution function and density function of Q1. Then, similar to the structure of ROC(q), where ROC(q) = E[TP(M0) | FP(M0) = q ], for multivariate markers we may define

$$\mathrm{ROC}^*(q) = E[\mathrm{FP}(\mathbf{M}_1) \mid \mathrm{TP}(\mathbf{M}_1) = q].$$

In general, as a part of the main features which distinguish the univariate and multivariate ROC inferences, the functional transformation ROC*(q) is not one-to-one and therefore does not have the inverse functional relationship with ROC(q). Further define

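$$\overline{\mathrm{ROC}}(q) = E[\mathrm{FN}(\mathbf{M}_0) \mid \mathrm{TN}(\mathbf{M}_0) = q] \quad \text{and} \quad \overline{\mathrm{ROC}}^*(q) = E[\mathrm{TN}(\mathbf{M}_1) \mid \mathrm{FN}(\mathbf{M}_1) = q],$$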

where FN(m) = P(M1 ∉ D(m)) is the false negative rate and TN(m) = P(M0 ∉ D(m)) is the true negative rate. The weighted functions corresponding to ROC*, ROC¯ and ROC¯* can be defined in ways similar to the WROC function: for 0 < q < 1,

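$$\begin{aligned}
\mathrm{WROC}(q) &= \mathrm{ROC}(q) \cdot h_0(q); & \mathrm{WROC}^*(q) &= \mathrm{ROC}^*(q) \cdot h_1(q); \\
\overline{\mathrm{WROC}}(q) &= \overline{\mathrm{ROC}}(q) \cdot h_0(1-q); & \overline{\mathrm{WROC}}^*(q) &= \overline{\mathrm{ROC}}^*(q) \cdot h_1(1-q).
\end{aligned}$$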

These weighted ROC functions serve to study the performance of predictive accuracy for multivariate markers from different perspectives. For example, WROC*(p) serves to study the performance of false positive rate with true positive rate controlled at value p. It is shown in the appendix that

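$$\begin{aligned}
\mathrm{ROC}(q) + \overline{\mathrm{ROC}}(1-q) &= 1; & \mathrm{ROC}^*(q) + \overline{\mathrm{ROC}}^*(1-q) &= 1; \\
\mathrm{WROC}(q) + \overline{\mathrm{WROC}}(1-q) &= h_0(q); & \mathrm{WROC}^*(q) + \overline{\mathrm{WROC}}^*(1-q) &= h_1(q).
\end{aligned}$$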

Thus, the function ROC provides the same amount of information as ROC¯, and similarly ROC* is as informative as ROC¯*. Also, with knowledge of h0(q), WROC(q) provides the same amount of information as WROC¯ for predictive accuracy, and with knowledge of h1(q) a similar argument applies to the relationship between WROC* and WROC¯*. Essentially, the pair-wise relationship can be thought of as a conjugate partnership.
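Since the appendix is not reproduced here, the following is a short sketch of why the relations for the pair (ROC, ROC¯) hold. It relies only on the definitions of this section and on the identities FN(m) = 1 − TP(m) and TN(m) = 1 − FP(m); the starred pair would follow in the same way with M1, TP and h1 in place of M0, FP and h0.

```latex
\begin{align*}
\overline{\mathrm{ROC}}(1-q)
  &= E[\mathrm{FN}(\mathbf{M}_0) \mid \mathrm{TN}(\mathbf{M}_0) = 1-q] \\
  &= E[1 - \mathrm{TP}(\mathbf{M}_0) \mid \mathrm{FP}(\mathbf{M}_0) = q]
   = 1 - \mathrm{ROC}(q), \\
\overline{\mathrm{WROC}}(1-q)
  &= \overline{\mathrm{ROC}}(1-q)\, h_0\{1-(1-q)\}
   = \{1 - \mathrm{ROC}(q)\}\, h_0(q)
   = h_0(q) - \mathrm{WROC}(q).
\end{align*}
```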

For evaluation based on partial area under curve, subject to either smaller FP (FP ≤ p) or larger TP (TP > q), the weighted ROC functions of choice should be WROC and WROC¯* so that maximization of area under curve makes sense. These two weighted ROC functions together with their corresponding ROC functions are used in our simulation to study the performance of the proposed criteria and methods for multivariate markers. Note that the partial concordance probability for true negativity is CON¯(p) = P(M0 ∉ D(M1), FN(M1) ≤ p). By a technique similar to that employed in Section 3.2, it can be proved that this concordance probability coincides with the area under the WROC¯* function, CON¯(p) = AUC¯(p), and therefore a U-statistic CON¯^(p) can be constructed to estimate CON¯(p).

When both FP ≤ p and TP > q are required, these ROC or WROC functions cannot be used for evaluation, but CON(p, q) can be used and estimated by the technique described in Section 4. For estimation of ROC¯, WROC¯ and CON¯(p, q), nonparametric estimates can be constructed using methods similar to those for the functions ROC, WROC and CON(p, q). Also, a property similar to Theorem 4.1 can be established for CON¯^(p) by the same technique.

Remark

By setting Ml1 = Ml2 = … = Mlk, l = 0, 1, the univariate marker model can be viewed as a degenerate case of multivariate markers. For this degenerate case, the quantile variables Q0 = FP(M0) and Q1 = TP(M1) both follow the Uniform[0, 1] distribution, and ROC¯(q) = FN(TN−1(q)) and ROC¯*(q) = TN(FN−1(q)). In this case, each of the WROC functions coincides with its ROC counterpart. Further, besides the relationships ROC(q) + ROC¯(1 − q) = 1 and ROC*(q) + ROC¯*(1 − q) = 1, it is seen that ROC*(q) = ROC−1(q), which implies that each of the four ROC functions provides the same amount of information as the other three functions for predictive accuracy of the marker.

6 SIMULATION AND DATA EXAMPLE

6.1 SIMULATION

To show the performance of predictive accuracy for multivariate markers, we conduct simulation studies under different scenarios. We compare ROC and WROC curves for multivariate markers under each scenario, along with the weight function h0(q). We also compare univariate and multivariate marker cases to evaluate the gain and loss by using multiple markers.

Since this paper generalizes the bivariate ROC analysis of Wang and Li (2012), we consider k ≥ 3 markers and, for simplicity, take k = 3. Consider the simulation model in which (M01, M02, M03) and (M11, M12, M13) each follow a multivariate normal distribution. By convention we assume that a higher marker value indicates presence of disease. Let N1 = 200 be the number of diseased individuals and N2 = 200 be the number of non-diseased individuals. We generate data so that (M01, M02, M03) has mean (0, 0, 0) and (M11, M12, M13) has mean (1, 1, 1), each with unit standard deviations. Let ρl = (ρl12, ρl23, ρl13), l = 0, 1, where ρlij denotes the correlation between Mli and Mlj. We consider different scenarios according to different correlations ρl. The ROC analysis for a univariate marker is based on data generated from the distribution of Ml1, bivariate ROC analysis is based on data generated from the distribution of (Ml1, Ml2), and multivariate ROC analysis is based on data generated from the distribution of (Ml1, Ml2, Ml3).
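The data-generating step described above can be sketched as follows. The helper name `simulate_markers`, the seed and the choice of the ρ = 0.5 scenario are assumptions for illustration; the paper does not specify its software.

```python
import numpy as np

def simulate_markers(n, mean, rho, rng):
    """Draw n trivariate normal marker vectors with unit variances.

    rho = (rho12, rho23, rho13), following the ordering used in the text.
    """
    r12, r23, r13 = rho
    cov = np.array([[1.0, r12, r13],
                    [r12, 1.0, r23],
                    [r13, r23, 1.0]])
    return rng.multivariate_normal(np.asarray(mean, dtype=float), cov, size=n)

rng = np.random.default_rng(2012)                                   # arbitrary seed
M0 = simulate_markers(200, (0, 0, 0), (0.5, 0.5, 0.5), rng)         # non-diseased
M1 = simulate_markers(200, (1, 1, 1), (0.5, 0.5, 0.5), rng)         # diseased

# Univariate, bivariate and trivariate analyses use nested subsets of columns.
uni0, bi0, tri0 = M0[:, :1], M0[:, :2], M0
uni1, bi1, tri1 = M1[:, :1], M1[:, :2], M1
```

Using the leading columns of the same draws for the univariate and bivariate analyses is one way to make the three analyses comparable; drawing separate samples from the marginal distributions would be an equally valid reading of the description.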

Figures 1–3 exhibit simulation results when ρ0 = ρ1 = 0, 0.5 and 1 respectively. As discussed in Section 5, WROC is the conjugate partner of WROC¯ and WROC* is the conjugate partner of WROC¯*, and with knowledge of h0(q) and h1(q), each member of a conjugate pair provides the same amount of information for prediction as its partner. Choices of these weighted ROC functions should include only WROC and WROC¯* so that maximization of area under curve makes sense.

Figure 1. Simulation for classifier I(M1 > m1, M2 > m2, M3 > m3) with ρ0 = ρ1 = 0

Figure 3. Simulation for classifier I(M1 > m1, M2 > m2, M3 > m3) with ρ0 = ρ1 = 1

When ρ0 = ρ1 = 0, the three markers are mutually independent, so the use of all three markers is expected to be more informative than one marker or two markers alone. Figure 1 shows a clear pattern of gain and loss as the number of markers increases. The gain in WROC(q) for small values of q, when compared to the univariate ROC curve, is substantial for the multivariate ROC curve but only moderate for the bivariate ROC curve. Similarly, the loss in WROC(q) for large values of q is substantial for the multivariate ROC curve but only moderate for the bivariate ROC curve. This phenomenon can partly be explained by the right skewness of the weight function h0(q): the distribution of FP is uniform in the univariate case, but it places more probability on smaller values in the bivariate marker case, and the inclusion of the third marker makes the weight function more skewed. By the equivalence between partial concordance probability and partial area under the WROC curve, we find that multivariate markers outperform the univariate marker and bivariate markers in the region with small FP. The function WROC¯* for multivariate markers shows the opposite direction of gain and loss, compared to the univariate or bivariate marker case. There is loss in WROC¯*(q) for small values of q (FP) and gain for large values of q, which is due to the left skewness of the weight function h1(1 − q).

When ρ0 = ρ1 = 0.5, the three markers are moderately correlated. As in the case ρ0 = ρ1 = 0, the distributions of Q0 and Q1 still place more probability on small values, so we observe the same pattern of tradeoff between gain at small FP and loss at large FP.

When ρ0 = ρ1 = 1, the three markers are identical and they provide the same information as the one-marker case (or the two-marker case). The ROC (WROC) functions for the multivariate case coincide with the ROC function for the univariate case (Figure 3). The univariate case can thus be viewed as a degenerate case of multivariate markers.

6.2 A DATA EXAMPLE

We apply the proposed methods to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data for multivariate ROC analysis. The ADNI study is a research project with research focus on ‘changes of cognition, function, brain structure and function, and biomarkers in elderly controls, subjects with mild cognitive impairment, and subjects with Alzheimer’s disease’ (quoted from http://adni.loni.ucla.edu/). The study is supported by the NIH, private pharmaceutical companies, and nonprofit organizations. The enrollment target was 800 participants - 200 normal controls, 400 patients with amnestic MCI, and 200 patients with mild AD - at 58 sites in the United States and Canada. Participants were enrolled on a rolling basis and evaluated every six months. One of the major goals of the ADNI study is to identify biomarkers that are associated with progression from MCI to AD, and to determine which biomarker measures (alone or in combination) are the best predictors of disease progression. Sensitivity and specificity for both cross-sectional and longitudinal diagnostic classification were considered important statistical techniques for assessing biomarkers in disease progression (Risacher et al., 2009).

Investigations of the risk of progressing from MCI to AD dementia have largely focused on measures from the following categories: demographics, cognition, apolipoprotein E (APOE), magnetic resonance imaging (MRI), and cerebrospinal fluid (CSF) data. Demographic variables include age, education and gender. Cognitive measures represent five domains respectively: memory, language, executive function, spatial ability, and attention. Neuroimaging measures include brain volume, ventricular volume, and bilateral hippocampal volumes. The CSF variables include T-tau, Aβ42, p-tau181, the ratio of the first two variables, and the ratio of the last two variables.

For this section, we selected three markers for illustration: hippocampus volume, memory score and executive function. To account for censoring, we used a reduced sample data set to create time-independent binary disease outcomes (D = 0, 1). We chose the 24th month as the cut-off time to define disease state. Of the 274 subjects who had complete data for the three markers, 49 subjects were lost to follow-up before 24 months, so we focused on the 225 subjects with follow-up time longer than 24 months: there were 89 failures (D = 1) and 136 survivors (D = 0) at the 24th month. Let M1 be hippocampus volume, M2 be executive function score, and M3 be memory score. Figure 4 compares the diagnostic performance of three markers (M1, M2, M3), bivariate markers (M1, M2), and univariate marker M1. If the classifier is I(M1 > m1, M2 > m2, M3 > m3), there is gain for small values of FP and loss for large values of FP. The partial AUC plot indicates that multivariate markers produce a higher partial concordance summary than the univariate marker when q < 0.6, and a higher partial concordance summary than the bivariate markers when q < 0.3. In diagnostic testing, it is crucial to keep the false positive rate low to avoid unnecessary monetary costs. Thus, if the prognostic capacity is evaluated in terms of partial AUC, the multivariate marker consisting of hippocampus volume, executive function and memory score together would be considered to perform much better than hippocampus volume alone.

Figure 4. (M1, M2, M3) = (hippocampus, executive function, memory), with classifier I(M1 > m1, M2 > m2, M3 > m3).

Without restriction on the false positive rate, the AUC under the multivariate WROC curve is 0.358 (SE: 0.022) and the AUC under the multivariate $\overline{\mathrm{WROC}}$ curve is 0.964 (SE: 0.024); the AUC under the bivariate WROC curve is 0.437 (SE: 0.030) and the AUC under the bivariate $\overline{\mathrm{WROC}}$ curve is 0.906 (SE: 0.030); the AUC under the univariate ROC curve is 0.658 (SE: 0.040). The bootstrap method was adopted to calculate the standard errors for estimation of AUC.
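The text does not specify the bootstrap scheme, so the sketch below shows only one common choice: resampling subjects with replacement within each disease group and taking the standard deviation of the replicated statistic. The univariate Mann-Whitney AUC stands in for the statistic, and the data, seed and number of replicates are assumptions made for illustration.

```python
import random
import statistics
from typing import Callable, List, Sequence


def mann_whitney_auc(x0: Sequence[float], x1: Sequence[float]) -> float:
    """Empirical P(X1 > X0) + 0.5 * P(X1 == X0) for a univariate marker."""
    total = 0.0
    for a in x0:
        for b in x1:
            total += 1.0 if b > a else 0.5 if b == a else 0.0
    return total / (len(x0) * len(x1))


def bootstrap_se(
    x0: Sequence[float],
    x1: Sequence[float],
    stat: Callable[[Sequence[float], Sequence[float]], float],
    n_boot: int = 500,
    seed: int = 0,
) -> float:
    """Stratified bootstrap: resample each group separately, then take the SD."""
    rng = random.Random(seed)
    replicates: List[float] = []
    for _ in range(n_boot):
        b0 = rng.choices(x0, k=len(x0))  # non-diseased resample
        b1 = rng.choices(x1, k=len(x1))  # diseased resample
        replicates.append(stat(b0, b1))
    return statistics.stdev(replicates)


# Hypothetical marker values, not taken from the study.
x0 = [0.1, 0.4, 0.35, 0.8, 0.2]
x1 = [0.9, 0.5, 0.7, 0.3]

auc = mann_whitney_auc(x0, x1)
se = bootstrap_se(x0, x1, mann_whitney_auc)
print(f"AUC = {auc:.3f}, bootstrap SE = {se:.3f}")
```

Any other estimator that takes the two samples as arguments, such as a multivariate AUC, could be passed as `stat` in the same way.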

7 DISCUSSION

Existing ROC methods to incorporate multiple markers typically consider a composite score based on combined markers by modeling the relationship between the marker vector M and the binary outcome D (McIntosh and Pepe, 2002), where p(M) = P(D = 1|M) is used as the optimal score for combining multiple markers to classify the disease outcome. By the Neyman-Pearson lemma, the optimality of p(M) is a very general property which holds without dimensionality constraint on M. When the linear logistic regression model assumption holds, the optimal classification score p(M) is a monotone function of the linear predictor βM under the logit link, so the two yield equivalent classification rules. Thus, in practice, the optimality of a one-dimensional classification score relies heavily on the logistic regression model assumption. In this paper, we extend tools from univariate marker to multivariate markers for evaluating predictive accuracy of markers under a nonparametric setting based on tree-based classification rules.
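The equivalence between the risk score and the linear predictor under a logistic model can be checked numerically: since the inverse logit is strictly increasing, ranking subjects by p(M) gives the same order as ranking them by βM. The coefficients, intercept and marker values below are hypothetical and serve only to illustrate this point.

```python
import math
from typing import Sequence


def linear_score(marker: Sequence[float], beta: Sequence[float]) -> float:
    """Linear predictor: the inner product of beta and the marker vector."""
    return sum(b * m for b, m in zip(beta, marker))


def risk(marker: Sequence[float], beta: Sequence[float], intercept: float) -> float:
    """Logistic risk score: inverse logit of intercept plus linear predictor."""
    return 1.0 / (1.0 + math.exp(-(intercept + linear_score(marker, beta))))


# Hypothetical bivariate markers and logistic coefficients.
markers = [(0.2, 1.5), (1.0, -0.3), (-0.7, 0.4), (2.1, 0.9)]
beta = (0.8, -0.5)
intercept = -0.2

indices = range(len(markers))
rank_by_linear = sorted(indices, key=lambda i: linear_score(markers[i], beta))
rank_by_risk = sorted(indices, key=lambda i: risk(markers[i], beta, intercept))
print(rank_by_linear, rank_by_risk)
```

Identical rankings imply identical ROC curves for the two scores; if the logistic model is misspecified, the linear score no longer tracks p(M) and this guarantee is lost.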

The proposed ROC and WROC functions together with the AUC are intended to measure the average performance of the and-or classifier among all possible combinations of true positive rate for a given false positive rate, for evaluating predictability of markers and comparing curves; they may not reflect the optimized use of markers for clinical decisions. Although the proposed approach is not designed to achieve optimality as a decision rule such as the one proposed by Jin and Lu (2009), our methods and inferential results are much more structural, accessible and workable. The proposed ROC and WROC functions enjoy the advantage of preserving the distributional structures of markers, and the associated summary measures such as AUC or partial AUC serve as appropriate summary measures to evaluate the performance of the and-or classifier among all possible combinations of marker values; this feature is similar to the univariate marker case. These summary measures are useful in applications, since many biomarker studies (such as the ADNI study and two other Alzheimer’s Disease studies in which the authors are currently involved) place their research emphasis largely on understanding the predictability of biomarkers in the target population, and less on optimization of clinical decision rules.

The evaluation takes into account the distributions of quantile variables Q0 and Q1 in the diseased and non-diseased populations, which leads to the result of equivalence between AUC and CON, a property similar to the case of univariate marker. We also provide estimation procedures using nonparametric smoothing estimators for the ROC and WROC functions, and a U-statistic estimator for the AUC. For applications of the proposed analysis, as the ‘curse of dimensionality’ is not a concern for nonparametric estimation of ROC, WROC and other related properties, the usual random split into a training sample (for model fitting) and a test sample (for creating the ROC curve and calculating the AUC) would be as proper as it is for the univariate marker case, and is therefore advisable.

For future research, similar to the considerations for univariate ROC analysis (Tosteson and Begg, 1988; Pepe, 1998), it would be interesting to consider methodology to adjust for covariates such as age, sex or other demographic factors for bivariate or multivariate markers.

Also, given that the disease outcomes typically change with time, it would be interesting to extend the ROC analysis for high-dimensional markers to accommodate time-to-disease information using the ‘survival-tree methodology’ (Zhang et al. 1998), along the lines of extending ROC techniques from binary disease outcome model to right-censored survival data model in univariate marker settings (Etzioni et al., 1999; Heagerty et al., 2000; Slate and Turnbull, 2000; Heagerty and Zheng, 2005).

Figure 2.

Simulation for classifier I(M1 > m1, M2 > m2, M3 > m3) with ρ0 = ρ1 = 0.5

Appendix

Proof of Theorem 3.1

Define the kernel function of the U-statistic (Hoeffding, 1948) as

$$h(M_{0i}, M_{1j}; FP, TP) = I\bigl(M_{1j} \in D(M_{0i})\bigr) \cdot I\bigl(FP(M_{0i}) \le p,\ TP(M_{1j}) > q\bigr).$$

Note that

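$$\begin{aligned}
\widehat{AUC}(p,q) &= \frac{1}{n_0 n_1}\sum_{i=1}^{n_0}\sum_{j=1}^{n_1} h(M_{0i}, M_{1j}; \widehat{FP}, \widehat{TP}) \\
&= \frac{1}{n_0 n_1}\sum_{i=1}^{n_0}\sum_{j=1}^{n_1} h(M_{0i}, M_{1j}; FP, TP) + \frac{1}{n_0 n_1}\sum_{i=1}^{n_0}\sum_{j=1}^{n_1} \bigl\{h(M_{0i}, M_{1j}; \widehat{FP}, \widehat{TP}) - h(M_{0i}, M_{1j}; FP, TP)\bigr\} \\
&= \mathrm{I} + \mathrm{II}.
\end{aligned}$$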

The kernel function in Term I satisfies $E[h^2] < \infty$ and, by two-sample U-statistics theory, Term I converges to $AUC(p, q)$ in probability. Term II can be expressed as

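$$\mathrm{II} = \frac{1}{n_0 n_1}\sum_{i=1}^{n_0}\sum_{j=1}^{n_1} I\bigl(M_{1j} \in D(M_{0i})\bigr)\Bigl\{ I\bigl(\widehat{FP}(M_{0i}) \le p,\ \widehat{TP}(M_{1j}) > q\bigr) - I\bigl(FP(M_{0i}) \le p,\ TP(M_{1j}) > q\bigr) \Bigr\}.$$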

Note that

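$$\begin{aligned}
|\mathrm{II}| &\le \frac{1}{n_0 n_1}\sum_{i=1}^{n_0}\sum_{j=1}^{n_1} \Bigl| I\bigl(\widehat{FP}(M_{0i}) \le p,\ \widehat{TP}(M_{1j}) > q\bigr) - I\bigl(FP(M_{0i}) \le p,\ TP(M_{1j}) > q\bigr) \Bigr| \\
&\le \frac{1}{n_0 n_1}\sum_{i=1}^{n_0}\sum_{j=1}^{n_1} \Bigl| I\bigl(\widehat{FP}(M_{0i}) \le p\bigr) - I\bigl(FP(M_{0i}) \le p\bigr) \Bigr| + \frac{1}{n_0 n_1}\sum_{i=1}^{n_0}\sum_{j=1}^{n_1} \Bigl| I\bigl(\widehat{TP}(M_{1j}) > q\bigr) - I\bigl(TP(M_{1j}) > q\bigr) \Bigr| \\
&= \frac{1}{n_0}\sum_{i=1}^{n_0} \Bigl| I\bigl(\widehat{FP}(M_{0i}) \le p\bigr) - I\bigl(FP(M_{0i}) \le p\bigr) \Bigr| + \frac{1}{n_1}\sum_{j=1}^{n_1} \Bigl| I\bigl(\widehat{TP}(M_{1j}) > q\bigr) - I\bigl(TP(M_{1j}) > q\bigr) \Bigr| \\
&= o_p(n_0^{-1/2}) + o_p(n_1^{-1/2}) = o_p(N^{-1/2}).
\end{aligned}$$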

The consistency result, (i), in Theorem 3.1 follows from the fact that Term II converges to 0 in probability. To prove (ii), first note that Term I converges in distribution to a normal distribution by U-statistics theory: $\sqrt{N}\{\mathrm{I} - AUC(p,q)\} \xrightarrow{d} \mathrm{Normal}(0, \sigma^2)$, where $\sigma^2 = \lambda^{-1}\tau_{1,0} + (1-\lambda)^{-1}\tau_{0,1}$ with

$$\tau_{1,0} = \mathrm{Cov}\bigl[h(M_{01}, M_{11}),\ h(M_{01}, M_{12})\bigr]$$

and

$$\tau_{0,1} = \mathrm{Cov}\bigl[h(M_{01}, M_{11}),\ h(M_{02}, M_{11})\bigr].$$

Also,

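$$\sqrt{N}\bigl\{\widehat{AUC}(p,q) - AUC(p,q)\bigr\} = \sqrt{N}\bigl\{\mathrm{I} - AUC(p,q)\bigr\} + \sqrt{N}\cdot\mathrm{II} = \sqrt{N}\bigl\{\mathrm{I} - AUC(p,q)\bigr\} + o_p(1) \xrightarrow{d} \mathrm{Normal}(0, \sigma^2).$$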
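The plug-in estimator analysed in this proof can be sketched in a few lines. The sketch assumes the and-classifier, so that D(m) is the set of marker vectors exceeding m in every coordinate, and it assumes the kernel restricts to FP ≤ p and TP > q with empirical FP and TP substituted for the true functions. Function names and data are hypothetical, and the default q = -1 is chosen here simply to switch off the TP restriction.

```python
from typing import List, Sequence, Tuple

Marker = Tuple[float, ...]


def in_region(m: Marker, cut: Marker) -> bool:
    """True if m lies in D(cut), i.e. m exceeds cut in every coordinate."""
    return all(a > c for a, c in zip(m, cut))


def positive_rate(sample: Sequence[Marker], cut: Marker) -> float:
    """Empirical fraction of a sample classified positive at this cut."""
    return sum(in_region(m, cut) for m in sample) / len(sample)


def partial_auc(
    nondiseased: Sequence[Marker],
    diseased: Sequence[Marker],
    p: float = 1.0,
    q: float = -1.0,
) -> float:
    """Two-sample U-statistic with plug-in empirical FP and TP."""
    fp_hat: List[float] = [positive_rate(nondiseased, m0) for m0 in nondiseased]
    tp_hat: List[float] = [positive_rate(diseased, m1) for m1 in diseased]
    total = 0
    for m0, fp in zip(nondiseased, fp_hat):
        if fp > p:  # kernel requires FP(M0i) <= p
            continue
        for m1, tp in zip(diseased, tp_hat):
            if tp > q and in_region(m1, m0):  # TP(M1j) > q and M1j in D(M0i)
                total += 1
    return total / (len(nondiseased) * len(diseased))


# Hypothetical bivariate toy samples.
nondiseased = [(0.1, 0.2), (0.5, 0.4), (0.3, 0.9)]
diseased = [(0.6, 0.5), (0.2, 0.3), (0.9, 1.0)]

print(partial_auc(nondiseased, diseased))          # unrestricted
print(partial_auc(nondiseased, diseased, p=0.5))   # restricted to FP <= 0.5
```

The double loop is quadratic in the sample sizes, which is acceptable for samples of a few hundred subjects such as those in the data example.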

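Property in Section 4

  (i) $\mathrm{ROC}(q) + \overline{\mathrm{ROC}}(1-q) = 1$, and $\mathrm{WROC}(q) + \overline{\mathrm{WROC}}(1-q) = h_0(q)$

  (ii) $\mathrm{ROC}(q) + \overline{\mathrm{ROC}}(1-q) = 1$, and $\mathrm{WROC}(q) + \overline{\mathrm{WROC}}(1-q) = h_1(q)$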

Proof

Note that

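$$\begin{aligned}
\mathrm{ROC}(q) + \overline{\mathrm{ROC}}(1-q) &= E[TP(M_0) \mid FP(M_0) = q] + E[FN(M_0) \mid FP(M_0) = q] \\
&= E[TP(M_0) + FN(M_0) \mid FP(M_0) = q] \\
&= E[1 \mid FP(M_0) = q] = 1,
\end{aligned}$$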

and it follows that $\mathrm{WROC}(q) + \overline{\mathrm{WROC}}(1-q) = \mathrm{ROC}(q)\cdot h_0(q) + \overline{\mathrm{ROC}}(1-q)\cdot h_0(q) = h_0(q)$, which proves (i). A similar argument can be used to prove (ii).

Contributor Information

Mei-Cheng Wang, Email: mcwang@jhsph.edu.

Shanshan Li, Email: shli@jhsph.edu.

References

  1. Baker SJ. Identifying combinations of cancer markers for further study as triggers of early intervention. Biometrics. 2000;56:1082–1087. doi: 10.1111/j.0006-341x.2000.01082.x.
  2. Baker SJ, Cook NR, Vickers A, Kramer BS. Using relative utility curves to evaluate risk prediction. J R Stat Soc [Ser A]. 2009;172:729–748. doi: 10.1111/j.1467-985X.2009.00592.x.
  3. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Monterey, CA: Wadsworth and Brooks/Cole Advanced Books and Software; 1984.
  4. Dodd L, Pepe M. Partial AUC estimation and regression. Biometrics. 2003;59:614–623. doi: 10.1111/1541-0420.00071.
  5. Etzioni R, Pepe M, Longton G, Hu C, Goodman G. Incorporating the time dimension in receiver operating characteristic curves: A prostate cancer case study. Medical Decision Making. 1999;19:242–251. doi: 10.1177/0272989X9901900303.
  6. Etzioni R, Kooperberg C, Pepe M, Smith R, Gann PH. Combining biomarkers to detect disease with application to prostate cancer. Biostatistics. 2003;4:523–538. doi: 10.1093/biostatistics/4.4.523.
  7. Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. London: Chapman and Hall; 1994.
  8. Gu W, Pepe MS. Measures to summarize and compare the predictive capacity of markers. International Journal of Biostatistics. 2009;5(1). doi: 10.2202/1557-4679.1188.
  9. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology. 1983;148:839–843. doi: 10.1148/radiology.148.3.6878708.
  10. Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000;56:337–344. doi: 10.1111/j.0006-341x.2000.00337.x.
  11. Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61:92–105. doi: 10.1111/j.0006-341X.2005.030814.x.
  12. Hoeffding W. A class of statistics with asymptotically normal distributions. Annals of Mathematical Statistics. 1948;19:293–325.
  13. Jin H, Lu Y. ROC region of a regression tree. Statistics and Probability Letters. 2009;79:936–942.
  14. Metz CE, Shen JH. Gains in Accuracy from Replicated Readings of Diagnostic Images: Prediction and Assessment in Terms of ROC Analysis. Medical Decision Making. 1992:60–75. doi: 10.1177/0272989X9201200110.
  15. McIntosh MW, Pepe MS. Combining several screening tests: Optimality of the risk score. Biometrics. 2002;58:657–664. doi: 10.1111/j.0006-341x.2002.00657.x.
  16. Pepe MS. Three Approaches to Regression Analysis of Receiver Operating Characteristic Curves for Continuous Test Results. Biometrics. 1998;54:124–135.
  17. Pepe MS. The statistical evaluation of medical tests for classification and prediction. New York: Oxford; 2003.
  18. Risacher SL, Saykin AJ, West JD, Shen L, Firpi HA, McDonald BC; the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Baseline MRI Predictors of Conversion from MCI to Probable AD in the ADNI Cohort. Current Alzheimer Research. 2009;6:347–361. doi: 10.2174/156720509788929273.
  19. Slate EH, Turnbull BW. Models for longitudinal biomarkers of disease onset. Statistics in Medicine. 2000;19:617–637. doi: 10.1002/(sici)1097-0258(20000229)19:4<617::aid-sim360>3.0.co;2-r.
  20. Tosteson AN, Begg CB. A General Regression Methodology for ROC Curve Estimation. Medical Decision Making. 1988;8:205–215. doi: 10.1177/0272989X8800800309.
  21. Wang M-C, Li S. Bivariate Marker Measurements and ROC Analysis. Biometrics. 2012. doi: 10.1111/j.1541-0420.2012.01783.x. In Press.
  22. Zhang H, Crowley J, Sox HC, Olshen RA. Tree-structured statistical methods. In: Armitage P, Colton T, editors. The Encyclopedia of Biostatistics. Vol. 6. Chichester: John Wiley; 1998. pp. 4561–4573.
