Accommodating Covariates in ROC Analysis

Holly Janes; Gary Longton; Margaret Pepe

. Author manuscript; available in PMC: 2010 Jan 1.

Published in final edited form as: Stata J. 2009 Jan 1;9(1):17–39.

Accommodating Covariates in ROC Analysis

Holly Janes ¹, Gary Longton ¹, Margaret Pepe ¹

PMCID: PMC2758790 NIHMSID: NIHMS90149 PMID: 20046933

Abstract

Classification accuracy is the ability of a marker or diagnostic test to discriminate between two groups of individuals, cases and controls, and is commonly summarized using the receiver operating characteristic (ROC) curve. In studies of classification accuracy, there are often covariates that should be incorporated into the ROC analysis. We describe three different ways of using covariate information. For factors that affect marker observations among controls, we present a method for covariate adjustment. For factors that affect discrimination (i.e. the ROC curve), we describe methods for modelling the ROC curve as a function of covariates. Finally, for factors that contribute to discrimination, we propose combining the marker and covariate information, and ask how much discriminatory accuracy improves with the addition of the marker to the covariates (incremental value). These methods follow naturally when representing the ROC curve as a summary of the distribution of case marker observations, standardized with respect to the control distribution.

1 Introduction

The classification accuracy of a marker (Y) is most commonly described by the receiver operating characteristic (ROC) curve, a plot of the true positive rate (TPR) versus the false positive rate (FPR) for the set of rules which classify an individual as “test-positive” if Y ≥ c, where the threshold c is varied over all possible values (Pepe et al., 2001; Baker, 2003). Equivalently, the ROC curve can be represented as the cumulative distribution function (CDF) of the case marker observations, standardized with respect to the control distribution (Pepe and Cai, 2004; Pepe and Longton, 2005). The standardized marker observations, or percentile values, are written as pv_D = F(Y_D), where F is the right-continuous CDF of Y among controls and Y_D denotes a case marker observation. The ROC curve at a FPR of f is

ROC (f) = P (1 - p v_{D} \leq f) .

In many settings, covariates should be incorporated into the ROC analysis. First, there are covariates which impact the marker distribution among controls. For example, “center effects” in multi-center studies may affect marker observations. In Section 2, we describe methods for adjusting the ROC curve for such covariates. The associated Stata programs are called roccurve and comproc. Other covariates may affect the inherent discriminatory accuracy of the marker (i.e. the ROC curve). For example, disease severity often impacts marker accuracy, with less severe cases being more difficult to distinguish from controls. In Section 3 we describe an ROC regression method which allows the ROC curve to depend on covariates. The associated Stata program is called rocreg. Finally, there are covariates which contribute to discrimination. For example, baseline risk factors for disease provide some ability to discriminate between cases and controls. A common question is how much discriminatory accuracy the marker adds to the known classifiers (i.e. incremental value). In Section 4 we describe methods for evaluating incremental value.

This paper is a companion to another article in this journal (Pepe et al., 2007) which describes the use of the programs roccurve and comproc for estimating and comparing ROC curves without incorporating covariate information.

2 The Covariate-Adjusted ROC Curve

2.1 Motivation and Concept

Consider a covariate, Z, which affects the distribution of the marker among controls. Figure 1 shows hypothetical data for a continuous marker Y, binary outcome D, and binary covariate Z. The data can be downloaded from the Diagnostic and Biomarker Statistical Center (DABS) website (http://www.fhcrc.org/labs/pepe/dabs). Suppose for concreteness that Z is an indicator of study center. Observe that marker observations among controls (D = 0) tend to be higher in center 1 as compared with center 0, but that the inherent accuracy of the marker (the ROC curve) is the same in the two centers. Consider the pooled ROC curve for Y which combines all case observations together and all control observations together, regardless of study center. Observe in Figure 1 that when the proportion of cases varies across centers (scenario 1), the pooled ROC curve for Y is overly optimistic relative to the ROC curve for Y in each center. Even when Z is independent of the outcome (i.e. the proportion of cases is held constant across centers; scenario 2), the pooled ROC curve is biased; this time it is attenuated with respect to the center-specific ROC curve. This suggests that covariates which impact marker observations among controls should be statistically adjusted in the ROC analysis.

A simulated marker Y and binary covariate Z = 0, 1. Under scenario 1, Z is associated with the outcome: P[D = 1|Z = 0] = 0.36 and P[D = 1|Z = 1] = 0.83. Under scenario 2, Z is independent of the outcome: P[D = 1|Z = 0] = P[D = 1|Z = 1] = 0.50. (a) The densities of Y conditional on Z = 0, conditional on Z = 1, in the pooled data under scenario 1, and in the pooled data under scenario 2. A common threshold is indicated. (b) The common covariate-specific ROC curve, the pooled ROC curve under scenario 1, and the pooled ROC curve under scenario 2. The performances of the common threshold rule are indicated.

We propose a covariate-adjusted measure of classification accuracy called the covariate-adjusted ROC curve, or Inline graphic ROC (Janes and Pepe, 2006; Janes and Pepe, 2007). Conceptually, this is a stratified measure of marker performance. It is defined as

A ROC (f) = P (1 - p v_{D Z} \leq f)

where pv_DZ = F_Z(Y_DZ) represents the case observation with covariate value Z(Y_DZ) standardized with respect to the control population with the same value of Z. When the performance of the marker is the same across populations with different values of Z, as in Figure 1, the Inline graphic ROC is the common covariate-specific ROC curve. More generally, it is a weighted average of covariate-specific ROC curves (Janes and Pepe, 2006). Equivalently, the ROC is the ROC curve for Y when Z-specific thresholds are used for classification. The thresholds, c_Z, are chosen to ensure that the covariate-specific FPR, FPR_Z(c_Z), is common across levels of Z.

2.2 Estimating the ROC

Estimation of the Inline graphic ROC proceeds in two steps: 1) estimate F_Z, the distribution of the marker in controls as a function of Z. For each subject i, calculate the case percentile value: pv_{DZ_i} = F_{Z_i}(Y_{DZ_i}). 2) Estimate the CDF of the case percentile values.

Estimating F_Z begins with specifying how Z acts on the distribution of Y among controls. For example, a linear model could be specified,

Y = β_{0} + β_{1} Z + ε .

The random error, ε, could be assumed to be normally distributed, ε ~ N(0, σ²), which would lead to case percentile values

{\hat{p v}}_{D Z} = Φ ((Y - {\hat{β}}_{0} - {\hat{β}}_{1} Z) / \hat{σ}),

where Φ is the standard normal CDF and β̂₀, β̂₁, and σ̂ are estimates from the linear model. Alternatively, the error distribution could be estimated empirically using the residuals from the linear model as in Heagerty and Pepe (1999). This would lead to percentile values

{\hat{p v}}_{D Z} = \hat{F} (Y - {\hat{β}}_{0} - {\hat{β}}_{1} Z) .

In addition to allowing Z to act linearly on marker observations among controls, the roccurve command allows for stratifying on Z. Here again the distribution of Y among controls conditional on Z can be estimated empirically or by assuming a normal distribution.

Once the percentile values have been calculated, their CDF must be estimated. This estimation step is described in more detail in the companion paper (Pepe et al., 2007). Briefly, the CDF can be estimated empirically, or a parametric distribution can be assumed. The roccurve program allows parametric forms

ROC (f) = P (1 - p v_{D Z} \leq f) = g (α_{0} + α_{1} g^{- 1} (f))

where g = Φ is the standard normal CDF or g(·) = exp(·)/(1+exp(·)) is the logistic function. These forms give rise to binormal (Dorfman and Alf, 1969) and bilogistic (Ogilvie and Creelman, 1968) ROC curves.

In order to fit the ROC model, a discrete set of FPR points, f₁, …, f_{n_p} is chosen. These points may span the interval (0, 1) or a subinterval of interest, (a, b). For each case observation, a set of n_p records is created. The k^th record for the i^th subject includes the binary outcome $U_{k i} = I [1 - {\hat{p v}}_{D Z_{i}} \leq f_{k}]$ and covariate g⁻¹(f_k). A binary regression model with link g, outcome U, and covariate g⁻¹(f) provides estimates of (α₀, α₁) (Alonzo and Pepe, 2002).

We bootstrap the data to obtain standard errors for the estimated Inline graphic ROC. The data should be resampled according to the design of the study; for a case-control study this means resampling separately within case and control strata. If the data are clustered, the clusters should be the resampling units.

Consider as an example, data from a neonatal audiology study designed to evaluate the accuracy with which three audiology tests identify hearing impairment in newborns (Norton et al., 2000). The data can be downloaded from the DABS website or loaded directly into Stata using

use http://www.fhcrc.org/science/labs/pepe/book/data/nnhs2

Note that test results for hearing-unimpaired ears may depend on the age and gender of the child. Figure 2 shows the estimated age- and gender-adjusted ROC curves for the marker DPOAE. Several estimation options are shown. The first estimator assumes a linear model for marker measurements among controls,

Three different estimates of the age- and gender-adjusted ROC curve for DPOAE based on the Norton et al. (2002) audiology data.

Y = β_{0} + β_{1} Z_{age} + β_{2} Z_{gender} + ε,

where the error distribution is estimated empirically. The CDF of the estimated placement values,

{\hat{p v}}_{D Z_{i}} = \hat{F} (Y_{i} - {\hat{β}}_{0} - {\hat{β}}_{1} Z_{a g e_{i}} - {\hat{β}}_{2} Z_{{gender}_{i}}),

is estimated empirically. The second estimator adds the assumption that ε is normally distributed, and the third estimator additionally assumes that the ROC curve is binormal. Clustered bootstrapping is used for inference to account for correlation amongst observations (ears) for the same individual. Observe that the ROC fit is somewhat sensitive to the normality assumption at the high end of the marker distribution. We next describe how to estimate these curves using the roccurve command.

2.3 The `roccurve` Command

2.3.1 Syntax

The syntax for the roccurve command is

roccurve disease_var test_varlist [if] [in] [, options]

where disease_var is the name of the binary outcome, D = 1 for a case and D = 0 for a control, and test_varlist is the list of markers.

2.3.2 Options

The general roccurve options are described in detail in the companion paper (Pepe et al., 2007). Here we focus on options that relate to covariate adjustment.

Marker Standardization

The covariates to be used for adjustment are specified using the option adjcov(varlist). The option adjmodel(model) specifies how the covariates are to be used for adjustment; the default is stratified, where the control marker distribution is stratified on the covariates. The other option is linear; here the covariates are assumed to act linearly on the control marker distribution.

Standardized marker values are calculated according to specification in the option pvcmethod(method). Options include empirical (the default), where the control marker distribution is estimated empirically conditional on the covariates, and normal, where the control marker is assumed to have a normal distribution conditional on the covariates.

ROC Calculation

rocmethod(method) specifies whether the empirical or parametric model for the ROC curve is used. The link option is required for a parametric ROC model; a binormal model is fit with link( probit) and a bilogistic model with link( logistic). In the case of a parametric ROC model, the option interval(a b n_p) can be used to specify that the model is fit at n_p points over the restricted FPR interval (a, b).

Sampling Variability

Bootstrapping is used for inference. By default the data are resampled conditional on the binary outcome. The option noccsamp specifies that data be resampled without regard to the outcome. The option nostsamp specifies that sampling be done without regard to covariate strata; by default, when covariates are used for stratification, bootstrap samples are drawn from within covariate strata. The cluster(varlist) option can be used to bootstrap clustered data.

Other Options

Other options include: tiecorr, which corrects for ties between case and control observations; various plot options; and options for saving the estimated TPRs, FPRs, and percentile values as new variables. These are all discussed in more detail in the companion article (Pepe et al., 2007).

Example

The following code produces the estimators shown in Figure 2:

use http://www.fhcrc.org/science/labs/pepe/book/data/nnhs2
roccurve d y1, adjcov(currage gender) adjm(linear) cl(id) noccsamp
roccurve d y1, adjcov(currage gender) adjm(linear) pvc(normal) cl(id) noccsamp
roccurve d y1, adjcov(currage gender) adjm(linear) pvc(normal) rocm(parametric)///cl(id) noccsamp

2.3.3 ROC Summary Indices

Summary measures of the ROC curve serve as metrics for comparing markers. The area under the covariate-adjusted ROC curve, $A AUC = \int_{0}^{1} A ROC (f) d f$ , can be interpreted as the probability that, for a random case and control marker observation with the same covariate value, the case observation is higher than the control. This is not a clinially relevant summary of marker performance, as the task is not to determine which of a pair of subjects is the case. Moreover, the Inline graphic AUC summarizes the entire ROC curve, when frequently only a portion (e.g. low FPRs) is of interest.

A more clinically meaningful summary measure of the covariate-adjusted ROC curve is the Inline graphic ROC curve (TPR) at a fixed FPR = f of interest. This can be interpreted as the percent of cases detected when the covariate-specific FPRs are held at f. Alternatively, the FPR corresponding to a specific TPR = ROC⁻¹(t) could be reported. This is the common covariate-specific FPR associated with a proportion t of cases detected.

The partial area under the Inline graphic ROC, $p A AUC (f_{0}) = \int_{0}^{f_{0}} A ROC (f) d f$ , can be viewed as a compromise between the AUC and the ROC at a specified point. It has the advantage of focusing on a portion of the ROC, but it lacks a clinically relevant interpretation.

The Inline graphic ROC summary measures are estimated in the same way as their counterparts for the traditional ROC curve. The AUC estimate is the sample average of the case standardized marker values,

\hat{A AUC} = \sum_{i = 1}^{n_{D}} {\hat{p v}}_{D Z_{i}} / n_{D},

(1)

where the sum is over the n_D case observations. When the case percentile values are estimated non-parametrically (i.e. with stratification on Z), this is a weighted average of empirical AUCs in each covariate stratum. The estimated p Inline graphic AUC is also an average of standardized marker values (Dodd and Pepe, 2003),

p \hat{A AUC} (f_{0}) = \sum_{i = 1}^{n_{D}} \max ({\hat{p v}}_{D Z_{i}} - (1 - f_{0}), 0) / n_{D} .

(2)

When the control marker distribution is estimated empirically, corrections are made for ties between case and control marker observations, as discussed in the companion paper (Pepe et al., 2007).

Estimates of Inline graphic AUC and pAUC values for parametric ROC models generally require numerical integration and are not produced by our programs. Instead the parameters are estimated using empirical averages of percentile values, as in equations (1) and (2). Similarly, we estimate ROC curves at fixed FPR = f by calculating the proportion of percentile values that are greater than 1 − f, rather than the value estimated by a parametric ROC model.

2.4 Comparing Covariate-Adjusted ROC Curves

Comparisons between Inline graphic ROC curves can be made using any of the summary indices discussed above. A confidence interval for the difference in summary measures is calculated using the bootstrap. A Wald statistic, which divides the observed difference by its standard error, is compared to the standard normal distribution to obtain a p-value. Standard errors are obtained by bootstrapping. The comproc command is used to compare Inline graphic ROC curves.

2.5 The `comproc` Command

2.5.1 Syntax

The syntax of the comproc command is

comproc disease_var test_var1 [test_var2] [if] [in] [, options]

where disease_var is the binary outcome and test var1 and test var2 are the two markers to be compared. If only one marker is specified, the requested summary statistics are returned but no comparisons are made.

2.5.2 Options

Marker standardization and bootstrap options are the same as with the roccurve command. The choices of summary measures are: auc, the area under the Inline graphic ROC; pauc(f), the partial area under the ROC; roc(f), the TPR corresponding to a FPR of f; and rocinv(t), the FPR corresponding to a TPR of t. The tiecorr option can be used to correct for ties between case and control marker observations. It is used by default if pauc(f) is among the summary measures specified.

2.5.3 Example

Consider again the audiology data. Figure 3 shows the ROC curves for the markers DPOAE and TEOAE, both adjusted for age and gender. The covariates are assumed to act linearly on control marker observations, and the marker distributions and ROC curves are estimated empirically. The comproc command yields estimates of the associated ROC curves at a FPR of f = 0.20, as well as p Inline graphic AUC(0.20) and AUC, as shown below. We conclude that there is no evidence of a difference in the percent cases detected when the FPR is 20%. Comparisons based on the pAUC(0.20) and AUC similarly suggest that there is no difference in performance between the two markers.

Age- and gender-adjusted ROC curves for DPOAE and TEOAE based on the Norton et al. (2002) audiology data.

The comproc command applied to the audiology data yields the following results:

graphic file with name nihms90149u1.jpg

graphic file with name nihms90149u2.jpg

graphic file with name nihms90149u3.jpg

3 ROC Regression

3.1 Motivation and Concept

Covariates such as disease severity and specimen storage time can do more than impact marker observations among controls. They often also impact the inherent discriminatory accuracy of the marker (i.e. the ROC curve). That is, they affect the separation between the case and control marker distributions. A hypothetical example is shown in Figure 4. The data can be downloaded from the DABS website. Observe that the separation between the case and control marker distributions is much greater when Z = 0 than when Z = 1. The covariate also affects the distribution of the marker among controls, necessitating covariate adjustment.

A simulated marker Y and binary covariate Z = 0, 1. The marker is more accurate when Z = 0 than when Z = 1, and marker observations among controls also depend on Z. The performances of a common threshold are indicated.

ROC regression is a methodology that models the marker’s ROC curve as a function of covariates (Pepe, 2000; Alonzo and Pepe, 2002). Implementation proceeds in two steps: 1) model the distribution of the marker among controls as a function of covariates. Calculate the case percentile values, and; 2) model their CDF (i.e. the ROC curve) as a function of covariates. The result is an estimate of the ROC curve for the marker as a function of covariates, in other words a covariate-specific ROC curve. We emphasize that the covariates used in step (1) for adjustment are those that affect the marker distribution in the control population; these may or may not differ from the covariates that impact the separation between cases and controls, used in step (2).

3.2 Estimation

Estimation of the control marker distribution as a function of covariates and calculation of the case percentile values proceeds in exactly the same manner as with the covariate adjustment method. The standardization options allowed by rocreg are the same as with roccurve and comproc. The covariates may be assumed to act linearly on marker observations, or stratification can be employed if they are discrete. The percentile values can be calculated by estimating the control marker distribution conditional on covariates empirically or by assuming a normal model.

Next, a parametric model is specified for the ROC curve (i.e. the CDF of the case percentile values) as a function of covariates. The forms allowed by the rocreg program are

{ROC}_{Z} (f) = P (1 - p v_{D Z} \leq f) = g (α_{0} + α_{1} g^{- 1} (f) + α_{2} Z + α_{3} Z \times g^{- 1} (f)),

where g(·) is the standard normal CDF or the logistic function. The parameter α₂ allows the covariates to impact the “intercept” of the ROC curve, while α₃ allows Z to impact the “slope” of the ROC curve. If α₃ ≠ = 0, the covariates have a different impact on the ROC curve at different FPRs. Observe that this ROC model gives rise to binormal (Dorfman and Alf, 1969) or bilogistic (Ogilvie and Creelman, 1968) ROC curves at each fixed value of Z.

In order to fit the ROC regression model, a discrete set of FPR points, f₁, …, f_{n_p} is chosen. These points may span (0, 1) or a subinterval of interest, (a, b). For each case observation, a set of n_p records is created. The k^th record for the i^th subject includes the binary outcome $U_{k i} = I [1 - {\hat{p v}}_{D Z_{i}} \leq f_{k}]$ and covariates: g⁻¹(f_k), Z_i, and Z_i × g⁻¹(f_k). A binary regression model with link g, outcome U, and covariates: g⁻¹(f), Z, and Z × g⁻¹(f) provides estimates of (α₀, α₁, α₂, α₃) (Alonzo and Pepe, 2002). Bootstrapping is used for inference, where the data are resampled according to the design.

For illustration, an ROC regression model was fit for DPOAE using the audiology data. DPOAE observations among controls are assumed to depend linearly on age and gender, and their distribution is estimated empirically. Age-specific ROC curves are modelled parametrically using

{ROC}_{Z} (f) = Φ (α_{0} + α_{1} Φ^{- 1} (f) + α_{2} Z_{age}) .

(3)

Estimates of the age-specific ROC curves are calculated using the parameter estimates (α̂₀, α̂₁, α̂₂). Figure 5 shows estimated binormal ROC curves for children at 30, 40, and 50 months of age. This figure suggests that the marker is more accurate among older children, but the effect is not statistically significant (see below).

Age-specific ROC curves for DPOAE based on the Norton et al. (2002) audiology data. The ROC curves are adjusted for age and gender.