Summary
To evaluate the clinical utility of new risk markers, a crucial step is to measure their predictive accuracy with prospective studies. However, it is often infeasible to obtain marker values for all study participants. The nested case-control (NCC) design is a useful cost-effective strategy for such settings. Under the NCC design, markers are ascertained only for cases and a fraction of controls sampled randomly from the risk sets. This outcome-dependent sampling generates a complex data structure and therefore poses a challenge for analysis. Existing methods for analyzing NCC studies focus primarily on association measures. Here, we propose a class of non-parametric estimators for commonly used accuracy measures. We derived asymptotic expansions for the accuracy estimators under both finite population and Bernoulli sampling and established asymptotic equivalence between the two. Simulation results suggest that the proposed procedures perform well in finite samples. The new procedures are illustrated with data from the Framingham Offspring study.
Keywords: Biomarker study, Classification Accuracy, Conditional Kaplan-Meier, Inverse Probability Weighting, Nested case-control study, Predictive Value, ROC Curve, Time-dependent Accuracy
1 Introduction
Establishing reliable and parsimonious classification rules for predicting patient survival is a crucial step toward personalized medicine. With the advancement of technology, much progress has been made in identifying new markers useful for disease prognosis. For example, the MammaPrint genetic test holds great potential for predicting disease progression in lymph-node-negative breast cancer patients and was approved for clinical use in 2007 (Food and Drug Administration, 2007). In epidemiological studies, risk scores have been developed for many diseases and adopted in public health practice to assist in prevention and treatment efforts. Examples include the Framingham risk score for cardiovascular events (Wilson et al., 1998) and the Gail model for breast cancer (Gail et al., 1989). Here and in the sequel, the terms "biomarker" and "marker" refer generally to the continuous output of a prognostic classifier, such as a biological marker, a genetic score or a clinical risk score.
Several large cohorts have been assembled over the past decade in which biological specimens were collected and stored for future studies. While such prospective cohort studies are crucial for evaluating the prognostic potential of a novel marker, it is often undesirable and/or infeasible to measure markers for the entire cohort because of the cost of measurement. Two subcohort sampling designs, the case-cohort and the nested case-control (NCC) designs, are often employed as cost-effective alternatives to the full-cohort design. In particular, under the NCC design, markers are measured only for cases and a fraction of controls selected from the risk sets of the corresponding cases. Such a design is often preferred in biomarker studies because it naturally accommodates practical issues such as batch effects, storage effects and freeze-thaw cycles (Rundle et al., 2005). However, the design also generates complex datasets in which the missingness of the marker values depends on the outcome of interest, making inference about the predictive accuracy of the marker challenging.
Statistical methods for quantifying the prognostic accuracy of a marker with data from NCC studies are not well developed. Existing literature on NCC studies focuses primarily on relative risk parameters. Inference procedures for the hazard ratio under the Cox model (Cox, 1972) have been proposed (Goldstein & Langholz, 1992; Samuelsen, 1997; Chen, 2001). For example, Goldstein & Langholz (1992) developed a conditional logistic regression estimator and Samuelsen (1997) proposed an inverse probability weighted (IPW) estimator. However, such relative measures ignore some fundamental aspects of risk prediction and do not fully capture the predictiveness of a marker (Pepe et al., 2004; Ware, 2006). To obtain more clinically relevant summaries, various time-dependent accuracy measures, including the time-specific true positive rate (TPR), false positive rate (FPR), receiver operating characteristic (ROC) curve, positive predictive value (PPV) and negative predictive value (NPV), have been proposed (Heagerty et al., 2000; Heagerty & Zheng, 2005; Cai et al., 2006; Zheng et al., 2008). These measures extend existing classification measures for binary outcomes to the time domain by dichotomizing the continuous event time T into two disease states at any given time point of interest t. For example, one may consider the classification between subjects with T ≤ t and those with T > t. This leads to the following accuracy measures:
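$$\mathrm{TPR}_t(c) = P(Y > c \mid T \le t), \qquad \mathrm{FPR}_t(c) = P(Y > c \mid T > t),$$
$$\mathrm{PPV}_t(c) = P(T \le t \mid Y > c), \qquad \mathrm{NPV}_t(c) = P(T > t \mid Y \le c),$$
using the convention that a larger value of Y is associated with a higher risk of failure. The corresponding time-dependent ROC curve is then $\mathrm{ROC}_t(u) = \mathrm{TPR}_t\{\mathrm{FPR}_t^{-1}(u)\}$ for u ∈ [0, 1]. Existing estimators for these accuracy measures are limited to the case in which Y is fully observed. Because Y is missing in an outcome-dependent (non-random) manner, they are not directly applicable to data from NCC studies.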
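For intuition, the sketch below computes the four measures as simple empirical proportions when both T and Y are fully observed (no censoring and no NCC sampling), which is exactly the setting that the existing estimators assume. It only illustrates the definitions; the function name `time_dependent_accuracy` is hypothetical.

```python
import numpy as np

def time_dependent_accuracy(y, T, c, t):
    """Empirical TPR_t(c), FPR_t(c), PPV_t(c), NPV_t(c) with fully observed data.

    Illustration only: assumes no censoring and no NCC sampling.
    A larger marker value means higher risk, so 'test positive' is y > c.
    """
    y = np.asarray(y, dtype=float)
    T = np.asarray(T, dtype=float)
    positive = y > c                 # classified as high risk
    case = T <= t                    # event by time t
    tpr = positive[case].mean()      # P(Y > c | T <= t)
    fpr = positive[~case].mean()     # P(Y > c | T > t)
    ppv = case[positive].mean()      # P(T <= t | Y > c)
    npv = (~case[~positive]).mean()  # P(T > t | Y <= c)
    return float(tpr), float(fpr), float(ppv), float(npv)

# Toy data: two events before t = 2; only one marker value is below c = 0.3
print(time_dependent_accuracy([0.1, 0.4, 0.35, 0.8], [5.0, 1.0, 3.0, 0.5], c=0.3, t=2.0))
```

With censoring or outcome-dependent missingness in Y these naive proportions are no longer valid, which motivates the weighted estimators developed below.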
Here, we propose non-parametric IPW estimators for the aforementioned accuracy measures, with observations inversely weighted by their probabilities of being sampled into the NCC subcohort. We take a non-parametric approach for the following reason: although risk scores used in practice are often derived from regression models such as the Cox model, validating their prediction performance ideally should not require stringent model assumptions. Our approach is robust in that it remains valid even when the regression model from which the score is derived fails to hold. We consider two schemes for selecting controls from the risk sets: (i) finite population sampling; and (ii) independent Bernoulli sampling. For (i), we obtain IPW estimators using the true sampling weights and calculate their asymptotic variance, accounting for the between-subject correlation induced by sampling. For (ii), we construct IPW estimators using estimated sampling weights. We show that these two types of estimators have the same asymptotic variance. Such an equivalence was established by Breslow & Wellner (2006) for IPW estimators under two-phase stratified case-cohort sampling, where the number of strata is finite. Under the NCC design, by contrast, the number of case-control strata increases with the sample size, and we show that the equivalence still holds. In addition, we consider estimators that accommodate different censoring assumptions. In practice, the censoring time C frequently depends on the marker values; for example, subjects with lower marker values might drop out of the study earlier. To incorporate marker-dependent censoring, we propose IPW kernel-smoothing-based estimators under the standard survival analysis assumption that T and C are independent given Y. When C is independent of both T and Y, we derive a double IPW estimator as a simple alternative.
The rest of the paper is organized as follows. Section 2 discusses estimation procedures under both sampling schemes and under two types of censoring assumptions. Detailed inference procedures are provided in Section 3. In Section 4, we present simulation results demonstrating the finite-sample performance of the proposed procedures, and we apply these procedures to data from the Framingham Offspring study to evaluate the accuracy of a recently developed risk score for predicting cardiovascular events. Concluding remarks are presented in Section 5.
2 Estimation
2.1 Sampling Probabilities for the NCC subcohort
Suppose we have a cohort of n individuals followed prospectively for a clinical event. Due to censoring, instead of T we observe the bivariate vector (X, δ), where X = T ∧ C and δ = I(T ≤ C). Let denote the full cohort data, where Yi is observable only if subject i was selected into the NCC subcohort. We assume that Y has a finite support [, ] and C has a finite support [0, τ]. We consider the prediction of survival up to τ0 < τ such that . Throughout, we require the standard conditional independent censoring assumption, i.e., T and C are independent given Y. Note that in a purely non-parametric setting, the distribution of T is not identifiable if C depends on T given Y, and this assumption cannot be verified in general without additional assumptions on the dependence structure (Tsiatis, 1975).
Without loss of generality, we consider a typical NCC study in which all cases are included in the subcohort. For each observed case failing at time tj, m controls are randomly sampled from his/her risk set excluding the case itself, which is of size . Under finite population sampling, the m controls are sampled without replacement. Under Bernoulli sampling, each eligible subject in the risk set of tj is sampled independently with probability m/ as a control for tj. Both sampling schemes are easy to implement in practice, but finite population sampling is probably the more frequently used. For either sampling scheme, V0i indicates whether subject i is ever sampled as a control, and Vi = δi + (1 − δi)V0i indicates whether subject i is sampled into the NCC subcohort.
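The two control-selection schemes can be simulated as follows. This is a minimal sketch under stated assumptions: the helper `ncc_sample` is hypothetical, every other subject with X_k ≥ t_j counts as eligible for the case failing at t_j, and under Bernoulli sampling the per-subject probability is taken as m divided by the number of eligible subjects (capped at 1), so that m controls are drawn on average.

```python
import numpy as np

def ncc_sample(X, delta, m, scheme="finite", rng=None):
    """Simulate NCC control selection; returns (V, V0) as boolean arrays.

    Hypothetical helper. For each observed case j, the eligible controls are
    all other subjects with X_k >= X_j (still at risk at t_j = X_j).
      'finite'   : m controls drawn without replacement (all, if fewer eligible)
      'bernoulli': each eligible subject drawn independently with probability
                   m / (number eligible), capped at 1
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=int)
    V0 = np.zeros(len(X), dtype=bool)
    for j in np.flatnonzero(delta == 1):
        eligible = np.flatnonzero(X >= X[j])
        eligible = eligible[eligible != j]          # exclude the case itself
        if eligible.size == 0:
            continue
        if scheme == "finite":
            chosen = rng.choice(eligible, size=min(m, eligible.size), replace=False)
        elif scheme == "bernoulli":
            p = min(1.0, m / eligible.size)
            chosen = eligible[rng.random(eligible.size) < p]
        else:
            raise ValueError("scheme must be 'finite' or 'bernoulli'")
        V0[chosen] = True                           # ever sampled as a control
    V = (delta == 1) | V0                           # V_i = delta_i + (1 - delta_i) V0_i
    return V, V0

V, V0 = ncc_sample([1, 2, 3, 4, 5], [1, 0, 1, 0, 0], m=1, scheme="finite", rng=2024)
```

Under finite population sampling the number of controls per case is fixed, which induces the weak dependence among the V0i discussed in Section 3; under Bernoulli sampling the draws are independent given the cohort data.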
Under finite population sampling, the sampling probability for subject i is (Samuelsen, 1997), and thus the weight used for the IPW estimators is
where is the probability of subject i being sampled as a control and
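The sketch below assumes the commonly cited Samuelsen-type form of the inclusion probability: p_i = 1 for cases and, for a non-case, p_0i = 1 − ∏_{j: δ_j = 1, X_j ≤ X_i} {1 − m/(n_j − 1)}, where n_j is the number at risk at t_j including the case; the IPW weight is then V_i/p_i. The helper name and the capping of m/(n_j − 1) at 1 are assumptions for illustration.

```python
import numpy as np

def samuelsen_weights(X, delta, V, m):
    """Inclusion probabilities p_i and IPW weights V_i / p_i under finite
    population NCC sampling (assumed Samuelsen-type form; hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=int)
    V = np.asarray(V, dtype=bool)
    case_times = X[delta == 1]
    # risk-set size at each case time, counting the case itself
    n_risk = np.array([np.sum(X >= tj) for tj in case_times])
    # probability that a given eligible subject is drawn for case j: m / (n_j - 1)
    frac = np.minimum(1.0, m / np.maximum(n_risk - 1, 1))
    p = np.ones(len(X))                        # cases are always included
    for i in np.flatnonzero(delta == 0):
        at_risk = case_times <= X[i]           # cases whose risk set contains i
        p[i] = 1.0 - np.prod(1.0 - frac[at_risk])
    w = np.zeros(len(X))
    w[V] = 1.0 / p[V]                          # unsampled subjects get weight 0
    return p, w

p, w = samuelsen_weights([1, 2, 3, 4], [1, 0, 1, 0], [True] * 4, m=1)
```

In the toy call, the subject censored at X = 2 is at risk only for the first case (risk set of 4, so 3 eligible controls), giving p = 1/3 and weight 3.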
For the Bernoulli sampling scheme, let Bjk denote an indicator that takes the value 1 if subject k is sampled as a control for subject j (0 otherwise). Then {Bjk} are independent Bernoulli random variables with success probability , and V0i = 1 − Πj:Xj≤Xi,δj=1(1 − Bji). Note that the true sampling probability for subject i is also , i.e., , and one may use the true sampling weight, i.e., to construct IPW estimators. However, similar to the findings for case-cohort studies (Breslow & Wellner, 2006; Nan et al., 2009), it can be shown that using estimated sampling weights improves efficiency. To construct IPW estimators under Bernoulli sampling that correspond to those obtained under finite population sampling, we instead use weights
where we estimate as , and
2.2 IPW Conditional Nelson-Aalen Estimators of the Conditional Risk and Accuracy Functions
Estimators of the accuracy measures can be constructed by consistently estimating the bivariate survival function and the marginal distribution of Y, . We first turn to estimating the conditional survival function Sy(t) = P(T ≥ t | Y = y). In what follows, the IPW weight accounting for sampling is the weight defined in Section 2.1 for the sampling scheme in use.
Conditional Survival Estimation
To estimate Sy(t) without imposing any parametric assumptions on the relationship between T and Y, we consider the kernel-smoothed conditional Nelson-Aalen (CNA) estimator (Beran, 1981; Dabrowska, 1989; Du & Akritas, 2002). Besides providing a natural nonparametric estimator of Sy(t), the CNA estimator is known to be robust when censoring depends on Y. Under NCC sampling, we modify the estimator with IPW to account for the outcome-dependent missingness in Y. Specifically, we propose to estimate the cumulative hazard function Λy(t) = −log Sy(t) as
where , , Kh(x) = K(x/h)/h and K is a known smooth symmetric density function. As in standard kernel estimation (e.g. Beran, 1981; Dabrowska, 1989; Du & Akritas, 2002), the bandwidth h is assumed to be of order $O(n^{-\nu})$ with ν ∈ [1/5, 1/2) to ensure the consistency and asymptotic normality of the estimator. The order of h required for the accuracy estimators is discussed further in Section 3. Subsequently, Sy(t) can be estimated as $\hat S_y(t) = \exp\{-\hat\Lambda_y(t)\}$.
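A minimal sketch of an IPW kernel-smoothed conditional Nelson-Aalen estimator, assuming the usual Beran-type form with sampling weights: Λ̂_y(t) = Σ_{i: δ_i = 1, X_i ≤ t} w_i K_h(Y_i − y) / Σ_k w_k K_h(Y_k − y) I(X_k ≥ X_i), computed over the NCC subcohort (unsampled subjects carry zero weight and can be dropped). The Gaussian kernel and the function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def ipw_cna_survival(y0, t, X, delta, Y, w, h):
    """IPW kernel-smoothed conditional Nelson-Aalen estimate of S_{y0}(t).

    X, delta, Y, w: NCC subcohort only (V_i = 1); w are the IPW sampling
    weights (1 for cases). The Gaussian kernel is an illustrative choice.
    """
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=int)
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    kw = w * gaussian_kernel((Y - y0) / h) / h      # w_i K_h(Y_i - y0)
    cum_haz = 0.0
    for i in np.flatnonzero((delta == 1) & (X <= t)):
        at_risk = kw[X >= X[i]].sum()               # weighted, smoothed risk set
        if at_risk > 0:
            cum_haz += kw[i] / at_risk              # Nelson-Aalen increment
    return float(np.exp(-cum_haz))                  # S = exp(-Lambda)

S_hat = ipw_cna_survival(0.0, 0.5, [0.2, 0.4, 0.7], [1, 0, 1], [0.1, -0.3, 0.5], [1.0, 3.0, 1.0], h=0.5)
```

As h grows, the kernel weights become nearly constant and the estimate reduces to a weighted marginal Nelson-Aalen estimate; a small h localizes it around y0.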
Accuracy Measure Estimation
Based on the estimated conditional survival, we construct the following empirical estimator for the bivariate survival function as
(2.1)
for , where we estimate the marginal distribution of Y as
(2.2)
Subsequently, we may estimate the marginal survival distribution of T, , as . With the joint and marginal distributions of Y and T estimated, we may easily construct the following plug-in estimators of the aforementioned accuracy measures:
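$$\widehat{\mathrm{TPR}}_t(c) = \frac{1 - \hat F(c) - \hat S(c, t)}{1 - \hat S_T(t)}, \qquad \widehat{\mathrm{FPR}}_t(c) = \frac{\hat S(c, t)}{\hat S_T(t)}, \qquad (2.3)$$
$$\widehat{\mathrm{PPV}}_t(c) = 1 - \frac{\hat S(c, t)}{1 - \hat F(c)}, \qquad \widehat{\mathrm{NPV}}_t(c) = \frac{\hat S_T(t) - \hat S(c, t)}{\hat F(c)}. \qquad (2.4)$$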
The ROC curve can be estimated as $\widehat{\mathrm{ROC}}_t(u) = \widehat{\mathrm{TPR}}_t\{\widehat{\mathrm{FPR}}_t^{-1}(u)\}$.
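A minimal sketch of the plug-in step, assuming S(c, t) = P(Y > c, T > t), F(c) = P(Y ≤ c) and S_T(t) = P(T > t), and assuming that (2.1) and (2.2) are weighted averages over the subcohort normalized by the total weight (the exact normalization is an assumption here). The input `S_at_Y` holds the estimated conditional survival S_{Y_i}(t) for each subcohort member, e.g. from the IPW conditional Nelson-Aalen estimator; the function name is hypothetical.

```python
import numpy as np

def plugin_accuracy(c, Y, w, S_at_Y):
    """Plug-in TPR/FPR/PPV/NPV at cut-off c for a fixed horizon t.

    Y, w: markers and IPW weights for the NCC subcohort.
    S_at_Y: estimated S_{Y_i}(t) for each subcohort member.
    Normalizing by sum(w) is an assumption of this sketch.
    """
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    S_at_Y = np.asarray(S_at_Y, dtype=float)
    W = w.sum()
    S_ct = np.sum(w * (Y > c) * S_at_Y) / W    # P(Y > c, T > t)
    F_c = np.sum(w * (Y <= c)) / W             # P(Y <= c)
    S_T = np.sum(w * S_at_Y) / W               # P(T > t)
    return {
        "TPR": (1.0 - F_c - S_ct) / (1.0 - S_T),
        "FPR": S_ct / S_T,
        "PPV": 1.0 - S_ct / (1.0 - F_c),
        "NPV": (S_T - S_ct) / F_c,
    }

res = plugin_accuracy(2.5, [1, 2, 3, 4], [1, 1, 1, 1], [0.9, 0.8, 0.7, 0.6])
```

Sweeping c over the observed marker values and pairing the resulting FPR and TPR values traces out the estimated ROC curve.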
2.3 Double Inverse Probability Weighted Estimators
When the censoring time C is independent of both Y and T, one may consistently estimate the accuracy measures using double IPW (DIPW) to account for missingness due to both NCC sampling and censoring. Specifically, let , where is the Kaplan-Meier estimator of . It is straightforward to see that and one may use to account for missing information about I(Ti ≤ t) due to censoring. Thus, and F(c) can be estimated as
respectively. Subsequently, we obtain estimates of the accuracy measures by replacing , , in (2.3) and (2.4) with , , and , respectively. Note that double weighting was used for to ensure that the estimated accuracy measures are between 0 and 1. The resulting estimators are denoted by , , , .
The DIPW approach has the advantage of being simple to calculate without kernel smoothing. However, as shown in the simulation section, the resulting estimators are subject to bias when the censoring distribution depends on the marker values.
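A sketch of the DIPW idea under assumptions: the censoring weight is taken to be the standard inverse-probability-of-censoring weight ω_i(t) = δ_i I(X_i ≤ t)/Ĝ(X_i−) + I(X_i > t)/Ĝ(t), with Ĝ the Kaplan-Meier estimate of P(C > s) computed from the full cohort; the three quantities P(Y > c, T > t), P(Y ≤ c) and P(T > t) all use the product of sampling and censoring weights and are normalized by the total double weight, which keeps the measures in [0, 1]. No kernel smoothing is involved. All function names are hypothetical.

```python
import numpy as np

def km_censoring(X, delta):
    """Kaplan-Meier estimate of G(s) = P(C > s) from full-cohort (X, delta),
    treating censored observations (delta == 0) as the 'events'."""
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=int)
    times = np.unique(X[delta == 0])
    factors = [1.0 - np.sum((X == s) & (delta == 0)) / np.sum(X >= s) for s in times]
    return times, np.cumprod(factors)

def G_at(s, times, surv, left_limit=False):
    """Evaluate the KM step function at s, or its left limit G(s-)."""
    k = np.searchsorted(times, s, side="left" if left_limit else "right")
    return 1.0 if k == 0 else float(surv[k - 1])

def ipcw_weights(t, X, delta):
    """omega_i(t) = delta_i I(X_i <= t) / G(X_i-) + I(X_i > t) / G(t)."""
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=int)
    times, surv = km_censoring(X, delta)
    omega = np.zeros(len(X))                        # censored before t: weight 0
    for i in range(len(X)):
        if X[i] > t:
            omega[i] = 1.0 / G_at(t, times, surv)
        elif delta[i] == 1:
            omega[i] = 1.0 / G_at(X[i], times, surv, left_limit=True)
    return omega

def dipw_accuracy(c, t, X, delta, V, Y, w):
    """DIPW TPR/FPR/PPV/NPV. X, delta: full cohort; V: subcohort indicator;
    Y, w: markers (may be NaN outside the subcohort) and sampling weights."""
    omega = ipcw_weights(t, X, delta)
    V = np.asarray(V, dtype=bool)
    ww = (np.asarray(w, dtype=float) * omega)[V]    # double weights
    Ys = np.asarray(Y, dtype=float)[V]
    alive = np.asarray(X, dtype=float)[V] > t
    W = ww.sum()
    S_ct = np.sum(ww * ((Ys > c) & alive)) / W      # P(Y > c, T > t)
    F_c = np.sum(ww * (Ys <= c)) / W                # P(Y <= c)
    S_T = np.sum(ww * alive) / W                    # P(T > t)
    return {
        "TPR": (1.0 - F_c - S_ct) / (1.0 - S_T),
        "FPR": S_ct / S_T,
        "PPV": 1.0 - S_ct / (1.0 - F_c),
        "NPV": (S_T - S_ct) / F_c,
    }
```

Because Ĝ does not depend on Y, these weights correct for censoring only when C is independent of Y, which is why this approach can be biased under marker-dependent censoring.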
3 Asymptotic Properties and Inference Procedures
The NCC sampling scheme introduces additional complexity and poses a significant challenge for the theoretical study of the proposed estimators. Specifically, our proposed estimators based on finite population sampling involve the sampling variables {, …, }, which are weakly dependent conditional on . To establish the consistency and asymptotic normality of the proposed estimators, one may account for this dependence using laws of large numbers and central limit theorems for sequences of asymptotically linear negatively quadrant dependent random variables (Zhang, 2000; Cai, 2005). Under Bernoulli sampling, the sampling variables are independent conditional on , and the asymptotic properties of the corresponding estimators can be established using empirical process theory.
Variance Form
For the finite population sampling scheme, we show in Appendix A.1 that the asymptotic variance (aVAR) of a generic IPW estimator of the form is , which is defined in (A.1). In addition, we demonstrate in Appendix A.2 that under the Bernoulli sampling scheme, the aVAR of the IPW estimator is also . It is easy to show that, for Bernoulli sampling, the aVAR of , the IPW estimator with true weights, is , where pi is defined in Lemma 1. A similar phenomenon has been observed for various IPW estimators in case-cohort studies (e.g. Breslow & Wellner, 2006; Nan et al., 2009). As emphasized by Robins et al. (1994), the variance of this type of IPW estimator can be viewed as a residual sum of squares; thus, enriching the model for the sampling probability is likely to improve the efficiency of estimation. Since the use of and yields the same efficiency, we only provide detailed asymptotic derivations for the latter with .
Consistency
In Appendix B, we showed that for the IPW conditional Nelson-Aalen estimator,
when h = $O(n^{-\nu})$ with ν ∈ [1/5, 1/2). Thus is uniformly consistent for Sy(t) in y. Furthermore, , , , , and , are uniformly consistent for , TPRt(c), FPRt(c), NPVt(c), and PPVt(c) for and , where , , τ and are constants such that .
Asymptotic Normality and Interval Estimation
To construct confidence intervals (CIs) for the proposed accuracy measures, we show in Appendix C that and converge jointly to zero-mean Gaussian processes in c ∈ ΩY when h = $O(n^{-\nu})$ with ν ∈ (1/4, 1/2). Note that this rate corresponds to the standard under-smoothing assumption, which is needed so that the smoothing bias of the resulting accuracy estimators is negligible, as is usual for smoothed empirical processes (van der Vaart, 1994; Zheng et al., 2008; León et al., 2009). Furthermore, we establish weak convergence for the accuracy estimators. With the aforementioned rate for h, the first-order asymptotic distribution of the accuracy estimators does not depend on h.
The aVAR of these estimators can be estimated empirically, and the CIs can be constructed based on normal approximations. For example, we showed in Appendix C that
in distribution, where is defined in Appendix C and ηζFPRt (c, u) = E{ζFPRt(c; Di)I(Xi ≥ u)(1 − pi)/pi}. A 95% confidence interval for FPRt(c) may be obtained as , where
, and is obtained by replacing all theoretical quantities in ζFPRt(c, Di) by their empirical counterparts. Similar point-wise CIs can be constructed for the ROC curve as well as the predictive value functions.
Similar arguments could be used to establish the asymptotic properties of the DIPW estimators when C is independent of Y and T. Under this assumption, is a uniformly consistent estimator of , and converges weakly to a zero-mean Gaussian process (Kalbfleisch & Prentice, 2002). This, together with similar arguments as given in the Appendices, can be used to establish the consistency and asymptotic normality of the DIPW accuracy estimators.
4 Numerical Studies
4.1 Simulation Studies
Simulation studies were conducted to assess the performance of the proposed inference procedures in finite samples and to compare the accuracy estimators. To this end, we generated Y from a truncated normal such that with . The event time T was generated from a Cox model via log T = 1 − log(3)Y/2 + ε, with ε drawn independently from an extreme-value distribution. We generated C as C = min(C0, C1) with C0 ~ Uniform(0.5, 1). Two configurations were used for C1 to produce (i) independent censoring and (ii) marker-dependent censoring. For (i), we let C1 ~ 0.1 + Gamma(2,2); for (ii), we let C1 = eY/10−1 +eY/5Gamma(2,5). Both configurations result in about 90% censoring and an event rate of about 5% by t0 = 0.5. The cohort sample size was 5000, and for each observed case, either 1 or 3 matched controls were selected. For each dataset, we obtained point and interval estimates of the accuracy of Y in predicting the risk of an event by t0 under both finite population and Bernoulli sampling. For each configuration, we simulated 1000 datasets to summarize the empirical performance of the proposed estimators.
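A data-generating sketch for the independent-censoring configuration. The truncated-normal parameters are not given in the text, so `mu`, `sigma` and the truncation bounds are placeholders, as is reading Gamma(2,2) as shape 2 and scale 2; with these placeholders the censoring and event rates will not necessarily match those reported. The error ε is generated as the log of a standard exponential variable (a minimum extreme-value distribution), which makes T follow a Cox model with hazard ratio √3 per unit increase in Y.

```python
import numpy as np

def simulate_cohort(n, mu=0.0, sigma=1.0, lower=-2.0, upper=2.0, rng=None):
    """Hypothetical generator for the independent-censoring configuration.

    mu, sigma, lower, upper (truncated-normal marker) are placeholders, and
    Gamma(2, 2) is read as shape = 2, scale = 2; both are assumptions.
    """
    rng = np.random.default_rng(rng)
    y = np.empty(0)
    while y.size < n:                               # rejection sampling
        draw = rng.normal(mu, sigma, size=n)
        y = np.concatenate([y, draw[(draw >= lower) & (draw <= upper)]])
    y = y[:n]
    eps = np.log(rng.exponential(1.0, size=n))      # minimum extreme-value error
    T = np.exp(1.0 - np.log(3.0) * y / 2.0 + eps)   # log T = 1 - log(3) Y / 2 + eps
    C0 = rng.uniform(0.5, 1.0, size=n)
    C1 = 0.1 + rng.gamma(shape=2.0, scale=2.0, size=n)
    C = np.minimum(C0, C1)                          # C = min(C0, C1)
    X = np.minimum(T, C)
    delta = (T <= C).astype(int)
    return y, X, delta

y, X, delta = simulate_cohort(5000, rng=42)
```

Because C0 is bounded by 1, every observed time X is at most 1 in this configuration.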
We first focus on the CNA estimators. In Table 1(a), we present results for FPR, TPR, PPV and NPV at from simulated datasets with 1 matched control under independent censoring, for p = 0.2, 0.4, 0.6, 0.8. First, all estimators have negligible bias, the estimated standard errors (ASE) are close to the sampling standard errors (SSE), and the 95% CIs have empirical coverage close to the nominal level. Second, consistent with the theoretical results, the two sampling schemes perform similarly, with both the sampling and the estimated SEs close to each other across schemes. In clinical applications, it is often of interest to summarize the overall accuracy of the marker by the area under the ROC curve (AUC) and also to examine the accuracy of the marker at a cut-off value chosen to achieve a given sensitivity or specificity. In Table 1(b), we present results for the AUC as well as for FPR, PPV and NPV at a sensitivity of 0.90, corresponding to a relatively low false negative rate of 0.10. The proposed point and interval estimates also perform well with respect to bias and coverage.
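The summaries in Table 1(b) can be computed from estimated TPR and FPR curves evaluated on a grid of cut-offs. A sketch: the AUC is obtained by the trapezoidal rule over the (FPR, TPR) points, and the FPR is read off at the cut-off attaining a sensitivity of 0.90. The rule for choosing that cut-off (the largest c with TPR ≥ 0.90) and the function name are assumptions, not taken from the text.

```python
import numpy as np

def auc_and_fpr_at_tpr(cuts, tpr, fpr, target_tpr=0.90):
    """AUC by the trapezoidal rule over (FPR, TPR) points, plus the FPR at the
    largest cut-off whose TPR is still >= target_tpr (assumed selection rule)."""
    cuts = np.asarray(cuts, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    fpr = np.asarray(fpr, dtype=float)
    order = np.argsort(-cuts)                       # decreasing c: FPR and TPR increase
    f = np.concatenate([[0.0], fpr[order], [1.0]])  # add the (0,0) and (1,1) endpoints
    s = np.concatenate([[0.0], tpr[order], [1.0]])
    auc = float(np.sum(np.diff(f) * (s[1:] + s[:-1]) / 2.0))
    ok = tpr >= target_tpr
    if not ok.any():
        return auc, None, None
    c_star = cuts[ok].max()
    return auc, float(c_star), float(fpr[cuts == c_star][0])

cuts = np.linspace(0.1, 0.9, 9)
auc, c_star, fpr_star = auc_and_fpr_at_tpr(cuts, 1.0 - cuts, 1.0 - cuts)
```

The toy call uses a non-informative marker (TPR equal to FPR at every cut-off), whose AUC is 0.5.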
Table 1.
(a) Accuracy Estimates at , for k = 1, 2,3,4. | |||||||||
---|---|---|---|---|---|---|---|---|---|
| |||||||||
Truth | Bias | Bias | SSE | SSE | ASE | ASE | CovP | CovP | |
FPRt0(c.2) | 79.0 | 0.1 | −0.1 | 2.5 | 2.5 | 2.4 | 2.5 | 93.1 | 94.4 |
FPRt0(c.4) | 58.2 | 0.1 | 0.0 | 3.0 | 3.1 | 3.0 | 3.0 | 94.4 | 93.3 |
FPRt0(c.6) | 37.7 | 0.2 | 0.0 | 2.9 | 3.0 | 2.9 | 2.9 | 95.1 | 93.9 |
FPRt0(c.8) | 17.8 | 0.2 | 0.0 | 2.3 | 2.2 | 2.2 | 2.2 | 94.2 | 93.8 |
| |||||||||
TPRt0(c.2) | 97.1 | −0.1 | −0.1 | 1.0 | 1.0 | 1.1 | 1.1 | 94.1 | 94.2 |
TPRt0(c.4) | 90.4 | −0.4 | −0.4 | 2.0 | 1.9 | 2.0 | 2.0 | 96.1 | 95.8 |
TPRt0(c.6) | 78.5 | −0.7 | −0.8 | 2.8 | 2.8 | 2.9 | 2.9 | 94.9 | 95.1 |
TPRt0(c.8) | 57.3 | −1.1 | −1.3 | 3.6 | 3.6 | 3.7 | 3.7 | 95.3 | 94.0 |
| |||||||||
NPVt0(c.2) | 99.2 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 | 94.6 | 94.2 |
NPVt0(c.4) | 98.6 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 | 95.2 | 94.7 |
NPVt0(c.6) | 98.0 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 | 96.7 | 96.4 |
NPVt0(c.8) | 97.0 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 | 96.0 | 95.7 |
| |||||||||
PPVt0(c.2) | 6.9 | −0.1 | −0.1 | 0.5 | 0.5 | 0.5 | 0.5 | 93.6 | 92.1 |
PPVt0(c.4) | 8.5 | −0.2 | −0.2 | 0.7 | 0.7 | 0.7 | 0.7 | 93.5 | 92.4 |
PPVt0(c.6) | 11.1 | −0.3 | −0.3 | 1.1 | 1.0 | 1.0 | 1.0 | 92.7 | 92.1 |
PPVt0(c.8) | 16.2 | −0.5 | −0.4 | 1.9 | 1.9 | 1.9 | 1.9 | 92.4 | 93.2 |
(b) AUC and Accuracy Estimates at TPR of 0.90. | |||||||||
---|---|---|---|---|---|---|---|---|---|
| |||||||||
Truth | Bias | Bias | SSE | SSE | ASE | ASE | CovP | CovP | |
AUC | 78.5 | −1.0 | −0.9 | 1.9 | 1.9 | 1.9 | 1.9 | 94.1 | 92.8 |
FPRTPR=0.9 | 57.2 | 0.7 | 0.6 | 4.7 | 4.7 | 5.2 | 5.3 | 96.0 | 96.7 |
NPVTPR=0.9 | 98.6 | 0.0 | 0.0 | 0.2 | 0.2 | 0.2 | 0.2 | 94.7 | 93.7 |
PPVTPR=0.9 | 8.6 | −0.2 | −0.2 | 0.9 | 0.9 | 0.9 | 0.9 | 93.1 | 93.1 |
Results for independent censoring with 3 matched controls are shown in Table 2. The proposed estimators again perform well. Increasing the number of matched controls appears to be most helpful for estimating the FPR, reducing its variance by about 65%. The reduction in variance is about 30% for the AUC and ranges from 0% to 21% for the TPR. Similar patterns were observed under marker-dependent censoring (Table 3).
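The variance reductions quoted above can be checked approximately from the tabulated sampling standard errors, since the ratio of variances is the squared ratio of SSEs. The sketch uses the first SSE column of Tables 1 and 2 (independent censoring, m = 1 versus m = 3); because the tabulated SSEs are rounded to one decimal, the results agree with the quoted figures only up to rounding.

```python
# Sampling SEs (x100) from the first SSE column of Table 1 (m = 1) and
# Table 2 (m = 3), independent censoring, cut-offs c.2, c.4, c.6, c.8.
sse_m1 = {"FPR": [2.5, 3.0, 2.9, 2.3], "TPR": [1.0, 2.0, 2.8, 3.6], "AUC": [1.9]}
sse_m3 = {"FPR": [1.5, 1.7, 1.8, 1.3], "TPR": [1.0, 1.8, 2.5, 3.3], "AUC": [1.6]}

def variance_reduction(se_before, se_after):
    """Fractional reduction in variance, 1 - (se_after / se_before) ** 2."""
    return [1.0 - (b / a) ** 2 for a, b in zip(se_before, se_after)]

reductions = {k: variance_reduction(sse_m1[k], sse_m3[k]) for k in sse_m1}
for measure, r in reductions.items():
    print(measure, [round(100 * x) for x in r])
```

Averaged over the four cut-offs, the FPR reduction is close to 65%, and the AUC reduction is close to 29%.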
Table 2.
(a) Accuracy Estimates at , for k = 1, 2,3,4. | |||||||||
---|---|---|---|---|---|---|---|---|---|
| |||||||||
Truth | Bias | Bias | SSE | SSE | ASE | ASE | CovP | CovP | |
FPRt0(c.2) | 79.0 | 0.0 | 0.0 | 1.5 | 1.5 | 1.5 | 1.5 | 94.0 | 93.0 |
FPRt0(c.4) | 58.2 | 0.1 | 0.0 | 1.7 | 1.9 | 1.8 | 1.8 | 95.8 | 93.4 |
FPRt0(c.6) | 37.7 | 0.1 | 0.0 | 1.8 | 1.8 | 1.7 | 1.7 | 94.8 | 94.3 |
FPRt0(c.8) | 17.8 | 0.1 | 0.1 | 1.3 | 1.3 | 1.3 | 1.3 | 94.3 | 94.6 |
| |||||||||
TPRt0(c.2) | 97.1 | −0.1 | −0.1 | 1.0 | 1.0 | 1.0 | 1.0 | 93.7 | 93.1 |
TPRt0(c.4) | 90.4 | −0.3 | −0.3 | 1.8 | 1.8 | 1.9 | 1.9 | 96.8 | 95.9 |
TPRt0(c.6) | 78.5 | −0.6 | −0.6 | 2.5 | 2.5 | 2.7 | 2.7 | 96.4 | 95.9 |
TPRt0(c.8) | 57.3 | −0.9 | −0.9 | 3.3 | 3.3 | 3.3 | 3.3 | 94.3 | 94.0 |
| |||||||||
NPVt0(c.2) | 99.2 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 | 93.0 | 93.0 |
NPVt0(c.4) | 98.6 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 | 95.6 | 96.1 |
NPVt0(c.6) | 98.0 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 | 95.6 | 95.5 |
NPVt0(c.8) | 97.0 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 | 93.7 | 95.2 |
| |||||||||
PPVt0(c.2) | 6.9 | −0.1 | −0.1 | 0.5 | 0.5 | 0.5 | 0.5 | 92.4 | 92.9 |
PPVt0(c.4) | 8.5 | −0.2 | −0.2 | 0.6 | 0.6 | 0.6 | 0.6 | 92.5 | 93.2 |
PPVt0(c.6) | 11.1 | −0.2 | −0.2 | 0.8 | 0.8 | 0.8 | 0.8 | 94.1 | 92.7 |
PPVt0(c.8) | 16.2 | −0.5 | −0.5 | 1.4 | 1.4 | 1.5 | 1.5 | 92.9 | 93.0 |
(b) AUC and Accuracy Estimates at TPR of 0.90. | |||||||||
---|---|---|---|---|---|---|---|---|---|
| |||||||||
Truth | Bias | Bias | SSE | SSE | ASE | ASE | CovP | CovP | |
AUC | 78.5 | −0.8 | −0.7 | 1.6 | 1.6 | 1.6 | 1.6 | 93.4 | 93.1 |
FPRTPR=0.9 | 57.2 | 0.7 | 0.6 | 4.2 | 4.1 | 4.6 | 4.7 | 95.9 | 96.5 |
NPVTPR=0.9 | 98.6 | 0.0 | 0.0 | 0.2 | 0.2 | 0.2 | 0.2 | 95.3 | 95.5 |
PPVTPR=0.9 | 8.6 | −0.2 | −0.2 | 0.8 | 0.8 | 0.8 | 0.8 | 93.5 | 94.5 |
Table 3.
(a) Accuracy Estimates at , for k = 1, 2,3,4. | |||||||||
---|---|---|---|---|---|---|---|---|---|
| |||||||||
Truth | Bias | Bias | SSE | SSE | ASE | ASE | CovP | CovP | |
FPRt0(c.2) | 79.0 | 0.0 | 0.0 | 1.4 | 1.3 | 1.4 | 1.4 | 94.8 | 94.7 |
FPRt0(c.4) | 58.2 | 0.1 | 0.0 | 1.6 | 1.5 | 1.6 | 1.6 | 94.7 | 95.7 |
FPRt0(c.6) | 37.7 | 0.1 | 0.0 | 1.5 | 1.5 | 1.4 | 1.4 | 92.8 | 94.7 |
FPRt0(c.8) | 17.8 | 0.1 | 0.1 | 1.1 | 1.1 | 1.1 | 1.1 | 94.0 | 95.2 |
| |||||||||
TPRt0(c.2) | 97.1 | −0.1 | −0.1 | 1.0 | 1.0 | 1.0 | 1.0 | 94.3 | 94.5 |
TPRt0(c.4) | 90.4 | −0.3 | −0.3 | 1.7 | 1.8 | 1.8 | 1.8 | 95.8 | 95.5 |
TPRt0(c.6) | 78.5 | −0.5 | −0.5 | 2.5 | 2.5 | 2.5 | 2.5 | 95.1 | 95.1 |
TPRt0(c.8) | 57.3 | −0.6 | −0.6 | 2.9 | 2.9 | 3.0 | 3.0 | 95.6 | 95.5 |
| |||||||||
NPVt0(c.2) | 99.2 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 | 93.0 | 92.7 |
NPVt0(c.4) | 98.6 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 | 94.9 | 94.9 |
NPVt0(c.6) | 98.0 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 | 95.1 | 94.5 |
NPVt0(c.8) | 97.0 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 | 95.2 | 94.9 |
| |||||||||
PPVt0(c.2) | 6.9 | −0.1 | −0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 94.1 | 93.1 |
PPVt0(c.4) | 8.5 | −0.2 | −0.2 | 0.5 | 0.5 | 0.5 | 0.5 | 93.1 | 93.3 |
PPVt0(c.6) | 11.1 | −0.3 | −0.2 | 0.8 | 0.8 | 0.8 | 0.8 | 92.4 | 93.4 |
PPVt0(c.8) | 16.2 | −0.4 | −0.4 | 1.3 | 1.3 | 1.3 | 1.3 | 94.1 | 94.6 |
(b) AUC and Accuracy Estimates at TPR of 0.90. | |||||||||
---|---|---|---|---|---|---|---|---|---|
| |||||||||
Truth | Bias | Bias | SSE | SSE | ASE | ASE | CovP | CovP | |
AUC | 78.5 | −0.7 | −0.7 | 1.5 | 1.5 | 1.5 | 1.5 | 93.0 | 92.0 |
FPRTPR=0.9 | 57.2 | 0.6 | 0.6 | 4.1 | 4.2 | 4.5 | 4.5 | 96.2 | 95.9 |
NPVTPR=0.9 | 98.6 | 0.0 | 0.0 | 0.2 | 0.2 | 0.2 | 0.2 | 94.8 | 93.5 |
PPVTPR=0.9 | 8.6 | −0.2 | −0.2 | 0.8 | 0.8 | 0.8 | 0.8 | 93.9 | 93.8 |
As discussed in Section 2.3, we may estimate the accuracy measures via the DIPW approach, which is computationally simpler than the CNA approach. To compare the two approaches under both finite population and Bernoulli sampling, we generated data from the same models as described above and assessed the percent relative bias and the root mean squared error of all the proposed estimators. Table 4 summarizes the results for m = 3. With independent censoring, all the estimators have negligible bias, and, judged by root mean squared error, the CNA and DIPW approaches yield estimators of comparable efficiency. On the other hand, when the censoring distribution depends on Y, the DIPW approach leads to substantially biased estimators, with relative bias as high as 15.6%, whereas the CNA estimators have negligible bias in all configurations considered.
Table 4.
(a) Independent Censoring | |||||||||
---|---|---|---|---|---|---|---|---|---|
| |||||||||
Percent of Relative Bias | 100× Root MSE | ||||||||
Truth | DIPW | CNA | DIPW | CNA | DIPW | CNA | DIPW | CNA | |
FPRt0(c.2) | 79.0 | 0.0 | 0.0 | 0.0 | 0.1 | 1.3 | 1.5 | 1.3 | 1.5 |
FPRt0(c.8) | 17.8 | 0.1 | 0.7 | 0.4 | 0.5 | 1.1 | 1.4 | 1.7 | 1.3 |
| |||||||||
TPRt0(c.2) | 97.1 | 0.0 | −0.1 | 0.0 | −0.1 | 1.0 | 1.0 | 1.1 | 1.0 |
TPRt0(c.8) | 57.3 | −0.2 | −1.6 | −0.2 | −1.6 | 3.1 | 3.4 | 4.0 | 3.4 |
| |||||||||
NPVt0(c.2) | 99.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 |
NPVt0(c.8) | 97.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.3 | 0.3 | 0.3 |
| |||||||||
PPVt0(c.2) | 6.9 | −0.1 | −1.6 | −0.1 | −1.7 | 0.4 | 0.5 | 0.4 | 0.5 |
PPVt0(c.8) | 16.2 | −0.1 | −3.0 | −0.2 | −2.9 | 1.4 | 1.5 | 1.5 | 1.5 |
| |||||||||
AUC | 78.5 | −0.4 | −1.0 | −0.4 | −0.9 | 1.6 | 1.8 | 1.6 | 1.8 |
FPRTPR=0.9 | 57.2 | −1.7 | 1.2 | −1.7 | 1.1 | 4.6 | 4.3 | 4.6 | 4.1 |
NPVTPR=0.9 | 98.6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 0.2 | 0.2 |
PPVTPR=0.9 | 8.6 | 1.7 | −2.1 | 1.7 | −2.2 | 0.9 | 0.8 | 0.9 | 0.8 |
(b) Dependent Censoring | |||||||||
---|---|---|---|---|---|---|---|---|---|
| |||||||||
Percent of Relative Bias | 100× Root MSE | ||||||||
Truth | DIPW | CNA | DIPW | CNA | DIPW | CNA | DIPW | CNA | |
FPRt0(c.2) | 79.0 | 5.4 | −0.0 | 5.5 | 0.1 | 4.4 | 1.4 | 4.5 | 1.3 |
FPRt0(c.8) | 17.8 | 15.2 | 0.4 | 15.6 | 0.3 | 3.0 | 1.1 | 3.4 | 1.1 |
| |||||||||
TPRt0(c.2) | 97.1 | 0.2 | −0.1 | 0.2 | −0.1 | 1.0 | 1.0 | 1.0 | 1.0 |
TPRt0(c.8) | 57.3 | 1.3 | −1.0 | 1.2 | −1.0 | 3.2 | 3.0 | 4.0 | 3.0 |
| |||||||||
NPVt0(c.2) | 99.2 | −0.2 | −0.0 | −0.2 | −0.0 | 0.4 | 0.3 | 0.4 | 0.3 |
NPVt0(c.8) | 97.0 | −0.1 | 0.0 | −0.1 | 0.0 | 0.3 | 0.3 | 0.4 | 0.3 |
| |||||||||
PPVt0(c.2) | 6.9 | −2.5 | −1.6 | −2.5 | −1.7 | 0.4 | 0.4 | 0.5 | 0.4 |
PPVt0(c.8) | 16.2 | −8.4 | −2.4 | −8.6 | −2.3 | 1.8 | 1.3 | 1.9 | 1.3 |
| |||||||||
AUC | 78.5 | −2.8 | −0.9 | −2.8 | −0.9 | 3.0 | 2.0 | 3.0 | 2.0 |
FPRTPR=.9 | 57.2 | 6.0 | 1.0 | 6.0 | 1.0 | 6.0 | 4.0 | 6.0 | 4.0 |
NPVTPR=.9 | 98.6 | −0.2 | −0.0 | −0.2 | −0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
PPVTPR=.9 | 8.6 | −2.9 | −2.0 | −2.9 | −2.0 | 1.0 | 0.9 | 1.0 | 0.9 |
4.2 Example
The Framingham risk model, based on several clinical factors, is used extensively to assess the risk of coronary heart disease. However, it has only moderate sensitivity and specificity. A new risk model, based on both the Framingham risk model variables (Wilson et al., 1998) and an inflammation marker, C-reactive protein (CRP), was recently developed using data from the Women's Health Study (Cook et al., 2006). We illustrate here how the proposed procedures can be used to evaluate the clinical utility of this cardiovascular risk prediction model using an independent dataset from the Framingham Offspring study (Kannel et al., 1979).
The Framingham Offspring Study was established in 1971 with 5,124 participants who were monitored prospectively for epidemiological and genetic risk factors of cardiovascular disease (CVD). We consider here 1728 female participants who were free of CVD and had CRP measurements and other clinical information at the second examination. The average age of this subset was about 44 years (standard deviation 10). The outcome considered was the time from the exam date to the first major CVD event, including CVD-related death. During follow-up, 269 participants experienced at least one CVD event, and the 5-year event rate was about 2%. Because CRP measurements are available for the entire cohort, the Framingham data allow us to illustrate the methods with a real dataset and to compare estimators obtained from NCC subcohorts with those from the full cohort.
We first calculated the risk score using the algorithm developed by Cook et al. (2006), which combines information on age, systolic blood pressure, smoking status, high-density lipoprotein (HDL), total cholesterol, medication for hypertension and CRP concentration. The score was derived using a Cox proportional hazards model. To evaluate the clinical utility of the score in a different dataset, it is sensible to use a procedure that does not depend on the original modeling assumptions, and our nonparametric procedures are well suited to this purpose. To compare sampling designs, for each design with either 1 or 3 matched controls, we assembled 500 nested case-control datasets by repeatedly sampling the matched controls. For each dataset, we obtained point and interval estimates of the accuracy summaries for the new score in predicting the risk of CVD events within 5 years of the predictor measurement. In Table 5, we report estimates for three settings: (i) the full cohort; (ii) NCC samples with m = 1; and (iii) NCC samples with m = 3, where the NCC results are averages over the 500 datasets. Since the two sampling schemes yield asymptotically equivalent estimators, we report results under only one of them. For comparison, results from both the CNA and DIPW methods are reported. Since the results are fairly comparable between the two methods, below we summarize estimates from the CNA method only.
Table 5. Estimates (Est) and standard errors (SE), in percent, of accuracy measures for the 5-year risk of CVD events, from the full cohort and from NCC samples with 1 or 3 matched controls per case.

| | CNA Cohort Est | CNA Cohort SE | CNA 1 control Est | CNA 1 control SE | CNA 3 controls Est | CNA 3 controls SE | DIPW Cohort Est | DIPW Cohort SE | DIPW 1 control Est | DIPW 1 control SE | DIPW 3 controls Est | DIPW 3 controls SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FPR5(c.2) | 79.7 | 1.0 | 79.7 | 2.5 | 79.6 | 1.6 | 79.7 | 1.0 | 79.6 | 2.5 | 79.8 | 1.5 |
| FPR5(c.4) | 59.4 | 1.2 | 59.3 | 3.1 | 59.4 | 2.0 | 59.2 | 1.2 | 59.0 | 3.0 | 59.3 | 1.9 |
| FPR5(c.6) | 39.2 | 1.2 | 39.2 | 3.1 | 39.0 | 2.0 | 38.8 | 1.3 | 38.6 | 3.0 | 38.9 | 1.9 |
| FPR5(c.8) | 19.1 | 1.0 | 19.1 | 2.6 | 19.1 | 1.7 | 18.8 | 0.9 | 18.8 | 2.4 | 18.8 | 1.5 |
| TPR5(c.2) | 92.8 | 4.5 | 93.4 | 4.6 | 93.2 | 4.6 | 91.9 | 4.3 | 91.9 | 4.5 | 91.9 | 4.5 |
| TPR5(c.4) | 86.9 | 5.1 | 87.0 | 5.2 | 87.0 | 5.1 | 89.2 | 4.7 | 89.2 | 5.1 | 89.2 | 5.1 |
| TPR5(c.6) | 78.5 | 6.8 | 78.3 | 6.9 | 78.4 | 6.8 | 78.4 | 6.5 | 78.4 | 6.8 | 78.4 | 6.8 |
| TPR5(c.8) | 61.2 | 7.9 | 60.0 | 8.6 | 60.6 | 8.0 | 62.2 | 7.7 | 62.2 | 8.0 | 62.2 | 8.0 |
| NPV5(c.2) | 99.2 | 0.5 | 99.3 | 0.5 | 99.3 | 0.5 | 99.1 | 0.5 | 99.1 | 0.5 | 99.1 | 0.5 |
| NPV5(c.4) | 99.3 | 0.3 | 99.3 | 0.3 | 99.3 | 0.3 | 99.4 | 0.3 | 99.4 | 0.3 | 99.4 | 0.3 |
| NPV5(c.6) | 99.2 | 0.3 | 99.2 | 0.3 | 99.2 | 0.3 | 99.2 | 0.3 | 99.2 | 0.3 | 99.2 | 0.3 |
| NPV5(c.8) | 99.0 | 0.3 | 98.9 | 0.3 | 99.0 | 0.3 | 99.0 | 0.3 | 99.0 | 0.3 | 99.0 | 0.3 |
| PPV5(c.2) | 2.5 | 0.4 | 2.5 | 0.4 | 2.5 | 0.4 | 2.5 | 0.4 | 2.5 | 0.4 | 2.5 | 0.4 |
| PPV5(c.4) | 3.1 | 0.5 | 3.1 | 0.6 | 3.1 | 0.5 | 3.2 | 0.6 | 3.2 | 0.6 | 3.2 | 0.6 |
| PPV5(c.6) | 4.2 | 0.7 | 4.2 | 0.9 | 4.2 | 0.8 | 4.2 | 0.8 | 4.3 | 0.9 | 4.2 | 0.8 |
| PPV5(c.8) | 6.5 | 1.3 | 6.4 | 1.7 | 6.5 | 1.4 | 6.8 | 1.4 | 6.9 | 1.6 | 6.8 | 1.4 |
| AUC | 75.2 | 4.1 | 74.9 | 4.5 | 75.1 | 4.2 | 75.8 | 3.9 | 75.5 | 4.3 | 75.7 | 4.2 |
| FPR at TPR = .9 | 65.0 | 13.9 | 66.3 | 14.5 | 65.8 | 13.8 | 58.7 | 8.4 | 58.5 | 14.2 | 58.8 | 12.5 |
| NPV at TPR = .9 | 99.4 | 0.3 | 99.4 | 0.3 | 99.4 | 0.3 | 99.4 | 0.7 | 99.4 | 1.0 | 99.4 | 0.9 |
| PPV at TPR = .9 | 2.9 | 0.8 | 2.8 | 0.8 | 2.9 | 0.8 | 3.2 | 0.2 | 3.3 | 0.2 | 3.2 | 0.2 |

Across all accuracy measures, the point estimates from the three subsampling settings are close to one another, and the sampling variability of the estimators decreases as the number of controls increases. As in the simulation studies, however, the gain in precision is most pronounced for the FPR estimates. An NCC design with m = 3 appears to yield accuracy estimators with precision comparable to that of the full cohort for most measures. Based on NCC samples with m = 3, the estimated AUC is about 0.75, with a standard error of about 0.04 and a 95% CI of (0.67, 0.84). These estimates suggest that the new score incorporating CRP information has moderate accuracy in predicting the 5-year risk of CVD events. One use of the risk score is to recommend preventive strategies, such as statin therapy, to patients who test positive on the score-based test. If a low false negative rate, say 10% (i.e., TPR = 0.9), is desired, a decision rule based on the corresponding threshold would yield an FPR of about 65% (s.e. 14%), a PPV of 2.9% (s.e. 0.8%) and an NPV of 99% (s.e. 0.3%).
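The threshold-based decision rule can be illustrated with a simplified sketch. It assumes a binary outcome D = I(T ≤ t) that is observed for every sampled subject (so it ignores censoring, which the CNA and DIPW estimators are designed to handle) and uses inverse sampling-probability weights w_i = 1/p_i for the NCC subjects. The cut-off c is taken as the largest observed marker value whose weighted TPR is still at least the target, and FPR, PPV and NPV are then evaluated at that c for the rule "positive if marker > c". The function names and the toy data are hypothetical.

```python
def weighted_rates(markers, outcomes, weights, c):
    """IPW-weighted TPR, FPR, PPV and NPV for the rule 'positive if marker > c'."""
    tp = fp = fn = tn = 0.0
    for y, d, w in zip(markers, outcomes, weights):
        positive = y > c
        if d == 1 and positive:
            tp += w
        elif d == 1:
            fn += w
        elif positive:
            fp += w
        else:
            tn += w
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    ppv = tp / (tp + fp) if tp + fp > 0 else float("nan")
    npv = tn / (tn + fn) if tn + fn > 0 else float("nan")
    return tpr, fpr, ppv, npv


def threshold_for_tpr(markers, outcomes, weights, target=0.9):
    """Largest candidate cut-off whose weighted TPR is still >= target.

    TPR(c) is non-increasing in c, so scanning sorted candidates and keeping
    the last one that meets the target gives the largest such c.
    """
    best = float("-inf")  # 'everyone positive' always has TPR = 1
    for c in sorted(set(markers)):
        if weighted_rates(markers, outcomes, weights, c)[0] >= target:
            best = c
    return best


if __name__ == "__main__":
    # Toy NCC-like sample: cases weight 1, controls weight 4
    # (an assumed sampling probability of 1/4 for each control).
    markers = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5]
    outcomes = [0, 0, 1, 1, 0, 0, 1, 0]
    weights = [4.0 if d == 0 else 1.0 for d in outcomes]
    c = threshold_for_tpr(markers, outcomes, weights, 0.9)
    print(c, weighted_rates(markers, outcomes, weights, c))
```

On this toy sample the chosen cut-off is 0.2, which gives TPR 1.0, FPR 0.6, PPV 0.2 and NPV 1.0; with few cases the achievable TPR values are coarse, so the attained TPR can exceed the target.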
5 Remarks
Ensuring adequate validation of a prediction model is one of the major challenges in prognostic tool development. In this paper, we proposed nonparametric estimators of prognostic accuracy measures for novel markers using data generated by an NCC design within a prospective cohort study. By combining kernel smoothing with IPW, our proposed estimators are robust and broadly applicable to complex settings in which censoring depends on the marker and marker information is missing by design. Results from extensive simulation studies and practical examples suggest that commonly used time-dependent accuracy measures can be estimated well from NCC data. In general, we find that the CNA approach works well for smaller cohorts, provided that there is a sufficient number of cases. For example, we also conducted simulation studies with n = 2000 under settings similar to those described above but with a slightly higher event rate, yielding about 300 cases by the end of the study. As shown in Table 6, the proposed point and interval estimates of the accuracy measures perform well in this setting.
Table 6. (a)

| | Truth | Bias | Bias | SSE | SSE | ASE | ASE | CovP | CovP |
|---|---|---|---|---|---|---|---|---|---|
| FPRt0(c.2) | 77.4 | 0.1 | 0.1 | 2.6 | 2.7 | 2.6 | 2.6 | 94.0 | 94.7 |
| FPRt0(c.4) | 55.4 | 0.1 | 0.2 | 3.1 | 3.1 | 3.1 | 3.1 | 94.4 | 94.8 |
| FPRt0(c.6) | 34.3 | 0.1 | 0.1 | 2.8 | 2.8 | 2.9 | 2.9 | 95.8 | 96.5 |
| FPRt0(c.8) | 14.7 | 0.3 | 0.3 | 2.1 | 2.1 | 2.1 | 2.1 | 95.2 | 95.0 |
| TPRt0(c.2) | 96.8 | −0.2 | −0.2 | 1.1 | 1.1 | 1.2 | 1.2 | 95.5 | 95.2 |
| TPRt0(c.4) | 89.3 | −0.5 | −0.5 | 1.9 | 2.0 | 2.1 | 2.1 | 96.9 | 96.5 |
| TPRt0(c.6) | 76.2 | −0.8 | −0.8 | 2.8 | 2.8 | 3.0 | 3.0 | 96.2 | 95.9 |
| TPRt0(c.8) | 53.6 | −0.9 | −0.1 | 3.6 | 3.5 | 3.6 | 3.7 | 94.3 | 94.9 |
| NPVt0(c.2) | 97.8 | −0.1 | −0.1 | 0.8 | 0.7 | 0.8 | 0.8 | 95.2 | 94.4 |
| NPVt0(c.4) | 96.4 | −0.1 | −0.1 | 0.7 | 0.7 | 0.7 | 0.7 | 96.7 | 95.7 |
| NPVt0(c.6) | 94.6 | −0.1 | −0.1 | 0.7 | 0.7 | 0.7 | 0.7 | 95.8 | 95.4 |
| NPVt0(c.8) | 92.1 | −0.1 | −0.1 | 0.8 | 0.8 | 0.8 | 0.8 | 95.6 | 95.5 |
| PPVt0(c.2) | 16.4 | −0.2 | −0.2 | 1.2 | 1.1 | 1.2 | 1.2 | 93.7 | 94.2 |
| PPVt0(c.4) | 20.2 | −0.3 | −0.3 | 1.5 | 1.5 | 1.5 | 1.5 | 95.0 | 94.0 |
| PPVt0(c.6) | 25.9 | −0.5 | −0.5 | 2.1 | 2.1 | 2.2 | 2.2 | 95.8 | 93.9 |
| PPVt0(c.8) | 36.4 | −0.1 | −0.1 | 3.4 | 3.5 | 3.6 | 3.6 | 95.2 | 93.8 |
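The columns of Table 6 summarize simulation replicates. A small sketch of how such summaries are typically computed follows, under assumed definitions: Bias is the mean estimate minus the truth, SSE is the empirical standard deviation of the estimates, ASE is the average of the estimated standard errors (some authors use the root mean square instead), and CovP is the percentage of 95% Wald intervals (estimate ± 1.96 × SE) that contain the truth. The function name is hypothetical, and the paper's exact definitions may differ.

```python
from statistics import mean, stdev


def summarize_replicates(estimates, std_errors, truth, z=1.959964):
    """Simulation summaries for one accuracy measure.

    Bias : mean of the estimates minus the true value
    SSE  : empirical standard deviation of the estimates
    ASE  : average of the estimated standard errors (assumed definition)
    CovP : percentage of Wald intervals est +/- z*se that contain the truth
    """
    bias = mean(estimates) - truth
    sse = stdev(estimates)
    ase = mean(std_errors)
    covered = sum(abs(est - truth) <= z * se
                  for est, se in zip(estimates, std_errors))
    covp = 100.0 * covered / len(estimates)
    return {"Bias": bias, "SSE": sse, "ASE": ase, "CovP": covp}


if __name__ == "__main__":
    # Four toy replicates; in practice there would be hundreds.
    print(summarize_replicates([0.5, 0.7, 0.6, 0.8], [0.1] * 4, truth=0.6))
```

Agreement between SSE and ASE indicates that the variance estimator is accurate, and CovP near 95 indicates well-calibrated intervals.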
Biological samples collected from cohort members in large studies are often limited and should be used as efficiently as possible. Our proposed approach will enable researchers to make efficient use of existing resources collected in large cohort studies, such as the Nurses' Health Study (Colditz et al., 1997) or the Health Professionals Follow-up Study (Hunter et al., 1992), while maintaining scientific rigor in validating novel models for predicting patients' future risk and prognosis. Depending on the quantity of interest, 1:1 matching (m = 1) may provide sufficient estimation precision. Most of the precision gain from a larger m accrues to FPR estimation. When the desired FPR level is low, the CI tends to be narrow, so reasonable precision for most accuracy measures may be achieved with a small m. In practice, with m = 3, most of the accuracy estimates appear to achieve reasonable efficiency relative to those obtained from the full cohort. This echoes the finding in the literature that, for testing the significance of a single binary covariate, the efficiency of a design with m matched controls per case relative to the use of all controls is m/(m + 1) (Ury, 1975; Breslow et al., 1983).
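For a rough sense of the diminishing returns from adding controls, the m/(m + 1) relative efficiency quoted above can be tabulated directly. This is only the heuristic from the binary-covariate testing setting cited in the text, not an efficiency result for the accuracy estimators proposed here; the function name is hypothetical.

```python
from fractions import Fraction


def ncc_relative_efficiency(m):
    """Efficiency m / (m + 1) of m matched controls per case relative to all
    controls, as quoted for testing a single binary covariate."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return Fraction(m, m + 1)


if __name__ == "__main__":
    for m in (1, 2, 3, 4, 5, 10):
        print(f"m = {m:2d}: relative efficiency = {float(ncc_relative_efficiency(m)):.3f}")
```

Under this heuristic, moving from m = 1 to m = 3 raises the relative efficiency from 0.50 to 0.75, while going further to m = 10 adds comparatively little (about 0.91).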
We established the asymptotic equivalence between estimators derived under finite population sampling and under Bernoulli sampling. This suggests that, in practice, sampling with and without replacement can lead to estimators with similar efficiency when appropriate weights are used. Although we show that the asymptotic variances of the CNA estimators are not influenced by the choice of bandwidth h, provided that h is of the correct order, the selection of h in a particular dataset requires special attention in practice to ensure stable estimation. When C is independent of Y and T, the DIPW estimator may be a useful alternative to the CNA estimator: it does not require smoothing and thus may be more stable when the number of cases is small. However, this estimator should be used with caution, as it is prone to bias when the censoring pattern changes with the marker values. The current development considers the predictive accuracy for I(T ≤ t) at a pre-specified time point t. When there are multiple time points of interest, one may obtain accuracy estimates at each of them. The asymptotic derivations given in the Appendix can be used to show that properly standardized accuracy estimates over time converge jointly to a multivariate normal distribution. This allows one to construct simultaneous CIs for these parameters that account for multiple comparisons.
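One standard way to use a joint normal limit for simultaneous inference is a max-|Z| (sup-t) critical value: simulate Z from N(0, Σ), take the 95% quantile of max_k |Z_k|/σ_k, and use it in place of 1.96 at every time point. The sketch below assumes that an estimated covariance matrix Σ of the accuracy estimates at the chosen time points is already available (for example, from the asymptotic expansions or a resampling procedure). It is a generic construction rather than a procedure given in the paper, and all function names are hypothetical.

```python
import random


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be positive definite)."""
    k = len(cov)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][r] * L[j][r] for r in range(j))
            if i == j:
                L[i][j] = (cov[i][i] - s) ** 0.5
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def simultaneous_critical_value(cov, level=0.95, n_draws=20000, seed=1):
    """Monte Carlo quantile of max_k |Z_k| / sd_k with Z ~ N(0, cov)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    k = len(cov)
    sds = [cov[i][i] ** 0.5 for i in range(k)]
    maxima = []
    for _ in range(n_draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        z = [sum(L[i][r] * e[r] for r in range(i + 1)) for i in range(k)]
        maxima.append(max(abs(z[i]) / sds[i] for i in range(k)))
    maxima.sort()
    return maxima[int(level * n_draws) - 1]


def simultaneous_cis(estimates, cov, level=0.95, **kwargs):
    """Intervals est_k +/- c * sd_k that jointly cover with about `level`."""
    c = simultaneous_critical_value(cov, level, **kwargs)
    return [(est - c * cov[i][i] ** 0.5, est + c * cov[i][i] ** 0.5)
            for i, est in enumerate(estimates)]
```

With a single time point the critical value is close to 1.96; with several weakly correlated time points it grows, and strong positive correlation across time points keeps it closer to 1.96 than a Bonferroni correction would.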
Compared with the case-cohort design, the individually matched NCC design has the known weakness that biomarker information on controls is limited to testing the specific study hypotheses. The proposed IPW approach to analyzing NCC data overcomes this design limitation. Indeed, when selected individuals are weighted by the inverse of their sampling probabilities, they provide representative data on the entire cohort and can be used for additional evaluations with a different outcome. The IPW approach, however, may not be the most efficient. When auxiliary variables are available, it would be interesting to improve estimation efficiency via augmentation. For example, one may consider efficient estimators along the lines of Robins et al. (1994), or construct an optimal augmentation procedure within a pre-specified class of functionals as in Bang & Tsiatis (2000, 2002). The work presented here is an initial step toward future development in that direction.
Appendix
Throughout, let Ni(t) = I(Xi ≤ t)δi, , π(t) = P (Xi ≥ t),
We assume that C has finite support [0, τ], which is shorter than the support of T. The marker Y is assumed to be continuous and bounded. Throughout, unless noted otherwise, the supremum over time t is taken over [0, τ]. We use ≲ to denote inequality up to a multiplicative constant and ≃ to denote equality up to o_p(1), uniformly unless specified otherwise. For the kernel function K and marker Y, we make the same assumptions as in Du & Akritas (2002), including: (i) K is a symmetric probability density function with finite support and a bounded second derivative; (ii) the distribution function of Y has bounded second and third derivatives, with inf_x f(x) > 0, where .
A Equivalence Between the Finite Population Sampling with True Weights and the Bernoulli Sampling with Estimated Weights
Here, we show that, in general, the IPW estimators obtained under the two sampling schemes are asymptotically equivalent to first order.
A.1 Asymptotic Variance with Finite Population Sampling
Our proposed estimators based on finite population sampling involve the sampling variables {, …, }, which are weakly dependent conditional on . To establish the consistency and asymptotic normality of the proposed estimators, one may account for this weak dependence using a law of large numbers (Cai, 2005) and a central limit theorem (Zhang, 2000) for sequences of asymptotically linear negative quadrant dependent random variables. Here, we focus primarily on the derivation of the asymptotic variances and outline the justification for the following lemma:
Lemma 1 Let ξ(·) be a given function of D = (X, δ, Y)ᵀ such that E{ξ(D)} = 0, E{ξ(D)2} < ∞ and the total variation of ξ(D) is bounded by a constant. Then the random variable of the form
has asymptotic variance
(A.1)
where ηξ(u) = E{ξ(Di)I(Xi > u)(1 − pi)/pi} and pi = δi + (1 − δi){1 − Gm(Xi)}.
To obtain the asymptotic variance, we note that from Samuelsen (1997), , and for i ≠ j,
where
On the other hand, since .
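The inclusion probabilities p_i = δ_i + (1 − δ_i){1 − G_m(X_i)} in Lemma 1 can be computed from the observed cohort. The sketch below assumes that 1 − G_m(X_i) is the probability that a non-case is ever selected as a control, computed in the manner of Samuelsen (1997) as 1 − ∏{1 − m/(n(X_j) − 1)} over case times X_j ≤ X_i, where n(t) is the number of subjects at risk at t. This reading of G_m, the absence of tied times, and the function name are assumptions made for illustration. Cases receive p_i = 1, and the IPW weight of a sampled subject is 1/p_i.

```python
def ncc_inclusion_probabilities(times, events, m):
    """Probability p_i that subject i enters an NCC sample with m controls per case.

    Cases: p_i = 1. Non-cases: 1 - prod over cases j with X_j <= X_i of
    (1 - m / (n(X_j) - 1)), where n(t) = #{k : X_k >= t} counts the risk set
    including the case itself (Samuelsen-type weights; no tied times assumed).
    A non-case never at risk at any case time gets p_i = 0 and is never sampled.
    """
    n = len(times)
    at_risk = {j: sum(1 for x in times if x >= times[j])
               for j in range(n) if events[j] == 1}
    probs = []
    for i in range(n):
        if events[i] == 1:
            probs.append(1.0)
            continue
        never_sampled = 1.0
        for j, n_j in at_risk.items():
            if times[j] <= times[i]:
                # Chance that i is not among the m controls drawn for case j;
                # n_j >= 2 here because both i and j are at risk.
                never_sampled *= max(0.0, 1.0 - m / (n_j - 1))
        probs.append(1.0 - never_sampled)
    return probs


if __name__ == "__main__":
    times = [2.0, 5.0, 1.0, 7.0, 3.0, 6.0]
    events = [1, 0, 1, 0, 1, 0]
    p = ncc_inclusion_probabilities(times, events, m=1)
    print(p, [1.0 / q for q in p if q > 0])  # probabilities and IPW weights
```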
A.2 Bernoulli Sampling with Estimated Weights
Here we derive the asymptotic variance for a statistic of the form
for some deterministic function ξ. First, note that conditional on , {Bij} are independent Bernoulli random variables with success probability , and {, …, } are independent Bernoulli random variables with success probability . By standard empirical process theory (Pollard, 1990), it is not difficult to show that, uniformly over t ∈ [0, τ], conditional on
(A.2)
which converges weakly to a zero-mean Gaussian process. This, together with a taylor expansion and a uniform law of large numbers (ULLN) (Pollard, 1990), implies that
(A.3)
where , and
We next approximate . Since and ,
(A.4)
(A.5)
Conditional on , Bij′ and are independent when j′ ≠ j. Thus, the covariance between and given is
Since implies Bij = 0,
This, together with a ULLN, implies that
(A.6)
On the other hand, . Thus, we have
Lemma 2 Let ξ(·) be a given function of D such that E{ξ(D)} = 0, E{ξ(D)2} < ∞ and the total variation of ξ(D) is bounded by a constant. Then the random variable of the form
has asymptotic variance
(A.7)
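The equivalence in Appendix A can also be checked numerically on a toy cohort. The sketch below compares empirical inclusion frequencies under finite population sampling (m controls drawn without replacement from each risk set) and Bernoulli sampling, where each subject in a risk set of size n(X_j) − 1 is assumed to be selected independently with probability m/(n(X_j) − 1), so that the expected number of controls per case is also m. This particular success probability is an assumption made for the illustration. Both schemes should give frequencies close to the Samuelsen-type probabilities 1 − ∏{1 − m/(n(X_j) − 1)}. Function names are hypothetical.

```python
import random


def risk_sets(times, events):
    """Map each case index j to its risk set {i != j : times[i] >= times[j]}."""
    return {j: [i for i in range(len(times)) if i != j and times[i] >= times[j]]
            for j in range(len(times)) if events[j] == 1}


def draw_finite_population(rsets, m, rng):
    """Cases plus m controls per case drawn without replacement."""
    chosen = set()
    for j, rs in rsets.items():
        chosen.add(j)
        chosen.update(rng.sample(rs, min(m, len(rs))))
    return chosen


def draw_bernoulli(rsets, m, rng):
    """Cases plus independent Bernoulli(m / |risk set|) selection per subject."""
    chosen = set()
    for j, rs in rsets.items():
        chosen.add(j)
        prob = min(1.0, m / len(rs)) if rs else 0.0
        chosen.update(i for i in rs if rng.random() < prob)
    return chosen


def inclusion_frequencies(times, events, m, scheme, n_rep=20000, seed=7):
    """Fraction of replicate samples in which each subject is selected."""
    rng = random.Random(seed)
    rsets = risk_sets(times, events)
    counts = [0] * len(times)
    for _ in range(n_rep):
        for i in scheme(rsets, m, rng):
            counts[i] += 1
    return [c / n_rep for c in counts]


if __name__ == "__main__":
    times = [2.0, 5.0, 1.0, 7.0, 3.0, 6.0]
    events = [1, 0, 1, 0, 1, 0]
    print(inclusion_frequencies(times, events, 1, draw_finite_population))
    print(inclusion_frequencies(times, events, 1, draw_bernoulli))
```

The two schemes differ in the number of controls per case (fixed at m versus random with mean m), but their marginal inclusion probabilities coincide, which is why the weighted estimators agree to first order.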
B Uniform Consistency of the Absolute Risk and Accuracy Estimators under Bernoulli Sampling
To establish consistency, we assume that h = O(n^(−ν)) with ν ∈ [1/5, 1/2). We first establish the following uniform convergence rate for :
(B.1)
where
and . By Lemma A.3 in Bilias et al. (1997), it suffices to show that
(B.2)
(B.3)
where Ay(t) = E{Ni(t) | Yi = y}, πy(t) = P(Xi ≥ t | Yi = y).
First, we note that since ,
where . Then (B.2) follows immediately from Du & Akritas (2002). To show (B.3), we let and write , where
and
For ∊1y(t), we first note that from a functional central limit theorem (FCLT) (Pollard, 1990), converges weakly to a zero-mean Gaussian process in s and thus
(B.4)
This, together with (A.2), a ULLN and Lemma A.3 of Bilias et al. (1997), yields
(B.5)
where ηHπ(s; x, t) = E{I(t > Xi ≥ s, Yi ≤ x)(1 − pi)/pi}. Conditional on , are independent with mean 0. Furthermore, can be written as differences of monotone functions with a constant bound and thus has finite pseudo-dimension (Pollard, 1990). Then by a FCLT, conditional on , converges weakly to a zero-mean Gaussian process in (x, t). This, together with the standard arguments given in Bickel & Rosenblatt (1973), implies that . On the other hand, from Du & Akritas (2002), . This concludes the proof of (B.3), and thus we have (B.1).
The convergence of in (B.1) implies that is uniformly consistent for Sy(t). Since , the uniform consistency of for follows immediately. The convergences of and along with a continuous mapping theorem and Lemma A.3 of Bilias et al. (1997) imply the uniform consistency of and all the proposed accuracy measure estimators.
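To make the quantities A_y(t) and π_y(t) concrete, the sketch below forms kernel-weighted IPW analogues of them and combines their increments into a conditional Nelson–Aalen-type hazard, Λ̂_y(t) = Σ_{s ≤ t} dÂ_y(s)/π̂_y(s), with Ŝ_y(t) = exp{−Λ̂_y(t)}. It assumes a triweight kernel (a symmetric density on [−1, 1] that is twice continuously differentiable, in line with assumption (i) at the start of the Appendix) and precomputed IPW weights w_i. It illustrates the general technique and is not claimed to be the exact CNA estimator of the paper. Because K_h(u) = K(u/h)/h enters both the numerator and the denominator, the factor 1/h cancels and is omitted. The function names are hypothetical.

```python
import math


def triweight(u):
    """Triweight kernel 35/32 (1 - u^2)^3 on [-1, 1]."""
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0


def conditional_survival(y, t, times, events, markers, weights, h, kernel=triweight):
    """Kernel-weighted IPW estimate of S_y(t) = P(T > t | Y = y).

    Subject i carries weight w_i * K((Y_i - y) / h). At each observed event
    time s <= t, the hazard increment is (weighted events at s) divided by
    (weighted number at risk at s); the increments are summed into a
    conditional cumulative hazard and S_y(t) = exp(-hazard).
    """
    kw = [w * kernel((m - y) / h) for m, w in zip(markers, weights)]
    hazard = 0.0
    event_times = sorted({x for x, d in zip(times, events) if d == 1 and x <= t})
    for s in event_times:
        at_risk = sum(k for x, k in zip(times, kw) if x >= s)
        dn = sum(k for x, d, k in zip(times, events, kw) if d == 1 and x == s)
        if at_risk > 0:
            hazard += dn / at_risk
    return math.exp(-hazard)
```

When all markers lie near y and all weights are equal, the kernel weights cancel and the estimate reduces to the ordinary Nelson–Aalen-based survival estimate; subjects whose markers are farther than h from y receive zero weight.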
C Asymptotic Distribution of Accuracy Estimators under Bernoulli Sampling
To obtain an asymptotic expansion for the proposed accuracy estimators, we first obtain approximations for and . To remove the potential bias in the accuracy estimators due to kernel smoothing, we now require h = O(n^(−ν)) with 1/4 < ν < 1/2. From the asymptotic approximations given in Appendix B for and , as well as the arguments given in Bickel & Rosenblatt (1973), we have
(C.1)
where , .
Next, noting that in probability, we write
where . From (A.2) and similar arguments as given for the approximation of (A.3), we have , where
and . Conditional on , are independent with mean 0 and finite pseudo-dimension. Thus, by a FCLT, converges weakly to a zero-mean Gaussian process . On the other hand, is tight and weakly convergent to a zero-mean Gaussian process . Since and are independent conditional on , converges weakly to .
We next approximate the distribution of . Since and , we have
It follows from (C.1) that
where . By a change of variable ψ = (y − Yi)/h,
Therefore,
where . Using similar arguments as given above, a FCLT may be used to show that conditional on , converges weakly to a zero-mean Gaussian process . On the other hand, also converges weakly to a zero-mean Gaussian process . Therefore, converges weakly to .
Furthermore, it is not difficult to show that the weak convergence of and holds jointly. The asymptotic distribution of the accuracy estimators follows directly from the joint distribution of and . This, together with the functional delta method, implies the following approximations for , , , and ,
The same arguments as given above can then be used to establish the weak convergence for these processes and obtain the asymptotic variance based on (A.7). For example, since with , in distribution, where , and .
To establish the weak convergence of the ROC curve estimator, we first note that the arguments above can be extended to show that the weak convergences of the two processes, and (c), hold jointly. This, together with the stochastic equicontinuity of these processes, implies that for u ∈ [ul, ur] ⊂ (0, 1),
where ROĊt(u) = ∂ROCt(u)/∂u. It follows that converges weakly to a zero-mean Gaussian process.
REFERENCES
- Bang H, Tsiatis A. Estimating Medical Costs with Censored Data. Biometrika. 2000;87:329–343.
- Bang H, Tsiatis A. Median regression with censored cost data. Biometrics. 2002;58:643–649. doi: 10.1111/j.0006-341x.2002.00643.x.
- Beran R. Nonparametric regression with randomly censored survival data. Univ. of California; Berkeley: 1981. Unpublished manuscript.
- Bickel PJ, Rosenblatt M. On some global measures of the deviations of density function estimates (Corr: V3 p1370). Ann. Statist. 1973;1:1071–1095.
- Bilias Y, Gu M, Ying Z. Towards a general asymptotic theory for Cox model with staggered entry. Ann. Statist. 1997;25:662–682.
- Breslow N, Lubin J, Marek P, Langholz B. Multiplicative models and cohort analysis. J. Am. Statist. Assoc. 1983;78:1–12.
- Breslow N, Wellner J. Weighted Likelihood for Semiparametric Models and Two-phase Stratified Samples, with Application to Cox Regression. Scand. J. Statist. 2006;34:86–102. doi: 10.1111/j.1467-9469.2007.00574.x.
- Cai G. Almost sure convergence for linear process generated by asymptotically linear negative quadrant dependence processes. Commun. Korean Math. Soc. 2005;20:161–168.
- Cai T, Pepe M, Zheng Y, Lumley T, Jenny N. The sensitivity and specificity of markers for event times. Biostatistics. 2006;7:182–97. doi: 10.1093/biostatistics/kxi047.
- Chen K. Generalized case-cohort sampling. J. R. Statist. Soc. B. 2001;63:791–809.
- Colditz G, Manson J, Hankinson S. The Nurses' Health Study: 20-year contribution to the understanding of health among women. Journal of Women's Health. 1997;6:49–62. doi: 10.1089/jwh.1997.6.49.
- Cook N, Buring J, Ridker P. The effect of including C-reactive protein in cardiovascular risk prediction models for women. Ann. Intern. Med. 2006;145:21. doi: 10.7326/0003-4819-145-1-200607040-00128.
- Cox D. Regression models and life-tables. J. R. Statist. Soc. B. 1972;34:187–220.
- Dabrowska D. Uniform consistency of the kernel conditional Kaplan-Meier estimate. Ann. Statist. 1989;17:1157–67.
- Du Y, Akritas M. IID representations of the conditional Kaplan-Meier process for arbitrary distributions. Math. Meth. Statist. 2002;11:152–82.
- Food and Drug Administration. Test determines risk of breast cancer returning. 2007. http://www.fda.gov/ForConsumers/ConsumerUpdates/ucm048477.htm.
- Gail M, Brinton L, Byar D, Corle D, Green S, Schairer C, Mulvihill J. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J. Natl. Cancer Inst. 1989;81:1879–86. doi: 10.1093/jnci/81.24.1879.
- Goldstein L, Langholz B. Asymptotic theory for nested case-control sampling in the Cox regression model. Ann. Statist. 1992;20:1903–28.
- Heagerty P, Lumley T, Pepe M. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000;56:337–44. doi: 10.1111/j.0006-341x.2000.00337.x.
- Heagerty P, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61:92–105. doi: 10.1111/j.0006-341X.2005.030814.x.
- Hunter D, Rimm E, Sacks F, Stampfer M, Colditz G, Litin L, Willett W. Comparison of measures of fatty acid intake by subcutaneous fat aspirate, food frequency questionnaire, and diet records in a free-living population of US men. American Journal of Epidemiology. 1992;135:418–27. doi: 10.1093/oxfordjournals.aje.a116302.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. John Wiley & Sons; 2002.
- Kannel W, Feinleib M, McNamara P, Garrison R, Castelli W. An investigation of coronary heart disease in families: The Framingham Offspring Study. American Journal of Epidemiology. 1979;110:281–90. doi: 10.1093/oxfordjournals.aje.a112813.
- Léon L, Cai T, Wei L. Robust Inferences For Covariate Effects On Survival Time With Censored Linear Regression Models. Statistics in Biosciences. 2009;1:1–15.
- Nan B, Kalbfleisch J, Yu M. Asymptotic theory for the semiparametric accelerated failure time model with missing data. Ann. Statist. 2009;37:2351–2376.
- Pepe M, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am. J. Epidemiol. 2004;159:882–90. doi: 10.1093/aje/kwh101.
- Pollard D. Empirical processes: theory and applications. Institute of Mathematical Statistics; 1990.
- Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J. Am. Statist. Assoc. 1994;89:846–866.
- Rundle A, Vineis P, Ahsan H. Design options for molecular epidemiology research within cohort studies. Cancer Epidemiology Biomarkers & Prevention. 2005;14:1899. doi: 10.1158/1055-9965.EPI-04-0860.
- Samuelsen S. A pseudolikelihood approach to analysis of nested case-control studies. Biometrika. 1997;84:379–394.
- Tsiatis A. A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences of the United States of America. 1975;72:20. doi: 10.1073/pnas.72.1.20.
- Ury H. Efficiency of case-control studies with multiple controls per case: continuous or dichotomous data. Biometrics. 1975;31:643–649.
- van der Vaart A. Weak convergence of smoothed empirical processes. Scand. J. Statist. 1994;21:501–4.
- Ware J. The limitations of risk factors as prognostic tools. N. Engl. J. Med. 2006;355:2615. doi: 10.1056/NEJMp068249.
- Wilson P, D'Agostino R, Levy D, Belanger A, Silbershatz H, Kannel W. Prediction of coronary heart disease using risk factor categories. Circulation. 1998;97:1837–47. doi: 10.1161/01.cir.97.18.1837.
- Zhang L. A functional central limit theorem for asymptotically negatively dependent random fields. Acta Mathematica Hungarica. 2000;86:237–259.
- Zheng Y, Cai T, Pepe M, Levy W. Time-dependent predictive values of prognostic biomarkers with failure time outcome. J. Am. Statist. Assoc. 2008;103:362–8. doi: 10.1198/016214507000001481.