Journal of Applied Statistics. 2020 Dec 21;48(8):1374–1401. doi: 10.1080/02664763.2020.1861225

Empirical evaluation of sub-cohort sampling designs for risk prediction modeling

Myeonggyun Lee a, Anne Zeleniuch-Jacquotte a,b, Mengling Liu a,b
PMCID: PMC9042011  PMID: 35706464

ABSTRACT

Sub-cohort sampling designs, such as nested case-control (NCC) and case-cohort (CC) studies, have been widely used to estimate biomarker-disease associations because of their cost effectiveness. These designs have been well studied and shown to maintain relatively high efficiency compared to full-cohort designs, but their performance in building risk prediction models has been less studied. Moreover, sub-cohort sampling designs often use matching (or stratification) to further control for confounders or to reduce measurement error, so their predictive performance depends on both the design and the matching procedure. Based on a dataset from the NYU Women's Health Study (NYUWHS), we performed Monte Carlo simulations to systematically evaluate risk prediction performance under NCC, CC, and full-cohort studies. Our simulations demonstrate that sub-cohort sampling designs can have predictive accuracy (i.e. discrimination and calibration) similar to that of the full-cohort design, but can be sensitive to the matching procedure used. Our results suggest that researchers can perform NCC and CC studies with large potential savings in cost and resources, but need to pay particular attention to the matching procedure when developing a risk prediction model in biomarker studies.

KEYWORDS: Calibration, case-cohort, discrimination accuracy, matching, nested case–control, risk prediction

1. Introduction

Predicting the absolute risk of a disease has been a long-standing interest in biomedical research. Many statistical modeling and inference methods have been proposed and applied to build risk prediction models, such as the Gail model for breast cancer [13,11], the Framingham study for cardiovascular disease risk [10,33], and the prostate cancer prediction model [54,52]. Prospective full-cohort studies have traditionally been used to build risk prediction models using information from the entire cohort [41,34,32], which is typically feasible when the prediction models use basic demographic and clinical information.

With advances in knowledge and technology, novel biomarkers have been identified and added to risk modeling to improve predictive accuracy [28,53,29]. Although population-based full-cohort studies provide an ideal setting to study the predictive ability of models with biomarkers, the combination of large sample sizes, low incidence rates, and high costs makes it infeasible to measure new biomarkers on the full cohort [12,14,45].

Sub-cohort sampling designs can help overcome this limitation, and have begun to play a key role in studying new biomarkers and their added value in risk prediction [2,1,50]. Nested case–control (NCC) [24,22] and case-cohort (CC) designs [35] are popular alternatives to the full-cohort design for estimating relative risks. In both NCC and CC designs, all subjects who develop the disease during the study follow-up period are included as cases, but the designs differ in how controls are selected. These designs offer substantial advantages, such as cost savings and efficient estimation [48,23,4,31]. Furthermore, sub-cohort sampling designs often use matching (or stratification) not only to improve efficiency in the estimation of relative risks for the variables of interest, but also to reduce confounding and measurement error; examples include matching on time of sample storage (to limit the impact of biomarker degradation), matching on time of day (to account for circadian cycle effects), and matching on phase/day of the menstrual cycle when studying sex hormones in women [51]. Note that the improvement may be small unless there are strong confounders [47]. Therefore, the performance of risk prediction models will depend on the choice of design as well as the matching procedure.

In this paper, we report the results of a systematic simulation study evaluating the performance of risk prediction models built using NCC designs (with or without matching) and CC designs (with or without stratification) compared to the full-cohort design. In addition to the performance of point estimation, we assess three aspects of risk prediction under NCC and CC designs: discrimination, calibration, and overall performance. Only a few studies have examined the numerical performance of estimation [20,38] and risk prediction [14,40] under sub-cohort sampling designs, so additional numerical investigation is needed to examine risk prediction performance using different methods under NCC and CC designs. For instance, Salim et al. [38] compared the Langholz-Borgan method and Samuelsen’s weighted method for estimating absolute risk from NCC data only, without a comparison to the CC design. Ganna et al. [14] described methods for estimating risk prediction measures from NCC and CC designs, but considered only no matching and fine matching by age and gender. Our study therefore aims to provide a systematic comparison of CC and NCC designs using multiple risk prediction measures and various matching strategies. Sub-cohort sampling designs all involve within-cohort sampling, so each sampled subject has an associated sampling probability. We therefore provide technical details of how to compute risk prediction measures under these sub-cohort designs.

Our simulations were motivated by a study within the NYU Women’s Health Study (NYUWHS), which aimed at breast cancer risk prediction for younger women [51,15,8]. We considered the risk factors included in the Breast Cancer Risk Assessment Tool developed by the National Cancer Institute as part of the Breast Cancer Detection and Demonstration project (http://bcra.nci.nih.gov/brc/start.htm), i.e. age at enrollment, age at menarche, history of benign breast biopsy, age at first full-term pregnancy, family history of breast cancer, and race. In addition to these risk factors, we simulated a new biomarker and included it in the risk prediction model, varying its effect to investigate the estimated relative risk, its efficiency, and the prediction accuracy. Through our extensive simulations, we demonstrate how sensitive the risk prediction measures are to the underlying sub-cohort sampling design with or without matching on confounders, discuss the relative merits of each method, and explain how these methods can be implemented using standard statistical software in the context of building risk prediction models.

2. Methods

2.1. Full cohort design

Consider a full cohort in which $n$ subjects are followed over time. For the $i$th subject, $i=1,\ldots,n$, we observe the survival time $T_i=\min(\tilde{T}_i,C_i)$, where $\tilde{T}_i$ is the true time-to-event (for those who develop the event) and $C_i$ is the censoring time (for those who have not developed the event by the end of follow-up). Let $\delta_i=I(\tilde{T}_i\le C_i)$ define the event indicator, where the indicator function $I(\cdot)$ takes the value 1 if $\tilde{T}_i\le C_i$, and 0 otherwise. Thus, the observed full-cohort time-to-event data consist of $\{(T_i,\delta_i),\ i=1,\ldots,n\}$.

We herein consider a Cox proportional hazards (PH) model for the hazard function of the time-to-event outcome,

$$\lambda_i(t)=\lambda_0(t)\,e^{X_i\beta+Z_i\gamma} \qquad (1)$$

where $\lambda_0(t)$ is the baseline hazard function, $X_i$ is a vector of $p$ covariates of interest, $\beta$ is a $p\times 1$ vector of log hazard ratios associated with $X_i$, and $Z_i$ is a vector of $q$ confounders, with $\gamma$ being the corresponding log hazard ratios. Full-cohort analysis is based on maximizing the partial likelihood,

$$PL(\beta,\gamma)=\prod_{i}\left[\frac{e^{X_i\beta+Z_i\gamma}}{\sum_{j=1}^{n}Y_j(T_i)\,e^{X_j\beta+Z_j\gamma}}\right]^{\delta_i},$$

where $Y_j(t)=I(T_j\ge t)$ is the at-risk indicator function. In R [44], we use the coxph() function in the survival package [46] for the estimation and inference of model (1).

2.2. Nested case–control design

The NCC design uses incidence density sampling, in which, for each incident case, a number of controls are randomly selected from the current at-risk set excluding the case. Note that controls can be selected more than once and a case could serve as control for an incident case occurring before his/her event. Estimation of the log hazard ratio parameters in model (1) with NCC data can be based on the partial likelihood using the Langholz-Borgan (L-B) method [21], which is equivalent to the conditional logistic regression that treats each case–control set as a stratum [24]. Methods with the selection probability weighting [39,5] and pseudo-likelihood method [7,6] have also been proposed.
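The incidence density sampling described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `sample_ncc`, the toy data, and the fixed seed are hypothetical, entry times are ignored (everyone is assumed to enter at time 0), and this is not the procedure or software used in the study.

```python
import random

def sample_ncc(times, events, m=1, seed=0):
    """Return a list of (case_index, [control_indices]) sets.

    times[i]  : observed follow-up time of subject i
    events[i] : 1 if subject i is a case, 0 if censored
    m         : number of controls requested per case
    """
    rng = random.Random(seed)
    sets = []
    # Process cases in order of their event times.
    cases = sorted((i for i, d in enumerate(events) if d == 1),
                   key=lambda i: times[i])
    for i in cases:
        # At-risk set at the case's event time, excluding the case itself.
        # Future cases are eligible, so a subject can serve as a control
        # first and become a case later.
        at_risk = [j for j in range(len(times))
                   if j != i and times[j] >= times[i]]
        k = min(m, len(at_risk))
        # Sampling is without replacement within a set but independent
        # across sets, so the same control may be drawn for several cases.
        sets.append((i, rng.sample(at_risk, k)))
    return sets

# Hypothetical toy cohort of six subjects.
times = [2.0, 3.5, 5.0, 6.0, 7.5, 9.0]
events = [1, 0, 1, 0, 0, 1]
ncc_sets = sample_ncc(times, events, m=1)
```

In this toy cohort the last case has nobody else at risk at its event time, so its control list is empty; a real analysis would need a rule for such incomplete sets.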

When assessing absolute risk is of interest, however, the baseline hazard function must be estimated, and this cannot be done directly using NCC data because cases are over-represented [38]. It is therefore necessary to use a weighting method to obtain an unbiased estimate of the baseline hazard function, based on either the partial likelihood using the L-B method or the selection probability weighting method. The methods proposed in [21] and [39] both use sampling weights to adjust the contribution from controls, but differ in how the weights are modeled. Note that the longer the follow-up duration, the more likely it is for a participant to be selected as a control. Since the sampling probability for an individual depends on the entry time, censoring time, and censoring status available for all cohort subjects, as well as on any matching variables that were used, the missing-at-random (MAR) assumption is required for the validity of all inverse probability weighting approaches [39,37,43]. In the following sections, we briefly review the L-B method and Samuelsen’s weighted method as well as the matching procedure under the NCC study.

2.2.1. Langholz–Borgan method

Under the standard NCC design, the log hazard ratios in Cox PH model (1) are estimated by maximizing the partial likelihood specified below,

$$PL(\beta,\gamma)=\prod_{i}\left[\frac{e^{X_i\beta+Z_i\gamma}}{\sum_{j\in R_i}e^{X_j\beta+Z_j\gamma}}\right]^{\delta_i},$$

where $R_i$ includes case $i$ and his/her selected control(s). Given the log hazard ratio estimates, the cumulative baseline hazard is estimated as

$$\hat{\Lambda}_0(t)=\sum_{i=1}^{n}\frac{\delta_i\,I[T_i\le t]}{w(T_i)\sum_{j\in R_i}e^{X_j\hat{\beta}+Z_j\hat{\gamma}}},$$

where $w(T_i)=\sum_{j=1}^{n}Y_j(T_i)/(m+1)$ and $m$ is the number of controls per case. The weight is simply the inverse of the sampling probability of the nested case–control set $R_i$ at time $T_i$. Under the NCC design, a subject may be selected as a control for more than one case, and may also be included as a case if he/she develops the event at a later time; such a subject is therefore included in more than one case–control set, and is analyzed as if he/she were a distinct individual in each of these sets. Note that the weight is time-dependent, so a control selected multiple times may have different weights at different time points, reflecting the time-dependent nature of the control selection. We perform the L-B method in R by using the coxph() function with each case–control set included as a strata() term [44,46].
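To make the role of the weight w(T_i) = (number at risk at T_i)/(m+1) concrete, the sketch below evaluates the L-B cumulative baseline hazard for a toy set of case–control sets. The function name, the data, and the all-zero linear predictors are hypothetical choices for illustration; in practice the linear predictors would come from the fitted conditional model.

```python
import math

def lb_cumulative_baseline_hazard(t, ncc_sets, times, n_at_risk, lin_pred, m=1):
    """Sum over cases with T_i <= t of 1 / (w(T_i) * sum_{j in R_i} exp(eta_j))."""
    total = 0.0
    for case, controls in ncc_sets:
        if times[case] > t:
            continue
        # w(T_i): size of the full-cohort risk set at T_i divided by (m + 1).
        w = n_at_risk[case] / (m + 1)
        # R_i contains the case and its sampled controls.
        denom = sum(math.exp(lin_pred[j]) for j in [case] + controls)
        total += 1.0 / (w * denom)
    return total

# Hypothetical toy data: two sampled sets with one control each.
times = [2.0, 3.5, 5.0, 6.0, 7.5, 9.0]
ncc_sets = [(0, [3]), (2, [4])]      # (case index, control indices)
n_at_risk = {0: 6, 2: 4}             # full-cohort risk set sizes at each case time
lin_pred = [0.0] * 6                 # assumed linear predictors (all zero)
H5 = lb_cumulative_baseline_hazard(5.0, ncc_sets, times, n_at_risk, lin_pred, m=1)
```

With all linear predictors equal to zero and complete sets of size m+1, each increment reduces to one over the number at risk, so the estimate coincides with the full-cohort Nelson-Aalen sum (here 1/6 + 1/4), which is a convenient sanity check on the weighting.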

2.2.2. Weighted method

Samuelsen [39] proposed a weighted partial likelihood approach to estimate the hazard ratios and the baseline hazard in the NCC study. This method breaks the case–control sets and pools unique individuals for analysis, keeping only one record for each control. Specifically, the weighted partial likelihood based on (1) is

$$PL(\beta,\gamma)=\prod_{i=1}^{n}\left[\frac{e^{X_i\beta+Z_i\gamma}}{\sum_{j=1}^{n}\Delta_j\,Y_j(T_i)\,w_j\,e^{X_j\beta+Z_j\gamma}}\right]^{\delta_i},$$

where $\Delta_j=\delta_j+(1-\delta_j)\tilde{R}_j$ is the indicator of subject $j$ being in the NCC sample, and $\tilde{R}_j$ is the indicator of subject $j$ ever being selected as a control. The weight $w_j$ is the inverse of the sampling probability: it equals 1 if individual $j$ is a case, and for a control it is

$$w_j=\left[1-\prod_{i:\,T_i\le T_j}\left(1-\frac{m}{\sum_{k=1}^{n}Y_k(T_i)-1}\right)\right]^{-1}, \qquad (2)$$

where the product term is simply the probability of never being selected into the NCC study. Given the log hazard ratio estimates, the cumulative baseline hazard can be estimated by using the method in [5],

$$\hat{\Lambda}_0(t)=\sum_{i=1}^{n}\frac{\delta_i\,I[T_i\le t]}{\sum_{j=1}^{n}\Delta_j\,Y_j(T_i)\,w_j\,e^{X_j\hat{\beta}+Z_j\hat{\gamma}}}.$$

In the survival package of the R software, we performed the weighted method to estimate the log hazard ratios by using the coxph() function with the weights calculated as in (2) [44,46].
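The weights in (2) can be computed directly from the cohort follow-up times and event indicators. The following is a minimal sketch under stated assumptions: the function name and toy data are hypothetical, all subjects are assumed to enter at time 0, and the product is taken over the event times of the cases, which is how the selection probabilities arise. It is not the R code used in the study.

```python
def samuelsen_weights(times, events, m=1):
    """Inverse inclusion-probability weights in the spirit of equation (2)."""
    n = len(times)
    case_times = [times[i] for i in range(n) if events[i] == 1]
    weights = []
    for j in range(n):
        if events[j] == 1:
            weights.append(1.0)          # cases are always included
            continue
        p_never = 1.0                    # probability of never being sampled
        for t in case_times:
            if t <= times[j]:            # subject j was at risk at this event time
                n_risk = sum(1 for k in range(n) if times[k] >= t)
                # m controls are drawn from the n_risk - 1 non-case subjects at risk.
                p_never *= 1.0 - min(1.0, m / (n_risk - 1))
        p_incl = 1.0 - p_never
        # A control never at risk at any event time cannot be sampled;
        # its weight is never used, so infinity is only a placeholder.
        weights.append(1.0 / p_incl if p_incl > 0 else float("inf"))
    return weights

# Hypothetical toy cohort.
times = [2.0, 3.5, 5.0, 6.0, 7.5, 9.0]
events = [1, 0, 1, 0, 0, 1]
w = samuelsen_weights(times, events, m=1)
```

For example, the subject censored at 3.5 was eligible only at the first event time, when five non-case subjects were at risk, so its inclusion probability is 1/5 and its weight is 5. Subjects followed longer are eligible at more event times and receive smaller weights, matching the remark that longer follow-up makes selection more likely.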

2.2.3. Matching procedure

When matching is considered, at the time a case is diagnosed, the matched at-risk set consists of all subjects who are at risk and meet the matching criteria. Note that when a variable is exactly matched between cases and controls, the hazard ratio associated with this variable cannot be estimated by the L-B method, but Samuelsen’s weighted method can still estimate it because it breaks the case–control sets. Note also that the sampling weight needs to be calculated within the confounding stratum. Since under the L-B approach the risk set consists only of those subjects still at risk who have the same matching covariate values as the incident case, the stratum-specific weight is simply calculated as

$$w(T_i)=\frac{\left|\sum_{j=1}^{n}Y_j(T_i)\right|_{z=z_i}}{m+1},$$

where $\left|\sum_{j=1}^{n}Y_j(T_i)\right|_{z=z_i}$ is the size of the risk set when $z=z_i$. Similarly, for Samuelsen’s method, the weight of a control accounting for matching becomes

$$w_j=\left[1-\prod_{i:\,T_i\le T_j}\left(1-\frac{m\,I(z_i=z_j)}{\left|\sum_{k=1}^{n}Y_k(T_i)\right|_{z=z_i}-1}\right)\right]^{-1}.$$
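A stratum-specific version of the inclusion-probability weights can be sketched as follows. The only changes relative to the unmatched calculation are that a case contributes to a control's selection probability only when both share the same matching value, and that the risk set is counted within that stratum. The function name, the stratum labels, and the toy data are hypothetical, and entry at time 0 is assumed.

```python
def matched_samuelsen_weights(times, events, strata, m=1):
    """Inverse inclusion-probability weights with exact matching on `strata`."""
    n = len(times)
    cases = [i for i in range(n) if events[i] == 1]
    weights = []
    for j in range(n):
        if events[j] == 1:
            weights.append(1.0)          # cases are always included
            continue
        p_never = 1.0
        for i in cases:
            # Subject j is eligible only if still at risk and in the case's stratum.
            if times[i] <= times[j] and strata[i] == strata[j]:
                n_risk = sum(1 for k in range(n)
                             if times[k] >= times[i] and strata[k] == strata[i])
                p_never *= 1.0 - min(1.0, m / (n_risk - 1))
        p_incl = 1.0 - p_never
        weights.append(1.0 / p_incl if p_incl > 0 else float("inf"))
    return weights

# Hypothetical toy cohort with two matching strata "a" and "b".
times = [2.0, 3.5, 5.0, 6.0, 7.5, 9.0]
events = [1, 0, 1, 0, 0, 1]
strata = ["a", "a", "b", "b", "a", "b"]
w_matched = matched_samuelsen_weights(times, events, strata, m=1)
```

The example also hints at why very fine matching can be fragile: with few subjects per stratum, each matched risk set is small, so the estimated selection probabilities rest on very few subjects.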

2.3. Case-cohort study

The CC design randomly chooses a sub-cohort from the full cohort, and covariate data are assembled for all the cases and for the subjects in the selected sub-cohort [35]. Various methods have been proposed for inference in the Cox PH model under the CC design [35,42,25,3]. Even though these methods differ in how they define risk sets and individual weights, their estimation and inference procedures are all based on maximizing a weighted partial likelihood function with the unified form

$$PL(\beta,\gamma)=\prod_{i}\left[\frac{e^{X_i\beta+Z_i\gamma}}{\sum_{j=1}^{n}\bar{R}_j\,Y_j(T_i)\,w_j\,e^{X_j\beta+Z_j\gamma}}\right]^{\delta_i},$$

where $\bar{R}_j$ indicates the appropriate case-cohort risk set at onset time $T_i$. The specification of the weight $w_j$ for subject $j$ is given in Table 1, which summarizes the definitions of risk sets and weights for the Prentice [35], Self and Prentice [42], and Lin and Ying [25] methods for un-stratified CC designs. Given the estimated hazard ratios, the cumulative baseline hazard is estimated by

$$\hat{\Lambda}_0(t)=\sum_{T_i<t}\frac{\delta_i}{\sum_{j=1}^{n}\bar{R}_j\,Y_j(T_i)\,w_j\,e^{X_j\hat{\beta}+Z_j\hat{\gamma}}}.$$

Specifically, in the Prentice method, each risk set involves all sub-cohort subjects at risk and the case. In the Self and Prentice approach, each risk set includes all sub-cohort members at risk, but excludes the case. Hence, the Self and Prentice risk set comprises one less subject than the Prentice risk set. The Lin and Ying method uses all members of the sub-cohort who are at risk, with the addition of cases not already included in the sub-cohort. For all three methods, the weight is $w_j=1$ for cases and $w_j=1/\pi$ for controls in the sub-cohort, where $\pi$ is the proportion of the full cohort sampled into the sub-cohort. We fit the Cox PH regression model to CC data using the cch() function in the survival R package [46].
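The three un-stratified risk-set definitions and the common weighting rule can be written out as small helper functions. This sketch reflects one reading of the description above (case included for Prentice, case excluded for Self and Prentice, sub-cohort plus outside cases for Lin and Ying); the function names and toy data are hypothetical, and the at-risk indicator $Y_j(T_i)$ still has to be applied separately.

```python
def case_cohort_weights(events, in_subcohort, pi):
    """Weight 1 for cases, 1/pi for sub-cohort controls, None if not sampled."""
    weights = []
    for d, r in zip(events, in_subcohort):
        if d == 1:
            weights.append(1.0)
        elif r:
            weights.append(1.0 / pi)
        else:
            weights.append(None)         # covariates were never measured
    return weights

def in_risk_set(method, j, i, events, in_subcohort):
    """Risk-set membership of subject j at the event time of case i."""
    r = 1 if in_subcohort[j] else 0
    if method == "prentice":             # sub-cohort members plus the case itself
        return 1 if j == i else r
    if method == "self_prentice":        # sub-cohort members, case excluded
        return 0 if j == i else r
    if method == "lin_ying":             # sub-cohort members plus any outside case
        return r + events[j] * (1 - r)
    raise ValueError("unknown method: " + method)

# Hypothetical toy data: four subjects, sub-cohort sampling fraction 0.5.
events = [1, 0, 0, 1]
in_subcohort = [False, True, False, True]
cc_w = case_cohort_weights(events, in_subcohort, pi=0.5)
sets_at_case0 = {meth: [in_risk_set(meth, j, 0, events, in_subcohort) for j in range(4)]
                 for meth in ("prentice", "self_prentice", "lin_ying")}
```

In this toy example the Prentice and Lin and Ying sets happen to coincide at the first case; they differ in general because Lin and Ying includes every outside case at all event times at which that case is at risk, not only at its own.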

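Table 1.

Definition of at-risk sets and individual weights under the CC design.

| Method | Each risk set ($\bar{R}_j$) | Weight ($w_j$) |
| --- | --- | --- |
| Un-stratified approach | | |
| Prentice [35] | $I(j\ne i)\,\tilde{r}_j+I(j=i)$ | 1 for case; $1/\pi$ for control |
| Self and Prentice [42] | $I(j\ne i)\,\tilde{r}_j$ | 1 for case; $1/\pi$ for control |
| Lin and Ying [25] | $\tilde{r}_j+\delta_j(1-\tilde{r}_j)$ | 1 for case; $1/\pi$ for control |
| Stratified approach | | |
| Borgan I [3] | $I(j\ne i)\,\tilde{r}_j$ | $1/\pi_{l(j)}$ |
| Borgan II [3] | $\tilde{r}_j+\delta_j(1-\tilde{r}_j)$ | 1 for case; $1/\pi_{l(j)}$ for control |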

2.3.1. Stratified approaches under the CC design

The stratified CC design first partitions the cohort into $L$ strata defined by confounder values, then selects random samples of $m_l$ sub-cohort subjects without replacement from the $n_l$ subjects in stratum $l$ ($l=1,\ldots,L$), where $n=\sum_{l=1}^{L}n_l$. Thus, the stratified CC design allows different sampling proportions for different strata. The Borgan I method [3] is the natural generalization of Self and Prentice’s estimator [42] to stratified sampling, and each risk set in this method includes all sub-cohort members at risk, but excludes the case (Table 1). The Borgan I method weights all subjects by the inverse of the sampling probability, $1/\pi_{l(j)}$, where $\pi_{l(j)}=m_{l(j)}/n_{l(j)}$ and $l(j)$ is the sampling stratum of individual $j$. The Borgan II method [3], on the other hand, includes in each risk set all members of the sub-cohort plus cases not included in the sub-cohort (i.e. similar to the Lin and Ying method), and the weight is defined as 1 for all cases and $1/\pi_{l(j)}$ for controls in the sub-cohort (Table 1). For the stratified CC design, we use the cch() function with the stratum argument in the survival R package [46].

2.4. Predictive performance: discrimination, calibration, and overall performance

When evaluating a risk prediction model, we often assess its predictive performance by quantifying: (i) its ability to distinguish between low- and high-risk patients (discrimination); (ii) the agreement between the observed and predicted outcomes (calibration); and (iii) the overall ‘distance’ between the observed and predicted outcomes (overall performance) [36]. While these performance measures have been extensively studied for risk models built under the full-cohort design, they are less explored for risk models built under sub-cohort sampling designs. In this section, we use weighted versions of these measures to evaluate the predictive performance of risk models built using sub-cohort sampling designs.

2.4.1. Discrimination: a weighted version of Harrell’s C-index

A commonly applied measure of discrimination for survival prediction models is the C-index [17,18,19], and we consider the weighted C-index

$$C_W=\frac{\sum_{i\ne j}w_j\,I(T_i<T_j)\,I(\hat{\eta}_i>\hat{\eta}_j)\,\delta_i}{\sum_{i\ne j}w_j\,I(T_i<T_j)\,\delta_i},$$

where $\hat{\eta}_i=X_i\hat{\beta}+Z_i\hat{\gamma}$ is the linear predictor from the estimated model and $w_j$ is the appropriate weight for the design, as given in Sections 2.2 and 2.3.
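The weighted C-index can be computed with a direct double loop over subject pairs. The sketch below follows the formula as written, weighting each comparable pair by the weight of the longer-surviving subject; the function name and the toy numbers are hypothetical.

```python
def weighted_c_index(times, events, eta, weights):
    """Weighted concordance: the case i fails first and should have the larger eta."""
    num = 0.0
    den = 0.0
    n = len(times)
    for i in range(n):
        if events[i] != 1:               # only observed events anchor a pair
            continue
        for j in range(n):
            if times[i] < times[j]:      # comparable pair: i fails before j
                den += weights[j]
                if eta[i] > eta[j]:      # concordant: higher predicted risk fails first
                    num += weights[j]
    return num / den

# Hypothetical toy data: two cases followed by two up-weighted controls.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 0]
eta = [2.0, 0.5, 1.0, 0.0]
c_weighted = weighted_c_index(times, events, eta, [1.0, 1.0, 3.0, 3.0])
c_unweighted = weighted_c_index(times, events, eta, [1.0, 1.0, 1.0, 1.0])
```

Here the unweighted index is 4/5 while the weighted one is 10/13, because the single discordant pair involves a control that stands in for three cohort members.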

2.4.2. Calibration: calibration slope

The calibration slope is defined as the slope of the regression of the observed survival outcomes on the predicted prognostic index of the estimated model [49], i.e. $\alpha$ in

$$h(t)=h_0(t)\exp(\alpha\hat{\eta}).$$

Values of $\hat{\alpha}$ close to 1 suggest that the prediction model is well calibrated, while $\hat{\alpha}\ll 1$ suggests over-fitting in the original data with potentially poor generalization to other populations. In our study, we calculated the calibration slope using 5-fold cross-validation (CV).

2.4.3. Overall performance: a weighted Brier score

Graf et al. [16] proposed the empirical Brier score under right-censorship to measure the overall performance which can be interpreted in a similar way as a mean square error for prediction models. To account for the sub-cohort designs, the weighted Brier score is defined as

$$BS(t)=\frac{1}{n}\sum_{i=1}^{n}w_i\left\{\frac{\left(0-\hat{S}(t\mid X_i,Z_i)\right)^2 I(T_i\le t,\ \delta_i=1)}{\hat{G}(T_i)}+\frac{\left(1-\hat{S}(t\mid X_i,Z_i)\right)^2 I(T_i>t)}{\hat{G}(t)}\right\},$$

where $w_i$ is the appropriate weight for each design, $\hat{S}(t\mid X_i,Z_i)$ is the estimated survival probability at a specific time $t$, computed from the estimated hazard ratio parameters and baseline hazard function corresponding to the study design, and $\hat{G}(t)$ denotes the Kaplan-Meier estimate of the censoring distribution $G$, or an estimate from a Cox PH model if the censoring depends on covariates.
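A compact sketch of the weighted Brier score is given below, assuming a Kaplan-Meier estimate of the censoring distribution and predicted survival probabilities supplied by the user. The function names, the `n_cohort` argument (the full-cohort size n in the formula), and the toy data are hypothetical; the sketch also assumes that the censoring estimate is strictly positive wherever it is evaluated.

```python
def km_censoring(times, events):
    """Return G(t), the Kaplan-Meier estimate of P(C > t), treating censorings as events."""
    cens_times = sorted({T for T, d in zip(times, events) if d == 0})
    def G(t):
        g = 1.0
        for u in cens_times:
            if u > t:
                break
            at_risk = sum(1 for T in times if T >= u)
            n_cens = sum(1 for T, d in zip(times, events) if T == u and d == 0)
            g *= 1.0 - n_cens / at_risk
        return g
    return G

def weighted_brier(t, times, events, surv_pred, weights, G, n_cohort):
    """Weighted Brier score at time t; surv_pred[i] is the predicted S(t | X_i, Z_i)."""
    total = 0.0
    for T, d, s, w in zip(times, events, surv_pred, weights):
        if T <= t and d == 1:            # event observed by time t: true status is 0
            total += w * (0.0 - s) ** 2 / G(T)
        elif T > t:                      # still event-free at time t: true status is 1
            total += w * (1.0 - s) ** 2 / G(t)
        # Subjects censored before t contribute only through the 1/G weights.
    return total / n_cohort

# Hypothetical toy data with unit weights (i.e. a full-cohort calculation).
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 0]
surv_pred = [0.2, 0.4, 0.7, 0.9]
G = km_censoring(times, events)
bs = weighted_brier(2.5, times, events, surv_pred, [1.0] * 4, G, n_cohort=4)
```

Because nobody is censored before t = 2.5 in this toy example, every censoring weight equals 1 and the score is simply the mean squared difference between the observed status and the predicted survival probability.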

3. Simulation studies

We performed extensive numerical studies to evaluate the performance of risk prediction models built using sub-cohort sampling designs with or without matching procedures and compared them to the model developed using the full cohort. Our simulations were based on a dataset of 6550 women who were younger than 50 years old at enrollment in the NYUWHS as the objective was to identify risk factors for breast cancer in younger women [15,8]. Six variables were considered: age at the enrollment (AGE), age at menarche (AGEMEN), history of benign breast biopsy (BIOPSY), age at first full-term pregnancy (FTP), family history of breast cancer (RELATIVE), and race (RACE). We additionally simulated a biomarker (BIO) from a standard normal distribution.

Given the simulated biomarker and six risk factors, we generated the time to breast cancer onset from the following

$$\lambda_i(t)=\lambda_0(t)\exp(0.028\,\mathrm{AGE}_i-0.034\,\mathrm{AGEMEN}_i+0.431\,\mathrm{BIOPSY}_i-0.105\,\mathrm{FTP}_i+0.541\,\mathrm{RELATIVE}_i+0.347\,\mathrm{RACE}_i+\beta\,\mathrm{BIO}_i),$$

where the log hazard ratio parameters for the six risk factors were set at the values estimated from the Cox PH model with these six variables using the full-cohort data, and the effect of the biomarker, $\beta$, was set to 0.0, 0.2, or 0.5 for the BIO1, BIO2, or BIO3 model, respectively, corresponding to a biomarker with null, weak, or strong association with survival risk. The baseline hazard function, $\lambda_0(t)$, was assumed to be Weibull($k=0.929$, $\lambda=0.002$). We ran 20,000 simulations, each with the full cohort size $N=6550$, with random censoring times independently generated from $\min(\mathrm{Exp}(\lambda=0.044),25)$ to yield a censoring rate of approximately 88%, which is similar to that of our cohort.
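Event times under this kind of model can be generated by inverting the survival function. The sketch below is an assumption-laden illustration, not the authors' simulation code: it assumes the Weibull baseline is parameterized with cumulative hazard $\Lambda_0(t)=\lambda t^{k}$ (the text does not state the parameterization), and the function names and the covariate values of the example subject are hypothetical.

```python
import math
import random

def weibull_ph_time(u, lin_pred, shape=0.929, scale=0.002):
    """Invert S(t) = exp(-scale * t**shape * exp(lin_pred)) at survival probability u."""
    # ASSUMPTION: cumulative baseline hazard is scale * t**shape.
    return (-math.log(u) / (scale * math.exp(lin_pred))) ** (1.0 / shape)

def simulate_subject(lin_pred, rng, cens_rate=0.044, max_follow_up=25.0):
    """Return (observed time, event indicator) for one subject."""
    u = 1.0 - rng.random()                       # uniform on (0, 1], avoids log(0)
    event_time = weibull_ph_time(u, lin_pred)
    cens_time = min(rng.expovariate(cens_rate), max_follow_up)
    observed = min(event_time, cens_time)
    delta = 1 if event_time <= cens_time else 0
    return observed, delta

# Hypothetical subject: AGE 43, AGEMEN 12, no biopsy, FTP yes, no affected
# relative, RACE white, standard-normal biomarker with beta = 0.2 (BIO2 model).
rng = random.Random(1)
bio = rng.gauss(0.0, 1.0)
eta = (0.028 * 43 - 0.034 * 12 + 0.431 * 0 - 0.105 * 1
       + 0.541 * 0 + 0.347 * 1 + 0.2 * bio)
obs_time, delta = simulate_subject(eta, rng)
```

Repeating `simulate_subject` over the observed covariate rows of a cohort would give one simulated data set; whether the resulting censoring rate matches the reported value depends on the assumed parameterization.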

For each generated cohort, the full-cohort analysis results were considered the gold standard. From the full cohort of 6550 subjects, we generated NCC data in which one control was randomly chosen for each case using incidence density sampling, and CC data in which a 12% sub-cohort was selected so that the sample sizes of the CC and NCC designs were similar. For the BIO1 model, for example, the average sample size was 1519.98 (SD=51.21) for the no-matching NCC data and 1454.95 (SD=24.76) for the un-stratified CC data. We used the L-B method and the weighted method for NCC data. The Prentice, Self and Prentice, and Lin and Ying methods were applied to un-stratified CC data, while the Borgan I and II approaches were applied to stratified CC data.

Four different matching procedures for the NCC study were performed: (i) no matching, (ii) matching on RACE, (iii) matching on AGE-group and RACE (i.e. RACE + AGE), and (iv) fine matching. Specifically, for RACE matching, controls were randomly selected from participants of the same RACE group (i.e. White or Non-white). In the RACE + AGE matching, the same RACE categories and four AGE-groups (corresponding to cohort quartiles) were used. In the fine matching, cases were matched to controls with the same AGE (in years) and RACE categories. For the stratified CC designs, we stratified the cohort by RACE and two AGE groups defined by the median of AGE, so that there were four strata in these analyses.

The log hazard ratio estimates and their standard deviations (SD) under each method were calculated to compare the estimation, inference, and relative efficiency of each approach with respect to the full-cohort approach. The coverage probability of the 95% confidence interval (CI) and its Monte Carlo standard error (SE), defined as $\sqrt{\hat{p}_{cov}(1-\hat{p}_{cov})/n_{sim}}$, where $\hat{p}_{cov}$ is the estimated coverage probability and $n_{sim}$ is the number of simulation replications, were also obtained to evaluate the precision of the estimates [30].
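As a rough worked example of the Monte Carlo standard error, suppose the estimated coverage equals the nominal value of 0.95 (an illustrative assumption) and the number of replications is the 20,000 stated for this simulation study:

```latex
\mathrm{SE}_{MC}
  = \sqrt{\frac{\hat{p}_{cov}\,(1-\hat{p}_{cov})}{n_{sim}}}
  = \sqrt{\frac{0.95 \times 0.05}{20000}}
  = \sqrt{2.375 \times 10^{-6}}
  \approx 0.0015
```

Under these assumptions, estimated coverage probabilities within roughly 0.947 to 0.953 (about two Monte Carlo standard errors of 0.95) would be compatible with nominal coverage.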

As explained in the Introduction and Methods sections, prediction measures under sub-cohort sampling designs require a weighting scheme. To evaluate prediction performance under the sub-cohort sampling designs, we calculated the weighted versions of the C-index, the calibration slope, and the Brier score for discrimination, calibration, and overall performance, respectively. Note that the weighted Brier score was calculated at different time points, including the median follow-up time of 13 years. All computations were conducted using R version 3.6.3 [44].

4. Simulation results

Table 2 summarizes baseline characteristics of the NYUWHS subjects, along with the estimated associations of risk factors with breast cancer risk in the full cohort, with or without the inclusion of the simulated biomarker. The average AGE of the study participants at enrollment was 42.75 years, and approximately 11.5% of the 6550 participants developed breast cancer during follow-up. Breast cancer events occurred a median of 12.5 years after enrollment. Compared to controls, women who developed breast cancer were more likely to have had a breast biopsy (i.e. BIOPSY) and a family history of breast cancer (i.e. RELATIVE). The estimated hazard ratios of these covariates were 1.54 ($e^{0.431}$) and 1.72 ($e^{0.541}$), respectively. In order to evaluate the performance of NCC and CC designs, these full-cohort estimates in Table 2 were considered the gold standard.

Table 2.

Baseline descriptive statistics and average regression coefficients (SD) with breast cancer risk for full models with/without simulated biomarkers.

Columns 2 to 4: Descriptive Statistics+. Columns 5 to 8: Association (Average Realization).

| Variables | Case (n=751) | Control (n=5799) | Overall (n=6550) | Base Model* | Base + Bio 1 | Base + Bio 2 | Base + Bio 3 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AGE | 43.2 (4.01) | 42.7 (4.12) | 42.8 (4.11) | 0.028 (0.009) | 0.028 (0.009) | 0.028 (0.009) | 0.028 (0.008) |
| AGEMEN | 12.4 (1.52) | 12.5 (1.53) | 12.5 (1.53) | −0.034 (0.087) | −0.036 (0.024) | −0.035 (0.023) | −0.035 (0.021) |
| BIOPSY = Yes | 172 (22.9) | 886 (15.3) | 1058 (16.2) | 0.431 (0.087) | 0.428 (0.088) | 0.429 (0.083) | 0.429 (0.076) |
| FTP = Yes | 427 (56.9) | 3443 (59.4) | 3870 (59.1) | −0.105 (0.084) | −0.108 (0.075) | −0.108 (0.071) | −0.108 (0.064) |
| RELATIVE = Yes | 194 (25.8) | 882 (15.2) | 1076 (16.4) | 0.541 (0.084) | 0.543 (0.084) | 0.543 (0.080) | 0.543 (0.073) |
| RACE = White | 540 (71.9) | 3600 (62.1) | 4140 (63.2) | 0.347 (0.082) | 0.349 (0.082) | 0.349 (0.078) | 0.348 (0.070) |
| BIO | | | | | 0.000 (0.037) | 0.200 (0.035) | 0.501 (0.032) |

We visualized the bias and standard deviation of the estimated log hazard ratios for NCC and CC designs with simulated biomarker 2 (i.e. the BIO2 model) in Figure 1 and deferred results from models BIO1 and BIO3 to the Appendix. Compared to the gold standard of the full cohort, the estimates from CC and NCC designs were mostly unbiased (see Appendix A-1, A-3 and A-5) and had proper coverage probabilities (see Appendix A-2, A-4 and A-6), except for the NCC design with fine matching. For example, under fine matching, the BIOPSY and FTP variables showed clear bias with the L-B method (Figure 1). Furthermore, the standard deviations under fine matching were larger than under the other matching procedures; for example, the standard deviation for the FTP variable under the weighted method was 0.165 with fine matching, compared with 0.100 with no matching (Appendix A-3). Thus, fine matching showed a 20–40% loss of efficiency relative to no matching. When AGE in quartiles was used in addition to RACE for matching, the estimate for AGE using the L-B method was less efficient than in the no-matching setting, although it was still unbiased (Appendix A-3). In addition, both stratified and un-stratified CC designs were efficient and unbiased.

Figure 1.


Bias and SD of log hazard ratios for risk factors in BIO2 model.

Abbreviations: Gold, gold standard (full cohort analysis); P, Prentice; S&P, Self and Prentice; L&Y, Lin and Ying; B1, Borgan I; B2, Borgan II; LB#, L-B method and W#, Weighted method with matching confounders: (1) No matching, (2) RACE, (3) RACE + quartile AGE, and (4) Fine (RACE + exact AGE) matching. Note that a blank line means that the variable was used in the matching procedure (i.e. not estimated).

Simulation results for the models including the BIO1 and BIO3 biomarkers are shown in Appendix A and lead to similar observations as for the BIO2 model. Our results consistently showed that fine matching could lead to biased and inefficient estimates of the relative risks.

Measures of predictive performance are reported for all models in Table 3. The C-index of the gold standard increased with the effect size of the simulated biomarker: 0.608, 0.618, and 0.668 for the BIO1, BIO2, and BIO3 models, respectively. All sub-cohort sampling designs showed satisfactory discrimination according to the weighted C-index. For NCC designs, the weighted method gave more consistent C-index values than the L-B method when some variables were used for matching. The un-stratified and stratified CC approaches had very similar discrimination ability (Table 3). However, we found that the model had a relatively poor fit (i.e. poor calibration) when fine matching was used for NCC designs, in particular with the weighted method (Table 3). Compared to the other matching procedures, fine matching gave smaller calibration slopes with larger standard deviations of the slope estimates, indicating a lack of goodness-of-fit. NCC designs without fine matching and CC designs all performed satisfactorily in terms of calibration.

Table 3.

Predictive performance of the models: discrimination, calibration, and overall performance.

| Measure | Model | Setting | Base + BIO1 | Base + BIO2 | Base + BIO3 |
| --- | --- | --- | --- | --- | --- |
| Discrimination | Full-cohort | Gold standard | 0.608 (0.017) | 0.618 (0.010) | 0.668 (0.009) |
| | NCC (L-B) | No matching | 0.607 (0.014) | 0.620 (0.016) | 0.668 (0.014) |
| | | RACE | 0.591 (0.017) | 0.607 (0.017) | 0.661 (0.014) |
| | | RACE + AGE | 0.584 (0.017) | 0.602 (0.017) | 0.658 (0.014) |
| | | Fine | 0.584 (0.021) | 0.601 (0.021) | 0.659 (0.018) |
| | NCC (Weighted) | No matching | 0.605 (0.011) | 0.620 (0.013) | 0.669 (0.011) |
| | | RACE | 0.606 (0.013) | 0.620 (0.013) | 0.668 (0.011) |
| | | RACE + AGE | 0.606 (0.013) | 0.619 (0.012) | 0.668 (0.011) |
| | | Fine | 0.610 (0.020) | 0.624 (0.019) | 0.671 (0.017) |
| | CC+ | Un-stratified | 0.607 (0.014) | 0.620 (0.014) | 0.669 (0.012) |
| | | Stratified | 0.606 (0.013) | 0.620 (0.013) | 0.669 (0.012) |
| Calibration1 | Full-cohort | Gold standard | 0.943 (0.026) | 0.960 (0.018) | 0.984 (0.008) |
| | NCC (L-B) | No matching | 0.884 (0.063) | 0.917 (0.046) | 0.966 (0.023) |
| | | RACE | 0.876 (0.079) | 0.919 (0.052) | 0.972 (0.023) |
| | | RACE + AGE | 0.857 (0.091) | 0.909 (0.057) | 0.971 (0.023) |
| | | Fine | 0.826 (0.147) | 0.895 (0.089) | 0.970 (0.035) |
| | NCC (Weighted) | No matching | 0.903 (0.058) | 0.930 (0.042) | 0.972 (0.020) |
| | | RACE | 0.901 (0.055) | 0.928 (0.040) | 0.971 (0.020) |
| | | RACE + AGE | 0.898 (0.056) | 0.927 (0.040) | 0.971 (0.020) |
| | | Fine | 0.760 (0.165) | 0.822 (0.118) | 0.918 (0.057) |
| | CC+ | Un-stratified | 0.889 (0.068) | 0.915 (0.052) | 0.960 (0.029) |
| | | Stratified | 0.898 (0.064) | 0.923 (0.050) | 0.964 (0.029) |
| Overall performance2 | Full-cohort | Gold standard | 0.095 (0.003) | 0.104 (0.004) | 0.125 (0.003) |
| | NCC (L-B) | No matching | 0.226 (0.027) | 0.230 (0.030) | 0.219 (0.082) |
| | | RACE | 0.224 (0.024) | 0.228 (0.027) | 0.218 (0.081) |
| | | RACE + AGE | 0.226 (0.028) | 0.230 (0.030) | 0.243 (0.040) |
| | | Fine | 0.186 (0.039) | 0.192 (0.041) | 0.205 (0.047) |
| | NCC (Weighted) | No matching | 0.199 (0.005) | 0.202 (0.005) | 0.191 (0.064) |
| | | RACE | 0.201 (0.005) | 0.204 (0.005) | 0.192 (0.064) |
| | | RACE + AGE | 0.202 (0.005) | 0.205 (0.005) | 0.214 (0.004) |
| | | Fine | 0.241 (0.007) | 0.245 (0.007) | 0.257 (0.007) |
| | CC+ | Un-stratified | 0.203 (0.008) | 0.220 (0.008) | 0.254 (0.007) |
| | | Stratified | 0.203 (0.008) | 0.220 (0.008) | 0.254 (0.007) |

For the overall performance of risk prediction measured by the weighted Brier score, we observed some discrepancies between the sub-cohort sampling analyses and the full-cohort analysis. The average Brier score at the median follow-up time was 0.095, 0.104, and 0.125 for the BIO1, BIO2, and BIO3 models, respectively, under the full-cohort analysis, while the Brier scores for sub-cohort sampling designs were around 0.2 (Table 3). In particular, the weighted method with fine matching performed substantially worse in terms of the Brier score. In general, CC designs and the weighted method (without fine matching) for NCC data showed more stable Brier scores than the L-B method. Furthermore, the L-B method for the NCC designs showed high variability of the Brier score over time (see Figure 2 for the BIO2 model; Appendix B-2 for the BIO1 model; and Appendix B-3 for the BIO3 model). Even though the performance of sub-cohort sampling designs was not fully satisfactory compared to the full-cohort design in terms of overall performance according to the Brier score, we note that there are still benefits of using proper weighting schemes when comparing with the unweighted Brier score (see Appendix B-1).

Figure 2.


Empirical Brier score at different time points ($t=5,10,15$) for BIO2 model.

Abbreviations: LB#, L-B method and W#, Weighted method with matching confounders: (1) No matching, (2) RACE, (3) RACE + quartile AGE, and (4) Fine (RACE + exact AGE) matching; L&Y, Lin and Ying; B2, Borgan II. Note that the red cross mark indicates the average value of the Brier score for the gold standard model (i.e. full cohort analysis).

5. Discussion

Our numerical studies were designed to capture typical parameters found in studies of new biomarkers where a full-cohort design would be infeasible. Simulations were based on a real dataset of 6550 women who were younger than 50 years old at enrollment in the NYUWHS. This study had an incidence rate of approximately 12%, a setting in which sub-cohort sampling designs could be preferable to the full-cohort design. We also considered null, weak, and strong effect sizes for the simulated biomarker (i.e. hazard ratios of 1.00, 1.22, and 1.65, respectively), corresponding to practical settings relevant in biomarker studies. Although similar scientific questions have been studied in the literature [14,38,40,5,9], our simulation study adds comprehensive results by comparing different methods for both NCC and CC designs, considering various matching or stratification strategies, and evaluating risk prediction performance in terms of discrimination, calibration, and overall performance. In addition, Sanderson et al. [40] and Cook et al. [9] focused on the CC design. All these works provide important insights that could help researchers plan a biomarker study to develop a risk prediction model using sub-cohort sampling designs with or without matching.

We have shown that sub-cohort sampling designs can be used to build a risk prediction model with unbiased and relatively efficient estimation of hazard ratio parameters. Except for fine matching with NCC data, all the methods for both NCC and CC designs produced accurate (i.e. unbiased) estimates, although they differed in efficiency. In addition, the coverage probabilities were overall close to 0.95. These results imply that the main influence of the design and matching (without fine matching) was more likely through the estimation of the baseline hazard functions. The weighted method tended to be more efficient than the L-B method under the NCC studies. Even though the stratified approach slightly improved the statistical efficiency under the CC study, the stratified and un-stratified methods performed very similarly in our simulations. However, with fine matching for NCC data, both the L-B and weighted approaches had large standard errors as well as biased estimates. Specifically, the poor performance of fine matching likely stems from the unbalanced distribution of the variable used for the fine matching procedure. For example, in the frequency table for AGE (data not shown), younger ages had small counts, so the weights under fine matching on exact AGE and RACE could be inaccurate. Our results regarding fine matching are consistent with those of Støer and Samuelsen [43] and Ganna et al. [14], who reported biased results for the weighted approach under fine matching. Although Salim et al. [38] did not observe such bias, the matching they used (age in years) was not as fine as that in [43] (week or month of blood collection date). Furthermore, compared to our simulation, Salim et al. [38] used relatively large sampling strata, which would not be realistic in biomarker studies.
So, although any sub-cohort design is a suitable candidate for building a risk prediction model, its performance could be strongly affected by the choice of matching procedure, particularly in the case of fine matching. For instance, if the matching variables are not adequately accounted for, estimates might be biased and inefficient. Thus, we note that researchers should pay particular attention to the drawbacks of fine matching, especially when the study is not large enough (i.e. the sampling strata are small).
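
The sensitivity of the weights to small matching strata can be illustrated with a toy calculation. The sketch below assumes inclusion probabilities of the Samuelsen type, i.e. the probability that a non-case is ever sampled as one of m controls across the risk sets in which it is eligible; it also assumes a common entry time and no tied event times. The function name `inclusion_probs` and the toy data are hypothetical, and this is not the code used for our simulations.

```python
import numpy as np

def inclusion_probs(time, event, m, stratum=None):
    """Probability that each subject is ever included in an NCC sample.

    time    : follow-up time (all subjects assumed to enter at time 0)
    event   : 1 for cases, 0 for non-cases
    m       : number of controls sampled per case
    stratum : optional matching stratum; controls are eligible only
              for cases in the same stratum
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    stratum = np.zeros(n, dtype=int) if stratum is None else np.asarray(stratum)

    not_sampled = np.ones(n)  # running product of (1 - m / n_eligible)
    for i in np.flatnonzero(event == 1):
        eligible = (time >= time[i]) & (stratum == stratum[i])
        eligible[i] = False   # the case itself is not an eligible control
        n_elig = eligible.sum()
        if n_elig > 0:
            not_sampled[eligible] *= 1.0 - min(1.0, m / n_elig)

    p = 1.0 - not_sampled
    p[event == 1] = 1.0       # cases are always included
    return p

time = [1, 2, 3, 4, 5, 6]
event = [1, 0, 0, 1, 0, 0]

p_unmatched = inclusion_probs(time, event, m=1)
p_matched = inclusion_probs(time, event, m=1, stratum=[0, 0, 1, 1, 1, 0])
```

With matching, the eligible risk sets shrink to one or two subjects in this toy example, so the inclusion probabilities jump to extreme values (0, 0.5, or 1) instead of the smoother values obtained without matching. Inverse-probability weights built from such probabilities become correspondingly unstable when strata are small.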

Acknowledging that sub-cohort sampling designs can be particularly valuable for their cost-effectiveness when the incidence rate is very low, we additionally conducted simulations with a 5% incidence rate (see Appendix C). This added a setting with a smaller sample size in the sub-cohort sampling designs. Under the 5% incidence rate, the prediction accuracy as measured by the weighted versions of the calibration slope and the Brier score deteriorated when the incidence rate became low (e.g. Table 3 versus Appendix C-1). Interestingly, the C-index was consistent between the 5% and 12% incidence rates. Note that the types of predictors, their effect sizes, and their distributions all affect the performance of the final model.

By using the proper weighting scheme for each measure, the prediction measures can be calculated without bias (i.e. correctly), even though the weighted Brier score remained less than satisfactory for the sub-cohort sampling designs. Not accounting for the weights, however, could lead to substantial bias; for example, without weighting, the risk prediction performance for NCC designs was poorer in discrimination, calibration, and overall performance (Appendix B-1). All methods showed very similar performance for discrimination and calibration, except in the fine matching case. We observed a noticeable lack of fit and deteriorated overall performance under fine matching according to the calibration slope and the Brier score. Therefore, fine matching may be useful to remove confounding effects when estimating relative risk but should be used with caution when the scientific interest is in risk prediction.

There are several limitations of our study. First, the effects of biomarkers and other risk factors can change over time, i.e. have time-varying effects. We acknowledge that evaluating the performance of risk prediction models with time-varying effects built from CC and NCC data is of great importance. We have previously studied the estimation and inference of time-varying effects in NCC and CC studies [26]. Thus, we will consider risk prediction modeling with time-varying effects under sub-cohort sampling designs in future research. Second, left truncation sometimes occurs in cohort studies; for example, when only subjects who are disease-free enter the study. Recently, Lu and Liu [27] discussed left truncation in NCC studies, and this can be naturally extended to risk prediction modeling. Thus, our future research will incorporate left truncation when evaluating risk prediction performance under sub-cohort sampling designs.

In conclusion, motivated by real data from the NYU Women’s Health Study focusing on breast cancer, we provide comprehensive knowledge on designing biomarker studies to develop or validate risk prediction models under sub-cohort sampling designs with a matching procedure. Through our simulations, we have shown that sub-cohort sampling designs are suitable for developing a risk prediction model. NCC and CC studies not only provide accurate and efficient estimates of association but also can be used to calculate predictive measures of discrimination, calibration, and overall performance in the context of a risk prediction model. However, fine matching in NCC studies may not be appropriate when the research aim is to develop a risk prediction model in a biomarker study, unless the study is sufficiently large.

Appendix A.

A-1.

BIO1 model: log hazard ratio estimates (SD) for sub-cohort sampling designs with BIO1.

        Case Cohort Design      
  Full-cohort Un-stratified approaches Stratified approaches    
Variables Gold standard Prentice Self and Prentice Lin and Ying BorganI BorganII    
AGE 0.028 (0.009) 0.028 (0.014) 0.029 (0.014) 0.028 (0.014) 0.028 (0.012) 0.028 (0.012)    
AGEMEN −0.036 (0.024) −0.036 (0.038) −0.036 (0.039) −0.036 (0.037) −0.036 (0.038) −0.036 (0.037)    
BIOPSY 0.428 (0.088) 0.438 (0.147) 0.439 (0.148) 0.438 (0.142) 0.438 (0.148) 0.437 (0.142)    
FTP −0.108 (0.075) −0.110 (0.117) −0.110 (0.117) −0.110 (0.114) −0.109 (0.118) −0.109 (0.115)    
RELATIVE 0.543 (0.084) 0.552 (0.144) 0.554 (0.145) 0.553 (0.139) 0.553 (0.143) 0.551 (0.138)    
RACE 0.349 (0.082) 0.351 (0.120) 0.351 (0.121) 0.351 (0.118) 0.352 (0.098) 0.351 (0.097)    
BIO1 0.000 (0.037) 0.000 (0.057) 0.000 (0.058) 0.000 (0.056) 0.000 (0.057) 0.000 (0.056)    
        Nested Case Control Design      
    L-B method     Weighted method  
Variables No matching RACE RACE + AGE Fine No matching RACE RACE + AGE Fine
AGE 0.028 (0.013) 0.028 (0.013) 0.028 (0.047) - 0.028 (0.013) 0.028 (0.013) 0.028 (0.010) 0.025 (0.018)
AGEMEN −0.036 (0.036) −0.036 (0.036) −0.036 (0.036) −0.050 (0.045) −0.036 (0.035) −0.036 (0.034) −0.036 (0.034) −0.038 (0.055)
BIOPSY 0.436 (0.141) 0.435 (0.138) 0.437 (0.139) 0.373 (0.172) 0.435 (0.132) 0.436 (0.130) 0.436 (0.130) 0.467 (0.215)
FTP −0.109 (0.111) −0.109 (0.110) −0.111 (0.110) −0.216 (0.139) −0.109 (0.106) −0.110 (0.105) −0.110 (0.105) −0.114 (0.171)
RELATIVE 0.550 (0.137) 0.550 (0.134) 0.551 (0.133) 0.500 (0.167) 0.550 (0.128) 0.550 (0.125) 0.551 (0.125) 0.580 (0.213)
RACE 0.352 (0.117) - - - 0.351 (0.112) 0.348 (0.092) 0.348 (0.091) 0.310 (0.154)
BIO1 0.000 (0.054) 0.000 (0.054) 0.000 (0.054) 0.008 (0.067) 0.000 (0.052) 0.000 (0.051) 0.000 (0.051) 0.002 (0.084)

A-2.

BIO1 model: coverage probability of 95% confidence intervals (Monte Carlo SE) for sub-cohort sampling designs with BIO1.

        Case Cohort Design      
  Full-cohort Un-stratified approaches Stratified approaches    
Variables Gold standard Prentice Self and Prentice Lin and Ying BorganI BorganII    
AGE 94.94 (0.16) 94.98 (0.15) 94.98 (0.15) 95.06 (0.15) 94.96 (0.15) 95.00 (0.15)    
AGEMEN 95.12 (0.15) 94.83 (0.16) 94.83 (0.16) 94.77 (0.16) 94.96 (0.15) 95.00 (0.15)    
BIOPSY 94.92 (0.16) 94.97 (0.15) 95.00 (0.15) 94.84 (0.15) 94.95 (0.15) 94.96 (0.15)    
FTP 94.91 (0.15) 94.80 (0.16) 94.83 (0.16) 94.86 (0.16) 94.84 (0.16) 94.90 (0.16)    
RELATIVE 94.90 (0.15) 94.98 (0.15) 94.97 (0.15) 94.83 (0.15) 94.96 (0.15) 95.04 (0.15)    
RACE 94.90 (0.15) 94.97 (0.15) 94.97 (0.15) 95.09 (0.15) 95.10 (0.15) 95.21 (0.15)    
BIO1 95.10 (0.15) 95.08 (0.15) 95.08 (0.15) 95.20 (0.15) 95.10 (0.15) 94.90 (0.16)    
        Nested Case Control Design      
    L-B method     Weighted method  
Variables No matching RACE RACE + AGE Fine No matching RACE RACE + AGE Fine
AGE 94.98 (0.15) 94.94 (0.16) 94.82 (0.16) - 94.90 (0.16) 95.02 (0.15) 94.86 (0.16) 94.52 (0.16)
AGEMEN 94.94 (0.16) 94.92 (0.16) 95.05 (0.15) 93.48 (0.17) 94.97 (0.15) 94.89 (0.16) 94.96 (0.15) 95.14 (0.15)
BIOPSY 94.92 (0.16) 95.06 (0.15) 94.86 (0.16) 94.02 (0.17) 94.94 (0.16) 95.24 (0.15) 94.88 (0.16) 94.81 (0.16)
FTP 94.93 (0.16) 95.05 (0.15) 94.86 (0.15) 87.74 (0.23) 95.04 (0.15) 94.94 (0.15) 94.98 (0.15) 95.15 (0.15)
RELATIVE 94.90 (0.16) 95.02 (0.15) 95.00 (0.16) 94.16 (0.17) 95.10 (0.15) 95.00 (0.15) 94.86 (0.16) 94.68 (0.16)
RACE 95.02 (0.15) - - - 95.00 (0.15) 95.16 (0.15) 95.01 (0.15) 94.26 (0.16)
BIO1 95.09 (0.15) 95.04 (0.15) 95.06 (0.15) 94.81 (0.16) 95.08 (0.15) 95.02 (0.15) 95.08 (0.15) 94.87 (0.16)
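
The Monte Carlo standard errors reported with the coverage probabilities can be related to the number of simulation replicates through the usual binomial formula, sqrt(p(1 − p)/n_sim). The sketch below is illustrative only: the function name `coverage_mcse` and the value of `n_sim` are assumptions, since the number of replicates is not stated in this section.

```python
import math

def coverage_mcse(coverage_pct, n_sim):
    """Monte Carlo SE (in percentage points) of an estimated coverage probability."""
    p = coverage_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_sim)

n_sim = 20_000  # hypothetical number of replicates, for illustration only

se_nominal = coverage_mcse(95.0, n_sim)   # coverage near the nominal level
se_low = coverage_mcse(87.74, n_sim)      # a clearly under-covering entry

print(f"MCSE at 95.00% coverage: {se_nominal:.2f}")
print(f"MCSE at 87.74% coverage: {se_low:.2f}")
```

Because p(1 − p) grows as coverage moves away from 100%, entries with poorer coverage carry a larger Monte Carlo SE for the same number of replicates.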

A-3.

BIO2 model: log hazard ratio estimates (SD) for sub-cohort sampling designs with BIO2.

        Case Cohort Design      
  Full-cohort Un-stratified approaches Stratified approaches    
Variables Gold standard Prentice Self and Prentice Lin and Ying BorganI BorganII    
AGE 0.028 (0.009) 0.028 (0.014) 0.028 (0.014) 0.028 (0.013) 0.028 (0.012) 0.028 (0.012)    
AGEMEN −0.035 (0.023) −0.036 (0.038) −0.036 (0.038) −0.036 (0.037) −0.036 (0.038) −0.036 (0.036)    
BIOPSY 0.429 (0.083) 0.439 (0.146) 0.440 (0.147) 0.438 (0.139) 0.439 (0.146) 0.437 (0.139)    
FTP −0.108 (0.071) −0.110 (0.116) −0.110 (0.116) −0.110 (0.112) −0.110 (0.117) −0.110 (0.113)    
RELATIVE 0.543 (0.080) 0.553 (0.144) 0.555 (0.144) 0.553 (0.137) 0.554 (0.143) 0.552 (0.136)    
RACE 0.349 (0.078) 0.350 (0.119) 0.351 (0.119) 0.350 (0.116) 0.352 (0.096) 0.351 (0.094)    
BIO2 0.200 (0.035) 0.203 (0.057) 0.203 (0.058) 0.203 (0.055) 0.203 (0.058) 0.203 (0.055)    
        Nested Case Control Design      
    L-B method     Weighted method  
Variables No matching RACE RACE + AGE Fine No matching RACE RACE + AGE Fine
AGE 0.028 (0.013) 0.028 (0.013) 0.028 (0.045) - 0.028 (0.012) 0.028 (0.012) 0.028 (0.010) 0.025 (0.017)
AGEMEN −0.036 (0.034) −0.036 (0.034) −0.036 (0.034) −0.050 (0.043) −0.036 (0.032) −0.036 (0.033) −0.036 (0.033) −0.038 (0.053)
BIOPSY 0.435 (0.134) 0.435 (0.133) 0.435 (0.132) 0.375 (0.162) 0.434 (0.124) 0.435 (0.124) 0.435 (0.123) 0.464 (0.205)
FTP −0.110 (0.106) −0.110 (0.106) −0.110 (0.105) −0.216 (0.134) −0.109 (0.100) −0.110 (0.101) −0.111 (0.100) −0.112 (0.165)
RELATIVE 0.549 (0.132) 0.550 (0.129) 0.550 (0.129) 0.500 (0.161) 0.548 (0.122) 0.550 (0.119) 0.550 (0.119) 0.578 (0.202)
RACE 0.351 (0.112) - - - 0.350 (0.106) 0.347 (0.088) 0.348 (0.087) 0.313 (0.149)
BIO2 0.202 (0.052) 0.202 (0.052) 0.202 (0.052) 0.210 (0.065) 0.202 (0.049) 0.202 (0.049) 0.202 (0.049) 0.210 (0.080)

A-4.

BIO2 model: coverage probability of 95% confidence intervals (Monte Carlo SE) for sub-cohort sampling designs with BIO2.

        Case Cohort Design      
  Full-cohort Un-stratified approaches Stratified approaches    
Variables Gold standard Prentice Self and Prentice Lin and Ying BorganI BorganII    
AGE 95.08 (0.16) 94.96 (0.15) 94.97 (0.15) 95.02 (0.15) 94.84 (0.16) 94.82 (0.16)    
AGEMEN Gold standard 95.12 (0.15) 95.12 (0.15) 94.95 (0.15) 94.96 (0.15) 94.98 (0.15)    
BIOPSY 95.04 (0.16) 94.88 (0.16) 94.88 (0.16) 94.84 (0.16) 94.86 (0.16) 95.06 (0.15)    
FTP 95.00 (0.15) 95.02 (0.15) 95.02 (0.15) 94.97 (0.15) 94.84 (0.16) 94.77 (0.16)    
RELATIVE 94.82 (0.15) 94.80 (0.16) 94.80 (0.16) 94.83 (0.16) 94.98 (0.15) 95.13 (0.15)    
RACE 95.24 (0.16) 94.99 (0.15) 94.99 (0.15) 94.92 (0.16) 95.01 (0.15) 94.91 (0.16)    
BIO2 95.17 (0.16) 95.00 (0.15) 95.00 (0.15) 94.97 (0.15) 95.01 (0.15) 95.08 (0.15)    
        Nested Case Control Design      
    L-B method     Weighted method  
Variables No matching RACE RACE + AGE Fine No matching RACE RACE + AGE Fine
AGE 95.02 (0.15) 94.92 (0.16) 94.73 (0.16) - 94.87 (0.16) 94.96 (0.15) 95.00 (0.15) 94.58 (0.16)
AGEMEN 95.00 (0.15) 95.12 (0.15) 94.98 (0.15) 93.45 (0.17) 95.05 (0.15) 94.81 (0.16) 94.87 (0.16) 95.10 (0.15)
BIOPSY 94.96 (0.15) 94.94 (0.15) 94.90 (0.16) 93.70 (0.17) 94.92 (0.16) 95.01 (0.15) 94.84 (0.16) 94.70 (0.16)
FTP 95.12 (0.15) 94.96 (0.15) 95.04 (0.15) 87.61 (0.23) 94.98 (0.15) 94.80 (0.16) 94.96 (0.15) 94.78 (0.16)
RELATIVE 94.96 (0.15) 95.12 (0.15) 95.05 (0.15) 94.12 (0.17) 95.10 (0.15) 95.02 (0.15) 95.02 (0.15) 94.61 (0.16)
RACE 94.80 (0.16) - - - 94.92 (0.16) 94.97 (0.15) 95.06 (0.15) 94.44 (0.16)
BIO2 95.10 (0.15) 94.88 (0.16) 95.96 (0.15) 94.58 (0.16) 94.94 (0.16) 94.99 (0.15) 95.04 (0.15) 94.76 (0.16)

A-5.

BIO3 model: log hazard ratio estimates (SD) for sub-cohort sampling designs with BIO3.

        Case Cohort Design      
  Full-cohort Un-stratified approaches Stratified approaches    
Variables Gold standard Prentice Self and Prentice Lin and Ying BorganI BorganII    
AGE 0.028 (0.008) 0.028 (0.014) 0.028 (0.014) 0.028 (0.013) 0.028 (0.012) 0.028 (0.012)    
AGEMEN −0.035 (0.021) −0.036 (0.038) −0.036 (0.038) −0.036 (0.036) −0.036 (0.038) −0.036 (0.036)    
BIOPSY 0.429 (0.076) 0.440 (0.150) 0.442 (0.150) 0.438 (0.137) 0.441 (0.151) 0.438 (0.137)    
FTP −0.108 (0.064) −0.110 (0.118) −0.110 (0.118) −0.110 (0.110) −0.110 (0.119) −0.110 (0.111)    
RELATIVE 0.543 (0.073) 0.555 (0.148) 0.557 (0.149) 0.553 (0.136) 0.556 (0.148) 0.552 (0.135)    
RACE 0.348 (0.070) 0.350 (0.121) 0.351 (0.121) 0.350 (0.115) 0.352 (0.099) 0.351 (0.094)    
BIO3 0.501 (0.032) 0.508 (0.062) 0.510 (0.062) 0.506 (0.056) 0.510 (0.063) 0.507 (0.056)    
        Nested Case Control Design      
    L-B method     Weighted method  
Variables No matching RACE RACE + AGE Fine No matching RACE RACE + AGE Fine
AGE 0.028 (0.012) 0.028 (0.012) 0.028 (0.043) - 0.028 (0.011) 0.028 (0.011) 0.028 (0.010) 0.026 (0.017)
AGEMEN −0.036 (0.032) −0.036 (0.032) −0.036 (0.032) −0.050 (0.041) −0.035 (0.030) −0.035 (0.030) −0.035 (0.030) −0.038 (0.049)
BIOPSY 0.434 (0.127) 0.434 (0.126) 0.434 (0.126) 0.377 (0.155) 0.434 (0.114) 0.434 (0.113) 0.434 (0.113) 0.460 (0.190)
FTP −0.110 (0.100) −0.109 (0.100) −0.109 (0.100) −0.215 (0.126) −0.110 (0.092) −0.110 (0.091) −0.109 (0.092) −0.113 (0.155)
RELATIVE 0.550 (0.124) 0.549 (0.122) 0.550 (0.122) 0.499 (0.152) 0.548 (0.110) 0.548 (0.109) 0.549 (0.109) 0.569 (0.189)
RACE 0.350 (0.105) - - - 0.350 (0.097) 0.346 (0.084) 0.348 (0.085) 0.321 (0.145)
BIO3 0.504 (0.052) 0.503 (0.052) 0.504 (0.052) 0.512 (0.065) 0.504 (0.047) 0.501 (0.046) 0.504 (0.046) 0.516 (0.075)

A-6.

BIO3 model: coverage probability of 95% confidence intervals (Monte Carlo SE) for sub-cohort sampling designs with BIO3.

        Case Cohort Design      
  Full-cohort Un-stratified approaches Stratified approaches    
Variables Gold standard Prentice Self and Prentice Lin and Ying BorganI BorganII    
AGE 95.02 (0.16) 94.98 (0.15) 94.97 (0.15) 95.08 (0.15) 95.06 (0.15) 95.08 (0.15)    
AGEMEN 94.98 (0.16) 94.90 (0.16) 94.91 (0.16) 94.90 (0.16) 94.92 (0.16) 94.90 (0.16)    
BIOPSY 94.98 (0.15) 95.00 (0.15) 95.00 (0.15) 95.06 (0.15) 94.80 (0.16) 94.96 (0.15)    
FTP 94.94 (0.16) 94.86 (0.16) 94.88 (0.16) 94.86 (0.16) 94.82 (0.16) 94.96 (0.15)    
RELATIVE 95.00 (0.15) 94.95 (0.15) 94.94 (0.15) 94.95 (0.15) 95.03 (0.15) 95.13 (0.15)    
RACE 94.80 (0.16) 94.94 (0.16) 94.95 (0.15) 94.95 (0.15) 95.04 (0.15) 95.14 (0.15)    
BIO3 95.04 (0.16) 94.69 (0.16) 94.56 (0.16) 94.77 (0.16) 94.83 (0.16) 94.86 (0.16)    
        Nested Case Control Design      
    L-B method     Weighted method  
Variables No matching RACE RACE + AGE Fine No matching RACE RACE + AGE Fine
AGE 94.92 (0.16) 94.86 (0.16) 94.98 (0.15) - 94.87 (0.16) 95.02 (0.15) 94.92 (0.16) 94.82 (0.16)
AGEMEN 94.88 (0.16) 94.94 (0.15) 94.86 (0.16) 93.50 (0.17) 94.92 (0.16) 94.84 (0.16) 95.03 (0.15) 94.79 (0.16)
BIOPSY 94.92 (0.16) 94.88 (0.16) 94.92 (0.16) 93.64 (0.17) 95.14 (0.15) 95.17 (0.15) 94.90 (0.16) 94.83 (0.16)
FTP 95.08 (0.15) 94.84 (0.16) 94.94 (0.15) 86.30 (0.24) 94.90 (0.16) 94.92 (0.16) 95.12 (0.15) 94.97 (0.15)
RELATIVE 95.04 (0.15) 94.90 (0.16) 94.82 (0.16) 94.23 (0.16) 95.00 (0.15) 94.86 (0.16) 94.98 (0.15) 94.66 (0.16)
RACE 94.91 (0.16) - - - 94.91 (0.16) 95.03 (0.15) 94.86 (0.16) 94.72 (0.16)
BIO3 94.88 (0.16) 94.81 (0.16) 94.88 (0.16) 94.32 (0.16) 94.77 (0.16) 94.90 (0.16) 94.85 (0.16) 94.60 (0.16)

B-1.

Risk prediction measures (SD) without weighting under NCC designs.

    Base + BIO1 Base + BIO2 Base + BIO3
      Discrimination  
Full-cohort Gold standard 0.605 (0.011) 0.618 (0.010) 0.668 (0.009)
NCC (L-B) No matching 0.577 (0.011) 0.585 (0.011) 0.618 (0.009)
  RACE 0.567 (0.012) 0.577 (0.011) 0.614 (0.010)
  RACE + AGE 0.563 (0.012) 0.573 (0.012) 0.612 (0.010)
  Fine 0.567 (0.015) 0.579 (0.014) 0.620 (0.012)
NCC (Weighted) No matching 0.573 (0.011) 0.583 (0.010) 0.619 (0.009)
  RACE 0.558 (0.011) 0.570 (0.010) 0.612 (0.009)
  RACE + AGE 0.551 (0.011) 0.565 (0.010) 0.609 (0.009)
  Fine 0.544 (0.012) 0.555 (0.011) 0.594 (0.010)
      Calibration¹  
Full-cohort Gold standard 0.943 (0.026) 0.960 (0.018) 0.984 (0.008)
NCC (L-B) No matching 0.474 (0.075) 0.498 (0.063) 0.539 (0.043)
  RACE 0.476 (0.086) 0.506 (0.070) 0.550 (0.004)
  RACE + AGE 0.358 (0.119) 0.409 (0.102) 0.500 (0.064)
  Fine 0.524 (0.099) 0.565 (0.073) 0.612 (0.047)
NCC (Weighted) No matching 0.610 (0.048) 0.636 (0.038) 0.675 (0.026)
  RACE 0.490 (0.065) 0.546 (0.049) 0.640 (0.026)
  RACE + AGE 0.434 (0.071) 0.504 (0.053) 0.624 (0.027)
  Fine 0.323 (0.080) 0.389 (0.065) 0.509 (0.042)
      Overall performance²  
Full-cohort Gold standard 0.095 (0.003) 0.104 (0.004) 0.125 (0.003)
NCC (L-B) No matching 0.260 (0.015) 0.263 (0.018) 0.219 (0.113)
  RACE 0.258 (0.014) 0.261 (0.016) 0.245 (0.085)
  RACE + AGE 0.261 (0.018) 0.264 (0.019) 0.275 (0.029)
  Fine 0.284 (0.059) 0.286 (0.061) 0.295 (0.068)
NCC (Weighted) No matching 0.327 (0.007) 0.326 (0.006) 0.263 (0.132)
  RACE 0.328 (0.007) 0.327 (0.006) 0.297 (0.099)
  RACE + AGE 0.328 (0.007) 0.327 (0.006) 0.330 (0.006)
  Fine 0.370 (0.009) 0.369 (0.008) 0.373 (0.008)

Note that we report the average of the weighted Harrell’s C-index for discrimination, the calibration slope for calibration, and the weighted Brier score for overall performance, with standard deviations in parentheses.

¹ 5-fold CV was used.

² The median follow-up time (i.e. 13 years) was used to specify the time point.

+ The Lin and Ying method for the un-stratified CC design and the Borgan II method for the stratified CC design were used.

B-2. Empirical Brier score at different time points (t = 5, 10, 15) for the BIO1 model

Abbreviations: LB#, L-B method and W#, weighted method, with matching on confounders: (1) no matching, (2) RACE, (3) RACE + quartile AGE, and (4) fine (RACE + exact AGE) matching; L&Y, Lin and Ying; B2, Borgan II. The red cross mark indicates the average Brier score for the gold standard model (i.e. full-cohort analysis).

B-3. Empirical Brier score at different time points (t = 5, 10, 15) for the BIO3 model

Abbreviations: LB#, L-B method and W#, weighted method, with matching on confounders: (1) no matching, (2) RACE, (3) RACE + quartile AGE, and (4) fine (RACE + exact AGE) matching; L&Y, Lin and Ying; B2, Borgan II. The red cross mark indicates the average Brier score for the gold standard model (i.e. full-cohort analysis).

Appendix C.

C-1.

Predictive performance of the models: discrimination, calibration, and overall performance from additional simulations with a 5% incidence rate.

Model   Base + BIO1 Base + BIO2 Base + BIO3
      Discrimination  
Full-cohort Gold standard 0.609 (0.018) 0.622 (0.017) 0.671 (0.014)
NCC (L-B) No matching 0.616 (0.026) 0.628 (0.025) 0.674 (0.022)
  RACE 0.600 (0.026) 0.614 (0.025) 0.666 (0.022)
  RACE + AGE 0.594 (0.026) 0.609 (0.025) 0.663 (0.022)
  Fine 0.599 (0.035) 0.613 (0.034) 0.666 (0.030)
NCC (Weighted) No matching 0.614 (0.023) 0.626 (0.021) 0.673 (0.018)
  RACE 0.613 (0.022) 0.625 (0.021) 0.673 (0.018)
  RACE + AGE 0.613 (0.022) 0.625 (0.021) 0.673 (0.018)
  Fine 0.615 (0.028) 0.628 (0.027) 0.675 (0.025)
CC+ Un-stratified 0.614 (0.023) 0.626 (0.023) 0.674 (0.020)
  Stratified 0.612 (0.022) 0.625 (0.021) 0.673 (0.020)
      Calibration¹  
Full-cohort Gold standard 0.874 (0.064) 0.911 (0.044) 0.966 (0.018)
NCC (L-B) No matching 0.761 (0.150) 0.825 (0.108) 0.927 (0.050)
  RACE 0.743 (0.191) 0.826 (0.125) 0.941 (0.051)
  RACE + AGE 0.708 (0.218) 0.807 (0.137) 0.938 (0.052)
  Fine 0.598 (0.408) 0.738 (0.273) 0.925 (0.098)
NCC (Weighted) No matching 0.794 (0.148) 0.852 (0.106) 0.943 (0.050)
  RACE 0.784 (0.145) 0.843 (0.104) 0.939 (0.049)
  RACE + AGE 0.772 (0.144) 0.835 (0.103) 0.937 (0.049)
  Fine 0.615 (0.028) 0.639 (0.248) 0.837 (0.109)
CC+ Un-stratified 0.730 (0.197) 0.788 (0.155) 0.898 (0.082)
  Stratified 0.752 (0.186) 0.806 (0.147) 0.909 (0.082)
      Overall performance²  
Full-cohort Gold standard 0.095 (0.006) 0.104 (0.006) 0.125 (0.006)
NCC (L-B) No matching 0.300 (0.072) 0.303 (0.077) 0.281 (0.133)
  RACE 0.297 (0.065) 0.300 (0.071) 0.279 (0.130)
  RACE + AGE 0.299 (0.085) 0.302 (0.088) 0.312 (0.105)
  Fine 0.172 (0.060) 0.174 (0.060) 0.189 (0.068)
NCC (Weighted) No matching 0.265 (0.010) 0.265 (0.010) 0.241 (0.081)
  RACE 0.266 (0.010) 0.266 (0.009) 0.242 (0.081)
  RACE + AGE 0.268 (0.010) 0.268 (0.009) 0.271 (0.009)
  Fine 0.328 (0.015) 0.327 (0.014) 0.330 (0.014)
CC+ Un-stratified 0.330 (0.018) 0.343 (0.018) 0.368 (0.021)
  Stratified 0.329 (0.018) 0.343 (0.018) 0.367 (0.021)

Note that we report the average of the weighted Harrell’s C-index for discrimination, the calibration slope for calibration, and the weighted Brier score for overall performance, with standard deviations in parentheses.

¹ 5-fold CV was used.

² The median follow-up time (i.e. 13 years) was used to specify the time point.

+ The Lin and Ying method for the un-stratified CC design and the Borgan II method for the stratified CC design were used.

Funding Statement

This work was partially supported by National Institutes of Health grants R01 CA178949 and UM1 CA182934.

Footnotes

Note that r̃_j is the indicator of subject j being in the sub-cohort, π is the proportion of the sub-cohort, and π_{l(j)} is the proportion of the sub-cohort in each stratum.

+ Values are displayed as mean (SD) for AGE and AGEMEN or N (%) for BIOPSY, FTP, RELATIVE and RACE.

* The coefficients of the base model were used as the true values to generate data in our simulations.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1.Anderson G.L., Mclntosh M., Wu L., Barnett M., Goodman G., Thorpe J.D., Bergan L., Thornquist M.D., Scholler N., Kim N., O'Briant K., Drescher C., and Urban N., Assessing lead time of selected ovarian cancer biomarkers: A nested case–control study. J. Natl. Cancer Inst. 102 (2010), pp. 26–38. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Biesheuvel C.J., Vergouwe Y., Oudega R., Hoes A.W., Grobbee D.E., and Moons K.G., Advantages of the nested case-control design in diagnostic research. BMC Med. Res. Methodol. 8 (2008), pp. 1–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Borgan O., Langholz B., Samuelsen S.O., Goldstein L., and Pogoda J., Exposure stratified case-cohort designs. Lifetime Data Anal. 6 (2000), pp. 39–58. [DOI] [PubMed] [Google Scholar]
  • 4.van den Brandt P.A., Spiegelman D., Yaun S., Adami H., Beeson L., Folsom A.R., Fraser G., Goldbohm R.A., Graham S., Kushi L., Marshall J.R., Miller A.B., Rohan T., Smith-Warner S.A., Speizer F.E., Willett W.C., Wolk A., and Hunter D.J., Pooled analysis of prospective cohort studies on height, weight, and breast cancer risk. Am. J. Epidemiol. 152 (2000), pp. 514–527. [DOI] [PubMed] [Google Scholar]
  • 5.Cai T. and Zheng Y., Evaluating prognostic accuracy of biomarkers in nested case–control studies. Biostatistics 13 (2012), pp. 89–100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Chen K., Statistical estimation in the proportional hazards model with risk set sampling. Ann. Stat. 32 (2004), pp. 1513–1532. [Google Scholar]
  • 7.Chen K. and Lo S.-H., Case-cohort and case-control analysis with Cox's model. Biometrika 86 (1999), pp. 755–764. [Google Scholar]
  • 8.Clendenen T.V., Ge W., Koenig K.L., Afanasyeva Y., Agnoli C., Brinton L.A., Darvishian F., Dorgen J.F., Eliassen A.H., Falk R.T., Hallmans G., Hankinson S.E., Hoffman-Bolton J., Key T.J., Krogh V., Nichols H.B., Sandler D.P., Schoemaker M.J., Sluss P.M., Sund M., Swerdlow A.J., Visvanathan K., Zeleniuch-Jacquotte A., and Liu M., Breast cancer risk prediction in women aged 35–50 years: Impact of including sex hormone concentrations in the Gail model. Breast Cancer Res. 21 (2019), pp. 1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Cook N.R., Paynter N.P., Eaton C.B., Manson J.E., Martin L.W., Robinson J.G., Rossouw J.E., Wasserthell-Smoller S., and Ridker P.M., Comparison of the Framingham and Reynolds risk scores for global cardiovascular risk prediction in the multiethnic women's Health Initiative. Circulation 125 (2012), pp. 1748–1756. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.D'Agostino R.B., Vasan R.S., Pencina M.J., Wolf P.A., Cobain M., Massaro J.M., and Kannel W.B., General cardiovascular risk profile for use in primary care. Circulation 117 (2008), pp. 743–753. [DOI] [PubMed] [Google Scholar]
  • 11.Decarli A., Calza S., Masala G., Specchia C., Palli D., and Gail M.H., Gail model for prediction of absolute risk of invasive breast cancer: Independent evaluation in the Florence–European prospective investigation into cancer and nutrition cohort. J. Natl. Cancer Inst. 98 (2006), pp. 1686–1693. [DOI] [PubMed] [Google Scholar]
  • 12.De Ruijter W., Westendorp R.G., Assendelft W.J., den Elzen W.P., de Craen A.J., le Cessie S., and Gussekloo J., Use of Framingham risk score and new biomarkers to predict cardiovascular mortality in older people: Population based observational cohort study. Br. Med. J. 338 (2009), pp 1–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Gail M.H., Brinton L.A., Byar D.P., Corle D.K., Green S.B., Schairer C, and Mulvihill J.J., Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. JNCI: J. National Cancer Inst. 81 (1989), pp. 1879–1886. [DOI] [PubMed] [Google Scholar]
  • 14.Ganna A., Reilly M., de Faire U., Pedersen N., Magnusson P., and Ingelsson E., Risk prediction measures for case-cohort and nested case-control designs: An application to cardiovascular disease. Am. J. Epidemiol. 175 (2012), pp. 715–724. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.W. Ge, T.V. Clendenen, Y. Afanasyeva, K.L. Koenig, C. Agnoli, L.A. Brinton, J.F. Dorgan, A.H. Eliassen, R.T. Falk, G. Hallmans, S.E. Hankinson, J. Hoffman-Bolton, T.J. Key, V. Krogh, H.B. Nichols, D.P. Sandler, M.J. Schoemaker, P.M. Sluss, M. Sund, A.J. Swerdlow, K. Visvanathan, M. Liu, and A. Zeleniuch-Jacquotte, Circulating anti-müllerian hormone and breast cancer risk: A study in ten prospective cohorts. Int. J. Cancer 142 (2018), pp. 2215–2226. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Graf E., Schmoor C., Sauerbrei W., and Schumacher M., Assessment and comparison of prognostic classification schemes for survival data. Stat. Med. 18 (1999), pp. 2529–2545. [DOI] [PubMed] [Google Scholar]
  • 17.Harrell F.E., Califf R.M., Pryor D.B., Lee K.L., and Rosati R.A., Evaluating the yield of medical tests. JAMA 247 (1982), pp. 2543–2546. [PubMed] [Google Scholar]
  • 18.Harrell F.E., Lee K.L., Califf R.M., Pryor D.B., and Rosati R.A., Regression modelling strategies for improved prognostic prediction. Stat. Med. 3 (1984), pp. 143–152. [DOI] [PubMed] [Google Scholar]
  • 19.Harrell Jr F.E., Lee K.L., and Mark D.B., Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15 (1996), pp. 361–387. [DOI] [PubMed] [Google Scholar]
  • 20.Kim R.S., A new comparison of nested case–control and case–cohort designs and methods. Eur. J. Epidemiol. 30 (2015), pp. 197–207. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Langholz B. and Borgan Ø, Estimation of absolute risk from nested case-control data. Biometrics 53 (1997), pp. 767–774. [PubMed] [Google Scholar]
  • 22.Langholz B. and Thomas D.C., Nested case-control and case-cohort methods of sampling from a cohort: A critical comparison. Am. J. Epidemiol. 131 (1990), pp. 169–176. [DOI] [PubMed] [Google Scholar]
  • 23.Liao D., Cai J., Rosamond W.D., Barnes R.W., Hutchinson R.G., Whitsel E.A., Rautaharju P., and Heiss G., Cardiac autonomic function and incident coronary heart disease: A population-based case-cohort study: The ARIC study. Am. J. Epidemiol. 145 (1997), pp. 696–706. [DOI] [PubMed] [Google Scholar]
  • 24.Liddell F., McDonald J., and Thomas D., Methods of cohort analysis: Appraisal by application to asbestos mining. J. R. Stat. Soc. Ser. A (General) 140 (1977), pp. 469–483. [Google Scholar]
  • 25.Lin D. and Ying Z., Cox regression with incomplete covariate measurements. J. Am. Stat. Assoc. 88 (1993), pp. 1341–1349. [Google Scholar]
  • 26.Liu M., Lu W., Shore R.E., and Zeleniuch-Jacquotte A., Cox regression model with time-varying coefficients in nested case–control studies. Biostatistics 11 (2010), pp. 693–706. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Lu W. and Liu M., On estimation of linear transformation models with nested case–control sampling. Lifetime Data Anal. 18 (2012), pp. 80–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.McGeechan K., Macaskill P., Irwig L., Liew G., and Wong T.Y., Assessing new biomarkers and predictive models for use in clinical practice: A clinician's guide. Arch. Intern. Med. 168 (2008), pp. 2304–2310. [DOI] [PubMed] [Google Scholar]
  • 29.Melander O., Newton-Cheh C., Almgren P., Hedblad B., Berglund G., Engstrom G., Persson M., Smith G., Magnusson M., Christensson A., Struck J., Morgenthaler N.G., Bergmann A., Pencina M.J., and Wang T.J., Novel and conventional biomarkers for prediction of incident cardiovascular events in the community. JAMA 302 (2009), pp. 49–57. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Morris T.P., White I.R., and Crowther M.J., Using simulation studies to evaluate statistical methods. Stat. Med. 38 (2019), pp. 2074–2102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Parker A.S., Thiel D.D., Bergstralh E., Carlson R.E., Rangel L.J., Joseph R.W., Diehl N., and Karnes R.J., Obese men have more advanced and more aggressive prostate cancer at time of surgery than non-obese men after adjusting for screening PSA level and age: Results from two independent nested case–control studies. Prostate Cancer Prostatic Dis. 16 (2013), pp. 352–356. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Payne B.A., Hutcheon J.A., Ansermino J.M., Hall D.R., Bhutta Z.A., Bhutta S.Z., Biryabarema C., Grobman W.A., Groen H., Haniff F., Li J., Magee L.A., Merialdi M., Nakimuli A., Qu Z., Sikandar R., Sass N., Sawchuck D., Steyn D.W., Widmer M., Zhou J., and von Dadelszen P., A risk prediction model for the assessment and triage of women with hypertensive disorders of pregnancy in low-resourced settings: The miniPIERS (Pre-eclampsia Integrated Estimate of RiSk) multi-country prospective cohort study. PLoS Med. 11 (2014), pp. e1001589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Pencina M.J., D'Agostino R.B., Larson M.G., Massaro J.M., and Vasan R.S., Predicting the thirty-year risk of cardiovascular disease: The Framingham heart study. Circulation 119 (2009), pp. 3078–3084. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Polonsky T.S., McClelland R.L., Jorgensen N.W., Bild D.E., Burke G.L., Guerci A.D., and Greenland P., Coronary artery calcium score and risk classification for coronary heart disease prediction. JAMA 303 (2010), pp. 1610–1616. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Prentice R.L., A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73 (1986), pp. 1–11. [Google Scholar]
  • 36.Rahman M.S., Ambler G., Choodari-Oskooei B., and Omar R.Z., Review and evaluation of performance measures for survival prediction models in external validation settings. BMC Med. Res. Methodol. 17 (2017), pp. 1–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Saarela O., Kulathinal S., Arjas E., and Läärä E., Nested case–control data utilized for multiple outcomes: A likelihood approach and alternatives. Stat. Med. 27 (2008), pp. 5991–6008. [DOI] [PubMed] [Google Scholar]
  • 38.Salim A., Delcoigne B., Villaflores K., Koh W., Yuan J., van Dam R.M., and Reilly M., Comparisons of risk prediction methods using nested case-control data. Stat. Med. 36 (2017), pp. 455–465. [DOI] [PubMed] [Google Scholar]
  • 39.Samuelsen S.O., A pseudolikelihood approach to analysis of nested case-control studies. Biometrika 84 (1997), pp. 379–394. [Google Scholar]
  • 40.Sanderson J., Thompson S.G., White I.R., Aspelund T., and Pennells L., Derivation and assessment of risk prediction models using case-cohort data. BMC Med. Res. Methodol. 13 (2013), pp. 1–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Schnabel R.B., Sullivan L.M., Levy D., Pencina M.J., Massaro J.M., D'Agostino R.B., Newton-Cheh C., Yamamoto J.F., Magnani J.W., Tadros T.M., Kannel W.B., Wang T.J., Ellinor P.T., Wolf P.A., Vasan R.S., and Benjamin E.J., Development of a risk score for atrial fibrillation (Framingham Heart study): A community-based cohort study. Lancet 373 (2009), pp. 739–745. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Self S.G. and Prentice R.L., Asymptotic distribution theory and efficiency results for case-cohort studies. Ann. Stat. 16 (1988), pp. 64–81. [Google Scholar]
  • 43.Støer N.C. and Samuelsen S.O., Inverse probability weighting in nested case-control studies with additional matching—a simulation study. Stat. Med. 32 (2013), pp. 5328–5339. [DOI] [PubMed] [Google Scholar]
  • 44.R Core Team, R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2014. URL http://www.R-project.org/ (accessed 01/12/2014).
  • 45.Terry K.L., Schock H., Fortner R.T., Hüsing A., Fichorova R.N., Yamamoto H.S., Vitonis A.F., Johnson T., Overvad K., Tjønneland A., Boutron-Ruault M., Mesrine S., Severi G., Dossus L., Rinaldi S., Boeing H., Benetou V., Lagiou P., Trichopoulou A., Krogh V., Kuhn E., Panico S., Bueno-de-Mesquita H.B., Onland-Moret N.C., Peeters P.H., Gram I.T., Weiderpass E., Duell E.J., Sanchez M., Ardanaz E., Etxezarreta N., Navarro C., Idahl A., Lundin E., Jirström K., Manjer J., Wareham N.J., Khaw K., Byme K.S., Travis R.C., Gunter M.J., Merritt M.A., Riboli E., Cramer D.W., and Kaaks R., A prospective evaluation of early detection biomarkers for ovarian cancer in the European EPIC cohort. Clin. Cancer Res. 22 (2016), pp. 4664–4675. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Therneau T. and Grambsch P., Modeling Survival Data: Extending the Cox Model, Springer Science & Business Media, New York, 2013. [Google Scholar]
  • 47.Thomas D.C. and Greenland S., The relative efficiencies of matched and independent sample designs for case-control studies. J. Chronic. Dis. 36 (1983), pp. 685–697. [DOI] [PubMed] [Google Scholar]
  • 48.van Daele P.L., Seibel M.J., Burger H., Hofman A., Grobbee D.E., van Leeuwen J.P., Birkenhager J.C., and Pols H.A., Case-control analysis of bone resorption markers, disability, and hip fracture risk: The Rotterdam study. Br. Med. J. 312 (1996), pp. 482–483. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.van Houwelingen H.C., Validation, calibration, revision and combination of prognostic survival models. Stat. Med. 19 (2000), pp. 3401–3415. [DOI] [PubMed] [Google Scholar]
  • 50.van der Leeuw J., Beulens J.W., van Dieren S., Schalkwijk C.G., Glatz J.F., Hofker M.H., Verschuren W.M., Boer J.M., van der Graaf Y., Visseren F.L., Peelen L.M., and van der Schouw Y.T., Novel biomarkers to improve the prediction of cardiovascular event risk in type 2 diabetes mellitus. J. Am. Heart. Assoc. 5 (2016), pp. e003048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Wu F., Koenig K.L., Zeleniuch-Jacquotte A., Jonas S., Afanasyeva Y., Wójcik O.P., Costa M., and Chen Y., Serum taurine and stroke risk in women: A prospective, nested case-control study. PLoS One 11 (2016), pp. e0149348. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Xu J., Sun J., Kader A.K., Lindström S., Wiklund F., Hsu F., Johansson J., Zheng S.L., Thomas G., Hayes R.B., Kraft P., Hunter D.J., Chanock S.J., Isaacs W.B., and Grönberg H., Estimation of absolute risk for prostate cancer using genetic markers and family history. Prostate 69 (2009), pp. 1565–1572. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Zethelius B., Berglund L., Sundström J., Ingelsson E., Basu S., Larsson A., Venge P., and Ärnlöv J., Use of multiple biomarkers to improve the prediction of death from cardiovascular causes. N. Engl. J. Med. 358 (2008), pp. 2107–2116. [DOI] [PubMed] [Google Scholar]
  • 54.Zheng S.L., Sun J., Wiklund F., Smith S., Stattin P., Li G., Adami H., Hsu F., Zhu Y., Bälter K., Kader A.K., Turner A.R., Liu W., Bleecker E.R., Meyers D.A., Duggan D., Carpten J.D., Chang B., Isaacs W.B., Xu J., and Grönberg H., Cumulative association of five genetic variants with prostate cancer. N. Engl. J. Med. 358 (2008), pp. 910–919. [DOI] [PubMed] [Google Scholar]

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
