Author manuscript; available in PMC: 2023 Oct 1.
Published in final edited form as: Biom J. 2022 Jun 26;64(7):1240–1259. doi: 10.1002/bimj.202100194

New weighting methods when cases are only a subset of events in a nested case-control study

Qian M Zhou a, Xuan Wang b, Yingye Zheng c, Tianxi Cai b
PMCID: PMC10249867  NIHMSID: NIHMS1905313  PMID: 35754309

Abstract

Nested case-control (NCC) is a sampling method widely used for developing and evaluating risk models with expensive biomarkers on large prospective cohort studies. In a typical NCC design, biomarker values are obtained on a sub-cohort, where cases consist of all the events (subjects who experience the event during the follow-up). However, when the number of events is not small, due to the cost and limited availability of bio-specimen, one may select only a subset of events as cases. We refer to such a variation as the untypical NCC. Unfortunately, existing inverse probability weighted (IPW) estimators for the untypical NCC are biased, and they only focus on relative risk parameters under the proportional hazards (PH) model. In this manuscript, we propose new weighting methods that produce consistent IPW estimators for not only relative risk parameters but also several metrics that evaluate a risk model’s predictive performance. We also provide the inference procedure via perturbation resampling, which captures all the variance and between-subject covariance induced by the sampling processes for both case and control selections. In addition, our methods are not limited to the PH model, and they can be applied to the time-specific generalized linear model. Under the typical NCC design, our new weights are equivalent to the weight proposed in Samuelsen; under the untypical NCC, the IPW estimators using our weights have smaller bias and variance than the existing methods. We will demonstrate this improved performance via both analytical and numerical investigations.

Keywords: between-subject covariance, inverse probability weighting, nested case-control, perturbation resampling, time-dependent accuracy measure

1. Introduction

Risk prediction using novel biomarkers plays a vital role in disease prevention and management. The development and evaluation of risk models require rich information from large-scale cohort studies where participants are followed prospectively to the clinical outcome of interest, and their clinical information is collected at baseline. Many cohorts also obtain biological specimens that are used later for investigating new biomarkers to improve the predictive capacity of the risk model. Ascertaining biomarkers on a large population is often costly and laborious, and there is a need to preserve precious biological samples. Thus, sub-cohort sampling such as nested case-control (NCC) is often employed, where new biomarkers are measured on the sub-cohort instead of the entire cohort.

An NCC sub-cohort is constructed in two steps: first, cases are selected, and second, for each case, a number of controls are selected from a group of subjects who are event-free at the failure time of the case; this group is referred to as the risk set of the case. For a typical NCC study, all the events (subjects who encounter the event during the follow-up) are included as cases at the first step. However, when the number of events is not small, the cost and sample constraints described above can make it impractical to include all the events as cases.

To address this challenge, some studies modified the typical NCC design by selecting only a subset of events as cases. We refer to this variation as the untypical NCC. For example, Jakszyn et al. (2006) investigated the association of Helicobacter pylori infection and vitamin C levels with the risk of gastric cancer. They included 229 out of 314 gastric cancer patients as cases, and for each case, two to four controls were selected; the same design was used in Jakszyn et al. (2012). Lü et al. (2018) conducted a multi-center study on fatality, hospitalization costs, and length of stay due to healthcare-associated infection (HAI). They selected 10 HAI events as cases from each of the 51 hospitals, and for each case, one control is selected and matched on several criteria. Besides these applications, Edelmann et al. (2020) and Graziano et al. (2021) compared the untypical NCC with other sub-cohort sampling methods, such as case-cohort, via simulation studies. Unlike these two papers, our manuscript focuses on just NCC, particularly the untypical variation.

For analyzing typical NCC data, conditional logistic regression has been widely used to estimate hazard ratios under the proportional hazards (PH) model (Goldstein and Langholz 1992). Another popular approach is the inverse probability weighted (IPW) estimation (Samuelsen 1997, Cai and Zheng 2012; 2011, Zhou et al. 2015). Compared to conditional logistic regression, the IPW can be applied to a variety of models, including the PH model and time-specific generalized linear model (GLM) (Uno et al. 2007). Samuelsen (1997) provided the weight formula: the inverse of the selection probability. For the typical NCC, since all the events are selected as cases at the first step, their selection probability is 1, and thus, their weights are 1.

What about the untypical NCC? Conditional logistic regression would still be valid, but some studies misused it. For example, Jakszyn et al. (2006) fit an unconditional logistic regression but included the matching variables as the covariates in the regression. Edelmann et al. (2020) and Graziano et al. (2021) employed the IPW method. However, their weights for the events in the sub-cohort are different: Edelmann et al. (2020) still used 1, but Graziano et al. (2021) used 1/π1, where π1 is the proportion of the events that are selected as cases at the first step.

Unfortunately, our investigations show that these two existing IPW estimators are both biased. This is because they fail to recognize that, for the untypical NCC, some events are selected as cases at the first step, whereas others are selected only at the second step through the control selection. To differentiate these two scenarios, we refer to the former as event cases and the latter as event controls. Because of this difference, the weight must be designed carefully to avoid biased estimation and to improve estimation efficiency.

In this manuscript, we propose three weighting methods that all result in consistent IPW estimators. Our first two methods weight event cases and event controls differently based on how they are selected. In contrast, our third weight has the same formula for these two groups, which is the inverse of the overall selection probability: the probability of being ever selected to the sub-cohort as a case or a control. For this weighting method, we formulate the NCC design as a two-stage stratified sampling framework described in Breslow et al. (2009). Under this framework, our third weight follows the method of Horvitz and Thompson (1952). We will prove that the expectation of each weight given the data is either exactly or asymptotically 1, which is the condition of the consistency for the IPW estimator.

Statistical inference is another challenge for the IPW estimation under the untypical NCC. It is difficult to obtain an analytical variance estimate for the IPW estimator because the variance expression is complicated. Standard resampling procedures such as bootstrap fail to capture the correlation induced by the finite-population sampling (Gray 2009). Cai and Zheng (2013) proposed a perturbation resampling method for the typical NCC, and their approach accounts for only between-control correlations. However, for the untypical NCC, sampling processes are involved for both case and control selections. Thus, we propose a new perturbation procedure that captures both between-case and between-control correlations.

In this manuscript, we provide IPW estimators under both the PH model and time-specific GLM. The model parameters characterize relative risks, such as log hazard ratios for the PH model, which are the focus of all the existing works on the untypical NCC. Besides these parameters, we are also interested in several time-dependent accuracy summaries for evaluating the risk model’s predictive performance, such as the true positive rate (TPR), false positive rate (FPR), positive predictive value (PPV), negative predictive value (NPV), and area under a receiver operating characteristic (ROC) curve (AUC) (Heagerty and Zheng 2005, Cai and Zheng 2012; 2011, Zhou et al. 2015). To make our methods publicly available, we provide an R package NCCIPW at https://github.com/michellezhou2009/NCCIPW.git.

The remainder of the manuscript is organized as follows. In Section 2, we introduce our three new weights and present the IPW estimators for model and accuracy parameters. In Section 3, we derive and compare the asymptotic properties of the IPW estimators using our weights. We also describe our perturbation resampling procedure for drawing inferences on the parameters. We compare our methods with the existing approaches via simulation studies in Section 4 and a data example in Section 5. Concluding remarks are given in Section 6.

2. IPW Estimation with Three New Weights

In this section, we introduce our three new weights and their resulting IPW estimators for model and accuracy parameters. Under the typical NCC, the three weights are equivalent to the weight of Samuelsen (1997). Under the untypical NCC, their differences are displayed in Table 1, which lists the expressions of each weight for three groups of subjects in the sub-cohort: event cases, event controls, and non-event controls. We have defined event cases and event controls earlier: they have the same disease status but enter the sub-cohort in different ways. Non-event controls are subjects who are event-free during the follow-up and selected as controls. Before these descriptions, we first introduce notation that we will use throughout the manuscript.

Table 1:

Expressions of our three weights $\tilde{w}_j$, $\hat{w}_j$, and $w_{HT,j}$ for three groups of subjects in the sub-cohort under the untypical NCC, i.e., $\pi_1<1$: (i) event cases, events that are selected as cases; (ii) event controls, events that are selected only through the control selection; (iii) non-event controls, non-events that are selected as controls. Note that some event cases might be selected as other cases’ controls, and thus, their $V_{0j}$ could be 1 or 0.

| Group | $(\delta_j,V_{1j},V_{0j})$ | $\tilde{w}_j$ | $\hat{w}_j$ | $w_{HT,j}$ |
|---|---|---|---|---|
| event cases | $(1, 1, V_{0j})$ | $1$ | $1/\pi_1$ | $1/[\pi_1+(1-\pi_1)\hat{p}_{0j}]$ |
| event controls | $(1, 0, 1)$ | $1/\hat{p}_{0j}$ | $0$ | $1/[\pi_1+(1-\pi_1)\hat{p}_{0j}]$ |
| non-event controls | $(0, 0, 1)$ | $1/\hat{p}_{0j}$ | $1/\hat{p}_{0j}$ | $1/\hat{p}_{0j}$ |

2.1. Notation

An NCC data set includes variables on survival outcome, markers, and sub-cohort sampling indicators. For survival outcome, let $T_j^{\dagger}$ denote the time to the event of interest. For some subjects, $T_j^{\dagger}$ might not be observed due to censoring; for instance, the subject is lost to follow-up, or the follow-up ends. Let $\tilde{C}_j$ denote a random censoring time, and let $\tau$ denote the follow-up duration. For each subject, we only observe $T_j=\min(T_j^{\dagger},C_j)$ and $\delta_j=I(T_j^{\dagger}\le C_j)$, where $C_j=\min(\tilde{C}_j,\tau)$, and $I(\cdot)$ is an indicator function. Subjects who experience the event during the follow-up, i.e., with $\delta_j=1$, are referred to as events, and those with $\delta_j=0$ are referred to as non-events. Let $Z_j$ denote a $p$-dimensional vector of clinical markers and biomarkers.

We define two sub-cohort sampling indicator variables: $V_{1j}=1$ if subject $j$ is selected as a case, and $V_{0j}=1$ if subject $j$ is selected as a control. Let $V_j=\delta_j\{V_{1j}+(1-V_{1j})V_{0j}\}+(1-\delta_j)V_{0j}$ indicate whether subject $j$ is ever selected to the sub-cohort. In an NCC data set, $(T_j,\delta_j,V_{1j},V_{0j})$ and clinical markers are observed on the full cohort, but biomarkers are usually available for only subjects with $V_j=1$.

Furthermore, we express the control indicator variable as

$$V_{0j}=1-\prod_{i:\,j\in\mathcal{R}_i}\left(1-V_{1i}V_{0ji}\right), \tag{1}$$

where $V_{0ji}$ is an indicator variable: $V_{0ji}=1$ if subject $j$ is a control of case subject $i$, and $\mathcal{R}_i$ denotes the risk set for subject $i$. Specifically, $\mathcal{R}_i=\{k: 1\le k\le N,\ T_k\ge T_i\}$ consists of all the subjects who have not experienced the event before subject $i$'s event time. For some studies, the controls are also matched on some variables. For these situations, the risk set is expressed as $\mathcal{R}_i=\{k: 1\le k\le N,\ T_k\ge T_i,\ |M_k-M_i|\le a_0\}$, where $M_i$ is a vector of matching variables, and $|a|\le a_0$ denotes $|a|$ being less than or equal to $a_0$ component-wise.

In addition, as described earlier, $\pi_1\in(0,1]$ is the proportion of events that are selected as cases. When $\pi_1=1$, all the events are cases, and this study is a typical NCC; when $\pi_1<1$, it is an untypical variation.

2.2. Three New Weights

Our first two methods assign different weights to event cases and event controls. Like Edelmann et al. (2020) and Graziano et al. (2021), we consider two different values for event cases: $1$ or $1/\pi_1$. To make the expectation of each weight equal to 1, we need to weight event controls correspondingly. Specifically, our first weight is defined as

$$\tilde{w}_j=\delta_jV_{1j}+(1-\delta_jV_{1j})V_{0j}/\hat{p}_{0j}, \tag{2}$$

and the second weight is

$$\hat{w}_j=\delta_jV_{1j}/\pi_1+(1-\delta_j)V_{0j}/\hat{p}_{0j}, \tag{3}$$

where

$$\hat{p}_{0j}=1-\prod_{i:\,j\in\mathcal{R}_i}\left\{1-\frac{m}{n_i-1}\delta_iV_{1i}\right\} \tag{4}$$

is the probability of subject $j$ being selected as a control, $m$ is the number of controls for each case, and $n_i$ is the size of the risk set $\mathcal{R}_i$ for subject $i$. By Equations (2) and (3), we have: for non-event controls, $\tilde{w}_j=\hat{w}_j=1/\hat{p}_{0j}$; for event cases, $\tilde{w}_j=1$ but $\hat{w}_j=1/\pi_1$; for event controls, $\tilde{w}_j=1/\hat{p}_{0j}$ but $\hat{w}_j=0$.

We want to point out that our formula of $\hat{p}_{0j}$ is different from Equation (2.6) in Samuelsen (1997), given as $\hat{p}_{0j}=1-\prod_{i:\,j\in\mathcal{R}_i}\left\{1-\frac{m}{n_i-1}\delta_i\right\}$, which does not contain $V_{1i}$. We modify this formula because, under the untypical NCC, controls are selected only for selected cases, and not all the events are selected as cases. Thus, $V_{1i}$ is needed. When $\pi_1=1$, $V_{1i}=1$ whenever $\delta_i=1$, so our formula (4) is equivalent to Equation (2.6) in Samuelsen (1997) under the typical NCC.

Remark 1 The probability $\hat{p}_{0j}$ is usually small for large cohort studies because $m$ is often very small relative to the size of the risk set. In addition, if an event, say subject $j$, has a short event time, its probability $\hat{p}_{0j}$ can be close to zero since subject $j$ is eligible for the risk sets of very few cases.
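To make Equation (4) concrete, the following is a minimal Python sketch, not the implementation in the NCCIPW package. The function name `control_selection_prob` is hypothetical, and the sketch assumes no matching, a risk set of the form "all subjects with observed time at least that of the case", a risk-set size $n_i$ that counts the case itself, and a product that skips the subject's own case record.

```python
def control_selection_prob(times, delta, v1, m):
    """Sketch of Equation (4): p0[j] = 1 - prod_i {1 - m/(n_i - 1) * delta_i * V1_i}.

    Assumptions (not from the source): no matching, R_i = {k : T_k >= T_i},
    n_i counts the case itself, and a case is never its own control,
    so the product skips i == j.
    """
    n = len(times)
    p0 = []
    for j in range(n):
        prob_never_selected = 1.0
        for i in range(n):
            if i == j or not (delta[i] and v1[i]):
                continue  # only selected cases contribute a factor
            if times[j] >= times[i]:  # subject j is in the risk set of case i
                n_i = sum(1 for k in range(n) if times[k] >= times[i])
                prob_never_selected *= 1.0 - m / (n_i - 1)
        p0.append(1.0 - prob_never_selected)
    return p0


times = [1.0, 2.0, 3.0, 4.0, 5.0]
delta = [1, 1, 0, 0, 0]
# Only the first event is selected as a case (untypical NCC).
print(control_selection_prob(times, delta, v1=[1, 0, 0, 0, 0], m=1))
# Both events are selected as cases (typical NCC).
print(control_selection_prob(times, delta, v1=[1, 1, 0, 0, 0], m=1))
```

In this toy cohort, an unselected event contributes no factor to the product, which is the role of $V_{1i}$ in Equation (4).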

Remark 2 Using our notation, the weight in Edelmann et al. (2020) is expressed as

$$\kappa_{1,j}=\delta_j[V_{1j}+(1-V_{1j})V_{0j}]+(1-\delta_j)V_{0j}/\hat{p}_{0j}, \tag{5}$$

and the weight in Graziano et al. (2021) is expressed as

$$\kappa_{2,j}=\delta_j[V_{1j}+(1-V_{1j})V_{0j}]/\pi_1+(1-\delta_j)V_{0j}/\hat{p}_{0j}. \tag{6}$$

For non-event controls, $\kappa_{1,j}=\kappa_{2,j}=1/\hat{p}_{0j}$; for both event cases and event controls, $\kappa_{1,j}=1$ but $\kappa_{2,j}=1/\pi_1$. In Appendix A.1 of Supplementary Material, we show that neither weight has an expectation of 1 when $\pi_1<1$, and consequently, the resulting IPW estimators are biased for the untypical NCC.
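As a heuristic check of this claim (a sketch only, not the derivation in Appendix A.1), consider an event $j$ and assume for illustration that it is selected as a case with probability $\pi_1$ and that, when it is not a case, it is selected as a control with probability $\hat{p}_{0j}$, treated here as a fixed number with $0<\hat{p}_{0j}<1$. Under these simplifying assumptions and $\pi_1<1$,

$$E[\kappa_{1,j}]\approx\pi_1+(1-\pi_1)\hat{p}_{0j}<1, \qquad E[\kappa_{2,j}]\approx 1+\frac{(1-\pi_1)\hat{p}_{0j}}{\pi_1}>1,$$

whereas the same calculation gives $E[\tilde{w}_j]\approx\pi_1+(1-\pi_1)\hat{p}_{0j}/\hat{p}_{0j}=1$ and $E[\hat{w}_j]\approx\pi_1/\pi_1=1$. For instance, with $\pi_1=0.5$ and $\hat{p}_{0j}=0.02$, the two approximations for $\kappa_{1,j}$ and $\kappa_{2,j}$ are $0.51$ and $1.02$, respectively.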

Unlike $\tilde{w}_j$ and $\hat{w}_j$, our third weight has the same formula for event cases and event controls. This method is inspired by Breslow et al. (2009), which formulated the case-cohort sampling as a two-stage stratified sampling. Following this idea, we cast the NCC design in such a framework and, based on it, propose a Horvitz-Thompson type of weight.

NCC as a two-stage stratified sampling.

In the first stage, the subjects in the full cohort are classified into $N_1$ strata, where $N_1$ is the number of events, i.e., $N_1=\sum_{j=1}^{N}\delta_j$ with $N$ being the size of the full cohort. Each stratum includes one event and his/her risk set. At the second stage, a two-step sampling is performed: first, select $n_1\le N_1$ strata with $\pi_1=n_1/N_1$, and second, within each selected stratum, select the case and $m$ subjects from the risk set of this case.

This framework differs from the one described in Breslow et al. (2009) in two aspects. First, the strata in Breslow et al. (2009) are disjoint, but the strata in NCC usually overlap because a subject can be eligible for the risk set of more than one case. Second, the sampling in the untypical NCC is not independent. The strata selection is a finite-population sampling; if one stratum is selected, the others have a lower chance of being selected.
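The two sampling steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' sampling code: the function name `sample_untypical_ncc` is hypothetical, cases are drawn by simple random sampling without replacement, matching is ignored, and a case is excluded from its own pool of candidate controls.

```python
import random


def sample_untypical_ncc(times, delta, pi1, m, seed=0):
    """Draw an untypical NCC sub-cohort: a fraction pi1 of events, then m controls each."""
    rng = random.Random(seed)
    n = len(times)
    events = [j for j in range(n) if delta[j]]
    n1 = max(1, round(pi1 * len(events)))          # number of selected strata (cases)
    cases = rng.sample(events, n1)                 # finite-population sampling of cases
    v1, v0, controls = [0] * n, [0] * n, {}
    for i in cases:
        v1[i] = 1
        # Candidate controls: subjects still at risk at the case's event time.
        risk = [k for k in range(n) if k != i and times[k] >= times[i]]
        controls[i] = rng.sample(risk, min(m, len(risk)))
        for k in controls[i]:
            v0[k] = 1                              # may also flag an event as a control
    return v1, v0, controls


times = [float(k) for k in range(1, 21)]
delta = [1] * 10 + [0] * 10
v1, v0, controls = sample_untypical_ncc(times, delta, pi1=0.5, m=2, seed=3)
print(sum(v1), "cases;", sum(v0), "distinct controls")
```

Because the same subject can be drawn from several overlapping risk sets, the number of distinct controls can be smaller than $m$ times the number of cases.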

Horvitz-Thompson’s weight.

Our third weight is defined as the inverse probability of being ever selected into the sub-cohort, given as

$$w_{HT,j}=\delta_j[V_{1j}+(1-V_{1j})V_{0j}]/[\pi_1+(1-\pi_1)\hat{p}_{0j}]+(1-\delta_j)V_{0j}/\hat{p}_{0j}. \tag{7}$$

This expression indicates that for both event cases and event controls, $w_{HT,j}=1/[\pi_1+(1-\pi_1)\hat{p}_{0j}]$; for non-event controls, $w_{HT,j}=1/\hat{p}_{0j}$.

2.3. Comparison of Three Weights

When $\pi_1=1$, all three weights have the same expression and are equivalent to the weight proposed in Samuelsen (1997): $\tilde{w}_j=\hat{w}_j=w_{HT,j}=\delta_j+(1-\delta_j)V_{0j}/\hat{p}_{0j}$.

When $\pi_1<1$, the three weights have the same expression for non-events: $\tilde{w}_j=\hat{w}_j=w_{HT,j}=V_{0j}/\hat{p}_{0j}$, but for events, they are completely different. Specifically,

  • for event cases, $\tilde{w}_j=1$, $\hat{w}_j=1/\pi_1$, and $w_{HT,j}=1/[\pi_1+(1-\pi_1)\hat{p}_{0j}]$; among them, $\hat{w}_j>w_{HT,j}>\tilde{w}_j$;

  • for event controls, $\tilde{w}_j=1/\hat{p}_{0j}$, $\hat{w}_j=0$, and $w_{HT,j}=1/[\pi_1+(1-\pi_1)\hat{p}_{0j}]$; among them, $\hat{w}_j<w_{HT,j}<\tilde{w}_j$. We also want to point out that the weight $\hat{w}_j$ completely discards this group of subjects for estimation.
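The comparison above can be reproduced numerically with a short Python sketch of Equations (2), (3), and (7). The function name `ncc_weights` and the example values $\pi_1=0.5$ and $\hat{p}_{0j}=0.02$ are illustrative assumptions, and $\hat{p}_{0j}$ is passed in as a known number rather than computed from a cohort.

```python
def ncc_weights(delta, v1, v0, p0, pi1):
    """Return (w_tilde, w_hat, w_ht) for one subject, following Equations (2), (3), (7)."""
    # Control term V0/p0; guarded so that unselected subjects with p0 == 0 give 0, not 0/0.
    ctrl = v0 / p0 if v0 else 0.0
    w_tilde = delta * v1 + (1 - delta * v1) * ctrl
    w_hat = delta * v1 / pi1 + (1 - delta) * ctrl
    ever_selected = v1 + (1 - v1) * v0
    w_ht = delta * ever_selected / (pi1 + (1 - pi1) * p0) + (1 - delta) * ctrl
    return w_tilde, w_hat, w_ht


pi1, p0 = 0.5, 0.02  # illustrative values only
groups = {
    "event case": (1, 1, 0),
    "event control": (1, 0, 1),
    "non-event control": (0, 0, 1),
}
for name, (delta, v1, v0) in groups.items():
    print(name, ncc_weights(delta, v1, v0, p0, pi1))
```

Setting `pi1=1.0` in the call makes the three weights coincide for every group, in line with the typical-NCC case.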

In Appendix A, we derive each weight’s expectation, variance, and covariance. These properties determine the consistency and asymptotic variance of its resulting IPW estimator, which will be delineated in Section 3.

2.4. IPW estimation

We consider two models for characterizing the relationship between the event time T and markers Z: (i) the PH model and (ii) the time-specific GLM. Under each model, we present the IPW estimators of the model parameters and accuracy parameters. We want to point out that these parameters are associated with the probability model in Equation (8) below, which describes a larger population where the full cohort is from (Breslow et al. 2009).

Since the IPW estimators using the three weights have a similar expression except for the weight, we will present the estimators using $w_j$, representing one of the three weights.

2.4.1. Model Parameters Estimation

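Both the PH model and time-specific GLM can be expressed in the following form:

$$P(T_j^{\dagger}\le t_0\mid Z_j)=g(\alpha_{t_0}+\beta_{t_0}^{\top}Z_j)\equiv\mathcal{P}_{t_0,j}, \tag{8}$$

where $T_j^{\dagger}$ is the event time, $g(\cdot)$ is a link function, and $\beta_{t_0}$, a $p$-dimensional vector, consists of the relative risk parameters that characterize the effects of the markers $Z_j$ on the risk $\mathcal{P}_{t_0,j}$.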

PH model.

This model can be expressed as $\mathcal{P}_{t_0,j}=1-\exp[-\exp\{\log\Lambda_0(t_0)+\beta^{\top}Z_j\}]$, where $\Lambda_0(t_0)$ is the baseline cumulative hazard function. Based on Equation (8), $\alpha_{t_0}=\log\Lambda_0(t_0)$, which can be estimated by the IPW Breslow’s estimator with the weight $w_j$ (Cai and Zheng 2012). The relative risk parameters $\beta$ are estimated via maximizing the IPW log partial likelihood function with the weight $w_j$ (Samuelsen 1997).

The PH model assumes the relative risks β to be constant over time. However, in practice, the biomarkers may have strong effects on the short-term risk but weak for the long-term risk, or vice versa (Zhou et al. 2015). In these situations, the time-specific GLM allows the marker effects to vary over time t0.

Time-specific GLM.

This model can be expressed as $\mathcal{P}_{t_0,j}=g(\alpha_{t_0}+\beta_{t_0}^{\top}Z_j)$, where both $\alpha_{t_0}$ and $\beta_{t_0}$ are functions of $t_0$. Given a $t_0$, these parameters can be estimated via double IPW. Each observation is weighted by $w_j\times\hat{\omega}_{t_0,j}$, where $w_j$ is the sampling weight accounting for the missing values of $Z_j$ due to the sub-cohort sampling, and $\hat{\omega}_{t_0,j}$ is the censoring weight accounting for the missing disease status $I(T_j^{\dagger}\le t_0)$ due to censoring, with $T_j^{\dagger}$ the uncensored event time. The censoring weight is given as $\hat{\omega}_{t_0,j}=\delta_jI(T_j\le t_0)/\hat{\mathcal{G}}(T_j)+I(T_j>t_0)/\hat{\mathcal{G}}(t_0)$, where $\hat{\mathcal{G}}(t)$ is a consistent estimator of $\mathcal{G}(t)=P(C_j\ge t)$, the survival function of the censoring time. If the censoring time is independent of both the event time and markers, $\hat{\mathcal{G}}(t)$ can be the Kaplan-Meier estimator (Kaplan and Meier 1958). If the censoring time depends on the markers $Z_j$, a PH model can be fit to estimate $P(C_j\ge t\mid Z_j)$.
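As an illustration of the censoring weight under independent censoring, the following Python sketch computes a Kaplan-Meier estimate of the censoring survival function and the resulting weights. The function names `km_censoring` and `censoring_weight` are hypothetical; the sketch treats censored observations ($\delta_j=0$) as the "events" of the censoring process, uses the left-continuous version of the estimator, and does not handle ties between event and censoring times in any special way.

```python
def km_censoring(times, delta):
    """Return a function G(t) estimating P(C >= t) by Kaplan-Meier on the censoring times."""
    steps = []
    for u in sorted(set(t for t, d in zip(times, delta) if d == 0)):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, d in zip(times, delta) if t == u and d == 0)
        steps.append((u, 1.0 - censored / at_risk))

    def G(t):
        surv = 1.0
        for u, factor in steps:
            if u < t:  # strict inequality gives the left-continuous P(C >= t)
                surv *= factor
        return surv

    return G


def censoring_weight(times, delta, t0):
    """omega_j = delta_j * I(T_j <= t0) / G(T_j) + I(T_j > t0) / G(t0)."""
    G = km_censoring(times, delta)
    weights = []
    for t, d in zip(times, delta):
        if t <= t0:
            weights.append(d / G(t))      # subjects censored before t0 get weight 0
        else:
            weights.append(1.0 / G(t0))   # known to be event-free at t0
    return weights


print(censoring_weight([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 0], t0=3.5))
```

In the double IPW estimators, each of these weights would be multiplied by the sampling weight $w_j$ of the same subject.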

Let $\hat{\alpha}_{t_0}$ and $\hat{\beta}_{t_0}$ denote the IPW estimates of the model parameters under either the PH model or time-specific GLM. Let $(\alpha_{t_0}^{*},\beta_{t_0}^{*})$ be the limiting values of $(\hat{\alpha}_{t_0},\hat{\beta}_{t_0})$ as $N\to\infty$; they can be regarded as the true values of the model parameters.

2.4.2. Accuracy Parameters Estimation

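The probability $\mathcal{P}_{t_0,j}$ in Equation (8) can be used as a risk score that classifies subjects into different risk categories. Given a cut-off value $c$, subjects with $\mathcal{P}_{t_0,j}>c$ are classified as the high-risk group, and the low-risk group consists of subjects with $\mathcal{P}_{t_0,j}\le c$. We evaluate the above models’ predictive accuracy with the following time-dependent TPR, FPR, PPV, and NPV.

These accuracy parameters are based on the population risk score $\mathcal{P}_{t_0,j}^{*}=g(\alpha_{t_0}^{*}+\beta_{t_0}^{*\top}Z_j)$. With $T_j^{\dagger}$ denoting the event time, they are defined as $\mathrm{TPR}_{t_0}(c)=\Pr(\mathcal{P}_{t_0,j}^{*}>c\mid T_j^{\dagger}\le t_0)$, $\mathrm{FPR}_{t_0}(c)=\Pr(\mathcal{P}_{t_0,j}^{*}>c\mid T_j^{\dagger}>t_0)$, $\mathrm{PPV}_{t_0}(c)=\Pr(T_j^{\dagger}\le t_0\mid\mathcal{P}_{t_0,j}^{*}>c)$, and $\mathrm{NPV}_{t_0}(c)=\Pr(T_j^{\dagger}>t_0\mid\mathcal{P}_{t_0,j}^{*}\le c)$. In addition, the time-dependent AUC is the area under the time-dependent ROC curve, which is a curve of $\mathrm{TPR}_{t_0}(c)$ versus $\mathrm{FPR}_{t_0}(c)$ over all possible values of $c$. The time-dependent AUC can be expressed as $\mathrm{AUC}_{t_0}=\Pr(\mathcal{P}_{t_0,i}^{*}>\mathcal{P}_{t_0,j}^{*}\mid T_i^{\dagger}\le t_0,\ T_j^{\dagger}>t_0)$, a conditional probability that, given a pair of an event and a non-event by time $t_0$, the event has a higher risk score.

With the estimates $(\hat{\alpha}_{t_0},\hat{\beta}_{t_0})$, we can calculate the estimated risk $\hat{\mathcal{P}}_{t_0,j}=g(\hat{\alpha}_{t_0}+\hat{\beta}_{t_0}^{\top}Z_j)$. The time-dependent accuracy measures described above can be estimated by the double IPW estimators:

$$\begin{aligned}
\widehat{\mathrm{TPR}}_{t_0}(c)&=\frac{\sum_{j=1}^{N}w_j\hat{\omega}_{t_0,j}I(\hat{\mathcal{P}}_{t_0,j}>c)I(T_j\le t_0)}{\sum_{j=1}^{N}w_j\hat{\omega}_{t_0,j}I(T_j\le t_0)}, &
\widehat{\mathrm{FPR}}_{t_0}(c)&=\frac{\sum_{j=1}^{N}w_j\hat{\omega}_{t_0,j}I(\hat{\mathcal{P}}_{t_0,j}>c)I(T_j>t_0)}{\sum_{j=1}^{N}w_j\hat{\omega}_{t_0,j}I(T_j>t_0)},\\
\widehat{\mathrm{PPV}}_{t_0}(c)&=\frac{\sum_{j=1}^{N}w_j\hat{\omega}_{t_0,j}I(\hat{\mathcal{P}}_{t_0,j}>c)I(T_j\le t_0)}{\sum_{j=1}^{N}w_j\hat{\omega}_{t_0,j}I(\hat{\mathcal{P}}_{t_0,j}>c)}, &
\widehat{\mathrm{NPV}}_{t_0}(c)&=\frac{\sum_{j=1}^{N}w_j\hat{\omega}_{t_0,j}I(\hat{\mathcal{P}}_{t_0,j}\le c)I(T_j>t_0)}{\sum_{j=1}^{N}w_j\hat{\omega}_{t_0,j}I(\hat{\mathcal{P}}_{t_0,j}\le c)},
\end{aligned}$$

and

$$\widehat{\mathrm{AUC}}_{t_0}=\frac{\sum_{i=1}^{N}\sum_{j=1}^{N}w_i\hat{\omega}_{t_0,i}w_j\hat{\omega}_{t_0,j}I(\hat{\mathcal{P}}_{t_0,i}>\hat{\mathcal{P}}_{t_0,j})I(T_i\le t_0)I(T_j>t_0)}{\sum_{i=1}^{N}\sum_{j=1}^{N}w_i\hat{\omega}_{t_0,i}w_j\hat{\omega}_{t_0,j}I(T_i\le t_0)I(T_j>t_0)}. \tag{9}$$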
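The weighted ratios above translate directly into code. The following Python sketch is one possible reading of these estimators, with hypothetical function names `accuracy_measures` and `auc_ipw`; it assumes that the argument `weight` already holds the product of the sampling weight and the censoring weight for each subject, with zero for subjects outside the sub-cohort or censored before $t_0$.

```python
def accuracy_measures(risk, times, weight, t0, c):
    """Double IPW estimates of (TPR, FPR, PPV, NPV) at cut-off c and time t0."""
    def wsum(cond):
        # Weighted count of subjects satisfying a condition on (risk, time).
        return sum(w for w, r, t in zip(weight, risk, times) if cond(r, t))

    tpr = wsum(lambda r, t: r > c and t <= t0) / wsum(lambda r, t: t <= t0)
    fpr = wsum(lambda r, t: r > c and t > t0) / wsum(lambda r, t: t > t0)
    ppv = wsum(lambda r, t: r > c and t <= t0) / wsum(lambda r, t: r > c)
    npv = wsum(lambda r, t: r <= c and t > t0) / wsum(lambda r, t: r <= c)
    return tpr, fpr, ppv, npv


def auc_ipw(risk, times, weight, t0):
    """Double IPW estimate of the time-dependent AUC, as in Equation (9)."""
    num = den = 0.0
    for ri, ti, wi in zip(risk, times, weight):
        if ti > t0:
            continue  # i must be an event by t0
        for rj, tj, wj in zip(risk, times, weight):
            if tj > t0:  # j must be event-free at t0
                den += wi * wj
                num += wi * wj * (ri > rj)
    return num / den


risk = [0.9, 0.2, 0.6, 0.1]
times = [0.5, 0.8, 2.0, 3.0]
weight = [1.0, 2.0, 1.0, 3.0]  # illustrative combined weights
print(accuracy_measures(risk, times, weight, t0=1.0, c=0.5))
print(auc_ipw(risk, times, weight, t0=1.0))
```

The double loop in `auc_ipw` is quadratic in the cohort size; it is kept for readability, not efficiency.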

3. Asymptotic Properties of IPW estimators

In this section, we will present the consistency and asymptotic distribution of the IPW estimators using our three weights: $\tilde{w}_j$, $\hat{w}_j$, and $w_{HT,j}$.

3.1. Consistency of IPW estimators

In Appendix A.1, we prove that given the data $\mathcal{D}=\{(T_i,\delta_i,M_i),\ i=1,\dots,N\}$, $E[\tilde{w}_j\mid\mathcal{D}]=1$, $E[\hat{w}_j\mid\mathcal{D}]=1$, and $E[w_{HT,j}\mid\mathcal{D}]=1+O_p(N^{-1})$ for any $0<\pi_1\le 1$. Thus, the IPW estimator using each weight is consistent for both the typical and untypical NCC.

3.2. Asymptotic Distribution of IPW Estimators

In Appendix B, we show that the IPW estimator with each weight can be expressed as a weighted sum of independent zero-mean random variables that are functions of data $(T_j,\delta_j,Z_j)$, shown in Equation (B.1). Thus, they are asymptotically normally distributed, and the asymptotic variance is the sum of two components. The first component is the model-based variance of the estimator as if the full cohort data $\{(T_j,\delta_j,Z_j),\ j=1,\dots,N\}$ are available (Equation (B.3)). The second component is the sampling variance, i.e., variability from the sub-cohort sampling, including the variance and covariance of $V_{1j}$ and $V_{0j}$ given the data (Equations (B.4) and (B.5)). Breslow et al. (2009) and Edelmann et al. (2020) expressed their IPW estimator variance by a similar decomposition.

3.3. Comparison of IPW Estimators

Since the three IPW estimators are all consistent, we compare their efficiency. As pointed out in Section 2.3, when π1=1, the three weights have the same expression, and thus, their IPW estimators have the same asymptotic variance for the typical NCC.

Our comparison focuses on π1<1, i.e., the untypical NCC. The three IPW estimators have the same model-based variance since this part does not involve the sampling weight. Their sampling variance for non-events is also the same since the three weights have the same expression for this group. Therefore, the difference of the total variance is on the sampling variance for the events, mainly controlled by the variance of the sampling weight.

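Equations (A.7) - (A.9) in Appendix A.2 express each weight’s variance for events: $\mathrm{Var}[\tilde{w}_j\mid\mathcal{D}]=(1-\pi_1)E_{V_1}\left[\frac{1-\hat{p}_{0j}}{\hat{p}_{0j}}\,\Big|\,\mathcal{D}\right]+O_p(N^{-1})$, $\mathrm{Var}[\hat{w}_j\mid\mathcal{D}]=\frac{1-\pi_1}{\pi_1}$, and $\mathrm{Var}[w_{HT,j}\mid\mathcal{D}]=E_{V_1}\left[\frac{1-\hat{p}_j}{\hat{p}_j}\,\Big|\,\mathcal{D}\right]+O_p(N^{-1})$ with $\hat{p}_j=\pi_1+(1-\pi_1)\hat{p}_{0j}$, where $V_1=\{V_{1j},\ j=1,\dots,N\}$. When $\pi_1<1$, we have:

$$\mathrm{Var}[\tilde{w}_j\mid\mathcal{D}]>\mathrm{Var}[\hat{w}_j\mid\mathcal{D}]\gtrsim\mathrm{Var}[w_{HT,j}\mid\mathcal{D}]. \tag{10}$$

The first inequality is because the probability $\hat{p}_{0j}$ is usually very small (Remark 1), leading to $\hat{p}_{0j}<\pi_1$, and thus, $\mathrm{Var}[\tilde{w}_j\mid\mathcal{D}]>\mathrm{Var}[\hat{w}_j\mid\mathcal{D}]$. The second inequality is because $\pi_1<\pi_1+(1-\pi_1)\hat{p}_{0j}$, and thus, $\mathrm{Var}[\hat{w}_j\mid\mathcal{D}]>\mathrm{Var}[w_{HT,j}\mid\mathcal{D}]$. Again, because $\hat{p}_{0j}$ is small, these two variances are very close. The order in Equation (10) determines that the IPW estimator with $\tilde{w}_j$ is less efficient than the other two, which have similar variances. These analytical conclusions agree with the numerical results of the simulation study in Section 4.2 and the data example in Section 5.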
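A quick numerical illustration of this ordering, as a sketch only: the Python function below (hypothetical name `weight_variances`) evaluates the leading terms of the three variance expressions while treating $\hat{p}_{0j}$ as a fixed number, i.e., dropping the expectation over $V_1$ and the $O_p(N^{-1})$ remainders. The values $\pi_1=0.5$ and $\hat{p}_{0j}=0.01$ are chosen for illustration.

```python
def weight_variances(pi1, p0):
    """Leading-order variances of (w_tilde, w_hat, w_HT) for an event, with p0 held fixed."""
    var_tilde = (1 - pi1) * (1 - p0) / p0
    var_hat = (1 - pi1) / pi1
    p_overall = pi1 + (1 - pi1) * p0      # overall selection probability of an event
    var_ht = (1 - p_overall) / p_overall
    return var_tilde, var_hat, var_ht


var_tilde, var_hat, var_ht = weight_variances(pi1=0.5, p0=0.01)
print(var_tilde, var_hat, var_ht)
```

With these inputs the first variance is roughly fifty times the other two, while the last two differ only in the second decimal place, which matches the qualitative statement in Equation (10).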

3.4. Inference via Perturbation

To draw inferences on the parameters of interest, we need a variance estimate for the IPW estimator. Samuelsen (1997) provided an analytical variance estimate for the PH model parameters under the typical NCC. However, for the untypical NCC, the IPW estimator’s variance expression is more complicated, as shown above, and therefore, obtaining an analytical estimate is challenging. Resampling procedures, such as bootstrap, fail to estimate the variance accurately because they cannot emulate the between-subject correlations (Gray 2009, Cai and Zheng 2013).

Cai and Zheng (2013) proposed a perturbation resampling procedure for variance estimation under the typical NCC. Their method mimics the variance and covariance of V0j via repeatedly perturbing these indicator variables. However, under the untypical NCC, there is also a sampling process for the case selection, so the perturbation of V0j alone is not sufficient. Thus, we extend this procedure by perturbing both V1j’s and V0j’s to recover all the variances and covariances.

Like bootstrap, the perturbation method creates many perturbed counterparts for the estimator. We can calculate their empirical variance, which approximates the finite-sample variance of the estimator.

Perturbed counterpart of IPW estimator.

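Here, we use $w_j$ to represent one of our weights. For the model parameters $\alpha_{t_0}$ and $\beta_{t_0}$, the perturbed counterparts of their IPW estimators, denoted by $\hat{\alpha}_{t_0}^{(b)}$ and $\hat{\beta}_{t_0}^{(b)}$, are obtained by replacing the sampling weight $w_j$ and censoring weight $\hat{\omega}_{t_0,j}$ with their respective perturbed counterparts, $w_j^{(b)}$ and $\hat{\omega}_{t_0,j}^{(b)}$. We will explain how to perturb these two weights later.

For the double IPW estimators of accuracy parameters, their counterparts are obtained using the perturbed weight $w_j^{(b)}\times\hat{\omega}_{t_0,j}^{(b)}$ and the perturbed risk score $\hat{\mathcal{P}}_{t_0,j}^{(b)}=g(\hat{\alpha}_{t_0}^{(b)}+\hat{\beta}_{t_0}^{(b)\top}Z_j)$. For example, the perturbed counterpart of $\widehat{\mathrm{AUC}}_{t_0}$ in Equation (9) is given as

$$\widehat{\mathrm{AUC}}_{t_0}^{(b)}=\frac{\sum_{i=1}^{N}\sum_{j=1}^{N}w_i^{(b)}\hat{\omega}_{t_0,i}^{(b)}w_j^{(b)}\hat{\omega}_{t_0,j}^{(b)}I(\hat{\mathcal{P}}_{t_0,i}^{(b)}>\hat{\mathcal{P}}_{t_0,j}^{(b)},\ T_i\le t_0,\ T_j>t_0)}{\sum_{i=1}^{N}\sum_{j=1}^{N}w_i^{(b)}\hat{\omega}_{t_0,i}^{(b)}w_j^{(b)}\hat{\omega}_{t_0,j}^{(b)}I(T_i\le t_0,\ T_j>t_0)}.$$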

Perturbed censoring weight $\hat{\omega}_{t_0,j}^{(b)}$.

Let $\{I_{ij},\ i=1,\dots,N,\ j=1,\dots,N\}$ be independent and identically distributed random variables with mean 1 and variance 1. The perturbed censoring weight $\hat{\omega}_{t_0,j}^{(b)}$ is given as $\hat{\omega}_{t_0,j}^{(b)}=\delta_jI(T_j\le t_0)/\hat{\mathcal{G}}^{(b)}(T_j)+I(T_j>t_0)/\hat{\mathcal{G}}^{(b)}(t_0)$, where $\hat{\mathcal{G}}^{(b)}(t)$ is the estimate of $\mathcal{G}(t)$ with each subject weighted by $I_{jj}$, $j=1,\dots,N$.

Previously, we concluded that the two IPW estimators using $\hat{w}_j$ and $w_{HT,j}$ are more efficient than the one using $\tilde{w}_j$. Thus, we describe how to perturb these two weights.

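Perturbed sampling weight $\hat{w}_j^{(b)}$.

This perturbed sampling weight is $\hat{w}_j^{(b)}=\delta_jV_{1j}^{(b)}/\pi_1^{(b)}+(1-\delta_j)V_{0j}^{(b)}/\hat{p}_{0j}^{(b)}$, obtained by replacing the indicator variables $V_{1j}$ and $V_{0j}$ as well as their probabilities $\pi_1$ and $\hat{p}_{0j}$ with their perturbed counterparts. Specifically, the perturbed counterpart of the case indicator is $V_{1j}^{(b)}=V_{1j}I_{jj}$. The probability $\pi_1$ can be written as $\pi_1=\sum_{i=1}^{N}\delta_iV_{1i}/\sum_{i=1}^{N}\delta_i$, and its perturbed counterpart is given as $\pi_1^{(b)}=\sum_{i=1}^{N}I_{ii}\delta_iV_{1i}/\sum_{i=1}^{N}I_{ii}\delta_i$. By Equation (1), the perturbed counterpart of the control indicator variable is given as $V_{0j}^{(b)}=1-\prod_{i:\,j\in\mathcal{R}_i}(1-V_{1i}V_{0ji}I_{ij})$. The probability $\hat{p}_{0j}$ in Equation (4) can be written as $\hat{p}_{0j}=1-\prod_{i:\,j\in\mathcal{R}_i}\left\{1-\frac{\sum_{l\in\mathcal{R}_i}V_{0li}}{n_i-1}\delta_iV_{1i}\right\}$ where $m=\sum_{l\in\mathcal{R}_i}V_{0li}$, and its perturbed counterpart is

$$\hat{p}_{0j}^{(b)}=1-\prod_{i:\,j\in\mathcal{R}_i}\left\{1-\frac{\sum_{l\in\mathcal{R}_i}V_{0li}I_{il}}{n_i-1}\delta_iV_{1i}\right\}.$$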

Perturbed sampling weight $w_{HT,j}^{(b)}$.

With the notation above, this perturbed weight is

$$w_{HT,j}^{(b)}=\frac{\delta_j\{V_{1j}+(1-V_{1j})V_{0j}\}I_{jj}}{\pi_1^{(b)}+(1-\pi_1^{(b)})\hat{p}_{0j}^{(b)}}+\frac{(1-\delta_j)V_{0j}^{(b)}}{\hat{p}_{0j}^{(b)}}.$$

It is worth noting that our perturbation procedure is valid for both the typical and untypical NCC. When $\pi_1=1$, $V_{1j}=\delta_j$, and $\pi_1^{(b)}=\pi_1=1$. The perturbed weights $\hat{w}_j^{(b)}$ and $w_{HT,j}^{(b)}$ are the same as those proposed in Cai and Zheng (2013) for the typical NCC studies.
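To illustrate how one perturbed replicate of $\hat{w}_j^{(b)}$ could be generated, the following is a minimal Python sketch, not the NCCIPW implementation. The function name `perturbed_w_hat` and the data structures are hypothetical: `controls[i]` lists the sampled controls of case `i`, and `risk_sets[i]` is the risk set of case `i` including the case itself. The source only requires the perturbation variables to have mean 1 and variance 1; drawing them from a unit exponential distribution is an assumption made here.

```python
import random


def perturbed_w_hat(delta, v1, controls, risk_sets,
                    draw=lambda: random.expovariate(1.0)):
    """One perturbed replicate of w_hat; `draw` returns a mean-1, variance-1 variable."""
    n = len(delta)
    cases = [i for i in range(n) if delta[i] and v1[i]]
    I_diag = [draw() for _ in range(n)]                            # I_jj
    I_off = {(i, l): draw() for i in cases for l in controls[i]}   # I_il, sampled pairs only

    pi1_b = (sum(I_diag[i] for i in cases)
             / sum(I_diag[i] for i in range(n) if delta[i]))
    weights = []
    for j in range(n):
        if delta[j]:
            weights.append(v1[j] * I_diag[j] / pi1_b)  # event controls get weight 0
            continue
        not_selected_b = 1.0   # prod_i (1 - V1i * V0ji * I_ij)
        never_b = 1.0          # prod_i {1 - sum_l V0li * I_il / (n_i - 1)}
        for i in cases:
            if j not in risk_sets[i]:
                continue
            if j in controls[i]:
                not_selected_b *= 1.0 - I_off[(i, j)]
            m_b = sum(I_off[(i, l)] for l in controls[i])
            never_b *= 1.0 - m_b / (len(risk_sets[i]) - 1)
        p0_b = 1.0 - never_b
        weights.append((1.0 - not_selected_b) / p0_b if p0_b > 0 else 0.0)
    return weights


delta = [1, 1, 0, 0, 0]
v1 = [1, 0, 0, 0, 0]
controls = {0: [2]}
risk_sets = {0: {0, 1, 2, 3, 4}}
print(perturbed_w_hat(delta, v1, controls, risk_sets))
```

A simple sanity check of the construction is that setting every perturbation variable to 1 (for example, `draw=lambda: 1.0`) returns the unperturbed weight $\hat{w}_j$.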

4. Simulation Studies

We investigate the proposed methods via the following two simulation studies. The first one compares, in terms of bias and variability, the IPW estimators using our three new weights $\tilde{w}_j$, $\hat{w}_j$, and $w_{HT,j}$ and the two weights of Edelmann et al. (2020) and Graziano et al. (2021), $\kappa_{1,j}$ and $\kappa_{2,j}$. For ease of presentation, we use $\hat{\theta}_{\tilde{w}}$, $\hat{\theta}_{\hat{w}}$, $\hat{\theta}_{w_{HT}}$, $\hat{\theta}_{\kappa_1}$, and $\hat{\theta}_{\kappa_2}$ to denote the five IPW estimators using the five weights. The second study investigates the validity of the proposed perturbation resampling procedure for estimating the variance of the IPW estimator.

As defined earlier, the true values of the model and accuracy parameters are governed by the underlying data generating mechanism. Appendix C of Supplementary Material explains how to calculate the true parameter values based on the simulation scheme below. In both studies, we use the true parameter values to evaluate our proposed estimation and inference procedures as well as the existing estimators.

4.1. Simulation setting

For both studies, we use the same simulation setting. We consider one clinical marker $Z_j$ that is measured on the full cohort and one biomarker $B_j$ that is measured only on the NCC sub-cohort. The marker $Z_j$ is first generated from the standard normal distribution $N(0,1)$. In many situations, biomarkers are correlated with clinical markers, and thus, we simulate $B_j$ depending on $Z_j$: $B_j=Z_j+e_{B,j}$, where $e_{B,j}\sim N(0,1)$. Given these two markers, the event time $T_j^{\dagger}$ is obtained from

$$\log(T_j^{\dagger})=1.5-0.25Z_j-0.25B_j+0.5\epsilon_{T,j}, \tag{11}$$

where $\epsilon_{T,j}$ is generated from an extreme value distribution with the cumulative distribution function $F(x)=1-\exp(-e^{x})$. As described in Section 2.1, the event time might be censored by a random censoring time variable $\tilde{C}_j$ and a follow-up duration of $\tau=2$ years, where $\tilde{C}_j$ is generated from $\tilde{C}_j=0.1+C_{j1}$ with $C_{j1}\sim\mathrm{Gamma}(2,2)$. Thus, about 90% of the event times are censored by $C_j=\min(\tilde{C}_j,\tau)$.
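The data-generating mechanism above can be sketched in Python as follows. This is an illustrative reading of the setting, not the authors' simulation code: the function name `simulate_cohort` is hypothetical, the signs in Equation (11) are taken as $1.5-0.25Z_j-0.25B_j+0.5\epsilon_{T,j}$, and $\mathrm{Gamma}(2,2)$ is assumed to mean shape 2 and rate 2 (scale 0.5), a parameterization that the source does not state explicitly.

```python
import math
import random


def simulate_cohort(n, seed=0, tau=2.0):
    """Return rows (observed time, delta, Z, B, latent event time) for a full cohort."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        b = z + rng.gauss(0.0, 1.0)                    # biomarker correlated with Z
        # log of a unit exponential has CDF F(x) = 1 - exp(-e^x)
        eps = math.log(max(rng.expovariate(1.0), 1e-300))
        t_event = math.exp(1.5 - 0.25 * z - 0.25 * b + 0.5 * eps)
        # ASSUMPTION: Gamma(2, 2) read as shape 2, rate 2, i.e. scale 0.5
        c = min(0.1 + rng.gammavariate(2.0, 0.5), tau)
        rows.append((min(t_event, c), int(t_event <= c), z, b, t_event))
    return rows


cohort = simulate_cohort(5000, seed=1)
event_rate_by_1 = sum(r[4] < 1.0 for r in cohort) / len(cohort)
censoring_rate = 1.0 - sum(r[1] for r in cohort) / len(cohort)
print(round(event_rate_by_1, 3), round(censoring_rate, 3))
```

Under this reading, the error term gives a Weibull-type proportional hazards model with log hazard ratios of 0.5 for both markers, which is consistent with the true values $\beta_Z=\beta_B=0.5$ listed in Table 2.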

We consider two full cohort sizes: $N=5{,}000$ and $10{,}000$. To construct the NCC sub-cohort, we set $\pi_1$ (the percentage of events that are selected as cases) to be 20%, 50%, and 80%. For each selected case, $m=1$ or $3$ controls are selected from either (i) the case’s risk set without matching, or (ii) the case’s risk set with exact matching on a variable $M_1$ and matching up to $\pm 1$ on another variable $M_2$. The two matching variables $M_1$ and $M_2$ are generated as follows. Let $M_1=I(\tilde{M}_1>0.5)$ where $\tilde{M}_1=\Phi(Z+e_1)$ with $e_1\sim N(0,1)$, and let $M_2$ be the closest integer to $\tilde{M}_2=5\Phi(B+e_2)$ where $e_2\sim N(0,1)$.

We set the prediction time $t_0=1$ with the event rate $\Pr(T^{\dagger}<1)\approx 8\%$, where $T^{\dagger}$ is the event time. To estimate the absolute risk $P(T_j^{\dagger}<1\mid Z_j,B_j)$, we fit the PH model and the time-specific GLM with a logit link. As explained in Section 2.4.1, both models can be expressed as the general form in Equation (8): $P(T_j^{\dagger}\le t_0\mid Z_j,B_j)=g(\alpha_{t_0}+\beta_{Z,t_0}Z_j+\beta_{B,t_0}B_j)$. Since we consider only one $t_0$ value, we suppress $t_0$ from the parameters. We also want to point out that the true data generating mechanism in Equation (11) follows the PH assumption, and thus the GLM is a misspecified model.

Under each model, we obtain the IPW estimates of the relative risk parameters $(\beta_Z,\beta_B)$. For the PH model, as mentioned earlier, conditional logistic regression (clogit) is an alternative method to obtain the estimates of $(\beta_Z,\beta_B)$. Thus, we compare the five IPW estimators with the clogit estimator. In addition, each model is evaluated on the following time-dependent accuracy measures: TPR, PPV, and NPV at a cutoff value making FPR = 0.05, and AUC.

As described in Section 3.2, the total variance of the IPW estimator includes the model-based variance as if the full cohort data are available. To capture this part, we generate 1000 replicates of the full cohort data, and for each replication, an NCC sub-cohort is constructed following one of the sampling schemes described above.

4.2. Study I: Comparison of IPW Estimators using Five Weights

In the first study, we compare the empirical bias and standard deviation (SD) of the five IPW estimators. Tables 2 and 3 report these statistics for the relative risk parameters $\beta_Z$ and $\beta_B$ under the PH model and time-specific GLM, respectively. The summary statistics for each model’s accuracy parameters are shown in Figures 1 and 2. All these results are for the settings where the controls are selected with matching, and those for the without-matching settings are included in Appendix D of Supplementary Material. We also report the root mean square error (RMSE) in Tables 4–7 of Appendix D.

Table 2:

Simulation Results: Estimation of βZ and βB under the PH model. For each parameter, the results are the bias and the empirical SD (in parentheses), both expressed as a percentage of the true parameter value. The NCC sub-cohort is constructed with matching.

N π1 1:m $\tilde{w}$ $\hat{w}$ $w_{HT}$ $\kappa_1$ $\kappa_2$ clogit
βZ=0.5
5000 0.2 1:1 9.8 (99.4) 0.5 (41.4) 0.5 (41.4) 19.4 (53.0) 21.7 (55.9) 1.0 (56.6)
5000 0.2 1:3 6.8 (66.9) 0.6 (32.4) 1.4 (31.6) 20.2 (38.1) 21.2 (40.0) 1.2 (39.5)
5000 0.5 1:1 3.6 (53.7) 2.0 (27.4) 2.1 (27.2) 11.2 (30.7) 11.4 (30.4) 1.6 (36.2)
5000 0.5 1:3 2.4 (38.9) 1.1 (19.8) 1.3 (19.4) 10.1 (21.0) 10.1 (21.1) 0.1 (25.5)
5000 0.8 1:1 2.5 (33.1) 2.1 (21.1) 2.1 (21.0) 5.2 (21.9) 5.2 (21.8) 2.2 (31.0)
5000 0.8 1:3 1.3 (22.7) 1.2 (16.5) 1.2 (16.3) 4.1 (16.7) 4.1 (16.6) −1.2 (22.7)
10000 0.2 1:1 8.2 (80.3) 0.0 (29.1) 0.6 (29.0) 18.7 (37.2) 22.6 (39.4) 0.3 (38.8)
10000 0.2 1:3 2.5 (54.0) −0.5 (21.8) 0.1 (20.7) 18.6 (24.9) 19.7 (26.8) −0.9 (27.4)
10000 0.5 1:1 −0.9 (47.1) −0.6 (18.5) −0.6 (18.3) 8.2 (20.5) 8.4 (20.4) 0.2 (26.3)
10000 0.5 1:3 2.4 (31.4) 0.0 (14.7) 0.2 (14.3) 9.0 (15.3) 9.1 (15.3) −0.9 (18.6)
10000 0.8 1:1 1.5 (24.1) 0.4 (14.7) 0.5 (14.6) 3.5 (15.2) 3.5 (15.1) 0.1 (20.1)
10000 0.8 1:3 −0.7 (18.8) 0.2 (11.8) 0.2 (11.6) 3.1 (11.9) 3.1 (11.9) −0.8 (15.9)
βB=0.5
5000 0.2 1:1 11.8 (76.3) 2.7 (29.6) 2.7 (29.5) 23.8 (39.7) 27.3 (41.8) 12.1 (46.0)
5000 0.2 1:3 6.4 (49.9) 1.4 (22.3) 1.4 (21.8) 22.1 (26.2) 24.0 (28.0) 5.3 (30.8)
5000 0.5 1:1 6.3 (39.6) 1.2 (18.7) 1.3 (18.7) 10.9 (21.1) 11.5 (21.0) 6.4 (28.8)
5000 0.5 1:3 1.5 (29.2) 0.5 (14.6) 0.6 (14.4) 10.4 (15.7) 11.0 (15.7) 1.8 (21.5)
5000 0.8 1:1 1.0 (24.7) 0.0 (14.9) 0.0 (14.9) 3.2 (15.6) 3.2 (15.5) 4.3 (23.4)
5000 0.8 1:3 1.2 (16.7) 0.1 (11.9) 0.2 (11.8) 3.4 (12.1) 3.5 (12.1) 0.5 (17.7)
10000 0.2 1:1 6.1 (58.7) 0.7 (20.8) 0.6 (20.6) 19.3 (26.8) 23.3 (28.1) 7.3 (31.6)
10000 0.2 1:3 4.9 (38.2) 0.1 (15.4) 0.5 (14.9) 20.4 (17.7) 23.8 (19.0) 3.1 (21.6)
10000 0.5 1:1 5.1 (34.2) 1.6 (12.9) 1.6 (12.8) 10.9 (14.5) 11.5 (14.4) 5.9 (21.1)
10000 0.5 1:3 0.3 (24.6) 0.4 (10.8) 0.5 (10.5) 10.2 (11.4) 10.9 (11.2) 0.9 (15.4)
10000 0.8 1:1 2.0 (17.8) 0.8 (10.5) 0.8 (10.5) 4.0 (10.9) 4.0 (10.9) 4.3 (16.3)
10000 0.8 1:3 1.0 (13.3) 0.5 (8.3) 0.6 (8.1) 3.8 (8.3) 3.9 (8.3) −0.3 (12.4)

Table 3:

Simulation Results: Estimation of βZ and βB under the time-specific GLM. For each parameter, the results are the bias and the empirical SD (in parentheses), both expressed as a percentage of the true parameter value. The NCC sub-cohort is constructed with matching. For the IPW estimator using $\tilde{w}$, the summary statistics are calculated using only the replications for which the GLM converged. The number of replications (out of 1000) for which the GLM did not converge is marked with *.

N π1 1:m $\tilde{w}$ $\hat{w}$ $w_{HT}$ $\kappa_1$ $\kappa_2$
βZ=0.54
5000 0.2 1:1 1.8e+14 (9.5e+15); 34* 2.2 (54.8) 2.1 (54.2) 19.5 (64.8) 18.6 (69.1)
5000 0.2 1:3 1.3e+13 (8.7e+15); 8* 3.2 (45.7) 3.8 (43.0) 19.7 (47.7) 19.2 (56.7)
5000 0.5 1:1 1.8e+14 (4.6e+15); 2* 3.5 (35.5) 3.6 (35.2) 11.7 (38.1) 11.3 (38.0)
5000 0.5 1:3 5.8 (57.3); 0* 2.4 (28.1) 2.3 (27.1) 9.3 (28.0) 8.1 (28.6)
5000 0.8 1:1 5.2e+13 (1.6e+15); 2* 4.0 (27.3) 4.0 (27.3) 6.7 (27.9) 6.6 (27.9)
5000 0.8 1:3 3.6 (33.7); 0* 2.8 (23.0) 2.8 (22.8) 5.0 (23.0) 4.8 (23.0)
10000 0.2 1:1 −8.3e+14 (1.4e+16); 30* 1.4 (37.1) 1.8 (36.3) 19.0 (44.3) 19.9 (46.8)
10000 0.2 1:3 −3.2e+14 (7.6e+15); 7* 0.7 (30.2) 0.7 (28.6) 16.6 (31.6) 14.2 (38.8)
10000 0.5 1:1 3.6e+13 (1.1e+15); 5* −0.6 (24.3) −0.6 (24.1) 7.2 (25.8) 6.7 (25.9)
10000 0.5 1:3 5.5 (43.3); 1* 0.5 (19.9) 0.7 (19.1) 7.8 (19.7) 6.8 (20.2)
10000 0.8 1:1 3.1 (34.8); 0* 1.1 (19.2) 1.1 (19.1) 3.7 (19.5) 3.6 (19.5)
10000 0.8 1:3 0.0 (25.8); 0* 0.8 (15.9) 0.8 (15.7) 3.1 (15.8) 2.9 (15.8)
βB=0.54
5000 0.2 1:1 5.3e+13 (6.7e+15); 34* 4.0 (38.9) 4.2 (38.3) 23.4 (47.2) 23.6 (50.2)
5000 0.2 1:3 2.5e+14 (6.4e+15); 8* 3.4 (32.8) 2.9 (30.6) 20.3 (34.0) 18.3 (40.8)
5000 0.5 1:1 −7e+13 (2.4e+15); 2* 1.8 (25.0) 2.0 (24.9) 10.3 (26.8) 10.1 (26.9)
5000 0.5 1:3 6.3 (43.5); 0* 1.9 (21.2) 2.0 (20.4) 9.7 (21.2) 9.1 (21.4)
5000 0.8 1:1 −2.7e+13 (8.5e+14); 2* 0.7 (19.6) 0.6 (19.5) 3.3 (20.0) 3.3 (20.0)
5000 0.8 1:3 3.3 (24.5); 0* 1.1 (17.0) 1.2 (16.9) 3.7 (17.1) 3.6 (17.1)
10000 0.2 1:1 5.3e+14 (9e+15); 30* 2.1 (26.4) 2.1 (25.8) 19.9 (31.5) 20.0 (33.7)
10000 0.2 1:3 4.3e+13 (3.2e+15); 7* 1.8 (22.1) 2.2 (21.3) 19.2 (23.4) 19.5 (28.7)
10000 0.5 1:1 −2.4e+13 (7.6e+14); 5* 3.2 (17.0) 3.2 (17.0) 11.3 (18.2) 11.0 (18.3)
10000 0.5 1:3 3.4 (34.4); 1* 2.2 (14.8) 2.2 (14.0) 9.8 (14.6) 9.1 (14.7)
10000 0.8 1:1 4.5 (25.6); 0* 2.0 (13.3) 2.0 (13.3) 4.7 (13.6) 4.6 (13.6)
10000 0.8 1:3 2.9 (18.9); 0* 1.9 (11.1) 1.9 (10.9) 4.4 (11.0) 4.3 (11.0)

Figure 1:

Simulation Results: Estimation of AUC, and of TPR, PPV, and NPV at the cutoff value that corresponds to FPR = 0.05, under the PH model. For each parameter, the results are the absolute relative bias (the marker) and the relative empirical SD (half of the bar length), both expressed as a percentage of the true parameter value. The NCC sub-cohort is constructed with matching.

Figure 2:

Simulation Results: Estimation of AUC, and of TPR, PPV, and NPV at the cutoff value that corresponds to FPR = 0.05, under the time-specific GLM. For each parameter, the results are the absolute relative bias (the marker) and the relative empirical SD (half of the bar length), both expressed as a percentage of the true parameter value. The NCC sub-cohort is constructed with matching. For the IPW estimator using $\tilde{w}$, the summary statistics are calculated using only the replications for which the GLM converged.

When calculating these summary statistics, we account for the magnitude of the parameter values by computing the bias, SD, and RMSE relative to the true parameter value. Specifically, let $\hat\theta^{[r]}$ denote the IPW estimate of a parameter $\theta$ based on the data generated in the $r$-th replication, $r = 1, \ldots, 1000$, and let $\theta^*$ denote the true value of $\theta$. The above-mentioned tables and figures present the relative bias, calculated as $\mathrm{rBias} = (\bar{\hat\theta} - \theta^*)/\theta^*$ with $\bar{\hat\theta} = \frac{1}{1000}\sum_{r=1}^{1000} \hat\theta^{[r]}$, and the relative SD, calculated as $\mathrm{rSD} = \sqrt{\frac{1}{999}\sum_{r=1}^{1000} (\hat\theta^{[r]} - \bar{\hat\theta})^2} \, / \, \theta^*$. The RMSE is an overall metric accounting for both bias and variability; the relative RMSE is calculated as $\mathrm{rRMSE} = \sqrt{\mathrm{rBias}^2 + \mathrm{rSD}^2}$. In addition, Figures 1 and 2 plot the absolute value of the relative bias to make its magnitude easier to compare.
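These three summary statistics can be computed directly from the replicated estimates. The following is a minimal sketch; the function name `relative_summary` is an assumption, and `statistics.stdev` is used because it applies the same divisor (number of replications minus one) as the formula above.

```python
import math
import statistics


def relative_summary(estimates, theta_true):
    """Return (rBias, rSD, rRMSE) for a list of replicated estimates."""
    mean_est = statistics.fmean(estimates)
    r_bias = (mean_est - theta_true) / theta_true
    r_sd = statistics.stdev(estimates) / theta_true  # divisor is n - 1
    r_rmse = math.sqrt(r_bias ** 2 + r_sd ** 2)
    return r_bias, r_sd, r_rmse
```

Multiplying each returned value by 100 gives the percentage scale used in the tables.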

Results for PH model.

For estimating the relative risk parameters $(\beta_Z, \beta_B)$, we find that a larger full cohort size N, a larger π1, or a larger number m of controls per case tends to produce smaller bias and smaller variance. Among the five IPW estimators and the clogit estimator, $\hat\theta_{\hat w}$ and $\hat\theta_{w_{HT}}$ perform the best, with the least bias and the highest efficiency; therefore, their RMSEs are the smallest. Between the two, the variance of $\hat\theta_{w_{HT}}$ is slightly smaller, but the difference is minimal, especially when π1 is close to 1. This observation agrees with our analytical conclusions in Section 3.3.

The IPW estimator $\hat\theta_{\tilde w}$ has a smaller bias than $\hat\theta_{\kappa_1}$ and $\hat\theta_{\kappa_2}$. Its disadvantage is its variability, which is the largest among all the estimators; as a result, the RMSE of $\hat\theta_{\tilde w}$ is also the largest. As explained in Section 3.3, this large variance is due to small values of $\hat p_{0j}$. In addition, as π1 decreases, $\hat\theta_{\tilde w}$ becomes less efficient because there are fewer cases for which a subject can be selected as a control, which makes $\hat p_{0j}$ even smaller.

The results clearly show that $\hat\theta_{\kappa_1}$ and $\hat\theta_{\kappa_2}$ are biased. As explained in Remark 2, the bias arises because the expectations of their weights are not equal to 1. As derived in Appendix A.1, these expectations approach 1 as π1 approaches 1; accordingly, we observe that their biases decrease as π1 increases. When π1 is small, for instance π1 = 0.2, doubling the full cohort size N did not substantially reduce the bias. In terms of variability, their variances are slightly smaller than that of the clogit estimator. Overall, their RMSEs are similar to that of the clogit estimator.

For the accuracy parameters, we observe similar results: $\hat\theta_{\hat w}$ and $\hat\theta_{w_{HT}}$ perform the best and are similar to each other, $\hat\theta_{\tilde w}$ has the largest variance, and $\hat\theta_{\kappa_1}$ and $\hat\theta_{\kappa_2}$ have the largest bias. When π1 is closer to 1, the five IPW estimators become more alike because their weights become more similar to each other.

Results for time-specific GLM.

Most of the results are similar to those under the PH model. The main difference is that, besides being inefficient, the GLM fit for $\hat\theta_{\tilde w}$ did not converge in some replications. Table 3 lists the number of nonconverged replications for each scenario. The summary statistics for all the parameters are calculated using only the estimates from the converged replications. However, some of the converged replications still yield abnormal estimates, which make the bias and SD extremely large.

We believe that this issue is caused by event controls that experienced the event early. As explained in Remark 1, for such subjects, $\hat p_{0j}$ is close to zero, leading to a very large weight $\tilde w_j$, while their binary outcomes $I(T_j \le t_0)$ equal 1. In addition, as explained earlier, when π1 decreases, $\hat p_{0j}$ becomes even smaller, and consequently this issue gets worse. In comparison, the other IPW estimators do not have this problem because their weights are smaller for these subjects.

4.3. Study II: Perturbation Resampling for Inference

The second study examines the validity of the perturbation resampling procedure for the two IPW estimators $\hat\theta_{\hat w}$ and $\hat\theta_{w_{HT}}$, since they perform best among all the estimators. In the following, we use $\hat\theta$ to denote either of these two estimators. In each replication, besides the estimate $\hat\theta^{[r]}$, we obtain 500 perturbed counterparts with $I_{ij}$ generated from an exponential distribution with rate 1. The perturbed counterpart is denoted as $\hat\theta^{[r,b]}$, where $b$ indexes the perturbation and $r$ still indexes the replication. We calculate the SD of $\hat\theta^{[r,b]}$, $b = 1, \ldots, 500$, which is the standard error (SE) associated with the estimate $\hat\theta^{[r]}$. This SE is denoted as $SE^{[r]}$ and used to construct a level $1-\alpha$ confidence interval (CI) for the parameter: $\hat\theta^{[r]} \pm z_{1-\alpha/2} \, SE^{[r]}$, where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)\%$ quantile of a standard normal distribution.
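Given the perturbed estimates from one replication, the SE and the Wald-type CI described above can be computed as in the sketch below. It does not reproduce the perturbation step itself, which depends on estimating equations not shown in this section; the function name `perturbation_ci` and its arguments are hypothetical.

```python
import statistics
from statistics import NormalDist


def perturbation_ci(estimate, perturbed, alpha=0.05):
    """SE from the SD of the perturbed estimates, and a level 1 - alpha CI."""
    se = statistics.stdev(perturbed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    return se, (estimate - z * se, estimate + z * se)
```

In the simulation, `perturbed` would hold the 500 perturbed counterparts from one replication and `estimate` the corresponding point estimate.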

We evaluate this procedure with two metrics, reported in Tables 4 and 5. The first metric compares the average SE, i.e., $\frac{1}{1000}\sum_{r=1}^{1000} SE^{[r]}$, with the empirical SD of the estimates $\{\hat\theta^{[r]}, r = 1, \ldots, 1000\}$. Specifically, this metric is the ratio of the average SE to the empirical SD. The second metric is the empirical coverage probability of the 95% CI constructed with the perturbation-based SE. We observe that for both estimators, the average SE is close to the empirical SD, and the empirical coverage probability is close to the nominal level of 95% in most of the scenarios. These results indicate that the proposed perturbation resampling accurately captures the variability of the IPW estimator, and thus the inference procedure is valid.
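Both metrics are simple functions of the per-replication estimates and SEs. A minimal sketch follows, assuming the estimates and their perturbation-based SEs have already been collected; the function name `evaluate_inference` is an assumption.

```python
import statistics
from statistics import NormalDist


def evaluate_inference(estimates, ses, theta_true, alpha=0.05):
    """Return (average SE / empirical SD, empirical coverage of the CIs)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ratio = statistics.fmean(ses) / statistics.stdev(estimates)
    # A CI covers the truth when the estimate lies within z * SE of it.
    covered = [abs(est - theta_true) <= z * se for est, se in zip(estimates, ses)]
    return ratio, sum(covered) / len(covered)
```

A ratio near 1 and a coverage near 1 − α are what Tables 4 and 5 are checking for.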

Table 4:

Simulation Results: the ratios of the average SE to empirical SD and empirical coverage probabilities (in the parentheses) of 95% CIs for relative risk parameters. Both the SEs and 95% CIs are obtained via perturbation resampling. The NCC sub-cohort is constructed with matching.

βZ βB
N π1 1:m $\hat{w}$ $w_{HT}$ $\hat{w}$ $w_{HT}$
PH Model
5000 0.2 1:1 0.927 (92.8%) 0.914 (92.5%) 0.926 (92.5%) 0.915 (92.5%)
5000 0.2 1:3 0.963 (94.2%) 0.934 (93.7%) 1.005 (95.2%) 0.976 (94.9%)
5000 0.5 1:1 0.912 (92.0%) 0.910 (92.4%) 0.952 (94.0%) 0.948 (93.7%)
5000 0.5 1:3 1.025 (94.5%) 1.006 (94.8%) 0.997 (94.5%) 0.977 (94.1%)
5000 0.8 1:1 0.959 (94.2%) 0.959 (94.7%) 0.967 (94.3%) 0.966 (94.4%)
5000 0.8 1:3 1.001 (95.2%) 1.001 (94.8%) 0.995 (94.0%) 0.990 (94.4%)
10000 0.2 1:1 0.944 (93.6%) 0.930 (93.6%) 0.938 (94.1%) 0.935 (94.1%)
10000 0.2 1:3 1.012 (94.9%) 1.006 (95.4%) 1.021 (94.7%) 1.000 (94.6%)
10000 0.5 1:1 0.965 (94.3%) 0.965 (94.3%) 0.987 (94.8%) 0.983 (94.2%)
10000 0.5 1:3 0.977 (94.4%) 0.971 (94.1%) 0.949 (94.2%) 0.944 (94.1%)
10000 0.8 1:1 0.981 (94.5%) 0.986 (94.0%) 0.978 (94.3%) 0.978 (94.5%)
10000 0.8 1:3 0.988 (94.3%) 0.989 (94.4%) 1.015 (94.4%) 1.018 (94.7%)
Time-specific GLM
5000 0.2 1:1 0.963 (94.8%) 0.955 (94.6%) 0.976 (95.2%) 0.972 (94.7%)
5000 0.2 1:3 0.981 (94.8%) 0.982 (94.8%) 0.990 (95.2%) 1.000 (94.6%)
5000 0.5 1:1 0.936 (93.5%) 0.936 (93.0%) 0.956 (94.5%) 0.951 (94.0%)
5000 0.5 1:3 1.012 (94.7%) 1.012 (95.3%) 0.969 (94.3%) 0.967 (94.5%)
5000 0.8 1:1 0.973 (93.5%) 0.969 (93.8%) 0.972 (94.4%) 0.973 (94.1%)
5000 0.8 1:3 0.983 (94.6%) 0.981 (94.9%) 0.964 (94.3%) 0.957 (94.3%)
10000 0.2 1:1 0.993 (94.4%) 0.996 (94.4%) 1.003 (95.7%) 1.008 (95.5%)
10000 0.2 1:3 1.037 (95.2%) 1.033 (95.2%) 1.019 (95.5%) 1.002 (95.6%)
10000 0.5 1:1 0.963 (93.9%) 0.962 (93.4%) 0.994 (94.6%) 0.986 (94.0%)
10000 0.5 1:3 1.000 (94.8%) 1.010 (95.8%) 0.975 (94.7%) 0.992 (95.0%)
10000 0.8 1:1 0.976 (93.6%) 0.978 (94.0%) 1.012 (94.8%) 1.011 (95.1%)
10000 0.8 1:3 1.002 (94.7%) 1.004 (94.8%) 1.037 (96.2%) 1.046 (96.1%)

Table 5:

Simulation Results: the ratios of the average SE to empirical SD and empirical coverage probabilities (in the parentheses) of 95% CIs for accuracy parameters. Both the SEs and 95% CIs are obtained via perturbation resampling. The NCC sub-cohort is constructed with matching.

AUC TPR PPV NPV
$\hat{w}$ $w_{HT}$ $\hat{w}$ $w_{HT}$ $\hat{w}$ $w_{HT}$ $\hat{w}$ $w_{HT}$
N π1 1:m PH Model
5000 0.2 1:1 0.922 (90.9%) 0.914 (91.1%) 1.006 (94.0%) 1.008 (94.8%) 1.107 (96.1%) 1.093 (96.0%) 0.928 (92.2%) 0.925 (93.1%)
5000 0.2 1:3 0.973 (92.8%) 0.967 (93.9%) 1.027 (94.4%) 1.027 (94.7%) 1.036 (94.9%) 1.031 (95.2%) 0.913 (92.6%) 0.929 (92.5%)
5000 0.5 1:1 0.949 (93.7%) 0.949 (94.2%) 1.011 (94.9%) 1.007 (93.6%) 1.027 (94.4%) 1.036 (95.6%) 0.911 (91.9%) 0.908 (92.4%)
5000 0.5 1:3 0.958 (93.7%) 0.954 (93.5%) 1.015 (94.4%) 1.002 (94.2%) 0.979 (93.3%) 0.973 (93.9%) 0.896 (91.6%) 0.891 (91.3%)
5000 0.8 1:1 0.987 (93.6%) 0.981 (93.3%) 1.012 (95.3%) 1.013 (95.1%) 1.005 (94.9%) 1.007 (94.8%) 0.941 (92.9%) 0.943 (93.3%)
5000 0.8 1:3 1.024 (95.2%) 1.019 (95.3%) 1.011 (95.2%) 1.014 (94.8%) 0.973 (93.6%) 0.986 (94.1%) 0.860 (90.0%) 0.857 (90.3%)
10000 0.2 1:1 0.967 (94.2%) 0.961 (94.1%) 1.022 (94.5%) 1.018 (94.8%) 1.063 (96.3%) 1.060 (95.2%) 0.953 (93.7%) 0.948 (93.6%)
10000 0.2 1:3 0.968 (94.5%) 0.949 (93.5%) 0.991 (93.5%) 0.988 (93.3%) 0.967 (94.3%) 0.972 (94.7%) 0.915 (92.3%) 0.919 (92.8%)
10000 0.5 1:1 0.957 (93.1%) 0.955 (93.4%) 1.016 (95.1%) 1.021 (95.4%) 1.034 (95.1%) 1.046 (95.1%) 0.912 (91.7%) 0.911 (91.9%)
10000 0.5 1:3 0.970 (93.4%) 0.974 (94.1%) 0.997 (94.9%) 1.002 (94.6%) 0.976 (94.7%) 0.994 (94.8%) 0.876 (90.6%) 0.883 (90.7%)
10000 0.8 1:1 0.964 (93.8%) 0.967 (93.7%) 1.006 (94.8%) 1.007 (94.7%) 1.026 (94.8%) 1.029 (95.3%) 0.897 (90.7%) 0.898 (90.5%)
10000 0.8 1:3 0.982 (95.1%) 0.985 (94.6%) 1.031 (95.8%) 1.050 (96.3%) 1.008 (95.5%) 1.024 (95.4%) 0.833 (89.3%) 0.838 (89.0%)
Time-specific GLM
5000 0.2 1:1 0.916 (91.1%) 0.908 (90.8%) 1.029 (95.1%) 1.023 (95.2%) 1.107 (95.9%) 1.109 (96.5%) 0.938 (93.3%) 0.925 (93.4%)
5000 0.2 1:3 0.958 (92.3%) 0.954 (92.7%) 1.050 (94.9%) 1.059 (95.5%) 1.039 (95.2%) 1.055 (95.3%) 0.917 (91.9%) 0.937 (92.7%)
5000 0.5 1:1 0.947 (93.5%) 0.946 (93.9%) 1.022 (94.5%) 1.021 (94.6%) 1.038 (94.4%) 1.046 (95.0%) 0.910 (92.0%) 0.906 (92.2%)
5000 0.5 1:3 0.956 (93.6%) 0.952 (93.6%) 1.014 (94.3%) 1.003 (95.0%) 0.974 (93.7%) 0.972 (94.1%) 0.894 (92.0%) 0.891 (91.7%)
5000 0.8 1:1 0.987 (93.2%) 0.982 (93.3%) 1.027 (95.4%) 1.017 (95.2%) 1.012 (94.6%) 1.010 (94.7%) 0.943 (93.3%) 0.941 (92.8%)
5000 0.8 1:3 1.020 (95.2%) 1.017 (95.2%) 1.012 (95.5%) 1.009 (94.6%) 0.976 (93.6%) 0.981 (93.4%) 0.859 (90.2%) 0.858 (90.1%)
10000 0.2 1:1 0.962 (94.2%) 0.956 (93.8%) 1.027 (94.7%) 1.036 (95.2%) 1.058 (95.4%) 1.072 (95.6%) 0.954 (93.8%) 0.955 (94.2%)
10000 0.2 1:3 0.966 (94.2%) 0.945 (93.0%) 1.000 (94.3%) 1.016 (94.9%) 0.969 (94.3%) 0.989 (95.1%) 0.922 (92.4%) 0.931 (92.5%)
10000 0.5 1:1 0.956 (92.9%) 0.955 (92.9%) 1.029 (94.9%) 1.030 (95.5%) 1.055 (95.7%) 1.054 (95.7%) 0.912 (91.8%) 0.912 (91.8%)
10000 0.5 1:3 0.968 (93.4%) 0.973 (94.2%) 1.004 (94.5%) 1.011 (95.3%) 0.981 (95.3%) 1.005 (95.4%) 0.875 (91.3%) 0.880 (91.0%)
10000 0.8 1:1 0.959 (93.5%) 0.963 (93.7%) 1.014 (94.9%) 1.019 (94.6%) 1.036 (95.2%) 1.044 (95.7%) 0.900 (91.5%) 0.901 (91.1%)
10000 0.8 1:3 0.979 (95.0%) 0.982 (94.4%) 1.036 (95.8%) 1.059 (96.6%) 1.010 (95.9%) 1.022 (95.7%) 0.835 (89.7%) 0.847 (89.9%)

5. Data Example: the Framingham Offspring Study

To further compare our three new weights with existing methods, we use the Offspring Cohort of the Framingham Heart Study (Wawrzyniak 2013) as an example, in which the outcome of interest is a cardiovascular disease (CVD) event. This cohort includes 1501 males and 1644 females; among them, 989 subjects experienced a CVD event during the follow-up of about 35 years.

To build a risk model for CVD, we consider two markers: the Framingham risk score (FRS) and a biomarker, C-reactive protein (CRP). The FRS was developed by Wilson et al. (1998) to estimate the 10-year CVD risk. This score is gender-specific and based on several risk factors, including age, systolic blood pressure, diastolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, current smoking status, and diabetes status. CRP is an inflammation biomarker that has been shown to improve prediction beyond the risk variables in the FRS (Ridker 2003, Cook et al. 2006).

The Offspring Cohort serves as the full cohort. We construct NCC sub-cohorts from it and obtain the IPW estimates using our three weights, $\tilde w$, $\hat w$, and $w_{HT}$, and the two existing weights $\kappa_1$ and $\kappa_2$. To evaluate the accuracy of these IPW estimators, we use the estimates obtained from the full cohort as the reference values, since both FRS and CRP are available for the full cohort. Unlike in the simulation studies, we cannot compute the true parameter values for this example because the underlying data-generating mechanism is unknown. Thus, the full-cohort estimates are the best available reference in this situation.

Specifically, we draw 100 sub-cohorts following each of two sampling schemes: (i) selecting π1 = 50% of the 989 CVD events as cases and 1 control for each case, and (ii) selecting π1 = 26% of the 989 CVD events as cases and 3 controls for each case. These two schemes produce similar sub-cohort sizes despite the different π1: over the 100 samples, on average, about 904 subjects are selected under Scheme (i) and 896 under Scheme (ii).

We consider two prediction times, $t_0 = 15$ and 30 years. Within 15 years of follow-up, 213 subjects experienced a CVD event and 6 subjects were lost to follow-up; the 15-year event risk is estimated to be about 7%. Within 30 years of follow-up, 805 subjects experienced a CVD event and 169 subjects were lost to follow-up; the 30-year event risk is estimated to be about 34%. We fit both the PH model and the time-specific GLM with the FRS and log-transformed CRP. Figures 3, 4, and 5 show boxplots of the five IPW estimates for the relative risk and accuracy parameters. In these figures, the horizontal lines represent the full-cohort estimates. In addition, for the relative risk parameters of the PH model, we compare the five IPW estimators with the clogit estimator in Figure 3(a).

Figure 3:

Data Example: Boxplots of the IPW estimates using five weights, $\tilde w$, $\hat w$, $w_{HT}$, $\kappa_1$, and $\kappa_2$, for the marker effects under the PH model and the time-specific GLM, based on 100 NCC sub-cohorts. For the PH model, the IPW estimates are also compared with the clogit estimates. The triangle inside each box represents the mean of the estimates; the dashed horizontal lines represent the full-cohort estimates.

Figure 5:

Data Example: Boxplots of the IPW estimates using five weights, $\tilde w$, $\hat w$, $w_{HT}$, $\kappa_1$, and $\kappa_2$, for the accuracy parameters under the time-specific GLM, based on 100 NCC sub-cohorts. The triangle inside each box represents the mean of the estimates; the dashed horizontal lines represent the full-cohort estimates.

We observe results similar to those of the simulation study. The two IPW estimators $\hat\theta_{\hat w}$ and $\hat\theta_{w_{HT}}$ perform the best: they are closest to the full-cohort estimates on average and have the smallest variability. In addition, these two estimators perform similarly. By contrast, $\hat\theta_{\kappa_1}$ and $\hat\theta_{\kappa_2}$ are farthest from the full-cohort estimates on average, especially for the accuracy parameters. For $\hat\theta_{\tilde w}$, although the GLM estimation converged for all the samples, it still has the largest variance.

From this data example, we also observe that the advantage of $\hat\theta_{\hat w}$ and $\hat\theta_{w_{HT}}$ over all the other methods is more pronounced for predicting the 15-year risk than the 30-year risk. In addition, each estimator tends to have a larger variance for π1 = 26% than for π1 = 50%, although the two sampling schemes lead to similar sub-cohort sizes. This phenomenon is more apparent for the 15-year risk prediction. It indicates that if the sub-cohort size is fixed by a budget, selecting more cases, i.e., a larger π1 with a lower control-to-case ratio, would lead to more efficient estimates.

6. Concluding Remarks

In this paper, we are interested in the untypical NCC design where a subset, not all, of the events are selected as cases. Such a design, although not generally considered in the statistical literature, is useful in practice for various reasons. In particular, for biomarker studies, samples from events may be more easily depleted and require careful preservation. To analyze the untypical NCC data with the IPW approach, we need to address two challenges. First, event cases and event controls are selected into the sub-cohort in different ways. Failing to account for this difference leads to biased estimation, as in Edelmann et al. (2020) and Graziano et al. (2021). In contrast, our two weights $\tilde w$ and $\hat w$ weight these two groups differently based on how an event enters the sub-cohort. Although our third weight $w_{HT}$ has the same formula for both groups, its selection probability for events accounts for both case and control selections. In addition, when all the events are selected as cases, all three weights are equivalent to Samuelsen’s weight.

The second challenge is statistical inference, since the IPW estimator for the untypical NCC has a complicated variance structure, including both between-case and between-control correlations induced by the finite-population sampling. We provide a perturbation resampling procedure for drawing inferences on both model and accuracy parameters. Our simulation study demonstrates that this procedure approximates the empirical variance of the IPW estimator well, and consequently the coverage probability of the perturbation-based CI is close to the nominal level.

Among our three IPW estimators, $\hat\theta_{w_{HT}}$ and $\hat\theta_{\hat w}$ have similar performance, and they are more efficient than $\hat\theta_{\tilde w}$. We have provided both the analytical derivation and numerical evidence. In addition, $\hat\theta_{w_{HT}}$ and $\hat\theta_{\hat w}$ perform better than all the existing IPW and clogit estimators. Thus, we recommend these two weighting methods for IPW estimation under the untypical NCC.

Our proposed framework can be further extended to NCC studies where the cases are sampled via a more complex design, such as stratified sampling (Lü et al. 2018). In such a situation, we can design the following two weights: $\hat w_j = \delta_j V_{1j}/\hat p_{1j} + (1-\delta_j) V_{0j}/\hat p_{0j}$, where $\hat p_{1j}$ is the probability that subject $j$ is selected as a case, and $w_{HT,j} = \delta_j \{V_{1j} + (1-V_{1j}) V_{0j}\} / \{\hat p_{1j} + (1-\hat p_{1j}) \hat p_{0j}\} + (1-\delta_j) V_{0j}/\hat p_{0j}$. Together, our proposed work could open the door for more efficient and practical biomarker studies.
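The two extended weights can be written per subject as in the sketch below. The readings of the symbols are assumptions based on their use in this section: `delta` is the event indicator, `v1` and `v0` indicate selection as a case and as a control, `p1` is the case-selection probability, and `p0` is the control-selection probability. The function names are hypothetical.

```python
def w_hat(delta, v1, v0, p1, p0):
    """Events are weighted by case selection only; non-events by control selection."""
    if delta == 1:
        return v1 / p1
    return v0 / p0 if v0 else 0.0


def w_ht(delta, v1, v0, p1, p0):
    """Horvitz-Thompson-type weight: an event counts if selected either way."""
    if delta == 1:
        selected = v1 + (1 - v1) * v0            # case, or control only
        return selected / (p1 + (1 - p1) * p0)   # P(selected as case or control)
    return v0 / p0 if v0 else 0.0


# An event sampled only as a control gets weight 0 under w_hat
# but a positive weight under w_ht.
example_hat = w_hat(delta=1, v1=0, v0=1, p1=0.5, p0=0.2)
example_ht = w_ht(delta=1, v1=0, v0=1, p1=0.5, p0=0.2)
```

Setting `p1 = 1`, so that every event is a case, makes the two functions return the same value for every subject, in line with the statement that the weights coincide when all events are selected as cases.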

In this manuscript, all the weights are design-based because the selection probability is calculated based on the sampling scheme. An alternative approach is to augment the weight by either calibrating or estimating the selection probability from a model with auxiliary variables (Breslow et al. 2009). The model-based weighting method is not the focus of this manuscript, but it is worth future exploration for NCC designs since it has the potential to improve estimation efficiency.

Figure 4:

Data Example: Boxplots of the IPW estimates using five weights, $\tilde w$, $\hat w$, $w_{HT}$, $\kappa_1$, and $\kappa_2$, for the accuracy parameters under the PH model, based on 100 NCC sub-cohorts. The triangle inside each box represents the mean of the estimates; the dashed horizontal lines represent the full-cohort estimates.

Acknowledgement

We thank the reviewers for their constructive suggestions, which led to improvements in our methodology and manuscript. In addition, we acknowledge that the Framingham Heart Study is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University. This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or the NHLBI.

This work is supported by grants R01 HL089778, R01 CA236558, and U01 CA86368 from the National Institutes of Health.

Footnotes

Supplementary Material

The supplementary material consists of four parts. In Appendix A, we derive the expectation, variance, and covariance of our three new weights. Appendix B includes the derivation of the asymptotic variance for the IPW estimators and the justification of the perturbation resampling method for estimating the asymptotic variance. In Appendix C, we explain how to obtain the true values of the model and accuracy parameters based on the simulation setting in Section 4.1, and Appendix D includes additional results of the simulation studies.

References

  1. Breslow NE, Lumley T, Ballantyne CM, Chambless LE, and Kulich M (2009). Improved Horvitz-Thompson estimation of model parameters from two-phase stratified samples: applications in epidemiology. Statistics in Biosciences, 1(1):32–49.
  2. Cai T and Zheng Y (2011). Nonparametric evaluation of biomarker accuracy under nested case-control studies. Journal of the American Statistical Association, 106:569–580.
  3. Cai T and Zheng Y (2012). Evaluating prognostic accuracy of biomarkers under nested case-control studies. Biostatistics, 13:89–100.
  4. Cai T and Zheng Y (2013). Resampling procedures for making inference under nested case–control studies. Journal of the American Statistical Association, 108(504):1532–1544.
  5. Cook NR, Buring JE, and Ridker PM (2006). The effect of including C-reactive protein in cardiovascular risk prediction models for women. Annals of Internal Medicine, 145(1):21–29.
  6. Edelmann D, Ohneberg K, Becker N, Benner A, and Schumacher M (2020). Which patients to sample in clinical cohort studies when the number of events is high and measurement of additional markers is constrained by limited resources. Cancer Medicine, 9(20):7398–7406.
  7. Goldstein L and Langholz B (1992). Asymptotic theory for nested case-control sampling in the Cox regression model. The Annals of Statistics, 20(4):1903–1928.
  8. Gray RJ (2009). Weighted analyses for cohort sampling designs. Lifetime Data Analysis, 15(1):24–40.
  9. Graziano F, Valsecchi MG, and Rebora P (2021). Sampling strategies to evaluate the prognostic value of a new biomarker on a time-to-event end-point. BMC Medical Research Methodology, 21(1):1–11.
  10. Heagerty P and Zheng Y (2005). Survival model predictive accuracy and ROC curves. Biometrics, 61(1):92–105.
  11. Horvitz DG and Thompson DJ (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685.
  12. Jakszyn P, Agudo A, Lujan-Barroso L, Bueno-de Mesquita HB, Jenab M, Navarro C, Palli D, Boeing H, Manjer J, Numans ME, et al. (2012). Dietary intake of heme iron and risk of gastric cancer in the European Prospective Investigation into Cancer and Nutrition study. International Journal of Cancer, 130(11):2654–2663.
  13. Jakszyn P, Bingham S, Pera G, Agudo A, Luben R, Welch A, Boeing H, Del Giudice G, Palli D, Saieva C, et al. (2006). Endogenous versus exogenous exposure to N-nitroso compounds and gastric cancer risk in the European Prospective Investigation into Cancer and Nutrition (EPIC-EURGAST) study. Carcinogenesis, 27(7):1497–1501.
  14. Kaplan EL and Meier P (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282):457–481.
  15. Lü Y, Cai MH, Cheng J, Zou K, Xiang Q, Wu JY, Wei DQ, Zhou ZH, Wang H, Wang C, et al. (2018). A multi-center nested case-control study on hospitalization costs and length of stay due to healthcare-associated infection. Antimicrobial Resistance & Infection Control, 7(1):99.
  16. Ridker PM (2003). Clinical application of C-reactive protein for cardiovascular disease detection and prevention. Circulation, 107(3):363–369.
  17. Samuelsen S (1997). A pseudolikelihood approach to analysis of nested case-control studies. Biometrika, 84(2):379–394.
  18. Uno H, Cai T, Tian L, and Wei L (2007). Evaluating prediction rules for t-year survivors with censored regression models. Journal of the American Statistical Association, 102(478):527–537.
  19. Wawrzyniak AJ (2013). Framingham Heart Study, pages 811–814. Springer New York, New York, NY.
  20. Wilson PW, D’Agostino RB, Levy D, Belanger AM, Silbershatz H, and Kannel WB (1998). Prediction of coronary heart disease using risk factor categories. Circulation, 97(18):1837–1847.
  21. Zhou QM, Zheng Y, Chibnik LB, Karlson EW, and Cai T (2015). Assessing incremental value of biomarkers with multi-phase nested case-control studies. Biometrics, 71(4):1139–1149.
