Biostatistics (Oxford, England). 2010 Jun 3;11(4):693–706. doi: 10.1093/biostatistics/kxq037

Cox regression model with time-varying coefficients in nested case–control studies

Mengling Liu*, Wenbin Lu, Roy E Shore, Anne Zeleniuch-Jacquotte
PMCID: PMC3294270  PMID: 20525697

Abstract

The nested case–control (NCC) design is a cost-effective sampling method to study the relationship between a disease and its risk factors in epidemiologic studies. NCC data are commonly analyzed using Thomas' partial likelihood approach under Cox's proportional hazards model with constant covariate effects. Here, we are interested in studying the potential time-varying effects of covariates in NCC studies and propose an estimation approach based on a kernel-weighted Thomas' partial likelihood. We establish asymptotic properties of the proposed estimator, propose a numerical approach to construct simultaneous confidence bands for time-varying coefficients, and develop a hypothesis testing procedure to detect time-varying coefficients. The proposed inference procedure is evaluated in simulations and applied to an NCC study of breast cancer in the New York University Women's Health Study.

Keywords: Kernel estimation, Martingale, Nested case–control study, Proportional hazards model, Risk-set sampling, Time-varying coefficient

1. INTRODUCTION

Epidemiologic cohort studies of rare diseases are usually expensive to conduct because a large number of individuals must be followed for a long time to accrue an adequate number of cases. Moreover, the cost of assembling the exposure variables of interest and confounders for the entire cohort can be prohibitive. The nested case–control (NCC) design (Thomas, 1977) has therefore been widely used as a cost-effective alternative to the full-cohort design. For example, an NCC study was conducted in the New York University Women's Health Study (NYUWHS) to investigate the association between breast cancer risk and genetic variations in the nucleotide excision repair (NER) pathway (Shore and others, 2008). The NYUWHS is a prospective cohort study that enrolled 14 274 healthy women aged 35–65 at a breast cancer screening center between 1985 and 1991 and has followed them since enrollment for cancers and other health outcomes. The NER mechanism helps cells prevent unwanted mutations by removing DNA damage, so genes in the NER pathway are hypothesized to play a role in the development of cancers and other diseases involving DNA damage and genetic mutations. Ascertaining genetic information for the entire cohort would have been very costly. The NCC design was therefore implemented: for each of the 612 identified invasive breast cancer cases, one control was selected from the case's risk set. Genotypes of 2 NER-pathway genes, XPC and ERCC2, were obtained for the cases and their matched controls. Covariate information on demographics, smoking status, and pregnancy history was collected from baseline and follow-up questionnaires. The effects of the NER genes, of smoking (an environmental exposure that causes DNA damage), and of their interactions on breast cancer risk were of primary interest.

Cox's proportional hazards model (Cox, 1972) has been commonly used to analyze NCC data through the partial likelihood technique (Thomas, 1977; Oakes, 1981). Under the proportional hazards assumption, that is, that the ratio of the hazard functions for different covariate values remains constant over time, Thomas' partial likelihood function is equivalent to the conditional logistic likelihood for matched case–control studies. Theoretical properties of Thomas' maximum partial likelihood estimator have been formally established using counting process and martingale theory (Goldstein and Langholz, 1992; Borgan and others, 1995).

Because of the long follow-up and the complex relationships explored in large epidemiologic studies, the proportional hazards assumption may be violated. Researchers have therefore extended Cox's model to improve its flexibility, and one popular extension allows the coefficients to vary with time. Specifically, the Cox model with time-varying coefficients (Zucker and Karr, 1990; Hastie and Tibshirani, 1993) assumes that the individual hazard function has the multiplicative form

$$\lambda\{t \mid Z(t)\} = \lambda_0(t)\exp\{a_0^{\mathsf T}(t)Z(t)\}, \qquad (1.1)$$

where $\lambda_0(t)$ denotes an unspecified baseline hazard function, $Z(t)$ denotes $p$ covariate processes, and $a_0(t)$ are $p$ functions characterizing the covariates' temporal effects. Estimation and inference for model (1.1) in cohort studies have been studied by many researchers using various techniques, for example, the penalized partial likelihood approach with smoothing splines (Tibshirani and Hastie, 1987; Zucker and Karr, 1990); the sieve maximum partial likelihood approach with histogram sieves (Murphy and Sen, 1991); the integrated Newton–Raphson equation for the cumulative coefficient functions $\int_0^t a_0(u)\,du$ (Martinussen and others, 2002); and the kernel-weighted partial likelihood approach (Cai and Sun, 2003; Tian and others, 2005). However, the use of model (1.1) in NCC studies remains limited.

In this article, we study model (1.1) with NCC data. We show that the kernel-weighted local polynomial fitting technique (Fan and Gijbels, 1996) can be coupled with Thomas' partial likelihood to estimate time-varying coefficients in NCC studies. The rest of the article is organized as follows. In Section 2, we propose a kernel-weighted partial likelihood estimation approach and establish the asymptotic properties of the proposed estimator. For inference on time-varying coefficients, we develop numerical approaches to constructing simultaneous confidence bands and to testing whether individual coefficients are time-invariant. Furthermore, we consider inference for an extended model that incorporates both time-varying and time-invariant coefficients. In Section 3, we present numerical studies, including simulations to evaluate the finite-sample performance of the proposed approaches and an analysis of the NYUWHS breast cancer data. We conclude with some remarks in Section 4. All proofs are given in the supplementary material available at Biostatistics online.

2. METHODS

Throughout the paper, let $\{T^*, C, Z(\cdot)\}$ denote a random triplet of failure time, right-censoring time, and $p$-dimensional covariate processes. Consider a cohort of size $n$, and let the full-cohort data consist of $n$ independent realizations $\{T_i, \delta_i, Z_i(\cdot)\}$, $i = 1, \ldots, n$, where $T_i = \min(T_i^*, C_i)$ is the observed time and $\delta_i$ indicates the status of the observed event, taking the value 1 if a failure is observed and 0 otherwise. At time $t$, let $\mathcal R(t) = \{i : T_i \ge t\}$ denote the risk set. When the NCC design is used to sample from the cohort, one identifies cases as subjects with $\delta_i = 1$ and, for each case, randomly samples $m - 1$ controls without replacement from the risk set at the case's failure time, excluding the case itself. For a given case $i$, let $\mathcal R_i^*$ denote the indices of the $m - 1$ selected controls and define $\mathcal R_i = \mathcal R_i^* \cup \{i\}$. Complete covariate information is ascertained for all cases and selected controls.
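The risk-set sampling step just described can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' code: the function name `sample_ncc` and its arguments are hypothetical, ties are handled through the $T_j \ge T_i$ risk-set definition, and when fewer than $m - 1$ other subjects remain at risk the sketch simply takes all of them, a case the text does not address.

```python
import numpy as np

def sample_ncc(time, status, m, rng):
    """Nested case-control sampling: m - 1 controls per case from its risk set.

    time:   observed times T_i; status: event indicators delta_i.
    Returns a list of (case_index, control_indices) pairs, i.e. R_i = R_i* U {i}.
    """
    time = np.asarray(time, dtype=float)
    status = np.asarray(status, dtype=int)
    sets = []
    for i in np.flatnonzero(status == 1):
        # risk set at the case's failure time, R(T_i) = {j : T_j >= T_i}, minus the case
        at_risk = np.flatnonzero(time >= time[i])
        at_risk = at_risk[at_risk != i]
        k = min(m - 1, at_risk.size)  # assumption: take everyone left if fewer than m - 1
        if k > 0:
            controls = rng.choice(at_risk, size=k, replace=False)  # without replacement
        else:
            controls = np.empty(0, dtype=int)
        sets.append((int(i), controls))
    return sets
```

For the NYUWHS design (one control per case), one would call `sample_ncc(T, delta, m=2, rng=np.random.default_rng(1))`.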

2.1. Kernel-weighted partial likelihood estimation approach

Consider model (1.1) and assume time-varying coefficients a0(·) to be smooth functions with continuous first and second derivatives. Locally around a time point t, we approximate a0(·) by a linearization using the first-order Taylor expansion,

$$a_0(u) \approx a_0(t) + a_0'(t)(u - t) \quad \text{for } u \text{ near } t.$$

Let $\beta_0(t) = \{a_{01}(t), \ldots, a_{0p}(t), a_{01}'(t), \ldots, a_{0p}'(t)\}^{\mathsf T}$ and $\tilde Z(u, t) = (1, u - t)^{\mathsf T} \otimes Z(u)$, with $\otimes$ denoting the Kronecker product, so that $\beta_0^{\mathsf T}(t)\tilde Z(u, t) \approx a_0^{\mathsf T}(u)Z(u)$ for $u$ near $t$. Moreover, define the counting process for the observed failure event as $N(t) = I(T \le t, \delta = 1)$. For each case in the NCC data, $N(t)$ jumps from 0 to 1 at the case's failure time; for all other subjects, $N(t)$ remains 0 throughout follow-up. To estimate $\beta_0(t)$ locally around $t$, we consider the following kernel-weighted partial likelihood function of the $2p$-dimensional parameter $\beta$:

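$$\ell_n(\beta, t) = \sum_{i=1}^n \int_0^{\tau_0} K_h(u - t)\Big[\beta^{\mathsf T}\tilde Z_i(u, t) - \log \sum_{j \in \mathcal R_i} \exp\{\beta^{\mathsf T}\tilde Z_j(u, t)\}\Big]\,dN_i(u), \qquad (2.1)$$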

where $K(\cdot)$ is a kernel function, $h$ is a bandwidth, $K_h(\cdot) = h^{-1}K(\cdot/h)$ is the scaled kernel, and $\tau_0$ is the upper bound of the follow-up period, satisfying $\tau_0 = \inf\{t : \mathrm{pr}(T > t) = 0\}$. We assume that $K(\cdot)$ is a symmetric density function with bounded support on $[-1, 1]$ and that $h = h_n = O(n^{-v})$ for $0 < v < 1$. The kernel down-weights the contributions of case–control sets whose case event times are far from $t$, and the bandwidth controls the size of the local neighborhood. Thus, the local partial likelihood function (2.1) depends only on the case–control sets whose case event times lie close to $t$.

The score function of (2.1) can be easily derived:

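$$U_n(\beta, t) = \frac{\partial \ell_n(\beta, t)}{\partial \beta} = \sum_{i=1}^n \int_0^{\tau_0} K_h(u - t)\left\{\tilde Z_i(u, t) - \frac{S^{(1)}_{\mathcal R_i}(\beta, u, t)}{S^{(0)}_{\mathcal R_i}(\beta, u, t)}\right\}dN_i(u), \qquad (2.2)$$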

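where, for a set $\mathcal R$, $S^{(k)}_{\mathcal R}(\beta, u, t) = \sum_{j \in \mathcal R}\tilde Z_j(u, t)^{\otimes k}\exp\{\beta^{\mathsf T}\tilde Z_j(u, t)\}$, and, for a vector $a$, $a^{\otimes k} = 1$, $a$, and $aa^{\mathsf T}$ for $k = 0, 1$, and 2, respectively. Furthermore, the Hessian matrix of (2.1) equals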

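$$\ddot\ell_n(\beta, t) = -\sum_{i=1}^n \int_0^{\tau_0} K_h(u - t)\left[\frac{S^{(2)}_{\mathcal R_i}(\beta, u, t)}{S^{(0)}_{\mathcal R_i}(\beta, u, t)} - \left\{\frac{S^{(1)}_{\mathcal R_i}(\beta, u, t)}{S^{(0)}_{\mathcal R_i}(\beta, u, t)}\right\}^{\otimes 2}\right]dN_i(u). \qquad (2.3)$$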

Each bracketed term in (2.3) is a weighted covariance matrix, so $\ddot\ell_n(\beta, t)$ is negative semidefinite and (2.1) is concave; when $\ddot\ell_n(\beta, t)$ is negative definite, (2.1) has a unique maximizer. The Newton–Raphson method or other gradient-based search algorithms can be used to find the maximizer $\hat\beta(t)$ of (2.1). We denote the kernel-weighted maximum partial likelihood estimate of $a_0(t)$ by $\hat a(t)$, which consists of the first $p$ components of $\hat\beta(t)$.
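A minimal Python/NumPy sketch of the Newton–Raphson maximization of (2.1) is given below for time-fixed covariates. It assumes the local-linear design $\tilde Z(u, t) = (1, u - t)^{\mathsf T} \otimes Z$ and the Epanechnikov kernel used in Section 3; the data layout (an array of case times and a `(sets, m, p)` covariate array with the case in row 0) and the names `epanechnikov` and `local_fit` are illustrative assumptions, not the authors' implementation. No step-halving is included, so sparse local windows may need a damped step.

```python
import numpy as np

def epanechnikov(s):
    # K(s) = 0.75 (1 - s^2) on [-1, 1], zero elsewhere
    return np.where(np.abs(s) <= 1.0, 0.75 * (1.0 - s**2), 0.0)

def local_fit(case_times, Z_sets, t, h, n_iter=50, tol=1e-8):
    """Maximize the kernel-weighted Thomas partial likelihood (2.1) at time t.

    case_times: (S,) failure times of the cases.
    Z_sets: (S, m, p) covariates of each case-control set; row 0 is the case.
    Returns beta_hat = (a_hat(t), a_hat'(t)) of length 2p and the Hessian
    from the final Newton iteration.
    """
    case_times = np.asarray(case_times, float)
    Z_sets = np.asarray(Z_sets, float)
    S, m, p = Z_sets.shape
    w = epanechnikov((case_times - t) / h) / h             # K_h(u_i - t)
    if not np.any(w > 0):
        raise ValueError("no case-control sets inside the bandwidth window")
    # local-linear design: Z~_j(u, t) = (Z_j, (u - t) Z_j)
    d = (case_times - t)[:, None, None]
    Zt = np.concatenate([Z_sets, d * Z_sets], axis=2)      # (S, m, 2p)
    beta = np.zeros(2 * p)
    for _ in range(n_iter):
        eta = Zt @ beta                                     # (S, m)
        eta -= eta.max(axis=1, keepdims=True)               # stabilize exp
        e = np.exp(eta)
        pr = e / e.sum(axis=1, keepdims=True)               # within-set weights
        mean = np.einsum("sm,smk->sk", pr, Zt)              # S1 / S0
        U = np.einsum("s,sk->k", w, Zt[:, 0, :] - mean)     # score (2.2)
        second = np.einsum("sm,smk,sml->skl", pr, Zt, Zt)   # S2 / S0
        cov = second - np.einsum("sk,sl->skl", mean, mean)
        H = -np.einsum("s,skl->kl", w, cov)                 # Hessian (2.3)
        step = np.linalg.solve(H, U)
        beta = beta - step                                  # Newton-Raphson update
        if np.max(np.abs(step)) < tol:
            break
    return beta, H
```

Evaluating `local_fit` over a grid of `t` values traces out $\hat a(t)$; the first `p` entries of the returned vector are the coefficient estimates and the last `p` are the local slopes.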

2.2. Asymptotic properties of $\hat a(t)$

Because of the risk-set sampling mechanism of the NCC design, the size of each sampled risk set $\mathcal R_i$ is always $m$, even as the sample size increases, rather than growing to infinity as the risk sets in Cox's partial likelihood for full-cohort data do. We therefore adopt theoretical arguments similar to those used for Thomas' maximum partial likelihood estimator in Goldstein and Langholz (1992) and consider their kernel-weighted local version using the local polynomial fitting technique (Fan and Gijbels, 1996). Let $r = \{1, \ldots, m\}$, $Y_r(t) = \prod_{i \in r} Y_i(t)$, where $Y_i(t) = I(T_i \ge t)$, and $p_0(t) = \mathrm{pr}\{Y_1(t) = 1\}$. Denote

2.2.

Indeed, $p_0(t)\lambda_0(t)\Sigma_0(t)$ is the local contribution to the asymptotic information matrix of Thomas' partial likelihood function (Goldstein and Langholz, 1992). Here, we state the main asymptotic results. Regularity conditions A.1–A.5, remarks on the technical device used in Goldstein and Langholz (1992) to make the NCC sampling process predictable, and proofs are given in the supplementary material available at Biostatistics online. Let $\mu_j = \int s^jK(s)\,ds$ and $\nu_j = \int s^jK^2(s)\,ds$ for $j = 0, 1, 2$.

PROPOSITION 2.1

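As $n \to \infty$, (i) under Conditions A.1–A.4, $\hat a(t)$ is a consistent estimator of $a_0(t)$; (ii) under Conditions A.1–A.5, for $t \in (0, \tau_0)$,

$$(nh)^{1/2}\Big\{\hat a(t) - a_0(t) - \tfrac{1}{2}\mu_2h^2a_0''(t)\Big\} \xrightarrow{d} N\Big[0,\ \nu_0\{p_0(t)\lambda_0(t)\Sigma_0(t)\}^{-1}\Big].$$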

2.3. Point-wise confidence interval and simultaneous confidence band

From Proposition 2.1, the bandwidth that minimizes the mean squared error or the mean integrated squared error is of order $h = O(n^{-1/5})$. This theoretically optimal bandwidth, however, leads to an asymptotically biased estimator: the bias term, scaled by $(nh)^{1/2}$, is $O(1)$ and does not vanish. In this paper, we prefer a slightly faster rate, $h = n^{-v}$ with $1/5 < v < 1$ (undersmoothing), to obtain an asymptotically unbiased estimator. The main reason is to avoid complications caused by the estimation bias when constructing point-wise confidence intervals and simultaneous confidence bands. We refer the reader to Härdle and Marron (1991) for further discussion of handling bias in simultaneous confidence bands for general nonparametric curve estimation.
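A heuristic rate calculation makes the bandwidth restriction concrete. It assumes, as is standard for local linear fitting, that the bias of $\hat a(t)$ is of order $h^2$ and its standard deviation of order $(nh)^{-1/2}$:

```latex
\frac{\text{bias}}{\text{s.d.}} \asymp \frac{h^2}{(nh)^{-1/2}} = (nh^5)^{1/2},
\qquad h = n^{-v} \;\Longrightarrow\; (nh^5)^{1/2} = n^{(1-5v)/2},

n^{(1-5v)/2} \to 0 \iff v > 1/5,
\qquad nh = n^{1-v} \to \infty \iff v < 1.
```

At $v = 1/5$ the ratio stays of order one, which is the non-vanishing bias of the MSE-optimal bandwidth; any $v \in (1/5, 1)$ makes the bias negligible relative to the sampling variability while still letting the effective local sample size $nh$ grow.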

The variance of $\hat a(t)$ can be consistently estimated by the upper left $p \times p$ submatrix of

2.3. (2.4)

where $\ddot\ell_n(\beta, t)$ is as specified in (2.3). Thus, the $100(1-\alpha)\%$ point-wise confidence interval for the $j$th element of $a_0(\cdot)$ at time $t$ can be constructed as $\hat a_j(t) \pm z_{1-\alpha/2}\hat\Sigma_{jj}^{1/2}(t)$, where $\hat\Sigma_{jj}(t)$ denotes the $j$th diagonal element of this $p \times p$ variance estimate $\hat\Sigma(t)$ and $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution. However, to make inference about the underlying coefficient function over a time interval, a simultaneous confidence band is more informative than point-wise confidence intervals. In general, it is difficult to derive an analytic form for simultaneous confidence bands, and they are usually obtained by numerical approaches that mimic the original data structure while assessing variability (Härdle and Marron, 1991; Tian and others, 2005). Here, we construct simultaneous confidence bands by approximating the distribution of

$$S_j = \sup_{t \in [t_1, t_2]} w_n(t)\,\big|\hat a_j(t) - a_{0j}(t)\big| \qquad (2.5)$$

with the resampling method of Lin and others (1994), where the weight function $w_n(t)$ can be a data-dependent positive function that converges uniformly to a deterministic function. In our numerical studies, we choose $w_n(t)$ proportional to $\hat\Sigma_{jj}^{-1/2}(t)$ to account for the variability of $\hat a_j(t)$. In the proof of Proposition 2.1, we show that, if $h = n^{-v}$ with $1/5 < v < 1$, the kernel-weighted score function (2.2) at the true coefficient values is asymptotically equivalent to

2.3.

where $dM_i(u) = dN_i(u) - Y_i(u)\exp\{a_0^{\mathsf T}(u)Z_i(u)\}\lambda_0(u)\,du$; under standard measurability assumptions, $M_i(u)$ is a local martingale. Substituting $\hat\beta(t)$ for $\beta_0(t)$ and $dN_i(t)G_i$ for $dM_i(t)$, where the $G_i$ are independent standard normal random variables, we obtain a randomly perturbed version of $U_n^*(\beta_0, t)$, denoted by $\tilde U_n(t)$. At each $t$, the conditional limiting distribution of $\tilde U_n(t)$ given the observed data is the same as the unconditional limiting distribution of $U_n^*(\beta_0, t)$ (Lin and others, 1994). Therefore, the distribution of $S_j$ in (2.5) can be approximated numerically by that of its randomly perturbed counterpart $\tilde S_j$. Let $c_{1-\alpha}$ denote the sample $100(1-\alpha)$th percentile of a large number of realizations of $\tilde S_j$; the simultaneous confidence band for $a_{0j}(t)$ over $[t_1, t_2]$ is then $\hat a_j(t) \pm c_{1-\alpha}/w_n(t)$.
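A minimal Python/NumPy sketch of this multiplier resampling is shown below. It assumes that the per-set score contributions $K_h(T_i - t)\{\tilde Z_i - S^{(1)}_{\mathcal R_i}/S^{(0)}_{\mathcal R_i}\}$ at $\hat\beta(t)$ and the Hessians (2.3) have already been computed on a grid over $[t_1, t_2]$. It maps each perturbed score to a perturbed coefficient deviation through the one-step linearization $\hat\beta - \beta_0 \approx -\ddot\ell_n^{-1}U_n$ and, as a simplifying assumption, uses the resampling standard deviation in place of $\hat\Sigma_{jj}^{1/2}(t)$ to define $1/w_n(t)$. The function name, inputs, and array shapes are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def simultaneous_band(a_hat_j, xi, H, j, n_boot=5000, alpha=0.05, rng=None):
    """Multiplier-resampling band for a_j(t) on a grid t_1 < ... < t_G.

    a_hat_j: (G,) local estimates of a_j(t) on the grid.
    xi:      (G, S, 2p) per-set score contributions at beta_hat(t_g)
             (dN_i * G_i replaces dM_i, so only the cases contribute).
    H:       (G, 2p, 2p) Hessians (2.3) at beta_hat(t_g).
    j:       0-based index of the coefficient, j < p.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_sets = xi.shape[1]
    mult = rng.standard_normal((n_boot, n_sets))          # G_i ~ N(0, 1)
    U_tilde = np.einsum("bs,gsk->bgk", mult, xi)           # perturbed scores
    H_inv = np.linalg.inv(H)                               # (G, 2p, 2p)
    # one-step linearization: beta_hat - beta_0 ~ -H^{-1} U
    dev = -np.einsum("gkl,bgl->bgk", H_inv, U_tilde)[:, :, j]   # (B, G)
    sd = dev.std(axis=0)                                   # plays the role of 1 / w_n(t)
    sup_stat = np.max(np.abs(dev) / sd, axis=1)            # realizations of S~_j
    c = np.quantile(sup_stat, 1 - alpha)                   # c_{1-alpha}
    z = NormalDist().inv_cdf(1 - alpha / 2)                # point-wise critical value
    return {
        "band": (a_hat_j - c * sd, a_hat_j + c * sd),
        "pointwise": (a_hat_j - z * sd, a_hat_j + z * sd),
        "c": float(c),
    }
```

Because $c_{1-\alpha}$ is a quantile of a supremum over the grid, it exceeds $z_{1-\alpha/2}$, so the simultaneous band is always wider than the point-wise intervals built from the same standard errors.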

2.4. Inference for mixed Cox model with time-varying and time-invariant coefficients

Model (1.1) is flexible in allowing all coefficients to be time-varying. When there are indications that certain covariates have time-invariant coefficients, however, one may prefer a mixed model with both time-varying and time-invariant coefficients. Without loss of generality, we consider a mixed model in which the first $q$ components of $a_0(t)$ are constant, that is, $a_0^{\mathsf T}(t) = \{\gamma_0^{\mathsf T}, a_{0[-q]}^{\mathsf T}(t)\}$, where $a_{[q]}$ denotes the first $q$ elements of a vector $a$ and $a_{[-q]}$ denotes the remaining $p - q$ elements. Based on the kernel-weighted partial likelihood estimates $\hat a(t)$, we estimate the constant coefficients $\gamma_0$ by

$$\hat\gamma = \int_0^{\tau_0} w_\gamma(t)\,\hat a_{[q]}(t)\,dt, \qquad (2.6)$$

where $w_\gamma(t)$ is a weight function converging to a deterministic function and satisfying $\int_0^{\tau_0} w_\gamma(t)\,dt = I_{q\times q}$, the identity matrix. As suggested in Tian and others (2005), a natural choice is the standardized inverse covariance matrix of $\hat a_{[q]}(t)$, that is, $w_\gamma(t) = \{\int_0^{\tau_0} J(t)\,dt\}^{-1}J(t)$, where $J(t)$ is the inverse of the upper left $q \times q$ submatrix of the variance estimate of $\hat a(t)$ obtained from (2.4). To make inference about $\gamma_0$, by arguments similar to those of Tian and others (2005), we can show that $n^{1/2}(\hat\gamma - \gamma_0)$ converges weakly to a mean-zero normal distribution for $h = n^{-v}$ with $1/4 < v < 1/2$, and that the limiting covariance matrix can be consistently estimated.
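On a grid of time points, the integrated estimator (2.6) with the inverse-variance weight reduces to $\hat\gamma = \{\int J\}^{-1}\int J(t)\hat a_{[q]}(t)\,dt$. The following Python/NumPy sketch computes this with the trapezoidal rule; the grid, the function names, and the assumption that the $q \times q$ variance blocks are already available are illustrative choices, not part of the original method description.

```python
import numpy as np

def _trapezoid(f, t):
    """Trapezoidal rule along axis 0 (f: (G, ...), t: (G,))."""
    dt = np.diff(t).reshape((-1,) + (1,) * (f.ndim - 1))
    return np.sum(0.5 * (f[1:] + f[:-1]) * dt, axis=0)

def integrated_estimate(t_grid, a_hat_q, Sigma_q):
    """gamma_hat = int w_gamma(t) a_hat_[q](t) dt with w_gamma = (int J)^{-1} J(t).

    t_grid:  (G,) increasing grid covering [0, tau_0].
    a_hat_q: (G, q) local estimates of the constant-coefficient block.
    Sigma_q: (G, q, q) upper-left q x q blocks of the variance estimate of a_hat(t).
    """
    t_grid = np.asarray(t_grid, dtype=float)
    J = np.linalg.inv(np.asarray(Sigma_q, dtype=float))            # J(t)
    Ja = np.einsum("gkl,gl->gk", J, np.asarray(a_hat_q, float))    # J(t) a_hat(t)
    int_J = _trapezoid(J, t_grid)                                  # (q, q)
    int_Ja = _trapezoid(Ja, t_grid)                                # (q,)
    # w_gamma integrates to the identity, so gamma_hat = (int J)^{-1} int J a_hat
    return np.linalg.solve(int_J, int_Ja)
```

Time points where $\hat a_{[q]}(t)$ is imprecise (large variance, few cases nearby) receive small weight, which is the motivation for the inverse-covariance choice.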

In practice, after obtaining the estimates $\hat a(t)$ and their simultaneous confidence bands, one can check whether a constant coefficient is plausible by examining whether each band encloses a horizontal line. Drawbacks of this procedure include the slow convergence rate of the local estimates and the resulting low sensitivity; it also ignores the uncertainty in estimating the constant coefficient. As shown above, integrating the time-varying coefficient estimates yields a faster convergence rate (Martinussen and others, 2002). To check whether the $j$th component of $a(t)$ is independent of time, we therefore consider the process $T_j(t) = n^{1/2}\int_0^t\{\hat a_j(u) - \hat\gamma_j\}\,du$, where $\hat a_j(t)$ and $\hat\gamma_j$ are the local and integrated estimates, respectively. Using a resampling method similar to that described in Section 2.3, we can obtain randomly perturbed realizations of $T_j(t)$ to evaluate this hypothesis graphically and numerically.

More specifically, define $\hat\beta^{(j)}(t)$ as $\hat\beta(t)$ with the $j$th and $(p+j)$th elements replaced by $(\hat\gamma_j, 0)$. When the hypothesis $H_{0j}: a_j(t) = \gamma_j$ for all $t$ is true, $T_j(t)$ converges weakly to a mean-zero Gaussian process (Tian and others, 2005). Its asymptotic distribution can be approximated by the conditional distribution of the randomly perturbed process

2.4.

A large number of resampled realizations $\tilde T_j(t)$ can be plotted together with $T_j(t)$ to examine the constant-coefficient hypothesis visually. Furthermore, a formal test can be based on $T_j^* = \sup_{t \in [0, \tau_0]}|T_j(t)|$, with the critical value obtained from the sample quantile of the resampled counterparts $\tilde T_j^* = \sup_{t \in [0, \tau_0]}|\tilde T_j(t)|$.
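The two computational pieces of this test, forming $T_j(t)$ on a grid and turning resampled sup-norms into a p-value, can be sketched in Python/NumPy as follows. The sketch assumes the grid starts at the time origin and uses the $n^{1/2}$ scaling written above; the perturbed realizations $\tilde T_j(t)$ are taken as given input, and the function names are hypothetical.

```python
import numpy as np

def cumulative_process(t_grid, a_hat_j, gamma_j, n):
    """T_j(t) = n^{1/2} int_0^t {a_hat_j(u) - gamma_hat_j} du on the grid (trapezoid)."""
    t_grid = np.asarray(t_grid, float)
    diff = np.asarray(a_hat_j, float) - gamma_j
    increments = 0.5 * (diff[1:] + diff[:-1]) * np.diff(t_grid)
    return np.sqrt(n) * np.concatenate([[0.0], np.cumsum(increments)])

def sup_test(T_obs, T_resampled):
    """Sup-norm statistic T_j* and its resampling p-value.

    T_obs: (G,) observed process; T_resampled: (B, G) perturbed realizations.
    """
    stat = float(np.max(np.abs(T_obs)))
    null = np.max(np.abs(T_resampled), axis=1)       # resampled T~_j*
    return stat, float(np.mean(null >= stat))
```

A constant local estimate equal to $\hat\gamma_j$ gives $T_j(t) \equiv 0$ and hence a p-value of 1; systematic departures accumulate in the integral and push $T_j^*$ into the tail of the resampled distribution.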

2.5. Bandwidth selection

In practice, we suggest using cross-validation (Hastie and others, 2008) to select the bandwidth $h$. Specifically, we first randomly split the case–control sets into $K$ subsets (here $K$ is the number of folds, not the kernel). Following Tian and others (2005), we use minus the logarithm of the partial likelihood as a measure of prediction error. For each $h$ and the $k$th subset,

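$$PE_k(h) = -\sum_{i \in \mathcal D_k}\log\frac{\exp\{\hat a^{(-k)\mathsf T}(T_i)Z_i(T_i)\}}{\sum_{l \in \mathcal R_i}\exp\{\hat a^{(-k)\mathsf T}(T_i)Z_l(T_i)\}},$$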

where $\mathcal D_k$ denotes the cases in the $k$th subset and $\hat a^{(-k)}(t)$ is estimated with bandwidth $h$ from the remaining $K - 1$ subsets. The total prediction error for $h$ is $PE(h) = \sum_{k=1}^K PE_k(h)$, and we choose the bandwidth $h$ that minimizes $PE(h)$.
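The cross-validation loop can be sketched in Python/NumPy as below. The local fitting routine is passed in as a hypothetical callable `fit(times_train, Z_train, t, h)` returning $\hat a(t)$, so the sketch does not commit to a particular implementation; covariates are assumed time-fixed and stored as `(sets, m, p)` with the case in row 0. The fold labels are drawn once and reused for every candidate $h$, so that differences in $PE(h)$ reflect the bandwidth rather than the random split.

```python
import numpy as np

def cv_prediction_error(case_times, Z_sets, folds, h, fit):
    """PE(h) = sum_k PE_k(h): minus log partial likelihood of held-out sets.

    case_times: (S,) case failure times; Z_sets: (S, m, p), row 0 = case.
    folds: (S,) integer fold labels, fixed across candidate h values.
    fit: hypothetical callable fit(times_train, Z_train, t, h) -> a_hat(t), shape (p,).
    """
    case_times = np.asarray(case_times, float)
    Z_sets = np.asarray(Z_sets, float)
    folds = np.asarray(folds)
    pe = 0.0
    for k in np.unique(folds):
        test = folds == k
        train = ~test
        for u, Zs in zip(case_times[test], Z_sets[test]):
            a = fit(case_times[train], Z_sets[train], u, h)   # a_hat^{(-k)}(T_i)
            eta = Zs @ a                                      # linear predictors in R_i
            pe -= eta[0] - np.logaddexp.reduce(eta)           # -log(case's share)
    return pe

def select_bandwidth(case_times, Z_sets, hs, fit, K=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(case_times)) % K   # one random split for all h
    errors = [cv_prediction_error(case_times, Z_sets, folds, h, fit) for h in hs]
    return hs[int(np.argmin(errors))], errors
```

Refitting at every held-out case time is the most literal reading of $PE_k(h)$; in practice one might fit on a time grid and interpolate, at some loss of exactness.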

3. NUMERICAL STUDIES

3.1. Simulations

Simulation studies were carried out to evaluate the finite-sample performance of the proposed estimation and inference procedures. Failure times were generated from a model with hazard function $\lambda(t \mid Z) = \lambda_0(t)\exp\{a_1(t)Z_1 + a_2(t)Z_2\}$, where the baseline hazard function is $\lambda_0(t) = b_0 + b_1t + b_2t\exp\{-b_3(t-2)^2\}$, with $(b_0, b_1, b_2, b_3)$ specified below, and the covariates $Z_1$ and $Z_2$ were generated from a Bernoulli distribution with success probability 0.5. We considered 2 types of time-varying function for $a_1(t)$: polynomial, with $a_1(t) = 1 - 0.25(t-2)^2$ and $(b_0, b_1, b_2, b_3) = (0.0, 0.005, 0.002, 0.25)$; and log-sinusoid, with $a_1(t) = \log\{1 - 0.8\sin(t/1.2)\}$ and $(b_0, b_1, b_2, b_3) = (0.02, 0.01, 0, 0)$. We set $a_2(t) = \gamma$ to be a constant coefficient equal to 0.5. The censoring time was generated as $C = \min(5, C^*)$, where $C^*$ was uniform on $(3, 6)$. Because NCC studies are commonly implemented when disease incidence is low, our simulated incidence rates were all about 10–13%. We simulated NCC data from full cohorts of sizes 1000 and 2000 and selected 2 controls for each case. We used the Epanechnikov kernel and ran 1000 simulations for each setting. The proposed simultaneous confidence band and testing procedure used 5000 resampling runs.
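Full-cohort data from a hazard with time-varying coefficients can be generated by inverting the cumulative hazard: draw $E \sim \mathrm{Exp}(1)$ and set $T^* = \Lambda^{-1}(E)$. The Python/NumPy sketch below does this numerically on a fine grid, with Bernoulli(0.5) covariates and the censoring scheme $C = \min(5, C^*)$, $C^* \sim U(3, 6)$, from the design above. The grid inversion and the function name are our own illustrative choices, not necessarily how the original simulations were coded.

```python
import numpy as np

def simulate_cohort(n, baseline, coef, t_max=5.0, step=1e-3, rng=None):
    """Simulate (T, delta, Z) from hazard baseline(t) * exp{coef(t)^T Z}.

    baseline: callable, grid (G,) -> lambda_0 on the grid, shape (G,).
    coef:     callable, grid (G,) -> a(t) on the grid, shape (G, p).
    Covariates are Bernoulli(0.5); censoring is C = min(5, C*), C* ~ U(3, 6).
    """
    rng = np.random.default_rng() if rng is None else rng
    grid = np.arange(step, t_max + step / 2, step)         # right endpoints
    a = coef(grid)                                         # (G, p)
    Z = rng.binomial(1, 0.5, size=(n, a.shape[1])).astype(float)
    haz = baseline(grid)[None, :] * np.exp(Z @ a.T)       # (n, G)
    cum_haz = np.cumsum(haz * step, axis=1)                # Lambda_i on the grid
    E = rng.exponential(size=n)                            # T* = Lambda_i^{-1}(E)
    idx = (cum_haz < E[:, None]).sum(axis=1)               # first index with Lambda >= E
    T_star = np.where(idx < grid.size, grid[np.minimum(idx, grid.size - 1)], np.inf)
    C = np.minimum(5.0, rng.uniform(3.0, 6.0, size=n))
    T = np.minimum(T_star, C)
    delta = (T_star <= C).astype(int)
    return T, delta, Z
```

For the polynomial setting, one could pass `coef = lambda g: np.column_stack([1 - 0.25 * (g - 2) ** 2, np.full_like(g, 0.5)])` and `baseline = lambda g: 0.005 * g + 0.002 * g * np.exp(-0.25 * (g - 2) ** 2)`, assuming the functional forms written above, and then draw 2 controls per case from each risk set.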

Table 1 presents the results for the simultaneous confidence band for the time-varying coefficient $a_1(t)$. The bandwidth $h$ varied from 0.6 to 1.4 in increments of 0.2, and the simultaneous band was constructed over $[1, 4]$. The performance of the simultaneous confidence band depended on $h$: a small $h$ led to small bias but large variance, and thus higher coverage probabilities, whereas a large $h$ had the opposite effect. When the bias–variance balance was reached, for example, at $h = 1.2$ for $n = 1000$ and $h = 1$ for $n = 2000$, the simultaneous confidence band yielded satisfactory coverage probability. Overall performance improved with increasing sample size. For example, when $n = 2000$ and $h = 1.0$, the resampling-based simultaneous coverage probabilities matched the nominal level reasonably well, and the empirical quantiles of the observed $S_1$ over 1000 simulated data sets were very close to the resampling-based thresholds. Furthermore, the quadratic polynomial coefficient has constant curvature, $a_1''(t) = -0.5$ for all $t$, whereas the curvature of the log-sinusoid coefficient ranges from about $-0.5$ to 2.8. Comparing these 2 types of time-varying coefficients, the performance of the simultaneous confidence band was less sensitive to the choice of $h$ for the polynomial coefficient than for the log-sinusoid coefficient. This is not surprising: a more variable function is usually harder to estimate, and the bias of local linear fitting depends on the magnitude of $a''(t)$.

Table 1.

Simulation results: simultaneous confidence band for $\hat a_1(t)$ over [1, 4]

Columns 3–6: N = 1000; columns 7–10: N = 2000.

a1(t) | h | S.CP | E.S | R.S | SD(R.S) | S.CP | E.S | R.S | SD(R.S)
Polynomial | 0.6 | 98.4 | 2.524 | 2.826 | 0.146 | 96.4 | 2.689 | 2.810 | 0.089
Polynomial | 0.8 | 97.9 | 2.526 | 2.755 | 0.126 | 95.5 | 2.712 | 2.740 | 0.082
Polynomial | 1.0 | 97.2 | 2.511 | 2.698 | 0.116 | 94.8 | 2.709 | 2.686 | 0.078
Polynomial | 1.2 | 96.0 | 2.529 | 2.661 | 0.113 | 93.8 | 2.724 | 2.653 | 0.078
Polynomial | 1.4 | 95.3 | 2.609 | 2.640 | 0.112 | 92.6 | 2.806 | 2.635 | 0.078
Sinusoid | 0.6 | 97.8 | 2.494 | 3.078 | 9.185 | 96.0 | 2.728 | 2.803 | 0.127
Sinusoid | 0.8 | 97.1 | 2.514 | 2.733 | 0.166 | 95.2 | 2.695 | 2.732 | 0.107
Sinusoid | 1.0 | 96.5 | 2.568 | 2.682 | 0.150 | 94.5 | 2.689 | 2.673 | 0.099
Sinusoid | 1.2 | 95.2 | 2.643 | 2.645 | 0.140 | 92.3 | 2.866 | 2.632 | 0.093
Sinusoid | 1.4 | 92.7 | 2.739 | 2.622 | 0.132 | 87.3 | 3.083 | 2.607 | 0.089

S.CP: simultaneous 95% coverage probability (%).

E.S: empirical 95% quantile of $S_1$ in 1000 simulations.

R.S: average of resampling-based thresholds.

SD(R.S): standard deviation of resampling-based thresholds.

Figures 1 and 2 show the estimated time-varying coefficient curves with sample size of 2000, the average 95% point-wise confidence intervals over 1000 simulations, and 95% confidence envelope constructed using the point-wise 2.5% and 97.5% quantiles of the estimated curves over 1000 simulations. We found that when h was small, the estimated curves were very close to the true curves but the point-wise confidence intervals were wide. As h increased, the estimated curves showed biases at “the valley” but the point-wise confidence intervals were narrower.

Fig. 1.


Estimated coefficient curves with different bandwidth parameters for the polynomial curve. 1: the true underlying coefficient; 2: the average of 1000 estimated curves; 3: the median of 1000 estimated curves; 4: the average of 95% point-wise confidence intervals; 5: confidence envelope of 2.5% and 97.5% quantiles of 1000 estimated curves.

Fig. 2.


Estimated coefficient curves with different bandwidth parameters for the log-sinusoid curve. 1: the true underlying coefficient; 2: the average of 1000 estimated curves; 3: the median of 1000 estimated curves; 4: the average of 95% point-wise confidence intervals; 5: confidence envelope of 2.5% and 97.5% quantiles of 1000 estimated curves.

We summarize the estimation results for the constant coefficient $a_2(t) \equiv \gamma$ in Table 2. We report the bias, the sample standard deviation (SD) of the estimates over 1000 simulations, the average standard error (SE) from the asymptotic approximation, and the 95% coverage probability (CP) of Wald-type confidence intervals. The biases were all small, and the SDs and SEs decreased as the sample size increased. Overall, the SEs and SDs agreed well, and the 95% CPs were close to the nominal level. The performance of the integrated estimator $\hat\gamma$ was stable with respect to $h$. This confirms the theoretical result that the integrated estimator converges at the rate $n^{1/2}$ and that its first-order behavior does not depend on $h$, provided $h$ is within the range given in Section 2.4.

Table 2.

Simulation results: estimation of constant coefficient γ

Columns 3–6: N = 1000; columns 7–10: N = 2000.

a1(t) | h | Bias | SD | SE | CP | Bias | SD | SE | CP
Polynomial | 0.6 | 0.002 | 0.247 | 0.252 | 0.964 | −0.007 | 0.168 | 0.167 | 0.948
Polynomial | 0.8 | −0.002 | 0.244 | 0.248 | 0.967 | −0.006 | 0.167 | 0.167 | 0.949
Polynomial | 1.0 | −0.003 | 0.242 | 0.248 | 0.967 | −0.005 | 0.166 | 0.169 | 0.953
Polynomial | 1.2 | −0.002 | 0.241 | 0.251 | 0.966 | −0.004 | 0.165 | 0.172 | 0.960
Polynomial | 1.4 | −0.002 | 0.240 | 0.255 | 0.971 | −0.003 | 0.164 | 0.175 | 0.965
Sinusoid | 0.6 | 0.005 | 0.262 | 0.276 | 0.969 | 0.001 | 0.183 | 0.180 | 0.951
Sinusoid | 0.8 | −0.002 | 0.257 | 0.270 | 0.969 | −0.001 | 0.179 | 0.180 | 0.955
Sinusoid | 1.0 | −0.004 | 0.253 | 0.270 | 0.969 | −0.002 | 0.177 | 0.182 | 0.959
Sinusoid | 1.2 | −0.005 | 0.250 | 0.272 | 0.973 | −0.002 | 0.177 | 0.185 | 0.959
Sinusoid | 1.4 | −0.005 | 0.249 | 0.276 | 0.976 | −0.002 | 0.177 | 0.188 | 0.963

SD: sample standard deviation of the proposed estimates in 1000 simulations.

SE: average of standard error estimates.

CP: coverage probability of the 95% Wald-type confidence interval.

Finally, we assessed the performance of the proposed procedure for testing time-varying coefficients. Table 3 reports rejection rates at the 5% level for testing $a_1(t)$ and $a_2(t)$ with sample size 2000; for the constant coefficient $a_2(t)$ these are type I error rates, and for the time-varying coefficient $a_1(t)$ they are empirical power. The empirical threshold was estimated by the 95th percentile of the test statistics over 1000 simulation runs, and the average resampling-based threshold was the mean of the 1000 resampling-based thresholds. For the constant coefficient, the proposed testing procedure controlled the type I error rate well; for the time-varying coefficient, it yielded reasonable power. Similar to the observation from Table 1, the power was higher for detecting the log-sinusoid coefficient.

Table 3.

Simulation results: identifying time-varying coefficients with N = 2000

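Columns 3–6: constant coefficient a2(t); columns 7–10: time-varying coefficient a1(t).

a1(t) | h | Rate | E.T | R.T | SD(R.T) | Rate | E.T | R.T | SD(R.T)
Polynomial | 0.6 | 0.065 | 1.220 | 1.111 | 0.160 | 0.412 | 2.053 | 1.123 | 0.160
Polynomial | 0.8 | 0.058 | 1.099 | 1.038 | 0.140 | 0.401 | 1.797 | 1.050 | 0.148
Polynomial | 1.0 | 0.050 | 1.084 | 1.008 | 0.137 | 0.363 | 1.649 | 1.020 | 0.147
Polynomial | 1.2 | 0.042 | 1.043 | 0.996 | 0.136 | 0.314 | 1.605 | 1.008 | 0.146
Polynomial | 1.4 | 0.039 | 1.024 | 0.993 | 0.134 | 0.277 | 1.562 | 1.004 | 0.143
Sinusoid | 0.6 | 0.097 | 1.614 | 1.337 | 0.182 | 0.897 | 4.558 | 1.358 | 0.182
Sinusoid | 0.8 | 0.068 | 1.369 | 1.237 | 0.157 | 0.882 | 3.820 | 1.263 | 0.155
Sinusoid | 1.0 | 0.061 | 1.259 | 1.171 | 0.146 | 0.850 | 3.403 | 1.208 | 0.149
Sinusoid | 1.2 | 0.055 | 1.171 | 1.124 | 0.141 | 0.798 | 2.997 | 1.171 | 0.149
Sinusoid | 1.4 | 0.048 | 1.086 | 1.093 | 0.139 | 0.732 | 2.653 | 1.146 | 0.151

Rate: rejection rate of the test at the 5% level (type I error for the constant coefficient; power for the time-varying coefficient).

E.T: empirical 95% quantile of the test statistics in 1000 simulations.

R.T: average of resampling-based thresholds.

SD(R.T): standard deviation of resampling-based thresholds.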

3.2. Breast cancer study in the NYUWHS

The details of this NCC study have been reported in Shore and others (2008). As an illustration, we estimated the gene–environment interaction effects of gene XPC and smoking, an exposure that can induce DNA damage. Based on the results of Shore and others (2008), we assumed a recessive model for gene XPC and fitted model (1.1) with 4 covariates: Ethnicity (1 for Caucasian; 0 otherwise), XPC-Smoking-10 (1 for XPC-PAT allele +/+ nonsmokers; 0 otherwise), XPC-Smoking-01 (1 for XPC-PAT allele −/− or −/+ smokers; 0 otherwise), and XPC-Smoking-11 (1 for XPC-PAT allele +/+ smokers; 0 otherwise). Because there were very few incident cases before age 45, we restricted the analysis to NCC data with age at diagnosis between 45 and 75.

We fitted the kernel-weighted partial likelihood approach using the Epanechnikov kernel function with bandwidth $h = 10$. The estimated coefficient curves, the point-wise confidence intervals, and the simultaneous confidence bands are presented in the top panel of Figure 3. The Caucasian group had lower risk at early ages than the non-Caucasian group, but the risk then increased and peaked around ages 60–65. Both the 90% point-wise confidence interval and the simultaneous confidence band excluded zero. There was no significantly increased risk in the XPC-PAT +/+ nonsmokers group (XPC-Smoking-10) compared with the reference group of XPC wild-type nonsmokers, as the point-wise and simultaneous confidence intervals all included zero. In the XPC-Smoking-01 group, the risk seemed to be elevated at early ages and then diminished, but the overall effect was not significant according to the simultaneous confidence band. Lastly, the risk of breast cancer in the group of XPC-PAT +/+ smokers was uniformly elevated across all ages, and the effect was borderline significant, which agrees with the findings of Shore and others (2008).

Fig. 3.


Analysis results of the breast cancer NCC study. The solid curves are the estimated coefficients; the dashed lines are 90% point-wise confidence intervals; the dotted lines are 90% simultaneous confidence bands.

We next applied the proposed resampling testing procedure to examine whether each covariate effect can be adequately described by a constant. We plot $T_j(t)$ and 10 resampled realizations in the lower panel of Figure 3; the p-values based on 5000 resampling runs are also shown in the plots. The constant-coefficient assumption for the Ethnicity variable was rejected at the 0.05 level (p-value = 0.0497), but it appeared reasonable for all other covariates. We therefore estimated the constant coefficients using the proposed integrated estimator (2.6). Compared with XPC wild-type nonsmokers, there was no significantly increased risk in the XPC-Smoking-10 group (OR = 1.01, 95% CI: 0.57–1.76, p = 0.99) or in the XPC-Smoking-01 group (OR = 0.94, 95% CI: 0.68–1.30, p = 0.70), but there was a significant increase in the XPC-Smoking-11 group (OR = 1.93, 95% CI: 1.15–3.25, p = 0.01). Again, these results agree with those of Shore and others (2008).
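As an arithmetic cross-check of the reported summaries, one can recover $\hat\gamma$, its standard error, and a Wald p-value from an odds ratio and its confidence interval. The short Python sketch below assumes, and the text does not state this explicitly, that each OR is $\exp(\hat\gamma)$ and each interval is $\exp\{\hat\gamma \pm 1.96\,\mathrm{SE}\}$; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def wald_from_or_ci(or_hat, lower, upper, level=0.95):
    """Recover log(OR), its SE, and a two-sided Wald p-value from an OR and CI.

    Assumes the CI was formed as exp{log(OR) +/- z * SE}.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_or = math.log(or_hat)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    p = 2 * (1 - NormalDist().cdf(abs(log_or) / se))
    return log_or, se, p

# XPC-Smoking-11 group as reported above: OR = 1.93, 95% CI 1.15-3.25
log_or, se, p = wald_from_or_ci(1.93, 1.15, 3.25)
```

For the XPC-Smoking-11 group this gives $\hat\gamma \approx 0.658$, $\mathrm{SE} \approx 0.265$, and $p \approx 0.013$, consistent with the reported p = 0.01 after rounding.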

4. DISCUSSION

As a way of enhancing modeling flexibility and providing an alternative and diagnostic tool for Cox's proportional hazards model, we developed inference procedures for the Cox regression model with time-varying coefficients in NCC studies. The NCC design is a member of a general class of cohort risk-set sampling designs that include counter-matching design (Langholz and Borgan, 1995), quota-matching design, and many others. Borgan and others (1995) studied Cox's proportional hazards model in this general framework using the marked point process to characterize the failure time process and the sampling scheme simultaneously. As a referee pointed out, the proposed inference procedure for NCC studies may be generalized to accommodate other types of cohort risk-set sampling designs by a formulation based on marked point process theory (Brémaud, 1981).

In cohort studies, the data are often subject not only to right censoring but also to left truncation. For example, left truncation occurs when only subjects who are disease-free at entry are enrolled, so that each subject is at risk only from entry onward. To incorporate left truncation, controls need to be drawn from an adjusted risk set, $\{i : X_i < t \le T_i\}$, where $X_i$ denotes the left-truncation time for subject $i$. The extension to left-truncated and right-censored NCC data is of interest and may be handled in a unified manner using marked point process and martingale theory (Brémaud, 1981). We shall investigate this extension elsewhere.

Our analysis of the breast cancer data in the NYUWHS has confirmed the interaction between gene XPC in the NER pathway and smoking, a DNA-damaging agent (Shore and others, 2008). In addition, the finding that the Caucasian group has lower risk at younger ages and higher risk at older ages than the non-Caucasian group is consistent with the well-described black-to-white ethnic crossover in breast cancer incidence rates (Gray and others, 1980; Joslyn and others, 2005; Anderson and others, 2008) and highlights racial disparities in disease risk. In practice, it is also common to add covariate-by-time interaction terms to Cox's proportional hazards model to examine possible time-varying coefficients. We also considered this approach by adding an ethnicity-by-time interaction term and fitting the model with Thomas' partial likelihood estimation approach. However, the interaction term did not reach statistical significance (p-value = 0.127). One possible reason is that, as shown in Figure 3, the ethnic effect is non-monotone rather than linear or monotone, and a simple covariate–time interaction term may not capture such a trend. In summary, the Cox model with time-varying coefficients can elucidate the effects of risk factors on disease and provides an intuitive tool for visualizing such effects.

SUPPLEMENTARY MATERIAL

Supplementary material is available at http://biostatistics.oxfordjournals.org.

FUNDING

National Cancer Institute (CA098661, CA091892, CA16087, CA140632); Department of Defense (DAMD17-01-1-0578); Susan G. Komen Breast Cancer Research Foundation (BCTR 2000 685).

ACKNOWLEDGMENTS

The authors would like to thank the co-editor, an associate editor, and a referee for their valuable suggestions. Conflict of Interest: None declared.

REFERENCES

  1. Anderson WF, Rosenberg PS, Menashe I, Mitani A, Pfeiffer RM. Age-related crossover in breast cancer incidence rates between black and white ethnic groups. Journal of the National Cancer Institute. 2008;100:1804–1814. doi: 10.1093/jnci/djn411.
  2. Borgan Ø, Goldstein L, Langholz B. Methods for the analysis of sampled cohort data in the Cox proportional hazards model. Annals of Statistics. 1995;23:1749–1778.
  3. Brémaud P. Point Processes and Queues: Martingale Dynamics. New York: Springer; 1981.
  4. Cai ZW, Sun YQ. Local linear estimation for time-dependent coefficients in Cox's regression models. Scandinavian Journal of Statistics. 2003;30:93–111.
  5. Cox DR. Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B. 1972;34:187–220.
  6. Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. London: Chapman and Hall; 1996.
  7. Goldstein L, Langholz B. Asymptotic theory for nested case-control sampling in the Cox regression model. Annals of Statistics. 1992;20:1903–1928.
  8. Gray GE, Henderson BE, Pike MC. Changing ratio of breast cancer incidence rates with age of black females compared with white females in the United States. Journal of the National Cancer Institute. 1980;64:461–463.
  9. Härdle W, Marron JS. Bootstrap simultaneous error bars for nonparametric regression. Annals of Statistics. 1991;19:778–796.
  10. Hastie T, Tibshirani R. Varying-coefficient models. Journal of the Royal Statistical Society, Series B. 1993;55:757–796.
  11. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer; 2008.
  12. Joslyn SA, Foote ML, Nasseri K, Coughlin SS, Howe HL. Racial and ethnic disparities in breast cancer rates by age: NAACCR breast cancer project. Breast Cancer Research and Treatment. 2005;92:97–105. doi: 10.1007/s10549-005-2112-y.
  13. Langholz B, Borgan Ø. Counter-matching: a stratified nested case-control sampling method. Biometrika. 1995;82:69–79.
  14. Lin DY, Fleming TR, Wei LJ. Confidence bands for survival curves under the proportional hazards model. Biometrika. 1994;81:73–81.
  15. Martinussen T, Scheike TH, Skovgaard IM. Efficient estimation of fixed and time-varying covariate effects in multiplicative intensity models. Scandinavian Journal of Statistics. 2002;29:57–74.
  16. Murphy SA, Sen PK. Time-dependent coefficients in a Cox-type regression model. Stochastic Processes and Their Applications. 1991;39:153–180.
  17. Oakes D. Survival times: aspects of partial likelihood. International Statistical Review. 1981;49:235–252.
  18. Shore RE, Zeleniuch-Jacquotte A, Currie D, Mohrenweiser H, Afanasyeva Y, Koenig KL, Arslan AA, Toniolo P, Wirgin I. Polymorphisms in XPC and ERCC2 genes, smoking and breast cancer risk. International Journal of Cancer. 2008;122:2101–2105. doi: 10.1002/ijc.23361.
  19. Thomas DC. Addendum to “Methods of cohort analysis: appraisal by application to asbestos mining” by Liddell FDK, McDonald JC, and Thomas DC. Journal of the Royal Statistical Society, Series A. 1977;140:469–491.
  20. Tian L, Zucker D, Wei LJ. On the Cox model with time-varying regression coefficients. Journal of the American Statistical Association. 2005;100:172–183.
  21. Tibshirani R, Hastie T. Local likelihood estimation. Journal of the American Statistical Association. 1987;82:559–568.
  22. Zucker DM, Karr AF. Nonparametric survival analysis with time-dependent covariate effects: a penalized partial likelihood approach. Annals of Statistics. 1990;18:329–353.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
