Summary
Attributable fractions are commonly used to measure the impact of risk factors on disease incidence in the population. These static measures can be extended to functions of time when the time to disease occurrence or event time is of interest. The present paper deals with nonparametric and semiparametric estimation of attributable fraction functions for cohort studies with potentially censored event time data. The semiparametric models include the familiar proportional hazards model and a broad class of transformation models. The proposed estimators are shown to be consistent, asymptotically normal and asymptotically efficient. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. An application to a cardiovascular health study is provided. Connections to causal inference are discussed.
Some key words: Adjusted attributable fraction, Attributable risk, Cohort study, Population attributable fraction, Proportional hazards model, Transformation model
1. Introduction
An important task in public health research is to evaluate the impact of risk factors on the occurrence of disease in the population. The population attributable fraction is commonly used for this purpose. First proposed by Levin (1953), the population attributable fraction is defined as ‘the reduction in incidence that would be achieved if the population had been entirely unexposed, compared with its current (actual) exposure pattern’ (Rothman & Greenland, 1998). Unlike relative risk, the population attributable fraction takes into account the prevalence of risk factors in the population and thus quantifies the population impact of risk factors. A related concept is the adjusted attributable fraction, which is the reduction in incidence if a subset of risk factors is eliminated from the population while the other risk factors retain their actual levels. These measures have received considerable attention in recent years (e.g. Benichou, 2001; Greenland, 2001; Silverberg et al., 2004; Graubard & Fears, 2005).
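Let D be a binary disease status and Z be a binary exposure indicator. The population attributable fraction is defined as (Levin, 1953)
\[ \frac{\mathrm{pr}(D = 1) - \mathrm{pr}(D = 1 \mid Z = 0)}{\mathrm{pr}(D = 1)}. \]
In the presence of confounding by other risk factors, say W, it is more appropriate to use the adjusted attributable fraction
\[ \frac{\mathrm{pr}(D = 1) - \sum_{j=1}^{m} \mathrm{pr}(D = 1 \mid Z = 0, W = w_j)\,\mathrm{pr}(W = w_j)}{\mathrm{pr}(D = 1)}, \]
where w1, . . . , wm are the m levels of W (Whittemore, 1982; Bruzzi et al., 1985).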
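As a small numerical illustration of these two static measures, the following Python sketch evaluates them from given probabilities. It assumes the usual reading of the definitions above, namely {pr(D = 1) − pr(D = 1 | Z = 0)}/pr(D = 1) for the population attributable fraction and its W-standardized analogue for the adjusted version. The function names and all numbers are hypothetical and are not taken from any study.

```python
def population_attributable_fraction(p_disease, p_disease_unexposed):
    """{pr(D=1) - pr(D=1 | Z=0)} / pr(D=1)."""
    return (p_disease - p_disease_unexposed) / p_disease


def adjusted_attributable_fraction(p_disease, p_unexposed_by_level, p_level):
    """Replace pr(D=1 | Z=0) by sum_j pr(D=1 | Z=0, W=w_j) pr(W=w_j)."""
    standardized = sum(p * q for p, q in zip(p_unexposed_by_level, p_level))
    return (p_disease - standardized) / p_disease


# Hypothetical inputs: 40% of the population exposed, risks 0.10 and 0.25.
prevalence = 0.4
risk_unexposed, risk_exposed = 0.10, 0.25
p_disease = prevalence * risk_exposed + (1 - prevalence) * risk_unexposed
paf = population_attributable_fraction(p_disease, risk_unexposed)

# If the unexposed risk does not vary with W, adjustment changes nothing.
adj = adjusted_attributable_fraction(p_disease, [0.10, 0.10], [0.3, 0.7])
```

With these inputs pr(D = 1) = 0.16, so both measures are close to 0.06/0.16 = 0.375; the two differ only when the unexposed risk varies across the levels of W.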
The aforementioned measures are defined for binary outcomes with time-independent risk factors. Such measures are inadequate for cohort studies that record potentially censored event times and possibly time-dependent risk factors. Chen et al. (2006) extended the population attributable fraction to a population attributable fraction function for event time data by replacing the disease incidence rate with the cumulative distribution function of the event time. They approximated the population attributable fraction function with the so-called attributable hazard function and proposed an estimator for the latter under the proportional hazards model. The two functions can be quite different when the disease is not rare, and the work requires that the censoring time is independent of both the event time and the risk factors.
In the present paper, we study nonparametric and semiparametric estimation of the population attributable fraction function, allowing censoring to depend on the risk factors. The semiparametric estimators are very general in that the model can be proportional hazards or nonproportional hazards and the risk factors can be discrete or continuous and possibly time-dependent. We also extend the adjusted attributable fraction to event time data and develop semiparametric estimators.
2. Inference procedures
2.1. The population attributable fraction function
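The population attributable fraction function is defined as
\[ A(t) = \frac{\mathrm{pr}(T \le t) - \mathrm{pr}(T \le t \mid X = 0)}{\mathrm{pr}(T \le t)}, \]
where T denotes the time to disease or event time, and X denotes a p-vector of risk factors (Chen et al., 2006). It is convenient to express A(t) in terms of survival functions,
\[ A(t) = \frac{S_0(t) - S(t)}{1 - S(t)}, \]
where S(t) = pr(T > t) and S0(t) = pr(T > t | X = 0). If S0(t) and S(t) are estimated by Ŝ0(t) and Ŝ(t), respectively, then A(t) is naturally estimated by
\[ \hat{A}(t) = \frac{\hat{S}_0(t) - \hat{S}(t)}{1 - \hat{S}(t)}. \]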
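Below, we describe various nonparametric and semiparametric estimators for S0(·) and S(·). In Appendix A, we show that n^{1/2}{Â(t) – A(t)} converges weakly to a zero-mean Gaussian process with covariance function E{ξ(t)ξ^T(s)} between time-points t and s, where
\[ \xi(t) = \frac{\eta_1(t)}{1 - S(t)} - \frac{\{1 - S_0(t)\}\,\eta_2(t)}{\{1 - S(t)\}^2}, \tag{1} \]
and η1(t) and η2(t) depend on the estimation methods for S0(t) and S(t), respectively. The covariance function can be consistently estimated by n^{-1} ∑_{i=1}^{n} ξ̂i(t) ξ̂i^T(s), where
\[ \hat{\xi}_i(t) = \frac{\hat{\eta}_{1i}(t)}{1 - \hat{S}(t)} - \frac{\{1 - \hat{S}_0(t)\}\,\hat{\eta}_{2i}(t)}{\{1 - \hat{S}(t)\}^2}, \tag{2} \]
and η̂1i (t) and η̂2i (t) are the sample versions of η1(t) and η2(t), respectively, for the i th subject.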
The above results enable one to construct confidence intervals for A(t). We recommend using the log-transformation log{1 – A(t)}, which not only ensures that the resulting intervals lie in the range (–∞, 1), but also improves the coverage probabilities in small samples.
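The log-transformed interval can be sketched as follows. This is a minimal illustration that assumes a point estimate Â(t) < 1 and a standard error for Â(t) are already available; the delta method then gives se/{1 – Â(t)} as the standard error of log{1 – Â(t)}. The function name and the inputs are hypothetical.

```python
import math


def log_transformed_ci(a_hat, se, z=1.959964):
    """Wald interval for A(t) constructed on the log{1 - A(t)} scale.

    a_hat : point estimate of A(t), assumed to be below 1
    se    : standard error of a_hat on the original scale
    z     : standard normal quantile (default gives a 95% interval)
    """
    one_minus = 1.0 - a_hat
    se_log = se / one_minus          # delta method for log(1 - a_hat)
    lower = 1.0 - one_minus * math.exp(z * se_log)
    upper = 1.0 - one_minus * math.exp(-z * se_log)
    return lower, upper


# Hypothetical estimate close to the boundary: the untransformed interval
# 0.9 +/- 1.96 * 0.2 would exceed 1, the transformed one cannot.
low, high = log_transformed_ci(0.9, 0.2)
```

Because 1 – Â(t) is multiplied by a positive factor, the upper limit always stays below 1, while the lower limit is unbounded below.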
It is useful to adopt counting process notation. Let N (t) = I {T ⩽ min(C, t)}, and Y (t) = I{min(T, C) ⩾ t}, where C denotes the censoring time, and I(·) is the indicator function. The data consist of n independent replicates {Ni (t), Yi (t), Xi (t); t ∈ [0, τ]}, where τ is the endpoint of the study. We consider the situation that C is independent of T and X as well as the situation that C is independent of T conditional on X, which are referred to as independent censoring and covariate-dependent censoring, respectively.
When X includes only time-independent categorical covariates, we can estimate S0(·) by the Kaplan–Meier estimator using the data from the subjects with the baseline covariate values, i.e. X = 0. Then η1(t) in (1) is given by , where and Λ0(·) = – log{S0(·)}. The Kaplan–Meier estimator of S0(·) can be unstable and inefficient if the number of subjects with X = 0 is small.
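For a single binary covariate, the Kaplan–Meier-based estimator can be sketched in a few lines of plain Python. The sketch assumes the plug-in form Â(t) = {Ŝ0(t) – Ŝ(t)}/{1 – Ŝ(t)}, uses the Kaplan–Meier estimator for both survival functions, and therefore presumes independent censoring. The function names and the toy data are hypothetical, and the estimate is undefined when no event has occurred by time t.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of pr(T > t); events[i] is 1 for an observed event."""
    surv = 1.0
    event_times = sorted({x for x, d in zip(times, events) if d and x <= t})
    for u in event_times:
        at_risk = sum(1 for x in times if x >= u)
        failures = sum(1 for x, d in zip(times, events) if d and x == u)
        surv *= 1.0 - failures / at_risk
    return surv


def paf_km(times, events, exposed, t):
    """Plug-in estimate of A(t) with Kaplan-Meier estimates of S0 and S."""
    s_all = kaplan_meier(times, events, t)
    base = [(x, d) for x, d, z in zip(times, events, exposed) if z == 0]
    s_zero = kaplan_meier([x for x, _ in base], [d for _, d in base], t)
    return (s_zero - s_all) / (1.0 - s_all)


# Toy data: exposed subjects fail at times 1, 2, 3; unexposed at 4, 5, 6.
toy_times = [1, 2, 3, 4, 5, 6]
toy_events = [1, 1, 1, 1, 1, 1]
toy_exposed = [1, 1, 1, 0, 0, 0]
a_hat = paf_km(toy_times, toy_events, toy_exposed, t=3)
```

In the toy data every event by t = 3 occurs among the exposed, so the estimate is essentially 1; with few unexposed subjects the baseline estimate, and hence Â(t), becomes unstable, as noted above.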
When X contains continuous or time-dependent covariates, we estimate S0(·) under a semiparametric regression model. The familiar proportional hazards model (Cox, 1972) specifies that the cumulative hazard function of T conditional on X takes the form
\[ \Lambda(t \mid X) = \int_0^t \exp\{\beta^{T} X(s)\}\, d\Lambda_0(s), \]
where β is a p-vector of unknown regression parameters and Λ0(·) is an arbitrary cumulative baseline hazard function. The covariates are assumed to be external in the sense of Kalbfleisch & Prentice (2002, p. 197). We estimate S0(t) by exp{–Λ̂0(t)}, where Λ̂0(t) is the Breslow (1972) estimator of Λ0(t). Then
(3) |
where , , e(t, β) = E[Y (t) exp{βTX(t)}X(t)]/E[Y (t) exp{βTX(t)}] and ℐ(β) is the information matrix for β.
Because the proportional hazards assumption may be inappropriate in certain applications, we explore the following class of transformation models:
\[ \Lambda(t \mid X) = G\left[\int_0^t \exp\{\beta^{T} X(s)\}\, d\Lambda_0(s)\right], \tag{4} \]
where G is a strictly increasing function, β is a vector of unknown regression parameters and Λ0(·) is an arbitrary increasing function (Zeng & Lin, 2006). If X is time-independent, then (4) reduces to the class of linear transformation models
\[ H(T) = -\beta^{T} X + \varepsilon, \]
where H is an arbitrary increasing function and ε is a random error with a parametric distribution (Dabrowska & Doksum, 1988; Kalbfleisch & Prentice, 2002, p. 241). We consider the class of Box–Cox transformations G(x) = {(1 + x)^ρ – 1}/ρ (ρ ⩾ 0) with ρ = 0 corresponding to G(x) = log(1 + x) and the class of logarithmic transformations G(x) = log(1 + rx)/r (r ⩾ 0) with r = 0 corresponding to G(x) = x. The choice of ρ = 1 or r = 0 yields the proportional hazards model while the choice of ρ = 0 or r = 1 yields the proportional odds model. Figure 1 displays the patterns of the population attributable fraction functions under three transformation models. Not surprisingly, the population attributable fraction increases as the effect of the exposure becomes larger and as the exposure becomes more prevalent.
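The two transformation classes and their special cases can be checked numerically. The sketch below is illustrative only: it assumes a single time-independent binary exposure, for which the survival function under the transformation model is taken to be exp[–G{Λ0(t) exp(βx)}], and it borrows the cumulative baseline function 0.1t from the simulation set-up of § 3. The function names are hypothetical.

```python
import math


def box_cox(x, rho):
    """G(x) = {(1 + x)^rho - 1}/rho, with rho = 0 read as log(1 + x)."""
    if rho == 0:
        return math.log1p(x)
    return ((1.0 + x) ** rho - 1.0) / rho


def logarithmic(x, r):
    """G(x) = log(1 + r x)/r, with r = 0 read as the identity."""
    if r == 0:
        return x
    return math.log1p(r * x) / r


def paf_binary_exposure(t, beta, prevalence, G):
    """A(t) for a Bernoulli exposure under an assumed baseline of 0.1 t."""
    s_zero = math.exp(-G(0.1 * t))
    s_one = math.exp(-G(0.1 * t * math.exp(beta)))
    s_marginal = prevalence * s_one + (1.0 - prevalence) * s_zero
    return (s_zero - s_marginal) / (1.0 - s_marginal)


def proportional_odds(x):
    return logarithmic(x, 1.0)


larger_effect = paf_binary_exposure(5.0, 1.0, 0.4, proportional_odds)
smaller_effect = paf_binary_exposure(5.0, 0.5, 0.4, proportional_odds)
```

Here `larger_effect` exceeds `smaller_effect`, and raising the prevalence with β fixed also raises A(t), in line with the qualitative pattern described for Figure 1.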
Under the transformation models in (4), S0(t) can be estimated by exp[–G{Λ̂0(t)}], where Λ̂0(t) is the nonparametric maximum likelihood estimator of Λ0(t) (Zeng & Lin, 2006). Then
(5) |
where (𝒮β, 𝒮Λ0) is the score operator for β and Λ0, ℐβ,Λ0 is the information operator for β and Λ0 and h(v; t) = I (v ⩽ t). Here and in the sequel, g′(x) = dg(x)/dx. In the special case of the proportional hazards model, (5) reduces to (3). We obtain the η̂1i (t) in (2) as follows. Let t1 < · · · < tk be the distinct observed event times. We treat β and the jump sizes of Λ0 (·) at (t1, . . . , tk) as parameters. We calculate the score vector of those parameters for the i th subject, denoted by Ui, and the observed information matrix for those parameters, denoted by ℐn. Then η̂1i is given by , where ĥ(t) is the vector of indicators I (tj ⩽ t) (j = 1, . . . , k).
Under the independent censoring assumption, it is natural and simple to estimate S(·) by the Kaplan–Meier method. Then η2(t) in (1) is given by , where , and Λ(·) = –log{S(·)}. When the independent censoring assumption is violated, the Kaplan–Meier estimator for S(·) is no longer consistent.
Under the covariate-dependent censoring condition, we estimate S(t) by Ŝ(t) = n^{-1} ∑_{i=1}^{n} Ŝ(t | X_i), where Ŝ (t | x) is a nonparametric or semiparametric estimator of S(t | x) = pr(T > t | X = x). When X only consists of time-independent categorical covariates, S(·| x) can be estimated by the Kaplan–Meier estimator among the subjects with X = x and the corresponding Ŝ(·) is a weighted Kaplan–Meier estimator (Murray & Tsiatis, 1996). Then η2 (t) = S(t | X) – S(t) + ψ (t; X), where
For the general type of X, we adopt the class of transformation models given in (4) and estimate S(t) by Ŝ(t) = n^{-1} ∑_{i=1}^{n} exp(–G[∫_0^t exp{β̂^T X_i(s)} dΛ̂0(s)]), where β̂ and Λ̂0(·) are the nonparametric maximum likelihood estimators of β and Λ0(·). Then
(6) |
where and .
We obtain various estimators of A(·) by combining specific estimators of S0(·) and S(·). If S0(·) and S(·) are estimated by the Kaplan–Meier and weighted Kaplan–Meier estimators, respectively, then the resulting estimator of A(·) is referred to as the Kaplan–Meier × weighted Kaplan–Meier estimator; if S0(·) and S(·) are both estimated under a transformation model, then the resulting estimator of A(·) is referred to as the transformation model × transformation model estimator; other combinations are named in the same way. In Appendix B, we show that the Kaplan–Meier × weighted Kaplan–Meier estimator is asymptotically efficient for the model space satisfying the independent or covariate-dependent censoring assumption and the transformation model × transformation model estimator is asymptotically efficient under model (4).
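The Kaplan–Meier × weighted Kaplan–Meier combination can be sketched for a categorical covariate as follows. The sketch assumes that the weighted Kaplan–Meier estimator is the average of the stratum-specific Kaplan–Meier estimates weighted by the empirical stratum proportions, and that A(t) is estimated by plugging into {S0(t) – S(t)}/{1 – S(t)}. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of pr(T > t); events[i] is 1 for an observed event."""
    surv = 1.0
    for u in sorted({x for x, d in zip(times, events) if d and x <= t}):
        at_risk = sum(1 for x in times if x >= u)
        failures = sum(1 for x, d in zip(times, events) if d and x == u)
        surv *= 1.0 - failures / at_risk
    return surv


def stratum_estimates(times, events, strata, t):
    """Map each covariate value to (stratum size, Kaplan-Meier estimate at t)."""
    groups = defaultdict(list)
    for x, d, g in zip(times, events, strata):
        groups[g].append((x, d))
    return {
        g: (len(m), kaplan_meier([x for x, _ in m], [d for _, d in m], t))
        for g, m in groups.items()
    }


def paf_km_weighted_km(times, events, strata, t, baseline=0):
    """Kaplan-Meier for S0, weighted Kaplan-Meier for S, plugged into A(t)."""
    est = stratum_estimates(times, events, strata, t)
    n = len(times)
    s_marginal = sum(size / n * km for size, km in est.values())
    s_zero = est[baseline][1]
    return (s_zero - s_marginal) / (1.0 - s_marginal)


# Toy data with one censored observation in each stratum (event = 0).
toy_times = [4, 8, 9, 10, 1, 2, 6, 7]
toy_events = [1, 1, 0, 1, 1, 1, 1, 0]
toy_strata = [0, 0, 0, 0, 1, 1, 1, 1]
a_hat = paf_km_weighted_km(toy_times, toy_events, toy_strata, t=5)
```

For the toy data the stratum-specific estimates at t = 5 are 0.75 and 0.5, the weighted Kaplan–Meier estimate is 0.625, and Â(5) is close to 1/3. Because each stratum is handled separately, censoring may differ across strata without biasing the marginal estimate.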
2.2. Adjusted attributable fraction function
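In some applications, the attributable fraction of a particular subset of risk factors is of interest. Suppose that the entire set of risk factors X is decomposed into subsets Z and W, where Z denotes the risk factors of main interest and W denotes the remaining risk factors. For notational simplicity, we assume that the dimensions of Z and W are both 1. We define the adjusted attributable fraction function of Z in the presence of W as
\[ A_{\mathrm{adj}}(t) = \frac{\mathrm{pr}(T \le t) - E_W\{\mathrm{pr}(T \le t \mid Z = 0, W)\}}{\mathrm{pr}(T \le t)}, \]
which can also be expressed as
\[ A_{\mathrm{adj}}(t) = \frac{\int S(t \mid 0, w)\, dF_W(w) - S(t)}{1 - S(t)}, \]
where S(t | z, w) = pr(T > t | Z = z, W = w) and FW (·) is the marginal distribution of W. Under model (4), S(t | x) can be estimated by Ŝ(t | x) = exp(–G[∫_0^t exp{β̂^T x(s)} dΛ̂0(s)]). Then Aadj(t) can be estimated by
\[ \hat{A}_{\mathrm{adj}}(t) = \frac{n^{-1} \sum_{i=1}^{n} \hat{S}(t \mid 0, W_i) - \hat{S}(t)}{1 - \hat{S}(t)}, \]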
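where Wi is the observation of W on the i th subject. In Appendix A, we show that n^{1/2}{Âadj(t) – Aadj(t)} converges weakly to a zero-mean Gaussian process with covariance function E{ξ(t) ξ^T(s)} at (t, s), where
\[ \xi(t) = \frac{\eta_1(t)}{1 - S(t)} - \frac{\{1 - \int S(t \mid 0, w)\, dF_W(w)\}\,\eta_2(t)}{\{1 - S(t)\}^2}, \tag{7} \]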
η1(t) is expression (6) with X = (0, W)T, and η2(t) is determined by the estimation method for S(·). Under independent censoring, S(·) can simply be estimated by the Kaplan–Meier method and η2(t) is equal to . In the case of covariate-dependent censoring, S(t) can be estimated by under model (4), and η2(t) is given in (6). The variance estimator for Âadj(·) and the confidence intervals for Aadj(·) can be obtained in the same manner as in the case of A(·).
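The standardization behind the adjusted estimator can be illustrated with a short sketch. It assumes scalar, time-independent Z and W, a model-based conditional survival function of the form exp[–G{Λ0(t) exp(βz z + βw w)}], and a model-based estimate of the marginal survival function obtained by averaging over the observed covariates. The coefficients, the cumulative baseline function and the data below are hypothetical placeholders rather than fitted values.

```python
import math


def model_survival(t, z, w, beta_z, beta_w, cum_baseline, G):
    """Assumed conditional survival function under a transformation model."""
    return math.exp(-G(cum_baseline(t) * math.exp(beta_z * z + beta_w * w)))


def adjusted_af(t, z_obs, w_obs, beta_z, beta_w, cum_baseline, G):
    """Average S(t | 0, W_i) over the sample and compare with the marginal S(t)."""
    n = len(z_obs)
    s_marginal = sum(
        model_survival(t, z, w, beta_z, beta_w, cum_baseline, G)
        for z, w in zip(z_obs, w_obs)
    ) / n
    s_unexposed = sum(
        model_survival(t, 0, w, beta_z, beta_w, cum_baseline, G) for w in w_obs
    ) / n
    return (s_unexposed - s_marginal) / (1.0 - s_marginal)


# Hypothetical inputs: proportional odds link and a linear baseline.
z_obs = [0, 1, 1, 0, 1]
w_obs = [-0.5, 0.2, 1.0, 0.3, 0.8]
a_adj = adjusted_af(
    t=5.0, z_obs=z_obs, w_obs=w_obs, beta_z=1.0, beta_w=0.5,
    cum_baseline=lambda t: 0.1 * t, G=math.log1p,
)
```

Setting `beta_z` to zero makes the two averages coincide, so the adjusted attributable fraction is zero whatever the effect of W; W keeps its observed values in both averages, which is what distinguishes this quantity from the unadjusted A(t).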
2.3. Causal interpretation
Let T be the observed event time, and let T (z) be the potential event time if the exposure Z has value z. We define the function
\[ A(t) = \frac{\mathrm{pr}(T \le t) - \mathrm{pr}\{T(0) \le t\}}{\mathrm{pr}(T \le t)}, \]
which is the proportionate reduction of the incidence by t if the entire population were unexposed, i.e. Z = 0. To connect pr{T (0) ⩽ t} to the observed outcome T, we make the following two assumptions.
Assumption 1. No unobserved confounders: conditional on all observed confounders W, Z is independent of {T (z)}.
Assumption 2. Stable unit treatment value assumption: T = ∑z T (z)I (Z = z).
These assumptions are standard in causal inference (e.g. Rubin, 1978; Pearl, 2000, pp. 98–103). Under Assumption 1, pr{T (0) ⩽ t} = EW [pr{T (0) ⩽ t | W }] = EW [pr{T (0) ⩽ t | Z = 0, W}]. It follows from Assumption 2 that pr{T (0) ⩽ t} = EW {pr(T ⩽ t | Z = 0, W)}. Thus, A(t) becomes the adjusted attributable fraction function of § 2.2:
\[ A(t) = \frac{\mathrm{pr}(T \le t) - E_W\{\mathrm{pr}(T \le t \mid Z = 0, W)\}}{\mathrm{pr}(T \le t)}. \]
If the marginal distribution of W is the same as the conditional distribution of W under Z = 0 or, more specifically, Z is independent of W, then EW {pr(T ⩽ t | Z = 0, W)} = pr(T ⩽ t | Z = 0). Consequently,
\[ A(t) = \frac{\mathrm{pr}(T \le t) - \mathrm{pr}(T \le t \mid Z = 0)}{\mathrm{pr}(T \le t)}, \]
which is the population attributable fraction function of § 2.1 upon setting Z to X.
3. Simulation studies
To assess the performance of the proposed estimators for the population attributable fraction function under independent censoring, we generated event times from the transformation model Λ(t | X) = G{0.1t exp(β X)}, where X is Bernoulli with success probability 0.4, and G is the Box–Cox transformation with ρ = 1, 2 or the logarithmic transformation with r = 1, 2. We generated censoring times from the Un(0, τ) distribution, where τ was chosen to yield a censoring rate of approximately 70%. We generated 10 000 replicates with n = 1000. Since censoring is independent and X is binary, all four estimators of A(·) can be used. Table 1 summarizes the results for the Kaplan–Meier × weighted Kaplan–Meier and transformation model × transformation model estimators of A(t) at t = τ/4, τ/2, 3τ/4 and τ under β = 1.0. The transformation model × transformation model estimator performs very well: the estimator is virtually unbiased, its variance estimator accurately reflects the true variation and the confidence intervals have proper coverage probabilities. The Kaplan–Meier × weighted Kaplan–Meier estimator performs very well when t is not near τ. As expected, the transformation model × transformation model estimator is more efficient than the Kaplan–Meier × weighted Kaplan–Meier estimator. The results for the Kaplan–Meier × Kaplan–Meier and transformation model × Kaplan–Meier estimators are almost the same as those of the Kaplan–Meier × weighted Kaplan–Meier and transformation model × transformation model estimators, respectively, and are thus omitted.
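Data of this kind can be generated by inversion, since the cumulative hazard evaluated at the event time, G{0.1T exp(βX)}, has a unit exponential distribution. The sketch below is one way to do this in plain Python and is not the authors' simulation code; the function names, the seed and the value of τ are hypothetical, and τ would in practice be tuned to reach the target censoring rate.

```python
import math
import random


def inverse_box_cox(y, rho):
    """Inverse of G(x) = {(1 + x)^rho - 1}/rho, with rho = 0 read as log(1 + x)."""
    if rho == 0:
        return math.expm1(y)
    return (1.0 + rho * y) ** (1.0 / rho) - 1.0


def inverse_logarithmic(y, r):
    """Inverse of G(x) = log(1 + r x)/r, with r = 0 read as the identity."""
    if r == 0:
        return y
    return math.expm1(r * y) / r


def simulate(n, beta, g_inverse, tau, rng):
    """Return (observed time, event indicator, exposure) triples."""
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.4 else 0
        unit_exp = -math.log(1.0 - rng.random())    # unit exponential draw
        event_time = g_inverse(unit_exp) / (0.1 * math.exp(beta * x))
        censor_time = rng.uniform(0.0, tau)         # Un(0, tau) censoring
        observed = min(event_time, censor_time)
        data.append((observed, int(event_time <= censor_time), x))
    return data


# Hypothetical settings: proportional hazards (rho = 1), beta = 1, tau = 4.
sample = simulate(2000, 1.0, lambda y: inverse_box_cox(y, 1.0), 4.0, random.Random(1))
censoring_rate = 1.0 - sum(d for _, d, _ in sample) / len(sample)
```

Under these hypothetical settings roughly three quarters of the observations are censored; a slightly larger τ brings the rate closer to the 70% used in the study.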
Table 1.
Model | Parameter | Kaplan–Meier × weighted Kaplan–Meier | | | | transformation model × transformation model | | | |
---|---|---|---|---|---|---|---|---|---|
 | | Bias | sse | see | cp(%) | Bias | sse | see | cp(%) |
ρ = 2 | A(τ/4) | 0.000 | 0.063 | 0.062 | 94.3 | 0.000 | 0.040 | 0.040 | 94.7 |
ρ = 2 | A(τ/2) | 0.000 | 0.046 | 0.046 | 95.1 | 0.000 | 0.039 | 0.038 | 94.7 |
ρ = 2 | A(3τ/4) | 0.001 | 0.042 | 0.041 | 94.4 | 0.000 | 0.037 | 0.036 | 94.7 |
ρ = 2 | A(τ) | 0.001 | 0.063 | 0.050 | 89.6 | 0.000 | 0.038 | 0.036 | 94.2 |
ρ = 1 | A(τ/4) | 0.000 | 0.060 | 0.059 | 94.7 | −0.000 | 0.043 | 0.042 | 94.6 |
ρ = 1 | A(τ/2) | 0.000 | 0.046 | 0.045 | 95.0 | −0.000 | 0.040 | 0.039 | 94.5 |
ρ = 1 | A(3τ/4) | 0.000 | 0.043 | 0.042 | 94.4 | 0.000 | 0.036 | 0.036 | 94.5 |
ρ = 1 | A(τ) | 0.001 | 0.062 | 0.050 | 89.9 | 0.001 | 0.036 | 0.035 | 94.6 |
r = 1 | A(τ/4) | 0.000 | 0.056 | 0.055 | 94.3 | −0.001 | 0.047 | 0.046 | 94.5 |
r = 1 | A(τ/2) | 0.000 | 0.044 | 0.044 | 94.6 | −0.000 | 0.040 | 0.039 | 94.5 |
r = 1 | A(3τ/4) | 0.001 | 0.043 | 0.042 | 94.6 | 0.000 | 0.034 | 0.034 | 94.4 |
r = 1 | A(τ) | 0.001 | 0.061 | 0.050 | 91.4 | 0.001 | 0.033 | 0.032 | 94.4 |
r = 2 | A(τ/4) | 0.000 | 0.054 | 0.053 | 94.5 | −0.000 | 0.048 | 0.047 | 94.4 |
r = 2 | A(τ/2) | 0.000 | 0.044 | 0.044 | 94.7 | 0.000 | 0.038 | 0.038 | 94.6 |
r = 2 | A(3τ/4) | 0.001 | 0.043 | 0.042 | 94.7 | 0.000 | 0.033 | 0.032 | 94.4 |
r = 2 | A(τ) | 0.002 | 0.059 | 0.050 | 92.1 | 0.001 | 0.031 | 0.030 | 94.4 |
Bias, the sampling bias; sse, the sampling standard error; see, the sampling mean of the standard error estimator; cp, the coverage probability of the 95% Wald confidence interval.
To evaluate the performance of the estimators for the population attributable fraction function under covariate-dependent censoring, we modified the above set-up by generating censoring times from the proportional hazards model Λ(t | X) = 0.1t exp(1.5X). Table 2 shows the results based on 10 000 replicates under the proportional hazards model for the event time with β = 1.0 and n = 1000. The Kaplan–Meier × weighted Kaplan–Meier and transformation model × transformation model estimators have excellent performance. The Kaplan–Meier × Kaplan–Meier and transformation model × Kaplan–Meier estimators, which require independent censoring, are severely biased.
Table 2.
Parameter | Kaplan–Meier × weighted Kaplan–Meier | | | | Kaplan–Meier × Kaplan–Meier | | | |
---|---|---|---|---|---|---|---|---|
 | Bias | sse | see | cp(%) | Bias | sse | see | cp(%) |
A(τ/4) | −0.000 | 0.067 | 0.066 | 94.8 | −0.023 | 0.065 | 0.064 | 91.3 |
A(τ/2) | 0.000 | 0.049 | 0.049 | 94.9 | −0.043 | 0.046 | 0.045 | 80.3 |
A(3τ/4) | 0.000 | 0.042 | 0.042 | 95.0 | −0.060 | 0.037 | 0.036 | 57.5 |
A(τ) | −0.000 | 0.039 | 0.038 | 94.8 | −0.073 | 0.031 | 0.031 | 31.7 |

Parameter | transformation model × transformation model | | | | transformation model × Kaplan–Meier | | | |
---|---|---|---|---|---|---|---|---|
 | Bias | sse | see | cp(%) | Bias | sse | see | cp(%) |
A(τ/4) | −0.000 | 0.044 | 0.044 | 94.8 | −0.023 | 0.043 | 0.043 | 91.8 |
A(τ/2) | 0.000 | 0.041 | 0.041 | 94.8 | −0.043 | 0.039 | 0.039 | 78.7 |
A(3τ/4) | 0.000 | 0.038 | 0.038 | 94.9 | −0.060 | 0.034 | 0.034 | 56.9 |
A(τ) | −0.000 | 0.035 | 0.034 | 95.0 | −0.072 | 0.030 | 0.030 | 32.4 |
Bias, the sampling bias; sse, the sampling standard error; see, the sampling mean of the standard error estimator; cp, the coverage probability of the 95% Wald confidence interval.
To evaluate the performance of the proposed estimators for the adjusted attributable fraction function, we generated event times from the transformation model Λ(t | X) = G{0.1t exp(β1 X1 + β2 X2)}, where X1 is Bernoulli with success probability 0.4, and X2 is normal with mean X1 and variance 1. We generated censoring times from the Un(0, τ) distribution to create censoring rates of approximately 70%. The goal was to estimate the adjusted attributable fraction function of X1 in the presence of X2. Table 3 provides the summary statistics for the transformation model × transformation model estimator based on 10 000 replicates with (β1, β2) = (1.0, 0.5) and n = 1000. The estimator performs very well. The results for the transformation model × Kaplan–Meier estimator are almost the same and are thus omitted.
Table 3.
Box–Cox transformation models

Parameter | ρ = 2 | | | | ρ = 1 | | | |
---|---|---|---|---|---|---|---|---|
 | Bias | sse | see | cp(%) | Bias | sse | see | cp(%) |
Aadj(τ/4) | −0.001 | 0.046 | 0.046 | 94.9 | −0.001 | 0.050 | 0.049 | 94.9 |
Aadj(τ/2) | −0.001 | 0.045 | 0.045 | 95.0 | −0.001 | 0.047 | 0.046 | 94.7 |
Aadj(3τ/4) | −0.000 | 0.044 | 0.044 | 94.7 | −0.001 | 0.043 | 0.043 | 94.6 |
Aadj(τ) | 0.001 | 0.046 | 0.043 | 94.0 | 0.001 | 0.043 | 0.041 | 94.2 |

Logarithmic transformation models

Parameter | r = 2 | | | | r = 1 | | | |
---|---|---|---|---|---|---|---|---|
 | Bias | sse | see | cp(%) | Bias | sse | see | cp(%) |
Aadj(τ/4) | −0.002 | 0.054 | 0.053 | 94.6 | −0.002 | 0.053 | 0.052 | 94.7 |
Aadj(τ/2) | −0.001 | 0.043 | 0.043 | 94.9 | −0.001 | 0.045 | 0.044 | 94.8 |
Aadj(3τ/4) | −0.000 | 0.037 | 0.037 | 95.0 | −0.000 | 0.039 | 0.039 | 95.0 |
Aadj(τ) | 0.001 | 0.034 | 0.034 | 94.5 | 0.001 | 0.037 | 0.036 | 94.1 |
Bias, the sampling bias; sse, the sampling standard error; see, the sampling mean of the standard error estimator; cp, the coverage probability of the 95% Wald confidence interval.
4. Example
We consider the Cardiovascular Health Study (Fried et al., 1991), which is a population-based cohort study of cardiovascular diseases in adults aged 65 years and older. The subjects were recruited from four US field centres. The major events of interest include myocardial infarction, stroke and cardiovascular disease mortality. A key objective of this study was to determine the importance of conventional cardiovascular disease risk factors on the time to the first occurrence of the major events in the Caucasian population. There are 3907 Caucasian subjects in the study, 27% of whom have experienced at least one of the three major events. We consider ten baseline covariates: age, sex, hypertension, body mass index, systolic blood pressure, smoking status, diabetes status and three dummy variables comparing the four field centres, and estimate the attributable fraction functions for hypertension and diabetes.
To assess the independent censoring assumption, we fit a proportional hazards model for the censoring time with the aforementioned ten baseline covariates. Censoring appears to be strongly associated with covariates, the standard-normal test statistics being 14.42, 2.76, 4.29 and 4.48 for age, systolic blood pressure, smoking status and diabetes, respectively. Thus, we allow censoring to depend on covariates in our analysis.
First, we estimate the population attributable fraction function of hypertension without adjusting for any other covariates. We try the proportional hazards model with hypertension as the only covariate. The proportional hazards assumption does not seem appropriate, the test of proportionality based on the score process (Lin et al., 1993) having a p-value of 0.038. We fit the Box–Cox transformation models with ρ = 2, 1, 0.5 and the logarithmic transformation models with r = 0.5, 1, 2. Using the Akaike information criterion (Akaike, 1985), we select the logarithmic transformation model with r = 2. Under this model, the regression coefficient for hypertension is estimated at 0.436 with an estimated standard error of 0.045. Figure 2(a) compares the estimates of the population attributable fraction function based on the nonparametric and semiparametric methods. The estimated population attributable fraction curve under the selected transformation model agrees well with the nonparametric curve, but with narrower confidence intervals. The estimated population attributable fraction curves under the proportional hazards and proportional odds models, especially the former, are considerably lower than those of the selected transformation model and the nonparametric method.
Next, we estimate the adjusted attributable fraction function of hypertension. We include all ten baseline covariates in the Box–Cox transformation models with ρ = 2, 1, 0.5 and the logarithmic transformation models with r = 0.5, 1, 2. The Akaike information criterion selects the logarithmic transformation model with r = 1, i.e. the proportional odds model. The estimates of regression coefficients under the selected model are shown in Table 4. The corresponding estimate of the adjusted attributable fraction function is shown in Fig. 2(b). The adjusted attributable fraction curve is considerably lower than the population attributable fraction curve. The difference is mainly due to the high correlation between hypertension and systolic blood pressure and the strong effect of systolic blood pressure on the event time.
Table 4.
Parameter | Estimate | se | Estimate/se | p-value |
---|---|---|---|---|
Age | 0.103 | 0.007 | 15.247 | <0.0001 |
Gender | 0.512 | 0.075 | 6.840 | <0.0001 |
Hypertension | 0.215 | 0.045 | 4.732 | <0.0001 |
Body mass index | 0.007 | 0.009 | 0.833 | 0.405 |
Blood pressure > 128 | 0.496 | 0.089 | 5.579 | <0.0001 |
Smoking | 0.553 | 0.118 | 4.679 | <0.0001 |
Diabetes | 0.657 | 0.102 | 6.458 | <0.0001 |
Centres 2 vs. 1 | −0.051 | 0.104 | −0.491 | 0.623 |
Centres 3 vs. 1 | 0.051 | 0.103 | 0.496 | 0.620 |
Centres 4 vs. 1 | −0.218 | 0.111 | −1.961 | 0.050 |
se, standard error.
To estimate the population attributable fraction function of diabetes, we fit the proportional hazards model with diabetes as the only covariate. The proportional hazards assumption appears reasonable: the test of proportionality based on the score process (Lin et al., 1993) has a p-value of 0.092, and the Breslow estimate of the baseline survival function is very close to its Kaplan–Meier counterpart. The estimate of the regression coefficient under the proportional hazards model is 0.642, with an estimated standard error of 0.079. As shown in Fig. 3, the estimated population attributable fraction curve under the proportional hazards model agrees well with its Kaplan–Meier counterpart but is less variable.
To estimate the adjusted attributable fraction function of diabetes, we adopt the proportional odds model shown in Table 4. As evident from Fig. 3, the adjusted attributable fraction curve for diabetes starts higher than its unadjusted counterpart and decreases more rapidly over time.
It is interesting to compare the attributable fraction functions of hypertension and diabetes. Diabetes has a stronger effect on the event time, and yet hypertension has much higher population attributable fraction values than diabetes at all time-points. The reason is that hypertension is much more prevalent than diabetes: the proportions of subjects at hypertension levels 0, 1 and 2 are 0.453, 0.150 and 0.398, respectively, whereas the prevalence of diabetes is 0.132.
5. Discussion
The assumption on the censoring mechanism is critical to the estimation of attributable fraction functions because the Kaplan–Meier estimator for the marginal survival function is inconsistent when censoring depends on covariates. To deal with covariate-dependent censoring, we construct new estimators for the marginal survival function under a broad class of semiparametric transformation models and establish their asymptotic properties. These estimators are useful beyond the context of attributable fraction functions. Shen & Fleming (1997) studied an estimator for the special case of the proportional hazards model with time-independent covariates.
There has been tremendous recent interest in transformation models with censored data. These models can greatly improve the accuracy of prediction over the proportional hazards model, but the regression parameters do not have simple interpretations outside the proportional hazards and proportional odds models. In the context of attributable fraction functions, the primary interest lies in the prediction rather than the regression parameters. Thus, transformation models are particularly attractive in our setting.
We have confined our attention to pointwise confidence limits for A(·) and Aadj(·), as opposed to confidence bands. Because the proposed estimators can be expressed as sums of independent terms at each time-point in the form of (1) or (7), the Monte Carlo approach of Lin et al. (1994) can be used to construct confidence bands. Unfortunately, attributable fraction functions involve ratios of probabilities and are thus intrinsically difficult to estimate well, even with large cohorts. Thus, the confidence bands would be too wide to be practically useful.
When X contains only categorical variables, one can estimate the baseline survival function S0(·) by the Kaplan–Meier estimator or under a semiparametric regression model. The former estimator is model-free, but can be highly unstable and inefficient when the number of subjects with X = 0 is small. The latter is more efficient, but less robust. Because of the intrinsic difficulties in estimating attributable fraction functions, it is generally preferable to adopt a semiparametric estimator for S0(·). The use of transformation models entails greater robustness of inference, as opposed to indiscriminate application of the proportional hazards model.
We have assumed implicitly that the data come from a random sample of the underlying population. If the sampling depends on the exposure level, then the baseline survival function S0(·) can still be estimated in the same manner as before, but the estimation of the marginal survival function S(·) needs to be adjusted. Specifically, we can estimate S(t) by , where pi is the selection probability for the i th study subject. The variance estimators can be modified accordingly.
Acknowledgments
This research was supported by the National Institutes of Health, U.S.A., and the University of North Carolina Cancer Research Fund. The authors are grateful to the editor and referees for their helpful comments.
Appendix A
Weak convergence of n^{1/2}{Â(·) – A(·)} and n^{1/2}{Âadj(·) – Aadj(·)}
We prove the weak convergence of n^{1/2}{Â(·) – A(·)} through modern empirical process theory. The proof for n^{1/2}{Âadj(·) – Aadj(·)} is similar and thus omitted. Let 𝒫n and P denote the empirical measure and the distribution under the true model, respectively. For a measurable function f and measure Q, the integral ∫ f dQ is abbreviated as Qf. We impose the following regularity conditions.
Condition A1. The function Λ0(·) is strictly increasing and continuously differentiable, and β lies in the interior of a compact set 𝒞.
Condition A2. With probability 1, X(·) is bounded and has uniformly bounded total variation in [0, τ ]. In addition, if there exist a vector γ and a deterministic function γ0(t) such that γ0(t) + γ TX(t) = 0 with probability 1, then γ = 0 and γ0(t) = 0.
Condition A3. With probability 1, there exists a positive constant δ such that pr(C ⩾ τ | X) > δ and pr{Y (τ) = 1 | X} > δ.
Condition A4. For any sequence 0 < x1 < · · · < xm ⩽y, , where α0 and μ0 are positive constants. This condition is satisfied by the classes of Box–Cox transformations and logarithmic transformations.
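Clearly,
\[ n^{1/2}\{\hat{A}(t) - A(t)\} = \frac{n^{1/2}\{\hat{S}_0(t) - S_0(t)\}}{1 - \hat{S}(t)} - \frac{\{1 - S_0(t)\}\, n^{1/2}\{\hat{S}(t) - S(t)\}}{\{1 - \hat{S}(t)\}\{1 - S(t)\}}. \]
We shall show that n^{1/2}{Ŝ0(t) – S0(t)} and n^{1/2}{Ŝ(t) – S(t)} are asymptotically equivalent to n^{1/2}(𝒫n – P)η1(t) and n^{1/2}(𝒫n – P)η2(t), respectively, where η1(t) and η2(t) depend on the estimation methods for S0(·) and S(·), respectively. It will then follow that n^{1/2}{Â(t) – A(t)} converges weakly to a zero-mean Gaussian process and is asymptotically equivalent to
\[ n^{1/2}(\mathcal{P}_n - P)\left[\frac{\eta_1(t)}{1 - S(t)} - \frac{\{1 - S_0(t)\}\,\eta_2(t)}{\{1 - S(t)\}^2}\right]. \]
The asymptotic equivalence is defined in the metric space l^∞[0, τ].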
The main task is to establish the weak convergence of n^{1/2}{Ŝ(·) – S(·)}. Under independent censoring, Ŝ(·) is the Kaplan–Meier estimator. Then n^{1/2}{Ŝ(t) – S(t)} is asymptotically equivalent to
Under covariate-dependent censoring, Ŝ(t) = 𝒫n Ŝ(t | X) = n^{-1} ∑_{i=1}^{n} Ŝ(t | X_i), where Ŝ(t | x) is an estimator of S(t | x), which can be obtained by the Kaplan–Meier method or under the class of transformation models given in (4). We make use of the following representation:
\[ n^{1/2}\{\hat{S}(t) - S(t)\} = n^{1/2}(\mathcal{P}_n - P)\, S(t \mid X) + n^{1/2} P\{\hat{S}(t \mid X) - S(t \mid X)\} + n^{1/2}(\mathcal{P}_n - P)\{\hat{S}(t \mid X) - S(t \mid X)\}, \tag{A1} \]
where the expectations 𝒫n Ŝ(t | X) and P Ŝ(t | X) are taken with respect to X.
If Ŝ(t | x) is the Kaplan–Meier estimator of the survival function among the subjects with X = x, then n^{1/2}{Ŝ(t | x) – S(t | x)} is asymptotically equivalent to
This result, together with the fact that X has a finite number of categories, implies that the second term on the right-hand side of (A1) is asymptotically equivalent to
where EX denotes expectation with respect to X.
Under the class of transformation models given in (4), S(t | x) = exp(–G[∫_0^t exp{β^T x(s)} dΛ0(s)]) and Ŝ(t | x) = exp(–G[∫_0^t exp{β̂^T x(s)} dΛ̂0(s)]). It can be shown that for any x(·) with bounded total variation, S(·| x) is Hadamard-differentiable with respect to β and Λ0(·). It then follows from simple algebraic manipulations that n^{1/2}{Ŝ(t | x) – S(t | x)} is asymptotically equivalent to
which in turn is asymptotically equivalent to . Since X(·) has uniformly bounded total variation, this asymptotic equivalence is uniform for X(·). Thus, the second term on the right-hand side of (A1) is asymptotically equivalent to
We can verify that P{Ŝ(t | X) – S(t | X)}^2 converges to zero in probability uniformly for t ∈ [0, τ] and that Ŝ(t | X) and S(t | X) belong to a P-Donsker class (van der Vaart & Wellner, 1996, § 2.10). It then follows that the third term on the right-hand side of (A1) converges uniformly to zero in probability by Lemma 19.24 of van der Vaart (1998).
Combining the results of the above five paragraphs, we conclude that n^{1/2}{Ŝ(t) – S(t)} is asymptotically equivalent to n^{1/2}(𝒫n – P) η2(t), where η2(t) is given in § 2.1. Since n^{1/2}{Ŝ0(t) – S0(t)} is a special case of n^{1/2}{Ŝ(t | x) – S(t | x)} with x = 0, the weak convergence of the former follows from that of the latter.
Appendix B
Asymptotic efficiency of the Kaplan–Meier × weighted Kaplan–Meier and transformation model × transformation model estimators
We first establish the asymptotic efficiency of the Kaplan–Meier × weighted Kaplan–Meier estimator. Suppose that X contains only categorical covariates with possible values x_1, …, x_J. Let F(·) denote the distribution function of X. We consider the model space 𝒫 = {P: C is independent of T and X} or 𝒫 = {P: C is independent of T given X}. The likelihood is the product of three terms: the first term pertains to the likelihood for the conditional distribution of T given X, the second term to the likelihood for the conditional distribution of C given X and the third term to the likelihood for the distribution of X. Thus, the empirical distribution function of X is an efficient estimator of F(·). Because the first term can be written as the product of the likelihoods for the conditional survival functions of T given X = x_j (j = 1, …, J), the Kaplan–Meier estimator among the subjects with X = x_j is an efficient estimator for S(· | x_j). It can be shown that A(t) is Hadamard-differentiable with respect to S(t | x) and F(x). Hence, the Kaplan–Meier × weighted Kaplan–Meier estimator for A(·) is asymptotically efficient by Theorem 25.47 of van der Vaart (1998).
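The plug-in step behind this argument can be sketched as follows. The definition of A(t) is not restated in this appendix, so the code assumes the usual form A(t) = {F(t) − F0(t)}/F(t), with F(t) = 1 − S(t) the population cumulative incidence and F0(t) = 1 − S0(t) the cumulative incidence at the reference level x = 0. The function name `attributable_fraction` and the input values are hypothetical.

```python
import numpy as np


def attributable_fraction(surv_all, surv_ref):
    """Plug-in attributable fraction function on a common time grid.

    surv_all : estimates of the population survival function S(t)
    surv_ref : estimates of the reference-level survival function S0(t)
    Assumes A(t) = 1 - {1 - S0(t)} / {1 - S(t)}; returns NaN where S(t) = 1.
    """
    f_all = 1.0 - np.asarray(surv_all, dtype=float)
    f_ref = 1.0 - np.asarray(surv_ref, dtype=float)
    ratio = np.full_like(f_all, np.nan)
    # Divide only where the population incidence is positive.
    np.divide(f_ref, f_all, out=ratio, where=f_all > 0)
    return 1.0 - ratio


# Illustrative survival estimates on a three-point grid.
example_A = attributable_fraction([1.0, 0.8, 0.5], [1.0, 0.9, 0.75])
```

Under this assumed definition, A(t) is a ratio of smooth functionals of the survival curves, so efficient estimators of S(· | x_j) and F(·) carry over to an efficient estimator of A(·) wherever F(t) is bounded away from zero.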
We now establish the asymptotic efficiency of the transformation model × transformation model estimator. We consider the model space 𝒫 = {P: C is independent of T and X, and the transformation model (4) holds} or 𝒫 = {P: C is independent of T given X, and the transformation model (4) holds}. The likelihood is the product of three terms: the first term pertains to the likelihood for the parameters (β, Λ0), the second term to the likelihood for the distribution of C given X and the third term to the likelihood for the distribution of X. This implies that the empirical distribution function of X is asymptotically efficient. The maximization of the first term yields the nonparametric maximum likelihood estimators β̂ and Λ̂0. The asymptotic efficiency of β̂ was proved in Zeng & Lin (2006). To establish the asymptotic efficiency of Λ̂0, we define ℱ = {w(t) : ‖w‖_{BV[0,τ]} ≤ 1}, where ‖w‖_{BV[0,τ]} denotes the total variation of w(·) in [0, τ]. For any t, there exists {w(·), β} ∈ ℱ × ℛ^p such that
where is the score function along the path {Λ0 + ε ∫ w(s) dΛ0(s), β + εb} (Zeng & Lin, 2006). In addition, {I(· ≤ t) : t ∈ [0, τ]} is a Donsker class. It then follows from Theorem 18.9 of Kosorok (2008) that Λ̂0(·) is asymptotically efficient. Because A(·) is a function of β, Λ0(·) and F(·) and is Hadamard-differentiable, the transformation model × transformation model estimator for A(·) is asymptotically efficient by Theorem 25.47 of van der Vaart (1998).
References
- Akaike H. Prediction and entropy. In: Atkinson AC, Fienberg SE, editors. A Celebration of Statistics. New York: Springer; 1985. pp. 1–24.
- Benichou J. A review of adjusted estimators of attributable risk. Statist Meth Med Res. 2001;10:195–216. doi: 10.1177/096228020101000303.
- Breslow NE. Discussion of the paper by D. R. Cox. J R Statist Soc B. 1972;34:216–7.
- Bruzzi P, Green SB, Byar DP, Brinton LA, Schairer C. Estimating the population attributable risk for multiple risk factors using case-control data. Am J Epidemiol. 1985;122:904–14. doi: 10.1093/oxfordjournals.aje.a114174.
- Chen YQ, Hu C, Wang Y. Attributable risk function in the proportional hazards model for censored time-to-event. Biostatistics. 2006;7:515–29. doi: 10.1093/biostatistics/kxj023.
- Cox DR. Regression models and life-tables (with discussion). J R Statist Soc B. 1972;34:187–200.
- Dabrowska DM, Doksum KA. Partial likelihood in transformation models with censored data. Scand J Statist. 1988;15:1–23.
- Fried LP, Borhani NO, Enright P, Furberg CD, Gardin JM, Kronmal RA, Kuller LH, Manolio TA, Mittelmark MB, Newman A, O’Leary D, Psaty B, Rautaharju P, Tracy R. The cardiovascular health study: design and rationale. Ann Epidemiol. 1991;1:263–76. doi: 10.1016/1047-2797(91)90005-w.
- Graubard BI, Fears TR. Standard errors for attributable risk for simple and complex sample designs. Biometrics. 2005;61:847–55. doi: 10.1111/j.1541-0420.2005.00355.x.
- Greenland S. Estimation of population attributable fractions from fitted incidence ratios and exposure survey data, with an application to electromagnetic fields and childhood leukemia. Biometrics. 2001;57:182–8. doi: 10.1111/j.0006-341x.2001.00182.x.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. New York: Wiley; 2002.
- Kosorok MR. Introduction to Empirical Processes and Semiparametric Inference. New York: Springer; 2008.
- Levin ML. The occurrence of lung cancer in man. Acta Unio Int Contra Cancrum. 1953;9:531–41.
- Lin DY, Fleming TR, Wei LJ. Confidence bands for survival curves under the proportional hazards model. Biometrika. 1994;81:73–81.
- Lin DY, Wei LJ, Ying Z. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika. 1993;80:557–72.
- Murray S, Tsiatis AA. Nonparametric survival estimation using prognostic longitudinal covariates. Biometrics. 1996;52:137–51.
- Pearl J. Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press; 2000.
- Rothman KJ, Greenland S. Modern Epidemiology. 2nd ed. Philadelphia: Lippincott-Raven; 1998.
- Rubin DB. Bayesian inference for causal effects: the role of randomization. Ann Statist. 1978;6:34–58.
- Shen Y, Fleming TR. Large sample properties of some survival estimators in heterogeneous samples. J Statist Plan Infer. 1997;60:123–38.
- Silverberg MJ, Smith MW, Chmiel JS, Detels R, Margolick JB, Rinaldo CR, O’Brien SJ, Muñoz A. Fraction of cases of acquired immunodeficiency syndrome prevented by the interactions of identified restriction gene variants. Am J Epidemiol. 2004;159:232–41. doi: 10.1093/aje/kwh036.
- van der Vaart AW. Asymptotic Statistics. New York: Cambridge University Press; 1998.
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer; 1996.
- Whittemore AS. Statistical methods for estimating attributable risk from retrospective data. Statist Med. 1982;1:229–43. doi: 10.1002/sim.4780010305.
- Zeng D, Lin DY. Efficient estimation of semiparametric transformation models for counting processes. Biometrika. 2006;93:627–40.