Abstract
This paper presents simple weighted and fully augmented weighted estimators for the additive hazards model with covariates that are missing at random. The additive hazards model estimates the difference in hazards and has an intuitive biological interpretation. The proposed weighted estimators use the incomplete data nonparametrically and have closed-form expressions. We show that they are consistent and asymptotically normal, and that they are more efficient than the simple weighted estimator that uses only the complete data. We illustrate their finite-sample performance through simulation studies and two applications: a study of the progression from mild cognitive impairment to dementia using data from the Alzheimer’s Disease Neuroimaging Initiative, and the mouse leukemia study.
Keywords: kernel smoother, missing covariates, nonparametric method, weighted estimators, weighted estimating equations
1. Introduction
For survival (time-to-event) data, a commonly used model is the Cox proportional hazards (PH) model (Cox, 1972), which describes the relative risk associated with covariates. Another well-known but less used method is the additive hazards model (Aalen, 1980; Cox and Oakes, 1984; Thomas, 1986; Breslow and Day, 1987, p.182). Unlike the Cox PH model, the additive hazards model does not assume proportional hazards, and it estimates the difference in hazards instead of the hazard ratio. Although the Cox PH model is very popular, the additive hazards model is desirable for several reasons, as discussed in Lin and Ying (1997). When describing the covariate-disease association, the hazard difference is complementary to the hazard ratio and may be more relevant to public health because it translates directly into the number of events (e.g. disease cases) attributable to the covariate. In practice, the additive hazards model may fit certain types of data better than the Cox PH model (Breslow and Day, 1987), and it provides a simple structure for studying frailty models and interval-censored data (Lin and Ying, 1997). Therefore, when the difference in disease risk due to the covariates is of primary interest or the PH assumption does not hold, the additive hazards model may be more appropriate.
The additive hazards model assumes that the conditional hazard function given a set of covariates is the sum of, rather than the product of, a baseline hazard function and a linear regression function of the covariates. Specifically, the hazard function for the failure time T associated with a column vector of covariates Z has the form
$$\lambda(t \mid Z) = \lambda_0(t) + \beta^{T} Z, \qquad (1)$$
where λ0(t) is an unspecified baseline hazard function, and β is a column vector of regression parameters. The additive hazards model has an intuitive biological interpretation. When all covariates are fully observed, Lin and Ying (1994) proposed a simple semiparametric estimating function for β which generates a consistent and asymptotically normally distributed estimator with an explicit form.
Biomedical studies with survival outcomes frequently have missing covariates, i.e., some components of Z are not observed for all study subjects. Discarding the subjects with missing covariates may lead to biased or inefficient estimates when the missing-data mechanism depends on the outcomes. Assuming missing at random (MAR) (Little and Rubin, 1987), i.e., the missing-data mechanism (or the selection probability) depends on the observed data but not on the missing data, Qi et al. (2005) proposed simple weighted estimating equations using nonparametrically estimated selection probabilities and kernel-assisted fully augmented weighted estimating equations for the Cox PH model. Their fully augmented weighted estimators (FAWEs) have the double-robustness property and are more efficient than most of the simple weighted estimators (SWEs); the SWEs with the selection probability estimated using all observed data are asymptotically equivalent to the FAWEs. For two-stage studies, Mark and Katki (2006) incorporated auxiliary information as weights and used model-based approaches for estimating the sampling probabilities; this work was further extended to general semiparametric models by several authors, including Breslow et al. (2009) and Sun et al. (2017). For the additive hazards model, Kulich and Lin (2000) proposed a simple weighted estimator for case-cohort studies (Prentice, 1986), a special case of MAR where all cases (failures) and a subset of controls (censored subjects) are selected and have complete observations.
In this paper, we propose simple weighted and fully augmented weighted estimating equations for the additive hazards model with missing covariates under the MAR assumption. An advantage is that the SWEs and FAWEs for the additive hazards model take explicit forms. We also propose to estimate the selection probabilities in both sets of estimating equations, and the unknown conditional expectations in the fully augmented weighted estimating equations, using nonparametric kernel smoothing techniques similar to those used by Wang et al. (1997), Wang and Wang (2001) and Qi et al. (2005). Under certain regularity conditions, the resulting SWEs and FAWEs are consistent and asymptotically normal. We examine the finite-sample performance of these estimators through a simulation study, and also demonstrate the methods using an Alzheimer’s Disease Neuroimaging Initiative data set (adni.loni.usc.edu) and the data from the mouse leukemia study (Kalbfleisch and Prentice, 1980).
The remainder of the paper is organized as follows: Section 2 presents the SWEs and the FAWEs and their asymptotic properties as well as the discussions of the relationships between the SWEs and the FAWEs. Section 3 examines some properties of the proposed estimators through a simulation study, and illustrates the proposed methods with real examples using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data and the mouse leukemia data. Section 4 provides further discussions and some practical recommendations. The regularity conditions are given in the Appendix and the proofs of the theorems are given in the online Appendix S1.
2. Method
2.1. Simple Weighted Estimating Equations
In the additive hazards model (1), let T, C and X = min(T,C) be the failure, censoring and observed time for a subject, respectively. The failure indicator δ = I(T ≤ C) is 1 if the subject experiences a failure and δ = 0 if censored. Let Z denote a set of time-independent covariates. We assume that, given Z, T and C are independent, and all study subjects have X,δ observed. Suppose some elements of Z, denoted by Zc, are observed for all n study subjects, while other elements may be missing for some subjects, denoted by Zm. Let the selection indicator V equal 1 if Zm is available, and 0 otherwise. Then the selection probability π is defined by pr(V = 1 | X, δ, Zc, Zm), equal to pr(V = 1 | X, δ, Zc) under the MAR assumption.
Let N(t) = δI(X ≤ t) and Y(t) = I(X ≥ t) be the counting process and the at-risk process, respectively, corresponding to (X, δ). Let $(X_i, \delta_i, Z_i^m, Z_i^c, V_i)$, i = 1,…,n, be independent and identically distributed copies of (X, δ, Zm, Zc, V). When the selection probability π is known, a simple weighted estimating function can have the following form:
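$$U_{sw}(\beta, \pi) = \sum_{i=1}^{n} \int_0^{\tau} \frac{V_i}{\pi_i}\,\{Z_i - \bar{Z}_{\pi}(t)\}\,\{dN_i(t) - Y_i(t)\,\beta^{T} Z_i\,dt\}, \qquad (2)$$
where $\bar{Z}_{\pi}(t) = \sum_{i=1}^{n} (V_i/\pi_i)\,Y_i(t)\,Z_i \big/ \sum_{i=1}^{n} (V_i/\pi_i)\,Y_i(t)$ and τ = sup{t: pr(Y(t) = 1) > 0}. This estimating function can be regarded as a weighted complete-data pseudolikelihood score vector. By solving the equation Usw(β, π) = 0, we obtain the estimator
$$\hat{\beta}_{sw}(\pi) = \left[\sum_{i=1}^{n} \int_0^{\tau} \frac{V_i}{\pi_i}\,Y_i(t)\,\{Z_i - \bar{Z}_{\pi}(t)\}^{\otimes 2}\,dt\right]^{-1} \left[\sum_{i=1}^{n} \int_0^{\tau} \frac{V_i}{\pi_i}\,\{Z_i - \bar{Z}_{\pi}(t)\}\,dN_i(t)\right], \qquad (3)$$
where $a^{\otimes 2} = aa^{T}$.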
In a case-cohort study (Prentice, 1986) with Bernoulli sampling (Kulich & Lin, 2000), i.e., V = 1 for all failure events (cases) and V = 1 with probability π for censored observations (controls), the SWE is the same as the estimator proposed by Kulich & Lin (2000).
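For a single covariate, the closed form of the SWE can be evaluated exactly, because the weighted risk-set average of Z is piecewise constant between consecutive observed times. The following is a minimal sketch, not the authors' implementation: it assumes the inverse-probability-weighted version of the Lin and Ying (1994) estimator, a scalar time-independent Z, and hypothetical names (`swe_additive_hazards`, `x`, `delta`, `z`, `v`, `pi`).

```python
def swe_additive_hazards(x, delta, z, v, pi):
    """Simple weighted estimate of a scalar beta in the additive hazards model.

    x: observed times; delta: failure indicators; z: covariate values
    (never read when v[i] == 0); v: selection indicators; pi: selection
    probabilities (true or estimated). All names are illustrative.
    """
    n = len(x)
    # Inverse-probability weights; subjects with missing covariates get weight 0.
    w = [1.0 / pi[i] if v[i] == 1 else 0.0 for i in range(n)]
    complete = [i for i in range(n) if w[i] > 0.0]

    def weighted_mean_at_risk(t):
        # Weighted average of Z over complete subjects still at risk at time t.
        risk = [j for j in complete if x[j] >= t]
        den = sum(w[j] for j in risk)
        return sum(w[j] * z[j] for j in risk) / den if den > 0 else 0.0

    # Denominator: sum_i int_0^tau w_i Y_i(t) {Z_i - Zbar(t)}^2 dt, evaluated
    # interval by interval between the ordered observed times.
    info, prev = 0.0, 0.0
    for t in sorted(set(x)):
        zbar = weighted_mean_at_risk(t)
        info += (t - prev) * sum(
            w[j] * (z[j] - zbar) ** 2 for j in complete if x[j] >= t
        )
        prev = t

    # Numerator: sum_i w_i delta_i {Z_i - Zbar(X_i)}.
    score = sum(
        w[i] * delta[i] * (z[i] - weighted_mean_at_risk(x[i])) for i in complete
    )
    return score / info
```

With two complete subjects, `swe_additive_hazards([1.0, 2.0], [1, 1], [0.0, 1.0], [1, 1], [1.0, 1.0])` gives a negative estimate, since the subject with Z = 0 fails first. Multiplying all weights by a common constant leaves the estimate unchanged, because the constant cancels between numerator and denominator.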
To study rigorously the asymptotic properties of , we impose regularity conditions (a1) to (a5) given in the Appendix. Since π is a function of X, δ, Zc and may not be predictable, the techniques of Andersen and Gill (1982) cannot be directly applied to Usw(β, π). We employ modern empirical process theory to establish the consistency and asymptotic normality of , and further introduce the following notation to present the asymptotic results in Theorem 1. Define, for k = 0, 1, , s(k)(t) = E{Z⊗kY(t)}, a⊗0 = I, and a⊗1 = a. Let be the counting process martingale for the failure process, and e(t) = s(1)(t)/s(0)(t). Let be the martingale transformation with mean and variance , where a⊗2 = aaT.
Theorem 1 Under the regularity conditions (a1) to (a5) given in the Appendix, is consistent for the true parameter β, and converges to N (0,Σ−1Σsw(π)Σ−1) in distribution, where and .
The first term ΣA of Σsw(π) is the asymptotic variance of the full-cohort pseudo-score estimator, while the second term quantifies the efficiency loss due to the missing covariates. The variance Σsw(π) can be estimated consistently by , where
and Σ can be estimated consistently by .
When the true selection probability π is used, the SWE is computed from the complete observations only, so it may not be efficient. To improve efficiency, an estimate of π can be used in the estimating function. We use nonparametric methods to estimate π from all observed data, so that incomplete observations also contribute to the estimation. Let W denote the variables used to estimate π. When W is discrete, π can be estimated by the empirical proportion based on the observed data,
$$\hat{\pi}(w) = \frac{\sum_{i=1}^{n} V_i\, I(W_i = w)}{\sum_{i=1}^{n} I(W_i = w)}. \qquad (4)$$
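As an illustration of the empirical-proportion estimator for discrete W, the sketch below returns, for each subject, the fraction of subjects with V = 1 among those sharing the same value of W. The function name and the representation of W as hashable tuples are assumptions made for the example.

```python
from collections import defaultdict


def empirical_selection_prob(w, v):
    """Return pi_hat(W_i) for every subject when W is discrete.

    w: list of hashable values (e.g. tuples such as (delta, zc));
    v: list of 0/1 selection indicators.
    """
    total = defaultdict(int)     # number of subjects with each value of W
    selected = defaultdict(int)  # number of those with V = 1
    for wi, vi in zip(w, v):
        total[wi] += 1
        selected[wi] += vi
    return [selected[wi] / total[wi] for wi in w]
```

In a case-cohort design with W = δ, this gives an estimate of 1 for the cases and the observed sampling fraction for the controls.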
If W has d continuous components, π can be estimated consistently by nonparametric kernel smoothers. Let K be an rth-order (r > d) kernel function with bounded support, satisfying ∫ K(u)du = 1, ∫ u^m K(u)du = 0 for m = 1,…, r − 1, ∫ u^r K(u)du ≠ 0, and ∫ K(u)^2 du < ∞. Let Kh(·) = K(·/h), where h is the smoothing parameter, also called the bandwidth. We estimate π(w) using the Nadaraya-Watson (Nadaraya 1964; Watson 1964) estimator
$$\hat{\pi}(w) = \frac{\sum_{i=1}^{n} V_i\, K_h(w - W_i)}{\sum_{i=1}^{n} K_h(w - W_i)}. \qquad (5)$$
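A minimal sketch of the Nadaraya-Watson estimate of the selection probability for a single continuous W (d = 1) is given below. The uniform kernel and the function names are choices made for illustration; a higher-order kernel would be needed to meet the rth-order condition when r > 2.

```python
def uniform_kernel(u):
    """Uniform (boxcar) kernel on [-1, 1]; integrates to 1."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def nw_selection_prob(w0, w, v, h, kernel=uniform_kernel):
    """Nadaraya-Watson estimate of pr(V = 1 | W = w0) for one continuous W.

    w: observed values of W for all subjects; v: 0/1 selection indicators;
    h: bandwidth. K_h(u) = K(u / h), as in the text; the missing 1/h factor
    cancels between numerator and denominator.
    """
    weights = [kernel((w0 - wi) / h) for wi in w]
    den = sum(weights)
    if den == 0.0:
        raise ValueError("no observations within the bandwidth of w0")
    return sum(k * vi for k, vi in zip(weights, v)) / den
```

All subjects enter the smoother, whether or not their covariates are complete, which is how the incomplete observations contribute to the estimated weights.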
The kernel function K usually has little effect on π̂, and thus on the estimator of β, while h affects the behavior of the estimator both theoretically and practically. We assume h satisfies nh^{2d} → ∞ and nh^{2r} → 0 as n → ∞. Similar to Qi et al. (2005) and Wang & Wang (2001), we may choose h = O(n^{−1/p}) for some integer p > 2d, and take r to be the smallest even integer such that r ≥ p − d. For example, when d = 2, p and r can take values of 5 and 4, respectively.
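As a quick check of the example, taking h = n^{−1/p} exactly (a simplifying assumption for the O(·) rate), the two bandwidth conditions reduce to conditions on exponents of n:

```latex
\begin{aligned}
h = n^{-1/p}: &\quad n h^{2d} = n^{\,1 - 2d/p}, \qquad n h^{2r} = n^{\,1 - 2r/p},\\
d = 2,\ p = 5,\ r = 4: &\quad n h^{4} = n^{1/5} \to \infty, \qquad n h^{8} = n^{-3/5} \to 0 .
\end{aligned}
```

The first condition holds whenever p > 2d, and the second whenever 2r > p, which follows from r ≥ p − d and p > 2d.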
Plugging in in the simple weighted estimating function (2) results in the following estimating function:
Solving , we obtain with the same closed form of (3) replacing π with . The consistency and asymptotic normality of were established in the online supplementary material, with more imposed regularity conditions (a6) to (a10) in the Appendix, and the results are stated in the following Theorem 2.
Theorem 2 Under the regularity conditions (a1) to (a10), given in the Appendix, is consistent for the true parameter β, and converges to in probability, with , where .
Theorem 2 shows that with the consistent estimator in (4) or (5), is consistent for β, and has a smaller asymptotic variance than because their asymptotic variances have the same first term ΣA and the second term in is , greater than , the second term in . Hence using the nonparametrically estimated selection probability in the simple weighted estimating equation allows more effective use of the available data and improves the efficiency of the SWE with π.
Also is non-increasing over the dimension of W, so the more variables are used to estimate π, the smaller is and the more efficient is . This suggests that estimating selection probabilities using additional variables besides the variables on which they depend may lead to further efficiency gains for the SWEs. The variance can be estimated consistently by estimating ΣA and consistently. Let . Estimate dΛ0(t) and , respectively, by
We estimate ΣA by
and by
| (6) |
where is obtained by the Nadaraya-Watson estimator (12) of Section 2.2.
2.2. Fully Augmented Weighted Estimating Equations
We propose the following fully augmented estimating function for the additive hazards model,
| (7) |
where , and let W = (X, δ, Zc),
and for k = 0, 1,
| (8) |
When k = 0, .
The fully augmented weighted estimating function Ufaw(β, π) uses incomplete observations through the augmented averages and , and the augmentation term . The resulting fully augmented weighted estimator (FAWE) possesses the so-called double-robust property, i.e., the estimator is consistent if either the missing-data mechanism (i.e. the selection probability) or the distribution of the missing covariates given the observed data is modeled correctly (Wang & Chen 2001, Qi et al. 2005). Solving Ufaw(β, π) = 0, we can obtain the FAWE explicitly:
| (9) |
where and .
The conditional expectations and in equation (9) are unknown since they contain unknown quantities and E(Zi|Wi). Specifically, , and . The denominator of , does not contain unknown quantities, while the numerator of , involves E(Zi | Wi), i = 1,…, n, as seen in equation (8). So once and E(Zi|Wi) are estimated, we can estimate and as well as in equation (9), and generate a FAWE . Furthermore, since Z = (Zm, Zc)T and Zc is known for all subjects, we have
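$$E(Z^{\otimes 2} \mid W) = \begin{pmatrix} E\{(Z^m)^{\otimes 2} \mid W\} & E(Z^m \mid W)\,(Z^c)^{T} \\ Z^c\, E(Z^m \mid W)^{T} & (Z^c)^{\otimes 2} \end{pmatrix} \qquad (10)$$
and
$$E(Z \mid W) = \{E(Z^m \mid W)^{T},\ (Z^c)^{T}\}^{T}. \qquad (11)$$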
Hence we can use the observed values for Zc and only need to estimate the unknown quantities E[(Zm)⊗2 | W] and E(Zm | W) in equations (10) and (11). We propose to use a nonparametric kernel-assisted method to estimate E[(Zm)⊗2 | W] and E(Zm | W). Specifically, let ζ denote Zm or (Zm)⊗2, then we can estimate E(ζ | W) using the Nadaraya-Watson (Nadaraya 1964; Watson 1964) estimator based on the complete observations.
Let ϕ(w) = E(ζ | w). Assuming ϕ(w) is a smooth function with r continuous and bounded partial derivatives with respect to the continuous components of W a.e., then a Nadaraya-Watson estimator of ϕ(w) is
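$$\hat{\phi}(w) = \frac{\sum_{i=1}^{n} V_i\, \zeta_i\, K_h(w - W_i)}{\sum_{i=1}^{n} V_i\, K_h(w - W_i)}, \qquad (12)$$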
where K is an rth-order kernel function as defined in Section 2.1 and h is a smoothing parameter.
A step-by-step algorithm for obtaining the FAWEs using equation (9) can be summarized as follows:
1. Apply the kernel smoother (12) to obtain the estimates Ê[(Zm)⊗2|W] and Ê(Zm|W) of E[(Zm)⊗2|W] and E(Zm|W), respectively. When W has both discrete and continuous components, first stratify by the discrete components, then apply the kernel smoother within each stratum.
2. Calculate using equation (8). For the augmented term , plug in the observed values of Zc and Ê(Zm|W). Calculate and obtain .
3. Obtain an FAWE using equation (9). For the augmented terms in both the numerator and the denominator, plug in the observed values of Zc, Ê[(Zm)⊗2|W], Ê(Zm|W) and the estimated .
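The first step of the algorithm can be sketched as follows for a scalar Zm, with W made up of the observed time (continuous) and a stratum label for the discrete components such as (δ, Zc). This is an illustration under those assumptions, with a uniform kernel and hypothetical names; it covers only the smoothing step, not the full FAWE.

```python
def nw_conditional_moments(x0, stratum0, x, stratum, zm, v, h):
    """Estimate E(Zm | W) and E(Zm^2 | W) at W = (x0, stratum0).

    Only complete cases (v == 1) in the same stratum of the discrete
    components contribute; a uniform kernel in the observed time is used.
    zm[i] is never read when v[i] == 0.
    """
    num1 = num2 = den = 0.0
    for xi, si, zi, vi in zip(x, stratum, zm, v):
        if vi != 1 or si != stratum0:
            continue  # skip incomplete cases and other strata
        k = 0.5 if abs((x0 - xi) / h) <= 1.0 else 0.0
        num1 += k * zi
        num2 += k * zi * zi
        den += k
    if den == 0.0:
        raise ValueError("no complete cases within the bandwidth")
    return num1 / den, num2 / den
```

The two returned values are evaluated at each subject's own W and then plugged into steps 2 and 3 together with the observed Zc.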
When the selection probability π is unknown, π̂ in (5) with w = (x, δ, zc) can substitute for π in equation (7), resulting in the FAWE . When both the selection probabilities and the conditional expectations are estimated nonparametrically, a different kernel function may be used for each. For simplicity, we used the same kernel function for the theoretical derivations and the simulation studies. The following theorems present the asymptotic properties of , , and .
Theorem 3 Under the regularity conditions (a1) to (a5) given in the Appendix, is consistent for the true parameter β, and converges to N(0,Σ−1Σfaw(π)Σ−1) in distribution, with , where .
Theorem 4 Under the regularity conditions (a1) to (a10) given in the Appendix, , , and are consistent for the true parameter β. Moreover, , and are asymptotically normal with mean 0 and variance matrix Σ−1Σfaw(π)Σ−1.
All the FAWEs have the same asymptotic distribution, indicating that the asymptotic properties of the FAWEs are not affected by the nonparametric estimation of the selection probabilities and the conditional expectations. The FAWEs are more efficient than the SWE with true π and the SWEs with except that the SWE with nonparametric has the same asymptotic distribution as all the FAWEs. When Zm can be exactly specified by W = (X, δ, Zc), then the martingale transformation is constant given W, so , and . Therefore the SWE with and the kernel-assisted FAWEs achieve the efficiency of the estimator based on the full cohort data in this special situation.
Consistent estimators of Σfaw(π) can be obtained similarly to those for the SWEs. For illustration, we demonstrate how to estimate the variance Σfaw(π) for . Set
and
to be the estimators of dΛ0(t) and , respectively. Then ΣA and are estimated respectively by
and
where is obtained using the Nadaraya-Watson estimator in (12).
3. Numerical Studies
3.1. Simulation Study
A comprehensive simulation study was conducted to examine the moderate sample size performance of the SWEs and the kernel-assisted FAWEs and to compare their performance with that of the full-cohort (i.e. Lin & Ying’s estimator, 1994) and complete-case analyses. In all simulations, 1000 datasets were generated and either n = 250 or 500 subjects were used. In the first three simulation settings, we considered two independent covariates, a missing covariate Zm and an observed covariate Zc. In the first setting, a binary variable Zc was generated from the Bernoulli distribution with probability 0.5, and Zm followed a standard normal distribution. We generated the failure time using λ(t;Zm, Zc) = 1.5 − 0.5Zm + 1.0Zc, and the censoring time based on the exponential distribution with mean 1.5, resulting in about 45% censored observations. The first setting mimicked the case-cohort (Prentice, 1986) sampling scheme used in Kulich & Lin (2000), with π(δ) = δ + 0.5(1 − δ), so that we can compare our estimators with theirs.
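One replicate of the first setting could be generated along the following lines. This is a sketch under stated assumptions rather than the authors' code: the text does not say how negative hazards (possible for large Zm) were handled, so the sketch redraws Zm until the hazard is positive; and the censoring parameter is exposed as `censor_rate`, with the value 1.5 treated as an exponential rate because that parameterization appears to give roughly 45% censoring here.

```python
import random


def simulate_setting1(n, censor_rate=1.5, seed=None):
    """Generate one data set resembling the first simulation setting."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        zc = 1 if rng.random() < 0.5 else 0
        # Assumption: redraw Zm until the additive hazard is positive.
        while True:
            zm = rng.gauss(0.0, 1.0)
            hazard = 1.5 - 0.5 * zm + 1.0 * zc
            if hazard > 0.0:
                break
        t = rng.expovariate(hazard)       # constant hazard => exponential T
        c = rng.expovariate(censor_rate)  # independent censoring time
        x = min(t, c)
        delta = 1 if t <= c else 0
        pi = delta + 0.5 * (1 - delta)    # case-cohort style selection
        v = 1 if rng.random() < pi else 0
        data.append({"x": x, "delta": delta, "zc": zc,
                     "zm": zm if v == 1 else None, "v": v, "pi": pi})
    return data
```

Because the hazard is constant in t for this setting, the failure time given the covariates is exponential with rate λ0 + βᵀZ, which is why a single call to `expovariate` suffices.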
In the second setting, π(δ) = 0.7δ + 0.5(1 − δ), resulting in an overall missing rate of 39%. We considered correlated Zm and Zc in this setting. Specifically, Zc ~ Bernoulli(0.5), Zm = Zc − 0.5 + ϵ, where ϵ ~ N(0, 1), and the correlation between Zm and Zc is about 0.44. In the third setting, both Zm and Zc were generated from a Bernoulli distribution with probability 0.5. The hazard function λ(t;Zm, Zc) = 2t+0.8Zm +0.4Zc was used to generate the failure time and the censoring time was generated from the exponential distribution with mean 1, yielding about 44% censored observations. The selection probability was π(δ, X) = 1/{1+exp(1.5 − 2.5δ − X)}, and about 44% observations had missing Zm. In the fourth setting, we considered two missing covariates, Zm1 ~ N(0,1), Zm2 ~ Bernoulli(0.5), Zc ~ Bernoulli(0.5), β = (0.5,1.0,−0.5) and the baseline hazard function equal to 1.0. The censoring time was generated from exponential distribution with mean 0.5, yielding about 49% censored observations. The selection probability π(δ) = 0.6δ + 0.4(1 − δ), resulting in 50% observations with Zm1 and Zm2 missing.
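The reported missing rates follow from the selection probabilities and the censoring fractions. As a worked check, taking the censoring fractions reported for the second and fourth settings (about 45% and 49%):

```latex
\begin{aligned}
\text{Setting 2:}\quad \operatorname{pr}(V = 0) &\approx (1 - 0.7)(0.55) + (1 - 0.5)(0.45) = 0.165 + 0.225 = 0.39,\\
\text{Setting 4:}\quad \operatorname{pr}(V = 0) &\approx (1 - 0.6)(0.51) + (1 - 0.4)(0.49) = 0.204 + 0.294 \approx 0.50 .
\end{aligned}
```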
The theoretical standard errors for the SWEs and the kernel-assisted FAWEs were obtained from the corresponding variance estimators discussed in Section 2. The conditional expectations in the FAWEs were estimated by the Nadaraya-Watson estimators with the smoothing parameter h = 4σW n^{−1/3}, where σW was the standard deviation of the observed times stratified by δ and Zc.
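The bandwidth rule can be computed per stratum as sketched below. Two details are assumptions of the sketch, since the text does not specify them: the sample standard deviation (divisor n − 1) is used for σW, and n is taken to be the full cohort size rather than the stratum size.

```python
import statistics
from collections import defaultdict


def stratum_bandwidths(x, delta, zc, c=4.0):
    """Return {(delta, zc): h} with h = c * sd(observed times) * n**(-1/3)."""
    n = len(x)
    groups = defaultdict(list)
    for xi, di, zi in zip(x, delta, zc):
        groups[(di, zi)].append(xi)
    # Strata with a single observation are skipped: their sd is undefined.
    return {
        key: c * statistics.stdev(times) * n ** (-1.0 / 3.0)
        for key, times in groups.items()
        if len(times) > 1
    }
```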
We computed the following measures for all the estimators. Bias is the average difference between a parameter and its estimate. Relative bias is the bias divided by the true value of β. The sample standard error (SE) is the square root of the sample variance of the 1000 parameter estimates. The mean theoretical SE is the average of the 1000 standard error estimates. The 95% confidence interval coverage probabilities (CP) were calculated using the theoretical standard error estimates.
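These summary measures can be computed from the replicate-level output as in the sketch below, written for one component of β. The sign convention for bias (estimate minus true value) and the use of Wald-type intervals, estimate ± 1.96 × SE, are assumptions of the sketch.

```python
import statistics


def summarize(estimates, std_errors, true_beta, z=1.959963984540054):
    """Summarize simulation replicates for one regression coefficient."""
    bias = statistics.mean(estimates) - true_beta
    # A replicate covers the truth if |estimate - truth| <= z * SE.
    covered = [abs(b - true_beta) <= z * se
               for b, se in zip(estimates, std_errors)]
    return {
        "bias": bias,
        "relative_bias": bias / true_beta,
        "sample_se": statistics.stdev(estimates),
        "mean_theoretical_se": statistics.mean(std_errors),
        "coverage": sum(covered) / len(covered),
    }
```

Good agreement between `sample_se` and `mean_theoretical_se`, together with coverage near 0.95, is what indicates that the variance estimators are performing as intended.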
Table 1 presents the results from the first simulation setting. All estimated selection probabilities are consistent for the true π. There is no evidence of bias in any of the estimates except for the complete-case analysis, due to the strong association between the selection probability and the outcome variable δ. All FAWEs had smaller bias than other estimators for both β1 and β2. The sample standard errors are generally in good agreement with the corresponding mean theoretical standard errors. As the cohort size increased from 250 to 500, the sample and mean theoretical standard errors for became closer for all the weighted estimators, and the standard errors of SWE () became closer to those of the FAWEs.
Table 1.
Simulation comparison of various estimators of the additive hazards model parameter β = (−0.5, 1.0).
| n | Method | Bias β1 | Bias β2 | Relative bias β1 | Relative bias β2 | Sample SE β1 | Sample SE β2 | Theoretical SE β1 | Theoretical SE β2 | 95% CP β1 | 95% CP β2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 250 | Full cohort | −0.010 | 0.021 | −0.019 | 0.021 | 0.169 | 0.340 | 0.168 | 0.344 | 0.948 | 0.955 |
| | Complete case | −0.059 | 0.110 | −0.118 | 0.110 | 0.219 | 0.446 | 0.222 | 0.444 | 0.945 | 0.949 |
| | Kulich & Lin | −0.009 | 0.026 | −0.017 | 0.026 | 0.200 | 0.421 | 0.201 | 0.414 | 0.949 | 0.950 |
| | SWE - π(δ)^a | −0.009 | 0.026 | −0.017 | 0.026 | 0.200 | 0.421 | 0.195 | 0.406 | 0.941 | 0.941 |
| | SWE - π̂(δ)^b | −0.009 | 0.026 | −0.019 | 0.026 | 0.200 | 0.421 | 0.201 | 0.414 | 0.952 | 0.949 |
| | SWE - | −0.009 | 0.026 | −0.019 | 0.026 | 0.200 | 0.383 | 0.201 | 0.383 | 0.954 | 0.948 |
| | SWE - | −0.004 | 0.021 | −0.008 | 0.021 | 0.205 | 0.427 | 0.199 | 0.407 | 0.947 | 0.941 |
| | SWE - | −0.005 | 0.059 | −0.009 | 0.059 | 0.206 | 0.375 | 0.197 | 0.366 | 0.943 | 0.947 |
| | FAWE - π(δ)^a | −0.003 | 0.009 | 0.006 | 0.009 | 0.210 | 0.356 | 0.199 | 0.360 | 0.943 | 0.961 |
| | FAWE - π̂(δ)^b | −0.001 | 0.007 | 0.002 | 0.007 | 0.210 | 0.356 | 0.199 | 0.360 | 0.942 | 0.963 |
| | FAWE - | −0.002 | 0.008 | 0.004 | 0.008 | 0.210 | 0.356 | 0.200 | 0.360 | 0.942 | 0.962 |
| | FAWE - | −0.004 | 0.008 | 0.008 | 0.008 | 0.210 | 0.355 | 0.200 | 0.361 | 0.946 | 0.962 |
| | FAWE - | −0.005 | 0.008 | 0.010 | 0.008 | 0.213 | 0.355 | 0.200 | 0.361 | 0.933 | 0.961 |
| 500 | Full cohort | −0.001 | 0.008 | −0.002 | 0.008 | 0.114 | 0.238 | 0.115 | 0.240 | 0.956 | 0.947 |
| | Complete case | −0.049 | 0.104 | −0.099 | 0.104 | 0.154 | 0.304 | 0.152 | 0.307 | 0.939 | 0.948 |
| | Kulich & Lin | −0.001 | 0.016 | −0.003 | 0.016 | 0.138 | 0.285 | 0.139 | 0.289 | 0.945 | 0.957 |
| | SWE - π(δ)^a | −0.001 | 0.016 | −0.003 | 0.016 | 0.138 | 0.285 | 0.137 | 0.286 | 0.945 | 0.954 |
| | SWE - π̂(δ)^b | −0.001 | 0.016 | −0.003 | 0.016 | 0.137 | 0.285 | 0.139 | 0.289 | 0.945 | 0.955 |
| | SWE - | −0.002 | 0.015 | −0.003 | 0.015 | 0.138 | 0.267 | 0.139 | 0.267 | 0.945 | 0.948 |
| | SWE - | 0.000 | 0.014 | 0.000 | 0.014 | 0.140 | 0.288 | 0.138 | 0.285 | 0.943 | 0.951 |
| | SWE - | −0.000 | 0.036 | −0.000 | 0.036 | 0.138 | 0.259 | 0.136 | 0.250 | 0.940 | 0.934 |
| | FAWE - π(δ)^a | 0.002 | 0.003 | −0.005 | 0.003 | 0.141 | 0.244 | 0.137 | 0.248 | 0.936 | 0.950 |
| | FAWE - π̂(δ)^b | 0.003 | 0.001 | −0.007 | 0.001 | 0.141 | 0.245 | 0.137 | 0.248 | 0.936 | 0.950 |
| | FAWE - | 0.002 | 0.002 | −0.005 | 0.002 | 0.142 | 0.245 | 0.137 | 0.248 | 0.938 | 0.950 |
| | FAWE - | −0.001 | 0.002 | 0.001 | 0.002 | 0.142 | 0.245 | 0.138 | 0.248 | 0.942 | 0.950 |
| | FAWE - | −0.001 | 0.002 | 0.002 | 0.002 | 0.141 | 0.245 | 0.138 | 0.248 | 0.945 | 0.948 |
The baseline hazard was constant, λ0(t) = 1.5, with Zm ~ N(0, 1) and Zc ~ Bernoulli(0.5). The censoring time was generated from an exponential distribution. The censoring rate was 45% and the selection probability was π(δ) = δ + 0.5(1 − δ), mimicking case-cohort sampling.
a True π was used.
b π̂ was estimated nonparametrically based on the variable in the brackets. The other π̂ were obtained from the variables in the brackets using the Nadaraya-Watson estimator with a uniform kernel and bandwidth h = 4σW n^{−1/3}.
In addition, the estimator of Kulich & Lin (2000) and SWE (π(δ)) have the same point estimates because they share the same estimating equations. All weighted estimators have similar standard errors for , indicating that compared to the SWE with true π, using partially incomplete data in estimation did not improve the efficiency of estimates of β1. However, compared to SWE(π(δ)) the sample standard error for was reduced by using the SWEs with (e.g. about 9% when n = 250) and (about 11% when n = 250) and as well as all the kernel-assisted FAWEs (about 16% when n = 250). So the efficiency of was improved by including the incomplete data in estimation. Especially, all kernel-assisted FAWEs almost achieved the full-cohort efficiency for estimating β2. For n = 250, the ratios of sample and theoretical standard errors between the full-cohort estimator and the FAWEs were 96%. When cohort size was 500, the ratio was about 98% if calculated from sample standard errors, and about 97% if from theoretical standard errors.
The results from the second simulation setting, where Zm and Zc are correlated, show patterns similar to those from the first setting and are provided in Table 2. The bias of the complete-case method was smaller in this setting because the missing-data mechanism was less dependent on the censoring indicator δ (the selection probability was 70% for cases and 50% for controls).
Table 2.
Simulation comparison of various estimators of the additive hazards model parameter β = (−0.5, 1.0). The baseline hazard was a constant, λ0(t) = 1.5 and Zc ~ Bernoulli(0.5); Zm = Zc – 0.5 + ϵ, where ϵ ~ N(0, 1), and the correlation between Zm and Zc is about 0.44. The censoring time was generated from exponential distribution. Censoring rate 45% and selection probability π(δ) = 0.7δ + 0.5(1 – δ).
| n | Method | Bias β1 | Bias β2 | Relative bias β1 | Relative bias β2 | Sample SE β1 | Sample SE β2 | Theoretical SE β1 | Theoretical SE β2 | 95% CP β1 | 95% CP β2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 250 | Full cohort | −0.012 | 0.017 | 0.024 | 0.017 | 0.170 | 0.348 | 0.167 | 0.342 | 0.949 | 0.948 |
| | Complete case | −0.050 | 0.080 | 0.100 | 0.080 | 0.242 | 0.497 | 0.236 | 0.475 | 0.948 | 0.941 |
| | SWE - π(δ)^a | −0.019 | 0.025 | 0.038 | 0.025 | 0.227 | 0.467 | 0.212 | 0.442 | 0.930 | 0.943 |
| | SWE - π̂(δ)^b | −0.019 | 0.026 | 0.037 | 0.026 | 0.225 | 0.467 | 0.218 | 0.448 | 0.946 | 0.946 |
| | SWE - | −0.020 | 0.017 | 0.039 | 0.017 | 0.226 | 0.429 | 0.218 | 0.409 | 0.946 | 0.956 |
| | SWE - | −0.019 | 0.018 | 0.038 | 0.018 | 0.231 | 0.472 | 0.216 | 0.440 | 0.929 | 0.935 |
| | SWE - | −0.019 | 0.061 | 0.038 | 0.061 | 0.229 | 0.400 | 0.215 | 0.376 | 0.932 | 0.940 |
| | FAWE - π(δ)^a | −0.020 | 0.007 | 0.040 | 0.007 | 0.230 | 0.363 | 0.215 | 0.367 | 0.935 | 0.961 |
| | FAWE - π̂(δ)^b | −0.021 | 0.008 | 0.041 | 0.008 | 0.230 | 0.364 | 0.215 | 0.367 | 0.935 | 0.962 |
| | FAWE - | −0.020 | 0.008 | 0.041 | 0.008 | 0.230 | 0.364 | 0.216 | 0.368 | 0.935 | 0.962 |
| | FAWE - | −0.021 | 0.007 | 0.041 | 0.007 | 0.232 | 0.363 | 0.216 | 0.368 | 0.932 | 0.959 |
| | FAWE - | −0.020 | 0.007 | 0.039 | 0.007 | 0.231 | 0.362 | 0.216 | 0.368 | 0.928 | 0.961 |
| 500 | Full cohort | −0.010 | −0.007 | 0.019 | −0.007 | 0.114 | 0.235 | 0.116 | 0.239 | 0.952 | 0.952 |
| | Complete case | −0.047 | 0.059 | 0.093 | 0.059 | 0.161 | 0.331 | 0.162 | 0.329 | 0.953 | 0.959 |
| | SWE - π(δ)^a | −0.016 | 0.004 | 0.032 | 0.004 | 0.152 | 0.316 | 0.148 | 0.309 | 0.945 | 0.951 |
| | SWE - π̂(δ)^b | −0.016 | 0.005 | 0.032 | 0.005 | 0.151 | 0.315 | 0.151 | 0.311 | 0.954 | 0.952 |
| | SWE - | −0.016 | 0.003 | 0.032 | 0.003 | 0.151 | 0.287 | 0.151 | 0.284 | 0.950 | 0.948 |
| | SWE - | −0.015 | 0.003 | 0.029 | 0.003 | 0.155 | 0.323 | 0.150 | 0.308 | 0.943 | 0.952 |
| | SWE - | −0.014 | 0.026 | 0.027 | 0.026 | 0.155 | 0.265 | 0.149 | 0.259 | 0.934 | 0.945 |
| | FAWE - π(δ)^a | −0.015 | −0.013 | 0.030 | −0.013 | 0.155 | 0.246 | 0.148 | 0.254 | 0.937 | 0.959 |
| | FAWE - π̂(δ)^b | −0.015 | −0.013 | 0.030 | −0.013 | 0.155 | 0.246 | 0.148 | 0.254 | 0.941 | 0.959 |
| | FAWE - | −0.015 | −0.013 | 0.030 | −0.013 | 0.155 | 0.246 | 0.149 | 0.254 | 0.940 | 0.959 |
| | FAWE - | −0.015 | −0.013 | 0.030 | −0.013 | 0.157 | 0.245 | 0.149 | 0.255 | 0.941 | 0.959 |
| | FAWE - | −0.015 | −0.013 | 0.030 | −0.013 | 0.157 | 0.245 | 0.149 | 0.255 | 0.937 | 0.959 |
a True π was used.
b π̂ was estimated nonparametrically based on the variable in the brackets. The other π̂ were obtained from the variables in the brackets using the Nadaraya-Watson estimator with a uniform kernel and bandwidth h = 4σW n^{−1/3}.
Table 3 displays the results from the third simulation setting where both Zm and Zc were binary and the selection probability depended on both the censoring indicator and the survival time. The results from this setting show similar patterns as those from the first two settings, except that the sample and mean theoretical standard errors of are not in good agreement with each other and its 95% confidence interval coverage probability is low when the sample size was 250. This occurred because the based on δ alone was inconsistent for the true π(δ, X). These issues are reduced when the sample size increases to 500. The , however, does not have these issues and still performs well under both sample sizes. Because the selection probability depended heavily on both survival time and censoring indicator, the bias for estimates from the complete-case analysis was elevated to about 15% of the true parameter value.
Table 3.
Simulation comparison of various estimators of the additive hazards model parameter β = (0.8, 0.4). The baseline hazard was a linear function of time, λ0(t) = 2t, with Zm ~ Bernoulli(0.5) and Zc ~ Bernoulli(0.5). The censoring time was generated from an exponential distribution. The censoring rate was 44% and the selection probability was π(δ, X) = 1/{1 + exp(1.5 − 2.5δ − X)}.
| n | Method | Bias β1 | Bias β2 | Relative bias β1 | Relative bias β2 | Sample SE β1 | Sample SE β2 | Theoretical SE β1 | Theoretical SE β2 | 95% CP β1 | 95% CP β2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 250 | Full cohort | 0.010 | 0.003 | 0.013 | 0.008 | 0.218 | 0.223 | 0.226 | 0.218 | 0.957 | 0.951 |
| | Complete case | 0.125 | 0.059 | 0.156 | 0.149 | 0.315 | 0.314 | 0.314 | 0.305 | 0.942 | 0.946 |
| | SWE - π(δ,X)^a | 0.027 | 0.012 | 0.034 | 0.008 | 0.303 | 0.303 | 0.292 | 0.281 | 0.940 | 0.925 |
| | SWE - π̂(δ)^b | −0.016 | −0.009 | −0.020 | −0.006 | 0.320 | 0.312 | 0.279 | 0.267 | 0.908 | 0.906 |
| | SWE - | 0.023 | 0.007 | 0.029 | 0.005 | 0.307 | 0.304 | 0.287 | 0.277 | 0.936 | 0.921 |
| | SWE - | 0.023 | 0.024 | 0.029 | 0.017 | 0.301 | 0.256 | 0.282 | 0.235 | 0.935 | 0.938 |
| | FAWE - π(δ,X)^a | 0.026 | 0.016 | 0.032 | 0.012 | 0.301 | 0.247 | 0.285 | 0.233 | 0.938 | 0.943 |
| | FAWE - π̂(δ)^b | 0.029 | 0.016 | 0.037 | 0.011 | 0.308 | 0.247 | 0.317 | 0.238 | 0.946 | 0.947 |
| | FAWE - | 0.028 | 0.016 | 0.035 | 0.012 | 0.306 | 0.247 | 0.286 | 0.234 | 0.937 | 0.941 |
| | FAWE - | 0.027 | 0.016 | 0.034 | 0.012 | 0.307 | 0.247 | 0.283 | 0.233 | 0.938 | 0.942 |
| 500 | Full cohort | 0.011 | 0.004 | 0.014 | 0.011 | 0.153 | 0.149 | 0.159 | 0.153 | 0.956 | 0.969 |
| | Complete case | 0.120 | 0.063 | 0.149 | 0.157 | 0.216 | 0.211 | 0.219 | 0.213 | 0.929 | 0.945 |
| | SWE - π(δ,X)^a | 0.017 | 0.009 | 0.021 | 0.007 | 0.208 | 0.201 | 0.208 | 0.200 | 0.955 | 0.950 |
| | SWE - π̂(δ)^b | −0.032 | −0.016 | −0.040 | −0.012 | 0.219 | 0.210 | 0.203 | 0.194 | 0.922 | 0.932 |
| | SWE - | 0.012 | 0.008 | 0.015 | 0.006 | 0.212 | 0.208 | 0.204 | 0.197 | 0.936 | 0.939 |
| | SWE - | 0.011 | 0.015 | 0.014 | 0.011 | 0.211 | 0.167 | 0.201 | 0.163 | 0.937 | 0.953 |
| | FAWE - π(δ,X)^a | 0.014 | 0.011 | 0.018 | 0.008 | 0.212 | 0.160 | 0.201 | 0.163 | 0.942 | 0.961 |
| | FAWE - π̂(δ)^b | 0.016 | 0.010 | 0.019 | 0.007 | 0.216 | 0.160 | 0.230 | 0.167 | 0.964 | 0.963 |
| | FAWE - | 0.014 | 0.010 | 0.018 | 0.007 | 0.214 | 0.160 | 0.203 | 0.163 | 0.934 | 0.964 |
| | FAWE - | 0.015 | 0.010 | 0.018 | 0.007 | 0.215 | 0.160 | 0.202 | 0.163 | 0.934 | 0.961 |
a True π was used.
b π̂ was estimated nonparametrically based on the variable in the brackets. The other π̂ were obtained from the variables in the brackets using the Nadaraya-Watson estimator with a uniform kernel and bandwidth h = 4σW n^{−1/3}.
The results from the fourth setting with a mixture of continuous and binary missing covariates, Zm1 and Zm2, and a binary Zc, showed similar patterns as those from the first three settings (Table 4).
Table 4.
Simulation comparison of various estimators of the additive hazards model parameter β = (0.5, 1.0, −0.5). The baseline hazard was a linear function of time, λ0(t) = 1.0 and , , Zc ~ Bernoulli(0.5). The censoring time was generated from exponential distribution. Censoring rate 49% and selection probability π(δ) = 0.6δ + 0.4(1 – δ).
| | | Bias | | | Relative bias | | | Sample SE | | | Theoretical SE | | | 95% CP | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n | Method | β1 | β2 | β3 | β1 | β2 | β3 | β1 | β2 | β3 | β1 | β2 | β3 | β1 | β2 | β3 |
| 500 | Full cohort | −0.001 | 0.009 | 0.003 | −0.003 | 0.009 | −0.006 | 0.138 | 0.280 | 0.269 | 0.136 | 0.279 | 0.271 | 0.943 | 0.944 | 0.950 |
| | Complete case | 0.044 | 0.096 | −0.052 | 0.087 | 0.096 | 0.103 | 0.220 | 0.433 | 0.411 | 0.216 | 0.434 | 0.426 | 0.938 | 0.943 | 0.960 |
| | SWE - π(δ,X,Zc)a | 0.003 | 0.016 | −0.014 | 0.006 | 0.016 | 0.028 | 0.202 | 0.403 | 0.381 | 0.192 | 0.396 | 0.386 | 0.939 | 0.948 | 0.957 |
| | SWE - (δ)b | 0.003 | 0.018 | −0.015 | 0.006 | 0.018 | 0.030 | 0.201 | 0.403 | 0.383 | 0.190 | 0.387 | 0.378 | 0.938 | 0.945 | 0.948 |
| | SWE - | 0.000 | 0.010 | −0.009 | 0.001 | 0.010 | 0.017 | 0.204 | 0.410 | 0.442 | 0.194 | 0.393 | 0.389 | 0.940 | 0.943 | 0.919 |
| | SWE - | 0.004 | 0.008 | −0.030 | 0.008 | 0.008 | 0.059 | 0.208 | 0.410 | 0.318 | 0.193 | 0.391 | 0.306 | 0.928 | 0.941 | 0.945 |
| | FAWE - π(δ,X,Zc)a | 0.005 | 0.010 | −0.026 | 0.010 | 0.010 | 0.051 | 0.208 | 0.413 | 0.307 | 0.232 | 0.471 | 0.326 | 0.969 | 0.974 | 0.967 |
| | FAWE - (δ)b | 0.005 | 0.010 | −0.026 | 0.010 | 0.010 | 0.051 | 0.208 | 0.413 | 0.307 | 0.194 | 0.392 | 0.304 | 0.926 | 0.939 | 0.949 |
| | FAWE - | 0.005 | 0.012 | −0.025 | 0.011 | 0.012 | 0.051 | 0.210 | 0.417 | 0.307 | 0.194 | 0.394 | 0.305 | 0.927 | 0.945 | 0.946 |
| | FAWE - | 0.005 | 0.011 | −0.026 | 0.011 | 0.011 | 0.051 | 0.211 | 0.418 | 0.308 | 0.193 | 0.391 | 0.305 | 0.922 | 0.939 | 0.949 |
a. True π was used.
b. π̂ was estimated nonparametrically based on the variable in the bracket. Other π̂ was obtained on the variables in the bracket using the Nadaraya-Watson estimator with uniform kernel and bandwidth h = 4σ_W n^{−1/3}.
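The column summaries in the simulation tables can be reproduced from the per-replicate output of a simulation. The following is a minimal sketch, not the authors' code. It assumes that relative bias is bias divided by the true coefficient (which agrees with the tabulated values up to rounding), that the theoretical SE is the average of the estimated standard errors, and that the 95% CP is the proportion of Wald intervals covering the truth. The function name `summarize_replicates` is hypothetical.

```python
import numpy as np

def summarize_replicates(estimates, std_errors, beta_true, z_crit=1.96):
    """Summarize Monte Carlo replicates for a single coefficient.

    estimates, std_errors: one entry per simulated data set.
    beta_true: the coefficient value used to generate the data.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias = estimates.mean() - beta_true
    # Wald-type interval for each replicate (assumed form of the 95% CP).
    lower = estimates - z_crit * std_errors
    upper = estimates + z_crit * std_errors
    covered = (lower <= beta_true) & (beta_true <= upper)
    return {
        "bias": bias,
        "relative_bias": bias / beta_true,    # assumed definition
        "sample_se": estimates.std(ddof=1),   # empirical SD of the estimates
        "theoretical_se": std_errors.mean(),  # assumed: mean estimated SE
        "cp95": covered.mean(),
    }
```

A sample SE close to the theoretical SE, together with a 95% CP near 0.95, indicates that the variance estimator is working; large gaps, as in some SWE rows, point to problems with the estimated selection probabilities.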
In summary, the results from the simulation studies suggest that (1) the kernel-assisted FAWEs and most of the SWEs with nonparametric π̂ are more efficient than the SWE with true π; (2) the SWEs with π̂ are, most of the time, not as efficient as the kernel-assisted FAWEs, and the efficiency of the most efficient SWE approaches that of the FAWEs as the sample size increases; (3) the complete-case analysis generates inconsistent estimates when the true π depends on the outcome variables; (4) all FAWEs and the SWEs with true π or consistent π̂ correct such bias; (5) an inconsistent π̂ may affect the variance estimation of the SWEs but not that of the kernel-assisted FAWEs.
3.2. Application to the Alzheimer’s Disease Neuroimaging Initiative Data
We illustrate our methods using a data set from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, a large depository of clinical and demographic data of Alzheimer’s disease (AD) patients, as well as their longitudinal outcome and imaging measurements (adni.loni.usc.edu). The ADNI, led by Dr. Michael W. Weiner at VA Medical Center and University of California San Francisco, was launched in 2003 by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, the Food and Drug Administration, pharmaceutical companies and non-profit organizations as a public-private partnership. The ADNI had several goals including AD pathophysiology investigation, diagnostic tool improvement and biomarker development.
Our example involves data on the progression from mild cognitive impairment (MCI) to dementia from the first phase of ADNI (ADNI-1), in which 800 patients were enrolled from over 50 sites across the U.S. and Canada in 2005–2007. Patients were advised to have an office visit every six months for a period of two years, followed by a visit after a one-year interval. A total of 382 patients had an MCI diagnosis at the initial visit and had at least one follow-up visit. The midpoint between two visits was used as the time-to-event, as in conventional analyses of follow-up survival outcomes in practice. Among the 382 patients, 159 developed dementia and 223 were censored. The median follow-up time for patients without a dementia diagnosis at subsequent visits was 36 months. Baseline covariates were obtained during the initial visit to address various questions. Here we consider the association between the risk of developing dementia from MCI and two covariates: the APOE-e4 status, a binary variable with 1 indicating the presence of APOE-e4 and 0 otherwise, and Abeta, a biomarker with continuous expression levels. All patients had APOE-e4 status recorded, while the Abeta expression levels were available only for the 192 patients who consented to a lumbar puncture. The standardized Abeta expression level was included in the analysis, and each unit of the standardized variable represented 50 in the original expression level. We applied the complete-case analysis, the SWEs and the kernel-assisted FAWEs with estimated selection probabilities to the data. For the SWEs and the FAWEs, selection probabilities were estimated based on (i) the censoring indicator δ only, (ii) the observed time X and δ, and (iii) X, δ and the APOE-e4 status. When X was included in estimating the selection probabilities, the Nadaraya-Watson estimator was used with bandwidth h = 4σ_W n^{−1/3}. Conditional expectations were estimated by the Nadaraya-Watson estimator (12) based on X, δ and the APOE-e4 status.
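The kernel step for the selection probabilities can be illustrated as follows. This is a hedged sketch rather than the authors' estimator (5): it assumes a single continuous variable (the observed time X), stratifies on one discrete variable (δ), uses a uniform kernel, and takes σ_W to be the sample standard deviation of X. The function name `nw_selection_prob` and its arguments are hypothetical.

```python
import numpy as np

def nw_selection_prob(x, delta, observed, h=None):
    """Nadaraya-Watson estimate of P(covariate observed | X = x, delta).

    x: continuous variable (e.g. observed time), one value per subject.
    delta: discrete variable (e.g. censoring indicator) defining strata.
    observed: 1 if the covariate was measured for the subject, else 0.
    h: bandwidth; defaults to 4 * sd(x) * n**(-1/3) (assumed reading of
       h = 4 sigma_W n^(-1/3)).
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta)
    observed = np.asarray(observed, dtype=float)
    n = x.size
    if h is None:
        h = 4.0 * x.std(ddof=1) * n ** (-1.0 / 3.0)
    # Uniform kernel: subject j is a neighbor of i when |x_i - x_j| <= h.
    close = np.abs(x[:, None] - x[None, :]) <= h
    # Only subjects with the same discrete value contribute.
    same_stratum = delta[:, None] == delta[None, :]
    weights = (close & same_stratum).astype(float)
    # Local proportion of subjects with the covariate observed. Each subject
    # is its own neighbor, so the denominator is never zero.
    return weights @ observed / weights.sum(axis=1)
```

With a uniform kernel the estimate is simply the local proportion of subjects with the covariate observed, so the kernel's normalizing constant cancels. An estimated probability of exactly zero can occur in small neighborhoods and would need handling before forming inverse-probability weights.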
Results in Table 5 show that the Abeta level was significantly associated with the risk of dementia after adjusting for APOE-e4 status. For patients with the same APOE-e4 status, a higher Abeta expression level was associated with a lower risk of dementia. However, APOE-e4 status was not statistically significant regardless of the method used.
Table 5.
Analysis of the ADNI dementia data using the additive hazards model and various estimators.
| | Abeta | | | APOE-e4 | | |
|---|---|---|---|---|---|---|
| Method | Estimate | SE | p–value | Estimate | SE | p–value |
| Complete-case | −0.0055 | 0.0016 | 0.0009 | 0.0035 | 0.0047 | 0.46 |
| SWE - | −0.0052 | 0.0016 | 0.0009 | 0.0044 | 0.0041 | 0.28 |
| SWE - | −0.0054 | 0.0012 | < 0.0001 | 0.0037 | 0.0034 | 0.27 |
| SWE - | −0.0052 | 0.0012 | < 0.0001 | 0.0063 | 0.0041 | 0.13 |
| FAWE - | −0.0053 | 0.0015 | 0.0004 | 0.0057 | 0.0037 | 0.12 |
| FAWE - | −0.0052 | 0.0016 | 0.0011 | 0.0058 | 0.0039 | 0.14 |
| FAWE - | −0.0053 | 0.0016 | 0.0008 | 0.0057 | 0.0039 | 0.15 |
π̂ was obtained on the variables in the bracket using the Nadaraya-Watson estimator with uniform kernel and bandwidth h = 4σ_W n^{−1/3}.
3.3. Application to the Mouse Leukemia Study
We also illustrate our methods using the data set from the mouse leukemia study (Kalbfleisch and Prentice, 1980). This study, conducted in the laboratories of Dr. Robert Nowinski of the Fred Hutchinson Cancer Research Center, Seattle, Washington, investigated genetic and viral factors in the development of spontaneous leukemia in mice. A total of 204 mice were followed for 2 years for mortality due to thymic or nonthymic leukemia, or other natural causes. Two covariates, the Gpd-1 phenotype and the level of endogenous murine leukemia virus, were of interest. Almost all mice had the level of endogenous murine leukemia virus measured. The Gpd-1 phenotype was obtained for 100 mice that survived 400 days, so whether the Gpd-1 phenotype was observed on a mouse depends on its follow-up time. The MAR assumption seems appropriate here since the missingness was caused by design.
Following previous publications (e.g. Wang and Chen, 2001; Qi et al., 2005), we excluded the animals with missing endogenous murine leukemia virus levels for computational simplicity, leaving a total of 175 mice for the analysis. The virus level was classified into two categories, with Zc = 0 if the virus level was < 10^4 PFU/ml and Zc = 1 otherwise. We conducted separate analyses with death from thymic leukemia and death from thymic or nonthymic leukemia as the endpoint, respectively. We obtained the estimates of regression coefficients in the additive hazards model using the complete-case analysis, the SWEs and the kernel-assisted FAWEs. To estimate the selection probabilities for the SWEs and the FAWEs, we applied the Nadaraya-Watson estimator in (5) with bandwidth h = 4σ_W n^{−1/3}. Conditional expectations were estimated by the Nadaraya-Watson estimator (12).
Results in Table 6 show that only the complete-case analysis indicated a significant association between the Gpd-1 phenotype and death from thymic leukemia after adjusting for the virus load, while for death from thymic or nonthymic leukemia, none of the methods gave significant results. For the virus load, all methods except the complete-case analysis found significant associations with death from thymic leukemia and with death from thymic or nonthymic leukemia. The FAWEs sometimes have slightly smaller SEs than the SWEs, especially for the coefficient of the observed covariate, virus load. This trend is consistent with what we have seen in the simulation results.
Table 6.
Analysis of the Mouse Leukemia data using the additive hazards model and various estimators.
| | Gpd-1 | | | Virus load | | |
|---|---|---|---|---|---|---|
| Method | Estimate | SE | p–value | Estimate | SE | p–value |
| Thymic leukemia death | ||||||
| Complete-case | −0.0167 | 0.0069 | 0.0153 | 0.0072 | 0.0047 | 0.1288 |
| SWE - | −0.0145 | 0.0081 | 0.0723 | 0.0173 | 0.0068 | 0.0107 |
| SWE - | −0.0149 | 0.0078 | 0.0578 | 0.0207 | 0.0063 | 0.0010 |
| FAWE - | −0.0144 | 0.0084 | 0.0857 | 0.0235 | 0.0065 | 0.0003 |
| FAWE - | −0.0150 | 0.0080 | 0.0597 | 0.0233 | 0.0063 | 0.0002 |
| Thymic or nonthymic leukemia death | ||||||
| Complete-case | −0.0144 | 0.0075 | 0.0561 | 0.0107 | 0.0061 | 0.0781 |
| SWE - | −0.0137 | 0.0086 | 0.1090 | 0.0230 | 0.0074 | 0.0018 |
| SWE - | −0.0131 | 0.0084 | 0.1170 | 0.0233 | 0.0070 | 0.0009 |
| FAWE - | −0.0148 | 0.0088 | 0.0925 | 0.0262 | 0.0073 | 0.0003 |
| FAWE - | −0.0152 | 0.0085 | 0.0743 | 0.0259 | 0.0072 | 0.0003 |
π̂ was obtained on the variables in the bracket using the Nadaraya-Watson estimator with uniform kernel and bandwidth h = 4σ_W n^{−1/3}.
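The p-values in Tables 5 and 6 are consistent with two-sided Wald tests based on the asymptotic normality of the estimators. The sketch below assumes that form of test, which is not stated explicitly in this section; the function name `wald_p_value` is hypothetical. Because the tabulated estimates and SEs are rounded, recomputed p-values match the reported ones only approximately.

```python
import math

def wald_p_value(estimate, se):
    """Two-sided p-value for H0: beta = 0 under a normal approximation."""
    z = estimate / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Complete-case Gpd-1 row for thymic leukemia death in Table 6.
p = wald_p_value(-0.0167, 0.0069)
print(round(p, 4))
```

For this row the recomputed value is roughly 0.0155, close to the reported 0.0153.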
4. Discussion
Missing covariates complicate the analysis of survival data. Naively discarding subjects with missing covariates can generate inconsistent and inefficient estimates. The additive hazards model is a useful alternative to the commonly used Cox PH model, especially when the primary interest is to estimate the difference in disease risk for the covariates or when the proportional hazards assumption is violated. Assuming the missingness is MAR, we proposed the SWEs and kernel-assisted FAWEs for the additive hazards model. By using nonparametric smoothing techniques, the proposed SWEs and FAWEs are robust against model misspecifications for the selection probability and the conditional expectation of missing covariates, which is an advantage over the existing methods in the literature. The proposed weighted estimators are consistent and asymptotically normal, and can improve on the efficiency of the estimates from the SWE with true π as well as from the complete-data analysis. All the weighted estimators possess an explicit expression, an advantage of using the additive hazards model over the Cox PH model.
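To make the explicit-expression point concrete, the sketch below computes a weighted version of the Lin and Ying (1994) closed-form estimator for time-independent covariates. It is an illustration under stated assumptions, not the authors' implementation: the weights are taken to be the usual inverse-probability form w_i = V_i/π_i (V_i = 1 if the covariate is observed), the integral runs to the largest observed time, and the function name `weighted_additive_hazards` is hypothetical.

```python
import numpy as np

def weighted_additive_hazards(time, event, Z, weights):
    """Closed-form weighted estimate of beta in lambda(t|Z) = lambda0(t) + beta'Z.

    Solves A beta = b with
      A = sum_i w_i * integral of Y_i(t) {Z_i - Zbar(t)}{Z_i - Zbar(t)}' dt
      b = sum_i w_i * event_i * {Z_i - Zbar(X_i)}
    where Zbar(t) is the weighted covariate mean over subjects still at risk.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(time), -1)
    w = np.asarray(weights, dtype=float)
    keep = w > 0  # zero-weight subjects (missing covariates) drop out
    time, event, Z, w = time[keep], event[keep], Z[keep], w[keep]
    p = Z.shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    previous = 0.0
    for t in np.unique(time):  # sorted distinct observed times
        at_risk = time >= t    # risk set is constant on (previous, t]
        wr = w[at_risk]
        zbar = wr @ Z[at_risk] / wr.sum()
        centered = Z[at_risk] - zbar
        A += (t - previous) * (centered * wr[:, None]).T @ centered
        fails = (time == t) & (event == 1)
        b += w[fails] @ (Z[fails] - zbar)
        previous = t
    return np.linalg.solve(A, b)
```

Setting all weights to 1 gives the full-cohort estimator, and restricting to complete cases with unit weights gives the complete-case analysis. No iterative optimization is needed, in contrast to the partial likelihood for the Cox PH model.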
The proposed SWEs and FAWEs extend the SWE of Kulich & Lin (2000) for case-cohort studies (Prentice, 1986) to general missing-data mechanisms. The asymptotic distribution theory of the SWEs with nonparametric π̂ suggests that the more variables are used in obtaining π̂, the more efficiency may potentially be gained. So the SWE with π̂ based on all the variables in W has the best efficiency among all the SWEs. Although this SWE has the same asymptotic distribution as the kernel-assisted FAWEs, our simulation studies and the data analysis example suggest that the FAWEs tend to perform better than it when sample sizes are moderate. In addition, the FAWEs are robust to misspecification of the selection probabilities due to their double robustness property.
The proposed methods can utilize surrogate variables to predict the missing covariates and the selection probability for increased efficiency. In this case, the surrogate variables should be considered as a part of the observed data under the MAR assumption and included as elements in W.
These weighted methods can be applied to situations where missing covariates occur by happenstance or by design, such as two-phase studies where selection probabilities are known. For two-phase studies with moderate sample sizes, we suggest the use of the FAWE with true selection probabilities. When the sample sizes are large and missingness rates are not extreme, one can employ either the SWE with π̂ or the kernel-assisted FAWE with π, and the estimators will generate consistent estimates with similar efficiency.
Supplementary Material
Acknowledgement
The authors wish to thank the editor, associate editor and the referee for their constructive comments and suggestions, which have greatly improved the paper. The authors thank the ADNI study and database for allowing us to use their data (the full acknowledgements section of the ADNI study is in the online supplementary material). Many thanks to Dr. Danielle Harvey for her valuable help with obtaining the ADNI data set and for suggestions regarding the data analysis. Thanks to Wei Ran, Yueheng An, Yiming Hu, and Nan Bi for their help. Dr. Yichuan Zhao was partially supported by the NSF grant DMS-1406163 and NSA grant H98230-12-1-0209. Dr. Yanqing Sun was partially supported by the National Science Foundation grants DMS-1208978 and DMS-1513072, and the National Institutes of Health NIAID grant R37 AI054165.
Appendix
The following regularity conditions are needed in the proofs of Theorems 1 to 4.
(a1) Λ0(τ) < ∞.
(a2) P{Y (τ) = 1} > 0.
(a3) Z is time-independent and bounded.
(a4) The matrix is positive definite.
(a5) W has bounded support . There exists a constant π0 > 0 such that π(w) > π0 for .
(a6) The selection probability π(w) has r continuous and bounded partial derivatives with respect to the continuous components of W a.e.
(a7) The probability density/mass function f(w) of W and the conditional probability density/mass function f_{W|V}(w) of W | V have r continuous and bounded partial derivatives with respect to the continuous components of W a.s.
(a8) Conditional distributions f_{W|V=0}(w) and f_{W|V=1}(w) have the same support, and c(w) = f_{W|V=0}(w)/f_{W|V=1}(w) is bounded over the support.
(a9) The conditional expectations E(Zk | W = w), E{(Zk)⊗2|W = w}, k = 0,1, have r continuous and bounded partial derivatives with respect to the continuous components of W a.e.
(a10) nh^{2d} → ∞ and nh^{2r} → 0, as n → ∞.
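As a rough check of condition (a10), read as nh^{2d} → ∞ and nh^{2r} → 0, against the bandwidth rate used in the numerical studies, suppose h = c n^{−1/3} for a constant c > 0 (the form h = 4σ_W n^{−1/3} has this rate). Interpreting d as the number of continuous components of W is an assumption here, since d is not defined in this section.

```latex
\begin{align*}
n h^{2d} &= c^{2d}\, n^{\,1 - 2d/3} \to \infty
  \quad\Longleftrightarrow\quad d < \tfrac{3}{2}, \\
n h^{2r} &= c^{2r}\, n^{\,1 - 2r/3} \to 0
  \quad\Longleftrightarrow\quad r > \tfrac{3}{2}.
\end{align*}
```

With integer d and r, the n^{−1/3} rate is therefore compatible with (a10) when d = 1 and r ≥ 2, which matches the applications above, where the observed time X is the only continuous variable used for smoothing.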
Footnotes
Supporting information. Additional information for this article is available online. The proofs of the theorems are in Appendix S1, and the full acknowledgements section of the ADNI study is in Appendix S2.
“For the Alzheimer’s Disease Neuroimaging Initiative**”
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
The online version of this article contains supplementary material.
References
- Aalen OO (1980). A model for nonparametric regression analysis of counting processes. In Lecture Notes in Statistics, 2, Ed. Klonecki N, Kosek A & Rosinski J, pp. 1–25. New York: Springer.
- Andersen PK and Gill RD (1982). Cox’s regression model for counting processes: A large sample study. Annals of Statistics, 10, 1100–1120.
- Breslow NE and Day NE (1987). Statistical Methods in Cancer Research, Vol. II: The Design and Analysis of Cohort Studies. Lyon: IARC.
- Breslow NE, Lumley T, Ballantyne CM, Chambless LE, and Kulich M (2009). Improved Horvitz-Thompson estimation of model parameters from two-phase stratified samples: Applications in epidemiology. Statistics in Biosciences, 1, 32–49.
- Cox DR (1972). Regression models and life tables (with Discussion). Journal of the Royal Statistical Society: Series B, 34, 187–220.
- Cox DR and Oakes D (1984). Analysis of Survival Data. London: Chapman and Hall.
- Kalbfleisch JD and Prentice RL (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.
- Kulich M and Lin DY (2000). Additive hazards regression for case-cohort studies. Biometrika, 87, 73–87.
- Lin DY and Ying Z (1994). Semiparametric analysis of the additive risk model. Biometrika, 81, 61–71.
- Lin DY and Ying Z (1997). Additive regression models for survival data. In Proceedings of the First Seattle Symposium in Biostatistics: Survival Analysis, Ed. Lin DY and Fleming TR, pp. 185–198. New York: Springer.
- Little RJA and Rubin DB (1987). Statistical Analysis with Missing Data. New York: Wiley.
- Mark SD and Katki HA (2006). Specifying and implementing nonparametric and semiparametric survival estimators in two-stage (nested) cohort studies with missing case data. Journal of the American Statistical Association, 101, 460–471.
- Nadaraya EA (1964). On estimating regression. Theory of Probability and Its Applications, 9, 141–142.
- Prentice RL (1986). A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika, 73, 1–11.
- Qi L, Wang CY, and Prentice RL (2005). Weighted estimators for proportional hazards regression with missing covariates. Journal of the American Statistical Association, 100, 1250–1263.
- Sun Y, Qian X, Shou Q, and Gilbert P (2017). Analysis of two-phase sampling data with semiparametric additive hazards models. Lifetime Data Analysis, 23, 377–399.
- Thomas DC (1986). Use of auxiliary information in fitting nonproportional hazards models. In Modern Statistical Methods in Chronic Disease Epidemiology, Ed. Moolgavkar SH and Prentice RL, pp. 197–210. New York: Wiley.
- Wang CY and Chen HY (2001). Augmented inverse probability weighted estimator for Cox missing covariate regression. Biometrics, 57, 414–419.
- Wang CY, Wang SJ, Zhao LP, and Ou ST (1997). Weighted semiparametric estimation in regression analysis with missing covariate data. Journal of the American Statistical Association, 92, 512–525.
- Wang S and Wang CY (2001). A note on kernel assisted estimators in missing covariate regression. Statistics & Probability Letters, 55, 439–449.
- Watson GS (1964). Smooth regression analysis. Sankhyā A, 26, 359–372.
