Abstract
Standard methods for estimating the effect of a time-varying exposure on survival may be biased in the presence of time-dependent confounders themselves affected by prior exposure. This problem can be overcome by inverse probability weighted estimation of Marginal Structural Cox Models (Cox MSM), g-estimation of Structural Nested Accelerated Failure Time Models (SNAFTM) and g-estimation of Structural Nested Cumulative Failure Time Models (SNCFTM). In this paper, we describe a data generation mechanism that approximately satisfies a Cox MSM, an SNAFTM and an SNCFTM. Besides providing a procedure for data simulation, our formal description of a data generation mechanism that satisfies all three models allows one to assess the relative advantages and disadvantages of each modeling approach. A simulation study is also presented to compare effect estimates across the three models.
1 Introduction
Commonly used methods to estimate the effect of a time-varying treatment on mortality model the hazard at time t conditional on treatment and covariate history through time t (e.g., a Cox model) (Cox and Oakes, 1984). This standard approach, however, may be biased in the presence of a time-dependent covariate (Robins, 1986; Hernán et al, 2004) that is:
- a time-dependent confounder, i.e., it affects both future risk of failure and future treatment, and
- itself affected by past treatment.
As an example, consider an observational study of the effect of diet on risk of coronary heart disease. The time-varying covariate “diagnosis of diabetes” is a time-dependent confounder because a diagnosis of diabetes affects future dietary choices and is a risk factor for coronary heart disease. In addition, prior diet affects future risk of diabetes.
Robins and collaborators have developed methods to appropriately adjust for measured time-varying confounders that are affected by past treatment (for a review of these methods see Robins and Hernán (2009)). In the high-dimensional failure time setting, these methods include inverse probability weighting of marginal structural Cox models (Cox MSM) (Robins, 1998a; Hernán et al, 2000), g-estimation of structural nested accelerated failure time models (SNAFTM) (Robins et al, 1992, 1993; Hernán et al, 2005), and g-estimation of structural nested cumulative failure time models (SNCFTM) (Page et al, 2008; Picciotto et al, 2008, 2009).
This paper describes the relations between these three models. In previous work (Young et al, 2008) we described a data generation mechanism (with no modification of the treatment effect by time-varying covariates) that satisfied both a Cox MSM and an SNAFTM. In this paper, we describe a data generation mechanism that approximately satisfies a Cox MSM, an SNAFTM and an SNCFTM. Besides providing a procedure for data simulation, our formal description of a data generation mechanism that satisfies all three models allows one to assess the relative advantages and disadvantages of each modeling approach.
This paper is structured as follows. In §2 we describe the data structure of interest. In §3 we review general definitions of the SNAFTM, Cox MSM and SNCFTM and briefly describe associated estimation procedures and inference. In §4 we describe sufficient conditions for a data generation mechanism that satisfies all three models. In §5 we present results of a simulation study that compares estimators of the parameters of the three models both when using data generated under those sufficient conditions, and when using data in which the conditions are violated. In §6 we discuss our results.
2 Data structure and identifying assumptions
Consider a longitudinal study with n subjects and observation times m = 0, 1, 2, …, K + 1. Let T be a failure (death) time variable that may be either exactly observed or interval censored, Ym the indicator of death by time m (1 = yes, 0 = no), Vm a vector of time-varying covariates measured at the start of the interval [m, m + 1), and Am a treatment indicator (1 = yes, 0 = no) for the interval (m, m + 1]. We use overbars to represent a variable’s history, i.e., V̄K = (V0, V1, …, Vm, …, VK). By convention, (i) Y0 = 0 and (ii) if Ym = 1 then Vm = 0, Am = 0, and Ym+1 = 1. Those who do not die before the last observation time K + 1 are said to be administratively censored. The observed data consist of n i.i.d. copies of O, whose exact components differ according to whether T is interval-censored or exactly observed.
Let g = ā for ā ≡ āK in the support of ĀK denote a (static or nondynamic) treatment regime. An example of a treatment regime is “treat continuously since baseline” or g = (1, 1, …, 1) = 1̄. Let Tg and V̄K, g represent the failure time and covariate history, respectively, a subject would have experienced had she, possibly contrary to fact, followed treatment regime g = ā. We say a subject follows treatment regime g = ā if the subject takes treatment am at time m if alive at m. By convention, a subject takes treatment am = 0 if dead at m. Let ām be the first m components of ā. The full data structure consists of the observed data O and the counterfactual data (V̄K, g, Tg) for all g = ā. We can think of the observed data structure O as a missing data structure with (V̄K,g, Tg) unobserved.
We assume the following three identifying assumptions (Robins and Hernán, 2009):
- Consistency: Given g = ā, if Ām = ām then Ȳm+1,g = Ȳm+1 and V̄m+1,g = V̄m+1, with the corresponding definitions applying whether T is interval-censored or exactly observed. Also, in the exactly observed case, if the above holds and either T < m + 1 or Tg < m + 1, it follows that Tg = T.
- Conditional exchangeability: For any regime g and m ∈ [0, K], Tg ⫫ Am | V̄m, Ām–1 = ām–1, Ym = 0.
- Positivity: fĀm–1,V̄m,Ym(ām–1, v̄m, 0) ≠ 0 ⇒ Pr(Am = am | V̄m, Ām–1, Ym = 0) > 0 w.p.1 for all am in the support of Am, 0 ≤ m ≤ K.
Informally, consistency is satisfied if the counterfactual outcomes are well defined, exchangeability if there is no unmeasured confounding, and positivity if there are subjects at all levels of exposure within levels of the measured confounders. See Young et al (2008) for a graphical representation of this data structure.
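As a simple illustration of how the positivity condition might be probed in practice, the following sketch (column names and data are hypothetical, not taken from the paper) tabulates the empirical treatment probability within strata of the measured confounders among person-intervals still at risk:

```python
import pandas as pd

# Hypothetical long-format data: one row per subject-interval m, with columns
# V (time-varying confounder), A_prev (treatment in the previous interval),
# A (current treatment) and Y (death indicator at the start of the interval).
df = pd.DataFrame({
    "V":      [0, 0, 1, 1, 0, 1, 0, 1],
    "A_prev": [0, 1, 0, 1, 0, 1, 1, 0],
    "A":      [0, 1, 1, 1, 0, 0, 1, 0],
    "Y":      [0, 0, 0, 0, 0, 0, 0, 0],
})

# Empirical Pr(A_m = 1 | V_m, A_{m-1}, Y_m = 0): positivity is suspect when a
# stratum probability is 0 or 1, or rests on very few person-intervals.
at_risk = df[df["Y"] == 0]
print(at_risk.groupby(["V", "A_prev"])["A"].agg(["mean", "size"]))
```

Strata with estimated treatment probabilities of 0 or 1, or supported by very few person-intervals, would flag potential positivity problems.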
3 Model definitions, estimation and inference
3.1 Model definitions
Let T0 be the counterfactual failure time under the treatment regime “never treat during the follow-up,” or g = 0̄ ≡ 0̄K = (0, 0, …, 0). Let T(ām–1,0) be the failure time under the regime “take treatment ām–1 through m – 1 and then no more treatment,” so that T(Ām–1,0) is the failure time under the regime “take the treatment actually taken through m – 1 and then no more treatment.” We define three models:
An SNAFTM assumes

∫0^{Tā} exp{γAFT(t, āt, V̄t,ā, ψ*aft)} dt has the same distribution as T0 for every regime g = ā,  (1)
an MSM assumes

λTā(t) = λ0(t) exp{γMSM(t, āt, ψ*msm)},  (2)
and an SNCFTM assumes

E[Yk,g=(Ām,0) | V̄m, Ām, Ym = 0] = E[Yk,g=(Ām–1,0) | V̄m, Ām, Ym = 0] exp{γCFT;k(V̄m, Ām, ψ*cft)} for 0 ≤ m < k ≤ K + 1,  (3)
where γAFT(t, āt, V̄t,ā, ψaft) and γMSM(t, āt, ψmsm) are known functions, continuous in t and differentiable with respect to t except at t = 1, 2, …, K; γCFT;k(V̄m, Ām, ψcft) is a known function for 0 ≤ m < k ≤ K + 1; λTā(t) and λ0(t) are the hazard functions at t for Tā and T0, respectively; and ψ*aft, ψ*msm and ψ*cft denote the unknown true values of the model parameters ψaft, ψmsm and ψcft, respectively.
3.2 Estimation and inference
Briefly, estimating the parameters of the structural models defined above requires solving an estimating equation of the general form
Pn{U(ψ, α̂)} = 0,  (4)
where Pn denotes the sample average over the n subjects, ψ is ψaft, ψmsm, or ψcft depending on which model is of interest, and α̂ is a consistent estimator of the p-dimensional nuisance parameter α* of a parametric model Pr(Am = 1 | V̄m, Ām–1, Ym = 0; α) for the treatment mechanism Pr(Am = 1 | V̄m, Ām–1, Ym = 0), 0 ≤ m ≤ K. The specific form of U(ψ, α̂) depends on the choice of model for the treatment mechanism and on the choice of γAFT(t, āt, V̄t,ā, ψaft), γMSM(t, āt, ψmsm), or γCFT;k(V̄m, Ām, ψcft). More efficient estimators of ψ exist that solve estimating equations with additional nuisance parameters.
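As a concrete, hypothetical illustration of the role of the treatment model Pr(Am = 1 | V̄m, Ām–1, Ym = 0; α), the sketch below fits a pooled logistic model to long-format data and turns the fitted probabilities into cumulative inverse probability of treatment weights of the kind used by the Cox MSM estimator; all column names are assumptions and the weights are left unstabilized for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def ip_weights(df):
    """Unstabilized inverse-probability-of-treatment weights for long-format data
    with one row per subject-interval at risk (Y_m = 0) and hypothetical columns
    id, m, V, V_prev, A_prev, A."""
    X = df[["V", "V_prev", "A_prev"]].to_numpy()
    fit = LogisticRegression().fit(X, df["A"])            # model for Pr(A_m = 1 | past)
    p = fit.predict_proba(X)[:, 1]
    dens = np.where(df["A"] == 1, p, 1.0 - p)             # Pr(A_m = a_m | past) for the observed a_m
    out = df.assign(contrib=1.0 / dens).sort_values(["id", "m"])
    out["W"] = out.groupby("id")["contrib"].cumprod()     # W_i(m) = prod_{j <= m} 1 / Pr(A_j = a_j | past)
    return out

# tiny made-up example with two subjects followed for three intervals
toy = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 2],
    "m":      [0, 1, 2, 0, 1, 2],
    "V":      [0, 1, 1, 0, 0, 1],
    "V_prev": [0, 0, 1, 0, 0, 0],
    "A_prev": [0, 0, 1, 0, 1, 1],
    "A":      [0, 1, 1, 1, 1, 0],
})
print(ip_weights(toy)[["id", "m", "W"]])
```

In practice stabilized weights, obtained by adding a numerator model for Pr(Am = 1 | Ām–1, Ym = 0), are generally preferred (Hernán et al, 2000).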
If the model for the treatment mechanism is correctly specified and α̂ is the MLE of α*, then n1/2(ψ̂ − ψ*) is asymptotically normal with mean zero and variance Σ, where

(5)

when U(ψ, α*) is a differentiable function of ψ and S(α*) is the score for α evaluated at α*.
A consistent estimator of Σ is given by

(6)

where Γ̂ = Pn{AAᵀ}.
Differences in the specific form of U(ψ, α̂) associated with each model result in varying degrees of computational complexity. For the Cox MSM, the inverse probability weighted estimator ψ̂msm of ψ*msm, which solves (4), can be computed using standard off-the-shelf software. Robust variance estimates that lead to conservative Wald confidence intervals for ψ* are also straightforward to obtain with off-the-shelf software although, if desired, consistent estimates of the limiting variance can be obtained from equation (6). In contrast, for the SNAFTM, the estimating equation is nondifferentiable with respect to ψaft when there is administrative censoring (i.e., when not all subjects have failed by the end of follow-up at K + 1) and so-called ‘artificial censoring’ is used to guarantee unbiasedness of the estimating function. As a consequence, solving (4) requires search-based algorithms (e.g., the bisection method for one-dimensional ψaft, the Nelder-Mead simplex method in general). G-estimation of an SNCFTM is somewhat more computationally involved than inverse probability weighted estimation of a Cox MSM, but the estimating function U(ψcft, α̂) is a continuously differentiable function of ψcft, even in the presence of administrative censoring. Thus, the estimating equation can generally be solved using a Newton-Raphson type procedure.
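To make the computational contrast concrete, the toy sketch below solves a smooth one-dimensional estimating equation by a Newton-type iteration and a step-like (non-differentiable) one by bracketing and bisection; the estimating functions are deliberately artificial and are not the estimating functions of the Cox MSM, SNAFTM or SNCFTM.

```python
import numpy as np
from scipy.optimize import newton

rng = np.random.default_rng(0)
z = rng.normal(loc=0.3, scale=1.0, size=1000)

# Toy smooth estimating function (linear in psi): a Newton/secant iteration applies.
smooth_U = lambda psi: np.mean(z - psi)
psi_newton = newton(smooth_U, x0=0.0)

# Toy step-like estimating function, mimicking the non-differentiability induced
# by artificial censoring: use a simple bracketing bisection search instead.
step_U = lambda psi: np.mean(np.sign(z - psi))
lo, hi = -2.0, 2.0                       # bracket with step_U(lo) > 0 > step_U(hi)
for _ in range(50):
    mid = 0.5 * (lo + hi)
    if step_U(mid) > 0:
        lo = mid
    else:
        hi = mid
psi_bisect = 0.5 * (lo + hi)

print(psi_newton, psi_bisect)
```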
For more details on estimation of a Cox MSM, SNAFTM and SNCFTM see Hernán et al (2000), Hernán et al (2005) and Page et al (2008), respectively. For more on general inference for estimators obtained using estimating equation methodology see van der Laan and Robins (2002).
4 A data generation mechanism that satisfies all three models
The following theorem states sufficient conditions for the generation of data that satisfies an SNAFTM and a Cox MSM. Note that this is a special case of the more general theorem presented in Young et al (2008). Proofs of all theorems are presented in the appendix.
Theorem 1
Suppose the counterfactual failure times Tā follow an SNAFTM (1) with γAFT(t, āt, V̄t,ā, ψaft) = at × ψaft. Further assume that T0 has an exponential distribution with hazard λT0(t) = λ0. Then the Tā also follow a Cox MSM with γMSM(t, āt, ψmsm) = at × ψmsm and ψ*msm = ψ*aft.
Note that, in this case, exp(ψ*msm) is the hazard ratio comparing the regimes “always treat” vs. “never treat.”
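As a numerical sanity check on Theorem 1 (using the locally rank-preserving version of the SNAFTM and illustrative constants that are not taken from the paper), the sketch below generates T0 as exponential, constructs Tā for the regime “start treatment at t = 5,” and verifies that the empirical hazard is approximately λ0 before treatment and λ0 exp(ψaft) afterwards:

```python
import numpy as np

rng = np.random.default_rng(1)
lam0, psi = 0.01, 0.3
T0 = rng.exponential(scale=1.0 / lam0, size=500_000)     # failure times under "never treat"

# Rank-preserving SNAFTM with gamma = a_t * psi and the regime "start treatment at t = 5":
# T0 = integral_0^{T_a} exp(a_t * psi) dt, so T_a = T0 if T0 <= 5, else 5 + (T0 - 5) * exp(-psi).
Ta = np.where(T0 <= 5, T0, 5 + (T0 - 5) * np.exp(-psi))

# Theorem 1 predicts a hazard of lam0 while untreated and lam0 * exp(psi) while treated.
events_pre, time_pre = np.sum(Ta <= 5), np.sum(np.minimum(Ta, 5))
events_post, time_post = np.sum(Ta > 5), np.sum(np.maximum(Ta - 5, 0))
print(events_pre / time_pre, lam0)                        # both approximately 0.010
print(events_post / time_post, lam0 * np.exp(psi))        # both approximately 0.0135
```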
The next theorem provides conditions under which there is approximate equivalence between an SNAFTM and SNCFTM.
First, given the SNAFTM γAFT(t, āt, V̄t,ā, ψaft) = at × ψaft, define, for u ≥ m, the function h(u, Ām) by

(7)
Note that, under the above SNAFTM, h(u, Ām) maps a follow-up time u ≥ m under the regime (Ām, 0) to the corresponding time on the scale of T0.
Theorem 2
Suppose that the following assumptions hold in addition to those of Theorem 1 and §2:
- the conditional distribution of Vm given Ām–1, V̄m–1, T0 depends on T0 only through the function I(T0 < c) for a constant c such that c > max{h(K, Ām), h(K, Ām–1)}, and
- failure is rare in the sense that ST0(t) ≈ 1 for t < maxm∈{0,…,K} h(K, Ām), where ST0(t) is the survival function of T0 at t and A ≈ B means A and B are approximately equal.
It then follows that the SNCFTM (3) with

γCFT;k(V̄m, Ām, ψcft) = Am × ψcft

approximately holds with ψ*cft = ψ*aft.
When the probability of failure in any interval (m, m + 1] is small, exp{γCFT;k(V̄m, Ām, ψcft)} with k = m + 1 approximates the conditional hazard ratio at time t ∈ (m, m + 1] given V̄m, Ām, for regime g = (Ām, 0) versus regime g = (Ām−1, 0) if, as we assume, the conditional hazard ratio is nearly constant in the interval (m, m + 1]. Consider a correctly specified SNCFTM (3) of this form.
Under this model, exp{γCFT;m+1(V̄m, Ām, ψcft)} = exp(Am ψcft). Note that this does not by itself imply ψ*cft = ψ*msm. If it did, we could conclude that the MSM γMSM(t, āt, ψmsm) = at × ψmsm was correctly specified with ψ*msm = ψ*cft. However, under the additional assumptions of Theorems 1 and 2, we can conclude that ψ*cft approximates ψ*msm = ψ*aft, and thus that the MSM γMSM(t, āt, ψmsm) = at × ψmsm with ψ*msm = ψ*cft approximately holds.
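The role of the rare-failure condition can be seen numerically: for two exponential failure times whose hazards differ by the factor exp(ψ), the ratio of one-interval cumulative risks approaches the hazard ratio only as the risks become small. A minimal sketch, with constants chosen to echo (not reproduce) the simulation settings of §5:

```python
import numpy as np

psi = 0.3
for lam0 in (0.01, 0.1):                                   # rare vs. common failure, cf. Tables 1 and 3
    risk_untreated = 1 - np.exp(-lam0)                     # one-interval risk, hazard lam0
    risk_treated = 1 - np.exp(-lam0 * np.exp(psi))         # one-interval risk, hazard lam0 * exp(psi)
    print(lam0, risk_treated / risk_untreated, np.exp(psi))
# The printed risk ratio is close to the hazard ratio exp(psi) when lam0 = 0.01,
# but visibly smaller when lam0 = 0.1, mirroring the attenuation of the SNCFTM
# estimates seen in Table 3.
```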
The following theorem provides sufficient conditions for data generation that satisfies an SNAFTM, Cox MSM and SNCFTM in the special case where ψ*aft = 0.
Theorem 3
Suppose the counterfactual failure times Tā follow an SNAFTM (1) with γAFT(t, āt, V̄t,ā, ψaft) = at × ψaft and ψ*aft = 0. Then:
- the Tā follow a Cox MSM with γMSM(t, āt, ψmsm) = at × ψmsm and ψ*msm = 0, and
- the SNCFTM (3) with γCFT;k(V̄m, Ām, ψcft) = Am × ψcft holds with ψ*cft = 0.
5 Simulation study
We generated data consistent with the conditions stated in Theorems 1 and 2, and under the full data structure described in §2. The simulations consisted of 1000 samples, each with 2500 subjects and K + 1 = 10 observation times. Each sample was generated according to the general algorithm described in Young et al (2008) for SNAFTM data generation. Here, this algorithm was specifically implemented as follows:
For each of 2500 simulated subjects:
- step 1: Simulate the counterfactual T0 from an exponential distribution with hazard λ0 = 0.01.
Define V−1 = A−1 = Y0 = 0. Then for each m ∈ [0, 9] implement steps 2–4:
- step 2: Simulate Vm from logit[Pr(Vm = 1 | V̄m−1, Ām–1, T0, Ym = 0; β)] = β0 + β1I(T0 < c) + β2Am−1 + β3Vm−1 for fixed values of β = (β0, β1, β2, β3) and with c = 30.
- step 3: Simulate Am from logit[Pr(Am = 1 | V̄m, Ām−1, Ym = 0; α)] = α0 + α1Vm + α2Vm−1 + α3Am−1 for fixed values of α = (α0, α1, α2, α3).
- step 4: Simulate Ym+1, and possibly T, based on the following:
– if T0 > h(m + 1, Ām) then Ym+1 = 0;
– else if T0 ≤ h(m + 1, Ām) then Ym+1 = 1 and T ∈ (m, m + 1] with T the solution of h(T, Ām) = T0.
Finally, redefine Vl = 0, Al = 0 for l > T.
SAS code to implement the above algorithm is provided at www.hsph.harvard.edu/causal/software.htm.
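Purely as an illustration, the following Python sketch re-expresses the same four steps; the β and α coefficient values are placeholders rather than the values used in the paper, and the survival step uses the time transform implied by the rank-preserving SNAFTM with γAFT = at × ψaft, which is one way to carry out the comparison involving h(·, Ām) in step 4.

```python
import numpy as np

rng = np.random.default_rng(2)

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_subject(psi_aft, lam0=0.01, K=9, c=30.0,
                     beta=(-1.0, 1.0, 0.5, 0.5),        # placeholder values, not the paper's
                     alpha=(-0.5, 0.5, 0.3, 0.3)):      # placeholder values, not the paper's
    """One subject generated by steps 1-4 of Section 5 (illustrative sketch only)."""
    T0 = rng.exponential(scale=1.0 / lam0)               # step 1: counterfactual T0 ~ Exp(lam0)
    V = np.zeros(K + 1, dtype=int)
    A = np.zeros(K + 1, dtype=int)
    Y = np.zeros(K + 2, dtype=int)                       # Y[0] = 0 by convention
    T = np.inf                                           # administratively censored unless death occurs
    used = 0.0                                           # T0-scale time consumed through interval (m-1, m]
    for m in range(K + 1):
        V_prev = V[m - 1] if m > 0 else 0
        A_prev = A[m - 1] if m > 0 else 0
        # step 2: covariate depends on I(T0 < c), prior treatment and prior covariate
        V[m] = rng.random() < expit(beta[0] + beta[1] * (T0 < c)
                                    + beta[2] * A_prev + beta[3] * V_prev)
        # step 3: treatment depends on current/prior covariate and prior treatment
        A[m] = rng.random() < expit(alpha[0] + alpha[1] * V[m]
                                    + alpha[2] * V_prev + alpha[3] * A_prev)
        # step 4: under the rank-preserving SNAFTM, surviving (m, m+1] with treatment A[m]
        # consumes exp(A[m] * psi_aft) units of time on the T0 scale
        step = np.exp(A[m] * psi_aft)
        if T0 > used + step:
            used += step                                  # Y[m+1] stays 0
        else:
            Y[m + 1:] = 1                                 # death occurs in (m, m+1]
            T = m + (T0 - used) / step                    # exactly observed failure time
            V[m + 1:] = 0
            A[m + 1:] = 0
            break
    return T0, T, V, A, Y

# e.g., one simulated sample of 2500 subjects for psi_aft = 0.3:
sample = [simulate_subject(0.3) for _ in range(2500)]
```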
Tables 1 through 3 display simulation results for the inverse probability weighted estimates ψ̂msm and the g-estimates ψ̂cft and ψ̂aft. The true value of the parameter, ψ*, was varied to be either −0.3, 0.0 or 0.3. Each table reports the mean of the model parameter estimates across Monte Carlo simulation samples (MC Mean), the difference between this mean and the true value of the parameter (Bias), the variance of the model parameter estimates across samples (MC Var), the test statistic T = Bias/√(MC Var/1000) (T), and the two-sided p-value comparing T to a N(0, 1) distribution (p-value).
Table 1.
Monte Carlo simulation results for estimators of the parameter of a Cox MSM, SNCFTM and SNAFTM when data are generated under the assumptions of §2 and Theorems 1 and 2, for various values of ψ*, based on 1000 replicates with n = 2500 and K + 1 = 10.
| ψ* | Model | MC Mean | Bias | MC Var | T | p-value |
|---|---|---|---|---|---|---|
| −0.3 | Cox MSM | −0.301 | −0.001 | 0.024 | −0.15 | 0.88 |
| | SNCFTM | −0.300 | −0.000 | 0.060 | 0.00 | 1.00 |
| | SNAFTM | −0.287 | 0.013 | 0.058 | 1.71 | 0.09 |
| 0.0 | Cox MSM | 0.000 | 0.000 | 0.020 | 0.14 | 0.88 |
| | SNCFTM | −0.002 | −0.002 | 0.046 | −0.26 | 0.79 |
| | SNAFTM | 0.010 | 0.010 | 0.051 | 1.40 | 0.16 |
| 0.3 | Cox MSM | 0.302 | 0.002 | 0.018 | 0.50 | 0.62 |
| | SNCFTM | 0.294 | −0.006 | 0.037 | −0.99 | 0.32 |
| | SNAFTM | 0.302 | 0.002 | 0.047 | 0.27 | 0.77 |
Table 3.
Monte Carlo simulation results for estimators of the parameter of a Cox MSM, SNCFTM and SNAFTM when data are generated as in Table 1 but with violation of the rare disease assumption (λ0 = 0.1).
| ψ* | Model | MC Mean | Bias | MC Var | T | p-value |
|---|---|---|---|---|---|---|
| −0.3 | Cox MSM | −0.314 | −0.014 | 0.004 | −7.17 | < 0.0001 |
| | SNCFTM | −0.248 | 0.052 | 0.006 | 21.46 | < 0.0001 |
| | SNAFTM | −0.296 | 0.004 | 0.011 | 1.08 | 0.28 |
| 0.0 | Cox MSM | 0.001 | 0.001 | 0.003 | 0.58 | 0.56 |
| | SNCFTM | −0.000 | −0.000 | 0.005 | −0.10 | 0.92 |
| | SNAFTM | −0.000 | −0.000 | 0.010 | −0.03 | 0.98 |
| 0.3 | Cox MSM | 0.318 | 0.018 | 0.003 | 10.25 | < 0.0001 |
| | SNCFTM | 0.245 | −0.055 | 0.005 | −25.31 | < 0.0001 |
| | SNAFTM | 0.296 | −0.004 | 0.011 | −1.21 | 0.23 |
Results in Table 1 confirm that the estimators of ψ*msm, ψ*cft and ψ*aft are essentially unbiased when data are generated under the assumptions of Theorems 1 and 2.
Tables 2 and 3 display simulation results under a data generation mechanism in which the conditions of Theorems 1 and 2 are violated. Specifically, results presented in Table 2 are based on data generated as in Table 1, except with T0 generated from a Weibull distribution with shape and scale parameters 2 and 0.02, respectively, which violates the condition that T0 is exponentially distributed. Results presented in Table 3 differ from those of Table 1 in that λ0 = 0.1 (as opposed to 0.01), which violates the rare disease condition defined in Theorem 2.
Table 2.
Monte Carlo simulation results for estimators of the parameter of a Cox MSM, SNCFTM and SNAFTM when data are generated as in Table 1 but with violation of the assumption that the T0 are exponentially distributed. Here, the T0 follow a Weibull distribution with shape and scale parameters 2 and 0.02, respectively.
| ψ* | Model | MC Mean | Bias | MC Var | T | p-value |
|---|---|---|---|---|---|---|
| −0.3 | Cox MSM | −0.364 | −0.064 | 0.074 | −7.47 | < 0.0001 |
| | SNCFTM | −0.467 | −0.167 | 0.175 | −12.64 | < 0.0001 |
| | SNAFTM | −0.300 | −0.000 | 0.055 | −0.02 | 0.98 |
| 0.0 | Cox MSM | −0.001 | −0.001 | 0.055 | −0.16 | 0.88 |
| | SNCFTM | −0.006 | −0.006 | 0.083 | −0.64 | 0.52 |
| | SNAFTM | 0.010 | 0.010 | 0.043 | 1.59 | 0.11 |
| 0.3 | Cox MSM | 0.358 | 0.058 | 0.044 | 8.66 | < 0.0001 |
| | SNCFTM | 0.394 | 0.094 | 0.054 | 12.79 | < 0.0001 |
| | SNAFTM | 0.301 | 0.001 | 0.037 | 0.242 | 0.81 |
As expected, for ψ* ≠ 0, the results reported in Table 2 confirm that violation of the exponential condition results in biased estimators of ψ*msm and ψ*cft, as the data are no longer generated under a Cox MSM or an SNCFTM. Also as expected, for ψ* ≠ 0, violation of the rare disease condition results in biased estimators of ψ*cft (see Table 3).
In theory, the performance of the inverse probability weighted estimator ψ̂msm of ψ*msm should be unaffected by violations of the rare disease condition. However, as is common practice (Hernán et al, 2000), we approximated ψ̂msm via a weighted logistic regression model, which requires the rare disease condition in every time interval. This approximation may explain the poorer performance of the inverse probability weighted estimator seen in Table 3.
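For concreteness, the sketch below shows the kind of weighted pooled logistic regression referred to here, applied to hypothetical person-interval data with precomputed weights W; the coefficient of A approximates the Cox MSM log hazard ratio only when the per-interval risk of death is small, and in practice robust (sandwich) standard errors would be used rather than the model-based ones from this fit.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical person-interval data: one row per subject-interval with Y_m = 0;
# Y_next = death during (m, m+1], A = treatment in the interval, W = IP weight.
rng = np.random.default_rng(3)
n = 5000
df = pd.DataFrame({"A": rng.integers(0, 2, n), "W": rng.uniform(0.5, 2.0, n)})
df["Y_next"] = rng.binomial(1, (0.01 * np.exp(0.3 * df["A"])).to_numpy())

# Weighted pooled logistic regression of the interval-specific death indicator on treatment.
X = sm.add_constant(df[["A"]].astype(float))
fit = sm.GLM(df["Y_next"], X, family=sm.families.Binomial(),
             freq_weights=df["W"]).fit()
print(fit.params["A"])     # roughly 0.3 in this toy example
```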
As expected based on Theorem 3, the estimators of both ψ*msm and ψ*cft are unbiased when ψ*aft = 0, as shown in Tables 2 and 3.
6 Discussion
This paper defines sufficient conditions for a data generation mechanism to satisfy three structural failure time models: the SNAFTM, Cox MSM and SNCFTM. A simulation study in which the data generation mechanism was (i) consistent with these conditions and (ii) in violation of these conditions supports the theoretical results regarding their sufficiency. Our results also describe how to correctly simulate data from an SNCFTM with known parameter ψ*cft by generating data from an SNAFTM with known parameter ψ*aft. For simplicity, our discussion did not allow for right-censoring due to loss to follow-up or competing risks before K + 1, but estimating the model parameters in the presence of such censoring is straightforward under additional identifying assumptions, as described in Hernán et al (2005), Hernán et al (2000) and Page et al (2008).
By generating data that satisfy all three models, we can evaluate the relative performance of the inverse probability weighted estimator of the Cox MSM and the g-estimators of the SNAFTM and SNCFTM under this limited data-generating mechanism. An interesting finding is that, as shown in Table 1, the widely used inverse probability weighted estimator of ψ*msm had similar or less bias, and a smaller variance, than the g-estimators of ψ*aft and ψ*cft, with the added advantage of being more easily computed.
As discussed in §3.2, the estimators studied in our simulations were simple to compute but non-optimal. Optimal estimators of parameters of structural nested models should be more efficient than those of marginal structural models under the assumption of no effect modification by past covariates (which is assumed in Theorems 1 and 2) (Robins and Hernán, 2009). Our simulation results suggest that, at least under this limited data-generating mechanism, non-optimal parameter estimates for the Cox MSM are actually more efficient than those of the SNCFTM or SNAFTM.
Acknowledgments
This work was supported by NIH grant R01 HL080644. The original publication is available at www.springerlink.com (http://www.springerlink.com/content/g31gl43370611421/).
7 Appendix
7.1 Proof of Theorem 1
Proof
Without loss of generality, we can assume the SNAFTM is locally rank preserving in the sense that

T0 = ∫0^{Tā} exp{at ψ*aft} dt for all ā,  (8)
since it is non-identifiable whether or not local rank preservation holds (Robins, 1998b). Thus, for any ā,

Pr(Tā > t) = Pr(T0 > ∫0^{t} exp{au ψ*aft} du) = exp{−λ0 ∫0^{t} exp{au ψ*aft} du}.

Hence λTā(t) = λ0 exp{at ψ*aft}. It follows that ψ*msm = ψ*aft for a Cox MSM with γMSM(t, āt, ψmsm) = at × ψmsm.
7.2 Proof of Theorem 2
Proof
By the definition of Yj,g=(Ām,0) and Yj,g=(Ām–1,0) for j ∈ {m, m + 1, …, k} and by consistency, we may rewrite the SNCFTM (3) in terms of these counterfactual indicators. Now, with no loss of generality, explicitly writing out these indicators under the locally rank preserving SNAFTM and noting that, by assumption, fVj|V̄j–1,Āj–1,T0(Vj | V̄j–1, Āj–1, u) is the same for all u < max{h(K, Ām), h(K, Ām–1)}, we obtain the stated approximation.
7.3 Proof of Theorem 3
Proof
By equation (8) and ψ*aft = 0, it follows that T0 = Tā for any ā. Hence λTā(t) = λ0(t) for all t and all ā.
It follows that ψ*msm = 0 for a Cox MSM with γMSM(t, āt, ψmsm) = at × ψmsm. Further, using equation (7), h(u, Ām) = u for any Ām. Thus, since Yk,g=(Ām,0) = Yk,g=(Ām−1,0) for all k, the two conditional expectations in the SNCFTM (3) are equal, so that exp{γCFT;k(V̄m, Ām, ψ*cft)} = 1 for 0 ≤ m < k ≤ K + 1.
It follows that ψ*cft = 0 for the SNCFTM of Theorem 3.
References
- Cox DR, Oakes D. Analysis of Survival Data. London: Chapman and Hall; 1984.
- Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11(5):561–570.
- Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15:615–625.
- Hernán MA, Cole SR, Margolick J, Cohen M, Robins JM. Structural accelerated failure time models for survival analysis in studies with time-varying treatments. Pharmacoepidemiology and Drug Safety. 2005;14(7):477–491.
- van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. New York: Springer; 2002.
- Page J, Hernán MA, Robins JM. Structural nested cumulative failure time models. Technical report, Department of Epidemiology, Harvard School of Public Health; 2008.
- Picciotto S, Young J, Hernán MA. G-estimation of structural nested cumulative failure time models. American Journal of Epidemiology. 2008;167(Suppl):S139.
- Picciotto S, Robins JM, Young J, Hernán MA. Estimating absolute risks under hypothetical interventions using a structural nested cumulative failure time model. American Journal of Epidemiology. 2009;169(Suppl):S34.
- Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period: application to the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512.
- Robins JM. Marginal structural models. In: 1997 Proceedings of the American Statistical Association, Section on Bayesian Statistical Science. American Statistical Association; 1998a. pp. 1–10.
- Robins JM. Structural nested failure time models. In: Armitage P, Colton T, editors. Encyclopedia of Biostatistics. Chichester: Wiley; 1998b. pp. 4372–4389.
- Robins JM, Hernán MA. Estimation of the causal effects of time-varying exposures. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, editors. Advances in Longitudinal Data Analysis. Boca Raton, FL: Chapman and Hall/CRC Press; 2009. pp. 553–599.
- Robins JM, Blevins D, Ritter G, Wulfsohn M. G-estimation of the effect of prophylaxis therapy for Pneumocystis carinii pneumonia on the survival of AIDS patients. Epidemiology. 1992;3:319–336.
- Robins JM, Blevins D, Ritter G, Wulfsohn M. Errata to "G-estimation of the effect of prophylaxis therapy for Pneumocystis carinii pneumonia on the survival of AIDS patients". Epidemiology. 1993;4:189.
- Young JG, Hernán MA, Picciotto S, Robins JM. Simulation from structural survival models under complex time-varying data structures. In: JSM Proceedings, Section on Statistics in Epidemiology. Denver, CO: American Statistical Association; 2008.