Author manuscript; available in PMC: 2015 Mar 15.
Published in final edited form as: Stat Med. 2013 Oct 22;33(6):1001–1014. doi: 10.1002/sim.5994

Simulation from a known Cox MSM using standard parametric models for the g-formula

Jessica G Young a,*, Eric J Tchetgen Tchetgen a,b
PMCID: PMC3947915  NIHMSID: NIHMS531688  PMID: 24151138

Abstract

It is routinely argued that, unlike standard regression-based estimates, inverse probability weighted (IPW) estimates of the parameters of a correctly specified Cox marginal structural model (MSM) may remain unbiased in the presence of a time-varying confounder affected by prior treatment. Previously proposed methods for simulating from a known Cox MSM lack knowledge of the law of the observed outcome conditional on the measured past. While unbiased IPW estimation does not require this knowledge, standard regression-based estimates rely on correct specification of this law. Thus, in typical high-dimensional settings, such simulation methods cannot isolate bias due to complex time-varying confounding as it may be conflated with bias due to misspecification of the outcome regression model. In this paper, we describe an approach to Cox MSM data generation that allows for a comparison of the bias of IPW estimates versus that of standard regression-based estimates in the complete absence of model misspecification. This approach involves simulating data from a standard parametrization of the likelihood and solving for the underlying Cox MSM. We prove that solutions exist and computations are tractable under many data generating mechanisms. We show analytically and confirm in simulations that, in the absence of model misspecification, the bias of standard regression-based estimates for the parameters of a Cox MSM is indeed a function of the coefficients in observed data models quantifying the presence of a time-varying confounder affected by prior treatment. We discuss limitations of this approach including that implied by the “g-null paradox”.

Keywords: marginal structural models, simulation, g-formula, survival analysis, causal inference, g-null paradox

1. Introduction

Inverse probability weighted (IPW) estimation of Cox Marginal Structural Models (MSMs) [1, 2] is now a popular approach to estimating the causal effect of a time-varying treatment on survival in observational studies. A Cox MSM is a model for the hazard ratio at a given follow-up time comparing counterfactual time-varying treatment regimes. It is routinely argued that, unlike standard regression-based estimates, IPW estimates of the parameters of a correctly specified Cox MSM may remain unbiased in the presence of a time-varying confounder affected by prior treatment. For example, in observational studies of HIV-infected patients, CD4 cell count affects whether a patient will receive treatment and is associated with future survival. It is also itself affected by whether treatment has been previously initiated.

Young et al. [3, 4] and Havercroft and Didelez [5] have proposed algorithms for simulating data under a known Cox MSM and known model for the treatment mechanism. Westreich et al. [6] recently applied a variant of one of these algorithms to compare the performance of IPW estimates and standard regression-based estimates of the true Cox MSM parameters under several simulation scenarios where time-varying confounding affected by prior treatment is present. As knowledge of the correct functional form of the Cox MSM and of the model for the treatment mechanism are required for unbiasedness of IPW estimation, these previously proposed approaches are reasonably useful for simulation studies of IPW estimator performance. In particular, under such data generating algorithms, the properties of IPW estimators in the complete absence of model misspecification may be studied.

These previously proposed simulation approaches, however, lack explicit knowledge of the law of the observed outcome at each time conditional on the measured past. Unlike IPW estimates, standard regression-based estimates rely on correct specification of this law. In settings most often of interest, where treatment and confounders are frequently updated over time and/or covariates are high-dimensional, regression-based estimates cannot be constructed non-parametrically and, typically, parametric models are used. It follows that, in such settings, these previous simulation methods will not be useful for studying the performance of standard regression-based estimates as bias due to time-varying confounding may be conflated with bias due to model misspecification.

Xiao et al. [7] suggested an alternative approach to simulating from a Cox MSM by generating according to standard parametric models for the joint distribution of the observed data. These authors argued, under a particular data generating mechanism and a rare disease assumption, that the parameters of the underlying Cox MSM may be derived analytically from the parameters of the specified observed data generating models. These include a regression model for the treatment mechanism used in the construction of IPW estimates. These also include a regression model for the law of the outcome conditional on the measured past used in the construction of standard regression-based estimates. In turn, this approach allows a comparison of IPW and standard regression-based estimates in the absence of model misspecification.

In this paper, we show more generally that the parameters of an underlying Cox MSM may be derived based on a particular parametrization of the observed data distribution. This derivation follows from the general relationship between a Cox MSM and Robins’ g-formula [8]. We prove that solving for the true Cox MSM parameters is both possible and computationally tractable under many data generating models with or without the assumption of rare disease. Various examples are presented where follow-up time is arbitrary and standard parametric models for the observed data are imposed such that time-varying confounding affected by prior treatment is present. A large sample simulation study is also presented. We begin with a description of the observed data to be generated.

2. Observed data structure

We wish to generate samples of n i.i.d. observations where each observation represents measurements on a subject in a hypothetical observational study. In this study, subjects are followed beginning at time t0 (baseline) and the investigator takes measurements on each subject during frequent, regular intervals. Specifically, for m = 0, …, K: Let Am be the value of a binary treatment measured during interval m defined by [tm, tm+1), with Am = 1 if the subject is treated and Am = 0 otherwise. Further, let Lm be a covariate measured at the start of that same interval. Define Ym+1 = I(T ≤ tm+1) with T a subject’s exact failure time which may be either continuous or discrete; equivalently, Ym+1 is an indicator of failure by tm+1.

In general, we denote the history of a random variable using overbars; for example Ām = (A0, …, Am) is the observed treatment history through the end of interval m. By definition Y0 = 0 (all subjects must be at risk for failure at baseline). For notational convenience we set L̅−1 and Ā−1 to be identically 0. To simplify the presentation, we will assume no loss to follow-up.

3. Definition of a Cox MSM

Let ā ≡ āK = (a0, …, am, …, aK) denote a treatment regime in 𝒜̅K, the support of ĀK consisting of all possible treatment regimes. Examples include ā = 1̅ (or “always treat”) and ā = 0̅ (or “never treat”). Treatment regimes of the form ā are known as static regimes in that the treatment received in every future interval by someone following that regime is deterministically known at baseline. By contrast, a dynamic treatment regime is one under which treatment at a future time may depend on the values of evolving time-dependent covariates. For example, see [9, 10, 11, 12, 13, 14, 15, 16]. In this paper we limit our attention to causal contrasts involving only static regimes.

Define $\bar{Y}_{K+1}^{\bar{a}}$ as the outcome history a subject would have had if, possibly contrary to fact, she followed regime ā with Tā her exact failure time. A discrete time Cox MSM γ(m, ām, ψ) is defined as

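$$\frac{\Pr[Y_{m+1}^{\bar{a}} = 1 \mid Y_m^{\bar{a}} = 0]}{\Pr[Y_{m+1}^{\bar{0}} = 1 \mid Y_m^{\bar{0}} = 0]} = \exp\{\gamma(m, \bar{a}_m, \psi)\} \tag{1}$$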

where ψ is a constant parameter vector and γ is a particular function of ψ, the interval m, and the regime ā through m. The parameter ψ encodes the causal treatment effect of following a static regime ā compared with 0̅ up to any interval m + 1 such that ψ = 0 if and only if $\Pr[Y_{m+1}^{\bar{a}} = 1 \mid Y_m^{\bar{a}} = 0] = \Pr[Y_{m+1}^{\bar{0}} = 1 \mid Y_m^{\bar{0}} = 0]$ for all ām in the support of Ām, m = 0, …, K.

Following D’Agostino et al. [17], when Tā is continuous and measurement intervals are sufficiently small such that the event rate is negligible within each interval and over the follow-up, then the discrete time Cox MSM (1) may approximate the continuous time Cox MSM

$$\lambda_{T^{\bar{a}}}(t) = \lambda_0(t) \exp\{\gamma(t, \bar{a}_t, \psi)\}$$

for t ∈ [tm, tm+1) where λTā (t) and λ0(t) are the counterfactual hazards at t under regimes ā and “never treat”, respectively, for all ā and m = 0, …, K.

4. Identifying assumptions and the g-formula

Suppose the goal of this hypothetical study is to obtain an unbiased estimate of the parameter vector ψ under model (1). Note that, if γ(m, ām, ψ) is a saturated model – i.e., the ratio on the LHS of (1) is allowed to differ for every possible ām and m – then the Cox MSM is by definition correctly specified. As $\bar{Y}_{K+1}^{\bar{a}}$ for all āK in the support of ĀK are not observed for all study subjects, in order to identify ψ based only on measured variables, we require additional assumptions. For each m = 0, …, K and each static regime ā, suppose that the following hold:

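  1. Consistency: If Ām = ām then $\bar{Y}_{m+1} = \bar{Y}_{m+1}^{\bar{a}}$ and $\bar{L}_m = \bar{L}_m^{\bar{a}}$, with $\bar{L}_m^{\bar{a}}$ the covariate history through m under ā.

  2. Positivity: $f_{\bar{A}_{m-1}, \bar{L}_m, Y_m}(\bar{a}_{m-1}, \bar{l}_m, 0) \neq 0 \Rightarrow \Pr[A_m = a_m \mid \bar{L}_m = \bar{l}_m, \bar{A}_{m-1} = \bar{a}_{m-1}, Y_m = 0] > 0$ w.p.1.

  3. Exchangeability (no unmeasured confounding):
    $$(Y_{m+1}^{\bar{a}}, \ldots, Y_{K+1}^{\bar{a}}) \perp\!\!\!\perp A_m \mid \bar{L}_m = \bar{l}_m, \bar{A}_{m-1} = \bar{a}_{m-1}, Y_m = 0$$
    where, in general, $A \perp\!\!\!\perp B \mid C$ denotes “A is independent of B given C”.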

As stated in the appendix of [3], under the above three identifying assumptions for a given static regime ā, Pr[Ym+1ā=1|Ymā=0] is given by Robins’ g-formula [8]:

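$$h(m, \bar{a}_m) = \frac{\sum_{\bar{l}_m} \Pr[Y_{m+1} = 1 \mid \bar{L}_m = \bar{l}_m, \bar{A}_m = \bar{a}_m, Y_m = 0]\, w^{\bar{a}}(m, \bar{l}_m)}{\sum_{\bar{l}_m} w^{\bar{a}}(m, \bar{l}_m)} \tag{2}$$

with

$$w^{\bar{a}}(m, \bar{l}_m) = \prod_{j=0}^{m} \Pr[Y_j = 0 \mid \bar{L}_{j-1} = \bar{l}_{j-1}, \bar{A}_{j-1} = \bar{a}_{j-1}, Y_{j-1} = 0]\, f(l_j \mid \bar{a}_{j-1}, \bar{l}_{j-1}, Y_j = 0)$$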

for m = 0, …, K.

It follows that, given our identifying assumptions, the Cox MSM (1) will hold in our study population if the following relationship holds

$$\frac{h(m, \bar{a}_m)}{h(m, \bar{0}_m)} = \exp\{\gamma(m, \bar{a}_m, \psi)\} \tag{3}$$

for all ām and m = 0, …, K. For simplicity, we have generally expressed the g-formula h(m, ām) above in terms of a high-dimensional sum. However, when L̅m contains continuously measured components, we may replace sums with integrals.

5. Parametric assumptions on the g-formula

Let us now additionally assume that the components of the g-formula h(m, ām) may be characterized by standard parametric models. Under these additional restrictions, it follows that the Cox MSM (1) holds if

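$$\frac{h(m, \bar{a}_m; \beta, \theta)}{h(m, \bar{0}_m; \beta, \theta)} = \exp\{\gamma(m, \bar{a}_m, \psi)\} \tag{4}$$

for all ām and m = 0, …, K where

$$h(m, \bar{a}_m; \beta, \theta) = \frac{\sum_{\bar{l}_m} \Pr[Y_{m+1} = 1 \mid \bar{L}_m = \bar{l}_m, \bar{A}_m = \bar{a}_m, Y_m = 0; \theta]\, w^{\bar{a}}(m, \bar{l}_m; \beta, \theta)}{\sum_{\bar{l}_m} w^{\bar{a}}(m, \bar{l}_m; \beta, \theta)} \tag{5}$$

with

$$w^{\bar{a}}(m, \bar{l}_m; \beta, \theta) = \prod_{j=0}^{m} \Pr[Y_j = 0 \mid \bar{L}_{j-1} = \bar{l}_{j-1}, \bar{A}_{j-1} = \bar{a}_{j-1}, Y_{j-1} = 0; \theta]\, f(l_j \mid \bar{a}_{j-1}, \bar{l}_{j-1}, Y_j = 0; \beta)$$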

such that (5) is a particular parametric version of (2).

We can now explicitly connect models for the observed data likelihood to a Cox MSM. Specifically, based on (4), when L̅m contains only discrete components, one can, at least in theory, derive in closed form the underlying Cox MSM γ(m, ām, ψ) that holds for any choice of parametric models. However, this derivation may become computationally unwieldy in practice without additional restrictions (e.g. Markov assumptions) when the confounders may take on many levels and/or K is large.

When L̅m contains continuously measured components, deriving the true Cox MSM based on a particular parametrization of the g-formula may require evaluating integrals with no closed form. In this case, whether there is a closed form solution will depend on the choice of parametrization and, possibly, whether additional restrictions are imposed (e.g. rare disease assumptions). In the following sections, we consider a variety of data generating assumptions under which we may tractably derive the true Cox MSM implied by a standard parametrization of the observed data likelihood.

6. Cox MSMs under Markov assumptions

Suppose the following Markov assumptions hold:

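$$\Pr[Y_{m+1} = 1 \mid \bar{L}_m = \bar{l}_m, \bar{A}_m = \bar{a}_m, Y_m = 0] = g(l_m, a_m, a_{m-1}) \tag{6}$$

and

$$f(l_m \mid \bar{a}_{m-1}, \bar{l}_{m-1}, Y_m = 0) = r(l_m, a_{m-1}) \tag{7}$$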

m = 0, …, K where g and r are any real-valued functions bounded between 0 and 1. We now have the following theorem.

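Theorem 6.1 Assume that the restrictions (6) and (7) hold for all m = 0, …, K. Then, it follows that the hazard ratio $h(m, \bar{a}_m)/h(m, \bar{0}_m)$ only depends on (m, ām) through (am, am−1) with

$$\frac{h(m, \bar{a}_m)}{h(m, \bar{0}_m)} = \frac{\sum_{l_m} g(l_m, a_m, a_{m-1})\, r(l_m, a_{m-1})}{\sum_{l_m} g(l_m, 0, 0)\, r(l_m, 0)} \tag{8}$$

for Lm discrete and

$$\frac{h(m, \bar{a}_m)}{h(m, \bar{0}_m)} = \frac{\int g(l_m, a_m, a_{m-1})\, r(l_m, a_{m-1})\, dl_m}{\int g(l_m, 0, 0)\, r(l_m, 0)\, dl_m} \tag{9}$$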

for Lm continuous.

A proof of Theorem 6.1 is given in Appendix A. The following is a corollary of Theorem 6.1.

Corollary 6.2 Suppose the assumptions of Theorem 6.1 hold and denote h(m, ām) ≡ h(am, am−1). Then, the Cox MSM γ(m, ām, ψ) = ψ0am + ψ1am−1 + ψ2amam−1 holds with

$$\exp(\psi_0) = \frac{h(1,0)}{h(0,0)} \tag{10}$$
$$\exp(\psi_1) = \frac{h(0,1)}{h(0,0)} \tag{11}$$
$$\exp(\psi_2) = \frac{h(1,1)}{h(0,0)} \times \frac{1}{\exp(\psi_0)\exp(\psi_1)} \tag{12}$$

7. Binary covariates

In this section, suppose Lm is binary. By Corollary 6.2, we obtain

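$$\exp(\psi_0) = \frac{g(1,1,0)\, r(1,0) + g(0,1,0)\{1 - r(1,0)\}}{g(1,0,0)\, r(1,0) + g(0,0,0)\{1 - r(1,0)\}} \tag{13}$$
$$\exp(\psi_1) = \frac{g(1,0,1)\, r(1,1) + g(0,0,1)\{1 - r(1,1)\}}{g(1,0,0)\, r(1,0) + g(0,0,0)\{1 - r(1,0)\}} \tag{14}$$
$$\exp(\psi_2) = \frac{g(1,1,1)\, r(1,1) + g(0,1,1)\{1 - r(1,1)\}}{g(1,1,0)\, r(1,0) + g(0,1,0)\{1 - r(1,0)\}} \times \frac{g(1,0,0)\, r(1,0) + g(0,0,0)\{1 - r(1,0)\}}{g(1,0,1)\, r(1,1) + g(0,0,1)\{1 - r(1,1)\}} \tag{15}$$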

Equations (13), (14) and (15) follow simply by plugging the RHS of (8) into (10), (11) and (12) for the appropriate (am, am−1).
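The plug-in step can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function names (`msm_params_binary`, `g_logit`, `r1_logit`) and the parameter values are assumptions made here for illustration. It evaluates the right-hand side of (8) for binary Lm and then applies (10), (11) and (12).

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def msm_params_binary(g, r1):
    """Solve (13)-(15) for (psi0, psi1, psi2) when L_m is binary.

    g(l, a, a_prev): Pr[Y_{m+1}=1 | L_m=l, A_m=a, A_{m-1}=a_prev, Y_m=0]
    r1(a_prev):      Pr[L_m=1 | A_{m-1}=a_prev, Y_m=0], i.e. r(1, a_prev)
    """
    def h(a, a_prev):
        # Right-hand side of (8) for binary L_m.
        p = r1(a_prev)
        return g(1, a, a_prev) * p + g(0, a, a_prev) * (1.0 - p)

    psi0 = math.log(h(1, 0) / h(0, 0))                # equation (10)
    psi1 = math.log(h(0, 1) / h(0, 0))                # equation (11)
    psi2 = math.log(h(1, 1) / h(0, 0)) - psi0 - psi1  # equation (12)
    return psi0, psi1, psi2


# Assumed illustrative parameter values (not taken from the paper).
theta0, theta1, theta2, theta3 = -2.0, 1.0, -0.8, 0.3
beta1 = 0.7


def g_logit(l, a, a_prev):
    # Logistic outcome model as a hypothetical choice of g.
    return expit(theta0 + theta1 * l + theta2 * a + theta3 * a_prev)


def r1_logit(a_prev):
    # Logistic covariate model without intercept as a hypothetical choice of r.
    return expit(beta1 * a_prev)


print(msm_params_binary(g_logit, r1_logit))
```

Passing g and r as callables keeps the solver independent of the link function, so probit or complementary log-log choices could be substituted without changing `msm_params_binary`.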

Standard parametric assumptions on g(lm, am, am−1) and r(lm, am−1) might be regression models with logit, probit or complementary log-log links [18]. To fix ideas, we work through an example under logistic regression models for both g(lm, am, am−1) and r(lm, am−1) such that

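$$r(1, a_{m-1}; \beta) = \frac{\exp(\beta_1 a_{m-1})}{1 + \exp(\beta_1 a_{m-1})} \tag{16}$$

and

$$g(l_m, a_m, a_{m-1}; \theta) = \frac{\exp(\theta_0 + \theta_1 l_m + \theta_2 a_m + \theta_3 a_{m-1})}{1 + \exp(\theta_0 + \theta_1 l_m + \theta_2 a_m + \theta_3 a_{m-1})} \tag{17}$$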

m = 0, …, K. To simplify the presentation, we have implicitly set the intercept in (16) to zero, however a non-zero intercept is easily added. Plugging these choices into (13), (14) and (15) we obtain the solutions

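$$\exp(\psi_0) = \frac{\dfrac{\exp(\theta_1 + \theta_2)}{1 + \exp(\theta_0 + \theta_1 + \theta_2)} + \dfrac{\exp(\theta_2)}{1 + \exp(\theta_0 + \theta_2)}}{\dfrac{\exp(\theta_1)}{1 + \exp(\theta_0 + \theta_1)} + \dfrac{1}{1 + \exp(\theta_0)}} \tag{18}$$
$$\exp(\psi_1) = \frac{\dfrac{\exp(\theta_3)}{1 + \exp(\beta_1)} \left\{\dfrac{\exp(\theta_1 + \beta_1)}{1 + \exp(\theta_0 + \theta_1 + \theta_3)} + \dfrac{1}{1 + \exp(\theta_0 + \theta_3)}\right\}}{\dfrac{1}{2} \left\{\dfrac{\exp(\theta_1)}{1 + \exp(\theta_0 + \theta_1)} + \dfrac{1}{1 + \exp(\theta_0)}\right\}} \tag{19}$$
$$\exp(\psi_2) = \frac{\left\{\dfrac{\exp(\theta_1 + \beta_1)}{1 + \exp(\theta_0 + \theta_1 + \theta_2 + \theta_3)} + \dfrac{1}{1 + \exp(\theta_0 + \theta_2 + \theta_3)}\right\} \left\{\dfrac{\exp(\theta_1)}{1 + \exp(\theta_0 + \theta_1)} + \dfrac{1}{1 + \exp(\theta_0)}\right\}}{\left\{\dfrac{\exp(\theta_1)}{1 + \exp(\theta_0 + \theta_1 + \theta_2)} + \dfrac{1}{1 + \exp(\theta_0 + \theta_2)}\right\} \left\{\dfrac{\exp(\theta_1 + \beta_1)}{1 + \exp(\theta_0 + \theta_1 + \theta_3)} + \dfrac{1}{1 + \exp(\theta_0 + \theta_3)}\right\}} \tag{20}$$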

Notably, if the disease were rare as in Xiao et al. [7] our solutions simplify considerably. Specifically, given the model (17) and rare disease within each measurement interval and history we have the approximation

$$g(l_m, a_m, a_{m-1}; \theta) \approx \exp(\theta_0 + \theta_1 l_m + \theta_2 a_m + \theta_3 a_{m-1}). \tag{21}$$

Plugging (21) into expressions (13), (14) and (15) in place of (17) we obtain the simplified approximate solutions

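$$\psi_0 = \theta_2 \tag{22}$$
$$\psi_1 = \log\left[\frac{\exp(\theta_3) \left\{\dfrac{\exp(\theta_1 + \beta_1)}{1 + \exp(\beta_1)} + \dfrac{1}{1 + \exp(\beta_1)}\right\}}{\dfrac{1}{2}\{\exp(\theta_1) + 1\}}\right] \tag{23}$$
$$\psi_2 = 0 \tag{24}$$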

Equations (22) and (23) establish that the parameters of a standard time-dependent Cox regression such as (21) do not generally match those of the Cox MSM. Specifically, the maximum likelihood estimates (MLEs) of θ2 and θ3 based on the correctly specified model (17) have bias approximately equal to θ2 − ψ0 and θ3 − ψ1 for ψ0 and ψ1, respectively. Given the rare disease approximation (21), we can see by expression (22) that we have approximately θ2 − ψ0 = 0 for any choice of θ1 or β1. However, by expression (23), we will only have θ3 = ψ1 if either θ1 or β1 is zero; that is, if Lm is either not a confounder or not itself affected by prior treatment. Without the rare disease assumption, by equations (18) and (19), the bias of the MLEs of θ2 and θ3 for ψ0 and ψ1, respectively, depends not only on the values of θ1 and β1 but also on the other components of θ.

By equation (24), we also see in this example that, given the rare disease approximation (21), absence of an interaction term between am and am−1 in the model (17) also implies no interaction as quantified by ψ2. However, by equation (20), in the absence of rare disease, the presence of interaction as quantified by ψ2 generally depends on β1 and all components of θ, despite the absence of an interaction term in the model (17). For interested readers, we explicitly consider alternative solutions for ψ2 in Appendix B when an interaction term between am and am−1 is added to the model (17), both with and without a rare disease approximation. Note that solutions for ψ0 and ψ1 will remain unchanged (with or without rare disease) under a less restricted model for g(lm, am, am−1) that allows interaction between am and am−1 by equations (13) and (14), respectively.
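As a numerical companion to this section, the sketch below compares the exact solutions implied by (13) to (15) under models (16) and (17) with the rare disease approximations (22) to (24). The function names `psi_exact` and `psi_rare` are assumptions introduced here; the parameter values θ0 = −7, θ2 = −0.8, θ3 = 0 and the six (β1, θ1) pairs are those of the simulation study in §9.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def psi_exact(theta, beta1):
    """(psi0, psi1, psi2) from (13)-(15) under models (16) and (17)."""
    t0, t1, t2, t3 = theta

    def h(a, a_prev):
        p = expit(beta1 * a_prev)  # r(1, a_prev; beta) from (16)
        return (expit(t0 + t1 + t2 * a + t3 * a_prev) * p
                + expit(t0 + t2 * a + t3 * a_prev) * (1.0 - p))

    psi0 = math.log(h(1, 0) / h(0, 0))
    psi1 = math.log(h(0, 1) / h(0, 0))
    psi2 = math.log(h(1, 1) / h(0, 0)) - psi0 - psi1
    return psi0, psi1, psi2


def psi_rare(theta, beta1):
    """Rare disease approximations (22), (23) and (24)."""
    t0, t1, t2, t3 = theta
    p = expit(beta1)  # equals exp(beta1) / {1 + exp(beta1)}
    psi1 = t3 + math.log((math.exp(t1) * p + (1.0 - p))
                         / (0.5 * (math.exp(t1) + 1.0)))
    return t2, psi1, 0.0


scenarios = [(-2.0, -2.0), (-0.5, -0.5), (0.0, -0.5),
             (-0.5, 0.0), (0.5, -2.0), (2.0, -2.0)]  # (beta1, theta1)
for beta1, theta1 in scenarios:
    theta = (-7.0, theta1, -0.8, 0.0)
    print(beta1, theta1, psi_rare(theta, beta1), psi_exact(theta, beta1))
```

With θ0 = −7 the exact and approximate values should be close, and the approximate ψ1 is zero whenever β1 or θ1 is zero, in line with the discussion of (23).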

8. Continuous covariates

In this section, suppose Lm is a continuous random variable, m = 0, …, K. Given the assumptions of Theorem 6.1, whether a closed form solution exists for h(am, am−1)/h(0, 0) in this setting will now depend on the choice of g and r.

For example, as in Xiao et al. [7], assume that Lm is normally distributed given the past with

$$L_m \mid \bar{A}_{m-1}, \bar{L}_{m-1}, Y_m = 0 \sim N(\beta_1 A_{m-1}, \sigma^2) \tag{25}$$

As in (16), an intercept is easily added to (25). With this choice of r, a closed form solution for (9) is not generally available for g the logistic regression model (17). If, however, along with this choice of r, we choose a probit link for g such that

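$$g(l_m, a_m, a_{m-1}; \theta) = \Phi(\theta_0 + \theta_1 l_m + \theta_2 a_m + \theta_3 a_{m-1}) \tag{26}$$

with Φ(·) the CDF of a standard normal, then, following Agresti [18], we have

$$h(m, \bar{a}_m; \beta, \theta) = \Phi\{c(\theta_0 + \theta_2 a_m + \theta_3 a_{m-1} + \theta_1 \beta_1 a_{m-1})\} \tag{27}$$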

with $c = (1 + \theta_1^2 \sigma^2)^{-1/2}$. By Corollary 6.2, we then also have that the Cox MSM γ(m, ām, ψ) = ψ0am + ψ1am−1 + ψ2amam−1 holds with specifically

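$$\exp(\psi_0) = \frac{\Phi\{c(\theta_0 + \theta_2)\}}{\Phi(c\theta_0)}$$
$$\exp(\psi_1) = \frac{\Phi\{c(\theta_0 + \theta_3 + \theta_1\beta_1)\}}{\Phi(c\theta_0)}$$
$$\exp(\psi_2) = \frac{\Phi\{c(\theta_0 + \theta_2 + \theta_3 + \theta_1\beta_1)\}}{\Phi\{c(\theta_0 + \theta_2)\}} \times \frac{\Phi(c\theta_0)}{\Phi\{c(\theta_0 + \theta_3 + \theta_1\beta_1)\}}$$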
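The probit-normal closed form can be checked numerically. The sketch below is an illustration under assumed parameter values, with hypothetical function names; it integrates g·r over lm as in (9) using composite Simpson's rule and compares the result with Φ{c(θ0 + θ2am + θ3am−1 + θ1β1am−1)}, taking c = (1 + θ1²σ²)^(−1/2), the standard scaling when a probit is averaged over a normal covariate.

```python
import math


def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def h_numeric(a, a_prev, theta, beta1, sigma, n_steps=4000):
    """Integral of g * r over l_m, as in (9), by composite Simpson's rule."""
    t0, t1, t2, t3 = theta
    mu = beta1 * a_prev                      # mean of L_m under (25)
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    step = (hi - lo) / n_steps               # n_steps must be even
    total = 0.0
    for k in range(n_steps + 1):
        l = lo + k * step
        weight = 1 if k in (0, n_steps) else (4 if k % 2 else 2)
        total += (weight * Phi(t0 + t1 * l + t2 * a + t3 * a_prev)
                  * normal_pdf(l, mu, sigma))
    return total * step / 3.0


def h_closed(a, a_prev, theta, beta1, sigma):
    """Closed form (27) with c = (1 + theta1^2 sigma^2)^(-1/2)."""
    t0, t1, t2, t3 = theta
    c = (1.0 + t1 ** 2 * sigma ** 2) ** -0.5
    return Phi(c * (t0 + t2 * a + t3 * a_prev + t1 * beta1 * a_prev))


theta = (-1.0, 0.8, -0.5, 0.3)  # assumed illustrative values
beta1, sigma = 0.6, 1.5         # assumed illustrative values
for a, a_prev in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print((a, a_prev), h_numeric(a, a_prev, theta, beta1, sigma),
          h_closed(a, a_prev, theta, beta1, sigma))
```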

A similar result is available for the logistic regression model (17) provided r is alternatively defined in terms of the more complex bridge distribution function of Wang and Louis [19]. However, if we further assume rare disease, we may maintain assumption (25) for r along with the model (17) for g and obtain an approximate closed form solution for (9). Specifically, using the approximation (21) for g and model (25) for r we have

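$$\begin{aligned} h(m, \bar{a}_m; \beta, \theta) &\approx \int \exp(\theta_0 + \theta_1 l_m + \theta_2 a_m + \theta_3 a_{m-1})\, r(l_m, a_{m-1}; \beta)\, dl_m \\ &= \exp(\theta_0 + \theta_2 a_m + \theta_3 a_{m-1})\, E[\exp(\theta_1 L_m) \mid A_{m-1} = a_{m-1}] \\ &= \exp(\theta_0 + \theta_2 a_m + \theta_3 a_{m-1}) \exp(\theta_1 \beta_1 a_{m-1}) \exp(\tfrac{1}{2}\sigma^2\theta_1^2) \end{aligned} \tag{28}$$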

with the last equality given by the moment generating function for the normal distribution. Note that, given (21), an analogous solution for h(m, ām; β, θ) will exist for any choice of r provided the conditional distribution of Lm has homoscedastic errors by the general property of the moment generating function. By Corollary 6.2, plugging the approximation (28) into (10), (11) and (12) for the appropriate choices of (am, am−1), the Cox MSM γ(m, ām, ψ) = ψ0am + ψ1am−1 + ψ2amam−1 holds with the approximate solutions ψ0 = θ2, ψ1 = θ3 + θ1β1 and ψ2 = 0.
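The arithmetic of this last step can be written out directly. In the sketch below (function names and parameter values are assumptions for illustration), the approximation (28) is plugged into (10), (11) and (12); the factor exp(θ0) and the moment generating function term exp(σ²θ1²/2) cancel in every ratio.

```python
import math


def h_rare(a, a_prev, theta, beta1, sigma):
    """Approximation (28) for normally distributed L_m under rare disease."""
    t0, t1, t2, t3 = theta
    return (math.exp(t0 + t2 * a + t3 * a_prev)
            * math.exp(t1 * beta1 * a_prev)
            * math.exp(0.5 * sigma ** 2 * t1 ** 2))


def psi_from_h(h):
    """Apply (10), (11) and (12) to any function h(a_m, a_{m-1})."""
    psi0 = math.log(h(1, 0) / h(0, 0))
    psi1 = math.log(h(0, 1) / h(0, 0))
    psi2 = math.log(h(1, 1) / h(0, 0)) - psi0 - psi1
    return psi0, psi1, psi2


theta = (-7.0, -0.5, -0.8, 0.2)  # assumed illustrative values
beta1, sigma = 1.5, 1.0          # assumed illustrative values
psi = psi_from_h(lambda a, ap: h_rare(a, ap, theta, beta1, sigma))
print(psi)  # approximately (theta2, theta3 + theta1 * beta1, 0)
```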

Analogous to the worked example for binary Lm under the rare disease assumption (21), we see that ψ0 = θ2 regardless of the values of θ1 and β1. By contrast, ψ1 = θ3 only if θ1 or β1 is zero; that is, when Lm is not a confounder affected by prior treatment. As in the binary case, our probit example illustrates more generally that the discrepancy between ψ and the components of θ corresponding to treatment coefficients in the conditional outcome regression model, in addition to β1 and θ1, may also depend on other components of θ.

Of the examples considered thus far, the data generating assumptions of our last example – with r defined by model (25) and g approximated by (21) – are closest to those of the data generating models given in Xiao et al. [7]. One key distinction, however, is that Xiao et al. [7] allowed the distribution of Lm also to depend on Lm−1. Under this weaker assumption, the resulting Cox MSM now depends on the entire treatment history ām and not simply the two most recent values (am, am−1) as given by the following theorem.

Theorem 8.1 Assume the data generating mechanism (6) of Theorem 6.1 holds with g approximated by (21). Further assume

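$$L_m \mid \bar{A}_{m-1}, \bar{L}_{m-1}, Y_m = 0 \sim N(\beta_1 A_{m-1} + \beta_2 L_{m-1}, \sigma^2) \tag{29}$$

m = 0, …, K. Data generated under these assumptions will approximately follow a Cox MSM of the form

$$\gamma(m, \bar{a}_m, \psi) = \psi_0 a_m + \psi_1 a_{m-1} + \sum_{s=1}^{m-1} \psi_{m-s+1}\, a_{s-1}$$

with ψ0 = θ2, ψ1 = θ3 + θ1β1 and $\psi_{m-s+1} = \theta_1 \beta_1 \beta_2^{m-s}$, s = 1, …, m − 1.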

A proof of Theorem 8.1 is given in Appendix C. Note that Xiao et al. [7] concluded that, under their data generating models, the resulting Cox MSM should only depend on am and am−1 for all m = 0, …, K. Theorem 8.1 appears to contradict this conclusion for K > 1. Under these data generating assumptions, IPW estimates constructed based on a Cox MSM that excludes the correct function of ām−2 should theoretically incur some bias because such a Cox MSM will be misspecified. This is a particular problem when |β2| ≥ 1.
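To make the dependence on the full treatment history concrete, the following sketch tabulates the coefficients of Theorem 8.1 by treatment lag. The function name `cox_msm_coefficients`, the lag-keyed dictionary and the example values are assumptions introduced here for illustration.

```python
def cox_msm_coefficients(m, theta1, theta2, theta3, beta1, beta2):
    """Coefficients of the approximate Cox MSM of Theorem 8.1 at interval m.

    Returns a dict mapping lag k to the coefficient on a_{m-k}:
    lag 0 -> psi0, lag 1 -> psi1, and lag m-s+1 -> psi_{m-s+1} for s = 1..m-1.
    """
    coef = {0: theta2}
    if m >= 1:
        coef[1] = theta3 + theta1 * beta1
    for s in range(1, m):
        # Term on a_{s-1}, i.e. lag m - s + 1.
        coef[m - s + 1] = theta1 * beta1 * beta2 ** (m - s)
    return coef


# Assumed illustrative values: with |beta2| < 1 the lagged effects decay.
print(cox_msm_coefficients(3, -0.5, -0.8, 0.0, -0.5, 0.5))
# With beta2 = 0 all coefficients beyond lag 1 vanish, recovering Corollary 6.2.
print(cox_msm_coefficients(3, -0.5, -0.8, 0.0, -0.5, 0.0))
```

When |β2| ≥ 1 the lagged coefficients do not shrink with distance from m, which is one way to see why omitting earlier treatment terms is especially problematic in that case.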

9. Simulation Algorithm

The following general algorithm may be used to simulate a sample of n i.i.d. observations as described in §2 that follows a particular parametrization of the g-formula.

Let Pr[Am = 1|L̅m = l̅m, Ām−1 = ām−1, Ym = 0; α] be a parametric model for the probability of receiving treatment in interval m given survival to m and history (l̅m, ām−1). For each of i = 1, …, n simulated observations, implicitly define L̅−1,i ≡ Ā−1,i ≡ Y0,i = 0. Then for each observation i:

For m = 0, …, K:

  1. Draw Lm,i from some choice of f(Lm|Ām−1, L̅m−1, Ym = 0; β) evaluated at the previously generated (Ām−1,i, L̅m−1,i).

  2. Draw Am,i from some choice of Pr[Am = 1|L̅m, Ām−1, Ym = 0; α] evaluated at previously generated (Ām−1,i, L̅m,i).

  3. Draw Ym+1,i from some choice of Pr[Ym+1 = 1|L̅m, Ām, Ym = 0; θ] evaluated at previously generated (Ām,i, L̅m,i). If Ym+1,i = 1 then this is the last record in the data set for observation i. Otherwise, generate another record for observation i (i.e., go to index m + 1).
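The three steps above can be sketched as follows for one concrete choice of models: the logistic covariate model (16), the logistic outcome model (17), and a logistic treatment model depending only on the current covariate. This is a minimal sketch rather than the authors' code; the function names, the record layout (m, L, A, Y) and the example parameter values are assumptions made here.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_subject(K, alpha, beta1, theta, rng):
    """Generate person-interval records (m, L_m, A_m, Y_{m+1}) for one subject."""
    a0, a1 = alpha
    t0, t1, t2, t3 = theta
    records = []
    a_prev = 0  # treatment before baseline is set to 0
    for m in range(K + 1):
        # Step 1: draw the covariate from model (16).
        l = int(rng.random() < expit(beta1 * a_prev))
        # Step 2: draw treatment given the current covariate (assumed model).
        a = int(rng.random() < expit(a0 + a1 * l))
        # Step 3: draw the outcome from model (17).
        y = int(rng.random() < expit(t0 + t1 * l + t2 * a + t3 * a_prev))
        records.append((m, l, a, y))
        if y == 1:
            break  # failure: this is the subject's last record
        a_prev = a
    return records


def simulate_sample(n, K, alpha, beta1, theta, seed=0):
    """Generate n i.i.d. subjects with a reproducible random stream."""
    rng = random.Random(seed)
    return [simulate_subject(K, alpha, beta1, theta, rng) for _ in range(n)]


# Small illustrative run; theta0 = -1 is an assumed value chosen so that
# failures occur in a sample this small.
sample = simulate_sample(300, 6, (0.5, 0.5), -2.0, (-1.0, -2.0, -0.8, 0.0), seed=1)
print(len(sample), sum(recs[-1][3] for recs in sample))
```

The long (person-interval) layout mirrors the description in step 3, where each subject contributes one record per interval until failure or the end of follow-up.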

The above algorithm may be used to confirm theoretical results under any of the data generating assumptions considered above. As a simple illustration, we performed a simulation study where 20,000 samples were generated according to the above algorithm, each with n = 100,000 observations and K = 6. Data were generated according to the restrictions of Theorem 6.1 with the covariate and outcome generated according to the logistic regression models (16) and (17), respectively. Treatment was generated according to the logistic regression model logit[Pr(Am = 1|L̅m = l̅m, Ām−1 = ām−1, Ym = 0; α)] = α0 + α1lm for each m = 0, …, K.

Simulations were conducted under six different combinations of (β1, θ1), listed in Tables 1 to 3. In all scenarios, we fixed α0 = 0.5, α1 = 0.5, θ0 = −7, θ2 = −0.8 and θ3 = 0. As all components of θ were selected ≤ 0, the rare disease approximation (21) holds under all six scenarios. Recall that by the analytic results of §7, under all simulation scenarios the Cox MSM γ(m, ām, ψ) = ψ0am + ψ1am−1 + ψ2amam−1 holds with approximate values of ψ0, ψ1 and ψ2 defined as in (22), (23) and (24), respectively.

Table 1 presents the bias of the IPW estimates of ψ0 and ψ1 for the true ψ0 and ψ1, respectively, constructed under the correctly specified Cox MSM and model for treatment for the 20,000 runs. We see little bias at n = 100,000 in these estimates. See Appendix D for details of the IPW estimation procedure. Table 2 presents the bias of the MLE of θ2 for θ2 = ψ0 constructed under the correctly specified outcome regression model (17). Table 3 presents the bias of the MLE of θ3 for θ3, along with the bias for ψ1, also under the correct model (17). Our simulations confirm the analytic results of §7. In particular, we see little bias of the MLE of θ2 and θ3 for θ2 = ψ0 and θ3, respectively, regardless of the values of β1 and θ1. However, we see that the bias of the MLE of θ3 for ψ1 approximates θ3 − ψ1. As expected, this difference is approximately zero only when either β1 or θ1 is zero.

Table 1.

Bias of IPW estimates under the six choices of (β1, θ1) for n = 100,000 and K = 6. ψ̂j is the IPW estimate of ψj, E[ψ̂j] is the mean of the estimates ψ̂j over the 20,000 simulation runs and Bias(ψ̂j, ψj) = E[ψ̂j] − ψj, j = 0, 1.

| β1 | θ1 | ψ0 | E[ψ̂0] | Bias(ψ̂0, ψ0) | ψ1 | E[ψ̂1] | Bias(ψ̂1, ψ1) |
|---|---|---|---|---|---|---|---|
| −2.0 | −2.0 | −0.8 | −0.8056 | −0.0056 | 0.4574 | 0.4511 | −0.0063 |
| −0.5 | −0.5 | −0.8 | −0.8021 | −0.0021 | 0.0583 | 0.0586 | 0.0003 |
| 0.0 | −0.5 | −0.8 | −0.8011 | −0.0011 | 0 | 0.0004 | 0.0004 |
| −0.5 | 0.0 | −0.8 | −0.8004 | −0.0004 | 0 | 0.0002 | 0.0002 |
| 0.5 | −2.0 | −0.8 | −0.7973 | 0.0027 | −0.2064 | −0.2047 | 0.0017 |
| 2.0 | −2.0 | −0.8 | −0.7772 | 0.0228 | −0.8676 | −0.8709 | −0.0033 |

Table 2.

Bias of the MLE of θ2 for θ2 = ψ0 under the six choices of (β1, θ1) for n = 100,000 and K = 6. θ̂2 is the MLE of θ2, E[θ̂2] is the mean of the estimates θ̂2 over the 20,000 simulation runs and Bias(θ̂2, θ2) = E[θ̂2] − θ2.

| β1 | θ1 | θ2 | E[θ̂2] | Bias(θ̂2, θ2) |
|---|---|---|---|---|
| −2.0 | −2.0 | −0.8 | −0.8006 | −0.0006 |
| −0.5 | −0.5 | −0.8 | −0.8006 | −0.0006 |
| 0.0 | −0.5 | −0.8 | −0.8006 | −0.0006 |
| −0.5 | 0.0 | −0.8 | −0.8004 | −0.0004 |
| 0.5 | −2.0 | −0.8 | −0.8004 | −0.0004 |
| 2.0 | −2.0 | −0.8 | −0.7992 | 0.0008 |

Table 3.

Bias of the MLE of θ3 for both θ3 and ψ1 under the six choices of (β1, θ1) for n = 100,000 and K = 6. θ̂3 is the MLE of θ3, E[θ̂3] is the mean of the estimates θ̂3 over the 20,000 simulation runs, Bias(θ̂3, θ3) = E[θ̂3] − θ3 and Bias(θ̂3, ψ1) = E[θ̂3] − ψ1.

| β1 | θ1 | E[θ̂3] | θ3 | Bias(θ̂3, θ3) | ψ1 | Bias(θ̂3, ψ1) |
|---|---|---|---|---|---|---|
| −2.0 | −2.0 | 0.0008 | 0 | 0.0008 | 0.4574 | −0.4566 |
| −0.5 | −0.5 | −0.0005 | 0 | −0.0005 | 0.0583 | −0.0588 |
| 0.0 | −0.5 | −0.0007 | 0 | −0.0007 | 0 | −0.0007 |
| −0.5 | 0.0 | −0.0006 | 0 | −0.0006 | 0 | −0.0006 |
| 0.5 | −2.0 | −0.0013 | 0 | −0.0013 | −0.2064 | 0.2051 |
| 2.0 | −2.0 | −0.0060 | 0 | −0.0060 | −0.8676 | 0.8616 |

10. Relation to the g-null paradox

As discussed, a limitation of the proposed simulation approach is that, under some data generating assumptions, it may be intractable or impossible to solve for the true Cox MSM parameters. Interestingly, an additional limitation of the proposed simulation approach follows from previous arguments regarding the “g-null paradox” [20]. These arguments would suggest that given standard parametrizations of the observed data distribution consistent with the presence of a time-varying confounder affected by prior treatment, it is impossible for the null hypothesis of ψ = 0 to hold simultaneously. Our examples allow a careful consideration of this paradox in the current setting.

Consider the data generating assumptions of the simulation study described in §9. As before, under these assumptions, we approximately have ψ0 = 0 if θ2 = 0 by equation (22). Further, by equation (23), we have ψ1 = 0 if θ3 is set to

$$\exp(\theta_3) = \frac{\dfrac{1}{2}\{\exp(\theta_1) + 1\}}{\dfrac{\exp(\theta_1 + \beta_1)}{1 + \exp(\beta_1)} + \dfrac{1}{1 + \exp(\beta_1)}}$$

regardless of the values of θ1 and β1. Thus, we have at least one example illustrating that it is mathematically possible to generate data according to standard parametric models such that all components of ψ are zero and a time-varying confounder affected by prior treatment is present.
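A short numerical illustration of this construction follows. The function names are assumptions introduced here; `psi1_rare` evaluates the rare disease solution (23), and `theta3_for_null_psi1` returns the value of θ3 given by the display above.

```python
import math


def psi1_rare(theta1, theta3, beta1):
    """Rare disease solution (23) for psi1."""
    p = math.exp(beta1) / (1.0 + math.exp(beta1))
    return theta3 + math.log((math.exp(theta1) * p + (1.0 - p))
                             / (0.5 * (math.exp(theta1) + 1.0)))


def theta3_for_null_psi1(theta1, beta1):
    """Value of theta3 that sets psi1 = 0 in (23)."""
    p = math.exp(beta1) / (1.0 + math.exp(beta1))
    return math.log(0.5 * (math.exp(theta1) + 1.0)
                    / (math.exp(theta1) * p + (1.0 - p)))


theta1, beta1 = -2.0, -2.0  # one of the (beta1, theta1) pairs from Section 9
theta3 = theta3_for_null_psi1(theta1, beta1)
print(theta3, psi1_rare(theta1, theta3, beta1))
```

Here θ3 must take one specific non-zero value determined by θ1 and β1, which is the kind of exact cancellation the following paragraph argues is unlikely to occur in nature.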

However, we do not expect such a scenario, where one coefficient is restricted to depend on a function of other coefficients of the data generating mechanism, to occur in nature. Ruling out such exact cancellations is an instance of the faithfulness assumption invoked when causal directed acyclic graphs are used to represent underlying data generating mechanisms [21, 22]. With the proposed algorithm, we may therefore be limited to unrealistic simulation settings if we also wish to generate data under the null.

11. Discussion

In this paper, we have illustrated how to derive a closed form Cox MSM given a set of parametric models for the observed data distribution. This gives an approach for simulating from a known Cox MSM using a standard parametrization of the likelihood. In contrast to previously proposed simulation methods, this approach allows a comparison of the performance of IPW and standard regression-based estimators of the effect of a time-varying treatment on survival in the complete absence of model misspecification. This, in turn, allows isolation of any particular source of bias in a simulation study. These sources may include finite sample bias, that due to (known) model misspecification and that due to complex time-varying confounding structures.

We used our analytic results to demonstrate and confirm in an example simulation study that the bias of standard regression-based estimates depends, at least in part, on the degree to which parameters quantifying the presence of a time-varying confounder affected by prior treatment are non-zero. Using analytic results, one may know, prior to undertaking a simulation study, how much large sample bias to expect in such a standard estimate in the absence of model misspecification. Confirmation of these expectations reduces the possibility of coding errors.

Our arguments highlight the importance of clearly defining the target population parameter of interest in any consideration of bias. As discussed, standard estimates will be approximately unbiased in large samples for the coefficients on treatment history in a correctly specified outcome regression model conditional on past treatment and confounders. However, following previous graphical arguments in largely model-free settings [9, 23, 24], these coefficients may fail to have a causal interpretation, even given the identifying assumptions of §4, when Lm is a time-varying confounder affected by prior treatment. Our arguments further highlight the need for careful consideration when imposing a parsimonious Cox MSM. For example, as we showed, typical assumptions that restrict dependence of the causal hazard ratio on only the most recent values of treatment may be difficult to justify if one is unwilling to make potentially extreme Markov assumptions on the underlying observed data generating process.

Finally, the utility of the proposed approach to known Cox MSM data generation is not limited to simulation-based comparisons of IPW performance versus that of standard regression-based estimators. This approach may also be useful in simulation studies aimed at comparing IPW with other estimators of correctly specified Cox MSM parameters that rely on a correctly specified conditional outcome regression model for optimal performance. These include parametric g-computation [8, 25, 26, 16, 27] as well as double-robust methods [28, 29, 30, 31].

Acknowledgements

The authors thank Miguel Hernán for helpful comments. This work was funded by NIH grants HL080644, R01AI104459-01A1, and 1R21ES019712-01.

Appendix A

We explicitly prove Theorem 6.1 under the discrete case as the proof under the continuous case is identical requiring integrals instead of sums. Incorporating assumptions (6) and (7) into the g-formula (2) we have

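$$h(m, \bar{a}_m) = \frac{\sum_{l_m} g(l_m, a_m, a_{m-1})\, r(l_m, a_{m-1}) \sum_{\bar{l}_{m-1}} wt(m-1, \bar{l}_{m-1}, \bar{a}_{m-1})}{\sum_{l_m} r(l_m, a_{m-1}) \sum_{\bar{l}_{m-1}} wt(m-1, \bar{l}_{m-1}, \bar{a}_{m-1})} \tag{30}$$

where

$$wt(m-1, \bar{l}_{m-1}, \bar{a}_{m-1}) = \{1 - g(l_{m-1}, a_{m-1}, a_{m-2})\} \prod_{j=0}^{m-1} \{1 - g(l_{j-1}, a_{j-1}, a_{j-2})\}\, r(l_j, a_{j-1}).$$

The result follows by noting that $\sum_{\bar{l}_{m-1}} wt(m-1, \bar{l}_{m-1}, \bar{a}_{m-1})$ cancels in the numerator and denominator of (30) and $\sum_{l_m} r(l_m, a_{m-1}) = 1$.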
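The cancellation can also be confirmed numerically for binary Lm. The sketch below is illustrative only, with assumed function names and parameter values; it evaluates the g-formula (2) by brute-force enumeration of covariate histories under the Markov restrictions (6) and (7), and compares it with the sum of g·r over lm alone.

```python
import itertools
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def g(l, a, a_prev, theta):
    """Outcome model (17): Pr[Y_{m+1}=1 | l_m, a_m, a_{m-1}, Y_m=0]."""
    t0, t1, t2, t3 = theta
    return expit(t0 + t1 * l + t2 * a + t3 * a_prev)


def r(l, a_prev, beta1):
    """Covariate model (16): Pr[L_m=l | a_{m-1}, Y_m=0] for l in {0, 1}."""
    p = expit(beta1 * a_prev)
    return p if l == 1 else 1.0 - p


def h_gformula(a_bar, theta, beta1):
    """g-formula (2) by enumeration over all covariate histories."""
    m = len(a_bar) - 1

    def a_at(j):
        # Treatment before baseline is 0 by convention.
        return a_bar[j] if j >= 0 else 0

    num = den = 0.0
    for l_bar in itertools.product((0, 1), repeat=m + 1):
        w = 1.0
        for j in range(m + 1):
            if j >= 1:
                # Pr[Y_j = 0 | past]: survival through interval j - 1.
                w *= 1.0 - g(l_bar[j - 1], a_at(j - 1), a_at(j - 2), theta)
            w *= r(l_bar[j], a_at(j - 1), beta1)
        num += g(l_bar[m], a_at(m), a_at(m - 1), theta) * w
        den += w
    return num / den


def h_markov(a_m, a_prev, theta, beta1):
    """Sum over l_m of g * r, the reduced form implied by Theorem 6.1."""
    return sum(g(l, a_m, a_prev, theta) * r(l, a_prev, beta1) for l in (0, 1))


theta = (-1.0, 0.8, -0.5, 0.3)  # assumed illustrative values
beta1 = 0.6                     # assumed illustrative value
print(h_gformula((1, 0, 1, 1), theta, beta1), h_markov(1, 1, theta, beta1))
```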

Appendix B

Consider the worked example of §7 but where the model (17) is replaced by

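$$g(l_m, a_m, a_{m-1}; \theta) = \frac{\exp(\theta_0 + \theta_1 l_m + \theta_2 a_m + \theta_3 a_{m-1} + \theta_4 a_m a_{m-1})}{1 + \exp(\theta_0 + \theta_1 l_m + \theta_2 a_m + \theta_3 a_{m-1} + \theta_4 a_m a_{m-1})}$$

allowing interaction between am and am−1 as quantified by θ4. Plugging this alternative choice of g into (15), along with the original model (16) for r, we have

$$\exp(\psi_2) = \frac{\exp(\theta_4) \left\{\dfrac{\exp(\theta_1 + \beta_1)}{1 + \exp(\theta_0 + \theta_1 + \theta_2 + \theta_3 + \theta_4)} + \dfrac{1}{1 + \exp(\theta_0 + \theta_2 + \theta_3 + \theta_4)}\right\} \left\{\dfrac{\exp(\theta_1)}{1 + \exp(\theta_0 + \theta_1)} + \dfrac{1}{1 + \exp(\theta_0)}\right\}}{\left\{\dfrac{\exp(\theta_1)}{1 + \exp(\theta_0 + \theta_1 + \theta_2)} + \dfrac{1}{1 + \exp(\theta_0 + \theta_2)}\right\} \left\{\dfrac{\exp(\theta_1 + \beta_1)}{1 + \exp(\theta_0 + \theta_1 + \theta_3)} + \dfrac{1}{1 + \exp(\theta_0 + \theta_3)}\right\}}$$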

which is equivalent to (20) when θ4 = 0.

Given rare disease such that we have the approximation

$$g(l_m, a_m, a_{m-1}; \theta) \approx \exp(\theta_0 + \theta_1 l_m + \theta_2 a_m + \theta_3 a_{m-1} + \theta_4 a_m a_{m-1})$$

we obtain the simplified approximate solution

$$\psi_2 = \theta_4$$

which is equivalent to (24) when θ4 = 0.
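As a check on this appendix, the sketch below computes ψ2 in two ways under the interaction model: directly from (15), and from the closed form displayed above. Function names and parameter values are assumptions for illustration; with θ0 = −7 (rare disease) both values should be close to θ4.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def psi2_from_h(theta, beta1):
    """psi2 via (15) with the interaction model for g and model (16) for r."""
    t0, t1, t2, t3, t4 = theta

    def h(a, ap):
        p = expit(beta1 * ap)
        eta = t0 + t2 * a + t3 * ap + t4 * a * ap
        return expit(eta + t1) * p + expit(eta) * (1.0 - p)

    return math.log(h(1, 1) * h(0, 0) / (h(1, 0) * h(0, 1)))


def psi2_closed_form(theta, beta1):
    """psi2 via the closed-form expression of this appendix."""
    t0, t1, t2, t3, t4 = theta
    e = math.exp
    first = e(t1 + beta1) / (1 + e(t0 + t1 + t2 + t3 + t4)) + 1 / (1 + e(t0 + t2 + t3 + t4))
    second = e(t1) / (1 + e(t0 + t1)) + 1 / (1 + e(t0))
    third = e(t1) / (1 + e(t0 + t1 + t2)) + 1 / (1 + e(t0 + t2))
    fourth = e(t1 + beta1) / (1 + e(t0 + t1 + t3)) + 1 / (1 + e(t0 + t3))
    return t4 + math.log(first * second / (third * fourth))


theta = (-7.0, -2.0, -0.8, 0.0, 0.3)  # assumed values; theta4 = 0.3
beta1 = -2.0
print(psi2_from_h(theta, beta1), psi2_closed_form(theta, beta1))
```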

Appendix C

For any m = 0, …, K define

$$
\frac{b^{\bar a}(m)}{b^{\bar 0}(m)}=\frac{E[\exp(\theta_1L_m)\mid A_{m-1}=a_{m-1},L_{m-1}]}{E[\exp(\theta_1L_m)\mid A_{m-1}=0,L_{m-1}]}
$$

By (29) and the moment generating function for the normal distribution we have

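$$
\frac{b^{\bar a}(m)}{b^{\bar 0}(m)}
=\frac{\exp(\theta_1\beta_1a_{m-1})\exp(\tfrac12\sigma^2\theta_1^2)\exp(\theta_1\beta_2L_{m-1})}{\exp(\tfrac12\sigma^2\theta_1^2)\exp(\theta_1\beta_2L_{m-1})}
=\exp(\theta_1\beta_1a_{m-1})\frac{\exp(\theta_1\beta_2L_{m-1})}{\exp(\theta_1\beta_2L_{m-1})}
$$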

Next define

$$
\frac{b^{\bar a}(m-1)}{b^{\bar 0}(m-1)}=\frac{E[b^{\bar a}(m)\mid A_{m-2}=a_{m-2},L_{m-2}]}{E[b^{\bar 0}(m)\mid A_{m-2}=0,L_{m-2}]}
$$

By the fact that (29) holds for L0, …, LK and, again, using the moment generating function for the normal distribution we have

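$$
\begin{aligned}
\frac{E[b^{\bar a}(m)\mid A_{m-2}=a_{m-2},L_{m-2}]}{E[b^{\bar 0}(m)\mid A_{m-2}=0,L_{m-2}]}
&=\exp(\theta_1\beta_1a_{m-1})\frac{E[\exp(\theta_1\beta_2L_{m-1})\mid A_{m-2}=a_{m-2},L_{m-2}]}{E[\exp(\theta_1\beta_2L_{m-1})\mid A_{m-2}=0,L_{m-2}]}\\
&=\exp(\theta_1\beta_1a_{m-1})\frac{\exp(\tfrac12\sigma^2\theta_1^2\beta_2^2)\exp\{\theta_1\beta_2(\beta_1a_{m-2}+\beta_2L_{m-2})\}}{\exp(\tfrac12\sigma^2\theta_1^2\beta_2^2)\exp(\theta_1\beta_2^2L_{m-2})}\\
&=\exp(\theta_1\beta_1a_{m-1})\exp(\theta_1\beta_1\beta_2a_{m-2})\frac{\exp(\theta_1\beta_2^2L_{m-2})}{\exp(\theta_1\beta_2^2L_{m-2})}
\end{aligned}
$$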

Analogously define

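$$
\begin{aligned}
\frac{b^{\bar a}(m-2)}{b^{\bar 0}(m-2)}
&=\frac{E[b^{\bar a}(m-1)\mid A_{m-3}=a_{m-3},L_{m-3}]}{E[b^{\bar 0}(m-1)\mid A_{m-3}=0,L_{m-3}]}\\
&=\exp(\theta_1\beta_1a_{m-1})\exp(\theta_1\beta_1\beta_2a_{m-2})\exp(\theta_1\beta_1\beta_2^2a_{m-3})\frac{\exp(\theta_1\beta_2^3L_{m-3})}{\exp(\theta_1\beta_2^3L_{m-3})}
\end{aligned}
$$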

Arguing recursively for any j = 1, …, m − 1

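$$
\frac{b^{\bar a}(j)}{b^{\bar 0}(j)}=\exp\left(\sum_{s=j}^{m}\theta_1\beta_1\beta_2^{m-s}a_{s-1}\right)\frac{\exp(\theta_1\beta_2^{m-j+1}L_{j-1})}{\exp(\theta_1\beta_2^{m-j+1}L_{j-1})}
$$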

Setting j = 1 we have

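$$
\frac{b^{\bar a}(1)}{b^{\bar 0}(1)}=\exp\left(\sum_{s=1}^{m}\theta_1\beta_1\beta_2^{m-s}a_{s-1}\right)\frac{\exp(\theta_1\beta_2^{m}L_{0})}{\exp(\theta_1\beta_2^{m}L_{0})}
$$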

Further, by the convention (Ā−1, L̄−1) ≡ 0, we have

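$$
\frac{E[b^{\bar a}(1)\mid A_{-1}=a_{-1},L_{-1}]}{E[b^{\bar 0}(1)\mid A_{-1}=0,L_{-1}]}=\exp\left(\sum_{s=1}^{m}\theta_1\beta_1\beta_2^{m-s}a_{s-1}\right)
$$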

By (21) we have that

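$$
\frac{h(m,\bar a_m;\beta,\theta)}{h(m,\bar 0_m;\beta,\theta)}\approx\exp(\theta_2a_m+\theta_3a_{m-1})\frac{E[b^{\bar a}(1)\mid A_{-1}=a_{-1},L_{-1}]}{E[b^{\bar 0}(1)\mid A_{-1}=0,L_{-1}]} \tag{31}
$$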

Our result follows by noting that the RHS of (31) is equivalent to $\exp\left(\psi_0a_m+\psi_1a_{m-1}+\sum_{s=1}^{m-1}\psi_{m-s+1}a_{s-1}\right)$ where ψ0 = θ2, ψ1 = θ3 + θ1β1 and $\psi_{m-s+1}=\theta_1\beta_1\beta_2^{m-s}$, s = 1, …, m − 1.
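The recursion in this appendix can be traced numerically. The sketch below is a minimal illustration under stated assumptions: the conditional law of L_j given the past is taken to be normal with mean β1 A_{j−1} + β2 L_{j−1} and variance σ² (inferred from the moment generating function step above; any intercept would cancel in the ratio), and A_{−1} and L_{−1} are set to 0. Function names and parameter values are hypothetical. It integrates out L_m, …, L_0 one at a time using the normal moment generating function and compares the resulting log hazard ratio with the closed-form ψ coefficients.

```python
def msm_coefficients(theta1, theta2, theta3, beta1, beta2, m):
    # psi[k] multiplies a_{m-k}: psi_0 = theta2, psi_1 = theta3 + theta1*beta1,
    # psi_k = theta1 * beta1 * beta2**(k-1) for k = 2, ..., m.
    psi = [theta2, theta3 + theta1 * beta1]
    psi += [theta1 * beta1 * beta2 ** (k - 1) for k in range(2, m + 1)]
    return psi

def log_hr_closed_form(a_hist, psi):
    m = len(a_hist) - 1
    return sum(psi[k] * a_hist[m - k] for k in range(m + 1))

def log_b(a_hist, theta1, beta1, beta2, sigma2):
    # Log of the iterated expectation of exp(theta1 * L_m) given the treatment history.
    m = len(a_hist) - 1
    c, const = theta1, 0.0           # current exponent is const + c * L_j
    for j in range(m, -1, -1):       # integrate out L_j given (a_{j-1}, L_{j-1})
        a_prev = a_hist[j - 1] if j >= 1 else 0   # A_{-1} = 0 by convention
        const += c * beta1 * a_prev + 0.5 * sigma2 * c * c
        c *= beta2
    return const                     # the remaining c * L_{-1} term is 0

def log_hr_by_recursion(a_hist, theta, beta1, beta2, sigma2):
    theta1, theta2, theta3 = theta
    m = len(a_hist) - 1              # requires m >= 1
    zeros = [0] * (m + 1)
    ratio = (log_b(a_hist, theta1, beta1, beta2, sigma2)
             - log_b(zeros, theta1, beta1, beta2, sigma2))
    return theta2 * a_hist[m] + theta3 * a_hist[m - 1] + ratio

a_hist = (1, 0, 1, 1)
theta = (0.6, 0.3, 0.2)
psi = msm_coefficients(theta[0], theta[1], theta[2], 0.5, 0.7, len(a_hist) - 1)
print(log_hr_closed_form(a_hist, psi))
print(log_hr_by_recursion(a_hist, theta, 0.5, 0.7, 1.3))
```

The variance terms appear identically under both treatment histories and drop out of the ratio, which is why σ² does not enter the ψ coefficients.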

Appendix D

A typical implementation of IPW estimation is as follows. Let (ϕ̂, ψ̂) be the solution to the estimating equation

$$
\sum_{\bar a_m}\sum_{m=0}^{K}\sum_{i=1}^{n}U^{\bar a}_{i,m}(\phi',\psi',\hat\alpha)=0, \tag{32}
$$

with respect to (ϕ′, ψ′). Following the appendix of [15] and, again, suppressing the i subscript, define

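$$
U^{\bar a}_{m}(\phi',\psi',\hat\alpha)=\left[Y_{m+1}-\operatorname{expit}\{w(m;\phi')+\gamma(m,\bar a_m,\psi')\}\right](1-Y_m)\,W^{\bar a}_{m}(\hat\alpha)\,q(m,\bar a_m)\prod_{j=0}^{m}I(A_j=a_j)
$$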

where expit(·) = exp(·)/{1 + exp(·)}, w(m; ϕ′) is a function of interval m and a parameter vector ϕ′, q(m, ām) is a user-selected vector function of (m, ām) and

$$
W^{\bar a}_{m}(\hat\alpha)=\frac{1}{\prod_{j=0}^{m}\Pr[A_j=a_j\mid \bar L_j,\bar A_{j-1}=\bar a_{j-1},Y_j=0;\hat\alpha]}, \tag{33}
$$

with α̂ the MLE of α given the treatment model Pr[Am = am | L̄m, Ām−1, Ym = 0; α].
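To make expression (33) concrete, the following sketch computes the cumulative weights for one subject. The treatment model used here, a logistic regression on the current confounder value and the previous treatment, and the names `treatment_prob`, `ipw_weights` and `alpha` are hypothetical choices for illustration, not the model used in the paper.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def treatment_prob(a_j, l_j, a_prev, alpha):
    # Hypothetical treatment model: Pr[A_j = 1 | L_j, A_{j-1}] is logistic in (l_j, a_prev).
    p1 = expit(alpha[0] + alpha[1] * l_j + alpha[2] * a_prev)
    return p1 if a_j == 1 else 1.0 - p1

def ipw_weights(a_hist, l_hist, alpha):
    # Returns [W_0, ..., W_m]: the inverse of the cumulative product of the
    # probabilities of the treatment actually received, as in (33).
    weights, denom, a_prev = [], 1.0, 0   # A_{-1} taken as 0
    for a_j, l_j in zip(a_hist, l_hist):
        denom *= treatment_prob(a_j, l_j, a_prev, alpha)
        weights.append(1.0 / denom)
        a_prev = a_j
    return weights

print(ipw_weights([1, 0, 1], [0, 1, 1], alpha=(-0.2, 0.8, 1.0)))
```

In practice `alpha` would be replaced by its maximum likelihood estimate from the observed person-time data.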

Assume the following holds for all m and ā:

  1. The Cox MSM (3) holds,

  2. h(m, 0̅m) = exp{w(m; ϕ′)} when evaluated at ϕ′ = ϕ,

  3. the treatment model Pr[Am = am | L̄m, Ām−1, Ym = 0; α] is correctly specified and

  4. h(m, ām) ≈ 0 such that (given assumptions 1 and 2)

$$
h(m,\bar a_m)\approx\operatorname{expit}\{w(m;\phi)+\gamma(m,\bar a_m,\psi)\}. \tag{34}
$$

Given these four assumptions we have

$$
E[U^{\bar a}_{m}(\phi,\psi,\alpha)]\approx 0 \tag{35}
$$

for all m and ām, and the IPW estimator ψ̂ is approximately consistent for ψ and asymptotically normal. In large samples, the choice of q(m, ām) affects only the efficiency of ψ̂.

Note that the approximation (35) follows under these assumptions by the equivalence between the g-formula (2) and the ratio of expectations

$$
\frac{E\left[Y_{m+1}(1-Y_m)\,W^{\bar a}_{m}\prod_{j=0}^{m}I(A_j=a_j)\right]}{E\left[(1-Y_m)\,W^{\bar a}_{m}\prod_{j=0}^{m}I(A_j=a_j)\right]}
$$

where $W^{\bar a}_{m}$ is equivalent to $W^{\bar a}_{m}(\hat\alpha)$ but with the true conditional probability of treatment Pr[Am = am | L̄m, Ām−1 = ām−1, Ym = 0] replacing its MLE.
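The ratio of expectations above has a simple sample analogue: a weighted proportion of events among subjects still at risk who followed the treatment history of interest. The sketch below illustrates this with known weights; the record layout `(a_hist, y_m, y_next, w_m)` and the function name are hypothetical.

```python
def weighted_hazard(records, a_bar):
    # records: iterable of (a_hist, y_m, y_next, w_m), where a_hist is the observed
    # treatment history through m, y_m and y_next are the event indicators
    # Y_m and Y_{m+1}, and w_m is the weight W_m for that history.
    num = den = 0.0
    for a_hist, y_m, y_next, w_m in records:
        if tuple(a_hist) != tuple(a_bar):   # product of indicators I(A_j = a_j)
            continue
        at_risk = 1 - y_m                   # factor (1 - Y_m)
        num += y_next * at_risk * w_m
        den += at_risk * w_m
    return num / den

records = [
    ((1, 1), 0, 1, 2.0),
    ((1, 1), 0, 0, 2.0),
    ((1, 0), 0, 1, 4.0),   # excluded: different treatment history
    ((1, 1), 1, 1, 1.0),   # excluded: event already occurred by m
]
print(weighted_hazard(records, (1, 1)))
```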

A convenient choice of q(m, ām) is $\{\partial w(m;\phi')/\partial\phi'^{\mathsf T},\ \partial\gamma(m,\bar a_m;\psi')/\partial\psi'^{\mathsf T}\}^{\mathsf T}$. With this choice an approximate solution to (32) may be obtained with off-the-shelf software by fitting a weighted logistic regression model in a person-time data set of the structure described in §9. This approach was used to construct IPW estimates in the example simulation study also described in §9. Specifically, a logistic regression model was fit in SAS using the LOGISTIC procedure with dependent variable Ym+1 and independent variables Am and Am−1 along with a completely flexible function of m = 0, …, 6. The WEIGHT statement was used with stabilized weights to increase efficiency. That is, the weight for observation i at time m was defined by expression (33) under the model used to generate treatment and then multiplied by an estimate of $\prod_{j=0}^{m}\Pr[A_j=a_j\mid\bar A_{j-1}=\bar a_{j-1},Y_j=0]$, with ām selected as that observation’s treatment history through m. The weight numerator can be considered an implicit component of q(m, ām). It is straightforward to confirm that the data generating parameters under all simulation scenarios maintain assumption (34) for all m and ām. Code is available upon request.
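The weighting and fitting steps described above can be sketched in a few lines. This is not the authors' SAS code: it is a minimal Python illustration, with hypothetical helper names and toy data, of forming stabilized weights from numerator and denominator treatment probabilities and then solving the weighted logistic score equations by Newton's method.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def stabilized_weights(p_num, p_den):
    # p_num[j]: estimated Pr[A_j = a_j | treatment history only] (numerator model)
    # p_den[j]: fitted Pr[A_j = a_j | confounder and treatment history], as in (33)
    sw, ratio = [], 1.0
    for pn, pd in zip(p_num, p_den):
        ratio *= pn / pd
        sw.append(ratio)
    return sw

def solve_linear(a, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for k in range(col + 1, n):
            f = aug[k][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[k][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))) / aug[i][i]
    return x

def weighted_logistic(rows, y, w, n_iter=25):
    # Newton iterations on the weighted score: sum_i w_i * (y_i - p_i) * x_i = 0.
    k = len(rows[0])
    coef = [0.0] * k
    for _ in range(n_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, yi, wi in zip(rows, y, w):
            p = expit(sum(c * xj for c, xj in zip(coef, x)))
            for a in range(k):
                score[a] += wi * (yi - p) * x[a]
                for b in range(k):
                    info[a][b] += wi * p * (1.0 - p) * x[a] * x[b]
        step = solve_linear(info, score)
        coef = [c + s for c, s in zip(coef, step)]
    return coef

# Toy person-time rows: intercept and current treatment A_m (hypothetical data).
rows = [[1, 0]] * 4 + [[1, 1]] * 4
y = [0, 0, 1, 0, 1, 0, 1, 0]
w = [1.0, 2.0, 1.0, 1.0, 1.5, 1.0, 0.5, 2.0]
coef = weighted_logistic(rows, y, w)
print(coef, expit(coef[0]), expit(coef[0] + coef[1]))
```

Because the toy model is saturated in A_m, the fitted probabilities equal the weighted event proportions in each treatment group, which gives a simple way to check the solver. A real analysis would add A_{m−1} and interval indicators to each row and use a robust variance estimator, neither of which is shown here.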

References

  • 1. Robins JM. Statistical Models in Epidemiology. New York: Springer; 1999. Marginal structural models versus structural nested models as tools for causal inference; pp. 95–133.
  • 2. Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11(5):561–570. doi: 10.1097/00001648-200009000-00012.
  • 3. Young JG, Hernán MA, Picciotto S, Robins JM. JSM Proceedings, Section on Statistics in Epidemiology. Alexandria, VA: American Statistical Association; 2008. Simulation from structural survival models under complex time-varying data structures.
  • 4. Young JG, Hernán MA, Picciotto S, Robins JM. Equivalence between structural models for the effect of a time-varying exposure on survival. Lifetime Data Analysis. 2010;16(1):71–84. doi: 10.1007/s10985-009-9135-3.
  • 5. Havercroft WG, Didelez V. Simulating from marginal structural models with time-dependent confounding. Statistics in Medicine. 2012;31(30):4190–4206. doi: 10.1002/sim.5472.
  • 6. Westreich D, Cole SR, Schisterman EF, Platt RW. A simulation study of finite-sample properties of marginal structural Cox proportional hazards models. Statistics in Medicine. 2012;31(19):2098–2109. doi: 10.1002/sim.5317.
  • 7. Xiao Y, Abrahamowicz M, Moodie EE. Accuracy of conventional and marginal structural Cox model estimators: a simulation study. International Journal of Biostatistics. 2010;6(2) doi: 10.2202/1557-4679.1208. Article 13.
  • 8. Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period: application to the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512. [Errata (1987) in Computers and Mathematics with Applications 14, 917–921. Addendum (1987) in Computers and Mathematics with Applications 14, 923–945. Errata (1987) to addendum in Computers and Mathematics with Applications 18, 477.]
  • 9. Robins JM. Causal inference from complex longitudinal data. In: Berkane M, editor. Latent Variable Modeling and Applications to Causality. Lecture notes in statistics 120. Springer-Verlag; 1997. pp. 69–117.
  • 10. Murphy SA, van der Laan MJ, Robins JM. Marginal mean models for dynamic regimes. Journal of the American Statistical Association. 2001;96(456):1410–1423. doi: 10.1198/016214501753382327.
  • 11. van der Laan MJ, Petersen ML, Joffe MM. History-adjusted marginal structural models and statically-optimal dynamic treatment regimens. International Journal of Biostatistics. 2005;1(1) Article 4.
  • 12. Hernán MA, Lanoy E, Costagliola D, Robins JM. Comparison of dynamic treatment regimes via inverse probability weighting. Basic & Clinical Pharmacology & Toxicology. 2006;98:237–242. doi: 10.1111/j.1742-7843.2006.pto_329.x.
  • 13. Orellana L, Rotnitzky A, Robins JM. Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, Part I: Main Content. International Journal of Biostatistics. 2010a;6 Article 7.
  • 14. Orellana L, Rotnitzky A, Robins JM. Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, Part II: Proofs and Additional Results. International Journal of Biostatistics. 2010b;6 doi: 10.2202/1557-4679.1242. Article 8.
  • 15. Cain LE, Robins JM, Lanoy E, Logan R, Costagliola D, Hernán MA. When to start treatment? A systematic approach to the comparison of dynamic regimes using observational data. International Journal of Biostatistics. 2010;6 doi: 10.2202/1557-4679.1212. Article 18.
  • 16. Young JG, Cain LE, Robins JM, O’Reilly EJ, Hernán MA. Comparative effectiveness of dynamic treatment regimes: an application of the parametric g-formula. Statistics in Biosciences. 2011 doi: 10.1007/s12561-011-9040-7.
  • 17. D’Agostino RB, Lee M, Belanger AJ. Relation of pooled logistic regression to time-dependent Cox regression analysis: the Framingham Heart Study. Statistics in Medicine. 1990;9:1501–1515. doi: 10.1002/sim.4780091214.
  • 18. Agresti A. Categorical Data Analysis. Connecticut, USA: Wiley Series in Probability and Statistics; 2012.
  • 19. Wang Z, Louis TA. Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function. Biometrika. 2003;90(3):765–775.
  • 20. Robins JM, Wasserman L. Estimation of effects of sequential treatments by reparameterizing directed acyclic graphs. In: Geiger D, Shenoy P, editors. Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann; 1997. pp. 409–420.
  • 21. Spirtes P, Glymour C, Scheines R. Causation, Prediction and Search. New York: Springer-Verlag; 1993.
  • 22. Pearl J. Causal diagrams for empirical research. Biometrika. 1995;82:669–710.
  • 23. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15:615–625. doi: 10.1097/01.ede.0000135174.63482.43.
  • 24. Robins JM, Hernán MA. Estimation of the causal effects of time-varying exposures. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, editors. Advances in Longitudinal Data Analysis. Boca Raton, FL: Chapman and Hall/CRC Press; 2009. pp. 553–599.
  • 25. Robins JM, Hernán MA, Siebert U. Effects of multiple interventions. In: Ezzati M, Lopez AD, Rodgers A, Murray CJL, editors. Comparative Quantification of Health Risks: Global and Regional Burden of Disease Attributable to Selected Major Risk Factors. Geneva: World Health Organization; 2004.
  • 26. Taubman SL, Robins JM, Mittleman MA, Hernán MA. Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. International Journal of Epidemiology. 2009;38(6):1599–1611. doi: 10.1093/ije/dyp192.
  • 27. Westreich D, Cole SR, Young JG, Palella F, Tien PC, Kingsley L, Gange SJ, Hernán MA. The parametric g-formula to estimate the effect of highly active antiretroviral therapy on incident AIDS or death. Statistics in Medicine. 2012 doi: 10.1002/sim.5316.
  • 28. van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. New York: Springer; 2002.
  • 29. Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61:962–973. doi: 10.1111/j.1541-0420.2005.00377.x.
  • 30. van der Laan MJ. Targeted maximum likelihood based causal inference: Part I. International Journal of Biostatistics. 2010;6(2) doi: 10.2202/1557-4679.1211. Article 2.
  • 31. van der Laan MJ. Targeted maximum likelihood based causal inference: Part II. International Journal of Biostatistics. 2010;6(2) doi: 10.2202/1557-4679.1211. Article 3.
