Simulation from a known Cox MSM using standard parametric models for the g-formula

Jessica G Young; Eric J Tchetgen Tchetgen

doi:10.1002/sim.5994

Author manuscript; available in PMC: 2015 Mar 15.

Published in final edited form as: Stat Med. 2013 Oct 22;33(6):1001–1014. doi: 10.1002/sim.5994


PMCID: PMC3947915 NIHMSID: NIHMS531688 PMID: 24151138

Abstract

It is routinely argued that, unlike standard regression-based estimates, inverse probability weighted (IPW) estimates of the parameters of a correctly specified Cox marginal structural model (MSM) may remain unbiased in the presence of a time-varying confounder affected by prior treatment. Previously proposed methods for simulating from a known Cox MSM lack knowledge of the law of the observed outcome conditional on the measured past. While unbiased IPW estimation does not require this knowledge, standard regression-based estimates rely on correct specification of this law. Thus, in typical high-dimensional settings, such simulation methods cannot isolate bias due to complex time-varying confounding as it may be conflated with bias due to misspecification of the outcome regression model. In this paper, we describe an approach to Cox MSM data generation that allows for a comparison of the bias of IPW estimates versus that of standard regression-based estimates in the complete absence of model misspecification. This approach involves simulating data from a standard parametrization of the likelihood and solving for the underlying Cox MSM. We prove that solutions exist and computations are tractable under many data generating mechanisms. We show analytically and confirm in simulations that, in the absence of model misspecification, the bias of standard regression-based estimates for the parameters of a Cox MSM is indeed a function of the coefficients in observed data models quantifying the presence of a time-varying confounder affected by prior treatment. We discuss limitations of this approach including that implied by the “g-null paradox”.

Keywords: marginal structural models, simulation, g-formula, survival analysis, causal inference, g-null paradox

1. Introduction

Inverse probability weighted (IPW) estimation of Cox Marginal Structural Models (MSMs) [1, 2] is now a popular approach to estimating the causal effect of a time-varying treatment on survival in observational studies. A Cox MSM is a model for the hazard ratio at a given follow-up time comparing counterfactual time-varying treatment regimes. It is routinely argued that, unlike standard regression-based estimates, IPW estimates of the parameters of a correctly specified Cox MSM may remain unbiased in the presence of a time-varying confounder affected by prior treatment. For example, in observational studies of HIV-infected patients, CD4 cell count affects whether a patient will receive treatment and is associated with future survival. It is also itself affected by whether treatment has been previously initiated.

Young et al. [3, 4] and Havercroft and Didelez [5] have proposed algorithms for simulating data under a known Cox MSM and known model for the treatment mechanism. Westreich et al. [6] recently applied a variant of one of these algorithms to compare the performance of IPW estimates and standard regression-based estimates of the true Cox MSM parameters under several simulation scenarios where time-varying confounding affected by prior treatment is present. As knowledge of the correct functional form of the Cox MSM and of the model for the treatment mechanism is required for unbiasedness of IPW estimation, these previously proposed approaches are useful for simulation studies of IPW estimator performance. In particular, under such data generating algorithms, the properties of IPW estimators in the complete absence of model misspecification may be studied.

These previously proposed simulation approaches, however, lack explicit knowledge of the law of the observed outcome at each time conditional on the measured past. Unlike IPW estimates, standard regression-based estimates rely on correct specification of this law. In settings most often of interest, where treatment and confounders are frequently updated over time and/or covariates are high-dimensional, regression-based estimates cannot be constructed non-parametrically and, typically, parametric models are used. It follows that, in such settings, these previous simulation methods will not be useful for studying the performance of standard regression-based estimates as bias due to time-varying confounding may be conflated with bias due to model misspecification.

Xiao et al. [7] suggested an alternative approach to simulating from a Cox MSM by generating according to standard parametric models for the joint distribution of the observed data. These authors argued, under a particular data generating mechanism and a rare disease assumption, that the parameters of the underlying Cox MSM may be derived analytically from the parameters of the specified observed data generating models. These include a regression model for the treatment mechanism used in the construction of IPW estimates. These also include a regression model for the law of the outcome conditional on the measured past used in the construction of standard regression-based estimates. In turn, this approach allows a comparison of IPW and standard regression-based estimates in the absence of model misspecification.

In this paper, we show more generally that the parameters of an underlying Cox MSM may be derived based on a particular parametrization of the observed data distribution. This derivation follows from the general relationship between a Cox MSM and Robins’ g-formula [8]. We prove that solving for the true Cox MSM parameters is both possible and computationally tractable under many data generating models with or without the assumption of rare disease. Various examples are presented where follow-up time is arbitrary and standard parametric models for the observed data are imposed such that time-varying confounding affected by prior treatment is present. A large sample simulation study is also presented. We begin with a description of the observed data to be generated.

2. Observed data structure

We wish to generate samples of n i.i.d. observations where each observation represents measurements on a subject in a hypothetical observational study. In this study, subjects are followed beginning at time t₀ (baseline) and the investigator takes measurements on each subject during frequent, regular intervals. Specifically, for m = 0, …, K, let A_m be the value of a binary treatment measured during interval m defined by [t_m, t_m+1), with A_m = 1 if the subject is treated and A_m = 0 otherwise. Further, let L_m be a covariate measured at the start of that same interval. Define Y_m+1 = I(T ≤ t_m+1) with T a subject’s exact failure time which may be either continuous or discrete; equivalently, Y_m+1 is an indicator of failure by t_m+1.

In general, we denote the history of a random variable using overbars; for example Ā_m = (A₀, …, A_m) is the observed treatment history through the end of interval m. By definition Y̅₀ = 0 (all subjects must be at risk for failure at baseline). For notational convenience we set L̅₋₁ and Ā₋₁ to be identically 0. To simplify the presentation, we will assume no loss to follow-up.

3. Definition of a Cox MSM

Let ā ≡ ā_K = (a₀, …, a_m, …, a_K) denote a treatment regime in 𝒜̅_K, the support of Ā_K consisting of all possible treatment regimes. Examples include ā = 1̅ (or “always treat”) and ā = 0̅ (or “never treat”). Treatment regimes of the form ā are known as static regimes in that the treatment received in every future interval by someone following that regime is deterministically known at baseline. By contrast, a dynamic treatment regime is one under which treatment at a future time may depend on the values of evolving time-dependent covariates. For example, see [9, 10, 11, 12, 13, 14, 15, 16]. In this paper we limit our attention to causal contrasts involving only static regimes.

Define ${Y̅}_{K + 1}^{ā}$ as the outcome history a subject would have had if, possibly contrary to fact, she followed regime ā with T^ā her exact failure time. A discrete time Cox MSM γ(m, ā_m, ψ) is defined as

\frac{Pr [Y_{m + 1}^{ā} = 1 | Y_{m}^{ā} = 0]}{Pr [Y_{m + 1}^{0̅} = 1 | Y_{m}^{0̅} = 0]} = exp {γ (m, ā_{m}, ψ)}

(1)

where ψ is a constant parameter vector, and γ is a particular function of ψ, regime ā through m and m. The parameter ψ encodes the causal treatment effect of following a static regime ā compared with 0̅ up to any interval m + 1 such that ψ = 0 if and only if $Pr [Y_{m + 1}^{ā} = 1 | Y_{m}^{ā} = 0] = Pr [Y_{m + 1}^{0̅} = 1 | Y_{m}^{0̅} = 0]$ for all ā_m in the support of Ā_m, m = 0, …, K.

Following D’Agostino et al. [17], when T^ā is continuous and measurement intervals are sufficiently small that the event rate is negligible within each interval and over the follow-up, the discrete time Cox MSM (1) may approximate the continuous time Cox MSM

λ_{T^{ā}} (t) = λ_{0} (t) exp {γ (t, ā_{t}, ψ)}

for t ∈ [t_m, t_m+1) where λ_T^ā (t) and λ₀(t) are the counterfactual hazards at t under regimes ā and “never treat”, respectively, for all ā and m = 0, …, K.
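As a rough numerical illustration of this approximation, the sketch below assumes (for illustration only, this is not an assumption made in the text) that the counterfactual hazards are constant over time, with hypothetical values `lam0` under "never treat" and `lam0 * exp(psi)` under the comparison regime. Under that assumption the discrete-time ratio on the LHS of (1) can be computed exactly for any interval width and compared with exp(ψ).

```python
from math import exp, log

def discrete_log_hazard_ratio(lam0, psi, width):
    """Log of the discrete-time hazard ratio over one interval of the given
    width, when the continuous-time hazards are constant: lam0 under
    'never treat' and lam0 * exp(psi) under the comparison regime."""
    p_treated = 1.0 - exp(-lam0 * exp(psi) * width)   # Pr[fail in interval | at risk]
    p_untreated = 1.0 - exp(-lam0 * width)
    return log(p_treated / p_untreated)

# Hypothetical values chosen only for illustration.
lam0, psi = 0.5, -0.8
for width in (1.0, 0.1, 0.01, 0.001):
    print(width, round(discrete_log_hazard_ratio(lam0, psi, width), 4))
```

As the interval width shrinks, the printed log ratio moves toward ψ, which is the sense in which (1) approximates the continuous time model.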

4. Identifying assumptions and the g-formula

Suppose the goal of this hypothetical study is to obtain an unbiased estimate of the parameter vector ψ under model (1). Note that, if γ(m, ā_m, ψ) is a saturated model – i.e., the ratio on the LHS of (1) is allowed to differ for every possible ā_m and m – then the Cox MSM is by definition correctly specified. As ${Y̅}_{K + 1}^{ā}$ for all ā_K in the support of Ā_K are not observed for all study subjects, in order to identify ψ based only on measured variables, we require additional assumptions. For each m = 0, …, K and each static regime ā, suppose that the following hold:

Consistency: If Ā_m = ā_m then ${Y̅}_{m + 1} = {Y̅}_{m + 1}^{ā}$ and ${L̅}_{m} = {L̅}_{m}^{ā}$ with ${L̅}_{m}^{ā}$ the covariate history through m under ā.
Positivity: f_{Ā_m−1,L̅_m,Y_m} (ā_m−1, l̄_m, 0) ≠ 0 ⇒ Pr[A_m = a_m|L̅_m = l̄_m,Ā_m−1 = ā_m−1, Y_m = 0] > 0 w.p.1.
Exchangeability (no unmeasured confounding):
$(Y_{m + 1}^{ā}, \dots, Y_{K + 1}^{ā}) ∐ A_{m} | {L̅}_{m} = {l̄}_{m}, Ā_{m - 1} = ā_{m - 1}, Y_{m} = 0$
where, in general, A ∐ B|C denotes “A is independent of B given C”.

As stated in the appendix of [3], under the above three identifying assumptions for a given static regime ā, $Pr [Y_{m + 1}^{ā} = 1 | Y_{m}^{ā} = 0]$ is given by Robins’ g-formula [8]:

h (m, ā_{m}) = \sum_{{l̄}_{m}} Pr [Y_{m + 1} = 1 | {L̅}_{m} = {l̄}_{m}, Ā_{m} = ā_{m}, Y_{m} = 0] w_{ā} (m, {l̄}_{m}) / \sum_{{l̄}_{m}} w_{ā} (m, {l̄}_{m})

(2)

with

w_{ā} (m, {l̄}_{m}) = \prod_{j = 0}^{m} Pr [Y_{j} = 0 | {L̅}_{j - 1} = {l̄}_{j - 1}, Ā_{j - 1} = ā_{j - 1}, Y_{j - 1} = 0] f (l_{j} | ā_{j - 1}, {l̄}_{j - 1}, Y_{j} = 0)

for m = 0, …, K.

It follows that, given our identifying assumptions, the Cox MSM (1) will hold in our study population if the following relationship holds

\frac{h (m, ā_{m})}{h (m, {0̅}_{m})} = exp {γ (m, ā_{m}, ψ)}

(3)

for all ā_m and m = 0, …, K. For simplicity, we have generally expressed the g-formula h(m, ā_m) above in terms of a high-dimensional sum. However, when L̅_m contains continuously measured components, we may replace sums with integrals.

5. Parametric assumptions on the g-formula

Let us now additionally assume that the components of the g-formula h(m, ā_m) may be characterized by standard parametric models. Under these additional restrictions, it follows that the Cox MSM (1) holds if

\frac{h (m, ā_{m}; β, θ)}{h (m, {0̅}_{m}; β, θ)} = exp {γ (m, ā_{m}, ψ)}

(4)

for all ā_m and m = 0, …, K where

h (m, ā_{m}; β, θ) = \sum_{{l̄}_{m}} Pr [Y_{m + 1} = 1 | {L̅}_{m} = {l̄}_{m}, Ā_{m} = ā_{m}, Y_{m} = 0; θ] w_{ā} (m, {l̄}_{m}; β, θ) / \sum_{{l̄}_{m}} w_{ā} (m, {l̄}_{m}; β, θ)

(5)

with

w_{ā} (m, {l̄}_{m}; β, θ) = \prod_{j = 0}^{m} Pr [Y_{j} = 0 | {L̅}_{j - 1} = {l̄}_{j - 1}, Ā_{j - 1} = ā_{j - 1}, Y_{j - 1} = 0; θ] f (l_{j} | ā_{j - 1}, {l̄}_{j - 1}, Y_{j} = 0; β)

such that (5) is a particular parametric version of (2).

We can now explicitly connect models for the observed data likelihood to a Cox MSM. Specifically, based on (4), when L̅_m contains only discrete components, one can, at least in theory, derive in closed form the underlying Cox MSM γ(m, ā_m, ψ) that holds for any choice of parametric models. However, this derivation may become computationally unwieldy in practice without additional restrictions (e.g. Markov assumptions) when the confounders may take on many levels and/or K is large.

When L̅_m contains continuously measured components, deriving the true Cox MSM based on a particular parametrization of the g-formula may require evaluating integrals with no closed form. In this case, whether there is a closed form solution will depend on the choice of parametrization and, possibly, whether additional restrictions are imposed (e.g. rare disease assumptions). In the following sections, we consider a variety of data generating assumptions under which we may tractably derive the true Cox MSM implied by a standard parametrization of the observed data likelihood.

6. Cox MSMs under Markov assumptions

Suppose the following Markov assumptions hold:

Pr [Y_{m + 1} = 1 | {L̅}_{m} = {l̄}_{m}, Ā_{m} = ā_{m}, Y_{m} = 0] = g (l_{m}, a_{m}, a_{m - 1})

(6)

and

f (l_{m} | ā_{m - 1}, {l̄}_{m - 1}, Y_{m} = 0) = r (l_{m}, a_{m - 1})

(7)

m = 0, …, K where g is any function taking values between 0 and 1 and r is any conditional probability mass function (for L_m discrete) or density (for L_m continuous). We now have the following theorem.

Theorem 6.1 Assume that the restrictions (6) and (7) hold for all m = 0, …, K. Then, it follows that the hazard ratio $\frac{h (m, ā_{m})}{h (m, {0̅}_{m})}$ only depends on (m, ā_m) through (a_m, a_m−1) with

\frac{h (m, ā_{m})}{h (m, {0̅}_{m})} = \frac{\sum_{l_{m}} g (l_{m}, a_{m}, a_{m - 1}) r (l_{m}, a_{m - 1})}{\sum_{l_{m}} g (l_{m}, 0, 0) r (l_{m}, 0)}

(8)

for L_m discrete and

\frac{h (m, ā_{m})}{h (m, {0̅}_{m})} = \frac{\int_{- \infty}^{\infty} g (l_{m}, a_{m}, a_{m - 1}) r (l_{m}, a_{m - 1}) d l_{m}}{\int_{- \infty}^{\infty} g (l_{m}, 0, 0) r (l_{m}, 0) d l_{m}}

(9)

for L_m continuous.

A proof of Theorem 6.1 is given in Appendix A. The following is a corollary of Theorem 6.1.
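The result in (8) can be checked numerically for binary L_m. The sketch below evaluates the g-formula (2) by brute-force enumeration of all covariate histories and compares it with the single-interval sum in the numerator of (8). The logistic forms of `g` and `r` and all coefficient values are hypothetical choices made only for this check; any g and r satisfying (6) and (7) could be substituted.

```python
from itertools import product
from math import exp

def expit(x):
    return 1.0 / (1.0 + exp(-x))

def g(l, a, a_prev, theta=(-3.0, 0.7, -0.8, 0.4)):
    """Hypothetical Markov outcome model as in (6)."""
    t0, t1, t2, t3 = theta
    return expit(t0 + t1 * l + t2 * a + t3 * a_prev)

def r(l, a_prev, beta1=1.2):
    """Hypothetical Markov covariate model as in (7), binary L_m."""
    p1 = expit(beta1 * a_prev)
    return p1 if l == 1 else 1.0 - p1

def h_bruteforce(m, abar):
    """g-formula (2): sum over every covariate history (l_0, ..., l_m)."""
    num = den = 0.0
    for lbar in product((0, 1), repeat=m + 1):
        w = 1.0
        for j in range(m + 1):
            a_jm1 = abar[j - 1] if j >= 1 else 0      # a_{-1} = 0 by convention
            if j >= 1:                                 # Pr[Y_0 = 0] = 1 by definition
                a_jm2 = abar[j - 2] if j >= 2 else 0
                w *= 1.0 - g(lbar[j - 1], a_jm1, a_jm2)
            w *= r(lbar[j], a_jm1)
        a_prev = abar[m - 1] if m >= 1 else 0
        num += g(lbar[m], abar[m], a_prev) * w
        den += w
    return num / den

def h_markov(a, a_prev):
    """Single-interval sum appearing in the numerator of (8)."""
    return sum(g(l, a, a_prev) * r(l, a_prev) for l in (0, 1))

abar = (1, 0, 1, 1)   # an arbitrary static regime through m = 3
print(h_bruteforce(3, abar), h_markov(abar[3], abar[2]))
```

The two printed values agree up to floating point error, and the brute-force value is unchanged if the earlier entries of `abar` are altered while (a_m, a_m−1) is held fixed.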

Corollary 6.2 Suppose the assumptions of Theorem 6.1 hold and denote h(m, ā_m) ≡ h(a_m, a_m−1). Then, the Cox MSM γ(m, ā_m, ψ) = ψ₀a_m + ψ₁a_m−1 + ψ₂a_ma_m−1 holds with

exp (ψ_{0}) = \frac{h (1, 0)}{h (0, 0)}

(10)

exp (ψ_{1}) = \frac{h (0, 1)}{h (0, 0)}

(11)

exp (ψ_{2}) = \frac{h (1, 1)}{h (0, 0)} \times \frac{1}{exp (ψ_{0}) exp (ψ_{1})}

(12)
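Expressions (10), (11) and (12) amount to a one-to-one mapping from the four hazards h(a_m, a_m−1) to (ψ₀, ψ₁, ψ₂). A minimal sketch of that mapping follows; the function names and the numeric hazards are hypothetical and serve only to illustrate the calculation.

```python
from math import log

def cox_msm_params(h):
    """Map hazards h[(a_m, a_prev)] to (psi0, psi1, psi2) via (10)-(12)."""
    psi0 = log(h[(1, 0)] / h[(0, 0)])
    psi1 = log(h[(0, 1)] / h[(0, 0)])
    psi2 = log(h[(1, 1)] / h[(0, 0)]) - psi0 - psi1
    return psi0, psi1, psi2

def log_hazard_ratio(psi, a, a_prev):
    """gamma(m, a_bar_m, psi) = psi0*a_m + psi1*a_{m-1} + psi2*a_m*a_{m-1}."""
    psi0, psi1, psi2 = psi
    return psi0 * a + psi1 * a_prev + psi2 * a * a_prev

# Hypothetical hazards, for illustration only.
h = {(0, 0): 0.010, (1, 0): 0.005, (0, 1): 0.008, (1, 1): 0.004}
psi = cox_msm_params(h)
print(psi)
```

Because the model is saturated in (a_m, a_m−1), `log_hazard_ratio` reproduces log{h(a_m, a_m−1)/h(0, 0)} for all four treatment combinations.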

7. Binary covariates

In this section, suppose L_m is binary. By Corollary 6.2, we obtain

exp (ψ_{0}) = \frac{g (1, 1, 0) r (1, 0) + g (0, 1, 0) {1 - r (1, 0)}}{g (1, 0, 0) r (1, 0) + g (0, 0, 0) {1 - r (1, 0)}}

(13)

exp (ψ_{1}) = \frac{g (1, 0, 1) r (1, 1) + g (0, 0, 1) {1 - r (1, 1)}}{g (1, 0, 0) r (1, 0) + g (0, 0, 0) {1 - r (1, 0)}}

(14)

exp (ψ_{2}) = \frac{g (1, 1, 1) r (1, 1) + g (0, 1, 1) {1 - r (1, 1)}}{g (1, 1, 0) r (1, 0) + g (0, 1, 0) {1 - r (1, 0)}} \times \frac{g (1, 0, 0) r (1, 0) + g (0, 0, 0) {1 - r (1, 0)}}{g (1, 0, 1) r (1, 1) + g (0, 0, 1) {1 - r (1, 1)}}

(15)

Equations (13), (14) and (15) follow simply by plugging the RHS of (8) into (10), (11) and (12) for the appropriate (a_m, a_m−1).

Standard parametric assumptions on g(l_m, a_m, a_m−1) and r(l_m, a_m−1) might be regression models with logit, probit or complementary log-log links [18]. To fix ideas, we work through an example under logistic regression models for both g(l_m, a_m, a_m−1) and r(l_m, a_m−1) such that

r (1, a_{m - 1}; β) = \frac{exp (β_{1} a_{m - 1})}{1 + exp (β_{1} a_{m - 1})}

(16)

and

g (l_{m}, a_{m}, a_{m - 1}; θ) = \frac{exp (θ_{0} + θ_{1} l_{m} + θ_{2} a_{m} + θ_{3} a_{m - 1})}{1 + exp (θ_{0} + θ_{1} l_{m} + θ_{2} a_{m} + θ_{3} a_{m - 1})}

(17)

m = 0, …, K. To simplify the presentation, we have implicitly set the intercept in (16) to zero; however, a non-zero intercept is easily added. Plugging these choices into (13), (14) and (15) we obtain the solutions

exp (ψ_{0}) = \frac{\frac{exp (θ_{1} + θ_{2})}{1 + exp (θ_{0} + θ_{1} + θ_{2})} + \frac{exp (θ_{2})}{1 + exp (θ_{0} + θ_{2})}}{\frac{exp (θ_{1})}{1 + exp (θ_{0} + θ_{1})} + \frac{1}{1 + exp (θ_{0})}}

(18)

exp (ψ_{1}) = \frac{\frac{exp (θ_{3})}{1 + exp (β_{1})} {\frac{exp (θ_{1} + β_{1})}{1 + exp (θ_{0} + θ_{1} + θ_{3})} + \frac{1}{1 + exp (θ_{0} + θ_{3})}}}{\frac{1}{2} {\frac{exp (θ_{1})}{1 + exp (θ_{0} + θ_{1})} + \frac{1}{1 + exp (θ_{0})}}}

(19)

exp (ψ_{2}) = \frac{{\frac{exp (θ_{1} + β_{1})}{1 + exp (θ_{0} + θ_{1} + θ_{2} + θ_{3})} + \frac{1}{1 + exp (θ_{0} + θ_{2} + θ_{3})}} {\frac{exp (θ_{1})}{1 + exp (θ_{0} + θ_{1})} + \frac{1}{1 + exp (θ_{0})}}}{{\frac{exp (θ_{1})}{1 + exp (θ_{0} + θ_{1} + θ_{2})} + \frac{1}{1 + exp (θ_{0} + θ_{2})}} {\frac{exp (θ_{1} + β_{1})}{1 + exp (θ_{0} + θ_{1} + θ_{3})} + \frac{1}{1 + exp (θ_{0} + θ_{3})}}}

(20)
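These solutions can also be evaluated numerically without simplifying the fractions, by computing each h(a_m, a_m−1) from (8) under models (16) and (17) and then applying Corollary 6.2. The sketch below does this; the function names and the parameter values are hypothetical and chosen only to show a setting where the outcome is not rare.

```python
from math import exp, log

def expit(x):
    return 1.0 / (1.0 + exp(-x))

def h_exact(a, a_prev, theta, beta1):
    """h(a_m, a_{m-1}) from (8) with binary L_m, r from (16) and g from (17)."""
    t0, t1, t2, t3 = theta
    p1 = expit(beta1 * a_prev)                    # r(1, a_prev)
    g1 = expit(t0 + t1 + t2 * a + t3 * a_prev)    # g(1, a, a_prev)
    g0 = expit(t0 + t2 * a + t3 * a_prev)         # g(0, a, a_prev)
    return g1 * p1 + g0 * (1.0 - p1)

def psi_exact(theta, beta1):
    """(psi0, psi1, psi2) from (10)-(12), with no rare disease assumption."""
    h00 = h_exact(0, 0, theta, beta1)
    psi0 = log(h_exact(1, 0, theta, beta1) / h00)
    psi1 = log(h_exact(0, 1, theta, beta1) / h00)
    psi2 = log(h_exact(1, 1, theta, beta1) / h00) - psi0 - psi1
    return psi0, psi1, psi2

# Hypothetical (theta0, theta1, theta2, theta3) and beta1; outcome not rare.
print(psi_exact(theta=(-1.0, 1.5, -0.8, 0.3), beta1=1.0))
```

With these illustrative values ψ₂ is clearly non-zero even though model (17) contains no interaction between a_m and a_m−1.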

Notably, if the disease were rare as in Xiao et al. [7] our solutions simplify considerably. Specifically, given the model (17) and rare disease within each measurement interval and history we have the approximation

g (l_{m}, a_{m}, a_{m - 1}; θ) \approx exp (θ_{0} + θ_{1} l_{m} + θ_{2} a_{m} + θ_{3} a_{m - 1}) .

(21)

Plugging (21) into expressions (13), (14) and (15) in place of (17) we obtain the simplified approximate solutions

ψ_{0} = θ_{2}

(22)

ψ_{1} = log [\frac{exp (θ_{3}) {\frac{exp (θ_{1} + β_{1})}{1 + exp (β_{1})} + \frac{1}{1 + exp (β_{1})}}}{\frac{1}{2} {exp (θ_{1}) + 1}}]

(23)

ψ_{2} = 0

(24)

Equations (22) and (23) establish that the parameters of a standard time-dependent Cox regression such as (21) do not generally match those of the Cox MSM. Specifically, the maximum likelihood estimates (MLEs) of θ₂ and θ₃ based on the correctly specified model (17) have bias approximately equal to θ₂ − ψ₀ and θ₃ − ψ₁ for ψ₀ and ψ₁, respectively. Given the rare disease approximation (21), we can see by expression (22) that we have approximately θ₂ − ψ₀ = 0 for any choice of θ₁ or β₁. However, by expression (23), we will only have θ₃ = ψ₁ if either θ₁ or β₁ is zero; that is, if L_m is either not a confounder or not itself affected by prior treatment. Without the rare disease assumption, by equations (18) and (19), the bias of the MLEs of θ₂ and θ₃ for ψ₀ and ψ₁, respectively, depends not only on the values of θ₁ and β₁ but also on the other components of θ.
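The dependence of θ₃ − ψ₁ on θ₁ and β₁ under the rare disease approximation can be seen by evaluating (23) directly. The sketch below is a minimal implementation of (23); the function name and the argument values are illustrative assumptions.

```python
from math import exp, log

def psi1_rare(theta1, theta3, beta1):
    """Approximate psi1 from (23) under the rare disease approximation (21)."""
    p1 = exp(beta1) / (1.0 + exp(beta1))          # r(1, 1) from (16)
    numerator = exp(theta3) * (exp(theta1) * p1 + (1.0 - p1))
    denominator = 0.5 * (exp(theta1) + 1.0)
    return log(numerator / denominator)

theta3 = 0.3  # hypothetical value
print(psi1_rare(theta1=0.0, theta3=theta3, beta1=1.5))    # L_m not a confounder
print(psi1_rare(theta1=-2.0, theta3=theta3, beta1=0.0))   # L_m unaffected by prior treatment
print(psi1_rare(theta1=-2.0, theta3=theta3, beta1=-2.0))  # both coefficients non-zero
```

The first two calls return θ₃ up to floating point error, while the third does not. With θ₃ = 0 and (β₁, θ₁) = (−2, −2) the function returns approximately 0.4574, the value of ψ₁ reported in the first row of Table 1.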

By equation (24), we also see in this example that, given the rare disease approximation (21), absence of an interaction term between a_m and a_m−1 in the model (17) also implies no interaction as quantified by ψ₂. However, by equation (20), in the absence of rare disease, the presence of interaction as quantified by ψ₂ generally depends on β₁ and all components of θ, despite the absence of an interaction term in the model (17). For interested readers, we explicitly consider alternative solutions for ψ₂ in Appendix B when an interaction term between a_m and a_m−1 is added to the model (17), both with and without a rare disease approximation. Note that solutions for ψ₀ and ψ₁ will remain unchanged (with or without rare disease) under a less restricted model for g(l_m, a_m, a_m−1) that allows interaction between a_m and a_m−1 by equations (13) and (14), respectively.

8. Continuous covariates

In this section, suppose L_m is a continuous random variable m = 0, …, K. Given the assumptions of Theorem 6.1, whether a closed form solution exists for $\frac{h (a_{m}, a_{m - 1})}{h (0, 0)}$ in this setting will now depend on the choice of g and r.

For example, as in Xiao et al. [7], assume that L_m is normally distributed given the past with

L_{m} | Ā_{m - 1}, {L̅}_{m - 1}, Y_{m} = 0 ~ N (β_{1} A_{m - 1}, σ^{2})

(25)

As in (16), an intercept is easily added to (25). With this choice of r, a closed form solution for (9) is not generally available for g the logistic regression model (17). If, however, along with this choice of r, we choose a probit link for g such that

g (l_{m}, a_{m}, a_{m - 1}; θ) = Φ (θ_{0} + θ_{1} l_{m} + θ_{2} a_{m} + θ_{3} a_{m - 1})

(26)

with Φ(·) the CDF of a standard normal, then, following Agresti [18], we have

h (m, ā_{m}; β, θ) = Φ {c (θ_{0} + θ_{2} a_{m} + θ_{3} a_{m - 1} + θ_{1} β_{1} a_{m - 1})}

(27)

with $c = {(\sqrt{1 + θ_{1}^{2} σ^{2}})}^{- 1}$ . By Corollary 6.2, we then also have that the Cox MSM γ(m, ā_m, ψ) = ψ₀a_m + ψ₁a_m−1 + ψ₂a_ma_m−1 holds with specifically

exp (ψ_{0}) = \frac{Φ {c (θ_{0} + θ_{2})}}{Φ (c θ_{0})}

exp (ψ_{1}) = \frac{Φ {c (θ_{0} + θ_{3} + θ_{1} β_{1})}}{Φ (c θ_{0})}

exp (ψ_{2}) = \frac{Φ {c (θ_{0} + θ_{2} + θ_{3} + θ_{1} β_{1})}}{Φ {c (θ_{0} + θ_{2})}} \times \frac{Φ (c θ_{0})}{Φ {c (θ_{0} + θ_{3} + θ_{1} β_{1})}}
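The closed form (27) can be checked against a direct numerical evaluation of the integral in the numerator of (9). The sketch below uses a simple trapezoid rule over ±10 standard deviations; the function names, the grid settings and the parameter values are assumptions made only for this check.

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def h_closed(a, a_prev, theta, beta1, sigma):
    """Closed form (27) under the normal model (25) and probit model (26)."""
    t0, t1, t2, t3 = theta
    c = 1.0 / sqrt(1.0 + t1 ** 2 * sigma ** 2)
    return Phi(c * (t0 + t2 * a + t3 * a_prev + t1 * beta1 * a_prev))

def h_numeric(a, a_prev, theta, beta1, sigma, n=4001, span=10.0):
    """Trapezoid-rule approximation to the integral of g * r over l_m."""
    t0, t1, t2, t3 = theta
    mu = beta1 * a_prev
    lo = mu - span * sigma
    step = 2.0 * span * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        l = lo + i * step
        density = exp(-0.5 * ((l - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * Phi(t0 + t1 * l + t2 * a + t3 * a_prev) * density * step
    return total

theta, beta1, sigma = (-2.0, 0.5, -0.8, 0.3), 1.0, 1.5   # hypothetical values
for a, a_prev in ((0, 0), (1, 0), (0, 1), (1, 1)):
    print(a, a_prev, h_closed(a, a_prev, theta, beta1, sigma),
          h_numeric(a, a_prev, theta, beta1, sigma))
```

The two columns agree closely for each (a_m, a_m−1), and the values of ψ then follow from Corollary 6.2.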

A similar result is available for the logistic regression model (17) provided r is alternatively defined in terms of the more complex bridge distribution function of Wang and Louis [19]. However, if we further assume rare disease, we may maintain assumption (25) for r along with the model (17) for g and obtain an approximate closed form solution for (9). Specifically, using the approximation (21) for g and model (25) for r we have

h (m, ā_{m}; β, θ) \approx \int_{- \infty}^{\infty} exp (θ_{0} + θ_{1} l_{m} + θ_{2} a_{m} + θ_{3} a_{m - 1}) r (l_{m}, a_{m - 1}; β) d l_{m} = exp (θ_{0} + θ_{2} a_{m} + θ_{3} a_{m - 1}) E [exp (θ_{1} L_{m}) | A_{m - 1} = a_{m - 1}] = exp (θ_{0} + θ_{2} a_{m} + θ_{3} a_{m - 1}) exp (θ_{1} β_{1} a_{m - 1}) exp (\frac{1}{2} σ^{2} θ_{1}^{2})

(28)

with the last equality given by the moment generating function for the normal distribution. Note that, given (21), an analogous solution for h(m, ā_m; β, θ) will exist for any choice of r provided the conditional distribution of L_m has homoscedastic errors by the general property of the moment generating function. By Corollary 6.2, plugging the approximation (28) into (10), (11) and (12) for the appropriate choices of (a_m, a_m−1), the Cox MSM γ(m, ā_m, ψ) = ψ₀a_m + ψ₁a_m−1 + ψ₂a_ma_m−1 holds with the approximate solutions ψ₀ = θ₂, ψ₁ = θ₃ + θ₁β₁ and ψ₂ = 0.
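For completeness, the substitution behind these approximate solutions can be written out as follows. This is only a restatement of (28) combined with (10), (11) and (12); the common factor exp(θ₀) exp(σ²θ₁²/2) cancels in every ratio.

```latex
\begin{align*}
\exp(\psi_0) &= \frac{h(1,0)}{h(0,0)}
  \approx \frac{\exp(\theta_0 + \theta_2)\exp(\tfrac{1}{2}\sigma^2\theta_1^2)}
               {\exp(\theta_0)\exp(\tfrac{1}{2}\sigma^2\theta_1^2)}
  = \exp(\theta_2), \\
\exp(\psi_1) &= \frac{h(0,1)}{h(0,0)}
  \approx \frac{\exp(\theta_0 + \theta_3)\exp(\theta_1\beta_1)\exp(\tfrac{1}{2}\sigma^2\theta_1^2)}
               {\exp(\theta_0)\exp(\tfrac{1}{2}\sigma^2\theta_1^2)}
  = \exp(\theta_3 + \theta_1\beta_1), \\
\exp(\psi_2) &= \frac{h(1,1)}{h(0,0)} \times \frac{1}{\exp(\psi_0)\exp(\psi_1)}
  \approx \frac{\exp(\theta_2 + \theta_3 + \theta_1\beta_1)}
               {\exp(\theta_2)\exp(\theta_3 + \theta_1\beta_1)}
  = 1 .
\end{align*}
```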

Analogous to the worked example for binary L_m under the rare disease assumption (21), we see that ψ₀ = θ₂ regardless of the values of θ₁ and β₁. By contrast, ψ₁ = θ₃ only if θ₁ or β₁ is zero; that is, when L_m is not a confounder affected by prior treatment. As in the binary case, our probit example illustrates more generally that the discrepancy between ψ and the components of θ corresponding to treatment coefficients in the conditional outcome regression model, in addition to β₁ and θ₁, may also depend on other components of θ.

Of the examples considered thus far, the data generating assumptions of our last example – with r defined by model (25) and g approximated by (21) – are closest to those of the data generating models given in Xiao et al. [7]. One key distinction, however, is that Xiao et al. [7] allowed the distribution of L_m also to depend on L_m−1. Under this weaker assumption, the resulting Cox MSM now depends on the entire treatment history ā_m and not simply the two most recent values (a_m, a_m−1) as given by the following theorem.

Theorem 8.1 Assume the data generating mechanism (6) of Theorem 6.1 holds with g approximated by (21). Further assume

L_{m} | Ā_{m - 1}, {L̅}_{m - 1}, Y_{m} = 0 ~ N (β_{1} A_{m - 1} + β_{2} L_{m - 1}, σ^{2})

(29)

m = 0, …, K. Data generated under these assumptions will approximately follow a Cox MSM of the form

γ (m, ā_{m}, ψ) = ψ_{0} a_{m} + ψ_{1} a_{m - 1} + \sum_{s = 1}^{m - 1} ψ_{m - s + 1} a_{s - 1}

with ψ₀ = θ₂, ψ₁ = θ₃ + θ₁β₁ and $ψ_{m - s + 1} = θ_{1} β_{1} β_{2}^{m - s}$ , s = 1, …, m − 1.

A proof of Theorem 8.1 is given in Appendix C. Note that Xiao et al. [7] concluded that, under their data generating models, the resulting Cox MSM should only depend on a_m and a_m−1 for all m = 0, …, K. Theorem 8.1 appears to contradict this conclusion for K > 1. Under these data generating assumptions, IPW estimates constructed based on a Cox MSM that excludes the correct function of ā_m−2 should theoretically incur some bias because such a Cox MSM will be misspecified. This is a particular problem when |β₂| ≥ 1, since the coefficients θ₁β₁β₂^{m−s} on more distant treatments then do not decay with the lag.
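To make the form of the Cox MSM in Theorem 8.1 concrete, the sketch below lists the coefficient attached to each lagged treatment, indexing by the lag j so that ψ_j multiplies a_m−j. The function names and the parameter values are hypothetical.

```python
def lag_coefficients(theta1, theta2, theta3, beta1, beta2, m):
    """Return [psi_0, ..., psi_m], where psi_j multiplies a_{m-j} in the
    approximate Cox MSM of Theorem 8.1."""
    psi = [theta2]                                   # psi_0 = theta_2
    if m >= 1:
        psi.append(theta3 + theta1 * beta1)          # psi_1 = theta_3 + theta_1*beta_1
    for j in range(2, m + 1):
        psi.append(theta1 * beta1 * beta2 ** (j - 1))  # psi_j = theta_1*beta_1*beta_2^(j-1)
    return psi

def gamma(abar, psi):
    """Log hazard ratio for regime abar = (a_0, ..., a_m) versus never treat."""
    m = len(abar) - 1
    return sum(psi[j] * abar[m - j] for j in range(m + 1))

# Hypothetical values, for illustration only.
psi = lag_coefficients(theta1=-0.5, theta2=-0.8, theta3=0.0,
                       beta1=-0.5, beta2=0.5, m=4)
print(psi)                       # coefficients by lag
print(gamma((1, 1, 1, 1, 1), psi))
```

With |β₂| < 1, as in this illustration, the coefficients on lags two and beyond shrink geometrically; with |β₂| ≥ 1 they do not.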

9. Simulation Algorithm

The following general algorithm may be used to simulate a sample of n i.i.d. observations as described in §2 that follows a particular parametrization of the g-formula.

Let Pr[A_m = 1|L̅_m = l̄_m,Ā_m−1 = ā_m−1, Y_m = 0; α] be a parametric model for the probability of receiving treatment in interval m given survival to m and history (l̄_m, ā_m−1). For each of i = 1, …, n simulated observations, implicitly define L̅_−1,i ≡ Ā_−1,i ≡ Y_0,i = 0. Then for each observation i:

For m = 0, …, K:

Draw L_m,i from some choice of f(L_m|Ā_m−1, L̅_m−1, Y_m = 0; β) evaluated at the previously generated (Ā_m−1,i, L̅_m−1,i).
Draw A_m,i from some choice of Pr[A_m = 1|L̅_m, Ā_m−1, Y_m = 0; α] evaluated at previously generated (Ā_m−1,i, L̅_m,i).
Draw Y_m+1,i from some choice of Pr[Y_m+1 = 1|L̅_m, Ā_m, Y_m = 0; θ] evaluated at previously generated (Ā_m,i, L̅_m,i). If Y_m+1,i = 1 then this is the last record in the data set for observation i. Otherwise, generate another record for observation i (i.e., go to index m + 1).
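The three steps above can be sketched in code as follows. This is a minimal illustration using only the Python standard library, not the implementation used for the reported results. It assumes binary L_m generated from a logistic model of the form (16), the outcome from a logistic model of the form (17), and treatment from a logistic model depending on L_m only; the function names, the record layout and the parameter values are hypothetical.

```python
import random
from math import exp

def expit(x):
    return 1.0 / (1.0 + exp(-x))

def simulate_subject(K, alpha, beta1, theta, rng):
    """Person-interval records (m, L_m, A_m, Y_{m+1}) for one subject."""
    alpha0, alpha1 = alpha
    t0, t1, t2, t3 = theta
    records = []
    a_prev = 0                                        # A_{-1} = 0 by convention
    for m in range(K + 1):
        l = int(rng.random() < expit(beta1 * a_prev))             # covariate step
        a = int(rng.random() < expit(alpha0 + alpha1 * l))        # treatment step
        y = int(rng.random() < expit(t0 + t1 * l + t2 * a + t3 * a_prev))  # outcome step
        records.append((m, l, a, y))
        if y == 1:                                    # failure: last record for this subject
            break
        a_prev = a
    return records

def simulate_sample(n, K, alpha, beta1, theta, seed=0):
    """List of n independent subjects, each a list of records."""
    rng = random.Random(seed)
    return [simulate_subject(K, alpha, beta1, theta, rng) for _ in range(n)]

sample = simulate_sample(n=1000, K=6, alpha=(0.5, 0.5), beta1=-0.5,
                         theta=(-7.0, -0.5, -0.8, 0.0))
print(len(sample), sum(len(recs) for recs in sample))
```

Each subject contributes one record per interval at risk, so the output is in the person-interval format typically used to fit pooled regression models.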

The above algorithm may be used to confirm theoretical results under any of the data generating assumptions considered above. As a simple illustration, we performed a simulation study where 20,000 samples were generated according to the above algorithm, each with n = 100,000 observations and K = 6. Data were generated according to the restrictions of Theorem 6.1 with the covariate and outcome generated according to the logistic regression models (16) and (17), respectively. Treatment was generated according to the logistic regression model logit[Pr(A_m = 1|L̅_m = l̄_m, Ā_m−1 = ā_m−1, Y_m = 0; α)] = α₀ + α₁l_m for each m = 0, …, K.

Simulations were conducted under six different combinations of (β₁, θ₁): (−2.0, −2.0), (−0.5, −0.5), (0.0, −0.5), (−0.5, 0.0), (0.5, −2.0) and (2.0, −2.0). In all scenarios, we fixed α₀ = 0.5, α₁ = 0.5, θ₀ = −7, θ₂ = −0.8 and θ₃ = 0. As all components of θ were selected ≤ 0, the rare disease approximation (21) holds under all six scenarios. Recall that by the analytic results of §7, under all simulation scenarios the Cox MSM γ(m, ā_m, ψ) = ψ₀a_m + ψ₁a_m−1 + ψ₂a_ma_m−1 holds with approximate values of ψ₀, ψ₁ and ψ₂ defined as in (22), (23) and (24), respectively.

Table 1 presents the bias of the IPW estimates of ψ₀ and ψ₁ for the true ψ₀ and ψ₁, respectively, constructed under the correctly specified Cox MSM and model for treatment for the 20,000 runs. We see little bias at n = 100,000 in these estimates. See Appendix D for details of the IPW estimation procedure. Table 2 presents the bias of the MLE of θ₂ for θ₂ = ψ₀ constructed under the correctly specified outcome regression model (17). Table 3 presents the bias of the MLE of θ₃ for θ₃, along with the bias for ψ₁, also under the correct model (17). Our simulations confirm the analytic results of §7. In particular, we see little bias of the MLE of θ₂ and θ₃ for θ₂ = ψ₀ and θ₃, respectively, regardless of the values of β₁ and θ₁. However, we see that the bias of the MLE of θ₃ for ψ₁ approximates θ₃ − ψ₁. As expected, this difference is approximately zero only when either β₁ or θ₁ is zero.

Table 1.

Bias of IPW estimates under the six choices of (β₁, θ₁) for n = 100,000 and K = 6. ψ̂_j is the IPW estimate of ψ_j, E [ψ̂_j] is the mean of the estimates ψ̂_j over the 20,000 simulation runs and Bias(ψ̂_j, ψ_j) = E [ψ̂_j] − ψ_j, j = 0, 1.

β₁	θ₁	ψ₀	E [ψ̂₀]	Bias(ψ̂₀, ψ₀)	ψ₁	E [ψ̂₁]	Bias(ψ̂₁, ψ₁)
−2.0	−2.0	−0.8	−0.8056	−0.0056	0.4574	0.4511	−0.0063
−0.5	−0.5	−0.8	−0.8021	−0.0021	0.0583	0.0586	0.0003
0.0	−0.5	−0.8	−0.8011	−0.0011	0	0.0004	0.0004
−0.5	0.0	−0.8	−0.8004	−0.0004	0	0.0002	0.0002
0.5	−2.0	−0.8	−0.7973	0.0027	−0.2064	−0.2047	0.0017
2.0	−2.0	−0.8	−0.7772	0.0228	−0.8676	−0.8709	−0.0033


Table 2.

Bias of the MLE of θ₂ for θ₂ = ψ₀ under the six choices of (β₁, θ₁) for n = 100,000 and K = 6. θ̂₂ is the MLE of θ₂, E [θ̂₂] is the mean of the estimates θ̂₂ over the 20,000 simulation runs and Bias(θ̂₂, θ₂) = E [θ̂₂] − θ₂.

β₁	θ₁	θ₂	E [θ̂₂]	Bias(θ̂₂, θ₂)
−2.0	−2.0	−0.8	−0.8006	−0.0006
−0.5	−0.5	−0.8	−0.8006	−0.0006
0.0	−0.5	−0.8	−0.8006	−0.0006
−0.5	0.0	−0.8	−0.8004	−0.0004
0.5	−2.0	−0.8	−0.8004	−0.0004
2.0	−2.0	−0.8	−0.7992	0.0008


Table 3.

Bias of the MLE of θ₃ for both θ₃ and ψ₁ under the six choices of (β₁, θ₁) for n = 100,000 and K = 6. θ̂₃ is the MLE of θ₃, E [θ̂₃] is the mean of the estimates θ̂₃ over the 20,000 simulation runs, Bias(θ̂₃, θ₃) = E [θ̂₃] − θ₃ and Bias(θ̂₃, ψ₁) = E [θ̂₃] − ψ₁.

β₁	θ₁	E [θ̂₃]	Bias(θ̂₃, θ₃)	ψ₁	Bias(θ̂₃, ψ₁)
−2.0	−2.0	0.0008	0.0008	0.4574	−0.4566
−0.5	−0.5	−0.0005	−0.0005	0.0583	−0.0588
0.0	−0.5	−0.0007	−0.0007	0	−0.0007
−0.5	0.0	−0.0006	−0.0006	0	−0.0006
0.5	−2.0	−0.0013	−0.0013	−0.2064	0.2051
2.0	−2.0	−0.0060	−0.0060	−0.8676	0.8616


10. Relation to the g-null paradox

As discussed, a limitation of the proposed simulation approach is that, under some data generating assumptions, it may be intractable or impossible to solve for the true Cox MSM parameters. Interestingly, an additional limitation of the proposed simulation approach follows from previous arguments regarding the “g-null paradox” [20]. These arguments would suggest that given standard parametrizations of the observed data distribution consistent with the presence of a time-varying confounder affected by prior treatment, it is impossible for the null hypothesis of ψ = 0 to hold simultaneously. Our examples allow a careful consideration of this paradox in the current setting.

Consider the data generating assumptions of the simulation study described in §9. As before, under these assumptions, we approximately have ψ₀ = 0 if θ₂ = 0 by equation (22). Further, by equation (23), we have ψ₁ = 0 if θ₃ is set to

exp (θ_{3}) = \frac{\frac{1}{2} {exp (θ_{1}) + 1}}{\frac{exp (θ_{1} + β_{1})}{1 + exp (β_{1})} + \frac{1}{1 + exp (β_{1})}}

regardless of the values of θ₁ and β₁. Thus, we have at least one example illustrating that it is mathematically possible to generate data according to standard parametric models such that all components of ψ are zero and a time-varying confounder affected by prior treatment is present.
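The restriction on θ₃ above is easy to evaluate numerically. The sketch below computes the value of θ₃ that forces ψ₁ = 0 under the rare disease approximation and confirms, using (23), that the resulting ψ₁ is zero. The function names and the values of (θ₁, β₁) are illustrative assumptions.

```python
from math import exp, log

def expit(x):
    return 1.0 / (1.0 + exp(-x))

def psi1_rare(theta1, theta3, beta1):
    """Approximate psi1 from (23)."""
    p1 = expit(beta1)
    return log(exp(theta3) * (exp(theta1) * p1 + 1.0 - p1)
               / (0.5 * (exp(theta1) + 1.0)))

def theta3_for_null(theta1, beta1):
    """Value of theta3 at which psi1 = 0 under the approximation (21)."""
    p1 = expit(beta1)
    return log(0.5 * (exp(theta1) + 1.0) / (exp(theta1) * p1 + 1.0 - p1))

theta1, beta1 = -2.0, -2.0            # confounder affected by prior treatment
theta3 = theta3_for_null(theta1, beta1)
print(theta3, psi1_rare(theta1, theta3, beta1))
```

The required θ₃ is an exact function of θ₁ and β₁, which is the kind of fine-tuned cancellation among coefficients discussed next.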

However, we do not expect such a scenario, where one coefficient is restricted to depend on a function of other coefficients of the data generating mechanism, to occur in nature. This expectation is an example of the faithfulness assumption invoked when causal directed acyclic graphs are used to represent underlying data generating mechanisms [21, 22]. We may therefore be limited to unrealistic simulation scenarios if we wish to use the proposed algorithm to generate data under the null.

11. Discussion

In this paper, we have illustrated how to derive a closed form Cox MSM given a set of parametric models for the observed data distribution. This gives an approach for simulating from a known Cox MSM using a standard parametrization of the likelihood. In contrast to previously proposed simulation methods, this approach allows a comparison of the performance of IPW and standard regression-based estimators of the effect of a time-varying treatment on survival in the complete absence of model misspecification. This, in turn, allows isolation of any particular source of bias in a simulation study. These sources may include finite sample bias, that due to (known) model misspecification and that due to complex time-varying confounding structures.

We used our analytic results to demonstrate and confirm in an example simulation study that the bias of standard regression-based estimates depends, at least in part, on the degree to which parameters quantifying the presence of a time-varying confounder affected by prior treatment are non-zero. Using analytic results, one may know, prior to undertaking a simulation study, how much large sample bias to expect in such a standard estimate in the absence of model misspecification. Confirmation of these expectations reduces the possibility of coding errors.

Our arguments highlight the importance of clearly defining the target population parameter of interest in any consideration of bias. As discussed, standard estimates will be approximately unbiased in large samples for the coefficients on treatment history in a correctly specified outcome regression model conditional on past treatment and confounders. However, following previous graphical arguments in largely model-free settings [9, 23, 24], these coefficients may fail to have a causal interpretation, even given the identifying assumptions of §4, when L_m is a time-varying confounder affected by prior treatment. Our arguments further highlight the need for careful consideration when imposing a parsimonious Cox MSM. For example, as we showed, typical assumptions that restrict dependence of the causal hazard ratio on only the most recent values of treatment may be difficult to justify if one is unwilling to make potentially extreme Markov assumptions on the underlying observed data generating process.

Finally, the utility of the proposed approach to known Cox MSM data generation is not limited to simulation-based comparisons of IPW performance versus that of standard regression-based estimators. This approach may also be useful in simulation studies aimed at comparing IPW with other estimators of correctly specified Cox MSM parameters that rely on a correctly specified conditional outcome regression model for optimal performance. These include parametric g-computation [8, 25, 26, 16, 27] as well as double-robust methods [28, 29, 30, 31].

Acknowledgements

The authors thank Miguel Hernán for helpful comments. This work was funded by NIH grants HL080644, R01AI104459-01A1, and 1R21ES019712-01.

Appendix A

We prove Theorem 6.1 explicitly for the discrete case; the proof for the continuous case is identical, with integrals in place of sums. Incorporating assumptions (6) and (7) into the g-formula (2), we have

h (m, ā_{m}) = \frac{\sum_{l_{m}} g (l_{m}, a_{m}, a_{m - 1}) r (l_{m}, a_{m - 1}) \sum_{{l̄}_{m - 1}} wt (m - 1, {l̄}_{m - 1}, ā_{m - 1})}{\sum_{l_{m}} r (l_{m}, a_{m - 1}) \sum_{{l̄}_{m - 1}} wt (m - 1, {l̄}_{m - 1}, ā_{m - 1})}

(30)

where

wt (m - 1, {l̄}_{m - 1}, ā_{m - 1}) = \{1 - g (l_{m - 1}, a_{m - 1}, a_{m - 2})\} \prod_{j = 0}^{m - 1} \{1 - g (l_{j - 1}, a_{j - 1}, a_{j - 2})\} r (l_{j}, a_{j - 1}) .

The result follows by noting that ∑_{l̄_m−1} wt(m − 1, l̄_m−1, ā_m−1) cancels in the numerator and denominator of (30) and ∑_{l_m} r(l_m, a_m−1) = 1.
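
To spell out the cancellation step, it may help to give the common factor a name. The symbol C below is introduced here only as shorthand and is assumed to be strictly positive; it does not appear elsewhere in the paper. With that shorthand, (30) reduces as follows:

h (m, ā_{m}) = \frac{C \sum_{l_{m}} g (l_{m}, a_{m}, a_{m - 1}) r (l_{m}, a_{m - 1})}{C \sum_{l_{m}} r (l_{m}, a_{m - 1})} = \sum_{l_{m}} g (l_{m}, a_{m}, a_{m - 1}) r (l_{m}, a_{m - 1}), \quad C = \sum_{{l̄}_{m - 1}} wt (m - 1, {l̄}_{m - 1}, ā_{m - 1})

so that the hazard under ā_m depends on treatment history only through a_m and a_m−1.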

Appendix B

Consider the worked example of §7 but where the model (17) is replaced by

g (l_{m}, a_{m}, a_{m - 1}; θ) = \frac{exp (θ_{0} + θ_{1} l_{m} + θ_{2} a_{m} + θ_{3} a_{m - 1} + θ_{4} a_{m} a_{m - 1})}{1 + exp (θ_{0} + θ_{1} l_{m} + θ_{2} a_{m} + θ_{3} a_{m - 1} + θ_{4} a_{m} a_{m - 1})}

allowing interaction between a_m and a_m−1 as quantified by θ₄. Plugging this alternative choice of g into (15), along with the original model (16) for r, we have

exp (ψ_{2}) = exp (θ_{4}) \frac{\left\{\frac{exp (θ_{1} + β_{1})}{1 + exp (θ_{0} + θ_{1} + θ_{2} + θ_{3} + θ_{4})} + \frac{1}{1 + exp (θ_{0} + θ_{2} + θ_{3} + θ_{4})}\right\} \left\{\frac{exp (θ_{1})}{1 + exp (θ_{0} + θ_{1})} + \frac{1}{1 + exp (θ_{0})}\right\}}{\left\{\frac{exp (θ_{1})}{1 + exp (θ_{0} + θ_{1} + θ_{2})} + \frac{1}{1 + exp (θ_{0} + θ_{2})}\right\} \left\{\frac{exp (θ_{1} + β_{1})}{1 + exp (θ_{0} + θ_{1} + θ_{3})} + \frac{1}{1 + exp (θ_{0} + θ_{3})}\right\}}

which is equivalent to (20) when θ₄ = 0.

When the disease is rare, so that we have the approximation

g (l_{m}, a_{m}, a_{m - 1}; θ) \approx exp (θ_{0} + θ_{1} l_{m} + θ_{2} a_{m} + θ_{3} a_{m - 1} + θ_{4} a_{m} a_{m - 1})

we obtain the simplified approximate solution

ψ_{2} = θ_{4}

which is equivalent to (24) when θ₄ = 0.
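
The rare-disease simplification can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates the expression for exp(ψ₂) exactly as displayed in this appendix and compares log exp(ψ₂) with θ₄ for a very negative intercept θ₀ (rare outcome) and for a moderate one. The function name `exp_psi2` and all parameter values are illustrative assumptions.

```python
import math


def exp_psi2(theta0, theta1, theta2, theta3, theta4, beta1):
    """Evaluate the Appendix B expression for exp(psi_2) as displayed in the text."""

    def bracket(lin, b):
        # One braced term of the expression:
        #   exp(theta1 + b) / (1 + exp(lin + theta1)) + 1 / (1 + exp(lin))
        # where `lin` is the linear predictor of g without the theta1 * l_m term.
        return (math.exp(theta1 + b) / (1.0 + math.exp(lin + theta1))
                + 1.0 / (1.0 + math.exp(lin)))

    numerator = (bracket(theta0 + theta2 + theta3 + theta4, beta1)
                 * bracket(theta0, 0.0))
    denominator = (bracket(theta0 + theta2, 0.0)
                   * bracket(theta0 + theta3, beta1))
    return math.exp(theta4) * numerator / denominator


# Illustrative (assumed) parameter values; only theta0 differs between the calls.
psi2_rare = math.log(exp_psi2(-12.0, 0.5, 0.3, 0.2, 0.4, 0.7))
psi2_common = math.log(exp_psi2(-1.0, 0.5, 0.3, 0.2, 0.4, 0.7))
print("rare outcome:   psi_2 =", psi2_rare)
print("common outcome: psi_2 =", psi2_common)
```

With the rare-outcome intercept, the computed ψ₂ should sit very close to θ₄ = 0.4, whereas with the moderate intercept the braced terms no longer cancel and ψ₂ moves away from θ₄.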

Appendix C

For any m = 0, …, K define

\frac{b^{ā} (m)}{b^{0̅} (m)} = \frac{E [exp (θ_{1} L_{m}) | A_{m - 1} = a_{m - 1}, L_{m - 1}]}{E [exp (θ_{1} L_{m}) | A_{m - 1} = 0, L_{m - 1}]}

By (29) and the moment generating function for the normal distribution we have

\frac{b^{ā} (m)}{b^{0̅} (m)} = exp (θ_{1} β_{1} a_{m - 1}) \frac{exp (\frac{1}{2} σ^{2} θ_{1}^{2}) exp (θ_{1} β_{2} L_{m - 1})}{exp (\frac{1}{2} σ^{2} θ_{1}^{2}) exp (θ_{1} β_{2} L_{m - 1})} = exp (θ_{1} β_{1} a_{m - 1}) \frac{exp (θ_{1} β_{2} L_{m - 1})}{exp (θ_{1} β_{2} L_{m - 1})}

Next define

\frac{b^{ā} (m - 1)}{b^{0̅} (m - 1)} = \frac{E [b^{ā} (m) | A_{m - 2} = a_{m - 2}, L_{m - 2}]}{E [b^{0̅} (m) | A_{m - 2} = 0, L_{m - 2}]}

By the fact that (29) holds for L₀, …, L_K and, again, using the moment generating function for the normal distribution we have

\frac{E [b^{ā} (m) | A_{m - 2} = a_{m - 2}, L_{m - 2}]}{E [b^{0̅} (m) | A_{m - 2} = 0, L_{m - 2}]} = exp (θ_{1} β_{1} a_{m - 1}) \frac{E [exp (θ_{1} β_{2} L_{m - 1}) | A_{m - 2} = a_{m - 2}, L_{m - 2}]}{E [exp (θ_{1} β_{2} L_{m - 1}) | A_{m - 2} = 0, L_{m - 2}]} = exp (θ_{1} β_{1} a_{m - 1}) \frac{exp (\frac{1}{2} σ^{2} θ_{1}^{2} β_{2}^{2}) exp \{θ_{1} β_{2} (β_{1} a_{m - 2} + β_{2} L_{m - 2})\}}{exp (\frac{1}{2} σ^{2} θ_{1}^{2} β_{2}^{2}) exp (θ_{1} β_{2}^{2} L_{m - 2})} = exp (θ_{1} β_{1} a_{m - 1}) exp (θ_{1} β_{1} β_{2} a_{m - 2}) \frac{exp (θ_{1} β_{2}^{2} L_{m - 2})}{exp (θ_{1} β_{2}^{2} L_{m - 2})}

Analogously define

\frac{b^{ā} (m - 2)}{b^{0̅} (m - 2)} = \frac{E [b^{ā} (m - 1) | A_{m - 3} = a_{m - 3}, L_{m - 3}]}{E [b^{0̅} (m - 1) | A_{m - 3} = 0, L_{m - 3}]} = exp (θ_{1} β_{1} a_{m - 1}) exp (θ_{1} β_{1} β_{2} a_{m - 2}) exp (θ_{1} β_{1} β_{2}^{2} a_{m - 3}) \frac{exp (θ_{1} β_{2}^{3} L_{m - 3})}{exp (θ_{1} β_{2}^{3} L_{m - 3})}

Arguing recursively for any j = 1, …, m − 1

\frac{b^{ā} (j)}{b^{0̅} (j)} = exp (\sum_{s = j}^{m} θ_{1} β_{1} β_{2}^{m - s} a_{s - 1}) \frac{exp (θ_{1} β_{2}^{m - j + 1} L_{j - 1})}{exp (θ_{1} β_{2}^{m - j + 1} L_{j - 1})}

Setting j = 1 we have

\frac{b^{ā} (1)}{b^{0̅} (1)} = exp (\sum_{s = 1}^{m} θ_{1} β_{1} β_{2}^{m - s} a_{s - 1}) \frac{exp (θ_{1} β_{2}^{m} L_{0})}{exp (θ_{1} β_{2}^{m} L_{0})}

Further, by Ā₋₁ ≡ L̅₋₁ ≡ 0, we have

\frac{E [b^{ā} (1) | A_{- 1} = a_{- 1}, L_{- 1}]}{E [b^{0̅} (1) | A_{- 1} = 0, L_{- 1}]} = exp (\sum_{s = 1}^{m} θ_{1} β_{1} β_{2}^{m - s} a_{s - 1})

By (21) we have that

\frac{h (m, ā_{m}; β, θ)}{h (m, {0̅}_{m}; β, θ)} \approx exp (θ_{2} a_{m} + θ_{3} a_{m - 1}) \frac{E [b^{ā} (1) | A_{- 1} = a_{- 1}, L_{- 1}]}{E [b^{0̅} (1) | A_{- 1} = 0, L_{- 1}]}

(31)

Our result follows by noting that the RHS of (31) is equivalent to $exp (ψ_{0} a_{m} + ψ_{1} a_{m - 1} + \sum_{s = 1}^{m - 1} ψ_{m - s + 1} a_{s - 1})$, where ψ₀ = θ₂, ψ₁ = θ₃ + θ₁β₁ and $ψ_{m - s + 1} = θ_{1} β_{1} β_{2}^{m - s}$, s = 1, …, m − 1.
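
The mapping from (θ, β) to ψ stated above can be verified by direct computation. The sketch below, which is illustrative rather than the authors' code, builds ψ₀, …, ψ_m from the formulas in the preceding sentence and confirms, for randomly drawn binary treatment histories, that the ψ parametrization reproduces the log of the RHS of (31). All function names and numeric values are assumptions chosen for the example, and m ≥ 1 is assumed.

```python
import random


def msm_coefficients(theta1, theta2, theta3, beta1, beta2, m):
    """Return [psi_0, ..., psi_m] as stated at the end of Appendix C (m >= 1)."""
    psi = [theta2, theta3 + theta1 * beta1]
    # psi_{m-s+1} = theta1 * beta1 * beta2**(m-s), s = 1..m-1.
    # Writing k = m - s + 1 for the lag, this is theta1 * beta1 * beta2**(k-1), k = 2..m.
    for k in range(2, m + 1):
        psi.append(theta1 * beta1 * beta2 ** (k - 1))
    return psi


def log_rhs_31(a, theta1, theta2, theta3, beta1, beta2):
    """Log of the RHS of (31) for a treatment history a = [a_0, ..., a_m]."""
    m = len(a) - 1
    total = theta2 * a[m] + theta3 * a[m - 1]
    for s in range(1, m + 1):
        total += theta1 * beta1 * beta2 ** (m - s) * a[s - 1]
    return total


def log_hr_from_psi(a, psi):
    """Log hazard ratio under the psi parametrization: psi_k multiplies a_{m-k}."""
    m = len(a) - 1
    return sum(psi[k] * a[m - k] for k in range(m + 1))


# Assumed parameter values, for illustration only.
theta1, theta2, theta3, beta1, beta2 = 0.5, -0.4, 0.2, 0.8, 0.6
m = 6
psi = msm_coefficients(theta1, theta2, theta3, beta1, beta2, m)

rng = random.Random(0)
max_gap = 0.0
for _ in range(200):
    a = [rng.randint(0, 1) for _ in range(m + 1)]
    gap = abs(log_rhs_31(a, theta1, theta2, theta3, beta1, beta2)
              - log_hr_from_psi(a, psi))
    max_gap = max(max_gap, gap)
print("largest discrepancy:", max_gap)
```

Because |β₂| < 1 in this example, the coefficients on more distant treatment values shrink geometrically, which makes concrete why restricting the hazard ratio to recent treatment alone requires β₂ or θ₁β₁ to be zero in this setting.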

Appendix D

A typical implementation of IPW estimation is as follows. Let (ϕ̂, ψ̂) be the solution to the estimating equation

\sum_{ā_{m}} \sum_{m = 0}^{K} \sum_{i = 1}^{n} U_{i, m}^{ā} (ϕ', ψ', α̂) = 0,

(32)

with respect to (ϕ′, ψ′). Following the appendix of [15] and, again, suppressing the i subscript, define

U_{m}^{ā} (ϕ', ψ', α̂) = [Y_{m + 1} - expit \{w (m; ϕ') + γ (m, ā_{m}, ψ')\}] (1 - Y_{m}) W_{m}^{ā} (α̂) q (m, ā_{m}) \prod_{j = 0}^{m} I (A_{j} = a_{j})

where $expit (\cdot) = \frac{exp (\cdot)}{1 + exp (\cdot)}$, w(m; ϕ′) is a function of interval m and a parameter vector ϕ′, q(m, ā_m) is a user-selected vector function of (m, ā_m) and

W_{m}^{ā} (α̂) = \frac{1}{\prod_{j = 0}^{m} Pr [A_{j} = a_{j} | {L̅}_{j}, Ā_{j - 1} = ā_{j - 1}, Y_{j} = 0; α̂]},

(33)

with α̂ the MLE of α given the treatment model Pr[A_m = a_m|L̅_m, Ā_m−1, Y_m = 0; α].

Assume the following holds for all m and ā:

1. The Cox MSM (3) holds,
2. h(m, 0̅_m) = exp{w(m; ϕ′)} when evaluated at ϕ′ = ϕ,
3. the treatment model Pr[A_m = a_m|L̅_m, Ā_m−1, Y_m = 0; α] is correctly specified and
4. h(m, ā_m) ≈ 0 such that (given assumptions 1 and 2)

h (m, ā_{m}) \approx expit \{w (m; ϕ) + γ (m, ā_{m}, ψ)\} .

(34)

Given these four assumptions we have

E [U_{m}^{ā} (ϕ, ψ, α)] \approx 0

(35)

for all m and ā_m, and the IPW estimator ψ̂ is approximately consistent for ψ and asymptotically normal. In large samples, the choice of q(m, ā_m) affects only the efficiency of ψ̂.

Note that the approximation (35) follows under these assumptions by the equivalence between the g-formula (2) and the ratio of expectations

\frac{E [Y_{m + 1} (1 - Y_{m}) W_{m}^{ā} \prod_{j = 0}^{m} I (A_{j} = a_{j})]}{E [(1 - Y_{m}) W_{m}^{ā} \prod_{j = 0}^{m} I (A_{j} = a_{j})]}

where $W_{m}^{ā}$ is equivalent to $W_{m}^{ā} (α̂)$ but with the true conditional probability of treatment Pr[A_m = a_m|L̅_m, Ā_m−1 = ā_m−1, Y_m = 0] replacing its MLE.
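
The sample analogue of this ratio of expectations is a weighted proportion of failures among subjects who are still at risk and whose observed treatment matches ā through interval m. The sketch below illustrates that computation under an assumed, hypothetical data layout (one dict per subject with lists "a", "y" and "w"); it is not the authors' implementation, and the toy records are made up for the example.

```python
def weighted_hazard(records, m, regime):
    """Weighted failure proportion in interval m among followers of `regime`.

    Each record is a dict with (hypothetical layout):
      "a": observed treatments a_0, ..., a_K
      "y": failure indicators Y_0, ..., Y_{K+1}
      "w": weights W_0, ..., W_K (inverse probability of observed treatment)
    """
    num = 0.0
    den = 0.0
    for rec in records:
        # Indicator that observed treatment equals the regime through interval m.
        follows = all(rec["a"][j] == regime[j] for j in range(m + 1))
        # (1 - Y_m): the subject has not failed before interval m.
        at_risk = rec["y"][m] == 0
        if follows and at_risk:
            num += rec["w"][m] * rec["y"][m + 1]
            den += rec["w"][m]
    if den == 0.0:
        raise ValueError("no at-risk subject follows the regime through interval m")
    return num / den


# Toy data, invented purely for illustration.
records = [
    {"a": [1, 1], "y": [0, 0, 1], "w": [2.0, 4.0]},
    {"a": [1, 0], "y": [0, 0, 0], "w": [2.0, 3.0]},
    {"a": [1, 1], "y": [0, 1, 1], "w": [4.0, 8.0]},
]
print(weighted_hazard(records, 0, [1, 1]))
print(weighted_hazard(records, 1, [1, 1]))
```

In practice the weights would come from a fitted treatment model, so this function only illustrates the structure of the ratio, not the estimation of α.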

A convenient choice of q(m, ā_m) is $\{\frac{\partial w (m; ϕ')}{{\partial ϕ'}^{𝖳}}, \frac{\partial γ (m, ā_{m}; ψ')}{{\partial ψ'}^{𝖳}}\}^{𝖳}$. With this choice, an approximate solution to (32) may be obtained with off-the-shelf software by fitting a weighted logistic regression model in a person-time data set of the structure described in §9. This approach was used to construct IPW estimates in the example simulation study also described in §9. Specifically, a logistic regression model was fit in SAS using the LOGISTIC procedure with dependent variable Y_m+1 and independent variables A_m and A_m−1 along with a completely flexible function of m = 0, …, 6. The WEIGHT option was used with stabilized weights to increase efficiency. That is, the weight for observation i at time m was defined by expression (33) under the model used to generate treatment and then multiplied by an estimate of $\prod_{j = 0}^{m} Pr [A_{j} = a_{j} | Ā_{j - 1} = ā_{j - 1}, Y_{j} = 0]$, with ā_m selected as that observation’s treatment history through m. The weight numerator can be considered an implicit component of q(m, ā_m). It is straightforward to confirm that the data generating parameters under all simulation scenarios maintain assumption (34) for all m and ā_m. Code is available upon request.
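
The stabilized weight described above is a running ratio of two cumulative products. The authors' analysis was carried out in SAS; the Python sketch below is only an illustration of the arithmetic for a single subject, assuming the per-interval probabilities of the observed treatment have already been estimated. The function name and the input lists are hypothetical.

```python
def stabilized_weights(num_probs, den_probs):
    """Stabilized weights for one subject's person-time rows, m = 0, 1, ...

    num_probs[j]: estimated Pr[A_j = a_j | past treatment, Y_j = 0]
    den_probs[j]: estimated Pr[A_j = a_j | past covariates and treatment, Y_j = 0]
    Both are evaluated at the subject's observed treatment a_j.
    """
    weights = []
    num_cum = 1.0
    den_cum = 1.0
    for p_num, p_den in zip(num_probs, den_probs):
        num_cum *= p_num  # running product in the weight numerator
        den_cum *= p_den  # running product in the denominator of (33)
        weights.append(num_cum / den_cum)
    return weights


# Assumed probabilities for a subject observed over two intervals.
print(stabilized_weights([0.5, 0.5], [0.5, 0.25]))
```

Each element of the returned list would be attached to the corresponding person-time row and passed to the weighted logistic regression.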

References

1. Robins JM. Statistical Models in Epidemiology. New York: Springer; 1999. Marginal structural models versus structural nested models as tools for causal inference; pp. 95–133.
2. Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11(5):561–570. doi: 10.1097/00001648-200009000-00012.
3. Young JG, Hernán MA, Picciotto S, Robins JM. JSM Proceedings, Section on Statistics in Epidemiology. Alexandria, VA: American Statistical Association; 2008. Simulation from structural survival models under complex time-varying data structures.
4. Young JG, Hernán MA, Picciotto S, Robins JM. Equivalence between structural models for the effect of a time-varying exposure on survival. Lifetime Data Analysis. 2010;16(1):71–84. doi: 10.1007/s10985-009-9135-3.
5. Havercroft WG, Didelez V. Simulating from marginal structural models with time-dependent confounding. Statistics in Medicine. 2012;31(30):4190–4206. doi: 10.1002/sim.5472.
6. Westreich D, Cole SR, Schisterman EF, Platt RW. A simulation study of finite-sample properties of marginal structural Cox proportional hazards models. Statistics in Medicine. 2012;31(19):2098–2109. doi: 10.1002/sim.5317.
7. Xiao Y, Abrahamowicz M, Moodie EE. Accuracy of conventional and marginal structural Cox model estimators: a simulation study. International Journal of Biostatistics. 2010;6(2) doi: 10.2202/1557-4679.1208. Article 13.
8. Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period: application to the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512. [Errata (1987) in Computers and Mathematics with Applications 14, 917–921. Addendum (1987) in Computers and Mathematics with Applications 14, 923–945. Errata (1987) to addendum in Computers and Mathematics with Applications 18, 477.]
9. Robins JM. Causal inference from complex longitudinal data. In: Berkane M, editor. Latent Variable Modeling and Applications to Causality. Lecture notes in statistics 120. Springer-Verlag; 1997. pp. 69–117.
10. Murphy SA, van der Laan MJ, Robins JM. Marginal mean models for dynamic regimes. Journal of the American Statistical Association. 2001;96(456):1410–1423. doi: 10.1198/016214501753382327.
11. van der Laan MJ, Petersen ML, Joffe MM. History-adjusted marginal structural models and statically-optimal dynamic treatment regimens. International Journal of Biostatistics. 2005;1(1) Article 4.
12. Hernán MA, Lanoy E, Costagliola D, Robins JM. Comparison of dynamic treatment regimes via inverse probability weighting. Basic & Clinical Pharmacology & Toxicology. 2006;98:237–242. doi: 10.1111/j.1742-7843.2006.pto_329.x.
13. Orellana L, Rotnitzky A, Robins JM. Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, Part I: Main Content. International Journal of Biostatistics. 2010a;6 Article 7.
14. Orellana L, Rotnitzky A, Robins JM. Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, Part II: Proofs and Additional Results. International Journal of Biostatistics. 2010b;6 doi: 10.2202/1557-4679.1242. Article 8.
15. Cain LE, Robins JM, Lanoy E, Logan R, Costagliola D, Hernán MA. When to start treatment? A systematic approach to the comparison of dynamic regimes using observational data. International Journal of Biostatistics. 2010;6 doi: 10.2202/1557-4679.1212. Article 18.
16. Young JG, Cain LE, Robins JM, O’Reilly EJ, Hernán MA. Comparative effectiveness of dynamic treatment regimes: an application of the parametric g-formula. Statistics in Biosciences. 2011 doi: 10.1007/s12561-011-9040-7.
17. D’Agostino RB, Lee M, Belanger AJ. Relation of pooled logistic regression to time-dependent Cox regression analysis: the Framingham Heart Study. Statistics in Medicine. 1990;9:1501–1515. doi: 10.1002/sim.4780091214.
18. Agresti A. Categorical Data Analysis. Connecticut, USA: Wiley Series in Probability and Statistics; 2012.
19. Wang Z, Louis TA. Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function. Biometrika. 2003;90(3):765–775.
20. Robins JM, Wasserman L. Estimation of effects of sequential treatments by reparameterizing directed acyclic graphs. In: Geiger D, Shenoy P, editors. Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann; 1997. pp. 409–420.
21. Spirtes P, Glymour C, Scheines R. Causation, Prediction and Search. New York: Springer-Verlag; 1993.
22. Pearl J. Causal diagrams for empirical research. Biometrika. 1995;82:669–710.
23. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15:615–625. doi: 10.1097/01.ede.0000135174.63482.43.
24. Robins JM, Hernán MA. Estimation of the causal effects of time-varying exposures. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, editors. Advances in Longitudinal Data Analysis. Boca Raton, FL: Chapman and Hall/CRC Press; 2009. pp. 553–599.
25. Robins JM, Hernán MA, Siebert U. Effects of multiple interventions. In: Ezzati M, Lopez AD, Rodgers A, Murray CJL, editors. Comparative Quantification of Health Risks: Global and Regional Burden of Disease Attributable to Selected Major Risk Factors. Geneva: World Health Organization; 2004.
26. Taubman SL, Robins JM, Mittleman MA, Hernán MA. Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. International Journal of Epidemiology. 2009;38(6):1599–1611. doi: 10.1093/ije/dyp192.
27. Westreich D, Cole SR, Young JG, Palella F, Tien PC, Kingsley L, Gange SJ, Hernán MA. The parametric g-formula to estimate the effect of highly active antiretroviral therapy on incident AIDS or death. Statistics in Medicine. 2012 doi: 10.1002/sim.5316.
28. van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. New York: Springer; 2002.
29. Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61:962–972. doi: 10.1111/j.1541-0420.2005.00377.x.
30. van der Laan MJ. Targeted maximum likelihood based causal inference: Part I. International Journal of Biostatistics. 2010;6(2) doi: 10.2202/1557-4679.1211. Article 2.
31. van der Laan MJ. Targeted maximum likelihood based causal inference: Part II. International Journal of Biostatistics. 2010;6(2) doi: 10.2202/1557-4679.1211. Article 3.
