Abstract
Causal inference with observational longitudinal data and time-varying exposures is often complicated by time-dependent confounding and attrition. The G-computation formula is one approach for estimating a causal effect in this setting. The parametric modeling approach typically used in practice relies on strong modeling assumptions for valid inference and, moreover, depends on an assumption of missing at random, which is not appropriate when data are missing not at random (MNAR) or missing due to death. In this work we develop a flexible Bayesian semi-parametric G-computation approach for assessing the causal effect on the subpopulation that would survive irrespective of exposure, in a setting with MNAR dropout. The approach is to specify models for the observed data using Bayesian additive regression trees, and then use assumptions with embedded sensitivity parameters to identify and estimate the causal effect. The proposed approach is motivated by a longitudinal cohort study on cognition, health, and aging, and we apply it to study the effect of becoming a widow on memory. We also compare our approach to several standard methods.
Keywords: BART, Cognitive aging, Longitudinal data, Observational data, Non-ignorable missing, Sensitivity analysis, Survivor Average Causal Effect, Time-varying exposure, Time-varying confounding
1. Introduction
Causal inference in non-randomized longitudinal studies with time-varying exposures is often complicated by time-dependent confounding and attrition. Attrition is inevitable, especially when the individuals in the studied population are older and followed over a long time period. Additionally, in cohort studies an individual's data are only recorded if that person completes follow-up testing; hence, once a person misses a test wave, not only the outcome but also the exposure level and confounders are missing at that and subsequent test waves.
The G-computation formula (Robins 1986) is one approach for estimating a causal effect of time-varying exposures when time-varying confounding is present. The approach is completely nonparametric in its original form, although a parametric modeling approach based on maximum likelihood estimation is most typically used in practice (e.g. Snowden, Rose, and Mortimer 2011; Wang, Nianogo, and Arah 2017). Valid inference with the parametric G-formula requires correct model specification, which can be extremely difficult when there is a large set of regressors, the relationships are non-linear and/or include interaction terms, and there are multiple observation times. Non- and semi-parametric estimation techniques, which do not require prespecified distributional or functional forms for the data, have become popular in the causal inference literature (e.g. Hill 2011; Häggström 2018; Kim et al. 2017; Karim et al. 2017; Tan and Roy 2019; Wager and Athey 2018). One such modeling strategy is Bayesian Additive Regression Trees (BART; Chipman, George, McCulloch, et al. 2010). BART is a sum-of-trees model that adds together the predictions of a number of regression trees regularized by prior distributions. BART does not rely on strong modeling assumptions and, in contrast to other tree-based algorithms, yields interval estimates from full posterior inference.
A number of methodologies have been applied to address missing response or missing covariate data in causal effect estimation for longitudinal data under an assumption of missing at random (MAR; Chen and Zhou 2011; Robins, Rotnitzky, and Zhao 1995). These methods, however, are generally invalid when the missingness is missing not at random (MNAR) or due to death (Kurland et al. 2009). Partly conditional models have been proposed to address the combination of dropout and truncation by death, where inference conditions on the sub-population being alive at a specific time-point (Kurland and Heagerty 2005; Shardell and Miller 2008; Li and Su 2018; Rizopoulos 2012; Wen and Seaman 2018). However, conditioning on survival may introduce bias because survival is a post-exposure event. One estimand that has gained much attention to address this issue is the "survivor average causal effect" (SACE), i.e. the causal effect on the subpopulation of those surviving irrespective of exposure (Frangakis and Rubin 2002; Frangakis et al. 2007). Several approaches have been developed for estimation of the SACE in longitudinal randomized controlled studies (e.g. Lee and Daniels 2013; Lee, Daniels, and Sargent 2010; Wang et al. 2017; Wang, Richardson, and Zhou 2017), or in the context of semicompeting risks (Comment et al. 2019; Xu et al. 2019). For observational data, Tchetgen Tchetgen (2014) developed a weighting estimator to identify the SACE without missingness, and Shardell, Hicks, and Ferrucci (2014) identified the SACE under MAR missingness, also using a weighting technique. Moreover, Josefsson et al. (2016) proposed assumptions to identify the SACE of a baseline exposure on a longitudinal outcome under MNAR missingness for the outcome using parametric methods. These approaches, however, do not appropriately account for MNAR data among survivors when the exposure and confounding are time-varying. Shardell and Ferrucci (2018) proposed a parametric shared parameter model with G-computation to identify a principal stratum causal effect for observational longitudinal data with time-dependent confounding. A drawback of their approach is that unbiased estimation depends on correct model specification and that it does not appropriately account for MNAR data among survivors.
Widowhood has been identified as an important social factor associated with increased mortality (Håkansson et al. 2009) and cognitive impairment (e.g. Mousavi-Nasab et al. 2012). Here, our goal is to develop a framework for assessing the impact of becoming a widow (a monotone exposure) on memory, by estimating the SACE in a setting with MNAR dropout among survivors. The proposed approach is motivated by the Betula study (Nilsson et al. 1997), in which individuals are followed over multiple test waves to study how cognitive functions potentially deteriorate with age and to identify risk factors for dementia.
The remainder of the paper is organized as follows. In Section 2, we introduce the notation and the causal estimand. In Section 3, we propose default identifying assumptions and sensitivity parameters that allow deviations from these assumptions, followed by the identification of the SACE in Section 4. In Section 5, we propose a Bayesian semi-parametric (BSP) modeling approach for the observed data distributions and the algorithm for estimation of the SACE. In Section 6, we present a simulation study, and in Section 7 an application to the Betula data. Finally, we conclude with a discussion and possible future work in Section 8.
2. Notation and the causal effect of interest
2.1. Data structure and notation
We begin with a formal description of the data. Let i = 1, 2, …, N denote individual and j = 0, 1, …, J denote time (the data used from the Betula study have J = 3 follow-up test waves). We denote the vector of baseline confounders by Xi0 (gender, education, and age cohort) and the time-varying confounder by Wij (whether the spouse has been seriously ill between the j − 1th and jth test wave). The continuous memory outcome is denoted by Yij and the binary exposure by Zij. We assume a monotone exposure where initially all subjects are unexposed (Zi0 = 0 for all i); if a subject is exposed (widowed) at test wave j then Zij = 1, and if Zij = 1, then Zik = 1 for k > j. Let Sij denote survival, where Sij = 1 if an individual is alive at the time of the testing and 0 otherwise. Let Rij be a dropout indicator, where Rij = 1 if an individual has completed the cognitive testing and 0 otherwise. We have monotone missingness, so if Rij = 0, then Rik = 0 for k > j. Note that vital status information is presumed to be available even after dropping out of the study. The history of a time-varying variable is denoted with an overbar; for example, the exposure history for individual i through test wave j is denoted by $\bar{Z}_{ij} = (Z_{i0}, \ldots, Z_{ij})$. Furthermore, for each individual i we keep track of the number of test waves (s)he participates in the study and the number of test waves (s)he is alive. A simplified version of the study design restricted to two test waves is depicted in a causal diagram in Figure 1.
Figure 1: A causal diagram of a simplified version of the Betula study design restricted to two test waves.
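To make the data structure concrete, the following minimal sketch (in R) lays out one hypothetical individual's records; the variable names and values are illustrative, not those of the Betula database.

```r
## One hypothetical individual (id = 1) over the J = 3 follow-up waves.
## Vital status s is known at every wave, whereas w, z and y are missing (NA)
## once the individual drops out (r = 0), even though (s)he is still alive.
dat <- data.frame(
  id        = 1,
  wave      = 0:3,            # j = 0, 1, 2, 3
  x0_cohort = 60,             # baseline confounders (constant over waves)
  x0_sex    = "F",
  x0_edu    = "high",
  w = c(0, 0, 1, NA),         # spouse seriously ill since previous wave
  z = c(0, 0, 1, NA),         # widowed; monotone: stays 1 once 1
  s = c(1, 1, 1, 1),          # alive at wave j (recorded even after dropout)
  r = c(1, 1, 1, 0),          # completed testing; monotone dropout
  y = c(40, 38, 35, NA)       # episodic memory composite score
)
```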
2.2. Causal estimand
The goal of the study is to estimate the causal effect of becoming a widow (within 5 years) on memory among those who would survive irrespective of being widowed or not. We consider two contrasting exposure regimes: $\bar{z}_j$, i.e. individuals exposed (widowed) between the j − 1th and jth wave, and the contrasting regime $\bar{z}'_j$, i.e. individuals unexposed through test wave j, for j = 1, 2, 3. Below, we generally suppress the subscript i to simplify notation. The potential memory outcome at wave j for an individual under exposure regime $\bar{z}_j$ is denoted by $Y_j(\bar{z}_j)$. Similarly, let $S_j(\bar{z}_j)$ be the potential survival outcome at wave j, denoting survival under exposure regime $\bar{z}_j$.
We consider a principal stratum causal effect of a time-varying exposure on the outcome, at wave j, for those who would survive under either exposure regime,
$$\Delta_j = E\big[Y_j(\bar{z}_j) - Y_j(\bar{z}'_j) \,\big|\, S_j(\bar{z}_j) = S_j(\bar{z}'_j) = 1\big]. \qquad (1)$$
However, the main interest is not the effect at a specific wave, but rather the effect aggregated over test waves, defined as
$$\tau = \sum_{j=1}^{J} w_j\,\Delta_j, \qquad (2)$$
where the weights $w_j$ are proportional to the probability of belonging to the always-survivor stratum at wave $j$ and sum to one.
3. Identifying assumptions and sensitivity parameters
To identify the causal effect in [2] from the observed data, we first introduce a set of assumptions, followed by a set of sensitivity parameters to assess the impact of violations of some of these assumptions. The sensitivity parameters (and their values) will be explained in relation to the Betula data in Section 7.2.
3.1. Assumptions
Assumptions 1 – 3 are a set of standard assumptions for causal inference of longitudinal observational data:
Assumption 1 Consistency:
For a given individual, if $\bar{Z}_j = \bar{z}_j$, then $Y_j = Y_j(\bar{z}_j)$ and $S_j = S_j(\bar{z}_j)$.
Assumption 2 Positivity for a monotone exposure:
For zj = 0, 1, the probability of exposure level zj between test waves j − 1 and j, conditional on the observed history, is strictly positive for all individuals; that is, all unexposed individuals have a nonzero probability of becoming exposed between test waves j − 1 and j if they are still unexposed (and alive) at wave j − 1.
Assumption 3 Conditional exchangeability:
If X0 and the history of the time-varying variables contain all pre-exposure covariates related to exposure, potential outcomes and survival, then for all exposure regimes $\bar{z}_j$ the potential outcomes $\{Y_j(\bar{z}_j), S_j(\bar{z}_j)\}$ are independent of the exposure at wave j conditional on the temporally preceding variables.
That is, at each test wave j, being exposed zj is as if randomized conditional on the set of the temporally preceding variables. The assumption of conditional exchangeability is likely to be violated in many settings and is impossible to assess from the data. Therefore, we introduce a sensitivity parameter to investigate sensitivity for unmeasured confounding in Section 3.2.
In cohort studies Yj, Zj and Wj are not observed (but are still defined) for individuals who are alive but have dropped out of the study. We make an MAR-type assumption conditional on being alive at time j (MAR-S) to identify the distribution for dropouts among survivors.
Assumption 4 Dropout among survivors:
For all j ≥ 1 and all t ≤ j, the distribution of Yt given the temporally preceding variables is the same among dropouts and non-dropouts, conditional on being alive at wave j.
That is, conditional on survival and the temporally preceding variables, the outcome is distributed the same whether or not the individual has dropped out; analogous conditions hold for the exposure and the time-varying confounder. Previous studies of the Betula data have shown that individuals who drop out have lower cognitive performance and steeper decline (Josefsson et al. 2012). In Section 3.2 we introduce sensitivity parameters to allow the dropout mechanism to deviate from this MAR-type assumption.
We also need three further assumptions for identification of the potential outcomes for those individuals who would survive regardless of exposure history, i.e. the principal strata. We start with two standard assumptions.
Assumption 5 Monotonicity:
$S_j(\bar{z}_j) \leq S_j(\bar{z}'_j)$; if an individual were to be alive under exposure regime $\bar{z}_j$ then (s)he would also be alive under the contrasting regime $\bar{z}'_j$. Deterministic monotonicity can be too strong in many settings and we discuss a weakening of this in Section 8.
Assumption 6 Differences in outcomes when comparing different strata:
For the contrasting exposure regime $\bar{z}'_j$ we assume $E[Y_j(\bar{z}'_j) \mid S_j(\bar{z}_j) = 1, S_j(\bar{z}'_j) = 1] = E[Y_j(\bar{z}'_j) \mid S_j(\bar{z}_j) = 0, S_j(\bar{z}'_j) = 1]$. That is, there is no difference in potential outcomes when comparing the "always survivor" stratum to the stratum of individuals who would live under the contrasting regime $\bar{z}'_j$ but not under exposure regime $\bar{z}_j$. In Section 3.2 we introduce a sensitivity parameter to investigate sensitivity to this assumption, since individuals in the always-survivor stratum are likely healthier and have better cognitive performance.
A common problem encountered in longitudinal cohort studies is that an individual's exposure level zj (and hence the exposure regime $\bar{z}_j$) and time-varying confounder wj are only observed if (s)he is alive and participates in the jth test wave. Hence we need to introduce a new assumption to be able to identify the probability of survival among the exposed and non-exposed; this is necessary for the identification of the potential outcomes among always survivors.
Assumption 7 Exposure and confounding among non-survivors:
If sj = 0 and sj−1 = 1 for an individual, zj and wj may have occurred before the event of death; thus, zj and wj are not observed but are still well-defined. We assume that
the exposure and the time-varying confounder are distributed the same among survivors and non-survivors at wave j, conditional on the temporally preceding variables.
This assumption is used for identification of the principal strata. In the Betula study the cognitive testing is performed at 5-year intervals. Since 5 years is a rather long time period, it is likely that some of the participants who died before follow-up were also widowed before death. Thus, the number of widowed participants in the sample may be underestimated, and this must be accounted for.
3.2. Sensitivity parameters
To investigate sensitivity to Assumption 3 we follow the procedure of Brumback et al. (2004). The unmeasured confounding is quantified through a parameter c(zj) which describes the outcome confounding; that is, for exposure regime $\bar{z}_j$, c(zj) is the average difference in potential outcomes because of unmeasured confounding. The conditional exchangeability assumption does not hold if c(zj) ≠ 0, and estimating the effect with the naive estimator then leads to a bias determined by c(zj). Further, since the two regimes only differ in zj, the bias of the causal contrast is determined by c(zj) and c(z′j). Sensitivity to several types of unmeasured confounding can be assessed using this form. Here, we restrict attention to an unmeasured confounder that is independent of the history of the joint processes (the outcome, exposure, time-varying confounder, survival, and dropout histories, and x0).
To investigate sensitivity to Assumption 4 we first make an assumption of non-future dependence (NFD) conditional on survival (NFD-S) for the outcome and then introduce sensitivity parameters within this partially identifying restriction (Linero and Daniels 2018). NFD is a special case of MNAR (Kenward, Molenberghs, and Thijs 2003), and NFD-S is the corresponding restriction defined conditional on being alive at time j, stated for all j > 1 and all t < j. The NFD-S assumption leaves one conditional distribution per incomplete dropout pattern unidentified, namely that of the outcome at the first unobserved test wave (t = j). To identify this distribution, we introduce a sensitivity parameter γj that shifts the location of the outcome distribution at the first unobserved test wave; γj < 0 implies a negative shift (lower memory) at that wave (a minimal sketch of how γj enters the computation is given after Table 1). NFD-S implies that dropout at time j depends on being alive at that time, the history up to that time, the exposure, the time-varying confounder and the outcome at time j, but not on outcomes or time-varying variables after time j. This assumption of dropout not depending on the 'future' is often viewed as realistic and was proposed originally as a remedy to concerns about many pattern-mixture models implicitly having future dependence. Table 1 displays the possible mortality and missing data patterns under the NFD-S assumption.
Table 1:
The table shows the possible missing data patterns, $\bar{r}$ (columns), and mortality patterns, $\bar{s}$ (rows). The outcome vector Y = {Y0, Y1, Y2, Y3} is fully observed if $\bar{r} = \{1, 1, 1, 1\}$; otherwise it is constrained by the mortality outcome and/or missing data pattern. Yj = O if the outcome is observed, Yj = M if missing, and Yj = nd when truncated by death. The NFD-S restriction leaves the distribution of the entries marked M* unidentified.
| Mortality $\bar{s}$ \ Dropout $\bar{r}$ | {1, 0, 0, 0} | {1, 1, 0, 0} | {1, 1, 1, 0} | {1, 1, 1, 1} |
|---|---|---|---|---|
| {1, 0, 0, 0} | {O, nd, nd, nd} | - | - | - |
| {1, 1, 0, 0} | {O, M*, nd, nd} | {O, O, nd, nd} | - | - |
| {1, 1, 1, 0} | {O, M*, M, nd} | {O, O, M*, nd} | {O, O, O, nd} | - |
| {1, 1, 1, 1} | {O, M*, M, M} | {O, O, M*, M} | {O, O, O, M*} | {O, O, O, O} |
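As an illustration of how the NFD-S restriction and γj enter the computation, consider the following minimal sketch; fit_y (a fitted outcome model for survivors who remained in the study), newdata_dropout (the histories of surviving dropouts at their first unobserved wave, the M* entries in Table 1), and gamma_j are all hypothetical objects.

```r
## Minimal sketch (hypothetical objects): under NFD-S, the outcome at the first
## unobserved wave of a surviving dropout is generated from the model fitted to
## observed survivors and then shifted by the sensitivity parameter gamma_j.
y_first_missing <- predict(fit_y, newdata_dropout)  # identified part (MAR-S fit)
y_first_missing <- y_first_missing + gamma_j        # gamma_j < 0: lower memory among dropouts
```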
To investigate sensitivity to Assumption 6 we let δ denote the mean difference in potential outcomes under the contrasting exposure regime $\bar{z}'_j$ when comparing the "always survivor" stratum to the stratum of individuals who would live under the contrasting regime but not under exposure regime $\bar{z}_j$, i.e. $\delta = E[Y_j(\bar{z}'_j) \mid S_j(\bar{z}_j) = 1, S_j(\bar{z}'_j) = 1] - E[Y_j(\bar{z}'_j) \mid S_j(\bar{z}_j) = 0, S_j(\bar{z}'_j) = 1]$. In our analysis we assume δ ≥ 0, which implies that memory performance is on average higher in the "always survivor" stratum (the always survivors are healthier). We further assume this difference is independent of the preceding variables.
To investigate sensitivity to Assumption 7, we introduce a sensitivity parameter νj for the exposure, defined as the difference between the probability of being exposed at wave j among non-survivors and the corresponding probability among survivors, conditional on the temporally preceding variables;
that is, νj represents the mean difference in the proportion exposed between non-survivors and survivors. The first of these probabilities (exposure among non-survivors) is not identified. However, bounds can be derived for νj; see the Web Appendix section A.2 for details. In particular, the upper bound for νj is obtained when the probability of exposure among non-survivors equals one, reflecting that, among non-survivors, all subjects were exposed before the event of death between the j−1th and jth wave. Further, by using Assumptions 1 and 5, the lower bound for νj is obtained when the survival probability is equal among those exposed and unexposed at wave j. In that case, by the law of total probability and Bayes theorem, the lower bound becomes 0.
4. Identification
Identification of the SACE in [2] follows from two results.
Result 1:
The causal contrasts in [1] can be identified as follows
(3)
where the conditioning in [3] is on the set of temporally preceding variables (the histories of the outcome, exposure, and time-varying confounder, together with x0).
Result 2:
τ in [2] can further be identified using Assumption 5 by weighting the contrasts in [3] with
(4)
The proofs of the results can be found in the Web Appendix section A.3. The causal effect is identifiable from the observed data and Assumptions 1–7, conditional on fixed values for the sensitivity parameters c(zj), c(z′j), δ, νj, and γj. For a Bayesian analysis, the sensitivity parameters can instead be given informative priors. In Section 5.3 and Table 2 we describe the estimation algorithm, in which the sensitivity parameters are given informative, non-degenerate priors.
Table 2:
Algorithm for estimation of τ in [2] using the G-computation formula. Details of the algorithm can be found in the Web Appendix section A.4.
| 1. | Sample the observed data posteriors as described in Section 5. |
| 2. | For each posterior sample of the parameters, sample pseudo data of size N* together with one set of the sensitivity parameters γj, νj, c(zj), and δ. |
| 3. | Implement G-computation for the exposure regime $\bar{z}_j$, and similarly for the contrasting regime $\bar{z}'_j$, using the pseudo data and sensitivity parameters from Step 2 to compute the mean potential outcomes under each regime among always survivors. Furthermore, implement Monte Carlo integration using the pseudo data to compute the survival probabilities that enter the weights in [4]. |
| 4. | Use the quantities in Step 3 to compute one posterior sample of τ as defined in [3]–[4]. |
| 5. | Repeat Steps 2–4 for each posterior sample of the parameters. |
5. Modeling of the observed data distributions and computation of the causal effect
The joint distribution of the observed data is specified as a marginal model for the baseline confounders and a set of sequential conditional models for the time-varying variables, given the history of the joint process (the outcome, exposure, confounders, and missingness). Details of the joint distribution are given in Web Appendix section A.1. The baseline confounders xi0 are all observed before an individual enters the study. For each visit j we postulate the time-varying variables in the following order: (sij, rij, wij, zij, yij), even though the exposure, the time-varying confounder, and survival all occurred between the (j − 1)th and jth test wave. Of course, yij, wij and zij are only observed if sij = 1 and rij = 1. It is further allowed that wij and zij may have occurred before sij.
5.1. Bayesian semi-parametric modeling
We propose a Bayesian semi-parametric modeling approach based on Bayesian Additive Regression Trees (BART, Chipman, George, McCulloch, et al. 2010) for the observed data distribution.
For the time-varying components, we specify BART models for the responses as a function of prior histories for all individuals alive and not dropped out at a given test wave. Each model consists of two parts: a sum-of-trees model and a regularization prior on the parameters of that model. The model for the continuous response Yj is conditioned on the history of the joint process (the temporally preceding outcome, exposure, and time-varying confounder values, together with x0) for the subset that satisfies sj = 1 and rj = 1, and can be expressed as a sum of m distinct binary regression trees, $f_j(\cdot) = \sum_{t=1}^{m} g(\cdot\,; T_t, M_t)$. Each tree $T_t$ constitutes a set of interior node decision rules leading down to terminal nodes and, for a given $T_t$, $M_t$ is the associated set of terminal node parameters. The conditional distribution of the continuous outcome is specified as normal, $Y_j \sim N(f_j(\cdot), \sigma_j^2)$, where the mean function $f_j(\cdot)$ is given by the sum-of-trees.
The BART models for our binary responses Zj, Wj, Rj, and Sj are specified as probit models. For example, the model for the exposure can be expressed as $P(Z_j = 1 \mid \cdot) = \Phi\{f^{Z}_j(\cdot)\}$, where Φ denotes the cumulative distribution function of the standard normal distribution and $f^{Z}_j(\cdot)$ is a sum-of-trees function of the temporally preceding variables (including x0), so that $\Phi\{f^{Z}_j(\cdot)\}$ is the probability of being exposed at wave j given that history; this model is fit on the subset that satisfies sj = 1 and rj = 1. The BART model for Sj is fit on the subset that satisfies sj−1 = 1 and rj−1 = 1, and that for Rj on the subset that satisfies sj = 1 and rj−1 = 1. The predicted probabilities of rj = 1 and sj = 1 are $\Phi\{f^{R}_j(\cdot)\}$ and $\Phi\{f^{S}_j(\cdot)\}$, respectively. Note that s0 = 1 and r0 = 1 for all individuals, rj = 0 if rj−1 = 0, and sj = 0 if sj−1 = 0.
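As an illustration of these model fits, a minimal sketch using bartMachine is given below; the data frame dat1, its column names, and newX (new pseudo data) are hypothetical, and in the actual analysis each model is fit on the appropriate subset described above.

```r
## Minimal sketch (hypothetical data/columns): sequential BART fits at wave j = 1.
library(bartMachine)

## continuous outcome model for y1, fit among those alive and tested at wave 1
obs1  <- subset(dat1, s1 == 1 & r1 == 1)
fit_y <- bartMachine(X = obs1[, c("y0", "z1", "w1", "x0")], y = obs1$y1)

## probit BART for the exposure z1 (bartMachine fits a probit model when y is a
## two-level factor); which class probability is returned depends on the factor
## level ordering, so check levels(factor(obs1$z1)) before using the output
fit_z <- bartMachine(X = obs1[, c("y0", "w1", "x0")], y = factor(obs1$z1))

## posterior draws of the mean outcome and predicted class probabilities for
## new (pseudo) data 'newX'
y_draws <- bart_machine_get_posterior(fit_y, newX)$y_hat_posterior_samples
p_z     <- predict(fit_z, newX, type = "prob")
```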
The baseline confounders are all categorical (age cohort, sex, and education level). We create a saturated multinomial random variable based on these categorical variables, with L categories, where each category corresponds to a unique combination of the categorical variables. The vector of category probabilities is given a Dirichlet prior with all parameters equal to one.
5.2. Posterior
Draws from the posterior distribution of the sum-of-trees models are generated using Markov chain Monte Carlo (MCMC). The parameters of the conditional distributions for Yj, Zj, Wj, Rj, and Sj are assumed a priori independent, and thus their posteriors can be sampled separately. We fit the BART models using the R package bartMachine (Kapelner and Bleich 2013), which handles both continuous and binary responses. We use default priors on all of the parameters of the sum-of-trees model, that is, on the tree structure, the terminal node parameters, and the error variance. For details see Kapelner and Bleich (2013).
5.3. Computation of the SACE
The algorithm for generating samples from the posterior distribution of τ in [2] using the G-computation formula is given in Table 2. Details can be found in the Web Appendix section A.4. The algorithm provides the details of generating posterior samples of the causal quantities in Results 1 and 2 (from Section 4) using the posterior distribution of the observed data model parameters (Section 5.1) and the identifying restrictions with sensitivity parameters (Sections 3.1 and 3.2). Recall the expressions in Results 1 and 2 are a function of the observed data distribution and the sensitivity parameters.
For implementation of the algorithm in practice, a number of the initial posterior samples are discarded as burn-in. Parallel computation can be implemented to speed up computations. For example, instead of running one long chain in Step 1, it is possible to run multiple shorter chains in parallel, although each chain still needs to converge. Also, Step 2 may be divided into k blocks of size N*/k, and in Steps 3 – 4 the parameters of interest are computed by combining the pseudo data from the k blocks. We give further details on computation with Betula data in Section 7.3.
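A minimal sketch of this parallelization strategy is given below; run_chain (a wrapper for Step 1) and gcomp_block (a wrapper for Steps 2–4 on one block of pseudo data) are hypothetical functions.

```r
## Minimal sketch (hypothetical wrappers): parallel chains and blocked pseudo data.
library(parallel)

n_chains <- 8                                   # several shorter chains instead of one long one
fits <- mclapply(seq_len(n_chains), function(c) run_chain(seed = c),
                 mc.cores = n_chains)           # Step 1, run in parallel

k       <- 4                                    # split the N* pseudo observations into k blocks
N_star  <- 25000
tau_blk <- mclapply(seq_len(k),
                    function(b) gcomp_block(fits, n_pseudo = N_star / k),
                    mc.cores = k)               # Steps 2-4 per block
tau_post <- Reduce(`+`, tau_blk) / k            # combine block-wise posterior summaries
```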
6. Simulation study
We performed a simulation study to evaluate the performance of the BSP G-computation algorithm. For simplicity of comparison to other appropriate methods we estimate the causal effect with the sensitivity parameters set to zero (in particular γj = 0), i.e. a setting with MAR missingness and no deaths. Details are found in the Web Appendix section A.5.
We consider two settings for our BSP approach: first, specifying a normal distribution for the outcome errors as described in the algorithm (BSP-GC1), and second, specifying a t-distribution with 3 degrees of freedom (t3) (BSP-GC2). We compare our approach with three other methods used for causal effect estimation with longitudinal data and time-varying confounding: (i) A parametric version of the proposed procedure (BP-GC), in which we specified Bayesian linear and logistic additive regression models instead of the BART models described in Section 5.1. (ii) Inverse probability of treatment weighting (IPTW; Cole and Hernán 2008), where the mean is estimated by averaging the memory outcome for the subset following each exposure regime in a pseudo-population constructed by weighting each individual, using both unstabilized weights (IPTW-W) and stabilized weights (IPTW-SW), to adjust for confounding and for attrition among survivors; IPTW-W and IPTW-SW were implemented using the ipw and survey packages in R. (iii) Targeted minimum loss-based estimation for longitudinal data structures (TMLE; Laan and Gruber 2012), implemented using the ltmle package with default settings (Lendle et al. 2017). Confidence intervals were calculated using the nonparametric bootstrap with 5000 bootstrap samples, taking the 2.5th and 97.5th percentiles of the resulting estimates.
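A minimal sketch of the nonparametric percentile bootstrap used for the weighting and TMLE comparators is shown below; estimate_tau (returning the point estimate for one data set) and sim_dat (one row per individual) are hypothetical.

```r
## Minimal sketch (hypothetical estimator): percentile bootstrap with B = 5000.
B <- 5000
boot_tau <- replicate(B, {
  idx <- sample(nrow(sim_dat), replace = TRUE)     # resample individuals
  estimate_tau(sim_dat[idx, ])
})
ci <- quantile(boot_tau, probs = c(0.025, 0.975))  # 2.5th and 97.5th percentiles
```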
Data were generated based on a simplified version of the Betula data. We simulated 1000 datasets of size n = 1000. We considered Ji = 2 follow-up test waves and a continuous baseline covariate Xi0 generated as X0 ~ Unif(0, 1). The outcome Yij was a continuous time-varying variable. The binary variable Zij indicated whether the subject was widowed, and Wij indicated whether the spouse had been severely sick. Widowhood was an absorbing state, so that if Zij = 1 then Zik = 1 for k ≥ j; note that Zi0 = 0 for all subjects. As in the Betula data, all time-varying variables had a highly nonlinear relationship with the baseline covariate, and the time-varying confounder interacted with the baseline covariate in the exposure model. The time-varying confounder, exposure, and outcome were generated sequentially from these nonlinear models, with outcome error ϵj ~ N(0, 0.1²). R code for the data generation is provided in the Web Appendix section A.5.
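The exact data-generating functions are given in Web Appendix section A.5; the sketch below only illustrates the kind of nonlinear, interacting structure described, with functional forms and coefficients invented for illustration.

```r
## Illustrative (not the paper's) generation of the first follow-up wave.
set.seed(1)
n  <- 1000
x0 <- runif(n)                                             # baseline covariate
w1 <- rbinom(n, 1, plogis(-1 + sin(4 * x0)))               # spouse seriously ill
z1 <- rbinom(n, 1, plogis(-2 + 2 * x0^2 + 1.5 * w1 * x0))  # exposure; interacts with x0
y1 <- 1 + cos(3 * x0) - 0.05 * z1 + 0.3 * w1 +
      rnorm(n, 0, 0.1)                                     # continuous memory outcome
```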
Table 3 shows the bias, empirical standard deviation (ESD), mean squared error (MSE), and coverage of 95% confidence/credible intervals from the simulation study for BSP-GC1, BSP-GC2, BP-GC, IPTW-W, IPTW-SW, and TMLE. The causal effect estimates for BSP-GC1, BSP-GC2 and TMLE are nearly unbiased. BSP-GC1 and BSP-GC2 are, however, more efficient (smaller MSE and ESD) than TMLE, and their coverage is above the nominal 95% level whereas that of TMLE is below it. The simulation results for BSP-GC1 and BSP-GC2 are very similar. As expected, the other three methods (BP-GC, IPTW-W, and IPTW-SW) are all biased. Additionally, of all methods, BP-GC has the highest bias and MSE, IPTW-W is the least efficient, and IPTW-SW has the lowest coverage.
Table 3:
Simulation results for causal effect estimation with n = 1000 and true causal effect τ = −0.05, using two settings for our proposed approach: errors specified as normal (BSP-GC1) and errors specified using a t-distribution (BSP-GC2); a parametric version of the proposed procedure (BP-GC); inverse probability of treatment weighting using unstabilized weights (IPTW-W) and stabilized weights (IPTW-SW); and targeted minimum loss-based estimation for longitudinal data structures (TMLE). Mean squared errors (MSE) are multiplied by 100 for ease of presentation. ESD denotes empirical standard deviation and CP denotes coverage probability of 95% confidence/credible intervals.
| | Bias | ESD | MSE | CP |
|---|---|---|---|---|
| BSP-GC1 | −0.002 | 0.013 | 0.02 | 98.7 |
| BSP-GC2 | −0.003 | 0.013 | 0.02 | 98.4 |
| BP-GC | −0.065 | 0.021 | 0.47 | 63.6 |
| IPTW-W | 0.024 | 0.040 | 0.22 | 64.3 |
| IPTW-SW | −0.031 | 0.013 | 0.12 | 20.4 |
| TMLE | −0.002 | 0.021 | 0.04 | 92.8 |
To see how the proposed approach performs when the error distribution is misspecified, data were instead generated with a t3-distributed error for the outcome Yj. Bias, ESD, MSE, and coverage from this simulation are found in Table 4. The results are similar in terms of bias and coverage to the previous simulation with correctly specified errors. However, ESD and MSE are higher and are now comparable to TMLE.
Table 4:
Simulation results for causal effect estimation with n = 1000 and true causal effect τ = −0.05, under two scenarios: a) the error distribution for the outcome is misspecified using a t-distribution with 3 degrees of freedom, and b) a setting with limited overlap. We compare our proposed approach (BSP-GC1), a parametric version of the proposed procedure (BP-GC), inverse probability of treatment weighting using unstabilized weights (IPTW-W) and stabilized weights (IPTW-SW), and targeted minimum loss-based estimation for longitudinal data structures (TMLE). Mean squared errors (MSE) are multiplied by 100 for ease of presentation. ESD denotes empirical standard deviation and CP denotes coverage probability of 95% confidence/credible intervals.
| | a) t3 | | | | b) non-overlap | | | |
|---|---|---|---|---|---|---|---|---|
| | Bias | ESD | MSE | CP | Bias | ESD | MSE | CP |
| BSP-GC1 | −0.002 | 0.037 | 0.13 | 97.2 | −0.003 | 0.014 | 0.02 | 98.9 |
| BP-GC | −0.074 | 0.038 | 0.69 | 69.5 | −0.078 | 0.024 | 0.67 | 43.0 |
| IPTW-W | 0.028 | 0.068 | 0.55 | 59.7 | −0.026 | 0.033 | 0.18 | 56.2 |
| IPTW-SW | −0.031 | 0.022 | 0.15 | 43.5 | −0.042 | 0.015 | 0.20 | 10.6 |
| TMLE | −0.001 | 0.036 | 0.13 | 92.3 | 0.002 | 0.023 | 0.05 | 93.9 |
To see how the proposed approach performs when there is lack of overlap, data were generated with an exposure model mimicking the Betula study, where only individuals at older ages were exposed (widowed). The results from this simulation are also found in Table 4. For BSP-GC1 and TMLE the results are similar to the first simulation with correctly specified errors and non-linear effects, but for the other three approaches the bias was higher and the coverage lower.
7. Analysis of the Betula data
7.1. The Betula data
The goal is to estimate the causal effect of becoming a widow on memory among those who would survive irrespective of being widowed or not. As such, we limit our data set to those individuals who were married at enrollment. Of approximately 2000 participants, N = 1059 were married at study enrollment, and data were recorded at 4 fixed test waves (j = 0, …, 3) at 5-year intervals. The memory outcome was assessed at each wave using a composite of three episodic memory tasks. The score can range between 0 and 76, with a higher score indicating better memory (for details see Josefsson et al. 2012). We consider two contrasting exposure regimes: subjects who became a widow between the j − 1th and jth wave, $\bar{z}_j$, and subjects married through test wave j, $\bar{z}'_j$, for j = 1, 2, 3. Baseline demographic characteristics included age cohort (45, 50, …, 80 years of age at enrollment), gender, and education, categorized into low (6–7 years of education; 29%), intermediate (8–9 years; 31%), or high (>9 years; 40%). We also measured a time-varying confounder: an indicator of whether the spouse had been seriously ill within the last 5 years. We note that baseline confounders are always recorded.
7.2. Sensitivity parameters
Our approach allows uncertainty about untestable assumptions by specifying priors for the sensitivity parameters described in Section 3.2. We restrict the parameters to a plausible range of values, reflecting the authors’ beliefs about the unknown quantities.
In Section 3.2, the sensitivity parameter c(zj) reflects the average difference in potential outcomes due to unmeasured confounding (violation of Assumption 3). For the Betula data, when studying the effect of widowhood on cognition, one concern may be that the association is confounded by a healthy lifestyle, such as a healthy diet and/or exercise, something that is often shared within couples. Couples with a healthy lifestyle live longer and may have better cognitive performance than couples with a less healthy lifestyle. This information is not available from the database; hence, it is a potential unmeasured confounder. Here, we assume c(zj) < 0 and c(z′j) > 0, reflecting that exposed (widowed) individuals are less healthy than unexposed (married) individuals. We further assume the effect is of equal magnitude for exposed and unexposed; that is, we assume c(zj) = −ξj and c(z′j) = ξj with ξj ≥ 0. We specify a uniform prior on ξj, with lower bound zero and upper bound equal to one-half of the standard deviation of the outcome conditional on the history of the joint process; that is, we do not expect the sensitivity parameter to exceed one-half of a conditional standard deviation of the outcome. This approximately corresponds to an effect size similar to that found in previous literature on the effect of the Mediterranean diet on memory (Radd-Vagenas et al. 2018).
Departures from the MAR mechanism (Assumption 4) for the missingness among survivors can be investigated by varying γj of Section 3.2. Our prior belief is that γj < 0, reflecting a negative shift in memory performance at the first unobserved test wave. Here, the prior for γj is specified as uniform, with upper bound zero and lower bound equal to minus one observed conditional standard deviation of the outcome. The resulting effect is similar to what has been found in previous work examining differences in cognition between completers and those who withdraw, at the last cognitive testing visit before dropping out (Rabbitt, Lunn, and Wong 2008).
Sensitivity to Assumption 6 uses δ, which reflects the difference in outcomes when comparing the "always survivor" stratum to the stratum of individuals who would live under the contrasting regime $\bar{z}'_j$ but not under exposure regime $\bar{z}_j$. We again specify a uniform prior for δ, with lower bound zero and a positive upper bound on the scale of the outcome's conditional standard deviation.
Finally, sensitivity to Assumption 7 uses the sensitivity parameter νj, which represents the difference in the probability of being exposed at wave j between non-survivors and survivors, conditional on the history of the joint process. As shown in Section 3.2, νj is restricted to the interval between 0 and its upper bound, and we assume the prior for νj is uniform over this range. The upper bound reflects that, between the j − 1th and jth wave, all subjects who died were exposed before death.
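Putting these priors together, one draw of the sensitivity parameters could look like the sketch below; sd_y (an estimate of the conditional standard deviation of the outcome) and nu_max (the upper bound for νj from Web Appendix A.2) are assumed to be available, and the upper bound used for δ is an assumption that mirrors the one used for ξj.

```r
## Minimal sketch (assumed bounds): one draw from the sensitivity-parameter priors.
xi_j    <- runif(1, min = 0,     max = 0.5 * sd_y)  # unmeasured confounding, c(z_j) = -xi_j
gamma_j <- runif(1, min = -sd_y, max = 0)           # MNAR location shift at first unobserved wave
delta   <- runif(1, min = 0,     max = 0.5 * sd_y)  # always-survivor vs. other stratum (bound assumed)
nu_j    <- runif(1, min = 0,     max = nu_max)      # exposure among non-survivors vs. survivors
```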
7.3. Results and comparison with other methods
We estimated τ using the proposed BSP method and embedded sensitivity parameters. For each chain the first 1000 iterations were discarded as burn-in, and 2240 posterior samples of τ were obtained. We sampled pseudo data of size N* = 25000 at each iteration. Convergence of the posterior samples was monitored using trace plots of the samples. To reduce computation time we used 448 parallel chains. Total computation time was 1 hour and 18 minutes.
Limited overlap is not uncommon for longitudinal exposure regimes. To avoid extrapolation of the outcome model outside the range of the estimated propensities, we restrict the analysis to the overlap region for the longitudinal exposure regimes. Specifically, we restrict the data to the set of individuals whose estimated propensity score lies within the range of the observed propensities for the two contrasting regimes (similar to the procedure used in Zhou, Elliott, and Little 2019).
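A minimal sketch of this common-support restriction is given below; ps (estimated propensities of following the exposed regime), exposed (the regime indicator), and betula (the analysis data set) are hypothetical objects.

```r
## Minimal sketch (hypothetical inputs): keep individuals whose estimated
## propensity lies within the range observed in both regime groups.
lo   <- max(tapply(ps, exposed, min))   # largest group-specific minimum
hi   <- min(tapply(ps, exposed, max))   # smallest group-specific maximum
keep <- ps >= lo & ps <= hi
betula_overlap <- betula[keep, ]
```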
We consider two settings for our BSP approach. First, we specify a normal distribution for the residual of the outcome as described in the algorithm (BSP-GC1); second (BSP-GC2), we replace the normal distribution with a t-distribution with 3 degrees of freedom (t3). For BSP-GC1, the posterior sampling results revealed a mean episodic memory score of 38.2 (95% CI; 35.4, 40.8) for exposed and 38.1 (95% CI; 35.6, 40.1) for unexposed individuals, and an estimate of τ of 0.18 (95% CI; −1.43, 1.86), suggesting that there is no effect of becoming a widow on memory among those who would survive irrespective of exposure. For BSP-GC2, the posterior sampling results revealed a mean episodic memory score of 38.2 (95% CI; 35.5, 40.9) for exposed and 38.0 (95% CI; 35.6, 40.1) for unexposed individuals, and an estimate of τ of 0.21 (95% CI; −1.42, 1.82). The conclusions are insensitive to the two choices of outcome residual distribution here.
As a sensitivity analysis we compared how the point estimates and uncertainty varied when setting one sensitivity parameter at a time to zero, while the remaining sensitivity parameters were given the priors described in Section 7.2. Setting γj to zero resulted in an estimate of τ of 0.18 (95% CI; −1.42, 1.91); for νj = 0, 0.20 (95% CI; −1.36, 1.83); for δ = 0, 0.21 (95% CI; −1.38, 1.94); and for ξj = 0, −0.83 (95% CI; −2.43, 0.75). The largest difference was found when setting ξj to zero (i.e. no unmeasured confounding); however, the CI still covers zero, and we do not expect the no-unmeasured-confounding assumption to hold. Fixing the other sensitivity parameters at zero had minimal impact.
We also compare our approach, BSP-GC1, with BP-GC, IPTW-W, IPTW-SW, and TMLE (described in the simulation study). For simplicity of comparison we estimate the causal contrasts described in the simulation study. Further, to avoid limited overlap, we restrict the data to those age cohorts in which we observe both married and widowed participants over the study period, instead of restricting to the propensity-based overlap region as in the main analyses. For IPTW-W, IPTW-SW, and TMLE, confidence intervals were calculated using the nonparametric bootstrap with 5000 bootstrap samples, taking the 2.5th and 97.5th percentiles of the resulting estimates.
The results from all the methods are given in Table 5. First, all of the methods display a negative widowhood effect on memory, although all confidence/credible intervals (CIs) cover zero. There is a large discrepancy between our semi-parametric approach, BSP-GC1, and its parametric counterpart, BP-GC: in the latter, the effect was attenuated and the CI narrower. A likely explanation for the discrepancy in effect estimates is that BP-GC is more susceptible to bias caused by model misspecification. BP-GC and IPTW-SW yielded the most similar results, although the weighting approach had a much wider CI. Further, the effect estimate was most negative using IPTW-W, and the CI was much wider than for any of the other methods. Weighting methods are known to be unstable and to have problems with large variance estimates in finite samples when the weights are extreme. In our analysis the range of the weights was 0.06–14.3 for IPTW-W, compared to 0.06–5.4 for IPTW-SW; the large weights for IPTW-W may explain its deviating result. Our BSP-GC1 approach yielded an estimate of τ most similar to TMLE, although TMLE had a slightly wider CI. This is consistent with the results of the simulation study.
Table 5:
Comparison of methods used for causal effect estimation with the Betula data, setting all sensitivity parameters to zero (including γj = 0): our proposed approach (BSP-GC), a parametric version of the proposed procedure (BP-GC), inverse probability of treatment weighting using unstabilized weights (IPTW-W) and stabilized weights (IPTW-SW), and targeted minimum loss-based estimation for longitudinal data structures (TMLE).
| | Estimate [95% CI] |
|---|---|
| BSP-GC | −0.98 [−2.78, 0.73] |
| BP-GC | −0.53 [−1.73, 0.68] |
| IPTW-W | −1.67 [−5.96, 1.51] |
| IPTW-SW | −0.44 [−3.06, 1.39] |
| TMLE | −0.96 [−3.11, 0.99] |
8. Concluding remarks
This paper has proposed a Bayesian semi-parametric (BSP) framework for estimating the SACE with longitudinal cohort data. Our approach allows for Bayesian inference under MNAR missingness and truncation by death, as well as the ability to characterize uncertainty about unverifiable assumptions. The proposed approach has several advantages compared to existing approaches: (i) flexible modeling of the observed data, compared to parametric methods, while maintaining computational ease; (ii) interval estimates from full posterior inference; and (iii) the ease of introducing sensitivity parameters.
The simulation study, although simplified, mirrored the Betula data: all time-varying variables had a highly nonlinear relationship with the baseline covariate and interaction effects were included. The models for BP-GC, IPTW-W, IPTW-SW, and TMLE were specified using additive effects and were thus misspecified. The results showed that BSP-GC1, BSP-GC2 and TMLE were nearly unbiased. BSP-GC1 and BSP-GC2 were, however, more efficient and had better coverage than TMLE (though a bit conservative). The results are in line with previous research (Roy et al. 2018) suggesting that TMLE is less efficient than Bayesian semi-parametric and non-parametric modeling. This, however, must be explored more thoroughly in future work.
The three other methods (BP-GC, IPTW-W, and IPTW-SW) were all biased. This is expected since these methods make stronger distributional assumptions and thus are more sensitive to model misspecification. Similar to TMLE, our approach does not rely on strong modeling assumptions, but unlike TMLE it is quite easy to modify assumptions and incorporate sensitivity parameters. Recall that we could not easily make direct comparisons of the proposed approach with the other approaches under our assumptions that include sensitivity parameters. We also attempted to use the Super Learner, as implemented in the R package SuperLearner, but observed highly variable results for the Betula data (causal effect estimates varied between −0.34 and −1.33). This may be a result of the cross-validation step and the fact that the exposure is a rather rare event. With our BSP approach these problems are avoided by increasing the size of the pseudo data and running longer chains. Although computation time can be demanding for large pseudo sample sizes, the algorithm can be fully parallelized as discussed in Sections 5.3 and 7.3, which vastly reduces the total computation time.
For the Betula data we did not find an effect of widowhood on memory. The results were not sensitive to specifying the errors as normal or t-distributed, and changing the sensitivity parameters one at a time did not change the results appreciably either. The difference in findings from previous studies may partly be explained by the different estimands used; ours is the only analysis using the SACE. Additionally, in this study we considered the immediate effect of widowhood (within 5 years) rather than a long-term effect; it may take longer for degeneration to become apparent.
Our approach can be generalized in various ways. For example, it is possible to allow for multiple time-varying confounders and/or continuous baseline confounders using a sequential approach as proposed by Xu, Daniels, and Winterstein (2016). This would involve first ordering the confounders into sequential conditionals and then applying BART to model each of these univariate conditionals. Additionally, although widowhood status is treated as a monotone exposure pattern and an absorbing state in this study, this is not essential for the proposed approach; other (non-monotone) exposure regimes, such as the effect of widowhood duration on memory at the last visit, might be of interest and are possible to study with a few modifications (for example, to the positivity assumption).
Violations of the consistency assumption can be problematic when using observational data (Cole and Frangakis 2009). For example, widowhood can affect memory via different pathways, e.g. for some subjects via stress or depression and for others via reduced physical health due to poorer lifestyle choices (Gerritsen et al. 2017). This is a limitation of the current study, where the exposure is defined homogeneously, and should be explored more thoroughly in future work.
Several of our assumptions can be further relaxed. For example, Assumption 5 can be weakened to stochastic monotonicity, following the procedure described in Lee, Daniels, and Sargent (2010). Also, in this study we have considered unmeasured outcome confounding; this can easily be extended to allow unmeasured mortality confounding. Assumption 6 can be weakened by conditioning on the history of the joint process. A drawback of relaxing these assumptions, however, is an increased number of sensitivity parameters.
One limitation of BART is the restrictive, and sometimes unrealistic, assumption of IID normal errors (although these can easily be replaced with heavier-tailed errors, as done here). A fully non-parametric modeling approach could be obtained by extending BART to model the error distribution using Dirichlet process mixtures (George et al. 2018). An additional limitation of the proposed approach is that we used existing R functions for BART that are not the most efficient for our setting. We will explore these limitations in future work, as well as other choices of priors for the sensitivity parameters.
Acknowledgments
The authors would like to thank Dr Anna Sundström for helpful discussions on the interpretation of the results. This work is partially funded by The Swedish Foundation for Humanities and Social Sciences P17-0196:1 and Paths to Healthy and Active Ageing, funded by the Swedish Research Council for Health, Working Life and Welfare, (Dnr 2013 - 2056) to MJ. This work is partially funded by US NIH grants CA183854 and GM112327 to MJD. This publication is based on data collected in the Betula prospective cohort study, Umeå University, Sweden. The Betula Project is supported by Knut and Alice Wallenberg foundation (KAW) and the Swedish Research Council (K2010-61X-21446-01). The simulations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at Umea University partially funded by the Swedish Research Council through grant agreement no. 2016-07213.
Supplementary materials
Web Appendices referenced in Sections 3, 4, 5, 6, and 7, as well as R code are available as Supplementary materials.
References
- Brumback BA, Hernán MA, Haneuse SJ, and Robins JM (2004). "Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures". In: Statistics in Medicine 23.5, pp. 749–767.
- Chen B and Zhou X-H (2011). "Doubly robust estimates for binary longitudinal data analysis with missing response and missing covariates". In: Biometrics 67.3, pp. 830–842.
- Chipman HA, George EI, McCulloch RE, et al. (2010). "BART: Bayesian additive regression trees". In: The Annals of Applied Statistics 4.1, pp. 266–298.
- Cole SR and Frangakis CE (2009). "The consistency statement in causal inference: a definition or an assumption?" In: Epidemiology 20.1, pp. 3–5.
- Cole SR and Hernán MA (2008). "Constructing Inverse Probability Weights for Marginal Structural Models". In: American Journal of Epidemiology 168.6, pp. 656–664.
- Comment L, Mealli F, Haneuse S, and Zigler C (2019). "Survivor average causal effects for continuous time: a principal stratification approach to causal inference with semicompeting risks". In: arXiv preprint arXiv:1902.09304.
- Frangakis CE and Rubin DB (2002). "Principal stratification in causal inference". In: Biometrics 58.1, pp. 21–29.
- Frangakis CE, Rubin DB, An M-W, and MacKenzie E (2007). "Principal stratification designs to estimate input data missing due to death". In: Biometrics 63.3, pp. 641–649.
- George E, Laud P, Logan B, McCulloch R, and Sparapani R (2018). "Fully Nonparametric Bayesian Additive Regression Trees". In: arXiv preprint arXiv:1807.00068.
- Gerritsen L, Wang H-X, Reynolds CA, Fratiglioni L, Gatz M, and Pedersen NL (2017). "Influence of negative life events and widowhood on risk for dementia". In: The American Journal of Geriatric Psychiatry 25.7, pp. 766–778.
- Häggström J (2018). "Data-driven confounder selection via Markov and Bayesian networks". In: Biometrics 74.2, pp. 389–398.
- Håkansson K, Rovio S, Helkala E-L, Vilska A-R, Winblad B, Soininen H, et al. (2009). "Association between mid-life marital status and cognitive function in later life: population based cohort study". In: BMJ 339, b2462.
- Hill JL (2011). "Bayesian nonparametric modeling for causal inference". In: Journal of Computational and Graphical Statistics 20.1, pp. 217–240.
- Josefsson M, de Luna X, Daniels MJ, and Nyberg L (2016). "Causal inference with longitudinal outcomes and non-ignorable dropout: estimating the effect of living alone on cognitive decline". In: Journal of the Royal Statistical Society: Series C (Applied Statistics) 65.1, pp. 131–144.
- Josefsson M, de Luna X, Pudas S, Nilsson L-G, and Nyberg L (2012). "Genetic and Lifestyle Predictors of 15-Year Longitudinal Change in Episodic Memory". In: Journal of the American Geriatrics Society 60.12, pp. 2308–2312.
- Kapelner A and Bleich J (2013). "bartMachine: Machine learning with Bayesian additive regression trees". In: arXiv preprint arXiv:1312.2171.
- Karim ME, Petkau J, Gustafson P, Tremlett H, and Group TBS (2017). "On the application of statistical learning approaches to construct inverse probability weights in marginal structural Cox models: hedging against weight-model misspecification". In: Communications in Statistics-Simulation and Computation 46.10, pp. 7668–7697.
- Kenward MG, Molenberghs G, and Thijs H (2003). "Pattern-mixture models with proper time dependence". In: Biometrika 90.1, pp. 53–71.
- Kim C, Daniels MJ, Marcus BH, and Roy JA (2017). "A framework for Bayesian nonparametric inference for causal effects of mediation". In: Biometrics 73.2, pp. 401–409.
- Kurland BF and Heagerty PJ (2005). "Directly parameterized regression conditioning on being alive: analysis of longitudinal data truncated by deaths". In: Biostatistics 6.2, pp. 241–258.
- Kurland BF, Johnson LL, Egleston BL, and Diehr PH (2009). "Longitudinal data with follow-up truncated by death: match the analysis method to research aims". In: Statistical Science 24.2, p. 211.
- Laan MJ van der and Gruber S (2012). "Targeted minimum loss based estimation of causal effects of multiple time point interventions". In: The International Journal of Biostatistics 8.1.
- Lee K and Daniels MJ (2013). "Causal inference for bivariate longitudinal quality of life data in presence of death by using global odds ratios". In: Statistics in Medicine 32.24, pp. 4275–4284.
- Lee K, Daniels MJ, and Sargent DJ (2010). "Causal effects of treatments for informative missing data due to progression/death". In: Journal of the American Statistical Association 105.491, pp. 912–929.
- Lendle SD, Schwab J, Petersen ML, and van der Laan MJ (2017). "ltmle: An R Package Implementing Targeted Minimum Loss-Based Estimation for Longitudinal Data". In: Journal of Statistical Software 81.1, pp. 1–21.
- Li Q and Su L (2018). "Accommodating informative dropout and death: a joint modelling approach for longitudinal and semicompeting risks data". In: Journal of the Royal Statistical Society: Series C (Applied Statistics) 67.1, pp. 145–163.
- Linero AR and Daniels MJ (2018). "Bayesian approaches for missing not at random outcome data: The role of identifying restrictions". In: Statistical Science 33.2, pp. 198–213.
- Mousavi-Nasab S-M-H, Kormi-Nouri R, Sundström A, and Nilsson L-G (2012). "The effects of marital status on episodic and semantic memory in healthy middle-aged and old individuals". In: Scandinavian Journal of Psychology 53.1, pp. 1–8.
- Nilsson L-G, Bäckman L, Erngrund K, Nyberg L, Adolfsson R, Bucht G, et al. (1997). "The Betula prospective cohort study: Memory, health, and aging". In: Aging, Neuropsychology, and Cognition 4.1, pp. 1–32.
- Rabbitt P, Lunn M, and Wong D (2008). "Death, dropout, and longitudinal measurements of cognitive change in old age". In: The Journals of Gerontology Series B: Psychological Sciences and Social Sciences 63.5, P271–P278.
- Radd-Vagenas S, Duffy SL, Naismith SL, Brew BJ, Flood VM, and Fiatarone Singh MA (2018). "Effect of the Mediterranean diet on cognition and brain morphology and function: a systematic review of randomized controlled trials". In: The American Journal of Clinical Nutrition 107.3, pp. 389–404.
- Rizopoulos D (2012). Joint models for longitudinal and time-to-event data: With applications in R. Chapman and Hall/CRC.
- Robins J (1986). "A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect". In: Mathematical Modelling 7.9–12, pp. 1393–1512.
- Robins JM, Rotnitzky A, and Zhao LP (1995). "Analysis of semiparametric regression models for repeated outcomes in the presence of missing data". In: Journal of the American Statistical Association 90.429, pp. 106–121.
- Roy J, Lum KJ, Zeldow B, Dworkin JD, Re III VL, and Daniels MJ (2018). "Bayesian nonparametric generative models for causal inference with missing at random covariates". In: Biometrics 74.4, pp. 1193–1202.
- Shardell M and Ferrucci L (2018). "Joint mixed-effects models for causal inference with longitudinal data". In: Statistics in Medicine 37.5, pp. 829–846.
- Shardell M, Hicks GE, and Ferrucci L (2014). "Doubly robust estimation and causal inference in longitudinal studies with dropout and truncation by death". In: Biostatistics 16.1, pp. 155–168.
- Shardell M and Miller RR (2008). "Weighted estimating equations for longitudinal studies with death and non-monotone missing time-dependent covariates and outcomes". In: Statistics in Medicine 27.7, pp. 1008–1025.
- Snowden JM, Rose S, and Mortimer KM (2011). "Implementation of G-computation on a simulated data set: demonstration of a causal inference technique". In: American Journal of Epidemiology 173.7, pp. 731–738.
- Tan YV and Roy J (2019). "Bayesian additive regression trees and the General BART model". In: Statistics in Medicine, pp. 1–22. doi: 10.1002/sim.8347.
- Tchetgen Tchetgen EJ (2014). "Identification and estimation of survivor average causal effects". In: Statistics in Medicine 33.21, pp. 3601–3628.
- Wager S and Athey S (2018). "Estimation and inference of heterogeneous treatment effects using random forests". In: Journal of the American Statistical Association 113.523, pp. 1228–1242.
- Wang A, Nianogo RA, and Arah OA (2017). "G-computation of average treatment effects on the treated and the untreated". In: BMC Medical Research Methodology 17.1, p. 3.
- Wang C, Scharfstein DO, Colantuoni E, Girard TD, Yan Y, et al. (2017). "Inference in randomized trials with death and missingness". In: Biometrics 73.2, pp. 431–440.
- Wang L, Richardson TS, and Zhou X-H (2017). "Causal analysis of ordinal treatments and binary outcomes under truncation by death". In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79.3, pp. 719–735.
- Wen L and Seaman SR (2018). "Semi-parametric methods of handling missing data in mortal cohorts under non-ignorable missingness". In: Biometrics 74.4, pp. 1427–1437.
- Xu D, Daniels MJ, and Winterstein AG (2016). "Sequential BART for imputation of missing covariates". In: Biostatistics 17.3, pp. 589–602.
- Xu Y, Scharfstein D, Müller P, and Daniels M (2019). "A Bayesian Nonparametric Approach for Evaluating the Effect of Treatment in Randomized Trials with SemiCompeting Risks". In: arXiv preprint arXiv:1903.08509.
- Zhou T, Elliott MR, and Little RJA (2019). "Penalized Spline of Propensity Methods for Treatment Comparison". In: Journal of the American Statistical Association 114.525, pp. 1–19.