. 2021 Feb 4;5:3. doi: 10.1186/s41512-021-00092-9

Table 2.

Illustration of methods in different categories using an example of statin intervention in primary prevention of CVD

Approach categories			Refs	Targeted estimand	Potential pitfalls/challenges	Exemplary methods/evaluations
Combining causal effects measured from external information	Two-stage approach		Candido dos Reis et al. [22]	Risk of CVD under the intervention of taking or not taking statins at baseline (and, under the trial protocol considered, being followed up for a given length of time during which the statin choice is maintained): $E(Y^{(A_0)} \mid X_0)$	Efficacy/effectiveness gap when translating trial results to routine care. Comparability of the trial and observational populations (selection bias).	Develop a CPM using individuals who take statins at baseline, with the coefficient for the treatment variable fixed to the statin effect estimated from trials.
	Two-stage approach		Brunner et al. [23]		Inflating the baseline cholesterol of individuals receiving statins by a fixed amount assumes that ‘statins had a moderate effect on lipid reduction and was initiated late during lifetime’, and that statins act only through cholesterol, i.e. it ignores any other causal pathways.	Inflate the baseline cholesterol of individuals receiving statins (e.g. by 30%). Develop a CPM using all individuals. Combine the predicted individual-level CVD risk with an effect equation estimated from trials to obtain the absolute risk under intervention.
	One-stage approach		Silva [24]	Risk of CVD under the intervention of taking statins at dosage $a_i$ ($i = 1, \ldots, d$) at baseline: $E(Y^{(A_0 = a_i)} \mid X_0)$.	Sample selection bias between the interventional and observational data.	Individual patient data from RCTs and observational clinical data are combined in a Bayesian framework to predict risk under intervention; MCMC is used to approximate the posterior distributions of the model parameters.
Estimating both a prediction model and causal effects from observational data	Single intervention	Related to average treatment effect estimation	Van Amsterdam et al. [25]	Risk of CVD under the intervention of taking/not taking statins at baseline, regardless of future treatment: $E(Y^{(A_0)} \mid X_0)$.	An over-simplified causal structure can lead to biased estimates of causal effects, e.g. when there is more than one unobserved collider whose information is contained in the prognostic factors.	Use a CNN to separate the unobserved collider information from the other risk factors, with a final layer resembling linear regression that includes the treatment variable as a covariate, to predict risk under intervention.
		Related to conditional treatment effect estimation	Alaa et al. [26]	Risk of CVD under the intervention of taking/not taking statins at baseline, regardless of future treatment: $E(Y^{(A_0)} \mid X_0)$.	Without careful examination of the causal structure among the variables, a biased association between treatment and outcome can be introduced.	Estimate the outcome curves for the treated and untreated samples simultaneously using the signal-in-white-noise model. The model is estimated through a single loss function, the precision in estimating heterogeneous effects (PEHE).
		Related to conditional treatment effect estimation	Arjas [27]	Risk of CVD under the intervention of taking/not taking statins at baseline, regardless of future treatment: $E(Y^{(A_0)} \mid \bar{H}_0)$.	Potentially biased estimates due to misspecification of the intensity functions required in the outcome hazard model.	Use treatment history and other risk factors measured over time to set up a Bayesian model that estimates the outcome risk intensity function over time. For prediction, given an individual’s measurements up to time $t$, estimate the risk under a single intervention by applying MCMC to the predictive distributions.
	Time-dependent treatments and treatment-confounder feedback	MSMs within a prediction model framework	Pajouheshnia et al. [8]	Risk of CVD under interventions of taking/not taking statins at baseline and/or at other future times: $E(Y^{(\bar{A}(0, K) = 0)} \mid X_0)$	The effectiveness of bias correction depends on correct specification of the treatment model.	Assume a causal structure. Estimate treatment censoring probabilities by fitting logistic regression models in each follow-up period and derive time-varying censoring weights. After censoring, develop the prognostic model using a weighted Cox model.
			Sperrin et al. [28]	Risk of CVD under interventions of taking/not taking statins at baseline and/or at other future times: $E(Y^{(\bar{A}(0, K) = 0)} \mid X_0)$	The effectiveness of bias correction depends on correct specification of the treatment model. Requires agreement between the prediction model and the set of variables required for conditional exchangeability.	Assume a causal structure. Collect the baseline prognostic factors, treatments and treatment confounders at each post-baseline time point. Compute IPTWs using a treatment model; with the derived IPTWs, build a logistic regression for outcome prediction under treatments.
			Lim et al. [29]	Risk of CVD and/or other outcomes of interest (e.g. cholesterol, SBP) under multiple interventions planned for the next $\tau$ time steps from the current time, given an observed history $\bar{H}_0$: $E(Y_{\tau}^{(\bar{A}(0, \tau - 1))} \mid \bar{H}_0)$.	Requires agreement between the prediction model and the set of variables required for conditional exchangeability.	With observed treatment, covariate and outcome histories (allowing multiple treatment options of different forms), develop a propensity network to compute the IPTWs and a sequence-to-sequence model that predicts the outcome under a planned sequence of interventions.
		Methods based on a balanced representation approach	Bica et al. [30]		Potential confounding, as the causal structure is not carefully examined.	Build a counterfactual recurrent network to predict outcomes under interventions: 1. For the encoder network, use an RNN with LSTM units to build treatment-invariant representations of the patient history, $\Phi(\bar{H}_t)$, and to predict one-step-ahead outcomes $Y_{t+1}$; 2. For the decoder network, use $\Phi(\bar{H}_t)$ to initialize the state of an RNN that predicts the counterfactual outcomes for future treatments.
		Methods with g-computation for correcting time-varying confounding	Xu et al. [31]	Cholesterol or another (univariate) continuous outcome of interest at any future time $t$, under a sequence of interventions planned irregularly from the current time until $t$, $\bar{A}_{0,<t}$, given the observed history: $E(Y_t^{(\bar{A}_{0,<t})} \mid \bar{H}_0)$.	Potential bias due to strong assumptions on model structure and possible model misspecification.	With observed treatment/covariate/outcome histories, estimate treatment-response trajectories using a Bayesian nonparametric or semi-parametric approach: 1. Specify models for the different components of the generalised mixed-effects model for outcome prediction. These usually include the treatment response, a baseline regression (fixed effects) and random effects. Where treatments are administered continuously, model the treatment response using linear time-invariant (LTI) dynamical systems (Soleimani et al.). 2. Choose priors for these models based on expert domain knowledge. 3. Use maximum a posteriori (MAP) estimation (Soleimani et al.) or MCMC approximation of the posterior distributions (Xu et al.) to estimate the parameters of the proposed model.
			Soleimani et al. [32]	$E(Y_t^{(\bar{A}_{0,<t})} \mid \bar{H}_0)$: same as in the Xu et al. method, except that the outcome $Y$ can be multivariate (e.g. simultaneously predicting risk of CVD, cholesterol and SBP) and treatments can be administered in both discrete and continuous time.
			Schulam et al. [33]	$E(Y_t^{(\bar{A}_{0,<t})} \mid \bar{Y}_0, \bar{A}_{<0})$: same as in the Soleimani et al. method, except that the observations include only intervention and outcome histories.	Potential bias due to strong assumptions on model structure and possible model misspecification. Lack of effect heterogeneity due to omitting baseline covariates.	With observed histories, jointly model interventions and outcomes using a marked point process (MPP): 1. Specify models for the components of the MPP intensity function: an event model, an outcome model and an action (intervention) model. The parameterization of the event and action models can be chosen to reflect domain knowledge; the outcome model is parameterized using a Gaussian process (GP). 2. Maximise the likelihood of the observed traces over a fixed interval to estimate the parameters.
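In the two-stage approach of Candido dos Reis et al. in Table 2, the treatment coefficient is fixed to a trial estimate rather than learned from observational data. The following is a minimal Python sketch of that idea, assuming a logistic clinical prediction model (CPM) with a single prognostic factor and a hypothetical trial log odds ratio of -0.3; the function names, toy data and effect value are illustrative and not taken from the cited study. The treatment term enters as an offset, so only the intercept and the prognostic-factor coefficient are fitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_with_fixed_treatment(xs, treated, ys, log_or_trial, lr=0.1, n_iter=5000):
    """Fit logit P(Y=1) = b0 + b1*x + log_or_trial*a by gradient ascent.

    The treatment coefficient is held fixed at the (hypothetical) trial
    estimate, i.e. used as an offset; only b0 and b1 are learned.
    """
    b0, b1 = 0.0, 0.0
    n = len(ys)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, a, y in zip(xs, treated, ys):
            resid = y - sigmoid(b0 + b1 * x + log_or_trial * a)
            g0 += resid
            g1 += resid * x
        # Step along the average log-likelihood gradient for the free parameters only.
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def risk_under(b0, b1, log_or_trial, x, a):
    """Predicted risk for prognostic factor x if treatment were set to a (0 or 1)."""
    return sigmoid(b0 + b1 * x + log_or_trial * a)


# Hypothetical toy data: one prognostic factor, statin indicator, CVD event.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
treated = [0, 1, 0, 1, 0, 1, 0, 1]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
LOG_OR_TRIAL = -0.3  # hypothetical trial-derived log odds ratio

b0, b1 = fit_with_fixed_treatment(xs, treated, ys, LOG_OR_TRIAL)
risk_statin = risk_under(b0, b1, LOG_OR_TRIAL, x=2.0, a=1)
risk_no_statin = risk_under(b0, b1, LOG_OR_TRIAL, x=2.0, a=0)
print(round(risk_no_statin, 3), round(risk_statin, 3))
```

Because the treatment term is fixed, the predicted odds ratio between the two interventions always equals the trial value, whatever the observational data suggest about statin users; the data only inform the baseline risk.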
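For the approach of Brunner et al. in Table 2, the two ingredients are an adjustment of the observed cholesterol of statin users and a trial-based effect equation applied to the predicted untreated risk. The sketch below is hypothetical on both counts: the inflation factor of 1.3 mirrors the 30% example in the table, and the effect equation is assumed to take a proportional-hazards form, $HR = \exp(\beta \Delta)$ for a cholesterol reduction $\Delta$, so that the treated risk is $1 - (1 - r)^{HR}$ for untreated risk $r$. The value of $\beta$, the reduction and the function names are illustrative, not published estimates.

```python
import math


def pre_treatment_cholesterol(chol, on_statin, factor=1.3):
    """Approximate untreated cholesterol for statin users (hypothetical 30% inflation)."""
    return chol * factor if on_statin else chol


def risk_under_statin(untreated_risk, chol_reduction, log_hr_per_unit):
    """Apply a hypothetical proportional-hazards effect equation to a CPM risk.

    HR = exp(log_hr_per_unit * chol_reduction); under proportional hazards
    the treated survival is S_untreated ** HR, so risk = 1 - (1 - r) ** HR.
    """
    hr = math.exp(log_hr_per_unit * chol_reduction)
    return 1.0 - (1.0 - untreated_risk) ** hr


# Observed cholesterol of 5.0 in a statin user is entered into the CPM as 6.5.
chol_for_cpm = pre_treatment_cholesterol(5.0, on_statin=True)
# Suppose the CPM (developed on all individuals) predicts a 20% untreated risk.
untreated_risk = 0.20
# Hypothetical effect equation parameters.
treated_risk = risk_under_statin(untreated_risk, chol_reduction=1.5, log_hr_per_unit=-0.25)
print(round(chol_for_cpm, 2), round(treated_risk, 3))
```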
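The PEHE criterion named in Table 2 for Alaa et al. compares predicted and true individual treatment effects. A minimal sketch, following the common definition as the mean squared error of the estimated effect $Y^{(1)} - Y^{(0)}$ (some papers report its square root instead, offered here through a flag). Because both potential outcomes must be known for every individual, this direct form can only be evaluated on simulated or semi-synthetic data; it is not a sketch of Alaa et al.'s estimation procedure, and the data below are hypothetical.

```python
import math


def pehe(y1_true, y0_true, y1_pred, y0_pred, root=False):
    """Precision in estimating heterogeneous effects (PEHE).

    Mean squared difference between the true individual effects
    Y(1) - Y(0) and the predicted effects; set root=True for the
    square-root version.
    """
    if not (len(y1_true) == len(y0_true) == len(y1_pred) == len(y0_pred)):
        raise ValueError("all inputs must have the same length")
    sq_errors = [
        ((t1 - t0) - (p1 - p0)) ** 2
        for t1, t0, p1, p0 in zip(y1_true, y0_true, y1_pred, y0_pred)
    ]
    mse = sum(sq_errors) / len(sq_errors)
    return math.sqrt(mse) if root else mse


# Hypothetical simulated potential outcomes for two individuals.
y1_true, y0_true = [1.0, 0.0], [0.0, 0.0]  # true effects: 1.0 and 0.0
y1_pred, y0_pred = [0.0, 0.0], [0.0, 0.0]  # model predicts no effect for anyone
print(pehe(y1_true, y0_true, y1_pred, y0_pred))  # 0.5
print(pehe(y1_true, y0_true, y1_pred, y0_pred, root=True))
```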
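For the MSM approach of Pajouheshnia et al. in Table 2, individuals are artificially censored once they deviate from the treatment regime of interest, and those still uncensored are up-weighted by the inverse of their cumulative probability of remaining on the regime. A minimal sketch for the regime "never start a statin", assuming the per-period probabilities $P(A_k = 0 \mid \text{history})$ have already been fitted (e.g. by one logistic regression per follow-up period, as described in the table); the function name and inputs are hypothetical, and the subsequent weighted Cox model is not shown.

```python
def never_treat_censoring_weights(statin_started, p_untreated):
    """Time-varying inverse probability of censoring weights.

    statin_started[k]: 1 if the individual starts a statin in period k, else 0.
    p_untreated[k]:    fitted P(A_k = 0 | history) for that period.
    Returns one weight per period; None marks periods from the first
    deviation onwards, when the individual is artificially censored.
    """
    weights = []
    cumulative = 1.0
    censored = False
    for started, p in zip(statin_started, p_untreated):
        if censored or started == 1:
            censored = True
            weights.append(None)
        else:
            # Inverse of the cumulative probability of staying untreated so far.
            cumulative /= p
            weights.append(cumulative)
    return weights


# Hypothetical individual who starts a statin in the third period.
w = never_treat_censoring_weights([0, 0, 1, 0], [0.8, 0.5, 0.9, 0.9])
print(w)  # [1.25, 2.5, None, None]
```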
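Sperrin et al. and Lim et al. in Table 2 instead weight each individual by the inverse probability of the treatment sequence actually received (IPTW). A minimal sketch of the commonly used stabilized form, assuming a binary treatment in each period and already-fitted probabilities, where the numerator model conditions on past treatment only and the denominator (treatment) model also on the confounder history $\bar{L}_k$ (notation assumed here): $SW = \prod_{k=0}^{K} P(A_k = a_k \mid \bar{A}_{k-1}) / P(A_k = a_k \mid \bar{A}_{k-1}, \bar{L}_k)$. Whether the cited papers stabilize their weights is not stated in the table, and the function name and inputs are hypothetical; the resulting weights would then enter a weighted outcome model, such as the logistic regression described for Sperrin et al.

```python
def stabilized_iptw(treatments, p_num, p_den):
    """Stabilized inverse probability of treatment weight for one individual.

    treatments[k]: observed binary treatment a_k in period k.
    p_num[k]:      fitted P(A_k = 1 | past treatment) (numerator model).
    p_den[k]:      fitted P(A_k = 1 | past treatment, confounder history)
                   (denominator, i.e. treatment, model).
    """
    weight = 1.0
    for a, pn, pd in zip(treatments, p_num, p_den):
        # Probability of the treatment actually received under each model.
        num = pn if a == 1 else 1.0 - pn
        den = pd if a == 1 else 1.0 - pd
        weight *= num / den
    return weight


# Hypothetical two-period example: untreated in period 0, starts a statin in period 1.
sw = stabilized_iptw([0, 1], p_num=[0.5, 0.6], p_den=[0.25, 0.8])
print(round(sw, 3))  # 0.5
```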