2015 Jul 25;44(5):1731–1737. doi: 10.1093/ije/dyv135

Table 1.

Proposed algorithms for imputation of potential outcomes using multiple imputation and the g-formula

	Multiple imputation	The parametric g-formula
1a	For each of N observations, defined by {Y=y, X=x, Z=z}, create two additional variables Y⁰ and Y¹, which are initially set to missing. The observation is now defined by {Y=y, X=x, Z=z, Y⁰=., Y¹=.}
1b	Assuming counterfactual consistency, set Y^x = Y. That is, if X = 0, then set Y⁰ = Y; if X = 1, then set Y¹ = Y
2		Fit a model (e.g. logistic regression) for the association of X and Z with Y, resulting in estimated model parameters β
3a	Perform m imputations (e.g. with SAS procedure MI) for the missing values of Y¹ based on observed values of Z. This will result in m complete datasets	For each observation, use the model (fit in step 2) to predict the expected value of Y¹ setting X = 1, and of Y⁰ setting X = 0
3b	Impute the missing values of Y⁰ based on observed values of Z in each of the m datasets. This keeps the total number of datasets at m	Note that if Y is dichotomous, the expected value is the same as the probability P(Y^x = 1)
4a	Within each of m complete datasets, do:
4b	Calculate the means of Y⁰ and Y¹ and take the difference (ratio) of the two means to estimate the causal risk difference (ratio)
4c	… and then combine across imputations by taking a simple mean (on the log scale for a risk ratio)
5	Bootstrap the process b times to obtain standard errors
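
The two columns of Table 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The data are simulated toy data with a binary Z, X and Y. The function names (`simulate`, `fit_saturated`, `g_formula`, `mi_estimate`, `bootstrap_se`) are hypothetical. Stratum proportions stand in for the logistic regression of step 2, which gives the same fitted probabilities as a saturated logistic model when X and Z are both binary. The imputation step draws Y⁰ and Y¹ from those fitted probabilities and is a simplification of SAS procedure MI: a full multiple-imputation procedure would typically also account for uncertainty in the imputation model's parameters.

```python
import random
from statistics import mean, stdev


def simulate(n, rng):
    """Hypothetical toy data: rows of (y, x, z), all binary; true risk difference is 0.2."""
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        x = int(rng.random() < (0.7 if z else 0.3))   # Z confounds X and Y
        y = int(rng.random() < 0.1 + 0.2 * x + 0.3 * z)
        rows.append((y, x, z))
    return rows


def fit_saturated(rows):
    """Step 2 stand-in: estimate P(Y=1 | X=x, Z=z) by stratum proportions.

    statistics.mean raises StatisticsError if a stratum is empty.
    """
    return {
        (x, z): mean([y for (y, xi, zi) in rows if xi == x and zi == z])
        for x in (0, 1) for z in (0, 1)
    }


def g_formula(rows):
    """g-formula column: predict E[Y^1] and E[Y^0] for every row, then average."""
    p = fit_saturated(rows)
    risk1 = mean([p[(1, z)] for (_, _, z) in rows])   # everyone set to X = 1
    risk0 = mean([p[(0, z)] for (_, _, z) in rows])   # everyone set to X = 0
    return risk1 - risk0, risk1 / risk0


def mi_estimate(rows, m, rng):
    """Multiple imputation column (simplified): returns the pooled risk difference."""
    p = fit_saturated(rows)
    diffs = []
    for _ in range(m):                                # steps 3a/3b: m complete datasets
        y0s, y1s = [], []
        for (y, x, z) in rows:
            # Step 1b: consistency fills in the observed potential outcome.
            # The other potential outcome is imputed from its fitted probability.
            y1s.append(y if x == 1 else int(rng.random() < p[(1, z)]))
            y0s.append(y if x == 0 else int(rng.random() < p[(0, z)]))
        diffs.append(mean(y1s) - mean(y0s))           # step 4b
    return mean(diffs)                                # step 4c


def bootstrap_se(rows, estimator, b, rng):
    """Step 5: resample rows with replacement b times; SE is the SD of the estimates."""
    n = len(rows)
    estimates = []
    for _ in range(b):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(sample))
    return stdev(estimates)


rng = random.Random(2015)
data = simulate(2000, rng)
rd_g, rr_g = g_formula(data)
rd_mi = mi_estimate(data, m=20, rng=rng)
se_g = bootstrap_se(data, lambda d: g_formula(d)[0], b=200, rng=rng)
print(f"g-formula RD={rd_g:.3f} RR={rr_g:.3f} bootstrap SE(RD)={se_g:.3f}")
print(f"multiple imputation RD={rd_mi:.3f}")
```

Because the observed outcomes within each (X, Z) stratum average to the fitted probability for that stratum, the two risk differences should agree up to the random noise of the imputation draws. That noise shrinks as m grows.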