2015 Jul 25;44(5):1731–1737. doi: 10.1093/ije/dyv135

Table 1.

Proposed algorithms for imputation of potential outcomes using multiple imputation and the g-formula

Multiple imputation The parametric g-formula
1a For each of N observations, defined by {Y=y, X=x, Z=z}, create two additional variables Y0 and Y1, which are initially set to missing. The observation is now defined by {Y=y, X=x, Z=z, Y0=., Y1=.}
1b Assuming counterfactual consistency, set Yx = Y. That is, if X = 0, then set Y0 = Y; if X = 1, then set Y1 = Y
2 Fit a model (e.g. logistic regression) for Y as a function of X and Z, resulting in estimated model parameters β
3a Multiple imputation: Perform m imputations (e.g. in SAS procedure MI) for the missing values of Y1 based on observed values of Z. This will result in m complete datasets
3a Parametric g-formula: For each observation, use the model (fit in step 2) to predict the expected value of Y1 setting X = 1, and of Y0 setting X = 0
3b Multiple imputation: Impute the missing values of Y0 based on observed values of Z in each of the m datasets. This keeps the total number of datasets at m
3b Parametric g-formula: Note that if Y is dichotomous, the expected value is the same as the probability P(Yx=1)
4a Within each of the m complete datasets, do:
4b Calculate the mean of Y0 and Y1 and take the difference (ratio) of the two means for the estimated causal risk difference (ratio)
4c … and then combine across imputations by taking a simple mean (on the log scale for a risk ratio)
5 Bootstrap the process b times to obtain standard errors
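
The steps in Table 1 can be sketched in code for a dichotomous outcome Y, a dichotomous exposure X and a single covariate Z. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses NumPy rather than SAS procedure MI, the imputation models are logistic regressions of Y on Z fit separately within each exposure group, each imputation draws the coefficients from an approximate normal posterior, and all function names (`fit_logistic`, `g_formula`, `multiple_imputation`, `bootstrap_se`, `simulate`) and the simulated data are hypothetical.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def design(*cols):
    """Design matrix: an intercept column followed by the given covariates."""
    return np.column_stack([np.ones(len(cols[0])), *cols])

def fit_logistic(d, y, iters=25):
    """Logistic regression by Newton-Raphson; returns (beta, approximate covariance)."""
    beta = np.zeros(d.shape[1])
    for _ in range(iters):
        p = expit(d @ beta)
        hessian = d.T @ (d * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, d.T @ (y - p))
    return beta, np.linalg.inv(hessian)

def g_formula(x, z, y):
    """Parametric g-formula: returns (risk difference, risk ratio)."""
    beta, _ = fit_logistic(design(x, z), y)                    # fit the outcome model
    risk1 = expit(design(np.ones_like(z), z) @ beta).mean()    # predict setting X = 1
    risk0 = expit(design(np.zeros_like(z), z) @ beta).mean()   # predict setting X = 0
    return risk1 - risk0, risk1 / risk0

def multiple_imputation(x, z, y, m, rng):
    """Impute Y0 and Y1 m times; returns (risk difference, risk ratio)."""
    # Assumed imputation model: Y on Z, fit where the potential outcome is observed.
    fits = {a: fit_logistic(design(z[x == a]), y[x == a]) for a in (0, 1)}
    rds, log_rrs = [], []
    for _ in range(m):
        means = {}
        for a, (beta_hat, cov) in fits.items():
            beta = rng.multivariate_normal(beta_hat, cov)      # parameter draw
            imputed = rng.binomial(1, expit(design(z) @ beta))
            # Consistency: keep the observed Y where X = a, impute elsewhere.
            means[a] = np.where(x == a, y, imputed).mean()
        rds.append(means[1] - means[0])
        log_rrs.append(np.log(means[1] / means[0]))
    # Combine across imputations: simple mean (log scale for the ratio).
    return np.mean(rds), np.exp(np.mean(log_rrs))

def bootstrap_se(estimator, x, z, y, b, rng):
    """Standard error of the risk difference from b bootstrap resamples."""
    n = len(y)
    draws = []
    for _ in range(b):
        idx = rng.integers(0, n, size=n)
        draws.append(estimator(x[idx], z[idx], y[idx])[0])
    return np.std(draws, ddof=1)

def simulate(n, rng):
    """Hypothetical data: Z confounds the effect of X on Y."""
    z = rng.binomial(1, 0.5, n).astype(float)
    x = rng.binomial(1, expit(-0.5 + z)).astype(float)
    y = rng.binomial(1, expit(-1.0 + x + 0.5 * z)).astype(float)
    return x, z, y
```

For example, after `x, z, y = simulate(5000, np.random.default_rng(0))`, calling `g_formula(x, z, y)` and `multiple_imputation(x, z, y, m=20, rng=np.random.default_rng(1))` each return a risk difference and a risk ratio, and `bootstrap_se(g_formula, x, z, y, b=200, rng=np.random.default_rng(2))` gives a bootstrap standard error for the g-formula risk difference. To bootstrap the imputation estimator, pass a wrapper such as `lambda x, z, y: multiple_imputation(x, z, y, 20, rng)`.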