1a |
For each of N observations, defined by {Y=y, X=x, Z=z}, create two additional variables, Y0 and Y1, which are initially set to missing. |
The observation is now defined by {Y=y, X=x, Z=z, Y0=., Y1=.} |
1b |
Assuming counterfactual consistency, set Yx = Y. That is, if X = 0, then set Y0 = Y; if X = 1, then set Y1 = Y. |
2 |
|
Fit a model (e.g. logistic regression) of Y on X and Z, resulting in estimated model parameters β |
3a |
Perform m imputations (e.g. with the SAS procedure MI) for the missing values of Y1 based on the observed values of Z. |
For each observation, use the model (fit in step 2) to predict the expected value of Y1 setting X = 1, and of Y0 setting X = 0. |
This results in m complete datasets. |
3b |
Impute the missing values of Y0 based on the observed values of Z in each of the m datasets. This keeps the total number of datasets at m. |
Note that if Y is dichotomous, the expected value of Yx is the same as the probability P(Yx=1). |
4a |
Within each of the m complete datasets, do: |
4b |
Calculate the mean of Y0 and Y1 and take the difference (ratio) of the two means for the estimated causal risk difference (ratio) |
4c |
… and then combine across imputations by taking a simple mean of the m estimates (on the log scale for a risk ratio). |
|
5 |
Bootstrap the entire process (steps 1 to 4) b times to obtain standard errors.
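
The steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it replaces the logistic regression in step 2 with a saturated model (the mean of Y within each X, Z cell), assumes Y, X and Z are all binary, imputes each missing potential outcome as a Bernoulli draw from the predicted probability, and takes the bootstrap standard error to be the standard deviation of the b bootstrap estimates. The function names (fit_stratum_means, point_estimate, bootstrap_se, simulate) and the simulated data are hypothetical.

```python
import math
import random
import statistics


def fit_stratum_means(rows):
    """Step 2 (assumed saturated model): estimate P(Y=1 | X=x, Z=z) by cell means."""
    tally = {}
    for y, x, z in rows:
        n, s = tally.get((x, z), (0, 0))
        tally[(x, z)] = (n + 1, s + y)
    return {cell: s / n for cell, (n, s) in tally.items()}


def point_estimate(rows, m, rng):
    """Steps 1 to 4: return (risk difference, risk ratio) combined over m imputations."""
    model = fit_stratum_means(rows)
    # Assumption: if a resample leaves an (x, z) cell empty, use the overall mean of Y.
    overall = sum(y for y, _, _ in rows) / len(rows)
    rds, log_rrs = [], []
    for _ in range(m):
        y0_sum = y1_sum = 0
        for y, x, z in rows:
            if x == 1:
                y1 = y  # step 1b: consistency, Y1 = Y when X = 1
                y0 = int(rng.random() < model.get((0, z), overall))  # step 3: impute Y0
            else:
                y0 = y  # step 1b: consistency, Y0 = Y when X = 0
                y1 = int(rng.random() < model.get((1, z), overall))  # step 3: impute Y1
            y0_sum += y0
            y1_sum += y1
        mean0, mean1 = y0_sum / len(rows), y1_sum / len(rows)
        rds.append(mean1 - mean0)  # step 4b: risk difference in this dataset
        # Step 4b: risk ratio, kept on the log scale (requires both means > 0).
        log_rrs.append(math.log(mean1 / mean0))
    # Step 4c: simple mean across imputations, on the log scale for the ratio.
    return statistics.mean(rds), math.exp(statistics.mean(log_rrs))


def bootstrap_se(rows, m, b, rng):
    """Step 5: standard error of the risk difference from b bootstrap resamples."""
    estimates = []
    for _ in range(b):
        resample = rng.choices(rows, k=len(rows))  # sample N rows with replacement
        estimates.append(point_estimate(resample, m, rng)[0])
    return statistics.stdev(estimates)


def simulate(n, rng):
    """Hypothetical data: Z confounds X and Y; the true risk difference is 0.2."""
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        x = int(rng.random() < 0.3 + 0.4 * z)
        y = int(rng.random() < 0.2 + 0.2 * x + 0.3 * z)
        rows.append((y, x, z))
    return rows


rng = random.Random(0)
data = simulate(2000, rng)
rd, rr = point_estimate(data, m=5, rng=rng)
se_rd = bootstrap_se(data, m=5, b=50, rng=rng)
print(f"risk difference {rd:.3f} (bootstrap SE {se_rd:.3f}), risk ratio {rr:.3f}")
```

In the simulated data the true causal risk difference is 0.2 by construction, so the printed estimate can be compared against that value. Swapping in a logistic regression for step 2 would only change fit_stratum_means and the two model lookups in point_estimate.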