Table 2. Data-generating mechanism (DGM) and missingness mechanism used in the simulation study in scenarios 1–4.
Scenario | DGM for Y | DGM for X | DGM for C | Partially observed variables^a | Missingness mechanism^b |
---|---|---|---|---|---|
1 | | | C is binary, with probability 0.5 of a value of 0 or 1 | C or Y | |
2 | | | C is normally distributed, with mean 0.5, variance 1 | X or Y | |
3 | | | C is normally distributed, with mean 0.5, variance 1 | X or Y | |
4 | | | C is binary, with probability 0.5 of a value of 0 or 1 | C or Y | |
Abbreviation: logit, logistic function.
In each scenario, we assumed the error terms ε_Y and ε_X were uncorrelated, with standard normal distributions (mean 0, variance 1).
For each missingness mechanism, α was chosen empirically to give approximately 70% observed values (and additionally 50% and 90% observed values in scenario 1 when ϕ = 1.0) for each strength of missingness association, τ = 0.1, 1, 3 or 5.
^a Two separate situations were considered in each scenario: (i) the partially observed variable was directly involved in the mis-specified relationship and (ii) the partially observed variable was not directly involved. Values were set to missing for one variable only in each situation.
^b P(R_Δ = 1) denotes the probability that a value (of the partially observed variable) is observed, with Δ = X, C or Y.
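
The empirical choice of α described in the footnotes can be illustrated with a minimal sketch. The exact missingness models are not reproduced in this copy of the table, so the form logit P(R = 1) = α + τZ used below is an assumption, as are the stand-in predictor Z, the sample size, the seed, and all function names. The sketch finds, for each τ, the α that gives an average observation probability of 0.7 in a simulated sample.

```python
import math
import random


def expit(v):
    """Inverse logit, written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def observed_fraction(alpha, tau, z):
    """Average P(R = 1) over the sample under the ASSUMED model
    logit P(R = 1) = alpha + tau * Z (not taken from the table)."""
    return sum(expit(alpha + tau * zi) for zi in z) / len(z)


def calibrate_alpha(tau, z, target=0.7, lo=-50.0, hi=50.0, tol=1e-8):
    """Bisection on alpha; the observed fraction is increasing in alpha."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if observed_fraction(mid, tau, z) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
# Hypothetical standard normal predictor of missingness.
z = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# One alpha per strength of missingness association tau.
alphas = {tau: calibrate_alpha(tau, z) for tau in (0.1, 1, 3, 5)}
for tau, alpha in alphas.items():
    frac = observed_fraction(alpha, tau, z)
    print(f"tau={tau}: alpha={alpha:.3f}, expected observed fraction={frac:.3f}")
```

Passing `target=0.5` or `target=0.9` to `calibrate_alpha` would give the additional observed proportions mentioned for scenario 1. Bisection is used here only because it is simple and the observed fraction is monotone in α; the source does not state how α was actually found.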