Author manuscript; available in PMC: 2024 Jan 4.
Published in final edited form as: J Clin Epidemiol. 2023 Jun 19;160:100–109. doi: 10.1016/j.jclinepi.2023.06.011

Table 2. Data generating mechanism (DGM) and missingness mechanism used in the simulation study in scenarios 1–4.

Scenario | DGM for Y | DGM for X | DGM for C | Partially observed variables (a) | Missingness mechanism (b)
1
  • Y is continuous and depends on X, X² and C:

  • Y = -0.4 + 0.4 X + 0.8 C + ϕ X² + εY

  • where ϕ = 0.1, 0.6 or 1.0

  • X is continuous and depends on C:

  • X = C + εX

C is binary, with probability 0.5 of a value of 0 or 1

Partially observed variable: C or Y
  • Missingness depends on X: logit{P(RΔ = 1)} = τ(α + X)

2
  • Y is continuous and depends on X, C and C²:

  • Y = -0.4 + 0.4 X + 0.8 C + 0.6 C² + εY

  • X is continuous and depends on C:

  • X = C + εX

C is normally distributed, with mean 0.5, variance 1

Partially observed variable: X or Y
  • Missingness depends on C: logit{P(RΔ = 1)} = τ(α + C)

3
  • Y is continuous and depends on X and C:

  • Y = -0.4 + 0.4 X + 0.8 C + εY

  • X is continuous and depends on C²:

  • X = C² + εX

C is normally distributed, with mean 0.5, variance 1

Partially observed variable: X or Y
  • Missingness depends on C: logit{P(RΔ = 1)} = τ(α + C)

4
  • Y is binary and depends on X, X² and C:

  • logit{P(Y = 1)} = -0.4 + 0.4 X + 0.8 C + 0.5 X²

  • X is continuous and depends on C:

  • X = C + εX

C is binary, with probability 0.5 of a value of 0 or 1

Partially observed variable: C or Y
  • Missingness depends on X: logit{P(RΔ = 1)} = τ(α + X)

Abbreviation: logit, log-odds function (the inverse of the logistic function).

In each scenario, we assumed the error terms εY and εX were uncorrelated, with standard normal distributions (mean 0, variance 1).

For each missingness mechanism and each strength of missingness association (τ = 0.1, 1, 3 or 5), α was chosen empirically to give approximately 70% observed values. In scenario 1 with ϕ = 1.0, values of α giving approximately 50% and 90% observed values were also used.
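As an illustration of what an empirical choice of α could look like, the following minimal Python sketch uses bisection to find the α that yields a target proportion of observed values for a given τ. It assumes the missingness model is read as logit{P(RΔ = 1)} = τ(α + X), as in scenarios 1 and 4. The function names, sample size, seed and search interval are illustrative assumptions rather than details taken from the study, whose actual calibration procedure is not described in this table.

```python
import math
import random


def expit(z):
    """Numerically stable inverse of the logit function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def observed_fraction(x, tau, alpha):
    """Average probability of being observed under logit P(R = 1) = tau * (alpha + x)."""
    return sum(expit(tau * (alpha + xi)) for xi in x) / len(x)


def calibrate_alpha(x, tau, target=0.7, lo=-50.0, hi=50.0, tol=1e-6):
    """Bisection on alpha; valid for tau > 0, where the fraction increases with alpha."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if observed_fraction(x, tau, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical large sample of X drawn as in scenario 1: X = C + eps_X, C ~ Bernoulli(0.5).
rng = random.Random(2023)
c = [rng.randint(0, 1) for _ in range(5000)]
x = [ci + rng.gauss(0.0, 1.0) for ci in c]

# One alpha per strength of missingness association, each targeting about 70% observed.
alphas = {tau: calibrate_alpha(x, tau) for tau in (0.1, 1, 3, 5)}
```

Changing `target` to 0.5 or 0.9 would give the additional proportions of observed values mentioned for scenario 1 with ϕ = 1.0. For scenarios 2 and 3, the same routine could be applied to draws of C instead of X.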

(a) Two separate situations were considered in each scenario: (i) the partially observed variable was directly involved in the mis-specified relationship and (ii) the partially observed variable was not directly involved. Values were set to missing for one variable only in each situation.

(b) P(RΔ = 1) denotes the probability that a value (of the partially observed variable) is observed, with Δ = X, C or Y.
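To make the scenario 1 row concrete, the sketch below draws (C, X, Y) from the stated data generating mechanism and then sets one variable to missing according to a missingness model that depends on X. It is a minimal illustration under stated assumptions, not the authors' simulation code: the function names, sample size and seed are hypothetical, the missingness model is read as logit{P(RΔ = 1)} = τ(α + X), and the value α = 0.5 is arbitrary rather than one of the calibrated values used in the study.

```python
import math
import random


def expit(z):
    """Numerically stable inverse of the logit function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_scenario1(n, phi, rng):
    """Draw n records of (C, X, Y) following the scenario 1 row of the table."""
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0           # C ~ Bernoulli(0.5)
        x = c + rng.gauss(0.0, 1.0)                  # X = C + eps_X
        y = (-0.4 + 0.4 * x + 0.8 * c + phi * x ** 2
             + rng.gauss(0.0, 1.0))                  # Y includes the X^2 term
        rows.append({"C": c, "X": x, "Y": y})
    return rows


def set_missing(rows, target, tau, alpha, rng):
    """Return copies of rows with `target` set to None where R = 0.

    P(R = 1) = expit(tau * (alpha + X)), so missingness depends on X only.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        p_observed = expit(tau * (alpha + row["X"]))
        if rng.random() >= p_observed:
            new_row[target] = None
        out.append(new_row)
    return out


rng = random.Random(1)
data = simulate_scenario1(2000, phi=0.6, rng=rng)

# Situation (i): Y, which is involved in the mis-specified relationship, is incomplete.
incomplete_y = set_missing(data, "Y", tau=1.0, alpha=0.5, rng=rng)
# Situation (ii): C, which is not directly involved, is incomplete instead.
incomplete_c = set_missing(data, "C", tau=1.0, alpha=0.5, rng=rng)
```

Only one variable is made incomplete in each derived data set, matching note (a). The other scenarios would follow the same pattern with their own equations for Y, X and C.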