Author manuscript; available in PMC: 2022 May 1.
Published in final edited form as: Epidemiology. 2021 May 1;32(3):393–401. doi: 10.1097/EDE.0000000000001332

Table 2: Estimated risk differences for a single sample from the data generating mechanism

                      RD      SD(RD)   95% CL          CLD    Run-time (min) [a]

G-computation
  Main-effects        −0.14   0.016    −0.17, −0.11    0.06   0.9
  Machine learning    −0.09   0.015    −0.12, −0.06    0.06   82.3
IPW
  Main-effects        −0.13   0.039    −0.20, −0.05    0.15   0.0
  Machine learning    −0.11   0.028    −0.16, −0.05    0.11   0.3
AIPW
  Main-effects        −0.08   0.038    −0.16, −0.01    0.15   0.0
  Machine learning    −0.11   0.016    −0.14, −0.08    0.06   0.7
TMLE
  Main-effects        −0.12   0.029    −0.18, −0.06    0.11   0.0
  Machine learning    −0.12   0.016    −0.15, −0.09    0.06   0.7
DC-AIPW
  Main-effects        −0.09   0.039    −0.16, −0.01    0.15   1.3
  Machine learning    −0.11   0.023    −0.16, −0.07    0.09   128.1
DC-TMLE
  Main-effects        −0.12   0.029    −0.18, −0.07    0.11   1.3
  Machine learning    −0.11   0.021    −0.15, −0.07    0.08   129.9

RD: risk difference, SD(RD): standard deviation for the risk difference, 95% CL: 95% confidence limits, CLD: confidence limit difference defined as the upper confidence limit minus the lower confidence limit, IPW: inverse probability weighting, AIPW: augmented inverse probability weighting, TMLE: targeted maximum likelihood estimation, DC-AIPW: double cross-fit AIPW, DC-TMLE: double cross-fit TMLE.
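The relationship between the RD, SD(RD), 95% CL, and CLD columns can be illustrated with a minimal sketch. It assumes Wald-type limits (RD ± 1.96 × SD), which the table does not state for every estimator, and the function names `wald_limits` and `cld` are hypothetical. Because the tabulated values are rounded, a recomputation of this kind will only agree approximately.

```python
def wald_limits(rd, sd, z=1.96):
    """Return (lower, upper) Wald-type confidence limits: rd -/+ z * sd.

    Assumption: the interval is symmetric around the point estimate.
    """
    return rd - z * sd, rd + z * sd


def cld(lower, upper):
    """Confidence limit difference: upper limit minus lower limit."""
    return upper - lower


# G-computation, main-effects row: RD = -0.14, SD(RD) = 0.016
lower, upper = wald_limits(-0.14, 0.016)
print(round(lower, 2), round(upper, 2), round(cld(lower, upper), 2))
# prints: -0.17 -0.11 0.06
```

Under this assumption the CLD is simply 2 × 1.96 × SD(RD), so it carries the same information about precision as the standard deviation.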

Machine learning estimators used a super learner with 10-fold cross-validation. The candidate algorithms were the empirical mean, main-effects logistic regression without regularization, a generalized additive model with four splines and a ridge penalty of 0.6, a generalized additive model with four splines, a random forest with 500 trees and a minimum of 20 individuals per leaf, and a neural network with a single hidden layer consisting of four nodes.

Double cross-fit procedures included 100 different sample splits.
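One way results from many sample splits might be combined is sketched below. The table states only that 100 splits were used. The median-based summary here is an assumption borrowed from common practice for repeated cross-fitting, and the helper name `combine_splits` is hypothetical.

```python
from statistics import median


def combine_splits(estimates, variances):
    """Summarize per-split risk differences and variances by medians.

    Assumed rule (not stated in the table): the point estimate is the
    median of the per-split estimates, and the variance is the median of
    each split's variance plus its squared deviation from that median.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need equal-length, non-empty inputs")
    point = median(estimates)
    adjusted = [v + (e - point) ** 2 for e, v in zip(estimates, variances)]
    return point, median(adjusted)


# Toy inputs standing in for three splits (illustrative values only)
rd, var = combine_splits([-0.10, -0.12, -0.11], [0.0004, 0.0005, 0.0004])
print(rd, var ** 0.5)
```

The squared-deviation term makes the reported variance reflect variability across splits as well as within each split.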

[a] Run times are based on a server running on a single 2.5 GHz processor with 5 GB of memory allotted. Run times are indicated in minutes. G-computation run-times are large due to the use of a bootstrap procedure to calculate the variance for the risk difference. IPW used robust variance estimators. AIPW, TMLE, DC-AIPW, and DC-TMLE variances were calculated using influence curves.
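An influence-curve variance requires no resampling, which is why it is fast. The sketch below illustrates this for AIPW, assuming the standard AIPW form for a binary treatment. The function name `aipw_rd` and its arguments are hypothetical, and the nuisance predictions (`ps`, `m1`, `m0`) are taken as already estimated.

```python
from math import sqrt
from statistics import mean, variance


def aipw_rd(y, a, ps, m1, m0):
    """AIPW risk difference with an influence-curve-based SD.

    y: outcomes; a: binary treatment (0/1); ps: estimated P(A=1 | W);
    m1, m0: predicted outcome under treatment and under no treatment.
    """
    phi = []
    for yi, ai, pi, m1i, m0i in zip(y, a, ps, m1, m0):
        # Per-individual contributions to the two potential-outcome means
        y1 = ai * yi / pi - (ai - pi) / pi * m1i
        y0 = (1 - ai) * yi / (1 - pi) + (ai - pi) / (1 - pi) * m0i
        phi.append(y1 - y0)
    rd = mean(phi)
    # The influence curve is phi - rd, so its sample variance equals that
    # of phi; dividing by n gives the variance of the estimator.
    sd = sqrt(variance(phi) / len(phi))
    return rd, sd


# Toy data (illustrative values only)
rd, sd = aipw_rd(
    y=[1, 0, 1, 0], a=[1, 1, 0, 0],
    ps=[0.5] * 4, m1=[0.5] * 4, m0=[0.5] * 4,
)
print(rd, sd)
```

By contrast, a bootstrap variance refits the models on every resample, which is consistent with the longer G-computation run times noted above.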