Table 2: Estimated risk differences for a single sample from the data-generating mechanism
| Estimator | RD | SD(RD) | 95% CL | CLD | Run time (min)ᵃ |
|---|---|---|---|---|---|
| G-computation | | | | | |
| Main-effects | −0.14 | 0.016 | −0.17, −0.11 | 0.06 | 0.9 |
| Machine learning | −0.09 | 0.015 | −0.12, −0.06 | 0.06 | 82.3 |
| IPW | | | | | |
| Main-effects | −0.13 | 0.039 | −0.20, −0.05 | 0.15 | 0.0 |
| Machine learning | −0.11 | 0.028 | −0.16, −0.05 | 0.11 | 0.3 |
| AIPW | | | | | |
| Main-effects | −0.08 | 0.038 | −0.16, −0.01 | 0.15 | 0.0 |
| Machine learning | −0.11 | 0.016 | −0.14, −0.08 | 0.06 | 0.7 |
| TMLE | | | | | |
| Main-effects | −0.12 | 0.029 | −0.18, −0.06 | 0.11 | 0.0 |
| Machine learning | −0.12 | 0.016 | −0.15, −0.09 | 0.06 | 0.7 |
| DC-AIPW | | | | | |
| Main-effects | −0.09 | 0.039 | −0.16, −0.01 | 0.15 | 1.3 |
| Machine learning | −0.11 | 0.023 | −0.16, −0.07 | 0.09 | 128.1 |
| DC-TMLE | | | | | |
| Main-effects | −0.12 | 0.029 | −0.18, −0.07 | 0.11 | 1.3 |
| Machine learning | −0.11 | 0.021 | −0.15, −0.07 | 0.08 | 129.9 |
RD: risk difference; SD(RD): standard deviation of the risk difference; 95% CL: 95% confidence limits; CLD: confidence limit difference, defined as the upper confidence limit minus the lower confidence limit; IPW: inverse probability weighting; AIPW: augmented inverse probability weighting; TMLE: targeted maximum likelihood estimation; DC-AIPW: double cross-fit AIPW; DC-TMLE: double cross-fit TMLE.
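If the limits are Wald-type, RD ± 1.96 × SD(RD) (an assumption here; the G-computation limits rest on a bootstrap and may have been formed differently), the CLD reduces to about 3.92 × SD(RD), so it ranks the estimators exactly as their standard deviations do. A minimal sketch with hypothetical helper names:

```python
Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def wald_limits(rd, sd, z=Z_975):
    """Return (lower, upper) Wald-type 95% confidence limits for a risk difference."""
    return rd - z * sd, rd + z * sd


def cld(rd, sd):
    """Confidence limit difference: upper limit minus lower limit (equals 2 * z * sd)."""
    lower, upper = wald_limits(rd, sd)
    return upper - lower


# Main-effects G-computation row: RD = -0.14, SD(RD) = 0.016
lower, upper = wald_limits(-0.14, 0.016)
print(f"95% CL: {lower:.2f}, {upper:.2f}; CLD: {cld(-0.14, 0.016):.2f}")
# prints: 95% CL: -0.17, -0.11; CLD: 0.06
```

Because the CLD does not depend on the point estimate, it isolates precision from bias when comparing rows.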
Machine-learning estimators used super learner with 10-fold cross-validation. The candidate algorithms were the empirical mean, main-effects logistic regression without regularization, a generalized additive model with four splines and a ridge penalty of 0.6, a generalized additive model with four splines, a random forest with 500 trees and a minimum of 20 individuals per leaf, and a neural network with a single hidden layer of four nodes.
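The core idea of super learner is to obtain cross-validated predictions from every candidate algorithm and then choose ensemble weights that minimize cross-validated risk. The sketch below is a toy version under stated assumptions: it uses two hypothetical stand-in learners (the empirical mean and a stratified mean over one binary covariate) rather than the six algorithms listed above, and it picks convex weights by grid search on cross-validated log-loss, which is only one of several possible metalearners.

```python
import math
import random


def mean_learner(train):
    """Empirical mean: predict the overall outcome proportion for everyone."""
    p = sum(y for _, y in train) / len(train)
    return lambda x: p


def stratified_learner(train):
    """Hypothetical stand-in learner: outcome proportion within each level of a binary x."""
    overall = sum(y for _, y in train) / len(train)
    props = {}
    for level in (0, 1):
        ys = [y for x, y in train if x == level]
        props[level] = sum(ys) / len(ys) if ys else overall  # fall back if a level is empty
    return lambda x: props[x]


def log_loss(y, p, eps=1e-6):
    """Negative Bernoulli log-likelihood, with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def super_learner(data, learners, k=10, grid=21, seed=0):
    """Choose convex weights for exactly two learners by k-fold cross-validated log-loss."""
    n = len(data)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    cv_pred = [[0.0] * n for _ in learners]
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in range(n) if i not in held_out]
        for j, learn in enumerate(learners):
            fit = learn(train)  # each learner is fit without the held-out fold
            for i in fold:
                cv_pred[j][i] = fit(data[i][0])
    best_w, best_risk = 0.0, float("inf")
    for step in range(grid):
        w = step / (grid - 1)  # weight on learners[0]; learners[1] gets 1 - w
        risk = sum(log_loss(y, w * cv_pred[0][i] + (1 - w) * cv_pred[1][i])
                   for i, (_, y) in enumerate(data)) / n
        if risk < best_risk:
            best_w, best_risk = w, risk
    return (best_w, 1 - best_w)


def fit_ensemble(data, learners, weights):
    """Refit every learner on all data and combine predictions with the chosen weights."""
    fits = [learn(data) for learn in learners]
    return lambda x: sum(w * f(x) for w, f in zip(weights, fits))


# Simulated (hypothetical) data: a binary x strongly predicts a binary y
rng = random.Random(42)
data = []
for _ in range(1000):
    x = rng.randint(0, 1)
    data.append((x, int(rng.random() < (0.8 if x == 1 else 0.2))))

learners = [mean_learner, stratified_learner]
weights = super_learner(data, learners)
predict = fit_ensemble(data, learners, weights)
print(weights, predict(0), predict(1))
```

With a strong covariate signal, nearly all weight should go to the stratified learner; with no signal, the empirical mean would earn more weight. A full implementation would accept any number of learners and optimize the weights directly.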
Double cross-fit procedures included 100 different sample splits.
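The note does not state how the 100 split-specific results were combined. One approach for repeated cross-fitting, the median method described by Chernozhukov and colleagues for double/debiased machine learning, is assumed in the sketch below rather than taken from the table: report the median estimate, and take as variance the median of each split's variance plus its squared deviation from that median, so that split-to-split variability widens the interval. The function name and example values are hypothetical.

```python
import statistics


def aggregate_splits(estimates, variances):
    """Combine per-split estimates from repeated sample splits (median method).

    Point estimate: median of the split-specific estimates.
    Variance: median of (split variance + squared deviation from the median estimate).
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate")
    rd = statistics.median(estimates)
    var = statistics.median([v + (e - rd) ** 2 for e, v in zip(estimates, variances)])
    return rd, var


estimates = [-0.10, -0.12, -0.11]    # hypothetical split-specific risk differences
variances = [0.0004, 0.0005, 0.0004]  # hypothetical split-specific variances
rd, var = aggregate_splits(estimates, variances)
print(rd, var ** 0.5)
```

Running each full cross-fit procedure once per split is also why the DC rows take roughly 100 times longer than their single-fit counterparts.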
ᵃ Run times are in minutes, measured on a server using a single 2.5 GHz processor with 5 GB of memory allotted. G-computation run times are relatively long because a bootstrap procedure was used to estimate the variance of the risk difference. IPW used robust variance estimators. AIPW, TMLE, DC-AIPW, and DC-TMLE variances were calculated using influence curves.
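Influence-curve variances are cheap because they need only one pass over the fitted values, with no resampling. The sketch below shows this for AIPW, assuming the propensity scores and outcome predictions have already been estimated (in the cross-fit versions, each observation's predictions would come from models fit in other splits). The function name and toy inputs are hypothetical.

```python
import math
import statistics


def aipw_rd(y, a, pi, mu1, mu0):
    """AIPW risk difference with an influence-curve standard error.

    y, a: observed binary outcome and treatment; pi: estimated P(A=1 | W);
    mu1, mu0: predicted risk under treatment and under no treatment.
    """
    n = len(y)
    # per-observation augmented contrasts (pseudo-outcomes)
    phi = [
        (ai * (yi - m1) / p + m1) - ((1 - ai) * (yi - m0) / (1 - p) + m0)
        for yi, ai, p, m1, m0 in zip(y, a, pi, mu1, mu0)
    ]
    rd = statistics.fmean(phi)
    # influence curve is phi - rd; variance of the mean is sum(IC^2) / n^2
    var = sum((v - rd) ** 2 for v in phi) / n ** 2
    return rd, math.sqrt(var)


rd, se = aipw_rd(y=[1, 0, 1, 0], a=[1, 1, 0, 0], pi=[0.5] * 4, mu1=[0.5] * 4, mu0=[0.5] * 4)
print(rd, se)  # prints: 0.0 0.5
```

A bootstrap, by contrast, refits every nuisance model on each resample, which is consistent with the longer G-computation run times in the table.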