Am J Epidemiol. 2021 Jul 15;190(12):2690–2699. doi: 10.1093/aje/kwab207

Table 2.

Performance of the AIPW Software Package in Estimating the Average Treatment Effect (Risk Difference) in a Simulated Observational Study Based on the EAGeR Trialᵃ

| Method and Software Package | Bias (SE) | MSE | Mean 95% CI Width | 95% CI Coverage (SE), %ᵇ | Mean Run Time, seconds |
|---|---|---|---|---|---|
| True model: GLM + no cross-fitting | | | | | |
| G-computation | −0.002 (0.002) | 0.005 | 0.271 | 94.8 (0.5) | 1.82 |
| IPW | −0.002 (0.002) | 0.005 | 0.280 | 95.8 (0.4) | 0.01 |
| AIPW | −0.002 (0.002) | 0.005 | 0.268 | 94.8 (0.5) | 0.36 |
| CausalGAM | −0.003 (0.002) | 0.005 | 0.267 | 94.8 (0.5) | 0.07 |
| npcausal | −0.002 (0.002) | 0.005 | 0.267 | 94.6 (0.5) | 0.24 |
| tmle | −0.002 (0.002) | 0.005 | 0.261 | 94.4 (0.5) | 0.29 |
| tmle3 | −0.002 (0.002) | 0.005 | 0.268 | 94.8 (0.5) | 0.31 |
| GAMs + no cross-fitting | | | | | |
| AIPW | −0.002 (0.002) | 0.005 | 0.261 | 93.8 (0.5) | 1.16 |
| CausalGAM | −0.004 (0.002) | 0.005 | 0.266 | 92.7 (0.6) | 0.19 |
| npcausal | −0.002 (0.002) | 0.005 | 0.260 | 93.9 (0.5) | 0.98 |
| tmle | −0.002 (0.002) | 0.005 | 0.257 | 94.0 (0.5) | 0.86 |
| tmle3 | −0.002 (0.002) | 0.005 | 0.261 | 93.9 (0.5) | 4.54 |
| GAMs + k = 10 cross-fitting | | | | | |
| AIPW | −0.002 (0.002) | 0.005 | 0.310 | 96.6 (0.4) | 7.92 |
| npcausal | −0.002 (0.002) | 0.006 | 0.319 | 96.5 (0.4) | 3.55 |
| tmleᶜ | −0.002 (0.002) | 0.005 | 0.272 | 95.6 (0.5) | 5.15 |
| tmle3 | −0.002 (0.002) | 0.005 | 0.308 | 96.5 (0.4) | 7.51 |
| SuperLearnerᵈ + no cross-fitting | | | | | |
| AIPW | −0.009 (0.002) | 0.005 | 0.246 | 93.0 (0.6) | 14.65 |
| npcausal | −0.005 (0.002) | 0.005 | 0.232 | 90.3 (0.7) | 21.71 |
| tmle | −0.009 (0.002) | 0.005 | 0.251 | 93.8 (0.5) | 13.44 |
| tmle3 | −0.005 (0.002) | 0.005 | 0.246 | 92.2 (0.6) | 36.76 |
| SuperLearnerᵈ + k = 10 cross-fitting | | | | | |
| AIPW | −0.002 (0.002) | 0.005 | 0.281 | 95.6 (0.5) | 128.48 |
| npcausal | −0.004 (0.002) | 0.005 | 0.285 | 95.5 (0.5) | 183.54 |
| tmleᶜ | −0.006 (0.002) | 0.005 | 0.266 | 94.5 (0.5) | 43.38 |
| tmle3 | −0.004 (0.002) | 0.005 | 0.272 | 95.2 (0.5) | 48.52 |

Abbreviations: AIPW, augmented inverse probability weighting; CI, confidence interval; EAGeR, Effects of Aspirin in Gestation and Reproduction; GAM, generalized additive model; GLM, generalized linear model; IPW, inverse probability weighting; MSE, mean squared error; SE, standard error.

a Simulations were conducted with a sample size of 200 and 2,000 Monte Carlo simulations; the true risk difference was 0.128. Numbers in parentheses show Monte Carlo SEs for the performance indicator estimates.
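The performance indicators reported in the table can be computed from per-simulation results roughly as follows. This is a minimal Python sketch, not the authors' code: the function names `summarize_simulation` and `coverage_mcse` are hypothetical, and the Monte Carlo SE formulas are the standard ones (standard deviation of the estimates divided by the square root of the number of simulations for bias, and the binomial formula for coverage), assumed here rather than taken from the article.

```python
import math
from statistics import mean, stdev


def coverage_mcse(coverage, n_sims):
    """Binomial Monte Carlo SE of an estimated coverage proportion."""
    return math.sqrt(coverage * (1.0 - coverage) / n_sims)


def summarize_simulation(estimates, ci_lower, ci_upper, truth):
    """Summarize one method's results across Monte Carlo simulations.

    estimates, ci_lower, ci_upper: one value per simulated data set.
    truth: the true risk difference (0.128 in the table's footnote).
    """
    n_sims = len(estimates)
    errors = [est - truth for est in estimates]
    covered = [lo <= truth <= hi for lo, hi in zip(ci_lower, ci_upper)]
    coverage = sum(covered) / n_sims
    return {
        "bias": mean(errors),
        # Assumed formula: SD of the estimates over sqrt(number of simulations).
        "bias_mcse": stdev(estimates) / math.sqrt(n_sims),
        "mse": mean([err ** 2 for err in errors]),
        "mean_ci_width": mean([hi - lo for lo, hi in zip(ci_lower, ci_upper)]),
        "coverage": coverage,
        "coverage_mcse": coverage_mcse(coverage, n_sims),
    }
```

Under the binomial formula, a coverage of 94.8% over 2,000 simulations has a Monte Carlo SE of about 0.5 percentage points, which agrees with the values in parentheses in the coverage column.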

b Asymptotic SEs were used for CI calculation in AIPW, CausalGAM, tmle, and tmle3. The CIs for G-computation and IPW were obtained via 200 bootstraps and sandwich estimators, respectively.
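To illustrate what an asymptotic SE means for an AIPW risk difference, the sketch below uses the textbook estimator: the point estimate is the mean of the estimated influence-function contributions, and the SE is their standard deviation divided by the square root of the sample size. It is an illustration under stated assumptions, not the implementation of any package in the table; the function name `aipw_risk_difference` and its arguments are hypothetical, and the nuisance predictions (`g`, `m1`, `m0`) are assumed to have been fitted beforehand.

```python
import math
from statistics import mean, stdev


def aipw_risk_difference(y, a, g, m1, m0, z=1.96):
    """Textbook AIPW risk difference with a Wald-type confidence interval.

    y:  observed binary outcomes
    a:  binary treatment indicators (1 = treated, 0 = untreated)
    g:  predicted probabilities of treatment given covariates
    m1: predicted outcome risk under treatment
    m0: predicted outcome risk under no treatment
    """
    contributions = []
    for yi, ai, gi, m1i, m0i in zip(y, a, g, m1, m0):
        # Augmented (doubly robust) contribution for each treatment arm.
        phi1 = ai * (yi - m1i) / gi + m1i
        phi0 = (1 - ai) * (yi - m0i) / (1 - gi) + m0i
        contributions.append(phi1 - phi0)
    n = len(contributions)
    rd = mean(contributions)
    se = stdev(contributions) / math.sqrt(n)  # asymptotic SE
    return rd, se, (rd - z * se, rd + z * se)
```

The interval width is then 2 × 1.96 × SE, which is the quantity averaged in the "Mean 95% CI Width" column.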

c Cross-fitting was conducted in the outcome model only because of its implementation.

d SuperLearner was used for tmle and AIPW, and sl3 was used for tmle3. Algorithms included gam, earth, ranger, and XGBoost.
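The k = 10 cross-fitting referred to in the table means that each observation's nuisance prediction comes from a model trained on the other folds. The Python sketch below shows the general idea only; it is not how any of the listed packages implement it. The names `cross_fit_predictions` and `fit_mean` are hypothetical, and `fit_mean` is a deliberately trivial stand-in for a real learner such as a GAM or SuperLearner.

```python
import random


def cross_fit_predictions(x, y, fit, k=10, seed=0):
    """Return out-of-fold predictions for every observation.

    fit(x_train, y_train) must return a function that maps one covariate
    value to a prediction.
    """
    n = len(y)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    predictions = [None] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Train only on observations outside the held-out fold.
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            predictions[i] = model(x[i])
    return predictions


def fit_mean(x_train, y_train):
    """Placeholder learner: predicts the training-set mean outcome."""
    avg = sum(y_train) / len(y_train)
    return lambda x_new: avg
```

For example, `cross_fit_predictions(x, y, fit_mean, k=10)` returns one prediction per observation, each computed without using that observation's own fold. The table's run times show the cost of this approach: every learner is refitted once per fold.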