2021 Jul 15;190(12):2690–2699. doi: 10.1093/aje/kwab207

Table 2.

Performance of the AIPW Software Package in Estimating the Average Treatment Effect (Risk Difference) in a Simulated Observational Study Based on the EAGeR Trial^a

Method and Software Package	Bias (SE)	MSE	Mean 95% CI Width	95% CI Coverage (SE), %^b	Mean Run Time, seconds
True model: GLM + no cross-fitting
G-computation	−0.002 (0.002)	0.005	0.271	94.8 (0.5)	1.82
IPW	−0.002 (0.002)	0.005	0.280	95.8 (0.4)	0.01
AIPW	−0.002 (0.002)	0.005	0.268	94.8 (0.5)	0.36
CausalGAM	−0.003 (0.002)	0.005	0.267	94.8 (0.5)	0.07
npcausal	−0.002 (0.002)	0.005	0.267	94.6 (0.5)	0.24
tmle	−0.002 (0.002)	0.005	0.261	94.4 (0.5)	0.29
tmle3	−0.002 (0.002)	0.005	0.268	94.8 (0.5)	0.31
GAMs + no cross-fitting
AIPW	−0.002 (0.002)	0.005	0.261	93.8 (0.5)	1.16
CausalGAM	−0.004 (0.002)	0.005	0.266	92.7 (0.6)	0.19
npcausal	−0.002 (0.002)	0.005	0.260	93.9 (0.5)	0.98
tmle	−0.002 (0.002)	0.005	0.257	94.0 (0.5)	0.86
tmle3	−0.002 (0.002)	0.005	0.261	93.9 (0.5)	4.54
GAMs + k = 10 cross-fitting
AIPW	−0.002 (0.002)	0.005	0.310	96.6 (0.4)	7.92
npcausal	−0.002 (0.002)	0.006	0.319	96.5 (0.4)	3.55
tmle^c	−0.002 (0.002)	0.005	0.272	95.6 (0.5)	5.15
tmle3	−0.002 (0.002)	0.005	0.308	96.5 (0.4)	7.51
SuperLearner^d + no cross-fitting
AIPW	−0.009 (0.002)	0.005	0.246	93.0 (0.6)	14.65
npcausal	−0.005 (0.002)	0.005	0.232	90.3 (0.7)	21.71
tmle	−0.009 (0.002)	0.005	0.251	93.8 (0.5)	13.44
tmle3	−0.005 (0.002)	0.005	0.246	92.2 (0.6)	36.76
SuperLearner^d + k = 10 cross-fitting
AIPW	−0.002 (0.002)	0.005	0.281	95.6 (0.5)	128.48
npcausal	−0.004 (0.002)	0.005	0.285	95.5 (0.5)	183.54
tmle^c	−0.006 (0.002)	0.005	0.266	94.5 (0.5)	43.38
tmle3	−0.004 (0.002)	0.005	0.272	95.2 (0.5)	48.52

Abbreviations: AIPW, augmented inverse probability weighting; CI, confidence interval; EAGeR, Effects of Aspirin in Gestation and Reproduction; GAM, generalized additive model; GLM, generalized linear model; IPW, inverse probability weighting; MSE, mean squared error; SE, standard error.

^a Results are based on 2,000 Monte Carlo simulations, each with a sample size of 200; the true risk difference was 0.128. Numbers in parentheses show Monte Carlo SEs for the performance indicator estimates.

^b Asymptotic SEs were used for CI calculation in AIPW, CausalGAM, tmle, and tmle3. The CIs for G-computation and IPW were obtained via 200 bootstrap resamples and a sandwich variance estimator, respectively.

^c For tmle, cross-fitting was conducted in the outcome model only, because of how the package implements it.

^d SuperLearner was used for tmle and AIPW, and sl3 was used for tmle3. Algorithms included gam, earth, ranger, and XGBoost.
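
The performance indicators in the table (bias, MSE, mean 95% CI width, CI coverage, and their Monte Carlo SEs) and the asymptotic, influence-function-based SE mentioned in footnote b can be illustrated with a minimal Python sketch. It is not the code used for the study and not the implementation of any of the listed R packages. The function names `aipw_risk_difference`, `summarize_simulation`, and `simulate_once` are hypothetical. The toy data-generating process (one binary covariate, true nuisance functions plugged in rather than estimated) is an assumption made for illustration; only the sample size of 200, the 2,000 replicates, and the true risk difference of 0.128 are taken from footnote a.

```python
import numpy as np


def aipw_risk_difference(y, a, pi, mu1, mu0):
    """AIPW risk difference with an influence-function-based (asymptotic) SE.

    y, a    : binary outcome and binary exposure
    pi      : predicted P(A = 1 | W)
    mu1, mu0: predicted P(Y = 1 | A = 1, W) and P(Y = 1 | A = 0, W)
    """
    y, a, pi, mu1, mu0 = (np.asarray(v, dtype=float) for v in (y, a, pi, mu1, mu0))
    psi1 = a * (y - mu1) / pi + mu1              # augmented term, exposed
    psi0 = (1 - a) * (y - mu0) / (1 - pi) + mu0  # augmented term, unexposed
    d = psi1 - psi0
    est = d.mean()
    se = d.std(ddof=1) / np.sqrt(d.size)         # SE from the empirical influence function
    return est, se


def summarize_simulation(estimates, ses, truth, z=1.96):
    """Performance indicators across Monte Carlo replicates, with Monte Carlo SEs."""
    estimates = np.asarray(estimates, dtype=float)
    ses = np.asarray(ses, dtype=float)
    b = estimates.size
    err = estimates - truth
    lower, upper = estimates - z * ses, estimates + z * ses
    coverage = np.mean((lower <= truth) & (truth <= upper))
    return {
        "bias": err.mean(),
        "bias_mcse": err.std(ddof=1) / np.sqrt(b),
        "mse": np.mean(err ** 2),
        "mean_ci_width": np.mean(upper - lower),
        "coverage": coverage,
        "coverage_mcse": np.sqrt(coverage * (1 - coverage) / b),
    }


def simulate_once(rng, n=200, rd=0.128):
    """One replicate from a toy (hypothetical) data-generating process."""
    w = rng.binomial(1, 0.5, n)
    pi = 0.4 + 0.2 * w               # true propensity score
    a = rng.binomial(1, pi)
    mu0 = 0.3 + 0.1 * w              # true risk if unexposed
    mu1 = mu0 + rd                   # true risk if exposed
    y = rng.binomial(1, np.where(a == 1, mu1, mu0))
    # Assumption: true nuisance functions are plugged in instead of fitted models.
    return aipw_risk_difference(y, a, pi, mu1, mu0)


if __name__ == "__main__":
    rng = np.random.default_rng(2021)
    results = [simulate_once(rng) for _ in range(2000)]
    est, se = map(np.array, zip(*results))
    for name, value in summarize_simulation(est, se, truth=0.128).items():
        print(f"{name}: {value:.4f}")
```

The Monte Carlo SE formulas are consistent with the tabulated values: with 2,000 replicates, a coverage of 94.8% has a Monte Carlo SE of sqrt(0.948 × 0.052 / 2,000) ≈ 0.005, or 0.5 percentage points, and an MSE of 0.005 implies a standard deviation of about 0.07 and hence a bias SE of about 0.07 / sqrt(2,000) ≈ 0.002.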