2014 Jan 23;25(5):2214–2237. doi: 10.1177/0962280213519716

Table 1.

Empirical type I error rates of different propensity score methods for comparing survival functions between treatment groups.

| Statistical method | Prevalence of treatment: 0.05 | Prevalence of treatment: 0.10 | Prevalence of treatment: 0.25 |
|---|---|---|---|
| Effect in overall population of all subjects | | | |
| Stratification (Cox regression stratifying on PS strata) | 0.439 | 0.571 | 0.629 |
| Stratification (stratified log-rank test) | 0.439 | 0.572 | 0.629 |
| IPTW (Cole and Hernán) | 0.070 | 0.072 | 0.051 |
| IPTW (Xie and Liu) | 0.722 | 0.604 | 0.350 |
| Effect in population of treated subjects | | | |
| Caliper matching (naïve Cox regression) | 0.013 | 0.006 | 0.010 |
| Caliper matching (Cox regression with robust standard errors) | 0.034 | 0.030 | 0.029 |
| Caliper matching (log-rank test) | 0.013 | 0.006 | 0.010 |
| Caliper matching (stratified log-rank test) | 0.035 | 0.033 | 0.039 |
| Nearest neighbour matching (naïve Cox regression) | 0.012 | 0.008 | 0.073 |
| Nearest neighbour matching (Cox regression with robust standard errors) | 0.033 | 0.030 | 0.144 |
| Nearest neighbour matching (log-rank test) | 0.012 | 0.008 | 0.073 |
| Nearest neighbour matching (stratified log-rank test) | 0.033 | 0.033 | 0.283 |
| IPTW (Cole and Hernán) | 0.009 | 0.006 | 0.006 |
| IPTW (Xie and Liu) | 0.000 | 0.000 | 0.000 |

Note: Each cell contains an empirical estimate of the type I error rate: the proportion of 1000 simulated datasets in which the null hypothesis of no difference in survival functions was rejected at the P < 0.05 level.
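The calculation described in the note can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `empirical_type1_error` and `nominal_rate_interval` and the example P values are hypothetical, and the interval is the usual normal approximation to a binomial proportion, which the source does not state.

```python
import math


def empirical_type1_error(p_values, alpha=0.05):
    """Proportion of simulated datasets in which the null is rejected (P < alpha)."""
    rejections = sum(1 for p in p_values if p < alpha)
    return rejections / len(p_values)


def nominal_rate_interval(alpha=0.05, n_sim=1000, z=1.96):
    """Approximate 95% range for the empirical rate when the true rate equals alpha.

    Assumption: normal approximation to the binomial; not taken from the source.
    """
    se = math.sqrt(alpha * (1 - alpha) / n_sim)
    return alpha - z * se, alpha + z * se


# Hypothetical P values from six simulated datasets (illustration only).
example_p_values = [0.01, 0.20, 0.04, 0.70, 0.049, 0.05]
rate = empirical_type1_error(example_p_values)  # 3 of 6 are strictly below 0.05

low, high = nominal_rate_interval(alpha=0.05, n_sim=1000)
print(f"empirical rate: {rate:.3f}")
print(f"range consistent with a nominal 0.05 rate: {low:.3f} to {high:.3f}")
```

Under these assumptions, with 1000 simulated datasets an empirical rate between roughly 0.036 and 0.064 is compatible with the nominal 0.05 level. Read against that range, the stratification rates in the table (0.439 to 0.629) are far too high, while several matching and IPTW rates fall below it.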