Figure 5:
Illustration of the concepts and interpretation of power and of type I, type S, and type M errors (Gelman, 2015). Consider a hypothetical scenario, highlighted in purple in Fig. 4, with a true effect of 0.3 (blue vertical line) and a corresponding standard error of 1.0 percent signal change, so that the effect estimate follows a Student’s t(20)-distribution centered at the true effect (black curve). Under the null hypothesis (red vertical line and dot-dashed green curve), two-tailed testing at a type I error rate of 0.05 sets the thresholds at ±2.086; the false positive rate (FPR) of 0.05 is the null distribution’s total area beyond these two critical values (marked with red diagonal lines). The power is the total area of the true-effect distribution (black curve) beyond these thresholds, which is 0.06 (shaded in blue). The type S error rate is the ratio of the blue area in the left tail of the true-effect distribution, beyond the threshold of −2.086, to the blue area in both tails (i.e., the ratio of the “significant” area in the wrong-signed tail to the total “significant” area), which is 23% here. If a random draw from the true-effect distribution happens to be 2.2 (small gray square), it would be declared statistically significant at the 0.05 level, and the corresponding type M error, the magnification of the estimated effect size, would be 2.2/0.3 ≈ 7.33, far larger than unity.
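The quantities in this caption can be reproduced numerically. The following is a minimal sketch, not the authors' code. It assumes that the true-effect distribution is a central t(20) shifted by effect/SE (which is consistent with the power and type S values quoted above), and it takes the expected type M error to be the average of |estimate|/effect over significant draws, a common definition that the caption's single draw of 2.2 illustrates. All function and variable names (`t_pdf`, `t_upper_tail`, `type_m`, and so on) are hypothetical, and the tail areas are obtained by simple numerical integration rather than a statistics library.

```python
import math
import random


def t_pdf(x, df):
    """Density of the central Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x, df, upper=60.0, n=20000):
    """P(T > x) for x >= 0, by composite Simpson's rule on [x, upper] (n even)."""
    h = (upper - x) / n
    total = t_pdf(x, df) + t_pdf(upper, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(x + i * h, df)
    return total * h / 3


# Values taken from the caption.
effect, se, df, crit = 0.3, 1.0, 20, 2.086
shift = effect / se  # location of the true-effect curve on the t scale

# Areas of the shifted t(20) beyond the two thresholds.
p_correct_sign = t_upper_tail(crit - shift, df)  # right tail, beyond +2.086
p_wrong_sign = t_upper_tail(crit + shift, df)    # left tail, beyond -2.086 (by symmetry)

power = p_correct_sign + p_wrong_sign
type_s = p_wrong_sign / power

# Monte Carlo estimate of the expected type M error (assumed definition, see above).
random.seed(0)
ratios = []
for _ in range(200_000):
    chi2 = random.gammavariate(df / 2, 2.0)  # chi-square(df) = Gamma(df/2, scale 2)
    t_draw = random.gauss(0.0, 1.0) / math.sqrt(chi2 / df)
    estimate = effect + se * t_draw
    if abs(estimate / se) > crit:  # keep only "significant" draws
        ratios.append(abs(estimate) / effect)
type_m = sum(ratios) / len(ratios)

print(f"power  = {power:.3f}")
print(f"type S = {type_s:.3f}")
print(f"type M = {type_m:.2f}")
```

Under these assumptions the printed power and type S values should be close to the 0.06 and 23% given in the caption. The simulated type M value cannot fall below 2.086/0.3 ≈ 6.95, because every significant estimate exceeds the threshold in absolute value.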