2021 Dec 10;10:e71601. doi: 10.7554/eLife.71601

Table 1. Replication rates according to seven criteria.

| | Papers | Experiments | Effects | All outcomes |
|---|---|---|---|---|
| Total number | 23 | 50 | 158 | 188 |
| ORIGINAL POSITIVE RESULTS | | | | |
| Numerical results | | | | |
| Same direction | 17 of 19 (89%) | 26 of 35 (74%) | 80 of 101 (79%) | 95 of 116 (82%) |
| Direction and statistical significance | 8 of 19 (42%) | 17 of 33 (52%) | 42 of 97 (43%) | 44 of 112 (39%) |
| Original ES in replication CI | 5 of 19 (26%) | 3 of 33 (9%) | 17 of 97 (18%) | 26 of 112 (23%) |
| Replication ES in original CI | 5 of 19 (26%) | 11 of 33 (33%) | 42 of 97 (43%) | 50 of 112 (45%) |
| Replication ES in PI (porig) | 6 of 19 (32%) | 13 of 33 (39%) | 56 of 97 (58%) | 67 of 112 (60%) |
| Replication ES ≥ original ES | 1 of 19 (5%) | 1 of 33 (3%) | 3 of 97 (3%) | 3 of 112 (3%) |
| Meta-analysis (p < 0.05) | 15 of 19 (79%) | 26 of 33 (79%) | 60 of 97 (62%) | 75 of 112 (67%) |
| Representative images | | | | |
| Same direction | 9 of 10 (90%) | 12 of 16 (75%) | 28 of 35 (80%) | 34 of 45 (76%) |
| Direction and statistical significance | 3 of 8 (38%) | 7 of 12 (58%) | 14 of 22 (64%) | 14 of 22 (64%) |
| Original image in replication CI | 5 of 7 (71%) | 3 of 11 (27%) | 10 of 21 (48%) | 10 of 21 (48%) |
| Replication effect ≥ original image | 3 of 7 (43%) | 5 of 11 (45%) | 7 of 21 (33%) | 7 of 21 (33%) |
| Sample sizes | | | | |
| Median [IQR] of original | 46.0 [20.0–100] | 20.0 [8.5–48.0] | 8.0 [6.0–13.0] | 8.0 [6.0–18.0] |
| Median [IQR] of replication | 50.0 [28.0–128] | 24.0 [11.5–50.0] | 12.0 [8.0–22.2] | 12.0 [8.0–18.0] |
| ORIGINAL NULL RESULTS | | | | |
| Numerical results | | | | |
| Same direction | N/A | N/A | N/A | N/A |
| Direction and statistical significance | 9 of 11 (82%) | 10 of 12 (83%) | 11 of 15 (73%) | 10 of 20 (50%) |
| Original ES in replication CI | 8 of 11 (73%) | 9 of 12 (75%) | 11 of 15 (73%) | 12 of 20 (60%) |
| Replication ES in original CI | 9 of 11 (82%) | 10 of 12 (83%) | 12 of 15 (80%) | 13 of 20 (65%) |
| Replication ES in PI (porig) | 9 of 11 (82%) | 10 of 12 (83%) | 12 of 15 (80%) | 14 of 20 (70%) |
| Replication ES ≤ original ES | N/A | N/A | N/A | N/A |
| Meta-analysis (p > 0.05) | 8 of 11 (73%) | 10 of 12 (83%) | 10 of 15 (67%) | 11 of 20 (55%) |
| Representative images | | | | |
| Same direction | N/A | N/A | N/A | N/A |
| Direction and statistical significance | 3 of 3 (100%) | 3 of 3 (100%) | 4 of 5 (80%) | 4 of 5 (80%) |
| Original image in replication CI | 1 of 3 (33%) | 1 of 3 (33%) | 3 of 5 (60%) | 3 of 5 (60%) |
| Replication effect ≤ original image | N/A | N/A | N/A | N/A |
| Sample sizes | | | | |
| Median [IQR] of original | 16.0 [8.0–25.0] | 12.0 [6.0–20.0] | 15.0 [7.5–31.0] | 18.0 [8.0–514] |
| Median [IQR] of replication | 24.0 [16.0–69.0] | 21.0 [8.0–54.0] | 27.0 [8.0–66.8] | 24.0 [16.0–573] |

Summary of consistency between original and replication findings for original positive results (top) and null results (bottom), and by treating internal replications individually (all outcomes; column 5) and aggregated by effects (column 4), experiments (column 3), and papers (column 2). All findings coded in terms of consistency with original findings. If original results were null, then a positive result is counted as inconsistent with the original finding. For statistical significance, if original results were interpreted as a positive result but were not statistically significant at p < 0.05, then they were treated as a positive result (seven effects); likewise, if they were interpreted as a null result but were statistically significant at p < 0.05, they were treated as a null result (two effects). For original positive results, replications were deemed successful if they were statistically significant and in the same direction as the original finding; for original null results, replications were deemed successful if they were not statistically significant, regardless of direction. The ‘same direction’ criterion is not applicable for original null results because ‘null’ is an interpretation in null hypothesis significance testing and most null results still have a direction (as the effect size is almost always non-zero). Likewise, comparing direction of effect sizes is not meaningful for original null results if their variation was interpreted as noise. Mean differences were estimated from the image for original effects based on representative images. Original positive and null effects were kept separate when aggregating into experiments and papers. That is, if a single experiment had both positive and null effects, then the positive effects are summarized in ‘original positive results’ and the null outcomes are summarized in ‘original null results’. 
Very similar results are obtained when alternative strategies are used to aggregate the data (see Tables S1–S3 in Supplementary file 1). Standardized mean difference (SMD) effect sizes are reported. CI = 95% confidence interval; PI = 95% prediction interval; ES = effect size; IQR = interquartile range.
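
The coding rules described in this caption can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the authors' analysis code. The `Effect` class, its field names, and the example values are hypothetical. The use of a two-sided p-value, the strict `< alpha` threshold, and the inclusive CI bounds are assumptions that the caption does not specify.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Effect:
    """One original/replication pair (hypothetical structure, illustrative names)."""
    original_positive: bool               # original finding interpreted as positive (True) or null (False)
    original_es: float                    # original effect size (SMD)
    replication_es: float                 # replication effect size (SMD)
    replication_p: float                  # replication p-value (assumed two-sided)
    replication_ci: Tuple[float, float]   # 95% CI of the replication effect size


def same_direction(e: Effect) -> bool:
    # Both effect sizes have the same sign; only meaningful for original positive results.
    return e.original_es * e.replication_es > 0


def direction_and_significance(e: Effect, alpha: float = 0.05) -> bool:
    # Original positive: success requires significance AND the same direction.
    # Original null: success requires non-significance, regardless of direction.
    significant = e.replication_p < alpha
    if e.original_positive:
        return significant and same_direction(e)
    return not significant


def original_es_in_replication_ci(e: Effect) -> bool:
    # Assumes inclusive bounds; the caption does not say how exact ties were handled.
    lower, upper = e.replication_ci
    return lower <= e.original_es <= upper


def summarize(flags: List[bool]) -> str:
    # Format a list of per-effect outcomes in the table's "k of n (p%)" style.
    k, n = sum(flags), len(flags)
    return f"{k} of {n} ({100 * k / n:.0f}%)"


# Made-up example values, not data from the study.
effects = [
    Effect(True, 1.2, 0.4, 0.01, (0.1, 0.7)),     # positive original, replicated in direction and significance
    Effect(True, 1.2, -0.4, 0.01, (-0.7, -0.1)),  # positive original, significant but opposite direction
    Effect(False, 0.1, -0.3, 0.40, (-1.0, 0.4)),  # null original, replication not significant
]
print(summarize([direction_and_significance(e) for e in effects]))
```

Note that the same function counts a non-significant replication as a success for an original null result and as a failure for an original positive result, which is why the table reports the two groups separately.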