2021 Dec 10;10:e71601. doi: 10.7554/eLife.71601

Table 2. Replication rates according to three criteria involving null hypothesis significance testing.

	Papers		Experiments		Effects		All outcomes
Total number	23		50		158		188

ORIGINAL POSITIVE RESULTS
Succeeded on all three criteria	2	11%	2	6%	13	13%	20	18%
[1] Failed only on significance and direction	2	11%	1	3%	4	4%	6	5%
[2] Failed only on original in replication confidence interval	1	5%	5	15%	14	14%	10	9%
[3] Failed only on replication in original confidence interval	0	0%	0	0%	0	0%	0	0%
Failed only on [1] and [2]	0	0%	3	9%	11	11%	14	13%
Failed only on [2] and [3]	5	26%	10	30%	15	15%	14	13%
Failed only on [1] and [3]	1	5%	0	0%	0	0%	0	0%
Failed on all three criteria [1], [2], and [3]	8	42%	12	36%	40	41%	48	43%
Total	19		33		97		112

ORIGINAL NULL RESULTS
Succeeded on all three criteria	6	55%	7	58%	8	53%	7	35%
[1] Failed only on significance and direction	2	18%	2	17%	3	20%	5	25%
[2] Failed only on original in replication confidence interval	1	9%	1	8%	1	7%	1	5%
[3] Failed only on replication in original confidence interval	0	0%	0	0%	0	0%	0	0%
Failed only on [1] and [2]	0	0%	0	0%	0	0%	0	0%
Failed only on [2] and [3]	2	18%	2	17%	2	13%	2	10%
Failed only on [1] and [3]	0	0%	0	0%	0	0%	0	0%
Failed on all three criteria [1], [2], and [3]	0	0%	0	0%	1	7%	5	25%
Total	11		12		15		20

Number of replications that succeeded or failed to replicate results in original experiments according to three criteria within the null hypothesis significance testing framework: [1] statistical significance (p < 0.05) and same direction; [2] original effect size inside the 95% confidence interval of the replication effect size; [3] replication effect size inside the 95% confidence interval of the original effect size. Criteria [2] and [3] use standardized mean difference (SMD) effect sizes. Percentages are relative to the Total row of each block. Data for original positive results and original null results are shown separately, and each is reported for all outcomes and aggregated by effect, experiment, and paper. Very similar results are obtained when alternative strategies are used to aggregate the data (see Tables S4–S6 in Supplementary file 1).
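As an illustration of how the three criteria in the caption combine into the row categories of the table, the following is a minimal sketch and not the authors' analysis code. It assumes an original positive result, and it treats criterion [1] as passed when the replication has p < 0.05 and an effect with the same sign as the original. The names `Estimate`, `failed_criteria`, and `table_row` are hypothetical, and the numbers in the usage lines are made-up values, not data from the study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Effect estimate on the SMD scale (hypothetical container)."""
    smd: float      # standardized mean difference
    ci_low: float   # lower bound of the 95% confidence interval
    ci_high: float  # upper bound of the 95% confidence interval


def failed_criteria(original: Estimate, replication: Estimate,
                    replication_p: float, alpha: float = 0.05) -> list[str]:
    """Return tags of the failed criteria (assumes an original positive result)."""
    failed = []
    # [1] replication is significant and in the same direction as the original
    same_direction = original.smd * replication.smd > 0
    if not (replication_p < alpha and same_direction):
        failed.append("[1]")
    # [2] original effect size lies inside the replication's 95% CI
    if not (replication.ci_low <= original.smd <= replication.ci_high):
        failed.append("[2]")
    # [3] replication effect size lies inside the original's 95% CI
    if not (original.ci_low <= replication.smd <= original.ci_high):
        failed.append("[3]")
    return failed


def table_row(failed: list[str]) -> str:
    """Map a list of failed criteria to a simplified row label."""
    if not failed:
        return "Succeeded on all three criteria"
    if len(failed) == 3:
        return "Failed on all three criteria"
    return "Failed only on " + " and ".join(failed)


# Made-up illustrative values (not from the study).
original = Estimate(smd=1.2, ci_low=0.4, ci_high=2.0)
weaker = Estimate(smd=0.5, ci_low=0.1, ci_high=0.9)
print(table_row(failed_criteria(original, weaker, replication_p=0.02)))
```

In the usage lines, the replication is significant and in the same direction, and its estimate falls inside the original interval, but the original estimate of 1.2 lies outside the replication interval, so only criterion [2] fails. How original null results are scored on criterion [1] is not spelled out in the caption, so the sketch leaves that case out.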