Table 1.
| | Treatment 1 (mean ± SD, n) | Treatment 2 (mean ± SD, n) | Difference between means | P value | 95% CI of the difference between means |
|---|---|---|---|---|---|
| Experiment A | 1000 ± 100, n = 50 | 990.0 ± 100, n = 50 | 10 | 0.6 | −30 to 50 |
| Experiment B | 1000 ± 100, n = 3 | 950.0 ± 100, n = 3 | 50 | 0.6 | −177 to 277 |
| Experiment C | 100 ± 5.0, n = 135 | 102 ± 5.0, n = 135 | 2 | 0.001 | 0.8 to 3.2 |
| Experiment D | 100 ± 5.0, n = 3 | 135 ± 5.0, n = 3 | 35 | 0.001 | 24 to 46 |
Experiments A and B have identical P values, but the scientific conclusions are very different. The interpretation depends upon the scientific context, but in most fields Experiment A would be solid negative data, proving either that there is no effect or that any effect is tiny. In contrast, Experiment B has a confidence interval so wide that it is consistent with nearly any hypothesis. Those data simply do not help answer your scientific question.
Similarly, Experiments C and D have identical P values but should be interpreted differently. In most experimental contexts, Experiment C demonstrates convincingly that, while the difference is not zero, it is quite small. Experiment D provides convincing evidence that the effect is large.
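To see how the P values and confidence intervals in Table 1 follow from the summary statistics alone, here is a minimal sketch, assuming each comparison is an unpaired t test with equal variances (the table does not state which test was used) and reporting the difference as a magnitude to match the table:

```python
# Minimal sketch: reproduce the P values and 95% CIs in Table 1 from the
# summary statistics, assuming an unpaired t test with equal variances
# (the test itself is not stated in the table).
from math import sqrt

from scipy import stats

# (label, mean1, sd1, n1, mean2, sd2, n2) taken from Table 1
experiments = [
    ("A", 1000, 100, 50, 990, 100, 50),
    ("B", 1000, 100, 3, 950, 100, 3),
    ("C", 100, 5, 135, 102, 5, 135),
    ("D", 100, 5, 3, 135, 5, 3),
]

for label, m1, sd1, n1, m2, sd2, n2 in experiments:
    # Two-sided P value computed directly from the summary statistics
    _, p = stats.ttest_ind_from_stats(m1, sd1, n1, m2, sd2, n2, equal_var=True)

    # 95% CI of the difference: difference ± t* × pooled standard error
    diff = abs(m1 - m2)  # Table 1 reports the magnitude of the difference
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_crit = stats.t.ppf(0.975, df)

    print(f"Experiment {label}: difference = {diff}, P = {p:.3f}, "
          f"95% CI = {diff - t_crit * se:.1f} to {diff + t_crit * se:.1f}")
```

Running this reproduces the table after rounding, and makes the point concrete: Experiments A and B share a P value near 0.6 but have confidence intervals of very different widths, and Experiments C and D share a P value near 0.001 yet support very different conclusions about the size of the effect.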