Table 1.
Identical P values with very different interpretations
Experiment | Treatment 1 (mean ± SD, n) | Treatment 2 (mean ± SD, n) | Difference between means | P value | 95% CI of the difference between means |
---|---|---|---|---|---|
Experiment A | 1,000 ± 100, n = 50 | 990 ± 100, n = 50 | 10 | 0.6 | −30 to 50 |
Experiment B | 1,000 ± 100, n = 3 | 950 ± 100, n = 3 | 50 | 0.6 | −177 to 277 |
Experiment C | 100 ± 5.0, n = 135 | 102 ± 5.0, n = 135 | 2 | 0.001 | 0.8 to 3.2 |
Experiment D | 100 ± 5.0, n = 3 | 135 ± 5.0, n = 3 | 35 | 0.001 | 24 to 46 |
Experiments A and B have identical P values, but the scientific conclusions are very different. The interpretation depends on the scientific context, but in most fields experiment A would be solid negative data, showing that either there is no effect or the effect is tiny. In contrast, experiment B has such a wide confidence interval that it is consistent with nearly any hypothesis. Those data simply do not help answer your scientific question.
Similarly, experiments C and D have identical P values, but they should be interpreted differently. In most experimental contexts, experiment C demonstrates convincingly that while the difference is not zero, it is quite small. Experiment D provides convincing evidence that the effect is large.
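
The entries in Table 1 can be checked from the summary statistics alone. The following is a minimal Python sketch, assuming the P values come from an unpaired two-sample t test with pooled variance (the table does not name the test) and that the difference between means is reported as an absolute value. The function names `t_pdf`, `central_area`, `critical_t`, and `compare` are illustrative only, and the t distribution is integrated numerically so that nothing beyond the standard library is needed.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def central_area(c, df, steps=2000):
    """P(-c < T < c), using Simpson's rule on [0, c] and symmetry."""
    if c == 0:
        return 0.0
    h = c / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def critical_t(df, level=0.95):
    """Value c with P(-c < T < c) = level, found by bisection."""
    lo, hi = 0.0, 100.0  # assumed bracket; wide enough for df >= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def compare(mean1, sd1, n1, mean2, sd2, n2):
    """Return (difference, two-sided P value, CI low, CI high).

    Assumes a pooled-variance unpaired t test; the difference is
    taken as an absolute value, as in the table.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = abs(mean1 - mean2)
    p_value = 1 - central_area(diff / se, df)
    half_width = critical_t(df) * se
    return diff, p_value, diff - half_width, diff + half_width


if __name__ == "__main__":
    experiments = {
        "A": (1000, 100, 50, 990, 100, 50),
        "B": (1000, 100, 3, 950, 100, 3),
        "C": (100, 5, 135, 102, 5, 135),
        "D": (100, 5, 3, 135, 5, 3),
    }
    for name, stats in experiments.items():
        diff, p, low, high = compare(*stats)
        print(f"Experiment {name}: difference = {diff:g}, "
              f"P = {p:.2g}, 95% CI = {low:.1f} to {high:.1f}")
```

Under these assumptions, experiments A and B share nearly the same P value only because the smaller difference in A is offset by its much smaller standard error. The interval width, which scales with that standard error, is what separates the two.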