Canadian Journal of Psychiatry. Revue Canadienne de Psychiatrie
Letter. 2020 Sep 29;66(4):421–422. doi: 10.1177/0706743720962277

P Values and Confidence Intervals

Scott B Patten, David L Streiner
PMCID: PMC8172342  PMID: 32991213

Some of the critiques raised by Hwang et al.¹ in their letter concern the risk of type I and type II errors and the implications of such errors for interpreting the results reported by Cookey et al.² We would like to point out that some of these issues relate to the statistical tools chosen by investigators, specifically the choice between hypothesis testing and estimation techniques.

A P value quantifies the probability of an observed result, or one more extreme, under the assumption of a null hypothesis. P values are usually interpreted in relation to a 5% significance level, such that a P value of less than 0.05 is taken as evidence that the null hypothesis can be rejected. However, when a study reports a large number of tests, there is a risk that one or more true null hypotheses will be rejected due to chance, leading to type I error. Type I error can be understood as a “false positive” result of a statistical test, since it suggests a difference in the population when one does not exist. When the number of observations is small, it is increasingly possible that an observed difference is compatible with random variation (at the 5% significance level) even when the null hypothesis is false; this is a type II error. Type II error can be understood as a “false negative” because it suggests a lack of difference between groups in the population when a difference does exist. By the same token, when the sample size is small, any chance effect that reaches significance is likely to be large.³

For these reasons, the International Committee of Medical Journal Editors (ICMJE) recommends that authors should: “When possible, quantify findings and present them with appropriate indicators of measurement error or uncertainty (such as confidence intervals). Avoid relying solely on statistical hypothesis testing, such as P values, which fail to convey important information about effect size and precision of estimates.” This echoes a recommendation from the Task Force on Statistical Inference of the American Psychological Association, which states: “It is hard to imagine a situation in which a dichotomous accept–reject decision is better than reporting an actual P value or, better still, a confidence interval.”⁴

A confidence interval provides an estimate of effect together with a range of values that are consistent with the observed data at the specified level of confidence (usually 95%). A confidence interval for a measure of effect, such as an odds ratio, provides all of the information provided by a statistical test and more. When a confidence interval does not include the null value (e.g., 1.0 for an odds ratio; 0.0 for a standardized mean difference), the null hypothesis would have been rejected had a statistical test been used. However, if a confidence interval is interpreted as a statement of significance, the same issues of multiple “testing” arise. Therefore, carefully planned and thoughtful data analysis is required in addition to the selection of the best statistical tools. The CJP endorses the ICMJE recommendations, and we encourage authors to avoid excessive reliance on P values in reporting their analyses.
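
As an illustration only, the two quantitative points in this letter can be sketched in Python. The sketch is ours and rests on assumptions that do not come from the letter: the tests are treated as independent, the odds ratio interval uses the large-sample Wald (log odds ratio) approximation, and the function names and 2×2 counts are hypothetical.

```python
import math


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one type I error across n_tests tests.

    Assumes every null hypothesis is true and the tests are independent
    (a simplifying assumption made for this sketch).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def odds_ratio_wald(a, b, c, d, z_crit=1.959964):
    """Odds ratio, Wald 95% confidence limits, and two-sided Wald P value.

    a, b: exposed subjects with / without the outcome
    c, d: unexposed subjects with / without the outcome
    All four counts must be positive for this approximation.
    """
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z_crit * se_log_or)
    upper = math.exp(log_or + z_crit * se_log_or)
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    z = abs(log_or) / se_log_or
    p_value = math.erfc(z / math.sqrt(2))
    return odds_ratio, lower, upper, p_value


if __name__ == "__main__":
    for m in (1, 5, 20):
        print(m, "tests:", round(familywise_error_rate(m), 3))

    # Hypothetical counts, chosen only to illustrate a borderline result.
    or_, lo, hi, p = odds_ratio_wald(20, 80, 10, 90)
    print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}, P = {p:.3f}")
```

Under these assumptions, 20 tests at the 5% level carry roughly a 0.64 chance of at least one false positive. For the hypothetical table the interval includes 1.0 and the P value exceeds 0.05, so the two approaches agree on "significance", but only the interval also shows how wide the range of compatible effect sizes is.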

Footnotes

Authors’ Note: Dr. Patten is supported by the Cuthbertson & Fischer Chair in Pediatric Mental Health at the University of Calgary.

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD: Scott B. Patten https://orcid.org/0000-0001-9871-4041

References

  • 1. Hwang P, Hechtman L, Jimenez AC, et al. In response to “clinical characteristics associated with early phase psychosis and comorbid substance use”: methodological concerns. Can J Psychiatry. 2021;66(2):183–184. doi: 10.1177/0706743720947636.
  • 2. Cookey J, McGavin J, Crocker CE, Matheson K, Stewart SH, Tibbo PG. A retrospective study of the clinical characteristics associated with alcohol and cannabis use in early phase psychosis. Can J Psychiatry. 2020;65(6):426–435.
  • 3. Streiner DL. Sample size in clinical research: when is enough enough? J Pers Assess. 2006;87(3):259–260.
  • 4. Wilkinson L; Task Force on Statistical Inference, APA Board of Scientific Affairs. Statistical methods in psychology journals: guidelines and explanations. Am Psychol. 1999;54(8):594–604.

Articles from Canadian Journal of Psychiatry. Revue Canadienne de Psychiatrie are provided here courtesy of SAGE Publications
