The Journal of Clinical Hypertension
Editorial. 2017 Nov 23;20(2):408–410. doi: 10.1111/jch.13173

Why it is nonsensical to use retrospective power analyses to conduct a postmortem on your study

Michael R Jiroutek 1, J Rick Turner 2
PMCID: PMC8030883  PMID: 29168992

1. INTRODUCTION

A postmortem (best known in the medical context) is strictly defined as an examination of a dead body to determine the cause of death. For some time, this same idea has been used by businesses to evaluate a failed project or venture, after the fact, in order to glean what went wrong and to avoid similar future mistakes.1 The obvious shortcoming of this paradigm is that once the project has failed, the "postmortem" findings cannot resuscitate it; they arrive too late to prevent the failure. In recent years, in an attempt to be more proactive in making such assessments, the idea of a "premortem" has taken hold in the business community.2 The term "premortem" describes the idea that, following the conception of a project but before its actual start, the team brainstorms about what could cause the project to fail and makes any necessary adjustments in advance to avoid these potential pitfalls. Done thoroughly, this proactive strategy makes a project more robust, increasing the chances of its success.

The business community seems to have finally figured out what the statistical/research community has known for decades—making an assessment of a project's likelihood of success in advance is far more useful than waiting until after it has failed to try to determine what went wrong. The statistical community has a well‐known calculation to help make this assessment: power. Statistical power is defined as the probability that the null hypothesis will be rejected, given that the alternative hypothesis is true. An a priori power calculation is a statistical analog to the premortem. Pre‐study, we evaluate the likelihood of success by specifying estimates of key factors, such as the effect size, sample size, data variability, and the acceptable type I and type II error rates.
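To make the pre‐study calculation concrete, the sketch below computes power for a two‐arm comparison of means using a simple normal approximation. It is a minimal illustration only; the function name and the numerical inputs are our own assumptions, not quantities taken from the editorial or from any particular study.

```python
from statistics import NormalDist

def a_priori_power(delta, sigma, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a difference in
    means, computed BEFORE any data are collected (the statistical premortem).

    delta     : assumed true difference between treatment arms
    sigma     : assumed common standard deviation
    n_per_arm : planned number of participants per arm
    alpha     : planned type I error rate
    """
    z = NormalDist()                        # standard normal distribution
    se = sigma * (2.0 / n_per_arm) ** 0.5   # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)       # two-sided critical value
    shift = abs(delta) / se                 # standardized (assumed) effect
    # Probability the test statistic falls in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Illustrative inputs only: an assumed 5-unit difference, SD of 15, 150 per arm
print(round(a_priori_power(delta=5, sigma=15, n_per_arm=150), 3))  # ~0.82
```

Every input to the function is a pre‐study assumption; nothing in it depends on data that have already been observed, which is precisely what makes the probability statement coherent.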

Unfortunately, the statistical/research community appears to be falling into the trap that plagued the business community—regressing to the use of the statistical analog of the postmortem, despite repeated warnings against the fallacy of attempting post hoc power calculations.3, 4, 5, 6, 7, 8, 9, 10, 11, 12

Often referred to as retrospective power analysis, in this scenario, after a study has failed (operationally conceptualized in the statistical/research community as "failing to reject the null hypothesis"), a power calculation is undertaken to determine what sample size (or effect size) would have been needed to reject the null hypothesis; that is, to determine how the study could have been made successful after it died. To understand why such a calculation is not only useless but also flat‐out nonsensical, consider the lottery.

2. CONCEPTUAL ANALOGY

Suppose, at the last minute before this coming weekend's Powerball drawing, an individual went to the local store and purchased a lottery ticket. Ticket in hand, that person might start dreaming of yachts and vacation homes and wonder, “What is the probability I'll win the lottery?” The individual arrives back at his house, sits back in his recliner, and turns on the broadcast of the drawing. One by one, ping‐pong balls (randomly) fall into the slot, indicating that particular ball was selected. The selected, numbered balls are ordered from smallest to largest—and—his number doesn't match.

At this point, after the winning numbers have been determined and he realizes his ticket is a loser, does it make sense for him to ask the above question, “What is the probability I'll win the lottery?” Before we address these scenarios, consider a third one.

On the way home after purchasing his ticket, he drives over a nail and gets a flat tire. The drawing was held as scheduled, but having accidentally left his smartphone at home, he missed the announcement of the winning numbers, so he does not know whether he won. At this point, does it make sense for him to ask the question, "What is the probability I'll win the lottery?" Let's examine these scenarios in terms of the relevant timeline.

Only in the first of these three lottery scenarios does it make sense to ask the question, "What is the probability I'll win the lottery?" Consider the standard mathematical definition of probability: the extent to which an event is likely to occur. Probability refers to an event that has not yet occurred (ie, an event in the future whose outcome is yet to be decided). This is the case only in our first lottery scenario. In the second and third scenarios, the drawing has already taken place (regardless of whether the ticket holder knows the outcome), making the probability question invalid/nonsensical to pose. Since statistical power is a probability, let's consider the implications of when it makes sense to compute it in a clinical research setting.

3. OUR PREVIOUS EXAMPLE REVISITED

We will expand upon an example that we presented in two previous papers in this journal.13, 14 Imagine a phase III clinical trial testing a new antihypertensive drug against a placebo. The research question, null hypothesis, and alternative (research) hypothesis are created. The research question is: Does the test drug reduce systolic blood pressure (SBP) to a greater degree than placebo? The research hypothesis is: The test drug reduces mean SBP to a greater extent than placebo. The null hypothesis is: The test drug does not reduce mean SBP compared with placebo.

Prior to the start of this study, it is desirable to determine some estimate of the probability of success (ie, determine the statistical power). This was the original intent of the calculation when first proposed by Neyman and Pearson.15 As defined previously, power is the probability that the null hypothesis will be rejected (we declare the test drug does reduce SBP statistically significantly more than placebo), given that the alternative hypothesis is true (the test drug really does do this). The key word in the previous sentence is “will,” indicating a yet to be determined outcome. One can think of the data in this SBP study as the ping‐pong balls bouncing around in the lottery hopper: no individual SBPs have been measured for any study participants at this (pre‐study) point in time, so asking a probabilistic question makes inherent sense.
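As a concrete pre‐study illustration, the sketch below asks how many participants per arm would be needed to reach 90% power under assumed design values (a 5 mmHg true difference in mean SBP reduction, a common SD of 15 mmHg, and a two‐sided alpha of 0.05). These numbers are hypothetical planning assumptions of our own, not values from a real trial, and the helper function repeats the normal approximation from the earlier sketch so the snippet runs on its own.

```python
from statistics import NormalDist

def power_two_sample(delta, sigma, n_per_arm, alpha=0.05):
    """Normal-approximation power for a two-sided comparison of mean SBP change."""
    z = NormalDist()
    se = sigma * (2.0 / n_per_arm) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(delta) / se - z_crit) + z.cdf(-abs(delta) / se - z_crit)

# Hypothetical planning assumptions: 5 mmHg true difference, SD of 15 mmHg
target_power = 0.90
n = 2
while power_two_sample(delta=5, sigma=15, n_per_arm=n) < target_power:
    n += 1
print(f"{n} participants per arm give power "
      f"{power_two_sample(5, 15, n):.3f}")   # roughly 190 per arm
```

At this point in the timeline, every quantity fed into the calculation is an assumption about data that do not yet exist, so the probabilistic question being answered is well posed.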

Now consider the situation post‐study (ie, after all SBP measurements have been made, the data collected, the planned statistical analyses conducted, and the study results made known). The ping‐pong balls have long since stopped falling, each having found its respective slot. The event(s) having occurred, the randomness that existed prior to the balls being selected no longer exists. Post‐study, the data are fixed and known. Therefore, how can we ask a probability question about the study just completed? We can't—any such question violates the basic definition of probability: quantifying the chance of an event with an unknown outcome that is yet to occur. Thus, since the question itself is nonsensical, trying to determine the statistical power of a study after the fact doesn't make sense. It would be like asking, "What is the probability that it rained yesterday?" It either did or didn't, and therefore assigning "it" a probability post hoc is meaningless.

Our third lottery scenario is a more interesting situation to ponder. The analog in the SBP study setting is one in which the data have been collected but the analysis has not yet been conducted, so the answer to the research question is unknown. Does a power calculation make any more sense at this point in the study timeline? No, it does not. The result has been established even though, at this point in time, it is unknown to the researcher. The randomness of the data ends once the last of the ping‐pong balls has dropped into its slot. Regardless of whether the result is known by the researcher, the existing dataset is fixed, and thus the answer is "known" in the sense that it will not change: the event of interest, the resolution of the research hypothesis (which depends entirely on the collected data), is already in the past.

4. IMPLICATIONS

The implications of respecting the rules of probability in the implementation of power analyses are straightforward. Because statistical power is a probability, and probabilities by definition apply only to events that have yet to occur and whose outcomes are therefore unknown, power analyses make sense only when conducted prior to data collection.

What, then, is a retrospective power analysis? It represents the likelihood of success of the next, yet‐to‐be‐conducted study; it is an a priori power calculation for a subsequent study. An individual making such a calculation is asking: If the effect size, variability, sample size, and type I error rate are the same as in the study just completed, what is the probability that the null hypothesis will be rejected, given that the alternative hypothesis is true?
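The point can be made computationally. In the sketch below, hypothetical "observed" results (a 3 mmHg observed difference, an observed SD of 16 mmHg, and 100 participants per arm, all numbers of our own invention) are plugged into the same normal‐approximation power function used above; the helper is repeated so the snippet is self‐contained. The number that comes out is not a statement about the completed study. Read literally, it is the pre‐study power of a hypothetical future study run under identical conditions.

```python
from statistics import NormalDist

def power_two_sample(delta, sigma, n_per_arm, alpha=0.05):
    """Normal-approximation power for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    se = sigma * (2.0 / n_per_arm) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(delta) / se - z_crit) + z.cdf(-abs(delta) / se - z_crit)

# Hypothetical "observed" quantities from a completed study (illustrative only)
obs_delta, obs_sigma, n_per_arm = 3.0, 16.0, 100

# This is NOT the probability that the completed study "should have" rejected
# the null hypothesis; that outcome is already fixed. It is an a priori power
# calculation for a future study assumed to behave exactly like the last one.
print(round(power_two_sample(obs_delta, obs_sigma, n_per_arm), 3))  # ~0.26
```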

A medical examiner's job in a postmortem is to determine the cause of death by stating scientifically determinable facts observed during the procedure. It is not to hypothesize what lifestyle or other changes could have prevented the death in the first place. A medical examiner would never say: "If John Doe had done a better job of avoiding stress throughout his life, he would have had lower blood pressure and wouldn't have died of a heart attack."

Similarly, any postmortem of a research study should state only scientifically determinable facts observed during the post‐study review/assessment. Claims that a just‐completed study was "underpowered" or "would have been successful if the sample size had been larger" are unknown, unknowable, and indeterminable justifications for its failure. Had the study been larger, there is no guarantee that the observed effect size would not have shrunk, or that the additional data would not have introduced more variability into the dataset, causing the larger study to fail as well. Hypothesizing about what could have made a study successful if certain things had been different brings to mind the line popularized by Don Meredith, a former commentator on Monday Night Football: "If ifs and buts were candies and nuts, we'd all have a merry Christmas." The post hoc statistical hypothesizing discussed in this editorial is probably best left to Santa Claus.

CONFLICT OF INTEREST

The authors report no specific funding in relation to the preparation of this paper.

ACKNOWLEDGMENT

The authors thank Ms Sheryl Jensen for administrative and editorial support.

REFERENCES

1. Blumberg M. The art of the post‐mortem. March 9, 2011. http://www.businessinsider.com/the-art-of-the-post-mortem-2011-3. Accessed October 15, 2017.
2. Klein G. Performing a project premortem. Harvard Business Review. September 2007. https://hbr.org/2007/09/performing-a-project-premortem. Accessed October 15, 2017.
3. Cox DR. Some problems connected with statistical inference. Ann Math Stat. 1958;29:357‐372.
4. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med. 1994;121:200‐206.
5. Knapp TR. The overemphasis on power analysis. Nurs Res. 1996;45:379‐381.
6. Zumbo BD, Hubley AM. A note on misconceptions concerning prospective and retrospective power. Statistician. 1998;47:385‐388.
7. Hoenig JM, Heisey DM. The abuse of power: the pervasive fallacy of power calculations for data analysis. Am Stat. 2001;55:19‐24.
8. Altman DG, Schulz KF, Moher D, et al; CONSORT Group (Consolidated Standards of Reporting Trials). The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med. 2001;134:663‐694.
9. Senn SJ. Power is indeed irrelevant in interpreting completed studies. BMJ. 2002;325:1304.
10. Cummings P, Rivara FP. Reporting statistical information in medical journals. Arch Pediatr Adolesc Med. 2003;157:321‐324.
11. Lenth RV. Statistical power calculations. J Anim Sci. 2007;85(E Suppl):E24‐E29.
12. Hayat MJ. Understanding sample size determination in nursing research. West J Nurs Res. 2013;35:943‐956.
13. Jiroutek MJ, Turner JR. In praise of confidence intervals: much more informative than P values alone. J Clin Hypertens (Greenwich). 2016;18:955‐957.
14. Jiroutek MJ, Turner JR. Buying a significant result: do we need to reconsider the role of the P‐value? J Clin Hypertens (Greenwich). 2017;19:919‐921.
15. Neyman J, Pearson E. On the problem of the most efficient tests of statistical hypotheses. Philos Trans R Soc Lond A. 1933;231:289‐337.
