Author manuscript; published in final edited form as: Obes Rev. 2019;20(11):1523–1541. doi:10.1111/obr.12923

Table: 10 inferential errors, how they may occur, and recommendations for how to avoid them or how to communicate when they are unavoidable.

| Inferential error¹ | Error description | Recommendations² |
| --- | --- | --- |
| Using Self-Reported Outcomes and Teaching to the Test | Urging the intervention group to change health-related behaviors or conditions, then giving participants a questionnaire that asks about those same health-related behaviors and conditions, while ignoring the biases this can induce. | Use objective measurements when possible. If self-report is the only measurement tool available, either forgo the measurements entirely, do not emphasize them in the conclusions, or at the very least make the reader aware of the potential for biased results. |
| Foregoing Control Groups and Risking Regression to the Mean Creating Differences Over Time | Providing an intervention only to individuals preferentially sampled to be either higher or lower than the population mean on some variable (such as children who all have high BMI z-scores) and assuming that improvements over time are caused by the intervention rather than by the spontaneous tendency of extreme values to revert toward the population average. | Include a control group with the same characteristics as the intervention group. If none is available, communicate clearly that subgrouping on extreme values risks follow-up values that are closer to the population average because of regression to the mean rather than a real effect. See the simulation sketch after the table footnotes. |
| Changing the Goal Posts | Using surrogate or secondary outcomes to claim that an intervention is effective when a study testing the intervention's effect on obesity yields a non-significant result for the primary outcome. | Focus the report on the pre-registered primary outcome, and communicate intermediate endpoints with great caution. |
| Ignoring Clustering in Studies That Randomize Groups of Children | Conducting a cluster randomized trial in which groups of children are randomly assigned to experimental conditions, but analyzing the data as though the children were randomized individually. | Always account for clustering in statistical analyses. Use as many clusters as possible, and always more than one cluster per treatment condition. See the simulation sketch after the table footnotes. |
| Following the Forking Paths, Sub-Setting, P-Hacking, and Data Dredging | Trying different analyses with different subsets of the sample or various outcomes and basing conclusions on whatever is statistically significant. | Where appropriate, pre-specify questions and analyses of interest. Be transparent about all analyses conducted, how they were conducted, and whether they were pre-specified. Do not draw definitive conclusions about causal effects from analyses that were not pre-specified, or from subsets of many pre-specified analyses uncorrected for multiple testing. See the simulation sketch after the table footnotes. |
| Basing Conclusions on Tests for Significant Differences from Baseline | Separately testing for significant differences from baseline in the intervention and control groups and, if the former is significant and the latter is not, declaring a statistically significant intervention effect. | Always conduct, report, and emphasize the appropriate between-groups test. See the simulation sketch after the table footnotes. |
| Equating 'No Statistically Significant Difference' with 'Equally Effective' | Concluding that two interventions tested head-to-head are 'equally effective' when there is no statistically significant difference between groups. | Include an appropriate non-intervention control group if absolute effectiveness is of interest. When comparing only two interventions head-to-head, do not presume that changes over time reflect effectiveness. Testing equivalence or non-inferiority between two interventions requires special design and analysis considerations. See the equivalence-testing sketch after the table footnotes. |
| Ignoring Intervention Study Results in Favor of Observational Analyses | Drawing conclusions from correlations of intervention-related factors with outcomes rather than testing the actual intervention against a control as designed. | Report the primary, between-group analyses from controlled intervention studies. Clearly communicate that observational findings do not carry the same causal evidence. |
| Using One-Sided Testing for Statistical Significance | Switching to one-sided statistical significance tests to make results appear statistically significant. | Two-sided tests are typically more appropriate, and one-sided tests should generally not be used. If one insists on their use, the testing approach should be pre-specified and justified. See the one-sided-testing sketch after the table footnotes. |
| Stating That Effects Are Clinically Significant Even Though They Are Not Statistically Significant | Ignoring the statistical tests in favor of optimistic conclusions about whether the effects are clinically significant. | Pre-specify what counts as statistically or clinically significant, and be faithful to and transparent about the analysis and interpretation plans. If using statistical significance testing, do not claim that effects have been demonstrated when the estimates are not statistically significant, regardless of how large the point estimates are. |
¹ The order of errors as presented does not imply a ranking of importance or severity.

² In most cases, a common recommendation for hypothesis testing is to preregister or predefine as much as possible. These recommendations are not meant to discourage hypothesis-generating investigations of the data, but rather to encourage making clear distinctions between hypothesis testing, hypothesis generation, and causal inference.
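
To make the regression-to-the-mean row concrete, here is a minimal Python simulation (not from the review; all numbers are hypothetical, including the assumed baseline/follow-up correlation of 0.7). A subgroup selected for extreme baseline BMI z-scores drifts back toward the population mean at follow-up even though no intervention is applied:

```python
# Minimal sketch of regression to the mean; all parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
rho = 0.7  # assumed correlation between baseline and follow-up z-scores

# Draw correlated baseline and follow-up BMI z-scores with no intervention.
cov = [[1.0, rho], [rho, 1.0]]
baseline, followup = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Select only children with high baseline z-scores, as in the error described.
high = baseline > 1.0
print(f"baseline mean of selected subgroup:  {baseline[high].mean():.2f}")
print(f"follow-up mean of selected subgroup: {followup[high].mean():.2f}")
# The follow-up mean is substantially lower despite a zero true effect,
# which is why a control group selected the same way is needed.
```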
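The clustering row can be illustrated the same way. This sketch assumes statsmodels is available and uses made-up parameters (20 classrooms of 25 children, a shared within-classroom component, and a true treatment effect of zero); analyzing children as if they were individually randomized rejects the true null far more often than the nominal 5%:

```python
# Sketch: false-positive inflation when clustering is ignored; hypothetical setup.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_clusters, n_per, n_sims = 20, 25, 200
naive_rej = robust_rej = 0

for _ in range(n_sims):
    cluster = np.repeat(np.arange(n_clusters), n_per)
    treat = np.repeat(rng.permutation([0] * 10 + [1] * 10), n_per).astype(float)
    shared = rng.normal(0, 0.5, n_clusters)[cluster]     # classroom-level noise
    y = shared + rng.normal(0, 1, n_clusters * n_per)    # true treatment effect = 0
    X = sm.add_constant(treat)
    naive = sm.OLS(y, X).fit()                           # pretends children are independent
    robust = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": cluster})
    naive_rej += naive.pvalues[1] < 0.05
    robust_rej += robust.pvalues[1] < 0.05

print(f"naive false-positive rate:          {naive_rej / n_sims:.0%}")
print(f"cluster-robust false-positive rate: {robust_rej / n_sims:.0%}")
```

In repeated runs the naive analysis rejects the true null many times more often than 5%, while the cluster-aware analysis stays close to the nominal rate.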
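For the forking-paths row, a short simulation (hypothetical sample sizes and test counts) shows why running many subgroup or outcome analyses all but guarantees "something significant" even when every null hypothesis is true:

```python
# Sketch: probability of at least one spurious finding across many tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, n_tests, n_sims = 50, 20, 1000
any_sig = 0

for _ in range(n_sims):
    # 20 subgroup/outcome comparisons, all with zero true effect.
    pvals = [
        stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
        for _ in range(n_tests)
    ]
    any_sig += min(pvals) < 0.05

print(f"null studies with >=1 'significant' result: {any_sig / n_sims:.0%}")
# Expected to be roughly 1 - 0.95**20, about 64%, hence the need for
# pre-specification and multiplicity corrections such as Bonferroni.
```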
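The differences-from-baseline row can also be sketched in code. In this hypothetical setup, both arms share the same secular improvement and the intervention has no effect; within-group tests against baseline can still come out "significant", while the correct between-group test on change scores does not:

```python
# Sketch: within-group baseline tests vs. the proper between-group test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 30
trend = -0.3  # shared improvement (e.g., seasonal), identical in both arms

base_i = rng.normal(0, 1, n)
follow_i = base_i + trend + rng.normal(0, 0.5, n)  # intervention arm
base_c = rng.normal(0, 1, n)
follow_c = base_c + trend + rng.normal(0, 0.5, n)  # control arm

print(f"intervention vs baseline: p = {stats.ttest_rel(follow_i, base_i).pvalue:.4f}")
print(f"control vs baseline:      p = {stats.ttest_rel(follow_c, base_c).pvalue:.4f}")
print(f"between groups (changes): p = "
      f"{stats.ttest_ind(follow_i - base_i, follow_c - base_c).pvalue:.4f}")
# Significant within-group change says nothing about the intervention;
# only the between-group comparison addresses the causal question.
```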
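For the equivalence row, this sketch contrasts a plain difference test with the two one-sided tests (TOST) procedure. The ±0.2 equivalence margin and all data are invented for illustration:

```python
# Sketch: "no significant difference" vs. a formal equivalence (TOST) test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a = rng.normal(0.0, 1.0, 25)  # hypothetical change scores, intervention A
b = rng.normal(0.3, 1.0, 25)  # hypothetical change scores, intervention B

margin = 0.2  # hypothetical pre-specified equivalence margin
p_diff = stats.ttest_ind(a, b).pvalue
p_lower = stats.ttest_ind(a + margin, b, alternative="greater").pvalue  # H0: diff <= -margin
p_upper = stats.ttest_ind(a - margin, b, alternative="less").pvalue     # H0: diff >= +margin
p_tost = max(p_lower, p_upper)

print(f"difference test:  p = {p_diff:.3f}")
print(f"TOST equivalence: p = {p_tost:.3f}")
# Typically the difference test is non-significant here while TOST also
# fails: absence of a significant difference is not evidence of equivalence.
```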
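Finally, for the one-sided-testing row, a tiny example (hypothetical data) shows the mechanics: when the test statistic points in the favored direction, the one-sided p-value is exactly half the two-sided one, which is why switching to it after seeing the data inflates the type I error rate:

```python
# Sketch: post hoc switch from two-sided to one-sided testing.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
a = rng.normal(0.4, 1.0, 40)  # hypothetical intervention change scores
b = rng.normal(0.0, 1.0, 40)  # hypothetical control change scores

p_two = stats.ttest_ind(a, b).pvalue
p_one = stats.ttest_ind(a, b, alternative="greater").pvalue
print(f"two-sided p = {p_two:.3f}")
print(f"one-sided p = {p_one:.3f}  # half of the two-sided p when t > 0")
```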