Journal of Neurophysiology
Letter
2016 Sep 1;116(3):1536–1537. doi: 10.1152/jn.00550.2016

Inadequate reporting of statistical results

Martin Héroux
PMCID: PMC5040376; PMID: 27678073

To the Editor: In the first half of 2016 I was asked to review a manuscript for the Journal of Neurophysiology. Having reviewed only a handful of times for your journal, I started my review assignment by reading the Information for Authors and Information for Reviewers sections of your website. I noted that, when preparing their manuscripts, authors are encouraged to consult an editorial entitled "Guidelines for reporting statistics in journals published by the American Physiological Society" (Curran-Everett and Benos 2004). Unfortunately, the manuscript I reviewed failed to follow several key guidelines presented in this editorial. For example, error bars in figures were standard errors of the mean rather than standard deviations, and the majority of reported P values were not exact, except when they neared significance, in which case they were stated explicitly (e.g., P = 0.067) and considered statistically significant. Such practices are not uncommon. In fact, these and other questionable statistical practices have become so prevalent that numerous articles have been written to try to educate scientists and motivate change (e.g., Cumming 2013; Drummond and Tom 2011; Drummond and Vowler 2011; Halsey et al. 2015; Nakagawa and Cuthill 2007; Tressoldi et al. 2013). Despite these efforts, many published papers continue to suffer from poor statistical reporting. Because the Journal of Neurophysiology has clear guidelines, I was interested in the prevalence of such papers in its pages. To this end, I audited all research papers published in 2015 for the presence of the three easy-to-identify questionable reporting practices I noted in my initial review (see Supplemental Material for data and analysis details, available online at the Journal of Neurophysiology website). As you will see, the results are alarming.

Authors often prefer reporting the standard error of the mean because it is smaller than the standard deviation. However, the standard error of the mean is rarely, if ever, the appropriate statistic (Cumming 2013; Curran-Everett and Benos 2004). Nevertheless, of the 278 research papers published in 2015 with error bars in figures, 65% reported the standard error of the mean. Worse yet, 12.5% of papers had undefined error bars. Only 20% of papers reported standard deviations (Table 1).
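The arithmetic behind this preference can be made concrete. As a minimal sketch (the sample values are hypothetical, for illustration only), the standard error of the mean is the standard deviation divided by the square root of the sample size, so it is always the smaller of the two for n > 1:

```python
import math
import statistics

# Hypothetical sample of 16 measurements (illustrative values only).
sample = [4.1, 3.8, 5.0, 4.4, 4.7, 3.9, 4.2, 4.6,
          4.3, 4.8, 4.0, 4.5, 3.7, 4.9, 4.4, 4.1]

sd = statistics.stdev(sample)       # sample standard deviation: spread of the data
sem = sd / math.sqrt(len(sample))   # standard error of the mean = SD / sqrt(n)

# SEM shrinks as n grows, so SEM error bars always look tighter than SD
# error bars, even though the variability in the data is unchanged.
print(f"SD  = {sd:.3f}")
print(f"SEM = {sem:.3f}")
```

The standard deviation describes the variability of the observations themselves, which is usually what readers need to judge the data; the standard error describes only the precision of the estimated mean.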

Table 1.

Audit results of statistical reporting practices for papers published in 2015

                         Papers, n/total    Papers, %
Error bars
    SE                        178/278         65.4
    SD                         55/278         20.2
    95% CI                     17/278          6.2
    IQR                        21/278          7.7
    Undefined                  34/278         12.5
Statistics
    Exact                     160/274         58.4
    Not exact                 114/274         41.6
0.1 > P > 0.05
    Trend/significant          42/74          56.8
    Not significant            32/74          43.2

n, No. of papers meeting indicated reporting practice; SE, standard error of the mean; SD, standard deviation; 95% CI, 95% confidence interval; IQR, median and interquartile range.

The P value has a long history, as does its misinterpretation (Cohen 1994; Greenland et al. 2016). Furthermore, similar to other research areas, neuroscience is plagued by low statistical power that reduces the probability of finding true effects, increases the rate of false discoveries, and exaggerates the size of reported effects (Button et al. 2013). Thus P values have been called fickle (Halsey et al. 2015). Nevertheless, when they are reported, P values should be exact (e.g., P = 0.038) rather than general (e.g., P < 0.05; Curran-Everett and Benos 2004). Of the 274 research papers published in the Journal of Neurophysiology in 2015 that included P values, 42% reported general P values. More worrisome, of the 74 papers with P values between 0.05 and 0.1, more than half interpreted these as statistical trends or statistically significant (Table 1).
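The guideline above is simple to follow mechanically. The helper below is a hypothetical illustration (the function name and the three-decimal floor are my assumptions, not part of the APS guidelines) of reporting exact P values while capping very small ones at a sensible floor:

```python
def report_p(p, digits=3):
    """Format a P value exactly (e.g., 'P = 0.038') rather than as a
    threshold (e.g., 'P < 0.05'). Values below the smallest displayable
    increment are reported against that floor (e.g., 'P < 0.001')."""
    floor = 10 ** -digits
    if p < floor:
        return f"P < {floor}"
    return f"P = {p:.{digits}f}"

print(report_p(0.038))   # → P = 0.038
print(report_p(0.0004))  # → P < 0.001
```

Note that the helper makes no claim about significance: P = 0.067 would simply be reported as P = 0.067, not promoted to a "trend."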

The pressure to publish is ever increasing, and it plays a key role in the natural selection of bad science (Smaldino and McElreath 2016). Because clean, significant results are easier to publish, it is understandable why authors may choose to discuss nonsignificant results and favor the standard error of the mean, even if these practices are wrong. Fortunately, experts in the field of statistics have provided us with simple, implementable guidelines (e.g., Button et al. 2013; Cumming 2013; Curran-Everett and Benos 2004; Halsey et al. 2015; Nakagawa and Cuthill 2007). Unfortunately, such guidelines are often ignored (Sedlmeier and Gigerenzer 1989; Tressoldi et al. 2013). Researchers, reviewers, and editors are already overworked; who has the time to ensure authors comply with guidelines? But not to strive to adhere to these guidelines is to accept the current state of affairs, which, as this audit highlights, is far from ideal. I sincerely hope this letter serves as a catalyst for an open discussion of statistical reporting practices in the Journal of Neurophysiology.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author.

AUTHOR CONTRIBUTIONS

M.E.H. conception and design of research; M.E.H. performed experiments; M.E.H. analyzed data; M.E.H. interpreted results of experiments; M.E.H. prepared figures; M.E.H. drafted manuscript; M.E.H. edited and revised manuscript; M.E.H. approved final version of manuscript.

Supplementary Material

Data Processing
Data_Processing.html (34.7KB, html)
Data
Data.txt (7.6KB, txt)

REFERENCES

  1. Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafò MR. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14: 365–376, 2013.
  2. Cohen J. The earth is round (p < .05). Am Psychol 49: 997–1003, 1994.
  3. Cumming G. The new statistics: why and how. Psychol Sci 25: 7–29, 2013.
  4. Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society. Am J Physiol Regul Integr Comp Physiol 287: R247–R249, 2004.
  5. Drummond GB, Tom BD. Presenting data: can you follow a recipe? J Physiol 589: 5007–5011, 2011.
  6. Drummond GB, Vowler SL. Show the data, don't conceal them. J Physiol 589: 1861–1863, 2011.
  7. Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, Altman DG. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol 31: 1–14, 2016.
  8. Halsey LG, Curran-Everett D, Vowler SL, Drummond GB. The fickle P value generates irreproducible results. Nat Methods 12: 179–185, 2015.
  9. Nakagawa S, Cuthill IC. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol Rev Camb Philos Soc 82: 591–605, 2007.
  10. Sedlmeier P, Gigerenzer G. Do studies of statistical power have an effect on the power of studies? Psychol Bull 105: 309–316, 1989.
  11. Smaldino PE, McElreath R. The natural selection of bad science. arXiv:1605.09511 [physics.soc-ph], 2016.
  12. Tressoldi PE, Giofré D, Sella F, Cumming G. High impact = high statistical standards? Not necessarily so. PLoS One 8: e56180, 2013.
