
Reporting matters: Brain mapping with transcranial magnetic stimulation

Martin E. Héroux


Transcranial magnetic stimulation (TMS) allows researchers to noninvasively probe the human brain. In a recent issue of Human Brain Mapping, Massé‐Alarie, Bergin, Schneider, Schabrun, and Hodges (2017) used this technique to investigate the task‐specific organization of the primary motor cortex for control of human forearm muscles. Specifically, TMS was used to create cortical topographical maps of four forearm muscles at rest and, for one of these muscles, during isometric wrist extension and isometric grip. The authors were interested in how these maps differ between muscles, how they overlap, and how they change with different motor tasks. Key to their approach was the use of indwelling fine‐wire electrodes to record motor evoked potentials elicited by magnetic stimulation, which revealed that the size of cortical maps is grossly overestimated when evoked potentials are recorded from electrodes placed on the skin surface.

In their paper, Massé‐Alarie et al. (2017) set (statistical) significance at p < 0.05. Yet the authors interpret several p values above this threshold as statistical trends. Post hoc analyses were even performed for main effects that were not statistically significant:

“Although narrowly missing significance, there was a tendency for a main effect of Pairs of tasks for percentage of MEP peak overlap (F (1, 13) = 3.30; p = 0.053), which was explained by a tendency towards a greater percentage peak overlap in Rest‐Ext (32.1 ± 9.6%) than Rest‐Grip (14.3 ± 8.2%; p = 0.02); Fig. 4B), and by a tendency toward a greater percent peak overlap for Grip‐Ext (26.8 ± 9.7%) compared to Rest‐Grip (p = 0.09).” p. 6126

“Finally, although nonsignificant, there was a tendency toward a main effect for muscle pairs for percentage of MEP peak overlap (F(2, 18) = 3.23; p = 0.06). This was explained by a tendency toward a larger overlap between ECRBfw‐surf (43.5 ± 8.7%) than ECRBfw‐EDC (25.2 ± 7.5%; post hoc p = 0.03; Fig. 6C).” p. 6126‐6127

The authors later discuss these effects as if they were statistically significant:

“The findings that peak overlap was larger between ECRBsurf and ECRBfw than for ECRBfw and EDC implies that […].” p. 6130

Implicit in this type of interpretation is the assumption that statistical trends reflect real effects. However, additional data are more likely than not to turn a trend into a nonsignificant result (Wood, Freemantle, King, & Nazareth, 2014). Is this a problem? In this instance the spin is blatant and an astute reader can draw their own conclusions. However, spin in various forms is so common (Bero, 2018; Chiu, Grundy, & Bero, 2017; Héroux, 2016) that readers and reviewers may be wooed by such biased interpretations. Why do I say biased? The authors are lenient in only one direction. In their paper, Massé‐Alarie et al. (2017) report 11 p values that fall between 0.02 and 0.05. Why are these not reported as tending toward nonsignificance? Regardless of whether the p values were just above or just below the threshold of p = 0.05, an exact replication (i.e., same sample size and methods) of this study has only a 50% chance of reproducing these statistically significant (or near significant) effects (Button et al., 2013; Forstmeier, Wagenmakers, & Parker, 2017). As recently pointed out, p values are fickle (Cumming, 2014; Halsey, Curran‐Everett, Vowler, & Drummond, 2015), especially when sample size is small (Button et al., 2013; Higginson & Munafò, 2016). Thus, how confident should we be about the results of Massé‐Alarie et al. (2017)? Looking back, how confident should we be about our own work? This type of nuanced view is not common, but we need more of it, especially in noninvasive brain stimulation, where many published effects are simply not reproducible (Héroux, Taylor, & Gandevia, 2015; Héroux, Loo, Taylor, & Gandevia, 2017).
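To make concrete how fickle p values are when samples are small, consider the following simulation sketch (Python). The sample size of 14 matches the degrees of freedom reported above; the assumed true effect size of 0.6 and the unit within‐subject variability are illustrative values of my choosing, not estimates taken from Massé‐Alarie et al. (2017).

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

n = 14               # participants, consistent with the F(1, 13) reported above
true_effect = 0.6    # assumed standardized effect size (illustrative only)
n_replications = 10_000

p_values = np.empty(n_replications)
for i in range(n_replications):
    # Simulate paired differences with mean = true_effect and SD = 1,
    # then test them against zero with a one-sample t test.
    diffs = rng.normal(loc=true_effect, scale=1.0, size=n)
    p_values[i] = stats.ttest_1samp(diffs, popmean=0.0).pvalue

print(f"Replications with p < 0.05: {np.mean(p_values < 0.05):.0%}")
print(f"Middle 90% of p values: {np.percentile(p_values, [5, 95]).round(3)}")

Under these assumptions only about half of the simulated exact replications reach p < 0.05, and the individual p values range roughly from below 0.001 to above 0.5, which is the sense in which a single p value near the 0.05 threshold carries little evidential weight.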

Another reporting matter in the paper by Massé‐Alarie et al. (2017) is the all-too-common use of the standard error of the mean (SEM) to summarize data variability (Héroux, 2016; Héroux et al., 2017; Weissgerber, Milic, Winham, & Garovic, 2015). This is not what the SEM quantifies: it reflects the precision of an estimated mean, not the spread of the underlying data. But does it actually matter which measure is reported? Experts think so (Curran‐Everett & Benos, 2004), as do I. Here are some of the above results reported with standard deviations:

“[…] which was explained by a tendency towards a greater percentage peak overlap in Rest‐Ext (32.1 ± 35.9%) than Rest‐Grip (14.3 ± 31.1%; p = 0.02); Fig. 4B), and by a tendency toward a greater percent peak overlap for Grip‐Ext (26.8 ± 36.3%) compared to Rest‐Grip (p = 0.09).”

Given that percentages are bound between 0 and 100, what does 14.3 ± 31.1% actually mean? What do the underlying data look like? Reporting results with standard deviations or other appropriate measures of variability does not affect statistical tests (a significant result will remain significant), so let us not be afraid of them. Such measures also give the reader a better sense of the underlying data, which is important for appropriately interpreting study results and figures (Belia, Fidler, Williams, & Cumming, 2005; Curran‐Everett & Benos, 2004; Drummond & Vowler, 2011).
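For readers who want to check the arithmetic, here is a minimal sketch of how the standard deviations quoted above can be recovered from the published means and SEMs. It assumes the reported values are mean ± SEM and that n = 14, inferred from the F(1, 13) degrees of freedom.

import math

n = 14  # assumed sample size, inferred from the reported F(1, 13)

# Percentage of MEP peak overlap, reported as (mean, SEM) in the original paper
reported = {
    "Rest-Ext": (32.1, 9.6),
    "Rest-Grip": (14.3, 8.2),
    "Grip-Ext": (26.8, 9.7),
}

for pair, (mean, sem) in reported.items():
    sd = sem * math.sqrt(n)  # SD = SEM * sqrt(n)
    print(f"{pair}: {mean:.1f} ± {sd:.1f}% (mean ± SD)")

With these numbers the SDs come out to roughly 35.9, 30.7, and 36.3%, close to the values in the rewritten quote; small discrepancies presumably reflect rounding of the published SEMs. The point stands either way: an SD larger than the mean for a quantity bounded at 0% signals a skewed or highly dispersed distribution that a mean ± SEM summary hides.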

Exploratory research is important for identifying new avenues of inquiry and testing new hypotheses, and the paper by Massé‐Alarie et al. (2017) raises many interesting questions about how the human primary motor cortex is organized. Nevertheless, I encourage the authors and others in the field to be mindful when reporting and interpreting study results, especially when sample sizes are relatively small. Let us heed the advice of experts and, as a field, strive toward publishing research that is less biased, and more reproducible and transparent.

Héroux ME. Reporting matters: Brain mapping with transcranial magnetic stimulation. Hum Brain Mapp. 2019;40:352–353. doi: 10.1002/hbm.24371

REFERENCES

1. Belia, S., Fidler, F., Williams, J., & Cumming, G. (2005). Researchers misunderstand confidence intervals and standard error bars. Psychological Methods, 10, 389–396.
2. Bero, L. (2018). Meta‐research matters: Meta‐spin cycles, the blindness of bias, and rebuilding trust. PLoS Biology, 16, e2005972.
3. Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376.
4. Chiu, K., Grundy, Q., & Bero, L. (2017). 'Spin' in published biomedical literature: A methodological systematic review. PLoS Biology, 15, e2002173.
5. Cumming, G. (2014). Understanding the new statistics: Effect sizes, confidence intervals, and meta‐analysis. New York, NY: Routledge.
6. Curran‐Everett, D., & Benos, D. J. (2004). Guidelines for reporting statistics in journals published by the American Physiological Society. Journal of Applied Physiology, 97, 457–459.
7. Drummond, G. B., & Vowler, S. L. (2011). Show the data, don't conceal them. Journal of Physiology, 589, 1861–1863.
8. Forstmeier, W., Wagenmakers, E. J., & Parker, T. H. (2017). Detecting and avoiding likely false‐positive findings – A practical guide. Biological Reviews of the Cambridge Philosophical Society, 92, 1941–1968.
9. Halsey, L. G., Curran‐Everett, D., Vowler, S. L., & Drummond, G. B. (2015). The fickle P value generates irreproducible results. Nature Methods, 12, 179–185.
10. Héroux, M. E. (2016). Inadequate reporting of statistical results. Journal of Neurophysiology, 116, 1536–1537.
11. Héroux, M. E., Loo, C. K., Taylor, J. L., & Gandevia, S. C. (2017). Questionable science and reproducibility in electrical brain stimulation research. PLoS One, 12, e0175635.
12. Héroux, M. E., Taylor, J. L., & Gandevia, S. C. (2015). The use and abuse of transcranial magnetic stimulation to modulate corticospinal excitability in humans. PLoS One, 10, e0144151.
13. Higginson, A. D., & Munafò, M. R. (2016). Current incentives for scientists lead to underpowered studies with erroneous conclusions. PLoS Biology, 14, e2000995.
14. Massé‐Alarie, H., Bergin, M. J. G., Schneider, C., Schabrun, S., & Hodges, P. W. (2017). "Discrete peaks" of excitability and map overlap reveal task‐specific organization of primary motor cortex for control of human forearm muscles. Human Brain Mapping, 38, 6118–6132.
15. Weissgerber, T. L., Milic, N. M., Winham, S. J., & Garovic, V. D. (2015). Beyond bar and line graphs: Time for a new data presentation paradigm. PLoS Biology, 13, e1002128.
16. Wood, J., Freemantle, N., King, M., & Nazareth, I. (2014). Trap of trends to statistical significance: Likelihood of near significant P value becoming more significant with extra data. British Medical Journal, 348, g2215.
