In the early 2010s, the term “replication crisis” and its synonyms (“replicability crisis” and “reproducibility crisis”) were coined to describe growing concern that published research results too often cannot be replicated, potentially undermining scientific progress [1]. Many psychologists have studied this problem, contributing groundbreaking work that has produced numerous articles and several journal Special Issues, with titles such as “Replicability in Psychological Science: A Crisis of Confidence?”, “Reliability and replication in cognitive and affective neuroscience research”, “Replications of Important Results in Social Psychology”, “Building a cumulative psychological science”, and “The replication crisis: Implications for linguistics” [1,2,3,4,5]. Researchers in the field of brain imaging, which often dovetails with psychology, have also published numerous works on the subject. Brain imaging organizations have become staunch supporters of efforts to address the problem, among them the Stanford Center for Reproducible Neuroscience and the Organization for Human Brain Mapping (OHBM); the OHBM has created an annual award for the best replication study [6] and regularly features informative events on the replication crisis and Open Science at its annual meetings [3,7]. The purpose of the Brain Sciences Special Issue “The Brain Imaging Replication Crisis” is to provide a forum for discussion of this replication crisis in light of the special challenges posed by brain imaging.
In his widely cited article “Why most published research findings are false”, John Ioannidis convincingly argues that most published findings are indeed false, with relatively few exceptions [8,9,10]. He supports this claim using Bayes’ theorem and some reasonable assumptions about published research findings. It follows from Bayes’ theorem that when a hypothesis test is positive, the probability that the study finding is true (the positive predictive value, PPV) depends on three variables: the α-level for statistical significance (where α is the probability of a positive test, given that the hypothesis is false), the power of the study (1 − β, where β is the probability of a negative test, given that the hypothesis is true), and the pre-study odds that the hypothesis is true (R, the ratio of the probability that the hypothesis is true to the probability that it is false). This relationship is expressed by the equation PPV = R(1 − β)/[α + R(1 − β)]. From this equation, it follows that a hypothesis will more likely than not be false, even after a positive test, whenever R(1 − β) < α, and thus whenever R < α. This situation applies to fields where tested hypotheses are seldom true, which could in part explain the low replication rates observed in cancer studies [11,12]. It also follows that when the study power is equal to α, the probability that the hypothesis is true remains the same as it was before the test; inadequately powered studies therefore lack the capacity to advance our confidence in the tested hypotheses. The PPV can also be reduced by sources of bias that elevate the actual value of α above its nominal value, for example, when publication bias [13,14] causes only the positive studies of a given hypothesis to be published. Because the published p-values are then not corrected for the multiple comparisons represented by the unpublished negative studies, the actual false-positive rate is much higher than the published p-values suggest.
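To make the arithmetic concrete, the following minimal sketch (ours, not part of Ioannidis’ article; the function name and example values are chosen purely for illustration) computes the PPV from the equation above and reproduces the three cases just described.

```python
# Minimal sketch of the PPV relationship: PPV = R(1 - beta) / [alpha + R(1 - beta)].
# Function name and example values are illustrative assumptions, not from the article.

def ppv(alpha: float, power: float, R: float) -> float:
    """Probability that a positive finding is true, given the significance level
    (alpha), the study power (1 - beta), and the pre-study odds (R)."""
    return R * power / (alpha + R * power)

# A well-powered test of a plausible hypothesis yields a high PPV.
print(ppv(alpha=0.05, power=0.80, R=1.0))    # ~0.94
# Pre-study odds below alpha: the finding is probably false despite the positive test.
print(ppv(alpha=0.05, power=0.80, R=0.04))   # ~0.39
# Power equal to alpha: the PPV equals the prior probability, R / (1 + R).
print(ppv(alpha=0.05, power=0.05, R=0.25))   # 0.20
```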
Academic incentives to publish “interesting” findings in high-impact journals can further bias research towards the production of spurious, false-positive findings through multiple mechanisms [15]. Simmons et al. [16] demonstrated with computer simulations how four common sources of flexibility in research methods and data analyses allowed the actual false-positive rate to be inflated, via so-called p-hacking [14,17], from a nominal 0.05 to as high as 0.61. Researchers incentivized to find their anticipated results might be biased towards choosing methods that yield those results [18]. In the same vein, methodological errors [19] might be found less frequently when they support the anticipated results. Additionally, after seeing the results of a study, researchers might be inclined to revise their original hypotheses to match the observed data, so-called HARKing (hypothesizing after the results are known) [20].
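As an illustration of this mechanism, the toy Monte Carlo sketch below (our own, not the simulations of Simmons et al.) shows that merely reporting whichever of several measured outcomes happens to reach significance already pushes the effective false-positive rate well above the nominal 5%.

```python
# Toy simulation: two groups drawn from identical distributions (no true effect).
# Testing several outcomes and reporting any that is "significant" inflates the
# effective false-positive rate far above the nominal alpha of 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, n_outcomes, alpha = 5000, 20, 4, 0.05

false_positive_runs = 0
for _ in range(n_sims):
    a = rng.normal(size=(n_per_group, n_outcomes))   # group 1, no real effect
    b = rng.normal(size=(n_per_group, n_outcomes))   # group 2, no real effect
    p_values = stats.ttest_ind(a, b, axis=0).pvalue  # one t-test per outcome
    if (p_values < alpha).any():                     # report "the" significant outcome
        false_positive_runs += 1

print(f"Effective false-positive rate: {false_positive_runs / n_sims:.2f}")  # ~0.19, not 0.05
```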
To counteract these deleterious academic incentives, Serra-Garcia and Gneezy [21] proposed disincentives for the publication of nonreplicable research findings. One problem with this approach is that identifying such findings can take years and considerable research resources. Another is that the replicability of findings is not necessarily a good measure of study quality. High-quality studies have the capacity to sift replicable from nonreplicable hypotheses: confirmatory studies provide a higher margin of certainty for hypotheses already considered likely to be true, and exploratory studies identify promising candidates for further research, some of which will inevitably prove nonreplicable. Conversely, a positive study of low quality, with no capacity to separate true from false hypotheses, could prove replicable if the tested hypothesis happened to be true.
Determining which hypotheses are replicable can be especially challenging in the field of brain imaging, where many experiments lack the power to detect the sought-after differences in neural activity because limited measurement reliability is compounded by cost considerations that constrain sample sizes [22,23,24,25,26,27,28,29,30]. Even so, the countless analysis pipelines that can be assembled from available methods can supply the p-value needed to support practically any hypothesis [31,32]. HARKing also reliably yields positive findings that can appear confirmatory. For example, when functional connectivity (FC) is used to study brain differences between two groups that differ clinically in some way, one recipe for “success” is the following: (1) divide the brain into ~100 regions and compute the FC between each pair of regions, yielding ~5000 region pairs, of which ~250 can be expected to show significantly differing FC between the two groups by chance alone at α = 0.05; (2) select one of those significant pairs that happens to correspond to existing findings in the literature related to the studied clinical group differences; (3) write the paper as if the selected pair had been the only pair of interest, based on the literature search, thereby giving the appearance that the study confirms an expected finding.
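A short null simulation (our own illustrative sketch, not an analysis from any cited study) makes the arithmetic behind step (1) explicit: with ~100 regions and no true group differences at all, roughly 250 of the ~5000 region pairs will still cross the α = 0.05 threshold.

```python
# Illustrative null simulation: ~100 regions give ~5000 region pairs, and with no
# true group differences about 5% of them are "significant" at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_regions, n_subjects = 100, 30
n_pairs = n_regions * (n_regions - 1) // 2  # 4950 unique region pairs

# Null FC data: both groups drawn from the same distribution for every pair.
fc_group1 = rng.normal(size=(n_subjects, n_pairs))
fc_group2 = rng.normal(size=(n_subjects, n_pairs))

p_values = stats.ttest_ind(fc_group1, fc_group2, axis=0).pvalue
n_significant = int((p_values < 0.05).sum())
print(f"{n_significant} of {n_pairs} pairs 'significant' by chance")  # ~250
```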
What can improve the replicability of research results? Theoretical considerations can help to sift likely from unlikely hypotheses even before testing begins [33]. Judicious study design can improve power. Perhaps the most efficient means of improving replicability are those that keep the actual false-positive rate from rising above its nominal value. The preregistration of study hypotheses and methods [3,7] can prevent p-hacking and HARKing, provided that the methods are specified in enough detail to eliminate flexibility in data collection and analysis. A detailed specification of methods in published articles allows other researchers to reproduce published studies and, if the study data and software are also made available, to double-check the authors’ work. Many organizations now provide tools that facilitate such preregistration of studies and storage of data and software. The Center for Open Science [34,35], for example, is a well-funded nonprofit organization that provides these services at little to no cost to researchers.
We welcome the submission of papers contributing further ideas for how to address the replication crisis, including replication studies and papers describing refinements of brain imaging methods that improve study power. Also welcome are examples of excellent study quality involving (1) preregistration with methods detailed enough to allow unambiguous reproduction of the study and (2) availability of data and software, if feasible. Please feel free to contact the guest editor (R.E.K.) to discuss a planned study, to learn whether it would be considered suitable for publication, and, if not, how to make it so.
Author Contributions
Conceptualization, R.E.K.J. and M.J.H.; writing—original draft preparation, R.E.K.J. and M.J.H.; writing—review and editing, R.E.K.J. and M.J.H.; visualization, R.E.K.J.; supervision, R.E.K.J.; project administration, R.E.K.J. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
No human or animal data were used for this article.
Conflicts of Interest
The authors declare no conflict of interest.
Footnotes
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Pashler H., Wagenmakers E.J. Editors’ Introduction to the Special Section on Replicability in Psychological Science: A Crisis of Confidence? Perspect. Psychol. Sci. 2012;7:528–530. doi: 10.1177/1745691612465253.
2. Barch D.M., Yarkoni T. Introduction to the special issue on reliability and replication in cognitive and affective neuroscience research. Cogn. Affect. Behav. Neurosci. 2013;13:687–689. doi: 10.3758/s13415-013-0201-7.
3. Nosek B.A., Lakens D. Registered reports: A method to increase the credibility of published results. Soc. Psychol. 2014;45:137–141. doi: 10.1027/1864-9335/a000192.
4. Sharpe D., Goghari V.M. Building a cumulative psychological science. Can. Psychol. 2020;61:269–272. doi: 10.1037/cap0000252.
5. Sönning L., Werner V. The replication crisis, scientific revolutions, and linguistics. Linguistics. 2021;59:1179–1206. doi: 10.1515/ling-2019-0045.
6. Gorgolewski K.J., Nichols T., Kennedy D.N., Poline J.B., Poldrack R.A. Making replication prestigious. Behav. Brain Sci. 2018;41:e131. doi: 10.1017/S0140525X18000663.
7. Nosek B.A., Alter G., Banks G.C., Borsboom D., Bowman S.D., Breckler S.J., Buck S., Chambers C.D., Chin G., Christensen G., et al. Promoting an open research culture. Science. 2015;348:1422–1425. doi: 10.1126/science.aab2374.
8. Ioannidis J.P.A. Why Most Published Research Findings Are False. PLoS Med. 2005;2:e124. doi: 10.1371/journal.pmed.0020124.
9. Ioannidis J.P.A. Discussion: Why “An estimate of the science-wise false discovery rate and application to the top medical literature” is false. Biostatistics. 2014;15:28–36. doi: 10.1093/biostatistics/kxt036.
10. Jager L.R., Leek J.T. An estimate of the science-wise false discovery rate and application to the top medical literature. Biostatistics. 2014;15:1–12. doi: 10.1093/biostatistics/kxt007.
11. Begley C., Ellis L. Raise standards for preclinical cancer research. Nature. 2012;483:531–533. doi: 10.1038/483531a.
12. Prinz F., Schlange T., Asadullah K. Believe it or not: How much can we rely on published data on potential drug targets? Nat. Rev. Drug Discov. 2011;10:712–713. doi: 10.1038/nrd3439-c1.
13. Young N.S., Ioannidis J.P.A., Al-Ubaydli O. Why Current Publication Practices May Distort Science. PLoS Med. 2008;5:1418–1422. doi: 10.1371/journal.pmed.0050201.
14. Brodeur A., Cook N., Heyes A. Methods Matter: P-Hacking and Publication Bias in Causal Analysis in Economics. Am. Econ. Rev. 2020;110:3634–3660. doi: 10.1257/aer.20190687.
15. Nosek B.A., Spies J.R., Motyl M. Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth over Publishability. Perspect. Psychol. Sci. 2012;7:615–631. doi: 10.1177/1745691612459058.
16. Simmons J.P., Nelson L.D., Simonsohn U. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 2011;22:1359–1366. doi: 10.1177/0956797611417632.
17. Head M.L., Holman L., Lanfear R., Kahn A.T., Jennions M.D. The Extent and Consequences of P-Hacking in Science. PLoS Biol. 2015;13:e1002106. doi: 10.1371/journal.pbio.1002106.
18. Giner-Sorolla R. Science or Art? How Aesthetic Standards Grease the Way Through the Publication Bottleneck but Undermine Science. Perspect. Psychol. Sci. 2012;7:562–571. doi: 10.1177/1745691612457576.
19. Vul E., Harris C., Winkielman P., Pashler H. Puzzlingly High Correlations in fMRI Studies of Emotion, Personality, and Social Cognition. Perspect. Psychol. Sci. 2009;4:274–290. doi: 10.1111/j.1745-6924.2009.01125.x.
20. Kerr N.L. HARKing: Hypothesizing after the results are known. Personal. Soc. Psychol. Rev. 1998;2:196–217. doi: 10.1207/s15327957pspr0203_4.
21. Serra-Garcia M., Gneezy U. Nonreplicable publications are cited more than replicable ones. Sci. Adv. 2021;7:eabd1705. doi: 10.1126/sciadv.abd1705.
22. Mumford J.A., Nichols T.E. Power calculation for group fMRI studies accounting for arbitrary design and temporal autocorrelation. Neuroimage. 2008;39:261–268. doi: 10.1016/j.neuroimage.2007.07.061.
23. Button K.S., Ioannidis J.P.A., Mokrysz C., Nosek B.A., Flint J., Robinson E.S.J., Munafò M.R. Power failure: Why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 2013;14:365–376. doi: 10.1038/nrn3475.
24. Carp J. The secret lives of experiments: Methods reporting in the fMRI literature. Neuroimage. 2012;63:289–300. doi: 10.1016/j.neuroimage.2012.07.004.
25. Elliott M.L., Knodt A.R., Ireland D., Morris M.L., Poulton R., Ramrakha S., Sison M.L., Moffitt T.E., Caspi A., Hariri A.R. What Is the Test-Retest Reliability of Common Task-Functional MRI Measures? New Empirical Evidence and a Meta-Analysis. Psychol. Sci. 2020;31:792–806. doi: 10.1177/0956797620916786.
26. Geuter S., Qi G., Welsh R.C., Wager T.D., Lindquist M.A. Effect Size and Power in fMRI Group Analysis. bioRxiv. 2018. doi: 10.1101/295048.
27. Turner B.O., Paul E.J., Miller M.B., Barbey A.K. Small sample sizes reduce the replicability of task-based fMRI studies. Commun. Biol. 2018;1:62. doi: 10.1038/s42003-018-0073-z.
28. Masouleh S.K., Eickhoff S.B., Hoffstaedter F., Genon S. Empirical examination of the replicability of associations between brain structure and psychological variables. Elife. 2019;8:e43464. doi: 10.7554/eLife.43464.
29. Noble S., Scheinost D., Constable R.T. A guide to the measurement and interpretation of fMRI test-retest reliability. Curr. Opin. Behav. Sci. 2021;40:27–32. doi: 10.1016/j.cobeha.2020.12.012.
30. Szucs D., Ioannidis J.P. Sample size evolution in neuroimaging research: An evaluation of highly-cited studies (1990–2012) and of latest practices (2017–2018) in high-impact journals. Neuroimage. 2020;221:117164. doi: 10.1016/j.neuroimage.2020.117164.
31. Botvinik-Nezer R., Holzmeister F., Camerer C.F., Dreber A., Huber J., Johannesson M., Kirchler M., Iwanir R., Mumford J.A., Adcock R.A., et al. Variability in the analysis of a single neuroimaging dataset by many teams. Nature. 2020;582:84–88. doi: 10.1038/s41586-020-2314-9.
32. Bowring A., Maumet C., Nichols T.E. Exploring the impact of analysis software on task fMRI results. Hum. Brain Mapp. 2019;40:3362–3384. doi: 10.1002/hbm.24603.
33. Kelly R.E., Jr., Ahmed A.O., Hoptman M.J., Alix A.F., Alexopoulos G.S. The Quest for Psychiatric Advancement through Theory, beyond Serendipity. Brain Sci. 2022;12:72. doi: 10.3390/brainsci12010072.
34. Nosek B. Center for Open Science: Strategic Plan. Center for Open Science; Charlottesville, VA, USA: 2017.
35. Foster E.D., Deardorff A. Open Science Framework (OSF). J. Med. Libr. Assoc. 2017;105:203. doi: 10.5195/jmla.2017.88.