Abstract
Accumulating evidence suggests that many findings in psychological science and cognitive neuroscience may prove difficult to reproduce; statistical power in brain imaging studies is low and has not improved in recent years; errors in widely used analysis software are common and can go undetected for many years; and, a few large-scale studies notwithstanding, open sharing of data, code, and materials remains the rare exception. At the same time, there is a renewed focus on reproducibility, transparency, and openness as essential core values in cognitive neuroscience. The emergence and rapid growth of data archives, meta-analytic tools, software pipelines, and research groups devoted to improved methodology reflect this new sensibility. We review evidence that the field has begun to embrace new open research practices and illustrate how these can begin to address problems of reproducibility, statistical power, and transparency in ways that will ultimately accelerate discovery.
Keywords: open science, reproducibility, data sharing
Introduction
Most cognitive neuroscientists seek answers to questions about what patterns of neural activity underlie perception, thinking, memory, and action, among other topics. In answering these questions, we marshal evidence from studies of human and animal behavior, nervous system structure and activity, the effects of endogenous and exogenous substances, patterns of disorder and disease, and trajectories of change across the lifespan. Our common aim is to reveal reliable, reproducible, and useful facts about the relationship between mind and brain. These facts depend crucially on the tools we deploy to collect and evaluate data and on how we report what we do or do not find. In this paper, we review the degree to which our field meets the scientific ideals of reproducibility, transparency, and openness.
Rigorous self-reflection and self-criticism about methodology have been core values in cognitive neuroscience for some time1–3. Efforts to foster widespread data sharing4–6 and other open research practices have long histories. What strikes us as new and important enough to merit reviewing them in 2016 are developments that will likely cheer the pessimist and the optimist alike. On the one hand, accumulating evidence suggests that many findings in psychological science may be difficult to reproduce7; statistical power in brain imaging studies is low8–10 and has not improved11 over time; software errors in analysis tools are common and can go undetected for many years12; and, a few large-scale studies and databases notwithstanding4,13–14, the open sharing of data, code, and materials is rare. On the other hand, we see a renewed focus on reaffirming reproducibility, transparency, and openness as essential core values in psychological science and related fields7,15–19. This reinvigorated focus has begun to force greater clarity about what these values mean in practice20. We find that the emergence and rapid growth of data archives, meta-analytic tools, software pipelines, and research groups devoted to improved methodology are genuine reasons for optimism about the future of an open, transparent, and reproducible cognitive neuroscience.
In the sections that follow, we discuss definitions of open science practices and why they might be important for the field. We then review some of the history of these practices, discuss a range of recent developments, and offer some speculations about what the near future might hold.
History of open science practices in cognitive neuroscience
What are open science practices? What does it mean to reproduce or replicate a study? Most researchers agree that discovering robust and generalizable findings is central to the scientific enterprise1–2, 16, 19, 21, but what evidence determines success or failure in meeting the ideal? In a previous paper in this journal, Bennett and Miller1 sought to assess the reliability of fMRI results and provided data about the diverse measures used to assess reliability in the functional neuroimaging literature of that time. In summarizing the results of a large sample of studies reporting measures of test/retest reliability, Bennett and Miller1 observed that no agreement existed about what constitutes acceptable reliability, nor was there consensus on what measure or measures should be used to evaluate it. Half a decade later, Goodman and colleagues20 argue that uncertainty and disagreement about the meaning of these concepts22 persists and that misunderstanding impedes progress toward solutions. In response, Goodman and colleagues suggest three new terms, which we adopt here: methods reproducibility, results reproducibility, and inferential reproducibility. Methods reproducibility means that a different investigator is able to obtain the same results when applying the same tools and analytical procedures used in a study to the same (i.e., original) dataset. Results reproducibility means that a new study with new data, collected following the original procedures as closely as possible, yields the same outcomes. Inferential reproducibility occurs when independent researchers come to similar conclusions about what patterns of data mean, based on their own replication study or a reanalysis of a prior study20. For example, Goodman et al.20 suggest that competing views about the implications of a recent high-profile study of replicability in psychology7 stem, at least in part, from a disagreement at this level23, 24.
Clearly, to achieve methods reproducibility, research practices that accurately and precisely capture essential details about methods, data, and workflows must be deployed; to achieve results reproducibility, those elements must be made openly and freely available to the scientific community; and achieving inferential reproducibility requires, among other developments, the capacity to accumulate, analyze, and interpret large quantities of data25–26 in consistent ways. Thus, openness and transparency relate directly to reproducibility of all three kinds. Reflecting this sensibility, a diverse array of behavioral scientists have begun arguing that achieving the scientific ideals of a free and open exchange of information requires the widespread adoption of open and transparent communication practices15–16, 27–30. How well has the field of cognitive neuroscience measured up to these ideals?
Methods reproducibility
Much of cognitive neuroscience research is computationally intensive, so the extent to which the field's methods are reproducible depends on whether complex computational workflows can be reliably regenerated. Whether measuring task- or non-task-related nervous system activity using EEG or fMRI, or brain structure using MRI, CT, or PET, cognitive neuroscience studies regularly generate spatially and temporally dense data streams. Seemingly minor choices made at each step of an analysis pipeline—including experimental design, data acquisition, preprocessing, analysis, and reporting—can ramify and have important implications for reproducibility.
The complexity of the typical neuroimaging pipeline is visible from the earliest stages of data acquisition. While there are only three major manufacturers of MRI scanners, the machines run different pulse sequences, and even scanners from the same manufacturer often do not run the same software. At the preprocessing stage—even prior to statistical analysis—researchers face a bewildering array of options when considering when and how (or even whether) to account for subject movement, signal spikes, differences in brain anatomy, physiological confounds, and any number of standard concerns. Statistical analysis is no less complicated, as researchers must decide what kind of analyses to conduct (mass univariate, multivariate pattern classification, etc.), what search space to use (whole-brain, specific regions of interest), what statistical contrasts and multiple comparisons correction procedures to apply, and so on. The sheer magnitude of variation in analytical approaches underscores why computational reproducibility is so critical in cognitive neuroscience and why it has historically seemed so daunting. Put simply, without the ability to understand precisely what steps a research group took, it is doubtful that anyone else could ever reproduce the procedures.
In EEG, the diversity of methods is arguably even larger than in fMRI, with numerous manufacturers and a corresponding variety of technologies using different kinds of electrodes, amplifier settings, cap configurations, and software packages31. There have been some efforts at standardization of analysis methods through the release of software packages, such as the Matlab-based EEGLAB32 and ERPLAB33. These packages have the advantage of allowing researchers to explore the data using a graphical interface while simultaneously generating an executable history script that records most of the analysis decisions. The BigEEG Consortium (www.bigeeg.org), an offshoot of the EEGLAB initiative, seeks to develop and promote data and metadata standards for EEG-based research that may eventually facilitate large-scale analysis and meta-analysis. But, by and large, EEG data collection and analysis involve equipment and workflows that vary considerably from lab to lab.
Of course, the complexity of neuroimaging data analysis is not itself the enemy. It is not the raw number of methodological and analytical choices per se that creates barriers to reproducibility; rather, the challenge lies in encoding those degrees of freedom in a standardized (and ideally, machine-readable) way. Fortunately, over time, the brain imaging community has converged on recommendations about what parameters should be reported and how34–35. Moreover, at least in fMRI, imaging data analysis software shows a significant degree of standardization. From the earliest days of human brain imaging, leading research groups in the U.S. and U.K. wrote and freely distributed analysis software. This led to the widespread adoption of common tools with similar, although not identical, algorithms. Concerns about the inferential consequences of using one tool over another have been largely alleviated by findings from Gold36 and Morgan37, and questions about the reliability of workflows using one or more tools have been addressed by Strother and colleagues38–39 and others (but see2, 12, 40). All of the major tools in common use—SPM, FSL, AFNI, and BrainVoyager—enable researchers to write scriptable workflows, built either on internal engines (BrainVoyager), widely available commercial software (MATLAB, in the case of SPM), or free/open source languages (Linux/Unix shell, Python, C/C++ for FSL and AFNI). Naturally, there are some important caveats to this seemingly rosy picture. One concern is that while existing software supports relatively standardized and automated processing workflows in principle, whether researchers actually take advantage of those features in practice is a separate matter. The number of SPM, AFNI, and BrainVoyager users who rely exclusively on automated scripting in their analysis workflows, as opposed to using more user-friendly, but inherently irreproducible, graphical interfaces, is not known. We speculate it is small.
Moreover, even in labs that do conduct fully automated analyses, the sharing or publication of the corresponding scripts or data processing pipelines remains rare40. While the differences between pipelines can be subtle40, the margin for error is also small (many published results only barely survive statistical correction), so a lack of full reporting can severely impair reproducibility.
A second caveat is that perfect reproducibility may be impossible to achieve even when a researcher is armed with all of the original data and scripts used to generate an analysis. Operating system differences, untracked differences in implicit software dependencies, and other factors can sometimes produce numerical discrepancies that, while initially small, may magnify as they cascade through a workflow to the point of introducing qualitative differences in results 41. We discuss potential solutions to this problem (e.g., containerization) later; for present purposes, we note that acknowledging the intrinsic limits of methodological reproducibility does not grant researchers license to ignore best practices in automation and code sharing.
Importantly, brain imaging data are only part of the reproducibility story in cognitive neuroscience. It is also critical to understand how to reproduce the psychological components of cognitive neuroscience studies—most notably, the experimental design and its intended relationship to the latent constructs of interest. Here, the prospects for full reproducibility have historically seemed less promising. Most experimental tasks involve the presentation of sequences of visual or auditory events and the collection of participants' behavioral responses (button presses, mouse movements or clicks, vocalizations, or eye movements) using computer programs that instantiate tasks custom-tailored by a research team to address particular questions of interest. There have been researcher-initiated efforts to develop controlled vocabularies that describe the range of cognitive tasks deployed in the literature42–43. NIH has spearheaded the creation of a standard toolbox of easily deployable tasks44 and the development of data repositories designed to capture metadata about behavioral tasks and their variants45 in standardized and searchable forms. However, such efforts notwithstanding, most cognitive neuroscience researchers employ customized tasks built using a variety of software and scripting environments (e.g., E-Prime, the Matlab-based Psychophysics Toolbox, PsychoPy, and DMDX). Tasks use customized image and sound components, and researchers rarely share the code, image, or sound files used in experimental tasks40. These practices limit the reproducibility of behavioral measures used in cognitive neuroscience and psychology as a whole7. Of course, the rigid standardization of tasks and materials has its own significant flaws, including the possibility of stifling innovation and slowing progress. We suggest that more widespread and open sharing of behavioral tasks, code, and materials provides a constructive middle ground.
Results reproducibility
Assuming that independent researchers are able to reproduce the methods of one another's studies, how closely do the findings generated converge? The answer is: It depends. In principle, even differences as basic as the make of MRI scanner could undermine the ability to compare results across studies46, so considerable effort has gone into standardizing techniques that allow multi-site imaging studies to be carried out in rigorous and reproducible ways.
Fortunately, the viability of the basic technology is no longer in any serious doubt; abundant evidence demonstrates that all major brain imaging techniques are at least capable of producing highly convergent results across different sites and experimental procedures. Perhaps the best-known effort to demonstrate the basic robustness of results, not only in neuroimaging but in other biomedical fields as well, is the Biomedical Informatics Research Network (BIRN; https://www.nitrc.org/projects/birn/). BIRN is a multi-site, collaborative research consortium that strives to advance understanding of brain research and brain disease through the principles of data sharing and collaboration. There were several different BIRN initiatives, including the morphology BIRN, the mouse BIRN, and the function BIRN (fBIRN), among others. Although the fBIRN's disease focus was schizophrenia, considerable effort went into developing generalizable models for multi-site data collection, best practices for research, and methods to facilitate the use of standardized processes across sites. One of fBIRN's biggest contributions was software that enabled the systematic investigation of how fMRI activation signals vary across sites, field strengths, and scanner platforms. The project also developed methods to control for these differences47. fBIRN scientists developed an automated quality assurance (QA) procedure based on a standard MRI phantom, and the team released freely available software that could be easily incorporated into any service center's data transfer pipeline48. The fBIRN also provided leadership in modeling inter-site reliability49: the same 18 participants were scanned at four different scanning sites. These analyses revealed that inter-subject variability was 10 times greater than inter-site variability; activation in many brain regions showed fair to good reliability; and measures of reliability increased with more runs of data.
More generally, the ability of techniques such as fMRI to produce robust and replicable findings is demonstrated by the rapid canonization of many initially surprising neuroimaging findings. For example, the tendency of a spatially conserved frontoparietal “task-positive” brain network to increase activity when participants engage in effortful cognitive activity has been replicated so often with fMRI over the past two decades 50–52 that the result is now often treated as a de facto manipulation check in new experiments. Large-scale meta-analyses of hundreds or even thousands of neuroimaging studies at a time further demonstrate a marked degree of convergence on stable neural correlates for most major psychological processes, from pain perception to episodic memory to language production26, 53–54.
Of course, it is one thing to establish that neuroimaging methods can consistently reveal broad mappings between cognitive processes and distributed brain networks, and quite another to establish that the specific pattern of findings generated by any single study can be reproduced with a high degree of fidelity in another study. Unfortunately, as previous commentators have observed1, 55, it is unclear whether neuroimaging findings meet this criterion. Arguably, the central problem is not that results reproducibility is particularly low, but that it has been difficult to quantify, leaving open the question of how much faith one should put in the results of any given published study. We focus on two critical barriers to the comprehensive assessment of results reproducibility in cognitive neuroscience, emphasizing fMRI (though the same concerns apply to other commonly used methods such as EEG, MEG, and TMS).
A first major challenge is that careful comparison of results across independent sites and studies typically requires that the full results be openly shared between sites, yet initiatives promoting neuroimaging data sharing have historically met with limited success. An early pioneer in data sharing was the fMRI Data Center (fMRIDC) at Dartmouth College, founded in 19996, 56–57. Around the same time, several journals, most notably the Journal of Cognitive Neuroscience (JOCN), tried to implement mandatory open data sharing—including deposition of raw data files (e.g., BOLD time series, anatomical images)—as a requirement for publication. These efforts to foster increased transparency, while laudable, sparked controversy and backlash from the community. Opponents raised concerns about practical issues—technology, data formats, time and money constraints, and privacy—and cultural ones—the possibly negative impact of open sharing on individual scientific careers and advancement, questions about data ownership, and whether data sharing should be mandatory or optional57, 58. The backlash eventually led JOCN to walk back the data sharing requirement, and when funding to maintain the archive ran out, fMRIDC stopped accepting new data. fMRIDC's architects argue that, despite the setbacks, fMRIDC should be viewed as a successful pioneer in open fMRI data sharing56, whose experiences shaped the next generation of repositories like the 1000 Functional Connectomes Project (FCP; fcon_1000.projects.nitrc.org) and its International Neuroimaging Data-sharing Initiative (INDI)58, the OpenfMRI project (openfmri.org) and NeuroVault (neurovault.org), the Human Connectome Project (humanconnectomeproject.org), and the NIMH-based National Database for Autism Research (NDAR; ndar.nih.gov). More broadly, fMRIDC helped fuel interest in, recognition of, and support for the essential role information infrastructure (neuroinformatics) plays in making widespread data sharing and reuse possible.
A second barrier to the evaluation of results reproducibility is a lack of consensus about what quantitative measures should be used1. One area of contention is whether the magnitude or spatial extent of task-related activation (or both) should be assessed. Measures of magnitude and spatial extent depend on criteria for determining which voxels are active, of course. Bennett and Miller1 argued that the plurality of measures of reliability reported within individual studies made it challenging to ask about the reliability of findings across studies. They reported that the reliability of group-level results, using individual participants tested at different times, varied depending on the temporal gap between the tests, the specific tasks employed (sensory/motor vs. cognitive), design factors (block vs. event-related), the magnitude of activations, and other inter-individual factors. Importantly, Miller59 and others60 found that, like the difference between within- and between-site variability found by the fBIRN team49, variability within participants across testing sessions was lower than variability between participants. Differences in tasks, the degree of selectivity of active voxels to those tasks, and subject motion appeared to be the biggest contributors to inter-subject variability61. Nevertheless, Bennett and Miller1 noted that all of the studies reporting test/retest reliabilities had small sample sizes, foreshadowing concerns about limited statistical power that others raised in the intervening years8, 11.
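One candidate measure for quantifying test/retest reliability of this kind is the intraclass correlation coefficient (ICC). As an illustrative sketch (not the procedure of any particular study reviewed here), the two-way random-effects, absolute-agreement ICC(2,1) for a set of participants each measured in several sessions can be computed directly from the session-by-subject data:

```python
import numpy as np

def icc_2_1(Y):
    """Two-way random-effects, absolute-agreement ICC(2,1) (Shrout & Fleiss).

    Y: (n_subjects, k_sessions) array of one summary measure per subject per
    session, e.g. mean beta estimates from a region of interest.
    """
    n, k = Y.shape
    grand = Y.mean()
    row_means = Y.mean(axis=1)              # per-subject means
    col_means = Y.mean(axis=0)              # per-session means

    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((Y - grand) ** 2).sum() - ss_rows - ss_cols

    ms_r = ss_rows / (n - 1)                # between-subjects mean square
    ms_c = ss_cols / (k - 1)                # between-sessions mean square
    ms_e = ss_err / ((n - 1) * (k - 1))     # residual mean square

    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
```

Values near 1 indicate that stable between-subject differences dominate session-to-session noise, consistent with the within- versus between-participant variability findings described above.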
In sum, we believe that, perhaps surprisingly, given the size of the primary literature, the jury is still out on the degree to which researchers should expect individual neuroimaging findings to replicate when repeated under similar conditions. There is little doubt that methods like fMRI can produce highly replicable results, and that many canonical findings are indeed highly robust; however, as the low-hanging fruit are plucked and researchers increasingly turn to subtler phenomena, it becomes more important for researchers to share data and results openly. Only in doing so can we progress toward consensus on criteria for evaluating the reproducibility of results.
Inferential Reproducibility
The challenge of generating reproducible inferences—where independent researchers come to similar conclusions about what patterns of data mean—has been a central concern in the field for many years. Numerous published reviews have highlighted conceptual and statistical problems that threaten common neuroimaging inferences25, 62–63. One source of concern is that the statistical power of most fMRI studies is well below conventionally adequate levels8–11; in a recent review based on over 1,100 samples, Poldrack et al.62 found that the median fMRI study in 2015 was underpowered to detect anything but relatively large effects (Cohen's d of ∼0.75) even when using relatively high-powered procedures (i.e., a one-sample t-test). This observation is worrisome not only because low power implies a high false negative rate and inefficient resource expenditure, but because it frequently leads to incorrect interpretations—the notion that effects are stronger and better-localized than they actually are64–65. This increases the false-positive rate8 across the literature as a whole.
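The power problem can be made concrete with a back-of-the-envelope calculation based on the noncentral t distribution. The sketch below (our own illustration, using scipy, with numbers chosen for exposition rather than drawn from any published study) shows that detecting an effect of d ≈ 0.75 at 80% power with a two-sided one-sample t-test requires on the order of 15–20 participants, while smaller effects demand far larger samples:

```python
from scipy import stats

def one_sample_power(d, n, alpha=0.05):
    """Approximate two-sided power of a one-sample t-test at effect size d."""
    df = n - 1
    nc = d * n ** 0.5                       # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability of landing beyond either critical value under the alternative
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

def smallest_n(d, target=0.80, alpha=0.05):
    """Smallest sample size whose power meets the target."""
    n = 2
    while one_sample_power(d, n, alpha) < target:
        n += 1
    return n
```

For a small effect (d = 0.2), the required sample grows into the hundreds, which helps explain why median power across the literature has remained low.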
A second set of concerns arises at the analysis stage. As Bennett et al.'s well-known “dead salmon” illustration showed66, insufficiently stringent multiple comparisons correction procedures can easily inflate the false positive rate—an observation echoed by numerous studies that have highlighted limitations with common correction methods12, 67–69. Moreover, such analyses all assume a best-case scenario under which researchers are not (inadvertently) capitalizing on the many “researcher degrees of freedom” available in a typical fMRI pipeline40. If one could formally account for “p-hacking” (i.e., the data-dependent selection of analysis procedures), it is likely that the estimated false positive rate would rise, perhaps substantially11, 70.
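The logic of one widely used correction, the Benjamini–Hochberg false discovery rate (FDR) procedure, is simple enough to sketch directly; the implementation below is our own illustration, and any p-values fed to it would be purely simulated:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of tests rejected under BH FDR control at level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Step-up rule: find the largest k with p_(k) <= (k/m) * q
    below = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject
```

Run over a mixture of, say, a thousand null tests and a handful of genuine effects with very small p-values, the procedure typically recovers the genuine effects while admitting few nulls; Bonferroni correction, which controls the stricter family-wise error rate, always rejects a subset of what BH rejects at the same level.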
Lastly, even if one sets aside the statistical issues involved in the generation of cognitive neuroscience findings and assumes for the sake of argument that most published findings are fundamentally sound, it does not follow that researchers will agree about how to interpret such findings. Indeed, trenchant concerns have been raised about some of the most common assumptions researchers make when interpreting neuroimaging results, ranging from basic questions about what the BOLD signal reflects to what kind of information is actually extracted by multivariate pattern analysis71–74. Poldrack75 has flagged the problem of reverse inference as a particularly serious challenge, noting that the widespread approach of inferring mental function from observed patterns of brain activity runs a high risk of failure unless it is supported by an appropriate Bayesian analysis: one that directly estimates the probability of a given task or mental state occurring, conditional on an observed pattern of activity, starting from a reasonable prior distribution26.
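The force of the reverse-inference critique can be seen in a toy Bayesian calculation (the numbers below are invented for illustration, not drawn from any meta-analysis): even if a region responds in 80% of studies of one mental state, a modest base rate for that state and non-trivial activation under other states leave the posterior far from certain.

```python
def reverse_inference(likelihoods, priors):
    """Posterior P(state | activation) via Bayes' rule.

    likelihoods: P(activation | state) for each candidate mental state
    priors: P(state), the base rate of each state across studies
    """
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Hypothetical: a region activates in 80% of "fear" studies, but also in 30%
# of all other studies, and "fear" tasks make up 10% of the literature.
posterior = reverse_inference([0.8, 0.3], [0.1, 0.9])
# posterior[0] = 0.08 / 0.35 ≈ 0.23: activation alone is weak evidence of fear
```

Without the prior (the base rate of the state across tasks), the seemingly strong likelihood invites exactly the over-interpretation Poldrack warns against.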
Of course, science is a difficult enterprise, and it is easy to find serious methodological or statistical problems with virtually any piece of scientific research. The key question is what steps researchers are taking to address inferential concerns and to ensure that research findings continue to improve in reliability over time. To this end, we next consider more recent initiatives aimed at improving the reproducibility of cognitive neuroscience research.
Recent Initiatives
The focus on problems of reproducibility in scientific research as a whole has accelerated in the last several years16, 19–20, and its scope extends well beyond psychological and neural science. As a result, cognitive neuroscience is both a beneficiary of new tools that promise to improve reproducibility and a contributor to them. We show that, fortunately, our field has already begun to embrace new, open and transparent research practices that promise to mitigate or even eliminate many of the serious problems of methods, results, and inferential reproducibility.
Methods reproducibility
Concern about reproducible workflows and practices across the computational sciences has sharpened in similar ways76–77. While the specific practices that make computations reproducible vary from one field to another, Sandve and colleagues78 summarize a set of steps that have broad applicability to cognitive neuroscientists. These include avoiding manual data manipulation steps (using scripts not GUIs); keeping careful track of the provenance (history) of all data, including derived results; tracking versions of all software and data; and providing public access to all code, outputs, and data.
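Several of these recommendations can be implemented with a few lines of scripting. The sketch below is our own illustration (not code from Sandve and colleagues): it records a provenance snapshot (software versions, input-data checksums, and, where available, the git commit of the analysis code) that can be stored alongside any derived result.

```python
import hashlib
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone

def file_sha256(path):
    """Content hash tying a result to the exact input data used to produce it."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def provenance_record(input_paths):
    """Minimal provenance snapshot to save next to derived results."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        "platform": platform.platform(),
        "inputs": {p: file_sha256(p) for p in input_paths},
    }
    try:  # record the code version if the analysis lives in a git repository
        record["git_commit"] = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip()
    except Exception:
        record["git_commit"] = None
    return record
```

Writing `json.dumps(provenance_record([...]))` to a file at the end of each scripted analysis step gives later readers a machine-readable trail from result back to data and code versions.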
Several data analysis tools have been developed based on free open source languages like R (RStudio; rstudio.com) and Python (Jupyter; jupyter.org). These tools support the creation of interactive electronic notebooks that combine data manipulation and analysis code along with graphic visualizations and text-based commentary. The tools can be used with version control environments like git or mercurial, allowing the history of a project's data analysis to be captured. Version control software can be used to store and share software, analyses, manuscripts, and documents written in virtually any language (both human and computer). Coupled with web-based repositories like GitHub (github.com), BitBucket (bitbucket.org), or the Open Science Framework (OSF; osf.io), version control systems enable researchers to share the histories and current status of all project materials and data. Some researchers concerned about computational reproducibility have gone even further, by creating full software environments that can run a particular analysis and packaging them in a specialized “containerized” environment (e.g., Docker; www.docker.com) that can be distributed for others' use across a wide range of computer platforms. The use of electronic notebooks, version control software, and web-based open data repositories has begun to enable cognitive neuroscience researchers to produce open and transparent workflows that can be readily reproduced. The authors use many of these techniques in their own research workflows.
Other efforts focus on methods reproducibility across study teams. One such initiative is the development of the Brain Imaging Data Structure (BIDS; bids.neuroimaging.io), a new open data format designed to facilitate the storage and sharing of data from brain imaging studies79. BIDS attempts to achieve an easily implementable file directory and data structure that captures critical data and metadata about brain imaging studies and some data about the behavioral tasks performed by participants. BIDS arose out of the work involved in creating the OpenfMRI (openfmri.org) data repository34, designed to allow researchers to openly share raw BOLD imaging data sets with sufficient information to permit re- or meta-analysis. The BigEEG project mentioned earlier represents a similar data format standardization initiative targeted at the EEG community.
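To give a flavor of what BIDS standardizes, the sketch below lays out a skeleton dataset in the BIDS directory and file naming scheme. It is a simplified illustration only (file contents are empty placeholders); the BIDS specification defines the full set of required files and metadata.

```python
import json
from pathlib import Path

def make_minimal_bids(root, subjects=("01",), task="rest"):
    """Lay out a skeleton BIDS-style dataset (directories and metadata only)."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # Required top-level metadata file
    (root / "dataset_description.json").write_text(json.dumps(
        {"Name": "Example dataset", "BIDSVersion": "1.0.0"}, indent=2))
    for sub in subjects:
        func = root / f"sub-{sub}" / "func"
        func.mkdir(parents=True, exist_ok=True)
        # Imaging data would go here, named by subject and task
        (func / f"sub-{sub}_task-{task}_bold.nii.gz").touch()
        # Task events: one row per trial (onset/duration in seconds)
        (func / f"sub-{sub}_task-{task}_events.tsv").write_text(
            "onset\tduration\ttrial_type\n0.0\t2.0\tfixation\n")
    return root
```

Because file names and directory positions encode the key metadata (subject, task, modality), tools and repositories can discover and validate shared datasets automatically.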
On the data sharing side, modern platforms have picked up where pioneers like the fMRIDC left off, making it ever easier for researchers to distribute large neuroimaging datasets in a readily usable form. A major initiative focused on methods and results reproducibility is the Stanford Center for Reproducible Neuroscience (CRN; reproducibility.stanford.edu) formed in 2015 by Russell Poldrack and colleagues. CRN is developing data repositories for both raw neuroimaging datasets (an upcoming successor to the OpenFMRI platform) and whole brain statistical maps (NeuroVault.org)80. A long-term goal of the CRN is not only to facilitate sharing, but also to provide containerized, modular, and fully reproducible cloud-based tools that can be easily executed via a graphical web interface. This will bring reproducible state-of-the-art neuroimaging data analysis within reach of researchers who lack the resources to deploy their own pipelines locally81.
One of us82 has argued that many problems in reproducing the methods of behavioral studies could be ameliorated if video of all experimental procedures were more widely recorded and shared with researchers. Text-based methods sections with restrictive page or word limits simply cannot convey enough detailed information about a study's methods for another researcher to reproduce it. Sharing video can pose privacy risks, but the Databrary (databrary.org) digital library, a repository specialized for storing and sharing video, has developed a policy framework for sharing identifiable data with participant permission. Like OSF, Databrary has begun to serve as a web-based home for researchers to store and share data, metadata, and materials related to the non-imaging portions of a study, including videos of experimental procedures, images, audio recordings, and displays. Databrary largely focuses on developmental and learning science research at present, but may expand in the future.
In sum, the field is making rapid strides to improve the reproducibility of methods with the emergence of new tools, practices, centers, web-based data management systems, and data repositories.
Results reproducibility
Despite the acknowledged lack of consensus about how to measure and thereby evaluate results reproducibility1, and despite the significant problems with statistical power noted above, we find a number of encouraging developments concerning the reproducibility of results. Researchers continue to take seriously the effort to systematically measure the factors that influence test/retest reliability of responses across time and tasks, and these sorts of studies are increasingly common83–88. Other research programs focus on addressing questions about the long-term within-subject stability of responses89–90, and on how the accurate assessment of within-participant differences might address questions about individual differences93. There is increasing support for conducting and publishing the results of confirmatory studies94–95, thereby rectifying some existing biases that often favor the publication of novel results over confirmatory ones.
Several large-scale cross-site imaging studies whose results were designed to be widely shared with the research community have been undertaken (e.g., the Human Connectome Project and the U.K. Biobank Project). Findings from these studies are beginning to appear95, with results that both confirm and extend current understanding. Perhaps equally important, planning for the large-scale sharing of these data has led to the publication of extensive details about processing pipelines96 and to careful thinking about how to make shared components useful to other researchers.
Policy makers and publishers have taken a renewed interest in how the context in which scientific research is conducted and shared can influence reproducibility. The Consortium for Reliability and Reproducibility (CoRR) has developed best practice guidelines for the use of the resting-state fMRI data available through the INDI archive96, and the Organization for Human Brain Mapping has created a Committee on Best Practice in Data Analysis and Sharing (COBIDAS)97. Following the success of the arXiv preprint service, increasing numbers of cognitive neuroscientists have begun to deposit article preprints in the bioRxiv preprint service (biorxiv.org/neuroscience). An effort specific to psychological science (PsyArXiv; osf.io/view/psyarxiv/) has begun with support from the Center for Open Science (COS; cos.io) and the newly formed Society for the Improvement of Psychological Science (SIPS; improvingpsych.org). High-profile generalist and topic-specific journals are adopting data sharing requirements reminiscent of those JOCN attempted to implement 15 years ago; there are new journals, such as Nature Publishing's Scientific Data, focused on creating citable, scholarly homes for well-curated datasets; and some journals (e.g., Cortex) have adopted a new publication format, the registered report, in which the methods and analysis plan are reviewed prior to data collection in exchange for a commitment to publish the results regardless of the findings.
There have also been recent efforts to improve the appropriate use and reporting of statistical tests by means of automated tools such as statcheck (statcheck.io)98–99, which looks for elementary errors in the reporting of individual statistical tests. A related tool, P-curve (www.p-curve.com)100, uses the complete set of statistical results from a body of work to estimate the evidentiary strength in favor of a hypothesis. While largely focused on the psychological science literature, these initiatives bear close watching by cognitive neuroscientists because they illustrate how the standardization of reporting practices can lead to insights about the quality of research practices and the strength or weakness of evidence across a broad published literature100. In the case of statcheck, the system depends on the fact that most experimental psychology papers report statistical analyses in ways that allow the pertinent parameters to be automatically extracted from the published texts. Clearly, the diverse efforts focused on bolstering the reproducibility of cognitive neuroscience results have considerable forward momentum.
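The core idea behind statcheck-style checking is simple: when a paper reports both a test statistic and a p-value, the p-value can be recomputed from the statistic and compared against what was printed. The sketch below is our own illustration of that idea, restricted to z tests; it is not statcheck's actual code, which parses APA-formatted t, F, r, χ², and z tests from full texts:

```python
import math
import re

def two_sided_p_from_z(z):
    """Two-sided p-value for a z statistic, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def check_reported_z(text, tol=0.0005):
    """Extract 'z = X, p = Y' reports and flag those whose recomputed p-value
    disagrees with the reported one beyond rounding error."""
    pattern = re.compile(r"z\s*=\s*(\d*\.?\d+),\s*p\s*=\s*(\d*\.?\d+)")
    flags = []
    for z_str, p_str in pattern.findall(text):
        z, p_reported = float(z_str), float(p_str)
        p_recomputed = two_sided_p_from_z(z)
        # Allow for the reported p being rounded to its printed precision
        decimals = len(p_str.split(".")[-1])
        if abs(p_recomputed - p_reported) > 0.5 * 10 ** -decimals + tol:
            flags.append((z, p_reported, round(p_recomputed, 4)))
    return flags

report = "Group A, z = 1.96, p = .05; Group B, z = 1.20, p = .01."
print(check_reported_z(report))  # flags the inconsistent Group B report
```

Because the check needs only the published text, it scales to entire literatures, which is what makes standardized reporting formats so valuable.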
Inferential reproducibility
Since the 2010 Bennett and Miller review on replicability, new tools and practices that promise to bolster the reproducibility of inferences have been created and are being adopted at an accelerating rate. We highlight three: meta-analysis, improved statistical practices, and machine learning.
For meta-analysis to succeed, the statistical effects from a large number of disparate studies must be collected, normalized, and reported in standardized ways101. The variability in analysis and reporting practices across the cognitive neuroscience literature can make meta-analysis challenging. As a result, the creation and curation of large-scale brain imaging databases has been essential for the growth of meta-analysis as an inferential tool. One of the oldest such systems devoted to supporting meta-analytic datasets and software is the BrainMap project (www.brainmap.org)5. As of late fall 2016, BrainMap contained data from more than 100,000 individual participants from nearly 4,000 papers. The BrainMap data and meta-analysis tools have been used and cited more than 600 times since 1992, with more than 125 citations in 2016 alone.
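To make the pooling step concrete, the sketch below shows a textbook fixed-effect meta-analysis using inverse-variance weighting. The effect sizes and variances are hypothetical, and this is a generic illustration of the statistics, not BrainMap's actual pipeline:

```python
def fixed_effect_meta(effects, variances):
    """Fixed-effect pooled estimate and its variance via inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    return pooled, pooled_var

# Hypothetical per-study standardized effects (e.g., Cohen's d) and variances
effects = [0.40, 0.55, 0.30]
variances = [0.04, 0.09, 0.02]
pooled, pooled_var = fixed_effect_meta(effects, variances)
print(round(pooled, 3), round(pooled_var, 4))
```

Precise studies (small variance) dominate the pooled estimate, which is why meta-analysis requires effects to be reported on a common, normalized scale before pooling.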
Neurosynth (neurosynth.org)26 takes an alternative approach to meta-analysis in which the raw data are (i) activation (x, y, z) coordinates mined from the text of imaging papers published in HTML on the web, combined with (ii) word frequencies from the same papers. In this way, Neurosynth aims to automate, and thereby standardize and accelerate, the process of meta-analysis. By combining information about activation coordinates with term frequencies derived from the published articles, Neurosynth enables the analyst to interactively assess the extent of evidence for a relationship between a specific term of interest and a set of brain coordinates. For example, the analyst can visualize either the probability of a given voxel's activation given the presence of a specific term in the system's database of papers, or the probability of a target term appearing in papers that report a particular voxel as active. The system allows users to view 3D maps of these conditional probabilities online, to download the maps for further analysis, and to create customized sets of searches. As of early 2017, Neurosynth contained data from more than 11,000 imaging studies and provided users with downloadable interactive meta-analyses for more than 3,000 terms. The system has been cited almost 600 times, with more than 180 citations in 2016 alone. Of course, Neurosynth only supports meta-analysis on a subset of the published literature—older papers that were not published in easily parsable HTML formats, and unpublished findings, fall outside its scope.
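The two conditional probabilities described above can be illustrated with a toy table. This is our own simplification, not Neurosynth's code: each row records whether a (hypothetical) study's text mentions a term and whether it reports activation at a given voxel:

```python
# Each entry: did the study mention the term, and did it report the voxel active?
studies = [
    {"term": True,  "active": True},
    {"term": True,  "active": True},
    {"term": True,  "active": False},
    {"term": False, "active": True},
    {"term": False, "active": False},
    {"term": False, "active": False},
]

def p_active_given_term(rows):
    """P(voxel active | term mentioned): the 'forward' map."""
    with_term = [r for r in rows if r["term"]]
    return sum(r["active"] for r in with_term) / len(with_term)

def p_term_given_active(rows):
    """P(term mentioned | voxel active): the 'reverse' map."""
    active = [r for r in rows if r["active"]]
    return sum(r["term"] for r in active) / len(active)

print(p_active_given_term(studies))
print(p_term_given_active(studies))
```

The distinction matters: the forward map answers "where do studies of X activate?", while the reverse map answers the inferentially stronger question "does this activation imply X?".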
Beyond meta-analysis, cognitive neuroscience research continues to push for new statistical procedures and for wider adoption of long-standing but more robust ones. For space reasons we do not elaborate extensively here, but among the issues under active discussion are the appropriate handling of main effects and interaction tests102, the applicability of linear mixed effects modeling techniques103, and the ongoing need to guard against the risks of false positive results3,104 even when using well-established, vetted, and widely used analysis software12. Still others suggest that the standard practice of treating stimulus effects as fixed, not random, may undermine the generalizability of findings across studies105. An emergent theme is the ongoing need for vigorous and rigorous methodological reevaluation combined with a commitment to more open software publication practices. In the recent case of Eklund and colleagues12, the discovery of an error in an algorithm used to control false positives in cluster-wise fMRI inference quickly led to changes in the widely used AFNI package106–107. The episode highlights the corrective, collaborative nature of open-source software development, while also underscoring the uncomfortable reality that, at present, very few people who use open-source software packages actually read the underlying code (the AFNI bug had gone undetected for many years).
In many other areas of social and computational science, progress has been facilitated by borrowing ideas and techniques from the field of machine learning. Philosophically, machine learning researchers tend to emphasize their ability to quantitatively predict key outcomes and pay less attention to traditional forms of scientific explanation108. This philosophy has led to the rapid proliferation of thousands of predictive modeling techniques—a number of which (e.g., support vector machines) are deployed regularly in cognitive neuroscience. Many fMRI studies are now framed as predictive problems in classification or regression, where the goal is to build a model that successfully discovers a mapping between a set of predictor variables and a set of discrete (in the case of classification) or continuous (in the case of regression) outcomes. For example, the distributed activation pattern of a large number of voxels in an fMRI data set can be used to predict successful vs. unsuccessful attempts to recognize a stimulus. The resultant classifier can then be used to predict outcomes “out-of-sample” (i.e., in new data sets) and potentially also to aid in the interpretation of which voxels were likely to have played a role in processing relevant information. This gives rise to applications such as the ‘mind-reading’ of representations present in visual cortex during movie viewing109–110 or revealing participants' semantic maps activated during narrative comprehension111.
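A minimal sketch of this decoding logic follows, using synthetic "voxel" patterns and a nearest-centroid rule in place of the more common support vector machine; all data and names here are invented for illustration, not drawn from any cited study:

```python
import random

random.seed(0)

def simulate_pattern(label, n_voxels=20):
    """Synthetic 'voxel' pattern: a small mean shift distinguishes the two classes
    (e.g., successful vs. unsuccessful recognition)."""
    shift = 0.5 if label == 1 else -0.5
    return [random.gauss(shift, 1.0) for _ in range(n_voxels)], label

train = [simulate_pattern(i % 2) for i in range(100)]
test = [simulate_pattern(i % 2) for i in range(40)]  # held-out, "out-of-sample" data

def centroid(patterns):
    n = len(patterns)
    return [sum(p[i] for p in patterns) / n for i in range(len(patterns[0]))]

c0 = centroid([x for x, y in train if y == 0])
c1 = centroid([x for x, y in train if y == 1])

def predict(x):
    """Assign the class whose training centroid is nearest (squared Euclidean)."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 0 if d0 < d1 else 1

accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(accuracy)
```

The key structural point is that the model is fit only on `train` and evaluated only on `test`, mirroring the out-of-sample prediction that gives decoding analyses their evidentiary force.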
Machine learning and related ‘big data’ techniques have provided entirely new approaches to analysis. They allow researchers to capitalize on neural information patterns that may be too subtle or complex to be easily discovered using more conventional summary statistic approaches (e.g. brain activation maps or ERPs)112. Of course, prediction-oriented approaches are not a panacea for standard concerns about the seeming ease with which researchers can fool themselves and unwittingly generate false or exaggerated findings. In the context of machine learning, the term ‘overfitting’ is used to describe a case in which the predictions of an analysis have been inadvertently contaminated by noise in the data that were used to develop the analysis. As a consequence of overfitting, favorable results obtained when analyzing the same dataset that was used to develop and calibrate the analysis will not be obtained when examining other, equivalent datasets. Some researchers who use machine learning take the notion of ‘overfitting’ more seriously than others113. The cautious deploy methods such as cross-validation (i.e., training and testing a model on independent subsets of the data) that should in principle guard against overfitting. However, machine learning pipelines allow for considerably more analytical flexibility than conventional analyses. Researchers often have a choice between literally hundreds of different estimation approaches, each of which may have its own free parameters that require tuning to perform optimally. Whereas there is widespread awareness of the need to cross-validate results once a model has been selected, there is much less recognition that overfitting can still occur through the optimization of an analysis (for further discussion, see114–115). 
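The danger of overfitting through analysis optimization can be demonstrated on pure noise: trying many pipeline variants and keeping the one with the best cross-validated score yields an optimistic estimate that does not survive on independent data, even though each individual score was cross-validated. The simulation below is our own construction, using a simple nearest-centroid rule and random feature subsets as stand-ins for "pipeline choices":

```python
import random

random.seed(1)

def make_data(n=60, d=30):
    """Pure-noise dataset: features carry no real information about the labels."""
    return [([random.gauss(0, 1) for _ in range(d)], random.randint(0, 1))
            for _ in range(n)]

data = make_data()
fresh = make_data()  # independent replication set from the same (noise) process

def cv_accuracy(rows, features):
    """Leave-one-out accuracy of a nearest-centroid rule restricted to `features`."""
    correct = 0
    for i, (x, y) in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        cents = {}
        for label in (0, 1):
            sub = [r for r in train if r[1] == label]
            cents[label] = [sum(r[0][f] for r in sub) / len(sub) for f in features]
        d0 = sum((x[f] - c) ** 2 for f, c in zip(features, cents[0]))
        d1 = sum((x[f] - c) ** 2 for f, c in zip(features, cents[1]))
        correct += (0 if d0 < d1 else 1) == y
    return correct / len(rows)

# "Optimize the pipeline": try many random feature subsets, keep the best CV score.
best_score, best_features = 0.0, None
for _ in range(200):
    feats = random.sample(range(30), 5)
    score = cv_accuracy(data, feats)
    if score > best_score:
        best_score, best_features = score, feats

print(best_score)                         # optimistic: selected on this very dataset
print(cv_accuracy(fresh, best_features))  # near chance on independent data
```

The remedy discussed in the cited work is to nest the selection step inside the cross-validation loop (or to hold out a final validation set untouched by any optimization), so that the reported score is never the maximum of many looks at the same data.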
Thus, as our field increasingly adopts machine learning techniques, it will be important to borrow established best practices from fields that have been using similar big data approaches for longer periods of time112, 115.
The Future
As Van Horn and Gazzaniga56 observed, “the reality remains that very little of the neuroimaging data gathered each day in the field have been made available to those who could help provide much needed understanding.” While we agree that this assessment still holds, we see other evidence that points toward a very different future. There is increasing recognition that greater openness and transparency, reflected in data, materials, and code sharing, offers individual investigators and the field as a whole far more benefits than risks28, 58, 108, 116. Significant challenges remain in developing technology and workflow practices that make open and transparent workflows easy to generate and data readily shareable, but progress is being made. It is increasingly clear that there are substantial scientific rewards in analyzing or re-analyzing large-scale shared data sets, beyond improving statistical power. Accordingly, while we take seriously the concerns many have raised recently about the methods, results, and inferential reproducibility of our field, we encourage our colleagues to embrace the newly emerging open science practices with an optimistic mindset56, 117, as there is so much more to gain than to lose.
At the same time, it is essential that the field identify barriers that stand in the way of a more open, transparent, and reproducible neuroscience of cognition. One clear gap is the difficulty of capturing and reporting reproducible information about tasks, displays, and analysis procedures, although new data and materials repositories like OSF and Databrary, emerging data standards (BIDS; BigEEG), and pipelines (EEGLAB; nipype) can play constructive roles. Another concerns the need to forge community consensus around a set of principles about the culture in which cognitive neuroscience research is carried out: how to seek permission to share, when data and materials should be shared, how to measure and report individual scholarly contributions to large-scale studies, how to weigh the impact of analyses conducted on secondary data relative to the collection of new data, and how to ensure that the transition to more open science practices does not unduly harm the careers of the next generation of researchers. Prescribing answers to these questions goes beyond our scope, but we urge continued dialogue focused on achieving community consensus.
A vital question for which there is still no satisfying answer is what entity will pay for the curation, support, maintenance, and long-term storage of cognitive neuroscience data and materials. Data repositories, past and present, have been funded either by short-term (3–5 year) NIH or NSF research grants or by private foundations. Thus, despite increasingly strong encouragement from granting agencies to share data and materials, and even mandates to do so, there is as yet no long-term commitment from the agencies to fund data preservation. Data curation, storage, and preservation are not inexpensive, and the problem of how to sustain research infrastructure that benefits the entire research community will neither solve itself nor go away. Nevertheless, based on the success of other fields like astronomy, high energy physics, and the geosciences, we think that a strong case can be made for enduring federal and private donor support for research infrastructure that empowers cognitive neuroscientists to openly share data, materials, and methods.
Fundamentally, we think that investments in the future of cognitive neuroscience infrastructure will generate big payoffs. Fostering the widespread adoption of open, transparent, and reproducible research practices coupled with innovations in technology that enable the large-scale analysis of our particular store of ‘big data’ will accelerate the discovery of generalizable, robust, and meaningful findings about the nature and origins118 of human cognition.
Acknowledgments
ROG acknowledges support from NSF BCS-1147440, NSF BCS-1238599, and NICHD U01-HD-076595. BW acknowledges support from NSF BCS-1331073.
References
- 1.Bennett CM, Miller MB. How reliable are the results from functional magnetic resonance imaging? Annals of the New York Academy of Sciences. 2010;1191:133–155. doi: 10.1111/j.1749-6632.2010.05446.x. http://doi.org/10.1111/j.1749-6632.2010.05446.x. [DOI] [PubMed] [Google Scholar]
- 2.Carp J. The secret lives of experiments: Methods reporting in the fMRI literature. Neurolmage. 2012b;63(1):289–300. doi: 10.1016/j.neuroimage.2012.07.004. https://doi.org/10.1016/i.neuroimaae.2012.07.004. [DOI] [PubMed] [Google Scholar]
- 3.Vul E, Harris C, Winkielman P, Pashler H. Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science. 2009;4(3):274–290. doi: 10.1111/j.1745-6924.2009.01125.x. http://doi.org/10.1111/i.1745-6924.2009.01125.x. [DOI] [PubMed] [Google Scholar]
- 4.Biswal BB, Mennes M, Zuo XN, Gohel S, Kelly C, Smith SM, Milham MP. Toward discovery science of human brain function. Proceedings of the National Academy of Sciences. 2010;107(10):4734–4739. doi: 10.1073/pnas.0911855107. https://doi.org/10.1073/pnas.0911855107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Fox PT, Mikiten S, Davis G, Lancaster JL. Functional Neuroimaging: Technical Foundations. Chicago: 1994. BrainMap: A database of human functional brain mapping; pp. 95–105. [Google Scholar]
- 6.Van Horn JD, Gazzaniga MS. Databasing fMRI studies — towards a “discovery science” of brain function. Nature Reviews Neuroscience. 2002;3(4):314–318. doi: 10.1038/nrn788. http://doi.org/10.1038/nrn788. [DOI] [PubMed] [Google Scholar]
- 7.Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015;349(6251) doi: 10.1126/science.aac4716. http://doi.org/10.1126/science.aac4716. [DOI] [PubMed] [Google Scholar]
- 8.Button KS, loannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ, Munafò MR. Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience. 2013;14(5):365–376. doi: 10.1038/nrn3475. http://doi.org/10.1038/nrn3475. [DOI] [PubMed] [Google Scholar]
- 9.David SP, Ware JJ, Chu IM, Loftus PD, Fusar-Poli P, Radua J, loannidis JPA. Potential reporting bias in fMRI studies of the brain. PLoS ONE. 2013;8(7) doi: 10.1371/journal.pone.0070104. http://doi.org/10.1371/iournal.pone.0070104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.loannidis JPA, Munafò MR, Fusar-Poli P, Nosek BA, David SP. Publication and other reporting biases in cognitive sciences: detection, prevalence, and prevention. Trends in Cognitive Sciences. 2014;18(5):235–241. doi: 10.1016/j.tics.2014.02.010. http://doi.org/10.1016/i.tics.2014.02.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Szucs D, loannidis JP. Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. bioRxiv. 2016;071530 doi: 10.1371/journal.pbio.2000797. https://doi.org/10.1101/071530. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Eklund A, Nichols TE, Knutsson H. Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences. 2016;113(28):7900–7905. doi: 10.1073/pnas.1602413113. https://doi.org/10.1073/pnas.1602413113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Dolgin E. This is your brain online: The functional connectomes project. Nature Medicine. 2010;16(4):351–351. doi: 10.1038/nm0410-351b. https://doi.org/10.1038/nm0410-351b. [DOI] [PubMed] [Google Scholar]
- 14.Van Essen DC, Smith SM, Barch DM, Behrens TEJ, Yacoub E, Ugurbil K. The WU-Minn Human Connectome Project: An overview. Neurolmage. 2013;80:62–79. doi: 10.1016/j.neuroimage.2013.05.041. https://doi.org/10.1016/j.neuroimage.2013.05.041. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Adolph KE, Gilmore RO, Freeman C, Sanderson P, Millman D. Toward open behavioral science. Psychological Inquiry. 2012;23(3):244–247. doi: 10.1080/1047840x.2012.705133. https://doi.org/10.1080/1047840X.2012.705133. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Nosek BA, Bar-Anan Y. Scientific utopia: I. Opening scientific communication. Psychological Inquiry. 2012;23(3):217–243. https://doi.org/10.1080/1047840X.2012.692215. [Google Scholar]
- 17.Pashler H, Harris CR. Is the replicability crisis overblown? Three arguments examined. Perspectives on Psychological Science. 2012;7(6):531–536. doi: 10.1177/1745691612463401. https://doi.org/10.1177/1745691612463401. [DOI] [PubMed] [Google Scholar]
- 18.Poldrack RA, Poline JB. The publication and reproducibility challenges of shared data. Trends in Cognitive Sciences. 2015;19(2):59–61. doi: 10.1016/j.tics.2014.11.008. http://doi.org/10.1016/i.tics.2014.11.008. [DOI] [PubMed] [Google Scholar]
- 19.Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Sert NPdu, loannidis JPA. A manifesto for reproducible science. Nature Human Behaviour. 2017;10021 doi: 10.1038/s41562-016-0021. https://doi.org/10.1038/s41562-016-0021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Goodman SN, Fanelli D, loannidis JPA. What does research reproducibility mean? Science Translational Medicine. 2016;8(341):3. doi: 10.1126/scitranslmed.aaf5027. [DOI] [PubMed] [Google Scholar]
- 21.Nosek BA, Alter G, Banks GC, Borsboom D, Bowman SD, Breckler SJ, Yarkoni T. Promoting an open research culture. Science. 2015;348(6242):1422–1425. doi: 10.1126/science.aab2374. https://doi.org/10.1126/science.aab2374. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Patil P, Peng RD, Leek JT. What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspectives on Psychological Science. 2016;11(4):539–544. doi: 10.1177/1745691616646366. https://doi.org/10.1177/1745691616646366. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Gilbert DT, King G, Pettigrew S, Wilson TD. Comment on Estimating the reproducibility of psychological science. Science. 2016;351(6277):1037–1037. doi: 10.1126/science.aad7243. http://doi.org/10.1126/science.aad7243. [DOI] [PubMed] [Google Scholar]
- 24.Etz A, Vandekerckhove J. A Bayesian perspective on the Reproducibility Project: Psychology. PLoS ONE. 2016;11(2):e0149794. doi: 10.1371/journal.pone.0149794. https://doi.org/10.1371/journal.pone.0149794. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Yarkoni T, Poldrack RA, Van Essen DC, Wager TD. Cognitive neuroscience 2.0: Building a cumulative science of human brain function. Trends in Cognitive Science. 2010;14(11):489–96. doi: 10.1016/j.tics.2010.08.004. https://doi.org/10.1016/j.tics.2010.08.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Yarkoni T, Poldrack RA, Nichols TE, Essen DCV, Wager TD. Large-scale automated synthesis of human functional neuroimaging data. Nature Methods. 2011;8(8):665–670. doi: 10.1038/nmeth.1635. http://doi.org/10.1038/nmeth.1635. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Nosek BA, Spies JR, Motyl M. Scientific utopia II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science. 2012;7(6):615–631. doi: 10.1177/1745691612459058. https://doi.org/10.1177/1745691612459058. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Poldrack RA, Gorgolewski KJ. Making big data open: Data sharing in neuroimaging. Nature Neuroscience. 2014;17(11):1510–1517. doi: 10.1038/nn.3818. http://doi.org/10.1038/nn.3818. [DOI] [PubMed] [Google Scholar]
- 29.Poline JB, Breeze JL, Ghosh SS, Gorgolewski K, Halchenko YO, Hanke M, Kennedy DN. Data sharing in neuroimaging research. Frontiers in Neuroinformatics. 2012;6(9) doi: 10.3389/fninf.2012.00009. https://doi.org/10.3389/fninf.2012.00009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Wcherts JM, Bakker M, Molenaar D. Wllingness to share research data Is related to the strength of the evidence and the quality of reporting of statistical results. PLoS ONE. 2011;6(11):e26828. doi: 10.1371/journal.pone.0026828. https://doi.org/10.1371/iournal.pone.0026828. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Luck SJ. An Introduction to the Event-Related Potential Technique. MIT Press; 2014. [Google Scholar]
- 32.Delorme A, Makeig S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics. Journal of Neuroscience Methods. 2004;134:9–21. doi: 10.1016/j.jneumeth.2003.10.009. [DOI] [PubMed] [Google Scholar]
- 33.Lopez-Calderon J, Luck SJ. ERPLAB: An open-source toolbox for the analysis of event-related potentials. Frontiers in Human Neuroscience. 2014;8(213) doi: 10.3389/fnhum.2014.00213. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Poldrack RA, Barch DM, Mitchell J, Wager T, Wagner AD, Devlin JT, Milham M. Toward open sharing of task-based fMRI data: the OpenfMRI project. Frontiers in Neuroinformatics. 2013;7(12) doi: 10.3389/fninf.2013.00012. https://doi.org/10.3389/fninf.2013.00012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.practical fMRI: the nuts & bolts. A checklist for fMRI acquisition methods reporting in the literature. 2013 Jan; Retrieved from https://practicalfmri.blogspot.com/2013/01/a-checklist-for-fmri-acquisition.html.
- 36.Gold S, Christian B, Arndt S, Zeien G, Cizadlo T, Johnson DL, Andreasen NC. Functional MRI statistical software packages: A comparative analysis. Human Brain Mapping. 1998;6(2):73–84. doi: 10.1002/(SICI)1097-0193(1998)6:2<73::AID-HBM1>3.0.CO;2-H. https://doi.org/10.1002/(SICI)1097-0193(1998)6:2<73∷AID-HBM1>3.0.CO;2-H41ps12-341ps12. https://doi.org/10.1126/scitranslmed.aaf5027. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Morgan VL, Dawant BM, Li Y, Pickens DR. Comparison of fMRI statistical software packages and strategies for analysis of images containing random and stimulus-correlated motion. Computerized Medical Imaging and Graphics. 2007;31(6):436–446. doi: 10.1016/j.compmedimag.2007.04.002. https://doi.org/10.1016/i.compmedimaa.2007.04.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Strother SC, Anderson J, Hansen LK, Kjems U, Kustra R, Sidtis J, Rottenberg D. The quantitative evaluation of functional neuroimaging experiments: The NPAIRS data analysis framework. Neurolmage. 2002;15(4):747–771. doi: 10.1006/nimg.2001.1034. https://doi.org/10.1006/nimg.2001.1034. [DOI] [PubMed] [Google Scholar]
- 39.Strother S, La Conte S, Kai Hansen L, Anderson J, Zhang J, Pulapura S, Rottenberg D. Optimizing the fMRI data-processing pipeline using prediction and reproducibility performance metrics: I. A preliminary group analysis. Neurolmage. 2004;23(Supplement 1):S196–S207. doi: 10.1016/j.neuroimage.2004.07.022. https://doi.org/10.1016/j.neuroimage.2004.07.022. [DOI] [PubMed] [Google Scholar]
- 40.Carp J. On the plurality of (methodological) worlds: estimating the analytic flexibility of fMRI experiments. Brain Imaging Methods. 2012a;6(149) doi: 10.3389/fnins.2012.00149. https://doi.org/10.3389/fnins.2012.00149. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Glatard T, Lewis LB, Ferreira da Silva R, Adalat R, Beck N, Lepage C, Evans AC. Reproducibility of neuroimaging analyses across operating systems. Frontiers in Neuroinformatics. 2015;9 doi: 10.3389/fninf.2015.00012. https://doi.org/10.3389/fninf.2015.00012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Turner JA, Laird AR. The cognitive paradigm ontology: Design and application. Neuroinformatics. 2011;70(1):57–66. doi: 10.1007/s12021-011-9126-x. https://doi.org/10.1007/s12021-011-9126-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Poldrack RA, Kittur A, Kalar D, Miller E, Seppa C, Gil Y, Bilder RM. The cognitive atlas: toward a knowledge foundation for cognitive neuroscience. Frontiers in Neuroinformatics. 2011;5(17) doi: 10.3389/fninf.2011.00017. https://doi.org/10.3389/fninf.2011.00017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Gershon RC, Cella D, Fox NA, Havlik RJ, Hendrie HC, Wagster MV. Assessment of neurological and behavioural function: the NIH Toolbox. The Lancet Neurology. 2010;9(2):138–139. doi: 10.1016/S1474-4422(09)70335-7. https://doi.oro/10.1016/S1474-4422(09)70335-7. [DOI] [PubMed] [Google Scholar]
- 45.Hall D, Huerta MF, McAuliffe MJ, Farber GK. Sharing heterogeneous data: The National Database for Autism Research. Neuroinformatics. 2012;10(4):331–339. doi: 10.1007/s12021-012-9151-4. https://doi.org/10.1007/s12021-012-9151-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Casey BJ, Cohen JD, O'Craven K, Davidson RJ, Irwin W, Nelson CA, Turski PA. Reproducibility of fMRI results across four institutions using a spatial working memory task. Neurolmage. 1998;8(3):249–261. doi: 10.1006/nimg.1998.0360. https://doi.org/10.1006/nimg.1998.0360. [DOI] [PubMed] [Google Scholar]
- 47.Friedman L, Stern H, Brown GG, Mathalon DH, Turner J, Glover GH, Potkin SG. Test-retest and between-site reliability in a multicenter fMRI study. Human Brain Mapping. 2008;29(8):958–972. doi: 10.1002/hbm.20440. https://doi.org/10.1002/hbm.20440. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Friedman L, Glover GH. Report on a multicenter fMRI quality assurance protocol. Journal of Magnetic Resonance Imaging. 2006;23(6):827–839. doi: 10.1002/jmri.20583. https://doi.org/10.1002/jmri.20583. [DOI] [PubMed] [Google Scholar]
- 49.Brown GG, Mathalon DH, Stern H, Ford J, Mueller B, Greve DN, Potkin SG. Multisite reliability of cognitive BOLD data. Neurolmage. 2011;54(3):2163–2175. doi: 10.1016/j.neuroimage.2010.09.076. https://doi.org/10.1016/j.neuroimaae.2010.09.076. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Braver TS, Cohen JD, Nystrom LE, Jonides J, Smith EE, Noll DC. A parametric study of prefrontal cortex involvement in human working memory. Neurolmage. 1997;5(1):49–62. doi: 10.1006/nimg.1996.0247. https://doi.org/10.1006/nimg.1996.0247. [DOI] [PubMed] [Google Scholar]
- 51.Dosenbach NUF, Visscher KM, Palmer ED, Miezin FM, Wenger KK, Kang HC, Petersen SE. A core system for the implementation of task sets. Neuron. 2006;50(5):799–812. doi: 10.1016/j.neuron.2006.04.031. https://doi.org/10.1016/j.neuron.2006.04.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Duncan J. The multiple-demand (MD) system of the primate brain: mental programs for intelligent behaviour. Trends in Cognitive Sciences. 2010;14(4):172–179. doi: 10.1016/j.tics.2010.01.004. https://doi.org/10.1016/j.tics.2010.01.004. [DOI] [PubMed] [Google Scholar]
- 53.Smith SM, Fox PT, Miller KL, Glahn DC, Fox PM, Mackay CE, Beckmann CF. Correspondence of the brain's functional architecture during activation and rest. Proceedings of the National Academy of Sciences. 2009;706(31):13040–13045. doi: 10.1073/pnas.0905267106. https://doi.org/10.1073/pnas.0905267106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Vigneau M, Beaucousin V, Hervé PY, Duffau H, Crivello F, Houdé O, Tzourio-Mazoyer N. Meta-analyzing left hemisphere language areas: Phonology, semantics, and sentence processing. Neurolmage. 2006;30(4):1414–1432. doi: 10.1016/j.neuroimage.2005.11.002. https://doi.org/10.1016/j.neuroimage.2005.11.002. [DOI] [PubMed] [Google Scholar]
- 55.Yarkoni T, Braver TS. Cognitive neuroscience approaches to individual differences in working memory and executive control: Conceptual and methodological issues. In: Gruszka A, Matthews G, Szymura B, editors. Handbook of Individual Differences in Cognition. Springer; New York: 2010. pp. 87–107. Retrieved from http://link.springer.com/chapter/10.1007/978-1-4419-1210-7_6.
- 56.Van Horn JD, Gazzaniga MS. Why share data? Lessons learned from the fMRIDC. NeuroImage. 2013;82:677–682. doi: 10.1016/j.neuroimage.2012.11.010.
- 57.Van Horn JD, Grethe JS, Kostelec P, Woodward JB, Aslam JA, Rus D, Rockmore D, Gazzaniga MS. The Functional Magnetic Resonance Imaging Data Center (fMRIDC): The challenges and rewards of large-scale databasing of neuroimaging studies. Philosophical Transactions of the Royal Society of London B: Biological Sciences. 2001;356:1323–1339. doi: 10.1098/rstb.2001.0916.
- 58.Mennes M, Biswal B, Castellanos FX, Milham MP. Making data sharing work: The FCP/INDI experience. NeuroImage. 2013;82:683–691. doi: 10.1016/j.neuroimage.2012.10.064.
- 59.Miller MB, Donovan CL, Van Horn JD, German E, Sokol-Hessner P, Wolford GL. Unique and persistent individual patterns of brain activity across different memory retrieval tasks. NeuroImage. 2009;48(3):625–635. doi: 10.1016/j.neuroimage.2009.06.033.
- 60.Costafreda SG, Brammer MJ, Vencio RZN, Mourao ML, Portela LAP, de Castro CC, Amaro E. Multisite fMRI reproducibility of a motor task using identical MR systems. Journal of Magnetic Resonance Imaging. 2007;26(4):1122–1126. doi: 10.1002/jmri.21118.
- 61.Duncan KJ, Pattamadilok C, Knierim I, Devlin JT. Consistency and variability in functional localisers. NeuroImage. 2009;46(4):1018–1026. doi: 10.1016/j.neuroimage.2009.03.014.
- 62.Poldrack RA, Baker CI, Durnez J, Gorgolewski KJ, Matthews PM, Munafò MR, Yarkoni T. Scanning the horizon: towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience. 2017; advance online publication. doi: 10.1038/nrn.2016.167.
- 63.Thirion B, Pinel P, Mériaux S, Roche A, Dehaene S, Poline JB. Analysis of a large fMRI cohort: Statistical and methodological issues for group analyses. NeuroImage. 2007;35(1):105–120. doi: 10.1016/j.neuroimage.2006.11.054.
- 64.Yarkoni T. Big Correlations in Little Studies: Inflated fMRI Correlations Reflect Low Statistical Power–Commentary on Vul et al. (2009). Perspectives on Psychological Science. 2009;4(3):294–298. doi: 10.1111/j.1745-6924.2009.01127.x.
- 65.Kriegeskorte N, Lindquist MA, Nichols TE, Poldrack RA, Vul E. Everything You Never Wanted to Know about Circular Analysis, but Were Afraid to Ask. Journal of Cerebral Blood Flow & Metabolism. 2010;30(9):1551–1557. doi: 10.1038/jcbfm.2010.86.
- 66.Bennett C, Miller M, Wolford G. Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: an argument for multiple comparisons correction. NeuroImage. 2009;47(Supplement 1):S125. doi: 10.1016/S1053-8119(09)71202-9.
- 67.Bennett CM, Wolford GL, Miller MB. The principled control of false positives in neuroimaging. Social Cognitive and Affective Neuroscience. 2009;4(4):417–422. doi: 10.1093/scan/nsp053.
- 68.Woo CW, Krishnan A, Wager TD. Cluster-extent based thresholding in fMRI analyses: Pitfalls and recommendations. NeuroImage. 2014;91:412–419. doi: 10.1016/j.neuroimage.2013.12.058.
- 69.Smith SM, Nichols TE. Threshold-free cluster enhancement: Addressing problems of smoothing, threshold dependence and localisation in cluster inference. NeuroImage. 2009;44(1):83–98. doi: 10.1016/j.neuroimage.2008.03.061.
- 70.Ioannidis JPA. Why most published research findings are false. PLoS Medicine. 2005;2(8):e124. doi: 10.1371/journal.pmed.0020124.
- 71.Logothetis NK. What we can do and what we cannot do with fMRI. Nature. 2008;453(7197):869–878. doi: 10.1038/nature06976.
- 72.Cole DM, Smith SM, Beckmann CF. Advances and pitfalls in the analysis and interpretation of resting-state FMRI data. Frontiers in Systems Neuroscience. 2010;4. doi: 10.3389/fnsys.2010.00008.
- 73.Poldrack RA. Subtraction and beyond: The logic of experimental designs for neuroimaging. In: Hanson SJ, Bunzl M, editors. Foundational Issues in Human Brain Mapping. MIT Press; 2010.
- 74.Etzel JA, Zacks JM, Braver TS. Searchlight analysis: Promise, pitfalls, and potential. NeuroImage. 2013;78:261–269. doi: 10.1016/j.neuroimage.2013.03.041.
- 75.Poldrack RA. Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences. 2006;10(2):59–63. doi: 10.1016/j.tics.2005.12.004.
- 76.Peng RD. Reproducible research in computational science. Science. 2011;334(6060):1226–1227. doi: 10.1126/science.1213847.
- 77.Stodden V. Reproducible research: Tools and strategies for scientific computing. Computing in Science & Engineering. 2012;14(4):11–12. doi: 10.1109/MCSE.2012.82.
- 78.Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten Simple Rules for Reproducible Computational Research. PLoS Computational Biology. 2013;9(10):e1003285. doi: 10.1371/journal.pcbi.1003285.
- 79.Gorgolewski KJ, Auer T, Calhoun VD, Craddock RC, Das S, Duff EP, Poldrack RA. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data. 2016;3:160044. doi: 10.1038/sdata.2016.44.
- 80.Gorgolewski KJ, Varoquaux G, Rivera G, Schwarz Y, Ghosh SS, Maumet C, Margulies DS. NeuroVault.org: a web-based repository for collecting and sharing unthresholded statistical maps of the human brain. Frontiers in Neuroinformatics. 2015;9. doi: 10.3389/fninf.2015.00008.
- 81.Gorgolewski KJ, Alfaro-Almagro F, Auer T, Bellec P, Capota M, Chakravarty M, Poldrack R. BIDS Apps: Improving ease of use, accessibility and reproducibility of neuroimaging data analysis methods. bioRxiv. 2016:079145. doi: 10.1101/079145.
- 82.Gilmore RO, Adolph KE. Open Sharing of Research Video: Breaking down the boundaries of the research team. In: Hall K, Croyle R, Vogel A, editors. Advancing Social and Behavioral Health Research through Cross-Disciplinary Team Science: Principles for Success. Springer; (in press).
- 83.Jann K, Gee DG, Kilroy E, Schwab S, Smith RX, Cannon TD, Wang DJJ. Functional connectivity in BOLD and CBF data: Similarity and reliability of resting brain networks. NeuroImage. 2015;106:111–122. doi: 10.1016/j.neuroimage.2014.11.028.
- 84.Jovicich J, Marizzoni M, Bosch B, Bartrés-Faz D, Arnold J, Benninghoff J, Frisoni GB. Multisite longitudinal reliability of tract-based spatial statistics in diffusion tensor imaging of healthy elderly subjects. NeuroImage. 2014;101:390–403. doi: 10.1016/j.neuroimage.2014.06.075.
- 85.Koolschijn PCMP, Schel MA, van Rooij M, Rombouts SARB, Crone EA. A three-year longitudinal functional magnetic resonance imaging study of performance monitoring and test-retest reliability from childhood to early adulthood. Journal of Neuroscience. 2011;31(11):4204–4212. doi: 10.1523/JNEUROSCI.6415-10.2011.
- 86.Liao XH, Xia MR, Xu T, Dai ZJ, Cao XY, Niu HJ, He Y. Functional brain hubs and their test-retest reliability: A multiband resting-state functional MRI study. NeuroImage. 2013;83:969–982. doi: 10.1016/j.neuroimage.2013.07.058.
- 87.Madhyastha T, Merillat S, Hirsiger S, Bezzola L, Liem F, Grabowski T, Jäncke L. Longitudinal reliability of tract-based spatial statistics in diffusion tensor imaging. Human Brain Mapping. 2014;35(9):4544–4555. doi: 10.1002/hbm.22493.
- 88.Marchitelli R, Minati L, Marizzoni M, Bosch B, Bartrés-Faz D, Müller BW, Jovicich J. Test-retest reliability of the default mode network in a multi-centric fMRI study of healthy elderly: Effects of data-driven physiological noise correction techniques. Human Brain Mapping. 2016;37(6):2114–2132. doi: 10.1002/hbm.23157.
- 89.Choe AS, Jones CK, Joel SE, Muschelli J, Belegu V, Caffo BS, Pekar JJ. Reproducibility and temporal structure in weekly resting-state fMRI over a period of 3.5 years. PLoS ONE. 2015;10(10):e0140134. doi: 10.1371/journal.pone.0140134.
- 90.Poldrack RA, Laumann TO, Koyejo O, Gregory B, Hover A, Chen MY, Mumford JA. Long-term neural and physiological phenotyping of a single human. Nature Communications. 2015;6:8885. doi: 10.1038/ncomms9885.
- 91.Dubois J, Adolphs R. Building a science of individual differences from fMRI. Trends in Cognitive Sciences. 2016;20(6):425–443. doi: 10.1016/j.tics.2016.03.014.
- 92.Boekel W, Wagenmakers EJ, Belay L, Verhagen J, Brown S, Forstmann BU. A purely confirmatory replication study of structural brain-behavior correlations. Cortex. 2015;66:115–133. doi: 10.1016/j.cortex.2014.11.019.
- 93.Muhlert N, Ridgway GR. Failed replications, contributing factors and careful interpretations: Commentary on Boekel et al., 2015. Cortex. 2016;74:338–342. doi: 10.1016/j.cortex.2015.02.019.
- 94.Glasser MF, Coalson TS, Robinson EC, Hacker CD, Harwell J, Yacoub E, Van Essen DC. A multi-modal parcellation of human cerebral cortex. Nature. 2016;536(7615):171–178. doi: 10.1038/nature18933.
- 95.Glasser MF, Sotiropoulos SN, Wilson JA, Coalson TS, Fischl B, Andersson JL, Jenkinson M. The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage. 2013;80:105–124. doi: 10.1016/j.neuroimage.2013.04.127.
- 96.Zuo XN, Anderson JS, Bellec P, Birn RM, Biswal BB, Blautzik J, Milham MP. An open science resource for establishing reliability and reproducibility in functional connectomics. Scientific Data. 2014;1:140049. doi: 10.1038/sdata.2014.49.
- 97.Nichols TE, Das S, Eickhoff SB, Evans AC, Glatard T, Hanke M, Yeo BTT. Best practices in data analysis and sharing in neuroimaging using MRI. bioRxiv. 2016:054262. doi: 10.1101/054262.
- 98.Epskamp S, Nuijten MB. statcheck: Extract statistics from articles and recompute p values. 2016. Retrieved from http://CRAN.R-project.org/package=statcheck.
- 99.Nuijten MB, Hartgerink CHJ, van Assen MALM, Epskamp S, Wicherts JM. The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods. 2015;48(4):1–22. doi: 10.3758/s13428-015-0664-2.
- 100.Simonsohn U, Nelson LD, Simmons JP. P-curve: a key to the file-drawer. Journal of Experimental Psychology: General. 2014;143(2):534. doi: 10.1037/a0033242.
- 101.Wager TD, Lindquist M, Kaplan L. Meta-analysis of functional neuroimaging data: current and future directions. Social Cognitive and Affective Neuroscience. 2007;2(2):150–158. doi: 10.1093/scan/nsm015.
- 102.Nieuwenhuis S, Forstmann BU, Wagenmakers EJ. Erroneous analyses of interactions in neuroscience: a problem of significance. Nature Neuroscience. 2011;14(9):1105–1107. doi: 10.1038/nn.2886.
- 103.Chen G, Saad ZS, Britton JC, Pine DS, Cox RW. Linear mixed-effects modeling approach to FMRI group analysis. NeuroImage. 2013;73:176–190. doi: 10.1016/j.neuroimage.2013.01.047.
- 104.Kriegeskorte N, Simmons WK, Bellgowan PSF, Baker CI. Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience. 2009;12(5):535–540. doi: 10.1038/nn.2303.
- 105.Westfall J, Nichols T, Yarkoni T. Fixing the stimulus-as-fixed-effect fallacy in task fMRI. bioRxiv. 2016:077131. doi: 10.1101/077131.
- 106.Cox RW, Reynolds RC, Taylor PA. AFNI and clustering: False positive rates redux. bioRxiv. 2016:065862. doi: 10.1101/065862.
- 107.Eickhoff SB, Laird AR, Fox PM, Lancaster JL, Fox PT. Implementation errors in the GingerALE Software: Description and recommendations. Human Brain Mapping. 2016;38(1). doi: 10.1002/hbm.23342.
- 108.Breiman L. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science. 2001;16(3):199–231. doi: 10.1214/ss/1009213726.
- 109.Kay KN, Naselaris T, Prenger RJ, Gallant JL. Identifying natural images from human brain activity. Nature. 2008;452(7185):352–355. doi: 10.1038/nature06713.
- 110.Nishimoto S, Vu AT, Naselaris T, Benjamini Y, Yu B, Gallant JL. Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology. 2011;21(19):1641–1646. doi: 10.1016/j.cub.2011.08.031.
- 111.Huth AG, de Heer WA, Griffiths TL, Theunissen FE, Gallant JL. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature. 2016;532(7600):453–458. doi: 10.1038/nature17637.
- 112.Varoquaux G, Raamana PR, Engemann D, Hoyos-Idrobo A, Schwartz Y, Thirion B. Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines. NeuroImage. 2017;145:166–179. doi: 10.1016/j.neuroimage.2016.10.038.
- 113.Yarkoni T, Westfall J. Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science. (in press). doi: 10.1177/1745691617693393.
- 114.Cawley GC, Talbot NLC. On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research. 2010;11(Jul):2079–2107. Retrieved from http://www.jmlr.org/papers/v11/cawley10a.html.
- 115.Skocik M, Collins J, Callahan-Flintoft C, Bowman H, Wyble B. I tried a bunch of things: The dangers of unexpected overfitting in classification. bioRxiv. 2016:078816. doi: 10.1101/078816.
- 116.Bandrowski AE, Martone ME. RRIDs: A simple step toward improving reproducibility through rigor and transparency of experimental methods. Neuron. 2016;90(3):434–436. doi: 10.1016/j.neuron.2016.04.030.
- 117.Ascoli GA. The ups and downs of neuroscience shares. Neuroinformatics. 2006;4(3):213–215. doi: 10.1385/NI:4:3:213.
- 118.Gilmore RO. From big data to deep insight in developmental science. Wiley Interdisciplinary Reviews: Cognitive Science. 2016. doi: 10.1002/wcs.1379.
