Abstract
In this perspective article, we consider the critical issue of standardising data and other research objects and, specifically, how international collaboration and organizations such as the International Neuroinformatics Coordinating Facility (INCF) can help ensure that emerging neuroscience data are Findable, Accessible, Interoperable, and Reusable (FAIR). As neuroscientists engaged in the sharing and integration of multi-modal and multiscale data, we see the current insufficiency of standards as a major impediment to the Interoperability and Reusability of research results. We call for increased international collaborative standardisation of neuroscience data to foster integration and efficient reuse of research objects.
Keywords: Neuroscience; Standards; Interoperability; International Neuroinformatics Coordinating Facility
As experimental assays and analyses become increasingly complex and the scale of tissue and cellular profiling multiplies, neuroscience faces increasing challenges. To be used efficiently, accessible data need to be described coherently and with standards. We present here the argument that standardisation is critical, requires an international effort, and will lead to much improved efficiency in neuroscience research. We call for the neuroscience community to join this standardisation effort.
Neuroscience is a multifactorial discipline where significant advances are made by combining theoretical, computational, experimental and technological approaches. The challenges of understanding function and dysfunction of the brain are still of unknown complexity and far from being met. Progress in therapeutics and clinical impact lags behind technical advances (Kapur et al., 2012), despite the increased depth and scope of investigations. Results are currently published at an unprecedented rate, and funding agencies (NOT-OD-21–013) and most journals are introducing stricter requirements for data and methods to encourage more rigorous practices. Even so, there remain concerns about the large number of poorly reproducible results (Ioannidis, 2007; Ioannidis et al., 2014; Baker, 2016). This represents a dire ‘lack of efficiency’ in neuroscience research, as in other fields (Chu & Evans, 2021).
To achieve the goals of basic and applied neuroscience, these fields require systematic, standardized and well-defined data organization practices and proper data description for effective content discovery and reproducibility. Deep understanding of the field necessitates 1) leveraging and extending existing theoretical frameworks and models for making testable predictions; and 2) probing experimental results and their interpretation by reanalysing data from different, new angles using cutting-edge analytic techniques. Given the complexity, scale, and multidisciplinary nature of the problem of understanding brain function, we argue such methods and practices for developing, integrating and testing theories and models must become radically more efficient, to keep up with the dramatic (and accelerating) advances in data acquisition.
There are evident reasons for the present state of practices in data sharing and management, not the least of which is the complexity of the nervous system itself. It is this profound complexity that requires the neuroscience research enterprise to efficiently integrate a broad set of results across manifold subfields. While these different subfields have common core concepts (like ‘function’, ‘experiment’, ‘observation’, ‘conclusion’, etc.), the data types, formats, experimental paradigms and appropriate metadata for these subfields differ, making integration of data and the development of a coherent theory of neural function a formidable challenge. Integration is also hampered by the culture of neuroscience (Ascoli, 2006), which still mostly values text-based articles over dynamically usable research products, such as datasets with access methods, or Web-based computational notebooks (Jupyter notebooks or eLife’s ‘Executable Research Articles’, eLife, 2020). Achieving a more expedient understanding of brain function will require us to move beyond the present code and data archiving and sharing practices (Gleeson et al., 2017), information architectures, and publication models.
Is this problem resolving or compounding? Nationally and internationally, large investments in big data neuroscience initiatives are being undertaken (eHBP1, US BRAIN2, ENIGMA3, the Human Connectome Project (Elam et al., 2021), the China Brain Project, the Japanese Brain/MINDS project4, among others). A myriad of smaller, investigator-initiated research projects are continuously adding to our knowledge base, and individual investigators are thinking more broadly through extended collaborations. Despite this avalanche of data, there are in fact few efforts to coordinate, across communities, the development of standards for the data and knowledge representations that result from experimental research. It is well known that research incentive and funding structures are better suited to individual projects and initiatives and do not always foster collaboration on data and knowledge management in the bigger picture. Major funding of large-scale consortia still prioritizes new data generation over data integration, interoperability, management, or maximizing knowledge across existing databases. Efforts such as the NIH Common Fund's Stimulating Peripheral Activity to Relieve Conditions (SPARC) program and the BRAIN Initiative’s Cell Census Network (BICCN) in the US are reshaping these consortial practices but remain limited.
The funding bias towards new data, with limited integration or curation, introduces inefficiencies through duplicated efforts, resulting in missing and redundant information and dramatically reducing the future return on the research investment. The waste caused by poor reusability is hard to quantify, but it is certainly large (Ferguson et al., 2014).
As neuroscience is an international effort, the effort to develop and implement standards is necessarily international in scope. Local efforts based on a few laboratories are unlikely to gather the critical mass necessary for adoption of a standard and the development of its ecosystem (tooling, training, etc.). Further, standards and repositories that are successful in supporting the aggregation of data across borders need to be sustained to be useful. In this way, the development and adoption of rigorous and open data standards is seen as one of the key elements in promoting efficient collaboration and reuse. Effective data standards are tightly coupled with the availability of software tools that manage input and output data representations and transformations. For researchers working remotely from one another in different subfields of neuroscience, improved standards and associated metadata are the only practical solution for efficient reuse of information to bridge across subfield domains (scales, species, cell type, resolution, brain functions, etc.). Funders and researchers are better at ‘developing’ than at ‘sustaining’, as there are fewer research incentives around maintenance of sustainable and coordinated information management infrastructure.
Is neuroscience a FAIR discipline, where data and results are Findable, Accessible, Interoperable and Reusable (Wilkinson et al., 2016)? FAIR is a set of increasingly accepted guiding principles for organizing and communicating the results of science so that they are understandable to both humans and machines. By contrast, current communication of results is still largely based on text articles in PDF or HTML format. The articles themselves are findable, and often accessible (thanks to open initiatives like PubMed Central), but the central elements leading to the conclusions of the research (namely the data, software, detailed methods and complete results) are rarely FAIR. This lack of availability and transparency is at least partially responsible for the problems that have emerged in terms of the reliability, reusability and reproducibility of current research findings. There is a growing trend to meta-analyse sets of published data, but these analyses will miss most data, i.e., studies for which data are not accessible or not sufficiently reusable because of their format. Such omissions actually bias these important attempts at trans-study synthesis. Enabling the FAIR principles across the complete research workspace would make information aggregation feasible, efficient, and unbiased or at least less biased (Mueller et al., 2018).
Today, the Web and other communication technologies provide fundamental tools to resolve this information integration issue. However, if our goal is to work efficiently and collaboratively and communicate research findings beyond exchanging papers, we need to establish a broader set of standards of communication. By analogy, the World Wide Web is successful because every browser “speaks” the same standards: HTTP and HTML. Imagine how inefficient a Web search would be if five different browsers were required to execute separate search elements and we then had to manually integrate the results. The efficiency of Web search today is due to the international standardisation efforts and oversight of the World Wide Web Consortium (W3C). The inefficiency of a five-browser, manually integrated search is suggestive of the current state of the art for a neuroscience query (e.g., “what is the cell density in brain regions associated with socialization, that expresses BDNF in the second trimester of development?”). Standards of data description and communication will also be necessary for machine and deep learning technologies to operate efficiently on large and diverse datasets, and to reduce the huge current curation burden. Machines will need to extract standardized metadata for analyses to be efficient and unbiased.
So where do we go from here? A number of actions can be envisioned. One step is that national funding agencies should invest in international organizations and initiatives whose mission is focused on standardisation and education in neuroscience. The most established and experienced organization for neuroscience is the International Neuroinformatics Coordinating Facility (INCF)5, with its current network of 18 affiliated nations6. Through its new membership model, community-driven scientific interest groups, and international governance by members' representatives and stakeholders, the INCF provides the scaffolding and networking for community engagement around standards. A core mission of the INCF is to ensure that neuroscience is served by a set of well supported, non-overlapping standards that are easy to access and understand. The INCF has taken on the important role of acting as a standards organization for neuroscience, where standards and best practices can be reviewed, vetted and promoted (Abrams et al., 2021). Through this process, the INCF is creating a portfolio of standards that serve neuroscience and is developing training materials on their use. As funders (such as NIH, NSF, Kavli, John and Laura Arnold Foundation, etc.) and the broader neuroscience community increasingly recognize that this work is critical, the unique experience collected by the INCF should be leveraged. Other international efforts include the International Brain Initiative, a consortium of the large international brain projects, which has recently established the Standards and Data Sharing Working Group to help achieve coordination with the INCF across large-scale projects7. The IEEE has several standards efforts underway for neurotechnology8.
The successful development and adoption of standards will rely on the symbiotic development of current and novel tools that use and profit from these standards. The time frame for developing these standards and tools through community involvement, especially across international lines, is typically much longer than the grant cycles that power individual research programs. Thanks to investments and the work of dedicated volunteers working through organizations like the INCF, a set of standards supporting neuroscience is starting to gain traction. A remarkable example of a successful standard is the “Brain Imaging Data Structure”9 (BIDS) for MRI data, started at an INCF meeting at Stanford, along with the development of many analysis tools that rely on the standard to automatically extract data and metadata (Gorgolewski et al., 2016, 2017). A substantial community has grown around the BIDS standard, with community-led extensions to domains such as magnetoencephalography (Niso et al., 2019) and electroencephalography (Pernet et al., 2019). The community has developed a formal governance procedure for extensions to the protocol, as well as a governance structure with an elected steering group. A second success is the Waxholm Space, a 3D MRI-based coordinate space for registering data in rat and mouse to a common coordinate system (Johnson et al., 2010; Okamura-Oho et al., 2012; Papp et al., 2014), adopted by the HBP and EBRAINS. Neurodata Without Borders (NWB) is an emerging standard for physiological data that has just issued its second version and is seeing uptake in the US BRAIN Initiative and other collaborative projects (Rübel et al., 2021). Finally, the US BRAIN Initiative is actively investing in the creation of new standards to support neuroscience10.
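To illustrate why a naming standard lets tools extract metadata automatically, the following is a minimal sketch in Python, assuming only the BIDS convention of underscore-separated key-value entities followed by a suffix and an extension. The function name and the example filename are hypothetical and chosen for illustration; this is not the official BIDS validator or any existing BIDS library.

```python
def parse_bids_filename(filename):
    """Split a BIDS-style filename into entities, suffix and extension.

    Illustrative only: assumes names of the form
    key-value_key-value_suffix.extension and performs no validation
    against the BIDS specification itself.
    """
    # Split at the first dot so that double extensions such as ".nii.gz" stay whole.
    stem, dot, extension = filename.partition(".")
    parts = stem.split("_")
    # The last underscore-separated chunk is the suffix (for example "bold").
    suffix = parts[-1]
    entities = {}
    for part in parts[:-1]:
        key, sep, value = part.partition("-")
        if not sep or not key or not value:
            raise ValueError(f"not a key-value entity: {part!r}")
        entities[key] = value
    return {
        "entities": entities,
        "suffix": suffix,
        "extension": dot + extension,
    }


if __name__ == "__main__":
    # Hypothetical filename following the key-value naming convention.
    info = parse_bids_filename("sub-01_ses-02_task-rest_bold.nii.gz")
    print(info["entities"], info["suffix"], info["extension"])
```

Because subject, session and task are encoded in a predictable place, a tool can group or filter files without any lab-specific configuration, which is the property the analysis tools mentioned above depend on.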
Programs for standardisation and promoting FAIR practices could also be set up through the INCF or through scientific societies. These societies themselves, such as the Society for Neuroscience, the Organization for Human Brain Mapping and the clinical neurophysiology societies, can play a crucial role in encouraging standards development (e.g. Nichols et al., 2017), particularly as a growing number of journals and funding agencies are requiring deposition of code and data in a form suitable for secondary analyses. However, these organizations would need to form alliances and put in place the required funding tools as well as a vetting process, whereas such a process is already provided by the INCF.
Expanded information architecture and new software tools also have their role to play. The Open Connectome Project has enhanced the FAIRness of several prominent neuroscience studies (Vogelstein et al., 2018). These datasets are stored in a precomputed format in a publicly accessible cloud repository, and can be read, written, and viewed with nothing more than an Internet connection and browser (Charles et al., 2020). The Jupyter project (Kluyver et al., 2016) also proposes formats and infrastructure for data reuse. Efforts such as the European Human Brain Project (www.humanbrainproject.eu) have made significant progress in knowledge graph architecture for neuroscience through projects such as the EBRAINS Knowledge Graph, a multi-modal metadata repository and query engine supporting experimental data and neuroscience data research. The SPARC Project has adopted FAIR principles in its data portal and knowledge graph, including full support for data citation (Osanlouy et al., 2021).
The FAIR data community and the INCF have learned, and continue to learn, some lessons from the decades-old open source software community and standards organizations. Standard practice in open source software packages includes one-line installation commands and a quick-start tutorial, along with thoroughly documented code (Glatard et al., 2018; Vogelstein, 2018). These practices could translate to data stewardship in the form of brief readmes describing how to download and access the data, accompanied by more detailed metadata that is tethered to the data itself.
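One way to picture metadata that is tethered to the data is a machine-readable sidecar file stored next to each data file. The sketch below, in Python, is an illustration under stated assumptions: the sidecar naming rule (same name with a `.json` extension), the helper names, and the required fields `description`, `license` and `units` are all hypothetical and are not taken from any particular standard.

```python
import json
from pathlib import Path

# Hypothetical minimal set of fields a reader needs before reusing a file.
REQUIRED_FIELDS = ("description", "license", "units")


def sidecar_path(data_path):
    """Return the path of the JSON sidecar that sits next to a data file."""
    return Path(data_path).with_suffix(".json")


def write_sidecar(data_path, metadata):
    """Write metadata as JSON next to the data file and return the sidecar path."""
    path = sidecar_path(data_path)
    path.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return path


def read_sidecar(data_path, required=REQUIRED_FIELDS):
    """Load the sidecar and fail loudly if a required field is absent."""
    metadata = json.loads(sidecar_path(data_path).read_text(encoding="utf-8"))
    missing = [key for key in required if key not in metadata]
    if missing:
        raise KeyError(f"sidecar is missing required fields: {missing}")
    return metadata
```

Keeping the description in a file that travels with the data, rather than only in the article text, means a downstream tool can check for the required fields before any analysis starts.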
There have been multiple efforts to move neuroscience into e-Neuroscience through the development of standards and tools; some have been successful and some have not. However, we should refrain from interpreting the difficulties encountered as an argument against addressing the urgent requirements of a transformed and rapidly evolving field of neuroscience. Our technologies improve continuously, and empirical advantages, social pressures, and institutional policies have moved scientific communities towards open, data-driven and networked science. Recall also that the Web evolved in several phases, and standardized Web browsers were not a part of its early phase. Neuroscience is unlikely ever to be served by a single large database, but over the years, a functioning infrastructure comprising multiple databases and data repositories has emerged for sharing neuroscience data (Ascoli et al., 2017). We have today an opportunity to develop the necessary technology and make these infrastructures interoperable. We have also learned from the past about the technological and sociological barriers and are now in a better position to address them. The impact of increasing the reusability, and therefore the efficiency, of neuroscience would be widespread and world-changing.
As the field of computational neuroscience evolves with new data acquisition methods, new hardware capabilities, and new analysis techniques, data standards will inevitably need to be updated or replaced. This reinforces the need for organizations to oversee these evolutions and the need for open governance as implemented by the INCF.
As scientists, we should pledge to work on the definition, development and implementation of standards and to foster a spirit of collaborative work in these developments. This work is not easy and will require dedication and support. We should ensure in our own research proposals that some effort is set aside for the development of reproducible and FAIR research objects through international coordination. Ultimately, research across the world is a collective and collaborative enterprise. Along with societies and funding agencies, individual scientists should take a proactive role in the evolution of this new world culture of FAIR neuroscience. One simple and concrete action is to participate in these global standardisation and coordination efforts with the INCF and together build a roadmap for FAIR neuroscience.
Footnotes
https://standards.ieee.org/industry-connections/neurotechnologies-for-brain-machine-interfacing.html
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Jean-Baptiste Poline, Email: jean-baptiste.poline@mcgill.ca
David N. Kennedy, Email: david.kennedy@umassmed.edu
Giorgio A. Ascoli, Email: ascoli@gmu.edu
David C. Van Essen, Email: vanessen@wustl.edu
Adam R. Ferguson, Email: adam.ferguson@ucsf.edu
Jeffrey S. Grethe, Email: jgrethe@ucsd.edu
Michael J. Hawrylycz, Email: mikeh@alleninstitute.org
Paul M. Thompson, Email: pthomp@usc.edu
Russell A. Poldrack, Email: russpold@stanford.edu
Satrajit S. Ghosh, Email: satra@mit.edu
David B. Keator, Email: dbkeator@uci.edu
Joshua T. Vogelstein, Email: jovo@jhu.edu
Helen S. Mayberg, Email: helen.mayberg@mssm.edu
Maryann E. Martone, Email: mmartone@ucsd.edu
References
- Abrams, M.B., Bjaalie, J.G., Das, S., Egan, G.F., Ghosh, S.S., Goscinski, W.J., Grethe, J.S., Kotaleski, J.H., Ho, E.T.W., Kennedy, D.N., Lanyon, L.J., Leergaard, T.B., Mayberg, H.S., Milanesi, L., Mouček, R., Poline, J.B., Roy, P.K., Strother, S.C., Tang, T.B., Tiesinga, P., Wachtler, T., Wójcik, D.K., & Martone, M.E. (2021). A Standards Organization for Open and FAIR Neuroscience: the International Neuroinformatics Coordinating Facility. Neuroinform. 10.1007/s12021-020-09509-0
- Ascoli GA. The ups and downs of neuroscience shares. Neuroinformatics. 2006;4(3):213–215. doi: 10.1385/NI:4:3:213.
- Ascoli GA, Maraver P, Nanda S, Polavaram S, Armañanzas R. Win–win data sharing in neuroscience. Nature Methods. 2017;14(2):112–116. doi: 10.1038/nmeth.4152.
- Baker M. Is there a reproducibility crisis? Nature. 2016;533:451–453.
- Charles, A. S., Benjamin, F., Nicholas, T., Talmo, D. P., Daniel T., Benjamin D. P., Jaewon C., et al. (2020). Toward Community-Driven Big Open Brain Science: Open Big Data and Tools for Structure, Function, and Genetics. July. 10.1146/annurev-neuro-100119-110036
- Chu, J. S. G., & Evans, J. A. (2021). Slowed canonical progress in large fields of science. PNAS 118. 10.1073/pnas.2021636118
- Elam JS, Glasser MF, Harms MP, Sotiropoulos SN, Andersson JLR, Burgess GC, Curtiss SW, Oostenveld R, Larson-Prior LJ, Schoffelen JM, Hodge MR, Cler EA, Marcus DM, Barch DM, Yacoub E, Smith SM, Ugurbil K, Van Essen DC. NeuroImage. 2021;1(244):118543. doi: 10.1016/j.neuroimage.2021.118543.
- eLife launches Executable Research Articles for publishing computationally reproducible results (2020). eLife. URL https://elifesciences.org/for-the-press/eb096af1/elife-launches-executable-research-articles-for-publishing-computationally-reproducible-results
- Ferguson AR, Nielson JL, Cragin MH, Bandrowski AE, Martone ME. Big data from small data: Data-sharing in the “long tail” of neuroscience. Nature Neuroscience. 2014;17:1442–1447. doi: 10.1038/nn.3838.
- Glatard, T., Kiar, G., Aumentado-Armstrong, T., Beck, N., Bellec, P., Bernard, R., Bonnet, A., Brown, S. T., Camarasu-Pop, S., Cervenansky, F., Das, S., Ferreira, R., da Silva, G., Flandin, P., Girard, K. J., Gorgolewski, C. R. G., Guttmann, V., Hayot-Sasson, P. O., Quirion, P., Rioux, M. É., & Rousseau Evans, A. C. (2018). Boutiques: A flexible framework to integrate command-line applications in computing platforms. Gigascience, 7. 10.1093/gigascience/giy016
- Gleeson P, Davison AP, Silver RA, Ascoli GA. A commitment to open source in neuroscience. Neuron. 2017;96(5):964–965. doi: 10.1016/j.neuron.2017.10.013.
- Gorgolewski, K. J., Tibor, A., Vince, D., Calhoun, R., Cameron, C., Samir, D., Eugene, P., Duff, G. F., & Poldrack, R. A. (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3(1), 1-9.
- Gorgolewski KJ, Alfaro-Almagro F, Auer T, Bellec P, Capotă M, Chakravarty MM, Churchill NW, Cohen AL, Craddock RC, Devenyi GA, Eklund A, Esteban O, Flandin G, Ghosh SS, Guntupalli JS, Jenkinson M, Keshavan A, Kiar G, Liem F, Raamana PR, Raffelt D, Steele CJ, Quirion P-O, Smith RE, Strother SC, Varoquaux G, Wang Y, Yarkoni T, Poldrack RA. BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods. PLoS Computational Biology. 2017;13:e1005209. doi: 10.1371/journal.pcbi.1005209.
- Ioannidis JPA. Why Most Published Research Findings Are False: Author’s Reply to Goodman and Greenland. PLoS Medicine. 2007;4(6):2. doi: 10.1371/journal.pmed.0040215.
- Ioannidis JPA, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, Schulz KF, Tibshirani R. Increasing Value and Reducing Waste in Research Design, Conduct, and Analysis. The Lancet. 2014;383(9912):166–175. doi: 10.1016/S0140-6736(13)62227-8.
- Johnson GA, Badea A, Brandenburg J, Cofer G, Fubara B, Liu S, Nissanov J. Waxholm Space: An Image-Based Reference for Coordinating Mouse Brain Research. NeuroImage. 2010;53(2):365–372. doi: 10.1016/j.neuroimage.2010.06.067.
- Kapur S, Phillips AG, Insel TR. Why has it taken so long for biological psychiatry to develop clinical tests and what to do about it? Molecular Psychiatry. 2012;17(12):1174–1179. doi: 10.1038/mp.2012.105.
- Kluyver, T., Ragan-Kelley, B., Pérez, F., Bussonnier, M., Frederic, J., Hamrick, J., Grout, J., Corlay, S., Ivanov, P., Abdalla, S., & Willing, C. (2016). Jupyter Notebooks—a publishing format for reproducible computational workflows 4.
- Mueller VI, Cieslik EC, Laird AR, Fox PT, Radua J, et al. Ten simple rules for neuroimaging meta-analysis. Neuroscience and Biobehavioral Reviews. 2018;84:151–161. doi: 10.1016/j.neubiorev.2017.11.012.
- Nichols TE, Das S, Eickhoff SB, Evans AC, Glatard T, Hanke M, Kriegeskorte N, Milham MP, Poldrack RA, Poline J-B, Proal E, Thirion B, Van Essen DC, White T, Yeo BTT. Best practices in data analysis and sharing in neuroimaging using MRI. Nature Neuroscience. 2017;20(3):299–303. doi: 10.1038/nn.4500.
- Niso, G., Tadel, F., Bock, E., Cousineau, M., Santos, A., & Baillet, S. (2019). Brainstorm Pipeline Analysis of Resting-State Data From the Open MEG Archive. Frontiers in Neuroscience, 13, 284. 10.3389/fnins.2019.00284
- Okamura-Oho, Y., Shimokawa, K., Takemoto, S., Hirakiyama, A., Nakamura, S., Tsujimura, Y., Nishimura, M., Kasukawa, T., Masumoto, K.H., Nikaido, I., & Shigeyoshi, Y. (2012). “Transcriptome Tomography for Brain Analysis in the Web-Accessible Anatomical Space.” PloS One, 7(9), e45373.
- Osanlouy M, Bandrowski A, de Bono B, Brooks D, Cassarà AM, Christie R, Ebrahimi N, Gillespie T, Grethe JS, Guercio LA, Heal M, Lin M, Kuster N, Martone ME, Neufeld E, Nickerson DP, Soltani EG, Tappan S, Wagenaar JB, Zhuang K, Hunter PJ. The SPARC DRC: Building a Resource for the Autonomic Nervous System Community. Frontiers in Physiology. 2021;12:929. doi: 10.3389/fphys.2021.693735.
- Papp EA, Leergaard TB, Evan Calabrese G, Johnson A, Bjaalie JG. Waxholm Space Atlas of the Sprague Dawley Rat Brain. NeuroImage. 2014;97(August):374–386. doi: 10.1016/j.neuroimage.2014.04.001.
- Pernet, C. R., Appelhoff, S., Gorgolewski, K. J., Flandin, G., Phillips, C., Delorme, A., & Oostenveld, R. (2019). EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Scientific Data, 6(1), 103. 10.1038/s41597-019-0104-8
- Rübel, O., Tritt, A., Ly, R., Dichter, B.K., Ghosh, S., Niu, L., Soltesz, I., Svoboda, K., Frank, L., & Bouchard, K.E. (2021). The Neurodata Without Borders ecosystem for neurophysiological data science. 10.1101/2021.03.13.435173
- Vogelstein, J. T., (2018). The FIRM Guiding Principles for scientific software development and stewardship. Bits and Brains. https://bitsandbrains.io/2018/10/21/numerical-packages.html
- Vogelstein JT, Perlman E, Falk B, Baden A, Roncal WG, Chandrashekhar V, Collman F, et al. A Community-Developed Open-Source Computational Ecosystem for Big Neuro Data. Nature Methods. 2018;15(11):846–847. doi: 10.1038/s41592-018-0181-1.
- Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., & Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 1-9.