A modular architecture for organizing, processing and sharing neurophysiology data

The International Brain Laboratory; Niccolò Bonacchi; Gaelle Chapuis; Anne Churchland; Eric E J DeWitt; Mayo Faulkner; Kenneth D Harris; Julia M Huntenburg; Max Hunter; Inês Laranjeira; Cyrille Rossant; Maho Sasaki; Michael Schartner; Shan Shen; Nicholas A Steinmetz; Edgar Y Walker; Steven J West; Olivier Winter; Miles Wells

doi:10.1038/s41592-022-01742-6

Author manuscript; available in PMC: 2023 Jun 14.

Published in final edited form as: Nat Methods. 2023 Mar 2;20(3):403–407. doi: 10.1038/s41592-022-01742-6

PMCID: PMC7614641 EMSID: EMS157526 PMID: 36864199

Abstract

We describe an architecture for organizing, integrating, and sharing neurophysiology data in single labs or collaborations. It comprises a database linking data files to metadata and electronic lab notes; a module collecting data from multiple labs into one location; a protocol for searching and sharing data; and a module for automatic analyses which populates a website. These modules can be used together or individually, by single labs or worldwide collaborations.

Improving technology allows neurophysiologists to record ever larger datasets. The need for technologies to organize and share this data is growing as scientists begin to assemble into large, international teams. The International Brain Laboratory (IBL) is a collaboration studying the computations supporting decision-making¹. We have developed modular data-management tools that enable individual labs and collaborations to:

Manage experimental subject colonies and track subject- and experiment-level metadata
Integrate data from multiple labs in a central store for sharing inside or outside the collaboration
Access shared data through a simple programmatic interface
Process incoming data through pipelines that automatically populate a website

Modern neurophysiological datasets comprise multiple recordings from multiple subjects, recorded using diverse devices. These data must be preprocessed, time-aligned, and integrated with data such as locations of recording electrodes before they can be used to draw scientific conclusions^2–8. Distributed collaborations pose distinct challenges: while public data release must wait for careful quality control, scientists within the collaboration require immediate access to specific data. The shared data store must therefore be searchable and must allow individual items to be downloaded and revised, because preprocessing and quality control methods are still evolving^9–11.

We addressed these problems with an architecture consisting of four modules (Figure 1). The first module is a Web interface for colony management and an electronic lab notebook, which links files arising from each experiment to relevant metadata. The second module integrates data from multiple labs into a central database and bulk data store, providing immediate access while allowing updates of individual items. The third automatically runs analyses on newly arrived data, providing results via a Web interface. The fourth allows standardization, access and sharing of the data. Full documentation can be found at https://docs.internationalbrainlab.org/ and through the links at https://www.internationalbrainlab.com/tools.

Figure 1. IBL data architecture. The “Alyx” database links colony management and electronic lab notebook metadata to experimental data files on a lab data server. Data from multiple labs are integrated on a central server, and distributed job management coordinates pre-processing on lab servers. Data are accessed via the Open Neurophysiology Environment (ONE) protocol, with adaptors for Neurodata Without Borders (NWB)^12,13 and DataJoint¹⁴, which also performs pipelined analyses for automatic display on a website.

To manage data within each lab, we developed “Alyx”: a relational database that links colony management, metadata, and lab notes to experimental data files. A web GUI allows users to enter metadata as it arrives (such as birth, weaning, genotyping, surgeries or experiments), and a REST API allows experiment control software to automatically enter metadata with a one-line command. Bulk data files are stored on a lab server, and linked to experiment and subject metadata in the database. This tool can be used by single labs as well as collaborations: it was developed in one member lab prior to IBL’s founding, and is now used by several labs worldwide for non-IBL work. An Alyx user guide is linked from our main documentation page.
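As a rough illustration of what such a one-line metadata entry might involve, the sketch below builds (but does not send) an HTTP POST request carrying session metadata as JSON. The host name, the `/sessions` endpoint path, the token scheme and the field names are hypothetical placeholders, not Alyx's documented schema; the Alyx documentation should be consulted for the real endpoints.

```python
import json
from urllib.request import Request


def build_session_request(base_url, token, subject, start_time, lab):
    """Build (without sending) a POST request that registers one session.

    The endpoint path and field names are illustrative assumptions.
    """
    payload = {"subject": subject, "start_time": start_time, "lab": lab}
    return Request(
        url=f"{base_url.rstrip('/')}/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )


# Hypothetical values; nothing is transmitted over the network here.
req = build_session_request(
    "https://alyx.example.org",
    "secret-token",
    "mouse_001",
    "2021-03-01T10:00:00",
    "example_lab",
)
print(req.get_method(), req.full_url)
```

Experiment control software could wrap a call like this in a single helper so that each session is registered automatically when acquisition starts.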

Integrating data between labs raises challenges of size and complexity. Large-scale electrophysiology produces hundreds of gigabytes per experiment, for which we have designed a novel 3-fold lossless compression algorithm (Appendix 1). A single IBL experiment generates over 150 raw and processed data files. We have devised conventions for organizing and naming these files, termed the “Open Neurophysiology Environment” (ONE; Appendix 2; https://int-brain-lab.github.io/ONE/), which formalizes how to encode cross-references between files, time synchronization, and versioning, and allows local and remote access via an API. ONE provides a simple way to standardize and share data from individual labs, by specifying standard filenames for common data types (Appendix 3) and defining conventions for naming lab-specific data files. Files from multiple labs are integrated by uploading nightly from lab servers to a central server using Globus Online¹⁵, coordinated by a central Alyx database which also stores metadata from all labs.
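To make the idea of standard filenames concrete, here is a minimal sketch that assumes a simplified three-part `object.attribute.extension` pattern, which is our reading of the ONE documentation; the full convention has optional parts that this sketch does not handle. The function names and the example filenames are illustrative, not part of the ONE library.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    obj: str         # the entity described, e.g. "spikes"
    attribute: str   # the property stored, e.g. "times"
    extension: str   # the file format, e.g. "npy"


def parse_dataset_name(filename):
    """Split a simplified 'object.attribute.extension' filename."""
    parts = filename.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not an object.attribute.extension name: {filename!r}")
    return DatasetName(*parts)


def group_by_object(filenames):
    """Map each object to a dict of {attribute: filename}."""
    groups = {}
    for name in filenames:
        parsed = parse_dataset_name(name)
        groups.setdefault(parsed.obj, {})[parsed.attribute] = name
    return groups


# Illustrative filenames only.
files = ["spikes.times.npy", "spikes.clusters.npy", "trials.intervals.npy"]
grouped = group_by_object(files)
print(sorted(grouped))
```

Grouping files by object in this way is what lets attributes of the same object be cross-referenced, since they share a common leading name.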

Neurophysiology data requires preprocessing, such as spike sorting and video analysis. We developed a task management system that uses computers in member labs as a processing pool. Computers query the Alyx database for a list of outstanding preprocessing tasks, determined by a dependency graph. Because Alyx is accessed through HTTP, this works despite different universities’ diverse firewall policies, and allows monitoring, logging, and restarting all preprocessing tasks. Higher-level analyses are run automatically on newly preprocessed data using DataJoint¹⁴, which places the results on a website; these include summaries of behavioral performance, which allow scientists to monitor training progress, and basic analyses of spike trains. While manual curation of the full dataset will be required before public release, an illustrative curated subset of these data is available on a public website (https://data.internationalbrainlab.org).
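The way a dependency graph determines which tasks are outstanding can be sketched as follows. This is a minimal illustration under our own assumptions, not the IBL task manager: the task names and the dictionary layout are hypothetical, and a real system would also track failures, logs and restarts.

```python
def runnable_tasks(dependencies, completed):
    """Return tasks not yet completed whose prerequisites are all completed.

    `dependencies` maps each task name to the list of tasks it depends on.
    """
    completed = set(completed)
    return sorted(
        task
        for task, parents in dependencies.items()
        if task not in completed and set(parents) <= completed
    )


# Hypothetical preprocessing graph.
dependencies = {
    "compress_raw": [],
    "spike_sorting": ["compress_raw"],
    "video_tracking": [],
    "quality_metrics": ["spike_sorting", "video_tracking"],
}

print(runnable_tasks(dependencies, completed=[]))
# → ['compress_raw', 'video_tracking']
print(runnable_tasks(dependencies, completed=["compress_raw", "video_tracking"]))
# → ['spike_sorting']
```

A worker in the processing pool would repeatedly ask for this list, run one task, and report it as completed, at which point downstream tasks become runnable.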

To access data, an API allows users to search experiments and load data from the ONE files directly into Python (Appendix 3). This API allows both collaborations and individual labs to share data using the same standard. A large collaboration such as IBL can host files on a server such as AWS, and run an Alyx server which allows users to rapidly search and selectively download the data. Individual labs can release data compatible with the same API by “uploading and forgetting” a zip of ONE files for users to download in toto. Users can also access data via Neurodata Without Borders (NWB)^12,13 using software that translates from the ONE standard (https://github.com/catalystneuro/IBL-to-nwb; Supplementary Table 1), or through DataJoint¹⁴. A comparison of these and other sharing systems is in Appendix 4. The analyses in a recently published paper¹ were made using this system, and an additional example is provided in Appendix 5.
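The search-then-selectively-download pattern can be illustrated with a toy in-memory index. This sketch is not the ONE API: the `SessionRecord` class, the `search_sessions` function and all identifiers below are hypothetical, and a real deployment would query the Alyx server rather than a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionRecord:
    session_id: str
    subject: str
    lab: str
    datasets: tuple  # dataset file names available for this session


def search_sessions(records, subject=None, lab=None, dataset=None):
    """Return IDs of sessions matching every criterion that is not None."""
    matches = []
    for rec in records:
        if subject is not None and rec.subject != subject:
            continue
        if lab is not None and rec.lab != lab:
            continue
        if dataset is not None and dataset not in rec.datasets:
            continue
        matches.append(rec.session_id)
    return matches


# Hypothetical index of three sessions.
index = [
    SessionRecord("s1", "mouse_001", "lab_a", ("spikes.times.npy", "trials.intervals.npy")),
    SessionRecord("s2", "mouse_001", "lab_a", ("trials.intervals.npy",)),
    SessionRecord("s3", "mouse_002", "lab_b", ("spikes.times.npy",)),
]

with_spikes = search_sessions(index, dataset="spikes.times.npy")
print(with_spikes)
# → ['s1', 's3']
```

Only the sessions returned by the search would then need to be downloaded, which is the advantage of a searchable server over a single zip archive.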

The IBL architecture was designed for our large-scale collaboration, but its modular design allows components to be used by individual labs and smaller-scale collaborations. The Alyx system provides easy-to-use colony management and electronic lab notebook features for labs or collaborations, linking experimental files to this metadata. The ONE conventions allow data to be organized within a lab and shared externally, using standards that scale to large collaborations. Larger collaborations can also benefit from other features such as the DataJoint architecture to perform automated analyses for web display. We hope that these tools, and additional software we have provided (Appendix 6), will help pave the way forward to an era in which data from neurophysiology labs is integrated and shared on a routine basis.

Supplementary Material

Supplementary Information

EMS157526-supplement-Supplementary_Information.pdf (628.6 KB, pdf)

Acknowledgements

This work was supported by the Wellcome Trust (209558 to IBL, 216324 to IBL) and Simons Foundation (to IBL).

Footnotes

Author contributions

Niccolò Bonacchi [Conceptualization] (supporting). [Data curation] Data, metadata, and pipeline (equal). [Funding acquisition] (supporting). [Project administration] Meeting coordination (supporting), attendance (equal). [Resources] Computing and data storage (equal). [Software, Validation] Pipeline, core, quality control (equal), database and analysis libraries (supporting). [Visualization] Posters (lead), presentations (equal). [Writing – original draft] (equal). [Writing – review & editing] (equal).

Gaelle Chapuis [Data Curation] Helped with writing user guides detailing how to enter metadata in Alyx (supporting). [Software, Validation] Acted as naive user tester for ONE and DJ (supporting). [Project administration] Gathered and reported users’ requirements (supporting).

Anne Churchland Contributed to project administration, funding acquisition, writing and revising.

Eric E. J. DeWitt [Investigation] (supporting) Contributed to analyses for example use of data. [Writing] (supporting) Figures and draft text for example.

Mayo Faulkner. [Software] Contributed to implementation of backend data infrastructure and analysis libraries. [Data Curation] Contributed to dataset curation and quality assurance.

Kenneth D. Harris. Contributed to design of overall data architecture, and to Alyx and ONE systems. Contributed to project administration, funding acquisition, writing and revising.

Julia Huntenburg. [Software] Contributed to implementation of backend data infrastructure, analysis libraries and continuous integration. [Data Curation] Contributed to dataset curation and quality assurance.

Michael Schartner. [Conceptualisation] Contributed to the development of dataset types related to video and their quality control metrics.

Max Hunter. Contributed to the design and development of Alyx.

Inês Laranjeira. [Investigation] Performed analyses for the example use case of the data architecture (lead). [Writing - original draft] (supporting)

Cyrille Rossant. Contributed to design and implementation of overall data architecture, and to Alyx and ONE systems.

Maho Sasaki Contributed to the design and implementation of the IBL Data Portal website.

Shan Shen Contributed to the design and implementation of the DataJoint pipeline and the IBL JupyterHub.

Nicholas A. Steinmetz contributed to the design, testing, and development of Alyx, of dataset types, and of software tools for working with them.

Edgar Y. Walker Contributed to the design and implementation of the DataJoint pipeline and the IBL Data Portal.

Steven J. West contributed to the design of data structures for histological alignment.

Olivier Winter [Software] Implemented the backend data infrastructure. [Validation/Methodology] Designed full loop integration tests to allow maintenance of the codebase. [Data Curation] Fixed and updated erroneous datasets.

Miles Wells. [Software] Contributed to the design, testing and implementation of Alyx, its dataset types and of software tools that work with them. [Software, Validation] Contributed to the design and implementation of continuous integration and quality assurance systems. [Writing] Contributed to writing, reviewing and editing the text and appendices.

Competing Interests

The authors declare the following competing interests: E.Y.W. holds equity ownership in Vathes LLC, which provides development and consulting for the framework (DataJoint) described in this work. The remaining authors declare no competing interests.

Data availability

Data for Figures 2 and 3 is available at https://data.internationalbrainlab.org/.

Code availability

All code described in this manuscript is freely available and is listed in Supplementary Table 1 along with links to the respective repositories. The behavior data were collected using Bonsai and pyBpod, available at https://github.com/int-brain-lab/iblrig. Metadata were stored in a custom database available at https://github.com/cortex-lab/alyx. The data were processed using the custom data pipelines ibllib (https://github.com/int-brain-lab/ibllib) and DataJoint (https://datajoint.io/). The data were accessed using ONE (https://github.com/int-brain-lab/ONE) and DataJoint (https://github.com/int-brain-lab/IBL-pipeline).

References

1. The International Brain Laboratory et al. Standardized and reproducible measurement of decision-making in mice. eLife. 2021;10:e63711. doi: 10.7554/eLife.63711.
2. Pachitariu M, et al. Suite2p: beyond 10,000 neurons with standard two-photon microscopy. bioRxiv. 2017:061507. doi: 10.1101/061507.
3. Mathis A, et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat Neurosci. 2018;21:1281–1289. doi: 10.1038/s41593-018-0209-y.
4. Giovannucci A, et al. CaImAn an open source tool for scalable calcium imaging data analysis. eLife. 2019;8:e38173. doi: 10.7554/eLife.38173.
5. Vogelstein JT, et al. Fast nonnegative deconvolution for spike train inference from population calcium imaging. J Neurophysiol. 2010;104:3691–704. doi: 10.1152/jn.01073.2009.
6. Pachitariu M, Steinmetz NA, Kadir SN, Carandini M, Harris KD. In: Advances in Neural Information Processing Systems. Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R, editors. Vol. 29. Curran Associates, Inc; 2016. Fast and accurate spike sorting of high-channel count probes with KiloSort; pp. 4448–4456.
7. Wiltschko AB, et al. Revealing the structure of pharmacobehavioral space through motion sequencing. Nat Neurosci. 2020;23:1433–1443. doi: 10.1038/s41593-020-00706-3.
8. Vogelstein JT, et al. Discovery of Brainwide Neural-Behavioral Maps via Multiscale Unsupervised Structure Learning. Science. 2014;344:386–392. doi: 10.1126/science.1250298.
9. Siegle JH, et al. Survey of spiking in the mouse visual system reveals functional hierarchy. Nature. 2021;592:86–92. doi: 10.1038/s41586-020-03171-x.
10. Hill DN, Mehta SB, Kleinfeld D. Quality metrics to accompany spike sorting of extracellular signals. J Neurosci. 2011;31:8699–705. doi: 10.1523/JNEUROSCI.0971-11.2011.
11. Harris KD, Quiroga RQ, Freeman J, Smith SL. Improving data quality in neuronal population recordings. Nat Neurosci. 2016;19:1165–1174. doi: 10.1038/nn.4365.
12. Teeters JL, et al. Neurodata Without Borders: Creating a Common Data Format for Neurophysiology. Neuron. 2015;88:629–634. doi: 10.1016/j.neuron.2015.10.025.
13. Rübel O, et al. The Neurodata Without Borders ecosystem for neurophysiological data science. bioRxiv. 2021:2021.03.13.435173. doi: 10.1101/2021.03.13.435173.
14. Yatsenko D, et al. DataJoint: managing big scientific data using MATLAB or Python. bioRxiv. 2015:031658. doi: 10.1101/031658.
15. Foster I. Globus Online: Accelerating and Democratizing Science through Cloud-Based Services. IEEE Internet Comput. 2011;15:70–73.
16. Wang Q, et al. The Allen Mouse Brain Common Coordinate Framework: A 3D Reference Atlas. Cell. 2020;181:936–953.e20. doi: 10.1016/j.cell.2020.04.007.
17. Allen Institute for Brain Science. Allen Mouse Brain Atlas (2015) with region annotations. 2017.
18. Urai AE, et al. Citric Acid Water as an Alternative to Water Restriction for High-Yield Mouse Behavior. eNeuro. 2021;11(1):8. doi: 10.1523/ENEURO.0230-20.2020.
