2024 Apr 11;21(5):809–813. doi: 10.1038/s41592-024-02237-2

brainlife.io: a decentralized and open-source cloud platform to support neuroscience research

Soichi Hayashi 1,#, Bradley A Caron 1,2,#, Anibal Sólon Heinsfeld 2, Sophia Vinci-Booher 1,3, Brent McPherson 1,4, Daniel N Bullock 1, Giulia Bertò 2, Guiomar Niso 1,5, Sandra Hanekamp 2, Daniel Levitas 1,2, Kimberly Ray 2, Anne MacKenzie 2, Paolo Avesani 6, Lindsey Kitchell 1,7, Josiah K Leong 1,8, Filipi Nascimento-Silva 1, Serge Koudoro 1, Hanna Willis 9, Jasleen K Jolly 10, Derek Pisner 2, Taylor R Zuidema 1, Jan W Kurzawski 11, Kyriaki Mikellidou 12,13, Aurore Bussalb 14, Maximilien Chaumon 14, Nathalie George 14, Christopher Rorden 15, Conner Victory 16, Dheeraj Bhatia 2, Dogu Baran Aydogan 17,18, Fang-Cheng F Yeh 19, Franco Delogu 16, Javier Guaje 1, Jelle Veraart 11, Jeremy Fischer 1, Joshua Faskowitz 1, Ricardo Fabrega 1, David Hunt 1, Shawn McKee 20, Shawn T Brown 21, Stephanie Heyman 22, Vittorio Iacovella 23, Amanda F Mejia 1, Daniele Marinazzo 24, R Cameron Craddock 2, Emanuale Olivetti 23, Jamie L Hanson 19, Eleftherios Garyfallidis 1, Dan Stanzione 2, James Carson 2, Robert Henschel 1, David Y Hancock 1, Craig A Stewart 1, David Schnyer 2, Damian O Eke 25, Russell A Poldrack 26, Steffen Bollmann 27, Ashley Stewart 27, Holly Bridge 9, Ilaria Sani 28,29, Winrich A Freiwald 28, Aina Puce 1, Nicholas L Port 1, Franco Pestilli 1,2,
PMCID: PMC11093740  PMID: 38605111

Abstract

Neuroscience is advancing standardization and tool development to support rigor and transparency. Consequently, data pipeline complexity has increased, hindering FAIR (findable, accessible, interoperable and reusable) access. brainlife.io was developed to democratize neuroimaging research. The platform provides data standardization, management, visualization and processing and automatically tracks the provenance history of thousands of data objects. Here, brainlife.io is described and evaluated for validity, reliability, reproducibility, replicability and scientific utility using four data modalities and 3,200 participants.

Subject terms: Cognitive neuroscience, Computational neuroscience, Translational research, Technology, Databases


brainlife.io is a one-stop cloud platform for data management, visualization and analysis in human neuroscience. It is web-based and provides access to a variety of tools in a reproducible and reliable manner.

Main

Over the past 30 years, neuroimaging has matured to adopt the FAIR (findable, accessible, interoperable and reusable) principles1,2, develop reporting best practices3 and data standards4. While making research more rigorous and transparent, this maturation has inevitably increased compliance requirements. Indeed, just a few years ago it was possible to publish studies with a few hours of data collected and analyzed in a single laboratory. Today, studies require combining hundreds of hours of measurement, across multiple participants, laboratories and data modalities (for example, magnetic resonance imaging (MRI), positron emission tomography, functional near-infrared spectroscopy, electro-encephalography (EEG) and magnetoencephalography (MEG)). To support the needs of a mature neuroimaging field, several data collection efforts have been started; relevant examples are the Human Connectome Project (HCP)5, Cambridge Centre for Ageing and Neuroscience study (Cam-CAN)6, Adolescent Brain Cognitive Development (ABCD) study7, the UK-Biobank8, Healthy Brain Network (HBN)9, Pediatric Imaging Neurocognition and Genetics (PING) study10 and the Natural Scene Dataset11. At the same time, the complexity of the data pipeline has also increased with multiple, distinct, software libraries and analysis toolboxes developed12,13.

As compliance requirements grow, so do barriers to entry (Fig. 1a). The mature neuroimaging field requires increased resources and technical training to piece together and track multiple processes such as data ingestion, standardization, storage, management, preprocessing and feature extraction. Currently, no single, low-barrier technology exists to integrate and manage the ever-changing software and data components of a full study. The growing compliance requirements affect the research community inequitably; smaller institutions and lower-income countries are more likely to lack resources and training. As such, this maturation process risks favoring higher-resourced teams: an outcome that would not only decrease diversity and inclusion, but also slow down scientific progress.

Fig. 1. The burdens of neuroscience and the promise of integrative infrastructure.

a, A figurative representation of the current major burdens of performing neuroimaging investigations. b, Our proposal for integrative infrastructure that coordinates services required to perform FAIR, reproducible, rigorous and transparent neuroimaging research, thereby lifting the burden from the researcher. c, brainlife.io rests on the foundational pillars of the open science community such as data archives, standards, software libraries and compute resources. d, brainlife.io’s Map step takes MRI, MEG and EEG data and processes them to extract statistical features of interest. brainlife.io’s Reduce step takes the extracted features and serves them to Jupyter Notebooks for statistical analysis. PS, parc-stats datatype; TM, tractmeasures datatype; NET, network datatype and CLI, command line interface. e, The brainlife.io technology automates capture of data provenance. All data objects on brainlife.io are stored with a record of the apps, app versions and parameters used to process the data. f, The primary services provided to the user by brainlife.io. Panels a and b adapted from ref. 22 under a Creative Commons license CC BY 4.0.

In support of simplicity, efficiency, transparency and equity in big data neuroscience research, our team has developed a community resource, brainlife.io (Fig. 1b). The brainlife.io platform stands on the pillars of open science (Fig. 1c), to provide free, secure and reproducible neuroscientific data analysis. Because of its web-based availability, brainlife.io should expand opportunities for researchers from nations and institutions with limited research budgets and resources. brainlife.io should then serve as an enabler for researchers and students from all sorts of institutions of higher education and all sorts of backgrounds to access cutting-edge neuroscience analytic tools.

brainlife.io is a ready-to-use and ready-to-expand platform. As a ready-to-use system, it allows researchers to upload and analyze data from MRI, MEG and EEG systems. Data are managed using a secure warehousing system with a proper access-control model. Data can be preprocessed and visualized using version-controlled applications (hereafter referred to as apps; https://brainlife.io/apps; Supplementary Fig. 1), compliant with major data standards (for example, the Brain Imaging Data Structure4). As a ready-to-expand system, software developers may submit apps guided by standardization and documentation (https://github.com/brainlife/abcd-spec and https://brainlife.io/docs). The platform uses opportunistic computing to serve commercial and academic clouds to researchers. Computing resources can be registered on brainlife.io for individual users and projects, or the larger community (Extended Data Fig. 1a,b). Supplementary Results 1 describe the technology.

Extended Data Fig. 1. Platform Architecture.

a. Map of the locations of critical hubs for brainlife.io. b. Map of the locations of critical facets of this research, including project infrastructure (that is, compute resources), collaborators, and data sources. As the United States and Europe are home to many of the infrastructural resources, collaborators, and data sources, more details for these regions are provided (insets). c. brainlife.io’s Amaretti links data archives, software libraries, and computing resources. Specifically, ‘Apps’ (containerized services defined on GitHub.com) are automatically matched with data stored in the ‘Warehouse’ and with computing resources. Statistical analyses can be implemented using Jupyter Notebooks. d. brainlife.io provides efficient docking between data archives, processing apps, and compute resources via a centralized service. e. Apps use standardized Datatypes and allow ‘smart docking’ only with compatible data objects. App outputs can be docked by other Apps for further processing.

The architecture of brainlife.io is based on a microservice approach for automated and decentralized data management and processing. Microservices are handled by the orchestration system Amaretti (Extended Data Fig. 1c,d and Extended Data Table 1) which deploys computational jobs on high-performance clusters and clouds (for example, Google Cloud, AWS or Microsoft Azure). Data management on brainlife.io is centered around projects, the ‘one-stop-shop’ for data management, processing, analysis and visualization (Supplementary Results 2 and Supplementary Fig. 2). Data archives can be docked by brainlife.io (Extended Data Fig. 1d) and data imported via the portal https://brainlife.io/datasets (Supplementary Table 1). Data from measurement instruments are imported using https://brainlife.io/ezbid (Extended Data Table 1)14. Data processing on brainlife.io uses an object-oriented and micro-workflow service model. Data objects are stored using predefined standardized formats, datatypes, that allow automated app pipelining (Extended Data Fig. 1e; https://brainlife.io/datatypes) and provenance tracking for millions of data objects. Data processing Apps are containerized, composable processing units, can be written in any language using containerization technology and are smart, meaning that they automatically identify, accept or reject data objects before processing (Supplementary Results 3 and Supplementary Fig. 3). brainlife.io apps and datatypes are Brain Imaging Data Structure4 compatible.
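The idea that apps accept or reject data objects according to their datatype, and that one app's output can feed the next, can be illustrated with a minimal sketch. This is not the brainlife.io implementation: the class names, the datatype strings and the `accepts`/`run` methods are all hypothetical, chosen only to show how typed inputs and outputs make automated pipelining possible.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataObject:
    object_id: str
    datatype: str  # illustrative label, not an official brainlife.io datatype name


@dataclass(frozen=True)
class App:
    name: str
    input_datatypes: Tuple[str, ...]  # datatypes this app can dock with
    output_datatype: str              # datatype of the object it produces

    def accepts(self, obj: DataObject) -> bool:
        """Return True only if the object's datatype is compatible with this app."""
        return obj.datatype in self.input_datatypes

    def run(self, obj: DataObject) -> DataObject:
        """Reject incompatible objects; otherwise emit a new typed object."""
        if not self.accepts(obj):
            raise TypeError(f"{self.name} cannot process datatype {obj.datatype!r}")
        return DataObject(f"{obj.object_id}>{self.name}", self.output_datatype)


# Hypothetical two-app chain: the output of the first app docks with the second.
t1 = DataObject("obj-001", "anat/t1w")
segment = App("segment", ("anat/t1w",), "parcellation")
stats = App("parc-stats", ("parcellation",), "parc-stats")

result = stats.run(segment.run(t1))
```

In this sketch `stats.accepts(t1)` is false, so the statistics app can only be reached through the segmentation app, which is the kind of constraint that lets a platform chain apps without manual checks.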

Extended Data Table 1.

Platform microservices

Table listing all platform services: name, scope, service URL (pointer to the brainlife.io page if available as a direct URL) and GitHub URL for the code.

Complex neuroimaging processing pipelines are simplified into two main steps, akin to Google’s MapReduce algorithm. An initial Map step preprocesses data objects asynchronously, in parallel, to extract features of interest (that is, functional activations, white matter maps, brain networks or time series data; Fig. 1d). A ‘reduce’ step follows where features extracted using apps are made available to preconfigured Jupyter Notebooks to perform analysis and generate figures. Indeed, all analyses and figures in this paper are available in brainlife.io notebooks (Supplementary Table 2). brainlife.io’s data workflow makes it possible to integrate large volumes of data into small sets of features saved into ‘tidy data’ structures (Fig. 1d). For more documentation regarding usage of the platform, see Extended Data Fig. 2 and Supplementary Table 1. Datatypes inform apps allowing automated processing and provenance tracking for millions of data objects. brainlife.io tracks data object IDs, app versions and parameter sets across data processing steps. brainlife.io data provenance graphs visualize (Fig. 1e and Supplementary Table 1) and reproduce (Supplementary Table 1) the data generation steps. brainlife.io lowers the barriers of entry to FAIR neuroimaging by supporting an end-to-end data analysis workflow within a unified ecosystem (Fig. 1f).
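The two-step workflow described above can be sketched in a few lines. The example below is an illustration under stated assumptions, not the platform's code: the app description, the `map_step` and `reduce_step` functions and the toy data objects are hypothetical. The Map step runs independently on each data object (here in a thread pool) and records the input ID, app version and parameters alongside each feature; the Reduce step collates the features into a tidy table with one row per observation.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical app description; names and parameters are illustrative only.
APP = {"name": "mean-intensity", "version": "1.0.0", "params": {"scale": 1.0}}


def map_step(data_object):
    """Extract one feature from one data object and attach a provenance record."""
    value = APP["params"]["scale"] * mean(data_object["samples"])
    return {
        "subject": data_object["subject"],
        "feature": "mean_intensity",
        "value": value,
        "provenance": {
            "input_id": data_object["id"],
            "app": APP["name"],
            "version": APP["version"],
            "params": dict(APP["params"]),
        },
    }


def reduce_step(rows):
    """Collate per-object features into a tidy table, sorted by subject."""
    tidy = [
        {"subject": r["subject"], "feature": r["feature"], "value": r["value"]}
        for r in rows
    ]
    return sorted(tidy, key=lambda r: r["subject"])


# Toy data objects standing in for imaging data (not real measurements).
data_objects = [
    {"id": "obj-1", "subject": "sub-01", "samples": [1.0, 2.0, 3.0]},
    {"id": "obj-2", "subject": "sub-02", "samples": [4.0, 6.0]},
]

# Map: each object is processed independently, so the work can run in parallel.
with ThreadPoolExecutor() as pool:
    mapped = list(pool.map(map_step, data_objects))

# Reduce: small feature table ready for statistical analysis in a notebook.
tidy = reduce_step(mapped)
```

Because each mapped row carries its own provenance record, the tidy table can always be traced back to the input object, app version and parameter set that produced it.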

Extended Data Fig. 2. Platform Usage.

a. Top left. Number of users submitting more than 10 jobs per month. Top middle. Number of projects over time. Top right. Number of Apps over time. Bottom left. Data storage across all Projects. Bottom middle. Compute hours across all Projects (data only available 6 months post project start). Bottom right. Lines of code in the top 50 most-used Apps. b. Top left. User communities. Top right. App categories. Bottom left. Percent of total jobs launched with the software library installed (percentage for jobs of top 50 most-used Apps). Bottom right. Dataset sources. c. Map of the locations of the users that created an account and accessed brainlife.io. This map is a proxy for the level of attention the platform achieved worldwide.

We performed validation experiments to demonstrate cases where brainlife.io’s technology produces results consistent with best practices in the field. We used over 1,800 participants from three datasets: PING, HCPs1200 and Cam-CAN (Extended Data Figs. 3–8, Supplementary Results 4, Supplementary Fig. 4 and Supplementary Tables 3 and 4). Participants across all datasets spanned seven decades (that is, PING, 3–20 years; HCPs1200, 20–37 years and Cam-CAN, 18–88 years). Lifespan trajectories were plotted for multiple brain features (for example, brain region volume, white matter tract fractional anisotropy (FA), connectivity networks and MEG peak frequency; Fig. 2a and Extended Data Fig. 7) using brainlife.io’s Jupyter Notebooks. Inverted U-shaped lifespan trajectories were estimated, consistent with previous studies15–17 (Fig. 2a and Extended Data Fig. 7). The results generated using brainlife.io demonstrate that substantially different datasets can be collated to identify established lifespan trajectories of the brain (Supplementary Results 4).
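The figure legends state that linear regressions were fitted to each dataset and a quadratic regression to the pooled data. A minimal sketch of that kind of analysis follows, using only the Python standard library. Everything here is an assumption for illustration: the data are synthetic (an inverted U peaking at age 40 plus noise), the cohort names are placeholders that only borrow the age ranges quoted above, and the `polyfit` helper is a plain least-squares fit rather than the notebooks' actual code.

```python
import random


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def polyfit(x, y, degree):
    """Least-squares polynomial fit; returns coefficients, lowest order first."""
    sums = [sum(xi ** k for xi in x) for k in range(2 * degree + 1)]
    normal = [[sums[i + j] for j in range(degree + 1)] for i in range(degree + 1)]
    rhs = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(degree + 1)]
    return solve(normal, rhs)


# Synthetic cohorts (NOT study data); only the age ranges follow the text.
rng = random.Random(0)
cohorts = {"young": range(3, 21), "adult": range(20, 38), "lifespan": range(18, 89)}
data = {
    name: [(float(age), 50.0 - 0.01 * (age - 40) ** 2 + rng.gauss(0.0, 0.5))
           for age in ages]
    for name, ages in cohorts.items()
}

# One linear fit per cohort.
slopes = {}
for name, points in data.items():
    ages, values = zip(*points)
    intercept, slope = polyfit(ages, values, 1)
    slopes[name] = slope

# One quadratic fit over the pooled data; the vertex gives the estimated peak age.
pooled = [p for points in data.values() for p in points]
ages, values = zip(*pooled)
c0, c1, c2 = polyfit(ages, values, 2)
peak_age = -c1 / (2.0 * c2)
```

A negative quadratic coefficient `c2` indicates an inverted U, and per-cohort slopes of opposite sign on either side of the peak show why a single cohort with a narrow age range cannot reveal the full trajectory.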

Extended Data Fig. 3. Data processing validity and reliability analysis.

Top row (a): Validity measures derived using the HCP Test-Retest (HCPTR) data. Each dot corresponds to the ratio for a given subject between data preprocessed and provided by the HCP Consortium vs data preprocessed on brainlife.io in a given measure for a given structure. Pearson’s correlation (r), root mean squared error (rmse), and a linear fit between the test and retest results were calculated. Parcel volume (mm³). Tract-average fractional anisotropy (FA). Node-wise functional connectivity (FC)*. Primary gradient value derived from resting-state fMRI*. Peak frequency (Hz) in the alpha band derived from MEG. Data from magnetometer sensors are represented as squares, and data from gradiometer sensors are represented as circles. Dark colors represent data within ±1 standard deviation (SD). 50% opacity represents data within 1–2 SD. 25% opacity represents data outside 2 SD. *A representative 5% of data presented. Bottom row (b): Test-retest reliability measures derived from derivatives of the HCPTR dataset generated using brainlife.io. Each dot corresponds to the ratio between a test-retest subject and a given measure for a given structure. Pearson’s correlation (r), root mean squared error (rmse), and a linear fit between the test and retest results were calculated. Parcel volume (mm³). Tract-average fractional anisotropy (FA). Node-wise functional connectivity (FC)*. Primary gradient value derived from resting-state fMRI*. Peak frequency (Hz) in the alpha band derived from MEG using the Cambridge (Cam-CAN) dataset. Data from magnetometer sensors are represented as squares, and data from gradiometer sensors are represented as circles. Dark colors represent data within ±1 standard deviation (SD). 50% opacity represents data within 1–2 SD. 25% opacity represents data outside 2 SD. *A representative 5% of data presented.
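The three agreement measures named in this legend (Pearson's r, rmse and a linear fit) can be computed as in the sketch below. It uses the textbook definitions, which is an assumption about what the authors computed, and the `agreement` function and the four paired values are hypothetical, not taken from the study.

```python
import math


def agreement(x, y):
    """Pearson r, root mean squared error and least-squares line for paired values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "r": sxy / math.sqrt(sxx * syy),
        "rmse": math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


# Hypothetical tract-average FA values for four subjects (not study data).
test_session = [0.40, 0.45, 0.50, 0.55]
retest_session = [0.41, 0.44, 0.51, 0.54]

metrics = agreement(test_session, retest_session)
```

Reporting all three together is useful because they answer different questions: r measures how well the ordering of subjects is preserved, rmse measures the size of the disagreement in the measure's own units, and the fitted slope and intercept expose any systematic scaling or offset between the two sessions or pipelines.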

Extended Data Fig. 8. Replication of previous studies using brainlife.io.

a. Average cortical hcp-mmp parcel thickness (Nstruc = 322) compared to parcel orientation dispersion index (ODI) from the NODDI model mapped to the cortical surface (inset) of the HCPs1200 dataset (Nsub = 1,043) and Cam-CAN (Nsub = 492) dataset compared to the parcel-average cortical thickness. b. Receiver operating characteristic (ROC) curves comparing the performance of segmentation of the right ILF using two automated segmentation methods (LAP: blue, NN_DR_MAM: green) in a subset of the HCPs1200 dataset (Nsub = 15). Dice coefficients between manual and automated segmentation of the hippocampus using the AHSS method in the UPENN dataset. c. Stressful life events obtained from the Negative Life Events Schedule (NLES) survey from Healthy Brain Network participants (Nsub = 42) compared to uncinate-average normalized quantitative anisotropy (QA). Mean linear regression (blue line) fits and standard deviation (shaded blue). Early life stress was obtained from multiple surveys collected from ABCD participants (Nsub = 1,107) compared to uncinate-average fractional anisotropy (FA). Linear regression (green line) fits the data with standard deviation (shaded green). See Fig. 2b,c.

Fig. 2. brainlife.io supports scientific discovery and replication.

a–d, Identifying unique relationships with brain features over the lifespan. a, Relationship between participant age and right hippocampal volume, right inferior longitudinal fasciculus FA, within-network average functional connectivity (FC) derived using the Yeo17 atlas and peak frequency in the alpha band derived from magnetometers (squares) and gradiometers (circles) from MEG data. These analyses include participants from the PING (purple), HCPs1200 (green) and Cam-CAN (yellow) datasets. Linear regressions were fitted to each dataset, and a quadratic regression was fitted to the entire dataset (blue). b,c, Replication and generalization of previously reported scientific findings. b, Average cortical hcp-mmp parcel thickness (Nstruc = 322) compared to the parcel ODI from the NODDI model mapped to the cortical surface (inset) of the HCPs1200 dataset (Nsub = 1,043) and Cam-CAN (Nsub = 492) dataset compared to the parcel-average cortical thickness. c, Stressful life events were obtained from the NLES survey from HBN participants (Nsub = 42) compared to uncinate-average normalized quantitative anisotropy (QA). Mean linear regression (blue line) fits and standard deviation (shaded blue). Early life stress was obtained from multiple surveys collected from ABCD participants (Nsub = 1,107) compared to uncinate-average FA. Linear regression (green line) fits the data with standard deviation (shaded green). d, Identification of clinical biomarkers. Retinal optical coherence tomography images from healthy controls (top row), patients with Stargardt’s disease (middle row) and patients with choroideremia (bottom row). From these images, photoreceptor complex thickness was measured for each group (controls, gray; choroideremia, green; Stargardt’s, blue) in two distinct areas of the retina: the fovea (eccentricities 0–1°) and periphery (eccentricities 7–8°). In addition, optic radiations carrying information for each retinal area were segmented and FA profiles were mapped. Average profiles with standard error (shaded regions) were computed. One participant with Stargardt’s disease and one with choroideremia were identified, each having FA profiles that deviated from those of healthy controls.

Extended Data Fig. 7. Lifelong brain maturation estimated across datasets.

Relationship between subject age and a. Right hippocampal volume, b. Right inferior longitudinal fasciculus (ILF) fractional anisotropy (FA), c. maximum node degree of density network derived using the hcp-mmp atlas, d*. Within-network average functional connectivity (FC) derived using the Yeo17 atlas, e*. Functional gradient distance for visual resting state network derived from the Yeo17 atlas, and f. Peak frequency in the alpha band derived from magnetometers (squares) and gradiometers (circles) from MEG data. These analyses include subjects from the PING (purple), HCPs1200 (green), and Cam-CAN (yellow) datasets. Linear regressions were fit to each dataset, and a quadratic regression was fit to the entire dataset (blue). * All points in e, and f are presented. See Fig. 2a. Relationship between age of subject and g. Cortical fractional anisotropy (FA) of the left V1, h. Within-network average functional connectivity (FC) from the Yeo17 Default Mode - A network. These analyses include subjects from the PING (purple), HCPs1200 (green), and Cam-CAN (yellow) datasets. Linear regressions were fit to each dataset, and a quadratic regression was fit to the entire dataset (blue).

We further evaluated the ability to replicate results and generalize findings. Apps were created to estimate cortical thickness and tissue orientation dispersion (orientation dispersion index, ODI) and to analyze the HCPs1200 dataset. A negative relationship between cortical thickness and ODI was estimated (Fig. 2b and Extended Data Fig. 8; rHCP-brainlife = −0.43 versus roriginal), replicating the original study (ODI; roriginal = −0.46)18. The result was also generalized to the Cam-CAN dataset (Fig. 2b and Extended Data Fig. 8; rCam-CAN-brainlife = −0.28 versus roriginal). The association between life stressors and white matter organization of the uncinate fasciculus (r = −0.057) was a replication of Hanson et al.19 using two independent datasets. The Negative Life Events Schedule (NLES) was correlated with quantitative anisotropy in the right- and left-hemisphere uncinate fasciculus (Fig. 2c and Extended Data Fig. 8; rHBN_LEFT = −0.35, two-tailed t-test, P = 0.018; rHBN_RIGHT = −0.39, two-tailed t-test, P < 0.0156). Early Life Stress (a composite score of traumatic life events, environmental and neighborhood safety, and the family conflict subscale) was associated with the uncinate fasciculus FA (Fig. 2c and Extended Data Fig. 8; rABCD_LEFT = −0.12, P = 9.41 × 10⁻⁵; rABCD_RIGHT = −0.09, P = 0.0035). The results demonstrate the ability of brainlife.io services to detect meaningful associations in large, heterogeneous datasets (Supplementary Results 4).

Finally, we tested the ability of brainlife.io’s services to detect optic radiation white matter changes as a result of eye disease20. Individuals with Stargardt’s disease (deterioration initiated in the central retina) and choroideremia (deterioration initiated in peripheral retina) were compared to healthy controls. Stargardt’s FA was reduced in optic radiation fibers projecting to central V1 (not peripheral; Fig. 2d). Choroideremia’s FA was reduced in optic radiation fibers projecting to peripheral V1 (not central; Fig. 2d and Supplementary Results 4).

Our vision for brainlife.io is that of a trusted, interoperable and integrative platform connecting data archives and global communities of software developers, hardware providers and domain scientists (Supplementary Results 5 and Supplementary Table 5). The goal of brainlife.io is to facilitate research and education, accelerate brain understanding and lead to cures for brain diseases. To support this vision, brainlife.io connects trainees, researchers, developers and computing resource managers in high-, medium- and low-income countries via technology. The platform is registered on fairsharing.org, datacite.org and nitrc.org; it is recommended by the International Neuroinformatics Coordinating Facility (https://incf.org/infrastructure/brainlife) and it can serve the data deposition and sharing mandate of the US National Institutes of Health21,22. A comprehensive overview of the platform and tutorials can be found at https://brainlife.io/docs. Videos provide tutorials and demonstrations at youtube.com/@brainlifeio. A Slack channel supports communication and operations: https://brainlife.slack.com. Questions can be posted using the topic ‘brainlife’ on https://neurostars.org, or GitHub issues (https://github.com/brainlife/brainlife/issues) can be added directly to the code repositories. A quarterly outreach newsletter is sent out to all users, and an X account (@brainlifeio) informs the wider community about critical events. The platform has already attracted a growing community (Supplementary Results 3).

Methods

Data sources

Multiple openly available data sources were used for examining the validity, reliability and reproducibility of brainlife.io apps and for examining population distributions. All information regarding the specific image acquisitions, participant demographics and study-wide preprocessing can be found in the publications in refs. 5–7,9,10,23–25. Some data sources are currently unpublished. For these, the appropriate information is provided. Experiments were approved by the local institutional review boards (IRB) and only the personnel approved for a specific study accessed the data in private projects on brainlife.io.

Validity, reliability, reproducibility, replicability, developmental trends and reference datasets

HCP (test–retest, s1200-release)

Data from these projects were used to assess the validity, reliability and reproducibility of the platform. They were used to assess the abilities of the platform to identify developmental trends in structural and functional measures, and they were used to generate reference datasets. For structural MRI (sMRI) data, the minimally preprocessed structural T1w and T2w images from the HCP from 1,066 participants from the s1200 and 44 participants from the test–retest releases were used5. Specifically, the 1.25 mm ‘acpc_dc_restored’ images generated from the Siemens 3 T MRI scanner were used for all analyses involving the HCP. For most examinations, the already-processed Freesurfer output from HCP was used. For diffusion MRI (dMRI) data, to assess the validity of preprocessing on brainlife.io, the unprocessed dMRI data from 44 participants from the HCP test dataset was used. For reliability and all remaining analyses, the minimally preprocessed dMRI images from 1,066 participants from the s1200 and 44 participants from the test–retest releases from the 3 T Siemens scanner were used. All processes incorporated the multi-shell acquisition data. For functional data (functional MRI (fMRI)), regarding validation, the unprocessed resting-state fMRI data from 44 participants from the HCP test dataset were compared to the minimally preprocessed blood oxygenation level dependent data provided by HCP. For reliability and all other analyses, the minimally preprocessed blood oxygenation level dependent data from 1,066 participants from the s1200 and 44 participants from the test–retest releases from the 3 T Siemens scanner were used.

The Cam-CAN

The data from this project were used to assess the validity, reliability and reproducibility of the platform, to assess the abilities of the platform to identify developmental trends of structural and functional measures, and to generate reference datasets. For sMRI data, the unprocessed 1 mm isotropic structural T1w and T2w images from 652 participants from the Cam-CAN6 study were used. For dMRI data, the unprocessed 2 mm isotropic diffusion (dMRI) images from 652 participants from the Cam-CAN study were used. For fMRI data, the 3 × 3 × 4 mm³ unprocessed resting-state fMRI images from 652 participants from the Cam-CAN study were used. For electromagnetic data (MEG), the 1,000 Hz resting-state filtered and unfiltered datasets from 652 participants from the Cam-CAN study were used.

Developmental trends and reference datasets

PING

The data from this project were used to assess the abilities of the platform to identify developmental trends of structural measures and to generate reference datasets. For sMRI data, the unprocessed 1.2 × 1.0 × 1.0 mm³ structural T1w and the 1.0 mm isotropic T2w images from 110 participants from the PING10 study were used. For dMRI data, the unprocessed 2 mm isotropic diffusion (dMRI) images from 110 participants from the PING study were used.

Replicability datasets

ABCD

For sMRI data, the unprocessed 1 mm isotropic structural T1w and T2w images from a subset of 1,877 participants from the ABCD (release-2.0.0) study were used. For dMRI data, the unprocessed 1.77 mm isotropic diffusion (dMRI) images from a subset of 1,877 participants from the ABCD (release-2.0.0) study were used7,26. A single diffusion gradient shell was used for these experiments (b = 3,000 s mm⁻²). Research was approved by the University of Arkansas IRB (no. 2209425822).

HBN

The data from this project were used to assess the abilities of the platform to replicate previously published findings via the assessment of the relationship between microstructural measures mapped to segmented uncinate fasciculi and self-reported early life stressors. Research was approved by the University of Pittsburgh IRB (no. PRO17060350). For sMRI data, the 0.8 mm isotropic structural T1w images from 42 participants from the HBN study9 were used. For dMRI data, the unprocessed 1.8 mm isotropic diffusion (dMRI) images from 42 participants from the CitiGroup Cornell Brain Imaging Center site of the HBN study were used.

UPENN-PMC

The University of Pennsylvania, Penn Memory Center (UPENN-PMC) data were used to assess the abilities of the platform to replicate previously published findings via the assessment of the performance of an automated hippocampal segmentation algorithm. Secondary data analyses were conducted under IRB exemption at Indiana University. For sMRI data, the T1w and T2w data were provided within the Automated Segmentation of Hippocampal Subfields atlas27.

Clinical-identification datasets

Indiana University Acute Concussion dataset

The data from this project were used to assess the abilities of the platform to identify clinical populations via the mapping of microstructural measures to the cortical surface. Neuroimaging was performed at the Indiana University Imaging Research Facility, housed within the Department of Psychological and Brain Sciences, with a 3 T Siemens Prisma whole-body MRI using a 64-channel head coil. Within this study, nine concussed athletes and 20 healthy athletes were included. Research was approved by Indiana University (IRB 906000405). For sMRI data, high-resolution T1-weighted structural volumes were acquired using an MPRAGE sequence: TI = 900 ms, TE = 2.7 ms, TR = 1,800 ms, flip angle 9°, with 192 sagittal slices of 1.0 mm thickness, a field of view of 256 × 256 mm² and an isometric voxel size of 1.0 mm³ (where TI, TE and TR refer to inversion time, echo time and repetition time, respectively). The total acquisition time was 4 min and 34 s. High-resolution T2-weighted structural volumes were also acquired: TE = 564 ms, TR = 3,200 ms, flip angle 120°, with 192 sagittal slices, a field of view of 240 × 256 mm² and an isometric voxel size of 1.0 mm³. Total acquisition time was 4 min and 30 s. Diffusion data (dMRI) were collected using single-shot spin-echo simultaneous multi-slice (SMS) echo-planar imaging (transverse orientation, TE = 92.00 ms, TR = 3,820 ms, flip angle 78°, isotropic 1.5 mm³ resolution; FOV = LR 228 × 228 × 144 mm³; acquisition matrix MxP 138 × 138, SMS acceleration factor 4). This sequence was collected twice, once in the anterior-posterior fold-over direction and once in the posterior-anterior (PA) fold-over direction, with the same diffusion gradient strengths and the number of diffusion directions: 30 diffusion directions at b = 1,000 s mm⁻², 60 diffusion directions at b = 1,750 s mm⁻², 90 diffusion directions at b = 2,500 s mm⁻² and 19 b = 0 s mm⁻² volumes. The total acquisition time for both sets of dMRI sequences was 25 min and 58 s.

Oxford University Choroideremia & Stargardt’s Disease Dataset

The data from this project were used to assess the abilities of the platform to identify clinical populations via mapping retinal-layer thickness via optical coherence tomography and mapping of microstructural measures along optic radiation bundles segmented using visual field information (eccentricity). Neuroimaging was performed at the Wellcome Centre for Integrative Neuroimaging, Oxford with the Siemens 3 T scanner. Research was approved by the UK Health Regulatory Authority reference 17/LO/1540. For sMRI data, high-resolution T1-weighted anatomical volumes were acquired using an MPRAGE sequence: TI = 904 ms, TE = 3.97 ms, TR = 1,900 ms, flip angle 8°, with 192 sagittal slices of 1.0 mm thickness, a field of view of 174 × 192 × 192 mm³ and an isometric voxel size of 1.0 mm³. The total acquisition time was 5 min and 31 s. Diffusion data (dMRI) were collected using echo-planar imaging (transverse orientation, TE = 92.00 ms, TR = 3,600 ms, flip angle 78°, 2.019 × 2.019 × 2.0 mm³ resolution; FOV = 210 × 220 × 158 mm³; acquisition matrix MxP = 210 × 210, SMS acceleration factor 3). This sequence was collected twice, once in the anterior-posterior fold-over direction and once in the PA fold-over direction. The PA fold-over scan contained six diffusion directions, three at b = 0 s mm⁻² and three at b = 2,000 s mm⁻², and was used primarily for susceptibility-weighted corrections. The anterior-posterior fold-over scan contained 105 diffusion directions, five at b = 0 s mm⁻², 51 at b = 1,000 s mm⁻² and 49 at b = 2,000 s mm⁻². The total acquisition time for both sets of dMRI sequences was 7 min and 8 s.

General processing pipelines

Structural processing

For the ABCD, Cam-CAN, Oxford University Choroideremia & Stargardt’s Disease Dataset, and the Indiana University Acute Concussion datasets, the structural T1w and T2w (sMRI) images (if available) were preprocessed, including bias correction and alignment to the anterior commissure-posterior commissure plane, using the brainlife.io apps A273 (10.25663/brainlife.app.273) and A350 (10.25663/brainlife.app.350), respectively. For PING data, no bias correction was performed but alignment to the anterior commissure-posterior commissure plane was performed using A99 (10.25663/brainlife.app.99) and A116 (10.25663/brainlife.app.116) for T1w and T2w data, respectively. For HCP data, the preprocessed structural images were already provided. The structural T1-weighted images for each participant and dataset were then segmented into different tissue types using functionality provided by MRTrix3 (ref. 28) implemented as A239 (10.25663/brainlife.app.239). For a subset of datasets, this was performed within the diffusion tractography generation step using A319 (10.25663/brainlife.app.319). The gray- and white-matter interface mask was subsequently used as a seed mask for white matter tractography. The processed structural T1w and T2w images were then used for segmentation and surface generation using the recon-all function from Freesurfer29 (A0; 10.25663/brainlife.app.0). Following Freesurfer, representations of the cortical ‘midthickness’ surface were computed by spatially averaging the coordinates of the pial and white matter surfaces generated by Freesurfer using the wb_command -surface-cortex-layer function provided by Connectome Workbench for the HCPTR, HCPs1200, ABCD, Cam-CAN, PING and Indiana University Acute Concussion datasets. These surfaces were used for cortical tissue mapping analyses.
Following Freesurfer and midthickness-surface generation, the 180 multimodal cortical nodes (hcp-mmp) atlas and the Yeo 17 (yeo17) atlas were mapped to the Freesurfer segmentation of each participant implemented as brainlife.io app A23 (10.25663/brainlife.app.23). These parcellations were used for subsequent cortical, subcortical and network analyses. In addition, measures for cortical thickness, surface area, volume and summaries of diffusion models of microstructure were estimated using A383 (10.25663/brainlife.app.383) and A389 (10.25663/brainlife.app.389). To estimate population receptive fields and visual field eccentricity properties in the cortical surface in the Oxford University Choroideremia & Stargardt’s Disease Dataset, the automated mapping algorithm developed by refs. 30,31 was implemented using A187 (10.25663/brainlife.app.187). To segment thalamic nuclei for optic radiation tracking, the automated thalamic nuclei segmentation algorithm provided by Freesurfer28 was implemented as A222 (10.25663/brainlife.app.222). Finally, visual regions of interest (ROI) binned by eccentricity were then generated using AFNI software32 functions implemented in A414 (10.25663/brainlife.app.414). To assess the replicability capabilities of the platform, an automated hippocampal nuclei segmentation app (A262; 10.25663/brainlife.app.262) was used to segment hippocampal subfields from participants within the UPENN-PMC dataset provided within the Automated Segmentation of Hippocampal Subfields atlas.
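The midthickness step described above (spatially averaging the pial and white matter surface coordinates) can be illustrated with a minimal sketch. This is not the wb_command implementation; the function name and the toy vertex arrays are hypothetical, and the sketch assumes both surfaces share the same vertex ordering.

```python
import numpy as np

def midthickness(white_xyz: np.ndarray, pial_xyz: np.ndarray) -> np.ndarray:
    """Average matching white and pial vertex coordinates (n_vertices x 3)."""
    if white_xyz.shape != pial_xyz.shape:
        raise ValueError("surfaces must have the same number of vertices")
    # Vertex-wise midpoint between the two surfaces
    return 0.5 * (white_xyz + pial_xyz)

# Toy example with two vertices (coordinates in mm, hypothetical values)
white = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
pial = np.array([[0.0, 0.0, 3.0], [10.0, 2.0, 0.0]])
mid = midthickness(white, pial)
```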

dMRI processing

Preprocessing and model fitting

For most of the analyses involving the HCP dataset, the minimally preprocessed dMRI images were used and thus no further preprocessing was performed. However, to assess the validity of the preprocessing pipeline, the unprocessed dMRI images from the HCP test dataset were preprocessed following the protocol outlined in ref. 33 using A68 (10.25663/brainlife.app.68). The same app was also used for preprocessing the dMRI images for the ABCD, Cam-CAN, PING, Oxford University Choroideremia & Stargardt’s Disease Dataset, the Indiana University Acute Concussion and HBN datasets. Specifically, dMRI images were denoised and cleaned from Gibbs ringing using functionality provided by MRTrix3 before being corrected for susceptibility, motion and eddy distortions and artifacts using FSL’s topup and eddy functions34,35. Eddy-current and motion correction was applied via the eddy_cuda8.0 command provided by FSL36–39, with replacement of outlier slices (that is, repol). Following these corrections, MRTrix3’s dwigradcheck functionality was used to check and correct for potentially misaligned gradient vectors following topup and eddy40. Next, dMRI images were debiased using ANTs’ N4 functionality41 and the background noise was cleaned using MRTrix3’s dwidenoise functionality42. Finally, the preprocessed dMRI images were registered to the structural (T1w) image using FSL’s epi_reg functionality43–45. Following preprocessing, brain masks for the dMRI data were generated using bet from FSL, implemented as A163 (10.25663/brainlife.app.163).

DTI, NODDI and q-sampling model fitting

Following preprocessing, the diffusion tensor imaging (DTI) model46 and the neurite orientation dispersion and density imaging (NODDI) model47,48 were fit to the preprocessed dMRI images for each participant using either A319 (10.25663/brainlife.app.319) or A292 (10.25663/brainlife.app.292) for DTI model fitting and A365 (10.25663/brainlife.app.365) for NODDI fitting. Note that the NODDI model was only fit on the HCP, Cam-CAN, Oxford University Choroideremia & Stargardt’s Disease Dataset and the Indiana University Acute Concussion datasets. For those datasets, the NODDI model was fit using an intrinsic free diffusivity parameter (d) of 1.7 × 10−3 mm2 s−1 for white matter tract and network analyses, and a d of 1.1 × 10−3 mm2 s−1 for cortical tissue mapping analyses, using AMICO’s implementation48 as A365 (10.25663/brainlife.app.365). The constrained spherical deconvolution49 model was then fit to the preprocessed dMRI data for each run at four spherical harmonic orders (Lmax = 2, 4, 6 and 8) using functionality provided by MRTrix3 implemented as brainlife.io app A238 (10.25663/brainlife.app.238). For the PING dataset, the constrained spherical deconvolution model was fit using the same code found in A238 (10.25663/brainlife.app.238), but performed using the tractography app A319 (10.25663/brainlife.app.319). For the HBN dataset, the isotropic spin distribution function was obtained by reconstructing the diffusion MRI data with the generalized q-sampling imaging method50 using functionality provided by DSI-Studio51 (A423; 10.25663/brainlife.app.423). Quantitative anisotropy was then estimated from the isotropic spin distribution function.
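For readers less familiar with the DTI summary measures used throughout (axial, radial and mean diffusivity, and fractional anisotropy), the following minimal sketch computes them from the three tensor eigenvalues using the standard definitions. It is illustrative only and does not reproduce the code in the apps above; the function name and the example eigenvalues are hypothetical.

```python
import numpy as np

def dti_scalars(eigenvalues):
    """Return AD, RD, MD and FA from three diffusion tensor eigenvalues."""
    l1, l2, l3 = sorted(eigenvalues, reverse=True)
    md = (l1 + l2 + l3) / 3.0          # mean diffusivity
    ad = l1                            # axial diffusivity (largest eigenvalue)
    rd = (l2 + l3) / 2.0               # radial diffusivity
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = float(np.sqrt(1.5 * num / den)) if den > 0 else 0.0
    return {"AD": ad, "RD": rd, "MD": md, "FA": fa}

# Hypothetical eigenvalues in mm2 s-1 for an anisotropic voxel
example = dti_scalars([1.7e-3, 0.3e-3, 0.3e-3])
isotropic = dti_scalars([1.0e-3, 1.0e-3, 1.0e-3])
stick = dti_scalars([1.0, 0.0, 0.0])
```

An isotropic tensor gives FA = 0 and a tensor with a single non-zero eigenvalue gives FA = 1, which bounds the values reported for tracts and parcels.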

Tractography

Following model fitting, the fiber orientation distribution functions for Lmax = 6 and Lmax = 8 were subsequently used to guide anatomically constrained probabilistic tractography52 using functions provided by MRTrix3 implemented as brainlife.io app A297 (10.25663/brainlife.app.297) or A319 (10.25663/brainlife.app.319). For the HCPTR, HCPs1200 and Oxford University Choroideremia & Stargardt’s Disease datasets, Lmax = 8 was used. For the ABCD and Cam-CAN datasets, Lmax = 6 was used. For the HCP, ABCD and Cam-CAN datasets, a total of 3 million streamlines were generated. For all datasets, a step size of 0.2 mm was implemented. For the HCPTR, HCPs1200, ABCD and Cam-CAN datasets, minimum and maximum lengths of streamlines were set at 25 and 250 mm, respectively, and a maximum angle of curvature of 35° was used. For the PING dataset, minimum and maximum lengths of streamlines were set at 20 and 220 mm, respectively, and a maximum angle of curvature of 35° was used.
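As a rough illustration of how the parameters above map onto an MRTrix3 tckgen call, the sketch below assembles an argument list without running it. The exact invocation inside A297 and A319 may differ; the file names and the helper function are hypothetical, and only the numeric settings (step size, length limits, angle, streamline count) come from the text.

```python
def build_tckgen_args(fod, act_5tt, gmwmi_seed, out_tck,
                      n_streamlines=3_000_000, step_mm=0.2,
                      min_len_mm=25, max_len_mm=250, max_angle_deg=35):
    """Assemble a tckgen argument list (not executed here)."""
    return [
        "tckgen", fod, out_tck,
        "-act", act_5tt,              # anatomically constrained tractography
        "-seed_gmwmi", gmwmi_seed,    # seed from the gray/white-matter interface
        "-step", str(step_mm),
        "-minlength", str(min_len_mm),
        "-maxlength", str(max_len_mm),
        "-angle", str(max_angle_deg),
        "-select", str(n_streamlines),
    ]

# HCP/ABCD/Cam-CAN-style settings; file names are placeholders
args_hcp = build_tckgen_args("fod_lmax8.mif", "5tt.mif", "gmwmi.mif", "track.tck")
# PING-style length limits
args_ping = build_tckgen_args("fod.mif", "5tt.mif", "gmwmi.mif", "track.tck",
                              min_len_mm=20, max_len_mm=220)
```

The list could be passed to subprocess.run on a system where MRTrix3 is installed; keeping it as data makes the per-dataset differences easy to compare.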

White matter segmentation and cleaning

Following tractography, 61 major white matter tracts were segmented for each run using a customized version of the white matter query language53 implemented as brainlife.io app A188 (10.25663/brainlife.app.188). Outlier streamlines were subsequently removed using functionality provided by Vistasoft and implemented as brainlife.io app A195 (10.25663/brainlife.app.195). Following cleaning, tract profiles with 200 nodes were generated for all DTI and NODDI measures across the 61 tracts for each participant and test–retest condition using functionality provided by Vistasoft and implemented as A361 (10.25663/brainlife.app.361). Macrostructural statistics, including average tract length, tract volume and streamline count were computed using functionality provided by Vistasoft implemented as A189 (10.25663/brainlife.app.189). Microstructural and macrostructural statistics were then compiled into a single data frame using A397 (10.25663/brainlife.app.397).
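The idea behind a 200-node tract profile can be sketched as follows: each streamline is resampled to a fixed number of equally spaced nodes, a scalar map (for example FA) is sampled at those nodes, and values are averaged across streamlines node by node. This is a simplified, unweighted sketch and not the Vistasoft implementation used in A361; all function names are hypothetical.

```python
import numpy as np

def resample_streamline(points, n_nodes=200):
    """Resample a polyline (n_points x 3) to n_nodes equally spaced by arc length."""
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    target = np.linspace(0.0, arc[-1], n_nodes)
    return np.column_stack(
        [np.interp(target, arc, points[:, d]) for d in range(points.shape[1])]
    )

def tract_profile(values_per_streamline):
    """Node-wise mean across streamlines: (n_streamlines, n_nodes) -> (n_nodes,)."""
    return np.mean(np.asarray(values_per_streamline, dtype=float), axis=0)

# Toy streamline: a straight 10 mm line described by three points
line = [[0.0, 0.0, 0.0], [0.0, 0.0, 4.0], [0.0, 0.0, 10.0]]
nodes = resample_streamline(line, n_nodes=200)
profile = tract_profile([[0.4] * 200, [0.6] * 200])
```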

Segmentation of the optic radiation

To generate optic radiations segmented by estimates of visual field eccentricity in the Oxford University Choroideremia & Stargardt’s Disease Dataset, ConTrack54 tracking was implemented as A252 (10.25663/brainlife.app.252). Then, 500,000 sample streamlines were generated using a step size of 1 mm. Samples were then pruned using inclusion and exclusion waypoint ROI following methodologies outlined in refs. 19,55.

Segmentation of uncinate fasciculus

To assess the relationship between uncinate tract-average quantitative anisotropy, fractional anisotropy (FA) and early life stressors within two independent datasets (HBN, ABCD), the tract-average quantitative anisotropy for the left and right uncinate was computed for 42 participants from the HBN dataset and the tract-average FA was computed for 1,107 participants from the ABCD dataset. For the HBN dataset, a full tractography segmentation pipeline was used to preprocess the dMRI data and segment the uncinate fasciculus using A423 (10.25663/brainlife.app.423). Automatic fiber tracking was then performed to segment the uncinate fasciculus using default parameters and templates from a population tractography atlas from the HCP56. A maximum allowed threshold of 16 mm for the shortest streamline distance was then applied to remove spurious streamlines. The whole-tract average quantitative anisotropy was then estimated. To probe stress exposure within the HBN dataset, we used the NLES, a 22-item questionnaire in which participants were asked about the occurrence of different stressful life events. The tractography pipeline for the ABCD dataset has been described previously. The average FA for the left and right uncinate was estimated using procedures described previously and then compared to each participant’s life-stressor behavioral measures by fitting a linear regression to the data.
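The final regression step can be sketched with an ordinary least-squares line fit. The snippet below uses synthetic, noise-free values purely to show the mechanics; it is not the analysis notebook from the paper, and the variable names and numbers are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson's r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Synthetic example: a stress score and a tract-average FA that falls with it
stress = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
fa = 0.50 - 0.01 * stress
slope, intercept, r = fit_line(stress, fa)
```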

Structural networks

Following tract segmentation, structural networks were generated using the multimodal 180 cortical node atlas and the tractograms for each participant using MRTrix3’s tck2connectome (ref. 57) functionality implemented as A395 (10.25663/brainlife.app.395). Connectomes were generated by computing the number of streamlines intersecting each ROI pairing in the 180 cortical node parcellation. Multiple adjacency matrices were generated, including count, density (that is, the count divided by the node volume of the ROI pairs), length, length density (that is, length divided by the volume of the ROI pairs) and the average and average density of axial diffusivity, fractional anisotropy, mean diffusivity, radial diffusivity, neurite density index, orientation dispersion index and isotropic volume fraction. Density matrices were generated using the -invnodevol option58. For non-count measures (length, axial diffusivity, fractional anisotropy, mean diffusivity, radial diffusivity, neurite density index, orientation dispersion index, isotropic volume fraction), the average measure across all streamlines connecting an ROI pair was computed using MRTrix3’s tck2scale functionality using the -precise option59 and the -scale_file option in tck2connectome. These matrices can be thought of as the ‘average measure’ adjacency matrices. These files were output as the ‘raw’ datatype and were converted to a conmat datatype using A393 (10.25663/brainlife.app.393). Connectivity matrices were then converted into the ‘network’ datatype using Python functionality implemented as A335 (10.25663/brainlife.app.335).
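To make the matrix definitions above concrete, the following sketch builds count, density and ‘average measure’ matrices from a list of streamline-to-node assignments. It is a simplified illustration rather than the MRTrix3 code; in particular, using the summed volume of the two nodes as the density denominator is an assumption, and the function and variable names are hypothetical.

```python
import numpy as np

def build_connectomes(assignments, measure, node_volumes):
    """Count, density and mean-measure matrices from streamline assignments.

    assignments : list of (i, j) node index pairs, one per streamline
    measure     : per-streamline value (for example length or mean FA)
    node_volumes: volume of each node
    """
    vols = np.asarray(node_volumes, dtype=float)
    n = vols.size
    count = np.zeros((n, n))
    total = np.zeros((n, n))
    for (i, j), value in zip(assignments, measure):
        count[i, j] += 1
        total[i, j] += value
        if i != j:  # keep the matrices symmetric
            count[j, i] += 1
            total[j, i] += value
    # Assumption: normalize by the summed volume of the two nodes
    density = count / (vols[:, None] + vols[None, :])
    # Average measure over streamlines; zero where no streamline connects the pair
    mean_measure = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    return count, density, mean_measure

# Toy example: three nodes, three streamlines, lengths in mm
count, density, mean_length = build_connectomes(
    assignments=[(0, 1), (0, 1), (1, 2)],
    measure=[10.0, 20.0, 30.0],
    node_volumes=[100.0, 200.0, 300.0],
)
```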

Cortical and subcortical diffusion and morphometry mapping

For the PING, HCPTR, HCPs1200, Cam-CAN and Indiana University Acute Concussion datasets, DTI and NODDI (if available) measures were mapped to each participant’s cortical white matter parcels following methods found in Fukutomi and colleagues18 using functions provided by Connectome Workbench60 implemented as brainlife.io app A379 (10.25663/brainlife.app.379). A Gaussian smoothing kernel (full-width at half-maximum ~4 mm, σ = 5/3 mm) was applied along the axis normal to the midthickness surface, and DTI and NODDI measures were mapped using the wb_command -volume-to-surface-mapping function. Freesurfer and functionality from Connectome Workbench were used to compute the average DTI and NODDI measures within each parcel, implemented as A389 (10.25663/brainlife.app.389) and A483 (10.25663/brainlife.app.483). Measures of volume, surface area and cortical thickness for each cortical parcel were computed using Freesurfer and A464 (10.25663/brainlife.app.464). Freesurfer was also used to generate parcel-average DTI and NODDI measures for the subcortical segmentation (aseg) from Freesurfer using A383 (10.25663/brainlife.app.383). Measures of volume for each subcortical parcel were computed using Freesurfer and A272 (10.25663/brainlife.app.272).
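The two kernel widths quoted above are consistent with the standard Gaussian relation FWHM = 2·sqrt(2·ln 2)·σ. The short check below works through the arithmetic; the function name is hypothetical.

```python
import math

def sigma_to_fwhm(sigma_mm: float) -> float:
    """Convert a Gaussian standard deviation to full-width at half-maximum."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma_mm

fwhm_mm = sigma_to_fwhm(5.0 / 3.0)  # roughly 3.92 mm, that is, about 4 mm
```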

rs-fMRI preprocessing and functional connectivity matrix generation

For the HCPTR and Cam-CAN datasets, unprocessed resting-state functional MRI (rs-fMRI) datasets were preprocessed using fMRIPrep implemented as A160 (10.25663/brainlife.app.160). Briefly, fMRIPrep performs the following preprocessing steps. First, individual images are aligned to a reference image for motion estimation and correction using mcflirt from FSL. Next, slice timing correction is performed, in which all slices are realigned in time to the middle of each repetition time using 3dTshift from AFNI. Spatial distortions are then corrected using field map estimations. Finally, the fMRI data are aligned to the structural T1w image for each participant. Default parameters provided by fMRIPrep were used. For a subset of analyses involving the HCP test and retest datasets, the preprocessed rs-fMRI datasets provided by the HCP consortium were used. Following preprocessing via fMRIPrep for the volume data, connectivity matrices were generated using the Yeo17 parcellation and A369 (10.25663/brainlife.app.369) and A532 (10.25663/brainlife.app.532). Within-network functional connectivity for the 17 canonical resting-state networks was computed by averaging the functional connectivity values across all of the nodes belonging to a single network. These estimates were used for subsequent analyses.
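The within-network averaging step can be sketched as follows. The snippet assumes Pearson correlation between node time series as the functional connectivity measure, which the text does not specify, and averages the off-diagonal entries among nodes sharing a network label. Function names, labels and signals are hypothetical.

```python
import numpy as np

def within_network_fc(timeseries, labels):
    """Mean pairwise correlation among nodes of each network.

    timeseries: array (n_nodes, n_timepoints)
    labels    : network label for each node
    """
    fc = np.corrcoef(timeseries)
    labels = np.asarray(labels)
    result = {}
    for net in np.unique(labels):
        idx = np.flatnonzero(labels == net)
        if idx.size < 2:
            result[str(net)] = float("nan")  # no node pairs to average
            continue
        block = fc[np.ix_(idx, idx)]
        upper = np.triu_indices(idx.size, k=1)  # exclude the diagonal
        result[str(net)] = float(block[upper].mean())
    return result

# Toy signals: network A is perfectly correlated, network B anti-correlated
t = np.linspace(0.0, 2.0 * np.pi, 50)
ts = np.vstack([np.sin(t), 2.0 * np.sin(t) + 1.0, np.cos(t), -np.cos(t)])
fc_by_network = within_network_fc(ts, ["A", "A", "B", "B"])
```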

rs-fMRI gradient processing

For the HCPTR and Cam-CAN datasets, unprocessed rs-fMRI data were preprocessed using fMRIPrep implemented as A160 (10.25663/brainlife.app.160). Within this app, the same preprocessing steps are undertaken as in A160 (10.25663/brainlife.app.160), except for an additional volume-to-surface mapping using mri_vol2surf from Freesurfer. The surface-based outputs were then used to compute gradients following methodologies outlined in ref. 61 for each participant in the HCPs1200, HCPTR and Cam-CAN datasets using A574 (10.25663/brainlife.app.574) using diffusion embedding62 and functions provided by BrainSpace63. More specifically, connectivity matrices were computed from surface vertex values within each node of the Schaefer 1,000 parcellation64. Cosine similarity was then computed to create an affinity matrix to capture inter-area similarity. Dimensionality reduction was then used to identify the primary gradients. A normalized-angle kernel was used to create the affinity matrix, from which two primary components were identified. Gradients were then aligned across all participants using a Procrustes alignment and joint embedding procedure61. Values from the primary gradient and the cosine distance used to generate the affinity matrices were used for subsequent analyses.
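The affinity and embedding steps can be illustrated with a deliberately simplified diffusion-map sketch in NumPy. It is not the BrainSpace implementation used in A574: it uses a plain cosine-similarity affinity with negative values clipped to zero and a basic diffusion map without density normalization, and all names and the random input are hypothetical.

```python
import numpy as np

def cosine_affinity(features):
    """Cosine similarity between rows, with negative values clipped to zero."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.where(norms == 0, 1.0, norms)
    return np.clip(unit @ unit.T, 0.0, None)

def diffusion_embedding(affinity, n_components=2):
    """Basic diffusion map: leading non-trivial eigenvectors of D^-1 A."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalization shares eigenvalues with the Markov matrix D^-1 A
    sym = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(sym)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs * d_inv_sqrt[:, None]  # right eigenvectors of D^-1 A
    # Skip the first (constant) component and scale by the eigenvalues
    gradients = psi[:, 1:n_components + 1] * evals[1:n_components + 1]
    return gradients, evals

rng = np.random.default_rng(0)
connectivity = rng.random((20, 50))  # 20 hypothetical parcels x 50 features
affinity = cosine_affinity(connectivity)
gradients, evals = diffusion_embedding(affinity, n_components=2)
```

The sign of each gradient is arbitrary in an eigendecomposition, which is one reason an alignment step across participants is needed before values are compared.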

MEG processing

For some analyses, raw resting-state-MEG time series data from the Cam-CAN dataset was filtered using a Maxwell filter implemented as A476 (10.25663/brainlife.app.476) and median split using A529 (10.25663/brainlife.app.529). For the remainder of the analyses, filtered data provided by the Cam-CAN dataset was used. For all MEG data, power-spectrum density profiles (PSD) were estimated using functionality provided by MNE-Python28,65 implemented as A530 (10.25663/brainlife.app.530). Following PSD estimation, peak alpha frequency was estimated using A531 (10.25663/brainlife.app.531). Finally, PSD profiles were averaged across all nodes within each of the canonical lobes (frontal, parietal, occipital, temporal) using A599 (10.25663/brainlife.app.599). Measures of PSD and peak alpha frequency were used for all subsequent analyses.
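The PSD and peak alpha frequency estimates can be sketched with a plain NumPy periodogram. This swaps in a simple FFT-based estimate for the MNE-Python functionality used in A530 and A531; the 8–13 Hz alpha band, the sampling rate and the synthetic signal are assumptions made for illustration.

```python
import numpy as np

def periodogram(signal, sfreq):
    """Hann-windowed periodogram; returns frequencies (Hz) and power."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()
    window = np.hanning(signal.size)
    spectrum = np.fft.rfft(signal * window)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    # Power up to a constant scale factor, which does not affect peak location
    psd = np.abs(spectrum) ** 2 / (sfreq * np.sum(window ** 2))
    return freqs, psd

def peak_alpha_frequency(freqs, psd, band=(8.0, 13.0)):
    """Frequency of maximum power inside the (assumed) alpha band."""
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(freqs[mask][np.argmax(psd[mask])])

sfreq = 250.0                        # hypothetical sampling rate in Hz
t = np.arange(2500) / sfreq          # 10 s of synthetic data
signal = np.sin(2 * np.pi * 10.0 * t) + 0.2 * np.sin(2 * np.pi * 20.0 * t)
freqs, psd = periodogram(signal, sfreq)
paf = peak_alpha_frequency(freqs, psd)
```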

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41592-024-02237-2.

Supplementary information

Supplementary Figs. 1–5, Tables 1–5 and Results 1–5.

Acknowledgements

The brainlife.io project development and operations were supported by awards to F.P.: grant nos. NIH NIBIB R01EB029272, R01EB030896NSF and R01EB030896; NSF BCS 1734853 and 1636893; ACI 1916518, IIS 1912270; a gift from the Kavli Foundation; Wellcome Trust grant no. 226486/Z/22/Z and a Microsoft Investigator Fellowship. Additional funding was provided to support data collection used by the team, research that used brainlife.io or infrastructure that supported the platform: grant no. NIMH UM1NS132207 BRAIN CONNECTS: Center for Mesoscale Connectomics (Principal Investigator K. Ugurbil), grant no. NIMH R01MH133701 (C.R.). NSF grant award nos. 2004877 (S.V.-B.), 1541335 and 2232628 (S.M.), 1445604 and 2005506 (D.Y.H.), 1341698 and 1928224 (M. Norman), 1445606 (S.T.B.), 1928147 (S. Sanalevici). NIH grant award nos. 1U54MH091657 (HCP data, Principal Investigators D. Van Essen and K. Ugurbil), U01DA041048, U01DA050989, U01DA051016, U01DA041022, U01DA051018, U01DA051037, U01DA050987, U01DA041174, U01DA041106, U01DA041117, U01DA041028, U01DA041134, U01DA050988, U01DA051039, U01DA041156, U01DA041025, U01DA041120, U01DA051038, U01DA041148, U01DA041093, U01DA041089, U24DA041123, U24DA041147 (ABCD Study, multiple Principal Investigators), P41EB017183 (J.V.), NIH NIBIB R01EB030896 (A.P.) and ANR-20-NEUC-0004-01 (multiple Principal Investigators). Multiple philanthropic contributions to the HBN (M. Milham).

Extended data

Extended Data Fig. 4. Processing with brainlife.io is valid and test-retest reliability is high - Structural MRI.

Top rows: Validity measures derived using the HCPTR data preprocessed and provided by the HCP Consortium compared to data preprocessed on brainlife.io. Each dot corresponds to the ratio for a given subject between data preprocessed and provided by the HCP Consortium vs data preprocessed on brainlife.io in a given measure for a given structure. Pearson’s correlation (r), root mean squared error (rmse), and a linear fit between the test and retest results were calculated and provided. a. Destrieux Parcel thickness (mm), surface area (mm2), and volume (mm3). b. HCP-mmp Parcel thickness (mm), surface area (mm2), and volume (mm3). Dark colors represent data within ± 1 standard deviation. 50% opacity represents data within 1-2 standard deviations. 25% opacity represents data outside 2 standard deviations. Bottom rows: Test-retest reliability measures derived from derivatives of the HCPTR dataset generated using brainlife.io. Each dot corresponds to the ratio between a test-retest subject and a given measure for a given structure. Pearson’s correlation (r), root mean squared error (rmse), and a linear fit between the test and retest results were calculated and provided. c. Destrieux Parcel thickness (mm), surface area (mm2), and volume (mm3). d. HCP-mmp Parcel thickness (mm), surface area (mm2), and volume (mm3). Dark colors represent data within ± 1 standard deviation. 50% opacity represents data within 1-2 standard deviations. 25% opacity represents data outside 2 standard deviations.

Extended Data Fig. 5. Processing with brainlife.io is valid, reliable, and reproducible.

Top row: Validity measures derived using the HCPTR data preprocessed and provided by the HCP Consortium compared to data preprocessed on brainlife.io. Each dot corresponds to the ratio for a given subject between data preprocessed and provided by the HCP Consortium vs data preprocessed on brainlife.io in a given measure for a given structure. Pearson’s correlation (r), root mean squared error (rmse), and a linear fit between the test and retest results were calculated and provided. a. Tract average AD, FA, MD, and RD. Dark colors represent data within ±1 standard deviation. 50% opacity represents data within 1-2 standard deviations. 25% opacity represents data outside 2 standard deviations. Bottom row: Test-retest reliability measures derived from derivatives of the HCPTR dataset generated using brainlife.io. Each dot corresponds to the ratio between a test-retest subject and a given measure for a given structure. Pearson’s correlation (r), root mean squared error (rmse), and a linear fit between the test and retest results were calculated and provided. b. Tract average AD, FA, MD, and RD. Dark colors represent data within ±1 standard deviation. 50% opacity represents data within 1-2 standard deviations. 25% opacity represents data outside 2 standard deviations. c. Computational reproducibility values derived by repeating runs of brainlife.io Apps using the HCPTR dataset and the Cam-CAN dataset. Each dot corresponds to the ratio for a given subject between repeated runs of each App for a given structure. Pearson’s correlation (r), root mean squared error (rmse), and a linear fit between the repeated runs were calculated. Destrieux Atlas Parcels volume (mm3). Tract-average fractional anisotropy (FA). Node-average functional connectivity (FC). Primary gradient values derived from resting state fMRI. Peak alpha frequency (Hz) in the alpha band derived from MEG.

Extended Data Fig. 6. Reference datasets for quality assurance.

Example workflow for building normative reference ranges for multiple derived statistical products (cortical parcel volume, white matter tract profilometry, within-network functional connectivity, and power-spectrum density (PSD)). a. Cortical volumes of the left hippocampus from HCP participants. Red dots indicate outlier data points. b. Average fractional anisotropy (FA) profiles (blue line) plotted with two standard deviations (shaded regions). Red lines indicate outlier profiles. c. Within-network functional connectivity for the nodes within the Default-A network using the Yeo17 atlas. Red dots indicate outlier data points. d. Average PSD from occipital channels using magnetometer sensors from Cam-CAN participants with one standard deviation (shaded regions). Red lines indicate outlier participants. Peak alpha frequency distribution was also computed, and outliers were detected (inset). e. Normative reference distributions for each derived statistical product across the PING (purple), HCP (blue), and Cam-CAN (orange) datasets. These distributions have had outliers removed. An example of the brainlife.io visualization for reference datasets can be found in Fig. S5. Data are presented as mean values ± SEM.

Author contributions

S.H. implemented most of the initial brainlife.io services. B.C. wrote the data analysis code, performed large-scale experiments, and prepared the figures and associated text. A.S.H. improved and implemented some of the services. J.F., R.C.C., D.H., D.S., D.P., L.K., J.K.L., C.R., F.N.-S., H.W., J.K.J., T.Z., J.W.K., S.K., C.V., D.N.B., B.M., D.B.A., F.D., J.G. and S.H. provided assets. All authors edited the manuscript. F.P. invented, designed and directed brainlife.io, wrote the paper and designed all the experiments and figures.

Peer review

Peer review information

Nature Methods thanks Jochem Rieger, Lucina Uddin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Nina Vogt, in collaboration with the Nature Methods team.

Data availability

All data derived and described in this paper are made available via the brainlife.io platform as ‘Publications’. User data agreements are required for some projects, like data from the HCP, Cam-CAN, PING, ABCD and HBN datasets. The Indiana University Acute Concussion and Oxford University Choroideremia & Stargardt’s Disease Datasets are part of ongoing research projects and will be made available at a later stage. All other datasets are made freely available via the brainlife.io platform. See Supplementary Table 6 for the brainlife.io/pubs.

Code availability

As part of the article, we are describing a total of nine platform components. All components are made publicly available and open source under MIT License. All the software for the platform components is listed in Supplementary Table 1. In addition, we share the code used for the statistical analyses as Jupyter Notebooks (Supplementary Table 2). Finally, the Apps used and tested in this article are listed in Supplementary Table 3.

Competing interests

F.P. received a Microsoft Faculty Fellowship, and Microsoft Azure sells Cloud Services. S.T.B. works for Hewlett-Packard Enterprise, which sells computing services. A.D.B. is an employee of BioSerenity, a company that develops medical devices to help diagnose and monitor patients with chronic diseases. S.H. is an employee of SHEGEL SPRL/BVBA a legal firm with expertise in data protection law. The other authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Soichi Hayashi, Bradley A. Caron.

Change history

5/7/2024

A Correction to this paper has been published: 10.1038/s41592-024-02296-5

Extended data is available for this paper at 10.1038/s41592-024-02237-2.

Supplementary information

The online version contains supplementary material available at 10.1038/s41592-024-02237-2.

References

  • 1. Poldrack RA, et al. Scanning the horizon: towards transparent and reproducible neuroimaging research. Nat. Rev. Neurosci. 2017;18:115–126. doi: 10.1038/nrn.2016.167.
  • 2. Wilkinson MD, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data. 2016;3:160018. doi: 10.1038/sdata.2016.18.
  • 3. Nichols TE, et al. Best practices in data analysis and sharing in neuroimaging using MRI. Nat. Neurosci. 2017;20:299–303. doi: 10.1038/nn.4500.
  • 4. Gorgolewski KJ, et al. The Brain Imaging Data Structure, a format for organizing and describing outputs of neuroimaging experiments. Sci. Data. 2016;3:160044. doi: 10.1038/sdata.2016.44.
  • 5. Van Essen DC, et al. The Human Connectome Project: a data acquisition perspective. Neuroimage. 2012;62:2222–2231. doi: 10.1016/j.neuroimage.2012.02.018.
  • 6. Shafto MA, et al. The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study protocol: a cross-sectional, lifespan, multidisciplinary examination of healthy cognitive ageing. BMC Neurol. 2014;14:204. doi: 10.1186/s12883-014-0204-1.
  • 7. Casey BJ, et al. The Adolescent Brain Cognitive Development (ABCD) study: imaging acquisition across 21 sites. Dev. Cogn. Neurosci. 2018;32:43–54. doi: 10.1016/j.dcn.2018.03.001.
  • 8. Sudlow C, et al. UK BioBank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779. doi: 10.1371/journal.pmed.1001779.
  • 9. Alexander LM, et al. An open resource for transdiagnostic research in pediatric mental health and learning disorders. Sci. Data. 2017;4:170181. doi: 10.1038/sdata.2017.181.
  • 10. Jernigan TL, et al. The Pediatric Imaging, Neurocognition, and Genetics (PING) data repository. Neuroimage. 2016;124:1149–1154. doi: 10.1016/j.neuroimage.2015.04.057.
  • 11. Allen EJ, et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat. Neurosci. 2022;25:116–126. doi: 10.1038/s41593-021-00962-x.
  • 12. Markiewicz CJ, et al. The OpenNeuro resource for sharing of neuroscience data. eLife. 2021;10:e71774. doi: 10.7554/eLife.71774.
  • 13. Poldrack RA, Gorgolewski KJ, Varoquaux G. Computational and informatic advances for reproducible data analysis in neuroimaging. Annu. Rev. Biomed. Data Sci. 2019. doi: 10.1146/annurev-biodatasci-072018-021237.
  • 14. Levitas D, et al. ezBIDS: guided standardization of neuroimaging data interoperable with major data archives and platforms. Sci. Data. 2024;11:179. doi: 10.1038/s41597-024-02959-0.
  • 15. Betzel RF, et al. Changes in structural and functional connectivity among resting-state networks across the human lifespan. Neuroimage. 2014;102:345–357. doi: 10.1016/j.neuroimage.2014.07.067.
  • 16. Bethlehem RAI, et al. Brain charts for the human lifespan. Nature. 2022;604:525–533. doi: 10.1038/s41586-022-04554-y.
  • 17. Yeatman JD, Wandell BA, Mezer AA. Lifespan maturation and degeneration of human brain white matter. Nat. Commun. 2014;5:4932. doi: 10.1038/ncomms5932.
  • 18. Fukutomi H, et al. Neurite imaging reveals microstructural variations in human cerebral cortical gray matter. Neuroimage. 2018;182:488–499. doi: 10.1016/j.neuroimage.2018.02.017.
  • 19. Hanson JL, Knodt AR, Brigidi BD, Hariri AR. Lower structural integrity of the uncinate fasciculus is associated with a history of child maltreatment and future psychological vulnerability to stress. Dev. Psychopathol. 2015;27:1611–1619. doi: 10.1017/S0954579415000978.
  • 20. Ogawa S, et al. White matter consequences of retinal receptor and ganglion cell damage. Invest. Ophthalmol. Vis. Sci. 2014;55:6976–6986. doi: 10.1167/iovs.14-14737.
  • 21. Kozlov M. NIH issues a seismic mandate: share data publicly. Nature. 2022. doi: 10.1038/d41586-022-00402-1.
  • 22. Eke DO, et al. International data governance for neuroscience. Neuron. 2021. doi: 10.1016/j.neuron.2021.11.017.
  • 23. Glasser MF, et al. The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage. 2013;80:105–124. doi: 10.1016/j.neuroimage.2013.04.127.
  • 24. Caron B, et al. Collegiate athlete brain data for white matter mapping and network neuroscience. Sci. Data. 2021;8:56. doi: 10.1038/s41597-021-00823-z.
  • 25. Yushkevich PA, et al. Quantitative comparison of 21 protocols for labeling hippocampal subfields and parahippocampal subregions in in vivo MRI: towards a harmonized segmentation protocol. Neuroimage. 2015;111:526–541. doi: 10.1016/j.neuroimage.2015.01.004.
  • 26. Karcher NR, Barch DM. The ABCD study: understanding the development of risk for mental and physical health outcomes. Neuropsychopharmacology. 2021;46:131–142. doi: 10.1038/s41386-020-0736-6.
  • 27. Yushkevich PA, et al. Automated volumetry and regional thickness analysis of hippocampal subfields and medial temporal cortical structures in mild cognitive impairment. Hum. Brain Mapp. 2015;36:258–287. doi: 10.1002/hbm.22627.
  • 28. Tournier J-D, et al. MRtrix3: a fast, flexible and open software framework for medical image processing and visualisation. Neuroimage. 2019;202:116137. doi: 10.1016/j.neuroimage.2019.116137.
  • 29. Fischl B. FreeSurfer. Neuroimage. 2012;62:774–781. doi: 10.1016/j.neuroimage.2012.01.021.
  • 30. Benson NC, et al. The retinotopic organization of striate cortex is well predicted by surface topology. Curr. Biol. 2012;22:2081–2085. doi: 10.1016/j.cub.2012.09.014.
  • 31. Benson NC, Butt OH, Brainard DH, Aguirre GK. Correction of distortion in flattened representations of the cortical surface allows prediction of V1-V3 functional organization from anatomy. PLoS Comput. Biol. 2014;10:e1003538. doi: 10.1371/journal.pcbi.1003538.
  • 32. Cox RW. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput. Biomed. Res. 1996;29:162–173. doi: 10.1006/cbmr.1996.0014.
  • 33.Ades-Aron B, et al. Evaluation of the accuracy and precision of the diffusion parameter EStImation with Gibbs and NoisE removal pipeline. Neuroimage. 2018;183:532–543. doi: 10.1016/j.neuroimage.2018.07.066. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Andersson JLR, Skare S, Ashburner J. How to correct susceptibility distortions in spin-echo echo-planar images: application to diffusion tensor imaging. Neuroimage. 2003;20:870–888. doi: 10.1016/S1053-8119(03)00336-7. [DOI] [PubMed] [Google Scholar]
  • 35.Smith SM, et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage. 2004;23:S208–S219. doi: 10.1016/j.neuroimage.2004.07.051. [DOI] [PubMed] [Google Scholar]
  • 36.Andersson JLR, Sotiropoulos SN. An integrated approach to correction for off-resonance effects and subject movement in diffusion MR imaging. Neuroimage. 2016;125:1063–1078. doi: 10.1016/j.neuroimage.2015.10.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Andersson JLR, Graham MS, Zsoldos E, Sotiropoulos SN. Incorporating outlier detection and replacement into a non-parametric framework for movement and distortion correction of diffusion MR images. Neuroimage. 2016;141:556–572. doi: 10.1016/j.neuroimage.2016.06.058. [DOI] [PubMed] [Google Scholar]
  • 38.Andersson JLR, Graham MS, Drobnjak I, Zhang H, Campbell J. Susceptibility-induced distortion that varies due to motion: Correction in diffusion MR without acquiring additional data. Neuroimage. 2018;171:277–295. doi: 10.1016/j.neuroimage.2017.12.040. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Andersson JLR, et al. Towards a comprehensive framework for movement and distortion correction of diffusion MR images: Within volume movement. Neuroimage. 2017;152:450–466. doi: 10.1016/j.neuroimage.2017.02.085. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Jeurissen B, Leemans A, Sijbers J. Automated correction of improperly rotated diffusion gradient orientations in diffusion weighted MRI. Med. Image Anal. 2014;18:953–962. doi: 10.1016/j.media.2014.05.012. [DOI] [PubMed] [Google Scholar]
  • 41.Tustison NJ, et al. Large-scale evaluation of ANTs and FreeSurfer cortical thickness measurements. Neuroimage. 2014;99:166–179. doi: 10.1016/j.neuroimage.2014.05.044. [DOI] [PubMed] [Google Scholar]
  • 42.Veraart J, et al. Denoising of diffusion MRI using random matrix theory. Neuroimage. 2016;142:394–406. doi: 10.1016/j.neuroimage.2016.08.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Jenkinson M, Smith S. A global optimisation method for robust affine registration of brain images. Med. Image Anal. 2001;5:143–156. doi: 10.1016/S1361-8415(01)00036-6. [DOI] [PubMed] [Google Scholar]
  • 44.Jenkinson M, Bannister P, Brady M, Smith S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage. 2002;17:825–841. doi: 10.1006/nimg.2002.1132. [DOI] [PubMed] [Google Scholar]
  • 45.Greve DN, Fischl B. Accurate and robust brain image alignment using boundary-based registration. Neuroimage. 2009;48:63–72. doi: 10.1016/j.neuroimage.2009.06.060. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Pierpaoli C, Jezzard P, Basser PJ, Barnett A, Di Chiro G. Diffusion tensor MR imaging of the human brain. Radiology. 1996;201:637–648. doi: 10.1148/radiology.201.3.8939209. [DOI] [PubMed] [Google Scholar]
  • 47.Zhang H, Schneider T, Wheeler-Kingshott CA, Alexander DC. NODDI: practical in vivo neurite orientation dispersion and density imaging of the human brain. Neuroimage. 2012;61:1000–1016. doi: 10.1016/j.neuroimage.2012.03.072. [DOI] [PubMed] [Google Scholar]
  • 48.Daducci A, et al. Accelerated microstructure imaging via convex optimization (AMICO) from diffusion MRI data. Neuroimage. 2015;105:32–44. doi: 10.1016/j.neuroimage.2014.10.026. [DOI] [PubMed] [Google Scholar]
  • 49.Tournier J-D, Calamante F, Connelly A. Robust determination of the fibre orientation distribution in diffusion MRI: non-negativity constrained super-resolved spherical deconvolution. Neuroimage. 2007;35:1459–1472. doi: 10.1016/j.neuroimage.2007.02.016. [DOI] [PubMed] [Google Scholar]
  • 50.Yeh F-C, Wedeen VJ, Tseng W-YI. Generalized q-sampling imaging. IEEE Trans. Med. Imaging. 2010;29:1626–1635. doi: 10.1109/TMI.2010.2045126. [DOI] [PubMed] [Google Scholar]
  • 51.Yeh F-C. Shape analysis of the human association pathways. Neuroimage. 2020;223:117329. doi: 10.1016/j.neuroimage.2020.117329. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Smith RE, Tournier J-D, Calamante F, Connelly A. Anatomically-constrained tractography: improved diffusion MRI streamlines tractography through effective use of anatomical information. Neuroimage. 2012;62:1924–1938. doi: 10.1016/j.neuroimage.2012.06.005. [DOI] [PubMed] [Google Scholar]
  • 53.Bullock D, et al. Associative white matter connecting the dorsal and ventral posterior human cortex. Brain Struct. Funct. 2019 doi: 10.1007/s00429-019-01907-8. [DOI] [PubMed] [Google Scholar]
  • 54.Sherbondy AJ, Dougherty RF, Ben-Shachar M, Napel S, Wandell BA. ConTrack: finding the most likely pathways between brain regions using diffusion tractography. J. Vis. 2008;8:15.1–16. doi: 10.1167/8.9.15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Yoshimine S, et al. Age-related macular degeneration affects the optic radiation white matter projecting to locations of retinal damage. Brain Struct. Funct. 2018;223:3889–3900. doi: 10.1007/s00429-018-1702-5. [DOI] [PubMed] [Google Scholar]
  • 56.Yeh F-C, et al. Population-averaged atlas of the macroscale human structural connectome and its network topology. Neuroimage. 2018;178:57–68. doi: 10.1016/j.neuroimage.2018.05.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Smith RE, Tournier J-D, Calamante F, Connelly A. The effects of SIFT on the reproducibility and biological accuracy of the structural connectome. Neuroimage. 2015;104:253–265. doi: 10.1016/j.neuroimage.2014.10.004. [DOI] [PubMed] [Google Scholar]
  • 58.Hagmann P, et al. Mapping the structural core of human cerebral cortex. PLoS Biol. 2008;6:e159. doi: 10.1371/journal.pbio.0060159. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Smith RE, Tournier J-D, Calamante F, Connelly A. SIFT: Spherical-deconvolution informed filtering of tractograms. Neuroimage. 2013;67:298–312. doi: 10.1016/j.neuroimage.2012.11.049. [DOI] [PubMed] [Google Scholar]
  • 60.Van Essen DC, et al. The WU-Minn Human Connectome Project: an overview. Neuroimage. 2013;80:62–79. doi: 10.1016/j.neuroimage.2013.05.041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Margulies DS, et al. Situating the default-mode network along a principal gradient of macroscale cortical organization. Proc. Natl Acad. Sci. USA. 2016;113:12574–12579. doi: 10.1073/pnas.1608282113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Coifman RR, et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc. Natl Acad. Sci. USA. 2005;102:7426–7431. doi: 10.1073/pnas.0500334102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Vos de Wael R, et al. BrainSpace: a toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Commun. Biol. 2020;3:103. doi: 10.1038/s42003-020-0794-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Schaefer A, et al. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cereb. Cortex. 2018;28:3095–3114. doi: 10.1093/cercor/bhx179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Gramfort A, et al. MEG and EEG data analysis with MNE-Python. Front. Neurosci. 2013;7:267. doi: 10.3389/fnins.2013.00267. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplementary Information (2.6MB, pdf)

Supplementary Figs. 1–5, Tables 1–5 and Results 1–5.

Reporting Summary (4MB, pdf)

Data Availability Statement

All data derived and described in this paper are made available via the brainlife.io platform as ‘Publications’. User data agreements are required for some projects, such as those using data from the HCP, Cam-CAN, PING, ABCD and HBN datasets. The Indiana University Acute Concussion and Oxford University Choroideremia & Stargardt’s Disease Datasets are part of ongoing research projects and will be made available at a later stage. All other datasets are made freely available via the brainlife.io platform. See Supplementary Table 6 for the list of brainlife.io/pubs.

This article describes nine platform components. All components are publicly available and open source under the MIT License. The software for each platform component is listed in Supplementary Table 1. In addition, we share the code used for the statistical analyses as Jupyter Notebooks (Supplementary Table 2). Finally, the Apps used and tested in this article are listed in Supplementary Table 3.


Articles from Nature Methods are provided here courtesy of Nature Publishing Group
