Abstract
Neuroscience research has expanded dramatically over the past 30 years by advancing standardization and tool development to support rigor and transparency. Consequently, the complexity of the data pipeline has also increased, hindering access to FAIR data analysis to portions of the worldwide research community. brainlife.io was developed to reduce these burdens and democratize modern neuroscience research across institutions and career levels. Using community software and hardware infrastructure, the platform provides open-source data standardization, management, visualization, and processing and simplifies the data pipeline. brainlife.io automatically tracks the provenance history of thousands of data objects, supporting simplicity, efficiency, and transparency in neuroscience research. Here brainlife.io’s technology and data services are described and evaluated for validity, reliability, reproducibility, replicability, and scientific utility. Using data from 4 modalities and 3,200 participants, we demonstrate that brainlife.io’s services produce outputs that adhere to best practices in modern neuroscience research.
INTRODUCTION
Over the last 30 years, neuroimaging research has dramatically expanded our ability to study the structure and function of the living human brain, leading to major advancements in understanding brain-related health and disease 1–4. Today, neuroimaging modalities and techniques span multiple data types (e.g., magnetic resonance imaging [MRI], positron emission tomography [PET], functional near-infrared spectroscopy [fNIRS], electroencephalography [EEG], and magnetoencephalography [MEG]), and have increased the feasibility of large-scale, population-level data collection efforts.1,5,6 At the same time, the field of neuroimaging has attracted a large and ever-growing community of researchers 7,8. Furthermore, the adoption of the FAIR principles of data stewardship (Findability, Accessibility, Interoperability, and Reusability9), together with data standardization, open science methods, and increased data size, has been gaining ground and, in turn, increasing the requirements for rigorous and transparent data analysis and reporting. However, such approaches require significant additional technological support, posing new challenges to many researchers. We refer to these challenges as the burdens of neuroscience (Fig. 1).
Datasets are growing in size, in large part because larger datasets support scientific rigor and reproducibility. Research on the reproducibility of scientific findings indicates that limited sample sizes might have hindered the validity of early, foundational results in hypothesis-driven cognitive neuroscience research,10–16 but reproducibility issues can also be found in biological science,17,18 psychology,12 data science and computational methods,19,20 cancer biology,21 and artificial intelligence.13,22,23 This is largely because small sample sizes increase the probability of reporting spurious effects as statistically significant.1,24 Recent findings also make the case for increasing sample sizes into the thousands when research focuses on discovery science.5 Notable examples of large-scale data sharing within neuroscience and neuroimaging include the Human Connectome Project (HCP),25 the Cambridge Centre for Ageing and Neuroscience study (Cam-CAN),26,27 the Adolescent Brain Cognitive Development (ABCD) study,28,29 the UK-Biobank,30 the Healthy Brain Network (HBN),31 the Pediatric Imaging Neurocognition and Genetics (PING) study,32 the Natural Scene Dataset,33 and the thousands of individual brain datasets deposited on OpenNeuro.org.34 These data-sharing projects serve not only the needs of the neuroscience community, with demonstrated impact,35 but also the incoming generation of AI research.36–38 However, larger datasets generally entail greater complexity as well. The use of datasets of such unprecedented size requires a substantial scaling up of resources and technical skills, and this in turn results in significant barriers to entry.
Traditionally, neuroimaging researchers have collected a few hours of neuroimaging data on a few dozen subjects and analyzed it using laboratory computers and a single toolkit or programming environment, often created in-house. Current studies, by contrast, may require the analysis of hundreds (if not thousands) of hours of data, with an accompanying move of data away from individual laboratory computers toward high-performance computing clusters and cloud systems requiring multiple steps and a variety of scripting and programming languages (e.g., Unix/Linux shell, Python, MATLAB, R, C++). The complexity of neuroimaging data pipelines and code development stacks has increased concomitantly.39,40 To help ensure the reproducibility and rigor of scientific results, the neuroimaging community has also developed data standards41 and software libraries for data processing and analysis (FSL, Freesurfer, Nibabel, MRTrix, DIPY, DSI-Studio).42–68 More recently, prebuilt data processing pipelines that combine software from multiple libraries into unified, partially preconfigured steps have also been developed 69–73. These pipelines advance data processing standardization but still leave many choices of parameters to users and often require technical input data formats.
As a result of all this progress for data and tools, neuroimaging researchers carry the burden of having to piece together and track multiple processes, such as data ingestion and standardization, storage, and management, preprocessing and feature extraction, all while also attending to tracking quality control, analyses, and publication (Fig. 1a). Publication of results requires compliance with the FAIR principles which, though well explained in theory, are often challenging to implement in practice. Submission of manuscripts often necessitates new analyses at a later date, by which point software and data versions may have changed, and data might have been removed from compute clusters or local servers. Existing approaches for managing these steps require manual tracking of data and code versions, along with advanced technical skills.40,74 Currently, there exists no efficient technology to help piece together and keep track of all of these (ever-changing) technology and data requirements.
As the resources necessary to participate fully in modern neuroscience research have grown, barriers to entry and funding have risen as well. Smaller universities, teaching colleges, undergraduate students, and other settings that lack the resources to support significant investments in infrastructure and training are at a meaningful disadvantage. Lack of resources and infrastructure is a key gap identified in surveys pertaining to both the adoption of FAIR neuroscience 75 and the conduct of neuroscience research in low- and medium-income countries 76,77. Without added support, FAIR neuroscience might evolve with an ever-increasing bias towards high-resourced teams, institutions, and countries. Such an outcome would not only decrease representation and diversity but would also slow scientific progress. In support of simplicity, efficiency, transparency, and equity in big data neuroscience research, our team has developed a community resource, brainlife.io (Fig. 1b). The brainlife.io platform stands on the foundational pillars of the neuroimaging community and the mission of open science (Fig. 1c). brainlife.io provides free and secure reproducible neuroscience data analysis. brainlife.io’s technology serves researchers by automatically tracking data provenance, preprocessing steps, parameter sets, and analysis versions. Our vision for brainlife.io is that of a trusted, interoperable, and integrative platform connecting global communities of software developers, hardware providers, and domain scientists via cloud services.
In the remainder of this article, we describe the technology and utilization of brainlife.io. After that, we present the results of our evaluations of the effectiveness of the technology. Experiments focused on the four axes of scientific transparency: external validity, reliability, reproducibility, and replicability. Finally, we demonstrate the platform’s potential for scientific utility in identifying human disease biomarkers.
RESULTS
Platform architecture
brainlife.io is a ready-to-use and ready-to-expand platform. As a ready-to-use system, it allows researchers to upload and analyze data from MRI, MEG, and EEG systems. Data are managed using a secure warehousing system that follows an advanced governance and access-control model. Data can be preprocessed and visualized using version-controlled applications (hereafter referred to as Apps) compliant with major data standards (the Brain Imaging Data Structure, BIDS41). As a ready-to-expand system, software developers may contribute or modify existing Apps guided by standard methods and documentation describing how to write Apps (github.com/brainlife/abcd-spec and brainlife.io/docs). The platform uses a combination of opportunistic computing and publicly funded resources 78–80 that are functionally integrated and can be available for use by a particular project or team of researchers. Computing resource managers can also register computer servers and clusters on brainlife.io to make them available either to individual users or projects or to the larger community of brainlife.io users (Fig. 2a and Fig. S2a). The Supplemental Platform architecture provides an extended description of the technology. The platform is available to any type of researcher from students to faculty researchers, either without cost (through opportunistic use of freely contributed resources) or with performance guarantees (through the use of dedicated hardware or payment for use of cloud resources).
Brainlife.io was founded via an initial investment from the U.S. BRAIN Initiative via a National Science Foundation award, followed by support from the National Institutes of Health, the Department of Defense, the Kavli Foundation, and the Wellcome Trust. The platform’s geographically distributed computing and storage systems are securely hosted by national supercomputing centers and funded by a combination of institutional, national, and international awards (see Fig. S2). As of this paper, the Texas Advanced Computing Center, Indiana University Pervasive Technology Institute, Pittsburgh Supercomputing Center, San Diego Supercomputing Center, and the University of Michigan Advanced Research Computing Technology Services have supported the project. The distributed platform is connected with and depends on other major infrastructure and software projects such as OpenNeuro.org, osris.org, DataLad.org, BIDS, Freesurfer, FSL, nibabel, dipy.org, repronim.org, DSI-Studio, jetstream-cloud.org, frontera-portal.tacc.utexas.edu, access-ci.org, and INCF.org.
The architecture of brainlife.io is based on an innovative, microservices-based approach, including authentication, preprocessing, warehousing, event handling, and auditing. This architecture allows automated and decentralized data management and processing. Microservices are handled by the meta-orchestration workflow system Amaretti (Fig. 2a,b, and Table S1). Amaretti can deploy computational jobs on high-performance compute clusters and cloud systems. This allows the utilization of publicly-funded supercomputers and clouds 80, as well as commercial clouds, such as Google Cloud, AWS, or Microsoft Azure.
Data management on brainlife.io is centered around Projects and supported by a databasing and warehousing system (github.com/brainlife/warehouse). Projects are the “one-stop-shop” for data management, processing, analysis, visualization, and publication (Fig. S3c). Projects are created by users and are private by default, but can also be made publicly visible inside the brainlife.io platform. A Project can be populated with data using several options (Fig. 2d). Several major archives and data repositories are currently docked by brainlife.io74 (see Fig. 2b). Notable examples are OpenNeuro.org34 and the Nathan-Kline data-sharing project.81–83 Datasets can be imported seamlessly into brainlife.io Projects by using either the portal brainlife.io/datasets 74 (see Video S2 and Video S3), the standardization tool brainlife.io/ezbids (see Table S1 and Video S6), or a dedicated Command Line Interface (CLI).
Data processing on brainlife.io utilizes an object-oriented, micro-workflow service model. Data objects are stored using predefined formats, Datatypes, that allow automated App concatenation and pipelining (Fig. 2c; brainlife.io/Datatypes). Apps and Datatypes are the key components of the system; they work together to allow automated processing and provenance tracking for millions of data objects. Apps are composable processing units written in a variety of languages using containerization technology.84,85 Apps are smart, and can automatically identify, accept, or reject datasets before processing (Fig. 2 and Fig. S2b). Community-developed data visualizers are served by brainlife.io to support quality control (see Table S1). Six new data visualizers have been developed and released as part of the project (Table S1 and Video S7). Whenever possible, Datatypes are made compatible with BIDS.41 BIDS Apps can be easily made into brainlife.io Apps, and multiple examples already exist (brainlife.io/apps).
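One way to picture how typed data objects could let Apps accept or reject inputs and be chained automatically is sketched below. This is a minimal illustration under stated assumptions, not the brainlife.io implementation: the `DataObject` and `App` classes, the `accepts` and `can_chain` helpers, and the Datatype names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """A stored data object tagged with a Datatype name (hypothetical model)."""
    object_id: str
    datatype: str


@dataclass(frozen=True)
class App:
    """A processing unit that declares the Datatypes it consumes and produces."""
    name: str
    inputs: tuple
    output: str

    def accepts(self, objects):
        """Return True only when every declared input Datatype is available."""
        available = {obj.datatype for obj in objects}
        return all(datatype in available for datatype in self.inputs)


def can_chain(upstream, downstream):
    """Two Apps can be concatenated when the first produces what the second needs."""
    return upstream.output in downstream.inputs


# Hypothetical Datatype names, used only for illustration.
t1 = DataObject("obj-1", "anat/t1w")
dwi = DataObject("obj-2", "dwi")
tracking = App("tracking", inputs=("anat/t1w", "dwi"), output="track")
profiles = App("profiles", inputs=("track",), output="tractmeasures")

print(tracking.accepts([t1, dwi]))    # both inputs present
print(tracking.accepts([t1]))         # diffusion data missing, so rejected
print(can_chain(tracking, profiles))  # output of one matches input of the other
```

Declaring inputs and outputs as data, rather than checking them inside each App, is what would make automated pipelining possible in a design of this kind.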
The data workflow on brainlife.io simplifies the complexity of the modern neuroimaging processing pipeline into two steps, akin to Google’s MapReduce algorithm.86 An initial map step preprocesses data objects asynchronously and in parallel using Apps, so as to extract features of interest (such as functional activations, white matter maps, brain networks, or time series data; Fig. 2d). During the map step, Datatypes and Apps are synchronized and moved to available compute resources automatically. Apps process data objects in parallel across study participants in a Project. The map step is followed by a reduce step, wherein features extracted using Apps are made available to pre-configured Jupyter notebooks87,88 served on the platform to perform statistical analysis, machine-learning applications, and generate figures. Indeed, all statistical analyses and figures in this paper are available in accessible Jupyter Notebooks (see Table S2). brainlife.io’s data workflow makes it possible to integrate large volumes of diverse neuroimaging Datatypes into simpler sets of brain features organized into Tidy data structures 89 (Fig. S3c).
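The two-step workflow described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not platform code: `extract_features` stands in for an App, the participant data are made up, and a thread pool stands in for distributed compute resources.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def extract_features(participant):
    """Map step: stand-in for an App that reduces one participant's data to a feature."""
    subject_id, samples = participant
    return {"subject": subject_id, "feature": "mean_signal", "value": mean(samples)}


def map_step(participants, max_workers=4):
    """Process participants independently and in parallel; input order is preserved."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_features, participants))


def reduce_step(rows):
    """Reduce step: summarize tidy rows (one row per observation) across participants."""
    values = [row["value"] for row in rows]
    return {"n": len(values), "group_mean": mean(values)}


# Made-up data for two hypothetical participants.
participants = [("sub-01", [1.0, 2.0, 3.0]), ("sub-02", [2.0, 4.0, 6.0])]
tidy_rows = map_step(participants)
summary = reduce_step(tidy_rows)
print(tidy_rows)
print(summary)
```

Because each participant is processed independently in the map step, the work parallelizes naturally; only the reduce step needs to see all participants at once.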
A key technological innovation developed for brainlife.io is the ability to automatically track all actions performed by platform users on Datatypes and Apps. The platform captures data object IDs, App versions, and parameter sets so as to track the full sequence of steps from data import to analysis and publication. A graph describing provenance metadata for each Datatype can be visualized using the provenance visualizer or downloaded (see Fig. S3d and Video S10). A shell script is automatically generated to allow the reproduction of full processing sequences (Video S11). Finally, the data objects, Apps, and Jupyter Notebooks used in a study can be bundled into a single record, addressed by a unique Digital Object Identifier (DOI),90 and made publicly available outside the platform. Whereas all other existing systems provide users with technology to track analysis steps manually or require the use of coding, brainlife.io tracks these steps automatically and requires neither coding nor user action to generate a record of everything done by a user for data analysis. This automation technology lowers the barriers to entry and democratizes FAIR, reproducible large-scale neuroimaging data analysis.
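To make the idea of automated provenance tracking concrete, the sketch below records each processing step (App name, version, parameters, and input object IDs) in a small graph and then walks that graph backwards to list the steps behind a result. It is a hypothetical illustration only: the function names, the hash-based object IDs, and the App names are assumptions and do not describe brainlife.io's actual data model.

```python
import hashlib
import json


def record_step(provenance, app, version, params, input_ids):
    """Append one processing step and return the ID of the data object it produced."""
    payload = json.dumps(
        {"app": app, "version": version, "params": params, "inputs": sorted(input_ids)},
        sort_keys=True,
    )
    # Assumption: the output ID is derived from the step description, so an
    # identical step on identical inputs always maps to the same ID.
    output_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    provenance[output_id] = {
        "app": app, "version": version, "params": params, "inputs": list(input_ids),
    }
    return output_id


def lineage(provenance, object_id):
    """List the steps that produced object_id, oldest first."""
    step = provenance.get(object_id)
    if step is None:  # an imported (raw) object has no recorded producer
        return []
    steps = []
    for parent in step["inputs"]:
        for item in lineage(provenance, parent):
            if item not in steps:
                steps.append(item)
    params = json.dumps(step["params"], sort_keys=True)
    steps.append(f'{step["app"]}@{step["version"]} {params}')
    return steps


provenance = {}
aligned = record_step(provenance, "align", "1.0.0", {"reference": "acpc"}, ["raw-t1w"])
parcels = record_step(provenance, "parcellate", "2.1.0", {"atlas": "example"}, [aligned])
history = lineage(provenance, parcels)
print("\n".join(history))
```

A list like `history` is the kind of information from which a reproduction script could be generated, since it contains every App, version, and parameter set in execution order.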
Platform evaluation
In the following section, we evaluate the utility of brainlife.io. To do so, we first present the level of engagement with the platform by the growing community of users. After that, we describe the results of experiments on the robustness and validity of the platform. A detailed description of each section below describing each App and step used can be found in the corresponding Supplemental Platform evaluation section.
Platform utilization
brainlife.io is developed following the FAIR principles. It is available worldwide and supports thousands of researchers. First made accessible in Spring 2018, its utilization and assets have grown steadily (Fig. 3 and Fig. S2c and S4). At the time of this writing, over 2,341 users across 43 countries have created a brainlife.io account. Over 1,542 of these have been active users (Fig. 3a). Over 3,439 data management Projects have been created, and a community of developers has implemented over 530 data processing Apps. Over 270 TBs of data have been stored and processed using brainlife.io, for a total of 1,097,603 hours of compute time.
Researchers ranging from undergraduate students to faculty use brainlife.io (Fig. 3b), and analyses span the full range of the neuroimaging data lifecycle. The most frequently used Apps pertained to diffusion tractography (22%), model fitting (15%), and anatomical ROI generation (12%). Community-developed software libraries provided the foundations for data processing, including Nibabel, Freesurfer, FSL, DIPY, MRTrix, the Connectome Workbench, and MNE-Python. Terabytes of data have been uploaded (72%) or imported from OpenNeuro.org (22%), the Nathan-Kline Institute data sharing projects (3%31,81,83), and other sources. This degree of world-wide platform access highlights the global need for technology like brainlife.io (see Fig. S2e). More details can be found in Supplemental platform utilization.
Platform testing
Experiments were performed to demonstrate the ability of the platform to provide accurate data processing and analysis at scale. The experiments focused on the four axes of scientific transparency: data processing external validity (DPEV), reliability, reproducibility, and replicability.91,92 Four data modalities (sMRI, fMRI, dMRI, MEG) were evaluated using, among others, the test-retest HCPTR, 93 the Cam-CAN,27 the HBN,31 and the ABCD28 datasets. In total, data from over 3,200 participants across 12 datasets were processed. Extracted brain features included cortical parcel volumes, white matter tract profilometry, functional and structural network properties, functional gradients, and peak alpha frequency (Fig. 4). Over 193,000 data objects and 22 Terabytes of data were generated for the experiments. A detailed description of the experiments below can be found in the Supplemental platform testing section. The brainlife.io Apps used for the experiments are reported in Table S3. Post-processing analyses were performed using brainlife.io-hosted Jupyter Notebooks (see Table S2).
Data processing external validity (DPEV) was defined as the ability of data processed on brainlife.io to accurately reflect brain properties proficiently processed by other teams. DPEV was estimated for four data modalities (sMRI, dMRI, fMRI, and MEG) and five brain features (brain area volumes, fractional anisotropy of major white matter tracts, resting-state functional connectivity, resting-state functional gradients, and MEG peak alpha frequency). Feature values obtained using brainlife.io Apps were compared against data preprocessed by data originators, specifically the HCP consortium or Cam-CAN project team (Fig. 4, Fig. S4d,e,h). Cortical area volume estimates on 148 parcels were obtained using brainlife.io Apps and compared to corresponding estimates provided by the HCP consortium (Fig. 4a; rvalidity=0.98, rmsevalidity=570.54 mm3). Fractional anisotropy (FA) in 61 white matter tracts was estimated using the raw and minimally preprocessed HCPTR dMRI data (Fig. 4b; rvalidity=0.95, rmsevalidity=0.018). Functional connectivity estimates between 1,172 node pairs 94 were compared between raw and minimally preprocessed HCPTR fMRI data (Fig. 4c; rvalidity=0.89, rmsevalidity=0.12). In addition, functional gradients 95,96 were computed on 400 nodes estimated on raw and minimally preprocessed HCPTR fMRI data (Fig. 4d; rvalidity=0.59, rmsevalidity=0.036). Finally, the peak alpha frequency values were compared between Cam-CAN and brainlife.io processed MEG data (Fig. 4e; rvalidity=0.94, rmsevalidity=0.30 Hz). Overall, the results show strong similarity in feature estimates between data processed on brainlife.io versus those processed by external groups (functional gradients demonstrated the lowest validity and data processing-type dependency based on fMRI preprocessing procedures 97).
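The two agreement measures reported above, a correlation coefficient (r) and a root-mean-square error (rmse), can be computed as in the sketch below. It assumes that r denotes Pearson's correlation and that features from the two processing routes are paired one-to-one (for example, one value per parcel or tract); the function names and the numbers are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of feature values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root-mean-square error, in the units of the feature itself."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Made-up paired estimates: platform-processed vs. externally processed.
platform_values = [1.0, 2.0, 3.0, 4.0]
reference_values = [1.1, 1.9, 3.2, 3.8]
r_validity = pearson_r(platform_values, reference_values)
rmse_validity = rmse(platform_values, reference_values)
print(round(r_validity, 3), round(rmse_validity, 3))
```

The two measures are complementary: r is insensitive to a constant offset or scaling between the two routes, whereas rmse penalizes any absolute disagreement.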
Data processing reliability (DPR) was defined as the ability to produce highly similar results on test and retest measurements within a study participant. DPR was estimated for the four data modalities and five brain features used above to estimate DPEV. Brain features estimated using brainlife.io Apps on test and retest measurements (HCPTR dataset) or median-split data (Cam-CAN MEG) were compared. Reliability estimates of brain area volumes, major tract FA, network FC, functional gradients, and peak alpha frequency were obtained (see Fig. 4f–i and associated supplemental text). DPR varied between rreliability=0.99 and 0.73, with sMRI and dMRI demonstrating the highest reliability (rreliability=0.99, 0.93, respectively). See also Fig. S4f–g,i for estimates on additional brain features and Table S4 for a full report of all correlation values obtained in all brain features. The results show strong reliability for most pipelines, with fMRI reliability being the lowest; this is consistent with previous reports 98. We also performed computational reproducibility (CR) experiments (see Fig. S4j–n and associated text). These experiments demonstrated the similarity in estimates produced by brainlife.io Apps when used twice to process the same dataset. Given the use of containerization technology for the Apps, this test was expected to return high correlation values. Indeed, all correlations were above 0.99, demonstrating high consistency. These experiments demonstrate the ability of the platform to conduct valid, reliable, and reproducible data processing and analysis at scale across multiple data modalities and brain features.
Platform utility for scientific applications
Next, we evaluated the platform’s potential to support scientific findings. To do so, we evaluated whether data processed using brainlife.io’s Apps contained meaningful patterns. We used over 1,800 participants from three datasets: PING (Pediatric Imaging, Neurocognition, Genetics), HCPs1200 (HCP Young Adult 1,200), and Cam-CAN. Data were collected across ages, but age ranges differed in each dataset (i.e., 3–20 years for PING, 20–37 years for HCPs1200, and 18–88 years for Cam-CAN). The lifelong trajectory was plotted for multiple brain features (e.g., volumes of brain parts, FA of major tracts, network properties, MEG peak frequency, etc.; Fig. 5). The collated age range spanned 7 decades. Features were combined using brainlife.io’s Jupyter Notebooks.
Multiple reports have shown inverted U-shaped lifelong trajectories across data modalities.99–103 We plotted brain features derived for each data modality (sMRI, dMRI, fMRI, and MEG) as a function of age across datasets (Fig. 5). Six exemplary lifelong trajectories are shown (additional features are reported in Fig. S5). For each data modality, a quadratic model was fit across all three datasets between 3 and 88 years of age: $y = a \cdot \mathrm{age}^2 + b \cdot \mathrm{age} + c$ (R2=0.152 ± 0.0773 s.d.). The mean quadratic term (a) across all data modalities was negative (−0.0514 ± 0.111 s.d.), demonstrating the expected inverted U-shape trajectory. Results show that, by automatically analyzing data using brainlife.io Apps, it is possible to collate across datasets with substantial differences in data acquisition parameters and signal-to-noise profiles. Additional details regarding these experiments can be found in Supplemental platform utility for scientific applications.
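A quadratic age model of the kind described above can be fit by ordinary least squares. The sketch below assumes the model has the form value = a·age² + b·age + c, where a is the quadratic term named in the text and b and c are labels introduced here for illustration. It uses only the standard library and made-up, noise-free data; in practice a library routine such as `numpy.polyfit` would typically be used.

```python
def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: need at least 3 distinct ages")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_quadratic(ages, values):
    """Least-squares fit of value = a*age**2 + b*age + c; returns (a, b, c)."""
    if len(ages) != len(values) or len(ages) < 3:
        raise ValueError("need at least 3 paired observations")
    rows = [[x * x, x, 1.0] for x in ages]
    # Normal equations (X^T X) beta = X^T y for the columns [age^2, age, 1].
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    return tuple(solve_linear(xtx, xty))


def r_squared(ages, values, coefficients):
    """Proportion of variance in values explained by the fitted curve."""
    a, b, c = coefficients
    mean_value = sum(values) / len(values)
    ss_res = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(ages, values))
    ss_tot = sum((y - mean_value) ** 2 for y in values)
    return 1.0 - ss_res / ss_tot


# Made-up, noise-free feature values that peak at age 40.
ages = [5.0, 20.0, 40.0, 60.0, 85.0]
values = [-0.01 * x * x + 0.8 * x + 5.0 for x in ages]
a, b, c = fit_quadratic(ages, values)
peak_age = -b / (2.0 * a)  # vertex of the parabola
print(a < 0, round(peak_age, 2), round(r_squared(ages, values, (a, b, c)), 4))
```

A negative quadratic coefficient means the fitted curve opens downward, which is the inverted U shape; the vertex at −b/(2a) gives the age at which the modeled feature peaks.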
Replication and generalization of previous results
We then evaluated the ability of brainlife.io to replicate previous results and generalize findings across datasets. A more detailed description and additional experiments can be found in Supplemental replication and generalization. First, we tested brainlife.io’s ability to replicate the results of three previous studies. A negative correlation between cortical thickness and tissue orientation dispersion (ODI; roriginal = −0.46) has been reported in the HCPs1200 dataset.104 brainlife.io Apps were created to estimate cortical thickness and ODI and to analyze the HCPs1200 dataset. A negative relationship between cortical thickness and ODI was estimated, replicating the original study (Fig. 6a; rHCP-brainlife = −0.43 vs. roriginal). More examples of replications can be found in Fig. S6a,b.
Second, the generalization of the original findings to different datasets was tested in three ways. The first test was run using the cortical ODI estimated in the Cam-CAN dataset. A negative trend of about half the magnitude of the original was estimated (Fig. 6a; rCam-CAN-brainlife = −0.28 vs. roriginal). The result generalizes the original results, and the reduced effect in a new dataset is consistent with reports on the reproducibility of scientific findings.12 The second generalization test focused on the reported relationship between life stressors and white matter structural organization of the uncinate fasciculus (UF; r=−0.057).105 Two datasets were used to extend the finding to new data, i.e., HBN and ABCD. The number of negative life events (Negative Life Events Schedule; NLES) in the HBN dataset was correlated with subjects’ quantitative anisotropy (QA) in the right- and left-hemisphere UF. Results show a negative correlation similar in magnitude to that found in the original study (Fig. 6b; rHBN_LEFT = −0.35, p-value < 0.05; rHBN_RIGHT = −0.39, p-value < 0.05). The third and final attempt at the generalization of the same result was made using the ABCD dataset. Early life stress was estimated as a composite score of traumatic life events, environmental and neighborhood safety, and the family conflict subscale of the Family Environment Scale.29 A negative relationship between UF FA and the composite score was estimated in the left and right UF (Fig. 6c; rABCD_LEFT = −0.12, p-value < 0.001; rABCD_RIGHT = −0.09, p-value < 0.01). Overall, these results demonstrate both the robustness of the original results and the potential of brainlife.io services to detect meaningful associations in large, heterogeneous datasets.
Example applications to detecting disease
The final two tests evaluated the platform’s ability to identify human disease biomarkers. Data from individuals with a sports-related concussion, eye disease (Choroideremia and Stargardt’s disease), and matched controls were used (Fig. 7). A detailed description of the experiments can be found in Supplemental to detecting disease. It has been reported that concussion can alter brain tissue both in cortical and deep white matter tracts.106 We set out to measure the difference in cortical white matter tissue in concussed and matched controls. FA was estimated from data collected within 24–48 hours post-concussion. The distribution of FA in the superior temporal sulcus (STS) is reported (Fig. 7a). One representative athlete showed strong post-concussive symptoms and low STS cortical FA (red). The result demonstrates the potential of brainlife.io processed data to report meaningful changes in brain tissue following a concussion.
Changes in the white matter of the optic radiation (OR) as a result of eye disease have been reported.107–111 We set out to test the ability of brainlife.io Apps to detect similar changes in the OR white matter tissue in two eye diseases for which OR white matter changes have not previously been reported. Individuals with Stargardt’s disease (a deterioration of the retina initiating in the central fovea) and Choroideremia (retinal deterioration initiating in the visual periphery) were compared to healthy controls. Retinal photoreceptor complex thickness was estimated in the fovea and periphery using optical coherence tomography (0–1 and 7–90 degrees of visual eccentricity, respectively; Fig. 7b). Choroideremia patients showed photoreceptor complex thickness comparable to healthy controls in the fovea, but deviated in the periphery (Fig. 7b). The trend was opposite for Stargardt’s patients. brainlife.io Apps were developed to automatically separate OR bundles projecting to different visual eccentricities in cortical area V1. Average FA profiles for each patient group and controls were estimated for OR fibers projecting to the fovea or periphery.112–114 Results show a reduction in FA in the component of the OR projecting to the fovea (but not the periphery) in Stargardt’s patients (Fig. 7b, blue), and the opposite pattern (OR fibers projecting to the periphery had lower FA than controls) in Choroideremia patients (Fig. 7b, blue). These results demonstrate the ability of the platform technology to detect disease biomarkers.
A new approach to facilitate quality control at scale
brainlife.io offers a unique quality assurance (QA) approach to ensure processed data has the quality necessary to serve large user bases. Reference ranges are often used in vision science to provide a reference for a measurement, 115 and a similar approach was integrated within the brainlife.io data processing interface. To test it, the mean, first, and second SD were estimated (via multiple Apps) for four brain features (tractmeasures, parc-stats, networks, PSD) using the HCPs1200, Cam-CAN, and PING datasets. For each of the four brain features, the estimated mean and estimated s.d. (referred to here as Reference ranges) are automatically calculated on the brainlife.io platform. That is, when a researcher uses an App to estimate one of the four features, the values of the researcher’s dataset are automatically overlaid on top of the mean, first, and second s.d. marks provided as a reference by brainlife.io. In this way, the mean and variability can be used by researchers to efficiently judge whether a recently processed dataset returned appropriate values. For example, reference datasets can be used to detect outlier data (Fig. 8a–d). Example reference datasets for four Datatypes are in Fig. 8e and an example of platform interfaces reporting these reference datasets is shown in Fig. S8. A detailed description of the approach used in this section can be found in Supplemental to quality control at scale. These reference ranges are an additional source for quality assurance, alongside other options for QA such as online data visualization, the automated generation of images and plots from the processed data as well as the detailed technical reports from major BIDS Apps such as fMRIprep, QSIPrep, MRIQC, Freesurfer 69,70,72,116.
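The reference-range idea described above can be sketched as follows: estimate the mean and standard deviation of a feature from a reference dataset, derive the first and second s.d. bands, and flag newly processed values that fall outside a chosen band. The function names, the two-s.d. threshold, and the numbers are illustrative assumptions, not the platform's implementation.

```python
from statistics import mean, stdev


def reference_range(reference_values):
    """Summarize a reference dataset as a mean with first and second s.d. bands."""
    mu = mean(reference_values)
    sd = stdev(reference_values)  # sample standard deviation
    return {
        "mean": mu,
        "sd": sd,
        "one_sd": (mu - sd, mu + sd),
        "two_sd": (mu - 2 * sd, mu + 2 * sd),
    }


def flag_outliers(values, reference, n_sd=2.0):
    """Return the indices of values lying more than n_sd s.d. from the reference mean."""
    low = reference["mean"] - n_sd * reference["sd"]
    high = reference["mean"] + n_sd * reference["sd"]
    return [index for index, value in enumerate(values) if value < low or value > high]


# Made-up values of one feature (for example, tract FA) in a reference dataset.
reference = reference_range([0.40, 0.42, 0.44, 0.46, 0.48])
new_values = [0.45, 0.30, 0.52]
outliers = flag_outliers(new_values, reference)
print(reference["two_sd"], outliers)
```

A flagged value is not necessarily wrong, since it may reflect a genuine difference in the population or acquisition, so a check like this is best treated as a prompt for visual inspection rather than as an automatic exclusion rule.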
DISCUSSION
The brainlife.io platform was developed with public funding to promote the progress of brain science and education and to enable discovery and improve health. The platform connects researchers with publicly available datasets, analysis code, data archives, and compute resources. brainlife.io is an end-to-end, turnkey data analysis platform that provides researchers interested in the brain with services for data upload, management, visualization, preprocessing, analysis, and publication, all integrated within a unique cloud environment and web interface. The platform uses opportunistic computing and publicly-funded resources for storage and computing,78–80 but it can also use popular commercial clouds. The goal is to advance the democratization of big data neuroscience by lowering the barriers of entry to multimodal data analysis, network neuroscience, and large-scale analysis, all opportunities historically limited to a small number of highly-skilled, high-profile research teams.39,99,117–122 The platform supports a rigorous and transparent scientific process spanning the research data lifecycle from after data collection to sharing123 and automatically tracks complex sequences of interactions between researchers, Apps, analysis notebooks, and data objects to support reproducibility. The FAIR data principles for data stewardship and management 9 are generally used as guidelines for any data-centric project. Recently, it has been proposed that a modern definition of neuroscience data should extend beyond measurements and data to include metadata and software for analysis and management.123 Each research asset on brainlife.io (i.e., data derivatives, analysis software, and software services, as handled by the platform) is aligned with the FAIR data principles (see Supplement on brainlife.io and the FAIR principles).
The following discussion will include descriptions of the resources available for getting started on brainlife.io, applications of brainlife.io to educational settings, the platform’s strict data governance principles, increasing “data gravity” via brainlife.io, potential expansion of the platform, and the platform’s current limitations.
The brainlife.io project provides multiple resources for App developers, computing resource managers, and neuroscience researchers to learn to use the platform or contribute to the project. A comprehensive overview of the platform and tutorials for getting started with developing Apps or using the platform can be found in the integrated documentation (brainlife.io/docs), as well as on a YouTube channel that provides tutorials and demonstrations of concepts (youtube.com/@brainlifeio). A public Slack channel is used for managing user communications, requests, feedback, and operations (brainlife.slack.com). Users can also ask questions to developers and the community using the topic ‘brainlife’ on neurostars.org and by adding GitHub issues. Finally, a quarterly community engagement and outreach newsletter is sent to all users, and a Twitter account (@brainlifeio) informs the wider community on critical events and connects to information relevant to the project.
brainlife.io and its user community are highly engaged in providing innovative training and education opportunities for the next generation of students, postdocs, and clinicians interested in the intersection between neuroscience, data science, and information. The platform allows new students and educators to access many complex data files and analysis methods with minimal overhead. Educators have started using brainlife.io to teach neuroscience and data science concepts in the classroom, and courses have been organized in Europe, the USA, Canada, and Africa. These courses introduce basic concepts and teach students how to perform neuroimaging investigations without the requirement of programming or computing expertise. The skills that can be learned using the platform include data preprocessing, quality assurance, and statistical analyses. Integrative data management and analysis provide opportunities for educators and students in under-resourced institutions or countries to perform research and teach neuroscience with hands-on experience.
The project leadership and advisory team recognize the importance of ensuring that data processing workflows are ethically responsible, legally compliant, and socially acceptable. Indeed, data governance is considered an integral part of data processing. Data governance is defined as the principles, procedures, technologies, and policies that ensure acceptable and responsible processing of data at each stage of the data life cycle.123 It comprises the management of the availability, usability, integrity, quality, and security of data.123 The data governance policies, processes, and technologies within brainlife.io cover three key elements: people, processes, and technologies. A comprehensive set of advanced security measures and protocols guarantee that only authorized individuals have access. These measures include end-to-end encrypted communication, strict access control, and support for multi-factor authentication. Datasets uploaded by users using brainlife.io/ezBIDS are pseudonymized,124 (i.e. direct identifiers are removed) at upload. The platform interface provides fields for project managers to add Data Use Agreements (DUA) in alignment with the nature and context of their data. The platform even provides template DUAs describing data users’ responsibilities and liabilities, including becoming the data controller (the person who controls the purposes and means of processing the data). These governance mechanisms comply with available regulations and mandates, such as the European Union’s General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) in the United States, which require that personal data be stored and managed in a secure and compliant manner. Cloud systems are designed to provide the level of protection necessary to ensure the privacy and confidentiality of research participants. 
Finally, the incoming changes to data deposition and sharing mandates (such as that recently released by the National Institutes of Health in the United States125,126) are likely to increase the workload for neuroscience researchers. The brainlife.io publication records are compatible with the NIH data sharing mandates (for privacy, sharing, and preservation), and the platform is registered on fairsharing.org, datacite.org, datasetsearch.research.google.com, and nitrc.org.
Data gravity is the ability of datasets to attract utilization.127 Neuroimaging research within the larger neuroscience field has led the way in increasing data gravity. A long and growing list of tools orchestrated under a general label of open science is being developed to support and facilitate data utilization and access. These tools can be divided into four primary categories: software libraries, data archives and database systems, data standards, and computing platforms.40 The data archives and systems closest to brainlife.io are the INDI,128,129 OpenNeuro.org,34 DANDI,130 BossDB,131 DataLad,74 NITRC,132 PING,32 Cam-CAN,27 the Brain/MINDS project,133 and LORIS.134 The web services most related to the current work are NeuroQuery,135 NeuroScout,136 CBRAIN,137 NeuroDesk,138 XNAT,139 NEMAR,140 EBRAINS,141 LONI,142,143 the International Brain Lab data infrastructure,144 COINSTAC,145 and CONP.146 Most projects are open-source and provide various degrees of data access. brainlife.io’s end-to-end integrated environment, which brings researchers from raw data to Jupyter Notebooks and Tidy data tables while tracking data provenance automatically, is unique. Many other projects exist, however, and given the fast-growing landscape of neuroinformatics projects, we compiled a table listing the major ones (see Table S5). The International Neuroinformatics Coordinating Facility also provides a list of major projects (incf.org/infrastructure-portfolio). brainlife.io is one of the approved resources, as it complies with the INCF requirement for FAIR infrastructure. The ability of the platform to utilize data from multiple modalities (MEG, EEG, MRI) is a unique feature, connecting neuroimaging research sectors that have been historically siloed. However, we envision additional opportunities for expanding the types of data managed by the platform, fostering further data integration.
For example, other data modalities could be mapped to brainlife.io Datatypes, and integration with metadata capture toolkits 147 and data models 148 would facilitate the analysis of data domains not currently covered by the BIDS standard.
Improving the platform’s automation and interoperability is part of the vision and sustainability plan. For example, despite the best efforts of App developers, errors occur (see Fig. S3d). Currently, researchers only have simple interfaces that report technical output logs and error messages when Apps fail to process data, and parsing these messages requires expertise. Users are required to either contact the brainlife.io team or parse the error logs themselves. Planned improvements to brainlife.io’s error reporting interfaces will help users understand the sources of errors and find solutions. Beyond error identification, identifying the optimal set of processing steps or parameter sets at the beginning of a project can prove challenging. Currently, researchers identify the optimal data processing steps by consulting existing documentation or videos. In the future, mechanisms that automatically identify processing steps can be implemented to suggest to researchers optimal ways to process their data (e.g. given what other researchers might have already implemented on the platform). Finally, improving connection with major archives and platforms such as OpenNeuro.org, DANDI, NeuroScout, NeuroDesk, and neurosynth.org would contribute to implementing the vision of a global interoperable ecosystem for a FAIR, accessible, and democratized neuroscience.
In summary, the capabilities of brainlife.io are unique, open, accessible, and expandable. The expansion of instrument capabilities in neuroimaging has in the last 30 years revolutionized our ability to collect data about the brain and brain function. As the landscape of neuroscience big-data projects is only expected to grow in the coming years, moving research data management and computing to cloud platforms will become not just an attractive option, but a serious requirement. Compliance with mandates for data privacy and sharing will ultimately require researchers to move data management and processing to secure and professionally managed compute and storage systems. Our goal for brainlife.io is to facilitate this process and thereby revolutionize the ability to rigorously and reliably make use of the wealth of data now available to understand brain function, leading to new cures for brain disease. In so doing, brainlife.io will also make cutting-edge datasets and analysis resources more accessible to students and researchers from traditionally underrepresented groups in high-, medium- and low-income countries.
ONLINE METHODS AND MATERIALS
Data collection approval.
Multiple experiments were performed by individuals at various institutions using the platform. Experiments were approved by the local institutional review boards (IRB), and only the personnel approved for a specific study accessed the data in private projects on brainlife.io. Some of the secondary data usages were deemed IRB-exempt.
Data sources.
Multiple openly available data sources were used for examining the validity, reliability, and reproducibility of brainlife.io Apps and for examining population distributions. All information regarding the specific image acquisitions, participant demographics, and study-wide preprocessing can be found in the following publications 27,28,31,149–153. Some data sources are currently unpublished. For these, the appropriate information is provided.
Validity, reliability, reproducibility, replicability, developmental trends, & reference datasets
Human Connectome Project (HCP; Test-Retest, s1200-release) 149.
Data from these projects were used to assess the validity, reliability, and reproducibility of the platform. They were used to assess the abilities of the platform to identify developmental trends in structural and functional measures, and they were used to generate reference datasets. Structural data (sMRI): The minimally-preprocessed structural T1w and T2w images from the Human Connectome Project (HCP) from 1066 participants from the s1200 and 44 participants from the Test-Retest releases were used. Specifically, the 1.25 mm ‘acpc_dc_restored’ images generated from the Siemens 3T MRI scanner were used for all analyses involving the HCP. For most examinations, the already-processed Freesurfer output from HCP was used. Diffusion data (dMRI): To assess the validity of preprocessing on brainlife.io, the unprocessed dMRI data from 44 participants from the HCP Test dataset was used. For reliability and all remaining analyses, the minimally-preprocessed diffusion (dMRI) images from 1,066 participants from the s1200 and 44 participants from the Test-Retest releases from the 3T Siemens scanner were used. All processes incorporated the multi-shell acquisition data. Functional data (fMRI): For validation, the unprocessed resting-state functional MRI (fMRI) from 44 participants from the HCP Test dataset was compared to the minimally-preprocessed BOLD data provided by HCP. For reliability and all other analyses, the minimally-preprocessed BOLD data from 1,066 participants from the s1200 and 44 participants from the Test-Retest releases from the 3T Siemens scanner were used.
The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) 27.
The data from this project were used to assess the validity, reliability, and reproducibility of the platform and to assess the abilities of the platform to identify developmental trends of structural and functional measures, and to generate reference datasets. Structural data (sMRI): The unprocessed 1mm isotropic structural T1w and T2w images from 652 participants from the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study were used. Diffusion data (dMRI): The unprocessed 2mm isotropic diffusion (dMRI) images from 652 participants from the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study were used. Functional data (fMRI): The 3mm × 3mm × 4mm unprocessed resting-state fMRI images from 652 participants from the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study were used. Electromagnetic data (MEG): The 1000 Hz resting-state filtered and unfiltered datasets from 652 participants from the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study were used.
Developmental trends & reference datasets
Pediatric Imaging, Neurocognition, and Genetics (PING) 32.
The data from this project were used to assess the abilities of the platform to identify developmental trends of structural measures and to generate reference datasets. Structural data (sMRI): The unprocessed 1.2 × 1.0 × 1.0 mm structural T1w and the 1.0 mm isotropic T2w images from 110 participants from the Pediatric Imaging, Neurocognition, and Genetics (PING) study were used. Diffusion data (dMRI): The unprocessed 2mm isotropic diffusion (dMRI) images from 110 participants from the Pediatric Imaging, Neurocognition, and Genetics (PING) study were used.
Replicability datasets
Adolescent Brain Cognitive Development (ABCD) 28,29.
Structural data (sMRI): The unprocessed 1mm isotropic structural T1w and T2w images from a subset of 1,877 participants from the Adolescent Brain Cognitive Development (ABCD release-2.0.0) study were used. Diffusion data (dMRI): The unprocessed 1.77mm isotropic diffusion (dMRI) images from a subset of 1,877 participants from the Adolescent Brain Cognitive Development (ABCD release-2.0.0) study were used. A single diffusion gradient shell was used for these experiments (b = 3000 s/mm2). Research approved by the University of Arkansas IRB (#2209425822).
Healthy Brain Network (HBN) 31.
The data from this project were used to assess the abilities of the platform to replicate previously published findings via the assessment of the relationship between microstructural measures mapped to segmented uncinate fasciculi and self-reported early life stressors. Research approved by the University of Pittsburgh IRB (#PRO17060350). Structural data (sMRI): The 0.8 mm isotropic structural T1w images from 42 participants from the Healthy Brain Network (HBN) study were used. Diffusion data (dMRI): The unprocessed 1.8 mm isotropic diffusion (dMRI) images from 42 participants from the CitiGroup Cornell Brain Imaging Center site of the Healthy Brain Network (HBN) study were used.
UPENN-PMC 154.
The data from this project were used to assess the abilities of the platform to replicate previously published findings via the assessment of the performance of an automated hippocampal segmentation algorithm. All procedures were conducted under the approval of the Institutional Review Board at the University of Texas at Austin. Structural data (sMRI): The T1w and T2w data were provided within the Automated Segmentation of Hippocampal Subfields (ASHS) atlas154.
Clinical-identification datasets
Indiana University Acute Concussion Dataset.
The data from this project were used to assess the abilities of the platform to identify clinical populations via the mapping of microstructural measures to the cortical surface. Neuroimaging was performed at the Indiana University Imaging Research Facility, housed within the Department of Psychological and Brain Sciences with a 3-Tesla Siemens Prisma whole-body MRI using a 64-channel head coil. Within this study, 9 concussed athletes and 20 healthy athletes were included. Research approved by Indiana University (IRB: 906000405). Structural data (sMRI): High-resolution T1-weighted structural volumes were acquired using an MPRAGE sequence: TI = 900 ms, TE = 2.7 ms, TR = 1800 ms, flip angle = 9°, with 192 sagittal slices of 1.0 mm thickness, a field of view of 256 × 256 mm, and an isotropic voxel size of 1.0 mm3. The total acquisition time was 4 minutes and 34 seconds. High-resolution T2-weighted structural volumes were also acquired: TE = 564 ms, TR = 3200 ms, flip angle = 120°, with 192 sagittal slices, a field of view of 240 × 256 mm, and an isotropic voxel size of 1.0 mm3. The total acquisition time was 4 minutes and 30 seconds. Diffusion data (dMRI): Diffusion data were collected using single-shot spin-echo simultaneous multi-slice (SMS) EPI (transverse orientation, TE = 92.00 ms, TR = 3,820 ms, flip angle = 78 degrees, isotropic 1.5 mm3 resolution; FOV = LR 228 mm × 228 mm × 144 mm; acquisition matrix MxP = 138 × 138; SMS acceleration factor = 4). This sequence was collected twice, once in the AP fold-over direction and once in the PA fold-over direction, with the same diffusion gradient strengths and the number of diffusion directions: 30 diffusion directions at b = 1000 s/mm2, 60 diffusion directions at b = 1,750 s/mm2, 90 diffusion directions at b = 2,500 s/mm2, and 19 b = 0 s/mm2 volumes. The total acquisition time for both sets of dMRI sequences was 25 minutes and 58 seconds.
Oxford University Choroideremia & Stargardt’s Disease Dataset.
The data from this project were used to assess the abilities of the platform to identify clinical populations via mapping retinal-layer thickness via OCT and mapping of microstructural measures along optic radiation bundles segmented using visual field information (eccentricity). Neuroimaging was performed at the Wellcome Centre for Integrative Neuroimaging, Oxford with the Siemens 3T scanner. Research approved by the UK Health Regulatory Authority reference 17/LO/1540. Structural data (sMRI): High-resolution T1-weighted anatomical volumes were acquired using an MPRAGE sequence: TI = 904 ms, TE = 3.97 ms, TR = 1900 ms, flip angle = 8°, with 192 sagittal slices of 1.0 mm thickness, a field of view of 174 mm × 192 mm × 192 mm, and an isotropic voxel size of 1.0 mm3. The total acquisition time was 5 minutes and 31 seconds. Diffusion data (dMRI): Diffusion data were collected using EPI (transverse orientation, TE = 92.00 ms, TR = 3600 ms, flip angle = 78 degrees, 2.019 × 2.019 × 2.0 mm3 resolution; FOV = 210 mm × 220 mm × 158 mm; acquisition matrix MxP = 210 × 210, SMS acceleration factor = 3). This sequence was collected twice, once in the AP fold-over direction and once in the PA fold-over direction. The PA fold-over scan contained 6 diffusion directions, 3 at b = 0 s/mm2 and 3 at b = 2000 s/mm2, and was used primarily for susceptibility-weighted corrections. The AP fold-over scan contained 105 diffusion directions, 5 at b = 0 s/mm2, 51 at b = 1000 s/mm2, and 49 at b = 2000 s/mm2. The total acquisition time for both sets of dMRI sequences was 7 minutes and 8 seconds.
General processing pipelines
Structural processing.
For the ABCD, Cam-CAN, Oxford University Choroideremia & Stargardt’s Disease Dataset, and the Indiana University Acute Concussion datasets, the structural T1w and T2w (sMRI) images (if available) were preprocessed, including bias correction and alignment to the anterior commissure-posterior commissure (ACPC) plane, using A273 and A350 respectively. For PING data, no bias correction was performed but alignment to the ACPC plane was performed using A99 and A116 for T1w and T2w data respectively. For HCP data, this data was already provided. The structural T1-weighted images for each participant and dataset were then segmented into different tissue types using functionality provided by MRTrix3 (Tournier et al, 2019) implemented as A239. For a subset of datasets, this was performed within the diffusion tractography generation step using A319. The gray- and white-matter interface mask was subsequently used as a seed mask for white matter tractography. The processed structural T1w and T2w images were then used for segmentation and surface generation using the recon-all function from Freesurfer72 (A0). Following Freesurfer, representations of the cortical ‘midthickness’ surface were computed by spatially averaging the coordinates of the pial and white matter surfaces generated by Freesurfer using the wb_command -surface-cortex-layer function provided by Workbench command for the HCPTR, HCPs1200, ABCD, Cam-CAN, PING, and Indiana University Acute Concussion datasets. These surfaces were used for cortical tissue mapping analyses. Following Freesurfer and midthickness-surface generation, the 180 multimodal cortical nodes (hcp-mmp) atlas and the Yeo 17 (yeo17) atlas were mapped to the Freesurfer segmentation of each participant implemented as brainlife.io App A23. These parcellations were used for subsequent cortical, subcortical, and network analyses. 
In addition, measures for cortical thickness, surface area, volume, and summaries of diffusion models of microstructure were estimated using A383 and A389. To estimate population receptive fields (pRF) and visual field eccentricity properties in the cortical surface in the Oxford University Choroideremia & Stargardt’s Disease Dataset, the automated mapping algorithm developed by 155,156 was implemented using A187. To segment thalamic nuclei for optic radiation tracking, the automated thalamic nuclei segmentation algorithm provided by Freesurfer 72 was implemented as A222. Finally, visual regions of interest binned by eccentricity were then generated using AFNI 157 functions implemented in A414. To assess the replicability capabilities of the platform, an automated hippocampal nuclei segmentation app (A262) was used to segment hippocampal subfields from participants within the UPENN-PMC dataset provided within the ASHS atlas.
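The midthickness computation described in the structural processing steps amounts to averaging vertex coordinates. The following is a minimal sketch of that averaging, assuming that the white and pial surfaces share the same vertex ordering; it is not the Workbench implementation, and the function name and the coordinates are hypothetical.

```python
import numpy as np

def midthickness(white_coords, pial_coords):
    """Average vertex-matched white and pial coordinates (N x 3 arrays, in mm)."""
    white = np.asarray(white_coords, dtype=float)
    pial = np.asarray(pial_coords, dtype=float)
    if white.shape != pial.shape:
        raise ValueError("white and pial surfaces must have matching vertices")
    return 0.5 * (white + pial)

# Two hypothetical vertices on each surface.
white = [[10.0, 20.0, 30.0], [11.0, 21.0, 31.0]]
pial = [[12.0, 22.0, 34.0], [13.0, 25.0, 33.0]]
mid = midthickness(white, pial)
print(mid)
```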
Diffusion (dMRI) processing.
Preprocessing & model fitting:
For a majority of the analyses involving the HCP dataset, the minimally-preprocessed dMRI images were used and thus no further preprocessing was performed. However, to assess the validity of the preprocessing pipeline, the unprocessed dMRI images from the HCP Test dataset were preprocessed following the protocol outlined in 158 using A68. The same app was also used for preprocessing the dMRI images for the ABCD, Cam-CAN, PING, Oxford University Choroideremia & Stargardt’s Disease Dataset, the Indiana University Acute Concussion, and HBN datasets. Specifically, dMRI images were denoised and cleaned from Gibbs ringing using functionality provided by MRTrix3 before being corrected for susceptibility, motion, and eddy distortions and artifacts using FSL’s topup and eddy functions 44,159. Eddy-current and motion correction was applied via the eddy_cuda8.0 command provided by FSL, with the replacement of outlier slices (i.e. repol) 160–163. Following these corrections, MRTrix3’s dwigradcheck functionality was used to check and correct for potentially misaligned gradient vectors following topup and eddy 164. Next, dMRI images were debiased using ANTs’ N4 functionality 165 and the background noise was cleaned using MRTrix3’s dwidenoise functionality 166. Finally, the preprocessed dMRI images were registered to the structural (T1w) image using FSL’s epi_reg functionality 167–169. Following preprocessing, brain masks for the dMRI data were generated using bet from FSL, implemented as A163.
DTI, NODDI, and q-sampling model fitting.
Following preprocessing, the diffusion tensor (DTI) model 170 and the neurite orientation dispersion and density imaging (NODDI) 171,172 models were subsequently fit to the preprocessed dMRI images for each participant using either A319 or A292 for DTI model fitting and A365 for NODDI fitting. Note, the NODDI model was only fit on the HCP, Cam-CAN, Oxford University Choroideremia & Stargardt’s Disease Dataset, and the Indiana University Acute Concussion datasets. For those datasets, the NODDI model was fit using an intrinsic free diffusivity parameter (d∥) of 1.7×10–3 mm2/s for white matter tract and network analyses, and a d∥ of 1.1×10–3mm2/s for cortical tissue mapping analyses, using AMICO’s implementation172 as A365. The constrained spherical deconvolution (CSD) (Tournier et al, 2007) model was then fit to the preprocessed dMRI data for each run across 4 spherical harmonic orders (i.e. Lmax) parameters (2,4,6,8) using functionality provided by MRTrix3 implemented as brainlife.io App A238. For the PING datasets, the CSD model was fit using the same exact code found in A238, but performed using the tractography App A319. For the HBN dataset, the isotropic spin distribution function was obtained by reconstructing the diffusion MRI data with the Generalized q-sampling imaging method 173 using functionality provided by DSI-Studio66 (A423). Quantitative anisotropy (QA) was then estimated from the isotropic spin distribution function.
Tractography.
Following model fitting, the fiber orientation distribution functions (fODFs) for Lmax=6 and Lmax=8 were subsequently used to guide anatomically-constrained probabilistic tractography (ACT; Smith et al, 2012) using functions provided by MRTrix3 implemented as brainlife.io App A297 or A319. For the HCPTR, HCPs1200, and Oxford University Choroideremia & Stargardt’s Disease datasets, Lmax=8 was used. For the ABCD and Cam-CAN datasets, Lmax=6 was used. For the HCP, ABCD, and Cam-CAN datasets, a total of 3 million streamlines were generated. For all datasets, a step-size of 0.2 mm was implemented. For the HCPTR, HCPs1200, ABCD, and Cam-CAN datasets, minimum and maximum lengths of streamlines were set at 25 and 250 mm respectively, and a maximum angle of curvature of 35° was used. For the PING dataset, minimum and maximum lengths of streamlines were set at 20 and 220 mm respectively, and a maximum angle of curvature of 35° was used.
White matter segmentation and cleaning.
Following tractography, 61 major white matter tracts were segmented for each run using a customized version of the white matter query language (Bullock et al, 2019) implemented as brainlife.io App A188. Outlier streamlines were subsequently removed using functionality provided by Vistasoft and implemented as brainlife.io App A195. Following cleaning, tract profiles with 200 nodes were generated for all DTI and NODDI measures across the 61 tracts for each participant and test-retest condition using functionality provided by Vistasoft and implemented as A361. Macrostructural statistics, including average tract length, tract volume, and streamline count, were computed using functionality provided by Vistasoft implemented as A189. Microstructural and macrostructural statistics were then compiled into a single data frame using A397.
Segmentation of the optic radiation (OR).
To generate optic radiations segmented by estimates of visual field eccentricity in the Oxford University Choroideremia & Stargardt’s Disease Dataset, ConTrack 111 tracking was implemented as A252. 500,000 sample streamlines were generated using a step size of 1mm. Samples were then pruned using inclusion and exclusion waypoint ROIs following methodologies outlined in 108,109.
Segmentation of uncinate fasciculus (UF).
To assess the relationship between Uncinate tract-average quantitative anisotropy (QA) and fractional anisotropy (FA) and Early Life Stressors within two independent datasets (Healthy Brain Network, ABCD), the tract-average QA for the Left and Right Uncinates were computed from 42 participants from the HBN and the tract-average FA were computed from 1107 participants from the ABCD dataset. For the HBN dataset, a full tractography segmentation pipeline was used to preprocess the dMRI data and segment the uncinate fasciculus using A423. Automatic fiber tracking was then performed to segment the uncinate fasciculus using default parameters and templates from a population tractography atlas from the Human Connectome Project 174. A threshold of 16 mm as the maximum allowed threshold for the shortest streamline distance was then applied to remove spurious streamlines. The whole tract average QA was then estimated. To probe stress exposure within the HBN dataset, we used the Negative Life Events Schedule (NLES), a 22-item questionnaire where participants were asked about the occurrence of different stressful life events. For the questions pertaining to early life stressors, the ABCD dataset was used. The tract-average FA for the Left and Right Uncinates were estimated using procedures described previously, then compared to the participant’s life stressors behavioral measures by fitting a linear regression to the data.
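The final regression step can be sketched as follows. This is a minimal illustration using ordinary least squares in NumPy; treating the stressor score as the predictor and the tract-average measure as the outcome is an assumption, and the function name and the noiseless toy values are hypothetical rather than data from HBN or ABCD.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit y = slope * x + intercept, plus the Pearson correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Hypothetical stressor scores and tract-average FA of one uncinate fasciculus.
stress_scores = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
uncinate_fa = 0.45 - 0.01 * stress_scores  # noiseless toy relationship

slope, intercept, r = fit_line(stress_scores, uncinate_fa)
print(slope, intercept, r)
```

With real data the fit would be run separately for the left and right uncinate and for each dataset.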
Structural networks:
Following tract segmentation, structural networks were generated using the multi-modal 180 cortical node atlas and the tractograms for each participant using MRTrix3’s tck2connectome175 functionality implemented as A395. Connectomes were generated by computing the number of streamlines intersecting each ROI pairing in the 180 cortical node parcellation. Multiple adjacency matrices were generated, including count, density (i.e. count divided by the node volume of the ROI pairs), length, length density (i.e. length divided by the volume of the ROI pairs), and average and average density AD, FA, MD, RD, NDI, ODI, and ISOVF. Density matrices were generated using the -invnodevol option176. For non-count measures (length, AD, FA, MD, RD, NDI, ODI, ISOVF), the average measure across all streamlines connecting an ROI pair was computed using MRTrix3’s tck2scale functionality using the -precise option177 and the -scale_file option in tck2connectome. These matrices can be thought of as the “average measure” adjacency matrices. These files were outputted as the ‘raw’ Datatype, and were converted to the conmat Datatype using A393. Connectivity matrices were then converted into the ‘network’ Datatype using Python functionality implemented as A335.
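The relationship between the count, density, and “average measure” matrices can be made concrete with a small sketch. It assumes that each streamline has already been assigned to a pair of nodes and that density divides the count by the summed volume of the two nodes; the exact normalization used by MRTrix3 may differ, and all names and numbers here are hypothetical.

```python
import numpy as np

def count_matrix(endpoint_nodes, n_nodes):
    """Symmetric streamline-count matrix from 0-based (i, j) node pairs."""
    counts = np.zeros((n_nodes, n_nodes))
    for i, j in endpoint_nodes:
        counts[i, j] += 1
        if i != j:
            counts[j, i] += 1
    return counts

def density_matrix(counts, node_volumes):
    """Counts divided by the summed volume of each node pair (assumed form)."""
    vol = np.asarray(node_volumes, dtype=float)
    pair_volume = vol[:, None] + vol[None, :]
    return counts / pair_volume

def mean_measure_matrix(endpoint_nodes, streamline_values, n_nodes):
    """Average of a per-streamline value (e.g. mean FA) for each node pair."""
    sums = np.zeros((n_nodes, n_nodes))
    counts = count_matrix(endpoint_nodes, n_nodes)
    for (i, j), value in zip(endpoint_nodes, streamline_values):
        sums[i, j] += value
        if i != j:
            sums[j, i] += value
    # Pairs without streamlines keep a value of zero.
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

# Three hypothetical streamlines in a three-node parcellation.
pairs = [(0, 1), (0, 1), (1, 2)]
mean_fa_per_streamline = [0.4, 0.6, 0.5]
volumes_mm3 = [100.0, 300.0, 200.0]

counts = count_matrix(pairs, 3)
density = density_matrix(counts, volumes_mm3)
mean_fa = mean_measure_matrix(pairs, mean_fa_per_streamline, 3)
print(counts, density, mean_fa, sep="\n")
```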
Cortical & subcortical diffusion & morphometry mapping.
For the PING, HCPTR, HCPs1200, Cam-CAN, and Indiana University Acute Concussion datasets, DTI and NODDI (if available) measures were mapped to each participant’s cortical white matter parcels following methods found in Fukutomi and colleagues using functions provided by Connectome Workbench93 implemented as brainlife.io App A379. A Gaussian smoothing kernel (FWHM = ~4mm, σ = 5/3mm) was applied along the axis normal to the midthickness surface, and DTI and NODDI measures were mapped using the wb_command -volume-to-surface-mapping function. Freesurfer was used to map the average DTI and NODDI measures within each parcel using functionality from Connectome Workbench using A389 and A483. Measures of volume, surface area, and cortical thickness for each cortical parcel were computed using Freesurfer and A464. Freesurfer was also used to generate parcel average DTI and NODDI measures for the subcortical segmentation (aseg) from Freesurfer using A383. Measures of volume for each subcortical parcel were computed using Freesurfer and A272.
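The two smoothing parameters quoted above are related by the standard Gaussian identity FWHM = 2·sqrt(2·ln 2)·σ. The short check below only verifies that σ = 5/3 mm corresponds to a FWHM of approximately 4 mm; the function name is illustrative.

```python
import math

def sigma_to_fwhm(sigma_mm):
    """Convert a Gaussian standard deviation to its full width at half maximum."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma_mm

fwhm = sigma_to_fwhm(5 / 3)
print(round(fwhm, 2))  # 3.92
```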
Resting-state Functional (rs-fMRI) preprocessing and functional connectivity matrix generation.
For the HCPTR and Cam-CAN datasets, unprocessed rs-fMRI datasets were preprocessed using fMRIPrep implemented as A160. Briefly, fMRIPrep does the following preprocessing steps. First, individual images are aligned to a reference image for motion estimation and correction using mcflirt from FSL. Next, slice timing correction is performed in which all slices are realigned in time to the middle of each TR using 3dTShift from AFNI. Spatial distortions are then corrected using field map estimations. Finally, the fMRI data is aligned to the structural T1w image for each participant. Default parameters provided by fMRIPrep were used. For a subset of analyses involving the HCP Test and Retest datasets, the preprocessed rs-fMRI datasets provided by the HCP consortium were used. Following preprocessing via fMRIPrep for the volume data, connectivity matrices were generated using the Yeo17 parcellation and A369 and A532. Within-network functional connectivity for the 17 canonical resting state networks was computed by computing the average functional connectivity values within all of the nodes belonging to a single network. These estimates were used for subsequent analyses.
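The within-network averaging can be sketched as follows. The sketch assumes a symmetric node-by-node functional connectivity matrix and one network label per node, and it averages the off-diagonal entries among nodes that share a label; excluding the diagonal is an assumption, and the function name and values are hypothetical.

```python
import numpy as np

def within_network_fc(fc, labels):
    """Mean off-diagonal connectivity among the nodes of each network."""
    fc = np.asarray(fc, dtype=float)
    labels = np.asarray(labels)
    result = {}
    for net in np.unique(labels):
        idx = np.flatnonzero(labels == net)
        if idx.size < 2:
            result[net.item()] = float("nan")  # no node pairs to average
            continue
        block = fc[np.ix_(idx, idx)]
        upper = block[np.triu_indices(idx.size, k=1)]
        result[net.item()] = float(upper.mean())
    return result

# Four hypothetical nodes assigned to two networks.
fc = np.array([
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.4],
    [0.2, 0.1, 0.4, 1.0],
])
labels = [1, 1, 2, 2]
within = within_network_fc(fc, labels)
print(within)
```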
Resting-state Functional (rs-fMRI) gradient processing.
Unprocessed rs-fMRI data from the HCP Test and Cam-CAN datasets were preprocessed using fMRIPrep implemented as A267. Within this App, the same preprocessing steps are undertaken as in A160, with an additional volume-to-surface mapping using mri_vol2surf from FreeSurfer. The surface-based outputs were then used to compute gradients following the methodology outlined in 96 for each participant in the HCPs1200, HCPTR, and Cam-CAN datasets, using A574 with diffusion embedding 178 and functions provided by BrainSpace 179. More specifically, connectivity matrices were computed from surface vertex values within each node of the Schaefer 1,000 parcellation 180. Cosine similarity was then computed to create an affinity matrix capturing inter-area similarity. Dimensionality reduction was then used to identify the primary gradients. A normalized-angle kernel was used to create the affinity matrix, from which two primary components were identified. Gradients were then aligned across all participants using a Procrustes alignment and joint embedding procedure 96. Values from the primary gradient and the cosine distance used to generate the affinity matrices were used for subsequent analyses.
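The affinity and dimensionality-reduction steps can be sketched in plain NumPy. This is an illustration under stated assumptions, not the implementation in A574 or BrainSpace. The normalized-angle affinity is taken to be 1 − arccos(cosine similarity)/π, and the embedding is a simplified diffusion-map-style eigendecomposition with no sparsification, no anisotropic normalization, and no cross-participant alignment. The toy data and function names are hypothetical.

```python
import numpy as np


def normalized_angle_affinity(conn: np.ndarray) -> np.ndarray:
    """Affinity between rows of `conn`: 1 - arccos(cosine similarity) / pi."""
    norms = np.linalg.norm(conn, axis=1, keepdims=True)
    unit = conn / norms
    cos_sim = np.clip(unit @ unit.T, -1.0, 1.0)  # guard against rounding error
    return 1.0 - np.arccos(cos_sim) / np.pi


def leading_gradients(affinity: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Simplified diffusion-map-style embedding of an affinity matrix.

    Eigendecomposes the symmetrically normalized affinity, converts the
    eigenvectors to those of the row-stochastic matrix D^-1 A, drops the
    trivial constant component, and scales the rest by their eigenvalues.
    """
    degree = affinity.sum(axis=1)
    sym = affinity / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(degree)[:, None]
    keep = slice(1, n_components + 1)            # skip the constant component
    return psi[:, keep] * eigvals[keep]


# Toy example: 20 parcels with 50 time points each (hypothetical data).
rng = np.random.default_rng(0)
parcel_ts = rng.standard_normal((20, 50))
conn = np.corrcoef(parcel_ts)                    # parcel-by-parcel connectivity
affinity = normalized_angle_affinity(conn)
gradients = leading_gradients(affinity, n_components=2)
print(gradients.shape)
```

In this sketch the first column of `gradients` plays the role of the primary gradient. Aligning gradients across participants would require an additional step that is not shown here.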
Magnetoencephalography (MEG) processing.
For some analyses, raw resting-state magnetoencephalography (rs-MEG) time series data from the Cam-CAN dataset were filtered using a Maxwell filter implemented as A476 and median split using A529. For the remainder of the analyses, filtered data provided by the Cam-CAN dataset were used. For all MEG data, power spectral density (PSD) profiles were estimated using functionality provided by MNE-Python 181 implemented as A530. Following PSD estimation, peak alpha frequency was estimated using A531. Finally, PSD profiles were averaged across all nodes within each of the canonical lobes (frontal, parietal, occipital, temporal) using A599. Measures of power spectral density and peak alpha frequency were used for all subsequent analyses.
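As a rough illustration of PSD estimation followed by peak alpha frequency extraction, the sketch below uses a Hann-windowed periodogram in plain NumPy rather than MNE-Python, and it is not the code of A530 or A531. The 8–13 Hz alpha band, the sampling rate, the synthetic signal, and the function names are all assumptions made for the example.

```python
import numpy as np


def periodogram_psd(signal: np.ndarray, sfreq: float):
    """One-sided power spectral density from a Hann-windowed periodogram."""
    n = signal.size
    window = np.hanning(n)
    spectrum = np.fft.rfft((signal - signal.mean()) * window)
    psd = np.abs(spectrum) ** 2 / (sfreq * np.sum(window ** 2))
    # Double every bin except DC (and Nyquist, when n is even) to fold in
    # the negative frequencies.
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / sfreq)
    return freqs, psd


def peak_alpha_frequency(freqs, psd, band=(8.0, 13.0)):
    """Frequency of maximum power inside the (assumed) alpha band."""
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(freqs[mask][np.argmax(psd[mask])])


# Synthetic 20 s channel sampled at 250 Hz: a 10 Hz rhythm plus noise.
sfreq = 250.0
t = np.arange(5000) / sfreq
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 10.0 * t) + 0.5 * rng.standard_normal(t.size)

freqs, psd = periodogram_psd(signal, sfreq)
paf = peak_alpha_frequency(freqs, psd)
print(f"peak alpha frequency: {paf:.2f} Hz")
```

Averaging across the sensors of a lobe would amount to stacking the per-channel `psd` arrays and taking their mean along the channel axis.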
ACKNOWLEDGMENTS.
The brainlife.io project, development, and operations were supported by awards to Franco Pestilli: U.S. National Science Foundation (NSF) awards 1916518, 1912270, 1636893, and 1734853; U.S. National Institutes of Health (NIH) awards R01MH126699, R01EB030896, and R01EB029272; the Wellcome Trust award 226486/Z/22/Z; a Microsoft Investigator Fellowship; and a gift from the Kavli Foundation. Additional funding supported data collection used by the team, research that used brainlife.io, or infrastructure that supported the platform: NSF awards 2004877 (Sophia Vinci-Booher), 1541335 and 2232628 (Shawn McKee), 1445604 and 2005506 (David Hancock), 1341698 (Michael Norman), 1928224 (Michael Norman), 1445606 (Shawn Brown), and 1928147 (Sergiu Sanielevici); NIH awards 1U54MH091657 (HCP data, PIs David Van Essen and Kamil Ugurbil), U01DA041048, U01DA050989, U01DA051016, U01DA041022, U01DA051018, U01DA051037, U01DA050987, U01DA041174, U01DA041106, U01DA041117, U01DA041028, U01DA041134, U01DA050988, U01DA051039, U01DA041156, U01DA041025, U01DA041120, U01DA051038, U01DA041148, U01DA041093, U01DA041089, U24DA041123, and U24DA041147 (ABCD Study, multiple PIs); and multiple philanthropic contributions to the HBN (Michael Milham).
Footnotes
Competing interests. The authors declare no competing financial interests.
CODE AVAILABILITY.
This article describes a total of 9 platform components. All components are publicly available as open source under the MIT License. All the software for the platform components is listed in Supplementary Table 1. In addition, we share the code used for the statistical analyses as Jupyter Notebooks (Supplementary Table 2). Finally, the Apps used and tested in this article are listed in Supplementary Table 3.
DATA AVAILABILITY.
All data derived and described in this paper are made available via the brainlife.io platform as “Publications”. User data agreements are required for some projects, such as those using data from the HCP, Cam-CAN, PING, ABCD, and HBN datasets. The Indiana University Acute Concussion Dataset and the Oxford University Choroideremia & Stargardt’s Disease Dataset are part of ongoing research projects and are not being released at this time. All other datasets are made freely available via the brainlife.io platform. See Supplementary Table 6 for the brainlife.io/pubs [we have added one example data record (https://doi.org/10.25663/brainlife.pub.40) for the review process <the DOIs for the remaining data records will be added at publication>].
REFERENCES
- 1. Poldrack R. A. et al. Scanning the horizon: towards transparent and reproducible neuroimaging research. Nat. Rev. Neurosci. 18, 115–126 (2017).
- 2. Birur B., Kraguljac N. V., Shelton R. C. & Lahti A. C. Brain structure, function, and neurochemistry in schizophrenia and bipolar disorder—a systematic review of the magnetic resonance neuroimaging literature. npj Schizophrenia 3, 1–15 (2017).
- 3. Pando-Naude V. et al. Gray and white matter morphology in substance use disorders: a neuroimaging systematic review and meta-analysis. Transl. Psychiatry 11, 29 (2021).
- 4. Ahmadzadeh M., Christie G. J., Cosco T. D. & Moreno S. Neuroimaging and analytical methods for studying the pathways from mild cognitive impairment to Alzheimer’s disease: protocol for a rapid systematic review. Systematic Reviews vol. 9 Preprint at 10.1186/s13643-020-01332-7 (2020).
- 5. Marek S. et al. Reproducible brain-wide association studies require thousands of individuals. Nature 603, 654–660 (2022).
- 6. Naselaris T., Allen E. & Kay K. Extensive sampling for complete models of individual brains. Current Opinion in Behavioral Sciences 40, 45–51 (2021).
- 7. Gau R. et al. Brainhack: Developing a culture of open, inclusive, community-driven neuroscience. Neuron 109, 1769–1775 (2021).
- 8. Thompson P. M. et al. ENIGMA and global neuroscience: A decade of large-scale studies of the brain in health and disease across more than 40 countries. Transl. Psychiatry 10, 100 (2020).
- 9. Wilkinson M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016).
- 10. Camerer C. F. et al. Evaluating replicability of laboratory experiments in economics. Science 351, 1433–1436 (2016).
- 11. Camerer C. F. et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat Hum Behav 2, 637–644 (2018).
- 12. Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, aac4716 (2015).
- 13. Hutson M. Artificial intelligence faces reproducibility crisis. Science 359, 725–726 (2018).
- 14. Klein R. A. et al. Many Labs 2: Investigating Variation in Replicability Across Samples and Settings. Advances in Methods and Practices in Psychological Science 1, 443–490 (2018).
- 15. Munafò M. R. et al. A manifesto for reproducible science. Nature Human Behaviour 1, 1–9 (2017).
- 16. Nissen S. B., Magidson T., Gross K. & Bergstrom C. T. Publication bias and the canonization of false facts. Elife 5, (2016).
- 17. Stupple A., Singerman D. & Celi L. A. The reproducibility crisis in the age of digital medicine. NPJ Digit Med 2, 2 (2019).
- 18. National Academies of Sciences, Engineering, and Medicine et al. Enhancing Scientific Reproducibility in Biomedical Research Through Transparent Reporting: Proceedings of a Workshop. (National Academies Press, 2020). doi: 10.17226/25627.
- 19. McNutt M. Reproducibility. Science 343, 229 (2014).
- 20. Stodden V. et al. Enhancing reproducibility for computational methods. Science 354, 1240–1241 (2016).
- 21. Nosek B. A. & Errington T. M. Making sense of replications. Elife 6, (2017).
- 22. McKinney S. M. et al. Reply to: Transparency and reproducibility in artificial intelligence. Nature 586, E17–E18 (2020).
- 23. Haibe-Kains B. et al. Transparency and reproducibility in artificial intelligence. Nature 586, E14–E16 (2020).
- 24. Button K. S. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376 (2013).
- 25. Van Essen D. C. et al. The Human Connectome Project: a data acquisition perspective. Neuroimage 62, 2222–2231 (2012).
- 26. Taylor J. R. et al. The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) data repository: Structural and functional MRI, MEG, and cognitive data from a cross-sectional adult lifespan sample. Neuroimage 144, 262–269 (2017).
- 27. Shafto M. A. et al. The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study protocol: a cross-sectional, lifespan, multidisciplinary examination of healthy cognitive ageing. BMC Neurol. 14, 204 (2014).
- 28. Casey B. J. et al. The Adolescent Brain Cognitive Development (ABCD) study: Imaging acquisition across 21 sites. Dev. Cogn. Neurosci. 32, 43–54 (2018).
- 29. Karcher N. R. & Barch D. M. The ABCD study: understanding the development of risk for mental and physical health outcomes. Neuropsychopharmacology 46, 131–142 (2021).
- 30. Sudlow C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
- 31. Alexander L. M. et al. An open resource for transdiagnostic research in pediatric mental health and learning disorders. Sci Data 4, 170181 (2017).
- 32. Jernigan T. L. et al. The Pediatric Imaging, Neurocognition, and Genetics (PING) Data Repository. Neuroimage 124, 1149–1154 (2016).
- 33. Allen E. J. et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat. Neurosci. 25, 116–126 (2022).
- 34. Markiewicz C. J. et al. The OpenNeuro resource for sharing of neuroscience data. Elife 10, (2021).
- 35. Milham M. P. et al. Assessment of the impact of shared brain imaging data on the scientific literature. Nat. Commun. 9, 2818 (2018).
- 36. Willemink M. J. et al. Preparing Medical Imaging Data for Machine Learning. Radiology 295, 4–15 (2020).
- 37. Zeng N., Zuo S., Zheng G., Ou Y. & Tong T. Editorial: Artificial Intelligence for Medical Image Analysis of Neuroimaging Data. Front. Neurosci. 14, 480 (2020).
- 38. NeuroImage. https://www.sciencedirect.com/journal/neuroimage/special-issue/101CWG577G3.
- 39. Poldrack R. A., Gorgolewski K. J. & Varoquaux G. Computational and Informatic Advances for Reproducible Data Analysis in Neuroimaging. Annu. Rev. Biomed. Data Sci. (2019) doi: 10.1146/annurev-biodatasci-072018-021237.
- 40. Niso G. et al. Open and reproducible neuroimaging: From study inception to publication. Neuroimage 263, 119623 (2022).
- 41. Gorgolewski K. J. et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci Data 3, 160044 (2016).
- 42. Jenkinson M., Beckmann C. F., Behrens T. E. J., Woolrich M. W. & Smith S. M. FSL. Neuroimage 62, 782–790 (2012).
- 43. Woolrich M. W. et al. Bayesian analysis of neuroimaging data in FSL. Neuroimage 45, S173–86 (2009).
- 44. Smith S. M. et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23 Suppl 1, S208–19 (2004).
- 45. Dale A., Fischl B. & Sereno M. I. Cortical Surface-Based Analysis: I. Segmentation and Surface Reconstruction. Neuroimage 9, 179–194 (1999).
- 46. Fischl B., Sereno M. I. & Dale A. Cortical Surface-Based Analysis: II: Inflation, Flattening, and a Surface-Based Coordinate System. Neuroimage 9, 195–207 (1999).
- 47. Desikan R. S. et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31, 968–980 (2006).
- 48. Fischl B., Liu A. & Dale A. M. Automated manifold surgery: constructing geometrically accurate and topologically correct models of the human cerebral cortex. IEEE Trans. Med. Imaging 20, 70–80 (2001).
- 49. Fischl B. et al. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron 33, 341–355 (2002).
- 50. Fischl B. et al. Sequence-independent segmentation of magnetic resonance images. Neuroimage 23, S69–S84 (2004).
- 51. Fischl B., Sereno M. I., Tootell R. B. H. & Dale A. M. High-resolution intersubject averaging and a coordinate system for the cortical surface. Hum. Brain Mapp. 8, 272–284 (1999).
- 52. Fischl B. et al. Automatically Parcellating the Human Cerebral Cortex. Cereb. Cortex 14, 11–22 (2004).
- 53. Han X. et al. Reliability of MRI-derived measurements of human cerebral cortical thickness: The effects of field strength, scanner upgrade and manufacturer. Neuroimage 32, 180–194 (2006).
- 54. Jovicich J. et al. Reliability in multi-site structural MRI studies: Effects of gradient non-linearity correction on phantom and human data. Neuroimage 30, 436–443 (2006).
- 55. Kuperberg G. R. et al. Regionally localized thinning of the cerebral cortex in Schizophrenia. Arch. Gen. Psychiatry 60, 878–888 (2003).
- 56. Reuter M., Schmansky N. J., Rosas H. D. & Fischl B. Within-Subject Template Estimation for Unbiased Longitudinal Image Analysis. Neuroimage 61, 1402–1418 (2012).
- 57. Reuter M. & Fischl B. Avoiding Asymmetry-Induced Bias in Longitudinal Image Processing. Neuroimage 57, 19–21 (2011).
- 58. Reuter M., Rosas H. D. & Fischl B. Highly Accurate Inverse Consistent Registration: A Robust Approach. Neuroimage 53, 1181–1196 (2010).
- 59. Salat D. et al. Thinning of the cerebral cortex in aging. Cereb. Cortex 14, 721–730 (2004).
- 60. Segonne F. et al. A hybrid approach to the skull stripping problem in MRI. Neuroimage 22, 1060–1075 (2004).
- 61. Segonne F., Pacheco J. & Fischl B. Geometrically accurate topology-correction of cortical surfaces using nonseparating loops. IEEE Trans. Med. Imaging 26, 518–529 (2007).
- 62. Brett M. et al. nipy/nibabel: (2022). doi: 10.5281/zenodo.6658382.
- 63. Tournier J.-D., Calamante F. & Connelly A. MRtrix: Diffusion tractography in crossing fiber regions. Int. J. Imaging Syst. Technol. 22, 53–66 (2012).
- 64. Tournier J.-D. et al. MRtrix3: A fast, flexible and open software framework for medical image processing and visualisation. Neuroimage 202, 116137 (2019).
- 65. Garyfallidis E. et al. Dipy, a library for the analysis of diffusion MRI data. Front. Neuroinform. 8, 8 (2014).
- 66. Yeh F.-C. Shape analysis of the human association pathways. Neuroimage 223, 117329 (2020).
- 67. Yeh F.-C. Population-based tract-to-region connectome of the human brain and its hierarchical topology. Nat. Commun. 13, 1–13 (2022).
- 68. Yeh F.-C. et al. Automatic Removal of False Connections in Diffusion MRI Tractography Using Topology-Informed Pruning (TIP). Neurotherapeutics 16, 52–58 (2019).
- 69. Esteban O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16, 111–116 (2019).
- 70. Cieslak M. et al. QSIPrep: an integrative platform for preprocessing and reconstructing diffusion MRI data. Nat. Methods 18, 775–778 (2021).
- 71. Esteban O. et al. MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites. PLoS One 12, e0184661 (2017).
- 72. Fischl B. FreeSurfer. Neuroimage 62, 774–781 (2012).
- 73. Adebimpe A. et al. ASLPrep: a platform for processing of arterial spin labeled MRI and quantification of regional brain perfusion. Nat. Methods 19, 683–686 (2022).
- 74. Halchenko Y. et al. DataLad: distributed system for joint management of code, data, and their relationship. J. Open Source Softw. 6, 3262 (2021).
- 75. Paret C. et al. Survey on Open Science Practices in Functional Neuroimaging. Neuroimage 257, 119306 (2022).
- 76. Maina M. B. et al. Two decades of neuroscience publication trends in Africa. Nat. Commun. 12, 3429 (2021).
- 77. Valenzuela-Toro A. M. & Viglino M. How Latin American researchers suffer in science. Nature Publishing Group UK (2021) doi: 10.1038/d41586-021-02601-8.
- 78. McKee S. et al. OSiRIS: a distributed Ceph deployment using software defined networking for multi-institutional research. J. Phys. Conf. Ser. 898, 062045 (2017).
- 79. Stanzione D. et al. Frontera: The Evolution of Leadership Computing at the National Science Foundation. in Practice and Experience in Advanced Research Computing 106–111 (Association for Computing Machinery, 2020). doi: 10.1145/3311790.3396656.
- 80. Stewart C. A. et al. Jetstream: A self-provisioned, scalable science and engineering cloud environment. (2015) doi: 10.1145/2792745.2792774.
- 81. Nooner K. B. et al. The NKI-Rockland Sample: A Model for Accelerating the Pace of Discovery Science in Psychiatry. Front. Neurosci. 6, 152 (2012).
- 82. Tobe R. H. et al. A longitudinal resource for studying connectome development and its psychiatric associations during childhood. Sci Data 9, 300 (2022).
- 83. Di Martino A. et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Mol. Psychiatry 19, 659–667 (2014).
- 84. Developers, S. SingularityCE 3.8.3. (2021). doi: 10.5281/zenodo.5564915.
- 85. Merkel D. Docker: lightweight Linux containers for consistent development and deployment. Linux J. 2014, 2 (2014).
- 86. Dean J. & Ghemawat S. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 107–113 (2008).
- 87. Kluyver T. et al. Jupyter Notebooks – a publishing format for reproducible computational workflows. in Positioning and Power in Academic Publishing: Players, Agents and Agendas (eds. Loizides F. & Schmidt B.) 87–90 (IOS Press, 2016). doi: 10.3233/978-1-61499-649-1-87.
- 88. Perez F. & Granger B. E. IPython: A System for Interactive Scientific Computing. Comput. Sci. Eng. 9, 21–29 (2007).
- 89. Wickham H. Tidy Data. J. Stat. Softw. 59, 1–23 (2014).
- 90. Avesani P. et al. The open diffusion data derivatives, brain data upcycling via integrated publishing of derivatives and reproducible open cloud services. Sci Data 6, 69 (2019).
- 91. National Academies of Sciences, Engineering et al. Understanding Reproducibility and Replicability. (National Academies Press (US), 2019).
- 92. Kelley T. L. Interpretation of educational measurements. 353, (1927).
- 93. Van Essen D. C. et al. The WU-Minn Human Connectome Project: an overview. Neuroimage 80, 62–79 (2013).
- 94. Yeo B. T. T. et al. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J. Neurophysiol. 106, 1125–1165 (2011).
- 95. Bethlehem R. A. I. et al. Dispersion of functional gradients across the adult lifespan. Neuroimage 222, 117299 (2020).
- 96. Margulies D. S. et al. Situating the default-mode network along a principal gradient of macroscale cortical organization. Proc. Natl. Acad. Sci. U. S. A. 113, 12574–12579 (2016).
- 97. Botvinik-Nezer R. et al. Variability in the analysis of a single neuroimaging dataset by many teams. Nature 582, 84–88 (2020).
- 98. Noble S., Scheinost D. & Constable R. T. A decade of test-retest reliability of functional connectivity: A systematic review and meta-analysis. Neuroimage 203, 116157 (2019).
- 99. McPherson B. C. & Pestilli F. A single mode of population covariation associates brain networks structure and behavior and predicts individual subjects’ age. Commun Biol 4, 943 (2021).
- 100. Betzel R. F. et al. Changes in structural and functional connectivity among resting-state networks across the human lifespan. Neuroimage 102 Pt 2, 345–357 (2014).
- 101. López-Vicente M. et al. White matter microstructure correlates of age, sex, handedness and motor ability in a population-based sample of 3031 school-age children. Neuroimage 227, 117643 (2021).
- 102. Lebel C. & Beaulieu C. Longitudinal development of human brain wiring continues from childhood into adulthood. J. Neurosci. 31, 10937–10947 (2011).
- 103. Bethlehem R. A. I. et al. Brain charts for the human lifespan. Nature 604, 525–533 (2022).
- 104. Fukutomi H. et al. Neurite imaging reveals microstructural variations in human cerebral cortical gray matter. Neuroimage 182, 488–499 (2018).
- 105. Hanson J. L., Knodt A. R., Brigidi B. D. & Hariri A. R. Lower structural integrity of the uncinate fasciculus is associated with a history of child maltreatment and future psychological vulnerability to stress. Dev. Psychopathol. 27, 1611–1619 (2015).
- 106. McKee A. C., Daneshvar D. H., Alvarez V. E. & Stein T. D. The neuropathology of sport. Acta Neuropathol. 127, 29–51 (2014).
- 107. Hanekamp S. et al. White matter alterations in glaucoma and monocular blindness differ outside the visual system. Sci. Rep. 11, 6866 (2021).
- 108. Yoshimine S. et al. Age-related macular degeneration affects the optic radiation white matter projecting to locations of retinal damage. Brain Struct. Funct. 223, 3889–3900 (2018).
- 109. Ogawa S. et al. White matter consequences of retinal receptor and ganglion cell damage. Invest. Ophthalmol. Vis. Sci. 55, 6976–6986 (2014).
- 110. Malania M., Konrad J., Jägle H., Werner J. S. & Greenlee M. W. Compromised Integrity of Central Visual Pathways in Patients With Macular Degeneration. Invest. Ophthalmol. Vis. Sci. 58, 2939–2947 (2017).
- 111. Sherbondy A. J., Dougherty R. F., Ben-Shachar M., Napel S. & Wandell B. A. ConTrack: finding the most likely pathways between brain regions using diffusion tractography. J. Vis. 8, 15.1–16 (2008).
- 112. Yeatman J. D., Dougherty R. F., Myall N. J., Wandell B. A. & Feldman H. M. Tract profiles of white matter properties: automating fiber-tract quantification. PLoS One 7, e49790 (2012).
- 113. Aydogan D. B. & Shi Y. Parallel Transport Tractography. IEEE Trans. Med. Imaging 40, 635–647 (2021).
- 114. Baran D. & Shi Y. A novel fiber-tracking algorithm using parallel transport frames. in ISMRM (unknown, 2019).
- 115. Yanni S. E. et al. Normative reference ranges for the retinal nerve fiber layer, macula, and retinal layer thicknesses in children. Am. J. Ophthalmol. 155, 354–360.e1 (2013).
- 116. Esteban O. et al. Crowdsourced MRI quality metrics and expert quality annotations for training of humans and machines. Sci Data 6, 30 (2019).
- 117. Calhoun V. D., Liu J. & Adali T. A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. Neuroimage 45, S163–72 (2009).
- 118. Hériché J.-K., Alexander S. & Ellenberg J. Integrating Imaging and Omics: Computational Methods and Challenges. Annu. Rev. Biomed. Data Sci. 2, 175–197 (2019).
- 119. Shen L. & Thompson P. M. Brain Imaging Genomics: Integrated Analysis and Machine Learning. Proc. IEEE Inst. Electr. Electron. Eng. 108, 125–162 (2020).
- 120. Deslauriers-Gauthier S. et al. White matter information flow mapping from diffusion MRI and EEG. Neuroimage 201, 116017 (2019).
- 121. Wirsich J., Amico E., Giraud A.-L., Goñi J. & Sadaghiani S. Multi-timescale hybrid components of the functional brain connectome: A bimodal EEG-fMRI decomposition. Netw Neurosci 4, 658–677 (2020).
- 122. Engemann D. A. et al. Combining magnetoencephalography with magnetic resonance imaging enhances learning of surrogate-biomarkers. Elife 9, (2020).
- 123. Eke D. O. et al. International data governance for neuroscience. Neuron (2021) doi: 10.1016/j.neuron.2021.11.017.
- 124. Eke D. et al. Pseudonymisation of neuroimages and data protection: Increasing access to data while retaining scientific utility. Neuroimage: Reports 1, 100053 (2021).
- 125. Kozlov M. NIH issues a seismic mandate: share data publicly. Nature Publishing Group UK (2022) doi: 10.1038/d41586-022-00402-1.
- 126. NOT-OD-21-013: Final NIH Policy for Data Management and Sharing. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html.
- 127. Data Gravity – in the Clouds. Data Gravitas https://datagravitas.com/2010/12/07/data-gravity-in-the-clouds/ (2010).
- 128. Biswal B. B. et al. Toward discovery science of human brain function. Proc. Natl. Acad. Sci. U. S. A. 107, 4734–4739 (2010).
- 129. Milham M. P. Open neuroscience solutions for the connectome-wide association era. Neuron 73, 214–218 (2012).
- 130. Halchenko Y. et al. dandi/dandi-cli: 0.46.2. (2022). doi: 10.5281/zenodo.7041535.
- 131. Hider R. Jr et al. The Brain Observatory Storage Service and Database (BossDB): A Cloud-Native Approach for Petascale Neuroscience Discovery. Front. Neuroinform. 16, 828787 (2022).
- 132. Kennedy D. N., Haselgrove C., Riehl J., Preuss N. & Buccigrossi R. The NITRC image repository. Neuroimage 124, 1069–1073 (2016).
- 133. Koike S. et al. Brain/MINDS beyond human brain MRI project: A protocol for multi-level harmonization across brain disorders throughout the lifespan. Neuroimage Clin 30, 102600 (2021).
- 134. Das S., Zijdenbos A. P., Harlap J., Vins D. & Evans A. C. LORIS: a web-based data management system for multi-center studies. Front. Neuroinform. 5, 37 (2011).
- 135. Dockès J. et al. NeuroQuery, comprehensive meta-analysis of human brain mapping. Elife 9, (2020).
- 136. de la Vega A. et al. Neuroscout, a unified platform for generalizable and reproducible fMRI research. Elife 11, (2022).
- 137. Sherif T. et al. CBRAIN: a web-based, distributed computing platform for collaborative neuroimaging research. Front. Neuroinform. 8, 54 (2014).
- 138. Renton A. I. et al. Neurodesk: An accessible, flexible, and portable data analysis environment for reproducible neuroimaging. bioRxiv 2022.12.23.521691 (2022) doi: 10.1101/2022.12.23.521691.
- 139. Marcus D. S., Olsen T. R., Ramaratnam M. & Buckner R. L. The Extensible Neuroimaging Archive Toolkit: an informatics platform for managing, exploring, and sharing neuroimaging data. Neuroinformatics 5, 11–34 (2007).
- 140. Delorme A. et al. NEMAR: an open access data, tools and compute resource operating on neuroelectromagnetic data. Database 2022, (2022).
- 141. Schirner M. et al. Brain simulation as a cloud service: The Virtual Brain on EBRAINS. Neuroimage 251, 118973 (2022).
- 142. Rex D. E., Ma J. Q. & Toga A. W. The LONI Pipeline Processing Environment. Neuroimage 19, 1033–1048 (2003).
- 143. Elam J. S. et al. The Human Connectome Project: A Retrospective. Neuroimage 118543 (2021) doi: 10.1016/j.neuroimage.2021.118543.
- 144. International Brain Laboratory et al. A modular architecture for organizing, processing and sharing neurophysiology data. Nat. Methods (2023) doi: 10.1038/s41592-022-01742-6.
- 145. Plis S. M. et al. COINSTAC: A Privacy Enabled Model and Prototype for Leveraging and Processing Decentralized Brain Imaging Data. Front. Neurosci. 10, 365 (2016).
- 146. Harding R. J., Bermudez P., Beauvais M., Bellec P. & Evans A. C. The Canadian Open Neuroscience Platform – An Open Science Framework for the Neuroscience Community. (2022) doi: 10.31219/osf.io/eh349.
- 147. Musen M. A. et al. The center for expanded data annotation and retrieval. J. Am. Med. Inform. Assoc. 22, 1148–1152 (2015).
- 148. Maumet C. et al. Sharing brain mapping statistical results with the neuroimaging data model. Sci Data 3, 160102 (2016).
- 149. Van Essen D. C. et al. The Human Connectome Project: a data acquisition perspective. Neuroimage 62, 2222–2231 (2012).
- 150. Glasser M. F. et al. The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage 80, 105–124 (2013).
- 151. Caron B. et al. Collegiate athlete brain data for white matter mapping and network neuroscience. Sci Data 8, 56 (2021).
- 152. Jernigan T. L. et al. The Pediatric Imaging, Neurocognition, and Genetics (PING) Data Repository. Neuroimage 124, 1149–1154 (2016).
- 153. Yushkevich P. A. et al. Quantitative comparison of 21 protocols for labeling hippocampal subfields and parahippocampal subregions in in vivo MRI: towards a harmonized segmentation protocol. Neuroimage 111, 526–541 (2015).
- 154.Yushkevich P. A. et al. Automated volumetry and regional thickness analysis of hippocampal subfields and medial temporal cortical structures in mild cognitive impairment. Hum. Brain Mapp. 36, 258–287 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 155.Benson N. C. et al. The retinotopic organization of striate cortex is well predicted by surface topology. Curr. Biol. 22, 2081–2085 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 156.Benson N. C., Butt O. H., Brainard D. H. & Aguirre G. K. Correction of distortion in flattened representations of the cortical surface allows prediction of V1-V3 functional organization from anatomy. PLoS Comput. Biol. 10, e1003538 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 157.Cox R. W. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput. Biomed. Res. 29, 162–173 (1996). [DOI] [PubMed] [Google Scholar]
- 158.Ades-Aron B. et al. Evaluation of the accuracy and precision of the diffusion parameter EStImation with Gibbs and NoisE removal pipeline. Neuroimage 183, 532–543 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 159.Andersson J. L. R., Skare S. & Ashburner J. How to correct susceptibility distortions in spin-echo echo-planar images: application to diffusion tensor imaging. Neuroimage 20, 870–888 (2003). [DOI] [PubMed] [Google Scholar]
- 160.Andersson J. L. R. & Sotiropoulos S. N. An integrated approach to correction for off-resonance effects and subject movement in diffusion MR imaging. Neuroimage 125, 1063–1078 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 161.Andersson J. L. R., Graham M. S., Zsoldos E. & Sotiropoulos S. N. Incorporating outlier detection and replacement into a non-parametric framework for movement and distortion correction of diffusion MR images. Neuroimage 141, 556–572 (2016). [DOI] [PubMed] [Google Scholar]
- 162.Andersson J. L. R., Graham M. S., Drobnjak I., Zhang H. & Campbell J. Susceptibility-induced distortion that varies due to motion: Correction in diffusion MR without acquiring additional data. Neuroimage 171, 277–295 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 163.Andersson J. L. R. et al. Towards a comprehensive framework for movement and distortion correction of diffusion MR images: Within volume movement. Neuroimage 152, 450–466 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 164.Jeurissen B., Leemans A. & Sijbers J. Automated correction of improperly rotated diffusion gradient orientations in diffusion weighted MRI. Med. Image Anal. 18, 953–962 (2014). [DOI] [PubMed] [Google Scholar]
- 165.Tustison N. J. et al. Large-scale evaluation of ANTs and FreeSurfer cortical thickness measurements. Neuroimage 99, 166–179 (2014). [DOI] [PubMed] [Google Scholar]
- 166.Veraart J. et al. Denoising of diffusion MRI using random matrix theory. Neuroimage 142, 394–406 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 167.Jenkinson M. & Smith S. A global optimisation method for robust affine registration of brain images. Med. Image Anal. 5, 143–156 (2001). [DOI] [PubMed] [Google Scholar]
- 168.Jenkinson M., Bannister P., Brady M. & Smith S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage 17, 825–841 (2002). [DOI] [PubMed] [Google Scholar]
- 169.Greve D. N. & Fischl B. Accurate and robust brain image alignment using boundary-based registration. Neuroimage 48, 63–72 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 170.Pierpaoli C., Jezzard P., Basser P. J., Barnett A. & Di Chiro G. Diffusion tensor MR imaging of the human brain. Radiology 201, 637–648 (1996). [DOI] [PubMed] [Google Scholar]
- 171.Zhang H., Schneider T., Wheeler-Kingshott C. A. & Alexander D. C. NODDI: practical in vivo neurite orientation dispersion and density imaging of the human brain. Neuroimage 61, 1000–1016 (2012). [DOI] [PubMed] [Google Scholar]
- 172.Daducci A. et al. Accelerated Microstructure Imaging via Convex Optimization (AMICO) from diffusion MRI data. Neuroimage 105, 32–44 (2015). [DOI] [PubMed] [Google Scholar]
- 173.Yeh F.-C., Wedeen V. J. & Tseng W.-Y. I. Generalized q-sampling imaging. IEEE Trans. Med. Imaging 29, 1626–1635 (2010). [DOI] [PubMed] [Google Scholar]
- 174.Yeh F.-C. et al. Population-averaged atlas of the macroscale human structural connectome and its network topology. Neuroimage 178, 57–68 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 175.Smith R. E., Tournier J.-D., Calamante F. & Connelly A. The effects of SIFT on the reproducibility and biological accuracy of the structural connectome. Neuroimage 104, 253–265 (2015). [DOI] [PubMed] [Google Scholar]
- 176.Hagmann P. et al. Mapping the structural core of human cerebral cortex. PLoS Biol. 6, e159 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 177.Smith R. E., Tournier J.-D., Calamante F. & Connelly A. SIFT: Spherical-deconvolution informed filtering of tractograms. Neuroimage 67, 298–312 (2013). [DOI] [PubMed] [Google Scholar]
- 178.Coifman R. R. et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc. Natl. Acad. Sci. U. S. A. 102, 7426–7431 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 179.Vos de Wael R. et al. BrainSpace: a toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Commun Biol 3, 103 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 180.Schaefer A. et al. Local-Global Parcellation of the Human Cerebral Cortex from Intrinsic Functional Connectivity MRI. Cereb. Cortex 28, 3095–3114 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 181.Gramfort A. et al. MEG and EEG data analysis with MNE-Python. Front. Neurosci. 7, 267 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
All data derived and described in this paper are available via the brainlife.io platform as “Publications”. User data agreements are required for some projects, such as those using data from the HCP, Cam-CAN, PING, ABCD, and HBN datasets. The Indiana University Acute Concussion Dataset and the Oxford University Choroideremia & Stargardt’s Disease Dataset are part of ongoing research projects and are not being released at this time. All other datasets are freely available via the brainlife.io platform. See Supplementary Table 6 for the brainlife.io/pubs data records [one example data record (https://doi.org/10.25663/brainlife.pub.40) has been added for the review process; the DOIs for the remaining data records will be added at publication].