Tractography is a fundamental neuroimaging technique used to map and analyze white matter tracts, offering valuable insights into the brain’s structural connectivity. By reconstructing axonal fiber pathways, tractography enables researchers to investigate the complex networks underlying cognitive functions and behaviors1. Over the past two decades, it has evolved from a novel method to a widely used tool in neuroscience, neurology, and psychiatry. This technique has substantially deepened our understanding of brain connectivity, elucidating processes related to development, neural plasticity, and the pathology of various neurological disorders.
Several challenges in tractography data processing hinder open and reproducible science (Fig. 1a). While no single solution fully resolves these issues, structured data sharing and standardized preprocessing can streamline workflows2,3. With the advent of high-angular resolution and multi-shell imaging, most studies now rely on over a hundred diffusion samplings, resulting in compressed datasets between 50 MB and 500 MB. The substantial size of raw diffusion-weighted images complicates data sharing, while managing these datasets requires significant storage, computational power, and processing time. Furthermore, diffusion MRI acquisition standards vary widely in gradient tables, phase encoding, and other protocol-specific parameters, adding complexity to preprocessing and requiring expertise in MR physics and signal processing. Each dataset often necessitates customized preprocessing, making it prone to errors. Even with the adoption of standardized formats like BIDS (Brain Imaging Data Structure)3, inconsistencies remain, such as varied naming conventions for b-table files or inconsistent placement of reverse phase-encoded images, leading researchers to spend substantial time reorganizing data before analysis. Preprocessing steps like motion and distortion correction demand considerable computational resources, with each scan typically requiring hours on a computing cluster. For studies with thousands of scans, this processing burden scales quickly, creating a substantial resource demand that many research groups find challenging to meet. Compounding this, these efforts are redundantly repeated across groups as each team independently manages formatting, storage, and preprocessing, leading to wasted storage, bandwidth, and computational resources that slow the pace of brain research.
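The scaling of this burden can be made concrete with a back-of-envelope estimate. The sketch below uses the 50 to 500 MB per-scan range quoted above; the study size (10,000 scans) and the preprocessing time (2 hours per scan) are illustrative assumptions, not figures reported here.

```python
def raw_burden(n_scans, mb_per_scan_range=(50, 500), hours_per_scan=2.0):
    """Estimate raw storage (GB) and preprocessing time (hours) for a study.

    mb_per_scan_range follows the 50-500 MB range quoted in the text;
    hours_per_scan is an assumed value standing in for "hours per scan".
    """
    lo_mb, hi_mb = mb_per_scan_range
    storage_gb = (n_scans * lo_mb / 1000, n_scans * hi_mb / 1000)
    compute_hours = n_scans * hours_per_scan
    return storage_gb, compute_hours


# Hypothetical study of 10,000 scans
storage_gb, compute_hours = raw_burden(n_scans=10_000)
print(f"storage: {storage_gb[0]:.0f}-{storage_gb[1]:.0f} GB, "
      f"compute: {compute_hours:.0f} h")
# prints "storage: 500-5000 GB, compute: 20000 h"
```

Under these assumptions, a single group would hold between 0.5 and 5 TB of raw data and spend roughly 20,000 compute hours, and every other group working on the same study would repeat that cost.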
Figure 1. Tractography data processing with DSI Studio and the Fiber Data Hub workflow. (a) Traditional diffusion MRI processing is resource-intensive, requiring significant storage, bandwidth, and computation. Each diffusion-weighted scan can range from 50 to 500 MB, creating high download demands. Pipeline setup involves managing reversed phase-encoded images and resolving b-table inconsistencies, both of which are prone to error. Distortion correction alone can take hours per scan on high-performance systems, contributing to considerable storage, processing, and bandwidth requirements that hinder research scalability and slow progress. (b) DSI Studio’s integration with an independent cloud-based Fiber Data Hub addresses these challenges by offering a streamlined workflow. Public diffusion MRI datasets from repositories such as NDA, NITRC, OpenNeuro, and INDI are preprocessed on workstations, clusters, or cloud platforms to ensure quality-controlled, ready-to-use fiber data. These data are stored in a decentralized hub, hosting major studies like the HCP Lifespan Studies, ABCD Study, and OpenNeuro datasets. DSI Studio users can directly access the hub for file searches, quality control, tract analysis, batch downloads, and data summaries. By standardizing data formatting and compressing fiber data to under 10 MB per scan, the hub significantly reduces storage, bandwidth, and computational demands. This integrated, cloud-enabled workflow eliminates the need for manual data downloads and preprocessing, allowing researchers to begin fiber tracking and connectome analysis immediately, thereby accelerating neuroscience research across both individual and group-level studies.
The latest release of DSI Studio, now integrated with a cloud-based “Fiber Data Hub,” tackles core challenges in tractography by providing preprocessed datasets that allow researchers to bypass tedious preprocessing and begin directly with fiber orientation data and diffusion metrics (Fig. 1b). The hub hosts over 37,000 preprocessed fiber datasets from major repositories (Supplementary Fig. 1), including the Human Connectome Project (HCP) Lifespan Studies4, the Adolescent Brain Cognitive Development (ABCD) Study5, and OpenNeuro6, with ongoing expansions as new datasets become available. Each dataset has undergone preprocessing (Supplementary Methods) using high-performance computing resources to generate quality-controlled fiber data, stored under BIDS-compatible filenames for immediate tractography and connectome analysis. The preprocessed data, including local fiber orientation and diffusion metrics, are reduced in size by approximately 100x compared to raw diffusion-weighted imaging (DWI) data. This compression is achieved through 8-bit storage conversion, background masking, and selective data retention, allowing for compact storage while maintaining compatibility with tractography analysis (Supplementary Figs. 2 and 3). This approach significantly reduces storage requirements, download times, and preprocessing efforts, while addressing common diffusion MRI challenges, such as variability in gradient tables, phase encoding, and file structures, thereby streamlining workflows for researchers. More importantly, the Fiber Data Hub is accessible via direct download links, web portals (e.g., https://brain.labsolver.org), or a REST (Representational State Transfer) API, enabling potential integration with tools such as MRtrix7, DIPY8, TRACULA9, SlicerDMRI10, or cloud-computing platforms such as brainlife.io2 and NiiVue. Researchers can leverage these tools alongside the Fiber Data Hub for quality control, denoising, Gibbs ringing correction, tract analysis, and data harmonization, enhancing accessibility, consistency, and efficiency in tractography research.
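To illustrate how 8-bit storage conversion and background masking reduce file size, the following is a minimal sketch in Python with NumPy. It is not the hub's actual file format or DSI Studio's implementation: the function names, the linear min-max quantization, and the toy cubic mask are all assumptions made for illustration.

```python
import numpy as np


def compress_metric(volume, mask):
    """Keep only in-mask voxels and quantize them linearly to 8 bits.

    Hypothetical helper: linear min-max quantization is an assumed scheme.
    """
    values = volume[mask].astype(np.float32)
    vmin, vmax = float(values.min()), float(values.max())
    scale = (vmax - vmin) / 255.0 if vmax > vmin else 1.0
    codes = np.clip(np.round((values - vmin) / scale), 0, 255).astype(np.uint8)
    return codes, vmin, scale


def decompress_metric(codes, vmin, scale, mask):
    """Rebuild a full volume; background voxels are set to zero."""
    volume = np.zeros(mask.shape, dtype=np.float32)
    volume[mask] = codes.astype(np.float32) * scale + vmin
    return volume


# Toy data: a 32^3 float32 metric map with a 16^3 "brain" mask
rng = np.random.default_rng(0)
volume = rng.random((32, 32, 32)).astype(np.float32)
mask = np.zeros(volume.shape, dtype=bool)
mask[8:24, 8:24, 8:24] = True

codes, vmin, scale = compress_metric(volume, mask)
restored = decompress_metric(codes, vmin, scale, mask)

ratio = volume.nbytes / codes.nbytes            # voxel payload only
max_err = float(np.abs(restored[mask] - volume[mask]).max())
print(f"payload reduced {ratio:.0f}x, max error {max_err:.4f}")
```

In this toy case the voxel payload shrinks from 131,072 to 4,096 bytes, with a quantization error bounded by about half a quantization step. The mask and the two scaling constants must also be stored, so the real saving is somewhat smaller than the payload ratio suggests.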
Besides the Fiber Data Hub, DSI Studio provides two unique tractography modalities, differential tractography and correlational tractography, which extend beyond conventional mapping to assess dynamic changes in fiber integrity and structure-function relationships (Supplementary Methods). Differential tractography focuses on detecting changes in neuronal integrity rather than merely mapping the presence of tracts. This approach is especially valuable for tracking disease progression or recovery, as it allows fiber integrity to be compared between scans, for example at baseline and follow-up. Correlational tractography maps neuronal pathways that correlate with specific study variables, facilitating investigations into structure-function relationships within group studies. This method enables researchers to identify fiber pathways whose characteristics are statistically linked to behavioral or clinical measures, offering novel insights into brain function.
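The two comparisons underlying these modalities can be sketched on plain arrays. The example below is a simplified voxel-wise illustration, not DSI Studio's algorithm: the actual methods (Supplementary Methods) track along fiber pathways. The function names, the 20% decrease threshold, and the toy values are assumptions.

```python
import numpy as np


def relative_decrease(baseline, followup, eps=1e-6):
    """Relative drop of a diffusion metric between two scans (differential idea)."""
    return (baseline - followup) / np.maximum(baseline, eps)


def voxelwise_correlation(metric, variable):
    """Pearson r between a metric and a study variable (correlational idea).

    metric: array of shape (n_subjects, n_voxels)
    variable: array of shape (n_subjects,)
    """
    m = metric - metric.mean(axis=0)
    v = variable - variable.mean()
    denom = np.sqrt((m ** 2).sum(axis=0) * (v ** 2).sum())
    # Constant voxels have zero variance; report r = 0 for them.
    return (m * v[:, None]).sum(axis=0) / np.where(denom > 0, denom, np.inf)


# Differential: flag voxels whose metric dropped by more than an assumed 20%
baseline = np.array([0.5, 0.4, 0.6])
followup = np.array([0.5, 0.3, 0.3])
changed = relative_decrease(baseline, followup) > 0.2   # [False, True, True]

# Correlational: 4 subjects, 2 voxels, one rising and one falling with the variable
variable = np.array([1.0, 2.0, 3.0, 4.0])
metric = np.column_stack([2 * variable + 1, -variable])
r = voxelwise_correlation(metric, variable)             # approximately [1, -1]
```

In a tractography setting, such maps would act as the criterion that decides where tracking proceeds, so that the resulting tracts follow only the segments showing change or association.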
DSI Studio’s integration of advanced tools and datasets at the Fiber Data Hub opens new pathways in neuroscience research. Traditionally, data acquisition, preprocessing, and analysis were managed separately, requiring researchers to handle each step independently, often resulting in duplicated efforts and higher error potential. In contrast, DSI Studio’s integrated platform unifies these elements, providing immediate access to high-quality preprocessed data and advanced analysis tools within a standardized, collaborative framework. As more researchers adopt integrated platforms, the field of neuroscience stands to gain in efficiency, reproducibility, and collaboration, ultimately accelerating the pace of discovery.
Code Availability Statement
The source code for DSI Studio is publicly available at https://github.com/frankyeh/DSI-Studio/. This repository contains all components of DSI Studio, including the core algorithms for diffusion modeling, fiber tracking, and connectometry analysis. In addition, the Template Image Processing Library (TIPL), an essential dependency of the software, is available as open-source code. TIPL provides optimized routines for image processing tasks such as tensor computation and transformation, facilitating high-performance diffusion MRI analysis. TIPL can be accessed at https://github.com/frankyeh/TIPL.
DSI Studio is supported by an active user community, including researchers, clinicians, and students who contribute to its ongoing development through discussions, feedback, and feature requests. User support and assistance for both DSI Studio and the Fiber Data Hub are available through the public community forum (https://groups.google.com/g/dsi-studio), where users can report issues, ask questions, and share insights on best practices. This open forum fosters collaboration and ensures that users have access to collective knowledge and troubleshooting resources.
For technical documentation and integration with external tools, the DSI Studio website (https://dsi-studio.labsolver.org) provides guidelines for reading and writing DSI Studio-specific formats, helping researchers integrate their workflows with other neuroimaging platforms. Additionally, the GitHub repository (https://github.com/frankyeh/DSI-Studio) offers access to the latest updates and source code, ensuring transparency and allowing users to track ongoing improvements.
Data Availability Statement
All 37,486 preprocessed brain fiber datasets in this study are publicly available on the Fiber Data Hub, accessible through an integrated GUI in DSI Studio that allows users to download, inspect, and analyze data seamlessly. Alternatively, the hub includes multiple independent decentralized storage locations on GitHub repositories that can be directly accessed or expanded as needed. A web portal is available at https://brain.labsolver.org for convenient access to the hub’s resources outside DSI Studio. These datasets include derived fiber data from major studies such as the HCP Lifespan Project, ABCD study, OpenNeuro repositories, and other studies detailed in the Methods section. The datasets provide diffusion metrics and voxel-level fiber orientations, ready for direct analysis. Each dataset on the data hub is accessible via HTTPS links, enabling direct downloads without the need to use the DSI Studio interface.
The redistribution of datasets follows the agreements of the source studies:
HCP Lifespan and ABCD Studies: The derived fiber data were shared under an agreement with the NIMH Data Archive (NDA). The redistribution of fiber data was confirmed with the NDA Help Desk.
OpenNeuro Repositories: The derived fiber data are shared under the same CC0 license as the source datasets.
Other studies (CamCAN, HBN, NKI-Rockland, etc.): The derived fiber data are distributed in accordance with the original agreements of each dataset, allowing for public sharing of derived data.
This data hub provides researchers with open access to high-quality, preprocessed brain fiber data, supporting diverse scientific inquiries across clinical and developmental neuroscience.
Acknowledgments
The author was supported by NIH grant R01 NS120954. During the preparation of this work, the author used ChatGPT 4o (OpenAI) to revise the manuscript. After using this tool/service, the author reviewed and edited the content as needed and took full responsibility for the content of the publication.
Competing Interests
The author declares no competing financial or non-financial interests.
References
- 1. Jbabdi S & Johansen-Berg H. Brain Connectivity 1, 169–183 (2011).
- 2. Hayashi S et al. Nat Methods 21, 809–813 (2024).
- 3. Gorgolewski KJ et al. Scientific Data 3, 1–9 (2016).
- 4. Somerville LH et al. Neuroimage 183, 456–468 (2018).
- 5. Casey BJ et al. Dev Cogn Neurosci 32, 43–54 (2018).
- 6. Markiewicz CJ et al. Elife 10, e71774 (2021).
- 7. Tournier JD et al. Neuroimage 202, 116137 (2019).
- 8. Garyfallidis E et al. Front Neuroinform 8, 8 (2014).
- 9. Yendiki A et al. Front Neuroinform 5, 23 (2011).
- 10. Zhang F et al. JCO Clin Cancer Inform 4, 299–309 (2020).
