ABSTRACT
The accurate estimation of fiber orientation distribution functions (fODFs) in diffusion magnetic resonance imaging (MRI) is crucial for understanding early brain development and its potential disruptions. Although supervised deep learning (DL) models have shown promise in fODF estimation from neonatal diffusion MRI (dMRI) data, the out‐of‐domain (OOD) performance of these models remains largely unexplored, especially under diverse domain shift scenarios. This study evaluated the robustness of three state‐of‐the‐art DL architectures for fODF prediction from dMRI data: a multilayer perceptron (MLP), a transformer, and a U‐Net/convolutional neural network (CNN). Using 488 subjects from the developing Human Connectome Project (dHCP) and the Baby Connectome Project (BCP) datasets, we reconstructed reference fODFs from the full dMRI series with single‐shell three‐tissue constrained spherical deconvolution (SS3T‐CSD) and multi‐shell multi‐tissue CSD (MSMT‐CSD) as training targets, and systematically assessed the impact of age, scanner/protocol differences, and input dimensionality on model performance. Our findings reveal that U‐Net consistently outperformed the other models when fewer diffusion gradient directions were used, particularly with the SS3T‐CSD‐derived ground truth, which showed superior performance in capturing crossing fibers. However, as the number of input diffusion gradient directions increased, the MLP and the transformer‐based model exhibited steady gains in accuracy. Nevertheless, performance nearly plateaued between 28 and 45 input directions for all models. Age‐related domain shifts showed asymmetric patterns, being less pronounced in late developmental stages (late neonates and babies), with SS3T‐CSD demonstrating greater robustness to variability than MSMT‐CSD. To address inter‐site domain shifts, we implemented two adaptation strategies: the Method of Moments (MoM) and fine‐tuning.
Both strategies achieved statistically significant improvements in over 95% of tested configurations, with fine‐tuning consistently yielding superior results and U‐Net benefiting the most from additional target subjects. This study represents the first systematic evaluation of OOD settings in DL applications to fODF estimation, providing critical insights into model robustness and adaptation strategies for diverse clinical and research applications.
Keywords: constrained spherical deconvolution (CSD), deep learning, diffusion MRI, domain adaptation, domain shift, fiber orientation distribution function (fODF), infant, neonate
We quantify the domain‐shift impacts on three state‐of‐the‐art deep learning models for fiber orientation estimation in dMRI of neonatal and baby brains, across variations in age, scanner/protocol, model input, and target ground truth, and demonstrate how fine‐tuning and data harmonization strategies improve model robustness for clinical and research applications.

1. Introduction
Early brain development is a crucial period that sets the stage for lifelong health (Bhat et al. 2014; Godfrey and Barker 2001; O'Donnell and Meaney 2017; Volpe 2009). The depiction of white matter fiber bundles, which are responsible for relaying action potential signals between different brain areas, is of particular interest for fetal, newborn, and baby brains. Those long and myelinated axons have been shown to play a significant role in cognitive and motor functions from infancy (Dubois et al. 2014) to adulthood (Brun and Englund 1986; Davis et al. 2003; Ruiz‐Rizzo et al. 2024). Precise estimation of these bundles of fibers is essential to comprehend in vivo developmental trends and identify irregularities that might indicate potential diseases.
Progress in diffusion magnetic resonance imaging (dMRI), a noninvasive technique that relies on water molecule displacement as a proxy to microstructure, has yielded unparalleled insights into the mapping of the human brain (Descoteaux et al. 2011; Özarslan et al. 2013). The predominant method for extracting diffusion properties from the diffusion signal typically involves a prior model, commonly the diffusion tensor imaging (DTI) model (Basser et al. 1994). However, more intricate models such as multi‐shell multi‐tissue constrained spherical deconvolution (MSMT‐CSD) aim to reconstruct fiber orientation distribution functions (fODFs), enabling the representation of complex white matter configurations like fiber crossings (Jeurissen et al. 2014; Tournier et al. 2004) with sufficiently large crossing angles (Schilling et al. 2018). These models, however, necessitate densely sampled multi‐shell dMRI data that entail long acquisition times. A less data‐demanding method, single‐shell three‐tissue constrained spherical deconvolution (SS3T‐CSD) (Dhollander and Connelly 2016; Dhollander et al. 2016), reconstructs multi‐tissue data with a single non‐zero b‐value and has been demonstrated to be a good fit for developing brains (Dhollander et al. 2019), where white matter voxels might suffer more from partial volume effects.
Imaging developing brains presents unique challenges that necessitate specialized approaches: Infants and young children have limited tolerance for long scan sessions, often requiring sedation or specialized acquisition protocols to minimize motion artifacts (Hughes et al. 2017). Motion artifacts are a primary concern in infant MRI scanning, as any head motion can degrade image quality and limit reliable assessment of brain structures (Coupe et al. 2013). These constraints necessitate faster acquisition times to guarantee sufficient data quality for reliable fiber orientation estimation.
1.1. Estimating fODFs With Machine Learning
With the advancement of machine learning techniques and more prominently deep learning, data‐driven approaches have emerged as powerful alternatives to traditional model‐based methods, enabling efficient estimation of the mapping between different related diffusion quantities, for instance, from dMRI signals to scalars such as fractional anisotropy from challenging or undersampled dMRI acquisitions (Alexander et al. 2014; Golkov et al. 2016; Karimi and Gholipour 2022; Tian et al. 2020). Also, fODF prediction from the original raw signal or its spherical harmonic (SH) representation has raised a growing interest from the community in recent years (Bartlett et al. 2023; Hosseini et al. 2022; Jha et al. 2023; Karimi, Vasung, et al. 2021; Kebiri, Gholipour, Lin, et al. 2023; Lin et al. 2019; Nath, Schilling, et al. 2019). Specifically, Nath, Schilling, et al. (2019) used a dense residual network to predict fODFs derived from ex vivo confocal microscopy images of monkey histology sections. However, this method is constrained by the unavailability of ex vivo histological training data. Karimi, Jaimes, et al. (2021) used a voxel‐wise multilayer perceptron (MLP) to predict fODFs, whereas Bartlett et al. (2023) and Lin et al. (2019) employed 3D convolutional neural networks (CNNs) to predict the fODF of the central voxel based on a small neighborhood of the diffusion signal. To further exploit the correlations among neighboring voxels, a two‐stage Transformer‐CNN was employed by Hosseini et al. (2022) to convert 200 measurements into 60 measurements before proceeding to predict fODFs. Another work (Jha et al. 2023) has shown, using a differential equation approach, the feasibility of predicting accurate fODFs with a limited number of diffusion gradient directions using a 2D neighborhood. On the other hand, Kebiri, Gholipour, Lin, et al. (2023) used a 3D U‐Net‐like network with extensive residual connections to predict large patches, leveraging spatial correlations to estimate fODFs from a small number of input measurements and hence a substantial reduction in scanning times. This approach has yielded promising results on newborns and fetuses (Kebiri et al. 2024).
Differently, Koppers and Merhof (2016) have tried to estimate the orientation of the fibers using 2D CNN in a classification paradigm. Other studies have aimed at segmenting fiber tracts either through a prior model applied on the input (Dong et al. 2019; Wasserthal et al. 2018) or directly from a spherical representation of the acquired signal (Kebiri, Gholipour, Bach Cuadra, and Karimi 2023). Additionally, Da Silva et al. (2024) and Zeng et al. (2022) have developed angular super‐resolution approaches to enhance fODF quality from limited single‐shell acquisitions (SS3T‐CSD) to approximate multi‐shell high angular reconstructions (MSMT‐CSD). An extensive review of machine learning applications in dMRI can be found in Karimi and Warfield (2024) and Kebiri (2023).
In contrast to these supervised approaches, unsupervised methods have emerged as an alternative paradigm for fODF estimation. These methods learn biophysical features directly from dMRI data, without requiring pre‐computed reference ground‐truth fODFs for training. Notable examples include equivariant spherical deconvolution methods (Elaldi et al. 2021, 2024, 2025), which use rotation‐equivariant neural networks to predict fODFs that are then convolved with tissue response functions to reconstruct the dMRI signal, optimizing based on signal reconstruction error. Implicit neural representations employ coordinate‐based neural networks to model continuous fODF fields, where the spatial encoding allows these models to learn spatial correlations between neighboring voxels rather than treating voxels independently: Consagra et al. (2024) specifically address how conventional approaches ignore valuable spatial correlations by using neural fields to parameterize spatially varying ODFs that implicitly model spatial correlation structures while incorporating uncertainty quantification, whereas Hendriks et al. (2025) adapt constrained spherical deconvolution (CSD) within an implicit neural framework using sinusoidal encoding for spatial regularization across multi‐shell data. More recently, Gao et al. (2026) proposed unsupervised three‐compartment learning that jointly estimates tissue fractions and fODFs by enforcing biophysical constraints in the optimization process. Although these unsupervised approaches offer the advantage of not requiring explicit training ground truth data, they present distinct considerations for developing brain imaging. 
These methods typically require dataset‐specific training phases, substantial computational resources for optimization, and often extensive acquisition protocols with high angular resolution data, which may pose challenges for clinical scenarios involving young subjects where rapid deployment and acquisition time constraints are critical. Furthermore, direct comparison between supervised and unsupervised methods remains methodologically complex and potentially unfair. Supervised approaches can be evaluated against established reference standards (e.g., CSD reconstructions), but using these same CSD‐derived ground truth labels to evaluate unsupervised methods would be inherently biased, as it would penalize methods that learn alternative, potentially superior representations of the underlying fiber architecture. The absence of truly independent in vivo ground truth (Karimi and Warfield 2024; Kebiri et al. 2024) further complicates fair comparative evaluation between these paradigms.
1.2. Domain Shifts in dMRI and Mitigation Strategies
Although DL applied to medical imaging offers strong advantages as detailed above, it suffers substantially from domain shift, in which source and target data distributions differ considerably. In fact, small datasets with limited age spans, together with the privacy constraints that prevent sharing them at scale, hamper cross‐dataset studies. In MRI (Richiardi et al. 2025), domain shift is further amplified because scanners at different sites vary in vendor and field strength, and sequence acquisition parameters differ significantly even within one modality such as dMRI (e.g., b‐values and gradient directions). Both biological shifts (Bento et al. 2022; Dubois et al. 2014), such as age or pathology, and technological shifts (Tax et al. 2019) contribute to the final distribution shift between the source and target sets. Age is a particularly important shift in developing brains because of the rapid change in structure and function (Konkel 2018; Schilling et al. 2023).
To mitigate distribution shifts in MRI, solutions may operate at the data level (e.g., data harmonization or augmentation) or at the model level (e.g., transfer learning or domain adaptation).
Data harmonization, which aims to minimize differences due to the unwanted shift between the source domain and the target domain, is dominated by statistical methods (Cetin Karayumak et al. 2019; Huynh et al. 2019; Johnson et al. 2007; Karimi and Warfield 2024; Mirzaalian et al. 2018). The most prominent are Rotation Invariant Spherical Harmonics (RISH) (Cetin Karayumak et al. 2019, 2024; Mirzaalian et al. 2018), specifically designed for the original dMRI signal, and the combined association test (ComBat) (Fortin et al. 2017; Johnson et al. 2007), a more general harmonization method that is applied to the target diffusion map (Pinto et al. 2020), that is, after model fitting. The Method of Moments (MoM) (Huynh et al. 2019), which aligns diffusion‐weighted imaging (DWI) features via spherical moments, was also proposed recently and achieved promising results in developing brains (Lin, Gholipour, et al. 2024). While most harmonization methods, such as RISH, require similar acquisition protocols and site‐matched healthy controls, MoM and ComBat are not subject to these restrictions, making them appropriate for a broader range of conditions.
Deep learning has also been used for harmonizing dMRI metrics (Hansen et al. 2022; Koppers et al. 2019; Moyer et al. 2020; Nath, Remedios, et al. 2019a), including those for fODF estimation. However, they either need paired acquisitions with histology (Nath, Remedios, et al. 2019a, 2019b) or scan‐rescan (Yao, Rheault, et al. 2023; Yao et al. 2024) acquisitions. Although deep learning techniques offer solutions to nonlinear harmonization, they are prone to overfitting and require extensive training data, often from matched acquisitions, that are not easy to get (Bashyam et al. 2022; Pinto et al. 2020).
Domain adaptation methods have been employed to tackle domain shifts in medical imaging (Guan and Liu 2021). However, only two methods have been proposed in the context of dMRI, and they aim to tackle the diversity of dMRI acquisitions and, in particular, the b‐value. Kamphenkel et al. (2018) circumvented the domain shift by using a diffusion kurtosis model to estimate missing input values in a breast cancer classification task, while Yao, Newlin, et al. (2023) used a dynamic head to learn the different shell configurations using spherical convolutions to predict fODFs. However, these methods do not offer robustness to other shifts such as scanner, age, or different protocols (except the b‐value).
Using deep learning, two orthogonal approaches are particularly interesting: adversarial training and transfer learning (fine‐tuning). The former relies on learning invariant features through a domain‐agnostic loss function (Ganin et al. 2016; Kamnitsas et al. 2017), whereas the latter relies on pre‐trained weights from a target‐related dataset (Ghafoorian et al. 2017; Samala et al. 2018) that can range from a source dataset to a public dataset of natural images (Krizhevsky et al. 2012).
These domain adaptation approaches are designed for supervised methods (Guan and Liu 2021). For unsupervised learning approaches, domain shifts are theoretically less problematic, as they learn biophysical features and perform fODF prediction on the same dataset, without relying on external training data that would introduce distribution shifts (Karimi and Warfield 2024).
1.3. Contributions of This Work
The exploration of fODF estimation under domain shifts remains limited. While Karimi, Vasung, et al. (2021) and Kebiri, Gholipour, Lin, et al. (2023) have extensively tested their fODF prediction models on developing neonatal brains, only Kebiri, Gholipour, Lin, et al. (2023) have investigated out‐of‐domain (OOD) performance. However, this OOD evaluation was conducted qualitatively due to the lack of fetal fODF ground truth. Our preliminary work in Lin, Gholipour, et al. (2024) explored the age‐ and age/scanner/protocol‐related domain shifts and proposed potential solutions, including MoM (data harmonization) and fine‐tuning (domain adaptation).
In this study, we significantly extend (Lin, Gholipour, et al. 2024) by including three state‐of‐the‐art supervised deep learning models: MLP‐, transformer‐, and U‐Net/CNN‐based. We also use two cohorts: the newborns of the developing Human Connectome Project (dHCP) and the babies of the Baby Connectome Project (BCP). We extend our analysis to different fODF ground‐truth models: SS3T‐CSD (Dhollander and Connelly 2016; Dhollander et al. 2016) and MSMT‐CSD (Jeurissen et al. 2014). Furthermore, different methodological configurations are evaluated: the number of input diffusion gradient directions to the model and the number of subjects from the target set used for domain adaptation/harmonization. To the best of our knowledge, this is the first study to thoroughly assess OOD settings of DL applications for fODF estimation.
2. Methods
Our framework for fODF prediction and OOD evaluation is illustrated in Figure 1 in multiple stages. First, reference CSD algorithms (MSMT‐CSD, SS3T‐CSD) generate ground truth fODFs from full dMRI series for training. Next, deep learning models undergo supervised training using source domain data (D_S) to predict these reference fODFs. The framework is evaluated in both intra‐site and inter‐site scenarios (detailed in Section 3), with domain shifts addressed through either data harmonization using MoM to transform target domain data (D_T) before inference, or model adaptation via fine‐tuning on target domain data to create adapted networks (f*).
FIGURE 1.

Overview of the fODF prediction framework using multiple deep learning architectures. (i) Reference fODFs are generated from the full dMRI series using CSD algorithms (MSMT‐CSD or SS3T‐CSD) as ground truth for training the deep learning networks. (ii) Models are trained on source domain data (D_S) to predict these reference fODFs. (iii) Domain shifts (both intra‐site and inter‐site) are addressed through either (a) MoM harmonization to transform target domain data (D_T) before inference with the original model f, or (b) fine‐tuning the original model f on target domain data to create an adapted model f*. Here, D_S denotes the source domain dataset, D_T the target domain dataset, and f* the fine‐tuned model. The framework enables comprehensive evaluation across different acquisition protocols and age ranges, as detailed in Section 3.
2.1. Backbone Model Architectures
Three distinct deep learning architectures were implemented for fODF prediction, each representing a different modeling paradigm: a U‐Net‐based architecture for multiscale spatial context learning (Kebiri, Gholipour, Lin, et al. 2023), a hybrid CNN‐Transformer for local–global feature integration (Hosseini et al. 2022), and an MLP for direct voxel‐wise signal‐to‐fODF mapping (Karimi, Vasung, et al. 2021). These architectures share a common input–output framework: input diffusion measurements are normalized and projected onto a spherical harmonic (SH) basis to enhance acquisition independence, with the SH order determined by the number of input directions. Throughout this paper, N denotes the number of input diffusion measurements, whereas the output consistently represents the fODF in the SH basis (maximum order 8) with 45 coefficients.
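As context for the 45‐coefficient output: in the real, symmetric SH basis used for dMRI, only even orders contribute, so a maximum order L_max yields (L_max + 1)(L_max + 2)/2 coefficients. A minimal helper (illustrative, not taken from the original implementations) makes the count explicit:

```python
def n_even_sh_coeffs(l_max: int) -> int:
    """Number of real, even-order spherical harmonic coefficients up to l_max.

    Antipodal symmetry of the diffusion signal restricts the basis to even
    orders l = 0, 2, ..., l_max, each contributing 2l + 1 terms, which sums
    to (l_max + 1)(l_max + 2) / 2.
    """
    if l_max % 2 != 0:
        raise ValueError("l_max must be even for a symmetric SH basis")
    return (l_max + 1) * (l_max + 2) // 2

# The fODF output above uses a maximum order of 8, giving 45 coefficients.
n_out = n_even_sh_coeffs(8)
```

This count also bounds the SH order that can be fitted from N input directions, which is why the order is chosen as a function of N.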
2.1.1. U‐Net Based Method
The U‐Net architecture (Kebiri, Gholipour, Lin, et al. 2023) extends the traditional U‐Net (Ronneberger et al. 2015) with extensive multiscale residual connections and hierarchical feature integration. The network processes patches through an encoder‐decoder structure with configurable depth. The encoder uses stride‐2 convolutions for downsampling, with feature maps doubling at each level starting from 36 channels. Each encoding level incorporates residual blocks and cross‐scale connections that concatenate features from different resolution levels. The decoder employs transposed convolutions for upsampling, concatenating encoder features via skip connections before applying residual blocks. All convolutional blocks use ReLU activation (Agarap 2018) and dropout (Srivastava et al. 2014) for regularization, with normalization applied in activation‐dropout‐normalization ordering. The output layer produces fODF predictions without activation to enable direct SH coefficient regression.
2.1.2. CNN‐Transformer (CTtrack)
This hybrid architecture (Hosseini et al. 2022) combines 3D CNNs for spatial feature extraction with Transformer networks for global context modeling. The network processes patches through a residual CNN block with 60 feature maps and GELU activation (Hendrycks and Gimpel 2016). The CNN output is divided into 27 spatial patches, each projected to a d‐dimensional embedding space and augmented with learned positional encodings. Four Transformer blocks with multi‐head self‐attention (Vaswani et al. 2017) and feed‐forward networks process these patch embeddings, using layer normalization (Ba et al. 2016) and residual connections (He et al. 2016). Global average pooling aggregates the attention features, which are then processed by an MLP head with GELU activation to predict SH coefficients for the central voxel.
2.1.3. Multilayer Perceptron (MLP)
The MLP architecture (Karimi, Vasung, et al. 2021) operates on a voxel‐wise basis, processing individual voxels independently without spatial neighborhood information. The network comprises an input layer of size N, six hidden layers, and an output layer of 45 neurons, where N represents the number of input diffusion measurements and 45 corresponds to the SH coefficients of the output fODF. Hidden layers use ReLU activation functions (Agarap 2018) with variance scaling initialization (He et al. 2015), while the output layer has no activation function to enable direct regression of SH coefficients. Dropout layers (Srivastava et al. 2014) are applied after each dense layer for regularization. The fully connected architecture enables direct signal‐to‐fODF mapping without spatial context considerations.
2.2. Reference CSD‐Based fODF Reconstruction for Training
To generate training ground truths, we used two classical fODF estimation models, Multi‐Shell Multi‐Tissue CSD (MSMT‐CSD) (Jeurissen et al. 2014) and Single‐Shell 3‐Tissue CSD (SS3T‐CSD) (Dhollander and Connelly 2016), under various dMRI acquisition settings. Following established terminology (Lin, Kebiri, et al. 2024), we refer to these fODF reconstructions as “ground truth” or “GT” when used as training targets for the deep learning models, while acknowledging that they are reference reconstructions rather than true anatomical measurements. MSMT‐CSD leverages multi‐shell data to compute tissue‐specific response functions and yields multi‐tissue fODFs. SS3T‐CSD uses single‐shell b = 1000 s/mm² data (plus b = 0) and a three‐tissue approach, enabling approximate multi‐tissue decomposition with reduced acquisition demands. These reconstructions from classical models serve as training targets for the deep learning models.
2.3. Training Details
The U‐Net training employed the Adam optimizer (Kingma and Ba 2014) with norm loss, initial learning rate , 0.1 dropout rate, and batch size 1, processing 128 patches per epoch for 500–1000 epochs with early stopping. CTtrack training used the AdamW optimizer (Loshchilov and Hutter 2019) with initial learning rate , weight decay , batch size 4000, and loss function for 100 epochs. The MLP employed the Adam optimizer with initial learning rate , batch size 2000, and norm loss for 100 epochs. The architectures have varying computational complexity: the MLP contains approximately 0.8 M parameters (the exact count depends on N), CTtrack approximately 1.8 M, and U‐Net approximately 7 M. All models demonstrated stable convergence behavior, with training loss consistently decreasing and validation loss reaching a plateau within the specified epochs, indicating successful optimization without overfitting. Convergence times were approximately 12, 12, and 4 h for U‐Net, CTtrack, and MLP, respectively, with inference requiring less than 10 s per subject.
2.4. Domain Shift Mitigation Methods
We evaluate two domain adaptation strategies: data harmonization using MoM and model adaptation through fine‐tuning. These approaches address domain shifts through fundamentally different mechanisms and have distinct data requirements.
2.4.1. Data Harmonization Based on MoM
We used the MoM approach (Huynh et al. 2019) to harmonize dMRI data across different sites or acquisition protocols, as it is not subject to protocol‐ or subject‐matched constraints. This approach requires only the DWI signals from target domain subjects, without requiring corresponding ground truth fODFs. A linear transformation

$$\tilde{S}_t(v) = a(v)\, S_t(v) + b(v) \tag{1}$$

matches the statistical moments of the source and target datasets, where S_t(v) is the target DWI signal at voxel v and a(v), b(v) are the transformation coefficients. For each gradient direction, we compute the mean (μ) and variance (σ²) of the target and source data within brain masks. Voxel‐wise estimates of a and b are obtained by minimizing

$$E(a, b) = \left(\mu_s - \left(a\,\mu_t + b\right)\right)^2 + \left(\sigma_s^2 - a^2\,\sigma_t^2\right)^2 + R(a, b), \tag{2}$$

where R(a, b) is a regularization term penalizing large deviations from the identity transformation:

$$R(a, b) = \lambda \left( (a - 1)^2 + b^2 \right). \tag{3}$$
This voxel‐wise linear optimization is computationally efficient, requiring only seconds to minutes for harmonization of an entire dataset.
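As an illustration of the moment‐matching idea, the sketch below implements a simplified, global version of the transformation for one gradient direction (a single a, b for all voxels, with no regularization); the voxel‐wise, regularized estimator described above generalizes this closed form. All names and the synthetic data are illustrative:

```python
import numpy as np

def match_moments(target: np.ndarray, mu_src: float, sigma_src: float):
    """Global Method-of-Moments alignment for one gradient direction:
    find a, b such that a * target + b has the source mean and standard
    deviation. Returns the harmonized signal and the coefficients."""
    mu_t, sigma_t = target.mean(), target.std()
    a = sigma_src / sigma_t           # scale to match the source spread
    b = mu_src - a * mu_t             # shift to match the source mean
    return a * target + b, a, b

rng = np.random.default_rng(0)
dwi_target = rng.gamma(shape=2.0, scale=50.0, size=10_000)  # stand-in DWI values
harmonized, a, b = match_moments(dwi_target, mu_src=120.0, sigma_src=30.0)
```

In practice the coefficients would be estimated per gradient direction and per voxel, with the regularizer keeping a close to 1 and b close to 0 where the domains already agree.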
2.4.2. Domain Adaptation by Fine‐Tuning
The domain adaptation framework employs transfer learning through model fine‐tuning, maintaining the same architecture while adjusting network parameters to the target domain data distribution. In contrast to MoM harmonization, fine‐tuning requires both DWI signals and corresponding ground truth fODFs from target domain subjects for supervised learning.
Each model maintained its original optimizer while adapting to target domain data. The U‐Net processed target data in patches with stride‐8 voxels (50% overlap in each dimension), using only patches within the brain mask. These patches were split into training and validation sets (4:1 ratio). Fine‐tuning ran for 20 epochs with learning rate and batch size 64 for U‐Net, while CTtrack and MLP used 10 epochs with learning rates and , respectively, and batch size 2000. Fine‐tuning computational time per epoch varied with the number of target subjects and brain size: approximately 10 s (1 subject) to 60 s (10 subjects) per epoch for U‐Net, 30–150 s for CTtrack, and 3–15 s for MLP, with BCP subjects requiring modestly longer processing times due to larger brain and white matter volumes.
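The overlapping patch extraction can be sketched as follows; the patch side of 16 voxels is an assumption for illustration (a stride of 8 with 50% overlap per dimension implies it), and the function is a sketch rather than the original pipeline code:

```python
import numpy as np

def extract_patches(volume, mask, patch=16, stride=8):
    """Extract overlapping 3D patches (50% overlap per axis when
    stride = patch / 2), keeping only patches that intersect the brain mask.
    The patch side of 16 is an illustrative assumption."""
    patches = []
    X, Y, Z = volume.shape[:3]
    for x in range(0, X - patch + 1, stride):
        for y in range(0, Y - patch + 1, stride):
            for z in range(0, Z - patch + 1, stride):
                if mask[x:x + patch, y:y + patch, z:z + patch].any():
                    patches.append(volume[x:x + patch, y:y + patch, z:z + patch])
    return np.stack(patches) if patches else np.empty((0, patch, patch, patch))

vol = np.random.default_rng(0).random((32, 32, 32))
brain = np.ones_like(vol, dtype=bool)
train_val = extract_patches(vol, brain)  # 3 positions per axis -> 27 patches
```

The resulting patch list would then be split 4:1 into training and validation sets as described above.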
2.5. Statistical Analysis
Statistical significance was assessed using nonparametric tests appropriate for neuroimaging performance metrics, which typically exhibit bounded distributions and non‐normal characteristics (Button et al. 2013). Wilcoxon signed–rank tests (Wilcoxon 1945) were used for paired comparisons involving the same test subjects under different conditions. Mann–Whitney U tests (Mann and Whitney 1947) were employed for unpaired comparisons between distinct subject populations. All tests were two‐tailed unless directional hypotheses were being tested, such as performance improvement with increased gradient directions. Bonferroni correction (Dunn 1961) was applied to control for multiple comparisons within each experimental analysis. Tests were categorized by experimental design: intra‐site performance comparisons, age‐related domain shift analyses, and inter‐site domain adaptation assessments, with method‐to‐method comparisons providing direct architecture validation.
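A minimal sketch of this testing setup with SciPy (synthetic scores and variable names are illustrative):

```python
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

rng = np.random.default_rng(1)

# Paired setting: the same test subjects scored under two conditions
# (e.g., before vs. after domain adaptation) -> Wilcoxon signed-rank test.
acc_before = rng.uniform(0.80, 0.90, size=30)
acc_after = acc_before + rng.uniform(0.00, 0.05, size=30)
_, p_paired = wilcoxon(acc_before, acc_after)

# Unpaired setting: two distinct subject populations (e.g., two sites)
# -> Mann-Whitney U test.
site_a = rng.uniform(0.80, 0.90, size=30)
site_b = rng.uniform(0.75, 0.85, size=30)
_, p_unpaired = mannwhitneyu(site_a, site_b)

# Bonferroni correction: divide the significance level by the number of tests.
n_tests = 2
alpha_corrected = 0.05 / n_tests
```

Each p‐value would then be compared against the Bonferroni‐corrected threshold for its experimental family.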
2.6. Implementation Details
All models were implemented using PyTorch (Paszke et al. 2019), PyTorch Lightning (Falcon and The PyTorch Lightning team 2024), MONAI (Cardoso et al. 2022), and TensorFlow 2 (Abadi et al. 2015). The U‐Net architecture was reimplemented from its original TensorFlow version using PyTorch. CTtrack maintained its original TensorFlow 2 implementation with modified data loading pipelines. The MLP architecture was reconstructed in TensorFlow 2 following published specifications. The MoM harmonization was implemented in MATLAB R2022b. Training used an NVIDIA GeForce RTX 2080 Ti with 11GB RAM and 24 CPU cores. All statistical analyses were performed using SciPy (Virtanen et al. 2020).
2.7. Evaluation
A quantitative assessment was carried out to evaluate the performance of the fODF estimation methods. The evaluation relied on five metrics that capture different aspects of fODF reconstruction quality: (1) peak‐based metrics (Kebiri et al. 2024), including agreement rates (ARs) and angular differences (ADs), which assess fiber population detection accuracy for both single and crossing fiber configurations; (2) fODF‐derived microstructural measures, including apparent fiber density (AFD) (Raffelt et al. 2012), which quantifies signal amplitude, and generalized fractional anisotropy (GFA) (Tuch 2004), which measures directional coherence and shape anisotropy; and (3) a global correlation measure, the angular correlation coefficient (ACC) (Anderson 2005), which evaluates overall fODF field consistency and spatial coherence. Together, these metrics provide a comprehensive evaluation across multiple dimensions of fODF reconstruction quality. For peak‐based analysis, fiber orientations were extracted using Dipy (Garyfallidis et al. 2014) with a minimum separation angle of 45°, a maximum of 3 peaks, and a relative peak threshold of 0.5. The choice of these parameters was guided by the work of Schilling et al. (2018), which demonstrated the limitations of current dMRI models in correctly estimating multiple fiber populations and low‐angle crossing fibers.
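A simplified, NumPy‐only stand‐in for peak extraction with these parameters (relative threshold 0.5, minimum separation 45°, at most 3 peaks) can be sketched as below; Dipy's actual implementation differs in detail:

```python
import numpy as np

def extract_peaks(odf, dirs, rel_thresh=0.5, min_sep_deg=45.0, max_peaks=3):
    """Greedy peak picking on a discretely sampled fODF: visit directions in
    descending amplitude order, stop below rel_thresh * max amplitude, and
    reject candidates within min_sep_deg of an already-selected peak
    (antipodal directions count as identical)."""
    order = np.argsort(odf)[::-1]
    cos_sep = np.cos(np.deg2rad(min_sep_deg))
    peaks = []
    for i in order:
        if odf[i] < rel_thresh * odf[order[0]]:
            break
        if all(abs(np.dot(dirs[i], dirs[j])) < cos_sep for j in peaks):
            peaks.append(i)
        if len(peaks) == max_peaks:
            break
    return dirs[peaks], odf[peaks]

# Toy example: two well-separated lobes, one near-duplicate, one weak lobe.
dirs = np.array([[1.0, 0.0, 0.0], [0.985, 0.174, 0.0],
                 [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
odf = np.array([1.0, 0.95, 0.9, 0.3])
peak_dirs, peak_vals = extract_peaks(odf, dirs)
```

Here the near‐duplicate direction (about 10° from the strongest peak) is suppressed by the separation criterion, and the weak lobe falls below the relative threshold.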
Agreement rate: The AR was computed using confusion matrices and defined for each number of peaks k as

$$AR_k = \frac{N_{\text{both}}(k)}{N_{\text{either}}(k)}, \tag{4}$$

where N_both(k) represents the percentage of voxels where both methods agree on k peaks, and N_either(k) denotes the percentage of voxels where at least one of the two methods predicts k peaks. This metric captures the rate of concordance between two methods in peak number estimation.
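A direct implementation of this agreement rate, assuming per‐voxel peak counts from the two methods (illustrative sketch):

```python
import numpy as np

def agreement_rate(peaks_a, peaks_b, k):
    """AR for k peaks: voxels where both methods find k peaks, divided by
    voxels where at least one of them does."""
    a_k, b_k = (peaks_a == k), (peaks_b == k)
    either = np.logical_or(a_k, b_k).sum()
    if either == 0:
        return np.nan  # neither method ever predicts k peaks
    return np.logical_and(a_k, b_k).sum() / either

# Toy per-voxel peak counts from two methods.
peaks_a = np.array([1, 1, 2, 2, 3])
peaks_b = np.array([1, 2, 2, 1, 3])
```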
Angular difference: Mean angular difference was computed for voxels containing the same number of estimated peaks. For voxels with multiple fibers, we extracted corresponding peaks between methods by computing the minimum angle between all possible configurations (4 configurations for 2 peaks, 9 for 3 peaks), followed by recursive elimination of matched peaks until all peaks are paired. When comparing deep learning predictions against training ground truth, this metric represents angular error; when comparing split datasets or different reconstruction methods, it quantifies angular disagreement.
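The greedy minimum‐angle matching can be sketched as follows (a simplified version of the recursive elimination described above; antipodal peak directions are treated as identical):

```python
import numpy as np

def match_peaks(P, Q):
    """Pair peaks between two methods by repeatedly taking the globally
    smallest remaining angle and eliminating that pair. Rows of P and Q
    are unit vectors; returns the matched angles in degrees."""
    cos = np.clip(np.abs(P @ Q.T), 0.0, 1.0)  # |dot| folds antipodal pairs
    angles = np.degrees(np.arccos(cos))
    matched = []
    for _ in range(min(len(P), len(Q))):
        i, j = np.unravel_index(np.nanargmin(angles), angles.shape)
        matched.append(angles[i, j])
        angles[i, :] = np.nan  # eliminate the matched peak of method 1
        angles[:, j] = np.nan  # eliminate the matched peak of method 2
    return matched

# Toy example: same two fibers, permuted and with one direction flipped.
P = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
Q = np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
matched_angles = match_peaks(P, Q)
```

For 2‐ and 3‐peak voxels this enumerates the same 4 or 9 candidate pairings described above, just implicitly through the angle matrix.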
Apparent fiber density: The AFD (Raffelt et al. 2012) quantifies the density of fibers aligned in specific orientations by integrating the fODF within directionally coherent lobes. For a given fODF ψ, the AFD is computed by first segmenting the fODF into lobes based on peaks and troughs, then calculating the integral over each lobe:

$$AFD_{\text{lobe}} = \int_{\Omega_{\text{lobe}}} \psi(\mathbf{u}) \, d\mathbf{u}, \tag{5}$$

where the integration is performed over each segmented lobe region Ω_lobe of the fODF on the unit sphere. This approach captures the total fiber density within each coherent fiber population rather than just the peak amplitude. The AFD difference is computed as a masked mean absolute percentage difference (MAPD), where the mask is confined to white matter voxels, and is given by

$$MAPD = \frac{100\%}{N_{\text{WM}}} \sum_{v=1}^{N_{\text{WM}}} \frac{\left| A_1(v) - A_2(v) \right|}{\left( A_1(v) + A_2(v) \right) / 2}, \tag{6}$$

where N_WM is the number of white matter voxels, and A_1(v) and A_2(v) represent the AFD values from the two methods being compared at voxel v. This formulation provides a unified metric that serves as an error measure when evaluating predictions against ground truth and as a disagreement measure when comparing equivalent reconstructions.
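A sketch of the masked MAPD computation; the symmetric denominator (the mean of the two AFD values) is one common convention for a percentage difference and is an assumption here:

```python
import numpy as np

def afd_mapd(afd_1, afd_2, wm_mask):
    """Masked mean absolute percentage difference between two AFD maps,
    restricted to white matter voxels. The symmetric denominator makes the
    metric independent of which map is treated as the reference."""
    a, b = afd_1[wm_mask], afd_2[wm_mask]
    return 100.0 * np.mean(np.abs(a - b) / ((a + b) / 2.0))

# Toy AFD maps over two white matter voxels.
afd_1 = np.array([2.0, 2.0])
afd_2 = np.array([1.0, 3.0])
wm = np.array([True, True])
mapd = afd_mapd(afd_1, afd_2, wm)
```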
Generalized fractional anisotropy: The GFA (Tuch 2004) is a scalar measure that quantifies the normalized variance of the fODF, indicating the degree of anisotropy. For an fODF $\Psi$ with $n$ discrete samples $\Psi_i$ on the sphere, the GFA is computed as:

$$\mathrm{GFA} = \sqrt{\frac{n \sum_{i=1}^{n} \left(\Psi_i - \langle\Psi\rangle\right)^2}{(n-1) \sum_{i=1}^{n} \Psi_i^2}} \qquad (7)$$

where $\langle\Psi\rangle$ is the mean value of the fODF samples, and $n$ is the number of sampling directions on the sphere. GFA values range from 0 (isotropic, no directional preference) to 1 (maximally anisotropic, single direction). The GFA difference is computed as the absolute difference between GFA values within white matter voxels:

$$\Delta\mathrm{GFA} = \frac{1}{N_{\mathrm{WM}}} \sum_{i=1}^{N_{\mathrm{WM}}} \left|\mathrm{GFA}_1(i) - \mathrm{GFA}_2(i)\right| \qquad (8)$$

where $N_{\mathrm{WM}}$ is the number of white matter voxels. This provides a complementary measure to AFD that captures structural anisotropy characteristics rather than amplitude differences.
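The GFA formula above (standard deviation of the samples over their root mean square) can be sketched directly in numpy; the function name and toy fODFs are illustrative:

```python
import numpy as np

def gfa(odf_samples):
    """Generalized fractional anisotropy of an fODF sampled at n directions
    (last axis): sample std over sample RMS, per Tuch (2004)."""
    psi = np.asarray(odf_samples, dtype=float)
    n = psi.shape[-1]
    num = n * np.sum((psi - psi.mean(axis=-1, keepdims=True)) ** 2, axis=-1)
    den = (n - 1) * np.sum(psi ** 2, axis=-1)
    return np.sqrt(num / np.maximum(den, 1e-12))

# Isotropic fODF -> GFA of 0; all mass in one direction -> GFA of 1
iso = np.ones(64)
sharp = np.zeros(64)
sharp[0] = 1.0
print(gfa(iso), gfa(sharp))
```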
Angular correlation coefficient: The ACC (Anderson 2005) provides a global measure of similarity between fODF fields. For two fODFs represented by their SH coefficient vectors $\mathbf{u}$ and $\mathbf{v}$, the ACC is computed as:

$$\mathrm{ACC} = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \qquad (9)$$

where $\mathbf{u} \cdot \mathbf{v}$ represents the dot product of the SH coefficient vectors, and $\|\cdot\|$ denotes the Euclidean norm. This metric quantifies the overall correlation between fODF shapes and is particularly useful for assessing consistency between different reconstruction methods applied to the same underlying data.
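As a normalized dot product, the ACC reduces to a cosine similarity over SH coefficients; a minimal sketch (note that Anderson's original formulation excludes the $\ell = 0$ term, which is not done here; names are illustrative):

```python
import numpy as np

def acc(sh_a, sh_b, eps=1e-12):
    """Angular correlation coefficient between two fODFs given as SH
    coefficient vectors: normalized dot product (cosine similarity)."""
    a = np.asarray(sh_a, dtype=float).ravel()
    b = np.asarray(sh_b, dtype=float).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

u = np.array([0.5, 0.1, -0.2, 0.05])
print(acc(u, u))        # identical fODFs -> ACC of ~1
print(acc(u, 2.0 * u))  # scale-invariant -> still ~1
```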
3. Experiments
The proposed framework was extensively evaluated through a series of experiments designed to assess both the performance of different network architectures (MLP, CTtrack, and U‐Net) and their robustness to various domain shifts. We first validated the CSD‐based reconstruction methods used for ground truth generation, then investigated intra‐site domain shifts (particularly age‐related variations), and finally examined inter‐site domain shifts between two major developing brain imaging consortia. The effectiveness of our domain shift mitigation strategies is evaluated across these scenarios.
3.1. Datasets and Preprocessing
3.1.1. Neonates of the dHCP
We use the third data release of the publicly available dHCP dataset (Edwards et al. 2022), which was acquired on a 3 T Philips Achieva system with a 32‐channel neonatal head coil. The protocol employed TE = 90 ms, TR = 3800 ms, a multiband factor of 4, a SENSE factor of 1.2, a partial Fourier factor of 0.855, an in‐plane resolution of 1.5 × 1.5 mm², and a slice thickness of 3 mm with 1.5 mm overlap (Hutter et al. 2018), with four shells (b = 0, 400, 1000, and 2600 s/mm²) comprising 20, 64, 88, and 128 volumes, respectively. Data were preprocessed with SHARD (Christiaens et al. 2021; Pietsch et al. 2021), including MP‐PCA denoising (Veraart et al. 2016), motion and distortion correction, Gibbs suppression, and resampling to 1.5 mm isotropic resolution. White matter masks were obtained by combining the T2‐weighted segmentation labels White Matter and Brainstem (registered to diffusion space; Yushkevich et al. 2016) with FA maps computed in MRtrix (Tournier et al. 2012), following Kebiri et al. (2024). We used a total of 323 unique subjects from dHCP. Of these, 165 subjects (PMA [postmenstrual age] at scan: 33–45 weeks) formed the main dHCP dataset. Additionally, we selected two age‐distinct subsets from dHCP, each consisting of 105 subjects: an early‐stage set (PMA at scan: 33–38 weeks) and a late‐stage set (PMA at scan: 41–45 weeks).
3.1.2. Babies of the Human Connectome Project (BCP)
We used data from the publicly available BCP dataset (Howell et al. 2019). Images were acquired on a 3 T Siemens Magnetom Prisma with a 32‐channel head coil. The dMRI protocol used six shells with 9, 12, 17, 24, 34, and 48 diffusion gradient directions, respectively, plus six b = 0 images. Other parameters included TE = 88.6 ms, TR = 2640 ms, a multiband factor of 5, and 1.5 mm isotropic resolution. Preprocessing included denoising, bias correction, motion compensation, and distortion correction (Andersson and Sotiropoulos 2016), followed by FSL BET brain extraction (Jenkinson et al. 2012). White matter masks were derived by combining a SynthSeg+ (Billot et al. 2023) WM mask, voxels with FA > 0.4, and voxels with FA > 0.15 and MD < 0.0011 mm²/s. We used 165 subjects from BCP (age at scan: 2–24 months) as the BCP dataset.
3.2. Validation of Reference CSD‐Based Reconstruction Methods
To assess the quality and consistency of the reconstruction methods used to generate training targets, we split each subject's dMRI series into two equal, nonoverlapping subsets (GS1 and GS2), following Kebiri, Gholipour, Lin, et al. (2023) and Kebiri et al. (2024). To ensure optimal angular distribution and prevent potential bias, we employed a systematic splitting methodology that optimizes the condition number of the diffusion tensor reconstruction matrix, following the scheme proposed by Skare et al. (2000). The splitting process operated independently for each b‐value shell, selecting the directions that provide optimal condition number properties as GS1, whereas the remaining directions formed GS2. This approach ensures that both subsets represent meaningful sampling schemes with complementary angular coverage rather than arbitrary divisions.
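The condition-number criterion can be illustrated with a simple stand-in for the Skare et al. (2000) optimization: build the diffusion-tensor design matrix for a candidate subset of gradient directions and search for a half split whose worse subset is best conditioned. This randomized search, the function names, and the toy direction set are illustrative only, not the paper's actual algorithm:

```python
import numpy as np

def tensor_design(dirs):
    """Diffusion-tensor design matrix: one row per gradient direction g,
    with entries [gx^2, gy^2, gz^2, 2*gx*gy, 2*gx*gz, 2*gy*gz]."""
    g = np.asarray(dirs, dtype=float)
    return np.column_stack([g[:, 0]**2, g[:, 1]**2, g[:, 2]**2,
                            2 * g[:, 0] * g[:, 1],
                            2 * g[:, 0] * g[:, 2],
                            2 * g[:, 1] * g[:, 2]])

def split_shell(dirs, n_trials=500, seed=0):
    """Randomly search for a half split minimizing the larger of the two
    subsets' design-matrix condition numbers (a sketch, not Skare's method)."""
    rng = np.random.default_rng(seed)
    n = len(dirs)
    best, best_cond = None, np.inf
    for _ in range(n_trials):
        idx = rng.permutation(n)
        s1, s2 = idx[:n // 2], idx[n // 2:]
        c = max(np.linalg.cond(tensor_design(dirs[s1])),
                np.linalg.cond(tensor_design(dirs[s2])))
        if c < best_cond:
            best_cond, best = c, (np.sort(s1), np.sort(s2))
    return best, best_cond

# Toy shell: 16 roughly uniform directions from a Fibonacci sphere
i = np.arange(16)
phi = np.pi * (3 - np.sqrt(5)) * i
z = 1 - (2 * i + 1) / 16
r = np.sqrt(1 - z**2)
dirs = np.column_stack([r * np.cos(phi), r * np.sin(phi), z])
(gs1, gs2), cond = split_shell(dirs)
print(len(gs1), len(gs2))  # two complementary 8-direction subsets
```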
For the dHCP dataset, each subset contained 150 measurements distributed across b‐values {0, 400, 1000, 2600} s/mm² with 10, 32, 44, and 64 directions, respectively. Similarly, for the BCP dataset, each subset contained 75 measurements. We reconstructed fODFs using MSMT‐CSD on both datasets using all available b‐values. For the dHCP dataset, we additionally performed SS3T‐CSD reconstruction using only the b = 0 and b = 1000 s/mm² shells.
As GS1 and GS2 represent equivalent samplings of the same underlying diffusion signal, the difference between their reconstructed fODFs indicates the inherent variability in each reconstruction method. This comparison quantifies the reference method consistency under reduced sampling conditions for classical CSD approaches when applied to subsampled data. For the dHCP dataset, we also compared the fODFs reconstructed from the complete series (300 measurements for MSMT‐CSD and b0/b1000 measurements for SS3T‐CSD) to assess and understand the differences between these two reconstruction approaches.
3.3. Intra‐Site Performance and Age‐Related Domain Shift
We evaluated within‐domain performance using both the dHCP and BCP datasets defined in Section 3.1, each containing 165 subjects. The subjects were split into 85/80 for training‐validation/testing. Age‐related variations were investigated exclusively in the dHCP dataset, as BCP data showed minimal age‐related variation (Lin, Gholipour, et al. 2024). For age‐specific analysis, we used the early‐ and late‐stage dHCP subsets, containing 105 subjects each, split 50/55 for training‐validation/testing.
For input signals, we used varying numbers of diffusion gradient directions $N$: {6, 15, 28, 45} for dHCP and {6, 12} for BCP, with normalization by the b0 image. The specific gradient directions for each $N$ were selected using the scheme proposed by Skare et al. (2000), which minimizes the condition number of the diffusion tensor reconstruction matrix. Ground truth fODFs were generated using MSMT‐CSD for both datasets. For dHCP, we conducted additional experiments using SS3T‐CSD‐generated fODFs from all 88 b1000 and 20 b0 measurements as an alternative training ground truth. For baseline comparisons, we performed in‐domain testing (training and testing on the same age group).
3.4. Inter‐Site Domain Shift
This analysis quantifies performance degradation when training on dHCP but testing on BCP, and vice versa. The mismatch includes different MRI scanners (Philips 3 T for dHCP, Siemens 3 T for BCP), acquisition protocols, and subject age ranges. We used 165 subjects from each dataset, split into 85/80 for training‐validation/testing. Both datasets were normalized by the b0 image, and we investigated $N \in \{6, 12\}$ input directions for each network, with gradient directions selected using the scheme proposed by Skare et al. (2000), comparing how domain shift impacts the fODF estimation results.
3.5. Domain Shift Mitigation
We evaluated two domain shift mitigation strategies when transferring models between the two sites (dHCP and BCP), as described in Section 2.4. Using {1, 2, 5, 10} target subjects, we tested both MoM harmonization, which transformed the input dMRI signals to match target domain statistics, and model fine‐tuning, which utilized both dMRI signals and ground truth fODFs to adapt the pre‐trained source model. As shown in Figure 1, we compared two inference scenarios: (1) applying the original model to harmonized target data and (2) using the fine‐tuned model on the original target data. These experiments quantified the effectiveness of each approach in improving fODF reconstruction quality across domains.
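The idea behind MoM harmonization, matching low-order statistical moments of the input signals across domains, can be sketched as a simple global mean/variance alignment. This is a deliberately simplified illustration under stated assumptions (global moments, linear rescaling); the study's actual per-domain implementation is not reproduced here, and all names are illustrative:

```python
import numpy as np

def mom_harmonize(signal_src, mean_tgt, var_tgt, eps=1e-8):
    """Method-of-Moments style harmonization sketch: linearly rescale source
    dMRI signals so their first two moments match target-domain statistics.
    ASSUMPTION: mean_tgt / var_tgt are estimated from a few target subjects."""
    mu_s = signal_src.mean()
    var_s = signal_src.var()
    scale = np.sqrt(var_tgt / (var_s + eps))
    return (signal_src - mu_s) * scale + mean_tgt

rng = np.random.default_rng(42)
src = rng.normal(0.8, 0.05, size=10000)  # toy b0-normalized source signals
out = mom_harmonize(src, mean_tgt=0.6, var_tgt=0.01**2)
print(round(out.mean(), 3), round(out.std(), 3))  # moments now match the target
```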
4. Results
We present our comprehensive evaluation results, with all performance differences validated through statistical testing: A total of 3633 statistical tests were performed across all experimental conditions, including 756 direct method‐to‐method comparisons for inter‐site experiments. These tests evaluated seven performance metrics (ARs for 1‐fiber and 2‐fiber voxels, ADs for 1‐fiber and 2‐fiber voxels, AFD, ACC, and GFA) across three DL architectures, multiple gradient direction counts, and two ground truth reconstruction methods (MSMT‐CSD and SS3T‐CSD).
4.1. Consistency of Reference fODF Reconstructions
Table 1 shows the consistency between fODFs reconstructed from GS1 and GS2 of each subject using classical reconstruction methods across five complementary metrics. Single‐fiber estimations show a good AR (92.4% for dHCP and 86.0% for BCP) for MSMT‐CSD, while two‐fiber estimations exhibit lower ARs of around 45% for both datasets, with corresponding ADs of approximately 16°–17°. SS3T‐CSD exhibits higher agreement for multiple fibers (55.9%) and lower agreement for single fibers (62.7%). As reported previously (Lin, Kebiri, et al. 2024), the proportion of multiple fibers predicted by SS3T‐CSD is closer to literature values than that of MSMT‐CSD (Jeurissen et al. 2013; Schilling et al. 2022). Moreover, the splitting results in each subset containing only 44 b1000 directions, which approaches the minimum number of gradient directions needed to adequately represent the 45 SH coefficients of an 8th‐order reconstruction (Tournier et al. 2013). Hence, increasing the amount of data available for generating the reference fODF reconstructions would likely improve the consistency between split‐dataset reconstructions.
TABLE 1.
Consistency of reference fODF reconstructions.

| Dataset | Model | $N$ | Single fiber AR (AD) | Two fiber AR (AD) | AFD diff. (MAPD) | ACC | GFA diff. |
|---|---|---|---|---|---|---|---|
| dHCP | MSMT‐CSD | 150 | 92.4% (6°) | 44.2% (16°) | — | — | — |
| dHCP | SS3T‐CSD | 54 | 62.7% (6°) | 55.9% (13°) | — | — | — |
| BCP | MSMT‐CSD | 75 | 86.0% (6°) | 45.5% (17°) | — | — | — |

Note: We compare fODFs generated from two nonoverlapping subsets of each subject's dMRI data (GS1 and GS2). The table reports single‐ and two‐fiber agreement rates (AR) and angular differences (AD), along with apparent fiber density (AFD) difference, angular correlation coefficient (ACC), and generalized fractional anisotropy (GFA) difference. Better consistency is indicated by higher AR and ACC, and lower AD, AFD difference, and GFA difference. $N$ represents the number of measurements in each split dataset.
For AFD differences, SS3T‐CSD achieves significantly lower values than MSMT‐CSD on both datasets, indicating better amplitude consistency between split‐dataset reconstructions. The ACC shows high overall fODF similarity, with MSMT‐CSD achieving the highest ACC values on both dHCP and BCP compared to SS3T‐CSD. GFA differences follow the same trend as ACC, with dHCP modeled with MSMT‐CSD benefiting the most from the high‐angular sampling scheme (150 measurements).
For all experiments, metrics related to voxels with 3‐fibers are not reported because of their low consistency (ARs: 26.0%–43.6% compared to 45%–92% for 1–2 fiber populations), confirming the known limitations of current dMRI approaches for complex crossing fiber configurations (Schilling et al. 2018), especially in early development when the anisotropy is low (Dubois et al. 2014).
To further quantify the consistency between these two reference methods, we conducted a direct comparison by applying both MSMT‐CSD and SS3T‐CSD reconstructions to the same dHCP subjects. We computed the average ACC between the two methods, along with the agreement in peak number detection, which shows substantial variation by fiber configuration: single‐fiber voxels show markedly higher agreement than two‐fiber voxels, and ADs between corresponding peaks are larger for two‐fiber populations than for single fibers. These differences confirm that the two reconstruction methods capture fundamentally different aspects of the underlying microstructure, with SS3T‐CSD demonstrating higher sensitivity to crossing fiber configurations and MSMT‐CSD providing more conservative estimates with higher single‐fiber stability.
It is important to note that the consistency values reported in this section represent the variability inherent in classical reconstruction methods when applied to subsampled data. Deep learning models trained on complete datasets may achieve ARs that exceed these split‐dataset consistency values, as they learn from full diffusion acquisitions and can leverage spatial context to compensate for reconstruction inconsistencies.
4.2. Intra‐Site Performance
Figure 2 provides qualitative examples, showing representative fODF reconstructions from each model architecture across different input configurations. For the dHCP dataset (Panel a), U‐Net predictions trained with varying numbers of input directions (6, 15, 28, 45) are compared against both MSMT‐CSD and SS3T‐CSD ground truth reconstructions. For the BCP dataset (Panel b), all three model architectures (MLP, CTtrack, U‐Net) are compared using 6 and 12 input directions.
FIGURE 2.

Qualitative examples of fiber orientation distribution function (fODF) estimation. (a) dHCP dataset: U‐Net predictions trained and tested using 6, 15, 28, and 45 diffusion gradient directions as input, with models trained using MSMT‐CSD (left) and SS3T‐CSD (right) reference reconstructions as ground truth. U‐Net is shown to demonstrate the effect of varying input directions on reconstruction quality for both ground truth types. (b) BCP dataset: MLP, CTtrack, and U‐Net predictions trained and tested using 6 and 12 diffusion gradient directions as input, with models trained using MSMT‐CSD reference reconstructions as ground truth. All three architectures are shown for comparison, while SS3T‐CSD is not included due to insufficient b1000 measurements (12) in BCP. Bottom rows show the corresponding ground truth (GT) reconstructions. Each model is trained on the respective number of input directions and evaluated on test subjects using the same input configuration. fODF glyphs are color‐coded by fiber orientation, visualized using MRtrix (Tournier et al. 2012).
4.2.1. Performance on the dHCP Dataset
We generally observe (Figure 3) similar performances of the different models for the AR and the AD when the number of input diffusion gradient directions is 15 or higher. For six directions, U‐Net models outperform in both metrics (AR and AD) for both MSMT‐ and SS3T‐CSD reference reconstructions serving as training GT. Statistical analysis using Wilcoxon signed–rank tests confirmed that U‐Net significantly outperformed both MLP and CTtrack architectures across multiple metrics. For example, at 6 directions with MSMT‐CSD ground truth, U‐Net showed significantly higher ARs than both alternatives for single‐fiber and two‐fiber populations. Similar significant improvements were observed for AD, AFD difference, ACC, and GFA difference across all tested direction counts. This can be explained by the large field of view (i.e., patches of 16³ voxels) of that method, which compensates for the small number of directions. This trend is reversed for higher input directions, where CTtrack and MLP score slightly higher, especially for SS3T‐CSD.
FIGURE 3.

Model performance comparison for intra‐site evaluation on dHCP dataset. Agreement rates (top panels) and angular differences (bottom panels) between predicted and reference fODFs for MSMT‐CSD (left) and SS3T‐CSD (right) ground truth reconstructions. Results shown for three deep learning architectures (MLP, CTtrack, U‐Net) across different input direction counts (6, 15, 28, 45), with separate analysis for single‐fiber (1‐fiber) and two‐fiber (2‐fiber) populations. SS3T‐CSD experiments are conducted only on dHCP data due to insufficient b1000 directions (only 12) in BCP. Higher agreement rates and lower angular differences indicate better performance. Each point represents individual test subjects, with horizontal lines showing median values.
Interestingly, and as can be observed qualitatively in Panel (a) of Figure 2 (shown for the case of U‐Net), 2‐fiber errors are significantly lower for SS3T‐CSD than for MSMT‐CSD. This is consistent with the superior ability of SS3T‐CSD to depict crossing fibers compared to MSMT‐CSD in newborns (Dhollander et al. 2019). This can be observed in Figure 4, where crossing fibers between the cortico‐spinal tract and the corpus callosum are clearly delineated for SS3T‐CSD, and the gray matter component is not overestimated as in MSMT‐CSD. For anatomical reference, we include the corresponding slice from the atlas by Pietsch et al. (2019).
FIGURE 4.

Qualitative comparison of coronal slices of the fODFs used as training GT for the learning‐based models, reconstructed with MSMT‐CSD and SS3T‐CSD, respectively, from the dMRI scan of a subject at 40 weeks PMA. The visualization presents both gray and white matter compartments overlaid on fractional anisotropy (FA) maps derived from the dMRI data. The reconstructions were generated and visualized using MRtrix (Tournier et al. 2012) and its fork MRtrix3Tissue (https://3Tissue.github.io). For anatomical reference, the rightmost image shows the corresponding slice from the brain atlas published by Pietsch et al. (2019). Inset boxes highlight detailed regions of the reconstructions, demonstrating the fiber orientation patterns in both compartments.
We also observe in Figure 3 that all methods exhibit performance saturation when the number of input diffusion gradient directions increases from 28 to 45. Performance improvements were statistically significant from 6 to 15 and from 15 to 28 directions for all methods, but showed divergent patterns from 28 to 45 directions, with most MSMT‐CSD configurations reaching a statistical plateau, while SS3T‐CSD maintained significant improvements across most metrics.
For the AFD difference (Figure S1), we make similar observations as for AR and AD regarding the edge of U‐Net over the other models at 6 directions, especially for SS3T‐CSD. In this case, however, the advantage persists across all input direction configurations. The ACC analysis (Figure S2) shows U‐Net achieving the highest values across most configurations, reaching approximately 0.90 for 6 directions with SS3T‐CSD ground truth compared to 0.83–0.85 for MLP and CTtrack. Similarly, the GFA difference analysis (Figure S3) reveals U‐Net achieving the lowest GFA differences at 6 directions for both ground truth methods, demonstrating superior anisotropy preservation in a low‐data scenario.
4.2.2. Performance on the BCP Dataset
We observe (Figure 5) a superior performance of U‐Net compared to the other methods, more pronounced for two‐fiber populations, for both the AR in the number of peaks and their AD, as also qualitatively shown in Figure 2 (right column). These performance differences were statistically significant (Wilcoxon signed–rank test). For instance, at 6 directions, U‐Net showed significantly higher ARs and lower angular errors for both single‐fiber and two‐fiber populations compared to MLP and CTtrack. A similar observation can be made for the AFD difference (Figure S1, right panel). Moreover, we notice a significant improvement when going from 6 to 12 directions. ACC analysis (Figure S2, right panel) shows U‐Net reaching ~0.96 versus ~0.90 for the other methods, whereas GFA differences (Figure S3) remain consistently lowest for U‐Net.
FIGURE 5.

Model performance comparison for intra‐site evaluation on BCP dataset using MSMT‐CSD ground truth. Agreement rates (top) and angular differences (bottom) between predicted and reference fODFs for single‐fiber (left) and two‐fiber (right) populations. Results shown for three deep learning architectures (MLP, CTtrack, U‐Net) with 6 and 12 input diffusion directions. The BCP dataset contains older subjects (2–24 months) compared to dHCP neonates. Each point represents individual test subjects, with horizontal lines showing median values.
4.3. Impact of Age‐Related Domain Shift in dHCP
Our analysis included 1008 statistical tests examining both between‐group differences (Mann–Whitney U) and training effects (Wilcoxon signed–rank) across all age configurations and metrics.
The AR (Figure 6a) and AD (Figure 6b) analyses reveal asymmetric age‐related domain shifts. Training on late‐stage subjects (41–45 weeks) and testing on early subjects (33–38 weeks) shows minimal performance degradation, while the reverse direction (early → late) exhibits substantial domain shift, particularly for two‐fiber populations. Age‐related domain shifts showed significant effects across all methods and metrics (Mann–Whitney U test). Models trained and tested on matched age groups (early → early, late → late) consistently outperformed cross‐age scenarios (early → late, late → early), with early‐to‐late generalization showing more performance degradation than late‐to‐early transfer, confirming the presence of age‐related domain shifts even within the narrow postmenstrual age range studied. SS3T‐CSD demonstrates greater robustness to these age‐related variations compared to MSMT‐CSD across all experimental configurations.
FIGURE 6.

Age‐related domain shift analysis within dHCP dataset (33–45 weeks of age). Agreement rates (Panel a) and angular differences (Panel b) between predicted and reference fODFs across four experimental conditions: early → early (train/test on 33–38 weeks), late → late (train/test on 41–45 weeks), early → late (train on early, test on late), and late → early (train on late, test on early). Results shown for MSMT‐CSD and SS3T‐CSD ground truth with varying input directions (6, 15, 28, 45) and all three architectures (MLP, CTtrack, U‐Net). SS3T‐CSD experiments are conducted only on dHCP data due to insufficient b1000 directions (only 12) in BCP. Each point represents individual test subjects, with horizontal lines showing median values. Single‐fiber (1‐fiber) and two‐fiber (2‐fiber) populations demonstrate different sensitivities to age‐related domain shifts.
Regarding model performance, U‐Net maintains superiority with 6 input directions across all age configurations, whereas MLP and CTtrack perform better at higher numbers of directions (28–45), especially on the SS3T‐CSD GT. Notably, MLP shows continued improvement from 28 to 45 directions, particularly with SS3T‐CSD ground truth, unlike the other models, which plateau. The AFD analysis (Figure S4) confirms these patterns: the early → late transfer shows significantly higher AFD errors compared to other age configurations, while SS3T‐CSD maintains lower domain shift sensitivity, as for the AR and AD metrics. Additional ACC and GFA analyses in Figures S5 and S6 further confirm SS3T‐CSD GT's superior robustness, maintaining stable ACC and GFA values across all age transfers.
4.4. Evaluating Domain Shift Mitigation
Domain adaptation effectiveness was evaluated through 2268 statistical tests: 1512 comparing adaptation strategies and target subject progressions, plus 756 direct method‐to‐method comparisons validating architecture‐specific performance.
Across both transfer directions, domain adaptation strategies demonstrated high effectiveness. Statistical analysis confirmed that MoM achieved significant improvements compared to baseline (no adaptation) in 325 out of 336 configurations (96.7%) and fine‐tuning showed significant improvements compared to baseline in 324 out of 336 configurations (96.4%) across all three architectures (Wilcoxon signed–rank test, ).
4.4.1. Trained on dHCP and Tested on BCP
We generally observe (Figure 7) that the single‐fiber configuration does not benefit as much as the two‐fiber configuration from an increased number of target subjects for domain shift attenuation, for all models and for both MoM and fine‐tuning. Likewise, fine‐tuning benefits more from the number of target subjects than MoM: in the two‐fiber configuration, the more target subjects we add, the more fine‐tuning outperforms MoM. For dHCP → BCP transfer, statistical analysis confirmed MoM effectiveness in 164 out of 168 configurations (97.6%) and fine‐tuning effectiveness in 160 out of 168 configurations (95.2%) across all three architectures (Wilcoxon signed–rank test). For single‐fiber populations, MoM is slightly better than or equal to fine‐tuning in performance, especially for the AD metric. Except for the single‐fiber configuration in AR, where MLP outperforms the other methods, we observe that U‐Net is generally the best model in domain shift attenuation for both MoM and fine‐tuning (Wilcoxon signed–rank test: U‐Net significantly outperformed MLP and CTtrack across 89% of all comparisons; specifically for angular error (97% wins), AR in 2‐fiber populations (99% wins), ACC (97% wins), GFA (93% wins), and AFD (92% wins)). When going from 6 to 12 input diffusion gradient directions, no noticeable trends were observed beyond a global performance improvement, with the MLP benefiting the most from the increase in input samples (6‐direction results are provided in Figures S8, S10, and S11).
FIGURE 7.

Domain adaptation performance for dHCP‐trained models tested on BCP target domain. Agreement rates (Panel a) and angular differences (Panel b) between predicted and reference fODFs using increasing numbers of BCP target subjects (1, 2, 5, 10) for adaptation. Results compare three adaptation strategies: baseline (no adaptation), Method of Moments (MoM) harmonization, and fine‐tuning across all three architectures (MLP, CTtrack, U‐Net) with 12 input directions. Box plots show distribution across test subjects, representing cross‐dataset transfer from neonatal (dHCP, 33–45 weeks of age) to baby (BCP, 2–24 months postnatal) populations. Single‐fiber (1‐fiber) and two‐fiber (2‐fiber) populations demonstrate different adaptation characteristics (6‐direction results are provided in Figure S8).
For AFD difference (Figure S7), fine‐tuning always surpasses MoM, with no significant improvement with the number of target subjects. This suggests that with a single subject, the distribution of the target AFD can be learned.
These findings are further confirmed by complementary quantitative analyses of amplitude and anisotropy characteristics (Figure 9). The ACC analysis (Figure 9a) reveals that fine‐tuning achieves substantially higher global fODF similarity compared to both baseline and MoM approaches, with ACC values increasing from baseline ranges of 0.75–0.80 to above 0.80 for all models, and U‐Net approaching 0.90 (for 6 directions). The GFA error also shows a similar pattern where fine‐tuning generally surpasses the MoM, with U‐Net consistently achieving the lowest GFA differences and lowest standard deviations, and benefiting most from fine‐tuning, demonstrating that its spatial context modeling is particularly effective during domain transfer.
FIGURE 9.

Quantitative metrics analysis for domain adaptation experiments. ACC values (Panel a) computed as correlation between predicted and reference fiber orientations and GFA differences (Panel b) computed as absolute difference between predicted and reference anisotropy values for dHCP‐trained models tested on BCP target domain (left panels) and BCP‐trained models tested on dHCP target domain (right panels). Results compare three adaptation strategies: baseline (no adaptation), Method of Moments (MoM) harmonization, and fine‐tuning using increasing numbers of target domain subjects (1, 2, 5, 10) across all three architectures (MLP, CTtrack, U‐Net) with 12 input directions. Box plots show distribution across test subjects, representing cross‐dataset transfer between neonatal and baby populations. Fine‐tuning consistently achieves higher ACC values compared to baseline and MoM, with U‐Net demonstrating the most stable fiber orientation correlations across domain adaptation scenarios. U‐Net consistently achieves the lowest GFA differences and benefits most from fine‐tuning (6‐direction results are provided in Figures S10 and S11).
4.4.2. Trained on BCP and Tested on dHCP
We similarly observe that fine‐tuning benefits more from increasing the number of target subjects than MoM for the AD and the AR, particularly for the AD. However, this increase is not very pronounced and only starts to show at 10 subjects, particularly for U‐Net in the case of two‐fiber populations. Fine‐tuning generally outperforms MoM, but not in all cases: for AR in two‐fiber populations, MLP generally performs better with MoM than with fine‐tuning, and U‐Net is better when the number of target subjects is five or fewer. U‐Net is generally the best model for addressing domain shifts (Wilcoxon signed–rank test across multiple metrics: angular error 90% wins, AR 2‐fiber 89% wins, ACC 97% wins, GFA 86% wins), and it improves more with the number of target subjects and less with the number of gradient directions, the opposite of MLP and CTtrack. Statistical analysis confirmed MoM effectiveness in 161 out of 168 configurations (95.8%) and fine‐tuning effectiveness in 164 out of 168 configurations (97.6%) for BCP → dHCP transfer across all three architectures (Wilcoxon signed–rank test).
For AFD difference (Figure S7, Panel b), fine‐tuning always surpasses MoM, with no significant improvement with the number of target subjects up to five, as observed for AR and AD. The overall domain adaptation patterns for AR and AD in the BCP → dHCP transfer are illustrated in Figure 8, with complementary ACC and GFA analyses shown in Figure 9a,b (6‐direction results are provided in Figures S9–S11).
FIGURE 8.

Domain adaptation performance for BCP‐trained models tested on dHCP target domain. Agreement rates (Panel a) and angular differences (Panel b) between predicted and reference fODFs using increasing numbers of dHCP target subjects (1, 2, 5, 10) for adaptation. Results compare three adaptation strategies: baseline (no adaptation), Method of Moments (MoM) harmonization, and fine‐tuning across all three architectures (MLP, CTtrack, U‐Net) with 12 input directions. Box plots show distribution across test subjects, representing cross‐dataset transfer from baby (BCP, 2–24 months postnatal) to neonatal (dHCP, 33–45 weeks of age) populations. Single‐fiber (1‐fiber) and two‐fiber (2‐fiber) populations demonstrate different adaptation characteristics (6‐direction results are provided in Figure S9).
The ACC analysis reveals that BCP → dHCP transfer presents substantially greater challenges for maintaining fiber orientation correlations than the reverse direction. Baseline ACC values are lower (0.74–0.79 for 6 directions) than for dHCP → BCP transfer (0.77–0.83 for 6 directions), reflecting the biological complexity of predicting neonatal brain microstructure from baby‐trained models. Fine‐tuning provides critical improvements, elevating ACC values to 0.78–0.88 across all architectures. U‐Net demonstrates the most stable performance with the highest ACC values, consistently increasing with the number of fine‐tuning subjects, followed by MLP and finally CTtrack.
GFA difference analysis further emphasizes the challenge of the BCP‐to‐dHCP domain shift. Baseline GFA differences are dramatically elevated, reaching 0.40–0.45 for MLP and 0.20–0.25 for CTtrack and U‐Net, indicating substantial anisotropy estimation errors when applying baby‐trained models to neonatal data. Fine‐tuning provides essential corrections, reducing GFA differences to 0.15–0.20 for MLP and 0.08–0.15 for U‐Net, though these remain higher than in the reverse transfer direction. This pattern reflects the inherent difficulty of predicting the rapid microstructural changes occurring during early brain development, where neonatal tissue properties differ substantially from the more mature patterns captured in baby training data, as also demonstrated previously (Lin, Gholipour, et al. 2024). The consistent superiority of U‐Net across both ACC and GFA metrics reinforces its robustness in challenging domain adaptation scenarios involving developmental populations.
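The GFA referenced here is Tuch's generalized fractional anisotropy: the standard deviation of the ODF samples normalized by their root mean square, ranging from 0 (isotropic) to 1 (a single sharp peak). A small sketch (the sampling grid and names are illustrative):

```python
import numpy as np

def gfa(odf_samples):
    """Generalized fractional anisotropy (Tuch, 2004):
    std of the ODF samples over their root mean square."""
    psi = np.asarray(odf_samples, float)
    n = psi.size
    denom = (n - 1) * np.sum(psi ** 2)
    if denom == 0:
        return 0.0
    return float(np.sqrt(n * np.sum((psi - psi.mean()) ** 2) / denom))

iso = np.ones(64)          # isotropic ODF -> GFA 0
peak = np.zeros(64)
peak[0] = 1.0              # single sharp peak -> GFA 1
gfa_difference = abs(gfa(peak) - gfa(iso))
```

The per‐voxel "GFA difference" reported above is then simply the absolute difference between the GFA of the predicted and reference fODFs.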
5. Discussion
In this study, we extensively investigated the performance and robustness of different deep‐learning models on dMRI‐derived fODFs. We conducted intra‐site experiments on two datasets, the developing dHCP and the BCP, and inter‐site experiments to evaluate domain shift attenuation techniques. Specifically, we examined the effect of the number of input diffusion gradient directions, the influence of different ground truth configurations (MSMT‐ and SS3T‐CSD), and age domain shift on model performance and employed two domain adaptation strategies: the MoM and fine‐tuning.
5.1. Architecture‐Specific Performance
Our intra‐site experiments revealed that U‐Net consistently outperformed the other models when fewer diffusion directions were used, particularly with the dHCP dataset and the SS3T‐CSD ground truth. However, with an increased number of directions, MLP and CTtrack showed marginally but statistically significantly better performance.
The intra‐site experiments also show a performance plateau across all models from 28 to 45 directions, suggesting diminishing returns beyond a certain number of input directions and, hence, an opportunity to reduce scanning time relative to non‐deep‐learning methods. This finding has direct implications for clinical scanning protocols, as it indicates potential for significant scan time reduction without compromising fODF reconstruction quality. For SS3T‐CSD in the same intra‐site experimental settings, this performance is acceptable for both single‐ and two‐fiber cases, reaching around 75% in AR and around 3° in AD.
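As a reminder of how an AD value like "around 3°" is computed, predicted and reference peak directions are compared up to antipodal symmetry, since fibers have no polarity. A minimal sketch with illustrative inputs:

```python
import numpy as np

def angular_difference_deg(u, v):
    """Angle in degrees between two fiber directions,
    treating antipodal vectors (u and -u) as identical."""
    u = np.asarray(u, float)
    v = np.asarray(v, float)
    c = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))

ad_same = angular_difference_deg([1, 0, 0], [-1, 0, 0])  # antipodal: 0 degrees
ad_45 = angular_difference_deg([1, 0, 0], [1, 1, 0])     # 45 degrees
```

For multi‐fiber voxels, predicted peaks are typically matched to the closest reference peaks before averaging such angles.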
In the intra‐site BCP experiments, U‐Net maintained its edge, particularly for 2‐fiber populations, with less noticeable improvement when increasing from 6 to 12 directions. This could be attributed to the more consistent anatomical structures in older infants compared to the neonatal cohort of dHCP. The robustness of U‐Net's performance is also reflected in ACC and GFA metrics, particularly in low‐direction regimes where spatial context modeling proves most valuable.
5.2. Impact of Reference Reconstruction Methods
While architectural differences explain much of the performance variation, the choice of reference reconstruction method also proved influential across all architectures: using SS3T‐CSD as the reference yielded statistically significantly lower ADs for 2‐fiber populations, with a more pronounced improvement as the number of input measurements grew, highlighting its efficacy in capturing crossing fibers in developing brains compared to MSMT‐CSD (Dhollander et al. 2019). This is also confirmed by the lower ground‐truth consistency of MSMT‐CSD relative to SS3T‐CSD in crossing fibers, despite the former benefiting from more diffusion measurements.
These findings underscore a fundamental limitation: the performance of DL‐based fODF estimation models is bounded by the quality and consistency of the reference data, particularly in crossing‐fiber voxels. The challenges that current dMRI models, including MSMT‐CSD and SS3T‐CSD, face in accurately estimating multiple fiber populations and low‐angle crossing fibers have been shown by Schilling et al. (2018) and confirmed in our algorithm‐consistency comparison experiment. In fact, the performance saturation at around 28 input directions is likely not a neural network limit but a model‐data limit: the more diverse and consistent the data, the more crossing fibers and crossing angles can be resolved, even from as few as 6 directions, as shown in our experiments, bypassing theoretical limits.
These inherent limitations of CSD‐based reference reconstructions also explain why direct comparison with unsupervised methods would be methodologically problematic. Our supervised models are explicitly trained to reproduce CSD reconstructions, while unsupervised methods learn alternative representations that may better capture the underlying fiber architecture.
5.3. Domain Shift Characteristics and Mitigation Strategies
Age‐related experiments in dHCP demonstrated reduced domain shifts when training on later age groups, with SS3T‐CSD proving more robust to age‐related variability than MSMT‐CSD. The consistent statistical significance of age‐related effects within narrow age windows demonstrates that even subtle developmental changes create measurable domain shifts, with practical implications for age‐specific model calibration in clinical deployment.
Importantly, the magnitude of these age‐related domain shifts within dHCP (33–45 weeks PMA) is substantially smaller than that of the inter‐site domain shifts, reflecting the distinction between subtle biological developmental changes and more substantial combined biological and technical shifts. Our inter‐site experiments (dHCP vs. BCP) in fact encompass much larger age‐related variations, spanning from neonates (33–45 weeks PMA) to babies (2–24 months), together with differences in scanners and acquisition protocols. This combination of biological and technical domain shifts in the inter‐site scenario represents the most challenging and practically relevant deployment conditions. The substantial performance degradation observed in these cross‐site scenarios motivated the evaluation of practical mitigation strategies. Indeed, our results highlighted that estimating neonatal brain microstructure from baby‐trained models is more difficult than the reverse transfer. For domain shift attenuation, fine‐tuning generally surpassed MoM, particularly as the number of target subjects increased. U‐Net emerged as the most reliable model for domain adaptation across different scenarios, with its performance benefiting more from an increase in target subjects than from additional gradient directions. For most configurations, fine‐tuning with five target subjects yielded satisfactory results, as also demonstrated in other applications such as semantic segmentation (Lhermitte et al. 2024; Zalevskyi et al. 2025). The choice of limited target subjects (1–10) reflects both established practices in medical imaging domain adaptation (Ghafoorian et al. 2017; Valverde et al. 2019), where effective adaptation has been demonstrated with minimal target data, and practical constraints in clinical deployment scenarios, where acquiring sufficient dMRI data for reliable CSD‐based reference reconstruction is time‐consuming and resource‐intensive (Calamuneri et al. 2018; Golkov et al. 2016; Tournier et al. 2013). The success of such adaptation, however, depends on the consistency and variability of the few target subjects. Future work can focus on optimizing the choice of these target subjects and on how synthetic data can help compensate for limited real data.
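The MoM harmonization compared against fine‐tuning works by matching low‐order signal moments between sites. The following is a deliberately simplified, site‐level sketch of that idea only; the actual method (Huynh et al. 2019) matches spherical moments per shell and per tissue region, and all names and data here are illustrative:

```python
import numpy as np

def mom_linear_map(src_mean, src_var, tgt_mean, tgt_var):
    """Method-of-Moments style linear map S' = a*S + b that makes the
    target site's first two signal moments match the source site's
    (a simplified sketch of moment matching, not the full per-shell method)."""
    a = np.sqrt(src_var / tgt_var)
    b = src_mean - a * tgt_mean
    return a, b

# Toy example: target-site signals with a different mean and variance.
rng = np.random.default_rng(1)
tgt = rng.normal(200.0, 30.0, size=10_000)
a, b = mom_linear_map(src_mean=150.0, src_var=20.0 ** 2,
                      tgt_mean=tgt.mean(), tgt_var=tgt.var())
harmonized = a * tgt + b  # now has mean ~150 and std ~20, like the source site
```

Because the mapping is estimated from summary statistics, MoM needs no retraining, which is why it remains attractive when even a handful of target subjects is hard to acquire.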
5.4. Limitations and Future Directions
It is important to note that obtaining absolute ground truth fiber orientations remains an open challenge in the field, as it would require extensive histological validation, which is rarely available, especially in developing brains. Classical reconstruction methods (Dhollander and Connelly 2016; Jeurissen et al. 2014; Tournier et al. 2004) serve as practical reference standards, despite being approximations of the underlying anatomy. These methods, based on physical models and mathematical constraints, provide reasonable estimates that have been validated through various indirect means, including anatomical studies and phantom experiments.
Future work can also attempt to merge reference fODF reconstruction algorithms in a way that leverages each of their advantages and potentially overcomes individual method constraints. Additionally, extending domain adaptation to cross‐shell scenarios, where models trained on one b‐value are deployed on datasets with different b‐values, represents a promising research direction: prior work has highlighted both the challenges of multi‐shell b‐value generalizability (Nath, Lyu, et al. 2019) and the complexity of cross‐shell transformations (Dugan and Carmichael 2023). Such approaches would, however, require specialized frameworks to handle the distinct microstructural sensitivities captured by different b‐values and the associated cross‐shell domain shifts. The practical challenge of limited target data availability could be addressed through optimized selection of target subjects for fine‐tuning and through synthetic data generation to compensate for real data limitations. Future work could also explore controlled studies that systematically disentangle biological developmental changes from technical domain shifts in larger pediatric cohorts.
Domain adaptation is a promising strategy to foster model generalization in medical imaging, and the many methods proposed recently (Guan and Liu 2021) increase the likelihood of deployment in clinical settings. Some challenges in fODF estimation, such as those linked to spatial resolution, have recently been addressed by implicit neural representations (Dwedari et al. 2024), and others related to the broad heterogeneity of the input acquisition (gradient directions, b‐values, scanner type, etc.) by Ewert et al. (2024), who used an encoder‐decoder framework to learn latent representations of the signal and the b‐vectors. The convergence of these diverse methodological advances with our findings on architecture‐specific robustness and practical domain adaptation strategies strengthens the path toward reliable clinical deployment.
6. Conclusion
Although we employed well‐established deep learning architectures (MLP, CNN‐Transformer, and U‐Net), our contribution lies in the rigorous evaluation of their robustness under realistic clinical deployment scenarios. The systematic domain shift analysis revealed architecture‐specific properties that were previously unknown, demonstrating that spatial context models (U‐Net) excel with limited data while direct mapping approaches (MLP) benefit more from increased input dimensionality. These findings have direct implications for clinical practice, where scanning protocols, patient populations, and hardware configurations vary significantly across institutions.
Overall, our study highlights the potential of deep learning for fODF estimation in developing brains while underscoring key challenges related to domain shifts. While domain adaptation techniques like MoM and fine‐tuning offer promising solutions, further research is needed to refine these methods, particularly in selecting optimal target subjects and improving the reconstruction models used to generate the reference fODFs. Ultimately, improving data consistency and model robustness will be crucial for translating these models into real‐world clinical applications.
Ethics Statement
This retrospective research study used open‐source human subject data from the Developing Human Connectome Project and the Baby Connectome Project, for which ethical approval was not required per the data licenses.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Data S1: hbm70367‐sup‐0001‐Figures.pdf.
Acknowledgments
We acknowledge the CIBM Center for Biomedical Imaging for providing expertise and resources to conduct this study. This research was supported by grants from the Swiss National Science Foundation (grants: 182602 and 215641); the US National Institutes of Health, including awards from the National Institute of Neurological Disorders and Stroke (R01NS128281) and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (R01HD110772); and the National Natural Science Foundation of China (grant: 62472315). The views and opinions expressed in this work are solely those of the authors and do not necessarily reflect the official policy or position of the funding agencies. The authors would like to thank Dr. Khoi Minh Huynh at the University of North Carolina at Chapel Hill for discussion and code sharing on the Method of Moments, and Hakim Ouaalam at Boston Children's Hospital and Harvard Medical School for preprocessing BCP T2‐weighted images. We also thank Dr. Erick J. Canales‐Rodriguez at EPFL, CIBM, and CHUV, and Dr. Yasser Alemán‐Gómez at CHUV for their support and discussion during the preparation phase of this work. Finally, we thank Anmin Liu at Tongji University for his logistical support. Open access publishing facilitated by Universite de Lausanne, as part of the Wiley ‐ Universite de Lausanne agreement via the Consortium Of Swiss Academic Libraries.
Lin, R., Kebiri H., Gholipour A., et al. 2025. “Deep Learning for fODF Estimation in Infant Brains: Model Comparison, Ground‐Truth Impact, and Domain Shift Mitigation.” Human Brain Mapping 46, no. 14: e70367. 10.1002/hbm.70367.
Funding: This work was supported by the Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (182602, 215641), National Institute of Neurological Disorders and Stroke (R01NS128281), National Natural Science Foundation of China (62472315), Eunice Kennedy Shriver National Institute of Child Health and Human Development (R01HD110772), and US National Institutes of Health.
Rizhong Lin and Hamza Kebiri contributed equally to this work.
Data Availability Statement
The analyzed datasets were publicly available: Developing Human Connectome Project (dHCP): https://www.developingconnectome.org/data‐release/third‐data‐release/. Baby Connectome Project (BCP): https://www.humanconnectome.org/study/lifespan‐baby‐connectome‐project/. The code is publicly available at https://github.com/Medical‐Image‐Analysis‐Laboratory/dl_fiber_domain_shift.
References
- Abadi, M. , Agarwal A., Barham P., et al. 2015. “TensorFlow: Large‐Scale Machine Learning on Heterogeneous Systems.” https://www.tensorflow.org/.
- Agarap, A. F. 2018. “Deep Learning Using Rectified Linear Units (ReLU).” arXiv Preprint arXiv:1803.08375.
- Alexander, D. C. , Zikic D., Zhang J., Zhang H., and Criminisi A.. 2014. Image Quality Transfer via Random Forest Regression: Applications in Diffusion MRI Medical Image Computing and Computer‐Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14–18, 2014. Proceedings Part III 17, 225–232. [DOI] [PubMed]
- Anderson, A. W. 2005. “Measurement of Fiber Orientation Distributions Using High Angular Resolution Diffusion Imaging.” Magnetic Resonance in Medicine 54, no. 5: 1194–1206. 10.1002/mrm.20667. [DOI] [PubMed] [Google Scholar]
- Andersson, J. L. R. , and Sotiropoulos S. N.. 2016. “An Integrated Approach to Correction for Off‐Resonance Effects and Subject Movement in Diffusion MR Imaging.” NeuroImage 125: 1063–1078. 10.1016/j.neuroimage.2015.10.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ba, J. L. , Kiros J. R., and Hinton G. E.. 2016. “Layer Normalization.” arXiv Preprint arXiv:1607.06450.
- Bartlett, J. , Davey C., Johnston L., and Duan J.. 2023. “Recovering High‐Quality FODs From a Reduced Number of Diffusion‐Weighted Images Using a Model‐Driven Deep Learning Architecture.” arXiv Preprint arXiv:2307.15273. [DOI] [PubMed]
- Bashyam, V. M. , Doshi J., Erus G., et al. 2022. “Deep Generative Medical Image Harmonization for Improving Cross‐Site Generalization in Deep Learning Predictors.” Journal of Magnetic Resonance Imaging 55, no. 3: 908–916. 10.1002/jmri.27908. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Basser, P. J. , Mattiello J., and LeBihan D.. 1994. “MR Diffusion Tensor Spectroscopy and Imaging.” Biophysical Journal 66, no. 1: 259–267. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bento, M. , Fantini I., Park J., Rittner L., and Frayne R.. 2022. “Deep Learning in Large and Multi‐Site Structural Brain MR Imaging Datasets.” Frontiers in Neuroinformatics 15: 805669. 10.3389/fninf.2021.805669. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bhat, S. , Acharya U., Adeli H., Bairy G., and Adeli A.. 2014. “Autism: Cause Factors, Early Diagnosis and Therapies.” Reviews in the Neurosciences 25, no. 6: 841–850. 10.1515/revneuro-2014-0056. [DOI] [PubMed] [Google Scholar]
- Billot, B. , Magdamo C., Cheng Y., Arnold S. E., Das S., and Iglesias J. E.. 2023. “Robust Machine Learning Segmentation for Large‐Scale Analysis of Heterogeneous Clinical Brain MRI Datasets.” Proceedings of the National Academy of Sciences of the United States of America 120, no. 9: e2216399120. 10.1073/pnas.2216399120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brun, A. , and Englund E.. 1986. “A White Matter Disorder in Dementia of the Alzheimer Type: A Pathoanatomical Study.” Annals of Neurology 19, no. 3: 253–262. [DOI] [PubMed] [Google Scholar]
- Button, K. S. , Ioannidis J. P., Mokrysz C., et al. 2013. “Power Failure: Why Small Sample Size Undermines the Reliability of Neuroscience.” Nature Reviews Neuroscience 14, no. 5: 365–376. [DOI] [PubMed] [Google Scholar]
- Calamuneri, A. , Arrigo A., Mormina E., et al. 2018. “White Matter Tissue Quantification at Low b‐Values Within Constrained Spherical Deconvolution Framework.” Frontiers in Neurology 9: 716. 10.3389/fneur.2018.00716. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cardoso, M. J. , Li W., Brown R., et al. 2022. “MONAI: An Open‐Source Framework for Deep Learning in Healthcare.” arXiv Preprint arXiv:2211.02701. http://arxiv.org/abs/2211.02701.
- Cetin Karayumak, S. , Bouix S., Ning L., et al. 2019. “Retrospective Harmonization of Multi‐Site Diffusion MRI Data Acquired With Different Acquisition Parameters.” NeuroImage 184: 180–200. 10.1016/j.neuroimage.2018.08.073. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cetin‐Karayumak, S. , Zhang F., Zurrin R., et al. 2024. “Harmonized Diffusion MRI Data and White Matter Measures From the Adolescent Brain Cognitive Development Study.” Scientific Data 11, no. 1: 249. 10.1038/s41597-024-03058-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Christiaens, D. , Cordero‐Grande L., Pietsch M., et al. 2021. “Scattered Slice SHARD Reconstruction for Motion Correction in Multi‐Shell Diffusion MRI.” NeuroImage 225: 117437. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Consagra, W. , Ning L., and Rathi Y.. 2024. “Neural Orientation Distribution Fields for Estimation and Uncertainty Quantification in Diffusion MRI.” Medical Image Analysis 93: 103105. 10.1016/j.media.2024.103105. [DOI] [PubMed] [Google Scholar]
- Coupe, P. , Catheline G., Lanuza E., and Manjon J. V.. 2013. “Motion‐Compensation Techniques in Neonatal and Fetal MR Imaging.” American Journal of Neuroradiology 34, no. 6: 1124–1138. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Da Silva, M. O. , Santana C. P., Do Carmo D. S., and Rittner L.. 2024. “FOD‐Swin‐Net: Angular Super Resolution of Fiber Orientation Distribution Using a Transformer‐Based Deep Model.” In 2024 IEEE International Symposium on Biomedical Imaging (ISBI), 1–5. IEEE. 10.1109/ISBI56570.2024.10635460. [DOI] [Google Scholar]
- Davis, K. L. , Stewart D. G., Friedman J. I., et al. 2003. “White Matter Changes in Schizophrenia: Evidence for Myelin‐Related Dysfunction.” Archives of General Psychiatry 60, no. 5: 443–456. [DOI] [PubMed] [Google Scholar]
- Descoteaux, M. , Deriche R., Le Bihan D., Mangin J.‐F., and Poupon C.. 2011. “Multiple Q‐Shell Diffusion Propagator Imaging.” Medical Image Analysis 15, no. 4: 603–621. [DOI] [PubMed] [Google Scholar]
- Dhollander, T. , and Connelly A.. 2016. “A Novel Iterative Approach to Reap the Benefits of Multi‐Tissue CSD From Just Single‐Shell (+b=0) Diffusion MRI Data.” Proc ISMRM 24: 3010. [Google Scholar]
- Dhollander, T. , Mito R., Raffelt D., and Connelly A.. 2019. “Improved White Matter Response Function Estimation for 3‐Tissue Constrained Spherical Deconvolution.” Proceedings of the International Society for Magnetic Resonance in Medicine 555, no. 10: 107. [Google Scholar]
- Dhollander, T. , Raffelt D., and Connelly A.. 2016. “Unsupervised 3‐Tissue Response Function Estimation From Single‐Shell or Multi‐Shell Diffusion MR Data Without a Co‐Registered T1 Image.” ISMRM Workshop on Breaking the Barriers of Diffusion MRI 5, no. 5: 1. [Google Scholar]
- Dong, X. , Yang Z., Peng J., and Wu X.. 2019. “Multimodality White Matter Tract Segmentation Using CNN.” Proceedings of the ACM Turing Celebration Conference‐China, 1–8.
- Dubois, J. , Dehaene‐Lambertz G., Kulikova S., Poupon C., Hüppi P. S., and Hertz‐Pannier L.. 2014. “The Early Development of Brain White Matter: A Review of Imaging Studies in Fetuses, Newborns and Infants.” Neuroscience 276: 48–71. 10.1016/j.neuroscience.2013.12.044. [DOI] [PubMed] [Google Scholar]
- Dugan, R. , and Carmichael O.. 2023. “Multi‐Shell dMRI Estimation From Single‐Shell Data via Deep Learning.” In Machine Learning in Clinical Neuroimaging, edited by Abdulkadir A., Bathula D. R., Dvornek N. C., et al., 14–22. Springer Nature. 10.1007/978-3-031-44858-4_2. [DOI] [Google Scholar]
- Dunn, O. J. 1961. “Multiple Comparisons Among Means.” Journal of the American Statistical Association 56: 52–64. [Google Scholar]
- Dwedari, M. M. , Consagra W., Müller P., Turgut Ö., Rueckert D., and Rathi Y.. 2024. “Estimating Neural Orientation Distribution Fields on High Resolution Diffusion MRI Scans.” In International Conference on Medical Image Computing and Computer‐Assisted Intervention, 307–317. Springer Nature. [Google Scholar]
- Edwards, A. D. , Rueckert D., Smith S. M., et al. 2022. “The Developing Human Connectome Project Neonatal Data Release.” Frontiers in Neuroscience 16: 886772. 10.3389/fnins.2022.886772. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Elaldi, A. , Dey N., Kim H., and Gerig G.. 2021. “Equivariant Spherical Deconvolution: Learning Sparse Orientation Distribution Functions From Spherical Data.” In Information Processing in Medical Imaging, edited by Feragen A., Sommer S., Schnabel J., and Nielsen M., 267–278. Springer International Publishing. 10.1007/978-3-030-78191-0_21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Elaldi, A. , Gerig G., and Dey N.. 2024. “E(3) × SO(3)‐Equivariant Networks for Spherical Deconvolution in Diffusion MRI.” Medical Imaging With Deep Learning, 301–319. https://proceedings.mlr.press/v227/elaldi24a.html. [PMC free article] [PubMed]
- Elaldi, A. , Gerig G., and Dey N.. 2025. “Equivariant Spatio‐Hemispherical Networks for Diffusion MRI Deconvolution.” Advances in Neural Information Processing Systems 37: 52095–52126. [Google Scholar]
- Ewert, C. , Kügler D., Stirnberg R., Koch A., Yendiki A., and Reuter M.. 2024. “Geometric Deep Learning for Diffusion MRI Signal Reconstruction With Continuous Samplings (DISCUS).” Imaging Neuroscience 2: 1–18. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Falcon, W. , and The PyTorch Lightning Team . 2024. “PyTorch Lightning.” 10.5281/zenodo.11644096. [DOI]
- Fortin, J.‐P. , Parker D., Tunç B., et al. 2017. “Harmonization of Multi‐Site Diffusion Tensor Imaging Data.” NeuroImage 161: 149–170. 10.1016/j.neuroimage.2017.08.047. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ganin, Y. , Ustinova E., Ajakan H., et al. 2016. “Domain‐Adversarial Training of Neural Networks.” Journal of Machine Learning Research 17, no. 59: 1–35. [Google Scholar]
- Gao, X. , Lin R., Feng J., Shi Y., and Qiao Y.. 2026. “UFO‐3: Unsupervised Three‐Compartment Learning for Fiber Orientation Distribution Function Estimation.” In Medical Image Computing and Computer Assisted Intervention – MICCAI 2025, edited by Gee J. C., Alexander D. C., Hong J., et al., 638–649. Springer Nature. 10.1007/978-3-032-04965-0_60. [DOI] [Google Scholar]
- Garyfallidis, E. , Brett M., Amirbekian B., et al. 2014. “Dipy, a Library for the Analysis of Diffusion MRI Data.” Frontiers in Neuroinformatics 8: 8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ghafoorian, M. , Mehrtash A., Kapur T., et al. 2017. “Transfer Learning for Domain Adaptation in MRI: Application in Brain Lesion Segmentation.” Medical Image Computing and Computer Assisted Intervention‐MICCAI 2017: 20th International Conference, Quebec City, QC, Canada, September 11‐13, 2017, Proceedings, Part III 20, 516–524.
- Godfrey, K. M. , and Barker D. J.. 2001. “Fetal Programming and Adult Health.” Public Health Nutrition 4, no. 2b: 611–624. [DOI] [PubMed] [Google Scholar]
- Golkov, V. , Dosovitskiy A., Sperl J. I., et al. 2016. “Q‐Space Deep Learning: Twelve‐Fold Shorter and Model‐Free Diffusion MRI Scans.” IEEE Transactions on Medical Imaging 35, no. 5: 1344–1351. 10.1109/TMI.2016.2551324. [DOI] [PubMed] [Google Scholar]
- Guan, H. , and Liu M.. 2021. “Domain Adaptation for Medical Image Analysis: A Survey.” IEEE Transactions on Biomedical Engineering 69, no. 3: 1173–1185. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hansen, C. B. , Schilling K. G., Rheault F., et al. 2022. “Contrastive Semi‐Supervised Harmonization of Single‐Shell to Multi‐Shell Diffusion MRI.” Magnetic Resonance Imaging 93: 73–86. [DOI] [PMC free article] [PubMed] [Google Scholar]
- He, K. , Zhang X., Ren S., and Sun J.. 2015. “Delving Deep Into Rectifiers: Surpassing Human‐Level Performance on Imagenet Classification.” In Proceedings of the IEEE International Conference on Computer Vision, 1026–1034. IEEE. [Google Scholar]
- He, K. , Zhang X., Ren S., and Sun J.. 2016. “Deep Residual Learning for Image Recognition.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778. IEEE. [Google Scholar]
- Hendriks, T. , Vilanova A., and Chamberland M.. 2025. “Implicit Neural Representation of Multi‐Shell Constrained Spherical Deconvolution for Continuous Modeling of Diffusion MRI.” Imaging Neuroscience 3: 00501. 10.1162/imag_a_00501. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hendrycks, D. , and Gimpel K.. 2016. “Gaussian Error Linear Units (GELUs).” arXiv Preprint arXiv:1606.08415.
- Hosseini, S. M. H. , Hassanpour M., Masoudnia S., Iraji S., Raminfard S., and Nazem‐Zadeh M.. 2022. “CTtrack: A CNN+Transformer‐Based Framework for Fiber Orientation Estimation & Tractography.” Neuroscience Informatics 2, no. 4: 100099. [Google Scholar]
- Howell, B. R. , Styner M. A., Gao W., et al. 2019. “The UNC/UMN Baby Connectome Project (BCP): An Overview of the Study Design and Protocol Development.” NeuroImage 185: 891–905. 10.1016/j.neuroimage.2018.03.049. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hughes, E. J. , Winchman T., Padormo F., et al. 2017. “A Dedicated Neonatal Brain Imaging System.” Magnetic Resonance in Medicine 78, no. 2: 794–804. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hutter, J. , Tournier J. D., Price A. N., et al. 2018. “Time‐Efficient and Flexible Design of Optimized Multishell HARDI Diffusion.” Magnetic Resonance in Medicine 79, no. 3: 1276–1292. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huynh, K. M. , Chen G., Wu Y., Shen D., and Yap P.‐T.. 2019. “Multi‐Site Harmonization of Diffusion MRI Data via Method of Moments.” IEEE Transactions on Medical Imaging 38, no. 7: 1599–1609. 10.1109/TMI.2019.2895020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jenkinson, M. , Beckmann C. F., Behrens T. E. J., Woolrich M. W., and Smith S. M.. 2012. “ FSL .” NeuroImage 62, no. 2: 782–790. 10.1016/j.neuroimage.2011.09.015. [DOI] [PubMed] [Google Scholar]
- Jeurissen, B. , Leemans A., Tournier J.‐D., Jones D. K., and Sijbers J.. 2013. “Investigating the Prevalence of Complex Fiber Configurations in White Matter Tissue With Diffusion Magnetic Resonance Imaging.” Human Brain Mapping 34, no. 11: 2747–2766. 10.1002/hbm.22099. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jeurissen, B. , Tournier J.‐D., Dhollander T., Connelly A., and Sijbers J.. 2014. “Multi‐Tissue Constrained Spherical Deconvolution for Improved Analysis of Multi‐Shell Diffusion MRI Data.” NeuroImage 103: 411–426. 10.1016/j.neuroimage.2014.07.061. [DOI] [PubMed] [Google Scholar]
- Jha, R. R. , Kumar B. R., Pathak S. K., Schneider W., Bhavsar A., and Nigam A.. 2023. “Undersampled Single‐Shell to MSMT fODF Reconstruction Using CNN‐Based ODE Solver.” Computer Methods and Programs in Biomedicine 230: 107339. [DOI] [PubMed] [Google Scholar]
- Johnson, W. E. , Li C., and Rabinovic A.. 2007. “Adjusting Batch Effects in Microarray Expression Data Using Empirical Bayes Methods.” Biostatistics 8, no. 1: 118–127. [DOI] [PubMed] [Google Scholar]
- Kamnitsas, K. , Baumgartner C., Ledig C., et al. 2017. “Unsupervised Domain Adaptation in Brain Lesion Segmentation With Adversarial Networks.” Information Processing in Medical Imaging: 25th International Conference, IPMI 2017, Boone, NC, USA, June 25–30, 2017, Proceedings 25, 597–609.
- Kamphenkel, J. , Jäger P. F., Bickelhaupt S., et al. 2018. “Domain Adaptation for Deviating Acquisition Protocols in CNN‐Based Lesion Classification on Diffusion‐Weighted MR Images.” Image Analysis for Moving Organ, Breast, and Thoracic Images: Third International Workshop, RAMBO 2018, Fourth International Workshop, BIA 2018, and First International Workshop, TIA 2018, Held in Conjunction With MICCAI 2018, Granada, Spain, September 16 and 20, 2018, Proceedings 3, 73–80.
- Karimi, D., and Gholipour A. 2022. “Diffusion Tensor Estimation With Transformer Neural Networks.” Artificial Intelligence in Medicine 130: 102330.
- Karimi, D., Jaimes C., Machado‐Rivas F., et al. 2021. “Deep Learning‐Based Parameter Estimation in Fetal Diffusion‐Weighted MRI.” NeuroImage 243: 118482.
- Karimi, D., Vasung L., Jaimes C., Machado‐Rivas F., Warfield S. K., and Gholipour A. 2021. “Learning to Estimate the Fiber Orientation Distribution Function From Diffusion‐Weighted MRI.” NeuroImage 239: 118316.
- Karimi, D., and Warfield S. K. 2024. “Diffusion MRI With Machine Learning.” Imaging Neuroscience 2: 1–55. 10.1162/imag_a_00353.
- Kebiri, H. 2023. “Deep Learning Methods for Diffusion MRI in Early Development of the Human Brain: Resolution Enhancement and Model Estimation.” Doctoral dissertation, University of Lausanne.
- Kebiri, H., Gholipour A., Bach Cuadra M., and Karimi D. 2023. “Direct Segmentation of Brain White Matter Tracts in Diffusion MRI.” arXiv preprint.
- Kebiri, H., Gholipour A., Lin R., et al. 2024. “Deep Learning Microstructure Estimation of Developing Brains From Diffusion MRI: A Newborn and Fetal Study.” Medical Image Analysis 95: 103186. 10.1016/j.media.2024.103186.
- Kebiri, H., Gholipour A., Lin R., Vasung L., Karimi D., and Bach Cuadra M. 2023. “Robust Estimation of the Microstructure of the Early Developing Brain Using Deep Learning.” International Conference on Medical Image Computing and Computer‐Assisted Intervention, 293–303.
- Kingma, D. P., and Ba J. 2014. “Adam: A Method for Stochastic Optimization.” arXiv preprint arXiv:1412.6980.
- Konkel, L. 2018. “The Brain Before Birth: Using fMRI to Explore the Secrets of Fetal Neurodevelopment.” Environmental Health Perspectives 126, no. 11: 112001. 10.1289/EHP2268.
- Koppers, S., Bloy L., Berman J. I., Tax C. M., Edgar J. C., and Merhof D. 2019. “Spherical Harmonic Residual Network for Diffusion Signal Harmonization.” Computational Diffusion MRI: International MICCAI Workshop 22: 173–182.
- Koppers, S., and Merhof D. 2016. “Direct Estimation of Fiber Orientations Using Deep Learning in Diffusion Imaging.” International Workshop on Machine Learning in Medical Imaging, 53–60.
- Krizhevsky, A., Sutskever I., and Hinton G. E. 2012. “ImageNet Classification With Deep Convolutional Neural Networks.” Advances in Neural Information Processing Systems 25.
- Lhermitte, E., Dinomais M., Oyaneder R. A., et al. 2024. “Validating Stroke Lesion Segmentation Methods Using MRI in Children: Transferability of Deep Learning Models.” In ISBI 2024, 21st IEEE International Symposium on Biomedical Imaging. IEEE.
- Lin, R., Gholipour A., Thiran J.‐P., Karimi D., Kebiri H., and Bach Cuadra M. 2024. “Cross‐Age and Cross‐Site Domain Shift Impacts on Deep Learning‐Based White Matter Fiber Estimation in Newborn and Baby Brains.” In IEEE International Symposium on Biomedical Imaging (ISBI). IEEE. 10.1109/ISBI56570.2024.10635347.
- Lin, R., Kebiri H., Gholipour A., et al. 2024. “Ground‐Truth Effects in Learning‐Based Fiber Orientation Distribution Estimation in Neonatal Brains.” MICCAI 2024 International Workshop on Computational Diffusion MRI (CDMRI). 10.1007/978-3-031-86920-4_3.
- Lin, Z., Gong T., Wang K., et al. 2019. “Fast Learning of Fiber Orientation Distribution Function for MR Tractography Using Convolutional Neural Network.” Medical Physics 46, no. 7: 3101–3116.
- Loshchilov, I., and Hutter F. 2019. “Decoupled Weight Decay Regularization.” International Conference on Learning Representations. https://openreview.net/forum?id=Bkg6RiCqY7.
- Mann, H. B., and Whitney D. R. 1947. “On a Test of Whether One of Two Random Variables Is Stochastically Larger Than the Other.” Annals of Mathematical Statistics 18: 50–60.
- Mirzaalian, H., Ning L., Savadjiev P., et al. 2018. “Multi‐Site Harmonization of Diffusion MRI Data in a Registration Framework.” Brain Imaging and Behavior 12: 284–295.
- Moyer, D., Ver Steeg G., Tax C. M., and Thompson P. M. 2020. “Scanner Invariant Representations for Diffusion MRI Harmonization.” Magnetic Resonance in Medicine 84, no. 4: 2174–2189.
- Nath, V., Lyu I., Schilling K. G., et al. 2019. “Enabling Multi‐Shell b‐Value Generalizability of Data‐Driven Diffusion Models With Deep SHORE.” International Conference on Medical Image Computing and Computer‐Assisted Intervention, 573–581.
- Nath, V., Remedios S., Parvathaneni P., et al. 2019a. “Harmonizing 1.5T/3T Diffusion Weighted MRI Through Development of Deep Learning Stabilized Microarchitecture Estimators.” Medical Imaging 2019: Image Processing 10949: 173–182.
- Nath, V., Remedios S., Parvathaneni P., et al. 2019b. “Inter‐Scanner Harmonization of High Angular Resolution DW‐MRI Using Null Space Deep Learning.” Computational Diffusion MRI: International MICCAI Workshop 22: 193–201.
- Nath, V., Schilling K. G., Parvathaneni P., et al. 2019. “Deep Learning Reveals Untapped Information for Local White‐Matter Fiber Reconstruction in Diffusion‐Weighted MRI.” Magnetic Resonance Imaging 62: 220–227.
- O'Donnell, K., and Meaney M. 2017. “Fetal Origins of Mental Health: The Developmental Origins of Health and Disease Hypothesis.” American Journal of Psychiatry 174, no. 4: 319–328. 10.1176/appi.ajp.2016.16020138.
- Özarslan, E., Koay C. G., Shepherd T. M., et al. 2013. “Mean Apparent Propagator (MAP) MRI: A Novel Diffusion Imaging Method for Mapping Tissue Microstructure.” NeuroImage 78: 16–32.
- Paszke, A., Gross S., Massa F., et al. 2019. “PyTorch: An Imperative Style, High‐Performance Deep Learning Library.” Advances in Neural Information Processing Systems 32. https://papers.nips.cc/paper_files/paper/2019/hash/bdbca288fee7f92f2bfa9f7012727740-Abstract.html.
- Pietsch, M., Christiaens D., Hajnal J. V., and Tournier J.‐D. 2021. “dStripe: Slice Artefact Correction in Diffusion MRI via Constrained Neural Network.” Medical Image Analysis 74: 102255.
- Pietsch, M., Christiaens D., Hutter J., et al. 2019. “A Framework for Multi‐Component Analysis of Diffusion MRI Data Over the Neonatal Period.” NeuroImage 186: 321–337.
- Pinto, M. S., Paolella R., Billiet T., et al. 2020. “Harmonization of Brain Diffusion MRI: Concepts and Methods.” Frontiers in Neuroscience 14: 396.
- Raffelt, D., Tournier J.‐D., Rose S., et al. 2012. “Apparent Fibre Density: A Novel Measure for the Analysis of Diffusion‐Weighted Magnetic Resonance Images.” NeuroImage 59, no. 4: 3976–3994.
- Richiardi, J., Ravano V., Molchanova N., Gordaliza P. M., Kober T., and Bach Cuadra M. 2025. “Domain Shift, Domain Adaptation, and Generalization: A Focus on MRI.” In Trustworthy AI in Medical Imaging, 127–151. Elsevier.
- Ronneberger, O., Fischer P., and Brox T. 2015. “U‐Net: Convolutional Networks for Biomedical Image Segmentation.” MICCAI, 234–241.
- Ruiz‐Rizzo, A. L., Finke K., and Archila‐Meléndez M. E. 2024. “Diffusion Tensor Imaging in Alzheimer's Studies.” In Biomarkers for Alzheimer's Disease Drug Development, 105–113. Springer.
- Samala, R. K., Chan H.‐P., Hadjiiski L., Helvie M. A., Richter C., and Cha K. 2018. “Cross‐Domain and Multi‐Task Transfer Learning of Deep Convolutional Neural Network for Breast Cancer Diagnosis in Digital Breast Tomosynthesis.” Medical Imaging 2018: Computer‐Aided Diagnosis 10575: 172–178.
- Schilling, K. G., Chad J. A., Chamberland M., et al. 2023. “White Matter Tract Microstructure, Macrostructure, and Associated Cortical Gray Matter Morphology Across the Lifespan.” Imaging Neuroscience 1: 1–24. 10.1162/imag_a_00050.
- Schilling, K. G., Janve V., Gao Y., Stepniewska I., Landman B. A., and Anderson A. W. 2018. “Histological Validation of Diffusion MRI Fiber Orientation Distributions and Dispersion.” NeuroImage 165: 200–221.
- Schilling, K. G., Tax C. M., Rheault F., et al. 2022. “Prevalence of White Matter Pathways Coming Into a Single White Matter Voxel Orientation: The Bottleneck Issue in Tractography.” Human Brain Mapping 43, no. 4: 1196–1213.
- Skare, S., Hedehus M., Moseley M. E., and Li T.‐Q. 2000. “Condition Number as a Measure of Noise Performance of Diffusion Tensor Data Acquisition Schemes With MRI.” Journal of Magnetic Resonance 147, no. 2: 340–352.
- Srivastava, N., Hinton G., Krizhevsky A., Sutskever I., and Salakhutdinov R. 2014. “Dropout: A Simple Way to Prevent Neural Networks From Overfitting.” Journal of Machine Learning Research 15, no. 1: 1929–1958.
- Tax, C. M., Grussu F., Kaden E., et al. 2019. “Cross‐Scanner and Cross‐Protocol Diffusion MRI Data Harmonisation: A Benchmark Database and Evaluation of Algorithms.” NeuroImage 195: 285–299. 10.1016/j.neuroimage.2019.01.077.
- Tian, Q., Bilgic B., Fan Q., et al. 2020. “DeepDTI: High‐Fidelity Six‐Direction Diffusion Tensor Imaging Using Deep Learning.” NeuroImage 219: 117017.
- Tournier, J.‐D., Calamante F., and Connelly A. 2012. “MRtrix: Diffusion Tractography in Crossing Fiber Regions.” International Journal of Imaging Systems and Technology 22, no. 1: 53–66.
- Tournier, J.‐D., Calamante F., and Connelly A. 2013. “Determination of the Appropriate b Value and Number of Gradient Directions for High‐Angular‐Resolution Diffusion‐Weighted Imaging.” NMR in Biomedicine 26, no. 12: 1775–1786. 10.1002/nbm.3017.
- Tournier, J.‐D., Calamante F., Gadian D. G., and Connelly A. 2004. “Direct Estimation of the Fiber Orientation Density Function From Diffusion‐Weighted MRI Data Using Spherical Deconvolution.” NeuroImage 23, no. 3: 1176–1185.
- Tuch, D. S. 2004. “Q‐Ball Imaging.” Magnetic Resonance in Medicine 52, no. 6: 1358–1372.
- Valverde, S., Salem M., Cabezas M., et al. 2019. “One‐Shot Domain Adaptation in Multiple Sclerosis Lesion Segmentation Using Convolutional Neural Networks.” NeuroImage: Clinical 21: 101638. 10.1016/j.nicl.2018.101638.
- Vaswani, A., Shazeer N., Parmar N., et al. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30.
- Veraart, J., Fieremans E., and Novikov D. S. 2016. “Diffusion MRI Noise Mapping Using Random Matrix Theory.” Magnetic Resonance in Medicine 76, no. 5: 1582–1593.
- Virtanen, P., Gommers R., Oliphant T. E., et al. 2020. “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python.” Nature Methods 17: 261–272. 10.1038/s41592-019-0686-2.
- Volpe, J. 2009. “Brain Injury in Premature Infants: A Complex Amalgam of Destructive and Developmental Disturbances.” Lancet Neurology 8, no. 1: 110–124. 10.1016/S1474-4422(08)70294-1.
- Wasserthal, J., Neher P., and Maier‐Hein K. H. 2018. “TractSeg—Fast and Accurate White Matter Tract Segmentation.” NeuroImage 183: 239–253.
- Wilcoxon, F. 1945. “Individual Comparisons by Ranking Methods.” Biometrics Bulletin 1, no. 6: 80–83.
- Yao, T., Newlin N., Kanakaraj P., et al. 2023. “A Unified Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi‐Shell Diffusion‐Weighted MRI.” In Computational Diffusion MRI, edited by Karaman M., Mito R., Powell E., Rheault F., and Winzeck S., 13–22. Springer Nature. 10.1007/978-3-031-47292-3_2.
- Yao, T., Rheault F., Cai L. Y., et al. 2023. “Deep Constrained Spherical Deconvolution for Robust Harmonization.” Medical Imaging 2023: Image Processing 12464: 169–176.
- Yao, T., Rheault F., Cai L. Y., et al. 2024. “Robust Fiber Orientation Distribution Function Estimation Using Deep Constrained Spherical Deconvolution for Diffusion‐Weighted Magnetic Resonance Imaging.” Journal of Medical Imaging 11, no. 1: 014005. 10.1117/1.JMI.11.1.014005.
- Yushkevich, P. A., Gao Y., and Gerig G. 2016. “ITK‐SNAP: An Interactive Tool for Semi‐Automatic Segmentation of Multi‐Modality Biomedical Images.” 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 3342–3345.
- Zalevskyi, V., Sanchez T., Roulet M., et al. 2025. “DRIFTS: Optimizing Domain Randomization With Synthetic Data and Weight Interpolation for Fetal Brain Tissue Segmentation.” arXiv preprint arXiv:2411.06842. 10.48550/arXiv.2411.06842.
- Zeng, R., Lv J., Wang H., et al. 2022. “FOD‐Net: A Deep Learning Method for Fiber Orientation Distribution Angular Super Resolution.” Medical Image Analysis 79: 102431. 10.1016/j.media.2022.102431.
Associated Data
Supplementary Materials
Data S1: hbm70367‐sup‐0001‐Figures.pdf.
Data Availability Statement
The analyzed datasets are publicly available: Developing Human Connectome Project (dHCP): https://www.developingconnectome.org/data-release/third-data-release/. Baby Connectome Project (BCP): https://www.humanconnectome.org/study/lifespan-baby-connectome-project/. The code is publicly available at https://github.com/Medical-Image-Analysis-Laboratory/dl_fiber_domain_shift.
