Author manuscript; available in PMC: 2025 Nov 26.
Published in final edited form as: IEEE Signal Process Mag. 2025 Nov 24;42(4):78–90. doi: 10.1109/MSP.2025.3590806

Domain-Randomized Deep Learning for Neuroimage Analysis: Selecting Training Strategies, Navigating Challenges, and Maximizing Benefits

Malte Hoffmann 1
PMCID: PMC12646568  NIHMSID: NIHMS2098107  PMID: 41306561

Abstract

Deep learning has revolutionized neuroimage analysis by delivering unprecedented speed and accuracy. However, the narrow scope of many training datasets constrains model robustness and generalizability. This challenge is particularly acute in magnetic resonance imaging (MRI), where image appearance varies widely across pulse sequences and scanner hardware. A recent domain-randomization strategy addresses the generalization problem by training deep neural networks on synthetic images with randomized intensities and anatomical content. By generating diverse data from anatomical segmentation maps, the approach enables models to accurately process image types unseen during training, without retraining or fine-tuning. It has demonstrated effectiveness across modalities including MRI, computed tomography, positron emission tomography, and optical coherence tomography, as well as beyond neuroimaging in ultrasound, electron and fluorescence microscopy, and X-ray microtomography. This tutorial paper reviews the principles, implementation, and potential of the synthesis-driven training paradigm. It highlights key benefits, such as improved generalization and resistance to overfitting, while discussing trade-offs such as increased computational demands. Finally, the article explores practical considerations for adopting the technique, aiming to accelerate the development of generalizable tools that make deep learning more accessible to domain experts without extensive computational resources or machine learning knowledge.

Index Terms: Deep learning, domain generalization, domain randomization, neuroimaging, medical image analysis

Introduction

Neuroimaging techniques, such as magnetic resonance imaging (MRI), have enabled the study of the human brain in vivo. Alongside advances in acquisition technology, research in neuroimage processing has led to software that automates systematic data analysis, minimizing human effort while improving accuracy and reproducibility [1]. In recent years, deep learning (DL) has been driving the development of a new class of algorithms with unprecedented speed and accuracy, and for a broad range of tasks, deep neural networks have largely replaced classical techniques. However, a key challenge for DL in neuroimaging is small and highly specific datasets. Many studies include only hundreds or even tens of subjects [2], due to factors such as the high cost of data acquisition, multiple modalities competing for scan time, the large size of multi-dimensional data like time-series acquisitions, the low prevalence of certain neurological disorders, and privacy concerns regarding data sharing [3]. Training networks on limited datasets can lead to overfitting and poor generalization to new data—validation errors increase while the training loss continues to decrease [4]. This performance gap is common even for datasets acquired with similar MRI sequences in comparable cohorts, as models can become sensitive to subtle variations in scanner hardware or sequence parameters that trickle down to the images.

Emerging from a rich landscape of harmonization and domain shift mitigation techniques, a recent class of domain-randomization methods tackles the generalization problem by exposing networks to widely variable images synthesized from anatomical segmentation maps [5] (Figure 2). These methods address covariate shifts by generating an effectively unlimited stream of training data, varying key characteristics such as spatial structure, intensity, and resolution far beyond the realistic range (Figure 1). Synthesis-driven training is gaining traction in the neuroimaging community, as it facilitates the development of tools that generalize to new data types without retraining. It has been successfully applied to core neuroimage processing tasks such as segmentation [6, 7] and registration [8], which are fundamental to the interpretation of data acquired with a wide array of modalities. However, training with synthetic data introduces an additional level of abstraction and new challenges, which can present a barrier to adoption. Departing from a recent survey of current and future applications of the strategy [5], this tutorial paper intends to serve as a practical guide to help new adopters navigate these challenges, select appropriate training strategies, and maximize the benefit of synthesis-driven training—to foster the development of robust, domain-invariant DL tools that empower users to analyze their data without DL expertise.

Fig. 2:


Image synthesis steps. First, we sample a previously generated anatomical label map, and randomly move and deform it. Second, we generate a grayscale image by drawing an intensity for each label. Third, a series of randomized image corruptions lead to complex intensity patterns across the image and each anatomical structure. Both rows begin with the same label map.

Fig. 1:


Synthetic training images. The variability intentionally exceeds realistic bounds of medical imaging to encourage deep neural networks to generalize. To realize the full potential of domain randomization, synthesis-driven training generates a new, unseen input image at every iteration.

Background

A breadth of techniques aim to mitigate the effects of domain shift. This section provides a brief overview of domain shift and adaptation strategies. For details, we refer to more comprehensive reviews [9, 10].

Domain shift

Domain shift, or dataset shift, arises when the statistical distributions used for training and testing differ—common in neuroimaging and detrimental to machine-learning performance. Its three fundamental types are prior probability shift, concept shift, and covariate shift.

Denoting model inputs as $X$ and outputs as $Y$, prior shift occurs when the output distribution $P(Y)$ changes while the input characteristics given an output, $P(X \mid Y)$, remain the same. For example, a model trained on a balanced mix of infant and adult scans may underperform if tested predominantly on infant data, an effect often addressed by adjusting sampling probabilities during training. Concept shift involves changes in the relationship between inputs and outputs, $P(Y \mid X)$. Imagine an age-prediction model that learns that enlarged ventricles correlate with old age. If later deployed on pediatric Canavan disease patients, who also often have enlarged ventricles, the learned association breaks down.

Covariate shift occurs when the input distribution $P(X)$ changes while the input-output relationship $P(Y \mid X)$ remains unchanged. For instance, a brain segmentation model should ideally yield consistent results whether fed T1- or T2-weighted MRI. In practice, covariate shift arises from site effects (scanner differences) or batch effects (processing differences). Domain randomization targets both by diversifying $P(X)$ during training. The following sections review related strategies for mitigating covariate shift.
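The three shift types can be summarized side by side. The block below is a compact restatement of the definitions above; the subscripts "train" and "test" are notation introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\text{Prior shift:}     \quad & P_\text{train}(Y) \neq P_\text{test}(Y),
                        \quad P_\text{train}(X \mid Y) = P_\text{test}(X \mid Y) \\
\text{Concept shift:}   \quad & P_\text{train}(Y \mid X) \neq P_\text{test}(Y \mid X) \\
\text{Covariate shift:} \quad & P_\text{train}(X) \neq P_\text{test}(X),
                        \quad P_\text{train}(Y \mid X) = P_\text{test}(Y \mid X)
\end{align*}
```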

Statistical harmonization

Traditional harmonization corrects covariate shifts in neuroimaging studies by statistically adjusting the data. Empirical Bayes techniques like ComBat effectively remove batch effects estimated via linear models [11], and extensions to this framework using generalized additive models allow for nonlinear effects. These methods typically assume consistent preprocessing but require no labels and are computationally inexpensive. Statistical moment matching reduces distributional discrepancies between a labeled source domain and an unlabeled or sparsely labeled target domain by reweighting source samples to better match target statistics. For example, kernel mean matching [12] aligns features by minimizing maximum mean discrepancy (MMD) with a nonlinear kernel function, while correlation alignment matches second-order statistics for unsupervised domain adaptation [13].

Feature-based adaptation

Feature-based, or shallow, domain adaptation estimates a transform that maps source data into the target domain via domain-invariant representations. A common approach constructs an intermediate low-dimensional space where source and target samples share features. For example, subspace alignment uses principal component analysis (PCA) to establish a linear mapping between source and target bases. Many other methods, such as geodesic flow kernels and transfer component analysis, refine this approach [9].

Deep domain adaptation

More recently, DL has gained traction for domain adaptation. While early approaches use neural networks only for feature extraction followed by shallow adaptation, a vast landscape of modern deep methods integrate domain adaptation directly into representation learning using discrepancy metrics, adversarial learning, or reconstruction.

Discrepancy-based adaptation:

Discrepancy-based domain adaptation fine-tunes model weights using target domain data. Early techniques align features from a single layer, while later approaches extend alignment across multiple layers. If target labels are available, discrepancy-based adaptation can use supervision. Otherwise, methods minimize statistical discrepancies using metrics like MMD (Section Statistical harmonization), directly regularize network weights to ensure a linear relationship between domains, or integrate adaptive batch normalization layers into the network.

Adversarial domain adaptation:

Adversarial domain adaptation builds on generative adversarial networks (GANs), which adversarially train a generative and a discriminative model [14]: the generator produces fake images to fool the discriminator, while the discriminator learns to distinguish real from synthetic examples. Generative adversarial adaptation follows this idea, generating simulated images in the target domain that remain compatible with source labels. In contrast, discriminative adaptation replaces the generator with a domain-invariant feature extractor, often involving image-to-image translation or style transfer to map data across domains. A common two-step approach first trains a feature extractor and task network on the source domain, then freezes their weights and adversarially trains a new extractor and discriminator on unlabeled target data [15].

Reconstruction-based adaptation:

Reconstruction methods promote shared representations that support both the main task and image reconstruction, which is particularly useful when labels exist only in the source domain. Encoder-decoder networks, such as a variational auto-encoder (VAE), map inputs into a shared latent space and reconstruct them with a loss on the input. Adversarial variants introduce a domain confusion signal by training a discriminator to determine whether reconstructed samples come from the source or target domain. These methods tend to follow cyclic strategies, such as CycleGAN [10].

Domain generalization

Domain adaptation generally assumes access to source and limited target domain data, where labels may exist for all target samples (supervised), some of the target samples (semi-supervised), or none of them (unsupervised)—the latter two being most common in neuroimaging. Domain adaptation bridges observed gaps given source and target samples, whereas domain generalization aims to improve model robustness across future, unseen domains, from which no samples are available at training time.

Self-supervised learning:

Self-supervised learning replaces external labels with pretext tasks, leveraging unlabeled data for supervision. These tasks typically modify inputs to form related pairs and train models to predict relationships between them. Often used for pre-training, self-supervision can improve downstream performance when labeled data are scarce. A common approach is contrastive learning, which aligns representations of similar, “positive” pairs while separating representations of dissimilar, “negative” pairs.

Data augmentation:

Data augmentation improves generalization by exposing models to more variability than the training set encompasses. Rule-based augmentation applies predefined transformations to images and label maps [16]. Common geometric transforms include flipping, affine transformation, and elastic deformation, while intensity-based augmentation might add noise or stretch image histograms. More generic corruptions include swapping of image patches or convolutions with random kernels. Learning-based augmentation trains networks to generate optimal transformations. As in adversarial domain adaptation (Section Adversarial domain adaptation), adversarial augmentation uses a generator to create challenging transforms that fool a discriminator [14]. In contrast, data-driven methods extract natural variations from auxiliary, unlabeled datasets, while uncertainty-guided augmentation learns transforms that target ambiguous inputs for which predictions are unreliable.

Further approaches:

Several other domain generalization techniques extend performance to unseen settings without relying on target data during training. Multi-task learning trains a model to perform multiple related tasks at the same time to encourage general, shared representations useful for new tasks. Meta-learning is a related approach whose idea is to learn how to learn across a distribution of tasks. Ensemble learning trains multiple copies of the same model with varying initializations or training splits to improve robustness by fusing their predictions.

Realistic simulations

Synthetic images are widely used for DL and computer vision in neuroimaging [5], especially to address data scarcity with realistic simulations [16]. Instead of augmenting real data, these methods generate entirely new training images. Physics-based simulations use computational models to create medical images with controlled variability. VAEs synthesize anatomically plausible images by sampling from a learned latent space. Recently, probabilistic diffusion models have emerged as a powerful generative tool, producing diverse and detailed images that often surpass GANs in realism [17].

In contrast, domain randomization emphasizes data heterogeneity over authenticity to promote generalization across variations that may never be explicitly observed during training.

Domain-randomized learning

Domain randomization generates intentionally unrealistic training images from anatomical label maps, exposing networks to variability far beyond what limited real-world datasets typically capture. Like other domain generalization strategies, it aims to promote generalization to domains unavailable during training. Domain randomization naturally integrates with supervised and semi-supervised learning. While it is compatible with self-supervised and unsupervised paradigms, we assume access to label maps, as these are the foundation for image synthesis. Synthesis-driven training offers several advantages. First, it reduces the risk of overfitting [16], as every training step presents the network with a new, unseen image. Second, it can achieve state-of-the-art performance with only a few label maps, alleviating the need for compiling and annotating data. Third, optimizing losses on select anatomical labels can produce anatomy-aware models. For example, registration networks can learn to align brains while ignoring other structures, such as the neck, which effectively eliminates the need for skull-stripping [28]. Fourth, labeling errors cannot degrade network performance, because the images are generated directly from label maps and thus match them exactly. Finally, adding artifacts for a new modality or acquisition type is usually straightforward.

These factors have facilitated the development of methods that enable users to process their data without retraining or other techniques requiring DL expertise. Domain randomization has been successfully applied to structural segmentation [21, 22, 23, 24, 25, 29], skull-stripping [26, 27], registration [18, 19, 20, 28, 30], feature extraction [31], image-to-image translation, and super-resolution reconstruction [32].

Table I presents a meta-comparison of a selection of these methods against state-of-the-art DL baselines trained with standard augmentation, as well as classical algorithms. For each study, we report the mean Dice-based accuracy rank percentile across the evaluated datasets, along with the mean Dice gap to the top-performing baseline—often a different baseline for each dataset. If there is a tie, we assign the average of the tied ranks.

TABLE I:

Meta-comparison of mean Dice-based accuracy rank percentiles across test sets for domain-randomization methods relative to state-of-the-art baselines. When not in top percentiles, domain-randomized methods typically exhibit only small performance gaps to the best-performing baseline—often a different method for each dataset. These datasets span structural MRI with T1-weighted (T1), T2-weighted (T2), proton-density-weighted (PD), and fluid-attenuated inversion recovery (FLAIR) contrast, MR angiography (MRA), diffusion-weighted images (DWI) and derived fractional anisotropy maps (FA), quantitative T1 maps (qT1), positron emission tomography (PET), computed tomography (CT), and optical coherence tomography (OCT).

| Main task | Modalities tested | Baselines tested | Mean rank percentile | Mean Dice gap to best |
|---|---|---|---|---|
| Registration | | | | |
| Affine^e [18] | T1, T2, FA, FLAIR^s | 2 | 71.9 | 0.3 |
| Deformable^e [18] | T1, T2, FA, FLAIR^s | 3 | 54.2 | 1.3 |
| Affine [19] | T1, T2, PD, post-contrast T1^{c,s} | 8 | 95.0 | 0.0 |
| Deformable [19] | T1, T2, PD, post-contrast T1^{c,s} | 5 | 92.0 | 0.1 |
| Longitudinal, rigid^e [20] | T1, T2, post-contrast T1, FLAIR^{c,s} | 4–5 | 55.0 | 2.3 |
| Longitudinal, rigid^w [20] | T1, T2, post-contrast T1, FLAIR^{c,s} | 4 | 87.5 | 0.1 |
| Segmentation | | | | |
| Adult [21] | T1, T2, PD, FLAIR, CT^s | 3–6 | 92.3 | 0.3 |
| Infant [22] | T1, T2 | 1 | 100.0 | 0.0 |
| Fetal^e [23] | T2, T2^c | 1–2 | 83.3 | 0.4 |
| Adult [24] | T1, FLAIR^{c,l,s} | 1 | 100.0 | 0.0 |
| Lesional [24] | T1, FLAIR^{c,l,s} | 1–2 | 100.0 | 0.0 |
| Vascular [25] | OCT | 1 | 100.0 | 0.0 |
| Skull-stripping | | | | |
| Adult [26] | T1, T2, PD, FLAIR, MRA, DWI, qT1, PET, CT^{c,s} | 5–6 | 98.0 | 0.0 |
| Infant [27] | T1, T2 | 2–4 | 100.0 | 0.0 |

^e Estimated from figures
^w Whole head
^s Include thick-slice stacks
^c Include clinical data
^l Include low-field MRI

Across a range of registration and segmentation tasks, the domain-randomization techniques generally achieve high rank percentiles. In cases of mid-range performance, the accuracy gap to the best-performing baseline is usually small. These findings suggest that while domain randomization does not always yield the highest accuracy, it generalizes well across diverse datasets—particularly heterogeneous clinical images, such as thick-slice acquisitions with glioblastoma—avoiding the gross inaccuracies that can occur with simpler methods [19, 26]. We emphasize that some baselines do not support all datasets considered [20, 21, 24], which is not reflected in the percentile rankings of Table I. Domain-randomized runtimes are similar to conventional DL, typically substantially shorter than for classical algorithms.

Critically, domain randomization leverages a fully controllable, weightless generative model that synthesizes diverse anatomies and artifacts for training. The next section will review the key components of this model. As MRI offers diverse contrast mechanisms and acquisition protocols, and is widely used in clinical and research neuroimaging due to its excellent soft-tissue contrast and non-invasive nature, many of these components derive from MRI acquisition. However, several studies have demonstrated the generalization of the presented modeling techniques to neuroimaging data acquired with computed tomography (CT) [21, 26, 31], positron emission tomography (PET) [26], and optical coherence tomography (OCT) [25] (Table I). Beyond neuroimaging, these techniques have also been applied to 3D ultrasound, electron and fluorescence microscopy, and X-ray microtomography [29, 31].

Generative modeling

While the implementation details of the generative model vary, the general concepts are the same across methods (Figure 2). We assume availability of a training set $\mathcal{T}$ of $N$-dimensional (ND, where $N$ is typically 2 or 3) anatomical label maps. The label maps could be synthetic [25, 28], derived from structural scans using various tools [1, 21], or sourced from public repositories [26]. Let $s \in \mathcal{T}$ be a randomly selected label map and $g$ a generative model that receives $s$ as input and produces a new label map $s_x$ along with an associated, synthetic grayscale image $x$. In the absence of learnable parameters, $g$ uses simple physics-based and Gaussian-mixture modeling to generate $(x, s_x) = g(s, z)$ given random seed $z$, sampling synthesis parameters from uniform ($\mathcal{U}$) and normal distributions ($\mathcal{N}$) with zero mean. In the following sections, we will break down the generative steps of Figure 2. Some of these steps involve smoothly varying noise fields, which we will address first.

Prerequisite: smooth noise

Noise fields that smoothly vary across the spatial domain Ω are a prerequisite for spatial augmentation, intensity modulation, and label-map synthesis [28]. As noise generation is ubiquitous in computer graphics, there are many efficient algorithms to choose from.

Value noise:

A straightforward approach is to randomly sample a low-resolution field of size $f_1 \times f_2 \times \dots \times f_N$, where we uniformly draw $f_i \sim \mathcal{U}(1, b_F)$, $f_i \in \mathbb{N}$, for each axis $i \in \{1, 2, \dots, N\}$, allowing the spatial frequency of the noise to vary across realizations. We then linearly upsample this field into $\Omega$. The resulting image is called value noise and varies smoothly but has an artificial appearance (Figure 3).
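As an illustration, value noise can be sketched in a few lines of NumPy. This is a minimal sketch rather than the implementation used by any of the cited methods: the function names, the default bound `b_f=8`, and the choice to draw coarse values uniformly from [0, 1] are assumptions made for the example.

```python
import numpy as np


def upsample_linear(field, shape):
    """Linearly upsample an ND array to `shape`, one axis at a time."""
    for axis, size in enumerate(shape):
        n = field.shape[axis]
        # Output sample positions expressed in input index coordinates.
        pos = np.linspace(0, n - 1, size)
        field = np.apply_along_axis(
            lambda v: np.interp(pos, np.arange(n), v), axis, field
        )
    return field


def value_noise(shape, b_f=8, rng=None):
    """Value noise: a random coarse grid, linearly upsampled to `shape`."""
    rng = np.random.default_rng(rng)
    # Draw the coarse grid size f_i per axis as an integer in [1, b_f].
    coarse_shape = tuple(int(f) for f in rng.integers(1, b_f + 1, size=len(shape)))
    coarse = rng.uniform(0, 1, size=coarse_shape)
    return upsample_linear(coarse, shape)


noise = value_noise((64, 48), b_f=8, rng=0)
```

Because linear interpolation never leaves the range of the coarse samples, the intensity bounds of the result follow directly from the bounds of the uniform distribution.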

Fig. 3:


Noise generation. Left: Linearly upsampling a random low-resolution image creates smoothly varying “value noise”, which has a machine-generated appearance. Center: Gaussian noise, sampled at full resolution and smoothed via convolution, has a more natural appearance but is inefficient for large kernels. Right: Perlin noise—a type of gradient noise—achieves a natural look without convolutions. The intensity at each point is a combination of the dot products of random gradient vectors (red) at the corners of a unit cell and support vectors (black) from the same corners to that point.

Explicit smoothing:

Alternatively, explicit smoothing results in more natural-looking noise and is particularly efficient for small Gaussian kernels when leveraging their separability to perform a series of 1D convolutions. First, we sample a field $F \sim \mathcal{N}(0, \sigma_F^2)$ of randomized standard deviation $\sigma_F \sim \mathcal{U}(a_F, b_F)$ at full resolution. Second, we construct a normalized Gaussian kernel $\kappa_{F,i}$ of uniformly sampled standard deviation $\sigma_{\kappa_F,i} \sim \mathcal{U}(a_{\kappa_F}, b_{\kappa_F})$ for each axis $i$. We use kernels of length $|\kappa_{F,i}| = \lfloor 3 \times \sigma_{\kappa_F,i} \rceil \times 2 + 1$, where $\lfloor \cdot \rceil$ denotes rounding. Third, we convolve $F$ with these kernels, yielding

$$F_\kappa = \alpha_F \times F * \kappa_{F,1} * \kappa_{F,2} * \dots * \kappa_{F,N}, \qquad (1)$$

where rescaling by $\alpha_F = \max F / \max F_\kappa$ maintains the maximum strength of $F$. We perform these convolutions using an ND routine, reshaping each 1D kernel into an ND kernel with singleton dimensions except along the axis $i$. Generating noise of low spatial frequency by smoothing via convolution is less efficient due to the need for larger kernels.
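The smoothing steps can be sketched as follows. This is an illustrative NumPy version under stated assumptions: it applies 1D convolutions along each axis instead of an ND routine, zero-pads at the borders, rescales using absolute maxima, and uses hypothetical default sampling bounds. It also assumes each axis is longer than the kernel.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel of length 2 * round(3 * sigma) + 1."""
    half = int(round(3 * sigma))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_noise(shape, a_f=0.1, b_f=1.0, a_k=1.0, b_k=4.0, rng=None):
    """Sample Gaussian noise and smooth it with separable 1D convolutions."""
    rng = np.random.default_rng(rng)
    sigma_f = rng.uniform(a_f, b_f)
    field = rng.normal(0, sigma_f, size=shape)
    out = field
    for axis in range(field.ndim):
        # One kernel per axis, each with its own random standard deviation.
        kernel = gaussian_kernel(rng.uniform(a_k, b_k))
        out = np.apply_along_axis(np.convolve, axis, out, kernel, mode="same")
    # Rescale so the smoothed field keeps the maximum strength of the raw
    # field (absolute values are an assumption of this sketch).
    alpha = np.abs(field).max() / np.abs(out).max()
    return alpha * out


smoothed = smooth_noise((64, 64), rng=0)
```

The kernel length grows linearly with the standard deviation, which is why low-frequency noise becomes expensive to generate this way.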

Gradient noise:

A third, widely used type of noise is gradient noise, which achieves a natural appearance without convolutions (Figure 3). For example, Perlin noise procedures generate a smooth ND field $F$ by sampling gradient vectors of random orientation across a regular lattice of control points [33]. Let $P_c \in \mathbb{R}^N$, $c \in \{1, 2, \dots, 2^N\}$, be one of the $2^N$ control points defining a unit cell, $v_c$ the gradient vector at $P_c$, and $u_i$ the unit vector along spatial axis $i \in \{1, 2, \dots, N\}$. We compute the field value at location $M$ within the cell as

$$f(M) = \sum_{c=1}^{2^N} w_c \times v_c \cdot (M - P_c), \qquad (2)$$

where $\cdot$ denotes the scalar product. The ND linear-interpolation weight (often faded)

$$w_c = \prod_{i=1}^{N} \left( 1 - \left| (M - P_c) \cdot u_i \right| \right) \qquad (3)$$

ensures smooth transitions between control points. To vary the spatial frequency of the noise across realizations, we randomly sample the number of control points $C_i \sim \mathcal{U}(2, b_C)$, $C_i \in \mathbb{N}$, and process $C_i - 1$ unit cells along each axis $i$.
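A 2D instance of Equations (2) and (3) might look like the sketch below. It assumes unit-length gradient vectors and applies a quintic fade to each linear weight factor, a common choice in Perlin-noise implementations. The function name and defaults are hypothetical.

```python
import numpy as np


def fade(t):
    """Quintic smoothstep often used to fade the linear weights."""
    return t * t * t * (t * (t * 6 - 15) + 10)


def perlin_2d(shape, controls=(4, 4), rng=None):
    """2D Perlin noise on a lattice with `controls` control points per axis."""
    rng = np.random.default_rng(rng)
    c0, c1 = controls
    # Random unit gradient vectors v_c at the control points.
    angle = rng.uniform(0, 2 * np.pi, size=(c0, c1))
    grad = np.stack([np.cos(angle), np.sin(angle)], axis=-1)

    # Voxel coordinates in lattice units, spanning C_i - 1 unit cells.
    y = np.linspace(0, c0 - 1, shape[0], endpoint=False)
    x = np.linspace(0, c1 - 1, shape[1], endpoint=False)
    y, x = np.meshgrid(y, x, indexing="ij")
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)

    out = np.zeros(shape)
    for dy in (0, 1):
        for dx in (0, 1):
            # Support vector M - P_c from this cell corner to each point.
            sy, sx = y - (y0 + dy), x - (x0 + dx)
            g = grad[y0 + dy, x0 + dx]
            dot = g[..., 0] * sy + g[..., 1] * sx
            # Per-axis factors of the weight in Eq. (3), each faded.
            weight = fade(1 - np.abs(sy)) * fade(1 - np.abs(sx))
            out += weight * dot
    return out


field = perlin_2d((64, 64), controls=(4, 4), rng=0)
```

The field is zero at every control point because the support vector vanishes there, and the four corner weights sum to one everywhere inside a cell.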

Figure 3 compares 256 × 256 images of value noise generated by upsampling Gaussian noise from 1/64 of the target resolution, normally distributed noise convolved with kernels of full width at half maximum (FWHM) 64, and Perlin noise on a gradient vector grid of 4 × 4 control points.

Combining noise:

We can create more complex noise patterns spanning a range of spatial frequencies by combining noise images generated at multiple resolutions. This type of noise is called fractal noise and typically involves halving or doubling the frequency at each level and summing these octaves with weights inversely proportional to the frequency. Figure 4 illustrates this process with octaves of Perlin noise. By design, Perlin noise has an intensity range of [−1, 1]. While we can directly control the intensity of value noise by adjusting the bounds of the uniform distribution, obtaining noise fields via smoothing requires sampling from a non-uniform distribution. For the following applications, we standardize the intensity range by min-max normalizing noise to the interval [0, 1]. While the discussed techniques produce scalar fields, the first modeling step of Figure 2—spatial augmentation—requires a vector field for deformation, which we generate by sampling N independent noise components.
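The octave summation and the final min-max normalization can be sketched as follows. To keep the example self-contained, it uses a simple value-noise stand-in rather than Perlin noise, and it treats the number of control points per axis as a proxy for spatial frequency. Both choices, and all names, are assumptions of this sketch.

```python
import numpy as np


def value_noise(shape, points, rng):
    """Stand-in noise: a random grid of `points` per axis, linearly upsampled."""
    field = rng.uniform(-1, 1, size=(points,) * len(shape))
    for axis, size in enumerate(shape):
        pos = np.linspace(0, points - 1, size)
        field = np.apply_along_axis(
            lambda v: np.interp(pos, np.arange(points), v), axis, field
        )
    return field


def fractal_noise(shape, points=(2, 4, 8, 16, 32), rng=None):
    """Sum noise octaves with weights inversely proportional to frequency."""
    rng = np.random.default_rng(rng)
    out = np.zeros(shape)
    for c in points:
        # Doubling the control points roughly doubles the frequency,
        # so each octave is weighted by 1 / c.
        out += value_noise(shape, c, rng) / c
    # Min-max normalize to the interval [0, 1].
    return (out - out.min()) / (out.max() - out.min())


fractal = fractal_noise((64, 64), rng=0)
```

A vector field for spatial augmentation would call `fractal_noise` once per spatial axis and stack the results.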

Fig. 4:


Fractal noise, or pink noise, results from adding noise over a range of spatial frequencies, with a relative weighting $\omega$ inversely proportional to the frequency. The example shown combines Perlin noise octaves sampled with $C \in \{2, 4, 8, 16, 32\}$ control points along each axis.

Spatial augmentation

This step aims to simulate variations in head orientation within the field of view (FOV) by translating and rotating the label map $s$. Additionally, scaling, shear, and smooth nonlinear deformation increase anatomical variability. Let $\Phi = \phi \circ A$ denote the composition of an affine transformation $A$ and a nonlinear deformation field $\phi$.

Affine transformation:

We generate $A$ in 3D as follows. For each axis $i$, we sample parameters for translation $t_i \sim \mathcal{U}(a_t, b_t)$, rotation $r_i \sim \mathcal{U}(a_r, b_r)$, scaling $z_i \sim \mathcal{U}(a_z, b_z)$, and shear $e_i \sim \mathcal{U}(a_e, b_e)$. From these, we construct the corresponding 4 × 4 matrix transforms $T$, $R$, $Z$, and $E$, respectively, which we compose as $A = TRZE$. The simpler 2D case uses two parameters for translation and scaling each, and one parameter for rotation and shear each, resulting in a 3 × 3 matrix transform [19].
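The composition $A = TRZE$ can be sketched as below. The text does not fix the rotation order, the shear parameterization, or the sampling bounds, so the choices here (rotations composed as Rx Ry Rz, shear in the upper triangle, angles in degrees) are assumptions made for illustration only.

```python
import numpy as np


def random_affine_3d(t=(-10.0, 10.0), r=(-15.0, 15.0), z=(0.9, 1.1),
                     e=(-0.1, 0.1), rng=None):
    """Sample a 4 x 4 affine matrix A = T @ R @ Z @ E."""
    rng = np.random.default_rng(rng)
    ti = rng.uniform(*t, size=3)              # translation per axis
    ri = np.deg2rad(rng.uniform(*r, size=3))  # rotation angle per axis
    zi = rng.uniform(*z, size=3)              # scaling factor per axis
    ei = rng.uniform(*e, size=3)              # shear parameters

    T = np.eye(4)
    T[:3, 3] = ti

    cx, cy, cz = np.cos(ri)
    sx, sy, sz = np.sin(ri)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    R = np.eye(4)
    R[:3, :3] = Rx @ Ry @ Rz

    Z = np.diag([*zi, 1.0])

    E = np.eye(4)
    E[0, 1], E[0, 2], E[1, 2] = ei

    return T @ R @ Z @ E


A = random_affine_3d(rng=0)
```

Since the rotation and shear blocks have unit determinant, the determinant of the upper-left 3 × 3 block equals the product of the sampled scaling factors.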

Elastic deformation:

The nonlinear component $\phi$ is a vector field. We generate it by sampling a min-max normalized $N$-component field $\hat{\phi}$ using one of the methods from Section Prerequisite: smooth noise and by randomly scaling this field to obtain $\phi = |\phi| \times \hat{\phi}$, where $|\phi| \sim \mathcal{U}(a_\phi, b_\phi)$. To ensure that $\phi$ does not introduce holes or folding, we can similarly generate a smooth, stationary velocity field (SVF) $\nu$ instead and integrate it over unit time to obtain a diffeomorphism [34].

Applying the composite transform $\Phi = \phi \circ A$ to the label map $s$ using nearest neighbor interpolation yields a new label map, $s_x = s \circ \Phi$.
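Nearest-neighbor resampling of a label map can be sketched as follows, here for a displacement field only. The sketch assumes displacements in voxel units stored with shape `(N, *labels.shape)` and clamps out-of-bounds coordinates to the border, one of several possible conventions. An affine matrix could be folded in by first converting it to a displacement field.

```python
import numpy as np


def warp_labels(labels, disp):
    """Resample `labels` at x + disp(x) with nearest-neighbor interpolation."""
    grid = np.stack(np.meshgrid(*map(np.arange, labels.shape), indexing="ij"))
    # Rounding the sampling coordinates implements nearest-neighbor lookup,
    # which avoids creating label values that do not exist in the input.
    coords = np.rint(grid + disp).astype(int)
    for axis, size in enumerate(labels.shape):
        np.clip(coords[axis], 0, size - 1, out=coords[axis])
    return labels[tuple(coords)]


labels = np.arange(12).reshape(3, 4)
shift = np.zeros((2, 3, 4))
shift[0] = 1  # sample one voxel further along the first axis
warped = warp_labels(labels, shift)
```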

Partial field of view:

To save time, brain scans often do not capture the full anatomy. We simulate partial-FOV acquisitions by cropping the image content. From $s_x$, we derive a binary cropping mask $m$ that zeroes out a proportion $p_m \sim \mathcal{U}(a_m, b_m)$ of the outermost voxels along a random axis. An efficient implementation is to construct 1D binary masks $m_i$ along each axis $i$, reshape them to have singleton dimensions along all other spatial axes, and combine them into the final mask $m$ via element-wise multiplication using the broadcasting mechanics of modern DL libraries:

$$m = m_1 \odot m_2 \odot \dots \odot m_N. \qquad (4)$$

Internally, model $g$ generates image $x$ from the label map $s_x$ (Figure 2). Depending on the target task, $g$ may return $s_x$ or the cropped label map $s_x \odot m$ (Figure 2). For example, a segmentation model may label structures in $s_x \odot m$ that are present in a cropped image, whereas an affine registration model could learn to align all structures in $s_x$ even if $x$ has a partial FOV. The next step is synthesizing a grayscale image.
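The broadcasting construction of Equation (4) can be sketched with NumPy, whose broadcasting rules mirror those of common DL libraries. The function name, the default bounds, and the random choice of which end of the axis to crop are assumptions of this sketch.

```python
import numpy as np


def crop_mask(shape, a_m=0.0, b_m=0.3, rng=None):
    """Binary mask that zeroes a proportion of voxels along one random axis."""
    rng = np.random.default_rng(rng)
    masks = [np.ones(size, dtype=bool) for size in shape]

    axis = int(rng.integers(len(shape)))
    n = int(rng.uniform(a_m, b_m) * shape[axis])
    if rng.random() < 0.5:
        masks[axis][:n] = False                 # crop the low end
    else:
        masks[axis][shape[axis] - n:] = False   # crop the high end

    # Reshape each 1D mask to singleton dimensions except along its own
    # axis, then multiply. Broadcasting expands the product to `shape`.
    m = np.ones((1,) * len(shape), dtype=bool)
    for i, m_i in enumerate(masks):
        view = [1] * len(shape)
        view[i] = -1
        m = m * m_i.reshape(view)
    return m


mask = crop_mask((8, 10, 12), rng=0)
```

Only N short 1D arrays are built explicitly; the full-size mask appears only in the final product.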

Image synthesis

The image synthesis builds on a Bayesian model of MRI contrast [6], which assumes that the voxel intensities within each anatomical structure $j$ in $s_x$ follow a Gaussian distribution. However, the noise level is unlikely to vary across tissue types, and we do not want to provide this statistic to the downstream task network. In the absence of artifacts, we therefore treat image voxels associated with label $j$ as independent samples from a normal distribution $\mathcal{N}(\mu_j, \sigma_n^2)$ with label-specific mean $\mu_j$ and global variance $\sigma_n^2$.

We generate a noise-free “mean” intensity image $x_\mu$ by assigning a random intensity value $\mu_j \sim \mathcal{U}(0, 1)$ to all voxels with label $j$. As a result, the left hippocampus might be bright and the right hippocampus dark in one batch, and this contrast may reverse in the next. Although we could constrain bilateral intensities to be the same or use per-label intensity ranges to simulate a specific modality such as CT, these constraints can limit generalization. Counterintuitively, more realistic synthesis rules have been shown to underperform relative to unconstrained sampling even for the target modality [21].

Indexing into a lookup table is an efficient approach to implementing a function $\mu: \{0, 1, \dots, J-1\} \to [0, 1]$ that associates zero-based index labels $j \in \{0, 1, \dots, J-1\}$ with the $J$ intensity values $\mu_j$. We sample $\mu$ as a 1D tensor and compute the mean image $x_\mu$ by treating $s_x$ as an index map into $\mu$, assigning each voxel location $M \in \Omega$ the intensity $x_\mu(M) = \mu_{s_x(M)}$. Next, we will corrupt the image.
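In NumPy, the lookup reduces to a single indexing operation. The snippet below is a minimal sketch with a random toy label map; the array sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_labels = 5

# Toy 2D label map s_x with zero-based integer labels.
s_x = rng.integers(0, num_labels, size=(8, 8))

# Lookup table: one random mean intensity per label.
mu = rng.uniform(0, 1, size=num_labels)

# Treat the label map as an index map into the table.
x_mu = mu[s_x]
```

Resampling `mu` at every training step changes the contrast between structures while the label map stays the same.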

Image corruptions

Applying a series of randomized corruptions to the noise-free image $x_\mu$ will create complex intensity patterns across each anatomical structure (Figure 2). As variations in noise levels across space are undesirable, we modulate image intensities with a bias field before adding noise.

Bias field:

A common MRI artifact is a low-frequency intensity non-uniformity of 10–20% across the image, often referred to as an intensity bias [35]. This effect arises from multiple factors, including eddy currents induced by gradient field changes and non-uniformities in the radio-frequency coils. In order to simulate an intensity bias, we generate a smooth scalar noise field $\hat{B}$, normalized into the [0, 1] range (Section Prerequisite: smooth noise), and apply it multiplicatively to the image:

$$x_B = x_\mu \odot (\mathrm{id} - |B| \times \hat{B}), \qquad (5)$$

where $\mathrm{id}$ is the identity field, and $|B| \sim \mathcal{U}(a_B, b_B)$ is the maximum intensity drop, sampled uniformly.
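Equation (5) can be sketched as below, taking the normalized noise field as an input. For brevity, the demo substitutes a linear ramp for the smooth noise field, and the default bounds are hypothetical.

```python
import numpy as np


def apply_bias(x_mu, b_hat, a_b=0.0, b_b=0.5, rng=None):
    """Modulate the image by 1 - strength * b_hat, following Eq. (5)."""
    rng = np.random.default_rng(rng)
    # Maximum intensity drop, reached where the normalized field equals 1.
    strength = rng.uniform(a_b, b_b)
    return x_mu * (1 - strength * b_hat)


x_mu = np.full((4, 6), 0.8)
# Stand-in for a smooth noise field normalized to [0, 1]: a linear ramp.
b_hat = np.tile(np.linspace(0, 1, 6), (4, 1))
x_b = apply_bias(x_mu, b_hat, rng=0)
```

Because the field is bounded in [0, 1], the modulation is bounded as well: intensities are never amplified and never drop by more than the sampled strength.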

An alternative implementation [21, 26] samples a low-resolution bias field from a normal distribution without renormalizing intensities, applies the exponential function voxel-wise to map values into $\mathbb{R}^+$, and then linearly upsamples the field into $\Omega$. As shown in Figure 5, this approach does not modulate intensities symmetrically, which may be undesirable.

Fig. 5:


Bias field distribution after upsampling. Uniform sampling results in a bounded, symmetric distribution (blue, shifted by 1.5 for comparison). In contrast, applying the exponential function to normally distributed values guarantees a positive field, but the resulting distribution is asymmetric and includes higher values (exceeding 2 for standard deviation σ=0.33, red).

Gaussian blur:

Similar to the bias field, we apply random blur before adding noise to ensure a diverse landscape of images—adding noise first would correlate the noise floor with the smoothness level. Blurring simulates partial volume effects, where voxel intensities are a combination of signals from different tissue types due to the finite voxel size [7].

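As in Section Prerequisite: smooth noise, we construct a normalized Gaussian kernel $\kappa_i$ of uniformly sampled standard deviation $\sigma_{\kappa,i} \sim \mathcal{U}(a_\kappa, b_\kappa)$ for each axis $i \in \{1, 2, \ldots, N\}$. We convolve $x_B$ with these kernels, yielding

$$x_\kappa = x_B * \kappa_1 * \kappa_2 * \cdots * \kappa_N. \tag{6}$$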
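The separable convolution of Eq. (6) can be sketched in NumPy as below. This is a minimal sketch under stated assumptions: standard deviations are expressed in voxels (equivalent to millimeters only for 1-mm isotropic voxels), kernels are truncated at three standard deviations, edges are zero-padded, and every image axis is longer than the kernel. The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel; a unit impulse for sigma near zero."""
    if sigma < 1e-3:
        return np.ones(1)
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    return kernel / kernel.sum()


def random_blur(x_B, a_k=0.0, b_k=2.0, rng=None):
    """Eq. (6): convolve with one kernel per axis, sigma_i ~ U(a_k, b_k)."""
    rng = np.random.default_rng() if rng is None else rng
    x = x_B.astype(float)
    for axis in range(x.ndim):
        kernel = gaussian_kernel(rng.uniform(a_k, b_k))
        # 'same' keeps the axis length; zero padding darkens the image border
        x = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode='same'), axis, x)
    return x
```

Sampling an independent standard deviation per axis produces anisotropic blur, which loosely mimics scans with thick slices.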

Additive noise:

The unstructured background noise in MRI magnitude images follows a Rician distribution and primarily arises from electrical resistance. However, for simplicity, we corrupt the image with Gaussian noise (Section Image synthesis). We uniformly sample a standard deviation $\sigma_n \sim \mathcal{U}(a_n, b_n)$ and draw a zero-mean Gaussian noise tensor $n \sim \mathcal{N}(0, \sigma_n^2)$ across $\Omega$ to obtain $x_n = x_\kappa + n$ via voxel-wise addition.
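The noise step might look as follows in NumPy. The first function sketches the Gaussian corruption described above, assuming zero-mean noise. The second function is not part of the generative model: it is included only for contrast and draws Rician noise as the magnitude of a complex signal with independent Gaussian noise on both channels. Both function names are hypothetical.

```python
import numpy as np


def add_gaussian_noise(x_kappa, a_n=0.0, b_n=0.1, rng=None):
    """Voxel-wise addition of zero-mean Gaussian noise with sigma_n ~ U(a_n, b_n)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_n = rng.uniform(a_n, b_n)
    n = rng.normal(0.0, sigma_n, size=x_kappa.shape)
    return x_kappa + n


def add_rician_noise(x, sigma, rng=None):
    """For comparison only: magnitude of a complex signal with Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    real = x + rng.normal(0.0, sigma, size=x.shape)
    imag = rng.normal(0.0, sigma, size=x.shape)
    return np.hypot(real, imag)
```

Unlike the Gaussian variant, the Rician output is never negative, which raises the noise floor in dark background regions.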

Exponentiation:

Gamma augmentation, or exponentiation, relates to gamma correction, a technique that bridges the nonlinear nature of brightness perception with the linear response of digital imaging systems. In our generative model, gamma augmentation randomizes image contrast by nonlinearly stretching or compressing the intensity range. First, we re-normalize the image to $[0, 1]$. Then, we sample a global exponent $\gamma \sim \mathcal{U}(a_\gamma, b_\gamma)$, $\gamma > 0$, and apply it voxel-wise:

$$x_\gamma = \left( \frac{x_n - \min x_n}{\max x_n - \min x_n} \right)^{\gamma}. \tag{7}$$
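Eq. (7) translates almost directly into NumPy. The sketch below assumes the default range of Table II, and the small constant guarding against division by zero for constant images is an addition of this example.

```python
import numpy as np


def random_gamma(x_n, a_g=0.5, b_g=1.5, rng=None):
    """Eq. (7): min-max normalize to [0, 1], then raise to gamma ~ U(a_g, b_g)."""
    rng = np.random.default_rng() if rng is None else rng
    gamma = rng.uniform(a_g, b_g)
    lo, hi = x_n.min(), x_n.max()
    x = (x_n - lo) / max(hi - lo, 1e-12)   # guard against constant images
    return x ** gamma
```

Exponents below 1 brighten mid-range intensities, whereas exponents above 1 darken them; the endpoints 0 and 1 stay fixed.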

Downsampling:

Together with the Gaussian blur, this step simulates a scan acquired at a lower resolution, upsampled to a higher resolution. First, we draw a downsampling factor $d_i \sim \mathcal{U}(1, b_d)$ for each axis $i \in \{1, 2, \ldots, N\}$ and downsample the image to reduce its size by these factors. Then, we upsample the result back to the original size to obtain a new image $x_d$. Both nearest-neighbor and linear interpolation are efficient choices. For the discrete-valued label map $s_x$, we use nearest-neighbor interpolation to avoid intermediate values when the downstream task requires downsampling of both $x$ and $s_x$.
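One possible nearest-neighbor implementation of this round trip uses pure index arithmetic, so it applies equally to images and discrete label maps. This is a sketch under assumptions: downsampling retains evenly spaced voxels along each axis, and the function name and signature are hypothetical.

```python
import numpy as np


def random_downsample(x, s_x=None, b_d=4.0, rng=None):
    """Subsample each axis by d_i ~ U(1, b_d), then resample to the original grid."""
    rng = np.random.default_rng() if rng is None else rng
    index = []
    for size in x.shape:
        d = rng.uniform(1.0, b_d)
        small = max(int(round(size / d)), 1)
        # Voxels retained at low resolution, evenly spread over the axis
        coarse = np.round(np.linspace(0, size - 1, small)).astype(int)
        # Nearest-neighbor upsampling: map each voxel to the closest retained one
        nearest = np.abs(np.arange(size)[:, None] - coarse[None, :]).argmin(axis=1)
        index.append(coarse[nearest])
    grid = np.ix_(*index)
    return x[grid] if s_x is None else (x[grid], s_x[grid])
```

Since only existing voxels are copied, the label map keeps its original label values and no intermediate labels appear.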

Cropping:

We zero out a proportion of voxels at the edge of the FOV along randomized axes by multiplying the image voxel-wise with the binary mask $m$ from Section Partial field of view, yielding $x_m = x_d \odot m$. Such zero-intensity regions arise when preprocessing conforms images to a larger FOV. Depending on the size and location of the masked-out region, this step may partially crop the anatomy.
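The construction of the mask is described in an earlier section, so the sketch below only illustrates one plausible reading: it zeroes a slab at one end of a single randomly chosen axis, with the slab width drawn as a proportion of the axis length. The single-axis simplification and the function name are assumptions of this example.

```python
import numpy as np


def random_crop_mask(shape, a_m=0.0, b_m=0.2, rng=None):
    """Binary mask that zeroes a proportion p_m ~ U(a_m, b_m) at one edge of the FOV."""
    rng = np.random.default_rng() if rng is None else rng
    m = np.ones(shape)
    axis = rng.integers(len(shape))
    width = int(rng.uniform(a_m, b_m) * shape[axis])
    index = [slice(None)] * len(shape)
    # Zero out either the low or the high end of the chosen axis
    if rng.random() < 0.5:
        index[axis] = slice(0, width)
    else:
        index[axis] = slice(shape[axis] - width, None)
    m[tuple(index)] = 0
    return m
```

The cropped image then follows from voxel-wise multiplication, for example `x_m = x_d * random_crop_mask(x_d.shape)`.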

Further corruptions:

In aggregate, the applied corruptions produce highly variable images for model training (Figure 1). To enhance robustness to additional artifacts or preprocessing operations, incorporating further augmentations can be advantageous. For example, we may clear image slices at random locations to simulate saturation effects. We might also set non-brain voxels to zero to simulate skull-stripping in preprocessing. Additionally, it can be beneficial to apply more aggressive corruptions such as downsampling, smoothing, and cropping only some of the time.

With the generative model in place, we will now explore practical aspects of synthesis-driven DL.

Practical discussion

The following sections outline key considerations for integrating the generative model into the DL stack and training networks with it, from a method selection framework to troubleshooting common issues.

Method selection guidance

Users choosing between conventional and synthetic training need to consider their specific task, their data, and, to a lesser extent, their computational resources. The main deciding factor is the task. Generally, domain randomization is appropriate for unsupervised problems that do not require ground-truth labels, such as registration with an overlap loss. Similarly compatible are supervised tasks whose supervision is invariant or equivariant with the transforms applied to label maps or images. For segmentation, for example, ground-truth label maps are invariant to additive image noise and equivariant with elastic augmentation. Synthesis-based training is also appropriate for tasks in which the quantity of interest can be derived from the image or label map. A counterexample is brain age prediction, as it is unclear how elastic augmentation may change the ground-truth age. For such tasks, conventional training is more appropriate unless users only apply affine transforms or carefully explore randomization ranges (Section Randomization ranges). Domain randomization is an excellent choice for robust processing toolboxes. However, if the goal is to explore network architectures or solve a specific problem quickly, especially when generalization across a wide distribution of data is not required, conventional training is almost always preferable. Similarly, domain randomization is less suitable for data-driven analyses and exploratory DL aiming to characterize an empirical distribution [36, 37].

Another important factor is training data availability. A key advantage of synthesis-driven training is the ability to achieve competitive performance with few label maps. While reasonable performance is attainable with 5–10 subjects [21], some state-of-the-art methods use only about 100 [19, 26]. Adding more label maps can increase accuracy, albeit with diminishing returns [21]. In contrast, conventional DL often relies on datasets with several thousand subjects [38]. Domain randomization is a suitable option when training images or labels are sparse. Manual labeling may be required to learn segmentation of new structures, but there are other ways to obtain labels of surrounding structures for synthesis (Section Label maps for synthesis)—labeling accuracy is largely irrelevant in synthesis-based training, since the generated images match the labels by construction.

Computational resources are typically not a limiting factor. As for any DL task, training a performant neural network requires a recent GPU. Running the generative model on the same device is efficient (Section Efficient integration); the added memory requirements are comparatively low as the model lacks trainable parameters. Some tasks benefit from increasing model capacity (Section Model architecture and capacity). The associated increase in training time and memory usage can often be accommodated by optimizing code, for example, by reducing the field of view, applying fewer convolutional filters at the highest resolution, operating at half resolution, or training with mixed precision.

Getting started

New adopters can quickly get started by building on publicly available implementations, which differ mainly in the order and details of the image corruption steps [19, 21]. Interactively experimenting with sampling ranges, such as the bias-field strength $B \sim \mathcal{U}(a_B, b_B)$, and generating examples in a Jupyter session will help users understand how the hyperparameters influence the image synthesis. In this context, setting both sampling bounds $(a_B, b_B)$ to the intended maximum value is a good habit, as it visualizes the strongest possible effect without generating many examples. Table II provides domain-randomization ranges to use as a starting point. Similar ranges have successfully been applied in multi- and cross-contrast MRI registration, segmentation, and skull-stripping of MRI, CT, and PET [19, 21, 26].

TABLE II:

Uniform domain-randomization starter ranges $[a, b]$, where SD abbreviates standard deviation. We list warp and bias field ranges assuming noise generation via Gradient noise.

| Parameter | Unit | $a$ | $b$ |
|---|---|---|---|
| Translation $t$ | mm | −30 | 30 |
| Rotation $r$ | ° | −30 | 30 |
| Scaling $z$ | % | 90 | 110 |
| Shear $e$ | % | 90 | 110 |
| Warp strength $\phi$ | mm | 0 | 20 |
| Warp control points $C$ | | 2 | 16 |
| Cropping proportion $p_m$ | % | 0 | 20 |
| Label intensity mean $\mu$ | a.u. | 0 | 1 |
| Bias drop $B$ | % | 0 | 50 |
| Bias control points $C$ | | 2 | 4 |
| Image blurring SD $\sigma_\kappa$ | mm | 0 | 2 |
| Noise intensity SD $\sigma_n$ | % | 0 | 10 |
| Gamma exponent $\gamma$ | | 0.5 | 1.5 |
| Downsampling factor $d$ | | 1 | 4 |
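For interactive exploration, a subset of the Table II ranges can be kept in a small configuration dictionary and sampled per training example. The sketch below is illustrative only: the dictionary keys, the conversion of percentages to fractions, and the `pin_max` argument are assumptions of this example. Pinning a parameter sets both bounds to the maximum, which implements the habit of visualizing the strongest possible effect.

```python
import numpy as np

# Starter ranges (a, b) transcribed from Table II; percentages as fractions
RANGES = {
    'bias_drop': (0.0, 0.5),
    'blur_sd_mm': (0.0, 2.0),
    'noise_sd': (0.0, 0.1),
    'gamma': (0.5, 1.5),
    'downsample_factor': (1.0, 4.0),
}


def sample_params(ranges, rng=None, pin_max=()):
    """Draw one value per hyperparameter; names in pin_max use a = b = maximum."""
    rng = np.random.default_rng() if rng is None else rng
    return {name: b if name in pin_max else rng.uniform(a, b)
            for name, (a, b) in ranges.items()}


# Inspect the strongest bias field while all other effects vary as usual
params = sample_params(RANGES, np.random.default_rng(0), pin_max=('bias_drop',))
```

Keeping all ranges in one place also simplifies rescaling them when moving between resolutions or target domains.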

With the generative model set up, the next step is integrating it into the learning pipeline to train a task network. Unless tackling a completely new problem, begin by reproducing a known successful task and gradually expand from there. Open-source demos are available at https://w3id.org/synthmorph, showcasing how to use publicly available code to synthesize images, train affine and deformable registration models, and use a domain-randomized model for CT-to-MRI registration. These demos run interactively in the browser and require no installation of additional software.

Model inputs and outputs

In conventional training, we sample an image $x$ and associated ground-truth quantity $y$ from the training set, pass $x$ through the task network, and compare the prediction $\hat{y}$ to $y$ in the loss function, before we compute its gradient with respect to the network weights to update these via backpropagation. In the synthesis paradigm, we use a training set $\mathcal{T}$ of label maps. At each iteration, we sample a label map $s \in \mathcal{T}$ and use the generative model $g$ to create a synthetic image-segmentation pair $(x, s_x)$ from $s$. The task network predicts $\hat{y}$ from $x$ as before, but what do we compare $\hat{y}$ to? In general, we derive the target quantity $y$ from $s_x$. For example, $y$ might be one-hot encoded brain structures for segmentation [21] or an aggregate of brain structures for skull-stripping [26]. For unsupervised registration, we might generate two image-segmentation pairs, extract fixed brain labels $y$ and brain labels $\hat{y}$ moved by the estimated transform, and compare their one-hot overlap [28]. Diagrams showing the information flow for these tasks are available in a recent review paper [5].
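The information flow for segmentation can be summarized in a few lines. The sketch below is framework-agnostic and computes the loss only; a real setup would use a DL library with automatic differentiation. The callables `generate` and `predict` are hypothetical placeholders for the generative model and the task network, and the soft Dice loss is one common choice rather than a prescribed one.

```python
import numpy as np


def one_hot(labels, num_labels):
    """Integer label map to one-hot array of shape (*labels.shape, num_labels)."""
    return np.eye(num_labels)[labels]


def soft_dice_loss(y_true, y_pred, eps=1e-6):
    """One minus the mean Dice overlap across labels."""
    axes = tuple(range(y_true.ndim - 1))        # sum over space, keep label axis
    overlap = 2 * (y_true * y_pred).sum(axes)
    total = y_true.sum(axes) + y_pred.sum(axes)
    return 1 - np.mean((overlap + eps) / (total + eps))


def training_step(label_maps, generate, predict, num_labels, rng):
    """One iteration: sample s, synthesize (x, s_x), derive y from s_x, score y_hat."""
    s = label_maps[rng.integers(len(label_maps))]
    x, s_x = generate(s, rng)                   # synthetic image-segmentation pair
    y = one_hot(s_x, num_labels)                # target comes from s_x, not from s
    y_hat = predict(x)
    return soft_dice_loss(y, y_hat)
```

Deriving the target from the spatially augmented label map rather than the input label map keeps the image and the supervision aligned.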

Efficient integration

Novice users may be tempted to generate a static set of image-segmentation pairs to train a network as they would with real data [22, 23]. We advise against this approach, as it undermines a key advantage of real-time synthesis: the ability to create a new image with each invocation. Generating an effectively endless stream of training images maximizes the potential to learn generalizable features.

There are several ways to integrate real-time synthesis into the training pipeline. One option is to set up a combined model that takes label maps as input, synthesizes images from them, and processes these through the task model [21, 28]. This setup requires extraction of the task model for validation and testing. A simpler method is calling separate models directly within the training loop. Placing both synthesis and task models on the same GPU minimizes costly copy operations between host and device but requires sufficient GPU memory. An alternative is to distribute synthesis and training over multiple processes or workers. For example, one implementation continuously generates training examples and writes them to disk using several CPU jobs, while a parallel GPU job reads the data for training, deleting samples after loading to minimize disk usage [26]. A recent project abstracts this approach into a Python package that orchestrates generation and training processes, efficiently streaming data through memory [39].
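A lightweight variant of the producer-consumer idea can be sketched with the Python standard library. This is not the implementation of any cited package: it assumes a hypothetical `generate(s, rng)` callable and uses a background thread with a bounded queue so that synthesis overlaps with training. For CPU-heavy synthesis, separate processes would likely be preferable to a thread.

```python
import queue
import threading

import numpy as np


def synthesis_stream(label_maps, generate, seed=0):
    """Endless stream of fresh (x, s_x) pairs; nothing is cached or reused."""
    rng = np.random.default_rng(seed)
    while True:
        s = label_maps[rng.integers(len(label_maps))]
        yield generate(s, rng)


def prefetch(stream, buffer_size=4):
    """Consume the stream in a background thread and buffer a few samples."""
    buffer = queue.Queue(maxsize=buffer_size)

    def worker():
        for item in stream:
            buffer.put(item)        # blocks while the buffer is full

    threading.Thread(target=worker, daemon=True).start()
    while True:
        yield buffer.get()
```

A training loop would then iterate over `prefetch(synthesis_stream(label_maps, generate))` and draw a new example at every step instead of revisiting a static set.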

Randomization dimensions

Randomizing every aspect of the training data is unnecessary, as allocating model capacity to handle heterogeneity that we can easily remove is counterproductive. For example, simple min-max normalization standardizes the intensity range, while reorienting images based on header information avoids wasting capacity on learning all possible head orientations [19]. In contrast, it is beneficial to identify characteristics of a new target distribution that the generative model may not synthesize yet. These characteristics should be randomized across a range that both covers and exceeds the effects expected in real data. For instance, low-field MRI has a lower signal-to-noise ratio than standard MRI, and adjusting the randomization ranges accordingly can improve performance [24]. As discussed in Section Generative modeling, successful implementations randomize not only the sampled values but also the sampling distributions: instead of adding noise from a normal distribution of fixed standard deviation, we also randomize the standard deviation. Similarly, we vary the spatial frequency and strength of nonlinear deformations.

Randomization ranges

Randomizing image characteristics over wide ranges can improve model generalization by increasing the diversity of the training distribution, even though the synthesized images appear unrealistic. However, excessively wide ranges can make the task too difficult or force the network to allocate capacity to accommodate unnecessary variability. For example, random sampling of image intensities on a per-structure basis leads to unrealistic tissue contrasts but helps segmentation networks generalize beyond the intensity characteristics of specific modalities. Controlled deformation of input label maps improves registration performance, whereas excessive deformation can degrade it. This degradation usually occurs gradually: prior work showed validation accuracy on real data to vary smoothly within a relatively wide neighborhood of sampling-range optima [20, 28].

To ensure that the synthesis has a constructive impact on performance, we need to tailor the randomization ranges of Table II to the specific task or target domain. Prior work on registration optimizes sampling ranges via grid search [28]. This work fixes all but one synthesis hyperparameter and trains separate models for various values, initializing each with the same trained weights. However, grid searches are computationally costly and challenging with tasks in which the applied corruptions may introduce concept shifts. For example, for brain age prediction from MRI, spatial-anatomical augmentation and applied corruptions may shift the brain age of the original anatomy in a way that is difficult to predict and control, potentially impinging on performance. A recent method explores the hyperparameter space in a more principled fashion: it learns optimal randomization ranges from a set of real, labeled images while simultaneously training a task network on data synthesized using these ranges [40]. Although the approach adds complexity, it reduces reliance on hand-crafted parameters.

Label maps for synthesis

Many publicly available neuroimaging datasets contain label maps that include brain and sometimes non-brain structures [26]. When label maps are unavailable, robust segmentation tools can derive them from the images [1, 21]. Although applications such as brain-specific registration only require brain labels to compute the loss, including non-brain labels enables synthesis of whole-head scans, improving network robustness across images with and without skull-stripping. Similarly, skull-stripping requires only two labels—brain and non-brain—but using only these labels for synthesis may limit generalization. We can create more complex image content by incorporating non-anatomical and artifactual structures segmented from images using simple methods such as k-means clustering or Gaussian mixture modeling [21, 26].

Some works extend this approach by synthesizing the training label maps entirely, whether for segmentation [25, 29] or registration [28]. This strategy eliminates the need for acquired training data and is well-suited for anatomy-agnostic tasks or situations where patch-wise processing allows the synthesis of simple anatomical features [25].

Mixed-data training

Combining real and synthetic training images is a powerful way of extending existing datasets, increasing network generality via domain randomization while including domain-specific knowledge. When label maps are available for the real data, alternating between real and synthetic enables use of the same loss function to avoid introducing a weighting problem [22, 23]. When an additional loss term is required, such as an unsupervised image-similarity term for registration [38], we need to balance the loss function. Common approaches range from simple grid search and normalization by loss term magnitude to more principled methods like weighting by learned loss uncertainty or adaptive gradient norm balancing.
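When labels exist for the real data, strict alternation is straightforward to express. The sketch below assumes a finite list of real image-label pairs and an endless synthetic stream; the function name is hypothetical, and a sampling probability could replace the fixed one-to-one ratio.

```python
import itertools


def alternate(real_pairs, synthetic_stream):
    """Yield real and synthetic training pairs in turn, cycling the finite real set."""
    real_stream = itertools.cycle(real_pairs)
    for real, synthetic in zip(real_stream, synthetic_stream):
        yield real
        yield synthetic
```

Because both sources provide an image with a matching label map, the same loss applies to every example and no weighting between loss terms is needed.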

Model architecture and capacity

Synthesis-driven training is compatible with networks of any architecture. In our experience, fully leveraging synthetic data can require larger networks, to capture the vast training distribution. For example, increasing network capacity leads to substantial gains in registration accuracy, constrained only by the available GPU memory [28]. This work settles on a U-Net architecture with 10 convolutional layers of 256 filters each. In contrast, prior work on unsupervised registration, which the study builds upon, achieves state-of-the-art performance with a similar architecture using only 16 to 32 convolutional filters per layer [38]. However, the earlier work focuses on within-contrast registration of T1-weighted MRI, whereas networks trained with synthetic data tend to generalize well across MRI sequences and to cross-contrast registration. A downside of larger networks is long training times—from a week for 3D segmentation to a month or longer for very large registration networks—making pilot experiments essential.

Pilot experiments

Developing on a computationally reduced problem accelerates progress and helps eliminate bugs early, before switching to a full setup. Additionally, parameter sweeps become faster or more precise if hyperparameters transfer from the reduced to the full problem. We therefore recommend starting in 2D or at reduced resolution, where experiments often yield early results within an hour, and where models fully train overnight. For example, halving the resolution of 3D data cuts memory and processing demands by nearly 90%. Synthesis hyperparameters generally transfer if adjusted for voxel size [19]: voxel-based ranges like blurring ($a_\kappa$, $b_\kappa$, Section Gaussian blur) should double when moving from half to full resolution. While 2D experiments avoid this conversion, they require models with configurable dimensionality. Training on patches is also an efficient option if large-scale context is expendable. However, this strategy usually implies patch-based inference since performance drops when inputs deviate from the training distribution, and it requires fusing patch-wise predictions. Upon returning to full 3D runtimes, progressive validation during training becomes essential.
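The arithmetic behind these two statements is short enough to verify directly. In the sketch below, both helper names are hypothetical, and the range conversion assumes that a hyperparameter expressed in voxels should cover the same physical extent at both resolutions.

```python
def voxel_fraction(scale, ndim=3):
    """Fraction of voxels remaining when each axis is scaled by `scale`."""
    return scale ** ndim


def rescale_voxel_range(a, b, old_voxel_mm, new_voxel_mm):
    """Convert a range expressed in voxels between two voxel sizes."""
    factor = old_voxel_mm / new_voxel_mm
    return a * factor, b * factor


print(1 - voxel_fraction(0.5))                   # 0.875, i.e. nearly 90% fewer voxels
print(rescale_voxel_range(0.0, 1.0, 2.0, 1.0))   # (0.0, 2.0): doubles at full resolution
```

Halving each of three axes leaves one eighth of the voxels, which explains the reduction of roughly 90% in memory and processing demands.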

Validating domain shift mitigation

Continuously monitoring performance on real validation data ensures that training achieves its intended effect and helps determine when to stop. We recommend using multiple small but diverse validation sets that span various image types to reveal performance differences across domains. It is also helpful to train an identical network on real data in parallel, in order to quantify the reduction in domain shift induced by domain randomization. For tasks like segmentation and registration, Dice scores on label maps offer a simple and interpretable metric.
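For reference, a per-label Dice score on discrete label maps can be computed as sketched below. The function name is hypothetical, and returning NaN for labels absent from both maps is a convention chosen for this example.

```python
import numpy as np


def dice_per_label(seg_true, seg_pred, labels):
    """Dice overlap 2|A and B| / (|A| + |B|) for each label of interest."""
    scores = {}
    for label in labels:
        a, b = seg_true == label, seg_pred == label
        denom = a.sum() + b.sum()
        overlap = np.logical_and(a, b).sum()
        scores[label] = 2 * overlap / denom if denom else float('nan')
    return scores
```

Tracking these scores separately for each validation set, rather than pooling them, makes performance differences across domains visible.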

A more direct way to assess domain-shift reduction is to measure the variability of network features in response to covariate shifts of interest, using image similarity metrics such as normalized cross-correlation or mean absolute difference. We expect features in the later layers of a deep network to become invariant to such shifts [21, 28]. For example, a segmentation network should produce similar label maps for T1 and T2-weighted images of the same brain, and thus extract similar features in its final layers (Figure 6).

Fig. 6:

Features extracted from the last network layer for two scans of the same brain that differ in contrast and resolution. Domain randomization yields stable features, whereas training the same network on real T1-weighted data does not. Adapted from prior work [21].
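The two similarity metrics mentioned above can be sketched as follows for a pair of feature maps extracted from the same layer. How the features are extracted depends on the DL framework and is outside the scope of this example; the function names are hypothetical.

```python
import numpy as np


def normalized_cross_correlation(f1, f2, eps=1e-12):
    """NCC in [-1, 1]; values near 1 indicate features that agree up to scale and offset."""
    f1 = f1 - f1.mean()
    f2 = f2 - f2.mean()
    return float((f1 * f2).sum() / (np.linalg.norm(f1) * np.linalg.norm(f2) + eps))


def mean_absolute_difference(f1, f2):
    """Average voxel-wise absolute difference; 0 for identical features."""
    return float(np.abs(f1 - f2).mean())
```

Comparing features for two scans of the same brain that differ only in the covariate of interest, such as contrast, isolates the effect of that shift.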

Troubleshooting and failure modes

Whenever adjusting a hyperparameter that influences the generative model, we advise generating 20–30 images and label maps to assess whether the change has the desired effect and does not introduce common failure modes: it is easy to unintentionally produce data that slow down convergence, cause frequent divergence, or lead to NaN loss values. Pitfalls include moving label maps out of the FOV, too much cropping when the anatomy is at the edge of the FOV, and applying image corruptions that set the entire image to zero. These failure modes can be readily identified by eye and corrected.
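Visual inspection can be complemented by a few automated checks on each generated pair. The sketch below covers the pitfalls listed above under stated assumptions: label 0 denotes background, and the foreground threshold of 1% is an arbitrary value chosen for illustration. The function name is hypothetical.

```python
import numpy as np


def check_sample(x, s_x, min_foreground=0.01):
    """Return a list of detected problems for one synthesized pair (empty if none)."""
    problems = []
    if not np.isfinite(x).all():
        problems.append('image contains NaN or Inf values')
    if np.ptp(x) == 0:
        problems.append('image is constant, for example entirely zero')
    if (s_x > 0).mean() < min_foreground:
        problems.append('labels are mostly outside the FOV')
    return problems
```

Running such checks over 20–30 generated examples after each hyperparameter change catches degenerate samples before they reach the training loop.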

If the generation works as expected but the loss does not decrease, ensure that the model inputs and outputs are correct. The added complexity of the generative model can lead to incorrect tensor selections—for instance, passing the original label map s to the loss when it should be sx after spatial augmentation. Visualizing inputs and outputs throughout the DL setup is often helpful. Another effective troubleshooting method is to write a single image-segmentation pair to disk and train the network on this one example, making feature extraction easy and convergence faster. If the loss does not decrease within this setup, there is likely a bug.

Limitations of synthetic training

Domain randomization can yield robust models that process a variety of neuroimaging data types without retraining. However, if the synthesis exclusively covers healthy anatomies or common pathologies, rarer patterns or atypical lesions may be handled suboptimally despite their clinical relevance. The limited size and diversity of many public datasets can introduce model bias when training on images synthesized from these data without sufficient spatial and anatomical augmentation. Bias can also occur when sampling rules within the generative model favor specific contrasts or shapes, leading to representation issues that impinge on balanced performance across populations.

Similarly, inadequate synthesis of complex real-world acquisition artifacts such as Gibbs ringing or radio-frequency inhomogeneities may cause networks to fail on scans that exhibit features outside the training distribution. Physiological noise from respiratory motion or pulsation is challenging to synthesize, yet it is often present in neuroimaging. If the image generation fails to capture any one of these effects in training, networks risk overfitting to synthetic artifacts. This reality gap will limit the model’s ability to generalize to real noise patterns and acquisition effects, thereby reducing performance.

Further limitations of the synthesis paradigm include a more abstract and complex setup, as well as longer training times. Users may need to adjust network capacity to handle the increased variability, select synthesis and augmentation ranges appropriate for the task, and efficiently allocate computational resources. Finally, domain randomization is not compatible with every DL task (Section Method selection guidance).

Conclusion and outlook

Domain randomization is an emerging DL strategy in neuroimaging that trains generalizable networks using synthetic images. Despite impactful advances in core image-analysis tasks, many opportunities remain to explore and broaden the impact of domain-randomized training. Potential downstream applications include cortical surface reconstruction for morphometry, MRI bias-field correction, image quality estimation, and extensions to the frequency domain—k-space. To date, most works have focused on methodological development and image processing. Yet, synthesis-based strategies could prove transformative in earlier stages of the imaging pipeline—such as adaptive motion correction, autonomous patient-centric acquisition schemes, or undersampled reconstruction strategies.

Although domain randomization substantially enhances generalizability, it is unlikely to supplant real training data. On the contrary, these paradigms are fundamentally complementary. Learned corruption processes are a promising development that highlights their synergies: recent work has shown how parametric and residual degradations of synthetic images can be learned from real data to bridge reality gaps. While this direction reduces the risk of wasting network capacity on learning unhelpful synthetic variability, understanding the trade-offs between model complexity, real data diversity, and the resulting generalizability remains an open research question. Concurrently, cutting-edge diffusion and flow-based models have tremendous potential to enrich the image synthesis well beyond heuristic corruptions. These powerful generative learning approaches may condition the synthesis on more complex anatomical priors, spanning broad neurodiversity and rare pathological variants with greater fidelity.

As neuroimaging and analysis continue to grow increasingly diverse and multimodal, scalable learning paradigms that complement real data to cover the high-dimensional space of anatomical, pathological, and technical heterogeneity will become ever more essential. This need is of particular importance for the development of large foundation models, as the availability and shareability of medical data remain comparatively restricted. The transformative impact that language models—specifically vision-language models—are having at large promises to reshape the neuroradiological landscape by enhancing diagnostic workflows through the synergistic integration of multimodal information, automated and adaptive reporting, as well as clinical decision support.

Looking ahead, domain randomization sits at an exciting intersection that combines domain-specific insight with principled generative modeling to achieve scalable learning. As generative technologies and foundation models evolve, we expect synthesis-based training to transition from handcrafted feature distributions towards data-driven, adaptive strategies—blurring the line between synthesis-based and real-world learning to pave a data-efficient pathway to general-purpose neuroimage analysis tools.

Acknowledgment

The work on this article was supported in part by the National Institute of Child Health and Human Development (R00 HD101553, R01 HD099846, R01 HD109436), the National Institute of Biomedical Imaging and Bioengineering (R01 EB033773), the National Institute on Aging (R01 AG064027), the National Institute of Neurological Disorders and Stroke (U24 NS135561), and the National Cancer Institute (R01 CA255479). The author maintains a consulting relationship with Neuro42, a company that did not have any involvement in the content of this work. He is grateful to Hanna Loetz for her unwavering support, without which this article would not have been possible.

References

  • [1].Fischl Bruce, “FreeSurfer,” NeuroImage, vol. 62, no. 2, pp. 774–781, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [2].Malone Ian B, Cash David, Ridgway Gerard R, MacManus David G, Ourselin Sebastien, Fox Nick C, and Schott Jonathan M, “MIRIAD—public release of a multiple time point Alzheimer’s MR imaging dataset,” NeuroImage, vol. 70, pp. 33–36, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Althnian Alhanoof, AlSaeed Duaa, Al-Baity Heyam, Samha Amani, Dris Alanoud Bin, Alzakari Najla, Elwafa Afnan Abou, et al. , “Impact of dataset size on classification performance: an empirical evaluation in the medical domain,” Applied Sciences, vol. 11, no. 2, pp. 796, 2021. [Google Scholar]
  • [4].Zhou Kaiyang, Liu Ziwei, Qiao Yu, Xiang Tao, et al. , “Domain generalization: a survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, pp. 4396–4415, 2022. [Google Scholar]
  • [5].Gopinath Karthik, Hoopes Andrew, Alexander Daniel C, et al. , “Synthetic data in generalizable, learning-based neuroimaging,” Imaging Neuroscience, 2024. [Google Scholar]
  • [6].Wells William M, Grimson W Eric L, Kikinis Ron, and Jolesz Ferenc A, “Adaptive segmentation of MRI data,” IEEE Transactions on Medical Imaging, vol. 15, no. 4, pp. 429–442, 1996. [DOI] [PubMed] [Google Scholar]
  • [7].Van Leemput Koen, Maes Frederik, Vandermeulen Dirk, et al. , “A unifying framework for partial volume segmentation of brain MR images,” IEEE Transactions on Medical Imaging, vol. 22, no. 1, pp. 105–119, 2003. [DOI] [PubMed] [Google Scholar]
  • [8].Rückert Daniel, Sonoda Luke I, Hayes Carmel, Hill Derek LG, Leach Martin O, and Hawkes David J, “Nonrigid registration using free-form deformations: application to breast MR images,” IEEE Transactions on Medical Imaging, vol. 18, no. 8, pp. 712–721, 1999. [DOI] [PubMed] [Google Scholar]
  • [9].Csurka Gabriela, “Domain adaptation for visual applications: a comprehensive survey,” arXiv preprint arXiv:1702.05374, 2017. [Google Scholar]
  • [10].Wang Mei and Deng Weihong, “Deep visual domain adaptation: A survey,” Neurocomputing, vol. 312, pp. 135–153, 2018. [Google Scholar]
  • [11].Johnson W Evan, Li Cheng, and Rabinovic Ariel, “Adjusting batch effects in microarray expression data using empirical bayes methods,” Biostatistics, vol. 8, no. 1, pp. 118–127, 2007. [DOI] [PubMed] [Google Scholar]
  • [12].Gretton Arthur, Smola Alex, Huang Jiayuan, Schmittfull Marcel, et al. , “Covariate shift by kernel mean matching,” Dataset shift in machine learning, vol. 3, no. 4, pp. 5, 2009. [Google Scholar]
  • [13].Sun Baochen, Feng Jiashi, and Saenko Kate, “Correlation alignment for unsupervised domain adaptation,” Domain adaptation in computer vision applications, pp. 153–171, 2017. [Google Scholar]
  • [14].Pan Zhaoqing, Yu Weijie, Yi Xiaokai, Khan Asifullah, Yuan Feng, et al. , “Recent progress on generative adversarial networks (GANs): a survey,” IEEE Access, vol. 7, pp. 36322–36333, 2019. [Google Scholar]
  • [15].Tzeng Eric, Hoffman Judy, Saenko Kate, and Darrell Trevor, “Adversarial discriminative domain adaptation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7167–7176. [Google Scholar]
  • [16].Shorten Connor and Khoshgoftaar Taghi M, “A survey on image data augmentation for deep learning,” Journal of Big Data, vol. 6, no. 1, pp. 1–48, 2019. [Google Scholar]
  • [17].Müller-Franzes Gustav, Niehues Jan Moritz, Khader Firas, Arasteh Soroosh Tayebi, Haarburger Christoph, Kuhl Christiane, Wang Tianci, Han Tianyu, Nolte Teresa, Nebelung Sven, et al. , “A multimodal comparison of latent denoising diffusion probabilistic models and generative adversarial networks for medical image synthesis,” Scientific Reports, vol. 13, no. 1, pp. 12098, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [18].Iglesias Juan Eugenio, “A ready-to-use machine learning tool for symmetric multi-modality registration of brain MRI,” Scientific Reports, vol. 13, no. 1, pp. 6657, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Hoffmann Malte, Hoopes Andrew, Greve Douglas N, Fischl Bruce, and Dalca Adrian V, “Anatomy-aware and acquisition-agnostic joint registration with SynthMorph,” Imaging Neuroscience, 2024. [Google Scholar]
  • [20].Fu Jingru, Dalca Adrian V, Fischl Bruce, Moreno Rodrigo, and Hoffmann Malte, “Learning accurate rigid registration for longitudinal brain MRI from synthetic data,” in IEEE International Symposium on Biomedical Imaging. IEEE, 2025, pp. 1–5. [Google Scholar]
  • [21].Billot Benjamin, Greve Douglas N, Puonti Oula, Thielscher Axel, Van Leemput Koen, Fischl Bruce, Dalca Adrian V, et al. , “SynthSeg: Segmentation of brain MRI scans of any contrast and resolution without retraining,” Medical Image Analysis, vol. 86, pp. 102789, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Hendrickson Timothy J, Reiners Paul, Moore Lucille A, Lundquist Jacob T, Fayzullobekova Begim, Perrone Anders J, Lee Erik G, Moser Julia, Day Trevor KM, et al. , “BIBSNet: A deep learning baby image brain segmentation network for MRI scans,” bioRxiv, 2023. [Google Scholar]
  • [23].Zalevskyi Vladyslav, Sanchez Thomas, Roulet Margaux, Verdera Jordina Aviles, Hutter Jana, Kebiri Hamza, and Cuadra Meritxell Bach, “Improving cross-domain brain tissue segmentation in fetal MRI with synthetic data,” in Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 437–447. [Google Scholar]
  • [24].Laso Pablo, Cerri Stefano, Sorby-Adams Annabel, et al., “Quantifying white matter hyperintensity and brain volumes in heterogeneous clinical and low-field portable MRI,” in IEEE International Symposium on Biomedical Imaging. IEEE, 2024, pp. 1–5.
  • [25].Chollet Etienne, Balbastre Yaël, Mauri Chiara, et al., “Neurovascular segmentation in sOCT with deep learning and synthetic training data,” arXiv preprint arXiv:2407.01419, 2024.
  • [26].Hoopes Andrew, Mora Jocelyn S, Dalca Adrian V, et al., “SynthStrip: skull-stripping for any brain image,” NeuroImage, vol. 260, pp. 119474, 2022.
  • [27].Kelley William, Ngo Nathan, Dalca Adrian V, Fischl Bruce, Zöllei Lilla, and Hoffmann Malte, “Boosting skull-stripping performance for pediatric brain images,” in IEEE International Symposium on Biomedical Imaging. IEEE, 2024, pp. 1–5.
  • [28].Hoffmann Malte, Billot Benjamin, Greve Douglas N, Iglesias Juan Eugenio, Fischl Bruce, and Dalca Adrian V, “SynthMorph: learning contrast-invariant registration without acquired images,” IEEE Transactions on Medical Imaging, vol. 41, no. 3, pp. 543–558, 2021.
  • [29].Dey Neel, Abulnaga Mazdak, Billot Benjamin, et al., “AnyStar: domain randomized universal star-convex 3D instance segmentation,” in IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 7593–7603.
  • [30].Hoffmann Malte, Hoopes Andrew, Fischl Bruce, and Dalca Adrian V, “Anatomy-specific acquisition-agnostic affine registration learned from fictitious images,” in Medical Imaging 2023: Image Processing. SPIE, 2023, vol. 12464, p. 1246402.
  • [31].Dey Neel, Billot Benjamin, Wong Hallee E, Wang Clinton, Ren Mengwei, Grant Ellen, et al., “Learning general-purpose biomedical volume representations using randomized synthesis,” in International Conference on Learning Representations, 2025, pp. 1–32.
  • [32].Iglesias Juan Eugenio, Billot Benjamin, Balbastre Yaël, Tabari Azadeh, Conklin John, González R Gilberto, Alexander Daniel C, Golland Polina, Edlow Brian L, Fischl Bruce, et al., “Joint super-resolution and synthesis of 1 mm isotropic MP-RAGE volumes from clinical MRI exams with scans of different orientation, resolution and contrast,” NeuroImage, vol. 237, pp. 118206, 2021.
  • [33].Perlin Ken, “An image synthesizer,” ACM SIGGRAPH Computer Graphics, vol. 19, no. 3, pp. 287–296, 1985.
  • [34].Arsigny Vincent, Commowick Olivier, Pennec Xavier, and Ayache Nicholas, “A Log-Euclidean framework for statistics on diffeomorphisms,” in Medical Image Computing and Computer-Assisted Intervention. Springer, 2006, pp. 924–931.
  • [35].Sled John G, Zijdenbos Alex P, and Evans Alan C, “A nonparametric method for automatic correction of intensity nonuniformity in MRI data,” IEEE Transactions on Medical Imaging, vol. 17, no. 1, pp. 87–97, 1998.
  • [36].Abukmeil Mohanad, Ferrari Stefano, Genovese Angelo, Piuri Vincenzo, and Scotti Fabio, “A survey of unsupervised generative models for exploratory data analysis and representation learning,” ACM Computing Surveys, vol. 54, no. 5, pp. 1–40, 2021.
  • [37].Yang Zhijian, Wen Junhao, Erus Guray, et al., “Identifying five dominant dimensions of neurodegeneration in brain aging through deep learning: correlations with clinical and genetic measures,” in Alzheimer’s Association International Conference. ALZ, 2024.
  • [38].Balakrishnan Guha, Zhao Amy, Sabuncu Mert R, Guttag John, and Dalca Adrian V, “VoxelMorph: a learning framework for deformable medical image registration,” IEEE Transactions on Medical Imaging, vol. 38, no. 8, pp. 1788–1800, 2019.
  • [39].Doan Mike and Plis Sergey, “Scaling synthetic brain data generation,” IEEE Journal of Biomedical and Health Informatics, 2024.
  • [40].Hu Xiaoling, Puonti Oula, Iglesias Juan Eugenio, Fischl Bruce, and Balbastre Yael, “Learn2Synth: learning optimal data synthesis using hypergradients,” arXiv preprint arXiv:2411.16719, 2024.
