Author manuscript; available in PMC 2026 May 7.
Published in final edited form as: Neuroimage. 2026 Mar 24;331:121881. doi:10.1016/j.neuroimage.2026.121881

PRMix: Primary Region Mix Augmentation and Benchmark Dataset for Precise Whole Mouse Brain Anatomical Delineation

Kunhao Yuan a, Hanan Woods a, Ülkü Günar a, Digin Dominic a, Ying Wu a,b, Zhen Qiu a,c, Seth GN Grant a,d,*
PMCID: PMC7619041  EMSID: EMS213524  PMID: 41887545

Abstract

The architecture of the mouse brain shares remarkable similarities with the human brain, making it an essential model for studying brain pathologies, synaptic diversity, and regional specialization. A key step in such studies involves registering molecular images to reference brain atlases, a process hindered by the difficulty of accurately delineating brain regions. To this end, we have curated a collection of high-resolution, dual-fluorescence microscopy images, termed the dual-fluorescence mouse brain microscopy (DMBM) dataset, complemented by expert annotations of 118 subregions in parasagittal sections. This dataset provides unprecedented insights into the molecular and structural complexity of the mouse brain. However, its full potential for detailed whole-brain analysis is limited by boundary ambiguity and sample scarcity, which hinder existing automated segmentation methods and prompted the development of the primary region mix (PRMix) augmentation method. PRMix is specifically designed to expand such datasets, enhance the realism of synthetic data, and minimize overlap between adjacent regions. Our approach, together with the curated dataset, achieves superior segmentation performance across the mouse brain compared with existing methods, setting a new benchmark in brain imaging research.

Keywords: Mouse brain delineation, Data augmentation, Fluorescence microscopy, Brain atlas, Dense segmentation

1. Introduction

The mouse brain is a fundamental model in neuroscience research owing to its structural and functional parallels with the human brain (Zeisel et al., 2018; Papp et al., 2014). Elucidating its molecular, synaptic, and cellular architecture is essential for understanding brain organization and function. Recent advances in molecular labeling, genetic tagging, and high-resolution imaging have enabled the generation of diverse, brain-wide datasets (Zhu et al., 2018; Bulovaite et al., 2022). However, the standard reference atlases that define brain region boundaries remain based on classical histology (Lein et al., 2007; Paxinos and Watson, 2006), creating a critical need for accurate delineation methods that are compatible with modern molecular imaging modalities (Zhu et al., 2018).

Manual delineation and annotation of brain regions are labor-intensive and prone to inter-annotator variability, limiting both reproducibility and scalability. Automated methods offer a scalable alternative, enabling high-throughput and standardized analyses. However, their performance critically depends on the availability of high-quality and diverse training datasets, which are particularly limited for modalities that integrate molecular information with anatomically accurate annotations. In this context, data augmentation plays a crucial role in enriching datasets by introducing variability and enhancing the robustness and generalization of automated models. Yet, most existing methods focus on binary segmentation problems, such as tumor segmentation (Menze et al., 2014) or lesion segmentation (Basaran et al., 2023), leaving more complex tasks, such as dense brain delineation, relatively underexplored. To address these challenges, we present the dual-fluorescence mouse brain microscopy (DMBM) dataset, a manually annotated collection of mouse brain parasagittal sections that integrates both structural and molecular information. To improve the precision of automated brain region delineation (Wang et al., 2022), we propose primary region mix (PRMix), a novel augmentation method that preserves the original anatomical structure of the brain while minimizing regional overlaps, enabling realistic data synthesis.

2. Related work

2.1. Brain atlases

The Allen brain atlas (Lein et al., 2007) and the Paxinos brain atlases (Paxinos and Watson, 2006) are widely used reference atlases that show delineated regions in planes of tissue sections. A core methodology used to create these atlases is Nissl staining (Kádár et al., 2009), which highlights neuronal cell bodies and cytoarchitecture but lacks the resolution needed to reveal molecular structures or synaptic connectivity. To overcome this limitation, fluorescent protein probes, such as those fused to endogenous synaptic proteins (Zhu et al., 2018), self-labeling tags such as HaloTag (Los et al., 2008), and antibody labeling (Curran et al., 2021), offer enhanced molecular insight and have been instrumental in developing single-synapse resolution maps of the mouse and human brain. In contrast to the cellular-level view of Nissl staining, protein-marker imaging provides superior subcellular detail.

More recently, advances in high-throughput 3D imaging have accelerated the creation of brain-wide datasets with cellular or subcellular resolution. Light-sheet fluorescence microscopy (LSFM), for instance, has been instrumental in generating whole-brain maps by enabling rapid optical sectioning of cleared tissue (Perens et al., 2021). Similarly, serial two-photon tomography (STPT) has been employed to systematically image and reconstruct the entire mouse brain at micron-scale resolution, providing detailed cytoarchitectural and connectivity data (Vousden et al., 2015).

While these methods provide invaluable insights into whole-brain architecture and function, they often rely on single molecular markers or are optimized for tracing long-range projections.

2.2. Dual-synaptic markers

Our work leverages a dual-synaptic marker strategy to dissect the molecular diversity of synapses. The two proteins we visualize, post-synaptic density protein 95 (PSD95) and synapse-associated protein 102 (SAP102), are key scaffolding molecules within the postsynaptic terminal of excitatory synapses but exhibit distinct expression patterns and functional roles throughout development (Metzbower et al., 2024) and across different brain regions (Zhu et al., 2018; Cizeron et al., 2020; Migaud et al., 1998; Cuthbert et al., 2007). Imaging these molecules at single-synapse resolution across the brain has allowed for the generation of the first brain-wide synaptic maps in mammals across the lifespan (Cizeron et al., 2020; Bulovaite et al., 2022), in genetic models of neurodevelopmental disorders (Zhu et al., 2018; Tomas-Roca et al., 2022) and in sleep deprivation (Koukaroudi et al., 2024).

The use of such multi-marker synaptic atlases has vast potential. They provide a crucial baseline for studying how synaptic composition is altered in models of disease, the effect of experience and learning, the impact of pharmacological interventions, and many other applications. Furthermore, this approach can be extended to include additional synaptic markers to create even more detailed synaptome maps (Zhu et al., 2018), enabling researchers to undertake advanced functional (Velicky et al., 2023) and connectomic (Winding et al., 2023) studies.

2.3. Image augmentation

Deep learning-based image analysis, especially in medical imaging, often faces data scarcity (Litjens et al., 2017; Frid-Adar et al., 2018). To mitigate this, image augmentation techniques, which were originally popularized in natural image recognition, have been widely adopted. Traditional approaches apply geometric transformations such as flipping and affine adjustments (rotation, scaling, translation) to teach models spatial invariance, as well as intensity perturbations (brightness and contrast shifts) to ensure robustness to variations in imaging conditions (Krizhevsky et al., 2012; Shorten and Khoshgoftaar, 2019; Chen et al., 2020).

More sophisticated techniques have emerged that mix information between images to expand dataset diversity. MixUp (Zhang et al., 2018) introduced linear image combinations to expand datasets and regularize training, and was later extended to segmentation tasks (Ghiasi et al., 2021). Recent advances include patch-based augmentation (Yun et al., 2019), scribble-based methods for medical imaging (Zhang and Zhuang, 2022), semantic-aware augmentation (Zhang et al., 2023; Wang et al., 2025), and self-adaptive blending to address background inconsistencies (Zhu et al., 2022). While effective for object-centric tasks, these methods are nevertheless ill-suited for whole-brain analysis as they disregard global anatomical topology.

2.4. Image synthesis

A related, yet distinct, line of research employs generative models to synthesize medical images (Zhu et al., 2017), with a primary focus on modality imputation (i.e., translating images between modalities). This has proven effective in applications ranging from magnetic resonance imaging (MRI) and computed tomography (CT) image synthesis (Chartsias et al., 2019; Reaungamornrat et al., 2022), to more recently, in multi-modal brain MRI to reduce data collection requirements (Yu et al., 2022). However, it was not designed for de novo sample synthesis and has not demonstrated success on high-resolution microscopy brain images, where cellular and regional integrity is a critical requirement.

PRMix, with its primary region sampling and overlap-aware augmentation, is specifically designed to address these gaps.

3. The dual-fluorescence mouse brain microscopy dataset

3.1. Data collection

The dataset comprises whole-brain parasagittal sections from 96 mice (48 male, 48 female) ranging in age from 1–12 months. Detailed procedures for the generation of the mouse lines are described in Zhu et al. (2018). Briefly, the endogenous Psd95 and Sap102 genes were genetically modified by inserting the coding regions for the fluorescent proteins eGFP and mKO2 into the 3' regions of the genes, resulting in the expression of PSD95-eGFP and SAP102-mKO2 fusion proteins. Mice were anesthetized with sodium pentobarbital and perfused with saline followed by PFA. Brains were dissected, post-fixed, and cryo-embedded in OCT compound. Parasagittal sections (18 μm) were cut referencing Allen mouse brain atlas slices 11–12 (Lein et al., 2007). Whole-brain imaging was performed on a Nikon Eclipse Ti2 with a spinning disk confocal system, and 856 × 812 individual tiles per image were stitched into full images with 16× downsampling. Subregions were manually delineated in ImageJ using the PSD95/SAP102 protein markers, guided by the Allen brain atlas.

3.2. Dataset statistics

The collected dataset comprises 102 dual-fluorescence whole-brain parasagittal images (from n = 96 mice), with a median resolution of 6383 × 12531 pixels, encompassing 118 well-defined anatomical subregions of the sagittal mouse brain slices. Three exemplar images are shown in Fig. 1. The total number of pixels for each subregion is log-rescaled and summarized in Fig. 2(a), which highlights significant variations in pixel counts across subregions. Notably, the foreground-to-background ratio is approximately 0.84, making the proposed dataset highly challenging yet informative for neuroscience and medical research. Fig. 2(b) illustrates the missing subregions for specific image IDs. Owing to the locations of the slices during sectioning, certain areas, such as AOB and RSPv6b, are present in only two slices and are considered less reliable and representative. We therefore excluded them from evaluation but retained them in the figure for completeness.
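For reference, these summary statistics can be reproduced from the annotation masks with a short script. The sketch below assumes label maps encode 0 for background and integer ids 1–118 for the subregions; this encoding is our illustration, not a documented dataset convention.

    import numpy as np

    def region_pixel_stats(label_maps, n_regions=118):
        # label_maps: iterable of 2D integer arrays; 0 = background,
        # 1..118 = anatomical subregion ids (our illustrative encoding).
        counts = np.zeros(n_regions + 1, dtype=np.int64)
        for lab in label_maps:
            ids, n = np.unique(lab, return_counts=True)
            counts[ids] += n
        # Log-rescaled per-region totals, as plotted in Fig. 2(a).
        log_counts = np.log10(np.maximum(counts[1:], 1))
        # Foreground-to-background pixel ratio (reported as ~0.84).
        fg_bg_ratio = counts[1:].sum() / max(counts[0], 1)
        return log_counts, fg_bg_ratio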

Fig. 1.

Fig. 1

Exemplar images from the curated dataset. From left to right: whole-brain images stained for SAP102 and PSD95, the dual-channel composite image (SAP102 in red, PSD95 in green), and the corresponding manually annotated brain regions.

Fig. 2.

Fig. 2

Top (a). Distribution of pixel counts across regions. Bottom (b). Region availability per image with light blue indicating presence and dark blue representing absence.

4. Methods

4.1. Motivation

While we have introduced a high-resolution, dual-fluorescence dataset, microscopy data remain scarcer than CT (Wasserthal et al., 2023) or MRI (Menze et al., 2014) data, owing to their extremely large size (e.g., 102k × 200k pixels) and low acquisition throughput, and are thus insufficiently diverse on their own to remove the need for augmentation. This scarcity, coupled with biological variability, makes augmentation essential.

Mixing-based augmentations like CutMix (Yun et al., 2019) and CarveMix (Zhang et al., 2023) are suboptimal for our task, as they were designed for object-centric tasks and fail to preserve global context. Generative methods such as the MouseGAN series (Yu et al., 2022) perform well when synthesizing low-resolution (~0.1 mm) images with a small number of regions (<30 per hemisphere). When applied to high-resolution brain images (~100 nm scale) with over a hundred subregions, however, they often generate anatomically plausible but texturally blurry structures, which compromises the crucial topological relationships among numerous small subregions. PRMix is thus motivated by the need to enhance data diversity and realism for effective modeling. The overall diagram of the proposed PRMix is illustrated in Fig. 3, with the subsequent paragraphs detailing its design objectives and implementation.

Fig. 3.

Fig. 3

Overview of the proposed PRMix, consisting of three modules: (1) offline hard-sample mining (HSM), (2) primary region sampling (PRS), and (3) overlap-aware augmentation (OAA). Only a single fluorescent marker is shown for clarity.

4.2. Offline hard-sample mining

In our initial experiments, random sample mixing introduced feature inconsistencies, while exclusively selecting visually similar samples lacked sufficient challenge for effective learning. We therefore designed a strategy, termed offline hard-sample mining, that pairs each target image with multiple query images drawn from a diverse range of pre-computed similarities across the whole dataset. We quantify similarity through a mask overlap score, a metric designed to emphasize structural variations, which constitute the primary challenge in delineation, whereas standard augmentations compensate for variations in texture and intensity. For each target image, similarity scores are computed against the remaining training images and sorted. The top 20% most similar samples are designated as ‘easy’ samples, while the remaining 80% are labeled as ‘hard’ samples. As shown in Fig. 4, ‘easy’ sample-mixing minimally affects adjacent regions. In the ‘hard’ case, however, a direct ‘copy-and-paste’ approach leads to significant overlaps with the striatum and pallidum regions. PRMix resolves this by recalculating the target primary region’s optimal size, orientation, and location to fit properly within the target image. During data augmentation, we mix ‘easy’ and ‘hard’ samples in varying proportions, aiming to balance challenge and learnability. Empirically, oversampling 80% from the hard-sample set yielded the best results. Details of the different sampling ratios are summarized in Table 5.
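The mining step can be sketched as follows; this is a hedged illustration in which intersection-over-union of binary foreground masks stands in for the paper's mask overlap score, whose exact definition may differ.

    import numpy as np

    def mask_overlap_score(mask_a, mask_b):
        # Intersection-over-union of two binary foreground masks, used here
        # as a stand-in for the paper's mask overlap score.
        inter = np.logical_and(mask_a, mask_b).sum()
        union = np.logical_or(mask_a, mask_b).sum()
        return inter / union if union > 0 else 0.0

    def mine_hard_samples(target_mask, query_masks, easy_fraction=0.2):
        # Score every candidate against the target and sort by similarity.
        scores = [mask_overlap_score(target_mask, q) for q in query_masks]
        order = np.argsort(scores)[::-1]          # most similar first
        n_easy = max(1, int(easy_fraction * len(order)))
        return order[:n_easy], order[n_easy:]     # 'easy' ids, 'hard' ids

    def sample_query(rng, easy_ids, hard_ids, hard_portion=0.8):
        # Oversample ~80% of mixing partners from the 'hard' set.
        pool = hard_ids if rng.random() < hard_portion else easy_ids
        return rng.choice(pool)

Because the scores are computed once over the whole dataset, the easy/hard partition adds no cost at training time.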

Fig. 4.

Fig. 4

Exemplar samples from the ‘easy’ and ‘hard’ mixing categories. Target primary regions are highlighted with label masks, and affected areas are shown within dashed rectangles. For clarity, only one primary region is selected in the example; however, PRMix can handle mixing multiple primary regions from different sources.

Table 5. Ablation on the portion of hard samples and key components.

(a) The portion of ‘hard’ samples in HSM

Method      F1     mIoU
Baseline    65.33  54.33
PRMix 20%   72.73  59.98
PRMix 50%   69.71  58.02
PRMix 80%   74.66  61.78

(b) Key components

PRS  HSM  OAA   F1     mIoU
✓               73.70  60.76
✓    ✓          71.17  59.21
✓         ✓     70.60  58.57
✓    ✓    ✓     74.66  61.78

4.3. Primary region sampling

Based on anatomical and functional criteria (Lein et al., 2007; Papp et al., 2014), we grouped the 118 brain subregions into 11 primary regions: cerebellum (CB), thalamus (TH), midbrain (MB), hindbrain (HB), isocortex, hypothalamus (HY), olfactory areas (OLF), cortical subplate (CTXsp), striatum (STR), pallidum (PAL), and hippocampal formation (HPF), as illustrated in Fig. 5. These primary regions serve as the fundamental units for mixing. During mixing, we sample only 3–9 of these regions from multiple sources to replace their counterparts in the target image, boosting diversity while preserving feature consistency. Primary region sampling offers two key advantages: (1) it preserves the anatomical structure of the brain image, and (2) it avoids potential subregional overlaps during the mixing process. More importantly, our design incorporates mixing at an intermediate semantic level, setting it apart from existing object-centric approaches (Zhang et al., 2018; Yun et al., 2019; Zhang et al., 2023; Wang et al., 2025). This makes it particularly well-suited for dense foreground tasks such as whole-brain delineation.
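For illustration, the grouping and sampling logic can be sketched as below; the subregion-id mapping shown is hypothetical and truncated, not the dataset's actual label scheme.

    import numpy as np

    PRIMARY_REGIONS = ["CB", "TH", "MB", "HB", "Isocortex", "HY",
                       "OLF", "CTXsp", "STR", "PAL", "HPF"]

    # Hypothetical subregion-id grouping (not the dataset's actual scheme);
    # each primary region is the union of its constituent subregion labels.
    SUBREGIONS_BY_PRIMARY = {
        "CB": [1, 2, 3],
        "TH": [4, 5, 6],
        # ... remaining primary regions omitted for brevity
    }

    def primary_mask(label_map, primary):
        # M_s^p = M_s^1 ∪ M_s^2 ∪ ... ∪ M_s^k (union of subregion masks).
        return np.isin(label_map, SUBREGIONS_BY_PRIMARY[primary])

    def sample_primary_regions(rng, available):
        # Draw 3-9 primary regions from the available pool to be replaced.
        k = min(rng.integers(3, 10), len(available))
        return rng.choice(available, size=k, replace=False)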

Fig. 5. The 11 primary brain regions illustrated in distinct colors, showing the grouping of their 118 constituent subregions.

Fig. 5

4.4. Overlap-aware augmentation

To ensure that sampled regions align in location and size with the target image, we implemented an overlap-aware augmentation module. This module begins by applying random affine transformations, such as rescaling, rotation, and translation, to the query masks for individual primary regions. Applying these transformations independently to local regions introduces non-linear structural variations, simulating complex geometric distortions, such as local stretching and tissue shifting, frequently observed in histological preparations. Corresponding regions are then cut out from the target masks, leaving behind a complementary mask which is used to calculate intersections with the query mask. If the intersection is below a certain threshold τ, the query region is pasted directly onto the complementary masks at its original location. Otherwise, a greedy search for optimal affine parameters begins, to ensure a minimum intersection where possible. Once the greedy search converges, the optimized affine transformations are reapplied to the query images and masks, followed by the final pasting process. Crucially, this pasting operation is performed synchronously across both fluorescent channels. By transferring the multi-channel signal as a coupled unit, PRMix strictly preserves the intra-region synaptic co-localization patterns, essential for defining molecular identity, even as the background context changes. Furthermore, by training the model to resolve anatomical structures despite these local geometric shifts and the artificial boundary discontinuities introduced by mixing, our approach effectively enhances robustness against physical artifacts like tears and folding. The overall process is outlined as pseudo-code in Algorithm 1.

Algorithm 1. OAA: Overlap-Aware Augmentation.

Input: source image and segmentation mask for primary region p: {X, Y}_s^p; target segmentation mask for primary region p: Y_t^p; overlap tolerance τ; rotation, scaling, translation, and shift ranges: [σ^r_l, σ^r_h], [σ^s_l, σ^s_h], [σ^t_l, σ^t_h], [s_l, s_h]
Output: augmented low-overlap source mask and image: M̃_s^p, X̃_s^p

Obtain binary region masks: M_s^p ← 𝕀(Y_s == p), M_t^p ← 𝕀(Y_t == p);
Sample random affine transformations: T_r ~ U(σ^r_l, σ^r_h), T_s ~ U(σ^s_l, σ^s_h), T_t ~ U(σ^t_l, σ^t_h), T_aff = T_r ∘ T_s ∘ T_t;
Apply affine transformation to the mask: M_s^p ← T_aff(M_s^p);
Initialize best overlap and shifts: O ← +∞; (s_x, s_y) ← (s_l, s_l);
while s_x ∈ [s_l, s_h] do
    while s_y ∈ [s_l, s_h] do
        Shift binary region mask: M_tmp = T_shift(M_s^p, s_x, s_y);
        Calculate overlap between the shifted mask and the complementary target mask: o = Σ M_tmp ⊙ (1 − M_t^p);
        if o < O then
            O ← o;
            break inner loop if O < τ;
        end
        s_y ← s_y + stepsize;
    end
    s_x ← s_x + stepsize;
    break outer loop if O < τ;
end
Obtain optimal mask and image: M̃_s^p ← M_tmp; X̃_s^p = T_shift(X_s^p, s_x, s_y)
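To make the procedure concrete, here is a runnable NumPy sketch of Algorithm 1's greedy search, simplified to pure translations: the random affine sampling is omitted, the wrap-around np.roll stands in for T_shift, and interpreting τ relative to the region area is our own choice.

    import numpy as np

    def oaa_shift_search(m_src, m_tgt, x_src, tau=0.01, lo=-64, hi=64, step=8):
        # m_src, m_tgt: binary (H, W) masks of the primary region in the
        # source/target; x_src: source image, (H, W) or (H, W, C).
        comp = 1 - m_tgt.astype(np.int64)      # complementary target mask (1 - M_t^p)
        tau_abs = tau * m_src.sum()            # tolerance relative to region size
        best_o, best = np.inf, (0, 0)
        for sx in range(lo, hi + 1, step):
            for sy in range(lo, hi + 1, step):
                m_tmp = np.roll(m_src, (sx, sy), axis=(0, 1))   # T_shift (wrap-around)
                o = int((m_tmp * comp).sum())  # o = Σ M_tmp ⊙ (1 - M_t^p)
                if o < best_o:
                    best_o, best = o, (sx, sy)
                    if best_o < tau_abs:       # early exit once under tolerance
                        break
            else:
                continue
            break
        sx, sy = best
        m_out = np.roll(m_src, (sx, sy), axis=(0, 1))
        x_out = np.roll(x_src, (sx, sy), axis=(0, 1))  # image follows the mask
        return m_out, x_out, best_o

In the full method, the returned shift would be applied synchronously to both fluorescent channels so that intra-region co-localization patterns are preserved.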

4.5. Comparison with existing mixing approaches

Before formalizing the proposed PRMix, we compare it with existing mixing-based augmentations. Starting with the simplest method, MixUp (Zhang et al., 2018) blends a pair of images, i.e., source image $X_s$ and target image $X_t$, and their corresponding labels $Y_s$ and $Y_t$. The synthesized image and label are generated through:

$$\tilde{X} = \lambda X_s + (1 - \lambda) X_t, \qquad \tilde{Y} = \lambda Y_s + (1 - \lambda) Y_t, \tag{1}$$

where λ ~ U(0, 1) is a controlling factor. For CutMix (Yun et al., 2019) and CarveMix (Zhang et al., 2023), their mixing strategies can be summarized as:

$$\tilde{X} = X_s \odot M_s + X_t \odot (1 - M_s), \qquad \tilde{Y} = Y_s \odot M_s + Y_t \odot (1 - M_s). \tag{2}$$

Here $\odot$ is the element-wise (Hadamard) product and $M_s$ represents a binary mask used to sample areas from the source image, which can be either rectangular regions (CutMix) or semantic regions (CarveMix). Taking both source and target image semantics into account, we obtain the basic form of our PRMix:

$$\tilde{X}^p = \hat{X}_s^p \odot \hat{M}_s^p + X_t^p \odot (1 - M_t^p), \qquad \tilde{Y}^p = \hat{Y}_s^p \odot \hat{M}_s^p + Y_t^p \odot (1 - M_t^p), \tag{3}$$

where $\hat{M}_s^p$ is the overlap-mitigated semantic mask for primary region $p$, obtained as $\hat{M}_s^p = \mathrm{OAA}(M_s^p, M_t^p)$; $M_s^p$ is the union of the $k$ subregion masks within a primary region, i.e., $M_s^p = M_s^1 \cup M_s^2 \cup \dots \cup M_s^k$; and the adapted source image $\hat{X}_s^p$ is generated from the overlap-mitigated semantic mask via $\hat{X}_s^p = T(X_s^p, \hat{M}_s^p)$. Additionally, to enhance the versatility and variability of the dataset, we further extend PRMix to multiple source images: denoting the above process as $\tilde{X}^p, \tilde{Y}^p = \mathrm{PRM}(\{X, Y\}_s^p, \{X, Y\}_t^p)$, we iterate it multiple times via:

$$\tilde{X}, \tilde{Y} = \underbrace{\mathrm{PRM}\big(\dots, \mathrm{PRM}\big(\{X, Y\}_{s_2}, \mathrm{PRM}(\{X, Y\}_{s_1}, \{X, Y\}_t)\big)\big)}_{P \,\leq\, \text{total number of primary regions}} \tag{4}$$

This enables mixing at an intermediate semantic level and allows generalization beyond a single image pair, unlike traditional augmentation methods where repeated lower-level mixing often leads to occlusion and ambiguity.
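Read as array operations, Eqs. (1)–(3) amount to the following; this sketch assumes images and one-hot label maps stored as NumPy arrays, with binary masks broadcastable against them.

    import numpy as np

    def mixup(xs, xt, ys, yt, lam):
        # Eq. (1): global linear blend; labels should be one-hot for the
        # blend to remain meaningful.
        return lam * xs + (1 - lam) * xt, lam * ys + (1 - lam) * yt

    def mask_mix(xs, xt, ys, yt, ms):
        # Eq. (2): CutMix/CarveMix-style copy-paste with a binary source mask.
        return xs * ms + xt * (1 - ms), ys * ms + yt * (1 - ms)

    def prmix_basic(xs_hat, xt, ys_hat, yt, ms_hat, mt):
        # Eq. (3): paste the overlap-mitigated source region (ms_hat) into
        # the hole left by cutting the target primary region (mt).
        return xs_hat * ms_hat + xt * (1 - mt), ys_hat * ms_hat + yt * (1 - mt)

The key difference from Eq. (2) is that PRMix uses two distinct masks: the region removed from the target (mt) and the overlap-mitigated source region pasted in its place (ms_hat).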

Exemplar synthesized images from different methods are illustrated in Fig. 6. MixUp (Zhang et al., 2018) generates ambiguous images, whereas CutMix (Yun et al., 2019) and CarveMix (Zhang et al., 2023) focus on single regions and struggle to preserve global anatomy. By contrast, our method produces the most realistic and diverse-looking images. We also compared our approach to a generative model, MouseGAN++ (Yu et al., 2022), initially designed for modality imputation in MRI. As shown at the bottom of the figure, while this GAN-based method maintains global structure, it does so at the cost of local texture and intensity details, which are essential for dense whole-brain delineation.

Fig. 6.

Fig. 6

Exemplar results illustrating different augmentation outputs derived from identical source-target pairs. Panel (a): The middle two rows show the original image and its corresponding label mask, while the top and bottom rows provide zoomed-in views of the manipulated regions. Panel (b): The sample synthesis flow for generative method MouseGAN++. Best viewed in color.

5. Experiments

5.1. Implementation details

To ensure a fair comparison between different augmentation methods and minimize the influence of model architectures, all experiments were conducted using the state-of-the-art medical image segmentation model nnUNet (Isensee et al., 2024), which enhances the classic UNet (Ronneberger et al., 2015) with task-specific configurations. Both the proposed PRMix and its comparators were applied offline to generate augmented datasets for each fold, thereby enhancing training efficiency. Additionally, a shared pipeline was applied during mixing, which included standard pre-processing (intensity normalization, foreground oversampling, color jittering, and affine transformations) and a final morphological opening step after mixing to smooth artificial edges resulting from manual delineation. The model was trained with an input patch size of 768 × 1536, approximately 1/64 of the median image size, using 2 patches per GPU and 250 minibatches per epoch. The collected dataset was split into 80 training/validation and 22 strictly isolated and unaugmented testing samples, and all experiments were conducted using 5-fold cross-validation on two NVIDIA RTX 4090 GPUs.
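The post-mix smoothing step can be sketched as below; this is our interpretation of the morphological opening applied per class, and the structuring-element size is a guess rather than a reported hyperparameter.

    import numpy as np
    from scipy import ndimage

    def smooth_mixed_labels(label_map, size=5):
        # Per-class binary opening to smooth the artificial edges left by
        # pasting manually delineated regions.
        out = np.zeros_like(label_map)
        structure = np.ones((size, size), dtype=bool)
        for cls in np.unique(label_map):
            if cls == 0:          # skip background
                continue
            opened = ndimage.binary_opening(label_map == cls, structure=structure)
            out[opened] = cls
        return out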

The augmented training sets were generated with varying folds, namely 5× (~400 images), 10× (~800 images) and 20× (~1600 images), excluding cases where source regions are absent from the target image. Although MouseGAN++ (Yu et al., 2022) was developed for modality imputation rather than data augmentation, we generated a 20× augmented dataset by repeated random sampling from the generative model. Due to the large size of the whole-brain images, patch-wise training was employed. Under the given patch size and batch settings, the model requires approximately 2500 iterations to process the entire training dataset once (plausibly ~80 training images at ~64 patches of 768 × 1536 each, at 2 patches per iteration, giving roughly 80 × 64 / 2 ≈ 2560 iterations), a process that takes about two days on a single GPU. Consequently, the baseline model was trained for 12 full epochs as a reference, while the proposed PRMix model was evaluated with 3, 6, and 12 full epochs for comparison. Hereafter, unless otherwise specified, all mentions of “epochs” refer to full epochs.

5.2. Experimental results

We compared PRMix with existing methods in Table 1, evaluating categorical average precision, recall, F1 and mIoU. Unless stated otherwise, all results are the average of 5-fold runs and were obtained using purely synthesized data (except for the baseline). The best score is highlighted in bold text. Although MouseGAN (Yu et al., 2022) provides an augmented dataset, it compromises intensity details, leading to the lowest precision and recall among all augmentation strategies. This performance gap largely stems from the fact that MouseGAN’s architecture is optimized for modality imputation in lower-resolution MRI. Its reliance on CycleGAN-like translation often results in ‘hallucinated’ or blurred textures when applied to high-resolution, synapse-dense microscopy data. In contrast, a simpler mixing-based method like MixUp (Zhang et al., 2018) outperforms MouseGAN, despite introducing pixel-wise ambiguity. While CutMix (Yun et al., 2019) and CarveMix (Zhang et al., 2023) improve F1 and mIoU over the baseline, they fail to capture all true pixels, resulting in similar performance to MixUp. We hypothesize that this occurs because these methods generate synthetic samples without considering spatial locations, leading to overlapping regions that make distinguishing individual objects more challenging. By contrast, our proposed PRMix significantly outperforms all compared methods across all metrics, achieving the highest F1 (74.66) and mIoU (61.78) among all methods.

Table 1. Results on the isolated test set using different augmentation strategies, which are obtained with a 5-fold average.

±: standard deviations over 5 runs. Regional results are provided in the supplementary material.

Method         F1           mIoU         Precision    Recall
Baseline       65.33±0.49   54.33±0.48   66.35±0.87   66.55±0.40
MouseGAN 20×   70.62±0.28   57.11±0.15   71.45±0.46   71.35±0.37
MixUp 20×      73.30±0.53   60.73±0.40   74.71±0.43   73.77±0.39
CutMix 20×     72.97±0.79   60.67±0.62   75.05±0.42   73.83±1.02
CarveMix 20×   73.30±0.85   60.87±0.56   74.10±1.07   74.44±0.49
PRMix 20×      74.66±0.29   61.78±0.33   75.23±0.57   75.31±0.33

Qualitative results from different methods are illustrated in Fig. 7. The top two rows highlight under-represented regions, such as UVUgr, which Baseline, MouseGAN, CutMix, and CarveMix fail to segment. These mispredictions can be attributed to insufficient training data (Baseline), severe overlaps from mixing (CutMix and CarveMix), and an inability to capture the intensity distribution (MouseGAN). The third row displays challenging cortical regions where most methods struggle. In contrast, our proposed PRMix produces results that align closely with the ground truth anatomy. The final three rows present highly ambiguous samples in the sAMY, AON, and PTLp regions. Here, MouseGAN’s result is particularly noisy due to a lack of intensity gradients, underscoring that both anatomical structure and intensity distribution are essential for accurate brain delineation.

Fig. 7.

Fig. 7

Visualization of results from different mixing methods on randomly selected testing samples. Dashed white rectangles indicate regions with the most significant discrepancies between methods. Best viewed digitally. Zoom-in images are available in the supplemental material.

To move beyond qualitative comparisons, which can be prone to selection bias, we conducted a rigorous statistical analysis. We performed a paired t-test on the F1 and mIoU scores for every test sample, comparing PRMix against each baseline and competing method. The results are detailed in Table 2. As expected, all data augmentation methods, including PRMix, significantly outperformed the baseline. More importantly, PRMix also showed statistically significant improvements over MouseGAN, MixUp, and CarveMix. Although the margin over CutMix was not statistically significant, the p-values for both F1 (0.079) and mIoU (0.057) were close to the significance threshold, and PRMix’s superior average performance is shown in Table 1.
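The test itself is standard; a minimal SciPy sketch, with placeholder scores standing in for the per-sample F1 values, looks like this:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Placeholder per-sample F1 scores for the 22 isolated test images;
    # illustrative values only, not the paper's measurements.
    f1_prmix = rng.normal(74.7, 2.0, size=22)
    f1_reference = rng.normal(73.0, 2.0, size=22)

    # One-sided paired t-test: does PRMix yield significantly higher F1
    # than the reference method on the same test samples?
    result = stats.ttest_rel(f1_prmix, f1_reference, alternative="greater")
    print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3e}")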

Table 2. Paired t-test p-values for F1 and mIoU, computed on test samples.

*P < 0.05, **P < 0.01 and ***P < 0.001. One-sided tests evaluate whether the selected method yields significantly higher F1 or mIoU than the reference. In each subgrid, the method above the horizontal line is the reference and the one below is the tested method.

Method               F1-score   p-value    mIoU      p-value
Baseline (ref)       65.33      N/A        54.33     N/A
MouseGAN 20×         70.62**    5.68e−3    57.11*    1.08e−2
MixUp 20×            73.30***   1.48e−17   60.73***  5.84e−21
CutMix 20×           72.97***   8.19e−22   60.67***  1.87e−24
CarveMix 20×         73.30***   7.21e−21   60.87***  1.87e−23
PRMix 20×            74.66***   1.72e−19   61.78***  8.69e−22
MouseGAN 20× (ref)   70.62      N/A        57.11     N/A
PRMix 20×            74.66***   4.78e−5    61.78***  2.897e−6
MixUp 20× (ref)      73.30      N/A        60.73     N/A
PRMix 20×            74.66**    9.32e−3    61.78**   7.83e−3
CutMix 20× (ref)     72.97      N/A        60.67     N/A
PRMix 20×            74.66      7.89e−2    61.78     5.65e−2
CarveMix 20× (ref)   73.30      N/A        60.87     N/A
PRMix 20×            74.66      8.35e−2    61.78*    4.61e−2

5.3. Ablations

We conducted a comprehensive ablation study to evaluate the impact of several critical factors, including dataset scale, training duration, and the specific configuration of mixed images. Additionally, we investigated the influence of fluorescent channel selection, the proportion of hard samples in the HSM module, and the individual contributions of each PRMix component. Finally, we compared the effectiveness of purely synthetic training against a hybrid refinement schedule under a consistent computational budget, and all the results are detailed in Tables 3, 4, 5 and 6.

Table 3. Ablation study on the impact of dataset scaling in augmented folds and training duration.

(a) Scaling effect

Method      Epochs   F1      mIoU
Baseline    12       65.33   54.33
PRMix 5×    12       69.82   58.14
PRMix 10×   12       70.05   58.13
PRMix 20×   12       74.66   61.78

(b) The number of training epochs

Method      Epochs   F1      mIoU
Baseline    12       65.33   54.33
PRMix 20×   3        68.53   56.75
PRMix 20×   6        69.92   57.97
PRMix 20×   12       74.66   61.78

Table 4. Ablation on the number of mixing images and fluorescent channels.

(a) The number of mixing images

Method        F1      mIoU
Baseline      65.33   54.33
PRMix Dual    74.34   –
PRMix Tri     74.66   61.78
PRMix Quad    71.03   59.50

(b) Fluorescence channels

Method          F1      mIoU
Baseline        65.33   54.33
PRMix_SAP102    69.92   57.77
PRMix_PSD95     70.00   58.44
PRMix (dual)    74.66   61.78

Table 6. Performance comparison of Synthetic (12 epochs) versus Hybrid (6 synthetic and 6 real epochs) strategies.

Method Synthetic Hybrid
F1 mIoU F1 mIoU
MouseGAN 20× 70.62 57.11 71.01 57.51
MixUp 20× 73.30 60.73 73.73 61.10
CutMix 20× 72.97 60.67 73.73 61.01
CarveMix 20× 73.30 60.87 74.56 61.42
PRMix 20× 74.66 61.78 74.94 61.91

The results in Table 3(a) highlight the benefit of extensive data augmentation. Performance steadily improves as augmentation increases from 5× to 20×, where the model achieves its highest F1 and mIoU scores, surpassing the baseline by a large margin. Table 3(b) confirms that performance also correlates with longer training. However, comparing the two factors reveals that augmentation provides a greater advantage than training duration. For example, PRMix 20× with just 3 epochs of training outperforms PRMix 5× with 12 epochs, demonstrating that exposure to more diverse samples is more critical than longer training with less diverse data.

Our investigation into the number of mixed images (Table 4(a)) indicates that three is the optimal number. Mixing two images provides minimal variation and risks overfitting, whereas mixing four disrupts the target image’s structure. We attribute this to excessive structural disruption: when too many foreign patches are introduced, the global topological coherence of the target image is compromised. This results in an anatomically implausible ‘cluttered’ canvas where key contextual cues are heavily occluded, making it difficult for the model to learn valid spatial relationships. Empirically, mixing three images achieves the best balance and the highest performance. In Table 4(b), we assess the impact of fluorescence channels. The dual-channel model significantly outperforms models trained on either the SAP102 or PSD95 channel alone, validating our methodological design. We also note that the PSD95 model slightly outperforms the SAP102 model, likely because its clearer expression in more mature synapses (Metzbower et al., 2024) helps in distinguishing regional boundaries.

Table 5(a) shows that overemphasis on ‘easy’ samples (the 20% ‘hard’ setting) leads to suboptimal performance, likely due to insufficient feature diversity. In addition, the 50%:50% configuration does introduce greater diversity but still lacks enough hard examples to support learning a more generalizable representation, which accounts for its performance gap compared with the 80%–hard setting. Table 5(b) shows that integrating primary region sampling (PRS) alone yields a noticeable performance improvement compared to the baseline. However, further adding either HSM or OAA individually leads to performance degradation, suggesting that the ‘easy’ setting alone does not benefit from OAA, and that introducing additional hard samples through HSM increases task complexity. In contrast, combining both HSM and OAA produces a substantial performance gain, underscoring their synergistic effect in enhancing dataset quality.

For practical utility, we compared the ‘Purely Synthetic’ protocol (12 epochs) against a ‘Hybrid’ schedule (6 epochs synthetic pre-training + 6 epochs real data refinement) under a consistent compute budget. The results (Table 6) indicate that while incorporating real data yields marginal gains across all methods (a 0.2–1.2 point increase in F1), PRMix maintains its consistent superiority. This confirms that PRMix generates high-quality synthetic samples that effectively complement real data in mixed training pipelines.

6. Conclusion

We curated the DMBM dataset, a high-resolution, expert-annotated collection capturing 118 brain subregions with unprecedented molecular and structural detail of synaptic organization. For the first time, we demonstrate that integrating dual-fluorescence synaptic markers enables this dataset to serve as a comprehensive benchmark for regional delineation of the mouse brain.

To further improve the accuracy of automated segmentation methods, we introduced PRMix, a novel data augmentation that enables realistic data synthesis while preserving anatomical structures, achieving fine-grained brain region delineation beyond existing methods. By combining the DMBM dataset with PRMix augmentation, our work sets a new standard for mouse brain delineation and provides a robust framework for advancing neuroscience and biomedical imaging.

The resulting automated delineator offers a more than 10-fold reduction in processing time, resolving boundary ambiguities and minimizing the inconsistencies of manual annotation. This efficiency dramatically lowers the barrier to creating large-scale, high-accuracy atlases. Looking forward, the adaptability of our framework opens critical new research avenues. While this study validated PRMix exclusively on the DMBM dataset due to the scarcity of comparable high-resolution synaptic data, the core principles of our approach, region-aware sampling and overlap minimization, are modality-agnostic. Consequently, our framework holds significant potential to generalize across diverse histological stains and species, providing a powerful tool for comparative neuroanatomy. Future work will focus on validating PRMix on external datasets and distinct imaging protocols to further establish its robustness. Ultimately, integrating this approach with other modalities, such as spatial transcriptomics and connectomics, will pave the way for a more holistic understanding of brain architecture.

Supplementary Material

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.neuroimage.2026.121881.


Acknowledgments

This work was funded by the Wellcome Trust, United Kingdom (302077/Z/23/Z, 218293/Z/19/Z, 221295/Z/20/Z); the Simons Initiative for the Developing Brain (SIDB) under the Simons Foundation for Autism Research Initiative, United Kingdom (529085); and the China Scholarship Council (202407030025 to Y.W.). We thank the Edinburgh International Data Facility (EIDF) for computational resources and data storage, and C. Davey for editing and D. Maizels for artwork.

Footnotes

CRediT authorship contribution statement

Kunhao Yuan: Writing – review & editing, Writing – original draft, Visualization, Methodology, Data curation, Conceptualization. Hanan Woods: Writing – review & editing, Data curation. Ülkü Günar: Data curation. Digin Dominic: Visualization, Resources. Ying Wu: Formal analysis. Zhen Qiu: Writing – review & editing, Formal analysis. Seth G.N. Grant: Writing – review & editing, Supervision, Resources, Funding acquisition, Conceptualization.

Ethics approval

The animal experiments underwent review by the University of Edinburgh Animal Welfare and Ethical Review Body (AWERB). They were subsequently approved (PPL PF3F251A9 – 24 July 2019 to 13 June 2024) by the UK Animals in Science Regulation Unit (ASRU) under the Animals (Scientific Procedures) Act 1986 in strict accordance with the Home Office Code of Practice. The experiments were conducted following an authorized experimental protocol endorsed by both the AWERB and the Bioresearch and Veterinary Services (BVS) department at the University of Edinburgh.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data and code used for this study have been made public at https://git-pages.ecdf.ed.ac.uk/dmbm-datasets-5c13cd/. For the purpose of open access, the author has applied a CC-BY public copyright license to any Author Accepted Manuscript version arising from this submission.

References

1. Basaran B, Zhang W, Qiao M, Kainz B, Matthews P, Bai W. LesionMix: a lesion-level data augmentation method for medical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention; 2023. p. 73–83.
2. Bulovaite E, Qiu Z, Kratschke M, Zgraj A, Fricker D, Tuck E, Gokhale R, Koniaris B, Jami S, Merino-Serrais P, et al. A brain atlas of synapse protein lifetime across the mouse lifespan. Neuron. 2022;110:4057–4073. doi:10.1016/j.neuron.2022.09.009.
3. Chartsias A, Joyce T, Papanastasiou G, Semple S, Williams M, Newby D, Dharmakumar R, Tsaftaris S. Disentangled representation learning in cardiac image analysis. Med Image Anal. 2019;58:101535. doi:10.1016/j.media.2019.101535.
4. Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning; 2020. p. 1597–1607.
5. Cizeron M, Qiu Z, Koniaris B, Gokhale R, Komiyama N, Fransén E, Grant S. A brainwide atlas of synapses across the mouse life span. Science. 2020;369:270–275. doi:10.1126/science.aba3163.
6. Curran O, Qiu Z, Smith C, Grant S. A single-synapse resolution survey of PSD95-positive synapses in twenty human brain regions. Eur J Neurosci. 2021;54:6864–6881. doi:10.1111/ejn.14846.
7. Cuthbert P, Stanford L, Coba M, Ainge J, Fink A, Opazo P, Delgado J, Komiyama N, O’Dell T, Grant S. Synapse-associated protein 102/dlgh3 couples the NMDA receptor to specific plasticity pathways and learning strategies. J Neurosci. 2007;27:2673–2682. doi:10.1523/JNEUROSCI.4457-06.2007.
8. Frid-Adar M, Diamant I, Klang E, Amitai M, Goldberger J, Greenspan H. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing. 2018;321:321–331.
9. Ghiasi G, Cui Y, Srinivas A, Qian R, Lin T, Cubuk E, Le Q, Zoph B. Simple copy-paste is a strong data augmentation method for instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021. p. 2918–2928.
10. Isensee F, Wald T, Ulrich C, Baumgartner M, Roy S, Maier-Hein K, Jaeger P. nnU-Net revisited: a call for rigorous validation in 3D medical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention; 2024. p. 488–498.
11. Kádár A, Wittmann G, Liposits Z, Fekete C. Improved method for combination of immunocytochemistry and Nissl staining. J Neurosci Methods. 2009;184:115–118. doi:10.1016/j.jneumeth.2009.07.010.
12. Koukaroudi D, Qiu Z, Fransén E, Gokhale R, Bulovaite E, Komiyama N, Seibt J, Grant S. Sleep maintains excitatory synapse diversity in the cortex and hippocampus. Curr Biol. 2024;34:3836–3843. doi:10.1016/j.cub.2024.07.032.
13. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012;25.
14. Lein E, Hawrylycz M, Ao N, Ayres M, Bensinger A, Bernard A, Boe A, Boguski M, Brockway K, Byrnes E, et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature. 2007;445:168–176. doi:10.1038/nature05453.
15. Litjens G, Kooi T, Bejnordi B, Setio A, Ciompi F, Ghafoorian M, Van Der Laak J, Van Ginneken B, Sánchez C. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88. doi:10.1016/j.media.2017.07.005.
16. Los G, Encell L, McDougall M, Hartzell D, Karassina N, Zimprich C, Wood M, Learish R, Ohana R, Urh M, et al. HaloTag: a novel protein labeling technology for cell imaging and protein analysis. ACS Chem Biol. 2008;3:373–382. doi:10.1021/cb800025k.
17. Menze B, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J, Burren Y, Porz N, Slotboom J, Wiest R, et al. The multimodal brain tumor image segmentation benchmark (BraTS). IEEE Trans Med Imaging. 2014;34:1993–2024. doi:10.1109/TMI.2014.2377694.
18. Metzbower S, Levy A, Dharmasri P, Anderson M, Blanpied T. Distinct SAP102 and PSD-95 nano-organization defines multiple types of synaptic scaffold protein domains at single synapses. J Neurosci. 2024;44. doi:10.1523/JNEUROSCI.1715-23.2024.
19. Migaud M, Charlesworth P, Dempster M, Webster L, Watabe A, Makhinson M, He Y, Ramsay M, Morris R, Morrison J, et al. Enhanced long-term potentiation and impaired learning in mice with mutant postsynaptic density-95 protein. Nature. 1998;396:433–439. doi:10.1038/24790.
20. Papp E, Leergaard T, Calabrese E, Johnson G, Bjaalie J. Waxholm Space atlas of the Sprague Dawley rat brain. Neuroimage. 2014;97:374–386. doi:10.1016/j.neuroimage.2014.04.001.
21. Paxinos G, Watson C. The Rat Brain in Stereotaxic Coordinates: Hard Cover Edition. Elsevier; 2006.
22. Perens J, Salinas C, Skytte J, Roostalu U, Dahl A, Dyrby T, Wichern F, Barkholt P, Vrang N, Jelsing J, et al. An optimized mouse brain atlas for automated mapping and quantification of neuronal activity using iDISCO+ and light sheet fluorescence microscopy. Neuroinformatics. 2021;19:433–446. doi:10.1007/s12021-020-09490-8.
23. Reaungamornrat S, Sari H, Catana C, Kamen A. Multimodal image synthesis based on disentanglement representations of anatomical and modality specific features, learned using uncooperative relativistic GAN. Med Image Anal. 2022;80:102514. doi:10.1016/j.media.2022.102514.
24. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III; 2015. p. 234–241.
25. Shorten C, Khoshgoftaar T. A survey on image data augmentation for deep learning. J Big Data. 2019;6:1–48. doi:10.1186/s40537-021-00492-0.
26. Tomas-Roca L, Qiu Z, Fransén E, Gokhale R, Bulovaite E, Price D, Komiyama N, Grant S. Developmental disruption and restoration of brain synaptome architecture in the murine Pax6 neurodevelopmental disease model. Nat Commun. 2022;13:6836. doi:10.1038/s41467-022-34131-w.
27. Velicky P, Miguel E, Michalska J, Lyudchik J, Wei D, Lin Z, Watson J, Troidl J, Beyer J, Ben-Simon Y, et al. Dense 4D nanoscale reconstruction of living brain tissue. Nat Methods. 2023;20:1256–1265. doi:10.1038/s41592-023-01936-6.
28. Vousden D, Epp J, Okuno H, Nieman B, van Eede M, Dazai J, Ragan T, Bito H, Frankland P, Lerch J, et al. Whole-brain mapping of behaviourally induced neural activation in mice. Brain Struct Funct. 2015;220:2043–2057. doi:10.1007/s00429-014-0774-0.
29. Wang T, Xing H, Li Y, Wang S, Liu L, Li F, Jing H. Deep learning-based automated segmentation of eight brain anatomical regions using head CT images in PET/CT. BMC Med Imaging. 2022;22:99. doi:10.1186/s12880-022-00807-4.
30. Wang Y, Yuan K, Schaefer G, Liu X, Jing L, Guo K, Wang J, Fang H. Refining pseudo-labels through iterative mix-up for weakly supervised semantic segmentation. Pattern Recognit. 2025:111975.
31. Wasserthal J, Breit H, Meyer M, Pradella M, Hinck D, Sauter A, Heye T, Boll D, Cyriac J, Yang S, et al. TotalSegmentator: robust segmentation of 104 anatomic structures in CT images. Radiol Artif Intell. 2023;5:e230024. doi:10.1148/ryai.230024.
32. Winding M, Pedigo B, Barnes C, Patsolic H, Park Y, Kazimiers T, Fushiki A, Andrade I, Khandelwal A, Valdes-Aleman J, et al. The connectome of an insect brain. Science. 2023;379:eadd9330. doi:10.1126/science.add9330.
33. Yu Z, Han X, Zhang S, Feng J, Peng T, Zhang X. MouseGAN++: unsupervised disentanglement and contrastive representation for multiple MRI modalities synthesis and structural segmentation of mouse brain. IEEE Trans Med Imaging. 2022;42:1197–1209. doi:10.1109/TMI.2022.3225528.
34. Yun S, Han D, Oh S, Chun S, Choe J, Yoo Y. CutMix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE/CVF International Conference on Computer Vision; 2019. p. 6023–6032.
35. Zeisel A, Hochgerner H, Lönnerberg P, Johnsson A, Memic F, Van Der Zwan J, Häring M, Braun E, Borm L, La Manno G, et al. Molecular architecture of the mouse nervous system. Cell. 2018;174:999–1014. doi:10.1016/j.cell.2018.06.021.
36. Zhang H, Cisse M, Dauphin Y, Lopez-Paz D. MixUp: beyond empirical risk minimization. 2018. arXiv:1710.09412. https://arxiv.org/abs/1710.09412.
37. Zhang X, Liu C, Ou N, Zeng X, Zhuo Z, Duan Y, Xiong X, Yu Y, Liu Z, Liu Y, et al. CarveMix: a simple data augmentation method for brain lesion segmentation. NeuroImage. 2023;271:120041. doi:10.1016/j.neuroimage.2023.120041.
38. Zhang K, Zhuang X. CycleMix: a holistic strategy for medical image segmentation from scribble supervision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2022. p. 11656–11665.
39. Zhu F, Cizeron M, Qiu Z, Benavides-Piccione R, Kopanitsa M, Skene N, Koniaris B, DeFelipe J, Fransen E, Komiyama N, et al. Architecture of the mouse brain synaptome. Neuron. 2018;99:781–799. doi:10.1016/j.neuron.2018.07.007.
40. Zhu J, Park T, Isola P, Efros A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. p. 2223–2232.
41. Zhu Q, Wang Y, Yin L, Yang J, Liao F, Li S. SelfMix: a self-adaptive data augmentation method for lesion segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention; 2022. p. 683–692.
