Abstract
Harmonizing site effects is a fundamental challenge in modern multi-site neuroimaging studies. Although many statistical models and deep learning methods have been proposed to mitigate site effects while preserving biological characteristics, harmonization schemes for multi-site resting-state functional magnetic resonance imaging (rs-fMRI), particularly for functional connectivity (FC), remain underdeveloped. Moreover, statistical models, though effective for region-level data, are inherently unsuitable for capturing the complex, nonlinear mappings required for FC harmonization. To address these issues, we develop a novel, flexible deep learning method, the Mamba-based Residual Generative adversarial network (MR-GAN), to harmonize multi-site functional connectivity. Our method leverages the Mamba Block, which has proven effective in traditional visual tasks, to define FC-specified sequential patterns and integrates them with a multi-task residual GAN to harmonize multi-site FC data. Experiments on 939 infant rs-fMRI scans from four sites demonstrate the superior performance of the proposed method in harmonization compared to other approaches.
Index Terms: Functional connectivity, Multi-site harmonization, Mamba
1. INTRODUCTION
Recent large-scale multi-site neuroimaging studies have demonstrated greater power to detect biologically relevant variability and have offered invaluable insights into the mechanisms underlying neurodevelopmental and neurodegenerative disorders [1]. However, the aggregation of neuroimaging data across different sites and scanners typically introduces non-biological variability, also known as site effects. These effects, stemming from site-related variations, can bias neuroimaging measures and distort the interpretation of clinical signals [2]. Consequently, many harmonization methods have been proposed to mitigate these unwanted site effects while preserving biological variability [3].
Drawing inspiration from image-to-image translation techniques in computer vision [4], many approaches have harmonized neuroimaging data in the image domain by synthesizing brain images across different sites [5]. However, due to the inherent complexity of neuroimaging data processing pipelines, these image-based harmonization techniques cannot guarantee that the final derived structural or functional features are free from site effects. Although some feature-based harmonization techniques have been proposed to directly mitigate site effects in derived volumetric [6], structural and functional [7, 8] magnetic resonance imaging (MRI) features, most of these methods rely on linear statistical models, such as ComBat [8] and its variants [6]. As data dimensionality increases and connections between brain regions become more complex, these methods struggle to capture subtle topological changes and effectively remove correlated noise. Moreover, when handling matrix data such as functional connectivity (FC), linear models and their parameter distribution assumptions may overlook the intricate topological structure of FC and the underlying connections between brain regions, leading to inadequate nonlinear mappings for FC. Although some deep learning methods have been applied to harmonize high-dimensional features such as cortical thickness maps [9], no learning-based FC-specified harmonization tools currently exist.
Recently, State Space Models (SSMs) [10], particularly structured state space sequence models (S4), have emerged as efficient and effective building blocks for deep networks, achieving state-of-the-art performance in continuous long-sequence data analysis. Mamba [11] enhances S4 by introducing a selective mechanism that enables the model to extract relevant information based on input data. With a hardware-aware implementation, Mamba outperforms Transformers in dense modalities such as language and genomics. State space models have also demonstrated promising results in vision tasks, such as image classification [12]. Given that image patches and features can be represented as sequences [12], these appealing properties of SSMs motivated us to explore Mamba blocks for feature extraction in FCs.
To this end, we propose a novel Mamba-based Residual Generative adversarial network (MR-GAN) for harmonizing early developing FC across sites. This model removes site-related features and uniformly maps multi-site FC to a target site space, facilitating harmonization and enabling comprehensive FC analysis in subsequent studies. Specifically, 1) drawing inspiration from State Space Sequence Models (SSMs) [11, 10] and their variants [12], a family of deep sequence models renowned for their ability to handle long sequences, we first develop an FC-specified Mamba block capable of capturing both localized fine-grained features and long-range brain region dependencies in FC. 2) We leverage an adversarial strategy [13] to enhance the disentanglement of site-related and site-unrelated features. 3) Inspired by Cycle-GAN [4], we develop a cycle-consistent scheme to enforce consistency and employ a multi-task discriminator to ensure the authenticity and precision of the generated FC. Experiments on 939 infant rs-fMRI scans from four sites demonstrate the superior performance of the proposed method in harmonization compared to other approaches.
2. METHODS
2.1. Preliminaries and FC-specified Mamba Block
State Space Sequence Models (SSMs) [10] are a class of systems designed to process 1-D functions or sequences and can be represented by the following linear Ordinary Differential Equation (ODE):
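$$h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t), \tag{1}$$
where the state matrix $A \in \mathbb{R}^{N \times N}$ and $B \in \mathbb{R}^{N \times 1}$, $C \in \mathbb{R}^{1 \times N}$ are its parameters, $x(t)$ and $y(t)$ are the input and output, and $h(t) \in \mathbb{R}^{N}$ denotes the implicit latent state.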
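To make the state-space formulation concrete, the following minimal sketch discretizes the standard SSM form h'(t) = A h(t) + B x(t), y(t) = C h(t) with a zero-order hold and runs it as a recurrence over a sequence. It is an illustration only, not the Mamba implementation: the diagonal state matrix, the fixed step size `delta`, and all function names are assumptions made here for simplicity (Mamba additionally makes the parameters input-dependent).

```python
import numpy as np

def discretize_zoh(a_diag, b, delta):
    """Zero-order-hold discretization for a diagonal state matrix (assumed)."""
    a_bar = np.exp(delta * a_diag)          # exp(delta * A) for diagonal A
    b_bar = (a_bar - 1.0) / a_diag * b      # A^{-1} (exp(delta * A) - I) B
    return a_bar, b_bar

def ssm_scan(a_diag, b, c, x, delta):
    """Run h_k = a_bar * h_{k-1} + b_bar * x_k, y_k = c . h_k over a 1-D sequence."""
    a_bar, b_bar = discretize_zoh(a_diag, b, delta)
    h = np.zeros_like(a_diag)
    y = np.empty(len(x))
    for k, x_k in enumerate(x):
        h = a_bar * h + b_bar * x_k
        y[k] = c @ h
    return y

# Toy example: a stable 3-state system driven by a constant input.
a_diag = np.array([-1.0, -2.0, -4.0])
b = np.ones(3)
c = np.array([0.5, 0.3, 0.2])
x = np.ones(200)
y = ssm_scan(a_diag, b, c, x, delta=0.1)

# For a constant unit input, the continuous system settles at -C A^{-1} B.
steady_state = -(c * b / a_diag).sum()
print(y[-1], steady_state)
```

With a stable system and a long enough sequence, the last output of the recurrence approaches the continuous-time steady state, which is a quick sanity check on the discretization.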
Recently, Mamba [11] has further advanced SSMs in discrete data modeling (e.g., text and genome). By merging SSM blocks with linear layers, the Mamba architecture demonstrates state-of-the-art performance on a variety of long-sequence domains, with significant computational efficiency during both training and inference.
To facilitate feature extraction and representation learning of FCs, we introduce the FC-specified Mamba Block, inspired by [11, 12]. As shown in Fig. 1, we develop an FC-specified sorting strategy to capture both localized fine-grained features and long-range dependencies between brain regions. The FC matrix x is partitioned into n row-wise patches to preserve the FC map of each brain region. We define connection strengths above 0.5 as strong functional connections and compute, at the group level, the number of such connections between each brain region and all others. These counts are then arranged in descending order to create a new brain region sequence s. The brain region sequence features sorted by the Sort Block are fed into the Bidirectional Vision Mamba block [12], which processes the sequence in both forward and backward directions to capture richer features from the 2-dimensional data. During the decoder stage, the features are re-sorted into their original order by the Re-Sort Block. Both the encoder and the decoder use the Mamba block as their backbone.
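The sorting and re-sorting steps can be illustrated with a short sketch. This is a hypothetical reading of the description rather than the authors' code: it assumes that "group level" means thresholding the group-mean FC matrix, that self-connections are ignored, and that only the row-wise patches are permuted. The function names are made up for this example.

```python
import numpy as np

def region_order(fc_group, threshold=0.5):
    """Order regions by their number of strong connections, descending.

    fc_group: array of shape (n_subjects, n_regions, n_regions).
    """
    mean_fc = fc_group.mean(axis=0)          # assumed group-level summary
    strong = mean_fc > threshold
    np.fill_diagonal(strong, False)          # ignore self-connections
    counts = strong.sum(axis=1)
    # Stable sort so that ties keep their original region order.
    return np.argsort(-counts, kind="stable")

def sort_rows(fc, order):
    """Sort Block: reorder the row-wise patches of one FC matrix."""
    return fc[order]

def resort_rows(sorted_fc, order):
    """Re-Sort Block: undo the permutation applied by sort_rows."""
    inverse = np.argsort(order)              # inverse permutation
    return sorted_fc[inverse]

# Toy group of two identical 4-region FC matrices.
base = np.array([
    [1.0, 0.6, 0.2, 0.1],
    [0.6, 1.0, 0.7, 0.8],
    [0.2, 0.7, 1.0, 0.3],
    [0.1, 0.8, 0.3, 1.0],
])
fc_group = np.stack([base, base])
order = region_order(fc_group)
restored = resort_rows(sort_rows(base, order), order)
print(order)
```

In this toy example, region 1 has three strong connections and the others have one each, so region 1 moves to the front of the sequence and the remaining regions keep their relative order.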
Fig. 1.

The framework of the proposed Mamba-based Residual Generative Adversarial Network (MR-GAN). The FC input encoder extracts latent features and employs the Adversarial Disentangled Strategy to distinguish site-specific from non-site features. These features are then merged with the target site features to generate FC residuals, which are added to the original FC to achieve harmonization. The discriminator and cycle-consistent strategies enhance the precision and robustness of the generated outputs.
2.2. Adversarial Disentangled Strategy
To detect and remove site effects from multi-site FC, we introduce an adversarial disentangled strategy [13]. As shown in Fig. 1, for the latent feature z obtained from the input s across M sites, let $z = [z_t, z_i]$, where $z_t$ is a 1 × M vector of classification probabilities and is therefore site-related, and $z_i$ contains the remaining subject-specific and site-invariant features. We use an adversarial excitation and inhibition method with the following loss terms:
| (2) |
| (3) |
where the two terms use site classifiers operating on the latent variables $z_t$ and $z_i$, respectively, and a one-hot vector encodes the ground-truth site label. We apply a multi-class binary cross-entropy loss to align $z_t$ with the correct site label as in Eq. 2, while an inhibition process ensures that $z_i$ remains site-unrelated, with the loss shown in Eq. 3. This approach allows us to extract non-site features, which are then combined with site-specific features in the next step and passed to the decoder to generate the target FC.
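As a rough illustration of the excitation and inhibition idea, the sketch below computes three quantities on toy data: a multi-class binary cross-entropy that aligns the site-related part of the latent with the one-hot site label, a cross-entropy for a site classifier on the remaining part, and an encoder-side term that pushes that classifier toward a uniform prediction. The exact loss forms, the split of a 384-dimensional latent into 4 + 380 entries, and all names are assumptions for illustration, not the formulation of Eqs. 2 and 3.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax(v):
    e = np.exp(v - v.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def excitation_loss(z_t, y_onehot, eps=1e-7):
    """Multi-class binary cross-entropy between sigmoid(z_t) and the site label."""
    p = np.clip(sigmoid(z_t), eps, 1.0 - eps)
    return -np.mean(y_onehot * np.log(p) + (1.0 - y_onehot) * np.log(1.0 - p))

def classifier_loss(logits_i, y_onehot, eps=1e-7):
    """Cross-entropy used to train the site classifier that reads z_i."""
    p = np.clip(softmax(logits_i), eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=-1))

def inhibition_loss(logits_i, eps=1e-7):
    """Assumed encoder-side term: cross-entropy against a uniform site target,
    minimized when the classifier cannot tell the sites apart from z_i."""
    p = np.clip(softmax(logits_i), eps, 1.0)
    n_sites = p.shape[-1]
    return -np.mean(np.sum(np.log(p) / n_sites, axis=-1))

rng = np.random.default_rng(0)
site = np.array([0, 1, 2, 3])
y = np.eye(4)[site]                          # one-hot ground-truth site labels
z = rng.standard_normal((4, 384))            # toy latent features
z_t, z_i = z[:, :4], z[:, 4:]                # assumed split: 4 site-related + 380 others
W_i = 0.01 * rng.standard_normal((380, 4))   # hypothetical classifier weights
logits_i = z_i @ W_i
print(excitation_loss(z_t, y), classifier_loss(logits_i, y), inhibition_loss(logits_i))
```

In an adversarial setup of this kind, the classifier would be updated to reduce `classifier_loss` while the encoder is updated to reduce `inhibition_loss`, so that site information is driven out of the non-site features.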
2.3. Cycle-Consistent Residual GAN
We develop a Cycle-Consistent Residual GAN to ensure that the generated FC contains the target site features and closely matches the ground truth, as illustrated in Fig. 1. For an FC from site a, we generate its residual FC using the encoder E and decoder D; this residual is then added to the original FC to obtain the harmonized FC in target site b. After that, we perform a reverse mapping to predict the original input as follows:
| (4) |
We then impose the cycle-consistency loss to ensure that the reconstructed FC is a precise representation of the original FC and that the latent non-site feature of the forward mapping agrees with that of the reverse mapping:
| (5) |
In addition, we use a multi-task discriminator to ensure the authenticity and precision of the generated FC. We label all ground-truth FC as real and the generated FC as fake to train the discriminator. The discriminator also performs site classification to ensure that the generated FC preserves the target site's information.
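The residual generation and cycle-consistency steps of this section can be sketched as follows. The linear maps standing in for the Mamba encoder E and decoder D, the one-hot site code, the L1 form of the loss, and every name in the block are assumptions for illustration; the discriminator is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ROI, LATENT, N_SITES = 6, 8, 4

# Hypothetical stand-ins for the encoder E and decoder D: plain linear maps.
W_enc = rng.normal(scale=0.1, size=(N_ROI * N_ROI, LATENT))
W_dec = rng.normal(scale=0.1, size=(LATENT + N_SITES, N_ROI * N_ROI))

def encode(fc):
    """Map an FC matrix to a (toy) non-site latent feature."""
    return fc.reshape(-1) @ W_enc

def decode_residual(z_i, site):
    """Merge the latent with a one-hot target-site code and decode an FC residual."""
    site_code = np.eye(N_SITES)[site]
    return (np.concatenate([z_i, site_code]) @ W_dec).reshape(N_ROI, N_ROI)

def harmonize(fc, target_site):
    """Residual generation: harmonized FC = original FC + decoded residual."""
    z_i = encode(fc)
    return fc + decode_residual(z_i, target_site), z_i

def cycle_loss(fc_a, site_a, site_b):
    """Assumed L1 cycle-consistency on both the FC and the non-site latent."""
    fc_b_hat, z_fwd = harmonize(fc_a, site_b)       # forward mapping a -> b
    fc_a_rec, z_rev = harmonize(fc_b_hat, site_a)   # reverse mapping b -> a
    return np.abs(fc_a - fc_a_rec).mean() + np.abs(z_fwd - z_rev).mean()

fc_a = rng.uniform(-1.0, 1.0, size=(N_ROI, N_ROI))
loss = cycle_loss(fc_a, site_a=0, site_b=1)
print(loss)
```

One consequence of the residual design is visible here: if the decoder outputs a zero residual, the harmonized FC equals the input and the cycle loss vanishes, so the network only has to learn the site-specific correction rather than the whole FC matrix.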
3. EXPERIMENTS
3.1. Datasets and Implementation Details
We evaluated our method on four infant datasets: the UNC/UMN Baby Connectome Project (BCP) [14], the Multi-visit Advanced Pediatric Brain Imaging Study (MAP) [15], the Calgary study [16], and the Pixar study [17]. Detailed information is provided in Table 1. For infant structural and functional MRI processing, we followed the methodology detailed in [18] to extract the fMRI time series for each vertex on the middle cortical surfaces; all time series within each cortical region were then averaged based on the parcellation in [19]. We selected 200 cortical regions per hemisphere to construct the functional connectivity matrix by computing Pearson's correlation coefficient between the time series of each pair of regions of interest (ROIs), followed by Fisher's r-to-z transformation.
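For readers unfamiliar with this step, the construction of an FC matrix from regional time series can be sketched in a few lines. This is a generic illustration on random data, not the authors' pipeline; zeroing the diagonal and clipping before the transform are choices made here to keep the result finite.

```python
import numpy as np

def functional_connectivity(ts):
    """Pearson correlation between ROI time series, then Fisher r-to-z.

    ts: array of shape (n_rois, n_timepoints), one averaged time series per ROI.
    """
    r = np.corrcoef(ts)                      # (n_rois, n_rois) Pearson correlations
    np.fill_diagonal(r, 0.0)                 # avoid arctanh(1) = inf on the diagonal
    r = np.clip(r, -0.999999, 0.999999)      # guard against perfectly correlated ROIs
    return np.arctanh(r)                     # Fisher r-to-z transformation

rng = np.random.default_rng(0)
ts = rng.standard_normal((5, 120))           # toy data: 5 ROIs, 120 time points
fc = functional_connectivity(ts)
print(fc.shape)
```

The resulting matrix is symmetric, and applying `np.tanh` to its off-diagonal entries recovers the original correlation coefficients.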
Table 1.
Data information of four infant datasets.
| Dataset | Subject Number (Males) | Total Scans | Age Range (Months) |
|---|---|---|---|
| BCP | 305 (143) | 661 | 0-73 |
| MAP | 79 (39) | 328 | 0-75 |
| Calgary | 67 (37) | 156 | 31-85 |
| Pixar | 61 (27) | 61 | 42-72 |
The encoder, decoder, and discriminator all employ the Vision Mamba (Vim) structure [12]. We adopt the Vim-base model, consisting of 24 encoder blocks, 8 decoder blocks, a 384-dimensional encoder embedding, and a 256-dimensional decoder embedding. Additionally, a masked autoencoder [20] is employed as a pretraining strategy to enhance the feature-capturing ability of the Mamba block. For the discriminator's identification task and the site classification task, we use fully connected layers with dimensions of (256, 2) and (256, 4), respectively. During the disentanglement process, a simple sigmoid layer outputs the probability of $z_t$, while a fully connected layer of dimension (380, 4) is applied to $z_i$ to adversarially train the encoder to extract site-unrelated features. We set the BCP dataset as the target dataset, with all four datasets mapped to the BCP domain for harmonization.
3.2. Validation on Removing Site Effects
To validate whether site effects are successfully removed while the essential non-site FC features are preserved, we present the population-level developmental trajectories of functional networks in Fig. 2. As a baseline, we apply the ComBat algorithm with age as a covariate and compare its results with those of MR-GAN. For each subject, we calculate the intra- and inter-network FC strength among 17 functional networks by averaging the FC connections belonging to the corresponding functional networks. We selected the intra-network connections of Visual Central (VisCent), Somatomotor A (SomMotA), and Dorsal Attention A (DorsAttnA) as representative connections to demonstrate the population-level developmental trajectory. As shown in Fig. 2, the linear statistical model, ComBat, struggles to handle extreme values and outliers, often diminishing the overall strength of functional connections. In contrast, MR-GAN achieves better harmonization while preserving both the general trajectory trends and the unique characteristics of individual trajectories.
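The network-level averaging described above can be sketched as follows. The function name, the integer label array assigning each ROI to a network, and the exclusion of self-connections from intra-network averages are assumptions made for this example.

```python
import numpy as np

def network_strength(fc, labels, n_networks):
    """Average FC within and between networks.

    fc: (n_rois, n_rois) connectivity matrix.
    labels: (n_rois,) integer network index of each ROI.
    Returns an (n_networks, n_networks) matrix; the diagonal holds
    intra-network strength and off-diagonal entries inter-network strength.
    """
    out = np.full((n_networks, n_networks), np.nan)
    for p in range(n_networks):
        for q in range(n_networks):
            block = fc[np.ix_(labels == p, labels == q)]
            if p == q:
                # Drop self-connections when averaging within a network.
                block = block[~np.eye(block.shape[0], dtype=bool)]
            if block.size:
                out[p, q] = block.mean()
    return out

# Toy example: 4 ROIs split into 2 networks.
fc = np.array([
    [0.0, 0.8, 0.1, 0.2],
    [0.8, 0.0, 0.3, 0.4],
    [0.1, 0.3, 0.0, 0.6],
    [0.2, 0.4, 0.6, 0.0],
])
labels = np.array([0, 0, 1, 1])
strength = network_strength(fc, labels, n_networks=2)
print(strength)
```

With 17 networks, the diagonal of the returned matrix would give one intra-network value per network and subject, which is the kind of quantity plotted against age in a developmental trajectory.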
Fig. 2.

The visualization of the functional network developmental trajectories. Intra-network connections of VisCent, SomMotA, and DorsAttnA were selected as the representatives. Solid lines: growth chart; shaded area: 95% confidence intervals.
3.3. Validation on Downstream Tasks
We further investigate the impact of different harmonization methods on downstream tasks. We use Brain Connectivity based Graph Convolutional Networks (BC-GCN) [21], a deep learning model designed specifically for FC-based age prediction, to predict the scan age. Given the significant differences in age distributions across the four datasets, we designed four age prediction tasks: (1) using BCP as the training set and MAP as the test set (0–800 days); (2) using MAP as the training set and BCP as the test set (0–800 days); (3) using MAP and Pixar as the training set and Calgary as the test set (800–1600 days); (4) using Calgary as the training set and MAP and Pixar as the test set (800–1600 days). We used the mean absolute error (MAE) to evaluate performance. The results, presented in Table 2, indicate that ComBat does not consistently improve prediction accuracy over non-harmonized data: it yields small gains on Tasks 1 and 4 but increases the MAE on Tasks 2 and 3. This is likely due to the limitations of linear statistical models in preserving the detailed information and intricate connection structures in two-dimensional FC data, so that such harmonization may lose many local features. In contrast, MR-GAN (the Mamba row in Table 2) achieves the lowest MAE on all four tasks, outperforming both the non-harmonized data and ComBat-based harmonization.
Table 2.
Experimental results of age prediction by BC-GCN in terms of MAE (days).
| Method | Task 1 | Task 2 | Task 3 | Task 4 |
|---|---|---|---|---|
| Raw | 129.77 | 103.22 | 141.85 | 186.71 |
| ComBat | 127.07 | 137.96 | 147.80 | 185.27 |
| Mamba | 123.07 | 98.20 | 135.63 | 183.28 |
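To help read Table 2, the sketch below defines the MAE metric and computes the relative change in MAE of each harmonization method with respect to the raw data. The numbers are copied from Table 2; the helper name `mae` and the dictionary layout are choices made for this example.

```python
import numpy as np

def mae(pred, target):
    """Mean absolute error between predicted and true scan ages."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))

# MAE in days for Tasks 1-4, copied from Table 2.
mae_days = {
    "Raw":    [129.77, 103.22, 141.85, 186.71],
    "ComBat": [127.07, 137.96, 147.80, 185.27],
    "Mamba":  [123.07, 98.20, 135.63, 183.28],
}

raw = np.array(mae_days["Raw"])
rel_change = {}
for name in ("ComBat", "Mamba"):
    # Negative values mean a lower (better) MAE than the raw data.
    rel_change[name] = 100.0 * (np.array(mae_days[name]) - raw) / raw
    print(name, np.round(rel_change[name], 2))
```

A negative percentage indicates an improvement over the non-harmonized data, which makes it easy to see on which tasks each method helps.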
4. CONCLUSION
In this study, we propose a novel Mamba-based Residual Generative Adversarial Network (MR-GAN) to harmonize multi-site infant brain functional connectivity (FC). Unlike previous harmonization methods, MR-GAN includes an FC-specified Mamba block designed to preserve detailed features of FC. The Mamba Block captures the relational characteristics between brain regions in FC and supports subsequent FC generation. To the best of our knowledge, this is the first application of the Mamba structure in the FC field. Additionally, we employ an adversarial disentanglement strategy to separate site-related and non-site features and use the Cycle-GAN structure to ensure that the generated FCs contain the target site features and closely resemble the real FC. The promising results across four infant datasets underscore the superiority of our model in harmonizing functional connectivity during infancy.
5. COMPLIANCE WITH ETHICAL STANDARDS
This research study was conducted retrospectively using human subject data made available in open access by UNC/UMN Baby Connectome Project [14], Calgary study [16] and Pixar study [17]. Ethical approval was not required as confirmed by the license attached with the open access data. All data acquired for the UNC Multi-visit Advanced Pediatric (MAP) [15] brain imaging study have been approved and regulated by the UNC IRB.
6. ACKNOWLEDGMENTS
This work was supported in part by NIH grants (MH123202 and NS135574).
7. REFERENCES
- [1] Hazlett HC, Gu H, Munsell BC, et al., “Early brain development in infants at high risk for autism spectrum disorder,” Nature, vol. 542, no. 7641, pp. 348–351, 2017.
- [2] Shinohara RT, Oh J, Nair G, et al., “Volumetric analysis from a harmonized multisite brain mri study of a single subject with multiple sclerosis,” American Journal of Neuroradiology, vol. 38, no. 8, pp. 1501–1509, 2017.
- [3] Solanes A, Gosling CJ, Fortea L, et al., “Removing the effects of the site in brain imaging machine-learning–measurement and extendable benchmark,” NeuroImage, vol. 265, pp. 119800, 2023.
- [4] Zhu J, Park T, Isola P, et al., “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in ICCV, 2017, pp. 2223–2232.
- [5] Eshaghzadeh Torbati M, Minhas DS, Tafti AP, et al., “Espa: An unsupervised harmonization framework via enhanced structure preserving augmentation,” in MICCAI. Springer, 2024, pp. 184–194.
- [6] Torbati ME, Minhas DS, Ahmad G, et al., “A multiscanner neuroimaging data harmonization using ravel and combat,” Neuroimage, vol. 245, pp. 118703, 2021.
- [7] Yamashita A, Yahata N, Itahashi T, et al., “Harmonization of resting-state functional mri data across multiple imaging sites via the separation of site differences into sampling bias and measurement bias,” PLoS biology, vol. 17, no. 4, pp. e3000042, 2019.
- [8] Fortin J, Cullen N, Sheline YI, et al., “Harmonization of cortical thickness measurements across scanners and sites,” Neuroimage, vol. 167, pp. 104–120, 2018.
- [9] Zhao F, Wu Z, Wang L, et al., “Harmonization of infant cortical thickness using surface-to-surface cycle-consistent adversarial networks,” in MICCAI. Springer, 2019, pp. 475–483.
- [10] Gu A, Modeling Sequences with Structured State Spaces, Stanford University, 2023.
- [11] Gu A and Dao T, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2023.
- [12] Zhu L, Liao B, Zhang Q, et al., “Vision mamba: Efficient visual representation learning with bidirectional state space model,” arXiv preprint arXiv:2401.09417, 2024.
- [13] Ding Z, Xu Y, Xu W, et al., “Guided variational autoencoder for disentanglement learning,” in CVPR, 2020, pp. 7920–7929.
- [14] Howell BR, Styner MA, Gao W, et al., “The unc/umn baby connectome project (bcp): An overview of the study design and protocol development,” NeuroImage, vol. 185, pp. 891–905, 2019.
- [15] Yin W, Chen M, Hung S, et al., “Brain functional development separates into three distinct time periods in the first two years of life,” NeuroImage, vol. 189, pp. 715–726, 2019.
- [16] Reynolds JE, Long X, Paniukov D, et al., “Calgary preschool magnetic resonance imaging (mri) dataset,” Data in brief, vol. 29, pp. 105224, 2020.
- [17] Richardson H, Lisandrelli G, Riobueno-Naylor A, et al., “Development of the social brain from age three to twelve years,” Nature communications, vol. 9, no. 1, pp. 1027, 2018.
- [18] Hu D, Wang F, Zhang H, et al., “Existence of functional connectome fingerprint during infancy and its stability over months,” Journal of Neuroscience, vol. 42, no. 3, pp. 377–389, 2022.
- [19] Schaefer A, Kong R, Gordon EM, et al., “Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity mri,” Cerebral cortex, vol. 28, no. 9, pp. 3095–3114, 2018.
- [20] He K, Chen X, Xie S, et al., “Masked autoencoders are scalable vision learners,” in CVPR, 2022, pp. 16000–16009.
- [21] Li Y, Zhang X, Nie J, et al., “Brain connectivity based graph convolutional networks and its application to infant age prediction,” IEEE transactions on medical imaging, vol. 41, no. 10, pp. 2764–2776, 2022.
