Abstract
As the practice of aggregating multi-site neuroimaging data has become more common, the field of neuroscience has increasingly recognized the importance of harmonization, or the removal of scanner effects from brain imaging data. While many harmonization methods exist, like ComBat and CovBat, few explicitly incorporate the network structure of the brain. Researchers studying structural connectivity are therefore not guaranteed to model the true underlying brain network. This study offers a new harmonization method, called NetBat, which was designed to incorporate network parameters from the weighted stochastic block model (WSBM) as covariates in the popular ComBat harmonization method. NetBat is demonstrated through analysis of eighteen neurotypical individuals each scanned on four MRI scanners. Results suggest that under tested circumstances NetBat provides more accurate overall harmonization and better retention of network structure than competing methods.
Keywords: Network harmonization, Structural connectivity, Multi-scan studies
1. Introduction
The brain is understood to be a modular entity (Grossberg, 2000; Bassett and Bullmore, 2006; Bassett and Sporns, 2017), and its regions therefore exhibit structural relationships that can be modeled with a network. This network architecture, which describes the topological structure of the brain, is often referred to as the structural connectivity of the brain (Bassett et al., 2011). Structural connectivity has been studied in a number of empirical contexts, including: relationships between structure and function (Damoiseaux and Greicius, 2009; Bullmore and Sporns, 2009); behavioral outcomes (Farooq et al., 2019); the brain in the resting state (Rosazza and Minati, 2011); brain dysfunction (Sharp et al., 2014); and brain development (Faskowitz et al., 2018). Structural images can be captured with methods like magnetic resonance imaging (MRI).
Many of the recent studies on structural connectivity have made use of multi-site online databases (Duan et al., 2022), which are easily accessible and have the advantage of increasing the power and generalizability of the study. However, neuroimage acquisition is known to be a complex process and is subject to numerous sources of unwanted biological and technical variability, including individual differences in brain size (O’Brien et al., 2011), head movement during scanning (Hedges et al., 2022), and the noise inherent in the scanner (Chen et al., 2014). The major disadvantage of multisite aggregation is the introduction of this last source, scanner noise, which risks obscuring potential network relationships. The process of removing these so-called “scanner effects” from neuroimaging data is called harmonization (Pinto et al., 2020). Harmonization is particularly relevant to studies in which data from different sites are collected, and the goal is to aggregate across a large number of datasets. Harmonization is also relevant in studies where participants are scanned multiple times on different scanners and has been shown to be helpful in reducing error in later analyses (Jovicich et al., 2013).
Harmonization can be completed at the neuroimage summary measure (or outcome) level or at the voxel level. Examples of outcome harmonization methods include human-phantom-based harmonization (Pohl et al., 2016), hardware-phantom-based harmonization (Timmermans et al., 2019), and global scaling (Fortin et al., 2017). Alternatively, voxel-wise harmonization methods include removal of the artificial voxel effect by linear regression (RAVEL; Fortin et al., 2016), Multi-scanner Image harmonization via Structure Preserving Embedding Learning (MISPEL; Torbati et al., 2023), and Contrast Anatomy Learning and Analysis for MR Intensity Translation and Integration (CALAMITI; Zuo et al., 2021).
ComBat (Fortin et al., 2018; Johnson et al., 2007), an empirical Bayesian approach, is one of the most widely implemented outcome-level harmonization methods. ComBat first estimates the amount of scanner noise present between and within each scanner using the data, and then uses those estimates to harmonize the data. Research evaluating ComBat suggests that it outperforms other methods, such as RAVEL (Torbati et al., 2021).
However, the danger in harmonizing neuroimaging data is that researchers may remove important relationships present in the data, or even suggest relationships that were not there previously. The propensity of methods like ComBat to remove expected biological variability in addition to technical variability has been acknowledged, and some have applied machine learning algorithms to account for this (Liu et al., 2023). To further complicate matters, methods like ComBat account for differences in scanners when harmonizing region-specific data but do not account for the interrelationships between regions that are likely present. ComBat, therefore, does not directly account for network structure among the regions of the brain.
Fig. 1 illustrates heat maps of structural networks at differing stages of harmonization. Rows and columns in each matrix correspond to regions of the brain, and each pixel is the correlation between four scans of the same participant’s brain. No obvious relationships between regions are present in Fig. 1a. However, after incorporating cluster information in Fig. 1b, a structural network becomes clear: five clusters, of differing sizes, are evident as blue blocks on the diagonal. This structural network is less clear after ComBat harmonization, as seen in Fig. 1c. While important in removing scanner noise, ComBat has also removed evidence for structural relationships between regions.
Fig. 1.

Heat maps of one participant’s structural connectivity matrix: (a) before ComBat harmonization and without clustering; (b) with clustering, but before ComBat harmonization; and (c) after ComBat harmonization and after clustering. The rows and columns in (b) and (c) are ordered by cluster labels, and consistent between (b) and (c). Clusters were determined by stochastic block model (SBM). Correlations were computed on four scans of the same brain using different scanners. Blue pixels represent high positive correlation, red pixels represent high negative correlation, and white pixels represent zero correlation. Note that applying ComBat removes strength of clusters and fails to maintain negative correlations.
Several recent studies have attempted to evaluate or solve this problem. Onicas et al. (2022) assessed the relative benefit of ComBat harmonizing dMRI data by comparing network structure before and after harmonization, as well as conducting two different harmonization approaches, which they call “matrix harmonization” and “parameter harmonization”. In both cases, harmonization was conducted on network information, rather than the original data itself. Chen et al. (2021) offer CovBat, a ComBat variant developed with the intention of accounting for the covariance between brain regions. In its formulation, CovBat accounts for dyadic relationships among regions, but does not directly model network effects among regions beyond pairwise similarities.
We offer NetBat as an extension of ComBat that comprises two steps: first, the structural network is estimated from multiple MRI scans from the same individual; and second, model parameter estimates from the network are included in the ComBat model as covariates. The goal of the present work is to demonstrate the functionality of NetBat and compare the results of its harmonization process with other harmonization methods. Thus, the results of NetBat are compared not only with the unharmonized data, but also with the results of ComBat and CovBat applied to the same dataset. ComBat and CovBat were chosen because they are two representative alternatives to NetBat that seek to accomplish similar goals. The success of these methods was assessed at two levels: (1) the differences between scanners before and after harmonization; and (2) the retention of network structure post-harmonization. Our results suggest that in several conditions NetBat outperforms ComBat and CovBat, and that further research can explicate the relative strengths and weaknesses of these methods.
2. Method
2.1. Participants and data acquisition
The sample utilized in this study comprised 18 participants. The median age of the participants was 72 years (range: 51–78 years), with 44% (N = 8) being males. All participants demonstrated cognitively unimpaired status. Ten participants exhibited a high degree of small vessel disease (SVD) as previously defined (Wilcock et al., 2021), while the remaining participants had a low degree of SVD. T1-weighted (T1-w) images were obtained for each participant using four different 3T scanners: General Electric (GE), Philips, Siemens Prisma (Prisma), and Siemens Trio (Trio). Detailed information on scanner specifications can be found in Table 1. For each participant, images were acquired on all scanners, with at most four months between scans. During this period, no biological changes in the brain were assumed to occur, thus any observed differences between scans were attributed solely to scanner effects. We utilized FreeSurfer 7.1.1 (Fischl, 2012) to analyze the images obtained from the collected sample, extracting regional measurements of cortical thickness and volume. In this manuscript, these measurements are denoted as raw, representing the unharmonized data. Analyses were conducted on the cortical thickness and volume measures of 68 brain regions (34 regions in one hemisphere and their corresponding region in the other hemisphere).
Table 1.
Scanner specifications.
| Scanner name | GE | Philips | Prisma | Trio |
|---|---|---|---|---|
| Manufacturer | General Electric | Philips | Siemens | Siemens |
| Scanner Hardware | DISCOVERY-MR750w 3T | Achieva-dStream 3T | Prisma-fit 3T | TrioTim 3T |
| Scanner Software | 27-LX-MR-Software-release: DV26.0-R03-1831.b | 5.6.1–5.6.1.0 | syngo-MR-E11 | syngo-MR-B17 |
| Receive Coil | 32Ch-Head | MULTI-COIL | BC | 32Ch-Head |
| T1-w Sequence Type | BRAVO | ME-MPRAGE | ME-MPRAGE | ME-MPRAGE |
| Resolution (mm) | 1.0×1.0×0.5 | 1.0×1.0×1.0 | 1.0×1.0×1.0 | 1.0×1.0×1.0 |
| TE/ΔTE (ms) | 3.7 | 1.66/1.9 | 1.64/1.86 | 1.64/1.86 |
| TR (ms) | 9500 | 2530 | 2530 | 2530 |
| TI (ms) | 600 | 1300 | 1100 | 1200 |
2.2. ComBat harmonization
ComBat (Johnson et al., 2007) is a harmonization method that empirically estimates and removes scanner noise from neuroimaging data (in the current study, regional measurements of cortical thickness and volume). First, the data are modeled as a linear combination of biological variables of interest and additive and multiplicative scanner effects, which are estimated by empirical Bayes. Data are harmonized across site/scanner $i$ for individual $j$ and region $v$ in the following way:
$$y_{ijv} = \alpha_v + X_{ij}^{\top}\beta_v + \gamma_{iv} + \delta_{iv}\varepsilon_{ijv} \tag{1}$$
where $y_{ijv}$ denotes the structural measurement (thickness or volume) of site $i$, person $j$, and region $v$; $\alpha_v$ denotes the average thickness or volume value across all subjects and scanners; $X_{ij}$ denotes the vector of biological variables in the model for region $v$; $\beta_v$ denotes a vector of regression coefficients for each biological variable in $X_{ij}$; $\gamma_{iv}$ denotes the additive term for scanner $i$ and region $v$; and $\delta_{iv}$ denotes the multiplicative term for scanner $i$ and region $v$. Here $\varepsilon_{ijv}$ is assumed to be independent and normally distributed, with mean 0 and variance $\sigma_v^2$.
Next, the model-based estimates $\hat{\alpha}_v$, $\hat{\beta}_v$, $\gamma_{iv}^{*}$, and $\delta_{iv}^{*}$ are used to harmonize the data as follows:
$$y_{ijv}^{\mathrm{ComBat}} = \frac{y_{ijv} - \hat{\alpha}_v - X_{ij}^{\top}\hat{\beta}_v - \gamma_{iv}^{*}}{\delta_{iv}^{*}} + \hat{\alpha}_v + X_{ij}^{\top}\hat{\beta}_v \tag{2}$$
where $y_{ijv}^{\mathrm{ComBat}}$ denotes the newly harmonized values. These values can then be used in further analyses. It is noteworthy, however, that ComBat does not model relationships between regions, thereby ignoring the network structure of the brain. There is no guarantee, then, that the process of harmonizing data via ComBat preserves network structure.
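As a rough illustration of the location-and-scale adjustment described above, the following Python sketch harmonizes a single region across scanners. It is a deliberately simplified stand-in rather than ComBat itself: it uses plain moment estimates, omits the biological covariates, and applies no empirical Bayes shrinkage. The function name `location_scale_harmonize` and the toy values are hypothetical.

```python
from statistics import fmean, pstdev


def location_scale_harmonize(values_by_scanner):
    """Harmonize one region's measurements across scanners.

    values_by_scanner maps a scanner name to a list of measurements
    (one per participant). Assumption: moment estimates only, with no
    covariates and no empirical Bayes shrinkage, unlike full ComBat.
    Each scanner needs non-constant values (otherwise division by zero).
    """
    all_values = [y for ys in values_by_scanner.values() for y in ys]
    alpha = fmean(all_values)  # grand mean for the region

    # Additive scanner effect: scanner mean minus grand mean.
    gamma = {s: fmean(ys) - alpha for s, ys in values_by_scanner.items()}

    # Residuals after removing the grand mean and the additive effect.
    resid = {s: [y - alpha - gamma[s] for y in ys]
             for s, ys in values_by_scanner.items()}
    pooled_sd = pstdev([r for rs in resid.values() for r in rs])

    # Multiplicative scanner effect: scanner SD relative to pooled SD.
    delta = {s: pstdev(rs) / pooled_sd for s, rs in resid.items()}

    # Remove the additive effect, rescale, and add the grand mean back.
    return {s: [(y - alpha - gamma[s]) / delta[s] + alpha for y in ys]
            for s, ys in values_by_scanner.items()}


# Toy data: two scanners with different offsets and spreads.
raw = {"GE": [2.0, 2.4, 2.8], "Trio": [3.0, 3.1, 3.2]}
harmonized = location_scale_harmonize(raw)
```

After the adjustment, both scanners share the same mean and standard deviation for the region, which is the behavior the additive and multiplicative terms are meant to achieve.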
2.3. CovBat harmonization
CovBat is a method developed by Chen et al. (2021) as an alternative to ComBat that models the covariance structure between brain regions. CovBat uses the existing ComBat harmonization framework as a basis. First, ComBat is used to remove the estimated additive and multiplicative effects. Then, principal components analysis (PCA) is conducted on the ComBat-adjusted residual data in order to determine within-site covariance matrices. Center and scale parameters are estimated and removed from the original scores.
The harmonization provided by CovBat captures the interrelationships between regions in a way that is not present in the standard ComBat model. However, while CovBat accounts for covariance relationships between regions of the registered image, researchers seeking to apply network models to neuroimaging data require a harmonization method that explicitly preserves network relationships. The following is an explanation of such a modeling approach, our proposed method, NetBat.
2.4. Structural connectivity matrix
We assume that, regardless of the scanner, there is a true underlying network that models the structural relationships between brain regions for each individual. The structural network has as its edge weights the strengths of the anatomical connections between brain regions. We estimate the associations between brain regions for an individual using the correlation between all pairs of regions over the four scans. This results in an adjacency matrix which is then used as the basis for the network model for that individual. The adjacency matrix contains the correlations between the structural data for each pair of regions and will therefore be referred to in this study as the structural connectivity matrix.
The current study assumes an underlying network structure from data collected by MRI scanners, where the measurements are cortical thickness and volume. It is notable, however, that the NetBat structure is flexible and therefore allows for any process by which a structural connectivity matrix may be derived. This includes correlating multiple scans of the same brain using different scanners, as is the case in this study, but also methods for estimating a structural connectivity matrix that: (1) use prior knowledge of the structural relationships between regions, and (2) use other structural imaging techniques to supplement or replace MRI (T1) data.
Consider individual $j$ and the connection between regions $v$ and $v'$, where individual $j$ has structural scans across four sites (as in our example), $i = 1, \ldots, 4$. At site $i$, a structural measurement $y_{ijv}$ is taken for every region $v$. As an example, $y_{ijv}$ may represent the white matter volume of region $v$ or the gray matter volume of that region. Let $\mathbf{y}_{jv} = (y_{1jv}, y_{2jv}, y_{3jv}, y_{4jv})^{\top}$. The structural representation of person $j$ and connection $(v, v')$ can be estimated by the correlation between the regions of each scan as:
$$r_{jvv'} = \operatorname{cor}\left(\mathbf{y}_{jv}, \mathbf{y}_{jv'}\right) \tag{3}$$
The structural connectivity matrix for person $j$, over all $V$ regions, is then given by:
$$A_j = \left[ r_{jvv'} \right]_{v, v' = 1}^{V} \tag{4}$$
This structural connectivity matrix then represents the correlational relationships between different regions of an individual’s brain across multiple scans. The structural information contained in this matrix can then be used to create a person-specific structural network model.
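The construction of a person-specific structural connectivity matrix can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is one list of regional measurements per scanner for a single participant, the helper names `pearson` and `structural_connectivity` are hypothetical, and the three-region toy values are invented for demonstration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def structural_connectivity(scans):
    """Correlate every pair of regions across one participant's scans.

    scans[i][v] is the measurement of region v on scanner i.
    Returns a symmetric V x V matrix with ones on the diagonal.
    """
    n_regions = len(scans[0])
    # Collect each region's values across all scanners.
    columns = [[scan[v] for scan in scans] for v in range(n_regions)]
    return [[1.0 if a == b else pearson(columns[a], columns[b])
             for b in range(n_regions)]
            for a in range(n_regions)]


# Toy participant: four scanners, three regions (illustrative values).
scans = [
    [2.0, 3.0, 5.0],
    [2.1, 3.2, 4.8],
    [2.2, 3.1, 4.9],
    [2.3, 3.4, 4.6],
]
connectivity = structural_connectivity(scans)
```

With only four scans per participant, each correlation rests on four observations, so individual entries of the matrix are noisy estimates.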
2.5. WSBM
There are many network modeling methods that can be used to account for network dependencies. We initially propose using one of the simplest network models, the stochastic block model (SBM; Lee and Wilkinson, 2019). However, the structure for NetBat is broad enough to allow for future work to incorporate more complicated network models. SBM is a prominent community detection method that has been frequently used to determine the underlying network structure of empirical data (Faskowitz et al., 2018), especially brain data (Pavlović et al., 2020). SBM assumes that regions in the brain divide into communities where regions within the same community interact more often than those in differing communities. SBM identifies clusters of highly-connected brain regions from brain network data. The standard SBM assumes unweighted edges, where an edge is either present (represented by a value of 1) or absent (represented by a value of 0). Because the edge values in our networks are standardized correlations, it is necessary to use weighted SBM (WSBM; Aicher et al., 2015). WSBM estimates the community structure of weighted networks by applying a Bayesian variational algorithm, and demonstrates greater estimation accuracy than the long-standing method of thresholding weighted edges, thereby forcing weighted networks to appear unweighted.
In SBM, binary edge weights are modeled as a Bernoulli distribution, and the parameters of interest are: $z_v$, the estimated community label for vertex $v$, and $\theta$, a matrix of structural connectivity parameters. Both can be used in further analyses. WSBM, alternatively, can model weighted edges using a number of distributions; here we illustrate the WSBM estimation process assuming that weighted edges follow a normal distribution. Like SBM, WSBM can produce community labels $z$. However, the interpretation of the matrix $\theta$ is different given that WSBM is not merely estimating edge existence, but now also the magnitude, $\mu$, and variability, $\sigma^2$, of the edge strength. For this reason, the likelihood function of a WSBM that assumes normally-distributed edges can be written as
$$P\left(A \mid z, \mu, \sigma^2\right) = \prod_{v < v'} \mathcal{N}\left(A_{vv'} \mid \mu_{z_v z_{v'}}, \sigma^2_{z_v z_{v'}}\right) \tag{5}$$
where $A$ is the structural connectivity matrix that describes the network of interest; $\mu_{z_v z_{v'}}$ is the mean of edge bundle $(z_v, z_{v'})$; and $\sigma^2_{z_v z_{v'}}$ is the variance of edge bundle $(z_v, z_{v'})$. The current method uses $\mu$ as an indicator of the relationships between and within clusters, as determined by WSBM.
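To make the normal-edge likelihood concrete, the sketch below evaluates its logarithm for a fixed set of community labels. It is illustrative only: it assumes an undirected network in which each pair of regions is counted once, it takes the edge-bundle means and variances as given, and it does not perform the variational Bayes estimation that WSBM uses to infer labels. The function name `wsbm_normal_loglik` and the toy network are hypothetical.

```python
import math


def wsbm_normal_loglik(adjacency, labels, mu, sigma2):
    """Log-likelihood of edge weights under a normal WSBM with fixed labels.

    adjacency: symmetric matrix of edge weights (diagonal ignored).
    labels[v]: community index of region v.
    mu[k][l], sigma2[k][l]: mean and variance of edge bundle (k, l).
    Assumption: each undirected pair of regions contributes once.
    """
    total = 0.0
    n = len(adjacency)
    for v in range(n):
        for w in range(v + 1, n):
            k, l = labels[v], labels[w]
            mean, var = mu[k][l], sigma2[k][l]
            # Log density of a normal distribution at the observed weight.
            total += (-0.5 * math.log(2 * math.pi * var)
                      - (adjacency[v][w] - mean) ** 2 / (2 * var))
    return total


# Toy network: two blocks with strong within-block weights.
toy_adjacency = [
    [0.0, 0.8, 0.1, 0.1],
    [0.8, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.8, 0.0],
]
toy_mu = [[0.8, 0.1], [0.1, 0.8]]
toy_sigma2 = [[0.01, 0.01], [0.01, 0.01]]

loglik_matching = wsbm_normal_loglik(toy_adjacency, [0, 0, 1, 1], toy_mu, toy_sigma2)
loglik_mismatch = wsbm_normal_loglik(toy_adjacency, [0, 1, 0, 1], toy_mu, toy_sigma2)
```

Labels that match the block structure yield a higher log-likelihood than labels that cut across it, which is the signal a fitting procedure exploits when assigning regions to communities.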
2.6. NetBat
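Fig. 2 presents a summary of the overall steps of our proposed method, NetBat. In order to account for network dependencies, between-community and within-community parameter estimates of association are included in model (1) as covariates. This requires the addition of a term, $N_j^{\top}\eta_v$, to the ComBat model in (1). For participant $j$, we fit the WSBM (model (5)) to obtain mean edge bundle estimators $\hat{\mu}_{j,kk'}$ for $1 \le k \le k' \le K$, where $K$ is the number of communities in the fitted WSBM. In this way, $N_j$ represents the vector of network-based parameters associated with the edge bundle estimators from the fitted network model, and $\eta_v$ denotes the corresponding vector of regression coefficients for region $v$. Taking this all together, and keeping the remaining terms as defined for model (1), NetBat fits the following model:
$$y_{ijv} = \alpha_v + X_{ij}^{\top}\beta_v + N_j^{\top}\eta_v + \gamma_{iv} + \delta_{iv}\varepsilon_{ijv} \tag{6}$$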
Fig. 2.

A process model for NetBat. The first three steps of NetBat, represented here under the “Subject-Level Analysis” heading, produce a WSBM network for each participant from their structural data scan using a structural connectivity matrix (i.e., an adjacency matrix derived from structural connectivity data). Here the structural connectivity matrix is derived by correlating across scans from four different scanners for a given participant. Then, under the “Group-Level Analysis” heading, the next three steps demonstrate the process of using network connectivity parameters estimated by the WSBM to create a design matrix that can be used as a series of covariates in the ComBat model. Because WSBM labels clusters inconsistently across participants, it is necessary to rank-order the magnitude of the association statistic when creating the design matrix.
In the current study, we used the Weighted Stochastic Block Model (WSBM) to provide the parameter estimates of association for the network covariates. Because WSBM requires that the number of clusters be specified a priori, a five-cluster community structure is assumed in all analyses, and therefore there are $5(5+1)/2$, or 15, unique pairs of within- and between-cluster associations (5 within-cluster and 10 between-cluster). These association parameter estimates must be size-ranked in order to provide consistency across individual scans. In this way the first column is a vector of the largest association values across all subjects, the second column the second-largest association values, and so on. This is necessary because, for different networks, WSBM does not consistently label communities. In other words, the community labeled “1” for one network may not be the community labeled “1” for another network. As such, the interpretation of these covariates is not the association between a given cluster and all other clusters, as might be expected, but instead the nth largest association for all participants. An additional benefit of this approach is that, by allowing for individual variability in edge weights, it is not necessary to establish invariance between participant brain images.
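The size-ranking step can be illustrated with a short Python sketch. Two assumptions are worth flagging: the edge-bundle means are computed here as simple sample means of the observed weights for a given labeling, standing in for the WSBM parameter estimates, and the ranking is by signed value in descending order. The function names `edge_bundle_means` and `design_row` are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import fmean


def edge_bundle_means(adjacency, labels, n_communities):
    """Mean edge weight for each within- and between-community bundle.

    Assumption: sample means stand in for WSBM parameter estimates.
    Bundles with no edges (e.g. a single-region community's
    within-community bundle) are omitted from the result.
    """
    pairs = combinations_with_replacement(range(n_communities), 2)
    bundles = {pair: [] for pair in pairs}
    n = len(adjacency)
    for v in range(n):
        for w in range(v + 1, n):
            pair = tuple(sorted((labels[v], labels[w])))
            bundles[pair].append(adjacency[v][w])
    return {pair: fmean(ws) for pair, ws in bundles.items() if ws}


def design_row(bundle_means):
    """Order a participant's bundle means from largest to smallest.

    Ranking removes the dependence on arbitrary community labels, so
    column n holds the nth largest association for every participant.
    """
    return sorted(bundle_means.values(), reverse=True)


# Five communities give 15 within- and between-community bundles.
n_bundles = len(list(combinations_with_replacement(range(5), 2)))

# Toy participant with two communities.
toy_adjacency = [
    [0.0, 0.8, 0.1, 0.2],
    [0.8, 0.0, 0.3, 0.0],
    [0.1, 0.3, 0.0, 0.6],
    [0.2, 0.0, 0.6, 0.0],
]
row = design_row(edge_bundle_means(toy_adjacency, [0, 0, 1, 1], 2))
```

Stacking one such row per participant gives the network covariate block of the design matrix. Swapping the community labels of the toy participant leaves the row unchanged, which is the property the ranking is meant to provide.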
2.7. Data analysis
NetBat was used to harmonize the cortical thickness and volume measures derived from MRI data taken from 18 subjects, who were each scanned using 4 different scanners. First, a structural connectivity matrix was produced for each subject across the 4 scanners. These structural connectivity matrices represented the underlying brain network for each subject. Individual WSBMs estimated the community structure for regions of each subject’s brain. We then extracted the community association values, which capture the associations between and within communities, and ordered them within-subject by size for their inclusion in a design matrix. The design matrix was then included in ComBat so that we could account for biological variation in network relationships.
In order to determine the success of NetBat in retaining network structure, we compared these results with data from three other harmonization processes: (1) raw data; (2) ComBat-harmonized data; and (3) CovBat-harmonized data. Evaluation took two general forms: (1) how consistently does each method harmonize MRI data for the same person across multiple scanners; and (2) how well does each method retain the network structure of the brain after harmonization. To answer the former question, we compared the mean absolute difference and the root mean squared difference between all six pairwise combinations of scanners and across all four methods. To answer the latter question, we assessed the network structure of all three harmonized datasets (ComBat, CovBat, and NetBat) by comparing their structural connectivity matrices to the original, unharmonized structural connectivity matrix. The adjusted Rand index was used to determine if each harmonization method produced data that would be placed in the same WSBM cluster as the unharmonized data. Lastly, modularity was used to assess the sub-network structure of each network, as a way of capturing the relative efficiency of the clustering algorithm. Differences in adjusted Rand index values, modularity values, and mean absolute difference values between the three harmonization methods were assessed using independent samples t-tests. For the rest of the manuscript, we will refer to the latter analyses as occurring at the post-harmonization network level.
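The two scanner-consistency metrics can be sketched as follows. The sketch assumes that each scanner's measurements are stored in the same participant-and-region order, so that values can be differenced element by element; how the differences are pooled over participants and regions is an assumption here. The function name `pairwise_scanner_differences` and the toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def pairwise_scanner_differences(data):
    """Mean absolute and root mean squared difference per scanner pair.

    data[scanner] is a list of measurements; all lists are assumed to
    share the same participant-and-region ordering.
    Returns {(scanner_a, scanner_b): (mad, rmsd)}.
    """
    results = {}
    for a, b in combinations(sorted(data), 2):
        diffs = [x - y for x, y in zip(data[a], data[b])]
        mad = fmean([abs(d) for d in diffs])
        rmsd = sqrt(fmean([d * d for d in diffs]))
        results[(a, b)] = (mad, rmsd)
    return results


# Toy measurements for four scanners (illustrative values only).
toy = {
    "GE": [1.0, 2.0, 3.0],
    "PH": [2.0, 2.0, 5.0],
    "PR": [1.1, 2.1, 3.1],
    "TR": [1.0, 2.2, 2.9],
}
scores = pairwise_scanner_differences(toy)
```

Four scanners give six scanner pairs. RMSD weights large discrepancies more heavily than MAD, so the two metrics can rank methods differently when a few regions differ strongly between scanners.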
3. Results
3.1. Harmonization comparisons
Figs. 3 and 4 demonstrate differences across harmonization methods, broken down by pairwise comparisons of scanners, for cortical thickness and volume, respectively. Fig. 3a, a bar plot of the means of absolute differences (MAD) between scanners for cortical thickness, and Fig. 3b, a bar plot of the root mean squared differences (RMSD) between scanners for cortical thickness, suggested similar results. Here it is apparent that larger differences are present between GE and Philips than with any other pair of scanners. However, the largest discrepancy between methods is present when comparing GE scans with Prisma scans: all harmonization methods produce smaller MAD scores than the raw, unharmonized data. The other four scanner pairs produced much lower MAD scores that are more even across methods. Of these, two scanner pairs (“GE - TR” and “PR - TR”) show evidence that NetBat outperforms the other harmonization methods. Two scanner pairs suggest that CovBat outperforms NetBat, namely “GE - PR” and “PH - TR”. ComBat only outperformed CovBat and NetBat for the “PH - TR” scanner pair. Overall, no harmonization method demonstrates higher performance than the others in all conditions.
Fig. 3.

Comparisons of harmonization methods for cortical thickness measures from MRI scans. Pairwise differences are computed between each scanner. (a) Bar plot of the mean absolute differences between each pairwise set of scanners across all harmonization methods; and (b) Bar plot of the root mean squared difference (RMSD) between each pairwise set of scanners across all harmonization methods. PH = Philips, PR = Prisma, and TR = Trio.
Fig. 4.

Comparisons of harmonization methods for volume measures from MRI scans. Pairwise differences are computed between each scanner. (a) Bar plot of the mean absolute differences between each pairwise set of scanners across all harmonization methods; and (b) Bar plot of the root mean squared difference (RMSD) between each pairwise set of scanners across all harmonization methods. PH = Philips, PR = Prisma, and TR = Trio.
Similarly, MAD and RMSD bar plots for cortical volume, in Fig. 4a and Fig. 4b respectively, suggested that no harmonization method outperformed the others in all conditions. As with the thickness measurements, “GE - PH” had much higher MAD scores than the other scanner pairs, across all methods. NetBat performed better than CovBat for three scanner pairs: “GE - PR”, “GE - TR”, and “PR - TR”. CovBat produced lower MAD scores than NetBat for “GE - PH” and “PH - PR”. ComBat performed best for the “GE - PH” and the “PH - TR” pair. Additionally, we conducted paired-samples t-tests to determine how many participants had significantly different pre- and post-harmonization cortical thickness and volume measurements for each scanner and harmonization method; the resulting table can be found in Supplementary Table 1.
3.2. Post-harmonization network evaluation
Post-harmonization network heat maps for cortical thickness and volume can be found in Figs. 5 and 6, respectively. In each figure the first row contains heat maps for each harmonization method given a fixed order, i.e., the order determined via clustering of the raw unharmonized data, which we will call “raw order”. Raw order provides an assessment of how well each harmonization method retains the original, pre-harmonized network structure. Alternatively, the second row in each figure contains heat maps for each harmonization method given an order specific to the clustering of each harmonization method, which we will call “individual order”. Individual order provides an assessment of the network structure found after harmonization, and assumes that the true network structure may be best obtained after harmonizing. Each pixel in the heat map corresponds to a correlation between two brain regions, and the colors correspond to magnitude and direction of effect: blue represents high positive correlation, red represents high negative correlation, and white represents zero correlation. Raw data is used as the basis of comparison for all three harmonization methods in both the raw and individual order conditions. It is evident, upon initial visual inspection, that when considering the raw order, each harmonization method provides a noisier network structure than the raw data, suggesting that the process of harmonization removes part of the original network structure. However, when considering individual order, it is apparent that harmonized data can still provide network structure, although these structures differ substantially from the original structure. Additionally, it seems that NetBat sometimes provides a clearer network structure than the other harmonization methods, especially with regard to negative values.
The results of a small simulation, representing ideal network conditions, can be found in Supplementary Figure 1, in which structural connectivity matrices derived from pre-harmonized and NetBat-harmonized simulated data are represented as heat maps.
Fig. 5.

Post-harmonization heat maps across harmonization method (Raw Data, ComBat, CovBat, and NetBat) for cortical thickness from a single participant. The first row of the figure contains the adjacency matrices for each harmonization method fixed to the “raw order”, or the order determined via clustering of the raw unharmonized data. The second row of the figure contains the adjacency matrices for each harmonization method fixed to an “individual order”, or the order determined via clustering of each method individually. Matrix rows and columns correspond to brain regions. Blue pixels represent high positive correlation, red pixels represent high negative correlation, and white pixels represent zero correlation.
Fig. 6.

Post-harmonization heat maps across harmonization method (Raw Data, ComBat, CovBat, and NetBat) for cortical volume from a single participant. The first row of the figure contains the adjacency matrices for each harmonization method fixed to the “raw order”, or the order determined via clustering of the raw unharmonized data. The second row of the figure contains the adjacency matrices for each harmonization method fixed to an “individual order”, or the order determined via clustering of each method individually. Matrix rows and columns correspond to brain regions. Blue pixels represent high positive correlation, red pixels represent high negative correlation, and white pixels represent zero correlation.
In order to determine how much of the network structure has been removed for each harmonization method, absolute differences between the raw data structural connectivity matrix and each harmonization method were calculated and the means plotted in a violin plot in Fig. 7. Here smaller differences correspond to less difference between the structural connectivity matrix of the harmonization method and the structural connectivity matrix of the raw data, and suggest that less of the original network structure has been removed. With both cortical thickness and volume, ComBat and NetBat have statistically significantly smaller absolute differences than CovBat. With cortical thickness, but not volume, ComBat had statistically significantly smaller absolute differences than NetBat.
Fig. 7.

Post-harmonization violin plots for mean absolute difference across harmonization method for: (a) cortical thickness and (b) volume. Differences are computed between the structural connectivity matrix of each harmonization method and the structural connectivity matrix of the raw data. Significance was determined by independent-samples t-test. * p < .05, ** p < .01, NS = No Significance.
The adjusted Rand index estimates how closely two sets of cluster labels agree, whether they come from two different clustering methods applied to the same data or from the same clustering method applied to two similarly-structured datasets; the latter is the case here. The adjusted Rand index is calculated on the WSBM cluster labels from the original non-harmonized data and the WSBM cluster labels from each of the harmonized datasets. Higher adjusted Rand index values suggest that the brain regions from the original data are more likely to end up in the same cluster after a given harmonization method. Fig. 8 displays the distributions of each set of adjusted Rand index values, across all participants. The thickness measures in Fig. 8a suggest that ComBat has a closer clustering structure to the original data than both CovBat and NetBat, and NetBat has a closer clustering structure to the original data than CovBat. The volume measures in Fig. 8b suggest that there are no significant differences between any of the harmonization methods. It is evident in all cases that all harmonization methods provide less accurate clustering when compared to the original data.
Fig. 8.

Post-harmonization violin plots for the adjusted Rand index across harmonization methods for: (a) cortical thickness and (b) volume. The adjusted Rand index is calculated between the cluster labels for all harmonization methods and the cluster labels for the raw data. Significance was determined by independent-samples t-test. * p < .05, ** p < .01, NS = No Significance.
Network modularity, the last basis of comparison used in this study, is a measure of the ability of a network to be divided into sub-networks, according to the determined cluster structure. This metric can be interpreted as a measure of the efficiency of the clustering algorithm when applied to network data. Fig. 9 depicts the distributions of modularity scores for all participants across harmonization methods, where (a) represents cortical thickness and (b) represents volume. Higher modularity scores represent greater efficiency in clustering. Results suggest that, with cortical thickness, no harmonization method provides statistically significantly different modularity scores than raw data, and no harmonization method outperforms the others. Alternatively, with regard to volume, each harmonization method provides statistically significantly higher modularity scores than the raw data, but again no harmonization method outperforms the others.
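For readers unfamiliar with the adjusted Rand index, the following Python sketch computes it from two label vectors using the standard pair-counting (Hubert and Arabie) formulation. The function name `adjusted_rand_index` and the toy label vectors are illustrative; in practice an established library implementation would normally be preferred.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items.

    The index depends only on which items are grouped together, so it
    is unaffected by how the clusters happen to be numbered.
    """
    n = len(labels_a)
    # Pairs of items that share a cluster in both labelings.
    contingency = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(count, 2) for count in contingency.values())
    # Pairs of items that share a cluster within each labeling.
    sum_a = sum(comb(count, 2) for count in Counter(labels_a).values())
    sum_b = sum(comb(count, 2) for count in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case: both labelings are trivial partitions.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Same grouping with the cluster numbers swapped.
ari_relabelled = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Groupings that cut across each other.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index ignores label numbering, it suits comparisons between WSBM fits, where the community labeled "1" in one network need not correspond to the community labeled "1" in another.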
Fig. 9.

Post-harmonization violin plots for network modularity across harmonization methods for: (a) cortical thickness and (b) volume. Network modularity is calculated for each network and represents the ability of the network to be divided into sub-networks. Significance was determined by independent-samples t-test. ** p < .01, *** p < .001, NS = No Significance.
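To make the modularity score concrete, the sketch below computes the standard Newman modularity Q for a weighted network and a given set of cluster labels. This is an illustration under stated assumptions, not the computation used in the study: the function name is hypothetical, the adjacency matrix is assumed to be symmetric with non-negative weights (a correlation-based network with negative entries would need a signed variant or a transformation such as absolute values), and the toy network is invented.

```python
def weighted_modularity(adjacency, labels):
    """Newman modularity Q for a symmetric, non-negative weighted network."""
    n = len(adjacency)
    if len(labels) != n:
        raise ValueError("one label per node is required")
    strength = [sum(row) for row in adjacency]   # weighted degree of each node
    total = sum(strength)                        # equals 2m for a symmetric matrix
    if total == 0:
        raise ValueError("network has no edge weight")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed weight minus the weight expected by chance.
                q += adjacency[i][j] - strength[i] * strength[j] / total
    return q / total


# Hypothetical toy network: two disconnected pairs of nodes.
toy_network = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
print(weighted_modularity(toy_network, [0, 0, 1, 1]))  # clusters match the pairs
print(weighted_modularity(toy_network, [0, 0, 0, 0]))  # everything in one cluster
```

A partition that matches the two pairs scores higher than one that places all nodes in a single cluster, which is the sense in which higher modularity indicates a cleaner division into sub-networks.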
4. Discussion
The current study compared our new method, NetBat, with the standard ComBat method and the covariance-focused CovBat. We hypothesized that NetBat would retain network relationships after harmonization better than both ComBat and CovBat. All harmonization methods were applied to a dataset of 18 participants, each of whom was scanned on all four of the study’s scanners. Scanner effects were calculated using within-person data from all four scanners. Network relationships for a single participant were determined by correlating their within-person data across the four scanners.
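As a minimal sketch of that last step, and not the authors' implementation, the snippet below builds a region-by-region Pearson correlation matrix for one participant from a regions-by-scanners table. The function names and the thickness values are hypothetical, and fixing the diagonal at 1.0 is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlation_adjacency(region_by_scanner):
    """Region-by-region correlation matrix from a regions x scanners table."""
    k = len(region_by_scanner)
    return [
        [1.0 if i == j else pearson(region_by_scanner[i], region_by_scanner[j])
         for j in range(k)]
        for i in range(k)
    ]


# Hypothetical cortical thickness (mm) for three regions, one row per region,
# one column per scanner (four scanners, as in the study design).
thickness = [
    [2.5, 2.6, 2.4, 2.7],
    [3.0, 3.2, 2.8, 3.4],
    [2.9, 2.8, 3.0, 2.7],
]
adjacency = correlation_adjacency(thickness)
for row in adjacency:
    print([round(value, 3) for value in row])
```

With only four observations per correlation, a single scanner's offset can move an entry substantially, which is worth keeping in mind when interpreting the resulting networks.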
Two classes of empirical comparisons were considered in this study: (1) the average pairwise comparisons of scanners across harmonization methods, and (2) the retention of networks post-harmonization. In the first case, when comparing the consistency of harmonization between pairs of scanners, no harmonization method consistently outperformed the others. NetBat produced the lowest MAD and RMSD scores for cortical thickness in two scanner comparisons: GE versus Trio and Prisma versus Trio. This ties CovBat, which also produced the lowest MAD and RMSD scores for cortical thickness in two scanner comparisons, and exceeds ComBat and the raw data, each of which performed best in only one scanner comparison. For volume, however, NetBat produced the lowest MAD and RMSD scores in three scanner comparisons, more often than CovBat, ComBat, or the raw data. The lack of consistent performance suggests that further investigation is required to understand the differences between harmonization methods. For instance, it may be helpful to test simulated conditions under which we would expect structural imaging data to be better harmonized by NetBat than by ComBat or CovBat. It is also notable that these metrics capture the consistency between scanners and are only indirectly affected by network relationships. Network-based metrics are likely to be more informative in assessing the ability of each harmonization method to retain the original network structure.
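The two scanner-consistency metrics discussed above can be sketched as follows, assuming that MAD is the mean absolute difference and RMSD the root-mean-square difference between paired regional measurements from two scanners. The function name and the input values are hypothetical, and the study's exact aggregation over participants and regions is not reproduced here.

```python
from math import sqrt


def mad_rmsd(scanner_a, scanner_b):
    """Mean absolute difference and root-mean-square difference of paired values."""
    if len(scanner_a) != len(scanner_b) or not scanner_a:
        raise ValueError("inputs must be non-empty and equally long")
    diffs = [a - b for a, b in zip(scanner_a, scanner_b)]
    mad = sum(abs(d) for d in diffs) / len(diffs)
    rmsd = sqrt(sum(d * d for d in diffs) / len(diffs))
    return mad, rmsd


# Hypothetical regional measurements for one participant on two scanners.
scanner_one = [1.0, 2.0, 3.0]
scanner_two = [2.0, 2.0, 5.0]
mad, rmsd = mad_rmsd(scanner_one, scanner_two)
print(mad, rmsd)
```

Lower values of either metric indicate closer agreement between the two scanners; RMSD penalizes large individual discrepancies more heavily than MAD.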
Assessment at the post-harmonization level provided a more nuanced picture of the relationships between the harmonization methods. First, visual inspection of post-harmonized heat maps suggests differences across harmonization methods in their overall pattern of association between regions. These differences likely correspond to differences in qualitative interpretation. For instance, in Figs. 5 and 6, it appears that NetBat may better retain negative correlations than ComBat or CovBat. Second, mean absolute differences (MAD) between post-harmonization adjacency matrices and the unharmonized structural connectivity matrix suggest that NetBat and ComBat provide smaller MADs than CovBat, but that ComBat also provides smaller MADs than NetBat. These results seem to suggest that ComBat retains network structure better than both CovBat and NetBat, despite the design of the latter two methods. Revisions to CovBat and NetBat may be necessary in order to improve their ability to retain network structure. Third, adjusted Rand index values, calculated between the cluster labels of the original unharmonized data and the cluster labels from each of the harmonization methods post-harmonization, corroborate the results found by the mean absolute differences. Here, for thickness measurements, ComBat outperforms CovBat and NetBat in retaining the network structure of the original unharmonized data. However, no significant differences were found between harmonization methods with volume measurements. Modularity estimates suggest that, with volume and not thickness, each harmonization method provided higher modularity than the raw data, but no harmonization method outperformed the others. The significance of thickness results and not volume results is consistent with findings in the literature. Schwarz et al. (2016), for instance, suggest that, although the two measures are highly correlated and sometimes serve different purposes, cortical thickness may be preferred over cortical volume as a measurement in studies of subjects with Alzheimer’s disease.
Visual differences between the fixed Raw Order and Individual Order post-harmonization heat maps in Figs. 5 and 6 suggest that each harmonization method, in altering the original outcomes, may obscure or introduce structural connectivity relationships relative to the original unharmonized data. It is not clear that any one harmonization method would consistently modify relationships based on its formulation. We expected any harmonization method that accounts for network relationships to provide more accurate network relationships; however, we did not consistently observe this, with ComBat outperforming both CovBat and NetBat in MAD relative to the unharmonized structural connectivity matrix. Future work should include simulation studies to assess the circumstances and limits at which each harmonization method can potentially obscure or introduce structural connectivity relationships, independent of harmonization performance.
NetBat is a novel approach to harmonization in a number of ways. Its focus on retaining the network structure of the original data when harmonizing differs from approaches where harmonization is conducted directly on the network’s structural connectivity matrix (Onicas et al., 2022). The network focus of NetBat also differs from the covariance focus of methods like CovBat (Chen et al., 2021). By including network parameters as covariates, NetBat extends ComBat by explicitly modeling network relationships. In some cases, NetBat has been shown to improve harmonization of structural MRI data over methods like ComBat and CovBat. Our results suggest that NetBat conducts harmonization of structural data in a way that is complementary to ComBat and CovBat. Further efforts to determine the relative strengths and weaknesses of each method should focus on simulation work that tests specific conditions (e.g., the number of clusters specified a priori) and on testing structural data from different stratified populations.
A number of limitations in the design of the current study are worth mentioning. First, the adjacency matrices in this study were composed of correlations of four data points each, corresponding to the four scanners. This means that the correlations are susceptible to large scanner bias. While we acknowledge the difficulty in collecting structural MRI data from multiple scanners for each person, it is likely that the accuracy of the network models would improve if more scanners were used. Alternatively, there exist other methods for calculating a structural connectivity matrix from structural imaging data, including prior knowledge of network relationships and methods that integrate multiple imaging modalities like JuSpace (Dukart et al., 2021). Likewise, NetBat currently uses the WSBM established by Aicher et al. (2015). Future research should consider alternative weighted network models such as the generative WSBM (Ng and Murphy, 2021), the generalized exponential random graph model (Wilson et al., 2017; Stillman et al., 2019), or the latent space model (Wilson et al., 2020; Hoff et al., 2002). These alternative methods, for the creation of both the structural connectivity matrix and the network model, should be explored in this context.
While the current study assumed that network structure needs to be explicitly included as a covariate in the model, it may be that harmonization methods like ComBat do not remove enough of the network structure to justify explicit inclusion. Results from Kurokawa et al. (2021) suggest that, with dMRI data, brain network analyses are minimally affected by scanner effects. This may also hold true for structural MRI data. Additionally, including network model parameters as covariates may not be the best way to represent network relationships in the model. It may be, for instance, that including the magnitude of association between regions as random effects would lead to better harmonization of network structure. Future research should explore this.
NetBat has so far incorporated only the standard ComBat structure established by Fortin et al. (2017). Future research could consider incorporating methods like longitudinal ComBat (Beer et al., 2020), which may help with the estimation of the underlying network structure. It may also be useful in future research to compare NetBat to other ComBat-like harmonization methods. For instance, Block-ComBat is another alternative to ComBat, developed by Chen et al. (2022), that incorporates prior knowledge of structural or functional network relationships in a two-step process that re-harmonizes functional imaging data in the pre-specified network blocks. A comparison between NetBat and Block-ComBat is likely to reveal meaningful differences in network retention. As it is, NetBat represents a novel approach to a methodologically difficult problem that may provide new insight into brain network architecture and the process of harmonization.
Supplementary Material
Appendix A. Supplementary data
Supplementary material related to this article can be found online at https://doi.org/10.1016/j.neuroimage.2025.121317.
Acknowledgments
This work was supported by the following NIH/NIA grants: R01 AG063752, P01 AG025204, P30 AG10129, and UH3 NS100608. The assertions and conclusions presented here are made by the authors and may not reflect the views of the supporting agencies.
Footnotes
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Gustav R. Sjobeck: Writing – review & editing, Writing – original draft, Visualization, Software, Methodology, Investigation, Formal analysis, Conceptualization. Mahbaneh Eshaghzadeh Torbati: Writing – review & editing, Software, Data curation. Davneet S. Minhas: Writing – review & editing, Supervision, Conceptualization. Charles S. DeCarli: Writing – review & editing, Resources, Funding acquisition, Data curation. James D. Wilson: Writing – review & editing, Supervision, Resources, Methodology, Investigation, Conceptualization. Dana L. Tudorascu: Writing – review & editing, Supervision, Resources, Investigation, Funding acquisition, Conceptualization.
Ethics statement
The institutional review boards at each of the participating institutions from which the MR images were obtained approved this study (IRB # 1373817, expiration date: 1/19/2026), and subjects or their legal representatives gave written informed consent.
Data availability
The authors do not have permission to share data.
References
- Aicher C, Jacobs AZ, Clauset A, 2015. Learning latent block structure in weighted networks. J. Complex Netw 2 (3), 221–248.
- Bassett DS, Brown JA, Deshpande V, Carlson JM, Grafton ST, 2011. Conserved and variable architecture of human white matter connectivity. Neuroimage 54 (2), 1262–1279. 10.1016/j.neuroimage.2010.09.006.
- Bassett DS, Bullmore E, 2006. Small-world brain networks. Neurosci. 12 (6), 512–523.
- Bassett DS, Sporns O, 2017. Network neuroscience. Nature Neurosci. 20 (3), 353–364.
- Beer JC, Tustison NJ, Cook PA, Davatzikos C, Sheline YI, Shinohara RT, Linn KA, 2020. Longitudinal ComBat: A method for harmonizing longitudinal multi-scanner imaging data. Neuroimage 220, 117129.
- Bullmore E, Sporns O, 2009. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Rev. Neurosci 10 (3), 186–198.
- Chen AA, Beer JC, Tustison NJ, Cook PA, Shinohara RT, Shou H, 2021. Mitigating site effects in covariance for machine learning in neuroimaging data. Hum. Brain Mapp 43 (4), 1179–1195. 10.1002/hbm.25688.
- Chen J, Liu J, Calhoun VD, Arias-Vasquez A, Zwiers MP, Gupta CN, Franke B, Turner JA, 2014. Exploration of scanning effects in multi-site structural MRI studies. J. Neurosci. Methods 230, 37–50.
- Chen AA, Srinivasan D, Pomponio R, Fan Y, Nasrallah IM, Resnick SM, Beason-Held LL, Davatzikos C, Satterthwaite TD, Bassett DS, 2022. Harmonizing functional connectivity reduces scanner effects in community detection. NeuroImage 256, 119198.
- Damoiseaux JS, Greicius MD, 2009. Greater than the sum of its parts: a review of studies combining structural connectivity and resting-state functional connectivity. Brain Struct. Funct 213, 525–533.
- Duan Y, Zhao W, Luo C, Liu X, Jiang H, Tang Y, Liu C, Yao D, 2022. Identifying and predicting autism spectrum disorder based on multi-site structural MRI with machine learning. Front. Hum. Neurosci 15, 765517.
- Dukart J, Holiga S, Rullmann M, Lanzenberger R, Hawkins PCT, Mehta MA, Hesse S, Barthel H, Sabri O, Jech R, 2021. JuSpace: A Tool for Spatial Correlation Analyses of Magnetic Resonance Imaging Data with Nuclear Imaging Derived Neurotransmitter Maps. Technical Report, Wiley Online Library.
- Farooq H, Chen Y, Georgiou TT, Tannenbaum A, Lenglet C, 2019. Network curvature as a hallmark of brain structural connectivity. Nat. Commun 10 (1), 4937.
- Faskowitz J, Yan X, Zuo XN, Sporns O, 2018. Weighted stochastic block models of the human connectome across the life span. Sci. Rep 8 (1), 12997.
- Fischl B, 2012. FreeSurfer. Neuroimage 62 (2), 774–781.
- Fortin J-P, Cullen N, Sheline YI, Taylor WD, Aselcioglu I, Cook PA, Adams P, Cooper C, Fava M, McGrath PJ, 2018. Harmonization of cortical thickness measurements across scanners and sites. Neuroimage 167, 104–120.
- Fortin JP, Parker D, Tunç B, Watanabe T, Elliott MA, Ruparel K, Roalf DR, Satterthwaite TD, Gur RC, Gur RE, 2017. Harmonization of multi-site diffusion tensor imaging data. Neuroimage 161, 149–170.
- Fortin JP, Sweeney EM, Muschelli J, Crainiceanu CM, Shinohara RT, 2016. Removing inter-subject technical variability in magnetic resonance imaging studies. NeuroImage 132, 198–212.
- Grossberg S, 2000. The complementary brain: Unifying brain dynamics and modularity. Trends Cogn. Sci 4 (6), 233–246.
- Hedges EP, Dimitrov M, Zahid U, Vega BB, Si S, Dickson H, McGuire P, Williams S, Barker GJ, Kempton MJ, 2022. Reliability of structural MRI measurements: The effects of scan session, head tilt, inter-scan interval, acquisition sequence, FreeSurfer version and processing stream. Neuroimage 246, 118751.
- Hoff PD, Raftery AE, Handcock MS, 2002. Latent space approaches to social network analysis. J. Amer. Statist. Assoc 97 (460), 1090–1098.
- Johnson WE, Li C, Rabinovic A, 2007. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8 (1), 118–127.
- Jovicich J, Marizzoni M, Sala-Llonch R, Bosch B, Bartrés-Faz D, Arnold J, Benninghoff J, Wiltfang J, Roccatagliata L, Nobili F, 2013. Brain morphometry reproducibility in multi-center 3 T MRI studies: a comparison of cross-sectional and longitudinal segmentations. Neuroimage 83, 472–484.
- Kurokawa R, Kamiya K, Koike S, Nakaya M, Uematsu A, Tanaka SC, Kamagata K, Okada N, Morita K, Kasai K, 2021. Cross-scanner reproducibility and harmonization of a diffusion MRI structural brain network: A traveling subject study of multi-b acquisition. NeuroImage 245, 118675.
- Lee C, Wilkinson DJ, 2019. A review of stochastic block models and extensions for graph clustering. Appl. Netw. Sci 4 (1), 1–50.
- Liu M, Zhu AH, Maiti P, Thomopoulos SI, Gadewar S, Chai Y, Kim H, Jahanshad N, 2023. Style transfer generative adversarial networks to harmonize multisite MRI to a single reference image to avoid overcorrection. Hum. Brain Mapp 44 (14), 4875–4892.
- Ng TLJ, Murphy TB, 2021. Weighted stochastic block model. Stat. Methods Appl 30, 1365–1398.
- O’Brien LM, Ziegler DA, Deutsch CK, Frazier JA, Herbert MR, Locascio JJ, 2011. Statistical adjustments for brain size in volumetric neuroimaging studies: some practical implications in methods. Psychiatry Res.: Neuroimaging 193 (2), 113–122.
- Onicas AI, Ware AL, Harris AD, Beauchamp MH, Beaulieu C, Craig W, Doan Q, Freedman SB, Goodyear BG, Zemek R, 2022. Multisite harmonization of structural DTI networks in children: An A-CAP study. Front. Neurol 13, 850642.
- Pavlović DM, Guillaume BR, Towlson EK, Kuek NM, Afyouni S, Vértes PE, Yeo BT, Bullmore ET, Nichols TE, 2020. Multi-subject stochastic blockmodels for adaptive analysis of individual differences in human brain network cluster structure. NeuroImage 220, 116611.
- Pinto MS, Paolella R, Billiet T, Van Dyck P, Guns PJ, Jeurissen B, Ribbens A, den Dekker AJ, Sijbers J, 2020. Harmonization of brain diffusion MRI: Concepts and methods. Front. Neurosci 14, 396.
- Pohl KM, Sullivan EV, Rohlfing T, Chu W, Kwon D, Nichols BN, Zhang Y, Brown SA, Tapert SF, Cummins K, 2016. Harmonizing DTI measurements across scanners to examine the development of white matter microstructure in 803 adolescents of the NCANDA study. Neuroimage 130, 194–213.
- Rosazza C, Minati L, 2011. Resting-state brain networks: literature review and clinical applications. Neurol. Sci 32, 773–785.
- Schwarz CG, Gunter JL, Wiste HJ, Przybelski SA, Weigand SD, Ward CP, Senjem ML, Vemuri P, Murray ME, Dickson DW, 2016. A large-scale comparison of cortical thickness and volume methods for measuring Alzheimer’s disease severity. NeuroImage: Clin. 11, 802–812.
- Sharp DJ, Scott G, Leech R, 2014. Network dysfunction after traumatic brain injury. Nat. Rev. Neurol 10 (3), 156–166.
- Stillman PE, Wilson JD, Denny MJ, Desmarais BA, Cranmer SJ, Lu ZL, 2019. A consistent organizational structure across multiple functional subnetworks of the human brain. NeuroImage 197, 24–36.
- Timmermans C, Smeets D, Verheyden J, Terzopoulos V, Anania V, Parizel PM, Maas A, 2019. Potential of a statistical approach for the standardization of multicenter diffusion tensor data: a phantom study. J. Magn. Reson. Imaging 49 (4), 955–965.
- Torbati ME, Minhas DS, Ahmad G, O’Connor EE, Muschelli J, Laymon CM, Yang Z, Cohen AD, Aizenstein HJ, Klunk WE, Christian BT, Hwang SJ, Crainiceanu CM, Tudorascu DL, 2021. A multi-scanner neuroimaging data harmonization using RAVEL and ComBat. NeuroImage 245, 118703. 10.1016/j.neuroimage.2021.118703.
- Torbati ME, Minhas DS, Laymon CM, Maillard P, Wilson JD, Chen CL, Crainiceanu CM, DeCarli CS, Hwang SJ, Tudorascu DL, 2023. MISPEL: A supervised deep learning harmonization method for multi-scanner neuroimaging data. Med. Image Anal 89, 102926.
- Wilcock D, Jicha G, Blacker D, Albert MS, D’Orazio LM, Elahi FM, Fornage M, Hinman JD, Knoefel J, Kramer J, et al., 2021. MarkVCID cerebral small vessel consortium: I. Enrollment, clinical, fluid protocols. Alzheimer’s Dement. 17 (4), 704–715.
- Wilson JD, Cranmer S, Lu Z-L, 2020. A hierarchical latent space network model for population studies of functional connectivity. Comput. Brain Behav 3 (4), 384–399.
- Wilson JD, Denny MJ, Bhamidi S, Cranmer SJ, Desmarais BA, 2017. Stochastic weighted graphs: Flexible model specification and simulation. Soc. Networks 49, 37–47.
- Zuo L, Dewey BE, Carass A, Liu Y, He Y, Calabresi PA, Prince JL, 2021. Information-based disentangled representation learning for unsupervised MR harmonization. In: International Conference on Information Processing in Medical Imaging. Springer, pp. 346–359.
