Abstract
Complex graph theory measures of brain structural connectomes derived from diffusion weighted images (DWI) provide insight into the network structure of the brain. Further, as the number of available DWI datasets grows, so does the ability to investigate associations of these measures with major biological factors, like age. However, one key hurdle that remains is the presence of scanner effects that arise between DWI datasets and confound multisite analyses. Two common approaches to correct these effects are voxel-wise and feature-wise harmonization. However, it is still unclear how to best leverage them for graph-theory analysis of an aging population. Thus, there is a need to better characterize the impact of each harmonization method and its ability to preserve age-related features. We investigate this by characterizing four complex graph theory measures (modularity, characteristic path length, global efficiency, and betweenness centrality) in 48 participants aged 55 to 86 from the Baltimore Longitudinal Study of Aging (BLSA) and the Vanderbilt Memory and Aging Project (VMAP) before and after voxel- and feature-wise harmonization with the Null Space Deep Network (NSDN) and ComBat, respectively. First, we characterize across-dataset coefficients of variation (CoV) and find that the combination of NSDN and ComBat causes the greatest reduction in CoV, followed by ComBat alone and then NSDN alone. Second, we reproduce published associations of modularity with age after correcting for other covariates with linear models. We find that harmonization with ComBat or ComBat and NSDN together improves the significance of existing age effects, reduces model residuals, and qualitatively reduces separation between datasets.
These results reinforce the efficiency of statistical harmonization on the feature-level with ComBat and suggest that harmonization on the voxel-level is synergistic but may have reduced effect after running through the multiple layers of the connectomics pipeline. Thus, we conclude that feature-wise harmonization improves statistical results, but the addition of biologically informed voxel-based harmonization offers further improvement.
Keywords: Diffusion-weighted imaging, tractography, complex network measures, harmonization, multi-site
1. INTRODUCTION
Diffusion weighted imaging (DWI) is a non-invasive in-vivo imaging modality that, with tractography, we can use to reconstruct white matter microstructure. Connectomics is an extension of tractography that represents brain connectivity as a graph with nodes connected by edges. Specifically, nodes in a connectome reflect cortical regions from the brain and edges reflect white matter connections. Using this graph representation, we can use established graph theory analyses to obtain insight into the connectivity of the brain. Neuroscientists have used graph measures to study how neurological disorders such as mild traumatic brain injury [1] [2], Alzheimer’s disease [3], temporal lobe epilepsy [4], and aging [5] affect brain structure and organization.
Despite these advances, statistically significant confounding differences in images caused by data acquisition and scanner noise are an established problem in multi-site DWI analysis [6], [7]. These differences carry over to graph theory analysis and other downstream tasks [8]. For instance, in the cohort investigated presently, there is clear differentiation between data distributions collected on different scanners (Figure 1). Without harmonization, this effect confounds metrics computed on the combined population. There are a few methods for harmonizing these differences, including ComBat [9] and the Null Space Deep Network (NSDN) [10], but it is unclear whether there are advantages to one method over the other or whether the two methods work synergistically. Thus, to begin to fill this gap, we explore how effective each method is for improving graph theory analysis of an aging population. Ideal harmonization techniques preserve variation associated with age and limit other sources of variation (i.e., variability caused by scanners).
Figure 1.

The graph measures investigated presently exhibit appreciable scanner effects and subsequent separation of data distributions by site.
ComBat is a widely used statistical tool originally designed for gene expression analysis that has since been extended to multi-scanner DWI harmonization. The algorithm uses an empirical Bayes framework to improve the estimation of multiplicative and additive scanner effects [9]. Previous work suggests that ComBat is an effective tool for harmonizing differences between connectivity matrices as well as the graph measures themselves [8]. In this paper, we apply ComBat on the graph measure distributions to perform feature-wise harmonization. The second harmonization method we investigate is the NSDN [10], a publicly available deep learning model trained and validated at the histological level to produce estimates of diffusion signals invariant to scanner effects on the voxel level. We aim to compare the effectiveness of these two methods and their combination at preserving age-related features and harmonizing scanner differences in graph theory measures computed with the Brain Connectivity Toolbox (BCT) developed by Rubinov and Sporns [11].
2. METHODS
In this work, we examine graph measures from DWI data of 48 subjects across two sites. We implement voxel-wise harmonization using NSDN and feature-wise harmonization using ComBat: NSDN is applied prior to tractography on the DWI themselves, whereas ComBat is applied on the graph measures at the last stage of the pipeline (Figure 2). We characterize the effectiveness of each method separately and together by analyzing the resultant coefficients of variation (CoVs) across the sites and the population and by analyzing subsequent associations with age.
Figure 2.

We used MRTrix to generate tractography from each DWI and convert to a connectome representation. In the control experiment, no harmonization is applied. ComBat harmonization is applied at the graph measure level. NSDN is applied to the DWI before tractography on the voxel level.
2.1. Data acquisition and preprocessing
We considered 23 participants from the Vanderbilt Memory and Aging Project (VMAP) [12] and 25 from the Baltimore Longitudinal Study of Aging (BLSA) [13]. The data cover 48 participants aged 55 to 86 years, including 18 female participants and 22 participants with mild cognitive impairment. Both sites used single-shell acquisitions but varied the gradient schemes. VMAP acquired 32 directions at a b-value of 700 s/mm², and BLSA acquired 64 samples at a b-value of 1000 s/mm². BLSA used a Philips 3T scanner at a resolution of 2.2 × 2.2 × 2.2 mm³ and resampled to 0.81 × 0.81 × 2.2 mm³. VMAP used a Philips 3T scanner at a resolution of 2 × 2 × 2 mm³.
DWI from all participants underwent preprocessing to remove eddy current, motion, and echo-planar imaging (EPI) distortions prior to any harmonization or model fitting [14].
2.2. Tractography and connectome generation
We used the MRTrix default probabilistic tracking algorithm of second order integration over fiber orientation distributions (FODs) for tractography [15]. We generated 10 million streamlines to build each tractogram. We limited tractography seeding and termination using the five-tissue-type mask and capped streamline length at 250 mm. We allowed backtracking. Afterward, we converted the tractogram to a connectome using the Desikan-Killiany atlas [16] with 84 cortical parcellations from Freesurfer [17].
2.3. Graph measures
We used the BCT for calculating modularity, average betweenness centrality, characteristic path length, and global efficiency [11]. Modularity is the degree to which the network may be subdivided into clearly delineated and nonoverlapping groups. Betweenness centrality is the fraction of shortest paths in the network that contain a given node. Average betweenness centrality is therefore the average fraction of shortest paths that nodes in a network participate in. The characteristic path length is the average shortest path between nodes in millimeters. Global efficiency is the average inverse shortest path length.
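As a minimal sketch of the two path-based definitions above, the following NumPy code computes characteristic path length and global efficiency from a matrix of edge lengths. This is not the BCT implementation: the function names are hypothetical, the input is assumed to be a symmetric matrix of edge lengths in which a zero off the diagonal means "no edge", and disconnected node pairs are assumed to be excluded from the path-length average.

```python
import numpy as np

def shortest_path_lengths(edge_lengths):
    """All-pairs shortest path lengths via the Floyd-Warshall algorithm.

    Assumption of this sketch: `edge_lengths` is a symmetric matrix and a
    zero entry off the diagonal means the two nodes share no edge.
    """
    L = np.asarray(edge_lengths, dtype=float)
    D = np.where(L > 0, L, np.inf)   # missing edges start at infinite distance
    np.fill_diagonal(D, 0.0)
    for k in range(D.shape[0]):
        # Allow paths that pass through node k if they are shorter
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def characteristic_path_length(D):
    """Mean shortest path length over connected pairs of distinct nodes."""
    off_diagonal = ~np.eye(D.shape[0], dtype=bool)
    return D[off_diagonal & np.isfinite(D)].mean()

def global_efficiency(D):
    """Mean inverse shortest path length over pairs of distinct nodes."""
    off_diagonal = ~np.eye(D.shape[0], dtype=bool)
    return (1.0 / D[off_diagonal]).mean()  # 1/inf contributes 0

# Hypothetical 3-node chain: 0 -(1 mm)- 1 -(2 mm)- 2
toy = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 2.0],
                [0.0, 2.0, 0.0]])
D = shortest_path_lengths(toy)
print(characteristic_path_length(D), global_efficiency(D))
```

In the toy chain, the shortest path between the two end nodes runs through the middle node, so the pairwise distances are 1, 2, and 3 mm.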
2.4. Harmonization
We used the MATLAB implementation of ComBat to harmonize the differences in graph measure distributions at the feature level [18]. Briefly, ComBat creates a multivariate linear mixed effects regression model whose parameters are optimized with an empirical Bayes approach. This model corrects for multiplicative and additive site effects and drives the sites toward a common mean.
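For reference, the location and scale model underlying ComBat can be written as follows, following the formulation of Fortin et al. [9]. The indexing is an assumption made here for illustration: $i$ denotes the site, $j$ the participant, and $v$ the feature (here, a graph measure); $X_{ij}$ holds the biological covariates to be preserved, which are not enumerated in this section.

```latex
% Model: additive (\gamma) and multiplicative (\delta) site effects
y_{ijv} = \alpha_v + X_{ij}^{\top}\beta_v + \gamma_{iv} + \delta_{iv}\,\varepsilon_{ijv}

% Harmonized value, using the empirical Bayes estimates \gamma^{*}_{iv} and \delta^{*}_{iv}
y_{ijv}^{\mathrm{ComBat}} =
  \frac{y_{ijv} - \hat{\alpha}_v - X_{ij}^{\top}\hat{\beta}_v - \gamma^{*}_{iv}}{\delta^{*}_{iv}}
  + \hat{\alpha}_v + X_{ij}^{\top}\hat{\beta}_v
```

The first equation separates the covariate-driven signal from the site-specific shift and scaling; the second removes the estimated site terms and restores the covariate-driven signal.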
NSDN is a publicly available, data-driven approach for estimating tissue microstructure from DWI implemented in DiPy [10]. The null space deep network was trained on histology and scan re-scan data of an ex-vivo rhesus monkey brain. The algorithm uses a null-space architecture and takes 8th order spherical harmonic coefficients and outputs a 10th order spherical harmonic FOD with harmonized scanner effects in each voxel. NSDN displayed reproducible performance for scan-rescan data as well as previously unseen scanners. This model performs biologically informed harmonization at the voxel-level on preprocessed DWI prior to tractography.
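To make the input and output sizes concrete, the short sketch below counts spherical harmonic coefficients per voxel for a given order. It assumes the even-order, antipodally symmetric basis that is standard for diffusion MRI; the source does not state the coefficient counts, and the function name is hypothetical.

```python
def num_even_sh_coefficients(order):
    """Number of coefficients in an even-order symmetric SH basis.

    Assumption: only even degrees 0, 2, ..., order are used, and each
    degree l contributes 2l + 1 coefficients.
    """
    if order < 0 or order % 2 != 0:
        raise ValueError("order must be a non-negative even integer")
    return (order + 1) * (order + 2) // 2

# 8th order input and 10th order output, as described above
print(num_even_sh_coefficients(8), num_even_sh_coefficients(10))  # 45 66
```

Under this assumption, an 8th order input corresponds to 45 coefficients per voxel and a 10th order output to 66.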
For this study, we perform four classes of harmonization and investigate their effects: (1) no harmonization, (2) harmonization with ComBat, (3) harmonization with NSDN, and (4) harmonization with both ComBat and NSDN.
3. RESULTS
3.1. Impact on dataset variability
We compute the CoV of the combined population without harmonization and after all three harmonization experiments as the ratio of the standard deviation of the population to the corresponding mean, expressed as a percent (Figure 3). We observe for both modularity and global efficiency that CoV decreases when either harmonization technique is applied compared to no harmonization. We also observe for these measures that CoV decreases further when both techniques are applied together, suggesting the variance reduction effects of the two are additive. We observe the same additive effect for average betweenness centrality and characteristic path length, but note that NSDN alone does not exhibit decreases in CoV, suggesting ComBat alone may produce larger reductions than NSDN alone.
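The CoV computation described above can be sketched as follows. The function name and the toy values are hypothetical and are not study data, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not specify the estimator. The mean-centering step is only a crude stand-in for a site correction and is not ComBat or NSDN.

```python
import numpy as np

def coefficient_of_variation(values, ddof=1):
    """Return the CoV (standard deviation / mean) as a percent."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=ddof) / values.mean()

# Hypothetical toy graph-measure values for two sites (not study data)
site_a = np.array([0.30, 0.32, 0.31, 0.33])
site_b = site_a + 0.06                      # same spread, shifted by a site offset
pooled = np.concatenate([site_a, site_b])

# Crude illustration only: remove each site's mean and restore the pooled mean
centered = np.concatenate([site_a - site_a.mean(),
                           site_b - site_b.mean()]) + pooled.mean()

print(coefficient_of_variation(pooled), coefficient_of_variation(centered))
```

Because the additive offset inflates the pooled standard deviation without changing the pooled mean, removing it lowers the CoV of the combined population.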
Figure 3.

We calculated CoV of the control, ComBat harmonized, NSDN harmonized, and ComBat with NSDN harmonized graph measures across the combined population.
3.2. Impact on age effects
Previous work in the field has found a positive correlation between modularity and age [19]. As such, we further investigate NSDN and ComBat’s ability to preserve modularity age effects using linear models considering age, sex, and cognitive status as variables (Figure 4 and Table 1).
Figure 4.

Association of modularity with age in different harmonization techniques. (a) The root mean squared error (RMSE) of model fits. (b) The associations with age. Dashed lines reflect the 95% confidence interval.
Table 1.
We report linear model weights and their significance.
| | Control | ComBat | NSDN | NSDN + ComBat |
|---|---|---|---|---|
| Age effect | 0.0025*** | 0.0025*** | 0.0020*** | 0.0014* |
| Sex effect | −0.0143 | −0.0035 | 0.0000 | −0.0093 |
| Diagnosis effect | −0.0035 | −0.0146 | −0.0060 | −0.0022 |
| Site effect | 0.0580*** | 0.0011 | 0.04135*** | 0.0004 |
| F-statistic | 22.4*** | 4.02** | 22.9*** | 1.92 |
Note: * indicates p<0.05, ** indicates p<0.005, and *** indicates p<0.001.
In line with the trends observed in Figure 3, we observe the lowest error in model fit after applying both harmonization techniques, though ComBat alone produced more significant decreases than NSDN alone (Figure 4a). After controlling for sex and cognitive status, we observe the significance of the age effect improves by over an order of magnitude when either ComBat alone or ComBat with NSDN is applied (Table 1). The F-tests against constant models also become significant with ComBat alone or ComBat with NSDN (Table 1). Qualitatively, we observe that use of either harmonization technique draws the populations from the two sites closer together, though this is most obvious when ComBat and NSDN are used together (Figure 4b). Last, we observe that involvement of NSDN adds a positive bias shift in the modularity measures.
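A linear model of the kind summarized in Table 1 can be sketched with ordinary least squares as below. This is an illustration under stated assumptions rather than the analysis code used here: the function name is hypothetical, the data are synthetic with arbitrarily chosen effect sizes, and site is included as a covariate only because Table 1 reports a site effect.

```python
import numpy as np

def fit_linear_model(y, covariates):
    """Ordinary least squares fit of y on an intercept plus covariates.

    Returns the weights (intercept first) and the root mean squared error
    (RMSE) of the residuals.
    """
    y = np.asarray(y, dtype=float)
    columns = [np.ones(len(y))] + [np.asarray(c, dtype=float) for c in covariates]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return beta, rmse

# Synthetic data for illustration only (not study data)
rng = np.random.default_rng(0)
n = 48
age = rng.uniform(55, 86, n)
sex = rng.integers(0, 2, n)
diagnosis = rng.integers(0, 2, n)
site = rng.integers(0, 2, n)
# Hypothetical generating model: an age effect plus an additive site offset
modularity = 0.30 + 0.002 * age + 0.05 * site + rng.normal(0, 0.01, n)

beta, rmse = fit_linear_model(modularity, [age, sex, diagnosis, site])
print(dict(zip(["intercept", "age", "sex", "diagnosis", "site"], beta)), rmse)
```

In this setup, a successful harmonization would be expected to shrink the fitted site weight toward zero while leaving the age weight largely intact.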
4. DISCUSSION
In this work, we begin to explore how different harmonization techniques at the voxel- or feature-wise levels or the combination of the two impact the study of brain networks quantified by complex graph theory measures in aging. Individually, ComBat reduced variance to a greater extent than NSDN alone, but the largest reduction in variance came with NSDN followed by ComBat. Additionally, we reproduce prior work that found a significant positive correlation between modularity and age and found that models after ComBat alone or ComBat with NSDN were the most significant. These results support that ComBat is efficient at statistical harmonization of graph measures; however, NSDN offers additional synergistic impact.
We posit that this finding is likely due to (1) the evaluation of harmonization in the statistical realm where ComBat directly operates, and (2) the highly non-linear and layered processes that take DWI images through tractography, connectome generation, and finally graph measure computation. These complex processes may result in the harmonized data on the voxel-level being too far removed from graph measure computation, reducing the impact of voxel-wise harmonization or introducing the bias shifts observed presently. Additionally, voxel-wise harmonization techniques treat each voxel independently, which may not account for spatially localized scanner effects which can affect global processes such as tractography and subsequently graph measure computation. Thus, further investigation is needed to understand the relationship between voxel-wise and statistical harmonization throughout different points in the standard connectomics pipeline (i.e., model fitting, tractography, connectome generation, or graph measure computation).
One limitation of this work is that we only considered two harmonization techniques: voxel-wise and feature-wise. There is evidence that ComBat harmonization on connectivity matrices prior to graph measure computation may also be effective [8]. Another limitation is that NSDN was not trained with data specific to either the VMAP or BLSA sites; training on such data may improve its performance. Specifically, NSDN was not tuned to account for differences in b-values, as were present in the data used here. Retraining NSDN with more sites and variations in DWI acquisitions may strengthen its performance. The present study also used a small sample size, which may have contributed to the difficulty of resolving age effects in the non-modularity graph measures. A repeated study with larger sample sizes may result in more obvious improvement in identifying small age effects in modularity and other graph measures that have previously been reported [20]. Last, many other feature- and voxel-wise harmonization techniques exist. Notably, the LinearRISH framework [21] has become popular in recent years, and thus future studies should consider comparing its effects to ComBat as well in this population.
ACKNOWLEDGEMENTS
This work was conducted in part using the resources of the Advanced Computing Center for Research and Education at Vanderbilt University, Nashville, TN. Study data were obtained from the Vanderbilt Memory and Aging Project (VMAP). Data were collected by Vanderbilt Memory and Alzheimer’s Center Investigators at Vanderbilt University Medical Center. This work was supported by NIA grants R01-AG034962 (PI: Jefferson), R01-AG056534 (PI: Jefferson), R01-AG062826 (PI: Gifford), and Alzheimer’s Association IIRG-08-88733 (PI: Jefferson). This work was supported by the National Institutes of Health (NIH) under award numbers K01EB032989, K24-AG046373, K01-AG073584, and R01-AG034962, the National Science Foundation (NSF) under award number 2040462, the Alzheimer’s Association under award IIRG-08-88733, the Vanderbilt Clinical Translational Science Awards UL1-TR000445 and UL1-TR002243, and Vanderbilt’s High-Performance Computer Cluster for Biomedical Research under award S10-OD023680. This research was conducted with the support from the Intramural Research Program of the National Institute on Aging of the NIH. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or NSF.
REFERENCES
- [1] Caeyenberghs K et al., “Brain connectivity and postural control in young traumatic brain injury patients: A diffusion MRI based network analysis,” NeuroImage: Clinical, vol. 1, no. 1, p. 106, 2012, doi: 10.1016/J.NICL.2012.09.011.
- [2] Yuan W, Wade SL, and Babcock L, “Structural Connectivity Abnormality in Children with Acute Mild Traumatic Brain Injury using Graph Theoretical Analysis,” 2014, doi: 10.1002/hbm.22664.
- [3] Daianu M et al., “Breakdown of Brain Connectivity Between Normal Aging and Alzheimer’s Disease: A Structural k-Core Network Analysis,” doi: 10.1089/brain.2012.0137.
- [4] Bernhardt BC, Chen Z, He Y, Evans AC, and Bernasconi N, “Graph-Theoretical Analysis Reveals Disrupted Small-World Organization of Cortical Thickness Correlation Networks in Temporal Lobe Epilepsy,” Cerebral Cortex, vol. 21, no. 9, pp. 2147–2157, Sep. 2011, doi: 10.1093/CERCOR/BHQ291.
- [5] Dennis EL et al., “Changes in Anatomical Brain Connectivity Between Ages 12 and 30: A HARDI Study of 467 Adolescents and Adults,” Proceedings / IEEE International Symposium on Biomedical Imaging: From Nano to Macro, p. 904, 2012, doi: 10.1109/ISBI.2012.6235695.
- [6] Magnotta VA et al., “MultiCenter Reliability of Diffusion Tensor Imaging,” Brain Connectivity, vol. 2, no. 6, p. 345, Dec. 2012, doi: 10.1089/BRAIN.2012.0112.
- [7] Matsui JT, “Development of image processing tools and procedures for analyzing multi-site longitudinal diffusion-weighted imaging studies,” May 2014, doi: 10.17077/ETD.LRTGIVKX.
- [8] Onicas AI et al., “Multisite Harmonization of Structural DTI Networks in Children: An A-CAP Study,” Frontiers in Neurology, vol. 13, Jun. 2022, doi: 10.3389/FNEUR.2022.850642.
- [9] Fortin JP et al., “Harmonization of multi-site diffusion tensor imaging data,” Neuroimage, vol. 161, pp. 149–170, Nov. 2017, doi: 10.1016/J.NEUROIMAGE.2017.08.047.
- [10] Nath V et al., “Inter-Scanner Harmonization of High Angular Resolution DW-MRI using Null Space Deep Learning,” Lect Notes Monogr Ser, vol. 2019, no. 226249, p. 193, 2019, doi: 10.1007/978-3-030-05831-9_16.
- [11] Rubinov M and Sporns O, “Complex network measures of brain connectivity: uses and interpretations,” Neuroimage, vol. 52, no. 3, pp. 1059–1069, Sep. 2010, doi: 10.1016/J.NEUROIMAGE.2009.10.003.
- [12] Jefferson AL et al., “The Vanderbilt Memory & Aging Project: Study Design and Baseline Cohort Overview,” Journal of Alzheimer’s Disease, vol. 52, no. 2, pp. 539–559, Jan. 2016, doi: 10.3233/JAD-150914.
- [13] Ferrucci L, “The Baltimore Longitudinal Study of Aging (BLSA): A 50-year-long journey and plans for the future,” Journals of Gerontology - Series A Biological Sciences and Medical Sciences, vol. 63, no. 12, pp. 1416–1419, 2008, doi: 10.1093/GERONA/63.12.1416.
- [14] Cai LY et al., “PreQual: An automated pipeline for integrated preprocessing and quality assurance of diffusion weighted MRI images,” Magn Reson Med, vol. 86, no. 1, pp. 456–470, Jul. 2021, doi: 10.1002/MRM.28678.
- [15] “(ISMRM 2010) Improved Probabilistic Streamlines Tractography by 2nd Order Integration Over Fibre Orientation Distributions.” https://archive.ismrm.org/2010/1670.html (accessed Jun. 23, 2022).
- [16] Klein A and Tourville J, “101 labeled brain images and a consistent human cortical labeling protocol,” Frontiers in Neuroscience, vol. 0, no. DEC, p. 171, 2012, doi: 10.3389/FNINS.2012.00171/ABSTRACT.
- [17] Fischl FB, “FreeSurfer,” 2012, doi: 10.1016/j.neuroimage.2012.01.021.
- [18] “Jfortin1/ComBatHarmonization: Harmonization of multi-site imaging data with ComBat.” https://github.com/Jfortin1/ComBatHarmonization (accessed Jul. 26, 2022).
- [19] Dennis EL et al., “Changes in Anatomical Brain Connectivity Between Ages 12 and 30: A HARDI Study of 467 Adolescents and Adults,” Proceedings / IEEE International Symposium on Biomedical Imaging: From Nano to Macro, p. 904, 2012, doi: 10.1109/ISBI.2012.6235695.
- [20] Wang Y et al., “Longitudinal changes of connectomes and graph theory measures in aging,” Proc SPIE Int Soc Opt Eng, vol. 12032, p. 63, Mar. 2022, doi: 10.1117/12.2611845.
- [21] Kurokawa R et al., “Cross-scanner reproducibility and harmonization of a diffusion MRI structural brain network: A traveling subject study of multi-b acquisition,” Neuroimage, vol. 245, p. 118675, Dec. 2021, doi: 10.1016/J.NEUROIMAGE.2021.118675.
