Author manuscript; available in PMC: 2024 Nov 1.
Published in final edited form as: J Neuroimaging. 2023 Aug 16;33(6):941–952. doi: 10.1111/jon.13147

Inter-site brain MRI volumetric biases persist even in a harmonized multi-subject study of multiple sclerosis

Kelly A Clark 1,2, Carly M O’Donnell 1,2, Mark A Elliott 3, Shahamat Tauhid 4, Blake E Dewey 5, Renxin Chu 4, Samar Khalil 4, Govind Nair 6, Pascal Sati 7, Anna DuVal 5, Nicole Pellegrini 5, Amit Bar-Or 8,9, Clyde Markowitz 8,9, Matthew K Schindler 8,9, Jonathan Zurawski 4, Peter A Calabresi 5, Daniel S Reich 10, Rohit Bakshi 4,*, Russell T Shinohara 1,2,8,*; NAIMS Cooperative
PMCID: PMC10981935  NIHMSID: NIHMS1969748  PMID: 37587544

Abstract

Background and Purpose:

Multicenter study designs involving a variety of MRI scanners have become increasingly common. However, these designs risk biases in image-based measures due to scanner or site differences. To assess these biases, we acquired scan and rescan images of 11 volunteers with multiple sclerosis (MS) at 4 sites.

Methods:

Images were acquired on Siemens or Philips scanners at 3 tesla. Automated white matter lesion detection and whole brain, gray and white matter, and thalamic volumetry were performed, as well as expert manual delineations of T1 Magnetization-Prepared Rapid Acquisition Gradient Echo and T2-Fluid-Attenuated Inversion Recovery lesions. Random effect and permutation-based nonparametric modeling were performed to assess differences in estimated volumes within and across sites.

Results:

Random effect modeling demonstrated model assumption violations for most comparisons of interest. Non-parametric modeling indicated that site explained > 50% of the variation for most estimated volumes. This expanded to > 75% when data from both Siemens and Philips scanners were included. Permutation tests revealed significant differences between average inter- and intra-site differences in most estimated brain volumes (P < .05). The automatic activation of spine coil elements during some acquisitions resulted in a shading artifact in these images. Permutation tests revealed significant differences between thalamic volume measurements from acquisitions with and without this artifact.

Conclusion:

Differences in brain volumetry persisted across MR scanners despite protocol harmonization. These differences were not well explained by variance component modeling; however, statistical innovations for mitigating inter-site differences show promise in reducing biases in multi-center studies of MS.

Keywords: Multiple Sclerosis, MRI, White Matter Lesions

Introduction

Brain volumetry performed on MRI scans is common practice for monitoring disease status and progression in patients with multiple sclerosis (MS). Disease course and longitudinal outcomes are often assessed by determining changes in hypointense lesions on T1-weighted images and hyperintense lesions on T2-fluid-attenuated inversion recovery (FLAIR) images. In addition, volumetric changes across different brain structures are used to quantify atrophy.

Previous studies have highlighted the importance of imaging biomarkers in the diagnosis, management, and therapeutic trial investigation of MS. The formation of white matter lesions in the brain is an established hallmark of MS pathogenesis, and their presentation on MR images is employed in the differential diagnosis of MS from other disease mimics.1 The formation and persistence of T1-hypointense (“black hole”) lesions, which suggest axonal loss and tissue destruction, have been associated with disability and disease progression in MS patients.2–4 Brain volume changes, notably whole brain atrophy, which are quantified in measurements obtained from MR images, have been shown to be correlated with disability progression in patients with MS.5,6 Rates of gray matter atrophy have been shown to differ across different stages of MS,7 and both white and gray matter atrophy have been linked to the development of neuropsychological symptoms and cognitive impairment in patients with MS.8

Accurately estimating brain volumes and lesion load is crucial for defining disease status to evaluate individual patients and assess the efficacy of therapies in clinical trials; however, volumetric measurements have been shown to vary within a subject across repeated measurements even under ideal research conditions where scanning technique (equipment and pulse sequence parameters) and duration of follow-up are carefully controlled.9 Previous studies that explored site or scanner effect in either healthy people or people living with Alzheimer’s disease have reported mixed results, with some demonstrating that the use of multiple scanners has the potential to exacerbate biases in estimated average brain volumes and negatively affect the reproducibility of various measures across different sites and scanners,10,11 while others have demonstrated high reproducibility regardless of scanner.12–14

Site-related biases in brain volumes, and methods for mitigating such bias including protocol standardization and image harmonization, have been investigated in patients living with MS,15–17 yet much remains to be understood regarding the full extent of how site affects variation in images when MS pathology is present.

Multicenter studies involving a variety of MR scanners are becoming increasingly common to meet the needs of larger sample sizes and increased statistical power in research settings. Therefore, it is vital to understand expected intra- and inter-site differences and their impact on volumetry in people with MS to properly design clinical trials and accurately assess MRI results. Our published pilot study, which investigated scanner and methodological variation using a standardized protocol on one traveling participant with MS, revealed that greater than 50% of the variation of most brain region of interest (ROI) volumes could be explained by site and differences in lesion volumes across sites were as large as 25%. Additionally, systematic biases in brain volumes persisted between sites that used scanners from the same vendors.15

To further assess variation in brain volumes attributable to site, here we imaged 11 volunteers with stable MS using a harmonized protocol developed by the North American Imaging in Multiple Sclerosis (NAIMS) Cooperative at 4 different sites. The NAIMS Cooperative was established in 2012 to unite experts in the field of MS imaging to improve MS research and routine practices through several initiatives, including the creation of standard imaging protocols and the identification of reliable imaging markers to monitor disease and therapeutic outcomes.18,19 Here, we explore average inter- and intra-site differences across volumes for several different ROIs in the brain obtained using a variety of automated volumetry methods and manual lesion segmentation.

Methods

Participants

Eleven people with stable MS were recruited to participate in the study across all four sites. Nine of these participants were imaged at all 4 study centers, which included the University of Pennsylvania (Penn), the Brigham and Women’s Hospital (Brigham), the National Institutes of Health (NIH), and the Johns Hopkins University (JHU). The remaining two participants were imaged at 2 and 3 sites, respectively, due to a pause in research activity during the COVID-19 pandemic. The mean age of our 11 participants (4 male, 7 female) at time of enrollment was 38 (range 29–47) years. Expanded Disability Status Scale scores of the 10 participants who underwent clinical assessment ranged from 0 to 3 with a mean score of 1.9 and a median score of 2.25, and the median timed 25-foot walk was 5.6 seconds. All participants were receiving established disease modifying treatments (DMTs) at the time of study enrollment. DMTs were not changed for at least 6 months prior to enrollment. Each participant signed an informed consent form for this study, which was approved by the University of Pennsylvania’s institutional review board (IRB). Brigham, NIH, and JHU ceded to Penn’s central IRB through a reliance agreement.

Imaging

A standardized, high-resolution, 3-tesla (3T) MRI brain scan protocol developed by the NAIMS Cooperative was performed at each site. The protocol was compliant with recent internationally issued guidelines.19 Images were acquired on Siemens Skyra (Brigham, NIH), Siemens Prisma (Penn), or Philips Achieva (JHU) scanners. Scan-rescan image pairs were acquired on the same day at each visit and participants were removed from the scanner and repositioned between scans. The average length of time between scans acquired at the first and last site was 58 days (range 32–87). Three-dimensional (3D) T1-weighted and 3D T2-FLAIR images acquired at each site for a single participant are shown in Figure 1.

FIGURE 1:


Three-dimensional T1-weighted and T2-FLAIR images acquired during different sessions on different scanners for one participant. Penn, University of Pennsylvania; Brigham, The Brigham and Women’s Hospital; NIH, The National Institutes of Health; JHU, The Johns Hopkins University; FLAIR, fluid-attenuated inversion recovery.

Gradient non-linearity (GNL) has been shown to increase variation in upper cervical cord area measurements in people living with MS by as much as 15%, and GNL distortion correction methods have been shown to minimize this variation.20 An additional pair of geometric distortion-corrected scan and re-scan images was therefore reconstructed at each of the Siemens sites to compare with images without distortion correction. Relevant acquisition sequences used in this analysis are shown in Table 1. Differences in echo time, repetition time, and inversion time within sequences were necessary to minimize differences in images across scanner manufacturers.

Table 1: 3T brain MRI anatomic acquisition protocols (a)

| Parameter | 3D T2 FLAIR: Siemens Skyra | 3D T2 FLAIR: Siemens Prisma | 3D T2 FLAIR: Philips Achieva | 3D T1 MPRAGE: Siemens Skyra | 3D T1 MPRAGE: Siemens Prisma | 3D T1 MPRAGE: Philips Achieva |
| --- | --- | --- | --- | --- | --- | --- |
| Operation system version | syngo VE11C | syngo VE11C | R5.6.1 | syngo VE11C | syngo VE11C | R5.6.1 |
| Head coil | 32 channel | 32 channel | 32 channel | 32 channel | 32 channel | 32 channel |
| Acceleration factor for parallel imaging | 2 | 2 | 2.6 | 2 | 2 | 2 |
| Orientation | Sagittal | Sagittal | Sagittal | Sagittal | Sagittal | Sagittal |
| FOV (cm) | 25.6 × 25.6 | 25.6 × 25.6 | 25.6 × 25.6 | 25.6 × 25.6 | 25.6 × 25.6 | 25.6 × 25.6 |
| Matrix size | 256 × 256 | 256 × 256 | 256 × 256 | 256 × 256 | 256 × 256 | 256 × 240 |
| No. of sections | 176 | 176 | 176 | 176 | 176 | 176 |
| TR (ms) | 4800 | 4800 | 4800 | 1900 | 1900 | 2500 |
| TE (ms) | 352 | 352 | 305 | 2.52 | 2.52 | 3.14 |
| TI (ms) | 1800 | 1800 | 1650 | 900 | 900 | 900 |
| Flip angle (°) | 120 | 120 | 120 | 9 | 9 | 9 |
| Voxel size (mm) | 1.0 × 1.0 × 1.0 | 1.0 × 1.0 × 1.0 | 1.0 × 1.0 × 1.0 | 1.0 × 1.0 × 1.0 | 1.0 × 1.0 × 1.0 | 1.0 × 1.0 × 1.0 |
| Scan time (min:s) | 6:57 | 6:57 | 6:33 | 4:15 | 4:15 | 5:09 |
| No. of signal averages | 1 | 1 | 1 | 1 | 1 | 1 |

(a) The Brigham and Women’s Hospital and the National Institutes of Health used a Siemens Skyra scanner, the University of Pennsylvania used a Siemens Prisma scanner, and the Johns Hopkins University used a Philips Achieva scanner.

Abbreviations: TR = Repetition Time, TE = Echo Time, TI = Inversion Time, No. = Number, FOV = Field of View, FLAIR = Fluid-Attenuated Inversion Recovery, MPRAGE = Magnetization-Prepared Rapid Acquisition Gradient Echo, min = Minutes, s = Seconds, ms = Milliseconds

The unexpected activation of a single spine coil receive element due to head placement in the scanner during several imaging sessions at Penn and Brigham resulted in shading artifacts through the caudal areas of the images, as shown in Figure 2.

FIGURE 2:


Comparison of a pair of scan-rescan images acquired during the same session on the same scanner, in which head positioning activated a spine receive coil element in one image (right) but not in the other (left). Note the excess shading in the inferior anterior region of the image resulting from spine coil activation.

Manual Lesion Segmentation

Images were de-identified and manually assessed for identification of individual lesions and quantification/contouring of each lesion to derive total cerebral T1-hypointense lesion volume (T1LV) and T2-hyperintense lesion volume (T2LV) from the native 3D T1 and FLAIR images, respectively. This was performed by one experienced observer (S.T.) with a medical doctor degree and extensive experience in manual evaluation of MR scans of patients with MS; any challenging cases were verified by a senior lab director and neurologist (R.B.). For T2LV, all lesions on the FLAIR images were identified. For T1LV, lesions showing both hypointensity on T1-weighted images and at least partial hyperintensity on FLAIR images were marked. A semiautomated edge-finding tool in Jim (Version 7.0; http://www.xinapse.com/home.php) was then employed for delineation and volume estimation. De-identified images were pooled for the entire study, analyzed in one batch, and were presented in random order to reduce scan-to-scan memory effects. Estimated volumes obtained using Jim are shown in Figure 3.

FIGURE 3:


Manually measured T1-weighted and T2-FLAIR lesion volumes for scan-rescan pairs of images from 11 subjects at each of the four North American Imaging in Multiple Sclerosis sites. Scan 1 volumes are indicated by circles, and scan 2 volumes are shown with triangles. Each subject is represented by a different color, and points are offset from one another to aid in visualization. Penn, University of Pennsylvania; Brigham, The Brigham and Women’s Hospital; NIH, The National Institutes of Health; JHU, The Johns Hopkins University; FLAIR, fluid-attenuated inversion recovery; Subj, subject.

Pre-processing

Prior to automated segmentation, images underwent bias correction via nonuniform intensity normalization (N4ITK),21 and FLAIR images were rigidly aligned to the subject’s own corresponding T1-weighted image within a given scan session. Brain extraction was performed using Multi-Atlas Skull Stripping,22 and intensity normalization was performed using WhiteStripe prior to automated lesion segmentation.23

Automated Volumetry Methods

Fully automated white matter (WM) lesion segmentation was performed using the Method for Inter-Modal Segmentation Analysis (MIMoSA), a logistic regression-based WM lesion segmentation method that incorporates mean structure and local covariance structures across imaging modalities obtained by intermodal coupling.24 Normal appearing white matter and gray matter volumes were estimated using Joint Label Fusion (JLF), a multi-atlas segmentation method that utilizes weighted voting based on the voxel-level joint probability of a segmentation error occurring in pairs of atlases to minimize total labeling error.25 Whole brain volume was estimated using FMRIB’s Automated Segmentation Tool.26 Estimated thalamic volumes were obtained using JLF and FMRIB’s Integrated Registration and Segmentation Tool.27

Statistical Analysis

All statistical analyses and visualization were performed in R (version 4.1.1) (http://www.r-project.org/). Random effects modeling was performed using the lme4 package (version 1.1.27.1).28 Models utilizing a classical random intercept structure, which included a random effect term that nested site within subject (to account for the interaction between site and subject as well as subject-level clustering for each of our brain structures and volumetry methods), were used to assess site biases in estimated volumes. Validity of the random effects models was assessed visually using normal Q-Q plots of the residuals and random effects created using the stats package (version 4.1.1), residual versus fitted plots, and density plots of the random effects for each model using the ggplot2 package (version 3.3.5). The Shapiro-Wilk test was used to assess normality of the residuals for each model using the stats package.
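The paragraph above describes the random intercept structure in words only. One plausible way to write it, as a sketch under stated assumptions rather than the authors' confirmed specification, is shown here for the volume y measured for subject i at site j on scan k. The symbols and the variance-share expression are our notation and do not appear in the paper. In lme4 formula syntax, a model of this shape is usually written as volume ~ 1 + (1 | subject/site), where the variable names are hypothetical.

```latex
% Nested random-intercept model (notation assumed, not taken from the paper)
y_{ijk} = \mu + b_i + c_{ij} + \varepsilon_{ijk}, \qquad
b_i \sim \mathcal{N}\!\left(0, \sigma^2_{\mathrm{subject}}\right), \quad
c_{ij} \sim \mathcal{N}\!\left(0, \sigma^2_{\mathrm{site}}\right), \quad
\varepsilon_{ijk} \sim \mathcal{N}\!\left(0, \sigma^2_{\varepsilon}\right)

% Share of variance attributed to site within subject under this model
\rho_{\mathrm{site}} =
  \frac{\sigma^2_{\mathrm{site}}}
       {\sigma^2_{\mathrm{subject}} + \sigma^2_{\mathrm{site}} + \sigma^2_{\varepsilon}}
```

The normality and constant-variance assumptions on the random effects and the residuals in this formulation are the ones that the Q-Q plots, residual versus fitted plots, and Shapiro-Wilk tests are meant to check.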

Linear regression was performed using the stats package to model the relationship between site and estimated volumes within each subject for all brain structures and volumetry methods. The proportion of variation explained by site was computed via univariate regression with site as a predictor and volume as an outcome to assess the effect of site on estimated volumes within each subject from these models.
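As an illustration of the proportion of variation explained by site, the following minimal Python sketch computes R² for a single subject from a regression of volume on site as a categorical predictor. The authors worked in R, so this is not their code. The function name, the site labels, and the volumes are made up for the example. With site as the only predictor, the fitted value for each scan is its site mean, so R² equals one minus the within-site sum of squares divided by the total sum of squares.

```python
from statistics import fmean


def site_r_squared(volumes_by_site):
    """R^2 of a regression of volume on site (categorical) for one subject.

    volumes_by_site maps a site label to the volumes measured at that site,
    for example the scan and the rescan.
    """
    all_values = [v for values in volumes_by_site.values() for v in values]
    grand_mean = fmean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0:
        raise ValueError("volumes show no variation, so R^2 is undefined")
    # Fitted values are the site means, so residuals are deviations from them.
    ss_residual = sum(
        (v - fmean(values)) ** 2
        for values in volumes_by_site.values()
        for v in values
    )
    return 1.0 - ss_residual / ss_total


# Hypothetical volumes (mL) for one subject at three hypothetical sites.
example = {"A": [10.0, 10.2], "B": [11.0, 11.2], "C": [9.0, 9.2]}
r2 = site_r_squared(example)
print(f"R^2 for site: {r2:.3f}")
```

Repeating this per subject, structure, and method would give the spread of R² values that a boxplot can summarize.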

To complement the parametric modeling approach, mean absolute difference (MAD) across all subjects was computed within and across sites. This was done by computing the absolute differences between all possible pairs of intra-site measures and inter-site measures within each subject. Here we define inter-site differences as ABS(SiteK_scanx − SiteL_scanx) for all possible site pairs (K and L) and x = 1 or 2, and intra-site differences as ABS(SiteX_scan2 − SiteX_scan1) across all sites (X). These differences were then averaged across all subjects to obtain the mean absolute inter-site difference and the mean absolute intra-site difference, and the ratio of these two measures was calculated. Permutation testing was performed to assess whether the MADs of inter- and intra-site measures were significantly different, using the ratio of the inter-site MAD to the intra-site MAD as our test statistic. Under the null hypothesis, we would expect a ratio of 1, indicating that average inter-site and average intra-site differences are equal. Ten thousand permutations were performed, which involved shuffling all volumes blind to site and scan number within each subject across all ROIs, and the ratio of average absolute inter- and intra-site differences was computed after each permutation.
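The procedure in this paragraph can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' R code. It pools all differences across subjects before averaging, uses a one-sided p-value with the common add-one correction, and runs on made-up volumes for two hypothetical subjects at three hypothetical sites. The paper does not state how the p-value was computed from the permuted ratios.

```python
import math
import random
from itertools import combinations
from statistics import fmean


def mad_ratio(data):
    """Ratio of mean absolute inter-site to mean absolute intra-site difference.

    data maps subject -> {site: (scan1_volume, scan2_volume)}.
    """
    inter, intra = [], []
    for sites in data.values():
        for scan1, scan2 in sites.values():
            intra.append(abs(scan2 - scan1))
        # Inter-site pairs compare the same scan number x at two sites.
        for k, l in combinations(sorted(sites), 2):
            for x in (0, 1):
                inter.append(abs(sites[k][x] - sites[l][x]))
    intra_mean = fmean(intra)
    return math.inf if intra_mean == 0 else fmean(inter) / intra_mean


def permute_within_subject(data, rng):
    """Shuffle each subject's volumes across all (site, scan) slots."""
    shuffled = {}
    for subject, sites in data.items():
        names = sorted(sites)
        values = [v for name in names for v in sites[name]]
        rng.shuffle(values)
        shuffled[subject] = {
            name: (values[2 * i], values[2 * i + 1])
            for i, name in enumerate(names)
        }
    return shuffled


def permutation_p_value(data, n_perm=10_000, seed=0):
    """One-sided permutation p-value for the observed MAD ratio (assumed form)."""
    rng = random.Random(seed)
    observed = mad_ratio(data)
    hits = sum(
        mad_ratio(permute_within_subject(data, rng)) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical volumes (mL) with a deliberately strong site effect.
toy = {
    "s1": {"A": (100.0, 100.4), "B": (103.0, 103.3), "C": (97.0, 97.5)},
    "s2": {"A": (90.0, 90.2), "B": (93.1, 93.6), "C": (87.2, 87.3)},
}
observed, p_value = permutation_p_value(toy, n_perm=2000, seed=1)
print(f"inter/intra MAD ratio = {observed:.2f}, p = {p_value:.4f}")
```

Shuffling within subject keeps between-subject differences out of the null distribution, so the test isolates how volumes are arranged across sites and scans for the same person.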

Biases in estimated volumes associated with active coil elements were assessed by computing, for all ROIs and methods, the median absolute difference between estimated volumes for a) within-subject pairs of images that were acquired using the same active coil elements (namely, either both with or both without activation of a spine coil element) and b) within-subject pairs of images that were acquired using different active coil elements. These were then averaged across all subjects. Permutation testing was performed to assess whether the median absolute differences from measures acquired using the same and different active coil elements were significantly different, using the ratio of the median absolute difference across different-coil measures to that across same-coil measures as our test statistic. Ten thousand permutations were performed by shuffling coil labels within each site (Penn and Brigham), and the ratio described above was computed after each permutation.
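A hedged Python sketch of the coil statistic and the within-site label shuffle follows. The record layout, function names, site labels, and volumes are hypothetical. The sketch assumes that all within-subject pairs are compared regardless of site, and that subjects lacking either pair type are skipped; the paper does not spell out either detail.

```python
import math
import random
from itertools import combinations
from statistics import fmean, median


def coil_ratio(records):
    """Ratio of median absolute differences: different-coil over same-coil pairs.

    records is a list of (subject, site, spine_coil_active, volume) tuples.
    Medians are taken within each subject, then averaged across subjects.
    """
    by_subject = {}
    for subject, _site, active, volume in records:
        by_subject.setdefault(subject, []).append((active, volume))
    same_medians, diff_medians = [], []
    for scans in by_subject.values():
        same, diff = [], []
        for (a1, v1), (a2, v2) in combinations(scans, 2):
            (same if a1 == a2 else diff).append(abs(v1 - v2))
        if same and diff:  # assumption: skip subjects lacking either pair type
            same_medians.append(median(same))
            diff_medians.append(median(diff))
    if not same_medians:
        raise ValueError("no subject has both same-coil and different-coil pairs")
    same_mean = fmean(same_medians)
    return math.inf if same_mean == 0 else fmean(diff_medians) / same_mean


def shuffle_coil_labels(records, rng):
    """Permute the coil labels among the scans acquired at each site."""
    labels_by_site = {}
    for _subject, site, active, _volume in records:
        labels_by_site.setdefault(site, []).append(active)
    for labels in labels_by_site.values():
        rng.shuffle(labels)
    cursors = {site: iter(labels) for site, labels in labels_by_site.items()}
    return [
        (subject, site, next(cursors[site]), volume)
        for subject, site, _active, volume in records
    ]


# Hypothetical records: (subject, site, spine coil active?, volume in mL).
records = [
    ("s1", "P", True, 10.0), ("s1", "P", False, 10.8),
    ("s1", "Q", False, 10.9), ("s1", "Q", False, 11.0),
    ("s2", "P", True, 20.0), ("s2", "P", True, 20.2),
    ("s2", "Q", False, 21.0), ("s2", "Q", False, 21.4),
]
ratio = coil_ratio(records)
print(f"different-coil / same-coil ratio = {ratio:.2f}")
```

A permutation p-value would then be the share of ratios from repeatedly shuffled records that are at least as large as the observed ratio. Shuffles that leave no subject with both pair types raise an error here and would need explicit handling.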

Results

Visual inspection of the residuals and random effects from the random effect models indicated that most of the models violated the assumptions of normality and homogeneity of variance of the residuals, as exemplified in Figure 4, and that some models also violated the assumption of normally distributed random effects. This led to marked underestimation of the proportion of variance explained by site. In lieu of employing one of the many options for transforming the data, we opted for non-parametric permutation tests for robustness.

FIGURE 4:


Normal Q-Q and residual versus fitted plots for the random-effect model that includes thalamic volumes obtained using joint label fusion from distortion-corrected images acquired at all four sites as the response variable and a random effect term that nested site within subject. The heavy-tailed distribution of the normal Q-Q plot indicates nonnormal distribution of the residuals from the random-effect model. As the fitted values increase in the residual versus fitted plot, variance of the residuals also increases, indicating heteroskedasticity of the residuals.

The spread of the proportion of variation explained by site from linear models for each brain structure and volumetry method across all subjects is shown in Figure 5. Site explained > 50% of the variation in most subjects across most automated methods. More than 75% of the variation was explained by site in most subjects for most methods and brain structures when data from all scanner manufacturers were included in the models, indicating notable differences across scanner manufacturers.

FIGURE 5:


Proportion of variation explained by site in brain volumes extracted via various methods represented by different colors. Each panel indicates settings that either included or excluded data from the Philips scanner as well as the distortion correction settings. Each boxplot depicts the spread of R² values for site across all subjects. JLF, joint label fusion; FIRST, FMRIB’s Integrated Registration and Segmentation Tool; MIMoSA, Method for Inter-Modal Segmentation Analysis; MPRAGE, magnetization-prepared rapid acquisition gradient echo; FLAIR, fluid-attenuated inversion recovery; FAST, FMRIB’s Automated Segmentation Tool.

Ratios of average absolute inter- and intra-site differences are shown in Figure 6. Mean absolute differences (MADs) across pairs of inter-site measures were greater than those across pairs of intra-site measures for most volumetry methods and brain structures, with the largest ratios of average differences ranging from approximately 1.5 to 2.5 and occurring in subgroups that included data acquired from both scanner manufacturers. This indicates larger inter-site differences in volumes acquired on both Siemens and Philips scanners compared with those acquired on only Siemens scanners for most of the automated volumetry methods.

FIGURE 6:


Ratios of average absolute inter- and intrasite differences of volumes extracted via various methods (horizontal axis). Colors indicate different settings that either included or excluded data from the Philips scanner, as well as the distortion correction setting. The blue-dashed line represents where inter- and intrasite differences are equal; intersite differences were larger than intrasite differences for most measures. JLF, joint label fusion; FIRST, FMRIB’s Integrated Registration and Segmentation Tool; MIMoSA, Method for Inter-Modal Segmentation Analysis; MPRAGE, magnetization-prepared rapid acquisition gradient echo; FLAIR, fluid-attenuated inversion recovery; FAST, FMRIB’s Automated Segmentation Tool.

Average absolute difference between all possible inter-subject pairs of images was computed, and ratios of average absolute inter-subject and inter-site differences are shown in Figure 7. Average inter-subject differences were larger than average inter-site differences, ranging from approximately 1.5 to 10 times larger across all volumetry methods and brain structures, indicating that differences between subjects were of greater magnitude than inter-site differences.

FIGURE 7:


Ratios of average absolute intersubject and intersite differences of volumes extracted via various methods (horizontal axis). Colors indicate different settings that either included or excluded data from the Philips scanner, as well as the distortion correction setting. The blue-dashed line represents where intersubject and intersite differences are equal; intersubject differences were larger than intersite differences for all measures. JLF, joint label fusion; FIRST, FMRIB’s Integrated Registration and Segmentation Tool; MIMoSA, Method for Inter-Modal Segmentation Analysis; MPRAGE, magnetization-prepared rapid acquisition gradient echo; FLAIR, fluid-attenuated inversion recovery; FAST, FMRIB’s Automated Segmentation Tool.

Permutation testing was performed to assess whether the MADs of inter- and intra-site measures were significantly different, using the ratio of the inter-site MAD to the intra-site MAD as our test statistic. Under the null hypothesis, we would expect a ratio of 1, indicating that average inter-site and average intra-site differences are equal. Ten thousand permutations were performed, involving the shuffling of site labels within each person, and the ratio of average absolute inter- and intra-site differences was computed after each permutation. The resulting negative log (base 10) p-values from these permutation tests are shown in Figure 8. Significant p-values were observed for many of the automated volumetry methods and brain structures both before and after multiple comparison correction. Most of these corresponded to tests that included data from both Siemens and Philips sites, indicating that significant differences in volumes were most prevalent across different scanner manufacturers.
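To make the two significance thresholds concrete, the short Python sketch below converts p-values to the negative log (base 10) scale and flags which tests pass an unadjusted .05 threshold and a Bonferroni threshold. The test names and p-values are invented, and the number of tests is whatever is passed in; the paper does not report how many comparisons entered the correction.

```python
import math


def bonferroni_summary(p_values, alpha=0.05):
    """Summarize p-values on the -log10 scale with two significance flags.

    p_values maps a test name to its p-value (each must be > 0).
    The Bonferroni threshold is alpha divided by the number of tests.
    """
    threshold = alpha / len(p_values)
    return {
        name: {
            "neg_log10_p": -math.log10(p),
            "significant_unadjusted": p < alpha,
            "significant_bonferroni": p < threshold,
        }
        for name, p in p_values.items()
    }


# Hypothetical p-values for three hypothetical tests.
summary = bonferroni_summary({"test_a": 0.0001, "test_b": 0.03, "test_c": 0.5})
for name, row in summary.items():
    print(name, round(row["neg_log10_p"], 2),
          row["significant_unadjusted"], row["significant_bonferroni"])
```

If a permutation p-value is computed with the add-one correction, its smallest possible value with 10,000 permutations is 1/10,001, so the negative log (base 10) value cannot exceed about 4.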

FIGURE 8:


Negative log (base 10) p-values from permutation tests assessing the association between site and brain volume extracted via various methods (horizontal axis). Colors indicate different settings that either included or excluded data from the Philips scanner, as well as the distortion correction setting. The red-dashed line represents the unadjusted significance threshold of .05, and the blue-dashed line represents the significance threshold obtained from the Bonferroni correction method. Most measurements demonstrated significant site effects, with many settings surviving even conservative Bonferroni correction. JLF, joint label fusion; FIRST, FMRIB’s Integrated Registration and Segmentation Tool; MIMoSA, Method for Inter-Modal Segmentation Analysis; MPRAGE, magnetization-prepared rapid acquisition gradient echo; FLAIR, fluid-attenuated inversion recovery; FAST, FMRIB’s Automated Segmentation Tool.

To assess differences in estimated volumes associated with active coil elements during imaging, permutation testing was performed using the ratio of the median absolute difference across all pairs of within-subject measures from images acquired using different active coil elements to that from images acquired using the same active coil elements. The resulting negative log (base 10) p-values from these permutation tests are shown in Figure 9. Both distortion-corrected and non-distortion-corrected images showed multiple comparison-adjusted significant differences in JLF-measured thalamic volumes across receive coil modes (p < 0.01), potentially due to the proximity of the thalamus to the receive coil shading artifact and to JLF being more sensitive to global registration differences.

FIGURE 9:


Negative log (base 10) p-values from permutation tests assessing the association between active coil elements during imaging and brain volumes extracted via various methods (horizontal axis). Colors indicate the distortion correction setting. The red-dashed line represents the unadjusted significance threshold of .05, and the blue-dashed line represents the significance threshold obtained from the Bonferroni correction method. Thalamic measurements using JLF and MIMoSA lesion volumes demonstrated significant effects, with JLF thalamus results surviving Bonferroni correction. JLF, joint label fusion; FIRST, FMRIB’s Integrated Registration and Segmentation Tool; MIMoSA, Method for Inter-Modal Segmentation Analysis; MPRAGE, magnetization-prepared rapid acquisition gradient echo; FLAIR, fluid-attenuated inversion recovery; FAST, FMRIB’s Automated Segmentation Tool.

Discussion

The aim of this study was to quantify differences in brain segmentation volumes from patients with MS across different scanners and sites to further understand how site biases manifest in the presence of varying MS pathology. The use of a standardized high-resolution protocol at 3T across all sites was crucial in minimizing the effect of scanning parameters on variation in estimated volumes. Critically, all study participants were clinically stable at the time of inclusion and radiologically stable throughout the imaging period, and imaging was completed within approximately 3 months of each person’s initial imaging session to minimize the effect of biological variation on estimated volumes. Moreover, as previous studies have demonstrated an association between brain volumes and time of day and hydration status,29–31 imaging sessions were scheduled at a consistent time of day across all visits for each subject in order to control for this potential source of variation.

Biases in estimated volumes were assessed across several ROIs of key interest in MS using an array of automated methods to improve generalizability of our results. Despite these considerations, biases in brain volumes associated with site persisted and were found to be most notable across sites where different scanner vendors were used.

Further study is necessary to develop an appropriate parametric model to represent these site biases, since random and mixed effects modelling failed due to violations of classical statistical assumptions. Non-normality of the residuals and random effects from models that included a nested random effect to account for the interaction between site and subject led to underestimation of the variance in estimated volumes explained by site for most ROIs/methods considered in this analysis.

Head placement in the scanner resulted in the unanticipated activation of a spine receive coil element in 2 Siemens scanners, which produced a shading artifact unique to this occurrence.

Permutation testing revealed that, for thalamic volumes, the median absolute difference across pairs of estimated volumes from images where active coil elements differed during imaging was significantly greater than that from images acquired using the same active coil elements. These findings warrant further investigation of the effect of active receive coil elements on volumetry and suggest that consideration of the active receive coils employed, beyond the physical receive array device, and their settings in scanning protocols could reduce bias both within and across scanners.

This analysis involved a limited number of participants and a limited range of scanners, and we obtained research-quality images from a small number of large academic hospitals; our data therefore do not reflect the level of variation we might expect to see across additional participants, scanners, community hospitals, and independent radiology practices. Previous findings suggest short-term structural changes of the brain associated with different phases of the menstrual cycle and during pregnancy and post-partum, which were not accounted for in this study.32,33,34 Since previous studies indicate an association between anesthesia exposure and brain volume changes, surgical procedures and recent surgical history around the time of study activities should be considered in future analyses.35 Additionally, the stability of individual MR scanners over time was not considered in this study design, and longitudinal studies should be considered moving forward to elucidate any association between changes in scanner performance over time and imaging outcomes such as brain volumes. Further analysis involving a larger number of participants, scanners, vendors and versions, sites, and longitudinal imaging follow-up is warranted to assess generalizability of our results and to understand how site effect in our analysis relates to site effect in longitudinal clinical follow-up.

Imaging 11 people with stable MS within 3 months of their initial study visit at 4 different NAIMS sites allowed us to assess inter- and intra-site differences in brain volumetric measurements. Significant technical variation in estimated volumes due to site was present across most ROIs and automated methods despite careful protocol harmonization. Average inter-site differences were largest relative to intra-site differences in subgroups that included data from both Philips and Siemens scanners, indicating higher variability across scanner manufacturers compared to variability in estimated volumes from images acquired using the same scanner manufacturer. Automatic activation of a receive coil element in the spine coil on Siemens scanners contributed to significant biases in volumetric measurements.

Our results highlight the persistence of inter-site variation even when using a harmonized protocol and stress the need to account for inter-site variation in clinical and research settings, as it has the potential to confound study results. In cases where automated volumetry is used in clinical decision making, the effects of such variation should be considered when making treatment decisions. Further study is warranted to improve our understanding of site effect in people with MS and to develop methods to mitigate these site effects.

Acknowledgements and Disclosures

We acknowledge the contribution of the NINDS Neuroimmunology Clinic and the NIMH Functional Magnetic Resonance Facility. K.A. Clark, M.A. Elliott, C.M. O’Donnell, S. Tauhid, R. Chu, S. Khalil, P. Sati, A. DuVal, N. Pellegrini, and J. Zurawski has received research support from Novartis and I-Mab Biopharma. B.E. Dewey is supported by a post-doctoral fellowship from the National Multiple Sclerosis Society. C. Markowitz has received consulting fees from Alexion, Bayer Healthcare, Biogen, BMS/Celgene, Janssen/Actelion, EMD Serono, Novartis, Roche/Genentech, and Sanofi/Genzyme. A. Bar-Or participated as a speaker in meetings sponsored by and received consulting fees from Accure, Atara Biotherapeutics, Biogen, BMS/Celgene/Receptos, GlaxoSmithKline, Gossamer, Janssen/Actelion, Medimmune, Merck/EMD Serono, Novartis, Roche/Genentech, and Sanofi-Genzyme; and has received grant support to the University of Pennsylvania from Biogen Idec, Roche/Genentech, Merck/EMD Serono, and Novartis, unrelated to the current study. P.A. Calabresi is PI on grants to JHU from Annexon, Principia, and Genentech and has received consulting fees from Biogen, Disarm (now Lilly), and Avidea Technologies. G. Nair is supported by the intramural research program at the NINDS. D.S. Reich has Cooperative Research and Development Agreements with Abata Therapeutics, Sanofi Genzyme, and Vertex Pharmaceuticals, unrelated to the current study. R. Bakshi has received consulting fees from Bristol-Myers Squibb and EMD Serono and research support from Bristol-Myers Squibb, EMD Serono, and Novartis. R.T. Shinohara receives consulting fees from Octave Bioscience.

Funding:

Funding for this project was obtained from the National Institutes of Health (R01MH123550), the Intramural Research Program of NINDS, and the National Multiple Sclerosis Society (RG-1707-28586).

References

  • 1. Lublin FD, Reingold SC, Cohen JA, Cutter GR, Sørensen PS. Defining the clinical course of multiple sclerosis. Neurology 2014;83:278–86.
  • 2. Giorgio A, Stromillo ML, Bartolozzi ML, et al. Relevance of hypointense brain MRI lesions for long-term worsening of clinical disability in relapsing multiple sclerosis. Mult Scler 2014;20:214–9.
  • 3. Bagnato F. Evolution of T1 black holes in patients with multiple sclerosis imaged monthly for 4 years. Brain 2003;126:1782–9.
  • 4. Sahraian MA, Radue EW, Haller S, Kappos L. Black holes in multiple sclerosis: definition, evolution, and clinical correlations: Black holes in MS. Acta Neurol Scand 2009;122:1–8.
  • 5. Andravizou A, Dardiotis E, Artemiadis A, et al. Brain atrophy in multiple sclerosis: mechanisms, clinical relevance and treatment options. Auto Immun Highlights 2019;10:7.
  • 6. Zivadinov R, Uher T, Hagemeier J, et al. A serial 10-year follow-up study of brain atrophy and disability progression in RRMS patients. Mult Scler 2016;22:1709–18.
  • 7. Fisher E, Lee JC, Nakamura K, Rudick RA. Gray matter atrophy in multiple sclerosis: A longitudinal study: Gray Matter Atrophy in MS. Ann Neurol 2008;64:255–65.
  • 8. Sanfilipo MP, Benedict RHB, Weinstock-Guttman B, Bakshi R. Gray and white matter brain atrophy and neuropsychological impairment in multiple sclerosis. Neurology 2006;66:685–92.
  • 9. Morey RA, Selgrade ES, Wagner HR, Huettel SA, Wang L, McCarthy G. Scan-rescan reliability of subcortical brain volumes derived from automated segmentation. Hum Brain Mapp 2010;31:1751–62.
  • 10. Kruggel F, Turner J, Muftuler LT. Impact of scanner hardware and imaging protocol on image quality and compartment volume precision in the ADNI cohort. NeuroImage 2010;49:2123–33.
  • 11. Jovicich J, Czanner S, Han X, et al. MRI-derived measurements of human subcortical, ventricular and intracranial brain volumes: Reliability effects of scan sessions, acquisition sequences, data analyses, scanner upgrade, scanner vendors and field strengths. NeuroImage 2009;46:177–92.
  • 12. Cannon TD, Sun F, McEwen SJ, et al. Reliability of neuroanatomical measurements in a multisite longitudinal study of youth at risk for psychosis: Reliability of Multisite, Longitudinal MRI. Hum Brain Mapp 2014;35:2424–34.
  • 13. Bartzokis G, Mintz J, Marx P, et al. Reliability of in vivo volume measures of hippocampus and other brain structures using MRI. Magn Reson Imaging 1993;11:993–1006.
  • 14. Schnack HG, van Haren NEM, Hulshoff Pol HE, et al. Reliability of brain volumes from multicenter MRI acquisition: A calibration study. Hum Brain Mapp 2004;22:312–20.
  • 15. Shinohara RT, Oh J, Nair G, et al. Volumetric Analysis from a harmonized multisite brain MRI study of a single subject with Multiple Sclerosis. AJNR Am J Neuroradiol 2017;38:1501–9.
  • 16. Wrobel J, Martin ML, Bakshi R, et al. Intensity warping for multisite MRI harmonization. NeuroImage 2020;223:117242.
  • 17. Schwartz DL, Tagge I, Powers K, et al. Multisite reliability and repeatability of an advanced brain MRI protocol: Reliability of an Advanced Brain MRI Protocol. J Magn Reson Imaging 2019;50:878–88.
  • 18. Oh J, Bakshi R, Calabresi PA, et al. The NAIMS cooperative pilot project: Design, implementation and future directions. Mult Scler 2018;24:1770–2.
  • 19. Wattjes MP, Ciccarelli O, Reich DS, et al. 2021 MAGNIMS–CMSC–NAIMS consensus recommendations on the use of MRI in patients with multiple sclerosis. Lancet Neurol 2021;20:653–70.
  • 20. Papinutto N, Bakshi R, Bischof A, et al. Gradient nonlinearity effects on upper cervical spinal cord area measurement from 3D T1-weighted brain MRI acquisitions: Gradient Nonlinearity on UCCA Measurements From 3D T1-Weighted Brain MRI. Magn Reson Med 2018;79:1595–1601.
  • 21. Tustison NJ, Avants BB, Cook PA, et al. N4ITK: Improved N3 Bias Correction. IEEE Trans Med Imaging 2010;29:1310–20.
  • 22. Doshi J, Erus G, Ou Y, Gaonkar B, Davatzikos C. Multi-atlas skull-stripping. Acad Radiol 2013;20:1566–76.
  • 23. Shinohara RT, Sweeney EM, Goldsmith J, et al. Statistical normalization techniques for magnetic resonance imaging. NeuroImage: Clin 2014;6:9–19.
  • 24. Valcarcel AM, Linn KA, Vandekar SN, et al. MIMoSA: An automated method for intermodal segmentation analysis of Multiple Sclerosis brain lesions: Method For inter-modal segmentation analysis. J Neuroimaging 2018;28:389–98.
  • 25. Wang H, Yushkevich PA. Multi-atlas segmentation with joint label fusion and corrective learning—an open source implementation. Front Neuroinform 2013;7:27.
  • 26. Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging 2001;20:45–57.
  • 27. Patenaude B, Smith SM, Kennedy DN, Jenkinson M. A Bayesian model of shape and appearance for subcortical brain segmentation. NeuroImage 2011;56:907–22.
  • 28. Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Soft 2015;67:1–48.
  • 29. Kempton MJ, Ettinger U, Schmechtig A, et al. Effects of acute dehydration on brain morphology in healthy humans. Hum Brain Mapp 2009;30:291–8.
  • 30. Nakamura K, Brown RA, Narayanan S, Collins DL, Arnold DL. Diurnal fluctuations in brain volume: Statistical analyses of MRI from large populations. NeuroImage 2015;118:126–32.
  • 31. Nakamura K, Brown RA, Araujo D, Narayanan S, Arnold DL. Correlation between brain volume change and T2 relaxation time induced by dehydration and rehydration: Implications for monitoring atrophy in clinical studies. NeuroImage: Clin 2014;6:166–70.
  • 32. Hagemann G, Ugur T, Schleussner E, et al. Changes in brain size during the menstrual cycle. Eagleman DM, ed. PLoS One 2011;6:e14655.
  • 33. Meeker TJ, Veldhuijzen DS, Keaser ML, Gullapalli RP, Greenspan JD. Menstrual cycle variations in gray matter volume, white matter volume and functional connectivity: critical impact on parietal lobe. Front Neurosci 2020;14:594588.
  • 34. Oatridge A, Holdcroft A, Saeed N, et al. Change in brain size during and after pregnancy: study in healthy women and women with preeclampsia. AJNR Am J Neuroradiol 2002;23:19–26.
  • 35. Kapoor I, Mahajan C, Prabhakar H. Effect of anesthetic agent on brain volume: A transcranial sonographic assessment. J Anaesthesiol Clin Pharmacol 2021;37:139.
