Author manuscript; available in PMC: 2019 Nov 1.
Published in final edited form as: Magn Reson Imaging. 2018 Jul 23;53:105–111. doi: 10.1016/j.mri.2018.07.011

Scan-rescan repeatability and cross-scanner comparability of DTI metrics in healthy subjects in the SPRINT-MS multicenter trial

Xiaopeng Zhou a, Ken E Sakaie b, Josef P Debbins c, Sridar Narayanan d,e, Robert J Fox f, Mark J Lowe b
PMCID: PMC6138530  NIHMSID: NIHMS1502231  PMID: 30048675

Abstract

Purpose:

To assess intrascanner repeatability and cross-scanner comparability for diffusion tensor imaging (DTI) metrics in a multicenter clinical trial.

Methods:

DTI metrics (including longitudinal diffusivity [LD], fractional anisotropy [FA], mean diffusivity [MD], and transverse diffusivity [TD]) from pyramidal tracts for healthy controls were calculated from images acquired on twenty-seven 3T MR scanners (Siemens and GE) with 6 different scanner models and 7 different software versions as part of the NN102/SPRINT-MS clinical trial. Each volunteer underwent two scanning sessions on the same scanner. Signal-to-noise ratio (SNR) and signal-to-noise floor ratio (SNFR) were also assessed.

Results:

DTI metrics showed good scan-rescan repeatability. There were no significant differences between scans and rescans in LD, FA, MD, or TD values. Although the cross-scanner coefficient of variation (CV) values for all DTI metrics were <5.7%, significant differences were observed for LD (p< 3.3e-5) and FA (p< 0.0024) when GE scanners were compared with Siemens scanners. Significant differences were also observed for SNR when comparing GE scanners and Siemens Skyra scanners (p< 1.4e-7) and when comparing Siemens Skyra scanners and TIM Trio scanners (p< 1.0e-10). Analysis of background signal also demonstrated differences between GE and Siemens scanners in terms of signal statistics. The measured signal intensity from a background noise region of interest was significantly higher for GE scanners than for Siemens scanners (p< 1.2e-12). Significant differences were also observed for SNFR when comparing GE scanners and Siemens Skyra scanners (p< 2.5e-11), GE scanners and Siemens Trio scanners (p< 7.5e-11), and Siemens Skyra scanners and TIM Trio scanners (p< 2.5e-9).

Conclusions:

The good repeatability of the DTI metrics among the 27 scanners used in this study confirms the feasibility of combining DTI data from multiple centers using high angular resolution sequences. Our observations support the feasibility of longitudinal multicenter clinical trials using DTI outcome measures. The noise floor level and SNFR are important parameters that must be assessed when comparing studies that used different scanner models.

Keywords: multicenter, scan-rescan, reproducibility, DTI, noise floor, brain

1. Introduction

Advanced imaging modalities can characterize tissue injury and recovery throughout the brain, and derived measures can be attractive biomarkers of response to treatment of disorders affecting the central nervous system. Deploying advanced imaging methods in multicenter clinical trials is limited by differences in the implementation of these methods between scanner manufacturers, which in turn may lead to differing measurements across magnets. A better understanding of these differences would aid in the design of clinical trials that consider using advanced imaging measures.

The Secondary and Primary pRogressive Ibudilast Neuronext Trial in Multiple Sclerosis (SPRINT-MS) is a multicenter clinical trial using advanced MRI techniques to identify the treatment response to ibudilast in patients with secondary and primary progressive multiple sclerosis (MS)[1,2]. One of the goals in SPRINT-MS is to use diffusion tensor imaging (DTI) to evaluate the change in DTI-based measures as a biomarker of efficacy. DTI is an attractive technique in that its measures can be used to characterize tissue properties quantitatively and robustly. For this reason, DTI metrics are candidate biomarkers in many clinical studies, such as studies of cancer [3,4], MS [5], stroke [6], Huntington’s disease [7,8], and Alzheimer’s disease [9,10]. High angular resolution diffusion imaging (HARDI), which uses more than 50 diffusion gradient directions [11], provides a more robust calculation of DTI metrics than traditional DTI. However, only a few studies have assessed the reproducibility of DTI metrics using HARDI [12,13].

Although DTI and HARDI can provide attractive outcome metrics for multicenter studies [1,7,13,14], variability among scanners may reduce the potential benefits of using multiple centers to evaluate new therapies. In longitudinal studies, scanner software and hardware upgrades cannot be avoided, and scanners may be replaced with different models over the course of the study. Furthermore, many studies combine images from different scanners retrospectively. To confidently and objectively interpret data pooled from different MR scanners with variations in software and hardware, intrascanner and interscanner comparisons of DTI measurements are necessary. In this study, intrascanner comparisons were conducted on the same healthy volunteer under the same conditions (same scanner), and interscanner comparisons were conducted on different healthy volunteers under different conditions (different scanners). Repeatability refers to intrascanner comparisons, while reproducibility refers to interscanner comparisons, as defined by Bartlett et al. [15].

Several reproducibility and repeatability DTI studies have been conducted in multicenter validation studies involving human subjects [5,8,13,14,16–21]. These studies have demonstrated heterogeneous results, including comparable fractional anisotropy (FA) and diffusivity values obtained from 5 different 3T MR scanners and platforms [5], significant variation of FA across scanners [14], and consistent reproducibility for all DTI metrics on different scanners from all 3 major manufacturers [21]. Assessment of intrascanner and interscanner repeatability in healthy volunteers would allow researchers to more fully understand the differences seen in patients with neurological disease.

To this end, we sought to assess scan-rescan repeatability and to compare DTI metrics across various scanners in healthy volunteers from the SPRINT-MS study. The goal was to obtain results that could be used to assess differences in DTI metrics in future multicenter longitudinal studies.

2. Materials and methods

2.1. MRI scanners and subjects

SPRINT-MS involved 27 imaging centers with various 3T scanners manufactured by Siemens Healthcare (Erlangen, Germany) and GE Medical Systems (Waukesha, WI, USA) (Table 1). The 27 scanners included 11 Siemens TIM Trio, 6 Siemens Skyra, 1 GE Signa EXCITE, 7 GE Signa HDxt, 1 GE DISCOVERY MR750, and 1 GE DISCOVERY MR750W. The scanners had various software levels (Siemens, VB17 and VD13; GE, 12x, 15x, 16x, 23x, and 24x). Standard 20-, 12-, and 8-channel coils were required on Siemens Skyra, Siemens TIM Trio, and GE scanners, respectively. However, one GE Discovery MR750 site was limited to a 16-channel (HNS HEAD) coil. More detailed information can be found in Zhou et al. [2].

Table 1.

3T MR scanners involved in the SPRINT-MS trial (a less-detailed version of Table 1 in Zhou et al. [2]).

Site Manufacturer Model Software version Coil
1 GE DISCOVERY MR750 24x 32-channel head/HNS HEAD
2 GE DISCOVERY MR750W 23x Head 24/8HRBRAIN
3 GE Signa EXCITE 12x 8HRBRAIN
4,5 GE Signa HDxt 15x 8HRBRAIN
6–8, 10 GE Signa HDxt 16x 8HRBRAIN
9 GE Signa HDxt 23x 8HRBRAIN
11–16 Siemens Skyra Syngo MR D13 20-channel head-neck array
17–27 Siemens Trio Syngo MR B17 12-channel standard Siemens

A different healthy control was recruited at each imaging center and underwent two MR scan sessions at that center. Each subject provided informed consent prior to scanning. Ethical oversight was provided by the Massachusetts General Hospital IRB, to which all sites ceded IRB oversight. These scans were part of the qualification procedure that was performed at each site prior to the initiation of patient scans. A total of 27 volunteers aged 22 to 56 years (16 females, 11 males) participated in this study.

2.2. MR image acquisition

Imaging protocols were harmonized across sites to the extent permitted by each platform. Each subject scan session included comparable T1-weighted and DTI sequences across the different scanner platforms. After a subject was scanned with all MR sequences in a scan session, the subject left the scanner table and was then repositioned for the second scan session, where the same sequences used in the first session were repeated.

Whole-brain T1-weighted images were acquired with the following parameters: 192 axial slices, slice thickness = 1 mm, field of view = 256 × 256 mm2, repetition time/echo time = 20/6 ms, flip angle = 27°, and bandwidth = 160 Hz/pixel (Siemens) or 22.73 kHz (GE). The T1-weighted sequence required approximately 10 minutes for whole-brain coverage. HARDI protocols were generated for each system (flip angle = 90°, isotropic resolution = 2.5 mm, field of view = 255 mm × 255 mm × 150 mm, phase partial Fourier factor = 6/8; no parallel imaging). The HARDI sequence included 8 b = 0 volumes and 64 noncollinear diffusion-weighting (DW) gradients with b-value = 700 s/mm2, matching the profile used in previous research [22]. One GE Signa Excite scanner (scanner 3) could acquire only 55 DW volumes. Siemens TIM Trio scanners used twice-refocused spin-echo diffusion weighting, whereas Siemens Skyra scanners and GE scanners used Stejskal-Tanner diffusion weighting. HARDI required approximately 10 minutes for whole-brain coverage.

2.3. Quality assurance and quality control of in vivo images

In vivo images were scrutinized for data quality in terms of protocol adherence, subject motion, and artifacts. DICOM headers for all of the scans were checked for protocol compliance. Analysis of Functional NeuroImages (AFNI) [23] was used to qualitatively check for subject motion and artifacts. Datasets that failed quality control were rejected and repeat scans were acquired. Data quality assurance was normally completed within 24 hours of a scan.

2.4. Image processing and statistical analyses

T1-weighted images were used to generate a white matter mask for DTI analysis [24]. Seed regions of interest (ROIs) were manually drawn in the precentral gyrus on T1-weighted images. Brain extraction was applied to T1-weighted images using FSL BET [25] to enable coregistration between anatomical and DTI images. With BrainVISA software (http://www.brainvisa.info) [26], a split-brain tissue mask was generated from the high-resolution T1-weighted scan to prevent tracks from crossing between hemispheres. A white matter mask was produced by segmenting T1-weighted images using the FAST algorithm from the FSL library (www.fmrib.ox.ac.uk/fsl) [27] to exclude values from gray matter. The white matter mask was then transformed to DTI space using align_epi_anat from AFNI. Visual inspection confirmed alignment of the white matter mask in DTI space in the region encompassed by the corticospinal tract.

All DTI images passing quality control were first corrected for eddy current distortion and motion with Tortoise [28], using the first b = 0 volume as the reference. Although averaging the 8 b = 0 volumes could have boosted the SNR of the reference image, such averaging also incurs blurring: the b = 0 volumes would have to be coregistered to one another, and that coregistration involves interpolation, which could have a deleterious effect on the motion correction. DTI metrics (longitudinal diffusivity [LD], mean diffusivity [MD], FA, and TD) were then calculated on a voxel-by-voxel basis using in-house software [24].
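
The derived metrics follow standard definitions based on the eigenvalues of the fitted tensor. As an illustration only (not the in-house software of Lowe et al. [24]), a minimal per-voxel sketch assuming a simple log-linear least-squares tensor fit might look like the following, where `signals`, `b0`, `bvals`, and `bvecs` are hypothetical inputs:

```python
import numpy as np

def fit_tensor_metrics(signals, b0, bvals, bvecs):
    """Least-squares tensor fit for one voxel; returns FA, MD, LD (axial), TD (radial).

    signals : (N,) diffusion-weighted intensities
    b0      : non-diffusion-weighted intensity
    bvals   : (N,) b-values in s/mm^2
    bvecs   : (N, 3) unit gradient directions
    """
    gx, gy, gz = bvecs.T
    # Design matrix for the 6 unique tensor elements Dxx, Dyy, Dzz, Dxy, Dxz, Dyz
    B = -bvals[:, None] * np.column_stack(
        [gx * gx, gy * gy, gz * gz, 2 * gx * gy, 2 * gx * gz, 2 * gy * gz])
    y = np.log(np.clip(signals, 1e-6, None) / b0)
    d, *_ = np.linalg.lstsq(B, y, rcond=None)
    D = np.array([[d[0], d[3], d[4]],
                  [d[3], d[1], d[5]],
                  [d[4], d[5], d[2]]])
    lam = np.sort(np.linalg.eigvalsh(D))[::-1]   # eigenvalues, largest first
    md = lam.mean()                              # mean diffusivity
    ld = lam[0]                                  # longitudinal (axial) diffusivity
    td = lam[1:].mean()                          # transverse (radial) diffusivity
    fa = np.sqrt(1.5 * np.sum((lam - md) ** 2) / np.sum(lam ** 2))
    return fa, md, ld, td
```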

Seed ROIs for tracking were manually drawn in the precentral gyrus using the T1-weighted images and in the cerebral peduncle using FA images in DTI space. The precentral gyrus ROIs were then transferred to DTI space. Probabilistic tractography was performed on the corrected DTI images to identify pyramidal tracts passing both the cerebral peduncle and precentral gyrus [24]. DTI metrics (LD, MD, FA and TD) within the tracts were calculated using software developed in-house [24]. For each subject, values of FA from the first scan were compared against values acquired in the second scan using a paired Student t-test with a significance level of α = 0.05. The comparison was repeated for TD, MD, and LD.
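
As a sketch of this comparison, assuming the pairing is across subjects (one tract-averaged value per subject per session, 27 pairs) and using SciPy rather than the software actually used in the study:

```python
import numpy as np
from scipy import stats

def scan_rescan_ttest(scan1, scan2, alpha=0.05):
    """Paired Student t-test between scan and rescan values of one DTI metric."""
    t, p = stats.ttest_rel(scan1, scan2)
    return t, p, p < alpha

# Illustrative values only (not study data): tract-averaged FA for 27 subjects
rng = np.random.default_rng(0)
fa_scan1 = rng.normal(0.55, 0.02, size=27)
fa_scan2 = fa_scan1 + rng.normal(0.0, 0.005, size=27)
print(scan_rescan_ttest(fa_scan1, fa_scan2))
```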

The coefficient of variation (CV = 100 [SD/mean] %) values for DTI metrics were calculated to evaluate intrascanner scan-rescan repeatability. According to Marenco et al. [29], a CV value for an imaging metric below 10% is usually acceptable for assessing biological differences. The intrascanner CV values of each DTI metric were calculated for each scanner. Each value was calculated from a single healthy control, and a different subject was scanned at each site. The cross-scanner CV values for each DTI metric were also calculated for scan 1 and separately for scan 2 to assess cross-scanner comparability. The intraclass correlation coefficient (ICC) for each DTI metric was calculated as in Marenco et al. [29] to assess reproducibility. ICC values above 0.7 are indicative of high reproducibility [29].
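
A minimal sketch of these summary statistics follows, assuming the ICC is computed as a one-way random-effects ICC(1,1) on a 27-subject by 2-session array; the exact ICC variant in Marenco et al. [29] may differ:

```python
import numpy as np

def cv_percent(values):
    """Coefficient of variation, CV = 100 * SD / mean (%)."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()

def icc_one_way(data):
    """One-way random-effects ICC(1,1) for a subjects x sessions array
    (here, 27 subjects x 2 sessions); values above 0.7 were taken to
    indicate high reproducibility."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ms_between = k * ((data.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_within = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```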

For each dataset, we calculated the white matter signal and noise in each voxel among 8 non-diffusion-weighted (b = 0) volumes after applying a white matter mask. White matter signal and noise were measured by taking the mean and standard deviation on a voxel-by-voxel basis among the b = 0 volumes. Signal-to-noise ratio (SNR) was calculated as the ratio between white matter signal and noise.
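
A sketch of this voxelwise SNR estimate is shown below, assuming the eight b = 0 volumes are stacked along the first axis; summarizing the per-voxel ratios by their median is an assumption for illustration, not a detail stated in the protocol:

```python
import numpy as np

def white_matter_snr(b0_stack, wm_mask):
    """Voxelwise SNR from repeated b = 0 volumes within a white matter mask.

    b0_stack : (8, X, Y, Z) array of non-diffusion-weighted volumes
    wm_mask  : (X, Y, Z) boolean white matter mask
    """
    signal = b0_stack.mean(axis=0)[wm_mask]          # per-voxel mean across b = 0 repeats
    noise = b0_stack.std(axis=0, ddof=1)[wm_mask]    # per-voxel SD across b = 0 repeats
    return np.median(signal / noise)                 # summary SNR over white matter
```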

A study by Jones et al. [22] suggested that the noise floor might affect the calculation of DTI metrics. The signal in MR magnitude images follows a Rician distribution [22,30]. Jones et al. [22] referred to this noisy signal as the “rectified noise floor.” When SNR is low, the intensity of the diffusion-weighted signal will be overestimated due to noise floor effects [22], and the associated DTI metrics will demonstrate systematic error. The degree of systematic error depends on the size of the signal relative to the noise floor. We therefore calculated an empirical estimator of the importance of noise floor effects, the signal-to-noise floor ratio (SNFR), defined as the ratio of the signal in tissue to the noise floor. To estimate the noise floor, an ROI measuring 8 × 20 voxels was placed in a region outside of tissue and away from regions contaminated by Nyquist ghosting on all slices of the b = 0 images, and the mean signal within this ROI was taken as an estimate of the background signal.
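
A corresponding sketch of the noise floor and SNFR estimates, assuming a fixed in-plane background ROI; in practice, the 8 × 20 voxel ROI placement must be checked per dataset to avoid tissue and Nyquist ghosts:

```python
import numpy as np

def noise_floor_and_snfr(b0_stack, wm_mask, roi_slices=(slice(0, 8), slice(0, 20))):
    """Estimate the noise floor from a background ROI and the signal-to-noise floor ratio.

    b0_stack   : (8, X, Y, Z) non-diffusion-weighted volumes
    wm_mask    : (X, Y, Z) boolean white matter mask
    roi_slices : in-plane indices of an 8 x 20 voxel background ROI outside tissue
                 (placement away from Nyquist ghosts must be verified per dataset)
    """
    rx, ry = roi_slices
    noise_floor = b0_stack[:, rx, ry, :].mean()              # mean background signal, all slices
    tissue_signal = b0_stack.mean(axis=0)[wm_mask].mean()    # mean white matter signal
    return noise_floor, tissue_signal / noise_floor
```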

The measurements described above (each DTI metric, SNR, noise floor level, and SNFR) were divided into three groups: GE, Siemens Skyra, and Siemens TIM Trio. The rationale for choosing these groups is based on results from phantoms, in which differences between these groups exceeded those within these groups [2]. A one-way ANOVA with significance level α = 0.05 was used to test for significant differences among scanner groups for each DTI metric, SNR, noise floor level, and SNFR, followed by a Tukey-Kramer post-hoc test: GE/Siemens Skyra, GE/Siemens TIM Trio, and Siemens Skyra/Siemens TIM Trio. All statistical analyses were performed using MATLAB (MATLAB Release 2015a, The MathWorks, Inc., Natick, MA, USA).
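
The group comparison was run in MATLAB; an equivalent sketch in Python, using SciPy for the one-way ANOVA and statsmodels for the Tukey-Kramer post-hoc test (which handles the unequal group sizes), is:

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def compare_scanner_groups(values_by_group, alpha=0.05):
    """One-way ANOVA across scanner groups (GE, Siemens Skyra, Siemens TIM Trio)
    followed by Tukey-Kramer pairwise comparisons.

    values_by_group : dict mapping group name -> 1-D array of one measurement
                      (a DTI metric, SNR, noise floor level, or SNFR)
    """
    groups = list(values_by_group)
    f_stat, p_anova = f_oneway(*(values_by_group[g] for g in groups))
    endog = np.concatenate([values_by_group[g] for g in groups])
    labels = np.concatenate([[g] * len(values_by_group[g]) for g in groups])
    tukey = pairwise_tukeyhsd(endog, labels, alpha=alpha)
    return f_stat, p_anova, tukey.summary()
```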

3. Results

3.1. Intrascanner repeatability and interscanner comparability of DTI metrics

The DTI metrics from scans 1 and 2 are shown in Fig. 1. There were no significant differences between scans 1 and 2. Among the 27 scanners, the highest intrascanner CV values were 4.7% for FA, 2.5% for MD, 3.8% for TD, and 5.6% for LD (Fig. 2). Mean ± standard deviation of intrascanner CV was 1.63% ± 1.27% for FA, 0.76% ± 0.58% for MD, 1.28% ± 0.83% for TD, and 1.63% ± 1.27% for LD.

Fig. 1.

Diffusion tensor imaging measurements for (A) longitudinal diffusivity (LD), (B) fractional anisotropy (FA), (C) mean diffusivity (MD), and (D) transverse diffusivity (TD) from scan 1 (black) and scan 2 (white) on 27 scanners. The scanner information for each site is shown in Table 1.

Fig. 2.

Intrasite coefficient of variation (CV) values for (A) longitudinal diffusivity (LD), (B) fractional anisotropy (FA), (C) mean diffusivity (MD), and (D) transverse diffusivity (TD) are shown as blue bars. Cross-scanner CV values are shown as solid (scan 1) and dashed (scan 2) lines. The scanner information for each site is shown in Table 1.

The interscanner comparability showed that the cross-scanner CV values were 5.7% for FA, 2.9% for MD, 4.4% for TD, and 4.2% for LD for scan 1 and 5.6% for FA, 2.9% for MD, 4.6% for TD, and 3.8% for LD for scan 2 (solid and dashed lines, Fig. 2). The ICC values were 0.93 for FA, 0.94 for MD, 0.94 for TD, and 0.93 for LD.

3.2. DTI metrics across platforms

Fig. 3 shows the DTI metrics (LD, MD, TD, and FA) among the various scanner platforms. Significant differences were found for LD (p<2.5e-4, p<3.6e-4) and FA (p<0.02, p<0.005) when comparing GE scanners and Siemens Skyra scanners and when comparing GE scanners and Siemens TIM Trio scanners.

Fig. 3.

Comparison of diffusion tensor imaging metrics across scanner platforms. Horizontal bars show significant differences. LD = longitudinal diffusivity, FA = fractional anisotropy, MD = mean diffusivity, TD = transverse diffusivity.

3.3. SNR across platforms

Fig. 4 shows differences in SNR across scanner platforms. Significant differences were found for SNR when comparing GE scanners and Siemens Skyra scanners (p< 1.4e-7) and when comparing Siemens Skyra scanners and TIM Trio scanners (p< 1.0e-10). This result contrasts with the ANOVA analysis of DTI metrics (LD, MD, TD, and FA) shown in Fig. 3: significant differences were found for LD and FA when comparing GE scanners and Siemens Skyra scanners and when comparing GE scanners and Siemens Trio scanners, but there was no significant difference for any DTI metric when comparing Siemens Skyra scanners and TIM Trio scanners.

Fig. 4.

Comparison of signal-to-noise ratio (SNR) across scanner platforms. Horizontal bars show significant differences.

3.4. Noise floor level across platforms

Analysis of signal in the background demonstrated differences between GE and Siemens scanners in terms of signal statistics. The underlying noise parameter was different among all scanner models. The measured signal intensity from the background noise ROI was significantly higher for GE scanners than for Siemens scanners (p < 1.2e–12), as shown in Fig. 5. Furthermore, the statistical distribution of background noise was Rician for Siemens scanners and noncentral-chi for GE scanners.

Fig. 5.

Cross-scanner comparison of noise floor level. Bars show significant differences between scanners.

Fig. 6 shows differences in SNFR across scanner platforms. Significant differences were found for SNFR when comparing GE scanners and Siemens Skyra scanners (p< 2.5e-11), GE scanners and Siemens Trio scanners (p< 7.5e-11), and Siemens Skyra scanners and Trio scanners (p< 2.5e-9).

Fig. 6.

Cross-scanner comparison of signal-to-noise floor ratio (SNFR). Bars show significant differences between scanners.

4. Discussion

The goal of SPRINT-MS is to pool multicenter neuroimaging data to identify structural and microstructural changes in brain tissue in response to treatment with an experimental therapy. More specifically, one of the secondary outcomes is to compare longitudinal changes in TD along the pyramidal tracts between treatment groups. This study will help determine whether TD can be used as a biomarker in progressive MS clinical trials. To support this objective, we assessed the scan-rescan repeatability of DTI metrics and differences in DTI metrics among MR scanners in healthy controls. We reached two main conclusions: first, DTI metrics (FA, MD, TD, and LD) along pyramidal tracts showed good scan-rescan repeatability. Second, TD and MD were robust across multiple scanners. Statistically significant differences in FA and LD between scanner platforms, however, suggest caution. Should the differences be large compared to biologically-based changes, more care might be required to quantify these differences for inclusion in the overall analysis of the study.

There were no significant differences between scan 1 and scan 2, which suggests that there were no obvious difficulties with repeatability overall. Intrascanner and cross-scanner CV values were lower than the accepted threshold of 10% described by Marenco et al. [29] for all DTI parameters. Furthermore, the high ICC values we observed suggest good reproducibility among DTI metrics [29]. For each DTI metric, intrascanner CV at each site was lower than the cross-scanner CV among all sites, with one exception (LD at site 11). This comparison shows that intrascanner repeatability was higher than cross-scanner comparability.

The intrascanner and cross-scanner CV values we observed are generally comparable to values reported in previous studies using 3T scanners, despite differences in acquisition protocols and analysis methods (Table 2) [5,13,21]. For instance, Fox et al. [5] reported intrascanner and cross-scanner CV values for FA and TD that were comparable to our results despite the higher number of scanners in our study and our use of 64 diffusion-weighting directions with b = 700 s/mm2 versus 33 diffusion-weighting directions with b = 1000 s/mm2 in the previous study. In the study by Alger et al. [13], DTI sequences with 60 to 64 diffusion-weighting directions and b = 1300 or 1000 s/mm2 were used. Additionally, unmatched sequences across scanners were employed, and scanners from three major manufacturers (GE, Siemens, Philips) were assessed versus the two (GE, Siemens) assessed in our study. Similarly, Jovicich et al. [21] assessed scanners from GE, Siemens, and Philips using DTI sequences with 30 diffusion-weighting directions and b = 700 s/mm2. Using the same dataset as Jovicich et al. [21], Albi et al. [31] showed that CVs of FA and diffusivities were lower than 5.5% in the corticospinal tract with a free-water elimination method, suggesting that modifications to preprocessing can improve test-retest reproducibility.

Table 2.

MR scanners, DTI protocols, analysis methods, and intrascanner and interscanner CV values in previous studies using 3T MR scanners.

Study Scanners DTI parameters Analysis method Intrascanner CV Interscanner CV
Fox et al.[5] 5 scanners from Siemens, GE 33 DW directions, b = 1000 s/mm2, voxel size = 2.5 × 2.5 × 2.5 mm3 ROI 8.7% for FA, 4.6% for TD 6.8% for FA, 4.3% for TD
Alger et al. [13] 13 scanners from Siemens, GE, Philips 60–64 DW directions, b = 1300 or 1000 s/mm2, voxel size = 2 × 2 × 2, 1.6 × 1.6 × 2, or 1.6 × 1.6 × 4 mm3 ROI In left and right CST, 3.8% and 3.6% for FA, 1.2% and 2.2% for MD, 4.5% and 4.6% for TD In left and right CST, 5.2% and 5.4% for FA, 6.9% and 6.6% for MD, 8.7% and 9.0% for TD
Jovicich et al. [21] 10 scanners from Siemens, GE, Philips 30 DW directions, b = 700 s/mm2, voxel size = 2 × 2 × 2 mm3 ROI, TBSS Excluding MD, AD, and RD of CST, CV of FA, MD, AD, and RD < 10% N/A

DTI = diffusion tensor imaging, CV = coefficient of variation, DW = diffusion-weighting, ROI = region of interest, FA = fractional anisotropy, TD = transverse diffusivity, CST = corticospinal tract, MD = mean diffusivity, TBSS = tract-based spatial statistics, AD = axial diffusivity (or longitudinal diffusivity), RD = radial diffusivity, N/A = not available.

Our study demonstrated acceptable intrascanner repeatability and cross-scanner comparability for DTI metrics according to conventional standards. However, FA demonstrated higher CV values than MD, TD, and LD across sessions and sites. Furthermore, there were significant differences in LD and FA across scanner manufacturers that can be explained, in part, by differences between the GE and Siemens platforms. This result contrasts with those from previous studies, which demonstrated that FA was the most reproducible DTI metric [5,13,21]. This discrepancy may be the result of differences in acquisition protocols and analysis methods. With HARDI sequences, the number of diffusion-weighting directions should be higher than 50 to provide accurate DTI measures [11]. Although Alger et al. [13] used a number of diffusion-weighting directions similar to that in our study, the other two studies [5,21] used fewer diffusion-weighting directions. Furthermore, DTI metrics in our study were calculated along pyramidal tracts generated with probabilistic tractography, which differs from calculating DTI metrics in selected ROIs [5,13,21] or through tract-based spatial statistics (TBSS) analysis [21]. Although TBSS analysis is fully automated [32], a robust FA template and accurate coregistration of FA images between subjects are necessary for good results [33]. TBSS may not provide accurate estimates in crossing-fiber regions [33,34]. Hand-drawn ROIs [5,13] may introduce intra- and inter-rater differences [32]. With the probabilistic tractography method used in our study, seed and target ROIs were also manually determined. However, with HARDI sequences, probabilistic tractography can reliably delineate fiber pathways in the presence of crossing fibers [11]. Hagler et al. [35] showed a striking qualitative difference between manually selected fibers and probabilistic tractography-derived fiber masks, although the average measures of FA for atlas-derived and manually selected fibers were highly correlated.

Considering the significant differences seen in LD and FA among scanner platforms, strategies accounting for these systematic differences may be necessary for multicenter studies. These strategies may include using statistical models (e.g., with a random-effects term accounting for scanner), calibrating datasets with correction factors, and limiting analysis to scanners from the same vendor or to one type of scanner [16,36]. For instance, Zhu et al. [16] integrated datasets from three scanners with similar configurations from the same vendor by taking site-dependent effects into account with weighting statistics. Pohl et al. [36] harmonized multicenter data first by correcting each DTI metric with a scanner-specific factor and then by using statistical models to account for potential demographic and clinical effects. However, because multicenter clinical trials typically follow each patient longitudinally on the same scanner, these systematic biases may not appreciably affect the use of DTI in such studies. Although not ideal, our analyses show that in many cases, subjects can be scanned on different scanners with little impact on the derived DTI measures.

Measurements of DTI parameters across imaging platforms can be affected by many factors, such as subject position, motion, biological variability, autoshimming before each sequence, imaging parameters and sequence, head coils, software versions, and reconstruction algorithms. SNR is a parameter that can reflect some of these various factors and is known to account for systematic differences in DTI parameters [37]. However, the patterns of significant differences in SNR among scanners in our study did not match the ANOVA analysis of DTI metrics: there were significant differences for LD and FA but no significant difference for SNR when comparing GE and TIM Trio scanners, and there was a significant difference for SNR but no significant differences for DTI metrics when comparing Skyra and TIM Trio scanners. These results indicate better consistency for DTI metrics when using scanners from the same manufacturer, suggesting that there may be intrinsic manufacturer-specific differences. Furthermore, differences in SNR alone may not lead to the differences seen in DTI metrics across scanners. Although systematic differences in DTI parameters were previously reported when the SNR was lower than 30 [37], the SNR values in our study were higher than 30 (with the exception of one site). Therefore, additional strategies accounting for these differences in DTI metrics may be necessary when combining DTI data from various MR manufacturers.

In our study, the noise floor statistics were noncentral-chi for GE scanners and Rician for Siemens scanners. This may be explained by the difference in coil combination modes between GE (sum-of-squares) and Siemens (adaptive combine) scanners [38]. However, this difference in combination mode can only partially account for the significantly higher noise floor seen with GE scanners (sum-of-squares) compared with Siemens scanners, because the noise floor for GE scanners in this study was clearly higher than that reported for a Siemens TIM Trio scanner with sum-of-squares combination in Sakaie et al. [38]. Other manufacturer-specific factors (such as amplifier gain) may contribute to this higher noise floor for GE scanners. Therefore, we also calculated SNFR to limit the influence of manufacturer-specific factors. The significant differences between scanners in noise floor and SNFR may account for the significant differences we observed in LD and FA. It has been shown that increasing the noise floor results in lower eigenvalues [38]. Among the three eigenvalues, the first is affected more than the two minor eigenvalues because higher diffusivity causes more signal attenuation, which brings the diffusion-weighted signal closer to the noise floor. Therefore, LD (the first eigenvalue) may be biased by high noise floor levels. This may also explain the significant differences seen in FA values, as FA incorporates LD. Although there was a significant difference in SNFR between Siemens TIM Trio scanners and Siemens Skyra scanners, there was no significant difference between these scanner models in DTI metrics, which suggests that the derived DTI metrics may not be affected when SNFR is above a certain threshold. These results indicate that noise floor and SNFR must be considered when combining multicenter datasets from various scanner models.
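
To illustrate this mechanism, a small simulation (illustrative assumptions only: monoexponential decay along a single gradient direction, Rician magnitude noise, a deliberately low SNR so the effect is visible, and roughly plausible white matter diffusivities) shows that the apparent diffusivity recovered from magnitude data is biased downward more strongly when the true diffusivity, and hence the signal attenuation, is higher:

```python
import numpy as np

rng = np.random.default_rng(1)
b = 700.0                      # s/mm^2, as in the HARDI protocol
s0, sigma = 100.0, 12.0        # illustrative b = 0 signal and noise level (low SNR on purpose)
n_rep = 100000

def apparent_diffusivity(d_true):
    """Mean apparent diffusivity recovered from Rician magnitude data."""
    s_true = s0 * np.exp(-b * d_true)
    # Rician magnitude: add complex Gaussian noise, then take the modulus
    noisy = np.abs(s_true + rng.normal(0, sigma, n_rep) + 1j * rng.normal(0, sigma, n_rep))
    return -np.log(noisy.mean() / s0) / b

for d_true in (0.7e-3, 1.7e-3):   # roughly transverse vs longitudinal WM diffusivity, mm^2/s
    d_app = apparent_diffusivity(d_true)
    print(f"true D = {d_true:.2e}  apparent D = {d_app:.2e}  "
          f"bias = {100 * (d_app - d_true) / d_true:+.1f}%")
```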

This study was limited by the fact that only one subject’s data were used from each site, the scan-rescan interval was short, and the same subjects were not scanned at different sites. Use of a different subject at each site imposes the strong assumption that differences between sites are due to hardware rather than biological differences. Ideally, a number of subjects would be scanned at each site to disentangle hardware-related contributions to variability from those inherent to the subjects, but such a study was cost-prohibitive for a 27-site trial. Palacios et al. [32] scanned one traveling volunteer on 13 MRI scanners and reported cross-scanner CV values comparable to those found here. That the traveling volunteer study was limited to a single subject attests to the difficulty of performing such a study. The short scan-rescan interval and the lack of subjects scanned at multiple sites make it difficult to differentiate between scanner-associated variation and differences caused by biological variation between subjects and over time. Although cross-scanner differences reflect both differences among instruments and biological differences, CV and ICC values reached acceptable levels. Another limitation of our study is that we evaluated reproducibility only in the pyramidal tracts. Focus was placed on these tracts because change over time of DTI metrics in these tracts was chosen as a key outcome in the original data analysis plan for the SPRINT-MS trial. Future work will examine other pathways to assess the generalizability of these findings.

Quantification of scanner-related variability is important when that variability is large enough to affect the outcome of a study. Ideally, a large number of subjects would be scanned at all sites, but the cost of such measurements can be prohibitive. Given that variability is generally higher between scanner platforms than among scanners of the same type, we recommend a focused effort, in which several of the same subjects are scanned on a limited but representative sample of scanner types, to quantify scanner variability.

5. Conclusion

In this study, we sought to assess intrascanner repeatability and cross-scanner comparability of DTI metrics within a large-scale multicenter clinical trial that included 27 scanners from two major MRI scanner manufacturers. The good repeatability of DTI metrics across the scanners confirms the feasibility of combining multicenter DTI data acquired with different scanners, software versions, and pulse sequences. Our results support the feasibility of using DTI in multicenter clinical trials. Despite subtle differences among the subjects imaged on each scanner, our results suggest that cross-scanner comparability of DTI measures in healthy volunteers can provide insight regarding appropriate statistical corrections when multicenter data are pooled, thus allowing for increased precision in statistical power estimates.

Acknowledgments

We gratefully acknowledge NeuroRx Research for providing the MRI acquisition and procedures manual.

We would like to thank Megan Griffiths, scientific writer for the Imaging Institute, Cleveland Clinic, Cleveland, Ohio, for her editorial assistance.

Funding: This research was supported by NIH 1U01NS082329-01A1 (NINDS) and RG 4778-A-6 (National MS Society).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • [1]. Fox RJ, Coffey CS, Cudkowicz ME, Gleason T, Goodman A, Klawiter EC, et al. Design, rationale, and baseline characteristics of the randomized double-blind phase II clinical trial of ibudilast in progressive multiple sclerosis. Contemp Clin Trials 2016;50:166–77.
  • [2]. Zhou X, Sakaie KE, Debbins JP, Kirsch JE, Tatsuoka C, Fox RJ, et al. Quantitative quality assurance in a multicenter HARDI clinical trial at 3T. Magn Reson Imaging 2017;35:81–90.
  • [3]. Padhani AR, Liu G, Koh DM, Chenevert TL, Thoeny HC, Takahara T, et al. Diffusion-weighted magnetic resonance imaging as a cancer biomarker: consensus and recommendations. Neoplasia 2009;11:102–25.
  • [4]. Huo J, Alger J, Kim H, Brown M, Okada K, Pope W, et al. Between-scanner and between-visit variation in normal white matter apparent diffusion coefficient values in the setting of a multi-center clinical trial. Clin Neuroradiol 2016;26:423–30.
  • [5]. Fox RJ, Sakaie K, Lee JC, Debbins JP, Liu Y, Arnold DL, et al. A validation study of multicenter diffusion tensor imaging: reliability of fractional anisotropy and diffusivity values. AJNR Am J Neuroradiol 2012;33:695–700.
  • [6]. Song J, Nair VA, Young BM, Walton LM, Nigogosyan Z, Remsik A, et al. DTI measures track and predict motor function outcomes in stroke rehabilitation utilizing BCI technology. Front Hum Neurosci 2015;9:195.
  • [7]. Müller HP, Gron G, Sprengelmeyer R, Kassubek J, Ludolph AC, Hobbs N, et al. Evaluating multicenter DTI data in Huntington’s disease on site specific effects: an ex post facto approach. NeuroImage Clin 2013;2:161–7.
  • [8]. Cole JH, Farmer RE, Rees EM, Johnson HJ, Frost C, Scahill RI, et al. Test-retest reliability of diffusion tensor imaging in Huntington’s disease. PLoS Curr 2014;6. doi: 10.1371/currents.hd.f19ef63fff962f5cd9c0e88f4844f43b.
  • [9]. Teipel SJ, Wegrzyn M, Meindl T, Frisoni G, Bokde AL, Fellgiebel A, et al. Anatomical MRI and DTI in the diagnosis of Alzheimer’s disease: a European multicenter study. J Alzheimers Dis 2012;31 Suppl 3:S33–47.
  • [10]. Keihaninejad S, Zhang H, Ryan NS, Malone IB, Modat M, Cardoso MJ, et al. An unbiased longitudinal analysis framework for tracking white matter changes using diffusion tensor imaging with application to Alzheimer’s disease. Neuroimage 2013;72:153–63.
  • [11]. Berman JI, Lanza MR, Blaskey L, Edgar JC, Roberts TP. High angular resolution diffusion imaging probabilistic tractography of the auditory radiation. AJNR Am J Neuroradiol 2013;34:1573–8.
  • [12]. Huang L, Wang X, Baliki MN, Wang L, Apkarian AV, Parrish TB. Reproducibility of structural, resting-state BOLD and DTI data between identical scanners. PLoS One 2012;7:e47684.
  • [13]. Alger JR, Ellingson BM, Ashe-McNalley C, Woodworth DC, Labus JS, Farmer M, et al. Multisite, multimodal neuroimaging of chronic urological pelvic pain: methodology of the MAPP Research Network. NeuroImage Clin 2016;12:65–77.
  • [14]. Teipel SJ, Reuter S, Stieltjes B, Acosta-Cabronero J, Ernemann U, Fellgiebel A, et al. Multicenter stability of diffusion tensor imaging measures: a European clinical and physical phantom study. Psychiatry Res 2011;194:363–71.
  • [15]. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466–75.
  • [16]. Zhu T, Hu R, Qiu X, Taylor M, Tso Y, Yiannoutsos C, et al. Quantification of accuracy and precision of multi-center DTI measurements: a diffusion phantom and human brain study. Neuroimage 2011;56:1398–411.
  • [17]. Vollmar C, O’Muircheartaigh J, Barker GJ, Symms MR, Thompson P, Kumari V, et al. Identical, but not the same: intra-site and inter-site reproducibility of fractional anisotropy measures on two 3.0T scanners. Neuroimage 2010;51:1384–94.
  • [18]. Venkatraman VK, Gonzalez CE, Landman B, Goh J, Reiter DA, An Y, et al. Region of interest correction factors improve reliability of diffusion imaging measures within and across scanners and field strengths. Neuroimage 2015;119:406–16.
  • [19]. Grech-Sollars M, Hales PW, Miyazaki K, Raschke F, Rodriguez D, Wilson M, et al. Multi-centre reproducibility of diffusion MRI parameters for clinical sequences in the brain. NMR Biomed 2015;28:468–85.
  • [20]. Hobbs NZ, Cole JH, Farmer RE, Rees EM, Crawford HE, Malone IB, et al. Evaluation of multi-modal, multi-site neuroimaging measures in Huntington’s disease: baseline results from the PADDINGTON study. NeuroImage Clin 2012;2:204–11.
  • [21]. Jovicich J, Marizzoni M, Bosch B, Bartres-Faz D, Arnold J, Benninghoff J, et al. Multisite longitudinal reliability of tract-based spatial statistics in diffusion tensor imaging of healthy elderly subjects. Neuroimage 2014;101:390–403.
  • [22]. Jones DK, Basser PJ. “Squashing peanuts and smashing pumpkins”: how noise distorts diffusion-weighted MR data. Magn Reson Med 2004;52:979–93.
  • [23]. Cox RW. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res 1996;29:162–73.
  • [24]. Lowe MJ, Beall EB, Sakaie KE, Koenig KA, Stone L, Marrie RA, et al. Resting state sensorimotor functional connectivity in multiple sclerosis inversely correlates with transcallosal motor pathway transverse diffusivity. Hum Brain Mapp 2008;29:818–27.
  • [25]. Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H, et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 2004;23 Suppl 1:S208–19.
  • [26]. Cointepas Y, Mangin JF, Garnero L, Poline JB, Benali H. BrainVISA: software platform for visualization and analysis of multi-modality brain data. Neuroimage 2001;13:S98.
  • [27]. Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging 2001;20:45–57.
  • [28]. Pierpaoli C, Walker L, Irfanoglu MO, Barnett AS, Chang LC, Koay CG, et al. TORTOISE: an integrated software package for processing of diffusion MRI data. Proc Intl Soc Mag Reson Med 2010;18:1597.
  • [29]. Marenco S, Rawlings R, Rohde GK, Barnett AS, Honea RA, Pierpaoli C, et al. Regional distribution of measurement error in diffusion tensor imaging. Psychiatry Res 2006;147:69–78.
  • [30]. Gudbjartsson H, Patz S. The Rician distribution of noisy MRI data. Magn Reson Med 1995;34:910–4.
  • [31]. Albi A, Pasternak O, Minati L, Marizzoni M, Bartres-Faz D, Bargallo N, et al. Free water elimination improves test-retest reproducibility of diffusion tensor imaging indices in the brain: a longitudinal multisite study of healthy elderly subjects. Hum Brain Mapp 2017;38:12–26.
  • [32]. Palacios EM, Martin AJ, Boss MA, Ezekiel F, Chang YS, Yuh EL, et al. Toward precision and reproducibility of diffusion tensor imaging: a multicenter diffusion phantom and traveling volunteer study. AJNR Am J Neuroradiol 2017;38:537–45.
  • [33]. Jones DK, Cercignani M. Twenty-five pitfalls in the analysis of diffusion MRI data. NMR Biomed 2010;23:803–20.
  • [34]. Otte WM, Dijkhuizen RM, van Meer MP, van der Hel WS, Verlinde SA, van Nieuwenhuizen O, et al. Characterization of functional and structural integrity in experimental focal epilepsy: reduced network efficiency coincides with white matter changes. PLoS One 2012;7:e39078.
  • [35]. Hagler DJ Jr, Ahmadi ME, Kuperman J, Holland D, McDonald CR, Halgren E, et al. Automated white-matter tractography using a probabilistic diffusion tensor atlas: application to temporal lobe epilepsy. Hum Brain Mapp 2009;30:1535–47.
  • [36]. Pohl KM, Sullivan EV, Rohlfing T, Chu W, Kwon D, Nichols BN, et al. Harmonizing DTI measurements across scanners to examine the development of white matter microstructure in 803 adolescents of the NCANDA study. Neuroimage 2016;130:194–213.
  • [37]. Pierpaoli C, Basser PJ. Toward a quantitative assessment of diffusion anisotropy. Magn Reson Med 1996;36:893–906.
  • [38]. Sakaie K, Lowe M. Retrospective correction of bias in diffusion tensor imaging arising from coil combination mode. Magn Reson Imaging 2017;37:203–8.
