A Comprehensive Reliability Assessment of Quantitative Diffusion Tensor Tractography

Jun Yi Wang; Hervé Abdi; Khamid Bakhadirov; Ramon Diaz-Arrastia; Michael D Devous, Sr

doi:10.1016/j.neuroimage.2011.12.062

Author manuscript; available in PMC: 2013 Apr 2.

Published in final edited form as: Neuroimage. 2011 Dec 29;60(2):1127–1138. doi: 10.1016/j.neuroimage.2011.12.062

PMCID: PMC3468740 NIHMSID: NIHMS358026 PMID: 22227883

Abstract

Diffusion tensor tractography is increasingly used to examine structural connectivity in the brain in various conditions, but its test-retest reliability is understudied. The main purposes of this study were to evaluate 1) the reliability of quantitative measurements of diffusion tensor tractography and 2) the effect on reliability of the number of gradient sampling directions and scan repetition. Images were acquired from ten healthy participants. Ten fiber regions of nine major fiber tracts were reconstructed and quantified using six fiber variables. Intra- and inter-session reliabilities were estimated using the intraclass correlation coefficient (ICC) and coefficient of variation (CV), and compared to pinpoint major error sources. Additional pairwise comparisons were made between the reliability of images with 30 directions and NEX 2 (DTI30-2), 30 directions and NEX 1 (DTI30-1), and 15 directions and NEX 2 (DTI15-2) to determine whether increasing gradient directions and scan repetition improved reliability. Of the 60 tractography measurements, 43 showed intersession CV ≤ 10%, ICC ≥ .70, or both for DTI30-2, 40 measurements for DTI30-1, and 37 for DTI15-2. Most of the reliable measurements were associated with the tracts corpus callosum, cingulum, cerebral peduncular fibers, uncinate fasciculus, and arcuate fasciculus. These reliable measurements included fractional anisotropy (FA) and mean diffusivity of all 10 fiber regions. Intersession reliability was significantly worse than intra-session reliability for FA, mean length, and tract volume measurements from DTI15-2, indicating that the combination of MRI signal variation and physiological noise/change over time was the major error source for this sequence. Increasing the number of gradient directions from 15 to 30 while controlling the scan time significantly affected values for all six variables and reduced intersession variability for mean length and tract volume measurements. Additionally, while increasing scan repetition from 1 to 2 had no significant effect on the reliability for DTI with 30 directions, it significantly reduced the upward bias in FA values from all 10 fiber regions and in fiber count, mean length, and tract volume measurements from 5-7 fiber regions. In conclusion, diffusion tensor tractography provided many measurements with high test-retest reliability across different fiber variables and various fiber tracts even for images with 15 directions (NEX 2). Increasing the number of gradient directions from 15 to 30 with equivalent scan time reduced variability whereas increasing repetition from 1 to 2 for 30-direction DTI improved the accuracy of tractography measurements.

Keywords: reliability, diffusion tensor imaging, tractography, variability, white matter, fiber tracts

Introduction

Many brain functions are mediated by parallel distributed neural networks (Catani and ffytche, 2005; McClelland and Rogers, 2003). For instance, language (Ben-Shachar et al., 2007; Duffau, 2008), attention, memory, working memory, consciousness (Naghavi and Nyberg, 2005), and executive function (Aron et al., 2007) all rely on coordinated activation of distinct cortical areas and efficient information transmission by long-range fiber pathways connecting these cortical areas. Recently, the non-invasive investigation of structural connectivity in the brain has been made possible by diffusion tensor imaging (DTI), a new MRI technique that estimates the spatial distribution of water diffusion and provides information on the predominant diffusion direction and diffusion directionality (Basser et al., 1994). Based on this information, white matter fiber tracts can be reconstructed in vivo by DTI tractography for visual examination of brain structural connectivity (Behrens et al., 2003; Conturo et al., 1999; Mori et al., 1999). DTI tractography also provides tract-derived measurements such as fractional anisotropy (FA), which reflects structural alignment, and mean diffusivity (MD), which represents packing density (Basser and Pierpaoli, 1996; Beaulieu, 2002). DTI tractography is now a key area of research applied to a wide range of conditions.

As DTI tractography becomes a standard imaging tool, it is essential to evaluate its test-retest reliability. Recent studies have identified factors that affect DTI reliability through elevating variability of the measurements. These factors include choice of gradient sampling scheme (Gao et al., 2009; Jones et al., 1999; Skare et al., 2000), number of gradient sampling directions (Jones, 2004; Landman et al., 2007), diffusion sensitivity b-value (Bisdas et al., 2008; Gao et al., 2009), number of scan repetitions (Farrell et al., 2007), image registration accuracy (Marenco et al., 2006; Pfefferbaum et al., 2003), rater reliability (Ciccarelli et al., 2003; Heiervang et al., 2006; Ozturk et al., 2008), fiber tracking method (Heiervang et al., 2006; Huang et al., 2004), and scanner model (Danielian et al., 2010; Pfefferbaum et al., 2003; Vollmar et al., 2010). Other factors affecting variability include imaging parameters, MRI signal variation, and subject physiological noise/change, motion, and positioning (Farrell et al., 2007; Gao et al., 2009; Landman et al., 2007; Marenco et al., 2006; Pfefferbaum et al., 2003). Some sources of variability are controllable by keeping consistency across all image acquisitions. Other sources that are not completely controllable include MRI signal variation, physiological noise/change, subject motion, and rater variability. These uncontrollable error sources are assumed to cause intra-session variability in tractography measurements. The same error sources are assumed to cause intersession variability, although MRI signal variation and physiological noise/change may contribute higher variability in measurement over longer periods of time than within the same day. Therefore, quantifying and comparing intra- and inter-session reliability reveal important error sources for tractography measurements.

Despite the importance of obtaining reliable tractography measurements, only four studies have provided limited information regarding the reliability of tractography measurements (Ciccarelli et al., 2003; Danielian et al., 2010; Heiervang et al., 2006; Vollmar et al., 2010). These studies—which used different fiber tracking algorithms—found that intersession reliability of tractography measurements depended on structure (e.g., the cingulum, corpus callosum) and variable (e.g., tract volume, FA). The structure- and variable-dependency of tractography measurements suggest that the reliability of a fiber measurement from a fiber tract cannot be directly inferred from the reliability of a fiber measurement from another fiber tract nor from the reliability of another fiber measurement from the same fiber tract. So, to fully understand the reliability of tractography measurements, intra-session reliability and the reliability of measurements from all major fiber tracts need to be determined.

One approach to improving the reliability of DTI measurements is to achieve high signal-to-noise ratio (SNR) through optimizing the DTI acquisition scheme. Studies have tested the effect of increasing the number of gradient directions and scan repetitions on reliability. In adult brains where the diffusion directions in the white matter are least likely to be equally distributed, a large number of uniformly distributed gradient directions has been shown to reduce noise-induced rotational variance caused by the misalignment between the principal diffusion direction of the tensor and the gradient direction (Gao et al., 2009; Skare et al., 2000). Jones (2004) demonstrated through Monte-Carlo computer simulations that at least 30 gradient encoding directions were required to obtain robust diffusion measurements suitable for performing tractography. However, Heiervang et al. (2006) compared measurements from probabilistic fiber tracking performed on DTI with 60 versus 12 gradient directions with incongruent results. Even though the scan time was much shorter for DTI with 12 directions, the intersession variability was similar to the one obtained using 60 directions. This indicates that additional studies are needed to clarify the effect of number of gradient sampling directions on DTI reliability. Regarding the effect of scan repetition, Farrell et al. (2007) showed that scan repetition led to increased SNR, which reduced variability in voxel- and region-of-interest (ROI)-based DTI measurements. To date, there are no available tractography data to determine whether scan repetition is an effective means of improving reliability.

The current study aimed to provide a comprehensive assessment of the reliability of quantitative DTI tractography for examining structural connectivity. Specifically, we reconstructed nine major white matter fiber tracts: the corpus callosum, cingulum, fornix, angular bundle (a.k.a. descending cingulum or cingulum at the temporal lobe), cerebral peduncular fibers, uncinate fasciculus, inferior longitudinal fasciculus (ILF), inferior frontooccipital fasciculus (IFO), and arcuate fasciculus. These fiber tracts were quantified using six fiber variables: FA, MD, fiber count, mean length, tract volume, and fiber density. Intra- and inter-session reliabilities were estimated to identify reliable tractography measurements, and compared to evaluate the effect of MRI signal variation and subject physiological change over time. Additionally, the intra- and intersession reliabilities were compared between DTI acquired using both 30 and 15 gradient directions and with both NEX 1 (for 30-direction DTI only) and 2 to assess whether increasing the number of gradient directions and scan repetition improved the test-retest reliability of tractography measurements.

Materials and Methods

Research Participants and Imaging Sessions

Ten healthy subjects participated in the reliability study. These subjects had an average age of 28 years (standard deviation, SD, 5.5 years, range 19-38 years, 60% males). Inclusion criteria required that the subjects had no neurological impairments or psychiatric disorders and were under 40 years old. All healthy subjects participated in two imaging sessions. During the first session, two images were collected, one using 30 and the other using 15 gradient directions, both with NEX 2 (DTI30-2A and DTI15-2A). During the second session about four weeks later (mean 41 days, SD 22 days, mode 30 days, range 23-88 days), four images of NEX 2, two at 30 and two at 15 directions, were collected (DTI30-2B, DTI30-2C, DTI15-2B, and DTI15-2C). As the images were not averaged at the scanner, we were able to select the first sets of the DTI30-2A, DTI30-2B, and DTI30-2C to form DTI30-1A, DTI30-1B, and DTI30-1C (i.e. NEX = 1) to determine the unique contribution to DTI reliability of increasing the number of gradient directions from 15 to 30. These images with 30 directions and NEX 1 were equivalent to DTI15-2 in terms of scan time and all other remaining imaging parameters.

Imaging Acquisition

A research-dedicated Philips Achieva 3T MRI machine with multichannel SENSE head coil (Best, the Netherlands) was utilized to collect DTI and T1-weighted images for the reliability study. DTI images for both Jones 15 and 30 gradient directions (gradient directions spread out evenly in space as described in Jones, 2004) were obtained using a single-shot spin-echo, echo-planar imaging sequence in 65 axial slices of 2.2 mm thickness (no gap) with 224 mm FOV, 112 × 112 acquisition matrix, 256 × 256 image matrix, TR of 5,630 ms, TE of 51 ms, 90° flip angle, and 2 NEX. The diffusion sensitizing gradients were applied at a b-value of 1,000 s/mm². One additional image with minimum diffusion weighting was also obtained. The acquisition times were ~8 minutes for DTI images with 30 directions and ~4 minutes for images with 15 directions. The image resolution was 0.88 × 0.88 × 2.2 mm³. High-resolution T1-weighted 3D-Turbo Field Echo anatomical images were acquired in 160 axial slices of 1.0 mm thickness (no gap) with FOV 256 mm, 256 × 256 matrix, TR of 8.1 ms, TE of 3.7 ms, and 12° flip angle.
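The reported in-plane resolution can be checked with simple arithmetic. The sketch below assumes a square field of view and that the 0.88 mm figure refers to the reconstructed 256 × 256 image matrix rather than the 112 × 112 acquisition matrix; the variable names are illustrative only.

```python
# Voxel-size arithmetic from the acquisition parameters stated in the text.
FOV_MM = 224.0        # in-plane field of view (assumed square)
ACQ_MATRIX = 112      # acquisition matrix size along each in-plane axis
IMAGE_MATRIX = 256    # reconstructed image matrix size along each in-plane axis
SLICE_MM = 2.2        # slice thickness, no gap
N_SLICES = 65

acquired_inplane_mm = FOV_MM / ACQ_MATRIX          # size of an acquired voxel
reconstructed_inplane_mm = FOV_MM / IMAGE_MATRIX   # size of a reconstructed voxel
coverage_mm = N_SLICES * SLICE_MM                  # through-plane coverage

print(f"acquired in-plane voxel:      {acquired_inplane_mm:.3f} mm")
print(f"reconstructed in-plane voxel: {reconstructed_inplane_mm:.3f} mm")
print(f"slice coverage:               {coverage_mm:.1f} mm")
```

Under these assumptions the reconstructed voxel is 0.875 mm in-plane, which rounds to the reported 0.88 mm, while the acquired voxel is 2.0 mm in-plane.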

Imaging Preprocessing

The image preprocessing for DTI comprised brain extraction, which was carried out using the BET function (Smith, 2002) from FSL (www.fmrib.ox.ac.uk/fsl/, University of Oxford, UK), and eddy current and motion corrections, which were accomplished using the Automatic Image Registration program (Woods et al., 1998) implemented in DTI Studio (www.mristudio.org, Johns Hopkins Medical Institute, Baltimore, MD). For DTI30-2 and DTI15-2, the two sets of raw images were co-registered to one minimum diffusion weighted image using a 12-mode affine transformation. For DTI30-1, all diffusion weighted images were co-registered to a minimum diffusion weighted image using the same method. During the transformation process, the gradient tables were modified according to the generated transformation matrix and subsequently utilized in tensor calculation and diffusion map generation.

DTI Tractography

Fiber tracking was performed in DTI Studio using the Fiber Assignment by Continuous Tracking algorithm (Mori et al., 1999). The FA seed threshold was set to 0.25, termination threshold 0.20, and angle threshold 60°. Fiber tracking adopted the multiple ROI approach which has been shown to have the potential of achieving higher inter-rater reliability in fiber quantification (Huang et al., 2004). ROI drawings were placed along the trajectory of the fiber tracts to extract fibers using ROI logic operations OR, AND, NOT, and CUT. Fiber tracking methods were carefully devised to achieve high accuracy and high inter-rater reliability. The ROI slices were strategically determined based on anatomical knowledge (Augustinack et al., 2010; Crosby et al., 1962; Nolte, 2002) and referencing previous fiber tracking methods (Catani et al., 2002; Concha et al., 2005; Huang et al., 2005; Mori and van Zijl, 2002; Wakana et al., 2007; Wakana et al., 2004).
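The role of the three tracking thresholds can be illustrated with a minimal sketch. It is not DTI Studio's implementation: the function names, the treatment of values exactly at a threshold, and the sign-invariant angle computation are assumptions made here for illustration.

```python
import math

FA_SEED = 0.25         # minimum FA for a voxel to start a streamline
FA_STOP = 0.20         # streamline ends when FA drops below this value
MAX_ANGLE_DEG = 60.0   # streamline ends when it turns more sharply than this

def turning_angle_deg(u, v):
    """Angle between two 3D directions, ignoring sign (eigenvectors have no polarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = min(1.0, abs(dot) / (norm_u * norm_v))  # clamp against rounding error
    return math.degrees(math.acos(cosine))

def can_seed(fa):
    """Assumed seeding rule: FA at or above the seed threshold."""
    return fa >= FA_SEED

def should_stop(fa, previous_direction, next_direction):
    """Assumed termination rule: low FA or a turn sharper than the angle threshold."""
    too_isotropic = fa < FA_STOP
    too_sharp = turning_angle_deg(previous_direction, next_direction) > MAX_ANGLE_DEG
    return too_isotropic or too_sharp

print(can_seed(0.22))                                  # below the seed threshold
print(should_stop(0.22, (1, 0, 0), (1, 1, 0)))         # FA acceptable, 45 degree turn
print(should_stop(0.30, (1, 0, 0), (0, 1, 0)))         # 90 degree turn
```

Note that a voxel with FA between 0.20 and 0.25 cannot start a streamline in this sketch but can still be traversed by one.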

Nine fiber tracts were chosen for reconstruction due to their clinical relevance and traceability under current DTI image resolution. As described in Wang et al. (2011), 28 fiber regions could be reconstructed from the 9 fiber tracts following the fiber tracking methods developed in the authors’ laboratory. Appendix Figure 1 depicts the ROI positions for reconstructing the 9 fiber tracts. In the current study, only one fiber region from each of the nine fiber tracts was chosen to simplify the analysis, with the exception of the fornix. Both fornix body and crus were included in the analysis to account for the differences in inter-rater reliability of these two fiber regions (see next section) that might have affected the test-retest reliability. For fiber tracts present on both sides of the brain, the side selected was alternated between left and right. The cerebral peduncular fibers contain five fiber regions. The superior frontal projections (CPsf) were chosen because of the prominence of the corticospinal tract contained in this fiber region.

Tractography Measurements and Inter-rater Reliability

The 10 fiber regions were quantified using six fiber variables: FA, MD, fiber count (i.e. the number of DTI streamlines extracted for a fiber tract), mean length (i.e. the average length in mm for all streamlines belonging to a fiber tract), tract volume (number of voxels occupied by all streamlines for a particular fiber tract), and fiber density (fiber count/voxel). To correct for individual differences in head size, fiber count and tract volume were normalized by intracranial volume, which was calculated from T1-weighted images using the SIENAX program in FSL (Smith et al., 2004; Smith et al., 2002). The inter-rater reliability of the tractography measurements has been tested as a part of previous projects in our lab (Wang et al., 2011; Wang et al., 2008) using intraclass correlation coefficient (ICC) (McGraw and Wong, 1996). The ICC was ≥ .90 for all tractography measurements from the 10 fiber regions except for the tract volume of left fornix crus of which the ICC was .87.
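The four streamline-derived variables defined above can be sketched as follows. This is an illustration under stated assumptions, not the DTI Studio computation: streamlines are represented as lists of 3D points in mm, voxel membership is found by flooring coordinates, and normalization divides by intracranial volume in litres (suggested by the "voxels/l" unit in the tables). Any further scaling applied to the tabulated values is not specified in the text and is not reproduced here.

```python
import math

def streamline_length_mm(points):
    """Sum of the segment lengths along one streamline."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def voxels_visited(points, voxel_size_mm):
    """Set of voxel indices touched by the points of one streamline."""
    return {
        tuple(int(math.floor(c / s)) for c, s in zip(p, voxel_size_mm))
        for p in points
    }

def tract_measurements(streamlines, voxel_size_mm, icv_litres):
    """Fiber count, mean length, tract volume, and fiber density for one tract."""
    if not streamlines:
        raise ValueError("tract has no streamlines")
    fiber_count = len(streamlines)
    mean_length = sum(streamline_length_mm(s) for s in streamlines) / fiber_count
    occupied = set().union(*(voxels_visited(s, voxel_size_mm) for s in streamlines))
    tract_volume = len(occupied)
    return {
        "fiber_count": fiber_count,
        "mean_length_mm": mean_length,
        "tract_volume_voxels": tract_volume,
        "fiber_density": fiber_count / tract_volume,       # fibers per occupied voxel
        "fiber_count_per_litre": fiber_count / icv_litres,  # head-size normalization
        "tract_volume_per_litre": tract_volume / icv_litres,
    }

# Hypothetical toy tract: two short streamlines on a 1 mm isotropic grid.
toy_tract = [
    [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (2.5, 0.5, 0.5)],
    [(0.5, 0.5, 0.5), (0.5, 1.5, 0.5)],
]
m = tract_measurements(toy_tract, voxel_size_mm=(1.0, 1.0, 1.0), icv_litres=1.5)
print(m)
```

Fiber density is a ratio of two quantities that are both normalized by the same intracranial volume, so it is unaffected by the head-size correction.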

Statistical Analysis

One set each of DTI with 30 directions (NEX 2) and DTI with 15 directions (NEX 2) was acquired during the first session (DTI30-2A and DTI15-2A), while two sets of each were acquired during the second session (DTI30-2B, C and DTI15-2B, C). Intra-session reliability was therefore estimated by contrasting the images acquired within the second session (i.e. DTI30-2B vs. DTI30-2C), and inter-session reliability was quantified by averaging the values that contrasted the image acquired in the first session with each of the two images acquired in the second session (i.e. the average of DTI30-2A vs. DTI30-2B and DTI30-2A vs. DTI30-2C). The reliability indices were the intraclass correlation coefficient (ICC) and the coefficient of variation (CV). ICC, which estimates the amount of agreement in absolute value within the repeated measurements, is computed as:

ICC = \frac{MS_R - MS_E}{MS_R + (k - 1) MS_E + \frac{k}{n} (MS_C - MS_E)}

where MS_R = mean square for rows of observations, MS_C = mean square for columns of subjects, MS_E = mean square error, k = number of observations, and n = number of subjects. ICC was computed using the MATLAB toolbox Intraclass correlation coefficients created by Arash Salarian (www.mathworks.com/matlabcentral/fileexchange/22099). CV, which is defined as the ratio, expressed in percentage, of the standard deviation of a set of measurements by its mean (Abdi, 2010; Armitage, 1971; Bland and Altman, 1986), is computed as:

CV = \frac{SD}{\mathrm{Mean}}.

Both between- and within-subject variability (CV_between and CV_within, respectively) were quantified, with CV_within being the SD of the differences in repeated measures divided by the mean. For biological measurements from MRI, CV_within ≤ 10% and ICC ≥ .70 are considered acceptable, while an ICC ≥ .80 is considered ideal (Marenco et al., 2006). Therefore, tractography measurements with intersession CV_within ≤ 10%, ICC ≥ .70, or both, were classified as reliable. To compare different forms of reliability, we conducted paired t-tests to determine whether the changes in the ICC and CV_within for individual fiber variables (e.g., FA, MD) reached statistical significance. To correct for familywise errors, we used a 5% false discovery rate (FDR) (Benjamini and Hochberg, 1995) as computed by Thomas Nichols’ MATLAB function, FDR.
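The two reliability indices can be sketched in a few lines. This is not the MATLAB toolbox used in the study. The sketch assumes that rows of the data matrix index subjects and columns index repeated scans (the layout of McGraw and Wong, 1996), that the SD is the sample SD, and that the "mean" in CV_within is the grand mean over both repeated measures; these choices are not stated in the text.

```python
import math
import statistics

def icc_absolute_agreement(data):
    """ICC from the formula in the text; data[i][j] = subject i, repeated scan j."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_error / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + (k / n) * (ms_c - ms_e))

def cv_between_percent(values):
    """Between-subject CV: SD of the measurements divided by their mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def cv_within_percent(first, second):
    """Within-subject CV: SD of the paired differences divided by the grand mean."""
    differences = [a - b for a, b in zip(first, second)]
    grand_mean = statistics.mean(list(first) + list(second))
    return 100.0 * statistics.stdev(differences) / grand_mean

def is_reliable(cv_within, icc):
    """Classification rule from the text: CV_within <= 10%, ICC >= .70, or both."""
    return cv_within <= 10.0 or icc >= 0.70

# Hypothetical data: three subjects, second scan shifted upward by a constant 1.
shifted = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
icc_shifted = icc_absolute_agreement(shifted)
print(round(icc_shifted, 4))  # 0.6667
```

The toy example shows why this form of ICC measures absolute agreement: the two scans rank the subjects identically, yet the constant offset between scans lowers the coefficient from 1 to 2/3.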

Results

Tractography Results

Figure 1 shows the representative fiber tracking results from one of the participants. Among the 10 fiber regions, left arcuate fasciculus was the only one that could not be reconstructed successfully from all 10 participants. This fiber region was missing from DTI30-2B, DTI30-1B, DTI30-2C, DTI30-1C, and DTI15-2B for a single subject even when the FA threshold for fiber tracking was lowered to .15. Consequently, left arcuate fasciculus for this subject was excluded from subsequent analyses. In addition, right IFO could not be reconstructed following the fiber tracking method for DTI30-2B, DTI30-1B, DTI30-2C, and DTI30-1C for one participant and DTI15-2A for another participant. Therefore, the ROI slices in the frontal lobe were moved posteriorly to extract the IFO from these scans.

Fig. 1. Representative tractography results from a research participant. The fiber tracts were reconstructed from (a) DTI30-2A, (b) DTI30-2B, (c) DTI30-1A, (d) DTI30-1B, (e) DTI15-2A, and (f) DTI15-2B. From top to bottom, the panels show the corpus callosum (dark yellow), cingulum (light blue), fornix body (light yellow), fornix crus (light purple), angular bundle (light green), CPsf (purple), uncinate fasciculus (pink), ILF (blue), IFO (light pink), and arcuate fasciculus (dark teal).

Descriptive Statistics

Group means and standard deviations of the tractography measurements from the 10 fiber regions were averaged across the three sets of images for the same sequence (i.e. DTI30-2A, B, and C). Paired t tests were conducted to find group mean differences between the three sequences (Table 1). In general, DTI30-1 produced higher FA, mean length, fiber count, and tract volume measurements compared to DTI30-2, with FA showing the most consistent results, significantly higher for all 10 fiber regions. The remaining fiber variables showed higher values in DTI30-1 compared to DTI30-2 in 5-7 regions. DTI30-1 also produced higher FA, fiber count, mean length, and tract volume measurements and lower MD measurements compared to DTI15-2, with FA and MD showing the most consistent results in all fiber regions and the remaining variables showing consistency in 5-8 fiber regions. When comparing DTI30-2 to DTI15-2, fiber count, mean length, tract volume, and fiber density were significantly higher in 3-5 fiber regions and MD was lower in all fiber regions except fornix crus.
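The paired t statistic used for these sequence comparisons can be sketched as below. The FA values are hypothetical and serve only to show the sign convention assumed here: the difference is taken as first sequence minus second, so a negative t means the second sequence produced higher values, as in the DTI30-2 vs. DTI30-1 columns of Table 1.

```python
import math
import statistics

def paired_t(first, second):
    """Return the paired t statistic and its degrees of freedom."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1

# Hypothetical FA values for four subjects under two sequences.
fa_sequence_a = [0.50, 0.52, 0.49, 0.51]
fa_sequence_b = [0.53, 0.54, 0.52, 0.53]
t_value, dof = paired_t(fa_sequence_a, fa_sequence_b)
print(f"t({dof}) = {t_value:.2f}")
```

With ten participants, as in this study, the test has 9 degrees of freedom, or 8 for the left arcuate fasciculus after one subject was excluded.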

Table 1.

Group mean and standard deviation of tractography measurements.

Measurements	DTI 30, NEX 2		DTI 30, NEX 1		DTI 15, NEX 2		DTI 30, NEX 2 vs. DTI 30, NEX 1		DTI 30, NEX 1 vs. DTI 15, NEX 2		DTI 30, NEX 2 vs. DTI 15, NEX 2

	Mean	SD	Mean	SD	Mean	SD	Paired t	p	Paired t	p	Paired t	p

Corpus callosum
FA	0.56	0.016	0.57	0.018	0.55	0.013	−8.97	<0.001	4.64	0.001	0.19	0.550
MD (μm²/ms)	0.78	0.022	0.77	0.019	0.81	0.025	4.44	0.001	−7.55	<0.001	−5.88	<0.001
FC	16.4	2.8	16.9	2.8	14.7	1.4	−4.60	0.001	3.40	0.008	2.63	0.027
ML (mm)	88.6	7.4	94.9	7.4	79.9	8.1	−13.24	<0.001	5.85	<0.001	3.31	0.009
TV (voxels/l)	34.9	3.6	38.4	3.8	31.1	2.5	−11.64	<0.001	7.54	<0.001	4.00	0.003
FD (FC/voxel)	47.0	6.7	46.4	6.7	42.9	4.7	2.82	0.020	2.07	0.068	2.44	0.037
Cingulum (left)
FA	0.50	0.017	0.53	0.016	0.51	0.020	−12.53	<0.001	3.44	0.008	−0.32	0.779
MD (μm²/ms)	0.69	0.015	0.70	0.014	0.73	0.014	−4.13	0.003	−9.22	<0.001	−8.55	<0.001
FC	686	196	686	208	502	241	0.03	0.976	2.89	0.018	2.92	0.017
ML (mm)	83.5	9.8	89.0	9.0	73.8	12.1	−5.70	<0.001	4.41	0.002	2.50	0.034
TV (voxels/l)	2251	551	2608	537	2045	742	−7.51	<0.001	3.04	0.014	1.06	0.319
FD (FC/voxel)	30	4.6	27	4.8	21	7.4	3.71	0.005	2.99	0.015	4.51	0.001
Angular bundle (right)
FA	0.41	0.016	0.44	0.021	0.41	0.025	−12.50	<0.001	4.99	0.001	−0.45	0.677
MD (μm²/ms)	0.72	0.017	0.72	0.014	0.75	0.021	−1.39	0.208	−4.82	0.001	−5.21	0.001
FC	150	53	165	58	147	43	−3.44	0.007	1.01	0.339	0.19	0.855
ML (mm)	34.8	3.5	36.2	3.7	35.0	5.7	−2.08	0.067	0.61	0.556	−0.14	0.892
TV (voxels/l)	582	114	677	132	562	139	−6.50	<0.001	2.29	0.048	0.41	0.695
FD (FC/voxel)	11	2.9	11	2.8	12	2.3	0.94	0.377	−0.63	0.545	−0.19	0.852
Fornix body
FA	0.53	0.031	0.59	0.030	0.52	0.030	−9.21	<0.001	5.76	<0.001	0.98	0.362
MD (μm²/ms)	1.18	0.089	1.11	0.107	1.26	0.103	5.93	<0.001	−10.53	<0.001	−8.19	<0.001
FC	334	121	300	81	281	113	1.75	0.115	0.81	0.440	1.52	0.162
ML (mm)	27.8	4.5	29.9	5.1	28.2	3.7	−5.00	0.001	2.55	0.031	−0.81	0.441
TV (voxels/l)	379	75	357	52	377	84	1.39	0.197	−0.58	0.574	0.06	0.954
FD (FC/voxel)	27	5.9	29	5.9	24	6.3	−0.87	0.409	3.21	0.011	1.95	0.083
Fornix crus (left)
FA	0.53	0.036	0.56	0.027	0.50	0.036	−5.44	0.001	5.49	<0.001	2.81	0.015
MD (μm²/ms)	0.89	0.075	0.88	0.058	0.96	0.069	1.26	0.244	−3.30	0.009	−2.41	0.039
FC	145	38	148	50	141	46	−0.16	0.875	0.43	0.680	0.25	0.807
ML (mm)	27.4	2.8	28.5	3.2	28.1	4.3	−3.12	0.012	0.47	0.651	−1.04	0.327
TV (voxels/l)	200	33	214	29	179	27	−1.23	0.248	2.71	0.024	2.61	0.028
FD (FC/voxel)	20	4.4	19	4.2	22	5.9	0.55	0.594	−1.32	0.220	−0.81	0.439
Cerebral peduncular projections to superior frontal lobes
FA	0.53	0.013	0.55	0.012	0.54	0.014	−14.34	<0.001	4.52	0.001	−3.47	0.001
MD (μm²/ms)	0.69	0.010	0.70	0.011	0.73	0.009	−1.36	0.134	−8.15	<0.001	−8.92	<0.001
FC	2246	830	2701	737	1871	949	−3.69	0.005	3.19	0.011	1.50	0.167
ML (mm)	120.9	5.8	126.3	5.1	114.0	8.9	−3.64	0.005	5.56	<0.001	2.63	0.027
TV (voxels/l)	10,071	2,155	13,185	1,877	7,926	2,392	−7.88	<0.001	7.06	<0.001	2.87	0.019
FD (FC/voxel)	20	4.5	20	3.3	21	6.0	0.19	0.854	−0.53	0.607	−0.49	0.639
Uncinate fasciculus (right)
FA	0.42	0.014	0.45	0.015	0.42	0.019	−13.54	<0.001	7.55	<0.001	0.80	0.446
MD (μm²/ms)	0.72	0.014	0.72	0.014	0.77	0.015	−1.06	0.372	−15.78	<0.001	−15.38	<0.001
FC	628	277	604	237	381	194	0.64	0.536	2.79	0.021	3.12	0.012
ML (mm)	76	12	76	13	67	14	0.03	0.979	2.34	0.044	2.29	0.048
TV (voxels/l)	2,035	596	2,240	551	1,445	502	−3.40	0.008	4.18	0.002	3.12	0.012
FD (FC/voxel)	25	8.4	22	6.9	18	6.8	2.48	0.035	1.89	0.091	3.10	0.013
Inferior longitudinal fasciculus (left)
FA	0.49	0.015	0.50	0.014	0.47	0.017	−5.80	<0.001	5.39	<0.001	2.82	0.021
MD (μm²/ms)	0.74	0.015	0.74	0.015	0.81	0.047	−2.30	0.030	−5.94	<0.001	−6.36	<0.001
FC	1782	444	2236	442	1313	364	−4.59	0.001	7.70	<0.001	3.83	0.004
ML (mm)	81.5	6.8	85.0	8.2	79.2	8.0	−4.24	0.002	2.45	0.037	1.03	0.329
TV (voxels/l)	5,913	1,008	7,907	1,035	4,975	779	−9.26	<0.001	11.21	<0.001	3.38	0.008
FD (FC/voxel)	27	4.1	26	3.4	24	5.0	1.29	0.232	2.11	0.065	2.20	0.055
Inferior frontooccipital fasciculus (right)
FA	0.48	0.017	0.51	0.017	0.48	0.021	−16.63	<0.001	6.66	<0.001	−0.39	0.681
MD (μm²/ms)	0.74	0.021	0.73	0.021	0.76	0.021	2.50	0.034	−5.09	0.001	−3.14	0.012
FC	822	268	946	242	678	330	−4.43	0.002	3.16	0.012	1.99	0.078
ML (mm)	144.7	7.1	148.3	8.5	131.6	12.9	−3.46	0.007	4.68	0.001	3.67	0.005
TV (voxels/l)	4,490	829	5,545	822	3,936	1,477	−7.83	<0.001	4.75	0.001	1.87	0.094
FD (FC/voxel)	31	6.5	30	4.0	25	8.6	1.18	0.268	2.38	0.041	3.14	0.012
Arcuate fasciculus (left)
FA	0.500	0.018	0.513	0.022	0.487	0.012	−4.70	0.002	4.40	0.003	2.90	0.024
MD (μm²/ms)	0.698	0.017	0.703	0.018	0.745	0.018	−6.71	<0.001	−8.83	<0.001	−10.16	<0.001
FC	529	346	520	277	340	169	0.52	0.617	2.16	0.067	1.91	0.098
ML (mm)	101.9	6.3	104.6	7.0	101.1	16.0	−3.96	0.006	0.77	0.466	0.22	0.829
TV (voxels/l)	2,172	834	2,362	782	1,640	569	−2.01	0.085	2.70	0.031	1.90	0.099
FD (FC/voxel)	22	8.6	21	5.9	20	6.8	0.77	0.467	0.76	0.472	1.00	0.352

Note: only one fiber region from each fiber tract was chosen for the reliability assessment to simplify the analysis, except for the fornix. Bold, significant at 5% FDR (p = .031). Abbreviations: FC, fiber count; FD, fiber density; ML, mean length; TV, tract volume.
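A p-value cutoff such as the one quoted in the table note arises from the Benjamini-Hochberg step-up procedure. The sketch below is a plain reimplementation for illustration, not the MATLAB function used in the study, and the p-values in the example are hypothetical.

```python
def bh_cutoff(p_values, q=0.05):
    """Largest p-value declared significant by Benjamini-Hochberg, or None."""
    m = len(p_values)
    cutoff = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest p with p <= q * rank / m.
        if p <= q * rank / m:
            cutoff = p
    return cutoff

# Hypothetical p-values from five comparisons.
example_p = [0.001, 0.008, 0.039, 0.041, 0.600]
threshold = bh_cutoff(example_p, q=0.05)
significant = [p for p in example_p if threshold is not None and p <= threshold]
print(threshold, significant)  # 0.008 [0.001, 0.008]
```

Every p-value at or below the returned cutoff is declared significant, which is why a single threshold can be reported for a whole table.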

Inter-Subject Variability

The average CV_between was computed over the three sets of images for DTI30-2, DTI30-1, and DTI15-2. Among the six fiber variables, FA and MD generally had lower CV_between (2.4-7.6% for FA and 1.7-9.9% for MD) compared to mean length (4.3-22.4%), which in turn had lower CV_between compared to tract volume (9.5-42.9%), fiber density (11.8-41.0%), and fiber count (10.9-63.9%). Pairwise comparisons between DTI30-2, DTI30-1, and DTI15-2 revealed that DTI30-2 had significantly higher CV_between in tract volume compared to DTI30-1, and lower CV_between in mean length and fiber density compared to DTI15-2. Additionally, DTI30-1 showed lower CV_between in mean length, tract volume, and fiber density compared to DTI15-2 (paired |t|(9) = 3.00-4.28, p ≤ .015, FDR q = 0.05). Among the 10 fiber regions, the corpus callosum had relatively low CV_between for fiber count, tract volume, and fiber density (9.5-17.2% vs. 9.3-63.9%), and fornix body and crus had relatively high CV_between for FA and MD (6.3-9.9% vs. 1.9-6.5%).

Test-Retest Reliability

Intra- and inter-session CV_within and ICC were computed for all tractography measurements from DTI30-2, DTI30-1, and DTI15-2 (Table 2). Similar to CV_between, FA and MD had the lowest intra- and inter-session CV_within, which ranged from 0.7-8.1% for FA intra-session CV_within, 0.5-9.7% for MD intra-session CV_within, 1.2-7.4% for FA intersession CV_within, and 1.1-7.6% for MD intersession CV_within. Mean length had the next lowest intra- and inter-session CV_within (1.6-12.6% and 3.2-18.8%, respectively), followed by tract volume (3.1-30.2% and 5.0-44.5%, respectively), fiber density (3.2-45.3% and 5.8-60.7%, respectively), and fiber count (4.6-61.6% and 6.0-83.4%, respectively). Among the 10 fiber regions, the corpus callosum had the lowest intra- and inter-session CV_within (0.7-7.6% and 1.2-11.3%, respectively) and produced the largest number of acceptable CV_within values: 34 of the 36 values (6 fiber variables times intra- and inter-session CV_within for DTI30-2, DTI30-1, and DTI15-2). The number of acceptable CV_within values for the remaining nine fiber regions ranged from 13 (fornix crus) to 20 (ILF).

Table 2.

Test-retest reliability of representative tractography measurements.

Measurements	Intrasession CV (%)			Intersession CV (%)			Intrasession ICC			Intersession ICC
	DTI30-2	DTI30-1	DTI15-2	DTI30-2	DTI30-1	DTI15-2	DTI30-2	DTI30-1	DTI15-2	DTI30-2	DTI30-1	DTI15-2

Corpus callosum
FA	0.74	1.48	0.89	1.21	1.88	1.25	0.97	0.90	0.89	0.90	0.88	0.86
MD (μm²/ms)	1.07	1.01	1.01	1.34	1.45	1.40	0.94	0.92	0.90	0.90	0.76	0.89
FC	5.22	7.56	4.59	7.44	5.99	11.28	0.95	0.90	0.93	0.91	0.93	0.39
ML (mm)	2.37	2.28	1.60	3.20	3.22	9.23	0.93	0.93	0.99	0.92	0.93	0.57
TV (voxels/l)	3.65	4.13	3.11	6.07	5.00	10.90	0.95	0.92	0.96	0.84	0.95	0.26
FD (FC/voxel)	5.26	6.71	3.32	5.83	6.88	8.93	0.86	0.84	0.97	0.91	0.92	0.62
Cingulum (left)
FA	2.07	1.54	1.85	2.26	3.29	2.88	0.77	0.85	0.92	0.76	0.67	0.77
MD (μm²/ms)	1.57	1.17	1.51	2.02	1.83	1.73	0.77	0.89	0.65	0.65	0.71	0.55
FC	14.23	15.12	17.96	22.04	21.63	29.38	0.86	0.88	0.94	0.70	0.76	0.82
ML (mm)	7.48	8.26	11.26	9.50	8.34	18.75	0.84	0.77	0.76	0.73	0.76	0.50
TV (voxels/l)	9.82	10.87	13.56	12.32	14.44	19.74	0.92	0.88	0.91	0.88	0.79	0.83
FD (FC/voxel)	8.09	11.30	12.54	16.03	16.42	29.69	0.83	0.80	0.93	0.50	0.53	0.71
Angular bundle (right)
FA	3.70	5.21	2.74	3.96	2.97	3.71	0.74	0.59	0.90	0.55	0.75	0.80
MD (μm²/ms)	3.45	3.25	2.32	3.17	3.19	2.99	0.29	0.32	0.64	0.48	0.25	0.58
FC	13.71	47.61	37.50	38.17	31.67	47.98	0.96	0.55	0.52	0.48	0.54	0.28
ML (mm)	8.19	12.60	9.79	10.49	14.07	13.88	0.82	0.47	0.84	0.50	0.37	0.72
TV (voxels/l)	8.73	29.48	30.23	19.62	17.85	30.32	0.93	0.42	0.56	0.54	0.49	0.47
FD (FC/voxel)	15.21	29.93	19.12	28.97	28.19	32.55	0.90	0.62	0.58	0.49	0.48	0.30
Fornix body
FA	3.58	8.09	3.03	6.52	7.38	5.28	0.85	0.13	0.86	0.57	0.41	0.65
MD (μm²/ms)	2.65	4.54	3.04	3.85	5.09	4.24	0.92	0.88	0.93	0.89	0.84	0.87
FC	22.99	40.37	40.09	26.38	43.50	40.96	0.81	0.39	0.66	0.77	0.28	0.59
ML (mm)	6.80	7.73	6.36	9.01	5.23	14.83	0.91	0.91	0.89	0.87	0.96	0.41
TV (voxels/l)	21.97	17.89	23.19	17.09	21.24	25.10	0.53	0.47	0.61	0.69	0.43	0.51
FD (FC/voxel)	16.00	42.25	33.42	20.87	51.74	33.94	0.74	0.27	0.56	0.57	0.08	0.47
Fornix crus (left)
FA	5.36	4.58	3.81	6.18	5.50	5.20	0.71	0.65	0.81	0.64	0.47	0.63
MD (μm²/ms)	7.83	6.32	9.71	7.58	3.46	6.36	0.70	0.52	0.48	0.69	0.73	0.68
FC	27.49	44.31	48.95	31.17	60.12	36.14	0.51	0.44	0.35	0.49	0.35	0.51
ML (mm)	11.16	10.34	10.27	10.69	9.52	13.43	0.55	0.73	0.79	0.65	0.55	0.70
TV (voxels/l)	24.34	23.59	22.25	21.91	27.11	28.16	0.47	0.12	0.27	0.32	0.13	0.20
FD (FC/voxel)	20.97	28.07	32.34	27.73	38.48	37.25	0.71	0.32	0.56	0.42	0.53	0.39
Cerebral peduncular projections to superior frontal lobes
FA	2.21	1.94	1.26	2.92	2.19	3.25	0.64	0.59	0.88	0.51	0.58	0.38
MD (μm²/ms)	1.58	1.62	0.98	1.45	1.61	2.92	0.51	0.49	0.86	0.57	0.53	−0.16
FC	23.06	19.08	24.96	34.66	28.62	31.35	0.84	0.72	0.88	0.63	0.57	0.81
ML (mm)	2.49	2.95	6.45	4.17	4.50	10.07	0.86	0.81	0.81	0.68	0.65	0.38
TV (voxels/l)	8.61	11.36	12.10	17.61	16.68	25.33	0.93	0.53	0.92	0.71	0.56	0.73
FD (FC/voxel)	20.08	9.26	15.63	18.86	14.66	19.06	0.67	0.86	0.89	0.72	0.73	0.80
Uncinate fasciculus (right)
FA	2.28	2.60	2.65	3.06	3.21	5.28	0.86	0.81	0.87	0.63	0.66	0.53
MD (μm²/ms)	0.49	1.11	1.18	1.14	1.65	1.83	0.92	0.86	0.85	0.84	0.77	0.58
FC	16.60	21.13	31.13	29.42	30.58	38.03	0.95	0.87	0.86	0.80	0.71	0.76
ML (mm)	3.95	6.46	10.81	8.04	7.33	12.91	0.97	0.93	0.88	0.89	0.93	0.85
TV (voxels/l)	12.34	10.02	17.78	16.12	12.45	26.39	0.93	0.93	0.88	0.86	0.82	0.75
FD (FC/voxel)	11.74	16.90	27.61	16.42	22.82	27.97	0.95	0.87	0.81	0.89	0.74	0.72
Inferior longitudinal fasciculus (left)
FA	2.62	2.01	2.20	2.94	3.32	3.41	0.71	0.78	0.81	0.64	0.52	0.65
MD (μm²/ms)	1.11	0.89	0.65	1.60	1.23	5.72	0.90	0.89	0.99	0.69	0.70	0.46
FC	17.13	16.26	20.74	21.55	23.32	26.83	0.76	0.78	0.71	0.69	0.46	0.58
ML (mm)	5.13	4.43	3.36	4.50	5.46	7.69	0.85	0.88	0.95	0.87	0.84	0.78
TV (voxels/l)	7.33	11.79	15.62	11.44	14.02	23.84	0.90	0.75	0.70	0.80	0.44	0.21
FD (FC/voxel)	13.63	10.22	9.87	16.28	17.86	33.46	0.65	0.76	0.93	0.54	0.49	0.21
Inferior frontooccipital fasciculus (right)
FA	3.51	4.01	2.35	3.60	4.37	6.96	0.65	0.47	0.90	0.58	0.47	0.25
MD (μm²/ms)	1.08	1.49	1.09	1.72	1.97	3.26	0.93	0.89	0.95	0.83	0.75	0.41
FC	51.11	61.60	28.87	38.59	35.30	83.43	0.39	−0.15	0.91	0.38	0.38	0.13
ML (mm)	2.32	4.78	4.75	3.99	4.73	17.65	0.82	0.70	0.95	0.67	0.77	−0.04
TV (voxels/l)	16.11	24.60	10.66	17.70	17.09	44.53	0.80	0.34	0.96	0.43	0.48	0.42
FD (FC/voxel)	42.12	45.27	28.58	30.68	35.50	60.74	0.04	−0.49	0.83	0.36	0.19	0.10
Arcuate fasciculus (left)
FA	1.47	1.80	3.29	2.83	2.51	3.11	0.92	0.92	0.32	0.73	0.92	0.39
MD (μm²/ms)	1.24	0.97	0.61	1.15	1.10	1.75	0.84	0.88	0.96	0.85	0.87	0.69
FC	21.03	28.17	59.48	20.17	29.42	43.27	0.93	0.75	0.38	0.94	0.82	0.63
ML (mm)	3.99	3.31	4.19	3.06	4.46	9.99	0.72	0.87	0.97	0.85	0.74	0.75
TV (voxels/l)	16.85	21.58	23.95	17.85	19.83	21.29	0.86	0.73	0.75	0.88	0.82	0.70
FD (FC/voxel)	14.26	18.76	41.63	21.03	26.67	33.77	0.93	0.65	0.31	0.83	0.67	0.58

Bold, CV ≤ 10% or ICC ≥ .70. Abbreviations: FC, fiber count; FD, fiber density; ML, mean length; TV, tract volume.

In contrast, intra- and inter-session ICCs were more uniform across all six fiber variables. Mean length provided more tractography measurements with acceptable ICC values (46/60) than the other fiber variables (28-35/60). Of the 10 fiber regions, the corpus callosum and uncinate fasciculus provided the largest number of acceptable ICC values (32 out of 36), followed in turn by the cingulum (28), arcuate fasciculus (26), ILF (20), CPsf (18), fornix body (16), IFO (14), angular bundle (10), and fornix crus (7) (see Fig. 2).

Fig. 2. Scatter plots of FA, MD, mean length (ML), and tract volume (TV) of (a) the corpus callosum and (b) the left arcuate fasciculus reconstructed from the DTI images acquired in the first sessions (x-axis) versus the second sessions (y-axis). Each row shows the data from a different DTI sequence: the first row shows DTI30-2, the second row DTI30-1, and the third row DTI15-2. Light blue dots contrast the data from the first session (DTI30-2A, DTI30-1A, or DTI15-2A) with the data of the first DTI sequences acquired in the second sessions (DTI30-2B, DTI30-1B, or DTI15-2B), and dark blue dots contrast the data from the first session (DTI30-2A, DTI30-1A, or DTI15-2A) with the data of the second DTI sequences acquired in the second sessions (DTI30-2C, DTI30-1C, or DTI15-2C).
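As a rough illustration of the two reliability indices behind these test-retest comparisons, the sketch below computes a within-subject CV and an ICC from a subjects-by-repeats table. The exact estimators used in the study are not restated in this section, so the formulas here are assumptions: a root-mean-square within-subject CV and a one-way random-effects ICC, both common choices. The function names and the toy numbers are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean, stdev


def within_subject_cv(data):
    """Root-mean-square within-subject CV in percent (assumed estimator).

    data: list of per-subject lists holding the repeated measurements.
    """
    squared_cvs = [(stdev(row) / mean(row)) ** 2 for row in data]
    return 100.0 * sqrt(mean(squared_cvs))


def icc_oneway(data):
    """One-way random-effects ICC (assumed form) for a subjects x repeats table."""
    n = len(data)      # number of subjects
    k = len(data[0])   # repeated measurements per subject
    grand = mean(x for row in data for x in row)
    subject_means = [mean(row) for row in data]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(data, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy values only (three subjects, two sessions); not taken from the study.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_cv = within_subject_cv(toy)
toy_icc = icc_oneway(toy)
```

With these definitions, a low CV reflects small session-to-session scatter relative to each subject's mean, whereas a high ICC additionally requires that subjects differ from one another by more than that scatter. This is one reason the two indices can disagree for the same measurement.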

Table 3 summarizes the tractography measurements from the ten fiber regions with acceptable intersession CV, ICC, or both for the three DTI sequences. The corpus callosum, cingulum, CPsf, uncinate fasciculus, and arcuate fasciculus each provided 14-18 reliable measurements out of 18 across the three DTI sequences. Two thirds (120/180) of the tested tractography measurements were identified as reliable, of which 43 were from DTI30-2, 40 from DTI30-1, and 37 from DTI15-2.

Table 3.

Tractography measurements from the 10 fiber regions showing either CV ≤ 10%, ICC ≥ .70, or both.

Fiber Regions	DTI30-2	DTI30-1	DTI15-2	Total
Corpus callosum	FA, MD, FC, ML, TV, FD	FA, MD, FC, ML, TV, FD	FA, MD, ML, FD	16
Cingulum	FA, MD, FC, ML, TV	FA, MD, FC, ML, TV	FA, MD, FC, TV, FD	15
Angular bundle	FA, MD	FA, MD	FA, MD, ML	7
Fornix body	FA, MD, FC, ML	FA, MD, ML	FA, MD	9
Fornix crus	FA, MD	FA, MD, ML	FA, MD, ML	8
CPsf	FA, MD, ML, TV, FD	FA, MD, ML, FD	FA, MD, FC, TV, FD	14
Uncinate fasciculus	FA, MD, FC, ML, TV, FD	FA, MD, FC, ML, TV, FD	FA, MD, FC, ML, TV, FD	18
ILF	FA, MD, ML, TV	FA, MD, ML	FA, MD, ML	10
IFO	FA, MD, ML	FA, MD, ML	FA, MD	8
Arcuate fasciculus	FA, MD, FC, ML, TV, FD	FA, MD, FC, ML, TV	FA, MD, ML, TV	15

Total	43	40	37	120

Abbreviations: FC, fiber count; FD, fiber density; ML, mean length; TV, tract volume.
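The selection rule behind Table 3 can be sketched in a few lines. The example below applies the criterion (intersession CV ≤ 10% or ICC ≥ .70) to the left arcuate fasciculus rows of Table 2. The column order is an assumption inferred from Appendix Table 1: within each row, the fourth to sixth values are taken as intersession CV and the last three as intersession ICC, each ordered DTI30-2, DTI30-1, DTI15-2. Variable names are illustrative only.

```python
SEQUENCES = ("DTI30-2", "DTI30-1", "DTI15-2")

# Left arcuate fasciculus, intersession values copied from Table 2.
# Each entry: (CV in % per sequence, ICC per sequence); column order is assumed.
ARCUATE_INTERSESSION = {
    "FA": ((2.83, 2.51, 3.11), (0.73, 0.92, 0.39)),
    "MD": ((1.15, 1.10, 1.75), (0.85, 0.87, 0.69)),
    "FC": ((20.17, 29.42, 43.27), (0.94, 0.82, 0.63)),
    "ML": ((3.06, 4.46, 9.99), (0.85, 0.74, 0.75)),
    "TV": ((17.85, 19.83, 21.29), (0.88, 0.82, 0.70)),
    "FD": ((21.03, 26.67, 33.77), (0.83, 0.67, 0.58)),
}


def is_reliable(cv_percent, icc, cv_max=10.0, icc_min=0.70):
    """A measurement counts as reliable if either threshold is met."""
    return cv_percent <= cv_max or icc >= icc_min


reliable = {seq: [] for seq in SEQUENCES}
for variable, (cvs, iccs) in ARCUATE_INTERSESSION.items():
    for seq, cv, icc in zip(SEQUENCES, cvs, iccs):
        if is_reliable(cv, icc):
            reliable[seq].append(variable)

total_reliable = sum(len(names) for names in reliable.values())
```

Under the assumed column order, the resulting lists match the arcuate fasciculus row of Table 3 (six, five, and four reliable measurements, 15 in total), which supports the column interpretation without proving it.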

Intra- Versus Inter-Session Reliability

To compare intra- and inter-session CV_within and ICC, paired t tests were performed on individual fiber variables across the 10 fiber regions, and the false discovery rate (FDR) was used to correct for multiple comparisons (Table 4). For CV_within, only DTI15-2 showed significantly higher intersession than intra-session values, in FA and mean length (paired t(9) = −3.82 and −5.83, p = .004 and .0002, respectively). For ICC, both DTI30-2 and DTI15-2 showed significant differences in some of the fiber variables. In DTI30-2, intra-session ICC for FA was significantly higher than intersession ICC (paired t(9) = 4.59, p = .001). In DTI15-2, three fiber variables showed significantly higher intra-session ICC than intersession ICC: FA (paired t(9) = 3.30, p = .009), mean length (paired t(9) = 3.62, p = .006), and tract volume (paired t(9) = 3.24, p = .010).
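The paired comparison can be sketched as follows. The example uses the mean length CV values for DTI15-2 from the eight fiber regions whose rows appear in full in this part of Table 2 (angular bundle through arcuate fasciculus), so it has 7 rather than 9 degrees of freedom and is not expected to reproduce the reported t(9) = −5.83. The column order (third value intra-session, sixth value intersession) is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Mean length CV (%) for DTI15-2 as (intra-session, intersession) per region,
# copied from Table 2 under the assumed column order.
ML_CV_DTI15_2 = {
    "angular bundle": (9.79, 13.88),
    "fornix body": (6.36, 14.83),
    "fornix crus": (10.27, 13.43),
    "CPsf": (6.45, 10.07),
    "uncinate fasciculus": (10.81, 12.91),
    "ILF": (3.36, 7.69),
    "IFO": (4.75, 17.65),
    "arcuate fasciculus": (4.19, 9.99),
}


def paired_t(pairs):
    """Return (t, df) for a paired t test on (first, second) value pairs."""
    diffs = [first - second for first, second in pairs]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Differences are intra-session minus intersession, so a negative t means
# that intersession variability is higher.
t_value, df = paired_t(ML_CV_DTI15_2.values())
```

The sign convention follows Table 4, where negative t values for CV and positive t values for ICC both indicate worse intersession reliability. A p-value would additionally require the t distribution with the corresponding degrees of freedom, which is omitted here to keep the sketch dependency-free.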

Table 4.

Comparisons of intra- and inter-session reliability

Reliability Indices	DTI30-2		DTI30-1		DTI15-2
	Paired t	P	Paired t	P	Paired t	P

CV
FA	−2.98	0.02	−0.95	0.37	−3.85^*	0.004
MD	−1.94	0.09	−0.06	0.95	−1.53	0.16
Fiber count	−1.87	0.09	−0.22	0.83	−1.24	0.25
Mean length	−2.56	0.03	−0.94	0.37	−5.83^*	< 0.001
Tract volume	−1.89	0.09	−0.02	0.98	−2.60	0.03
Fiber density	−1.67	0.13	−2.07	0.07	−2.43	0.04
ICC
FA	4.59^*	0.001	0.66	0.53	3.30^*	0.009
MD	0.94	0.37	1.57	0.15	2.42	0.04
Fiber count	2.51	0.03	0.45	0.66	1.72	0.12
Mean length	1.49	0.17	1.78	0.11	3.62^*	0.006
Tract volume	2.36	0.04	0.49	0.63	3.24^*	0.010
Fiber density	1.61	0.14	0.15	0.88	2.59	0.029

* Significant at FDR .05 (p ≤ .004 for CV and p ≤ .010 for ICC).
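The FDR cutoffs in the footnote can be checked with a short sketch of the Benjamini-Hochberg step-up procedure (Benjamini and Hochberg, 1995). It is assumed here, not stated in the text, that the 18 CV tests and the 18 ICC tests of Table 4 were each treated as one family. The entry reported as "< 0.001" is entered as .0002, the value given in the text for mean length in DTI15-2. The function name is hypothetical.

```python
def bh_cutoff(p_values, q=0.05):
    """Largest p-value declared significant by Benjamini-Hochberg, or None."""
    ranked = sorted(p_values)
    m = len(ranked)
    cutoff = None
    for rank, p in enumerate(ranked, start=1):
        # Step-up rule: keep the largest rank whose p-value is under q*rank/m.
        if p <= q * rank / m:
            cutoff = p
    return cutoff


# P values from Table 4, row by row (DTI30-2, DTI30-1, DTI15-2 per variable).
CV_P = [0.02, 0.37, 0.004, 0.09, 0.95, 0.16, 0.09, 0.83, 0.25,
        0.03, 0.37, 0.0002, 0.09, 0.98, 0.03, 0.13, 0.07, 0.04]
ICC_P = [0.001, 0.53, 0.009, 0.37, 0.15, 0.04, 0.03, 0.66, 0.12,
         0.17, 0.11, 0.006, 0.04, 0.63, 0.010, 0.14, 0.88, 0.029]

cv_cutoff = bh_cutoff(CV_P)
icc_cutoff = bh_cutoff(ICC_P)
```

Under the assumed family sizes, the procedure yields cutoffs of .004 for CV and .010 for ICC, in agreement with the footnote. The ICC case also shows the step-up behavior: p = .006 and p = .009 exceed their own rank thresholds but are still declared significant because p = .010 at rank 4 passes.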

The Effect of Gradient Directions and Scan Repetitions

Next, pairwise comparisons of CV_within and ICC were performed between DTI30-2, DTI30-1, and DTI15-2 to assess the effect of scan repetition and of increasing the number of gradient directions from 15 to 30 on intra- and inter-session reliability (Table 5). For intra- and inter-session CV_within, the values from DTI30-2 were not significantly different from the values from DTI30-1, whereas mean length (paired t(9) = −4.98, p = .001) and tract volume (paired t(9) = −3.65, p = .005) showed significantly lower intersession CV_within in DTI30-1 relative to DTI15-2. When comparing DTI30-2 to DTI15-2, mean length (paired t(9) = −5.97, p = .0002), tract volume (paired t(9) = −4.71, p = .001), and fiber density (paired t(9) = −4.27, p = .002) showed significantly lower intersession CV_within in DTI30-2. In contrast, none of the comparisons of intra-session CV_within or intra- and inter-session ICC survived an FDR of 5%.

Table 5.

Reliability comparisons between different DTI sequences.

Reliability Indices	Intra-session Reliability						Intersession Reliability
	DTI30-2 Vs. DTI30-1		DTI30-1 Vs. DTI15-2		DTI30-2 Vs. DTI15-2		DTI30-2 Vs. DTI30-1		DTI30-1 Vs. DTI15-2		DTI30-2 Vs. DTI15-2
	Paired t	P	Paired t	P	Paired t	P	Paired t	P	Paired t	P	Paired t	P

CV
FA	−1.17	0.27	1.61	0.14	1.14	0.29	−0.49	0.63	−0.86	0.41	−1.11	0.30
MD	−0.11	0.91	0.07	0.95	−0.01	0.99	0.54	0.61	−1.87	0.10	−1.56	0.15
Fiber count	−2.48	0.04	−0.26	0.80	−1.93	0.09	−1.14	0.29	−1.38	0.20	−2.79	0.02
Mean length	−1.71	0.12	−0.77	0.46	−1.72	0.12	−0.03	0.97	−4.98^*	0.001	−5.97^*	<0.001
Tract volume	−1.59	0.15	−0.39	0.71	−1.83	0.10	−0.89	0.40	−3.65^*	0.005	−4.71^*	0.001
Fiber density	−1.63	0.14	−0.15	0.88	−1.45	0.18	−1.82	0.10	−1.62	0.14	−4.27^*	0.002
ICC
FA	1.57	0.15	−1.35	0.21	−0.44	0.67	0.42	0.68	0.57	0.58	1.06	0.32
MD	0.79	0.45	−1.26	0.24	−0.85	0.42	1.69	0.12	1.57	0.15	2.39	0.04
Fiber count	2.88	0.02	−0.86	0.41	0.87	0.40	1.88	0.09	0.35	0.74	1.92	0.09
Mean length	0.57	0.58	−2.01	0.08	−1.45	0.18	0.49	0.64	1.75	0.11	2.37	0.04
Tract volume	3.43	0.008	−2.13	0.06	1.44	0.18	2.42	0.04	1.07	0.31	2.71	0.02
Fiber density	2.31	0.05	−1.34	0.21	−0.05	0.96	1.59	0.15	0.69	0.51	2.41	0.04

* Significant at FDR .05 (p ≤ .005 for CV).

Discussion

The current study provided an explicit evaluation of the reliability of tractography measurements of nine major fiber tracts in the brain. The examined fiber variables included FA and MD—which are the most widely used DTI parameters—together with the less commonly used volumetric measurements such as fiber count, mean length, tract volume, and fiber density. Different types of intra- and inter-session reliabilities were compared to find important sources of variability and to evaluate whether increasing the number of gradient directions from 15 to 30 and scan repetition improved reliability.

Reliability of Tractography Measurements

The major findings of the current study were the identification of fiber tracts that provide tractography measurements with high test-retest reliability, and of fiber variables that show acceptable test-retest reliability across different fiber tracts. In line with previous reports (Ciccarelli et al., 2003; Danielian et al., 2010; Heiervang et al., 2006; Vollmar et al., 2010), the reliability of tractography measurements was shown to be tract and variable specific. Across the three different DTI sequences (i.e. DTI30-2, DTI30-1, and DTI15-2), measurements from the corpus callosum tended to show lower intra- and inter-session CV (≤ 10% except 11.3% for fiber count and 10.1% for tract volume from DTI15-2) than those from other fiber tracts. Furthermore, the measurements from the corpus callosum, uncinate fasciculus, cingulum, and arcuate fasciculus provided relatively high intra- and inter-session ICC (ICC ≥ .70 for 26-32/36 measurements) compared to other fiber tracts (ICC ≥ .70 for 7-20/36 measurements). Considering both intersession CV and ICC, 120/180 tractography measurements showed either low test-retest variability, high consistency, or both across the three DTI sequences. Five fiber regions, namely the corpus callosum, cingulum, CPsf, uncinate fasciculus, and arcuate fasciculus, provided reliable measurements (i.e. intersession CV ≤ 10% or ICC ≥ .70) beyond FA, MD, and mean length, while the reliable tractography measurements from the remaining five fiber regions were more restricted to FA and MD. Of the six fiber variables, FA and MD measurements showed the lowest intra- and inter-session CV, followed in turn by mean length, tract volume, fiber density, and fiber count. Whereas all 60 FA and MD measurements had CV ≤ 10%, only 44 mean length, 9 tract volume, 9 fiber density, and 5 fiber count measurements attained such levels. In contrast, the six fiber variables exhibited similar intra- and inter-session ICC levels, with mean length showing the highest number of measurements with acceptable ICC values (ICC ≥ .70 for 46/60 measurements for mean length, and 28-35/60 for the remaining fiber variables).

Among previous studies, only Danielian et al. (2010) performed deterministic fiber tracking. They acquired DTI with 32 non-collinear directions (NEX 4) and assessed the reliability of tractography measurements from the complete corpus callosum, its genu, motor, and splenium segments, and the bilateral uncinate fasciculus. Despite the differences in scan repetition, image acquisition scheme, experimental design, image preprocessing, and fiber tracking methods, the previously reported reliability was at a similar level to that of the current results, notably the DTI30-2 images for both the complete corpus callosum and the uncinate fasciculus and also the DTI30-1 images for the corpus callosum (see Appendix Table 1). The findings from Danielian et al. and ours indicate that the choices of the number of directions and scan repetition are both more important than the choice of MR scanner for the reliability of tractography measurements (see below for a more detailed discussion of the effect of the number of gradient encoding directions on reliability). We can also compare our results with those of two other previous studies, one using fast marching and the other using a probabilistic fiber tracking method (Ciccarelli et al., 2003; Heiervang et al., 2006). Despite the fact that both previous studies acquired DTI using 60 directions (NEX 1) and thus had equivalent or longer scan times than the current three sequences, our data showed lower variability (i.e. the intersession CVs for all three sequences from our study were lower than the intersession CVs from Ciccarelli et al. and Heiervang et al.) (see Appendix Table 1).

Error Sources in DTI Tractography Measurements

We estimated and then compared intra- and inter-session reliabilities in order to find major error sources. For DTI30-2, only FA showed significantly lower intersession ICC compared to intra-session ICC. For DTI15-2, both FA and mean length exhibited significantly higher intersession CV compared to intra-session CV, and FA as well as mean length and tract volume exhibited significantly lower intersession ICC compared to intra-session ICC. No significant differences were detected between intra- and inter-session CV and ICC for DTI30-1 even though many measurements showed higher intersession CV than intra-session CV (44/60).

The analysis indicated for the first time that MRI signal variation and physiological noise/change over time were important sources of error for the test-retest reliability of tractography measurements, especially FA, which has been used in a variety of studies. Although FA showed low intra-subject variability across the three DTI sequences, a pattern that helps in detecting group differences between samples, the low intersession ICC for many FA measurements could potentially affect the capability of FA to reveal longitudinal changes and to find associations with other types of measurement. Additionally, we showed that MRI signal variation and physiological noise/change became more prominent for images acquired with 15 gradient directions, affecting both CV and ICC for several fiber variables. However, the results did not imply that the intra-session reliabilities for all tractography measurements were acceptable. In fact, 37/180 (21%) tractography measurements showed both high intra-session CV_within and low ICC. These were mostly fiber count, tract volume, and fiber density measurements from fiber tracts such as the fornix, angular bundle, and IFO. The low reliability of some tractography measurements, even for images acquired in the same scan session, indicated that the combination of MRI signal variation within a scan session, subject motion, physiological noise, and rater variability was also an important error source in addition to MRI signal variation and physiological noise/change over time. Future studies should concentrate on improving image quality by stabilizing the MRI signal within a scan session and across sessions, and by minimizing subject motion. Our data suggest that utilizing DTI acquisition schemes with 30 directions may help to reduce MRI signal variation over time. In addition, routinely using phantoms to calibrate water diffusivity and directionality (Lorenz et al., 2008) may be a useful procedure to avoid drift in DTI measurements.

The Effect of Number of Gradient Directions and Scan Repetitions

To determine whether increasing the number of gradient directions from 15 to 30 and scan repetition improved reliability, intra- and inter-session CV and ICC were compared between DTI30-2, DTI30-1, and DTI15-2. While the number of gradient directions and scan repetition had no significant effect on intra-session reliability, we found that increasing the number of gradient directions significantly reduced intersession CV for mean length and tract volume when comparing DTI15-2 to DTI30-1. This finding indicates that using 30 instead of 15 uniformly distributed gradient directions could significantly reduce intersession variability for mean length and tract volume measurements even when scan times are equivalent. However, our result is at odds with the findings of Heiervang et al. (2006), who acquired images with a 1.5T Siemens Sonata MR scanner. In Heiervang et al. (2006), probabilistic fiber tracking was performed to obtain FA, MD, and tract volume measurements of the cingulum, pyramidal tracts, optic radiations, and genu of the corpus callosum. The intersession CV for 12-direction images was reported as being similar to that for 60-direction images. One major difference between the study of Heiervang et al. and the present one, however, was the number of fiber tracts examined (Heiervang et al. used 4, whereas we used 9), and this difference might have affected the sensitivity in detecting differences in tract volume. Other factors contributing to the discrepancy could be differences in scanner performance, image acquisition scheme, tracts-of-interest, and the fiber tracking algorithms employed in the analyses. Further analysis is needed to investigate these discrepancies.

Unexpectedly, although increasing scan repetition from 1 to 2 for DTI with 30 directions did not significantly influence intra- or inter-session reliability when combining tractography measurements from the 10 fiber regions, the values obtained were significantly different for many measurements between the two datasets. The reduced SNR in DTI30-1 relative to DTI30-2 led to significantly elevated FA measurements from all 10 fiber regions (2.6-10% increase) as well as significantly elevated fiber count, mean length, and tract volume measurements from many fiber regions. In contrast, the effect on MD was mixed, with increased, reduced, or unchanged values. The results were consistent with the findings of Farrell et al. (2007) of an upward bias in FA, but no bias in MD, when reducing SNR for ROI- and voxel-based measurements. A practical implication is that DTI with different numbers of scan repetitions should not be mixed in the same analysis. Such mixing could occur when multiple scans are acquired to improve SNR: when subject motion is detected in one of the scans for a subset of the subjects, we may opt to discard the scans with substantial subject motion. However, this may significantly affect the FA and other DTI measurements for these subjects.

In addition, increasing the number of gradient directions from 15 to 30 with either equivalent or doubled scan time had an effect on both the values and the CV_between of tractography measurements. All FA measurements had significantly higher values in DTI30-1 compared to DTI15-2 (1.8-13% increase), as did many measurements of fiber count, mean length, and tract volume. MD was also affected, with significantly lower values in DTI30-1 compared to DTI15-2 (3.7-11% decrease). Comparing DTI15-2 with DTI30-2, the increased gradient directions had a significant effect on all MD measurements except fornix crus (3.1-15% decrease) and on some of the measurements of the remaining fiber variables (lower in DTI15-2 except for CPsf FA). Lower tract volumes in DTI with 12 directions compared to those in DTI with 60 directions have been reported by Heiervang et al. (2006). Furthermore, we found that the values of the CV_between were also higher for mean length, tract volume, and fiber density in DTI15-2 compared to DTI30-1, and for mean length and fiber density when comparing DTI15-2 with DTI30-2.

In summary, the comparisons of the three DTI sequences suggested that each had its own advantages. While DTI30-1 provided tractography measurements with lower variability in mean length, tract volume, and fiber density than DTI15-2, the values of tractography measurements from DTI15-2 were closer to those from DTI30-2 than were the values from DTI30-1. Among the three sequences, DTI30-2 provided the most reliable measurements. However, it is important to keep in mind that the current reliability study was conducted on healthy subjects. The choice of DTI sequence could be different in clinical populations who may experience difficulties in staying still in the scanner. In subject populations where motion is likely, DTI15-2 or DTI30-1 may produce more reliable data than DTI30-2. Indeed, the intersession reliability was not significantly different between DTI30-1 and DTI30-2, and 15 directions were sufficient for obtaining reliable FA and MD measurements from all 10 fiber regions, as well as mean length and tract volume measurements from some of the fiber regions (Table 3).

The Reliability of DTI Tractography

While the reconstruction of the 10 fiber regions was successful for most images, the left arcuate fasciculus unexpectedly could not be reconstructed from some of the images for one participant. The reconstruction failure could have been caused by a biologically small tract size, since the tract volumes of the arcuate fasciculus successfully reconstructed from other images of the same participant were also small. A second issue encountered during fiber reconstruction was the occurrence of so-called ‘red voxels.’ These ‘red voxels’ had abnormally high FA values and were located along the longitudinal fissure, where water diffusion directionality is normally low. Consequently, some corpus callosum fibers propagated through the longitudinal fissure and formed loops. These fibers were excluded from the result. However, the deletion might have affected the reliability. A third issue was the truncation of the corpus callosum fibers reconstructed from DTI15-2. The corpus callosum fibers failed to propagate to the cortex in some of the images, affecting the intersession reliability of tractography measurements other than FA and MD.

Limitations and Future Directions

Several limitations of the present study need to be mentioned. The first is the well-known ‘crossing fiber’ issue that can occur with deterministic fiber tracking algorithms such as the one implemented in DTI Studio. This algorithm is considered advantageous for clinical applications compared to many alternative, computationally intensive probabilistic fiber tracking methods (Behrens et al., 2003). This advantage notwithstanding, the deterministic fiber tracking algorithm fails to resolve complex fiber organization when fiber tracts are crossing or “kissing,” and this can cause either early termination of fiber propagation because of a sub-threshold FA value, or erroneous fibers due to an incorrect principal diffusion direction. It is unknown how the ‘crossing fiber’ problem affected the reliability of our tractography measurements. Additional studies are needed to evaluate the performance of different fiber tracking algorithms and compare the reliability of the resulting quantifications.

The second limitation is in the recognition of important error sources. The current investigation estimated the effects on reliability of the combination of MRI signal variation, subject motion, and subject physiological noise/change, as well as of scan repetition and the number of gradient directions. However, the study neither exhaustively differentiated all error sources, nor did it estimate reliability as a function of the number of gradient sampling directions. Addressing these questions would require further experiments. For example, the contribution of MRI signal variation to variability in tractography measurements can be tested using a phantom specifically designed for calibrating water diffusion properties and fiber tracking results. Nevertheless, the present study has identified important error sources in the test-retest reliability of tractography measurements, which should form a guide for future improvement.

Finally, we performed tractography in subjects’ native space instead of using automatic fiber tracking, which requires the transformation of all images to a common space followed by the application of a common set of ROIs for tractography. Our intent was to provide a reliability assessment of tractography measurements obtained using the most popular method in order to provide insight regarding their potential efficacy in clinical research and practice. Although we acquired the images with AC-PC alignment and defined the ROI slices rigorously, small misalignments between images and minor variations in intra-rater reliability might have had an effect on the reliability assessment. However, automatic fiber tracking may also add variability to the reliability assessment because individual differences in brain morphology make image registration and transformation less than perfect (Klein et al., 2009). Further, the need to perform tensor transformations, coupled with the low image resolution of DTI, makes the process even more challenging (Zhang et al., 2008). Future experiments could beneficially explore whether automatic fiber tracking via spatially normalized data can further improve the reliability of tractography measurements.

Conclusions

The present study assessed the test-retest reliability of quantitative diffusion tensor tractography. Through a comprehensive evaluation of tractography measurements from nine major white matter fiber tracts, 37-43/60 measurements were identified as reliable, depending on the number of gradient directions and scan repetition. Many of these came from the corpus callosum, cingulum, CPsf, uncinate fasciculus, and arcuate fasciculus, and from FA and MD values of all 10 fiber regions. Notably, the identification of reliable tractography measurements was verified in a separate study (Wang et al., 2011), which showed improvement in the validity of tractography measurements for detecting traumatic axonal injury using the reliable measurements.

Different types of intra- and inter-session reliability of the tractography measurements were compared. While intersession reliability was lower than intra-session reliability for DTI with 15 directions, even intra-session reliability was low for some tractography measurements from DTI with 15 as well as 30 directions. The results indicate that the combination of MRI signal variation, subject motion, and physiological noise/change is an important error source. To improve reliability, increasing the number of gradient directions can be an effective method, especially for volumetric tractography measurements, a conclusion supported by the reduced variability in 30- versus 15-direction DTI for mean length and tract volume even when the scan time is equivalent. In contrast, the accuracy of the values can be improved through scan repetition, as is evident from comparing group means of tractography measurements from DTI30-1 with those from DTI30-2. These findings have important implications for future experiments. As long as subject motion is under control, scan repetition and an increased number of gradient directions will likely lead to tractography measurements that are both accurate and highly reliable on retest. Nevertheless, DTI images with 15 directions and two scan repetitions are capable of providing reliable FA and MD measurements from all nine major fiber tracts, and volumetric tractography measurements from some of the fiber tracts.

Supplementary Material

NIHMS358026-supplement-01.zip (505.1 KB, zip)

Acknowledgements

The authors would like to express gratitude to the volunteers and to the researchers and staff at the Meadow’s Imaging Center at UT Southwestern Medical Center. We would also like to thank Bennett Landman from Johns Hopkins University, who kindly provided MATLAB scripts for generating 15 gradient directions with minimal potential energy, and Tom Porteous for editorial comments. This study was supported by the National Institute on Disability and Rehabilitation Research (H133A020526 to R.D.-A.) and the National Institutes of Health (U01 HD042652, R01 HD48179 to R.D.-A.).

Appendix

Appendix Table 1.

Comparisons of intersession CV and ICC to previous reports.

Tractography measurements	Ciccarelli et al., 2003	Heiervang et al., 2006^a	Danielian et al., 2010		DTI30-2		DTI30-1		DTI15-2

	CV (%)	CV (%)	CV (%)	ICC	CV (%)	ICC	CV (%)	ICC	CV (%)	ICC

CC
FA	6.2	5.9	1.4	0.87	1.2	0.90	1.9	0.88	1.3	0.86
MD	-	4.7	1.2	0.83	1.3	0.90	1.5	0.76	1.4	0.89
Tract Volume	7.8	9.0	4.1	0.92	6.1	0.84	5.0	0.95	10.9	0.26

Left CB
FA	-	7.3	-	-	2.3	0.76	3.3	0.67	2.9	0.77
MD	-	2.2	-	-	2.0	0.65	1.8	0.71	1.7	0.55
Tract Volume	-	28.9	-	-	12.3	0.88	14.4	0.79	19.7	0.83

Right UF
FA	-	7.3	1.2	0.89	3.1	0.63	3.2	0.66	5.3	0.53
MD	-	2.2	1.4	0.91	1.1	0.84	1.7	0.77	1.8	0.58
Tract Volume	-	28.9	12.3	0.68	16.1	0.86	12.5	0.82	26.4	0.75

^a The values were from the two-ROI approach, and the genu instead of the complete corpus callosum was reported.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

Abdi H. Coefficient of variation. In: NJ IS, editor. Encyclopedia of Research Design. SAGE Publications, Inc.; Thousand Oaks, CA: 2010. pp. 169–171. [Google Scholar]
Armitage P. Statistical methods in medical research. Blackwell Scientific Publications; Oxford: 1971. [Google Scholar]
Aron AR, Behrens TE, Smith S, Frank MJ, Poldrack RA. Triangulating a cognitive control network using diffusion-weighted magnetic resonance imaging (MRI) and functional MRI. J. Neurosci. 2007;27:3743–3752. doi: 10.1523/JNEUROSCI.0519-07.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
Augustinack JC, Helmer K, Huber KE, Kakunoori S, Zollei L, Fischl B. Direct visualization of the perforant pathway in the human brain with ex vivo diffusion tensor imaging. Front Hum Neurosci. 2010;4:42. doi: 10.3389/fnhum.2010.00042. [DOI] [PMC free article] [PubMed] [Google Scholar]
Basser PJ, Mattiello J, LeBihan D. Estimation of the effective self-diffusion tensor from the NMR spin echo. J. Magn. Reson. B. 1994;103:247–254. doi: 10.1006/jmrb.1994.1037. [DOI] [PubMed] [Google Scholar]
Basser PJ, Pierpaoli C. Microstructural and physiological features of tissues elucidated by quantitative-diffusion-tensor MRI. J. Magn. Reson. B. 1996;111:209–219. doi: 10.1006/jmrb.1996.0086. [DOI] [PubMed] [Google Scholar]
Beaulieu C. The basis of anisotropic water diffusion in the nervous system - a technical review. NMR Biomed. 2002;15:435–455. doi: 10.1002/nbm.782. [DOI] [PubMed] [Google Scholar]
Behrens TE, Woolrich MW, Jenkinson M, Johansen-Berg H, Nunes RG, Clare S, Matthews PM, Brady JM, Smith SM. Characterization and propagation of uncertainty in diffusion-weighted MR imaging. Magn. Reson. Med. 2003;50:1077–1088. doi: 10.1002/mrm.10609. [DOI] [PubMed] [Google Scholar]
Ben-Shachar M, Dougherty RF, Wandell BA. White matter pathways in reading. Curr. Opin. Neurobiol. 2007;17:258–270. doi: 10.1016/j.conb.2007.03.006. [DOI] [PubMed] [Google Scholar]
Benjamini Y, Hochberg Y. Controlling the False Discovery Rate - a Practical and Powerful Approach to Multiple Testing. J. R. Statist. Soc. B. 1995;57:289–300. [Google Scholar]
Bisdas S, Bohning DE, Besenski N, Nicholas JS, Rumboldt Z. Reproducibility, interrater agreement, and age-related changes of fractional anisotropy measures at 3T in healthy subjects: effect of the applied b-value. AJNR Am. J. Neuroradiol. 2008;29:1128–1133. doi: 10.3174/ajnr.A1044. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–310. [PubMed] [Google Scholar]
Catani M, ffytche DH. The rises and falls of disconnection syndromes. Brain. 2005;128:2224–2239. doi: 10.1093/brain/awh622. [DOI] [PubMed] [Google Scholar]
Catani M, Howard RJ, Pajevic S, Jones DK. Virtual in vivo interactive dissection of white matter fasciculi in the human brain. Neuroimage. 2002;17:77–94. doi: 10.1006/nimg.2002.1136. [DOI] [PubMed] [Google Scholar]
Ciccarelli O, Parker GJ, Toosy AT, Wheeler-Kingshott CA, Barker GJ, Boulby PA, Miller DH, Thompson AJ. From diffusion tractography to quantitative white matter tract measures: a reproducibility study. Neuroimage. 2003;18:348–359. doi: 10.1016/s1053-8119(02)00042-3. [DOI] [PubMed] [Google Scholar]
Concha L, Gross DW, Beaulieu C. Diffusion tensor tractography of the limbic system. AJNR Am. J. Neuroradiol. 2005;26:2267–2274. [PMC free article] [PubMed] [Google Scholar]
Conturo TE, Lori NF, Cull TS, Akbudak E, Snyder AZ, Shimony JS, McKinstry RC, Burton H, Raichle ME. Tracking neuronal fiber pathways in the living human brain. Proc. Natl. Acad. Sci. U. S. A. 1999;96:10422–10427. doi: 10.1073/pnas.96.18.10422. [DOI] [PMC free article] [PubMed] [Google Scholar]
Crosby EC, Humphrey T, Lauer EW. Correlative anatomy of the nervous system. The Macmillan Company; New York: 1962. [Google Scholar]
Danielian LE, Iwata NK, Thomasson DM, Floeter MK. Reliability of fiber tracking measurements in diffusion tensor imaging for longitudinal study. Neuroimage. 2010;49:1572–1580. doi: 10.1016/j.neuroimage.2009.08.062. [DOI] [PMC free article] [PubMed] [Google Scholar]
Duffau H. The anatomo-functional connectivity of language revisited. New insights provided by electrostimulation and tractography. Neuropsychologia. 2008;46:927–934. doi: 10.1016/j.neuropsychologia.2007.10.025. [DOI] [PubMed] [Google Scholar]
Farrell JA, Landman BA, Jones CK, Smith SA, Prince JL, van Zijl PC, Mori S. Effects of signal-to-noise ratio on the accuracy and reproducibility of diffusion tensor imaging-derived fractional anisotropy, mean diffusivity, and principal eigenvector measurements at 1.5 T. J. Magn. Reson. Imaging. 2007;26:756–767. doi: 10.1002/jmri.21053. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gao W, Zhu H, Lin W. A unified optimization approach for diffusion tensor imaging technique. Neuroimage. 2009;44:729–741. doi: 10.1016/j.neuroimage.2008.10.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
Heiervang E, Behrens TE, Mackay CE, Robson MD, Johansen-Berg H. Between session reproducibility and between subject variability of diffusion MR and tractography measures. Neuroimage. 2006;33:867–877. doi: 10.1016/j.neuroimage.2006.07.037. [DOI] [PubMed] [Google Scholar]
Huang H, Zhang J, Jiang H, Wakana S, Poetscher L, Miller MI, van Zijl PC, Hillis AE, Wytik R, Mori S. DTI tractography based parcellation of white matter: application to the mid-sagittal morphology of corpus callosum. Neuroimage. 2005;26:195–205. doi: 10.1016/j.neuroimage.2005.01.019. [DOI] [PubMed] [Google Scholar]
Huang H, Zhang J, van Zijl PC, Mori S. Analysis of noise effects on DTI-based tractography using the brute-force and multi-ROI approach. Magn. Reson. Med. 2004;52:559–565. doi: 10.1002/mrm.20147. [DOI] [PubMed] [Google Scholar]
Jones DK. The effect of gradient sampling schemes on measures derived from diffusion tensor MRI: a Monte Carlo study. Magn. Reson. Med. 2004;51:807–815. doi: 10.1002/mrm.20033. [DOI] [PubMed] [Google Scholar]
Jones DK, Horsfield MA, Simmons A. Optimal strategies for measuring diffusion in anisotropic systems by magnetic resonance imaging. Magn. Reson. Med. 1999;42:515–525. [PubMed] [Google Scholar]
Klein A, Andersson J, Ardekani BA, Ashburner J, Avants B, Chiang MC, Christensen GE, Collins DL, Gee J, Hellier P, Song JH, Jenkinson M, Lepage C, Rueckert D, Thompson P, Vercauteren T, Woods RP, Mann JJ, Parsey RV. Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration. Neuroimage. 2009;46:786–802. doi: 10.1016/j.neuroimage.2008.12.037. [DOI] [PMC free article] [PubMed] [Google Scholar]
Landman BA, Farrell JA, Jones CK, Smith SA, Prince JL, Mori S. Effects of diffusion weighting schemes on the reproducibility of DTI-derived fractional anisotropy, mean diffusivity, and principal eigenvector measurements at 1.5T. Neuroimage. 2007;36:1123–1138. doi: 10.1016/j.neuroimage.2007.02.056. [DOI] [PubMed] [Google Scholar]
Lorenz R, Bellemann ME, Hennig J, Il’yasov KA. Anisotropic phantoms for quantitative diffusion tensor imaging and fiber-tracking validation. Appl. Magn. Reson. 2008;33:419–429. [Google Scholar]
Marenco S, Rawlings R, Rohde GK, Barnett AS, Honea RA, Pierpaoli C, Weinberger DR. Regional distribution of measurement error in diffusion tensor imaging. Psychiatry Res. 2006;147:69–78. doi: 10.1016/j.pscychresns.2006.01.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
McClelland JL, Rogers TT. The parallel distributed processing approach to semantic cognition. Nat. Rev. Neurosci. 2003;4:310–322. doi: 10.1038/nrn1076. [DOI] [PubMed] [Google Scholar]
McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol. Methods. 1996;1:30–46. [Google Scholar]
Mori S, Crain BJ, Chacko VP, van Zijl PC. Three-dimensional tracking of axonal projections in the brain by magnetic resonance imaging. Ann. Neurol. 1999;45:265–269. doi: 10.1002/1531-8249(199902)45:2<265::aid-ana21>3.0.co;2-3. [DOI] [PubMed] [Google Scholar]
Mori S, van Zijl PC. Fiber tracking: principles and strategies - a technical review. NMR Biomed. 2002;15:468–480. doi: 10.1002/nbm.781. [DOI] [PubMed] [Google Scholar]
Naghavi HR, Nyberg L. Common fronto-parietal activity in attention, memory, and consciousness: shared demands on integration? Conscious Cogn. 2005;14:390–425. doi: 10.1016/j.concog.2004.10.003. [DOI] [PubMed] [Google Scholar]
Nolte J. The human brain: an introduction to its functional anatomy. 5th ed. Mosby; St. Louis, MO: 2002. [Google Scholar]
Ozturk A, Sasson AD, Farrell JA, Landman BA, da Motta AC, Aralasmak A, Yousem DM. Regional differences in diffusion tensor imaging measurements: assessment of intrarater and interrater variability. AJNR Am. J. Neuroradiol. 2008;29:1124–1127. doi: 10.3174/ajnr.A0998. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pfefferbaum A, Adalsteinsson E, Sullivan EV. Replicability of diffusion tensor imaging measurements of fractional anisotropy and trace in brain. J. Magn. Reson. Imaging. 2003;18:427–433. doi: 10.1002/jmri.10377. [DOI] [PubMed] [Google Scholar]
Skare S, Hedehus M, Moseley ME, Li TQ. Condition number as a measure of noise performance of diffusion tensor data acquisition schemes with MRI. J. Magn. Reson. 2000;147:340–352. doi: 10.1006/jmre.2000.2209. [DOI] [PubMed] [Google Scholar]
Smith SM. Fast robust automated brain extraction. Hum. Brain Mapp. 2002;17:143–155. doi: 10.1002/hbm.10062. [DOI] [PMC free article] [PubMed] [Google Scholar]
Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney DE, Niazy RK, Saunders J, Vickers J, Zhang Y, De Stefano N, Brady JM, Matthews PM. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage. 2004;23(Suppl 1):S208–219. doi: 10.1016/j.neuroimage.2004.07.051. [DOI] [PubMed] [Google Scholar]
Smith SM, Zhang Y, Jenkinson M, Chen J, Matthews PM, Federico A, De Stefano N. Accurate, robust, and automated longitudinal and cross-sectional brain change analysis. Neuroimage. 2002;17:479–489. doi: 10.1006/nimg.2002.1040. [DOI] [PubMed] [Google Scholar]
Vollmar C, O’Muircheartaigh J, Barker GJ, Symms MR, Thompson P, Kumari V, Duncan JS, Richardson MP, Koepp MJ. Identical, but not the same: intra-site and inter-site reproducibility of fractional anisotropy measures on two 3.0T scanners. Neuroimage. 2010;51:1384–1394. doi: 10.1016/j.neuroimage.2010.03.046. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wakana S, Caprihan A, Panzenboeck MM, Fallon JH, Perry M, Gollub RL, Hua K, Zhang J, Jiang H, Dubey P, Blitz A, van Zijl P, Mori S. Reproducibility of quantitative tractography methods applied to cerebral white matter. Neuroimage. 2007;36:630–644. doi: 10.1016/j.neuroimage.2007.02.049. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wakana S, Jiang H, Nagae-Poetscher LM, van Zijl PC, Mori S. Fiber tract-based atlas of human white matter anatomy. Radiology. 2004;230:77–87. doi: 10.1148/radiol.2301021640. [DOI] [PubMed] [Google Scholar]
Wang JY, Bakhadirov K, Abdi H, Devous MD, Sr., Marquez de la Plata CD, Moore C, Madden CJ, Diaz-Arrastia R. Longitudinal changes of structural connectivity in traumatic axonal injury. Neurology. 2011;77:818–826. doi: 10.1212/WNL.0b013e31822c61d7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang JY, Bakhadirov K, Devous MD, Sr., Abdi H, McColl R, Moore C, Marquez de la Plata CD, Ding K, Whittemore A, Babcock E, Rickbeil T, Dobervich J, Kroll D, Dao B, Mohindra N, Madden CJ, Diaz-Arrastia R. Diffusion tensor tractography of traumatic diffuse axonal injury. Arch. Neurol. 2008;65:619–626. doi: 10.1001/archneur.65.5.619. [DOI] [PubMed] [Google Scholar]
Woods RP, Grafton ST, Holmes CJ, Cherry SR, Mazziotta JC. Automated image registration: I. General methods and intrasubject, intramodality validation. J. Comput. Assist. Tomogr. 1998;22:139–152. doi: 10.1097/00004728-199801000-00027. [DOI] [PubMed] [Google Scholar]
Zhang W, Olivi A, Hertig SJ, van Zijl P, Mori S. Automated fiber tracking of human brain white matter using diffusion tensor imaging. Neuroimage. 2008;42:771–777. doi: 10.1016/j.neuroimage.2008.04.241. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data


Supplementary Materials

NIHMS358026-supplement-01.zip (505.1 KB, zip)

