Author manuscript; available in PMC: 2012 Aug 1.
Published in final edited form as: Magn Reson Med. 2011 Feb 28;66(2):324–332. doi: 10.1002/mrm.22858

Test-Retest Reliability and Reproducibility of Short-Echo-Time Spectroscopic Imaging of Human Brain at 3T

Charles Gasparovic 1,2, Edward J Bedrick 3, Andrew R Mayer 1,4, Ronald A Yeo 2, HongJi Chen 5, Eswar Damaraju 1, Vince D Calhoun 1,6, Rex E Jung 1,7
PMCID: PMC3130105  NIHMSID: NIHMS264108  PMID: 21360748

Abstract

A 1H magnetic resonance spectroscopic imaging (1H-MRSI) study at 3T and short TE was conducted to evaluate both the reproducibility, as measured by the inter-scan coefficient of variation (CV), and test-retest reliability, as measured by the intraclass correlation coefficient (ICC), of measurements of glutamate (Glu), combined glutamate and glutamine (Glx), myo-inositol (mI), N-acetylaspartate (NAA), creatine, and choline in 21 healthy subjects. The effect of partial volume correction on these measures and the relationship of reproducibility and reliability to data quality were also examined. A 1H-MRSI slice was prescribed above the lateral ventricles and single repeat scans were performed within 30 min to minimize physiologic variability. Inter-scan CVs based on all the voxels varied from 0.05-0.07 for NAA, creatine, and choline to 0.10-0.13 for mI, Glu, and Glx. Findings on the reproducibility of gray and white matter estimates of NAA, creatine, and choline are consistent with previous studies using longer TEs, with CVs in the range of 0.02-0.04 and ICCs in the range of 0.65-0.90. CVs for Glu, Glx, and mI are much lower than reported in previous studies at 1.5T, while white matter mI (CV=0.04, ICC=0.93) and gray matter Glx (CV=0.04, ICC=0.68) demonstrated both high reproducibility and test-retest reliability.

Introduction

Proton magnetic resonance spectroscopic imaging (1H-MRSI) is being used increasingly at high field (≥3T) and short echo times (TEs) to measure brain metabolites with multiplet signals that are generally more challenging to resolve at lower fields or longer TEs (1,2). The concentrations of glutamate (Glu), glutamine (Gln), combined Glu and Gln (Glx), or myo-inositol (mI), for example, have been measured with 1H-MRSI at 3T in several recent studies on brain disorders, including multiple sclerosis (3,4), cancer (5,6), and traumatic brain injury (7). Higher field strength improves signal-to-noise and the separation of neighboring signals, even though this is somewhat offset by an increase in line broadening (8). Short TEs, on the other hand, reduce signal loss due to T2 relaxation and cancellation of J-coupled signals by phase modulation, though the latter effect can be used judiciously to improve the detectability of selected peaks (9-11). Even with these advantages, of course, the factors that impact the reliability of measuring metabolite signals at lower fields or with longer TEs are still present at higher fields and shorter TEs. These include voxel size, magnetic field homogeneity, instrument noise, patient motion, and data processing. Furthermore, if longitudinal studies are undertaken, the accuracy of relocating the spectroscopic region of interest in subsequent scans becomes critical.

Although a number of past studies have examined the reproducibility or reliability of 1H-MRSI (12-20), most have been based on few subjects, some were conducted with non-standard pulse sequences, and none have been conducted at 3T using a short TE. Furthermore, the majority of these studies were not based on data corrected for partial volume effects, nor has a consistent inter-scan interval or measure of reproducibility or reliability been applied across studies. Only two studies (13,16) used a standard measure of test-retest measurement reliability, the intraclass correlation coefficient (ICC), while two others (12,14) used an analysis of variance approach to calculate coefficients of variation (CVs) as a measure of reproducibility, dividing the square root of separated variance components (for subject, scan, or voxel) by the mean across all voxels. The remainder of the studies reported more conventional CVs, based on the means and standard deviations of metabolite intensities over similar voxels across subjects or repeated scans. Only two studies calculated CVs based on metabolite intensities corrected for tissue content or brain region (16,20). The fact that the CVs determined from these various approaches vary, even for the most prominent 1H-MRS signals, from a few percent (16,20) to over 10% (14,15,17,18) underscores the general lack of comparability between approaches, experimentally as well as statistically. Understandably, neuroimaging researchers may be either disheartened or encouraged to use 1H-MRSI in their studies, depending on which reports they read.

In this report we present the results of a study on 1H-MRSI reproducibility and test-retest reliability at 3T and short TE, involving 21 healthy subjects and a standard double spin echo pulse sequence. Our primary goals were to evaluate both the reproducibility and reliability of measurements of Glu, Glx, mI, and other major visible metabolites at 3T in healthy subjects and, more generally, to examine the effect of partial volume correction on these measures. A single 1H-MRSI slice was prescribed above the lateral ventricles and repeat scans were performed within 30 min to minimize any physiologic variability. The data were corrected for tissue composition and relaxation factors as previously described (7,21) and pure gray and white matter concentrations were estimated by linear regression. Signals from NAA (N-acetylaspartate), combined NAA and N-acetyl-aspartylglutamate (tNAA), total creatine (Cr), total choline (Cho), Glu, Glx, and mI were examined. To compare our findings to previous studies, reproducibility was assessed with CVs calculated from subject, scan, voxel, and error variances separated using an analysis of variance method, and absolute test-retest reliability was assessed with ICCs. All measures were calculated with and without partial volume correction. Finally, the relationship of reproducibility and reliability to data quality was investigated.

Methods

Subjects

Twenty-one healthy subjects (males = 12, mean age = 24.7±5.9) with no history of neurological or psychiatric disorders were recruited and scanned in accordance with a protocol approved by the Institutional Review Board for human research at the University of New Mexico.

MRI and MRS

MRI and 1H-MRSI experiments were performed on a Siemens 3T Tim Trio scanner. Foam padding and paper tape were used to restrict motion within the scanner. High resolution sagittally prescribed T1-weighted anatomic images were acquired with a 5-echo multi-echo MPRAGE sequence [TE (echo time) = 1.64, 3.5, 5.36, 7.22, 9.08 ms, TR (repetition time) = 2.53 s, TI (inversion time) = 1.2 s, 7° flip angle, number of excitations (NEX) = 1, slice thickness = 1 mm, FOV (field of view) = 256 mm, resolution = 256×256]. Only the root-mean-square of the 5 images generated by this sequence was used in subsequent analyses. T2-weighted images were collected with a fast spin echo sequence [TE = 77.0 ms, TR = 1.55 s, flip angle 152°, NEX = 1, slice thickness = 1.5 mm, FOV = 220 mm, matrix = 192×192, voxel size = 1.15×1.15×1.5 mm3]. The T2-weighted image was aligned axially, parallel to the anterior-posterior commissure (AC-PC) axis as it appeared in the sagittal plane of the T1-weighted image. The T2-weighted image was used to prescribe the 1H-MRSI slice.

Each subject was scanned with localizer, T1-weighted, T2-weighted, and 1H-MRSI sequences, removed completely from the scanner, placed back in the scanner within 15 minutes, and rescanned once. Careful replication of the prescription of the T2-weighted and 1H-MRSI sequences was accomplished by visual comparison to the initial images. Only the T1-weighted image was not repeated. This image was used for tissue segmentation for both sets of data, the results of which were registered to each of the two T2-weighted images to compute the tissue fractions in each spectroscopic voxel for either scan (see 1H-MRSI data processing section below).

1H-MRSI was performed with a phase-encoded version of a point-resolved spectroscopy sequence (PRESS) both with and without water presaturation [TE = 40 ms, TR = 1500 ms, slice thickness = 15 mm, FOV = 220 × 220 mm, circular k-space sampling (radius = 24), total scan time = 582 s]. A TE of 40 ms was chosen to improve detection of the glutamate signal (11). The nominal voxel size was 6.9×6.9×15 mm3 (0.71 cm3) after zero-filling in k-space to 32×32 samples. Using the width at half maximum of the theoretical point spread function (with circular phase encoding and a Hamming filter with a 0.5 width) as the effective diameter of the voxel, the effective voxel volume is estimated to be 2.4 cm3. The 1H-MRSI volume of interest (VOI) was selected with strong saturation bands to reduce chemical shift artifacts and was prescribed with the T2-weighted image to lie immediately above the lateral ventricles and parallel to the AC-PC axis (in-plane T2-weighted image), and included portions of the cingulate gyrus and the frontal and parietal lobes. To further minimize the chemical shift artifact, the transmitter was set to the frequency of the NAA methyl peak during the acquisition of the metabolite spectra and to the frequency of the water peak during the acquisition of the unsuppressed water spectra. Additionally, the outermost rows and columns of the VOI were excluded from analysis. This resulted in 48 to 80 analyzed voxels per subject, depending on the head size, for a grand total of 1496 voxels across all 21 subjects.
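The nominal voxel dimensions quoted above follow from the field of view, the zero-filled matrix size, and the slice thickness. A minimal sketch of that arithmetic is given here; the function and variable names are illustrative and not taken from the study.

```python
# Nominal voxel size from the acquisition parameters reported in the text.
FOV_MM = 220.0    # in-plane field of view (mm)
MATRIX = 32       # samples per dimension after zero-filling in k-space
SLICE_MM = 15.0   # slice thickness (mm)


def nominal_voxel(fov_mm, matrix, slice_mm):
    """Return (in-plane edge length in mm, nominal voxel volume in cm^3)."""
    edge_mm = fov_mm / matrix
    volume_cm3 = edge_mm * edge_mm * slice_mm / 1000.0  # 1 cm^3 = 1000 mm^3
    return edge_mm, volume_cm3


edge_mm, volume_cm3 = nominal_voxel(FOV_MM, MATRIX, SLICE_MM)
print(f"in-plane edge = {edge_mm:.3f} mm, nominal volume = {volume_cm3:.2f} cm^3")
```

The effective volume of 2.4 cm3 is larger because it depends on the point spread function, which this sketch does not model.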

1H-MRSI data processing

After zero-filling to 32×32 points in k-space, applying a Hamming filter with a 50% window width, and 2D spatial Fourier transformation (FT), the time domain 1H-MRSI data were analyzed using LCModel (22) from 4.2-1.8ppm. The basis set for LCModel was generated using spectrum simulation software, based on the theoretical chemical shifts and coupling constants of 15 metabolites, and provided by the developer of LCModel (S. Provencher). Parameterized macromolecule intensities were included over the fitted spectral region (the LCModel set MM20). The Cramer-Rao lower bounds of the fit to the peak of interest output by LCModel were used as a criterion to exclude poor quality data (>20% for a metabolite of interest) from the final analysis. This resulted in a total of 1496 voxels for NAA, tNAA, Cr, and Cho, 1491 voxels for mI, 1340 voxels for Glu, and 1337 voxels for Glx that were analyzed further. The glutamine signal met the CRLB criterion in less than 50% of all spectra and, therefore, was not examined further in this study, other than as a fraction of the Glx signal. Subsequent processing of the derived metabolite values has been described previously (21). Briefly, concentration values were corrected for partial volume and relaxation effects using gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) maps generated by segmenting the T1-weighted images with SPM5 (23) and taking into account the variable water densities and relaxation times in each tissue or CSF compartment. In the present study, we used a CSF T1 value of 3.55 s and a CSF T2 estimate of 2.47 s based on previous measurements at our site (24). Otherwise, the previously reported T1, T2, and water density (WD) values used were as follows: GM: T1=1.304 s, T2=0.093 s (25), WD=0.78 (26); WM: T1=0.660 s, T2=0.073 s (25); WD=0.65 (26); CSF: WD=0.97 (26). Estimates of metabolite T1 and T2 values at 3T were drawn from Mlynarik et al. (27). 
The Gln T1 and T2 values were assumed to be equal to the Glu values. Estimates of metabolite concentrations in both GM and WM were generated by linear regression of the metabolite concentration in each voxel against the normalized GM fraction of the voxel (GM fraction divided by the sum of the GM and WM fractions) and extrapolating to a GM fraction of one (pure GM) or zero (pure WM) (see example in Fig. 1).
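The regression step described above can be sketched as follows. This is only an illustration under simple assumptions (ordinary least squares on a single slice of voxels); the function name and the synthetic, noise-free values are hypothetical and are not study data.

```python
import numpy as np


def tissue_estimates(gm_fraction, concentration):
    """Fit concentration = intercept + slope * gm_fraction by least squares.

    Returns (pure WM estimate, pure GM estimate): the fitted line evaluated
    at a normalized GM fraction of 0 and 1, respectively.
    """
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(gm_fraction, concentration, deg=1)
    return intercept, intercept + slope


# Hypothetical voxel values for illustration only.
gm = np.array([0.1, 0.3, 0.5, 0.7, 0.9])  # GM / (GM + WM) in each voxel
conc = 10.0 + 12.0 * gm                    # synthetic concentrations

wm_est, gm_est = tissue_estimates(gm, conc)
print(f"WM estimate = {wm_est:.2f}, GM estimate = {gm_est:.2f}")
```

With real data the voxels scatter around the line, and the two extrapolated end points serve as the tissue-specific estimates for that subject and scan.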

Figure 1.

A) Location of 1H-MRSI excitation volume (white rectangle) and measured voxels (green grid within white rectangle) overlaid on T2-weighted image. B) Sagittal view of 1H-MRSI excitation volume. C) LCModel fit of representative spectrum from gray matter voxel outlined in blue in images of A and B. D) Regression analysis of Glu data from all analyzed voxels. The horizontal axis is the GM fraction of the total tissue (GM + WM) fraction.

Statistics

Measurement reproducibility in 1H-MRS studies has most often been evaluated in terms of measurement variance (28), usually cast in the form of a CV. However, different approaches have been taken to calculate CVs related to reproducibility, and CVs per se do not reflect the capability of a device to obtain the same value for a measurement repeatedly. This latter capability is evaluated with a measure of test-retest reliability, such as some form of the ICC. In the present study, CVs to evaluate 1H-MRSI reproducibility and ICCs to measure absolute test-retest reliability were calculated from subject, voxel, scan, and error variances separated using a straightforward analysis of variance (ANOVA) approach with various multi-factorial random effects models, as introduced by others in seminal early reports on 1H-MRSI reproducibility (12,14). This approach utilizes the restricted maximum likelihood (REML) method to ensure non-negative variance terms. The ANOVA model for the data sets that included each voxel’s metabolite estimate across all subjects was

$$Y_{ijk} = \mu + \mathrm{Sub}_i + \mathrm{Vox}_{ij} + \mathrm{Scan}_k + e_{ijk} \qquad [1]$$

where $\mu$ is the mean across all voxels; $\mathrm{Sub}_i$, $\mathrm{Vox}_{ij}$, and $\mathrm{Scan}_k$ are the subject, voxel, and scan random effects, respectively; and $e_{ijk}$ is the residual or error term. For the calculation of the ICC for individual subject data, this model was reduced to

$$Y_{jk} = \mu + \mathrm{Vox}_j + \mathrm{Scan}_k + e_{jk} \qquad [2]$$

For the data sets with only gray or white matter estimates of metabolite concentrations across all subjects, the model was

$$Y_{ik}(\text{gray or white}) = \mu + \mathrm{Sub}_i + \mathrm{Scan}_k + e_{ik} \qquad [3]$$

where μ is the gray or white matter mean concentration across subjects.
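As an illustration of how model [3] separates variance, the sketch below uses the classical method-of-moments (expected mean squares) estimator for a balanced subjects-by-scans layout. This is a swapped-in technique, not the REML procedure the study used. Negative estimates are truncated at zero as a simple stand-in for the non-negativity that REML enforces. The function name and the toy data are hypothetical.

```python
import numpy as np


def variance_components(y):
    """Method-of-moments estimates for Y_ik = mu + Sub_i + Scan_k + e_ik.

    y: 2-D array, rows = subjects, columns = scans (balanced, no missing data).
    Returns (subject variance, scan variance, error variance).
    """
    y = np.asarray(y, dtype=float)
    n_sub, n_scan = y.shape
    grand = y.mean()
    sub_means = y.mean(axis=1)
    scan_means = y.mean(axis=0)

    # Mean squares of the two-way layout without replication.
    ms_sub = n_scan * np.sum((sub_means - grand) ** 2) / (n_sub - 1)
    ms_scan = n_sub * np.sum((scan_means - grand) ** 2) / (n_scan - 1)
    resid = y - sub_means[:, None] - scan_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n_sub - 1) * (n_scan - 1))

    # Expected mean squares: E[MS_sub] = var_e + n_scan * var_sub,
    # E[MS_scan] = var_e + n_sub * var_scan, E[MS_err] = var_e.
    var_sub = max((ms_sub - ms_err) / n_scan, 0.0)
    var_scan = max((ms_scan - ms_err) / n_sub, 0.0)
    return var_sub, var_scan, ms_err


# Toy data: 3 subjects, 2 scans, purely additive effects (no residual).
toy = [[10, 11], [12, 13], [14, 15]]
var_sub, var_scan, var_err = variance_components(toy)
print(var_sub, var_scan, var_err)
```

Models [1] and [2] add a voxel term, which calls for a mixed-model fitting routine rather than this closed-form shortcut.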

ICCs to assess test-retest measurement reliability were based on these random-effects analysis of variance models and were computed as follows (29):

$$\mathrm{ICC} = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2 + \sigma_e^2} \qquad [4]$$

where $\sigma_B^2$ is a generalized between-subject variance of metabolite concentrations, $\sigma_W^2$ is the within-subject variance between scans 1 and 2 (i.e., the inter-scan variance), and $\sigma_e^2$ is the variance due to random noise. For the data sets involving all voxels across all subjects, $\sigma_B^2$ is the sum of the subject and voxel within-subject variances, as estimated using REML and model [1] above. For ICCs based on individual subject data, $\sigma_B^2$ is simply the voxel variance as estimated using model [2], and for the data sets involving gray and white matter estimates, $\sigma_B^2$ is the subject variance as estimated using model [3].

CVs for inter-scan (test-retest) reproducibility were calculated based on the variances separated as above, using just the inter-scan ($\sigma_W^2$) and error ($\sigma_e^2$) variances along with the concentration mean $\mu$:

$$\mathrm{CV} = \frac{\sqrt{\sigma_W^2 + \sigma_e^2}}{\mu}$$
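The two summary statistics can be computed directly from the separated variance components. The short sketch below applies Eq. [4] and the CV definition to the GM NAA variance components listed in Table 1; the function names are illustrative.

```python
import math


def icc(var_between, var_within, var_error):
    """Intraclass correlation coefficient as defined in Eq. [4]."""
    return var_between / (var_between + var_within + var_error)


def inter_scan_cv(var_within, var_error, mean):
    """Inter-scan CV: sqrt(inter-scan variance + error variance) / mean."""
    return math.sqrt(var_within + var_error) / mean


# GM NAA components from Table 1: subject, scan, and error variance, and mean.
VAR_SUBJECT, VAR_SCAN, VAR_ERROR, MEAN = 0.886, 0.008, 0.267, 19.54

naa_gm_icc = icc(VAR_SUBJECT, VAR_SCAN, VAR_ERROR)
naa_gm_cv = inter_scan_cv(VAR_SCAN, VAR_ERROR, MEAN)
print(f"ICC = {naa_gm_icc:.2f}, CV = {naa_gm_cv:.2f}")
```

Rounded to two decimals, these reproduce the tabulated ICC of 0.76 and CV of 0.03 for GM NAA.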

Results

Figure 1 shows the 1H-MRSI slice location, a representative fit by LCModel to a spectrum from a voxel with primarily GM, and a plot of the regression analysis used to estimate the Glu concentration in either pure gray or white matter. Table 1 summarizes the various measures of reproducibility or reliability calculated for all data. As expected, the inter-scan (test-retest) CVs, based on the inter-scan and error variances alone, did not differ substantially between the partial volume corrected and uncorrected LCModel output, since the substantial variance due to different fractions of GM, WM, and CSF in different voxels is parceled into the voxel variance term in both cases and, hence, does not enter into the calculation of the inter-scan CV. Also consistent with expectations, the voxel variance is observed to be greater in the partial volume corrected data, due to adjusting the concentrations for the voxel CSF fraction and, hence, elevating the GM estimate of the metabolite from its uncorrected value and creating greater GM-WM metabolite differences across the brain. Inter-scan CVs based on all the voxels (ca. 1300-1500) across all 21 subjects in this study varied from lows in the range of 0.05-0.07 for tNAA, NAA, Cr, and Cho to highs of 0.10-0.13 for mI, Glu, and Glx, signals that are defined entirely by their multiplet structures and are routinely more difficult to measure.

Table 1. Reproducibility and Reliability Results.

Results of test-retest statistical analyses for LCModel data without partial volume correction (“LCModel”), LCModel data after partial volume correction (“corrected”), and gray matter (“GM”) and white matter (“WM”) estimates of concentrations. Percent of total variance (“%”) for each variance component appears in the adjacent column to the right.

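| Metabolite | Data set | Mean | Subject variance | % | Scan variance | % | Voxel variance | % | Error variance | % | Total variance | CV | ICC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NAA | LCModel | 17.51 | 0.612 | 25.3 | 0.021 | 0.9 | 0.842 | 34.9 | 0.941 | 38.9 | 2.416 | 0.06 | 0.60 |
| NAA | corrected | 16.45 | 0.446 | 10.5 | 0.019 | 0.4 | 2.874 | 67.8 | 0.899 | 21.2 | 4.238 | 0.06 | 0.78 |
| NAA | GM | 19.54 | 0.886 | 76.3 | 0.008 | 0.7 | | | 0.267 | 23.0 | 1.161 | 0.03 | 0.76 |
| NAA | WM | 14.23 | 0.627 | 83.4 | 0.006 | 0.8 | | | 0.119 | 15.8 | 0.752 | 0.02 | 0.83 |
| tNAA | LCModel | 19.63 | 0.640 | 28.7 | 0.018 | 0.8 | 0.739 | 33.1 | 0.836 | 37.4 | 2.233 | 0.05 | 0.62 |
| tNAA | corrected | 18.37 | 0.476 | 21.7 | 0.016 | 0.7 | 0.936 | 42.7 | 0.765 | 34.9 | 2.193 | 0.05 | 0.64 |
| tNAA | GM | 19.50 | 0.521 | 65.1 | 0.026 | 3.3 | | | 0.253 | 31.6 | 0.800 | 0.03 | 0.65 |
| tNAA | WM | 17.48 | 0.727 | 89.5 | 0.001 | 0.1 | | | 0.084 | 10.3 | 0.812 | 0.02 | 0.90 |
| Cr | LCModel | 12.42 | 0.316 | 9.6 | 0.007 | 0.2 | 2.446 | 74.3 | 0.523 | 15.9 | 3.292 | 0.06 | 0.84 |
| Cr | corrected | 12.88 | 0.609 | 8.8 | 0.008 | 0.1 | 5.668 | 81.4 | 0.675 | 9.7 | 6.960 | 0.06 | 0.90 |
| Cr | GM | 17.52 | 1.373 | 82.0 | 0.062 | 3.7 | | | 0.239 | 14.3 | 1.674 | 0.03 | 0.82 |
| Cr | WM | 9.67 | 0.230 | 82.1 | 0.000 | 0.0 | | | 0.050 | 17.9 | 0.280 | 0.02 | 0.82 |
| Cho | LCModel | 3.58 | 0.040 | 14.5 | 0.002 | 0.7 | 0.179 | 64.9 | 0.055 | 19.9 | 0.276 | 0.07 | 0.79 |
| Cho | corrected | 3.21 | 0.038 | 17.7 | 0.002 | 0.9 | 0.131 | 60.9 | 0.044 | 20.5 | 0.215 | 0.07 | 0.79 |
| Cho | GM | 3.27 | 0.061 | 74.4 | 0.004 | 4.9 | | | 0.017 | 20.7 | 0.082 | 0.04 | 0.75 |
| Cho | WM | 3.17 | 0.058 | 90.6 | 0.000 | 0.0 | | | 0.006 | 9.4 | 0.064 | 0.02 | 0.90 |
| mI | LCModel | 12.08 | 0.750 | 13.2 | 0.033 | 0.6 | 3.296 | 58.1 | 1.594 | 28.1 | 5.673 | 0.11 | 0.71 |
| mI | corrected | 10.64 | 0.820 | 12.5 | 0.028 | 0.4 | 4.395 | 67.1 | 1.304 | 19.9 | 6.547 | 0.11 | 0.80 |
| mI | GM | 14.48 | 0.685 | 49.7 | 0.159 | 11.5 | | | 0.533 | 38.7 | 1.377 | 0.06 | 0.50 |
| mI | WM | 7.96 | 1.084 | 93.1 | 0.000 | 0.0 | | | 0.080 | 6.9 | 1.164 | 0.04 | 0.93 |
| Glu | LCModel | 16.77 | 0.324 | 2.9 | 0.004 | 0.0 | 7.918 | 70.4 | 2.995 | 26.6 | 11.241 | 0.10 | 0.73 |
| Glu | corrected | 15.49 | 0.431 | 2.8 | 0.001 | 0.0 | 12.071 | 78.7 | 2.836 | 18.5 | 15.339 | 0.11 | 0.82 |
| Glu | GM | 22.06 | 1.286 | 54.1 | 0.000 | 0.0 | | | 1.093 | 45.9 | 2.379 | 0.05 | 0.54 |
| Glu | WM | 10.49 | 0.097 | 21.7 | 0.000 | 0.0 | | | 0.349 | 78.3 | 0.446 | 0.06 | 0.22 |
| Glx | LCModel | 20.67 | 0.765 | 3.0 | 0.126 | 0.5 | 17.824 | 69.2 | 7.052 | 27.4 | 25.767 | 0.13 | 0.72 |
| Glx | corrected | 19.17 | 1.305 | 4.0 | 0.095 | 0.3 | 25.157 | 76.2 | 6.476 | 19.6 | 33.033 | 0.13 | 0.80 |
| Glx | GM | 28.47 | 3.380 | 67.9 | 0.000 | 0.0 | | | 1.595 | 32.1 | 4.975 | 0.04 | 0.68 |
| Glx | WM | 11.93 | 0.768 | 45.0 | 0.222 | 13.0 | | | 0.717 | 42.0 | 1.707 | 0.08 | 0.45 |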

The inter-scan CVs of the estimates of metabolite concentrations in either gray or white matter, on the other hand, are substantially less than those based on all voxels. This is also expected, because reducing the voxel variability to single estimates of metabolite concentration in just gray or white matter shifts the major source of variance to the subject-by-subject variability. These values ranged from 0.02-0.04 for tNAA, NAA, Cr, and Cho to 0.04-0.08 for mI, Glu, and Glx. Generally, GM CVs were slightly greater than WM CVs. However, this trend was reversed for Glu and Glx.

The effect of partial volume correction on improving reproducibility is also evident in the ICCs, which are the only true measures of absolute test-retest reliability in this study. As shown in Table 1 and suggested in the representative scatter plots of Figure 2, the ICCs based on all voxels were consistently higher for the data after partial volume correction. This was even more apparent in ICCs from individual subjects, as shown by a representative case in Figure 2. Overall, the ICCs based on all the data from voxels across all subjects were high (0.6-0.9), including for Glu, Glx, and mI. However, ICCs based on GM and WM estimates, while still in this range for NAA, tNAA, Cr, and Cho, were sometimes substantially lower for Glu, Glx, and mI.

Figure 2.

A and B) Plots of first scan versus second scan data for NAA (A) and Glu (B) for all voxels from all subjects with (right panels) or without (left panels) partial volume and relaxation correction. C) Similar plots for NAA data from one subject.

We also examined the relationship of data quality to reproducibility. Two measures of data quality reported by LCModel were examined: the NAA line width and signal-to-noise ratio (S/N), the latter approximated as the ratio of the peak height at 2.01 ppm to the root mean square of the residuals of the fit. The values of these measures for each spectrum were averaged across each subject’s 1H-MRSI data set for a particular scan to obtain one mean estimate of 1H-MRSI line width and S/N for that scan. One unexpected finding of this analysis was that measures of data quality were highly reproducible, as illustrated in the scatter plots of Figure 3. The ICCs for line width and S/N on successive scans were 0.78 and 0.91, respectively. Furthermore, line width and S/N were predictive of concentration for several metabolites in linear regression models (Table 2). Inspection of scatter plots for these data, however, suggested that the significance of most of these correlations depended on a small number of cases with exceptionally low S/N or large line widths, as illustrated for representative cases in Figure 3. Eliminating just 3 cases from the analysis substantially reduced the number of significant correlations (Table 2). These cases were selected on the basis of having a line width or S/N that was outside the 2.5 or 97.5 percentile points of a theoretical normal distribution of line width or S/N values. Hence, the three cases had either a line width that was more than 1.96×SD greater than the mean line width across all subjects and scans or an S/N that was more than 1.96×SD less than the mean S/N across all subjects and scans. This is a relatively conservative but nonetheless arbitrary cut-off point to define statistical outliers, and was applied here only to illustrate that a small number of observations with particularly poor quality may be largely responsible for the high regression coefficients shown in Table 2.
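The outlier rule described above can be sketched as follows. The use of the sample standard deviation (ddof=1) is an assumption, since the text does not say which estimator was used, and the function name and toy values are hypothetical.

```python
import numpy as np

Z_CUTOFF = 1.96  # two-sided 95% point of a standard normal distribution


def flag_poor_quality(line_width, snr, z=Z_CUTOFF):
    """Return indices of cases with unusually broad lines or unusually low S/N.

    A case is flagged if its line width exceeds mean + z*SD or its S/N falls
    below mean - z*SD, with mean and SD taken across all supplied cases.
    """
    lw = np.asarray(line_width, dtype=float)
    sn = np.asarray(snr, dtype=float)
    lw_limit = lw.mean() + z * lw.std(ddof=1)  # assumption: sample SD
    sn_limit = sn.mean() - z * sn.std(ddof=1)
    return np.flatnonzero((lw > lw_limit) | (sn < sn_limit))


# Toy values: one case with a broad line, another with low S/N.
toy_line_width = [5, 5, 5, 5, 5, 5, 5, 5, 5, 10]
toy_snr = [5, 20, 20, 20, 20, 20, 20, 20, 20, 20]

flagged = flag_poor_quality(toy_line_width, toy_snr)
print(flagged.tolist())
```

Re-running the regressions with the flagged cases removed is then a matter of masking those indices before fitting.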

Figure 3.

A) Plot of mean 1H-MRSI data set line width for first scan versus second scan. B) Plot of mean 1H-MRSI data S/N for first scan versus second scan. C) Plot of estimate of GM Cr concentration versus mean S/N across subjects. D) Plot of estimate of WM Glu concentration versus mean S/N across subjects. The correlations were significant for data sets before removal of outliers (see Table 2).

Table 2. Significant correlations of tissue-specific metabolite estimates with line width and S/N.

Normalized regression coefficients (“beta”) for linear regression analyses of GM or WM concentration estimates and either mean 1H-MRSI scan line width or S/N. Only significant correlations involving data from all 21 subjects are shown (p ≤ 0.05). Asterisks (*) indicate correlations that were no longer significant when outliers in line width or S/N were removed.

Metabolite (scan) Normalized beta
Line width S/N
NAA GM (2) 0.563*
NAA WM (2) 0.452*
tNAA GM (2) 0.475*
Cr GM (1) -0.449*
Cr WM (1) 0.639
Cr GM (2) -0.682*
Cr WM (2) -0.669
mI GM (1) -0.503*
mI WM (1) -0.655
mI GM (2) -0.646*
mI WM (2) -0.718
Glu GM (2) 0.449
Glx GM (1) -0.531*
Glx WM (1) 0.641*
Glx GM (2) -0.637*
Glx WM (2) 0.590
* Significance vanished with outlier removal

Discussion

This study examined both the reproducibility and absolute reliability of brain metabolite concentration estimates from 1H-MRSI data collected with a short TE at 3T, with and without partial volume correction. The reproducibility and reliability of estimates of pure gray and white matter metabolite concentrations were also measured. Our findings reveal a substantial improvement in both reproducibility, as assessed by inter-scan CVs, and reliability, as assessed by ICCs, with partial volume correction. CVs for corrected Glu, Glx, and mI levels in gray and white matter are substantially less than those reported in 1H-MRSI studies at 1.5T and short TE (14,17), while the CVs obtained for gray and white matter NAA, tNAA, Cr, and Cho agree well with those reported for gray and white matter estimates of these metabolites at 3T but at longer TEs (16,20). ICCs for partial volume corrected metabolite estimates were also consistently higher than ICCs based on uncorrected data. These results demonstrate that differences in signal intensities between scans that arise from less-than-exact repositioning of the 1H-MRSI slice, resulting in altered tissue fractions in each voxel, are partially compensated for by partial volume correction. Finally, the reproducibility (CVs) and reliability (ICCs) of NAA estimates in this study were comparable and sometimes superior to tNAA estimates (Table 1). This finding challenges the common assumption that NAAG cannot be reliably resolved from NAA at 3T and, therefore, that estimates of tNAA are more reliable than estimates of NAA alone.

The variety of field strengths, acquisition methods, and processing methods used in past studies has made arriving at a consensus on the reproducibility or reliability of 1H-MRSI challenging. With respect to signals defined solely by their multiplet structures, Chard et al. used single-slice PRESS 1H-MRSI at 1.5T with a 30-ms TE and relatively large nominal voxel size (2.3 cm3) to obtain inter-scan CVs for Glu, Glx, and mI in the range of 0.16 to 0.19 (14). These values can be compared directly to the inter-scan CVs obtained for the all-subject, all-voxel data, both with and without partial volume correction, in the present study (0.11 to 0.13) and are substantially greater than the CVs for the estimates of pure gray and white matter metabolite concentrations reported here (0.04-0.08). Similarly, using a multi-slice PRESS sequence at 1.5T with a TE of 30 ms and a nominal voxel size of 1 cm3, Langer et al. obtained median CVs for Glx and mI of 0.21 and 0.24, respectively (17). These CVs, however, were based on the square root of the total variance (the standard deviation) rather than solely on the scan and error variances and, therefore, are expectedly larger than those reported by either Chard et al. or in the present report. It is worth noting in this regard that the relatively low estimates of reproducibility reported by either Chard et al. or Langer et al. reflect different definitions of reproducibility as well as different acquisition and processing protocols. Along these lines, we note that these studies were conducted at lower field strength as well as in regions of brain that are generally characterized by greater field inhomogeneity than the region investigated in the present study. 
Nonetheless, judging from this study as well as whole-brain, multi-slice studies by others (16,20), the reproducibility of 1H-MRSI measurements of tissue-specific metabolite levels, which are the values of interest to most researchers, substantially exceeds that suggested by studies that do not take regional variations of metabolite levels into account.

Test-retest reliability, as measured by the ICC, was uniformly high (0.60-0.90) for all metabolites in this study when the data from all voxels across subjects were used as input. However, ICCs based on pure gray and white matter metabolite estimates were roughly inversely related to the inter-scan CVs: high for NAA, tNAA, Cr, and Cho (0.65-0.9) but lower for particular mI, Glu, and Glx estimates (0.22-0.54). Inspection of Table 1 reveals that the low ICCs are primarily a consequence of an error variance term ($\sigma_e^2$) that was high relative to the between-subjects term ($\sigma_B^2$) in Eq. [4]; whereas, in ICC analyses based on all voxels, the voxel variance accounted for much more, if not most, of the total variance, and thus led to a high $\sigma_B^2$. Regardless of the details of these differences, the conclusion that must be drawn from these results is that, under the acquisition and processing protocols of the present study, the test-retest reliability of the measurement of GM mI, WM Glu, or WM Glx is low (ICC≤0.50) and the reliability of the GM Glu measurement is only slightly higher (ICC=0.54). Hence, among the signals examined in this study that are entirely defined by J-coupled multiplets, only the measurements of WM mI (ICC=0.93) and GM Glx (ICC=0.68) appear to be highly reliable, in agreement with the relatively low inter-scan CVs obtained for these signals (0.04).

Given the much lower concentration of Glu and glutamine in white matter and the small nominal voxel size (0.71 cm3) of this study, it is not unexpected that estimates of either Glu or Glx would be more reliable in gray matter than in white, nor that the more intense Glx signal could be detected more reliably than Glu alone. Hence, a larger voxel size and, consequently, greater S/N may improve the ICC for the detection of these molecules in both gray and white matter at 3T, i.e., by lowering the error variance term in Eq.[4]. However, another factor in the calculation of the ICC is the between-subjects variance term, which can be seen to be much larger for the GM estimate of Glu or Glx relative to the WM estimate (Table 1) and thus elevates the ICC. Similarly, a greater between-subjects variance coupled with a lower error term in the estimates of mI in WM relative to GM underlies the higher ICC of mI in WM relative to GM. The between-subjects variance term is, in principle, related to real differences in metabolite concentrations among subjects. When interpreting the ICC, therefore, it is worth bearing in mind that, given a certain level of noise, the greater the real subject-to-subject differences in a measured quantity, the higher the apparent measurement reliability will be. In this sense, the ICC based on data from a sample of healthy control subjects may underestimate the reliability of measuring longitudinal differences in a patient group in a study, if the between-subject variability of the measured parameter is greater in the patient group while the within-subject variability is not. Nor does a low ICC based on test-retest data from healthy subjects indicate that differences between healthy subjects and patients cannot be measured reliably since, ultimately, the means and variance components of both groups need to be taken into account in group comparisons.
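The dependence of the ICC on between-subject variance can be made concrete with a small numerical sketch. The variance values below are hypothetical and chosen only for illustration; the error variance is held fixed while the between-subject variance grows.

```python
def icc(var_between, var_within, var_error):
    """Intraclass correlation coefficient as defined in Eq. [4]."""
    return var_between / (var_between + var_within + var_error)


VAR_WITHIN = 0.0  # hypothetical inter-scan variance
VAR_ERROR = 1.0   # hypothetical error variance, held fixed

between_values = (0.25, 1.0, 4.0)  # hypothetical between-subject variances
iccs = [icc(v, VAR_WITHIN, VAR_ERROR) for v in between_values]

for v, value in zip(between_values, iccs):
    print(f"between-subject variance = {v:.2f} -> ICC = {value:.2f}")
```

The noise level is identical in all three cases, yet the ICC rises from 0.20 to 0.80 solely because the subjects differ more from one another.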

It is worth noting that the use of the water signal as a concentration reference, acquired in a separate 9.7-minute scan, undoubtedly introduces variance in the concentration estimates. This source of variance will be absent in metabolite ratio data, i.e., if the metabolite intensities are scaled to another metabolite intensity within the same spectrum, such as Cr. Furthermore, any variance introduced by the estimates of CSF needed for ‘absolute’ metabolite concentration calculations will also be absent in metabolite ratio estimates. Though comparing concentration estimates to ratio data was not an aim of the present study, the reproducibility of metabolite ratios might be greater than the reproducibility exhibited for water-scaled absolute concentrations in this study, provided that the reference metabolite intensity does not vary independently from the metabolite of interest or that any independent variance is less than the combined variance introduced by the water signal and CSF estimation.

An unexpected finding of this study was the high reproducibility of line width and S/N in repeat scans. This could only derive from reproducible patterns of magnetic field inhomogeneity in each subject which, in turn, ultimately derive from the interaction between the scanner magnet, the shim routine, and the magnetic susceptibility of the subject. The latter factor is undoubtedly unique for each subject and primarily determined by factors such as head size and shape, proximity of orbit and nasal cavities to the 1H-MRSI region of interest, and the amount of dental work. This reproducibility would be of little concern to researchers using 1H-MRSI were spectral quality not related to the accuracy of spectral curve fitting. However, previous studies have shown that low S/N and broad line widths can indeed lead to under- or over-estimations of metabolite concentrations when using standard curve-fitting routines (28,30,31). The present study supports these findings. Significant correlations between metabolite estimates and either mean 1H-MRSI line width, S/N, or both accounted for a significant portion of the variance for some metabolites, which included Cr as well as Glu. The regression coefficients of this analysis are primarily negative, suggesting that low S/N leads to overestimates of concentrations by LCModel. The number of these correlations was reduced dramatically by eliminating just 3 cases (out of 21) that were outliers in terms of large line width or low S/N, underscoring the importance of screening data for spectral quality in 1H-MRSI studies as well as avoiding any biases in spectral quality between the groups or time points that are to be compared.

In summary, the results of this study demonstrate high reproducibility and test-retest reliability of tissue-specific estimates of several metabolites using 1H-MRSI at 3T and short TE in a group of healthy subjects. Our findings on the reproducibility of gray and white matter estimates of metabolites such as NAA, tNAA, Cr, and Cho are consistent with previous studies using longer TEs. Furthermore, these data show that, under the acquisition and processing protocols of this study, both WM mI and GM Glx in healthy subjects have relatively high reproducibility and test-retest reliability at 3T, and all other measurements of Glu, Glx, and mI demonstrate much lower CVs than reported in previous studies at 1.5T. Finally, the high reproducibility observed for spectral line widths and S/N ratios in 1H-MRSI data from individual subjects, as well as the impact of these factors on spectral analysis, warrants further investigation.

Acknowledgments

This work was supported by grants from The Mind Research Network (DOE grant no. DE-FG02-99ER62764) and from the National Institutes of Health (grant P20 RR021938-01). Special thanks to George Malloy for assistance with data collection and Rachael Grazioplene for assistance with data analysis.

List of Symbols

μ

Greek mu (mean)

Yijk

Latin Y, subscripts i,j,k (metabolite concentration in voxel j from subject i at scan k)

Yjk

Latin Y, subscripts j,k (metabolite concentration in voxel j from combined subject data at scan k)

Yik(gray or white)

Latin Y, subscripts i,k, (gray or white) (metabolite concentration in either gray matter or white matter in subject i at scan k)

Vox

all Latin (random effect of voxel on metabolite concentration)

Sub

all Latin (random effect of subject on metabolite concentration)

Scan

all Latin (random effect of scan on metabolite concentration)

e

Latin e (random error)

σB2

Greek sigma, superscript 2, subscript B (between-subject variance of metabolite concentrations)

σW2

Greek sigma, superscript 2, subscript W (within-subject variance between scans 1 and 2)

σe2

Greek sigma, superscript 2, subscript e (random error variance)
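The variance symbols listed above can be combined into the two summary statistics used in this study, the inter-scan CV and the ICC. The sketch below assumes a one-way random-effects layout with two scans per subject, in the spirit of Shrout and Fleiss (29); it is not necessarily the exact estimator used by the authors, and the function names and example data are hypothetical.

```python
import math

def variance_components(scans):
    """Estimate the mean and the between- and within-subject variances.

    scans: list of (scan1, scan2) values, one pair per subject.
    Uses one-way ANOVA mean squares with k = 2 scans per subject.
    """
    n, k = len(scans), 2
    subject_means = [(a + b) / k for a, b in scans]
    grand_mean = sum(subject_means) / n
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(scans, subject_means)) / (n * (k - 1))
    var_within = ms_within                                   # sigma_W^2
    var_between = max((ms_between - ms_within) / k, 0.0)     # sigma_B^2
    return grand_mean, var_between, var_within

def cv_and_icc(scans):
    """Return (inter-scan CV, ICC).

    CV = sqrt(sigma_W^2) / mean; ICC = sigma_B^2 / (sigma_B^2 + sigma_W^2).
    Assumes the data are not all identical, so the ICC denominator is nonzero.
    """
    mu, var_b, var_w = variance_components(scans)
    return math.sqrt(var_w) / mu, var_b / (var_b + var_w)

# Hypothetical concentrations for three subjects, two scans each.
cv, icc = cv_and_icc([(9.0, 11.0), (11.0, 13.0), (13.0, 15.0)])
print(cv, icc)
```

This form makes the distinction drawn in the study explicit: the CV depends only on the within-subject variance relative to the mean, whereas the ICC also depends on how much subjects differ from one another, so a measure can have a low CV and still a modest ICC.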

References

  • 1. Srinivasan R, Vigneron D, Sailasuta N, Hurd R, Nelson S. A comparative study of myo-inositol quantification using LCmodel at 1.5 T and 3.0 T with 3 D 1H proton spectroscopic imaging of the human brain. Magn Reson Imaging. 2004;22(4):523–528. doi: 10.1016/j.mri.2004.01.028.
  • 2. Di Costanzo A, Trojsi F, Tosetti M, Schirmer T, Lechner SM, Popolizio T, Scarabino T. Proton MR spectroscopy of the brain at 3 T: an update. Eur Radiol. 2007;17(7):1651–1662. doi: 10.1007/s00330-006-0546-1.
  • 3. Cianfoni A, Niku S, Imbesi SG. Metabolite findings in tumefactive demyelinating lesions utilizing short echo time proton magnetic resonance spectroscopy. AJNR Am J Neuroradiol. 2007;28(2):272–277.
  • 4. Baranzini SE, Srinivasan R, Khankhanian P, Okuda DT, Nelson SJ, Matthews PM, Hauser SL, Oksenberg JR, Pelletier D. Genetic variation influences glutamate concentrations in brains of patients with multiple sclerosis. Brain. 133(9):2603–2611. doi: 10.1093/brain/awq192.
  • 5. Li Y, Chen AP, Crane JC, Chang SM, Vigneron DB, Nelson SJ. Three-dimensional J-resolved H-1 magnetic resonance spectroscopic imaging of volunteers and patients with brain tumors at 3T. Magn Reson Med. 2007;58(5):886–892. doi: 10.1002/mrm.21415.
  • 6. Chawla S, Wang S, Wolf RL, Woo JH, Wang J, O’Rourke DM, Judy KD, Grady MS, Melhem ER, Poptani H. Arterial spin-labeling and MR spectroscopy in the differentiation of gliomas. AJNR Am J Neuroradiol. 2007;28(9):1683–1689. doi: 10.3174/ajnr.A0673.
  • 7. Gasparovic C, Yeo R, Mannell M, Ling J, Elgie R, Phillips J, Doezema D, Mayer AR. Neurometabolite concentrations in gray and white matter in mild traumatic brain injury: an 1H-magnetic resonance spectroscopy study. J Neurotrauma. 2009;26(10):1635–1643. doi: 10.1089/neu.2009.0896.
  • 8. Gonen O, Gruber S, Li BS, Mlynarik V, Moser E. Multivoxel 3D proton spectroscopy in the brain at 1.5 versus 3.0 T: signal-to-noise ratio and resolution comparison. AJNR Am J Neuroradiol. 2001;22(9):1727–1731.
  • 9. Schubert F, Gallinat J, Seifert F, Rinneberg H. Glutamate concentrations in human brain using single voxel proton magnetic resonance spectroscopy at 3 Tesla. Neuroimage. 2004;21(4):1762–1771. doi: 10.1016/j.neuroimage.2003.11.014.
  • 10. Mayer D, Spielman DM. Detection of glutamate in the human brain at 3 T using optimized constant time point resolved spectroscopy. Magn Reson Med. 2005;54(2):439–442. doi: 10.1002/mrm.20571.
  • 11. Mullins PG, Chen H, Xu J, Caprihan A, Gasparovic C. Comparative reliability of proton spectroscopy techniques designed to improve detection of J-coupled metabolites. Magn Reson Med. 2008;60(4):964–969. doi: 10.1002/mrm.21696.
  • 12. Tedeschi G, Bertolino A, Campbell G, Barnett AS, Duyn JH, Jacob PK, Moonen CT, Alger JR, Di Chiro G. Reproducibility of proton MR spectroscopic imaging findings. AJNR Am J Neuroradiol. 1996;17(10):1871–1879.
  • 13. Charles HC, Lazeyras F, Tupler LA, Krishnan KR. Reproducibility of high spatial resolution proton magnetic resonance spectroscopic imaging in the human brain. Magn Reson Med. 1996;35(4):606–610. doi: 10.1002/mrm.1910350422.
  • 14. Chard DT, McLean MA, Parker GJ, MacManus DG, Miller DH. Reproducibility of in vivo metabolite quantification with proton magnetic resonance spectroscopic imaging. J Magn Reson Imaging. 2002;15(2):219–225. doi: 10.1002/jmri.10043.
  • 15. Li BS, Babb JS, Soher BJ, Maudsley AA, Gonen O. Reproducibility of 3D proton spectroscopy in the human brain. Magn Reson Med. 2002;47(3):439–446. doi: 10.1002/mrm.10081.
  • 16. Zhu XP, Young K, Ebel A, Soher BJ, Kaiser L, Matson G, Weiner WM, Schuff N. Robust analysis of short echo time (1)H MRSI of human brain. Magn Reson Med. 2006;55(3):706–711. doi: 10.1002/mrm.20805.
  • 17. Langer DL, Rakaric P, Kirilova A, Jaffray DA, Damyanovich AZ. Assessment of metabolite quantitation reproducibility in serial 3D-(1)H-MR spectroscopic imaging of human brain using stereotactic repositioning. Magn Reson Med. 2007;58(4):666–673. doi: 10.1002/mrm.21351.
  • 18. Ratai EM, Hancu I, Blezek DJ, Turk KW, Halpern E, Gonzalez RG. Automatic repositioning of MRSI voxels in longitudinal studies: impact on reproducibility of metabolite concentration measurements. J Magn Reson Imaging. 2008;27(5):1188–1193. doi: 10.1002/jmri.21365.
  • 19. Gu M, Kim DH, Mayer D, Sullivan EV, Pfefferbaum A, Spielman DM. Reproducibility study of whole-brain 1H spectroscopic imaging with automated quantification. Magn Reson Med. 2008;60(3):542–547. doi: 10.1002/mrm.21713.
  • 20. Maudsley AA, Domenig C, Sheriff S. Reproducibility of serial whole-brain MR spectroscopic imaging. NMR Biomed. 23(3):251–256. doi: 10.1002/nbm.1445.
  • 21. Gasparovic C, Song T, Devier D, Bockholt HJ, Caprihan A, Mullins PG, Posse S, Jung RE, Morrison LA. Use of tissue water as a concentration reference for proton spectroscopic imaging. Magn Reson Med. 2006;55(6):1219–1226. doi: 10.1002/mrm.20901.
  • 22. Provencher SW. Estimation of metabolite concentrations from localized in vivo proton NMR spectra. Magn Reson Med. 1993;30(6):672–679. doi: 10.1002/mrm.1910300604.
  • 23. Ashburner J, Friston KJ. Unified segmentation. Neuroimage. 2005;26(3):839–851. doi: 10.1016/j.neuroimage.2005.02.018.
  • 24. Gasparovic C, Neeb H, Feis DL, Damaraju E, Chen H, Doty MJ, South DM, Mullins PG, Bockholt HJ, Shah NJ. Quantitative spectroscopic imaging with in situ measurements of tissue water T1, T2, and density. Magn Reson Med. 2009;62(3):583–590. doi: 10.1002/mrm.22060.
  • 25. Vymazal J, Righini A, Brooks RA, Canesi M, Mariani C, Leonardi M, Pezzoli G. T1 and T2 in the brain of healthy subjects, patients with Parkinson disease, and patients with multiple system atrophy: relation to iron content. Radiology. 1999;211(2):489–495. doi: 10.1148/radiology.211.2.r99ma53489.
  • 26. Kreis R, Ernst T, Ross BD. Development of the human brain: in vivo quantification of metabolite and water content with proton magnetic resonance spectroscopy. Magn Reson Med. 1993;30(4):424–437. doi: 10.1002/mrm.1910300405.
  • 27. Mlynarik V, Gruber S, Moser E. Proton T (1) and T (2) relaxation times of human brain metabolites at 3 Tesla. NMR Biomed. 2001;14(5):325–331. doi: 10.1002/nbm.713.
  • 28. Kreis R. Issues of spectral quality in clinical 1H-magnetic resonance spectroscopy and a gallery of artifacts. NMR Biomed. 2004;17(6):361–381. doi: 10.1002/nbm.891.
  • 29. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–428. doi: 10.1037//0033-2909.86.2.420.
  • 30. Schirmer T, Auer DP. On the reliability of quantitative clinical magnetic resonance spectroscopy of the human brain. NMR Biomed. 2000;13(1):28–36. doi: 10.1002/(sici)1099-1492(200002)13:1<28::aid-nbm606>3.0.co;2-l.
  • 31. Kanowski M, Kaufmann J, Braun J, Bernarding J, Tempelmann C. Quantitation of simulated short echo time 1H human brain spectra by LCModel and AMARES. Magn Reson Med. 2004;51(5):904–912. doi: 10.1002/mrm.20063.
