Hum Brain Mapp. 2000 Feb 28;9(3):156–164. doi: 10.1002/(SICI)1097-0193(200003)9:3<156::AID-HBM4>3.0.CO;2-Q

fMRI of visual encoding: Reproducibility of activation

Willem CM Machielsen 1, Serge ARB Rombouts 2, Frederik Barkhof 3, Philip Scheltens 4, Menno P Witter 1
PMCID: PMC6871840  PMID: 10739366


fMRI, a noninvasive technique to measure brain activation, is gaining clinical interest because its sensitivity enables individual assessments. However, more insight into the reproducibility of these measurements during higher cognitive tasks is necessary. We performed an fMRI study of within‐ and between‐subject reproducibility during encoding of complex visual pictures. Ten healthy subjects were studied on three occasions: twice in the same scanning session (studies 1 and 2), and a third time 3–24 days later (study 3). On all 30 occasions but one, activation was found in areas expected on the basis of previous studies, including the fusiform and lingual gyri, occipital and parietal areas, the (para)hippocampal area, and the frontal inferior sulcus. The reproducibility of the number of activated voxels in the whole brain was 72% between studies 1 and 2 and 63% between studies 1 and 3. The reproducibility of anatomically identical voxels, which supplements these results, was 49% and 36%. These reproducibility measures increase by about 5–15% when only areas of expected activation are included. The quantitative measurements indicate that there is substantial variation in the volume of activation. The recognition of pictures, as tested afterward, explains part of this variation between subjects. Our findings indicate that although consistent patterns of activation exist, more insight is needed into what determines the volume of activation, especially to assess cognitive alterations in patients over time. Hum. Brain Mapping 9:156–164, 2000. © 2000 Wiley‐Liss, Inc.

Keywords: fMRI, reproducibility of results, memory, encoding


The medial temporal lobe (MTL) is known to be crucially involved in declarative memory [Squire, 1992]. Injury to (parts of) this system results in a profound impairment in storing new information. Several studies using positron emission tomography (PET) showed activation of the MTL during tasks that involved declarative memory [Nyberg et al., 1996; Schacter et al., 1996; Tulving et al., 1994]. Recently, activation in the MTL was measured with functional MRI (fMRI) in individual subjects [Gabrieli, 1997; Rombouts et al., 1998a; Stern et al., 1996; Wagner et al., 1998].

In our institute, we are particularly interested in MTL functioning in patients with Alzheimer's disease (AD). In AD patients, fMRI shows reduced activation in several areas, including MTL, during encoding of new visual information [Rombouts et al., unpublished data]. One of the advantages of fMRI over PET is the higher sensitivity, enabling activation to be detected in individual subjects, paving the way for the use of fMRI as a tool to assess individual subjects, for example, in a diagnostic setting. To assess the feasibility of using fMRI in a clinical setting, the normal variation between‐ and within‐subjects has to be established.

Previous studies investigated the reproducibility of activation in motor or visual areas in individual subjects [Ramsey et al., 1996; Rombouts et al., 1998b; Yetkin et al., 1996]. These studies, reporting qualitative measurements on the presence or absence of activation in macroscopically defined areas, reveal that certain brain areas are consistently active during repeated measurements. Quantitative measurements, assessing the number of activated voxels, reveal more variance between repeated measurements [Noll et al., 1997]. Another study [Genovese et al., 1997] showed that the reproducibility of a cognitive task (working memory) was comparable, on a qualitative level, to that of a simple sensory task. However, quantitative measurements showed a larger variance for the cognitive task than for sensory stimulation. In the present study, we investigated the reproducibility of activation during a task known to activate the MTL: the encoding of complex visual stimuli [Stern et al., 1996]. We chose this task because it is one of the few tasks available within our field of interest (i.e., episodic memory related to the hippocampal area) that has proven useful with fMRI. It is also a simple task that could be used in clinical practice. To explore reproducibility, we chose the size (and overlap) of detected activation, because there is reason to believe that this is related to the amount of brain tissue being active, which is of clinical interest.


Ten healthy volunteers (seven female, three male; age range 19–30 years) with normal or corrected‐to‐normal vision participated. Written informed consent was obtained after the nature of the procedure had been fully explained.

To control for possible intelligence or capacity differences, all subjects underwent two neuropsychological tasks: the Groninger Intelligentie Test [Luteijn, 1966] and the Warrington recognition test [Warrington, 1984]. All subjects were scanned in two sessions separated by 3–24 days (median, 13.5 days). During the first session, subjects underwent the scanning procedure twice, without repositioning (studies 1 and 2); in the second session, the procedure was performed a third time (study 3). Subjects were positioned in a molded foam pad to restrict motion of the head.

In each study, subjects were presented with novel colour pictures (encoding condition) alternating with familiar pictures (control condition) [Stern et al., 1996]. All colour pictures represented outdoor scenes that were emotionally neutral, with no people in close‐up (Fig. 1). Two control pictures were used; each was presented for 1 min before the start of the study to familiarise the subjects with them. Pictures were projected on a screen at the end of the scanner table with a data projector located outside the scanner room. A laptop computer was connected to the data projector, and the screen was seen via a mirror positioned above the head coil. Subjects were instructed to view the pictures carefully and memorise them, so that they could recognise them later in a memory test. All pictures were presented at the same rate (one per 4 s).

Per study, a series of 82 EPI volumes (23 slices per volume) was acquired in an alternating manner: two during control, ten during encoding, ten during control, ten during encoding, etc. The first two volumes were discarded afterward. After each study, a recognition test was performed in which subjects were presented with 40 pictures, half of which had not been presented during scanning. Subjects were asked whether or not they had seen each picture during scanning.

Three different sets of pictures were used for the three separate studies, for both the encoding and the recognition task. The sequence of these three sets over the three studies was randomised between subjects. If necessary, subjects wore fMRI‐compatible glasses to ensure optimal visual acuity. An eye test, consisting of lines of letters from large to small on the screen, tested the subjects' vision while they were in the scanner.

Figure 1.

Examples of colour pictures used during the visual encoding task.

Imaging was performed at 1.5 T (Vision; Siemens, Erlangen, Germany) using the standard circularly polarised head coil. For fMRI, echo planar imaging (EPI), sensitive to blood oxygen level dependent (BOLD) effects, was used [repetition time (TR) = 4 s, echo time (TE) = 64 ms, flip angle (α) = 90°, field of view (FOV) = 220 mm, matrix = 64 × 128 (interpolated to 128 × 128), slice thickness = 6 mm, interslice gap = 1.02 mm, number of slices = 23]. The slices were positioned perpendicular to the long axis of the hippocampus and covered the whole brain. For anatomic localisation, 3‐D gradient echo T1‐weighted images were obtained (TR = 15 ms, TE = 7 ms, α = 8°, FOV = 220 mm, matrix = 256 × 256, slice thickness = 2 mm, number of slices = 82), positioned parallel to the fMRI scans.
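
As a rough check on the acquisition geometry, the sketch below computes the volume of one interpolated EPI voxel from the parameters listed above. It assumes, which the text does not state, that the effective slice spacing is the slice thickness plus the interslice gap. The function name `voxel_volume_mm3` is illustrative only.

```python
def voxel_volume_mm3(fov_mm, matrix, slice_mm, gap_mm=0.0):
    """Volume of one voxel in mm^3, assuming square in-plane voxels."""
    in_plane_mm = fov_mm / matrix
    # Assumption: the interslice gap is counted as part of the voxel height.
    return in_plane_mm * in_plane_mm * (slice_mm + gap_mm)

# EPI parameters from the protocol (matrix after interpolation to 128 x 128).
v = voxel_volume_mm3(fov_mm=220, matrix=128, slice_mm=6.0, gap_mm=1.02)
print(f"one voxel: {v:.1f} mm^3, five voxels: {5 * v:.1f} mm^3")
```

Under this assumption five voxels come to roughly 104 mm³, which is close to the 103 mm³ quoted for the minimum cluster size in the analysis; without the gap the figure would be about 89 mm³.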

The EPI volumes were corrected for motion and repositioning [Woods et al., 1992], filtered with a Gaussian filter (resulting full width at half maximum of 4.0 × 4.0 × 6.0 mm³), and transformed into standard coordinate space [Talairach and Tournoux, 1988]. Motion correction corrupted the first and last slices of the volumes, and these were discarded from further analysis, which was done with AFNI software [Cox, 1996]. To detect differences in activation between the control and the activation state, voxel‐time‐curves were cross‐correlated with a boxcar representing the cycle of the paradigm (control/encoding) [Bandettini et al., 1992]. The boxcar had a delay of 4 s to account for the hemodynamic response delay.

In the first type of analysis, individual activation maps were calculated, including all brain voxels (unselected maps). Activation was defined as regions with a minimum cluster size of five voxels (103 mm³) and a significance threshold of 1.0 × 10⁻⁴ per voxel [Cox et al., 1995], to obtain an overall P value of < 0.05 [Cox, 1996]. This clustering was done to improve the power of the analysis for active areas larger than single voxels. We found that this clustering was a reasonable trade‐off between improved sensitivity for detecting larger activated areas (i.e., larger than four voxels) and the loss of power to detect areas smaller than five voxels.

In the second type of analysis, group averages were first calculated for all three studies (P < 0.05, corrected; no clustering). Next, new individual activation maps were created (selected maps), including only those voxels from the individual activation maps of the first type of analysis that were also active in the average map of the first study (activation areas of one voxel could occur).
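
The cross‐correlation step described above can be sketched as follows. This is a minimal illustration, not the AFNI implementation: it treats the cross‐correlation as a Pearson correlation at a fixed lag of one volume (4 s at TR = 4 s), and the voxel time course is synthetic. The names `boxcar` and `pearson` and the simulated signal are assumptions made for the example.

```python
import math

TR_S = 4            # repetition time in seconds
BLOCK_VOLUMES = 10  # volumes per encoding or control block
N_VOLUMES = 80      # 82 acquired volumes minus the first two (discarded)

def boxcar(n_volumes=N_VOLUMES, block=BLOCK_VOLUMES, delay_s=4, tr_s=TR_S):
    """Reference function: 1 during encoding, 0 during control, delayed."""
    shift = delay_s // tr_s  # hemodynamic delay in whole volumes
    ref = []
    for t in range(n_volumes):
        k = t - shift
        # The retained series starts with an encoding block; volumes before
        # the delayed onset are treated as control.
        ref.append(1.0 if k >= 0 and (k // block) % 2 == 0 else 0.0)
    return ref

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

ref = boxcar()
# SYNTHETIC voxel (not study data): baseline 100, a 2-unit rise during
# encoding, and a small deterministic ripple standing in for noise.
voxel = [100 + 2 * r + 0.3 * math.sin(0.9 * t) for t, r in enumerate(ref)]
r = pearson(voxel, ref)
print(f"correlation with delayed boxcar: {r:.2f}")
```

In a full analysis this coefficient would be computed for every voxel and then thresholded; the cluster‐size criterion is a separate step that is not shown here.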

In clinical practice, it is customary to interpret fMRI results on the basis of the position and size of activation, and not the magnitude of signal changes. Therefore, two reproducibility measures were used [Rombouts et al., 1997]. The first was $R_{ij}^{\mathrm{overlap}} = 2 V_{ij}^{\mathrm{overlap}} / (V_i + V_j)$, in which $V_i$ and $V_j$ are the sizes of the activated volumes in studies $i$ and $j$, and $V_{ij}^{\mathrm{overlap}}$ is the volume that is activated in both studies. The second measure was $R_{ij}^{\mathrm{size}} = 2 V_{\mathrm{smallest}} / (V_i + V_j)$, where $V_{\mathrm{smallest}}$ is the smaller of the two activated volumes $V_i$ and $V_j$. Thus, $R_{ij}^{\mathrm{size}}$ only tests for reproducibility of the number of activated voxels, whereas $R_{ij}^{\mathrm{overlap}}$ also takes the location of activation into account. Both measures were multiplied by 100% to obtain a scale from 0% (worst) to 100% (best). Reproducibility was tested between studies 1 and 2 (same scanning session) and between studies 1 and 3 (different scanning sessions). Measurements of reproducibility were calculated for the unselected and selected maps.

Because of our particular interest in the reproducibility in the MTL, each selected map was divided into three regions of interest (ROIs): an anterior, a posterior, and a (para)hippocampal ROI. The anterior ROI reached from the most anterior point of the brain to the coronal plane located 5 mm posterior to the anterior commissure in Talairach space [Talairach and Tournoux, 1988]. The posterior ROI reached from the most posterior point of the brain to the coronal plane located 35 mm posterior to the anterior commissure. The (para)hippocampal ROI included the rest of the brain, lying between these two areas. Although this is a rough selection of the MTL, we did not have a rationale for more precise bordering.
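
The two reproducibility measures defined above can be sketched as follows. The example assumes that each activation map is represented as a set of voxel coordinates and that all voxels have the same volume, so that voxel counts can stand in for volumes. The function names and the toy maps are illustrative, and returning 0% when neither map contains activation is a choice made for the sketch.

```python
def r_overlap(active_i, active_j):
    """Overlap reproducibility in percent: 2 * |i and j| / (|i| + |j|)."""
    total = len(active_i) + len(active_j)
    if total == 0:
        return 0.0  # assumption: no activation in either study counts as 0%
    return 100.0 * 2 * len(active_i & active_j) / total

def r_size(active_i, active_j):
    """Size reproducibility in percent: 2 * min(|i|, |j|) / (|i| + |j|)."""
    total = len(active_i) + len(active_j)
    if total == 0:
        return 0.0
    return 100.0 * 2 * min(len(active_i), len(active_j)) / total

# Toy maps (hypothetical): sets of (x, y, z) voxel indices.
study_1 = {(x, 0, 0) for x in range(0, 10)}   # 10 active voxels
study_2 = {(x, 0, 0) for x in range(5, 20)}   # 15 active voxels, 5 shared

print(r_size(study_1, study_2))     # 2 * 10 / 25 -> 80.0
print(r_overlap(study_1, study_2))  # 2 * 5 / 25  -> 40.0
```

The toy case shows why the overlap measure can never exceed the size measure: the shared volume is at most the smaller of the two volumes.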

One‐way ANOVAs were performed on the volume of activation and recognition scores to look for differences among studies and among versions. To examine differences in the reproducibility percentages between studies and between ROIs, a paired t‐test and an ANOVA with a post hoc Scheffé test were used. The Pearson correlation coefficient was calculated to examine the association between activation volume and recognition score.


On all 30 occasions, motion artefacts and repositioning errors between sessions could be corrected and activation was detected. The scores of the subjects on both neuropsychological tasks (Warrington test and GIT) and on the eye test before each session indicated that no subject deviated in cognitive abilities and that all were able to see the pictures properly.

Qualitative Analysis

For each of the three studies, a group average activation map, including all ten subjects, was created. In all three studies, activation was seen in the fusiform and lingual gyrus, the occipital and parietal area, the parahippocampal area, the lateral geniculate, the frontal inferior sulcus, the transitional zone of the insula and inferior frontal operculum (right side only in study 2), and the cerebellum (Figures 2 and 3). Areas that were less consistently activated included the posterior paracingular area (studies 1 and 3) and the frontal superior sulcus (study 2).

Figure 2.

Mean activated areas of all subjects in studies 1 and 2. The mean activation of all ten subjects in studies 1 and 2 is presented in 16 coronal slices from anterior (top left) to posterior (bottom right). Yellow areas were active in both studies. Blue and red areas were active only in study 1 or study 2, respectively. Activation is seen in the fusiform and lingual gyrus, the parahippocampal gyrus, the lateral geniculate, occipital and parietal areas, the frontal inferior sulcus, the frontal superior sulcus (study 2), the posterior paracingular area (study 1), the transitional zone of the insula and the inferior frontal operculum, and the cerebellum. Note that this is a group average; the individual data showed more variance and therefore revealed no significant differences among the three studies (ANOVA; F = 1.504; P = .240).

Figure 3.

Mean activated areas of all subjects in studies 1 and 3. The mean activation of all ten subjects in studies 1 and 3 is presented in 16 coronal slices from anterior (top left) to posterior (bottom right). Yellow areas were active in both studies. Blue and red areas were active only in study 1 or study 3, respectively. Activation is seen in the fusiform and lingual gyrus, the parahippocampal gyrus, the lateral geniculate, occipital and parietal areas, the frontal inferior sulcus, the posterior paracingular area, the transitional zone of the insula and the inferior frontal operculum, and the cerebellum. Note that this is a group average; the individual data showed more variance and therefore revealed no significant differences among the three studies (ANOVA; F = 1.504; P = .240).

Regarding the individual data (30 occasions), we consistently observed activation in the ventral stream (i.e., fusiform and lingual gyrus, and occipital lobe) and parietal area; activation was seen on all but one occasion in the posterior part of the parahippocampal region, and on 26 occasions in the frontal inferior sulcus.

Quantitative Analysis

The results show that volumes of total brain activation vary widely, both between subjects and within subjects (Table I). Compared to study 1, overall more activation is found in studies 2 and 3, although the increase is not significant (ANOVA; F = 1.504; P = .240) (Table II). The largest range for whole brain activation volumes within subjects is from 12 to 146 cm³ for the unselected maps and from 13 to 46 cm³ for the selected maps. The largest ranges between subjects within one study were, respectively, 6–146 cm³ and 2–53 cm³.

Table I.

Size of activated volumes in the whole brain of all subjects

           Volume sizes (cm³)
Subject    Study 1        Study 2        Study 3
1          71.7           49.5           82.8
2          8.1            22.7           6.1
3          12.4           22.4           145.8
4          9.0            18.3           40.6
5          40.1           136.0          109.0
6          43.7           61.8           86.5
7          45.9           50.0           30.9
8          38.4           43.6           28.2
9          96.4           105.6          64.2
10         10.4           55.9           60.9
Average    37.6 ± 29.4    56.6 ± 37.7    65.0 ± 42.0
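
The comparison of activated volumes across studies can be recomputed from the per‐subject values in Table I. The sketch below is a plain one‐way ANOVA that treats the three studies as independent groups; that treatment is an assumption of the sketch, since the text does not say whether subject was modelled as a repeated factor. The function name `one_way_anova_f` is illustrative.

```python
# Whole-brain activated volumes (cm^3) per subject, copied from Table I.
volumes = {
    "study 1": [71.7, 8.1, 12.4, 9.0, 40.1, 43.7, 45.9, 38.4, 96.4, 10.4],
    "study 2": [49.5, 22.7, 22.4, 18.3, 136.0, 61.8, 50.0, 43.6, 105.6, 55.9],
    "study 3": [82.8, 6.1, 145.8, 40.6, 109.0, 86.5, 30.9, 28.2, 64.2, 60.9],
}

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: values around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df_b, df_w = one_way_anova_f(list(volumes.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

With the values as listed in Table I this gives F(2, 27) of about 1.50, in line with the reported F = 1.504.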

Table II.

Average size of activated volumes and average reproducibility ratios*

                 Volume of activated tissue (cm³)                                                     Reproducibility study 1 and 2 (%) Reproducibility study 1 and 3 (%)
                 Study 1          Study 2          Study 3          A overlap 1–2    A overlap 1–3    R size           R overlap        R size           R overlap
Whole brain      37.6 ± 29.4      56.6 ± 37.7      65 ± 42          23.7 ± 16.7      20.6 ± 15.4      71.6 ± 21.6      48.9 ± 10.3      62.6 ± 25.9      36.1 ± 16.0
Selected areas   26.1 ± 15.8      32.5 ± 13.9      32.6 ± 17.5      19.1 ± 11.2      16.8 ± 11.9      80.3 ± 15.6      61.7 ± 11.2      67.8 ± 19.5      50.7 ± 18.0
Posterior ROI    23.9 ± 14.4      29.1 ± 11.9      30.0 ± 16.0      17.4 ± 10.2      15.7 ± 11.1      80.4 ± 15.1      62.0 ± 11.5      67.7 ± 19.2      51.4 ± 18.1
Hippocampal ROI  1.3 ± 0.8        2.0 ± 1.2        1.7 ± 1.1        1.0 ± 0.7        0.8 ± 0.6        60.8 ± 28.8      50.4 ± 27.0      62.8 ± 37.9      39.6 ± 25.0
Anterior ROI     0.7 ± 0.7        1.0 ± 1.0        0.7 ± 0.7        0.5 ± 0.7        0.2 ± 0.4        62.7 ± 36.9      42.8 ± 30.8      37.9 ± 36.2      21.1 ± 22.8

*The mean of the activated volume sizes in all ten subjects is presented for studies 1, 2, and 3, and for the overlap areas (A overlap) between studies 1–2 and studies 1–3. The mean reproducibility in all ten subjects of activation size (R size) and overlap (R overlap) is presented for studies 1–2 and studies 1–3. All measurements are presented for the whole brain, the areas that were activated in study 1 on average (selected areas), and the three brain parts derived from this average: posterior ROI, hippocampal ROI, and anterior ROI. The mean activated areas in studies 2 and 3 are greater than in study 1, but not significantly so because of the large variance. Although there is a clear indication that the reproducibility between studies 1–2 is better than between studies 1–3, and that the large posterior ROI shows better reproducibility than the other, smaller ROIs, only a few differences reach significance. For the whole brain and the selected areas, the reproducibility for overlap of activation is significantly different between studies 1–2 and 1–3. The reproducibility for overlap for studies 1–3 is significantly greater in the posterior ROI than in the anterior ROI.


For the unselected maps, the average R size over all individual measurements is higher for studies 1 and 2 (range 31–96%) (Table II; Fig. 2) than for studies 1 and 3 (range 29–93%) (Fig. 3). The same holds for the selected maps, but with higher reproducibility (ranges 51–99% and 43–94%). The average R overlap is also higher for studies 1 and 2 than for studies 1 and 3, for both the unselected maps (ranges 27–62% and 20–62%) and the selected maps (ranges 40–76% and 16–78%). These differences in R overlap were significant for both the unselected maps (paired t‐test; t = 3.092; P = .013) and the selected maps (paired t‐test; t = 2.690; P = .025). Comparing the selected to the unselected maps, the R size and R overlap for studies 2 and 3 increased from 70% to 75% and from 42% to 58%, respectively.
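
Because R size depends only on the two activated volumes, its whole‐brain averages can be recomputed from the per‐subject volumes in Table I. The following is a minimal sketch of that calculation; the function names are illustrative, and R overlap cannot be recomputed this way because it requires the voxel‐wise maps.

```python
# Whole-brain activated volumes (cm^3) per subject, copied from Table I.
study_1 = [71.7, 8.1, 12.4, 9.0, 40.1, 43.7, 45.9, 38.4, 96.4, 10.4]
study_2 = [49.5, 22.7, 22.4, 18.3, 136.0, 61.8, 50.0, 43.6, 105.6, 55.9]
study_3 = [82.8, 6.1, 145.8, 40.6, 109.0, 86.5, 30.9, 28.2, 64.2, 60.9]

def r_size(v_i, v_j):
    """Size reproducibility in percent for two activated volumes."""
    return 100.0 * 2 * min(v_i, v_j) / (v_i + v_j)

def mean(values):
    return sum(values) / len(values)

# One R size value per subject, then the average over the ten subjects.
r12 = [r_size(a, b) for a, b in zip(study_1, study_2)]
r13 = [r_size(a, b) for a, b in zip(study_1, study_3)]

print(f"mean R size, studies 1 and 2: {mean(r12):.1f}%")
print(f"mean R size, studies 1 and 3: {mean(r13):.1f}%")
```

The resulting averages, about 71.6% and 62.6%, agree with the whole‐brain R size values in Table II.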

The means of the reproducibility measurements are higher for the posterior than for the (para)hippocampal and anterior ROI (Table II). However, because of large variances, only the difference between the posterior and the anterior ROI in R overlap for studies 1 and 3 is significant (Scheffé's mean difference = .304; P = .018).

Recognition Task

The mean recognition score of the 29 recognition tests (one missing value) was 92.2% correct responses out of the 40 answers given (range 82.5–100%). There was a significant difference in mean recognition score between the three versions (ANOVA; F = 8.03; P = .002). Post hoc analysis revealed that version 2 had a significantly higher score than versions 1 and 3. No significant difference in mean recognition score was observed between the three studies. The mean activated volume per subject in the selected maps correlated significantly with the mean recognition score (Pearson; r = .728; P < .01) (Fig. 4). With respect to the individual ROIs, a significant correlation of mean recognition score with mean activated volume was found only for the posterior ROI (Pearson; r = .736; P < .05) and the anterior ROI (Pearson; r = .734; P < .05), but not for the (para)hippocampal ROI.
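
For readers who want to see how such a correlation is computed and tested, the sketch below implements the Pearson coefficient and the standard conversion of r to a t statistic with n − 2 degrees of freedom. The per‐subject recognition scores are not listed in the text, so the demonstration data are hypothetical; the conversion of the reported r = .728 assumes n = 10, one mean value per subject.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for a Pearson correlation."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# HYPOTHETICAL per-subject values, for illustration only (not study data).
recognition_pct = [85.0, 87.5, 90.0, 92.5, 95.0, 97.5]
volume_cm3 = [12.0, 20.0, 18.0, 30.0, 28.0, 41.0]
r_demo = pearson_r(recognition_pct, volume_cm3)
n_demo = len(volume_cm3)
print(f"demo r = {r_demo:.3f}, t({n_demo - 2}) = {t_from_r(r_demo, n_demo):.2f}")

# Reported correlation for the selected maps; n = 10 is an assumption.
t_reported = t_from_r(0.728, 10)
print(f"reported r = 0.728 -> t(8) = {t_reported:.2f}")
```

The t value is then compared with the t distribution for the chosen one‐ or two‐sided test; the text does not state which was used.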

Figure 4.

Mean activation volume of selected areas plotted against mean recognition score per subject for all three studies. The two measures are significantly correlated (r = 0.728; P = .01), meaning that better recognition of pictures afterward is accompanied by more brain activation during encoding of those pictures.


We have performed an fMRI study on the reproducibility of brain activation during visual encoding of complex outdoor scenes. All subjects were able to recognise most of the presented stimuli afterward, which indicates that they all had been engaged in the process of encoding during the task. The scores on the neuropsychological and eye tests show that all subjects were perfectly normal in their cognitive abilities and had normal vision.

The activation patterns that we found for the visual encoding task correspond well with earlier findings [Gabrieli, 1997; Stern et al., 1996]. However, the consistent activation we found in the frontal inferior sulcus was not reported in the study of Stern et al. [1996]. Earlier PET and fMRI studies do, however, report activation in frontal areas during encoding of novel information [Grady et al., 1995; Haxby et al., 1996; Kelley et al., 1998; Rombouts et al., 1998a].

Although we consistently found activation in the parahippocampal area, which is known to be involved in encoding, most activation is found in the fusiform and lingual gyri, as well as in the occipital and parietal areas, which are known to be involved in higher order analysis of visual stimuli. This indicates that the task we used leads not only to activation that can be related specifically to encoding, but also to activation related to higher order visual processing.

Activated areas that were only found in one or two of the group activation maps of the three studies were situated in the frontal superior sulcus, and posterior paracingular area. No clear inference about the functional role of these areas in relation to our task can be made from the literature.


The main aim of the present study was to assess the reproducibility of fMRI measurements. Because the resulting patterns of brain activation on all 30 occasions are consistent, we conclude that at a qualitative level, patterns of activation are reproducible across subjects and studies. When we apply quantitative measurements, the activation maps show much more variability between and within subjects. For the whole brain activation maps, the average scores of R overlap and R size within one scanning session were higher than between scanning sessions.

The areas that are active in the average activation map of all subjects in the first study give us an indication of the areas that are involved in the task. The selected maps, which consider only activation in these areas, reveal higher reproducibility scores (both R overlap and R size) than the activation maps that consider the whole brain. These areas agree with what was expected on the basis of a priori knowledge, which indicates that when individual fMRI maps are analysed, confidence in their diagnostic significance can be increased by restricting the analysis to such a priori selected areas. It should be noted that selecting the areas from the data of study 1 can artificially increase reproducibility when activation maps from study 1 are compared to those of the other studies. However, with the same area selection, we also find a comparable increase in reproducibility for the comparison of studies 2 and 3. This indicates that the increase in reproducibility cannot be attributed to selection bias alone.

The reproducibility results are not the same for all activated areas in the brain. The best reproducibility, both in size and overlap, is found in the posterior ROI (Table II), which includes the ventral stream and some parietal areas. The reproducibility is poorer in the (para)hippocampal and anterior ROI. This could be due to the anatomical definition of the boundaries of the ROIs, or to psychophysiological differences in the way subjects perform the task, involving the amount of verbal and nonverbal encoding [Kelley et al., 1998]. It is also possible that the activation in these areas is in fact consistent but smaller, and therefore more sensitive to variance in its detection.

Factors of Variance

Although it is likely that higher cognitive tasks can be performed in different ways (i.e., with different strategies), we believe that the high qualitative reproducibility of the visual encoding task indicates that all subjects were engaged in the same main cognitive processes. However, we found an indication of one possible source of variance in activation. The mean recognition of the pictures afterward correlated with the mean activation volume per subject. Because the recognition of these pictures is probably an indication of the actual encoding success during scanning, encoding success is a possible source of variance in the activation volume found. Although this is a between‐subject effect, it is important for the interpretation of activation maps. A significant correlation within subjects could not be established because there were too few measurements per subject.

The remaining differences observed in activation patterns between cases may be due to several other factors. The first group of factors relates to the acquisition and analysis of the data. Although we matched the brains of all three studies, differences in the registration of the signal can still be present, especially between sessions. One study by Purdon et al. [1998] concluded that there is a wide range of noise variance in the fMRI signal within and between subjects, which influences the determination of active voxels. Earlier research in our institute examined the influence of changing the significance threshold on the reproducibility of activation maps [Rombouts et al., 1998b]. Maximum reproducibility was found for the Bonferroni‐corrected significance threshold, but other significance levels near Bonferroni did not show large deviations in reproducibility. That study also showed that reproducibility can be increased by adjusting the significance threshold for each study. However, this does not provide a standard method to analyse activation maps in a single individual. As an alternative, in an earlier study, the significance threshold for the best reproducibility was determined ad hoc, which resulted in equally good reproducibility for a cognitive task and a motor task [Noll et al., 1997]. However, this procedure requires several measurements within a subject and is therefore not usable for single measurements.

Another group of factors likely to account for variance in observed activation patterns is psychophysiological, such as attention, fatigue, task acquaintance, and performance. One way to reduce the variance from some of these factors may be the use of a self‐paced task design. D'Esposito et al. [1997] recently argued that a self‐paced task might be a better choice if one is interested in the intensity of local neuronal processing rather than the duration of processing.


We conclude that the task we used yields reasonable reproducibility at a qualitative level, because on each occasion activation is found in approximately the same areas. The best quantitative reproducibility can be obtained by analysing preselected areas. Simply counting the activated voxels is sufficient for this, and this approach might be used in clinical applications, because familiar tasks will probably be used for that purpose. However, it should be kept in mind that there is considerable variance in the total amount of activation, even within subjects. Data from large populations must be obtained to further investigate intra‐ and intersubject variance, and more insight into possible confounding factors is needed to reduce or control this variance. We found an indication that the amount of activation related to the task correlates with the recognition afterward. This is in accordance with the idea that the amount of activation correlates with successful neuronal processing [Wagner et al., 1998]. We thus conclude that the activated volume, measured with fMRI, could be a clinically meaningful measure, provided that the variance is better controlled. Therefore, more insight into confounding factors and improved task design are needed before diagnostic interpretations are possible on an individual level.


Additional funds for this study were granted by the “Stichting Alzheimer en Neuropsychiatry Foundation Amsterdam.”


