Abstract
Functional magnetic resonance imaging studies frequently use emotional face processing tasks to probe neural circuitry related to psychiatric disorders and treatments with an emphasis on regions within the salience network (e.g., amygdala). Findings across previous test-retest reliability studies of emotional face processing have shown high variability, potentially due to differences in data analytic approaches. The present study comprehensively examined the test-retest reliability of an emotional faces task utilizing multiple approaches to region of interest (ROI) analysis and by examining voxel-wise reliability across the entire brain for both neural activation and functional connectivity. Analyses included 42 healthy adult participants who completed an fMRI scan concurrent with an emotional faces task on two separate days with an average of 25.52 days between scans. Intraclass correlation coefficients (ICCs) were calculated for the ‘FACES-SHAPES’ and ‘FACES’ (compared to implicit baseline) contrasts across the following: anatomical ROIs identified from a publicly available brain atlas (i.e., Brainnetome), functional ROIs consisting of 5-mm spheres centered on peak voxels from a publicly available meta-analytic database (i.e., Neurosynth), and whole-brain, voxel-wise analysis. Whole-brain, voxel-wise analyses of functional connectivity were also conducted using both anatomical and functional seed ROIs. While group-averaged neural activation maps were consistent across time, only one anatomical ROI and two functional ROIs showed good or excellent individual-level reliability for neural activation. The anatomical ROI was the right medioventral fusiform gyrus for the FACES contrast (ICC = 0.60). The functional ROIs were the left and the right fusiform face area (FFA) for both FACES-SHAPES and FACES (Left FFA ICCs = 0.69 & 0.79; Right FFA ICCs = 0.68 & 0.66). 
Poor reliability (ICCs < 0.4) was identified for almost all other anatomical and functional ROIs, with some exceptions showing fair reliability (ICCs = 0.4–0.59). Whole-brain voxel-wise analysis of neural activation identified voxels with good (ICCs = 0.6–0.74) to excellent reliability (ICCs > 0.75) that were primarily located in visual cortex, with several clusters in bilateral dorsolateral prefrontal cortex (DLPFC). Whole-brain voxel-wise analyses of functional connectivity for amygdala and fusiform gyrus identified very few voxels with good to excellent reliability using both anatomical and functional seed ROIs. Exceptions included clusters in right cerebellum and right DLPFC that showed reliable connectivity with left amygdala (ICCs > 0.6). In conclusion, results indicate that visual cortical regions demonstrate good reliability at the individual level for neural activation, but reliability is generally poor for the salience regions that are often the focus of psychiatric research (e.g., amygdala). Given these findings, future clinical neuroimaging studies using emotional faces tasks to examine individual differences might instead focus on visual regions and their role in psychiatric disorders.
Keywords: Affect, Replication, Individual Differences, Intraclass Correlation, Connectivity
Introduction
The reproducibility of findings in previous human neuroimaging studies has long been an overlooked area of inquiry, and the field of human neuroimaging needs to evolve within this area in order to address meaningful neuroscientific questions (Poldrack et al., 2016). Such an evolution would benefit the clinical utility of these methods in psychiatry by providing reliable measures of brain activity that may be useful in predicting treatment response at the individual level (McDermott et al., 2018). Psychiatric research studies that utilize functional neuroimaging tools to examine neural predictors of treatment response have often focused on regions within the salience network (e.g., Klumpp & Fitzgerald, 2018), which is a collection of brain regions that often co-activate during affectively relevant tasks, and includes the amygdala, insular cortex, and anterior cingulate cortex (Seeley, 2019). However, clinical measurement tools must be demonstrated as reliable to adequately inform the treatment decision-making process (Groth-Marnat, 2009), and this includes establishing test-retest reliability of neuroimaging tasks that primarily seek to target salience regions.
Animal and human studies provide a plethora of evidence that the amygdala, in particular, is a primary brain region for the processing of salient emotional stimuli (Costafreda et al., 2007; Sergerie et al., 2008), and it is therefore considered an important region for psychiatric disorders marked by disrupted emotional processing (Forster et al., 2012). Functional magnetic resonance imaging (fMRI) studies have repeatedly shown that the amygdala is robustly sensitive to processing emotional face stimuli (Phan et al., 2002; Todorov, 2012), and the amygdala activates more in response to emotional facial stimuli than to other emotional images (Hariri et al., 2002; Mattavelli et al., 2014). Other salience regions co-activated during emotional faces tasks include the dorsal anterior cingulate cortex (ACC) and insular cortex (Posamentier & Abdi, 2003; Fusar-Poli et al., 2009). Thus, fMRI studies involving emotional face processing are often used to probe neural activation in salience regions that are relevant to psychiatric disorders and treatments. Additionally, there are other brain regions such as the visual cortex (in particular, the fusiform face area [FFA]) and dorsolateral prefrontal cortex (DLPFC) that are activated during emotional faces tasks (Kanwisher et al., 1997; Posamentier & Abdi, 2003; Fusar-Poli et al., 2009). The FFA is considered to play an essential role in face perception (Kanwisher et al., 1997), while DLPFC is primarily considered to be a part of the frontoparietal network that serves a central role in executive functions and attention regulation in particular (Marek et al., 2018). However, other work has identified DLPFC as part of a dorsal subsystem of the salience network with a primary role in facilitating attention toward affective stimuli (Touroutoglou et al., 2012). Regardless of the specific network, DLPFC seems to play a role in attending to task-relevant information during performance of emotional faces tasks. 
FMRI studies using emotional faces tasks with psychiatric populations (e.g., social anxiety disorder, major depressive disorder, posttraumatic stress disorder) have shown numerous disruptions within these neural regions, particularly documenting hyperactivation within the amygdala (Etkin & Wager, 2007; Fusar-Poli et al., 2009; Shin & Liberzon, 2010; Hughes & Shin, 2011; Dong et al., 2017). Meanwhile, studies examining neural changes with pharmacologic or cognitive-behavioral treatment of mental health disorders have used decreased amygdala activation during emotional face processing as a biomarker of response (Aupperle et al., 2013; Taylor et al., 2014; van Rooij et al., 2015; Godlewska et al., 2016). Lastly, studies have examined whether neural activation during emotional face processing before a treatment may be predictive of subsequent treatment response, and they have shown that amygdala hyperactivation predicts worse treatment outcomes (Bryant et al., 2008; Cisler et al., 2015; Williams et al., 2015). However, the future utility of amygdala activation during an emotional faces task as either a biomarker or a predictor of treatment response for individual patients is dependent upon reliable measurement across time (i.e., test-retest reliability).
To establish the test-retest reliability of a measure, researchers typically conduct studies in which the same procedure is completed on different days, and then the strength of the correlation between the measurements is examined (Groth-Marnat, 2009). The intraclass correlation coefficient (ICC) quantifies the consistency of an assessment across multiple measurements (Shrout & Fleiss, 1979), and it is commonly used in test-retest reliability studies (Yen & Lo, 2002; Weir, 2005; Bennett & Miller, 2010). However, there are several types of ICC estimates that can be utilized depending on the type of measurement being examined, including random raters single-measure ICC [i.e., ICC(2, 1)] and fixed raters single-measure ICC [i.e., ICC(3, 1)]. ICC(2, 1) assesses the agreement of data with both subjects and sessions treated as random factors, and it does not account for fixed practice effects that are consistent across a group (Shrout & Fleiss, 1979). If a practice or habituation effect is expected upon retest, then ICC(3, 1) is considered to be a more appropriate reliability estimate as it accounts for consistent group-level changes in the measurement across time (Shrout & Fleiss, 1979). The interpretation of ICC values typically follows the guidelines presented in Fleiss (1986), such that ICCs less than 0.40 are considered poor, ICCs between 0.40 and 0.59 are considered fair, ICCs between 0.60 and 0.74 are considered good, and ICCs of 0.75 or greater are considered excellent.
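To make the distinction between the two estimates concrete, the following is a minimal sketch of the single-measure ICC(2, 1) and ICC(3, 1) formulas from Shrout and Fleiss (1979), computed from a subjects × sessions table of scores, together with the Fleiss (1986) interpretation bands described above. The function names and the toy session values are illustrative assumptions and are not taken from the present study's analysis code.

```python
def icc_single_measure(scores):
    """Return (ICC(2,1), ICC(3,1)) for a subjects x sessions table.

    scores: list of rows, one row per subject, one column per session.
    Mean squares follow the two-way ANOVA layout of Shrout & Fleiss (1979).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))

    # ICC(2,1): absolute agreement; session mean shifts lower the estimate.
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    # ICC(3,1): consistency; session mean shifts are ignored.
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31


def fleiss_label(icc):
    """Map an ICC to the Fleiss (1986) bands quoted in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Toy data (hypothetical): session 2 equals session 1 plus a uniform shift,
# mimicking a group-level practice effect.
session1 = [0.10, 0.25, 0.40, 0.55, 0.70]
session2 = [x + 0.30 for x in session1]
icc21, icc31 = icc_single_measure([list(pair) for pair in zip(session1, session2)])
print(fleiss_label(icc21), fleiss_label(icc31))
```

In this toy case the rank order of subjects is perfectly preserved, so the consistency estimate is essentially 1, whereas the absolute-agreement estimate is pulled down by the uniform shift between sessions.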
Several fMRI studies have examined test-retest reliability of emotional faces tasks (Johnstone et al., 2005; Schacher et al., 2006; Manuck et al., 2007; Plichta et al., 2012; Sauder et al., 2013; van den Bulk et al., 2013; Lipp et al., 2014; Nord et al., 2017; Haller et al., 2018; Lois et al., 2018). However, the ICCs reported in these studies have varied widely. For the amygdala specifically, most of these studies have reported poor or fair reliability (Plichta et al., 2012; Sauder et al., 2013; van den Bulk et al., 2013; Lipp et al., 2014; Nord et al., 2017; Haller et al., 2018; Lois et al., 2018; full range of ICCs = −.50 to .50). However, others have reported good to excellent reliability (Johnstone et al., 2005; Schacher et al., 2006; Manuck et al., 2007; Geissberger et al., 2020; full range of ICCs = −.08 to .83). Although inconsistently reported across studies, the test-retest reliability of FFA neural activation has been shown in some cases to be in the good to excellent range (Nord et al., 2017; Geissberger et al., 2020; full range of ICCs = .73 to .83), while others show poor to fair reliability (Sauder et al., 2013; Lipp et al., 2014; full range of ICCs = −.24 to .50). Reports of other regions, such as the ACC and DLPFC have also been inconsistent across studies with results ranging across poor (Nord et al., 2017; ICCs = −.13 to .33), fair (Geissberger et al., 2020; ICCs = .41 to .59), or good to excellent reliability (Haller et al., 2018; ICCs = .63 to .87).
While there are several potential factors contributing to this variability across previous test-retest studies, one important factor could be the sample size. Those studies reporting poor or fair amygdala reliability often had larger sample sizes (mean N across seven studies: 26.3; full range of N’s: 15–46; Plichta et al., 2012; Sauder et al., 2013; van den Bulk et al., 2013; Lipp et al., 2014; Nord et al., 2017; Haller et al., 2018; Lois et al., 2018). Meanwhile, those reporting good to excellent amygdala reliability often had smaller sample sizes (mean N across four studies: 13.5; full range of N’s: 12–15; Johnstone et al., 2005; Schacher et al., 2006; Manuck et al., 2007; Geissberger et al., 2020). The relationship between sample size and the reliability of neural activation for other regions, such as FFA, is less clear as ICCs for these regions were not reported in all studies. This variance in findings could also be due to the use of different analysis methods across studies, such as the specific contrast utilized or the methods for defining regions of interest (ROIs) to extract average neural activation. Regarding contrasts, amygdala ICCs have tended to be greater when contrasting face trials (regardless of valence) to either shape trials or fixation as compared to ICCs when contrasting between specific face valences (e.g., fearful versus happy; Johnstone et al., 2005; Sauder et al., 2013; Lipp et al., 2014). Concerning ROIs, some studies have used functionally-defined ROIs and some have used anatomically-defined ROIs. Functionally-defined ROIs are based on statistically significant activation averaged across the study sample (i.e., inconsistent across independent samples) or pre-defined (a priori) spherical ROIs centered on a peak coordinate determined by previous work. Anatomically-defined ROIs are also pre-defined and are determined by the selection of specific regions within an anatomical atlas. 
Within the same studies, analyses utilizing functionally-defined ROIs for amygdala typically led to higher ICCs than analyses utilizing anatomically-defined ROIs (Johnstone et al., 2005; Sauder et al., 2013; van den Bulk et al., 2013; Lipp et al., 2014). Importantly, the identification of reliable ROIs that are not based on the data collected from one specific sample would be most applicable for clinical purposes as these could be applied to a single individual’s data for treatment prediction or indexing treatment response.
Several recent studies have also examined the test-retest reliability of task-based functional connectivity (i.e., psychophysiological interaction) during emotional faces tasks using amygdala as a seed region (Haller et al., 2018; Nord et al., 2019). Within a healthy youth sample (N = 25), Haller and colleagues (2018) found reliable functional connectivity between an anatomically-defined left amygdala ROI seed and both left DLPFC and dorsal medial PFC (using an ICC threshold of 0.5). Within a healthy adult sample (N = 29), Nord and colleagues (2019) found reliable connectivity between anatomically-defined left and right amygdala seed ROIs and dorsal medial PFC (using an ICC threshold of 0.4). Note that some of these findings exceeded ICCs of 0.6 and thus fell within the good to excellent range for test-retest reliability (Haller et al., 2018; Nord et al., 2019). While these findings are encouraging, these analyses should be replicated using larger sample sizes.
The present study utilized multiple, standardized approaches to examine the test-retest reliability of neural activation and functional connectivity in a moderately large sample of healthy adults performing an emotional faces task (Hariri et al., 2002; Taylor et al., 2014) during fMRI. We utilized two standardized data analytic approaches to defining a priori ROIs (i.e., one anatomically-defined and one functionally-defined). We also conducted exploratory whole-brain, voxel-wise ICC analyses of neural activation and task-based functional connectivity. As the present study was primarily interested in examining the absolute agreement of measurements between time points without accounting for fixed practice effects (i.e., consistent change across the group), ICC(2, 1) was used for all ICC calculations in this study. This study also used a more stringent ICC threshold of 0.6 to focus on identifying ROIs or voxels with good to excellent reliability, but note that ROIs or voxels with ICCs below this threshold are also reported. Based on previous work, we hypothesized that: (1) the majority of ICCs calculated for a priori ROIs would show poor reliability (i.e., less than 0.4), (2) reliability estimates would be higher for functionally-defined ROIs than anatomically-defined ROIs, and (3) whole-brain functional connectivity analyses between amygdala seed ROIs and frontal cortex regions (i.e., DLPFC, dorsal medial PFC) would yield voxels with good to excellent reliability estimates. Due to the present study’s moderately large sample size (compared to many previous fMRI reliability studies) and exhaustive approach in examining test-retest reliability, this study seeks to provide clarity about the future utility of neural activation in response to emotional faces tasks during fMRI in psychiatric assessment and treatment prediction research.
Methods
Subject Selection
Fifty-eight healthy adults with no psychiatric diagnoses were recruited to participate in the present study (36 females; mean age = 24.43 years, range: 18–49). This sample was compiled across two different studies with identical fMRI scanning procedures, one that focused on test-retest reliability of measures in healthy adults with no psychiatric diagnosis (N = 27) and another that focused on the impact of resilience training in adult undergraduate college students (N = 31; results from clinical study reported previously; Akeman et al., 2020). From the latter study, only participants with no psychiatric diagnosis who were part of the control group were included, as they had completed the same fMRI protocol on two different dates without any intervention between scanning sessions at time 1 (T1) and time 2 (T2). Note that subsample 1 had a planned test-retest interval of approximately 2–3 weeks. Meanwhile, subsample 2 had a planned test-retest interval of approximately 6–8 weeks in order to compare participant data to a separate intervention group that had completed scans before and after a four-week resilience training protocol (Akeman et al., 2020). Participants in the test-retest reliability study were assessed for psychiatric diagnoses using the Mini-International Neuropsychiatric Interview (M.I.N.I.; Sheehan et al., 1998), while previous psychiatric diagnoses in the college student study were obtained via self-reported psychiatric history. Exclusionary criteria included diagnosis of a psychiatric disorder, concurrent use of psychotropic medications, medical illness that would affect central nervous system function (e.g., neurological disease), history of significant head trauma, current substance abuse, and ferromagnetic implants. 
A total of 16 participants who were enrolled in the study were also excluded from analysis due to poor data quality: 15 participants were excluded for having greater than 20% of their trials removed due to excessive motion at T1 or T2 (i.e., using a threshold of 0.3 mm for the average Euclidean Norm [ENORM] of motion parameters), and one participant was excluded due to scanner acquisition error at T2.
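The motion-based exclusion rule can be sketched roughly as follows. This is only an illustration under stated assumptions: the Euclidean norm is taken over the volume-to-volume change in six motion parameters, censoring is applied per volume (standing in for the per-trial rule described above), and the function names are hypothetical. It is not the pipeline used in the study.

```python
import math


def enorm_per_volume(motion):
    """Euclidean norm of the volume-to-volume change in motion parameters.

    motion: list of per-volume parameter lists (assumed here to hold six
    values per volume). The first volume has no predecessor and is set to 0.
    """
    out = [0.0]
    for prev, cur in zip(motion, motion[1:]):
        out.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur))))
    return out


def exceeds_motion_limit(motion, enorm_thresh=0.3, max_censored_frac=0.20):
    """True if more than max_censored_frac of volumes exceed enorm_thresh."""
    enorm = enorm_per_volume(motion)
    censored = sum(1 for e in enorm if e > enorm_thresh)
    return censored / len(enorm) > max_censored_frac
```

Under these assumptions, a participant would be flagged when the censored fraction at either T1 or T2 exceeds 20%.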
The combined final sample included forty-two healthy adults across the two studies (24 females; mean age = 24.83 years, SD = 8.80, range: 18–49). The racial and ethnic demographics of the combined final sample were as follows: 28 participants were non-Hispanic white (12 in subsample 1, 16 in subsample 2), two were Hispanic (two in subsample 1, zero in subsample 2), three were black (one in subsample 1, two in subsample 2), four were Asian (two in subsample 1, two in subsample 2), and five were American Indian (four in subsample 1, one in subsample 2). Two-tailed independent-samples t-tests were used to test for statistically significant differences between subsamples in age, test-retest interval, and extraneous factors that may affect test-retest reliability, such as accuracy, reaction time, time of day when the scan started, and participants’ average head motion values. As expected based on the different study designs, these subsamples significantly differed on age and test-retest interval (p’s < .001), but they did not significantly differ on any of the extraneous factors (all p’s > .14). These results are reported in Supplemental Table 1. Participants provided informed consent and received monetary compensation for their time and travel following the guidelines of the Western Institutional Review Board, which approved both study protocols. Research was conducted in accordance with the World Medical Association Declaration of Helsinki.
Experimental paradigm and stimuli
Participants completed the same study protocol on two different days. During the block-design emotional faces task (Hariri et al., 2002; Taylor et al., 2014), participants were presented with a series of shapes or emotional faces (from the NIMSTIM Set of Face Expressions; Tottenham et al., 2009) and were instructed to choose between two options to match a target stimulus based on either the form of the shapes or the emotion of the faces, respectively. Participants had five seconds to respond on a button pad using either their index or middle finger on the same hand. Each block (shapes or angry, happy, or fearful face-matching targets) consisted of six trials and lasted for 30 seconds. A fixation cross was presented for ten seconds between each block. A total of 12 blocks (three for shapes; three for each emotion) were presented in a pseudorandomized order and the entire task lasted 512 seconds. Accuracy and reaction time data were calculated for each participant and averaged across blocks. Due to errors in acquisition, accuracy and reaction time data were unavailable for three subjects.
FMRI data acquisition and imaging parameters
Functional and structural images were acquired using a Discovery MR750 whole-body 3.0 Tesla MRI scanner (GE Healthcare, Milwaukee, WI, USA). A receive-only 8-element phased array coil (GE Healthcare, Milwaukee, WI, USA) optimized for parallel imaging was used for MRI signal reception. During task performance, a single fMRI scan collected BOLD signal using a single-shot, gradient-recalled echo-planar imaging (EPI) sequence with sensitivity encoding (96 × 96 matrix, 240 mm field of view [FOV], 1.875 × 1.875 mm² in-plane resolution, 39 axial slices, 2.9 mm slice thickness, 2.0 s repetition time [TR], 27 ms echo time [TE], 40° flip angle, 250 kHz sampling bandwidth, 256 volumes, and SENSE acceleration factor R = 2 in the phase-encoding direction). One T1-weighted Magnetization Prepared Rapid Gradient Echo (MPRAGE) imaging sequence with SENSE was used for anatomical reference and alignment purposes (256 × 256 matrix size, 240 mm FOV, 0.938 × 0.938 mm² in-plane resolution, 1.0 mm slice thickness, 5.94 ms TR, 1.96 ms TE, 8° flip angle, 31.2 kHz sampling bandwidth, 186 axial slices per volume, and acceleration factor R = 2).
Data preprocessing and subject-level analyses
All structural and functional imaging data were preprocessed and analyzed using the Analysis of Functional NeuroImages (AFNI) software package (Cox, 1996). The first three volumes were discarded and slice timing correction was performed for each volume. The anatomical image was aligned to an EPI image (i.e., using a low motion TR for each individual) and warped to the MNI152_T1_2009c T1-weighted anatomical template. EPI images were realigned to the first volume, normalized to the template image, and resampled to a voxel size of 2 × 2 × 2 mm³. Anatomical data were resampled to a voxel size of 1 × 1 × 1 mm³.
Individual participant time series data were analyzed using AFNI’s 3dDeconvolve program (using the ‘BLOCK’ function). Regressors of interest were the angry, fearful, happy, and shape blocks. Regressors of non-interest included motion parameters (roll, pitch and yaw directions) and regressors to model baseline, linear and quadratic trends. The resultant regressor coefficients were divided by the baseline regressor to calculate percent signal change (PSC). PSC was combined across the three emotion types (i.e., angry, fear, happy; FACES) and contrasted with shape blocks (FACES-SHAPES) or compared to implicit baseline (FACES). A Gaussian filter with 4 mm full-width at half maximum was applied to the voxel-wise PSC data.
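The arithmetic behind the two contrasts can be sketched for a single voxel or ROI as below. This is an illustrative sketch rather than the AFNI commands used in the study: it assumes that "combined" means a simple average of the three emotion-specific PSC values and that PSC is expressed as 100 × coefficient / baseline. The function names and the dictionary keys are hypothetical.

```python
def percent_signal_change(beta, baseline):
    """Scale a regression coefficient by the baseline estimate (assumed x100)."""
    return 100.0 * beta / baseline


def faces_contrasts(betas, baseline):
    """Compute the FACES and FACES-SHAPES values from condition coefficients.

    betas: dict with keys "angry", "fearful", "happy", and "shape".
    """
    psc = {name: percent_signal_change(b, baseline) for name, b in betas.items()}
    # Assumption: the three emotion blocks are averaged to form FACES.
    faces = (psc["angry"] + psc["fearful"] + psc["happy"]) / 3.0
    return {
        "FACES": faces,                         # versus implicit baseline
        "FACES-SHAPES": faces - psc["shape"],   # versus shape-matching blocks
    }
```

For example, `faces_contrasts({"angry": 2.0, "fearful": 4.0, "happy": 6.0, "shape": 1.0}, baseline=200.0)` would give a FACES value built from the three emotion conditions and a FACES-SHAPES value that subtracts the shape condition from it.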
Anatomical ROI Selection
The Brainnetome atlas (atlas.brainnetome.org) is an open-access resource that provides a map of anatomical subregions of the human brain (Fan et al., 2016). These subregions were constructed using a comprehensive, multimodal neuroimaging approach that utilized both structural and functional connectivity information in addition to standard structural imaging.
Based on previous meta-analyses of emotional faces tasks (Fusar-Poli et al., 2009; Posamentier & Abdi, 2003), we focused on Brainnetome subregions within bilateral amygdala, fusiform gyrus, insula, DLPFC, and ACC. This resulted in a total of 36 ROIs (see Table 2).
Table 2.
Region of Interest (ROI) | FACES-SHAPES: Session 1 PSC Mean (SD) | FACES-SHAPES: Session 2 PSC Mean (SD) | FACES-SHAPES: ICC(2, 1) (95% CI) | FACES-SHAPES: ICC(2, 1) p-value | FACES: Session 1 PSC Mean (SD) | FACES: Session 2 PSC Mean (SD) | FACES: ICC(2, 1) (95% CI) | FACES: ICC(2, 1) p-value |
---|---|---|---|---|---|---|---|---|
Amygdala | | | | | | | | |
Left Amygdala Lateral, BN 211 | 0.176 (0.175) | 0.138 (0.286) | 0.12 (−0.19–0.41) | 0.23 | 0.135 (0.205) | 0.149 (0.215) | 0.14 (−0.18–0.42) | 0.20 |
Right Amygdala Lateral, BN 212 | 0.155 (0.149) | 0.111 (0.182) | −0.06 (−0.35–0.24) | 0.66 | 0.102 (0.138) | 0.079 (0.101) | 0.03 (−0.27–0.33) | 0.41 |
Left Amygdala Medial, BN 213 | 0.150 (0.310) | 0.163 (0.414) | 0.18 (−0.14–0.46) | 0.14 | 0.017 (0.319) | 0.073 (0.313) | 0.05 (−0.26–0.35) | 0.38 |
Right Amygdala Medial, BN 214 | 0.129 (0.269) | 0.096 (0.338) | 0.27 (−0.04–0.53) | 0.04 | 0.130 (0.284) | 0.131 (0.192) | 0.11 (−0.21–0.40) | 0.25 |
Fusiform Gyrus | | | | | | | | |
Left Fusiform Gyrus Ventrolateral, BN 105 | 0.379 (0.312) | 0.408 (0.377) | 0.26 (−0.04–0.53) | 0.05 | 0.366 (0.284) | 0.366 (0.274) | 0.33 (0.02–0.57) | 0.02 |
Right Fusiform Gyrus Ventrolateral, BN 106 | 0.547 (0.303) | 0.569 (0.459) | 0.21 (−0.10–0.48) | 0.09 | 0.592 (0.354) | 0.589 (0.374) | 0.31 (0.01–0.56) | 0.02 |
Left Fusiform Gyrus Medioventral, BN 107 | 0.307 (0.329) | 0.354 (0.416) | 0.34 (0.04–0.58) | 0.01 | 0.384 (0.359) | 0.387 (0.336) | 0.54# (0.28–0.72) | < 0.01 |
Right Fusiform Gyrus Medioventral, BN 108 | 0.367 (0.425) | 0.421 (0.424) | 0.49# (0.22–0.69) | < 0.01 | 0.531 (0.411) | 0.546 (0.406) | 0.60* (0.37–0.77) | < 0.01 |
Insula | | | | | | | | |
Left Insula Dorsal Agranular, BN 163 | 0.032 (0.143) | 0.063 (0.150) | 0.08 (−0.24–0.38) | 0.30 | 0.049 (0.121) | 0.032 (0.115) | −0.04 (−0.34–0.27) | 0.59 |
Right Insula Dorsal Agranular, BN 164 | 0.019 (0.195) | 0.076 (0.219) | 0.01 (−0.29–0.30) | 0.49 | 0.020 (0.179) | 0.028 (0.154) | 0.20 (−0.12–0.47) | 0.11 |
Left Insula Dorsal Dysgranular, BN 165 | 0.010 (0.137) | 0.017 (0.143) | 0.07 (−0.23–0.37) | 0.31 | 0.058 (0.100) | 0.034 (0.105) | 0.29 (−0.01–0.54) | 0.03 |
Right Insula Dorsal Dysgranular, BN 166 | −0.001 (0.145) | 0.034 (0.234) | 0.05 (−0.26–0.34) | 0.39 | 0.007 (0.127) | 0.008 (0.135) | 0.28 (−0.03–0.54) | 0.04 |
Left Insula Dorsal Granular, BN 167 | 0.005 (0.147) | −0.001 (0.125) | 0.04 (−0.27–0.34) | 0.40 | 0.024 (0.105) | 0.005 (0.123) | 0.09 (−0.22–0.38) | 0.28 |
Right Insula Dorsal Granular, BN 168 | −0.004 (0.153) | 0.037 (0.191) | −0.18 (−0.46–0.13) | 0.87 | −0.055 (0.121) | −0.054 (0.178) | 0.11 (−0.20–0.40) | 0.24 |
Left Insula Hypergranular, BN 169 | −0.002 (0.130) | 0.024 (0.137) | −0.12 (−0.41–0.19) | 0.78 | −0.002 (0.117) | −0.012 (0.140) | −0.04 (−0.35–0.27) | 0.61 |
Right Insula Hypergranular, BN 170 | 0.003 (0.146) | 0.034 (0.206) | −0.16 (−0.44–0.15) | 0.84 | −0.112 (0.133) | −0.118 (0.184) | 0.09 (−0.23–0.38) | 0.30 |
Left Insula Ventral Agranular, BN 171 | 0.022 (0.232) | 0.018 (0.202) | −0.03 (−0.34–0.28) | 0.57 | 0.001 (0.169) | −0.041 (0.196) | 0.19 (−0.11–0.47) | 0.10 |
Right Insula Ventral Agranular, BN 172 | −0.057 (0.317) | 0.047 (0.281) | −0.07 (−0.35–0.23) | 0.68 | −0.098 (0.171) | −0.085 (0.233) | 0.16 (−0.15–0.44) | 0.16 |
Left Insula Ventral Granular, BN 173 | 0.034 (0.198) | 0.069 (0.146) | 0.06 (−0.24–0.36) | 0.34 | 0.022 (0.134) | 0.014 (0.102) | 0.16 (−0.15–0.44) | 0.15 |
Right Insula Ventral Granular, BN 174 | −0.046 (0.250) | 0.040 (0.282) | 0.17 (−0.13–0.44) | 0.14 | −0.080 (0.164) | −0.055 (0.186) | 0.04 (−0.27–0.34) | 0.40 |
Dorsolateral PFC | | | | | | | | |
Left DLPFC Dorsal Area 44, BN 15 | 0.120 (0.156) | 0.149 (0.178) | 0.03 (−0.28–0.33) | 0.43 | 0.059 (0.123) | 0.070 (0.141) | 0.34 (−0.04–0.58) | 0.01 |
Right DLPFC Dorsal Area 44, BN 16 | 0.199 (0.228) | 0.268 (0.219) | 0.15 (−0.15–0.43) | 0.16 | 0.185 (0.238) | 0.156 (0.179) | 0.55# (0.30–0.73) | < 0.01 |
Left DLPFC Dorsal Area 9/46, BN 17 | −0.016 (0.152) | −0.026 (0.146) | 0.26 (−0.05–0.52) | 0.05 | −0.029 (0.121) | −0.023 (0.125) | 0.31 (0.01–0.56) | 0.02 |
Right DLPFC Dorsal Area 9/46, BN 18 | 0.015 (0.149) | 0.029 (0.198) | 0.28 (−0.03–0.54) | 0.04 | −0.026 (0.135) | −0.018 (0.158) | 0.30 (−0.01–0.55) | 0.03 |
Left DLPFC Inferior Frontal Junction, BN 21 | 0.123 (0.194) | 0.147 (0.203) | 0.12 (−0.19–0.41) | 0.22 | 0.080 (0.193) | 0.115 (0.152) | 0.20 (−0.11–0.47) | 0.10 |
Right DLPFC Inferior Frontal Junction, BN 22 | 0.196 (0.219) | 0.243 (0.216) | −0.03 (−0.33–0.27) | 0.58 | 0.133 (0.207) | 0.154 (0.158) | 0.24 (−0.07–0.51) | 0.06 |
Left DLPFC Ventral Area 9/46, BN 29 | 0.014 (0.201) | 0.047 (0.203) | 0.24 (−0.06–0.51) | 0.06 | 0.020 (0.154) | 0.026 (0.137) | 0.27 (−0.04–0.53) | 0.04 |
Right DLPFC Ventral Area 9/46, BN 30 | 0.062 (0.275) | 0.151 (0.275) | 0.03 (−0.26–0.32) | 0.42 | 0.041 (0.210) | 0.061 (0.185) | 0.31 (0.00–0.56) | 0.02 |
Anterior Cingulate Cortex | | | | | | | | |
Left ACC Caudodorsal, BN 177 | −0.001 (0.163) | 0.012 (0.239) | 0.11 (−0.20–0.40) | 0.24 | −0.064 (0.170) | −0.059 (0.182) | 0.48# (0.21–0.69) | < 0.01 |
Right ACC Caudodorsal, BN 178 | 0.010 (0.157) | 0.026 (0.182) | 0.19 (−0.12–0.47) | 0.12 | −0.050 (0.123) | −0.041 (0.142) | 0.12 (−0.20–0.41) | 0.23 |
Left ACC Pregenual, BN 179 | −0.057 (0.204) | −0.072 (0.227) | 0.05 (−0.26–0.35) | 0.37 | −0.123 (0.179) | −0.137 (0.164) | 0.25 (−0.06–0.51) | 0.06 |
Right ACC Pregenual, BN 180 | −0.002 (0.153) | −0.019 (0.147) | 0.22 (−0.09–0.49) | 0.08 | −0.060 (0.132) | −0.073 (0.122) | 0.16 (−0.15–0.44) | 0.16 |
Left ACC Rostroventral, BN 183 | −0.003 (0.155) | 0.040 (0.247) | 0.35 (−0.05–0.58) | 0.01 | −0.054 (0.186) | −0.015 (0.164) | 0.19 (−0.12–0.46) | 0.12 |
Right ACC Rostroventral, BN 184 | −0.012 (0.116) | 0.014 (0.140) | 0.39 (0.10–0.62) | < 0.01 | −0.043 (0.101) | −0.041 (0.091) | 0.02 (−0.29–0.32) | 0.46 |
Left ACC Subgenual, BN 187 | −0.121 (0.253) | −0.086 (0.313) | −0.08 (−0.38–0.23) | 0.70 | −0.198 (0.203) | −0.178 (0.237) | 0.00 (−0.30–0.30) | 0.51 |
Right ACC Subgenual, BN 188 | −0.083 (0.167) | −0.080 (0.187) | 0.08 (−0.24–0.38) | 0.31 | −0.145 (0.145) | −0.162 (0.169) | 0.22 (−0.09–0.49) | 0.08 |
ICCs are absolute agreement and single-measure. ICCs from .40 to .59 are denoted with a #. ICCs of .60 or greater are denoted with a *. ICC value interpretation: poor (<.40), fair (.40–.59), good (.60–.74), excellent (≥.75). Negative ICCs are interpreted as having zero reliability. Abbreviations: PSC=percent signal change; ICC=intraclass correlation coefficient; ROI=region of interest; SD=standard deviation; CI=confidence interval; BN=Brainnetome; PFC=prefrontal cortex; DLPFC=dorsolateral prefrontal cortex; ACC=anterior cingulate cortex.
Functional ROI Selection
The Neurosynth database (neurosynth.org) is a meta-analytic resource that can be used to identify regions consistently active in studies relevant to specific search terms (Yarkoni et al., 2011). For the purposes of the current study, we used the search term “emotional faces”, which yielded a total of 100 relevant studies. A “uniformity test map” of voxel-wise z-scores was constructed to help identify regions consistently active across these studies. Using a z-score threshold of 5.26 allowed us to identify the greatest number of regions while also allowing for separation into individual clusters. We identified the peak Montreal Neurological Institute (MNI) coordinates for nine ROIs: eight within bilateral amygdala, fusiform gyrus (i.e., FFA), insula, and DLPFC, and one in medial dorsal ACC (see Table 3). We then constructed spherical ROIs (5 mm radius) centered on these coordinates.
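As a rough illustration of the spherical ROI construction, the sketch below lists the voxel centers that fall within 5 mm of a peak coordinate. It assumes a hypothetical 2 mm isotropic grid whose voxel centers sit at multiples of 2 mm in MNI space; the actual grid and the software used to draw the spheres are not specified here, and the function name is illustrative.

```python
import math


def sphere_voxels(center_mm, radius_mm=5.0, voxel_mm=2.0):
    """Return grid voxel centers (in mm) within radius_mm of center_mm.

    Assumes voxel centers lie at integer multiples of voxel_mm on each axis.
    """
    cx, cy, cz = center_mm

    def axis(c):
        # Grid positions along one axis that can possibly fall inside the sphere.
        lo = math.ceil((c - radius_mm) / voxel_mm)
        hi = math.floor((c + radius_mm) / voxel_mm)
        return [i * voxel_mm for i in range(lo, hi + 1)]

    return [
        (x, y, z)
        for x in axis(cx)
        for y in axis(cy)
        for z in axis(cz)
        if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius_mm ** 2
    ]


# Example: a sphere around the left FFA peak coordinate listed in Table 3.
left_ffa_voxels = sphere_voxels((-42.0, -57.0, -7.0))
print(len(left_ffa_voxels))
```

Because the ROI is defined by distance from a fixed coordinate, the same mask can be applied to any participant's normalized data without reference to the study sample.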
Table 3.
ROI | FACES-SHAPES: Session 1 PSC Mean (SD) | FACES-SHAPES: Session 2 PSC Mean (SD) | FACES-SHAPES: ICC(2, 1) (95% CI) | FACES-SHAPES: ICC(2, 1) p-value | FACES: Session 1 PSC Mean (SD) | FACES: Session 2 PSC Mean (SD) | FACES: ICC(2, 1) (95% CI) | FACES: ICC(2, 1) p-value |
---|---|---|---|---|---|---|---|---|
Amygdala | | | | | | | | |
Left Amygdala – NS Peak (MNI: −23, −5, −14) | 0.288 (0.247) | 0.233 (0.348) | 0.24 (−0.07–0.51) | 0.06 | 0.253 (0.265) | 0.227 (0.247) | 0.42# (−0.13–0.64) | < 0.01 |
Right Amygdala – NS Peak (MNI: 24, −5, −12) | 0.204 (0.203) | 0.168 (0.196) | 0.26 (−0.04–0.52) | 0.04 | 0.164 (0.192) | 0.142 (0.151) | 0.34 (−0.04–0.58) | 0.01 |
Fusiform Gyrus | | | | | | | | |
Left FFA – NS Peak (MNI: −42, −57, −7) | 0.146 (0.267) | 0.158 (0.219) | 0.69* (0.49–0.82) | < 0.01 | 0.149 (0.232) | 0.159 (0.234) | 0.79* (0.64–0.88) | < 0.01 |
Right FFA – NS Peak (MNI: 38, −60, −9) | 0.343 (0.375) | 0.362 (0.388) | 0.68* (0.48–0.82) | < 0.01 | 0.422 (0.351) | 0.478 (0.378) | 0.66* (0.45–0.80) | < 0.01 |
Insula | | | | | | | | |
Left Insula – NS Peak (MNI: −35, 17, 2) | 0.063 (0.190) | 0.092 (0.220) | 0.09 (−0.23–0.38) | 0.29 | 0.093 (0.148) | 0.070 (0.150) | −0.08 (−0.38–0.33) | 0.70 |
Right Insula – NS Peak (MNI: 33, 19, 1) | 0.028 (0.174) | 0.091 (0.239) | −0.04 (−0.33–0.26) | 0.60 | 0.033 (0.158) | 0.024 (0.155) | 0.04 (−0.27–0.34) | 0.40 |
Dorsolateral PFC | | | | | | | | |
Left DLPFC – NS Peak (MNI: −48, 19, 29) | 0.263 (0.347) | 0.325 (0.347) | 0.13 (−0.18–0.42) | 0.20 | 0.126 (0.290) | 0.175 (0.277) | 0.05 (−0.26–0.34) | 0.38 |
Right DLPFC – NS Peak (MNI: 42, 11, 28) | 0.267 (0.287) | 0.301 (0.277) | 0.41# (−0.12–0.47) | < 0.01 | 0.237 (0.296) | 0.227 (0.231) | 0.55# (0.29–0.73) | < 0.01 |
Anterior Cingulate Cortex | | | | | | | | |
Dorsal ACC – NS Peak (MNI: 9, 31, 19) | −0.009 (0.122) | −0.025 (0.129) | 0.14 (−0.17–0.42) | 0.19 | −0.059 (0.094) | −0.071 (0.089) | 0.17 (−0.14–0.45) | 0.14 |
ICCs are absolute agreement and single-measure. ICCs between .4–.6 are denoted with a #. ICCs >.6 are denoted with a *. ICC value interpretation: poor (<.40), fair (.40–.59), good (.60–.74), excellent (≥.75). Negative ICCs are interpreted as having zero reliability. Abbreviations: ICC=intraclass correlation coefficient; ROI=region of interest; PSC=percent signal change; SD=standard deviation; CI=confidence interval; NS=Neurosynth; MNI=Montreal Neurological Institute coordinates; FFA=fusiform face area; PFC=prefrontal cortex; DLPFC=dorsolateral prefrontal cortex; ACC=anterior cingulate cortex.
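The interpretation bands stated in the table note can be written as a small helper. This is an illustrative sketch; `classify_icc` is a hypothetical name, and the example values are the left FFA, left amygdala, and left insula ICCs reported in Table 3.

```python
def classify_icc(icc):
    """Map an ICC to the bands in the table note.

    Negative ICCs are treated as zero reliability, so they fall in "poor".
    """
    icc = max(icc, 0.0)
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"

# ICCs taken from Table 3
labels = {
    "Left FFA, FACES-SHAPES": classify_icc(0.69),
    "Left FFA, FACES": classify_icc(0.79),
    "Left amygdala, FACES": classify_icc(0.42),
    "Left insula, FACES": classify_icc(-0.08),
}
```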
Statistical Analyses for Anatomical and Functional ROIs
Statistical analyses were carried out using the R Statistical Package (R Core Team, 2013). First, we conducted paired-samples t-tests to identify any significant changes between scan sessions at T1 and T2 in extraneous factors that may affect test-retest reliability, such as accuracy, reaction time, time of day when the scan started, and participants’ average head motion values. Using the R package ‘irr’ (v0.84.1; Gamer et al., 2019), random raters single-measure ICCs (i.e., ICC[2, 1]) between T1 and T2 were calculated for the mean PSC extracted from the 36 anatomical ROIs and the nine functional ROIs, for individual behavioral performance as measured by reaction time, and for individual motion characterized by ENORM values. ICCs were not calculated for accuracy data because of the ceiling effect in performance (see Table 1). For ICC calculations, scan session and subject were categorized as random factors to calculate ICC(2, 1). When reporting ICCs, we separately denote ROIs that met the more stringent 0.6 cutoff for good to excellent reliability and those that met the less stringent 0.4 cutoff for fair reliability.
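For readers who want to see what ICC(2, 1) computes, the following is a minimal sketch of the standard two-way random-effects, absolute-agreement, single-measure formula built from ANOVA mean squares. It is not the ‘irr’ implementation used in the study; the function name `icc_2_1` is hypothetical, and the check data are the classic six-subject, four-rater worked example from Shrout and Fleiss (1979), not data from this study.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: array of shape (n_subjects, k_sessions).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between sessions
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Worked example data (rows = subjects, columns = raters)
example = np.array([
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
])
example_icc = icc_2_1(example)  # approximately 0.29

# A constant shift between two sessions lowers absolute agreement
# even though the rank order of subjects is perfectly preserved.
shifted_icc = icc_2_1([[1, 2], [2, 3], [3, 4]])
```

Because the session term enters the denominator, a systematic change in mean activation between T1 and T2 reduces ICC(2, 1), which is why absolute agreement is a stricter criterion than consistency.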
Table 1.
Variable | Time 1 Mean (SD) | Time 2 Mean (SD) | ICC (2, 1) (p-value) | Paired t-test t-statistic | Paired t-test p-value |
---|---|---|---|---|---|
Accuracy (% Correct)@ | 98.8% (2.1) | 98.7% (2.2) | - | 0.24 | 0.81 |
Reaction time@ | 1077 ms (257) | 1027 ms (215) | 0.44# (< 0.01) | 1.50 | 0.14 |
Time of day scan started | 1:10pm (186 min) | 12:27pm (168 min) | - | 1.79 | 0.08 |
Head motion (ENORM values) | 0.0688 (0.0313) | 0.0643 (0.0267) | 0.60* (< 0.01) | 0.94 | 0.35 |
No variables statistically differed between T1 and T2 (all p’s > .05). Behavioral variables that included data from 39 out of the 42 participants are denoted with a @. Participants had near perfect accuracy for both T1 and T2, and an ICC was not calculated for accuracy as there was a ceiling effect and minimal variance to account for. An ICC was also not calculated for time of day as this variable was determined by participants’ scheduling convenience and was not a variable of interest concerning reliability. ICCs are absolute agreement and single-measure. ICCs between .4–.6 are denoted with a #. ICCs >.6 are denoted with a *. ICC value interpretation: poor (< .40), fair (.40–.59), good (.60–.74), excellent (≥ .75). Abbreviations: SD=standard deviation; ICC=intraclass correlation coefficient; ms = milliseconds; min = minutes; ENORM = average Euclidean Norm of motion parameters.
Whole-Brain Voxel-Wise ICCs for Neural Activation
Whole-brain voxel-wise ICCs were calculated using AFNI’s 3dLME package separately for the FACES-SHAPES and FACES contrasts. ICC calculations categorized both scan session and subject as random factors to calculate ICC(2, 1). These ICC maps were thresholded at a more stringent threshold of 0.6 to display only voxels that had good or excellent reliability. In addition, the unthresholded whole-brain ICC maps are provided in Supplemental Figure 1.
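The thresholding step can be illustrated with a short sketch. It assumes the voxel-wise ICC map has already been computed and loaded as an array; the toy values below are made up for illustration, `threshold_icc_map` is a hypothetical helper, and treating exactly 0.60 as passing the cutoff is an assumption based on the "good (.60–.74)" band.

```python
import numpy as np

def threshold_icc_map(icc_map, cutoff=0.6):
    """Zero out voxels below the cutoff and count the voxels that survive."""
    icc_map = np.asarray(icc_map, dtype=float)
    keep = icc_map >= cutoff  # NaN voxels compare False and are dropped
    thresholded = np.where(keep, icc_map, 0.0)
    return thresholded, int(keep.sum())

# Toy 2 x 2 "map" with made-up ICC values
toy_map = np.array([[0.10, 0.62],
                    [0.80, -0.05]])
toy_thresholded, toy_count = threshold_icc_map(toy_map)
```

The count returned here corresponds to the kind of "total number of voxels with good or excellent reliability" summary used to compare contrasts.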
To visualize the consistency of group-level task effects at T1 and T2, we conducted whole-brain one-sample t-tests for each time point using AFNI’s 3dttest++ program. The results of these analyses were statistically thresholded at p < .01, corrected for multiple comparisons. Multiple comparisons corrections were conducted using cluster-based permutation testing with AFNI’s 3dClustSim program, and a cluster-size threshold of 1,537 voxels was used based on a corrected threshold of α < .05. This was the more stringent of the two clustering thresholds obtained when testing T1 and T2 separately. To test for statistically significant changes in group-level activation as an effect of time, we also conducted whole-brain dependent-sample t-tests between T1 and T2 that were statistically thresholded at p < .01, corrected for multiple comparisons. Cluster-based permutation testing with 3dClustSim determined a cluster-size threshold of 225 voxels based on a corrected threshold of α < .05. Lastly, we examined group-level consistency between T1 and T2 by calculating Pearson’s r correlation values for mean PSC across the 36 anatomical ROIs and the nine functional ROIs. Hereafter, consistent task activation at the group level will be referred to as “group-level consistency,” while ICCs found for individual activation values will be referred to as “individual-level reliability.”
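The distinction between group-level consistency and individual-level reliability can be made concrete with simulated data. The sketch below is purely illustrative: all values are synthetic, the noise level and ROI effects are assumptions, and the per-ROI across-subject correlation is used only as a simple stand-in for individual-level stability (it is not an ICC).

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_roi = 42, 36

# Hypothetical "true" group-level PSC for each ROI
roi_effect = np.linspace(-0.2, 0.5, n_roi)

# Each session adds independent subject-level noise, so there are
# no stable individual differences built into the simulation.
t1 = roi_effect + rng.normal(0.0, 0.25, size=(n_subj, n_roi))
t2 = roi_effect + rng.normal(0.0, 0.25, size=(n_subj, n_roi))

# Group-level consistency: correlate the ROI means across sessions
group_r = np.corrcoef(t1.mean(axis=0), t2.mean(axis=0))[0, 1]

# Individual-level stability: within each ROI, correlate subjects' values
# across sessions, then average over ROIs
indiv_r = np.array(
    [np.corrcoef(t1[:, j], t2[:, j])[0, 1] for j in range(n_roi)]
)
mean_indiv_r = indiv_r.mean()
```

In this simulation the ROI means correlate strongly across sessions while the within-ROI, across-subject correlations hover near zero, showing that a high group-level correlation by itself says little about individual-level reliability.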
Whole-Brain Voxel-Wise ICCs for Functional Connectivity
Task-based functional connectivity was estimated using generalized psychophysiological interaction (PPI) methods (McLaren et al., 2012). Four anatomically-defined and four functionally-defined seed regions were used. Anatomical seeds were two composite amygdala ROIs (i.e., Brainnetome lateral and medial combined but separate for left and right) and two composite fusiform gyrus ROIs (i.e., Brainnetome ventrolateral and medioventral combined but separate for left and right). Functional seeds were the same Neurosynth 5 mm spherical ROIs from previous analyses for left and right amygdala and left and right FFA. All functional connectivity analyses were conducted separately for left and right homologues due to potential laterality effects. A PPI regressor was created for each ROI as the product of the detrended seed regressor and the psychophysiological event (i.e., FACES blocks). The PPI regressor and the seed’s mean time series were then added to the individual-level GLMs, which were otherwise identical to the GLMs used to estimate neural activation (including motion and drift parameters, and identical general linear tests for ICC contrasts of interest). Whole-brain voxel-wise ICCs were calculated using AFNI’s 3dLME package separately for each seed region. ICC calculations categorized both scan session and subject as random factors to calculate ICC(2, 1). These ICC maps were thresholded at a more stringent threshold of 0.6 to display only voxels that had good or excellent reliability. In addition, the unthresholded whole-brain ICC maps for functional connectivity analyses are provided as supplemental material. The unthresholded ICC maps for anatomically-defined amygdala and fusiform seeds are shown in Supplemental Figures 2 and 3, respectively. The unthresholded ICC maps for functionally-defined amygdala and fusiform seeds are shown in Supplemental Figures 4 and 5, respectively.
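The core of the PPI regressor described above, the product of the detrended seed time series and the task indicator, can be sketched as follows. This is a simplified illustration under stated assumptions: the seed signal and block timing are synthetic, linear detrending is assumed, and `ppi_regressor` is a hypothetical helper. Full gPPI pipelines often include additional steps (for example, handling of the hemodynamic response) that the text does not detail and that are omitted here.

```python
import numpy as np

def ppi_regressor(seed_ts, task_blocks, poly_order=1):
    """Return (ppi, detrended_seed) for one seed and one task condition."""
    seed_ts = np.asarray(seed_ts, dtype=float)
    task_blocks = np.asarray(task_blocks, dtype=float)
    t = np.arange(seed_ts.size)
    # Remove a polynomial trend (linear by default) from the seed signal
    coefs = np.polyfit(t, seed_ts, poly_order)
    detrended = seed_ts - np.polyval(coefs, t)
    # Interaction term: seed activity during task blocks, zero elsewhere
    return detrended * task_blocks, detrended

# Synthetic example: 120 volumes, alternating 15-volume off/on blocks
n_vols = 120
t = np.arange(n_vols)
seed = 0.01 * t + np.sin(2 * np.pi * t / 20)  # drift plus oscillation
faces_blocks = ((t // 15) % 2 == 1).astype(float)  # 1 during "FACES" blocks

ppi, seed_detrended = ppi_regressor(seed, faces_blocks)
```

In a GLM, `ppi` and `seed_detrended` would enter as additional columns alongside the task, motion, and drift regressors, and the coefficient on `ppi` would serve as the connectivity estimate entered into the ICC analysis.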
The data and the code that support the findings of this study have been uploaded and made publicly available in a data repository at https://osf.io/f4qvk/ (Open Science Framework; McDermott, 2020).
Results
Behavioral Data
The mean test-retest period was 25.52 days (SD = 12.70; Range: 11–60). The 39 participants with usable behavioral data showed similar performance between T1 and T2 as measured by both accuracy and reaction time, with no significant differences across time (p’s ≥ .14). Additionally, there were no significant differences between T1 and T2 concerning time of day for scans or motion during scans (p’s ≥ .08). All of these data, including ICCs, are provided in Table 1.
Imaging Data
Mean PSC values and ICCs for both the FACES-SHAPES and FACES contrasts for each of the 36 anatomical ROIs are included in Table 2. ICCs calculated for the FACES-SHAPES contrast for the anatomical ROIs showed only one ROI with fair reliability. This was the right medioventral fusiform gyrus ROI. All remaining anatomical ROIs showed poor reliability for FACES-SHAPES, and thus, no anatomical ROIs showed good or excellent reliability for FACES-SHAPES. For the FACES contrast, only one anatomical ROI showed good reliability. This was the right medioventral fusiform gyrus. Additionally, a total of three anatomical ROIs showed fair reliability. These were the left medioventral fusiform gyrus, the right DLPFC dorsal area 44, and the left caudodorsal ACC ROIs. All remaining anatomical ROIs showed poor reliability for FACES, and thus, no anatomical ROIs showed excellent reliability for FACES. Additionally, the ICC results for each subsample are reported separately for each of the 36 anatomical ROIs in Supplemental Table 2. Note that the pattern of results did not substantially differ between samples.
Mean PSC values and ICCs for both the FACES-SHAPES and FACES contrasts for each of the nine functional ROIs are included in Table 3. ICCs calculated for the FACES-SHAPES contrast showed two ROIs with good reliability, which were the left and right FFA ROIs. In addition, the right DLPFC ROI showed fair reliability. For the FACES contrast, the left FFA ROI showed excellent reliability and the right FFA ROI showed good reliability. In addition, the left amygdala and the right DLPFC ROIs showed fair reliability. For both contrasts, all remaining functional ROIs showed poor reliability. Additionally, the ICC results for each subsample are reported separately for each of the nine functional ROIs in Supplemental Table 3. Note that the pattern of results did not substantially differ between samples.
Whole-brain voxel-wise ICCs for both the FACES-SHAPES and FACES contrasts showed several clusters with good or excellent reliability. The thresholded ICC maps (ICCs > 0.6) and cluster tables (five largest clusters listed) are provided in Figure 1. Based on the total number of voxels that were considered good or excellent, the FACES contrast (16,888 voxels) showed relatively greater overall reliability compared to the FACES-SHAPES contrast (4,682 voxels). Group-level task effects based on one-sample t-tests from T1 and T2 are pictured in Figure 2 using a threshold of p < .01 (cluster corrected at α < .05). The results from the whole-brain dependent-sample t-tests between T1 and T2 showed no significant effects of time using a threshold of p < .01 (cluster corrected at α < .05). Note that no effects of time were observed even when using a more lenient threshold of p < .05 (α < .05, cluster corrected at 716 voxels). Pearson’s r correlations examining the consistency of mean PSC values from T1 and T2 showed almost perfect correlations for the FACES-SHAPES contrast for the 36 anatomical ROIs (r = .971; p < .001) and the nine functional ROIs (r = .966; p < .001), suggesting strong group-level test-retest reliability (for scatter plots, see Figure 2).
Whole-brain voxel-wise ICCs for functional connectivity using PPI values identified very few voxels with good to excellent reliability for amygdala and fusiform seeds when using both anatomically- and functionally-defined ROIs (ICCs > 0.6). For functionally-defined amygdala ROIs, the total number of voxels with ICCs > 0.6 was 2,336 and 755 for the left and right, respectively. The thresholded ICC maps (ICCs > 0.6) and cluster tables (five largest clusters listed) for functionally-defined amygdala ROIs are provided in Figure 3. While many of these clusters were quite small, two notably larger clusters (i.e., > 200 voxels) were identified in the right cerebellum and right DLPFC for the left amygdala seed ROI. For anatomically-defined amygdala ROIs, the total number of voxels with ICCs > 0.6 was 1,905 and 1,670 for the left and right, respectively. The thresholded ICC maps (ICCs > 0.6) and cluster tables (five largest clusters listed) for anatomically-defined amygdala ROIs are provided in Figure 4. For functionally-defined fusiform ROIs, the total number of voxels with ICCs > 0.6 was 1,449 and 814 for the left and right, respectively. Lastly, for the anatomically-defined fusiform ROIs, the total number of voxels with ICCs > 0.6 was 758 and 1,073 for the left and right, respectively. Unthresholded whole-brain, voxel-wise ICC maps for anatomically-defined and functionally-defined fusiform seeds are provided in Supplemental Figures 3 and 5, respectively.
Discussion
In the current study, we investigated the test-retest reliability of neural activation and functional connectivity during an emotional faces task as measured by fMRI. Results add to previous literature by using a comprehensive and standardized approach, including two standardized methods for identifying a priori ROIs and reporting whole-brain, voxel-wise ICCs for both neural activation and functional connectivity. The current findings suggest that traditional measures of neural activation in response to emotional faces, while consistent when averaged across a group, are not always reliable for individual activation values. Regions that were shown to have good or excellent individual-level reliability were not the salience regions often focused on in psychiatric research (e.g., amygdala, insula). Rather, reliable neural activations in response to emotional faces were found primarily in visual cortex (e.g., fusiform gyrus) or in regions involved in visual attention more broadly (e.g., DLPFC). In line with our first hypothesis, the majority of ICCs calculated for a priori ROIs showed poor reliability (i.e., ICCs less than 0.4). Our second hypothesis was also supported as the reliability estimates were relatively higher for functional ROIs than anatomical ROIs. Most evidently, the left and right FFA ROIs demonstrated good or excellent reliability for both contrasts, while only one of the right anatomical fusiform ROIs had good reliability for the FACES contrast. Our third hypothesis was also supported in that our whole-brain voxel-wise ICC analyses of functional connectivity showed that connectivity values between amygdala ROI seeds and a sizable number of voxels within the DLPFC and dorsal medial PFC had good to excellent reliability (i.e., most prominently between left functional amygdala ROI and right DLPFC).
As a first point of discussion, our finding that group-level activations were consistent between scan sessions might elucidate why considerations regarding reliability have been overlooked for so long in the field of neuroimaging. Previous work looking at differences between patient populations and healthy control populations (Etkin & Wager, 2007; Fusar-Poli et al., 2009; Shin & Liberzon, 2010; Hughes & Shin, 2011) may not have noticed significant changes in overall task activations between samples or across multiple time points within the same sample. Such consistency at the group level could lead to the erroneous assumption of sufficient individual-level reliability of activation values. Our findings of robust and consistent group-level activations suggest that the emotional faces task may still have utility for such group-based analyses. This would be relevant for studies examining group differences (e.g., comparing patients to control groups) or when assessing changes in group-level activations over time (e.g., changes within a group completing active versus a comparator treatment). However, the primary findings of the current study suggest that an individual’s level of activation within emotional processing regions during emotional face processing is often not reliable across time, which constrains the utility of this paradigm for psychiatric research interested in salience regions (e.g., amygdala).
The distinction between individual-level reliability and consistent group-level neural activation across time is also relevant to recent work examining reliability within the same session during neuroimaging tasks (i.e., internal consistency; Hajcak et al., 2017; Infantolino et al., 2018). These studies discuss how neuroimaging researchers, regardless of modality, often report robust significant within-subjects contrast effects (e.g., difference between FACES and SHAPES) while simultaneously failing to report internal consistency estimates. When reported, these internal consistency estimates are often poor, which means that there is high variability in neural activation as it relates to individual differences (Hajcak et al., 2017; Infantolino et al., 2018). Low internal consistency of the within-subjects effects for any given neuroimaging task then limits its utility for examining individual differences in between-subjects effects (Hajcak et al., 2017). Infantolino and colleagues (2018) conducted an internal consistency analysis of fMRI data using the same emotional faces task as the present study. They demonstrated that internal consistency for amygdala activation during the FACES-SHAPES contrast was much worse than for the FACES contrast (Infantolino et al., 2018). This was likely because the amygdala response to faces and shapes was strongly correlated within individuals, and there was subsequently little variability to account for individual differences (Infantolino et al., 2018). While the present study is focused on test-retest reliability rather than internal consistency, our findings also showed that the FACES contrast (compared to implicit baseline) generally had superior reliability compared to the FACES-SHAPES contrast. However, note that regardless of the specific contrast, poor test-retest reliability estimates for within-subjects effects will greatly limit their utility to examine individual differences for between-subjects effects across time.
The ROIs exhibiting good to excellent reliability for neural activation in the current study were both functionally-defined FFA ROIs and the right medioventral fusiform gyrus anatomical ROI. Additionally, whole-brain, voxel-wise ICC analyses of neural activation also identified voxels within the visual cortex that ranged from good to excellent. While psychiatric research has focused primarily on amygdala activation during emotional face processing (Etkin & Wager, 2007; Williams et al., 2015), there have been reports concerning the clinical relevance of visual cortex activation. Specifically, increased visual cortex activation in response to angry faces compared to neutral faces was found to predict greater treatment response to cognitive behavioral therapy in adults with social anxiety disorder (Doehrmann et al., 2013). Doehrmann and colleagues (2013) state that although previous neuroimaging studies of social anxiety disorder have found altered visual cortex activation when responding to negative emotional faces (Evans et al., 2008; Bruhl et al., 2011), most clinical neuroimaging studies of this population still tend to focus primarily on limbic regions such as the amygdala. Visual cortex activation in response to negative emotional faces may have an important role in maintaining visual attention processes relevant to social anxiety disorder (Doehrmann et al., 2013), and perhaps visual cortex activation could play a role in other psychiatric disorders that show alterations in visual attention processing (e.g., PTSD; Aupperle et al., 2012). Thus, future research could aim to use emotional faces tasks to examine individual differences in visual cortex activation that is shown to exhibit sufficient test-retest reliability in the present study. 
While “sufficient” test-retest reliability in the present study was considered to be ROIs or voxels that had ICCs of 0.6 or greater, it should be noted that some ROIs related to salience processing did show at least fair reliability (ICCs > 0.40). These were right anatomical DLPFC, left anatomical ACC, left functional amygdala, and right functional DLPFC. Also, whole-brain analyses of neural activation identified clusters in bilateral DLPFC with sufficient reliability (i.e., ICCs > 0.6).
For functional connectivity, whole-brain, voxel-wise analyses using the amygdala seed ROIs revealed a number of voxels in the DLPFC and dorsal medial PFC that had good to excellent reliability, and these findings partly replicate previous work (Haller et al., 2018; Nord et al., 2019). It should also be noted that these previous studies used lower cutoffs for what was considered to be sufficient reliability (i.e., 0.5 and 0.4, respectively). Note that fusiform gyrus connectivity maps revealed few voxels with good to excellent reliability, perhaps indicating this is not likely to be a promising future direction. Thus, future analyses of visual cortex should primarily focus on neural activation values.
While we suggest that ICCs > 0.60 may be optimal (as this has traditionally been suggested as representing good to excellent reliability; Fleiss, 1986), it is possible that lower cutoffs (i.e., > 0.40, as used in some previous studies) may be sufficient – particularly when using a tool that offers information that cannot be obtained through other, more reliable means (as is often the case with fMRI). It may be useful for the neuroimaging field to develop a consensus concerning what level of reliability should be considered sufficient for supporting the psychometrics of an fMRI measurement. The unthresholded ICC maps provided in supplement could be informative for researchers interested in regions that were reliable at more liberal thresholds.
While the findings from the current study and previous studies generally report poor test-retest reliability for neural activation when using ROI-based approaches during fMRI emotional faces tasks, there are several potential avenues for identifying more reliable measurements. For example, whole-brain, voxel-wise ICCs, as reported in the current study, can be useful for pinpointing subregions with sufficient reliability for a given task. Neuroimaging researchers should also consider fundamental properties of measurement when designing novel fMRI tasks (e.g., assessing the reliability of a task prior to implementing it more broadly). Taking a more rigorous approach to task development might yield more reliable and valid neural measurements (McDermott et al., 2018). Making modifications to existing tasks could also potentially increase their reliability (e.g., by increasing the number of trials; Aday & Carlson, 2019). Additionally, some researchers have suggested an increased emphasis on ecological validity as a potential solution, but this is an emerging area of inquiry with only initial supporting evidence (Sonkusare et al., 2019). Alternative approaches to data analysis might also yield more reliable results. Potential alternatives include using: (a) principles of graph theory, which considers the way that activity from multiple regions correlates when simultaneously active (Cao et al., 2014); or (b) multivariate pattern analysis, which examines how patterns of voxels directly relate to the psychological construct of interest (e.g., pain intensity; Woo et al., 2017).
However, these alternative methods could also add undue complication, have less relevance for delineating the brain regions and circuitry recruited to process the information presented (i.e., looking at how regions are correlated with one another or identifying voxel patterns that may or may not map onto specific circuitry), and could make findings less interpretable or directly relevant to clinical populations. To this end, the present study utilized data analytic approaches that are most commonly used in clinical research.
Finally, it is important to recognize several limitations of the current study. An adequate sample size is considered an important factor when estimating ICCs (Bujang & Baharum, 2017), and note that previous test-retest studies of emotional faces tasks with good to excellent reliability had smaller sample sizes (i.e., all N’s ≤ 15) compared to the studies that showed poor to fair reliability (i.e., all N’s ≥ 15). The sample size of the current study (N=42) is larger than what has been suggested as necessary for estimating ICCs (i.e., 20 subjects to detect ICCs of .60 with 90 percent power for two observations; Bujang & Baharum, 2017), and it represents one of the larger fMRI healthy adult test-retest reliability studies using an emotional faces task to date. However, the observed 95% confidence intervals for ICC estimates in the present study were still quite large, and studies using larger samples would be able to make more precise estimates of reliability. For example, the ICC estimate for the functional ROI centered on the left amygdala for the FACES contrast was 0.42 (i.e., indicating fair reliability), but its 95% confidence interval ranged from −0.13 to 0.64 (i.e., indicating the possibility of either poor or good reliability). The present study also focuses exclusively on healthy adults, and the use of more diverse samples could potentially be useful to further support generalizability of these findings. Specifically, it could be useful for future research to examine whether findings are similar when examining reliability in the context of psychiatric populations. In the current study, there were no differences between sessions on potential extraneous factors, including time of day when scanning started and head motion. However, this does not completely preclude other potential factors influencing reliability, such as hormonal changes, increases in daily stress, significant life events, or other yet unidentified variables.
In conclusion, group-averaged, but not individual, neural activation values during emotional face processing appear consistent across time. At the individual level, test-retest reliability of this task is relatively poor across salience regions, but good reliability is found within areas of the visual cortex (e.g., fusiform gyrus) and some clusters within bilateral DLPFC. Findings also show that the FACES contrast (compared to implicit baseline) had superior reliability compared to the FACES-SHAPES contrast, and functionally-defined ROIs showed superior reliability compared to anatomically-defined ROIs. Additionally, connectivity between a left functionally-defined amygdala seed ROI and right DLPFC voxels was found to be reliable. This could be a promising future direction for those interested in examining individual differences in connectivity between salience regions. Overall, these findings constrain the utility of emotional faces tasks for examining individual differences in the recruitment of salience processing regions but support their potential utility in examining broad group-level effects or individual differences in visual cortex or DLPFC.
Supplementary Material
Acknowledgements and Financial Disclosures
Funding for this study was provided by the William K. Warren Foundation. Timothy McDermott, M.A., received support from the National Institute of Mental Health under Award Number F31MH122090. Robin Aupperle, Ph.D., received support from the National Institute of Mental Health under Award Number K23MH108707 and National Institute of General Medical Sciences under Award Number P20GM121312. Namik Kirlic, Ph.D., received support from the National Institute of General Medical Sciences under Award Number P20GM121312. Ashley Clausen, Ph.D., was supported by the Department of Veterans Affairs Office of Academic Affiliations Advanced Fellowship Program in Mental Illness Research and Treatment, the Medical Research Service of the Durham VA Health Care System, and the Department of Veterans Affairs Mid-Atlantic MIRECC. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States Government. The authors have no conflicts of interest to disclose.
References
- Aday JS, Carlson JM. Extended testing with the dot-probe task increases test-retest reliability and validity. Cogn Process. 2019; 20(1): 65–72.
- Akeman E, Kirlic N, Clausen AN, Cosgrove KT, McDermott TJ, Cromer LD, Paulus MP, Yeh HW, Aupperle RL. A pragmatic clinical trial examining the impact of a resilience program on college student mental health. Depress Anxiety. 2020; 37(3): 202–213.
- Aupperle RL, Melrose AJ, Stein MB, Paulus MP. Executive function and PTSD: disengaging from trauma. Neuropharmacology. 2012; 62(2): 686–694.
- Aupperle RL, Allard CB, Simmons AN, Flagan T, Thorp SR, Norman SB, Paulus MP, Stein MB. Neural responses during emotional processing before and after cognitive trauma therapy for battered women. Psychiatry Res. 2013; 214(1): 48–55.
- Baumgartner TA. Estimating the stability reliability of a score. Meas Phys Educ Exerc Sci. 2000; 4: 175–178.
- Bedard M, Martin NJ, Krueger P, Brazil K. Assessing reproducibility of data obtained with instruments based on continuous measurements. Exp Aging Res. 2000; 26: 353–365.
- Bennett CM, Miller MB. How reliable are the results from functional magnetic resonance imaging? Ann N Y Acad Sci. 2010; 1191: 133–155.
- Bryant RA, Felmingham K, Kemp A, Das P, Hughes G, Peduto A, Williams LM. Amygdala and ventral anterior cingulate activation predicts treatment response to cognitive behavior therapy for post-traumatic stress disorder. Psychol Med. 2008; 38(4): 555–561.
- Bujang MA, Baharum N. A simplified guide to determination of sample size requirements for estimating the value of intraclass correlation coefficient: a review. Archives of Orofacial Sciences. 2017; 12(1): 1–11.
- Cao H, Plichta MM, Schafer A, Haddad L, Grimm O, Schneider M, Esslinger C, Kirsch P, Meyer-Lindenberg A, Tost H. Test-retest reliability of fMRI-based graph theoretical properties during working memory, emotion processing, and resting state. Neuroimage. 2014; 84: 888–900.
- Cisler JM, Sigel BA, Kramer TL, Smitherman S, Vanderzee K, Pemberton J, Kilts CD. Amygdala response predicts trajectory of symptom reduction during trauma-focused cognitive-behavioral therapy among adolescent girls with PTSD. J Psychiatr Res. 2015; 71: 33–40.
- Costafreda SG, Brammer MJ, David AS, Fu CHY. Predictors of amygdala activation during the processing of emotional stimuli: A meta-analysis of 385 PET and fMRI studies. Brain Res Rev. 2008; 58(1): 57–70.
- Cox RW. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res. 1996; 29(3): 162–173.
- Doehrmann O, Ghosh SS, Polli FE, Reynolds GO, Horn F, Keshavan A, Triantafyllou C, Saygin ZM, Whitfield-Gabrieli S, Hofmann SG, Pollack M, Gabrieli JD. Predicting treatment response in social anxiety disorder from functional magnetic resonance imaging. JAMA Psychiatry. 2013; 70(1): 87–97.
- Dong D, Wang Y, Xiaoyan J, Li Y, Chang X, Vandekerckhove M, Luo C, Yao D. Abnormal brain activation during threatening face processing in schizophrenia: A meta-analysis of functional neuroimaging studies. Schizophr Res. 2017; 197: 200–208.
- Elliott ML, Knodt AR, Cooke M, Kim MJ, Melzer TR, Keenan R, Ireland D, Ramrakha S, Poulton R, Caspi A, Moffitt TE, Hariri AR. General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. Neuroimage. 2019; 189: 516–532.
- Etkin A, Wager TD. Functional neuroimaging of anxiety: A meta-analysis of emotional processing in PTSD, social anxiety disorder, and specific phobia. Am J Psychiatry. 2006; 164(10): 1476–1488.
- Fan L, Li G, Zhuo J, Zhang Y, Wang J, Chen L, Yang Z, Chu C, Xie S, Laird AR, Fox PT, Eickhoff SB, Yu C, Jiang T. The human brainnetome atlas: a new brain atlas based on connectional architecture. Cereb Cortex. 2016; 26(8): 3508–3526.
- Fleiss J. The Design and Analysis of Clinical Experiments. 1986. Wiley, New York.
- Forster GL, Novick AM, Scholl JL, Watt MJ. The role of the amygdala in anxiety disorders. In: Ferry B (Ed.), The Amygdala: A Discrete Multitasking Manager. 2012. IntechOpen.
- Fusar-Poli P, Placentino A, Carletti F, Landi P, Allen P, Surguladze S, Benedetti F, Abbamonte M, Gasparotti R, Barale F, Perez J, McGuire P, Politi P. Functional atlas of emotional faces processing: a voxel-based meta-analysis of 105 functional magnetic resonance imaging studies. J Psychiatry Neurosci. 2009; 34(6): 418–432.
- Gamer M, Lemon J, Fellows I, Singh P. (2019). irr: Various coefficients of interrater reliability and agreement. Retrieved from https://cran.r-project.org/package=irr
- Geissberger N, Tik M, Sladky R, Woletz M, Schuler AL, Willinger D, Windischberger C. Reproducibility of amygdala activation in facial emotion processing at 7T. Neuroimage. 2020; 211: 116585.
- Godlewska BR, Browning M, Norbury R, Cowen PJ, Harmer CJ. Early changes in emotional processing as a marker of clinical response to SSRI treatment in depression. Transl Psychiatry. 2016; 6(11): e957.
- Groth-Marnat G. Handbook of psychological assessment. 2009. Wiley, New York.
- Hajcak G, Meyer A, Kotov R. Psychometrics and the neuroscience of individual differences: Internal consistency limits between-subjects effects. J Abnorm Psychol. 2017; 126(6): 823–834.
- Haller SP, Kircanski K, Stoddard J, White LK, Chen G, Sharif-Askary B, Zhang S, Towbin KE, Pine DS, Leibenluft E, Brotman MA. Reliability of neural activation and connectivity during implicit face emotion processing in youth. Dev Cogn Neurosci. 2018; 31: 67–73.
- Hariri AR, Tessitore A, Mattay VS, Fera F, Weinberger DR. The amygdala response to emotional stimuli: a comparison of faces and scenes. Neuroimage. 2002; 17(1): 317–323.
- Hughes KC, Shin LM. Functional neuroimaging studies of post-traumatic stress disorder. Expert Rev Neurother. 2011; 11(2): 275–285.
- Infantolino ZP, Luking KR, Sauder CL, Curtin JJ, Hajcak G. Robust is not necessarily reliable: From within-subjects fMRI contrasts to between-subjects comparisons. Neuroimage. 2018; 173: 146–152.
- Johnstone T, Somerville LH, Alexander AL, Oakes TR, Davidson RJ, Kalin NH, Whalen PJ. Stability of amygdala BOLD response to fearful faces over multiple scan sessions. Neuroimage. 2005; 25: 1112–1123.
- Kanwisher N, McDermott J, Chun MM. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J Neurosci. 1997; 17(11): 4302–4311.
- Klumpp H, Fitzgerald JM. Neuroimaging predictors and mechanisms of treatment response in social anxiety disorder: An overview of the amygdala. Curr Psychiatry Rep. 2018; 20(10): 89.
- Lipp I, Murphy K, Wise RG, Caseras X. Understanding the contribution of neural and physiological signal variation to the low repeatability of emotion-induced BOLD responses. Neuroimage. 2014; 86: 335–342.
- Lois G, Kirsch P, Sandner M, Plichta MM, Wessa M. Experimental and methodological factors affecting test-retest reliability of amygdala BOLD responses. Psychophysiology. 2018; 55(12): e13220.
- Ludbrook J. Statistical techniques for comparing measures and methods of measurement: A critical review. Clin Exp Pharmacol Physiol. 2002; 29(7): 527–536.
- Manuck SB, Brown SM, Forbes EE, Hariri AR. Temporal stability of individual differences in amygdala reactivity. Am J Psychiatry. 2007; 164(10): 1613–1614.
- Marek S, Dosenbach NUF. The frontoparietal network: function, electrophysiology, and importance of individual precision mapping. Dialogues Clin Neurosci. 2018; 20(2): 133–140.
- Mattavelli G, Sormaz M, Flack T, Asghar AUR, Fan S, Frey J, Manssuer L, Usten D, Young AW, Andrews TJ. Soc Cogn Affect Neurosci. 2014; 9(11): 1684–1689.
- McDermott TJ, Kirlic N, Aupperle RL. Roadmap for optimizing the clinical utility of emotional stress paradigms in human neuroimaging research. Neurobiol Stress. 2018; 8: 134–146.
- McDermott TJ. Visual cortical regions show sufficient test-retest reliability while salience regions are unreliable during emotional face processing. Open Science Framework. 2020. DOI: 10.17605/OSF.IO/F4QVK. Retrieved from https://osf.io/f4qvk/
- McLaren DG, Ries ML, Xu G, Johnson SC. A generalized form of context-dependent psychophysiological interactions (gPPI): A comparison to standard approaches. Neuroimage. 2012; 61(4): 1277–1286.
- Nord CL, Gray A, Charpentier CJ, Robinson OJ, Roiser JP. Unreliability of putative fMRI biomarkers during emotional face processing. Neuroimage. 2017; 156: 119–127.
- Nord CL, Gray A, Robinson OJ, Roiser JP. Reliability of fronto-amygdala coupling during emotional face processing. Brain Sci. 2019; 9(4): 89.
- R Core Team. (2013). R: A language and environment for statistical computing. Retrieved from http://www.R-project.org/
- Phan KL, Wager TD, Taylor SF, Liberzon I. Functional neuroanatomy of emotion: A meta-analysis of emotion activation studies in PET and fMRI. Neuroimage. 2002; 16(2): 331–348.
- Plichta MM, Schwarz AJ, Grimm O, Morgen K, Mier D, Haddad L, Gerdes ABM, Sauer C, Tost H, Esslinger C, Colman P, Wilson F, Kirsch P, Meyer-Lindenberg A. Test-retest reliability of evoked BOLD signals from a cognitive-emotive fMRI test battery. Neuroimage. 2012; 60: 1746–1758.
- Poldrack RA, Baker CI, Durnez J, Gorgolewski KJ, Matthews PM, Munafo MR, Nichols TE, Poline JB, Vul E, Yarkoni T. Scanning the horizon: towards transparent and reproducible neuroimaging research. Nat Rev Neurosci. 2016; 18(2): 115–126.
- Posamentier MT, Abdi H. Processing faces and facial expressions. Neuropsychol Rev. 2003; 13(3): 113–143.
- Sauder CL, Hajcak G, Angstadt M, Phan KL. Test-retest reliability of amygdala response to emotional faces. Psychophysiology. 2013; 50(11): 1146–1156.
- Schacher M, Haemmerle B, Woermann FG, Okujava M, Huber D, Grunwald T, Kramer G, Jokeit H. Amygdala fMRI lateralizes temporal lobe epilepsy. Neurology. 2006; 66(1): 81–87.
- Seeley WW. The salience network: a neural system for perceiving and responding to homeostatic demands. J Neurosci. 2019; 39(50): 9878–9882.
- Sergerie K, Chochol C, Armony JL. The role of the amygdala in emotional processing: A quantitative meta-analysis of functional neuroimaging studies. Neurosci Biobehav Rev. 2008; 32: 811–830.
- Shavelson RJ, Webb NM. Generalizability Theory: A Primer. 1991. Thousand Oaks, CA: SAGE.
- Sheehan DV, Lecrubier Y, Sheehan KH, Amorim P, Janavs J, Weiller E, Hergueta T, Baker R, Dunbar GC. The Mini-International Neuropsychiatric Interview (M.I.N.I.): development and validation of a structured diagnostic psychiatric interview. J Clin Psychiatry. 1998; 59 Suppl 20: 22–33.
- Shin LM, Liberzon I. The neurocircuitry of fear, stress, and anxiety disorders. Neuropsychopharmacology. 2010; 35(1): 169–191.
- Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979; 86(2): 420–428.
- Sonkusare S, Breakspear M, Guo C. Naturalistic stimuli in neuroscience: Critically acclaimed. Trends Cogn Sci. 2019; 23(8): 699–714.
- Taylor CT, Aupperle RL, Flagan T, Simmons AN, Amir N, Stein MB, Paulus MP. Neural correlates of a computerized attention modification program in anxious subjects. Soc Cogn Affect Neurosci. 2014; 9(9): 1379–1387.
- Todorov A. The role of the amygdala in face perception and evaluation. Motiv Emot. 2012; 36(1): 16–26.
- Tottenham N, Tanaka JW, Leon AC, McCarry T, Nurse M, Hare TA, Marcus DJ, Westerlund A, Casey BJ, Nelson C. The NimStim set of facial expressions: judgments from untrained research participants. Psychiatry Res. 2009; 168(3): 242–249.
- Touroutoglou A, Hollenbeck M, Dickerson BC, Feldman Barrett L. Dissociable large-scale networks anchored in the right anterior insula subserve affective experience and attention. Neuroimage. 2012; 60(4): 1947–1958.
- van den Bulk BG, Koolschijn PC, Meens PH, van Lang ND, van der Wee NJ, Rombouts SA, Vermeiren RR, Crone EA. How stable is activation in the amygdala and prefrontal cortex in adolescence? A study of emotional face processing across three measurements. Dev Cogn Neurosci. 2013; 4: 65–76.
- van Rooij SJ, Geuze E, Kennis M, Rademaker AR, Vink M. Neural correlates of inhibition and contextual cue processing related to treatment response in PTSD. Neuropsychopharmacology. 2015; 40(3): 667–675.
- Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. 2005; 19(1): 231–240.
- Williams LM, Korgaonkar MS, Song YC, Paton R, Eagles S, Goldstein-Piekarski A, Grieve SM, Harris AWF, Usherwood T, Etkin A. Amygdala reactivity to emotional faces in the prediction of general and medication-specific response to antidepressant treatment in the randomized iSPOT-D Trial. Neuropsychopharmacology. 2015; 40: 2398–2408.
- Woo CW, Krishnan A, Wager TD. Cluster-extent based thresholding in fMRI analyses: Pitfalls and recommendations. Neuroimage. 2014; 91: 412–419.
- Yarkoni T, Poldrack RA, Nichols TE, Van Essen DC, Wager TD. Large-scale automated synthesis of human functional neuroimaging data. Nature Methods. 2011; 8(8): 665–670.
- Yen M, Lo LH. Examining test-retest reliability: An intra-class correlation approach. Nursing Research. 2002; 51(1): 59–62.