Abstract
Functional MRI (fMRI) has considerable potential as a translational tool for understanding risk, prioritizing interventions, and improving the treatment of brain disorders. However, recent studies have found that many of the most widely used fMRI measures have low reliability, undermining this potential. Here, we argue that many fMRI measures are unreliable because they were designed to identify group-effects, not to precisely quantify individual differences. We then highlight four emerging strategies (extended aggregation, reliability modeling, multi-echo fMRI, and stimulus design) that build on established psychometric properties to generate more precise and reliable fMRI measures. By adopting such strategies to improve reliability, we are optimistic that fMRI can fulfill its potential as a clinical tool.
Keywords: fMRI, reliability, individual differences, translational neuroscience, clinical neuroscience, biomarker
Can we reliably measure individual differences in brain function?
Cognitive neuroscience has revolutionized our understanding of how the brain supports behavioral functions ranging from basic sensory to complex cognitive processes. Based on these fundamental insights into brain and behavior, an emerging program of translational neuroscience seeks to identify individual differences in these patterns and, in so doing, inform the development of clinical biomarkers that can be used to predict disease risk, prioritize interventions, and improve treatment. Central to these efforts is functional magnetic resonance imaging (fMRI), as it affords the noninvasive measurement of brain activity in behaving humans across the lifespan. In recent years, fMRI studies of individual differences in clinically meaningful domains have proliferated, alongside expectations for clinical applications [1]. This expansion of individual differences research using fMRI has triggered questions about its readiness to fulfill the measurement properties necessary for clinical translation, central amongst which is reliability.
Psychometrics has long established that reliability is the necessary first step towards validity. For example, to investigate how brain function makes a super-ager resilient in the face of neurodegeneration or to tailor brain stimulation to an individual’s unique functional topography, we must first be able to reliably measure idiosyncrasies in brain function. To establish reliability, repeated measurements of brain function must produce converging estimates in the absence of significant changes in the individual (e.g., disease progression, exposure to treatment). Recently, we reported that many of the most widely adopted task-fMRI measures of brain activity during clinically relevant behavior (e.g., episodic memory, executive control) have low test-retest reliability and are, therefore, unable to serve as clinical biomarkers in their current state [2]. Results from similar studies have also pointed to low reliability in other widely-used fMRI measures including functional connectivity measures generated from short scans [3-5]. Fortunately, methods for improving reliability have long been developed and employed in the allied field of psychology (e.g., personality or cognitive assessments). However, these methods have yet to be fully adopted in fMRI research.
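This logic can be stated compactly in classical test theory terms (a standard psychometric formulation, reproduced here only for illustration): an observed score is modeled as the sum of a stable true score and error, reliability is the share of observed variance attributable to true scores, and unreliability places a hard ceiling on any brain-behavior association that can be observed.

```latex
% Classical test theory: an observed score X is a true score T plus error E
X = T + E, \qquad r_{XX} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}

% Attenuation: the observable correlation between measures X and Y is bounded
% by the product of their reliabilities
r_{XY}^{\mathrm{observed}} = r_{XY}^{\mathrm{true}} \sqrt{r_{XX}\, r_{YY}}
```

For example, if an fMRI measure and a clinical outcome measure each have a reliability of 0.5, even a perfect true association could yield an observed correlation of at most 0.5.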
In this commentary, we begin by describing historical trends that contributed to the widespread use of unreliable fMRI measures in individual differences research. Then we highlight four emerging strategies (extended aggregation, reliability modeling, multi-echo fMRI, and stimulus design), each with roots in psychometrics, that could enable researchers to reliably measure individual differences in brain function with fMRI (Figure 1, Key Figure). We conclude that despite several false starts and dead ends, there is a bright future for a cumulative, translational neuroscience of individual differences using fMRI, but that for this future to be realized we must upend status-quo approaches in favor of psychometrically sound principles for measure development.
Figure 1. Emerging strategies for generating more reliable fMRI measures.
A Typical fMRI study generates measures of activation or functional connectivity by averaging over time (often 5-10 minutes of data). For example, black and white photos of facial expressions may be shown in blocks. Then a single regressor is fit to all face blocks to generate an average estimate of brain activation for each individual. Many of the most commonly used activation and functional connectivity measures from such short, “typical” fMRI studies are unreliable (represented by wide error variability around the true score). The reliability of fMRI measures can be dramatically improved by Extended Aggregation of hours, instead of minutes, of data in each individual. These measures are far more precise and reliable because random error score variability cancels out over more trials (represented by the precise density plot around the true score). Reliability Modeling can improve reliability by separating stable variability, which is consistent across multiple measurements (green distribution), from error variance (gray distribution), which is transient. Multi-Echo fMRI improves reliability by utilizing multiple echoes to separate non-BOLD error variability (gray distribution) from the BOLD signal of interest (green distribution). Finally, the reliability of fMRI measures can be improved through Stimulus Design (i.e., designing stimuli to evoke more reliable between-subjects variance). For example, the stimuli chosen here are colored, visually striking, emotionally rich images that are more complex and relevant to everyday life (i.e., naturalistic and ecologically valid) than the tightly controlled black and white photographs of faces shown for a “typical” fMRI study. Such stimuli could be static images or dynamic movies selected based on their ability to generate reliable individual differences.
A very brief history of individual differences research with fMRI
The 1990s
In March of 1992, the first studies to map human brain function with MRI were sequentially reported by Kwong et al. and Ogawa et al. [6,7]. In each study, patterns of activity in the visual cortex were measured using the blood-oxygen-level-dependent (BOLD) signal by implementing a within-subject design to contrast activity between alternating blocks of darkness and visual stimulation. These simple, yet powerful experiments demonstrated the potential of fMRI to noninvasively measure human brain activity, sparking a flurry of research further “mapping” functions of the human brain. As these early experiments were typically designed for cognitive neuroscience research and required costly, technologically demanding infrastructures, they often relied on two critical design features: experimental manipulation and group averaging. Experimental manipulation was achieved by presenting tightly controlled stimuli in structured patterns so that the BOLD signal for a condition of interest could be experimentally contrasted with the BOLD signal during a baseline condition. By carefully constraining stimulus features (e.g., visual angle, size or instructions) and timing (e.g., block and event-related presentations), researchers could find patterns of brain activation that reflected specific, experimentally contrasted differences in task conditions (e.g., working memory vs. passive viewing of visual objects). In tandem, group averaging was utilized to reduce the inherent noisiness of individual-level BOLD data to elicit robust group effects between conditions of interest, thereby allowing inferences about the functions of the “average human brain.” Tasks were explicitly designed and optimized to consistently evoke within-subjects effects using experimental manipulation and group averaging. Using these core tools, the first decade of fMRI started with a technological trigger that was followed by iterative development, widespread adoption, and increasing expectations for the translation of fMRI into the advancement of our understanding and treatment of brain disorders ranging from depression and schizophrenia to Alzheimer’s disease (Figure 2).
Figure 2. A timeline of fMRI research.
A highly select list of events that have helped shape the current state of fMRI research on individual differences and clinical translation [2,16,18,19,24,102]. Blue shading represents the windows of data collection for the Human Connectome and UK Biobank studies. In addition, the rise of individual differences and biomarker research with fMRI in the scientific literature is plotted, in pink, along with fMRI studies that mention reliability, in red. While this is a simplification of fMRI history, it is clear that the number of publications using fMRI to investigate between-subjects questions rose rapidly in the two decades following its origin, before plateauing, while fMRI studies that consider reliability have made up a smaller fraction of fMRI studies. We have plotted the normalized proportion (search results as a fraction of all PubMed citations in that year) of PubMed search results per year from 1990 to 2020 using esperr.github.io/pubmed-by-year to account for the fact that there has been a rapid rise in the overall number of scientific publications throughout the last century.
The 2000s
During its second decade, fMRI expanded in both breadth and depth. More powerful scanners provided measures of brain activity with ever-greater spatial and temporal resolution [8]. Simultaneously, fMRI became increasingly employed in research with unique populations (e.g., children, brain disorders). It was during this period of expansion that some investigators began adopting fMRI tasks, originally developed to elicit robust within-subject effects, to probe between-subject individual differences. Due to the high cost of fMRI and the infancy of the technology, these investigations of individual differences were often a secondary aim of studies, opportunistically explored after the primary experimental cognitive neuroscience questions [9]. The logic of this approach was straightforward and alluring: If an fMRI task experimentally elicits activation in a targeted brain region during a psychological process of interest, then variability in the magnitude of that activity between individuals may drive variability in related behaviors and clinical endpoints. In this way, investigators attempted to simultaneously map behaviorally relevant brain activation and associate variability in this activation with individual differences in behavior. For example, work by us and others demonstrated that, when averaged across participants, the amygdala exhibits increased activity when participants view threat-related facial expressions in comparison with neutral visual stimuli [10,11]. In response to the extensive animal literature demonstrating the critical importance of the amygdala in fear learning and stress-related dysfunction, we and others hypothesized that variability in the magnitude of this threat-related amygdala activity would map onto individual differences in related behaviors such as anxiety and depression. These links were indeed reported alongside many similar studies of individual differences across a wide variety of fMRI tasks, behavioral traits, and clinical symptoms [12,13].
The 2010s
In the 2010s, two emerging trends simultaneously attracted more attention and scrutiny to translational fMRI. First, spurred on by the maturation and promise of fMRI, large consortia studies (e.g., the Human Connectome Project [14] and the UK Biobank [15]) were founded with the explicit goal of using fMRI tasks from experimental cognitive neuroscience to measure individual differences in brain function. With large samples and broad phenotyping, these studies created an open-access canvas for fMRI researchers to test the generalizability, replicability, and reliability of individual differences findings from prior work in small samples. Second, amidst the emerging replication crisis in psychology [16,17], a wave of studies found critical limitations in mainstream fMRI practices, calling into question many individual differences findings. Amongst these, researchers noted that statistical circularity was common in fMRI analyses, leading to inflated effect sizes [18]. Statistical methods for mapping brain activity were also found to be too liberal in determining statistical significance [19] and widely implemented statistical approaches for generalizing from experiments to the “real world” were found to be inadequate [20,21]. Relatedly, others noted that the small sample sizes of most fMRI studies (N ≤ 100) left them underpowered to detect realistic, uninflated effect sizes [22,23]. Furthermore, investigators discovered that many fMRI findings were confounded by group differences in head-motion and physiological artifacts during scanning [24,25]. Finally, our group and others found that many of the most commonly used fMRI measures had low test-retest reliability [2,3,5,26], which fundamentally undermines a measure’s ability to index individual differences or serve as a clinical biomarker. Collectively, these findings highlighted previously underappreciated limitations of fMRI research that called into question the ability of many fMRI measures to validly measure individual differences using status-quo methodologies.
The Present
These observations have already sparked methodological innovations to directly address many of these limitations, including more accurate methods for statistical inference, larger samples, multivariate modeling to boost reliability, motion-censoring, and advanced data processing techniques [27-34]. However, reliance on short scans (i.e., 5-10 minutes) in small samples (N ≤ 100), as well as rigid stimulus control and group averaging continue to limit the ability of fMRI to reliably measure individual differences in brain function that represent interpretable and tractable mechanisms of risk, pathophysiology, and treatment response. Even with advanced approaches for artifact reduction and statistical inferences, commonly used fMRI methods often generate unreliable measures. This unreliability thus continues to represent a fundamental threat to our ability to realize a rigorous translational neuroscience of individual differences. Given that reliability is a minimum, necessary prerequisite for valid individual differences research, a growing contingent of fMRI researchers have sought to build a new framework for translational neuroscience by asking a fundamental question: Under what conditions can fMRI generate reliable, individual-specific measures of brain function with sufficient precision to inform clinical practice? We next highlight four complementary strategies that have emerged in response to this question, as well as the psychometric principles that underlie their utility.
Building a reliable neuroscience of individual differences
Reliability can be improved through extended aggregation
The BOLD signal is embedded in noise originating from thermal, physiological (e.g., motion, respiration), and miscellaneous non-physiological (e.g., scanner drift) sources [35]. In fact, the BOLD signal represents only a small fraction of the variance in fMRI data, on the order of 5-20% [35]. To tackle the challenge of isolating the subset of BOLD variance driven by reliable individual differences, a growing contingent of “precision fMRI” (pfMRI) research has adopted a tried-and-true principle from classical test theory: collect more data per person. Reliability tends to increase as assessment length increases because there are more opportunities for random, unstructured error score variability to cancel itself out. When this happens, true score variation makes up a larger portion of the measurement, resulting in higher reliability. For example, single-item measures are often dominated by noise and item-specific variance that cancels itself out as additional items are added and an aggregate score across many items is used. Typically, in psychometrics, “assessment length” refers to the number of items on a questionnaire or survey; however, the same principles apply to fMRI scan length. Across a wide variety of cohorts, scanners, and study designs, it has been shown that the reliability of fMRI measures tends to increase as scan length increases [3-5,36,37]. Illustrating this point, reliability gains are particularly pronounced when data from multiple scan sessions on different days are combined, suggesting that unwanted variance due to transient factors like time of day, head positioning, wakefulness, and scanner effects often obscures stable individual differences hidden within measures from short, single-session scans [38,39]. By collecting many hours of data from a small number of subjects (Figure 3), pfMRI studies have further demonstrated that reliable, individual-specific signatures are present in the spatial organization of brain networks as well as their temporal structure and timescale [36,37,40-43]. Furthermore, pfMRI has helped uncover new cortical networks that were previously obscured by group averaging [43-45] and begun to move towards clinical applications in the guidance of transcranial magnetic stimulation [46,47], detection of recovery from traumatic brain injury [48], and measurement of individual-specific cortical reorganization [49,50].
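The expected benefit of aggregation can be summarized with the Spearman-Brown prophecy formula from classical test theory (a standard result, shown here as an illustrative approximation rather than a claim about any particular dataset):

```latex
% Spearman-Brown prophecy formula: expected reliability of an aggregate of k
% parallel measurements, each with single-measurement reliability rho
\rho_k = \frac{k\,\rho}{1 + (k - 1)\,\rho}
```

For instance, a measure with single-scan reliability of 0.4 would be expected to reach roughly 0.8 when six comparable scans are aggregated (6 × 0.4 / [1 + 5 × 0.4] = 0.8), provided the additional scans behave like parallel measurements of the same stable quantity, an assumption that the session effects described above can violate.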
Figure 3. Precision fMRI can reveal reliable, individual-specific features of brain function.
By collecting hours of data from each individual, the Midnight Scan Club (MSC) and other “deep phenotyping” studies have demonstrated that fMRI can resolve highly detailed individual-specific patterns of brain function that are lost during group averaging. (A) With enough data, precise alignment can be identified between the boundaries of a retrosplenial functional connectivity network (white outline) and a map of task activation in the retrosplenial cortex of a single MSC participant (i.e., MSC01 to MSC01). Critically, this alignment is individual-specific and thus does not hold when the same functional connectivity network from a different individual (MSC06) is mapped onto the task activation of MSC01 (i.e., MSC06 to MSC01). Furthermore, the group-average functional connectivity map also maps poorly onto the task activation of MSC01 (i.e., MSCAVG to MSC01). (B) Similarly, the hand (in cyan) and face (in orange) functional connectivity networks can be mapped onto hand and face activation maps in the motor cortex that are highly specific to an individual and obscured by group averaging. Such pfMRI reveals that precise, individual-specific estimates of individual differences in brain function are possible but often missed when unreliable measures and group averaging are used. Images are adapted with permission from [41].
Emerging findings from pfMRI indicate that, with enough data, idiosyncratic functional organization may be the rule, not the exception [36]. This suggests that the traditional use of short fMRI scans and group averaging is holding back many translational neuroscience efforts because individual differences are obscured underneath a sea of unaccounted-for variability and noise. However, the current pfMRI methodology requires many hours of data from each individual to achieve high levels of precision [37,51]. In many anatomical targets for translational neuroscience (e.g., amygdala, accumbens, orbitofrontal cortex), signal dropout compounds unreliability, requiring even more data [52-55]. Therefore, the participant burden of pfMRI is high for most developmental and clinical samples for whom it is particularly challenging to lie still in a scanner for hours [24,56]. Thus, in its present form, pfMRI has not been widely pursued in population neuroscience efforts, which will be critical for realizing the broad translational value of fMRI [36,57,58]. However, to date, pfMRI has largely achieved greater reliability through the relatively crude approach of using aggregation to allow unstructured variance to cancel itself out over time [59]. To the extent that we understand the generative source of the true score variability that we want to measure (i.e., BOLD signal) and the error score variability that we want to remove (i.e., noise), alternative strategies may be able to more efficiently achieve precise and reliable measurement with shorter scans and lower participant burden.
Reliability can be improved by modeling stable variability
Translational neuroscience efforts are often focused on measuring stable biomarkers of disease risk, status, and prognosis [1]. However, many of the most widely used fMRI modeling approaches mix stable and transient variability by reducing a large number of fMRI measures to a single average estimate for each individual. Namely, fMRI studies often reduce regional brain function to a single estimate of activation or functional connectivity. In task-fMRI, for example, this is frequently done by fitting a single regressor or contrast of interest that represents the alternating structure of a task between control and experimental conditions (e.g., a boxcar model). Similarly, functional connectivity estimates are typically generated by correlating activity across the entire fMRI scan. These modeling approaches were originally designed for experimental cognitive neuroscience, where the between-subjects variance is a source of error to be minimized in order to maximize statistical power for estimating within-subjects experimental effects and group averages. However, with only a single estimate per individual (i.e., task contrast beta or edge functional connectivity), stable, individual-specific variance cannot be separated from transient sources of within-subjects variance (e.g., fluctuations in thoughts, emotional states, or attention) and noise [60,61].
Recent research suggests that the reliability of task-activation and functional connectivity measures can be substantially improved by explicitly isolating stable variance with tools designed for repeated measures (e.g., latent variable and Hierarchical Bayesian modeling) [21,38,62-65]. Critically, these modeling approaches can be applied both when multiple scans are available from each individual and when only a single scan is available. This is because fMRI scans intrinsically consist of many estimates of brain activity or connectivity. For example, multiple activation estimates can be generated by fitting regressors to the first and second halves of an fMRI scan separately (i.e., split-half analysis) or, at a finer-grained level, by fitting separate regressors to each trial within a scan [21,45,62]. Similarly, multiple functional connectivity estimates can be generated by splitting a single scan in half thereby generating two functional connectivity estimates or, in the extreme, by generating covariance estimates for every fMRI volume or data point [38,66]. Once multiple estimates are generated for each individual, tools from repeated-measures modeling can be used to separate “stable components” of fMRI variance from transient variance and noise [65]. Collectively, such modeling has been found to boost the reliability of activation and functional connectivity measures, especially from short fMRI scans, by as much as 60% [38,67]. Moreover, these stable components exhibit higher heritability and larger behavioral associations, further boosting translational value [38,62,63,68-70].
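The following minimal sketch illustrates this repeated-measures logic using simulated trial-level estimates and a simple one-way random-effects decomposition (the variable names and simulation settings are hypothetical, and the code is not a reproduction of any published pipeline):

```python
# Minimal sketch: treat the many trial-level estimates within a scan as repeated
# measures and separate stable between-subject variance from transient variance.
import numpy as np

rng = np.random.default_rng(0)
n_sub, n_trials = 50, 40
true_activation = rng.normal(0.5, 0.3, size=n_sub)         # stable "true scores"
trial_noise = rng.normal(0.0, 1.0, size=(n_sub, n_trials))  # transient variance + noise
trial_betas = true_activation[:, None] + trial_noise        # trial-level estimates

# Conventional approach: a single average estimate per person mixes both sources
person_means = trial_betas.mean(axis=1)

# Repeated-measures view: one-way random-effects (ANOVA) variance decomposition
ms_between = n_trials * person_means.var(ddof=1)            # between-subject mean square
ms_within = trial_betas.var(axis=1, ddof=1).mean()          # within-subject mean square
var_stable = (ms_between - ms_within) / n_trials            # stable between-subject variance
icc_single = var_stable / (var_stable + ms_within)              # reliability of one trial
icc_mean = var_stable / (var_stable + ms_within / n_trials)     # reliability of the trial average

print(f"Reliability of a single trial: {icc_single:.2f}")
print(f"Reliability of the {n_trials}-trial average: {icc_mean:.2f}")
```

In this toy example, individual trials are highly unreliable, yet the stable component recovered from the full set of trials supports a far more reliable person-level estimate; latent variable and hierarchical Bayesian models apply the same logic in a more flexible, model-based form.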
These methods illustrate a measurement principle that may appear counterintuitive – splitting up fMRI data into multiple noisier estimates can generate more reliable measures, at the latent variable level, than can be achieved through simple aggregation across the constituent parts. Furthermore, this insight is consistent with recent structural MRI findings that multiple, rapid, lower-resolution scans can generate more precise estimates of brain structure (e.g., cortical thickness) than a single, longer higher-resolution scan [71]. However, it is important to note that such reliability modeling is not a panacea and cannot replace careful measurement. In its simplest form, such modeling will, by design, absorb all forms of stable variance including stable artifacts like head motion, respiration, and vascular dynamics [29,72,73]. Therefore, to the extent to which these physiological artifacts are imperfectly removed during modeling, they will also be absorbed by the “stable component” and continue to corrupt the validity of brain-behavior associations.
Reliability can be improved by removing physiological artifacts
Non-BOLD sources of variability, like head-motion, are often stable features of individuals [24,72-75]. Therefore, such sources will not necessarily be removed through aggregation or latent variable modeling, because their variance is non-random and insidiously mimics individual differences of interest. Furthermore, mainstream data processing techniques often fail to fully remove these physiological artifacts [24,76-78]. Multi-echo fMRI (ME-fMRI) represents an emerging, biophysically principled approach to isolate and remove noise and non-BOLD sources of variance from fMRI data [53]. To do so, ME-fMRI collects multiple whole-brain images during each excitation pulse (i.e., multiple echoes), instead of the single image that is typically collected (i.e., single-echo). This allows for the removal of many physiological artifacts because the BOLD signal decays across echoes while non-BOLD artifacts and noise do not [53,79].
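The underlying logic can be sketched with the mono-exponential signal-decay model that motivates multi-echo denoising (a simplified formulation for illustration; in practice, approaches such as ME-ICA fit this dependence voxel- or component-wise):

```latex
% Signal at echo time TE under a mono-exponential decay model
S(TE) = S_0 \, e^{-TE \cdot R_2^*}

% Fractional signal change: BOLD-like fluctuations (changes in R2*) scale with TE,
% whereas non-BOLD fluctuations (changes in S0, e.g., motion or inflow) do not
\frac{\Delta S}{S} \approx \frac{\Delta S_0}{S_0} - \Delta R_2^{*} \cdot TE
```

Because only the R2*-driven term grows with echo time, fluctuations whose amplitude scales with TE can be retained as BOLD signal, whereas TE-independent fluctuations can be flagged and removed as artifact.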
As would be expected, improved isolation of the BOLD signal with ME-fMRI generates more precise measurements of task activation and functional connectivity, and improves statistical power [53,80,81]. Furthermore, ME-fMRI substantively reduces signal dropout in regions of particular interest for translational neuroscience (e.g., amygdala, accumbens, orbitofrontal cortex) because the echoes can be optimally weighted based on regionally specific rates of fMRI signal decay [52,53]. Of particular importance, early findings suggest ME-fMRI allows for reliable, precise mapping of individual differences in brain function with much shorter scans. For example, 10 minutes of ME-fMRI data have been found to generate more stable estimates of functional connectivity than 30 minutes of single-echo data [52].
While ME-fMRI’s widespread adoption has been slowed by technological limitations, recent developments in scanner hardware and software (e.g., parallel and multi-band imaging) now allow for ME-fMRI to be acquired on most scanners with minimal loss of the spatial or temporal resolution typical of single-echo data [52,82]. Given the improved measurement precision already offered, as well as its likely continued development, ME-fMRI represents another promising strategy for translational neuroscientists to prioritize the reliable measurement of individual differences by isolating true sources of individual differences in the BOLD signal from non-BOLD but stable physiological artifacts and noise. ME-fMRI additionally allows for innovative study designs that are of interest for translational neuroscience, but methodologically challenging for single-echo fMRI. These include measurement of brain function during slow-onset drug-administration paradigms as well as mapping of rapid, stimulus-driven effects in naturalistic paradigms [83-85]. However, careful data cleaning practices are still required with ME-fMRI, because stable individual differences in non-neural BOLD effects (e.g., breathing patterns throughout a scan) can still confound individual differences research [77].
Reliability can be improved by designing stimuli to evoke individual differences
As already noted, the vast majority of fMRI tasks were designed to experimentally manipulate within-subjects, group-averaged effects, not to optimally evoke between-subjects individual differences in brain function [2,60,86]. Thus, another strategy for improving fMRI measurement of individual differences is to design new tasks from the ground up with the explicit goal of optimizing reliability and precision. In particular, there is a largely untapped opportunity for adopting psychometric tools from item-response theory and generalizability theory to select stimuli based on their ability to evoke reliable individual differences (see [87-89] for early steps in this direction). Constructing new tasks and stimuli from the ground up will, admittedly, require time-consuming and expensive fMRI pilot studies to assess large batches of stimuli and task items, test their psychometric features, and iteratively select stimuli that efficiently generate the most precise, reliable measures. However, related efforts suggest that validated fMRI stimuli with known measurement properties can yield large benefits including the ability to create more complete models of how individual brains process sensory, memory and linguistic information [90-92]. Furthermore, initial evidence suggests that a small subset of fMRI timepoints disproportionately drives reliable individual differences in functional connectivity [93]. As these high reliability timepoints tend to be elicited by the same movie segments across individuals [93], large efficiency gains may be possible by selecting stimuli that most efficiently evoke individual differences in brain function.
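As a rough sketch of what psychometrically guided stimulus selection could look like (hypothetical pilot data and a deliberately simple item-level test-retest criterion, rather than a full item-response-theory or generalizability-theory model):

```python
# Minimal sketch: rank candidate stimuli by how reliably they evoke
# between-subject differences across two pilot sessions, then retain the most
# reliable items for the final task. All data here are simulated placeholders.
import numpy as np

rng = np.random.default_rng(1)
n_sub, n_items = 40, 120
# betas[session, subject, item]: per-item activation estimates from two pilot scans
betas = rng.normal(size=(2, n_sub, n_items))

def item_reliability(session1, session2):
    """Test-retest correlation of between-subject differences for a single item."""
    return np.corrcoef(session1, session2)[0, 1]

item_rel = np.array([item_reliability(betas[0, :, i], betas[1, :, i])
                     for i in range(n_items)])

# Retain the 30 items that most reliably separate individuals
selected = np.argsort(item_rel)[::-1][:30]
print("Mean test-retest reliability of selected items:", round(item_rel[selected].mean(), 2))
```

In a real study, the retained items would then be re-validated in an independent sample to guard against overfitting the selection to pilot noise.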
Naturalistic stimuli, such as movies, speeches, and complex social scenarios, may have particular benefits in this regard because they keep participants engaged, awake, and relatively still, thereby minimizing artifacts due to head motion, attention, and wakefulness [94-96]. Naturalistic stimuli also tend to have higher ecological validity than traditional tasks and can be easily tailored to a wide variety of content including visual, emotional, and social features that target psychological constructs of interest [90,97,98]. Relatedly, measures generated from movie-watching, as well as the combination of multiple tasks with resting-state data, can yield more reliable estimates of brain function with better predictive utility than single tasks or resting-state data alone, further suggesting that reliable individual differences may be best elicited from complex, varied stimuli [5,99,100]. However, these benefits also come with tradeoffs. For example, the complexity of naturalistic stimuli typically cannot be easily controlled (e.g., color composition and spatial frequency) to levels typical of traditional cognitive neuroscience stimuli (however, see [101]).
Concluding Remarks
In this commentary, we have described the origins, challenges, and frontiers of current efforts to generate reliable fMRI measures for translational neuroscience. Many of the most commonly used fMRI measures are not yet sufficiently reliable for use as clinical biomarkers. In retrospect, this may not be altogether surprising; the majority of the fMRI measures used today were not designed to identify precise between-subjects variance, but rather to reveal within-subjects cognitive neuroscience effects through experimental control and group averaging. By considering these legacy constraints, fMRI researchers are now challenged to create new paradigms for reliably measuring individual differences in brain function. Precision fMRI has revealed that deep individuality in the functional organization of the brain is measurable if stable variance is systematically isolated by collecting large amounts of data in each individual. Furthermore, emerging methods in reliability modeling, ME-fMRI, and study design suggest that reliable individual-specific fMRI measures can be more efficiently generated if protocols are optimized to isolate stable sources of between-subjects variability. Importantly, these methods could be implemented simultaneously and thus may yield complementary returns for precision and reliability. Preliminary efforts that have integrated ME-fMRI with naturalistic stimuli and item-level modeling with data aggregation suggest that the synthesis of these methods may offer the most powerful avenue for identifying reliable measures of individual differences in brain function with high translational value and potential for clinical applications [45,84]. Of course, fMRI is still a nascent tool, and many opportunities exist for continued technological innovation and development (see Outstanding Questions). The strategies we highlight here are not intended to be exhaustive or prescriptive. Instead, by highlighting these strategies alongside their grounding in psychometric principles, we hope to promote the design of fMRI studies that are better positioned to generate reliable measures of individual differences in brain function. Translational neuroscience with fMRI cannot be a secondary goal of experimental cognitive neuroscience and instead demands iterative, explicit development to optimize the measurement of reliable individual-specific variability. Despite recent setbacks, we see a bright future for a cumulative translational neuroscience of individual differences given that now, more than ever, we understand the limitations of our current fMRI measures and have emerging strategies for building more precise, reliable measures.
Outstanding Questions.
How much data are needed to generate reliable individual-specific estimates of brain function when aggregation, reliability modeling, ME-fMRI, and stimulus design are combined? In other words, are these strategies complementary or substitutes for one another?
What are the “fundamental units” of individual differences in brain function that can be measured with fMRI? What combination of task activation, functional connectivity during rest and tasks, and/or the way that functional connectivity changes during tasks to shape task activation, can best explain individuality in human brain function?
What are the underlying mechanisms that drive reliable individual differences in brain function measurable with fMRI? While we have emerging evidence that such differences reflect local patterns of anatomy, myelination, and structural connectivity, deepening our understanding of these mechanisms can not only further advance fMRI strategies for reliable measurement but also inform translation of findings to clinical applications.
How can we most effectively scale-up reliable fMRI measurement for population neuroscience? To date, precision fMRI measures have been limited to highly select, niche datasets. To understand human variation in brain function, especially of clinical value, we will need innovative fMRI protocols that can be readily implemented in large-scale, population-representative samples.
Highlights.
Since its introduction in 1992, fMRI has rapidly matured to become a powerful tool in neuroscience, allowing researchers to noninvasively map the functional organization of the average human brain and probe the brain bases of behaviors from the simple to the complex.
However, the translation of fMRI to clinical applications as well as the study of individual differences in brain function has been limited to date. This limitation, in part, reflects the inadequate reliability of many of the most commonly used fMRI measures. Reliability is a prerequisite for the valid measurement of brain function in individuals.
We highlight four emerging strategies, each with roots in psychometrics, that have the potential to improve measurement reliability, thereby advancing the potential clinical utility of fMRI: 1) extended aggregation, 2) reliability modeling, 3) multi-echo fMRI, and 4) stimulus design.
Acknowledgments
This work was supported by the National Science Foundation Graduate Research Fellowship (no. NSF DGE-1644868) and the National Institute on Aging grant no. NIA F99 AG068432-01 to M.L.E., as well as grant no. R01AG049789 to A.R.H. We would like to thank Avshalom Caspi, Terrie Moffitt, Tracy d’Arbeloff, Line Rasmussen, Alex Winn, and Ethan Whitman for thoughtful comments and feedback on earlier drafts of this manuscript.
Glossary
- Between-subject variance
Variability in a measure due to differences between individuals (i.e., individual differences). In fMRI, this is often represented by individual differences in the magnitude of activation or functional connectivity.
- Classical test theory
An early psychometric framework for improving the reliability of tests by conceptualizing observed scores from a measurement device as the sum of error score variation and true score variation.
- Clinical translation
The process of using scientific insight to inform the diagnosis, treatment, and prevention of clinical disorders. fMRI could one day be more routinely used to assess risk and prioritize treatments for brain disorders in individuals.
- Error score
The part of a measurement that is driven by random noise, measurement error, artifacts and bias.
- Ecological validity
The extent to which an experimental paradigm generalizes to the intended setting of interest. In fMRI, ecological validity increases as stimuli and tasks better capture real-world experiences and challenges faced by individuals (e.g., naturalistic stimuli).
- Functional topography
The spatial layout of brain functions and network organization across the cortex that has both general properties (e.g., the location of primary sensory cortex) and highly individualized patterns (see Figure 3).
- Generalizability theory
A psychometric framework that expands classical test theory with tools for disambiguating the multitude of sources of true score and error score variance. In fMRI, this allows researchers to measure variance that is driven by sources including scanner, time-of-day, and task.
- Item-response theory
A psychometric framework that expands on classical test theory by providing tools for assessing, designing and selecting individual items for a test instead of focusing on the observed scores that are generated from the test itself.
- Latent variable
A variable that is not directly observed, but instead is inferred from other measurements.
- Measure
A standard unit used to express the size, amount, or degree of something. Reliability is a property of a measure and will thus vary as the properties of the measure vary. In fMRI, each combination of task length, stimulus type, task demands, etc. generates different measures with different reliabilities.
- Naturalistic stimuli
Experimental stimuli that are designed to approximate the rich complexity of everyday experience as opposed to the rigidly controlled stimuli commonly used in experimental cognitive neuroscience. In fMRI, naturalistic stimuli can include complex audio (e.g., speeches, stories) and visual (e.g., pictures and movies) modalities.
- Psychometrics
A specialist field focused on developing and improving the measurement of psychological constructs such as personality and cognitive ability.
- Reliability
The consistency of a measure when repeatedly assessed under similar conditions. Low reliability statistically limits the ability to detect associations between fMRI measurements and outcome measures (e.g., schizophrenia status or working memory capacity).
- True score
The targeted part of a measurement that is free of noise, error, contamination or bias.
- Within-subject variance
Variability in a measure within an individual due to different conditions. In fMRI, this is often represented in differences in the pattern of activation between task and control conditions.
References
- 1. Cuthbert BN (2014) The RDoC framework: Facilitating transition from ICD/DSM to dimensional approaches that integrate neuroscience and psychopathology. World Psychiatry 13, 28–35
- 2. Elliott ML et al. (2020) What Is the Test-Retest Reliability of Common Task-Functional MRI Measures? New Empirical Evidence and a Meta-Analysis. Psychol. Sci 31, 792–806
- 3. Noble S et al. (2017) Influences on the Test–Retest Reliability of Functional Connectivity MRI and its Relationship with Behavioral Utility. Cereb. Cortex DOI: 10.1093/cercor/bhx230
- 4. Birn RM et al. (2013) The effect of scan length on the reliability of resting-state fMRI connectivity estimates. Neuroimage 83, 550–558
- 5. Elliott ML et al. (2019) General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. Neuroimage 189, 516–532
- 6. Kwong KK et al. (1992) Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation. Proc. Natl. Acad. Sci. U. S. A DOI: 10.1073/pnas.89.12.5675
- 7. Ogawa S et al. (1992) Intrinsic signal changes accompanying sensory stimulation: Functional brain mapping with magnetic resonance imaging. Proc. Natl. Acad. Sci. U. S. A DOI: 10.1073/pnas.89.13.5951
- 8. Bandettini PA (2012) Twenty years of functional MRI: The science and the stories. Neuroimage
- 9. Yarkoni T and Braver TS (2010) Cognitive Neuroscience Approaches to Individual Differences in Working Memory and Executive Control: Conceptual and Methodological Issues. DOI: 10.1007/978-1-4419-1210-7_6
- 10. Hariri AR et al. (2000) Modulating emotional responses: Effects of a neocortical network on the limbic system. Neuroreport DOI: 10.1097/00001756-200001170-00009
- 11. Breiter HC et al. (1996) Response and habituation of the human amygdala during visual processing of facial expression. Neuron DOI: 10.1016/S0896-6273(00)80219-6
- 12. Braver TS et al. (2010) Vive les differences! Individual variation in neural mechanisms of executive control. Curr. Opin. Neurobiol 20, 242–250
- 13. Hariri AR (2009) The neurobiology of individual differences in complex behavioral traits. Annu. Rev. Neurosci
- 14. Barch DM et al. (2013) Function in the human connectome: Task-fMRI and individual differences in behavior. Neuroimage 80, 169–189
- 15. Miller KL et al. (2016) Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nat. Neurosci 19, 1523
- 16. Ioannidis JPA (2005) Why most published research findings are false. PLoS Medicine 2, 0696–0701
- 17. Aarts AA et al. (2015) Estimating the reproducibility of psychological science. Science 349
- 18. Vul E et al. (2009) Puzzlingly High Correlations in fMRI Studies of Emotion, Personality, and Social Cognition. Perspect. Psychol. Sci 4, 274–290
- 19. Eklund A et al. (2016) Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proc. Natl. Acad. Sci. U. S. A DOI: 10.1073/pnas.1602413113
- 20. Yarkoni T (2019) The generalizability crisis.
- 21. Chen G et al. (2021) To pool or not to pool: Can we ignore cross-trial variability in FMRI? Neuroimage 225, 117496
- 22. Yarkoni T (2009) Big Correlations in Little Studies: Inflated fMRI Correlations Reflect Low Statistical Power—Commentary on Vul et al. (2009). Perspect. Psychol. Sci DOI: 10.1111/j.1745-6924.2009.01127.x
- 23. Marek S et al. (2020) Towards Reproducible Brain-Wide Association Studies. bioRxiv
- 24. Power JD et al. (2012) Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. Neuroimage 59, 2142–2154
- 25. Siegel JS et al. (2014) Statistical improvements in functional magnetic resonance imaging analyses produced by censoring high-motion data points. Hum. Brain Mapp 35, 1981–1996
- 26. Bennett CM and Miller MB (2010) How reliable are the results from functional magnetic resonance imaging? Ann. N. Y. Acad. Sci 1191, 133–155
- 27. Cox RW et al. (2016) AFNI and Clustering: False Positive Rates Redux. Cold Spring Harbor Labs Journals
- 28. Gratton C et al. (2020) Removal of high frequency contamination from motion estimates in single-band fMRI saves data without biasing functional connectivity. Neuroimage 217, 116866
- 29. Fair DA et al. (2020) Correction of respiratory artifacts in MRI head motion estimates. Neuroimage 208, 116400
- 30. Cole MW et al. (2014) Intrinsic and task-evoked network architectures of the human brain. Neuron 83, 238–251
- 31. Poldrack RA et al. (2017) Scanning the horizon: Towards transparent and reproducible neuroimaging research. Nat. Rev. Neurosci 18, 115–126
- 32. Winkler AM et al. (2016) Faster permutation inference in brain imaging. Neuroimage DOI: 10.1016/j.neuroimage.2016.05.068
- 33. Dubois J and Adolphs R (2016) Building a Science of Individual Differences from fMRI. Trends Cogn. Sci 20, 425–443
- 34. Woo CW et al. (2017) Building better biomarkers: Brain models in translational neuroimaging. Nat. Neurosci 20, 365–377
- 35. Liu TT (2016) Noise contributions to the fMRI signal: An overview. Neuroimage DOI: 10.1016/j.neuroimage.2016.09.008
- 36. Gratton C et al. (2020) Defining Individual-Specific Functional Neuroanatomy for Precision Psychiatry. Biol. Psychiatry 88, 28–39
- 37. Gratton C et al. (2018) Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation. Neuron DOI: 10.1016/j.neuron.2018.03.035
- 38. Teeuw J et al. (2021) Reliability modelling of resting-state functional connectivity. Neuroimage DOI: 10.1016/j.neuroimage.2021.117842
- 39. Cho JW et al. (2021) Impact of concatenating fMRI data on reliability for functional connectomics. Neuroimage 226, 117549
- 40. Raut RV et al. (2020) Hierarchical dynamics as a macroscopic organizing principle of the human brain. Proc. Natl. Acad. Sci. U. S. A DOI: 10.1073/pnas.2003383117
- 41. Gordon EM et al. (2017) Precision Functional Mapping of Individual Human Brains. Neuron 95, 791–807.e7
- 42. Seitzman BA et al. (2019) Trait-like variants in human functional brain networks. Proc. Natl. Acad. Sci. U. S. A 116, 22851–22861
- 43. Braga RM and Buckner RL (2017) Parallel Interdigitated Distributed Networks within the Individual Estimated by Intrinsic Functional Connectivity. Neuron 95, 457–471.e5
- 44. Buckner RL and DiNicola LM (2019) The brain’s default network: updated anatomy, physiology and evolving insights. Nat. Rev. Neurosci 20, 593–608
- 45. DiNicola LM et al. (2020) Parallel distributed networks dissociate episodic and social functions within the individual. J. Neurophysiol 123, 1144–1179
- 46. Cash RFH et al. (2021) Personalized connectivity-guided DLPFC-TMS for depression: Advancing computational feasibility, precision and reproducibility. Hum. Brain Mapp DOI: 10.1002/hbm.25330
- 47. Cash RFH et al. (2020) Functional Magnetic Resonance Imaging-Guided Personalization of Transcranial Magnetic Stimulation Treatment for Depression. JAMA Psychiatry DOI: 10.1001/jamapsychiatry.2020.3794
- 48. Gordon EM et al. (2018) High-fidelity measures of whole-brain functional connectivity and white matter integrity mediate relationships between traumatic brain injury and post-traumatic stress disorder symptoms. J. Neurotrauma 35, 767–779
- 49. Newbold DJ et al. (2020) Plasticity and Spontaneous Activity Pulses in Disused Human Brain Circuits. Neuron 107, 580–589.e6
- 50. Laumann TO et al. (2021) Brain network reorganisation in an adolescent after bilateral perinatal strokes. Lancet Neurol 20, 255–256
- 51. Gratton C et al. (2019) Defining Individual-Specific Functional Neuroanatomy for Precision Psychiatry. Biol. Psychiatry DOI: 10.1016/j.biopsych.2019.10.026
- 52. Lynch CJ et al. (2020) Rapid Precision Functional Mapping of Individuals Using Multi-Echo fMRI. Cell Rep. 33, 108540
- 53. Kundu P et al. (2017) Multi-echo fMRI: A review of applications in fMRI denoising and analysis of BOLD signals. Neuroimage 154, 59–80
- 54. Glasser MF et al. (2013) The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage 80, 105–124
- 55. Marek S et al. (2018) Spatial and Temporal Organization of the Individual Human Cerebellum. Neuron 100, 977–993.e7
- 56. Casey BJ et al. (2018) The Adolescent Brain Cognitive Development (ABCD) study: Imaging acquisition across 21 sites. Dev. Cogn. Neurosci 32, 43–54
- 57. Figee M and Mayberg H (2021) The future of personalized brain stimulation. Nat. Med DOI: 10.1038/s41591-021-01243-7
- 58. Falk EB et al. (2013) What is a representative brain? Neuroscience meets population science. Proc. Natl. Acad. Sci 110, 17615–17622
- 59. Rushton JP et al. (1983) Behavioral development and construct validity: The principle of aggregation. Psychol. Bull 94, 18–38
- 60. Hedge C et al. (2018) The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behav. Res. Methods 50, 1166–1186
- 61. Meyer A et al. (2017) Considering ERP difference scores as individual difference measures: Issues with subtraction and alternative approaches. Psychophysiology 54, 114–122
- 62. Chen G et al. (2020) Beyond the intraclass correlation: A hierarchical modeling approach to test-retest assessment. Neuroimage
- 63. Brandmaier AM et al. (2018) Assessing reliability in neuroimaging research through intra-class effect decomposition (ICED). Elife 7, 1–19
- 64. Kong R et al. (2019) Spatial Topography of Individual-Specific Cortical Networks Predicts Human Cognition, Personality, and Emotion. Cereb. Cortex 29, 2533–2551
- 65. Cooper SR et al. (2019) Neuroimaging of Individual Differences: A Latent Variable Modeling Perspective. Neurosci. Biobehav. Rev 98, 29–46
- 66. Betzel RF et al. (2019) High-amplitude co-fluctuations in cortical activity drive resting-state functional connectivity. bioRxiv DOI: 10.1101/800045
- 67. Chen G et al. (2021) Beyond the intraclass correlation: A hierarchical modeling approach to test-retest assessment. bioRxiv DOI: 10.1101/2021.01.04.425305
- 68. Teeuw J et al. (2019) Genetic and environmental influences on functional connectivity within and between canonical cortical resting-state networks throughout adolescent development in boys and girls. Neuroimage 202, 116073
- 69. Anderson KM et al. (2021) Heritability of individualized cortical network topography. Proc. Natl. Acad. Sci 118, e2016271118
- 70. McCormick EM et al. (2021) Latent functional connectivity underlying multiple brain states. bioRxiv DOI: 10.1101/2021.04.05.438534
- 71. Nielsen JA et al. (2019) Precision Brain Morphometry: Feasibility and Opportunities of Extreme Rapid Scans. bioRxiv DOI: 10.1101/530436
- 72. van Dijk KRA et al. (2012) The influence of head motion on intrinsic functional connectivity MRI. Neuroimage 59, 431–438
- 73. Hodgson K et al. (2017) Shared Genetic Factors Influence Head Motion During MRI and Body Mass Index. Cereb. Cortex 27, 5539–5546
- 74. Siegel JS et al. (2017) Data quality influences observed links between functional connectivity and behavior. Cereb. Cortex DOI: 10.1093/cercor/bhw253
- 75. Power JD et al. (2020) A critical, event-related appraisal of denoising in resting-state fMRI studies. Cereb. Cortex 30, 5544–5559
- 76. Power JD et al. (2014) Methods to detect, characterize, and remove motion artifact in resting state fMRI. Neuroimage 84, 320–341
- 77. Power JD et al. (2018) Ridding fMRI data of motion-related influences: Removal of signals with distinct spatial and physical bases in multiecho data. Proc. Natl. Acad. Sci 115, 201720985
- 78. Caballero-Gaudes C and Reynolds RC (2017) Methods for cleaning the BOLD fMRI signal. Neuroimage 154, 128–149
- 79. Kundu P et al. (2012) Differentiating BOLD and non-BOLD signals in fMRI time series using multi-echo EPI. Neuroimage 60, 1759–1770
- 80. Lombardo MV et al. (2016) Improving effect size estimation and statistical power with multi-echo fMRI and its impact on understanding the neural systems supporting mentalizing. Neuroimage 142, 55–66
- 81. Kundu P et al. (2013) Integrated strategy for improving functional connectivity mapping using multiecho fMRI. Proc. Natl. Acad. Sci. U. S. A 110, 16187–16192
- 82. Olafsson V et al. (2015) Enhanced identification of BOLD-like components with multi-echo simultaneous multi-slice (MESMS) fMRI and multi-echo ICA. Neuroimage DOI: 10.1016/j.neuroimage.2015.02.052
- 83. Evans JW et al. (2015) Separating slow BOLD from non-BOLD baseline drifts using multi-echo fMRI. Neuroimage 105, 189–197
- 84. Caballero-Gaudes C et al. (2019) A deconvolution algorithm for multi-echo functional MRI: Multi-echo Sparse Paradigm Free Mapping. Neuroimage 202, 1–15
- 85. Gonzalez-Castillo J et al. (2016) Evaluation of multi-echo ICA denoising for task based fMRI studies: Block designs, rapid event-related designs, and cardiac-gated fMRI. Neuroimage 141, 452–468
- 86. Hajcak G et al. (2017) Psychometrics and the neuroscience of individual differences: Internal consistency limits between-subjects effects. J. Abnorm. Psychol 126, 823–834
- 87. Tholen MG et al. (2020) Functional magnetic resonance imaging (fMRI) item analysis of empathy and theory of mind. Hum. Brain Mapp 41, 2611–2628
- 88. Dodell-Feder D et al. (2011) FMRI item analysis in a theory of mind task. Neuroimage 55, 705–712
- 89. Wilson KA et al. (2021) Using item response theory to select emotional pictures for psychophysiological experiments. Int. J. Psychophysiol DOI: 10.1016/j.ijpsycho.2021.02.003
- 90. Naselaris T et al. (2021) Extensive sampling for complete models of individual brains. Curr. Opin. Behav. Sci 40, 45–51
- 91. Hamilton LS and Huth AG (2020) The revolution will not be controlled: natural stimuli in speech neuroscience. Lang. Cogn. Neurosci 35, 573–582
- 92. Allen EJ et al. (2021) A massive 7T fMRI dataset to bridge cognitive and computational neuroscience.
- 93. Esfahlani FZ et al. (2020) High-amplitude cofluctuations in cortical activity drive functional connectivity. Proc. Natl. Acad. Sci. U. S. A 117, 28393–28401
- 94. Tagliazucchi E and Laufs H (2014) Decoding Wakefulness Levels from Typical fMRI Resting-State Data Reveals Reliable Drifts between Wakefulness and Sleep. Neuron 82, 695–708
- 95. Eickhoff SB et al. (2020) Towards clinical applications of movie fMRI. Neuroimage 217, 116860
- 96. Vanderwal T et al. (2018) Movies in the magnet: Naturalistic paradigms in developmental functional neuroimaging. Dev. Cogn. Neurosci DOI: 10.1016/j.dcn.2018.10.004
- 97. Mehrer J et al. (2021) An ecologically motivated image dataset for deep learning yields better models of human vision. Proc. Natl. Acad. Sci. U. S. A DOI: 10.1073/pnas.2011417118
- 98. Hasson U et al. (2010) Reliability of cortical activity during natural stimulation. Trends Cogn. Sci 14, 40–48
- 99. Finn ES et al. (2017) Can brain state be manipulated to emphasize individual differences in functional connectivity? Neuroimage 160, 140–151
- 100. Finn ES and Bandettini PA (2021) Movie-watching outperforms rest for functional connectivity-based prediction of behavior. Neuroimage 235, 117963
- 101. Slivkoff S and Gallant JL (2021) Design of complex neuroscience experiments using mixed-integer linear programming. Neuron DOI: 10.1016/j.neuron.2021.02.019
- 102. Button KS et al. (2013) Power failure: Why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci 14, 365–376