Abstract
Background:
Resting-state functional connectivity (RSFC) analysis with widefield optical imaging (WOI) is a potentially powerful tool to develop imaging biomarkers in mouse models of disease before translating them to human neuroimaging with functional magnetic resonance imaging (fMRI). The delineation of such biomarkers depends on rigorous statistical analysis. However, statistical understanding of WOI data is limited. In particular, cluster-based analysis of neuroimaging data depends on assumptions of spatial stationarity (i.e., that the distribution of cluster sizes under the null is equal at all brain locations). Whether actual data deviate from this assumption has not previously been examined in WOI.
New Method:
In this manuscript, we characterize the effects of spatial nonstationarity in WOI RSFC data and adapt a “two-pass” technique from fMRI to correct cluster sizes and mitigate spatial bias, both parametrically and nonparametrically. These methods are tested on multi-institutional data.
Results and Comparison with Existing Methods:
We find that spatial nonstationarity has a substantial effect on inference in WOI RSFC data, with false positives much more likely in some brain regions than in others. This pattern of bias varies between imaging systems, contrasts, and mouse ages, all of which could affect experimental reproducibility if not accounted for.
Conclusions:
Both parametric and nonparametric corrections for nonstationarity result in significant reductions in spatial bias. The proposed methods are simple to implement and will improve the robustness of inference in optical neuroimaging data.
Keywords: Functional neuroimaging, widefield optical imaging, functional connectivity, multiple testing problem, statistics
1. Introduction
Resting-state functional connectivity mapping with optical functional neuroimaging modalities is a potentially powerful technique that can span from preclinical imaging in mouse models to bedside imaging of acutely ill populations [Ayaz, 2022; Wang, 2022a; White, 2012; Ma, 2016]. The development of relevant functional neuroimaging biomarkers of disease requires statistical methods which precisely control the false positive rate (Type I error) while providing adequate statistical power (limitation of Type II error) [Peterson, 1999; Nichols, 2003]. This balance has led to the popularity of cluster-based inference, which improves sensitivity over pixel-based methods when changes have spatial extent (a reasonable assumption in brain imaging, where functional brain areas are generally larger than the image pixel size) [Nichols, 2003; White, 2023; Brier, 2023; Friston, 1994].
However, an often-underappreciated problem is that cluster-based inference is strongly affected by the smoothness of the underlying image noise [Worsley, 1999; Hayasaka, 2004]. One performs cluster-based inference by thresholding the statistical test map at a predetermined threshold and determining a p-value for each cluster based on its size. The cumulative distribution function for cluster size can be derived from parametric assumptions about the underlying statistical field and its autocorrelation (i.e., with random field theory, RFT) or from the empirical distribution obtained through permutation inference. In either case, these methods implicitly assume that the spatial autocorrelation function is constant across the image: that is, a cluster of a given size is equally likely (under the null) to be found at any location. When this assumption is violated, larger clusters are more likely to be found in areas of high spatial autocorrelation. Consequently, uncorrected parametric methods (e.g., RFT) yield invalid inference, and nonparametric methods result in nonuniform sensitivity [Salimi-Khorshidi, 2011].
The effect of spatial nonstationarity has not previously been evaluated in functional neuroimaging in mice. Unfortunately, there are reasons to expect that nonstationarity is greater with widefield optical imaging (WOI) in mice than in human functional magnetic resonance imaging (fMRI). The mouse brain is structurally simpler than the human brain, resulting in fewer, relatively larger functional regions within which we would expect high autocorrelation driven by the functional connectivity networks [White, 2011]. Additionally, the field-of-view of widefield optical imaging is relatively limited, increasing the effects of edges in the brain segmentation, where clusters are clipped and thus smaller. Finally, as WOI images the brain through an intact-skull cranial window [Silasi, 2016], overlying vasculature and cranial sutures may cause artifacts resulting in increased local roughness.
Thus, in the present manuscript, we evaluate parametric and nonparametric methods to control for spatial nonstationarity in mouse resting-state functional connectivity analysis. Specifically, we evaluate two versions of the “two-pass” approach originated by Salimi-Khorshidi et al. [2011] for fMRI, wherein each pixel’s contribution to cluster size is adjusted to account for nonstationarity. We consider cluster correction using (1) the “resels per voxel” approach grounded in RFT and (2) the nonparametric “expected cluster size per voxel”. Both methods are compatible with determining significance thresholds and p-values through permutation inference; we have previously shown that RFT cluster-based inference fails to adequately control the familywise error rate in WOI [White, 2023]. Methods will be judged on their ability to provide even sensitivity across the field-of-view.
2. Methods
2.1. Animals and Optical Neuroimaging
Data for this study are derived from two sources: (1) a database of null hypothesis data obtained at the Children’s Hospital of Philadelphia (CHOP) and (2) available repository data from Washington University in St. Louis (WashU) derived from a longitudinal study of mouse development [Rahn, 2022]. The CHOP data are a set of N=32 adult (15 female, median age 9 weeks) C57BL/6 mice (Jackson Laboratory, Bar Harbor, ME) scanned across two days; all procedures on these mice were approved by CHOP’s Institutional Animal Care and Use Committee (IACUC) in accordance with the National Institutes of Health guide for the care and use of laboratory animals. An intact-skull cranial window was placed, and resting-state data were obtained under ketamine and xylazine anesthesia. For all sessions, six five-minute runs (30 minutes of data) were attempted; individual runs were excluded if data quality was poor (e.g., when a mouse awoke from anesthesia early). Overall, 12 sessions (19%) had less than 30 minutes of data (5 with 25 min, 6 with 20 min, and 1 with 15 min). WOI data were processed as previously described, including pixel-wise censoring, guided brain segmentation and atlas transformation, and spectroscopy [White, 2011, 2019, 2021; Padawer-Curry, 2021]. All analyses on CHOP data were performed on normalized changes in total hemoglobin concentration (with a subset of analyses also performed on oxy- and deoxyhemoglobin, included as a supplementary figure).
Data from the WashU cohort consist of 17 Thy1-GCaMP6 mice (5 female) on a C57BL/6 background imaged at five timepoints from P15 to P60. Procedures on these mice were approved by the IACUC at WashU. Imaging was performed in awake mice using a similar intact-skull cranial window. As with the CHOP data, 30 minutes of data were obtained as six five-minute runs. Individual runs were excluded (by the original authors) based on quality thresholds. As downloaded, 83 sessions had 30 minutes of data, and 2 sessions had 25 minutes of data. Data were available already segmented and aligned, formatted as changes in corrected calcium fluorescence, oxyhemoglobin concentrations, and deoxyhemoglobin concentrations. Both changes in calcium fluorescence and in total hemoglobin concentration were analyzed. While the original paper from which these data were derived mentions the use of temporal censoring to improve data quality, reproduction of those methods was not possible given the data provided; thus, the details of the present data may differ slightly from those in its original publication.
Functional connectivity (FC) analysis was performed after filtering to 0.01–0.1 Hz for hemodynamic data and 0.4–4 Hz for calcium data. Global signal regression was performed. FC was considered in two ways. First, the majority of the analyses utilize standard seed-based functional connectivity using Pearson’s correlation coefficients between seeds at canonical cortical locations (defined in White et al. [2023]). All of the presented figures using seed-based analysis show variables (e.g., pixel-wise false positive rate) averaged over all 14 seeds. Second, in order to reevaluate the experimental data from Rahn et al. [2022], we performed homotopic FC analysis wherein each pixel was correlated with its homotopic counterpart in the other hemisphere. As we previously found no major differences in FWER control with comparisons between raw correlation matrices (r-maps) and z-maps [White, 2023], all testing was performed on correlation matrices, as these were simpler to obtain from the WashU repository data.
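As a minimal illustration of the homotopic FC calculation described above, the following Python sketch correlates each pixel's time course with that of its left-right mirror pixel. It is not the MATLAB code used in this study. The function names are hypothetical, and it assumes that the midline lies at the vertical center of the image (so that the homotopic pixel is found by flipping the column index) and that filtering and global signal regression have already been applied.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either time course is constant.
    return sxy / math.sqrt(sxx * syy)


def homotopic_fc(frames):
    """Correlate every pixel with its mirror pixel across the midline.

    frames: list of T images, each a list of rows of floats.
    Assumption: the midline is the vertical center of the image.
    """
    n_rows, n_cols = len(frames[0]), len(frames[0][0])
    fc = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            own = [f[r][c] for f in frames]
            mirror = [f[r][n_cols - 1 - c] for f in frames]
            fc[r][c] = pearson(own, mirror)
    return fc


# Toy data: three frames of a 1 x 3 image; the outer columns are anticorrelated.
frames = [[[1.0, 0.2, -1.0]], [[2.0, 0.1, -2.0]], [[0.5, 0.4, -0.5]]]
fc = homotopic_fc(frames)
```

The resulting map is symmetric about the midline by construction, which is why the search space for homotopic analyses is only one hemisphere.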
2.2. Statistical Tests on Null Hypothesis Data
Two types of group statistical analysis were considered: unpaired and paired t-tests. All statistical analyses were two-sided with the aim of controlling the familywise error rate at 0.05. For all permutation tests, 5,000 permutations were performed. Sex was not considered as a variable of interest.
For the unpaired tests, mice within each cohort were repeatedly divided into two groups, and we assessed for differences between these groups. For the CHOP mice, only one day per mouse was used. For the WashU mice, this analysis was performed at each age, resulting in five cohorts (for each of two imaging contrasts). Permutation inference was performed by randomly and repeatedly dividing mice into two groups and assessing differences between groups with pixel-wise two-sample t-tests.
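The random group division can be sketched for a single pixel as follows. This is an illustrative Python fragment rather than the study's MATLAB implementation. The function names are hypothetical, and the use of a pooled-variance (Student) t statistic with two equally sized groups is an assumption, since the text specifies neither.

```python
import math
import random


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic (assumed form of the test)."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def random_split_t(values, rng):
    """Randomly divide one pixel's per-mouse FC values into two groups
    and return the t statistic for the difference between them."""
    order = list(range(len(values)))
    rng.shuffle(order)
    half = len(order) // 2
    group_a = [values[i] for i in order[:half]]
    group_b = [values[i] for i in order[half:]]
    return two_sample_t(group_a, group_b)


rng = random.Random(0)                    # seeded for reproducibility
fc_values = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]   # toy per-mouse values
null_t = [random_split_t(fc_values, rng) for _ in range(100)]
```

Repeating this at every pixel with the same random division yields one null t-map per permutation.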
For paired t-tests, we assessed for differences in functional connectivity structure between the two days of imaging in the CHOP mice. As these mice were adults with only a short interval between scans, no differences were expected. As the experimental premise of Rahn et al. was that mice might show changes in functional connectivity across development, this analysis was not performed on the WashU data.
In both cases, the first step in cluster-based inference is the selection of a cluster-defining threshold. For permutation analysis (unlike RFT), this choice of threshold is essentially arbitrary, with a lower threshold resulting in larger clusters but also a higher cluster size threshold for inference. (The choice of threshold involves a tradeoff between sensitivity/power and the ability to localize functional changes; how to make this choice will not be addressed in the present study.) One might expect an interaction between the cluster-defining threshold and nonstationarity, as a lower threshold results in larger clusters and thus greater possible inhomogeneity in cluster sizes across the image. We therefore considered three possible thresholds for the t-statistic: |t|>3, |t|>3.5, and |t|>4. Once all permutation t-maps were thresholded, cluster sizes and locations were recorded.
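The thresholding and cluster-recording step can be illustrated with a short flood-fill sketch in Python. This is not the study's code. The function name is hypothetical, and two choices are assumptions not stated in the text: clusters are formed with 4-connectivity, and positive and negative suprathreshold pixels are kept in separate clusters.

```python
from collections import deque


def label_clusters(t_map, threshold):
    """Return (clusters, size_map) for pixels with |t| > threshold.

    clusters: list of lists of (row, col) pixel coordinates.
    size_map: image holding, at each pixel, the size of the cluster
              containing that pixel (0 if the pixel is subthreshold).
    Assumptions: 4-connectivity; clusters do not mix signs of t.
    """
    n_rows, n_cols = len(t_map), len(t_map[0])
    size_map = [[0] * n_cols for _ in range(n_rows)]
    seen = [[False] * n_cols for _ in range(n_rows)]
    clusters = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if seen[r0][c0] or abs(t_map[r0][c0]) <= threshold:
                continue
            positive = t_map[r0][c0] > 0
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            members = []
            while queue:
                r, c = queue.popleft()
                members.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < n_rows and 0 <= cc < n_cols
                            and not seen[rr][cc]
                            and abs(t_map[rr][cc]) > threshold
                            and (t_map[rr][cc] > 0) == positive):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            clusters.append(members)
            for r, c in members:
                size_map[r][c] = len(members)
    return clusters, size_map


t_map = [[3.5, 3.2, 0.0],
         [0.0, 0.1, -4.0],
         [0.0, 0.0, -3.1]]
clusters, size_map = label_clusters(t_map, threshold=3.0)
```

In this toy map, the two positive pixels in the top row form one cluster and the two negative pixels in the right column form another.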
For each permutation, the maximum cluster size was recorded, and an empiric cumulative distribution function was constructed from these maxima. A p-value for each cluster was then determined by comparing its size to this distribution, with p defined as the likelihood of finding a cluster of size equal to or greater than the observed value. The minimum p-value is 1/K, where K is the number of permutations.
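A minimal Python sketch of the maximum-statistic p-value follows. The function names are hypothetical. Enforcing the 1/K floor by clamping the exceedance count to at least one is an assumption about how the minimum is implemented.

```python
def max_cluster_size(cluster_sizes):
    """Largest cluster size in one permutation (0 if there are no clusters)."""
    return max(cluster_sizes, default=0)


def cluster_p_value(observed_size, null_max_sizes):
    """Fraction of permutations whose maximum cluster size is greater than
    or equal to the observed size, floored at 1/K (assumed implementation)."""
    k = len(null_max_sizes)
    exceed = sum(1 for m in null_max_sizes if m >= observed_size)
    return max(exceed, 1) / k


# Toy null distribution from K = 4 permutations.
per_permutation_sizes = [[3, 10], [20], [30, 5, 1], [40, 2]]
null_max = [max_cluster_size(sizes) for sizes in per_permutation_sizes]

p_mid = cluster_p_value(25, null_max)     # exceeded by the maxima 30 and 40
p_large = cluster_p_value(100, null_max)  # larger than every null maximum
```

Because each cluster is compared against the distribution of the maximum across the whole image, the resulting p-values control the familywise error rate.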
2.3. Determination of Spatial Variation and Bias
We quantified spatial variation in sensitivity in two ways. First, we considered spatial variation in the false positive rate (FPR). For each pixel, we calculated the rate at which it was included in a cluster with a p-value of less than 0.05. The spatial variation in the FPR is simply the standard deviation of this rate across the field-of-view; lower values represent more even sensitivity and less bias.
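The pixel-wise FPR and its spatial standard deviation can be sketched in Python as follows. The function names are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption.

```python
import statistics


def fpr_map(significant_masks):
    """Fraction of permutations in which each pixel belongs to a
    significant cluster.

    significant_masks: list of K images (lists of rows) whose entries are
    1 where the pixel lies in a cluster with p < 0.05 and 0 elsewhere.
    """
    k = len(significant_masks)
    n_rows, n_cols = len(significant_masks[0]), len(significant_masks[0][0])
    return [[sum(m[r][c] for m in significant_masks) / k
             for c in range(n_cols)]
            for r in range(n_rows)]


def spatial_sd(image, brain_mask):
    """Standard deviation of an image over pixels inside the brain mask.
    Assumption: population standard deviation."""
    values = [image[r][c]
              for r in range(len(image))
              for c in range(len(image[0]))
              if brain_mask[r][c]]
    return statistics.pstdev(values)


# Toy example: 4 permutations of a 1 x 3 image; the third pixel is outside the brain.
masks = [[[1, 0, 0]], [[1, 0, 0]], [[0, 0, 0]], [[1, 1, 0]]]
brain = [[True, True, False]]
fpr = fpr_map(masks)
sd = spatial_sd(fpr, brain)
```

A perfectly unbiased method would give the same FPR at every pixel and therefore a spatial standard deviation of zero.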
Second, we considered spatial variation in the M statistic (as defined by Salimi-Khorshidi et al.). M is a weighted average across permutations of −log10(P). Using −log10(P) rather than P emphasizes the effect of very significant (and perhaps spuriously small) p-values. This “average p-value” is weighted by cluster size to account for large clusters impacting more pixels at a time. Thus, for each pixel, i, we consider:
Where K is the number of permutations, is a weight proportional to the likelihood of a cluster of size Si,k, hitting pixel i, and is the indicator function. Again, we assessed the standard deviation of this value across the image, with lower values being more optimal. All of these metrics were calculated for each of the fourteen seeds. As results were similar across seeds, to reduce noise, data as presented has been averaged across the fourteen maps.
2.4. Corrections for Nonstationarity
Two methods for nonstationarity correction were considered. Both methods adjust each pixel’s contribution to cluster size to account for local smoothness, either parametrically through a calculation of the local image noise roughness (the inverse of smoothness) or nonparametrically through an empiric calculation of cluster size at each pixel under the null.
First, we corrected each pixel’s contribution to cluster size based on the local “resels per voxel” (RPV), where “resel” is a neologism coined by Worsley et al. [1992] for “resolution element”. RPV correction is a concept based on RFT which is equivalent to a distortion of the observed image field to an equivalent field where stationarity holds [Worsley, 1999; Hayasaka, 2004]. The local image roughness λi,d, at pixel i in dimension d for a given subject is estimated based on the covariance of the standardized residuals [Kiebel, 1999]:
where v is the number of degrees of freedom. This value is then averaged across subjects for each pixel.
The RPV is then defined as:
A higher RPV represents a rougher region and a larger equivalent contribution to cluster size. Note that RPV is equivalent to the inverse of the local full width at half maximum (FWHM) as used for RFT. For pixels i within a cluster S, the corrected cluster size is given by:
$$S_{\mathrm{RPV}} = \sum_{i \in S} \mathrm{RPV}_i$$
Second, we performed an empiric two-pass correction for cluster size. We calculated the empiric cluster size per voxel (ECSPV) as:
where Si,k, is the size of the cluster at pixel i for permutation k. K is the number of permutations (with the true, unpermuted arrangement of the data excluded) and is the number of permutations where Si,k> 0. E is a correction factor, which Salimi-Khorshidi et al. included to correct for skewness in the observed distribution of cluster sizes in their three-dimensional fMRI data. They assert that E=2/3 is optimal for fMRI data, although results supporting this conclusion are not shown [Salimi-Khorshidi, 2011]. As the optimal value of E for two-dimensional data is unclear, we performed ECSPV correction using a range of values from E = 0.25 to 2.
The corrected cluster size is then defined as:
As before, p-values were derived using permutation inference, except using the maximum statistic and empiric cumulative distribution function for either SRPV or SECSPV rather than the raw cluster size. Note, in all cases, we keep the terminology and abbreviations that use “voxel” despite our images being two-dimensional to avoid confusion and competing acronyms. A schematic of the processing scheme is shown in Figure 1.
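The idea shared by both corrections, replacing a raw pixel count with a sum of per-pixel contributions, can be sketched in Python as follows. The function name and toy values are hypothetical. The example uses an RPV map as the weight map, following the statement above that a higher RPV corresponds to a larger equivalent contribution to cluster size. The weights implied by ECSPV correction are not reproduced here.

```python
def corrected_cluster_size(cluster_pixels, weight_map):
    """Sum each pixel's contribution to cluster size over a cluster.

    cluster_pixels: list of (row, col) coordinates in one cluster.
    weight_map: image of per-pixel contributions (here, a toy RPV map).
    With all weights equal to 1, this reduces to the raw cluster size.
    """
    return sum(weight_map[r][c] for r, c in cluster_pixels)


# Toy RPV map: the top row is smooth (low RPV), the bottom row rough (high RPV).
rpv = [[0.5, 0.5],
       [2.0, 2.0]]
ones = [[1.0, 1.0],
        [1.0, 1.0]]

smooth_cluster = [(0, 0), (0, 1)]
rough_cluster = [(1, 0), (1, 1)]

raw_size = corrected_cluster_size(smooth_cluster, ones)
smooth_size = corrected_cluster_size(smooth_cluster, rpv)
rough_size = corrected_cluster_size(rough_cluster, rpv)
```

Two clusters with the same raw size are thus treated as more extreme when they arise in a rough region, where large clusters are unlikely under the null, than when they arise in a smooth one.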
Figure 1.

Correcting for nonstationarity in cluster-based analysis. For each permutation, a t-map is determined, and an RPV map is generated from the normalized residuals. The t-map is thresholded to give a map of clusters. The maps of clusters across all permutations are used to determine the ECSPV. Cluster sizes for each permutation for the purposes of inference can then be measured using either raw cluster size, cluster size corrected for RPV, or cluster size corrected for ECSPV. Once the maximum distribution for any variable is known, p-values (for each cluster) and the pixel-wise false positive rate (FPR) or M-statistic can be calculated.
2.5. Practical Effects of Nonstationarity Correction
Finally, in order to demonstrate the effect of nonstationarity on inference in an experiment where we expect the null hypothesis might not hold, we replicated the study of Rahn et al. using multiple statistical methods. For simplicity, we will focus on the question of whether homotopic connectivity (the correlation between each pixel and its homologue in the contralateral hemisphere) changes with development, which is the focus of Figure 2 in Rahn et al. Namely, longitudinal data was analyzed using a series of paired t-tests between adjacent ages (P15 and P22, P22 and P28, P28 and P35, and P35 and P60). Note that all homotopic connectivity is by definition bilaterally symmetric, and the statistical search space should properly be only half of the image. It is unclear how the original analysis was performed, but all subsequent tests are performed only on the left hemisphere.
Figure 2.

Effects of nonstationarity on cluster-based inference. First, note that all images are of the dorsal surface of the mouse brain viewed from above (rostral is up, and caudal is down; the left hemisphere is on the left, and the right hemisphere is on the right). Ideally, all measures would show little spatial variation across the brain. While the frequency with which a pixel exceeds the cluster-defining threshold, KS, is relatively even, larger clusters are far more likely to appear in some regions of the field-of-view than others. Parametrically, we see a wide variation in the resels per voxel (RPV), and nonparametrically in the expected cluster size per voxel (ECSPV). Thus, cluster-based inference results in spatially biased results with a spatially variant false positive rate (FPR) and M-statistic. Significant clusters and lower p-values are more likely to appear in smooth regions (lower RPV or higher ECSPV). Similar patterns occurred for both unpaired and paired t-tests. (See also Supplemental Fig. 1 for WashU data).
The analysis in Rahn et al. was performed using RFT [Brier, 2023], which we have demonstrated generates invalid p-values [White, 2023]. Importantly, the software used in Rahn et al. contained a bug that improperly created clusters by thresholding at t>1.65 rather than |t|>3.09. For direct visual comparison with the prior publication, we first reproduced the RFT results using both the incorrect threshold and a proper implementation of RFT. We then analyzed the data with permutation inference using all three of the methods discussed above (for consistency with RFT, all clusters were here defined by |t|>3.09): no correction for nonstationarity, RPV correction, and ECSPV correction. We then examined how p-values differed between these methods (as all clusters were defined the same way, the cluster sizes and locations do not differ between methods).
2.6. Data/Code Availability
MATLAB code to run the above analyses and generate figures as well as the CHOP data is available at https://osf.io/7fehu/?view_only=f5556893b22544a9bc8192f8ab01f461 [note: this link is for peer review and will be replaced with an open repository link upon publication]. The WashU data can be found at the online repository associated with Rahn et al. [2022].
3. Results
Cluster-based inference was first performed using null data, seed-based FC, a cluster-defining threshold of |t|>3, and an FWER of 0.05. The frequency with which any individual pixel exceeded this threshold was reasonably even across the field-of-view (CHOP data shown in Fig. 2; WashU data shown in Supp. Fig. 1). However, the frequency with which a pixel was included in a statistically significant cluster (as measured by either the false positive rate or the M-statistic) varied substantially (Fig. 2, Supp. Fig. 1). Comparing the maps of false positive rate to image noise roughness (as measured either parametrically by the resels per voxel, RPV, or nonparametrically as the expected cluster size per voxel, ECSPV), we can see that significant clusters are more likely to arise in areas of lower roughness (i.e., smoother regions; Fig. 2, Supp. Fig. 1).
In the WashU data, KS is relatively even, while the false positive rate (FPR) and M-statistic vary widely, generally in a pattern predicted by the resels per voxel (RPV) and the expected cluster size per voxel (ECSPV). The pattern of bias differs from that in the CHOP data, with particularly large smooth areas in the parietal cortex that worsen over time (especially in the hemoglobin data).
Interestingly, the pattern of image roughness differed substantially across datasets. In the CHOP data, the roughest areas were along the venous sinuses and cranial sutures (Fig. 2); the smoothest areas were in large cortical regions (e.g., motor and visual cortex). (This pattern was nearly identical for the paired t-test analysis.) In the WashU data, the roughest regions were parallel to (but not on) the anterior midline, while the parietal cortex was smoothest (Supp. Fig. 1). This pattern was present at all ages but was more pronounced in the P60 data; the contrast between smooth and rough regions was stronger in the hemoglobin data than in the calcium data.
The effect of nonstationarity on spatial bias was present at all cluster-defining thresholds but was more pronounced at lower thresholds (Supp. Fig. 2). As the cluster-defining threshold increases, the clusters necessarily become smaller, and in the limit of a sufficiently large threshold, cluster-based inference becomes equivalent to pixel-wise inference.
Next, we assessed the ability of parametric and nonparametric methods to improve spatial bias in sensitivity. Before comparing methods (i.e., RPV- and ECSPV-correction), we examined the effect of the ECSPV correction factor, E (recall that Salimi-Khorshidi et al. used E=2/3 for three-dimensional data). Overall, the effect of varying E for the CHOP data was minimal (Supp. Fig. 3). E had a greater effect on the spatial standard deviation of the false positive rate with an optimal E (lowest standard deviation) being either 0.9 or 1.0 (depending on the permutation). The standard deviation of the M-statistic was nearly invariant in E, with values of E around 0.5 yielding the lowest standard deviation. Thus, further results will be shown using E=1 (results using other values of E can be easily generated using the code provided). There are no substantive differences in the results using reasonable values for E.
Adjusting cluster sizes using either RPV or ECSPV resulted in substantially more uniform sensitivity (CHOP data: Fig. 3, WashU data: Supp. Fig. 4). Quantifying spatial variance in FPR and M (Table 1) for both CHOP and WashU data, we see that both RPV and ECSPV correction reduce spatial bias (lower standard deviation). ECSPV correction offers the best performance when using FPR as the metric. RPV and ECSPV correction perform similarly in their improvement in M (ECSPV correction slightly outperforms RPV correction for the CHOP data and vice versa for most of the WashU data). There were no substantial differences in performance between hemodynamic and calcium fluorescence data. The ability of correction to reduce spatial bias was also similar for total hemoglobin (as in the main results) and oxy/deoxyhemoglobin (Supp. Fig. 5, Supp. Table 1). Interestingly, use of deoxyhemoglobin as a contrast resulted in a different pattern of bias, with false positive clusters more prevalent along the venous sinuses and in parietal cortex (Supp. Fig. 5).
Figure 3.

Corrections for nonstationarity improve measures of spatial bias. Using either RPV correction or ECSPV correction results in a much more even pattern of false positive rate (FPR) and M. (See also Supplemental Fig. 4 for WashU data).
Table 1:
Spatial standard deviation of the false positive rate (FPR) and the M-statistic for different nonstationarity correction techniques across all null datasets. Lower values of the spatial standard deviation (within each metric) are better. Note that the values for M are similar to those from Salimi-Khorshidi et al. for MRI data (FPR was not assessed in that paper).
| Dataset | FPR (per thousand), no correction | FPR (per thousand), RPV correction | FPR (per thousand), ECSPV correction | M-statistic, no correction | M-statistic, RPV correction | M-statistic, ECSPV correction |
|---|---|---|---|---|---|---|
| CHOP, unpaired | 1.40 | 0.941 | 0.578 | 0.241 | 0.148 | 0.134 |
| CHOP, paired | 1.28 | 0.879 | 0.519 | 0.229 | 0.138 | 0.119 |
| WashU, P15, HbT | 2.55 | 1.75 | 1.29 | 0.273 | 0.194 | 0.188 |
| WashU, P22, HbT | 2.55 | 1.80 | 1.20 | 0.267 | 0.190 | 0.194 |
| WashU, P28, HbT | 2.43 | 1.66 | 1.21 | 0.269 | 0.188 | 0.179 |
| WashU, P35, HbT | 2.43 | 1.53 | 1.11 | 0.266 | 0.175 | 0.171 |
| WashU, P60, HbT | 2.57 | 1.64 | 1.18 | 0.341 | 0.211 | 0.241 |
| WashU, P15, Calcium | 2.49 | 1.69 | 1.25 | 0.273 | 0.192 | 0.192 |
| WashU, P22, Calcium | 2.90 | 1.95 | 1.44 | 0.331 | 0.237 | 0.243 |
| WashU, P28, Calcium | 2.343 | 1.56 | 1.19 | 0.287 | 0.200 | 0.209 |
| WashU, P35, Calcium | 2.41 | 1.67 | 1.17 | 0.264 | 0.192 | 0.191 |
| WashU, P60, Calcium | 2.33 | 1.66 | 1.17 | 0.276 | 0.205 | 0.218 |
Examining both the FPR and M as a pixel-wise function of RPV and ECSPV (Fig. 4), we see that, for uncorrected data, FPR and M are highly dependent on spatial smoothness with higher smoothness (higher ECSPV, lower RPV) resulting in more significant clusters (higher FPR or M). This effect is substantially attenuated by either RPV or ECSPV correction. After either type of correction, M is almost independent of local smoothness (the line in Fig. 4 is essentially flat). FPR still varies with local smoothness but to a much lesser extent, with ECSPV correction offering the most improved performance. This improvement is also evident in the WashU data (Supp. Fig. 6); in these data RPV and ECSPV correction perform nearly equivalently.
Figure 4.

Effect of smoothness on spatial bias, with and without correction. Shown are the pixel-wise false positive rate (FPR) and M-statistic as a function of each pixel’s smoothness as measured by resels per voxel (RPV) and the expected cluster size per voxel (ECSPV). For each value of RPV or ECSPV, the median FPR or M is displayed as a solid line, with the interquartile range shown by the dashed lines. Ideally, these lines would be perfectly horizontal, representing no spatial bias and even sensitivity across all pixels. As shown earlier, without correction (blue lines), smoother areas of the image (lower RPV, higher ECSPV) have dramatically higher false positive rates and M-statistics (i.e., lower p-values). Both RPV correction and ECSPV correction flatten the curves, with ECSPV correction generally more effective. (See also Supplemental Fig. 6 for WashU data).
Finally, we applied these methods to study changes in homotopic functional connectivity across mouse development by reanalyzing data from one of the experiments in Rahn et al. [2022]. Homotopic functional connectivity was calculated for each time point, and then adjacent ages were compared with paired t-tests. For comparison with the original paper, we calculated significant clusters and cluster p-values using (1) the random field theory algorithm used in the original paper [Brier & Culver, 2023], which includes bugs and statistical errors, (2) an improved random field algorithm [White, 2023], which still yields invalid p-values, and (3) permutation inference (both uncorrected and with the two correction algorithms) (Supp. Fig. 7). Correction for nonstationarity can have a substantial effect on cluster p-values (Fig. 5A); as expected, clusters with higher ECSPV or lower RPV (indicating higher smoothness) have their p-values adjusted toward less significant levels with correction (compare to Fig. 10 in Salimi-Khorshidi et al.). These corrections can affect determinations of statistical significance when a p-value is adjusted above or below 0.05 (Fig. 5B).
Figure 5.

Effect of nonstationarity correction on p-values in the homotopic functional connectivity experiment of Rahn et al. (A) Changes in p-value after correction as a function of each cluster’s average RPV or ECSPV. Each dot is one cluster, with data from all paired comparisons included. These results demonstrate that areas of high smoothness (low RPV or high ECSPV) have spuriously low p-values before correction, which are adjusted towards less significant levels after either RPV or ECSPV correction. (B) An example of how nonstationarity correction can change determinations of statistical significance. Shown are the significant clusters for the P35 vs P28 comparison. The frontal cortex cluster is significant in the initial analysis but does not meet criteria for significance after either RPV or ECSPV correction.
4. Discussion
Translation of neuroimaging findings from mouse models to humans requires robust statistical inference. However, to date there has been little research into how to detect changes in functional connectivity maps in mouse widefield optical imaging data. In this manuscript, we have addressed a problem with existing cluster-based approaches, namely nonstationarity in image roughness affecting cluster sizes. Importantly, we used two datasets from different institutions to help ensure that our results and recommendations are generalizable. As expected, nonstationarity has a strong effect on inference in WOI data; both the false positive rate and M-statistic vary substantially across the image under null hypothesis conditions. Both parametric and nonparametric correction (adjusting cluster sizes by either resels or expected cluster size, respectively) can counteract this effect and yield nearly even sensitivity across the field-of-view. In general, correction using the ECSPV was slightly superior to RPV correction and offers the computational benefit of not having to calculate and retain the normalized residuals from each permutation.
It is difficult to compare the images of spatial bias shown here to similar fMRI studies. Hayasaka et al. [2004] show only a single FWHM map, and Salimi-Khorshidi et al. [2011] show only FWHM and ECSPV maps for voxel-based morphometry data (which are presented without a scale for quantitative interpretation). Overall, our spatial standard deviations of the M-statistic and the improvements with correction are similar to those presented by Salimi-Khorshidi et al. In one difference from fMRI studies, we found only minor effects from varying the correction factor, E, in the ECSPV algorithm, settling on E=1 as a simple setting. As hypothesized by Salimi-Khorshidi et al., this may result from less skew in the cluster size distribution in two dimensions relative to three, but the data which led to a preference for E=2/3 for fMRI were not available for comparison.
Unlike fMRI, WOI suffers from imaging artifacts due to the overlying skull and surface vasculature. While WOI data are temporally normalized such that the absolute illumination intensity is relatively unimportant, superficial structures may cause neighboring pixels to have different time courses. For example, a pixel in a blood vessel samples more systemic physiology, and a pixel at a cranial suture has lower signal-to-noise due to specular reflections. Local spatial autocorrelation would be lower in these regions. We can see such effects in our data. The roughest regions in the CHOP data occur around large venous sinuses along the anterior midline and at the junction of the cerebrum and olfactory bulb. As a result, large clusters are almost never seen in these areas. Conversely, the smoothest regions correspond to large functional regions (e.g., somatosensory and motor cortex) where spatial autocorrelation is likely driven by the underlying neuronal architecture as well as fewer artifacts from superficial structures. It would be interesting to examine whether some of these effects can be ameliorated by preparations which remove the skull in favor of glass cranial windows [Kim, 2016; Sunil, 2021].
It is also important to note that the field-of-view in WOI does not encompass the entire brain, and functional regions in the brain may be cropped by the edge of the surgical exposure. Thus, the cluster size actually visualized by the system may be smaller than the ground truth. The methods described herein can help counteract this effect, adjusting for the fact that the expected cluster size at the image boundary will be smaller than that in the interior of the field-of-view.
Interestingly, the WashU data shows a very different pattern of spatial roughness than the CHOP data. In particular, over the course of the longitudinal study, a large smooth region appears in the parietal cortex which has a higher false positive rate than any other brain region. The origin of this phenomenon is unclear, but we offer three hypotheses. It may result from continued skull growth underneath a chronic intact-skull cranial window. The cemented cranial window may affect some skull regions differently than others, resulting in image artifacts not present in cranial windows attached to mice whose growth has completed (e.g., the CHOP data). Alternatively, it may be due to differences in arousal between anesthetized mice (CHOP data) and awake mice (WashU data). Arousal state has been shown to have a substantial effect on resting-state brain activity, varying spatially across the brain [Reimer, 2014; Reimer, 2016; Musall, 2019; Turner, 2020; West, 2022; Liu, 2021; Raut, 2021; Wang, 2022b]. Similarly, differences in local roughness may reflect changes in brain activity or organization during development, although in this case we would expect the P60 WashU data to be similar to the CHOP data, which it was not. As there were multiple methodological differences between the CHOP and WashU studies, possible neurologic or physiologic reasons for differences could not be fully investigated. Regardless of the underlying cause, this phenomenon adds another layer of complexity to studies which compare two longitudinal time points or two behavioral states. The same brain region may have different noise properties in the two conditions analyzed, which may affect statistical conclusions. Furthermore, image roughness differed between hemodynamic data and calcium fluorescence data from the same acquisitions, which merits further investigation.
As spatial roughness (and the resulting patterns in FPR and M) varies so markedly between imaging systems and contrasts, it is clear how this could result in difficulty with replication of preclinical imaging biomarker studies. In the absence of correction for nonstationarity, a true positive noted by one experiment may not be noted by another, not due to neurological reasons, but rather due to differences in image roughness affecting the statistical threshold for significance in each instance. As the use of correction for the spatial biases demonstrated above resulted in an even sensitivity for all data sets examined, these methods may aid in the detection of results that are not idiosyncratic to the particulars of an imaging system.
Across the field of WOI (and functional neuroimaging more generally), many methods are used to assess for differences in RSFC or in neuronal dynamics and then correct for multiple comparisons. These include the cluster-based methods evaluated here [Rahn, 2022; Brier & Culver, 2023] as well as false discovery rate (FDR) [Hakon, 2018], region-of-interest (ROI) analysis [Quarta, 2022], principal/independent component analysis (PCA, ICA) [Bice, 2022; West, 2022], and Bonferroni correction using a derived model order [Mitra, 2018]. Across all methods, it is important to ask fundamental statistical questions. Given the data used, do these methods adequately control the FWER or FDR as expected? Do they introduce spatial bias? In both fMRI and WOI, commonly used statistical algorithms have been found to contain bugs or to produce invalid p-values [Eklund, 2016; Chen, 2018; Smith, 2018; Eklund, 2020; Marek, 2021; White, 2023]. Thus, it is important to rigorously assess all statistical methods using true null hypothesis data. Not all techniques may apply equally well to fMRI and WOI. For example, the network-based statistic (NBS) [Zalesky, 2010] has been increasingly used for WOI [Hakon, 2018; Rahn, 2023] despite the entire brain network not being visible to WOI. The complications of performing NBS on a partial network, which may differ between groups, have not been investigated. The present work attempts to address one specific question for statistical analysis of WOI data, bringing the problem of spatial nonstationarity to the attention of the WOI community, and robustly testing methods for mitigating bias.
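As a minimal illustration of what assessment with true null hypothesis data can involve, the Python sketch below summarizes a set of binary significance maps obtained from null experiments. It reports the empirical FWER (the fraction of null experiments with any significant pixel) and a per-pixel false positive rate map, which exposes spatial bias. The function names and the toy maps are hypothetical, and the sketch assumes the significance maps have already been produced by whichever inference method is under test.

```python
def empirical_fwer(null_maps):
    """Fraction of null experiments containing at least one significant pixel."""
    hits = sum(1 for m in null_maps if any(any(row) for row in m))
    return hits / len(null_maps)


def false_positive_rate_map(null_maps):
    """Per-pixel fraction of null experiments in which the pixel was significant."""
    n = len(null_maps)
    rows, cols = len(null_maps[0]), len(null_maps[0][0])
    return [[sum(m[r][c] for m in null_maps) / n for c in range(cols)]
            for r in range(rows)]


# Four hypothetical null experiments on a 1 x 3 pixel image
# (1 = pixel declared significant, 0 = not significant).
null_maps = [
    [[0, 0, 0]],
    [[1, 0, 0]],
    [[1, 1, 0]],
    [[0, 0, 0]],
]

fwer = empirical_fwer(null_maps)
fpr_map = false_positive_rate_map(null_maps)
```

Here half of the null experiments produce a false positive somewhere in the image, and the false positives concentrate in the leftmost pixel. A spatially unbiased method would instead give a roughly uniform false positive rate map.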
While we limited our analysis to WOI data in mice, the concepts herein are applicable to other forms of optical imaging, including functional near-infrared spectroscopy (NIRS) and diffuse optical tomography (DOT). Image nonstationarity is likely of greater concern for those modalities as, for both NIRS and DOT, image resolution (and thus local smoothness in the image reconstruction) is highly variable over the field-of-view and strongly dependent on the optode positioning. NIRS and DOT additionally share with WOI the problem that their field-of-view is limited and clusters may be clipped at the edge of the visualized brain. A prior adaptation of RFT to DOT [Hassanpour, 2014] measured RPV at each pixel and acknowledged a highly variable and skewed distribution across the field-of-view. However, this measurement was only used to estimate the total number of resels across the brain (subsequently used in RFT formulas to determine the cluster size threshold for significance). It was not used for any post hoc correction of cluster sizes. Other applications of RFT to NIRS [Ye, 2009; Tak, 2014] do not seem to have considered nonstationarity in their data. ECSPV as developed by Salimi-Khorshidi et al. is simple to implement, and we have shown that it is applicable to optical systems measuring both hemodynamics and fluorescence. Further adaptation to human optical imaging would be straightforward.
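The distinction between the two uses of RPV described above can be sketched in a few lines of Python. The example assumes the standard RFT definition of RPV in two dimensions as the reciprocal of the product of the local FWHM values (in pixels), and it measures a cluster in resels by summing RPV over its pixels, as in nonstationary cluster-size inference [Hayasaka, 2004]. The function names and the toy FWHM values are hypothetical.

```python
def rpv_map(fwhm_x, fwhm_y):
    """Resels per pixel in 2D: 1 / (FWHM_x * FWHM_y), with FWHM in pixels."""
    return [[1.0 / (fx * fy) for fx, fy in zip(row_x, row_y)]
            for row_x, row_y in zip(fwhm_x, fwhm_y)]


def total_resels(rpv, brain_mask):
    """Global resel count: sum of RPV over all pixels inside the brain mask."""
    return sum(value
               for rpv_row, mask_row in zip(rpv, brain_mask)
               for value, inside in zip(rpv_row, mask_row) if inside)


def cluster_size_in_resels(rpv, pixels):
    """Nonstationarity-corrected cluster size: sum of RPV over cluster pixels."""
    return sum(rpv[y][x] for y, x in pixels)


# Toy isotropic FWHM map: a smooth left half (FWHM = 4 pixels)
# and a rough right half (FWHM = 2 pixels).
fwhm = [[4.0, 4.0, 2.0, 2.0],
        [4.0, 4.0, 2.0, 2.0]]
brain_mask = [[True] * 4 for _ in range(2)]

rpv = rpv_map(fwhm, fwhm)
n_resels = total_resels(rpv, brain_mask)

# Two clusters with the same raw size (2 pixels) in different regions.
smooth_cluster = [(0, 0), (1, 0)]
rough_cluster = [(0, 3), (1, 3)]
smooth_size = cluster_size_in_resels(rpv, smooth_cluster)
rough_size = cluster_size_in_resels(rpv, rough_cluster)
```

Using only `n_resels` yields a single global threshold, whereas the per-cluster sums show that a 2-pixel cluster in the rough region spans four times as many resels as a 2-pixel cluster in the smooth region.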
It is important to note that cluster size correction occurs after the clusters themselves have been defined. These methods thus cannot correct for any imaging problems which may cause errors in cluster location or border placement (e.g., from poor signal-to-noise due to insufficient scan length or improper spectroscopy or fluorescence correction). Additionally, cluster-based inference is not always the ideal method to detect changes in neuroimaging data. Cluster-based inference includes a trade-off where local information is sacrificed in order to gain statistical power. Strong, but very localized changes may not result in cluster sizes large enough to be detected (with or without cluster size correction) even if they are true positives. The inference method used must be chosen appropriately based on the imaging question. But when cluster-based inference is used in WOI, cluster size correction is a valuable tool to improve robustness.
In conclusion, spatial nonstationarity in image noise has a strong effect on inference in WOI data, and this effect varies between imaging systems and contrasts. Both parametric and nonparametric methods are effective at reducing spatial bias, with nonparametric correction slightly superior when measured across different data sets. These methods are simple to implement and will thus likely aid in the detection of relevant neuroimaging findings in WOI data.
Supplementary Material
Highlights
Cluster-based inference in neuroimaging is highly sensitive to local smoothness.
A first analysis of spatial nonstationarity in WOI, using multi-institutional data.
Nonstationarity results in substantial spatial bias in RSFC inference.
Both parametric and nonparametric methods are effective at mitigating bias.
Adjusting clusters to remove bias is important for robust biomarker development.
Funding
This work was funded by the National Institute of Neurological Disorders and Stroke (Grant Nos. K08NS117897, R01NS060910, and R01NS112274) and the National Institute of Mental Health (Grant Nos. R01MH123563 and R01MH123550). Additional support was provided by the Research Institute and Cardiac Center at the Children’s Hospital of Philadelphia.
Abbreviations:
- ECSPV: expected cluster size per voxel
- FPR: false positive rate
- FWER: familywise error rate
- RFT: random field theory
- RPV: resels per voxel
- RSFC: resting-state functional connectivity
- WOI: widefield optical imaging
Declaration of Competing Interests
The authors have no relevant potential conflicts of interest.
References
- Ayaz H, et al. (2022). Optical imaging and spectroscopy for the study of the human brain status report. Neurophotonics, 9, S24001. 10.1117/1.NPh.9.S2.S24001
- Bice AR, et al. (2022). Homotopic contralesional excitation suppresses spontaneous circuit repair and global network reconnections following ischemic stroke. eLife, 11, e68852. 10.7554/eLife.68852
- Brier LM & Culver JP (2023). Open-source statistical and data processing tools for wide-field optical imaging data in mice. Neurophotonics, 10, 016601. 10.1117/1.NPh.10.1.016601
- Chen G, et al. (2018). A tail of two sides: Artificially doubled false positive rates in neuroimaging due to the sidedness choice with t-tests. Hum Brain Mapp, 40, 1037–1043. 10.1002/hbm.24399
- Eklund A, Nichols TE, & Knutsson H (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proc Nat Acad Sci U S A, 113, 7900–7905. www.pnas.org/cgi/doi/10.1073/pnas.1612033113
- Eklund A, et al. (2020). How does group differences in motion scrubbing affect false positives in functional connectivity studies? bioRxiv, 2020.02.12.944454. 10.1101/2020.02.12.944454
- Friston KJ, et al. (1994). Assessing the significance of focal activations using their spatial extent. Hum Brain Mapp, 1, 210–220. 10.1002/hbm.460010306
- Hakon J, et al. (2018). Multisensory stimulation improves functional recovery and resting-state functional connectivity in the mouse brain after stroke. Neuroimage Clin, 17, 717–730. 10.1016/j.nicl.2017.11.022
- Hassanpour MS, et al. (2014). Statistical analysis of high density diffuse optical tomography. Neuroimage, 85, 104–116. 10.1016/j.neuroimage.2013.05.105
- Hayasaka S, et al. (2004). Nonstationary cluster-size inference with random field and permutation methods. Neuroimage, 22, 676–687. 10.1016/j.neuroimage.2004.01.041
- Kiebel SJ, et al. (1999). Robust smoothness estimation in statistic parametric maps using standardized residuals from the general linear model. Neuroimage, 10, 756–766. 10.1006/nimg.1999.0508
- Kim WH, et al. (2016). Long-Term Optical Access to an Estimated One Million Neurons in the Live Mouse Cortex. Cell Rep, 17, 3385–3394. 10.1016/j.celrep.2016.12.004
- Liu X, et al. (2021). Subcortical evidence for a contribution of arousal to fMRI studies of brain activity. Nat Comm, 9, 395. 10.1038/s41467-017-02815-3
- Ma Y, et al. (2016). Wide-field optical mapping of neural activity and brain haemodynamics: considerations and novel approaches. Phil Trans R Soc B, 371, 20150360. 10.1098/rstb.2015.0360
- Marek S, et al. (2021). Reproducible brain-wide association studies require thousands of individuals. Nature, 603, 654–660. 10.1038/s41586-022-04492-9
- Mitra A, et al. (2018). Spontaneous Infra-slow Brain Activity Has Unique Spatiotemporal Dynamics and Laminar Structure. Neuron, 98, 297–305. 10.1016/j.neuron.2018.03.015
- Musall S, et al. (2019). Single-trial neural dynamics are dominated by richly varied movements. Nat Neurosci, 22, 1677–1686. 10.1038/s41593-019-0502-4
- Nichols T & Hayasaka S (2003). Controlling the familywise error rate in functional neuroimaging: a comparative review. Stat Meth Med Res, 12, 419–446. 10.1191/0962280203sm341ra
- Padawer-Curry JA, et al. (2021). Variability in atlas registration of optical intrinsic signal imaging and its effect on functional connectivity analysis. J Opt Soc Am A, 38, 245–252. 10.1364/JOSAA.410447
- Petersson KM, et al. (1999). Statistical limitations in functional neuroimaging II: signal detection and statistical inference. Phil Trans R Soc B, 354, 1261–1281. 10.1098/rstb.1999.0478
- Quarta E, et al. (2022). Distributed and Localized Dynamics Emerge in the Mouse Neocortex during Reach-to-Grasp Behavior. J Neurosci, 42, 777–788. 10.1523/JNEUROSCI.0762-20.2021
- Rahn RM, et al. (2022). Functional connectivity of the developing mouse cortex. Cereb Cortex, 32, 1755–1768. 10.1093/cercor/bhab312
- Rahn RM, et al. (2023). Mecp2 deletion results in profound alterations of developmental and adult functional connectivity. Cereb Cortex, 33, 7436–7453. 10.1093/cercor/bhad050
- Raut RV, et al. (2021). Global waves synchronize the brain’s functional systems with fluctuating arousal. Sci Adv, 7, eabf2709. 10.1126/sciadv.abf2709
- Reimer J, et al. (2014). Pupil fluctuations track fast switching of cortical states during quiet wakefulness. Neuron, 84, 355–362. 10.1016/j.neuron.2014.09.033
- Reimer J, et al. (2016). Pupil fluctuations track rapid changes in adrenergic and cholinergic activity in cortex. Nat Comm, 7, 13289. 10.1038/ncomms13289
- Salimi-Khorshidi G, et al. (2011). Adjusting the effect of nonstationarity in cluster-based and TFCE inference. Neuroimage, 54, 2006–2019. 10.1016/j.neuroimage.2010.09.088
- Silasi G, et al. (2016). Intact skull chronic windows for mesoscopic wide-field imaging in awake mice. J Neurosci Meth, 267, 141–149. 10.1016/j.jneumeth.2016.04.012
- Smith SM & Nichols TE (2018). Statistical Challenges in “Big Data” Human Neuroimaging. Neuron, 97, 263–268. 10.1016/j.neuron.2017.12.018
- Sunil S, et al. (2021). Stroke core revealed by tissue scattering using spatial frequency domain imaging. Neuroimage Clin, 29, 102539. 10.1016/j.nicl.2020.102539
- Tak S & Ye JC (2014). Statistical analysis of fNIRS data: A comprehensive review. Neuroimage, 85, 72–91. 10.1016/j.neuroimage.2013.06.016
- Turner KL, et al. (2020). Neurovascular coupling and bilateral connectivity during NREM and REM sleep. eLife, 9, e62071. 10.7554/eLife.62071
- Wang Y, LeDue JM, & Murphy TH (2022a). Multiscale imaging informs translational mouse modeling of neurological disease. Neuron, 110, 3688–3710. 10.1016/j.neuron.2022.09.006
- Wang Z, et al. (2022b). REM sleep is associated with distinct global cortical dynamics and controlled by occipital cortex. Nat Comm, 13, 6896. 10.1038/s41467-022-34720-9
- West SL, et al. (2022). Wide-Field Calcium Imaging of Dynamic Cortical Networks during Locomotion. Cereb Cortex, 32, 2668–2687. 10.1093/cercor/bhab373
- White BR, et al. (2011). Imaging of functional connectivity in the mouse brain. PLoS One, 6, e16322. 10.1371/journal.pone.0016322
- White BR, et al. (2012). Bedside optical imaging of occipital cortex resting-state functional connectivity in neonates. Neuroimage, 59, 2529–2538. 10.1016/j.neuroimage.2011.08.094
- White BR, et al. (2019). Brain segmentation, spatial censoring, and averaging techniques for optical functional connectivity imaging in mice. Biomed Opt Exp, 10, 5952–5973. 10.1364/BOE.10.005952
- White BR, et al. (2021). Wavelength censoring for spectroscopy in optical functional neuroimaging. Phys Med Biol, 66, 065026. 10.1088/1361-6560/abd418
- White BR, et al. (2023). Controlling the familywise error rate in widefield optical neuroimaging of functional connectivity in mice. Neurophotonics, 10, 015004. 10.1117/1.NPh.10.1.015004
- Worsley KJ, et al. (1992). A three-dimensional statistical analysis for CBF activation studies in human brain. J Cereb Blood Flow Metab, 12, 900–918. 10.1038/jcbfm.1992.127
- Worsley KJ, et al. (1999). Detecting changes in nonisotropic images. Hum Brain Mapp, 8, 98–101. 10.1002/(SICI)1097-0193(1999)8:2/3<98::AID-HBM5>3.0.CO;2-F
- Ye JC, et al. (2009). NIRS-SPM: Statistical parametric mapping for near-infrared spectroscopy. Neuroimage, 44, 428–447. 10.1016/j.neuroimage.2008.08.036
- Zalesky A, Fornito A, & Bullmore ET (2010). Network-based statistic: Identifying differences in brain networks. Neuroimage, 53, 1197–1207. 10.1016/j.neuroimage.2010.06.041
