Direct extraction of signal and noise correlations from two-photon calcium imaging of ensemble neuronal activity

Anuththara Rupasinghe; Nikolas Francis; Ji Liu; Zac Bowen; Patrick O Kanold; Behtash Babadi

doi:10.7554/eLife.68046

eLife. 2021 Jun 28;10:e68046. doi: 10.7554/eLife.68046

Editors: Brice Bathellier, Barbara G Shinn-Cunningham

PMCID: PMC8354639 PMID: 34180397

Abstract

Neuronal activity correlations are key to understanding how populations of neurons collectively encode information. While two-photon calcium imaging has created a unique opportunity to record the activity of large populations of neurons, existing methods for inferring correlations from these data face several challenges. First, the observations of spiking activity produced by two-photon imaging are temporally blurred and noisy. Secondly, even if the spiking data were perfectly recovered via deconvolution, inferring network-level features from binary spiking data is a challenging task due to the non-linear relation of neuronal spiking to endogenous and exogenous inputs. In this work, we propose a methodology to explicitly model and directly estimate signal and noise correlations from two-photon fluorescence observations, without requiring intermediate spike deconvolution. We provide theoretical guarantees on the performance of the proposed estimator and demonstrate its utility through applications to simulated and experimentally recorded data from the mouse auditory cortex.

Research organism: Mouse

Introduction

Neuronal activity correlations are essential in understanding how populations of neurons encode information. Correlations provide insights into the functional architecture and computations carried out by neuronal networks (Abbott and Dayan, 1999; Averbeck et al., 2006; Cohen and Kohn, 2011; Hansen et al., 2012; Kohn et al., 2016; Kohn and Smith, 2005; Lyamzin et al., 2015; Montijn et al., 2014; Smith and Sommer, 2013; Sompolinsky et al., 2001; Yatsenko et al., 2015). Neuronal activity correlations are often categorized in two groups: signal correlations and noise correlations (Cohen and Kohn, 2011; Cohen and Maunsell, 2009; Gawne and Richmond, 1993; Josić et al., 2009; Lyamzin et al., 2015; Vinci et al., 2016). Given two neurons, signal correlation quantifies the similarity of neural responses that are time-locked to a repeated stimulus across trials, whereas noise correlation quantifies the stimulus-independent trial-to-trial variability shared by neural responses that are believed to arise from common latent inputs.

Two-photon calcium imaging has become increasingly popular in recent years to record in vivo neural activity simultaneously from hundreds of neurons (Ahrens et al., 2013; Romano et al., 2017; Stosiek et al., 2003; Svoboda and Yasuda, 2006). This technology takes advantage of intracellular calcium flux mostly arising from spiking activity and captures calcium signaling in neurons in living animals using fluorescence microscopy. The observed fluorescence traces of calcium concentrations, however, are indirectly related to neuronal spiking activity. Extracting spiking activity from fluorescence traces is a challenging signal deconvolution problem and has been the focus of active research (Deneux et al., 2016; Friedrich et al., 2017; Grewe et al., 2010; Jewell et al., 2020; Jewell and Witten, 2018; Kazemipour et al., 2018; Pachitariu et al., 2018; Pnevmatikakis et al., 2016; Stringer and Pachitariu, 2019; Theis et al., 2016; Vogelstein et al., 2009; Vogelstein et al., 2010).

The most commonly used approach to infer signal and noise correlations from two-photon data is to directly apply the classical definitions of correlations for firing rates (Lyamzin et al., 2015) to fluorescence traces (Fallani et al., 2015; Francis et al., 2018; Rothschild et al., 2010; Winkowski and Kanold, 2013). However, it is well known that fluorescence observations are noisy and blurred surrogates of spiking activity, because of dependence on observation noise, calcium dynamics and the temporal properties of calcium indicators. Due to temporal blurring, the resulting signal and noise correlation estimates are highly biased. An alternative approach is to carry out the inference in a two-stage fashion: first, infer spikes using a deconvolution technique, and then compute firing rates and evaluate the correlations (Kerlin et al., 2019; Najafi et al., 2020; Ramesh et al., 2018; Soudry et al., 2015; Yatsenko et al., 2015). These two-stage estimates are highly sensitive to the accuracy of spike deconvolution, and require high temporal resolution and signal-to-noise ratios (Lütcke et al., 2013; Pachitariu et al., 2018). Furthermore, these deconvolution techniques are biased toward obtaining accurate first-order statistics (i.e. spike timings) via spatiotemporal priors, which may be detrimental to recovering second-order statistics (i.e. correlations). Finally, both approaches overlook the non-linear dynamics of spiking activity as governed by stimuli, past activity and other latent processes (Truccolo et al., 2005). A few existing studies aim to improve the estimation of neuronal correlations, but they either do not consider signal correlations (Rupasinghe and Babadi, 2020; Yatsenko et al., 2015), or aim at estimating surrogates of correlations from spikes such as the connectivity/coupling matrix (Aitchison et al., 2017; Mishchenko et al., 2011; Soudry et al., 2015; Keeley et al., 2020).

Here, we propose a methodology to directly estimate both signal and noise correlations from two-photon imaging observations, without requiring an intermediate step of spike deconvolution. We pose the problem under the commonly used experimental paradigm in which neuronal activity is recorded during trials of a repeated stimulus. We avoid the need to perform spike deconvolution by integrating techniques from point processes and state-space modeling that explicitly relate the signal and noise correlations to the observed fluorescence traces in a multi-tier model. Thus, we cast signal and noise correlations within a parameter estimation setting. To solve the resulting estimation problem in an efficient fashion, we develop a solution method based on variational inference (Jordan et al., 1999; Blei et al., 2017), by combining techniques from Pólya-Gamma augmentation (Polson et al., 2013) and compressible state-space estimation (Rauch et al., 1965; Kazemipour et al., 2018; Ba et al., 2014). We also provide theoretical guarantees on the bias and variance performance of the resulting estimator.

We demonstrate the utility of our proposed estimation framework through application to simulated and real data from the mouse auditory cortex during presentations of tones and acoustic noise. In application to repeated trials under spontaneous and stimulus-driven conditions within the same experiment, our method reliably provides noise correlation structures that are invariant across the two conditions. In addition, our joint analysis of signal and noise correlations corroborates existing hypotheses regarding the distinction between their structures (Keeley et al., 2020; Rumyantsev et al., 2020; Bartolo et al., 2020). Moreover, while application of our proposed method to spatial analysis of signal and noise correlations in the mouse auditory cortex is consistent with existing work (Winkowski and Kanold, 2013), it reveals novel and distinct spatial trends in the correlation structure of layers 2/3 and 4. In summary, our method improves on existing work by: (1) explicitly modeling the fluorescence observation process and the non-linearities involved in spiking activity, as governed by both the stimulus and latent processes, through a multi-tier Bayesian forward model, (2) joint estimation of signal and noise correlations directly from two-photon fluorescence observations through an efficient iterative procedure, without requiring intermediate spike deconvolution, (3) providing theoretical guarantees on the performance of the proposed estimator, and (4) gaining access to closed-form posterior approximations, with low-complexity and iterative update rules and minimal dependence on training data. Our proposed method can thus be used as a robust and scalable alternative to existing approaches for extracting signal and noise correlations from two-photon imaging data.

Results

In this section, we first demonstrate the utility of our proposed estimation framework through simulation studies as well as applications to experimentally recorded data from the mouse auditory cortex. Then, we present theoretical performance bounds on the proposed estimator. Before presenting the results, we give an overview of the proposed signal and noise correlation inference framework, and outline our contributions and their relationship to existing work. For ease of reproducibility, we have archived a MATLAB implementation of our proposed method on GitHub (Rupasinghe, 2020) and have deposited the data used in this work in the Digital Repository at the University of Maryland (Rupasinghe et al., 2021).

Signal and noise correlations

We consider a canonical experimental setting in which the same external stimulus, denoted by $𝐬_{t}$, is repeatedly presented across $L$ independent trials and the spiking activity of a population of $N$ neurons is indirectly measured using two-photon calcium fluorescence imaging. Figure 1 (forward arrow) shows the generative model that is used to quantify this procedure. The fluorescence observation in the $l^{𝗍𝗁}$ trial from the $j^{𝗍𝗁}$ neuron at time frame $t$, denoted by $y_{t, l}^{(j)}$, is a noisy surrogate of the intracellular calcium concentrations. The calcium concentrations in turn are temporally blurred surrogates of the underlying spiking activity $n_{t, l}^{(j)}$, as shown in Figure 1.

Figure 1. — Observed (green) and latent (orange) variables pertinent to the $j^{𝗍𝗁}$ neuron are indicated, according to the proposed model for estimating the signal (blue) and noise (red) correlations from two-photon calcium fluorescence observations. Calcium fluorescence traces $(y_{t, l}^{(j)})$ of $L$ trials are observed, in which the repeated external stimulus $(𝐬_{t})$ is known. The underlying spiking activity $(n_{t, l}^{(j)})$ , trial-to-trial variability and other intrinsic/extrinsic neural covariates that are not time-locked with the external stimulus $(x_{t, l}^{(j)})$ , and the stimulus kernel $(𝐝_{j})$ are latent. Our main contribution is to solve the inverse problem: recovering the underlying latent signal $(𝐒)$ and noise $(𝐍)$ correlations directly from the fluorescence observations, without requiring intermediate spike deconvolution.

In modeling the spiking activity, we consider two main contributions: (1) the common known stimulus $𝐬_{t}$ affects the activity of the $j^{𝗍𝗁}$ neuron via an unknown kernel $𝐝_{j}$ , akin to the receptive field; (2) the trial-to-trial variability and other intrinsic/extrinsic neural covariates that are not time-locked to the stimulus $𝐬_{t}$ are captured by a trial-dependent latent process $x_{t, l}^{(j)}$ . Then, we use a Generalized Linear Model to link these underlying neural covariates to spiking activity (Truccolo et al., 2005). More specifically, we model spiking activity as a Bernoulli process:

n_{t, l}^{(j)} \sim Bernoulli (ϕ (x_{t, l}^{(j)}, 𝐝_{j}^{⊤} 𝐬_{t})),

where $ϕ (\cdot)$ is a mapping function, which could in general be non-linear.
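As an illustration of this spiking model, the following minimal Python sketch draws Bernoulli spikes for one neuron in one trial. It assumes that $ϕ$ is the logistic function applied to the sum $x_{t, l}^{(j)} + 𝐝_{j}^{⊤} 𝐬_{t}$, which is only one possible choice of the mapping. The function names, the toy stimulus, the baseline of the latent process and the kernel values are hypothetical and serve only to make the example runnable.

```python
import math
import random


def logistic(u):
    """Numerically stable logistic function, mapping a real drive to a probability."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def simulate_spikes(x, d, s, rng):
    """Draw n_t ~ Bernoulli(logistic(x_t + d^T s_t)) for one neuron and one trial.

    x   : list of T latent-process values x_t (not time-locked to the stimulus)
    d   : stimulus kernel, a list of length R
    s   : list of T stimulus vectors s_t, each of length R
    rng : random.Random instance, passed in for reproducibility
    """
    spikes = []
    for x_t, s_t in zip(x, s):
        drive = x_t + sum(d_k * s_k for d_k, s_k in zip(d, s_t))
        spikes.append(1 if rng.random() < logistic(drive) else 0)
    return spikes


# Toy inputs (all values are assumptions made for illustration only).
rng = random.Random(0)
T = 1000
s = [[math.sin(0.05 * t), math.sin(0.05 * (t - 1))] for t in range(T)]
x = [-3.0 + rng.gauss(0.0, 0.5) for _ in range(T)]  # low baseline, so spiking is sparse
d = [1.5, 0.5]
n = simulate_spikes(x, d, s, rng)
```

Passing the random number generator explicitly keeps repeated trials reproducible; independent trials would reuse the same `s` and `d` with a fresh draw of `x`.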

The signal correlations aim to measure the correlations in the temporal response that are time-locked to the repeated stimulus, $𝐬_{t}$ . On the other hand, noise correlations in our setting quantify connectivity arising from covariates that are unrelated to the stimulus, including the trial-to-trial variability (Keeley et al., 2020). Based on the foregoing model, we propose to formulate the signal $({(𝚺_{s})}_{i, j})$ and noise $({(𝚺_{x})}_{i, j})$ covariance between the $i^{𝗍𝗁}$ neuron and $j^{𝗍𝗁}$ neuron as:

{(𝚺_{s})}_{i, j} := 𝐝_{i}^{⊤} cov (𝐬_{t}, 𝐬_{t}) 𝐝_{j}, {(𝚺_{x})}_{i, j} := cov (x_{t, l}^{(i)}, x_{t, l}^{(j)}),

(1)

where $cov (\cdot, \cdot)$ is the empirical covariance function defined for two vector time series $u_{t}$ and $v_{t}$ as $cov (u_{t}, v_{t}) := \frac{1}{T} \sum_{t = 1}^{T} (u_{t} - \frac{1}{T} \sum_{t^{'} = 1}^{T} u_{t^{'}}) {(v_{t} - \frac{1}{T} \sum_{t^{'} = 1}^{T} v_{t^{'}})}^{⊤}$ , for a total observation duration of $T$ time frames.
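To make Equation 1 concrete, the sketch below evaluates the empirical covariance with the $\frac{1}{T}$ normalization defined above and uses it to compute one entry of the signal and noise covariances. It is an illustration only: the kernels and latent processes are latent in practice, so the inputs here are toy values chosen by hand, the function names are hypothetical, and the noise covariance is computed within a single trial.

```python
def empirical_cov(u, v):
    """Empirical covariance matrix of two vector time series (1/T normalization).

    u, v : lists of T vectors (lists of floats).
    Returns a len(u[0]) x len(v[0]) matrix as a list of lists.
    """
    T = len(u)
    mean_u = [sum(col) / T for col in zip(*u)]
    mean_v = [sum(col) / T for col in zip(*v)]
    cov = [[0.0] * len(mean_v) for _ in mean_u]
    for u_t, v_t in zip(u, v):
        for a in range(len(mean_u)):
            for b in range(len(mean_v)):
                cov[a][b] += (u_t[a] - mean_u[a]) * (v_t[b] - mean_v[b]) / T
    return cov


def signal_cov(d_i, d_j, s):
    """(Sigma_s)_{i,j} = d_i^T cov(s_t, s_t) d_j."""
    c = empirical_cov(s, s)
    return sum(d_i[a] * c[a][b] * d_j[b]
               for a in range(len(d_i)) for b in range(len(d_j)))


def noise_cov(x_i, x_j):
    """(Sigma_x)_{i,j} for two scalar latent series observed in the same trial."""
    return empirical_cov([[v] for v in x_i], [[v] for v in x_j])[0][0]


# Toy example: a two-dimensional stimulus and two hand-picked kernels.
s = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
d_1, d_2 = [2.0, 0.0], [0.0, 2.0]
sigma_s_11 = signal_cov(d_1, d_1, s)
sigma_s_12 = signal_cov(d_1, d_2, s)
sigma_x_12 = noise_cov([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```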

Our main contribution is to provide an efficient solution for the so-called inverse problem: direct estimation of $𝚺_{s}$ and $𝚺_{x}$ from the fluorescence observations, without requiring intermediate spike deconvolution (Figure 1, backward arrow). The signal and noise correlation matrices, denoted by $𝐒$ and $𝐍$ , can then be obtained by standard normalization of $𝚺_{s}$ and $𝚺_{x}$ :

{(𝐒)}_{i, j} := \frac{{(𝚺_{s})}_{i, j}}{\sqrt{{(𝚺_{s})}_{i, i} . {(𝚺_{s})}_{j, j}}}, {(𝐍)}_{i, j} := \frac{{(𝚺_{x})}_{i, j}}{\sqrt{{(𝚺_{x})}_{i, i} . {(𝚺_{x})}_{j, j}}}, \forall i, j = 1, 2, \dots, N .

(2)
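The normalization in Equation 2 can be sketched in a few lines of Python. The function name and the example matrix are hypothetical, and the sketch assumes that every diagonal entry of the covariance matrix is strictly positive (a neuron with zero variance would cause a division by zero).

```python
import math


def cov_to_corr(sigma):
    """Normalize a covariance matrix to a correlation matrix, as in Equation 2.

    sigma : square matrix (list of lists) with strictly positive diagonal.
    """
    n = len(sigma)
    return [[sigma[i][j] / math.sqrt(sigma[i][i] * sigma[j][j]) for j in range(n)]
            for i in range(n)]


# Hand-picked 2 x 2 covariance matrix, used for illustration only.
sigma = [[4.0, 2.0],
         [2.0, 9.0]]
corr = cov_to_corr(sigma)
```

The same function applies to both $𝚺_{s}$ and $𝚺_{x}$, since the two matrices are normalized in the same way.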

We note that when spiking activity is directly observed using electrophysiology recordings, the conventional signal $({(𝚺_{s}^{𝖼𝗈𝗇})}_{i, j})$ and noise $({(𝚺_{x}^{𝖼𝗈𝗇})}_{i, j})$ covariances of spiking activity between the $i^{𝗍𝗁}$ and $j^{𝗍𝗁}$ neuron are defined as (Lyamzin et al., 2015):

{(𝚺_{s}^{𝖼𝗈𝗇})}_{i, j} := cov (\frac{1}{L} \sum_{l = 1}^{L} n_{t, l}^{(i)}, \frac{1}{L} \sum_{l = 1}^{L} n_{t, l}^{(j)}), {(𝚺_{x}^{𝖼𝗈𝗇})}_{i, j} := \frac{1}{L} \sum_{l = 1}^{L} cov (n_{t, l}^{(i)} - \frac{1}{L} \sum_{l^{'} = 1}^{L} n_{t, l^{'}}^{(i)}, n_{t, l}^{(j)} - \frac{1}{L} \sum_{l^{'} = 1}^{L} n_{t, l^{'}}^{(j)}),

(3)

which after standard normalization in Equation 2 give the conventional signal $({(𝐒^{𝖼𝗈𝗇})}_{i, j})$ and noise $({(𝐍^{𝖼𝗈𝗇})}_{i, j})$ correlations. While at first glance, our definitions of signal and noise covariances in Equation 1 seem to be a far departure from the conventional ones in Equation 3, we show that the conventional notions of correlation indeed approximate the same quantities as in our definitions:

𝐒^{𝖼𝗈𝗇} \approx 𝐒 and 𝐍^{𝖼𝗈𝗇} \approx 𝐍,

under asymptotic conditions (i.e. $T$ and $L$ sufficiently large). We prove this assertion of asymptotic equivalence in Appendix 1, which highlights another facet of our contributions: our proposed estimators are designed to robustly operate in the regime of finite (and typically small) $T$ and $L$ , aiming for the very same quantities that the conventional estimators could only recover accurately under ideal asymptotic conditions.
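For comparison, the conventional definitions in Equation 3 can be sketched as follows for a single pair of neurons. The spike arrays are indexed as `n[l][t]` (trial, then time frame); this layout, the function names and the toy spike trains are assumptions made for the example.

```python
def scalar_cov(u, v):
    """Empirical covariance of two scalar time series (1/T normalization)."""
    T = len(u)
    mean_u, mean_v = sum(u) / T, sum(v) / T
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / T


def trial_average(n):
    """Average response over trials at each time frame; n is indexed as n[l][t]."""
    L = len(n)
    return [sum(frame) / L for frame in zip(*n)]


def conventional_cov(n_i, n_j):
    """Conventional signal and noise covariances of Equation 3 for neurons i and j."""
    avg_i, avg_j = trial_average(n_i), trial_average(n_j)
    # Signal covariance: covariance over time of the trial-averaged responses.
    signal = scalar_cov(avg_i, avg_j)
    # Noise covariance: covariance of the residuals around the trial average,
    # computed within each trial and then averaged over trials.
    L = len(n_i)
    noise = sum(
        scalar_cov([a - m for a, m in zip(trial_i, avg_i)],
                   [b - m for b, m in zip(trial_j, avg_j)])
        for trial_i, trial_j in zip(n_i, n_j)
    ) / L
    return signal, noise


# Toy data: two trials, four time frames, identical responses in both trials.
n_i = [[1, 0, 1, 0], [1, 0, 1, 0]]
n_j = [[1, 0, 1, 0], [1, 0, 1, 0]]
signal_ij, noise_ij = conventional_cov(n_i, n_j)
```

With perfectly repeatable responses, as in the toy data, all variability is time-locked to the stimulus, so the noise covariance vanishes and only the signal covariance remains.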

Existing methods used for performance comparison

In order to compare the performance of our proposed method with existing work, we consider three widely available methods for extracting neuronal correlations. In simulation studies, we additionally benchmark these estimates with respect to the known ground truth. The existing methods considered are the following:

Pearson correlations from the two-photon data
In this method, fluorescence observations are assumed to be the direct measurements of spiking activity, and thus empirical Pearson correlations of the two-photon data are used to compute the signal and noise correlations (Rothschild et al., 2010; Winkowski and Kanold, 2013; Francis et al., 2018; Bowen et al., 2020). Explicitly, these estimates are obtained by simply replacing $n_{t, l}^{(j)}$ in Equation 3 by $y_{t, l}^{(j)}$ , without performing spike deconvolution.
Two-stage Pearson estimation
Unlike the previous method, in this case spikes are first inferred using a deconvolution technique. Then, following temporal smoothing via a narrow Gaussian kernel, the Pearson correlations are computed using the conventional definitions of Equation 3. For spike deconvolution, we primarily used the FCSS algorithm (Kazemipour et al., 2018). To also demonstrate the sensitivity of these estimates to the deconvolution technique that is used, we provide a comparison with the f-oopsi deconvolution algorithm (Pnevmatikakis et al., 2016) in Figure 2—figure supplement 1.
Two-stage GPFA estimation
Similar to the previous method, spikes are first inferred using a deconvolution technique. Then, a latent variable model called Gaussian Process Factor Analysis (GPFA) (Yu et al., 2009) is applied to the inferred spikes in order to estimate the latent covariates and receptive fields. Based on those estimates, the signal and residual noise correlations are derived through a formulation similar to Equation 1 and Equation 2 (Ecker et al., 2014).

Simulation study 1: neuronal ensemble driven by external stimulus

We simulated calcium fluorescence observations according to the proposed generative model given in Proposed forward model, from an ensemble of $N = 8$ neurons for a duration of $T = 5000$ time frames. We considered $L = 20$ repeated trials driven by the same external stimulus, which we modeled by an autoregressive process (see Guidelines for model parameter settings for details). Figure 2 shows the corresponding estimation results.

Figure 2. — (A) Estimated noise and signal correlation matrices from different methods. Columns from left to right: ground truth, proposed method, Pearson correlations from two-photon recordings, two-stage Pearson estimates and two-stage GPFA estimates. The normalized mean squared error (NMSE) of each estimate with respect to the ground truth and the leakage effect quantified by the ratio between out-of-network and in-network power (leakage) are indicated below each panel. (B) Simulated external stimulus (orange), latent trial-dependent process (red), fluorescence observations (black), estimated calcium concentrations (purple), putative spikes (green), and estimated mean of the latent state (blue) by the proposed method, for the first trial of neuron 1.

The first column of Figure 2A shows the ground truth noise (top) and signal (bottom) correlations (diagonal elements are all equal to one and omitted for visual convenience). The second column shows estimates of the noise and signal correlations using our proposed method, which closely match the ground truth. The third, fourth and fifth columns, respectively, show the results of the Pearson correlations from the two-photon data, two-stage Pearson, and two-stage GPFA estimation methods. Visual inspection shows that these methods incur many false alarms and missed detections relative to the ground truth correlations.

To quantify these comparisons, the normalized mean square error (NMSE) of the different estimates with respect to the ground truth is shown below each of the subplots (Figure 2A). Our proposed method achieves the lowest NMSE compared to the others. Furthermore, we observed a significant mixing between signal and noise correlations in these other estimates. To quantify this leakage effect, we first classified each of the correlation entries as in-network or out-of-network, based on being non-zero or zero in the ground truth, respectively (see Performance evaluation). We then computed the ratio between the power of out-of-network components and the power of in-network components as a measure of leakage. The leakage ratios are also reported in Figure 2A. The leakage of our proposed estimates is the lowest of all four techniques, in estimating both the signal and noise correlations. In order to further probe the performance of our proposed method, the simulated external stimulus $𝐬_{t}$, latent trial-dependent process $x_{t, 1}^{(1)}$, simulated observations $y_{t, 1}^{(1)}$, estimated calcium concentration ${\hat{z}}_{t, 1}^{(1)}$, the putative spikes ${\hat{n}}_{t, 1}^{(1)} := {\hat{z}}_{t, 1}^{(1)} - α {\hat{z}}_{t - 1, 1}^{(1)}$, and the estimated mean of the latent state $m_{𝐱_{t, 1}}^{(1)}$, for the first trial of the first neuron are shown in Figure 2B. These results demonstrate the ability of the proposed estimation framework to accurately identify the latent processes, which in turn leads to an accurate estimation of the signal and noise correlations as shown in Figure 2A.
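The two performance measures described in this paragraph can be sketched as follows. The exact definitions are given in Performance evaluation; here we assume, for illustration only, that the NMSE is the squared error normalized by the power of the ground truth, that power means the sum of squared entries, and that only off-diagonal entries are considered (the diagonal of a correlation matrix is always one). The function names and example matrices are hypothetical.

```python
def off_diagonal_pairs(n):
    """Index pairs (i, j) of all off-diagonal entries of an n x n matrix."""
    return [(i, j) for i in range(n) for j in range(n) if i != j]


def nmse(estimate, truth):
    """Squared error of the estimate, normalized by the power of the ground truth."""
    pairs = off_diagonal_pairs(len(truth))
    err = sum((estimate[i][j] - truth[i][j]) ** 2 for i, j in pairs)
    ref = sum(truth[i][j] ** 2 for i, j in pairs)
    return err / ref


def leakage_ratio(estimate, truth):
    """Power of out-of-network entries divided by power of in-network entries.

    An entry is in-network if it is non-zero in the ground truth, and
    out-of-network if it is zero in the ground truth.
    """
    pairs = off_diagonal_pairs(len(truth))
    in_power = sum(estimate[i][j] ** 2 for i, j in pairs if truth[i][j] != 0.0)
    out_power = sum(estimate[i][j] ** 2 for i, j in pairs if truth[i][j] == 0.0)
    return out_power / in_power


# Hand-picked example: one true correlated pair and one spurious estimated pair.
truth = [[1.0, 0.5, 0.0],
         [0.5, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
estimate = [[1.0, 0.4, 0.1],
            [0.4, 1.0, 0.0],
            [0.1, 0.0, 1.0]]
example_nmse = nmse(estimate, truth)
example_leakage = leakage_ratio(estimate, truth)
```

A leakage ratio of zero means that the estimate places no power on pairs that are uncorrelated in the ground truth.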

The main sources of the observed performance gap between our proposed method and the existing ones are the bias incurred by treating the fluorescence traces as spikes, low spiking rates, non-linearity of spike generation with respect to intrinsic and external covariates, and sensitivity to spike deconvolution. For the latter, we demonstrated the sensitivity of the two-stage Pearson estimates to the choice of the deconvolution technique in Figure 2—figure supplement 1. Furthermore, in order to isolate the effect of said non-linearities on the estimation performance, we applied the two-stage methods to ground truth spikes in Figure 2—figure supplement 2. Our analysis showed that both two-stage estimates incur significant estimation errors even if the spikes were recovered perfectly, mainly due to the limited number of trials ( $L = 20$ here). In accordance with our theoretical analysis of the asymptotic behavior of the conventional signal and noise correlation estimates given in Appendix 1, we also showed in Figure 2—figure supplement 2 that the performance of the two-stage Pearson estimates based on ground truth spikes, but using $L = 1000$ trials, dramatically improves. Our proposed method, however, was capable of producing reliable estimates with the number of trials as low as $L = 20$ , which is typical in two-photon imaging experiments.

Analysis of robustness with respect to modeling assumptions

While the preceding results are quite favorable to our proposed method, the underlying generative models were the same as those used to estimate signal and noise correlations, which is in contrast to conventional real data validation with known ground truth. Access to ground truth correlations in two-photon imaging experimental settings, however, is quite challenging. In order to further probe the robustness of our proposed method in the absence of ground truth data, we utilized surrogate data that parallel the setting of Figure 2, but deviate from our modeling assumptions.

Robustness to stimulus integration model mismatch. First, we considered surrogate data generated with a non-linear stimulus integration model by replacing the linear receptive field component $𝐝_{j}^{⊤} 𝐬_{t}$ with $𝐝_{j}^{⊤} 𝐬_{t} + {({\tilde{𝐝}}_{j, 1}^{⊤} 𝐬_{t})}^{2} + {({\tilde{𝐝}}_{j, 2}^{⊤} 𝐬_{t})}^{2}$, where ${\tilde{𝐝}}_{j, 1}$ and ${\tilde{𝐝}}_{j, 2}$ are akin to quadratic receptive field components. We assumed a linear stimulus integration model in our estimation framework (i.e., ${\tilde{𝐝}}_{j, 1} = {\tilde{𝐝}}_{j, 2} = 𝟎$). Figure 2—figure supplement 3 shows the resulting correlation estimates. While the performance of our proposed signal correlation estimates degrades under this setting as compared to Figure 2, our proposed estimates still outperform existing methods. In addition, the model mismatch in the stimulus integration component does not affect the accuracy of noise correlation estimation in our method.
Robustness to calcium decay model mismatch. Next, we tested our proposed estimation framework on data simulated with a different calcium decay model. Specifically, we simulated data with second-order autoregressive calcium dynamics, and at a lower signal-to-noise ratio (SNR) compared to the setting of Figure 2, and used our inference framework which assumes first order calcium dynamics for estimation. Figure 2—figure supplement 4 shows the corresponding noise and signal correlations estimated by the proposed method under these conditions. Even though the performance slightly degrades (in terms of NMSE and leakage ratio), our method is able to recover the underlying correlations faithfully under this setting.
Robustness to SNR level and firing rate. Next, we compared the performance of the Pearson and two-stage Pearson methods with our proposed method under varying SNR levels and average firing rates, as shown in Figure 2—figure supplement 5. While the performance of all methods degrades at low SNR levels or firing rates (SNR < 10 dB, firing rate < 0.5 Hz), our proposed method outperforms the existing methods for a wide range of SNR and firing rate values. To quantify this comparison, we have also indicated the mean and standard deviation of the relative performance gain of our proposed estimates across SNR levels and firing rates as insets in Figure 2—figure supplement 5.
Robustness to observation noise model mismatch. Finally, we repeated the foregoing comparisons under varying SNR levels and firing rates, only now we included an additional observation noise model mismatch. Similar to the treatment in Deneux et al., 2016, we considered two temporally correlated observation noise models: white noise with a low frequency drift (Figure 2—figure supplement 6, top panels) and pink noise (Figure 2—figure supplement 6, bottom panels). In accordance with the results in Figure 2—figure supplement 5, our proposed method outperforms the existing ones for a wide range of SNR and firing rate values and under both observation noise model mismatch conditions. From Figure 2—figure supplement 6C and F, it can be observed that the ground truth spikes are favorably recovered as a byproduct of our method, even though the estimated calcium concentrations are contaminated by the temporally correlated fluctuations in observation noise. This in turn results in accurate signal and noise correlation estimates.
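To illustrate the calcium decay model mismatch considered above, the following sketch generates calcium traces from the same spike train under first-order and second-order autoregressive dynamics, and then adds white Gaussian observation noise. The autoregressive coefficients, the noise level, the unit-gain observation model $y_{t} = z_{t} + w_{t}$ and the function names are assumptions chosen for illustration, not the settings used in our simulations.

```python
import random


def calcium_trace(spikes, coeffs):
    """Autoregressive calcium dynamics: z_t = sum_k coeffs[k] * z_{t-1-k} + n_t.

    coeffs of length 1 gives first-order dynamics, length 2 second-order dynamics.
    Values of z before the first frame are taken to be zero.
    """
    z = []
    for t, n_t in enumerate(spikes):
        past = sum(c * z[t - 1 - k] for k, c in enumerate(coeffs) if t - 1 - k >= 0)
        z.append(past + n_t)
    return z


def fluorescence(z, noise_std, rng):
    """Noisy observation of the calcium trace with white Gaussian noise."""
    return [z_t + rng.gauss(0.0, noise_std) for z_t in z]


rng = random.Random(0)
spikes = [1 if rng.random() < 0.05 else 0 for _ in range(500)]

z_ar1 = calcium_trace(spikes, [0.9])        # model assumed by the inference
z_ar2 = calcium_trace(spikes, [1.0, -0.25]) # mismatched second-order decay
y_ar2 = fluorescence(z_ar2, noise_std=0.3, rng=rng)
```

Feeding `y_ar2` to an estimator that assumes the first-order model reproduces, in spirit, the mismatch scenario: the data follow one decay model while the inference assumes another.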

Simulation study 2: spontaneous activity

Next, we present the results of a simulation study in the absence of external stimuli (i.e. $𝐬_{t} = 𝟎$), pertaining to the spontaneous activity condition. It is noteworthy that the proposed method can readily be applied to estimate noise correlations during spontaneous activity, by simply setting the external stimulus $𝐬_{t}$ and the receptive field $𝐝_{j}$ to zero in the update rules (see Proposed forward model for details). We simulated the ensemble spiking activity based on a Poisson process (Smith and Brown, 2003) using a discrete time-rescaling procedure (Brown et al., 2002; Smith and Brown, 2003), so that the data are generated using a different model than that used in our inference framework (i.e. Bernoulli process with a logistic link as outlined in Proposed forward model). As such, we eliminated potential performance biases in favor of our proposed method by introducing the aforementioned model mismatch. We simulated $L = 20$ independent trials of spontaneous activity of $N = 30$ neurons, observed for a time duration of $T = 5000$ time frames. The number of neurons in this study is notably larger than that used in the previous one, to examine the scalability of our proposed approach with respect to the ensemble size.
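The general idea behind generating spikes by time rescaling can be sketched as follows: an exponentially distributed threshold with unit mean is drawn, the intensity is accumulated over time bins, and a spike is emitted once the accumulated intensity reaches the threshold. This is a simplified illustration under our own assumptions (at most one spike per bin, accumulation reset to zero after each spike, a constant rate in the usage example) and is not claimed to reproduce the exact procedure of the cited works. The function name is hypothetical.

```python
import random


def time_rescaling_spikes(rate_per_bin, rng):
    """Generate a binary spike train from a per-bin intensity by time rescaling.

    rate_per_bin : list of non-negative expected spike counts per time bin
    rng          : random.Random instance, passed in for reproducibility
    """
    spikes = []
    threshold = rng.expovariate(1.0)  # unit-mean exponential waiting "time"
    accumulated = 0.0
    for lam in rate_per_bin:
        accumulated += lam
        if accumulated >= threshold:
            spikes.append(1)
            accumulated = 0.0                 # restart the rescaled clock
            threshold = rng.expovariate(1.0)  # draw the next waiting "time"
        else:
            spikes.append(0)
    return spikes


rng = random.Random(1)
rates = [0.05] * 20000  # constant intensity, chosen only for this example
spikes = time_rescaling_spikes(rates, rng)
mean_rate = sum(spikes) / len(spikes)
```

For small per-bin intensities, the empirical rate `mean_rate` should be close to the intensity itself, which provides a quick sanity check on the generator.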

Figure 3 shows the comparison of the noise correlation matrices estimated by our proposed method, Pearson correlations from two-photon recordings, two-stage Pearson, and two-stage GPFA estimates, with respect to the ground truth. The Pearson and the two-stage estimates are highly variable and result in excessive false detections. Our proposed estimate, however, closely follows the ground truth, which is also reflected by the comparatively lower NMSE and leakage ratios, in spite of the mismatch between the models used for data generation and inference. In addition, our proposed method exhibits favorable scaling with respect to the ensemble size, thanks to the underlying low-complexity variational updates (see Low-complexity parameter updates for details).

Real data study 1: mouse auditory cortex under random tone presentation

We next applied our proposed method to experimentally recorded two-photon observations from the mouse primary auditory cortex (A1). The dataset consisted of recordings from 371 excitatory neurons in layer 2/3 of A1, from which we selected $N = 16$ responsive neurons (i.e. neurons that exhibited at least one spiking event in at least half of the trials considered; see Guidelines for model parameter settings). A random sequence of four tones was presented to the mouse, with the same sequence being repeated for $L = 10$ trials. Each trial consisted of $T = 3600$ time frames, and each tone was 2 s long followed by a 4 s silent period (see Experimental procedures for details). We considered an integration window of $R = 25$ frames for stimulus encoding (see Guidelines for model parameter settings for details). The comparison of the noise and signal correlation estimates obtained by our proposed method, Pearson correlations from two-photon recordings, two-stage Pearson and two-stage GPFA methods is shown in Figure 4A. The spatial map of the 16 neurons considered in the analysis in the field of view is shown in Figure 4B. Figure 4C shows the stimulus tone sequence $𝐬_{t}$, two-photon observations $y_{t, 1}^{(1)}$, estimated calcium concentration ${\hat{z}}_{t, 1}^{(1)}$, putative spikes ${\hat{n}}_{t, 1}^{(1)} := {\hat{z}}_{t, 1}^{(1)} - α {\hat{z}}_{t - 1, 1}^{(1)}$ and the estimated mean of the latent state $m_{𝐱_{t, 1}}^{(1)}$, for the first trial of the first neuron.
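The putative spikes defined in this paragraph are obtained by a first-order difference of the estimated calcium concentration. A minimal sketch is given below; the function name is hypothetical, the value of $α$ and the example trace are made up, and the calcium concentration before the first frame is assumed to be zero.

```python
def putative_spikes(z_hat, alpha):
    """Compute n_hat_t = z_hat_t - alpha * z_hat_{t-1} for a single trace.

    z_hat : list of estimated calcium concentrations for one neuron and one trial
    alpha : calcium decay coefficient of the first-order model
    The concentration before the first frame is assumed to be zero.
    """
    n_hat = []
    previous = 0.0
    for z_t in z_hat:
        n_hat.append(z_t - alpha * previous)
        previous = z_t
    return n_hat


# Example trace: a spike at frame 0, pure decay, then a second spike at frame 3.
z_hat = [1.0, 0.5, 0.25, 1.125]
n_hat = putative_spikes(z_hat, alpha=0.5)
```

In the example, frames that contain only decay map to zero, while frames 0 and 3 recover unit-amplitude events.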

Figure 4. — (A) Estimated noise (top) and signal (bottom) correlation matrices using different methods. Columns from left to right: proposed method, Pearson correlations from two-photon data, two-stage Pearson and two-stage GPFA estimates. (B) Location of the selected neurons with the highest activity in the field of view. (C) Presented tone sequence (orange), observations (black), estimated calcium concentrations (purple), putative spikes (green) and estimated mean latent state (blue) in the first trial of the first neuron. (D) Null distributions of chance occurrence of dissimilarities between signal and noise correlation estimates using different methods. The observed test statistic in each case is indicated by a dashed vertical line. (E) Scatter plots of signal vs. noise correlations for individual cell pairs (blue dots) corresponding to each method. Data were normalized for comparison by computing z-scores. For each case, the linear regression model fit is shown in red, and the slope and p-value of the t-test are indicated as insets.

We estimated the Best Frequency (BF) of each neuron as the tone that resulted in the highest level of fluorescence activity. The results in Figure 4A are organized such that the neurons with the same BF are neighboring, with the BF increasing along the diagonal. As expected (Bowen et al., 2020), our proposed method as well as the Pearson and two-stage Pearson estimates show high signal correlations along the diagonal. However, the two-stage GPFA estimates do not reveal such a structure. By visual inspection, as also observed in the simulation studies, the Pearson correlations from two-photon recordings, two-stage Pearson and two-stage GPFA estimates have significant leakage between the signal and noise correlations, whereas our proposed signal and noise correlation estimates in Figure 4A suggest distinct spatial structures.

To quantify this visual comparison, we used a statistic based on the Tanimoto similarity metric (Lipkus, 1999), denoted by $T_{s} (𝐗, 𝐘)$ for two matrices $𝐗$ and $𝐘$. As a measure of dissimilarity, we used $T_{d} (𝐗, 𝐘) := 1 - T_{s} (𝐗, 𝐘)$ (see Performance evaluation for details). The comparison of $T_{d} (\hat{𝐒}, \hat{𝐍})$ for the four estimates is presented in the second column of Table 1. To assess statistical significance, for each comparison we obtained null distributions corresponding to chance occurrence of dissimilarities using a shuffling procedure as shown in Figure 4D, and then computed one-tailed $p$-values from those distributions (see Performance evaluation for details). Table 1 and Figure 4D include these p-values, which show that the proposed estimates (boldface numbers in Table 1, second column) indeed have the highest dissimilarity between signal and noise correlations. The higher leakage effect in the other three estimates is also reflected in their smaller $T_{d} (\hat{𝐒}, \hat{𝐍})$ values.
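As an illustration of the dissimilarity measure, the sketch below computes $T_{d}$ using the standard Tanimoto coefficient for real-valued data, $T_{s} (𝐗, 𝐘) = \langle 𝐗, 𝐘 \rangle / (\|𝐗\|^{2} + \|𝐘\|^{2} - \langle 𝐗, 𝐘 \rangle)$, applied to all matrix entries. This particular form is an assumption made for the example; the precise statistic and the shuffling procedure are specified in Performance evaluation. The function names and example matrices are hypothetical.

```python
def tanimoto_similarity(x, y):
    """Standard Tanimoto coefficient between two real matrices of equal size."""
    xs = [value for row in x for value in row]
    ys = [value for row in y for value in row]
    inner = sum(a * b for a, b in zip(xs, ys))
    return inner / (sum(a * a for a in xs) + sum(b * b for b in ys) - inner)


def tanimoto_dissimilarity(x, y):
    """T_d(X, Y) := 1 - T_s(X, Y)."""
    return 1.0 - tanimoto_similarity(x, y)


# Two matrices with non-overlapping support are maximally dissimilar.
x = [[1.0, 0.0], [0.0, 0.0]]
y = [[0.0, 1.0], [0.0, 0.0]]
d_xy = tanimoto_dissimilarity(x, y)
d_xx = tanimoto_dissimilarity(x, x)
```

Under this form, identical matrices give a dissimilarity of zero, so larger values of $T_{d} (\hat{𝐒}, \hat{𝐍})$ indicate less shared structure between the two correlation estimates.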

Table 1. Dissimilarity metric statistics for the estimates in Figure 4A (also illustrated in Figure 4D), linear regression statistics of the comparison between signal and noise correlations in Figure 4E, and the average NMSE across 50 trials used in the shuffling procedure illustrated in Figure 5A.

	Dissimilarity $T_{d} (\hat{𝐒}, \hat{𝐍})$	Regression statistics (Figure 4E)		Shuffling test (Figure 5)
Estimate	Figure 4D	Slope (p-value)	$R^{2}$ Value	NMSE in $\hat{𝐍}$	NMSE in $\hat{𝐒}$
Proposed	$0.8725 (p < 10^{- 4})$	$0.02 (p = 0.84)$	$4 \times 10^{- 4}$	$1.07 \pm 0.16$	$1.32 \pm 0.19$
Pearson	$0.6675 (p = 0.71)$	$0.33 (p = 2 \times 10^{- 4})$	0.11	0	0
Two-stage Pearson	$0.7325 (p = 0.09)$	$0.15 (p = 0.10)$	0.02	$1.84 \pm 0.34$	$0.55 \pm 0.12$
Two-stage GPFA	$0.7625 (p < 10^{- 4})$	$0.02 (p = 0.86)$	$3 \times 10^{- 4}$	$2.32 \pm 0.52$	$2.26 \pm 0.51$


To further investigate this effect, we have depicted the scatter plots of signal vs. noise correlations estimated by each method in Figure 4E. To examine the possibility of the leakage effect on a pairwise basis, we performed linear regression in each case. The slope of the model fit, the p-value for the corresponding t-test, and the $R^{2}$ values are reported in the third and fourth columns of Table 1 (the slope and p-values are also shown as insets in Figure 4E). Consistent with the results of Winkowski and Kanold, 2013, the Pearson estimates suggest a significant correlation between the signal and noise correlation pairs (as indicated by the higher slope in Figure 4E). However, none of the other estimates (including the proposed estimates) in Figure 4E register a significant trend between signal and noise correlations. This further corroborates our assessment of the high leakage between signal and noise correlations in Pearson estimates, since such a leakage effect could result in overestimation of the trend between the signal and noise correlation pairs. The signal and noise correlations estimated by our proposed method show no pairwise trend, suggesting distinct patterns of stimulus-dependent and stimulus-independent functional connectivity (Kohn et al., 2016; Montijn et al., 2014; Rothschild et al., 2010; Keeley et al., 2020).
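The pairwise regression can be sketched as follows, assuming that each unordered cell pair contributes one point and that both sets of correlations are z-scored before fitting, as stated in the caption of Figure 4E. The function name and the toy matrices are hypothetical, and the t-test for the slope is omitted.

```python
import numpy as np

def pairwise_trend(S, N):
    """Regress z-scored noise correlations on z-scored signal correlations over cell pairs."""
    S = np.asarray(S, dtype=float)
    N = np.asarray(N, dtype=float)
    iu = np.triu_indices_from(S, k=1)         # each unordered pair once, diagonal excluded
    s, n = S[iu], N[iu]
    zs = (s - s.mean()) / s.std()
    zn = (n - n.mean()) / n.std()
    slope, intercept = np.polyfit(zs, zn, 1)  # least-squares line
    resid = zn - (slope * zs + intercept)
    r2 = 1.0 - resid.var() / zn.var()
    return slope, r2

# Toy symmetric matrices (random values, for illustration only)
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
S = (A + A.T) / 2
N_mat = (B + B.T) / 2
slope, r2 = pairwise_trend(S, N_mat)
```

With both variables z-scored, the fitted slope equals the Pearson correlation coefficient of the pairs, so that $R^{2}$ equals the squared slope. The entries of Table 1 agree with this relation up to rounding (e.g. $0.33^{2} \approx 0.11$ for the Pearson estimates).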

A key advantage of our proposed method over the Pearson and two-stage approaches is the explicit modeling of stimulus integration. The relevant parameter in this regard is the length of the stimulus integration window $R$ . While in our simulation studies the value of $R$ was known, it needs to be set by the user in real data applications. To this end, domain knowledge or data-driven methods such as cross-validation and model order selection can be utilized (see Guidelines for model parameter settings for details). Noting that the number of parameters to be estimated linearly scales with $R$ , it must be chosen large enough to capture the stimulus effects, yet small enough to result in favorable computational complexity. Here, given that the typical tone response duration of mouse A1 neurons is $< 1$ s (Linden et al., 2003; DeWeese et al., 2003; Petrus et al., 2014), with a sampling frequency of $f_{s} = 30$ Hz, we surmised that a choice of $R \sim 30$ suffices to capture the stimulus effects. We further examined the effect of varying $R$ on the proposed correlation estimates in Figure 4—figure supplement 1. As shown, small values of $R$ (e.g. $R = 1$ or 10) may not be adequate to fully capture stimulus integration effects. By considering values of $R$ in the range $25 - 50$ , we observed that the correlation estimates remain stable. We thus chose $R = 25$ for our analysis.
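The arithmetic behind the choice of $R$, and the reason the number of parameters grows linearly with it, can be sketched as follows. The sketch assumes a scalar stimulus and a kernel with one coefficient per lag within the integration window; the function names and the ordering of the lags are assumptions made for illustration.

```python
import numpy as np

def window_length(response_duration_s, fs_hz):
    """Number of frames spanned by a response of the given duration."""
    return int(round(response_duration_s * fs_hz))

def lagged_stimulus(s, R):
    """Stack the R most recent stimulus samples at each frame (zero-padded at the start).

    Row t is [s[t], s[t-1], ..., s[t-R+1]], so a length-R kernel d gives the
    stimulus drive at frame t as the inner product of row t with d.
    """
    s = np.asarray(s, dtype=float)
    T = s.shape[0]
    padded = np.concatenate([np.zeros(R - 1), s])
    return np.stack([padded[R - 1 - r : R - 1 - r + T] for r in range(R)], axis=1)

R_upper = window_length(1.0, 30.0)       # a 1 s response at 30 Hz spans 30 frames
s = np.array([0.0, 1.0, 0.0, 0.0, 2.0])  # toy stimulus
S_lag = lagged_stimulus(s, 3)            # shape (T, R): one kernel coefficient per column
```

Each additional frame in the window adds one column to the lagged stimulus matrix and therefore one kernel coefficient per neuron, which is the linear scaling in $R$ noted above.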

Careful inspection of the second panel in Figure 4C shows that the fluorescence activity often saturates to ∼4 times its baseline value. This effect is due to successive closely spaced spikes, which implies the occurrence of more than one spike per frame and thus violates our Bernoulli modeling assumption. To inspect the performance of our method more carefully under this scenario, we show in Figure 4—figure supplement 2 a zoomed-in view of the estimated latent processes ${\hat{z}}_{t, 1}^{(1)}$ (calcium concentration) and ${\hat{n}}_{t, 1}^{(1)}$ (putative spikes) for a sample data segment with high fluorescence activity. The estimated latent processes reveal two mechanisms leveraged by our inference method to mitigate the aforementioned model mismatch: first, our proposed method predicts spiking events in adjacent time frames to compensate for rapid increase in firing rate and thus infers calcium concentration levels that match the observed fluorescence; secondly, even though our generative model assumes that there is only one spiking event in a given time frame, this restriction is mitigated in our inference framework by relaxing the constraint ${\hat{n}}_{t, l}^{(j)} := {\hat{z}}_{t, l}^{(j)} - α {\hat{z}}_{t - 1, l}^{(j)} \leq 1$ , as explained in Low-complexity parameter updates. While this relaxation was performed for the sake of tractability of the inverse solution, it in fact leads to improved estimation results under episodes of rapid increase in firing rate, by allowing the putative spike magnitudes ${\hat{n}}_{t, l}^{(j)}$ to be greater than 1. The latter is evident in the magnitude of the inferred spikes in Figure 4—figure supplement 2, following the rise of fluorescence activity.
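The relation between the estimated calcium concentrations and the putative spikes can be sketched as follows. The sketch assumes first-order calcium dynamics of the form $z_{t} = α z_{t - 1} + n_{t}$, consistent with the expression for ${\hat{n}}_{t, l}^{(j)}$ above; the value of $α$, the zero initial condition, and the toy spike train are hypothetical.

```python
import numpy as np

def putative_spikes(z, alpha):
    """n_t = z_t - alpha * z_{t-1}, taking z_{-1} = 0 (an assumption for the first frame)."""
    z = np.asarray(z, dtype=float)
    z_prev = np.concatenate([[0.0], z[:-1]])
    return z - alpha * z_prev

alpha = 0.9                                               # hypothetical decay factor
spikes_true = np.array([0.0, 1.0, 0.0, 2.0, 0.0, 0.0])    # two spikes fall in frame 3

# Simulate the calcium trace driven by the toy spike train
z = np.zeros_like(spikes_true)
for t in range(len(z)):
    z[t] = (alpha * z[t - 1] if t > 0 else 0.0) + spikes_true[t]

n_hat = putative_spikes(z, alpha)
```

In this toy case the frame containing two spikes yields a putative spike magnitude of 2, which a strict upper bound of 1 would not permit. This is the situation in which the relaxation described above becomes useful.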

Given that the ground truth correlations are not available for a direct comparison, we instead performed a test of specificity that reveals another key limitation of existing methods. Fluorescence observations exhibit structured dynamics due to the exponential intracellular calcium concentration decay (as shown in Figure 4C, for example), which are in turn related to the underlying spikes that are driven non-linearly by intrinsic/extrinsic stimuli as well as the properties of the indicator used. As such, an accurate inference method is expected to be specific to this temporal structure. To test this, we randomly shuffled the $T$ time frames consistently in the same order in all trials, in order to fully break the temporal structure governing calcium decay dynamics, and then estimated correlations from these shuffled data using the different methods. The resulting estimates of noise correlations are shown in Figure 5A for one instance of such shuffled data. The average NMSE values over a total of 50 shuffled samples, computed with respect to the original un-shuffled estimates (in Figure 4A), are tabulated in the fifth and sixth columns of Table 1, and are also indicated below each panel in Figure 5A.

A visual inspection of Figure 5A shows that the Pearson correlations from two-photon recordings expectedly remain unchanged. Since this method treats each time frame as independent, temporal shuffling does not impact the correlations in any way. On the other extreme, both of the two-stage estimates seem to detect highly variable and large correlation values, despite operating on data that lacks any relevant temporal structure. Our proposed method, however, remarkably produces negligible correlation estimates. Although both the two-stage and proposed estimates show variability with respect to the shuffled data (Table 1, fifth column), the standard deviation of the NMSE values of our proposed method is considerably smaller than that of the two-stage methods (Table 1, fifth column). For further inspection, the histograms of a single element ( ${(\hat{𝐍})}_{1, 3}$ ) of the estimated correlation matrices across the 50 shuffling trials are shown in Figure 5B. The original un-shuffled estimates are marked by the dashed vertical lines in each case. The proposed estimate in Figure 5B is highly concentrated around zero, even though the un-shuffled estimate is non-zero. However, the two-stage estimates produce correlations that are widely variable across the shuffling trials. This analysis demonstrates that our proposed method is highly specific to the temporal structure of fluorescence observations, whereas the Pearson correlations from two-photon recordings, two-stage Pearson and two-stage GPFA methods fail to be specific.
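The invariance of the Pearson estimates under this shuffling can be checked numerically. The sketch below assumes one common convention for the conventional estimates (signal correlations from trial-averaged traces, noise correlations from the residuals pooled over trials and frames) and takes the NMSE to be the squared Frobenius error normalized by the squared Frobenius norm of the reference. Both conventions, the array layout (neurons, frames, trials), and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pearson_signal_noise(Y):
    """Conventional signal and noise correlations from traces Y of shape (N, T, L)."""
    mean_resp = Y.mean(axis=2)                      # trial-averaged response, (N, T)
    S = np.corrcoef(mean_resp)                      # signal correlations
    resid = Y - mean_resp[:, :, None]               # trial-to-trial fluctuations
    resid = resid.transpose(0, 2, 1).reshape(Y.shape[0], -1)  # pool trials and frames
    Nz = np.corrcoef(resid)                         # noise correlations
    return S, Nz

def shuffle_frames(Y, rng):
    """Apply the same random permutation of the T frames to all neurons and trials."""
    perm = rng.permutation(Y.shape[1])
    return Y[:, perm, :]

def nmse(est, ref):
    """Squared Frobenius error, normalized by the squared Frobenius norm of ref."""
    return float(np.sum((est - ref) ** 2) / np.sum(ref ** 2))

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 200, 10)).cumsum(axis=1)  # synthetic, temporally structured traces
S0, N0 = pearson_signal_noise(Y)
S1, N1 = pearson_signal_noise(shuffle_frames(Y, rng))
err_S, err_N = nmse(S1, S0), nmse(N1, N0)
```

Both errors are zero up to floating-point precision, in line with the zero NMSE entries for the Pearson estimates in Table 1: a statistic that ignores the order of the frames cannot distinguish shuffled from un-shuffled data.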

Real data study 2: spontaneous vs. stimulus-driven activity in the mouse A1

To further validate the utility of our proposed methodology, we applied it to another experimentally-recorded dataset from the mouse A1 layer 2/3. This experiment pertained to trials of presenting a sequence of short white noise stimuli, randomly interleaved with silent trials of the same duration. Figure 6A shows a sample trial sequence. The two-photon recordings thus contained episodes of stimulus-driven and spontaneous activity (see Experimental procedures for details). Under this experimental setup, it is expected that the noise correlations are invariant across the spontaneous and stimulus-driven conditions. In accordance with the foregoing results of real data study 1, we also expect the signal and noise correlation patterns to be distinct. Each trial considered in the analysis consisted of $T = 765$ frames (see Experimental procedures for details). We selected $N = 10$ responsive neurons (according to the criterion described in Guidelines for model parameter settings), each with $L = 10$ trials. Similar to real data study 1, we chose a stimulus integration window of length $R = 25$ frames.

Figure 6. — (A) A sample trial sequence in the experiment. Stimulus-driven (stim) trials were recorded with randomly interleaved spontaneous (spon) trials of the same duration. (B) Estimated noise and signal correlation matrices under spontaneous (top) and stimulus-driven (bottom) conditions. Columns from left to right: proposed method, Pearson correlations from two-photon data, two-stage Pearson and two-stage GPFA estimates. (C) Location of the selected neurons with highest activity in the field of view. (D) Stimulus onsets (orange), observations (black), estimated calcium concentrations (purple) and putative spikes (green) for the first trial from two pairs of neurons with high signal correlation (top) and high noise correlation (bottom), as identified by the proposed estimates.

Figure 6B shows the resulting noise and signal correlation estimates under the spontaneous ( ${\hat{𝐍}}_{𝗌𝗉𝗈𝗇}$ , top) and stimulus-driven ( ${\hat{𝐍}}_{𝗌𝗍𝗂𝗆}$ and ${\hat{𝐒}}_{𝗌𝗍𝗂𝗆}$ , bottom) conditions. Figure 6C shows the spatial map of the 10 neurons considered in the analysis in the field of view. A visual inspection of the first column of Figure 6B indeed suggests that ${\hat{𝐍}}_{𝗌𝗉𝗈𝗇}$ and ${\hat{𝐍}}_{𝗌𝗍𝗂𝗆}$ are saliently similar, and distinct from ${\hat{𝐒}}_{𝗌𝗍𝗂𝗆}$ . The Pearson correlations obtained from two-photon data (second column) and the two-stage Pearson and GPFA estimates (third and fourth columns, respectively), however, evidently lack this structure. As in the previous study, we quantified this visual comparison using the similarity metric $T_{s} (𝐗, 𝐘)$ and the dissimilarity metric $T_{d} (𝐗, 𝐘)$ (see Performance evaluation for details). These statistics are reported in Table 2 along with the p-values (null distributions are shown in Figure 6—figure supplement 1), which show that the only significant outcomes (boldface numbers) are those of our proposed method. While it is expected from the experiment design for the noise correlations under the two settings to be similar, the only method that detects this expected outcome with statistical significance is our proposed method. Moreover, the statistically significant dissimilarity between the signal and noise correlations of our proposed estimates corroborates the hypothesis that signal and noise are encoded by distinct functional networks (Kohn et al., 2016; Montijn et al., 2014; Rothschild et al., 2010; Keeley et al., 2020).

Table 2. Similarity/dissimilarity metric statistics for the estimates in Figure 6.

Estimation method	$T_{s} ({\hat{𝐍}}_{𝗌𝗉𝗈𝗇}, {\hat{𝐍}}_{𝗌𝗍𝗂𝗆})$	$T_{d} ({\hat{𝐒}}_{𝗌𝗍𝗂𝗆}, {\hat{𝐍}}_{𝗌𝗍𝗂𝗆})$
Proposed	0.5716 ( $p = 0.003$ )	0.7946 ( $p = 0.004$ )
Pearson	0.3031 ( $p = 0.61$ )	0.5032 ( $p = 0.92$ )
Two-stage Pearson	0.2790 ( $p = 0.05$ )	0.7862 ( $p = 0.39$ )
Two-stage GPFA	0.2008 ( $p = 0.50$ )	0.7792 ( $p = 0.22$ )


Furthermore, Figure 6D shows the time course of the stimulus, observations, estimated calcium concentrations and putative spikes for the first trial from two pairs of neurons with high signal correlation ( $j = 2, 8$ , top) and high noise correlation ( $j = 3, 5$ , bottom). As expected, the putative spiking activity of the neurons with high signal correlation (top) is closely time-locked to the stimulus onsets. The activity of the two neurons with high noise correlation (bottom), however, is not time-locked to the stimulus onsets, even though the two neurons exhibit highly correlated activity. The correlations estimated via the proposed method thus encode substantial information about the inter-dependencies of the spiking activity of the neuronal ensemble.

Real data study 3: spatial analysis of signal and noise correlations in the mouse A1

Lastly, we applied our proposed method to examine the spatial distribution of signal and noise correlations in the mouse A1 layers 2/3 and 4 (data from Bowen et al., 2020). The dataset included fluorescence activity recorded during multiple experiments of presenting sinusoidal amplitude-modulated tones, with each stimulus being repeated across several trials (see Experimental procedures and Bowen et al., 2020 for experimental details). In each experiment, we selected on average around 20 responsive neurons for subsequent analysis (according to the criterion described in Guidelines for model parameter settings). For brevity, we compare the estimates of signal and noise correlations using our proposed method only with those obtained by Pearson correlations from the two-photon data. The latter method was also used in previous analyses of data from this experimental paradigm (Winkowski and Kanold, 2013).

In parallel to the results reported in Winkowski and Kanold, 2013, Figure 7A and Figure 7B illustrate the correlation between the signal and noise correlations in layers 2/3 and 4, respectively. Consistent with the results of Winkowski and Kanold, 2013, the signal and noise correlations exhibit positive correlation in both layers, regardless of the method used. However, the correlation coefficients (i.e. slopes in the insets) identified by our proposed method are notably smaller than those obtained from Pearson correlations, in both layer 2/3 (Figure 7A) and layer 4 (Figure 7B). Comparing this result with our simulation studies suggests that the stronger linear trend between the signal and noise correlations observed using the Pearson correlation estimates is likely due to the mixing between the estimates of signal and noise correlations. As such, our method suggests that the signal and noise correlations may not be as highly correlated with one another as indicated in previous studies of layer 2/3 and 4 in mouse A1 (Winkowski and Kanold, 2013).

Figure 7. — (A) Scatter-plot of noise vs. signal correlations (blue) for individual cell-pairs in layer 2/3, based on the proposed (left) and Pearson estimates (right). Data were normalized for comparison by computing z-scores. The linear model fits are shown in red, and the slope and p-value of the t-tests are indicated as insets. Panel (B) corresponds to layer 4 in the same organization as panel A. (C) Signal (top) and noise (bottom) correlations vs. cell-pair distance in layer 2/3, based on the proposed (left) and Pearson estimates (right). Distances were binned to $10 μ m$ intervals. The median of the distributions (black) and the linear model fit (red) are shown in each panel. The slope of the linear model fit, and the p-value of the t-test are also indicated as insets. Dashed horizontal lines indicate the zero-slope line for ease of visual comparison. Panel D corresponds to layer 4 in the same organization as panel C. (E) Spatial spread of signal (top) and noise (bottom) correlations in layer 2/3, based on the proposed (left) and Pearson estimates (right). The horizontal and vertical axes in each panel respectively represent the relative dorsoventral and rostrocaudal distances between each cell-pair, and the heat-map indicates the magnitude of correlations. Marginal distributions of the signal (blue) and noise (red) correlations along the dorsoventral and rostrocaudal axes for the proposed method (darker colors) and Pearson method (lighter colors) are shown at the top and right sides of the sub-panels. Panel F corresponds to layer 4 in the same organization as panel E.

Next, to evaluate the spatial distribution of signal and noise correlations, we plotted the correlation values for pairs of neurons as a function of their distance for layer 2/3 (Figure 7C) and layer 4 (Figure 7D). The distances were discretized using bins of length $10 μ m$ . The scatter of the correlations along with their median at each bin are shown in all panels. Then, to examine the spatial trend of the correlations, we performed linear regression in each case. The slope of the model fit, the p-value for the corresponding t-test, and the $R^{2}$ values are reported in Table 3 (the slope and p-values are also shown as insets in Figure 7C and D).
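The distance analysis can be sketched as follows. The sketch assumes that cell positions are available in micrometers, that the per-bin medians are computed over $10 μ m$ bins, and that the linear fit uses the unbinned pair distances; whether the reported fits used binned or unbinned distances is not specified here. The function name and the toy layout are hypothetical, and the t-test for the slope is omitted.

```python
import numpy as np

def correlation_vs_distance(C, xy, bin_um=10.0):
    """Per-bin medians and least-squares slope of correlation vs. cell-pair distance."""
    C = np.asarray(C, dtype=float)
    xy = np.asarray(xy, dtype=float)
    iu = np.triu_indices_from(C, k=1)                 # each unordered pair once
    diff = xy[iu[0]] - xy[iu[1]]
    dist = np.hypot(diff[:, 0], diff[:, 1])           # pair distances in micrometers
    corr = C[iu]
    bins = np.floor(dist / bin_um).astype(int)        # index of the distance bin
    occupied = np.unique(bins)
    centers = (occupied + 0.5) * bin_um
    medians = np.array([np.median(corr[bins == b]) for b in occupied])
    slope, intercept = np.polyfit(dist, corr, 1)      # linear trend with distance
    return centers, medians, slope

# Toy layout: correlations decay linearly with distance at 1e-3 per micrometer
xy = np.array([[0.0, 0.0], [30.0, 0.0], [0.0, 40.0], [30.0, 40.0]])
D = np.hypot(xy[:, None, 0] - xy[None, :, 0], xy[:, None, 1] - xy[None, :, 1])
C = 1.0 - 1e-3 * D
centers, medians, slope = correlation_vs_distance(C, xy)
```

In this toy case the recovered slope equals the imposed decay rate, which gives a simple check of the binning and fitting steps before applying them to estimated correlations.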

Table 3. Linear regression statistics for the analysis of correlations vs. cell-pair distance.

	Statistics of layer 2/3 correlations		Statistics of layer 4 correlations
Correlations	Slope (p-value)	$R^{2}$ Value	Slope (p-value)	$R^{2}$ Value
Proposed Signal Corr.	$- 𝟗 \times {𝟏𝟎}^{- 𝟓}$ ( $p = 0.002$ )	0.012	$- 𝟏 \times {𝟏𝟎}^{- 𝟒}$ ( $p = 3 \times 10^{- 6}$ )	0.023
Pearson Signal Corr.	$- 5 \times 10^{- 5}$ ( $p = 0.02$ )	0.007	$- 3 \times 10^{- 5}$ ( $p = 0.02$ )	0.005
Proposed Noise Corr.	$- 𝟏 \times {𝟏𝟎}^{- 𝟒}$ ( $p = 0.005$ )	0.010	$- 𝟓 \times {𝟏𝟎}^{- 𝟓}$ ( $p = 0.06$ )	0.004
Pearson Noise Corr.	$- 4 \times 10^{- 5}$ ( $p = 0.1$ )	0.003	$- 5 \times 10^{- 5}$ ( $p = 0.02$ )	0.005


From Table 3 and Figure 7C and D (upper panels), it is evident that the signal correlations show a significant negative trend with respect to distance, using both methods and in both layers. However, the slopes of these negative trends identified by our method (boldface numbers in Table 3) are notably steeper than those identified by Pearson correlations. On the other hand, the trends of the noise correlations with distance (bottom panels) are different between our proposed method and Pearson correlations: our proposed method shows a significant negative trend in layer 2/3, but not in layer 4, whereas the Pearson correlations of the two-photon data suggest a significant negative trend in layer 4, but not in layer 2/3. In addition, the slopes of these negative trends identified by our method (boldface numbers in Table 3) are steeper than or equal to those identified by Pearson correlations.

Our proposed estimates also indicate that noise correlations are sparser and less widespread in layer 4 (Figure 7D) than in layer 2/3 (Figure 7C). To further investigate this observation, we depicted the two-dimensional spatial spread of signal and noise correlations in both layers and for both methods in Figure 7E and F, by centering each neuron at the origin and overlaying the individual spatial spreads. The horizontal and vertical axes in each panel represent the relative dorsoventral and rostrocaudal distances, respectively, and the heat-maps represent the magnitude of correlations. Comparing the proposed noise correlation spread in Figure 7E with the corresponding spread in Figure 7F, we observe that the noise correlations in layer 2/3 are indeed more widespread and abundant than in layer 4, as can be expected by more extensive intralaminar connections in layer 2/3 vs. 4 (Watkins et al., 2014; Meng et al., 2017a; Meng et al., 2017b; Kratz and Manis, 2015).
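The construction of the two-dimensional spatial spread can be sketched as follows: each neuron is placed at the origin in turn, and the correlation magnitudes of its pairs are accumulated at the relative offsets of the other neurons. Averaging the magnitudes within each spatial bin, as well as the bin size, the extent, and the function name, are assumptions made for illustration; the overlay used in Figure 7E and F may be constructed differently.

```python
import numpy as np

def spatial_spread(C, xy, extent_um=100.0, bin_um=20.0):
    """Average |correlation| as a function of the relative (dx, dy) offset between cells."""
    C = np.asarray(C, dtype=float)
    xy = np.asarray(xy, dtype=float)
    n = C.shape[0]
    i, j = np.where(~np.eye(n, dtype=bool))           # all ordered pairs with i != j
    d = xy[j] - xy[i]                                 # neuron i is placed at the origin
    edges = np.arange(-extent_um, extent_um + bin_um, bin_um)
    weighted, _, _ = np.histogram2d(d[:, 0], d[:, 1], bins=[edges, edges],
                                    weights=np.abs(C[i, j]))
    counts, _, _ = np.histogram2d(d[:, 0], d[:, 1], bins=[edges, edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        heat = np.where(counts > 0, weighted / counts, 0.0)  # empty bins are set to zero
    return heat, edges

# Toy data: random positions (in micrometers) and a random symmetric matrix
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 80.0, size=(8, 2))
A = rng.standard_normal((8, 8))
C = (A + A.T) / 2
heat, edges = spatial_spread(C, xy)
```

Because every pair contributes at both offsets $(dx, dy)$ and $(- dx, - dy)$, the resulting map is symmetric about the origin by construction.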

The spatial spreads of signal and noise correlations based on the Pearson estimates are remarkably similar in both layers (Figure 7E and F, right panels), whereas they are saliently different for our proposed estimates (Figure 7E and F, left panels). This further corroborates our hypothesis on the possibility of high mixing between the signal and noise correlation estimates obtained by the Pearson correlation of two-photon data. To further examine the differences between the signal and noise correlations, the marginal distributions along the dorsoventral and rostrocaudal axes are shown in Figure 7E and F, selectively overlaid for ease of visual comparison. To quantify the differences between the spatial distributions of signal and noise correlations estimated by each method, we performed Kolmogorov-Smirnov (KS) tests on each pair of marginal distributions, which are summarized in Figure 7—figure supplement 1. Although the marginal distributions of signal and noise correlations are significantly different in all cases from both methods, the effect sizes of their difference (KS statistics) are higher for our proposed estimates compared to those of the Pearson estimates.
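The effect size used in this comparison, the two-sample KS statistic, is the largest absolute gap between two empirical cumulative distribution functions. A minimal sketch is given below, assuming that each marginal distribution is represented by a set of samples; how the marginals in Figure 7E and F were converted into samples is not specified here, and the function name and toy values are hypothetical.

```python
import numpy as np

def ks_two_sample_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])                     # the gap can only change at sample points
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Toy samples (made-up values)
d_shift = ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6])
d_same = ks_two_sample_statistic([1, 2, 3], [1, 2, 3])
```

The statistic ranges from 0 for identical samples to 1 for samples with no overlap, so a larger value indicates a more pronounced difference between the two spatial distributions.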

Finally, the spatial spreads of correlations for either method and in each layer suggest non-uniform angular distributions with possibly directional bias. To test this effect, we computed the angular marginal distributions and performed KS tests for non-uniformity, which are reported in Figure 7—figure supplement 2. These tests indicate that all distributions are significantly non-uniform. In addition, the angular distributions of both signal and noise correlations in layer 4 exhibit salient modes in the rostrocaudal direction, whereas they are less directionally selective in layer 2/3 (Figure 7—figure supplement 2).

In summary, the spatial trends identified by our proposed method are consistent with empirical observations of spatially heterogeneous pure-tone frequency tuning by individual neurons in auditory cortex (Winkowski and Kanold, 2013). The improved correspondence of our proposed method compared to results obtained using Pearson correlations could be the result of the demixing of signal and noise correlations in our method. As a result of the demixing, our proposed method also suggests that noise correlations have a negative trend with distance in layer 2/3, but are much sparser and spatially flat in layer 4. In addition, the spatial spread patterns of signal and noise correlations are more structured and remarkably more distinct for our proposed method than those obtained by the Pearson estimates.

Theoretical analysis of the bias and variance of the proposed estimators

Finally, we present a theoretical analysis of the bias and variance of the proposed estimator. Note that our proposed estimation method has been developed as a scalable alternative to the intractable maximum likelihood (ML) estimation of the signal and noise covariances (see Overview of the proposed estimation method). In order to benchmark our estimates, we thus need to evaluate the quality of said ML estimates. To this end, we derived bounds on the bias and variance of the ML estimators of the kernel $𝐝_{j}$ for $j = 1, \dots, N$ and the noise covariance $𝚺_{x}$ . In order to simplify the treatment, we posit the following mild assumptions:

Assumption (1). We assume a scalar time-varying external stimulus (i.e. $𝐬_{t} = s_{t}$ , and hence $𝐝_{j} = d_{j}, 𝐝 = {[d_{1}, d_{2}, \dots, d_{N}]}^{⊤}$ ). Furthermore, we set the observation noise covariance to be $𝚺_{w} = σ_{w}^{2} 𝐈$ , for notational convenience.

Assumption (2). We derive the performance bounds in the regime where $T$ and $L$ are large, and thus do not impose any prior distribution on the correlations, which are otherwise needed to mitigate overfitting (see Preliminary assumptions).

Assumption (3). We assume the latent trial-dependent process and stimulus to be slowly varying signals, and thus adopt a piece-wise constant model in which these processes are constant within consecutive windows of length $W$ (i.e. $𝐱_{t, l} = 𝐱_{W_{k}, l}$ and $s_{t} = s_{W_{k}}$ , for $(k - 1) W + 1 \leq t \leq k W$ and $k = 1, \dots, K$ with $W_{k} = (k - 1) W + 1$ and $K W = T$ ) for our theoretical analysis, as is usually done in spike count calculations for conventional noise correlation estimates.
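The indexing in Assumption (3) can be made concrete with a short sketch. It builds the window start indices $W_{k}$, holds a value constant over each window of length $W$, and evaluates the average stimulus power $σ_{s}^{2}$ that appears in Theorem 1. The function names and the toy values are hypothetical.

```python
import numpy as np

def window_starts(T, W):
    """1-based start indices W_k = (k - 1) W + 1 for k = 1, ..., K, with K W = T."""
    if T % W != 0:
        raise ValueError("T must be a multiple of W")
    K = T // W
    return np.array([(k - 1) * W + 1 for k in range(1, K + 1)])

def piecewise_constant(values, W):
    """Hold each of the K window values for W consecutive frames."""
    return np.repeat(np.asarray(values, dtype=float), W)

starts = window_starts(12, 4)                  # W_1, W_2, W_3 for T = 12 and W = 4
s = piecewise_constant([0.5, -1.0, 2.0], 4)    # toy stimulus, constant within windows
sigma_s2 = np.mean(s[starts - 1] ** 2)         # (1/K) * sum_k s_{W_k}^2, 0-based indexing
```

Under this model only the $K$ window values of the stimulus and of the latent process are free, which is why the bounds in Theorem 1 are expressed in terms of $K$, $L$ and $W$ rather than $T$ directly.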

Our main theoretical result is as follows:

Theorem 1 (Performance Bounds) Let $q > \frac{1}{64}$ , $0 < ϵ < 1 / 2$ , and $0 < η \leq 1 / 2$ be fixed constants, $σ_{m}^{2} := \max_{i} {(Σ_{x})}_{i, i}$ and $σ_{s}^{2} := \frac{1}{K} \sum_{k = 1}^{K} s_{W_{k}}^{2}$ . Then, under Assumptions (1 - 3), the bias and variance of the maximum likelihood estimators $\hat{d}$ and ${\hat{Σ}}_{x}$ , conditioned on an event $𝒜_{W}$ with $P (𝒜_{W}) \geq 1 - η$ , satisfy:

\begin{aligned}
| \mathrm{bias}_{𝒜_{W}} ({\hat{d}}_{j}) | & \leq \frac{1}{\sqrt{W^{1 - 2 ϵ}}} C_{1} (2 σ_{w} \sqrt{1 + α^{2}} + 1) + τ_{j}, \\
\sqrt{\mathrm{Var}_{𝒜_{W}} ({\hat{d}}_{j})} & \leq \sqrt{\frac{(Σ_{x})_{j, j}}{K L σ_{s}^{2} (1 - η)}} + \frac{1}{\sqrt{W^{1 - 2 ϵ}}} C_{2} (2 σ_{w} \sqrt{1 + α^{2}} + 1) + {\tilde{τ}}_{j}, \\
| \mathrm{bias}_{𝒜_{W}} (({\hat{Σ}}_{x})_{i, j}) | & \leq \frac{| {(Σ_{x})}_{i, j} |}{K L (1 - η)} + \sqrt{\frac{\log W}{W^{1 - 2 ϵ}}} C_{3} (14 σ_{w} \sqrt{1 + α^{2}} + 3) + ξ_{i, j}, \\
\sqrt{\mathrm{Var}_{𝒜_{W}} (({\hat{Σ}}_{x})_{i, j})} & \leq \sqrt{\frac{(K L - 1) ((Σ_{x})_{i, j}^{2} + (Σ_{x})_{i, i} (Σ_{x})_{j, j})}{K^{2} L^{2} (1 - η)}} + \sqrt{\frac{\log W}{W^{1 - 2 ϵ}}} C_{4} (2 σ_{w} \sqrt{1 + α^{2}} + 1) + {\tilde{ξ}}_{i, j},
\end{aligned}

for all $i, j = 1, 2, \dots, N$ , if

\log W \geq \max \left\{ \frac{\log (8 K L N / η)}{q}, \frac{32 σ_{m}^{2} q}{ϵ^{2}}, \frac{2 \log (64 q)}{1 - 2 ϵ}, \frac{\max \left\{ 6.25, 4 {(‖ μ_{x} ‖_{\infty} + \max_{k, j} | s_{W_{k}} d_{j} |)}^{2} \right\}}{8 q σ_{m}^{2}}, \log 2 \right\},

where $τ_{j}$ and ${\tilde{τ}}_{j}$ denote bounded terms that are $O (σ_{w}^{2})$ or $O (\frac{1}{W})$ , $ξ_{i, j}$ and ${\tilde{ξ}}_{i, j}$ denote bounded terms that are $O (σ_{w}^{2})$ or $O (\frac{1}{W^{1 - 2 ϵ}})$ and $C_{1}, C_{2}, C_{3}$ and $C_{4}$ are bounded constants given in Appendix 2.

Proof. The proof of Theorem 1 is provided in Appendix 2.

■
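To get a sense of the condition on $\log W$ in Theorem 1, its right-hand side can be evaluated directly. The sketch below transcribes the five terms of the maximum; the function name and all numerical values in the example are hypothetical and are chosen only for illustration.

```python
import math

def min_log_W(K, L, N, eta, q, eps, sigma_m2, mu_inf, max_sd):
    """Right-hand side of the condition on log W in Theorem 1.

    sigma_m2 : largest diagonal entry of Sigma_x
    mu_inf   : infinity norm of mu_x
    max_sd   : maximum over k and j of |s_{W_k} d_j|
    """
    if not (q > 1 / 64 and 0 < eps < 0.5 and 0 < eta <= 0.5):
        raise ValueError("constants outside the ranges assumed by Theorem 1")
    terms = [
        math.log(8 * K * L * N / eta) / q,
        32 * sigma_m2 * q / eps ** 2,
        2 * math.log(64 * q) / (1 - 2 * eps),
        max(6.25, 4 * (mu_inf + max_sd) ** 2) / (8 * q * sigma_m2),
        math.log(2),
    ]
    return max(terms)

# Hypothetical setting: 10 windows, 10 trials, 10 neurons
threshold = min_log_W(K=10, L=10, N=10, eta=0.1, q=1.0, eps=0.25,
                      sigma_m2=0.1, mu_inf=0.0, max_sd=1.0)
```

For these made-up values the second term dominates and the threshold on $\log W$ is about 51.2, far beyond any practical window length. This illustrates that the condition is sufficient rather than necessary, and that the theorem is best read as a statement about scaling.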

In order to discuss the implications of this theoretical result, several remarks are in order:

Remark 1: Achieving near Oracle performance
A common benchmark in estimation theory is the performance of the idealistic oracle estimator, in which an oracle directly observes the true latent process $𝐱_{t, l}$ and the true kernel $𝐝_{j}$ and forms the correlation estimates. In this case, the oracle would incur zero bias and variance of order $𝒪 (1 / K L)$ in estimating $𝐝_{j}$ , and would output an estimate of $𝚺_{x}$ with bias and variance of order $𝒪 (1 / K L)$ . Theorem 1 indeed states that for sufficiently large $W$ and small $σ_{w}$ , the bias and variance of the ML estimators are arbitrarily close to those of the oracle estimator. Recall that our variational inference framework is in fact a solution technique for the regularized ML problem. Hence, the bounds in Theorem 1 provide a benchmark for the expected performance of the proposed estimators, by quantifying the excess bias and variance over the performance of the oracle estimator.
Remark 2: Effect of the observation noise and observation duration
As the assumed window of stationarity $W \to \infty$ (and hence the observation duration $T \to \infty$ ), the loss of performance of the proposed estimators only depends on $σ_{w}^{2}$ , the variance of the observation noise. As a result, at a given observation noise variance $σ_{w}^{2}$ , these bounds provide a sufficient upper bound on the time duration of the observations required for attaining a desired level of estimation accuracy. It is noteworthy that $σ_{w}^{2}$ is typically small in practice, as it pertains to the effective observation noise and is significantly diminished by pixel averaging of the fluorescence traces following cell segmentation.
Remark 3: Effect of the number of trials
Finally, note that the bounds in Theorem 1 have terms that also drop as the number of trials $L$ grows. These terms in fact pertain to the performance of the oracle estimator. As the number of trials grows ( $L \to \infty$ ), the oracle estimates become arbitrarily close to the true parameters $𝚺_{x}$ and $𝐝_{j}$ . Thus, our theoretical performance bounds also provide a sufficient upper bound on the number of trials $L$ required for the oracle estimator to attain a desired level of estimation accuracy.
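The $𝒪 (1 / K L)$ oracle terms can be illustrated with a small Monte Carlo experiment. The sketch assumes that the oracle forms the ML covariance estimate from $n = K L$ independent Gaussian samples of the latent process, in which case the expected estimate is $(n - 1) / n$ times the true covariance and the bias is $- 𝚺_{x} / n$. The covariance matrix, the sample sizes, and the function name are hypothetical.

```python
import numpy as np

def ml_covariance(X):
    """ML covariance estimate (divides by the number of samples) for the rows of X."""
    Xc = X - X.mean(axis=0, keepdims=True)
    return Xc.T @ Xc / X.shape[0]

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])   # toy ground-truth covariance
chol = np.linalg.cholesky(Sigma)
n = 20                                       # plays the role of K * L
reps = 20000                                 # Monte Carlo repetitions

est = np.zeros((2, 2))
for _ in range(reps):
    X = rng.standard_normal((n, 2)) @ chol.T  # n samples with covariance Sigma
    est += ml_covariance(X)
est /= reps

bias = est - Sigma                           # expected to be close to -Sigma / n
```

The empirical bias is close to $- 𝚺_{x} / n$, matching the form of the leading term $| {(Σ_{x})}_{i, j} | / (K L)$ in the bias bound of Theorem 1 when $η$ is small.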

Discussion

We developed a novel approach for the joint estimation of signal and noise correlations of neuronal activities directly from two-photon calcium imaging observations and tested our method with experimental data. Existing widely used methods either take the fluorescence traces as surrogates of spiking activity, or first recover the unobserved spikes using deconvolution techniques, both followed by computing Pearson correlations or connectivity matrices. As such, they typically result in estimates that are highly biased and are heavily dependent on the choice of the spike deconvolution technique. We addressed these issues by explicitly relating the signal and noise covariances to the observed two-photon data via a multi-tier Bayesian model that accounts for the observation process and non-linearities involved in spiking activity. We developed an efficient estimation framework by integrating techniques from variational inference and state-space estimation. We also established performance bounds on the bias and variance of the proposed estimators, which revealed favorable scaling with respect to the observation noise and trial length.

We demonstrated the utility of our proposed estimation framework on both simulated and experimentally recorded data from the mouse auditory cortex. In our simulation studies, we evaluated the robustness of our proposed method with respect to several model mismatch conditions induced by the stimulus integration model, calcium decay, SNR level, firing rate, and temporally correlated observation noise. In all cases, we observed that our proposed estimates outperform the existing methods in recovering the signal and noise correlations.

There are two main sources for the observed performance gap between our proposed method and existing approaches. The first source is the favorable soft decisions on the timing of spikes achieved by our method as a byproduct of the iterative variational inference procedure. An accurate probabilistic decoding of spikes results in better estimates of the signal and noise correlations, and conversely having more accurate estimates of the signal and noise covariances improves the probabilistic characterization of spiking events. This is in contrast with both the Pearson correlations computed from two-photon data and two-stage methods: in computing the Pearson correlations from two-photon data, spike timing is heavily blurred by the calcium decay; in the two-stage methods, erroneous hard decisions on the timing of spikes result in biases that propagate to and contaminate the downstream signal and noise correlation estimation and thus result in significant errors.

The second source of performance improvement is the explicit modeling of the non-linear mapping from stimulus and latent covariates to spiking through a canonical point process model, which is in turn tied to a two-photon observation model in a multi-tier Bayesian fashion. Our theoretical analysis in Theorem 1 corroborates that this virtue of our proposed methodology results in robust performance under a limited number of trials. As we have shown in Appendix 1, as the number of trials $L$ and trial duration $T$ tend to infinity, conventional notions of signal and noise correlation indeed recover the ground truth signal and noise correlations, as the biases induced by non-linearities average out across trial repetitions. However, as exemplified in Figure 2—figure supplement 2, in order to achieve comparable performance to our method using few trials (e.g. $L = 20$ ), the conventional correlation estimates require considerably more trials (e.g. $L = 1000$ ).

Application to two-photon data recorded from the mouse primary auditory cortex showed that unlike the aforementioned existing methods, our estimates provide noise correlation structures that are expectedly invariant across spontaneous and stimulus-driven conditions within an experiment, while producing signal correlation structures that are largely distinct from those given by noise correlation. These results provide evidence for the involvement of distinct functional neuronal network structures in encoding the stimulus-dependent and stimulus-independent information.

Our analysis of the relationship between the signal and noise correlations in layers 2/3 and 4 in mouse A1 indicated a smaller correlation between signal and noise correlations than previously reported (Winkowski and Kanold, 2013). Thus, our proposed method suggests that the signal and noise correlations reflect distinct circuit mechanisms of sound processing in layers 2/3 vs 4. The spatial distribution of signal correlations obtained by our method was consistent with previous work showing significant negative trends with distance (Winkowski and Kanold, 2013). However, in addition, our proposed method revealed a significant negative trend of noise correlations with distance in layer 2/3, but not in layer 4, in contrast to the outcome of Pearson correlation analysis. The lack of a negative trend in layer 4 could be attributed to the sparse nature of the noise correlation spread in layer 4, as revealed by our analysis of two-dimensional spatial spreads. The latter analysis indeed revealed that the noise correlations in layer 2/3 are more widespread than those in layer 4, consistent with existing work based on whole-cell patch recordings (Meng et al., 2017a; Meng et al., 2017b).

The two-dimensional spatial spreads of signal and noise correlations obtained by our method are more distinct than those obtained by Pearson correlations. The spatial spreads also allude to directionality of the functional connectivity patterns, with a notable rostrocaudal preference in layer 4. This result seems surprising in light of existing evidence for quasi-rostrocaudal organization of the tonotopic axis in mouse A1 (Romero et al., 2020). However, given the heterogeneity of tuning in both layers 2/3 and 4, with a best frequency interquartile range of ∼1–1.5 octaves over the imaging field (Bowen et al., 2020), and given our use of supra-threshold tones, we expect that the tones will drive not only neurons with the corresponding best frequency, but also neurons tuned to neighboring frequencies. Moreover, there is high connectivity between layer 4 cells within a few hundred μm across the tonotopic axis (Kratz and Manis, 2015; Meng et al., 2017a), potentially amplifying and broadening the effect of supra-threshold tones.

Our proposed method can scale up favorably to larger populations of neurons, thanks to the underlying low-complexity variational updates in the inference procedure. Due to its minimal dependence on training data, our estimation framework is also applicable to single-session analysis of two-photon data with a limited number of trials and limited duration. Another useful byproduct of the proposed framework is access to approximate posterior densities in closed form, which allows further statistical analyses such as construction of confidence intervals. Our proposed methodology can thus be used as a robust and scalable alternative to existing approaches for extracting neuronal correlations from two-photon calcium imaging data.

A potential limitation of our proposed generative model is the assumption that there is at most one spiking event per time frame for each neuron, in light of the fact that typical two-photon imaging frame durations are in the range of 30–100 ms. Average spike rates of excitatory neurons in mouse A1 layers 2/3 and 4 are of the order of $< 10$ Hz (Petrus et al., 2014; Forli et al., 2018) and thus our model is reasonable for the current study, although it might not be optimal during bursting activity. It is noteworthy that we relax this assumption in the inference framework by allowing the magnitude of putative spikes to be greater than one, thus alleviating the model mismatch during episodes of rapid increase in firing rate. This assumption can also be made more precise by adopting a Poisson model, but that would render closed-form variational density updates intractable.

Furthermore, in the regime of extremely low spiking rate and high observation noise, the proposed method may fail to capture the underlying correlations faithfully, and its performance degrades to that of existing methods based on Pearson correlations, as we have shown through our simulation studies. Nevertheless, our method addresses key limitations of conventional signal and noise correlation estimators that persist even in high spiking rate and high SNR conditions.

Our proposed estimation framework can be used as groundwork for incorporating other notions of correlation such as the connected correlation function (Martin et al., 2020), and to account for non-Gaussian and higher order structures arising from spatiotemporal interactions (Kadirvelu et al., 2017; Yu et al., 2011). Other possible extensions of this work include leveraging variational inference beyond the mean-field regime (Wang and Blei, 2013), extension to time-varying correlations that underlie rapid task-dependent dynamics, and extension to non-linear models such as those parameterized by neural networks (Aitchison et al., 2017). In the spirit of easing reproducibility, a MATLAB implementation of our proposed method as well as the data used in this work are made publicly available (Rupasinghe, 2020; Rupasinghe et al., 2021).

Materials and methods

Proposed forward model

Suppose we observe fluorescence traces of $N$ neurons, for a total duration of $T$ discrete-time frames, corresponding to $L$ independent trials of repeated stimulus. Let $y_{t, l} := [y_{t, l}^{(1)}, y_{t, l}^{(2)}, \dots, y_{t, l}^{(N)}]^{⊤}$ , $z_{t, l} := [z_{t, l}^{(1)}, z_{t, l}^{(2)}, \dots, z_{t, l}^{(N)}]^{⊤}$ , and $n_{t, l} := [n_{t, l}^{(1)}, n_{t, l}^{(2)}, \dots, n_{t, l}^{(N)}]^{⊤}$ be the vectors of noisy observations, intracellular calcium concentrations, and ensemble spiking activities, respectively, at trial $l$ and frame $t$ . We capture the dynamics of $𝐲_{t, l}$ by the following state-space model:

𝐲_{t, l} = 𝐀 𝐳_{t, l} + 𝐰_{t, l}, 𝐳_{t, l} = α 𝐳_{t - 1, l} + 𝐧_{t, l},

where $𝐀 \in ℝ^{N \times N}$ represents the scaling of the observations, $𝐰_{t, l}$ is zero-mean i.i.d. Gaussian noise with covariance $𝚺_{w}$ , and $0 \leq α < 1$ is the state transition parameter capturing the calcium dynamics through a first order model. Note that this state-space model is non-Gaussian due to the binary nature of the spiking activity, that is, $n_{t, l}^{(j)} \in \{0, 1\}$ . We model the spiking data as a point process or Generalized Linear Model with Bernoulli statistics (Eden et al., 2004; Paninski, 2004; Smith and Brown, 2003; Truccolo et al., 2005):

n_{t, l}^{(j)} \sim Bernoulli (λ_{t, l}^{(j)}), λ_{t, l}^{(j)} = ϕ (x_{t, l}^{(j)}, {d_{j}}^{⊤} s_{t}),

where $λ_{t, l}^{(j)}$ is the conditional intensity function (Truccolo et al., 2005), which we model as a non-linear function of the known external stimulus $𝐬_{t}$ and the other latent intrinsic and extrinsic trial-dependent covariates, $x_{t, l} := [x_{t, l}^{(1)}, x_{t, l}^{(2)}, \dots, x_{t, l}^{(N)}]^{⊤}$ . While we assume the stimulus $𝐬_{t} \in ℝ^{M}$ to be common to all neurons, we model the distinct effect of this stimulus on the $j^{𝗍𝗁}$ neuron via an unknown kernel $𝐝_{j} \in ℝ^{M}$ , akin to the receptive field.

The non-linear mapping of our choice is the logistic link, which is also the canonical link for a Bernoulli process in the point process and Generalized Linear Model frameworks (Truccolo et al., 2005). Thus, we assume:

ϕ (x_{t, l}^{(j)}, 𝐝_{j}^{⊤} 𝐬_{t}) = \frac{\exp (x_{t, l}^{(j)} + 𝐝_{j}^{⊤} 𝐬_{t})}{1 + \exp (x_{t, l}^{(j)} + 𝐝_{j}^{⊤} 𝐬_{t})} .

Finally, we assume the latent trial dependent covariates to be a Gaussian process $𝐱_{t, l} \sim 𝒩 (𝝁_{x}, 𝚺_{x})$ , with mean $μ_{x} := [μ_{x}^{(1)}, μ_{x}^{(2)}, \dots, μ_{x}^{(N)}]^{⊤}$ and covariance $𝚺_{x}$ .
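As an illustration of the forward model described above, the following Python sketch draws synthetic fluorescence traces from it. It is a minimal sketch under simplifying assumptions that are not part of the text: $𝐀 = a 𝐈$, $𝚺_{w} = σ_{w}^{2} 𝐈$, and a zero initial calcium concentration. The function name `simulate_forward_model` and its arguments are hypothetical and chosen only for this example.

```python
import numpy as np

def simulate_forward_model(D, s, mu_x, Sigma_x, alpha=0.95, a=1.0,
                           sigma_w=0.01, L=5, seed=0):
    """Sample (y, z, n) from the Bernoulli-logistic state-space model.

    D: (M, N) stimulus kernels, s: (T, M) stimulus (shared by all trials).
    Assumptions for this sketch: A = a*I, Sigma_w = sigma_w^2 * I, z_0 = 0.
    """
    rng = np.random.default_rng(seed)
    T, _ = s.shape
    N = D.shape[1]
    drive = s @ D                                   # (T, N): d_j^T s_t
    y = np.zeros((L, T, N))
    z = np.zeros((L, T, N))
    n = np.zeros((L, T, N), dtype=int)
    for l in range(L):
        # Latent trial-dependent covariates x_{t,l} ~ N(mu_x, Sigma_x)
        x = rng.multivariate_normal(mu_x, Sigma_x, size=T)
        lam = 1.0 / (1.0 + np.exp(-(x + drive)))    # logistic link
        n[l] = rng.random((T, N)) < lam             # Bernoulli spikes
        z_prev = np.zeros(N)
        for t in range(T):
            z_prev = alpha * z_prev + n[l, t]       # first-order calcium decay
            z[l, t] = z_prev
        y[l] = a * z[l] + sigma_w * rng.standard_normal((T, N))
    return y, z, n
```

The returned arrays are indexed as (trial, frame, neuron), so that `y[l, t]` plays the role of $𝐲_{t, l}$.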

The probabilistic graphical model in Figure 8 summarizes the main components of the aforementioned forward model. According to this forward model, the underlying noise covariance matrix that captures trial-to-trial variability can be identified as $𝚺_{x}$ . The signal covariance matrix, representing the covariance of the neural activity arising from the repeated application of the stimulus $𝐬_{t}$ , is given by $𝚺_{s} := 𝐃^{⊤} cov (𝐬_{t}, 𝐬_{t}) 𝐃$ , where $D := [d_{1}, d_{2}, \dots, d_{N}] \in R^{M \times N}$ . The signal and noise correlation matrices, denoted by $𝐒$ and $𝐍$ , can then be obtained by standard normalization of $𝚺_{s}$ and $𝚺_{x}$ :

{(𝐒)}_{i, j} := \frac{{(𝚺_{s})}_{i, j}}{\sqrt{{(𝚺_{s})}_{i, i} . {(𝚺_{s})}_{j, j}}}, {(𝐍)}_{i, j} := \frac{{(𝚺_{x})}_{i, j}}{\sqrt{{(𝚺_{x})}_{i, i} . {(𝚺_{x})}_{j, j}}}, \forall i, j = 1, 2, \dots, N .
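The normalization above can be sketched in a few lines of Python. The helper names `cov_to_corr` and `signal_covariance` are hypothetical, and the stimulus covariance is replaced here by its sample estimate, which is an assumption of this example.

```python
import numpy as np

def cov_to_corr(Sigma):
    """Normalize a covariance matrix to a correlation matrix."""
    d = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(d, d)

def signal_covariance(D, s):
    """Sigma_s = D^T cov(s_t, s_t) D, with D: (M, N) and s: (T, M).

    cov(s_t, s_t) is approximated by the sample covariance (assumption).
    """
    C = np.atleast_2d(np.cov(s, rowvar=False))
    return D.T @ C @ D
```

For instance, `cov_to_corr(signal_covariance(D, s))` would give the signal correlation matrix $𝐒$ for known kernels, and `cov_to_corr(Sigma_x)` the noise correlation matrix $𝐍$.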

The main problem is thus to estimate $\{𝚺_{x}, 𝐃\}$ from the noisy and temporally blurred data $\{𝐲_{t, l}\}_{t = 1, l = 1}^{T, L}$ .

Overview of the proposed estimation method

First, given a limited number of trials $L$ from an ensemble with typically low spiking rates, we need to incorporate suitable prior assumptions to avoid overfitting. Thus, we impose a prior $p_{𝗉𝗋} (𝚺_{x})$ on the noise covariance, to compensate for the sparsity of data. A natural method to estimate $\{𝚺_{x}, 𝐃\}$ is to maximize the observed data likelihood $p (\{y_{t, l}\}_{t, l = 1}^{T, L} | Σ_{x}, D)$ , that is, maximum likelihood (ML). To incorporate the prior in a Bayesian framework, we instead consider the joint likelihood of the observed data and latent processes to perform Maximum a Posteriori (MAP) estimation:

\begin{array}{rl} p (y, z, x, Σ_{x} | D) = & p_{𝗉𝗋} (Σ_{x}) \prod_{t, l = 1}^{T, L} \frac{1}{\sqrt{(2 π)^{N} | Σ_{w} |}} \exp (- \frac{1}{2} (y_{t, l} - A z_{t, l})^{⊤} Σ_{w}^{- 1} (y_{t, l} - A z_{t, l})) \\ & \times \prod_{t, l, j = 1}^{T, L, N} \frac{{(\exp (x_{t, l}^{(j)} + {d_{j}}^{⊤} s_{t}))}^{z_{t, l}^{(j)} - α z_{t - 1, l}^{(j)}}}{1 + \exp (x_{t, l}^{(j)} + {d_{j}}^{⊤} s_{t})} \\ & \times \prod_{t, l = 1}^{T, L} \frac{1}{\sqrt{(2 π)^{N} | Σ_{x} |}} \exp (- \frac{1}{2} (x_{t, l} - μ_{x})^{⊤} Σ_{x}^{- 1} (x_{t, l} - μ_{x})) . \end{array}

(4)

Inspecting this MAP problem soon reveals that estimating $𝚺_{x}$ and $𝐃$ is a challenging task: (1) standard approaches such as Expectation-Maximization (EM) (Shumway and Stoffer, 1982) are intractable due to the complexity of the model, arising from the hierarchy of latent processes and the non-linearities involved in their mappings and (2) the temporal coupling of the likelihood in the calcium concentrations makes any potential direct solver scale poorly with $T$ .

Thus, we propose an alternative solution based on Variational Inference (VI) (Beal, 2003; Blei et al., 2017; Jordan et al., 1999). VI is a method widely used in Bayesian statistics to approximate unwieldy posterior densities using optimization techniques, as a low-complexity alternative strategy to Markov Chain Monte Carlo sampling (Hastings, 1970) or empirical Bayes techniques such as EM. To this end, we treat ${𝐱_{t, l}}_{t, l = 1}^{T, L}$ and $𝚺_{x}$ as latent variables and ${𝐳_{t, l}}_{t, l = 1}^{T, L}$ and $𝐃$ as unknown parameters to be estimated. We introduce a framework to update the latent variables and parameters sequentially, with straightforward update rules. We will describe the main ingredients of the proposed framework in the following subsections. Hereafter, we use the shorthand notations $𝐲 := {𝐲_{t, l}}_{t, l = 1}^{T, L}$ , $𝐳 := {𝐳_{t, l}}_{t, l = 1}^{T, L}$ , and $𝐱 := {𝐱_{t, l}}_{t, l = 1}^{T, L}$ .

Preliminary assumptions

For the sake of simplicity, we assume that the constants α, $𝐀$ , $𝚺_{w}$ and $𝝁_{x}$ are either known or can be consistently estimated from pilot trials. Next, we take $p_{𝗉𝗋} (𝚺_{x})$ to be an Inverse Wishart density:

𝚺_{x} \sim {InvWish}_{N} (𝝍_{x}, ρ_{x}),

which turns out to be the conjugate prior in our model. Thus, $𝝍_{x}$ and $ρ_{x}$ will be the hyper-parameters of our model. Procedures for hyper-parameter tuning and choosing the key model parameters are given in subsections Hyper-parameter tuning and Guidelines for model parameter settings, respectively.

Decoupling via Pólya-Gamma augmentation

Direct application of VI to problems containing both discrete and continuous random variables results in intractable densities. Specifically, finding a variational distribution for $𝐱_{t, l}$ in our model with a standard distribution is not straightforward, due to the complicated posterior arising from co-dependent Bernoulli and Gaussian random variables. In order to overcome this difficulty, we employ Pólya-Gamma (PG) latent variables (Pillow and Scott, 2012; Polson et al., 2013; Linderman et al., 2016). We observe from Equation 4 that the posterior density, $p (𝐱 | 𝐳, 𝐃, 𝚺_{x})$ is conditionally independent in $t, l$ with:

p (𝐱_{t, l} | 𝐳, 𝐃, 𝚺_{x}) \propto p (𝐱_{t, l} | 𝚺_{x}) \prod_{j = 1}^{N} \frac{{(\exp (x_{t, l}^{(j)} + 𝐝_{j}^{⊤} 𝐬_{t}))}^{z_{t, l}^{(j)} - α z_{t - 1, l}^{(j)}}}{1 + \exp (x_{t, l}^{(j)} + 𝐝_{j}^{⊤} 𝐬_{t})} .

Thus, upon careful inspection, we see that this density has the desired form for the PG augmentation scheme (Polson et al., 2013). Accordingly, we introduce a set of auxiliary PG-distributed i.i.d. latent random variables $ω_{t, l} := [ω_{t, l}^{(1)}, ω_{t, l}^{(2)}, \dots, ω_{t, l}^{(N)}]^{⊤}$ , $ω_{t, l}^{(j)} \sim PG (1, 0)$ for $1 \leq j \leq N$ , $1 \leq t \leq T$ and $1 \leq l \leq L$ , to derive the complete data log-likelihood:

\begin{array}{rl} \log p (y, z, x, ω, Σ_{x} | D) = & - \frac{T L}{2} \log | Σ_{x} | + \log p_{𝗉𝗋} (Σ_{x}) + \sum_{t, l = 1}^{T, L} \Big\{ - \frac{1}{2} {(y_{t, l} - A z_{t, l})}^{⊤} Σ_{w}^{- 1} (y_{t, l} - A z_{t, l}) - \frac{1}{2} (x_{t, l} - μ_{x})^{⊤} Σ_{x}^{- 1} (x_{t, l} - μ_{x}) \\ & + \sum_{j = 1}^{N} \Big[ (z_{t, l}^{(j)} - α z_{t - 1, l}^{(j)} - \frac{1}{2}) (x_{t, l}^{(j)} + {d_{j}}^{⊤} s_{t}) - \frac{1}{2} ω_{t, l}^{(j)} {(x_{t, l}^{(j)} + {d_{j}}^{⊤} s_{t})}^{2} + \log p_{PG (1, 0)} (ω_{t, l}^{(j)}) \Big] \Big\} + C, \end{array}

(5)

where $𝝎 := {𝝎_{t, l}}_{t, l = 1}^{T, L}$ and $C$ accounts for terms not depending on $𝐲, 𝐳, 𝐱, 𝝎$ , $𝚺_{x}$ and $𝐃$ . The complete data log-likelihood is notably quadratic in $𝐳_{t, l}$ , which as we show later admits efficient estimation procedures with favorable scaling in $T$ .
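For reference, the augmentation rests on the integral identity of Polson et al., 2013, specialized here to the Bernoulli case. The shorthands $\psi := x_{t, l}^{(j)} + 𝐝_{j}^{⊤} 𝐬_{t}$ and $n := z_{t, l}^{(j)} - α z_{t - 1, l}^{(j)}$ are introduced only for this note:

```latex
\frac{(e^{\psi})^{n}}{1 + e^{\psi}}
  = \frac{1}{2}\, e^{(n - \frac{1}{2}) \psi}
    \int_{0}^{\infty} e^{- \omega \psi^{2} / 2}\, p_{PG(1, 0)}(\omega)\, d\omega .
```

Conditioned on $\omega$, the right-hand side is Gaussian in $\psi$, which is the origin of the linear term $(n - \frac{1}{2}) \psi$ and the quadratic term $- \frac{1}{2} \omega \psi^{2}$ in Equation 5. The factor $\frac{1}{2}$ does not depend on any variable of interest and is absorbed into $C$.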

Deriving the optimal variational densities

In this section, we will outline the procedure of applying VI to the latent variables $𝐱 = {𝐱_{t, l}}_{t, l = 1}^{T, L}, 𝝎 = {𝝎_{t, l}}_{t, l = 1}^{T, L}$ and $𝚺_{x}$ , assuming that the parameter estimates $\hat{𝐳}$ and $\hat{𝐃}$ of the previous iteration are available. The methods that we propose to update the parameters $\hat{𝐳}$ and $\hat{𝐃}$ subsequently, will be discussed in the next section.

The objective of variational inference is to posit a family of approximate densities $𝒬$ over the latent variables, and to find the member of that family that minimizes the Kullback-Leibler (KL) divergence to the exact posterior:

q^{*} (x, ω, Σ_{x} | \hat{z}, \hat{D}) = \underset{q \in 𝒬}{argmin} KL (q (x, ω, Σ_{x} | \hat{z}, \hat{D}) ‖ p (x, ω, Σ_{x} | y, \hat{z}, \hat{D})) .

However, evaluating the KL divergence is intractable, and it has been shown (Blei et al., 2017) that an equivalent result to this minimization can be obtained by maximizing the alternative objective function, called the evidence lower bound (ELBO):

ELBO (q) = E [\log p (x, ω, Σ_{x}, y | \hat{z}, \hat{D})] - E [\log q (x, ω, Σ_{x} | \hat{z}, \hat{D})] .
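The equivalence can be seen from the standard decomposition of the log-evidence, stated here in the notation of this section:

```latex
\log p(y \mid \hat{z}, \hat{D})
  = \mathrm{ELBO}(q)
  + \mathrm{KL}\big( q(x, \omega, \Sigma_{x} \mid \hat{z}, \hat{D})
      \,\big\|\, p(x, \omega, \Sigma_{x} \mid y, \hat{z}, \hat{D}) \big) .
```

Since the left-hand side does not depend on $q$, any increase in the ELBO is matched by an equal decrease in the KL divergence, so the maximizer of the ELBO over $𝒬$ is also the minimizer of the KL divergence over $𝒬$.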

Further, we assume $𝒬$ to be a mean-field variational family (Blei et al., 2017), resulting in the overall variational density of the form:

q (𝐱, 𝝎, 𝚺_{x}) = q (𝚺_{x}) \prod_{t, l = 1}^{T, L} (q (𝐱_{t, l}) \prod_{j = 1}^{N} q (ω_{t, l}^{(j)})) .

(6)

Under the mean field assumptions, the maximization of the ELBO can be derived using the optimization algorithm ‘Coordinate Ascent Variational Inference’ (CAVI) (Bishop, 2006; Blei et al., 2017). Accordingly, we see that the optimal variational densities in Equation 6 take the forms:

\begin{array}{ll} \log q^{*} (x_{t, l}) & \propto E_{q^{*} (Σ_{x}) q^{*} (ω_{t, l})} [\log p (x_{t, l} | ω_{t, l}, Σ_{x}, y, \hat{z}, \hat{D})], \\ \log q^{*} (ω_{t, l}^{(j)}) & \propto E_{q^{*} (x_{t, l})} [\log p (ω_{t, l}^{(j)} | x_{t, l}, Σ_{x}, y, \hat{z}, \hat{D})], \\ \log q^{*} (Σ_{x}) & \propto E_{q^{*} (x)} [\log p (Σ_{x} | x, y, \hat{z}, \hat{D})] . \end{array}

Upon evaluation of these expectations, we derive the optimal variational distributions as:

q^{*} (x_{t, l}) \sim 𝒩 (m_{x_{t, l}}, Q_{x_{t, l}}), q^{*} (ω_{t, l}^{(j)}) \sim PG (1, c_{t, l}^{(j)}), q^{*} (Σ_{x}) \sim {I n v W i s h}_{N} (P_{x}, γ_{x}),

whose parameters $m_{x_{t, l}} := {[m_{x_{t, l}}^{(1)}, m_{x_{t, l}}^{(2)}, \dots, m_{x_{t, l}}^{(N)}]}^{T}$ , $𝐐_{𝐱_{t, l}}$ , $c_{t, l}^{(j)}$ , $𝐏_{x}$ , and $γ_{x}$ can be updated given parameter estimates $\hat{𝐃}$ and $\hat{𝐳}$ :

\begin{array}{l} Q_{x_{t, l}} = ({\tilde{Ω}}_{t, l} + γ_{x} P_{x}^{- 1})^{- 1}, \quad m_{x_{t, l}} = Q_{x_{t, l}} ({\hat{z}}_{t, l} - α {\hat{z}}_{t - 1, l} - \frac{1}{2} 𝟏 - {\tilde{Ω}}_{t, l} {\hat{D}}^{⊤} s_{t} + γ_{x} P_{x}^{- 1} μ_{x}), \\ P_{x} := ψ_{x} + \sum_{t, l = 1}^{T, L} \{ Q_{x_{t, l}} + m_{x_{t, l}} m_{x_{t, l}}^{⊤} - μ_{x} m_{x_{t, l}}^{⊤} - m_{x_{t, l}} μ_{x}^{⊤} + μ_{x} μ_{x}^{⊤} \}, \quad c_{t, l}^{(j)} = \sqrt{{(Q_{x_{t, l}})}_{j, j} + {(m_{x_{t, l}}^{(j)} + {\hat{d}}_{j}^{⊤} s_{t})}^{2}}, \end{array}

and $γ_{x} := ρ_{x} + T L$ , with ${\tilde{Ω}}_{t, l} \in R^{N \times N}$ denoting a diagonal matrix with entries $({\tilde{Ω}}_{t, l})_{j, j} := \frac{1}{2 c_{t, l}^{(j)}} \tanh (\frac{c_{t, l}^{(j)}}{2})$ and $𝟏 \in ℝ^{N}$ denoting the vector of all ones.
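The updates for a single frame and trial can be sketched as follows. This is an illustrative Python transcription of the expressions above rather than the released MATLAB code; the function name `update_x_density` and its argument layout are hypothetical.

```python
import numpy as np

def update_x_density(z_t, z_prev, s_t, D_hat, mu_x, P_x, gamma_x,
                     omega_diag, alpha):
    """Variational updates for one (t, l) pair.

    z_t, z_prev: (N,) calcium estimates at frames t and t-1
    s_t: (M,) stimulus, D_hat: (M, N) kernels, mu_x: (N,)
    P_x: (N, N) and gamma_x: parameters of q*(Sigma_x)
    omega_diag: (N,) current diagonal of Omega-tilde
    """
    prior_prec = gamma_x * np.linalg.inv(P_x)    # E_q[Sigma_x^{-1}]
    Omega = np.diag(omega_diag)
    drive = D_hat.T @ s_t                        # d_j^T s_t for all j
    Q = np.linalg.inv(Omega + prior_prec)
    # The scalar 0.5 broadcasts as (1/2) * all-ones vector
    m = Q @ (z_t - alpha * z_prev - 0.5 - Omega @ drive + prior_prec @ mu_x)
    c = np.sqrt(np.diag(Q) + (m + drive) ** 2)
    omega_new = np.tanh(c / 2.0) / (2.0 * c)     # mean of PG(1, c)
    return Q, m, c, omega_new
```

Because $\tanh (c / 2) / (2 c)$ decreases from $1 / 4$ as $c$ grows, the entries of ${\tilde{Ω}}_{t, l}$ always lie in $(0, 1 / 4]$, which keeps $𝐐_{𝐱_{t, l}}$ well defined.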

Low-complexity parameter updates

Note that even though $𝐳$ is composed of the latent processes $𝐳_{t, l}$ , we do not use VI for its inference, and instead consider it as an unknown parameter. This choice is due to the temporal dependencies arising from the underlying state-space model in Equation 4, which hinders a proper assignment of variational densities under the mean field assumption. We thus seek to estimate both $𝐳$ and $𝐃$ using the updated variational density $q^{*} (𝐱, 𝝎, 𝚺_{x})$ .

First, note that the log-likelihood in Equation 5 is decoupled in $l$ , which admits independent updates to ${𝐳_{t, l}}_{t = 1}^{T}$ , for $l = 1, \dots, L$ . As such, given an estimate $\hat{𝐃}$ , we propose to estimate ${𝐳_{t, l}}_{t = 1}^{T}$ as:

\begin{array}{ll} \{{\hat{z}}_{t, l}\}_{t = 1}^{T} & = \underset{\{z_{t, l}\}_{t = 1}^{T}}{argmax} E_{q^{*} (x, ω, Σ_{x})} [\log p (y, z, x, ω, Σ_{x} | \hat{D})] \\ & = \underset{\{z_{t, l}\}_{t = 1}^{T}}{argmin} \sum_{t = 1}^{T} \{ \frac{1}{2} {(y_{t, l} - A z_{t, l})}^{⊤} Σ_{w}^{- 1} (y_{t, l} - A z_{t, l}) - \sum_{j = 1}^{N} (m_{x_{t, l}}^{(j)} + {\hat{d}}_{j}^{⊤} s_{t}) (z_{t, l}^{(j)} - α z_{t - 1, l}^{(j)}) \}, \end{array}

under the constraints $0 \leq z_{t, l}^{(j)} - α z_{t - 1, l}^{(j)} \leq 1$ , for $t = 1, \dots, T$ and $j = 1, \dots, N$ . These constraints are a direct consequence of $n_{t, l}^{(j)} = z_{t, l}^{(j)} - α z_{t - 1, l}^{(j)}$ being a Bernoulli random variable with $𝔼 [n_{t, l}^{(j)}] \in [0, 1]$ . While this problem is a quadratic program and can be solved using standard techniques, it is not readily decoupled in $t$ , and thus standard solvers would not scale favorably in $T$ .

Instead, we consider an alternative formulation that admits a low-complexity recursive solution by relaxing the constraints. To this end, we relax the constraint $𝐳_{t, l} - α 𝐳_{t - 1, l} ⪯ 𝟏$ and replace the constraint $𝐳_{t, l} - α 𝐳_{t - 1, l} ⪰ 𝟎$ by penalty terms proportional to $| z_{t, l}^{(j)} - α z_{t - 1, l}^{(j)} |$ . The resulting relaxed problem is thus given by:

min_{{𝐳_{t, l}}_{t = 1}^{T}} \sum_{t = 1}^{T} {\frac{1}{2} {(𝐲_{t, l} - {𝐀𝐳}_{t, l})}^{⊤} 𝚺_{w}^{- 1} (𝐲_{t, l} - {𝐀𝐳}_{t, l}) + \sum_{j = 1}^{N} ν_{t, l}^{(j)} | z_{t, l}^{(j)} - α z_{t - 1, l}^{(j)} |},

(7)

where $ν_{t, l}^{(j)} := β | m_{𝐱_{t, l}}^{(j)} + {\hat{𝐝}}_{j}^{⊤} 𝐬_{t} |$ with $β \geq 1$ being a hyper-parameter. Given that the typical spiking rates are quite low in practice, $m_{𝐱_{t, l}}^{(j)} + {\hat{𝐝}}_{j}^{⊤} 𝐬_{t}$ is expected to be a negative number. Thus, we have assumed that $- m_{𝐱_{t, l}}^{(j)} - {\hat{𝐝}}_{j}^{⊤} 𝐬_{t} = | m_{𝐱_{t, l}}^{(j)} + {\hat{𝐝}}_{j}^{⊤} 𝐬_{t} |$ .

The problem of Equation 7 pertains to compressible state-space estimation, for which fast recursive solvers are available (Kazemipour et al., 2018). The solver utilizes the Iteratively Re-weighted Least Squares (IRLS) (Ba et al., 2014) framework to transform the absolute value in the second term of the cost function into a quadratic form in $𝐳_{t, l}$ , followed by Fixed Interval Smoothing (FIS) (Rauch et al., 1965) to find the minimizer. At iteration $k$ , given a current estimate $𝐳^{[k - 1]}$ , the problem reduces to a Gaussian state-space estimation of the form:

𝐲_{t, l} = {𝐀𝐳}_{t, l} + 𝐰_{t, l}, 𝐳_{t, l} = α 𝐳_{t - 1, l} + 𝐯_{t, l},

(8)

with $𝐰_{t, l} \sim 𝒩 (0, 𝚺_{w})$ and $𝐯_{t, l} \sim 𝒩 (0, 𝚺_{𝐯_{t, l}}^{[k]})$ , where $𝚺_{𝐯_{t, l}}^{[k]} \in ℝ^{N \times N}$ is a diagonal matrix with ${(𝚺_{𝐯_{t, l}}^{[k]})}_{j, j} := \sqrt{{({\hat{z}}_{t, l}^{(j) [k - 1]} - α {\hat{z}}_{t - 1, l}^{(j) [k - 1]})}^{2} + ε^{2}} / ν_{t, l}^{(j)}$ , for some small constant $ε > 0$ . This problem can be efficiently solved using FIS, and the iterations proceed for a total of $K$ times or until a standard convergence criterion is met (Kazemipour et al., 2018). It is noteworthy that our proposed estimator of the calcium concentration $𝐳_{t, l}$ can be thought of as soft spike deconvolution, which naturally arises from our variational framework, as opposed to the hard spike deconvolution step used in two-stage estimators.
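The IRLS and FIS recursion can be sketched for a single neuron and a single trial as follows. This is a minimal Python illustration, assuming $𝐀 = a 𝐈$ and a diagonal $𝚺_{w}$ so that the problem decouples across neurons, and assuming a zero initial state; it is not the solver of Kazemipour et al., 2018, and the names `fis` and `irls_deconvolve` are hypothetical.

```python
import numpy as np

def fis(y, a, sigma_w2, alpha, q):
    """Scalar Kalman filter followed by a backward (fixed interval) smoother.

    q[t] is the state-noise variance at frame t (the IRLS weight).
    """
    T = len(y)
    z_pred, p_pred = np.zeros(T), np.zeros(T)
    z_filt, p_filt = np.zeros(T), np.zeros(T)
    z_prev, p_prev = 0.0, 0.0                  # assumed initial condition
    for t in range(T):                         # forward filter
        z_pred[t] = alpha * z_prev
        p_pred[t] = alpha ** 2 * p_prev + q[t]
        b = p_pred[t] * a / (a ** 2 * p_pred[t] + sigma_w2)
        z_filt[t] = z_pred[t] + b * (y[t] - a * z_pred[t])
        p_filt[t] = (1.0 - b * a) * p_pred[t]
        z_prev, p_prev = z_filt[t], p_filt[t]
    z_hat = z_filt.copy()
    for t in range(T - 2, -1, -1):             # backward smoother
        gain = alpha * p_filt[t] / p_pred[t + 1]
        z_hat[t] = z_filt[t] + gain * (z_hat[t + 1] - z_pred[t + 1])
    return z_hat

def irls_deconvolve(y, nu, a=1.0, sigma_w2=1e-2, alpha=0.95,
                    eps=1e-3, n_iter=20):
    """Approximate the minimizer of Equation 7 for one neuron and one trial."""
    q = 1.0 / nu                               # initial state-noise variances
    z_hat = np.zeros(len(y))
    for _ in range(n_iter):
        z_hat = fis(y, a, sigma_w2, alpha, q)
        innov = z_hat - alpha * np.concatenate(([0.0], z_hat[:-1]))
        q = np.sqrt(innov ** 2 + eps ** 2) / nu   # re-weighting step
    return z_hat
```

Each pass costs a number of operations linear in $T$, which is the source of the favorable scaling relative to a generic quadratic program solver.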

Finally, given $q^{*} (𝐱, 𝝎, 𝚺_{x})$ and the updated $\hat{𝐳}$ , the estimate of $𝐝_{j}$ for $j = 1, 2, \dots, N$ can be updated in closed-form by maximizing the expected complete log-likelihood $𝔼_{q^{*} (𝐱, 𝝎, 𝚺_{x})} [\log p (𝐲, \hat{𝐳}, 𝐱, 𝝎, 𝚺_{x} | 𝐃)]$ :

{\hat{d}}_{j} = {(\sum_{t, l = 1}^{T, L} (({\tilde{Ω}}_{t, l})_{j, j} s_{t} {s_{t}}^{⊤}))}^{- 1} (\sum_{t, l = 1}^{T, L} {({\hat{z}}_{t, l}^{(j)} - α {\hat{z}}_{t - 1, l}^{(j)} - \frac{1}{2}) s_{t} - ({\tilde{Ω}}_{t, l})_{j, j} m_{x_{t, l}}^{(j)} s_{t}}) .
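The closed-form kernel update amounts to solving an $M \times M$ linear system per neuron. A possible Python sketch is given below; the function name `update_kernel` and the (trial, frame) array layout are assumptions made for illustration.

```python
import numpy as np

def update_kernel(spike_est, omega_jj, m_j, S):
    """Closed-form update of d_j for one neuron j.

    spike_est: (L, T) values of z_hat_t - alpha * z_hat_{t-1}
    omega_jj:  (L, T) entries (Omega-tilde_{t,l})_{j,j}
    m_j:       (L, T) entries m_{x_{t,l}}^{(j)}
    S:         (T, M) stimulus, row t equal to s_t^T
    """
    # sum_{t,l} omega * s_t s_t^T, using that s_t is shared across trials
    w_t = omega_jj.sum(axis=0)                             # (T,)
    G = (S * w_t[:, None]).T @ S                           # (M, M)
    # sum_{t,l} (spike_est - 1/2 - omega * m) s_t
    r_t = (spike_est - 0.5 - omega_jj * m_j).sum(axis=0)   # (T,)
    return np.linalg.solve(G, S.T @ r_t)
```

Using `np.linalg.solve` instead of forming the explicit inverse is a numerical design choice of this sketch; both give the same estimate when the left-hand matrix is well conditioned.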

The VI procedure iterates between updating the variational densities and parameters until convergence, upon which we estimate the noise and signal covariances as:

{\hat{𝚺}}_{x} := mode {q^{*} (𝚺_{x})} = \frac{𝐏_{x}}{γ_{x} + N + 1}, {\hat{𝚺}}_{s} := {\hat{𝐃}}^{⊤} 𝔼 [𝐬_{t} 𝐬_{t}^{⊤}] \hat{𝐃} .

The overall combined iterative procedure is outlined in Algorithm 1. Furthermore, a MATLAB implementation of this algorithm is publicly available in Rupasinghe, 2020. It is worth noting that a special case of our proposed variational inference procedure can be used to estimate signal and noise correlations from electrophysiology recordings. Given that spiking activity, that is ${𝐧_{t, l}}_{t, l = 1}^{T, L}$ , is directly observed in this case, the solution to the optimization problem in Equation (7) is no longer required. Thus, the parameters $𝚺_{x}$ and $𝐃$ can be estimated using a simplified variational procedure, which is outlined in Algorithm 2 in Appendix 3.

Guidelines for model parameter settings

There are several key model parameters that need to be set by the user prior to the application of our proposed method. Here, we provide our rationale and criteria for choosing these parameters, which could also serve as guidelines in facilitating the applicability and adoption of our method by future users. We will also provide the specific choices of these parameters used in our simulation studies and real data analyses.

Number of neurons selected for the analysis ( $N$ )

While our proposed method scales up well with the population size due to the low-complexity update rules involved, including neurons with negligible spiking activity in the analysis would only increase the complexity and potentially contaminate the correlation estimates. Thus, we performed an initial pre-processing step to extract the $N$ neurons that exhibited at least one spiking event in at least half of the trials considered.
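The selection criterion can be expressed compactly, assuming that putative spiking events have already been marked by some upstream detector. The text does not specify how events are detected, so the boolean input array and the name `select_active_neurons` are hypothetical.

```python
import numpy as np

def select_active_neurons(events, min_fraction=0.5):
    """Indices of neurons with >= 1 event in at least min_fraction of trials.

    events: boolean array of shape (L, T, N) marking putative spiking events.
    """
    active_trials = events.any(axis=1)       # (L, N): any event in the trial
    fraction = active_trials.mean(axis=0)    # fraction of active trials
    return np.flatnonzero(fraction >= min_fraction)
```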

Stimulus integration window length ( $R$ )

The number of lags $R$ considered in stimulus integration is a key parameter that can be set through data-driven approaches or using prior domain knowledge. Examples of common data-driven criteria include cross-validation, Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), which balance the estimation accuracy and model complexity (Arlot and Celisse, 2010; Ding et al., 2018).

To quantify the effect of $R$ on model complexity, we first describe the stimulus encoding model in our framework. Suppose that the onset of the $p^{𝗍𝗁}$ tone in the stimulus set ( $p = 1, \dots, P$ , where $P$ is the number of distinct tones) is given by a binary sequence $f_{t}^{(p)} \in \{0, 1\}$ . The choice of $R$ implies that the response at time $t$ post-stimulus depends only on the $R$ most recent time lags. As such, the effective stimulus at time $t$ corresponding to tone $p$ is given by $𝐬_{t}^{(p)} := {[f_{t}^{(p)}, f_{t - 1}^{(p)}, \dots, f_{t - R + 1}^{(p)}]}^{⊤} \in ℝ^{R}$ . By including all the $P$ tones, the overall effective stimulus at the $t^{𝗍𝗁}$ time frame is given by $𝐬_{t} := {[𝐬_{t}^{(1) ⊤}, \dots, 𝐬_{t}^{(P) ⊤}]}^{⊤} \in ℝ^{R P}$ . The stimulus modulation vector $𝐝_{j}$ would thus be $R P$ -dimensional. As a result, the number of parameters ( $M = R P$ ) to be estimated increases linearly with $R$ . Using additional domain knowledge, we chose $R$ to be large enough to capture the stimulus effects, and at the same time small enough to control the complexity of the algorithm.
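The construction of the effective stimulus can be sketched as a lagged design matrix. The Python example below assumes that onset indicators before the first frame are zero, which the text does not state; the function name `effective_stimulus` is hypothetical.

```python
import numpy as np

def effective_stimulus(onsets, R):
    """Stack R lags of each tone-onset sequence.

    onsets: binary array of shape (T, P), onsets[t, p] = f_t^{(p)}
    Returns S of shape (T, R * P) whose row t is s_t^T, ordered as
    [f_t^{(1)}, ..., f_{t-R+1}^{(1)}, ..., f_t^{(P)}, ..., f_{t-R+1}^{(P)}].
    """
    T, P = onsets.shape
    S = np.zeros((T, R * P))
    for p in range(P):
        for r in range(R):
            # Lag r: frame t sees the onset indicator from frame t - r
            S[r:, p * R + r] = onsets[:T - r, p]
    return S
```

With $P = 4$ tones and $R = 25$ lags, each row has $M = 100$ entries, matching the dimension of $𝐝_{j}$.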

As an example, given that the typical tone response duration of mouse primary auditory neurons is $< 1$ s (Linden et al., 2003; DeWeese et al., 2003; Petrus et al., 2014), with a sampling frequency of $f_{s} = 30$ Hz, a choice of $R \sim 30$ would suffice to capture the stimulus effects. By further examining the effect of varying $R$ on the proposed correlation estimates in Figure 4—figure supplement 1, we chose $R = 25$ for our real data analyses.

Observation noise covariance ( $𝚺_{w}$ ) and scaling matrix ( $𝐀$ )

We assumed that the observation noise covariance $𝚺_{w}$ is diagonal, and estimated the diagonal elements using the background fluorescence in the absence of spiking events, for each neuron. We set $𝐀 = a 𝐈$ , where $𝐈 \in ℝ^{N \times N}$ represents the identity matrix, and estimated $a$ by considering the average increase in fluorescence after the occurrence of isolated spiking events. Specifically, we derived the average fluorescence activity of multiple trials triggered to the fluorescence rise onset, and set $a$ as the increment in the magnitude of this average fluorescence immediately following the rise onset.

State transition parameter (α)

We chose α in the range $[0.95, 0.98]$ , which matches the slow dynamics of the calcium indicator in our data. We tested the robustness of our estimates under different choices of α in this range through the method outlined in Hyper-parameter tuning, and accordingly chose the optimal value of α.

Mean of the latent trial-dependent process ( $𝝁_{x}$ )

We estimated $𝝁_{x}$ as a constant that is proportional to the average firing rate. To this end, we parametrized each component of $𝝁_{x}$ as $μ_{x}^{(j)} = - a_{μ} + b_{μ} \frac{1}{L T} \sum_{t, l = 1}^{T, L} y_{t, l}^{(j)}$ , for $j = 1, \dots, N$ . The constants $a_{μ}$ and $b_{μ}$ were chosen such that $- 10 \leq μ_{x}^{(j)} \leq - 2$ , which gives the range of baseline parameters compatible with observed firing rates in our experimental data.

Parameter choices for simulation study 1

In the first simulation study, we set $α = 0.98$ , $β = 8$ , $𝐀 = 0.1 𝐈$ , $𝝁_{x} = - 4.5 𝟏$ and $𝚺_{w} = 2 \times 10^{- 4} 𝐈$ ( $𝐈 \in ℝ^{8 \times 8}$ represents the identity matrix and $𝟏 \in ℝ^{8}$ represents the vector of all ones), so that the SNR of simulated data was in the same range as that of experimentally-recorded data. We used a $6^{𝗍𝗁}$ order autoregressive process with a mean of -1 as the stimulus ( $s_{t}$ ), and considered $R = 2$ ( $M = 2$ ) lags of the stimulus (i.e. $𝐬_{t} = {[s_{t}, s_{t - 1}]}^{⊤}$ ) in both the generative model and inference procedure. The components of the linear and quadratic stimulus modulation vectors, that is $𝐝_{j}$ , ${\tilde{𝐝}}_{j, 1}$ and ${\tilde{𝐝}}_{j, 2}$ , were chosen at random uniformly in the range $[- 0.5, 0.5]$ . The variance of $s_{t}$ was set in each case such that the average power of the overall signal component ( $𝐝_{j}^{⊤} 𝐬_{t}$ for the linear model, and $𝐝_{j}^{⊤} 𝐬_{t} + {({\tilde{𝐝}}_{j, 1}^{⊤} 𝐬_{t})}^{2} + {({\tilde{𝐝}}_{j, 2}^{⊤} 𝐬_{t})}^{2}$ for the non-linear model) was comparable to the average power of the noise component ( $x_{t, l}^{(j)}$ ).

Algorithm 1 Estimation of

Σ_{x}

and

D

through the proposed iterative procedure

Inputs: Ensemble of fluorescence measurements

{𝐲_{t, l}}_{t, l = 1}^{T, L}

, constants

α, 𝐀, 𝚺_{w}

and

𝝁_{x}

, hyper-parameters

𝝍_{x}

ρ_{x}

, β and

ϵ

, tolerance at convergence δ and the external stimulus

𝐬_{t}

Outputs:

{\hat{𝚺}}_{x}

and

\hat{𝐃}

Initialization: Initial choice of

𝚺_{𝐯_{t, l}}

{\tilde{Ω}}_{t}

{\hat{𝚺}}_{x}

and

\hat{𝐃}

residual = 10 δ

γ_{x} = ρ_{x} + L T

1: while

residual \geq δ

do
Estimate calcium concentrations using Fixed Interval Smoothing
2: for

l = 1, \dots, L

do
Forward filter:
3: for

t = 1, . . ., T

do
4:

z_{(t | t - 1), l} = α z_{(t - 1 | t - 1), l}

P_{(t | t - 1), l} = α^{2} P_{(t - 1 | t - 1), l} + Σ_{v_{t, l}}

B_{t, l} = P_{(t | t - 1), l} A^{⊤} (A P_{(t | t - 1), l} A^{⊤} + Σ_{w})^{- 1}

z_{(t | t), l} = z_{(t | t - 1), l} + B_{t, l} (y_{t, l} - A z_{(t | t - 1), l})

P_{(t | t), l} = (I - B_{t, l} A) P_{(t | t - 1), l}

9: end for
Backward smoother:
10: for $t = T - 1, \dots, 1$ do
11: ${\hat{z}}_{t, l} = z_{(t | t), l} + α P_{(t | t), l} P_{(t + 1 | t), l}^{- 1} ({\hat{z}}_{t + 1, l} - z_{(t + 1 | t), l})$
12: end for
13: end for
Update variational parameters
14: for $t = 1, \dots, T$ and $l = 1, \dots, L$ do
15: $𝐐_{𝐱_{t, l}} = {({\tilde{𝛀}}_{t, l} + γ_{x} 𝐏_{x}^{- 1})}^{- 1}$
16: $m_{x_{t, l}} = Q_{x_{t, l}} ({\hat{z}}_{t, l} - α {\hat{z}}_{t - 1, l} - \frac{1}{2} 1 - {\tilde{Ω}}_{t, l} {\hat{D}}^{⊤} s_{t} + γ_{x} P_{x}^{- 1} μ_{x})$
17: $v_{t, l}^{(j)} := β | m_{x_{t, l}}^{(j)} + {\hat{d}}_{j}^{⊤} s_{t} |$
18: for $j = 1, \dots, N$ do
19: $c_{t, l}^{(j)} = \sqrt{{(𝐐_{𝐱_{t, l}})}_{j, j} + {(m_{𝐱_{t, l}}^{(j)} + {\hat{𝐝}}_{j}^{⊤} 𝐬_{t})}^{2}}$
20: $({\tilde{Ω}}_{t, l})_{j, j} := \frac{1}{2 c_{t, l}^{(j)}} \tanh (\frac{c_{t, l}^{(j)}}{2})$
21: end for
22: end for
23: $P_{x} := ψ_{x} + \sum_{t, l = 1}^{T, L} {Q_{x_{t, l}} + m_{x_{t, l}} m_{x_{t, l}}^{⊤} - μ_{x} m_{x_{t, l}}^{⊤} - m_{x_{t, l}} μ_{x}^{⊤} + μ_{x} μ_{x}^{⊤}}$
Update IRLS covariance approximation
24: for $l = 1, \dots, L$ , $t = 1, \dots, T$ and $j = 1, \dots, N$ do
25: $(Σ_{v_{t, l}})_{j, j} := \sqrt{({\hat{z}}_{t, l}^{(j)} - α {\hat{z}}_{t - 1, l}^{(j)})^{2} + ε^{2}} / v_{t, l}^{(j)}$
26: end for
Update outputs and the convergence criterion
27: for $j = 1, \dots, N$ do
28: ${\hat{d}}_{j} = {(\sum_{t, l = 1}^{T, L} (({\tilde{Ω}}_{t, l})_{j, j} s_{t} {s_{t}}^{⊤}))}^{- 1} (\sum_{t, l = 1}^{T, L} {({\hat{z}}_{t, l}^{(j)} - α {\hat{z}}_{t - 1, l}^{(j)} - \frac{1}{2}) s_{t} - ({\tilde{Ω}}_{t, l})_{j, j} m_{x_{t, l}}^{(j)} s_{t}})$
29: end for
30: ${(\hat{𝐃})}_{𝗉𝗋𝖾𝗏} = \hat{𝐃}$ , $\hat{D} = [{\hat{d}}_{1}, {\hat{d}}_{2}, \dots, {\hat{d}}_{N}]$
31: ${({\hat{𝚺}}_{x})}_{𝗉𝗋𝖾𝗏} = {\hat{𝚺}}_{x}$ , ${\hat{Σ}}_{x} = \frac{P_{x}}{γ_{x} + N + 1}$
32: residual $= {∥ {({\hat{𝚺}}_{x})}_{𝗉𝗋𝖾𝗏} - {\hat{𝚺}}_{x} ∥}_{2} / {∥ {({\hat{𝚺}}_{x})}_{𝗉𝗋𝖾𝗏} ∥}_{2} + {∥ {(\hat{𝐃})}_{𝗉𝗋𝖾𝗏} - \hat{𝐃} ∥}_{2} / {∥ {(\hat{𝐃})}_{𝗉𝗋𝖾𝗏} ∥}_{2}$
33: end while
34: Return ${\hat{𝚺}}_{x}$ and $\hat{𝐃}$
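Steps 19 and 20 of Algorithm 1 can be illustrated numerically. The following is a minimal sketch for a single time frame and trial, not the authors' implementation; the function name `update_omega_diag` and its argument names are hypothetical, and the guard for very small $c$ (where the expression tends to $1/4$) is our own addition.

```python
import numpy as np

def update_omega_diag(q_diag, m, stim_drive):
    """Sketch of steps 19-20 for one (t, l).

    q_diag     : diagonal of Q_{x_{t,l}}, shape (N,)
    m          : variational mean m_{x_{t,l}}, shape (N,)
    stim_drive : d_j^T s_t for each neuron j, shape (N,)
    Returns (c, omega_diag), each of shape (N,).
    """
    q_diag = np.asarray(q_diag, dtype=float)
    drive = np.asarray(m, dtype=float) + np.asarray(stim_drive, dtype=float)
    # Step 19: c = sqrt(Q_jj + (m_j + d_j^T s_t)^2)
    c = np.sqrt(q_diag + drive ** 2)
    # Step 20: Omega_jj = tanh(c / 2) / (2 c); the limit as c -> 0 is 1/4.
    small = c < 1e-8
    safe_c = np.where(small, 1.0, c)
    omega = np.where(small, 0.25, np.tanh(safe_c / 2.0) / (2.0 * safe_c))
    return c, omega
```

The diagonal entries returned here would populate ${\tilde{𝛀}}_{t, l}$ before the next pass through steps 15 and 16.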


Parameter choices for simulation study 2

In the second simulation study, we set $α = 0.98$ , $𝐀 = 0.1 𝐈$ , $𝝁_{x} = - 4.5 𝟏$ and $𝚺_{w} = 10^{- 4} 𝐈$ ( $𝐈 \in ℝ^{30 \times 30}$ represents the identity matrix and $𝟏 \in ℝ^{30}$ represents the vector of all ones) when generating the fluorescence traces ${𝐲_{t, l}}_{t, l = 1}^{T, L}$ , so that the SNR of the simulated data was in the same range as that of real calcium imaging observations. Furthermore, we simulated the spike trains based on a Poisson process (Smith and Brown, 2003) using the discrete time re-scaling procedure (Brown et al., 2002; Smith and Brown, 2003). Following the assumptions in Brown et al., 2002, we used an exponential link to simulate the observations:

n_{t, l}^{(j)} \sim Poisson (λ_{t, l}^{(j)}), λ_{t, l}^{(j)} = \exp (x_{t, l}^{(j)}),

as opposed to the Bernoulli-logistic assumption in our recognition model. Then, we estimated the noise covariance ${\hat{𝚺}}_{x}$ using Algorithm 1, with a slight modification. Since there are no external stimuli, we set $𝐬_{t} = 𝟎$ and $𝐃 = 𝟎$ . Accordingly, in Algorithm 1, we initialized $\hat{𝐃} = 𝟎$ and did not perform the update on $\hat{𝐃}$ in the subsequent iterations.
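The mismatched generative setting of this study can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code used in the paper: the latent process is drawn independently across frames and trials, spikes are drawn with NumPy's Poisson sampler instead of the discrete time re-scaling procedure, and the function name `simulate_fluorescence` and the default `mu=-4.5` (our reading of the stated mean as a constant vector) are hypothetical choices.

```python
import numpy as np

def simulate_fluorescence(T, L, N, sigma_x, mu=-4.5, alpha=0.98,
                          a=0.1, sigma_w2=1e-4, seed=0):
    """Simulate y_{t,l} = A z_{t,l} + w_{t,l} with Poisson spiking.

    sigma_x : (N, N) noise covariance of the latent process x_{t,l}
    Returns (y, n) with shapes (T, L, N).
    """
    rng = np.random.default_rng(seed)
    # Latent process x_{t,l} ~ N(mu * 1, Sigma_x), assumed i.i.d. over t and l.
    x = rng.multivariate_normal(np.full(N, mu), sigma_x, size=(T, L))
    # Exponential link: n ~ Poisson(exp(x)).
    n = rng.poisson(np.exp(x))
    # Calcium dynamics: z_t = alpha * z_{t-1} + n_t, starting from z_0 = 0.
    z = np.zeros((T, L, N))
    prev = np.zeros((L, N))
    for t in range(T):
        prev = alpha * prev + n[t]
        z[t] = prev
    # Observation: y = a * z + w, with w ~ N(0, sigma_w2 * I) and A = a * I.
    y = a * z + rng.normal(0.0, np.sqrt(sigma_w2), size=z.shape)
    return y, n
```

Because the spikes come from a Poisson model while the estimator assumes Bernoulli-logistic spiking, traces generated this way probe robustness to model mismatch.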

Parameter choices for real data study 1

The dataset consisted of recordings from 371 excitatory neurons, from which we selected $N = 16$ responsive neurons for the analysis. Each trial consisted of $T = 3600$ time frames (the sampling frequency was 30 Hz, and each trial had a duration of 120 s), with the presentation of a random sequence of four tones ( $P = 4$ ). The spiking events were very sparse and infrequent, and hence this dataset fits our model with at most one spiking event in a time frame.

We considered $R = 25$ ( $M = 100$ ) time lags in this analysis and further examined the effect of varying $R$ in Figure 4—figure supplement 1. We set $α = 0.95$ and $𝐀 = 𝐈$ ( $𝐈 \in ℝ^{16 \times 16}$ represents the identity matrix).

Parameter choices for real data study 2

Each trial consisted of $T = 765$ frames (25.5 s) at a sampling frequency of 30 Hz. The A1 neurons studied here had low response rates (in both time and space), with only $\sim 10$ neurons exhibiting spiking activity in at least half of the trials. Thus, we selected $N = 10$ neurons and $L = 10$ trials for the analysis, and chose $R = 25$ lags of the stimulus ( $M = 25$ ) in the model for the stimulus-driven condition. We set $α = 0.95$ and $𝐀 = 0.75 𝐈$ ( $𝐈 \in ℝ^{10 \times 10}$ represents the identity matrix).

Parameter choices for real data study 3

Each experiment consisted of $L = 5$ trials of $P = 9$ different tone frequencies repeated at four different amplitude levels, resulting in each concatenated trial being $\sim 180$ s long (see Bowen et al., 2020 for more details). We set the number of stimulus time lags considered to be $R = 25$ ( $M = 225$ ). For each layer, we analyzed fluorescence observations from six experiments. In each experiment, we selected the most responsive $N \sim 20$ neurons for the subsequent analysis. We set $α = 0.95$ and $𝐀 = 𝐈$ .

Performance evaluation

Simulation studies

Since the ground truth is known in simulations, we directly compared the performance of each signal and noise correlation estimate with the ground truth signal and noise correlations, respectively. Suppose the ground truth correlations are given by the matrix $𝐗$ and the estimated correlations are given by the matrix $\hat{𝐗}$ . To quantify the similarity between $𝐗$ and $\hat{𝐗}$ , we defined the following two metrics:

Normalized Mean Squared Error (NMSE): The NMSE computes the mean squared error of $\hat{𝐗}$ with respect to $𝐗$ using the Frobenius Norm:

N M S E := \frac{‖ X - \hat{X} ‖_{F}^{2}}{‖ X ‖_{F}^{2}} .

Ratio between out-of-network power and in-network power (leakage): First, we identified the in-network and out-of-network components from the ground truth correlation matrix $𝐗$ . Suppose that if the true correlation between the $i^{𝗍𝗁}$ neuron and the $j^{𝗍𝗁}$ neuron is non-zero, then $| {(𝐗)}_{i, j} | > δ_{x}$ , for some $δ_{x} > 0$ . Thus, we formed a matrix $𝐗^{𝗂𝗇}$ that masks the in-network components, by setting ${(𝐗^{𝗂𝗇})}_{i, j} = 1$ if $| {(𝐗)}_{i, j} | > δ_{x}$ and ${(𝐗^{𝗂𝗇})}_{i, j} = 0$ if $| {(𝐗)}_{i, j} | \leq δ_{x}$ . Likewise, we also formed a matrix $𝐗^{𝗈𝗎𝗍}$ that masks the out-of-network components, by setting ${(𝐗^{𝗈𝗎𝗍})}_{i, j} = 1$ if $| {(𝐗)}_{i, j} | \leq δ_{x}$ and ${(𝐗^{𝗈𝗎𝗍})}_{i, j} = 0$ if $| {(𝐗)}_{i, j} | > δ_{x}$ . Then, using these two matrices we quantified the leakage effect of $\hat{𝐗}$ relative to $𝐗$ by:

leakage := \frac{{∥ \hat{𝐗} \cdot 𝐗^{𝗈𝗎𝗍} ∥}_{F}^{2}}{{∥ \hat{𝐗} \cdot 𝐗^{𝗂𝗇} ∥}_{F}^{2}},

where $(\cdot)$ denotes element-wise multiplication.
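The two metrics above translate directly into a few lines of NumPy. The sketch below is illustrative; the function names `nmse` and `leakage` and the argument name `delta` (standing for $δ_{x}$) are our own.

```python
import numpy as np

def nmse(X, X_hat):
    """Normalized mean squared error of X_hat with respect to X (Frobenius norm)."""
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    return np.linalg.norm(X - X_hat, "fro") ** 2 / np.linalg.norm(X, "fro") ** 2

def leakage(X, X_hat, delta):
    """Ratio of out-of-network to in-network power of X_hat.

    The in-network mask marks entries of the ground truth X with |X_ij| > delta;
    the out-of-network mask is its complement.
    """
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    in_mask = np.abs(X) > delta
    out_mask = ~in_mask
    # Element-wise masking followed by squared Frobenius norms.
    return np.sum((X_hat * out_mask) ** 2) / np.sum((X_hat * in_mask) ** 2)
```

A perfect estimate gives `nmse == 0`, and an estimate with no spurious off-network entries gives `leakage == 0`.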

Real data studies

To quantify the similarity and dissimilarity between signal and noise correlation estimates, we used a statistic based on the Tanimoto similarity metric (Lipkus, 1999), denoted by $T_{s} (𝐗, 𝐘)$ for two matrices $𝐗$ and $𝐘$ . For two vectors $𝐚$ and $𝐛$ with non-negative entries, the Tanimoto coefficient (Lipkus, 1999) is defined as:

T (a, b) := \frac{a^{T} b}{a^{T} a + b^{T} b - a^{T} b} .

The Tanimoto similarity metric between two matrices can be defined in a similar manner, by vectorizing the matrices. Thus, we formulated a similarity metric between two correlation matrices $𝐗$ and $𝐘$ as follows. Let $𝐗^{+} := \max {𝐗, 0 𝐈}$ and $𝐗^{-} := \max {- 𝐗, 0 𝐈}$ , with the $\max {\cdot, \cdot}$ operator interpreted element-wise. Note that $𝐗 = 𝐗^{+} - 𝐗^{-}$ , and $𝐗^{+}, 𝐗^{-}$ have non-negative entries. We then defined the similarity metric by combining those of the positive and negative parts as follows:

T_{s} (𝐗, 𝐘) := ε T (𝐗^{+}, 𝐘^{+}) + (1 - ε) T (𝐗^{-}, 𝐘^{-})

where $ε \in [0, 1]$ denotes the percentage of positive entries in $𝐗$ and $𝐘$ . As a measure of dissimilarity, we used $T_{d} (𝐗, 𝐘) := 1 - T_{s} (𝐗, 𝐘)$ . The values of $T_{d} (\hat{𝐒}, \hat{𝐍})$ in Table 1 and $T_{s} ({\hat{𝐍}}_{𝗌𝗉𝗈𝗇}, {\hat{𝐍}}_{𝗌𝗍𝗂𝗆})$ and $T_{d} ({\hat{𝐒}}_{𝗌𝗍𝗂𝗆}, {\hat{𝐍}}_{𝗌𝗍𝗂𝗆})$ reported in Table 2 were obtained based on the foregoing definitions.
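A compact sketch of $T_{s}$ and $T_{d}$ follows. Two points are assumptions on our part and not specified in the text: $ε$ is computed as the fraction of strictly positive entries pooled over both matrices, and a Tanimoto coefficient with an all-zero pair of arguments is set to 0. The function names are hypothetical.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto coefficient of two non-negative arrays (vectorized first)."""
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    ab = a @ b
    denom = a @ a + b @ b - ab
    # Assumption: return 0 when both inputs are identically zero.
    return ab / denom if denom > 0 else 0.0

def tanimoto_similarity(X, Y):
    """T_s(X, Y) = eps * T(X+, Y+) + (1 - eps) * T(X-, Y-)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Assumption: eps is the pooled fraction of positive entries in X and Y.
    eps = (np.sum(X > 0) + np.sum(Y > 0)) / (X.size + Y.size)
    Xp, Xm = np.maximum(X, 0.0), np.maximum(-X, 0.0)
    Yp, Ym = np.maximum(Y, 0.0), np.maximum(-Y, 0.0)
    return eps * tanimoto(Xp, Yp) + (1.0 - eps) * tanimoto(Xm, Ym)

def tanimoto_dissimilarity(X, Y):
    """T_d(X, Y) = 1 - T_s(X, Y)."""
    return 1.0 - tanimoto_similarity(X, Y)
```

For a matrix with both positive and negative entries, `tanimoto_similarity(X, X)` equals 1 and `tanimoto_similarity(X, -X)` equals 0, which matches the intended reading of the metric.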

To further assess the statistical significance of these results, we performed the following randomized tests. To test the significance of $T_{s} ({\hat{𝐍}}_{𝗌𝗉𝗈𝗇}, {\hat{𝐍}}_{𝗌𝗍𝗂𝗆})$ , for each comparison and each algorithm, we fixed the first matrix (i.e. ${\hat{𝐍}}_{𝗌𝗉𝗈𝗇}$ ) and randomly shuffled the entries of the second one (i.e. ${\hat{𝐍}}_{𝗌𝗍𝗂𝗆}$ ) while respecting symmetry. We repeated this procedure for 10,000 trials, to derive the null distributions that represented the probabilities of chance occurrence of similarities between two random groups of neurons.
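One way to realize a shuffle that respects symmetry is to permute the strictly upper-triangular entries and mirror them. The sketch below makes that choice and leaves the diagonal untouched; both are our assumptions, since the text does not spell out the shuffling details. The names `shuffle_symmetric` and `null_distribution` are hypothetical, and `stat` is any user-supplied statistic of two matrices.

```python
import numpy as np

def shuffle_symmetric(M, rng):
    """Randomly permute the off-diagonal entries of a symmetric matrix."""
    M = np.asarray(M, dtype=float)
    iu = np.triu_indices_from(M, k=1)
    out = np.diag(np.diag(M)).copy()      # keep the diagonal fixed (assumption)
    out[iu] = rng.permutation(M[iu])      # shuffle the upper triangle
    return out + np.triu(out, k=1).T      # mirror to the lower triangle

def null_distribution(stat, A, B, n_shuffles=10000, seed=0):
    """Values of stat(A, shuffled B) with A held fixed."""
    rng = np.random.default_rng(seed)
    return np.array([stat(A, shuffle_symmetric(B, rng))
                     for _ in range(n_shuffles)])
```

An empirical p-value can then be read off as the fraction of null values at least as extreme as the observed statistic.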

To test the significance of $T_{d} (\hat{𝐒}, \hat{𝐍})$ and $T_{d} ({\hat{𝐒}}_{𝗌𝗍𝗂𝗆}, {\hat{𝐍}}_{𝗌𝗍𝗂𝗆})$ , for each comparison and each algorithm, we again fixed the first matrix (i.e. signal correlations). Then, we formed the elements of the second matrix (akin to noise correlations) as follows. For each element of the second matrix, we assigned, with equal probability, either the corresponding element of the signal correlations (in order to model the leakage effect) or random noise (with the same variance as the elements in the noise correlation matrix). As before, we repeated this procedure for 10,000 trials, to derive the null distributions that represent the probabilities of chance occurrence of dissimilarities between two matrices that have some leakage between them.

Hyper-parameter tuning

The hyper-parameters that directly affect the proposed estimation are the inverse Wishart prior hyper-parameters: $ψ_{x}$ and $ρ_{x}$ . Given that $ρ_{x}$ appears in the form of $γ_{x} := T L + ρ_{x}$ , we will consider $ψ_{x}$ and $γ_{x}$ as the main hyper-parameters for simplicity. Here, we propose a criterion for choosing these two hyper-parameters in a data-driven fashion, which will then be used to construct the estimates of the noise covariance matrix ${\hat{𝚺}}_{x}$ and weight matrix $\hat{𝐃}$ . Due to the hierarchy of hidden layers in our model, an empirical Bayes approach for hyper-parameter selection using a likelihood-based performance metric is not straightforward. Hence, we propose an alternative empirical method for hyper-parameter selection as follows.

For a given choice of $𝝍_{x}$ and $γ_{x}$ , we estimate ${\hat{𝚺}}_{x}$ and $\hat{𝐃}$ following the proposed method. Then, based on the generative model in Proposed forward model, and using the estimated values of ${\hat{𝚺}}_{x}$ and $\hat{𝐃}$ , we sample an ensemble of simulated fluorescence traces $\hat{𝐲} = {{\hat{𝐲}}_{t}^{(l)}}_{t, l = 1}^{T, L}$ , and compute the metric $d (𝝍_{x}, γ_{x})$ :

d (𝝍_{x}, γ_{x}) := D_{𝖿𝗋𝗈𝖻} (cov (\hat{𝐲}, \hat{𝐲}), cov (𝐲, 𝐲)),

where $cov (\cdot, \cdot)$ denotes the empirical covariance and $D_{𝖿𝗋𝗈𝖻} (𝐗, 𝐘) := {∥ 𝐗 - 𝐘 ∥}_{F}^{2}$ . Note that $D_{𝖿𝗋𝗈𝖻} (𝐗, 𝐘)$ is strictly convex in $𝐗$ . Thus, minimizing $D_{𝖿𝗋𝗈𝖻} (𝐗, 𝐘)$ over $𝐗$ for a given $𝐘$ has a unique solution. Accordingly, we observe that $d (𝝍_{x}, γ_{x})$ is minimized when $cov (\hat{𝐲}, \hat{𝐲})$ is nearest to $cov (𝐲, 𝐲)$ . Therefore, the corresponding estimates ${\hat{𝚺}}_{x}$ and $\hat{𝐃}$ that generated $\hat{𝐲}$ , best match the second-order statistics of $𝐲$ that was generated by the true parameters $𝚺_{x}$ and $𝐃$ .
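The metric $d (𝝍_{x}, γ_{x})$ compares second-order statistics of simulated and observed traces. A minimal sketch is given below, assuming that the empirical covariance pools all frames and trials and uses the $1 / T$ normalization; the text does not state how trials are combined, so this pooling, along with the function names, is an assumption.

```python
import numpy as np

def empirical_cov(y):
    """N x N empirical covariance of traces y with shape (T, L, N)."""
    arr = np.asarray(y, dtype=float)
    flat = arr.reshape(-1, arr.shape[-1])          # pool frames and trials
    return np.cov(flat, rowvar=False, bias=True)   # 1/n normalization

def d_metric(y_hat, y):
    """Squared Frobenius distance between the two empirical covariances."""
    diff = empirical_cov(y_hat) - empirical_cov(y)
    return np.linalg.norm(diff, "fro") ** 2
```

In the selection procedure, `y_hat` would be an ensemble of traces sampled from the forward model under the estimates obtained for a given hyper-parameter pair.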

The typically low spiking rate of sensory neurons observed in practice may render the estimation problem ill-posed. It is thus important to have an accurate choice of the scale matrix $𝝍_{x}$ in the prior distribution. However, an exhaustive search for optimal tuning of $𝝍_{x}$ is not computationally feasible, given that it has $N (N + 1) / 2$ free variables. Thus, the main challenge here is finding the optimal choice of the scale matrix $𝝍_{x, 𝗈𝗉𝗍}$ .

To address this challenge, we propose the following method. First, we fix $𝝍_{x, 𝗂𝗇𝗂𝗍} = τ 𝐈$ , where τ is a scalar and $𝐈 \in ℝ^{N \times N}$ is the identity matrix. Next, given $𝝍_{x, 𝗂𝗇𝗂𝗍}$ we find the optimal choice of $γ_{x}$ as:

γ_{x, 𝗂𝗇𝗂𝗍} = \underset{γ_{x} \in 𝒮_{γ}}{argmin} d (𝝍_{x, 𝗂𝗇𝗂𝗍}, γ_{x}),

where $𝒮_{γ}$ is a finite set of candidate solutions for $γ_{x} > N - 1$ . Let ${\hat{𝚺}}_{x, 𝗂𝗇𝗂𝗍}$ denote the noise covariance estimate corresponding to hyper-parameters $(𝝍_{x, 𝗂𝗇𝗂𝗍}, γ_{x, 𝗂𝗇𝗂𝗍})$ . We will next use ${\hat{𝚺}}_{x, 𝗂𝗇𝗂𝗍}$ to find a suitable choice of $𝝍_{x}$ . To this end, we first fix $γ_{x, 𝗈𝗉𝗍} := T L + {\tilde{ρ}}_{x}$ , for some $N - 1 < {\tilde{ρ}}_{x} ≪ T L$ . Note that by choosing ${\tilde{ρ}}_{x}$ to be much smaller than $T L$ , the final estimates become less sensitive to the choice of $γ_{x}$ . Then, we construct a candidate set $𝒮_{ψ}$ for $𝝍_{x, 𝗈𝗉𝗍}$ by scaling ${\hat{𝚺}}_{x, 𝗂𝗇𝗂𝗍}$ with a finite set of scalars $η \in ℝ^{+}$ , i.e. $𝒮_{ψ} := {η {\hat{𝚺}}_{x, 𝗂𝗇𝗂𝗍}, η \in ℝ^{+}}$ . To select $𝝍_{x, 𝗈𝗉𝗍}$ , we match it with the choice of $γ_{x, 𝗈𝗉𝗍}$ by solving:

𝝍_{x, 𝗈𝗉𝗍} = \underset{𝝍_{x} \in 𝒮_{ψ}}{argmin} d (𝝍_{x}, γ_{x, 𝗈𝗉𝗍}) .

Finally, we use these hyper-parameters $(𝝍_{x, 𝗈𝗉𝗍}, γ_{x, 𝗈𝗉𝗍})$ to obtain the estimators ${\hat{𝚺}}_{x}$ and $\hat{𝐃}$ as the output of the algorithm.
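The two-stage search described in this section can be outlined as follows. This is a schematic sketch only: `fit_fn(psi, gamma)` stands for a hypothetical routine that runs the estimator and returns ${\hat{𝚺}}_{x}$ , and `d_fn(psi, gamma)` for a hypothetical routine that returns the metric $d (𝝍_{x}, γ_{x})$ ; neither is defined in the paper under these names.

```python
import numpy as np

def select_hyperparameters(d_fn, fit_fn, N, TL, tau, gamma_grid,
                           eta_grid, rho_tilde):
    """Two-stage selection of (psi_opt, gamma_opt).

    d_fn(psi, gamma)   -> scalar metric d (hypothetical callable)
    fit_fn(psi, gamma) -> (N, N) noise covariance estimate (hypothetical callable)
    """
    if not rho_tilde > N - 1:
        raise ValueError("rho_tilde must exceed N - 1")
    # Stage 1: isotropic scale matrix, grid search over gamma.
    psi_init = tau * np.eye(N)
    gamma_init = min(gamma_grid, key=lambda g: d_fn(psi_init, g))
    sigma_init = fit_fn(psi_init, gamma_init)
    # Stage 2: fix gamma_opt = T*L + rho_tilde, search over scalings of sigma_init.
    gamma_opt = TL + rho_tilde
    candidates = [eta * sigma_init for eta in eta_grid]
    psi_opt = min(candidates, key=lambda psi: d_fn(psi, gamma_opt))
    return psi_opt, gamma_opt
```

The cost is one estimator run per candidate in each grid, which is what makes the restriction to a one-dimensional family of scale matrices practical compared with searching over all $N (N + 1) / 2$ free entries.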

Experimental procedures

All procedures were approved by the University of Maryland Institutional Animal Care and Use Committee. Imaging experiments were performed on a P60 (for real data study 1) and P83 (for real data study 2) female F1 offspring of the CBA/CaJ strain (The Jackson Laboratory; stock #000654) crossed with transgenic C57BL/6J-Tg(Thy1-GCaMP6s)GP4.3Dkim/J mice (The Jackson Laboratory; stock #024275) (CBAxThy1), and F1 (CBAxC57). The third real data study was performed on data from P66-P93 and P166-P178 mice (see Bowen et al., 2020 for more details). We used the F1 generation of the crossed mice because they have good hearing into adulthood (Frisina et al., 2011).

We performed cranial window implantation and two-photon imaging as previously described in Francis et al., 2018; Liu et al., 2019; Bowen et al., 2019. Briefly, we implanted a cranial window of 3 mm in diameter over the left auditory cortex. We used a scanning microscope (Bergamo II series, B248, Thorlabs) coupled to an Insight X3 laser (Spectra-physics) (study 1) or a pulsed femtosecond Ti:Sapphire two-photon laser with dispersion compensation (Vision S, Coherent) (studies 2 and 3) to image GCaMP6s fluorescence from individual neurons in awake head-fixed mice with excitation wavelengths of $λ = 920$ nm and $λ = 940$ nm, respectively. The microscope was controlled by ThorImageLS software. The size of the field of view was $370 \times 370 μ m$ . Imaging frames of $512 \times 512$ pixels (pixel size $0.72 μ m$ ) were acquired at 30 Hz by bidirectional scanning of an 8 kHz resonant scanner. The imaging depth was around $200 μ m$ below pia.

Data pre-processing

A circular ROI was manually drawn over each cell body to extract raw fluorescence traces from individual cells. Neuropil contamination subtraction and baseline correction were performed on the raw fluorescence traces of each cell (Francis et al., 2018; Liu et al., 2019; Bowen et al., 2020) according to $\frac{F_{𝖼𝖾𝗅𝗅} - α_{n} F_{𝗇𝖾𝗎𝗋𝗈𝗉𝗂𝗅} - baseline}{baseline}$ , where $α_{n}$ was set to 0.7 in real data study 1 (Francis et al., 2018), 0.8 in real data study 2 (Liu et al., 2019) and 0.9 in real data study 3 (Bowen et al., 2020). The two-photon observations ${𝐲_{t, l}}_{t, l = 1}^{T, L}$ used in our analyses are the output of this pre-processing step.
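The normalization above is a one-line computation once the baseline is available. In the sketch below the baseline is passed in as an argument, because its estimation is described in the cited references and not here; the function name `delta_f_over_f` is hypothetical.

```python
import numpy as np

def delta_f_over_f(f_cell, f_neuropil, baseline, alpha_n=0.7):
    """(F_cell - alpha_n * F_neuropil - baseline) / baseline.

    f_cell, f_neuropil : raw fluorescence traces of equal shape
    baseline           : scalar or array broadcastable to the traces
    alpha_n            : neuropil contamination coefficient
    """
    f_cell = np.asarray(f_cell, dtype=float)
    f_neuropil = np.asarray(f_neuropil, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    return (f_cell - alpha_n * f_neuropil - baseline) / baseline
```

For example, with `f_cell = 20`, `f_neuropil = 10`, `baseline = 5` and `alpha_n = 0.7`, the corrected value is (20 - 7 - 5) / 5 = 1.6.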

Stimuli for real data study 1

During imaging experiments, we presented four tones (4, 8, 16, and 32 kHz) at 70 dB SPL. The tones were 2 s in duration with an inter-trial silence of 4 s. For the sequence of tones, we first generated a randomized sequence that consisted of five repeats for each tone (20 tones in total) and then the same sequence was repeated for 10 trials.

Stimuli for real data study 2

During imaging experiments, we presented a 75 dB SPL 100 ms broadband noise (4–48 kHz) as the auditory stimulus. Each trial was 5.1 s long (1 s pre-stimulus silence + 0.1 s stimulus + 4 s post-stimulus silence), and the inter-trial duration was 3 s. Spontaneous neuronal activity was collected from activity during randomly interleaved no-stimulus trials of the same duration, and these trials had complete silence throughout the trial duration (5.1 s long).

Then, we extracted 50 such trials from each type, and formed 10 ( $L = 10$ ) trials each of 25.5 s duration ( $T = 765$ frames) for the subsequent analysis, by concatenating five 5.1 s trials. This final step was performed to increase the effective trial duration.

Stimuli for real data study 3

During imaging experiments, sounds were played at four sound levels (20, 40, 60, and 80 dB SPL). Auditory stimuli consisted of sinusoidal amplitude-modulated (SAM) tones (20 Hz modulation, cosine phase), ranging from 3 to 48 kHz. The frequency resolution was two tones/octave (0.5 octave spacing) and each of these tonal stimuli was 1 s long, repeated five times with a 4−6 s inter-stimulus interval (see Bowen et al., 2020 for details).

Acknowledgements

The authors thank Daniel E Winkowski for collecting the data in Bowen et al., 2019 that was also used in this work.

Appendix 1

Relationship to existing definitions of Signal and Noise correlations

Recall that the conventional definitions of signal and noise covariance of spiking activity between the $i^{𝗍𝗁}$ and $j^{𝗍𝗁}$ neuron are (Lyamzin et al., 2015):

\begin{aligned} {(𝚺_{s}^{𝖼𝗈𝗇})}_{i, j} & = cov (\frac{1}{L} \sum_{l} n_{t, l}^{(i)}, \frac{1}{L} \sum_{l} n_{t, l}^{(j)}), \\ {(𝚺_{x}^{𝖼𝗈𝗇})}_{i, j} & = \frac{1}{L} \sum_{l} cov (n_{t, l}^{(i)} - \frac{1}{L} \sum_{l^{'}} n_{t, l^{'}}^{(i)}, n_{t, l}^{(j)} - \frac{1}{L} \sum_{l^{'}} n_{t, l^{'}}^{(j)}), \end{aligned}

where $cov (u_{t}, v_{t}) := \frac{1}{T} \sum_{t = 1}^{T} (u_{t} - \frac{1}{T} \sum_{t^{'} = 1}^{T} u_{t^{'}}) {(v_{t} - \frac{1}{T} \sum_{t^{'} = 1}^{T} v_{t^{'}})}^{⊤}$ is the empirical covariance. The correlations are then derived by the standard normalization:

{(𝐒^{𝖼𝗈𝗇})}_{i, j} := \frac{{(𝚺_{s}^{𝖼𝗈𝗇})}_{i, j}}{\sqrt{{(𝚺_{s}^{𝖼𝗈𝗇})}_{i, i} \cdot {(𝚺_{s}^{𝖼𝗈𝗇})}_{j, j}}}, {(𝐍^{𝖼𝗈𝗇})}_{i, j} := \frac{{(𝚺_{x}^{𝖼𝗈𝗇})}_{i, j}}{\sqrt{{(𝚺_{x}^{𝖼𝗈𝗇})}_{i, i} \cdot {(𝚺_{x}^{𝖼𝗈𝗇})}_{j, j}}}, \forall i, j = 1, 2, \dots, N .

(9)
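The conventional definitions can be computed directly from an array of spike counts. The following sketch assumes spikes are stored as an array of shape $(T, L, N)$ and that every neuron has non-zero variance in both the trial-averaged and residual responses; the function name `conventional_correlations` is hypothetical.

```python
import numpy as np

def conventional_correlations(n):
    """Conventional signal and noise correlations from spikes n of shape (T, L, N)."""
    n = np.asarray(n, dtype=float)
    T, L, N = n.shape
    # Signal covariance: covariance over time of the trial-averaged responses.
    psth = n.mean(axis=1)                                   # (T, N)
    sig_cov = np.cov(psth, rowvar=False, bias=True)
    # Noise covariance: per-trial covariance of residuals, averaged over trials.
    resid = n - psth[:, None, :]
    noise_cov = np.mean(
        [np.cov(resid[:, l, :], rowvar=False, bias=True) for l in range(L)],
        axis=0,
    )

    def normalize(C):
        # Standard normalization C_ij / sqrt(C_ii * C_jj).
        s = np.sqrt(np.diag(C))
        return C / np.outer(s, s)

    return normalize(sig_cov), normalize(noise_cov)
```

Both outputs are symmetric with unit diagonal, as expected of correlation matrices.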

Suppose that the spiking events follow the forward model:

n_{t, l}^{(j)} \sim Bernoulli (λ_{t, l}^{(j)}), λ_{t, l}^{(j)} = ϕ (x_{t, l}^{(j)}, {d_{j}}^{⊤} s_{t}),

where $ϕ : ℝ^{2} \to [0, 1]$ is a differentiable non-linear mapping. We assume $𝐱_{t, l}$ and $𝐬_{t}$ to be independent. Without loss of generality, let $𝔼 [𝐬_{t}] = 𝟎$ and $𝔼 [𝐱_{t, l}] = 𝝁_{x}$ . Further, we define the notation $X_{t} \approx Y_{t}$ to denote almost sure equivalence, that is $X_{t} \overset{a.s.}{\to} Z$ and $Y_{t} \overset{a.s.}{\to} Z$ for some random variable $Z$ .

First, let us consider ${(𝐒^{𝖼𝗈𝗇})}_{i, j}$ . Noting that $𝔼 [n_{t, l}^{(j)}] = 𝔼 [λ_{t, l}^{(j)}]$ and $𝔼 [n_{t, l}^{(i)} n_{t, l}^{(j)}] = 𝔼 [λ_{t, l}^{(i)} λ_{t, l}^{(j)}]$ , we conclude as $T \to \infty$ :

{(𝚺_{s}^{𝖼𝗈𝗇})}_{i, j} \approx cov (\frac{1}{L} \sum_{l} λ_{t, l}^{(i)}, \frac{1}{L} \sum_{l} λ_{t, l}^{(j)}),

from the law of large numbers. Then, if we consider the Taylor series expansion of $ϕ (x_{t, l}^{(j)}, 𝐝_{j}^{⊤} 𝐬_{t})$ around the mean $(μ_{x}^{(j)}, 0)$ , we get:

\begin{aligned} {(𝚺_{s}^{𝖼𝗈𝗇})}_{i, j} \approx cov ( & ϕ (μ_{x}^{(i)}, 0) + \frac{1}{L} \sum_{l} (x_{t, l}^{(i)} - μ_{x}^{(i)}) ϕ_{(x_{t, l}^{(i)})} (μ_{x}^{(i)}, 0) + ({𝐝_{i}}^{⊤} 𝐬_{t}) ϕ_{({𝐝_{i}}^{⊤} 𝐬_{t})} (μ_{x}^{(i)}, 0) + ϵ_{t, l}^{(i)}, \\ & ϕ (μ_{x}^{(j)}, 0) + \frac{1}{L} \sum_{l} (x_{t, l}^{(j)} - μ_{x}^{(j)}) ϕ_{(x_{t, l}^{(j)})} (μ_{x}^{(j)}, 0) + ({𝐝_{j}}^{⊤} 𝐬_{t}) ϕ_{({𝐝_{j}}^{⊤} 𝐬_{t})} (μ_{x}^{(j)}, 0) + ϵ_{t, l}^{(j)}), \end{aligned}

where $ϵ_{t, l}^{(i)}$ and $ϵ_{t, l}^{(j)}$ represent the higher order terms. Then, as $L \to \infty$ , we get:

{(𝚺_{s}^{𝖼𝗈𝗇})}_{i, j} \approx cov ((𝐝_{i}^{⊤} 𝐬_{t}) ϕ_{(𝐝_{i}^{⊤} 𝐬_{t})} (μ_{x}^{(i)}, 0), (𝐝_{j}^{⊤} 𝐬_{t}) ϕ_{(𝐝_{j}^{⊤} 𝐬_{t})} (μ_{x}^{(j)}, 0)) + ϵ_{t, l},

since $lim_{L \to \infty} \frac{1}{L} \sum_{l = 1}^{L} (x_{t, l}^{(j)}) = μ_{x}^{(j)}$ by the Law of Large numbers. Thus, we see that:

\begin{aligned} {(𝚺_{s}^{𝖼𝗈𝗇})}_{i, j} & \approx C_{S} {𝐝_{i}}^{⊤} cov (𝐬_{t}, 𝐬_{t}) 𝐝_{j} + ϵ_{t, l} \\ & = C_{S} {(𝚺_{s})}_{i, j} + ϵ_{t, l}, \end{aligned}

where $C_{S}$ is a constant and $ϵ_{t, l}$ is typically small if the latent process $𝐱_{t, l}$ and the stimulus $𝐬_{t}$ are concentrated around their means. Then, the signal correlations are obtained by normalization of the signal covariance as in Equation 9, through which the scaling factor $C_{S}$ cancels and we get:

{(𝐒^{𝖼𝗈𝗇})}_{i, j} \approx {(𝐒)}_{i, j} .

Thus, as $T, L \to \infty$ , we see that $𝐒$ is indeed the signal correlation matrix that the conventional definitions aim to approximate.

Next, let us consider ${(𝐍^{𝖼𝗈𝗇})}_{i, j}$ . Similar to the foregoing analysis of the signal covariance, as $T \to \infty$ we get:

{(𝚺_{x}^{𝖼𝗈𝗇})}_{i, j} \approx \frac{1}{L} \sum_{l} cov (λ_{t, l}^{(i)} - \frac{1}{L} \sum_{l^{'}} λ_{t, l^{'}}^{(i)}, λ_{t, l}^{(j)} - \frac{1}{L} \sum_{l^{'}} λ_{t, l^{'}}^{(j)}) .

Then, from a Taylor series expansion, we get:

\begin{aligned} {(𝚺_{x}^{𝖼𝗈𝗇})}_{i, j} \approx \frac{1}{L} \sum_{l} cov ( & x_{t, l}^{(i)} ϕ_{(x_{t, l}^{(i)})} (μ_{x}^{(i)}, 0) - \frac{1}{L} \sum_{l^{'}} x_{t, l^{'}}^{(i)} ϕ_{(x_{t, l^{'}}^{(i)})} (μ_{x}^{(i)}, 0) + ξ_{t, l}^{(i)}, \\ & x_{t, l}^{(j)} ϕ_{(x_{t, l}^{(j)})} (μ_{x}^{(j)}, 0) - \frac{1}{L} \sum_{l^{'}} x_{t, l^{'}}^{(j)} ϕ_{(x_{t, l^{'}}^{(j)})} (μ_{x}^{(j)}, 0) + ξ_{t, l}^{(j)}), \end{aligned}

where $ξ_{t, l}^{(i)}$ and $ξ_{t, l}^{(j)}$ represent the higher order terms. Then, as $L \to \infty$ :

{(𝚺_{x}^{𝖼𝗈𝗇})}_{i, j} \approx \frac{1}{L} \sum_{l} cov ((x_{t, l}^{(i)} - μ_{x}^{(i)}) ϕ_{(x_{t, l}^{(i)})} (μ_{x}^{(i)}, 0), (x_{t, l}^{(j)} - μ_{x}^{(j)}) ϕ_{(x_{t, l}^{(j)})} (μ_{x}^{(j)}, 0)) + ξ_{t, l},

from the law of large numbers. Accordingly, we see that:

\begin{aligned} {(𝚺_{x}^{𝖼𝗈𝗇})}_{i, j} & \approx C_{N} \frac{1}{L} \sum_{l} cov (x_{t, l}^{(i)} - μ_{x}^{(i)}, x_{t, l}^{(j)} - μ_{x}^{(j)}) + ξ_{t, l} \\ & = C_{N} {(𝚺_{x})}_{i, j} + ξ_{t, l}, \end{aligned}

where $C_{N}$ is a constant and $ξ_{t, l}$ is typically small if the latent process $𝐱_{t, l}$ and the stimulus $𝐬_{t}$ are concentrated around their means. Then, the noise correlations are derived by normalization of the noise covariance given in Equation 9. This cancels out the scaling factor $C_{N}$ , and we get:

{(𝐍^{𝖼𝗈𝗇})}_{i, j} \approx {(𝐍)}_{i, j} .

Thus, we similarly conclude that as $T, L \to \infty$ , the conventional definition of noise correlation $𝐍^{𝖼𝗈𝗇}$ indeed aims to approximate $𝐍$ .

As a numerical illustration, we demonstrated in Figure 2—figure supplement 2 that the conventional definitions of the correlations indeed approximate our proposed definitions, but require a much larger number of trials to be accurate. More specifically, in order to achieve performance comparable to that of our method using $L = 20$ trials, the conventional correlation estimates require $L = 1000$ trials.

Appendix 2

Proof of Theorem 1

In what follows, we present a comprehensive proof of Theorem 1. Recall the following key assumptions:

Assumption (2). We derive the performance bounds in the regime where $T$ and $L$ are large, and thus do not impose any prior distribution on the correlations (i.e. $p_{𝗉𝗋} (𝚺_{x}) \propto 1$ ), which are otherwise needed to mitigate overfitting (see Preliminary assumptions).

Assumption (3). We assume the latent trial-dependent process and stimulus to be slowly varying signals, and thus adopt a piece-wise constant model in which these processes are constant within consecutive windows of length $W$ (i.e., $𝐱_{t, l} = 𝐱_{W_{k}, l}$ and $s_{t} = s_{W_{k}}$ , for $(k - 1) W + 1 \leq t \leq k W$ and $k = 1, \dots, K$ with $W_{k} = (k - 1) W + 1$ and $K W = T$ ) for our theoretical analysis, as is usually done in spike count calculations for conventional noise correlation estimates.

Proof of Theorem 1. First, recall the proposed forward model (see Proposed forward model) under Assumptions (1–3):

\begin{aligned} y_{t, l} & = A z_{t, l} + w_{t, l}, \\ z_{t, l} & = α z_{t - 1, l} + n_{t, l}, \\ n_{t, l}^{(j)} & \sim Bernoulli (ϕ (x_{W_{k}, l}^{(j)})), \\ x_{W_{k}, l} & \sim 𝒩 (μ_{x} + s_{W_{k}} d, Σ_{x}), \end{aligned}

where $ϕ (\cdot) := \frac{\exp (\cdot)}{1 + \exp (\cdot)}$ is the logistic function. Note that we have re-defined the latent process $𝐱_{t, l}$ by absorbing the stimulus activity $s_{t} 𝐝$ into the mean of $𝐱_{t, l}$ for notational convenience, without loss of generality. Hereafter, we also assume that $𝐀 = 𝐈$ without loss of generality. For a truncation level $B$ (to be specified later), consider the event

𝒜_{W} := {| x_{W_{k}, l}^{(j)} | \leq B and \frac{1}{2 (1 + \exp (B))} \leq {\bar{n}}_{W_{k}, l}^{(j)} \leq 1 - \frac{1}{2 (1 + \exp (B))} for j = 1, \dots, N, k = 1, \dots, K and l = 1, \dots, L},

such that ${\bar{n}}_{W_{k}, l} = {[{\bar{n}}_{W_{k}, l}^{(1)}, {\bar{n}}_{W_{k}, l}^{(2)}, \dots, {\bar{n}}_{W_{k}, l}^{(N)}]}^{⊤} := \frac{1}{W} \sum_{w = 1}^{W} n_{(k - 1) W + w, l}$ .

First, we derive convenient forms of the maximum likelihood estimators via Laplace’s approximation and asymptotic expansions (Wong, 2001) through the following lemma:

Lemma 1

Conditioned on event $𝒜_{W}$ , the maximum likelihood estimators of the stimulus kernel of the $j^{𝑡ℎ}$ neuron and the noise covariance between the $i^{𝑡ℎ}$ and $j^{𝑡ℎ}$ neurons take the forms:

\begin{aligned} {\hat{d}}_{j} & = {\tilde{d}}_{j} (1 + 𝒪 (σ_{w}^{2})) (1 + 𝒪 (\frac{1}{W})) and \\ ({\hat{Σ}}_{x})_{i, j} & = ({\tilde{Σ}}_{x})_{i, j} (1 + 𝒪 (σ_{w}^{2})) (1 + 𝒪 (\frac{1}{W})), \end{aligned}

where

\begin{aligned} {\tilde{d}}_{j} & = \frac{1}{L \sum_{k = 1}^{K} s_{W_{k}}^{2}} \sum_{k, l = 1}^{K, L} s_{W_{k}} (ϕ^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - μ_{x}^{(j)}) and \\ ({\tilde{Σ}}_{x})_{i, j} & = \frac{1}{K L} \sum_{k, l = 1}^{K, L} (ϕ^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - μ_{x}^{(i)} - s_{W_{k}} {\tilde{d}}_{i}) (ϕ^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - μ_{x}^{(j)} - s_{W_{k}} {\tilde{d}}_{j}), \end{aligned}

with

{\tilde{n}}_{W_{k}, l} = {[{\tilde{n}}_{W_{k}, l}^{(1)}, {\tilde{n}}_{W_{k}, l}^{(2)}, \dots, {\tilde{n}}_{W_{k}, l}^{(N)}]}^{⊤} := \frac{1}{W} \sum_{w = 1}^{W} (y_{(k - 1) W + w, l} - α y_{(k - 1) W + w - 1, l}) and ϕ^{- 1} (z) := \ln (z / (1 - z)) .
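The closed-form estimators of Lemma 1 can be sketched in code. The following illustration assumes a scalar stimulus that is constant within each window, $T = K W$ , an initial condition $y_{0, l} = 0$ , and window averages ${\tilde{n}}$ that lie strictly between 0 and 1 so that the logit is finite; the function name `lemma1_estimators` and its argument names are hypothetical.

```python
import numpy as np

def lemma1_estimators(y, s_win, mu_x, alpha, W):
    """Compute d_tilde (N,) and Sigma_tilde (N, N) from traces y of shape (T, L, N).

    s_win : (K,) stimulus value in each window, with K = T // W
    mu_x  : (N,) mean of the latent process
    """
    y = np.asarray(y, dtype=float)
    s_win = np.asarray(s_win, dtype=float)
    T, L, N = y.shape
    K = T // W
    # n_hat_t = y_t - alpha * y_{t-1}, taking y_0 = 0 (assumption).
    y_prev = np.concatenate([np.zeros((1, L, N)), y[:-1]], axis=0)
    n_hat = y - alpha * y_prev
    # Window averages n_tilde, shape (K, L, N).
    n_tilde = n_hat[:K * W].reshape(K, W, L, N).mean(axis=1)
    # Inverse logistic map; requires 0 < n_tilde < 1.
    x_hat = np.log(n_tilde / (1.0 - n_tilde))
    centered = x_hat - np.asarray(mu_x, dtype=float)
    # d_tilde_j = sum_{k,l} s_k (x_hat - mu) / (L * sum_k s_k^2).
    d_tilde = np.einsum("k,kln->n", s_win, centered) / (L * np.sum(s_win ** 2))
    # Sigma_tilde = average outer product of the residuals over k and l.
    resid = centered - s_win[:, None, None] * d_tilde
    sigma_tilde = np.einsum("kli,klj->ij", resid, resid) / (K * L)
    return d_tilde, sigma_tilde
```

With noiseless traces in which the window-averaged spiking equals $ϕ (μ_{x}^{(j)} + s_{W_{k}} d_{j})$ exactly, this sketch returns $d_{j}$ and a zero covariance, which is a convenient sanity check.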

Proof of Lemma 1.

First, maximizing the data likelihood, we derive the estimators:

{\hat{d}}_{j} = \underset{d_{j}}{argmax} p (y | Σ_{x}, d) = \frac{\int (\frac{1}{L \sum_{k = 1}^{K} s_{W_{k}}^{2}} \sum_{k, l = 1}^{K, L} s_{W_{k}} (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)})) p (y | n) p (n | x) p (x | Σ_{x}, d) d n d x}{\int p (y | n) p (n | x) p (x | Σ_{x}, d) d n d x},

(10)

and

\begin{aligned} ({\hat{Σ}}_{x})_{i, j} & = \underset{(Σ_{x})_{i, j}}{argmax} p (y | Σ_{x}, d) \\ & = \frac{\int (\frac{1}{K L} \sum_{k, l = 1}^{K, L} (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)} - s_{W_{k}} {\hat{d}}_{i}) (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)} - s_{W_{k}} {\hat{d}}_{j})) p (y | n) p (n | x) p (x | Σ_{x}, d) d n d x}{\int p (y | n) p (n | x) p (x | Σ_{x}, d) d n d x}, \end{aligned}

(11)

where $W_{k} = (k - 1) W + 1$ . Then, we simplify these integrals based on the saddle point method of asymptotic expansions (Wong, 2001). To that end, first consider the numerator of Equation 10 denoted by $I_{num}^{(1)}$ . First, we evaluate the integration in $I_{num}^{(1)}$ with respect to the variable $n$ . To that end, note:

I_{num}^{(1)} = \int h_{num}^{(1)} (n) \exp (A_{1} f_{1} (n)) d n,

where $h_{num}^{(1)} (n) = \frac{1}{\sqrt{(2 π)^{T N L} σ_{w}^{2 T N L}}} \int (\frac{1}{L \sum_{k = 1}^{K} s_{W_{k}}^{2}} \sum_{k, l = 1}^{K, L} s_{W_{k}} (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)})) p (n | x) p (x | Σ_{x}, d) d x$ , $A_{1} = \frac{1}{σ_{w}^{2}}$ , $f_{1} (n) = - \frac{1}{2} \sum_{t, l, j} {(y_{t, l}^{(j)} - \sum_{k = 1}^{t} α^{t - k} n_{k, l}^{(j)})}^{2}$ and $d 𝐧$ is shorthand notation for the product measure of the discrete random vector $𝐧$ . Observing that $\nabla f_{1} (\hat{𝐧}) = 𝟎$ for $\hat{n} := {{\hat{n}}_{t, l} = y_{t, l} - α y_{t - 1, l}}_{t, l = 1}^{T, L}$ , using the method of asymptotic expansions, $I_{num}^{(1)}$ can be evaluated as:

I_{num}^{(1)} = h_{num}^{(1)} (\hat{𝐧}) \times \exp (A_{1} f_{1} (\hat{𝐧})) \sqrt{\frac{{(2 π)}^{T L N}}{- A_{1} | H (f_{1}) |}} (1 + 𝒪 (\frac{1}{A_{1}})),

(12)

where the determinant of the Hessian matrix $| H (f_{1}) |$ , is a negative function of α. Note that the covariance of this Gaussian integral $(- {(H (f_{1}))}^{- 1})$ is a function of $α \in (0, 1)$ , and hence is bounded. Thus, all higher order error terms in Equation 12 are also bounded, as higher order moments of Gaussian distributions are functions of the covariance.

Next, we simplify the integral $h_{num}^{(1)} (\hat{𝐧})$ in Equation 12 using a similar procedure. We have:

h_{num}^{(1)} (\hat{𝐧}) = \int r_{num}^{(1)} (𝐱) \exp (A_{2} f_{2} (𝐱)) 𝑑 𝐱,

(13)

where $f_{2} (𝐱) = \sum_{k, l, j} ({\tilde{n}}_{W_{k}, l}^{(j)} x_{W_{k}, l}^{(j)} - \log (1 + \exp (x_{W_{k}, l}^{(j)})))$ with

{\tilde{n}}_{W_{k}, l} = {[{\tilde{n}}_{W_{k}, l}^{(1)}, {\tilde{n}}_{W_{k}, l}^{(2)}, \dots, {\tilde{n}}_{W_{k}, l}^{(N)}]}^{⊤} := \frac{1}{W} \sum_{w = 1}^{W} {\hat{n}}_{(k - 1) W + w, l},

\begin{aligned} r_{num}^{(1)} (x) & = \frac{1}{\sqrt{(2 π)^{(W + 1) K L N} σ_{w}^{2 T N L} | Σ_{x} |^{K L}}} \exp (- \frac{1}{2} \sum_{k, l} {(x_{W_{k}, l} - μ_{x} - s_{W_{k}} d)}^{⊤} Σ_{x}^{- 1} (x_{W_{k}, l} - μ_{x} - s_{W_{k}} d)) \\ & \times (\frac{1}{L \sum_{k = 1}^{K} s_{W_{k}}^{2}} \sum_{k, l = 1}^{K, L} s_{W_{k}} (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)})), \end{aligned}

and $A_{2} = W$ . Then, we note that the gradient of $f_{2}$ vanishes, i.e. $\nabla f_{2} (\hat{𝐱}) = 𝟎$ , for $\hat{𝐱} := {{\hat{x}}_{W_{k}, l}^{(j)} = ϕ^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)})}_{k, l, j = 1}^{K, L, N}$ , where $ϕ^{- 1} (z) := logit (z) = \ln (z / (1 - z))$ . Accordingly, by re-applying the saddle point method of asymptotic expansions, we evaluate the integral in Equation 13 as:

h_{num}^{(1)} (\hat{𝐧}) = r_{num}^{(1)} (\hat{𝐱}) \times \exp (A_{2} f_{2} (\hat{𝐱})) \sqrt{\frac{{(2 π)}^{K L N}}{- A_{2} | H (f_{2} (\hat{𝐱})) |}} (1 + 𝒪 (\frac{1}{A_{2}})),

(14)

where the determinant of the Hessian, $| H (f_{2} (\hat{𝐱})) | = - \prod_{k, l, j} {\tilde{n}}_{W_{k}, l}^{(j)} (1 - {\tilde{n}}_{W_{k}, l}^{(j)}) < 0$ when conditioned on event $𝒜_{W}$ . The higher order terms in Equation 14 will be bounded if the covariance of the saddle point approximation $(- {(H (f_{2} (\hat{𝐱})))}^{- 1})$ is bounded, which we ensure by conditioning on event $𝒜_{W}$ . This completes the evaluation of $I_{num}^{(1)}$ .

Following the same sequence of arguments, we evaluate the denominator of Equation 10 denoted by $I_{den}^{(1)}$ . Accordingly, we derive:

\begin{aligned} I_{den}^{(1)} & = h_{den}^{(1)} (\hat{n}) \times \exp (A_{1} f_{1} (\hat{n})) \sqrt{\frac{(2 π)^{T L N}}{- A_{1} | H (f_{1}) |}} (1 + 𝒪 (\frac{1}{A_{1}})), \\ h_{den}^{(1)} (\hat{n}) & = r_{den}^{(1)} (\hat{x}) \times \exp (A_{2} f_{2} (\hat{x})) \sqrt{\frac{(2 π)^{K L N}}{- A_{2} | H (f_{2} (\hat{x})) |}} (1 + 𝒪 (\frac{1}{A_{2}})), \end{aligned}

(15)

where $r_{den}^{(1)} (x) = \frac{1}{\sqrt{(2 π)^{(W + 1) K L N} σ_{w}^{2 T N L} | Σ_{x} |^{K L}}} \exp (- \frac{1}{2} \sum_{k, l} {(x_{W_{k}, l} - μ_{x} - s_{W_{k}} d)}^{⊤} Σ_{x}^{- 1} (x_{W_{k}, l} - μ_{x} - s_{W_{k}} d)) .$ Finally, by combining Equation 12, Equation 14 and Equation 15, the maximum likelihood estimator in Equation 10 takes the form:

{\hat{d}}_{j} = \frac{I_{num}^{(1)}}{I_{den}^{(1)}} = {\tilde{d}}_{j} \frac{(1 + 𝒪 (\frac{1}{A_{1}})) (1 + 𝒪 (\frac{1}{A_{2}}))}{(1 + 𝒪 (\frac{1}{A_{1}})) (1 + 𝒪 (\frac{1}{A_{2}}))} = {\tilde{d}}_{j} (1 + 𝒪 (σ_{w}^{2})) (1 + 𝒪 (\frac{1}{W})) .

Further, following the same sequence of reasoning, simplifying the numerator $(I_{num}^{(2)})$ and denominator $(I_{den}^{(2)})$ of Equation 11 yields:

{({\hat{𝚺}}_{x})}_{i, j} = \frac{I_{num}^{(2)}}{I_{den}^{(2)}} = {({\tilde{𝚺}}_{x})}_{i, j} \frac{(1 + 𝒪 (\frac{1}{A_{1}})) (1 + 𝒪 (\frac{1}{A_{2}}))}{(1 + 𝒪 (\frac{1}{A_{1}})) (1 + 𝒪 (\frac{1}{A_{2}}))} = {({\tilde{𝚺}}_{x})}_{i, j} (1 + 𝒪 (σ_{w}^{2})) (1 + 𝒪 (\frac{1}{W})) .

This concludes the proof of Lemma 1. ■

Given that $ϕ^{- 1} (z)$ is unbounded for $z = 0$ or $z = 1$ , we consider another truncation: $ϕ_{B^{'}}^{- 1} (z) := \min {\max {ϕ^{- 1} (z), - B^{'}}, B^{'}}$ , where $B^{'} = 2 \log (2 \exp (B) + 1)$ . This choice of $B^{'}$ guarantees that over $𝒜_{W}$ , $| ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) | < B^{'}$ for all $j = 1, \dots, N$ , $k = 1, \dots, K$ and $l = 1, \dots, L$ , and thus $ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) = ϕ^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)})$ on $𝒜_{W}$ .

From Lemma 1, the bias and variance of the maximum likelihood estimators, ${\hat{d}}_{j}$ and ${({\hat{𝚺}}_{x})}_{i, j}$ are upper-bounded, if those of ${\tilde{d}}_{j}$ and ${({\tilde{𝚺}}_{x})}_{i, j}$ are bounded:

| bias ({\hat{d}}_{j}) | \leq | bias ({\tilde{d}}_{j}) | + ζ_{j}, Var ({\hat{d}}_{j}) \leq Var ({\tilde{d}}_{j}) + {\tilde{ζ}}_{j},

(16)

and

| bias ({({\hat{𝚺}}_{x})}_{i, j}) | \leq | bias ({({\tilde{𝚺}}_{x})}_{i, j}) | + υ_{i, j}, Var ({({\hat{𝚺}}_{x})}_{i, j}) \leq Var ({({\tilde{𝚺}}_{x})}_{i, j}) + {\tilde{υ}}_{i, j},

(17)

where $ζ_{j}$ , ${\tilde{ζ}}_{j}$ , $υ_{i, j}$ and ${\tilde{υ}}_{i, j}$ represent terms that are $𝒪 (σ_{w}^{2})$ or $𝒪 (\frac{1}{W})$ . Thus, we seek to derive the performance bounds of ${\tilde{d}}_{j}$ and ${({\tilde{𝚺}}_{x})}_{i, j}$ .

Bounding the bias of ${\hat{d}}_{j}$

Let us first consider ${\tilde{d}}_{j}$ . Note that:

\begin{aligned} | bias ({\tilde{d}}_{j}) | & := | 𝔼 [{\tilde{d}}_{j}] - d_{j} | \\ & \overset{(a)}{=} | 𝔼 [{\tilde{d}}_{j} - {(d_{Oracle})}_{j}] | \\ & \overset{(b)}{\leq} \frac{1}{L \sum_{k = 1}^{K} s_{W_{k}}^{2}} \sum_{k, l = 1}^{K, L} | s_{W_{k}} | 𝔼 [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(j)} |], \end{aligned}

(18)

where $(a)$ holds since the Oracle estimator, ${(d_{Oracle})}_{j} = \frac{1}{L \sum_{k = 1}^{K} s_{W_{k}}^{2}} \sum_{k, l = 1}^{K, L} s_{W_{k}} (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)})$ (i.e., observing $𝐱_{t, l}$ directly) is unbiased and $(b)$ follows through the application of Jensen’s inequality and triangle inequality. To simplify this bound, the triangle inequality yields:

𝔼 [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(j)} |] \leq 𝔼 [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) |] + 𝔼 [| ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(j)} |] .

(19)

Then, to bound each of these terms, we establish a piece-wise linear Lipschitz-type bound on $ϕ_{B^{'}}^{- 1} (z)$ . First, consider the first term $𝔼 [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) |]$ . We seek to upper-bound this expectation by bounding $| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) |$ via the following technical lemma:

Lemma 2. Conditioned on event $𝒜_{W}$ , the following bound holds for all $j = 1, \dots, N$ , $k = 1, \dots, K$ and $l = 1, \dots, L$ :

ε ({\tilde{n}}_{W_{k}, l}^{(j)}, {\bar{n}}_{W_{k}, l}^{(j)}) := | ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) | \leq g (B) | {\tilde{n}}_{W_{k}, l}^{(j)} - {\bar{n}}_{W_{k}, l}^{(j)} |,

where

g (B) = \max {4 {(1 + \exp (B))}^{2}, 4 \exp (- B) \log (2 \exp (B) + 1) (1 + {(2 \exp (B) + 1)}^{2})} .

Proof of Lemma 2. First, consider the case ${\bar{n}}_{W_{k}, l}^{(j)} \leq 0.5$ . We bound the function $ε ({\tilde{n}}_{W_{k}, l}^{(j)}, {\bar{n}}_{W_{k}, l}^{(j)})$ in a piece-wise fashion as follows. Note that $ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)})$ is convex for ${\tilde{n}}_{W_{k}, l}^{(j)} \geq 0.5$ and concave for ${\tilde{n}}_{W_{k}, l}^{(j)} \leq 0.5$ . Thus, it immediately follows that for ${\tilde{n}}_{W_{k}, l}^{(j)} \leq {\bar{n}}_{W_{k}, l}^{(j)}$ , $ε ({\tilde{n}}_{W_{k}, l}^{(j)}, {\bar{n}}_{W_{k}, l}^{(j)})$ is convex and hence:

\begin{array}{ll} ε ({\tilde{n}}_{W_{k}, l}^{(j)}, {\bar{n}}_{W_{k}, l}^{(j)}) \leq \frac{| B^{'} + ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) |}{| {\bar{n}}_{W_{k}, l}^{(j)} - \frac{1}{1 + \exp (B^{'})} |} ({\bar{n}}_{W_{k}, l}^{(j)} - {\tilde{n}}_{W_{k}, l}^{(j)}) . \end{array}

(20)

Furthermore, for ${\bar{n}}_{W_{k}, l}^{(j)} \leq {\tilde{n}}_{W_{k}, l}^{(j)} \leq 0.5$ , $ε ({\tilde{n}}_{W_{k}, l}^{(j)}, {\bar{n}}_{W_{k}, l}^{(j)})$ is concave, and hence is bounded by the tangent at ${\bar{n}}_{W_{k}, l}^{(j)}$ :

\begin{array}{ll} ε ({\tilde{n}}_{W_{k}, l}^{(j)}, {\bar{n}}_{W_{k}, l}^{(j)}) \leq \frac{1}{{\bar{n}}_{W_{k}, l}^{(j)} (1 - {\bar{n}}_{W_{k}, l}^{(j)})} ({\tilde{n}}_{W_{k}, l}^{(j)} - {\bar{n}}_{W_{k}, l}^{(j)}) . \end{array}

(21)

Finally, for the case of ${\tilde{n}}_{W_{k}, l}^{(j)} \geq 0.5$ , consider the line,

\begin{array}{ll} h ({\tilde{n}}_{W_{k}, l}^{(j)}, {\bar{n}}_{W_{k}, l}^{(j)}) := \frac{| B^{'} - ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) |}{| \frac{1}{1 + \exp (- B^{'})} - {\bar{n}}_{W_{k}, l}^{(j)} |} ({\tilde{n}}_{W_{k}, l}^{(j)} - {\bar{n}}_{W_{k}, l}^{(j)}) . \end{array}

(22)

From the convexity of $ε ({\tilde{n}}_{W_{k}, l}^{(j)}, {\bar{n}}_{W_{k}, l}^{(j)})$ , $h ({\tilde{n}}_{W_{k}, l}^{(j)}, {\bar{n}}_{W_{k}, l}^{(j)})$ upper bounds $ε ({\tilde{n}}_{W_{k}, l}^{(j)}, {\bar{n}}_{W_{k}, l}^{(j)})$ for ${\tilde{n}}_{W_{k}, l}^{(j)} \geq 0.5$ , since $h (0.5, {\bar{n}}_{W_{k}, l}^{(j)}) \geq ε (0.5, {\bar{n}}_{W_{k}, l}^{(j)})$ for ${\bar{n}}_{W_{k}, l}^{(j)} \leq 0.5$ . Combining the piece-wise bounds in Equation 20, Equation 21 and Equation 22, we conclude that for ${\bar{n}}_{W_{k}, l}^{(j)} \leq 0.5$ :

ε ({\tilde{n}}_{W_{k}, l}^{(j)}, {\bar{n}}_{W_{k}, l}^{(j)}) \leq \tilde{g} ({\bar{n}}_{W_{k}, l}^{(j)}, B^{'}) | {\tilde{n}}_{W_{k}, l}^{(j)} - {\bar{n}}_{W_{k}, l}^{(j)} |,

(23)

where

\tilde{g} ({\bar{n}}_{W_{k}, l}^{(j)}, B^{'}) = \max {\frac{1}{{\bar{n}}_{W_{k}, l}^{(j)} (1 - {\bar{n}}_{W_{k}, l}^{(j)})}, \frac{| B^{'} + ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) |}{| {\bar{n}}_{W_{k}, l}^{(j)} - \frac{1}{1 + \exp (B^{'})} |}, \frac{| B^{'} - ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) |}{| \frac{1}{1 + \exp (- B^{'})} - {\bar{n}}_{W_{k}, l}^{(j)} |}} .

Due to the symmetry of $ε ({\tilde{n}}_{W_{k}, l}^{(j)}, {\bar{n}}_{W_{k}, l}^{(j)})$ , the same bound in Equation 23 can be established for ${\bar{n}}_{W_{k}, l}^{(j)} > 0.5$ as well.

Then, using $| ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) | \leq B^{'}$ and conditioning on event $𝒜_{W}$ , we simplify this bound as:

\tilde{g} ({\bar{n}}_{W_{k}, l}^{(j)}, B^{'}) \leq \max {4 {(1 + \exp (B))}^{2}, \frac{4 B^{'} (1 + \exp (B^{'})) (1 + \exp (B))}{\exp (B^{'}) - (2 \exp (B) + 1)}} .

Finally, based on the fact that $B^{'} = 2 \log (2 \exp (B) + 1)$ , the latter is further upper bounded as:

\tilde{g} ({\bar{n}}_{W_{k}, l}^{(j)}, B^{'}) \leq g (B),

where

g (B) = \max {4 {(1 + \exp (B))}^{2}, 4 \exp (- B) \log (2 \exp (B) + 1) (1 + {(2 \exp (B) + 1)}^{2})} .

This concludes the proof of Lemma 2. ■
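The bound of Lemma 2 can be spot-checked numerically. The sketch below is not part of the proof: it assumes the logistic form $ϕ (x) = 1 / (1 + \exp (- x))$ , draws ${\bar{n}}$ uniformly from the range allowed on $𝒜_{W}$ , perturbs it with Gaussian noise of an arbitrary (hypothetical) scale to obtain ${\tilde{n}}$ , and reports the largest observed ratio $ε / (g (B) | {\tilde{n}} - {\bar{n}} |)$ , which should not exceed 1.

```python
import math
import random

def g(B):
    """Lipschitz-type constant g(B) from Lemma 2."""
    first = 4.0 * (1.0 + math.exp(B)) ** 2
    u = 2.0 * math.exp(B) + 1.0
    second = 4.0 * math.exp(-B) * math.log(u) * (1.0 + u ** 2)
    return max(first, second)

def phi_inv_truncated(z, B):
    """Inverse logistic clipped to [-B', B'] with B' = 2 log(2 exp(B) + 1)."""
    Bp = 2.0 * math.log(2.0 * math.exp(B) + 1.0)
    if z <= 0.0:
        return -Bp
    if z >= 1.0:
        return Bp
    return min(max(math.log(z / (1.0 - z)), -Bp), Bp)

def worst_ratio(B, trials=20000, seed=0):
    """Largest observed eps / (g(B) * |n_tilde - n_bar|) over random pairs."""
    rng = random.Random(seed)
    lo = 1.0 / (2.0 * (1.0 + math.exp(B)))     # n_bar range on the event A_W
    worst = 0.0
    for _ in range(trials):
        n_bar = rng.uniform(lo, 1.0 - lo)
        n_tilde = n_bar + rng.gauss(0.0, 0.3)  # noisy surrogate, may leave [0, 1]
        gap = abs(n_tilde - n_bar)
        if gap < 1e-12:
            continue
        eps = abs(phi_inv_truncated(n_tilde, B) - phi_inv_truncated(n_bar, B))
        worst = max(worst, eps / (g(B) * gap))
    return worst

for B in (0.5, 1.0, 2.5, 4.0):
    print(B, worst_ratio(B))   # each ratio is expected to stay below 1
```

A random search of this kind can only fail to find a counterexample; it does not replace the convexity argument above.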

Following Lemma 2, by conditioning on the event $𝒜_{W}$ we have:

𝔼_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) |] \leq g (B) 𝔼_{𝒜_{W}} [| {\tilde{n}}_{W_{k}, l}^{(j)} - {\bar{n}}_{W_{k}, l}^{(j)} |] .

(24)

Then, we note that:

E [| {\tilde{n}}_{W_{k}, l}^{(j)} - {\bar{n}}_{W_{k}, l}^{(j)} |] \overset{(c)}{\leq} \sqrt{E [{| {\tilde{n}}_{W_{k}, l}^{(j)} - {\bar{n}}_{W_{k}, l}^{(j)} |}^{2}]} \overset{(d)}{=} \frac{σ_{w} \sqrt{1 + α^{2}}}{\sqrt{W}},

(25)

where in $(c)$ we have used the Cauchy-Schwarz inequality, and in $(d)$ we have used the fact that the observation noise across the $W$ time instances is i.i.d. and white. From the bounds in Equation 24 and Equation 25, we conclude that the first expectation in Equation 19, conditioned on event $𝒜_{W}$ is bounded as:

\begin{array}{ll} E_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) |] & \leq g (B) E_{𝒜_{W}} [| {\tilde{n}}_{W_{k}, l}^{(j)} - {\bar{n}}_{W_{k}, l}^{(j)} |] \\ \leq g (B) \frac{σ_{w} \sqrt{1 + α^{2}}}{\sqrt{W} P (𝒜_{W})} . \end{array}

(26)

The foregoing sequence of reasoning similarly follows for $𝔼 [| ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(j)} |]$ , since $\frac{1}{1 + \exp (B)} \leq ϕ (x_{W_{k}, l}^{(j)}) \leq 1 - \frac{1}{1 + \exp (B)}$ for $k = 1, \dots, K$ , $l = 1, \dots, L$ and $j = 1, \dots, N$ (as a consequence of $| x_{W_{k}, l}^{(j)} | < B$ for $k = 1, \dots, K$ , $l = 1, \dots, L$ and $j = 1, \dots, N$ , conditioned on $𝒜_{W}$ ). Accordingly, we derive the upper bound on the second term in Equation 19, conditioned on event $𝒜_{W}$ :

\begin{array}{ll} E_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(j)} |] & \leq g (B) E_{𝒜_{W}} [| {\bar{n}}_{W_{k}, l}^{(j)} - ϕ (x_{W_{k}, l}^{(j)}) |] \\ \overset{(e)}{\leq} \frac{g (B)}{W P (𝒜_{W})} \sqrt{E [{(\sum_{w = 1}^{W} n_{(k - 1) W + w, l}^{(j)} - W ϕ (x_{W_{k}, l}^{(j)}))}^{2}]} \\ \overset{(f)}{=} \frac{g (B)}{W P (𝒜_{W})} \sqrt{E [W ϕ (x_{W_{k}, l}^{(j)}) (1 - ϕ (x_{W_{k}, l}^{(j)}))]} \\ \overset{(g)}{\leq} \frac{g (B)}{2 \sqrt{W} P (𝒜_{W})}, \end{array}

(27)

where $(e)$ follows from the application of Jensen’s inequality, $(f)$ follows from the formula for the variance of a Binomial random variable, and $(g)$ follows from the inequality $ϕ (x_{W_{k}, l}^{(j)}) (1 - ϕ (x_{W_{k}, l}^{(j)})) \leq 1 / 4$ , for $ϕ (x_{W_{k}, l}^{(j)}) \in [0, 1]$ . Combining the results in Equation 26 and Equation 27, the overall expectation in Equation 19, conditioned on the event $𝒜_{W}$ is upper-bounded by:

𝔼_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(j)} |] \leq \frac{2 g (B)}{\sqrt{W}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2}),

(28)

where we have lower bounded the probability of the event $𝒜_{W}$ by $1 / 2$ (that is, $P (𝒜_{W}) > 1 / 2$ ). Thus, from Equation 18 and Equation 28 we derive:

| {bias}_{𝒜_{W}} ({\tilde{d}}_{j}) | \leq \frac{2 g (B)}{\sqrt{W}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2}) \frac{\sum_{k, l = 1}^{K, L} | s_{W_{k}} |}{L \sum_{k = 1}^{K} s_{W_{k}}^{2}} \overset{(h)}{\leq} \frac{2 g (B)}{σ_{s} \sqrt{W}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2}),

where in $(h)$ we have used the Cauchy-Schwarz inequality $\sum_{k = 1}^{K} | s_{W_{k}} | \leq \sqrt{K} \sqrt{\sum_{k = 1}^{K} s_{W_{k}}^{2}}$ while defining $σ_{s}^{2} := \frac{1}{K} \sum_{k = 1}^{K} s_{W_{k}}^{2}$ .
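Steps $(e)$ to $(g)$ of Equation 27 rest on the chain $E [| {\bar{n}} - ϕ (x) |] \leq \sqrt{ϕ (x) (1 - ϕ (x)) / W} \leq 1 / (2 \sqrt{W})$ for a fixed $x$ . The following sketch evaluates the three quantities exactly for a few illustrative (hypothetical) values of $W$ and $p = ϕ (x)$ , treating the spike count in a window as Binomial $(W, p)$ , as invoked in step $(f)$ .

```python
import math

def mean_abs_dev(W, p):
    """Exact E|n_bar - p| for n_bar = Binomial(W, p) / W."""
    return sum(
        math.comb(W, m) * p ** m * (1.0 - p) ** (W - m) * abs(m / W - p)
        for m in range(W + 1)
    )

def std_dev(W, p):
    """Standard deviation of n_bar, sqrt(p (1 - p) / W)."""
    return math.sqrt(p * (1.0 - p) / W)

for W in (10, 50, 200):
    for p in (0.05, 0.3, 0.5):
        mad = mean_abs_dev(W, p)       # left-hand side of the chain
        sd = std_dev(W, p)             # Jensen / binomial-variance step
        cap = 0.5 / math.sqrt(W)       # worst case over p, attained at p = 1/2
        print(W, p, round(mad, 4), round(sd, 4), round(cap, 4))
```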

Then, for $B \geq 2.5$ , we have $g (B) = 4 {(1 + \exp (B))}^{2}$ and $B^{'} = 2 \log (2 \exp (B) + 1) \leq 3 B$ . Let $B := σ_{m} \sqrt{8 q \log W}$ for some $q > \frac{1}{64}$ . Further, for some $ϵ < 1 / 2$ , suppose that:

\log W \geq max {\frac{\log (8 K L N / η)}{q}, \frac{32 σ_{m}^{2} q}{ϵ^{2}}, \frac{2 \log (64 q)}{1 - 2 ϵ}, \frac{max {6.25, 4 {(‖ μ_{x} ‖_{\infty} + max_{k, j} {| s_{W_{k}} d_{j} |})}^{2}}}{8 q σ_{m}^{2}}, \log 2} .

(29)

Under these conditions,

g (B) \leq 4 {(1 + \exp (σ_{m} \sqrt{8 q \log W}))}^{2} \overset{(i)}{\leq} 16 \exp (2 σ_{m} \sqrt{8 q \log W}) \leq 16 W^{ϵ},

(30)

where in $(i)$ we have used the fact that $e^{x} \geq 1$ for $x \geq 0$ . Thus, under the conditions in Equation 29, we have:

| {bias}_{𝒜_{W}} ({\tilde{d}}_{j}) | \leq \frac{32}{σ_{s} \sqrt{W^{1 - 2 ϵ}}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2}) .

(31)

Finally, from Equation 16 and Equation 31, we conclude that:

| {bias}_{𝒜_{W}} ({\hat{d}}_{j}) | \leq \frac{1}{\sqrt{W^{1 - 2 ϵ}}} C_{1} (2 σ_{w} \sqrt{1 + α^{2}} + 1) + 𝒪 (σ_{w}^{2}) + 𝒪 (\frac{1}{W}),

where $C_{1} := \frac{16}{σ_{s}}$ .
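The two facts used in this simplification, namely that the first argument of the maximum defines $g (B)$ once $B \geq 2.5$ and that $g (B) \leq 16 W^{ϵ}$ as in Equation 30, can be checked numerically. The sketch below uses illustrative (hypothetical) values of $σ_{m}$ , $q$ and $ϵ$ , enforces only the two conditions of Equation 29 that Equation 30 relies on, and works with $\log W$ directly because the resulting $W$ is far too large to form explicitly.

```python
import math

def g_terms(B):
    """The two arguments of the max defining g(B) in Lemma 2."""
    first = 4.0 * (1.0 + math.exp(B)) ** 2
    u = 2.0 * math.exp(B) + 1.0
    second = 4.0 * math.exp(-B) * math.log(u) * (1.0 + u ** 2)
    return first, second

def check_eq30(sigma_m, q, eps, log_W):
    """Return (B, log g(B), log(16 W^eps)) for B = sigma_m * sqrt(8 q log W)."""
    B = sigma_m * math.sqrt(8.0 * q * log_W)
    log_g = math.log(max(g_terms(B)))
    return B, log_g, math.log(16.0) + eps * log_W

# Illustrative (hypothetical) parameter values
sigma_m, q, eps = 0.5, 0.05, 0.4
log_W = max(32.0 * sigma_m ** 2 * q / eps ** 2,     # gives 2 B <= eps * log W
            6.25 / (8.0 * q * sigma_m ** 2))        # gives B >= 2.5
B, log_g, log_cap = check_eq30(sigma_m, q, eps, log_W)
print(B, log_g, log_cap)            # log g(B) should not exceed log(16 W^eps)
print(g_terms(2.4), g_terms(2.5))   # the larger term switches between these values
```

Even for these modest parameter values the required $\log W$ is large, which suggests reading the conditions in Equation 29 as describing an asymptotic regime.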

Bounding the variance of ${\hat{d}}_{j}$

Next, we prove the upper bound on the variance of the maximum likelihood estimator, ${\hat{d}}_{j}$ . To that end, we upper-bound the variance of ${\tilde{d}}_{j}$ . First, using the Cauchy-Schwarz inequality, we have:

Var ({\tilde{d}}_{j}) := 𝔼 [{| {\tilde{d}}_{j} - 𝔼 [{\tilde{d}}_{j}] |}^{2}] \leq {\sqrt{𝔼 [{| {\tilde{d}}_{j} - {(d_{Oracle})}_{j} |}^{2}]} + \sqrt{Var ({(d_{Oracle})}_{j})}}^{2} .

(32)

Then, we upper-bound the conditional second moment of $| {\tilde{d}}_{j} - {(d_{Oracle})}_{j} |$ using the same techniques as we used in bounding the first moment. Accordingly, we get:

\begin{array}{ll} E_{𝒜_{W}} [{| {\tilde{d}}_{j} - (d_{Oracle})_{j} |}^{2}] & = \frac{1}{{(L \sum_{k = 1}^{K} s_{W_{k}}^{2})}^{2}} E_{𝒜_{W}} [{| \sum_{k, l = 1}^{K, L} s_{W_{k}} (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(j)}) |}^{2}] \\ \overset{(j)}{\leq} \frac{1}{{(L \sum_{k = 1}^{K} s_{W_{k}}^{2})}^{2}} {\sum_{k, l = 1}^{K, L} | s_{W_{k}} | \sqrt{E_{𝒜_{W}} [{| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(j)} |}^{2}]}}^{2} \\ \overset{(k)}{\leq} {\frac{\sqrt{2} g (B)}{σ_{s} \sqrt{W}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2})}^{2}, \end{array}

(33)

where in $(j)$ , we have used the Cauchy-Schwarz inequality and $(k)$ follows from $𝔼_{𝒜_{W}} [{| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(j)} |}^{2}] \leq \frac{2 {(g (B))}^{2}}{W} {(σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2})}^{2}$ , which can be proven by the same techniques as before.

Next, we note that the variance of the Oracle estimator ${(d_{Oracle})}_{j}$ is given by:

Var ((d_{Oracle})_{j}) = \frac{1}{{(L \sum_{k = 1}^{K} s_{W_{k}}^{2})}^{2}} \sum_{k, l = 1}^{K, L} s_{W_{k}}^{2} Var ((x_{W_{k}, l}^{(j)} - μ_{x}^{(j)})) = \frac{(Σ_{x})_{j, j}}{L \sum_{k = 1}^{K} s_{W_{k}}^{2}} = \frac{(Σ_{x})_{j, j}}{L K σ_{s}^{2}} .

(34)
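Equation 34 and the unbiasedness used in step $(a)$ of Equation 18 both follow from the Oracle estimator being a weighted sum of the $x_{W_{k}, l}^{(j)} - μ_{x}^{(j)}$ . The sketch below makes the weights explicit for a single neuron, assuming $x_{W_{k}, l}^{(j)} - μ_{x}^{(j)} = s_{W_{k}} d_{j} + z_{k, l}$ with independent zero-mean noise $z_{k, l}$ of variance ${(Σ_{x})}_{j, j}$ . All numerical values and function names are hypothetical.

```python
def oracle_weights(s, L):
    """Weights w[k][l] = s_k / (L * sum_k s_k^2) applied to (x_{k,l} - mu)."""
    denom = L * sum(sk * sk for sk in s)
    return [[sk / denom for _ in range(L)] for sk in s]

def oracle_mean_and_variance(s, L, d, sigma2):
    """Mean and variance of d_Oracle when x_{k,l} - mu = s_k * d + z_{k,l},
    with z_{k,l} independent, zero-mean and of variance sigma2."""
    w = oracle_weights(s, L)
    K = len(s)
    mean = sum(w[k][l] * s[k] * d for k in range(K) for l in range(L))
    var = sigma2 * sum(w[k][l] ** 2 for k in range(K) for l in range(L))
    return mean, var

s = [0.5, -1.0, 2.0, 1.5]        # hypothetical stimulus values s_{W_k}, K = 4
L, d, sigma2 = 10, 0.7, 1.3      # hypothetical trials, kernel and noise variance
K = len(s)
sigma_s2 = sum(sk * sk for sk in s) / K
mean, var = oracle_mean_and_variance(s, L, d, sigma2)
print(mean, d)                            # equal up to rounding: unbiased
print(var, sigma2 / (L * K * sigma_s2))   # equal up to rounding: Equation 34
```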

Combining Equation 32, Equation 33, and Equation 34, we can upper-bound the conditional variance of ${\tilde{d}}_{j}$ as:

\sqrt{Var_{𝒜_{W}} ({\tilde{d}}_{j})} \leq \sqrt{\frac{(Σ_{x})_{j, j}}{K L σ_{s}^{2} (1 - η)}} + \frac{\sqrt{2} g (B)}{σ_{s} \sqrt{W}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2}) .

Then, following Equation 16, under the conditions for $W$ in Equation 29, we conclude the proof of the conditional variance of ${\hat{d}}_{j}$ :

\sqrt{Var_{𝒜_{W}} ({\hat{d}}_{j})} \leq \sqrt{\frac{(Σ_{x})_{j, j}}{K L σ_{s}^{2} (1 - η)}} + \frac{1}{\sqrt{W^{1 - 2 ϵ}}} C_{2} (2 σ_{w} \sqrt{1 + α^{2}} + 1) + 𝒪 (σ_{w}^{2}) + 𝒪 (\frac{1}{W}),

(35)

where $C_{2} := \frac{8 \sqrt{2}}{σ_{s}}$ .

Bounding the bias of ${({\hat{𝚺}}_{x})}_{i, j}$

Next, following the foregoing techniques, we upper-bound the bias and variance of the noise covariance estimator ${({\hat{𝚺}}_{x})}_{i, j}$ . To that end, we first note:

\begin{array}{ll} | bias (({\tilde{Σ}}_{x})_{i, j}) | & := | E [({\tilde{Σ}}_{x})_{i, j}] - (Σ_{x})_{i, j} | \\ \overset{(l)}{\leq} | E [({\tilde{Σ}}_{x})_{i, j} - {(Σ_{Oracle})}_{i, j}] | + | bias ({(Σ_{Oracle})}_{i, j}) |, \end{array}

(36)

where $(l)$ follows from the triangle inequality, with the Oracle noise covariance estimator (i.e., observing $𝐱_{t, l}$ directly), being defined as:

{(Σ_{Oracle})}_{i, j} = \frac{1}{K L} \sum_{k, l = 1}^{K, L} (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)} - s_{W_{k}} (d_{Oracle})_{i}) (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)} - s_{W_{k}} (d_{Oracle})_{j}) .

Then, to simplify the first term in Equation 36, we use similar techniques as before. Accordingly,

\begin{array}{lll} | E [({\tilde{Σ}}_{x})_{i, j} - {(Σ_{Oracle})}_{i, j}] | & = & | E [\frac{1}{K L} \sum_{k, l = 1}^{K L} (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - μ_{x}^{(i)} - s_{W_{k}} {\tilde{d}}_{i}) (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - μ_{x}^{(j)} - s_{W_{k}} {\tilde{d}}_{j}) \\ - \frac{1}{K L} \sum_{k, l = 1}^{K L} (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)} - s_{W_{k}} (d_{Oracle})_{i}) (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)} - s_{W_{k}} (d_{Oracle})_{j})] | \\ \overset{(m)}{\leq} & \frac{1}{K L} \sum_{k, l = 1}^{K, L} E [| (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - μ_{x}^{(i)}) (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - μ_{x}^{(j)}) \\ - (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)}) (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)}) |] \\ + \frac{1}{K L^{2} \sum_{k = 1}^{K} s_{W_{k}}^{2}} E [| \sum_{k, l = 1}^{K, L} s_{W_{k}} (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - μ_{x}^{(i)}) \sum_{k^{'}, l^{'} = 1}^{K, L} s_{W_{k^{'}}} (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k^{'}}, l^{'}}^{(j)}) - μ_{x}^{(j)}) \\ - \sum_{k, l = 1}^{K, L} s_{W_{k}} (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)}) \sum_{k^{'}, l^{'} = 1}^{K, L} s_{W_{k^{'}}} (x_{W_{k^{'}}, l^{'}}^{(j)} - μ_{x}^{(j)}) |], \end{array}

(37)

where $(m)$ follows through the application of Jensen’s inequality and triangle inequality. Next, conditioned on the event $𝒜_{W}$ we have:

\begin{array}{ll} E_{𝒜_{W}} & [| (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - μ_{x}^{(i)}) (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - μ_{x}^{(j)}) - (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)}) (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)}) |] \\ \leq E_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(i)} x_{W_{k}, l}^{(j)} |] \\ + μ_{x}^{(j)} E_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - x_{W_{k}, l}^{(i)} |] + μ_{x}^{(i)} E_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(j)} |] \\ \leq E_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(i)}) ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) |] \\ + E_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(i)}) ϕ_{B^{'}}^{- 1} ({\bar{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(i)} x_{W_{k}, l}^{(j)} |] + 2 μ_{m} E_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - x_{W_{k}, l}^{(j)} |] \\ \leq 2 g (B) \frac{σ_{w} \sqrt{1 + α^{2}}}{\sqrt{W}} (g (B) \frac{σ_{w} \sqrt{1 + α^{2}}}{\sqrt{W}} + 4 \log (2 \exp (B) + 1)) + \frac{2 g (B)}{\sqrt{W}} {\frac{g (B)}{4 \sqrt{W}} + B} \\ + \frac{4 μ_{m} g (B)}{\sqrt{W}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2}), \end{array}

(38)

where $μ_{m} = ‖ μ_{x} ‖_{\infty}$ and we have used $B^{'} = 2 \log (2 \exp (B) + 1)$ . Similarly, conditioned on the event $𝒜_{W}$ the second term in Equation 37 can be bounded as:

\begin{array}{ll} E_{𝒜_{W}} & [| \sum_{k, l = 1}^{K, L} s_{W_{k}} (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - μ_{x}^{(i)}) \sum_{k^{'}, l^{'} = 1}^{K, L} s_{W_{k^{'}}} (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k^{'}}, l^{'}}^{(j)}) - μ_{x}^{(j)}) \\ - \sum_{k, l = 1}^{K, L} s_{W_{k}} (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)}) \sum_{k^{'}, l^{'} = 1}^{K, L} s_{W_{k^{'}}} (x_{W_{k^{'}}, l^{'}}^{(j)} - μ_{x}^{(j)}) |] \\ \leq \sum_{k, k^{'}, l, l^{'} = 1}^{K, K, L, L} | s_{W_{k}} s_{W_{k^{'}}} | E_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k^{'}}, l^{'}}^{(j)}) - x_{W_{k}, l}^{(i)} x_{W_{k^{'}}, l^{'}}^{(j)} |] \\ + μ_{x}^{(i)} \sum_{k, k^{'}, l, l^{'} = 1}^{K, K, L, L} | s_{W_{k}} s_{W_{k^{'}}} | E_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k^{'}}, l^{'}}^{(j)}) - x_{W_{k^{'}}, l^{'}}^{(j)} |] \\ + μ_{x}^{(j)} \sum_{k, k^{'}, l, l^{'} = 1}^{K, K, L, L} | s_{W_{k}} s_{W_{k^{'}}} | E_{𝒜_{W}} [| ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - x_{W_{k}, l}^{(i)} |] \end{array}

(39)

\begin{array}{ll} \leq {(L \sum_{k = 1}^{K} | s_{W_{k}} |)}^{2} {2 g (B) \frac{σ_{w} \sqrt{1 + α^{2}}}{\sqrt{W}} (g (B) \frac{σ_{w} \sqrt{1 + α^{2}}}{\sqrt{W}} + 4 \log (2 \exp (B) + 1)) \\ + \frac{2 g (B)}{\sqrt{W}} {\frac{g (B)}{4 \sqrt{W}} + B} + \frac{4 μ_{m} g (B)}{\sqrt{W}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2})} . \end{array}

(40)

Then, by combining the bounds in Equation 38 and Equation 40 and using an instance of Cauchy-Schwarz inequality ${(\sum_{k = 1}^{K} | s_{W_{k}} |)}^{2} \leq K \sum_{k = 1}^{K} s_{W_{k}}^{2}$ , we see that the bound in Equation 37 conditioned on the event $𝒜_{W}$ can be expressed as:

\begin{array}{ll} | E_{𝒜_{W}} [({\tilde{Σ}}_{x})_{i, j} - {(Σ_{Oracle})}_{i, j}] | & \leq 4 g (B) \frac{σ_{w} \sqrt{1 + α^{2}}}{\sqrt{W}} (g (B) \frac{σ_{w} \sqrt{1 + α^{2}}}{\sqrt{W}} + 4 \log (2 \exp (B) + 1) + 2 μ_{m}) \\ + \frac{4 g (B)}{\sqrt{W}} {\frac{g (B)}{4 \sqrt{W}} + B + μ_{m}} . \end{array}

(41)

Next, we see that the Oracle estimator follows an Inverse Wishart distribution, that is $K L Σ_{Oracle} \sim InvWish_{N} (Σ_{x}, K L - 1)$ . Therefore, we get:

E [Σ_{Oracle}] = \frac{(K L - 1)}{K L} Σ_{x} .

Thus, the bias of the oracle estimator is given by:

| bias ({(Σ_{Oracle})}_{i, j}) | = \frac{1}{K L} | {(Σ_{x})}_{i, j} | .

(42)
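The factor $(K L - 1) / (K L)$ can be traced to the single degree of freedom spent on estimating $d$ . In a scalar sketch (one neuron, independent zero-mean noise of variance `sigma2`, both assumptions of this example), the Oracle residuals equal $M z$ , where $z$ stacks the noise terms, $v$ stacks $s_{W_{k}}$ over all $K L$ samples and $M = I - v v^{⊤} / (v^{⊤} v)$ , so the expected Oracle variance is `sigma2` times the trace of $M$ divided by $K L$ . The names and values below are hypothetical.

```python
def residual_projection_trace(s, L):
    """Trace of M = I - v v^T / (v^T v), where v stacks s_k over all K*L samples."""
    v = [sk for sk in s for _ in range(L)]
    vtv = sum(x * x for x in v)
    return sum(1.0 - x * x / vtv for x in v)

def expected_oracle_variance(s, L, sigma2):
    """E[(Sigma_Oracle)_{j,j}] = sigma2 * trace(M) / (K L) under the stated assumptions."""
    KL = len(s) * L
    return sigma2 * residual_projection_trace(s, L) / KL

s = [0.5, -1.0, 2.0, 1.5]   # hypothetical stimulus values, K = 4
L, sigma2 = 10, 1.3         # hypothetical number of trials and noise variance
KL = len(s) * L
trace_M = residual_projection_trace(s, L)
bias = abs(expected_oracle_variance(s, L, sigma2) - sigma2)
print(trace_M, KL - 1)      # equal up to rounding
print(bias, sigma2 / KL)    # equal up to rounding: Equation 42 with i = j
```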

Combining the results in Equation 41 and Equation 42, the bias of $({\tilde{Σ}}_{x})_{i, j}$ can be bounded as:

\begin{array}{ll} | {bias}_{𝒜_{W}} (({\tilde{Σ}}_{x})_{i, j}) | & \leq \frac{| {(Σ_{x})}_{i, j} |}{K L (1 - η)} + 4 g (B) \frac{σ_{w} \sqrt{1 + α^{2}}}{\sqrt{W}} (4 \log (2 \exp (B) + 1) + 2 μ_{m}) \\ + \frac{4 g (B)}{\sqrt{W}} (B + μ_{m}) + 𝒪 (\frac{g {(B)}^{2}}{W}) . \end{array}

(43)

Finally, under the conditions for $W$ in Equation 29, the latter inequality simplifies to:

\begin{array}{ll} | {bias}_{𝒜_{W}} (({\tilde{Σ}}_{x})_{i, j}) | & \overset{(n)}{\leq} \frac{| {(Σ_{x})}_{i, j} |}{K L (1 - η)} + \frac{B g (B)}{\sqrt{W}} (28 σ_{w} \sqrt{1 + α^{2}} + 6) + 𝒪 (\frac{g {(B)}^{2}}{W}) \\ \overset{(o)}{\leq} \frac{| {(Σ_{x})}_{i, j} |}{K L (1 - η)} + 64 σ_{m} \sqrt{\frac{2 q \log W}{W^{1 - 2 ϵ}}} (14 σ_{w} \sqrt{1 + α^{2}} + 3) + 𝒪 (\frac{1}{W^{1 - 2 ϵ}}), \end{array}

(44)

where in $(n)$ we have used $2 \log (2 \exp (B) + 1) \leq 3 B$ and $B > 2 μ_{m}$ and in $(o)$ we have used $B g (B) \leq 16 W^{ϵ} σ_{m} \sqrt{8 q \log W}$ , which follows from Equation 30 and the definition $B := σ_{m} \sqrt{8 q \log W}$ . Thus, following Equation 17 we derive the bound on the bias of the maximum likelihood estimator:

| {bias}_{𝒜_{W}} (({\hat{Σ}}_{x})_{i, j}) | \leq \frac{| {(Σ_{x})}_{i, j} |}{K L (1 - η)} + \sqrt{\frac{\log W}{W^{1 - 2 ϵ}}} C_{3} (14 σ_{w} \sqrt{1 + α^{2}} + 3) + 𝒪 (σ_{w}^{2}) + 𝒪 (\frac{1}{W^{1 - 2 ϵ}}),

where $C_{3} := 64 σ_{m} \sqrt{2 q}$ .

Bounding the variance of $({\hat{Σ}}_{x})_{i, j}$

Next, we establish an upper bound on the variance of the maximum likelihood estimator of the noise covariance. To that end, we upper-bound the variance of ${({\tilde{𝚺}}_{x})}_{i, j}$ . First, using the Cauchy-Schwarz inequality, we get:

\begin{array}{ll} Var (({\tilde{Σ}}_{x})_{i, j}) & := E [{| ({\tilde{Σ}}_{x})_{i, j} - E [({\tilde{Σ}}_{x})_{i, j}] |}^{2}] \\ \leq {\sqrt{E [{| ({\tilde{Σ}}_{x})_{i, j} - (Σ_{Oracle})_{i, j} |}^{2}]} + \sqrt{Var ({(Σ_{Oracle})}_{i, j})}}^{2} . \end{array}

(45)

Then, we upper-bound the conditional second moment of $| ({\tilde{Σ}}_{x})_{i, j} - (Σ_{Oracle})_{i, j} |$ using the same techniques used in bounding its first moment. Accordingly, we derive:

\begin{array}{ll} E_{𝒜_{W}} [{| ({\tilde{Σ}}_{x})_{i, j} - (Σ_{Oracle})_{i, j} |}^{2}] \\ = \frac{1}{K^{2} L^{2}} E_{𝒜_{W}} [{\sum_{k, l = 1}^{K, L} ((ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - μ_{x}^{(i)} - s_{W_{k}} {\tilde{d}}_{i}) (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - μ_{x}^{(j)} - s_{W_{k}} {\tilde{d}}_{j}) \\ - (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)} - s_{W_{k}} (d_{Oracle})_{i}) (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)} - s_{W_{k}} (d_{Oracle})_{j}))}^{2}] \\ \leq \frac{1}{K^{2} L^{2}} {\sum_{k, l = 1}^{K, L} {E_{𝒜_{W}} [((ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - μ_{x}^{(i)} - s_{W_{k}} {\tilde{d}}_{i}) (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - μ_{x}^{(j)} - s_{W_{k}} {\tilde{d}}_{j}) \\ - (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)} - s_{W_{k}} (d_{Oracle})_{i}) (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)} - s_{W_{k}} (d_{Oracle})_{j}))^{2}]}^{\frac{1}{2}}}^{2}, \end{array}

(46)

where the last bound follows from the Cauchy-Schwarz inequality. Then, we derive:

\begin{array}{ll} E_{𝒜_{W}} & [{(ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - μ_{x}^{(i)} - s_{W_{k}} {\tilde{d}}_{i}) (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - μ_{x}^{(j)} - s_{W_{k}} {\tilde{d}}_{j}) \\ - (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)} - s_{W_{k}} (d_{Oracle})_{i}) (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)} - s_{W_{k}} (d_{Oracle})_{j})}^{2}] \\ = E_{𝒜_{W}} [{(ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - μ_{x}^{(i)} - s_{W_{k}} \frac{1}{L \sum_{k^{'} = 1}^{K} {s_{W_{k^{'}}}}^{2}} \sum_{k^{'}, l^{'} = 1}^{K, L} s_{W_{k^{'}}} (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k^{'}}, l^{'}}^{(i)}) - μ_{x}^{(i)})) \\ \times (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - μ_{x}^{(j)} - s_{W_{k}} \frac{1}{L \sum_{k^{''} = 1}^{K} {s_{W_{k^{''}}}}^{2}} \sum_{k^{''}, l^{''} = 1}^{K, L} s_{W_{k^{''}}} (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k^{''}}, l^{''}}^{(j)}) - μ_{x}^{(j)})) \\ - (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)} - s_{W_{k}} \frac{1}{L \sum_{k^{'} = 1}^{K} {s_{W_{k^{'}}}}^{2}} \sum_{k^{'}, l^{'} = 1}^{K, L} s_{W_{k^{'}}} (x_{W_{k^{'}}, l^{'}}^{(i)} - μ_{x}^{(i)})) \\ \times (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)} - s_{W_{k}} \frac{1}{L \sum_{k^{''} = 1}^{K} {s_{W_{k^{''}}}}^{2}} \sum_{k^{''}, l^{''} = 1}^{K, L} s_{W_{k^{''}}} (x_{W_{k^{''}}, l^{''}}^{(j)} - μ_{x}^{(j)}))}^{2}] \\ = E_{𝒜_{W}} [{((ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - μ_{x}^{(i)}) (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - μ_{x}^{(j)}) - (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)}) (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)})) \\ - \frac{s_{W_{k}}}{L \sum_{k^{''} = 1}^{K} {s_{W_{k^{''}}}}^{2}} \sum_{k^{''}, l^{''} = 1}^{K, L} s_{W_{k^{''}}} {(ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(i)}) - μ_{x}^{(i)}) (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k^{''}}, l^{''}}^{(j)}) - μ_{x}^{(j)}) \\ - (x_{W_{k}, l}^{(i)} - μ_{x}^{(i)}) (x_{W_{k^{''}}, l^{''}}^{(j)} - μ_{x}^{(j)})} \\ - \frac{s_{W_{k}}}{L \sum_{k^{'} = 1}^{K} {s_{W_{k^{'}}}}^{2}} \sum_{k^{'}, l^{'} = 1}^{K, L} s_{W_{k^{'}}} {(ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k^{'}}, l^{'}}^{(i)}) - μ_{x}^{(i)}) (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k}, l}^{(j)}) - μ_{x}^{(j)}) \\ - (x_{W_{k^{'}}, l^{'}}^{(i)} - μ_{x}^{(i)}) (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)})} \\ + \frac{{s_{W_{k}}}^{2}}{L^{2} \sum_{k^{'}, k^{''} = 1}^{K, K} {s_{W_{k^{'}}}}^{2} {s_{W_{k^{''}}}}^{2}} \sum_{k^{'}, k^{''}, l^{'}, l^{''} = 1}^{K, L} s_{W_{k^{'}}} s_{W_{k^{''}}} {(ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k^{'}}, l^{'}}^{(i)}) - μ_{x}^{(i)}) (ϕ_{B^{'}}^{- 1} ({\tilde{n}}_{W_{k^{''}}, l^{''}}^{(j)}) - μ_{x}^{(j)}) \\ - (x_{W_{k^{'}}, l^{'}}^{(i)} - μ_{x}^{(i)}) (x_{W_{k^{''}}, l^{''}}^{(j)} - μ_{x}^{(j)})}}^{2}] \\ \leq \frac{2 {(g (B))}^{2}}{W} {(σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2})}^{2} {(\frac{g (B)}{\sqrt{W}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2}) + 2 (B + μ_{m}))}^{2} \\ \times {(1 + \frac{s_{W_{k}} L \sum_{k^{'} = 1}^{K} s_{W_{k^{'}}}}{L \sum_{k^{'} = 1}^{K} {s_{W_{k^{'}}}}^{2}} + \frac{s_{W_{k}} L \sum_{k^{''} = 1}^{K} s_{W_{k^{''}}}}{L \sum_{k^{''} = 1}^{K} {s_{W_{k^{''}}}}^{2}} + \frac{s_{W_{k}}^{2} L \sum_{k^{'} = 1}^{K} s_{W_{k^{'}}} L \sum_{k^{''} = 1}^{K} s_{W_{k^{''}}}}{L^{2} \sum_{k^{'} = 1}^{K} {s_{W_{k^{'}}}}^{2} \sum_{k^{''} = 1}^{K} {s_{W_{k^{''}}}}^{2}})}^{2} . \end{array}

(47)

Using the final bound of Equation 47 in Equation 46, we get:

\begin{array}{ll} \sqrt{E_{𝒜_{W}} [{| ({\tilde{Σ}}_{x})_{i, j} - (Σ_{Oracle})_{i, j} |}^{2}]} & \leq \frac{\sqrt{2} g (B)}{\sqrt{W}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2}) (\frac{g (B)}{\sqrt{W}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2}) + 2 (B + μ_{m})) \\ \times \frac{1}{K L} (K L + \frac{L \sum_{k = 1}^{K} | s_{W_{k}} | \sum_{k^{'} = 1}^{K} | s_{W_{k^{'}}} |}{\sum_{k^{'} = 1}^{K} {s_{W_{k^{'}}}}^{2}} + \frac{L \sum_{k = 1}^{K} | s_{W_{k}} | \sum_{k^{''} = 1}^{K} | s_{W_{k^{''}}} |}{\sum_{k^{''} = 1}^{K} {s_{W_{k^{''}}}}^{2}} \\ + \frac{L \sum_{k = 1}^{K} s_{W_{k}}^{2} \sum_{k^{'} = 1}^{K} | s_{W_{k^{'}}} | \sum_{k^{''} = 1}^{K} | s_{W_{k^{''}}} |}{\sum_{k^{'} = 1}^{K} {s_{W_{k^{'}}}}^{2} \sum_{k^{''} = 1}^{K} {s_{W_{k^{''}}}}^{2}}) \\ \leq \frac{4 \sqrt{2} g (B)}{\sqrt{W}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2}) (\frac{g (B)}{\sqrt{W}} (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2}) + 2 (B + μ_{m})), \end{array}

(48)

where the last inequality follows from an instance of the Cauchy-Schwarz inequality, that is ${(\sum_{k = 1}^{K} | s_{W_{k}} |)}^{2} \leq K \sum_{k = 1}^{K} s_{W_{k}}^{2}$ .

Then, following the observation $K L Σ_{Oracle} \sim InvWish_{N} (Σ_{x}, K L - 1)$ , we derive the variance of ${(Σ_{Oracle})}_{i, j}$ :

Var ({(Σ_{Oracle})}_{i, j}) = δ_{i, j}^{2} = \frac{(K L - 1) ((Σ_{x})_{i, j}^{2} + (Σ_{x})_{i, i} (Σ_{x})_{j, j})}{K^{2} L^{2}} .

(49)

Combining Equation 45, Equation 48 and Equation 49, we express the upper bound on the conditional variance of $({\tilde{Σ}}_{x})_{i, j}$ as:

\sqrt{Var_{𝒜_{W}} (({\tilde{Σ}}_{x})_{i, j})} \leq \frac{1}{\sqrt{1 - η}} δ_{i, j} + \frac{8 \sqrt{2} g (B)}{\sqrt{W}} (B + μ_{m}) (σ_{w} \sqrt{1 + α^{2}} + \frac{1}{2}) + 𝒪 (\frac{g {(B)}^{2}}{W}) .

Then, following Equation 17 and the conditions in Equation 29, we conclude the proof of the upper bound on the conditional variance of $({\hat{Σ}}_{x})_{i, j}$ :

\sqrt{Var_{𝒜_{W}} (({\hat{Σ}}_{x})_{i, j})} \leq \frac{1}{\sqrt{1 - η}} δ_{i, j} + \sqrt{\frac{\log W}{W^{1 - 2 ϵ}}} C_{4} (2 σ_{w} \sqrt{1 + α^{2}} + 1) + 𝒪 (σ_{w}^{2}) + 𝒪 (\frac{1}{W^{1 - 2 ϵ}}),

where $C_{4} := 384 σ_{m} \sqrt{q}$ .

Finally, it only remains to prove that the event $𝒜_{W}$ occurs with high probability for sufficiently large $W$ :

Lemma 3. The probability of occurrence of the event

𝒜_{W} = {| x_{W_{k}, l}^{(j)} | \leq B and \frac{1}{2 (1 + \exp (B))} \leq {\bar{n}}_{W_{k}, l}^{(j)} \leq 1 - \frac{1}{2 (1 + \exp (B))} for j = 1, \dots, N, k = 1, \dots, K and l = 1, \dots, L}

is lower-bounded as follows:

P (𝒜_{W}) \geq 1 - η,

for some constant $0 < η \leq 1 / 2$ satisfying the conditions of Equation (29).

Proof of Lemma 3.

First, using the union bound, we have:

P (𝒜_{W}) \geq 1 - \sum_{k, l, j = 1}^{K, L, N} {P (| x_{W_{k}, l}^{(j)} | > B) + P ({\bar{n}}_{W_{k}, l}^{(j)} < \frac{1}{2 (1 + \exp (B))}) + P ({\bar{n}}_{W_{k}, l}^{(j)} > 1 - \frac{1}{2 (1 + \exp (B))})} .

(50)

Next, we bound the probabilities on the right-hand side using Chernoff’s inequality (Boucheron et al., 2013). First, note that:

\begin{array}{ll} P (x_{W_{k}, l}^{(j)} > B) = & P (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)} - s_{W_{k}} d_{j} > B - μ_{x}^{(j)} - s_{W_{k}} d_{j}) \\ \overset{(p)}{\leq} P (x_{W_{k}, l}^{(j)} - μ_{x}^{(j)} - s_{W_{k}} d_{j} > \frac{B}{2}) \\ \overset{(q)}{\leq} \exp (- \frac{B^{2}}{8 σ_{m}^{2}}), \end{array}

where $(p)$ follows if $B > 2 (‖ μ_{x} ‖_{\infty} + max_{k, j} {| s_{W_{k}} d_{j} |})$ (which will hold under the conditions in Equation 29) and $(q)$ has been derived by applying Chernoff’s bound to the Gaussian random variable $x_{W_{k}, l}^{(j)}$ . From the same reasoning, we see that $P (x_{W_{k}, l}^{(j)} < - B) \leq \exp (- \frac{B^{2}}{8 σ_{m}^{2}})$ . Combining these two results, we get the upper bound:

P (| x_{W_{k}, l}^{(j)} | > B) \leq 2 \exp (- \frac{B^{2}}{8 σ_{m}^{2}}) .

(51)

Next, note that:

\begin{array}{ll} P ({\bar{n}}_{W_{k}, l}^{(j)} < \frac{1}{2 (1 + \exp (B))}) & \overset{(r)}{\leq} P ({\bar{n}}_{W_{k}, l}^{(j)} - ϕ (x_{W_{k}, l}^{(j)}) < \frac{- 1}{2 (1 + \exp (B))}) \\ \overset{(s)}{\leq} \exp (- \frac{W}{16 (1 + \exp (B))^{2}}), \end{array}

(52)

where $(r)$ follows from the observation $\frac{1}{1 + \exp (B)} < ϕ (x_{W_{k}, l}^{(j)})$ (which is a consequence of $| x_{W_{k}, l}^{(j)} | < B$ ). Then, we note that the zero-mean random variable ${\bar{n}}_{W_{k}, l}^{(j)} - ϕ (x_{W_{k}, l}^{(j)})$ is sub-Gaussian with variance factor $\frac{2}{W}$ . Thus, using Chernoff’s inequality for sub-Gaussian random variables (Boucheron et al., 2013), we derive the upper bound $(s)$ in Equation 52. In a similar fashion, based on the observation $ϕ (x_{W_{k}, l}^{(j)}) < 1 - \frac{1}{1 + \exp (B)}$ , we conclude the bound:

P ({\bar{n}}_{W_{k}, l}^{(j)} > 1 - \frac{1}{2 (1 + \exp (B))}) \leq \exp (- \frac{W}{16 (1 + \exp (B))^{2}}) .

(53)

By combining the bounds in Equation 51, Equation 52, and Equation 53, the lower bound on $P (𝒜_{W})$ in Equation 50 takes the form:

P (𝒜_{W}) \geq 1 - 2 K L N \exp (- \frac{W}{16 (1 + \exp (B))^{2}}) - 2 K L N \exp (- \frac{B^{2}}{8 σ_{m}^{2}}) .

Finally, under the assumptions in Equation 29, we further simplify this bound as:

P (𝒜_{W}) \geq 1 - 2 K L N \exp (- \frac{W^{1 - ϵ}}{64}) - \frac{2 K L N}{W^{q}} \geq 1 - \frac{4 K L N}{W^{q}},

where we have used $W \geq 2$ (which gives $\log W \geq 2 \log \log W$ ) and $\log W \geq \frac{2 \log (64 q)}{1 - 2 ϵ}$ to show that $\frac{W^{1 - ϵ}}{64} \geq q \log W$ . Thus, $\log W \geq \frac{\log (8 K L N / η)}{q}$ ensures that $P (𝒜_{W}) \geq 1 - η$ , for $0 < η \leq \frac{1}{2}$ . ■

This concludes the proof of Theorem 1. ■
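To get a feel for the conditions in Equation 29, the sketch below computes the smallest admissible $\log W$ for given problem parameters and evaluates the simplified probability bound $1 - 4 K L N / W^{q}$ from the proof of Lemma 3. The function names and the example values are hypothetical; `mu_inf` stands for $‖ μ_{x} ‖_{\infty}$ and `sd_max` for $max_{k, j} {| s_{W_{k}} d_{j} |}$ . The computation stays in the log domain because the implied $W$ can be extremely large.

```python
import math

def min_log_W(K, L, N, eta, q, eps, sigma_m, mu_inf, sd_max):
    """Smallest log W meeting every condition listed in Equation 29.

    Assumes q > 1/64, 0 < eps < 1/2 and 0 < eta <= 1/2, as required in the text.
    """
    terms = [
        math.log(8.0 * K * L * N / eta) / q,
        32.0 * sigma_m ** 2 * q / eps ** 2,
        2.0 * math.log(64.0 * q) / (1.0 - 2.0 * eps),
        max(6.25, 4.0 * (mu_inf + sd_max) ** 2) / (8.0 * q * sigma_m ** 2),
        math.log(2.0),
    ]
    return max(terms), terms

def prob_lower_bound(K, L, N, q, log_W):
    """Simplified bound from the proof of Lemma 3: 1 - 4 K L N / W^q."""
    return 1.0 - 4.0 * K * L * N * math.exp(-q * log_W)

# Hypothetical example values
K, L, N, eta = 5, 10, 20, 0.1
log_W, terms = min_log_W(K, L, N, eta, q=1.0, eps=0.25,
                         sigma_m=1.0, mu_inf=0.5, sd_max=1.0)
print(terms)                                        # the five candidates for log W
print(log_W, prob_lower_bound(K, L, N, 1.0, log_W))
```

Because the first condition alone already gives $4 K L N / W^{q} \leq η / 2$ , the returned probability bound is at least $1 - η$ whenever all conditions are met.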

Appendix 3

Adapting the proposed signal and noise correlation estimates to spiking observations

While Algorithm 1 takes two-photon fluorescence observations as input and produces estimates of signal and noise correlation as output, it is possible to adapt it to spiking observations obtained by electrophysiology recordings. The resulting algorithm is obtained by simplifying the variational inference procedure in Algorithm 1 and is given below for completeness:

Algorithm 2 Estimation of $𝚺_{x}$ and $𝐃$ from spiking observations

Inputs: Ensemble of spiking observations ${\{𝐧_{t, l}\}}_{t, l = 1}^{T, L}$, constant $𝝁_{x}$, hyper-parameters $𝝍_{x}$ and $ρ_{x}$, tolerance at convergence $δ$ and the external stimulus $𝐬_{t}$

Outputs: ${\hat{𝚺}}_{x}$ and $\hat{𝐃}$

Initialization: Initial choice of $𝐏_{x}$, ${\tilde{𝛀}}_{t, l}$, ${\hat{𝚺}}_{x}$ and $\hat{𝐃}$; residual $= 10 δ$; $γ_{x} = ρ_{x} + L T$

1. while residual $\geq δ$ do
   Update variational parameters
2.   for $t = 1, \dots, T$ and $l = 1, \dots, L$ do
3.     $𝐐_{𝐱_{t, l}} = {({\tilde{𝛀}}_{t, l} + γ_{x} 𝐏_{x}^{- 1})}^{- 1}$
4.     $𝐦_{𝐱_{t, l}} = 𝐐_{𝐱_{t, l}} (𝐧_{t, l} - \frac{1}{2} 𝟏 - {\tilde{𝛀}}_{t, l} {\hat{𝐃}}^{⊤} 𝐬_{t} + γ_{x} 𝐏_{x}^{- 1} 𝝁_{x})$
5.     for $j = 1, \dots, N$ do
6.       $c_{t, l}^{(j)} = \sqrt{{(𝐐_{𝐱_{t, l}})}_{j, j} + {(m_{𝐱_{t, l}}^{(j)} + {\hat{𝐝}}_{j}^{⊤} 𝐬_{t})}^{2}}$
7.       ${({\tilde{𝛀}}_{t, l})}_{j, j} := \frac{1}{2 c_{t, l}^{(j)}} \tanh (\frac{c_{t, l}^{(j)}}{2})$
8.     end for
9.   end for
10.   $𝐏_{x} := 𝝍_{x} + \sum_{t, l = 1}^{T, L} (𝐐_{𝐱_{t, l}} + 𝐦_{𝐱_{t, l}} 𝐦_{𝐱_{t, l}}^{⊤} - 𝝁_{x} 𝐦_{𝐱_{t, l}}^{⊤} - 𝐦_{𝐱_{t, l}} 𝝁_{x}^{⊤} + 𝝁_{x} 𝝁_{x}^{⊤})$
   Update outputs and the convergence criterion
11.   for $j = 1, \dots, N$ do
12.     ${\hat{𝐝}}_{j} = {(\sum_{t, l = 1}^{T, L} {({\tilde{𝛀}}_{t, l})}_{j, j} 𝐬_{t} 𝐬_{t}^{⊤})}^{- 1} (\sum_{t, l = 1}^{T, L} ((n_{t, l}^{(j)} - \frac{1}{2}) 𝐬_{t} - {({\tilde{𝛀}}_{t, l})}_{j, j} m_{𝐱_{t, l}}^{(j)} 𝐬_{t}))$
13.   end for
14.   ${(\hat{𝐃})}_{𝗉𝗋𝖾𝗏} = \hat{𝐃}$, $\hat{𝐃} = [{\hat{𝐝}}_{1}, {\hat{𝐝}}_{2}, \dots, {\hat{𝐝}}_{N}]$
15.   ${({\hat{𝚺}}_{x})}_{𝗉𝗋𝖾𝗏} = {\hat{𝚺}}_{x}$, ${\hat{𝚺}}_{x} = \frac{𝐏_{x}}{γ_{x} + N + 1}$
16.   residual $= {∥ {({\hat{𝚺}}_{x})}_{𝗉𝗋𝖾𝗏} - {\hat{𝚺}}_{x} ∥}_{2} / {∥ {({\hat{𝚺}}_{x})}_{𝗉𝗋𝖾𝗏} ∥}_{2} + {∥ {(\hat{𝐃})}_{𝗉𝗋𝖾𝗏} - \hat{𝐃} ∥}_{2} / {∥ {(\hat{𝐃})}_{𝗉𝗋𝖾𝗏} ∥}_{2}$
17. end while
18. Return ${\hat{𝚺}}_{x}$ and $\hat{𝐃}$
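The updates in Algorithm 2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function name, the array layout (spikes as a T × L × N array, stimulus as a T × M array), the initial values of $𝐏_{x}$, ${\tilde{𝛀}}_{t, l}$ and $\hat{𝐃}$, the iteration cap `max_iter` and the small constant `eps` that guards the division in the residual are all choices made here for the example.

```python
import numpy as np


def estimate_from_spikes(n, s, mu_x, psi_x, rho_x, delta=1e-3, max_iter=200):
    """Sketch of Algorithm 2. n: (T, L, N) binary spikes, s: (T, M) stimulus.

    Returns (Sigma_hat, D_hat) with shapes (N, N) and (M, N).
    """
    T, L, N = n.shape
    M = s.shape[1]
    gamma = rho_x + L * T
    # Initial values (assumed here; the algorithm leaves them to the user).
    P = gamma * np.eye(N)
    omega = np.full((T, L, N), 0.25)   # diagonal entries of Omega-tilde
    D = np.zeros((M, N))
    Sigma = P / (gamma + N + 1)
    eps = 1e-12                        # guard against a zero norm in the residual

    for _ in range(max_iter):          # iteration cap is an added safeguard
        P_inv = np.linalg.inv(P)
        drive = (s @ D)[:, None, :]    # (T, 1, N): entries d_j^T s_t

        # Steps 3-4: Gaussian variational parameters for every (t, l).
        Q = np.linalg.inv(omega[..., None] * np.eye(N) + gamma * P_inv)
        rhs = n - 0.5 - omega * drive + gamma * (P_inv @ mu_x)
        m = np.einsum("tlij,tlj->tli", Q, rhs)

        # Steps 6-7: update the diagonal of Omega-tilde.
        c = np.sqrt(np.diagonal(Q, axis1=-2, axis2=-1) + (m + drive) ** 2)
        omega = np.tanh(c / 2.0) / (2.0 * c)

        # Step 10: P_x = psi_x + sum of Q + (m - mu)(m - mu)^T.
        r = m - mu_x
        P = psi_x + Q.sum(axis=(0, 1)) + np.einsum("tli,tlj->ij", r, r)

        # Steps 11-14: weighted least-squares update of each column d_j.
        D_prev, D = D, np.empty((M, N))
        for j in range(N):
            w = omega[:, :, j].sum(axis=1)                     # sum over trials
            A = (s * w[:, None]).T @ s                         # sum of omega * s s^T
            resid = (n[:, :, j] - 0.5) - omega[:, :, j] * m[:, :, j]
            D[:, j] = np.linalg.solve(A, s.T @ resid.sum(axis=1))

        # Steps 15-16: covariance estimate and relative change.
        Sigma_prev, Sigma = Sigma, P / (gamma + N + 1)
        residual = (
            np.linalg.norm(Sigma_prev - Sigma, 2) / (np.linalg.norm(Sigma_prev, 2) + eps)
            + np.linalg.norm(D_prev - D, 2) / (np.linalg.norm(D_prev, 2) + eps)
        )
        if residual < delta:
            break
    return Sigma, D
```

The updates for all $(t, l)$ are batched rather than looped, which leaves the result unchanged because each pair only depends on the current $𝐏_{x}$ and $\hat{𝐃}$. The solve in the update of ${\hat{𝐝}}_{j}$ requires the stimulus matrix to have full column rank.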


Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Behtash Babadi, Email: behtash@umd.edu.

Brice Bathellier, CNRS, France.

Barbara G Shinn-Cunningham, Carnegie Mellon University, United States.

Funding Information

This paper was supported by the following grants:

National Science Foundation 1807216 to Behtash Babadi.
National Science Foundation 2032649 to Behtash Babadi.
National Institutes of Health 1U19NS107464 to Patrick O Kanold, Behtash Babadi.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Conceptualization, Resources, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing.

Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Data curation, Investigation, Writing - review and editing.

Conceptualization, Resources, Supervision, Funding acquisition, Validation, Investigation, Project administration, Writing - review and editing.

Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing.

Ethics

Animal experimentation: All procedures, under Kanold lab protocol R-JAN-19-06, conformed to the guidelines of the University of Maryland Institutional Animal Care and Use Committee and the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health.

Additional files

Transparent reporting form

elife-68046-transrepform.docx (247.6 KB, docx)

Data availability

A MATLAB implementation of the proposed method has been archived in Github at https://github.com/Anuththara-Rupasinghe/Signal-Noise-Correlation (copy archived at https://archive.softwareheritage.org/swh:1:rev:7397cc8d751a128f41df81f8af160014b22974d6). The data used in this work have been deposited in the Digital Repository at the University of Maryland at http://hdl.handle.net/1903/26917.

The following dataset was generated:

Rupasinghe A, Francis N, Liu J, Bowen Z, Kanold PO, Babadi B. 2021. Experimental Data from ‘Direct Extraction of Signal and Noise Correlations from Two-Photon Calcium Imaging of Ensemble Neuronal Activity’. Digital Repository at the University of Maryland. 1903/26917

References

Abbott LF, Dayan P. The effect of correlated variability on the accuracy of a population code. Neural Computation. 1999;11:91–101. doi: 10.1162/089976699300016827.
Ahrens MB, Orger MB, Robson DN, Li JM, Keller PJ. Whole-brain functional imaging at cellular resolution using light-sheet microscopy. Nature Methods. 2013;10:413–420. doi: 10.1038/nmeth.2434.
Aitchison L, Russell L, Packer AM, Yan J, Castonguay P, Hausser M, Turaga SC. Model-based bayesian inference of neural activity and connectivity from all-optical interrogation of a neural circuit. Advances in Neural Information Processing Systems; 2017. pp. 3486–3495.
Arlot S, Celisse A. A survey of cross-validation procedures for model selection. Statistics Surveys. 2010;4:40–79. doi: 10.1214/09-SS054.
Averbeck BB, Latham PE, Pouget A. Neural correlations, population coding and computation. Nature Reviews Neuroscience. 2006;7:358–366. doi: 10.1038/nrn1888.
Ba D, Babadi B, Purdon PL, Brown EN. Convergence and stability of iteratively Re-weighted least squares algorithms. IEEE Transactions on Signal Processing. 2014;62:183–195. doi: 10.1109/TSP.2013.2287685.
Bartolo R, Saunders RC, Mitz AR, Averbeck BB. Information-Limiting correlations in large neural populations. The Journal of Neuroscience. 2020;40:1668–1678. doi: 10.1523/JNEUROSCI.2072-19.2019.
Beal MJ. University of London, University College London (United Kingdom); 2003. Variational algorithms for approximate Bayesian inference, PhD thesis.
Bishop CM. Pattern Recognition and Machine Learning (Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag; 2006.
Blei DM, Kucukelbir A, McAuliffe JD. Variational inference: a review for statisticians. Journal of the American Statistical Association. 2017;112:859–877. doi: 10.1080/01621459.2017.1285773.
Boucheron S, Lugosi G, Massart P. Concentration Inequalities: A Nonasymptotic Theory of Independence. OUP Oxford Press; 2013.
Bowen Z, Winkowski DE, Seshadri S, Plenz D, Kanold PO. Neuronal avalanches in input and associative layers of auditory cortex. Frontiers in Systems Neuroscience. 2019;13:45. doi: 10.3389/fnsys.2019.00045.
Bowen Z, Winkowski DE, Kanold PO. Functional organization of mouse primary auditory cortex in adult C57BL/6 and F1 (CBAxC57) mice. Scientific Reports. 2020;10:10905. doi: 10.1038/s41598-020-67819-4.
Brown EN, Barbieri R, Ventura V, Kass RE, Frank LM. The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation. 2002;14:325–346. doi: 10.1162/08997660252741149.
Cohen MR, Kohn A. Measuring and interpreting neuronal correlations. Nature Neuroscience. 2011;14:811–819. doi: 10.1038/nn.2842.
Cohen MR, Maunsell JH. Attention improves performance primarily by reducing interneuronal correlations. Nature Neuroscience. 2009;12:1594–1600. doi: 10.1038/nn.2439.
Deneux T, Kaszas A, Szalay G, Katona G, Lakner T, Grinvald A, Rózsa B, Vanzetta I. Accurate spike estimation from noisy calcium signals for ultrafast three-dimensional imaging of large neuronal populations in vivo. Nature Communications. 2016;7:1–17. doi: 10.1038/ncomms12190.
DeWeese MR, Wehr M, Zador AM. Binary spiking in auditory cortex. The Journal of Neuroscience. 2003;23:7940–7949. doi: 10.1523/JNEUROSCI.23-21-07940.2003.
Ding J, Tarokh V, Yang Y. Model selection techniques: an overview. IEEE Signal Processing Magazine. 2018;35:16–34. doi: 10.1109/MSP.2018.2867638.
Ecker AS, Berens P, Cotton RJ, Subramaniyan M, Denfield GH, Cadwell CR, Smirnakis SM, Bethge M, Tolias AS. State dependence of noise correlations in macaque primary visual cortex. Neuron. 2014;82:235–248. doi: 10.1016/j.neuron.2014.02.006.
Eden UT, Frank LM, Barbieri R, Solo V, Brown EN. Dynamic analysis of neural encoding by point process adaptive filtering. Neural Computation. 2004;16:971–998. doi: 10.1162/089976604773135069.
Fallani FV, Corazzol M, Sternberg JR, Wyart C, Chavez M. Hierarchy of neural organization in the embryonic spinal cord: granger-causality graph analysis of in vivo calcium imaging data. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2015;23:333–341. doi: 10.1109/TNSRE.2014.2341632.
Forli A, Vecchia D, Binini N, Succol F, Bovetti S, Moretti C, Nespoli F, Mahn M, Baker CA, Bolton MM, Yizhar O, Fellin T. Two-Photon bidirectional control and imaging of neuronal excitability with high spatial resolution in Vivo. Cell Reports. 2018;22:3087–3098. doi: 10.1016/j.celrep.2018.02.063.
Francis NA, Winkowski DE, Sheikhattar A, Armengol K, Babadi B, Kanold PO. Small networks encode Decision-Making in primary auditory cortex. Neuron. 2018;97:885–897. doi: 10.1016/j.neuron.2018.01.019.
Friedrich J, Zhou P, Paninski L. Fast online deconvolution of calcium imaging data. PLOS Computational Biology. 2017;13:e1005423. doi: 10.1371/journal.pcbi.1005423.
Frisina RD, Singh A, Bak M, Bozorg S, Seth R, Zhu X. F1 (CBA×C57) mice show superior hearing in old age relative to their parental strains: hybrid vigor or a new animal model for "golden ears"? Neurobiology of Aging. 2011;32:1716–1724. doi: 10.1016/j.neurobiolaging.2009.09.009.
Gawne TJ, Richmond BJ. How independent are the messages carried by adjacent inferior temporal cortical neurons? The Journal of Neuroscience. 1993;13:2758–2771. doi: 10.1523/JNEUROSCI.13-07-02758.1993.
Grewe BF, Langer D, Kasper H, Kampa BM, Helmchen F. High-speed in vivo calcium imaging reveals neuronal network activity with near-millisecond precision. Nature Methods. 2010;7:399–405. doi: 10.1038/nmeth.1453.
Hansen BJ, Chelaru MI, Dragoi V. Correlated variability in laminar cortical circuits. Neuron. 2012;76:590–602. doi: 10.1016/j.neuron.2012.08.029.
Hastings WK. Monte carlo sampling methods using markov chains and their applications. Biometrika. 1970;57:97–109. doi: 10.1093/biomet/57.1.97.
Jewell SW, Hocking TD, Fearnhead P, Witten DM. Fast nonconvex deconvolution of calcium imaging data. Biostatistics. 2020;21:709–726. doi: 10.1093/biostatistics/kxy083.
Jewell S, Witten D. Exact spike train inference via $\ell_{0}$ optimization. The Annals of Applied Statistics. 2018;12:2457–2482. doi: 10.1214/18-AOAS1162.
Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK. An introduction to variational methods for graphical models. Machine Learning. 1999;37:183–233. doi: 10.1023/A:1007665907178.
Josić K, Shea-Brown E, Doiron B, de la Rocha J. Stimulus-dependent correlations and population codes. Neural Computation. 2009;21:2774–2804. doi: 10.1162/neco.2009.10-08-879.
Kadirvelu B, Hayashi Y, Nasuto SJ. Inferring structural connectivity using ising couplings in models of neuronal networks. Scientific Reports. 2017;7:2. doi: 10.1038/s41598-017-05462-2.
Kazemipour A, Liu J, Solarana K, Nagode DA, Kanold PO, Wu M, Babadi B. Fast and stable signal deconvolution via compressible State-Space models. IEEE Transactions on Biomedical Engineering. 2018;65:74–86. doi: 10.1109/TBME.2017.2694339.
Keeley SL, Aoi MC, Yu Y, Smith SL, Pillow JW. In: Advances in Neural Information Processing Systems. Larochelle H, Ranzato M, Hadsell R, Balcan M. F, Lin H, editors. Curran Associates, Inc; 2020. Identifying signal and noise structure in neural population activity with Gaussian process factor models; pp. 1–84.
Kerlin A, Mohar B, Flickinger D, MacLennan BJ, Dean MB, Davis C, Spruston N, Svoboda K. Functional clustering of dendritic activity during decision-making. eLife. 2019;8:e46966. doi: 10.7554/eLife.46966.
Kohn A, Coen-Cagli R, Kanitscheider I, Pouget A. Correlations and neuronal population information. Annual Review of Neuroscience. 2016;39:237–256. doi: 10.1146/annurev-neuro-070815-013851.
Kohn A, Smith MA. Stimulus dependence of neuronal correlation in primary visual cortex of the macaque. Journal of Neuroscience. 2005;25:3661–3673. doi: 10.1523/JNEUROSCI.5106-04.2005.
Kratz MB, Manis PB. Spatial organization of excitatory synaptic inputs to layer 4 neurons in mouse primary auditory cortex. Frontiers in Neural Circuits. 2015;9:17. doi: 10.3389/fncir.2015.00017.
Linden JF, Liu RC, Sahani M, Schreiner CE, Merzenich MM. Spectrotemporal structure of receptive fields in Areas AI and AAF of mouse auditory cortex. Journal of Neurophysiology. 2003;90:2660–2675. doi: 10.1152/jn.00751.2002.
Linderman S, Adams RP, Pillow JW. Bayesian latent structure discovery from multi-neuron recordings. Advances in Neural Information Processing Systems; 2016. pp. 2002–2010.
Lipkus AH. A proof of the triangle inequality for the tanimoto distance. Journal of Mathematical Chemistry. 1999;26:263–265. doi: 10.1023/A:1019154432472.
Liu J, Whiteway MR, Sheikhattar A, Butts DA, Babadi B, Kanold PO. Parallel processing of sound dynamics across mouse auditory cortex via spatially patterned thalamic inputs and distinct areal intracortical circuits. Cell Reports. 2019;27:872–885. doi: 10.1016/j.celrep.2019.03.069.
Lütcke H, Gerhard F, Zenke F, Gerstner W, Helmchen F. Inference of neuronal network spike dynamics and topology from calcium imaging data. Frontiers in Neural Circuits. 2013;7:201. doi: 10.3389/fncir.2013.00201.
Lyamzin DR, Barnes SJ, Donato R, Garcia-Lazaro JA, Keck T, Lesica NA. Nonlinear transfer of signal and noise correlations in cortical networks. Journal of Neuroscience. 2015;35:8065–8080. doi: 10.1523/JNEUROSCI.4738-14.2015.
Martin DA, Ribeiro TL, Cannas SA, Grigera TS, Plenz D, Chialvo DR. Box-scaling as a proxy of finite-size correlations. arXiv. 2020 doi: 10.1038/s41598-021-95595-2. https://arxiv.org/abs/2007.08236
Meng X, Kao JP, Lee HK, Kanold PO. Intracortical circuits in thalamorecipient layers of auditory cortex refine after visual deprivation. Eneuro. 2017a;4:ENEURO.0092-17.2017. doi: 10.1523/ENEURO.0092-17.2017.
Meng X, Winkowski DE, Kao JPY, Kanold PO. Sublaminar subdivision of mouse auditory cortex layer 2/3 based on functional translaminar connections. The Journal of Neuroscience. 2017b;37:10200–10214. doi: 10.1523/JNEUROSCI.1361-17.2017.
Mishchenko Y, Vogelstein JT, Paninski L. A bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. The Annals of Applied Statistics. 2011;5:1229–1261. doi: 10.1214/09-AOAS303.
Montijn JS, Vinck M, Pennartz CM. Population coding in mouse visual cortex: response reliability and dissociability of stimulus tuning and noise correlation. Frontiers in Computational Neuroscience. 2014;8:58. doi: 10.3389/fncom.2014.00058.
Najafi F, Elsayed GF, Cao R, Pnevmatikakis E, Latham PE, Cunningham JP, Churchland AK. Excitatory and inhibitory subnetworks are equally selective during Decision-Making and emerge simultaneously during learning. Neuron. 2020;105:165–179. doi: 10.1016/j.neuron.2019.09.045.
Pachitariu M, Stringer C, Harris KD. Robustness of spike deconvolution for neuronal calcium imaging. The Journal of Neuroscience. 2018;38:7976–7985. doi: 10.1523/JNEUROSCI.3339-17.2018.
Paninski L. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems. 2004;15:243–262. doi: 10.1088/0954-898X_15_4_002.
Petrus E, Isaiah A, Jones AP, Li D, Wang H, Lee HK, Kanold PO. Crossmodal induction of thalamocortical potentiation leads to enhanced information processing in the auditory cortex. Neuron. 2014;81:664–673. doi: 10.1016/j.neuron.2013.11.023.
Pillow JW, Scott J. In: Advances in Neural Information Processing Systems. Pereira F, Burges C. J. C, Bottou L, Weinberger K. Q, editors. Curran Associates, Inc; 2012. Fully Bayesian inference for neural models with negative-binomial spiking; pp. 1898–1906.
Pnevmatikakis E, Soudry D, Gao Y, Machado TA, Merel J, Pfau D, Reardon T, Mu Y, Lacefield C, Yang W, Ahrens M, Bruno R, Jessell TM, Peterka D, Yuste R, Paninski L. Simultaneous denoising, deconvolution, and demixing of calcium imaging data. Neuron. 2016;89:285–299. doi: 10.1016/j.neuron.2015.11.037.
Polson NG, Scott JG, Windle J. Bayesian inference for logistic models using Pólya–Gamma Latent Variables. Journal of the American Statistical Association. 2013;108:1339–1349. doi: 10.1080/01621459.2013.829001.
Ramesh RN, Burgess CR, Sugden AU, Gyetvan M, Andermann ML. Intermingled ensembles in visual association cortex encode stimulus identity or predicted outcome. Neuron. 2018;100:900–915. doi: 10.1016/j.neuron.2018.09.024.
Rauch HE, Tung F, Striebel CT. Maximum likelihood estimates of linear dynamic systems. AIAA Journal. 1965;3:1445–1450. doi: 10.2514/3.3166.
Romano SA, Pérez-Schuster V, Jouary A, Boulanger-Weill J, Candeo A, Pietri T, Sumbre G. An integrated calcium imaging processing toolbox for the analysis of neuronal population dynamics. PLOS Computational Biology. 2017;13:e1005526. doi: 10.1371/journal.pcbi.1005526.
Romero S, Hight AE, Clayton KK, Resnik J, Williamson RS, Hancock KE, Polley DB. Cellular and widefield imaging of sound frequency organization in primary and higher order fields of the mouse auditory cortex. Cerebral Cortex. 2020;30:1603–1622. doi: 10.1093/cercor/bhz190.
Rothschild G, Nelken I, Mizrahi A. Functional organization and population dynamics in the mouse primary auditory cortex. Nature Neuroscience. 2010;13:353–360. doi: 10.1038/nn.2484.
Rumyantsev OI, Lecoq JA, Hernandez O, Zhang Y, Savall J, Chrapkiewicz R, Li J, Zeng H, Ganguli S, Schnitzer MJ. Fundamental bounds on the fidelity of sensory cortical coding. Nature. 2020;580:100–105. doi: 10.1038/s41586-020-2130-2.
Rupasinghe A. Direct Extraction of Signal and Noise Correlations from Two-Photon Calcium Imaging of Ensemble Neuronal Activity MATLAB Codes. GitHub Repository. 2020 doi: 10.7554/eLife.68046. https://github.com/Anuththara-Rupasinghe/Signal-Noise-Correlation
Rupasinghe A, Francis N, Liu J, Bowen Z, Kanold PO, Babadi B. Experimental Data From ‘Direct Extraction of Signal and Noise Correlations From Two-Photon Calcium Imaging of Ensemble Neuronal Activity’. Digital Repository at the University of Maryland (DRUM); 2021.
Rupasinghe A, Babadi B. Robust inference of neuronal correlations from blurred and noisy spiking observations. 2020 54th Annual Conference on Information Sciences and Systems (CISS); 2020. pp. 1–5.
Shumway RH, Stoffer DS. An approach to time series smoothing and forecasting using the em algorithm. Journal of Time Series Analysis. 1982;3:253–264. doi: 10.1111/j.1467-9892.1982.tb00349.x.
Smith AC, Brown EN. Estimating a state-space model from point process observations. Neural Computation. 2003;15:965–991. doi: 10.1162/089976603765202622.
Smith MA, Sommer MA. Spatial and temporal scales of neuronal correlation in visual area V4. Journal of Neuroscience. 2013;33:5422–5432. doi: 10.1523/JNEUROSCI.4782-12.2013.
Sompolinsky H, Yoon H, Kang K, Shamir M. Population coding in neuronal systems with correlated noise. Physical Review E. 2001;64:051904. doi: 10.1103/PhysRevE.64.051904.
Soudry D, Keshri S, Stinson P, Oh MH, Iyengar G, Paninski L. Efficient "Shotgun" Inference of Neural Connectivity from Highly Sub-sampled Activity Data. PLOS Computational Biology. 2015;11:e1004464. doi: 10.1371/journal.pcbi.1004464.
Stosiek C, Garaschuk O, Holthoff K, Konnerth A. In vivo two-photon calcium imaging of neuronal networks. PNAS. 2003;100:7319–7324. doi: 10.1073/pnas.1232232100.
Stringer C, Pachitariu M. Computational processing of neural recordings from calcium imaging data. Current Opinion in Neurobiology. 2019;55:22–31. doi: 10.1016/j.conb.2018.11.005.
Svoboda K, Yasuda R. Principles of two-photon excitation microscopy and its applications to neuroscience. Neuron. 2006;50:823–839. doi: 10.1016/j.neuron.2006.05.019.
Theis L, Berens P, Froudarakis E, Reimer J, Román Rosón M, Baden T, Euler T, Tolias AS, Bethge M. Benchmarking spike rate inference in population calcium imaging. Neuron. 2016;90:471–482. doi: 10.1016/j.neuron.2016.04.014.
Truccolo W, Eden UT, Fellows MR, Donoghue JP, Brown EN. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology. 2005;93:1074–1089. doi: 10.1152/jn.00697.2004.
Vinci G, Ventura V, Smith MA, Kass RE. Separating spike count correlation from firing rate correlation. Neural Computation. 2016;28:849–881. doi: 10.1162/NECO_a_00831.
Vogelstein JT, Watson BO, Packer AM, Yuste R, Jedynak B, Paninski L. Spike inference from calcium imaging using sequential monte carlo methods. Biophysical Journal. 2009;97:636–655. doi: 10.1016/j.bpj.2008.08.005.
Vogelstein JT, Packer AM, Machado TA, Sippy T, Babadi B, Yuste R, Paninski L. Fast nonnegative deconvolution for spike train inference from population calcium imaging. Journal of Neurophysiology. 2010;104:3691–3704. doi: 10.1152/jn.01073.2009.
Wang C, Blei DM. Variational inference in nonconjugate models. Journal of Machine Learning Research : JMLR. 2013;14:1005–1031. doi: 10.5555/2567709.2502613.
Watkins PV, Kao JP, Kanold PO. Spatial pattern of intra-laminar connectivity in supragranular mouse auditory cortex. Frontiers in Neural Circuits. 2014;8:15. doi: 10.3389/fncir.2014.00015.
Winkowski DE, Kanold PO. Laminar transformation of frequency organization in auditory cortex. Journal of Neuroscience. 2013;33:1498–1508. doi: 10.1523/JNEUROSCI.3101-12.2013.
Wong R. Asymptotic approximations of integrals. Society for Industrial and Applied Mathematics. 2001;1:260. doi: 10.1137/1.9780898719260.
Yatsenko D, Josić K, Ecker AS, Froudarakis E, Cotton RJ, Tolias AS. Improved estimation and interpretation of correlations in neural circuits. PLOS Computational Biology. 2015;11:e1004083. doi: 10.1371/journal.pcbi.1004083.
Yu BM, Cunningham JP, Santhanam G, Ryu SI, Shenoy KV, Sahani M. Gaussian-Process factor analysis for Low-Dimensional Single-Trial analysis of neural population activity. Journal of Neurophysiology. 2009;102:614–635. doi: 10.1152/jn.90941.2008.
Yu S, Yang H, Nakahara H, Santos GS, Nikolić D, Plenz D. Higher-order interactions characterized in cortical activity. Journal of Neuroscience. 2011;31:17514–17526. doi: 10.1523/JNEUROSCI.3127-11.2011.

eLife. doi: 10.7554/eLife.68046.sa1

Decision letter

Editor: Brice Bathellier¹

Reviewed by: Brice Bathellier²

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Acceptance summary:

This study establishes a new method for more precise estimation of pairwise noise and signal correlations in two-photon calcium imaging, by modeling generically the influence of calcium dynamics and subtracting the interaction between response signals and variability when the trial number is low. The accuracy of this new estimator is demonstrated here for the mouse auditory cortex, but this tool will find useful applications on a large diversity of datasets.

Decision letter after peer review:

Thank you for submitting your article "Direct Extraction of Signal and Noise Correlations from Two-Photon Calcium Imaging of Ensemble Neuronal Activity" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, including Brice Bathellier as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Barbara Shinn-Cunningham as the Senior Editor. The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

Both reviewers agree on the quality of the method proposed in this study and its potential for assessing correlations in two-photon calcium imaging data. However, two points should be carefully addressed according to the comments below. First, the authors should better explain which aspect of their method is key to outperforming other methods. Second, the authors should better explore the behavior of the method on data that do not fulfil its core assumptions (uncorrelated noise, LN response model).

1. Surrogate data use a measurement noise model that has no temporal correlations. This is not the case in real two-photon imaging data, which include both correlation-free noise (photon count statistics) and temporally (and spatially) correlated artefacts. As in Deneux et al. 2016 (cited in the paper), the authors should provide simulations with different types of noise and show how this affects their correlation estimates. Is it still more robust than more classical methods?

2. Another simplifying assumption used by the authors is their model of spiking activity, which is composed of a linear receptive field followed by a non-linear mapping function. Several papers have shown that this only imperfectly models neural responses (e.g. to sound in auditory cortex). In real neural data there are, in fact, non-linearities that are more complex than what the mapping function can capture. How does this impact their estimate? The authors should simulate this with e.g. a multilayer network (two-layer linear non-linear cascade, deep net) or simulate neurons that respond to the quadratic sum of the output of several linear filters (see e.g. recent work by Shamma and colleagues).

3. There is a lack of intuition about the key aspects of their approach that make it outperform other methods. This should be introduced in the results and/or discussion to better guide and convince the reader.

It is crucial that any end-user be able to get a clear picture of the conditions under which the method can or cannot be applied before diving in. The fact that such an applicability domain is not well defined is a major concern. Notably, each Real Data Study presented in the paper uses a preliminary selection of "highly active cells" (1st study: N = 16; 2nd study: N = 10; 3rd study: N~20 per field), as the authors succinctly discuss that performance is expected to degrade "in the regime of extremely low spiking rate and high observation noise" (l. 518-519). But no precise criteria are provided to specify what is meant by "highly active cells". On the other hand, the authors also assume that there is at most one spiking event per time frame for each neuron, which seems to exclude bursting neurons. The latter assumption seems to be a challenge with respect to the example traces shown in Figure 4C (∆F/F reaches 400%) and in Figure 6C (∆F/F reaches 100%), considering that the GCaMP6s signal for a single spike is expected to peak below 10-20%. This forces the authors to take a scaling factor of the observations A = 1 x I (Real Data Study 1 and 3) or A = 0.75 x I (Real Data Study 2) compared to the A = 0.1 x I taken in the Simulation Studies. Therefore, it looks as if the Real Data Studies were performed on mainly bursting cells and each burst was counted as one spiking event. A detailed discussion of the usable range of firing rates, whether in spike or burst units, as well as the usable range of SNR should be added to the main text to allow future users to assess the suitability of their data for this analysis.

4. Another parameter seems to be set by the authors on a criterion that is unclear to me: the number of time lags R to be included in the sound stimulus vector $𝐬_{t}$. It seems to act as a memory of the past trajectory of the stimulus and probably serves to enhance the effect of stimulus onset/offset relative to the rest of the sound presentation. It is consistent with the known tendency of neurons in the primary auditory cortex to respond to these abrupt changes in sound power. However, R is set at 2 in Simulation Study 1, whereas it is set at 25 in Real Data Studies 1 and 3, and at 40 in Real Data Study 2. What leads to these differences escapes me and should be explained more clearly.

5. This memory of the past stimulus trajectory appears to be specific to the proposed method and is not accounted for in the 2-stage Pearson estimation, for example. Since it probably helps to reflect the common sensitivity of neurons to onset/offset, it alone provides an advantage to the proposed method over the 2-stage Pearson estimation. It would be instructive to also perform this comparison with R set to 1 to get an idea of the magnitude of this advantage.

Reviewer #1 (Recommendations for the authors):

eLife. 2021 Jun 28;10:e68046. doi: 10.7554/eLife.68046.sa2

Author response

Essential revisions:

Both reviewers agree on the quality of the method proposed in this study and its potential for assessing correlations in two-photon calcium imaging data. However, two points should be carefully addressed according to the comments below. First, the authors should better explain which aspect of their method is key to outperforming other methods. Second, the authors should better explore the behavior of the method on data that do not fulfil its core assumptions (uncorrelated noise, LN response model).

We would like to thank the reviewing editor for his supportive stance towards our work, and for clearly summarizing the feedback from both reviewers. Based on the comments and suggestions of the two reviewers, we have made several changes to our manuscript that substantially and constructively address both of these key points, as we describe next.

1. Surrogate data use a measurement noise model that has no temporal correlations. This is not the case in real two-photon imaging data, which include both correlation-free noise (photon count statistics) and temporally (and spatially) correlated artefacts. As in Deneux et al. 2016 (cited in the paper), the authors should provide simulations with different types of noise and show how this affects their correlation estimates. Is it still more robust than more classical methods?

To address this comment, we performed extensive simulations to evaluate the robustness of different algorithms under model mismatch conditions induced by temporal correlations of observation noise. These new analyses are included in a new subsection called “Analysis of Robustness with respect to Modeling Assumptions” (Pages 6-7).

2. Another simplifying assumption used by the authors is their model of spiking activity, which is composed of a linear receptive field followed by a non-linear mapping function. Several papers have shown that this only imperfectly models neural responses (e.g. to sound in auditory cortex). There are, in fact, non-linearities in real neural data that are more complex than what the mapping function can capture. How does this impact their estimate? The authors should simulate this with e.g. a multilayer network (two-layer linear non-linear cascade, deep net) or simulate neurons that respond to the quadratic sum of the output of several linear filters (see e.g. recent work by Shamma and colleagues).

To address this comment, we performed a new simulation study to evaluate the robustness of different algorithms under model mismatch conditions induced by a non-linear stimulus integration model. This result is included in a new subsection called “Analysis of Robustness with respect to Modeling Assumptions” (Pages 6-7).

3. There is a lack of intuition about the key aspects of their approach that make it outperform other methods. This should be introduced in the results and/or discussion to better guide and convince the reader.

It is crucial that any end-user be able to get a clear picture of the conditions under which the method can or cannot be applied before diving in. The fact that such an applicability domain is not well defined is a major concern. Notably, each Real Data Study presented in the paper uses a preliminary selection of "highly active cells" (1st study: N = 16; 2nd study: N = 10; 3rd study: N ~ 20 per field), as the authors succinctly discuss that performance is expected to degrade "in the regime of extremely low spiking rate and high observation noise" (l. 518-519). But no precise criteria are provided to specify what is meant by "highly active cells". On the other hand, the authors also assume that there is at most one spiking event per time frame for each neuron, which seems to exclude bursting neurons. The latter assumption seems to be at odds with the example traces shown in Figure 4C (∆F/F reaches 400%) and in Figure 6C (∆F/F reaches 100%), considering that the GCaMP6s signal for a single spike is expected to peak below 10-20%. This forces the authors to take a scaling factor of the observations A = 1 x I (Real Data Studies 1 and 3) or A = 0.75 x I (Real Data Study 2), compared to the A = 0.1 x I taken in the Simulation Studies. Therefore, it looks as if the Real Data Studies were performed on mainly bursting cells and each burst was counted as one spiking event. A detailed discussion of the usable range of firing rates, whether in spike or burst units, as well as the usable range of SNR, should be added to the main text to allow future users to assess the suitability of their data for this analysis.

To address this comment, we have now included the sources of performance gap between our proposed method and existing ones in the revised Discussion section, highlighting the key aspects of our method that make it outperform existing approaches (Pages 17-18). We have also added a new subsection to Methods called “Guidelines for model parameter settings” that includes our rationale and criteria for choosing the number of neurons ( $N$ ), stimulus integration window length ( $R$ ), observation noise covariance ( $Σ_{w}$ ), scaling matrix $A$ , state transition parameter ( $α$ ), and mean of the latent noise process ( $μ_{x}$ ) (Page 24). Finally, we have performed new simulation studies to evaluate the effects of SNR and firing rate on the performance of the proposed method (Pages 6-7), and closely inspected the performance of our method under rapid increase of firing rate (Page 10).

4. Another parameter seems to be set by the authors on a criterion that is unclear to me: the number of time lags R to be included in the sound stimulus vector $s_{t}$. It seems to act as a memory of the past trajectory of the stimulus and probably serves to enhance the effect of stimulus onset/offset relative to the rest of the sound presentation. It is consistent with the known tendency of neurons in the primary auditory cortex to respond to these abrupt changes in sound power. However, this R is set at 2 in Simulation Study 1, whereas it is set at 25 in Real Data Studies 1 and 3, and at 40 in Real Data Study 2. What leads to these differences escapes me and should be explained more clearly.

To address this comment, we have added a new subsection to Methods called “Guidelines for model parameter settings” that includes our rationale for choosing the stimulus integration window length R (Page 24) and have performed a new analysis to evaluate the effect of R on the performance of the proposed method in real data study 1 (Page 10).

5. This memory of the past stimulus trajectory appears to be specific to the proposed method and is not accounted for in the 2-stage Pearson estimation, for example. Since it probably helps to reflect the common sensitivity of neurons to onset/offset, it alone provides an advantage to the proposed method over the 2-stage Pearson estimation. It would be instructive to also perform this comparison with R set to 1 to get an idea of the magnitude of this advantage.

To address this comment, we have now discussed the advantage of including the stimulus history in our model and probed the sensitivity of our estimates to the choice of R (including R = 1) in Figure 4—figure supplement 1 (Page 10).
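As a rough illustration of what the window length R controls, the following sketch builds the lagged stimulus vector $s_{t}$ (current sample plus the R-1 preceding ones) from a scalar stimulus trace. The function name `lagged_stimulus`, the zero padding before the start of the recording, and the toy step stimulus are assumptions made here for illustration; they are not taken from the authors' code.

```python
import numpy as np

def lagged_stimulus(s, R):
    """Stack the current and R-1 past stimulus samples into one row per frame.

    Row t is [s[t], s[t-1], ..., s[t-R+1]]; samples before the first frame
    are taken to be zero (an assumption of this sketch).
    """
    s = np.asarray(s, dtype=float)
    T = s.shape[0]
    padded = np.concatenate([np.zeros(R - 1), s])
    # Column r holds the stimulus delayed by r frames.
    return np.stack([padded[R - 1 - r : R - 1 - r + T] for r in range(R)], axis=1)

# Toy step stimulus: off for three frames, then on for three frames.
s = np.array([0, 0, 0, 1, 1, 1])
S1 = lagged_stimulus(s, R=1)  # no stimulus history
S3 = lagged_stimulus(s, R=3)  # two frames of history
```

With R = 1 every "on" frame yields the same stimulus vector, whereas with R = 3 the onset frame differs from the sustained frames, which is one way to see why a longer window can capture onset/offset sensitivity.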

Reviewer #1 (Recommendations for the authors):

1. Surrogate data use a measurement noise model that has no temporal correlations. This is not the case in real two-photon imaging data, which include both correlation-free noise (photon count statistics) and temporally (and spatially) correlated artefacts. As in Deneux et al. 2016 (cited in the paper), the authors should provide simulations with different types of noise and show how this affects their correlation estimates. Is it still more robust than more classical methods?

Thank you for this suggestion. As explained earlier in response to the public portion of Reviewer 1’s comments, motivated by this suggestion, we have substantially enhanced our performance analyses in the revised manuscript and compiled them in a new subsection titled “Analysis of Robustness with respect to Modeling Assumptions” for better clarity and consistency.

Specifically, we considered two observation noise model mismatch conditions, namely, white noise + low frequency drift and pink noise, similar to the treatment in Deneux et al. (2016). For each noise mismatch model, we also varied the SNR level and firing rate and compared the performance of the different algorithms as reported in Figure 2—figure supplement 6. These new analyses demonstrate that our proposed estimates outperform the existing methods, under correlated generative noise models, and also with respect to varying levels of SNR and firing rate. As clearly evident in panels C and F of Figure 2—figure supplement 6, even though the estimated calcium concentrations are contaminated by the temporally correlated fluctuations in observation noise, the putative spikes estimated as a byproduct of our iterative method closely match the ground truth spikes, which in turn results in accurate estimates of signal and noise correlations.
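The two noise mismatch conditions named above can be sketched as follows. This is only a minimal illustration: the sinusoidal form of the drift, the spectral-shaping construction of pink noise, and every numerical value (frame rate, amplitudes, drift frequency) are assumptions chosen for the example, not the settings used in the manuscript or in Deneux et al. (2016).

```python
import numpy as np

def white_plus_drift(T, fs, sigma, drift_amp, drift_freq, rng):
    """White Gaussian noise plus a slow sinusoidal drift with random phase."""
    t = np.arange(T) / fs
    phase = rng.uniform(0.0, 2.0 * np.pi)
    drift = drift_amp * np.sin(2.0 * np.pi * drift_freq * t + phase)
    return sigma * rng.standard_normal(T) + drift

def pink_noise(T, sigma, rng):
    """Approximate 1/f noise: scale the spectrum of white noise by 1/sqrt(f)."""
    spectrum = np.fft.rfft(rng.standard_normal(T))
    freqs = np.fft.rfftfreq(T)
    scale = np.zeros_like(freqs)          # DC component is dropped
    scale[1:] = 1.0 / np.sqrt(freqs[1:])
    x = np.fft.irfft(spectrum * scale, n=T)
    return sigma * x / x.std()            # rescale to standard deviation sigma

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation, a quick check for temporal correlation."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(0)
T, fs = 2000, 30.0                        # frames and frame rate (assumed)
w_white = 0.1 * rng.standard_normal(T)
w_drift = white_plus_drift(T, fs, sigma=0.1, drift_amp=0.2, drift_freq=0.05, rng=rng)
w_pink = pink_noise(T, sigma=0.1, rng=rng)
```

Comparing `lag1_autocorr` across the three traces gives a quick sense of how strongly each condition violates a temporally uncorrelated observation noise model.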

2. Another simplifying assumption used by the authors is their model of spiking activity, which is composed of a linear receptive field followed by a non-linear mapping function. Several papers have shown that this only imperfectly models neural responses (e.g. to sound in auditory cortex). There are, in fact, non-linearities in real neural data that are more complex than what the mapping function can capture. How does this impact their estimate? The authors should simulate this with e.g. a multilayer network (two-layer linear non-linear cascade, deep net) or simulate neurons that respond to the quadratic sum of the output of several linear filters (see e.g. recent work by Shamma and colleagues).

Thank you for this suggestion. As explained earlier in response to the public portion of Reviewer 1’s comments, we have addressed this comment in the revised manuscript in the new subsection titled “Analysis of Robustness with respect to Modeling Assumptions”. To examine the robustness of our method with respect to model mismatch in stimulus integration, as suggested, we generated data according to a non-linear (i.e., quadratic sum of linear filters) receptive field model:

$n_{t, l}^{(j)} \sim \mathrm{Bernoulli}\left(\mathrm{logistic}\left(x_{t, l}^{(j)} + d_{j}^{\top} s_{t} + {({\tilde{d}}_{j, 1}^{\top} s_{t})}^{2} + {({\tilde{d}}_{j, 2}^{\top} s_{t})}^{2}\right)\right)$

but assumed a linear stimulus integration model in our inference procedure (i.e., ${\tilde{d}}_{j, 1} = {\tilde{d}}_{j, 2} = 0$ ). The comparison of the correlations estimated under this setting by each method is shown in Figure 2—figure supplement 3. While the performance of our proposed signal correlation estimates under this setting degrades compared to that in Figure 2 with no model mismatch, our proposed estimates still outperform the other methods and recover the ground truth signal correlation structure reasonably well.
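To make the mismatch concrete, the quadratic receptive field model in this response can be simulated for a single neuron and trial along the following lines. This is a sketch under stated assumptions: the helper `spike_probability`, the filter values, the Gaussian placeholder stimulus, and the placeholder latent process are all hypothetical and chosen only so that the example runs.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def spike_probability(x, S, d, d_tilde_1, d_tilde_2):
    """Per-frame spiking probability under the quadratic receptive field model.

    x: latent noise covariate per frame, shape (T,)
    S: lagged stimulus vectors, one row per frame, shape (T, R)
    d, d_tilde_1, d_tilde_2: linear and quadratic-branch filters, shape (R,)
    """
    linear = S @ d
    quadratic = (S @ d_tilde_1) ** 2 + (S @ d_tilde_2) ** 2
    return logistic(x + linear + quadratic)

rng = np.random.default_rng(1)
T, R = 500, 2
S = rng.standard_normal((T, R))            # placeholder stimulus (assumed)
x = -3.0 + 0.5 * rng.standard_normal(T)    # placeholder latent process (assumed)
d = np.array([1.0, -0.5])
d1 = np.array([0.8, 0.0])
d2 = np.array([0.0, 0.6])

# Generative model with quadratic terms versus the linear model assumed at inference.
p_quadratic = spike_probability(x, S, d, d1, d2)
p_linear = spike_probability(x, S, d, np.zeros(R), np.zeros(R))
spikes = rng.binomial(1, p_quadratic)      # binary: at most one spike per frame
```

Because the quadratic terms are non-negative, the generating probabilities are never below those of the linear model with the same $d_{j}$, so an inference procedure that sets the quadratic filters to zero has to absorb the extra drive elsewhere.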

It is noteworthy that the model mismatch in the stimulus integration component does not affect the accuracy of noise correlation estimates in our method, as is evident from the noise correlation estimates in Figure 2—figure supplement 3. In comparison, the biases induced in the other methods by model mismatch and various other factors, such as observation noise, temporal blurring, and disregarding the non-linear mappings between spikes and underlying covariates, result in significantly larger errors in both signal and noise correlation estimates.

Finally, we would like to note that since our model is a baseline framework for signal and noise correlation estimation from two-photon imaging data, it has the inherent limitations of the underlying modeling assumptions such as linear receptive fields and binary spiking. However, it is possible to generalize our approach beyond linear stimulus integration by extending to non-linear models such as those parameterized by neural networks, as we have also outlined towards the end of the Discussion.

3. There is a lack of intuition about the key aspects of their approach that make it outperform other methods. This should be introduced in the results and/or discussion to better guide and convince the reader.

Thank you for pointing out this source of ambiguity. The two main sources for the observed performance gap between our proposed method and existing approaches can be summarized as follows:

1) Favorable soft decisions on the timing of spikes achieved by our method, as a byproduct of the iterative variational inference procedure: an accurate probabilistic decoding of spikes results in better estimates of the signal/noise correlations, and conversely having more accurate estimates of the signal/noise covariances improves the probabilistic characterization of spiking events. This is in contrast with both the Pearson and Two-Stage methods: in the Pearson method, spike timing is heavily blurred by the calcium decay; in the two-stage methods, erroneous hard (i.e., binary) decisions on the timing of spiking events result in biases that propagate to and contaminate the downstream signal and noise correlation estimation and thus result in significant errors.

2) Explicit modeling of the non-linear mapping from stimulus and latent noise covariates to spiking through a canonical point process model (which is in turn tied to a two-photon observation model in a multi-tier Bayesian fashion) results in robust performance under a limited number of trials and limited observation duration. As we have shown in Appendix 1, as the number of trials L and trial duration T tend to infinity, conventional notions of signal and noise correlation indeed recover the ground truth signal and noise correlations, as the biases induced by non-linearities average out across trial repetitions. However, as shown in Figure 2—figure supplement 2, in order to achieve comparable performance to our method using 20 trials, the conventional correlation estimates require ~1000 trials.
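For reference, the conventional notions of signal and noise correlation mentioned in point 2 are commonly computed from trial-averaged responses and from the residuals around that average, respectively. The sketch below follows that common convention; the function name `conventional_correlations`, the array layout, and the toy linear-Gaussian data are assumptions for illustration and are neither the authors' implementation nor their spiking model.

```python
import numpy as np

def conventional_correlations(Y):
    """Y has shape (L, T, N): trials x time frames x neurons."""
    L, T, N = Y.shape
    trial_mean = Y.mean(axis=0)                      # (T, N): stimulus-locked part
    signal_corr = np.corrcoef(trial_mean, rowvar=False)
    residuals = (Y - trial_mean).reshape(L * T, N)   # trial-to-trial variability
    noise_corr = np.corrcoef(residuals, rowvar=False)
    return signal_corr, noise_corr

# Toy data: two neurons with opposite stimulus tuning and a shared fluctuation.
rng = np.random.default_rng(2)
L, T = 20, 200
stim = np.sin(2.0 * np.pi * np.arange(T) / 50.0)     # time-locked drive
shared = rng.standard_normal((L, T))                 # common trial-to-trial input
private = rng.standard_normal((L, T, 2))
Y = np.stack([stim + shared, -stim + shared], axis=-1) + 0.5 * private
signal_corr, noise_corr = conventional_correlations(Y)
```

In this toy setting the stimulus-locked components are perfectly anti-correlated, yet with only 20 trials the trial average still contains leftover shared fluctuation, so the estimated signal correlation is pulled away from -1. This is a simple linear analogue of the finite-trial bias discussed above.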

To address this comment, we have now included the aforementioned sources of performance gap in the revised Discussion section, highlighting the key aspects of our method that make it outperform existing approaches (Pages 17-18).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

Rupasinghe A, Francis N, Liu J, Bowen Z, Kanold PO, Babadi B. 2021. Experimental Data from 'Direct Extraction of Signal and Noise Correlations from Two-Photon Calcium Imaging of Ensemble Neuronal Activity'. Digital Repository at the University of Maryland. 1903/26917

Supplementary Materials

Transparent reporting form

elife-68046-transrepform.docx (247.6 KB, docx)

Data Availability Statement

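The following dataset was generated:

Rupasinghe A, Francis N, Liu J, Bowen Z, Kanold PO, Babadi B. 2021. Experimental Data from 'Direct Extraction of Signal and Noise Correlations from Two-Photon Calcium Imaging of Ensemble Neuronal Activity'. Digital Repository at the University of Maryland. 1903/26917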

[bib1] Abbott LF, Dayan P. The effect of correlated variability on the accuracy of a population code. Neural Computation. 1999;11:91–101. doi: 10.1162/089976699300016827. [DOI] [PubMed] [Google Scholar]

[bib2] Ahrens MB, Orger MB, Robson DN, Li JM, Keller PJ. Whole-brain functional imaging at cellular resolution using light-sheet microscopy. Nature Methods. 2013;10:413–420. doi: 10.1038/nmeth.2434. [DOI] [PubMed] [Google Scholar]

[bib3] Aitchison L, Russell L, Packer AM, Yan J, Castonguay P, Hausser M, Turaga SC. Model-based bayesian inference of neural activity and connectivity from all-optical interrogation of a neural circuit. Advances in Neural Information Processing Systems; 2017. pp. 3486–3495. [Google Scholar]

[bib4] Arlot S, Celisse A. A survey of cross-validation procedures for model selection. Statistics Surveys. 2010;4:40–79. doi: 10.1214/09-SS054. [DOI] [Google Scholar]

[bib5] Averbeck BB, Latham PE, Pouget A. Neural correlations, population coding and computation. Nature Reviews Neuroscience. 2006;7:358–366. doi: 10.1038/nrn1888. [DOI] [PubMed] [Google Scholar]

[bib6] Ba D, Babadi B, Purdon PL, Brown EN. Convergence and stability of iteratively Re-weighted least squares algorithms. IEEE Transactions on Signal Processing. 2014;62:183–195. doi: 10.1109/TSP.2013.2287685. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] Bartolo R, Saunders RC, Mitz AR, Averbeck BB. Information-Limiting correlations in large neural populations. The Journal of Neuroscience. 2020;40:1668–1678. doi: 10.1523/JNEUROSCI.2072-19.2019. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] Beal MJ. University of London, University College London (United Kingdom); 2003. Variational algorithms for approximate Bayesian inference, PhD thesis. [Google Scholar]

[bib9] Bishop CM. Pattern Recognition and Machine Learning (Information Science and Statistics. Berlin, Heidelberg: Springer-Verlag; 2006. [Google Scholar]

[bib10] Blei DM, Kucukelbir A, McAuliffe JD. Variational inference: a review for statisticians. Journal of the American Statistical Association. 2017;112:859–877. doi: 10.1080/01621459.2017.1285773. [DOI] [Google Scholar]

[bib11] Boucheron S, Lugosi G, Massart P. Concentration Inequalities: A Nonasymptotic Theory of Independence. OUP Oxford Press; 2013. [Google Scholar]

[bib12] Bowen Z, Winkowski DE, Seshadri S, Plenz D, Kanold PO. Neuronal avalanches in input and associative layers of auditory cortex. Frontiers in Systems Neuroscience. 2019;13:45. doi: 10.3389/fnsys.2019.00045. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib13] Bowen Z, Winkowski DE, Kanold PO. Functional organization of mouse primary auditory cortex in adult C57BL/6 and F1 (CBAxC57) mice. Scientific Reports. 2020;10:10905. doi: 10.1038/s41598-020-67819-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] Brown EN, Barbieri R, Ventura V, Kass RE, Frank LM. The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation. 2002;14:325–346. doi: 10.1162/08997660252741149. [DOI] [PubMed] [Google Scholar]

[bib15] Cohen MR, Kohn A. Measuring and interpreting neuronal correlations. Nature Neuroscience. 2011;14:811–819. doi: 10.1038/nn.2842. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] Cohen MR, Maunsell JH. Attention improves performance primarily by reducing interneuronal correlations. Nature Neuroscience. 2009;12:1594–1600. doi: 10.1038/nn.2439. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib17] Deneux T, Kaszas A, Szalay G, Katona G, Lakner T, Grinvald A, Rózsa B, Vanzetta I. Accurate spike estimation from noisy calcium signals for ultrafast three-dimensional imaging of large neuronal populations in vivo. Nature Communications. 2016;7:1–17. doi: 10.1038/ncomms12190. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] DeWeese MR, Wehr M, Zador AM. Binary spiking in auditory cortex. The Journal of Neuroscience. 2003;23:7940–7949. doi: 10.1523/JNEUROSCI.23-21-07940.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib19] Ding J, Tarokh V, Yang Y. Model selection techniques: an overview. IEEE Signal Processing Magazine. 2018;35:16–34. doi: 10.1109/MSP.2018.2867638. [DOI] [Google Scholar]

[bib20] Ecker AS, Berens P, Cotton RJ, Subramaniyan M, Denfield GH, Cadwell CR, Smirnakis SM, Bethge M, Tolias AS. State dependence of noise correlations in macaque primary visual cortex. Neuron. 2014;82:235–248. doi: 10.1016/j.neuron.2014.02.006. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] Eden UT, Frank LM, Barbieri R, Solo V, Brown EN. Dynamic analysis of neural encoding by point process adaptive filtering. Neural Computation. 2004;16:971–998. doi: 10.1162/089976604773135069. [DOI] [PubMed] [Google Scholar]

[bib22] Fallani FV, Corazzol M, Sternberg JR, Wyart C, Chavez M. Hierarchy of neural organization in the embryonic spinal cord: granger-causality graph analysis of in vivo calcium imaging data. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2015;23:333–341. doi: 10.1109/TNSRE.2014.2341632. [DOI] [PubMed] [Google Scholar]

[bib23] Forli A, Vecchia D, Binini N, Succol F, Bovetti S, Moretti C, Nespoli F, Mahn M, Baker CA, Bolton MM, Yizhar O, Fellin T. Two-Photon bidirectional control and imaging of neuronal excitability with high spatial resolution in Vivo. Cell Reports. 2018;22:3087–3098. doi: 10.1016/j.celrep.2018.02.063. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] Francis NA, Winkowski DE, Sheikhattar A, Armengol K, Babadi B, Kanold PO. Small networks encode Decision-Making in primary auditory cortex. Neuron. 2018;97:885–897. doi: 10.1016/j.neuron.2018.01.019. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] Friedrich J, Zhou P, Paninski L. Fast online deconvolution of calcium imaging data. PLOS Computational Biology. 2017;13:e1005423. doi: 10.1371/journal.pcbi.1005423. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] Frisina RD, Singh A, Bak M, Bozorg S, Seth R, Zhu X. F1 (CBA×C57) mice show superior hearing in old age relative to their parental strains: hybrid vigor or a new animal model for "golden ears"? Neurobiology of Aging. 2011;32:1716–1724. doi: 10.1016/j.neurobiolaging.2009.09.009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib27] Gawne TJ, Richmond BJ. How independent are the messages carried by adjacent inferior temporal cortical neurons? The Journal of Neuroscience. 1993;13:2758–2771. doi: 10.1523/JNEUROSCI.13-07-02758.1993. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] Grewe BF, Langer D, Kasper H, Kampa BM, Helmchen F. High-speed in vivo calcium imaging reveals neuronal network activity with near-millisecond precision. Nature Methods. 2010;7:399–405. doi: 10.1038/nmeth.1453. [DOI] [PubMed] [Google Scholar]

[bib29] Hansen BJ, Chelaru MI, Dragoi V. Correlated variability in laminar cortical circuits. Neuron. 2012;76:590–602. doi: 10.1016/j.neuron.2012.08.029. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] Hastings WK. Monte carlo sampling methods using markov chains and their applications. Biometrika. 1970;57:97–109. doi: 10.1093/biomet/57.1.97. [DOI] [Google Scholar]

[bib31] Jewell SW, Hocking TD, Fearnhead P, Witten DM. Fast nonconvex deconvolution of calcium imaging data. Biostatistics. 2020;21:709–726. doi: 10.1093/biostatistics/kxy083. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] Jewell S, Witten D. Exact spike train inference via $\ell_{0}$ optimization. The Annals of Applied Statistics. 2018;12:2457–2482. doi: 10.1214/18-AOAS1162. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK. An introduction to variational methods for graphical models. Machine Learning. 1999;37:183–233. doi: 10.1023/A:1007665907178. [DOI] [Google Scholar]

[bib34] Josić K, Shea-Brown E, Doiron B, de la Rocha J. Stimulus-dependent correlations and population codes. Neural Computation. 2009;21:2774–2804. doi: 10.1162/neco.2009.10-08-879. [DOI] [PubMed] [Google Scholar]

[bib35] Kadirvelu B, Hayashi Y, Nasuto SJ. Inferring structural connectivity using ising couplings in models of neuronal networks. Scientific Reports. 2017;7:2. doi: 10.1038/s41598-017-05462-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib36] Kazemipour A, Liu J, Solarana K, Nagode DA, Kanold PO, Wu M, Babadi B. Fast and stable signal deconvolution via compressible State-Space models. IEEE Transactions on Biomedical Engineering. 2018;65:74–86. doi: 10.1109/TBME.2017.2694339. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib37] Keeley SL, Aoi MC, Yu Y, Smith SL, Pillow JW. In: Advances in Neural Information Processing Systems. Larochelle H, Ranzato M, Hadsell R, Balcan M. F, Lin H, editors. Curran Associates, Inc; 2020. Identifying signal and noise structure in neural population activity with Gaussian process factor models; pp. 1–84. [DOI] [Google Scholar]

[bib38] Kerlin A, Mohar B, Flickinger D, MacLennan BJ, Dean MB, Davis C, Spruston N, Svoboda K. Functional clustering of dendritic activity during decision-making. eLife. 2019;8:e46966. doi: 10.7554/eLife.46966. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib39] Kohn A, Coen-Cagli R, Kanitscheider I, Pouget A. Correlations and neuronal population information. Annual Review of Neuroscience. 2016;39:237–256. doi: 10.1146/annurev-neuro-070815-013851. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib40] Kohn A, Smith MA. Stimulus dependence of neuronal correlation in primary visual cortex of the macaque. Journal of Neuroscience. 2005;25:3661–3673. doi: 10.1523/JNEUROSCI.5106-04.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib41] Kratz MB, Manis PB. Spatial organization of excitatory synaptic inputs to layer 4 neurons in mouse primary auditory cortex. Frontiers in Neural Circuits. 2015;9:17. doi: 10.3389/fncir.2015.00017. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib42] Linden JF, Liu RC, Sahani M, Schreiner CE, Merzenich MM. Spectrotemporal structure of receptive fields in Areas AI and AAF of mouse auditory cortex. Journal of Neurophysiology. 2003;90:2660–2675. doi: 10.1152/jn.00751.2002. [DOI] [PubMed] [Google Scholar]

[bib43] Linderman S, Adams RP, Pillow JW. Bayesian latent structure discovery from multi-neuron recordings. Advances in Neural Information Processing Systems; 2016. pp. 2002–2010. [Google Scholar]

[bib44] Lipkus AH. A proof of the triangle inequality for the tanimoto distance. Journal of Mathematical Chemistry. 1999;26:263–265. doi: 10.1023/A:1019154432472. [DOI] [Google Scholar]

[bib45] Liu J, Whiteway MR, Sheikhattar A, Butts DA, Babadi B, Kanold PO. Parallel processing of sound dynamics across mouse auditory cortex via spatially patterned thalamic inputs and distinct areal intracortical circuits. Cell Reports. 2019;27:872–885. doi: 10.1016/j.celrep.2019.03.069. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib46] Lütcke H, Gerhard F, Zenke F, Gerstner W, Helmchen F. Inference of neuronal network spike dynamics and topology from calcium imaging data. Frontiers in Neural Circuits. 2013;7:201. doi: 10.3389/fncir.2013.00201. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib47] Lyamzin DR, Barnes SJ, Donato R, Garcia-Lazaro JA, Keck T, Lesica NA. Nonlinear transfer of signal and noise correlations in cortical networks. Journal of Neuroscience. 2015;35:8065–8080. doi: 10.1523/JNEUROSCI.4738-14.2015. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib48] Martin DA, Ribeiro TL, Cannas SA, Grigera TS, Plenz D, Chialvo DR. Box-scaling as a proxy of finite-size correlations. arXiv. 2020 doi: 10.1038/s41598-021-95595-2. https://arxiv.org/abs/2007.08236 [DOI] [PMC free article] [PubMed]

[bib49] Meng X, Kao JP, Lee HK, Kanold PO. Intracortical circuits in thalamorecipient layers of auditory cortex refine after visual deprivation. Eneuro. 2017a;4:ENEURO.0092-17.2017. doi: 10.1523/ENEURO.0092-17.2017. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib50] Meng X, Winkowski DE, Kao JPY, Kanold PO. Sublaminar subdivision of mouse auditory cortex layer 2/3 based on functional translaminar connections. The Journal of Neuroscience. 2017b;37:10200–10214. doi: 10.1523/JNEUROSCI.1361-17.2017. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib51] Mishchenko Y, Vogelstein JT, Paninski L. A bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. The Annals of Applied Statistics. 2011;5:1229–1261. doi: 10.1214/09-AOAS303. [DOI] [Google Scholar]

[bib52] Montijn JS, Vinck M, Pennartz CM. Population coding in mouse visual cortex: response reliability and dissociability of stimulus tuning and noise correlation. Frontiers in Computational Neuroscience. 2014;8:58. doi: 10.3389/fncom.2014.00058. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib53] Najafi F, Elsayed GF, Cao R, Pnevmatikakis E, Latham PE, Cunningham JP, Churchland AK. Excitatory and inhibitory subnetworks are equally selective during Decision-Making and emerge simultaneously during learning. Neuron. 2020;105:165–179. doi: 10.1016/j.neuron.2019.09.045. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib54] Pachitariu M, Stringer C, Harris KD. Robustness of spike deconvolution for neuronal calcium imaging. The Journal of Neuroscience. 2018;38:7976–7985. doi: 10.1523/JNEUROSCI.3339-17.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib55] Paninski L. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems. 2004;15:243–262. doi: 10.1088/0954-898X_15_4_002. [DOI] [PubMed] [Google Scholar]

[bib56] Petrus E, Isaiah A, Jones AP, Li D, Wang H, Lee HK, Kanold PO. Crossmodal induction of thalamocortical potentiation leads to enhanced information processing in the auditory cortex. Neuron. 2014;81:664–673. doi: 10.1016/j.neuron.2013.11.023. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib57] Pillow JW, Scott J. In: Advances in Neural Information Processing Systems. Pereira F, Burges C. J. C, Bottou L, Weinberger K. Q, editors. Curran Associates, Inc; 2012. Fully Bayesian inference for neural models with negative-binomial spiking; pp. 1898–1906. [Google Scholar]

[bib58] Pnevmatikakis E, Soudry D, Gao Y, Machado TA, Merel J, Pfau D, Reardon T, Mu Y, Lacefield C, Yang W, Ahrens M, Bruno R, Jessell TM, Peterka D, Yuste R, Paninski L, Denoising S. Deconvolution, and demixing of calcium imaging data. Neuron. 2016;89:37. doi: 10.1016/j.neuron.2015.11.037. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib59] Polson NG, Scott JG, Windle J. Bayesian inference for logistic models using Pólya–Gamma Latent Variables. Journal of the American Statistical Association. 2013;108:1339–1349. doi: 10.1080/01621459.2013.829001. [DOI] [Google Scholar]

[bib60] Ramesh RN, Burgess CR, Sugden AU, Gyetvan M, Andermann ML. Intermingled ensembles in visual association cortex encode stimulus identity or predicted outcome. Neuron. 2018;100:900–915. doi: 10.1016/j.neuron.2018.09.024. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib61] Rauch HE, Tung F, Striebel CT. Maximum likelihood estimates of linear dynamic systems. AIAA Journal. 1965;3:1445–1450. doi: 10.2514/3.3166. [DOI] [Google Scholar]

[bib62] Romano SA, Pérez-Schuster V, Jouary A, Boulanger-Weill J, Candeo A, Pietri T, Sumbre G. An integrated calcium imaging processing toolbox for the analysis of neuronal population dynamics. PLOS Computational Biology. 2017;13:e1005526. doi: 10.1371/journal.pcbi.1005526. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib63] Romero S, Hight AE, Clayton KK, Resnik J, Williamson RS, Hancock KE, Polley DB. Cellular and widefield imaging of sound frequency organization in primary and higher order fields of the mouse auditory cortex. Cerebral Cortex. 2020;30:1603–1622. doi: 10.1093/cercor/bhz190. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib64] Rothschild G, Nelken I, Mizrahi A. Functional organization and population dynamics in the mouse primary auditory cortex. Nature Neuroscience. 2010;13:353–360. doi: 10.1038/nn.2484. [DOI] [PubMed] [Google Scholar]

[bib65] Rumyantsev OI, Lecoq JA, Hernandez O, Zhang Y, Savall J, Chrapkiewicz R, Li J, Zeng H, Ganguli S, Schnitzer MJ. Fundamental bounds on the fidelity of sensory cortical coding. Nature. 2020;580:100–105. doi: 10.1038/s41586-020-2130-2. [DOI] [PubMed] [Google Scholar]

[bib66] Rupasinghe A. Direct Extraction of Signal and Noise Correlations from Two-Photon Calcium Imaging of Ensemble Neuronal Activity MATLAB Codes. GitHub Repository. 2020 doi: 10.7554/eLife.68046. https://github.com/Anuththara-Rupasinghe/Signal-Noise-Correlation [DOI] [PMC free article] [PubMed]

[bib67] Rupasinghe A, Francis N, Liu J, Bowen Z, Kanold PO, Babadi B. Experimental Data From ‘Direct Extraction of Signal and Noise Correlations From Two-Photon Calcium Imaging of Ensemble Neuronal Activity’. Digital Repository at the University of Maryland (DRUM); 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib68] Rupasinghe A, Babadi B. Robust inference of neuronal correlations from blurred and noisy spiking observations. 2020 54th Annual Conference on Information Sciences and Systems (CISS); 2020. pp. 1–5. [DOI] [Google Scholar]

[bib69] Shumway RH, Stoffer DS. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis. 1982;3:253–264. doi: 10.1111/j.1467-9892.1982.tb00349.x.

[bib70] Smith AC, Brown EN. Estimating a state-space model from point process observations. Neural Computation. 2003;15:965–991. doi: 10.1162/089976603765202622.

[bib71] Smith MA, Sommer MA. Spatial and temporal scales of neuronal correlation in visual area V4. Journal of Neuroscience. 2013;33:5422–5432. doi: 10.1523/JNEUROSCI.4782-12.2013.

[bib72] Sompolinsky H, Yoon H, Kang K, Shamir M. Population coding in neuronal systems with correlated noise. Physical Review E. 2001;64:051904. doi: 10.1103/PhysRevE.64.051904.

[bib73] Soudry D, Keshri S, Stinson P, Oh MH, Iyengar G, Paninski L. Efficient "Shotgun" Inference of Neural Connectivity from Highly Sub-sampled Activity Data. PLOS Computational Biology. 2015;11:e1004464. doi: 10.1371/journal.pcbi.1004464.

[bib74] Stosiek C, Garaschuk O, Holthoff K, Konnerth A. In vivo two-photon calcium imaging of neuronal networks. PNAS. 2003;100:7319–7324. doi: 10.1073/pnas.1232232100.

[bib75] Stringer C, Pachitariu M. Computational processing of neural recordings from calcium imaging data. Current Opinion in Neurobiology. 2019;55:22–31. doi: 10.1016/j.conb.2018.11.005.

[bib76] Svoboda K, Yasuda R. Principles of two-photon excitation microscopy and its applications to neuroscience. Neuron. 2006;50:823–839. doi: 10.1016/j.neuron.2006.05.019.

[bib77] Theis L, Berens P, Froudarakis E, Reimer J, Román Rosón M, Baden T, Euler T, Tolias AS, Bethge M. Benchmarking spike rate inference in population calcium imaging. Neuron. 2016;90:471–482. doi: 10.1016/j.neuron.2016.04.014.

[bib78] Truccolo W, Eden UT, Fellows MR, Donoghue JP, Brown EN. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology. 2005;93:1074–1089. doi: 10.1152/jn.00697.2004.

[bib79] Vinci G, Ventura V, Smith MA, Kass RE. Separating spike count correlation from firing rate correlation. Neural Computation. 2016;28:849–881. doi: 10.1162/NECO_a_00831.

[bib80] Vogelstein JT, Watson BO, Packer AM, Yuste R, Jedynak B, Paninski L. Spike inference from calcium imaging using sequential Monte Carlo methods. Biophysical Journal. 2009;97:636–655. doi: 10.1016/j.bpj.2008.08.005.

[bib81] Vogelstein JT, Packer AM, Machado TA, Sippy T, Babadi B, Yuste R, Paninski L. Fast nonnegative deconvolution for spike train inference from population calcium imaging. Journal of Neurophysiology. 2010;104:3691–3704. doi: 10.1152/jn.01073.2009.

[bib82] Wang C, Blei DM. Variational inference in nonconjugate models. Journal of Machine Learning Research : JMLR. 2013;14:1005–1031. doi: 10.5555/2567709.2502613.

[bib83] Watkins PV, Kao JP, Kanold PO. Spatial pattern of intra-laminar connectivity in supragranular mouse auditory cortex. Frontiers in Neural Circuits. 2014;8:15. doi: 10.3389/fncir.2014.00015.

[bib84] Winkowski DE, Kanold PO. Laminar transformation of frequency organization in auditory cortex. Journal of Neuroscience. 2013;33:1498–1508. doi: 10.1523/JNEUROSCI.3101-12.2013.

[bib85] Wong R. Asymptotic approximations of integrals. Society for Industrial and Applied Mathematics. 2001;1:260. doi: 10.1137/1.9780898719260.

[bib86] Yatsenko D, Josić K, Ecker AS, Froudarakis E, Cotton RJ, Tolias AS. Improved estimation and interpretation of correlations in neural circuits. PLOS Computational Biology. 2015;11:e1004083. doi: 10.1371/journal.pcbi.1004083.

[bib87] Yu BM, Cunningham JP, Santhanam G, Ryu SI, Shenoy KV, Sahani M. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. Journal of Neurophysiology. 2009;102:614–635. doi: 10.1152/jn.90941.2008.

[bib88] Yu S, Yang H, Nakahara H, Santos GS, Nikolić D, Plenz D. Higher-order interactions characterized in cortical activity. Journal of Neuroscience. 2011;31:17514–17526. doi: 10.1523/JNEUROSCI.3127-11.2011.


