eLife. 2021 Mar 8;10:e51675. doi: 10.7554/eLife.51675

Relationship between simultaneously recorded spiking activity and fluorescence signal in GCaMP6 transgenic mice

Lawrence Huang 1, Peter Ledochowitsch 1, Ulf Knoblich 1, Jérôme Lecoq 1, Gabe J Murphy 1, R Clay Reid 1, Saskia EJ de Vries 1, Christof Koch 1, Hongkui Zeng 1, Michael A Buice 1, Jack Waters 1, Lu Li 1,2
Editors: Gary L Westbrook3, Gary L Westbrook4
PMCID: PMC8060029  PMID: 33683198

Abstract

Fluorescent calcium indicators are often used to investigate neural dynamics, but the relationship between fluorescence and action potentials (APs) remains unclear. Most APs can be detected when the soma almost fills the microscope’s field of view, but calcium indicators are used to image populations of neurons, necessitating a large field of view, generating fewer photons per neuron, and compromising AP detection. Here, we characterized the AP-fluorescence transfer function in vivo for 48 layer 2/3 pyramidal neurons in primary visual cortex, with simultaneous calcium imaging and cell-attached recordings from transgenic mice expressing GCaMP6s or GCaMP6f. While most APs were detected under optimal conditions, under conditions typical of population imaging studies, only a minority of 1 AP and 2 AP events were detected (often <10% and ~20–30%, respectively), emphasizing the limits of AP detection under more realistic imaging conditions.

Research organism: Mouse

eLife digest

Neurons, the cells that make up the nervous system, transmit information using electrical signals known as action potentials or spikes. Studying the spiking patterns of neurons in the brain is essential to understand perception, memory, thought, and behaviour. One way to do that is by recording electrical activity with microelectrodes. Another way to study neuronal activity is by using molecules that change how they interact with light when calcium binds to them, since changes in calcium concentration can be indicative of neuronal spiking. That change can be observed with specialized microscopes known as two-photon fluorescence microscopes. Using calcium indicators, it is possible to simultaneously record hundreds or even thousands of neurons. However, calcium fluorescence and spikes do not translate one-to-one.

In order to interpret fluorescence data, it is important to understand the relationship between the fluorescence signals and the spikes associated with individual neurons. The only way to directly measure this relationship is by using calcium imaging and electrical recording simultaneously to record activity from the same neuron. However, this is extremely challenging experimentally, so this type of data is rare.

To shed some light on this, Huang, Ledochowitsch et al. used mice that had been genetically modified to produce a calcium indicator in neurons of the visual cortex and simultaneously obtained both fluorescence measurements and electrical recordings from these neurons. These experiments revealed that, while the majority of time periods containing multi-spike neural activity could be identified using calcium imaging microscopy, on average, less than 10% of isolated single spikes were detectable. This is an important caveat that researchers need to take into consideration when interpreting calcium imaging results.

These findings are intended to serve as a guide for interpreting calcium imaging studies that look at neurons in the mammalian brain at the population level. In addition, the data provided will be useful as a reference for the development of activity sensors, and to benchmark and improve computational approaches for detecting and predicting spikes.

Introduction

Genetically encoded calcium indicators (GECIs) are widely used with two-photon laser scanning microscopy to report neuronal activity within local populations in vivo (Luo et al., 2018). This optical approach is minimally invasive and enables simultaneous measurement of activity from hundreds or even thousands of neurons at single-cell resolution, over multiple sessions. Using a contemporary GECI such as GCaMP6s, fluorescence changes associated with isolated spikes (action potentials, APs) in vivo can be detected when imaged at sufficiently high spatiotemporal resolution (Chen et al., 2013) (http://dx.doi.org/10.6080/K02R3PMN). Yet undetected APs are common in population imaging experiments (Theis et al., 2016; Berens et al., 2018).

Inferring the underlying AP train or firing rate from calcium imaging remains challenging for several reasons. First, population imaging studies necessarily employ a large field of view containing many neurons. In contrast, the AP to calcium-dependent fluorescence transfer function is typically characterized with a soma filling the field of view of the microscope, to maximize photon flux from the soma and thereby signal-to-noise ratio. Second, there is no ground truth spiking information available for most neurons in a population. Spiking information, often from a cell-attached recording, can be used to refine the spike inference model and thereby optimize AP detection. Third, the AP to calcium-dependent fluorescence transfer function may be different for each neuron due to various intrinsic and extrinsic factors, such as neuron-to-neuron differences in indicator expression.

Compared to viral expression, transgenic mouse lines offer convenience (e.g. bypassing virus injection and associated procedures) and achieve more uniform GECI expression in genetically defined neuronal populations (Madisen et al., 2015; Daigle et al., 2018). Using our intersectional transgenic mouse lines that enable Cre recombinase-dependent expression of GCaMP6s or GCaMP6f, we simultaneously characterized the spiking activity and fluorescence of individual GECI-expressing pyramidal neurons in layer 2/3 of mouse primary visual cortex (V1). We then tested the performance of several spike inference models, detecting APs under optimal conditions (models refined using the spiking information, with the soma filling the field of view) and under the less optimal conditions typical of population imaging experiments. Our results provide insight into the relationship between spiking activity in vivo and fluorescence signals and will aid the interpretation of existing and future calcium imaging datasets.

Results

To characterize the single-cell transfer function between observed fluorescence signals and underlying APs in vivo, we performed simultaneous calcium imaging and cell-attached recordings in V1 L2/3 excitatory pyramidal neurons in anesthetized mice (Figure 1A,B). To directly compare our results to virally expressed GCaMP6f and GCaMP6s (Chen et al., 2013) (http://dx.doi.org/10.6080/K02R3PMN), we used a small field of view (19.3–27.3 × 19.3–21.5 µm; scanning rate 141.3–158.3 frames per second [fps]). 2–10 min recordings were obtained from 213 neurons, all with fluorescence excluded from the nucleus. Quality control code was deployed to exclude from further analysis recordings with artifacts such as motion, photobleaching, somatic dye from the recording pipette, electrophysiological or fluorescence baseline instability, and abrupt changes in AP waveform (see 'Materials and methods'). The dataset for further analysis was from 48 neurons from mice of four transgenic lines, two expressing GCaMP6s and two GCaMP6f in excitatory neurons in layer 2/3 and deeper layers of cortex (Table 1).

Figure 1. Simultaneous calcium imaging and electrophysiology in vivo.

(A) Experimental design. (B) Fluorescence and Vm traces from an exemplar Emx1-s neuron. (C) 5 s of data from the neuron in panel B, showing a 2 AP, a 1 AP, and a 3 AP event. Pre- and post-AP exclusion windows, used to separate events, are illustrated for each event. AP, action potential.

Figure 1—figure supplement 1. Mouse age, cell depth, and firing rate.

(A) Mouse age. (B) Cell depths. (C) For each neuron, percentage of action potentials (APs) contained within 1–5 AP events.

Table 1. Dataset.

Sample size for each mouse line (see also Figure 1—figure supplement 1).

| Mouse line | Acronym | GECI | Mice | Cells | Recording duration (s) | APs per neuron |
|---|---|---|---|---|---|---|
| Emx1-IRES-Cre;Camk2a-tTA;Ai94 | Emx1-s | GCaMP6s | 5 | 21 | 241 ± 32 | 478 ± 121 |
| Camk2a-tTA;tetO-GCaMP6s | tetO-s | GCaMP6s | 1 | 4 | 347 ± 108 | 348 ± 71 |
| Cux2-CreERT2;Camk2a-tTA;Ai93 | Cux2-f | GCaMP6f | 3 | 12 | 300 ± 79 | 484 ± 112 |
| Emx1-IRES-Cre;Camk2a-tTA;Ai93 | Emx1-f | GCaMP6f | 4 | 11 | 201 ± 23 | 219 ± 38 |

We analyzed events with fluorescence transients separated from those of adjacent events, containing a total of 5427 APs (28% of APs; Figure 1—figure supplement 1C). An event was defined as one or more APs within 250 ms, with no APs in the preceding and subsequent 300 ms for GCaMP6f, or in the preceding 1 s and subsequent 500 ms for GCaMP6s (Figure 1C).
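The event definition can be illustrated with a short Python sketch. It is not the analysis code used in this study: the function name `isolate_events` and its parameters are hypothetical, the quiet periods are measured from the first and last AP of each group (an assumption), and the start and end of the recording are treated as quiet.

```python
import numpy as np

def isolate_events(spike_times, burst_window=0.25, pre_quiet=0.3, post_quiet=0.3):
    """Group APs into isolated events (all times in seconds).

    A group is the first AP plus any APs within `burst_window` of it.
    The group is kept only if no AP falls within `pre_quiet` before its
    first AP or within `post_quiet` after its last AP.
    Returns a list of (time_of_first_AP, number_of_APs) tuples.
    """
    t = np.sort(np.asarray(spike_times, dtype=float))
    events = []
    i = 0
    while i < len(t):
        # Extend the group to all APs within burst_window of the first AP.
        j = i
        while j + 1 < len(t) and t[j + 1] - t[i] <= burst_window:
            j += 1
        # Assumption: recording edges count as quiet periods.
        quiet_before = i == 0 or t[i] - t[i - 1] > pre_quiet
        quiet_after = j == len(t) - 1 or t[j + 1] - t[j] > post_quiet
        if quiet_before and quiet_after:
            events.append((float(t[i]), j - i + 1))
        i = j + 1
    return events

# Hypothetical spike times: the 3 AP group starting at 3.0 s is rejected
# because another AP follows it too closely, and that AP is rejected too.
spikes = [1.0, 2.0, 2.1, 3.0, 3.1, 3.2, 3.45, 5.0]
events = isolate_events(spikes)
```

Passing `pre_quiet=1.0, post_quiet=0.5` would apply longer exclusion windows, as might be appropriate for a slower indicator.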

Calcium transients differ across mouse lines

Fluorescence measured from the soma is contaminated with fluorescence from the surrounding neuropil, due to the extended nature of the microscope point spread function. Neuropil contamination is often removed by subtracting a scaled version of the neuropil fluorescence from the somatic fluorescence, with the scale factor referred to as the r value (Kerlin et al., 2010; Akerboom et al., 2012). The r value can affect AP detection, with under-subtraction of neuropil leading to false positives (events detected when there was activity in the neuropil but not the soma) and over-subtraction leading to false negatives (failure to detect somatic activity). We examined the effects of r on detection of 1 AP events, with electrical recordings providing ground truth (Figure 2 and Figure 2—figure supplement 1). For many GCaMP6s neurons, the receiver operating characteristic (ROC) curve changed little with r (Figure 2A), indicating that APs were detected with few false positives with little effect of neuropil subtraction. Neuropil subtraction exerted a stronger influence on event detection in GCaMP6f neurons, where the ROC curve changed with r (Figure 2A), permitting identification of the optimal r as that which maximized the area under the ROC curve and, thereby, the true/false event detection ratio. Optimal r for Emx1-f and Cux2-f neurons was approximately normally distributed with mean ± SEM of 0.82 ± 0.07 (20 neurons, Figure 2B). Our results indicate that the value of r has a modest effect on event detection in GCaMP6f neurons in mouse V1. The effect of neuropil subtraction may be greater during coordinated activity across the whole network, such as during strong sensory stimuli.
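The selection of r by maximizing the area under the ROC curve can be sketched as follows. This is a simplified illustration under stated assumptions, not the published pipeline: the function names are hypothetical, and the inputs are assumed to be per-window amplitudes already extracted from the somatic and neuropil traces, for windows containing a 1 AP event and for windows with no APs.

```python
import numpy as np

def neuropil_correct(f_cell, f_neuropil, r):
    """Subtract a scaled copy of the neuropil signal from the somatic signal."""
    return np.asarray(f_cell, dtype=float) - r * np.asarray(f_neuropil, dtype=float)

def roc_auc(signal_amps, noise_amps):
    """Area under the ROC curve, computed as the probability that a random
    event amplitude exceeds a random no-AP amplitude (ties count one half)."""
    s = np.asarray(signal_amps, dtype=float)[:, None]
    n = np.asarray(noise_amps, dtype=float)[None, :]
    return float(np.mean((s > n) + 0.5 * (s == n)))

def best_r(cell_evt, npil_evt, cell_quiet, npil_quiet, r_values):
    """Return the candidate r with the largest AUC, plus all AUC values."""
    aucs = [
        roc_auc(neuropil_correct(cell_evt, npil_evt, r),
                neuropil_correct(cell_quiet, npil_quiet, r))
        for r in r_values
    ]
    return r_values[int(np.argmax(aucs))], aucs

# Toy data: a unit somatic transient plus 0.8 x neuropil contamination.
npil_evt = np.array([0.0, 1.0, 2.0])
cell_evt = 1.0 + 0.8 * npil_evt
npil_quiet = np.array([0.0, 1.0, 2.0, 3.0])
cell_quiet = 0.8 * npil_quiet
r_opt, aucs = best_r(cell_evt, npil_evt, cell_quiet, npil_quiet, [0.0, 0.4, 0.8])
```

In this toy case the AUC rises as r approaches the true contamination ratio, because subtraction removes neuropil transients that would otherwise be counted as false positives.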

Figure 2. Neuropil subtraction optimized for 1 AP (action potential) detection.

(A) Effect of changing neuropil subtraction on detection for exemplar GCaMP6s and GCaMP6f neurons. Upper plots: family of receiver operating characteristic (ROC) curves. Each curve illustrates detection probability for true APs against probability of false positives as detection threshold is changed, for 1 AP events. False positives were calculated from time windows with no APs. Each ROC curve represents a different value of r. Lower plots: area under the ROC curve as a function of r. Gray symbols represent values of r for which r × Fneuropil(t) exceeded Fcell_measured(t), resulting in a negative F0 and inversion of the ΔF/F trace. (B) Distribution of r values for 20 GCaMP6f neurons.

Figure 2—figure supplement 1. Simulated effect of neuropil subtraction on event detection.

(A) Simulated cell and neuropil traces. The neuropil trace contained transients that were (1) associated with cell transients and (2) between cell transients, and amplitudes were scaled by the neuropil contamination ratio r (r = 0.3) relative to the cell amplitudes. (B) (Left) Receiver operating characteristic (ROC) curves for classifying cell amplitudes, where r was varied from 0 to 1 for neuropil correction. The detection threshold was defined as the xth percentile of noise amplitudes (amplitudes of the neuropil trace that were between cell transients), where 1–x represented the false positive probability, and the detection probability (true positive probability) was the fraction of estimated cell amplitudes (amplitudes of the summed trace that were associated with cell transients) above the detection threshold. (Right) Area under ROC curves as a function of r.

After neuropil subtraction (see 'Materials and methods'), we averaged trials by number of APs, fit a sum of exponentials to estimate rise and decay time constants and calculated peak ΔF/F (mean fluorescence over 100 ms around the maximum within 300 ms for GCaMP6f and 500 ms for GCaMP6s) for events with 1–5 APs (Figure 3). 28–55% of detected APs were in events with 1–5 APs (Figure 1—figure supplement 1) and >70% of these analyzed APs were in multi-AP events. As expected, peak ΔF/F increased approximately linearly with 1–5 APs, and peak ΔF/F and decay time constant were greater with GCaMP6s than GCaMP6f (Figure 3C). Peak ΔF/F was comparable to or slightly greater than in previous studies with GCaMP6s and GCaMP6f, possibly because we subtracted more of the neuropil fluorescence with a slightly greater r value (r = 0.8 vs. 0.7; Chen et al., 2013).
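The peak ΔF/F measure described above can be written compactly. The sketch below is an illustration only: `peak_dff` is a hypothetical helper, the trace is assumed to start at event onset, and the 100 ms averaging window is assumed to be centered on the maximum.

```python
import numpy as np

def peak_dff(dff, frame_rate, search_window=0.3, average_window=0.1):
    """Mean dF/F in a window centered on the maximum of the early response.

    dff            : 1-D dF/F trace starting at event onset (assumption)
    frame_rate     : frames per second
    search_window  : seconds after onset in which the maximum is sought
                     (e.g. 0.3 for GCaMP6f, 0.5 for GCaMP6s)
    average_window : total width in seconds of the averaging window
    """
    trace = np.asarray(dff, dtype=float)
    n_search = max(1, int(round(search_window * frame_rate)))
    i_max = int(np.argmax(trace[:n_search]))
    half = int(round(average_window * frame_rate / 2))
    lo = max(0, i_max - half)                # clip at the start of the trace
    hi = min(len(trace), i_max + half + 1)   # clip at the end of the trace
    return float(trace[lo:hi].mean())

# Toy example: a single-frame transient sampled at 100 frames per second.
toy = np.zeros(100)
toy[20] = 1.0
value = peak_dff(toy, frame_rate=100.0)
```

Averaging around the maximum, rather than taking the maximum itself, reduces the upward bias that shot noise introduces into a single-frame peak estimate.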

Figure 3. Action potential (AP)-evoked calcium transients in four mouse lines.

(A) Example fluorescence traces (ΔF/F, with r = 0.82 neuropil subtraction) for 1 AP, 2 AP, and 3 AP events for an exemplar Emx1-s neuron (forty-two 1 AP events, thirty-eight 2 AP events, sixteen 3 AP events) and an exemplar Emx1-f neuron (twenty-three 1 AP events, twelve 2 AP events, eleven 3 AP events). (B) Mean fluorescence traces and fits (sum of two exponentials) for 1–5 AP events for the two neurons in A. (C) Mean ± SEM peak ΔF/F, rise time constant, and decay time constant for four mouse lines. Number of neurons for 1–5 AP events were: 15, 16, 14, 9, 6 for Emx1-s; 4, 4, 4, 0, 0 for tetO-s; 10, 10, 10, 5, 4 for Emx1-f; 9, 9, 7, 4, 2 for Cux2-f. Asterisks indicate differences between mouse lines (p<0.05, one-way ANOVA).

Figure 3—figure supplement 1. Trial-to-trial variability of 1 AP (action potential) events.

Fraction of trials in which the peak of the fluorescence transient exceeded the range expected from Poisson noise, a measure of trial-to-trial variability. (A) Example 1 AP event for an Emx1-s neuron, with fluorescence in units of photons. Dashed line: mean, across trials, of peak fluorescence. Gray: 95% confidence interval for Poisson-distributed noise. (B) Amplitudes of 1 AP events for 30 trials. Dashed line and gray area: mean and 95% confidence interval. (C) Mean ± SEM percentage of trials with peak ΔF outside the 95% confidence interval, for each mouse line. In all four mouse lines, peak fluorescence exceeded the 95% confidence interval for Poisson-distributed noise in >>5% of 1 AP trials. The fraction of trials outside the confidence interval correlated with resting fluorescence, likely the result of different illumination intensities across experiments.

As expected, photon shot noise was the dominant noise source in images from all mouse lines. The pixelwise slope of the least squares fit between the variance and mean of the photon flux was 1.04 ± 0.01 (mean ± SEM), consistent with the noise following a Poisson process (intercept −0.08 ± 0.2, 48 neurons). Trial-to-trial variability in the amplitude of the 1 AP-evoked fluorescence was substantial and exceeded photon shot noise in most neurons (Figure 3—figure supplement 1). The sources of non-Poisson variability in our results are unclear, but negligible motion was visible in the movies after motion correction. Likely the variability results primarily from trial-to-trial differences in the AP-evoked calcium concentration, assuming GCaMP6f and GCaMP6s are expressed at sufficient concentrations to report resting changes in calcium concentration in all four mouse lines.
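The variance-versus-mean test for Poisson noise can be illustrated with simulated data. For a Poisson process the variance equals the mean, so a least-squares line through per-pixel (mean, variance) pairs should have a slope near 1 and an intercept near 0. The sketch below assumes pixel values are already expressed in photon counts; `variance_mean_fit` is a hypothetical name, and the simulated rates are arbitrary.

```python
import numpy as np

def variance_mean_fit(photon_counts):
    """Least-squares line through per-pixel (mean, variance) pairs.

    photon_counts : array of shape (n_frames, n_pixels), in photons
    Returns (slope, intercept) of variance = slope * mean + intercept.
    """
    counts = np.asarray(photon_counts, dtype=float)
    means = counts.mean(axis=0)
    variances = counts.var(axis=0, ddof=1)  # unbiased sample variance
    slope, intercept = np.polyfit(means, variances, 1)
    return float(slope), float(intercept)

# Simulated shot-noise-limited pixels with mean rates from 1 to 20 photons.
rng = np.random.default_rng(0)
rates = np.linspace(1.0, 20.0, 50)
simulated = rng.poisson(rates, size=(20000, 50))
slope, intercept = variance_mean_fit(simulated)
```

A slope well above 1 in real data would point to additional noise sources, such as motion or true variability in the underlying signal.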

Increasing field of view reduces optimized event detection

GCaMP6 indicators have been widely adopted because they exhibit greater AP-evoked ΔF/F than previous GCaMP indicators, but still some APs may go undetected (Chen et al., 2013). Under ideal conditions, almost all APs can be detected (with probability close to 1 at a false positive probability of 1%; Chen et al., 2013). However, many imaging experiments are performed with a field of view of hundreds of micrometers and this large field of view limits the dwell time per soma and thereby the photon flux per soma and signal-to-noise ratio. What event detection rate might be expected when imaging a large field of view, sufficient to include hundreds of somata? How much does field of view affect event detection?

We calculated detection probability for 1 AP and 2 AP events, using AP times from electrophysiology recordings to optimize event detection for each neuron (Chen et al., 2013). In high spatial and temporal resolution images, the probability of detecting a 1 AP event spanned a wide range (probability 0.07–1 and 0.11–0.95 for GCaMP6s and GCaMP6f, respectively, at 1% false positive probability; Figure 4A–C). As expected (Dana et al., 2014; Wei et al., 2019), most 1 AP events were detected in GCaMP6s and GCaMP6f neurons, but with lower average probability in GCaMP6f neurons (1 AP detection probability 0.70 ± 0.06 for 18 Emx1-s neurons, 0.80 ± 0.03 for three tetO-s neurons, 0.40 ± 0.08 for nine Cux2-f neurons, 0.60 ± 0.08 for 11 Emx1-f neurons, mean ± SEM at 1% false positive probability). 2 AP events were reliably detected in all four mouse lines (Figure 4C; detection probability 0.90 ± 0.06 for Emx1-s, 1.0 ± 0.0 for tetO-s, 0.66 ± 0.07 for Cux2-f, 0.80 ± 0.05 for Emx1-f, at 1% false positive probability). In high spatial and temporal resolution images, in all four mouse lines, it was possible to detect most but by no means all 1 AP and 2 AP events with a false positive probability of only 1%.
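Reading a detection probability at a fixed false positive probability can be sketched as a simple thresholding step. This is a deliberately reduced illustration: the per-neuron optimization using known AP times is not reproduced, `detection_probability` is a hypothetical name, and the inputs are assumed to be scalar detection scores (for example response amplitudes) for event windows and for windows with no APs.

```python
import numpy as np

def detection_probability(event_scores, noise_scores, false_positive_prob=0.01):
    """Fraction of events whose score exceeds the noise-derived threshold.

    The threshold is the (1 - false_positive_prob) quantile of scores from
    windows without APs, so roughly that fraction of noise windows exceed it.
    Returns (detection_probability, threshold).
    """
    noise = np.asarray(noise_scores, dtype=float)
    events = np.asarray(event_scores, dtype=float)
    threshold = float(np.quantile(noise, 1.0 - false_positive_prob))
    return float(np.mean(events > threshold)), threshold

# Toy example: 100 noise scores (0..99) and four event scores.
p_detect, thr = detection_probability([50.0, 98.5, 120.0, 200.0], np.arange(100.0))
```

Sweeping `false_positive_prob` from 0 to 1 and plotting the resulting detection probabilities traces out an ROC curve of the kind shown in Figure 4.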

Figure 4. Downsampling affects event detection.

(A–C) Event detection from images at high spatial and temporal resolution. (A) Receiver operating characteristic (ROC) curves for 1 AP events in 42 neurons, organized by mouse line. Neuropil subtraction was performed with r = 0.8 where possible (see 'Materials and methods'). Numbers of neurons were 18 Emx1-s, 3 tetO-s, 11 Cux2-f, 9 Emx1-f. (B) Mean ROC curves for the four mouse lines. (C) Event detection probabilities for 1 AP and 2 AP events. Bars represent mean ± SEM. (D–F) Equivalent plots for the same neurons after downsampling.

In these transgenic mice, 1 AP detection probabilities were lower than in previously reported neurons with virally expressed GCaMP6s and GCaMP6f (0.99 and 0.84 at 1% false positive probability in Chen et al., 2013). There are several possible reasons for this difference. In the transgenic mice used here, GCaMP expression is widespread throughout neocortex, which may result in labeling of greater numbers of axons and dendrites that contribute to the neuropil signal. Furthermore, GCaMP6 expression may be weaker in the four TIGRE1.0 mouse line crosses examined here than with strong promoter-driven adeno-associated virus (AAV) vectors as used in Chen et al., 2013. The newer TIGRE2.0 reporter lines drive GCaMP expression that is comparable to that from strong promoter-driven AAVs (Daigle et al., 2018), likely enabling 1 AP and 2 AP detection rates in transgenic mice that are comparable to those achieved with viral expression of GCaMP6.

Our recordings were obtained with a small field of view, at a high frame rate and centered on the soma (~19.3×19.3 µm, ~158 Hz, Figure 5A,B). In an attempt to simulate commonly used imaging conditions, we downsampled our images in space and time to mimic imaging with a 412 × 412 µm field of view at 30.3 Hz, as used in the Allen Brain Observatory (Figure 5C,D). The baseline fluorescence noise from downsampled images was comparable to that in the Allen Brain Observatory (Figure 5E) and is presumably comparable to images in many two-photon datasets with populations of hundreds of neurons.

Figure 5. Downsampling mimics large field-of-view images.

(A) Original, high spatial and temporal resolution GCaMP6f image of an exemplar Cux2-f neuron. (B) 1 AP, 2 AP, and 3 AP traces for the same neuron. Thin lines, individual trials; thick line, mean. Dashed line, ΔF/F = 0. (C) Downsampled image of the same neuron. (D) Traces from the downsampled neuron. (E) Normalized distribution of baseline noise (robust standard deviation [rSTD]) for 48 downsampled neurons (red) and 11,816 layer 2/3 neurons from Emx1-f and Cux2-f mice in the Allen Brain Observatory (blue).

Figure 5—figure supplement 1. Example fluorescence traces from image quality control procedures, implemented during image downsampling.

(A) Median of traces extracted from an example segmentation in which only the neuron of interest was found in each image during segmentation. The trace is significantly different from noise (KS test, p=1.8×10−40). (B) Median of traces extracted from an example segmentation in which only the neuron of interest was found in each image during segmentation. The trace is not significantly different from noise (KS test, p=0.072). (C) Medians of traces extracted from an example segmentation in which three regions of interest were identified. Each trace is significantly different from noise (KS tests, p=7.7×10−152, 1.6 × 10−19, 2.2 × 10−25). The sum of these three cluster medians had the highest correlation with the firing rate and was therefore used as the ΔF/F trace for this recording in the dataset. Scale bars apply to A–C. (D) Example fluorescence trace with an abrupt and sustained rise in fluorescence and in spike rate and subsequent loss of spiking activity. This neuron was manually removed from the dataset.

As expected, event detection probabilities were lower for downsampled images than for the original, high-resolution images (Figure 4D–F). 1 AP and 2 AP event detection probabilities were 0.32 ± 0.05 and 0.55 ± 0.08 for 18 Emx1-s neurons, 0.43 ± 0.07 and 0.89 ± 0.06 for three tetO-s neurons, 0.16 ± 0.04 and 0.24 ± 0.04 for 9 Cux2-f neurons, 0.21 ± 0.04 and 0.42 ± 0.09 for 11 Emx1-f neurons (mean ± SEM at 1% false positive probability). Even for 2 AP events, detection probability is <0.5 when imaging with GCaMP6f and a field of view of several hundred micrometers.

In summary, 1 AP and 2 AP events were detected with high probability when images were acquired with high spatial and temporal resolution and when analysis was performed with an event detection algorithm optimized for each neuron using known AP times. Even with known AP times to optimize detection for each neuron, event detection was severely impaired by a reduction in spatial and temporal resolution to mimic a typical two-photon population imaging experiment.

Modest effect of field of view on event detection under typical imaging conditions

In a typical two-photon population imaging experiment, no electrophysiology recording is available to optimize event detection. Often, the shape of the calcium transient is estimated from published indicator rise and decay times or derived from a representative sample of fluorescence transients. What are typical event detection and false positive probabilities under these sub-optimal conditions when the underlying AP times are unknown? Is performance degraded equally for high and for low resolution images?

We compared event detection from high-resolution and downsampled images using three spike inference algorithms: unconstrained non-negative deconvolution (NND; Friedrich et al., 2017), non-negative deconvolution with an L0 constraint to enforce event sparseness (Exact L0; Jewell et al., 2020), and a biophysical model that explicitly accounts for intracellular calcium dynamics (MLspike; Deneux et al., 2016). These three algorithms are among the highest performing spike inference algorithms (Berens et al., 2018).

For each neuron, the algorithms were deployed to estimate the number of APs in each image of the movie. All three algorithms estimated AP numbers that approximately recapitulated the number of APs measured with electrophysiology, but the number of APs per frame was typically not an integer due to imperfect spike inference (Figure 6A,D). We characterized performance using the Pearson correlation coefficient and the Matthews correlation coefficient, which compare measured and estimated AP number at each time point and the presence or absence of an event at each time point, respectively. Pearson correlation coefficients were ~0.4 when calculated with 33 ms time bins, increasing toward 0.7 as bin size was increased to 500 ms (Figure 6—figure supplement 1), comparable to published results (Berens et al., 2018). Mean Pearson and Matthews correlation coefficients were similar across inference algorithms and mouse lines and differed little between high-resolution and downsampled images (Figure 6B,E).
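The two performance measures can be computed from binned AP counts as sketched below. The function names are hypothetical, and the rule used to turn a non-integer estimated AP count into a detected event (a threshold of 0.5 APs per bin) is an assumption made for illustration, not a value taken from this study.

```python
import numpy as np

def bin_counts(values, factor):
    """Sum consecutive groups of `factor` frames (drops a trailing remainder)."""
    v = np.asarray(values, dtype=float)
    n = (len(v) // factor) * factor
    return v[:n].reshape(-1, factor).sum(axis=1)

def pearson_r(true_counts, est_counts):
    """Pearson correlation between measured and estimated AP counts per bin."""
    return float(np.corrcoef(true_counts, est_counts)[0, 1])

def matthews_cc(true_counts, est_counts, threshold=0.5):
    """Matthews correlation for presence or absence of an event in each bin."""
    t = np.asarray(true_counts, dtype=float) > 0
    p = np.asarray(est_counts, dtype=float) >= threshold  # assumed cutoff
    tp = float(np.sum(t & p))
    tn = float(np.sum(~t & ~p))
    fp = float(np.sum(~t & p))
    fn = float(np.sum(t & ~p))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return float((tp * tn - fp * fn) / denom) if denom > 0 else 0.0

# Toy per-bin AP counts: measured (integers) and inferred (non-integers).
measured = [0, 1, 0, 2, 0, 0, 1, 0]
inferred = [0.1, 0.8, 0.0, 1.6, 0.6, 0.0, 0.2, 0.0]
r_value = pearson_r(measured, inferred)
mcc = matthews_cc(measured, inferred)
```

Pearson r rewards estimates that track the number of APs, whereas the Matthews coefficient depends only on whether an event was called in each bin, which is why the two can diverge for the same inference output.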

Figure 6. Performance of spike inference algorithms on high-resolution and downsampled images.

(A) Results from an exemplar Cux2-f neuron, at high resolution. Fluorescence and action potential (AP) rate from electrophysiology (black). Below, APs per image frame estimated with three spike inference algorithms: MLspike (blue), Exact L0 (purple), and non-negative deconvolution (NND, orange). (B) Pearson correlation coefficient (r) and Matthews correlation coefficient (MCC) for the three algorithms for each mouse line. 300 ms bins. (C) Receiver operating characteristic (ROC) curves, reporting probabilities of detecting true and false events in each time bin. Thin lines: individual neurons. Thick lines: mean across neurons. 300 ms bins. (D–F) Equivalent plots for downsampled images.

Figure 6—figure supplement 1. Effect of bin duration on measures of spike inference algorithm performance.

(A) Receiver operating characteristic (ROC) curves for MLspike (blue), Exact L0 (purple), and non-negative deconvolution (NND) (orange) algorithms after binning of spike probabilities into 33, 100, 200, 300, and 500 ms bins, for downsampled images resampled to 150 Hz. (B and C) Pearson correlation coefficients (r) and Matthews correlation coefficients (MCC) for each bin duration.

We plotted ROC curves to examine the relationship between detected events and false positives more directly. Since spike inference is generally useful only where false positive rates are low, we focused on false positive probabilities in the range of 0–0.05. Performance differed greatly between neurons, but mean ROC curves were similar across mouse lines, with only modest differences between algorithms, between GCaMP6s and GCaMP6f lines, or between high-resolution and downsampled conditions (Figure 6C,F).

Naturally, detection probability increased with the number of APs per event. At a false positive probability of 0.01, detection probability was commonly <0.1 for 1 AP events, increasing approximately linearly with AP number, often to ~0.8 for 5 AP events (Figure 7A,B). With 1 AP events being the most common event type in all four mouse lines (Figure 7C), it was possible to detect only a minority of events with a low false positive probability. Using these spike inference algorithms, although detection probabilities were commonly slightly lower for downsampled than for high-resolution images, the difference was modest, indicating that the decreased SNR of population imaging had little effect on event detection in our dataset.

Figure 7. Performance of blind spike inference algorithms for 1–5 AP (action potential) events.

(A) Mean ± SEM detection probabilities at 1% false positive probability for high-resolution images. (B) Mean ± SEM detection probabilities at 1% false positive probability for downsampled images. (C) Frequency of 0–5 AP events for 250 ms bins for each mouse line.

Figure 7—figure supplement 1. Upsampling enhances performance of unconstrained non-negative deconvolution (NND).

Performance of the NND algorithm when applied to fluorescence traces from downsampled images (frame rate 30 Hz) and to the same fluorescence traces after upsampling to 150 Hz. 100 ms bins. (A) Pearson r and Matthews correlation coefficient (MCC). (B) Receiver operating characteristic (ROC) curves.

Figure 7—figure supplement 2. Autocalibration enhanced performance of MLspike.

Performance of MLspike with and without use of the autocalibration procedure, applied to downsampled images, with fluorescence traces upsampled to a frame rate of 158 Hz. 100 ms bins. (A) Pearson r and Matthews correlation coefficient (MCC). (B) Receiver operating characteristic (ROC) curves.

Figure 7—figure supplement 3. Comparison of spike inference with blind and ground truth action potential (AP)-optimized algorithms.

(A) Mean ± SEM 1 AP event detection probability at 1% false positive probability for ground truth-optimized and blind algorithms (used in Figure 4 and in Figures 6 and 7, respectively). Traces were selected as for Figure 4. (B) 1 AP events detected by each blind algorithm as a fraction of those detected by the ground truth-optimized algorithm, at 1% false positive probability.

Using our dataset, we compared event detection with three algorithms: unpenalized NND, NND with L0 constraint and mathematically guaranteed globally optimal solution (Exact L0), and the biophysically inspired MLspike model. For NND, performance was poor at 30 Hz and considerably improved by upsampling to 150 Hz (Figure 7—figure supplement 1). Upsampling of low frame rate data, often to 100 Hz, is a common practice in the field (Theis et al., 2016; Berens et al., 2018; Pachitariu et al., 2018). For MLspike, performance was poor without use of the autocalibration procedure to optimize the model for each neuron (Figure 7—figure supplement 2). MLspike thus contrasted with deconvolution-based algorithms, for which fixed parameters are more effective (Pachitariu et al., 2018). For Exact L0, neither upsampling nor optimization for each neuron was necessary for optimal mean performance across neurons.

Pachitariu et al., 2018 found that unpenalized NND matched and often exceeded the performance of algorithms with sparsifying constraints such as NND with an approximate L0 constraint (no mathematical guarantee of globally optimal solution). Consistent with the conclusions of Pachitariu et al., 2018, Exact L0 lagged the performance of NND for some metrics and genotypes (Figure 7A,B) but was indistinguishable or even superior for others (Figure 6B,E). The performance of MLspike was broadly equivalent to that of NND, but the loss of performance due to downsampling was smaller with MLspike, so that MLspike outperformed NND on downsampled images. Our results point to MLspike as a compelling choice for spike inference in population imaging experiments. Our results also suggest that there is ample room for improvement of spike inference models, since event detection by the three spike inference models falls far short of the performance of the ground truth-optimized approach employed in Figure 4 (Figure 7—figure supplement 3).

In summary, relative to small field-of-view imaging, population imaging conditions decreased the probability of spike event detection with an event detector optimized to each individual neuron using ground truth AP information (Figure 4). With blind spike inference, many events went undetected even under near-ideal imaging conditions with a small field of view, and event detection was not substantially worse under population imaging conditions (Figure 6, Figure 7, Figure 7—figure supplement 3). The results of Figures 6 and 7 are likely representative of event detection in many GCaMP6 imaging experiments, where ground truth AP information is not available and blind spike inference is employed. Our results indicate that even though GCaMP6 indicators are bright and sensitive enough to enable the detection of most 1 AP events in superficial cortical pyramidal neurons in vivo if the detection procedure is optimized using ground truth AP information, most events containing 1, 2, and sometimes greater numbers of APs go undetected in our (and likely in many other) imaging experiments with GCaMP6.

Discussion

Calcium imaging is widely used to report neuronal spiking activity in vivo. However, accurate spike inference from calcium imaging remains a challenge, and there are relatively few ground truth datasets with simultaneous calcium imaging and electrophysiology to aid the development of more accurate spike inference algorithms. In a recent challenge, ~40 algorithms were trained and tested on datasets consisting of 37 GCaMP6-expressing neurons, underscoring the need for additional GCaMP6 calibration data (Berens et al., 2018). In addition to supporting efforts toward spike inference, an improved understanding of the relationship between spiking and observed fluorescence signals is necessary to further broaden the utility and impact of calcium imaging. To these ends, we contribute a ground truth dataset consisting of 48 V1 L2/3 excitatory neurons recorded at single-cell resolution (available at https://portal.brain-map.org/explore/circuits/oephys) and characterize their AP-to-calcium fluorescence transfer function. Complementing existing datasets with viral GECI expression (Chen et al., 2013; Theis et al., 2016; Dana et al., 2016), our work facilitates interpretation of existing and future calcium imaging studies using mainstream transgenic mouse lines, such as the Allen Institute’s Brain Observatory Visual Coding dataset (http://observatory.brain-map.org/visualcoding) (de Vries et al., 2020).

Previous studies have established that most APs can be detected with GCaMP6 indicators under near-optimal conditions (Chen et al., 2013). Yet undetected APs are common in population imaging experiments (Theis et al., 2016; Berens et al., 2018). To investigate why APs are often missed during population imaging, we compared event detection in 250 ms time windows with a neuronal soma occupying most of the image, near-optimal conditions for AP event detection, and event detection when the soma occupies just a small percentage of the field of view, less ideal conditions that are common in population imaging studies. Importantly, we downsampled images to simulate population imaging conditions, enabling comparison for the same APs under different imaging conditions.

Our results indicate that, in GCaMP6 transgenic mice, most APs can be detected under near-optimal conditions, while detection is less effective during population imaging. These conclusions are similar to those of previous studies with viral GCaMP6 expression, but our results also reveal two reasons for the difference in detection. Unsurprisingly, the reduced signal-to-noise ratio of population imaging, relative to single soma imaging, results in less effective event detection. However, a high signal-to-noise ratio, achieved by imaging one soma, was no guarantee of effective event detection. Effective detection also required optimization of detection for the neuron of interest, using known AP times to identify events with different AP numbers and so generate kernels of the appropriate amplitude and time course. Parameter tuning in the absence of known AP times, with the MLspike autocalibration routine, improved event detection but not to the high standard of ground truth-optimized detection. Unfortunately, measuring AP times for every neuron with electrophysiology is rarely feasible, severely limiting the percentage of events one might reasonably expect to detect with GCaMP6 in most imaging experiments.

Our results point to several practices that might be adopted to maximize spike detection. First, minimize the field of view, hence maximizing photon flux per neuron. Second, tune the spike inference model for each neuron independently, where possible. Third, compare the results of several spike inference models. The three models employed here produced similar AP detection rates, whether applied to high-resolution or to downsampled images. Similarly, Pachitariu et al., 2018 observed that the L0 constraint failed to improve performance of the NND model. Nonetheless, each model has strengths and weaknesses. For example, a model may detect more APs than another but at the cost of a greater false positive rate. As a result, model performance may diverge for some AP rates and patterns. In the worst case, comparing models provides some protection from errors in implementation. Fourth, ensure that traces are sampled (or upsampled) at a sufficiently high rate when employing NND and use autocalibration with MLspike; both make a substantial difference to model performance. Finally, exercise caution when interpreting the inferred spike rates. Commonly, many APs are not detected using even the most accurate spike inference models.

In summary, in this study we present a ground truth dataset from anesthetized mice with simultaneous electrophysiology and calcium imaging. Analysis of this dataset revealed that only a small fraction of isolated APs were detected under typical population imaging conditions and with existing spike inference algorithms. By making our data freely available, we hope that it will serve the community as a further resource to better understand the quantitative link between calcium-evoked fluorescent imaging signals and spiking activity.

Materials and methods

Key resources table.

Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Genetic reagent (Mus musculus) | B6.129S2-Emx1tm1(cre)Krj/J, Emx1-IRES-Cre | Jackson Laboratory | RRID:IMSR_JAX:005628; RRID:MGI:2684610 |
Genetic reagent (Mus musculus) | B6(Cg)-Cux2tm3.1(cre/ERT2)Mull/Mmmh, Cux2-CreERT2 | MMRRC | RRID:MMRRC_032779-MU; RRID:MGI:5014172 |
Genetic reagent (Mus musculus) | B6.Cg-Tg(Camk2a-tTA)1Mmay/DboJ, Camk2a-tTA | Jackson Laboratory | RRID:IMSR_JAX:007004; RRID:MGI:2179066 |
Genetic reagent (Mus musculus) | B6;DBA-Tg(tetO-GCaMP6s)2Niell/J, tetO-GCaMP6s | Jackson Laboratory | RRID:IMSR_JAX:024742; RRID:MGI:5553332 |
Genetic reagent (Mus musculus) | B6;129S6-Igs7tm93.1(tetO-GCaMP6f)Hze/J, Ai93(TITL-GCaMP6f) | Jackson Laboratory | RRID:IMSR_JAX:024103; RRID:MGI:5558086 |
Genetic reagent (Mus musculus) | B6.Cg-Igs7tm94.1(tetO-GCaMP6s)Hze/J, Ai94(TITL-GCaMP6s) | Jackson Laboratory | RRID:IMSR_JAX:024104; RRID:MGI:5607576 |
Software, algorithm | MATLAB R2016b | http://www.mathworks.com/products/matlab/ | RRID:SCR_001622 |
Software, algorithm | Python 3.7.4 | http://www.python.org/ | RRID:SCR_008394 |
Software, algorithm | LabVIEW 2015 | http://www.ni.com/labview/ | RRID:SCR_014325 |

Experimental procedures were conducted in accordance with NIH guidelines and approved by the Institutional Animal Care and Use Committee (IACUC) of the Allen Institute for Brain Science under protocol number 1509.

Mice

Two-photon-targeted electrophysiology and two-photon calcium imaging were conducted in 2- to 5-month-old male and female transgenic mice: five Emx1-IRES-Cre;Camk2a-tTA;Ai94 (Emx1-s) mice, one Camk2a-tTA;tetO-GCaMP6s (tetO-s) mouse, three Emx1-IRES-Cre;Camk2a-tTA;Ai93 (Emx1-f) mice, and four Cux2-CreERT2;Camk2a-tTA;Ai93 (Cux2-f) mice. All four lines drive GCaMP expression primarily in excitatory neurons. In Cux2-CreERT2 mice, Cre and GCaMP expression are enriched in layer 2/3 (Franco et al., 2012; Harris et al., 2014). In Emx1-IRES-Cre and Camk2a-tTA mice, GCaMP is expressed throughout cortical layers (Gorski et al., 2002; Wekselblatt et al., 2016). Images showing the pattern of Cre and GCaMP expression in these mouse lines are available via the Transgenic Characterization pages of the Allen Mouse Brain Connectivity Atlas and Allen Brain Observatory: https://connectivity.brain-map.org/transgenic, http://observatory.brain-map.org/visualcoding/transgenic.

Mice of some of the genotypes used here, most notably Emx1-f, can exhibit epileptiform activity (Steinmetz et al., 2017), including overt seizures. Mice with seizures were excluded from the study. However, the spiking patterns of neurons from GCaMP6s and -f lines commonly differed, suggesting that one or more transgenes affected cell or circuit activity (Figure 1—figure supplement 1C).

Surgery

Mice were anesthetized with either isoflurane (0.75–1.5% in O2) or urethane (1.5 g/kg, 30% aqueous solution, intraperitoneal injection), then implanted with a metal head-post. A circular craniotomy was performed with skull thinning over the left V1, centered 1.3 mm anterior and 2.6 mm lateral to lambda. During surgery, to keep the exposed V1 region from overheating or drying, the craniotomy was filled with artificial cerebrospinal fluid (ACSF) containing (in mM): NaCl 126, KCl 2.5, NaH2PO4 1.25, MgCl2 1, NaHCO3 26, glucose 10, CaCl2 2, in ddH2O; 290 mOsm; pH adjusted to 7.3 with NaOH. Durotomy was performed to expose V1 regions of interest (ROIs) that were free of major blood vessels to facilitate the penetration of recording micropipettes. A thin layer of low-melting-point agarose (1–1.3% in ACSF, Sigma-Aldrich) was then applied to the craniotomy to control brain motion. The mouse body temperature was maintained at 37°C with a feedback-controlled animal heating pad (Harvard Apparatus).

Calcium imaging

Individual GCaMP6+ neurons ~100–300 µm below the pial surface of cortex were visualized under adequate anesthesia (stage III-3) using a Bruker (Prairie) two-photon microscope and Chameleon Ultra II Ti:sapphire laser (Coherent). Fluorescence excited at 920 nm wavelength, with <70 mW laser power measured after the objective, was collected in two spectral channels using green (510/42 nm) and red (641/75 nm) emission filters (Semrock) to visualize GCaMP6 and the Alexa Fluor 594-containing micropipette, respectively. Fluorescence images of 96–136 × 96–107 pixels and a 19.3–27.3 × 19.3–21.5 µm field of view were acquired at 141.3–158 frames per second through a 16× water immersion objective lens (Nikon, NA 0.8). Recordings included periods with and without visual stimuli. Mean ± SEM number of pixels per neuron was 1136 ± 46.

Electrophysiology

Two-photon-targeted cell-attached recording was performed following established protocols (Margrie et al., 2003; Kitamura et al., 2008; Knoblich et al., 2019). Long-shank borosilicate (KG-33, King Precision Glass) micropipettes (5–10 MΩ) were pulled with a P-97 puller (Sutter) and filled with ACSF and Alexa Fluor 594 to perform cell-attached recordings on GCaMP6+ neurons. Micropipettes were installed on a MultiClamp 700B headstage (Molecular Devices), which was mounted onto a Patchstar micromanipulator (Scientifica) with an approach angle of 31° from the horizontal plane. Minimal seal resistance was 20 MΩ. Data were acquired in ‘I = 0’ mode (zero current injection) with a MultiClamp 700B, recorded at 40 kHz using Multifunction I/O Devices (National Instruments) and custom software written in LabVIEW (National Instruments) and MATLAB (MathWorks). Isoflurane level was intentionally adjusted during recording sessions to keep the anesthesia depth as light as possible, resulting in fluctuation of the firing rates of recorded neurons.

Visual stimulation

Whole-screen sinusoidal static and drifting gratings were presented on a calibrated LCD monitor spanning 60° in elevation and 130° in azimuth to the contralateral eye. The mouse’s eye was positioned ~22 cm away from the center of the monitor. For static gratings, the stimulus consisted of four orientations (45° increment), four spatial frequencies (0.02, 0.04, 0.08, and 0.16 cycles per degree), and four phases (0, 0.25, 0.5, 0.75) at 80% contrast in a random sequence with 10 repetitions. Each static grating was presented for 0.25 s, with no inter-stimulus interval. A gray screen at mean illuminance was presented randomly a total of 60 times. For drifting gratings, the stimulus consisted of eight orientations (45° increment), four spatial frequencies (0.02, 0.04, 0.08, and 0.16 cycles per degree), and one temporal frequency (2 Hz), at 80% contrast in a random sequence with up to five repetitions. Each drifting grating lasted for 2 s with an inter-stimulus interval of 2 s. A gray screen at mean illuminance was presented randomly up to 15 times.

Neuron selection

We obtained recordings from 213 neurons and developed a numerical routine to exclude neurons with questionable electrophysiology or fluorescence movies, such as abrupt changes in baseline voltage or AP waveform or image artifacts such as those due to motion, photobleaching, or other slow baseline changes. Neurons were accepted for analysis if they passed both electrophysiology and image quality control criteria. Electrophysiology quality control is described in the next section and imaging quality control in the ‘Image downsampling’ section; 145 and 10 neurons were eliminated in the electrophysiology and image quality control steps, leaving 58 neurons. Of these, 10 were excluded from further analysis: red indicator had entered the soma from the pipette in seven instances, two neurons segmented poorly during image analysis, and one had a truncated electrophysiology recording. The final dataset consisted of 48 neurons.

Electrophysiology quality control

Electrophysiology traces were first baseline-subtracted to remove slow drift (third-order Savitzky-Golay filter over 20,001 samples using MATLAB sgolayfilt). APs were detected as peaks of amplitude more than 10 times the Quiroga threshold (QT), the median(|V(t)|/0.6745).
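The threshold rule above can be illustrated with a minimal Python sketch. This is not the authors' MATLAB routine: the function names, the synthetic trace, and the simple local-maximum peak finder are assumptions made for illustration, and the Savitzky-Golay baseline subtraction is assumed to have been applied already.

```python
import random
import statistics

def quiroga_threshold(trace):
    """Robust noise estimate, median(|V(t)| / 0.6745)."""
    return statistics.median(abs(v) / 0.6745 for v in trace)

def detect_aps(trace, factor=10.0):
    """Return sample indices of local maxima exceeding factor * QT."""
    threshold = factor * quiroga_threshold(trace)
    peaks = []
    for i in range(1, len(trace) - 1):
        v = trace[i]
        if v > threshold and v > trace[i - 1] and v >= trace[i + 1]:
            peaks.append(i)
    return peaks

# Synthetic, baseline-subtracted trace: Gaussian noise plus three large deflections.
rng = random.Random(0)
trace = [rng.gauss(0.0, 0.05) for _ in range(5000)]
spike_samples = [1000, 2500, 4000]  # hypothetical AP times, in samples
for s in spike_samples:
    trace[s] += 5.0

detected = detect_aps(trace)
```

Because the QT is based on the median, the few large deflections barely change the noise estimate, which is why the threshold stays close to the noise standard deviation used to build the trace.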

To develop a numerical routine, a group of human annotators identified 48 ‘high-quality’ electrophysiology recordings. We then compiled a large set of descriptive statistics, listed below, and calculated the distribution of each of these statistics in the reference dataset, thereby defining an acceptable range expected of high-quality recordings. Each descriptive statistic was subsequently computed for recordings from all 213 neurons. Each recording was passed for further analysis if for all metrics it fell within the range spanned by the manually selected dataset of 48 recordings.
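The acceptance rule described above (a recording passes only if every metric falls within the range spanned by the manually selected reference set) could be sketched as follows. The metric names and values are hypothetical and only two metrics are shown; the actual routine used the 35 statistics listed below.

```python
def acceptance_ranges(reference_metrics):
    """Min and max of each metric across the manually selected reference recordings."""
    names = reference_metrics[0].keys()
    return {
        name: (min(rec[name] for rec in reference_metrics),
               max(rec[name] for rec in reference_metrics))
        for name in names
    }

def passes_qc(metrics, ranges):
    """A recording passes only if every metric lies inside its reference range."""
    return all(lo <= metrics[name] <= hi for name, (lo, hi) in ranges.items())

# Hypothetical metric values for three reference recordings.
reference = [
    {"snr": 18.0, "amp_cv": 0.05},
    {"snr": 25.0, "amp_cv": 0.12},
    {"snr": 40.0, "amp_cv": 0.08},
]
ranges = acceptance_ranges(reference)

candidates = {
    "cell_a": {"snr": 30.0, "amp_cv": 0.10},  # inside both ranges
    "cell_b": {"snr": 12.0, "amp_cv": 0.07},  # SNR below the reference minimum
}
accepted = [name for name, m in candidates.items() if passes_qc(m, ranges)]
```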

For each electrophysiology recording, we calculated 35 descriptive statistics.

Metrics computed on continuous electrophysiological data:

(1) Median relative deviation of the membrane potential (MRDM), the ratio between the median absolute deviation (MAD) and the median: MRDM = MAD(Vm)/median(Vm).

(2) Mean of the baseline (BL).

(3) Coefficient of variation of the baseline: std (BL)/mean (BL).

(4) Mean of the baseline noise, approximated by the QT (Jewell et al., 2020).

(5) Stability of the QT: one thousand 10 s intervals were uniformly sampled from each recording, and the QT was computed on each sample. Quiroga noise stability (QNS) was defined as the coefficient of variation over the 1000 QT samples.

(6) r2 of linear regression (MATLAB regression function) of the 1000 QT samples against the start times of the 10 s segments on which the QT was computed.

(7) Slope of linear regression (MATLAB regression function) of the 1000 QT samples against the start times of the 10 s segments on which the QT was computed.

(8) r2 of linear regression (MATLAB regression function) of the baseline against time.

(9) Slope of linear regression (MATLAB regression function) of the baseline against time.

(10) The number of samples for which the baseline-subtracted trace exceeds the QT divided by the number of samples for which it dips below the negative of the QT.

Metrics computed on the AP time series. Only recordings with >3 APs were included:

(11) Number of APs.

(12) Maximum likelihood inter-AP interval (MATLAB lognfit function).

(13) Mean AP amplitude.

(14) AP amplitude coefficient of variation.

(15) AP amplitude median relative deviation.

(16) Relative AP amplitude range: (max[amplitude] – min[amplitude])/median(amplitude).

(17) AP amplitude max/min ratio: max(amplitude)/min(amplitude).

(18) Signal-to-noise ratio (SNR), median(amplitude)/QT.

Metrics computed on 2-ms-long AP waveforms, AP time ±1 ms smoothed with MATLAB smooth function with sgolay option:

(19) 'Left' width-half-max (LWHM), the mean width at half the amplitude before the detected AP time.

(20) 'Right' width-half-max (RWHM), the mean width at half the amplitude after the detected AP time.

(21) Full width at half amplitude (FWHM). FWHM = LWHM + RWHM.

(22) Coefficient of variation of LWHM.

(23) Coefficient of variation of RWHM.

(24) Coefficient of variation of FWHM.

(25) r2 of linear regression (MATLAB regression function) of AP amplitude against AP time.

(26) Slope of linear regression (MATLAB regression function) of AP amplitude against AP time.

(27) r2 of linear regression (MATLAB regression function) of AP FWHM against AP time.

(28) Slope of linear regression (MATLAB regression function) of AP FWHM against AP time.

Firing rate-based metrics. Firing rate was estimated by convolution of the AP train with a 1-s-long box-car window (MATLAB conv function):

(29) Mean firing rate (FR).

(30) Coefficient of variation of FR.

(31) r2 of linear regression (MATLAB regression function) of firing rate against time.

(32) Slope of linear regression (MATLAB regression function) of firing rate against time.

(33) Pearson correlation (MATLAB corrcoef function) of BL vs. FR.

(34) Pearson correlation (MATLAB corrcoef function) between the baseline at AP time points and AP amplitude.

(35) Pearson correlation (MATLAB corrcoef function) between the baseline at AP times and the AP FWHM.
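As an illustration of how a few of the listed statistics relate to one another, the following Python sketch computes metrics 13 to 18 from a list of AP amplitudes and a precomputed QT. It is a sketch under stated assumptions rather than the authors' MATLAB code: the function name and example values are hypothetical, the coefficient of variation uses the sample standard deviation, and the median relative deviation is taken as MAD/median by analogy with metric 1.

```python
import statistics

def ap_amplitude_metrics(amplitudes, qt):
    """Amplitude-based QC statistics (metrics 13 to 18 in the list above)."""
    med = statistics.median(amplitudes)
    mad = statistics.median(abs(a - med) for a in amplitudes)
    mean = statistics.mean(amplitudes)
    return {
        "mean_amplitude": mean,
        "amplitude_cv": statistics.stdev(amplitudes) / mean,  # sample SD (assumption)
        "amplitude_mrd": mad / med,                           # MAD / median (assumption)
        "relative_range": (max(amplitudes) - min(amplitudes)) / med,
        "max_min_ratio": max(amplitudes) / min(amplitudes),
        "snr": med / qt,
    }

# Hypothetical AP amplitudes (arbitrary units) and noise level.
metrics = ap_amplitude_metrics([1.0, 1.2, 0.8, 1.1, 0.9], qt=0.05)
```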

Neuropil subtraction, high-resolution images

To approximate somatic fluorescence (Fcell_true) without neuropil contamination, a scaled version of the neuropil fluorescence (Fneuropil) was subtracted from each somatic fluorescence trace, after Akerboom et al., 2012: Fcell_true(t) = Fcell_measured(t) – r * Fneuropil(t). We determined the optimal scale factor (r) for neurons with GCaMP6f to be 0.82 (see 'Results' section). We therefore used r = 0.8 as our default scale factor. For some neurons, Fneuropil was large enough relative to Fcell_measured that r = 0.8 resulted in negative fluorescence. For these neurons, we set r to 0.7, 0.6, or 0.5. For our dataset of 48 GCaMP6s and GCaMP6f neurons, we set r to 0.8 for 40 neurons, to 0.7 for four neurons, to 0.6 for three neurons, and to 0.5 for one neuron.
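A minimal Python sketch of this subtraction is given below. The rule used here to pick r (the largest of 0.8, 0.7, 0.6, 0.5 that leaves no negative values) is an assumption about how the fallback values were chosen; the function name and the example traces are hypothetical.

```python
def subtract_neuropil(f_cell, f_neuropil, candidates=(0.8, 0.7, 0.6, 0.5)):
    """Return (r, corrected trace) for the first r that keeps the trace non-negative."""
    for r in candidates:
        corrected = [fc - r * fn for fc, fn in zip(f_cell, f_neuropil)]
        if min(corrected) >= 0:
            return r, corrected
    raise ValueError("no candidate scale factor keeps the trace non-negative")

# Hypothetical traces: strong neuropil forces a fallback from r = 0.8 to r = 0.7.
r_strong, trace_strong = subtract_neuropil([100.0, 120.0, 90.0], [110.0, 115.0, 120.0])

# Hypothetical traces: weak neuropil, so the default r = 0.8 is kept.
r_weak, trace_weak = subtract_neuropil([100.0, 120.0, 90.0], [50.0, 60.0, 55.0])
```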

Neuropil subtraction, downsampled images

Neuropil subtraction was performed as described for the Allen Brain Observatory (de Vries et al., 2020).

Trace analysis

Electrophysiology and calcium imaging data were analyzed using custom MATLAB and Python scripts. For electrophysiology, Vm was filtered between 250 Hz and 5 kHz, and automated AP detection was performed using a threshold criterion (5×std of Vm).

For calcium imaging, in-plane motion artifacts were corrected (Dombeck et al., 2007), and neuron/ROI selection was performed using a semi-automatic algorithm (Chen et al., 2013) (kindly provided by Karel Svoboda, Janelia Research Campus). Ring-shaped ROIs were used to select GCaMP6-positive excitatory neurons, with GCaMP6 expression typically excluded from the nucleus and restricted to the cytoplasm.

To construct AP-calcium fluorescence response curves, we first identified all isolated AP events. For GCaMP6s, isolated events were separated from previous and subsequent events by ≥1000 and ≥500 ms, respectively. For GCaMP6f, isolated events were separated from previous and subsequent events by ≥300 ms. One result of finding isolated events is that only a minority of APs were used to construct AP-calcium fluorescence response curves. Within each event, APs were summed over 250 ms. Fluorescence traces were aligned to the first AP in each event, with t = 0 preceding the AP by <1 frame (6.3 ms at 158 Hz). For each event, ΔF/F = (F-F0,local)/F0,global, where F0,local was the mean fluorescence over 100 ms before the first AP, and F0,global was the minimum F0,local across trials. For GCaMP6s and GCaMP6f, peak ΔF/F was calculated by first finding tmax, the time of the maximum ΔF/F ≤ 500 ms and 300 ms after the first AP, respectively. Peak ΔF/F was the mean ΔF/F from tmax - 50 ms to tmax + 50 ms. Bursts of >5 APs were excluded from analysis due to the low frequency of such events.
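The ΔF/F and peak calculations can be sketched in Python as follows. The sketch assumes that each trial is a fluorescence trace already aligned so that a common frame index (ap_index) marks the first AP; the function names, the 100 Hz frame rate, and the synthetic traces are hypothetical and chosen only to give round numbers.

```python
from statistics import mean

def delta_f_over_f(trials, ap_index, frame_rate_hz, baseline_ms=100.0):
    """dF/F = (F - F0_local) / F0_global for each aligned trial."""
    n_base = round(baseline_ms * frame_rate_hz / 1000.0)
    f0_local = [mean(trial[ap_index - n_base:ap_index]) for trial in trials]
    f0_global = min(f0_local)  # minimum local baseline across trials
    return [[(f - f0) / f0_global for f in trial]
            for trial, f0 in zip(trials, f0_local)]

def peak_dff(dff, ap_index, frame_rate_hz, search_ms, half_width_ms=50.0):
    """Mean dF/F within +/- half_width_ms of the maximum found after the first AP."""
    n_search = round(search_ms * frame_rate_hz / 1000.0)
    n_half = round(half_width_ms * frame_rate_hz / 1000.0)
    window = dff[ap_index:ap_index + n_search + 1]
    t_max = ap_index + max(range(len(window)), key=window.__getitem__)
    lo = max(t_max - n_half, 0)
    return mean(dff[lo:t_max + n_half + 1])

# Synthetic trials at a hypothetical 100 Hz frame rate; first AP at frame 10.
responsive = [100.0] * 20 + [150.0] * 5 + [151.0] + [150.0] * 5 + [100.0] * 29
silent = [110.0] * 60
dff = delta_f_over_f([responsive, silent], ap_index=10, frame_rate_hz=100.0)
peak = peak_dff(dff[0], ap_index=10, frame_rate_hz=100.0, search_ms=500.0)  # GCaMP6s window
```

Normalizing by the global rather than the local baseline means that trials starting from an elevated baseline are not assigned an artificially small ΔF/F.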

Fluorescence-to-photon conversion

Mean and variance of the fluorescence, calculated pixelwise for each image, were linearly related, consistent with shot noise-limited imaging.

The resulting slope and offset of the least squares fit were used to convert fluorescence to number of photons: photons = (F – [–offset/slope])/slope (http://github.com/AllenInstitute/QC_2P). To account for different pixel dwell times along the resonant scanning axis, photon gain and offset were computed pixel-by-pixel along the resonant axis.
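The conversion can be illustrated with a short Python sketch, which is not the code in the QC_2P repository. It assumes a simple detector model, F = gain × N + zero level with Poisson-distributed photon count N, so that variance = gain × (mean – zero level). Under that assumption the fitted slope is the gain and –offset/slope is the zero level. Function names and example values are hypothetical, and the pixel-by-pixel treatment along the resonant axis is omitted.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + offset."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def to_photons(f, slope, offset):
    """photons = (F - [-offset/slope]) / slope."""
    zero_level = -offset / slope  # fluorescence at which the fitted variance is zero
    return (f - zero_level) / slope

# Synthetic mean/variance pairs for a hypothetical gain of 2.5 and zero level of 100.
gain, zero = 2.5, 100.0
photon_rates = [1.0, 2.0, 5.0, 10.0, 20.0]
means = [gain * lam + zero for lam in photon_rates]
variances = [gain ** 2 * lam for lam in photon_rates]

slope, offset = fit_line(means, variances)
photons = to_photons(150.0, slope, offset)
```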

Trial-to-trial variability

For each neuron, fluorescence was summed over all somatic pixels and converted to photons. For each 1 AP event, mean photon count 0.1–0 s before the AP was subtracted. tmax, the time of the maximum photon count, was calculated from the mean 1 AP trace. Photon count in each trial was determined at tmax, and the 95% confidence interval was calculated as mean (across trials)±1.96 * mean peak. The percentage of trials with peak fluorescence outside the 95% confidence interval was used as a measure of trial-to-trial variability.

Ground truth-optimized event detection

We compared fluorescence traces of the response (1 AP or 2 AP events) to that of 0 AP events (Chen et al., 2013). For each recording, the mean response trace was used as the template vector. The template vector was normalized after subtracting the mean to create the unit vector, and the scalar results of projecting the response and noise traces on the unit vector were computed: ri and ni for response and noise scalars, respectively. The detection threshold was defined as the xth percentile of ni values, where 1–x represented the false positive probability (e.g. x = 95 for 95th percentile or 5% false positive probability), and the detection probability (true positive probability) was the fraction of ri values above the detection threshold.
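The projection-based detector described above can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the authors' analysis code: the function names are hypothetical, the percentile uses linear interpolation, and the response and noise traces are synthetic.

```python
import math
import random

def unit_template(mean_response):
    """Mean-subtracted template scaled to unit length."""
    m = sum(mean_response) / len(mean_response)
    centered = [v - m for v in mean_response]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

def project(trace, unit):
    """Scalar projection of a trace onto the unit template."""
    return sum(t * u for t, u in zip(trace, unit))

def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def detect(response_traces, noise_traces, x=95.0):
    """Return (detection probability, threshold, response scalars, noise scalars)."""
    n_frames = len(response_traces[0])
    mean_resp = [sum(tr[i] for tr in response_traces) / len(response_traces)
                 for i in range(n_frames)]
    unit = unit_template(mean_resp)
    r = [project(tr, unit) for tr in response_traces]
    n = [project(tr, unit) for tr in noise_traces]
    threshold = percentile(n, x)  # 1 - x/100 is the false positive probability
    return sum(ri > threshold for ri in r) / len(r), threshold, r, n

# Synthetic data: a decaying kernel plus unit-variance noise vs. noise alone.
rng = random.Random(0)
kernel = [5.0 * math.exp(-i / 10.0) for i in range(30)]
noise_traces = [[rng.gauss(0.0, 1.0) for _ in range(30)] for _ in range(200)]
response_traces = [[k + rng.gauss(0.0, 1.0) for k in kernel] for _ in range(100)]

p_detect, threshold, r, n = detect(response_traces, noise_traces)
false_positive = sum(ni > threshold for ni in n) / len(n)
```

Because the unit vector sums to zero, a constant fluorescence offset in a trace does not contribute to its projection.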

Image downsampling

Fluorescence movies were sub-sampled by a factor of 4 in space and 5 in time (to pixel size 0.80 µm and frame rate 30.3 Hz) to match the sampling rate and approximate number of pixels per soma of the Allen Brain Observatory (de Vries et al., 2020). To assess the effect of downsampling on subsequent processing, 20 different approaches were tried in parallel (downsampling starting with the 1st, 2nd,…, 4th pixel × 1st, 2nd,…, 5th frame, respectively). 4 × 5 internally identical blocks, one block for each downsampling strategy, were tiled for a total of 400 almost identical ROIs per recording. Segmentation to find somatic ROIs, demixing of traces from nearby somata, neuropil subtraction, and the calculation of ΔF/F were performed as described for the Allen Brain Observatory (de Vries et al., 2020).
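The 20 sub-sampling variants could be generated along the following lines. This Python sketch assumes that sub-sampling means keeping every 4th pixel and every 5th frame (striding rather than binning) and that the same pixel offset is applied to both image axes; neither detail is stated explicitly above. The function names and the small test movie are hypothetical, and the tiling into blocks is not shown.

```python
def subsample(movie, pixel_offset, frame_offset, space_step=4, time_step=5):
    """movie[t][y][x] as nested lists; keep every 4th pixel and every 5th frame."""
    return [
        [row[pixel_offset::space_step] for row in frame[pixel_offset::space_step]]
        for frame in movie[frame_offset::time_step]
    ]

def all_subsamplings(movie, space_step=4, time_step=5):
    """One sub-sampled movie per (pixel offset, frame offset) pair: 4 x 5 = 20."""
    return {
        (p, f): subsample(movie, p, f, space_step, time_step)
        for p in range(space_step)
        for f in range(time_step)
    }

# Small synthetic movie (10 frames of 8 x 8 pixels) with a unique value per voxel.
movie = [[[t * 100 + y * 8 + x for x in range(8)] for y in range(8)]
         for t in range(10)]
variants = all_subsamplings(movie)
```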

In cases where only the neuron of interest was found during segmentation, trace extraction would yield a family of 400 self-similar traces and the median was used for subsequent analysis. To catch cases where segmentation yielded additional objects that were not part of the neuron of interest, additional QC steps were required. The traces were first clustered using DBSCAN (Ester et al., 1996; Schubert et al., 2017), and each cluster median was compared against white noise of the same mean and standard deviation (KS test) and rejected as artifact if it was not significantly different from the noise (significance criterion p<0.05). In cases where multiple clusters were significantly different from noise, this was either due to multiple neurons being present in the field of view or due to residual motion artifacts resulting in multiple translated copies of the same neuron. To disambiguate these two possibilities, the top three clusters were merged: sums were computed for all six possible combinations (sampled without replacement) of (up to) three most distinct cluster medians, and the combination most significantly correlated with the measured electrophysiological AP train was selected for subsequent analysis. Correlation significance was determined by building a null distribution of correlations between the cluster medians and 1000 random Poisson trains with a rate matching that of the recorded AP train. If there was no more significant correlation between any cluster median (or sum thereof) and the measured AP train than the 0.5th percentile of the null distribution (i.e. p>0.005), the recording was rejected. Finally, we eliminated from further analysis <10 neurons with an abrupt and sustained (seconds) rise in spike rate and subsequent loss of spiking activity, out of concern that this activity pattern might indicate a breached plasma membrane.
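The significance test against rate-matched Poisson trains can be sketched in Python as below. The sketch covers only the null-distribution step for a single candidate trace (clustering and merging of cluster medians are not shown). The empirical p-value, taken as the fraction of null correlations at least as large as the observed one, is an assumption about how significance was scored, and the function names, bin width, and synthetic data are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0

def poisson_counts(rate_hz, n_bins, bin_s, rng):
    """Binned spike counts of a homogeneous Poisson process."""
    counts = [0] * n_bins
    t = rng.expovariate(rate_hz)
    while t < n_bins * bin_s:
        counts[min(int(t / bin_s), n_bins - 1)] += 1
        t += rng.expovariate(rate_hz)
    return counts

def correlation_p_value(trace, ap_counts, bin_s, n_null=1000, seed=0):
    """Observed trace/AP correlation and its empirical p-value against Poisson trains."""
    rng = random.Random(seed)
    rate_hz = sum(ap_counts) / (len(ap_counts) * bin_s)  # match the recorded AP rate
    observed = pearson(trace, ap_counts)
    null = [pearson(trace, poisson_counts(rate_hz, len(trace), bin_s, rng))
            for _ in range(n_null)]
    return observed, sum(c >= observed for c in null) / n_null

# Synthetic example: a fluorescence-like trace driven by a known AP train.
rng = random.Random(1)
bin_s, n_bins = 0.033, 900
ap_counts = poisson_counts(3.0, n_bins, bin_s, rng)
kernel = [math.exp(-k / 10.0) for k in range(40)]
trace = [rng.gauss(0.0, 0.2) for _ in range(n_bins)]
for i, c in enumerate(ap_counts):
    for k, w in enumerate(kernel):
        if c and i + k < n_bins:
            trace[i + k] += c * w

observed, p_value = correlation_p_value(trace, ap_counts, bin_s)
passed = p_value <= 0.005
```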

To compare the noise characteristics of the downsampled images to the Allen Brain Observatory, we computed the robust standard deviation, a median-based method with outlier removal (de Vries et al., 2020). For the Allen Brain Observatory, we analyzed fluorescence over periods in which there were no apparent AP-evoked changes in fluorescence.

Acknowledgements

We are grateful to the Animal Care, Transgenic Colony Management, and Lab Animal Services teams for mouse husbandry, and to Carol Thompson and John Phillips for providing project management support. We thank Karel Svoboda, Hod Dana, and Tsai-Wen Chen for sharing analysis software. This work was funded by the Allen Institute for Brain Science. This work was also supported by grants from the National Institutes of Health (R01EB026908) to MAB, and from the National Natural Science Foundation of China (NSFC31871055) and Guangdong Science and Technology Department (2020B1212060018, 2017B030314026 and 2018B030334001) to LL. We thank the Allen Institute founders, Paul G Allen and Jody Allen, for their vision, encouragement, and support.

Funding Statement

This work is funded by the Allen Institute for Brain Science. The funder had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Hongkui Zeng, Email: hongkuiz@alleninstitute.org.

Jack Waters, Email: jackw@alleninstitute.org.

Lu Li, Email: lilu67@mail.sysu.edu.cn.

Gary L Westbrook, Oregon Health and Science University, United States.

Funding Information

This paper was supported by the following grants:

  • Allen Institute for Brain Science Program funds to Lawrence Huang, Peter Ledochowitsch, Ulf Knoblich, Jérôme Lecoq, Gabe J Murphy, R Clay Reid, Saskia EJ de Vries, Christof Koch, Hongkui Zeng, Michael A Buice, Jack Waters, Lu Li.

  • National Natural Science Foundation of China NSFC31871055 to Lu Li.

  • Guangdong Science and Technology Department 2017B030314026 to Lu Li.

  • Guangdong Science and Technology Department 2018B030334001 to Lu Li.

  • Guangdong Science and Technology Department 2020B1212060018 to Lu Li.

  • National Institutes of Health R01EB026908 to Michael A Buice.

Additional information

Competing interests

No competing interests declared.

Author contributions

Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft.

Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft.

Data curation, Investigation, Methodology.

Formal analysis, Investigation, Writing - review and editing.

Formal analysis, Supervision.

Conceptualization, Supervision.

Formal analysis.

Conceptualization, Supervision, Funding acquisition, Writing - review and editing.

Conceptualization, Formal analysis, Supervision, Investigation, Writing - review and editing.

Formal analysis, Supervision, Investigation, Methodology, Writing - review and editing.

Formal analysis, Investigation, Methodology, Writing - review and editing.

Data curation, Formal analysis, Supervision, Investigation, Methodology, Writing - review and editing.

Ethics

Animal experimentation: Experimental procedures were conducted in accordance with NIH guidelines and approved by the Institutional Animal Care and Use Committee (IACUC) of the Allen Institute for Brain Science under protocol number 1509.

Additional files

Transparent reporting form

Data availability

All data generated and analyzed in this study are available at https://portal.brain-map.org/explore/circuits/oephys.

The following dataset was generated:

Huang L, Ledochowitsch P, Knoblich U, Lecoq J, Murphy GJ, Reid RC, de Vries SEJ, Koch C, Zeng H, Buice MA, Waters J, Li L. 2019. Ophys/Ephys Calibration Data. Allen Brain Map. circuits/oephys

References

  1. Akerboom J, Chen TW, Wardill TJ, Tian L, Marvin JS, Mutlu S, Calderón NC, Esposti F, Borghuis BG, Sun XR, Gordus A, Orger MB, Portugues R, Engert F, Macklin JJ, Filosa A, Aggarwal A, Kerr RA, Takagi R, Kracun S, Shigetomi E, Khakh BS, Baier H, Lagnado L, Wang SS, Bargmann CI, Kimmel BE, Jayaraman V, Svoboda K, Kim DS, Schreiter ER, Looger LL. Optimization of a GCaMP calcium indicator for neural activity imaging. Journal of Neuroscience. 2012;32:13819–13840. doi: 10.1523/JNEUROSCI.2601-12.2012.
  2. Berens P, Freeman J, Deneux T, Chenkov N, McColgan T, Speiser A, Macke JH, Turaga SC, Mineault P, Rupprecht P, Gerhard S, Friedrich RW, Friedrich J, Paninski L, Pachitariu M, Harris KD, Bolte B, Machado TA, Ringach D, Stone J, Rogerson LE, Sofroniew NJ, Reimer J, Froudarakis E, Euler T, Román Rosón M, Theis L, Tolias AS, Bethge M. Community-based benchmarking improves spike rate inference from two-photon calcium imaging data. PLOS Computational Biology. 2018;14:e1006157. doi: 10.1371/journal.pcbi.1006157.
  3. Chen TW, Wardill TJ, Sun Y, Pulver SR, Renninger SL, Baohan A, Schreiter ER, Kerr RA, Orger MB, Jayaraman V, Looger LL, Svoboda K, Kim DS. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature. 2013;499:295–300. doi: 10.1038/nature12354.
  4. Daigle TL, Madisen L, Hage TA, Valley MT, Knoblich U, Larsen RS, Takeno MM, Huang L, Gu H, Larsen R, Mills M, Bosma-Moody A, Siverts LA, Walker M, Graybuck LT, Yao Z, Fong O, Nguyen TN, Garren E, Lenz GH, Chavarha M, Pendergraft J, Harrington J, Hirokawa KE, Harris JA, Nicovich PR, McGraw MJ, Ollerenshaw DR, Smith KA, Baker CA, Ting JT, Sunkin SM, Lecoq J, Lin MZ, Boyden ES, Murphy GJ, da Costa NM, Waters J, Li L, Tasic B, Zeng H. A suite of transgenic driver and reporter mouse lines with enhanced brain-cell-type targeting and functionality. Cell. 2018;174:465–480. doi: 10.1016/j.cell.2018.06.035.
  5. Dana H, Chen T-W, Hu A, Shields BC, Guo C, Looger LL, Kim DS, Svoboda K. Thy1-GCaMP6 transgenic mice for neuronal population imaging in vivo. PLOS ONE. 2014;9:e108697. doi: 10.1371/journal.pone.0108697.
  6. Dana H, Mohar B, Sun Y, Narayan S, Gordus A, Hasseman JP, Tsegaye G, Holt GT, Hu A, Walpita D, Patel R, Macklin JJ, Bargmann CI, Ahrens MB, Schreiter ER, Jayaraman V, Looger LL, Svoboda K, Kim DS. Sensitive red protein calcium indicators for imaging neural activity. eLife. 2016;5:e12727. doi: 10.7554/eLife.12727.
  7. de Vries SEJ, Lecoq JA, Buice MA, Groblewski PA, Ocker GK, Oliver M, Feng D, Cain N, Ledochowitsch P, Millman D, Roll K, Garrett M, Keenan T, Kuan L, Mihalas S, Olsen S, Thompson C, Wakeman W, Waters J, Williams D, Barber C, Berbesque N, Blanchard B, Bowles N, Caldejon SD, Casal L, Cho A, Cross S, Dang C, Dolbeare T, Edwards M, Galbraith J, Gaudreault N, Gilbert TL, Griffin F, Hargrave P, Howard R, Huang L, Jewell S, Keller N, Knoblich U, Larkin JD, Larsen R, Lau C, Lee E, Lee F, Leon A, Li L, Long F, Luviano J, Mace K, Nguyen T, Perkins J, Robertson M, Seid S, Shea-Brown E, Shi J, Sjoquist N, Slaughterbeck C, Sullivan D, Valenza R, White C, Williford A, Witten DM, Zhuang J, Zeng H, Farrell C, Ng L, Bernard A, Phillips JW, Reid RC, Koch C. A large-scale standardized physiological survey reveals functional organization of the mouse visual cortex. Nature Neuroscience. 2020;23:138–151. doi: 10.1038/s41593-019-0550-9.
  8. Deneux T, Kaszas A, Szalay G, Katona G, Lakner T, Grinvald A, Rózsa B, Vanzetta I. Accurate spike estimation from noisy calcium signals for ultrafast three-dimensional imaging of large neuronal populations in vivo. Nature Communications. 2016;7:12190. doi: 10.1038/ncomms12190.
  9. Dombeck DA, Khabbaz AN, Collman F, Adelman TL, Tank DW. Imaging large-scale neural activity with cellular resolution in awake, mobile mice. Neuron. 2007;56:43–57. doi: 10.1016/j.neuron.2007.08.003.
  10. Ester M, Kriegel H-P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, (AAAI Press); 1996. pp. 226–231.
  11. Franco SJ, Gil-Sanz C, Martinez-Garay I, Espinosa A, Harkins-Perry SR, Ramos C, Müller U. Fate-restricted neural progenitors in the mammalian cerebral cortex. Science. 2012;337:746–749. doi: 10.1126/science.1223616.
  12. Friedrich J, Zhou P, Paninski L. Fast online deconvolution of calcium imaging data. PLOS Computational Biology. 2017;13:e1005423. doi: 10.1371/journal.pcbi.1005423.
  13. Gorski JA, Talley T, Qiu M, Puelles L, Rubenstein JL, Jones KR. Cortical excitatory neurons and glia, but not GABAergic neurons, are produced in the Emx1-expressing lineage. The Journal of Neuroscience. 2002;22:6309–6314. doi: 10.1523/JNEUROSCI.22-15-06309.2002.
  14. Harris JA, Hirokawa KE, Sorensen SA, Gu H, Mills M, Ng LL, Bohn P, Mortrud M, Ouellette B, Kidney J, Smith KA, Dang C, Sunkin S, Bernard A, Oh SW, Madisen L, Zeng H. Anatomical characterization of Cre driver mice for neural circuit mapping and manipulation. Frontiers in Neural Circuits. 2014;8:76. doi: 10.3389/fncir.2014.00076.
  15. Jewell SW, Hocking TD, Fearnhead P, Witten DM. Fast nonconvex deconvolution of calcium imaging data. Biostatistics. 2020;21:709–726. doi: 10.1093/biostatistics/kxy083. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Kerlin AM, Andermann ML, Berezovskii VK, Reid RC. Broadly tuned response properties of diverse inhibitory neuron subtypes in mouse visual cortex. Neuron. 2010;67:858–871. doi: 10.1016/j.neuron.2010.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Kitamura K, Judkewitz B, Kano M, Denk W, Häusser M. Targeted patch-clamp recordings and single-cell electroporation of unlabeled neurons in vivo. Nature Methods. 2008;5:61–67. doi: 10.1038/nmeth1150. [DOI] [PubMed] [Google Scholar]
  18. Knoblich U, Huang L, Zeng H, Li L. Neuronal cell-subtype specificity of neural synchronization in mouse primary visual cortex. Nature Communications. 2019;10:2533. doi: 10.1038/s41467-019-10498-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Luo L, Callaway EM, Svoboda K. Genetic dissection of neural circuits: a decade of progress. Neuron. 2018;98:256–281. doi: 10.1016/j.neuron.2018.03.040. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Madisen L, Garner AR, Shimaoka D, Chuong AS, Klapoetke NC, Li L, van der Bourg A, Niino Y, Egolf L, Monetti C, Gu H, Mills M, Cheng A, Tasic B, Nguyen TN, Sunkin SM, Benucci A, Nagy A, Miyawaki A, Helmchen F, Empson RM, Knöpfel T, Boyden ES, Reid RC, Carandini M, Zeng H. Transgenic mice for intersectional targeting of neural sensors and effectors with high specificity and performance. Neuron. 2015;85:942–958. doi: 10.1016/j.neuron.2015.02.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Margrie TW, Meyer AH, Caputi A, Monyer H, Hasan MT, Schaefer AT, Denk W, Brecht M. Targeted whole-cell recordings in the mammalian brain in vivo. Neuron. 2003;39:911–918. doi: 10.1016/j.neuron.2003.08.012. [DOI] [PubMed] [Google Scholar]
  22. Pachitariu M, Stringer C, Harris KD. Robustness of spike deconvolution for neuronal calcium imaging. The Journal of Neuroscience. 2018;38:7976–7985. doi: 10.1523/JNEUROSCI.3339-17.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Schubert E, Sander J, Ester M, Kriegel HP, Xu X. DBSCAN revisited, revisited: why and how you should (Still) Use DBSCAN. ACM Trans Database Syst. 2017;42:1–21. [Google Scholar]
  24. Steinmetz NA, Buetfering C, Lecoq J, Lee CR, Peters AJ, Jacobs EAK, Coen P, Ollerenshaw DR, Valley MT, de Vries SEJ, Garrett M, Zhuang J, Groblewski PA, Manavi S, Miles J, White C, Lee E, Griffin F, Larkin JD, Roll K, Cross S, Nguyen TV, Larsen R, Pendergraft J, Daigle T, Tasic B, Thompson CL, Waters J, Olsen S, Margolis DJ, Zeng H, Hausser M, Carandini M, Harris KD. Aberrant cortical activity in multiple GCaMP6-Expressing transgenic mouse lines. Eneuro. 2017;4:ENEURO.0207-17.2017. doi: 10.1523/ENEURO.0207-17.2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Theis L, Berens P, Froudarakis E, Reimer J, Román Rosón M, Baden T, Euler T, Tolias AS, Bethge M. Benchmarking Spike Rate Inference in Population Calcium Imaging. Neuron. 2016;90:471–482. doi: 10.1016/j.neuron.2016.04.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Wei Z, Lin B-J, Chen T-W, Daie K, Svoboda K, Druckmann S. A comparison of neuronal population dynamics measured with calcium imaging and electrophysiology. bioRxiv. 2019 doi: 10.1101/840686. [DOI] [PMC free article] [PubMed]
  27. Wekselblatt JB, Flister ED, Piscopo DM, Niell CM. Large-scale imaging of cortical dynamics during sensory perception and behavior. Journal of Neurophysiology. 2016;115:2852–2866. doi: 10.1152/jn.01056.2015. [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision letter

Editor: Gary L Westbrook1
Reviewed by: Karel Svoboda2, Michael Higley3, Bernardo L Sabatini4

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The study by Huang, Knoblich et al. represents an important contribution to the field, providing critical examination of in vivo 2-photon calcium imaging for the detection of underlying spike events. Overall, the work is very high quality. The demonstration that spike detection is ~15% under normal "low zoom" imaging conditions is a stunning observation that should be a wake-up call to large parts of the community. The results are somewhat sobering for investigators in the sense that no one-size-fits-all strategy accurately extracted spiking from fluorescence data under commonly used conditions.

Decision letter after peer review:

Thank you for submitting your article "Relationship between spiking activity and simultaneous fluorescence signals in transgenic mice expressing GCaMP6" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by Gary Westbrook as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Karel Svoboda (Reviewer #1); Michael Higley (Reviewer #2); Bernardo L Sabatini (Reviewer #3).

The reviewers have discussed the reviews with one another and the Senior Editor has drafted this decision to help you prepare a revised submission.

Summary

This manuscript explores an important topic with data that is difficult to obtain. All reviewers thought the manuscript was worth publishing in eLife after appropriate revisions. However, all reviewers had concerns (some overlapping) that are important to address. Many of the comments can be addressed with clarifications, rewording and better explanation/analysis of some aspects of the work. However, the manuscript contains some statements and conclusions that indicate incomplete understanding of sources of noise in the imaging experiments. This is important because the paper is really focused on detection. The full comments of the reviewers are below.

Reviewer #1:

Calcium imaging is widely used to track activity in large populations of neurons. Calcium-dependent fluorescence is often thought of as “activity”, but it is unclear what this means because the spike to fluorescence transform is not well understood. As a result the interpretation of calcium imaging data is often superficial and misleading. This is in part because ground truth data (i.e. simultaneous imaging and recording) is scarce. The major contribution of this paper is the report of a substantial bolus of additional ground truth data in several widely used transgenic mouse lines. This is hard-won data and I support publication. But some work is required first.

The take-home messages in this paper are: There are differences in spike detection across mouse lines, with 6s-expressing mice outperforming 6f-expressing mice. This is expected.

There are differences in detection across mice expressing the same indicator:

These differences across mice expressing the same indicator are not explained. My suspicion is that differences in neuropil (i.e. background) are likely the culprit (emx1 expresses in L4 – with lots of axons in L2/3; CamK2 does not). Explain this and the strange claim that the noise level is different in the emx1-s vs the tetO-s mice.

There is variability in the response to single spikes:

No attempt is made to distinguish interesting biological sources of noise (e.g. spike calcium coupling) from non-interesting biology (movement) and non-biological explanations (instrumentation). The paper would be stronger if the sources of noise were analyzed better. In particular, are the measurements done in a shot-noise limited regime? Does movement contribute? How about other instrumentation noise (shouldn't but still)?

Lower zoom imaging produces lower SNR than higher zoom imaging:

The section “Comparing spike-to-calcium fluorescence response curves imaged at high and low spatiotemporal resolutions” is also strange. It's not clear to me what exactly the point is here.

Obviously, everything else being equal, increasing the FOV from 20 to 400 μm reduces the light dose per neuron by a factor of 400 and thus the SNR by a factor of 20 (see Peron et al., CONB 2015) just based on shot-noise alone! No one would image a 20 μm FOV with the same power as a 400 μm FOV.

Reviewer #2:

The study by Huang, Knoblich et al. represents a very important contribution to the field, providing critical examination of in vivo 2-photon calcium imaging for the detection of underlying spike events. Overall, the work is very high quality, and I have no concerns or suggestions with regard to data collection. I do have several major points on analyses that need to be addressed, detailed below. Overall, the work reads as if the major focus is the comparison of different transgenic GCaMP6 lines. While this topic is interesting, the far more important issue is the ability to estimate spiking from imaging data under "real world" conditions. Thus, far more emphasis needs to be placed on the "low zoom" data. The difference between the mouse lines is modest at best. However, the demonstration that spike detection is ~15% under normal imaging conditions is a stunning observation that should be a wake-up call to large parts of the community.

1) As noted, the high impact value of the study is on the "low zoom" data, as this represents the situation for the vast majority of experimental labs using GCaMP6. All analyses in the manuscript, including the examples comparing ΔF/F and spiking (Figures 1-6) need to be repeated for the low-zoom data. The analysis of neuropil correction is absolutely critical, as this may play a much larger role in the reduced spatiotemporal sampling regime. I would actually suggest making these analyses the major focus, rather than limited to Figure 7.

2) The paired statistical comparisons of single spike signals with a random period (e.g., Figures 4—figure supplement 1 and Figure 7—figure supplement 1) are not very informative. The fact that there is an average difference is far less important than the discriminability of true spikes from noise.

3) It is unclear how the 91 selected cells were chosen for "high quality recording and imaging". It would be useful to know how the results change for "lower quality" imaging, as this may better inform experimentalists on data collection.

4) The authors should make some attempt to explain why the spike detection is so much poorer at low zoom. Is it the fewer pixels per cell (factor of 16) or the lower sampling rate (factor of 4-5)? Disambiguating these contributors would better help the field. For example, how does spatially or temporally down-sampling the high-zoom data affect spike detection?

5) The sensitivity and ROC analyses assume that the only way to extract spike information from a fluorescence trace is to do a linear thresholding on ΔF/F amplitude, but many spike extraction methods take into account multiple properties of the shape of fluorescence transients. It would be beneficial if the authors could apply some of the most commonly used spike extraction algorithms to their data in order to benchmark/validate them (particularly under the low-zoom conditions).

6) Please address the accuracy of spike detection under low-zoom for all locations across the FOV. At low zoom, most commercial two-photon microscopes have increased PSF and reduced photon collection efficiency at the edges of a large field.

7) Please explain why the imaging data were temporally smoothed. This is non-standard, and it is important to know how the analyses apply to conventional approaches.

8) It is unclear why the tetO-GCaMP6s have a higher noise floor. Are there any systematic differences in the way these data were collected (different anesthetic, different depth, different amounts of brain motion, different age of mice) that could explain this?

9) Please state the ages of each individual animal used in this study. Is age a confound? Is the distribution of ages matched across mouse lines? For viral expression of GCaMP, time of expression is an important variable. What role does it play in transgenic mice?

Reviewer #3:

This is a well done and systematic comparison of the relationship between electrophysiologically recorded action potentials and GECI-reported fluorescence transients in a variety of transgenic mouse lines used for such recordings. The authors carefully compare the ability to detect single and 5 spike events. Many people will read this study and find its results useful in designing their own experiments and in analyzing their data.

A few points need consideration

1) The authors only use 91 out of the 237 cells collected. If a selection was made based on the quality of the imaging data, this may strongly impact the results. How were the cells chosen for inclusion? What happens if the other cells are analyzed?

2) Only peri-cell annular neuropil fluorescence correction is attempted. What if CNMF-type algorithms are used? Does this yield substantially different results? It is not clear that the fixed r-value subtractive approach is necessarily the best when trying to detect single spikes. (I am happy to be convinced otherwise if I am wrong).
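For readers unfamiliar with the fixed r-value subtractive approach raised in point 2, the following is a minimal sketch of the general idea: the neuropil trace from an annulus around the cell is scaled by a single factor r and subtracted from the somatic trace. The function name, the example traces and the value of r are illustrative assumptions, not the authors' implementation or their chosen parameters.

```python
def subtract_neuropil(f_soma, f_neuropil, r):
    """Fixed-r neuropil correction: F_corrected = F_soma - r * F_neuropil.

    f_soma and f_neuropil are equal-length sequences of raw fluorescence
    samples. r is a single contamination factor applied to every sample
    (hypothetical value below, chosen only for illustration).
    """
    if len(f_soma) != len(f_neuropil):
        raise ValueError("traces must have the same number of samples")
    return [fs - r * fn for fs, fn in zip(f_soma, f_neuropil)]


# Illustrative traces in arbitrary units, not data from the study.
corrected = subtract_neuropil([10.0, 12.0], [4.0, 4.0], r=0.5)
```

A CNMF-type method would instead estimate the background jointly with the cell signals, which is the alternative the reviewer asks about; that is not shown here.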

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your article "Relationship between simultaneously recorded spiking activity and fluorescence signal in GCaMP6 transgenic mice" for consideration by eLife. Your revised article has been reviewed by three peer reviewers, and the evaluation has been overseen by Gary Westbrook as the Senior Editor and Reviewing Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Karel Svoboda (Reviewer #1); Michael Higley (Reviewer #2); Bernardo L Sabatini (Reviewer #3). The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at eLife". Please let us know if you would like to pursue this option.

Summary

The reviewers all agreed that this data is an important resource. This manuscript reports a valuable simultaneous ephys-ophys dataset with a great number of cells recorded (N = 237) and selected (N = 91). This data set will support the development of more refined spike-to-fluorescence or fluorescence-to-spike inference models. However, inclusion of data collected under more "real world" conditions – meaning low zoom and typical laser power – would substantially increase the impact. As presented, the results remain somewhat limited without such data for comparison. The authors were requested to examine how different spike detection and neuropil subtraction algorithms would change their conclusions. Instead of including that analysis here, they have posted another manuscript to bioRxiv. Although this makes the analysis public, it fails to improve this manuscript.

A number of specific comments below require your attention before we can make a final decision regarding publication in eLife.

Essential revisions

1) There is really only minimal analysis of the data. It would greatly improve the impact of this work to compare detailed spike-to-fluorescence model parameters to other measurements using the models in Wei et al., 2019.

2) From this and previous studies it is clear that viral expression of GCaMP provides better SNR than transgenic expression. This remains a mystery and would be worthy of comment in the manuscript.

3) Of course, it is expected that SNR will decrease with zoom at constant power because of shot noise. But it is highly unlikely that investigators would use the same laser power for imaging over changes in zoom by a factor of 20. Please comment.

4) Critical references are missing. Dana et al., 2014 and Wei et al., 2019 have shown that “GCaMP6s cells have spike-triggered fluorescence responses of larger amplitude, lower variability and greater single-spike detectability than GCaMP6f” in transgenic mice.

5) Important spike-to-calcium parameters (e.g. rise and decay time constants) are missing. These parameters should be reported as part of the basic analyses; please incorporate analyses like those in Figures 2D,E,F and 3F of Chen et al., 2013. One can also use existing spike-to-fluorescence models to estimate the parameters (Wei et al., 2019).

6) Spike-event snippet creation. First, the authors should demonstrate how they chose the parameters for the snippets in each imaging condition. It is not clear that the choice of snippet parameters was optimal; for example, Emx1-s and tetO-s ΔF/F at 4 APs in Figure 2 seem not to reach the maximum within the 200 ms window. Ideally, these parameters could be determined from the estimates of rise and decay time constants. For example, if the half-rise time is 100 ms and the spike-event snippet is 200 ms, one should use a time window of at least 400 ms to capture the peak ΔF/F.

7) Neuropil removal. This is a confusing part throughout the manuscript. At the beginning, the authors preferred not to perform neuropil correction because it might increase peak ΔF/F variability (Figure 2—figure supplement 1D); later, the authors stressed the importance of neuropil correction for spike detection. The authors could offer a systematic study to address whether neuropil should be removed and how. It seems like the optimal r for each cell should be used throughout the manuscript. One study, Kerlin et al., 2010 (this paper should be cited), provided some answers to this.

8) ΔF/F computation. In general one would take a long time-window to compute F0 with the background subtraction in the denominator, where F0 aims to reflect 0AP fluorescence. Computing locally as the mean within 50 or 20 ms before the first spike event is risky, because F0 in a short window can be contaminated by the previous calcium decay after a many-AP event e.g. F0 around 5-AP in Figure 1B.

9) Peak ΔF/F variability. First, it is not clear how the mean coefficient of variation of ΔF/F peak (a main measure of peak ΔF/F variability throughout the manuscript) was defined and computed. Is the coefficient computed as a function of the number of spikes? How was it computed in a given nAP case, e.g. Figure 4A right or Figure 4B? Second, although shot noise is dominant on single pixels, at high sample rates or in low brightness conditions, variability would be reduced when computing F by averaging over N pixels. The analysis on single pixels is somewhat misleading. Third, trial-by-trial variability is important but the measurement is not clear. In general, the peak ΔF/F should depend on the time series of the spikes. It roughly depends on the number of spikes in a brief time window. One would thus expect two sources of trial-by-trial variance: one depends on the number of spikes; the other depends on the spike pattern when the number of spikes is fixed. The authors should be able to decompose the variability into these terms.

10) The finding that variance in response amplitude is no larger than expected from shot-noise is surprising and unlikely to be true. There are many reasons why the coupling between AP and Ca might be variable (modulation, baseline potential, state of channels, channel fluctuations). Make sure this analysis is correct.

11) Lastly, as demonstrated above, the peak ΔF/F does not directly depend on behavioral condition when the spike pattern is given. Yet the difference in spon vs visstim in Cux2-f mice is striking. Can this difference be explained by the spike pattern difference in spon vs visstim conditions? If not, what causes this difference?

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Relationship between simultaneously recorded spiking activity and fluorescence signal in GCaMP6 transgenic mice" for further consideration by eLife. Your revised article has been evaluated by Gary Westbrook (Senior Editor) and Reviewers 2 and 3 from the original submission.

We are satisfied with the data in the revised manuscript, but ask that you revise the text in accord with the comments below as we believe that these changes will substantially improve the impact of your important work.

Reviewer #2:

Far more emphasis needs to be placed on the "low zoom" data. The difference between the mouse lines is modest at best. However, the demonstration that spike detection is ~15% under normal imaging conditions is a stunning observation that should be a wake-up call to large parts of the community.

Reviewer #3:

The authors have done a great deal to improve the study. There are some remaining points that can be easily addressed to improve the study even further.

1) There is no description of what cells the Cux2 and Emx1 Cre lines target. It would help make the study more approachable and useful to include this information here.

2) The authors report that there is no epileptiform activity in the mice they use. Do they not see such activity in these mice in their hands or do they select individual mice without seizures? How do they judge the presence of epileptiform activity?

3) In light of point 2, the fraction of spikes that are in groups of 1-5 spike bursts is very different for GCaMP6s and GCaMP6f mice. This is a metric measured by the cell-attached recording. Doesn't this indicate that the transgene has a large effect on cellular or circuit activity patterns?

4) The authors use QC metrics to reduce the number of cells analyzed from >200 to 48. Can they report what factors result in 75% of the cells being rejected? Do they know which rejection factors actually impact the ability to accurately infer spiking from fluorescence? Such insight would be very useful for others who don't have cell-attached recordings but want to be able to understand what cells to include in their final analyses.

5) Lastly, the Discussion leaves something to be desired. It is a synopsis of the conclusions and iterates some rather obvious points but also makes some claims without backing. It is not clear that the statement about GFP quantum yield being the limiting factor to further improvement of single-spike detection is correct. The Ca-sensitive fluorophores may not get brighter, but their linearity, ΔF/F, stability etc. may improve. Given the assumption that photon budget is limiting, the statement that voltage sensors may provide the solution seems wrong as these will provide many fewer photons per AP. The authors could instead use the Discussion to provide some helpful hints to imagers on making the best use of their data based on the analyses presented.

eLife. 2021 Mar 8;10:e51675. doi: 10.7554/eLife.51675.sa2

Author response


Summary

This manuscript explores an important topic with data that is difficult to obtain. All reviewers thought the manuscript was worth publishing in eLife after appropriate revisions. However, all reviewers had concerns (some overlapping) that are important to address. Many of the comments can be addressed with clarifications, rewording and better explanation/analysis of some aspects of the work. However, the manuscript contains some statements and conclusions that indicate incomplete understanding of sources of noise in the imaging experiments. This is important because the paper is really focused on detection. The full comments of the reviewers are below.

We thank the reviewers for their constructive comments and suggestions, which have been extremely helpful in our effort to improve the manuscript. In the revised manuscript submitted here, we have attempted to address all the points raised by the reviewers. In particular, the major part of newly added analyses is focused on sources of noise in high and low zoom imaging experiments. Here we provide a brief summary of the major changes we have made:

Removed insufficiently substantiated conclusions about the difference between tetO-s and other mouse lines (sample size too small for tetO-s) and about the difference in response curve slope between high zoom and low zoom data (optimal neuropil subtraction may be different between zooms and was not considered). Also removed several less informative results to streamline the paper.

Added new measurements and analyses to demonstrate how signal detection difference between high and low zoom imaging conditions can indeed be explained by shot noise calculations for most cells.

Referred the reviewers to our follow-up manuscript, now available on bioRxiv, which includes systematic subsampling of our high zoom data to model the typical population calcium imaging data and application of state-of-the-art spike inference methods on the resampled data.

Clarified our quality control process that resulted in the selection of the 91 cells from our larger dataset for further analysis.

Reviewer #1:

Calcium imaging is widely used to track activity in large populations of neurons. Calcium-dependent fluorescence is often thought of as “activity”, but it is unclear what this means because the spike to fluorescence transform is not well understood. As a result the interpretation of calcium imaging data is often superficial and misleading. This is in part because ground truth data (i.e. simultaneous imaging and recording) is scarce. The major contribution of this paper is the report of a substantial bolus of additional ground truth data in several widely used transgenic mouse lines. This is hard-won data and I support publication. But some work is required first.

The take-home messages in this paper are: There are differences in spike detection across mouse lines, with 6s-expressing mice outperforming 6f-expressing mice.

This is expected.

There are differences in detection across mice expressing the same indicator:

These differences across mice expressing the same indicator are not explained. My suspicion is that differences in neuropil (i.e. background) are likely the culprit (emx1 expresses in L4 – with lots of axons in L2/3; CamK2 does not). Explain this and the strange claim that the noise level is different in the emx1-s vs the tetO-s mice.

Some of our analyses were from small numbers of recordings. The differences in spike detection and noise level across mice are examples, with the tetO-s results based on 4 cells from 1 mouse. Differences between mouse lines were statistically significant, as reported in the original manuscript, but the small number of recordings leaves us with little confidence in the apparent differences in spike detection and noise level in the Emx1-s vs tetO-s mice. Thus, we have removed these conclusions from the revised manuscript.

We think differences in neuropil fluorescence are unlikely to arise from the difference between Emx1 and Camk2a. GCaMP6s expression in both the Emx1-s (Emx1-IRES-Cre;Camk2a-tTA;Ai94) and tetO-s (Camk2a-tTA;tetO-GCaMP6s) lines is under control of the Camk2a promoter. Emx1-Cre has broader expression in cortex than Camk2a-tTA (as the reviewer mentioned), so Camk2a is the restricting factor in both lines.

There is variability in the response to single spikes:

No attempt is made to distinguish interesting biological sources of noise (e.g. spike calcium coupling) from non-interesting biology (movement) and non-biological explanations (instrumentation). The paper would be stronger if the sources of noise were analyzed better. In particular, are the measurements done in a shot-noise limited regime? Does movement contribute? How about other instrumentation noise (shouldn't but still)?

We have expanded our noise analyses and present them in the new Figure 4—figure supplement 1.

Our new analyses indicate that shot noise is the dominant noise source in most cells (Figure 4—figure supplement 1). Furthermore, most of the trial-to-trial variance in the amplitudes of calcium transients can be attributed to shot noise. Trial-to-trial variance was 0-2% greater than expected from shot noise (variance was greater than expected by 1±14% in Emx1-s, 0±8% in tetO-s and 2±7% in Cux2-f). This 0-2% additional variance is presumably the summed effects of motion, instrumentation noise and spike-to-calcium coupling. Even without separating these three minor contributions, we can conclude that there’s little trial-to-trial variability in spike-to-calcium coupling in 3 of our 4 mouse lines.

The exception is Emx1-f, in which the trial-to-trial variance is 48±91% greater than expected from shot noise. We believe the increased variability in Emx1-f may be biological (and possibly related to this line’s susceptibility to epileptiform activity) and we explore this topic further in our follow-up paper, Ledochowitsch et al. (available on bioRxiv; https://www.biorxiv.org/content/10.1101/800102v1).
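To make the quantity "variance greater than expected from shot noise" concrete, the sketch below computes a percent excess variance under a deliberately simplified assumption: that the peak signal on each trial is available as a photon count and that shot noise is Poisson, so the expected variance equals the mean count. The function name and the example counts are hypothetical, and the actual analysis in Figure 4—figure supplement 1 may differ in detail (for example, in how baseline noise and ΔF/F normalization are handled).

```python
def excess_variance_percent(peak_photon_counts):
    """Percent by which trial-to-trial variance exceeds the Poisson expectation.

    Assumes each entry is the photon count at the response peak on one trial
    and that pure shot noise would give variance equal to the mean count.
    """
    n = len(peak_photon_counts)
    if n < 2:
        raise ValueError("need at least two trials")
    mean = sum(peak_photon_counts) / n
    if mean <= 0:
        raise ValueError("mean photon count must be positive")
    # Unbiased sample variance across trials.
    observed_var = sum((c - mean) ** 2 for c in peak_photon_counts) / (n - 1)
    expected_var = mean  # Poisson shot noise: variance equals mean
    return 100.0 * (observed_var - expected_var) / expected_var


# Illustrative counts only, not data from the study.
example_excess = excess_variance_percent([85, 115, 100, 95, 105])
```

A value near zero would indicate a shot-noise-limited measurement, whereas a large positive value, as reported for Emx1-f, points to additional sources of variability.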

Lower zoom imaging produces lower SNR than higher zoom imaging:

The section “Comparing spike-to-calcium fluorescence response curves imaged at high and low spatiotemporal resolutions” is also strange. It's not clear to me what exactly the point is here.

Obviously, everything else being equal, increasing the FOV from 20 to 400 μm reduces the light dose per neuron by a factor of 400 and thus the SNR by a factor of 20 (see Peron et al., CONB 2015) just based on shot-noise alone! No one would image a 20 μm FOV with the same power as a 400 μm FOV.

In this study, we conducted both high and low zoom imaging for a subset of the cells. When switching zooms we did NOT change the laser power. As a result, the laser power used for our low zoom data may be lower than that of typical population-scale imaging experiments. Indeed, our low zoom data had lower median photon flux compared to the same mouse lines in the Allen Brain Observatory dataset. We have kept the low zoom data in the manuscript, for reasons explained below in response to this reviewer’s further comments, as well as to address some of reviewer #2’s comments on this topic.

We have now added the following analysis in the low zoom Results section. Our expected reduction of SNR between high zoom and low zoom is by a factor lower than sqrt(400) = 20. After accounting for the duty cycle ratio for photon collection between the two zoom conditions (high/low = 0.7) and for the ratio of ROI areas drawn separately for the two zoom conditions, we find the effective ratio of photon fluxes (high/low) to be ~333. Considering further that the ratio of sampling rates for the fluorescence traces is 158/30 = 5.3, the expected SNR ratio for the fluorescence traces (high/low) is actually sqrt(333 / 5.3) = 7.9. Most cells for which we have both high and low zoom recordings closely match that expected SNR ratio. There are a few outlier cells where we suspect that z-drift and/or activity differences between the high and the low zoom measurements led to deviating SNR ratios.
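The arithmetic in the preceding paragraph can be reproduced in a few lines. This is a minimal sketch that assumes a purely shot-noise-limited measurement, so that SNR scales as the square root of the photons collected per sample; the function and variable names are illustrative and do not come from the manuscript.

```python
import math

def expected_snr_ratio(photon_flux_ratio, sampling_rate_ratio):
    """Shot-noise-limited SNR ratio (high zoom / low zoom) of fluorescence traces.

    Photons per sample scale as flux / sampling rate, and SNR scales as the
    square root of photons per sample.
    """
    return math.sqrt(photon_flux_ratio / sampling_rate_ratio)

# Naive expectation from the 400-fold difference in light dose per neuron.
naive_ratio = math.sqrt(400)

# Values quoted in the response: effective photon flux ratio ~333 and
# sampling rates of 158 and 30 frames per second (ratio rounded to 5.3).
flux_ratio = 333
rate_ratio = round(158 / 30, 1)

corrected_ratio = expected_snr_ratio(flux_ratio, rate_ratio)
print(f"naive: {naive_ratio:.1f}, corrected: {corrected_ratio:.1f}")
```

Using the unrounded sampling rate ratio changes the result only in the second significant figure.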

Reviewer #2:

The study by Huang, Knoblich et al. represents a very important contribution to the field, providing critical examination of in vivo 2-photon calcium imaging for the detection of underlying spike events. Overall, the work is very high quality, and I have no concerns or suggestions with regard to data collection. I do have several major points on analyses that need to be addressed, detailed below. Overall, the work reads as if the major focus is the comparison of different transgenic GCaMP6 lines. While this topic is interesting, the far more important issue is the ability to estimate spiking from imaging data under "real world" conditions. Thus, far more emphasis needs to be placed on the "low zoom" data. The difference between the mouse lines is modest at best. However, the demonstration that spike detection is ~15% under normal imaging conditions is a stunning observation that should be a wake-up call to large parts of the community.

1) As noted, the high impact value of the study is on the "low zoom" data, as this represents the situation for the vast majority of experimental labs using GCaMP6. All analyses in the manuscript, including the examples comparing ΔF/F and spiking (Figures 1-6) need to be repeated for the low-zoom data. The analysis of neuropil correction is absolutely critical, as this may play a much larger role in the reduced spatiotemporal sampling regime. I would actually suggest making these analyses the major focus, rather than limited to Figure 7.

The revised paper includes a more thorough analysis of the low zoom data in the final Results section, “Spike detection during population imaging, at low zoom”. During the analysis process we realized that our high and low zoom data were collected at the same laser power. As a result, the low zoom data have lower photon flux and poorer SNR than many population imaging studies, limiting the conclusions we can draw about spike detection under common experimental conditions. Thus, our basic characterization is still mainly on the high zoom data.

In this revision we further investigated if the reduction in single spike detection rates in our low zoom data were directly correlated with the reduction of photon flux and found this indeed to be the case, as the expected spike detection rates for low zoom through simulation of the high zoom data matched the measured spike detection rates from the low zoom data (Figure 6C). This result then allowed us to make a simple estimate of the spike detection rates under more common 2-photon imaging conditions which use higher laser power than in our low zoom experiment. By calculating per cell photon flux directly from the Allen Brain Observatory data, we were able to estimate the single spike detection rates to be 25-35% for GCaMP6f cells in that dataset (Figure 6D). We believe that this may be a more reasonable estimate of spike detection with GCaMP6 in many population imaging studies.

We agree that there’s a need to further address spiking estimates under “real world” conditions. The vast majority of our images were collected under high zoom conditions, with inherently higher SNR than most population imaging studies. To mimic common population imaging conditions, we have resampled our high-zoom recordings both spatially and temporally. The noise profile of this resampled data set matches that of the Allen Brain Observatory and thus approximates typical population-scale imaging conditions. Our study of resampled data has become sizable and includes a comparison of spike extraction algorithms (see point 5, below). Rather than merge this complex study with the current manuscript, we have written a second manuscript that focuses more directly on spike extraction from low-zoom images, the latter created by resampling high zoom images. The follow-up manuscript, Ledochowitsch et al., is available as a preprint on bioRxiv (https://www.biorxiv.org/content/10.1101/800102v1).

2) The paired statistical comparisons of single spike signals with a random period (e.g., Figures 4—figure supplement 1 and Figure 7—figure supplement 1) are not very informative. The fact that there is an average difference is far less important than the discriminability of true spikes from noise.

We agree with the reviewer and have removed these results and figure panels to streamline the manuscript.

3) It is unclear how the 91 selected cells were chosen for "high quality recording and imaging". It would be useful to know how the results change for "lower quality" imaging, as this may better inform experimentalists on data collection.

Cell selection was based on manual assessment. Selected cells exhibited (1) no motion artifacts (after motion correction in x-y axes); (2) no photobleaching; (3) no evidence of dye filling from the pipette; and (4) stable baseline and distinguishable spikes for electrophysiological recordings. We have added this description to the manuscript.

4) The authors should make some attempt to explain why the spike detection is so much poorer at low zoom. Is it the fewer pixels per cell (factor of 16) or the lower sampling rate (factor of 4-5)? Disambiguating these contributors would better help the field. For example, how does spatially or temporally down-sampling the high-zoom data affect spike detection?

Poorer spike detection results from decreased SNR at low zoom. Since our images are shot noise limited (see responses to reviewer 1, and Figure 4—figure supplement 1), spatial and temporal downsampling are equivalent. (The effects of spatial and temporal downsampling might differ if one dimension were severely undersampled, for example, if temporal sampling were so slow that the peaks of some calcium transients were not sampled. We’ve not downsampled sufficiently in either space or time to enter this regime.) In switching from high to low zoom, spatial sampling was reduced more than temporal sampling (16x vs 5x), so the reduced photon flux at low zoom is primarily the result of reduced spatial sampling. sqrt(16) = 4 and sqrt(5) = 2.24, so spatial downsampling accounts for ~65% of the effect.
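The response does not spell out how the ~65% figure was obtained. One plausible reading, shown here only as an assumption, is that each downsampling factor contributes its square root to the SNR loss and that the spatial share is the spatial term divided by the sum of the two terms.

```python
import math

# Downsampling factors quoted in the response.
spatial_factor = 16   # fewer pixels per cell
temporal_factor = 5   # lower frame rate

# Under shot noise, each factor reduces SNR by its square root.
spatial_snr_loss = math.sqrt(spatial_factor)    # 4.0
temporal_snr_loss = math.sqrt(temporal_factor)  # about 2.24

# Assumed definition of "share of the effect": spatial term over the sum.
spatial_share = spatial_snr_loss / (spatial_snr_loss + temporal_snr_loss)
print(f"spatial share of SNR loss: {spatial_share:.0%}")
```

Under this reading the spatial share comes out at roughly two thirds, in line with the approximate figure quoted above.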

5) The sensitivity and ROC analyses assume that the only way to extract spike information from a fluorescence trace is to do a linear thresholding on ΔF/F amplitude, but many spike extraction methods take into account multiple properties of the shape of fluorescence transients. It would be beneficial if the authors could apply some of the most commonly used spike extraction algorithms to their data in order to benchmark/validate them (particularly under the low-zoom conditions).

We agree with the reviewer. The linear thresholding of ΔF/F amplitude is a basic form of spike extraction and applying more advanced spike inference algorithms might be informative. However, spike inference is a complex and active area of research. Our follow-up manuscript, Ledochowitsch et al., includes a comparison of several spike extraction algorithms, using the ground-truth dataset we present here. We think that publishing a second manuscript, where we can address spike inference specifically, is preferable to squeezing a more sophisticated analysis of spike inference into the current manuscript.

6) Please address the accuracy of spike detection under low-zoom for all locations across the FOV. At low zoom, most commercial two-photon microscopes have increased PSF and reduced photon collection efficiency at the edges of a large field.

Unfortunately, we do not have data to address this point, as all our patched cells were near the center of the FOV.

7) Please explain why the imaging data were temporally smoothed. This is non-standard, and it is important to know how the analyses apply to conventional approaches.

Our frame rate under high zoom conditions was 158 frames per second (fps), approximately 5 times the ~30 fps rate of many modern 2-photon imaging experiments. We therefore used a smoothing window of 5 frames to approximate the temporal information more typical of 2-photon imaging experiments.
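The smoothing step can be illustrated with a moving average over 5 frames. This is a sketch only: the response gives the window length but not the kernel shape or edge handling, so the centered boxcar and the shortened windows at the edges are assumptions, and `boxcar_smooth` is a hypothetical helper name.

```python
def boxcar_smooth(trace, window=5):
    """Centered moving average over `window` frames.

    Assumption: a flat (boxcar) kernel; near the edges the window is
    shortened to the samples that exist, so the output has the same
    length as the input.
    """
    half = window // 2
    smoothed = []
    for i in range(len(trace)):
        lo = max(0, i - half)
        hi = min(len(trace), i + half + 1)
        segment = trace[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed

# A single-frame spike in a 158 fps trace is spread over 5 frames,
# roughly the duration of one frame at ~30 fps.
raw = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]
print(boxcar_smooth(raw))
```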

8) It is unclear why the tetO-GCaMP6s have a higher noise floor. Are there any systematic differences in the way these data were collected (different anesthetic, different depth, different amounts of brain motion, different age of mice) that could explain this?

There were no systematic differences in the way tetO-s data were collected, with no differences in mouse age, surgical preparation, or cell depths. As discussed above (in our responses to reviewer 1), these results suffered from a small sample size and we have removed them from the manuscript.

9) Please state the ages of each individual animal used in this study. Is age a confound? Is the distribution of ages for each mouse line matched? For viral expression of GCaMP, time of expression is an important variable. What role does it play in transgenic mice?

We have added the ages of individual animals used in the study (Figure 1—figure supplement 1A). There were no significant differences in the age distributions across mouse lines. All mice were adults and one advantage of using transgenic mice is that transgene expression remains stable in adult mice over many months, as shown in our previous studies (Madisen et al., 2015; Daigle et al., 2018).

Reviewer #3:

This is a well done and systematic comparison of the relationship between electrophysiologically recorded action potentials and GECI-reported fluorescence transients in a variety of transgenic mouse lines used for such recordings. The authors carefully compare the ability to detect single and 5 spike events. Many people will read this study and find its results useful in designing their own experiments and in analyzing their data.

A few points need consideration

1) The authors only use 91 out of the 237 cells collected. If there was selection made based on the quality of the imaging data, this may strongly impact the results. How were the cells chosen for inclusion? What happens if the other cells are analyzed?

Cell selection was based on manual assessment. Selected cells exhibited (1) no motion artifacts (after motion correction in x-y axes); (2) no photobleaching; (3) no evidence of dye filling from the pipette; and (4) stable baseline and distinguishable spikes for electrophysiological recordings. We have added this description to the manuscript.

2) Only peri-cell annular neuropil fluorescence correction is attempted. What if CNMF-type algorithms are used? Does this yield substantially different results? It is not clear that the fixed r-value subtractive approach is necessarily the best when trying to detect single spikes. (I am happy to be convinced otherwise if I am wrong).

We implemented CNMF (via CaImAn) to obtain fluorescence traces to compare to our fixed r-value subtractive approach. We have not found evidence that CNMF provides superior spike detection. However, we cannot be sure that our ROI segmentation and fluorescence trace extraction using CNMF was optimal. In fact, the segmentation was not robust to small changes in input parameters, leading us to suspect that trace extraction might be far from optimal.

Unfortunately, we do not have a reliable method of tuning CNMF parameters. There are many input parameters and we have found that there are complex interactions between them so finding optimal parameters would require an extensive search of segmentation across n-dimensional space, for each dataset. Since we do not know whether our CNMF results are optimal, we cannot, in principle, be confident that CNMF could not offer improved performance.

In conclusion, we have neither evidence that CNMF is superior for spike detection nor that it’s inferior and we decided, therefore, not to add CNMF results to the manuscript.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Summary

The reviewers all agreed that this data is an important resource. This manuscript reports a valuable simultaneous ephys-ophys dataset with a great number of cells recorded (N = 237) and selected (N = 91). This data set will support the development of more refined spike-to-fluorescence or fluorescence-to-spike inference models. However, inclusion of data collected under more "real world" conditions – meaning low zoom and typical laser power – would substantially increase the impact. As presented, the results remain somewhat limited without such data for comparison. The authors were requested to examine how different spike detection and neuropil subtraction algorithms would change their conclusions. Instead of including that analysis here, they have posted another manuscript to bioRxiv. Although this makes the analysis public, it fails to improve this manuscript.

To address these concerns, we made three major changes to the manuscript:

1) We removed low zoom data from the manuscript. The laser power used for low zoom imaging was lower than typically used for population imaging, bringing into question the utility of the low zoom images. We replaced low zoom images with downsampled images (originally presented in our companion bioRxiv paper, Ledochowitsch et al.), as requested by reviewer 2 in the first round of reviews. We downsampled high zoom images to match the pixel size, frame rate and SNR of the Allen Brain Observatory and we believe these downsampled images are more representative of typical “real world” population imaging conditions.

2) We adopted the more rigorous quality control criteria outlined in Ledochowitsch et al., resulting in a smaller number of neurons in our updated dataset, now 48 neurons. More information on numbers of recordings, broken down by genotype, is provided in Table 1.

3) We include analysis of spike detection. We applied three leading spike inference models and compared event detection under ideal (high zoom) and real-world-like (downsampled) conditions.

Essential revisions

1) There is really only minimal analysis of the data. It would greatly improve the impact of this work to compare detailed spike-to-fluorescence model parameters to other measurements using the models in Wei et al., 2019.

We have added analyses, the major addition being a comparison of performance of three spike inference models on high zoom and downsampled datasets (Figures 6 and 7). Our main conclusion is that only ~20-30% of events are detected, at 1% false positive probability, under population imaging conditions.
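To make the phrase "detected at 1% false positive probability" concrete, the sketch below computes a detection rate at a fixed false positive rate from two sets of scores. It is a generic illustration, not the procedure used with the spike inference models in the manuscript; the function name, the score lists and the thresholding rule are all assumptions.

```python
import math

def detection_rate_at_fpr(event_scores, noise_scores, fpr=0.01):
    """Fraction of true events detected at a fixed false positive rate.

    event_scores: one score per true spike event (for example, peak dF/F).
    noise_scores: the same score measured in windows without spikes.
    The threshold is set so that at most `fpr` of the noise scores exceed it.
    """
    ranked = sorted(noise_scores, reverse=True)
    # Number of noise samples allowed above the threshold.
    n_allowed = math.floor(fpr * len(ranked) + 1e-9)
    # The largest noise score that must still be rejected sets the threshold.
    threshold = ranked[n_allowed]
    detected = sum(1 for score in event_scores if score > threshold)
    return detected / len(event_scores), threshold

# Synthetic example: 100 noise scores from 0.00 to 0.99 and four events.
noise = [i / 100 for i in range(100)]
events = [0.50, 0.985, 1.20, 0.995]
rate, threshold = detection_rate_at_fpr(events, noise, fpr=0.01)
print(rate, threshold)
```

Because the comparison is strict, ties at the threshold are rejected, which keeps the realized false positive rate at or below the requested value.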

2) From this and previous studies it is clear that viral expression of GCaMP provides better SNR than transgenic expression. This remains a mystery and would be worthy of comment in the manuscript.

In the revised manuscript we have reanalyzed our data with proper neuropil subtraction and time windows following the reviewers’ suggestions (see below). In the new analysis we don’t see that transgenic expression of GCaMP is substantially worse than viral expression. Nonetheless, we have addressed the potential difference by listing and discussing possible causes. Neuropil contamination is one possible explanation and strength of expression may be another.

3) Of course, it is expected that SNR will decrease with zoom at constant power because of shot noise. But it is highly unlikely that investigators would use the same laser power for imaging over changes in zoom by a factor of 20. Please comment.

We used the same laser power when switching between high and low zoom, to enable the most direct comparison possible between these conditions. However, we agree that it’s unlikely investigators would use the same laser power for imaging under such different conditions. In light of this fact, and as requested by reviewer 2 in the initial reviews, we have now replaced the low-zoom results with downsampled data. The baseline noise of the downsampled videos was similar to that in the Allen Brain Observatory (Figure 4), consistent with the downsampled results being representative of real world imaging conditions at low zoom. In the revised manuscript, we compare spike detection for each neuron under high zoom and after downsampling and we place high zoom and downsampled results side-by-side to facilitate comparison (Figures 5-7).

4) Critical references are missing. Dana et al., 2014, and Wei et al., 2019, have shown that "GCaMP6s cells have spike-triggered fluorescence responses of larger amplitude, lower variability and greater single-spike detectability than GCaMP6f" in transgenic mice.

We have cited additional references, including Dana et al., 2014 and Wei et al., 2019.

5) Important spike-to-calcium parameters (e.g. rise- and decay- time constants) are missing. These parameters should be reported as part of the basic analyses; please incorporate the analyses e.g. Figures 2D,E,F and 3F in Chen et al., 2013. One can also use existing spike-to-fluorescence models to estimate the parameters (Wei et al., 2019).

In Figure 3, we now provide examples of fluorescence transients, mean transients, and plots of peak ΔF/F, rise time constant and decay time constant for 1-5 APs.

6) Spike-event snippet creation. First, the authors should demonstrate how they chose the parameters for the snippets in each imaging condition. It is not clear that the choice of snippet parameters was optimal; for example, Emx1-s and tetO-s ΔF/F at 4 APs in Figure 2 seem not to reach the maximum within the 200 ms window. Ideally, these parameters could be determined by the estimates of rise and decay time constants. For example, if the half-rise time is 100 ms and the spike-event snippet is 200 ms, one should use a time window of at least 400 ms to capture the peak ΔF/F.

We have lengthened the time windows for snippet creation and over which peak ΔF/F was found. The rise and decay time constants for GCaMP6f are 50-100 ms and 200-300 ms; and for GCaMP6s 150-200 ms and ~750 ms (4-5 APs, Figure 3C). Our updated snippets extend 300 ms and 500 ms after the spike for GCaMP6f and GCaMP6s, respectively, and are now adequate to robustly capture the peak ΔF/F (Figures 1 and 3).

7) Neuropil removal. This is a confusing part throughout the manuscript. At the beginning, the authors preferred not to perform neuropil correction because it might increase peak ΔF/F variability (Figure 2—figure supplement 1D); then, in the later part, the authors emphasized the importance of neuropil correction for spike detection. The authors could offer a systematic study to address whether neuropil should be removed and how. It seems like the optimal r for each cell should be used throughout the manuscript. One study from Kerlin et al., 2010, (this paper should be cited) provided some answers to this.

We have streamlined neuropil subtraction in the revised manuscript. We determined the r-value that optimizes spike detection for GCaMP6f lines (spike detection changes little with r-value in GCaMP6s lines) and found that the mean optimal r-value is 0.82 (Figure 2). For high zoom images, we perform neuropil subtraction for all neurons, throughout the manuscript, with r = 0.8 except where r = 0.8 resulted in negative pre-spike fluorescence. For downsampled images, we used the neuropil subtraction algorithm employed by the Allen Brain Observatory, to mimic population imaging conditions as closely as possible. New methods sections explain how we performed neuropil subtraction.
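The fixed r-value approach described above amounts to subtracting a scaled neuropil trace from the somatic trace. The sketch below illustrates that subtraction under the assumption that both traces are sampled on the same frames; the helper names are hypothetical, and because the response does not say what was done when r = 0.8 gave negative pre-spike fluorescence, the second function only flags that case.

```python
def subtract_neuropil(f_soma, f_neuropil, r=0.8):
    """Return F_corrected = F_soma - r * F_neuropil, frame by frame."""
    return [fs - r * fn for fs, fn in zip(f_soma, f_neuropil)]

def has_negative_prespike(corrected, n_pre):
    """Flag traces whose mean over the first `n_pre` (pre-spike) frames is negative.

    How such traces were handled is not stated in the response, so this
    sketch reports the condition without applying any correction.
    """
    baseline = corrected[:n_pre]
    return sum(baseline) / len(baseline) < 0

# Illustrative numbers only.
soma = [10.0, 10.0, 14.0, 12.0]
neuropil = [5.0, 5.0, 5.5, 5.0]
corrected = subtract_neuropil(soma, neuropil, r=0.8)
print(corrected, has_negative_prespike(corrected, n_pre=2))
```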

8) ΔF/F computation. In general one would take a long time-window to compute F0 with the background subtraction in the denominator, where F0 aims to reflect 0AP fluorescence. Computing locally as the mean within 50 or 20 ms before the first spike event is risky, because F0 in a short window can be contaminated by the previous calcium decay after a many-AP event e.g. F0 around 5-AP in Figure 1B.

We’ve revised our ΔF/F calculation to reduce its sensitivity to previous spikes. For each snippet we calculated ΔF by subtracting the pre-spike fluorescence, defined as the mean fluorescence 0-100 ms before the spike. As the denominator in the ΔF/F calculation, we used the minimum pre-spike fluorescence across a group of snippets, with snippets grouped by cell and by number of APs. This approach ensures that the same F0 value is used for all snippets in a group, even where the transients in some snippets ride on the tail of the fluorescence transient from a previous AP. Manual inspection revealed that F0 returned to a stable baseline (presumably 0 AP fluorescence) in many trials and that our updated procedure uses 0 AP fluorescence as the denominator of the ΔF/F calculation for all trials.
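The revised ΔF/F procedure can be sketched as follows. Two assumptions are made that the response does not state explicitly: the "minimum pre-spike fluorescence across a group" is taken to be the minimum of the per-snippet pre-spike means, and each snippet is assumed to start with `n_pre` frames covering the 0-100 ms before the spike. The function name is hypothetical.

```python
def dff_for_group(snippets, n_pre):
    """Compute dF/F for a group of snippets (same cell, same number of APs).

    dF: each snippet minus its own mean pre-spike fluorescence.
    F0: the minimum pre-spike mean across the group, shared by all snippets,
        so transients riding on the tail of an earlier event are still
        normalized by a near-baseline value.
    """
    pre_means = [sum(s[:n_pre]) / n_pre for s in snippets]
    f0 = min(pre_means)
    return [[(x - pre) / f0 for x in s] for s, pre in zip(snippets, pre_means)]

# Two illustrative snippets; the second starts on an elevated baseline.
group = [
    [10.0, 10.0, 15.0, 13.0],
    [12.0, 12.0, 18.0, 15.0],
]
for trace in dff_for_group(group, n_pre=2):
    print(trace)
```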

9) Peak ΔF/F variability. First, it is not clear how the mean coefficient of variation of ΔF/F peak (a main measure of peak ΔF/F variability throughout the manuscript) was defined and computed. Is the coefficient in the term of the number of spikes? How was it computed in a given nAP case, e.g. Figure 4A right or Figure 4B? Second, although shot noise is dominant on single pixels, or at high sample rates or in low brightness conditions, variability would be reduced when computing F by averaging over N pixels. The analysis on single pixels is somewhat misleading. Third, trial-by-trial variability is important but the measurement is not clear. In general, the peak ΔF/F should depend on the time series of the spikes. It roughly depends on the number of the spikes in a brief time window. One would thus expect two sources of trial-by-trial variance, one depends on the number of spikes; the other depends on the spike pattern as the number of spikes is fixed. Authors should be able to decompose the variability into these terms.

10) The finding that variance in response amplitude is no larger than expected from shot-noise is surprising and unlikely to be true. There are many reasons why the coupling between AP and Ca might be variable (modulation, baseline potential, state of channels, channel fluctuations). Make sure this analysis is correct.

We have removed the coefficient of variation measurements from the paper. In light of the reviewers’ concerns about single pixel measurements, we have revised our analysis of noise and trial-to-trial variability, summing pixels across each soma. Our revised analysis indicates that variability was often greater than that expected from shot noise (Figure 3—figure supplement 1). Our results document trial-to-trial variability but provide little further information on sources of variability. Motion is unlikely to be a significant contributor since there’s very little motion in the images. We speculate that biology is the major source of trial-to-trial variability.

11) Lastly, as demonstrated above, the peak ΔF/F does not directly depend on behavioral condition when the spike pattern is given. Yet the difference in spon vs visstim in Cux2-f mice is striking. Can this difference be explained by the spike pattern difference in spon vs visstim conditions? If not, what causes this difference?

The comparison of spon and visstim conditions was based on a small number of Cux2-f neurons. The revision of selection procedures when updating the manuscript has reduced the number of Cux2-f neurons without visual stimuli to zero (Figure 1—figure supplement 1). Whether there’s a difference in peak ΔF/F between spon and visstim in Cux2-f is not a question we can answer with confidence with our dataset.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

We are satisfied with the data in the revised manuscript, but ask that you revise the text in accord with the comments below as we believe that these changes will substantially improve the impact of your important work.

Reviewer #2:

Far more emphasis needs to be placed on the "low zoom" data. The difference between the mouse lines is modest at best. However, the demonstration that spike detection is ~15% under normal imaging conditions is a stunning observation that should be a wake-up call to large parts of the community.

We have made several changes to emphasize the low detection rate under normal imaging conditions. Main changes include adding AP detection rate numbers to the Abstract. We have also rewritten the Introduction to include a new second paragraph addressing the paradox that APs can be detected efficiently with GCaMP indicators, but often aren’t in practice due to microscope magnification and other constraints such as those on model refinement.

Reviewer #3:

The authors have done a great deal to improve the study. There are some remaining points that can be easily addressed to improve the study even further.

1) There is no description of what cells the Cux2 and Emx1 cre lines target. It would help make the study more approachable and useful to include this information here.

In the Results we now note that expression is in excitatory neurons:

“The dataset for further analysis was from 48 neurons from mice of 4 transgenic lines, two expressing GCaMP6s and two GCaMP6f in excitatory neurons in layer 2/3 and deeper layers of cortex (Table 1).”

We provide additional information on expression in the Materials and methods, where we now state:

“All four lines drive GCaMP expression primarily in excitatory neurons. In Cux2-CreERT2 mice, Cre and GCaMP expression are enriched in layer 2/3 (Franco et al., 2012; Harris et al., 2014). In Emx1-IRES-Cre and Camk2a-tTA mice, GCaMP is expressed throughout cortical layers (Gorski et al., 2002; Wekselblatt et al., 2016). Images showing the pattern of Cre and GCaMP expression in these mouse lines are available via the Transgenic Characterization pages of the Allen Mouse Brain Connectivity Atlas and Allen Brain Observatory: https://connectivity.brain-map.org/transgenic, http://observatory.brain-map.org/visualcoding/transgenic”

2) The authors report that there is no epileptiform activity in the mice they use. Do they not see such activity in these mice in their hands or do they select individual mice without seizures? How do they judge the presence of epileptiform activity?

The Materials and methods previously stated, “Ai93 and Ai94 containing mice included in this dataset did not show behavioral signs for epileptic brain activity.” We have replaced this sentence with a more specific statement:

“Mice of some of the genotypes used here, most notably Emx1-f, can exhibit epileptiform activity (Steinmetz et al., 2017), including overt seizures. Mice with seizures were excluded from the study.”

3) In light of point 2, the fraction of spikes that are in groups of 1-5 spike bursts is very different for GCaMP6s and GCaMP6f mice. This is a metric measured by the cell-attached recording. Doesn't this indicate that the transgene has a large effect on cellular or circuit activity patterns?

We agree with the reviewer’s interpretation and have now made this point explicitly in the revised manuscript:

“However, the spiking patterns of neurons from GCaMP6s and -f lines commonly differed, suggesting that one or more transgenes affected cell or circuit activity (Figure 1—figure supplement 1C).”

4) The authors use QC metrics to reduce the number of cells analyzed from >200 to 48. Can they report what factors result in 75% of the cells being rejected? Do they know which rejection factors actually impact the ability to accurately infer spiking from fluorescence? Such insight would be very useful for others who don't have cell-attached recordings but want to be able to understand what cells to include in their final analyses.

Most of the discarded neurons (145 of 213) were removed by the electrophysiology quality control filters. These filters often removed overlapping populations of neurons, making it impossible to assign removal to single factors. The overlapping nature of the filtering process also makes it difficult or perhaps impossible to determine the impact of each of our 35 electrophysiology filters on our ability to infer spiking from fluorescence. A few (10 of 68) neurons were removed during image downsampling, leaving 58 neurons. 10 neurons were removed manually, 7 with red indicator in the soma, 2 because of image processing issues, and 1 because of a truncated electrophysiology recording, leaving 48 neurons in the final data set. To the Materials and methods, we have added information on the number of neurons removed by each step in our quality control process.

5) Lastly, the Discussion leaves something to be desired. It is a synopsis of the conclusions and reiterates some rather obvious points, but also makes some claims without backing. It is not clear that the statement about GFP quantum yield being the limiting factor to further improvement of single-spike detection is correct. The Ca-sensitive fluorophores may not get brighter, but their linearity, ΔF/F, stability etc. may improve. Given the assumption that photon budget is limiting, the statement that voltage sensors may provide the solution seems wrong, as these will provide many fewer photons per AP. The authors could instead use the Discussion to provide some helpful hints to imagers to make the best use of their data based on the analyses presented.

We have rewritten this paragraph of the Discussion, now suggesting best practices in light of our results. The revised paragraph reads:

“Our results point to several practices that might be adopted to maximize spike detection. Firstly, minimize the field of view, maximizing photon flux per neuron. Secondly, tune the spike inference model for each neuron independently, where possible. Thirdly, compare the results of several spike inference models. The 3 models employed here produced similar AP detection rates, whether applied to high resolution or to down-sampled images. Similarly, Pachitariu et al., (2018) observed that the L0 constraint failed to improve performance of the NND model. Nonetheless, each model has strengths and weaknesses. For example, a model may detect more APs than another but at the cost of a greater false positive rate. As a result, model performance may diverge for some AP rates and patterns. In the worst case, comparing models provides some protection from errors in implementation. Fourthly, ensure that traces are sampled (or upsampled) at a sufficiently high rate when employing NND and use autocalibration with MLspike; both make a substantial difference to model performance. Finally, exercise caution when interpreting the inferred spike rates. Commonly, many APs are not detected using even the most accurate spike inference models.”

Associated Data


    Data Citations

    1. Huang L, Ledochowitsch P, Knoblich U, Lecoq J, Murphy GJ, Reid RC, de Vries SEJ, Koch C, Zeng H, Buice MA, Waters J, Li L. 2019. Ophys/Ephys Calibration Data. Allen Brain Map. circuits/oephys

    Supplementary Materials

    Transparent reporting form

    Data Availability Statement

    All data generated and analyzed in this study are available at https://portal.brain-map.org/explore/circuits/oephys.



    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
