Author manuscript; available in PMC: 2022 Jul 1.
Published in final edited form as: Adv Neural Inf Process Syst. 2021 Dec;34:6062–6074.

Bubblewrap: Online tiling and real-time flow prediction on neural manifolds

Anne Draelos 1, Pranjal Gupta 2, Na Young Jun 3, Chaichontat Sriworarat 4, John Pearson 5
PMCID: PMC9247712  NIHMSID: NIHMS1819004  PMID: 35785106

Abstract

While most classic studies of function in experimental neuroscience have focused on the coding properties of individual neurons, recent developments in recording technologies have resulted in an increasing emphasis on the dynamics of neural populations. This has given rise to a wide variety of models for analyzing population activity in relation to experimental variables, but direct testing of many neural population hypotheses requires intervening in the system based on current neural state, necessitating models capable of inferring neural state online. Existing approaches, primarily based on dynamical systems, require strong parametric assumptions that are easily violated in the noise-dominated regime and do not scale well to the thousands of data channels in modern experiments. To address this problem, we propose a method that combines fast, stable dimensionality reduction with a soft tiling of the resulting neural manifold, allowing dynamics to be approximated as a probability flow between tiles. This method can be fit efficiently using online expectation maximization, scales to tens of thousands of tiles, and outperforms existing methods when dynamics are noise-dominated or feature multi-modal transition probabilities. The resulting model can be trained at kiloHertz data rates, produces accurate approximations of neural dynamics within minutes, and generates predictions on submillisecond time scales. It retains predictive performance throughout many time steps into the future and is fast enough to serve as a component of closed-loop causal experiments.

1. Introduction

Systems neuroscience is in the midst of a data explosion. Advances in microscopy [1, 2] and probe technology [3, 4, 5] have made it possible to record thousands of neurons simultaneously in behaving animals. At the same time, growing interest in naturalistic behaviors has increased both the volume and complexity of jointly recorded behavioral data. On the neural side, this has resulted in a host of new modeling and analysis approaches that aim to match the complexity of these data, typically using artificial neural network models as proxies for biological neural computation [6, 7, 8].

At the same time, this increase in data volume has resulted in increasing emphasis on methods for dimensionality reduction [9] and a focus on neural populations in preference to the coding properties of individual neurons [10]. However, given the complexity of neural dynamics, it remains difficult to anticipate what experimental conditions will be needed to test population hypotheses in post hoc analyses, complicating experimental design and reducing power. Conversely, adaptive experiments, those in which the conditions tested change in response to incoming data, have been used in neuroscience to optimize stimuli for experimental testing [11, 12, 13, 14], in closed-loop designs [15, 16, 17], and even to scale up holographic photostimulation for inferring functional connectivity in large circuits [18].

Yet, despite their promise, adaptive methods are rarely applied in practice for two reasons: First, although efficient online methods for dimensionality reduction exist [19, 20, 21, 22, 23], these methods do not typically identify stable dimensions to allow low-dimensional representations of data to be compared across time points. That is, when the spectral properties of the data are changing in time, methods like incremental SVD may be projecting the data into an unstable basis, rendering these projections unsuitable as inputs to further modeling. Second, while many predictive models based on the dynamical systems approach exist [6, 24, 25, 26, 27, 28, 29], including online approaches [30, 31, 16, 32], they typically assume a system with lawful dynamics perturbed by Gaussian noise. However, many neural systems of interest are noise-dominated, with multimodal transition kernels between states.

In this work, we are specifically interested in closed loop experiments in which predictions of future neural state are needed in order to time and trigger interventions like optogenetic stimulation or a change in visual stimulus. Thus, our focus is on predictive accuracy, preferably far enough into the future to compensate for feedback latencies. To address these goals, we propose an alternative to the linear systems approach that combines a fast, stable, online dimensionality reduction with a semiparametric tiling of the low-dimensional neural manifold. This tiling introduces a discretization of the neural state space that allows dynamics to be modeled as a Hidden Markov Model defined by a sparse transition graph. The entire model, which we call “Bubblewrap,” can be learned online using a simple EM algorithm and handles tilings and graphs of up to thousands of nodes at kiloHertz data rates. Most importantly, this model outperforms methods based on dynamical systems in high-noise regimes when the dynamics are more diffusion-like. Training can be performed at a low, fixed latency ≈ 10ms using a GPU, while a cached copy of the model in main memory is capable of predicting upcoming states at <1ms latency. As a result, Bubblewrap offers a method performant and flexible enough to serve as a neural prediction engine for causal feedback experiments.

2. Stable subspaces from streaming SVD

As detailed above, one of the most pressing issues in online neural modeling is dealing with the increasingly large dimensionality of collected data — hundreds of channels per Neuropixels probe [4, 5], tens of thousands of pixels for calcium imaging. However, as theoretical work has shown [33, 34], true neural dynamics often lie on a low-dimensional manifold, so that population activity can be accurately captured by analyzing only a few variables.

Here, we combine two approaches to data reduction: In the first stage, we use sparse random projections to reduce dimensionality from an initial d dimensions (thousands) to n (a few hundred) [35, 36]. By simple scaling, for a fixed budget of N cells in our manifold tiling, we expect density (and thus predictive accuracy) to scale as $N^{1/n}$ in dimension n, and so we desire n to be as small as possible. However, by the Johnson-Lindenstrauss Lemma [37, 36], when reducing from d to n dimensions, the distance between vectors $u^*$ and $v^*$ in the reduced space is related to the distance between their original versions u and v by

$$(1-\varepsilon)\,\|u - v\|^2 \;\le\; \|u^* - v^*\|^2 \;\le\; (1+\varepsilon)\,\|u - v\|^2 \qquad (1)$$

with probability $1 - \delta$ if $n > O(\log(1/\delta)/\varepsilon^2)$. Unfortunately, even for ε ~ 0.1 (10% relative error), the required n may be quite large, making this inappropriate for reducing to the very small numbers of effective dimensions characterizing neural datasets.
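
As a concrete illustration of this first stage, the following is a minimal NumPy sketch of a very sparse random projection in the style of [35, 36]; the function name and parameter choices are ours and purely illustrative.

```python
import numpy as np

def sparse_random_projection(d, n, s=None, seed=0):
    """Build a (d x n) very sparse random projection matrix.

    Entries are +/- sqrt(s/n) with probability 1/(2s) each and 0 otherwise,
    following the Achlioptas / Li et al. construction; s = sqrt(d) gives the
    'very sparse' variant.
    """
    rng = np.random.default_rng(seed)
    s = np.sqrt(d) if s is None else s
    probs = [1 / (2 * s), 1 - 1 / s, 1 / (2 * s)]   # P(+), P(0), P(-)
    signs = rng.choice([1.0, 0.0, -1.0], size=(d, n), p=probs)
    return signs * np.sqrt(s / n)

# Project a batch of high-dimensional observations (rows) down to n dimensions.
d, n = 10_000, 200
R = sparse_random_projection(d, n)
X = np.random.randn(50, d)                          # 50 high-dimensional samples
X_low = X @ R                                       # shape (50, n)

# Pairwise distances are approximately preserved (Johnson-Lindenstrauss).
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(X_low[0] - X_low[1])
print(f"original distance {orig:.1f}, projected distance {proj:.1f}")
```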


Thus, in the second stage, we reduce from n~O(100) to k~O(10) dimensions using a streaming singular value decomposition. This method is based on the incremental block update method of [20, 22] with an important difference: While the block update method aims to return the top-k SVD at every time point, the directions of the singular vectors may be quite variable during the course of an experiment (Figure 1d–h), which implies an unstable representation of the neural manifold. However, as we show below, the top-k subspace spanned by these vectors stabilizes in seconds on typical neural datasets and remains so throughout the experiment. Therefore, by selecting a stable basis (instead of the singular vector basis) for the top-k subspace, we preserve the same information while ensuring a stable representation of the data for subsequent model fitting.

Figure 1: Timing and stability of two-stage dimension reduction.


a) Distortion (ε) as a function of number of dimensions retained (n) for both sparse random projections and proSVD on random Gaussian data with batch size b = 1000. b) Time required for the dimensionality reduction in (a), amortized for batch size. While random projections are extremely efficient, proSVD time costs grow with the number of dimensions retained. c) Pareto front for the time-distortion tradeoff of random projections followed by proSVD. Color indicates n, the number of dimensions retained by random projections. Black arrow indicates the particular tradeoff we chose of n = 200. d–f) Embedding of a single trial (green line) into the basis defined by streaming SVD for different amounts of data seen. Dotted line indicates the same trial embedded using SVD on the full data set. Rapid changes in estimates of singular vectors early on lead to an unstable representation. g–i) Same trial and conventions as (d–f) for the proSVD embedding. Dotted lines in the two rows are the same curve in different projections.

More specifically, let $x_t \in \mathbb{R}^n$ be a vector of input data after random projections. In our streaming setup, these are processed b samples at a time, with b = 1 reasonable for slower methods like calcium imaging and b = 40 more appropriate for electrophysiological sampling rates of ~20kHz. Then, if the data matrix X has dimension n × T, adding columns over time, the incremental method of [20, 22] produces at each time step a factorization $X = QRW^\top$, where the columns of the orthogonal matrices Q and W span the left and right top-k singular subspaces, respectively. If the matrix R were diagonal, this would be equivalent to the SVD. In the incremental algorithm, R is augmented at each timestep based on new data to form $\hat{R}$, which is block diagonalized via an orthogonal matrix and truncated to the top-k subspace, allowing for an exact reduced-rank SVD (Appendix A).

However, as reviewed in [20, 22], since there are multiple choices of basis Q for the top-k singular subspace, there are likewise multiple choices of block diagonalization for $\hat{R}$. In [20, 22], the form of this operation is chosen for computational efficiency. But an equally valid option is to select the orthogonal matrix that minimizes the change in the singular subspace basis Q from one timestep to the next:

$$\min_T \|Q_t - Q_{t-1}\|_F \;=\; \min_T \|\hat{Q} U_1 T - Q_{t-1}\|_F, \qquad (2)$$

where $\hat{Q}$ is an augmented basis for the top-(k + b) singular subspace, $U_1$ contains the first k left singular vectors of $\hat{R}$, and T is an orthogonal matrix (Appendix A). This minimization is known as the Orthogonal Procrustes problem and has a well-known solution [38]: $T = \tilde{U}\tilde{V}^\top$, where $\tilde{U}$ and $\tilde{V}$ are the left and right singular vectors, respectively, of $M \equiv Q_{t-1}^\top \hat{Q} U_1$. (See [39] for a recent application of similar ideas in brain-computer interfaces.) This Procrustean SVD (proSVD) procedure is summarized in Algorithm 1. There, lines 1–8 follow [20, 22], while lines 10 and 11 perform the Orthogonal Procrustes procedure. Line 9 serves as a leak term that discounts past data as in [40].
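
The following is a minimal NumPy sketch of one proSVD-style update under our reading of the procedure above: augment the current factorization with a new data block, truncate to the top-k subspace, then rotate the new basis so it moves as little as possible relative to the previous one. It is illustrative only (it omits the leak term, the W factor, and the efficiency considerations of Algorithm 1), and the helper names and transpose conventions are ours.

```python
import numpy as np

def pro_svd_update(Q_prev, R_prev, X_new):
    """One incremental update of a rank-k factorization, with a Procrustes step
    that keeps the basis Q as stable as possible across updates.

    Q_prev : (n, k) orthonormal basis from the previous step
    R_prev : (k, k) core factor carried along with the basis
    X_new  : (n, b) new block of observations (columns)
    """
    n, k = Q_prev.shape

    # Project the new block onto the current basis and orthogonalize the residual.
    C = Q_prev.T @ X_new                          # (k, b) coefficients
    resid = X_new - Q_prev @ C                    # component outside span(Q_prev)
    Q_perp, R_perp = np.linalg.qr(resid)          # (n, b), (b, b)

    # Augmented basis and augmented core matrix R_hat.
    Q_aug = np.hstack([Q_prev, Q_perp])           # (n, k + b)
    R_hat = np.block([
        [R_prev, C],
        [np.zeros((R_perp.shape[0], k)), R_perp],
    ])

    # SVD of the small core; keep the top-k left singular vectors.
    U, s, _ = np.linalg.svd(R_hat, full_matrices=False)
    U1 = U[:, :k]

    # Orthogonal Procrustes: rotate the candidate basis Q_aug @ U1 to be as
    # close as possible to the previous basis (transpose conventions here may
    # differ from those of Appendix A).
    M = (Q_aug @ U1).T @ Q_prev
    Um, _, Vmt = np.linalg.svd(M)
    T = Um @ Vmt

    Q_new = Q_aug @ U1 @ T                        # stable top-k basis
    R_new = T.T @ np.diag(s[:k])                  # keeps Q_new @ R_new = Q_aug @ U1 @ diag(s_k)
    return Q_new, R_new
```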

Figure 1a–c illustrates the performance of the two-stage dimension reduction algorithm for a case of $d = 10^4$ randomly generated Gaussian data. While proSVD yields minimal distortion (due to truncation of the spectrum to k = 6), random projections require $k \sim O(100)$ to achieve the same result (Figure 1a). By contrast, random projections are much faster (Figure 1b). Thus, we can trade off distortion against time by adjusting n, the number of intermediate dimensions. As Figure 1c shows, the optimal tradeoff occurs somewhere around n = 200 for this example.

Figure 1d–i shows results for neural data recorded from monkey motor cortex [26] in a cued reach task. While projection of the data into the basis defined by streaming SVD remains unstable early in data collection (top), the proSVD representation is nearly equivalent to the full offline result after only a few trials (≈15s, middle). This is due to the fact that, in all data sets we examined, the top-k SVD subspace was identified extremely quickly; proSVD simply ensures the choice of a stable basis for that subspace.

3. Bubblewrap: a soft manifold tiling for online modeling

As reviewed above, most neural population modeling approaches are based on the dynamical systems framework, assuming a lawful equation of motion corrupted by noise. However, for animals engaged in task-free natural behavior [41, 42, 43], trajectories are likely to be sufficiently complex that simple dynamical models fail. For instance, dynamical systems models with Gaussian noise necessarily produce unimodal transition probabilities centered around the mean prediction, while neural trajectories may exhibit multimodal distributions beginning at the same system state. By contrast, we pursue an alternative method that trades some accuracy in estimating instantaneous system state for flexibility in modeling the manifold describing neural activity.

Our approach is to produce a soft tiling of the neural manifold in the form of a Gaussian mixture model (GMM), each component of which corresponds to a single tile. We then approximate the transitions between tiles via a Hidden Markov Model (HMM), which allows us to capture multimodal probability flows. As the number of tiles increases, the model produces an increasingly finer-grained description of dynamics that assumes neither an underlying dynamical system nor a particular distribution of noise.

More specifically, let xt be the low-dimensional system state and let zt ∈ 1 … N index the tile to which the system is assigned at time t. Then we have for the dynamics

$$\begin{aligned}
p(z_t = j \mid z_{t-1} = i) &= A_{ij} \\
p(x_t \mid z_t) &= \mathcal{N}(\mu_{z_t}, \Sigma_{z_t}) \\
p(\mu_j, \Sigma_j) &= \mathrm{NIW}(\mu_{0j}, \lambda_j, \Psi_j, \nu_j),
\end{aligned} \qquad (3)$$

where we have assumed Normal-inverse-Wishart priors on the parameters of the Gaussians. Given its exponential family form and the conjugacy of the priors, online expectation maximization updates are available in closed form [44, 45, 46] for each new datum, though we opt, as in [45], for a gradient-based optimization of an estimate of the evidence lower bound

$$\begin{aligned}
\mathcal{L}(A,\mu,\Sigma) ={}& \sum_{ij}\big(\hat{N}_{ij}(T)+\beta_{ij}-1\big)\log A_{ij} + \sum_j \big(\hat{S}_{1j}(T)+\lambda_j\mu_{0j}\big)^{\top}\Sigma_j^{-1}\mu_j \\
&- \frac{1}{2}\sum_j \mathrm{tr}\left(\big(\Psi_j+\hat{S}_{2j}(T)+\lambda_j\mu_{0j}\mu_{0j}^{\top}+(\lambda_j+\hat{n}_j(T))\,\mu_j\mu_j^{\top}\big)\Sigma_j^{-1}\right) \\
&- \frac{1}{2}\sum_j \big(\nu_j+\hat{n}_j(T)+d+2\big)\log\det\Sigma_j
\end{aligned} \qquad (4)$$

with accumulating (estimated) sufficient statistics

$$\begin{aligned}
\alpha_j(t) &= \sum_i \alpha_i(t-1)\,\Gamma_{ij}(t) \\
\hat{N}_{ij}(t) &= (1-\varepsilon_t)\,\hat{N}_{ij}(t-1) + \alpha_i(t-1)\,\Gamma_{ij}(t) \\
\hat{n}_j(t) &= \sum_i \hat{N}_{ij}(t) \\
\hat{S}_{1j}(t) &= (1-\varepsilon_t)\,\hat{S}_{1j}(t-1) + \alpha_j(t)\,x_t \\
\hat{S}_{2j}(t) &= (1-\varepsilon_t)\,\hat{S}_{2j}(t-1) + \alpha_j(t)\,x_t x_t^{\top}
\end{aligned} \qquad (5)$$

where αj(t) = p(zt = j|x1:t) is the filtered posterior, Γij(t) is the update matrix from the forward algorithm [44], and εt is a forgetting term that discounts previous data. Note that even for ε = 0, L is only an estimate of the evidence lower bound because the sufficient statistics are calculated using α(t) and not the posterior over all observed data.
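
To make the recursions in (5) concrete, the sketch below performs one streaming update of the filtered posterior and the discounted sufficient statistics for a fixed set of Gaussian tiles; the function layout and names are ours rather than those of the reference implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def online_e_step(alpha_prev, A, mus, covs, x_t, stats, eps):
    """One streaming E-step: filtered posterior plus discounted sufficient statistics (Eq. 5)."""
    N_hat, n_hat, S1, S2 = stats

    # Emission likelihoods p(x_t | z_t = j) for each tile.
    like = np.array([multivariate_normal.pdf(x_t, mean=m, cov=c)
                     for m, c in zip(mus, covs)])

    # Joint posterior over (z_{t-1} = i, z_t = j); its column sums give alpha_j(t).
    joint = alpha_prev[:, None] * A * like[None, :]
    joint /= joint.sum()
    alpha = joint.sum(axis=0)

    # Discounted sufficient statistics, as in Eq. (5).
    N_hat = (1 - eps) * N_hat + joint                                  # transition counts
    n_hat = N_hat.sum(axis=0)                                          # per-tile counts
    S1 = (1 - eps) * S1 + alpha[:, None] * x_t                         # weighted first moments
    S2 = (1 - eps) * S2 + alpha[:, None, None] * np.outer(x_t, x_t)    # weighted second moments
    return alpha, (N_hat, n_hat, S1, S2)
```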

In setting Normal-inverse-Wishart priors over the Gaussian mixture components, we take an empirical Bayes approach by setting prior means $\mu_{0j}$ to the current estimate of the data center of mass and prior covariance parameters $\Psi_j$ to $N^{-2/k}$ times the current estimate of the data covariance (Appendix B). For initializing the model we use a small data buffer of $M \sim O(10)$ points. We chose effective observation numbers $(\lambda, \nu) = 10^{-3}$ and trained this model to maximize $\mathcal{L}(A,\mu,\Sigma)$ using Adam [47], enforcing parameter constraints by replacing them with unconstrained variables $a_{ij}$ and lower triangular $L_j$ with positive diagonal: $A_{ij} = \exp(a_{ij}) / \sum_{j'} \exp(a_{ij'})$, $\Sigma_j^{-1} = L_j L_j^\top$.
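
As a brief illustration of this unconstrained parameterization (variable names here are ours), the free variables can be mapped back to a row-stochastic A and positive-definite precision matrices as follows:

```python
import numpy as np

def constrained_params(a, L_free, k):
    """Map unconstrained variables to a row-stochastic A and precisions Sigma_j^{-1}.

    a      : (N, N) free variables; each row passes through a softmax
    L_free : (N, k, k) free variables; lower triangle with exponentiated diagonal
    """
    # A_ij = exp(a_ij) / sum_j' exp(a_ij'), computed stably.
    A = np.exp(a - a.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)

    # Sigma_j^{-1} = L_j L_j^T with L_j lower triangular and positive diagonal.
    L = np.tril(L_free)
    idx = np.arange(k)
    L[:, idx, idx] = np.exp(L_free[:, idx, idx])
    prec = L @ np.transpose(L, (0, 2, 1))
    return A, prec
```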

Finally, in order to prevent the model from becoming stuck in local minima and to encourage more effective tilings, we implemented two additional heuristics as part of Bubblewrap: First, whenever a new observation was highly unlikely under any existing mixture component (log p(x_t|z_t) < θ_n for all z_t), we teleported a node to this data point by assigning α_J(t) = 1 for an unused index J. During initial learning, this results in a "breadcrumbing" approach in which nodes are placed at the locations of each newly observed datum. Second, when the number of active nodes was equal to our total node budget N, we reclaimed the node with the lowest value of n̂(t) and zeroed out its existing sufficient statistics before teleporting it to a new location. In practice, these heuristics substantially improved performance, especially early in training (Appendix D). The full algorithm is summarized in Algorithm 2.
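
In outline, the two heuristics might look like the following sketch; the threshold handling and bookkeeping are illustrative stand-ins for the actual settings described in Appendix D.

```python
import numpy as np

def maybe_teleport(log_like, alpha, stats, theta, active):
    """Teleport a node to a surprising observation, reclaiming the least-used
    node if the budget is exhausted.

    log_like : (N,) log p(x_t | z_t = j) for every tile
    active   : (N,) boolean mask of nodes currently in use
    """
    N_hat, n_hat, S1, S2 = stats
    if log_like.max() >= theta:
        return alpha, active                 # some tile already explains x_t well enough

    if not active.all():
        J = np.flatnonzero(~active)[0]       # take an unused node ("breadcrumbing")
    else:
        J = np.argmin(n_hat)                 # reclaim the least-used node...
        N_hat[J, :] = 0.0                    # ...and zero out its sufficient statistics
        N_hat[:, J] = 0.0
        S1[J] = 0.0
        S2[J] = 0.0

    alpha = np.zeros_like(alpha)
    alpha[J] = 1.0                           # hard-assign the new observation to node J
    active[J] = True                         # the node is re-centered on x_t by later updates
    return alpha, active
```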


4. Experiments

We demonstrated the performance of Bubblewrap on both simulated non-linear dynamical systems and experimental neural data. We compared these results to two existing online learning models for neural data, both of which are based on dynamical systems [30, 32]. To simulate low-dimensional systems, we generated noisy trajectories from a two-dimensional Van der Pol oscillator and a three-dimensional Lorenz attractor. For experimental data, we used four publicly available datasets from a range of applications: 1) trial-based spiking data recorded from primary motor cortex in monkeys performing a reach task [48, 49] preprocessed by performing online jPCA [49]; 2) continuous video data and 3) trial-based wide-field calcium imaging from a rodent decision-making task [50, 51]; 4) high-throughput Neuropixels data [52, 53].

For each data set, we gave each model the same data as reduced by random projections and proSVD. For comparisons across models, we quantified overall model performance by taking the mean log predictive probability over the last half of each data set (Table 1). For Bubblewrap, prediction T steps into the future gives

$$\log p(x_{t+T} \mid x_{1:t}) = \log \sum_{i,j} p(x_{t+T} \mid z_{t+T} = j)\, p(z_{t+T} = j \mid z_t = i)\, p(z_t = i \mid x_{1:t}) = \log \sum_{i,j} \mathcal{N}(x_{t+T};\, \mu_j, \Sigma_j)\, (A^T)_{ij}\, \alpha_i(t), \qquad (6)$$

where $A^T$ is the T-th power of the transition matrix. Conveniently, these forward predictions can be efficiently computed due to the closed form (6), while similar predictions in comparison models [30, 32] must be approximated by sampling (Appendix C). In addition, for Bubblewrap, which is focused on coarser transitions between tiles, we also report the entropy of predicted transitions:

$$H(t,T) = -\sum_j p(z_{t+T} = j \mid x_{1:t}) \log p(z_{t+T} = j \mid x_{1:t}) = -\sum_{i,j} (A^T)_{ij}\, \alpha_i(t) \log \sum_k (A^T)_{kj}\, \alpha_k(t). \qquad (7)$$
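
For concreteness, here is a small sketch (our own function and variable names) of how (6) and (7) can be evaluated from a fitted model; in practice the matrix power would be computed once per step and reused.

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict_T_steps(alpha, A, mus, covs, x_future, T):
    """Log predictive probability (Eq. 6) and transition entropy (Eq. 7), T steps ahead."""
    A_T = np.linalg.matrix_power(A, T)
    p_z = alpha @ A_T                                 # p(z_{t+T} = j | x_{1:t})
    like = np.array([multivariate_normal.pdf(x_future, mean=m, cov=c)
                     for m, c in zip(mus, covs)])     # N(x_{t+T}; mu_j, Sigma_j)
    log_pred = np.log(np.dot(p_z, like))              # Eq. (6)
    entropy = -np.sum(p_z * np.log(p_z + 1e-12))      # Eq. (7); small constant guards log(0)
    return log_pred, entropy
```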

Table 1:

Model comparison results as mean ± standard deviation of the log predictive probability over the last half of the dataset.

Log predictive probability
Dataset Bubblewrap VJF [32] ZP (2016) [30]
2D Van der Pol, 0.05 0.965 ± 1.123 −0.338 ± 0.427 0.121 ± 0.857
2D Van der Pol, 0.20 −1.088 ± 1.184 −1.140 ± 0.879 −0.506 ± 0.964
3D Lorenz, 0.05 −7.338 ± 1.289 −16.98 ± 1.923 −12.39 ± 1.723*
3D Lorenz, 0.20 −7.474 ± 1.279 −17.30 ± 2.112 −12.42 ± 1.708*
Monkey reach 3.046 ± 4.959 −5.159 ± 0.987 3.818 ± 9.118
Wide-field calcium 5.974 ± 2.979 3.768 ± 6.204 1.613 ± 4.083
Mouse video −10.93 ± 2.386 −15.86 ± 1.084 −10.65 ± 4.145*
Neuropixels −12.84 ± 6.017 −12.06 ± 5.244 −12.28 ± 4.567

Asterisks (*) indicate models that degenerated to a random walk.

Additional detailed experimental results and benchmarking of our GPU implementation in JAX [54] are in Appendix D. We compared performance of our algorithm against both [30] (using our own implementation in JAX) and Variational Joint Filtering [32] (using the authors’ implementation). Our implementation of Bubblewrap, as well as code to reproduce our experiments, is open-source and available online at http://github.com/pearsonlab/Bubblewrap.

When tested on low-dimensional dynamical systems, Bubblewrap successfully learned tilings of both simulated manifolds, outperforming VJF [32] on both datasets (Figure 2a,b), while it was comparable to the algorithm of [30] on one of the 2D cases (but neither of the 3D cases) (Figure 2). This is surprising, since both comparison methods assume an underlying dynamical system and attempt to predict differences between individual data points, while Bubblewrap only attempts to localize data to within a coarse area of the manifold.

Figure 2: Modeling of low-dimensional dynamical systems.


a) Bubblewrap end tiling of a 2D Van der Pol oscillator (data in gray; 5% noise case corresponding to line 1 of Table 1). Tile center locations are in black with covariance ‘bubbles’ for 3 sigma in orange. b) Bubblewrap end tiling of a 3D Lorenz attractor (5% noise), where tiles are plotted similarly to (a). c) Log predictive probability across all timepoints for each comparative model for the 2D Van der Pol, 0.05 case (top) and for the 3D Lorenz, 0.05 case (bottom).

We next tested each algorithm on more complex data collected from neuroscience experiments. These data exhibited a variety of structure, from organized rotations (Figure 3a) to rapid transitions between noise clusters (Figure 3b) to slow dynamics (Figure 3c). In each case, Bubblewrap learned a tiling of the data that allowed it to equal or outperform state predictions from the comparison algorithms (Figure 3df, blue). In some cases, as with the mouse dataset, the algorithm of [30] produced predictions for xt by degenerating to a random walk model (Table 1 marked with *; Appendix D). Regardless, Bubblewrap’s tiling generated transition predictions with entropies far below those of a random walk (Figure 3df, green), indicating it successfully identified coarse structure, even in challenging datasets. Thus, even though these data are noise-dominated and lack much of the typical structure identified by neural population models, coarse-graining identifies some reliable patterns.

Figure 3: Bubblewrap results on experimental datasets.


a) Bubblewrap results for example trials (blue) from the monkey reach dataset [48, 49], projected onto the first jPCA plane. All trials are shown in gray. The tile center locations which were closest to the trajectories are plotted along with their covariance “bubbles.” Additionally, large transition probabilities from each tile center are plotted as black lines connecting the nodes. Bubblewrap learns both within-trial and across-trial transitions, as shown by the probability weights. b) Bubblewrap results on widefield calcium imaging from [50, 51], visualized with UMAP. A single trajectory comprising ≈ 1.5s of data is shown in blue. Covariance “bubbles” and transition probabilities omitted for clarity. c) Bubblewrap results when applied to videos of mouse behavior [50, 51], visualized by projection onto the first SVD plane. Blue line: 3.3s of data. d, e, f) Log predictive probability (blue) and entropy (green) over time for the respective datasets in (a,b,c). Black lines are exponential weighted moving averages of the data. Dashed green line indicates maximum entropy (log2(N)).

We additionally considered the capability of our algorithm to scale to high-dimensional or high-sampling rate data. As a case study, we considered real-time processing (including random projections, proSVD, and Bubblewrap learning) of Neuropixels data comprising 2688 units with 74,330 timepoints from 30 ms bins. As Figure 4 shows, Bubblewrap once again learns a tiling of the data manifold (a), capturing structure in the probability flow within the space (b) with predictive performance comparable to finer-grained methods (Table 1). More importantly, all these steps can be performed well within the 30 ms per sample time of the data (c). In fact, when testing on representative examples of $d = 10^4$ dimensions, 1 kHz sampling rates, or N = 20,000 tiles, our algorithm was able to maintain amortized per-sample processing times below those of data acquisition. In practice, we found that even in higher-dimensional datasets (as in the Neuropixels case), only 1–2 thousand tiles were used by the model, making it easy to run at kHz data rates. What's more, while learning involved round trip GPU latencies to perform gradient updates, online predictions using slightly stale estimates of Bubblewrap parameters could be performed far faster, in tens of microseconds.

Figure 4: High-throughput data & benchmarking.


a) Bubblewrap results for example trajectories (blue) in the Neuropixels dataset [52, 53] (data in gray) visualized with UMAP. b) Log predictive probability (blue) and entropy (green) over time. Black lines are exponential weighted moving averages of the data. Dashed green line indicates maximum entropy. c) Average cycle time (log scale) during learning or prediction (last bar) for each timepoint. Neuropixels (NP) is run as in (a,b) with no optimization and all heuristics, and Bubblewrap is easily able to learn at rates much faster than acquisition (30 ms). By turning off the global mean, covariance, and prior updates and only taking a gradient step for $\mathcal{L}$ every 30 timepoints, we are able to run at close to 1 kHz (NPb). All other bars show example timings from Van der Pol synthetic datasets optimized for speed: $10^4$ dim, where we randomly project down to 200 dimensions and use proSVD to project to 10 dimensions for subsequent Bubblewrap learning; N = 20k, 10k, and 1k nodes, showing how our algorithm scales with the number of tiles; and Prediction, showing the time cost to predict one step ahead for the N = 1k case.

Just as importantly, when used for closed loop experiments, algorithms must be able to produce predictions far enough into the future for interventions to be feasible. Thus we examined the performance of our algorithm and comparison models for predicting T steps ahead into the future. Bubblewrap allows us to efficiently calculate predictions even many time steps into the future using (6), whereas the comparison models require much costlier sampling approaches. Figure 5 shows the mean log predictive probabilities for all models many steps into the future for each experimental dataset (top row), and the entropy of the predicted transitions using Bubblewrap (bottom row). Our algorithm consistently maintains performance even when predicting 10 steps ahead, providing crucial lead time to enable interventions at specific points along the learned trajectory. In comparison, the predictive performance of [30], which initially matches or exceeds Bubblewrap for two datasets, rapidly declines, while Variational Joint Filtering [32], with lower log likelihood, also exhibits a slow decay in accuracy.

Figure 5: Multi-step ahead predictive performance.


(top) Mean log predictive probability as a function of the number of steps ahead used for prediction for each of the four experimental datasets studied. Colors indicate model. (bottom) Bubblewrap entropy as a function of the number of steps ahead used for prediction. Higher entropy indicates more uncertainty about future states. Dashed lines denote maximum entropy for each dataset (log of the number of tiles).

5. Discussion

While increasing attention has been paid in neuroscience to population hypotheses of neural function [10], and while many methods for modeling these data offline exist, surprisingly few methods function online, though presumably online methods will be needed to test some population dynamics hypotheses [17]. While the neural engineering literature has long used online methods based on Kalman filtering (e.g., [16]), and these methods are known to work well in many practical cases, they also imply strong assumptions about the evolution of activity within these systems. Thus, many studies that employ less constrained behavior or study neural activity with less robust dynamics may benefit from more flexible models that can be trained while the experiment is running.

Here, to address this need, we have introduced both a new dimension reduction method that rapidly produces stable estimates of features and a method for rapidly mapping and charting transitions on neural manifolds. Rather than focus on moment-by-moment prediction, we focus on estimating a coarse tiling and probability flow among these tiles. Thus, Bubblewrap may be less accurate than methods based on dynamical systems when state trajectories are accurately described by smooth vector fields with Gaussian noise. Conversely, when noise dominates, is multimodal, or only large-scale probability flow is discernible over longer timescales, Bubblewrap is better poised to capture these features. We saw this in our experiments, where the model of [30] exhibited better overall performance on the mouse video dataset (Figure 3f) when it did not learn to predict and degenerated to a random walk. Indeed, the most relevant comparison between the two approaches is the duality between stochastic differential equations and Fokker-Planck equations, with our model a (softly) discretized analog of the latter. Nonetheless, in many of the cases we consider, Bubblewrap produces superior results even for state prediction. Like many similar models, however, ours includes multiple hyperparameters that require setting. While we did not experience catastrophic failure or sensitive dependence on parameters in our testing, and while our methods adapt to the scale and drift of the data, some tuning was required in practice.

As detailed above, while many methods target population dynamics, and a few target closed-loop settings [31, 16, 55], very few models are capable of being trained online. Thus, the most closely related approaches are those in [30, 32], to which we provide extensive comparisons. However, these comparisons are somewhat strained by the fact that we provided all models with the same proSVD-reduced low-dimensional data, while [32] is capable of modeling high-dimensional data in its own right and [30] was targeted at inferring neural computations from dynamical systems. We thus view this work as complementary to the dynamical systems approach, one that may be preferred when small distinctions among population dynamics are less important than characterizing highly noisy, non-repeating neural behavior.

Finally, we showed that online training of Bubblewrap can be performed fast enough for even kiloHertz data acquisition rates if small latencies are tolerable and gradient steps can be performed for small numbers of samples at a time. Yet, for real-time applications, it is not training time but the time required to make predictions that is relevant, and we demonstrate prediction times of tens of microseconds. Moreover, Bubblewrap is capable of producing effective predictions multiple time steps into the future, providing ample lead time for closed-loop interventions. Thus, coarse-graining methods like ours open the door to online manipulation and steering of neural systems.

Supplementary Material

Supplementary

Acknowledgments and Disclosure of Funding

Research reported in this publication was supported by a NIH BRAIN Initiative Planning Grant (R34NS116738; JP), and a Swartz Foundation Postdoctoral Fellowship for Theory in Neuroscience (AD). AD also holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund.

Contributor Information

Anne Draelos, Biostatistics & Bioinformatics, Duke University.

Pranjal Gupta, Psychology & Neuroscience, Duke University.

Na Young Jun, Neurobiology, Duke University.

Chaichontat Sriworarat, Biomedical Engineering, Duke University.

John Pearson, Biostatistics & Bioinformatics, Electrical & Computer Engineering, Neurobiology, Psychology & Neuroscience, Duke University.

References

• [1] Ahrens Misha B, Orger Michael B, Robson Drew N, Li Jennifer M, and Keller Philipp J. Whole-brain functional imaging at cellular resolution using light-sheet microscopy. Nature Methods, 10(5):413–420, 2013.
• [2] Emiliani Valentina, Cohen Adam E, Deisseroth Karl, and Häusser Michael. All-optical interrogation of neural circuits. Journal of Neuroscience, 35(41):13917–13926, 2015.
• [3] Stevenson Ian H and Kording Konrad P. How advances in neural recording affect data analysis. Nature Neuroscience, 14(2):139–142, 2011.
• [4] Steinmetz Nicholas A, Koch Christof, Harris Kenneth D, and Carandini Matteo. Challenges and opportunities for large-scale electrophysiology with Neuropixels probes. Current Opinion in Neurobiology, 50:92–100, 2018.
• [5] Steinmetz Nicholas A, Aydin Cagatay, Lebedeva Anna, Okun Michael, Pachitariu Marius, Bauza Marius, Beau Maxime, Bhagat Jai, Böhm Claudia, Broux Martijn, et al. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. bioRxiv, 2020.
• [6] Mante Valerio, Sussillo David, Shenoy Krishna V, and Newsome William T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474):78–84, 2013.
• [7] Rajan Kanaka, Harvey Christopher D, and Tank David W. Recurrent network models of sequence generation and memory. Neuron, 90(1):128–142, 2016.
• [8] Song H Francis, Yang Guangyu R, and Wang Xiao-Jing. Training excitatory-inhibitory recurrent neural networks for cognitive tasks: a simple and flexible framework. PLoS Computational Biology, 12(2):e1004792, 2016.
• [9] Cunningham John P and Yu Byron M. Dimensionality reduction for large-scale neural recordings. Nature Neuroscience, 17(11):1500–1509, 2014.
• [10] Ebitz R Becket and Hayden Benjamin Y. The population doctrine revolution in cognitive neurophysiology. arXiv preprint arXiv:2104.00145, 2021.
• [11] Carlson Eric T, Rasquinha Russell J, Zhang Kechen, and Connor Charles E. A sparse object coding scheme in area V4. Current Biology, 21(4):288–293, 2011.
• [12] DiMattina Christopher and Zhang Kechen. Adaptive stimulus optimization for sensory systems neuroscience. Frontiers in Neural Circuits, 7:101, 2013.
• [13] Cowley Benjamin, Williamson Ryan, Clemens Katerina, Smith Matthew, and Yu Byron M. Adaptive stimulus selection for optimizing neural population responses. In Advances in Neural Information Processing Systems, volume 30, 2017.
• [14] Abbasi-Asl Reza, Chen Yuansi, Bloniarz Adam, Oliver Michael, Willmore Ben DB, Gallant Jack L, and Yu Bin. The DeepTune framework for modeling and characterizing neurons in visual cortex area V4. bioRxiv, page 465534, 2018.
• [15] Zhang Zihui, Russell Lloyd E, Packer Adam M, Gauld Oliver M, and Häusser Michael. Closed-loop all-optical interrogation of neural circuits in vivo. Nature Methods, 15(12):1037–1040, 2018.
• [16] Bolus Michael F, Willats Adam A, Rozell Christopher J, and Stanley Garrett B. State-space optimal feedback control of optogenetically driven neural activity. Journal of Neural Engineering, 2020.
• [17] Peixoto Diogo, Verhein Jessica R, Kiani Roozbeh, Kao Jonathan C, Nuyujukian Paul, Chandrasekaran Chandramouli, Brown Julian, Fong Sania, Ryu Stephen I, Shenoy Krishna V, et al. Decoding and perturbing decision states in real time. Nature, pages 1–6, 2021.
• [18] Draelos Anne and Pearson John. Online neural connectivity estimation with noisy group testing. Advances in Neural Information Processing Systems, 33, 2020.
• [19] Brand Matthew. Incremental singular value decomposition of uncertain data with missing values. In Computer Vision – ECCV 2002, pages 707–720, 2002.
• [20] Baker Christopher G. A block incremental algorithm for computing dominant singular subspaces. Master's thesis, Florida State University, 2004.
• [21] Brand Matthew. Fast low-rank modifications of the thin singular value decomposition. Linear Algebra and its Applications, 415(1):20–30, 2006.
• [22] Baker Christopher G, Gallivan Kyle A, and Van Dooren Paul. Low-rank incremental methods for computing dominant singular subspaces. Linear Algebra and its Applications, 436(8):2866–2888, 2012.
• [23] Mairal Julien, Bach Francis, Ponce Jean, and Sapiro Guillermo. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11(1), 2010.
• [24] Archer Evan, Park Il Memming, Buesing Lars, Cunningham John, and Paninski Liam. Black box variational inference for state space models. arXiv preprint arXiv:1511.07367, 2015.
• [25] Gao Yuanjun, Archer Evan, Paninski Liam, and Cunningham John P. Linear dynamical neural population models through nonlinear embeddings. arXiv preprint arXiv:1605.08454, 2016.
• [26] Pandarinath Chethan, O'Shea Daniel J, Collins Jasmine, Jozefowicz Rafal, Stavisky Sergey D, Kao Jonathan C, Trautmann Eric M, Kaufman Matthew T, Ryu Stephen I, Hochberg Leigh R, et al. Inferring single-trial neural population dynamics using sequential auto-encoders. Nature Methods, 15(10):805–815, 2018.
• [27] Linderman Scott, Johnson Matthew, Miller Andrew, Adams Ryan, Blei David, and Paninski Liam. Bayesian learning and inference in recurrent switching linear dynamical systems. In Artificial Intelligence and Statistics, pages 914–922, 2017.
• [28] Linderman Scott W, Johnson Matthew J, Miller Andrew C, Adams Ryan P, Blei David M, and Paninski Liam. Bayesian learning and inference in recurrent switching linear dynamical systems. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
• [29] Nassar Josue, Linderman Scott W, Bugallo Monica, and Park Il Memming. Tree-structured recurrent switching linear dynamical systems for multi-scale modeling. arXiv preprint arXiv:1811.12386, 2018.
• [30] Zhao Yuan and Park Il Memming. Interpretable nonlinear dynamic modeling of neural trajectories. In Lee D, Sugiyama M, Luxburg U, Guyon I, and Garnett R, editors, Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.
• [31] Yang Yuxiao, Connolly Allison T, and Shanechi Maryam M. A control-theoretic system identification framework and a real-time closed-loop clinical simulation testbed for electrical brain stimulation. Journal of Neural Engineering, 15(6):066007, 2018.
• [32] Zhao Yuan and Park Il Memming. Variational online learning of neural dynamics. Frontiers in Computational Neuroscience, 14, 2020.
• [33] Gao Peiran, Trautmann Eric, Yu Byron, Santhanam Gopal, Ryu Stephen, Shenoy Krishna, and Ganguli Surya. A theory of multineuronal dimensionality, dynamics and measurement. bioRxiv, page 214262, 2017.
• [34] Trautmann Eric M, Stavisky Sergey D, Lahiri Subhaneil, Ames Katherine C, Kaufman Matthew T, O'Shea Daniel J, Vyas Saurabh, Sun Xulu, Ryu Stephen I, Ganguli Surya, et al. Accurate estimation of neural population dynamics without spike sorting. Neuron, 103(2):292–308, 2019.
• [35] Achlioptas Dimitris. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671–687, 2003.
• [36] Li Ping, Hastie Trevor J, and Church Kenneth W. Very sparse random projections. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 287–296, 2006.
• [37] Johnson William B and Lindenstrauss Joram. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26(189–206):1, 1984.
• [38] Schönemann Peter H. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1):1–10, 1966.
• [39] Degenhart Alan D, Bishop William E, Oby Emily R, Tyler-Kabara Elizabeth C, Chase Steven M, Batista Aaron P, and Yu Byron M. Stabilization of a brain–computer interface via the alignment of low-dimensional spaces of neural activity. Nature Biomedical Engineering, 4(7):672–685, 2020.
• [40] Ross David A, Lim Jongwoo, Lin Ruei-Sung, and Yang Ming-Hsuan. Incremental learning for robust visual tracking. International Journal of Computer Vision, 77(1):125–141, 2008.
• [41] Berman Gordon J, Choi Daniel M, Bialek William, and Shaevitz Joshua W. Mapping the stereotyped behaviour of freely moving fruit flies. Journal of The Royal Society Interface, 11(99):20140672, 2014.
• [42] Berman Gordon J, Bialek William, and Shaevitz Joshua W. Predictability and hierarchy in Drosophila behavior. Proceedings of the National Academy of Sciences, 113(42):11943–11948, 2016.
• [43] Pereira Talmo D, Shaevitz Joshua W, and Murthy Mala. Quantifying behavior to understand the brain. Nature Neuroscience, pages 1–13, 2020.
• [44] Mongillo Gianluigi and Deneve Sophie. Online learning with hidden Markov models. Neural Computation, 20(7):1706–1716, 2008.
• [45] Cappé Olivier and Moulines Eric. On-line expectation–maximization algorithm for latent data models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3):593–613, 2009.
• [46] Le Corff Sylvain, Fort Gersende, et al. Online expectation maximization based algorithms for inference in hidden Markov models. Electronic Journal of Statistics, 7:763–792, 2013.
• [47] Kingma Diederik P and Ba Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
• [48] Churchland Mark. Churchland lab code. https://churchland.zuckermaninstitute.columbia.edu/content/code.
• [49] Churchland Mark M, Cunningham John P, Kaufman Matthew T, Foster Justin D, Nuyujukian Paul, Ryu Stephen I, and Shenoy Krishna V. Neural population dynamics during reaching. Nature, 487(7405):51–56, 2012.
• [50] Musall Simon, Kaufman Matthew T, Juavinett Ashley L, Gluf Steven, and Churchland Anne K. Single-trial neural dynamics are dominated by richly varied movements. Nature Neuroscience, 22(10):1677–1686, 2019.
• [51] Musall Simon, Kaufman Matthew T, Juavinett Ashley L, Gluf Steven, and Churchland Anne K. Single-trial neural dynamics are dominated by richly varied movements: dataset. Technical report, October 2019.
• [52] Steinmetz Nick, Pachitariu Marius, Stringer Carsen, Carandini Matteo, and Harris Kenneth. Eight-probe Neuropixels recordings during spontaneous behaviors, Mar 2019.
• [53] Stringer Carsen, Pachitariu Marius, Steinmetz Nicholas, Reddy Charu Bai, Carandini Matteo, and Harris Kenneth D. Spontaneous behaviors drive multidimensional, brainwide activity. Science, 364(6437), 2019.
• [54] Bradbury James, Frostig Roy, Hawkins Peter, Johnson Matthew James, Leary Chris, Maclaurin Dougal, Necula George, Paszke Adam, VanderPlas Jake, Wanderman-Milne Skye, and Zhang Qiao. JAX: composable transformations of Python+NumPy programs, 2018.
• [55] Sani Omid G, Abbaspourazad Hamidreza, Wong Yan T, Pesaran Bijan, and Shanechi Maryam M. Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification. Technical report, Nature Publishing Group, 2020.
