Abstract
Stability in recurrent neural models poses a significant challenge, particularly in developing biologically plausible neurodynamical models that can be seamlessly trained. Traditional cortical circuit models are notoriously difficult to train due to expansive nonlinearities in the dynamical system, leading to an optimization problem with nonlinear stability constraints that are difficult to impose. Conversely, recurrent neural networks (RNNs) excel in tasks involving sequential data but lack biological plausibility and interpretability. In this work, we address these challenges by linking dynamic divisive normalization (DN) to the stability of “oscillatory recurrent gated neural integrator circuits” (ORGaNICs), a biologically plausible recurrent cortical circuit model that dynamically achieves DN and that has been shown to simulate a wide range of neurophysiological phenomena. By using the indirect method of Lyapunov, we prove the remarkable property of unconditional local stability for an arbitrary-dimensional ORGaNICs circuit when the recurrent weight matrix is the identity. We thus connect ORGaNICs to a system of coupled damped harmonic oscillators, which enables us to derive the circuit’s energy function, providing a normative principle of what the circuit, and individual neurons, aim to accomplish. Further, for a generic recurrent weight matrix, we prove the stability of the 2D model and demonstrate empirically that stability holds in higher dimensions. Finally, we show that ORGaNICs can be trained by backpropagation through time without gradient clipping/scaling, thanks to their intrinsic stability and adaptive time constants, which mitigate the problems of exploding, vanishing, and oscillating gradients. By evaluating the model’s performance on RNN benchmarks, we find that ORGaNICs outperform alternative neurodynamical models on static image classification tasks and perform comparably to LSTMs on sequential tasks.
1. Introduction
Deep neural networks (DNNs) have found widespread use in modeling tasks from experimental systems neuroscience. The allure of DNN-based models lies in their ease of training and the flexibility they offer in architecting systems with desired properties [1–3]. In contrast, neurodynamical models like the Wilson-Cowan [4] or the Stabilized Supralinear Network (SSN) [5] are more biologically plausible than DNNs, but these models confront considerable training challenges due to the lack of stability guarantees for high-dimensional problems. Training recurrent neural networks (RNNs), by comparison, is more straightforward thanks to ad hoc regularization techniques like layer normalization, batch normalization, and gradient clipping/scaling, which help stabilize training without imposing strict stability constraints. Conversely, neurodynamical models require enforcing hard stability constraints while maintaining biological plausibility. In lower dimensions, it is relatively straightforward to derive constraints on model parameters that ensure a dynamically stable system [6, 7]. However, for high-dimensional systems, this becomes significantly more challenging, as integrating these hard constraints into the optimization problem is more complex [8, 9]. Stability is generally advantageous in DNNs, as it is linked to improved generalization, mitigation of exploding gradient problems, increased robustness to input noise, and simplified training techniques [10].
The divisive normalization (DN) model was developed to explain the responses of neurons in the primary visual cortex (V1) [11–14] and has since been applied to diverse cognitive processes and neural systems [15–24]. DN has consequently been proposed as a canonical neural computation [25] that is linked to many well-documented physiological [26, 27] and psychophysical [28, 29] phenomena. DN models various neural processes: adaptation [30, 31], attention [32], automatic gain control [33], decorrelation, and statistical whitening [34]. The defining characteristic of DN is that each neuron’s response is divided by a weighted sum of the activity of a pool of neurons (Eq. 2, below), much as a vector is divided by its length when it is normalized. Given this wide applicability and its ability to explain a variety of neurophysiological phenomena, we argue that DN should be central to any neurodynamical model. The Wilson-Cowan and SSN models have both been shown to produce DN-like responses [5, 35], but only approximately and only in certain parameter regimes.
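To make the defining computation concrete, here is a minimal NumPy sketch of one common form of DN, r_i = z_i² / (σ² + Σ_j w_ij z_j²). This form and the function name are our own illustrative assumptions; the model's exact normalization equation is Eq. 2.

```python
import numpy as np


def divisive_normalization(z, W, sigma):
    """Illustrative DN: squared drives divided by sigma^2 plus a weighted pool.

    z     : (n,) input drives (may be negative)
    W     : (n, n) nonnegative normalization weights
    sigma : scalar or (n,) semisaturation constant(s), > 0
    """
    energy = z**2                      # squared drive of each neuron
    pool = sigma**2 + W @ energy       # weighted activity of the normalization pool
    return energy / pool               # element-wise division


z = np.array([3.0, -1.0, 0.5])
W = np.ones((3, 3))                    # uniform pool: every neuron normalizes every neuron
r = divisive_normalization(z, W, sigma=1.0)
print(r, r.sum())
```

With a uniform pool, the normalized responses sum to E / (σ² + E), where E is the total squared drive: the sum stays below 1 and approaches that bound as the input grows relative to σ, which is the gain-control behavior DN is meant to capture.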
Normalization techniques have been widely adopted for training DNNs, where they stabilize and accelerate training and improve generalization [36–38]. Divisive normalization can be viewed as a general normalization strategy of which batch and layer normalization are special cases [39]. Models implementing DN have outperformed common normalization methods (Batch [36], Layer [37], Group [40]) in tasks such as image recognition with convolutional neural networks (CNNs) [41] and language modeling with RNNs [39, 42]. Despite the foundational role of these techniques in deep learning, their implementation is ad hoc, which limits their conceptual relevance: they serve as practical fixes for the limitations of current machine learning frameworks rather than offering principled insights derived from an understanding of cortical circuits.
It has been proposed that DN is achieved via a recurrent circuit [11, 13, 43–47]. Oscillatory recurrent gated neural integrator circuits (ORGaNICs) are rate-based recurrent neural circuit models that implement DN dynamically via recurrent amplification [47, 48]. Since ORGaNICs’ response follows the DN equation at steady-state, its steady-state response captures the full range of aforementioned neural phenomena explained by DN [11–34]. ORGaNICs have further been shown to simulate key time-dependent neurophysiological and cognitive/perceptual phenomena under realistic biophysical constraints [47, 48]. Additional phenomena not explained by DN [49] can in principle be integrated into the model. In this paper, however, we focus on the effects of DN on the dynamical stability of ORGaNICs. Despite some empirical evidence that ORGaNICs are highly robust, the question of whether the model is stable for arbitrary parameter choices, and thus whether it can be robustly trained on ML tasks by backpropagation-through-time (BPTT), remains open.
Here, we establish the unconditional stability (valid for all parameters and inputs) of a multidimensional two-neuron-type ORGaNICs model when the recurrent weight matrix is the identity. We prove this result in Section 4 by the indirect method of Lyapunov: we perform a linear stability analysis around the model’s analytically known normalization fixed point and reduce the stability problem to that of a high-dimensional mechanical system, whose stability is determined by a tractable quadratic eigenvalue problem. We then address the stability of the model with an arbitrary recurrent weight matrix in Section 5. Although the indirect method of Lyapunov becomes intractable for such a system, we prove unconditional stability for a two-dimensional circuit with an arbitrary recurrent weight and offer empirical evidence of stability for high-dimensional systems.
ORGaNICs can be viewed as biophysically plausible extensions of Long Short-Term Memory units (LSTMs) [3] and Gated Recurrent Units (GRUs) [50], RNN architectures that have been widely used in ML applications [3, 51–54]. The main differences are that ORGaNICs operate in continuous time and have built-in dynamic normalization (via recurrent gain modulation) and built-in attention (via input gain modulation). Thus, we expect ORGaNICs to be able to solve relatively sophisticated tasks [47]. Here, we demonstrate (Section 6) that, by virtue of their intrinsic stability, ORGaNICs can be trained on sequence modeling tasks by BPTT in the same manner as traditional RNNs, despite implementing power-law activations [5] (unlike SSNs, which instead require costly specialized training strategies [55]). Moreover, we show that ORGaNICs trained by naive BPTT (i.e., without gradient clipping/scaling or other ad hoc strategies) achieve performance comparable to LSTMs on the tasks that we consider, without systematic hyperparameter tuning.
2. Related Work
Trainable biologically plausible neurodynamical models:
There have been several attempts to develop neurodynamical models that mimic the function of biological circuits and that can be trained on cognitive tasks. Song et al. [56] incorporated Dale’s law into the vanilla RNN architecture, which was successfully trained across a variety of cognitive tasks. Building on this, Soo et al. [57] developed a technique for such RNNs to learn long-term dependencies by using skip connections through time. ORGaNICs are already built on biological principles and can learn long-term dependencies intrinsically by tuning their (intrinsic or effective) time constants; they therefore do not require the method used in [57]. Soo et al. [55] introduced a training methodology (dynamics-neutral growth) for SSNs and demonstrated its utility for tasks involving static (time-independent) stimuli. However, this training approach is costly and difficult to scale (because SSNs, unlike ORGaNICs, are not unconditionally stable), and its applicability to tasks with dynamically changing inputs remains unclear.
Dynamical systems view of RNNs:
The stability of continuous-time RNNs has been studied extensively and is reviewed comprehensively by Zhang et al. [58]. Recent advances have focused on designing architectures that address the issues of vanishing and exploding gradients, thereby enhancing trainability and performance. A central idea in these designs is to improve trainability and generalization by ensuring the dynamical stability of the network. Moreover, to avoid vanishing gradients, the key idea is to constrain the real parts of the eigenvalues of the linearized dynamical system to be close to zero, which facilitates the propagation and retention of information over long time spans. Chang et al. [59] and Erichson et al. [60] achieve this by imposing an antisymmetric constraint on the recurrent weight matrix. Meanwhile, Rusch et al. [61, 62] propose an architecture based on coupled damped harmonic oscillators, resulting in a second-order system of ordinary differential equations that behaves like ORGaNICs in the vicinity of the normalization fixed point, as we show in Section 4. Despite their impressive performance on various sequential data benchmarks, these models lack biological plausibility due to their use of saturating nonlinearities (instead of normalization) and unrealistic weight parameterizations.
3. Model description
In its simplest form, the two-neuron-type ORGaNICs model [47, 48] with n neurons of each type can be written as,
| (1) |
where and are the membrane potentials (relative to an arbitrary threshold potential that we take to be 0) of the excitatory and inhibitory neurons, evolving according to the dynamical equations defined above with and denoting the time derivatives. The notation denotes element-wise multiplication of vectors, and squaring, rectification, square-root, and division are also performed element-wise. is an n-dimensional vector with all entries equal to 1. is the input drive to the circuit and is a weighted sum of the input, , i.e., . The firing rates, and are rectified power functions of the underlying membrane potentials. For the derivation of a general model with arbitrary power-law exponents, including Eq. 1, see Appendix A. Note that the term provides a mechanism for reconstructing the membrane potential (which can be negative, depending on the sign of the input) from the firing rates, which are strictly nonnegative. and are the firing rates of neurons with complementary receptive fields such that they encode inputs with positive and negative signs, respectively. At any given time, at most one of these two neurons fires. In ORGaNICs, these neurons share a single dynamical equation for their membrane potentials, where the sign of indicates which neuron is active. Neurons with such complementary (anti-phase) receptive fields are found adjacent to each other in the visual cortex [63], and we hypothesize that such complementary neurons are ubiquitous throughout the neocortex. and are the input gains for the external inputs and fed to neurons and , respectively. is the set of positive real numbers, determines the semisaturation of the responses of neurons by contributing to the depolarization of neurons . and represent the time constants of and neurons.
In addition to receiving external inputs, both and neurons receive recurrent inputs, represented by the last term in both of the equations. is the recurrent weight matrix that captures lateral connections between the neurons. This recurrent input is gated by the neurons, via the term . Similarly, the nonnegative normalization weight matrix, , encapsulates the recurrent inputs received by the neurons. The differential equations are designed in such a way that when and (i.e., with all elements equal to a constant ), the principal neurons follow the normalization equation exactly (and approximately when ) at steady-state,
| (2) |
and represent the contribution of neurons with complementary receptive fields to the normalization pool, and is the contrast energy of the input. Note that the recurrent gain, , is a particular nonlinear function of the output responses/activation designed to achieve DN, while the input gain, , is an input gate that can implement an attention mechanism.
4. Stability analysis of high-dimensional model with identity recurrent weights
We consider the stability of the general high-dimensional ORGaNICs (Eq. 1) when the recurrent weight matrix is identity, . We first simplify the dynamical system by noting that and yielding the following equations,
| (3) |
For identity recurrent weights, we have a unique fixed point, given by,
| (4) |
Since the normalization weights in the matrix are nonnegative, at steady-state we have , so that , and the corresponding firing rates at steady-state are,
| (5) |
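The analytic fixed point can be checked numerically. The displayed Eqs. 3-5 are not reproduced in this text, so the sketch below assumes our own reading of the identity-recurrence dynamics: τ_y ⊙ dy/dt = b ⊙ z - √a ⊙ y and τ_a ⊙ da/dt = -a + b0² ⊙ σ² + W(y² ⊙ a), with rates y± = ⌊±y⌋². Under that assumption the fixed point is a_s = b0² ⊙ σ² + W(b² ⊙ z²) and y_s = b ⊙ z / √a_s. The code verifies that both right-hand sides vanish there and that y_s² reduces to a DN form when b = b0 is a single constant. The parameter values and the helper name `rhs` are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
W = rng.uniform(0.0, 1.0, (n, n))   # nonnegative normalization weights
b = rng.uniform(0.5, 2.0, n)        # input gains of the y neurons
b0 = rng.uniform(0.5, 2.0, n)       # input gains of the a neurons
sigma = rng.uniform(0.1, 1.0, n)    # semisaturation constants
z = rng.normal(size=n)              # input drive, either sign


def rhs(y, a):
    """tau-scaled right-hand sides of the ASSUMED identity-recurrence dynamics."""
    dy = b * z - np.sqrt(a) * y
    da = -a + b0**2 * sigma**2 + W @ (y**2 * a)
    return dy, da


# Closed-form fixed point of the assumed dynamics.
a_s = b0**2 * sigma**2 + W @ (b**2 * z**2)
y_s = b * z / np.sqrt(a_s)
dy, da = rhs(y_s, a_s)

# Rectified-square firing rates of the complementary pair.
y_plus = np.maximum(y_s, 0.0) ** 2
y_minus = np.maximum(-y_s, 0.0) ** 2

# With b = b0 = c (one constant), y_s^2 reduces to z^2 / (sigma^2 + W z^2).
c = 1.7
a_c = c**2 * sigma**2 + W @ (c**2 * z**2)
y_c_sq = (c * z) ** 2 / a_c
dn = z**2 / (sigma**2 + W @ z**2)
print(np.abs(dy).max(), np.abs(da).max(), np.abs(y_c_sq - dn).max())
```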
Note that we recover the normalization equation, Eq. 2, if . Since the fixed points of and a neurons are known analytically, to prove that this fixed point is locally asymptotically stable (i.e., responses that start sufficiently close to the fixed point converge to it), we apply the indirect method of Lyapunov at this fixed point [64]. This method allows us to analyze the stability of the nonlinear system in the vicinity of the fixed point by studying the corresponding linearized system. The Jacobian matrix about , defining the linearized system, is given by,
| (6) |
where is a diagonal matrix of appropriate size with the elements of the vector on the diagonal. A sufficient condition for local asymptotic stability is that all eigenvalues of this matrix have negative real parts (conversely, a single eigenvalue with a positive real part makes the fixed point unstable). We thus proceed by computing the characteristic polynomial of the Jacobian, . The roots of this polynomial, found by setting , are the eigenvalues of the system. Consider the block matrix,
| (7) |
Notice that and are diagonal and therefore they commute, i.e., , so we have that which is a property of the determinant of block matrices with commuting blocks [65]. Therefore, the characteristic polynomial of the linearized system after expansion of the terms and simplification is given by,
| (8) |
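The block-determinant identity behind Eq. 8 holds whenever the two top blocks commute: if A is invertible, det[[A, B], [C, D]] = det(A) det(D - C A⁻¹ B) = det((D - C B A⁻¹) A) = det(D A - C B), using A⁻¹B = B A⁻¹, and the singular case follows by continuity. The short NumPy check below, with arbitrary random blocks, illustrates it for diagonal (hence commuting) top blocks, as in the Jacobian.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = np.diag(rng.normal(size=n))        # diagonal top-left block
B = np.diag(rng.normal(size=n))        # diagonal top-right block, so A @ B == B @ A
C = rng.normal(size=(n, n))            # bottom blocks need no special structure
D = rng.normal(size=(n, n))

lhs = np.linalg.det(np.block([[A, B], [C, D]]))
rhs = np.linalg.det(D @ A - C @ B)
print(lhs, rhs)
```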
Finding the roots of this polynomial is thus a quadratic eigenvalue problem of the form , which has been studied extensively [66–69]. can be interpreted as the characteristic polynomial associated with a system of linear second-order differential equations with constant coefficients of the form . Therefore, proving the stability of our system (i.e., for ) is equivalent to proving the asymptotic stability of .
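The quadratic eigenvalue problem can be solved by the standard companion linearization: λ is a root of det(λ²I + λB + K) exactly when it is an eigenvalue of the 2n × 2n matrix [[0, I], [-K, -B]]. Because the displayed Eqs. 8, 10, and 11 are not reproduced here, the sketch assumes damping and stiffness matrices derived from our own reading of the identity-recurrence dynamics (τ_y ⊙ dy/dt = b ⊙ z - √a ⊙ y, τ_a ⊙ da/dt = -a + b0² ⊙ σ² + W(y² ⊙ a)), namely B = D(√a/τ_y) + D(1/τ_a)(I - W D(y²)) and K = D(√a/(τ_y ⊙ τ_a)) at the fixed point, where D(v) is the diagonal matrix of v. It checks that the companion eigenvalues coincide with those of the corresponding Jacobian.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
W = rng.uniform(0.0, 1.0, (n, n))
b, b0 = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)
sigma, z = rng.uniform(0.1, 1.0, n), rng.normal(size=n)
tau_y, tau_a = rng.uniform(0.5, 5.0, n), rng.uniform(0.5, 5.0, n)
D, I = np.diag, np.eye(n)

# Closed-form fixed point of the ASSUMED identity-recurrence dynamics.
a = b0**2 * sigma**2 + W @ (b**2 * z**2)
y = b * z / np.sqrt(a)

# Jacobian of the assumed dynamics at that fixed point, in (y, a) block form.
J = np.block([[-D(np.sqrt(a) / tau_y), -D(y / (2.0 * np.sqrt(a) * tau_y))],
              [D(1.0 / tau_a) @ W @ D(2.0 * y * a), D(1.0 / tau_a) @ (-I + W @ D(y**2))]])

# Damping (B) and stiffness (K) matrices of det(lambda^2 I + lambda B + K) = 0.
B = D(np.sqrt(a) / tau_y) + D(1.0 / tau_a) @ (I - W @ D(y**2))
K = D(np.sqrt(a) / (tau_y * tau_a))

# Companion linearization: state (x, dx/dt) of x'' + B x' + K x = 0.
L = np.block([[np.zeros((n, n)), I], [-K, -B]])
lam_qep = np.linalg.eigvals(L)
lam_jac = np.linalg.eigvals(J)

# Distance from each Jacobian eigenvalue to the nearest QEP eigenvalue.
mismatch = max(np.abs(lam_qep - lam).min() for lam in lam_jac)
print(f"max mismatch: {mismatch:.2e}, max Re: {lam_qep.real.max():.4f}")
```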
Tisseur et al. [67] and Kirillov et al. [69] list a set of constraints on the damping matrix, and stiffness matrix, that yield a stable system, but they are not directly applicable to our system. In the context of a high-dimensional mechanical system, our system falls under the category of gyroscopically stabilized systems with indefinite damping. Few results are known about the conditions leading to the stability of such systems. By constructing a Lyapunov function, we prove (Appendix B) the following stability theorem that is directly applicable to our system, following an approach similar to Kliem et al. [70].
Theorem 4.1. For a system of linear differential equations with constant coefficients of the form,
| (9) |
where and is a positive diagonal matrix (hence ), the dynamical system is globally asymptotically stable if is Lyapunov diagonally stable.
Since the stiffness matrix,
| (10) |
is a positive diagonal matrix, a sufficient condition for stability of the system is that the damping matrix, , given by,
| (11) |
is Lyapunov diagonally stable, i.e., there exists a positive definite diagonal matrix , such that is positive definite.
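For a nonsingular M-matrix, which is the case relevant here, Lyapunov diagonal stability can be certified constructively. A classical construction (our choice for illustration, not necessarily the proof technique of the appendices) takes x = B⁻¹1 and y = B⁻ᵀ1, both entrywise positive, and sets P = diag(y ⊘ x); then P B + Bᵀ P is positive definite. The sketch builds an arbitrary M-matrix B = sI - N with N ≥ 0 and s larger than the spectral radius of N, and checks the result; the helper name is hypothetical.

```python
import numpy as np


def diagonal_lyapunov_scaling(B):
    """Positive diagonal P with P @ B + B.T @ P positive definite, for a nonsingular M-matrix B."""
    ones = np.ones(B.shape[0])
    x = np.linalg.solve(B, ones)       # B^{-1} 1, entrywise positive for an M-matrix
    y = np.linalg.solve(B.T, ones)     # B^{-T} 1, entrywise positive as well
    if np.any(x <= 0) or np.any(y <= 0):
        raise ValueError("B does not behave like a nonsingular M-matrix")
    return np.diag(y / x)


rng = np.random.default_rng(3)
n = 6
N = rng.uniform(0.0, 1.0, (n, n))                 # nonnegative part
s = 1.1 * np.max(np.abs(np.linalg.eigvals(N)))    # any s above the spectral radius of N
B = s * np.eye(n) - N                             # Z-matrix that is a nonsingular M-matrix
P = diagonal_lyapunov_scaling(B)
S = P @ B + B.T @ P
print(np.diag(P), np.linalg.eigvalsh(S).min())
```

The construction works because X(PB + BᵀP)X with X = diag(x) is a symmetric Z-matrix whose row sums equal x + y > 0, so it is strictly diagonally dominant with a positive diagonal and hence positive definite.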
Since all of the parameters are positive and the weights in the matrix are nonnegative, we can conclude the following: and are positive diagonal matrices, and is a matrix with all nonnegative entries (that may or may not be symmetric). Therefore, is a Z-matrix, meaning that its off-diagonal entries are nonpositive. Further, a Z-matrix is Lyapunov diagonally stable if and only if it is a nonsingular M-matrix. Intuitively, M-matrices are matrices with nonpositive off-diagonal elements and “large enough” positive diagonal entries. Berman & Plemmons [71] list 50 equivalent definitions of nonsingular M-matrices; we use the one best suited to our problem:
Theorem 4.2. (Chapter 6, Theorem 2.3 from [71]) A Z-matrix is Lyapunov diagonally stable if and only if there exists a convergent regular splitting of the matrix, that is, it has a representation of the form , where and have all nonnegative entries and has a spectral radius smaller than 1.
We now show that, indeed, has a convergent regular splitting for all combinations of the circuit parameters and for all inputs. We have already shown that is a Z-matrix; therefore, the first condition of the theorem is satisfied. Next, we consider the splitting with and . Since and are positive diagonal matrices, is nonnegative, while is also nonnegative because has all nonnegative entries. Therefore, the only condition left to satisfy is that the spectral radius of is smaller than 1, i.e., that the matrix is convergent.
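The splitting can also be checked numerically. As our own reading of Eqs. 11-12 (not reproduced here), the sketch assumes that the damping matrix is B = M - N with M = D(√a/τ_y + 1/τ_a) and N = D(1/τ_a) W D(y²), evaluated at the fixed point a_s = b0² ⊙ σ² + W(b² ⊙ z²), y_s² = b² ⊙ z² / a_s, where D(v) is the diagonal matrix of v. It checks that B is a Z-matrix and that M⁻¹N has spectral radius below 1 for one random parameter draw.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
W = rng.uniform(0.0, 1.0, (n, n))                  # nonnegative normalization weights
b, b0 = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)
sigma, z = rng.uniform(0.1, 1.0, n), rng.normal(size=n)
tau_y, tau_a = rng.uniform(0.5, 5.0, n), rng.uniform(0.5, 5.0, n)
D = np.diag

a = b0**2 * sigma**2 + W @ (b**2 * z**2)           # assumed fixed point of a
y2 = (b * z) ** 2 / a                              # assumed fixed point of y, squared

M = D(np.sqrt(a) / tau_y + 1.0 / tau_a)            # positive diagonal, so M^{-1} >= 0
N = D(1.0 / tau_a) @ W @ D(y2)                     # entrywise nonnegative
B = M - N                                          # assumed damping matrix

off_diag = B[~np.eye(n, dtype=bool)]               # Z-matrix test: all <= 0
rho = np.max(np.abs(np.linalg.eigvals(np.linalg.solve(M, N))))
print(f"max off-diagonal: {off_diag.max():.3f}, spectral radius of M^-1 N: {rho:.3f}")
```

Under these assumptions there is also a short argument for the bound: M⁻¹N = D(τ_y ⊘ (τ_y + τ_a ⊙ √a_s)) W D(b² ⊙ z² ⊘ a_s) has the same eigenvalues as D(τ_y ⊘ ((τ_y + τ_a ⊙ √a_s) ⊙ a_s)) W D(b² ⊙ z²), a nonnegative matrix whose i-th row sum is at most (W(b² ⊙ z²))_i / a_s,i < 1, and the spectral radius of a nonnegative matrix never exceeds its largest row sum.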
The matrix can be written as,
| (12) |
We prove the following theorem (Appendix D) which directly applies to ,
Theorem 4.3. A matrix of the form is convergent (i.e., its spectral radius is less than 1), if and satisfy and for all i, j.
Defining and , it can be seen that they satisfy the constraints of the theorem, and thus is convergent. This implies that has a convergent regular splitting and, as a result, the linearized dynamical system is unconditionally globally asymptotically stable for all values of the parameters and inputs. Further, the global asymptotic stability of the linearization implies the local asymptotic stability of the normalization fixed point of ORGaNICs.
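The conclusion that every eigenvalue of the linearization has a negative real part can be spot-checked over random parameter draws, including neuron-specific time constants. The sketch assumes our own reading of the identity-recurrence dynamics (τ_y ⊙ dy/dt = b ⊙ z - √a ⊙ y, τ_a ⊙ da/dt = -a + b0² ⊙ σ² + W(y² ⊙ a); not the displayed Eq. 3 itself). It writes a candidate blockwise Jacobian at the closed-form fixed point, confirms it against central finite differences of the vector field, and records the largest real part of its eigenvalues. The helper names (make_params, analytic_jacobian, and so on) are hypothetical.

```python
import numpy as np


def make_params(n, seed):
    """Random positive parameters; normalization weights are nonnegative."""
    rng = np.random.default_rng(seed)
    return dict(W=rng.uniform(0.0, 1.0, (n, n)),
                b=rng.uniform(0.5, 2.0, n), b0=rng.uniform(0.5, 2.0, n),
                sigma=rng.uniform(0.1, 1.0, n), z=rng.normal(size=n),
                tau_y=rng.uniform(0.5, 5.0, n), tau_a=rng.uniform(0.5, 5.0, n))


def vector_field(x, p):
    """ASSUMED identity-recurrence dynamics; state x = (y, a)."""
    n = len(p["b"])
    y, a = x[:n], x[n:]
    dy = (p["b"] * p["z"] - np.sqrt(a) * y) / p["tau_y"]
    da = (-a + p["b0"]**2 * p["sigma"]**2 + p["W"] @ (y**2 * a)) / p["tau_a"]
    return np.concatenate([dy, da])


def analytic_jacobian(p):
    """Jacobian at the closed-form fixed point, written blockwise."""
    W, ty, ta, D = p["W"], p["tau_y"], p["tau_a"], np.diag
    a = p["b0"]**2 * p["sigma"]**2 + W @ (p["b"]**2 * p["z"]**2)
    y = p["b"] * p["z"] / np.sqrt(a)
    J = np.block([
        [-D(np.sqrt(a) / ty), -D(y / (2.0 * np.sqrt(a) * ty))],
        [D(1.0 / ta) @ W @ D(2.0 * y * a), D(1.0 / ta) @ (-np.eye(len(a)) + W @ D(y**2))],
    ])
    return J, np.concatenate([y, a])


def finite_difference_jacobian(f, x, eps=1e-6):
    """Central differences, one column per state variable."""
    cols = [(f(x + eps * e) - f(x - eps * e)) / (2.0 * eps) for e in np.eye(len(x))]
    return np.stack(cols, axis=1)


max_real_parts, fd_errors = [], []
for seed in range(20):
    p = make_params(n=6, seed=seed)
    J, x_s = analytic_jacobian(p)
    J_fd = finite_difference_jacobian(lambda x: vector_field(x, p), x_s)
    fd_errors.append(np.abs(J - J_fd).max())
    max_real_parts.append(np.linalg.eigvals(J).real.max())
print(max(fd_errors), max(max_real_parts))
```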
This result holds even when the neurons have different time constants, regardless of their type, as no assumptions were made about the time constants. This finding is significant for machine learning, particularly for designing architectures based on ORGaNICs. It allows neurons/units to integrate information at varying time scales while maintaining a stable circuit that performs normalization dynamically. Moreover, analytical expressions for eigenvalues can be obtained in the following case,
Theorem 4.4. Let , the normalization matrix be given by , where is the all-ones matrix, and the parameters are scalars, i.e., , and . Under these conditions, the eigenvalues of the system admit closed form solutions (detailed in Appendix C).
This result is particularly useful for neuroscience because it elucidates the connection between the ORGaNICs parameters and the strength and frequency of oscillatory activity. Because we proved Theorem 4.1 with a direct Lyapunov approach (Appendix B), we can also derive an energy (i.e., a Lyapunov function) for ORGaNICs (Appendix H).
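For the symmetric setting of Theorem 4.4, a closed form can be sketched directly under assumptions of our own: identity recurrence with the assumed dynamics τ_y dy/dt = b z - √a ⊙ y and τ_a da/dt = -a + b0²σ² + W(y² ⊙ a), scalar parameters, and W = w 11ᵀ. Then a_s is the same for every neuron, the stiffness matrix is kI with k = √a_s/(τ_y τ_a), and the damping matrix is cI minus a rank-one term, with c = √a_s/τ_y + 1/τ_a. Its eigenvalues are therefore c (repeated n - 1 times) and c - (w/τ_a) Σ_j y_j², and each damping eigenvalue β contributes the two roots of λ² + βλ + k = 0. This is our derivation, not the expressions of Appendix C; the code compares it with the eigenvalues of the full Jacobian.

```python
import numpy as np

n, w = 5, 0.7                                   # W = w * (all-ones matrix)
tau_y, tau_a, b, b0, sigma = 1.0, 0.5, 1.5, 1.0, 0.4
z = np.array([1.0, -0.5, 2.0, 0.3, -1.2])

a = b0**2 * sigma**2 + w * b**2 * np.sum(z**2)  # identical for every neuron
y = b * z / np.sqrt(a)
c = np.sqrt(a) / tau_y + 1.0 / tau_a            # diagonal part of the damping matrix
k = np.sqrt(a) / (tau_y * tau_a)                # stiffness (a multiple of I)
beta_rank1 = c - (w / tau_a) * np.sum(y**2)     # damping eigenvalue of the rank-one mode


def quadratic_roots(beta):
    """Roots of lambda^2 + beta * lambda + k = 0."""
    disc = np.sqrt(complex(beta**2 - 4.0 * k))
    return [(-beta + disc) / 2.0, (-beta - disc) / 2.0]


closed_form = quadratic_roots(c) * (n - 1) + quadratic_roots(beta_rank1)

# Numerical check against the full 2n x 2n Jacobian of the assumed dynamics.
D, ones = np.diag, np.ones(n)
W = w * np.ones((n, n))
J = np.block([
    [-D(np.sqrt(a) / tau_y * ones), -D(y / (2.0 * np.sqrt(a) * tau_y))],
    [W @ D(2.0 * y * a) / tau_a, (-np.eye(n) + W @ D(y**2)) / tau_a],
])
numeric = np.linalg.eigvals(J)
mismatch = max(np.abs(numeric - lam).min() for lam in closed_form)
print(quadratic_roots(beta_rank1), mismatch)
```

Because c² - 4k = (√a_s/τ_y - 1/τ_a)², the repeated pair is simply -√a_s/τ_y and -1/τ_a, both real; in this assumed setting, oscillations come only from the rank-one mode, with angular frequency √(4k - β²)/2 whenever β² < 4k.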
Theorem 4.5. When , the energy (Lyapunov function) minimized by ORGaNICs in the vicinity of the normalization fixed point, is given by,
| (13) |
where are the diagonal entries of and are the steady-state values of neurons .
Specifically, for a two-dimensional model (one neuron and one neuron) this expression simplifies to reveal that ORGaNICs behave like a damped harmonic oscillator with energy,
| (14) |
This result demonstrates that ORGaNICs minimize the residual of the instantaneously reconstructed gated input drive , while also ensuring that the principal neuron’s response, , achieves DN. The balance between these objectives is governed by the parameters and the external input strength. With fixed parameters, weaker inputs, , cause the model to prioritize input matching over normalization, whereas stronger inputs increasingly engage the normalization objective.
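The damped-oscillator picture can be illustrated with the textbook energy E = ½ẋ² + ½kx² of ẍ + βẋ + kx = 0, which satisfies dE/dt = -βẋ² ≤ 0 whenever β > 0. This is not the paper's Eq. 14: the damping β = √a/τ_y + (1 - w y²)/τ_a and stiffness k = √a/(τ_y τ_a) below come from our assumed reading of the scalar (one y neuron, one a neuron) identity-recurrence model at its fixed point, with arbitrary parameter values. The trajectory is computed exactly from the eigendecomposition of the companion matrix, so the monotone decay of E is not an artifact of a numerical integrator.

```python
import numpy as np

# Scalar parameters (one y and one a neuron); values are arbitrary.
tau_y, tau_a, b, b0, sigma, z, w = 1.0, 2.0, 1.0, 1.0, 0.5, 1.0, 1.0
a = b0**2 * sigma**2 + w * b**2 * z**2               # fixed point of a (assumed model)
y2 = b**2 * z**2 / a                                 # squared fixed point of y
beta = np.sqrt(a) / tau_y + (1.0 - w * y2) / tau_a   # damping, > 0
k = np.sqrt(a) / (tau_y * tau_a)                     # stiffness, > 0

# x'' + beta x' + k x = 0 as a first-order system in s = (x, x').
A = np.array([[0.0, 1.0], [-k, -beta]])
evals, V = np.linalg.eig(A)
c0 = np.linalg.solve(V, np.array([1.0, 0.0]))        # start displaced, at rest

# Exact solution s(t) = V diag(exp(evals t)) V^{-1} s(0), sampled on a grid.
t = np.linspace(0.0, 20.0, 2001)
traj = np.real((c0 * np.exp(np.outer(t, evals))) @ V.T)
x, v = traj[:, 0], traj[:, 1]
E = 0.5 * v**2 + 0.5 * k * x**2                      # oscillator energy
print(f"E(0) = {E[0]:.4f}, E(20) = {E[-1]:.2e}")
```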
5. Stability analysis for arbitrary recurrent weights
Now, we relax the constraint that the recurrent weight matrix must be the identity, allowing , and examine how the stability result changes. This leads to the following set of equations,
| (15) |
The linear stability analysis becomes intractable for a general because we no longer have a closed-form analytical expression for the steady states of and . Additionally, the characteristic polynomial cannot be expressed in a way similar to Eq. 8. Nevertheless, for a two-dimensional system,
| (16) |
we can prove the following, with a detailed analysis provided in Appendix E.
Theorem 5.1. Given that the recurrence is contracting, i.e., , when there exists a unique fixed point with and , and it is asymptotically stable.
Theorem 5.2. Given that the recurrence is expansive, i.e., , there are either 1 or 3 fixed points of which at least one is asymptotically stable. When there exists exactly 1 fixed point with and , and it is asymptotically stable. If , there are no additional fixed points. If , there exist either 0 or 2 additional fixed points with and whose stability cannot be guaranteed.
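The fixed points of the 2D model can be located numerically. The sketch assumes, as our own reading of Eq. 16 (not reproduced here), the scalar dynamics τ_y dy/dt = -y + b z + (1 - √a) w_r y and τ_a da/dt = -a + b0²σ² + w y² a for a > 0. Setting da/dt = 0 gives a = b0²σ²/(1 - w y²), which requires w y² < 1, so the fixed points are the roots of a scalar function of y on that interval. The code brackets sign changes on a grid, refines them by bisection, and checks each fixed point's Jacobian; the function names are hypothetical. Grid-based bracketing can miss a pair of roots that falls between two grid points, so treat this as an illustration rather than a proof.

```python
import numpy as np


def fixed_points_2d(wr, w=1.0, b=1.0, z=1.0, b0=1.0, sigma=1.0, n_grid=20001):
    """Fixed points (y, a) of the ASSUMED 2D model (Eq. 16 reading); needs w > 0."""
    def a_of(y):                                   # from da/dt = 0
        return b0**2 * sigma**2 / (1.0 - w * y**2)

    def g(y):                                      # tau_y * dy/dt with a = a_of(y)
        return -y + b * z + (1.0 - np.sqrt(a_of(y))) * wr * y

    y_max = (1.0 - 1e-9) / np.sqrt(w)              # a > 0 requires w * y^2 < 1
    grid = np.linspace(-y_max, y_max, n_grid)
    vals = g(grid)
    roots = []
    for i in np.nonzero(vals[:-1] * vals[1:] < 0)[0]:
        lo, hi = grid[i], grid[i + 1]
        for _ in range(60):                        # bisection on the bracket
            mid = 0.5 * (lo + hi)
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return [(y, a_of(y)) for y in roots]


def jacobian_2d(y, a, wr, w=1.0, tau_y=1.0, tau_a=1.0):
    """Jacobian of the ASSUMED 2D dynamics at (y, a), with a > 0."""
    return np.array([
        [(-1.0 + wr * (1.0 - np.sqrt(a))) / tau_y, -wr * y / (2.0 * np.sqrt(a) * tau_y)],
        [2.0 * w * y * a / tau_a, (-1.0 + w * y**2) / tau_a],
    ])


for wr in (0.5, 2.0):                              # contracting and expansive recurrence
    for y, a in fixed_points_2d(wr):
        eig = np.linalg.eigvals(jacobian_2d(y, a, wr))
        print(f"w_r={wr}: y={y:.4f}, a={a:.4f}, max Re(eig)={eig.real.max():.4f}")
```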
We plot the phase portraits for these different cases in Fig. 1. The key takeaway is that there is always a fixed point with and having the same sign as . This fixed point is asymptotically stable regardless of the value of . Based on these results and the proven stability of arbitrary-dimensional ORGaNICs when (as shown in Section 4), we conjecture the following.
Conjecture 5.3. Consider high-dimensional ORGaNICs with an arbitrary recurrent weight matrix and no constraints on the remaining parameters. If the norm of the input drive satisfies , and the maximum singular value of is constrained to be 1, then the system possesses at least one asymptotically stable fixed point.
This conjecture is supported by empirical evidence: ORGaNICs initialized with random parameters and inputs under these constraints were stable in 100% of trials (Fig. 4). We further speculate that ORGaNICs may typically be stable beyond this regime, as 100% of trials still yield a stable circuit when the constraint on the maximum singular value of is increased to 2, but the circuit becomes unstable when it is increased to 3.
Figure 1: Phase portraits for 2D ORGaNICs with positive input drive.
We plot the phase portraits of 2D ORGaNICs in the vicinity of the stable fixed points for contractive (a, d) and expansive (b, c, e, f) recurrence scalar . A stable fixed point always exists, regardless of the parameter values. (a-c), The main model (Eq. 16). (d-f), The rectified model (Eq. 102). Red stars and black circles indicate stable and unstable fixed points, respectively. The parameters for all plots are: , and . For (a) & (d), the parameters are ; for (b) & (e), ; and for (c) & (f), .

6. Experiments
We provide further empirical evidence in support of Conjecture 5.3, i.e., that ORGaNICs are asymptotically stable, by showing that stability is preserved when training ORGaNICs with naïve BPTT on two different tasks: 1) static classification of MNIST and 2) sequential classification of pixel-by-pixel MNIST. Because these ML tasks have no relevance to neurobiological or cognitive processes, we relax one aspect of the biological plausibility of ORGaNICs: we allow arbitrary (learned) nonnegative values for the intrinsic time constants.¹
6.1. Static input classification task
We first show that we can train ORGaNICs on the MNIST handwritten digit dataset [72] presented to the circuit as a static input. This setting corresponds to evolving the responses of the neurons until they reach a fixed point and using the steady-state firing rates of the principal neurons to predict the labels, akin to deep equilibrium models [73]. While the fixed point of the circuit is known analytically when (given by Eq. 89), we allow to be learnable and parameterize it to have a maximum singular value of 1. This constraint allows us to find the fixed-point responses of all the neurons without simulation, using a fixed-point iteration scheme (Algorithm 1) that converges to high accuracy in fewer than 5 steps (Figs. 4 and 5). In Appendix G, we provide intuition for why this algorithm works, along with empirical evidence of fast convergence.
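One simple way to keep the largest singular value of the recurrent weight matrix at 1, assumed here for illustration because the parameterization is not spelled out in this section, is to divide an unconstrained parameter matrix by its spectral norm (`np.linalg.norm` with `ord=2` returns the largest singular value). A side effect is that every eigenvalue then has magnitude at most 1. The function name is hypothetical.

```python
import numpy as np


def unit_spectral_norm(V):
    """Hypothetical parameterization: rescale V so its largest singular value is 1."""
    return V / np.linalg.norm(V, ord=2)            # ord=2: largest singular value


rng = np.random.default_rng(5)
V = rng.normal(size=(50, 50))                      # unconstrained trainable parameters
W_r = unit_spectral_norm(V)
singular_values = np.linalg.svd(W_r, compute_uv=False)
print(singular_values[0], np.abs(np.linalg.eigvals(W_r)).max())
```

In an autodiff framework the same division can be written directly in the forward pass, or approximated with a few power-iteration steps as in spectral normalization; which variant, if either, underlies the reported experiments is not stated here.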
We trained ORGaNICs on this task (details provided in Appendix I.1) and compared its performance to SSN [5] trained by dynamics-neutral growth [55]. We found that ORGaNICs perform better than SSN with the same model size, and on par with an MLP (Table 1). We analyzed the eigenvalues of the Jacobian matrix of the trained model and consistently found the largest real part to be negative (Fig. 5), indicating stability. Moreover, we found that stability was maintained during training (Fig. 6).
Table 1:
Test accuracy on MNIST dataset
| Model | Accuracy |
|---|---|
| SSN (50:50) | 94.9% |
| SSN (80:20) | 95.2% |
| MLP (50) | 98.2% |
| ORGaNICs (50:50) | 98.1% |
| ORGaNICs (80:80) | 98.2% |
| ORGaNICs (two layers) | 98.1% |
6.2. Time varying input
We trained unconstrained ORGaNICs by naïve BPTT on the sequential MNIST (sMNIST) classification task proposed by Le et al. [74]. This is a challenging task because it involves long-term dependencies and requires the architecture to maintain and integrate information over long timescales. Briefly, the pixels of each MNIST image are presented sequentially (one pixel per time step) in scanline order, and at the end of the input the model must predict the digit that was presented. A harder version of this task, permuted sequential MNIST (psMNIST), applies the same fixed random permutation to the pixels of every image before they are presented sequentially. We train ORGaNICs with different hidden layer sizes (numbers of neurons) on these two tasks by discretizing the rectified ORGaNICs with arbitrary recurrence (Eq. 87), which has all the properties that we have derived for the main model. Because an unstable fixed point is undesirable in such a task, as it may lead to diverging trajectories, we prefer the rectified model (Appendix F) over the main model. We proved that the 2D rectified ORGaNICs (Eq. 102) does not exhibit an unstable fixed point for positive inputs, as can also be seen in Fig. 1. The hidden states of the neurons are initialized from a uniform random distribution (see Appendix I.2 for details). Additionally, we make the input gains and dynamical, with their ODEs given by,
| (17) |
We achieved slightly better performance than LSTMs on sMNIST with a smaller model size and comparable performance on permuted sMNIST, without hyperparameter optimization and without gradient clipping/scaling (Table 2). We found that the trajectories of are bounded when it is trained on the sequential task (Fig. 7), indicating stability. We also show that the training of ORGaNICs is stable and does not require gradient clipping when the intrinsic time constants of the neurons are fixed (Table 2).
Table 2:
Test accuracy on sequential pixel-by-pixel MNIST and permuted MNIST
| Model | sMNIST | psMNIST | # units | # params |
|---|---|---|---|---|
| LSTMs [75] | 97.3% | 92.6% | 128 | 68k |
| AntisymmetricRNN [59] | 98.0% | 95.8% | 128 | 10k |
| coRNN [61] | 99.3% | 96.6% | 128 | 34k |
| Lipschitz RNN [60] | 99.4% | 96.3% | 128 | 34k |
| ORGaNICs (fixed time constants) | 90.3% | 80.3% | 64 | 26k |
| ORGaNICs (fixed time constants) | 94.8% | 84.8% | 128 | 100k |
| ORGaNICs | 97.7% | 89.9% | 64 | 26k |
| ORGaNICs | 97.8% | 90.7% | 128 | 100k |
7. Discussion
Summary:
While extensive research has been aimed at identifying highly expressive RNN architectures that can model complex data, there has been little advancement in developing robust, biologically plausible recurrent neural circuits that are easy to train and perform comparably to their artificial counterparts. Regularization techniques such as batch, group, and layer normalization have been developed, but they are implemented as ad hoc add-ons, which makes them biologically implausible. In this work, we bridge these gaps by leveraging the recently proposed ORGaNICs model, which implements divisive normalization (DN) dynamically in a recurrent circuit. We establish the unconditional stability of an arbitrary-dimensional ORGaNICs circuit with an identity recurrent weight matrix , with all of the other parameters and inputs unconstrained, and provide empirical evidence of stability for ORGaNICs with arbitrary . Because ORGaNICs are provably stable for all parameter values and inputs in the identity case, and empirically stable beyond it, we do not need to resort to techniques that are restrictive in parameter space or that require designing unrealistic structures for weight matrices. ORGaNICs’ intrinsic stability mitigates the issues of exploding and oscillating gradients, enabling the use of “vanilla” BPTT without the need for gradient clipping, which is instead required when training LSTMs. Moreover, ORGaNICs effectively address the vanishing gradient problem often encountered when training RNNs. This is achieved by processing information across various timescales, resulting in a blend of lossy and non-lossy neurons, while preserving stability. The model’s effectiveness in overcoming vanishing gradients is further evidenced by its competitive performance against architectures specifically designed to address this issue, such as LSTMs.
Dynamic normalization:
Normalization techniques, such as batch and layer normalization, are fundamental in modern ML architectures, significantly enhancing the training and performance of CNNs. However, a principled approach to incorporating normalization into RNNs has remained elusive. While layer normalization is commonly applied to RNNs to stabilize training, it does not influence the underlying circuit dynamics since it is applied a posteriori to the output activations, leaving the stability of RNNs unaffected. Furthermore, DN has been shown to generalize batch and layer normalization [39], leading to improved performance [39, 41, 42]. ORGaNICs, unlike RNNs with layer normalization, implement DN dynamically within the circuit, marking the first instance of this concept being applied and analyzed in ML. Our work demonstrates that embedding DN within a circuit naturally leads to stability, which is greatly advantageous for trainability. This stability, a consequence of dynamic DN, sets ORGaNICs apart from other RNNs by providing both output normalization and model robustness. As a result, ORGaNICs can be trained using BPTT, achieving performance on par with LSTMs. The key insight is that the dynamic application of DN not only enhances training efficiency but also improves model robustness. This illustrates how incorporating neurobiological principles can drive advances in ML.
Interpretability:
In the proof of stability, we establish a direct connection between ORGaNICs and systems of coupled damped harmonic oscillators, which have long been studied in mechanics and control theory. This analogy not only enables us to derive an interpretable energy function for ORGaNICs (Eq. 13), providing a normative principle of what the circuit aims to accomplish, but also sheds light on the link between normalization and the dynamical stability of neural circuits. For a given ML task, having an analytical expression for the energy function allows us to quantify the relative contributions of individual neurons in the trained model, offering more interpretability than other RNN architectures. For instance, Eq. 13 shows that the ratio of time constants of each E-I neuron pair determines how much weight a neuron assigns to divisive normalization relative to aligning its responses with the input drive. This insight assigns a clear functional role to each neuron in the trained model. Moreover, since ORGaNICs are biologically plausible, we can understand how the various components of the dynamical system might be computed within a neural circuit [48], bridging the gap between theoretical models and biological implementation and offering a means to generate and test hypotheses about neural computation in real biological systems (which we will report elsewhere).
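The oscillator analogy can be made concrete with a generic linear system M x'' + C x' + K x = 0. In this sketch the mass, damping, and stiffness matrices are random symmetric positive-definite matrices, an assumption for illustration: they are not the ORGaNICs matrices, and the energy below is the textbook mechanical energy rather than Eq. 13. The code checks stability the way the indirect method of Lyapunov does, through the eigenvalues of the first-order system matrix, and confirms that the energy never increases along a trajectory, since dE/dt = -x'^T C x' <= 0.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n, rng):
    """Random symmetric positive-definite matrix."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

def companion(M, C, K):
    """First-order form of M x'' + C x' + K x = 0 with state s = [x, x']."""
    n = M.shape[0]
    Minv = np.linalg.inv(M)
    top = np.hstack([np.zeros((n, n)), np.eye(n)])
    bottom = np.hstack([-Minv @ K, -Minv @ C])
    return np.vstack([top, bottom])

def energy(s, M, K):
    """Kinetic plus potential energy, E = 1/2 x'^T M x' + 1/2 x^T K x."""
    n = M.shape[0]
    x, v = s[:n], s[n:]
    return 0.5 * v @ M @ v + 0.5 * x @ K @ x

n = 4
M, C, K = (random_spd(n, rng) for _ in range(3))
A = companion(M, C, K)

# Eigenvalue test: the origin is asymptotically stable if every eigenvalue of the
# (here exactly linear) system matrix has a negative real part.
eigvals, V = np.linalg.eig(A)
print("max Re(lambda) =", eigvals.real.max())

# Exact one-step propagator exp(A dt), built from the eigendecomposition.
dt = 0.01
Phi = (V @ np.diag(np.exp(eigvals * dt)) @ np.linalg.inv(V)).real

s = rng.standard_normal(2 * n)
E = [energy(s, M, K)]
for _ in range(2000):
    s = Phi @ s
    E.append(energy(s, M, K))
E = np.array(E)
# dE/dt = -x'^T C x' <= 0, so the energy never increases (up to round-off).
print("energy non-increasing:", bool(np.all(np.diff(E) <= 1e-12 * E[0])))
```

With positive-definite damping, every mode is damped and the energy acts as a Lyapunov function; the stability results for ORGaNICs rely on an analogous, but circuit-specific, structure.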
Limitations:
Although the stability property pertains to a continuous-time system of nonlinear differential equations, typical implementations for tasks with sequential data train an Euler discretization of these equations. When the time constants span widely different scales, the system is stiff, and a time step that is too large for the fastest time constant can cause numerical instabilities and explosive dynamics even though the continuous-time system is stable. This highlights the importance of carefully parameterizing the time constants and choosing a small enough time step. The proof of unconditional stability is tractable only for the two-dimensional circuit and for the high-dimensional circuit with an identity recurrent weight matrix. Therefore, we can only conjecture the stability of ORGaNICs for arbitrary recurrent weight matrices, based on these two limiting cases and on empirical evidence. In the current form, the weight matrices of the input gain modulators, , and , are each . As a result, the number of parameters grows more rapidly with the hidden state size than in other RNNs. To mitigate this, we plan to explore compact and/or convolutional weights to prevent a significant increase in the number of parameters as the hidden state size grows.
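The discretization issue can be illustrated with decoupled leaky units tau_i dx_i/dt = -x_i, which are stable in continuous time for any positive tau_i. Forward Euler multiplies each unit by (1 - dt/tau_i) per step, so it remains stable only when dt < 2 tau_i for every unit. The `time_constant` helper is a hypothetical parameterization, not the one used in this work, showing one way to keep every learned tau at least as large as dt.

```python
import numpy as np

def euler_decay(taus, dt, steps, x0=1.0):
    """Forward-Euler integration of decoupled leaky units tau_i * dx_i/dt = -x_i.
    Each step multiplies x_i by (1 - dt/tau_i), so the scheme is stable only if
    |1 - dt/tau_i| < 1, i.e. dt < 2 * tau_i for every unit."""
    taus = np.asarray(taus, dtype=float)
    x = np.full_like(taus, x0)
    for _ in range(steps):
        x = x + dt * (-x / taus)
    return x

def time_constant(theta, dt):
    """Hypothetical parameterization tau = dt * (1 + softplus(theta)) >= dt, which keeps
    the Euler factor 1 - dt/tau in [0, 1) for any unconstrained parameter theta."""
    return dt * (1.0 + np.logaddexp(0.0, theta))

taus = [0.01, 1.0, 100.0]   # timescales spanning four orders of magnitude: a stiff system
print(euler_decay(taus, dt=0.005, steps=100))  # dt < 2 * min(tau): every unit decays
print(euler_decay(taus, dt=0.05, steps=100))   # dt > 2 * min(tau): the fastest unit explodes
print(time_constant(np.array([-50.0, 0.0, 50.0]), dt=0.05))
```

The slow units are integrated accurately in both runs; it is the fastest time constant alone that sets the admissible step size, which is why a wide spread of learned time constants calls for either a bounded parameterization or a smaller dt.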
Attention mechanisms in ORGaNICs:
ORGaNICs have a built-in mechanism for attention: modulating the input gain (e.g., Eq. 17), coupled with DN. This attention mechanism is consistent with experimental data showing both increases in the gain of neural responses and improvements in behavioral performance [19, 20, 32, 76–85]. Moreover, this mechanism performs a computation analogous to that of an attention head in ML systems (including transformers [2]), as both operate by changing the gain over time. In ORGaNICs, DN replaces the softmax operation typically used in an attention head.
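The softmax/DN analogy can be sketched by normalizing the same scores both ways. Both operations exponentiate or raise to a power and then divide by a pooled sum; DN additionally has a semi-saturation constant sigma, so its weights saturate as the input gain grows. The exponent p = 2, sigma = 0.1, and the gain values are illustrative assumptions, and this is not the ORGaNICs attention formulation (Eq. 17 is not reproduced here).

```python
import numpy as np

def softmax_weights(scores):
    """Softmax over attention scores (max-subtracted for numerical stability)."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def dn_weights(scores, sigma=0.1, p=2.0):
    """Divisive-normalization weights w_i = |s_i|^p / (sigma^p + sum_j |s_j|^p).
    The weights are non-negative and sum to S / (sigma^p + S) < 1 with S = sum_j |s_j|^p,
    approaching 1 when S >> sigma^p."""
    r = np.abs(scores) ** p
    return r / (sigma**p + r.sum())

scores = np.array([2.0, 1.0, 0.5, 0.1])
print(softmax_weights(scores))
print(dn_weights(scores))

# Scaling the scores by an input gain g: the DN weights saturate toward the
# sigma-free ratios r_i / S, so their total grows with g (and stays below 1).
for g in [0.05, 0.1, 1.0]:
    print(g, dn_weights(g * scores).sum())
```

Increasing the input gain in the DN version behaves like a contrast-gain change: weak inputs are suppressed by sigma, strong inputs approach a fixed normalized share, which mirrors the gain-based account of attention cited above.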
Future work:
This study has explored only a single layer of ORGaNICs for the sequential tasks. Future work will examine how stacked layers with feedback connections, similar to those in the cortex, perform on benchmarks for sequential modeling and also on cognitive tasks with long-term dependencies. We have thus far shown that ORGaNICs can address the problem of long-term dependencies by learning intrinsic time constants. Future investigations will assess the performance of ORGaNICs for tasks with long-term dependencies by learning to modulate the responses of the and neurons to control the effective time constant of the recurrent circuit (without changing the intrinsic time constants) [47], i.e., implementing a working memory circuit capable of learning to maintain and manipulate information across various timescales.
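How a modulatory signal can change the effective time constant without touching the intrinsic one can be seen in a toy single-unit model. The sketch assumes a unit whose recurrent self-excitation is scaled by 1/(1 + a), where a is a hypothetical modulator; this is an illustrative simplification, not the ORGaNICs equations of [47]. As a shrinks toward zero the recurrent drive cancels the leak, the effective time constant tau(1 + a)/a grows without bound, and the unit approaches a persistent, working-memory-like integrator.

```python
import numpy as np

def effective_tau(tau, a):
    """Toy unit tau * dy/dt = -y + y / (1 + a) + z, whose recurrent gain 1 / (1 + a)
    is set by a hypothetical modulator a > 0. Collecting terms gives
    tau * dy/dt = -(a / (1 + a)) * y + z, so tau_eff = tau * (1 + a) / a."""
    return tau * (1.0 + a) / a

def measured_tau(tau, a, dt=1e-3, t_max=50.0):
    """Euler-integrate the toy unit with z = 0 from y(0) = 1 and return the first
    time at which y drops below 1/e, an empirical estimate of tau_eff."""
    y, t = 1.0, 0.0
    while y > np.exp(-1.0) and t < t_max:
        y += (dt / tau) * (-y + y / (1.0 + a))
        t += dt
    return t

tau = 0.02  # intrinsic time constant, held fixed
for a in [1.0, 0.1, 0.01]:
    print(f"a = {a:5.2f}: predicted {effective_tau(tau, a):.3f}, "
          f"simulated {measured_tau(tau, a):.3f}")
```

With the intrinsic time constant fixed at 0.02, lowering a from 1 to 0.01 lengthens the effective time constant by a factor of about 50, which is the kind of control the proposed learned modulation would exploit.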
Acknowledgments and Disclosure of Funding
The authors acknowledge valuable discussions with Flaviano Morone, Asit Pal, Mathias Casiulis, and Guanming Zhang. This work was supported by the National Institutes of Health under award number R01EY035242. S.M. acknowledges the Simons Center for Computational Physical Chemistry for financial support. This work was supported in part through the NYU IT High-Performance Computing resources, services, and staff expertise.
Footnotes
Python code for this study is available at https://github.com/martiniani-lab/dynamic-divisive-norm.
References
- [1] Krizhevsky Alex, Sutskever Ilya, and Hinton Geoffrey E. ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012.
- [2] Vaswani Ashish, Shazeer Noam, Parmar Niki, Uszkoreit Jakob, Jones Llion, Gomez Aidan N, Kaiser Łukasz, and Polosukhin Illia. Attention is all you need. Advances in neural information processing systems, 30, 2017.
- [3] Hochreiter Sepp and Schmidhuber Jürgen. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
- [4] Wilson Hugh R and Cowan Jack D. Excitatory and inhibitory interactions in localized populations of model neurons. Biophysical journal, 12(1):1–24, 1972.
- [5] Rubin Daniel B, Van Hooser Stephen D, and Miller Kenneth D. The stabilized supralinear network: a unifying circuit motif underlying multi-input integration in sensory cortex. Neuron, 85(2):402–417, 2015.
- [6] Cowan Jack D, Neuman Jeremy, and van Drongelen Wim. Wilson–Cowan equations for neocortical dynamics. The Journal of Mathematical Neuroscience, 6:1–24, 2016.
- [7] Ahmadian Yashar, Rubin Daniel B, and Miller Kenneth D. Analysis of the stabilized supralinear network. Neural computation, 25(8):1994–2037, 2013.
- [8] Kotary James, Fioretto Ferdinando, Van Hentenryck Pascal, and Wilder Bryan. End-to-end constrained optimization learning: A survey. arXiv preprint arXiv:2103.16378, 2021.
- [9] Donti Priya L, Rolnick David, and Kolter J Zico. DC3: A learning method for optimization with hard constraints. arXiv preprint arXiv:2104.12225, 2021.
- [10] Haber Eldad and Ruthotto Lars. Stable architectures for deep neural networks. Inverse problems, 34(1):014004, 2017.
- [11] Carandini Matteo and Heeger David J. Summation and division by neurons in primate visual cortex. Science, 264(5163):1333–1336, 1994.
- [12] Albrecht Duane G and Geisler Wilson S. Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual neuroscience, 7(6):531–546, 1991.
- [13] Heeger David J. Normalization of cell responses in cat striate cortex. Visual neuroscience, 9(2):181–197, 1992.
- [14] Heeger David J. Modeling simple-cell direction selectivity with normalized, half-squared, linear operators. Journal of neurophysiology, 70(5):1885–1898, 1993.
- [15] Simoncelli Eero P and Heeger David J. A model of neuronal responses in visual area MT. Vision research, 38(5):743–761, 1998.
- [16] Reynolds John H, Chelazzi Leonardo, and Desimone Robert. Competitive mechanisms subserve attention in macaque areas V2 and V4. Journal of Neuroscience, 19(5):1736–1753, 1999.
- [17] Schwartz Odelia and Simoncelli Eero. Natural sound statistics and divisive normalization in the auditory system. Advances in neural information processing systems, 13, 2000.
- [18] Zoccolan Davide, Cox David D, and DiCarlo James J. Multiple object response normalization in monkey inferotemporal cortex. Journal of Neuroscience, 25(36):8150–8164, 2005.
- [19] Boynton Geoffrey M. A framework for describing the effects of attention on visual responses. Vision research, 49(10):1129–1143, 2009.
- [20] Lee Joonyeol and Maunsell John HR. A normalization model of attentional modulation of single unit responses. PloS one, 4(2):e4651, 2009.
- [21] Bays Paul M. Noise in neural populations accounts for errors in working memory. Journal of Neuroscience, 34(10):3632–3645, 2014.
- [22] Ma Wei Ji, Husain Masud, and Bays Paul M. Changing concepts of working memory. Nature neuroscience, 17(3):347–356, 2014.
- [23] Zenger-Landolt Barbara and Heeger David J. Response suppression in V1 agrees with psychophysics of surround masking. Journal of Neuroscience, 23(17):6884–6893, 2003.
- [24] Foley John M. Human luminance pattern-vision mechanisms: masking experiments require a new model. JOSA A, 11(6):1710–1719, 1994.
- [25] Carandini Matteo and Heeger David J. Normalization as a canonical neural computation. Nature reviews neuroscience, 13(1):51–62, 2012.
- [26] Brouwer Gijs Joost and Heeger David J. Cross-orientation suppression in human visual cortex. Journal of neurophysiology, 106(5):2108–2119, 2011.
- [27] Cavanaugh James R, Bair Wyeth, and Movshon J Anthony. Nature and interaction of signals from the receptive field center and surround in macaque V1 neurons. Journal of neurophysiology, 88(5):2530–2546, 2002.
- [28] Xing Jing and Heeger David J. Center-surround interactions in foveal and peripheral vision. Vision research, 40(22):3065–3072, 2000.
- [29] Petrov Alexander A, Dosher Barbara Anne, and Lu Zhong-Lin. The dynamics of perceptual learning: an incremental reweighting model. Psychological review, 112(4):715, 2005.
- [30] Wainwright Martin J, Schwartz Odelia, and Simoncelli Eero P. Natural image statistics and divisive normalization. 2002.
- [31] Westrick Zachary M, Heeger David J, and Landy Michael S. Pattern adaptation and normalization reweighting. Journal of Neuroscience, 36(38):9805–9816, 2016.
- [32] Reynolds John H and Heeger David J. The normalization model of attention. Neuron, 61(2):168–185, 2009.
- [33] Heeger David J, Simoncelli Eero P, and Movshon J Anthony. Computational models of cortical visual processing. Proceedings of the National Academy of Sciences, 93(2):623–627, 1996.
- [34] Lyu Siwei and Simoncelli Eero P. Nonlinear extraction of independent components of natural images using radial gaussianization. Neural computation, 21(6):1485–1519, 2009.
- [35] Malo Jesús, Esteve-Taboada José Juan, and Bertalmío Marcelo. Cortical divisive normalization from Wilson–Cowan neural dynamics. Journal of Nonlinear Science, 34(2):1–36, 2024.
- [36] Ioffe Sergey and Szegedy Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pages 448–456. PMLR, 2015.
- [37] Ba Jimmy Lei, Kiros Jamie Ryan, and Hinton Geoffrey E. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
- [38] Huang Lei, Qin Jie, Zhou Yi, Zhu Fan, Liu Li, and Shao Ling. Normalization techniques in training DNNs: Methodology, analysis and application. IEEE transactions on pattern analysis and machine intelligence, 45(8):10173–10196, 2023.
- [39] Ren Mengye, Liao Renjie, Urtasun Raquel, Sinz Fabian H, and Zemel Richard S. Normalizing the normalizers: Comparing and extending network normalization schemes. arXiv preprint arXiv:1611.04520, 2016.
- [40] Wu Yuxin and He Kaiming. Group normalization. In Proceedings of the European conference on computer vision (ECCV), pages 3–19, 2018.
- [41] Miller Michelle, Chung SueYeon, and Miller Kenneth D. Divisive feature normalization improves image recognition performance in AlexNet. In International Conference on Learning Representations, 2021.
- [42] Singh Saurabh and Krishnan Shankar. Filter response normalization layer: Eliminating batch dependence in the training of deep neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11237–11246, 2020.
- [43] Carandini Matteo, Heeger David J, and Movshon J Anthony. Linearity and normalization in simple cells of the macaque primary visual cortex. Journal of Neuroscience, 17(21):8621–8644, 1997.
- [44] Ozeki Hirofumi, Finn Ian M, Schaffer Evan S, Miller Kenneth D, and Ferster David. Inhibitory stabilization of the cortical network underlies visual surround suppression. Neuron, 62(4):578–592, 2009.
- [45] Carandini Matteo, Heeger David J, and Senn Walter. A synaptic explanation of suppression in visual cortex. Journal of Neuroscience, 22(22):10053–10065, 2002.
- [46] Brosch Tobias and Neumann Heiko. Interaction of feedforward and feedback streams in visual cortex in a firing-rate model of columnar computations. Neural Networks, 54:11–16, 2014.
- [47] Heeger David J and Mackey Wayne E. Oscillatory recurrent gated neural integrator circuits (ORGaNICs), a unifying theoretical framework for neural dynamics. Proceedings of the National Academy of Sciences, 116(45):22783–22794, 2019.
- [48] Heeger David J and Zemlianova Klavdia O. A recurrent circuit implements normalization, simulating the dynamics of V1 activity. Proceedings of the National Academy of Sciences, 117(36):22494–22505, 2020.
- [49] Duong Lyndon, Simoncelli Eero, Chklovskii Dmitri, and Lipshutz David. Adaptive whitening with fast gain modulation and slow synaptic plasticity. Advances in Neural Information Processing Systems, 36, 2024.
- [50] Chung Junyoung, Gulcehre Caglar, Cho KyungHyun, and Bengio Yoshua. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
- [51] Sutskever Ilya, Vinyals Oriol, and Le Quoc V. Sequence to sequence learning with neural networks. Advances in neural information processing systems, 27, 2014.
- [52] Cho Kyunghyun, Van Merriënboer Bart, Gulcehre Caglar, Bahdanau Dzmitry, Bougares Fethi, Schwenk Holger, and Bengio Yoshua. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
- [53] Graves Alex, Mohamed Abdel-rahman, and Hinton Geoffrey. Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing, pages 6645–6649. IEEE, 2013.
- [54] Graves Alex. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
- [55] Soo Wayne and Lengyel Máté. Training stochastic stabilized supralinear networks by dynamics-neutral growth. Advances in Neural Information Processing Systems, 35:29278–29291, 2022.
- [56] Song H Francis, Yang Guangyu R, and Wang Xiao-Jing. Training excitatory-inhibitory recurrent neural networks for cognitive tasks: a simple and flexible framework. PLoS computational biology, 12(2):e1004792, 2016.
- [57] Soo Wayne, Goudar Vishwa, and Wang Xiao-Jing. Training biologically plausible recurrent neural networks on cognitive tasks with long-term dependencies. Advances in Neural Information Processing Systems, 36, 2024.
- [58] Zhang Huaguang, Wang Zhanshan, and Liu Derong. A comprehensive review of stability analysis of continuous-time recurrent neural networks. IEEE Transactions on Neural Networks and Learning Systems, 25(7):1229–1262, 2014.
- [59] Chang Bo, Chen Minmin, Haber Eldad, and Chi Ed H. AntisymmetricRNN: A dynamical system view on recurrent neural networks. arXiv preprint arXiv:1902.09689, 2019.
- [60] Erichson N Benjamin, Azencot Omri, Queiruga Alejandro, Hodgkinson Liam, and Mahoney Michael W. Lipschitz recurrent neural networks. arXiv preprint arXiv:2006.12070, 2020.
- [61] Rusch T Konstantin and Mishra Siddhartha. Coupled oscillatory recurrent neural network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies. arXiv preprint arXiv:2010.00951, 2020.
- [62] Rusch T Konstantin and Mishra Siddhartha. UnICORNN: A recurrent model for learning very long time dependencies. In International Conference on Machine Learning, pages 9168–9178. PMLR, 2021.
- [63] Liu Zheng, Gaska James P, Jacobson Lowell D, and Pollen Daniel A. Interneuronal interaction between members of quadrature phase and anti-phase pairs in the cat’s visual cortex. Vision research, 32(7):1193–1198, 1992.
- [64] Khalil Hassan K. Control of nonlinear systems. Prentice Hall, New York, NY, 2002.
- [65] Silvester John R. Determinants of block matrices. The Mathematical Gazette, 84(501):460–467, 2000.
- [66] Lancaster Peter. Lambda-matrices and vibrating systems. Courier Corporation, 2002.
- [67] Tisseur Françoise and Meerbergen Karl. The quadratic eigenvalue problem. SIAM review, 43(2):235–286, 2001.
- [68] Lancaster Peter. Stability of linear gyroscopic systems: a review. Linear Algebra and its Applications, 439(3):686–706, 2013.
- [69] Kirillov Oleg N. Nonconservative stability problems of modern physics, volume 14. Walter de Gruyter GmbH & Co KG, 2021.
- [70] Kliem Wolfhard and Pommer Christian. Indefinite damping in mechanical systems and gyroscopic stabilization. Zeitschrift für angewandte Mathematik und Physik, 60:785–795, 2009.
- [71] Berman Abraham and Plemmons Robert J. Nonnegative matrices in the mathematical sciences. SIAM, 1994.
- [72] Deng Li. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE signal processing magazine, 29(6):141–142, 2012.
- [73] Bai Shaojie, Kolter J Zico, and Koltun Vladlen. Deep equilibrium models. Advances in neural information processing systems, 32, 2019.
- [74] Le Quoc V, Jaitly Navdeep, and Hinton Geoffrey E. A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941, 2015.
- [75] Arjovsky Martin, Shah Amar, and Bengio Yoshua. Unitary evolution recurrent neural networks. In International conference on machine learning, pages 1120–1128. PMLR, 2016.
- [76] Beuth Frederik and Hamker Fred H. A mechanistic cortical microcircuit of attention for amplification, normalization and suppression. Vision research, 116:241–257, 2015.
- [77] Denison Rachel N, Carrasco Marisa, and Heeger David J. A dynamic normalization model of temporal attention. Nature Human Behaviour, 5(12):1674–1685, 2021.
- [78] Herrmann Katrin, Heeger David J, and Carrasco Marisa. Feature-based attention enhances performance by increasing response gain. Vision research, 74:10–20, 2012.
- [79] Herrmann Katrin, Montaser-Kouhsari Leila, Carrasco Marisa, and Heeger David J. When size matters: attention affects performance by contrast or response gain. Nature neuroscience, 13(12):1554–1559, 2010.
- [80] Maunsell John HR. Neuronal mechanisms of visual attention. Annual review of vision science, 1:373–391, 2015.
- [81] Ni Amy M and Maunsell John HR. Spatially tuned normalization explains attention modulation variance within neurons. Journal of neurophysiology, 118(3):1903–1913, 2017.
- [82] Ni Amy M and Maunsell John HR. Neuronal effects of spatial and feature attention differ due to normalization. Journal of Neuroscience, 39(28):5493–5505, 2019.
- [83] Schwedhelm Philipp, Krishna B Suresh, and Treue Stefan. An extended normalization model of attention accounts for feature-based attentional enhancement of both response and coherence gain. PLoS computational biology, 12(12):e1005225, 2016.
- [84] Smith Philip L, Sewell David K, and Lilburn Simon D. From shunting inhibition to dynamic normalization: Attentional selection and decision-making in brief visual displays. Vision Research, 116:219–240, 2015.
- [85] Zhang Xilin, Japee Shruti, Safiullah Zaid, Mlynaryk Nicole, and Ungerleider Leslie G. A normalization framework for emotional attention. PLoS biology, 14(11):e1002578, 2016.
- [86] Carandini Matteo. Amplification of trial-to-trial response variability by neurons in visual cortex. PLoS biology, 2(9):e264, 2004.
- [87] Anderson Jeffrey S, Lampl Ilan, Gillespie Deda C, and Ferster David. The contribution of noise to contrast invariance of orientation tuning in cat visual cortex. Science, 290(5498):1968–1972, 2000.
- [88] Priebe Nicholas J and Ferster David. Inhibition, spike threshold, and stimulus selectivity in primary visual cortex. Neuron, 57(4):482–497, 2008.
- [89] Bernstein Dennis S. Matrix mathematics: theory, facts, and formulas. Princeton university press, 2009.
- [90] Zhang Fuzhen. The Schur complement and its applications, volume 4. Springer Science & Business Media, 2006.
- [91] Horn Roger A and Johnson Charles R. Matrix analysis. Cambridge university press, 2012.
- [92] Paszke Adam, Gross Sam, Chintala Soumith, Chanan Gregory, Yang Edward, DeVito Zachary, Lin Zeming, Desmaison Alban, Antiga Luca, and Lerer Adam. Automatic differentiation in PyTorch. 2017.
- [93] Kingma Diederik P and Ba Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- [94] He Kaiming, Zhang Xiangyu, Ren Shaoqing, and Sun Jian. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.

