Abstract
Backpropagation is widely used to train artificial neural networks, but its relationship to synaptic plasticity in the brain is unknown. Some biological models of backpropagation rely on feedback projections that are symmetric with feedforward connections, but experiments do not corroborate the existence of such symmetric backward connectivity. Random feedback alignment offers an alternative model in which errors are propagated backward through fixed, random backward connections. This approach successfully trains shallow models, but learns slowly and does not perform well with deeper models or online learning. In this study, we develop a meta-learning approach to discover interpretable, biologically plausible plasticity rules that improve online learning performance with fixed random feedback connections. The resulting plasticity rules show improved online training of deep models in the low data regime. Our results highlight the potential of meta-learning to discover effective, interpretable learning rules satisfying biological constraints.
Subject terms: Learning algorithms, Computational science
The biological plausibility of backpropagation and its relationship with synaptic plasticity remain open questions. The authors propose a meta-learning approach to discover interpretable plasticity rules to train neural networks under biological constraints. The meta-learned rules boost the learning efficiency via bio-inspired synaptic plasticity.
Introduction
Error-driven learning in multilayer neural networks was revolutionized by the error backpropagation algorithm1, or backprop for short. In backprop, gradients or “errors” are propagated backward through auxiliary feedback pathways to compute parameter updates.
While practical, backprop has strong structural constraints that make it biologically implausible2,3. A major limitation, known as the weight transport problem4, is that transmitting gradients to upstream layers requires feedback connections that are symmetric with feedforward connections. Such symmetric connectivity is not known to exist in the brain. In an attempt to depart from the symmetry assumption, Lillicrap et al.5 show that even random backward connections can transmit effective teaching signals to train the upstream layers. In this scenario, while the backward connections are fixed, forward weights evolve to align the teaching signals with those prescribed by the backprop algorithm. However, leaving out the symmetry constraint comes with caveats. Random feedback alignment struggles with deeper networks, limited training data sizes, convolutional layers, and online data streams6,7.
To improve random feedback alignment, Nøkland8 proposed to rewire the feedback connections and feed the teaching signals directly from the output layer to the upstream layers. While this improves the transmission of errors, it still does not perform as robustly as the symmetric case in the low data regime. Parallel to this, Liao et al.9 suggested dismissing symmetry in magnitude, but assigning symmetric signs to the feedback connections. Nonetheless, they found that decreasing the batch size of the training data may deteriorate performance when discarding symmetry. In addition, they found batch normalization10 critical for training with asymmetric connections. These findings render the methods inadequate for training with an online stream of data, where the batch size is one, and ultimately undercut their biological plausibility.
An alternative strategy is to implement a secondary update rule to modify the backward connections along with the forward weights. To that end, Akrout et al.11 proposed to use a Hebbian plasticity rule12 to adjust the feedback matrices parallel to the approximate gradient-based update of the forward path. The former pushes the backward connections toward the transpose of the forward weights. However, Kunin et al.13 show that this approach is highly sensitive to hyperparameter tuning. Instead, they redefine the optimization objective as a loss function based on the forward path in combination with layer-wise regularization terms for backward weights to update forward and backward pathways concurrently. They propose a few regularization terms and show that combining these units can achieve more stable plasticity rules.
Meta-learning is a broad learning framework consisting of a learning process that envelops another optimization loop and learns some aspect of the inner learning procedure, effectively “learning to learn.” Although this concept has been around for decades14, Finn et al.15 popularized meta-learning for few-shot learning applications. This approach employs meta-learning to optimize an internal representation of the network, which is subsequently used as an initial weight to expedite learning on a downstream task. Further, Javed and White16 extended this approach to continual learning by modifying the objective function of the outer optimization loop. Still, they used the modified approach to learn a partial initialization of the network’s forward weights. Although effective in learning representations for few-shot learning, these methods effectively work by pre-training a model rather than by learning to learn. More precisely, their effectiveness is largely derived from their ability to meta-learn a weight initialization, rather than meta-learning a learning rule itself.
The meta-learning framework has provided a new direction for building biologically plausible computational neural models. For example, Lindsey et al.17 learn the direct feedback pathways that modulate activations and use a supervised adaptation of Oja’s rule to update forward connections. It is supervised because it benefits from modulated activations, which do not guarantee the established properties of conventional Oja’s rule18. Nevertheless, they also meta-learn an initial value for the forward connections, which makes their approach dependent on the learned weight initialization, not on the learned learning rule alone. Miconi et al.19,20 showed that meta-learning can train a variety of network architectures on various tasks. Like Lindsey et al.17, their approach meta-trains a separate plasticity rule for each weight. While this approach can be effective, the resulting plasticity rules are difficult to interpret. In addition, meta-learning weight initialization in these works makes it unclear to what degree the results are affected by the proposed plasticity rule as opposed to the weight initialization.
A growing body of work aims to only meta-learn a plasticity rule without inferring any component of the inner model, such as initial weights. Early work includes Bengio et al.21, who meta-learned a parametric learning rule to train a 2D classifier and boolean function. In each meta-iteration, they used the plasticity rule to train multiple networks on separate tasks and obtained the meta-loss function by summing over the loss of all these networks. More recent work includes Andrychowicz et al.22, who parametrize the learning rule with a Recurrent Neural Network (RNN) and meta-learn weights of the RNN model. Using an RNN allows for training a dynamic update rule. In the context of biological plausibility, Confavreux et al.23 used meta-learning to determine plasticity rules that train shallow linear networks. Rather than discovering new rules, they recover well-known plasticity rules using objective functions based on their known behavior.
The scope of the meta-learning framework is beyond learning the forward pathway’s plasticity rule. Meta-learning has given rise to unorthodox training models beyond the classic backward transmission of errors. For example, Metz et al.24 used a meta-learning framework to learn a plasticity rule for unsupervised learning. They proposed to infer the teaching signals by meta-learning a network that projects forward activation units and the downstream feedback signal into backward hidden states. These hidden states are subsequently used to update the forward and backward weights via each pathway’s meta-learned plasticity rule. Another related work on semi-supervised learning25 uses learnable auxiliary feedback and lateral connections to facilitate error propagation during training and meta-learns the plasticity rules to update these connections. Finally, Sandler et al.26 reformulate the interactions between the forward and backward activations by defining parameterized update rules for both feedforward and feedback connections. Then, they yield new plasticity rules by meta-learning these hyperparameters.
Here, we improve upon previous work by discovering a plasticity rule that enhances the flow of information in the backward pathway while learning more distinctive embeddings in the forward network. We use meta-learning to learn a parameterized plasticity rule based on a combination of candidate rules. Key features that characterize our approach include:
Our approach solely meta-learns a plasticity rule and does not learn a weight initialization. As a result, our approach learns a learning rule that can be applied to train “naive,” randomly initialized networks from scratch.
We use “meta-parameter sharing” in the sense that all weights share a common plasticity rule, instead of learning a separate plasticity rule for each weight. This approach allows us to interpret and understand the meta-learned plasticity rules.
We impose an L1 penalty on the plasticity coefficients in our meta-loss function. This encourages our algorithm to learn a plasticity rule with fewer terms, further simplifying the analysis and interpretability of the resulting rule.
Our inner learning loop uses online learning (batch size 1) and limited training data (250 data points). Coupled with the random weight initialization in our inner learning loop, this forces the plasticity rules to learn in a more biologically relevant and challenging setting, with which random feedback alignment is known to struggle9.
Previous studies have employed different combinations of elements, such as meta-parameter sharing (as used in ref. 23) and online learning (as used in ref. 17). In contrast to previous studies, we integrate all these features to address the weight alignment problem. Our analysis of the meta-learned plasticity rules demonstrates how they overcome the weight alignment challenge. Our approach further advances the use of meta-plasticity to understand how effective learning can emerge in biological neural circuits.
Results
Limitations of feedback alignment in deep networks
Consider a fully connected deep neural network fW parameterized by weights W, representing a non-linear mapping fW : x ↦ yL from the network’s input y0 = x to the output yL, with L denoting the depth of the network. Each network layer is defined by
$$z_\ell = W_{\ell-1,\ell}\, y_{\ell-1}, \tag{1}$$
$$y_\ell = \sigma(z_\ell), \tag{2}$$
where yℓ is the activation for layer ℓ and σ stands for the non-linear activation function.
Given a dataset, the model is trained in an attempt to find the set of weight parameters W = {Wℓ−1,ℓ∣0 < ℓ ≤ L} that minimizes a loss function ℒ. Each weight matrix Wℓ−1,ℓ is modulated by a teaching signal eℓ derived from ℒ. A commonly used method to compute eℓ is to analytically calculate the modulatory signal eL in the output layer and then use a backward auxiliary network to transmit it to the upstream layers. This backward projection follows the relation
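$$e_\ell = \left(B_{\ell+1,\ell}\, e_{\ell+1}\right) \odot \sigma'(z_\ell), \tag{3}$$
where ⊙ denotes element-wise multiplication, σ′ is the derivative of the activation function, and B = {Bℓ+1,ℓ∣0 < ℓ < L} is the set of feedback connections.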
In a gradient-based optimization algorithm, eL is defined as the derivative of the loss function with respect to zL. This teaching signal is propagated backward up to the initial layer to modulate the weight parameters. A widely used scheme, backprop, uses feedback weights that are the transposes of the forward path’s weights, to transport these modulating signals using Eq. (3). Subsequently, the forward weight parameters are updated by
$$W_{\ell-1,\ell} \leftarrow W_{\ell-1,\ell} - \theta\, e_\ell\, y_{\ell-1}^{T}, \tag{4}$$
which represents a shared plasticity rule for all forward connections Wℓ−1,ℓ, where θ is the associated learning rate.
To alleviate the biologically undesirable characteristics of the backprop algorithm, ref. 5 proposed the “Random Feedback Alignment” approach, which departs from the assumption of symmetric feedback connections and instead uses fixed random backward connections BFA that are not bound to the forward weights. To distinguish between the two learning algorithms, we hereafter use the phrase “feedback alignment” to refer to the learning rule in Eq. (4) with fixed random feedback connections, B = BFA, and we use “backprop” to refer to Eq. (4) with symmetric feedback connections, in which each Bℓ+1,ℓ is the transpose of the forward weights Wℓ,ℓ+1.
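The forward pass, backward projection, and weight update described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the tanh activation, the squared-error loss, the layer widths, and the function names `forward`, `teaching_signals`, and `train_step` are assumptions made for the example and are not taken from the paper. The only difference between backprop and feedback alignment here is the list of feedback matrices passed to `train_step`.

```python
import numpy as np

def forward(x, Ws):
    """Forward pass: z_l = W_{l-1,l} y_{l-1}, y_l = sigma(z_l), with sigma = tanh (assumed)."""
    ys, zs = [x], []
    for W in Ws:
        zs.append(W @ ys[-1])
        ys.append(np.tanh(zs[-1]))
    return ys, zs

def teaching_signals(y_out, target, zs, Bs):
    """Output signal for an assumed squared-error loss, then the backward projection."""
    dsig = lambda z: 1.0 - np.tanh(z) ** 2
    es = [(y_out - target) * dsig(zs[-1])]            # e_L = dLoss/dz_L
    for B, z in zip(reversed(Bs), reversed(zs[:-1])):
        es.insert(0, (B @ es[0]) * dsig(z))           # e_l = (B_{l+1,l} e_{l+1}) * sigma'(z_l)
    return es

def train_step(x, target, Ws, Bs, theta):
    """One online update W <- W - theta * e_l y_{l-1}^T (in place); returns the pre-update loss."""
    ys, zs = forward(x, Ws)
    es = teaching_signals(ys[-1], target, zs, Bs)
    for W, e, y_prev in zip(Ws, es, ys[:-1]):
        W -= theta * np.outer(e, y_prev)
    return 0.5 * np.sum((ys[-1] - target) ** 2)

rng = np.random.default_rng(0)
sizes = [8, 6, 5, 3]                                  # hypothetical layer widths
Ws = [0.5 * rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
x, target = rng.standard_normal(sizes[0]), np.array([1.0, 0.0, 0.0])

# Feedback alignment: fixed random feedback matrices, drawn once and never updated.
B_fa = [rng.standard_normal((n, m)) for n, m in zip(sizes[1:-1], sizes[2:])]
loss_fa = train_step(x, target, Ws, B_fa, theta=0.01)

# Backprop: feedback matrices are the transposes of the current forward weights.
B_bp = [W.T.copy() for W in Ws[1:]]
loss_bp = train_step(x, target, Ws, B_bp, theta=0.01)
```

With symmetric feedback the outer product of the teaching signal and the pre-synaptic activation is the exact gradient of the assumed loss; with `B_fa` it is only a pseudo-gradient.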
For feedback alignment, the teaching signal is not an exact gradient, but an approximating pseudo-gradient term. The resulting learning algorithm performs well on simple tasks and shallower networks. However, feedback alignment fails to reach good accuracy in deeper networks and is not as robust in the small data regime. In our empirical test with an online stream of data, feedback alignment only begins to effectively learn after about 2000 iterations, while backprop learns much more quickly (Fig. 1a). An alternative approach to using feedback connections that link consecutive layers is to create direct backward pathways8. This change allows errors to be transmitted directly from the output layer to the upstream layers. This modification leads to improved performance compared to the feedback alignment method, speeding up the learning process and improving accuracy. However, it still falls short of the performance level of backpropagation (see Supplementary Fig. S1). In addition, Fig. 1b shows that the teaching signals transmitted through fixed feedback connections are not aligned with the true gradients computed by backpropagation at this stage of training.
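The misalignment mentioned here can be quantified as the angle between the teaching signal sent through a fixed random feedback matrix and the one sent through the transposed forward weights. The sketch below is an illustration only: the helper name `alignment_angle_deg`, the layer widths, and the random stand-in for the output error are assumptions, and the common element-wise derivative factor is dropped by assuming a linear layer.

```python
import numpy as np

def alignment_angle_deg(e_fa, e_bp):
    """Angle in degrees between two teaching-signal vectors."""
    cos = np.dot(e_fa, e_bp) / (np.linalg.norm(e_fa) * np.linalg.norm(e_bp))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

rng = np.random.default_rng(0)
n_hidden, n_out = 100, 10                        # hypothetical layer widths
W = rng.standard_normal((n_out, n_hidden))       # forward weights W_{1,2}
B_fa = rng.standard_normal((n_hidden, n_out))    # fixed random feedback B_{2,1}
e_out = rng.standard_normal(n_out)               # stand-in for the output error e_2

# Linear layer assumed, so the sigma'(z_1) factor shared by both signals is omitted.
angle = alignment_angle_deg(B_fa @ e_out, W.T @ e_out)
print(f"angle between FA and BP teaching signals: {angle:.1f} degrees")
```

For independently drawn forward and feedback matrices the two signals are nearly orthogonal, which is the untrained starting point that feedback alignment has to improve on.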
These limitations indicate that the backward flow of information through fixed feedback is insufficient for online training in deeper models. This paper investigates modified plasticity rules to improve the trained model’s performance. To that end, a meta-learning framework is adapted to explore a parameterized space of the plasticity rules.
Meta-learning to discover interpretable plasticity rules
Meta-learning is a machine learning paradigm that aims to learn elements of a learning procedure. This framework consists of a two-level learning scheme: an inner adaptation loop that learns parameters W of a model fW using a parameterized plasticity rule and an outer meta-optimization loop that modifies the plasticity meta-parameters Θ. The meta-training dataset contains a set of tasks, each consisting of K training data (Xtrain, Ytrain) and Q query data (Xquery, Yquery) per class. The former is used to train the model fW while the latter optimizes the meta-parameters Θ. Algorithm 1 details the meta-learning framework presented in this work.
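One way to picture a task with K training and Q query items per class is the sampling routine below. It is a hedged sketch, not the authors' data pipeline: the function name `sample_episode`, the synthetic data, and the specific choice of 5 classes with K = 50 and Q = 10 are assumptions made for illustration.

```python
import numpy as np

def sample_episode(X, y, n_classes, K, Q, rng):
    """Draw one task: n_classes random classes, K training and Q query items per class."""
    classes = rng.choice(np.unique(y), size=n_classes, replace=False)
    train_idx, query_idx = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(y == c))[: K + Q]
        train_idx.extend(idx[:K])       # first K items of this class train the model
        query_idx.extend(idx[K:])       # the remaining Q items are held out as queries
    train_idx = rng.permutation(train_idx)   # shuffle classes into one online stream
    return (X[train_idx], y[train_idx]), (X[query_idx], y[query_idx])

rng = np.random.default_rng(0)
X = rng.standard_normal((600, 4))            # synthetic stand-in for a labeled dataset
y = np.repeat(np.arange(6), 100)             # 6 classes, 100 items each
(X_train, y_train), (X_query, y_query) = sample_episode(X, y, n_classes=5, K=50, Q=10, rng=rng)
```

The training split is consumed one item at a time by the inner loop, while the query split is only used to score the trained model for the outer loop.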
Algorithm 1
Meta-learning algorithm.
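Input: meta-training set of tasks, plasticity rule (Eq. (5)), number of episodes, meta-learning rate η, and regularization coefficient λ.
Initialize learning parameters Θ(0).
for each episode do
  Initialize network parameters W(0) and B.
  for each training data point (xtrain, ytrain) of the sampled task do
    Set y0 = xtrain.
    for ℓ = 1, …, L do
      Compute zℓ (Eq. (1)).
      Compute yℓ (Eq. (2)).
    end for
    Compute the loss on (yL, ytrain).
    Compute the output teaching signal eL.
    for ℓ = L, …, 1 do
      Compute the teaching signal for the next upstream layer (Eq. (3)).
      Update Wℓ−1,ℓ using the plasticity rule.
    end for
  end for
  Update meta-parameters Θ using the meta-loss on the query set (Eq. (6)).
end for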
In each meta-iteration, also known as an episode, a randomly initialized model fW is trained on an online training data sequence. In other words, each adaptation iteration uses a single data point (xtrain, ytrain) to update W. It is worth emphasizing that reinitializing weights W at each episode removes the learning rule’s dependence on the weight initialization. The meta-learned plasticity rules are therefore optimized to learn a task starting from a randomly initialized weight matrix. In contrast, meta-optimizing initial weights will adapt meta-parameters Θ to the later stages of learning, so the resulting rule no longer generalizes to the full training lifetime. Moreover, when meta-learning a weight initialization in conjunction with a plasticity rule (e.g., ref. 17), it is not clear to what extent improvements in learning can be attributed to the weight initialization versus the meta-learned plasticity rule itself.
Each episode ε follows two objectives. The first is to optimize the model parameters W using a loss function ℒ, iteratively, on each data point sampled from the task’s training set. To this end, given a set of R candidate terms {ℱr∣0 ≤ r ≤ R − 1}, a parameterized plasticity rule is defined as a linear combination of individual plasticity terms,
$$\mathcal{F}(\Theta) = \sum_{r=0}^{R-1} \theta_r\, \mathcal{F}^{r}, \tag{5}$$
where Θ = {θr∣0 ≤ r ≤ R − 1} is the set of learning parameters shared across layers. This rule is used to update forward weights, W, in the network. The second objective, dubbed meta-loss, assesses the meta-parameters Θ by evaluating the loss function on the query set of the same task using the updated model fW. While meta-learning over the pool of plasticity terms yields an optimized set of meta-parameters, Θ, the resulting plasticity rule consists of too many terms which are difficult to interpret and understand and whose underlying mechanisms may overlap. Therefore, following Occam’s razor, we introduce an L1 penalty on plasticity coefficients to select for a sparser set of plasticity terms. Mathematically put, the meta-loss is defined as
$$\mathcal{L}_{\mathrm{meta}}(\Theta) = \mathcal{L}\big(f_W(X_{\mathrm{query}}),\, Y_{\mathrm{query}}\big) + \lambda \lVert \Theta \rVert_1, \tag{6}$$
where fW is the model updated in the adaptation loop and λ is a predefined hyperparameter. The regularization term in Eq. (6) is the L1 norm of the meta-parameters, leading the algorithm to favor simplicity in the plasticity model (see Supplementary Fig. S5, and Table S1 for a comparison with alternative regularization approaches). While weights W are optimized using the plasticity rule in Eq. (5), meta-parameters Θ are updated by a gradient-based approach. Figure 2 summarizes the problem’s configuration.
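The two-level scheme can be sketched end to end on a toy problem. Everything specific below is an assumption made for illustration: the Gaussian-blob task, the two-layer tanh network, the squared-error loss, the choice of only two candidate terms (a pseudo-gradient term and Oja’s rule), and the function names. In particular, the meta-gradient is estimated here with central finite differences as a simple stand-in; the text only states that Θ is updated by a gradient-based approach.

```python
import numpy as np

def make_task(rng, n_classes=3, dim=8, K=20, Q=10):
    """Hypothetical toy task: Gaussian blobs with one-hot targets."""
    means = rng.standard_normal((n_classes, dim))
    def draw(n):
        labels = np.repeat(np.arange(n_classes), n)
        X = means[labels] + 0.3 * rng.standard_normal((labels.size, dim))
        order = rng.permutation(labels.size)
        return X[order], np.eye(n_classes)[labels[order]]
    return draw(K), draw(Q)

def query_loss(Theta, task, seed):
    """Inner loop: train a freshly initialized net online, then score it on the query set."""
    (Xtr, Ttr), (Xq, Tq) = task
    rng = np.random.default_rng(seed)            # same seed -> same W(0) and B
    n_in, n_hid, n_out = Xtr.shape[1], 16, Ttr.shape[1]
    W1 = rng.standard_normal((n_hid, n_in)) / np.sqrt(n_in)
    W2 = rng.standard_normal((n_out, n_hid)) / np.sqrt(n_hid)
    B = rng.standard_normal((n_hid, n_out))      # fixed random feedback B_{2,1}
    for x, t in zip(Xtr, Ttr):                   # online: one data point per update
        y1 = np.tanh(W1 @ x)
        y2 = np.tanh(W2 @ y1)
        e2 = (y2 - t) * (1.0 - y2 ** 2)          # output teaching signal (squared error)
        e1 = (B @ e2) * (1.0 - y1 ** 2)          # backward projection through fixed B
        for W, e, y_post, y_pre in ((W1, e1, y1, x), (W2, e2, y2, y1)):
            pseudo_grad = -np.outer(e, y_pre)
            oja = np.outer(y_post, y_pre) - np.outer(y_post, y_post) @ W
            W += Theta[0] * pseudo_grad + Theta[1] * oja   # one rule shared by all layers
    Yq = np.tanh(np.tanh(Xq @ W1.T) @ W2.T)
    return 0.5 * np.mean(np.sum((Yq - Tq) ** 2, axis=1))

def meta_loss(Theta, task, seed, lam):
    """Query loss plus an L1 penalty on the plasticity coefficients."""
    return query_loss(Theta, task, seed) + lam * np.sum(np.abs(Theta))

def meta_train(n_episodes=5, eta=1e-4, lam=1e-2, h=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    Theta = np.array([1e-3, 0.0])                # small pseudo-gradient rate, Oja term off
    for episode in range(n_episodes):
        task = make_task(rng)
        grad = np.zeros_like(Theta)
        for r in range(Theta.size):              # central finite differences (stand-in)
            d = np.zeros_like(Theta)
            d[r] = h
            grad[r] = (meta_loss(Theta + d, task, episode, lam)
                       - meta_loss(Theta - d, task, episode, lam)) / (2 * h)
        Theta = Theta - eta * grad               # outer-loop update of the meta-parameters
    return Theta

Theta = meta_train()
```

Reusing the same seed for both perturbed evaluations keeps the initial weights and feedback matrix identical, so the finite difference isolates the effect of the coefficient being perturbed. Whether this toy run improves the query loss depends on the assumed settings.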
Benchmarking backprop and feedback alignment via meta-learning
Before introducing new plasticity rules, it is necessary to establish the baseline performance for the current learning models for the learning task considered here. To this end, we use the meta-learning framework to optimize the learning rate, θ, in Eq. (4) for backprop and feedback alignment. Since, in these examples, the meta-learning model seeks to optimize the meta-parameter rather than selecting one term over the other, the regularization coefficient λ in Eq. (6) is set to zero.
Figure 3a–c compares the performance of the two plasticity rules over 600 episodes. First, the reinitialized models fW are trained at each episode using an online stream of M × K = 250 data points. Then, the meta-accuracy and meta-loss are evaluated with the query data. Tracing the evolution of the plasticity coefficients in Fig. 3c shows that the meta-learning model converges after ~100 episodes. After convergence, the model trained with feedback alignment is, on average, about 25% accurate in its predictions, whereas the model backpropagated via symmetric feedbacks reaches an approximate accuracy of about 70% (Fig. 3a). In addition, the backpropagated model reaches considerably lower loss values as shown in Fig. 3b. The comparison shows that the former is not adequately trained with an online data stream in the small data regime. This outcome is further supported by Fig. 3d, which illustrates the poor alignment of the modulating signals in feedback alignment with the backprop analogs.
Biologically plausible plasticity rules
The analysis in the previous section indicated a substantial performance gap between the backprop model and the pseudo-gradient rule with random feedback pathways early in the learning process. However, with the interrupted backward flow as the only distinction between the two rules, the error in the last layer and activations still maintain proper information. Intuitively, introducing new local combinations of these terms to the plasticity rule may restore information flow and improve performance. To that end, we define a set of candidate plasticity terms and use meta-learning to uncover combinations that enhance learning. Meta-learning helps in two ways: finding the optimized set of meta-parameters for the linear combination of candidate terms and selecting the dominant plasticity terms. While the former avoids cumbersome hand-tuning of the coefficients, the latter provides a tool for systematically studying the space of learning rules.
We began by examining a set of R = 10 plasticity terms and combined them according to Eq. (5) to form the learning rule (see Methods and below for definitions of these rules). Figure 4a–c illustrates the performance of the model. We set the initial values of the meta-parameters to 0. As seen in Fig. 4a, the model’s accuracy initially resembles that of the FA model, but as the meta-optimization continues, the accuracy improves, starting around 10 episodes. By about 300 meta-iterations, the accuracy approaches that of the BP model. This trend is also echoed in Fig. 4b, where the loss initially follows that of the FA learning model but then declines and eventually becomes similar to that of the BP method. Figure 4c shows that the alignment angles of the teaching signals with their BP counterparts are improved compared to the FA model, seen in Fig. 3d. Figure 4d shows that the coefficients for all but 3 terms converge toward zero after about 600 episodes. Those three terms are a pseudo-gradient rule (coefficient θ0), a Hebbian-like plasticity rule (coefficient θ2), and Oja’s rule (coefficient θ9). Selecting these three terms and omitting the others gives a simpler plasticity rule of the form
7 |
where Θ = {θ0, θ2, θ9} is the set of plasticity meta-parameters. This reduced rule performs similarly to the full ten-term rule (see Supplementary Fig. S2) and significantly improves on the performance of the feedback alignment method in the low data regime (Fig. 1).
While meta-learning successfully discovers the rule in Eq. (7), it is important to interpret this plasticity rule and understand how it leads to improved learning. The rule consists of three components: a pseudo-gradient term, a Hebbian-style error-based term, and Oja’s rule. In what follows, we study each of the latter two terms separately, in combination with the pseudo-gradient term, to unveil the underlying reason behind their performance.
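To make the three components concrete, a per-layer update combining them might look like the sketch below. The functional forms are assumptions inferred from the descriptions in the text (an error times pre-synaptic activation term, a product of post- and pre-synaptic errors, and the standard subspace form of Oja’s rule); the signs, the argument names, and the function name `three_term_update` are illustrative and should not be read as the exact definitions given in the Methods.

```python
import numpy as np

def three_term_update(W, y_pre, y_post, e_pre, e_post, theta0, theta2, theta9):
    """Sketch of a three-term weight change for one layer; signs are assumptions."""
    pseudo_gradient = -np.outer(e_post, y_pre)     # error x pre-synaptic activation
    hebbian_error = -np.outer(e_post, e_pre)       # post-synaptic error x pre-synaptic error
    oja = np.outer(y_post, y_pre) - np.outer(y_post, y_post) @ W   # Oja's (subspace) rule
    return theta0 * pseudo_gradient + theta2 * hebbian_error + theta9 * oja

# Usage on a hypothetical layer with 4 inputs and 2 outputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 4))
y_pre, e_pre = rng.standard_normal(4), rng.standard_normal(4)
y_post, e_post = np.tanh(W @ y_pre), rng.standard_normal(2)
dW = three_term_update(W, y_pre, y_post, e_pre, e_post, theta0=1e-3, theta2=1e-3, theta9=1e-3)
W_new = W + dW
```

Because the same three coefficients are applied to every layer, the rule stays small enough to analyze term by term, which is the point of the meta-parameter sharing described earlier.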
Hebbian-style error-based plasticity rule
Motivated to understand the Hebbian-style error-based learning term in Eq. (7), we rerun the model using a plasticity rule that only includes the modified Hebbian term and the pseudo-gradient term, but omits the third term
8 |
In Fig. 5, the meta-learning algorithm is used to optimize the coefficients θ0 and θ2, which are initialized to 10⁻³ and zero, respectively. Comparing the accuracy and the loss plots to the performance of the three-term rule in Eq. (7) (Supplementary Fig. S2) shows that while the rule in Eq. (8) demonstrates a significant improvement over the pseudo-gradient rule in Eq. (4) via feedback alignment, it is yet to reach that of Eq. (7). Despite this, the teaching signals of Eq. (8) are better aligned with the backprop direction than those of Eq. (7) (Supplementary Fig. S2), which indicates that the Hebbian error term is the driving force behind aligning the teaching signals in Eq. (7).
Figure 6 illustrates how the Hebbian-style error term alters the communications between the backward and forward pathways. The diagram in Fig. 6a shows a model solely trained with the pseudo-gradient rule via feedback alignment. In this scenario, the information from B2,1 flows to W0,1 through Eq. (3), which is then propagated to W1,2 after the forward pass. This configuration updates W1,2 to align the modulator vector e1 with the backprop counterpart. Nonetheless, this machinery does not sufficiently align the modulating signals when applied to deeper networks with fewer training iterations. In the diagram on the right, the last layer is updated with an additional Hebbian-style plasticity term, while the first layer is trained with the vanilla pseudo-gradient rule via feedback alignment. Once again, information from B2,1 flows into W0,1. However, this time, the Hebbian-style term introduces an auxiliary channel to flow the information from B2,1 to W1,2. Finally, the forward propagation through the network implicitly transmits the information from B2,1 to W1,2. The modified rule establishes an explicit supplementary means to communicate between B2,1 and W1,2, boosts the alignment of e1, and improves the model’s performance. Note that the mechanism in the pseudo-gradient rule needs two learning iterations to transmit information from B2,1 to W1,2; information from W0,1 propagates to W1,2 only after y1 is computed with the updated W0,1. Meanwhile, the Hebbian-style term does this in the same iteration, carrying out expedited learning.
To corroborate the argument above, we consider a 3-layer network trained with the pseudo-gradient rule via feedback alignment and inspect the effect of adding the error-based Hebbian-style plasticity term on the alignment angles in different layers. To that end, rather than sharing the same learning rule across the network, each layer is updated using either the pseudo-gradient rule via feedback alignment or the rule in Eq. (8). Table 1 shows that adding the Hebbian error term to the weight update reduces the alignment angle α between the pre-synaptic error and its backprop analog. A more detailed discussion can be found in Supplementary Note 3.
Table 1.
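Layers trained with the pseudo-gradient rule (feedback alignment) | Layers trained with the rule in Eq. (8) | α0 | α1 | α2 |
---|---|---|---|---|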
W0,1, W1,2, W2,3 | – | 89.89 | 76.69 | 82.04 |
W0,1, W2,3 | W1,2 | 89.95 | 59.95 | 72.14 |
W0,1, W1,2 | W2,3 | 90.03 | 75.18 | 29.02 |
W2,3 | W0,1, W1,2 | 75.29 | 61.23 | 72.56 |
W0,1 | W1,2, W2,3 | 90.2 | 49.4 | 27.9 |
W1,2 | W0,1, W2,3 | 84.86 | 74.25 | 30.33 |
– | W0,1, W1,2, W2,3 | 77.93 | 49.93 | 28.4 |
The leftmost column includes the parameters updated using the pseudo-gradient rule with feedback alignment, and the next column indicates layers trained with the rule in Eq. (8). Angles αℓ represent the alignment between the modulatory signal eℓ and the backpropagated counterpart at each layer (in degrees). Since e0 is a synthetic error, the effect of applying the rule in Eq. (8) to W0,1 alone has been excluded. The model is trained for 500 episodes, and the computed angles are averaged after a burn-in period of 100 episodes.
For a more precise, mathematical intuition of the effects that the Hebbian-style error term has on weights, we show in Supplementary Note 4 that, in a linear network model under reasonable approximating assumptions,
9 |
for layers ℓ = 1, 2, …, L − 1. Thus, the Hebbian-style error term in Eq. (8) pushes Wℓ−1,ℓ toward the transpose of Bℓ,ℓ−1, resulting in faster alignment of the modulatory signals with the backprop algorithm’s error vectors and more efficient learning.
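A quick numerical check gives some intuition for this claim. The sketch below is not the derivation in Supplementary Note 4; it assumes a linear network, so the upstream error is the feedback matrix applied to the downstream error, and it further assumes the downstream errors are isotropic. Under those assumptions, the average outer product of post- and pre-synaptic errors points along the transpose of the feedback matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post, n_samples = 6, 4, 20000            # hypothetical sizes
B = rng.standard_normal((n_pre, n_post))          # fixed feedback matrix B_{l,l-1}

E_post = rng.standard_normal((n_samples, n_post)) # assumed isotropic errors e_l (one per row)
E_pre = E_post @ B.T                              # linear network: e_{l-1} = B_{l,l-1} e_l
mean_outer = E_post.T @ E_pre / n_samples         # sample average of e_l e_{l-1}^T

# Cosine similarity between the averaged outer product and B^T (same shape as W_{l-1,l}).
cosine = np.sum(mean_outer * B.T) / (np.linalg.norm(mean_outer) * np.linalg.norm(B))
print(f"cosine similarity with B^T: {cosine:.3f}")
```

A plasticity term proportional to this outer product, with the appropriate sign of its coefficient, therefore drags the forward weights toward the transposed feedback matrix, which is the symmetric configuration that backprop assumes.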
Oja’s rule
Equation (7) proposes a plasticity rule to train deep networks using fixed feedback matrices. Above, we demonstrated that the Hebbian-style learning term improves the trained model’s performance by improving the modulatory signals’ alignments with the backpropagated analogs. Here, we look at the remaining plasticity term in Eq. (7): Oja’s rule, a purely local learning rule that updates the weights based on their current state and the local activations in the forward path. To this end, we redefine the plasticity rule as a linear combination of the pseudo-gradient term and Oja’s rule
10 |
We initialize θ0 to 10⁻³ and θ9 to zero and employ Alg. 1 to optimize the set of meta-parameters Θ. Figure 7a, b illustrates that adding Oja’s rule to the pseudo-gradient term enhances the model’s accuracy when backward connections are fixed. Figure 7c presents the angles between the teaching signals produced by Eq. (10) and the corresponding backpropagated ones. While the accuracy and loss are significantly improved, contrary to expectations, Oja’s rule does not substantially reduce the alignment angles (Fig. 7c). In fact, alignment angles are only slightly smaller when using Oja’s rule compared to using pure FA, as seen by comparing Fig. 7c to Fig. 3d. This contrasts with alignment angles for the rules in Eq. (8) and Eq. (7), which are greatly reduced in deeper layers compared to pure FA (compare Fig. 7c to Fig. 5c and Supplementary Fig. S2c).
Inspecting Fig. 7 suggests that rather than helping to align the modulating signals, Oja’s rule helps by entirely circumventing the backward path. Oja’s rule implements a Hebbian learning rule subjected to an orthonormality constraint on the weights18. In Eq. (10), yℓ−1 and yℓ denote post-nonlinearity activations (as stated in Eq. (2)), so the plasticity rule implements a non-linear version of Oja’s rule. When trained iteratively, this non-linear variation implements a recursive non-linear algorithm for Principal Component Analysis27,28. Previous studies on the convergence of Oja’s rule (ref. 29) have shown that for a compression layer, which has fewer output units than input units, rows of the weight matrix will tend to a rotated basis in the subspace spanned by the leading principal directions of the input yℓ−1, with the dimension of this subspace equal to the number of output units.
We demonstrate that incorporating Oja’s rule into Feedback Alignment improves feature map extraction in the forward path through unsupervised learning, despite not recursively applying pure Oja’s rule. By analyzing the continuous-time differential equation corresponding to Oja’s learning rule, Williams29 and Oja28 establish the stability limits for this rule. In a compression layer, the fixed point of Oja’s rule is a stable solution if the rows of the weight matrix Wℓ−1,ℓ are orthonormal. This conclusion can be used to derive a proximity measure30–32 of the estimated Wℓ−1,ℓ to a stable solution of Oja’s rule in the presence of non-linear activations. The error
11 |
where
12 |
can define this measure. Figure 8 studies this orthonormality measure in models trained with different plasticity rules. Results show that using Oja’s rule will render the weight matrices increasingly orthonormal, reducing the correlation in weight rows and improving the feature extraction in these layers. These findings indicate that introducing Oja’s rule alone can help with the problem of slow learning caused by random feedback connections (see Supplementary Fig. S4).
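The tendency of Oja’s rule to orthonormalize weight rows can be reproduced in a small linear example. The sketch makes several assumptions that differ from the setting in the text: the layer is linear rather than non-linear, the inputs are synthetic Gaussian data, and the proximity measure is taken to be the Frobenius norm of the difference between the row Gram matrix and the identity, which is one plausible choice and not necessarily the measure defined in Eqs. (11) and (12).

```python
import numpy as np

def orthonormality_error(W):
    """Assumed proximity measure: Frobenius norm of W W^T - I."""
    return float(np.linalg.norm(W @ W.T - np.eye(W.shape[0])))

def oja_train(W, X, lr):
    """Linear Oja (subspace) rule applied online: W += lr * (y x^T - y y^T W), y = W x."""
    W = W.copy()
    for x in X:
        y = W @ x
        W += lr * (np.outer(y, x) - np.outer(y, y) @ W)
    return W

rng = np.random.default_rng(0)
scales = np.array([1.5, 1.0, 0.7, 0.5, 0.3])      # hypothetical input standard deviations
X = rng.standard_normal((5000, 5)) * scales       # synthetic inputs, one sample per row
W0 = 0.1 * rng.standard_normal((2, 5))            # compression layer: 5 inputs -> 2 outputs
W = oja_train(W0, X, lr=0.01)

print(orthonormality_error(W0), orthonormality_error(W))
```

In this toy setting the error shrinks as training proceeds, meaning the two weight rows become close to orthonormal, in line with the trend the text reports for models trained with Oja’s rule.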
The architecture of a classifier network includes initial layers that act as feature extractors, creating hidden representations for the final layer. This last layer, dubbed predictor, maps the hidden feature representations to the target class for the given input image. To improve the classifier’s performance, a plasticity rule that enhances feature extraction in the earlier layers is beneficial. However, this rule has no grounds to positively impact the predictor layer’s performance. Despite this, for comprehensiveness, we also applied the plasticity rule to the final layer and found no detrimental effect on the model’s performance.
In summary, rather than improving alignment, Oja’s rule applied to hidden layers provides embeddings that facilitate more effective learning.
Discussion
Despite the dominance of the backpropagation algorithm as the primary technique to train deep neural networks, its biological plausibility remains a significant point of contention2,3. In particular, the presence of feedback synaptic projections that are precisely symmetric to the forward projections is not biologically realistic. Previous work5 showed that learning can be achieved without this symmetry using feedback connections that are randomly sampled, not tied to the forward path, and fixed throughout the training process. While a breakthrough, this method is susceptible to diminished performance when training deeper networks or using smaller batch sizes6,7. The latter is a challenge for online learning.
A recent body of work attempts to improve learning through asymmetric feedback connections. They either rewire fixed feedback connections, use plastic feedback connections that are updated through an auxiliary plasticity rule, or impose partial symmetry in the backward network8,9,17,19,20. Our work accelerates the learning process by enhancing the rules that govern neural plasticity while transmitting teaching signals through fixed connections. Our proposed rules for plasticity are based on biologically motivated learning principles, like Oja’s rule, or have been inspired by them, such as the error-based Hebbian rule. A linear combination of these terms yields a parameterized learning rule. To overcome the arduous hand-tuning of these hyper-parameters, we use a meta-learning approach that systematically explores the pool of candidate plasticity rules. This approach consists of an inner loop that learns a task and an outer loop that updates the plasticity coefficients. The inner loop always starts from randomly initialized weights, so the model must learn to learn from scratch. Moreover, the inner loop learns from an online stream of training data, simulating real-time learning in the brain.
To ensure the interpretability of our meta-learned learning rule, we expressed the rule as a linear combination of individual plasticity terms, imposed an L1 penalty on the coefficients, and shared meta-parameters between all update rules. Many terms in the pool of plasticity rules can be redundant: they employ identical or overlapping mechanisms and differ only in their efficiency, i.e., their computational cost or the number of learning iterations they require. Employing an L1-penalized meta-loss decreases the number of plasticity terms that work in parallel. In addition, while sharing the same meta-parameters across layers may limit the model’s freedom in learning, it is a vital component for discovering a global learning rule, leaving the door open to investigate the revealed terms.
Using this meta-learning approach, we discover two plasticity rules that accelerate learning through fixed feedback connections. The first, an error-based Hebbian rule, combines the errors of pre- and post-synaptic layers to update forward projecting weights. The second rule, known as Oja’s rule, combines pre- and post-synaptic activations with the connection’s current state to update weights. We investigated each plasticity rule, its underlying mechanism, and how it contributes to learning, revealing two distinct mechanisms behind them. First, the Hebbian-like error term improves performance by modifying the flow of information through the backward path. It introduces an auxiliary channel to communicate information about the backward connections to the forward weights. As a result, it accelerates learning by better aligning modulating signals with the ones transmitted through a symmetric feedback connection. Ultimately, the modified plasticity alters the training to resemble backpropagation. Unlike the Hebbian-like rule, Oja’s rule does not directly affect the flow of the feedback signals. Instead, it acts only on the forward path, implementing an unsupervised learning scheme that extracts feature maps independently of the labels and loss. The updated weight rows approximate an orthonormal basis in the subspace spanned by PCA eigenvectors of the pre-synaptic activations28. The strengthened signal separation capabilities in the earlier layers improve predictions made by the output layer.
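The orthonormalizing effect of Oja’s rule described above can be illustrated with a small numerical sketch. It assumes a single linear layer (the networks in this paper use non-linear activations), the subspace form of Oja’s rule, and toy Gaussian inputs whose variances, learning rate, and step count are chosen only for illustration.

```python
import numpy as np

def oja_update(W, x, lr):
    """One online step of Oja's subspace rule for a linear layer y = W x."""
    y = W @ x
    # Hebbian term y x^T minus the stabilizing term (y y^T) W
    return W + lr * (np.outer(y, x) - np.outer(y, y) @ W)

rng = np.random.default_rng(0)
# Toy inputs (assumption): independent coordinates with decreasing variance,
# so the top-2 PCA subspace is spanned by the first two coordinates.
stds = np.sqrt(np.array([5.0, 3.0, 0.5, 0.3, 0.1]))
W = 0.1 * rng.standard_normal((2, 5))

for _ in range(20000):
    x = stds * rng.standard_normal(5)
    W = oja_update(W, x, lr=0.005)

gram = W @ W.T                          # near the identity if rows are orthonormal
off_subspace = np.abs(W[:, 2:]).max()   # weight left outside the top-2 subspace
```

In this toy setting, `gram` should approach the identity matrix and `off_subspace` should shrink toward zero, which is the behavior the paragraph above attributes to the updated weight rows.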
While synaptic plasticity in the brain is mediated by a vast array of biophysical processes, the changes to a single synaptic weight largely depend on the activity of its pre-synaptic and post-synaptic neurons and the current weight, a property known as “local” plasticity. For the plasticity rules used in our study (with the exception of Oja’s rule), weight updates depend on activations from a forward pass and error signals from a backward pass. Since these quantities were used to update the forward projecting weights, this raises the question of whether the plasticity rules are truly local. The answer to this question depends on the biological interpretation of the forward and backward passes.
Under one interpretation, separate populations of neurons encode the forward and backward passes, i.e., the neurons encoding eℓ are distinct from those encoding yℓ. Under this interpretation, the plasticity rules used in this study are not strictly local.
Under another interpretation, forward activations and backward errors are represented by the same neural populations, i.e., the same neurons encode eℓ and yℓ. Under this interpretation, all of the plasticity rules used in this study are local. There are several models for how this multiplexing of forward and backward signals could be achieved (see ref. 2 for a review). For example, activations and errors could be represented at separate points in time by the same neurons.
Alternatively, recent work hypothesizes that activations and errors are encoded separately in the basal and apical dendrites of the same cortical pyramidal neurons33. Along similar lines, a growing body of work posits that activations and errors are multiplexed by the distinction between bursts and single action potentials, which are communicated separately by synaptic projections onto the soma versus apical dendrites of pyramidal neurons34–36. The dependence of synaptic plasticity on the morphological site of the synaptic contact and on the type of spiking (bursts versus individual spikes) is well established in experiments37–41. Under these models, established biophysical properties of cortical synapses can produce plasticity rules like ours that multiplex forward and backward propagating information to update weights. Networks in ref. 36 rely on weight decay to approximately align forward and backward weights11, while some networks in ref. 33 rely on random feedback alignment. Hence, our meta-learned plasticity rules could improve learning in those models.
Our meta-learning approach isolated three plasticity terms: a backprop-like rule, Oja’s rule18, and a rule we refer to as eHebb. Possible biological implementations of Oja’s rule and the backprop-like rule have been studied in great depth in previous work2,3,33,36. The eHebb rule could be implemented in a similar way to the backprop-like rule. For example, under the model in ref. 36, eHebb would change synaptic weights in response to the co-occurrence of pre- and post-synaptic bursts. Plasticity is strongly mediated by firing rates and intracellular calcium42,43, both of which are elevated during bursts.
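As a rough sketch of how three such terms could be combined into one parameterized update for a single layer, consider the following. The outer-product forms and signs are assumptions inferred from the verbal descriptions in this Discussion (post-synaptic error with pre-synaptic activation, post- with pre-synaptic error, and Hebbian plasticity with a stabilizing term); the function name `plasticity_update` and the coefficient vector `theta` are hypothetical.

```python
import numpy as np

def plasticity_update(W, y_pre, y_post, e_pre, e_post, theta):
    """Apply a linear combination of three plasticity terms (assumed forms).

    W has shape (n_post, n_pre); theta holds the coefficients of the
    backprop-like, eHebb, and Oja terms, in that order.
    """
    # Backprop-like term: post-synaptic error times pre-synaptic activation
    backprop_like = -np.outer(e_post, y_pre)
    # eHebb term: post-synaptic error times pre-synaptic error
    e_hebb = -np.outer(e_post, e_pre)
    # Oja term: Hebbian product plus a stabilizing term using the current weights
    oja = np.outer(y_post, y_pre) - np.outer(y_post, y_post) @ W
    return W + theta[0] * backprop_like + theta[1] * e_hebb + theta[2] * oja

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
y_pre, e_pre = rng.standard_normal(4), rng.standard_normal(4)
y_post, e_post = rng.standard_normal(3), rng.standard_normal(3)

# With only the first coefficient non-zero, the update reduces to the backprop-like term.
W_new = plasticity_update(W, y_pre, y_post, e_pre, e_post, theta=[1e-3, 0.0, 0.0])
```

Every quantity entering the update for entry (j, k) of `W` belongs to pre-synaptic unit k, post-synaptic unit j, or the weight matrix itself, which is the sense of locality discussed above.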
Because the eHebb mechanism tends to align modulating signals with their symmetric counterparts, its performance may at best match that of backprop. Oja’s rule, however, does not aim to imitate backprop, so its performance is not bounded by that of backprop, and it can also be used to enhance learning in symmetric feedback models. For instance, we found that adding Oja’s plasticity rule to the gradient-based learning term accelerates learning for poorly initialized networks. This observation explains why learning with the improved rule in the fixed feedback model may outperform learning in the symmetric case. A similar concept was used in earlier work to initialize the internal representations of neural networks32. However, that work used weights preprocessed by Oja’s rule to start gradient-based learning rather than using both terms simultaneously as the plasticity rule. Hence, our results demonstrate the utility of the proposed meta-learning approach as a tool for combining different learning terms into a single parameterized learning rule.
We used meta-learning to find plasticity rules that can learn effectively under the biologically relevant setting where forward and backward weights are not explicitly aligned. But our meta-learning technique can be applied more broadly to identify plasticity rules that overcome other biological constraints in various contexts and models. For instance, our study only focused on plasticity in forward connections; however, backward projections in the brain can also exhibit plasticity. Our meta-learning approach can be extended to discover plasticity rules for backward connections in such settings. Another interesting future direction is to meta-learn the architecture of the feedback pathways instead of (or in addition to) the plasticity parameters. That is, to simultaneously provide both direct8 and regular5 feedback pathways and allow the meta-learning algorithm to pick the most efficient path to carry the teaching signals to each layer.
In another direction, our meta-parameter sharing approach could be partially relaxed without learning a new plasticity rule for each connection. For example, one could consider a network with several neural populations and a shared plasticity rule for each pair of populations. This approach could help understand the role of distinct neuron types and populations in biological circuits.
We focused on meta-learning biologically plausible plasticity rules, but our approach can also be applied to discover learning rules that satisfy other constraints or optimize other meta-loss functions. For example, the approach can be used to find learning rules that can be implemented in non-standard hardware like neuromorphic chips or optical networks, or to discover learning rules that minimize energy consumption or other factors.
In summary, we developed and tested a meta-learning approach designed to produce simple, interpretable plasticity rules that can effectively learn on new data. First, using randomly initialized weights on each iteration of the outer loop (instead of meta-learning the initialization) and using online learning in our inner loop encouraged plasticity rules that can perform online learning from scratch. Secondly, meta-parameter sharing yielded a vastly smaller set of learned plasticity rules compared to learning a plasticity rule for each synapse. Finally, an L1 penalty on plasticity coefficients promoted sparsity within the learning rule, ultimately yielding a small set of plasticity terms that are more readily interpreted. Our results demonstrate the utility of this approach for discovering and interpreting plasticity rules. Taken together, our work opens new avenues to the application of meta-learning for discovering interpretable plasticity rules that satisfy biological or other constraints.
Methods
Models
The model in Fig. 1 performs a 10-way classification on the MNIST dataset, with images of 28 × 28 pixels. The model is trained online, processing one data point per iteration (batch size one) for a single epoch. The model is a 5-layer fully connected neural network with dimensions 784-170-130-100-70-47. Hidden layers use the softplus activation function
$$\sigma(x) = \frac{1}{\beta}\log\left(1 + e^{\beta x}\right), \tag{13}$$
with β = 10. The output layer uses the softmax activation function. Figures 3–5 and 7–8 perform 5-way classification on the EMNIST dataset. During adaptation, the network is trained for one episode, with a batch size of one. These figures use the same architecture as Fig. 1. For Table 1, the model conducts a 5-way classification on the EMNIST dataset with an image size of 28 × 28. The model is a 3-layer fully connected neural network with dimensions 784-130-70-47. Like the rest of the paper, hidden layers use softplus non-linearity with β = 10, while the output layer uses softmax.
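The architecture described above can be sketched as a plain forward pass. This is a minimal illustration, not the authors' implementation: it assumes no bias terms, uses placeholder random weights, and the helper names `softplus`, `softmax`, and `forward` are hypothetical.

```python
import numpy as np

def softplus(z, beta=10.0):
    # log(1 + exp(beta * z)) / beta, evaluated stably with logaddexp
    return np.logaddexp(0.0, beta * z) / beta

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def forward(x, weights, beta=10.0):
    """Softplus hidden layers followed by a softmax output layer (no biases assumed)."""
    y = x
    for W in weights[:-1]:
        y = softplus(W @ y, beta)
    return softmax(weights[-1] @ y)

dims = [784, 170, 130, 100, 70, 47]          # layer sizes quoted in the text
rng = np.random.default_rng(0)
# Placeholder weights for illustration only
weights = [rng.standard_normal((m, n)) / np.sqrt(n)
           for n, m in zip(dims[:-1], dims[1:])]
probs = forward(rng.random(784), weights)    # one flattened 28 x 28 input
```

With β = 10 the softplus is already close to a ReLU away from zero, while remaining smooth at the origin, where it takes the value log(2)/β.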
In the fixed feedback pathway problem, the weights and feedback connections are initially set to random values that differ from each other. Both symmetric and fixed feedback models utilize the Xavier method44 to re-initialize forward and backward connections at the start of each meta-learning episode.
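A minimal sketch of this re-initialization step follows, assuming the uniform variant of Xavier initialization (the text does not say whether the uniform or normal variant is used). The helpers `xavier_uniform` and `init_episode` are hypothetical names.

```python
import numpy as np

def xavier_uniform(n_out, n_in, rng):
    # Glorot/Xavier uniform bound: sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))

def init_episode(dims, rng):
    """Draw forward weights W and independent feedback matrices B for one episode."""
    W = [xavier_uniform(m, n, rng) for n, m in zip(dims[:-1], dims[1:])]
    # Each feedback matrix has the shape of the transposed forward matrix
    # but is sampled independently, so B is not tied to W.
    B = [xavier_uniform(n, m, rng) for n, m in zip(dims[:-1], dims[1:])]
    return W, B

rng = np.random.default_rng(0)
W, B = init_episode([784, 170, 130, 100, 70, 47], rng)
```

In the symmetric model one would instead use the transposes of the forward weights in place of `B`; here the two sets of matrices only share their shapes.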
In Figs. 4, 5, 7, and 8, and Table 1, we set the initial value of the learning rate θ0 to 10⁻³ and set all other hyper-parameters to zero.
All plots depict the mean outcome over 20 trials, each with different initial weights and feedback matrices. The shaded region in the loss, accuracy, and meta-parameters plots illustrates the 98% confidence interval, determined through bootstrapping across trials with 500 bootstrapped samples.
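The shaded regions can be computed along the following lines. This sketch assumes a percentile bootstrap of the mean across trials; the text specifies only the confidence level and the number of bootstrap samples, and the trial values below are synthetic stand-ins.

```python
import numpy as np

def bootstrap_ci(values, n_boot=500, level=0.98, rng=None):
    """Percentile-bootstrap confidence interval for the mean across trials."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    # Resample trials with replacement, n_boot times
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot_means, [alpha, 1.0 - alpha])
    return values.mean(), lower, upper

rng = np.random.default_rng(0)
trial_accuracy = rng.normal(0.8, 0.05, size=20)   # synthetic values for 20 trials
mean, lower, upper = bootstrap_ci(trial_accuracy, rng=rng)
```

For a curve over training iterations, the same function would be applied independently at each iteration to obtain the band around the mean.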
Candidate learning terms
Equation (7) presented a plasticity rule that improves the model’s performance in the presence of fixed random feedback connections. We employed the meta-learning framework described in Alg. 1 to explore a set of local learning rules to discover such a plasticity term. This set of terms is defined as
[Equations (14)–(23): definitions of the ten candidate plasticity terms]
The rules above are local in the sense that the updates to the (j, k)th entry of Wℓ−1,ℓ depend only on the kth entry of eℓ−1 and yℓ−1, the jth entry of yℓ and eℓ, and the (j, k)th entry of Wℓ−1,ℓ. This notion of locality assumes that errors and activations are encoded in the same neurons (see Discussion). Even under this constraint of locality, there is an unlimited number of possible plasticity rules to choose from. To form the list above, we first considered all quadratic combinations of activations and errors, except that we omitted pure Hebbian plasticity because we found that it leads to unstable network dynamics (a blowup of activations). Instead, we replaced it with Oja’s rule, which adds a stabilizing term onto pure Hebbian plasticity. Additional terms were added to test the viability of higher-order plasticity terms.
Several of the learning terms above require a pre-synaptic error term. In order to update the weights in the first layer W0,1, where there is no pre-synaptic error, we define a synthetic error e0 using Eq. (3) and the activation function in Eq. (13), such that
[Equation (24): definition of the synthetic error e0]
Meta-training
We presented a meta-learning framework for swiftly exploring a pool of plasticity terms and uncovering combinations that exceed the performance of the existing plasticity rule. We demonstrate this by training a classifier network, which performs a 5-way classification on 28 × 28 images. The cross-entropy function evaluates the loss in the adaptation loop, whereas the meta-loss is determined by Eq. (6). While, in principle, any optimization algorithm, such as evolutionary methods, can be used to optimize Θ, the algorithm presented in Alg. 1 uses Adam45, a gradient-based optimization technique, with a meta-learning rate of 10⁻³.
In the meta-optimization phase, this gradient-based optimizer differentiates through the unrolled computational graph of the adaptation phase. Thus, the non-linear layers are differentiated twice, once to compute eL and a second time by the meta-optimizer. This arrangement requires a twice-differentiable non-linearity, which rules out the Rectified Linear Unit (ReLU) as the activation function σ. Instead, we use the softplus function (Eq. (13)), a continuous, twice-differentiable approximation of the ReLU function. In Eq. (13), the parameter β controls the smoothness of the function. Furthermore, the L1 norm used in the meta-loss (Eq. (6)), defined by the absolute value function, is not differentiable at zero. However, it is commonly used in deep learning in conjunction with stochastic gradient descent (SGD)46. In PyTorch and other deep learning frameworks, the derivative of the absolute value function is typically defined as zero at zero.
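The two differentiability points above can be made concrete with closed-form derivatives. The sketch below uses NumPy rather than an automatic differentiation framework, and the helper names are hypothetical; the formulas follow from the softplus definition with smoothness parameter β.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softplus_d1(z, beta=10.0):
    # First derivative of log(1 + exp(beta * z)) / beta
    return sigmoid(beta * z)

def softplus_d2(z, beta=10.0):
    # Second derivative: non-zero around the origin, unlike ReLU,
    # whose second derivative is zero wherever it is defined
    s = sigmoid(beta * z)
    return beta * s * (1.0 - s)

def l1_subgradient(theta):
    # Sign function as the derivative of |theta|, with the value 0 chosen at theta == 0
    return np.sign(theta)

curvature_at_zero = softplus_d2(0.0)                      # beta / 4
grad = l1_subgradient(np.array([-2.0, 0.0, 3.0]))         # elementwise signs
```

Because the second derivative of softplus is non-zero near the origin, gradients of the meta-loss can flow through the error computation, which would not be the case with a piecewise-linear activation.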
In the present examples, each task contains M = 5 labels. Consequently, assembling a diverse set of 5-way classification tasks requires a database with a large number of classes. Thus, databases such as MNIST47, which only has ten classes, are unsuitable for proper meta-training. On the other hand, in each episode, the classifier fW is reinitialized with random weights W. Therefore, the task should contain enough data points per class to train fW adequately. Hence, databases such as Omniglot48 with only 20 data points per character designed for few-shot learning (e.g., with meta-optimized W) are impractical in the present framework. In the current work, meta-training tasks are made from the EMNIST database49. This database contains 47 classes, making it a good candidate for the meta-learning framework. Each task contains K = 50 training and Q = 10 query data points per class.
Notably, the use of K = 50 training data per class with M = 5 classes in each episode means that the meta-learned plasticity rule needs to train a randomly initialized network with only 250 training data points. Hence, our models are in a low data regime without the benefit of pre-trained weights that are often used for few-shot learning.
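The episode construction described above can be sketched as follows. The function `sample_task` is a hypothetical helper, and the label array is a synthetic stand-in for the EMNIST labels (47 classes with an arbitrary 100 examples each).

```python
import numpy as np

def sample_task(labels, rng, n_way=5, k_train=50, q_query=10):
    """Pick n_way classes, then k_train training and q_query query indices per class."""
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    train_idx, query_idx = [], []
    for c in classes:
        # Shuffle the examples of class c and split them into disjoint sets
        idx = rng.permutation(np.flatnonzero(labels == c))[: k_train + q_query]
        train_idx.extend(idx[:k_train])
        query_idx.extend(idx[k_train:])
    return classes, np.array(train_idx), np.array(query_idx)

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(47), 100)        # synthetic stand-in labels
classes, train_idx, query_idx = sample_task(labels, rng)
# Online stream: visit the 250 training points one at a time, in random order
stream = rng.permutation(train_idx)
```

With M = 5 and K = 50, each episode yields 5 × 50 = 250 training points and 5 × 10 = 50 query points, matching the low data regime noted above.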
Acknowledgements
The authors thank Ashok Litwin-Kumar, Jack Lindsey, Claudia Clopath, and Klara Kaleb for their helpful discussions. This work was supported by Air Force Office of Scientific Research grant FA9550-21-1-0223 (N.S.T., R.R.); and by National Science Foundation grants NSF-DBI-1707400 (N.S.T., R.R.), and NSF-DMS-1654268 (R.R.).
Author contributions
N.S.T. and R.R. conceived and designed the project. N.S.T. performed numerical experiments and generated figures. N.S.T. and R.R. wrote the manuscript.
Peer review
Peer review information
Nature Communications thanks Martha White and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.
Data availability
In this study, the EMNIST database49 was used for meta-learning experiments. This database is publicly accessible at 10.1109/IJCNN.2017.7966217. Additional benchmarking was done using the MNIST dataset47 and the FashionMNIST dataset50. These datasets can be found at http://yann.lecun.com/exdb/mnist and https://github.com/zalandoresearch/fashion-mnist, respectively. Source data are provided with this paper.
Code availability
The PyTorch-based implementation and script files used to generate the results in this paper can be accessed at https://github.com/NeuralDynamicsAndComputing/MetaLearning-Plasticity (ref. 51).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-023-37562-1.
References
- 1. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323:533–536. doi: 10.1038/323533a0.
- 2. Whittington JC, Bogacz R. Theories of error back-propagation in the brain. Trends Cogn. Sci. 2019;23:235–250. doi: 10.1016/j.tics.2018.12.005.
- 3. Lillicrap TP, Santoro A, Marris L, Akerman CJ, Hinton G. Backpropagation and the brain. Nat. Rev. Neurosci. 2020;21:335–346. doi: 10.1038/s41583-020-0277-3.
- 4. Grossberg S. Competitive learning: from interactive activation to adaptive resonance. Cogn. Sci. 1987;11:23–63. doi: 10.1111/j.1551-6708.1987.tb00862.x.
- 5. Lillicrap TP, Cownden D, Tweed DB, Akerman CJ. Random synaptic feedback weights support error backpropagation for deep learning. Nat. Commun. 2016;7:1–10. doi: 10.1038/ncomms13276.
- 6. Amit Y. Deep learning with asymmetric connections and hebbian updates. Front. Comput. Neurosci. 2019;13:18. doi: 10.3389/fncom.2019.00018.
- 7. Bartunov S. Assessing the scalability of biologically-motivated deep learning algorithms and architectures. Adv. Neural Inf. Process. Syst. 2018;31:9390–9400.
- 8. Nøkland A. Direct feedback alignment provides learning in deep neural networks. Adv. Neural Inf. Process. Syst. 2016;29:1037–1045.
- 9. Liao, Q., Leibo, J. & Poggio, T. How important is weight symmetry in backpropagation? In Proc. AAAI Conference on Artificial Intelligence (eds Schuurmans, D. & Wellman, M.) 1837–1844 (AAAI Press, 2016).
- 10. Ioffe, S. & Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (eds Bach, F. & Blei, D.) 448–456 (PMLR, 2015).
- 11. Akrout M, Wilson C, Humphreys P, Lillicrap T, Tweed DB. Deep learning without weight transport. Adv. Neural Inf. Process. Syst. 2019;32:974–982.
- 12. Hebb, D. O. The Organization of Behavior: A Neuropsychological Theory (Psychology Press, 2005).
- 13. Kunin, D. et al. Two routes to scalable credit assignment without weight symmetry. In International Conference on Machine Learning (eds Daumé III, H. & Singh, A.) 5511–5521 (PMLR, 2020).
- 14. Schmidhuber J. Learning to control fast-weight memories: an alternative to dynamic recurrent networks. Neural Comput. 1992;4:131–139. doi: 10.1162/neco.1992.4.1.131.
- 15. Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1126–1135 (PMLR, 2017).
- 16. Javed K, White M. Meta-learning representations for continual learning. Adv. Neural Inf. Process. Syst. 2019;32:1818–1828.
- 17. Lindsey J, Litwin-Kumar A. Learning to learn with feedback and local plasticity. Adv. Neural Inf. Process. Syst. 2020;33:21213–21223.
- 18. Oja E. Simplified neuron model as a principal component analyzer. J. Math. Biol. 1982;15:267–273. doi: 10.1007/BF00275687.
- 19. Miconi, T., Stanley, K. & Clune, J. Differentiable plasticity: training plastic neural networks with backpropagation. In International Conference on Machine Learning (eds Dy, J. & Krause, A.) 3559–3568 (PMLR, 2018).
- 20. Miconi, T., Rawal, A., Clune, J. & Stanley, K. O. Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity. In International Conference on Learning Representations (2019).
- 21. Bengio S, Bengio Y, Cloutier J. On the search for new learning rules for anns. Neural Process. Lett. 1995;2:26–30. doi: 10.1007/BF02279935.
- 22. Andrychowicz M. Learning to learn by gradient descent by gradient descent. Adv. Neural Inf. Process. Syst. 2016;29:3981–3989.
- 23. Confavreux B, Zenke F, Agnes E, Lillicrap T, Vogels T. A meta-learning approach to (re) discover plasticity rules that carve a desired function into a neural network. Adv. Neural Inf. Process. Syst. 2020;33:16398–16408.
- 24. Metz, L., Maheswaranathan, N., Cheung, C. & Sohl-Dickstein, J. Meta-Learning Update Rules for Unsupervised Representation Learning. In International Conference on Learning Representations (2019).
- 25. Gu, K., Greydanus, S., Metz, L., Maheswaranathan, N. & Sohl-Dickstein, J. Meta-learning biologically plausible semi-supervised update rules. Preprint at bioRxiv 10.1101/2019.12.30.891184 (2019).
- 26. Sandler, M. et al. Meta-learning bidirectional update rules. In International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 9288–9300 (PMLR, 2021).
- 27. Oja, E. Data compression, feature extraction, and autoassociation in feedforward neural networks. Artificial Neural netw. (1991).
- 28. Oja E. Principal components, minor components, and linear neural networks. Neural Netw. 1992;5:927–935. doi: 10.1016/S0893-6080(05)80089-9.
- 29. Williams, R. J. Feature Discovery through Error-correction Learning, volume 8501 (Institute for Cognitive Science, University of California, San Diego, 1985).
- 30. Karhunen J, Joutsensalo J. Representation and separation of signals using nonlinear pca type learning. Neural Netw. 1994;7:113–127. doi: 10.1016/0893-6080(94)90060-4.
- 31. Karhunen J, Joutsensalo J. Generalizations of principal component analysis, optimization problems, and neural networks. Neural Netw. 1995;8:549–562. doi: 10.1016/0893-6080(94)00098-7.
- 32. Karayiannis NB. Accelerating the training of feedforward neural networks using generalized hebbian rules for initializing the internal representations. IEEE Trans. Neural Netw. 1996;7:419–426. doi: 10.1109/72.485677.
- 33. Sacramento J, Ponte Costa R, Bengio Y, Senn W. Dendritic cortical microcircuits approximate the backpropagation algorithm. Adv. Neural Inf. Process. Syst. 2018;31:8735–8746.
- 34. Körding KP, König P. Supervised and unsupervised learning with two sites of synaptic integration. J. Comput. Neurosci. 2001;11:207–215. doi: 10.1023/A:1013776130161.
- 35. Naud R, Sprekeler H. Sparse bursts optimize information transmission in a multiplexed neural code. Proc. Natl Acad. Sci. USA. 2018;115:E6329–E6338. doi: 10.1073/pnas.1720995115.
- 36. Payeur A, Guerguiev J, Zenke F, Richards BA, Naud R. Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits. Nat. Neurosci. 2021;24:1010–1019. doi: 10.1038/s41593-021-00857-x.
- 37. Paulsen O, Sejnowski TJ. Natural patterns of activity and long-term synaptic plasticity. Curr. Opin. Neurobiol. 2000;10:172–180. doi: 10.1016/S0959-4388(00)00076-3.
- 38. Letzkus JJ, Kampa BM, Stuart GJ. Learning rules for spike timing-dependent plasticity depend on dendritic synapse location. J. Neurosci. 2006;26:10420–10429. doi: 10.1523/JNEUROSCI.2650-06.2006.
- 39. Kampa BM, Letzkus JJ, Stuart GJ. Requirement of dendritic calcium spikes for induction of spike-timing-dependent synaptic plasticity. J. Physiol. 2006;574:283–290. doi: 10.1113/jphysiol.2006.111062.
- 40. Nevian T, Sakmann B. Spine ca2+ signaling in spike-timing-dependent plasticity. J. Neurosci. 2006;26:11001–11013. doi: 10.1523/JNEUROSCI.1749-06.2006.
- 41. Froemke, R. C., Tsay, I. A., Raad, M., Long, J. D. & Dan, Y. Contribution of individual spikes in burst-induced long-term synaptic modification. J. Neurophysiol. 95, 1620–1629 (2006).
- 42. Graupner M, Brunel N. Calcium-based plasticity model explains sensitivity of synaptic changes to spike pattern, rate, and dendritic location. Proc. Natl Acad. Sci. USA. 2012;109:3991–3996. doi: 10.1073/pnas.1109359109.
- 43. Graupner M, Wallisch P, Ostojic S. Natural firing patterns imply low sensitivity of synaptic plasticity to spike timing compared with firing rate. J. Neurosci. 2016;36:11238–11258. doi: 10.1523/JNEUROSCI.0104-16.2016.
- 44. Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proc. Thirteenth International Conference on Artificial Intelligence and Statistics (eds Teh, Y. W. & Titterington, D. M.) 249–256 (JMLR Workshop and Conference Proceedings, 2010).
- 45. Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations (2015).
- 46. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
- 47. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc. IEEE. 1998;86:2278–2324. doi: 10.1109/5.726791.
- 48. Lake BM, Salakhutdinov R, Tenenbaum JB. Human-level concept learning through probabilistic program induction. Science. 2015;350:1332–1338. doi: 10.1126/science.aab3050.
- 49. Cohen, G., Afshar, S., Tapson, J. & Van Schaik, A. Emnist: Extending mnist to handwritten letters. In 2017 International Joint Conference on Neural Networks (IJCNN), 2921–2926 (IEEE, 2017).
- 50. Xiao, H., Rasul, K. & Vollgraf, R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. Preprint at https://arxiv.org/abs/1708.07747 (2017).
- 51. Shervani-Tabar, N. & Rosenbaum, R. “meta-learning biologically plausible plasticity rules with random feedback pathways” metalearning-plasticity repository. Zenodo 10.5281/zenodo.7706619 (2023).