Summary
The spiking neural network (SNN) mimics the information-processing operation of the human brain. However, SNNs trained directly with backpropagation still show a performance gap compared with traditional deep neural networks. To address this problem, we first propose a biologically plausible spatial adjustment, which rethinks the relationship between membrane potential and spikes and adjusts the gradients assigned to different time steps, precisely controlling the backpropagation of the error along the spatial dimension. Second, we propose a biologically plausible temporal adjustment that lets the error propagate across spikes in the temporal dimension, overcoming the limitation of traditional spiking neurons, whose temporal dependency is confined to a single spike period. We verify our algorithm on several datasets, and the experimental results show that it greatly reduces network latency and energy consumption while also improving network performance.
Keywords: SNN, biologically plausible spatial adjustment, biologically plausible temporal adjustment, low latency, low energy consumption, spiking neural network, backpropagation, surrogate gradient
Highlights
- BPSA avoids the unnecessary influence of non-spiking neurons on the weight update
- BPTA helps transmit errors across spikes to enhance the temporal dependency
- BPSA and BPTA improve the performance of BP-based SNNs
- BPSA and BPTA reduce the energy consumption and latency of BP-based SNNs
The bigger picture
The spiking neural network (SNN) captures important aspects of brain information processing and has been applied to various domains. The biggest problem restricting the development of SNNs is the training algorithm. Backpropagation (BP)-based training has extended SNNs to more complex network structures and datasets. However, the traditional design of BP ignores the dynamic characteristics of SNNs and is not biologically plausible. This paper rethinks the problems in BP-based SNNs and proposes a biologically plausible spatiotemporal adjustment to replace the traditional artificial design. The adjustment greatly improves the performance of SNNs and reduces energy consumption and latency. The long-term ambition of this research is to draw more inspiration from the learning mechanisms and structures of the cognitive brain at different levels of detail to build even more biologically plausible SNNs as a foundation for future artificial intelligence models.
The surrogate gradient has made the training of SNNs more convenient, but researchers often regard SNNs merely as a substitute for recurrent neural networks and ignore the characteristics of spiking neurons. This paper analyzes the problems in backpropagation-based SNNs and proposes a more biologically plausible spatiotemporal adjustment that mitigates these problems in the spatial and temporal dimensions. The adjustments significantly improve the performance of SNNs and reduce energy consumption and latency.
Introduction
Deep neural networks (DNNs) have achieved success in various research areas, such as object detection,1 visual tracking,2 and face recognition.3 However, they are still far away from the information-processing mechanisms of the human brain. Spiking neural networks (SNNs) are known as the third generation of artificial neural networks.4 They have been widely used in many fields, such as semantic segmentation,5 visual explanations,6 privacy protection,7,8 and object detection.9 The discrete spikes used to transmit information are more energy efficient and more in line with the information-processing mechanism of the brain. Combined with neuromorphic computing,10 SNNs promise to realize real intelligence.
However, due to the complex neural dynamics and non-differentiable characteristics of SNNs, it is still a challenge to train SNNs efficiently. Existing SNN training methods can be roughly divided into three categories: the biologically plausible method, the conversion method, and the backpropagation-based method.
The biologically plausible method, such as Hebbian learning rules11 and spike-timing-dependent plasticity (STDP),12 is mainly inspired by the synaptic learning rules in the human brain. Hebbian theory holds that the connection between pre- and post-synaptic neurons increases with continuous and repetitive stimulation of the pre-synaptic neuron. STDP is an extended Hebbian learning rule based on the temporal difference between pre- and post-synaptic spikes. Diehl et al.13 used the STDP learning rule and lateral inhibition in a two-layer SNN and achieved 95% accuracy on the MNIST dataset. Kheradpisheh et al.14 introduced a weight-sharing strategy and designed a spiking convolutional neural network whose weights were learned layer-wise with STDP. Kheradpisheh et al.15 used hand-crafted difference of Gaussian (DoG) features as the input of the SNN and trained the subsequent convolutional layers through STDP. These methods rely on the local activities of neighboring neurons to update network weights and lack the supervision of global signals. Although Zhao et al. designed a multi-layer SNN based on global feedback connections and local optimization learning rules (GLSNN),16 it still performs poorly when transplanted to deeper networks for complex tasks.
The conversion method is an alternative way to obtain high-performance SNNs. It first trains a well-performing DNN and then converts it into an SNN with some additional adjustments.17, 18, 19, 20, 21 The analog values of the DNN are converted into the firing rates of the SNN. Although the conversion method allows SNNs to achieve performance close to traditional DNNs, the simulation time is too long, which gives the network poor real-time performance and high energy consumption. Moreover, conversion methods rely heavily on well-trained DNNs and do not take advantage of the temporal information of SNNs.
The success of deep learning depends heavily on the backpropagation algorithm, and several studies provide evidence for backpropagation-like processes in the brain. Feedback connections may make predictions about the activities of lower-level brain areas,22, 23, 24, 25 and biological neurons backpropagate action potentials that provide crucial signals for synaptic plasticity.26, 27, 28, 29 Lillicrap et al.30 argued that the differences between feedforward and feedback neural activities may locally approximate the error signals of backpropagation. Researchers in the SNN domain have also introduced the backpropagation algorithm into the optimization of SNNs through the surrogate-gradient method.31, 32, 33, 34 The surrogate gradient allows SNNs to perform backpropagation through time (BPTT), so that SNNs can adopt larger-scale network structures, such as VGG and ResNet, and perform better on more complex datasets. However, directly applying the surrogate gradient to the training of SNNs leads to two problems. First, the surrogate gradient obtains a gradient by smoothing the spike firing function, so neurons whose membrane potential is merely near the threshold participate in backpropagation. As a result, neurons that do not emit spikes may participate in the weight update, significantly increasing the network's energy consumption. Second, a spiking neuron resets to the resting potential after a spike is emitted. This reset operation cuts off the error along the temporal dimension during backpropagation, so errors cannot propagate across spikes, which significantly weakens the temporal dependence of SNNs. To address these problems, we introduce a biologically plausible spatiotemporal adjustment to improve the backpropagation training of SNNs. Our contributions can be summarized as follows:
- We study the influence of the surrogate gradient on the spatial dimension of SNNs, rethink the relationship between the neuron membrane potential and the spikes, and propose a more biologically plausible spatial adjustment (BPSA) to help regulate spike activities.
- We study the limitations of the surrogate gradient in the temporal dimension and introduce a more biologically plausible temporal adjustment (BPTA), which enables SNNs to propagate errors across spikes, enhancing their temporal dependence.
- We conduct experiments on several commonly used datasets. For the static datasets MNIST, CIFAR10, and CIFAR100, we achieve remarkable performance compared with other state-of-the-art SNNs. To the best of our knowledge, we reach state-of-the-art performance on the neuromorphic datasets N-MNIST, DVS-CIFAR10, and DVS-Gesture. For the Google Speech Commands dataset, we reach performance comparable with artificial neural networks designed for speech recognition. Moreover, our analysis shows that our method dramatically reduces energy consumption and latency compared with other state-of-the-art SNNs.
Results
In this section, we conduct experiments using the PyTorch framework35 with an NVIDIA A100 graphics processing unit (GPU). The network weights are initialized with the default method of PyTorch. We use the AdamW36 algorithm as the optimizer, the learning rate is set to 1 × 10⁻³, and the same learning-rate schedule as in SGDR37 is used. The same method as in temporal spike sequence-learning backpropagation (TSSL-BP)34 is used to warm up the model. The membrane potential threshold of the neuron is set to 0.5, the membrane potential decay constant , and the default simulation duration T is set to 16. The training epochs are set to 300. The α in Equation 10 is set to 0.2. First, we conduct experiments on the static MNIST, CIFAR10, and CIFAR100 datasets. To further illustrate the superiority of our algorithm, we also conduct experiments on the neuromorphic datasets N-MNIST,38 DVS-Gesture,39 and DVS-CIFAR10.40 To demonstrate the adaptability of our algorithm in other domains, we conduct experiments on the speech-recognition dataset Google Speech Commands.41 For the static datasets, we use the direct input encoding used in Wu et al.32 as well as the voting strategy. For the neuromorphic datasets, we use the same data preprocessing strategy as in SpikingJelly.42 For different datasets, we design three network structures to adapt to different sizes and complexities. The small network is 128C3-MP2-128C3-256C3-MP2-2048FC-DP-10Voting, the middle is 128C3-MP2-128C3-MP2-256C3-MP2-512C3-AP4-512FC-10Voting, and the large is 128C3-128C3-MP2-128C3-MP2-256C3-MP2-512C3-MP2-1024C3-AP4-DP-1024FC-10Voting. AP denotes average pooling, MP denotes max pooling, DP denotes neuron dropout,43 and C denotes the Conv-BN-ReLU-LIF operation. A sketch of how the small structure could be assembled is given below.
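As a rough illustration of how the small structure above could be assembled in PyTorch (a minimal sketch only: the `LIFSpike` placeholder, padding, dropout rate, voting population size, and the SGDR restart period are assumptions, not the released implementation):

```python
import torch
import torch.nn as nn

class LIFSpike(nn.Module):
    """Placeholder for the LIF spiking activation that follows each Conv-BN block.
    The real layer integrates membrane potential over T time steps; this stand-in
    only thresholds its input so that the sketch runs."""
    def forward(self, x):
        return (x > 0.5).float()

def conv_block(in_ch, out_ch):
    # "C3" in the structure string: 3x3 Conv-BN followed by the spiking activation.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        LIFSpike(),
    )

# 128C3-MP2-128C3-256C3-MP2-2048FC-DP-10Voting (the "small" network) for 28 x 28 inputs.
small_net = nn.Sequential(
    conv_block(1, 128), nn.MaxPool2d(2),       # 128C3-MP2
    conv_block(128, 128),                      # 128C3
    conv_block(128, 256), nn.MaxPool2d(2),     # 256C3-MP2
    nn.Flatten(),
    nn.Linear(256 * 7 * 7, 2048), LIFSpike(),  # 2048FC
    nn.Dropout(0.5),                           # DP (dropout rate assumed)
    nn.Linear(2048, 10 * 10),                  # 10 classes x 10 voting neurons (population size assumed)
)

# AdamW optimizer and SGDR-style cosine schedule as described above (T_0 is an assumption).
optimizer = torch.optim.AdamW(small_net.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=50)
```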
Static datasets
MNIST is one of the most common classification datasets in the deep-learning domain, with 60,000 training samples and 10,000 test samples. The samples are 28 × 28 gray-scale images of handwritten digits from 0 to 9. We use the small structure for this evaluation. The CIFAR10 dataset is more challenging for most existing SNNs. The training set has 50,000 samples, the test set has 10,000 samples, and the samples are 32 × 32 color images. A deeper network achieves better performance, so we adopt the middle structure for this experiment. CIFAR100 is a more challenging version of CIFAR10; it has 100 categories, and each category has only 600 samples: 500 for training and 100 for testing. The network structure is the same as for CIFAR10. Experimental results are compared with several deep SNN models, including conversion-based and BP-based ones, as shown in Table 1.
Table 1.
Classification accuracy on MNIST, CIFAR10, and CIFAR100 datasets
| Models | Training method | MNIST | CIFAR10 | CIFAR100 | 
|---|---|---|---|---|
| Spiking CNN44 | conversion | – | 82.95 | – | 
| BackRes45 | BP | – | 84.98 | – | 
| ContinueSNN46 | conversion | 99.44 | 90.85 | – | 
| Spike-Norm19 | conversion | – | 91.55 | – | 
| STBP31 | BP | 99.42 | 50.7 | – | 
| HM2BP33 | BP | 99.49 | – | – | 
| LISNN47 | BP | 99.5 | – | – | 
| BNTT48 | BP | – | 90.5 | 66.6 | 
| STBP NeuNorm32 | BP | – | 90.53 | – | 
| BackEISNN49 | BP | 99.67 | 90.93 | – | 
| SBPSNN43 | BP | 99.59 | 90.95 | – | 
| TSSL-BP34 | BP | 99.53 | 91.41 | – | 
| ST-RSBP50 | BP | 99.62 | – | – | 
| RNL51 | conversion | 99.51 | 93.45 | 75.1 | 
| SNASNet-Fw 52 | NAS + BP | – | 93.64 | 70.06 | 
| SNASNet-Bw 52 | NAS + BP | – | 94.12 | 73.04 | 
| Our method | BP | 99.67 | 92.15 | 68.28 | 
| Our method ResNet34 | BP | – | 94.51 | 69.32 | 
The spatiotemporal BP (STBP) NeuNorm32 is the STBP method with the neuron norm. For the normal network structures we set, our network achieves comparable performance with other SNN algorithms. To illustrate the adaptability of our algorithm to deeper networks, we also tested it with the ResNet34 structure. As can be seen in Table 1, for the CIFAR10 dataset, our network reaches state-of-the-art performance compared with other well-known SNNs, whether based on BP or conversion. For the CIFAR100 dataset, our network still has a small gap compared with RNL51 and SNASNet,52 but the RNL algorithm directly converts well-trained DNNs into SNNs, while SNASNet searches for a better network structure with neural architecture search (NAS).
Neuromorphic datasets
To better illustrate our spatiotemporal adjustment, we conduct experiments on the neuromorphic datasets N-MNIST, DVS-Gesture, and DVS-CIFAR10. N-MNIST is the neuromorphic version of MNIST. A dynamic vision sensor (DVS) is placed in front of static images on a computer screen; the images shift as the DVS moves along the three sides of an isosceles triangle in turn, and two-channel spike events (on and off) are collected. DVS-Gesture is a real-time gesture-recognition dataset recorded by a DVS. The dataset has 11 hand gestures, such as hand claps, arm rolls, etc., collected from 29 individuals under three illumination conditions. DVS-CIFAR10 is a neuromorphic version converted from the CIFAR10 dataset: 10,000 frame-based images are converted into 10,000 event streams with a DVS. For N-MNIST, we use the middle structure, and for the more complex DVS-Gesture and DVS-CIFAR10, we use the large structure.
As can be seen in Table 2, for the N-MNIST dataset, our method surpasses STBP by 0.3%; even with the introduction of NeuNorm, our method still performs better. For the more complex gesture dataset, our model surpasses the latest STBP-tdBN56 by 2% and LMCSNN57 by 1.4%, reaching state-of-the-art performance compared with other current SNNs. For the DVS-CIFAR10 dataset, we surpass the latest STBP-tdBN by nearly 11%, and we also surpass LMCSNN, which makes many parameters of the leaky integrate-and-fire (LIF) spiking neurons learnable, by 4%. Our method achieves state-of-the-art performance on the DVS-CIFAR10 dataset.
Table 2.
Classification accuracy on N-MNIST, DVS-Gesture, and DVS-CIFAR10 datasets
| Models | Method | N-MNIST | DVS-Gesture | DVS-CIFAR10 | 
|---|---|---|---|---|
| HM2-BP 33 | BP | 98.88 | – | – | 
| SLAYER 53 | BP | 99.2 | 93.64 | – | 
| TSSL-BP (30 epochs) 34 | BP | 99.28 | – | – | 
| IIRSNN 54 | BP | 99.28 | – | – | 
| TSSL-BP (100 epochs) 34 | BP | 99.4 | – | – | 
| STBP 31 | BP | 99.44 | – | – | 
| LISNN 47 | BP | 99.45 | – | – | 
| STBP NeuNorm 32 | BP | 99.53 | – | 60.5 | 
| BNTT 48 | BP | – | – | 63.2 | 
| SALT 55 | BP | – | – | 67.1 | 
| STBP-tdBN 56 | BP | – | 96.87 | 67.8 | 
| LMCSNN 57 | BP | 99.61 | 97.57 | 74.8 | 
| BackEISNN 49 | BP | 99.57 | – | – | 
| Our method | BP | 99.71 | 98.96 | 78.95 | 
Speech-recognition dataset
To verify the performance of our algorithm in other domains, we validate the proposed method on the Google Speech Commands dataset. There are two versions of this dataset, and the second version is used for testing. It contains 105,000 utterances in 35 categories, and each utterance is 1 s long. The training sets are rebalanced by repeating random samples to make the number of samples the same in each class, as sketched below.
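A minimal sketch of this rebalancing step (the function name and the choice of padding every class to the size of the largest class are illustrative assumptions):

```python
import random
from collections import defaultdict

def rebalance_by_oversampling(samples, labels):
    """Repeat randomly chosen samples so that every class reaches the size of the
    largest class (one possible reading of the rebalancing described above)."""
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for y, items in by_class.items():
        padded = items + [random.choice(items) for _ in range(target - len(items))]
        out_samples.extend(padded)
        out_labels.extend([y] * target)
    return out_samples, out_labels
```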
As can be seen in Table 3, even compared with the artificial neural networks designed for speech recognition, our algorithm still shows comparable performance.
Table 3.
Classification accuracy on Google Speech Commands dataset
Conclusion
In this paper, we first analyze the existing problems in SNNs trained with BP. We find that the current setting causes earlier spiking neurons to participate repeatedly in the gradient calculation, exerting a greater influence on the network weights. Moreover, the BPTT algorithm on SNNs only propagates errors backward within a single spike period, so the temporal dependence between spikes is truncated. The biologically plausible spatial adjustment we introduce takes into account that spikes generated by membrane potentials of different strengths should have different effects on the parameter update during backpropagation. In addition, the biologically plausible temporal adjustment enables backpropagation across spikes. We achieve remarkable performance on the MNIST, CIFAR10, CIFAR100, and Google Speech Commands datasets and the current best performance on the N-MNIST, DVS-Gesture, and DVS-CIFAR10 datasets. By analyzing the energy consumption and latency of the SNNs, we find that BPSA and BPTA significantly reduce energy consumption and latency while improving performance.
Discussion
In this section, we first conduct an ablation study of the BPSA and BPTA mentioned above and analyze the contribution of each module. Second, we explore the energy consumption of the SNNs with these adjustments. Third, we discuss the latency of the SNNs affected by these adjustments. Finally, we give the limitations of our algorithm and future work. The analysis illustrates that the two adjustments make the behavior of the spiking neurons more stable and establish better performance while reducing network latency and energy consumption.
Ablation study
We conduct the ablation study on the neuromorphic datasets DVS-Gesture and DVS-CIFAR10 because their more complex spatial structure and stronger temporal information fully illustrate the importance of our adjustments. We use STBP (Wu et al.31) as our baseline and then successively add the BPSA and BPTA.
As can be seen in Table 4, with the introduction of the two adjustments, the performance of the network is gradually improved, among which the spatial adjustment brings more significant improvement.
Table 4.
The ablation study of the two adjustments on DVS-Gesture and DVS-CIFAR10 datasets
| | Baseline | BPSA | BPSA + BPTA | 
|---|---|---|---|
| DVS-Gesture | 93.92 | 97.56 | 98.96 | 
| DVS-CIFAR10 | 71.40 | 75.30 | 78.95 | 
We also give the test curves on the DVS-Gesture dataset. As shown in Figure 1, as the number of epochs increases, the accuracy of the model with the biologically plausible spatiotemporal adjustment fluctuates less. This is because, with the introduction of the two adjustments, the firing pattern of the neurons is more stable, making the model more robust to minor parameter changes. Meanwhile, a more reasonable gradient-allocation strategy in BP improves the model's generalization performance and avoids overfitting to a certain extent.
Figure 1.
The test accuracy curve on DVS-Gesture of our method and the baseline
Energy-efficiency study
To illustrate the energy efficiency of our algorithm, we visualize the firing frequency of different layers in the MNIST experiment. As can be seen from Figure 2, due to the biologically plausible spatiotemporal adjustment, our method exhibits an extremely low firing rate, especially in the initial convolutional layers.
Figure 2.
The firing frequency of different convolutional layers on MNIST of our method and the baseline
We compare the accuracy and energy efficiency of the SNNs trained by the method used in Wu et al.,31 the model we propose, and artificial neural networks (ANNs) using the same network structure and parameters. Most operations in ANNs are multiply-accumulate (MAC) operations, while in SNNs the spikes transmitted through the network are sparse and are simply integrated into the membrane potential; as a result, most operations in SNNs are accumulate (AC) operations. We calculate the energy consumption by multiplying the number of floating-point operations (FLOPs) by the energy cost of MAC and AC operations. We use the same energy-efficiency calculation as in Chakraborty et al.,62 and the computation details can be seen in Equation 1.
$EE = \dfrac{E_{ANN}}{E_{SNN}} = \dfrac{FLOPs_{ANN} \times E_{MAC}}{FLOPs_{SNN} \times fr \times E_{AC}}$ (Equation 1)

where $fr$ denotes the average firing rate of the SNN, and $E_{MAC}$ and $E_{AC}$ denote the energy of a MAC and an AC operation, respectively.
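A minimal sketch of this estimate (the per-operation energies below are illustrative placeholders rather than the exact constants of Chakraborty et al.,62 and `energy_efficiency` is a hypothetical helper name):

```python
# Sketch of the energy-efficiency ratio in Equation 1.
E_MAC = 4.6e-12  # assumed energy per multiply-accumulate, in joules (placeholder)
E_AC = 0.9e-12   # assumed energy per accumulate, in joules (placeholder)

def energy_efficiency(flops_ann: float, flops_snn: float, firing_rate: float) -> float:
    """EE = E_ANN / E_SNN: MAC-dominated ANN cost over spike-driven AC cost."""
    e_ann = flops_ann * E_MAC               # every ANN operation is a MAC
    e_snn = flops_snn * firing_rate * E_AC  # only emitted spikes trigger AC operations
    return e_ann / e_snn

# Example with equal FLOPs and the MNIST firing rate of our method (0.082):
print(f"EE ~ {energy_efficiency(1e9, 1e9, 0.082):.1f}x")
```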
As can be seen in Table 5, our method has a lower firing rate and higher energy efficiency. The training method of the SNNs proposed in this paper distributes the gradient more reasonably along the spatial and temporal dimensions, avoiding the problem that the earlier spiking neurons would have a more significant influence on the network parameters. The cross-spikes propagation will also enhance the temporal dependence of the SNNs. Therefore, the method proposed in this paper achieves lower network power consumption while maintaining a higher accuracy.
Table 5.
The energy-efficiency study of our model with baseline on different datasets
| Dataset | Accuracy (%) | Firing rate | EE = E_ANN/E_SNN (×) | 
|---|---|---|---|
| MNIST | 99.58/99.42 | 0.082/0.183 | 35.1/15.7 | 
| N-MNIST | 99.61/99.32 | 0.097/0.176 | 29.6/16.3 | 
| CIFAR10 | 92.33/89.49 | 0.108/0.214 | 26.6/13.4 | 
| DVS-Gesture | 98.26/93.92 | 0.083/0.165 | 34.6/17.4 | 
| DVS-CIFAR10 | 77.76/71.40 | 0.097/0.177 | 29.5/16.2 | 
Values are reported as our method/baseline.
Latency study
The latency of SNNs is one of the main problems restricting their development. Spiking neurons need to accumulate membrane potential, and only once they reach the threshold do they fire spikes and transmit information. Therefore, SNNs often require a long simulation time to achieve higher performance. Here, we study the influence of different simulation lengths on network performance.
As shown in Figure 3, without our adjustments, the test curve of the network is not very smooth when the simulation time is reduced; that is, the network needs a long simulation time to converge. As can be seen in Table 6, with the introduction of the two adjustments, our training method still achieves high accuracy while reducing the simulation time. The low latency of our approach further lays the foundation for the practical application of SNNs.
Figure 3.
The test accuracy of different simulation lengths on DVS-Gesture dataset with our method and the baseline
Table 6.
The test accuracy on DVS-Gesture dataset of different simulation lengths of our method and the baseline
| | T = 32 | T = 16 | T = 8 | T = 4 | 
|---|---|---|---|---|
| BPSA + BPTA | 98.27 | 98.26 | 96.18 | 92.01 | 
| BPSA | 96.53 | 97.56 | 94.44 | 89.58 | 
| Baseline | 95.49 | 93.92 | 84.03 | 73.96 | 
Limitations of the study
In this paper, through the analysis of BP-based SNN training, we find that neurons that do not generate spikes still participate in the update of the network weights and that the error signals along the temporal dimension cannot propagate across spikes because of the reset operation. By introducing the BPSA and BPTA mechanisms, our network updates its weights in a way that is more consistent with the brain, and the energy consumption and latency of the SNN are greatly reduced. However, there is no independent module in the brain specially designed for the BP pathway. In future work, we will explore more biologically plausible learning methods to train SNNs with high performance and robustness.
Experimental procedures
Resource availability
Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Yi Zeng (yi.zeng@ia.ac.cn).
Materials availability
This study did not generate new unique materials.
Data and code availability
All original code has been deposited at https://github.com/Brain-Inspired-Cognitive-Engine/BP-STA under https://doi.org/10.5281/zenodo.6489856 and is publicly available as of the date of publication.
Spiking-neuron model
Many spiking-neuron models with biological neural characteristics have been proposed in recent years, and the LIF model is the most commonly adopted neuron model in deep SNNs. LIF neurons continuously accumulate membrane potential and emit spikes once they reach the threshold. We give a detailed description of the LIF neuron model below. As shown in Equation 2, the membrane potential of the neuron changes dynamically with the input current.
$\tau \dfrac{dV(t)}{dt} = -\big(V(t) - V_{reset}\big) + R\,I(t)$ (Equation 2)
$I(t)$ denotes the input current, which is composed of input spikes, R is the membrane resistance, and $\tau$ is the synaptic time constant. When the membrane potential is greater than the threshold $V_{th}$, the neuron spikes and is reset to $V_{reset}$. Without loss of generality, we set the reset potential $V_{reset} = 0$ and $R = 1$. To facilitate calculation and simulation, we convert Equation 2 into a discrete form with the Euler method, in which $\lambda$ denotes the membrane potential decay constant, so that we get
$V_i^{t,n} = \lambda\, V_i^{t-1,n}\,\big(1 - o_i^{t-1,n}\big) + I_i^{t,n}$ (Equation 3)
The input $I_i^{t,n}$ can be obtained from the pre-synaptic spikes $o_j^{t,n-1}$, where $N^{n-1}$ is the number of neurons in layer $n-1$; then we can get
$I_i^{t,n} = \sum_{j=1}^{N^{n-1}} w_{ij}^{n}\, o_j^{t,n-1}$ (Equation 4)
Here $o_i^{t,n} = g\big(V_i^{t,n} - V_{th}\big)$, and the function g is the threshold function. $w_{ij}^{n}$ is the synaptic weight from neuron j in layer $n-1$ to neuron i in layer n, and $o_j^{t,n-1}$ denotes the spike of neuron j in layer $n-1$ at time t.
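For concreteness, a minimal PyTorch sketch of the discrete dynamics in Equations 3 and 4 for one fully connected layer (the decay constant, threshold, input statistics, and reset-by-multiplication form are assumptions made for illustration):

```python
import torch

def lif_step(v, spikes_prev, x_t, w, decay=0.5, v_th=0.5):
    """One discrete LIF step.
    v           : membrane potential at t-1, shape (batch, n_out)
    spikes_prev : spikes emitted at t-1, used for the reset, shape (batch, n_out)
    x_t         : pre-synaptic spikes at time t, shape (batch, n_in)
    w           : synaptic weights, shape (n_in, n_out)
    """
    i_t = x_t @ w                              # Equation 4: weighted sum of pre-synaptic spikes
    v = decay * v * (1.0 - spikes_prev) + i_t  # Equation 3: decay, reset to 0 after a spike, integrate
    spikes = (v >= v_th).float()               # threshold function g
    return v, spikes

# Unroll over T time steps with random input spike trains for the demo.
T, batch, n_in, n_out = 16, 4, 100, 10
w = 0.1 * torch.randn(n_in, n_out)
v = torch.zeros(batch, n_out)
spikes = torch.zeros(batch, n_out)
for t in range(T):
    x_t = (torch.rand(batch, n_in) < 0.2).float()
    v, spikes = lif_step(v, spikes, x_t, w)
```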
Spatiotemporal characteristics of SNNs
The discontinuity of the spike firing function makes it challenging to apply BP directly to the training of SNNs. In recent years, the surrogate gradient has been proposed to replace the discontinuous gradient with a smooth gradient function so that SNNs can conduct BP in the spatial and temporal domains. Here, we use the mean firing rate of the last layer to approximate the classification label and train the network with the mean squared error (MSE):
$L = \Big\| y - \dfrac{1}{T}\sum_{t=1}^{T} o^{t,N} \Big\|_2^2$ (Equation 5)
T denotes the simulation length, $y$ denotes the real labels, and $o^{t,N}$ denotes the output of the last layer N at time t. By applying the chain rule, we can obtain the gradient with respect to the weights:
$\dfrac{\partial L}{\partial W^{n}} = \sum_{t=1}^{T} \dfrac{\partial L}{\partial o^{t,n}} \dfrac{\partial o^{t,n}}{\partial V^{t,n}} \dfrac{\partial V^{t,n}}{\partial W^{n}}$ (Equation 6)

$\dfrac{\partial L}{\partial o^{t,n}} = \dfrac{\partial L}{\partial o^{t,n+1}} \dfrac{\partial o^{t,n+1}}{\partial o^{t,n}} + \dfrac{\partial L}{\partial o^{t+1,n}} \dfrac{\partial o^{t+1,n}}{\partial o^{t,n}}$ (Equation 7)
$\frac{\partial L}{\partial o^{t,n}}$ denotes the derivative of the loss with respect to the output o of layer n at time step t and can be derived from layer $n+1$ (spatial) and time step $t+1$ (temporal).
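The surrogate-gradient machinery behind Equations 6 and 7 can be sketched as a custom autograd function: the forward pass keeps the hard threshold g, while the backward pass substitutes a smooth window around the threshold. The rectangular window and its width are assumptions; this is the generic scheme that our adjustments modify, not our method itself:

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Hard threshold in the forward pass, rectangular surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, v, v_th=0.5, width=0.5):
        ctx.save_for_backward(v)
        ctx.v_th, ctx.width = v_th, width
        return (v >= v_th).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Gradient flows wherever the membrane potential is near the threshold,
        # including neurons that never actually fired in the forward pass.
        window = (torch.abs(v - ctx.v_th) < ctx.width).float()
        return grad_output * window, None, None

spike_fn = SurrogateSpike.apply  # usage: spikes = spike_fn(membrane_potential)
```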
As can be seen in Equation 6 and Figure 4, the traditional surrogate-gradient method calculates a gradient for any membrane potential near the threshold, even if the spiking neuron does not emit a spike in the forward process. This causes a large number of non-spiking neurons to participate in the parameter update, increasing the network's energy consumption. Also, as can be seen in Figure 4, a neuron that spikes early participates in the weight update repeatedly through the chain rule, so an earlier spiking moment has a larger influence on the weight update than a later one, while in neurophysiology, the farther the spiking activity is from the current moment, the smaller its effect.
Figure 4.
The forward and backward process of spiking neural networks
The dotted lines of different colors indicate the impact on the network at different time steps. The earlier spiking node will have more influence on the parameter update.
For an SNN trained with BP, the temporal dependence mainly comes from the accumulation of membrane potential over time. As a result, the backward process along the temporal dimension can be written as
$\dfrac{\partial V_i^{t,n}}{\partial V_i^{t-1,n}} = \lambda\,\big(1 - o_i^{t-1,n}\big)$ (Equation 8)
Since spiking neurons reset to the resting potential after reaching the threshold, $V_i^{t,n}$ has no relationship with $V_i^{t-1,n}$ once a spike has been emitted, and the temporal dependence no longer exists, as shown in Figure 5.
Figure 5.
The temporal backpropagation of LIF neurons
The information can only propagate within a single spike period and cannot propagate across spikes.
To tackle the problems mentioned above, we propose the BPSA, in which only the neurons along the hierarchical layers that emit spikes participate in the weight update. We also propose the BPTA to help the errors transmit back to the initial time step without being truncated.
BPSA
The change of the membrane potential of spiking neurons is a process of information accumulation. After neurons have accumulated enough information, they send it to the post-synaptic neurons in the form of spikes. As a result, the binary spike can be regarded as a normalization of the information contained in the membrane potential. For the BP process, it is more reasonable to calculate the gradient of the spike with respect to the membrane potential only at the moment of spiking. We propose the BPSA to improve BP-based training of SNNs. When the membrane potential does not reach the threshold, we clip the gradient of the spikes with respect to the membrane potential to avoid the problem of repeated updates at earlier times, as in Figure 4. When the membrane potential reaches the threshold, we normalize the membrane potential and spread the information through the spikes. The derivative of the spikes with respect to the membrane potential can then be expressed as
$\dfrac{\partial o_i^{t,n}}{\partial V_i^{t,n}} = \begin{cases} \dfrac{1}{V_i^{t,n}}, & V_i^{t,n} \geq V_{th} \\ 0, & V_i^{t,n} < V_{th} \end{cases}$ (Equation 9)
This method considers the influence of spikes generated by membrane potentials of different strengths on the parameter update. For a spike excited by a larger membrane potential, the optimization step on the model parameters in the BP process is smaller, ensuring the stability of the spikes. Spikes excited by membrane potentials near the threshold have a more significant impact on the model parameters, allowing the model to quickly push membrane potentials close to the threshold away from it and obtain more stable spikes.
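A sketch of how this spatial adjustment could replace the rectangular surrogate window sketched earlier; the functional form follows the reconstruction of Equation 9 above and should be read as an illustrative assumption rather than the released implementation:

```python
import torch

class BPSASpike(torch.autograd.Function):
    """Spike function with the biologically plausible spatial adjustment (sketch)."""

    @staticmethod
    def forward(ctx, v, v_th=0.5):
        spikes = (v >= v_th).float()
        ctx.save_for_backward(v, spikes)
        return spikes

    @staticmethod
    def backward(ctx, grad_output):
        v, spikes = ctx.saved_tensors
        # Gradient only where a spike was actually emitted; a spike driven by a larger
        # membrane potential receives a smaller gradient (the normalization view),
        # while one barely above threshold receives a larger gradient.
        grad_v = spikes / v.clamp(min=1e-6)
        return grad_output * grad_v, None
```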
BPTA
In biological neurons, a spike fired by a neuron affects its subsequent spikes. When the BP algorithm is used directly to optimize the parameters of SNNs, the gradient of the loss function with respect to the neuron output is only propagated from the time the neuron was last excited to the present and does not cross spikes, as shown in Equation 8 and Figure 5, so the influence between spikes is not considered in the temporal dimension. We therefore propose the BPTA across spikes. Considering that the temporal dependence disappears during the BP process, we add a residual connection between spikes in the backward pathway, as shown in Figure 6. The error transfer from time step $t+1$ to t is controlled by the residual factor α. The temporal feedback process can be written as
$\dfrac{\partial V_i^{t+1,n}}{\partial V_i^{t,n}} = \lambda\,\big(1 - o_i^{t,n}\big) + \alpha\, o_i^{t,n}$ (Equation 10)
Figure 6.
The temporal residual pathway helps the error transfer from time step t + 1 to time step t
As can be seen in Equation 10, when the neuron does not emit a spike at time t, $o_i^{t,n} = 0$ and the factor reduces to $\lambda$, which is the same as in the traditional BP algorithm. However, when the neuron fires a spike at time t, $o_i^{t,n} = 1$, and the temporal dependence can be written as $\frac{\partial V_i^{t+1,n}}{\partial V_i^{t,n}} = \alpha$. With the introduction of the BPSA and BPTA, the influence of different spikes becomes more reasonable, and the temporal residual backward pathway enables errors to propagate across spikes.
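A sketch of how the factor in Equation 10 could enter an unrolled simulation, reusing the `BPSASpike` helper from the previous sketch: the straight-through construction keeps the forward reset unchanged while letting a fraction α of the gradient flow across a spike during backpropagation. The exact placement of the residual term follows the reconstruction above and is an assumption:

```python
import torch

def lif_step_bpta(v, spikes_prev, i_t, decay=0.5, v_th=0.5, alpha=0.2):
    """One LIF step whose backward pass lets errors cross spikes (BPTA sketch)."""
    o = spikes_prev.detach()                    # previous spike acts as a gate
    factor_fwd = decay * (1.0 - o)              # forward value: ordinary decay + reset
    factor_bwd = decay * (1.0 - o) + alpha * o  # Equation 10: residual path across the spike
    # Straight-through trick: the forward value uses factor_fwd, the gradient uses factor_bwd.
    carried = v * factor_bwd + (v * factor_fwd - v * factor_bwd).detach()
    v_new = carried + i_t
    spikes = BPSASpike.apply(v_new, v_th)       # spatial adjustment from the BPSA sketch
    return v_new, spikes
```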
Acknowledgments
This work is supported by the National Key Research and Development Program (grant no. 2020AAA0104305) and the Strategic Priority Research Program of the Chinese Academy of Sciences (grant no. XDB32070100).
Author contributions
Conceptualization, G.S., D.Z., and Y.Z.; methodology, G.S. and D.Z.; investigation, G.S.; writing – original draft, G.S., D.Z., and Y.Z.; writing – review & editing, G.S., D.Z., and Y.Z.; supervision, Y.Z.; funding acquisition, Y.Z.
Declaration of interests
The authors declare no competing interests.
Published: June 2, 2022
References
- 1.Zou Z., Shi Z., Guo Y., Ye J. Object detection in 20 years: a survey. arXiv. 2019 doi: 10.48550/arXiv.1905.05055. Preprint at. [DOI] [Google Scholar]
- 2.Li P., Wang D., Wang L., Lu H. Deep visual tracking: review and experimental comparison. Pattern Recogn. 2018;76:323–338. doi: 10.3389/fnins.2020.00119. [DOI] [Google Scholar]
- 3.Masi I., Wu Y., Hassner T., Natarajan P. IEEE; 2018. Deep Face Recognition: A Survey. In 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI) pp. 471–478. [DOI] [Google Scholar]
- 4.Maass W. Networks of spiking neurons: the third generation of neural network models. Neural Network. 1997;10:1659–1671. doi: 10.1016/S0893-6080(97)00011-7. [DOI] [Google Scholar]
- 5.Kim Y., Chough J., Panda P. Beyond classification: directly training spiking neural networks for semantic segmentation. arXiv. 2021 doi: 10.48550/arXiv. Preprint at. [DOI] [Google Scholar]
- 6.Kim Y., Panda P. Visual explanations from spiking neural networks using inter-spike intervals. Sci. Rep. 2021;11:1–14. doi: 10.1038/s41598-021-98448-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Kim Y., Venkatesha Y., Panda P. Privatesnn: fully privacy-preserving spiking neural networks. arXiv. 2021 doi: 10.48550/arXiv.2104.03414. Preprint at. [DOI] [Google Scholar]
- 8.Venkatesha Y., Kim Y., Tassiulas L., Panda P. Federated learning with spiking neural networks. IEEE Trans. Signal. Process. 2021;69:6183–6194. doi: 10.1109/TSP.2021.3121632. [DOI] [Google Scholar]
- 9.Kim S., Park S., Na B., Yoon S. Spiking-yolo: spiking neural network for energy-efficient object detection. Proc. AAAI Conf. Artif. Intell. 2020;34:11270–11277. [Google Scholar]
- 10.Roy K., Jaiswal A., Panda P. Towards spike-based machine intelligence with neuromorphic computing. Nature. 2019;575:607–617. doi: 10.1038/s41586-019-1677-2. [DOI] [PubMed] [Google Scholar]
- 11.Hebb D.O. Wiley; 1949. The Organization of Behavior. [Google Scholar]
- 12.Bi G.q., Poo M.m. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci. 1998;18:10464–10472. doi: 10.1523/JNEUROSCI.18-24-10464.1998. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Diehl P.U., Cook M. Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Front. Comput. Neurosci. 2015;9:99. doi: 10.3389/fncom.2015.00099. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Kheradpisheh S.R., Ganjtabesh M., Masquelier T. Bio-inspired unsupervised learning of visual features leads to robust invariant object recognition. Neurocomputing. 2016;205:382–392. doi: 10.1016/j.neucom.2016.04.029. [DOI] [Google Scholar]
- 15.Kheradpisheh S.R., Ganjtabesh M., Thorpe S.J., Masquelier T. Stdp-based spiking deep convolutional neural networks for object recognition. Neural Network. 2018;99:56–67. doi: 10.1016/j.neunet.2017.12.005. [DOI] [PubMed] [Google Scholar]
- 16.Zhao D., Zeng Y., Zhang T., Shi M., Zhao F. Glsnn: a multi-layer spiking neural network based on global feedback alignment and local stdp plasticity. Front. Comput. Neurosci. 2020;14:576841. doi: 10.3389/fncom.2020.576841. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Diehl P.U., Neil D., Binas J., Cook M., Liu S.C., Pfeiffer M. 2015 International Joint Conference on Neural Networks (IJCNN) IEEE; 2015. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing; pp. 1–8. [Google Scholar]
- 18.Xu Q., Qi Y., Yu H., Shen J., Tang H., Pan G. IJCAI; 2018. Csnn: An Augmented Spiking Based Framework with Perceptron-Inception; pp. 1646–1652. [Google Scholar]
- 19.Sengupta A., Ye Y., Wang R., Liu C., Roy K. Going deeper in spiking neural networks: vgg and residual architectures. Front. Neurosci. 2019;13:95. doi: 10.3389/fnins.2019.00095. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Hu Y., Tang H., Wang Y., Pan G. Spiking deep residual network. arXiv. 2018 doi: 10.48550/arXiv.1805.01352. Preprint at. [DOI] [PubMed] [Google Scholar]
- 21.Li Y., Zeng Y., Zhao D. Bsnn: towards faster and better conversion of artificial neural networks to spiking neural networks with bistable neurons. arXiv. 2021 doi: 10.48550/arXiv.2105.12917. Preprint at. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Rao R.P., Ballard D.H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 1999;2:79–87. doi: 10.1038/4580. [DOI] [PubMed] [Google Scholar]
- 23.Bastos A.M., Usrey W.M., Adams R.A., Mangun G.R., Fries P., Friston K.J. Canonical microcircuits for predictive coding. Neuron. 2012;76:695–711. doi: 10.1016/j.neuron.2012.10.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Kok P., Lange F.P.d. An introduction to model-based cognitive neuroscience. Springer; 2015. Predictive coding in sensory cortex; pp. 221–244. [DOI] [Google Scholar]
- 25.Roelfsema P.R., Holtmaat A. Control of synaptic plasticity in deep cortical networks. Nat. Rev. Neurosci. 2018;19:166–180. doi: 10.1038/nrn.2018.6. [DOI] [PubMed] [Google Scholar]
- 26.Bereshpolova Y., Amitai Y., Gusev A.G., Stoelzel C.R., Swadlow H.A. Dendritic backpropagation and the state of the awake neocortex. J. Neurosci. 2007;27:9392–9399. doi: 10.1523/JNEUROSCI.2218-07.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Schiess M., Urbanczik R., Senn W. Somato-dendritic synaptic plasticity and error-backpropagation in active dendrites. PLoS Comput. Biol. 2016;12:e1004638. doi: 10.1371/journal.pcbi.1004638. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Richards B.A., Lillicrap T.P. Dendritic solutions to the credit assignment problem. Curr. Opin. Neurobiol. 2019;54:28–36. doi: 10.1016/j.conb.2018.08.003. [DOI] [PubMed] [Google Scholar]
- 29.Fitzsimonds R.M., Song H.j., Poo M.m. Propagation of activity-dependent synaptic depression in simple neural networks. Nature. 1997;388:439–448. doi: 10.1038/41267. [DOI] [PubMed] [Google Scholar]
- 30.Lillicrap T.P., Santoro A., Marris L., Akerman C.J., Hinton G. Backpropagation and the brain. Nat. Rev. Neurosci. 2020;21:335–346. doi: 10.1038/s41583-020-0277-3. [DOI] [PubMed] [Google Scholar]
- 31.Wu Y., Deng L., Li G., Zhu J., Shi L. Spatio-temporal backpropagation for training high-performance spiking neural networks. Front. Neurosci. 2018;12:331. doi: 10.3389/fnins.2018.00331. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Wu Y., Deng L., Li G., Zhu J., Xie Y., Shi L. Direct training for spiking neural networks: faster, larger, better. Proc. AAAI Conf. Artif. Intell. 2019;33:1311–1318. [Google Scholar]
- 33.Jin Y., Zhang W., Li P. Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2018. Hybrid macro/micro level backpropagation for training deep spiking neural networks; pp. 7005–7015. [Google Scholar]
- 34.Zhang W., Li P. Temporal spike sequence learning via backpropagation for deep spiking neural networks. Adv. Neural Inf. Process. Syst. 2020;33:12022–12033. [Google Scholar]
- 35.Paszke A., Gross S., Massa F., Lerer A., Bradbury J., Chanan G., Killeen T., Lin Z., Gimelshein N., Antiga L., et al. Pytorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019;32 doi: 10.48550/arXiv.1912.01703. [DOI] [Google Scholar]
- 36.Loshchilov I., Hutter F. International Conference on Learning Representations. 2018. Decoupled weight decay regularization. [DOI] [Google Scholar]
- 37.Loshchilov I., Hutter F. Sgdr: stochastic gradient descent with warm restarts. arXiv. 2016 doi: 10.48550/arXiv.1608.03983. Preprint at. [DOI] [Google Scholar]
- 38.Orchard G., Jayawant A., Cohen G.K., Thakor N. Converting static image datasets to spiking neuromorphic datasets using saccades. Front. Neurosci. 2015;9:437. doi: 10.3389/fnins.2015.00437. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Amir A., Taba B., Berg D., Melano T., McKinstry J., Di Nolfo C., Nayak T., Andreopoulos A., Garreau G., Mendoza M., et al. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. A low power, fully event-based gesture recognition system; pp. 7243–7252. [Google Scholar]
- 40.Li H., Liu H., Ji X., Li G., Shi L. Cifar10-dvs: an event-stream dataset for object classification. Front. Neurosci. 2017;11:309. doi: 10.3389/fnins.2017.00309. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Warden P. Speech commands: a dataset for limited-vocabulary speech recognition. arXiv. 2018 doi: 10.48550/1804.03209. Preprint at. [DOI] [Google Scholar]
- 42.Fang W., Chen Y., Ding J., Chen D., Yu Z., Zhou H., Tian Y., and other contributors. SpikingJelly. 2020. [Google Scholar]
- 43.Lee C., Sarwar S.S., Panda P., Srinivasan G., Roy K. Enabling spike-based backpropagation for training deep neural network architectures. Front. Neurosci. 2020;14:119. doi: 10.3389/fnins.2020.00119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Hunsberger E., Eliasmith C. Spiking deep networks with lif neurons. arXiv. 2015 doi: 10.48550/1510.08829. Preprint at. [DOI] [Google Scholar]
- 45.Panda P., Aketi S.A., Roy K. Toward scalable, efficient, and accurate deep spiking neural networks with backward residual connections, stochastic softmax, and hybridization. Front. Neurosci. 2020;14:653. doi: 10.3389/fnins.2020.00653. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Rueckauer B., Lungu I.A., Hu Y., Pfeiffer M., Liu S.C. Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Front. Neurosci. 2017;11:682. doi: 10.3389/fnins.2017.00682. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Cheng X., Hao Y., Xu J., Xu B. IJCAI; 2020. Lisnn: Improving Spiking Neural Networks with Lateral Interactions for Robust Object Recognition; pp. 1519–1525. [Google Scholar]
- 48.Kim Y., Panda P. Revisiting batch normalization for training low-latency deep spiking neural networks from scratch. Front. Neurosci. 2020;15:773954. doi: 10.3389/fnins.2021.773954. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Zhao D., Zeng Y., Li Y. Backeisnn: a deep spiking neural network with adaptive self-feedback and balanced excitatory-inhibitory neurons. arXiv. 2021 doi: 10.48550/arXiv.2105.13004. Preprint at. [DOI] [PubMed] [Google Scholar]
- 50.Zhang W., Li P. Spike-train level backpropagation for training deep recurrent spiking neural networks. Adv. Neural Inf. Process. Syst. 2019;32 doi: 10.48550/arXiv.1908.06378. [DOI] [Google Scholar]
- 51.Ding J., Yu Z., Tian Y., Huang T. Optimal ann-snn conversion for fast and accurate inference in deep spiking neural networks. arXiv. 2021 doi: 10.48550/2105.11654. Preprint at. [DOI] [Google Scholar]
- 52.Kim Y., Li Y., Park H., Venkatesha Y., Panda P. Neural architecture search for spiking neural networks. arXiv. 2022 doi: 10.48550/2201.10355. Preprint at. [DOI] [Google Scholar]
- 53.Shrestha S.B., Orchard G. Slayer: spike layer error reassignment in time. Adv. Neural Inf. Process. Syst. 2018;31 [Google Scholar]
- 54.Fang H., Shrestha A., Zhao Z., Qiu Q. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. 2020. Exploiting neuron and synapse filter dynamics in spatial temporal learning of deep spiking neural network. [DOI] [Google Scholar]
- 55.Kim Y.S., Panda P. Optimizing Deeper Spiking Neural Networks for Dynamic Vision Sensing. Neural Network. 2021;144:686–698. doi: 10.1016/j.neunet.2021.09.022. [DOI] [PubMed] [Google Scholar]
- 56.Zheng H., Wu Y., Deng L., Hu Y., Li G. Going deeper with directly-trained larger spiking neural networks. Proc. AAAI Conf. Artif. Intell. 2021;35:11062–11070. [Google Scholar]
- 57.Fang W., Yu Z., Chen Y., Masquelier T., Huang T., Tian Y. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. Incorporating learnable membrane time constant to enhance learning of spiking neural networks; pp. 2661–2671. [Google Scholar]
- 58.Lee J., Park J., Kim K.L., Nam J. Sample-level deep convolutional neural networks for music auto-tagging using raw waveforms. arXiv. 2017 doi: 10.48550/arXiv.1703.01789. Preprint at. [DOI] [Google Scholar]
- 59.de Andrade D.C., Leo S., Viana M.L.D.S., Bernkopf C. A neural attention model for speech command recognition. arXiv. 2018 doi: 10.48550/arXiv.1808.08929. Preprint at. [DOI] [Google Scholar]
- 60.Kim T., Lee J., Nam J. Comparison and analysis of samplecnn architectures for audio classification. IEEE J. Sel. Top.Signal. Process. 2019;13:285–297. doi: 10.1109/JSTSP.2019.2909479. [DOI] [Google Scholar]
- 61.Won M., Chun S., Nieto O., Serrc X. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) IEEE; 2020. Data-driven harmonic filters for audio representation learning; pp. 536–540. [Google Scholar]
- 62.Chakraborty B., She X., Mukhopadhyay S. A fully spiking hybrid neural network for energy-efficient object detection. arXiv. 2021 doi: 10.48550/2104.10719. Preprint at. [DOI] [PubMed] [Google Scholar]