Abstract
Spiking Neural Networks (SNNs), designed to more accurately model the brain’s neurobiological processes, have been proposed as energy-efficient alternatives to conventional Artificial Neural Networks (ANNs), which typically incur high computational and energy costs. However, the enhanced energy efficiency and computational savings afforded by SNNs are often achieved at the expense of reduced classification performance. Recent studies have investigated the incorporation of attention mechanisms into SNNs to enhance their classification performance, but these approaches typically repurpose attention mechanisms originally developed for conventional ANNs, which fail to fully leverage the spike-based encoding characteristics intrinsic to spiking neuron dynamics. To address this challenge, we propose the Biologically Inspired Attention Spiking Neural Network (BIASNN), a novel SNN architecture designed for image classification. BIASNN introduces a biologically inspired attention mechanism that integrates adaptive leaky integrate-and-fire neurons with components from established attention models. Our attention mechanism is placed into an existing SNN architecture that uses leaky integrate-and-fire neurons, enhancing biological fidelity by combining multiple spiking neuron models in a single network. Experiments on benchmark image classification datasets demonstrate that BIASNN achieves high classification accuracy using only four timesteps. By enabling the development of more biologically plausible attention mechanisms, BIASNN advances the capabilities of deep spiking neural networks toward more brain-like processing.
Keywords: Attention mechanisms, Biological neural networks, Image classification, Spiking neural networks
Subject terms: Computational biology and bioinformatics, Engineering, Mathematics and computing, Neuroscience
Introduction
Artificial Neural Networks (ANNs) have been at the forefront of advances in machine learning and artificial intelligence, leading to breakthroughs in tasks such as image classification1, speech recognition2, and recommendation systems3. ANNs were originally inspired by the structure and function of biological neural networks, specifically neurons in the human brain4. However, despite this biological inspiration, traditional ANNs differ significantly from the brain’s highly efficient mechanisms. The human brain achieves high-throughput information processing with remarkably low power consumption5, processing extremely large amounts of information in a massively parallel and event-driven manner. In contrast, ANNs typically require substantial computational and energy resources, due to their reliance on dense, continuous, floating-point matrix operations.
To address these limitations, and more closely mimic the efficiency of biological brains, spiking neural networks (SNNs) have gained significant attention as a next-generation neural network model. Unlike ANNs, which rely on continuous-valued activations, SNNs communicate via binary spikes (0s and 1s)6, reflecting the event-driven nature of information processing in biological neural systems. This spike-based communication enables SNNs to operate in a sparse, asynchronous fashion, significantly reducing the number of multiplication operations a network is required to perform. Consequently, SNNs offer substantial computational and energy savings7,8, making them particularly well-suited for real-time, low-power applications, such as edge computing and neuromorphic systems.
Just as the biological brain comprises different types of neurons with distinct properties9, various spiking neuron models have been developed for use in SNNs, each balancing biological plausibility against computational efficiency. The Leaky Integrate-and-Fire (LIF) neuron10, for example, represents a foundational approach, capturing essential neuronal dynamics such as membrane potential decay and threshold-based spike generation with minimal computational overhead. The adaptive leaky integrate-and-fire (ALIF) neuron11 extends the LIF framework by incorporating mechanisms for adaptive threshold modulation or membrane potential adjustment, thereby increasing biological realism while maintaining relative simplicity. In contrast, more biophysically detailed models, such as the Izhikevich model12 and Hodgkin–Huxley model13, offer higher fidelity by reproducing complex neuronal behaviors, including bursting, resonance, and various firing patterns, at the cost of significantly increased computational demands.
The selection of neuron models within an SNN plays a critical role in determining both the network’s computational efficiency and its functional accuracy. While the Izhikevich and Hodgkin–Huxley models are well-suited for applications that are focused on replicating specific aspects of biological networks, their high computational demands render them impractical for use in deep networks. Conversely, the LIF and ALIF models are commonly employed in SNNs designed to replicate tasks performed by ANNs, due to their lower computational requirements. Previous works14–17 chose to adopt the LIF neuron model due to its simplicity, and demonstrated its effectiveness in achieving state-of-the-art results. However, the ALIF model has been shown to provide improved firing rate stability, with only a minor increase in energy consumption, while producing competitive accuracies18. Despite the proven success of both the LIF and ALIF neuron types independently, to the authors’ knowledge, no existing work has investigated the integration of both neurons within a unified network architecture, highlighting a notable gap in the current literature and a promising area for research.
Despite the computational advantages SNNs provide, they face challenges in achieving the high accuracy observed in traditional ANNs. This remains true even for fundamental tasks like image classification, except for networks evaluated on smaller-scale datasets19–21. To help alleviate this issue, researchers have once again turned to the human brain for inspiration. The biological brain, particularly the visual cortex, dynamically allocates resources to the most important parts of the visual field based on the task being performed or some external stimuli. This enables the brain to filter out unnecessary information, allowing humans to process complex environments efficiently and make decisions quickly22,23. Inspired by this phenomenon, attention mechanisms have been successfully integrated into ANNs24–28, leading to the development of highly successful models like transformers and vision transformers (ViTs)29, which excel at prioritizing key features for improved task performance. The idea of incorporating attention into SNNs has been the subject of several recent research articles, and it has been shown to be an effective tool for optimizing spike generation and processing, leading to better performance and energy efficiency14,30–34. While attention has been successfully integrated into SNNs, the mechanisms used in most studies were created for ANNs, leaving an open avenue of research: exploring mechanisms that are more biologically plausible.
This paper explores the combination of a biologically inspired attention mechanism and SNNs for the task of image classification. Specifically, we propose a new 3D spatial-channel attention mechanism for SNNs. The attention mechanism makes use of the spiking output of ALIF neurons to create a binary attention mask, which is applied to the input features to eliminate noisy or non-vital information. Our mechanism is inserted into an existing SNN using LIF neurons, creating a new network capable of using multiple spiking neuron types. The proposed attention mechanism is further analyzed using explainable AI tools to enhance the interpretability of its effects on the decision-making process. Our Biologically Inspired Attention SNN (BIASNN) model is evaluated on three static image datasets (FMNIST, CIFAR-10, and CIFAR-100), with resulting accuracies of 95.66%, 94.22%, and 75.40%, respectively. The main contributions of our work can be summarized as follows.
We create a 3D, spike-based attention mechanism that uses ALIF neurons for controlling attention within the spatial and channel dimensions of images.
We propose a new method for making use of multiple types of spiking neurons in an SNN.
We make use of a Grad-CAM-like method to further analyze how our proposed mechanism affects the classification of the input images.
Experimental results show that our new method obtains comparable results when measured against existing SNN models.
Methods
The goal of this work is to create a new, more biologically plausible form of attention, and integrate the proposed mechanism into an existing SNN architecture that currently uses LIF neurons. This combination is used to form the proposed BIASNN model. The subsections below discuss the details of the backbone architecture, the inner workings of our new attention mechanism, and the spiking ALIF block used for generating the final attention map.
Backbone architecture
The ResNet model1 is a widely adopted deep neural network architecture for image processing, originally developed to mitigate the degradation problem in deep networks through the use of identity-based residual connections. Building upon this concept, the MS-ResNet architecture17 was created for SNNs. In this framework, data encoding is performed via an initial convolutional layer that transforms static image inputs into a format suitable for spike-based processing. The encoded signals are then propagated through a series of residual blocks, each comprising two spiking neuron layers followed by a convolutional layer. This structure permits the exchange of floating-point feature maps between blocks, enabling improved representational capacity and learning stability. Due to its success in image classification, we adopt the MS-ResNet18 architecture as the backbone for our BIASNN network. Following the MS-ResNet18 design paradigm, our model consists of eight residual blocks, with the proposed attention mechanism inserted after every other block, except for the last.
The entire architecture of the proposed network is illustrated in Fig. 1. As depicted in Fig. 1a, the BIASNN model begins with an initial two-dimensional convolutional layer, configured with a kernel size of seven, a stride of one, and padding of three, which serves to encode the input data into a format suitable for downstream spike processing. This is followed by a sequence of residual blocks, the internal structure of which is detailed in Fig. 1b. Each residual block begins with a Leaky Integrate-and-Fire (LIF) neuron layer, which integrates synaptic input, in the form of weighted floating-point values, into the individual neurons’ membrane potentials. When a membrane potential crosses a threshold, an output spike is generated. The resulting spike trains are propagated through a convolutional layer, followed by batch normalization to stabilize learning. The normalized output is subsequently fed into a second LIF layer, whose spiking activity is again processed through a convolutional layer and a second batch normalization step. A residual connection is added to the output of the final batch normalization operation, enabling gradient flow and promoting stable training. Architectural variations for the first convolutional layer in each residual block group are detailed in Table 1. All other convolutional layers throughout the network utilize a kernel size of three, a stride of one, and padding of one.
Fig. 1.
Overview of the proposed BIASNN network. The BIASNN model (a) consists of MS-ResNet18 residual blocks (b) and our proposed attention mechanism (c). The attention mechanism combines the CBAM and Squeeze-and-Excite methods with an ALIF block (d) to create a 3D attention mechanism. The ALIF block contains four layers: Channel Normalization, Data Inversion, ALIF neurons, and Spike Inversion. The combination of these layers is designed so that the attention mechanism will learn to eliminate the least important values from the input data.
Table 1.
Information about the first convolutional layer in each group of blocks.
| Group number | Convolution information | Channel output sizes |
|---|---|---|
| 1 | k = 3, s = 1, p = 1 | 64 |
| 2 | k = 3, s = 2, p = 1 | 128 |
| 3 | k = 3, s = 2, p = 1 | 256 |
| 4 | k = 3, s = 1, p = 1 | 512 |
For our LIF layers, we utilize the following equations:

$$V_n[t] = \lambda V_n[t-1] + \sum_i w_{in} S_i[t] \tag{1}$$

$$S_n[t] = H\!\left(V_n[t] - V_{th}\right) \tag{2}$$

$$V_n[t] = V_n[t]\left(1 - S_n[t]\right) \tag{3}$$

where $V_n[t]$ is the membrane potential of the nth postsynaptic neuron at time step t, and a time step is a single iteration through the network. The variable $\lambda$ is the decay constant of the neuron, $w_{in}$ is the weight between the ith presynaptic neuron and the nth postsynaptic neuron, and $S_i[t]$ is the output spike value from the ith presynaptic neuron. $S_n[t]$ is the output spike value of the nth postsynaptic neuron, $H(\cdot)$ represents the Heaviside function, and $V_{th}$ is the voltage threshold of the postsynaptic neuron. In general, Eq. (1) represents the voltage update process of a neuron, Eq. (2) is used to determine whether a spike is generated, and Eq. (3) resets the voltage of the neuron if a spike occurs. A diagram depicting the process an LIF neuron in our model undergoes can be seen in Fig. 2a.
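As a concrete illustration, the update-fire-reset cycle of Eqs. (1)–(3) can be sketched as follows; the decay constant, threshold, and input values here are illustrative assumptions rather than the trained settings of our network.

```python
import numpy as np

def lif_step(v, weighted_input, decay=0.5, v_th=1.0):
    """One LIF time step: leak and integrate (Eq. 1), fire (Eq. 2), reset (Eq. 3)."""
    v = decay * v + weighted_input           # Eq. (1): decayed potential plus weighted input
    spikes = (v >= v_th).astype(v.dtype)     # Eq. (2): Heaviside function at the threshold
    v = v * (1.0 - spikes)                   # Eq. (3): hard reset to zero on spike
    return spikes, v

# Two neurons over one time step; only the first crosses the threshold.
spikes, v = lif_step(np.zeros(2), np.array([1.2, 0.3]))
```

Here the first neuron crosses the threshold, emits a spike, and is reset, while the second simply carries its sub-threshold potential forward to the next time step.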
Fig. 2.
Overview of the processes carried out by an LIF neuron and ALIF neuron. The LIF neuron, depicted in (a), takes the input from the previous layer and integrates it into the decayed (leaked) membrane potential. If the membrane potential reaches a specified threshold, the neuron will fire a spike and reset its membrane potential. Otherwise, it will remain silent, and the membrane potential will be transferred to the next time step as is. The ALIF neuron, depicted in (b), operates in a similar fashion to the LIF, but with one key difference. The ALIF neuron will adjust its firing threshold based on whether a spike occurred in the current time step. If the neuron generates a spike, its threshold will increase, and if a spike wasn’t generated, the threshold will decrease.
Our model makes use of the surrogate gradient method for training35. The surrogate gradient method enables direct training of our SNN and lets us build on existing deep learning libraries. To overcome the issue of nondifferentiable spiking outputs from the LIF layers, we use the following equation17,36 during the backward pass:

$$\frac{\partial S_n[t]}{\partial V_n[t]} = \frac{1}{a}\,\mathrm{sign}\!\left(\left|V_n[t] - V_{th}\right| < \frac{a}{2}\right) \tag{4}$$

where the variable $a$ is a constant used to keep the integral of the function set to 1.
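A minimal sketch of such a rectangular surrogate, assuming the common formulation in which a window of width a and height 1/a is centred on the threshold (so the window integrates to 1):

```python
import numpy as np

def rect_surrogate_grad(v, v_th=1.0, a=1.0):
    """Backward-pass approximation of dS/dV: a rectangle of width a and
    height 1/a centred on the firing threshold, integrating to 1."""
    return (np.abs(v - v_th) < a / 2) / a

# Gradient is non-zero only for membrane potentials near the threshold.
grads = rect_surrogate_grad(np.array([0.2, 0.9, 1.0, 1.8]))
```

Membrane potentials far from the threshold receive zero gradient, which is what keeps backpropagation through spikes tractable.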
Attention mechanism
A detailed overview of the architecture of our attention mechanism can be seen in Fig. 3a, and its distinct steps are summarized in Algorithm 1. Our attention mechanism draws its inspiration from the CBAM37 and SE24 attention architectures and begins with the operations listed below.

$$U[t] = \mathrm{Concat}\!\left(\mathrm{AP}(X[t]),\ \mathrm{MP}(X[t])\right) \tag{5}$$

$$Y[t] = \mathrm{DS}_2\!\left(\mathrm{DS}_1(U[t])\right) \tag{6}$$
Fig. 3.
Overview of the proposed attention mechanism and ALIF block. Shown in (a) is the process data undergoes in the attention mechanism. Here, global average and max pooling are used to squeeze the data in the channel dimension. The DS convolutions are then used to gradually increase (excite) the number of channels back to that of the original input. Shown in (b) is a detailed look at the ALIF block. The colored squares in the cubes show the data being transformed at each step. In both figures, the final attention map consists of only black squares (0’s) and white squares (1’s).
In Eq. (5), $X[t]$ denotes the input data at time step t, while $U[t]$ represents the concatenated feature maps obtained from average pooling (AP) and max pooling (MP) operations. In Eq. (6), $U[t]$ is processed by two depth-wise separable (DS) convolutional layers. The DS convolutions utilize kernel sizes of 5 and 7, strides of 2 and 3, and have output channel sizes of C/r and C, respectively, to produce the output $Y[t]$. These convolutional layers are designed to progressively excite the channel dimensionality to match that of the original input, with the rate of channel expansion governed by the hyperparameter r.
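The channel-squeeze step of Eq. (5) can be sketched as below; the (C, H, W) tensor layout and the function name are our assumptions for illustration, and the subsequent DS convolutions are omitted.

```python
import numpy as np

def channel_squeeze(x):
    """Eq. (5): pool the input across the channel axis and stack the
    average- and max-pooled maps into a 2-channel spatial descriptor."""
    ap = x.mean(axis=0, keepdims=True)   # global average pooling over channels
    mp = x.max(axis=0, keepdims=True)    # global max pooling over channels
    return np.concatenate([ap, mp], axis=0)

x = np.arange(24, dtype=float).reshape(3, 2, 4)  # (C=3, H=2, W=4)
u = channel_squeeze(x)                            # shape (2, H, W)
```

The two pooled maps preserve the spatial layout while summarizing channel activity, which the DS convolutions then expand back to C channels.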
ALIF block
Once the data has been convolved, it is passed into the ALIF block. As can be seen in Fig. 3b, the ALIF block consists of four different steps. The first two steps are channel normalization18,38,39 and inversion, and the equations for these processes are:
$$\hat{Y}[t] = \frac{Y[t] - Y_{\min}}{Y_{\max} - Y_{\min} + \epsilon} \tag{7}$$

$$\tilde{Y}[t] = 1 - \hat{Y}[t] \tag{8}$$

where $\hat{Y}[t]$ denotes the normalized output values, $Y_{\min}$ and $Y_{\max}$ represent the per-channel minimum and maximum values of $Y[t]$, and $\epsilon$ is a small, constant value added to the denominator to prevent division by zero in the rare case that the minimum and maximum values are equal. After normalization to the range [0, 1], the data is inverted using Eq. (8), resulting in the output $\tilde{Y}[t]$.
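A short sketch of this per-channel normalization and inversion; the tensor layout and epsilon value are illustrative assumptions.

```python
import numpy as np

def normalize_and_invert(y, eps=1e-8):
    """Eq. (7): per-channel min-max normalization into [0, 1];
    Eq. (8): inversion, so the smallest inputs become the largest outputs."""
    y_min = y.min(axis=(1, 2), keepdims=True)   # per-channel minimum
    y_max = y.max(axis=(1, 2), keepdims=True)   # per-channel maximum
    return 1.0 - (y - y_min) / (y_max - y_min + eps)

y = np.array([[[0.0, 2.0], [4.0, 8.0]]])  # a single 2x2 channel
inv = normalize_and_invert(y)
```

After inversion, the least important (smallest) values sit closest to 1, which is what drives the downstream ALIF neurons to spike for them.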
Once the data has been inverted, it is sent to a layer of ALIF neurons with adaptive thresholds18,40 to generate spikes. Our ALIF neurons make use of the following equations:
$$M_n[t] = \lambda M_n[t-1] + \tilde{Y}_n[t] \tag{9}$$

$$A_n[t] = H\!\left(M_n[t] - \theta_n[t]\right) \tag{10}$$

$$M_n[t] = M_n[t]\left(1 - A_n[t]\right) \tag{11}$$

$$u_n[t] = \frac{1}{B}\sum_{b=1}^{B}\left(2A_{b,n}[t] - 1\right) \tag{12}$$

$$\theta_n[t+1] = \theta_n[t] + \frac{\beta\, u_n[t]}{\tau_\theta} \tag{13}$$

where $M_n[t]$ is the membrane potential of the nth neuron at time step t, $\lambda$ represents the membrane potential decay constant of the neuron, $A_n[t]$ is the output spike value of the neuron, $H(\cdot)$ is the Heaviside function, and $\theta_n[t]$ is the membrane potential voltage threshold of the neuron. The variable $u_n[t]$ is the update value for the threshold at the next time step, averaged over the batch dimension B. $\beta$ is the scale factor for the update value, and $\tau_\theta$ is the update time constant. Overall, Eqs. (9)–(11) are used to update the membrane potential of the neuron, generate a spike when necessary, and reset the membrane potential of the neuron if a spike occurs. Equations (12) and (13) are used to update the threshold value of the ALIF neuron for the next time step. A diagram depicting the process an ALIF neuron in our attention mechanism undergoes can be seen in Fig. 2b.
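The full ALIF update can be sketched as below; the batch dimension is omitted for clarity, and the constants and the exact form of the threshold rule are illustrative assumptions rather than our trained hyperparameters. The key behaviour, matching Fig. 2b, is that a neuron's threshold rises after it spikes and falls when it stays silent.

```python
import numpy as np

def alif_step(m, x, theta, decay=0.5, beta=0.1, tau=2.0):
    """One ALIF time step: membrane update, fire, and reset, plus an
    adaptive threshold that rises after a spike and relaxes otherwise."""
    m = decay * m + x                        # leak and integrate
    spikes = (m >= theta).astype(m.dtype)    # fire against the adaptive threshold
    m = m * (1.0 - spikes)                   # reset on spike
    update = 2.0 * spikes - 1.0              # +1 if the neuron fired, -1 if silent
    theta = theta + beta * update / tau      # scaled, time-constant-damped adaptation
    return spikes, m, theta

spikes, m, theta = alif_step(np.zeros(2), np.array([1.5, 0.2]), np.ones(2))
```

After one step, the firing neuron's threshold has moved above its initial value and the silent neuron's threshold has moved below it, stabilizing the layer's firing rate over time.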
To overcome the non-differentiability of spike-based outputs, our ALIF layer makes use of the arctan surrogate method35,41 in the backpropagation step, which can be derived as seen below.
$$\frac{\partial A_n[t]}{\partial M_n[t]} = \frac{\alpha}{2\left(1 + \left(\frac{\pi}{2}\alpha\left(M_n[t] - \theta_n[t]\right)\right)^2\right)} \tag{14}$$
Algorithm 1.
Pseudo-code of the proposed attention mechanism
In Eq. (14), $\alpha$ is a constant used to scale the output gradient. Once the spikes for the ALIF layer’s neurons have been calculated, they are inverted using Eq. (15) below:
$$\tilde{A}[t] = 1 - A[t] \tag{15}$$
where $\tilde{A}[t]$ denotes the final output spike values for all ALIF neurons. The ALIF spikes are computed in this way to keep the output of the ALIF neurons sparse, as is typically desired from spiking neurons. Finally, using Eq. (16), the original attention mechanism input is multiplied by the inverted spikes to get the final output of the attention mechanism.
$$O[t] = X[t] \odot \mathrm{STE}\!\left(\tilde{A}[t]\right) \tag{16}$$
In Eq. (16) above, STE stands for the straight-through estimator42. The straight-through estimator is used to allow gradient calculations for all values of $X[t]$, even those that were removed by the attention mask.
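The straight-through behaviour can be sketched with an explicit forward/backward pair; this is a framework-free illustration of the idea, not our training implementation, which relies on automatic differentiation.

```python
import numpy as np

def ste_mask_forward(x, mask):
    """Forward pass: the binary attention mask zeroes out un-attended values."""
    return x * mask

def ste_mask_backward(grad_out, mask):
    """Straight-through estimator: treat the mask as identity in the
    backward pass, so gradients flow to every input value, even
    those that were masked out (instead of grad_out * mask)."""
    return grad_out

x = np.array([0.5, -1.0, 2.0])
mask = np.array([1.0, 0.0, 1.0])
y = ste_mask_forward(x, mask)            # masked forward output
g = ste_mask_backward(np.ones(3), mask)  # gradient reaches masked values too
```

Without the STE, masked-out inputs would receive zero gradient and the network could never learn to re-admit features it once suppressed.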
Experimental setup
We evaluate our newly proposed BIASNN on three standard datasets: FashionMNIST43, CIFAR1044, and CIFAR10044. For all experiments, the adaptation parameter $\theta$ is reset at the end of each epoch. We incorporate this strategy to simulate the long-term, homeostatic effects of threshold variations typically seen in biological neurons45. Table 2 lists the common hyperparameter settings for the three datasets. Information pertaining to each dataset can be seen in Table 3. For training, each dataset is augmented with random horizontal flipping and random cropping with a padding of four. To ensure a fair comparison, we processed all three datasets through the MS-ResNet18 network using the same convolution settings employed by our BIASNN (see Table 1 for details).
Table 2.
Hyperparameter settings for all datasets.
Table 3.
Information about FMNIST, CIFAR10, and CIFAR100 datasets.
| Dataset | Training images | Validation images | Number of classes | Number of channels | Image size |
|---|---|---|---|---|---|
| FMNIST | 60,000 | 10,000 | 10 | 1 | 28 × 28 |
| CIFAR10 | 50,000 | 10,000 | 10 | 3 | 32 × 32 |
| CIFAR100 | 50,000 | 10,000 | 100 | 3 | 32 × 32 |
The results of our MS-ResNet18 experiments are indicated with a superscript “a” in Table 4. All reported results correspond to the highest validation accuracies achieved for each dataset within 150 training epochs. All experiments are carried out on an NVIDIA RTX 4090 graphics card using the PyTorch 2.1 library.
Table 4.
Experimental results on the FashionMNIST, CIFAR10, and CIFAR100 datasets.
| Dataset | Model | Training method | Time steps (T) | Top-1 Acc |
|---|---|---|---|---|
| FMNIST | R-CSNN51 | STiDi-BP | 100 | 92.80 |
| | Yin et al.47 | Proxy Learning | 50 | 94.56 |
| | SFA-SNN52 | SG | 20 | 92.40 |
| | Liu et al.53 | SG | 24 | 90.31 |
| | Dan et al.46 | SG | 6 | 95.44 |
| | SONN54 | SG | 300/1000 | 90.0 |
| | MS-ResNet17 (ResNet18)a | SG | 4 | 95.47 |
| | BIASNN | SG | 4 | 95.66 |
| CIFAR10 | CQ Training48 (VGG*) | ANN-SNN | 600 | 94.16 |
| | Wang et al.49 (ResNet-18) | ANN-SNN | 4 | 93.27 |
| | Wang et al.49 (VGG-16) | ANN-SNN | 4 | 94.06 |
| | SpikeConverter55 (VGG-16) | ANN-SNN | 16 | 93.71 |
| | Yan et al.56 (VGG-*) | ANN-SNN | 8 | 93.71 |
| | Li et al.57 (ResNet18) | SG | 4 | 92.92 |
| | Dan et al.46 | SG | 6 | 94.04 |
| | Sun et al.50 (SResNet38) | SG | 20 | 93.54 |
| | MA-SNN14 (VGG-11) | SG | 6 | 91.91 |
| | MS-ResNet17 (ResNet18)a | SG | 4 | 93.84 |
| | BIASNN | SG | 4 | 94.22 |
| CIFAR100 | Yan et al.56 (VGG16) | ANN-SNN | 16 | 66.11 |
| | SpikeConverter55 (VGG-16) | ANN-SNN | 16 | 71.22 |
| | CQ Training48 (VGG*) | ANN-SNN | 300 | 71.84 |
| | Wang et al.49 (VGG-16) | ANN-SNN | 4 | 70.08 |
| | Li et al.57 (VGG16) | SG | 4 | 69.40 |
| | Sun et al.50 (SResNet38) | SG | 20 | 71.77 |
| | MA-SNN14 (VGG-11) | SG | 1 | 60.49 |
| | Liu et al.18 | SG | 8 | 67.83 |
| | LitE-SNN58 | SG | 4 | 69.55 |
| | MS-ResNet17 (ResNet18)a | SG | 4 | 74.98 |
| | BIASNN | SG | 4 | 75.40 |
In the Training Method column, rows set to SG indicate studies that use surrogate gradients for training, rows set to ANN-SNN indicate studies that use ANN-SNN conversion for training, and all others are custom learning methods.
a Indicates results from experiments performed by the authors.
Results
The results of our experiments on the FMNIST dataset are shown in Table 4. As can be seen in the table, our BIASNN model achieves an accuracy of 95.66% when the number of time steps (T) is set to 4. The proposed solution achieves an increase in accuracy over the backbone MS-ResNet18 of 0.19%. Excluding the backbone network, the method with the next closest accuracy is the method proposed by Dan et al.46. Their method achieved an accuracy 0.22% less than ours, while requiring two extra time steps. The Proxy Learning method47 shows the next closest accuracy to our BIASNN; however, this method required 50 time steps and produced an accuracy 1.1% lower than BIASNN. Table 4 also lists the results of experiments on the CIFAR10 dataset. As can be seen in the table, the BIASNN network achieves an accuracy of 94.22%, a 0.38% increase in accuracy over the backbone MS-ResNet18. Comparing our results with other methods on the CIFAR10 dataset, we see that the CQ Training48 and Wang et al.49 methods achieve accuracies only slightly lower than those of the proposed BIASNN. The CQ Training method achieves an accuracy 0.06% lower than BIASNN. However, their method makes use of ANN-SNN conversion and requires 600 time steps to achieve a similar accuracy to our four-time-step method. The method proposed by Wang et al. made use of ANN-SNN conversion as well, and was able to achieve results slightly lower than our BIASNN method (by 0.16%) in four time steps with a VGG-16-based network. However, when applied to the typically more powerful ResNet-18 architecture, the accuracy of their method decreases to 93.27%, which is 0.95% less than the accuracy achieved by our proposed BIASNN. Besides the decrease in accuracy, the CQ Training and Wang et al. methods still do not allow for the direct training of the SNN, requiring an extra step with the conversion process.
Table 4 lists the results of experiments on the CIFAR100 dataset. When employing four time steps, our proposed BIASNN model is able to achieve an accuracy of 75.40%, an increase in accuracy of 0.42% over the backbone MS-ResNet18. In the case of the CIFAR100 dataset, the CQ Training method is the next closest competitor to the proposed BIASNN. Their method produces an accuracy of 71.84% with 300 time steps, which is a relatively large drop in performance (3.56%) compared to our BIASNN’s results. The method proposed in Sun et al.50 demonstrates the next highest accuracy compared to the proposed BIASNN. It achieves an accuracy of 71.77%, again showing a large drop in performance (3.63%) compared to our BIASNN.
Discussion
To better analyze the effects the proposed attention mechanism has on the overall classification accuracy of the network, we make use of two different methods to visualize its impact. First, we compare the individual class accuracies with those of the backbone architecture, and second, we make use of a Grad-CAM-like method for generating heat maps of the spiking outputs of the attention layer.
Shown in Fig. 4a, b are the confusion matrices for the original MS-ResNet18 and our proposed BIASNN on the CIFAR10 dataset. We see that the proposed attention mechanism helps increase the accuracy of several classes, most notably the airplane (+2%), cat (+3%), and frog (+1%) classes. However, it decreases the accuracy of some classes, such as dogs (−2%) and horses (−2%). This may be a sign of the network eliminating the wrong information, or too much of it, making some classes more difficult to distinguish.
Fig. 4.
Confusion matrix results for the original MS-ResNet18 (a), and our proposed BIASNN (b) on the CIFAR10 dataset.
The Grad-CAM method59 is used for visualizing the effects individual layers have on the final output of a network. It makes use of activation values and their gradients to generate a heatmap, allowing researchers to better understand how different layers are affecting a model. Here, we use this same concept to study the effect the spiking layer in our attention mechanism has on our proposed network. However, unlike the original Grad-CAM, we do not use the gradients of the spiking layers for two reasons. First, the gradients of spiking outputs are generated using a surrogate gradient function, which is only an approximation of a spike’s gradient and can introduce errors into the final heatmap. Second, the surrogate functions are typically designed so that lower input values will have higher gradient values. This can result in misleading heatmaps, as the areas that caused spikes to occur, i.e., areas with the most information, will have the smallest gradients. In our specific case, since the final spike outputs were inverted, the highest gradients should correspond to the spike values for the attention mask; however, we prefer to use a method that can be more easily applied to any spiking layer in the network.
Instead, we make use of the total number of spikes across all time steps to create the final heatmap. To generate the final heatmap for each image, the number of spikes is summed across all time steps, and then normalized to the range [0, 1] using Eqs. (17) and (18) below.

$$C = \sum_{t=1}^{T} \tilde{A}[t] \tag{17}$$

$$\hat{C} = \frac{C - C_{\min}}{C_{\max} - C_{\min}} \tag{18}$$
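A sketch of this heatmap construction, assuming the spike tensor is stored as (T, H, W):

```python
import numpy as np

def spike_heatmap(spikes):
    """Sum spikes over all time steps, then min-max normalize the
    counts into [0, 1] for visualization."""
    counts = spikes.sum(axis=0)                 # total spikes per spatial location
    c_min, c_max = counts.min(), counts.max()
    return (counts - c_min) / (c_max - c_min)   # normalized heatmap

spikes = np.array([[[1.0, 0.0], [0.0, 1.0]],
                   [[1.0, 0.0], [1.0, 1.0]]])   # T=2 time steps, 2x2 map
h = spike_heatmap(spikes)
```

Locations that spiked at every time step map to 1, and locations that never spiked map to 0, making the attended regions directly visible.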
In Fig. 5, class-specific heatmaps were generated using the inverted spike outputs, computed via Eq. (15), from the final attention mechanism in the network. As can be seen in the figure, the number of spikes generated in areas of importance tends to be higher, while areas of less importance produce fewer spikes. This is an indication that our attention mechanism does help the network learn where to focus, and is demonstrated particularly well in the airplane, deer, and bird heatmaps.
Fig. 5.
Heatmap images, and the corresponding input CIFAR10 image, for each of the ten classes in the CIFAR10 dataset. The heatmap images show the normalized spike counts generated from the inverted spikes, calculated using Eq. (15), in the last attention mechanism in our proposed BIASNN.
Our proposed attention mechanism makes use of ALIF neurons due to their more stable firing rate. However, the LIF neuron is generally a more popular choice when working with SNNs. To validate our use of the ALIF layer of neurons in our attention mechanism, we examine the results of the BIASNN model when an LIF layer is used as the spiking layer in our attention mechanism. We term this method Leaky Attention SNN (LASNN), and compare its results to those of the proposed BIASNN model, where an ALIF layer of neurons is used. From our experiments, we find that the BIASNN model achieves an accuracy of 94.22%, whereas the LASNN method produces an accuracy of 93.68%, a considerable drop in performance. In an effort to understand why the BIASNN method outperforms the LASNN method, we look at two different pieces of information generated by the network. The first is the number of neuron values that are eliminated at each time step by each attention mechanism. The number of eliminated values plays a crucial role in the amount of information that is allowed to pass through the rest of the network and will ultimately affect the final accuracy. Shown in Fig. 6a, the number of values that are eliminated using the LASNN method varies greatly for each time step in all attention mechanisms. Also, there is a considerable difference in the number of eliminated values between attention mechanisms. In contrast, the BIASNN method gradually increases the number of eliminated values between time steps for each mechanism, and the number eliminated by each attention mechanism is relatively consistent.
Fig. 6.
Graphs displaying BIASNN and LASNN comparisons. Depicted in (a), the percentage of neuron activations suppressed by the LIF-based attention mechanism (LASNN) and the proposed ALIF-based mechanism (BIASNN). Shown in (b), the percentage of spiking neurons in the LIF layers that follow the attention mechanisms.
Second, we look at the firing patterns of the LIF layers that directly follow the attention mechanisms. The firing patterns seen in these layers will affect the rest of the layers in the block, and ultimately the accuracy of the network, so stability becomes an important factor. Plotted in Fig. 6b, the number of spiking neurons in the LIF layers varies considerably for the LASNN method, which is to be expected given the greater variability in the number of eliminated neuron values in the attention mechanisms. In contrast, the BIASNN method produces LIF neurons that are gradually less activated over time, and the variation between the LIF layers is more consistent. This follows well with the gradual increase in the number of eliminated neuron values for each attention mechanism in the BIASNN method. Based on these two pieces of information, it appears that the instability of LASNN is what leads to its lower classification accuracy.
We also examine the effect that using the LASNN method has on the energy requirements of our proposed network. For our energy calculations, we follow60, where the network is assumed to be running on a 45 nm CMOS chip, and addition and multiplication operations require 0.9 pJ and 3.7 pJ of energy, respectively. Results show that the BIASNN method consumes 0.55 mJ of energy and the LASNN method requires 0.59 mJ of energy, an outcome that seems counterintuitive considering the use of the more complex ALIF neurons in BIASNN. Examining the number of multiplication and addition operations within the attention mechanisms, we find that the LIF neurons require 229,376 multiplications and 458,752 additions, whereas the ALIF neurons require 458,755 multiplications and 688,688 additions. Although the ALIF neurons require approximately twice the number of multiplications and 1.5 times the number of additions, this adds minimal computational overhead to the network. In fact, this small increase in complexity is due to our placement strategy for the attention mechanisms, as only three ALIF layers are added to the network, one for each attention mechanism. However, the small increase in complexity does not explain the difference in power consumption between the two models. Instead, the difference can be seen by examining the spiking rate of the LIF neurons outside of the attention mechanisms in both networks. The LASNN method shows an average spiking rate of 11.88%, while the BIASNN method maintains a lower rate of 9.64%. The reduced spike rate of BIASNN results in it requiring 54,338,180 fewer addition operations compared to LASNN, offsetting the extra cost of the ALIF neurons and accounting for the 0.05 mJ difference in energy consumption. A particularly clear example of the higher spike rate in LASNN is observed in LIF layer 9, shown in Fig. 6b.
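The energy accounting above can be reproduced in a few lines; the variable names are ours, and the per-network totals reported in the text include many operations beyond the attention layers shown here.

```python
E_ADD, E_MUL = 0.9e-12, 3.7e-12  # joules per operation on a 45 nm CMOS chip

def op_energy(n_mul, n_add):
    """Energy of a block: multiplications at 3.7 pJ plus additions at 0.9 pJ."""
    return n_mul * E_MUL + n_add * E_ADD

lif_attn_energy = op_energy(229_376, 458_752)    # LIF-based attention (LASNN)
alif_attn_energy = op_energy(458_755, 688_688)   # ALIF-based attention (BIASNN)
downstream_savings = 54_338_180 * E_ADD          # additions saved by BIASNN's lower spike rate
```

The downstream savings (about 0.049 mJ) dwarf the extra cost of the ALIF neurons inside the attention mechanisms, consistent with the roughly 0.05 mJ gap between the two models.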
Based on the network's reduced spiking rate, lower energy consumption, and increased accuracy, we conclude that incorporating ALIF layers into our attention mechanism is the most effective design choice.
In addition to employing two types of spiking neurons, our network also incorporates two distinct surrogate gradient functions, one tailored for each neuron type. This design choice was motivated by observed performance improvements when using two functions instead of the standard single-function approach. Experiments on the CIFAR10 dataset show that employing Eq. (4) as the surrogate for all spiking layers reduces the network’s accuracy to 93.97%, a reduction of 0.25% compared to the proposed dual-function model. Conversely, when Eq. (14) is used as the surrogate gradient for all spiking layers, the network reached a peak accuracy of only 73.33% at epoch 34. After this point, both accuracy and spiking activity progressively declined, and by epoch 70 the network exhibited zero spiking activity and an accuracy of 10%. This is the dead neuron problem commonly observed in SNNs, and no learning occurred for the remaining epochs. The improved accuracy observed when combining Eqs. (4) and (14) suggests that, in certain cases, employing multiple surrogate gradient functions is essential for maximizing network performance. For our BIASNN, this combination proved to be the most effective configuration.
While the proposed BIASNN introduces a more biologically plausible form of attention than those proposed in prior works, we find that its performance does not match that of methods employing floating-point valued attention. A comparison of our work and other recently proposed attention mechanisms for SNNs is provided in Table 5. As shown, our biologically inspired approach achieves accuracy comparable to the FSTA-SNN61 method when applied to a two-timestep ResNet20 architecture; however, methods that generate floating-point attention masks generally outperform BIASNN. We attribute this limitation to the binary, all-or-nothing approach our attention mechanism employs. In contrast, floating-point attention allows continuous weighting of input values, enabling more flexible feature selection and improved generalization across training and testing data. By restricting map weights to ones and zeros, our mechanism risks either eliminating too much information, leading to overfitting, or eliminating too little, effectively negating the benefits of attention. Nevertheless, BIASNN demonstrates that spike-based attention can approach the effectiveness of some floating-point methods while maintaining greater biological plausibility. These results highlight the importance of further investigation into brain-inspired attention processes, as they hold promise for bridging the gap between biological realism and state-of-the-art performance.
Table 5.
Comparison of our proposed BIASNN and other state-of-the-art SNNs incorporating floating-point valued attention.
| Model | Timesteps | Accuracy (%) | Biologically inspired |
|---|---|---|---|
| FSTA-SNN (ResNet19)61 | 2 | 96.52 | ✕ |
| FSTA-SNN (ResNet20)61 | 4 | 94.72 | ✕ |
| FSTA-SNN (ResNet20)61 | 2 | 94.18 | ✕ |
| TCJA-SNN (MS-ResNet18)32 | 6 | 95.87 | ✕ |
| TCJA-SNN (MS-ResNet18)32 | 4 | 95.60 | ✕ |
| IM-LIF34 | 6 | 95.66 | ✕ |
| IM-LIF34 | 3 | 95.29 | ✕ |
| GAC-SNN33 | 6 | 96.46 | ✕ |
| GAC-SNN33 | 4 | 96.24 | ✕ |
| BIASNN | 4 | 94.22 | ✓ |
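The contrast between binary and floating-point attention described above can be made concrete with a small NumPy sketch. The feature and score tensors here are random placeholders, and the 0.5 threshold is an illustrative assumption, not the paper's actual gating rule.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random((4, 4)).astype(np.float32)   # placeholder feature map
scores = rng.random((4, 4)).astype(np.float32)     # hypothetical attention scores

# Floating-point attention: continuous reweighting preserves graded information.
float_attended = features * scores

# Spike-based (binary) attention: an all-or-nothing mask keeps or drops each value.
binary_mask = (scores >= 0.5).astype(np.float32)   # illustrative threshold
binary_attended = features * binary_mask

print("values kept by binary mask:", int(binary_mask.sum()), "of", binary_mask.size)
```

Every surviving entry of `binary_attended` is an unchanged copy of the input, while `float_attended` scales each entry smoothly; this is exactly the flexibility that binary masks give up, at the gain of spike-compatible, more biologically plausible gating.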
Conclusion
In conclusion, this study presents BIASNN, a spiking neural network built around a novel attention mechanism for static image classification. The proposed mechanism enhances the performance of SNNs while increasing the biological plausibility of attention mechanisms. It combines a CBAM-like architecture with the spiking output of ALIF neurons to eliminate unnecessary input data. Eliminating this data helps the network learn where to focus and what to focus on, all while adding a new degree of biological plausibility. Experimental results on several standard datasets demonstrate that our mechanism can enhance network performance, and an in-depth analysis using explainable AI tools helps clarify how the mechanism functions. Attention is a powerful tool that has been successfully used in many ANN models, and by incorporating attention, SNNs have the potential to advance both practical AI applications and our understanding of attention in the brain. Future work may explore incorporating additional biological mechanisms, such as ternary spiking outputs, or more biologically plausible neuron models. Additionally, optimizing this mechanism for neuromorphic datasets and examining its impact on broader, more complex neural tasks are potential areas for further study.
Acknowledgements
The authors thank the Artificial Intelligence Center (AIC) and MLIS Laboratory, College of Computing, Khon Kaen University, Thailand, for their support.
Author contributions
K.T.: formal analysis, methodology, and original draft preparation. W.T.: visualization. S.W.: manuscript review and editing.
Data availability
Code is available through the corresponding author. The datasets used and/or analyzed in this study are publicly available from the following websites. FMNIST: https://github.com/zalandoresearch/fashion-mnist. CIFAR10 and CIFAR100: https://www.cs.toronto.edu/~kriz/cifar.html.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Kevin Takala, Email: kevin.t@kkumail.com.
Sartra Wongthanavasu, Email: wongsar@kku.ac.th.
References
- 1.He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (Las Vegas, 2016).
- 2.Dong, L., Xu, S. & Xu, B. Speech-Transformer: A no-recurrence sequence-to-sequence model for speech recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing 465 (Institute of Electrical and Electronics Engineers, Calgary, 2018).
- 3.Kannikaklang, N., Thamviset, W. & Wongthanavasu, S. BiLSTCAN: A novel SRS-based bidirectional long short-term capsule attention network for dynamic user preference and next-item recommendation. IEEE Access 12, 6879–6899 (2024).
- 4.Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (2015).
- 5.Laughlin, S. B. & Sejnowski, T. J. Communication in neuronal networks. Science 301, 1870–1874 (2003).
- 6.Maass, W. Networks of spiking neurons: The third generation of neural network models. Neural Netw. 10, 1659–1671 (1997).
- 7.Rathi, N. & Roy, K. DIET-SNN: A low-latency spiking neural network with direct input encoding and leakage and threshold optimization. IEEE Trans. Neural Netw. Learn. Syst. 34, 3174–3182 (2023).
- 8.Rathi, N. & Roy, K. LITE-SNN: Leveraging inherent dynamics to train energy-efficient spiking neural networks for sequential learning. IEEE Trans. Cogn. Dev. Syst. 16, 1905–1914 (2024).
- 9.Gast, R., Solla, S. A. & Kennedy, A. Neural heterogeneity controls computations in spiking neural networks. Proc. Natl. Acad. Sci. U.S.A. 121, e2311885121 (2024).
- 10.Abbott, L. F. Lapicque’s introduction of the integrate-and-fire model neuron (1907). Brain Res. Bull. 50, 303–304 (1999).
- 11.Brette, R. & Gerstner, W. Adaptive exponential integrate-and-fire model as an effective description of neuronal activity. J. Neurophysiol. 94, 3637–3642 (2005).
- 12.Izhikevich, E. M. Simple model of spiking neurons. IEEE Trans. Neural Netw. 14, 1569–1572. 10.1109/TNN.2003.820440 (2003).
- 13.Hodgkin, A. L. & Huxley, A. F. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952).
- 14.Yao, M. et al. Attention spiking neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 45, 9393–9410 (2023).
- 15.Cai, S., Li, P. & Li, H. A bio-inspired spiking attentional neural network for attentional selection in the listening brain. IEEE Trans. Neural Netw. Learn. Syst. 10.1109/TNNLS.2023.3303308 (2023).
- 16.Cai, W. et al. A spatial-channel-temporal-fused attention for spiking neural networks. IEEE Trans. Neural Netw. Learn. Syst. 10.1109/TNNLS.2023.3278265 (2023).
- 17.Hu, Y., Deng, L., Wu, Y., Yao, M. & Li, G. Advancing spiking neural networks towards deep residual learning. IEEE Trans. Neural Netw. Learn. Syst. 36, 2353–2367 (2025).
- 18.Liu, Q., Cai, M., Chen, K., Ai, Q. & Ma, L. Reconstruction of adaptive leaky integrate-and-fire neuron to enhance the spiking neural networks performance by establishing complex dynamics. IEEE Trans. Neural Netw. Learn. Syst. 36, 2619–2633 (2025).
- 19.Lee, C., Sarwar, S. S., Panda, P., Srinivasan, G. & Roy, K. Enabling spike-based backpropagation for training deep neural network architectures. Front. Neurosci. 14, 497482 (2020).
- 20.Lee, C., Panda, P., Srinivasan, G. & Roy, K. Training deep spiking convolutional neural networks with STDP-based unsupervised pre-training followed by supervised fine-tuning. Front. Neurosci. 12, 435 (2018).
- 21.Tavanaei, A. & Maida, A. BP-STDP: Approximating backpropagation using spike timing dependent plasticity. Neurocomputing 330, 39–47 (2019).
- 22.Melcher, D. Predictive remapping of visual features precedes saccadic eye movements. Nat. Neurosci. 10, 903–907 (2007).
- 23.Rolfs, M., Jonikaitis, D., Deubel, H. & Cavanagh, P. Predictive remapping of attention across eye movements. Nat. Neurosci. 14, 252–256 (2010).
- 24.Hu, J., Shen, L. & Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 7132–7141 (IEEE Computer Society, 2018). 10.1109/CVPR.2018.00745.
- 25.Dai, T., Cai, J., Zhang, Y., Xia, S. T. & Zhang, L. Second-order attention network for single image super-resolution. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 11057–11066 (IEEE Computer Society, 2019).
- 26.Wang, X., Girshick, R., Gupta, A. & He, K. Non-local neural networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 7794–7803 (IEEE Computer Society, 2018). 10.1109/CVPR.2018.00813.
- 27.Zhang, H., Goodfellow, I., Metaxas, D. & Odena, A. Self-attention generative adversarial networks. In Proceedings of the 36th International Conference on Machine Learning 7354–7363 (2019).
- 28.Gao, L., Li, X., Song, J. & Shen, H. T. Hierarchical LSTMs with adaptive attention for visual captioning. IEEE Trans. Pattern Anal. Mach. Intell. 42, 1112–1131 (2020).
- 29.Dosovitskiy, A. et al. An image is worth 16×16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (2021).
- 30.Yao, M. et al. Temporal-wise attention spiking neural networks for event streams classification. In IEEE/CVF International Conference on Computer Vision 10221–10230 (2021).
- 31.Yao, M. et al. Inherent redundancy in spiking neural networks. In Proceedings of the IEEE International Conference on Computer Vision 16878–16888 (Institute of Electrical and Electronics Engineers Inc., 2023). 10.1109/ICCV51070.2023.01552.
- 32.Zhu, R. J. et al. TCJA-SNN: Temporal-channel joint attention for spiking neural networks. IEEE Trans. Neural Netw. Learn. Syst. 10.1109/TNNLS.2024.3377717 (2024).
- 33.Qiu, X. et al. Gated attention coding for training high-performance and efficient spiking neural networks. https://github.com/bollossom/GAC (2024).
- 34.Lian, S., Shen, J., Wang, Z. & Tang, H. IM-LIF: Improved neuronal dynamics with attention mechanism for direct training deep spiking neural network. IEEE Trans. Emerg. Top. Comput. Intell. 8, 2075–2085 (2024).
- 35.Eshraghian, J. K. et al. Training spiking neural networks using lessons from deep learning. Proc. IEEE 111, 1016–1054 (2023).
- 36.Wu, Y., Deng, L., Li, G., Zhu, J. & Shi, L. Spatio-temporal backpropagation for training high-performance spiking neural networks. Front. Neurosci. 12, 331 (2018).
- 37.Woo, S., Park, J., Lee, J.-Y. & Kweon, I. S. CBAM: Convolutional block attention module. In European Conference on Computer Vision (ECCV) (2018).
- 38.Kim, S., Park, S., Na, B. & Yoon, S. Spiking-YOLO: Spiking neural network for energy-efficient object detection. In AAAI 11270–11277 (2020).
- 39.Wu, Y. et al. Direct training for spiking neural networks: Faster, larger, better. In AAAI 1311–1318 (2019).
- 40.Shen, J., Ni, W., Xu, Q. & Tang, H. Efficient spiking neural networks with sparse selective activation for continual learning. In AAAI 611–619 (2024).
- 41.Fang, W. et al. Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In Proceedings of the IEEE International Conference on Computer Vision 2641–2651 (Institute of Electrical and Electronics Engineers Inc., 2021). 10.1109/ICCV48922.2021.00266.
- 42.Bengio, Y., Léonard, N. & Courville, A. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv (2013).
- 43.Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. https://github.com/zalandoresearch/fashion-mnist (2017).
- 44.Krizhevsky, A. & Hinton, G. Learning multiple layers of features from tiny images. https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (2009).
- 45.Turrigiano, G. G. Homeostatic plasticity in neuronal networks: The more things change, the more they stay the same. Trends Neurosci. 22, 221–227 (1999).
- 46.Dan, Y., Sun, C., Li, H. & Meng, L. Adaptive spiking neuron with population coding for a residual spiking neural network. Appl. Intell. 55, 288 (2025).
- 47.Kheradpisheh, S. R., Mirsadeghi, M. & Masquelier, T. Spiking neural networks trained via proxy. IEEE Access 10, 70769–70778 (2022).
- 48.Yan, Z., Zhou, J. & Wong, W.-F. Near lossless transfer learning for spiking neural networks. In AAAI 10577–10584 (2021).
- 49.Wang, Z. et al. Toward high-accuracy and low-latency spiking neural networks with two-stage optimization. IEEE Trans. Neural Netw. Learn. Syst. 36, 3189–3203 (2025).
- 50.Sun, C. et al. An energy efficient residual spiking neural network accelerator with ternary spikes. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 44, 395–400 (2025).
- 51.Mirsadeghi, M., Shalchian, M., Kheradpisheh, S. R. & Masquelier, T. Spike time displacement-based error backpropagation in convolutional spiking neural networks. Neural Comput. Appl. 35, 15891–15906 (2023).
- 52.Chen, T., Wang, L., Li, J., Duan, S. & Huang, T. Improving spiking neural network with frequency adaptation for image classification. IEEE Trans. Cogn. Dev. Syst. 16, 864–876 (2024).
- 53.Liu, F., Xu, J., Yang, J. & Wu, W. A novel multi-type image coding method acting on supervised hierarchical deep spiking convolutional neural networks for image classification. Cognit. Comput. 17, 9 (2025).
- 54.Huang, Z. et al. Modeling of spiking neural network with optimal hidden layer via spatiotemporal orthogonal encoding for patterns recognition. IEEE Trans. Emerg. Top. Comput. Intell. 9, 2194–2207 (2025).
- 55.Liu, F. et al. SpikeConverter: An efficient conversion framework zipping the gap between artificial neural networks and spiking neural networks. In AAAI 1692–1701 (2022).
- 56.Yan, Z., Tang, K., Zhou, J. & Wong, W. F. Low latency conversion of artificial neural network models to rate-encoded spiking neural networks. IEEE Trans. Neural Netw. Learn. Syst. 36, 14107–14118 (2025).
- 57.Li, Y. et al. Efficient structure slimming for spiking neural networks. IEEE Trans. Artif. Intell. 5, 3823–3831 (2024).
- 58.Liu, Q., Yan, J., Zhang, M. & Li, H. LitE-SNN: Designing lightweight and efficient spiking neural network through spatial-temporal compressive network search and joint optimization. In Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24) 3097–3105 (2024).
- 59.Selvaraju, R. R. et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision 618–626 (Institute of Electrical and Electronics Engineers Inc., 2017).
- 60.Horowitz, M. 1.1 Computing’s energy problem (and what we can do about it). In Digest of Technical Papers—IEEE International Solid-State Circuits Conference 10–14 (2014).
- 61.Yu, K., Zhang, T., Wang, H. & Xu, Q. FSTA-SNN: Frequency-based spatial-temporal attention module for spiking neural networks. In AAAI (2025).