Abstract
Spiking Neural Networks (SNNs), designed to more accurately model the brain’s neurobiological processes, have been proposed as energy-efficient alternatives to conventional Artificial Neural Networks (ANNs), which typically incur high computational and energy costs. However, the enhanced energy efficiency and computational savings afforded by SNNs are often achieved at the expense of reduced classification performance. Recent studies have investigated the incorporation of attention mechanisms into SNNs to enhance their classification performance, but these approaches typically repurpose attention mechanisms originally developed for conventional ANNs, which fail to fully leverage the spike-based encoding characteristics intrinsic to spiking neuron dynamics. To address this challenge, we propose the Biologically Inspired Attention Spiking Neural Network (BIASNN), a novel SNN architecture designed for image classification. BIASNN introduces a biologically inspired attention mechanism that integrates adaptive leaky integrate-and-fire neurons with components from established attention models. Our attention mechanism is placed into an existing SNN architecture that uses leaky integrate-and-fire neurons, enhancing biological fidelity by combining multiple spiking neuron models in a single network. Experiments on benchmark image classification datasets demonstrate that BIASNN achieves high classification accuracy using only four timesteps. By enabling the development of more biologically plausible attention mechanisms, BIASNN advances the capabilities of deep spiking neural networks toward more brain-like processing.
Keywords: Attention mechanisms, Biological neural networks, Image classification, Spiking neural networks
Subject terms: Computational biology and bioinformatics, Engineering, Mathematics and computing, Neuroscience
Introduction
Artificial Neural Networks (ANNs) have been at the forefront of advances in machine learning and artificial intelligence, leading to breakthroughs in tasks such as image classification1, speech recognition2, and recommendation systems3. ANNs were originally inspired by the structure and function of biological neural networks, specifically neurons in the human brain4. However, despite this biological inspiration, traditional ANNs differ significantly from the brain’s highly efficient mechanisms. The human brain achieves high-throughput information processing with remarkably low power consumption5, processing extremely large amounts of information in a massively parallel and event-driven manner. In contrast, ANNs typically require substantial computational and energy resources, due to their reliance on dense, continuous, floating-point matrix operations.
To address these limitations, and more closely mimic the efficiency of biological brains, spiking neural networks (SNNs) have gained significant attention as a next-generation neural network model. Unlike ANNs, which rely on continuous-valued activations, SNNs communicate via binary spikes (0s and 1s)6, reflecting the event-driven nature of information processing in biological neural systems. This spike-based communication enables SNNs to operate in a sparse, asynchronous fashion, significantly reducing the number of multiplication operations a network is required to perform. Consequently, SNNs offer substantial computational and energy savings7,8, making them particularly well-suited for real-time, low-power applications, such as edge computing and neuromorphic systems.
Just as the biological brain comprises different types of neurons with distinct properties9, various spiking neuron models have been developed for use in SNNs, each balancing biological plausibility against computational efficiency. The Leaky Integrate-and-Fire (LIF) neuron10, for example, represents a foundational approach, capturing essential neuronal dynamics such as membrane potential decay and threshold-based spike generation with minimal computational overhead. The adaptive leaky integrate-and-fire (ALIF) neuron11 extends the LIF framework by incorporating mechanisms for adaptive threshold modulation or membrane potential adjustment, thereby increasing biological realism while maintaining relative simplicity. In contrast, more biophysically detailed models, such as the Izhikevich model12 and Hodgkin–Huxley model13, offer higher fidelity by reproducing complex neuronal behaviors, including bursting, resonance, and various firing patterns, at the cost of significantly increased computational demands.
The selection of neuron models within an SNN plays a critical role in determining both the network’s computational efficiency and its functional accuracy. While the Izhikevich and Hodgkin–Huxley models are well-suited for applications that are focused on replicating specific aspects of biological networks, their high computational demands render them impractical for use in deep networks. Conversely, the LIF and ALIF models are commonly employed in SNNs designed to replicate tasks performed by ANNs, due to their lower computational requirements. Previous works14–17 chose to adopt the LIF neuron model due to its simplicity, and demonstrated its effectiveness in achieving state-of-the-art results. However, the ALIF model has been shown to provide improved firing rate stability, with only a minor increase in energy consumption, while producing competitive accuracies18. Despite the proven success of both the LIF and ALIF neuron types independently, to the authors’ knowledge, no existing work has investigated the integration of both neurons within a unified network architecture, highlighting a notable gap in the current literature and a promising area for research.
Despite the computational advantages SNNs provide, they face challenges in achieving the high accuracy observed in traditional ANNs. This remains true even for fundamental tasks like image classification, except for networks evaluated on smaller-scale datasets19–21. To help alleviate this issue, researchers have once again turned to the human brain for inspiration. The biological brain, particularly the visual cortex, dynamically allocates resources to the most important parts of the visual field based on the task being performed or some external stimuli. This enables the brain to filter out unnecessary information, allowing humans to process complex environments efficiently and make decisions quickly22,23. Inspired by this phenomenon, attention mechanisms have been successfully integrated into ANNs24–28, leading to the development of highly successful models like transformers and vision transformers (ViTs)29, which excel at prioritizing key features for improved task performance. The idea of incorporating attention into SNNs has been the subject of several recent research articles, and it has been shown to be an effective tool for optimizing spike generation and processing, leading to better performance and energy efficiency14,30–34. While attention has been successfully integrated into SNNs, the mechanisms used in most studies were created for ANNs, leaving an open avenue of research: exploring mechanisms that are more biologically plausible.
This paper explores the combination of a biologically inspired attention mechanism and SNNs for the task of image classification. Specifically, we propose a new 3D spatial-channel attention mechanism for SNNs. The attention mechanism makes use of the spiking output of ALIF neurons to create a binary attention mask, which is applied to the input features to eliminate noisy or non-vital information. Our mechanism is inserted into an existing SNN using LIF neurons, creating a new network capable of using multiple spiking neuron types. The proposed attention mechanism is further analyzed using explainable AI tools to enhance the interpretability of its effects on the decision-making process. Our Biologically Inspired Attention SNN (BIASNN) model is evaluated on three static image datasets (FMNIST, CIFAR-10, and CIFAR-100), with resulting accuracies of 95.66%, 94.22%, and 75.40%, respectively. The main contributions of our work can be summarized as follows.
We create a 3D, spike-based attention mechanism that uses ALIF neurons for controlling attention within the spatial and channel dimensions of images.
We propose a new method for making use of multiple types of spiking neurons in an SNN.
We make use of a Grad-CAM-like method to further analyze how our proposed mechanism affects the classification of the input images.
Experimental results show that our new method obtains comparable results when measured against existing SNN models.
Methods
The goal of this work is to create a new, more biologically plausible form of attention, and integrate the proposed mechanism into an existing SNN architecture that currently uses LIF neurons. This combination is used to form the proposed BIASNN model. The subsections below discuss the details of the backbone architecture, the inner workings of our new attention mechanism, and the spiking ALIF block used for generating the final attention map.
Backbone architecture
The ResNet model1 is a widely adopted deep neural network architecture for image processing, originally developed to mitigate the degradation problem in deep networks through the use of identity-based residual connections. Building upon this concept, the MS-ResNet architecture17 was created for SNNs. In this framework, data encoding is performed via an initial convolutional layer that transforms static image inputs into a format suitable for spike-based processing. The encoded signals are then propagated through a series of residual blocks, each comprising two spiking neuron layers followed by a convolutional layer. This structure permits the exchange of floating-point feature maps between blocks, enabling improved representational capacity and learning stability. Due to its success in image classification, we adopt the MS-ResNet18 architecture as the backbone for our BIASNN network. Following the MS-ResNet18 design paradigm, our model consists of eight residual blocks, with the proposed attention mechanism inserted after every other block, except for the last.
The entire architecture of the proposed network is illustrated in Fig. 1. As depicted in Fig. 1a, the BIASNN model begins with an initial two-dimensional convolutional layer, configured with a kernel size of seven, a stride of one, and padding of three, which serves to encode the input data into a format suitable for downstream spike processing. This is followed by a sequence of residual blocks, the internal structure of which is detailed in Fig. 1b. Each residual block begins with a Leaky Integrate-and-Fire (LIF) neuron layer, which integrates synaptic input, in the form of weighted floating-point values, into the individual neurons’ membrane potentials. When a membrane potential crosses a threshold, an output spike is generated. The resulting spike trains are propagated through a convolutional layer, followed by batch normalization to stabilize learning. The normalized output is subsequently fed into a second LIF layer, whose spiking activity is again processed through a convolutional layer and a second batch normalization step. A residual connection is added to the output of the final batch normalization operation, enabling gradient flow and promoting stable training. Architectural variations for the first convolutional layer in each residual block group are detailed in Table 1. All other convolutional layers throughout the network utilize a kernel size of three, a stride of one, and padding of one.
Fig. 1.
Overview of the proposed BIASNN network. The BIASNN model (a) consists of MS-ResNet18 residual blocks (b) and our proposed attention mechanism (c). The attention mechanism combines the CBAM and Squeeze-and-Excite methods with an ALIF block (d) to create a 3D attention mechanism. The ALIF block contains four layers: Channel Normalization, Data Inversion, ALIF neurons, and Spike Inversion. The combination of these layers is designed so that the attention mechanism will learn to eliminate the least important values from the input data.
Table 1.
Information about the first convolutional layer in each group of blocks.
| Group number | Convolution information | Channel output sizes |
|---|---|---|
| 1 | k = 3, s = 1, p = 1 | 64 |
| 2 | k = 3, s = 2, p = 1 | 128 |
| 3 | k = 3, s = 2, p = 1 | 256 |
| 4 | k = 3, s = 1, p = 1 | 512 |
For our LIF layers, we utilize the following equations:

$$V_n[t] = \lambda V_n[t-1] + \sum_i w_{in} S_i[t] \tag{1}$$

$$S_n[t] = H\!\left(V_n[t] - V_{th}\right) \tag{2}$$

$$V_n[t] = V_n[t]\left(1 - S_n[t]\right) \tag{3}$$

where $V_n[t]$ is the membrane potential of the nth postsynaptic neuron at time step t, and a time step is a single iteration through the network. The variable $\lambda$ is the decay constant of the neuron, $w_{in}$ is the weight between the ith presynaptic neuron and the nth postsynaptic neuron, and $S_i[t]$ is the output spike value from the ith presynaptic neuron. $S_n[t]$ is the output spike value of the nth postsynaptic neuron, $H(\cdot)$ represents the Heaviside function, and $V_{th}$ is the voltage threshold of the postsynaptic neuron. In general, Eq. (1) represents the voltage update process of a neuron, Eq. (2) is used to determine whether a spike is generated, and Eq. (3) resets the voltage of the neuron if a spike occurs. A diagram depicting the process an LIF neuron in our model undergoes can be seen in Fig. 2a.
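As a concrete illustration, the update-fire-reset cycle of Eqs. (1)–(3) can be sketched as follows; the decay constant, threshold, and input values here are illustrative assumptions rather than the trained settings of our network.

```python
import numpy as np

def lif_step(v, weighted_input, decay=0.5, v_th=1.0):
    """One LIF time step: leak and integrate (Eq. 1), fire (Eq. 2), reset (Eq. 3)."""
    v = decay * v + weighted_input           # Eq. (1): decayed potential plus weighted input
    spikes = (v >= v_th).astype(v.dtype)     # Eq. (2): Heaviside function at the threshold
    v = v * (1.0 - spikes)                   # Eq. (3): hard reset to zero on spike
    return spikes, v

# Two neurons over one time step; only the first crosses the threshold.
spikes, v = lif_step(np.zeros(2), np.array([1.2, 0.3]))
```

Here the first neuron crosses the threshold, emits a spike, and is reset, while the second simply carries its sub-threshold potential forward to the next time step.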
Fig. 2.
Overview of the processes carried out by an LIF neuron and ALIF neuron. The LIF neuron, depicted in (a), takes the input from the previous layer and integrates it into the decayed (leaked) membrane potential. If the membrane potential reaches a specified threshold, the neuron will fire a spike and reset its membrane potential. Otherwise, it will remain silent, and the membrane potential will be transferred to the next time step as is. The ALIF neuron, depicted in (b), operates in a similar fashion to the LIF, but with one key difference. The ALIF neuron will adjust its firing threshold based on whether a spike occurred in the current time step. If the neuron generates a spike, its threshold will increase, and if a spike wasn’t generated, the threshold will decrease.
Our model makes use of the surrogate gradient method for training35. The surrogate gradient method enables direct training of our SNN and lets us build on existing deep learning libraries. To overcome the issue of nondifferentiable spiking outputs from the LIF layers, we use the following equation17,36 during the backward pass:

$$\frac{\partial S_n[t]}{\partial V_n[t]} = \frac{1}{a}\,\mathrm{sign}\!\left(\left|V_n[t] - V_{th}\right| < \frac{a}{2}\right) \tag{4}$$

where the variable $a$ is a constant used to keep the integral of the function set to 1.
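A minimal sketch of such a rectangular surrogate, assuming the common formulation in which a window of width a and height 1/a is centred on the threshold (so the window integrates to 1):

```python
import numpy as np

def rect_surrogate_grad(v, v_th=1.0, a=1.0):
    """Backward-pass approximation of dS/dV: a rectangle of width a and
    height 1/a centred on the firing threshold, integrating to 1."""
    return (np.abs(v - v_th) < a / 2) / a

# Gradient is non-zero only for membrane potentials near the threshold.
grads = rect_surrogate_grad(np.array([0.2, 0.9, 1.0, 1.8]))
```

Membrane potentials far from the threshold receive zero gradient, which is what keeps backpropagation through spikes tractable.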
Attention mechanism
A detailed overview of the architecture of our attention mechanism can be seen in Fig. 3a, and its distinct steps are summarized in Algorithm 1. Our attention mechanism draws its inspiration from the CBAM37 and SE24 attention architectures and begins with the operations listed below.

$$U[t] = \mathrm{Concat}\!\left(\mathrm{AP}(X[t]),\ \mathrm{MP}(X[t])\right) \tag{5}$$

$$Y[t] = \mathrm{DS}_2\!\left(\mathrm{DS}_1(U[t])\right) \tag{6}$$
Fig. 3.
Overview of the proposed attention mechanism and ALIF block. Shown in (a) is the process data undergoes in the attention mechanism. Here, global average and max pooling are used to squeeze the data in the channel dimension. The DS convolutions are then used to gradually increase (excite) the number of channels back to that of the original input. Shown in (b) is a detailed look at the ALIF block. The colored squares in the cubes show the data being transformed at each step. In both figures, the final attention map consists of only black squares (0’s) and white squares (1’s).
In Eq. (5), $X[t]$ denotes the input data at time step t, while $U[t]$ represents the concatenated feature maps obtained from average pooling (AP) and max pooling (MP) operations. In Eq. (6), $U[t]$ is processed by two depth-wise separable (DS) convolutional layers. The DS convolutions utilize kernel sizes of 5 and 7, strides of 2 and 3, and have output channel sizes of C/r and C, respectively, to produce the output $Y[t]$. These convolutional layers are designed to progressively excite the channel dimensionality to match that of the original input, with the rate of channel expansion governed by the hyperparameter r.
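The channel-squeeze step of Eq. (5) can be sketched as below; the (C, H, W) tensor layout and the function name are our assumptions for illustration, and the subsequent DS convolutions are omitted.

```python
import numpy as np

def channel_squeeze(x):
    """Eq. (5): pool the input across the channel axis and stack the
    average- and max-pooled maps into a 2-channel spatial descriptor."""
    ap = x.mean(axis=0, keepdims=True)   # global average pooling over channels
    mp = x.max(axis=0, keepdims=True)    # global max pooling over channels
    return np.concatenate([ap, mp], axis=0)

x = np.arange(24, dtype=float).reshape(3, 2, 4)  # (C=3, H=2, W=4)
u = channel_squeeze(x)                            # shape (2, H, W)
```

The two pooled maps preserve the spatial layout while summarizing channel activity, which the DS convolutions then expand back to C channels.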
ALIF block
Once the data has been convolved, it is passed into the ALIF block. As can be seen in Fig. 3b, the ALIF block consists of four different steps. The first two steps are channel normalization18,38,39 and inversion, and the equations for these processes are:
$$\hat{Y}[t] = \frac{Y[t] - Y_{\min}}{Y_{\max} - Y_{\min} + \epsilon} \tag{7}$$

$$\tilde{Y}[t] = 1 - \hat{Y}[t] \tag{8}$$

where $\hat{Y}[t]$ denotes the normalized output values, $Y_{\min}$ and $Y_{\max}$ represent the per-channel minimum and maximum values of $Y[t]$, and $\epsilon$ is a small, constant value added to the denominator to prevent division by zero in the rare case that the minimum and maximum values are equal. After normalization to the range [0, 1], the data is inverted using Eq. (8), resulting in the output $\tilde{Y}[t]$.
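A short sketch of this per-channel normalization and inversion; the tensor layout and epsilon value are illustrative assumptions.

```python
import numpy as np

def normalize_and_invert(y, eps=1e-8):
    """Eq. (7): per-channel min-max normalization into [0, 1];
    Eq. (8): inversion, so the smallest inputs become the largest outputs."""
    y_min = y.min(axis=(1, 2), keepdims=True)   # per-channel minimum
    y_max = y.max(axis=(1, 2), keepdims=True)   # per-channel maximum
    return 1.0 - (y - y_min) / (y_max - y_min + eps)

y = np.array([[[0.0, 2.0], [4.0, 8.0]]])  # a single 2x2 channel
inv = normalize_and_invert(y)
```

After inversion, the least important (smallest) values sit closest to 1, which is what drives the downstream ALIF neurons to spike for them.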
Once the data has been inverted, it is sent to a layer of ALIF neurons with adaptive thresholds18,40 to generate spikes. Our ALIF neurons make use of the following equations:
$$M_n[t] = \lambda M_n[t-1] + \tilde{Y}_n[t] \tag{9}$$

$$A_n[t] = H\!\left(M_n[t] - \theta_n[t]\right) \tag{10}$$

$$M_n[t] = M_n[t]\left(1 - A_n[t]\right) \tag{11}$$

$$u_n[t] = \frac{1}{B}\sum_{b=1}^{B}\left(2A_{b,n}[t] - 1\right) \tag{12}$$

$$\theta_n[t+1] = \theta_n[t] + \frac{\beta\, u_n[t]}{\tau_\theta} \tag{13}$$

where $M_n[t]$ is the membrane potential of the nth neuron at time step t, $\lambda$ represents the membrane potential decay constant of the neuron, $A_n[t]$ is the output spike value of the neuron, $H(\cdot)$ is the Heaviside function, and $\theta_n[t]$ is the membrane potential voltage threshold of the neuron. The variable $u_n[t]$ is the update value for the threshold at the next time step, averaged over the batch dimension B. $\beta$ is the scale factor for the update value, and $\tau_\theta$ is the update time constant. Overall, Eqs. (9)–(11) are used to update the membrane potential of the neuron, generate a spike when necessary, and reset the membrane potential of the neuron if a spike occurs. Equations (12) and (13) are used to update the threshold value of the ALIF neuron for the next time step. A diagram depicting the process an ALIF neuron in our attention mechanism undergoes can be seen in Fig. 2b.
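The full ALIF update can be sketched as below; the batch dimension is omitted for clarity, and the constants and the exact form of the threshold rule are illustrative assumptions rather than our trained hyperparameters. The key behaviour, matching Fig. 2b, is that a neuron's threshold rises after it spikes and falls when it stays silent.

```python
import numpy as np

def alif_step(m, x, theta, decay=0.5, beta=0.1, tau=2.0):
    """One ALIF time step: membrane update, fire, and reset, plus an
    adaptive threshold that rises after a spike and relaxes otherwise."""
    m = decay * m + x                        # leak and integrate
    spikes = (m >= theta).astype(m.dtype)    # fire against the adaptive threshold
    m = m * (1.0 - spikes)                   # reset on spike
    update = 2.0 * spikes - 1.0              # +1 if the neuron fired, -1 if silent
    theta = theta + beta * update / tau      # scaled, time-constant-damped adaptation
    return spikes, m, theta

spikes, m, theta = alif_step(np.zeros(2), np.array([1.5, 0.2]), np.ones(2))
```

After one step, the firing neuron's threshold has moved above its initial value and the silent neuron's threshold has moved below it, stabilizing the layer's firing rate over time.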
To overcome the non-differentiability of spike-based outputs, our ALIF layer makes use of the arctan surrogate method35,41 in the backpropagation step, which can be derived as seen below.
$$\frac{\partial A_n[t]}{\partial M_n[t]} = \frac{\alpha}{2\left(1 + \left(\frac{\pi}{2}\alpha\left(M_n[t] - \theta_n[t]\right)\right)^2\right)} \tag{14}$$
Algorithm 1.
Pseudo-code of the proposed attention mechanism
In Eq. (14), $\alpha$ is a constant used to scale the output gradient. Once the spikes for the ALIF layer’s neurons have been calculated, they are inverted using Eq. (15) below:
$$\tilde{A}[t] = 1 - A[t] \tag{15}$$
where $\tilde{A}[t]$ denotes the final output spike values for all ALIF neurons. The ALIF spikes are computed in this way to keep the output of the ALIF neurons sparse, as is typically desired from spiking neurons. Finally, using Eq. (16), the original attention mechanism input is multiplied by the inverted spikes to get the final output of the attention mechanism.
$$O[t] = X[t] \odot \mathrm{STE}\!\left(\tilde{A}[t]\right) \tag{16}$$
In Eq. (16) above, STE stands for the straight-through estimator42. The straight-through estimator is used to allow gradient calculations for all values of $X[t]$, even those that were removed by the attention mask.
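The straight-through behaviour can be sketched with an explicit forward/backward pair; this is a framework-free illustration of the idea, not our training implementation, which relies on automatic differentiation.

```python
import numpy as np

def ste_mask_forward(x, mask):
    """Forward pass: the binary attention mask zeroes out un-attended values."""
    return x * mask

def ste_mask_backward(grad_out, mask):
    """Straight-through estimator: treat the mask as identity in the
    backward pass, so gradients flow to every input value, even
    those that were masked out (instead of grad_out * mask)."""
    return grad_out

x = np.array([0.5, -1.0, 2.0])
mask = np.array([1.0, 0.0, 1.0])
y = ste_mask_forward(x, mask)            # masked forward output
g = ste_mask_backward(np.ones(3), mask)  # gradient reaches masked values too
```

Without the STE, masked-out inputs would receive zero gradient and the network could never learn to re-admit features it once suppressed.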
Experimental setup
We evaluate our newly proposed BIASNN on three standard datasets: FashionMNIST43, CIFAR1044, and CIFAR10044. For all experiments, the adaptation parameter $\theta$ is reset at the end of each epoch. We incorporate this strategy to simulate the long-term, homeostatic effects of threshold variations typically seen in biological neurons45. Table 2 lists the common hyperparameter settings for the three datasets. Information pertaining to each dataset can be seen in Table 3. For training, each dataset is augmented with random horizontal flipping and random cropping with a padding of four. To ensure a fair comparison, we processed all three datasets through the MS-ResNet18 network using the same convolution settings employed by our BIASNN (see Table 1 for details).
Table 2.
Hyperparameter settings for all datasets.
Table 3.
Information about FMNIST, CIFAR10, and CIFAR100 datasets.
| Dataset | Training images | Validation images | Number of classes | Number of channels | Image size |
|---|---|---|---|---|---|
| FMNIST | 60,000 | 10,000 | 10 | 1 | 28 × 28 |
| CIFAR10 | 50,000 | 10,000 | 10 | 3 | 32 × 32 |
| CIFAR100 | 50,000 | 10,000 | 100 | 3 | 32 × 32 |
The results of our MS-ResNet18 experiments are indicated with a superscript “a” in Table 4. All reported results correspond to the highest validation accuracies achieved for each dataset within 150 training epochs. All experiments are carried out on an NVIDIA RTX 4090 graphics card using the PyTorch 2.1 library.
Table 4.
Experimental results on the FashionMNIST, CIFAR10, and CIFAR100 datasets.
| Dataset | Model | Training method | Time steps (T) | Top-1 Acc |
|---|---|---|---|---|
| FMNIST | R-CSNN51 | STiDi-BP | 100 | 92.80 |
| | Yin et al.47 | Proxy Learning | 50 | 94.56 |
| | SFA-SNN52 | SG | 20 | 92.40 |
| | Liu et al.53 | SG | 24 | 90.31 |
| | Dan et al.46 | SG | 6 | 95.44 |
| | SONN54 | SG | 300/1000 | 90.0 |
| | MS-ResNet17 (ResNet18)a | SG | 4 | 95.47 |
| | BIASNN | SG | 4 | 95.66 |
| CIFAR10 | CQ Training48 (VGG*) | ANN-SNN | 600 | 94.16 |
| | Wang et al.49 (ResNet-18) | ANN-SNN | 4 | 93.27 |
| | Wang et al.49 (VGG-16) | ANN-SNN | 4 | 94.06 |
| | SpikeConverter55 (VGG-16) | ANN-SNN | 16 | 93.71 |
| | Yan et al.56 (VGG-*) | ANN-SNN | 8 | 93.71 |
| | Li et al.57 (ResNet18) | SG | 4 | 92.92 |
| | Dan et al.46 | SG | 6 | 94.04 |
| | Sun et al.50 (SResNet38) | SG | 20 | 93.54 |
| | MA-SNN14 (VGG-11) | SG | 6 | 91.91 |
| | MS-ResNet17 (ResNet18)a | SG | 4 | 93.84 |
| | BIASNN | SG | 4 | 94.22 |
| CIFAR100 | Yan et al.56 (VGG16) | ANN-SNN | 16 | 66.11 |
| | SpikeConverter55 (VGG-16) | ANN-SNN | 16 | 71.22 |
| | CQ Training48 (VGG*) | ANN-SNN | 300 | 71.84 |
| | Wang et al.49 (VGG-16) | ANN-SNN | 4 | 70.08 |
| | Li et al.57 (VGG16) | SG | 4 | 69.40 |
| | Sun et al.50 (SResNet38) | SG | 20 | 71.77 |
| | MA-SNN14 (VGG-11) | SG | 1 | 60.49 |
| | Liu et al.18 | SG | 8 | 67.83 |
| | LitE-SNN58 | SG | 4 | 69.55 |
| | MS-ResNet17 (ResNet18)a | SG | 4 | 74.98 |
| | BIASNN | SG | 4 | 75.40 |
In the Training Method column, rows set to SG indicate studies that use surrogate gradients for training, rows set to ANN-SNN indicate studies that use ANN-SNN conversion for training, and all others are custom learning methods.
a Indicates results from experiments performed by the authors.
Results
The results of our experiments on the FMNIST dataset are shown in Table 4. As can be seen in the table, our BIASNN model achieves an accuracy of 95.66% when the number of time steps (T) is set to 4. The proposed solution achieves an increase in accuracy over the backbone MS-ResNet18 of 0.19%. Excluding the backbone network, the method with the next closest accuracy is the method proposed by Dan et al.46. Their method achieved an accuracy 0.22% less than ours, while requiring two extra time steps. The Proxy Learning method47 shows the next closest accuracy to our BIASNN; however, this method required 50 time steps and produced an accuracy 1.1% lower than BIASNN. Table 4 also lists the results of experiments on the CIFAR10 dataset. As can be seen in the table, the BIASNN network achieves an accuracy of 94.22%, a 0.38% increase in accuracy over the backbone MS-ResNet18. Comparing our results with other methods on the CIFAR10 dataset, we see that the CQ Training48 and Wang et al.49 methods achieve accuracies only slightly lower than those of the proposed BIASNN. The CQ Training method achieves an accuracy 0.06% lower than BIASNN. However, their method makes use of ANN-SNN conversion and requires 600 time steps to achieve a similar accuracy to our four-time-step method. The method proposed by Wang et al. made use of ANN-SNN conversion as well, and was able to achieve results slightly lower than our BIASNN method (by 0.16%) in four time steps with a VGG-16-based network. However, when applied to the typically more powerful ResNet-18 architecture, the accuracy of their method decreases to 93.27%, which is 0.95% less than the accuracy achieved by our proposed BIASNN. Besides the decrease in accuracy, the CQ Training and Wang et al. methods still do not allow for the direct training of the SNN, requiring an extra step with the conversion process.
Table 4 lists the results of experiments on the CIFAR100 dataset. When employing four time steps, our proposed BIASNN model is able to achieve an accuracy of 75.40%, an increase in accuracy of 0.42% over the backbone MS-ResNet18. In the case of the CIFAR100 dataset, the CQ Training method is the next closest competitor to the proposed BIASNN. Their method produces an accuracy of 71.84% with 300 time steps, which is a relatively large drop in performance (3.56%) compared to our BIASNN’s results. The method proposed in Sun et al.50 demonstrates the next highest accuracy compared to the proposed BIASNN. It achieves an accuracy of 71.77%, again showing a large drop in performance (3.63%) compared to our BIASNN.
Discussion
To better analyze the effects the proposed attention mechanism has on the overall classification accuracy of the network, we make use of two different methods to visualize its impact. First, we compare the individual class accuracies with those of the backbone architecture, and second, we make use of a Grad-CAM-like method for generating heat maps of the spiking outputs of the attention layer.
Shown in Fig. 4a, b are the confusion matrices for the original MS-ResNet18 and our proposed BIASNN on the CIFAR10 dataset. We see that the proposed attention mechanism helps increase the accuracy of several classes, most notably the airplane (+2%), cat (+3%), and frog (+1%) classes. However, it decreases the accuracy of some classes, such as dogs (−2%) and horses (−2%). This may be a sign of the network eliminating the wrong information, or too much of it, making some classes more difficult to distinguish.
Fig. 4.
Confusion matrix results for the original MS-ResNet18 (a), and our proposed BIASNN (b) on the CIFAR10 dataset.
The Grad-CAM method59 is used for visualizing the effects individual layers have on the final output of a network. It makes use of activation values and their gradients to generate a heatmap, allowing researchers to better understand how different layers are affecting a model. Here, we use this same concept to study the effect the spiking layer in our attention mechanism has on our proposed network. However, unlike the original Grad-CAM, we do not use the gradients of the spiking layers for two reasons. First, the gradients of spiking outputs are generated using a surrogate gradient function, which is only an approximation of a spike’s gradient and can introduce errors into the final heatmap. Second, the surrogate functions are typically designed so that lower input values will have higher gradient values. This can result in misleading heatmaps, as the areas that caused spikes to occur, i.e., areas with the most information, will have the smallest gradients. In our specific case, since the final spike outputs were inverted, the highest gradients should correspond to the spike values for the attention mask; however, we prefer to use a method that can be more easily applied to any spiking layer in the network.
Instead, we make use of the total number of spikes across all time steps to create the final heatmap. To generate the final heatmap for each image, the number of spikes is summed across all time steps, and then normalized to the range [0, 1] using Eqs. (17) and (18) below.

$$C = \sum_{t=1}^{T} \tilde{A}[t] \tag{17}$$

$$\hat{C} = \frac{C - C_{\min}}{C_{\max} - C_{\min}} \tag{18}$$
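A sketch of this heatmap construction, assuming the spike tensor is stored as (T, H, W):

```python
import numpy as np

def spike_heatmap(spikes):
    """Sum spikes over all time steps, then min-max normalize the
    counts into [0, 1] for visualization."""
    counts = spikes.sum(axis=0)                 # total spikes per spatial location
    c_min, c_max = counts.min(), counts.max()
    return (counts - c_min) / (c_max - c_min)   # normalized heatmap

spikes = np.array([[[1.0, 0.0], [0.0, 1.0]],
                   [[1.0, 0.0], [1.0, 1.0]]])   # T=2 time steps, 2x2 map
h = spike_heatmap(spikes)
```

Locations that spiked at every time step map to 1, and locations that never spiked map to 0, making the attended regions directly visible.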
In Fig. 5, class-specific heatmaps were generated using the inverted spike outputs, computed via Eq. (15), from the final attention mechanism in the network. As can be seen in the figure, the number of spikes generated in areas of importance tends to be higher, while areas of less importance produce fewer spikes. This is an indication that our attention mechanism does help the network learn where to focus, and is demonstrated particularly well in the airplane, deer, and bird heatmaps.
Fig. 5.
Heatmap images, and the corresponding input CIFAR10 image, for each of the ten classes in the CIFAR10 dataset. The heatmap images show the normalized spike counts generated from the inverted spikes, calculated using Eq. (15), in the last attention mechanism in our proposed BIASNN.
Our proposed attention mechanism makes use of ALIF neurons due to their more stable firing rate. However, the LIF neuron is generally a more popular choice when working with SNNs. To validate our use of the ALIF layer of neurons in our attention mechanism, we examine the results of the BIASNN model when an LIF layer is used as the spiking layer in our attention mechanism. We term this method Leaky Attention SNN (LASNN), and compare its results to those of the proposed BIASNN model, where an ALIF layer of neurons is used. From our experiments, we find that the BIASNN model achieves an accuracy of 94.22%, whereas the LASNN method produces an accuracy of 93.68%, a considerable drop in performance. In an effort to understand why the BIASNN method outperforms the LASNN method, we look at two different pieces of information generated by the network. The first is the number of neuron values that are eliminated at each time step by each attention mechanism. The number of eliminated values plays a crucial role in the amount of information that is allowed to pass through the rest of the network and will ultimately affect the final accuracy. Shown in Fig. 6a, the number of values that are eliminated using the LASNN method varies greatly for each time step in all attention mechanisms. Also, there is a considerable difference in the number of eliminated values between attention mechanisms. In contrast, the BIASNN method gradually increases the number of eliminated values between time steps for each mechanism, and the number eliminated by each attention mechanism is relatively consistent.
Fig. 6.
Graphs displaying BIASNN and LASNN comparisons. Depicted in (a), the percentage of neuron activations suppressed by the LIF-based attention mechanism (LASNN) and the proposed ALIF-based mechanism (BIASNN). Shown in (b), the percentage of spiking neurons in the LIF layers that follow the attention mechanisms.
Second, we look at the firing patterns of the LIF layers that directly follow the attention mechanisms. The firing patterns seen in these layers will affect the rest of the layers in the block, and ultimately the accuracy of the network, so stability becomes an important factor. Plotted in Fig. 6b, the number of spiking neurons in the LIF layers varies considerably for the LASNN method, which is to be expected given the greater variability in the number of eliminated neuron values in the attention mechanisms. In contrast, the BIASNN method produces LIF neurons that are gradually less activated over time, and the variation between the LIF layers is more consistent. This follows well with the gradual increase in the number of eliminated neuron values for each attention mechanism in the BIASNN method. Based on these two pieces of information, it appears that the instability of LASNN is what leads to its lower classification accuracy.
We also examine the effect that using the LASNN method has on the energy requirements of our proposed network. For our energy calculations, we follow60, where the network is assumed to be running on a 45 nm CMOS chip, and addition and multiplication operations require 0.9 pJ and 3.7 pJ of energy, respectively. Results show that the BIASNN method consumes 0.55 mJ of energy and the LASNN method requires 0.59 mJ of energy, an outcome that seems counterintuitive considering the use of the more complex ALIF neurons in BIASNN. Examining the number of multiplication and addition operations within the attention mechanisms, we find that the LIF neurons require 229,376 multiplications and 458,752 additions, whereas the ALIF neurons require 458,755 multiplications and 688,688 additions. Although the ALIF neurons require approximately twice the number of multiplications and 1.5 times the number of additions, this adds minimal computational overhead to the network. In fact, this small increase in complexity is due to our placement strategy for the attention mechanisms, as only three ALIF layers are added to the network, one for each attention mechanism. However, the small increase in complexity does not explain the difference in power consumption between the two models. Instead, the difference can be seen by examining the spiking rate of the LIF neurons outside of the attention mechanisms in both networks. The LASNN method shows an average spiking rate of 11.88%, while the BIASNN method maintains a lower rate of 9.64%. The reduced spike rate of BIASNN results in it requiring 54,338,180 fewer addition operations compared to LASNN, offsetting the extra cost of the ALIF neurons and accounting for the 0.05 mJ difference in energy consumption. A particularly clear example of the higher spike rate in LASNN is observed in LIF layer 9, shown in Fig. 6b.
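The energy accounting above can be reproduced in a few lines; the variable names are ours, and the per-network totals reported in the text include many operations beyond the attention layers shown here.

```python
E_ADD, E_MUL = 0.9e-12, 3.7e-12  # joules per operation on a 45 nm CMOS chip

def op_energy(n_mul, n_add):
    """Energy of a block: multiplications at 3.7 pJ plus additions at 0.9 pJ."""
    return n_mul * E_MUL + n_add * E_ADD

lif_attn_energy = op_energy(229_376, 458_752)    # LIF-based attention (LASNN)
alif_attn_energy = op_energy(458_755, 688_688)   # ALIF-based attention (BIASNN)
downstream_savings = 54_338_180 * E_ADD          # additions saved by BIASNN's lower spike rate
```

The downstream savings (about 0.049 mJ) dwarf the extra cost of the ALIF neurons inside the attention mechanisms, consistent with the roughly 0.05 mJ gap between the two models.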
Based on the network's reduced spiking rate, lower energy consumption, and increased accuracy, we conclude that incorporating ALIF layers into our attention mechanism is the most effective design choice.
In addition to employing two types of spiking neurons, our network also incorporates two distinct surrogate gradient functions, one tailored for each neuron type. This design choice was motivated by observed performance improvements when using two functions instead of the standard single-function approach. Experiments on the CIFAR10 dataset show that employing Eq. (4) as the surrogate for all spiking layers reduces the network’s accuracy to 93.97%, a reduction of 0.25% compared to the proposed dual-function model. Conversely, when Eq. (14) is used as the surrogate gradient for all spiking layers, the network reached a peak accuracy of only 73.33% at epoch 34. After this point, both accuracy and spiking activity progressively declined, and by epoch 70 the network exhibited zero spiking activity and an accuracy of 10%. This is the dead neuron problem commonly observed in SNNs, and no learning occurred for the remaining epochs. The improved accuracy observed when combining Eqs. (4) and (14) suggests that, in certain cases, employing multiple surrogate gradient functions is essential for maximizing network performance. For our BIASNN, this combination proved to be the most effective configuration.
While the proposed BIASNN introduces a more biologically plausible form of attention than those proposed in prior works, we find that its performance does not match that of methods employing floating-point valued attention. A comparison of our work and other recently proposed attention mechanisms for SNNs is provided in Table 5. As shown, our biologically inspired approach achieves accuracy comparable to the FSTA-SNN61 method when applied to a two-timestep ResNet20 architecture; however, methods that generate floating-point attention masks generally outperform BIASNN. We attribute this limitation to the binary, all-or-nothing approach our attention mechanism employs. In contrast, floating-point attention allows continuous weighting of input values, enabling more flexible feature selection and improved generalization across training and testing data. By restricting map weights to ones and zeros, our mechanism risks either eliminating too much information, leading to overfitting, or eliminating too little, effectively negating the benefits of attention. Nevertheless, BIASNN demonstrates that spike-based attention can approach the effectiveness of some floating-point methods while maintaining greater biological plausibility. These results highlight the importance of further investigation into brain-inspired attention processes, as they hold promise for bridging the gap between biological realism and state-of-the-art performance.
Table 5.
Comparison of our proposed BIASNN and other state-of-the-art SNNs incorporating floating-point valued attention.
| Model | Timesteps | Accuracy (%) | Biologically inspired |
|---|---|---|---|
| FSTA-SNN (ResNet19)61 | 2 | 96.52 | ✕ |
| FSTA-SNN (ResNet20)61 | 4 | 94.72 | ✕ |
| FSTA-SNN (ResNet20)61 | 2 | 94.18 | ✕ |
| TCJA-SNN (MS-ResNet18)32 | 6 | 95.87 | ✕ |
| TCJA-SNN (MS-ResNet18)32 | 4 | 95.60 | ✕ |
| IM-LIF34 | 6 | 95.66 | ✕ |
| IM-LIF34 | 3 | 95.29 | ✕ |
| GAC-SNN33 | 6 | 96.46 | ✕ |
| GAC-SNN33 | 4 | 96.24 | ✕ |
| BIASNN | 4 | 94.22 | ✓ |
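The contrast between binary and floating-point attention described above can be made concrete with a small NumPy sketch. The feature and score tensors here are random placeholders, and the 0.5 threshold is an illustrative assumption, not the paper's actual gating rule.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random((4, 4)).astype(np.float32)   # placeholder feature map
scores = rng.random((4, 4)).astype(np.float32)     # hypothetical attention scores

# Floating-point attention: continuous reweighting preserves graded information.
float_attended = features * scores

# Spike-based (binary) attention: an all-or-nothing mask keeps or drops each value.
binary_mask = (scores >= 0.5).astype(np.float32)   # illustrative threshold
binary_attended = features * binary_mask

print("values kept by binary mask:", int(binary_mask.sum()), "of", binary_mask.size)
```

Every surviving entry of `binary_attended` is an unchanged copy of the input, while `float_attended` scales each entry smoothly; this is exactly the flexibility that binary masks give up, at the gain of spike-compatible, more biologically plausible gating.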
Conclusion
In conclusion, this study presents BIASNN, a spiking neural network built around a novel attention mechanism for static image classification. The proposed mechanism enhances the performance of SNNs while increasing the biological plausibility of attention mechanisms. It combines a CBAM-like architecture with the spiking output of ALIF neurons to eliminate unnecessary input data. Eliminating this data helps the network learn where to focus and what to focus on, all while adding a new degree of biological plausibility. Experimental results on several standard datasets demonstrate that our mechanism can enhance network performance, and an in-depth analysis using explainable AI tools helps clarify how the mechanism functions. Attention is a powerful tool that has been successfully used in many ANN models, and by incorporating attention, SNNs have the potential to advance both practical AI applications and our understanding of attention in the brain. Future work may explore incorporating additional biological mechanisms, such as ternary spiking outputs, or more biologically plausible neuron models. Additionally, optimizing this mechanism for neuromorphic datasets and examining its impact on broader, more complex neural tasks are potential areas for further study.
Acknowledgements
The authors thank the Artificial Intelligence Center (AIC) and MLIS Laboratory, College of Computing, Khon Kaen University, Thailand, for their support.
Author contributions
K.T.: formal analysis, methodology, and original draft preparation. W.T.: visualization. S.W.: manuscript review and editing.
Data availability
Code is available through the corresponding author. The datasets used and/or analyzed in this study are publicly available from the following websites. FMNIST: https://github.com/zalandoresearch/fashion-mnist. CIFAR10 and CIFAR100: https://www.cs.toronto.edu/~kriz/cifar.html.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Kevin Takala, Email: kevin.t@kkumail.com.
Sartra Wongthanavasu, Email: wongsar@kku.ac.th.
References
- 1.He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (Las Vegas, 2016).
- 2.Dong, L., Xu, S. & Xu, B. Speech-Transformer: A no-recurrence sequence-to-sequence model for speech recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing 465 (Institute of Electrical and Electronics Engineers, Calgary, 2018).
- 3.Kannikaklang, N., Thamviset, W. & Wongthanavasu, S. BiLSTCAN: A novel SRS-based bidirectional long short-term capsule attention network for dynamic user preference and next-item recommendation. IEEE Access 12, 6879–6899 (2024).
- 4.Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (2015).
- 5.Laughlin, S. B. & Sejnowski, T. J. Communication in neuronal networks. Science 301, 1870–1874 (2003).
- 6.Maass, W. Networks of spiking neurons: The third generation of neural network models. Neural Netw. 10, 1659–1671 (1997).
- 7.Rathi, N. & Roy, K. DIET-SNN: A low-latency spiking neural network with direct input encoding and leakage and threshold optimization. IEEE Trans. Neural Netw. Learn. Syst. 34, 3174–3182 (2023).
- 8.Rathi, N. & Roy, K. LITE-SNN: Leveraging inherent dynamics to train energy-efficient spiking neural networks for sequential learning. IEEE Trans. Cogn. Dev. Syst. 16, 1905–1914 (2024).
- 9.Gast, R., Solla, S. A. & Kennedy, A. Neural heterogeneity controls computations in spiking neural networks. Proc. Natl. Acad. Sci. U.S.A. 121, e2311885121 (2024).
- 10.Abbott, L. F. Lapicque’s introduction of the integrate-and-fire model neuron (1907). Brain Res. Bull. 50, 303–304 (1999).
- 11.Brette, R. & Gerstner, W. Adaptive exponential integrate-and-fire model as an effective description of neuronal activity. J. Neurophysiol. 94, 3637–3642 (2005).
- 12.Izhikevich, E. M. Simple model of spiking neurons. IEEE Trans. Neural Netw. 14, 1569–1572. 10.1109/TNN.2003.820440 (2003).
- 13.Hodgkin, A. L. & Huxley, A. F. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952).
- 14.Yao, M. et al. Attention spiking neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 45, 9393–9410 (2023).
- 15.Cai, S., Li, P. & Li, H. A bio-inspired spiking attentional neural network for attentional selection in the listening brain. IEEE Trans. Neural Netw. Learn. Syst. 10.1109/TNNLS.2023.3303308 (2023).
- 16.Cai, W. et al. A spatial-channel-temporal-fused attention for spiking neural networks. IEEE Trans. Neural Netw. Learn. Syst. 10.1109/TNNLS.2023.3278265 (2023).
- 17.Hu, Y., Deng, L., Wu, Y., Yao, M. & Li, G. Advancing spiking neural networks towards deep residual learning. IEEE Trans. Neural Netw. Learn. Syst. 36, 2353–2367 (2025).
- 18.Liu, Q., Cai, M., Chen, K., Ai, Q. & Ma, L. Reconstruction of adaptive leaky integrate-and-fire neuron to enhance the spiking neural networks performance by establishing complex dynamics. IEEE Trans. Neural Netw. Learn. Syst. 36, 2619–2633 (2025).
- 19.Lee, C., Sarwar, S. S., Panda, P., Srinivasan, G. & Roy, K. Enabling spike-based backpropagation for training deep neural network architectures. Front. Neurosci. 14, 497482 (2020).
- 20.Lee, C., Panda, P., Srinivasan, G. & Roy, K. Training deep spiking convolutional neural networks with STDP-based unsupervised pre-training followed by supervised fine-tuning. Front. Neurosci. 12, 435 (2018).
- 21.Tavanaei, A. & Maida, A. BP-STDP: Approximating backpropagation using spike timing dependent plasticity. Neurocomputing 330, 39–47 (2019).
- 22.Melcher, D. Predictive remapping of visual features precedes saccadic eye movements. Nat. Neurosci. 10, 903–907 (2007).
- 23.Rolfs, M., Jonikaitis, D., Deubel, H. & Cavanagh, P. Predictive remapping of attention across eye movements. Nat. Neurosci. 14, 252–256 (2010).
- 24.Hu, J., Shen, L. & Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 7132–7141 (IEEE Computer Society, 2018). 10.1109/CVPR.2018.00745.
- 25.Dai, T., Cai, J., Zhang, Y., Xia, S. T. & Zhang, L. Second-order attention network for single image super-resolution. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 11057–11066 (IEEE Computer Society, 2019).
- 26.Wang, X., Girshick, R., Gupta, A. & He, K. Non-local neural networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 7794–7803 (IEEE Computer Society, 2018). 10.1109/CVPR.2018.00813.
- 27.Zhang, H., Goodfellow, I., Metaxas, D. & Odena, A. Self-attention generative adversarial networks. In Proceedings of the 36th International Conference on Machine Learning 7354–7363 (2019).
- 28.Gao, L., Li, X., Song, J. & Shen, H. T. Hierarchical LSTMs with adaptive attention for visual captioning. IEEE Trans. Pattern Anal. Mach. Intell. 42, 1112–1131 (2020).
- 29.Dosovitskiy, A. et al. An image is worth 16×16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (2021).
- 30.Yao, M. et al. Temporal-wise attention spiking neural networks for event streams classification. In IEEE/CVF International Conference on Computer Vision 10221–10230 (2021).
- 31.Yao, M. et al. Inherent redundancy in spiking neural networks. In Proceedings of the IEEE International Conference on Computer Vision 16878–16888 (Institute of Electrical and Electronics Engineers Inc., 2023). 10.1109/ICCV51070.2023.01552.
- 32.Zhu, R. J. et al. TCJA-SNN: Temporal-channel joint attention for spiking neural networks. IEEE Trans. Neural Netw. Learn. Syst. 10.1109/TNNLS.2024.3377717 (2024).
- 33.Qiu, X. et al. Gated attention coding for training high-performance and efficient spiking neural networks. https://github.com/bollossom/GAC (2024).
- 34.Lian, S., Shen, J., Wang, Z. & Tang, H. IM-LIF: Improved neuronal dynamics with attention mechanism for direct training deep spiking neural network. IEEE Trans. Emerg. Top. Comput. Intell. 8, 2075–2085 (2024).
- 35.Eshraghian, J. K. et al. Training spiking neural networks using lessons from deep learning. Proc. IEEE 111, 1016–1054 (2023).
- 36.Wu, Y., Deng, L., Li, G., Zhu, J. & Shi, L. Spatio-temporal backpropagation for training high-performance spiking neural networks. Front. Neurosci. 12, 331 (2018).
- 37.Woo, S., Park, J., Lee, J.-Y. & Kweon, I. S. CBAM: Convolutional block attention module. In European Conference on Computer Vision (ECCV) (2018).
- 38.Kim, S., Park, S., Na, B. & Yoon, S. Spiking-YOLO: Spiking neural network for energy-efficient object detection. In AAAI 11270–11277 (2020).
- 39.Wu, Y. et al. Direct training for spiking neural networks: Faster, larger, better. In AAAI 1311–1318 (2019).
- 40.Shen, J., Ni, W., Xu, Q. & Tang, H. Efficient spiking neural networks with sparse selective activation for continual learning. In AAAI 611–619 (2024).
- 41.Fang, W. et al. Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In Proceedings of the IEEE International Conference on Computer Vision 2641–2651 (Institute of Electrical and Electronics Engineers Inc., 2021). 10.1109/ICCV48922.2021.00266.
- 42.Bengio, Y., Léonard, N. & Courville, A. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv (2013).
- 43.Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. https://github.com/zalandoresearch/fashion-mnist (2017).
- 44.Krizhevsky, A. & Hinton, G. Learning multiple layers of features from tiny images. https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (2009).
- 45.Turrigiano, G. G. Homeostatic plasticity in neuronal networks: The more things change, the more they stay the same. Trends Neurosci. 22, 221–227 (1999).
- 46.Dan, Y., Sun, C., Li, H. & Meng, L. Adaptive spiking neuron with population coding for a residual spiking neural network. Appl. Intell. 55, 288 (2025).
- 47.Kheradpisheh, S. R., Mirsadeghi, M. & Masquelier, T. Spiking neural networks trained via proxy. IEEE Access 10, 70769–70778 (2022).
- 48.Yan, Z., Zhou, J. & Wong, W.-F. Near lossless transfer learning for spiking neural networks. In AAAI 10577–10584 (2021).
- 49.Wang, Z. et al. Toward high-accuracy and low-latency spiking neural networks with two-stage optimization. IEEE Trans. Neural Netw. Learn. Syst. 36, 3189–3203 (2025).
- 50.Sun, C. et al. An energy efficient residual spiking neural network accelerator with ternary spikes. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 44, 395–400 (2025).
- 51.Mirsadeghi, M., Shalchian, M., Kheradpisheh, S. R. & Masquelier, T. Spike time displacement-based error backpropagation in convolutional spiking neural networks. Neural Comput. Appl. 35, 15891–15906 (2023).
- 52.Chen, T., Wang, L., Li, J., Duan, S. & Huang, T. Improving spiking neural network with frequency adaptation for image classification. IEEE Trans. Cogn. Dev. Syst. 16, 864–876 (2024).
- 53.Liu, F., Xu, J., Yang, J. & Wu, W. A novel multi-type image coding method acting on supervised hierarchical deep spiking convolutional neural networks for image classification. Cognit. Comput. 17, 9 (2025).
- 54.Huang, Z. et al. Modeling of spiking neural network with optimal hidden layer via spatiotemporal orthogonal encoding for patterns recognition. IEEE Trans. Emerg. Top. Comput. Intell. 9, 2194–2207 (2025).
- 55.Liu, F. et al. SpikeConverter: An efficient conversion framework zipping the gap between artificial neural networks and spiking neural networks. In AAAI 1692–1701 (2022).
- 56.Yan, Z., Tang, K., Zhou, J. & Wong, W. F. Low latency conversion of artificial neural network models to rate-encoded spiking neural networks. IEEE Trans. Neural Netw. Learn. Syst. 36, 14107–14118 (2025).
- 57.Li, Y. et al. Efficient structure slimming for spiking neural networks. IEEE Trans. Artif. Intell. 5, 3823–3831 (2024).
- 58.Liu, Q., Yan, J., Zhang, M. & Li, H. LitE-SNN: Designing lightweight and efficient spiking neural network through spatial-temporal compressive network search and joint optimization. In Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24) 3097–3105 (2024).
- 59.Selvaraju, R. R. et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision 618–626 (Institute of Electrical and Electronics Engineers Inc., 2017).
- 60.Horowitz, M. 1.1 Computing’s energy problem (and what we can do about it). In Digest of Technical Papers—IEEE International Solid-State Circuits Conference 10–14 (2014).
- 61.Yu, K., Zhang, T., Wang, H. & Xu, Q. FSTA-SNN: Frequency-based spatial-temporal attention module for spiking neural networks. In AAAI (2025).