Abstract
Artificial neural networks (ANNs) are at the core of most deep learning (DL) algorithms that successfully tackle complex problems like image recognition, autonomous driving, and natural language processing. However, unlike biological brains, which tackle similar problems in a very efficient manner, DL algorithms require a large number of trainable parameters, making them energy-intensive and prone to overfitting. Here, we show that a new ANN architecture that incorporates the structured connectivity and restricted sampling properties of biological dendrites counteracts these limitations. We find that dendritic ANNs are more robust to overfitting and outperform traditional ANNs on several image classification tasks while using significantly fewer trainable parameters. These advantages are likely the result of a different learning strategy, whereby most of the nodes in dendritic ANNs respond to multiple classes, unlike classical ANNs that strive for class-specificity. Our findings suggest that the incorporation of dendritic properties can make learning in ANNs more precise, resilient, and parameter-efficient and shed new light on how biological features can impact the learning strategies of ANNs.
Introduction
The biological brain is remarkable in its ability to quickly and accurately process, store, and retrieve vast amounts of information while using minimal energy1. Artificial Intelligence (AI) systems, on the other hand, are notoriously energy-hungry2–4 and often fail on tasks where biological systems excel, such as continual and transfer learning5–7. The most widely used AI method is deep learning (DL)8, which is applied in areas like computer vision9 and natural language processing10 and can even achieve superhuman performance in very specific tasks11,12. However, the number of trainable parameters needed to achieve such performance is large, leading to generalization failures due to overfitting13, as well as energy consumption levels that are not sustainable14. Moreover, unlike the brain, DL methods still fail to achieve high-performance accuracy under noisy settings15,16 and on tasks where information changes in a continuous manner17. This dichotomy between biological and artificial intelligence systems suggests that drawing inspiration from the brain may help enhance the efficiency of DL models, bringing them one step closer to emulating the biological way of information processing.
DL architectures rely heavily on multilayered artificial neural networks (ANNs) inspired by their biological counterparts. In these networks, artificial nodes are typically constructed as linearly weighted sums of their inputs followed by a nonlinearity, roughly imitating how the soma or axon of biological neurons integrates inputs18, and learning occurs via changes in the connection strengths (weights) between these nodes19. In contrast, biological neurons are much more complex, consisting of a soma, an axon, and numerous dendrites that enable them to process thousands of synaptic inputs in parallel, in ways that differ extensively between cell types20. Although the somatic and axonal functionalities of biological neurons are well captured in artificial neurons, the dendritic computations are currently missing.
Biological dendrites, because of their ability to generate local regenerative events (dendritic spikes)21,22, share a spiking profile similar to that of the neuronal soma. As a result, biological neurons can act as multi-layer ANNs23–26, able to perform complex computations27,28, such as logical operations29,30, signal amplification and segregation31,32, coincidence detection33–36, multiplexing37, and filtering of irrelevant or noisy stimuli38,39. Consequently, dendrites are thought to underlie complex brain functions, including perception40,41, motor behavior42,43, fear learning44–46, and memory linking47. Moreover, dendrites can help achieve such functions in an efficient manner. For example, they enable learning with few plastic synapses48, the formation of memories using small neuronal populations24, and increased storage capacity49,50. Given the high computational power of dendrites and the associated benefits in biological networks27,51, the current design of artificial neurons seems outdated. Incorporating dendritic properties would likely empower ANNs52–54, fostering more effective, efficient, and resilient learning behaviors like those seen in biological networks.
The above proposition is supported by recent studies that have integrated dendritic structures and their properties into traditional ANNs55–58, showing promising results on machine learning (ML) tasks59–64. For instance, adding active dendrites in ANNs was shown to enhance the network’s ability to learn continually63, while including a specific dendritic nonlinearity improved performance in a multitask learning scenario65. However, to achieve improved performance, these studies have either sacrificed biological plausibility64, used a very large number of trainable parameters63, or were applied to very simple tasks58.
Here, we propose a bio-realistic dendritic architecture that aims to improve learning in ANNs trained with the backpropagation algorithm. In the proposed architecture, inputs are fed into the dendritic layer, which is, in turn, connected to the somatic layer in a sparse and highly structured manner (Figure 1). Moreover, input sampling is inspired by the receptive fields of neurons in the visual cortex66,67: each dendrite samples a restricted part of the input, similar to a specific form of convolutional networks, the so-called locally-connected networks68, rather than the entire image, as is typical in ANNs. By incorporating dendritic structural and sampling features, the new dendritic ANN models match or outperform traditional ANNs on several image classification tasks and counteract overfitting, while using orders of magnitude fewer trainable parameters. These improvements are likely due to a more extensive utilization of trainable weights and a different learning strategy used by dendritic versus traditional ANNs. Overall, our findings suggest that dendrites can augment the computational efficiency of ANNs without sacrificing their performance accuracy, opening new avenues for developing bio-inspired ML systems that inherit some of the major advantages of biological brains.
Figure 1. Schematic representation of the dendritic ANN (dANN) compared to a classical vanilla ANN (vANN).
A. Example of a layer 2/3 pyramidal cell of the mouse primary visual cortex (dendrites: pink; soma: grey) that served as inspiration for the artificial dendritic neuron in B. The morphology was adapted from Park et al.69. B. The dendritic neuron model consists of a somatic node (blue) connected to several dendritic nodes (pink). All nodes have a nonlinear activation function. Each dendrite is connected to the soma with a (cable) weight, w_ds, where d and s denote the dendrite and soma indices, respectively. Inputs are connected to dendrites with (synaptic) weights, w_dj, where d and j are indices of the dendrites and input nodes, respectively. n_syn denotes the number of synapses each dendrite receives, and n_d the number of dendrites per soma. C. The dendritic ANN architecture. The input is fed to the dendritic layer (pink nodes), passes a nonlinearity, and then reaches the soma (blue nodes), passing through another nonlinearity. Dendrites are connected solely to a single soma, creating a sparsely connected network. D. Typical fully connected ANN with two hidden layers. Nodes are point neurons (blue) consisting only of a soma. E. Illustration of the different strategies used to sample the input space: random sampling (R), local receptive fields (LRF), global receptive fields (GRF) and fully connected (F) sampling of input features. Examples correspond to the synaptic weights of all nodes that are connected to the first unit in the second layer. The colormap denotes the magnitude of each weight.
Results
To explore the role of dendritic properties in efficient learning, we developed a dendritic ANN (dANN) model with structured connectivity that loosely mimics the morphology of biological neurons (Figure 1A). In this model, each dendrite acts as a typical point neuron: it linearly sums its weighted inputs (synapses) and passes the sum through a nonlinearity. The dendritic activations are subsequently multiplied by the cable weights and summed at the soma before going through a second nonlinearity (Figure 1B). To train the model using ML platforms (e.g., TensorFlow, PyTorch, Jax), we implemented it as a traditional ANN with two sparsely connected hidden layers, representing the dendritic and somatic units, respectively, and a fully connected output layer (Figure 1C). For comparison purposes, we also implemented a fully connected, vanilla ANN (vANN) with the same number of layers (Figure 1D).
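The structured sparsity described above can be expressed as masked weight matrices in any standard ML framework. Below is a minimal NumPy sketch of a dANN forward pass; the layer sizes (`n_soma`, `n_dend`, `n_syn`) and the use of ReLU and random "R" sampling are illustrative assumptions, not the paper's actual hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_soma, n_dend, n_syn = 784, 4, 8, 16   # hypothetical sizes
n_dendrites = n_soma * n_dend

def relu(x):
    return np.maximum(x, 0.0)

# Synaptic mask: each dendrite samples only n_syn input features
# (random "R" sampling here; LRF/GRF would restrict the choice spatially).
syn_mask = np.zeros((n_inputs, n_dendrites))
for d in range(n_dendrites):
    syn_mask[rng.choice(n_inputs, n_syn, replace=False), d] = 1.0

# Cable mask: each dendrite connects to exactly one soma (structured sparsity).
cable_mask = np.kron(np.eye(n_soma), np.ones((n_dend, 1)))  # (n_dendrites, n_soma)

W_syn = rng.normal(0.0, 0.1, (n_inputs, n_dendrites)) * syn_mask
W_cable = rng.normal(0.0, 0.1, (n_dendrites, n_soma)) * cable_mask

def dann_forward(x):
    dend = relu(x @ W_syn)       # dendritic layer nonlinearity
    return relu(dend @ W_cable)  # somatic layer nonlinearity

out = dann_forward(rng.normal(size=(2, n_inputs)))
print(out.shape)  # (2, 4)
```

In a trainable implementation, the same masks would be applied to the gradients (or to the weights after each update) so that the pruned connections stay at zero during backpropagation.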
In addition to their structured connectivity, the dendrites of biological neurons typically receive only partial information from a visual scene70. To investigate the contribution of this property, we implemented four types of input sampling for the dendritic ANN model: a) random sampling of input features (R), b) local receptive fields (LRF) where each dendrite samples from a spatially restricted part of the image, c) global receptive fields (GRF) where all dendrites belonging to a soma sample from the same spatially restricted part of the image, and d) an all-to-all, fully connected type of sampling (F), which is also used by the vANN (Figure 1E). We then tested the learning capabilities of our models on various image classification tasks (Supplementary Figure 1) using the same (default) hyperparameters, optimization algorithm, and loss function (see Methods). To ensure fair comparisons, we tested equivalent network architectures for all models, i.e., consisting of the same number of nodes in each hidden layer.
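The four sampling schemes can be illustrated as binary input masks, one row per dendrite. The sketch below assumes hypothetical sizes and a square receptive field of side `patch`; the actual patch shapes and placement rules are described in the Methods:

```python
import numpy as np

rng = np.random.default_rng(1)
H = W = 28                          # FMNIST image side
n_soma, n_dend, patch = 4, 8, 7     # hypothetical sizes; patch = RF side length

def patch_mask(top, left):
    m = np.zeros((H, W))
    m[top:top + patch, left:left + patch] = 1.0
    return m.ravel()

def sampling_masks(kind):
    """One binary mask row per dendrite (n_soma * n_dend rows of length H*W)."""
    rows = []
    for s in range(n_soma):
        t, l = rng.integers(0, H - patch, 2)      # GRF: shared patch per soma
        for d in range(n_dend):
            if kind == "R":                       # random pixels per dendrite
                m = np.zeros(H * W)
                m[rng.choice(H * W, patch * patch, replace=False)] = 1.0
            elif kind == "LRF":                   # own patch per dendrite
                m = patch_mask(*rng.integers(0, H - patch, 2))
            elif kind == "GRF":                   # soma-wide shared patch
                m = patch_mask(t, l)
            else:                                 # "F": fully connected
                m = np.ones(H * W)
            rows.append(m)
    return np.array(rows)
```

For example, `sampling_masks("GRF")` returns identical rows for all dendrites of the same soma, while `sampling_masks("LRF")` gives each dendrite its own patch.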
Bio-inspired dendritic ANNs are more accurate, robust, and efficient than vanilla ANNs on image classification.
We first tested the learning capabilities of all models against the Fashion MNIST (FMNIST) dataset (Figure 2A). We found that dANNs with restricted input sampling achieve better learning and combat overfitting much more effectively than vANN models, for both size-matched and larger vANN architectures (in terms of trainable parameters). This is evidenced by a consistently lower test loss for all dANN models (except the dANN-F) compared to vANNs with the same number of trainable parameters (Figure 2B). Importantly, the vANN models exhibit overfitting as the model size increases (Figure 2B and Supplementary Figure 2), while this does not occur for dANNs (for the model sizes tested), suggesting that dendrites may serve as natural regularizers71. Indeed, reductions in overfitting are also evident in vANNs when using various regularization methods or hyperparameter tuning, such as dropout (Supplementary Figure 3), different learning rates (Supplementary Figure 4), or an early stopping criterion (Supplementary Figure 5). These overfitting reductions, however, are not as large as those seen in dANNs, suggesting that dendrites provide a more robust regularization effect.
Figure 2. Dendritic features improve learning on Fashion MNIST classification.
A. The Fashion MNIST dataset consists of 28×28 grayscale images of 10 categories. B. Average test loss as a function of the trainable parameters of the five models used: a dendritic ANN with random inputs (dANN-R, green), a dANN with LRFs (red), a dANN with GRFs (blue), a dANN with all-to-all inputs (dANN-F, orange), and the vANN with all-to-all inputs (grey). Horizontal and vertical dashed lines denote the minimum test loss of the vANN and its trainable parameters, respectively. The x-axis is shown in logarithmic scale (log10). C. Similar to B, but depicting the test accuracy instead of the loss. D. Test loss as a function of the number of dendrites per somatic node for the four dANN models. The linestyle (solid and dashed) represents different numbers of somatic nodes. The dashed horizontal line represents the minimum test loss of the vANN (512–256 size of its hidden layers, respectively). The x-axis is shown in logarithmic scale (log2). E. Similar to D, but showing the test accuracy instead of the loss. The dashed horizontal line represents the maximum test accuracy of the vANN (2048–512 size of its hidden layers, respectively). For all panels, shades represent three standard deviations across initializations for each model.
In addition to overfitting benefits, dANNs with restricted input sampling match the best performance of vANNs while using far fewer trainable parameters (Figure 2C), suggesting that dendritic features render ANNs more efficient. Among the four dANN configurations tested, the dANN with local receptive fields (dANN-LRF) is the most efficient: it reaches maximum accuracy and minimum loss with over one order of magnitude fewer trainable parameters than the vANN. The dANN models with random (dANN-R) and global receptive field (dANN-GRF) sampling are slightly worse but significantly more efficient than the vANN, while the dANN with all-to-all input sampling (dANN-F) shows reduced overfitting but no efficiency gains with respect to accuracy. Overall, these findings suggest that both dendritic features, i.e., the structured dendritic connectivity and the restricted input sampling, contribute to the efficiency gains of dANN models.
Finally, we found that, as expected, learning in dANN models improves with network size (lower loss: Figure 2D, better accuracy: Figure 2E). More importantly, unlike other bio-inspired architectures72, dANN models appear to scale well with increasing depth (Supplementary Figure 6), revealing their potential for use in deeper architectures.
To substantiate our results on the beneficial role of dendritic features, we tested the dANN models on five additional benchmark datasets (Figure 3, Table 1, see §Datasets). As with FMNIST, we found that the best dANN models consistently outperformed (albeit slightly) the best vANN in terms of both accuracy and loss (Table 1 and Supplementary Table 1). Moreover, similarly to FMNIST, we found that dANNs with restricted input sampling (i.e., R, LRF, GRF) are much more efficient than vANNs for all datasets. Specifically, they can match the accuracy (Figure 3A) and loss (Figure 3B) of the best vANN using 1–3 orders of magnitude fewer trainable parameters. It is worth noting that for more difficult tasks, like the CIFAR10 dataset, the difference in both the number of trainable parameters (Figure 3A-B) and the best accuracy (Table 1) between these dANNs and vANNs is more prominent. Finally, the all-to-all input sampling enables dANNs to achieve slightly higher accuracy than vANNs on certain datasets (Table 1), albeit with a greater number of trainable parameters (Figure 3A-B, orange bars), thus diminishing the efficiency gains of dANNs.
Figure 3. Dendrites improve performance across various benchmark datasets.
A. Number of trainable parameters that each model, dANN-R (green), dANN-LRF (red), dANN-GRF (blue), and dANN-F (orange), needs to match the highest test accuracy of the respective vANN (grey). B. The same as in A, but showing the number of trainable parameters required to match the minimum test loss of the vANN (grey). C. Accuracy efficiency score for all models and all datasets tested. Test accuracy is normalized with the logarithm of trainable parameters. The score is bounded in [0, 1]. D. Same as in C, but showing the loss efficiency score. Again, we normalized the test score with the logarithm of the trainable parameters times the number of epochs needed to achieve minimum validation loss. The score is bounded in [0, 1]. In all barplots the errorbars represent three standard deviations across initializations for each model.
Table 1.
Top test accuracy scores obtained by each model on five benchmark datasets across various configurations and their corresponding test loss. Performance accuracy and loss are listed as mean ± standard deviation over initializations for each model.
| MODEL PERFORMANCE – TOP ACCURACY | |||||
|---|---|---|---|---|---|
| Models | MNIST | FMNIST | KMNIST | EMNIST | CIFAR10 |
| Test accuracy (%) | |||||
| dANN-R | 98.090±0.0583 | 89.612±0.0870 | 91.076±0.0811 | 82.745±0.2568 | 52.458±0.4347 |
| dANN-LRF | 98.466±0.1058 | 90.256±0.2237 | 91.928±0.0643 | 83.166±0.0893 | 56.966±0.9796 |
| dANN-GRF | 98.576±0.0809 | 90.182±0.2164 | 90.046±0.2915 | 83.779±0.2064 | 56.998±0.4875 |
| dANN-F | 98.466±0.0862 | 90.184±0.2203 | 90.61±0.6927 | 84.121±0.1880 | 60.12±0.3906 |
| vANN | 98.034±0.2742 | 89.288±0.3654 | 91.552±0.6629 | 83.381±0.2681 | 49.082±1.2092 |
| Test loss | |||||
| dANN-R | 0.0644±0.0013 | 0.3245±0.0028 | 0.5068±0.0045 | 1.0591±0.0225 | 1.4612±0.0172 |
| dANN-LRF | 0.0483±0.0018 | 0.2975±0.0042 | 0.4374±0.0037 | 0.6105±0.0048 | 1.2684±0.0230 |
| dANN-GRF | 0.0471±0.0022 | 0.2947±0.0052 | 0.5757±0.0335 | 0.5605±0.0077 | 1.3398±0.0217 |
| dANN-F | 0.0503±0.0023 | 0.2935±0.0043 | 0.4789±0.0378 | 0.5390±0.0060 | 1.2811±0.0247 |
| vANN | 0.0967±0.0116 | 0.4040±0.0066 | 0.8246±0.0791 | 2.0860±0.0476 | 2.0455±0.1375 |
To quantify the efficiency differences between dANNs and vANNs, we formulated the efficiency score metrics, which normalize the best accuracy (Figure 3C) and the corresponding loss (Figure 3D) that a given model can achieve with the number of trainable parameters used multiplied by the number of epochs needed to reach minimum validation loss (see Methods). These metrics consider both the size of a network (trainable parameters) and a proxy of its convergence speed (training epochs). We found that dANN models with restricted input sampling exhibit higher efficiency than vANNs across all datasets. This is not the case for the dANN with all-to-all input sampling, which seems to be more efficient than vANNs in terms of minimum loss but less efficient in terms of maximum accuracy (Figure 3C, D). This is because the dANN-F model has a very large number of parameters and needs a large number of epochs for training (Supplementary Figure 7), yet does not suffer from overfitting, thus achieving a smaller loss for similar accuracy in the majority of the datasets tested.
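Since the exact formula for the efficiency scores is deferred to the Methods, the following is only a hedged reconstruction from the description above, assuming the raw metric is divided by the base-10 logarithm of trainable parameters times epochs to best validation loss:

```python
import math

def efficiency_score(metric, n_params, n_epochs):
    # Hypothetical reconstruction: the metric (e.g., test accuracy in [0, 1])
    # normalized by log10(trainable parameters x epochs to best val. loss).
    return metric / math.log10(n_params * n_epochs)

small = efficiency_score(0.90, 10_000, 20)      # small, fast-converging net
large = efficiency_score(0.90, 1_000_000, 40)   # large, slower-converging net
print(small > large)  # True: at equal accuracy, fewer resources score higher
```

Whatever the exact normalization, the key property is the one shown here: two models with equal accuracy are ranked by how few parameters and epochs they needed.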
The above findings illustrate that the two dendritic features implemented here, i.e., the structured connectivity and the restricted input sampling, provide important efficiency gains on image classification compared to classical vANNs. To dissect the relative contributions of these two features, we compared the dANN models to densely connected vANN models (Supplementary Figure 8 and Supplementary Figure 9) and sparsely connected sANN models (Supplementary Figure 10 and Supplementary Figure 11), furnished with the four types of input sampling. We found that while restricted sampling improves the performance efficiency of vANNs, dANNs still outperform these improved models (Supplementary Figure 8), suggesting that restricted sampling alone cannot account for the improved efficiency of dANN models.
To assess the contribution of structured dendritic connectivity, we next compared dANNs to randomly connected, sparse ANN models (sANNs). Sparse neural networks have previously been shown to exhibit improved performance73, and since dANNs are a specific subset of sANNs, their efficiency gains could in principle stem primarily from internal sparsity. We found that the bio-inspired dANNs, namely those with structured restricted input sampling (LRF and GRF), consistently outperform sANNs in terms of efficiency gains. When comparing dANNs and sANNs with matched input sampling types (whereby the only difference is structured vs. random internal connectivity), the differences in efficiency gains are reduced. However, dANNs still exhibit higher efficiency gains (Supplementary Figure 10), suggesting that a structured, tree-like internal sparsity provides additional efficiency benefits.
Overall, these experiments confirm that both dendritic features, namely the structured connectivity and the restricted input sampling, contribute significantly to improving the performance accuracy of dANNs, their efficiency and their resilience to overfitting. Given these findings, in the following sections we focus our analysis on the bio-inspired dANN models (i.e., those with restricted sampling: dANN-R, dANN-LRF, dANN-GRF) and fully connected vANN models.
Bio-inspired dANNs employ a distinct learning strategy and fully exploit their available resources
To better understand why dANN models with restricted input sampling outperform vANNs, we analyzed their weight distributions after training on the Fashion MNIST dataset. We found a broader distribution (i.e., a larger range of values) of synaptic (layer 1) weights for dANNs compared to the vANN (Supplementary Table 2 and Figure 4A, top row) and a bimodal distribution of dANN cable (layer 2) weights, all centered around zero (Supplementary Table 2 and Figure 4A, middle row). For the dANN-LRF model, in particular, there were very few cable weights close to zero, indicating that the model effectively utilizes all trainable parameters of this layer. In contrast, in the vANN model, weights follow a Gaussian-like distribution centered around zero, suggesting that many weights are not as effectively utilized. Finally, the distribution of the output layer weights in dANNs is broader than in the vANN model (Supplementary Table 2 and Figure 4A, bottom row). These observations suggest that bio-inspired dANNs fully exploit their trainable parameters, especially their cable (second layer, or dendrosomatic) weights, compared to the second hidden layer of the vANN.
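The distribution statistics reported in Supplementary Table 2 (kurtosis, skewness, range) can be computed with `scipy.stats`. The toy weight vectors below merely imitate the qualitative shapes described above (Gaussian-like vs. bimodal); they are not the actual trained weights:

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)
# Toy stand-ins for post-training weight vectors
gaussian_like = rng.normal(0.0, 0.05, 100_000)             # vANN-like layer
bimodal = np.concatenate([rng.normal(-0.3, 0.05, 50_000),
                          rng.normal(0.3, 0.05, 50_000)])  # dANN cable-like layer

for name, w in [("gaussian-like", gaussian_like), ("bimodal", bimodal)]:
    print(name, round(skew(w), 2), round(kurtosis(w), 2), round(np.ptp(w), 2))
```

A symmetric bimodal distribution shows up as strongly negative (platykurtic) Fisher kurtosis, while a Gaussian-like one sits near zero, which is one way to quantify the difference between the two weight profiles.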
Figure 4. Bio-inspired dANN models fully exploit their available resources and solve the task using a different learning strategy.
A. Weight probability density functions after training for dANN-R, dANN-GRF, dANN-LRF, and vANN. The density functions are built by concatenating all weights across initializations for each model. First hidden layer (top row), second hidden layer (middle row), and output layer (bottom row) weights are shown. Both x and y axes are shared across all subplots for visual comparison among the density plots. Supplementary Table 2 contains the kurtosis, skewness, and range of all KDE plots. B. Probability density function of the entropy (bits) for the first (normal color) and second (shaded color) hidden layer, respectively. Entropies are calculated using the activations of each layer for all test images of FMNIST (see Methods). Silent nodes have been excluded from the visualization. Higher values signify mixed selectivity, whereas low values indicate class specificity. C. Probability density functions of selectivity for both layers (different color shades) and all models (columns). For all histograms, the bins are equal to the number of classes, i.e., 10 for the FMNIST dataset.
To delineate how the nodes of the different models contribute to a decision, we looked into their selectivity. First, we calculated the information entropy, which measures how class-specific a node is. High entropy values indicate mixed selectivity, whereby the node is active for more than one class, while low values indicate class specificity. We found opposite entropy distributions between the bio-inspired dANNs and the vANN. This means that the dANN models primarily have mixed-selective nodes in both hidden layers, while vANNs primarily have class-specific nodes. This difference was even more pronounced for dANNs with global or local RFs (Figure 4B).
To assess whether the observed differences in entropy map onto node specificity, we formulated the selectivity index, which counts how many classes a given node responds to. Specifically, if a node is active (activation greater than zero for a given image) for more than 400 images of a specific class, corresponding roughly to 40% of testing images, its selectivity index for that class is set to one. As with entropy distributions, we found that in bio-inspired dANNs, both layers consist primarily of mixed-selective nodes, while the vANN contains primarily class-specific nodes (Figure 4C). These observations suggest that dANN and vANN models employ different strategies to solve the same classification task.
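The two measures can be sketched as follows. The selectivity index follows the description above directly; the entropy estimator (class-conditional mean activity, normalized into a distribution) is an assumption, since the excerpt defers the exact definition to the Methods:

```python
import numpy as np

def node_entropy(act, labels, n_classes=10):
    """Entropy (bits) of a node's class-conditional mean activity.
    High values = mixed selectivity; 0 = fully class-specific."""
    p = np.array([act[labels == c].mean() for c in range(n_classes)])
    if p.sum() > 0:
        p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def selectivity_index(act, labels, n_classes=10, thresh=400):
    """Number of classes for which the node is active (> 0) on more than
    `thresh` test images of that class."""
    return int(sum((act[labels == c] > 0).sum() > thresh
                   for c in range(n_classes)))

# Toy demo: a class-specific node vs. a fully mixed node (2 classes)
labels = np.repeat(np.arange(2), 1000)
specific = (labels == 0).astype(float)   # active only for class 0
mixed = np.ones(2000)                    # active for every image
```

On this toy data the class-specific node gives zero entropy and a selectivity index of 1, while the mixed node gives maximal entropy (1 bit for two classes) and an index of 2.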
To complete our interpretability analysis, we visualized the hidden representations of the compared dANN and vANN models post-learning. The goal was to assess the amount of high-level information that is extracted by the first and second hidden layers across models (i.e., dendritic and somatic layers for dANNs, respectively). We applied T-distributed stochastic neighbor embedding (TSNE), an algorithm that reduces the dimensionality and allows visualization of high-dimensional data74. By visual inspection, we observed a change in the representation between the dendritic and somatic layers of dANN models, similar to the change in the representations of the vANN between its two hidden layers (Figure 5A-D). We quantified the separability of the representations using the silhouette and the neighborhood (NH) scores, which measure the global and local degree of separability, respectively (see Methods). In the three dANN models, global and local separability increased from the dendritic to the somatic layer (Figure 5E-F), something that we also observed in the hidden layers of the vANN. This means that the discriminatory power of both dANNs and the vANN increases across layers in a similar way. This is in line with the findings of Figure 4, whereby the vANN is shown to have higher class-specificity than the dANNs in the first layer, and thus higher separability scores. Importantly, our results regarding the properties of the representations in low-dimensional space reflect the properties of the high-dimensional data, as shown by their high trustworthiness scores (Figure 5G). The latter measures the extent to which the local structure of the data is retained after projection to the lower-dimensional space; values close to 1 indicate high reliability. Figure 5G suggests that the three dANNs better retain the structure of the original data in their representations, as measured by TSNE, compared to the vANN. This is probably a result of the different strategy implemented by these networks.
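The separability and trustworthiness analysis maps onto standard scikit-learn utilities. In the sketch below, toy Gaussian clusters stand in for the real hidden-layer activations; the neighborhood (NH) score is the authors' custom metric (see Methods) and is omitted:

```python
import numpy as np
from sklearn.manifold import TSNE, trustworthiness
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Toy stand-in for hidden-layer activations: 3 well-separated classes
X = np.concatenate([rng.normal(10.0 * c, 1.0, (30, 16)) for c in range(3)])
y = np.repeat(np.arange(3), 30)

emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)

sil = silhouette_score(emb, y)                   # global separability
trust = trustworthiness(X, emb, n_neighbors=11)  # local structure retained
```

With well-separated classes, the silhouette score of the embedding is high and the trustworthiness is close to 1, matching the interpretation given in Figure 5E and 5G.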
Figure 5. Learned representations.
A-D. TSNE projections of the activations for the first (left column) and the second (right column) hidden layers corresponding to the three dANN and the vANN models. Different colors denote the image categories of the FMNIST dataset. While the figure shows the results of one run, the representations are consistent across 10 runs of the TSNE algorithm (data not shown). E. Silhouette scores of the representations (2-way ANOVA: model F(3,32)=1598.31, p<10⁻³, layer F(1,32)=2130.39, p<10⁻³, model x layer F(3,32)=105.20, p<10⁻³). F. Neighborhood scores of the representations, calculated using 11 neighbors (2-way ANOVA: model F(3,32)=8624.78, p<10⁻³, layer F(1,32)=18512.42, p<10⁻³, model x layer F(3,32)=299.51, p<10⁻³). G. Trustworthiness of the representations, calculated using 11 neighbors (2-way ANOVA: model F(3,32)=6187.66, p<10⁻³, layer F(1,32)=1856.84, p<10⁻³, model x layer F(3,32)=1777.98, p<10⁻³). In all barplots the errorbars represent three standard deviations across initializations for each model and 10 runs of the TSNE algorithm per initialization. Stars denote significance with unpaired t-test (two-tailed) with Bonferroni’s correction.
Overall, our interpretability analysis reveals that dANNs with restricted input sampling use a different strategy than the vANN model to achieve accurate, robust, and efficient image classification: rather than becoming class-specific early on like the vANN, the dANN models exhibit mixed-selectivity in both layers. This strategy may underlie their ability to create trustworthy representations of the input data, and achieve high performance accuracy and reduced overfitting, while using significantly fewer yet fully utilized trainable parameters.
Dendritic benefits are more pronounced as the task difficulty increases
Our image discrimination results suggest that the difference between bio-inspired dANN and vANN models may be larger for more difficult tasks/datasets (see results for CIFAR10 in Table 1). To test this hypothesis, we constructed learning scenarios that are known to be challenging for ANN models.
First, we added Gaussian noise (with a variable σ and zero mean) to all images in the FMNIST dataset, thus creating new datasets of increasing classification difficulty (Figure 6A). We then selected the best vANN and the corresponding dANNs that matched its performance accuracy on FMNIST (from Figure 2 and Figure 3) and tested their performance on the noisy datasets. We found that, while the performance of all models declined with increasing noise levels, dANNs with restricted input sampling demonstrated higher efficiency and resilience. This is evidenced by a slower increase rate for the loss and a slower drop rate for the accuracy efficiency scores, respectively, compared to vANNs (Figure 6B). In all cases, the best-performing dANN was the one with local RFs (dANN-LRF).
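Constructing the noisy variants amounts to adding zero-mean Gaussian noise per image; a minimal sketch (clipping back to the [0, 1] pixel range is an assumption about the preprocessing):

```python
import numpy as np

def add_gaussian_noise(images, sigma, rng=None):
    """Add zero-mean Gaussian noise with std `sigma`; clip back to [0, 1]."""
    rng = rng or np.random.default_rng(0)
    return np.clip(images + rng.normal(0.0, sigma, images.shape), 0.0, 1.0)

imgs = np.random.default_rng(1).random((5, 28, 28))  # stand-in FMNIST batch
mild = add_gaussian_noise(imgs, 0.1)
heavy = add_gaussian_noise(imgs, 0.5)
```

Sweeping `sigma` produces the family of progressively harder datasets used in Figure 6A-B.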
Figure 6. Bio-inspired dANNs are more accurate and efficient than vANNs when inputs are noisy or presented in a sequential manner.
A. An example of one FMNIST image with variable Gaussian noise. Sigma (σ) is the standard deviation of the Gaussian noise. B. Testing loss (left) and accuracy (right) efficiency scores for all models and noise levels. Shades represent three standard deviations across network initializations for each model. C. The sequential learning task. D. As in B, but showing the loss (left) and accuracy (right) efficiency scores for the sequential task. Errorbars denote three standard deviations across initializations for each model. See Table 2 and Supplementary Table 3 for the accuracy and loss values.
To confirm the advantages of dANN models on challenging tasks, we constructed a second learning scenario that remains challenging for traditional ANNs. In this task, models were fed with batches of inputs belonging to the same class in a sequential manner (Figure 6C). This process, which was repeated 50 times (epochs), results in models receiving information only from images of a single class during gradient calculation. As with the noisy task, the three dANN models were more accurate (Table 2), less variable across different initializations, and much more efficient than the vANN, as evidenced by their loss and accuracy efficiency scores (Figure 6D). The best-performing dANN was again the one with local RFs (dANN-LRF). Overall, these findings suggest that incorporation of dendritic features in ANNs may result in even greater robustness, accuracy and efficiency gains when the task difficulty is increased.
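The single-class-batch protocol can be sketched as a generator that yields one class at a time; the batch size and within-class shuffling below are assumptions about details deferred to the Methods:

```python
import numpy as np

def single_class_batches(images, labels, batch_size=32, rng=None):
    """Yield (x, y) batches that each contain images of a single class only,
    class after class, mimicking the sequential-learning protocol."""
    rng = rng or np.random.default_rng(0)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        for start in range(0, idx.size, batch_size):
            sel = idx[start:start + batch_size]
            yield images[sel], labels[sel]

X = np.zeros((100, 784))          # stand-in images
y = np.repeat(np.arange(5), 20)   # 5 classes, 20 images each
batches = list(single_class_batches(X, y, batch_size=8))
```

Because each gradient step sees only one class, a standard training loop driven by this generator tends to forget earlier classes, which is exactly what makes the task hard for the vANN.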
Table 2.
Test accuracy obtained by each model on the sequential learning task, and their corresponding test loss, using the FMNIST dataset. Test accuracy and loss are listed as mean ± standard deviation across initializations for each model.
| MODEL PERFORMANCE – SEQUENTIAL LEARNING | ||
|---|---|---|
| Models | Test accuracy (%) | Test loss |
| dANN-R | 56.834±1.6748 | 3.3947±0.3063 |
| dANN-LRF | 59.658±1.5005 | 1.6532±0.1783 |
| dANN-GRF | 59.348±1.3412 | 1.8007±0.1537 |
| vANN | 28.458±9.4994 | 313.466±240.8529 |
Discussion
Bio-inspired machine learning is one of the most dynamic branches of AI. Biological dendrites and their learning rules are among the top candidates being explored, already showing highly promising results in artificial neural networks55,56,75,76. Recent studies in dendritic networks focus on their potential to tackle difficult problems such as continual60,63 and multitask learning65 and propose solutions to the credit assignment problem without backpropagation56,57. However, the processing power and efficiency of biological networks, largely endowed by their dendrites, are still far from being matched by respective ANNs. Towards this goal, we focused on the structured connectivity and restricted sampling characteristics of dendrites, two prominent features that are conserved across brain regions and species77,78, suggesting that their role in information processing is likely to be very important.
We constructed a set of dendritic ANNs that leverage the structured connectivity and restricted sampling features of biological dendrites to enhance learning. dANNs are constructed as typical ANNs with two hidden layers, in which the first (dendritic) layer is connected in a sparse and structured manner to the second (somatic) layer so that it resembles the structured connectivity of biological dendrites to their respective somata. Input sampling was inspired by the receptive fields of neurons in the visual cortex67,78, whereby dendrites (and neurons) sample only a restricted part of the visual space. We compared our models to vanilla ANNs across numerous image classification tasks and found that they are superior in performance accuracy, degree of overfitting, robustness to noise, and sequential learning. Compared to vANN, these benefits are achieved with orders of magnitude fewer trainable parameters, making dANNs more efficient and effective. As there is growing concern that the demand for computational resources to develop and apply AI models could lead to a significant increase in the electricity consumption of data centers worldwide79, dANNs are especially valuable for edge computing and other energy-constrained scenarios. Even though we focused on simple ANNs that form the foundation of nearly all DL architectures, it is important to note that our study also offers a framework for integrating dendritic features into various models, such as convolutional neural networks, transformers, and others80. This involves replacing the fully connected layers commonly found in deep learning architectures with our dANN model.
To dissect the relative contributions of structured connectivity vs. restricted input sampling, we also compared dANN models to fully connected ANNs and sparse (randomly-connected) ANNs furnished with the input sampling types that were used in the dendritic models. We found that both features are important for achieving the overfitting and accuracy efficiency gains and that their relative contribution varies with the dataset and task used.
Of note, while some of the sparse networks approximate the performance of dANNs, the space of all sparse networks of a given size is vast, and finding those networks that work best is non-trivial. This work demonstrates that a biologically inspired structured architecture (namely the dendritic one) can result in improved and more efficient performance. This finding is unexpected, as the bio-inspired architecture, including tree-like hidden layer connectivity and sampling with receptive fields, is just one of the vast number of possible sparse architectures. Knowing that nature has identified a network architecture that is more efficient than classical neural network architectures is important for designing efficient systems without the need for extensive and expensive grid searches to identify such architectures.
Throughout these comparisons, the dANN model with LRF subsampling emerged as the most efficient configuration from a computational perspective. LRF subsampling assumes that individual dendrites of a given soma preferentially receive inputs from focused regions of a visual scene and that these regions are not necessarily close in visual space (Figure 1). In the visual cortex, this could amount to receiving clustered inputs from presynaptic neurons that share similar feature selectivity, allowing for selective and distributed information processing within the dendritic tree. Several studies in both the cortex and the hippocampus provide evidence that dendrites can sample from the same feature space and receive inputs with correlated tuning properties40,81–84. Additionally, dendrites can exhibit local, branch-specific integration and plasticity85. This feature-based input organization is further supported by observations of branch-specific dendritic depolarizations and the generation of local dendritic spikes in response to specific input combinations29,86. This structured organization of inputs has been suggested to confer important computational advantages for learning and memory storage49,87–89.
Our interpretability analysis revealed that dANN models, unlike vANN models, form primarily mixed-selective nodes. This finding aligns with experimental evidence of mixed-selective neurons in various cortical regions of the mouse and primate brains90–93. Such neurons can encode multiple task-relevant features simultaneously, a property believed to be important for flexible decision-making and information processing, especially in higher cortical regions such as the prefrontal cortex. The ability to form and utilize mixed-selective nodes suggests that dANNs may enable more efficient and adaptable information processing, similar to biological neural networks. Further research into the similarities and differences between mixed-selective nodes in ANNs and mixed-selectivity neurons in the brain could provide valuable insights into the principles underlying intelligent behavior.
Beyond the findings reported here, other studies have also adopted dendritic properties in ANNs and DL models. These studies are different and complementary to ours in several ways. For example, one approach implemented dendrites as max-pooling or average-pooling layers63,64,94, two methods that are extensively used in DL models95. Other approaches were more abstract, modeling dendrites as a multiplicative component and/or using all-to-all connectivity from the input layer onto dendrites62,96–98. Some methods use specific dendritic components, such as normalization of the weights or specific dendritic spikes, but their connectivity matrices become sparse using, for example, evolutionary algorithms, or they contain fully connected layers65,99,100. Lastly, dendritic models incorporating local learning rules have been used to study how the brain solves the credit assignment problem, showing promising learning capabilities but are not easily applicable to large ML applications55,56,75.
Inspired by Jones and Kording58, our modeling approach considers dendrites as an additional layer that provides weighted inputs to the somatic nodes. This means that dANN can be viewed as a sparse ANN from an ML perspective. In ANNs, sparsity can be achieved by pruning after training101,102, using evolutionary algorithms103,104, specific regularization105,106 during training, or iterative methods applied before training107,108. Similar to the latter approaches, in dANNs, the connectivity sparsity is handcrafted from the start, creating a fixed architecture that is significantly smaller than typical vANNs undergoing a pruning process. This makes training faster and more efficient, as no pruning is required. Furthermore, inputs to dendrites are not randomly allocated but can be constructed based on RFs, setting our model apart from traditional sparse networks that rely on random connectivity initialization. Finally, RFs are created before training and can be modified to capture the most essential characteristics of a dataset. These attributes lend biological inspiration to our dANN models and are expected to be advantageous for neuromorphic hardware implementation, particularly when faced with space limitations and increased energy consumption resulting from lengthy node connections.
It is crucial to acknowledge the boundaries and limitations of our dANN architecture. In the implementation presented here, to maintain their initial connectivity, dANNs necessitate an extra boolean mask multiplication after every gradient descent step. This additional step results in a higher computational expense regarding floating-point operations. Moreover, during training, we employ the backpropagation algorithm and discard some gradients that aren’t linked to an existing connection, potentially losing vital information from other gradient directions that could result in faster convergence. Using locally computed gradients that are dependent only on the connected nodes would overcome these limitations and further improve the efficiency/performance of our dANN models, but such an implementation is currently not possible with existing ML platforms and requires custom-made code. Of note, the same limitations apply to sparse network architectures that are implemented using the existing ML platforms. Finally, the cable weights used here are unconstrained. A more bio-realistic approach would be to restrict these weights to positive values or set them to fixed, positive values. Nevertheless, we believe our work is valuable as it offers new insights into the benefits, i.e., improved accuracy, less overfitting, and much fewer parameters, that can be gained by adopting dendritic features in classical ANNs. These advantages combined with their ability to scale relatively well with depth, render the proposed dANN networks a potentially powerful alternative to classical ANNs through their incorporation in DL architectures such as convolutional neural nets or transformers.
Overall, we show that implementing dendritic properties can significantly enhance the learning capabilities of ANNs, making them both accurate and efficient. These findings hold great promise as they suggest that integrating biological characteristics could be crucial for optimizing the sustainability and effectiveness of ML algorithms.
Methods
Network architectures
We developed a range of traditional ANNs consisting of two hidden layers and an output layer whose size matches the number of classes. To create the dANN models, we first create two boolean masks that determine the synaptic weights between the input and dendritic layers, as well as the cable weights between the dendritic and somatic layers. Once the model is initialized, we apply these masks to obtain a sparse network with structured synaptic and cable weights (Eq. 1).
$$\widetilde{W}_l = W_l \odot M_l \tag{1}$$

where $W_l$ denotes the weights, $M_l$ the boolean mask associated with the $l$-th layer, and $\odot$ is the Hadamard (elementwise) product.
The calculation for the forward pass is obtained by linearly combining the inputs with the weights and adding the bias in each node of all layers. Finally, the output is obtained by passing the summation through a (nonlinear) activation function (Eq. 2).
$$a_l = f\left(W_l\, x_l + b_l\right) \tag{2}$$

where $x_l$ and $a_l$ denote the inputs to the $l$-th layer and its activations, respectively, $b_l$ is the bias, and $f$ is the activation function.
In the output layer, we calculate the loss, which is then propagated back to compute the gradients with respect to all trainable parameters. To ensure the same connectivity as the original model, we zero out all gradients calculated for non-existent connections (Eq. 3).
$$\nabla_{W_l} \mathcal{L} \leftarrow \nabla_{W_l} \mathcal{L} \odot M_l \tag{3}$$

where $\mathcal{L}$ is the loss and $W_l$ denotes the trainable weights of the $l$-th layer.
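A minimal numpy sketch of Eqs. 1–3 may help fix ideas; the layer sizes, the number of inputs per dendrite, and the weight initialization below are illustrative, and biases are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(z, slope=0.1):
    return np.where(z > 0, z, slope * z)

# Toy dimensions (illustrative): 16 inputs, 8 dendrites, 2 somata.
n_in, n_dend, n_soma = 16, 8, 2

# Synaptic mask: each dendrite samples 4 of the 16 inputs.
M_syn = np.zeros((n_in, n_dend), dtype=bool)
for d in range(n_dend):
    M_syn[rng.choice(n_in, 4, replace=False), d] = True

# Cable mask: each soma owns a disjoint group of 4 dendrites (tree structure).
M_cab = np.zeros((n_dend, n_soma), dtype=bool)
for s in range(n_soma):
    M_cab[s * 4:(s + 1) * 4, s] = True

# Apply the boolean masks elementwise to the dense weight matrices (Eq. 1).
W_syn = rng.normal(size=(n_in, n_dend)) * M_syn
W_cab = rng.normal(size=(n_dend, n_soma)) * M_cab

# Forward pass (Eq. 2): linear combination followed by the nonlinearity.
x = rng.normal(size=(1, n_in))
a_dend = leaky_relu(x @ W_syn)        # dendritic layer activations
a_soma = leaky_relu(a_dend @ W_cab)   # somatic layer activations

# A stand-in for a backpropagated gradient, masked as in Eq. 3 so that
# non-existent connections never receive updates.
grad_syn = rng.normal(size=W_syn.shape) * M_syn
```

In a full training loop, the same mask multiplication is reapplied after every gradient descent step, which is the extra boolean-mask cost discussed in the limitations above.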
Synaptic connections
To define the input-to-dendrite connectivity matrix, we utilize three distinct strategies. Our first approach involves random allocation, where each dendrite receives 16 inputs (pixels) which are randomly selected from the image (dANN-R). Our second approach utilizes local-constructed receptive fields (dANN-LRF), where each dendrite again receives 16 inputs (pixels), but this time they are sampled from a restricted part of the image. To do so, we randomly select a pixel to represent the center of the receptive field for each dendrite. In particular, the central pixel is drawn from a uniform distribution. Then, the 16 inputs are chosen from the 4×4 neighborhood of that pixel. The process is repeated for all dendrites. Finally, we utilize global-constructed receptive fields (dANN-GRF). In this approach, we select a pixel to represent the receptive field center for each soma instead of each dendrite. Then, each dendrite belonging to that soma has a central pixel drawn from a uniform distribution around the somatic center. Finally, dendrites receive 16 inputs from the 4×4 neighborhood of their central pixel, as before.
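The three allocation strategies can be sketched as index generators over a 28×28 image (a hedged sketch: the 16 inputs per dendrite and the 4×4 neighborhood follow the text, while the border clipping and the ±3-pixel spread of dendritic centers around the somatic center in the GRF case are our assumptions):

```python
import numpy as np

def rf_indices(side=28, n_dend=8, k=16, mode="lrf", seed=0):
    """Per dendrite, return the flat pixel indices it samples.

    mode: 'r' = random pixels, 'lrf' = per-dendrite centers,
          'grf' = one center per soma, dendritic centers scattered around it.
    k only applies to 'r'; the RF modes always take a 4x4 (16-pixel) patch.
    """
    rng = np.random.default_rng(seed)

    def patch(cy, cx):
        # 4x4 neighborhood, clipped so it stays inside the image (assumption).
        cy = int(np.clip(cy, 0, side - 4))
        cx = int(np.clip(cx, 0, side - 4))
        ys, xs = np.meshgrid(np.arange(cy, cy + 4), np.arange(cx, cx + 4))
        return (ys * side + xs).ravel()

    if mode == "r":
        return [rng.choice(side * side, k, replace=False) for _ in range(n_dend)]
    if mode == "lrf":
        return [patch(*rng.integers(0, side, 2)) for _ in range(n_dend)]
    # 'grf': one somatic center; dendritic centers spread +/-3 pixels around it.
    sy, sx = rng.integers(0, side, 2)
    centers = rng.integers(-3, 4, size=(n_dend, 2)) + np.array([sy, sx])
    return [patch(cy, cx) for cy, cx in centers]
```

The returned index lists would populate the rows of the synaptic boolean mask of Eq. 1, one column per dendrite.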
Datasets
The dANN models are trained to classify images into one of their respective classes. The MNIST109 consists of handwritten digits from 0 to 9. Fashion MNIST110 is an alternative to MNIST and consists of clothing images: T-shirt/top, trousers, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot. Kuzushiji MNIST111 is a drop-in replacement for the MNIST dataset consisting of one Japanese character representing each of the ten rows of Hiragana. All of these datasets come with 60,000 training and 10,000 test images. Extended MNIST112 follows the same conversion paradigm used to create the MNIST dataset. The result is a set of datasets that constitute more challenging classification tasks involving letters and digits. Here, we used the 47 balanced classes with 731,668 training and 82,587 testing images. All MNIST variants consist of 28×28 grayscale images. Finally, CIFAR-10113 consists of images of objects or animals in ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset contains 50,000 training and 10,000 test images. The images are 32×32 pixels in three color channels.
For our experiments, we trained the models with 90% of the training data, keeping the remaining 10% for validation. Once the training was complete, we evaluated the performance on the test set.
Hyperparameters
Our models were trained using the Adam optimization algorithm with its default parameters: a learning rate of 0.001 and betas of 0.9 and 0.999. To ensure efficient training, we utilized a minibatch size of 128. The number of epochs was variable for each dataset but the same across models. Specifically, for MNIST, FMNIST, and KMNIST we used 15, 20, and 20 epochs, respectively, whereas for EMNIST and CIFAR10 we used 50 epochs. For the sequential learning scenario, we used 50 epochs to train the models. In our dANN models, each dendrite receives inputs from 16 input neurons. We calculate the loss using the cross-entropy function and set the activation function of all nodes to the leaky rectified linear unit (Leaky ReLU) with a negative slope of 0.1, except for the output nodes, which utilize the softmax activation function.
Efficiency scores
To calculate the efficiency scores for all networks, we formulated the accuracy (aes) and loss (les) efficiency scores, respectively. To do so, we normalized the accuracy and loss with a factor that takes values in $[1, +\infty)$ and is the ratio of the base-10 logarithm of the number of trainable parameters of a model times the number of epochs needed to reach its minimum validation loss, over the same logarithm for the smallest such product among the compared models (Eq. 4).

$$f_i = \frac{\log_{10} N_i}{\log_{10}\left(\min_{j \in \mathcal{M}} N_j\right)} \tag{4}$$

By dividing the accuracy by the factor $f_i$, the accuracy efficiency score remains in $[0, 100]$, while multiplying the loss by $f_i$ keeps the loss efficiency score in $[0, +\infty)$ (Eq. 5).

$$\mathrm{aes}_i = \frac{\mathrm{accuracy}_i}{f_i}, \qquad \mathrm{les}_i = \mathrm{loss}_i \cdot f_i \tag{5}$$

where $N_i$ denotes the number of trainable parameters of model $i$ multiplied by the number of epochs it needed to reach its minimum validation loss, and $\mathcal{M}$ denotes the set of compared models.
Using these scores, models with large numbers of trainable parameters and training epochs have lower accuracy efficiency scores and higher loss efficiency scores compared to models with fewer trainable parameters and fewer training epochs.
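Following the definitions of Eqs. 4–5, the two scores can be computed per model as follows (a sketch; the function name and the toy numbers are illustrative):

```python
import numpy as np

def efficiency_scores(accuracy, loss, params, epochs):
    """Accuracy (aes) and loss (les) efficiency scores across compared models
    (a sketch of Eqs. 4-5; inputs are per-model arrays)."""
    accuracy = np.asarray(accuracy, float)
    loss = np.asarray(loss, float)
    n = np.asarray(params, float) * np.asarray(epochs, float)
    factor = np.log10(n) / np.log10(n.min())  # Eq. 4; factor >= 1
    return accuracy / factor, loss * factor   # Eq. 5

# Toy comparison: a small dANN-like model vs a much larger vANN-like model.
aes, les = efficiency_scores(accuracy=[90.0, 92.0], loss=[0.30, 0.25],
                             params=[1e4, 1e6], epochs=[10, 20])
```

The smallest model has factor 1, so its scores equal its raw accuracy and loss; larger, slower-converging models are penalized in both directions.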
Interpretability analysis
Synaptic, cable, and output weights:
To display the learned weights, we constructed histograms with 20 bins and utilized kernel density estimation (KDE) to approximate the underlying distribution using a continuous probability density curve. The KDE plot smooths the observations with a Gaussian kernel, generating a continuous estimation. The histograms are built by concatenating the learned weights across initializations for each ANN model.
Entropy:
To calculate the entropy for the first and second hidden layers, we used the activations of all nodes during evaluation against the test set. First, we created the hit matrix (HM) for each layer, which counts the number of times a node was activated (activity above zero) for images belonging to the same category.
To calculate normalized probabilities, we add an extra row to HM containing the number of times a node remained inactive. Thus, the summation across rows is equal to the number of data samples in the test set.
Obtaining the probability matrix $P$ by dividing HM by the number of images, we calculate the entropy of each node $j$ in the $l$-th layer (Eq. 6).

$$H_j^{(l)} = -\sum_{c} P_{cj}^{(l)} \log_2 P_{cj}^{(l)} \tag{6}$$

where $c$ indexes the categories together with the extra inactivity row.
We plot the entropy distributions using histograms with 20 bins and the KDE method to estimate a continuous probability density function. We removed inactive nodes from the analysis.
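The hit-matrix and entropy computation described above can be sketched in numpy (names are illustrative; the extra row counting silent trials follows the text):

```python
import numpy as np

def node_entropy(activations, labels, n_classes):
    """Per-node entropy (Eq. 6) from test-set activations: HM counts class-wise
    activations, an extra row counts silent trials, and the entropy is taken
    column-wise over the normalized probabilities."""
    active = activations > 0                        # (n_samples, n_nodes)
    hm = np.zeros((n_classes + 1, active.shape[1]))
    for c in range(n_classes):
        hm[c] = active[labels == c].sum(axis=0)     # hit matrix rows per class
    hm[-1] = active.shape[0] - hm[:-1].sum(axis=0)  # times each node was silent
    p = hm / active.shape[0]                        # columns sum to 1
    terms = np.where(p > 0, p * np.log2(np.where(p > 0, p, 1.0)), 0.0)
    return -terms.sum(axis=0)                       # entropy per node
```

A node that fires indiscriminately across classes gets high entropy; a node that is silent or fires for one class only gets low entropy.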
Selectivity:
To calculate how selective a node is, we counted how many categories it was activated for. We consider activity significant if a node was activated for over 400 images of a specific category. Thus, each node is assigned an integer in $\{1, \dots, C\}$, where $C$ is the number of classes, with 1 denoting class-specificity and $C$ total mixed-selectivity. As selectivity is a discrete metric, we plot the histograms with one bin per integer value and without using KDE.
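A corresponding sketch of the selectivity count (the 400-image threshold follows the text; names are illustrative):

```python
import numpy as np

def selectivity(activations, labels, n_classes, threshold=400):
    """Number of classes for which each node shows significant activity:
    1 = class-specific, n_classes = fully mixed-selective."""
    active = activations > 0
    counts = np.stack([active[labels == c].sum(axis=0)
                       for c in range(n_classes)])   # (n_classes, n_nodes)
    return (counts > threshold).sum(axis=0)
```

This is the quantity whose distribution separates dANNs (mostly mixed-selective nodes) from vANNs (mostly class-specific nodes).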
Dimensionality reduction and analysis
To calculate the representations of the hidden layers, we used the t-distributed stochastic neighbor embedding (TSNE) dimensionality reduction method74 with perplexity equal to 50. We chose this technique based on its widespread popularity and proven ability to preserve neighborhoods and clusters in projections. Activations for a given layer, the subject of our analysis, are extracted strictly for a random subset of 2,000 observations from the test sets to aid visual presentation. We visualize projections as scatterplots, with points colored to show their class. To assess the quality of the projection and its discriminatory power, we employ two metrics. The Silhouette score calculates the global structure of the projection. It shows if activations of images belonging to the same category are close in the reduced space114, and the neighborhood hit score (NH) shows the local structure in the projection and indicates how well classes are visually separated115,116.
Silhouette score is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample (Eq. 7).
$$s = \frac{b - a}{\max(a, b)} \tag{7}$$
and takes values in [−1, 1], with values closer to 1 denoting good clustering and values close to 0 indicating overlapping clusters. Negative values indicate a sample has been assigned to the wrong cluster, as a different cluster is more similar. The Silhouette score is the average over all samples.
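Eq. 7 can be computed directly from its definition; the plain-numpy sketch below (names illustrative) should agree with sklearn's `silhouette_score` under Euclidean distance:

```python
import numpy as np

def silhouette(X, labels):
    """Mean Silhouette score (Eq. 7): for each sample, a is the mean
    intra-cluster distance and b the mean distance to the nearest other
    cluster; the score averages (b - a) / max(a, b) over samples."""
    X = np.asarray(X, float)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
    scores = []
    for i, li in enumerate(labels):
        others = np.arange(len(X)) != i
        a = D[i, (labels == li) & others].mean()          # intra-cluster
        b = min(D[i, labels == lj].mean()
                for lj in set(labels) if lj != li)        # nearest other cluster
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```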
The NH denotes how similar a point is to its nearest neighbors in the reduced space. For a given $k$, the NH for a point $x$ is the percentage of its $k$-nearest neighbors that belong to the same class as $x$ (Eq. 8).

$$\mathrm{NH}_k(x) = \frac{1}{k}\,\bigl|\{\, x_j \in N_k(x) : c(x_j) = c(x) \,\}\bigr| \tag{8}$$

where $c(x)$ denotes the category of the data point $x$, and the $x_j$ are the points in its neighborhood $N_k(x)$. The neighborhood hit score is then calculated as an average across all points in the dataset. The score is bounded in [0, 1], with higher values denoting better neighborhood compactness and small values indicating misplaced data points. Here, we use a fixed $k$.
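Eq. 8, averaged over all points, can be sketched on a projection `P` as follows (a plain-numpy sketch; names and the default `k` are illustrative):

```python
import numpy as np

def neighborhood_hit(P, labels, k=3):
    """Mean neighborhood hit (Eq. 8) on a projection P: the fraction of each
    point's k nearest neighbors sharing its class, averaged over points."""
    P = np.asarray(P, float)
    D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)            # exclude the point itself
    nn = np.argsort(D, axis=1)[:, :k]      # indices of the k nearest neighbors
    hits = (labels[nn] == labels[:, None]).mean(axis=1)
    return float(hits.mean())
```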
Trustworthiness is a metric that expresses the extent to which the local structure is retained after the dimensionality reduction117 (Eq. 9).
$$T(k) = 1 - \frac{2}{nk\,(2n - 3k - 1)} \sum_{i=1}^{n} \sum_{j \in U_k(i)} \bigl(r(i, j) - k\bigr) \tag{9}$$

where $r(i, j)$ denotes the rank of datapoint $j$ according to the pairwise distances from datapoint $i$ among the high-dimensional datapoints, and $U_k(i)$ represents the set of datapoints that are among the $k$-nearest neighbors of datapoint $i$ in the low-dimensional space, but not in the high-dimensional space.

Thus, any unexpected nearest neighbors in the low-dimensional space are penalized proportionally to their rank in the high-dimensional space. The trustworthiness is within [0, 1]. Here, we use a fixed $k$.
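Eq. 9 can be computed from its definition in plain numpy (a sketch; `sklearn.manifold.trustworthiness` implements the same metric):

```python
import numpy as np

def trustworthiness(X_high, X_low, k=5):
    """Trustworthiness (Eq. 9): penalizes points that are k-NN in the
    projection but not in the original space, weighted by their
    high-dimensional rank."""
    X_high = np.asarray(X_high, float)
    X_low = np.asarray(X_low, float)
    n = len(X_high)

    def neighbor_order(X):
        D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        np.fill_diagonal(D, np.inf)        # self ends up ranked last
        return np.argsort(D, axis=1)       # rows: neighbors sorted by distance

    order_high = neighbor_order(X_high)
    order_low = neighbor_order(X_low)
    rank_high = np.empty((n, n), dtype=int)     # rank_high[i, j] = r(i, j)
    rows = np.arange(n)[:, None]
    rank_high[rows, order_high] = np.arange(1, n + 1)

    penalty = 0.0
    for i in range(n):
        low_k = set(order_low[i, :k])           # k-NN in the projection
        high_k = set(order_high[i, :k])         # k-NN in the original space
        for j in low_k - high_k:                # U_k(i): unexpected neighbors
            penalty += rank_high[i, j] - k
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty
```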
Computing resources and software
All simulations were performed on a custom machine under the Debian GNU/Linux trixie/sid (kernel version 6.6.15–2) operating system with 64GB of RAM, Intel® Core™ i5–10400 CPU @ 2.90GHz, and an NVIDIA GeForce RTX 3080 Ti GPU @ 12GB. We implemented all models using the Keras 2.15.0 functional API118 with TensorFlow 2.15.0 backend119 under Python 3.9.18 (conda 23.7.4). For better handling of the training process, we used a custom training loop. For data analysis and visualization, we utilized various Python modules, including numpy 1.24.4120, scikit-learn 1.4.1121, pandas 1.5.3122, matplotlib 3.8.3123, seaborn 0.13.2124 and seaborn-image 0.8.0125.
Statistical Analysis
For all standard statistical tests (detailed in figure legends), the significance level was $\alpha = 0.05$. To correct for multiple comparisons, $\alpha$ was divided by the number of tests according to the Bonferroni procedure. Throughout the figures, p values are denoted by * (p<0.05), ** (p<0.01), and *** (p<0.001). To compare the dependent variable among different groups (models × layers), we used a two-way analysis of variance (ANOVA), followed by unpaired two-tailed t-tests with Bonferroni's correction for post hoc comparisons whenever a statistical difference was observed. The statistical analysis was performed using the pingouin 0.5.4 library126.
Supplementary Material
Acknowledgments
The authors thank Michalis Pagkalos, Dr. Athanasia Papoutsi, Prof. Blake Richards, Ioannis-Rafail Tzonevrakis, Dr. Eirini Troullinou, and Prof. Grigorios Tsagkatakis for their valuable and constructive feedback. We also thank the reviewers for their detailed and highly constructive feedback. This work was supported by NIH (1R01MH124867-04) to P.P., the European Commission (H2020-FETOPEN-2018-2019-2020-01, FET-Open Challenging Current Thinking, NEUREKA GA-863245) to P.P.
Footnotes
Competing interests
The authors declare no competing interests.
Computer code
The code underlying this study is available on GitHub.
Data availability
The source code that generates all Figures and the data that support this study are accessible on GitHub.
References
- 1.Attwell D. & Laughlin S. B. An Energy Budget for Signaling in the Grey Matter of the Brain. J. Cereb. Blood Flow Metab. 21, 1133–1145 (2001).
- 2.Luccioni A. S., Jernite Y. & Strubell E. Power Hungry Processing: Watts Driving the Cost of AI Deployment? Preprint at http://arxiv.org/abs/2311.16863 (2023).
- 3.Strubell E., Ganesh A. & McCallum A. Energy and Policy Considerations for Deep Learning in NLP. in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (eds. Korhonen A., Traum D. & Màrquez L.) 3645–3650 (Association for Computational Linguistics, Florence, Italy, 2019). doi: 10.18653/v1/P19-1355.
- 4.Mehonic A. & Kenyon A. J. Brain-inspired computing needs a master plan. Nature 604, 255–260 (2022).
- 5.McCloskey M. & Cohen N. J. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. in Psychology of Learning and Motivation vol. 24 109–165 (Elsevier, 1989).
- 6.Abraham W. C. & Robins A. Memory retention – the synaptic stability versus plasticity dilemma. Trends Neurosci. 28, 73–78 (2005).
- 7.Mesnil G. et al. Unsupervised and Transfer Learning Challenge: a Deep Learning Approach. in Proceedings of ICML Workshop on Unsupervised and Transfer Learning 97–110 (JMLR Workshop and Conference Proceedings, 2012).
- 8.LeCun Y., Bengio Y. & Hinton G. Deep learning. Nature 521, 436–444 (2015).
- 9.Guo Y. et al. Deep learning for visual understanding: A review. Neurocomputing 187, 27–48 (2016).
- 10.Chai J. & Li A. Deep Learning in Natural Language Processing: A State-of-the-Art Survey. in 2019 International Conference on Machine Learning and Cybernetics (ICMLC) 1–6 (IEEE, Kobe, Japan, 2019). doi: 10.1109/ICMLC48188.2019.8949185.
- 11.Silver D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
- 12.Fuchs F., Song Y., Kaufmann E., Scaramuzza D. & Durr P. Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning. IEEE Robot. Autom. Lett. 6, 4257–4264 (2021).
- 13.Ying X. An Overview of Overfitting and its Solutions. J. Phys. Conf. Ser. 1168, 022022 (2019).
- 14.Yang T.-J., Chen Y.-H., Emer J. & Sze V. A method to estimate the energy consumption of deep neural networks. in 2017 51st Asilomar Conference on Signals, Systems, and Computers 1916–1920 (IEEE, Pacific Grove, CA, USA, 2017). doi: 10.1109/ACSSC.2017.8335698.
- 15.Nazaré T. S., Da Costa G. B. P., Contato W. A. & Ponti M. Deep Convolutional Neural Networks and Noisy Images. in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (eds. Mendoza M. & Velastín S.) vol. 10657 416–424 (Springer International Publishing, Cham, 2018).
- 16.Dodge S. & Karam L. Understanding how image quality affects deep neural networks. in 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX) 1–6 (IEEE, Lisbon, Portugal, 2016). doi: 10.1109/QoMEX.2016.7498955.
- 17.Parisi G. I., Kemker R., Part J. L., Kanan C. & Wermter S. Continual lifelong learning with neural networks: A review. Neural Netw. 113, 54–71 (2019).
- 18.McCulloch W. S. & Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943).
- 19.Rumelhart D. E., Hinton G. E. & Williams R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
- 20.Spruston N. Pyramidal neurons: dendritic structure and synaptic integration. Nat. Rev. Neurosci. 9, 206–221 (2008).
- 21.Major G., Larkum M. E. & Schiller J. Active Properties of Neocortical Pyramidal Neuron Dendrites. Annu. Rev. Neurosci. 36, 1–24 (2013).
- 22.Larkum M. E., Wu J., Duverdin S. A. & Gidon A. The Guide to Dendritic Spikes of the Mammalian Cortex In Vitro and In Vivo. Neuroscience 489, 15–33 (2022).
- 23.Poirazi P., Brannon T. & Mel B. W. Pyramidal Neuron as Two-Layer Neural Network. Neuron 37, 989–999 (2003).
- 24.Tzilivaki A., Kastellakis G. & Poirazi P. Challenging the point neuron dogma: FS basket cells as 2-stage nonlinear integrators. Nat. Commun. 10, 3664 (2019).
- 25.Beniaguev D., Segev I. & London M. Single cortical neurons as deep artificial neural networks. Neuron 109, 2727–2739.e3 (2021).
- 26.Häusser M. & Mel B. Dendrites: bug or feature? Curr. Opin. Neurobiol. 13, 372–383 (2003).
- 27.Poirazi P. & Papoutsi A. Illuminating dendritic function with computational models. Nat. Rev. Neurosci. 21, 303–321 (2020).
- 28.Larkum M. A cellular mechanism for cortical associations: an organizing principle for the cerebral cortex. Trends Neurosci. 36, 141–151 (2013).
- 29.Gidon A. et al. Dendritic action potentials and computation in human layer 2/3 cortical neurons. Science 367, 83–87 (2020).
- 30.Shepherd G. M. & Brayton R. K. Logic operations are properties of computer-simulated interactions between excitable dendritic spines. Neuroscience 21, 151–165 (1987).
- 31.Harnett M. T., Makara J. K., Spruston N., Kath W. L. & Magee J. C. Synaptic amplification by dendritic spines enhances input cooperativity. Nature 491, 599–602 (2012).
- 32.Schachter M. J., Oesch N., Smith R. G. & Taylor W. R. Dendritic Spikes Amplify the Synaptic Signal to Enhance Detection of Motion in a Simulation of the Direction-Selective Ganglion Cell. PLoS Comput. Biol. 6, e1000899 (2010).
- 33.Ariav G., Polsky A. & Schiller J. Submillisecond Precision of the Input-Output Transformation Function Mediated by Fast Sodium Dendritic Spikes in Basal Dendrites of CA1 Pyramidal Neurons. J. Neurosci. 23, 7750–7758 (2003).
- 34.Softky W. Sub-millisecond coincidence detection in active dendritic trees. Neuroscience 58, 13–41 (1994).
- 35.Roome C. J. & Kuhn B. Dendritic coincidence detection in Purkinje neurons of awake mice. eLife 9, e59619 (2020).
- 36.Stuart G. J. & Häusser M. Dendritic coincidence detection of EPSPs and action potentials. Nat. Neurosci. 4, 63–71 (2001).
- 37.Naud R. & Sprekeler H. Sparse bursts optimize information transmission in a multiplexed neural code. Proc. Natl. Acad. Sci. 115, (2018).
- 38.Anandan E. S., Husain R. & Seluakumaran K. Auditory attentional filter in the absence of masking noise. Atten. Percept. Psychophys. 83, 1737–1751 (2021).
- 39.Benezra S. E., Patel K. B., Campos C. P., Hillman E. M. C. & Bruno R. M. Learning Enhances Behaviorally Relevant Representations in Apical Dendrites. Preprint at http://biorxiv.org/lookup/doi/10.1101/2021.11.10.468144 (2021) doi: 10.1101/2021.11.10.468144.
- 40.Takahashi N., Oertner T. G., Hegemann P. & Larkum M. E. Active cortical dendrites modulate perception. Science 354, 1587–1590 (2016).
- 41.Takahashi N. et al. Active dendritic currents gate descending cortical outputs in perception. Nat. Neurosci. 23, 1277–1285 (2020).
- 42.Xu T. et al. Rapid formation and selective stabilization of synapses for enduring motor memories. Nature 462, 915–919 (2009).
- 43.Otor Y. et al. Dynamic compartmental computations in tuft dendrites of layer 5 neurons during motor behavior. Science 376, 267–275 (2022).
- 44.Lai C. S. W., Adler A. & Gan W.-B. Fear extinction reverses dendritic spine formation induced by fear conditioning in the mouse auditory cortex. Proc. Natl. Acad. Sci. 115, 9306–9311 (2018).
- 45.Xu Z. et al. Fear conditioning and extinction induce opposing changes in dendritic spine remodeling and somatic activity of layer 5 pyramidal neurons in the mouse motor cortex. Sci. Rep. 9, 4619 (2019).
- 46.Letzkus J. J. et al. A disinhibitory microcircuit for associative fear learning in the auditory cortex. Nature 480, 331–335 (2011).
- 47.Kastellakis G., Silva A. J. & Poirazi P. Linking Memories across Time via Neuronal and Dendritic Overlaps in Model Neurons with Active Dendrites. Cell Rep. 17, 1491–1504 (2016).
- 48.Malakasis N., Chavlis S. & Poirazi P. Synaptic Turnover Promotes Efficient Learning in Bio-Realistic Spiking Neural Networks. Preprint at http://biorxiv.org/lookup/doi/10.1101/2023.05.22.541722 (2023) doi: 10.1101/2023.05.22.541722.
- 49.Poirazi P. & Mel B. W. Impact of Active Dendrites and Structural Plasticity on the Memory Capacity of Neural Tissue. Neuron 29, 779–796 (2001).
- 50.Kaifosh P. & Losonczy A. Mnemonic Functions for Nonlinear Dendritic Integration in Hippocampal Pyramidal Circuits. Neuron 90, 622–634 (2016).
- 51.Makarov R., Pagkalos M. & Poirazi P. Dendrites and efficiency: Optimizing performance and resource utilization. Curr. Opin. Neurobiol. 83, 102812 (2023).
- 52.Pagkalos M., Makarov R. & Poirazi P. Leveraging dendritic properties to advance machine learning and neuro-inspired computing. Curr. Opin. Neurobiol. 85, 102853 (2024).
- 53.Chavlis S. & Poirazi P. Drawing inspiration from biological dendrites to empower artificial neural networks. Curr. Opin. Neurobiol. 70, 1–10 (2021).
- 54.Acharya J. et al. Dendritic Computing: Branching Deeper into Machine Learning. Neuroscience 489, 275–289 (2022).
- 55.Guerguiev J., Lillicrap T. P. & Richards B. A. Towards deep learning with segregated dendrites. eLife 6, e22901 (2017).
- 56.Payeur A., Guerguiev J., Zenke F., Richards B. A. & Naud R. Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits. Nat. Neurosci. 24, 1010–1019 (2021).
- 57.Illing B., Ventura J., Bellec G. & Gerstner W. Local plasticity rules can learn deep representations using self-supervised contrastive predictions. in Advances in Neural Information Processing Systems (eds. Ranzato M., Beygelzimer A., Dauphin Y., Liang P. S. & Vaughan J. W.) vol. 34 30365–30379 (Curran Associates, Inc., 2021).
- 58.Jones I. S. & Kording K. P. Might a Single Neuron Solve Interesting Machine Learning Problems Through Successive Computations on Its Dendritic Tree? Neural Comput. 33, 1554–1571 (2021).
- 59.Körding K. P. & König P. Supervised and unsupervised learning with two sites of synaptic integration. J. Comput. Neurosci. 11, 207–215 (2001).
- 60.Kirkpatrick J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114, 3521–3526 (2017).
- 61.Siegel M., Körding K. P. & König P. Integrating Top-Down and Bottom-Up Sensory Processing by Somato-Dendritic Interactions. J. Comput. Neurosci. 8, 161–173 (2000).
- 62.Tang C. et al. Dendritic Neural Network: A Novel Extension of Dendritic Neuron Model. IEEE Trans. Emerg. Top. Comput. Intell. 1–12 (2024) doi: 10.1109/TETCI.2024.3367819.
- 63.Iyer A. et al. Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments. Front. Neurorobotics 16, 846219 (2022).
- 64.Wu X., Liu X., Li W. & Wu Q. Improved Expressivity Through Dendritic Neural Networks. in Advances in Neural Information Processing Systems (eds. Bengio S. et al.) vol. 31 (Curran Associates, Inc., 2018).
- 65.Wybo W. A. M. et al. NMDA-driven dendritic modulation enables multitask representation learning in hierarchical sensory processing pathways. Proc. Natl. Acad. Sci. 120, e2300558120 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Ringach D. L. Mapping receptive fields in primary visual cortex. J. Physiol. 558, 717–728 (2004). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Bair W. Visual receptive field organization. Curr. Opin. Neurobiol. 15, 459–464 (2005). [DOI] [PubMed] [Google Scholar]
- 68.Chen Y. et al. Locally-connected and convolutional neural networks for small footprint speaker recognition. in Interspeech 2015 1136–1140 (ISCA, 2015). doi: 10.21437/Interspeech.2015-297. [DOI] [Google Scholar]
- 69.Park J. et al. Contribution of apical and basal dendrites to orientation encoding in mouse V1 L2/3 pyramidal neurons. Nat. Commun. 10, 5372 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Morgan A. T., Petro L. S. & Muckli L. Scene Representations Conveyed by Cortical Feedback to Early Visual Cortex Can Be Described by Line Drawings. J. Neurosci. 39, 9410–9423 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Nusrat I. & Jang S.-B. A Comparison of Regularization Techniques in Deep Neural Networks. Symmetry 10, 648 (2018). [Google Scholar]
- 72.Bartunov S. et al. Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures. in Advances in Neural Information Processing Systems vol. 31 (Curran Associates, Inc., 2018). [Google Scholar]
- 73.Hoefler T., Alistarh D., Ben-Nun T., Dryden N. & Peste A. Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. J. Mach. Learn. Res. 22, 1–124 (2021). [Google Scholar]
- 74.Maaten L. van der & Hinton G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008). [Google Scholar]
- 75.Sacramento J., Ponte Costa R., Bengio Y. & Senn W. Dendritic cortical microcircuits approximate the backpropagation algorithm. in Advances in Neural Information Processing Systems (eds. Bengio S. et al.) vol. 31 (Curran Associates, Inc., 2018). [Google Scholar]
- 76.Sacramento J., Costa R. P., Bengio Y. & Senn W. Dendritic error backpropagation in deep cortical microcircuits. Preprint at http://arxiv.org/abs/1801.00062 (2017).
- 77.Holley Z. L., Bland K. M., Casey Z. O., Handwerk C. J. & Vidal G. S. Cross-Regional Gradient of Dendritic Morphology in Isochronically-Sourced Mouse Supragranular Pyramidal Neurons. Front. Neuroanat. 12, 103 (2018).
- 78.Gaucher Q. et al. Complexity of frequency receptive fields predicts tonotopic variability across species. eLife 9, e53462 (2020).
- 79.De Vries A. The growing energy footprint of artificial intelligence. Joule 7, 2191–2194 (2023).
- 80.Sarker I. H. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Comput. Sci. 2, 420 (2021).
- 81.Wilson D. E., Whitney D. E., Scholl B. & Fitzpatrick D. Orientation selectivity and the functional clustering of synaptic inputs in primary visual cortex. Nat. Neurosci. 19, 1003–1009 (2016).
- 82.Lavzin M., Rapoport S., Polsky A., Garion L. & Schiller J. Nonlinear dendritic processing determines angular tuning of barrel cortex neurons in vivo. Nature 490, 397–401 (2012).
- 83.Cichon J. & Gan W.-B. Branch-specific dendritic Ca(2+) spikes cause persistent synaptic plasticity. Nature 520, 180–185 (2015).
- 84.Fu M., Yu X., Lu J. & Zuo Y. Repetitive motor learning induces coordinated formation of clustered dendritic spines in vivo. Nature 483, 92–95 (2012).
- 85.O’Hare J. K., Wang J., Shala M. D., Polleux F. & Losonczy A. Variable recruitment of distal tuft dendrites shapes new hippocampal place fields. Preprint at bioRxiv https://doi.org/10.1101/2024.02.26.582144 (2024).
- 86.Beaulieu-Laroche L., Toloza E. H. S., Brown N. J. & Harnett M. T. Widespread and Highly Correlated Somato-dendritic Activity in Cortical Layer 5 Neurons. Neuron 103, 235–241.e4 (2019).
- 87.Kastellakis G., Cai D. J., Mednick S. C., Silva A. J. & Poirazi P. Synaptic clustering within dendrites: an emerging theory of memory formation. Prog. Neurobiol. 126, 19–35 (2015).
- 88.Kastellakis G. & Poirazi P. Synaptic Clustering and Memory Formation. Front. Mol. Neurosci. 12, 300 (2019).
- 89.Kastellakis G., Tasciotti S., Pandi I. & Poirazi P. The dendritic engram. Front. Behav. Neurosci. 17, 1212139 (2023).
- 90.Parthasarathy A. et al. Mixed selectivity morphs population codes in prefrontal cortex. Nat. Neurosci. 20, 1770–1779 (2017).
- 91.Fusi S., Miller E. K. & Rigotti M. Why neurons mix: high dimensionality for higher cognition. Curr. Opin. Neurobiol. 37, 66–74 (2016).
- 92.Barak O., Rigotti M. & Fusi S. The sparseness of mixed selectivity neurons controls the generalization-discrimination trade-off. J. Neurosci. Off. J. Soc. Neurosci. 33, 3844–3856 (2013).
- 93.Rigotti M. et al. The importance of mixed selectivity in complex cognitive tasks. Nature 497, 585–590 (2013).
- 94.Wu X. et al. Mitigating Communication Costs in Neural Networks: The Role of Dendritic Nonlinearity. Preprint at http://arxiv.org/abs/2306.11950 (2023).
- 95.Zhao L. & Zhang Z. A improved pooling method for convolutional neural networks. Sci. Rep. 14, 1589 (2024).
- 96.Ji J., Gao S., Cheng J., Tang Z. & Todo Y. An approximate logic neuron model with a dendritic structure. Neurocomputing 173, 1775–1783 (2016).
- 97.Todo Y., Tamura H., Yamashita K. & Tang Z. Unsupervised learnable neuron model with nonlinear interaction on dendrites. Neural Netw. 60, 96–103 (2014).
- 98.Zhang Y. et al. A Lightweight Multi-Dendritic Pyramidal Neuron Model with Neural Plasticity on Image Recognition. IEEE Trans. Artif. Intell. 1–13 (2024) doi: 10.1109/TAI.2024.3379968.
- 99.Bird A. D., Jedlicka P. & Cuntz H. Dendritic normalisation improves learning in sparsely connected artificial neural networks. PLOS Comput. Biol. 17, e1009202 (2021).
- 100.Hodassman S., Vardi R., Tugendhaft Y., Goldental A. & Kanter I. Efficient dendritic learning as an alternative to synaptic plasticity hypothesis. Sci. Rep. 12, 6571 (2022).
- 101.Mengiste S. A., Aertsen A. & Kumar A. Effect of edge pruning on structural controllability and observability of complex networks. Sci. Rep. 5, 18145 (2015).
- 102.Han S., Pool J., Tran J. & Dally W. Learning both Weights and Connections for Efficient Neural Network. in Advances in Neural Information Processing Systems (eds. Cortes C., Lawrence N., Lee D., Sugiyama M. & Garnett R.) vol. 28 (Curran Associates, Inc., 2015).
- 103.Liu Z. et al. Learning Efficient Convolutional Networks through Network Slimming. in 2017 IEEE International Conference on Computer Vision (ICCV) 2755–2763 (IEEE, Venice, 2017). doi: 10.1109/ICCV.2017.298.
- 104.Mocanu D. C. et al. Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nat. Commun. 9, 2383 (2018).
- 105.Louizos C., Welling M. & Kingma D. P. Learning Sparse Neural Networks through L_0 Regularization. in International Conference on Learning Representations (2018).
- 106.Scardapane S., Comminiello D., Hussain A. & Uncini A. Group sparse regularization for deep neural networks. Neurocomputing 241, 81–89 (2017).
- 107.Patil S. M. & Dovrolis C. PHEW: Constructing Sparse Networks that Learn Fast and Generalize Well without Training Data. in Proceedings of the 38th International Conference on Machine Learning 8432–8442 (PMLR, 2021).
- 108.Tanaka H., Kunin D., Yamins D. L. & Ganguli S. Pruning neural networks without any data by iteratively conserving synaptic flow. in Advances in Neural Information Processing Systems vol. 33 6377–6389 (Curran Associates, Inc., 2020).
- 109.LeCun Y., Bottou L., Bengio Y. & Haffner P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
- 110.Xiao H., Rasul K. & Vollgraf R. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Preprint at http://arxiv.org/abs/1708.07747 (2017).
- 111.Clanuwat T. et al. Deep Learning for Classical Japanese Literature. Preprint at https://doi.org/10.20676/00000341 (2018).
- 112.Cohen G., Afshar S., Tapson J. & van Schaik A. EMNIST: an extension of MNIST to handwritten letters. Preprint at http://arxiv.org/abs/1702.05373 (2017).
- 113.Krizhevsky A. Learning Multiple Layers of Features from Tiny Images. (2009).
- 114.Rousseeuw P. J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
- 115.Paulovich F. V., Nonato L. G., Minghim R. & Levkowitz H. Least Square Projection: A Fast High-Precision Multidimensional Projection Technique and Its Application to Document Mapping. IEEE Trans. Vis. Comput. Graph. 14, 564–575 (2008).
- 116.Rauber P. E., Fadel S. G., Falcao A. X. & Telea A. C. Visualizing the Hidden Activity of Artificial Neural Networks. IEEE Trans. Vis. Comput. Graph. 23, 101–110 (2017).
- 117.Maaten L. van der. Learning a Parametric Embedding by Preserving Local Structure. in Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics 384–391 (PMLR, 2009).
- 118.Chollet F. et al. Keras. (2015).
- 119.Abadi M. et al. TensorFlow: Large-scale machine learning on heterogeneous systems. (2015).
- 120.Harris C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
- 121.Pedregosa F. et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
- 122.McKinney W. pandas: a Foundational Python Library for Data Analysis and Statistics. (2011).
- 123.Hunter J. D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 9, 90–95 (2007).
- 124.Waskom M. L. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
- 125.seaborn-image: image data visualization.
- 126.Vallat R. Pingouin: statistics in Python. J. Open Source Softw. 3, 1026 (2018).
Associated Data
Data Availability Statement
The source code used to generate all figures, together with the data that support this study, is available on GitHub.






