Abstract
Advances in artificial intelligence enable neural networks to learn a wide variety of tasks, yet our understanding of the learning dynamics of these networks remains limited. Here, we study the temporal dynamics during learning of Hebbian feedforward neural networks in tasks of continual familiarity detection. Drawing inspiration from network neuroscience, we examine the network’s dynamic reconfiguration, focusing on how network modules evolve throughout learning. Through a comprehensive assessment involving metrics like network accuracy, modular flexibility, and distribution entropy across diverse learning modes, our approach reveals various previously unknown patterns of network reconfiguration. We find that the emergence of network modularity is a salient predictor of performance and that modularization strengthens with increasing flexibility throughout learning. These insights not only elucidate the nuanced interplay of network modularity, accuracy, and learning dynamics but also bridge our understanding of learning in artificial and biological agents.
Network neuroscience techniques reveal a link between network modularity and task performance in artificial neural networks.
INTRODUCTION
Biological and artificial neural networks (ANNs) are powerful models capable of learning and adapting to new information. Such models are used routinely in neuroscience and artificial intelligence (AI) for various applications related to learning. Neuroscience has traditionally focused on understanding the organizational mechanisms of biological neural networks and how they support cognitive processes (1), although these insights can sometimes inform the development of AI models (2–4). AI, on the other hand, leverages ANNs in a variety of applications, from computer vision and natural language processing to more complex tasks that involve decision-making and prediction (5), with these applications offering insights into the brain’s intricate mysteries (6, 7). While ANNs are often viewed as a digital manifestation of the brain’s workings, a holistic framework that perceives an ANN as a dynamic neural system is conspicuously absent.
Current applications of AI in neuroscience can be broadly classified into two categories. The first category applies ANNs as prediction tools to strengthen the power of identifying associations, e.g., using sophisticated models to map brain connectomes to labels and developing encoding models based on ANN features (6, 8). The second category builds recurrent neural network (RNN) models to execute cognitive tasks, with the goal of understanding the relationship among tasks and the principles of cognition through manipulating the trained ANNs (9, 10). The methodological philosophy behind these two categories of approaches is that similarity in performance may suggest similarity in structure and representation. Yet, the dynamic nature of learning, intrinsic to both artificial and biological entities, remains largely uncharted, underscoring the gaps that persist in our comprehension of ANN methodologies and neural dynamics.
A substantial amount of research has been devoted to understanding the efficacy of ANNs, based primarily on concepts from computational optimization and statistical learning theory. Such perspectives, however, are insufficient to capture the dynamic, nonlinear nature of learning in biological neural networks (11, 12). While we are now able to construct networks capable of impressive feats (13), grasping their underlying learning dynamics is still an emerging frontier. This calls for versatile analytical instruments, capable of dissecting the temporal intricacies of ANNs, reflecting the persistent adaptability evident in biological neural networks during learning phases. We argue that a perspective grounded in computational neuroscience is indispensable, drawing from a set of techniques we label Artificial Network Neuroscience.
Here, we illustrate these ideas by studying the learning dynamics of synaptic networks—specifically, Hebbian feedforward (HebbFF) neural networks—during a continual familiarity detection task (14). We choose this task and model for two reasons. First, memory-related tasks have been widely studied in network neuroscience, especially with respect to network reconfiguration across the learning procedure (15–18). Second, the HebbFF model endowed with synaptic plasticity reproduces experimental results in the memory domain (14, 19, 20), making it an appropriate model for dynamic examination. We hypothesize that leveraging analytical techniques from network neuroscience will facilitate the interpretation of dynamic shifts underlying neural network reconfigurations.
To study the learning dynamics of HebbFF, we developed a multipronged approach examining the dynamic reconfiguration of the networks from different perspectives. We begin with an analysis of the modularity over temporal scales and its relationship to variations in task accuracy and distribution entropy across diverse learning paradigms. We then explore the synchronicity of states during the training phase and its subsequent correlation with accuracy. We show that network modularization increases with learning and that network flexibility serves as a robust metric encapsulating model performance, in line with results from neuroscience in biological organisms. We hope that the analytical approach described here will shed light on the interplay between network modularity, accuracy, and learning dynamics, and ultimately advance our understanding of ANNs and their biological counterparts.
RESULTS
We explored the dynamic reconfiguration of ANNs, focusing on a class of recurrent neural networks (RNNs) called the HebbFF network. This network determines the familiarity of a stimulus x(t) based on whether it matches a stimulus encountered previously (14). The network input is an N-dimensional vector x(t), and the output y(t) indicates whether x(t) equals x(t − R) (Fig. 1A). Note that this task setting resembles classical working memory paradigms, which have been investigated widely in network neuroscience in terms of dynamic network reconfiguration (15, 17). The parameters of this continual familiarity detection task include a repeat interval length, R, and a vector length, N, which we set to R = 5 and N = 100. We use a neural network with M = 120 units in the hidden layer to provide sufficient representational power for encoding the input (Fig. 1B). A more detailed discussion of the memory capacity can be found in (14).
Fig. 1. Functional connection for HebbFF network.
(A) An illustration of the continual familiarity detection task. The network output indicates whether a given stimulus x(t) matches the stimulus presented previously. (B) The HebbFF network, using a hidden layer, encodes the input stimulus and predicts its familiarity through a linear classifier. Here, M is the number of neurons, and T is the number of testing samples. (C) Analogous to a brain network, hidden layer activations are extracted in response to a task sequence, yielding a hidden-layer activation matrix of dimension M × T. The functional connection between neurons i and j is defined as their Pearson’s correlation over the testing samples.
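To make the task concrete, the input sequence can be generated as follows. This is a minimal sketch, assuming ±1 binary stimuli and a fixed repeat probability; the function name `make_familiarity_task` and the parameter `p_repeat` are illustrative choices, not the paper's exact protocol.

```python
import numpy as np

def make_familiarity_task(T=2500, N=100, R=5, p_repeat=0.5, seed=0):
    """Generate a continual familiarity detection sequence.

    At each step, with probability p_repeat the stimulus from R steps
    back is repeated (label 1, familiar); otherwise a fresh random
    binary vector is drawn (label 0, novel).
    """
    rng = np.random.default_rng(seed)
    X = np.empty((T, N))
    y = np.zeros(T, dtype=int)
    for t in range(T):
        if t >= R and rng.random() < p_repeat:
            X[t] = X[t - R]          # repeat of x(t - R): familiar
            y[t] = 1
        else:
            X[t] = rng.choice([-1.0, 1.0], size=N)  # novel stimulus
    return X, y

X, y = make_familiarity_task()
```

With R = 5 and N = 100 as in the text, a correct output y(t) = 1 requires the network to recognize that x(t) exactly matches the stimulus seen five steps earlier.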
We trained 120 network instances representing different subjects. Each network was trained on a training dataset of 2500 samples for 1000 epochs. After every 10 epochs, we evaluated the partially trained model on a testing dataset comprising a separate 2500 samples, examining the activation of the hidden layer units to investigate the emergence of modularization. For each model instance, we thus created a 120 × 2500 × 100 tensor with 120 regions, 2500 test time points, and 100 selected training epochs. As the weights of HebbFF are updated through the task, the activation vectors contain information about short-term memory, which supports combining neighboring samples into time windows. The functional connectivity between each pair of units is computed as the correlation between their activations over the 2500 test samples.
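The functional connectivity step described above reduces, for each epoch, to a single correlation computation over the M × T activation matrix; a minimal sketch (the random activations stand in for real hidden-layer recordings):

```python
import numpy as np

def functional_connectivity(activations):
    """Pearson correlation between every pair of hidden units.

    activations: (M, T) array of M hidden units over T test samples.
    Returns an (M, M) symmetric correlation matrix, as in Fig. 1C.
    """
    return np.corrcoef(activations)

# Example with random activations standing in for the hidden layer
rng = np.random.default_rng(0)
acts = rng.standard_normal((120, 2500))   # M = 120 units, T = 2500 samples
fc = functional_connectivity(acts)
```

Each row of the input is treated as one variable, so `np.corrcoef` directly yields the unit-by-unit connectivity matrix used in all subsequent network analyses.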
In cognitive neuroscience, network modularization is recognized as an important mechanism that enhances computational efficiency and flexibility by facilitating localized information processing (21). To understand the role of modularization in HebbFF, we analyzed the evolution of the network’s community structure throughout the learning trajectory. We found a pronounced decline in the number of communities early in the learning trajectory, concomitant with an escalation in overall network modularity (Fig. 2, A and B). Such trends are indicative of a scenario wherein the quantitative reduction in module count is counterbalanced by an amplification in the representational capability of the extant modules. A deeper probe into the modular allegiance matrix across varied learning epochs reveals a nascent modular structure as early as epoch 10 (Fig. 2C). By epoch 200, this structure had begun to sharpen, with more pronounced modules. By epochs 390 and 580, we observed a further crystallization of these modules, with distinct community structure surfacing. By epochs 770 and 960, the matrix displayed pronounced and contrastive modules, underscoring a heightened modular allegiance. Such dynamics underscore the iterative refinement and consolidation of a community structure during the learning process, echoing the behavior of biological neural networks. These results highlight the emergence of modular, localized processing as learning dynamics are fine-tuned, raising the important question of whether such structure is related to task performance.
Fig. 2. Modularization of hidden layer activation over training.
(A) The number of communities decreases, and (B) the modularity increases through the training. (C) The modularization becomes more pronounced as training progresses, with fewer and more contrastive modules.
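The community count and modularity tracked in Fig. 2 can be extracted from a functional connectivity matrix with standard community detection tools. A sketch under our own simplifying assumptions: positive correlations above a threshold (a value we choose here; the paper's exact pipeline may differ) are kept as weighted edges, and greedy modularity maximization is applied.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

def modularity_of_fc(fc, threshold=0.2):
    """Communities and modularity Q of a functional connectivity matrix.

    Positive correlations above `threshold` become weighted edges;
    communities are found by greedy modularity maximization.
    """
    M = fc.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(M))
    for i in range(M):
        for j in range(i + 1, M):
            if fc[i, j] > threshold:
                G.add_edge(i, j, weight=fc[i, j])
    comms = greedy_modularity_communities(G, weight="weight")
    Q = modularity(G, comms, weight="weight")
    return comms, Q

# Toy connectivity with two planted modules of 5 units each
fc = np.zeros((10, 10))
fc[:5, :5] = 0.8
fc[5:, 5:] = 0.8
np.fill_diagonal(fc, 1.0)
comms, Q = modularity_of_fc(fc)
```

On this toy matrix the algorithm recovers the two planted modules, and Q approaches the ideal value of 0.5 for two equal, disconnected blocks.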
To analyze more closely the increase of modularity during learning, we examined the activations of the hidden layer. We found that, on average, the hidden layer’s activation initially increased in very early stages but declined as training matured toward its latter phases (Fig. 3A). Such a reversal could result from the enhancement of the distribution’s negative spectrum or a possible attenuation of its positive counterpart. Additional insights into this behavior can be extracted from observing the activation distributions across model instances across select epochs, from early to late stages (Fig. 3E). We found that the probability of near-zero activations increased throughout learning, consistent with the increased modularity described previously (Fig. 2C).
Fig. 3. Evolution of activation patterns in the hidden layer over training periods.
(A) Tracing the trajectory of average activation in the hidden layer, an initial moderate increase is evident in the early training epochs, succeeded by a subsequent decline persisting until the latter epochs. (B) An analysis of the activation values’ correlation among neurons in the hidden layer exhibits a consistent decrease as training progresses. (C) A study of the kurtosis of the activation values across epochs shows an initial decrease, followed by a resurgence in the later stages of training. (D) The skewness of activation values delineates an intriguing pattern: an initial decline, transitioning into a bifurcation in later epochs, manifesting both an ascendant trajectory and an alternate trajectory with an initial surge followed by a decline. (E) Epoch-specific histograms depict the activation distribution across select epochs, illustrating a growing concentration of values around zero and a reduction in distributional asymmetry as training advances. (F) Histograms across epochs, analogous to those in (E), elucidate the shifts in correlation values. Early training predominantly showcases correlations proximate to unity. As training matures, the distribution gravitates toward zero, concluding in a mildly zero-skewed distribution punctuated by an isolated peak at one.
We also found interesting patterns in the kurtosis and skewness of the activation distributions. We observed a slight contraction in the kurtosis of hidden layer neural activations early in learning, followed by a marked expansion in the later stages (Fig. 3C). This trend corroborates our observations in Fig. 3E, which demonstrate an accumulation of activation values in the vicinity of zero. The skewness, meanwhile, displayed an intriguing pattern. Early in learning, the distribution of neural activations showed a consistent decline in skewness, representing an increase in the symmetry of the distribution. Yet, as training enters its intermediate and late phases, the skewness bifurcated into dual trajectories. While one set of networks followed a steady increase in skewness, the other showed a rapid increase before a slower decrease. This raises the question of whether these distinct trajectories represent distinct modes of learning, and whether they are relevant for behavior.
To gain additional insights, we examined the correlations within the hidden layer. We found that correlations tended to decrease throughout learning (Fig. 3B). In early stages, hidden layer activations were highly correlated (Fig. 3F, left). As training unfolded, however, the distribution of correlations shifted toward zero, culminating in a slightly zero-skewed distribution punctuated by an isolated peak at one (Fig. 3F, right). Such behavior suggests a desynchronization within the network, pointing to an augmentation in representational power that may underlie an increase in model performance.
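The quantities tracked across Fig. 3 (A to D) can all be computed from a single epoch's activation matrix. A sketch using standard scipy summary statistics (the random input is a stand-in for one epoch's hidden-layer snapshot):

```python
import numpy as np
from scipy.stats import kurtosis, skew

def activation_stats(acts):
    """Summary statistics of a hidden-layer snapshot, as in Fig. 3.

    acts: (M, T) activation matrix for one epoch.
    Returns the mean activation, mean pairwise correlation, and the
    excess kurtosis and skewness of the pooled activation distribution.
    """
    fc = np.corrcoef(acts)
    off_diag = fc[~np.eye(fc.shape[0], dtype=bool)]   # exclude self-correlations
    pooled = acts.ravel()
    return {
        "mean_activation": pooled.mean(),
        "mean_correlation": off_diag.mean(),
        "kurtosis": kurtosis(pooled),   # Fisher convention: normal -> 0
        "skewness": skew(pooled),
    }

rng = np.random.default_rng(0)
stats = activation_stats(rng.standard_normal((120, 2500)))
```

Tracking this dictionary over the 100 recorded epochs reproduces the trajectories plotted in Fig. 3 (A to D), with the histograms of `pooled` and `off_diag` corresponding to Fig. 3 (E and F).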
In network neuroscience, a modular structure holds dual signification: first, as a descriptor of learning phase differentiation; second, as an indicator of interindividual variability during cognitive assessments. To understand the interplay between modularity and performance within HebbFF, we explored the correlation dynamics between modularity and accuracy, both within specified epoch windows and across disparate model instances (Fig. 4). Looking across windows of 100 epochs, we found a consistently positive correlation between modularity and accuracy, suggesting that more modular networks tend to perform better. This correlation approached one during the middle training stages, subsequently retracting to ~0.6 in late stages (Fig. 4A). This trend is congruent with the overarching rise in modularity as training progresses depicted previously (Fig. 2B) and is mirrored by an opposite trend in the number of communities. Given the near-monotonic increase in modularity (Fig. 2B) and in task accuracy (Fig. 4A), the trends in correlations suggest that heightened modularity facilitates, in the middle training stages, the memorization of states by strengthening the separation among different states.
Fig. 4. Correlation dynamics between network modularity and performance accuracy.
(A) Correlation progression between modularity, community number, and accuracy across distinct epoch windows (epochWin); each time point comprises 10 selected epochs corresponding to 100 original epochs in the training procedure. For modularity and model accuracy, initial stages showcase a surge in correlation nearing unity, which subsequently declines to ~0.6 by the concluding stages. For the number of communities and model accuracy, the correlations display an opposite trend and go to almost zero in the late stages. (B) Depiction of correlation distributions corresponding to distinct epoch windows, capturing the variability within individual error bars of (A). (C) Overall correlation trajectory between modularity, number of communities, and accuracy, calculated across diverse model instances. For modularity and model accuracy, a marked positive correlation during the formative training stages inversely transitions to a negative domain during intermediate and advanced stages. For the number of communities and model accuracy, the correlations display an opposite trend and remain positive in the late stages. (D) Mediation analysis among connection strength, network modularity, and model accuracy. Network modularity fully mediates the effect of connection strength on model accuracy.
Holding the epoch constant and examining correlations across model instances, we found a distinct pattern. The modularity-accuracy correlation demonstrates an initial upswing during the formative training epochs, which then descends to negative magnitudes during the intermediate and concluding phases. The community-number-accuracy correlation displays an overall opposite trend and remains positive in the concluding phases (Fig. 4C). Examining these patterns across epoch windows and model instances reveals a particularly intriguing trend. While a single model’s learning curve exhibits a positive correlation between accuracy and modularity, this association switches to negative when examined across model instances. When we calculate the correlation across instances, different instances vary in the number of communities, with a larger number of communities resulting in improved model capacity and better model performance. However, more communities do not necessarily lead to higher modularity values. For a single model, the community structure remains almost unchanged, especially in the late stage of training. Thus, the modularity value is mainly affected by connection strength, whereby higher modularity suggests better separation of different modes and, consequently, higher model accuracy (see fig. S2 and eqs. S1 to S8 for a more detailed mechanistic discussion based on idealized models). These findings suggest that, within models subjected to extended training, a robust modular structure, manifesting as markedly modularized activation states, could potentially compromise representational capacity, consequently attenuating overall performance.
As stated above, the change in connection strength may also be predictive of the model’s accuracy. Thus, to examine whether the effect of modularity on the model accuracy is caused by changes in connection strength, we performed a mediation analysis where model accuracy is the dependent variable, connection strength is the independent variable, and network modularity is the mediator variable. We used a mixed-effects regression model. First, the connection strength significantly predicts the network modularity (effect a: β = −0.1122, PF < 0.0001, note that PF denotes P values associated with the F statistic) and model accuracy (effect c: β = −1.7092, PF < 0.0001). In addition, network modularity predicts model accuracy (effect b: β = 17.8802, PF < 0.0001). However, when we execute the regression of model accuracy on both network modularity and connection strength, the effect of connection strength becomes insignificant (c′: β = 0.2804, PF = 0.1006). Based on these results, we can conclude that network modularity fully mediates the effect of the connection strength on the model accuracy. The causal link between network modularity and model accuracy is also supported by a Granger’s causality analysis (see fig. S1).
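The logic of the mediation analysis above (paths a, b, c, and c′) can be sketched with plain ordinary least squares on simulated data. Note that the paper uses mixed-effects regression; the Baron-Kenny-style OLS version below, and the simulated effect sizes, are purely illustrative.

```python
import numpy as np

def ols_beta(y, X):
    """Slope coefficients from ordinary least squares with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]          # drop the intercept

def mediation(strength, modularity, accuracy):
    """Mediation paths: a (strength -> modularity), c (strength -> accuracy),
    and b, c' from regressing accuracy on both mediator and predictor.
    Full mediation corresponds to c' shrinking toward zero."""
    a = ols_beta(modularity, strength)[0]
    c = ols_beta(accuracy, strength)[0]
    b, c_prime = ols_beta(accuracy, np.column_stack([modularity, strength]))
    return {"a": a, "b": b, "c": c, "c_prime": c_prime}

# Simulate a fully mediated effect: strength drives modularity,
# and modularity alone drives accuracy
rng = np.random.default_rng(1)
s = rng.standard_normal(5000)
m = -0.5 * s + 0.05 * rng.standard_normal(5000)
acc = 2.0 * m + 0.05 * rng.standard_normal(5000)
paths = mediation(s, m, acc)
```

In this simulation c is strongly negative while c′ is near zero, mirroring the pattern reported in the text where the direct effect of connection strength becomes insignificant once modularity is included.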
The results thus far illustrate the relevance of network modularity, a static metric of mesoscale structure, for task performance. To complement these findings, we also examined the network’s modular flexibility, which quantifies the dynamic reconfiguration rate of the network’s modular structure. In biological networks, modular flexibility has been empirically linked to task execution proficiency in neuroscience studies, displaying distinct patterns across learning phases. Here, we examined modular flexibility to assess its connection with model performance across learning stages and across model instances (Fig. 5).
Fig. 5. Evaluating the interplay between accuracy and modular flexibility during training.
(A) Illustration of the construction of multilayer functional networks through correlation within time windows. Here, M is the number of nodes, and L is the number of time windows. We apply the multilayer community detection algorithm to obtain the community structure for the hidden layer in each epoch window and compute the modular flexibility as the changing rate of the community association across all epoch windows. (B) Over the training progression, modular flexibility consistently elevates in tandem with accuracy. Initially, there is a distinctive oscillation in the accuracy curve, mirroring a “warm-up” phase, while modular flexibility follows a clear upward trend. (C) The epoch-wise correlation between modular flexibility and accuracy (spanning individual model instances) first ascends, recording positive values, before descending into negative territory. (D) Analyzing correlations within designated epoch windows, we discern that while both modularity and flexibility exhibit an upward trajectory, flexibility achieves its correlation zenith earlier, subsequently declining to near-zero levels in the concluding stages.
To examine the dynamics of modular flexibility, the sequence of 2500 time points was divided into 50 time windows, each of length 50. The dynamic functional connectivity matrices (22) were then constructed as the Pearson’s matrix within each of the 50 time windows (Fig. 5A). We applied the multi-slice Louvain algorithm (23) to the 50-layer networks to obtain the temporally varying community association (Fig. 5A). We then calculated the network flexibility (15) for each model instance. Initially, we segmented the entire learning procedure (1000 epochs, recorded every 10 epochs) into 10 epoch windows. Within each of these windows, we calculated the correlation between flexibility and accuracy and subsequently aggregated these correlations to produce a composite histogram and error bars. We found a remarkable parallel between the increases in modular flexibility and accuracy over the learning process (Fig. 5B). Similar to the trend found for modularity, correlations between flexibility and accuracy (across model instances) increased in the early learning stages before declining in the subsequent stages (Fig. 5C). Within trained models, we found that lower flexibility, which we associated with increased representational stability, may predict higher memory performance. We found several correlation peaks in epochs 101 to 200 and 201 to 300, resonating with the latter phases of the warm-up and the onset of the progressive stage, respectively. Epoch-specific correlations for the 120 model instances display an intriguing pattern (Fig. 5D). Although flexibility and modularity follow similar trajectories, the peak of the correlation between flexibility and accuracy precedes the peak of the correlation between modularity and accuracy. This suggests that dynamic metrics (e.g., flexibility) might be precursors to the emergent modular representations throughout training. Furthermore, while flexibility contributes to learning, optimal performance may require stable representations.
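Given the multilayer community assignments produced by the multi-slice Louvain algorithm, the flexibility computation itself is a one-liner: the fraction of consecutive layer pairs in which a node switches community. A minimal sketch:

```python
import numpy as np

def flexibility(assignments):
    """Node flexibility from a multilayer community partition.

    assignments: (M, L) array; assignments[i, l] is the community label
    of node i in time-window (layer) l. Flexibility is the fraction of
    consecutive layer pairs in which a node changes community.
    """
    changes = assignments[:, 1:] != assignments[:, :-1]
    return changes.mean(axis=1)          # one value per node

# Toy partition: node 0 never switches, node 1 switches every layer
part = np.array([[1, 1, 1, 1, 1],
                 [1, 2, 1, 2, 1]])
flex = flexibility(part)                  # -> [0.0, 1.0]
```

Averaging `flex` over nodes yields the per-instance flexibility score correlated with accuracy in Fig. 5 (B to D).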
The patterns observed in the distributions of hidden unit activations (Fig. 3, A to D), when viewed alongside the evolution in modular flexibility (Fig. 5A), suggest varying learning trajectories across different model instances. To discern whether prominent learning patterns exist, we analyzed the performance of HebbFF on the test set throughout the learning process (Fig. 6). This analysis revealed two distinct learning modes. In the first mode (learning mode I), accuracy remained nearly constant for the first 200 epochs (Fig. 6, A and B). In the second mode (learning mode II), accuracy increased rapidly in the first few epochs before decreasing again around epoch 100 (Fig. 6, E and F).
Fig. 6. Divergent modes of learning in HebbFF networks.
(A) Learning mode I showcases different evolutionary trends for accuracy, modular flexibility, and distribution entropy. The accuracy curve (B) exhibits a warm-up period of ~20 epochs before ascending to its upper limit. Simultaneously, the modular flexibility (C) consistently grows, reaching a plateau, while the distribution entropy of the hidden layer (D) peaks during the warm-up phase and then diminishes during the early accuracy surge before experiencing another incline as training concludes. In contrast, learning mode II (E) reveals distinct patterns in accuracy, modular flexibility, and distribution entropy compared to mode I. Specifically, the accuracy (F) swiftly climbs to a suboptimum and then recedes, facilitating further exploration within the loss landscape to pinpoint global optima. Despite the fluctuating accuracy, both the modular flexibility (G) and distribution entropy (H) perpetually ascend. For clarity, the normalized score represents the original measurement adjusted to fit within the range of 0 to 1.
In brain networks, network flexibility represents the community adaptation rate throughout time, with higher flexibility often correlating with heightened cognitive task performance. We wished to determine whether this relationship between network flexibility and performance is also found in HebbFF networks. Both learning modes, despite differing warm-up behaviors, consistently exhibited a monotonic increase in modular flexibility, possibly to increase the network’s representation capability and accuracy (Fig. 6, C and G). This raises the possibility that modular flexibility is a potentially broader network characteristic, transcending its traditional heuristic function.
To further examine the representational capacity of the hidden layers, we gauged the distribution entropy of hidden layer activation, interpreting it as the average informational content. For mode I, despite the averaged entropy curve increasing before the 100-epoch mark, substantial fluctuations were noted between 100 and 300 epochs (Fig. 6D). In contrast, mode II experienced a nearly monotonic rise in entropy after a transient in the first few epochs, suggesting consistent informational augmentation parallel to increasing modular flexibility. Intriguingly, both modes demonstrated entropy (24) growth in the latter stages, implying that optimization toward global optima correlates with enhanced representation.
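The distribution entropy used above can be estimated from a histogram of the pooled hidden-layer activations; a sketch, where the bin count is a free parameter of this illustration rather than a value taken from the analysis:

```python
import numpy as np
from scipy.stats import entropy

def distribution_entropy(acts, bins=50):
    """Shannon entropy (bits) of the pooled activation distribution,
    estimated from a histogram. Higher values indicate a richer,
    less concentrated activation repertoire."""
    counts, _ = np.histogram(np.asarray(acts).ravel(), bins=bins)
    p = counts / counts.sum()
    return entropy(p, base=2)

rng = np.random.default_rng(0)
h_rich = distribution_entropy(rng.uniform(-1, 1, size=(120, 2500)))
h_flat = distribution_entropy(np.zeros((120, 2500)))  # fully concentrated
```

A uniform activation distribution approaches the maximum of log2(50) ≈ 5.64 bits, whereas a fully concentrated one scores zero, matching the interpretation of entropy growth as informational augmentation.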
Last, to analyze the generalizability of the link between network modularity and model accuracy, we examined the correlation between hidden layer modularity and model accuracy in a dynamic vision sensor image recognition task (25). This task aims to categorize an image sequence, a task very different from the familiarity detection task (Fig. 7A). The input sequence first goes through a pretrained encoder that transforms it into hidden states. The hidden states are then updated in the form h(t + 1) = W · h(t). Here, the updating weight matrix W is the elementwise product of two matrices: S, which can be either fixed or trained via the Hebbian rule, and F, which is trained via backpropagation. If we fix S as a constant matrix, then the model reduces to a typical RNN model. The hidden layer has 200 dimensions. We ran the model with 120 random seeds to obtain multiple model instances and averaged the curves to achieve stable results (see the Supplementary Materials for a more detailed description of the model setup). First, functional connections, defined as the covariance of h, displayed increasing modularity (Fig. 7B) along with increasing model performance (Fig. 7C). This was true regardless of whether we updated W with the Hebbian rule for S. Using the Hebbian rule, however, led to higher modularity and better performance, as well as faster convergence and earlier overfitting. Overall, these results support our conclusion that ANN modularization is a characteristic of the training procedure.
Fig. 7. Generalization on the spiking neural network with DVS input.
To explore the generalizability of the observed modularity increase over training, we examine the modularity change over training in an image recognition task with a spiking neural network. (A) The model takes an image sequence acquired with the dynamic vision sensor (DVS) camera and outputs the category label. The hidden layer states are updated in a recurrent form with a mixed mechanism of Hebbian rules and backpropagation. (B) Network modularity in the hidden layer increased regardless of whether Hebbian learning was used. The network trained with the Hebbian rule had higher modularity values than the network trained without it. (C) Test accuracy for networks trained with and without Hebbian learning.
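The recurrent update h(t + 1) = W · h(t) with W = S ⊙ F can be sketched numerically as follows. This is a minimal illustration of the update rule described in the text only: the decaying outer-product Hebbian rule, the learning rate `eta`, the decay `lam`, and the random initialization are all assumptions of this sketch, and the actual model is a spiking network with a pretrained encoder.

```python
import numpy as np

def step(h, S, F):
    """One recurrent update h(t+1) = W . h(t), with W = S * F elementwise.

    S may be fixed or updated by a Hebbian rule; F is learned by
    backpropagation. Fixing S as a constant recovers a typical RNN.
    """
    W = S * F                 # elementwise product of the two matrices
    return W @ h

def hebbian_update(S, h_pre, h_post, eta=0.01, lam=0.99):
    """Illustrative Hebbian rule with decay (hypothetical form):
    S <- lam * S + eta * outer(post, pre)."""
    return lam * S + eta * np.outer(h_post, h_pre)

# 200-dimensional hidden state, as in the experiment
rng = np.random.default_rng(0)
D = 200
S = rng.standard_normal((D, D)) / np.sqrt(D)
F = rng.standard_normal((D, D)) / np.sqrt(D)
h = rng.standard_normal(D)

h_next = step(h, S, F)                 # one recurrent step
S = hebbian_update(S, h, h_next)       # plastic update of S only
```

Because only S receives the Hebbian update while F is reserved for backpropagation, disabling `hebbian_update` turns the model into the fixed-S baseline compared in Fig. 7 (B and C).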
DISCUSSION
We studied the learning dynamics of the HebbFF neural network and identified patterns of modularization over the training process. By examining the evolution of modularity, modular flexibility, and entropy throughout the learning process, we obtained insights into the learning behavior of the HebbFF neural network: the modularization of the activation patterns of hidden layer neurons predicts model performance both along training and across model instances. The relationship between modularity and task performance highlights the potential of network neuroscience to characterize the learning behaviors of ANNs.
The relationship between network architecture and learnability has long been an important topic for network science and computational cognitive neuroscience (26). Modularization of network architecture has been shown to support both sustained activity (27, 28) and adaptation to varied goals (29) with high executive efficiency (30). Leveraging RNNs, recent work highlights that spatial constraints (31) and specialized information processing (32) may also demand a certain level of modularization. In brain networks, prior work also demonstrates a correlation of network segregation and integration with the execution of cognitive tasks (33–35).
In the specific case of memory-related tasks, a positive correlation between modularity and performance was found to help an individual learn new skills without forgetting old skills (36). Typically, these results are interpreted in terms of challenging tasks requiring integration between modules, with the need for integration decreasing through learning (33). In line with this prior work, we found that the modularity of HebbFF networks increased during the learning process. These changes reflect the enhancement of modularity values associated with learning (37) and neurodevelopment (38). The fluctuations in flexibility throughout the learning process also mimic prior findings (15, 16), underscoring the role of increased flexibility in supporting task learning for given subjects. In contrast, in late learning stages, we observed a negative association between modularity and accuracy, as well as between flexibility and accuracy. Such a negative association is also observed in reservoir models, where both overly weak and overly strong modularization can harm model performance (39). For brain networks, the negative association aligns with the decrease of modularity (40) and flexibility (41) through adolescent neurodevelopment. In particular, the Simpson’s paradox (42) in Fig. 4 (A and C) may stem from a complex interplay between local cohesion and global connectivity (39), potentially reflecting developmental trends and individual differences in neurodevelopment.
In our investigation of HebbFF networks, we also observed that the processes of segregation and integration occurred not only among different functional modules but also within the representation of diverse features. This suggests that network modularization, as it pertains to feature representation, may hold substantial relevance for understanding the functionality of the brain. In the brain, different systems handle different aspects of the input and decision-making processes, implying that features could be encoded across multiple interconnected systems. This holistic approach to understanding neural networks, by considering the role of feature representation in network modularization, may provide unique insights into the complex dynamics of both artificial and biological learning systems. The consistency of the results on HebbFF and brain networks supports the link between network characteristics and cognitive execution (43). In addition, the role of modularization in representation is also supported by previous work ranging from neuroscience (44) to machine learning (45). Thus, beyond offering a nuanced interpretation of network methodologies, our study points to a potentially unifying understanding of dynamic reconfiguration roles in both human cognition and machine learning.
Existing network neuroscience analyses of functional brain networks typically model the brain as a dynamic system that follows either a random walk (46) or a given geometric flow (47). These analyses rest on the implicit assumption that variations in information segregation and integration produce corresponding differences in the signal space. Here, because functional connectivity is constructed directly from patterns of activation in the hidden layer units, we provide a more direct examination of the relationship between learning behavior and representation topology. This approach offers numerous insights into the performance of ANNs, including how the topology of the feature representation varies in early stages of learning, even when accuracy does not increase. While previous work (12) investigates how network topology affects performance, our work adds to this literature by suggesting that, beyond the network structure and its associated static measurements, the networked dynamics may provide additional evidence on how model performance relates to structure. This aligns with neuroscience findings that the dynamics of functional connectivity provide better predictions of cognitive scores than static measures (48). Accordingly, our work suggests a previously unidentified family of measurements and can facilitate the design of brain-inspired neural networks.
More broadly, our work contributes to the field of AI for neuroscience by providing a better understanding of the dynamical behavior of ANNs through a neuroscience-inspired lens. The endeavor to understand the human brain and develop efficient AI systems has often been a mutually beneficial process. However, a major challenge in AI for neuroscience has been bridging the gap between the static, linear analysis commonly used in machine learning and the dynamic, nonlinear characteristics that are intrinsic to the biological brain. Traditional methods for analyzing neural networks, such as studying the weight and bias parameters, may not capture the complete picture of how an ANN learns, adapts, and evolves over time. In this context, our study provides previously unidentified tools and methodologies to better understand these dynamics. By constructing brain-like networks from HebbFF and applying techniques like community detection and modular analysis, we mirror the modular structure and dynamic reconfiguration seen in the biological brain. In essence, we provide an avenue to study the temporal evolution and adaptation of ANNs during training, similar to the continual reconfiguration observed in the brain during learning. This approach enhances our understanding of the similarities and differences between ANNs and the human brain, opening new avenues for improving the design and training of ANNs. Our work also takes a step toward closing the gap between the simplistic activation and loss landscapes usually used in AI research and the complex, high-dimensional, and dynamic landscapes that are likely in the brain.
In our study, we have primarily concentrated on memory tasks and the emergence of modular structures identified through modularity maximization (49). We note, however, that various other methodologies can identify modular structures, and that their significance can be quantified through different means, such as block-based models (50) or comparisons of the relative strengths of inter-block and intra-block connections (51). Moreover, while flexibility in brain networks is commonly interpreted as a marker of neural plasticity at the network level (26), in ANNs, it relates to the system’s capacity to adapt and represent multiple states. This adaptability is influenced not only by the architecture of the network but also by the chosen hyperparameters.
In sum, our discoveries contribute to the broader aim at the intersection of AI and neuroscience: using AI not only to replicate but also to understand and learn from the intricate workings of the brain. The tools and methods that we developed present previously unexplored opportunities to study learning dynamics in both artificial and biological neural networks. Such cross-fertilization of ideas can potentially lead to more efficient, adaptable, and robust AI systems while providing insights into the neuroscience of learning and memory. Future work should expand our approach to other cognitive tasks, such as multimodal matching, value-based decision-making, and perception, as well as to other types of ANNs, including recurrent and deep feedforward networks. We hope that multimodal continual tasks learned through complex networks could serve as a digital analog of the brain in terms of cognitive execution and may provide novel insights into how functional modules reconfigure to support complex tasks.
MATERIALS AND METHODS
HebbFF network architecture
We adopted the same HebbFF network used for the continual familiarity detection task in (14). In this task, the Hebbian network takes a stream of stimuli in the form of randomly generated N × 1–dimensional input vectors x(t) and returns a label y(t) that indicates whether x(t) has appeared previously. For both the training and testing datasets, x(t) is generated as a ±1 binary vector that equals x(t − R) with probability P. Here, R is the repeat interval length. The HebbFF network consists of three layers: the input, output, and hidden layers. The hidden layer consists of M neurons. We use an M × 1 vector a(t) to denote the hidden state and h(t) = σ[a(t)] to denote the activation state after an activation function σ(·), which takes the form of the sigmoid function. The hidden state a(t) is obtained through a linear transformation of the input x(t), given by
a(t) = [W1 + A(t)]x(t) + b1 | (1) |
where W1 is an M × N matrix denoting the affine transformation and b1 is an M × 1 vector denoting the bias. The matrix A(t) is the plasticity matrix, which is updated at every time step to maintain the memory. It is calculated as
A(t + 1) = λA(t) + ηh(t)x(t)T | (2) |
where λ is the learnable decay parameter and η is the learnable familiarity learning rate. When η > 0, the learning is Hebbian; when η < 0, it is anti-Hebbian. As demonstrated in (14) and supported by our own experiments, we adopt the anti-Hebbian learning rule for familiarity detection. Further, the readout is given by
ŷ(t) = σ[W2h(t) + b2] | (3) |
where W2, of dimension 1 × M, and b2, of dimension 1 × 1, are the learnable readout weight and bias. The training loss is then set as
L = −∑t {y(t) log ŷ(t) + [1 − y(t)] log [1 − ŷ(t)]} | (4) |
which represents the cross-entropy between the label y and the prediction ŷ. The activation values a(t) and A(t) are updated throughout training and thus can be used to analyze the network dynamics. The network structure is shown in Fig. 1 (A and B).
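As a concrete illustration, Eqs. 1 to 3 can be sketched in a few lines of NumPy. This is a minimal forward-pass sketch only: in the actual HebbFF setup, W1, b1, W2, b2, λ, and η are meta-learned, whereas here they are fixed for illustration, and the class and variable names are ours rather than from (14).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class HebbFFSketch:
    """Minimal HebbFF forward pass (Eqs. 1 to 3) with fixed parameters."""
    def __init__(self, N, M, lam=0.5, eta=-0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 1 / np.sqrt(N), (M, N))
        self.b1 = np.zeros(M)
        self.W2 = rng.normal(0, 1 / np.sqrt(M), (1, M))
        self.b2 = np.zeros(1)
        self.A = np.zeros((M, N))          # plasticity matrix A(t)
        self.lam, self.eta = lam, eta      # eta < 0: anti-Hebbian rule

    def step(self, x):
        a = (self.W1 + self.A) @ x + self.b1                     # Eq. 1
        h = sigmoid(a)
        self.A = self.lam * self.A + self.eta * np.outer(h, x)   # Eq. 2
        y_hat = float(sigmoid(self.W2 @ h + self.b2))            # Eq. 3
        return a, h, y_hat

net = HebbFFSketch(N=25, M=16)
x = np.sign(np.random.default_rng(1).standard_normal(25))  # ±1 stimulus
a, h, y_hat = net.step(x)
print(a.shape, h.shape, 0.0 < y_hat < 1.0)
```

In training, the parameters would instead be optimized against the cross-entropy loss of Eq. 4 over the stimulus stream.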
Construction of brain-like networks from the HebbFF activation
For an instance of HebbFF, after training for t epochs, we evaluate the model on the testing dataset Xtest of dimension N × T and collect the hidden states into a matrix AH = [a(1), …, a(T)] of dimension M × T, which can be taken as the recorded activation sequence of the HebbFF network continually executing T tasks. We then divide the full AH into L time windows, each of length Z = T/L. As the length parameter T is fully controllable, we assume for simplicity that T is divisible by Z. We obtain a multilayer network {Fijl}, where Fijl is the Pearson correlation between the ith and jth rows of AH within the lth time window after the kth epoch of training. On the basis of these networks along the full K epochs of training, we can perform network-based analyses to characterize the learning behavior of HebbFF through training.
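The windowing step above can be sketched as follows. This is a minimal NumPy sketch: the function name is ours, and random data stand in for the recorded activation matrix AH.

```python
import numpy as np

def hidden_states_to_layers(AH, Z):
    """Split an M x T hidden-state matrix into T//Z windows of length Z
    and return one M x M Pearson correlation matrix per window,
    i.e., one layer of the multilayer network per time window."""
    M, T = AH.shape
    L = T // Z                        # assumes T divisible by Z
    layers = []
    for l in range(L):
        window = AH[:, l * Z:(l + 1) * Z]   # M x Z block of activations
        F = np.corrcoef(window)             # F[i, j]: correlation of units i, j
        layers.append(F)
    return np.stack(layers)                 # L x M x M multilayer network

rng = np.random.default_rng(0)
AH = rng.standard_normal((16, 200))   # stand-in for recorded activations
F = hidden_states_to_layers(AH, Z=50)
print(F.shape)  # (4, 16, 16)
```

Repeating this after each training epoch yields the full series of networks analyzed below.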
Modularization
Community detection is a method that decomposes a system into subsystems (23). Consider a given multilayer network {Fijl}, where i and j index the nodes and l indexes the layers. The multilayer modularity function is given as
Q = (1/2μ)∑ijlr {[Fijl − γl(kilkjl/2ml)]δlr + δijCjlr}δ(gil, gjr) | (5) |
where the adjacency matrix of layer l has components Fijl; 2ml = ∑ij Fijl is the total edge weight of layer l; γl is the resolution parameter of layer l; Cjlr is the interlayer coupling of node j between layers l and r; gil and gjr give the community assignments of nodes i and j in layers l and r, respectively; and kil is the strength of node i in layer l, with 2μ = ∑jrκjr, κjl = kjl + cjl, and cjl = ∑rCjlr. When Fijl is signed, one can split the positive and negative parts and construct the modularity function analogously, as shown in (52). By maximizing Q, we obtain the community assignment gil of each node in each layer, which allows us to further investigate the networked dynamics of modules. Further, the module-allegiance matrix P is defined as the matrix whose element Pij denotes the frequency with which nodes i and j are assigned to the same community.
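For concreteness, Eq. 5 can be evaluated directly for a given partition. The sketch below is illustrative NumPy, not the Louvain-style maximization used in practice, and it assumes a uniform all-to-all interlayer coupling ω between copies of the same node (a common convention; the paper's exact coupling scheme is not restated here).

```python
import numpy as np

def multilayer_modularity(F, g, gamma=1.0, omega=1.0):
    """Evaluate Eq. 5 for a given partition.
    F: L x M x M array of non-negative layer adjacency matrices.
    g: L x M array of community labels g[l, i].
    Interlayer coupling C_jlr is the uniform value `omega` between
    every pair of layers for the same node j."""
    L, M, _ = F.shape
    k = F.sum(axis=2)                   # k[l, i]: strength of node i in layer l
    two_m = F.sum(axis=(1, 2))          # 2*m_l for each layer
    two_mu = two_m.sum() + M * L * omega * (L - 1)
    Q = 0.0
    for l in range(L):                  # intralayer term with null model
        for i in range(M):
            for j in range(M):
                if g[l, i] == g[l, j]:
                    Q += F[l, i, j] - gamma * k[l, i] * k[l, j] / two_m[l]
    for j in range(M):                  # interlayer term (delta_ij * C_jlr)
        for l in range(L):
            for r in range(L):
                if r != l and g[l, j] == g[r, j]:
                    Q += omega
    return Q / two_mu

# Two identical layers, each with two dense 4-node blocks.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
F = np.stack([A, A])
g_good = np.tile([0, 0, 0, 0, 1, 1, 1, 1], (2, 1))  # matches the blocks
g_bad = np.tile([0, 1, 0, 1, 0, 1, 0, 1], (2, 1))   # ignores the blocks
print(multilayer_modularity(F, g_good) > multilayer_modularity(F, g_bad))  # True
```

The block-aligned partition scores higher, as expected; maximizing Q over all partitions recovers it.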
Dynamic reconfiguration of HebbFF networks
On the basis of the constructed network series {Fijl} and the associated community structure gil, we can then define the modular flexibility fi of node i as the frequency with which its community assignment changes over time (15), i.e., fi = ni/(L − 1), where ni is the number of times node i changes its community assignment across the L − 1 pairs of consecutive layers. Further, we can define the modular flexibility of the system as the average over nodes, f = (1/M)∑ifi. If the system has high flexibility, then the module structure of the system changes rapidly, suggesting high flexibility in the representation of the learned features.
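This definition is a short computation over the label matrix; a minimal sketch (function name ours):

```python
import numpy as np

def flexibility(g):
    """g: L x M array of community labels over L layers.
    f_i is the fraction of the L-1 consecutive layer pairs in which
    node i changes community; system flexibility is the mean over nodes."""
    changes = (g[1:] != g[:-1])      # (L-1) x M boolean change indicators
    f_nodes = changes.mean(axis=0)
    return f_nodes, f_nodes.mean()

g = np.array([[0, 0, 1],
              [0, 1, 0],
              [0, 1, 1]])            # 3 layers, 3 nodes
f_nodes, f_sys = flexibility(g)
# node 0 never switches, node 1 switches once, node 2 switches twice
print(f_nodes, f_sys)
```

Here f_nodes is [0, 0.5, 1.0] and the system flexibility is 0.5.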
Information entropy of activation variables
For a HebbFF network, the modular structure of the activation-induced correlation network is supported by the similarity of the neurons’ activation patterns. To quantify the extent to which different regions participate in supporting the temporal dynamics, we adopt the entropy of a random variable, which quantifies the average information contained in its outcomes (24). For a continuous distribution, it is defined as the expectation of the negative log density, H(x) = Ex[−log P(x)]. For each HebbFF network, after it has processed K samples, we can collect K activation vectors a1, …, aK. On the basis of these vectors, we can estimate H(a) as the information contained in the hidden layer. Here, because our purpose is not to define the information quantity precisely, for simplicity we assume that a follows a multivariate Gaussian distribution.
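Under the Gaussian assumption, the entropy has the closed form H = ½ log[(2πe)^M det Σ], with Σ the covariance of the activations. A minimal sketch of this estimator (function name and ridge term ours, added for numerical stability):

```python
import numpy as np

def gaussian_entropy(samples):
    """Entropy (in nats) of activation vectors under a multivariate
    Gaussian assumption: H = 0.5 * log((2*pi*e)^M * det(Sigma)).
    samples: K x M matrix of activation vectors a_1, ..., a_K."""
    K, M = samples.shape
    Sigma = np.cov(samples, rowvar=False)
    Sigma = Sigma + 1e-9 * np.eye(M)        # ridge for near-singular covariance
    sign, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * (M * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(0)
a = rng.standard_normal((5000, 4))          # stand-in activation vectors
H = gaussian_entropy(a)
print(H)  # close to the entropy of N(0, I_4), i.e., 2*log(2*pi*e)
```

Higher values indicate that the hidden-layer activations spread over a larger effective volume of state space.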
Mixed-effects model
We use a mixed-effects model for the regressions examining the relationships between modularity, connection strength, and model accuracy. Suppose the dependent variable is Yij for the ith model and jth sample and the independent variable is Xj. Then, the mixed-effects model is given as Yij = μ + Ui + Vi · Xj + β · Xj + ϵij, where Ui and Vi are the random intercept and slope effects for the ith model instance and μ and β are the fixed effects. We then apply the fixed effects in constructing the mediation analysis.
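One possible implementation of this model is statsmodels' MixedLM with a random intercept and a random slope grouped by model instance; the paper does not specify the software used, so the sketch below, with synthetic data and illustrative effect sizes, is only one way to fit it.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: 10 model instances, each with its own random intercept
# U_i and random slope V_i around the fixed effects mu = 0.5, beta = 2.0.
rng = np.random.default_rng(0)
rows = []
for i in range(10):
    U_i = rng.normal(0, 0.3)
    V_i = rng.normal(0, 0.2)
    for _ in range(30):
        x = rng.uniform(0, 1)
        y = 0.5 + U_i + (2.0 + V_i) * x + rng.normal(0, 0.1)
        rows.append({"model": i, "x": x, "y": y})
df = pd.DataFrame(rows)

# Random intercept and random slope for x, grouped by model instance
md = smf.mixedlm("y ~ x", df, groups=df["model"], re_formula="~x")
fit = md.fit()
print(fit.params["Intercept"], fit.params["x"])  # estimates of mu and beta
```

The recovered fixed effects μ and β are then the quantities carried into the mediation analysis.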
Acknowledgments
We thank S. Deng at UESTC for help in implementing the SNN experiments.
Funding: S.G. is supported by NSFC Key Program, 62236009; Shenzhen Fundamental Research Program (General Program), JCYJ 20210324140807019; NSFC General Program, 61876032; and Key Laboratory of Data Intelligence and Cognitive Computing, Longhua District, Shenzhen.
Author contributions: Conceptualization, methodology, investigation, and visualization: S.G. Writing—original draft: S.G. and M.G.M. Writing—review and editing: S.G., M.G.M., H.T., and G.P.
Competing interests: The authors declare that they have no competing interests.
Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. All original data in this work were generated by the program provided at https://github.com/dtyulman/hebbff, with some in-house adaptations for storage and parallel execution. The code for data processing and analysis can be found at Zenodo (https://doi.org/10.5281/zenodo.12207489).
Supplementary Materials
This PDF file includes:
Supplementary Results
Figs. S1 and S2
References
REFERENCES AND NOTES
- 1. Yang G. R., Joglekar M. R., Song H. F., Newsome W. T., Wang X. J., Task representations in neural networks trained to perform many cognitive tasks. Nat. Neurosci. 22, 297–306 (2019).
- 2. Pei J., Deng L., Song S., Zhao M., Zhang Y., Wu S., Wang G., Zou Z., Wu Z., He W., Chen F., Deng N., Wu S., Wang Y., Wu Y., Yang Z., Ma C., Li G., Han W., Li H., Wu H., Zhao R., Xie Y., Shi L., Towards artificial general intelligence with hybrid Tianjic chip architecture. Nature 572, 106–111 (2019).
- 3. Schuman C. D., Kulkarni S. R., Parsa M., Mitchell J. P., Date P., Kay B., Opportunities for neuromorphic computing algorithms and applications. Nat. Comput. Sci. 2, 10–19 (2022).
- 4. Ullman S., Using neuroscience to develop artificial intelligence. Science 363, 692–693 (2019).
- 5. LeCun Y., Bengio Y., Hinton G., Deep learning. Nature 521, 436–444 (2015).
- 6. Y. Takagi, S. Nishimoto, “High-resolution image reconstruction with latent diffusion models from human brain activity” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2023), pp. 14453–14463.
- 7. Yamins D. L. K., DiCarlo J. J., Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. 19, 356–365 (2016).
- 8. Du C., Fu K., Li J., He H., Decoding visual neural representations by multimodal learning of brain-visual-linguistic features. IEEE Trans. Pattern Anal. Mach. Intell. 45, 10760–10777 (2023).
- 9. Ito T., Yang G. R., Laurent P., Schultz D. H., Cole M. W., Constructing neural network models from brain data reveals representational transformations linked to adaptive behavior. Nat. Commun. 13, 673 (2022).
- 10. Yang G. R., Molano-Mazón M., Towards the next generation of recurrent network models for cognitive neuroscience. Curr. Opin. Neurobiol. 70, 182–192 (2021).
- 11. Wang X.-J., Theory of the multiregional neocortex: Large-scale neural dynamics and distributed cognition. Annu. Rev. Neurosci. 45, 533–560 (2022).
- 12. S. Xie, A. Kirillov, R. Girshick, K. He, “Exploring randomly wired neural networks for image recognition” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (IEEE, 2019), pp. 1284–1293.
- 13. S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg, H. Nori, H. Palangi, M. T. Ribeiro, Y. Zhang, Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv:2303.12712 (2023).
- 14. Tyulmankov D., Yang G. R., Abbott L. F., Meta-learning synaptic plasticity and memory addressing for continual familiarity detection. Neuron 110, 544–557.e8 (2022).
- 15. Bassett D. S., Wymbs N. F., Porter M. A., Mucha P. J., Carlson J. M., Grafton S. T., Dynamic reconfiguration of human brain networks during learning. Proc. Natl. Acad. Sci. U.S.A. 108, 7641–7646 (2011).
- 16. Braun U., Schäfer A., Walter H., Erk S., Romanczuk-Seiferth N., Haddad L., Schweiger J. I., Grimm O., Heinz A., Tost H., Meyer-Lindenberg A., Bassett D. S., Dynamic reconfiguration of frontal brain networks during executive cognition in humans. Proc. Natl. Acad. Sci. U.S.A. 112, 11678–11683 (2015).
- 17. Braun U., Harneit A., Pergola G., Menara T., Schäfer A., Betzel R. F., Zang Z., Schweiger J. I., Zhang X., Schwarz K., Chen J., Blasi G., Bertolino A., Durstewitz D., Pasqualetti F., Schwarz E., Meyer-Lindenberg A., Bassett D. S., Tost H., Brain network dynamics during working memory are modulated by dopamine and diminished in schizophrenia. Nat. Commun. 12, 3478 (2021).
- 18. He Y., Liang X., Chen M., Tian T., Zeng Y., Liu J., Hao L., Xu J., Chen R., Wang Y., Gao J.-H., Tan S., Taghia J., He Y., Tao S., Dong Q., Qin S., Development of brain state dynamics involved in working memory. Cereb. Cortex 33, 7076–7087 (2023).
- 19. Qin S., Farashahi S., Lipshutz D., Sengupta A. M., Chklovskii D. B., Pehlevan C., Coordinated drift of receptive fields in Hebbian/anti-Hebbian network models during noisy representation learning. Nat. Neurosci. 26, 339–349 (2023).
- 20. H. G. Rodriguez, Q. Guo, T. Moraitis, “Short-term plasticity neurons learning to learn and forget” in Proceedings of the 39th International Conference on Machine Learning (ACM, 2022), pp. 18704–18722.
- 21. Sporns O., Betzel R. F., Modular brain networks. Annu. Rev. Psychol. 67, 613–640 (2016).
- 22. Preti M. G., Bolton T. A., Van De Ville D., The dynamic functional connectome: State-of-the-art and perspectives. Neuroimage 160, 41–54 (2017).
- 23. Mucha P. J., Richardson T., Macon K., Porter M. A., Onnela J.-P., Community structure in time-dependent, multiscale, and multiplex networks. Science 328, 876–878 (2010).
- 24. D. J. MacKay, Information Theory, Inference and Learning Algorithms (Cambridge Univ. Press, 2003).
- 25. A. Amir, B. Taba, D. Berg, T. Melano, J. McKinstry, C. Di Nolfo, T. Nayak, A. Andreopoulos, G. Garreau, M. Mendoza, J. Kusnitz, M. Debole, S. Esser, T. Delbruck, M. Flickner, D. Modha, “A low power, fully event-based gesture recognition system” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2017), pp. 7243–7252.
- 26. Zurn P., Bassett D. S., Network architectures supporting learnability. Philos. Trans. R. Soc. Lond. B Biol. Sci. 375, 20190323 (2020).
- 27. Wang S.-J., Hilgetag C., Zhou C., Sustained activity in hierarchical modular neural networks: Self-organized criticality and oscillations. Front. Comput. Neurosci. 5, 30 (2011).
- 28. Kaiser M., Hilgetag C., Optimal hierarchical modular topologies for producing limited sustained activation of neural networks. Front. Neuroinform. 4, 8 (2010).
- 29. Kashtan N., Alon U., Spontaneous evolution of modularity and network motifs. Proc. Natl. Acad. Sci. 102, 13773–13778 (2005).
- 30. Clune J., Mouret J.-B., Lipson H., The evolutionary origins of modularity. Proc. Biol. Sci. 280, 20122863 (2013).
- 31. Achterberg J., Akarca D., Strouse D. J., Duncan J., Astle D. E., Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings. Nat. Mach. Intell. 8, 1369–1381 (2023).
- 32. J. Tanner, S. Mansour L, L. Coletta, A. Gozzi, R. F. Betzel, Functional connectivity modules in recurrent neural networks: Function, origin and dynamics. arXiv:2310.20601 (2023).
- 33. Bassett D. S., Yang M., Wymbs N. F., Grafton S. T., Learning-induced autonomy of sensorimotor systems. Nat. Neurosci. 18, 744–751 (2015).
- 34. Tononi G., Sporns O., Edelman G. M., A measure for brain complexity: Relating functional segregation and integration in the nervous system. Proc. Natl. Acad. Sci. 91, 5033–5037 (1994).
- 35. Wang R., Liu M., Cheng X., Wu Y., Hildebrandt A., Zhou C., Segregation, integration, and balance of large-scale resting brain networks configure different cognitive abilities. Proc. Natl. Acad. Sci. U.S.A. 118, e2022288118 (2021).
- 36. Ellefsen K. O., Mouret J.-B., Clune J., Neural modularity helps organisms evolve to learn new skills without forgetting old skills. PLoS Comput. Biol. 11, e1004128 (2015).
- 37. Finc K., Bonna K., He X., Lydon-Staley D. M., Kühn S., Duch W., Bassett D. S., Dynamic reconfiguration of functional brain networks during working memory training. Nat. Commun. 11, 2435 (2020).
- 38. Baum G. L., Ciric R., Roalf D. R., Betzel R. F., Moore T. M., Shinohara R. T., Kahn A. E., Vandekar S. N., Rupert P. E., Quarmley M., Cook P. A., Elliott M. A., Ruparel K., Gur R. E., Gur R. C., Bassett D. S., Satterthwaite T. D., Modular segregation of structural brain networks supports the development of executive function in youth. Curr. Biol. 27, 1561–1572.e8 (2017).
- 39. Rodriguez N., Izquierdo E., Ahn Y.-Y., Optimal modularity and memory capacity of neural reservoirs. Netw. Neurosci. 3, 551–566 (2019).
- 40. Cohen J. R., D’Esposito M., The segregation and integration of distinct brain networks and their relationship to cognition. J. Neurosci. 36, 12083–12094 (2016).
- 41. Gu S., Fotiadis P., Parkes L., Xia C. H., Gur R. C., Gur R. E., Roalf D. R., Satterthwaite T. D., Bassett D. S., Network controllability mediates the relationship between rigid structure and flexible dynamics. Netw. Neurosci. 6, 275–297 (2022).
- 42. Simpson E. H., The interpretation of interaction in contingency tables. J. R. Stat. Soc. Series B 13, 238–241 (1951).
- 43. Medaglia J. D., Lynall M.-E., Bassett D. S., Cognitive network neuroscience. J. Cogn. Neurosci. 27, 1471–1491 (2015).
- 44. Shine J. M., Neuromodulatory influences on integration and segregation in the brain. Trends Cogn. Sci. 23, 572–583 (2019).
- 45. Liu Z., Kitouni O., Nolte N. S., Michaud E., Tegmark M., Williams M., Towards understanding grokking: An effective theory of representation learning. Adv. Neural Inf. Process. Syst. 35, 34651–34663 (2022).
- 46. Bullmore E., Sporns O., Complex brain networks: Graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10, 186–198 (2009).
- 47. Deco G., Kringelbach M. L., Great expectations: Using whole-brain computational connectomics for understanding neuropsychiatric disorders. Neuron 84, 892–905 (2014).
- 48. Liégeois R., Li J., Kong R., Orban C., Van De Ville D., Ge T., Sabuncu M. R., Yeo B. T. T., Resting brain dynamics at different timescales capture distinct aspects of human behavior. Nat. Commun. 10, 2317 (2019).
- 49. Blondel V. D., Guillaume J.-L., Lambiotte R., Lefebvre E., Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).
- 50. Aicher C., Jacobs A. Z., Clauset A., Learning latent block structure in weighted networks. J. Complex Netw. 3, 221–248 (2015).
- 51. Nematzadeh A., Ferrara E., Flammini A., Ahn Y.-Y., Optimal network modularity for information diffusion. Phys. Rev. Lett. 113, 088701 (2014).
- 52. Gu S., Xia C. H., Ciric R., Moore T. M., Gur R. C., Gur R. E., Satterthwaite T. D., Bassett D. S., Unifying the notions of modularity and core–periphery structure in functional brain networks during youth. Cereb. Cortex 30, 1087–1102 (2020).
- 53. Geweke J., Measurement of linear dependence and feedback between multiple time series. J. Am. Stat. Assoc. 77, 304–313 (1982).
- 54. Fang W., Chen Y., Ding J., Yu Z., Masquelier T., Chen D., Huang L., Zhou H., Li G., Tian Y., SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence. Sci. Adv. 9, eadi1480 (2023).