Bayesian inference is facilitated by modular neural networks with different time scales

Kohei Ichikawa; Kunihiko Kaneko

doi:10.1371/journal.pcbi.1011897

PLoS Comput Biol. 2024 Mar 13;20(3):e1011897.

Editor: Thomas Serre

PMCID: PMC10962854 PMID: 38478575

Abstract

Various animals, including humans, have been suggested to perform Bayesian inference to handle noisy, time-varying external information. For the brain to perform Bayesian inference, the prior distribution must be acquired and represented by sampling noisy external inputs. However, the mechanism by which neural activities represent such distributions has not yet been elucidated. Our findings reveal that networks with modular structures, composed of fast and slow modules, are adept at representing this prior distribution, enabling more accurate Bayesian inference. Specifically, a modular network consisting of a main module connected to the input and output layers and a sub-module with slower neural activity connected only to the main module outperformed networks with uniform time scales. Prior information was represented specifically by the slow sub-module, which could integrate observed signals over an appropriate period and represent input means and variances. Accordingly, the neural network could effectively predict the time-varying inputs. Furthermore, when the time scales of neurons were trained starting from networks with uniform time scales and no modular structure, the slow-fast modular structure described above, together with a division of roles in which prior knowledge is selectively represented in the slow sub-module, emerged spontaneously. These results explain how the prior distribution for Bayesian inference is represented in the brain, provide insight into the relevance of modular structure with a time scale hierarchy to information processing, and elucidate the significance of brain areas with slower time scales.

Author summary

Bayesian inference is essential for predicting noisy inputs in the environment and is suggested to be common in various animals, including humans. For the brain to perform Bayesian inference, the prior distribution of the signal must be acquired and represented in neural networks by sampling noisy inputs, so that the posterior distribution of signals can be estimated. By training recurrent neural networks to predict time-varying inputs, we demonstrated that networks with a modular structure, in which the main module exhibits faster neural activity and the sub-module slower neural activity, achieved highly accurate Bayesian inference in the required task. In these networks, the prior distribution was specifically represented by the slower sub-module, which effectively integrated earlier inputs. Furthermore, this modular structure with different time scales and the division of representational roles emerged spontaneously through learning Bayesian inference. Our results demonstrate a general mechanism for encoding prior distributions and highlight the importance of the brain’s modular structure with time scale differentiation for Bayesian information processing.

Introduction

In the brains of humans and various other animals, information processing involves inference from external inputs received through the sensory systems, which must extract information under uncertainty [1] caused by noise. Previous studies have suggested that, to predict time-varying noisy inputs, animals such as humans and monkeys process them within a Bayesian inference framework to deal with this uncertainty [2–13].

Bayesian inference computes the posterior by combining the prior with the likelihood. The prior is acquired from the history of inputs and provides information for predicting the signal in advance, whereas the likelihood is estimated by observing the current input signal. Prior knowledge is thus thought to be first represented in the brain and then adjusted over time according to the input history. However, how prior information is shaped in the brain remains elusive. In machine learning, several models, such as variational recurrent neural networks (RNNs) [14, 15], have been proposed that perform Bayesian inference by computing the prior from external signals. However, these models are designed specifically for Bayesian inference, for instance, by introducing dedicated neurons for expressing the prior in advance. Therefore, they cannot answer how the prior is shaped in the brain or which brain structures are relevant to performing Bayesian inference. Here, we explore how neural networks acquire and represent prior knowledge to predict time-varying noisy inputs by performing Bayesian inference.

In this study, recalling that a deterministic neural network with a simple learning algorithm can perform probabilistic inference [16], we investigate which types of RNNs can better predict stochastic time-varying inputs by acquiring prior knowledge to perform Bayesian inference. Note that the mechanism for acquiring prior knowledge itself was studied earlier for fixed inputs [17]. In that setting, however, a fixed prior is sufficient, and how a time-varying prior is shaped in neural networks to predict time-varying inputs was not considered.

To discuss which RNN structures are relevant to shaping Bayesian inference, we recall the hierarchical structure of the brain, with its functionally differentiated areas. In fact, some experiments suggest that the prior and the likelihood for Bayesian inference are encoded in different brain areas [18–20], although the mechanism possibly underlying these results remains controversial. On the other hand, the relevance of area differentiation to Bayesian inference can be expected theoretically as follows. In general, obtaining the prior requires estimating the prior distribution from previous observations. For this, the population of neurons representing the prior must integrate observed inputs over some time span. One possible mechanism for achieving such integration is to adopt two neural modules operating at distinct time scales: a downstream neuron population with slower activity changes, separated from an upstream neuron population that processes input information. A slow module that does not directly receive inputs may thus be relevant for integrating inputs over some time span.

In fact, some experimental reports have suggested that the time scale of neural activity is slow in higher layers of the brain that do not directly receive external input [21–23], which may serve to integrate the activities of lower layers. Note also that the putamen, amygdala, insula, and orbitofrontal cortex have been found to represent the prior in an experiment [18].

Inspired by these experiments and theoretical considerations, we studied RNN models with two modules: a main module directly connected to the input and output layers, and a sub-module directly connected to the main module but not to the input and output layers (i.e., a hierarchical structure) (Fig 1). We applied to the RNN a time-varying stochastic input whose mean changes over time and around which Gaussian noise is added. We trained the RNN to minimize the prediction error and confirmed that it could predict the input, with an appropriate modular structure shaping the prior for Bayesian inference. We examined the possible role of the modular structure and the importance of the time scale difference between the main and sub-modules in forming the prior representation for Bayesian inference. We found that RNNs with a modular structure shape the prior more accurately than regular RNNs when the input signal contains noise.

Fig 1 — (a) Standard RNN without modular structure (b) RNN with modular structure.

Further, Bayesian inference was more accurate when the time scale of the sub-module was appropriately slower. When the time scale was uniform, prior information was maintained in both the main module and the sub-module, and the prediction performance for time-varying input was relatively low. In contrast, when the time scales differed, prior information was represented by the slow sub-module, and the prediction performance was considerably higher. In the latter case, the variance of the prior was embedded in the neural manifold of the slower sub-module. With this embedding of variance by the sub-module, changes in the input mean are clearly distinguished from noise, leading to better Bayesian inference.

In addition, we trained the RNN so that the connection structure and the time scales of neurons also changed, and examined whether a modular structure with distinct time scales would emerge from a homogeneous neural network. As training progressed, the time scales of neurons differentiated into slower and faster ones. A modular structure arose in which the slower neurons were separated from the input/output layers and were predominantly connected to the fast neurons, which in turn were connected to the input/output layers. This sub-module of slow neurons distinctly represented the prior information.

These results help explain how the prior for Bayesian inference is represented in neural networks and provide insight into the relationship between neural dynamics and structure [24–28] underlying information processing in the brain.

Materials and methods

Recurrent neural networks with/without modular structure

To investigate the effect of structure and time scale on Bayesian inference, we considered two types of RNNs [29]: a standard RNN without modular structure and an RNN with a modular structure.

First, we adopted a standard RNN consisting of an input layer, a recurrent (hidden) layer, and an output layer, as shown in Fig 1(a). The number of neurons N is set to N = 200 unless otherwise mentioned. The dynamics of the recurrent layer are given by the following equation:

$$x(t+1) = (I - α)\, x(t) + α\, \mathrm{ReLU}\big(W_{in} u(t) + W x(t)\big) + \sqrt{α}\, ξ, \tag{1}$$

where $I = {(1, 1, \dots, 1)}^{T}$ and $α = {(α_{1}, α_{2}, \dots, α_{N})}^{T}$ is a vector of the rates of change, i.e., the inverse time scales of the neurons (0 ≤ α_i ≤ 1; the present model operates within this range of α). Note that products of vectors such as (I − α)x(t) denote the Hadamard (elementwise) product. If all neurons share the same time scale, α_i is a constant. Below, we consider the case with two time scales:

$$α_{i} = \begin{cases} α_{m} & (1 \leq i \leq N_{m}) \\ α_{s} & (N_{m} < i \leq N), \end{cases} \tag{2}$$

where N_m and N − N_m neurons have distinct time scales. N_m is set to N_m = 150 unless otherwise mentioned. The standard homogeneous network corresponds to α_s = α_m; below, we mostly study the case α_s < α_m to investigate the effect of the time scale difference. Although we mainly show results with a specific ratio of 150 fast neurons to 50 slow neurons, the key results remained unchanged for other ratios: as we show later, good Bayesian performance is achieved unless the ratio is excessively biased (e.g., 10:240 or 240:10), that is, as long as each category contains enough neurons to capture the underlying dynamics. (This condition may further depend on details of the network complexity or on the specific forms of interaction between fast and slow neurons, which require further study.) Here, u(t) is the input signal, and x is the state of the neurons in the recurrent layer. W_in and W represent the synaptic connection weights. ξ represents independent noise in the dynamics, drawn from a normal distribution with mean 0 and standard deviation 0.05. We adopted the ReLU activation function (ReLU(z) = 0 for z ≤ 0 and ReLU(z) = z for z > 0) [30]. The output of the RNN is then given by a linear combination of the internal states:

$$y(t) = W_{out}\, x(t) \tag{3}$$
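To make the update rule concrete, the following is a minimal pure-Python sketch of one step of Eq 1 with the two-time-scale rate vector of Eq 2 and the readout of Eq 3. The helper names (`relu`, `matvec`, `make_alpha`, `rnn_step`, `readout`) are hypothetical, the weights are plain nested lists rather than trained parameters, and only the noise standard deviation 0.05 is taken from the text.

```python
import math
import random


def relu(z):
    """ReLU(z) = 0 for z <= 0 and z for z > 0, applied elementwise."""
    return [v if v > 0.0 else 0.0 for v in z]


def matvec(M, v):
    """Matrix (list of rows) times vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def make_alpha(n, n_m, alpha_m, alpha_s):
    """Eq 2: the first n_m neurons use alpha_m, the remaining n - n_m use alpha_s."""
    return [alpha_m if i < n_m else alpha_s for i in range(n)]


def rnn_step(x, u, W_in, W, alpha, noise_sd=0.05, rng=random):
    """One step of Eq 1; every product with alpha is elementwise (Hadamard)."""
    drive = relu([a + b for a, b in zip(matvec(W_in, u), matvec(W, x))])
    return [
        (1.0 - a_i) * x_i + a_i * d_i + math.sqrt(a_i) * rng.gauss(0.0, noise_sd)
        for x_i, a_i, d_i in zip(x, alpha, drive)
    ]


def readout(W_out, x):
    """Eq 3: y(t) = W_out x(t)."""
    return matvec(W_out, x)
```

With N = 200, N_m = 150 and α_s = 0.1, `make_alpha(200, 150, 1.0, 0.1)` gives the rate vector of Eq 2. A neuron with α_i = 0.1 keeps 90% of its previous state at every step, which is what makes it slow.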

Next, we introduced a modular structure into the above RNN to distinguish the main module from the sub-module (Fig 1(b)). Only the main module was connected to the input/output layers. Thus, the dynamics of the recurrent layer are given by

$$x_{m}(t+1) = (1 - α_{m})\, x_{m}(t) + α_{m}\, \mathrm{ReLU}\big(W_{in} u(t) + W_{main} x_{m}(t) + W_{s \to m} x_{s}(t)\big) + \sqrt{α_{m}}\, ξ_{m} \tag{4}$$

$$x_{s}(t+1) = (1 - α_{s})\, x_{s}(t) + α_{s}\, \mathrm{ReLU}\big(W_{sub} x_{s}(t) + W_{m \to s} x_{m}(t)\big) + \sqrt{α_{s}}\, ξ_{s}, \tag{5}$$

where x_m and x_s represent the firing rates of neurons in the main and sub-modules, respectively, and α_m and α_s are the inverse time scales of the main and sub-modules, respectively. Here, α_m is fixed at 1 (without loss of generality), while α_s is varied from 1 to 0.01 to examine the effect of the time scale difference. The RNN output is given by a linear combination of the internal states of the main module:

$$y(t) = W_{out}\, x_{m}(t) \tag{6}$$
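The modular dynamics can be sketched the same way. In this hypothetical pure-Python version of Eqs 4–6 (names such as `modular_step` are illustrative only), the input u(t) enters only the main module, the sub-module sees the main module only through W_{m→s}, and both modules are updated synchronously from their states at time t with scalar rates α_m and α_s.

```python
import math
import random


def relu(z):
    return [v if v > 0.0 else 0.0 for v in z]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def vec_sum(*vectors):
    """Elementwise sum of equally long vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def leaky_update(x, drive, alpha, noise):
    """(1 - alpha) x + alpha ReLU(drive) + sqrt(alpha) noise, for scalar alpha."""
    return [(1.0 - alpha) * xi + alpha * di + math.sqrt(alpha) * ni
            for xi, di, ni in zip(x, relu(drive), noise)]


def modular_step(x_m, x_s, u, W_in, W_main, W_s2m, W_m2s, W_sub,
                 alpha_m=1.0, alpha_s=0.1, noise_sd=0.05, rng=random):
    """Eqs 4-5: return (x_m(t+1), x_s(t+1)) computed from x_m(t), x_s(t), u(t)."""
    xi_m = [rng.gauss(0.0, noise_sd) for _ in x_m]
    xi_s = [rng.gauss(0.0, noise_sd) for _ in x_s]
    drive_m = vec_sum(matvec(W_in, u), matvec(W_main, x_m), matvec(W_s2m, x_s))
    drive_s = vec_sum(matvec(W_sub, x_s), matvec(W_m2s, x_m))  # no u(t) term
    return (leaky_update(x_m, drive_m, alpha_m, xi_m),
            leaky_update(x_s, drive_s, alpha_s, xi_s))


def output(W_out, x_m):
    """Eq 6: the readout sees the main module only."""
    return matvec(W_out, x_m)
```

Because the sub-module has no direct input term, changing u(t) can affect x_s only one step later, via x_m.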

Task

This study focuses on a task that is optimally solved by Bayesian inference, in which the RNN must estimate the true value from a noise-perturbed signal. We constructed the external input signal s as follows:

Initially, the true value y_true was randomly sampled from a generator (or cause) distribution, a normal distribution with mean μ_g and variance $σ_{g}^{2}$: $y_{true} \sim N(μ_{g}, σ_{g}^{2})$.
Subsequently, s was generated from y_true by adding noise sampled from a normal distribution with mean 0 and variance $σ_{l}^{2}$: $s \sim N(y_{true}, σ_{l}^{2})$.

Importantly, the generator is not static: it changes over time with probability p_t, the parameter that sets how likely the generator is to change. Upon a change, the parameters μ_g and σ_g are updated to values sampled uniformly from the ranges μ_g ∈ [−0.5, 0.5] and σ_g ∈ [0, 0.8]. A change of the generator changes the distribution that y_true follows and, consequently, the external input signal s fed into the RNN. Even when the generator changes, however, the signals before and after the change are concatenated and input to the RNN as a single continuous series.
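The switching generator can be sketched as follows. This is a minimal pure-Python version under two assumptions that the text does not fix: the switch test with probability p_t is applied once per time step, and σ_l is held constant within a sequence. The function name `generate_sequence` is hypothetical; the ranges of μ_g and σ_g and the defaults T = 120 and p_t = 0.03 follow Table 1.

```python
import random


def generate_sequence(T=120, p_t=0.03, sigma_l=(1 / 5) ** 0.5, rng=random):
    """Return lists y_true, s and the (mu_g, sigma_g) in force at each step.

    The generator parameters are drawn at t = 0 and redrawn with probability
    p_t at every later step (per-step switching is an assumption).
    """
    y_true, s, params = [], [], []
    mu_g = sigma_g = 0.0
    for t in range(T):
        if t == 0 or rng.random() < p_t:
            mu_g = rng.uniform(-0.5, 0.5)
            sigma_g = rng.uniform(0.0, 0.8)
        y = rng.gauss(mu_g, sigma_g)      # y_true ~ N(mu_g, sigma_g^2)
        y_true.append(y)
        s.append(rng.gauss(y, sigma_l))   # s ~ N(y_true, sigma_l^2)
        params.append((mu_g, sigma_g))
    return y_true, s, params
```

Only s is fed to the network; y_true is kept for the loss, and the recorded (μ_g, σ_g) are what an analysis of the prior would compare against.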

As mentioned in the Introduction, we set up this task because the RNN can predict the input well when it performs Bayesian inference. To do so, the prior distribution needed for Bayesian inference must be estimated from the observed signal so that it approximates the generator distribution. In addition, the input term u(t) in Eq 1 (or Eqs 4 and 5) must be derived from the signal s. For this, we adopted the Probabilistic Population Code (PPC) [31]. PPC assumes that the information in a signal is encoded by a population of neurons, each with a preferred stimulus determined by its position, that fire probabilistically according to a Poisson distribution. Neural networks receiving input from such a population via PPC have been shown to learn probabilistic inference effectively [16]. Therefore, in this study we also assumed that the activity u of the input-layer neurons encoding the observed signal follows the PPC model. Accordingly, u was sampled from the following Poisson distribution [32] at every time step:

$$p(u \mid s) = \prod_{i} \frac{e^{-f_{i}(s)}\, f_{i}(s)^{u_{i}}}{u_{i}!} \tag{7}$$

Here, s is the observed input signal (i.e., generated from y_true by adding noise), and f_i is the tuning curve of neuron i, which describes how strongly the neuron responds to s. The signal variance $σ_{l}^{2}$ is encoded inversely in f_i as the amplitude of the tuning curve. The tuning curve is given by

$$f_{i}(s) = \frac{1}{σ_{l}^{2}} \exp\left(-\frac{(s - ϕ_{i})^{2}}{2 σ_{PPC}^{2}}\right), \tag{8}$$

where ϕ_i is the preferred stimulus of neuron i in the input layer. The ϕ_i were assumed to form an arithmetic sequence in i (ϕ_i = −1/2 + i/m, where m is the number of neurons in the input layer) [33]. $σ_{PPC}^{2}$ is a constant representing the width of the tuning curve, set to $σ_{PPC}^{2} = 1 / 2$ in this study. With this tuning-curve transformation, the information in the external signal s is encoded in the spatial position of the neurons that are most likely to fire.
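The input encoding of Eqs 7 and 8 can be sketched in pure Python as below. The Poisson sampler uses Knuth's multiplication method because the standard library has no Poisson generator; the indexing i = 1, …, m for the preferred stimuli and all function names are assumptions made for illustration, while m = 100 and σ_PPC² = 1/2 follow the text.

```python
import math
import random


def preferred_stimuli(m=100):
    """phi_i = -1/2 + i/m, taking i = 1, ..., m (indexing assumed)."""
    return [-0.5 + i / m for i in range(1, m + 1)]


def tuning_curve(s, phi, sigma_l, sigma_ppc_sq=0.5):
    """Eq 8: Gaussian bump of width sigma_PPC^2 and amplitude 1 / sigma_l^2."""
    return [math.exp(-(s - p) ** 2 / (2.0 * sigma_ppc_sq)) / sigma_l ** 2 for p in phi]


def poisson(lam, rng=random):
    """Poisson sample by Knuth's method (adequate for the small rates used here)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def ppc_encode(s, sigma_l, m=100, rng=random):
    """Eq 7: independent Poisson spike counts u_i with means f_i(s)."""
    return [poisson(f, rng) for f in tuning_curve(s, preferred_stimuli(m), sigma_l)]
```

A smaller σ_l raises every mean rate by the factor 1/σ_l², so a less noisy signal produces more spikes overall, while the location of the peak carries the value of s.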

In this task, the true value y_true is to be estimated from the input signal u. Therefore, training was performed to minimize the mean squared error (MSE) between the network output y(t) and the true value y_true(t). Note that y_true is used only to calculate the loss function during learning. (We acknowledge that providing the true value y_true(t) for training may be an artificial setting. However, similar settings in which the true value is provided have been used in previous studies, such as [18]. For the purpose of this study, namely investigating the role of modular structure and time-scale differences, it is useful to adopt this simple, previously established setting, which allows comparison with the standard homogeneous network even though it cannot fully reflect real-world situations.)

$$L = \frac{1}{T} \sum_{t} \big(y(t) - y_{true}(t)\big)^{2} \tag{9}$$

Training was performed by backpropagation [34, 35], decreasing L by optimizing the synaptic weights W_*. The optimization uses the stochastic gradient method, for which we adopted Adam [36], an efficient and widely used method. The batch size, i.e., the number of training samples processed together in one iteration, was set to 50. The weight decay rate was set to 0.0001. Weight decay is a regularization technique that prevents overfitting by adding to the loss function a penalty proportional to the size of the weight coefficients; it discourages the model from relying too heavily on particular features and helps it generalize to unseen data. Training was run for 6000 iterations, i.e., over the complete set of training data 6000 times. This is a typical number in machine learning, and we confirmed that it is sufficient for the training to converge. See Table 1 for the hyperparameters used in the experiments.
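As a toy illustration of these optimization settings, the sketch below minimizes the MSE of Eq 9 for a single scalar readout weight with a hand-written Adam update, using batch size 50, learning rate 0.001, weight decay 0.0001 and 6000 iterations (Table 1). It is not the paper's training code: the one-parameter model stands in for the RNN, each iteration here draws one mini-batch, weight decay is implemented as an L2 term added to the gradient, and Adam's β1, β2 and ε use the commonly cited defaults from [36]. The function names are hypothetical.

```python
import math
import random


def mse(y, y_true):
    """Eq 9: L = (1/T) sum_t (y(t) - y_true(t))^2."""
    return sum((a - b) ** 2 for a, b in zip(y, y_true)) / len(y)


def fit_readout_adam(xs, ys, lr=0.001, weight_decay=1e-4, batch_size=50,
                     iterations=6000, beta1=0.9, beta2=0.999, eps=1e-8, rng=random):
    """Fit y = w * x by Adam on mini-batch MSE plus an L2 weight-decay term."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, iterations + 1):
        batch = rng.sample(range(len(xs)), batch_size)
        # Gradient of the batch MSE with respect to w ...
        grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / batch_size
        # ... plus the weight-decay (L2) contribution.
        grad += weight_decay * w
        m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)                # bias corrections
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w
```

With noise-free targets y = 0.7 x, the fitted weight ends close to 0.7; weight decay pulls it very slightly toward zero.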

Table 1. Hyperparameters.

Important items are in bold.

Attribute	Value
Range of μ_g	−0.5 ≤ μ_g ≤ 0.5
Range of σ_g	0 ≤ σ_g ≤ 0.8
Range of σ_l	$\sqrt{1 / 5} \leq σ_{l} \leq 1$
Switching probability of the generator: p_t	p_t = 0.03
Dimension of the input units u(t)	100
σ_PPC	0.5
Duration of u(t)	T = 120
# of neurons in the main module	N_m = 150
# of neurons in the sub-module	50
α_m	1
α_s	1, 0.5, 0.2, 0.1, 0.05, 0.01
Batch size	50
Optimization algorithm	Adam
Learning rate	0.001
Iteration	6000
Weight decay	0.0001


Results

Fixed structure and time scales

Bayesian optimality

Because the signal s is observed under noise, the neural network must estimate the true value y_true sampled from the generator. If the generator parameters were known, the true value could be estimated optimally as follows (maximum a posteriori (MAP) estimation [37]):

$$y_{opt} = \frac{σ_{g}^{2}}{σ_{g}^{2} + σ_{l}^{2}}\, s + \frac{σ_{l}^{2}}{σ_{g}^{2} + σ_{l}^{2}}\, μ_{g} \tag{10}$$
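Eq 10 is a precision-weighted average of the observation and the generator mean. The short sketch below (hypothetical function names, pure Python) evaluates it and checks by Monte Carlo, using the generator of Fig 2 (μ_g = 0.5, σ_g = 0.5, σ_l = √(1/5)), that y_opt has a lower error against y_true than the maximum likelihood estimate y_ML = s. For these values the weight on s is 0.25/0.45 ≈ 0.56, and the expected errors are σ_l² = 0.2 for y_ML and σ_g²σ_l²/(σ_g² + σ_l²) ≈ 0.11 for y_opt. These errors are measured against y_true and are not the quantities reported for Fig 2.

```python
import random


def y_opt(s, mu_g, sigma_g, sigma_l):
    """Eq 10: weight sigma_g^2 / (sigma_g^2 + sigma_l^2) on s, the rest on mu_g."""
    w = sigma_g ** 2 / (sigma_g ** 2 + sigma_l ** 2)
    return w * s + (1.0 - w) * mu_g


def compare_estimators(mu_g=0.5, sigma_g=0.5, sigma_l=(1 / 5) ** 0.5, n=100_000, seed=0):
    """Monte Carlo MSE against y_true of y_ML = s and of y_opt."""
    rng = random.Random(seed)
    se_ml = se_opt = 0.0
    for _ in range(n):
        y_true = rng.gauss(mu_g, sigma_g)
        s = rng.gauss(y_true, sigma_l)
        se_ml += (s - y_true) ** 2
        se_opt += (y_opt(s, mu_g, sigma_g, sigma_l) - y_true) ** 2
    return se_ml / n, se_opt / n
```

When σ_g is large relative to σ_l, the weight on s approaches 1 and y_opt approaches y_ML; when σ_g is small, the estimate is pulled toward μ_g.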

However, as described in the “Task” section, the generator parameters were not given explicitly to the neural network, so they must be estimated from the observed signals as a prior distribution. First, we examined whether the neural network could achieve this prior-based estimation.

Fig 2 shows the output y of the modular RNN trained with α_s = 0.1 for a given observed signal s. The signal was generated with μ_g = 0.5 and σ_g = 0.5, and noise with $σ_{l} = \sqrt{1 / 5}$ was added. The green points show the maximum likelihood estimate y_ML, which is the most accurate estimate when no prior information is available; here, it simply equals the observed signal s. The blue points show y_opt obtained by MAP estimation, and the orange points show the actual network output y. The RNN output is closer to the blue points (y_opt) than to the green points, indicating that approximately optimal Bayesian inference with a well-estimated prior is achieved (the MSE between y and y_ML is 0.15, whereas the MSE between y and y_opt is only 0.019).

Next, we examined the optimality of Bayesian estimation for networks with and without modular structure and time scale differences. Fig 3(a) shows the MSE between y and y_opt for RNNs trained under each condition. The modular structure improved the accuracy of Bayesian estimation, and the accuracy increased further when α_s was decreased to an appropriate degree. The maximum accuracy was achieved in the range α_s = 0.06 ∼ 0.2; as shown in S4 Fig, the MSE remains low for 0.06 ≲ α_s ≲ 0.2. Even without modular structure, the time scale difference contributed to inference accuracy, but the accuracy increased significantly when both the modular structure and the time scale difference were present.

Adjustability to rapid generator switching

So far, we have compared the accuracy of Bayesian inference itself under a fixed generator. Next, we examined performance when the generator changes over time. To perform Bayesian inference for rapidly changing input, the model must quickly approach the new optimal value y_opt. To verify the accuracy of the RNN in this case, we compared the MSE between y_true(t) generated by the generator and the RNN output y(t) under various p_t (Fig 3(b)). The model with α_s = 0.1 was more accurate for all values of p_t.

As a special case, we considered a setting in which the input switches back and forth between two generators, A and B, and examined to which generator's distribution the prior estimated by the RNN was closer. Specifically, we adopted generator A with $(μ_{g}, σ_{g}^{2}) = (μ_{A}, σ_{A}^{2})$ and generator B with $(μ_{g}, σ_{g}^{2}) = (μ_{B}, σ_{B}^{2})$. Denoting the Bayesian optimal estimates under each generator by $y_{opt}^{A}$ and $y_{opt}^{B}$, we computed

$$a(t) = \frac{y_{opt}^{B} - y(t)}{y_{opt}^{B} - y_{opt}^{A}} \tag{11}$$

When a(t) is close to 1, the model’s prior is closer to generator A, and when a(t) is close to 0, it is closer to generator B.
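A minimal sketch of Eq 11 combined with Eq 10, written with variances, for the generators used in Fig 4 (μ_A = −0.5, μ_B = 0.5, σ_A² = σ_B² = 0.04) and an assumed σ_l² = 0.2 and s = 0; the function names are hypothetical. An output exactly halfway between the two optimal estimates gives a(t) = 0.5.

```python
def y_opt(s, mu_g, var_g, var_l):
    """Eq 10 with variances: (var_g * s + var_l * mu_g) / (var_g + var_l)."""
    return (var_g * s + var_l * mu_g) / (var_g + var_l)


def prior_index(y, y_opt_a, y_opt_b):
    """Eq 11: 1 when y matches generator A's optimum, 0 when it matches B's."""
    if y_opt_a == y_opt_b:
        raise ValueError("a(t) is undefined when y_opt^A equals y_opt^B")
    return (y_opt_b - y) / (y_opt_b - y_opt_a)


s, var_l = 0.0, 0.2
y_a = y_opt(s, -0.5, 0.04, var_l)   # optimal estimate under generator A
y_b = y_opt(s, 0.5, 0.04, var_l)    # optimal estimate under generator B
print(prior_index(y_a, y_a, y_b), prior_index(y_b, y_a, y_b))  # prints 1.0 0.0
```

Because a(t) is normalized by y_opt^B − y_opt^A, it is undefined when the two generators give the same optimal estimate, which the sketch reports as an error.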

Comparing a(t) between the models with α_s = 0.1 and α_s = 1, we found that the model with α_s = 0.1 adjusted better to generator changes (Fig 4(a)): it was more responsive and recognized the generator change more quickly in all runs. The difference between the two models was especially pronounced in the extreme case in which the two generators switched at every time step (Fig 4(b)). Intuitively, a population of slow neurons would seem disadvantageous for responding to rapid environmental changes, but the results showed the opposite: the network with α_s = 1 could not follow rapid input changes, whereas that with α_s = 0.1 could estimate the input prior effectively. We discuss the importance of slow neurons for responding to rapid changes below. We also confirmed that when μ_g and σ_g are constant, modularity of the network is not necessary: no difference in performance with and without modularity was detected.

Fig 4 — (a) a(t) for the case where generator A ($(μ_{A}, σ_{A}^{2}) = (- 0.5, 0.04)$) and generator B ($(μ_{B}, σ_{B}^{2}) = (0.5, 0.04)$) switch alternately with probability p_t = 0.2. The model with α_s = 0.1 adjusted more quickly to the generator change. The thin line represents a(t) when the output is fully adjusted to generator switching. (b) a(t) for the case where generator A ($(μ_{A}, σ_{A}^{2}) = (- 0.5, 0.04)$) and generator B ($(μ_{B}, σ_{B}^{2}) = (0.5, 0.04)$) switch at every time step (periodic switching).

Representation of the prior

We next investigated how the slow sub-module improves the prior representation for Bayesian inference. Starting from the hypothesis that a population of downstream slow neurons represents the prior by integrating the observed signal over time, we examined which of the main and sub-modules carries the prior information in the modular RNN.

By using the prior information, the estimate is shifted from the observed signal s toward an appropriate value y_opt (Eq 10). In other words, even for the same input s, the output varies depending on which signals preceded s, because the prior estimate changes. Even if one module is returned to its original state, the output remains shifted from s as long as the prior information is retained in the other module. The magnitude of this shift can be regarded as the degree to which a module uses the estimated prior information. Therefore, the extent to which each module contributes to prior information processing can be estimated by examining the change in the output y(t) when the internal state of the main or sub-module is replaced with the state corresponding to a different prior.

First, let x_m(t; μ_g, σ_g) and x_s(t; μ_g, σ_g) denote the internal states of the main and sub-modules, respectively, after the input signal s from a generator (μ_g, σ_g) has been applied for a certain period. Because the output y(t) is determined by the internal states of the two modules and the input signal at t − 1, it can be written as y(x_m(t − 1; μ_g, σ_g), x_s(t − 1; μ_g, σ_g), s(t − 1), σ_l) (time arguments are omitted hereafter). We then compute the change in y when one module is held fixed and the other is switched to a different internal state, $x_{i}(μ_{g}, σ_{g}) \to x_{i}(μ_{g}^{'}, σ_{g}^{'})$. This switch is implemented by saving the internal state x_m or x_s obtained when input s generated by $(μ_{g}^{'}, σ_{g}^{'})$ is applied, and substituting that saved state during RNN inference. (This procedure is used only to analyze which module is more responsible for the prior information.) The resulting change in y reflects how strongly the prior information held by each module affects the output. Hence, by comparing the variance of y induced by varying x_m (with x_s fixed) with that induced by varying x_s (with x_m fixed), we can estimate how much each module is responsible for the prior representation. Specifically, we fixed one module at its state for μ_g = 0 and σ_g = 0.4, i.e., x_i(0, 0.4) (these are the midpoints of the ranges −0.5 ≤ μ_g ≤ 0.5 and 0 ≤ σ_g ≤ 0.8), while the state of the other module, x_i(μ_g, σ_g), was varied over μ_g and σ_g. We then calculated the variance of y as

$$V_{m} = \Big\langle \mathrm{Var}\big[\, y(x_{m}(μ_{g}, σ_{g}), x_{s}(0, 0.4), s, σ_{l}) \,\big]_{(μ_{g}, σ_{g})} \Big\rangle_{(s, σ_{l})} \tag{12}$$

$$V_{s} = \Big\langle \mathrm{Var}\big[\, y(x_{m}(0, 0.4), x_{s}(μ_{g}, σ_{g}), s, σ_{l}) \,\big]_{(μ_{g}, σ_{g})} \Big\rangle_{(s, σ_{l})}, \tag{13}$$

where $\mathrm{Var}[\,\cdot\,]_{(μ_{g}, σ_{g})}$ denotes the variance over (μ_g, σ_g), and $\langle \cdot \rangle_{(s, σ_{l})}$ denotes the average over (s, σ_l). The magnitudes of V_m and V_s indicate the extent to which the main module and the sub-module, respectively, drive variation in the output in response to changes in the signal’s prior distribution.
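The swap analysis of Eqs 12 and 13 can be sketched as below. `module_variances` is a hypothetical helper: `readout` stands in for running the trained network one step from the given module states, and the saved states are keyed by (μ_g, σ_g). In the toy usage, the output depends only on the sub-module (whose state is a scalar "prior mean"), so V_m comes out as 0 while V_s is positive.

```python
from statistics import mean, pvariance


def module_variances(readout, states_m, states_s, ref, probes):
    """Return (V_m, V_s) of Eqs 12-13.

    readout(x_m, x_s, s, sigma_l) -> scalar output y
    states_m, states_s: dicts mapping (mu_g, sigma_g) -> saved module state
    ref: key of the fixed reference state, e.g. (0.0, 0.4)
    probes: list of (s, sigma_l) pairs to average over
    """
    keys = list(states_m)
    V_m = mean([pvariance([readout(states_m[k], states_s[ref], s, sl) for k in keys])
                for s, sl in probes])
    V_s = mean([pvariance([readout(states_m[ref], states_s[k], s, sl) for k in keys])
                for s, sl in probes])
    return V_m, V_s


def toy_readout(x_m, x_s, s, sigma_l):
    """Toy stand-in: the main-module state is ignored entirely."""
    return 0.5 * s + 0.5 * x_s


keys = [(mu, sd) for mu in (-0.5, 0.0, 0.5) for sd in (0.0, 0.4, 0.8)]
states_s = {k: k[0] for k in keys}
states_m = {k: 0.0 for k in keys}
V_m, V_s = module_variances(toy_readout, states_m, states_s, (0.0, 0.4),
                            [(0.1, 0.5), (-0.2, 1.0)])
```

In a real analysis the states would be saved network activations and `readout` would run the network forward, but the bookkeeping (hold one module at the reference state, sweep the other, take the variance, then average over probes) is the same.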

Fig 5 shows the dependence of V_s and V_m on α_s. When α_s = 1 (i.e., uniform time scales), the main and sub-modules contribute to the representation of the prior distribution to the same degree. Conversely, for α_s = 0.05 ∼ 0.5, V_s is much larger than V_m, meaning that the sub-module selectively contributes to the representation of the prior. The differentiation between the main and sub-modules is particularly pronounced at α_s = 0.1 and 0.2. Note that the contribution of the main module is large when α_s = 0.01, probably because the sub-module is too slow to encode the prior information. Comparing Figs 3 and 5 shows that highly accurate Bayesian inference is achieved when the prior information is localized in the sub-module.

Next, we investigated how the prior is represented by the main and sub-modules by visualizing neural activity with principal component analysis (PCA) [38, 39]. First, x_m(μ_g, σ_g) and x_s(μ_g, σ_g) were computed for various (μ_g, σ_g) in a model with α_s = 0.1, and PCA was applied. The results were projected onto the plane of the first and second principal components and color-coded by μ_g and σ_g (Fig 6(a) and 6(b)). Activity in the main module was loosely distributed along a one-dimensional manifold given by the first principal component (PC1), which approximately corresponded to μ_g, although not clearly. In contrast, activity in the sub-module lay on a clearly two-dimensional manifold (Fig 6(b2)), in which PC1 corresponded to μ_g and PC2 to σ_g rather well.

We then performed the same analysis on the model with α_s = 1 (Fig 6(c) and 6(d)). In this case, the manifolds of the main and sub-modules did not differ significantly: both were one-dimensional and corresponded to μ_g, with no axis corresponding to σ_g. The decodability of σ_g from the internal states of the sub-module observed for α_s = 0.1 was absent for α_s = 1. In fact, the coefficient of determination for predicting σ_g from the internal state of the sub-module by ridge regression was 0.68 for α_s = 0.1 but −0.03 for α_s = 1. This suggests that the model with α_s ∼ 0.1 better distinguishes the input variance from noise and thereby performs Bayesian inference accurately.
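For reference, a minimal pure-Python sketch of ridge regression (with an unpenalized intercept, obtained by centring) and of the coefficient of determination R² is given below. The helper names and the regularization strength are assumptions, since the text does not specify the fitting protocol. R² = 1 − SS_res/SS_tot is negative whenever the predictions are worse than always predicting the mean, which is how a value such as −0.03 can arise.

```python
def ridge_fit(X, y, lam=1.0):
    """Ridge regression; the intercept is left unpenalized by centring X and y.

    X: list of n samples, each a list of d features; y: list of n targets.
    Returns (weights, intercept).
    """
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    yc = [t - y_mean for t in y]
    # Augmented normal equations [Xc^T Xc + lam I | Xc^T yc]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(d)]
         + [sum(r[i] * t for r, t in zip(Xc, yc))]
         for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(d):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    w = [A[i][d] / A[i][i] for i in range(d)]
    b = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, b


def predict(X, w, b):
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def r2_score(y, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot
```

In practice R² would be evaluated on held-out samples; the sketch simply scores whatever predictions it is given.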

When the generator changed rapidly, the variance of the prior was larger than that of the generator, as shown in S1 Fig for α_s = 0.1. As Eq 10 shows, when σ_g is large, the observed signal s has more influence than μ_g, allowing the model to “keep up” with large changes in the observed signal. This explains the higher adjustability to rapid generator changes seen in Fig 4.

To investigate whether the division of roles and accurate Bayesian inference depend on the numbers of neurons in the sub-module and the main module, we trained models with various ratios of N_s (the number of sub-module neurons) to N_m, keeping N_s + N_m = 250 fixed. As shown in S3 Fig, the MSE remained low as long as the ratio was not too biased (e.g., 10:240 or 240:10). Except in these extreme cases, efficient Bayesian inference was achieved, together with the division of roles as measured by V_s and V_m. Next, to investigate whether the difference in time scales or the presence of a slower time scale itself is more influential, we trained an RNN with α_s = α_m = 0.1 and examined its accuracy. As shown in S2 Fig, with (α_m, α_s) = (0.1, 0.1) the MSE was larger, i.e., the accuracy was worse, than with (α_m, α_s) = (1, 1) or (α_m, α_s) = (1, 0.1). Therefore, it is not simply the slower time scale of the neurons but the time scale difference between the main and sub-modules that improves the accuracy of Bayesian inference.

Effects of different time scales

To examine in detail how the difference in α_s affects the accuracy of Bayesian inference, we considered how the models with α_s = 1 and α_s = 0.1 represent the prior as a function of the input signal s(t).

The RNN needs to estimate the current generator from past input signals to predict y(t) accurately. In this paper, this estimate is treated as the prior. Thus, we assume that the mean μ_p of the current generator, as estimated by the RNN, is represented as a superposition of past input signals as follows:

$$ \mu_{p} \simeq \sum_{k} a_{k}\, s(t - k) \tag{14} $$

Note that μ_p corresponds to the “current state of the generator” as estimated by the RNN and is treated as a variable distinct from μ_g. For example, right after the generator switches from μ_g = −0.5 to μ_g = 0.5, even if a positive s(t) is input into the RNN, the RNN would still estimate the generator mean to be −0.5. If the coefficients a_k in the above equation are known, we can discuss how the RNN estimates the current state of the generator from the past input signals s(t).

Below, we estimate a_k. To do so, we first need to estimate μ_p. There is a one-to-one relationship between the internal state x of the RNN and μ_p. Accordingly, we define μ_p as $μ_{p} = W_{μ_{p}} x$, where $W_{μ_{p}}$ is a transformation matrix calculated as follows. With the generator held fixed, we computed x, which we used to estimate $W_{μ_{p}}$. We then created a data vector M_g containing the time series $μ_{g}^{1}, μ_{g}^{2}, \dots$ and a data matrix X containing the time series of internal states x¹, x², …, i.e.,

$$ M_{g} = (\mu_{g}^{1}, \mu_{g}^{2}, \dots)^{T}, \quad X = (x^{1}, x^{2}, \dots) $$

We then sought the matrix $W_{μ_{p}}$ such that $M_{g} ≃ W_{μ_{p}} X$. Using the Moore-Penrose pseudoinverse, we obtained the best-fit matrix $W_{μ_{p}} = M_{g} X^{†}$ [40]. Let μ_p denote the result of the transformation by $W_{μ_{p}}$. As Fig 7(a) shows, μ_g ≃ μ_p holds.
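The readout fit $W_{μ_{p}} = M_{g} X^{†}$ can be sketched in plain Python. When $X X^{T}$ is invertible, $X^{†} = X^{T}(X X^{T})^{-1}$, so the pseudo-inverse fit reduces to solving the normal equations $(X X^{T}) W_{μ_{p}}^{T} = X M_{g}^{T}$, with M_g treated as a row vector. The three-neuron toy "internal states", their encoding coefficients `c`, and the noise level are hypothetical choices. Real RNN states would replace them.

```python
import random

def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def fit_readout(X_cols, m_g):
    """Least-squares readout W minimizing sum_t (m_g[t] - W . x^t)^2.

    X_cols holds the state vectors x^t (the columns of X). The result equals
    M_g X^+ whenever X X^T is invertible.
    """
    n = len(X_cols[0])
    G = [[sum(x[i] * x[j] for x in X_cols) for j in range(n)] for i in range(n)]  # X X^T
    r = [sum(x[i] * m for x, m in zip(X_cols, m_g)) for i in range(n)]             # X M_g^T
    return solve(G, r)

rng = random.Random(0)
c = [1.0, -0.5, 2.0]         # hypothetical encoding of mu_g in three toy neurons
mu, m_g, X_cols = -0.5, [], []
for t in range(300):
    if rng.random() < 0.03:  # the generator switches with probability p_t
        mu = -mu
    m_g.append(mu)
    X_cols.append([ci * mu + rng.gauss(0, 0.01) for ci in c])

W = fit_readout(X_cols, m_g)
mu_p = [sum(w * xi for w, xi in zip(W, x)) for x in X_cols]
max_err = max(abs(a - b) for a, b in zip(mu_p, m_g))
```

The normal equations are used only to keep the sketch dependency-free. With a numerical library, a direct pseudo-inverse or SVD-based least-squares routine is the more robust choice when X is ill-conditioned.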

Fig 7. (a) Comparison between the estimated mean of the prior μ_p and the mean of the generator μ_g. (b) Comparison between the linear weighted sum of past signals s(t − k) and the estimated mean of the prior μ_p.

Based on this calculation of μ_p, we estimated a_k in the following steps. First, we obtained x(t) for a time-varying signal whose generator switched with probability p_t = 0.03. Applying the transformation matrix $W_{μ_{p}}$ to the resulting x(t) yielded an estimate of the prior μ_p. In this way, the state of the prior was obtained for the time series of the observed signal s(t).

Finally, a_k in Eq 14 was obtained by minimizing the difference between the two sides of Eq 14. Specifically, we created a data vector M_p containing μ_p and a data matrix S composed of the vectors $s = (s(t - 1), s(t - 2), \dots, s(t - K))^{T}$. Using the Moore-Penrose pseudoinverse, we obtained $a = (a_{1}, a_{2}, \dots, a_{K})^{T}$. As Fig 7(b) shows, $μ_{p} ≃ \sum_{k=1}^{K} a_{k} s(t - k)$ holds.
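The fit of Eq 14 is an ordinary least-squares problem on lagged inputs. The sketch below is plain Python and uses a hypothetical stand-in for the RNN readout. Here μ_p is produced by a linear leaky estimator with α = 0.3, so the true coefficients are known (a_k = 0.3 · 0.7^{k−1}) and the recovered values can be checked. Solving the normal equations gives the pseudo-inverse solution when the lagged-input matrix has full column rank. The values K = 20, T = 600, and the noise level are illustrative choices, not the study's settings.

```python
import random

def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def fit_kernel(s, mu_p, K):
    """Least-squares a_1..a_K in mu_p(t) ~ sum_{k=1}^{K} a_k s(t-k), using t = K..T-1."""
    rows = [[s[t - k] for k in range(1, K + 1)] for t in range(K, len(s))]
    targets = mu_p[K:]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(K)] for i in range(K)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(K)]
    return solve(A, b)

rng = random.Random(0)
alpha = 0.3                  # time scale of the hypothetical stand-in estimator
mu_g, est = -0.5, 0.0
s, mu_p = [], []
for t in range(600):
    if rng.random() < 0.03:  # generator switch with probability p_t
        mu_g = -mu_g
    mu_p.append(est)         # the estimate at time t uses only s(t-1), s(t-2), ...
    s_t = mu_g + rng.gauss(0, 0.2)
    s.append(s_t)
    est = (1 - alpha) * est + alpha * s_t

a = fit_kernel(s, mu_p, K=20)  # a[k-1] approximates a_k
```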

Because each coefficient a_k corresponds to the contribution of the signal k time steps earlier, we could estimate how far back the neural network uses past information when estimating the prior.

The estimated coefficients of Eq 14 are plotted against k in Fig 8. The plot reveals that the model with α_s = 0.1 used more past information to estimate the prior than the model with α_s = 1. This difference in time windows leads to a difference in the accuracy of prior encoding.
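A single linear leaky unit gives some intuition for why a smaller α lengthens the time window. For u(t) = (1 − α) u(t − 1) + α s(t − 1), unrolling the recursion yields Eq 14 with a_k = α(1 − α)^{k−1}. These weights sum to one and decay more slowly when α is small. This is only an illustrative single-unit model, not the fitted coefficients of the trained nonlinear RNN in Fig 8. The 90% threshold that defines a "window" here is an arbitrary choice for the sketch.

```python
def leaky_kernel(alpha, K):
    """Weights a_k = alpha * (1 - alpha)**(k - 1), k = 1..K, of a linear leaky integrator."""
    return [alpha * (1.0 - alpha) ** (k - 1) for k in range(1, K + 1)]

def window(alpha, frac=0.9, K=1000):
    """Smallest number of past steps whose kernel weights sum to at least `frac`."""
    total = 0.0
    for k, a in enumerate(leaky_kernel(alpha, K), start=1):
        total += a
        if total >= frac:
            return k
    return None  # frac not reached within K steps

for alpha in (1.0, 0.3, 0.1):
    print(f"alpha={alpha}: 90% of the weight lies within {window(alpha)} past steps")
```

With these definitions, α = 1 uses only the most recent input (window 1). By contrast, α = 0.1 needs 22 past steps to accumulate 90% of the weight.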

Organization of modular structure with time-scale separation

So far, we have investigated neural networks with fixed modular structures and fixed time scales and demonstrated that those with fast and slow modules effectively represent the prior distribution. Next, we investigated whether such a modular structure would emerge when a network with a homogeneous structure was trained to predict y_true. Note that our finding that modular structures with slow/fast time scales are effective for Bayesian inference does not necessarily imply that such a structure emerges through learning. In this section, we examine whether slow/fast separation, with the corresponding modular structure, can be reached by learning alone.

We again used the same neural network model as in Eq 1. In this section, however, the α values, as well as the elements of W, change through training, starting from initial values drawn randomly from $N(0.5, 0.1)$. In other words, we examine whether the modular structure, together with the time-scale difference, emerges from a random Gaussian initialization without such structure. During training, each matrix W and α were optimized by gradient descent [41] at each step. The number of neurons in the recurrent layer was set to 80.
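Eq 1 is not reproduced in this section. As an assumption only, the sketch below uses a common leaky-integrator form, x(t+1) = (1 − α) ⊙ x(t) + α ⊙ (W f(x(t)) + w_in s(t)), with a separate α_i for each neuron and a ReLU nonlinearity f. Each α_i is initialized from N(0.5, 0.1), read here as mean 0.5 and standard deviation 0.1 and clipped to [0.01, 1]. The clipping, the weight scales, and the names `init_alpha` and `step` are hypothetical. Only the forward pass is shown. In the study, W and α were both trained by gradient descent, which would require an automatic-differentiation framework and is omitted here.

```python
import random

def init_alpha(n, rng, mean=0.5, std=0.1, lo=0.01, hi=1.0):
    """Draw per-neuron rates alpha_i ~ N(mean, std), clipped to [lo, hi] (clipping assumed)."""
    return [min(hi, max(lo, rng.gauss(mean, std))) for _ in range(n)]

def relu(v):
    return v if v > 0.0 else 0.0

def step(x, s, W, w_in, alpha):
    """One leaky-integrator update in which neuron i has its own time scale alpha[i]."""
    n = len(x)
    fx = [relu(v) for v in x]
    drive = [sum(W[i][j] * fx[j] for j in range(n)) + w_in[i] * s for i in range(n)]
    return [(1.0 - alpha[i]) * x[i] + alpha[i] * drive[i] for i in range(n)]

rng = random.Random(1)
N = 80                       # recurrent-layer size used in the text
alpha = init_alpha(N, rng)
W = [[rng.gauss(0, 1 / N ** 0.5) for _ in range(N)] for _ in range(N)]
w_in = [rng.gauss(0, 1) for _ in range(N)]
x = [0.0] * N
for t in range(50):          # drive the network with a noisy scalar input
    x = step(x, rng.gauss(0.5, 0.2), W, w_in, alpha)
```

Setting every α_i to 1 recovers a standard discrete-time RNN. An α_i near 0.1 makes neuron i integrate its drive over roughly ten steps, which is the kind of slow unit that training separated out in Fig 9(a).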

The change in α distribution during the learning task is shown in Fig 9(a). As shown, α split into two groups over the learning period: one with large values close to 1 and the other with small values near 0.1.

Next, we measured the contribution to the representation of the prior, as examined in the “Representation of the prior” section, for the group of neurons with large α (α_i > 0.8) and the group with small α (α_i < 0.2) at three epochs during learning (Fig 9(b)). We found that after 10000 epochs, the slow neurons were responsible for representing the prior distribution, as in the model with α_s = 0.1 in the fixed time-scale setting. To explore how the optimal time scale α depends on p_t, we again made α trainable, using the same approach as above, and trained models under three settings: p_t = 0.03, 0.1, and 0.3. The results are shown in S5 Fig. The peak on the smaller-α side shifted to larger values as p_t increased. In contrast, the peak on the larger-α side remained at 1 in all settings, with no discernible difference in its proportion.

Finally, we investigated the network structure shaped by training. The recurrent-layer neurons of the network at epoch 10000 in Fig 9(a) were divided into three groups by the magnitude of α_i: slow neurons (α_i < 0.2), fast neurons (α_i > 0.8), and the remaining neurons (0.2 ≤ α_i ≤ 0.8). The average connectivity between the input layer, each group, and the output layer is shown in Fig 9(c) [42]. The connections from the input layer to the fast neurons and from the fast neurons to the output layer were distinctly larger than those to or from the slow neurons. Among connections within the recurrent layer, those between the fast and slow neurons were larger than the others. In summary, the modular structure shown in Fig 1(b) emerged through learning alone.
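The grouping and averaging behind Fig 9(c) can be sketched as follows. The thresholds 0.2 and 0.8 come from the text. Two choices are assumptions, because the exact connectivity measure is not restated here: the convention that W[i][j] is the weight from neuron j to neuron i, and the use of mean absolute weights. The small 4-neuron example is hypothetical.

```python
def group_by_alpha(alpha, slow_max=0.2, fast_min=0.8):
    """Split neuron indices into slow (alpha < 0.2), fast (alpha > 0.8), and other groups."""
    groups = {"slow": [], "fast": [], "other": []}
    for i, a in enumerate(alpha):
        if a < slow_max:
            groups["slow"].append(i)
        elif a > fast_min:
            groups["fast"].append(i)
        else:
            groups["other"].append(i)
    return groups

def mean_abs_block(W, post, pre):
    """Mean |W[i][j]| over post-synaptic rows i and pre-synaptic columns j."""
    if not post or not pre:
        return float("nan")
    return sum(abs(W[i][j]) for i in post for j in pre) / (len(post) * len(pre))

def mean_abs(vec, idx):
    """Mean |vec[i]| over a group, e.g. input weights into it or output weights out of it."""
    return sum(abs(vec[i]) for i in idx) / len(idx) if idx else float("nan")

def group_connectivity(W, alpha):
    """Key (post, pre) -> mean absolute weight from group `pre` to group `post`."""
    g = group_by_alpha(alpha)
    return {(post, pre): mean_abs_block(W, g[post], g[pre]) for post in g for pre in g}

# Hypothetical toy network: neurons 0 and 3 are slow, 1 is fast, 2 is intermediate.
alpha_toy = [0.1, 0.9, 0.5, 0.05]
W_toy = [[0, 2, 0, 0],
         [2, 0, 0, 2],
         [0, 0, 0, 0],
         [0, 2, 0, 0]]
conn = group_connectivity(W_toy, alpha_toy)
w_in_toy = [0.1, 1.0, 0.5, 0.1]
groups = group_by_alpha(alpha_toy)
input_to_fast = mean_abs(w_in_toy, groups["fast"])
```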

Discussion

In this study, we demonstrated that neural networks with slow and fast activity modules play an essential role in the prior representation for Bayesian inference. We set up a task to predict a time-varying signal under noise that could be estimated by Bayesian inference and trained RNNs with or without modular structure and with or without time scale differences.

The RNN could learn to approximate Bayesian inference using the prior (approximating the generator distribution) in all conditions we tested. However, accuracy was higher in the modular RNN, and it was significantly higher still when the time scale of the sub-module was moderately slower than that of the main module. In addition, the gain in accuracy was pronounced for rapidly varying input, which requires a prior that changes quickly. To achieve such accuracy with a slow sub-module, the sub-module was found to specifically represent the prior, indicating role differentiation between the representation of the prior and that of the observed signal (likelihood). Notably, this functional differentiation is caused by differences in time scales. This result is consistent with experimental observations that the prior and likelihood in Bayesian inference are coded in different brain areas [18–20]. Caution is required, however, as there is also an experimental report showing that the prior and likelihood are encoded in the same brain area [43]. Finally, we showed that a modular structure with distinct time scales was spontaneously organized in the RNN through learning.

It is important to note that the neuron population encoding the prior requires a relatively slow time scale, but the difference between fast and slow neurons should not be excessive. If α_s is too small (i.e., the sub-module is too slow), accuracy decreases (Fig 3), and the sub-module is no longer responsible for representing the prior (Fig 5). This is because, with such a slow time scale, constructing the prior would require integrating over too long a time span to track changes in the external input.

It has been suggested that the time scale of neurons becomes hierarchically slower from areas that receive the signal directly to areas where the information is processed [21–23]. This hierarchical structure, combined with modularity [44], is believed to be relevant to information processing [44–46]. Our findings indicate that modular structures with two levels of time scales can handle slowly changing inputs. Handling more complex environmental changes might require a more multilayered modular structure with diverse time scales. With such a structure, Bayesian inference under complex temporal changes could be achieved by extending the results of this study. Further research verifying this will elucidate the significance of hierarchical structure in the brain. Notably, our simulations revealed that the distinction in time scales not only improves the accuracy of Bayesian inference but also arises spontaneously through learning. Given these findings, a similar process may be expected in evolution [47].

The modular network with slow/fast time scales could integrate out noise and distinguish the average change in the inputs from fast noise. In fact, the network could effectively predict temporal changes in the input, even under rapidly changing conditions. The brain must adapt to time-varying, noisy inputs; hence, the performance of Bayesian inference by the network design reported herein is considered relevant to brain information processing.

We adopted a simple RNN and trained it using backpropagation. Backpropagation is often argued to differ from the learning algorithm implemented in the actual brain [48, 49], so care should be taken when generalizing our results. However, previous studies have suggested that neural networks trained by backpropagation can behave similarly to the actual brain [38, 50–55]. For instance, training neural networks by backpropagation can produce neural activity that behaves like that of place cells, which represent an animal's own spatial position [56]. Although the brain is generally considered not to use backpropagation, one may expect neural networks and dynamics that achieve the required task and Bayesian inference to share a common structure, as long as learning is based on synaptic changes that depend on neural activity dynamics. The present finding that neurons with slower time scales play a role in representing the prior would then be relevant as a plausible explanation of how the brain actually behaves.

Unravelling the relationship between the structure of neural networks, neural dynamics, and the information processing performed by the brain is a primary goal in computational neuroscience [25–27, 57]. In this study, the relevance of modular structure and time scale difference in neural dynamics to the representation of the prior in Bayesian inference is demonstrated, as well as their formation by learning [58, 59], which will support ongoing research in the field.

Supporting information

S1 Fig. Trajectory of the internal state x_sub(t) of the sub-module when generator A ($(μ_{A}, σ_{A}^{2}) = (-0.5, 0.04)$) and generator B ($(μ_{B}, σ_{B}^{2}) = (0.5, 0.04)$) switch alternately.

Here, the trajectories of the internal state x_sub(t) are plotted in the space of the first and second principal components for the cases in which generators A and B switch every 2 time steps and every 30 time steps. Generators A and B both have σ_g = 0.04. When the generators switched every 30 time steps, the trajectories were located in the region occupied by the internal state when σ_g was small. When they switched every 2 time steps, the trajectories were located in the region occupied when σ_g was large. This occurred because the generators switched so rapidly that the RNN treated the signal as if it were produced by a generator with a large variance. As a result, y(t) could switch quickly, because the observed signal s was weighted more heavily than the prior when computing the output y.

(EPS)


S2 Fig

Results of the RNN with α_m = α_s = 0.1: (a) Mean squared error between the optimal value y_opt(t) and the RNN output y(t) for the settings (α_m, α_s) = (1, 1), (1, 0.1), and (0.1, 0.1). (b) Division of roles in representing the prior distribution: V_s and V_m, defined by Eqs (12) and (13) in the text, for the same settings, computed over 1000 data samples. For (α_m, α_s) = (0.1, 0.1), V_s < V_m, indicating that the sub-module was unable to process prior-based information.

(EPS)


S3 Fig

Results for different N_s and N_m: (a) Mean squared error between the optimal value y_opt(t) and the RNN output y(t) for (N_s, N_m) = (10, 240), (50, 200), (100, 150), (150, 100), (200, 50), (240, 10), with α_s = 0.1. (b) Division of roles in representing the prior distribution.

With the sum N_s + N_m fixed at 250, we examined six configurations, (N_s, N_m) = (10, 240), (50, 200), (100, 150), (150, 100), (200, 50), (240, 10), while keeping α_s = 0.1, α_m = 1, and p_t = 0.03 fixed. Except for (N_s, N_m) = (10, 240) and (240, 10), efficient Bayesian inference, indicated by low MSE values, was observed in all configurations (S3(a) Fig). The differences in MSE among (N_s, N_m) = (50, 200), (100, 150), (150, 100), (200, 50) were within the margin of error. Furthermore, the division of roles, as measured by the variances in Eqs (12) and (13) for the sub-module and main module, was evident in all configurations except (N_s, N_m) = (10, 240) (S3(b) Fig). Thus, as long as the number fraction is not too skewed, efficient Bayesian inference is achieved together with the division of roles. If the fraction of N_s is too low, the variance for the slow module is larger, but the slow module has too few neurons to support appropriate Bayesian inference; if it is too high, the variances do not separate. These findings suggest that the results presented in this paper hold broadly, as long as neither module is extremely undersized.

(EPS)


S4 Fig. Extended examination of α_s in the range 0.06–0.16: MSE between the optimal value y_opt(t) and the RNN output y(t), plotted against the time scale α_s.

The model was trained and tested with (a) p_t = 0.03 and (b) p_t = 0.1. Our main analysis showed that accurate Bayesian inference is achievable when the sub-module is slow (small α_s). In Fig 3, the MSE was larger for α_s ≲ 0.01 or α_s ≳ 0.2 and smaller for α_s ∼ 0.05–0.2. Motivated by this, we investigated whether finer variation of α_s would pinpoint an optimal value and whether the outcome depends on p_t. We varied α_s from 0.06 to 0.16. As shown in S4 Fig, there was no significant difference within this range, irrespective of p_t.

(EPS)


S5 Fig. Time scale α for different p_t: Frequency distribution of α for the model trained in (a) p_t = 0.03 setting, (b) p_t = 0.1 setting, and (c) p_t = 0.3 setting.

The peak on the smaller-α side shifted to larger values as p_t increased, whereas the peak on the larger-α side remained at 1.

(EPS)


Acknowledgments

We thank Koji Hukushima and Yasushi Nagano for stimulating discussions.

Data Availability

Source codes for these models can be found at https://github.com/tripdancer0916/slow-reservoir.

Funding Statement

This study was partially supported by a Grant-in-Aid for Scientific Research (A) (20H00123) from the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) of Japan and by MIC under a grant entitled “R&D of ICT Priority Technology (JPMI00316)” in part. KK is supported by Novo Nordisk Fonden. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Sokoloski S. Implementing a Bayes Filter in a Neural Circuit: The Case of Unknown Stimulus Dynamics. Neural Computation. 2017;29(9):2450–2490. doi: 10.1162/neco_a_00991
2. Knill DC, Pouget A. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences. 2004;27(12):712–719. doi: 10.1016/j.tins.2004.10.007
3. Moreno-Bote R, Knill DC, Pouget A. Bayesian sampling in visual perception. Proceedings of the National Academy of Sciences. 2011;108(30):12491–12496. doi: 10.1073/pnas.1101430108
4. Angelaki DE, Gu Y, DeAngelis GC. Multisensory integration: psychophysics, neurophysiology, and computation. Current Opinion in Neurobiology. 2009;19(4):452–458. doi: 10.1016/j.conb.2009.06.008
5. Haefner RM, Berkes P, Fiser J. Perceptual Decision-Making as Probabilistic Inference by Neural Sampling. Neuron. 2016;90(3):649–660. doi: 10.1016/j.neuron.2016.03.020
6. Ernst MO, Banks MS. Humans integrate visual and haptic information in a statistically optimal fashion. Nature. 2002;415(6870):429–433. doi: 10.1038/415429a
7. Merfeld DM, Zupan L, Peterka RJ. Humans use internal models to estimate gravity and linear acceleration. Nature. 1999;398(6728):615–618. doi: 10.1038/19303
8. Doya K, Ishii S, Pouget A, Rao RPN. Bayesian Brain: Probabilistic Approaches to Neural Coding. MIT Press; 2007.
9. Friston K. The history of the future of the Bayesian brain. NeuroImage. 2012;62(2):1230–1233. doi: 10.1016/j.neuroimage.2011.10.004
10. Pouget A, Beck JM, Ma WJ, Latham PE. Probabilistic brains: knowns and unknowns. Nature Neuroscience. 2013;16(9):1170–1178. doi: 10.1038/nn.3495
11. Beck JM, Latham PE, Pouget A. Marginalization in Neural Circuits with Divisive Normalization. Journal of Neuroscience. 2011;31(43):15310–15319. doi: 10.1523/JNEUROSCI.1706-11.2011
12. Geisler WS, Kersten D. Illusions, perception and Bayes. Nature Neuroscience. 2002;5(6):508–510. doi: 10.1038/nn0602-508
13. Honig M, Ma WJ, Fougnie D. Humans incorporate trial-to-trial working memory uncertainty into rewarded decisions. Proceedings of the National Academy of Sciences. 2020;117(15):8391–8397. doi: 10.1073/pnas.1918143117
14. Chung J, Kastner K, Dinh L, Goel K, Courville AC, Bengio Y. A Recurrent Latent Variable Model for Sequential Data. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, editors. Advances in Neural Information Processing Systems. vol. 28. Curran Associates, Inc.; 2015.
15. Ahmadi A, Tani J. A Novel Predictive-Coding-Inspired Variational RNN Model for Online Prediction and Recognition. Neural Computation. 2019;31(11):2025–2074. doi: 10.1162/neco_a_01228
16. Orhan AE, Ma WJ. Efficient probabilistic inference in generic neural networks trained with non-probabilistic feedback. Nature Communications. 2017;8(1):138. doi: 10.1038/s41467-017-00181-8
17. Quax SC, Bosch SE, Peelen MV, van Gerven MAJ. Population codes of prior knowledge learned through environmental regularities. Scientific Reports. 2021;11(1):640. doi: 10.1038/s41598-020-79366-z
18. Vilares I, Howard JD, Fernandes HL, Gottfried JA, Kording KP. Differential Representations of Prior and Likelihood Uncertainty in the Human Brain. Current Biology. 2012;22(18):1641–1648. doi: 10.1016/j.cub.2012.07.010
19. Chan SCY, Niv Y, Norman KA. A Probability Distribution over Latent Causes, in the Orbitofrontal Cortex. Journal of Neuroscience. 2016;36(30):7817–7828. doi: 10.1523/JNEUROSCI.0659-16.2016
20. d’Acremont M, Schultz W, Bossaerts P. The Human Brain Encodes Event Frequencies While Forming Subjective Beliefs. Journal of Neuroscience. 2013;33(26):10887–10897. doi: 10.1523/JNEUROSCI.5829-12.2013
21. Murray JD, Bernacchia A, Freedman DJ, Romo R, Wallis JD, Cai X, et al. A hierarchy of intrinsic timescales across primate cortex. Nature Neuroscience. 2014;17(12):1661–1663. doi: 10.1038/nn.3862
22. Cavanagh SE, Hunt LT, Kennerley SW. A Diversity of Intrinsic Timescales Underlie Neural Computations. Frontiers in Neural Circuits. 2020;14. doi: 10.3389/fncir.2020.615626
23. Golesorkhi M, Gomez-Pilar J, Zilio F, Berberian N, Wolff A, Yagoub MCE, et al. The brain and its time: intrinsic neural timescales are key for input processing. Communications Biology. 2021;4(1):970. doi: 10.1038/s42003-021-02483-6
24. Amunts K, DeFelipe J, Pennartz C, Destexhe A, Migliore M, Ryvlin P, et al. Linking Brain Structure, Activity, and Cognitive Function through Computation. eNeuro. 2022;9(2). doi: 10.1523/ENEURO.0316-21.2022
25. Mastrogiuseppe F, Ostojic S. Linking Connectivity, Dynamics, and Computations in Low-Rank Recurrent Neural Networks. Neuron. 2018;99(3):609–623.e29. doi: 10.1016/j.neuron.2018.07.003
26. Vyas S, Golub MD, Sussillo D, Shenoy KV. Computation Through Neural Population Dynamics. Annual Review of Neuroscience. 2020;43(1):249–275. doi: 10.1146/annurev-neuro-092619-094115
27. Beiran M, Dubreuil A, Valente A, Mastrogiuseppe F, Ostojic S. Shaping Dynamics With Multiple Populations in Low-Rank Recurrent Networks. Neural Computation. 2021;33(6):1572–1615. doi: 10.1162/neco_a_01381
28. Papo D. Time scales in cognitive neuroscience. Frontiers in Physiology. 2013;4. doi: 10.3389/fphys.2013.00086
29. Barak O. Recurrent neural networks as versatile tools of neuroscience research. Curr Opin Neurobiol. 2017;46:1–6. doi: 10.1016/j.conb.2017.06.003
30. Nair V, Hinton GE. Rectified Linear Units Improve Restricted Boltzmann Machines. In: Fürnkranz J, Joachims T, editors. ICML. Omnipress; 2010. p. 807–814. Available from: http://dblp.uni-trier.de/db/conf/icml/icml2010.html#NairH10.
31. Ma WJ, Beck JM, Latham PE, Pouget A. Bayesian inference with probabilistic population codes. Nature Neuroscience. 2006;9(11):1432–1438. doi: 10.1038/nn1790
32. Ichikawa K, Kataoka A. Dynamical Mechanism of Sampling-Based Probabilistic Inference Under Probabilistic Population Codes. Neural Computation. 2022;34(3):804–827. doi: 10.1162/neco_a_01477
33. Swindale NV. Orientation tuning curves: empirical description and estimation of parameters. Biological Cybernetics. 1998;78(1):45–56. doi: 10.1007/s004220050411
34. Rumelhart DE, Hinton GE, Williams RJ. In: Learning Internal Representations by Error Propagation. Cambridge, MA, USA: MIT Press; 1986. p. 318–362.
35. Werbos PJ. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE. 1990;78(10):1550–1560. doi: 10.1109/5.58337
36. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization; 2014. Available from: http://arxiv.org/abs/1412.6980.
37. Bishop CM. Pattern Recognition and Machine Learning. Springer; 2006.
38. Mante V, et al. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature. 2013;503:78–84. doi: 10.1038/nature12742
39. Ichikawa K, Kaneko K. Short-term memory by transient oscillatory dynamics in recurrent neural networks. Phys Rev Research. 2021;3:033193. doi: 10.1103/PhysRevResearch.3.033193
40. Penrose R. A generalized inverse for matrices. Mathematical Proceedings of the Cambridge Philosophical Society. 1955;51(3):406–413. doi: 10.1017/S0305004100030401
41. Perez-Nieves N, Leung VCH, Dragotti PL, Goodman DFM. Neural heterogeneity promotes robust learning. Nature Communications. 2021;12(1):5791. doi: 10.1038/s41467-021-26022-3
42. Yang GR, Joglekar MR, Song HF, Newsome WT, Wang XJ. Task representations in neural networks trained to perform many cognitive tasks. Nature Neuroscience. 2019;22(2):297–306. doi: 10.1038/s41593-018-0310-2
43. Mochol G, Kiani R, Moreno-Bote R. Prefrontal cortex represents heuristics that shape choice bias and its integration into future behavior. Current Biology. 2021;31(6):1234–1244.e6. doi: 10.1016/j.cub.2021.01.068
44. Yamashita Y, Tani J. Emergence of Functional Hierarchy in a Multiple Timescale Neural Network Model: A Humanoid Robot Experiment. PLOS Computational Biology. 2008;4(11):1–18. doi: 10.1371/journal.pcbi.1000220
45. Kurikawa T, Kaneko K. Multiple-Timescale Neural Networks: Generation of History-Dependent Sequences and Inference Through Autonomous Bifurcations. Frontiers in Computational Neuroscience. 2021;15. doi: 10.3389/fncom.2021.743537
46. Tanaka G, Matsumori T, Yoshida H, Aihara K. Reservoir computing with diverse timescales for prediction of multiscale dynamics. Phys Rev Research. 2022;4:L032014. doi: 10.1103/PhysRevResearch.4.L032014
47. Yamaguti Y, Tsuda I. Functional differentiations in evolutionary reservoir computing networks. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2021;31(1):013137. doi: 10.1063/5.0019116
48. Bengio Y, Lee D, Bornschein J, Lin Z. Towards Biologically Plausible Deep Learning. ArXiv. 2015;abs/1502.04156.
49. Lillicrap TP, Cownden D, Tweed DB, Akerman CJ. Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications. 2016;7(1):13276. doi: 10.1038/ncomms13276
50. Richards BA, Lillicrap TP, Beaudoin P, Bengio Y, Bogacz R, Christensen A, et al. A deep learning framework for neuroscience. Nature Neuroscience. 2019;22(11):1761–1770. doi: 10.1038/s41593-019-0520-2
51. Yang GR, Wang XJ. Artificial Neural Networks for Neuroscientists: A Primer. Neuron. 2020;107(6):1048–1070. doi: 10.1016/j.neuron.2020.09.005
52. Barak O, Sussillo D, Romo R, Tsodyks M, Abbott LF. From fixed points to chaos: Three models of delayed discrimination. Progress in Neurobiology. 2013;103:214–222. doi: 10.1016/j.pneurobio.2013.02.002
53. Cueva CJ, Wei XX. Emergence of grid-like representations by training recurrent neural networks to perform spatial localization. In: International Conference on Learning Representations; 2018. Available from: https://openreview.net/forum?id=B17JTOe0-.
54. Yamins DLK, DiCarlo JJ. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience. 2016;19(3):356–365. doi: 10.1038/nn.4244
55. Haesemeyer M, Schier AF, Engert F. Convergent Temperature Representations in Artificial and Biological Neural Networks. Neuron. 2019;103(6):1123–1134.e6. doi: 10.1016/j.neuron.2019.07.003
56. Banino A, Barry C, Uria B, Blundell C, Lillicrap T, Mirowski P, et al. Vector-based navigation using grid-like representations in artificial agents. Nature. 2018;557(7705):429–433. doi: 10.1038/s41586-018-0102-6
57. Dubreuil A, Valente A, Beiran M, Mastrogiuseppe F, Ostojic S. The role of population structure in computations through neural dynamics. Nature Neuroscience. 2022;25(6):783–794. doi: 10.1038/s41593-022-01088-4
58. Lorenz DM, Jeng A, Deem MW. The emergence of modularity in biological systems. Physics of Life Reviews. 2011;8(2):129–160. doi: 10.1016/j.plrev.2011.02.003
59. Kashtan N, Alon U. Spontaneous evolution of modularity and network motifs. Proceedings of the National Academy of Sciences. 2005;102(39):13773–13778. doi: 10.1073/pnas.0503610102

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1011897.r001

Decision Letter 0

Ulrik R Beierholm, Marieke Karlijn van Vugt

7 Apr 2023

Dear student Ichikawa,

Thank you very much for submitting your manuscript "Bayesian inference is facilitated by modular neural networks with different time scales" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

Note especially that some of the reviewers highlighted issues with the clarity of the writing.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Ulrik R. Beierholm

Academic Editor

PLOS Computational Biology

Marieke van Vugt

Section Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors present computational work showing that RNNs can learn to represent prior and likelihood information in a close-to-optimal manner by obeying Bayes' rule, and that this is accomplished by separate submodules that specialize to represent the prior with slow neurons and the likelihood with fast neurons. Overall, the paper is interesting and provides some novel results.

The introduction can be improved. It does not do justice to the interesting results that follow in the Results section. Also, in general the quality of English can be improved.

I wonder what the optimal architecture would be in the limit where p_t (the transition probability) goes to zero. I would assume that in this limit no modular architecture is needed, as the prior is fixed and does not change over time (and it can be stored in the weights of the network). Could the authors perform simulations of this case, and compare to the limit of p_t going to zero, as in Fig. 3b?

More details describing the learning method are needed. It is unclear how learning works when the prior distribution changes over time with probability p_t. Is the prior relearned at each step, or does one use all the data to train the network? For how long are the episodes? If all the data are used, the network will naturally adapt to the time scale of the changes, but more discussion is needed.
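Reviewer #1's questions about p_t (and Reviewer #2's later question of how often u is sampled) hinge on how the input generator switches. A minimal sketch of one reading, in Python with NumPy: two Gaussian generators with the (μ, σ²) values from the S1 Fig caption, a switch at each time step with probability p_t, and one sample drawn per step. The per-step sampling and the function names are assumptions for illustration, not the authors' code. Under this reading the time a generator stays active is geometric with mean 1/p_t, so p_t → 0 corresponds to a fixed prior.

```python
import numpy as np

# Generator parameters (mu, sigma^2) taken from the S1 Fig caption.
GENERATORS = [(-0.5, 0.04), (0.5, 0.04)]


def generate_switching_inputs(n_steps, p_t, generators, seed=0):
    """Sample a piecewise-stationary input sequence (hypothetical sketch).

    At every step after the first, the active generator is swapped with
    probability p_t; one sample is then drawn from N(mu, sigma^2) of the
    active generator. Sampling once per step is an assumption.
    """
    rng = np.random.default_rng(seed)
    active = np.empty(n_steps, dtype=int)
    samples = np.empty(n_steps)
    k = 0  # index of the currently active generator
    for t in range(n_steps):
        if t > 0 and rng.random() < p_t:
            k = 1 - k  # switch to the other generator
        mu, var = generators[k]
        active[t] = k
        samples[t] = rng.normal(mu, np.sqrt(var))
    return active, samples


def mean_dwell_time(active):
    """Average length of the runs during which one generator stays active."""
    change_points = np.flatnonzero(np.diff(active)) + 1
    boundaries = np.concatenate(([0], change_points, [len(active)]))
    return np.diff(boundaries).mean()


active, u = generate_switching_inputs(100_000, p_t=0.1, generators=GENERATORS)
print(mean_dwell_time(active))  # should be close to 1 / p_t = 10
```

Passing p_t=0.0 reproduces the fixed-prior limit Reviewer #1 asks about, since the first generator then stays active throughout.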

The authors claim in the discussion that in the brain different subnetworks are in charge of encoding priors and likelihoods. I think that the authors are right that, generally speaking, slow, high-level variables are encoded in higher brain areas than fast, low-level variables. However, there are some exceptions. For instance, the work by Mochol et al., Current Biology, 2021, shows that the prefrontal cortex encodes both priors and likelihoods in the same brain area. Therefore, I would rewrite the discussion to be more cautious about this.

In Fig. 8, I would not use transparent colors to display the distributions.

Line 3: animal brain -> animal brains

Line 83: are the noise terms assumed to be independent?

Line 88-89: ‘and’ in cursive

Line 208: unclear why y depends explicitly on s. I thought that it would depend only on x_m explicitly.
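On the Line 208 comment: if the prior and the observation noise are both Gaussian, the Bayes-optimal estimate (the posterior mean) is a precision-weighted average of the prior mean and the current observation, so it does depend explicitly on the observation (s, if s denotes the noisy input). The sketch below states this under those Gaussian assumptions; the function and variable names are hypothetical, and the paper's own definitions of y and y_opt may differ in detail.

```python
import numpy as np


def posterior_mean(s, mu_p, var_p, var_l):
    """Posterior mean of a latent stimulus given one noisy observation s.

    Assumes a Gaussian prior N(mu_p, var_p) and Gaussian observation noise
    with variance var_l; the result is a precision-weighted average.
    """
    w_obs = var_p / (var_p + var_l)  # weight placed on the observation
    return w_obs * s + (1.0 - w_obs) * mu_p


def posterior_var(var_p, var_l):
    """Posterior variance: the combined precision is the sum of precisions."""
    return 1.0 / (1.0 / var_p + 1.0 / var_l)


# Prior of generator B from the S1 Fig caption; the noise variance is arbitrary.
s = np.array([-1.0, 0.0, 1.0])
print(posterior_mean(s, mu_p=0.5, var_p=0.04, var_l=0.04))
print(posterior_var(var_p=0.04, var_l=0.04))
```

With equal prior and noise variances the observation weight is 0.5, so observations −1, 0 and 1 under the generator-B prior (mean 0.5) give −0.25, 0.25 and 0.75; as var_l grows the estimate falls back to the prior mean.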

Reviewer #2: When performing Bayesian inference, the "prior" often changes abruptly, and for inference to be accurate these changes must be tracked. Here the authors investigate a modular network with both slow and fast timescales. They find an optimal timescale in the slow network. And, they also show that the timescale, and modular structure, can be learned, something that is very important.

Overall, this is an interesting paper that, I think, should be published. However, I got pretty lost in places, and the authors should clarify things before it's released to the public. The following comments should clarify my confusion (and also point out other issues, which were more minor).

1. Larger font on figures would be greatly appreciated -- some were _very_ hard to read (because I'm old ;)).

2. l 80, typo, I think: "= 0" --> "= z".

3. Fig. 3: the y-axes should start at zero, to make it easy to see how big the effect is.

4. Fig. 3 (especially blue points in panel b): why not do more runs to decrease the error bars?

5. l 168: Fig. 2b should be Fig. 3b, I think.

6. l 195-232: Here I got very lost. You talk about setting mu_g and sigma_g for each module. But you can't do that, right? Instead, all you can do is determine how the input is generated. So the analysis doesn't make sense to me.

7. l 264: I got somewhat confused here as well. It seems to me that mu_p and sigma_p are internal values that are not directly accessible. If so, how can you find a transformation from x to mu_p (l 272)?

That said, the result of this analysis -- that alpha_s=0.1 uses more past information than alpha_s=1 -- is not surprising. But it doesn't explain why alpha_s=0.1 is better for rapid switches. Maybe that explanation was where I got lost?
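Items 6 and 7 (and item 8 of the second-round review) concern how a quantity such as mu_p can be read out of the network state x. One standard approach, consistent with Reviewer #2's later reading, is to fit a linear map from x to the known generator mean mu_g by least squares and treat the fitted readout as the network's internal estimate. The sketch below uses random stand-in states rather than real network activity; whether it matches the authors' exact procedure is an assumption.

```python
import numpy as np


def fit_linear_decoder(states, targets):
    """Least-squares fit of targets ≈ states @ w + b.

    states: (T, N) array of network activity; targets: (T,) array, for
    example the generator mean mu_g at each time step.
    """
    design = np.column_stack([states, np.ones(len(states))])  # append a bias column
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef[:-1], coef[-1]


def decode(states, w, b):
    """Apply the fitted readout; under this reading, the result is the decoded mu_p."""
    return states @ w + b


rng = np.random.default_rng(1)
states = rng.normal(size=(500, 20))  # hypothetical stand-in for recorded x(t)
w_true = rng.normal(size=20)
targets = states @ w_true + 0.3
w, b = fit_linear_decoder(states, targets)
print(np.allclose(w, w_true), np.isclose(b, 0.3))  # True True
```

Because the synthetic targets here are an exact linear function of the states, the fit recovers the weights and bias; with real activity, the residual of the fit indicates how linearly decodable mu_g is from x.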

Reviewer #3: # Paper synopsis

In the paper *Bayesian inference is facilitated by modular neural networks with different time scales*, Kohei Ichikawa and Kunihiko Kaneko explore how recurrent neural networks capture short and long timescales effects in a dynamic Bayesian inference problem. In particular, they study how a modular RNN architecture with different timescales for their respective neural activity dynamics can better predict a noisy stimulus than an RNN with a simpler architecture. The paper quantitatively assesses numerous features of the modular architecture, and overall demonstrates that the modular RNN learns to represent long-time scale, prior information in the slow timescale submodule, whereas the fast timescale module learns to focus on the immediate problem of identifying the noisy stimulus.

# Review and general recommendations

Overall I did like aspects of this paper, and I think the questions the authors raise about the role of timescales in inference are quite interesting. Moreover, the simulations they perform take some steps towards answering them. Nevertheless, the paper in its present form has fundamental issues:

(i) Overall the paper feels underdeveloped. Although particular explanations are written well enough, a lot of the text feels meandering, incomplete, and hard to follow. The figures also suffer from often conveying relatively little information, and with poor formatting and hard to read text. As such, this is a frustrating paper to review, because although I think some of the content is good, I feel like it's conveyed quite poorly. I feel the authors sent off this manuscript prematurely.

(ii) Similarly, the simulations presented in the paper are extremely limited both in breadth and quantity. As a neural network paper where the simulations are presumably not compute-intensive, I would like to see much broader testing of the hyperparameters of the network, from timescales, to number of neurons, to switching probabilities. Can the authors provide cases where the modularity of the network is *not* necessary? If the authors wish to support their claims with simulations rather than analysis, then the simulations must be to a much higher standard.

Overall, based on the current amount of content, it feels like a tightly written version of this paper could fit into 5 pages. I apologize that such a comment does not provide the authors with much guidance, but in its present form the paper feels too underdeveloped to identify exactly where to fill it in. These issues prevent me from recommending the paper for publication, and the authors will have to undertake major revisions to prepare it for publication.

# Local comments

(2023-04-07, 10:45:14 a.m.)

“when performing Bayesian inference by the brain.” (p. 1) redundant

“uncertainty” (p. 2) please cite Sokoloski in Neural Computation, 2017

“the” (p. 3) a

“the results to be discussed are not altered as long as both the numbers are sufficient (say 100 vs. 50, 150 vs. 150 for fast and slow neurons)” (p. 4) a bit more detail

“xmandxs” (p. 4) typo

“αmandαs” (p. 4) typo

“Then” (p. 5) this connecting word is confusing. Does this sentence follow from the previous?

“the ease of firing” (p. 5) tuning curve width?

“Therefore, training was performed to minimize the mean squared error (MSE) between the neural network output y(t) and the true value ytrue(t)” (p. 5) this is an unrealistic assumption and should be justified more.

“Hyperparameters” (p. 6) the amount of detail here is excessive. Focus on the key parameters

“pt” (p. 6) Why isn't this variable on the other side?

“Fig 4.” (p. 8) This figure only provides a weak demonstration of the point

“The network with αs = 1 could not follow rapid input changes, whereas that with αs = 0.1 could estimate the input prior effectively.” (p. 8) it would be nice to have some kind of explanation of how the alpha s timescale compared with the timescale of the switching. Is there an optimal alpha s that depends on switch speed?

“Fig 6” (p. 10) Labels and text are too small in figures. Normalize text sizes across all figures.

“SI for the case” (p. 11) supplementary information? Please improve reference

“Effects of different time scales” (p. 11) This section seems far harder to read than it should be, given the simplicity of the point the authors are trying to make

“To examine the impact of αs differences on Bayesian inference accuracy in detail, we considered how each model with αs = 1 and αs = 0.1 represents prior as a function of the input signal.” (p. 11) At some point I would like to see key model hyperparameters thoroughly grid searched

“ak defined by Eq.14 is plotted against t, for the model with αs = 1 and αs = 0.1 using 3000 data points.” (p. 12) Figure text is unhelpful. It should be possible to have some understanding of the figure based on the caption

“Then,” (p. 12) You mean to say this is what this section is going to be about? This makes it sound like you've already done it

“normal RNN” (p. 12) reference earlier equations?

“In summary, a modular structure, shown in Fig.1(b), emerged through learning alone.” (p. 12) The way this section is set up, it feels like it trivializes the earlier results of the paper. You could (and maybe should) simply start the paper with this section, show that 0.1 is optimal, and then analyze the results throughout the rest of the paper. I spent a lot of time in this paper trying to guess why the authors chose particular settings for various hyperparameters.

“Future research should investigate how this optimal time scale depends on the time scale of environmental changes.” (p. 13) I think this research needs to be in this study
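Reviewer #3's comments on α_s ask how the time scale compares with that of the switching. For a linear leaky unit with update x(t+1) = (1 − α) x(t) + α u(t), an assumed simplification of the paper's Eq. 1 that drops recurrence and the nonlinearity, a past input decays as (1 − α)^k, so its contribution shrinks by a factor e after −1/ln(1 − α) steps: about 9.5 steps for α = 0.1, and none for α = 1, where the previous state is discarded. For comparison, a generator switching with probability p_t per step stays active for 1/p_t steps on average, about 33, 10 and 3.3 steps for the p_t values in S5 Fig. A minimal sketch with hypothetical function names:

```python
import numpy as np


def impulse_response(alpha, n_steps):
    """State of x_{t+1} = (1 - alpha) x_t + alpha u_t after a unit impulse.

    Hypothetical linear stand-in for one leaky unit: x_0 = 0, u_0 = 1 and
    u_t = 0 afterwards; entry k is the state k + 1 steps after the impulse.
    """
    x = np.empty(n_steps)
    x[0] = alpha  # the impulse enters scaled by alpha
    for k in range(1, n_steps):
        x[k] = (1.0 - alpha) * x[k - 1]  # only the leak acts afterwards
    return x


def e_folding_steps(alpha):
    """Steps needed for a past state's contribution to shrink by a factor e."""
    if alpha >= 1.0:
        return 0.0  # with alpha = 1 the previous state is discarded every step
    return -1.0 / np.log(1.0 - alpha)


for alpha in (1.0, 0.1):
    print(alpha, impulse_response(alpha, 5), e_folding_steps(alpha))
```

The direction is consistent with S5 Fig, where the learned slow α grows as p_t increases, although this linear sketch does not by itself predict the optimal value.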

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Sacha Sokoloski

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Comput Biol. 2024 Mar 13;20(3):e1011897. doi: 10.1371/journal.pcbi.1011897.r002

Author response to Decision Letter 0

8 Nov 2023

Attachment

Submitted filename: reply_to_referee_report.pdf

pcbi.1011897.s006.pdf (131.6 KB, PDF)

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1011897.r003

Decision Letter 1

Ulrik R Beierholm, Marieke Karlijn van Vugt

29 Nov 2023

Dear student Ichikawa,

Thank you very much for submitting your manuscript "Bayesian inference is facilitated by modular neural networks with different time scales" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Ulrik R. Beierholm

Academic Editor

PLOS Computational Biology

Marieke van Vugt

Section Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Thanks for the improvements in the paper.

Reviewer #2: I still like this paper, and it's better, but it's also still confusing in places. Comments follow, more or less in order of appearance.

1. Typo in Eq. 2: should be N_m < i, not N_m ≤ i.

2. How can you run with alpha=1? In that regime doesn't x diffuse off to infinity?

3. How often is u sampled? Every time step? Or only when things change?

4. p_t is introduced on line 221, but I don't think you ever told us what it is.

6. Lines 238-9, "When a(t) is close to 1, the model’s prior is closer to generator B, and when a(t) is close to -1, it is closer to prior A." Shouldn't it be 0 and 1, not 1 and -1?

7. Fig. 2 is a bit unsatisfying. What we would like to see is a plot of y versus s for different values of mu_g and sigma_g. Along with a summary statistic.

8. The mu_p explanation was confusing. As far as I can tell, you fit a matrix to mu_g = W_{mu_p} x, and then declared that mu_p = W_{mu_p} x? Is that correct? If so, you should say so; if not, I'm lost.

9. Lines 362-4: "Here, note that the findings that the modular structure with slow/fast time scales works better for Bayesian inference do not necessarily imply that the structure can be achievable just by learning." I believe you're just saying that it's not clear that alpha can be learned. If so, you should say that. If not, I'm lost.
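On item 2: whether x diffuses at α = 1 depends on the form of the update. Assuming a leaky-integrator form x(t+1) = (1 − α) x(t) + α (drive), α = 1 simply overwrites the previous state each step, whereas unbounded diffusion is characteristic of a pure integrator x(t+1) = x(t) + η(t), whose variance grows as tσ². The sketch below replaces the recurrent drive with white noise, an assumption made only to isolate the leak, and checks that the variance settles at ασ²/(2 − α) for α in (0, 1]; the function names are hypothetical.

```python
import numpy as np


def simulate_leaky(alpha, n_steps, sigma=1.0, seed=0):
    """Simulate x_{t+1} = (1 - alpha) x_t + alpha * eta_t, eta_t ~ N(0, sigma^2).

    Hypothetical linear stand-in for one unit of a leaky RNN; the paper's
    update also contains recurrent input and a nonlinearity.
    """
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma, size=n_steps)
    x = np.zeros(n_steps)
    for t in range(1, n_steps):
        x[t] = (1.0 - alpha) * x[t - 1] + alpha * eta[t - 1]
    return x


def stationary_variance(alpha, sigma=1.0):
    """Stationary Var[x] = alpha^2 sigma^2 / (1 - (1 - alpha)^2) = alpha sigma^2 / (2 - alpha)."""
    return alpha * sigma**2 / (2.0 - alpha)


for alpha in (1.0, 0.1):
    x = simulate_leaky(alpha, 200_000)
    # Discard a burn-in period, then compare the empirical and predicted variance.
    print(alpha, x[1000:].var(), stationary_variance(alpha))
```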

Reviewer #3: This article is much improved, and the authors have addressed the bulk of my concerns. Here are some additional comments to help further improve the manuscript.

p. 3, l. 44: "...corresponds to the higher layer in the brain." This statement needs to be rephrased and made more precise.

p. 3, l. 59: "...the prior more appropriately than regular RNN." Explain what you mean by this.

p. 3, Eq. 1: Make sure to introduce all variables in the equation, even if you can only fully define them later in the text.

p. 4, l. 153: You should state somewhere here exactly which parameters of the neural network you're optimizing.

p. 10, l. 279: "...strongly reflects the difference in the prior distribution to the difference in output, respectively." This sentence is unclear.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

Data Requirements:

Reproducibility:

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. 2024 Mar 13;20(3):e1011897. doi: 10.1371/journal.pcbi.1011897.r004

Author response to Decision Letter 1

8 Jan 2024

Attachment

Submitted filename: reply_to_referee_report.pdf

pcbi.1011897.s007.pdf (123.8 KB, PDF)

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1011897.r005

Decision Letter 2

Thomas Serre

6 Feb 2024

Dear student Ichikawa,

We are pleased to inform you that your manuscript 'Bayesian inference is facilitated by modular neural networks with different time scales' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Thomas Serre

Section Editor

PLOS Computational Biology

Marieke van Vugt

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #2: I'm happy with the revisions.

I do, however, have one more comment. The first few times I looked at Eq. 1, I thought that I was the identity matrix and alpha was a diagonal matrix. However, I just noticed that I and alpha are vectors. That's fine, but if so you need to make it clear that you're using element-wise multiplication; e.g., alpha x is a vector whose i^th component is alpha_i x_i. Alternatively, you could make this standard and change I to the identity matrix and alpha to a diagonal matrix.

Either is fine with me, and I definitely don't need to see the paper again.
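Reviewer #2's point about Eq. 1 can be checked numerically: with α stored as a vector, the element-wise product α ⊙ x equals diag(α) x, and (1 − α) ⊙ x equals (I − diag(α)) x with I the identity matrix. The sketch assumes, for illustration only, 150 main-module neurons with α_m = 1 followed by 150 sub-module neurons (1-based indices i > N_m) with α_s = 0.1; the module sizes echo Reviewer #3's quote of the manuscript, and α_m = 1 is an assumption.

```python
import numpy as np

# Hypothetical module sizes and time scales: the first N_m neurons form the
# main module (alpha_m), the remaining N_s neurons (1-based i > N_m) the sub-module.
N_m, N_s = 150, 150
alpha_m, alpha_s = 1.0, 0.1
alpha = np.concatenate([np.full(N_m, alpha_m), np.full(N_s, alpha_s)])

rng = np.random.default_rng(0)
x = rng.normal(size=N_m + N_s)

elementwise = alpha * x         # vector alpha, element-wise: (alpha * x)_i = alpha_i * x_i
as_matrix = np.diag(alpha) @ x  # the same product written with a diagonal matrix
print(np.allclose(elementwise, as_matrix))  # True

# Likewise, (1 - alpha) * x equals (I - diag(alpha)) @ x with I the identity matrix.
identity = np.eye(N_m + N_s)
print(np.allclose((1 - alpha) * x, (identity - np.diag(alpha)) @ x))  # True
```

The element-wise form avoids building an N × N matrix, while the diagonal-matrix form keeps the equation in standard linear-algebra notation; both describe the same update.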

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1011897.r006

Acceptance letter

Thomas Serre

27 Feb 2024

PCOMPBIOL-D-23-00148R2

Bayesian inference is facilitated by modular neural networks with different time scales

Dear Dr Ichikawa,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Fig. Trajectory of the internal state x_sub(t) of the sub-module when generator A ((μ_A, σ_A^2) = (−0.5, 0.04)) and generator B ((μ_B, σ_B^2) = (0.5, 0.04)) switch alternately.

(EPS)

pcbi.1011897.s001.eps (413.3 KB, EPS)

S2 Fig

(EPS)

pcbi.1011897.s002.eps (482.2 KB, EPS)

S3 Fig

(EPS)

pcbi.1011897.s003.eps (871.6 KB, EPS)

S4 Fig. Extended examination of α_s in the range 0.06 to 0.16: MSE between the optimal value y_opt(t) and the output of the RNN y(t), plotted against the time scale α_s.

(EPS)

pcbi.1011897.s004.eps (1.1 MB, EPS)

S5 Fig. Time scale α for different p_t: Frequency distribution of α for the model trained in (a) p_t = 0.03 setting, (b) p_t = 0.1 setting, and (c) p_t = 0.3 setting.

The peak on the smaller-α side shifted to a larger value as p_t increased, whereas the peak on the larger-α side remained at 1.

(EPS)

pcbi.1011897.s005.eps (1.1 MB, EPS)


Data Availability Statement

Source codes for these models can be found at https://github.com/tripdancer0916/slow-reservoir.

[pcbi.1011897.ref001] 1. Sokoloski S. Implementing a Bayes Filter in a Neural Circuit: The Case of Unknown Stimulus Dynamics. Neural Computation. 2017;29(9):2450–2490. doi: 10.1162/neco_a_00991 [DOI] [PubMed] [Google Scholar]

[pcbi.1011897.ref002] 2. Knill DC, Pouget A. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences. 2004;27(12):712–719. doi: 10.1016/j.tins.2004.10.007 [DOI] [PubMed] [Google Scholar]

[pcbi.1011897.ref003] 3. Moreno-Bote R, Knill DC, Pouget A. Bayesian sampling in visual perception. Proceedings of the National Academy of Sciences. 2011;108(30):12491–12496. doi: 10.1073/pnas.1101430108 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref004] 4. Angelaki DE, Gu Y, DeAngelis GC. Multisensory integration: psychophysics, neurophysiology, and computation. Current Opinion in Neurobiology. 2009;19(4):452–458. doi: 10.1016/j.conb.2009.06.008 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref005] 5. Haefner RM, Berkes P, Fiser J. Perceptual Decision-Making as Probabilistic Inference by Neural Sampling. Neuron. 2016;90(3):649–660. doi: 10.1016/j.neuron.2016.03.020 [DOI] [PubMed] [Google Scholar]

[pcbi.1011897.ref006] 6. Ernst MO, Banks MS. Humans integrate visual and haptic information in a statistically optimal fashion. Nature. 2002;415(6870):429–433. doi: 10.1038/415429a [DOI] [PubMed] [Google Scholar]

[pcbi.1011897.ref007] 7. Merfeld DM, Zupan L, Peterka RJ. Humans use internal models to estimate gravity and linear acceleration. Nature. 1999;398(6728):615–618. doi: 10.1038/19303 [DOI] [PubMed] [Google Scholar]

[pcbi.1011897.ref008] 8. Doya K, Ishii S, Pouget A, Rao RPN. Bayesian Brain: Probabilistic Approaches to Neural Coding. MIT Press; 2007. [Google Scholar]

[pcbi.1011897.ref009] 9. Friston K. The history of the future of the Bayesian brain. NeuroImage. 2012;62(2):1230–1233. doi: 10.1016/j.neuroimage.2011.10.004 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref010] 10. Pouget A, Beck JM, Ma WJ, Latham PE. Probabilistic brains: knowns and unknowns. Nature Neuroscience. 2013;16(9):1170–1178. doi: 10.1038/nn.3495 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref011] 11. Beck JM, Latham PE, Pouget A. Marginalization in Neural Circuits with Divisive Normalization. Journal of Neuroscience. 2011;31(43):15310–15319. doi: 10.1523/JNEUROSCI.1706-11.2011 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref012] 12. Geisler WS, Kersten D. Illusions, perception and Bayes. Nature Neuroscience. 2002;5(6):508–510. doi: 10.1038/nn0602-508 [DOI] [PubMed] [Google Scholar]

[pcbi.1011897.ref013] 13. Honig M, Ma WJ, Fougnie D. Humans incorporate trial-to-trial working memory uncertainty into rewarded decisions. Proceedings of the National Academy of Sciences. 2020;117(15):8391–8397. doi: 10.1073/pnas.1918143117 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref014] 14. Chung J, Kastner K, Dinh L, Goel K, Courville AC, Bengio Y. A Recurrent Latent Variable Model for Sequential Data. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, editors. Advances in Neural Information Processing Systems. vol. 28. Curran Associates, Inc.; 2015. [Google Scholar]

[pcbi.1011897.ref015] 15. Ahmadi A, Tani J. A Novel Predictive-Coding-Inspired Variational RNN Model for Online Prediction and Recognition. Neural Computation. 2019;31(11):2025–2074. doi: 10.1162/neco_a_01228 [DOI] [PubMed] [Google Scholar]

[pcbi.1011897.ref016] 16. Orhan AE, Ma WJ. Efficient probabilistic inference in generic neural networks trained with non-probabilistic feedback. Nature Communications. 2017;8(1):138. doi: 10.1038/s41467-017-00181-8 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref017] 17. Quax SC, Bosch SE, Peelen MV, van Gerven MAJ. Population codes of prior knowledge learned through environmental regularities. Scientific Reports. 2021;11(1):640. doi: 10.1038/s41598-020-79366-z [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref018] 18. Vilares I, Howard JD, Fernandes HL, Gottfried JA, Kording KP. Differential Representations of Prior and Likelihood Uncertainty in the Human Brain. Current Biology. 2012;22(18):1641–1648. doi: 10.1016/j.cub.2012.07.010 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref019] 19. Chan SCY, Niv Y, Norman KA. A Probability Distribution over Latent Causes, in the Orbitofrontal Cortex. Journal of Neuroscience. 2016;36(30):7817–7828. doi: 10.1523/JNEUROSCI.0659-16.2016 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref020] 20. d’Acremont M, Schultz W, Bossaerts P. The Human Brain Encodes Event Frequencies While Forming Subjective Beliefs. Journal of Neuroscience. 2013;33(26):10887–10897. doi: 10.1523/JNEUROSCI.5829-12.2013 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref021] 21. Murray JD, Bernacchia A, Freedman DJ, Romo R, Wallis JD, Cai X, et al. A hierarchy of intrinsic timescales across primate cortex. Nature Neuroscience. 2014;17(12):1661–1663. doi: 10.1038/nn.3862 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref022] 22. Cavanagh SE, Hunt LT, Kennerley SW. A Diversity of Intrinsic Timescales Underlie Neural Computations. Frontiers in Neural Circuits. 2020;14. doi: 10.3389/fncir.2020.615626 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref023] 23. Golesorkhi M, Gomez-Pilar J, Zilio F, Berberian N, Wolff A, Yagoub MCE, et al. The brain and its time: intrinsic neural timescales are key for input processing. Communications Biology. 2021;4(1):970. doi: 10.1038/s42003-021-02483-6 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1011897.ref024] 24. Amunts K, DeFelipe J, Pennartz C, Destexhe A, Migliore M, Ryvlin P, et al. Linking Brain Structure, Activity, and Cognitive Function through Computation. eNeuro. 2022;9(2). doi: 10.1523/ENEURO.0316-21.2022 [DOI] [PMC free article] [PubMed] [Google Scholar]

25. Mastrogiuseppe F, Ostojic S. Linking Connectivity, Dynamics, and Computations in Low-Rank Recurrent Neural Networks. Neuron. 2018;99(3):609–623.e29. doi: 10.1016/j.neuron.2018.07.003

26. Vyas S, Golub MD, Sussillo D, Shenoy KV. Computation Through Neural Population Dynamics. Annual Review of Neuroscience. 2020;43(1):249–275. doi: 10.1146/annurev-neuro-092619-094115

27. Beiran M, Dubreuil A, Valente A, Mastrogiuseppe F, Ostojic S. Shaping Dynamics With Multiple Populations in Low-Rank Recurrent Networks. Neural Computation. 2021;33(6):1572–1615. doi: 10.1162/neco_a_01381

28. Papo D. Time scales in cognitive neuroscience. Frontiers in Physiology. 2013;4. doi: 10.3389/fphys.2013.00086

29. Barak O. Recurrent neural networks as versatile tools of neuroscience research. Curr Opin Neurobiol. 2017;46:1–6. doi: 10.1016/j.conb.2017.06.003

30. Nair V, Hinton GE. Rectified Linear Units Improve Restricted Boltzmann Machines. In: Fürnkranz J, Joachims T, editors. ICML. Omnipress; 2010. p. 807–814. Available from: http://dblp.uni-trier.de/db/conf/icml/icml2010.html#NairH10.

31. Ma WJ, Beck JM, Latham PE, Pouget A. Bayesian inference with probabilistic population codes. Nature Neuroscience. 2006;9(11):1432–1438. doi: 10.1038/nn1790

32. Ichikawa K, Kataoka A. Dynamical Mechanism of Sampling-Based Probabilistic Inference Under Probabilistic Population Codes. Neural Computation. 2022;34(3):804–827. doi: 10.1162/neco_a_01477

33. Swindale NV. Orientation tuning curves: empirical description and estimation of parameters. Biological Cybernetics. 1998;78(1):45–56. doi: 10.1007/s004220050411

34. Rumelhart DE, Hinton GE, Williams RJ. Learning Internal Representations by Error Propagation. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. Cambridge, MA, USA: MIT Press; 1986. p. 318–362.

35. Werbos PJ. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE. 1990;78(10):1550–1560. doi: 10.1109/5.58337

36. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization; 2014. Available from: http://arxiv.org/abs/1412.6980.

37. Bishop CM. Pattern Recognition and Machine Learning. Springer; 2006.

38. Mante V, et al. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature. 2013;503:78–84. doi: 10.1038/nature12742

39. Ichikawa K, Kaneko K. Short-term memory by transient oscillatory dynamics in recurrent neural networks. Phys Rev Research. 2021;3:033193. doi: 10.1103/PhysRevResearch.3.033193

40. Penrose R. A generalized inverse for matrices. Mathematical Proceedings of the Cambridge Philosophical Society. 1955;51(3):406–413. doi: 10.1017/S0305004100030401

41. Perez-Nieves N, Leung VCH, Dragotti PL, Goodman DFM. Neural heterogeneity promotes robust learning. Nature Communications. 2021;12(1):5791. doi: 10.1038/s41467-021-26022-3

42. Yang GR, Joglekar MR, Song HF, Newsome WT, Wang XJ. Task representations in neural networks trained to perform many cognitive tasks. Nature Neuroscience. 2019;22(2):297–306. doi: 10.1038/s41593-018-0310-2

43. Mochol G, Kiani R, Moreno-Bote R. Prefrontal cortex represents heuristics that shape choice bias and its integration into future behavior. Current Biology. 2021;31(6):1234–1244.e6. doi: 10.1016/j.cub.2021.01.068

44. Yamashita Y, Tani J. Emergence of Functional Hierarchy in a Multiple Timescale Neural Network Model: A Humanoid Robot Experiment. PLOS Computational Biology. 2008;4(11):1–18. doi: 10.1371/journal.pcbi.1000220

45. Kurikawa T, Kaneko K. Multiple-Timescale Neural Networks: Generation of History-Dependent Sequences and Inference Through Autonomous Bifurcations. Frontiers in Computational Neuroscience. 2021;15. doi: 10.3389/fncom.2021.743537

46. Tanaka G, Matsumori T, Yoshida H, Aihara K. Reservoir computing with diverse timescales for prediction of multiscale dynamics. Phys Rev Research. 2022;4:L032014. doi: 10.1103/PhysRevResearch.4.L032014

47. Yamaguti Y, Tsuda I. Functional differentiations in evolutionary reservoir computing networks. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2021;31(1):013137. doi: 10.1063/5.0019116

48. Bengio Y, Lee D, Bornschein J, Lin Z. Towards Biologically Plausible Deep Learning. arXiv. 2015;abs/1502.04156.

49. Lillicrap TP, Cownden D, Tweed DB, Akerman CJ. Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications. 2016;7(1):13276. doi: 10.1038/ncomms13276

50. Richards BA, Lillicrap TP, Beaudoin P, Bengio Y, Bogacz R, Christensen A, et al. A deep learning framework for neuroscience. Nature Neuroscience. 2019;22(11):1761–1770. doi: 10.1038/s41593-019-0520-2

51. Yang GR, Wang XJ. Artificial Neural Networks for Neuroscientists: A Primer. Neuron. 2020;107(6):1048–1070. doi: 10.1016/j.neuron.2020.09.005

52. Barak O, Sussillo D, Romo R, Tsodyks M, Abbott LF. From fixed points to chaos: Three models of delayed discrimination. Progress in Neurobiology. 2013;103:214–222. doi: 10.1016/j.pneurobio.2013.02.002

53. Cueva CJ, Wei XX. Emergence of grid-like representations by training recurrent neural networks to perform spatial localization. In: International Conference on Learning Representations; 2018. Available from: https://openreview.net/forum?id=B17JTOe0-.

54. Yamins DLK, DiCarlo JJ. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience. 2016;19(3):356–365. doi: 10.1038/nn.4244

55. Haesemeyer M, Schier AF, Engert F. Convergent Temperature Representations in Artificial and Biological Neural Networks. Neuron. 2019;103(6):1123–1134.e6. doi: 10.1016/j.neuron.2019.07.003

56. Banino A, Barry C, Uria B, Blundell C, Lillicrap T, Mirowski P, et al. Vector-based navigation using grid-like representations in artificial agents. Nature. 2018;557(7705):429–433. doi: 10.1038/s41586-018-0102-6

57. Dubreuil A, Valente A, Beiran M, Mastrogiuseppe F, Ostojic S. The role of population structure in computations through neural dynamics. Nature Neuroscience. 2022;25(6):783–794. doi: 10.1038/s41593-022-01088-4

58. Lorenz DM, Jeng A, Deem MW. The emergence of modularity in biological systems. Physics of Life Reviews. 2011;8(2):129–160. doi: 10.1016/j.plrev.2011.02.003

59. Kashtan N, Alon U. Spontaneous evolution of modularity and network motifs. Proceedings of the National Academy of Sciences. 2005;102(39):13773–13778. doi: 10.1073/pnas.0503610102



Fig 6. Neural activities of the main module x_m(μ_g, σ_g) and the sub-module x_s(μ_g, σ_g), plotted in the space spanned by the first and second principal components.

Fig 8. a_k, defined by Eq 14, plotted against k for the models with α_s = 1 and α_s = 0.1, using 3000 data points.