PLOS Computational Biology. 2021 Oct 7;17(10):e1008993. doi: 10.1371/journal.pcbi.1008993

Narrative event segmentation in the cortical reservoir

Peter Ford Dominey 1,2,*
Editor: Frédéric E Theunissen3
PMCID: PMC8525778  PMID: 34618804

Abstract

Recent research has revealed that during continuous perception of movies or stories, humans display cortical activity patterns that reveal hierarchical segmentation of event structure. Thus, sensory areas like auditory cortex display high frequency segmentation related to the stimulus, while semantic areas like posterior medial cortex display a lower frequency segmentation related to transitions between events. These hierarchical levels of segmentation are associated with different time constants for processing. Likewise, when two groups of participants heard the same sentence in a narrative, preceded by different contexts, neural responses for the groups were initially different and then gradually aligned. The time constant for alignment followed the segmentation hierarchy: sensory cortices aligned most quickly, followed by mid-level regions, while some higher-order cortical regions took more than 10 seconds to align. These hierarchical segmentation phenomena can be considered in the context of processing related to comprehension. In a recently described model of discourse comprehension, word meanings are modeled by a language model pre-trained on a billion-word corpus. During discourse comprehension, word meanings are continuously integrated in a recurrent cortical network. The model demonstrates novel discourse and inference processing, in part because of two fundamental characteristics: real-world event semantics are represented in the word embeddings, and these are integrated in a reservoir network which has an inherent gradient of functional time constants due to the recurrent connections. Here we demonstrate how this model displays hierarchical narrative event segmentation properties beyond the embeddings alone, or their linear integration. The reservoir produces activation patterns that are segmented by a hidden Markov model (HMM) in a manner that is comparable to that of humans. Context construction displays a continuum of time constants across reservoir neuron subsets, while context forgetting has a fixed time constant across these subsets. Importantly, virtual areas formed by subgroups of reservoir neurons with faster time constants segmented with shorter events, while those with longer time constants preferred longer events. This neurocomputational recurrent neural network, which simulates narrative event processing as revealed by the fMRI event segmentation algorithm, provides a novel explanation of the asymmetry in narrative forgetting and construction. The model extends the characterization of online integration processes in discourse to more extended narratives, and demonstrates how reservoir computing provides a useful model of cortical processing of narrative structure.

Author summary

When we watch movies or listen to stories, our brains are led through a trajectory of activation whose structure reflects that of the event structure of the story. This takes place at multiple timescales across the brain, likely corresponding to different timescales of event representation. While this has been well described in human fMRI, the underlying computations that lead to these activation trajectories have not yet been fully characterized. The current research develops and explores a recurrent network “reservoir” model of cortical computation, whose natural internal dynamics help to provide an explanation of the trajectory of brain states that are observed in different cortical areas in humans. The model is exposed to narratives in the form of word embeddings for words in the narrative transcript. Neural activation in the model reveals event structure at multiple levels of temporal structure. This begins to provide insight into the computations underlying the event structure observed in the human brain during narrative processing.

Introduction

Human existence is embedded in a never-ending flow of time [1]. A major function of the nervous system is to segment and organize the spatiotemporal flow of perception and action into coherent and relevant structure for effective behavior [2–4]. A particular challenge is to take into account the context of the recent and distant past while addressing the current situation. One of the highest expressions of this is human narrative, which provides a mechanism for encoding, transmitting and inventing reality with its inherent temporal structure [5–7]. Recent advances in neuroscience recording and analysis methods have made possible the study of temporal structure of neurophysiological signals in the human brain during narrative processing, e.g. [8–10].

One of the questions addressed in such research concerns the actual computations that integrate past and present information within the hierarchical networks in the brain [9]. Interestingly, networks of neurons that have recurrent connections are particularly well suited for problems that require integration of the past with the present [11–14]. When we ask how such recurrent networks might be implemented in the brain it is even more interesting that one of the principal characteristics of cortical connectivity is the high density of local recurrent connections [15], i.e. that one of the primary characteristics of primate cortex is that it is a recurrent network.

It is thus not surprising that recurrent networks have been used to model cortical processing of sequential and temporal structure, explaining both behavior and neurophysiology [16–20]. A subset of recurrent network models use fixed (rather than modifiable) random connections, which avoids truncating the recurrent dynamics as required by recurrent learning mechanisms [11]. This allows the full expression of the recurrent dynamics and a high dimensional expression of the inputs in those dynamics. This reservoir computing framework has been used to explain aspects of primate cortical neurophysiology in complex cognitive tasks [18,19,21,22].

Reservoir computing refers to a class of neural network models that exploit the rich dynamics that are created when information circulates in recurrent connections within the network [14,23–26]. The novelty is that instead of employing learning-related modifications of weights within the recurrent network, the recurrent connections are fixed, and learning occurs only in connections between the units in the recurrent network and the so-called readout neurons. This eliminates the truncation of the recurrent activations and other simplifications that are necessary in the implementation of recurrent learning algorithms [27], and allows a natural way to use temporal structure in the input signal. In one of the first implementations of reservoir computing, Dominey et al. [19] proposed that the primate frontal cortex could be modeled as a reservoir, and the striatum as the readout, where corticostriatal readout connections are modified by reward-related dopamine. They used this model to simulate primate sensorimotor sequence learning, and the corresponding neural activity in prefrontal cortex. Again, the motivation for using fixed recurrent connections is that learning algorithms for recurrent connections require a simplification or truncation of the recurrent network dynamics. With fixed recurrent connections, the reservoir displays a rich, high dimensional mixture of its inputs, and a rich temporal structure. The model thus learned the sensorimotor sequencing task, and the reservoir units displayed a high dimensional mixture of space and time as observed in the primate prefrontal cortex [28]. Indeed, the somewhat surprising phenomenon underlying reservoir computing is that without learning, the random recurrent connections produce a high dimensional representation that encodes an infinity of nonlinear re-combinations of the inputs over time, thus providing universal computational power [23]. The appropriate representation in the high dimensional representation can then be retrieved through learning with a simple linear readout.

Reservoir units are leaky integrators that have their own inherent time constant. The interaction of such units via recurrent connections generates a further diversity of effective time constants within the system [29]. Thus, reservoirs have an inherent capacity to represent time and spatiotemporal structure [16,30–34]. That is, the recurrent connections produce a form of fading reverberation of prior inputs so that the reservoir is inherently capable of representing temporal structure.
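As a point of reference, the dynamics of such units can be written in the standard leaky-integrator form used in echo state networks (generic notation; an illustration rather than the exact parameterization of the present model):

    x(t+1) = (1 − α)·x(t) + α·tanh(W·x(t) + W_in·u(t+1))

where x(t) is the vector of reservoir unit activations, u(t) the input, W and W_in the fixed recurrent and input weight matrices, and α the leak rate; smaller values of α yield a longer effective memory of past inputs.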

In typical use, an input signal excites the reservoir, generating a spatiotemporal trajectory of activation, and the connections to the readout neurons are trained so as to produce the desired behavior. In Enel et al. [18], the inputs were a sequence of activation of spatial targets simulating an explore-exploit task where a monkey first explored to identify which of four targets was rewarded, and then exploited or repeated choices of the rewarded target until a new exploration began. Readout neurons were trained so as to produce the correct exploration or exploitation responses on each trial. Interestingly, the units in the recurrent network displayed a non-linear mixture of the input signal, referred to as mixed selectivity [22], which was highly correlated with the neural responses recorded in the behaving monkey. Thus, reservoir models of cortical function yield interesting observations at two levels: First, in terms of learned behavioral output, and second, in terms of the high dimensional representations within the reservoir, independent of learning.

Reservoir models have been used to perform semantic role labeling in sentence processing [35]. In the domain of narrative and discourse comprehension, Uchida et al. [36] recently used a reservoir-based model to explain how discourse context is integrated on-line so that each new incoming word is processed in the integrated context of the prior discourse. The model addressed the immediacy constraint, whereby all past input is always immediately available for ongoing processing [37,38]. To address this constraint, the model proposed that making past experience immediately accessible in discourse or narrative comprehension involves a form of temporal-to-spatial integration on two timescales. The first timescale involves the integration of word meaning over extended lifetime, and corresponds to the notion of lexical semantics. This is modeled by the Wikipedia2Vec language model [39]. Wikipedia2vec uses three models to generate word embeddings: a word2vec skip-gram model [40] applied to the 3 billion word 2018 Wikipedia corpus, the link-graph semantics extracted from Wikipedia, and anchor-context information from hyperlinks in Wikipedia [39]. The second timescale is at the level of the on-line integration of words in the narrative. This is modeled by a recurrent reservoir network that takes as input the sequence of word embeddings corresponding to the successive words in the discourse. The model is illustrated in Fig 1. In [36], the reservoir was trained to generate the discourse vector, a vector average of the input words in the discourse. A sequence of word embeddings was input to the reservoir, and the readouts were trained to generate the average of the input embeddings.
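To make the two-timescale scheme concrete, the following minimal Python (numpy) sketch drives a generic leaky-integrator reservoir with one word embedding per time step. The reservoir size, leak rate, weight scaling, and the word_vectors lookup are illustrative placeholders, not the published implementation:

    import numpy as np

    rng = np.random.default_rng(0)
    N, D, alpha = 1000, 100, 0.05              # reservoir units, embedding dim, leak rate
    W_in = 0.1 * rng.normal(size=(N, D))       # fixed input weights (no learning)
    W = rng.normal(size=(N, N))
    W *= 0.95 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius below 1

    def run_reservoir(embeddings):
        """Return the trajectory of reservoir states for a sequence of word embeddings."""
        x = np.zeros(N)
        states = []
        for u in embeddings:                   # u: one 100-dimensional word vector
            x = (1 - alpha) * x + alpha * np.tanh(W @ x + W_in @ u)
            states.append(x.copy())
        return np.array(states)                # shape (n_words, N)

    # word_vectors: hypothetical {word: vector} lookup built from Wikipedia2vec
    # trajectory = run_reservoir([word_vectors[w] for w in transcript_words])

In [36] a linear readout was then trained on such state trajectories; in the present work, as described below, the state trajectories themselves are analyzed.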

Fig 1. Narrative Integration Reservoir.


Word embeddings, generated by Wikipedia2vec, are input to the reservoir, which generates a trajectory of internal states that represent the word-by-word processing of the narrative.

The model was used to simulate human brain responses during discourse comprehension, specifically the N400 ERP, a neurophysiological index of semantic integration difficulty [41]. N400 amplitude increases as a function of the semantic distance between a target word and the prior discourse [42,43]. This neurocomputational model was the first to simulate immediacy and overrule in discourse-modulated N400. It is important to note that this research exploited reservoir computing in terms of the behavior of the trained model, as observed in the trained readout neurons, with respect to human behavior and N400 responses. This research did not examine the coding within the recurrent reservoir itself.

The current research explores how this model can account for human neurophysiology of narrative processing in much more extended narratives, increasing from the order of 10^1 to 10^2–10^3 words. Here, instead of looking at trained readout responses, we directly examine the high dimensional representations of word embeddings within the recurrent reservoir, which is a high dimensional non-linear integrator. We compare these representations with those from human neurophysiology in human fMRI. It is important to acknowledge that this research would not be possible without the open science approach in modern neuroscience and machine learning. Data and algorithms used in the analysis of the human neuroscience are publicly available [8,44,45], as is python code for developing the reservoir model [46], and for creating word embeddings for narrative transcripts [39]. This open policy creates a context where it is possible to perform the same analyses on human fMRI and on Narrative Integration Reservoir simulations.

The Narrative Integration Reservoir model embodies two hypotheses: First, that word embeddings from a sufficiently large corpus can serve as a proxy for distributed neural semantics. It is important to note that these embeddings contain extensive knowledge of events as we will see below. Second, that temporal-spatial integration of these word vectors in a recurrent reservoir network simulates cortical temporal-spatial integration of narrative. To test these hypotheses the model is confronted with two principal observations related to cortical processing of narrative. The first has to do with the appearance of transitions between coherent epochs of distributed cortical activation corresponding to narrative event boundary segmentation [8]. The second has to do with a hierarchy of time constants in this processing, and an asymmetry in time constants for constructing vs. forgetting narrative context [9].

Baldassano et al. [8] developed a novel algorithm to detect narrative event boundaries in the fMRI signal of subjects listening to narratives. Their algorithm is a variant of the hidden Markov model (HMM). It identifies events as continuous sequences in the fMRI signal that demonstrate high similarity with an event-specific pattern, and event transitions as discontinuities in these patterns. They used the HMM to segment fMRI activity during narrative processing and made several remarkable observations. In particular they demonstrated that segmentation granularity varies along a hierarchy from short events in sensory regions to long events in high order areas (e.g. TPJ) representing abstract, multimodal situation models.

This allows us to pose the question, are these higher level representations in areas like TPJ the result of longer effective time constants in these higher cortical areas, or is there some processing taking place that is imposing these longer time constants on these higher processing areas? Indeed, in a large distributed recurrent network it is likely that different populations of neurons with different effective time constants will emerge, as demonstrated by Bernacchia et al. [29]. We can thus predict that a distribution of neurons with different time constants will be observed in the reservoir and that these time constants will be related to aspects of their narrative segmentation processing.

Interestingly, Chien and Honey [9] demonstrated that the expression of such time constants in narrative processing is dependent on context. In their experiment, one group of subjects listened to an intact narrative (e.g. with a structure ABCD), and another to a narrative that had been scrambled (e.g. with a structure ACBD). The narrative was “It’s Not the Fall that Gets You” by Andy Christie (https://themoth.org/stories/its-not-the-fall-that-gets-you). The fMRI responses were then compared across these groups in two contexts. The forgetting context compared the situation where the two groups started by hearing the same narrative, and then shifted to two different narratives (e.g. AB in group 1 and AC in group 2). Thus the forgetting context was the transition from same (A) to different (B/C). The construction context compared the situation where the two groups started hearing different narratives and then began to hear the same narrative (e.g. CD in group 1 and BD in group 2). The transition from different (C/B) to same (D) was the construction context. In this clever manipulation of forgetting and constructing context, Chien and Honey [9] discovered that in different cortical areas the time constants of event structure such as observed by Baldassano et al. [8] are reflected in the rate of context construction, whereas there was no systematic relation with context forgetting across these cortical areas. That is, higher cortical areas (e.g. TPJ) construct contexts more slowly, whereas forgetting did not increase in this systematic manner. In order to account for these results, they developed a hierarchical auto-encoder in time (HAT) model. Each hierarchical level in the model receives prior context from the level below, and higher levels have longer explicit time constants, so their context is less influenced by their input at each time step. This allows the system to account for the hierarchy of timescales in context construction. A surprise signal (that is triggered at event segment boundaries) gates the integration of new input and thus allows for a uniform and rapid reset in the case of context forgetting. These observations on event segmentation, the relation between timescales of processing and segmentation granularity, and the asymmetry in timescales for context forgetting and construction provide a rich framework and set of observations against which we can compare the Narrative Integration Reservoir.

In our experiments, inputting the word vectors for each successive word in the narrative to the reservoir produces a spatiotemporal trajectory of reservoir states. This reservoir activation trajectory simulates the fMRI signal of humans who listen to the same narrative, and we can thus apply the segmentation HMM of Baldassano et al. [8] to these reservoir state trajectories.

To evaluate the Narrative Integration Reservoir, we first expose it to a set of short texts extracted from the New York Times and Wikipedia to test whether the reservoir will generate activity patterns that can be segmented by the HMM. At the same time, we also examine how the HMM can segment the unprocessed embeddings themselves, as well as a linear integration of the embeddings, in order to see what information is available independent of the reservoir.

Next we compare reservoir and human brain neural activity trajectories generated by exposure to the same narrative. We then undertake two more significant tests of the model. First, we examine whether the model can provide insight into a form of asymmetry of construction vs forgetting of narrative context as observed by Chien and Honey [9]. We then test the hypothesis that different effective time constants for reservoir neurons in context construction [9] will be associated with different granularity of event segmentation as observed in fMRI data by Baldassano et al. [8].

Results

Segmentation at topic boundaries

We first tested the hypothesis that the Narrative Integration Reservoir, when driven by narrative input, should exhibit event-structure activity with HMM segmentation correlated with known narrative boundaries. We created a test narrative with clearly identifiable ground truth boundaries. We chose four short text segments from different articles in the New York Times along with four short text segments from four Wikipedia articles and concatenated these together to form a single narrative text that had well defined ground truth topic boundaries. This yielded our test narrative which was then used as input to the Narrative Integration Reservoir in order to generate the trajectory of reservoir activation. We tested the resulting event-structured activity by applying the HMM segmentation model of Baldassano et al. [8] to the trajectory of reservoir states generated by feeding the embeddings for the words in these texts into the reservoir. The HMM takes as input the neural trajectory to be segmented, and the number of segments, k, to be identified, and produces a specification of the most probable event boundaries. The HMM model is available in the python BrainIAK library described in the Materials and Methods section.
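For illustration, applying this HMM through the BrainIAK EventSegment interface to a state trajectory stored as a (time points × features) array can be sketched as follows; the file name and the value of k are placeholders:

    import numpy as np
    from brainiak.eventseg.event import EventSegment

    trajectory = np.load("reservoir_states.npy")   # placeholder: (n_timepoints, n_features)

    k = 10                                         # number of events to identify
    hmm = EventSegment(k)
    hmm.fit(trajectory)

    # segments_[0] has shape (n_timepoints, k): probability of each time point per event
    event_of_t = np.argmax(hmm.segments_[0], axis=1)
    boundaries = np.where(np.diff(event_of_t) != 0)[0] + 1   # time points where the event changes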

Independent of the integration provided by the reservoir, there is abundant rich information in the embeddings themselves. It is thus of interest to determine if this information alone is sufficient to allow good segmentation based on the embeddings alone. Likewise, it is crucial to demonstrate that the NIR model’s performance is different from something very simple like feeding the embeddings into a linear integrator (LI) model. We thus investigated how the HMM would segment the embeddings, the LI (see Materials and Methods section) with different time constants, and the NIR reservoir with three different leak rates. The results are illustrated in Fig 2.
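The LI baseline can be sketched as a simple exponential moving average of the embeddings; this shows the general form only, with the exact formulation used in the experiments described in the Materials and Methods:

    import numpy as np

    def linear_integrator(embeddings, alpha=0.1):
        """Leaky linear integration of a word-embedding sequence (no nonlinearity, no recurrence)."""
        c = np.zeros(embeddings.shape[1])
        states = []
        for e in embeddings:
            c = (1 - alpha) * c + alpha * e    # smaller alpha -> longer memory of past words
            states.append(c.copy())
        return np.array(states)                # shape (n_words, embedding_dim)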

Fig 2.


HMM segmentation illustrated on time-point time-point correlations for (A) embeddings alone, linear integrator of embeddings with (B) fast leak rate, and (C) slower leak rate, and reservoir with three progressively slower leak rates (D-F). Input is a text made up of 8 paragraphs extracted from Wikipedia (4) and New York Times (4) articles. White dotted lines indicate segmentation of the HMM. Red dots indicate ground truth section boundaries in the text. Note that already the embeddings and linear integrator contain structured information related to the text.

Each panel in Fig 2 illustrates the HMM segmentation superimposed on the timepoint-timepoint correlations for the different signals: the raw embeddings, the LI with two values of the leak rate α (0.2 and 0.05), and the NIR reservoir with three leak rates (0.2, 0.1, 0.05). Starting at k = 8 we run the HMM and increase k until the 8 segments are identified. This yields k = 10. The dotted lines indicate event boundaries identified by the HMM with k = 10, and dots indicate the actual ground truth boundaries.

Panel A illustrates segmentation on the raw embeddings. We can see that there is some structure related to the ground truth. Panels B and C illustrate segmentation with the LI with values of the leak rate α = 0.2 and 0.05. As α decreases, the memory of past embeddings increases, and the impact of the current input decreases. We observe that the linear integrator is able to segment the narrative. Panels D-F illustrate segmentation on the NIR model reservoir states with the three different leak rates α = 0.2, 0.1 and 0.05. Again, as α decreases, the memory of past inputs increases. We can observe that the NIR representations allow segmentation that is aligned with the ground truth, particularly for α = 0.05.

To evaluate segmentation of the embeddings, the LI, and NIR models, we applied the randomization procedure from Baldassano et al. [8]. For each method, we generate 1000 permuted versions of the HMM boundaries in which the distribution of event lengths (the distances between the boundaries) is held constant but the order of the event lengths is shuffled. This null model establishes how often the HMM boundaries for the ground truth and those for the model would fall within 3 TRs of each other by chance. The true match value was compared to this distribution to compute a z value, which was converted to a p value. Indeed, all of the methods (embeddings alone, 3 linear integrators, 3 reservoirs) yielded representations whose segmentations correspond to the ground truth and differed significantly from the null model. With k = 8, we observed the following p values: embeddings alone 4.42e-04, LI1 7.21e-09, LI2 9.56e-05, LI3 3.56e-02, NIR1 8.13e-05, NIR2 2.67e-02, NIR3 4.67e-02. With k = 10 all values remained significant except for LI2 and LI3. As illustrated in panel C, LI3 segmentation produces more uniform segments that do not differ significantly from the null model, but align well with the ground truth.
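A sketch of this randomization procedure, assuming boundaries are provided as arrays of time-point indices (function and variable names are illustrative):

    import numpy as np

    def boundary_match_test(model_bounds, reference_bounds, n_perm=1000, tol=3, seed=0):
        """Permutation test: shuffle event lengths (keeping their distribution) and ask how
        often model and reference boundaries fall within `tol` TRs of each other by chance."""
        rng = np.random.default_rng(seed)
        ref = np.asarray(reference_bounds)

        def n_matches(bounds):
            return sum(np.min(np.abs(ref - b)) <= tol for b in bounds)

        true_match = n_matches(np.asarray(model_bounds))
        lengths = np.diff(np.concatenate(([0], model_bounds)))       # event lengths
        null = np.array([n_matches(np.cumsum(rng.permutation(lengths)))
                         for _ in range(n_perm)])
        z = (true_match - null.mean()) / null.std()
        p = (np.sum(null >= true_match) + 1) / (n_perm + 1)
        return z, p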

This indicates that already in the embeddings there is sufficient information to segment text from different sources, and that this information can be represented in a linear integration of the embeddings. We can now investigate the possible added value of the non-linear integration provided by the reservoir.

Comparison of narrative integration reservoir and human fMRI segmentation on corresponding input

Baldassano et al. [8] demonstrated cross-modal segmentation, where application of their HMM to fMRI from separate subject groups that either watched a 24 minute movie or listened to an 18 minute audio narration describing events that occurred in the movie produced similar event segmentation. The HMM thus revealed significant correspondences between event segmentation of fMRI activity for movie-watching subjects and audio-narration subjects exposed to the same story with different modalities (movie vs. audio) and different timing. This extended the related work of Zadbood et al. [47] who showed that event-specific neural patterns observed as participants watched the movie were significantly correlated with neural patterns of naïve listeners who listened to the spoken description of the movie.

Using material from these experiments, we set out to compare HMM segmentation in two conditions: The first is the fMRI signal from humans who watched the same 24 minute movie, and the second is the recurrent reservoir activity when the Narrative Integration Reservoir is exposed to the text transcript of the 18 minute recall of that same movie as in [47]. Thanks to their open data policy [8,9,44,45,47], we have access to fMRI from human subjects who watched the episode of Sherlock, along with a transcript of the recall of this movie. For fMRI we use data from Chen et al. 2017 that was recorded while subjects watched the first 24 minutes of the Sherlock episode (see Materials and Methods). For input to the NIR we use the transcript of the 18 minute audio recall of this 24 minute film episode from [47]. We use the transcript to generate a sequence of word embeddings from Wikipedia2vec which is the input to the Narrative Integration Reservoir. The resulting trajectory of activation patterns in the reservoir can be segmented using the HMM. In parallel the HMM is used to segment human fMRI from subjects who watched the corresponding movie. This allows a parallel comparison of the HMM segmentation of human brain activity and of Narrative Integration Reservoir activity, as illustrated in Fig 3.

Fig 3. Pipeline for comparing HMM segmentation of human fMRI and reservoir recurrent states during processing of the same narrative.


Humans watch a “Sherlock” episode in an fMRI scanner. The Narrative Integration Reservoir model is exposed to the recall transcript of the same episode. The resulting trajectories of human fMRI and model reservoir states are processed by the Baldassano HMM.

fMRI data from 16 subjects who watched the 24 minute movie were compared with state trajectories from 16 NIR reservoirs exposed to the transcript of the 18 minute recall of the movie. The movie duration of 24 minutes corresponds to a total recording of 946 TRs. For each subject, the HMM was run on the fMRI data from the angular gyrus (AG) which has been identified as an area that produces related event processing [45,47] and segmentation [8] for watching a movie and listening to recall of the same movie. Baldassano et al. [8] determined that the optimal segmentation granularity for AG is between 50–90 segments for the 1976 TRs in the 50 minute episode, corresponding to a range of 24–43 segments for the 946 TRs in the 24 minute fMRI data we used. We thus chose k = 40 for the HMM segmentation, as a value that has been established to be in the optimal range for AG. The summary results of the HMM segmentation of the NIR and fMRI trajectories are presented in Fig 4. In Fig 4A we see the segmentation into 40 events in the average over all subjects.

Fig 4. Human and Model HMM segmentation.


A: Segmentation of human fMRI. B: Segmentation of Narrative Integration Reservoir model internal states. C: Temporal correspondence of fMRI and model segmentation. D: Correspondence of segmentation boundaries for fMRI from 16 subjects (blue) and reservoir trajectories from 16 model instances (pink).

For the Narrative Integration Reservoir, 16 instances were created using different seed values for initialization of the reservoir connections, and each was exposed to the word by word narrative of the recall of the movie. For each word, the corresponding Wikipedia2vec 100 dimensional embedding was retrieved and input into the reservoir. The HMM was then run on each of these reservoir activation trajectories. In Fig 4B we see the segmentation into 40 events in the average over all reservoirs.

In order to visualize the temporal correspondence between the reservoir and the fMRI, we can calculate the probability that a reservoir state and an fMRI TR are in the same event (regardless of which event it is). This is

Σk p(TR == k) · p(TB == k) (1)

which we can compute by a simple matrix multiplication on the two segmentation matrices. (See BrainIAK Tutorial: https://brainiak.org/tutorials/12-hmm/). The result is illustrated in Fig 4C, which shows the temporal correspondence between the segmentation of human fMRI and Narrative Integration Reservoir state trajectories. In Fig 4D we illustrate violin plots for the mean segment boundaries for the fMRI and NIR model. We can observe that the distribution of segment boundaries for the NIR model spatially overlaps those for the fMRI data, indicating a good match.
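Concretely, given the two time × k event-probability matrices returned by the HMM, Eq (1) reduces to a single matrix product (following the BrainIAK tutorial cited above); a sketch with placeholder matrices:

    import numpy as np

    # Placeholder segmentation matrices; in practice these are hmm.segments_[0] for the
    # fMRI data and for the reservoir trajectory, respectively (each row sums to 1).
    rng = np.random.default_rng(0)
    seg_fmri = rng.dirichlet(np.ones(40), size=946)    # (n_TRs, k)
    seg_model = rng.dirichlet(np.ones(40), size=946)   # (n_model_steps, k)

    # same_event[i, j] = sum over k of p(fMRI TR i in event k) * p(model step j in event k)
    same_event = seg_fmri @ seg_model.T                # the quantity in Eq (1), plotted in Fig 4C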

To determine whether this match is statistically significant, we use the randomization procedure from Baldassano et al. [8] described above. We collected the segmentations for the 16 reservoir instances, and generated mean boundaries. We generate 1000 permuted versions of these NIR model boundaries in which the distribution of event lengths (the distances between the boundaries) is held constant but the order of the event lengths is shuffled. This null model establishes how often the HMM boundaries for the fMRI segmentation and those for the NIR model would fall within 3 TRs of each other by chance. The true matches are significantly different from the null matches (p = 0.001), indicating that it is unlikely that the observed match is the result of random processes, allowing us to reject the null hypothesis. We performed the same test using the embeddings alone and the linear integrator. The linear integrator produced a segmentation that matched that of the fMRI (p = 0.0391). The embeddings alone produced a segmentation that did not differ from the null model (p = 0.1046).

The important point is that the Narrative Integration Reservoir demonstrates structured representations of narrative in terms of coherent trajectories of neural activity that are discontinuous at event boundaries, as revealed by the HMM segmentation. This allows us to proceed with investigation of temporal aspects of this processing.

Different timing of constructing and forgetting temporal context

In order to investigate the time course of context processing, we exposed the reservoir to an experimental manipulation based on that used by Chien and Honey [9]. We recall that they considered two types of contextual processing: constructing and forgetting. In the constructing context, separate groups of subjects heard different narratives and then at a given point began to hear the same narrative. At this point, they began to construct a shared context. Conversely, in the forgetting context, the two separate groups of subjects initially heard the same narrative, and then at a given point began to hear two different narratives. At this point, they began to forget their common context.

We thus exposed paired instances of the Narrative Integration Reservoir (i.e. two identical instances of the same reservoir) to two respective conditions. The first instance was exposed to an intact version of the Not the Fall transcript in four components ABCD. The second instance was exposed to a scrambled version of the transcript ACBD.

For the two model instances, the initial component A is the same for both. The second and third components BC and CB, respectively, are different, and the final component D is the same. We can thus examine the transition Same to Different for forgetting, and the transition Different to Same for constructing. We expose two identical copies of the same reservoir separately to the intact and scrambled conditions, and then directly compare the two resulting reservoir state trajectories by subtracting the scrambled from the intact trajectory. This is illustrated in Fig 5. There we see that in the common initial Same section, there is no difference between the two trajectories. At the Same to Different transition we see an abrupt difference. This corresponds to forgetting the common context. Then in the transition from Different to Same, we see a more gradual convergence of the difference signal to zero in the construction context. Interestingly the same effects of abrupt forgetting and more progressive construction are observed for the linear integrator. This indicates that this asymmetry in constructing and forgetting is a property of leaky integrator systems.

Fig 5. Activation differences for model pairs exposed to intact and scrambled narrative.


A. Linear integrator and B. Reservoir activation difference at the Same-Different and Different-Same transitions. In the initial Same period, both perceived the same initial 140 words of the “Not The Fall” story. In the following Different period, model 1 continued with the story, and model 2 received a scrambled version. In the final Same period, both were again exposed to the same final portion of the story. Forgetting, at the Same-Different transition, takes place relatively abruptly. Constructing, at the Different-Same transition appears to take place much more progressively.

This asymmetry with a rapid rise and slow decay in the values in Fig 5 may appear paradoxical, as integrators that are slow to forget prior context should also be slow to absorb new input. Thus it is important to recall that the dependent variable illustrated in Fig 5 is not the activation of a single integrator (or reservoir). Rather, it is the difference between two integrators (or reservoirs) exposed to the intact vs. shifted narratives, respectively. At the transition from same to different, the input to the two integrators becomes different, and from that point their pasts begin to differ as well. Thus, the integrator values in the same-different transition are dominated by the diverging inputs. This produces the rapid change in the difference value. In contrast, at the transition from different to same, the inputs become the same. In the two integrators only the pasts are different, and they converge to the same signal (driven by the common input) as a function of the leak rate. Thus, integrator values in the different-same transition are dominated by the divergent pasts as a function of the leak rate. These respective impacts of input and past produce the asymmetry in Fig 5. A more detailed analysis and demonstration is provided in the Materials and Methods section.
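This account can be verified with a toy simulation in which two identical leaky integrators receive intact (ABCD) and scrambled (ACBD) input streams, with random Gaussian vectors standing in for word embeddings; all parameters here are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, seg_len, D = 0.05, 100, 100
    A, B, C, D_seg = (rng.normal(size=(seg_len, D)) for _ in range(4))   # four narrative segments
    intact = np.vstack([A, B, C, D_seg])        # ABCD
    scrambled = np.vstack([A, C, B, D_seg])     # ACBD

    def integrate(inputs, alpha):
        c, out = np.zeros(inputs.shape[1]), []
        for u in inputs:
            c = (1 - alpha) * c + alpha * u
            out.append(c.copy())
        return np.array(out)

    diff = np.abs(integrate(intact, alpha) - integrate(scrambled, alpha)).mean(axis=1)
    # diff jumps abruptly at t = 100 (Same -> Different: the inputs diverge immediately)
    # and decays only gradually after t = 300 (Different -> Same: only the pasts differ,
    # and they dissipate at a rate set by alpha), reproducing the asymmetry in Fig 5.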

To analyze construction and forgetting across the two groups, Chien and Honey [9] measured the inter-subject pattern correlation (ISPC) by correlating the spatial pattern of activation at each time point across the two groups. We performed the same correlation analysis across the two reservoirs. In Fig 6 we display the forgetting and constructing context signals, along with the pairwise correlation diagrams, or inter reservoir pattern correlations (IRPCs), formed by the correlation of states at time t of the intact and scrambled reservoir trajectories. Gradual alignment or constructing is illustrated in Fig 6A which shows the difference between reservoir trajectories for intact vs. scrambled inputs at the transition from Different to Same. In Fig 6B the IRPC between the intact and scrambled reservoir state trajectories is illustrated, with a blue circle marking the point illustrated by the dotted line in 6A, where the scrambled and intact narratives transition to the common Same ending. Interestingly, we see that there is a gradual smooth reduction of the differences in 6A, and a progressive re-construction of correlation along the diagonal in 6B. In contrast, in 6C and D we focus on the transition from Same to Different. There we see the difference signal and the cross-correlation map for the divergence/forgetting context in the Same to Different transition. Presented in the same timescale as A and B, we see in C and D a much more abrupt signal change, both in the difference signal (6C) and in the IRPC map (6D). There, where the white circle marks the shift from common to different narrative input, we see an abrupt end to the correlation indicated by the initially high value along the diagonal which abruptly disappears. Interestingly, this is consistent with observations of Chien and Honey [9], who identified the existence of a more extended time constant for constructing vs. forgetting a context in certain higher order cortical regions. Similar effects were observed for the linear integrator. The take home message of Figs 5 and 6 is that forgetting and construction are not systematically related, and that forgetting produces a rapid divergence between the reservoirs’ activations, while construction of a common context takes place more progressively. We now examine these effects in more detail.
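The IRPC map itself is simply the correlation between the spatial activation patterns of the two trajectories at every pair of time points; a minimal sketch, assuming the trajectories are stored as (time × units) arrays:

    import numpy as np

    def irpc(states_a, states_b):
        """Inter-reservoir pattern correlation: Pearson correlation between the unit-activation
        pattern at each time point of one trajectory and each time point of the other."""
        def z(X):
            return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
        A, B = z(states_a), z(states_b)         # (T1, n_units), (T2, n_units)
        return A @ B.T / A.shape[1]             # (T1, T2) correlation map, as in Fig 6B and 6D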

Fig 6. Dynamics of Constructing and Forgetting.


A. Zoom on reservoir activity difference during constructing (transition from Different to Same). Dotted line marks the transition. Note the gradual decay. B. Time-point time-point correlations between the two (intact and scrambled input) reservoirs. Blue circle marks the beginning of the Same input. Note the slow and progressive buildup of coherence revealed along the diagonal. C. Zoom on reservoir activity difference during forgetting (transition from Same to Different). Dotted line marks the transition. Note the abrupt increase. D. Same as B, but white circle marks the Same to Different transition. Note the abrupt loss of coherence along the diagonal.

Distribution of time constants for context Alignment/Construction

In Chien and Honey [9] this extended time constant for constructing was observed particularly for cortical areas higher in the semantic processing hierarchy, such as the temporal-parietal junction (TPJ), vs. primary auditory cortex. Indeed, they demonstrated that while construction or alignment times increased from peripheral regions toward higher-order regions, forgetting or separation times did not increase in this systematic manner.

Considering such properties in the Narrative Integration Reservoir, we recall that the reservoir is made up of leaky integrator neurons with a leak rate, and thus the reservoir should have some characteristic inherent temporal dynamics and time constants. More importantly, within the reservoir, combinations of recurrent connections can construct subnetworks that have different effective time constants. This predicts that we should be able to identify a distribution of time constants within the reservoir, similar to the observations of Bernacchia et al. [29]. To test this prediction, we again exposed paired reservoir instances to the intact and scrambled conditions, and analyzed the difference between the two reservoir state trajectories. For each neuron in the reservoir pair, we took the absolute value of its activation difference (intact–scrambled) at the onset point of the convergence/construction period (indicated by the dotted vertical line in Fig 6A) and counted the number of time steps until that value fell to ½ the initial value. This was used as the effective time constant, or alignment time, of the construction rate for each neuron. The sorted values of these time constants are presented in Fig 7, where they compare well with the same type of figure illustrating the distribution of cortical construction time constants from Chien and Honey [9] (see their S4B Fig). When we applied this procedure to the linear integrator, we observed that all units have the same time constants. The distribution of time constants is a property of the reservoir that is not observed in the linear integrator.
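The per-unit alignment time can be sketched as a half-decay measure on the difference trajectory, assuming the index of the Different-to-Same transition is known (names are illustrative):

    import numpy as np

    def alignment_times(diff_states, t_transition):
        """For each unit, count time steps after the Different -> Same transition until
        |intact - scrambled| activation falls to half of its value at the transition."""
        d = np.abs(diff_states[t_transition:])           # (time, n_units)
        half = d[0] / 2.0
        times = np.full(d.shape[1], d.shape[0])          # default: not reached within the window
        for unit in range(d.shape[1]):
            below = np.where(d[:, unit] <= half[unit])[0]
            if below.size:
                times[unit] = below[0]
        return times   # sorting these reproduces Fig 7; binning them defines the virtual areas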

Fig 7. Reservoir units sorted by alignment time constant (time steps for the activation difference to descend to ½ its initial value).


Broad range of alignment times. This distribution is remarkably similar to that observed by [9] (see their S4B Fig).

We can further visualize the behavior of this distribution of time constants by binning neurons into time constant groups, forming different virtual cortical areas. Fig 8 illustrates two such virtual areas of 100 neurons each (neurons 100–199 for the fast area with the fastest time constants, and neurons 800–899 with slow time constants for the slow area). Note the slopes of the neuronal responses, which become shallower in B vs. A. Below the traces of neural responses, the corresponding IRPC cross-correlation between the intact and scrambled reservoir state trajectories for each of these sub-groups of neurons is illustrated. We observe that for the faster neurons, the alignment is faster as revealed by the coherent structure along the diagonal.

Fig 8.


Effects of Alignment Time on Reservoir unit activity difference during alignment or constructing after the Different to Same transition, for fast (A) and slow (B) virtual areas. Note the steep slope of the activity in panel A that progressively flattens in panel B. Likewise note the “rebound” in A, where neurons quickly reduce their activity but then continue and cross the x-axis before coming to zero. In panel B, corresponding to slower neurons, activity actually increases at the transition before slowly coming back to zero. This indicates the complex and non-linear characteristics of the reservoir. C and D: Effects of alignment time on context construction. Same format as Fig 6B. Note the rapid onset of coherence along the diagonal for the fast area in C, and the slower onset of coherence for the slower area in panel D.

Fig 9 illustrates these forgetting and constructing responses (panels A and B), and the inter reservoir pattern correlation functions for five increasingly slow areas (5 groups of 200 neurons) in the forgetting and constructing contexts (panels C and D). These linear plots correspond to the values along the diagonal of the IRPC—the inter reservoir pattern correlation for the pairs exposed to the intact and scrambled narratives. There we observe that while the time constant for forgetting (transition from Same to Different) is fixed across these five areas, there is a clear distinction in the time course of constructing (transition from Different to Same). Thus we observe a distribution of construction times, with no relation to the forgetting time. We verified the lack of correlation between the time constants for constructing vs. forgetting (Spearman correlation = -0.004, p = 0.89). This corresponds to the observations of Chien and Honey [9], that alignment or construction time increases in higher cortical areas, whereas there is no systematic relation between cortical hierarchy and separation or forgetting [9]. Now we consider the functional consequences of this diversity of time constants in construction.

Fig 9. Temporal profiles of Forgetting and Constructing.


A and B: Difference between reservoir activation for two NIR models receiving the intact and scrambled narrative. A. Forgetting—Dotted line indicates transition from Same to Different, with abrupt transition from 0 to large difference values. B. Transition from Different to Same, with transition from large differences, progressively to 0. C and D Inter-Reservoir Pattern Correlations for the two (intact and shifted input) NIR models. These linear plots correspond to the values along the diagonal of the IRPC. C. Forgetting. Dotted line indicates transition from Same to Different. All 5 temporal areas (Area1-5) display an overlapping descent to the minimum coherence in parallel. It reaches a minimum about 15 time steps later and then fluctuates around the same level. D. Constructing. In contrast, the 5 regions display a diversity of time-courses in their reconstruction of coherence. Dotted lines indicate the same time points in panels A-C and B-D respectively.

Segmentation granularity corresponds to construction time constant

Baldassano et al. [8] showed that segmentation granularity varies across cortex, with shorter events in sensory areas and increasingly longer events in higher cortical areas. Chien and Honey [9] further revealed that the time constant for context constructing similarly increases along this hierarchy. Our time sorting method allows us to generate populations of neurons that can be grouped into virtual areas that are fast, that is they have a small effective time constant (i.e. high leak rate), and areas that are slow, with longer effective time constants (low leak rate). We can now test the hypothesis that there is a relation between these time constants for construction and the increasing scale of event segmentation effects in higher areas reported by Baldassano et al. [8]. In other words, longer time constants for construction will correspond to a preference for fewer, longer events in segmentation by the HMM, whereas shorter time constants will have a preference for more numerous and shorter events in the segmentation by the HMM.

We predict that the HMM will have a better fit when k is smaller for a slow area, and when k is higher for a fast area. Formally, for values of k = i and k = j (denoted ki and kj), we test the following inequality, when i < j and thus ki < kj: the sum of log-likelihoods for small k and slow NIR, and large k and fast NIR, will be greater than the sum for large k and slow NIR and small k and fast NIR. We refer to this difference as the segmentation effect.

for i < j:  ll(ki(NIRslow)) + ll(kj(NIRfast)) > ll(kj(NIRslow)) + ll(ki(NIRfast)) (2)

To test this prediction, we ran the forgetting-construction experiment as described above, and then generated fast and slow virtual cortical areas of 100 neurons each, corresponding to those illustrated in Fig 8, and segmented the activation from these fast and slow areas using the HMM with large and small values of k. We then evaluated the prediction as the inequality in Eq (2). We first consider the results with the HMM with ki = 8 for the slow area, and kj = 22 for the fast area for an example reservoir, illustrated in Fig 10A and 10B. These values for k were identified based on specification by Baldassano et al. [8] of optimal values of k for fast (early visual) and slow (default mode) cortical areas (see Materials and Methods). The results using these values of k for segmenting the NIR neural activity are illustrated in Fig 10A with the time-point time-point correlation map for a fast area (neurons 100–199) with small convergence time constants, and the event boundaries found by the HMM with k = 22. Panel B illustrates the same for a slow area (neurons 800–899) with large convergence time constants, and the event boundaries found by the HMM with k = 8. We can observe the finer grained correlation structure along the diagonal in panel A, and more coarse grained, larger event structure in panel B. For 50 NIR instances, this segmentation effect was significant, p<0.001, consistent with our prediction. That is, the sum of log-likelihoods for k = 8 for the slow NIR, and large k = 22 for the fast NIR was significantly greater than the sum for k = 22 with the slow NIR and k = 8 for the fast NIR.
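The segmentation effect of Eq (2) can be sketched using the HMM log-likelihoods, assuming the fitted BrainIAK EventSegment exposes its training log-likelihood as ll_; the virtual-area arrays and k values below are placeholders:

    import numpy as np
    from brainiak.eventseg.event import EventSegment

    def hmm_loglik(states, k):
        """Fit the event segmentation HMM with k events and return the final log-likelihood."""
        hmm = EventSegment(k)
        hmm.fit(states)
        return hmm.ll_[-1]

    # slow_area, fast_area: (time, 100) activations of the slow / fast virtual areas
    # k_i, k_j = 8, 22
    # effect = (hmm_loglik(slow_area, k_i) + hmm_loglik(fast_area, k_j)) \
    #          - (hmm_loglik(slow_area, k_j) + hmm_loglik(fast_area, k_i))
    # effect > 0 is consistent with the inequality in Eq (2)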

Fig 10. HMM segmentation in two populations of reservoir neurons with faster vs. slower alignment times.


A–Fast reservoir subpopulation (neurons 100–199) with shorter alignment times. HMM segmentation into k = 22 event segments. B–Slow reservoir subpopulation (neurons 800–899) with greater alignment times. HMM segmentation into k = 8 regions. Note the larger sections of continuous coherence in B vs. A. Panel C—Segmentation effect advantage for 50 reservoir pairs for using k(i) < k(j) on the slow vs fast areas, respectively. D. p values. For k(i) = 8–18 and k(j) = 18–40 this advantage is significant. For a large range of k values, when k(i) < k(j), there is a significant advantage of using the HMM with k(i) vs. k(j) on the slow vs. fast areas respectively.

To examine the more general application of the inequality in Eq (2) we ran the HMM on these two areas, iterating over k from 2 to 40 for both areas. The results of this grid search are presented in Fig 10C and 10D, which illustrate the value of the inequality, and the p values of the difference, respectively. There we see that for all small values 8 < ki < 16 applied to the slow area and large values 18 < kj < 40 applied to the fast area, the inequality in Eq (2) holds, with median p = 0.0001, mean p = 0.0015. This corresponds to the red patch above the diagonal in Fig 10C for 8 < ki < 16 and 18 < kj < 40, indicating the positive segmentation effects, and the corresponding blue patch in Fig 10D, corresponding to the significant p values. This confirms the prediction that areas with faster construction times prefer shorter events, and those with slower construction time constants prefer longer events.

Discussion

Narrative is a uniquely human form of behavior that provides a coherent structure for linking events over time [6,48]. A given instance of a narrative itself is a temporal sequence that must be processed by appropriate mechanisms in the nervous system [10]. Previous research has identified constraints on this processing related to immediacy [38]. Information that is provided early in the narrative must remain immediately accessible at any time. This is technically an interesting problem, as in many architectures, the time to access previous memories may depend on the size of the memory, or the position of the data in question in the input sequence. We recently reasoned that performing a temporal-to-spatial transformation can solve this problem, provided that there is a parallel readout capability that provides immediate access to the distributed spatial representation [36].

The current research further tests this model in the context of temporal processing constraints identified in more extended narrative processing. Chien and Honey [9] compared brain activity across subjects that hear the same and then different narratives (forgetting), and different then same narratives (constructing). They characterized a form of asymmetry in the temporal processing of the construction and forgetting of narrative context. Construction and forgetting can take place at different rates, and particularly in higher areas, construction can take place more progressively than forgetting. This is our first constraint. More globally, their empirical finding was that while construction or alignment times increased from peripheral regions toward higher-order regions, forgetting or separation times did not increase in this systematic manner. This gradient of construction times is our second constraint.

In order to account for the hierarchy of time constants across areas Chien and Honey [9] developed the HAT model with explicit hierarchically organized modules, where each successive module in the hierarchy is explicitly provided a longer time constant. They introduced a gating mechanism in their hierarchical recurrent model in order to explicitly address the more abrupt response to change in context in the forgetting context. This model is a significant achievement as it provides an explicit incarnation of the processing constraints in the form of a hierarchy of modules, the pre-specified time constants and the explicit gating mechanism. As stated, the problem remains, “what are the essential computational elements required to account for these data?” (p. 681–682). Our third constraint comes from Baldassano et al. [8] who demonstrate that higher elements in the cortical hierarchy prefer segmentation with fewer and larger segments, while lower sensory areas prefer more and shorter segments. Part of the motivation for the current research is to attempt to respond to the question “what are the essential computational elements required to account for these data?” with respect to these constraints, and to establish a relation between the context construction hierarchy of Chien and Honey [9] and the event segmentation hierarchy of Baldassano et al. [8].

Indeed, when we simulated the experimental conditions where pairs of subjects heard the same and then different narratives (forgetting) and respectively different and then same narratives (constructing), the between reservoir correlations displayed an abrupt response in the forgetting context, and a more progressive response in the constructing context. Interestingly, this behavior is an inherent property of the reservoir, and the linear integrator, in the context of a task that evaluates the difference between two reservoirs (or integrators). The transition from different to same requires the dissipation of the past differences in face of common inputs, and will vary as a function of the leak rate. In contrast, in the transition from same to different, the rapid onset response in the forgetting context is due to immediate responses to the diverging inputs in the reservoir which are independent of the leak rate (see Fig 14 and Materials and Methods). This behavior of the reservoir and the linear integrator addresses the first constraint from Chien and Honey [9]: the construction/forgetting asymmetry, or lack of covariance between constructing and forgetting. However, whereas each element of the linear integrator has the same decay rate, the hierarchy of decay time constants is a natural property of the reservoir, as observed by Bernacchia et al. [29] (and see Materials and Methods below). Thus, only the reservoir, and not the linear integrator, can address the second constraint, the hierarchy of time constants across cortical areas for construction.

Fig 14. Analysis of Linear Integrator in Forgetting and Construction.


Using random Gaussian inputs, intact (ABCD) and shifted (ACBD). A. Forgetting–transition from same to different inputs marked by dotted line. All 100 units of LI. Dominated by input term of LI. B. Construction–transition from different to same inputs marked by dotted line. All 100 units of LI. Dominated by memory term of LI. C-D. Example single unit of LI, tested with three time constants (α = 0.2, 0.1, 0.05 –blue, orange, green). C. Time constants do not have a remarkable effect on the difference signal for forgetting at the transition. D. Time constants have a noticeable effect on the difference signal for construction at and beyond the transition.

By partitioning the reservoir neurons into subgroups based on their constructing time constants, we observed that areas with longer time constants preferred segmentation with fewer, longer events, while, conversely, areas with shorter time constants preferred segmentation with more numerous and shorter events, thus addressing the third constraint, from Baldassano et al. [8]. This was demonstrated using evaluation of the preferred number of events based on the log-likelihood of model fit. New methods for event segmentation [49] can overcome limitations of this approach including a tendency to overestimate the number of states when the number of states approaches the number of time-points. As we are far from this regime (i.e. the number of states is much smaller than the number of time points), the HMM-based method is not subject to this problem, and is suitable for our purposes [49].

While there are likely multiple mechanisms that contribute to the diversity of inherent time constants across cortical areas [29,50,51], the internal dynamics of recurrent loops within the reservoir account for a rich diversity of effective time constants as we observe here. We propose that this segregation is a natural product of neuroanatomy, and that narrative structure has found its way into this structure. While individual neurons have the same time constants, we observe a recurrent time constant effect, a functional variety of time constants due to the recurrent dynamics.

Here we observed how this effect generates a diversity of time constants within a single reservoir. Partitioning the units into groups based on their time constants revealed a relation between the time constants for context construction and event segmentation across cortical areas, as observed by [8]. This effect should also extend across multiple areas in a larger scale model, with a hierarchy of time constants as one moves farther from the input, consistent with Chaudhuri et al. [51].

We propose that in the primate brain, as higher cortical areas become progressively removed from sensory inputs and driven by lower cortical areas, they will have higher effective time constants, due to the accumulation of the recurrent time constant effect. The result would be a local diversity of time constants within areas, consistent with [29], and a broader range of time constants across areas, consistent with [50] and [51]. The potential results of such effects have been observed in the hierarchy of temporal receptive windows in cortex [52,53] and the corresponding temporal processing hierarchy [8,9,52]. Lerner et al. [53] characterized this hierarchy of temporal response windows by examining inter-subject correlations in brain responses to audio that was intact, or scrambled at the word, sentence or paragraph level. It will be interesting to test the prediction that the virtual areas in the reservoir, with their different alignment times, will also display corresponding temporal response windows when probed with scrambling at different levels of granularity, as in [53]. Here we propose that the different levels of temporal structure in narrative may have evolved to be accommodated by the representational capabilities in the cortical hierarchy of temporal processing.

It is worth noting, in the context of the relation between narrative structure and recurrent network functional neuroanatomy, that the observed reservoir behavior comes directly from the input-driven dynamics of the reservoir. There is no learning in the reservoir, and we do not look at the readout, but at the recurrent elements of the reservoir itself. While the reservoir itself involves no learning, the NIR model benefits from the massive learning embodied in the Wikipedia2Vec embeddings [39]. Indeed, the embeddings themselves already contain sufficient information to provide segmentation: words within the same narrative context will tend (by construction of the embedding model) to be more similar to one another than to words from different contexts. The benefits of this learning are clearly visible in the structure of the embeddings in Figs 2 and 9, and in the segmentation results that can be obtained directly from the embeddings and from the simple linear integrator model that we examined.

Indeed, comparing the reservoir to the embeddings alone and to the linear integrator reveals the crucial features of the reservoir, which derive from its highly non-linear integration properties: a distribution of effective time constants that emerges from the recurrent connectivity. Grouping neurons by these time constants allows us to evaluate the hypothesis that effective functional time constants in cortical areas, as observed by Chien and Honey [9], will correspond to the granularity of event representations in those areas, as revealed by the HMM segmentation of Baldassano et al. [8]. This corresponds to the added value of the reservoir with respect to a linear integrator as a process model of cortical dynamics. The recurrent dynamics provide two key elements of cortical function that are not present in the linear integrator. The first, which we have previously examined, is the projection of the inputs into a high dimensional space, which provides the universal computing property characterized by [23], and revealed by mixed selectivity in neuronal coding [18,22]. Here we see how this high dimensional coding models cortical integration during narrative processing. The second property, which we investigate for the first time here, is the diversity of functional time constants that is a correlate of the high dimensional projection. Indeed, in this high dimensional representation, time is one of the dimensions [31]. The degree of resemblance that we found between cortical and reservoir dynamics in narrative processing provides further support for the idea that aspects of cortical function can be considered in the context of reservoir computing [18,19,21,22,54].

A limitation of the current modeling effort is that it does not explicitly address meaning and the content of the representations in the NIR model and the fMRI signal. Indeed, in our previous work [36], we predicted N400 responses using the trained readout of the reservoir to generate the cumulative average of the input word sequence, thus forming a discourse vector. This discourse average vector could then be compared with the word embedding for a target word in order to predict the N400 as 1-similarity. Interestingly, these discourse vectors encode knowledge that is assumed to be required for making inferences about events [42]. Related models have used recurrent networks to integrate meaning over multiple word utterances in order to predict the N400 [55,56], further supporting the role of recurrent connections in accumulating information over multiple words. Reservoir content can also be used to decode more structured meaning about events in terms of semantic roles including agent, action, object, recipient, etc. [35,57]. This information is coded in the input based on cues provided by grammatical morphology and word order [58], in the form of grammatical constructions [59,60]. In the reservoir model, these inputs are combined in the high dimensional representation, allowing a trained readout to extract semantic structure. The same principle should apply to narrative structure. That is, the elements contributing to the narrative meaning in the input are combined in the high dimensional representation, and a trained readout should be able to extract the desired meaning representations. Related research has started to address how structured meaning in the narrative is extracted to build up a structured situation model [61,62], so that the system can learn to answer questions about perceived events and narrative. Baldassano et al. [8] note that the event segmentation they observe in higher-order areas, including the angular gyrus and posterior medial cortex, exhibits properties associated with situation model representations including "long event timescales, event boundaries closely related to human annotations, generalization across modalities, hippocampal response at event boundaries, reactivation during free recall, and anticipatory coding for familiar narratives" (p. 717). Baldassano et al. [63] further investigated these representations and determined that the event segment structure discovered by the HMM can be used to classify fMRI activation trajectories based on the underlying story schema.

In this context, we are witnessing an interesting conjuncture in the science and technology of language. Language models in machine learning are beginning to display remarkable performance capacities with human-like performance in question answering [64], semantic similarity judgement, translation and other domains [65,66]. In certain aspects they are similar enough to human performance that specific measures of human language comprehension from psycholinguistic experiments are now being used to characterize and evaluate these language models [67,68]. At the same time, these language models are beginning to display underlying representations and mechanisms that provide insight into human brain processes in language processing [69–71]. Future research should combine human neurophysiology studies of narrative comprehension and parallel modeling of the underlying neurophysiological processes. In this context one would expect to identify the presence of high dimensional coding and mixed selectivity, characteristic of reservoir computing, in cortical processing of narrative.

Materials and methods

Narrative integration reservoir

The end-to-end functioning of the Narrative Integration Reservoir (based on, and extending, the model of Uchida et al. [36]) is illustrated in Fig 1. The model consists of two components: the first generates word embedding vectors, and the second generates the spatiotemporal trajectory of neural activation. Given the input narrative, we first remove stop words (e.g. the, in, at, that, which), which provide little or no semantic information [72]. The remaining input words are transformed into word embedding vectors by the Wikipedia2Vec model, pre-trained on the 3 billion word 2018 Wikipedia corpus [39]. These vectors are then input to the reservoir, a recurrent network with fixed recurrent connections. Keeping the recurrent connections fixed allows the full dynamics of the system to be exploited; this is not the case in some other networks with modifiable recurrent connections, where implementing learning on the recurrent connections requires a temporal cut-off of the recurrence [12,27,73]. This method of fixed connections in the reservoir was first employed to model primate prefrontal cortex neurons during behavioral sequence learning tasks [19], and was subsequently developed with spiking neurons in the liquid state machine [23], and in the context of non-linear signal processing with the echo state network [24,74], all corresponding to the class of reservoir computing [25].

In reservoir computing, the principle is to create a random dynamic recurrent neural network, stimulate it with input, and harvest the rich high dimensional states. Typically this harvesting consists of training the output weights from reservoir units to output units, and then running the system on new inputs and collecting the resulting outputs from the trained system. In the current research we focus our analysis directly on the rich high dimensional states in the reservoir itself. That is, we do not train the reservoir to perform any transformation on the inputs; instead, we analyze the activity of the reservoir neurons themselves. The basic discrete-time, tanh-unit echo state network with N reservoir units and K inputs is characterized by the state update equation:

x(t+1) = (1 − α) x(t) + α f(W x(t) + W_in u(t))        (3)

where x(t) is the N-dimensional reservoir state, f is the tanh function, W is the N×N reservoir weight matrix, W_in is the N×K input weight matrix, u(t) is the K-dimensional input signal, and α is the leak rate. The matrix elements of W and W_in are drawn from a random distribution.

The reservoir was instantiated using easyesn, a Python library for recurrent neural networks using echo state networks (https://pypi.org/project/easyesn/) [46]. We used a reservoir of 1000 neurons, with input and output dimensions of 100. The W and W_in matrices were initialized with values drawn from a uniform distribution between −0.5 and 0.5. The leak rate was 0.05, established based on our empirical observations and the high volatility of the input. We also tested leak rates of 0.1, 0.15 and 0.2. The reservoir is relatively robust to changes in these values, as long as the reservoir dynamics neither diverge nor collapse.
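As an illustration, the following is a minimal NumPy sketch of the state update in Eq (3) with the parameter settings above; it is not the easyesn implementation used in the study (a standard ESN implementation would typically also rescale W to a target spectral radius, a step omitted here for brevity).

```python
import numpy as np

N, K = 1000, 100     # reservoir size and input dimension used in this study
alpha = 0.05         # leak rate

rng = np.random.default_rng(0)
W = rng.uniform(-0.5, 0.5, size=(N, N))      # fixed recurrent weights
W_in = rng.uniform(-0.5, 0.5, size=(N, K))   # fixed input weights

def step(x, u):
    """One application of Eq (3): leaky integration of the tanh-driven state."""
    return (1 - alpha) * x + alpha * np.tanh(W @ x + W_in @ u)

def run_reservoir(inputs):
    """Drive the reservoir with a (time, K) input sequence; return the (time, N) trajectory."""
    x = np.zeros(N)
    states = []
    for u in inputs:
        x = step(x, u)
        states.append(x.copy())
    return np.array(states)
```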

To simulate narrative processing, words were presented to the reservoir in their sequential narrative order. Stop words (e.g. the, a, it) were removed, as they provide little or no semantic information [72]; similar results were obtained when stop words were retained. Words were coded as 100 dimensional vectors from the Wikipedia2Vec language model. Fig 11 illustrates the form of the input, the reservoir activation, and their respective time-point pattern correlations.
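A sketch of this input preparation is given below; the stop-word list, the model file name, and the Wikipedia2Vec lookup calls are illustrative assumptions rather than the exact pipeline of the released code.

```python
import numpy as np
from wikipedia2vec import Wikipedia2Vec

STOP_WORDS = {"the", "a", "an", "it", "in", "at", "that", "which", "of", "and", "to"}  # illustrative
wiki2vec = Wikipedia2Vec.load("enwiki_20180420_100d.pkl")  # hypothetical path to a 100-d pre-trained model

def narrative_to_vectors(transcript):
    """Map a narrative transcript to the sequence of 100-d embeddings fed to the reservoir."""
    vectors = []
    for word in transcript.lower().split():
        if word in STOP_WORDS:
            continue
        try:
            vectors.append(wiki2vec.get_word_vector(word))
        except KeyError:
            continue  # skip out-of-vocabulary tokens
    return np.array(vectors)   # shape: (n_words, 100)

# states = run_reservoir(narrative_to_vectors(transcript))  # reservoir trajectory for the narrative
```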

Fig 11. Reservoir fundamentals.

A. Temporal structure of the input sequence of word embeddings for the 100 input dimensions. B. Time-point by time-point pattern correlation for the input sequence. C. Activity of a subset of reservoir units. Note the relative smoothing with respect to the input in panel A. D. Time-point by time-point pattern correlation of reservoir activity during processing of the narrative. Note the coherent structure along and around the diagonal, compared with panel B. This indicates the integrative function of the reservoir.

In Fig 11, in panel A we see the high frequency of change in the input signal. This is the word by word succession of 100 dimensional word embeddings for the successive words in the narrative. In B we see the auto-correlation, and note that there is structure present. (Event segmentation for embeddings is presented in Fig 2). In C we see at the same timescale the activation of 50 reservoir units. Here we can observe that the frequency of change is much lower than in the original input. This is due to the integrative properties of the reservoir. In D we see the autocorrelation of the reservoir states over the time course of the narrative. Here we see along the diagonal more evidence of structure and the integration over time in local patches.

The reservoir has inherent temporal dynamics. We can visualize these dynamics by exposing the reservoir to a zero input, then a constant input, and then a return to zero, and observing the responses to these transitions. Such behavior is illustrated in Fig 12. A zero input is provided from 0 to 500 time steps, then a constant input from 500 to 900, and finally a zero input from 900 to 1500. Fig 12 displays the responses of 10 sample neurons, illustrating the inherent temporal dynamics of the reservoir. In order to characterize these temporal properties more carefully, we measured the time constants of the neural responses to these transitions, and then plotted the ordered time constants. Fig 13 displays the ordered time constants for neurons in response to the transition from zero input to a fixed non-zero signal, and then from signal back to zero. These can be compared to the time constants for construction and forgetting.

Fig 12. Reservoir dynamics in response to a continuous input of zero, then a fixed non-zero input, and final return to zero input.

Note the diversity of temporal responses. This indicates the inherent property of distributed time constants generated by recurrent connections within the reservoir.

Fig 13. Reservoir units sorted by time to stabilize after the transitions from zero to non-zero input, and from non-zero to zero input.

Comparable to Fig 7. Dotted line marks the mean from Fig 7 for comparison.
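The step-response protocol and time-constant measurement can be sketched as follows, reusing run_reservoir from the earlier snippet; the stabilization criterion (remaining within a tolerance band of the final value) is our assumption, not necessarily the exact criterion used in the study.

```python
import numpy as np

def step_input(n_zero1=500, n_const=400, n_zero2=600, K=100, level=0.5):
    """Zero input, then a constant non-zero input, then zero again (cf. Fig 12)."""
    u = np.zeros((n_zero1 + n_const + n_zero2, K))
    u[n_zero1:n_zero1 + n_const, :] = level
    return u

def time_to_stabilize(trace, onset, tol=0.05):
    """Steps after `onset` until the unit stays within tol * max|trace| of its final value."""
    final = trace[-1]
    band = tol * (np.abs(trace).max() + 1e-12)
    outside = np.where(np.abs(trace[onset:] - final) >= band)[0]
    return 0 if outside.size == 0 else int(outside[-1]) + 1

states = run_reservoir(step_input())   # (time, N) trajectory, as in Fig 12
rise_tc = np.sort([time_to_stabilize(states[:900, i], 500) for i in range(states.shape[1])])
fall_tc = np.sort([time_to_stabilize(states[:, i], 900) for i in range(states.shape[1])])
# Plotting rise_tc and fall_tc yields ordered time-constant curves comparable to Fig 13.
```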

Linear integrator

In order to demonstrate that the model’s performance is different from feeding the embeddings into a linear integrator model, we use a linear integrator described in Eq (4):

LI(n) = (1 − α) · LI(n−1) + (1 + α) · embedding(n)        (4)

The linear integrator LI combines the previous integrated embeddings with the current embedding, with the two terms weighted according to the leak rate α. As α increases, the influence of past inputs is reduced. In the different experiments we compare the performance of the NIR model with that of the LI linear integrator.
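A minimal sketch of the linear integrator in Eq (4), applied to a sequence of embedding vectors, is shown below; it assumes the (1 + α) input scaling exactly as written in Eq (4).

```python
import numpy as np

def linear_integrator(embeddings, alpha=0.05):
    """Leaky linear integration of an embedding sequence (Eq 4)."""
    li = np.zeros(embeddings.shape[1])
    trace = []
    for e in embeddings:
        li = (1 - alpha) * li + (1 + alpha) * e
        trace.append(li.copy())
    return np.array(trace)   # shape: (time, embedding_dim)
```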

Note on the asymmetry for constructing and forgetting in the linear integrator

Here we analyze the behavior of the linear integrator described in Eq (4) in the context of the construction/forgetting asymmetry. We first confirmed that the LI responds as expected to a pulse input, with symmetric rise and fall behavior. Then, in order to verify that the asymmetry is not due to correlation in the input, we ran the intact vs. scrambled experiment for constructing and forgetting using data sampled from a random normal distribution. This is illustrated in Fig 14. Again we observe the forgetting-construction asymmetry in panels A and B.

We determined that the asymmetry is due to properties of the integrator in a certain parameter range, in the specific context of the current task. What is visualized in Figs 5 and 14 is not the response of a single instance of the integrator, but rather the difference in the activations of two integrators exposed to particular input sequences that have ABCD and ACBD structure, respectively. Fig 14A and 14B display this difference between the 100 units of the two LIs for the Gaussian data. This is directly comparable to Fig 5A.

Panels C and D display this difference for a single example unit, with three different values of the leak rate (α = 0.2, 0.1, 0.05). For forgetting, in panel C, we can observe that the difference value changes rapidly, and the leak rate does not have a large impact. In the forgetting condition, at each time step the inputs to the two integrators are different, which contributes to the rapid divergence; in addition, the memory components integrate these differences. Thus, in the forgetting condition, both the inputs and the memory components of the two integrators vary, leading to a rapid divergence dominated by the differences in the inputs. For construction, in panel D, we can observe that the difference value is initially high, and converges to zero at different rates, depending on the leak rate. At each time step, the inputs are the same and only the memory components of the two integrators differ. The clear effect of the memory component is seen in panel D, where the leak rate plays an important role in the time-course of construction. This reveals that the observed asymmetry in forgetting vs. construction is due to the combined effects of (a) the task, namely taking the difference between two integrators in the same-different-same input conditions (as opposed to observing a single integrator), and (b) the leak rate of the linear integrators. At the same-different transition, the difference is driven by the inputs, which diverge rapidly. At the different-same transition, the difference is driven by the memory component, which evolves more or less slowly depending on the leak rate.
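A sketch of this two-integrator comparison with Gaussian inputs is given below, reusing linear_integrator from the previous snippet; the segment length (100 steps) and the leak rate are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C, D = (rng.normal(size=(100, 100)) for _ in range(4))   # four 100-step input "segments"

intact  = np.concatenate([A, B, C, D])   # ABCD
shifted = np.concatenate([A, C, B, D])   # ACBD

diff = np.abs(linear_integrator(intact, alpha=0.1)
              - linear_integrator(shifted, alpha=0.1))

# Forgetting: at t = 100 (B vs C) the inputs diverge, so the difference jumps abruptly,
# largely independently of alpha. Construction: at t = 300 (D vs D) the inputs are identical
# and only the memory terms differ, so the difference decays at a rate set by alpha.
```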

In the linear integrator of Chien and Honey [9], it is likely that with relatively high leak rates (i.e. low values of the parameter ρ_i < 0.5 in their Eq 1 describing the linear integrator), this asymmetry would not be present.

Note on data normalization for correlation maps

When illustrating the time-time correlation maps, we do so after subtracting the mean value from each element (i.e., for each unit in the reservoir, or each dimension in the word-embedding, we treat the signal as a time-series, and subtract the mean of the time-series from every time-point). This can prevent saturation issues which may confuse the interpretation.
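A sketch of this normalization and of the correlation-map computation is given below; using np.corrcoef over the demeaned activity is our straightforward reading of the procedure.

```python
import numpy as np

def time_time_correlation(states):
    """Pearson correlation between activity patterns at every pair of time points.

    `states` has shape (time, units); each unit's time series is demeaned first,
    then the patterns (rows) are correlated across time points.
    """
    demeaned = states - states.mean(axis=0, keepdims=True)   # subtract each unit's mean over time
    return np.corrcoef(demeaned)                             # (time, time) correlation map
```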

BrainIAK HMM model

The HMM model is described in detail in Baldassano et al. [8] and is available as part of the BrainIAK Python library, along with an example Jupyter notebook corresponding to [8] (https://github.com/brainiak/brainiak/tree/master/examples/eventseg). Given a set of (unlabeled) time courses of (simulated or real) neural activity, the goal of the event segmentation model is to temporally divide the data into "events" with stable activity patterns, punctuated by "event boundaries" at which activity patterns rapidly transition to a new stable pattern. The number and locations of these event boundaries can then be compared between human neural activity and simulated Narrative Integration Reservoir activity, and to ground truth values.

The model is run on a given fMRI or reservoir trajectory, and requires specification of the expected number of segments, k. The fitted segmentation provides, for each time point, the probability of belonging to each event. The trained model can then be run on a test trajectory, returning a log-likelihood score. This mode can be used to evaluate the trained model on held-out test data.
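A sketch of applying the BrainIAK event segmentation model to a reservoir (or fMRI) trajectory is shown below; the EventSegment calls follow the BrainIAK API as we understand it, but argument names and attributes may differ across versions.

```python
import numpy as np
from brainiak.eventseg.event import EventSegment

def segment_trajectory(train_data, test_data, k):
    """Fit a k-event HMM on one (time, features) trajectory and evaluate it on another."""
    hmm = EventSegment(k)
    hmm.fit(train_data)

    # Probability of each event at each training time point; argmax gives event labels,
    # and changes in the labels give the event boundaries.
    train_events = np.argmax(hmm.segments_[0], axis=1)
    boundaries = np.where(np.diff(train_events))[0] + 1

    # Apply the learned event patterns to held-out data and obtain a log-likelihood.
    test_segments, test_ll = hmm.find_events(test_data)
    return boundaries, test_ll
```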

Selection of k values for the HMM comparison of fast and slow virtual NIR areas

In the comparison of segmentation on the fast and slow virtual areas, we based the k values on those determined for different cortical areas by [8]. They determined that for an fMRI signal of 1976 TRs, early visual cortex segmented optimally with 119 events, thus 16.6 TRs per event, while PMC was optimal with 44 events, thus 44.9 TRs per event. The Not the Fall fMRI signal has a duration of 365 TRs. This gives 365 TRs / 16.6 TRs per event: k = 22 for the fast area, and 365 TRs / 44.9 TRs per event: k = 8 for the slow area. We use these values in segmenting the NIR activity based on the Not the Fall transcript, which has 682 TRs.

In order to make a more systematic evaluation of k values for the fast and slow areas, we performed an exhaustive analysis with an ensemble of k values varying from 2 to 40 in a grid search, evaluating the segmentation effect as specified in Eq (2) above.

We used the log-likelihood of the model fit to evaluate segmentation with different k values. We gathered the log-likelihoods for the fast and slow virtual areas of 50 NIR model instances, and then exhaustively evaluated the above inequality for all values of k with a paired t-test.
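A sketch of this grid search is given below; it assumes that the fitted EventSegment model exposes its final fit log-likelihood through the ll_ attribute, and that the fast and slow unit groups have already been extracted from the 50 NIR instances.

```python
import numpy as np
from scipy.stats import ttest_rel
from brainiak.eventseg.event import EventSegment

def fit_loglik(data, k):
    """Log-likelihood of a k-event HMM fit on one (time, units) trajectory."""
    hmm = EventSegment(k)
    hmm.fit(data)
    return hmm.ll_[-1]   # final log-likelihood of the EM fit (assumed attribute)

def compare_k(fast_areas, slow_areas, k_values=range(2, 41)):
    """Paired comparison of fast vs. slow virtual areas across the 50 NIR instances."""
    results = {}
    for k in k_values:
        ll_fast = np.array([fit_loglik(d, k) for d in fast_areas])
        ll_slow = np.array([fit_loglik(d, k) for d in slow_areas])
        t, p = ttest_rel(ll_fast, ll_slow)   # paired across model instances
        results[k] = (ll_fast.mean(), ll_slow.mean(), t, p)
    return results
```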

Model code and data

This research was carried out in the spirit of open code, and indeed benefited from open code and data for the fMRI experiments [9,44], the HMM segmentation model [8], the reservoir modeling framework [46], and the language model for word embeddings [39,40]. The Narrative Integration Reservoir code, in Python, and all required data are available on GitHub: https://github.com/pfdominey/Narrative-Integration-Reservoir/.

The fMRI data for 16 subjects in the comparison of human and Narrative Integration Reservoir event segmentation originate from the study of [45]. Data from the angular gyrus for the first 24 minutes of the Sherlock episode were derived from these data and provided by Baldassano at https://figshare.com/articles/dataset/Sherlock_data_for_OHBM/12436955. The transcript of the 18-minute auditory recall of the Sherlock episode segment, and the fMRI data and corresponding transcript for the Not the Fall narrative, are described in [44] and provided in the repository at http://datasets.datalad.org/?dir=/labs/hasson/narratives/stimuli.

Acknowledgments

This research benefited from open access to fMRI data and narrative transcripts [8,44,45,47], the HMM segmentation model [8], a framework for reservoir computing modeling [46], and word embeddings [39], without which the current work would not have been possible.

Data Availability

https://github.com/pfdominey/Narrative-Integration-Reservoir/.

Funding Statement

PFD received funding from the French Région Bourgogne Franche Comté, Grant ANER RobotSelf 2019-Y-10650. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Lashley KS. The problem of serial order in behavior. In: Jeffress LA, editor. Cerebral mechanisms in behavior. New York: Wiley; 1951. p. 112–36. doi: 10.1037/h0056603 [DOI] [Google Scholar]
  • 2.Speer NK, Reynolds JR, Swallow KM, Zacks JM. Reading stories activates neural representations of visual and motor experiences. Psychological Science. 2009;20(8):989–99. doi: 10.1111/j.1467-9280.2009.02397.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Tversky B, Zacks JM. Event perception. The Oxford handbook of cognitive psychology, Oxford University Press, New York. 2013:83–94. [Google Scholar]
  • 4.Zacks JM, Speer NK, Swallow KM, Braver TS, Reynolds JR. Event perception: a mind-brain perspective. Psychological bulletin. 2007;133(2):273. doi: 10.1037/0033-2909.133.2.273 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Boyd B. The evolution of stories: from mimesis to language, from fact to fiction. Wiley Interdisciplinary Reviews: Cognitive Science. 2018;9(1):e1444. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Bruner J. The narrative construction of reality. Critical inquiry. 1991:1–21. [Google Scholar]
  • 7.Ricoeur P. Time and Narrative, Volume 1. Chicago: University of Chicago Press; 1984. 274 p. [Google Scholar]
  • 8.Baldassano C, Chen J, Zadbood A, Pillow JW, Hasson U, Norman KA. Discovering event structure in continuous narrative perception and memory. Neuron. 2017;95(3):709–21. e5. doi: 10.1016/j.neuron.2017.06.041 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Chien H-YS, Honey CJ. Constructing and forgetting temporal context in the human cerebral cortex. Neuron. 2020. doi: 10.1016/j.neuron.2020.02.013 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Willems RM, Nastase SA, Milivojevic B. Narratives for neuroscience. Trends in neurosciences. 2020;43(5):271–3. doi: 10.1016/j.tins.2020.03.003 [DOI] [PubMed] [Google Scholar]
  • 11.Pineda FJ. Generalization of back-propagation to recurrent neural networks. Physical review letters. 1987;59(19):2229–32. doi: 10.1103/PhysRevLett.59.2229 [DOI] [PubMed] [Google Scholar]
  • 12.Elman J. Distributed representations, Simple recurrent networks, and grammatical structure. Machine Learning. 1991;7:30. [Google Scholar]
  • 13.Servan-Schreiber D, Cleeremans A, McClelland JL. Graded state machines: The representation of temporal contingencies in simple recurrent networks. Machine Learning. 1991;7(2–3):161–93. [Google Scholar]
  • 14.Dominey PF. Complex sensory-motor sequence learning based on recurrent state representation and reinforcement learning. Biol Cybern. 1995;73(3):265–74. Epub 1995/08/01. . [PubMed] [Google Scholar]
  • 15.Douglas RJ, Koch C, Mahowald M, Martin K, Suarez HH. Recurrent excitation in neocortical circuits. Science. 1995;269(5226):981–5. doi: 10.1126/science.7638624 [DOI] [PubMed] [Google Scholar]
  • 16.Laje R, Buonomano DV. Robust timing and motor patterns by taming chaos in recurrent neural networks. Nature neuroscience. 2013;16(7):925–33. doi: 10.1038/nn.3405 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Paton JJ, Buonomano DV. The neural basis of timing: Distributed mechanisms for diverse functions. Neuron. 2018;98(4):687–705. doi: 10.1016/j.neuron.2018.03.045 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Enel P, Procyk E, Quilodran R, Dominey P. Reservoir Computing Properties of Neural Dynamics in Prefrontal Cortex. PLoS computational biology. 2016;12. doi: 10.1371/journal.pcbi.1004967 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Dominey P, Arbib M., & Joseph J. P. A model of corticostriatal plasticity for learning oculomotor associations and sequences. Journal of cognitive neuroscience. 1995;7(3):311–36. doi: 10.1162/jocn.1995.7.3.311 [DOI] [PubMed] [Google Scholar]
  • 20.Cazin N, Llofriu M, Scleidorovich P, Pelc T, Harland B, Weitzenfeld A, et al. Reservoir Computing Model of Prefrontal Cortex Creates Novel Combinations of Previous Navigation Sequences from Hippocampal Place-cell Replay with Spatial Reward Propagation. PLoS computational biology. 2019;15(7). doi: 10.1371/journal.pcbi.1006624 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Fusi S, Miller EK, Rigotti M. Why neurons mix: high dimensionality for higher cognition. Current opinion in neurobiology. 2016;37:66–74. doi: 10.1016/j.conb.2016.01.010 [DOI] [PubMed] [Google Scholar]
  • 22.Rigotti M, Barak O, Warden MR, Wang XJ, Daw ND, Miller EK, et al. The importance of mixed selectivity in complex cognitive tasks. Nature. 2013;497(7451):585–90. Epub 2013/05/21. doi: 10.1038/nature12160 ; PubMed Central PMCID: PMC4412347. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Maass W, Natschlager T, Markram H. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput. 2002;14(11):2531–60. Epub 2002/11/16. doi: 10.1162/089976602760407955 . [DOI] [PubMed] [Google Scholar]
  • 24.Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science. 2004;304(5667):78–80. Epub 2004/04/06. doi: 10.1126/science.1091277 . [DOI] [PubMed] [Google Scholar]
  • 25.Lukosevicius M, Jaeger H. Reservoir computing approaches to recurrent neural network training. Computer Science Review. 2009;3(3):22. [Google Scholar]
  • 26.Dominey PF, Arbib MA, Joseph JP. A Model of Corticostriatal Plasticity for Learning Oculomotor Associations and Sequences J Cogn Neurosci. 1995;7(3):25. [DOI] [PubMed] [Google Scholar]
  • 27.Pearlmutter BA. Gradient calculations for dynamic recurrent neural networks: A survey. Neural Networks, IEEE Transactions on. 1995;6(5):1212–28. doi: 10.1109/72.410363 [DOI] [PubMed] [Google Scholar]
  • 28.Barone P, Joseph JP. Prefrontal cortex and spatial sequencing in macaque monkey. Exp Brain Res. 1989;78(3):447–64. Epub 1989/01/01. doi: 10.1007/BF00230234 . [DOI] [PubMed] [Google Scholar]
  • 29.Bernacchia A, Seo H, Lee D, Wang X-J. A reservoir of time constants for memory traces in cortical neurons. Nature neuroscience. 2011;14(3):366–72. doi: 10.1038/nn.2752 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Buonomano DV, & Merzenich M. M. Temporal information transformed into a spatial code by a neural network with realistic properties. Science. 1995;267(5200):1028–30. doi: 10.1126/science.7863330 [DOI] [PubMed] [Google Scholar]
  • 31.Dominey PF. A shared system for learning serial and temporal structure of sensori-motor sequences? Evidence from simulation and human experiments. Brain Res Cogn Brain Res. 1998;6(3):163–72. Epub 1998/04/04. doi: 10.1016/s0926-6410(97)00029-3 . [DOI] [PubMed] [Google Scholar]
  • 32.Dominey PF. Influences of temporal organization on sequence learning and transfer: Comments on Stadler (1995) and Curran and Keele (1993). Journal of Experimental Psychology: Learning, Memory, and Cognition,. 1998;24(1):14. [Google Scholar]
  • 33.Dominey PF, & Ramus F. Neural network processing of natural language: I. Sensitivity to serial, temporal and abstract structure of language in the infant. Language and Cognitive Processes. 2000;15(1):87–127. [Google Scholar]
  • 34.Dominey PF. Recurrent temporal networks and language acquisition-from corticostriatal neurophysiology to reservoir computing. Frontiers in Psychology. 2013;4:1–14. Epub 2013/08/13. doi: 10.3389/fpsyg.2013.00001 ; PubMed Central PMCID: PMC3733003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Hinaut X, Dominey PF. Real-time parallel processing of grammatical structure in the fronto-striatal system: A recurrent network simulation study using reservoir computing. PloS one. 2013;8(2). doi: 10.1371/journal.pone.0052946 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Uchida T, Lair N, Ishiguro H, Dominey PF. A Model of Online Temporal-Spatial Integration for Immediacy and Overrule in Discourse Comprehension. Neurobiology of Language. 2021;2(1):83–105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Just MA, Carpenter PA. A theory of reading: From eye fixations to comprehension. Psychological review. 1980;87(4):329. [PubMed] [Google Scholar]
  • 38.Hagoort P, van Berkum J. Beyond the sentence given. Philos Trans R Soc Lond B Biol Sci. 2007;362(1481):801–11. doi: 10.1098/rstb.2007.2089 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Yamada I, Asai A, Sakuma J, Shindo H, Takeda H, Takefuji Y, et al. Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia. arXiv preprint arXiv:14103916. 2020. [Google Scholar]
  • 40.Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J, editors. Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems; 2013. [Google Scholar]
  • 41.Kutas M, Federmeier KD. Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annual review of psychology. 2011;62:621–47. doi: 10.1146/annurev.psych.093008.131123 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Metusalem R, Kutas M, Urbach TP, Hare M, McRae K, Elman JL. Generalized event knowledge activation during online sentence comprehension. Journal of memory and language. 2012;66(4):545–67. doi: 10.1016/j.jml.2012.01.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Nieuwland MS, Van Berkum JJ. When peanuts fall in love: N400 evidence for the power of discourse. Journal of cognitive neuroscience. 2006;18(7):1098–111. doi: 10.1162/jocn.2006.18.7.1098 [DOI] [PubMed] [Google Scholar]
  • 44.Nastase SA, Liu Y-F, Hillman H, Zadbood A, Hasenfratz L, Keshavarzian N, et al. Narratives: fMRI data for evaluating models of naturalistic language comprehension. bioRxiv. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Chen J, Leong YC, Honey CJ, Yong CH, Norman KA, Hasson U. Shared memories reveal shared structure in neural activity across individuals. Nature neuroscience. 2017;20(1):115–25. doi: 10.1038/nn.4450 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Thiede LA, Zimmermann RS. Easyesn: a library for recurrent neural networks using echo state networks 2017. Available from: https://github.com/kalekiu/easyesn. [Google Scholar]
  • 47.Zadbood A, Chen J, Leong YC, Norman KA, Hasson U. How we transmit memories to other brains: constructing shared neural representations via communication. Cerebral cortex. 2017;27(10):4988–5000. doi: 10.1093/cercor/bhx202 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Polkinghorne DE. Narrative knowing and the human sciences: Suny Press; 1988. [Google Scholar]
  • 49.Geerligs L, van Gerven M, Güçlü U. Detecting neural state transitions underlying event segmentation. NeuroImage. 2021;236:118085. doi: 10.1016/j.neuroimage.2021.118085 [DOI] [PubMed] [Google Scholar]
  • 50.Murray JD, Bernacchia A, Freedman DJ, Romo R, Wallis JD, Cai X, et al. A hierarchy of intrinsic timescales across primate cortex. Nature neuroscience. 2014. doi: 10.1038/nn.3862 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Chaudhuri R, Knoblauch K, Gariel M-A, Kennedy H, Wang X-J. A large-scale circuit mechanism for hierarchical dynamical processing in the primate cortex. Neuron. 2015;88(2):419–31. doi: 10.1016/j.neuron.2015.09.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Hasson U, Yang E, Vallines I, Heeger DJ, Rubin N. A hierarchy of temporal receptive windows in human cortex. Journal of Neuroscience. 2008;28(10):2539–50. doi: 10.1523/JNEUROSCI.5487-07.2008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Lerner Y, Honey CJ, Silbert LJ, Hasson U. Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. Journal of Neuroscience. 2011;31(8):2906–15. doi: 10.1523/JNEUROSCI.3684-10.2011 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Barak O. Recurrent neural networks as versatile tools of neuroscience research. Current opinion in neurobiology. 2017;46:1–6. doi: 10.1016/j.conb.2017.06.003 [DOI] [PubMed] [Google Scholar]
  • 55.Brouwer H, Crocker MW, Venhuizen NJ, Hoeks JC. A neurocomputational model of the N400 and the P600 in language processing. Cognitive science. 2017;41:1318–52. doi: 10.1111/cogs.12461 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Rabovsky M, Hansen SS, McClelland JL. Modelling the N400 brain potential as change in a probabilistic representation of meaning. Nature Human Behaviour. 2018;2(9):693–705. doi: 10.1038/s41562-018-0406-4 [DOI] [PubMed] [Google Scholar]
  • 57.Dominey PF, Hoen M, Blanc JM, Lelekov-Boissard T. Neurological basis of language and sequential cognition: evidence from simulation, aphasia, and ERP studies. Brain Lang. 2003;86(2):207–25. Epub 2003/08/19. doi: 10.1016/s0093-934x(02)00529-1 . [DOI] [PubMed] [Google Scholar]
  • 58.Bates E, McNew S, MacWhinney B, Devescovi A, Smith S. Functional constraints on sentence processing: a cross-linguistic study. Cognition. 1982;11(3):245–99. Epub 1982/05/01. doi: 10.1016/0010-0277(82)90017-8 . [DOI] [PubMed] [Google Scholar]
  • 59.Dominey PF, Hoen M, Inui T. A neurolinguistic model of grammatical construction processing. J Cogn Neurosci. 2006;18(12):2088–107. doi: 10.1162/jocn.2006.18.12.2088 . [DOI] [PubMed] [Google Scholar]
  • 60.Bates E, MacWhinney B. Competition, variation, and language learning. In: MacWhinney B, Bates E, editors. Mechanisms of language acquisition. Hillsdale, NJ: Erlbaum; 1987. p. 157–93. [Google Scholar]
  • 61.Pointeau G, Mirliaz S, Mealier A-L, Dominey PF. Learning to Use Narrative Function Words for the Organization and Communication of Experience. Frontiers in Psychology. 2021;12. doi: 10.3389/fpsyg.2021.591703 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Mealier A-L, Pointeau G, Mirliaz S, Ogawa K, Finlayson M, Dominey PF. Narrative Constructions for the Organization of Self Experience: Proof of Concept via Embodied Robotics Frontiers in Psychology: Language. 2017. doi: 10.3389/fpsyg.2017.01331 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Baldassano C, Hasson U, Norman KA. Representation of real-world event schemas during narrative perception. Journal of Neuroscience. 2018;38(45):9689–99. doi: 10.1523/JNEUROSCI.0251-18.2018 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Talmor A, Herzig J, Lourie N, Berant J. Commonsenseqa: A question answering challenge targeting commonsense knowledge. arXiv preprint arXiv:181100937. 2018. [Google Scholar]
  • 65.Reimers N, Gurevych I. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:190810084. 2019. [Google Scholar]
  • 66.Devlin J, Chang M-W, Lee K, Toutanova K, editors. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); 2019. [Google Scholar]
  • 67.Ettinger A. What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models. Transactions of the Association for Computational Linguistics. 2020;8:34–48. [Google Scholar]
  • 68.Ettinger A, Feldman N, Resnik P, Phillips C, editors. Modeling N400 amplitude using vector space models of word representation. CogSci; 2016. 29359204 [Google Scholar]
  • 69.Mitchell TM, Shinkareva SV, Carlson A, Chang K-M, Malave VL, Mason RA, et al. Predicting human brain activity associated with the meanings of nouns. science. 2008;320(5880):1191–5. doi: 10.1126/science.1152876 [DOI] [PubMed] [Google Scholar]
  • 70.Schrimpf M, Blank IA, Tuckute G, Kauf C, Hosseini EA, Kanwisher NG, et al. Artificial Neural Networks Accurately Predict Language Processing in the Brain. BioRxiv. 2020. [Google Scholar]
  • 71.Dehghani M, Boghrati R, Man K, Hoover J, Gimbel SI, Vaswani A, et al. Decoding the neural representation of story meanings across languages. Human brain mapping. 2017;38(12):6096–106. doi: 10.1002/hbm.23814 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Silva C, Ribeiro B, editors. The importance of stop word removal on recall values in text categorization. Proceedings of the International Joint Conference on Neural Networks, 2003; 2003: IEEE. [Google Scholar]
  • 73.Elman J. Finding structure in time. Cognitive Sci. 1990;14:179–211. [Google Scholar]
  • 74.Jaeger H. The "echo state" approach to analysing and training recurrent neural networks - with an erratum note. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report. 2001;148. [Google Scholar]
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008993.r001

Decision Letter 0

Frédéric E Theunissen, Samuel J Gershman

17 May 2021

Dear Dr Dominey,

Thank you very much for submitting your manuscript "Narrative Event Segmentation in the Cortical Reservoir" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

As you will see, both reviewers agreed that your use of reservoir computing to model (and explain) event segmentation and its hierarchical temporal dynamics is an important contribution. Reviewer 2 also applauded your efforts in the use of the data and model from the Baldassano paper, facilitating the comparison between the data and the model. As you will see, both reviewers also raise weaknesses that need to be addressed. Some of these are relatively minor, as they relate to omissions of important details or clarity in the presentation, but others are more substantial: e.g. statistical assessment (choice of statistical tests) and, related, choice of model parameters (in particular K in the HMM). The choice of your model parameters requires a more quantitative and statistical approach (reviewer 1), and you could further quantify the benefits of the entire approach by, for example, comparing reservoir computing with a linear integrator, as suggested by reviewer 2.

The clarity of the manuscript could also be improved. Some of this might just be done by reordering (see comments from the reviewers).

The introduction of reservoir computing (in the Introduction) is particularly terse. The paragraph that starts with "It is not surprising that recurrent neural networks ..." is particularly dense.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Frédéric E. Theunissen

Associate Editor

PLOS Computational Biology

Samuel Gershman

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: Dominey uses a recurrent neural network model to capture several temporal dynamics observed in fMRI data during naturalistic narrative comprehension: (1) context construction at varying time scales, (2) rapid context flushing, and (3) slower-evolving events at higher-level cortical areas. The reservoir network model essentially compresses a sequence of prior semantic vectors into a discourse vector that can be deployed immediately during ongoing processing. This is an interesting (and challenging) paper. Most of my comments pertain to the first half of results validating the event segmentations of the reservoir network; namely, I think the statistical assessment of event segmentations can be improved, and we need more details about certain analytic choices (e.g. the number of events). I also think a more thorough explanation of the reservoir network early on would help readers interpret the results (e.g. how is the network trained? how do the different time constants arise?). I include a handful of minor comments and typos at the end.

Major comments:

As someone who’s not very familiar with this kind of network model, I think the manuscript would benefit from a bit more exposition early on about how the reservoir network works. We get some details in the Discussion and the Methods section located at the end of the paper, and at some points the author points to a previous paper (Uchida et al., 2021); but there should be sufficient “standalone” details about the reservoir network in this paper, and these should come early enough in the Introduction/Results that the reader can more easily interpret the findings. For example, you say “the reservoir was trained to generate the discourse vector”—but how exactly? What’s the training objective? Is the reservoir network previously trained on the text corpus somehow? You mention in the Discussion that in the “current modeling effort we do not train the reservoir,” but this comes very late (and is still a bit opaque to me). Another example: How do the temporal dynamics (time constants) of the network arise? Do they emerge out of some sort of training, or are they user-specified parameter settings when wiring the reservoir? (Apologies if these things are obvious and I’m just missing it!)

The choice of applying event segmentation to downsampled whole-brain fMRI data doesn’t seem to have a strong precedent in the related literature. For example, Baldassano et al., 2017, and 2018, use searchlights and/or focus on high-level association areas like angular gyrus, posterior medial cortex, and medial prefrontal cortex. In this case, putative event representations are encoded in finer-grained response patterns within a given region of interest (although not too fine-grained via Chen et al., 2017). It’s not clear to me that these sort of response patterns seen in the literature contribute meaningfully to the coarse, whole-brain response patterns used here, and I’m somewhat surprised the event structure is evident at the whole-brain level (e.g. Fig. 4A). This non-localized approach also seems to be in tension with the idea of a cortical hierarchy with different temporal dynamics. Replicating this analysis with more localized cortical areas implicated in high-level event representation could strengthen the argument; otherwise, better motivation should be provided for using coarse whole-brain data.

There are several places where the choice of the predefined number of events k for the HMM seems arbitrary or insufficiently motivated. For example, why is a HMM with k = 5 used when applying the reservoir network to the Wikipedia-based test narrative generated from four Wikipedia articles? Were multiple values of k assessed (which values?) and k = 5 chosen (based on what criterion?)? Or did you try k = 4, but the result fit poorly? Other examples: when running the HMM on the fMRI data, you specify k = 10; when you examine the fast and slow reservoir subnetworks, you use k = 25 and k = 8—why? If you’re trying multiple values of k here, it should be a systematic comparison and we need to know the criteria for selecting k; for example, I would consider using the t-distance introduced by Geerligs et al., 2021 (There’s also a nice demo here: https://naturalistic-data.org/content/Event_Segmentation.html).

I’m having a hard time understanding how event boundaries are statistically compared here. For example, you report for the reservoir network that the “ground truth and HMM segment boundaries are highly correlated, with the Pearson correlation r = 0.99, p < 0.0001.” What exactly is being correlated here? Are you correlating a time series of zeros with ones where an event boundary is found? In this case, the degrees of freedom is the number of (autocorrelated) time points. I’m not sure this sort of statistical test is adequate and would advocate for a nonparametric randomization-based approach. For example, Baldassano et al., 2017, use a randomization procedure where they shuffle the boundaries (e.g. 1000 times) while preserving the duration of the events (in conjunction with some metric like the t-distance mentioned above).

In line with the previous comment, when comparing the boundaries (at k = 5) found for the NYT and Wikipedia test narratives, you say you “normalized the resulting event boundaries into a common range, and made a pairwise comparison of the segment boundaries for the two texts.” I’m not really sure what this means. You compared the index of the time points on which each of the four boundaries landed? But your t-value has 5 degrees of freedom, suggesting 6 boundaries were compared… including the first and final time point? Again, I think a nonparametric statistical approach for comparing segmentations (e.g. adapted from Baldassano et al., 2017) would make this more convincing.

In Figure 11, you show the pre-reservoir time-point correlation matrix for Wikipedia2Vec embeddings that serve as input to the network. The lack of slow, event-like structure seems obvious here, but it could be useful to treat this as a more formal “control” model. In other words, if you want to show that the reservoir network captures narrative structure above and beyond the word-level embeddings, it might be worthwhile to show that it provides statistically better event segmentations than the pre-reservoir embeddings.

This paper demonstrates that relatively straightforward recurrent dynamics can reproduce several of the temporal dynamics observed in fMRI data during narrative comprehension. However, the modeling work here doesn’t really touch on the actual content of those high-level event or narrative representations. For example, Baldassano and colleagues relate event representations to situation models. Do we have any interpretation of the discourse vectors represented by the reservoir network (other than summarizing the prior semantic vectors)? You touch on this in the Discussion on page 16, but it might deserve an additional sentence or two.

The figures are described in the main text, but I don’t see any figure captions in the copy of the manuscript provided by the journal or on bioRxiv. Standalone figure captions would be helpful.

Minor comments:

For figures with time-point similarity matrices (e.g. Figs. 2, 4), it would be helpful to see color bars so we have an intuition about the scale of correlations(?) observed.

Abstract: not sure you need “awake” in the first line here

Abstract: expand “HMM” acronym

Page 4: “Wikipedia2ec” > “Wikipedia2Vec”

Page 5: “has to with” > “has to do with”

Page 7: “Braniak” > “BrainIAK”

Figure 7: “Alignement” > “alignment”

Page 8: I would use the full story name and credit the storyteller here: “It’s Not the Fall that Gets You” by Andy Christie (https://themoth.org/stories/its-not-the-fall-that-gets-you)

Page 9: You note the 10 events assigned to the fMRI data and say “Likewise for the Narrative Integration Reservoir, 10 instances were created…” This wording implies to me that the number of reservoir instances relates to the number of events—but I think you want to say that the number of instances is matched to the number of subjects (also 10). Also, you say you “summary results of this segmentation”—but how do you summarize across instances?

Page 14: “remain” > “remaining”

Page 15 “reservoir correlations displayed an abrupt response in the forgetting context” due to “immediate responses to input in the reservoir”—this still wasn’t very intuitive to me… why?

Great to see the code on GitHub!

References:

Baldassano, C., Hasson, U., & Norman, K. A. (2018). Representation of real-world event schemas during narrative perception. Journal of Neuroscience, 38(45), 9689-9699. https://doi.org/10.1523/JNEUROSCI.0251-18.2018

Chen, J., Leong, Y. C., Honey, C. J., Yong, C. H., Norman, K. A., & Hasson, U. (2017). Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience, 20(1), 115-125. https://doi.org/10.1038/nn.4450

Geerligs, L., van Gerven, M., & Güçlü, U. (2021). Detecting neural state transitions underlying event segmentation. NeuroImage, 236, 118085. https://doi.org/10.1016/j.neuroimage.2021.118085

Samuel A. Nastase

Reviewer #2: In this manuscript, Dominey presents a reservoir-computing model (a “narrative integration reservoir”) of event-boundary and temporal-integration processes in the human cerebral cortex. In particular, this model seeks to explain neural phenomena related to “hierarchical event segmentation” (as reported by Baldassano et al., 2017) and “context construction” (as reported by Chien & Honey, 2021). Broadly, these neural phenomena relate to (i) how quickly the state-vectors of cortical regions change at the boundaries between “events” in a narrative [e.g. changes of scene] and (ii) for how long the state-vectors within a segment of the narrative display context-dependence [e.g. where the state-vectors for the current input depend on the state of input 6 words earlier, or whether they are independent of the prior input].

The neural data are modeled in a two-stage procedure. In the first stage, the narrative stimuli used in the original studies are converted to a sequence of vectors, via word-by-word mapping using the wikipedia2vec embedding procedure (Yamada et al., 2020). In the second stage, the embedding sequences are converted into neural state sequences by feeding them through a reservoir model. In this way, the model is able to generate "state transitions" and "context dependence" effects that match the effects from the literature.

The manuscript demonstrates that the reservoir model generates event boundaries comparable to those detected by the HMM model in the original fMRI data, and the unit activation and pattern correlation seem to follow the gradual context construction and rapid context forgetting. Finally, the units within the reservoir model were able to be grouped according to different timescales, enabling the model to replicate the hierarchical pattern of event segmentation, context construction and context forgetting.

The main strengths of this manuscript are that:

i) the research topic and the modeling approach are important: event segmentation and multi-timescale dynamics are highly active areas with broad interest to researchers studying human computational and cognitive neurosciences

ii) the reservoir model is an intriguing and exciting model of temporal integration in the cerebral cortex, because it does not rely on precise tuning of the model parameters, and instead achieves its power via the high-dimensionality of its own dynamics; the tolerance for variation in the weights and in the dynamics makes this computational approach biologically plausible, and it could even help to account for some of the redundancy and plasticity that is observed in cortical-dependent function;

iii) the paper directly employs many of the same techniques from the original papers it is modeling, nicely clarifying the comparison between model and data;

iv) the paper is written in a lively and compelling style.

The main weaknesses of this manuscript are that:

i) the paper does not attempt to isolate the key functional features that account for the effects; in other words, the manuscript does not describe how the reservoir model could be adjusted so that it does /not/ work. In particular, a comparison with a linear integrator model (rather than a nonlinear reservoir model) would be highly instructive;

ii) the manuscript claims that there is no learning / training involved, but the use of wikipedia2vec surely involves a rich form of learning; more generally, more discussion is needed of whether the authors are claiming that the event-grouping effects in fMRI data are simply reflections of the within-event semantic similarity, or whether the event transitions may reflect other factors;

iii) related to the point above, the manuscript does not sufficiently characterize the autocorrelation that is present in the embedding vectors themselves… for example, the correlation between words within a wikipedia article must surely be higher than the correlation between words across distinct articles; it is possible that the reservoir model may magnify these effects, but a precise characterization of the autocorrelation before and after transformation through the reservoir model is critical;

iv) there are some smaller technical issues about how the model results are compared with the empirical data, as outlined below.

Overall, this is an intriguing manuscript with great potential, following revisions, to advance our understanding of event-segmentation and temporal integration processes in the human brain.

———

MAIN POINTS

1) The most important missing piece in this manuscript is to present a comparison model that does /not/ produce the effects exhibited by the narrative reservoir model. In particular, it is crucial to demonstrate that the model's performance is different from something very simple like feeding the embeddings into a linear integrator model, or a linear filter [e.g. a boxcar running average, or a recency-weighted average like [0.2, 0.18, 0.16, 0.14, 0.12, 0.08, 0.06, 0.04, 0.02]]. More generally, it would be fascinating to know which of the elements of the reservoir model are necessary in order to account for the effect: for example, what happens if the leak rate is set to near-zero, or if it is set to a very high value? Together such comparisons could help to understand whether nonlinearity is even required to generate the effects, and to separate out which effects arise from the reservoir component, and which effects arise from the wikipedia2vec embedding model.

2) Related to the point above, the manuscript claims that the results do not require any learning. For example, in this paragraph: “It is worth noting, in the context of the relation between narrative structure and recurrent network functional neuroanatomy, that the observed reservoir behavior comes directly from the input-driven dynamics of the reservoir. There is no learning, and we do not look at the readout, but at the recurrent elements of the reservoir itself.”

Although I understand that the reservoir model is not trained in any way to match the neural data, the model does have the benefit of the enormous amount of information (learned from text and semantic-structure) in the encoding model. At a conceptual level, please clarify in the text the sense in which there is no learning. At a more practical level, it seems critical to better characterize how much event-structure can be directly extracted from the embeddings themselves. Figure 11 shows a comparison of the word-embedding autocorrelation structure and the reservoir-state autocorrelation structure, and these do appear to be very different. However, they are plotted on different scales, and there is no smoothing at all applied to the word-embeddings. It seems implausible that words sampled from within one Wikipedia article (or New York Times article) are going to have the same average inter-word similarity as words sampled across two distinct articles — certainly many words will be shared or non-specific, but in the time-averaged data, there should be some semantic “themes” that are shared across sentences within an article, but not across articles. It is crucial to separate out such effects [inherent in the input to the narrative reservoir model] from the effects that arise from the recurrent dynamics of the reservoir model.
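One way to characterize the autocorrelation inherent in the input, before and after the reservoir transformation, is to compare average within-article and across-article similarity with and without temporal smoothing. The sketch below assumes known article/event boundaries and a (T, D) array of embeddings or reservoir states; the function name and smoothing choice are illustrative, not the author's.

```python
import numpy as np

def within_vs_across_similarity(states, boundaries, smooth=None):
    """Average cosine similarity of time points within the same article/event
    versus across different ones. `boundaries` is a boolean vector marking the
    first time point of each article/event. Optional boxcar smoothing puts the
    raw embeddings on a footing closer to the temporally integrated reservoir states."""
    X = np.asarray(states, dtype=float)
    if smooth:
        kernel = np.ones(smooth) / smooth
        X = np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="same"), 0, X)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = X @ X.T                                   # (T, T) cosine similarity matrix
    event_id = np.cumsum(boundaries)
    same = event_id[:, None] == event_id[None, :]
    off_diag = ~np.eye(len(X), dtype=bool)
    return C[same & off_diag].mean(), C[~same].mean()
```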

3) The HMM segmentation model will probably have a bias to “equally space” its events across an interval from start to end — in other words, in the absence of any actual structure over time, such as in random noise data, the HMM will likely segment a sequence into units of similar length. Therefore, in order to compare segmentations in the neural data and in the model, it seems important to run a control in which we use the HMM to cluster a /permutation/ of the real data, and then show that the HMM fit on this permuted data has a lower correspondence to our model [or fMRI data] than the HMM fit on the original. For example, if we have a sequence of events ABCD in the simulation and the same sequence ABCD in the neural data, then we cannot just show that the timepoints of the segmentations are correlated — the more compelling demonstration would be [for example] to compare (i) correlation when simulation and neural data are both using ABCD ordering and (ii) correlation when the simulation uses DBAC ordering and the neural data is for ABCD ordering.
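A possible form for the suggested permutation control, assuming the brainiak EventSegment HMM (as used in Baldassano et al.-style analyses) and a coarse per-timepoint event labeling of the model data; the helper names, the boundary-match score, and the number of permutations are illustrative only.

```python
import numpy as np
from brainiak.eventseg.event import EventSegment   # HMM event segmentation

def hmm_boundaries(data, n_events):
    """Fit the event-segmentation HMM to a (T, D) time series and return a
    boolean vector marking the first time point of each inferred event."""
    hmm = EventSegment(n_events)
    hmm.fit(data)
    labels = np.argmax(hmm.segments_[0], axis=1)
    return np.concatenate(([False], np.diff(labels) != 0))

def permutation_control(model_states, fmri_bounds, event_labels, n_events, n_perm=100, seed=0):
    """Compare boundary correspondence for the real model time series against
    versions in which whole events are reordered in time (e.g. ABCD -> DBAC).
    fmri_bounds: boolean boundary vector derived from the neural data.
    event_labels: a coarse event label per time point of the model data."""
    rng = np.random.default_rng(seed)
    observed = np.sum(hmm_boundaries(model_states, n_events) & fmri_bounds)
    null = []
    for _ in range(n_perm):
        order = rng.permutation(np.unique(event_labels))
        shuffled = np.vstack([model_states[event_labels == e] for e in order])
        null.append(np.sum(hmm_boundaries(shuffled, n_events) & fmri_bounds))
    p_value = np.mean(np.asarray(null) >= observed)
    return observed, p_value
```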

4) There are some issues relating the timing of the words in the (auditory, spoken) stimulus for which the neural data were recorded to the location of words within the text transcript fed into the word embedding model. Words are not spoken at a constant rate in a narrative, so 50% of the words do not correspond precisely to 50% of the time in a narrative. In order to align neural data (fMRI timing) with model predictions (word-embedding timing), the only solution is to determine when each word is spoken. The authors propose two methods for aligning the neural data with the stimulus timing, but (as far as I can tell?) neither of these methods actually precisely aligns the timing of the neural data with the timing of when the actual words were spoken. Since the stimuli are all available, is it not possible to generate the simulations in a way that matches directly with when the words were spoken (and when the brain responses were recorded)?
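If word-level onset times are available (e.g. from a forced-aligned transcript), one simple alignment scheme is to average the embeddings of all words spoken within each TR. The sketch below is illustrative and assumes such onset times, a fixed TR, and carry-forward of the previous input during pauses; none of these choices come from the manuscript.

```python
import numpy as np

def words_to_tr_inputs(word_onsets, embeddings, tr=1.5, n_trs=None):
    """Average the embeddings of all words spoken during each fMRI TR.
    word_onsets: onset time (seconds) of each word, from a forced-aligned transcript.
    embeddings: (n_words, D) array of word embeddings, in the same order."""
    word_onsets = np.asarray(word_onsets, dtype=float)
    if n_trs is None:
        n_trs = int(np.ceil(word_onsets.max() / tr)) + 1
    tr_inputs = np.zeros((n_trs, embeddings.shape[1]))
    tr_index = (word_onsets // tr).astype(int)
    for t in range(n_trs):
        in_tr = tr_index == t
        if in_tr.any():
            tr_inputs[t] = embeddings[in_tr].mean(axis=0)
        elif t > 0:
            tr_inputs[t] = tr_inputs[t - 1]      # carry forward during pauses
    return tr_inputs
```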

5) When illustrating the time-time correlation maps, it may be helpful to do so only after subtracting the mean value from each element (i.e., for each unit in the reservoir, or each dimension in the word-embedding, treat the signal as a time-series, and subtract the mean of the time-series from every time-point). This can prevent saturation issues which may confuse the interpretation. For example, in Figure 11 there is an illustration of the time-time correlation for the embedding inputs (panel B) and for the reservoir states (panel D). One correlation map is shown on a scale from 0.0 to 1.0 and the other map is shown on a scale from 0.88 to 1.0. If the mean values are removed from all elements, then the maps will not have these saturation effects, and they can be plotted on comparable scales. The saturation effect (where all correlation values are very high) arises when there is a common “mean signal” that is stable within a system over time, so that all pairs of timepoints are highly correlated.
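As a sketch of the suggested preprocessing (illustrative, not the manuscript's code): subtract each unit's temporal mean before computing the time-time correlation map, so that the shared baseline signal no longer saturates the correlations.

```python
import numpy as np

def centered_time_time_correlation(states):
    """Time-time correlation map after removing each unit's temporal mean.
    states: (T, N) array (reservoir units or embedding dimensions over time)."""
    X = np.asarray(states, dtype=float)
    X = X - X.mean(axis=0, keepdims=True)        # subtract each unit's mean over time
    # Pearson correlation between multivariate patterns at every pair of time points
    X = X - X.mean(axis=1, keepdims=True)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T
```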

6) The forgetting curves, grouped by timescales, shown in Figure 9, exhibit a dip at the beginning and then a spike around t = 15. It is not entirely clear what is happening at t = 0 and what is happening at t = 15. Please could you make this clearer, both in the text and on the figure, using labels. Given that the model is driven by word embeddings, is it possible that sudden “separation” or “convergence” effects arise from high/low frequency characters (e.g., special characters such as punctuation, or onset-related words) that occur at the event boundary? It is important to rule out the possibility that the model is driven to an unusual state by distinctive words or characters that occur near event boundaries, rather than by the broader “meaning incompatibility” between the prior context and the current input. Does this sudden dip (Figure 9) occur for all event boundaries and/or for all sentence boundaries? [Of course, it does seem that the patterns of the units were driven by the stimulus after the event boundary, and the curves seem to separate over time, indicating that their representations gradually become different.]


7) It was unclear to me whether reservoir dynamics are actually being proposed as a process model for cortical dynamics in the human brain. If so, please clarify which architectural features are being proposed -- what are the essential features of the reservoir that we should interpret as functional principles for the cerebral cortex?

8) I was intrigued by this section in the manuscript: “ We can propose that this segregation is a natural product of neuroanatomy, and that narrative structure has found its way into this structure. Narrative structure reflects the neuroanatomy.” Please could you extend and/or clarify this statement. As I understand it, the claim is that (i) each different stage of cortical processing could employ its own reservoir network and (ii) time constants in the reservoirs would be longer in higher order regions, and then (iii) this configuration would explain the results of Baldassano et al who analyzed narrative structure at multiple scales. If this is indeed the logic, please could you unpack this for the reader.

MINOR POINTS

In relation to the final “generalization” analysis, examining the long-vs-short scale event segmentation generalization, please provide a little more information about the generalization accuracy (beyond the difference between coherent vs. incoherent condition). For example, what is the generalization accuracy of event segmentation in each condition? Is there an accuracy difference between long vs. short timescale “regions”? Were meaningful events being segmented in the short- vs. long-timescale “regions”? Providing this information could help validate this analysis and clarify its meaning. Relatedly, the logic and procedure for the HMM-generalization analysis could be clarified in the text. The hypothesis being tested is described as follows: “We can now test the hypothesis that there is a relation between the time constant for construction, and the granularity of segmentation in the Narrative Integration Model by the HMM.” I think it may be clearer if you phrase this without reference to the HMM, since the HMM is just a data-analysis tool [it is not a process model for brain dynamics] — so I think that the underlying claim here is that differences in timescales of different units (within a reservoir model) can explain the increasing-scale event segmentation effects reported by Baldassano et al?

Figures: Please add axis labels to all panels of all figures, and ensure that the resolution is high; some figures are difficult to read (at least in the format supplied to reviewers)

In Figure 4B, what is the low inter-event correlation (blue cross-shape) that happens in the middle of the story? Is there something very different happening over this period?


Figure 5 nicely shows the effect of context forgetting and construction on the model representation. However, I find it hard to tell whether different units really “forget” the context at a similar rate. The forgetting curves and pattern correlations seem to take ~20 TRs to reach different activation/low pattern correlation. Is it possible to make this differentiation clearer or more explicit? Furthermore, although the color scheme allows one to differentiate the correlation values, all the values are very high (above 0.95). This makes it harder to compare with the neuroimaging results in Chien and Honey (2020), where they subtracted from each voxel its mean signal.

Figures 5 and 6: The figure titles, captions and body text could be clearer on providing a summary “take home message” of these Figures.

Figure 6: In order to demonstrate that construction time and forgetting time are not related, would it not be more direct to plot forgetting-time vs construction-time unit-by-unit?

Figure 6: there is no legend label for panels C and D

Figures 8 and 9: I appreciated the finding that units could be grouped into different timescales based on context construction analysis. Given that the mapped timescale only ranges from 0-35, consider using just 3 groups to make the inter-group difference clearer. I did not perceive much difference between the 5 correlation maps shown in Figure 8, for example.

Figures 9-13: y-axis labels should state “activation difference” or similar (instead of “activation”)


For the HMM hyperparameters, why was K = 5 chosen for the Wikipedia text, which also has 4 events, as in the NY Times text?


Please elaborate on the initialization parameters for the narrative integration reservoir. What is the distribution of values from which W_in and W are randomly initialized? Does this choice powerfully shape the behavior of the reservoir, or would similar results be observed with any other choice, as long as the reservoir dynamics are neither diverging nor collapsing?
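For reference, a typical echo state network initialization looks like the sketch below; the distributions, sparsity, and spectral-radius rescaling shown here are standard choices and are not taken from the manuscript, which is exactly why the question above matters.

```python
import numpy as np

def init_reservoir(n_units=1000, n_inputs=100, spectral_radius=0.95,
                   input_scaling=1.0, density=0.1, seed=0):
    """Typical echo state network initialization: W_in and W drawn from a uniform
    distribution, W made sparse, then rescaled so its spectral radius keeps the
    dynamics neither collapsing nor diverging."""
    rng = np.random.default_rng(seed)
    W_in = input_scaling * rng.uniform(-1, 1, size=(n_units, n_inputs))
    W = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    W[rng.random((n_units, n_units)) > density] = 0.0     # sparse recurrent connectivity
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W
```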

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: None

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Samuel A. Nastase

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008993.r003

Decision Letter 1

Frédéric E Theunissen, Samuel J Gershman

1 Aug 2021

Dear Dr Dominey,

Thank you very much for submitting your manuscript "Narrative Event Segmentation in the Cortical Reservoir" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please look at the comments of Reviewer #2 - they are quite detailed and relevant. I will look carefully at your reply.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Frédéric E. Theunissen

Associate Editor

PLOS Computational Biology

Samuel Gershman

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: The author has substantially reworked and improved the manuscript. The Introduction now contains a thorough description of the reservoir architecture used in this manuscript, which clarifies my confusion about what components of the network are fixed vs. learned. Rather than using whole-brain data, the author now uses response patterns from angular gyrus. Some of the seemingly arbitrary analysis choices (e.g. the number of events k) are now better described and motivated, and the statistical treatment is improved by using a randomization-based nonparametric statistical test for evaluating the event segmentations. Importantly, the author now better evaluates the pre-reservoir embeddings and a linear integrator model, both of which serve as baseline (or control) models. I have a couple minor comments:

In comparing the reservoir model to fMRI data, the author switched to a different story (the widely-used Sherlock movie and recall dataset). I’m a little curious why—but I assume it was just a matter of convenience.

I’m curious about the temporal structure inherent in the reservoir network vs. the temporal structure inherent in the stimulus. For example, it could be interesting to scramble the word order of the stimulus prior to supplying it to the reservoir network (following Lerner et al., 2011).

Author summary: “brains are lead through” > “brains are led through”

Page 6: the sentence structure “and for developing the reservoir model” seems awkward

Page 10: “with p values ranging from p = 1.87e-10 for linear integrator 1 to p = 3.22e-02 for NIR 2”—it might be worth unpacking these results a bit more (or comparing models)

Page 10: “This means that the HMM can be used to measure similarity in event structure in neural activity from different stimulus modalities”—what exactly does this mean? We can run HMMs on two related datasets and then examine the event “templates” (i.e. centroids) found by the model across two datasets (but I don’t think that’s done here). Or we can compare the event boundaries, but this requires somehow matching the timing across datasets (e.g. in Chen et al., 2017, they use averaging to match events across perception and recall).

References:

Lerner, Y., Honey, C. J., Silbert, L. J., & Hasson, U. (2011). Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. Journal of Neuroscience, 31(8), 2906-2915. https://doi.org/10.1523/JNEUROSCI.3684-10.2011

Reviewer #2: Thanks to the author for a raft of new analyses and figures, and for actively engaging with all the points raised in the revision.

My only outstanding concerns relate to the Construction/Forgetting analyses, and to some conceptual and methodological points around that.

(Point 1) I think that the text could be a little more precise about what was shown empirically in relation to construction and forgetting, versus the underlying mechanisms. In Chien & Honey (2020), the finding was not precisely that construction was slow while forgetting was fast. Rather, the finding was that the alignment times (“time for construction of shared context”) increased from peripheral regions toward higher-order regions, while separation times (“forgetting”) did not increase in this systematic manner. It is true that, in the HAT model presented in that paper, there is an “event reset” that leads to rapid forgetting of prior context (leading to asymmetric behavior within the same model for construction and forgetting). However, the empirical (fMRI) finding from that study was not as definitive: the finding was just that the forgetting and construction patterns did not covary with each other: slow-constructing regions were not necessarily slow-forgetting regions. This absence of covariation observed in the empirical data is a kind of asymmetry, but it is not quite the same as saying that, for a single model of a single population, the circuit can “construct slowly” and “forget quickly”. I think that the temporal asymmetry in individual models of individual regions is still an important point worth investigating — and there are plenty of other theoretical reasons to believe that some kind of context-resetting is occurring — but it is also important to be clear that the empirical finding (in relation to constructing-vs-forgetting timescales) was the absence of covariation across regions.

(Point 2) I am having difficulty understanding how the linear integrator is showing a gradual ramping (for Different —> Same) and yet a rapid decrease (for Same —> Different). I am referring here to the top panel of Figure 5, and the following text in the revised manuscript: “Interestingly the same effects of abrupt forgetting and more progressive construction are observed for the linear integrator. This indicates that this asymmetry in constructing and forgetting is a property of leaky integrators systems.” In the “Integrator Details” section below I elaborate on why I think that linear integrators that are “slow to construct” should also be “slow to forget”; but similar arguments are presented in the Supplemental Material of Chien & Honey (2020).

I am not sure where the discrepancy arises between the linear integrator used in this manuscript and the one used by Chien & Honey (2020). One possibility is that the integrator used here employs a different equation, although this is somewhat ambiguous (see Point 3 below). A second possibility is that the construction/forgetting effects interact in a crucial way with the autocorrelation in the input signals — the simulations in the current manuscript employ correlated input from the word embeddings, while the theoretical arguments of Chien & Honey (2020) made use of uncorrelated input signals. To test for the role of autocorrelated input, one possibility would be to re-run the construction / forgetting analysis on “fake stimuli” constructed as random and independent sequences of vectors generated by sampling Gaussian-distributed random numbers. If the asymmetry goes away, then it seems that the sequential properties of the stimulus play a role. A third possibility is that the dependent variable [“difference in activation level”] used in the Figure 5 analyses is different in some important way from the dependent variable (pattern correlation) used in Chien & Honey (2020). However, this possibility seems unlikely, as the current manuscript also shows similar asymmetry effects using a pattern correlation measure.
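Concretely, the “fake stimuli” control could look something like the following sketch (illustrative dimensions and segment lengths; not the author's code), which builds Intact and Scrambled input streams that share only a middle segment of independent Gaussian vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 100, 300                    # illustrative segment length and "embedding" dimension

def fake_segment():
    """One segment of independent Gaussian 'word vectors' (no autocorrelation)."""
    return rng.standard_normal((n, dim))

shared = fake_segment()                                            # the common middle segment, "CD"
intact    = np.vstack([fake_segment(), shared, fake_segment()])    # [A B][C D][E F]
scrambled = np.vstack([fake_segment(), shared, fake_segment()])    # [W X][C D][Y Z]

# Feed `intact` and `scrambled` through the integrator (or reservoir) and compare their
# states while entering the shared segment (construction) and while leaving it (forgetting).
# If the asymmetry disappears with these inputs, the sequential structure of the real
# word embeddings is doing some of the work.
```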

(Point 3) The equation describing the leaky integrator seems to contain a typo, so it is not totally clear what form of integration was employed. In the text, the equation reads as:

LI(n+1) = ( (1-alpha)LI(n-1)n + (1+alpha)*embedding(n) )/(n+1)

where I believe there may be an erroneous multiplication of LI by n?

(Point 4) Assuming that a key conceptual contribution of the present manuscript is that the reservoir computing model has a repertoire of multiple time constants, then — to be clear — is the neural reservoir being proposed as a model of multiple stages of cortical processing (each with their own timescale)? This is important because one could also imagine the reservoir as a model of a single region, in which distinct nodes within that region each have their own timescale (as in the work of Bernacchia et al, for example). Please clarify in the text.

~~~~

Integrator Details

~~~~

Can linear integrators be “slow constructors” and “rapid forgetters”?

Informally, I want to say something like “integrators that are slow to forget prior context, should also be slow to absorb new input”.

Somewhat more formally, for the linear integrator, let’s call it L(t) and assume it has a form L(t) = p*L(t-1) + q*S(t), where S(t) is the input Signal. (In the case of this manuscript, the input signal values are the outputs from the embedding model.)

Clearly, once p and q are fixed, L(t) depends only on L(0) and the values of S(t).

Moreover, if we are currently at time t, we can choose a time t-tau from the past, and we can see that L(t) depends on the values of S both before and after that moment. In other words, L(t) depends on S(t') for t' < (t - tau) and it also depends on S(t') for t' >= (t - tau). We can begin to ask whether the values of S before t-tau are, altogether, more important / influential than the values of S after t-tau.

I want to claim that:

if I fix L(0), and treat the inputs S(t) as independent random samples from a given distribution,

then the expected proportion of variance of L(t) that depends on [S(0), S(1), … S(t-tau)] is just a function of tau.

In other words, we could say that V_before(tau) = the proportion of variance of L(t) determined by [S(0), S(1), … S(t-tau)].

And similarly, we could say that V_after(tau) = the proportion of variance of L(t) determined by [S(t-tau+1), S(t-tau+2), … S(t)].

Importantly, V_before(tau) + V_after(tau) = 1, because all of the variance in the L(t) values is determined by the S(t) values.

At one extreme, when tau = 0, then we are considering the effect of all of the input on L(t), and obviously we can explain all of the variance in L(t) if we know all the input, so V_before(0) = 1.

At the other extreme, when tau = t, then we are considering none of the input, and we don’t explain any of the variance in L(t), so V_before(t) = 0.

And as tau increases from 0 towards t, we are looking at input further and further in the past, and this earlier portion accounts for less and less of the total input, so V_before(tau) gradually decreases from 1 down to 0, while V_after(tau) gradually increases from 0 to 1.
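For what it is worth, here is a quick sketch of why this should hold for the simple form above, assuming the S(t) are i.i.d. with zero mean and unit variance and L(0) is fixed (the notation is mine, not the manuscript's). Unrolling the recursion gives

```latex
L(t) = p^{t} L(0) + q \sum_{k=0}^{t-1} p^{k}\, S(t-k),
\qquad
\operatorname{Var}[L(t)] = q^{2} \sum_{k=0}^{t-1} p^{2k},
\qquad
V_{\text{before}}(\tau)
 = \frac{\sum_{k=\tau}^{t-1} p^{2k}}{\sum_{k=0}^{t-1} p^{2k}}
 \;\approx\; p^{2\tau} \quad (t \gg \tau),
\qquad
V_{\text{after}}(\tau) \approx 1 - p^{2\tau}.
```

so the decay of V_before and the rise of V_after are controlled by the same factor p^(2*tau).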

Because this function V_before(tau) (which is equivalent to 1- V_after(tau)) is just a property of the linear integrator, it will act in the same way regardless of whether we are “entering” or “exiting” a shared context. Therefore, a linear integrator should not be able to exhibit immediate forgetting (rapid) and progressive constructing (slow), as shown in Figure 5.

To make this a bit more concrete, suppose that the Intact models receive the following input sequence:

[A B][C D][E F]

where each capital letter represents a stretch of words in the input stream S(t), each stretch containing an equal number of words, and the square brackets indicate the “segments” that were scrambled in the experimental manipulation of the input data.

In that case, we can imagine that the Scrambled group sees the following sequence:

[W X][C D][Y Z]

where the shared portion of the text is “CD” and the preceding and following segments are different from what is seen by the Intact group.

The relevant time period for the “Construction” measurement is the Different-to-Same period leading into the shared segment “CD”: this is BC for the Intact condition and XC for the Scrambled condition.

Similarly, the relevant time period for the “Forgetting” measurement is the Same-to-Different period as they exit the shared segment “CD”: this is DE for the Intact condition and DY for the Scrambled condition.

So when we conduct the Construction analysis, we are asking how much of the variance in L(t) depends on the shared current input (C), even though the preceding contexts (B and X) are different. Thus, for a linear integrator, the construction analysis is a way of measuring V_after(tau).

And when we conduct the Forgetting analysis, we are asking how much of the variance in L(t) depends on the shared history (D), even though the current input streams (E and Y) are different. Thus, for a linear integrator, the forgetting analysis is a way of measuring V_before(tau).

I hope it is clear that, for both the forgetting and construction analyses, the difference (on average) of the states of the Intact and Scrambled models is just a function of V_before(tau) and V_after(tau). But since V_before(tau) + V_after(tau) = 1, we are really measuring the same thing in both cases. If V_before(tau) ramps downward quickly as a function of tau, then V_after(tau) ramps upward quickly as a function of tau.

The reasoning above is by no means a formal proof, but I hope it makes clear why I was surprised by the linear integrator results in Figure 5, and if I am going wrong here, I hope that the author can help me to understand where.
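For completeness, a small numerical sketch of this argument (arbitrary p, q, dimensions, and segment lengths; not a re-implementation of the manuscript's integrator) using the simple form L(t) = p*L(t-1) + q*S(t):

```python
import numpy as np

def integrate(S, p=0.95, q=0.05):
    """Linear leaky integrator L(t) = p*L(t-1) + q*S(t) applied to a (T, D) input."""
    L = np.zeros_like(S)
    for t in range(1, len(S)):
        L[t] = p * L[t - 1] + q * S[t]
    return L

rng = np.random.default_rng(0)
n, dim = 100, 50
shared = rng.standard_normal((n, dim))                            # the shared "CD" segment
intact    = np.vstack([rng.standard_normal((n, dim)), shared, rng.standard_normal((n, dim))])
scrambled = np.vstack([rng.standard_normal((n, dim)), shared, rng.standard_normal((n, dim))])

diff = np.linalg.norm(integrate(intact) - integrate(scrambled), axis=1)
construction = diff[n:2 * n]      # Different -> Same: how quickly the two states converge
forgetting   = diff[2 * n:]       # Same -> Different: how quickly they diverge again
# Both curves are governed by the same factor p, so for a linear integrator convergence
# and divergence should be (roughly) mirror images rather than asymmetric.
```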

In either case, thanks for the science, and apologies for this long explanation!

Chris Honey

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Samuel A. Nastase

Reviewer #2: No


References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008993.r005

Decision Letter 2

Frédéric E Theunissen, Samuel J Gershman

8 Sep 2021

Dear Dr Dominey,

We are pleased to inform you that your manuscript 'Narrative Event Segmentation in the Cortical Reservoir' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Frédéric E. Theunissen

Associate Editor

PLOS Computational Biology

Samuel Gershman

Deputy Editor

PLOS Computational Biology

***********************************************************

Dear Peter,

I have read your second revisions and your reply to the reviews. Congratulations on a nice contribution.

Frederic Theunissen

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008993.r006

Acceptance letter

Frédéric E Theunissen, Samuel J Gershman

22 Sep 2021

PCOMPBIOL-D-21-00645R2

Narrative Event Segmentation in the Cortical Reservoir

Dear Dr Dominey,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Andrea Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

