Invariants for neural automata

Jone Uria-Albizuri; Giovanni Sirio Carmantini; Peter beim Graben; Serafim Rodrigues

doi:10.1007/s11571-023-09977-5

. 2023 May 31;18(6):3291–3307. doi: 10.1007/s11571-023-09977-5

Invariants for neural automata

Jone Uria-Albizuri ^1,^✉, Giovanni Sirio Carmantini ², Peter beim Graben ³, Serafim Rodrigues ^4,^✉

PMCID: PMC11655788 PMID: 39712102

Abstract

Computational modeling of neurodynamical systems often deploys neural networks and symbolic dynamics. One particular way for combining these approaches within a framework called vector symbolic architectures leads to neural automata. Specifically, neural automata result from the assignment of symbols and symbol strings to numbers, known as Gödel encoding. Under this assignment, symbolic computation becomes represented by trajectories of state vectors in a real phase space, that allows for statistical correlation analyses with real-world measurements and experimental data. However, these assignments are usually completely arbitrary. Hence, it makes sense to address the problem which aspects of the dynamics observed under a Gödel representation is intrinsic to the dynamics and which are not. In this study, we develop a formally rigorous mathematical framework for the investigation of symmetries and invariants of neural automata under different encodings. As a central concept we define patterns of equality for such systems. We consider different macroscopic observables, such as the mean activation level of the neural network, and ask for their invariance properties. Our main result shows that only step functions that are defined over those patterns of equality are invariant under symbolic recodings, while the mean activation, e.g., is not. Our work could be of substantial importance for related regression studies of real-world measurements with neurosymbolic processors for avoiding confounding results that are dependant on a particular encoding and not intrinsic to the dynamics.

Keywords: Computational cognitive neurodynamics, Symbolic dynamics, Neural automata, Observables, Invariants, Language processing

Introduction

Computational cognitive neurodynamics deals to a large extent with statistical modeling and regression analyses between behavioral and neurophysiological observables on the one hand and neurocomputational models of cognitive processes on the other hand (Gazzaniga et al. 2002; Rabinovich et al. 2012). Examples for experimentally measurable observables are response times (RT), eye-movements (EM), event-related brain potentials (ERP) in the domain of electroencephalography (EEG), event-related magnetic fields (ERF) in the domain of magnetoencephalography (MEG), or the blood-oxygen-level-dependent signal (BOLD) in functional magnetic resonance imaging (fMRI).

Computational models for cognitive processes often involve drift-diffusion approaches (Ratcliff 1978; Ratcliff and McKoon 2007), cognitive architectures such as ACT-R (Anderson et al. 2004), automata theory (Hopcroft and Ullman 1979), dynamical systems (van Gelder 1998; Kelso 1995; Rabinovich and Varona 2018), and notably neural networks (e.g. Hertz et al. (1991); Arbib (1995)) that became increasingly popular after the induction of deep learning techniques in recent time (LeCun et al. 2015; Schmidhuber 2015).

For carrying out statistical correlation analyses between experimental data and computational models one has to devise observation models, relating the microscopic states within a computer simulation (e.g. the spiking of a simulated neuron) with the above-mentioned macroscopically observable measurements. In decision making, e.g., a suitable observation model is first passage time in a drift-diffusion model (Ratcliff 1978; Ratcliff and McKoon 2007). In the domain of neuroelectrophysiology, local field potentials (LFP) and EEG can be described through macroscopic mean-fields, based either on neural compartment models (Mazzoni et al. 2008; beim Graben and Rodrigues 2013; Martínez-Cañada et al. 2021), or neural field theory (Jirsa et al. 2002; beim Graben and Rodrigues 2014). For MRI and BOLD signals, particular hemodynamic observation models have been proposed (Friston et al. 2000; Stephan et al. 2004).

In the fields of computational psycholinguistics and computational neurolinguistics (Arbib and Caplan 1979; Crocker 1996; beim Graben and Drenhaus 2012; Lewis 2003) a number of studies employed statistical regression analysis between measured and simulated data. To name only a few of them, Davidson and Martin (2013) modeled speed-accuracy data from a translation-recall experiment among Spanish and Basque subjects through a drift-diffusion approach (Ratcliff 1978; Ratcliff and McKoon 2007). Lewis and Vasishth (2006) correlated self-paced reading times for English sentences of different linguistic complexity with the predictions of an ACT-R model (Anderson et al. 2004). Huyck (2009) devised a Hebbian cell assembly network of spiking point neurons for a related task. Using an automaton model for formal language (Hopcroft and Ullman 1979), Stabler (2011) argued how reading times could be related to the automaton’s working memory load. Similarly, Boston et al. (2008) compared eye-movement data with the predictions of an automaton model for probabilistic dependency grammars (Nivre 2008).

Correlating human language processing with event-related brain dynamics became an important subject of computational neurolinguistics in recent years. Beginning with the seminal studies of beim Graben et al. (2000, 2004), similar work has been conducted by numerous research groups (for an overview cf. Hale et al. (2022)). Also to name only a few of them, Hale et al. (2015) correlated different formal language models with the BOLD response of participants listening to speech. Similarly, Frank et al. (2015) used different ERP components in the EEG, such as the N400 (a deflection of negative polarity appearing about 400 ms after stimulus onset as a marker of lexical-semantic access) for such statistical modeling. beim Graben and Drenhaus (2012) correlated the temporally integrated ERP during the understanding of negative polarity items (Krifka 1995) with the harmony observable of a recurrent neural network (Smolensky 2006), thereby implementing a formal language processor as a vector symbolic architecture (Gayler 2006; Schlegel et al. 2021). Another neural network model of the N400 ERP-component is due to Rabovsky and McRae (2014), and to Rabovsky et al. (2018) who related this marker with neural prediction error and semantic updating as observation models. Similar ideas have been suggested by Brouwer et al. (2017); Brouwer and Crocker (2017), and Brouwer et al. (2021) who considered a deep neural network of layered simple recurrent networks (Cleeremans et al. 1989; Elman 1990), where the basal layer implements lexical retrieval, thus accounting for the N400 ERP-component, while the upper layer serves for contextual integration. Processing failures at this level are indicated by another ERP-component, the P600 (a positively charged deflection occurring around 600 ms after stimulus onset). Their neurocomputational model thereby implemented a previously suggested retrieval-integration account (Brouwer et al. 2012; Brouwer and Hoeks 2013).

In the studies of beim Graben et al. (2000, 2004, 2008), a dynamical systems approach was deployed — later dubbed cognitive dynamical modeling by beim Graben and Potthast (2009). This denotes a three-tier approach starting firstly with symbolic data structures and algorithms as models for cognitive representations and processes. These symbolic descriptions are secondly mapped onto a vectorial representation within the framework of vector symbolic architectures (Gayler 2006; Schlegel et al. 2021) through filler-role bindings and subsequent tensor product representations (Smolensky 1990, 2006; Mizraji 1989, 2020). In a third step, these linear structures are used as training data for neural network learning. More specifically, symbol strings and formal language processors can be mapped through Gödel encodings to dynamical automata (beim Graben et al. 2000, 2004, 2008; beim Graben and Potthast 2014).

In the seminal work by Siegelmann and Sontag (1995), recurrent neural networks are shown to support universal computation. Specifically, the authors construct a neural network architecture able to simulate a Universal Turing Machine in real-time. Their approach is based on the fractal encoding of the machine tape and state (beim Graben and Potthast 2009), and the application of affine linear transformations that appropriately map the encoded tape and state to a new tape and state, as dictated by the machine table, at each computation step. Recently, Carmantini et al. (2017) have shown that recurrent neural networks can also simulate dynamical automata in real-time, within an architecture named by the authors as neural automata (NA). Similarly to the approach of Siegelmann and Sontag (1995), NA make use of linear units in the network to apply affine linear transformations onto the fractal encoding of symbol strings. However, by basing their construction on dynamical automata, Carmantini et al. (2017) were able to define simpler, more parsimonious networks, with a direct correspondence between network architecture and the structure of the dynamical automata, they simulate.1

Carmantini et al. (2017) also showed how neural automata can be used for neurolinguistic correlation studies. They implemented a diagnosis-repair parser (Lewis 1998; Lewis and Vasishth 2006) for the processing of initially ambiguous subject relative and object relative sentences (Frisch et al. 2004; Lewis and Vasishth 2006) through an interactive automata network. As an appropriate observation model they exploited the mean activation of the resulting neural network (Amari 1974) as synthetic ERP (beim Graben et al. 2008; Barrès et al. 2013) and obtained a model for the P600 component in their attempt.

For all these neurocomputational models symbolic content must be encoded as neural activation patterns. In vector symbolic architectures, this procedure involves a mapping of symbols onto filler vectors and of their possible binding sites in a data structure onto role vectors (beim Graben and Potthast 2009). Obviously, such an encoding is completely arbitrary and could be replaced at least by any possible permutation of a chosen code. Therefore, the question arises to what extent neural observation models remain invariant under permutations of an arbitrarily chosen code. Even more crucially, one has to face the problem whether a statistical correlation analysis depends on only one particularly chosen encoding, or not. Only if statistical models are also invariant under recoding, they could be regarded as reliable methods of scientific investigation.

It is the aim of the present study to provide a rigorous mathematical treatment of invariant observation models for the particular case of dynamical and neural automata and their underlying shift spaces. The article is structured as follows. In "Invariants in dynamical systems" section we introduce the general concepts and basic definitions about invariants in dynamical systems, focusing later in "Neurodynamics" section on the special case of neurodynamical ones. In "Symbolic dynamics" section we focus our attention on symbolic dynamics. After introducing the basic notation we discuss the tools and facts that are needed in "Rooted trees" section about rooted trees and about Gödel encodings in "Gödel encodings" section. In "Cylinder sets" section we relate these concepts to cylinder sets in order to finally describe the invariant partitions for different Gödelizations of strings in "Invariants" section. Then, in "Neural automata" section we describe the architecture for neural automata and how to pass from single strings to dotted sequences. Finally, in "Invariant observables" section we describe a symmetry group defined by Gödel recoding of alphabets for neural automata, and we define a macroscopic observable that is invariant under this symmetry, based on the invariants described in "Invariants" section before. In the end, in "Neurolinguistic application" section, we apply our results to a concrete example with a neural automaton constructed to emulate parser for a context-free grammar. We demonstrate that the given macroscopic observable is invariant under Gödel recodings, whereas Amari’s mean network activity is not. "Discussion" section provides a concluding discussion. All the mathematical proofs about the facts claimed throughout the paper are collected in an “appendix”.

Invariants in dynamical systems

We consider a classical time-discrete and deterministic dynamical system in its most generic form as an ordered pair $Σ = (X, Φ)$ where $X \subset R^{n}$ is a compact Hausdorff space as its phase space of dimension $n \in N$ and $Φ : X \to X$ is an invertible (generally nonlinear) map (Atmanspacher and beim Graben 2007). The flow of the system is generated by the time iterates $Φ^{t}$ , $t \in Z$ , i.e., $t \mapsto Φ^{t}$ is a one-parameter group for the dynamics with time $t \in Z$ , obeying $Φ^{t} \circ Φ^{s} = Φ^{t + s}$ for $t, s \in Z$ . The elements of the phase space $x \in X$ refer to the microscopic description of the system $Σ$ and are therefore called microstates. After preparation of an initial condition $x_{0} \in X$ the system evolves deterministically along a trajectory $T = {x (t) = Φ^{t} (x_{0}) | t \in Z}$ .

A bounded function $f : X \to R$ is called an observable with $f (x) \in R$ as measurement result in microstate $x$ . The function space $B (X) = {f : X \to R | ‖ f ‖ < \infty}$ , endowed with point wise function addition $(f + g) (x) = f (x) + g (x)$ , function multiplication $(f g) (x) = f (x) g (x)$ , and scalar multiplication $(λ f) (x) = λ f (x)$ (for all $f, g \in B (X)$ , $λ \in R$ ) is called the observable algebra of the system $Σ$ with norm $‖ \cdot ‖ : B (X) \to R_{0}^{+}$ . Restricting the function space B(X) to the bounded continuous functions $C_{0} (X)$ , yields the algebra of microscopic observables which describe ideal measurements for uniquely distinguishing among different microstates within certain regions of phase space.

By contrast, complex real-world dynamical systems only allow the measurement of macroscopic properties. The corresponding macroscopic observables belong to the larger algebra of bounded functions2B(X) and are usually defined as large-scale limits of so-called mean-fields (Hepp 1972; Sewell 2002). Examples for macroscopic mean-field observables in computational neuroscience are discussed below.

The algebra of macroscopic observables B(X) contains step functions and particularly the indicator functions $χ_{A}$ for proper subsets $A \subset X$ which are not continuous over whole X. Because $χ_{A} (x) = χ_{A} (y)$ for all $x, y \in A$ , the microstates $x$ and $y$ are not distinguishable by means of the macroscopic measurement of $χ_{A}$ . Thus, Jauch (1964) and Emch (1964) called them macroscopically equivalent.3 The class of macroscopically equivalent microstates forms a macrostate in the given mathematical framework (Jauch 1964; Emch 1964; Sewell 2002). Hence, a macroscopic observable induces a partition of the phase space of a dynamical system $Σ$ into macrostates.

The algebras of microscopic observables, $C_{0} (X)$ , and of macroscopic observables, B(X), respectively, are linear spaces with their additional algebraic products. As vector spaces, they allow the construction of linear homomorphisms $φ : B (X) \to B (X)$ which are vector spaces as well. An important subspace of the space of linear homomorphisms is provided by the space of linear automorphisms, $Aut (B (X))$ , which contains the invertible linear homomorphisms. The space $Aut (B (X))$ is additionally a group with respect to function composition, $(φ \circ η) (f)$ , called the automorphism group of the algebra B(X).

Next, let G be a group possessing a faithful representation $α$ in the automorphism group $Aut (B (X))$ of the dynamical system $Σ$ ; that is, $α : G \to Aut (B (X))$ is an injective group homomorphism. Then, for $a \in G$ , $α_{a} \in Aut (B (X))$ maps an observable $f \in B (X)$ onto its transformed $α_{a} (f) \in B (X)$ , such that for two $a, b \in G$ it holds $α_{a * b} (f) = (α_{a} \circ α_{b}) (f)$ where ‘ $*$ ’ denotes the group product in G. The group G is called a symmetry of the dynamical system $Σ$ (Sewell 2002). Moreover, if the representation of G commutes with the dynamics of $Σ$ ,

\begin{matrix} (α_{a} (f \circ Φ^{t})) (x) = f (Φ^{t} (α_{a}^{*} (x))) \end{matrix}

for all $x \in X$ , the group G is called dynamical symmetry (Sewell 2002). In Eq. (1), the map $α_{a}^{*} : X \to X$ is the lifting result from the observables to phase space through

\begin{matrix} f \circ α_{a}^{*} = α_{a} (f) . \end{matrix}

As an example consider the macroscopic observable $χ_{A}$ , i.e. the indicator function for a proper subset $A \subset X$ again. Choosing $α_{a}^{*}$ in such as way that $α_{a}^{*} (x) \in A$ for all $x \in A$ , leaves $χ_{A}$ invariant: $χ_{A} (α_{a}^{*} (x)) = χ_{A} (x)$ .

More generally, we say that an observable $f \in B (X)$ is invariant under the symmetry G if

\begin{matrix} f (α_{a}^{*} (x)) = f (x) \end{matrix}

for all $a \in G$ . It is the aim of the present study to investigate such invariants for particular neurodynamical systems, namely dynamical and neural automata (beim Graben et al. 2000, 2004, 2008; Carmantini et al. 2017).

Neurodynamics

Neurodynamical systems are essentially recurrent neural networks consisting of a large number, $n \in N$ , of model neurons (or units) that are connected in a complex graph (Hertz et al. 1991; Arbib 1995; LeCun et al. 2015; Schmidhuber 2015). Under a suitable normalization, the activity of a unit, e.g. its spike rate can be represented by a real number in the unit interval $[0, 1] \subset R$ . Then, the microstate of the entire network becomes a vector in the n-dimensional hypercube, $x \in X = {[0, 1]}^{n} \subset R^{n}$ . The microscopic observables are projectors on the individual coordinate axes,

\begin{matrix} f_{i} (x) = x_{i} \end{matrix}

for $1 \leq i \leq n$ . For discrete time, the network dynamics is generally given as a nonlinear difference equation

\begin{matrix} x (t + 1) = Φ_{W} (x (t)) . \end{matrix}

Here $x (t) \in X$ is the activation vector (the microstate) of the network at time t and $Φ_{W}$ is a nonlinear map, parameterized by the synaptic weight matrix $W \in R^{2}$ . Often, the map $Φ_{W}$ is assumed to be of the form

\begin{matrix} Φ_{W} (x) = F (W \cdot x), \end{matrix}

with a nonlinear squashing function $F = {(F_{i})}_{1 \leq i \leq n} : X \to X$ as the activation function of the network. For $F_{i} = Θ$ (where $Θ$ denotes the Heaviside jump function), equations (4, 5) describe a network of McCulloch-Pitts neurons (McCulloch and Pitts 1943). Another popular choice for the activation function is the logistic function

\begin{matrix} F_{i} (x) = \frac{1}{1 + e^{- x_{i}}}, \end{matrix}

describing firing rate models (cf., e.g., beim Graben (2008)). Replacing Eq. (5) by the map

\begin{matrix} Φ_{W} (x) = (1 - Δ t) x + Δ t F (W \cdot x) \end{matrix}

yields a time-discrete leaky integrator network (Wilson and Cowan 1972; beim Graben et al. 2009; beim Graben and Rodrigues 2013). For numerical simulations using the Euler method, $Δ t < 1$ is chosen for the time step.

For correlation analyses of neural network simulations with experimental data from neurophysiological experiments one needs a mapping from the high-dimensional neural activation space $X \subset R^{n}$ into a much lower-dimensional observation space that is spanned by $p \in N$ macroscopic observables $f_{k} : X \to R$ ( $1 \leq k \leq p$ ). A standard method for such a projection is principal component analysis (PCA) (Elman 1991). If PCA is restricted to the first principal axis, the resulting scalar variable could be conceived as a measure of the overall activity in the neural network. In the realm of computational neurolinguistics PCA projections were exploited by beim Graben et al. (2008).

Another important scalar observable, e.g. used by beim Graben and Drenhaus (2012) as a neuronal observation model, is Smolensky’s harmony (Smolensky 1986)

\begin{matrix} H (t) = x {(t)}^{+} \cdot W \cdot x (t) \end{matrix}

with $x^{+}$ as transposed activation state vector, and the synaptic weight matrix $W$ , above.

Brouwer et al. (2017) suggested the “dissimilarity” between the actual microstate and its dynamical precursor, i.e.

\begin{matrix} D (t) = 1 - \frac{x (t) \cdot x (t - 1)}{‖ x (t) ‖ ‖ x (t - 1) ‖} \end{matrix}

as a suitable neuronal observation model.

In this study, however, we use Amari’s mean network activity (Amari 1974)

\begin{matrix} A (t) = \frac{1}{n} \sum_{i} x_{i} (t) \end{matrix}

as time-dependent “synthetic ERP” (Barrès et al. 2013; Carmantini et al. 2017) of a neural network.

Symbolic dynamics

A symbolic dynamics arises from a time-discrete but space continuous dynamical system $Σ$ through a partition of its phase space X into a finite family of m disjoint subsets totally covering the space X (Lind and Marcus 1995). Hence

\begin{matrix} P = {A_{k} \subset X | A_{k} \cap A_{j} = \emptyset for k \neq j, ⋃_{k = 1}^{m} A_{k} = X} . \end{matrix}

Such a partition could be induced by a macroscopic observable with finite range. By assigning the index k of a partition set $A_{k}$ as a distinguished symbol $s_{t}$ to a state $x (t)$ when $x (t) \in A_{k}$ , a trajectory of the system is mapped onto a two-sided infinite symbolic sequence. Correspondingly, the flow map of the dynamics $Φ$ becomes represented by the left shift $σ$ through $σ (s_{t}) = s_{t + 1}$ .

Following beim Graben et al. (2004, 2008), and Carmantini et al. (2017), a symbol is meant to be a distinguished element from a finite set $A$ , which we call an alphabet. A sequence of symbols $w \in A^{l}$ is called a word of length l, denoted $l = | w |$ . The set of words of all possible lengths w of finite length $| w | \geq 0$ , also called the vocabulary over $A$ , is denoted $A^{*}$ (for $| w | = 0$ , $w = ϵ$ denotes the “empty word”).

Rooted trees

One can visualize the set of all words over the alphabet $A$ as a regular rooted tree, T, where each vertex is labeled by and corresponds to each word formed by using this alphabet. Let us assume that $A$ has m letters for some $m \in N$ . That is $A = {a_{1}, \dots, a_{m}}$ . Then, the tree T is inductively constructed as follows:

(i)
The root of the tree is a vertex labeled by the empty word $ϵ$ .
(ii)
Assume we have constructed the vertices of step n, then we construct the vertices of step $n + 1$ as follows. Suppose that we have k vertices at step n that are labeled by the words $w_{1}, \dots, w_{k}$ . Then
- For each $i = 1, \dots, k$ and each $a_{j} \in A$ we add a new vertex decorated by $w_{i} a_{j}$ .
- For each $i = 1, \dots, k$ and $j = 1, \dots, m$ we add and edge from $w_{i}$ to $w_{i} a_{j}$ .

This construction generates a regular rooted tree. Following the aforementioned construction, typically in the first step the root is placed at the top vertex. Subsequently the root is joined by edges, where each edge is associated to every word of length 1, that is, to every symbol of $A$ . Then iteratively, each of these edges labeled by a letter of $A$ is joined to any word of length two starting by that letter, and so on. Assuming that $A = {a_{1}, \dots, a_{m}}$ , this construction yields an infinite tree as in Fig. 1.

Fig. 1 — The vocabulary $A^{*}$ as a rooted tree

Each vertex of the tree corresponds to a word over the alphabet $A$ . That is, the set of vertices of the tree is $A^{*}$ . On the other hand, each infinite ray starting from the root, corresponds to an infinite sequence of symbols over $A$ , and it belongs to the boundary of the tree. We denote this boundary by $\partial T$ and as mentioned, viewed as a set is equal to $A^{N}$ .

The construction of the tree is unique up to the particular ordering of the symbols in $A$ we chose. Thus, in principle, if $γ : A \to {0, \dots, m - 1}$ is a particular ordering (i.e. a bijection) of the alphabet where an element a is denoted as $a_{i}$ if $γ (a) = i - 1$ , then the tree should be denoted by $T_{γ}$ as it depends on that particular ordering of the alphabet.

Let us denote by T the regular rooted tree over the alphabet ${0, 1, \dots m - 1}$ with the natural order induced by $N$ (see Fig. 2).

Henceforth will denote by $M$ the alphabet ${0, \dots, m - 1}$ and as before, by T the tree corresponding to the alphabet $M$ under the usual ordering on $N$ .

When we say that the construction is unique up to reordering of symbols, we mean that both trees are isomorphic as graphs, where an isomorphism of graphs is a bijection between vertices preserving incidence. Indeed, for any bijection $γ : A \to M$ , the tree $T_{γ}$ is ismorphic to T as a graph.

Lemma 1

Let $γ : A \to M$ be an ordering of the alphabet $A$ . Then $T_{γ}$ and T are isomorphic.

Since being isomorphic is transtive, this lemma shows that for any two alphabets $A_{1}$ and $A_{2}$ of the same cardinality and any two orderings of those alphabets $γ_{1}$ and $γ_{2}$ , their corresponding trees $T_{γ_{1}}$ and $T_{γ_{2}}$ will be isomorphic as graphs.

Gödel encodings

Having $A^{N}$ , the space of one-sided infinite sequences over an alphabet $A$ containing $| A | = m$ symbols and $s = a_{1} a_{2} \dots$ a sequence in this space, with $a_{k}$ being the k-th symbol in s and an ordering $γ : A \to {0, \dots, m - 1}$ , then a Gödelization is a mapping from $A^{N}$ to $[0, 1] \subset R$ defined as follows:

\begin{matrix} ψ_{γ} (s) : = \sum_{k = 1}^{\infty} γ (a_{k}) m^{- k} . \end{matrix}

By the Lemma 1 we know that for each Gödelization of $A$ induced by $γ$ , there is an isomorphism of graphs between $T_{γ}$ and T. Since the choice for the ordering of the alphabet (in other words, the choice of $γ$ ) is arbitrary and leads to different Gödel encodings, we are interested in finding invariants for different such encodings.

One can define a metric on the boundary of the tree in the following way: given any two infinite rays of the tree $p = a_{1} a_{2} a_{3} \dots$ and $q = b_{1} b_{2} b_{3} \dots$ we define

d (p, q) = \{\begin{matrix} 0 & if a_{1} \neq b_{1} \\ m^{- n} & if a_{i} = b_{i} for i = 1, \dots, n and a_{n + 1} \neq b_{n + 1} \\ 1 & if p = q \end{matrix})

This defines an ultrametric on the boundary, that is, a metric that satisfies a stronger version of the triangular inequality, namely:

\begin{matrix} d (p, q) \leq max {d (p, r), d (r, q)} . \end{matrix}

When we encode the infinite strings under the Gödel encoding, we are sending rays that are close to each other under this ultrametric to points that are close in the [0, 1] interval under the usual metric.

Lemma 2

Let $p = a_{1} a_{2} a_{3} \dots$ and $q = b_{1} b_{2} b_{3} \dots$ be two infinite strings over $A$ . Then for any Gödel encoding $ψ$ we have that

\begin{matrix} d (p, q) \leq \frac{1}{m^{n}} ⟺ \exists k \in {0, \dots, m^{n} - 1}, \\ ψ (p), ψ (q) \in [\frac{k}{m^{n}}, \frac{k + 1}{m^{n}}) . \end{matrix}

Recall that the lemma does not mean that points that are close (with respect to the usual metric) on the [0, 1] interval come from rays that were close on the tree. For example, if the alphabet has 3 letters, the points $1 / 3 - ϵ$ and 1/3 are as close as we want for any $ϵ > 0$ but are at distance 0 from each other on the tree. In fact, it gives a partition of the interval for each $n \in N$ in a way that, if two points representing an infinite string are in the same interval according to the partition of the corresponding n, then they come from two rays that share at least a common prefix of length n.

Cylinder sets

In symbolic dynamics, a cylinder set (McMillan 1953) is a subset of the space $A^{N}$ of infinite sequences from an alphabet $A$ that agree in a particular building block of length $l \in N$ . Thus, let $w = A^{*}$ be a finite word $a_{1} a_{2} \dots a_{l}$ of length l, we define the cylinder set

\begin{matrix} [w] = [a_{1} a_{2} \dots a_{l}] = {s \in A^{N} | s_{k} = a_{k}, k = 1, \dots, l} . \end{matrix}

We can also see the cylinder sets on the tree depicted in Fig. 3. In fact, for each level on the tree (where level refers to vertices corresponding to words of certain fixed length) we get a partition of the interval [0, 1]. The vertices hanging from each vertex on that level land on their corresponding interval of the partition. Thus, from a rooted tree view point, a cylinder set corresponds to a whole tree hanging from that vertex. Concretely, the cylinder set [w] for the word $w \in A^{*}$ is the subtree hanging from the vertex decorated by w.

Two different Gödel codes $ψ, φ$ can only differ with respect to their assignments $γ_{1}, γ_{2} : A \to {0, \dots, m - 1}$ . Thus, we call a permutation $π \in S_{m}$ (with $S_{m}$ as the symmetric group) a Gödel recoding, if there exist two assigments $γ_{1}$ and $γ_{2}$ such that

\begin{matrix} π \circ γ_{1} = γ_{2} . \end{matrix}

Invariants

The ultimate goal of our study is to find invariants under Gödel recodings. Observe that under the notation of Lemma 1, $g_{γ_{1}} : T_{γ_{1}} \to T$ and $g_{γ_{2}} : T_{γ_{2}} \to T$ are two graph isomorphisms. In fact, they induce a graph automorphism of T, $g_{π} = g_{γ_{2}} \circ g_{γ_{1}}^{- 1} : T \to T$ . And this automorphism sends the vertices encoded by $γ_{1}$ to the ones encoded by $γ_{2}$ .

As Lemma 2 shows, a Gödel recoding preserves the size of cylinder sets after permuting vertices. However, the way of ordering the alphabet and how this permutes the rays of the tree is even more restrictive than just preserving the size of the cylinder sets. In fact, under the action of a reordering each vertex can only be mapped to certain vertices and it is forbidden to be sent to others. This is captured by the following most central definition.

Definition 1

Let $w = a_{i_{1}} a_{i_{2}} \dots a_{i_{l}} \in M^{l}$ be a string of length l after an ordering $γ$ . We define a partition of the set of integers ${1, 2, \dots, l}$ ,

\begin{matrix} P_{w} = {{j_{1}, \dots, j_{k}} \subset N | a_{i_{j_{1}}} = \dots = a_{i_{j_{k}}}} . \end{matrix}

For any word $w \in M^{*}$ we call $P_{w}$ the pattern of equality of w.

Equipped with aforementioned formalisms we are now in a position to formulate the first main finding of our study as follows.

Theorem 1

For any other vertex $u \in T$ there exists a Gödel recoding $π$ such that $g_{π} (w) = u$ if and only if

\begin{matrix} u \in & M^{l} \end{matrix}

\begin{matrix} P_{w} = & P_{u} . \end{matrix}

Theorem 1 states that each vertex can be mapped to any vertex having the same pattern of equality and nowhere else.

Example 1

If $A = {a, b, c}$ and we consider $w = a a a b c a b c \in A^{8}$ . Then we have $P_{w} = {{1, 2, 3, 6}, {4, 7}, {5, 8}}$ , which gives us all the possible words where w can be mapped to. That would be the list of all the possibilities:

\begin{matrix} \begin{matrix} bbbacbac & c c c b a c b a & a a a c b a c b \\ b b b c a b c a & c c c a b c a b & a a a b c a b c \end{matrix} \end{matrix}

So we have only 6 possible vertices out of $3^{8} = 6561$ . And of course, this proportion reduces as we go deeper on the tree.

In terms of Gödelization into the [0, 1] interval, we illustrate the implications by an example. Let us assume that $m = 3$ and $l = 3$ , for example. Then, in Fig. 4 the cylinder sets of certain color can only be mapped through a recoding to a cylinder set of the same color and nowhere else.

Figure 5 shows the corresponding partition of the interval [0, 1] where the intervals in each color may be mapped to another of the same color by a different assignment map and nowhere else.

Fig. 5 — Invariant partition of the interval [0, 1] after Gödelization

Neural automata

Following beim Graben et al. (2004, 2008), and Carmantini et al. (2017), a dotted sequence $s \in A^{Z}$ on an alphabet $A$ is a two-sided infinite sequence of symbols “ $s = \dots a_{- 2} a_{- 1} . a_{0} a_{1} a_{2} \dots$ ” where $a_{i} \in A$ , for all indices $i \in Z$ . Here, the dot “.” is simply used as a mnemonic sign, indicating that the index 0 is to its right.

A shift space $M_{S} = (A^{Z}, σ)$ consists of the set $A^{Z}$ together with a map4 $σ : A^{Z} \to A^{Z}$ such that $σ (a_{i}) = a_{i + 1}$ for $i \in Z$ (Lind and Marcus 1995). Additionally, Moore (1990, 1991) have shown that the shift space $A^{Z}$ endowed with

\begin{matrix} \begin{matrix} F : A^{Z} \to Z \\ \oplus : A^{Z} \times {(A \cup {Φ})}^{Z} \to A^{Z} \\ G : A^{Z} \to {(A \cup {Φ})}^{Z}, \end{matrix} \end{matrix}

and their composition $Ω (s) = σ^{F (s)} (s \oplus G (s))$ , can simulate any Turing machine. The space $M_{GS} = (A^{Z}, Ω)$ is called a Generalized Shift (GS) if there exists a domain of dependance (DoD), i.e. an interval $(k_{l}, k_{r})$ , with $k_{l} \leq 0 \leq k_{r}$ , such that the definition of the maps F and G only depend on the content of the string $s \in A^{Z}$ on that interval. The function G maps each symbol in the DoD of s to a symbol in $A$ , whereas all symbols outside of the DoD are mapped to an auxiliary symbol $Φ$ . The $\oplus$ operation then carries out a substitution operation where all symbols mapped to $Φ$ by G(s) are left untouched, whereas symbols in the DoD of s are overwritten by their image under G(s). Finally, the map F determines the number of shifts to be applied to the string resulting by the substitution operation.

Carmantini et al. (2017) introduced a more general shift space, called versatile shift (VS). The VS is equipped with a more versatile rewriting operation, where dotted words in the DoD are replaced by dotted words of arbitrary length, as opposed to replacing each symbol in the DoD with exactly one symbol, as in a GS. For that purpose the dot is interpreted as a meta-symbol which can be concatenated with two words $v_{1}, v_{2} \in A^{*}$ through $v = v_{1} . v_{2}$ . Let ${\hat{A}}^{*}$ denote the set of these dotted words. Moreover, let $Z^{-} = {i | i < 0, i \in Z}$ and $Z^{+} = {i | i \geq 0, i \in Z}$ the sets of negative and non-negative indices. We can then reintroduce the notion of a dotted sequence as follows: a dotted sequence $s \in A^{Z}$ is a bi-infinite sequence of symbols such that $s = w_{α} v w_{β}$ with $v \in {\hat{A}}^{*}$ as a dotted word $v = v_{1} . v_{2}$ and $w_{α} v_{1} \in A^{-}$ and $v_{2} w_{β} \in A^{+}$ . Through this definition, the indices of s are inherited from the dotted word v and are thus not explicitly prescribed.

A VS is then defined as a pair $M_{VS} = (A^{Z}, Ω)$ , with $A^{Z}$ being the space of dotted sequences, and $Ω : A^{Z} \to A^{Z}$ defined by

\begin{matrix} Ω (s) = σ^{F (s)} (s \oplus G (s)) \end{matrix}

with

\begin{matrix} \begin{matrix} F : A^{Z} \to Z \\ \oplus : A^{Z} \times A^{Z} \to A^{Z} \\ G : A^{Z} \to A^{Z}, \end{matrix} \end{matrix}

where the operator “ $\oplus$ ” substitutes the dotted word $v_{1} . v_{2} \in {\hat{A}}^{*}$ in s with a new dotted word $\hat{v_{1}} . \hat{v_{2}} \in {\hat{A}}^{*}$ specified by G, while ${F (s) = F |}_{{\hat{A}}^{*}} (v_{1} . v_{2})$ determines the number of shift steps as for Moore’s generalized shifts (Carmantini et al. 2017). For a more detailed explanation about VS see Section 2.1.1 in Carmantini et al. (2017).

A nonlinear dynamical automaton (NDA) is a triple $M_{NDA} = (Y, P, Φ)$ , where $P$ is a rectangular partition of the unit square $Y = {[0, 1]}^{2} \subset R^{2}$ , that is

\begin{matrix} P = {D^{(i, j)} \subset Y | 1 \leq i \leq m, 1 \leq j \leq n, m, n \in N}, \end{matrix}

so that each cell is defined as $D^{(i, j)} = I_{i} \times J_{j}$ , with $I_{i}, J_{j} \subset [0, 1]$ being real intervals for each bi-index (i, j), with $D^{(i, j)} \cap D^{(k, l)} = \emptyset$ if $(i, j) \neq (k, l)$ , and $⋃_{i, j} D^{(i, j)} = Y$ . The couple $(Y, Φ)$ is a time-discrete dynamical system with phase space Y and the flow $Φ : Y \to Y$ is a piecewise affine-linear map such that $Φ_{| D^{(i, j)}} : = Φ^{(i, j)}$ , with $Φ^{(i, j)}$ having the following form:

\begin{matrix} Φ^{(i, j)} (y) = (\begin{matrix} a_{1}^{(i, j)} \\ a_{2}^{(i, j)} \end{matrix}) + (\begin{matrix} λ_{1}^{(i, j)} & 0 \\ 0 & λ_{2}^{(i, j)} \end{matrix}) (\begin{matrix} y_{1} \\ y_{2} \end{matrix}), \end{matrix}

with state vector $y = (y_{1}, y_{2})$ . Carmantini et al. (2017) have shown that using Gödelization any versatile shift can be mapped to a nonlinear dynamical automaton. Therefore, one can reproduce the activity of a versatile shift on the unit square Y. In order to do so, the partition (18) is given by the so called domain of dependance (DoD). The domain of dependance is a pair $(l, r) \in N \times N$ which defines the length of the strings on the left and right hand side of the dot in a dotted sequence that is relevant for the versatile shift to act on the phase space. The dynamics of the versatile shift is completely determined by how the string looks like in each iteration on the domain of dependance. Then, if the domain is (l, r) and if the alphabet $A$ has size m, the partition of the unit square is given by $m^{r}$ intervals on the $y_{1}$ axis and $m^{l}$ intervals on the $y_{2}$ axis, corresponding to cells where the NDA is defined according to the versatile shift. Finally, a neural automaton (NA) is an implementation of an NDA by means of a modular recurrent neural network. The full construction can be followed in "Invariants in dynamical systems" section of Carmantini et al. (2017).

The neural automaton comprises a phase space $X = {[0, 1]}^{n}$ where the two-dimensional subspace $Y = {[0, 1]}^{2}$ of the underlying NDA is spanned by only two neurons that belong to the machine configuration layer (MCL). The remainder $X \ Y$ is spanned by the neurons of the branch selection layer (BSL) and the linear transformation layer (LTL), both mediating the piecewise affine mapping (19). Having an NDA defined from a versatile shift, each rectangle on the partition is given by the DoD, and the action of the NDA on each rectangle depends on the particular Gödel encoding of the alphabet $A$ that has been chosen. We are interested in invariant macroscopic observables of such automata under different Gödel encodings of the alphabet.

Since we are now interested on dotted sequences over an alphabet $A$ , instead of having an invariant partition of the interval [0, 1] as in Fig. 5, we will have an invariant partition of the unit square $Y = {[0, 1]}^{2}$ . That is, we will have a partition in rectangles where the machine might be at certain step of the dynamics or not. Each color in that partition gives all the possible places where a particular dotted sequence of certain right and left lengths could be under a different Gödel encoding.

For example, assuming that our alphabet has $m = 3$ letters in both sides of the dotted sequence and that we are looking at words of length $l = 2$ on the left hand side of the dot, and length $r = 3$ on the right hand side of the dot, the partition would be like in Fig. 6.

Fig. 6 — Each small square corresponds to a square on the partition given by the dotted sequences of length (2, 3). The squares colored by the same color are those having the same pattern of equality, and thus, are those which can be mapped to each other under different Gödel encodings of the alphabet

Let us assume that we are considering the invariant partition for dotted sequences of length (l, r), meaning that the left hand side has length l and the right hand side r. Then we know that the partition of the square Y is given by $E^{(i, j)} = [\frac{i}{m^{l}}, \frac{i + 1}{m^{l}}) \times [\frac{j}{m^{r}}, \frac{j + 1}{m^{r}})$ . Each left corner of the rectangle corresponds to the position of the Gödelization of a dotted sequence of size (l, r). Each point $(y_{1}, y_{2}) = (\frac{i}{m^{l}}, \frac{j}{m^{l}})$ has a unique expansion on base m for its coordinates. That is, there are some $c_{1}, \dots, c_{l}$ with $0 \leq c_{i} \leq m_{1}$ such that

\begin{matrix} y_{1} = \frac{i}{m^{l}} = \frac{c_{1}}{m} + \frac{c_{2}}{m^{2}} + \dots + \frac{c_{l}}{m^{l}} . \end{matrix}

These ${c_{1}, \dots, c_{l}}$ also define a partition of ${1, \dots, l}$ in the same way as given in Definition 1. Therefore ${d_{1}, \dots, d_{k}} \in P_{x} ⟺ c_{j_{1}} = \dots = c_{j_{k}}$ . This procedure similarly applies to the $y_{2}$ coordinate. Hence, the corners defining an invariant piece of the partition will be those sharing the same partition of ${1, \dots, l} \times {1, \dots, r}$ . In other words, we can obtain the corners related to a given $y$ by expanding $y_{1}$ and $y_{2}$ on base m and permuting the appearance of $0, \dots, m - 1$ on the expansion.

For example, if $m = 3$ and $(l, r) = (2, 3)$ , we have $3^{2} \cdot 3^{3} = 3^{5}$ rectangles. Now let us take, for instance the rectangle $[\frac{6}{3^{2}}, \frac{7}{3^{2}}) \times [\frac{10}{3^{3}}, \frac{11}{3^{3}})$ and let us find its invariant partition. First we decompose

\begin{matrix} y_{1} = \frac{6}{3^{2}} = \frac{2}{3} + \frac{0}{9} and y_{2} = \frac{10}{3^{3}} = \frac{1}{3} + \frac{0}{9} + \frac{1}{27} . \end{matrix}

Hence a rectangle in the same invariant partition must be of the form $E^{(i, j)} = [\frac{i}{m^{l}}, \frac{i + 1}{m^{l}}) \times [\frac{j}{m^{r}}, \frac{j + 1}{m^{r}})$ with $\frac{i}{3^{2}} = \frac{a}{3} + \frac{b}{9}$ and $\frac{j}{3^{3}} = \frac{c}{3} + \frac{b}{9} + \frac{c}{27}$ with $a, b, c \in {0, 1, 2}$ and different.5 This gives the following rectangles

\begin{matrix} [\frac{1}{9}, \frac{2}{9}) \times [\frac{23}{27}, \frac{24}{27}) & [\frac{3}{9}, \frac{4}{9}) \times [\frac{20}{27}, \frac{21}{27}) \\ [\frac{2}{9}, \frac{3}{9}) \times [\frac{16}{27}, \frac{17}{27}) & [\frac{6}{9}, \frac{7}{9}) \times [\frac{10}{27}, \frac{11}{27}) \\ [\frac{7}{9}, \frac{8}{9}) \times [\frac{3}{27}, \frac{4}{27}) & [\frac{5}{9}, \frac{6}{9}) \times [\frac{6}{27}, \frac{7}{27}) \end{matrix}

In this way we can construct the partition of the unit square given by the patterns of equality.

Invariant observables

Our aim now is to define an observable $f \in B (X)$ , in the sense of "Invariants in dynamical systems" section for neural automata. That is $f : X \to R$ should obey Eq. (3) where the map $α_{π}^{*}$ corresponds to a symmetry induced by a Gödel recoding of the alphabets. Here $π$ denotes the permutation of the alphabet needed to pass from one Gödel encoding to the other, as explained later.

Notice that in the previous discussion we were assuming that we knew the length of the strings that were encoded. However, this is not the case in practice, and may cause problems, as the length of the strings vary at each iteration. For instance, if for the alphabet ${a, b}$ the symbol a is mapped to 0 under certain Gödel enconding $γ$ and the symbol b to 1, then the number $x = 1 / 2 \in [0, 1]$ would correspond to the word $w_{r} = b a \overset{r - 1}{\dots} a$ once we assume that the string is of length r for $r \in N \cup {0}$ . However, if we do not know the length of the encoded string, each $w_{r}$ will have a different Gödel number under the Gödel encoding that sends b to 0 and a to 1, namely $\sum_{k = 2}^{r - 1} 1 / 2^{k}$ . Thus, encoding symbols with the number 0 makes some strings indistinguishible under Gödel recoding, because having no symbols is interpreted as having the symbol encoded by 0 as many times as we want. This issue can be easily avoided by adding one symbol $⊔$ to the alphabet, which will be interpreted as a blank symbol, and will always be forced to be encoded as 0 by any Gödel encoding.

Suppose that we have an NDA defined from a versatile shift under the condition that the blank symbol $⊔$ has been added to the alphabet $A$ representing the blank symbol and that is mapped to 0 under any Gödel encoding.6 We will assume that $A$ has m symbols after adding the blank symbol (that is, we had $m - 1$ symbols before). Then for any pair $(r, l) \in N \times N$ , we can divide the unit square Y into the rectangle partition given by

\begin{matrix} R = {E^{(i, j)} \subset Y | 1 \leq i \leq m^{r}, 1 \leq j \leq m^{l}} . \end{matrix}

Next, we extend this partition of the phase space of the NDA, that equals the subspace of the machine configuration layer of the larger NA, to the entire phase space of the neural automaton. This is straightforwardly achieved by defining another partition

\begin{matrix} Q = {E^{(i, j)} \times {[0, 1]}^{n - 2} \subset X | 1 \leq i \leq m^{r}, 1 \leq j \leq m^{l}} . \end{matrix}

Now, for each left corner $(x_{1}^{(i, j)}, x_{2}^{(i, j)}) \in E^{(i, j)}$ we find their pattern of equality $P_{ij}$ , assuming that the permutation is taking place just on the symbols ${2, \dots, m}$ (as the first symbol has to be mapped to 0 under any encoding).

Let us suppose that ${P_{i j_{1}}, \dots, P_{i j_{s}}}$ are all the different appearing patterns of equality and we define the indicator functions $χ_{k} : X \to {0, 1}$ as

\begin{matrix} χ_{k} (x) = \{\begin{matrix} 1 & if x \in E^{(i, j)} \times {[0, 1]}^{n - 2} and P_{ij} = P_{i j_{k}} \\ 0 & otherwise \end{matrix}) \end{matrix}

for $x \in X$ . Then, we can choose $c_{1}, \dots, c_{s} \in R$ to be s different real numbers and define a macroscopic observable $f : X \to R$ as a step function

\begin{matrix} f (x) = \sum_{k = 1}^{s} c_{k} χ_{k} (x) . \end{matrix}

Clearly, we have $f \in B (X)$ .

Our aim is to show that this observable is invariant under the symmetry group $S_{m - 1} \times S_{m - 1}$ of the dynamical system $(X, Φ)$ given by the neural automaton in Eq. (19), where $S_{m - 1}$ denotes the symmetric group on $m - 1$ elements. First of all, we must show that $S_{m - 1} \times S_{m - 1}$ is a symmetry of the neural automaton.

Before doing this, we will define an auxiliary map. Let $π = (π_{1}, π_{2}) \in S_{m} \times S_{m}$ be any element of the product that fixes 1 (on the set ${1, 2, \dots, m}$ where $S_{m}$ acts). Notice that the elements of $S_{m}$ fixing the first element form a subgroup of $S_{m}$ that is isomorphic to $S_{m - 1}$ . Let now $x$ be any point in X. Let us consider $y = (y_{1}, y_{2})$ the first two coordinates of $x$ given by the activations of the machine configuration layer of the NA. Then, we can check in which of the intervals of the partition $R$ is, say $(y_{1}, y_{2}) \in E^{(i, j)} = [\frac{i}{m^{l}}, \frac{i + 1}{m^{l}}) \times [\frac{j}{m^{r}}, \frac{j + 1}{m^{r}})$ . We can therefore compute the expansion on base m of each corner and take the coefficients we get as words over the alphabet $M = {0, 1, \dots, m - 1}$ , say $c_{1} \dots c_{l} \in M^{l}$ and $d_{1}, \dots, d_{r} \in M^{r}$ . Then, we compute $g_{π_{1}} (c_{1} \dots c_{l})$ and $g_{π_{2}} (d_{1} \dots d_{r})$ and we encode these words by the canonical Gödel encoding (that is, the one given by the identity map on $M$ ). Thus, we obtain a new corner of some rectangle in our partition of the phase space, say $E^{(i^{'}, j^{'})} = [\frac{i^{'}}{m^{l}}, \frac{i^{'} + 1}{m^{l}}) \times [\frac{j^{'}}{m^{r}}, \frac{j^{'} + 1}{m^{r}})$ . We now define a map $ρ_{π} : Y \to Y$ by $ρ_{π} (y_{1}, y_{2}) = (y_{1} + \frac{i^{'} - i}{m^{l}}, y_{2} + \frac{j^{'} - j}{m^{r}})$ . This map can obviously be extended to a map from X to X being the identity on the rest of the coordinates. Abusing notation we also refer to $ρ_{π}$ as to this map. Informally speaking, the map $ρ_{π}$ rigidly permutes the squares on the partition $R$ according to the action of $g_{π_{1}}$ and $g_{π_{2}}$ on the words representing the corners.

Now, we can define $α_{π} : B (X) \to B (X)$ as follows. For any $f \in B (X)$ , we define

\begin{matrix} α_{π} (f) (x) = f (ρ_{π} (x)) . \end{matrix}

It is not difficult to check that if $π, γ \in S_{m - 1} \times S_{m - 1}$ are two group elements, then $α_{γ π} (f) = (α_{γ} \circ α_{π}) (f)$ so that $S_{m - 1} \times S_{m - 1}$ is a symmetry of the system.

Thus, we obtain finally our main result.

Theorem 2

Let $f \in B (X)$ be a macroscopic observable on the space space of a neural automaton as defined in (24). Then f is invariant under the symmetric group $S_{m - 1} \times S_{m - 1}$ of Gödel recodings of the automaton’s symbolic alphabet.

It is worth mentioning that this procedure gives infinitely many different invariant observables. In fact, any choice of $(r, l) \in N \times N$ gives a thinner invariant partition, and respectively, a sharper observable.

Neurolinguistic application

As an instructive example we consider a toy model of syntactic language processing as often employed in computational psycholinguistics and computational neurolinguistics (Arbib and Caplan 1979; Crocker 1996; beim Graben and Drenhaus 2012; Hale et al. 2022; Lewis 2003).

In order to process the sentence given by beim Graben and Potthast (2014) in example 2, linguists often derive a context-free grammar (CFG) from a phrase structure tree (Hopcroft and Ullman 1979).

Example 2

the dog chased the cat

In our case, the CFG consists of rewriting rules

\begin{matrix} S & \to NP VP \end{matrix}

\begin{matrix} VP & \to V NP \end{matrix}

\begin{matrix} NP & \to the dog \end{matrix}

\begin{matrix} V & \to chased \end{matrix}

\begin{matrix} NP & \to the cat \end{matrix}

where the left-hand side always presents a nonterminal symbol to be expanded into a string of nonterminal and terminal symbols at the right-hand side. Omitting the lexical rules (28 – 30), we regard the symbols $NP, V$ , denoting ‘noun phrase’ and ‘verb’, respectively, as terminals and the symbols $S$ (‘sentence’) and $VP$ (‘verbal phrase’) as nonterminals.

Then, a versatile shift processing this grammar through a simple top down recognizer (Hopcroft and Ullman 1979) is defined by

\begin{matrix} S . a VP & \mapsto NP . a \\ VP . a NP & \mapsto V . a \\ a . a & \mapsto ϵ . ϵ \end{matrix}

where the left-hand side of the tape is now called ‘stack’ and the right-hand side ‘input’. In (31) a stands for an arbitrary input symbol. Note the reversed order for the stack left of the dot. The first two operations in (31) are predictions according to a rule of the CFG while the last one is an attachment of subsequent input with already predicted material.

This machine then parses the well formed sentence $NP V NP$ as shown in Table 1 from beim Graben and Potthast (2014). We reproduce this table here as Table 1.

Table 1.

Sequence of state transitions of the versatile shift processing the well-formed string from example 2, i.e. NP V NP. The operations are indicated as follows: “predict (X)” means prediction according to rule (X) of the context-free grammar; attach means cancelation of successfully predicted terminals both from stack and input; and “accept” means acceptance of the string as being well-formed

Time	State	Operation
0	S.NP V NP	Predict (26)
1	VP NP.NP V NP	Attach
2	VP.V NP	Predict (27)
3	NP V.V NP	Attach
4	NP.NP	Attach
5	$ϵ$ . $ϵ$	Accept

Open in a new tab

Once we obtained the versatile shift, an NA simulating it can be generated. When we do so, we chose a particular Gödel encoding of the symbols. Suppose we chose the following two Gödelizations $γ = (γ_{1}, γ_{2})$ and $δ = (δ_{1}, δ_{2})$ that are given by

\begin{matrix} γ_{1} : {⊔, NP, V} & \to {0, 1, 2} \\ ⊔ & \mapsto 0 \\ NP & \mapsto 1 \\ V & \mapsto 2 \\ γ_{2} : {⊔, NP, V, VP, S} & \to {0, 1, 2, 3, 4} \\ ⊔ & \mapsto 0 \\ NP & \mapsto 1 \\ V & \mapsto 2 \\ VP & \mapsto 3 \\ S & \mapsto 4 \end{matrix}

on the one hand, and by

\begin{matrix} δ_{1} : {⊔, NP, V} & \to {0, 1, 2} \\ ⊔ & \mapsto 0 \\ NP & \mapsto 2 \\ V & \mapsto 1 \\ δ_{2} : {⊔, NP, V, VP, S} & \to {0, 1, 2, 3, 4} \\ ⊔ & \mapsto 0 \\ NP & \mapsto 4 \\ V & \mapsto 3 \\ VP & \mapsto 1 \\ S & \mapsto 2 \end{matrix}

on the other hand. Defining the step function $f : X \to R$ as in (24) after choosing $(l, r) = (2, 3)$ and the $c_{i}$ -s randomly. The neural automaton consists of $n = 72$ neurons, i.e. the phase space is given by the hypercube $X = {[0, 1]}^{72}$ . Running the neural network with both encodings and computing the step function f on each iteration $i = 1, \dots, 6$ , we see in Fig. 7 that f is indeed invariant under Gödel recoding.

Fig. 7 — The macroscopic observable f, given by the step function (24) is invariant under Gödel recoding. The figure shows the result of ‘measuring’ f to a neural automaton encoded by $γ$ on top and to the same machine encoded by $δ$ below

The step function clearly distinguishes among different states (where here by “different”we mean with different patterns of equality), but returns the same value for the states corresponding to the same pattern of equality, that is, states that differ on the Gödel encodings, as desired.

In contrast, if we use Amari’s observable Eq. (9) for the same simulation, we obtain a very different picture, showing that this observable is not invariant under Gödel recoding, as shown in Fig. 8. Obviously, this observable strongly depends on the particular Gödel encoding we have chosen.

Fig. 8 — Amari’s mean-field observable Eq. (9) of the neural automaton under two different Gödel encodings $γ$ and $δ$

Discussion

In this study we have presented a way of finding particular macroscopic observables for nonlinear dynamical systems that are generated by Gödel encodings of symbolic dynamical systems, such as nonlinear dynamical automata (NDA: beim Graben et al. (2000, 2004, 2008); beim Graben and Potthast (2014)) and their respective neural network implementation, namely, neural automata (NA: Carmantini et al. (2017)). Specifically, we have investigated under which circumstances such observables could be invariant under any particular choice for the Gödel encoding.

When mapping symbolic dynamics to a real phase space, the numbering of the symbols is usually arbitrary. Therefore, it makes sense to ask which information of the dynamics is preserved or can be recovered from what we see in phase space under the different possible options. In this direction, we have provided a complete characterisation of the strings that are and are not distinguishable after certain Gödel encoding in terms of patterns of equality. We have proven a partition theorem for such invariants.

In the concrete case of NA constructed as in Carmantini et al. (2017), which can emulate any Turing Machine, we have a dynamical system for a neural automaton. This system completely depends on the choice of the Gödel numbering for the symbols on the alphabet of the NA. Based on the invariant partition mentioned before, we were able to define a macroscopic observable that is invariant under any Gödel recoding. In fact, by the way we define this observable, the definition is based on an invariant partition according to the length of the strings on the left and right hand side of a dotted sequence compising the machine tape of the NA. This means that each choice of the length of those strings provides a sharper invariant, making strings with different patterns of equality completely distinguishable. It is also important to mention that macroscopic observables in general are not invariant under Gödel recoding. As a particular example, we computed the mean neural network activation originally suggested by Amari (1974) and later employed by Carmantini et al. (2017) as a modeled “synthetic ERP” (Barrès et al. 2013) in neurocomputing.

In fact, any observable that is invariant under Gödel recoding must be equally defined for points on the phase space corresponding to Gödelizations of strings sharing the same patterns of equality. This could probably provide an important constraint in the finding of other invariant macroscopic observables.

Theoretically, one could run neural automaton under all (or many) possible Gödel encodings and check which observables are preserved by the dynamics and which are not. This could provide important information about the performance of the neural network architecture that is intrinsic of the dynamical system, and not dependant on the choice of the numbering for the codification of the symbols. In practice, the computation of all the permutations of the alphabet grows with the factorial of the alphabet’s cardinality, and the computation of invariant partitions even with powers of that number for longer strings. This, of course, would present some practical constraints for large alphabets and sharp invariant observables.

Our results could be of substantial importance for any kind of related approaches in the field of computational cognitive neurodynamics. All models that rely upon the representation of symbolic mental content by means of high-dimensional activation vectors as training patterns for (deep) neural networks (Arbib 1995; LeCun et al. 2015; Hertz et al. 1991; Schmidhuber 2015), such as vector symbolic architectures (Gayler 2006; Schlegel et al. 2021; Smolensky 1990, 2006; Mizraji 1989, 2020) in particular, are facing the problems of arbitrary symbolic encodings. As long as one is only interested in building inference machines for artificial intelligence, this does not really matter. However, when activation states of neural network simulations have to be correlated with real-word data from experiments in the domains of human or animal cognitive neuroscience and psychology, the given encoding may play a role. Thus, the investigation of invariant observables in regression analyses and statistical modeling becomes mandatory for avoiding possible confounds that could result from a particularly chosen encoding.

These results also have implications in Mathematical and Computational Neuroscience, where the aim is to explain by means of mathematical theories and computational modelling neurophysiological processes as observed in in-vitro and in-vivo experiments via instrumentation devices. Our results forces us to consider the possibility as to what extent (if any) that observations, which motivate the development of models in the literature (e.g Spiking models), are epiphenomenon? To conclude, we express the hope that our study paves the way towards a more a comprehensive research in computational cognitive neurodynamics, mathematical and computational neuroscience where the study of macroscopic observations and its invariant formulation can lead to interesting new insights.

Reproducibility

All numerical simulations that have been presented in "Neurolinguistic application" section may be reproduced using the code available at the GitHub repository https://github.com/TuringMachinegun/Turing_Neural_Networks. The repository contains the code to build the architecture of a neural automaton as introduced in (Carmantini et al. (2017)) together with particular examples. The code that computes the invariant partitions given by equality patterns can also be found in the repository. The code allows the user to implement various observables (e.g step function, Amari’s observable) in order to test further cases, exploit and further develop our framework.

Acknowledgements

SR acknowledges support from Ikerbasque (The Basque Foundation for Science), the Basque Government through the BERC 2022-2025 program and by the Ministry of Science and Innovation: BCAM Severo Ochoa accreditation CEX2021-001142-S / MICIN / AEI / 10.13039/501100011033 and through project RTI2018-093860-B-C21 funded by (AEI/FEDER, UE) and acronym MathNEURO. JUA aknowledges support from the Spanish Government, grants PID2020-117281GB-I00 and PID2019-107444GA-I00, partly with European Regional Development Fund (ERDF), and the Basque Government, grant IT1483-22.

Appendix: Proofs of lemmata and theorems

Proof of Lemma 1

The ordering $γ$ itself induces the isomorphism between both graphs. Namely let $g_{γ} : T_{γ} \to T$ be such that if $s = a_{i_{1}} a_{i_{2}} a_{i_{3}} \dots \in T_{γ}$ then

\begin{matrix} g_{γ} (s) = γ (a_{i_{1}}) γ (a_{i_{2}}) γ (a_{i_{3}}) \dots \end{matrix}

which clearly belongs to T.

We must show that it defines a bijection between vertices and that preserves incidence.

It is easy to prove that it is a bijection. Namely if $w = a_{i_{1}} a_{i_{2}} \dots a_{i_{n}}$ and $u = b_{i_{1}} \dots b_{i_{k}}$ are any two vertices of the tree $T_{γ}$ then $g_{γ} (w) = g_{γ} (u)$ implies that both strings must have the same length, hence $n = k$ . And since $γ (a_{i_{j}}) = γ (b_{i_{j}})$ and $γ$ is a bijection, we must have $a_{i_{j}} = b_{i_{j}}$ so that $w = u$ . Moreover, for any $v = l_{1} \dots l_{n} \in T$ there is $z = a_{γ^{- 1} (l_{1})} \dots a_{γ^{- 1} (l_{n})}$ which is clearly mapped to v through $g_{γ}$ .

The only thing that is left to show is that $g_{γ}$ preserves incidence. That is, that given $w \in A^{*}$ and $a \in A$ , then $g_{γ} (w a) = g_{γ} (w) l$ for some $l \in {0, \dots, m - 1}$ . But this is also clear from the definition of $g_{γ}$ .

Proof of Lemma 2

Let us suppose that $d (p, q) \leq \frac{1}{m^{n}}$ . This means that at least the first n symbols in both strings are equal. Then, if $ψ$ is a Gödel encoding defined by the asignment $γ : A \to M$ we have that

\begin{matrix} ψ (p) = \sum_{i = 1}^{\infty} γ (a_{i}) \frac{1}{m^{i}} = \sum_{i = 1}^{n} γ (a_{i}) \frac{1}{m^{i}} + \sum_{i = n + 1}^{\infty} γ (a_{i}) \frac{1}{m^{i}} . \end{matrix}

Let us put $r = \sum_{i = 1}^{n} γ (a_{i}) \frac{1}{m^{i}}$ . Now, since $γ (a_{i}) \leq m - 1$

\begin{matrix} r & = \frac{a_{1} m^{n - 1} + a_{2} d^{n - 2} + \dots a_{n - 1} m + a_{n}}{m^{n}} \leq \\ \leq \frac{m^{n} - m^{n - 1} + m^{n - 1} - \dots - d + d - 1}{m^{n}} = \frac{m^{n - 1}}{m^{n}} . \end{matrix}

So $r = \frac{k}{m^{n}}$ for some $k = 0, \dots, m^{n} - 1$ . Since $\sum_{i = n + 1}^{\infty} γ (a_{i}) \frac{1}{m^{i}} < \frac{1}{m^{n + 1}}$ , we get that $ψ (p) \in [\frac{k}{m^{n}}, \frac{k + 1}{m^{n}})$ . Since q is equal to p on at least the first n strings we will also have $ψ (q) = r + \sum_{i = n + 1}^{\infty} γ (b_{i}) \frac{1}{m^{i}}$ and by the same reason it will be on the same interval.

For the other implication, if we have two real numbers $ψ (p)$ and $ψ (q)$ after encoding some infinite strings p and q, we want to show that if they are on some interval of the type $[\frac{k}{m^{n}}, \frac{k + 1}{m^{n}})$ , then they have the same prefix of at least length n. We can always write those numbers as $ψ (p) = \frac{k}{m^{n}} + r_{1}$ and $ψ (q) = \frac{k}{m^{n}} + r_{2}$ with $r_{1}, r_{2} < \frac{1}{m^{n + 1}}$ . If we write the number k in its m-adic expansion, it will be uniquely determined by $l_{1}, \dots, l_{n} \in {0, \dots, m - 1}$ , and each number $r_{i}$ can be writen as a series by $r_{i} = \sum_{j = n + 1} l_{j_{i}} \frac{1}{m^{j}}$ , for $i = 1, 2$ . Taking the inverse images of each $l_{i}$ , that is $γ^{- 1} (l_{i}) = a_{i}$ we will obtain that $p = a_{1} \dots a_{n} a_{n + 1} \dots$ and $q = a_{1} \dots a_{n} b_{n + 1} \dots$ . That is, they are at least at distance $\frac{1}{m^{n}} .$

Proof of Theorem 1

Let us first assume that given u there exists some $π$ such that its induced automorphisms maps w to u. The condition Eq. (13) is clear, by Lemma 2. Then, we can write $u = c_{1} \dots c_{n}$ . Now, we know that $g_{π} (w) = a_{π (1)} a_{π (2)} \dots a_{π (n)} = c_{1} \dots c_{n} = u$ . Then

\begin{matrix} {j_{1}, \dots, j_{k}} \in P_{w} & ⟺ a_{i_{j_{1}}} = \dots = a_{i_{j_{k}}} \\ ⟺ a_{π (i_{j_{1}})} = \dots = a_{π (i_{j_{k}})} \\ ⟺ c_{i_{j_{1}}} = \dots = c_{i_{j_{k}}} \\ ⟺ {j_{1}, \dots, j_{k}} \in P_{u} . \end{matrix}

To show the other direction, it suffices to define $π$ so that $g_{π} (w) = u$ . That is, if $u = c_{1} \dots c_{n}$ , let us define $π (a_{i}) = c_{i}$ and let us send the $a_{j} \in M$ not appearing in w to the $c_{j}$ -s not appearing in u in a bijective way. This can be done, it is well defined by condition Eq. (14), and it defines a bijection on $M$ by construction.

Proof of Theorem 2

We have to show that equation (3) is satisfied. Namely, we have to show that $f (α_{π}^{*} (x)) = f (x)$ for each $π \in S_{m - 1} \times S_{m - 1}$ and $x \in X$ . Note that by definition, $α_{π}^{*} = ρ_{π}$ in our case. Therefore, we must show that $f (ρ_{π} (x)) = f (x)$ Let $π = (π_{1}, π_{2}) \in S_{m - 1} \times S_{m - 1}$ and $(y_{1}, y_{2}) \in Y$ . Then, there is some $E^{(i, j)}$ for which $(x, y) \in E^{(i, j)}$ . Let us denote by $π^{'} = (π_{1}^{'}, π_{2}^{'}) \in S_{m} \times S_{m}$ the permutation fixing 1 and sending $π^{'} (i) = π (i) + 1$ for $i = 2, \dots, m - 1$ (namely, the permutation fixing the first letter and permuting the rest as $π_{i}$ permutes the $m - 1$ letters for $i = 1, 2$ respectively).

Note that since we have enlarged our alphabet with the $⊔$ symbol, and since both our original encoding and the one permuted by $π$ send this symbol to 0, if $(y_{1}, y_{2})$ is decoded as $(c_{1}, \dots, c_{l})$ and $(d_{1}, \dots, d_{r})$ , the possible 0s appearing at the end of each encoding (indicating that the string has smaller length than l and/or r) will remain being 0-s, and therefore the point will not be mapped to a point encoding longer strings.

After applying $ρ_{π^{'}}$ , we will obtain $ρ_{π} ((y_{1}, y_{2})) \in E^{(i^{'}, j^{'})}$ for some $i^{'}, j^{'}$ . However, since $ρ_{π^{'}}$ is defined through $g_{π_{1}^{'}}$ and $g_{π_{2}^{'}}$ both $E^{(i, j)}$ and $E^{(i^{'}, j^{'})}$ belong to the same $P_{ij}$ . That is, they have the same pattern of equality. Hence, by the definition of f we obtain that $f (x) = f (ρ_{π} (x))$ , and we are done. $□$

Funding

Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

Footnotes

Note that neural automata are parsimonious implementations of universal computers, especially of Turing machines. These are not to be confused with neural Turing machines appearing in the framework of deep learning approaches (Graves et al. 2014).

In fact one needs the algebra of essentially bounded functions with respect to a given probability measure here. For a proper treatment of these concepts, algebraic quantum theory is required (Sewell 2002).

Cf. the related concept of epistemic equivalence used by beim Graben and Atmanspacher (2006, 2009).

⁴

In the same way, the shift map’s inverse, $σ^{- 1}$ , can be defined as the operation that shifts every character in the string s one position to the right (or, equivalently, moves the dot one position to the left).

⁵

Here we are assuming that both the alphabet $A$ and the Gödel Encoding is the same in both sides of the dot, otherwise we would have more freedom and obtain more intervals, but the procedure works anyway.

⁶

In order to make things simpler we will assume that we have the same alphabet on the stack and the input symbols. This can always be assumed considering the union of both alphabets if needed.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Jone Uria-Albizuri, Email: jone.uria@ehu.eus.

Serafim Rodrigues, Email: srodrigues@bcamath.org.

References

Amari SI (1974) A method of statistical neurodynamics. Kybernetik 14:201–215 [DOI] [PubMed] [Google Scholar]
Anderson JR, Bothell D, Byrne MD, Douglass S, Lebiere C, Qin Y (2004) An integrated theory of the mind. Psychol Rev 111(4):1036–1060 [DOI] [PubMed] [Google Scholar]
Arbib MA (ed) (1995) The handbook of brain theory and neural networks. MIT Press, Cambridge [Google Scholar]
Arbib MA, Caplan D (1979) Neurolinguistics must be computational. Behav Brain Sci 2(03):449–460 [Google Scholar]
Atmanspacher H, Beim Graben P (2007) Contextual emergence of mental states from neurodynamics. Chaos Complexity Lett 2(2/3):151–168 [Google Scholar]
Barrès V, Arbib ASM (2013) Synthetic event-related potentials: A computational bridge between neurolinguistic models and experiments. Neural Netw 37:66–92 [DOI] [PubMed] [Google Scholar]
Beim Graben P (2008) Foundations of neurophysics. In: Zhou C, Thiel M, Kurths J, Graben PB (eds) Lectures in supercomputational neuroscience dynamics in complex brain networks springer complexity series. Springer, Berlin [Google Scholar]
Beim Graben P, Atmanspacher H (2006) Complementarity in classical dynamical systems. Found Phys 36(2):291–306 [Google Scholar]
Beim Graben P, Atmanspacher H (2009) Extending the philosophical significance of the idea of complementarity. In: Atmanspacher H, Primas H (eds) Recasting reality Wolfgang Pauli’s philosophical ideas and contemporay science. Springer, Berlin [Google Scholar]
Beim Graben P, Drenhaus H (2012) Computationelle neurolinguistik. Z. Germanistische Linguistik 40(1):97–125 [Google Scholar]
Beim Graben P, Potthast R (2009) Inverse problems in dynamic cognitive modeling. Chaos 19(1):015103 [DOI] [PubMed] [Google Scholar]
Beim Graben P, Potthast R (2014) Universal neural field computation. In: Potthast R, Wright JJ, Coombes S, Beim Graben P (eds) Neural fields theory and applications. Springer, Berlin [Google Scholar]
Beim Graben P, Rodrigues S (2013) A biophysical observation model for field potentials of networks of leaky integrate-and-fire neurons. Front Comput Neurosci 6(100):10042 [DOI] [PMC free article] [PubMed] [Google Scholar]
Beim Graben P, Rodrigues S (2014) On the electrodynamics of neural networks. In: Potthast R, Wright JJ, Coombes S, Beim Graben P (eds) Neural fields theory and applications. Springer, Berlin [Google Scholar]
Beim Graben P, Liebscher T, Saddy JD (2000) Parsing ambiguous context-free languages by dynamical systems: disambiguation and phase transitions in neural networks with evidence from event-related brain potentials (ERP). In: Jokinen K, Heylen D, Njiholt A (eds) Learning to Behave Universiteit Twente Enschede, TWLT 18. Internalising Knowledge, Enschede [Google Scholar]
Beim Graben P, Jurish B, Saddy D, Frisch S (2004) Language processing by dynamical systems. Int J Bifurcat Chaos 14(2):599–621 [Google Scholar]
Beim Graben P, Gerth S, Vasishth S (2008) Towards dynamical system models of language-related brain potentials. Cogn Neurodyn 2(3):229–255 [DOI] [PMC free article] [PubMed] [Google Scholar]
Beim Graben P, Barrett A, Atmanspacher H (2009) Stability criteria for the contextual emergence of macrostates in neural networks. Netw Comput Neural Syst 20(3):178–196 [DOI] [PubMed] [Google Scholar]
Boston MF, Hale JT, Patil U, Kliegl R, Vasishth S (2008) Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam sentence corpus. J Eye Mov Res 2(1):1–12 [Google Scholar]
Brouwer H, Crocker MW (2017) On the proper treatment of the P400 and P600 in language comprehension. Front Psychol 8:1327 [DOI] [PMC free article] [PubMed] [Google Scholar]
Brouwer H, Hoeks JCJ (2013) A time and place for language comprehension: mapping the N400 and the P600 to a minimal cortical network. Front Human Neurosci 7(758):4572 [DOI] [PMC free article] [PubMed] [Google Scholar]
Brouwer H, Fitz H, Hoeks J (2012) Getting real about Semantic Illusions: rethinking the functional role of the P600 in language comprehension. Brain Res 1446:127–143 [DOI] [PubMed] [Google Scholar]
Brouwer H, Crocker MW, Venhuizen NJ, Hoeks JCJ (2017) A neurocomputational model of the N400 and the P600 in language processing. Cogn Sci 41(S6):1318–1352 [DOI] [PMC free article] [PubMed] [Google Scholar]
Brouwer H, Delogu F, Venhuizen NJ, Crocker MW (2021) Neurobehavioral correlates of surprisal in language comprehension: a neurocomputational model. Front Psychol 12:110 [DOI] [PMC free article] [PubMed] [Google Scholar]
Carmantini GS, Beim Graben P, Desroches M, Rodrigues S (2017) A modular architecture for transparent computation in recurrent neural networks. Neural Netw 85:85–105 [DOI] [PubMed] [Google Scholar]
Cleeremans A, Servan-Schreiber D, McClelland JL (1989) Finite state automata and simple recurrent networks. Neural Comput 1(3):372–381 [Google Scholar]
Crocker MW (1996) Computational psycholinguistics studies in computational psycholinguistics. Kluwer, Dordrecht [Google Scholar]
Davidson DJ, Martin AE (2013) Modeling accuracy as a function of response time with the generalized linear mixed effects model. Acta Physiol (Oxf) 144(1):83–96 [DOI] [PubMed] [Google Scholar]
Elman JL (1990) Finding structure in time. Cogn Sci 14:179–211 [Google Scholar]
Elman JL (1991) Distributed representations, simple recurrent networks, and grammatical structure. Mach Learn 7:195–225 [Google Scholar]
Emch G (1964) Coarse-graining in Liouville space and master equation. Helv Phys Acta 37:532–544 [Google Scholar]
Frank SL, Otten LJ, Galli G, Vigliocco G (2015) The ERP response to the amount of information conveyed by words in sentences. Brain Lang 140:1–11 [DOI] [PubMed] [Google Scholar]
Frisch S, Beim Graben P, Schlesewsky M (2004) Parallelizing grammatical functions: P600 and P345 reflect different cost of reanalysis. Int J Bifurcat Chaos 14(2):531–549 [Google Scholar]
Friston KJ, Mechelli A, Turner R, Price CJ (2000) Nonlinear responses in fMRI: the balloon model, Volterra kernels, and other hemodynamics. Neuroimage 12(4):466–477 [DOI] [PubMed] [Google Scholar]
Gayler RW (2006) Vector symbolic architectures are a viable alternative for Jackendoff’s challenges. Beha Brain Sci 29:78–79. 10.1017/S0140525X06309028 [Google Scholar]
Gazzaniga MS, Ivry RB, Mangun GR (eds) (2002) Cognitive neuroscience. The Biology of the Mind, New York [Google Scholar]
Graves A, Wayne G, Danihelka I (2014) Neural Turing machines. arxiv:1410.5401 [cs.ne], Google DeepMind
Hale JT, Lutz DE, Luh WM, Brennan JR (2015) Modeling fMRI time courses with linguistic structure at various grain sizes. In: Proceedings of the 2015 workshop on cognitive modeling and computational linguistics, North American Association for Computational Lingustics, Denver
Hale JT, Campanelli L, Li J, Bhattasali S, Pallier C, Brennan JR (2022) Neurocomputational models of language processing. Ann Rev Linguist 8(1):427–446 [Google Scholar]
Hepp K (1972) Quantum theory of measurement and macroscopic observables. Helv Phys Acta 45(2):237–248 [Google Scholar]
Hertz J, Krogh A, Palmer RG (1991) Introduction to the theory of neural computation. In: Lecture notes of the Santa FE institute studies in the science of complexity. Perseus Books, Cambridge
Hopcroft JE, Ullman JD (1979) Introduction to automata theory languages, and computation. Addison-Wesley, Menlo Park [Google Scholar]
Huyck CR (2009) A psycholinguistic model of natural language parsing implemented in simulated neurons. Cogn Neurodyn 3(4):317–330 [DOI] [PMC free article] [PubMed] [Google Scholar]
Jauch JM (1964) The problem of measurement in quantum mechanics. Helv Phys Acta 37:293–316 [Google Scholar]
Jirsa VK, Jantzen KJ, Fuchs A, Kelso JAS (2002) Spatiotemporal forward solution of the EEG and MEG using network modeling. IEEE Trans Med Imag 21(5):493–504 [DOI] [PubMed] [Google Scholar]
Kelso JAS (1995) Dynamic patterns. MIT Press, Cambrigde [Google Scholar]
Krifka M (1995) The semantics and pragmatics of polarity items. Linguist Anal 25:209–257 [Google Scholar]
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444 [DOI] [PubMed] [Google Scholar]
Lewis RL (2003) Computational psycholinguistics. In: Encyclopedia of cognitive science. Macmillan Reference Ltd., London
Lewis RL (1998) Reanalysis and limited repair parsing: leaping off the garden path. In: Ferreira F, Fodor JD (eds) Reanalysis in sentence processing. Kluwer, Amsterdam [Google Scholar]
Lewis RL, Vasishth S (2006) An activation-based model of sentence processing as skilled memory retrieval. Cogn Sci 29:375–419 [DOI] [PubMed] [Google Scholar]
Lind D, Marcus B (1995) An introduction to symbolic dynamics and coding. Cambridge University Press, Cambridge [Google Scholar]
Martínez-Cañada P, Ness TV, Einevoll G, Fellin T, Panzeri S (2021) Computation of the electroencephalogram (EEG) from network models of point neurons. PLoS Comput Biol 17(4):1–41 [DOI] [PMC free article] [PubMed] [Google Scholar]
Mazzoni A, Panzeri S, Logothetis NK, Brunel N (2008) Encoding of naturalistic stimuli by local field potential spectra in networks of excitatory and inhibitory neurons. PLoS Comput Biol 4(12):e1000239 [DOI] [PMC free article] [PubMed] [Google Scholar]
McCulloch WS, Pitts W (1943) A logical calculus of ideas immanent in nervous activity. Bull Math Biophys 5:115–133 [PubMed] [Google Scholar]
McMillan B (1953) The basic theorems of information theory. Ann Math Stat 24:196–219 [Google Scholar]
Mizraji E (1989) Context-dependent associations in linear distributed memories. Bull Math Biol 51(2):195–205 [DOI] [PubMed] [Google Scholar]
Mizraji E (2020) Vector logic allows counterfactual virtualization by the square root of NOT. Logic J IGPL 25:463 [Google Scholar]
Moore C (1990) Unpredictability and undecidability in dynamical systems. Phys Rev Lett 64:2354 [DOI] [PubMed] [Google Scholar]
Moore C (1991) Generalized shifts: unpredictability and undecidability in dynamical systems. Nonlinearity 4:199 [Google Scholar]
Nivre J (2008) Algorithms for deterministic incremental dependency parsing. Comput Linguist 34(4):513–553 [Google Scholar]
Rabinovich MI, Varona P (2018) Discrete sequential information coding: heteroclinic cognitive dynamics. Front Comput Neurosci 12:73 [DOI] [PMC free article] [PubMed] [Google Scholar]
Rabinovich M, Friston K, Varona P (eds) (2012) Principles of brain dynamics: global state interactions. MIT Press, Cambridge [Google Scholar]
Rabovsky M, McRae K (2014) Simulating the n400 ERP component as semantic network error: insights from a feature-based connectionist attractor model of word meaning. Cognition 132(1):68–89 [DOI] [PubMed] [Google Scholar]
Rabovsky M, Hansen SS, McClelland JL (2018) Modelling the N400 brain potential as change in a probabilistic representation of meaning. Nat Human Behav 2:693 [DOI] [PubMed] [Google Scholar]
Ratcliff R (1978) A theory of memory retrieval. Psychol Rev 85(2):59–108 [Google Scholar]
Ratcliff R, McKoon G (2007) The diffusion decision model: theory and data for two-choice decision tasks. Neural Comput 20(4):873–922 [DOI] [PMC free article] [PubMed] [Google Scholar]
Schlegel K, Neubert P, Protzel P (2021) A comparison of vector symbolic architectures. Artif Intell Rev 55:4523 [Google Scholar]
Schmidhuber J (2015) Deep learning in neural networks: An overview. Neural Netw 61:85–117 [DOI] [PubMed] [Google Scholar]
Sewell GL (2002) Quantum mechanics and its emergent macrophysics. Princeton University Press, Princeton [Google Scholar]
Siegelmann HT, Sontag ED (1995) On the computational power of neural nets. J Comput Syst Sci 50(1):132–150 [Google Scholar]
Smolensky P (1986) Information processing in dynamical systems: foundations of harmony theory. In: Rumelhart DE, McClelland JL (eds) The PDP Research Group parallel distributed processing: explorations in the microstructure of cognition. MIT Press, Cambridge [Google Scholar]
Smolensky P (1990) Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artif Intell 46(1–2):159–216 [Google Scholar]
Smolensky P (2006) Harmony in linguistic cognition. Cogn Sci 30:779–801 [DOI] [PubMed] [Google Scholar]
Stabler EP (2011) Top-down recognizers for MCFGs and MGs. In: Proceedings of the 2nd workshop on cognitive modeling and computational linguistics. Association for Computational Linguistics, Portland, pp 39 – 48
Stephan KE, Harrison LM, Penny WD, Friston KJ (2004) Biophysical models of fMRI responses. Curr Opin Neurobiol 14:629–635 [DOI] [PubMed] [Google Scholar]
van Gelder T (1998) The dynamical hypothesis in cognitive science. Behav Brain Sci 21(05):615–628 [DOI] [PubMed] [Google Scholar]
Wilson HR, Cowan JD (1972) Excitatory and inhibitory interactions in localized populations of model neurons. Biophys J 12(1):1–24 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR1] Amari SI (1974) A method of statistical neurodynamics. Kybernetik 14:201–215 [DOI] [PubMed] [Google Scholar]

[CR2] Anderson JR, Bothell D, Byrne MD, Douglass S, Lebiere C, Qin Y (2004) An integrated theory of the mind. Psychol Rev 111(4):1036–1060 [DOI] [PubMed] [Google Scholar]

[CR3] Arbib MA (ed) (1995) The handbook of brain theory and neural networks. MIT Press, Cambridge [Google Scholar]

[CR4] Arbib MA, Caplan D (1979) Neurolinguistics must be computational. Behav Brain Sci 2(03):449–460 [Google Scholar]

[CR5] Atmanspacher H, Beim Graben P (2007) Contextual emergence of mental states from neurodynamics. Chaos Complexity Lett 2(2/3):151–168 [Google Scholar]

[CR6] Barrès V, Arbib ASM (2013) Synthetic event-related potentials: A computational bridge between neurolinguistic models and experiments. Neural Netw 37:66–92 [DOI] [PubMed] [Google Scholar]

[CR7] Beim Graben P (2008) Foundations of neurophysics. In: Zhou C, Thiel M, Kurths J, Graben PB (eds) Lectures in supercomputational neuroscience dynamics in complex brain networks springer complexity series. Springer, Berlin [Google Scholar]

[CR8] Beim Graben P, Atmanspacher H (2006) Complementarity in classical dynamical systems. Found Phys 36(2):291–306 [Google Scholar]

[CR9] Beim Graben P, Atmanspacher H (2009) Extending the philosophical significance of the idea of complementarity. In: Atmanspacher H, Primas H (eds) Recasting reality Wolfgang Pauli’s philosophical ideas and contemporay science. Springer, Berlin [Google Scholar]

[CR10] Beim Graben P, Drenhaus H (2012) Computationelle neurolinguistik. Z. Germanistische Linguistik 40(1):97–125 [Google Scholar]

[CR11] Beim Graben P, Potthast R (2009) Inverse problems in dynamic cognitive modeling. Chaos 19(1):015103 [DOI] [PubMed] [Google Scholar]

[CR12] Beim Graben P, Potthast R (2014) Universal neural field computation. In: Potthast R, Wright JJ, Coombes S, Beim Graben P (eds) Neural fields theory and applications. Springer, Berlin [Google Scholar]

[CR13] Beim Graben P, Rodrigues S (2013) A biophysical observation model for field potentials of networks of leaky integrate-and-fire neurons. Front Comput Neurosci 6(100):10042 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] Beim Graben P, Rodrigues S (2014) On the electrodynamics of neural networks. In: Potthast R, Wright JJ, Coombes S, Beim Graben P (eds) Neural fields theory and applications. Springer, Berlin [Google Scholar]

[CR15] Beim Graben P, Liebscher T, Saddy JD (2000) Parsing ambiguous context-free languages by dynamical systems: disambiguation and phase transitions in neural networks with evidence from event-related brain potentials (ERP). In: Jokinen K, Heylen D, Njiholt A (eds) Learning to Behave Universiteit Twente Enschede, TWLT 18. Internalising Knowledge, Enschede [Google Scholar]

[CR16] Beim Graben P, Jurish B, Saddy D, Frisch S (2004) Language processing by dynamical systems. Int J Bifurcat Chaos 14(2):599–621 [Google Scholar]

[CR17] Beim Graben P, Gerth S, Vasishth S (2008) Towards dynamical system models of language-related brain potentials. Cogn Neurodyn 2(3):229–255 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] Beim Graben P, Barrett A, Atmanspacher H (2009) Stability criteria for the contextual emergence of macrostates in neural networks. Netw Comput Neural Syst 20(3):178–196 [DOI] [PubMed] [Google Scholar]

[CR19] Boston MF, Hale JT, Patil U, Kliegl R, Vasishth S (2008) Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam sentence corpus. J Eye Mov Res 2(1):1–12 [Google Scholar]

[CR20] Brouwer H, Crocker MW (2017) On the proper treatment of the P400 and P600 in language comprehension. Front Psychol 8:1327 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] Brouwer H, Hoeks JCJ (2013) A time and place for language comprehension: mapping the N400 and the P600 to a minimal cortical network. Front Human Neurosci 7(758):4572 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] Brouwer H, Fitz H, Hoeks J (2012) Getting real about Semantic Illusions: rethinking the functional role of the P600 in language comprehension. Brain Res 1446:127–143 [DOI] [PubMed] [Google Scholar]

[CR23] Brouwer H, Crocker MW, Venhuizen NJ, Hoeks JCJ (2017) A neurocomputational model of the N400 and the P600 in language processing. Cogn Sci 41(S6):1318–1352 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] Brouwer H, Delogu F, Venhuizen NJ, Crocker MW (2021) Neurobehavioral correlates of surprisal in language comprehension: a neurocomputational model. Front Psychol 12:110 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR25] Carmantini GS, Beim Graben P, Desroches M, Rodrigues S (2017) A modular architecture for transparent computation in recurrent neural networks. Neural Netw 85:85–105 [DOI] [PubMed] [Google Scholar]

[CR26] Cleeremans A, Servan-Schreiber D, McClelland JL (1989) Finite state automata and simple recurrent networks. Neural Comput 1(3):372–381 [Google Scholar]

[CR27] Crocker MW (1996) Computational psycholinguistics studies in computational psycholinguistics. Kluwer, Dordrecht [Google Scholar]

[CR28] Davidson DJ, Martin AE (2013) Modeling accuracy as a function of response time with the generalized linear mixed effects model. Acta Physiol (Oxf) 144(1):83–96 [DOI] [PubMed] [Google Scholar]

[CR29] Elman JL (1990) Finding structure in time. Cogn Sci 14:179–211 [Google Scholar]

[CR30] Elman JL (1991) Distributed representations, simple recurrent networks, and grammatical structure. Mach Learn 7:195–225 [Google Scholar]

[CR31] Emch G (1964) Coarse-graining in Liouville space and master equation. Helv Phys Acta 37:532–544 [Google Scholar]

[CR32] Frank SL, Otten LJ, Galli G, Vigliocco G (2015) The ERP response to the amount of information conveyed by words in sentences. Brain Lang 140:1–11 [DOI] [PubMed] [Google Scholar]

[CR33] Frisch S, Beim Graben P, Schlesewsky M (2004) Parallelizing grammatical functions: P600 and P345 reflect different cost of reanalysis. Int J Bifurcat Chaos 14(2):531–549 [Google Scholar]

[CR34] Friston KJ, Mechelli A, Turner R, Price CJ (2000) Nonlinear responses in fMRI: the balloon model, Volterra kernels, and other hemodynamics. Neuroimage 12(4):466–477 [DOI] [PubMed] [Google Scholar]

[CR35] Gayler RW (2006) Vector symbolic architectures are a viable alternative for Jackendoff’s challenges. Beha Brain Sci 29:78–79. 10.1017/S0140525X06309028 [Google Scholar]

[CR36] Gazzaniga MS, Ivry RB, Mangun GR (eds) (2002) Cognitive neuroscience. The Biology of the Mind, New York [Google Scholar]

[CR37] Graves A, Wayne G, Danihelka I (2014) Neural Turing machines. arxiv:1410.5401 [cs.ne], Google DeepMind

[CR38] Hale JT, Lutz DE, Luh WM, Brennan JR (2015) Modeling fMRI time courses with linguistic structure at various grain sizes. In: Proceedings of the 2015 workshop on cognitive modeling and computational linguistics, North American Association for Computational Lingustics, Denver

[CR39] Hale JT, Campanelli L, Li J, Bhattasali S, Pallier C, Brennan JR (2022) Neurocomputational models of language processing. Ann Rev Linguist 8(1):427–446 [Google Scholar]

[CR40] Hepp K (1972) Quantum theory of measurement and macroscopic observables. Helv Phys Acta 45(2):237–248 [Google Scholar]

[CR41] Hertz J, Krogh A, Palmer RG (1991) Introduction to the theory of neural computation. In: Lecture notes of the Santa FE institute studies in the science of complexity. Perseus Books, Cambridge

[CR42] Hopcroft JE, Ullman JD (1979) Introduction to automata theory languages, and computation. Addison-Wesley, Menlo Park [Google Scholar]

[CR43] Huyck CR (2009) A psycholinguistic model of natural language parsing implemented in simulated neurons. Cogn Neurodyn 3(4):317–330 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR44] Jauch JM (1964) The problem of measurement in quantum mechanics. Helv Phys Acta 37:293–316 [Google Scholar]

[CR45] Jirsa VK, Jantzen KJ, Fuchs A, Kelso JAS (2002) Spatiotemporal forward solution of the EEG and MEG using network modeling. IEEE Trans Med Imag 21(5):493–504 [DOI] [PubMed] [Google Scholar]

[CR46] Kelso JAS (1995) Dynamic patterns. MIT Press, Cambrigde [Google Scholar]

[CR47] Krifka M (1995) The semantics and pragmatics of polarity items. Linguist Anal 25:209–257 [Google Scholar]

[CR48] LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444 [DOI] [PubMed] [Google Scholar]

[CR49] Lewis RL (2003) Computational psycholinguistics. In: Encyclopedia of cognitive science. Macmillan Reference Ltd., London

[CR50] Lewis RL (1998) Reanalysis and limited repair parsing: leaping off the garden path. In: Ferreira F, Fodor JD (eds) Reanalysis in sentence processing. Kluwer, Amsterdam [Google Scholar]

[CR51] Lewis RL, Vasishth S (2006) An activation-based model of sentence processing as skilled memory retrieval. Cogn Sci 29:375–419 [DOI] [PubMed] [Google Scholar]

[CR52] Lind D, Marcus B (1995) An introduction to symbolic dynamics and coding. Cambridge University Press, Cambridge [Google Scholar]

[CR53] Martínez-Cañada P, Ness TV, Einevoll G, Fellin T, Panzeri S (2021) Computation of the electroencephalogram (EEG) from network models of point neurons. PLoS Comput Biol 17(4):1–41 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR54] Mazzoni A, Panzeri S, Logothetis NK, Brunel N (2008) Encoding of naturalistic stimuli by local field potential spectra in networks of excitatory and inhibitory neurons. PLoS Comput Biol 4(12):e1000239 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR55] McCulloch WS, Pitts W (1943) A logical calculus of ideas immanent in nervous activity. Bull Math Biophys 5:115–133 [PubMed] [Google Scholar]

[CR56] McMillan B (1953) The basic theorems of information theory. Ann Math Stat 24:196–219 [Google Scholar]

[CR57] Mizraji E (1989) Context-dependent associations in linear distributed memories. Bull Math Biol 51(2):195–205 [DOI] [PubMed] [Google Scholar]

[CR58] Mizraji E (2020) Vector logic allows counterfactual virtualization by the square root of NOT. Logic J IGPL 25:463 [Google Scholar]

[CR59] Moore C (1990) Unpredictability and undecidability in dynamical systems. Phys Rev Lett 64:2354 [DOI] [PubMed] [Google Scholar]

[CR60] Moore C (1991) Generalized shifts: unpredictability and undecidability in dynamical systems. Nonlinearity 4:199 [Google Scholar]

[CR61] Nivre J (2008) Algorithms for deterministic incremental dependency parsing. Comput Linguist 34(4):513–553 [Google Scholar]

[CR62] Rabinovich MI, Varona P (2018) Discrete sequential information coding: heteroclinic cognitive dynamics. Front Comput Neurosci 12:73 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR63] Rabinovich M, Friston K, Varona P (eds) (2012) Principles of brain dynamics: global state interactions. MIT Press, Cambridge [Google Scholar]

[CR64] Rabovsky M, McRae K (2014) Simulating the n400 ERP component as semantic network error: insights from a feature-based connectionist attractor model of word meaning. Cognition 132(1):68–89 [DOI] [PubMed] [Google Scholar]

[CR65] Rabovsky M, Hansen SS, McClelland JL (2018) Modelling the N400 brain potential as change in a probabilistic representation of meaning. Nat Human Behav 2:693 [DOI] [PubMed] [Google Scholar]

[CR66] Ratcliff R (1978) A theory of memory retrieval. Psychol Rev 85(2):59–108 [Google Scholar]

[CR67] Ratcliff R, McKoon G (2007) The diffusion decision model: theory and data for two-choice decision tasks. Neural Comput 20(4):873–922 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR68] Schlegel K, Neubert P, Protzel P (2021) A comparison of vector symbolic architectures. Artif Intell Rev 55:4523 [Google Scholar]

[CR69] Schmidhuber J (2015) Deep learning in neural networks: An overview. Neural Netw 61:85–117 [DOI] [PubMed] [Google Scholar]

[CR70] Sewell GL (2002) Quantum mechanics and its emergent macrophysics. Princeton University Press, Princeton [Google Scholar]

[CR71] Siegelmann HT, Sontag ED (1995) On the computational power of neural nets. J Comput Syst Sci 50(1):132–150 [Google Scholar]

[CR72] Smolensky P (1986) Information processing in dynamical systems: foundations of harmony theory. In: Rumelhart DE, McClelland JL (eds) The PDP Research Group parallel distributed processing: explorations in the microstructure of cognition. MIT Press, Cambridge [Google Scholar]

[CR73] Smolensky P (1990) Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artif Intell 46(1–2):159–216 [Google Scholar]

[CR74] Smolensky P (2006) Harmony in linguistic cognition. Cogn Sci 30:779–801 [DOI] [PubMed] [Google Scholar]

[CR75] Stabler EP (2011) Top-down recognizers for MCFGs and MGs. In: Proceedings of the 2nd workshop on cognitive modeling and computational linguistics. Association for Computational Linguistics, Portland, pp 39 – 48

[CR76] Stephan KE, Harrison LM, Penny WD, Friston KJ (2004) Biophysical models of fMRI responses. Curr Opin Neurobiol 14:629–635 [DOI] [PubMed] [Google Scholar]

[CR77] van Gelder T (1998) The dynamical hypothesis in cognitive science. Behav Brain Sci 21(05):615–628 [DOI] [PubMed] [Google Scholar]

[CR78] Wilson HR, Cowan JD (1972) Excitatory and inhibitory interactions in localized populations of model neurons. Biophys J 12(1):1–24 [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Invariants for neural automata

Jone Uria-Albizuri

Giovanni Sirio Carmantini

Peter beim Graben

Serafim Rodrigues

Abstract

Introduction

Invariants in dynamical systems

Neurodynamics

Symbolic dynamics

Rooted trees

Fig. 1.

Fig. 2.

Lemma 1

Gödel encodings

Lemma 2

Cylinder sets

Fig. 3.

Invariants

Definition 1

Theorem 1

Example 1

Fig. 4.

Fig. 5.

Neural automata

Fig. 6.

Invariant observables

Theorem 2

Neurolinguistic application

Example 2

Table 1.

Fig. 7.

Fig. 8.

Discussion

Reproducibility

Acknowledgements

Appendix: Proofs of lemmata and theorems

Proof of Lemma 1

Proof of Lemma 2

Proof of Theorem 1

Proof of Theorem 2

Funding

Footnotes

Contributor Information

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases