Cognitive Neurodynamics. 2010 Apr 20;4(4):295–313. doi: 10.1007/s11571-010-9110-4

Functional model of biological neural networks

James Ting-Ho Lo
PMCID: PMC2974103  PMID: 22132040

Abstract

A functional model of biological neural networks, called temporal hierarchical probabilistic associative memory (THPAM), is proposed in this paper. THPAM comprises functional models of dendritic trees for encoding inputs to neurons, a first type of neuron for generating spike trains, a second type of neuron for generating graded signals to modulate neurons of the first type, supervised and unsupervised Hebbian learning mechanisms for easy learning and retrieving, an arrangement of dendritic trees for maximizing generalization, hardwiring for rotation-translation-scaling invariance, and feedback connections with different delay durations for neurons to make full use of present and past information generated by neurons in the same and higher layers. These functional models and their processing operations have many functions of biological neural networks that have not been achieved by other models in the open literature and provide logically coherent answers to many long-standing neuroscientific questions. However, biological justifications of these functional models and their processing operations are required for THPAM to qualify as a macroscopic model (or low-order approximation) of biological neural networks.

Keywords: Neuron model, Hebb learning, Spike train, Unsupervised learning, Dendritic tree model

Introduction

Biological neural networks are known to have such structures as hierarchical networks with feedbacks, neurons, dendritic trees and synapses; and perform such functions as supervised and unsupervised Hebbian learning, storing knowledge in synapses, encoding information by dendritic trees, and detecting and recognizing spatial and temporal multiple/hierarchical causes. However, descriptions of these structures and functions are mostly fragmentary and sometimes controversial in the literature on neuroscience (Arbib 2003; Dayan and Abbott 2001; Kandel et al. 2000; Koch 1999; Levitan and Kaczmarek 1993; Stuart et al. 2008) (two examples related to this paper are logic gates vs. low-order polynomials in dendritic processing (Mel 1994) and Hebbian vs. non-Hebbian learning (Mel 2002)) and on artificial neural networks (Bishop 2006; Dayan and Abbott 2001; Hawkins 2004; Hecht-Nielsen 2007; Hecht-Nielsen and McKenna 2003; Principe et al. 2000; Rieke et al. 1999; O’Reilly and Munakata 2000; Hassoun 1993; Hinton and Anderson 1989; Kohonen 1988). A single mathematical model that integrates these structures and functions and explains how the structures interact to perform the functions may shed some light on what processing operations might be required for each structure and function, suggest corresponding experiments to perform, and thereby enhance understanding of biological neural networks as whole systems. In fact, neuroscientists have long hypothesized a common cortical algorithm, and researchers on artificial neural networks have long searched for an ideal learning machine that learns and retrieves easily, detects and recognizes multiple temporal and spatial causes, and generalizes adequately on noisy, distorted, occluded, rotated, translated and scaled patterns. A common cortical algorithm and, more often than not, an ideal learning machine is intended to be a single mathematical model. To the best of this author’s knowledge, the former was first mentioned by Vernon Mountcastle (1978) and the latter was first suggested by John von Neumann (1958).

This paper is intended to provide such a mathematical model. The model, called temporal hierarchical probabilistic associative memory (THPAM), comprises novel models of dendritic trees; neurons communicating with spike trains; a mechanism for unsupervised and supervised learning; a structure for detecting and recognizing noisy, distorted and occluded patterns; hard-wiring for detecting and recognizing rotated, translated and scaled patterns; and feedback neural fibers for processing temporal data. Although biological justifications of these models have not been established, these models are logically coherent and integrate into the model, THPAM, of biological neural networks. Before the biological justifications are obtained, THPAM can only be termed a functional model rather than a macroscopic model.

Derivation of THPAM is guided by the following four neurobiological postulates:

  1. The biological neural networks are recurrent multilayer networks of neurons.

  2. Most neurons output a spike train.

  3. Knowledge is stored in the synapses between neurons.

  4. Synaptic strengths are adjusted by a version of the Hebb rule of learning. (In his 1949 book, The Organization of Behavior, Donald Hebb posited: “When one cell repeatedly assists in firing another, the axon of the first cell develops synaptic knobs (or enlarges them if they already exist) in contact with the soma of the second cell.” A natural extension of this (alluded to by Hebb as the decay of unused connections) is to decrease the synaptic strength when the source and target neurons are not active at the same time.) [http://www.en.wikipedia.org/wiki/Hebbian_theory].

As a matter of fact, in the development of artificial neural networks, Postulates 1 and 3 led to multilayer perceptrons and recurrent multilayer perceptrons (Rieke et al. 1999; O’Reilly and Munakata 2000; Dayan and Abbott 2001; Hecht-Nielsen and McKenna 2003; Hawkins 2004; Hecht-Nielsen 2007; Principe et al. 2000; Bishop 2006; Haykin 2009), and Postulates 3 and 4 led to associative memories (Kohonen 1988; Hinton and Anderson 1989; Hassoun 1993). However, multilayer perceptrons exclude Postulates 2 and 4; and associative memories exclude Postulate 1. As useful as multilayer perceptrons and associative memories are in engineering, they have limited capabilities and offer little insight into the inner workings of biological neural networks.

The construction of a functional model of biological neural networks based on all four postulates has broken the barriers confining the multilayer perceptrons and the associative memories. A first contribution of this paper lies in each of the following features of THPAM (temporal hierarchical probabilistic associative memory) that such existing models as the recurrent multilayer perceptron and associative memories do not have:

  1. a recurrent multilayer network learning by the Hebb rule;

  2. fully automated unsupervised and supervised Hebbian learning mechanisms (involving no differentiation, error backpropagation, optimization, iteration, cycling repeatedly through all learning data, or waiting for asymptotic behavior to emerge);

  3. dendritic trees encoding inputs to neurons;

  4. neurons communicating with spike trains carrying subjective probability distributions;

  5. masking matrices facilitating recognition of corrupted, distorted, and occluded patterns; and

  6. feedbacks with different delay durations for fully utilizing temporally and spatially associated information.

A second contribution of this paper lies in the integration of not only the above unique features but also the following additional features in a single model of biological neural networks:

  1. detecting and recognizing multiple/hierarchical causes; and

  2. hard-wired learning for detecting and recognizing rotated, translated and scaled patterns.

A third contribution of this paper is providing logically coherent answers jointly to the following long-standing questions by using a single functional model of biological neural networks:

  1. What is the information that neurons communicate by spike trains?

  2. How do spike trains carry this information?

  3. In what form is this information stored in the synapses? How are the synapses updated to learn this information in supervised learning and unsupervised learning in accordance with the Hebb rule of learning?

  4. How is this information stored in synapses retrieved and converted into spike trains?

  5. How does a biological neural network generalize on corrupted, distorted or occluded input?

  6. What enables a biological neural network to recognize rotated, translated or scaled patterns?

  7. How do the spike generation and travel times affect the network processing?

  8. What are the functions of dendritic nodes and trees? How are dendritic nodes connected into trees to perform their function?

However, we note that even if all its component models are biologically justified, THPAM is only a “first-order approximation” of biological neural networks, which is not intended to explain all the biological structures and phenomena observed in biological neural networks. Some biological structures or phenomena can undoubtedly be found that are seemingly or apparently missing in THPAM. Nevertheless, the 16 numbered list items above are like 16 pieces of a jigsaw puzzle. The fact that they fit together nicely into one whole for the first time indicates that THPAM is worth pursuing further as a candidate macroscopic model.

The components and processing operations in THPAM can be viewed as hypotheses about the macroscopic properties of biological neural networks. Some issues that need to be resolved to biologically justify or dismiss these hypotheses are mentioned in this paper. In recent years, we have seen rapid progress in technology for measuring dendritic, synaptic and neuronal quantities, and we expect to see more. It is hoped that those outstanding issues will soon be resolved in one way or another. (It may be appropriate to recall that when it was first published, the special theory of relativity was not more than a set of logically coherent mathematical results. The claims of bizarre space/time relativity and mass-energy conversion had not even been thought of, much less experimentally confirmed. Yet the theory led to experiments, and the bizarre space/time relativity and mass-energy conversion were eventually proven to be true.)

A current major research thrust on learning machines is the development of those with a deep architecture such as the convolutional networks (LeCun et al. 1989, 1998; Simard et al. 2003), deep belief networks (Hinton et al. 2006; Hinton and Salakhutdinov 2006) and deep Boltzmann machines (Salakhutdinov and Hinton 2009). Improved versions and a better understanding of these machines have been reported (Bengio and LeCun 2007; Ranzato et al. 2007; Bengio et al. 2007; Desjardins and Bengio 2008; Erhan et al. 2010). The deep belief networks and their improved version, the deep Boltzmann machines, can learn without a supervisor by an ingenious technique called the greedy layer-wise learning strategy. The convolutional networks capture the spatial topology of the input images and can recognize translated patterns very well. All these deep learning machines have good generalization capabilities for recognizing distorted, rotated and translated patterns. On the well-known and widely used “MNIST Database” of handwritten digits, the deep Boltzmann machine achieved an error rate of 0.95% (without using training tricks such as supplementing the data set with lightly transformed versions of the training data) (Salakhutdinov and Hinton 2009). After about 20 years of evolution, the convolutional networks’ latest version, “LeNet-6+ unsupervised training,” achieved a recognition error of 0.39% on the same “MNIST Database” (Bengio and LeCun 2007). Performances of these deep learning machines are expected to continue improving, especially when feedback structures are added to them.

To appreciate these performances, we note that the error rate of human performance in recognizing handwritten numerals is 1.56% at about 1 digit per second (Wilkinson et al. 1992) and 0.91% (0.56% rejection and 0.35% error) in two rounds with no time limit in the second round (Geist et al. 1994). Nevertheless, none of the deep learning machines existing in the open literature has any of the first seven features listed above.

Several models of cortical circuits, which attempt to integrate neurobiological findings into a model of the cortex, have been reported (Martin 2002; Granger 2006; Grossberg 2007; George and Hawkins 2009). Martin (2002) states: “It is clear that we simply do not understand much of the detailed structure of cortical microcircuits or their relation to functions.” Granger (2006) provides a computational instruction set to establish a unified formalism for describing human faculties ranging from perception and learning to reasoning and language. Grossberg (2007) explains how laminar neocortical circuits, which embody two computational paradigms (complementary computing and laminar computing), give rise to biological intelligence. George and Hawkins (2009) describe how Bayesian belief propagation in a spatio-temporal hierarchical model can lead to a mathematical model for cortical circuits. The models of cortical circuits in (Granger 2006; George and Hawkins 2009) exhibit interesting pattern recognition capabilities in certain numerical examples. However, they have not been tested or compared with learning machines on those widely used databases. Granger (2006) and Grossberg (2007) contain no numerical results. None of the models of cortical circuits has any of the first six features listed above.

A brief summary of the results on THPAM together with the organization of this paper follows: THPAM can be viewed as an organization of a biological neural network into a recurrent multilayer network of processing units (PUs). Section "A recurrent multilayer network of processing units" briefly describes this network and establishes notations of the inputs and outputs of PUs in the network.

The first two questions are answered in "Information carried by spike trains". It is argued that the ideal information for neurons to communicate is the subjective probability distributions (SPDs) of the labels of patterns appearing in the receptive domains of neurons. Therefore, to facilitate our derivation of THPAM, we hypothesize that the SPDs are this ideal information. The resulting integrity of THPAM and the impossibility of replacing SPDs suggest that the hypothesis is likely to be valid. It is further argued that under the four postulates, the SPD is represented by the average frequency of the spikes in the spike train.

A processing unit (PU) is a two-layer pattern recognizer, which learns and generates aforementioned SPDs. To achieve these, the PU uses not-exclusive-or (NXOR) logic gates to transform its input vectors into general orthogonal expansions (GOEs). GOEs are described in "Orthogonal expansion". The transformation by a large number of NXOR gates can be looked upon as a functional model of the dendritic trees of the neurons in the PU.

By a crude version of the Hebb rule, outer products of the GOEs (general orthogonal expansions) and their respective labels are accumulated to form general expansion correlation matrices (GECMs), which are the synaptic strengths stored in the PU (processing unit). GECMs are discussed in "Expansion correlation matrices". Simple multiplication of the GECMs and the GOE of the input pattern and simple manipulation of the resultant products yield the SPD of the label of the input pattern. This generation of SPDs together with an example is given in "Representations of probability distributions".

Each processing unit (PU) uses a masking matrix to automatically select a maximum number of components of its input feature subvector that matches those of a stored feature subvector and determine the SPD of the label involved. The masking matrix and how it works is described in "Masking matrices". The masking matrix can be viewed as mathematical idealization and organization of a large number of overlapped and nested dendritic trees.

As shown in Fig. 2, a processing unit (PU) comprises an Orthogonal Expander, a label SPD (subjective probability distribution) Estimator, a Spike Generator, a GECM (general expansion correlation matrix) Adjuster, and a storage of the two GECMs. A PU has essentially two functions: retrieving “point estimates” of the label of an input feature subvector from the memory (i.e., the GECMs), and learning a feature subvector and its label, which is either provided from outside the PU (in supervised learning) or generated by the PU itself (in unsupervised learning). The structural diagram of an example PU is shown in Fig. 3. Both supervised and unsupervised learning by the PU follow a crude version of the Hebb rule. Spike trains generated by each PU facilitate unsupervised learning. This simple novel modeling of the Hebbian unsupervised learning in biological neural networks is a major underpinning of THPAM as a functional model of biological neural networks. The PU and its functions of retrieving and learning are explained in "Processing units and supervised and unsupervised learning".

Fig. 2.

A processing unit, PU(n), with a feature subvector index n, comprising an Orthogonal Expander, a label SPD (subjective probability distribution) Estimator, a Spike Generator, a GECM (general expansion correlation matrix) Adjuster, and a storage of the two GECMs. PU(n) has essentially two functions: retrieving a “point estimate” or a sequence of “point estimates” (i.e., spike trains) of the label of an input feature subvector from the memory, the GECMs, and learning a feature subvector and its label, which is either provided from outside the PU (in supervised learning) or generated by the PU itself (in unsupervised learning)

Fig. 3.

The structural diagram of the PU (processing unit) in Example 3 and Example 4. The dendritic tree is the orthogonal expansion of the input feature subvector xτ. The tree nodes are NXORs. A Hebbian learning mechanism for the D-neuron can perform supervised or unsupervised learning, depending on whether the label rτ is provided from outside the PU or is the output x{yτ} of the D-neuron. A “pseudo-Hebbian” learning mechanism for the C-neuron performs only unsupervised learning and always uses 1 in so doing. While the D-neuron outputs spike trains, the C-neuron generates graded signals to modulate the D-neuron

The brain is known to be able to recognize rotated, translated and scaled patterns. To achieve this, each PU in THPAM learns a rotation, translation and scaling suite of its input feature subvector. Such suites are described in "Learning to recognize rotated, translated or scaled patterns". Some translations, rotation, compression and expansion of an example pattern are shown in Fig. 4.

Fig. 4.

A receptive domain is shown in (a), and a translation in (b). A rotation, compression and expansion of the translation are shown in (c), (d), (e). (f) shows five receptive domains that are translations of one another

Spike trains propagated among the PUs are one of the postulates leading to THPAM. In "Spike trains for each exogenous feature vector", the necessity of spike trains for the foregoing parts of THPAM to work properly is discussed. How the spike trains are fed back with delays is also discussed there. Typical feedback connections with delays are shown in Fig. 5.

Fig. 5.

Layer l and layer l +2 of an example THPAM with feedback connections from layer l to layer l and from layer l +2 to layer l. For each exogenous feature vector, ζ rounds of retrieving and learning are performed by each PU in THPAM at times τ + i/ζ, i = 0, 1,…, ζ −1. The outputs of a PU form R spike trains

A recurrent multilayer network of processing units

The temporal hierarchical probabilistic associative memory (THPAM) can be looked upon as an organization of a biological neural network into a recurrent hierarchical network of PUs (processing units). Each PU is a pattern recognizer that comprises dendritic trees, neurons of two types, synaptic weights, and a learning mechanism for updating these synaptic weights by a version of the Hebb rule in unsupervised or supervised learning.

Spike trains propagating through biological neural networks are assumed to be sequences of unipolar binary numbers, 0’s and 1’s. A group of M spike trains can be viewed as a sequence of M-dimensional unipolar binary vectors, vt, t = 1, 2,…, where $v_t = [v_{t1} \cdots v_{tM}]'$. In this paper, we convert vt, t = 1, 2,…, into a sequence of M-dimensional bipolar binary vectors, xt, t = 1, 2,…, by xtm = 2vtm −1, for m = 1,…, M and t = 1, 2,…. We will use xt to simplify our description and discussion in this paper. Since xt is only a mathematical representation of vt, and xtm can easily be converted back into vtm by vtm = (xtm +1)/2, we also call the components of xt, t = 1, 2,…, spike trains with the understanding that they are mathematical representations of the “biological” spike trains, vt, t = 1, 2,….
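The conversion between the two representations can be illustrated with a minimal Python sketch. The function names and the sample vector are illustrative assumptions and are not part of the model.

```python
def to_bipolar(v):
    """Map a unipolar binary vector (0/1) to a bipolar one (-1/+1): x = 2v - 1."""
    return [2 * bit - 1 for bit in v]


def to_unipolar(x):
    """Map a bipolar vector (-1/+1) back to a unipolar one (0/1): v = (x + 1)/2."""
    return [(value + 1) // 2 for value in x]


# M = 4 spike trains observed at one time step t (illustrative values).
v_t = [1, 0, 0, 1]
x_t = to_bipolar(v_t)
assert to_unipolar(x_t) == v_t  # the round trip recovers the original vector
```

Because the mapping is one-to-one, either representation can be used without loss of information.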

A vector input to THPAM is called an exogenous feature vector, and a vector input to a layer of PUs is called a feature vector. A feature vector input to a layer usually contains not only outputs fed forward from a preceding layer but also outputs fed back from the same or higher layers with a time delay. A feature vector may contain components from an exogenous feature vector. For simplicity, we assume that the exogenous feature vector is only input to layer 1 and is thus a subvector of a feature vector input to layer 1. These vectors over time form groups of spike trains.

A subvector of a feature vector that is input to a PU is called a feature subvector. Trace the feedforward connections backward from neurons of a PU to a subvector of the exogenous feature vector. This feature subvector is called the receptive domain of the PU. The collection of neurons in layer l −1 that have a feedforward connection to a neuron in a PU in layer l and the delay devices that hold a feedback for direct input to the same PU in layer l are called the immediate receptive domain of the PU.

The feature vector input to layer l at time or numbering t is denoted by Inline graphic, and the output from the layer at t is denoted by Inline graphic, where Inline graphic and Inline graphic are specified in more detail later in this section. An exogenous feature vector is denoted by Inline graphic. It is a subvector of Inline graphic, which may contain feedbacked components. For notational simplicity, the superscript l −1 in Inline graphic and dependencies on l −1 or l in other symbols are sometimes suppressed in the following when no confusion is expected.

Let xt, t = 1, 2,…, denote a sequence of M-dimensional feature vectors $x_t = [x_{t1} \cdots x_{tM}]'$, whose components are ternary numbers. Let $n = [n_1 \cdots n_k]'$ be a subvector of $[1 \cdots M]'$ such that $n_1 < \cdots < n_k$. The subvector $x_t(n) = [x_{tn_1} \cdots x_{tn_k}]'$ of xt is a feature subvector of the feature vector xt. n is called a feature subvector index (FSI), and $x_t(n)$ is said to be a feature subvector on the FSI n or have the FSI n. Each PU is associated with a fixed FSI n and denoted by PU(n). Using these notations, the sequence of subvectors of xt, t = 1, 2,…, that is input to PU(n) is $x_t(n)$, t = 1, 2,…. The FSI n of a PU usually has subvectors, $n(u)$, u = 1,…, U, on which subvectors $x_t(n(u))$ of $x_t(n)$ are separately processed by PU(n) at first. The subvectors, $n(u)$, u = 1,…, U, are not necessarily disjoint, but all inclusive in the sense that every component of n is included in at least one of the subvectors $n(u)$. Moreover, the components of $n(u)$ are usually randomly selected from those of n.
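The indexing described above can be sketched in Python as follows. This is only an illustration under stated assumptions: indices are 1-based as in the text, every sub-index has the same length, and coverage of n is enforced by simple redrawing. The function names, the sample feature vector and the redrawing strategy are hypothetical and are not taken from the paper.

```python
import random


def feature_subvector(x, index):
    """Select the components of feature vector x listed in a 1-based index."""
    return [x[i - 1] for i in index]


def random_sub_indices(n, count, size, rng):
    """Draw `count` sorted sub-indices of length `size` from the FSI n.

    Redraws until every component of n appears in at least one sub-index
    (an assumed way of making the sub-indices "all inclusive").
    """
    if size > len(n) or count * size < len(n):
        raise ValueError("sub-indices cannot cover the FSI")
    while True:
        subs = [sorted(rng.sample(n, size)) for _ in range(count)]
        if set().union(*subs) == set(n):
            return subs


x_t = [1, -1, 0, 1, -1, 1, 0, -1]   # ternary feature vector, M = 8
n = [2, 3, 5, 7, 8]                 # FSI of one PU
x_t_n = feature_subvector(x_t, n)   # feature subvector on the FSI n

rng = random.Random(0)              # fixed seed so the sketch is repeatable
subs = random_sub_indices(n, count=3, size=3, rng=rng)
parts = [feature_subvector(x_t, sub) for sub in subs]
```

The sub-indices may overlap, which is consistent with the statement that they need not be disjoint.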

The PUs in layer l have FSIs (feature subvector indices) denoted by Inline graphic, Inline graphic,…,Inline graphic. Upon receiving a feature vector Inline graphic by layer l, the feature subvectors, Inline graphic, Inline graphic,…,Inline graphic, are formed and processed by the PUs, PUInline graphic, PUInline graphic,…,PUInline graphic, to compute Inline graphic, Inline graphic,…,Inline graphic first and then generate Inline graphic, Inline graphic,…,Inline graphic, respectively. Here Inline graphic denotes a representation of the subjective probability of the label Inline graphic of Inline graphic, and Inline graphic denotes the output of Inline graphic based on Inline graphic. These representations and outputs are grouped into the representation Inline graphic of subjective probabilities and the output vector Inline graphic of layer l as follows:

graphic file with name M51.gif
graphic file with name M52.gif

The components of a feature vector Inline graphic input to layer l at time (or with numbering) τ comprise components of ternary vectors generated by PUs in layer l −1 and those generated at a previous time by PUs in the same layer l or PUs in higher layers with layer numberings l +k for some positive integers k. The time delays may be of different durations.

Once an exogenous feature vector is received by THPAM, the PUs perform functions of retrieving and/or learning from layer to layer starting with layer 1, the lowest layer. After the PUs in the highest layer, layer L, complete performing their functions, THPAM is said to have completed one round of retrievings and/or learnings (or memory adjustments). For each exogenous feature vector, THPAM will continue to complete a certain number of rounds of retrievings and/or learnings.

We note that retrieving and learning by a PU are performed locally, meaning that only the feature subvector input to the PU and its label are involved in the processing by the PU. In "Orthogonal expansion", "Expansion correlation matrices", "Representations of probability distributions", "Masking matrices", "Processing units and supervised and unsupervised learning" and "Learning to recognize rotated, translated or scaled patterns", the subscripts t and τ denote the time or numbering of a feature vector or subvector that is input to a layer or a PU, whereas, in "Spike trains for each exogenous feature vector", they denote the time or numbering of an exogenous vector that is input to THPAM.

Information carried by spike trains

The immediate receptive domain (defined in Section "A recurrent multilayer network of processing units") of a PU may be shared by more than one cause (or pattern) or may contain parts from more than one cause, and may contain corruption, distortion, occlusion or noise from the PU’s receptive domain (defined in Section "A recurrent multilayer network of processing units") or from the sensor measurements, image pixels, or sound recordings that are transformed into the receptive domain. The PU’s immediate receptive domain can therefore be completely described or represented only by a probability distribution (or a relative frequency distribution). Hence, probability distributions are the most desirable information for the PUs to communicate among themselves. Since probability distributions can be learned by the PU only from “experiences,” they must be subjective probability distributions (SPDs). As will be seen in "Representations of probability distributions", SPDs of the labels of a PU’s immediate receptive domain can be generated by the PU.

There are three possible ways spike trains can carry an SPD: (1) The SPD is carried by the shapes of the spikes. (2) The spike trains at an instant of time form a binary representation of the SPD. (3) The SPD is represented by the frequencies of spikes in spike trains.

Shapes of the spikes cannot be learned by the Hebb rule. Besides, SPDs output from a layer of PUs are input to the next layer of PUs. In the process of learning, such SPDs for causes or patterns change. The learning and retrieving mechanisms of PUs must be able to tolerate such changes. It is not clear how changes in spike shapes can be tolerated. Hence, way (1) is ruled out for PUs in THPAM.

In way (2), each SPD is represented by a certain number of bits (or tets), the number depending on the level of accuracy required. The higher the accuracy level, the larger the dimensionality of the output vector of the processing unit.

Again, in the process of learning, the SPD for a certain cause changes. For the learning and retrieving mechanisms to tolerate changes in the codes for the SPDs, the codes must vary gradually as the SPD changes gradually. Such codes are known to consist of large numbers of bits, requiring a large dimensionality of the output vector of the PU. Furthermore, it is not clear how unsupervised learning can be performed with such binary codes by the Hebb rule. For instance, when feature subvectors (or their variants) that have not been learned are input to a PU, the SPDs output by the PU are the same, namely the uniform distribution, which is therefore represented by the same binary code. All such input vectors are thus given the same label in Hebbian learning, and the PU fails to learn to distinguish different feature subvectors without supervision.

Way (3) can be easily obtained by using a pseudo-random number generator to convert a subjective probability into a +1 spike with said subjective probability and a mathematical −1 spike (i.e., biological 0) otherwise. Using this representation, the rate of +1 spikes in a spike train is on the average the subjective probability generated by the PU that outputs the spike train. The dimensionality of this representation is the dimensionality of the label. At any instant of time, the spike trains output by the processing units in a layer form an image of +1, −1 and 0 (for simplicity in certain cases). Gradual change in the subjective probabilities changes the distribution of the ternary digits in the image gradually, which can be tolerated by the use of the masking matrices to be described in "Masking matrices". When a feature subvector (or a variant thereof) that has never been learned is input to a PU, a random label is assigned to the vector. Different new input feature subvectors are usually assigned different labels in unsupervised learning. Occasional coincidences of different feature subvectors assigned with the same label do not cause a problem, if the label is used as part of a feature subvector input to a higher-layer PU. For example, “loo” as a part of “look,” “loot,” and “loop” does not cause confusion.
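The conversion in way (3) can be illustrated with a minimal Python sketch, assuming the standard-library pseudo-random number generator and independent draws at each time step. The function names, the probability value 0.3 and the train length are illustrative assumptions only.

```python
import random


def generate_spike(probability, rng):
    """Return +1 with the given subjective probability, otherwise -1."""
    return 1 if rng.random() < probability else -1


def spike_train(probability, length, rng):
    """Generate a bipolar spike train whose +1 rate approximates `probability`."""
    return [generate_spike(probability, rng) for _ in range(length)]


def plus_one_rate(train):
    """Fraction of +1 spikes in a bipolar spike train."""
    return sum(1 for spike in train if spike == 1) / len(train)


rng = random.Random(0)                 # fixed seed so the sketch is repeatable
train = spike_train(0.3, 10_000, rng)  # subjective probability 0.3
rate = plus_one_rate(train)            # close to 0.3 for a long train
```

For a long train, the observed rate of +1 spikes approaches the subjective probability, which is the sense in which the spike rate carries the probability.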

Therefore, under the four postulates, the subjective probability of a component of a label being +1 is represented by the average spike rate of a spike train. It follows that if the dimensionality of the label is at most R, R neurons form a group whose R spike trains carry the SPD of the label.

Orthogonal expansion

As discussed in the preceding section, SPDs (subjective probability distributions) are the most desirable information for PUs to communicate among themselves. Can SPDs be learned and retrieved by PUs under the four postulates? Subjective probabilities are relative frequencies. We need to find out whether and how relative frequencies can be learned and retrieved.

Orthogonal expansion of ternary vectors from the coding theory (Slepian 1956) plays an important role in learning and retrieving of the relative frequencies. The following example motivates and explains the definition of orthogonal expansion of bipolar vectors.

Example 1 Given 2-dimensional bipolar vectors, $a = [a_1 \;\; a_2]'$ and $b = [b_1 \;\; b_2]'$, let

$$\breve{a} = [1 \;\; a_1 \;\; a_2 \;\; a_2 a_1]', \qquad \breve{b} = [1 \;\; b_1 \;\; b_2 \;\; b_2 b_1]'.$$

By simple algebra, $\breve{a}'\breve{b} = 1 + a_1 b_1 + a_2 b_2 + a_2 a_1 b_2 b_1 = (1 + a_1 b_1)(1 + a_2 b_2)$. It follows that $2^{-2}\breve{a}'\breve{b} = 1$, if a = b; and $\breve{a}'\breve{b} = 0$, if a ≠ b. $\breve{a}$ and $\breve{b}$ are therefore called orthogonal expansions of a and b. Generalizing this idea of orthogonal expansion yields the following definition.

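Definition Given an m-dimensional ternary vector $v = [v_1 \;\; \cdots \;\; v_m]'$, define $\breve{v}(1, \ldots, j)$ recursively by

$$\breve{v}(1) = [1 \;\; v_1]', \qquad \breve{v}(1, \ldots, j+1) = [\breve{v}'(1, \ldots, j) \;\;\; v_{j+1}\,\breve{v}'(1, \ldots, j)]', \quad j = 1, \ldots, m-1. \tag{1}$$

$\breve{v} = \breve{v}(1, \ldots, m)$ is called the orthogonal expansion of v.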

The above definition is justified by the following theorem.

Theorem 1 Let $a = [a_1 \;\; \cdots \;\; a_m]'$ and $b = [b_1 \;\; \cdots \;\; b_m]'$ be two m-dimensional ternary vectors. Then the inner product $\breve{a}'\breve{b}$ of their orthogonal expansions, $\breve{a}$ and $\breve{b}$, can be expressed as follows:

$$\breve{a}'\breve{b} = \prod_{j=1}^{m} (1 + a_j b_j), \tag{2}$$

which has the following properties:

  1. If akbk =  −1 for some k ∈ {1,…, m}, then $\breve{a}'\breve{b} = 0$.

  2. If akbk = 0 for some k ∈ {1,…, m}, then $\breve{a}'\breve{b} = \prod_{j \neq k} (1 + a_j b_j)$.

  3. If $\breve{a}'\breve{b} \neq 0$, then $\breve{a}'\breve{b} = 2^{a'b}$.

  4. If a and b are bipolar vectors, then $\breve{a}'\breve{b} = 0$ if a ≠ b; and $\breve{a}'\breve{b} = 2^{m}$ if a = b.

Proof Applying the recursive formula (1), we obtain

graphic file with name M78.gif

The formula in (2) follows. The four properties above are easy consequences.
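As a sanity check, the product formula of Theorem 1 can be verified by brute force for small m. The sketch below, with assumed helper names, compares the inner product of the expansions with the product of the factors (1 + a_j b_j) for every pair of 3-dimensional ternary vectors.

```python
from itertools import product


def orthogonal_expansion(v):
    expansion = [1]
    for component in v:
        expansion = expansion + [component * e for e in expansion]
    return expansion


def inner(x, y):
    return sum(p * q for p, q in zip(x, y))


def product_formula(a, b):
    """Right-hand side of the theorem: product over j of (1 + a_j * b_j)."""
    result = 1
    for aj, bj in zip(a, b):
        result *= 1 + aj * bj
    return result


m = 3
mismatches = 0
for a in product((-1, 0, 1), repeat=m):
    for b in product((-1, 0, 1), repeat=m):
        if inner(orthogonal_expansion(a), orthogonal_expansion(b)) != product_formula(a, b):
            mismatches += 1

a = (1, -1, 1)
same = inner(orthogonal_expansion(a), orthogonal_expansion(a))                    # 2**3
different = inner(orthogonal_expansion(a), orthogonal_expansion((1, 1, 1)))       # 0
one_masked = inner(orthogonal_expansion((1, 0, 1)), orthogonal_expansion(a))      # 2**2
```

The last line illustrates the case used for masking: zeroing one component of a removes that component's factor from the product instead of forcing the inner product to zero.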

We remark that if some components of a are set equal to zero to obtain a vector c and the nonzero components of c are all equal to their corresponding components in b, then we still have Inline graphic. This property is used to construct masking matrices in "Masking matrices" for learning and recognizing corrupted, distorted and occluded patterns and for facilitating generalization on such patterns.

Expansion correlation matrices

In this section, it is shown how orthogonal expansions of subvectors of feature subvectors input to a PU are used to construct synaptic weights in the PU, and how such synaptic weights, in the form of matrices, are adjusted in the PU to learn feature subvectors.

Let the label of Inline graphic be denoted by Inline graphic, which is an R-dimensional ternary vector. All subvectors, Inline graphic, u = 1,…, U, of Inline graphic share the same label Inline graphic. In supervised learning, Inline graphic is provided from outside THPAM, and in unsupervised learning, Inline graphic is generated internally in the PU itself.

The pairs (Inline graphic, Inline graphic), t = 1, 2,…, are learned by the PU to form expansion correlation matrices (ECMs), Inline graphic and Inline graphic on Inline graphic. After the first T pairs are learned, these matrices are

graphic file with name M92.gif 3
graphic file with name M93.gif 4

where Inline graphic are orthogonal expansions of Inline graphic, Inline graphic is a scaling constant that is selected to keep all numbers involved in THPAM manageable, $\lambda^{T-t} I$ is a weight matrix, where I is the identity matrix, and λ (0 < λ < 1) is a forgetting factor. Another matrix Inline graphic can be used as the weight matrix instead. Note that the matrix Inline graphic has only one row.

The ECMs, Inline graphic and Inline graphic, are adjusted as follows: If Inline graphic,

graphic file with name M102.gif 5
graphic file with name M103.gif 6

If Inline graphic, then Inline graphic and Inline graphic are unchanged. These update formulas are discussed in terms of supervised and unsupervised Hebbian learning in "Processing units and supervised and unsupervised learning". We note here that learning an input feature subvector using the above formulas is instantaneous. No differentiation, backpropagation, iteration, optimization, cycling repeatedly through training material or waiting for asymptotic convergence is required.
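The one-step nature of the update can be sketched as follows. The exact update formulas are not legible in this copy, so the sketch assumes the form suggested by the surrounding text: each learned pair scales the old matrices by the forgetting factor and adds the scaled outer product of the label with the orthogonal expansion (for D) or the scaled expansion alone (for the single-row C), and a zero label leaves both unchanged. The class name, argument names and default values are hypothetical.

```python
def orthogonal_expansion(v):
    expansion = [1]
    for component in v:
        expansion = expansion + [component * e for e in expansion]
    return expansion


class ExpansionCorrelationMatrices:
    """Hypothetical container for one pair of ECMs: D (R rows) and C (one row)."""

    def __init__(self, m, R, forgetting=1.0, scale=1.0):
        self.forgetting = forgetting      # assumed role of the forgetting factor lambda
        self.scale = scale                # assumed role of the scaling constant
        self.D = [[0.0] * (2 ** m) for _ in range(R)]
        self.C = [0.0] * (2 ** m)

    def learn(self, x, r):
        """Learn one (subvector, label) pair in a single step; no iteration."""
        if all(component == 0 for component in r):
            return                        # zero label: matrices stay unchanged
        xe = orthogonal_expansion(x)
        self.D = [[self.forgetting * d + self.scale * rj * e for d, e in zip(row, xe)]
                  for row, rj in zip(self.D, r)]
        self.C = [self.forgetting * c + self.scale * e for c, e in zip(self.C, xe)]


ecm = ExpansionCorrelationMatrices(m=2, R=1, forgetting=0.5)
ecm.learn((1, -1), (1,))
ecm.learn((1, -1), (1,))    # the earlier copy is now weighted by 0.5
ecm.learn((1, -1), (0,))    # skipped: zero label
```

With a forgetting factor below 1, older pairs fade geometrically, which is the behaviour the weight matrix in the text is meant to provide.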

Orthogonal expansions (OEs) Inline graphic and ECMs, Inline graphic, Inline graphic, u = 1,…, U, are assembled into a general orthogonal expansion (GOE) Inline graphic and general expansion correlation matrices (GECMs), Inline graphic and Inline graphic, for PUInline graphic (the PU on the FSI n) as follows:

graphic file with name M114.gif 7
graphic file with name M115.gif 8
graphic file with name M116.gif 9

Note that while Inline graphic, the dimensionality of the orthogonal expansion of Inline graphic is Inline graphic. The former can be made much smaller than the latter by setting Inline graphic small. If the components of a subvector Inline graphic of the feature subvector index (FSI) n are selected from the FSI n at random, then the components of Inline graphic are a random sample of Inline graphic, and Inline graphic is a “lower-resolution” representation of Inline graphic. Hence, a sufficient number U of Inline graphic, which may have common components, can sufficiently represent Inline graphic. Since subvectors Inline graphic are all inclusive, Inline graphic if and only if Inline graphic for u = 1,…, U. However, even if Inline graphic may still be equal to Inline graphic for some values of u. Therefore, the use of the GOE Inline graphic has not only the advantage of having the much smaller dimensionality of Inline graphic and thereby the much smaller dimensionality of the GECMs, but also the advantage of helping enhance the generalization capability of the PU. This advantage is further discussed in "Masking matrices".

Note that the components of Inline graphic are actually all the products that can be obtained from those of Inline graphic. Each product is obtained by successive two-factor multiplications. For example,

graphic file with name M137.gif

Because of commutativity and associativity of multiplication, the successive two-factor multiplication for a component of Inline graphic is not unique. Missing or repeating components in Inline graphic in the GECMs, Inline graphic and Inline graphic, or in the GOE of the input feature subvectors Inline graphic cause only graceful degradation of subjective probability distribution representation Inline graphic.

Each two-factor multiplication can be looked upon as an NXOR operation on the two factors involved. Note that NXOR operations can be replaced with XOR operations without affecting the generation of subjective probability distribution representation Inline graphic. XOR gates were found in dendritic trees by Zador et al. (1992) and Fromherz and Gaede (1993), and the existence of logic gates and low-order polynomials in dendritic trees was discussed in Mel (1994).

Representations of probability distributions

How the expansion correlation matrices are used to generate representations of SPDs (subjective probability distributions) is shown in this section. The following example illustrates the idea.

Example 2 Let Inline graphic and Inline graphic be two different feature subvectors, which are 2-dimensional bipolar vectors. Then, Inline graphic = 4, Inline graphic = 0, and Inline graphic. Let a training data set consist of 8 copies of u with label +1 and 2 copies of u with label −1; and 3 copies of v with label +1 and 27 copies of v with label −1. This training data set is learned by a PU with Inline graphic [in (3) and (4)] to form the GECMs (general expansion correlation matrices) with U = 1:

graphic file with name M151.gif

By simple algebra, Inline graphic, Inline graphic, Inline graphic, Inline graphic. It follows that Inline graphic = 8/10 is the relative frequency that u has been learned with label +1 by the PU; and Inline graphic is the relative frequency that u has been learned with label −1 by the PU. Similarly, Inline graphic is the relative frequency that v has been learned with label +1; and Inline graphic is the relative frequency that v has been learned with label −1.
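The arithmetic of Example 2 can be reproduced with a short sketch. It assumes that the scaling constant and the forgetting factor are both 1, that D accumulates label-weighted expansions and C accumulates plain expansions, and that the probability of label +1 is (d + c)/(2c). The particular vectors chosen for u and v are also assumptions, since any two distinct 2-dimensional bipolar vectors behave the same way.

```python
from fractions import Fraction


def orthogonal_expansion(v):
    expansion = [1]
    for component in v:
        expansion = expansion + [component * e for e in expansion]
    return expansion


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


u, v = (1, 1), (1, -1)      # assumed: any two distinct bipolar vectors would do
training = [(u, 1)] * 8 + [(u, -1)] * 2 + [(v, 1)] * 3 + [(v, -1)] * 27

D = [0, 0, 0, 0]            # one row because the label is 1-dimensional
C = [0, 0, 0, 0]
for x, r in training:
    xe = orthogonal_expansion(x)
    D = [d + r * e for d, e in zip(D, xe)]
    C = [c + e for c, e in zip(C, xe)]


def probability_plus_one(x):
    """Relative frequency with which x was learned with label +1."""
    xe = orthogonal_expansion(x)
    d, c = dot(D, xe), dot(C, xe)
    return Fraction(d + c, 2 * c)


p_u = probability_plus_one(u)   # 8 of 10 copies of u carried label +1
p_v = probability_plus_one(v)   # 3 of 30 copies of v carried label +1
```

Because the expansions of u and v are orthogonal, the copies of v contribute nothing when u is presented, and vice versa; this is what lets one pair of matrices store separate relative frequencies for different inputs.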

We now generalize the idea illustrated in Example 2 in the following. Let us first define the symbols Inline graphic, Inline graphic, Inline graphic:

graphic file with name M163.gif 10
graphic file with name M164.gif 11
graphic file with name M165.gif 12

and the symbols Inline graphic, Inline graphic, Inline graphic:

graphic file with name M169.gif 13
graphic file with name M170.gif 14
graphic file with name M171.gif 15

where Inline graphic is a general orthogonal expansion (GOE) and Inline graphic and Inline graphic are general expansion correlation matrices (GECMs) for PUInline graphic. It is easy to see that Inline graphic, and Inline graphic

As a special case, Example 2 shows that Inline graphic is an approximate of the subjective probability that the i-th component of the label of Inline graphic is +1. The general case is examined in the following.

Assume that all Inline graphic and Inline graphic are bipolar binary vectors. By (11), (12), (3) and (4),

graphic file with name M182.gif

where

graphic file with name M183.gif

Assume further that Inline graphic, u = 1,…, U are all the same. Then if Inline graphic,

graphic file with name M186.gif 16

For example, if λ and U are set equal to 1, the above expression becomes

graphic file with name M187.gif

where Inline graphic is the number of Inline graphic’s with Inline graphic that have been learned and are equal to Inline graphic, and Inline graphic is the number of Inline graphic’s that have been learned and are equal to Inline graphic. Hence, the ratio Inline graphic is the relative frequency with which the input feature subvector Inline graphic has a label with its j-th component Inline graphic. The example also shows that if λ is equal to 1, the memory, Inline graphic and Inline graphic, never degrades. However, in this case, if learning continues, the memory can get saturated, causing memory “overflow.”

The closer λ is to 1 and the smaller U is, the closer the above expression (16) approximates the subjective probability that the label Inline graphic, based on the GECMs, Inline graphic and Inline graphic, which are constructed with pairs Inline graphic, t = 1, 2,…, T. Here Inline graphic with R components. (Note that I is not the identity matrix I.) The forgetting factor λ de-emphasizes past pairs gradually. It does not have to be applied each time a feature subvector Inline graphic is learned by PU(n) as above. It can be applied once after a certain number, say 1,600, of feature subvectors are learned by the PU.

All the statements concerning a probability in this paper are statements concerning a subjective probability, and the word “subjective” is sometimes omitted. If Inline graphic then Inline graphic is approximately the probability Inline graphic that the j-th component Inline graphic of the label Inline graphic of Inline graphic is +1 based on Inline graphic and Inline graphic. If Inline graphic, then we set Inline graphic = 1/2. The vector

graphic file with name M216.gif

is a representation of a probability distribution of the label Inline graphic of the feature subvector Inline graphic input to PU(n). Since Inline graphic, if Inline graphic, the ratio Inline graphic is equal to Inline graphic If Inline graphic, set 2Inline graphic. Denote Inline graphic by Inline graphic. Then the vector Inline graphic satisfies

graphic file with name M228.gif

and is also a representation of a probability distribution of the label Inline graphic of the feature subvector Inline graphic. Here, Inline graphic.

A point estimate of the label Inline graphic can be obtained by converting each component Inline graphic of Inline graphic into a ternary number Inline graphic by the following steps: For j = 1,…, R, set Inline graphic, and generate a pseudo-random number in accordance with the probability distribution of a random variable Inline graphic and Inline graphic, and set Inline graphic equal to the resultant pseudo-random number. Assemble Inline graphic, j = 1,…, R, into Inline graphic, which is a point estimate of the label Inline graphic.

Masking matrices

Let a feature subvector that deviates from each of a group of feature subvectors that have been learned by the PU due to corruption, distortion or occlusion be presented to the PU. If the PU is able to automatically find the largest subvector of the presented subvector that matches at least one subvector among the group and generate the SPD of the label of the largest subvector, the PU is said to have a maximal generalization capability. This capability is achieved by the use of masking matrices described in this section.

Let a subvector Inline graphic be a slightly different (e.g., corrupted, distorted, occluded) version of Inline graphic, which is one of the subvectors, Inline graphic, t = 1, 2,…, T, stored in ECMs, Inline graphic and Inline graphic, on Inline graphic. Assume that Inline graphic is very different from other subvectors stored in the ECMs. Since Inline graphic, the information stored in Inline graphic and Inline graphic about the label Inline graphic cannot be obtained from Inline graphic and Inline graphic. This is viewed as failure of Inline graphic and Inline graphic to generalize. Because of property 2 in Theorem 1, if the corrupted, distorted and occluded components in Inline graphic are set equal to zero, then the information stored in the ECMs about the label Inline graphic can be obtained in part from the remaining components of Inline graphic. This observation motivated masking matrices.

Let us denote the vector Inline graphic with its $i_1$-th, $i_2$-th,…, and $i_j$-th components set equal to 0 by Inline graphic, where $1 \le i_1 < i_2 < \cdots < i_j \le n$. For example, if Inline graphic, then Inline graphic. Denote the n-dimensional vector Inline graphic by I (not the identity matrix I) and denote the orthogonal expansion of $v(i_1, i_2, \ldots, i_j)$ by Inline graphic. We note that $v(i_1, i_2, \ldots, i_j) = \mathrm{diag}(I(i_1, i_2, \ldots, i_j))\, v$ and Inline graphic, where Inline graphic and Inline graphic denote the orthogonal expansions of $v(i_1, i_2, \ldots, i_j)$ and Inline graphic respectively (not the orthogonal expansions of v and I with their $i_1$-th, $i_2$-th,…, and $i_j$-th components set equal to 0).

Using these notations, a feature subvector Inline graphic with its i1-th, i2-th, and ij-th components set equal to 0 is Inline graphic, and the orthogonal expansion of Inline graphic is diagInline graphic. Hence, the matrix diagInline graphic, as a matrix transformation, sets the i1-th, i2-th, and ij-th components of xt(n(u)) equal to zero in transforming Inline graphic (i.e., in diagInline graphic).

Two important properties of the matrix diagInline graphic are the following:

  1. If diagInline graphic, then Inline graphic

  2. If diagInline graphic, then Inline graphic= 0.

The following example illustrates how such matrices diagInline graphic can be used by a PU (processing unit) to generalize.

Example 3 Consider a cube shown in Fig. 1. The coordinate vectors of its eight vertices, xt, t = 1, 2,…, 8, and their corresponding labels, rt, t = 1, 2,…, 8, are shown at the vertices and in the squares, respectively, where the question marks indicate unknown labels. The training data consists of the pairs, (xt, rt), t = 1, 2, 3, 7, 8.

Fig. 1

Data for training and testing the PU (processing unit) in Example 3 and Example 4 are shown as the vertices of a cube. Their bipolar binary labels are the numbers or question marks for unknown labels in the squares at the vertices. x4, x5, x6 are unavailable in the data set for Example 3. They are learned one by one without supervision (i.e., with labels generated by the PU) in Example 4

The pairs, Inline graphic, t = 1, 2, 3, 7, 8, are listed as rows in the following table:

t    1    xt1   xt2   xt2xt1   xt3   xt3xt1   xt3xt2   xt3xt2xt1   rt
1    1    −1    −1     1       −1     1        1       −1          −1
2    1     1    −1    −1       −1    −1        1        1           1
3    1    −1     1    −1       −1     1       −1        1           1
7    1    −1     1    −1        1    −1        1       −1           1
8    1     1     1     1        1     1        1        1           1

Assume U = 1 and Inline graphic in (5), (6), (3) and (4) in a PU (processing unit). The general expansion correlation matrices, D and C, of the PU are the following:

graphic file with name M292.gif 17
graphic file with name M293.gif 18

Let

graphic file with name M294.gif

Orthogonal expansion of them yields

graphic file with name M295.gif

We introduce the following matrix

graphic file with name M296.gif 19

where the weight $2^{-8}$ is selected to de-emphasize the effect of the second term above as compared with the first term. The orthogonal expansions of the three vertices of the cube in Fig. 1 that are not included in the training data are listed as follows:

t    1    xt1   xt2   xt2xt1   xt3   xt3xt1   xt3xt2   xt3xt2xt1
4    1     1     1     1       −1    −1       −1       −1
5    1    −1    −1     1        1    −1       −1        1
6    1     1    −1    −1        1     1       −1       −1

From the following examples,

graphic file with name M301.gif

we see that diagInline graphic sets the k-th component xtk of Inline graphic equal to 0 for t = 1,…, 8, k = 1, 2, 3.

Simple matrix-vector multiplication yields Inline graphic and Inline graphic for t = 4, 5, 6. Hence no information is provided on xt by Inline graphic and Inline graphic for t = 4, 5, 6. This shows that if xt has not been learned, then no information on it is provided by the general expansion matrices. Recall that if Inline graphic ≠ 0, the subjective probability Inline graphic, where Inline graphic and Inline graphic. With M, we will use Inline graphic and Inline graphic instead.

Assume that x1 is input to the PU with the above D and C. By matrix multiplication,

graphic file with name M314.gif

Then the subjective probability that the label of x1 is 1 is Inline graphic = 0.0077, and the subjective probability that the label of x1 is −1 is 0.9923. Note that x1 with a label of −1 has been learned. The subjective probability that the label of x1 is −1 should be 1. The use of M causes a very small amount of error in the subjective probability, which can be adjusted by changing the weight, $2^{-8}$.

Assume that x4 is input to the PU with the above D and C. By matrix multiplication,

graphic file with name M316.gif

Then the subjective probability that the label of x4 is 1 is Inline graphic. From Fig. 1, we see that all the three vertices neighboring x4 have been learned and have a label of 1. It is a good generalization that a label of 1 is assigned to x4.

Assume that x6 is presented to the same PU. By matrix multiplication,

graphic file with name M318.gif 20
graphic file with name M319.gif 21

Then the subjective probability that the label of x6 is 1 is Inline graphic. From Fig. 1, we see that only two vertices neighboring x6 have been learned, and they both have a label of 1. It is a good generalization that a label of 1 is assigned to x6.

Assume that x5 is input to the same PU. By matrix multiplication,

graphic file with name M321.gif

Then the subjective probability that the label of x5 is 1 is Inline graphic. From Fig. 1, we see that only two vertices neighboring x5 have been learned, and one of them has a label of 1, and the other has a label of −1. No generalization is possible. A label of 1 is assigned to x5 with a subjective probability of 1/2, and a label of −1 is assigned to x5 with equal subjective probability.
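The four retrievals in Example 3 can be reproduced with the sketch below. The matrices and M are not legible in this copy, so several things are assumed: D and C are the plain sums of label-weighted and unweighted expansions of the five training pairs; M is the identity plus a weighted sum of the three single-component masks; the effective weight is $2^{-8} \cdot 2 = 2^{-7}$ (the example weight times the compensating factor for one masked component); and the probability of label 1 is (d/c + 1)/2. Under these assumptions the quoted value 0.0077 for x1 is recovered. All function names are hypothetical.

```python
from fractions import Fraction


def orthogonal_expansion(v):
    expansion = [1]
    for component in v:
        expansion = expansion + [component * e for e in expansion]
    return expansion


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


VERTICES = {1: (-1, -1, -1), 2: (1, -1, -1), 3: (-1, 1, -1), 4: (1, 1, -1),
            5: (-1, -1, 1), 6: (1, -1, 1), 7: (-1, 1, 1), 8: (1, 1, 1)}
LABELS = {1: -1, 2: 1, 3: 1, 7: 1, 8: 1}     # training pairs of Example 3

D, C = [0] * 8, [0] * 8
for t, r in LABELS.items():
    xe = orthogonal_expansion(VERTICES[t])
    D = [d + r * e for d, e in zip(D, xe)]
    C = [c + e for c, e in zip(C, xe)]

W = Fraction(1, 2 ** 7)   # assumed effective weight of each single-component mask


def masked_expansion(x):
    """M applied to the expansion of x: the expansion itself plus W times the
    expansions of x with one component at a time set to zero."""
    result = orthogonal_expansion(x)
    for k in range(len(x)):
        zeroed = list(x)
        zeroed[k] = 0
        result = [r + W * e for r, e in zip(result, orthogonal_expansion(zeroed))]
    return result


def probability_plus_one(x):
    mx = masked_expansion(x)
    d, c = dot(D, mx), dot(C, mx)
    return Fraction(1, 2) if c == 0 else (d / c + 1) / 2


p1 = probability_plus_one(VERTICES[1])   # learned with label -1: about 0.0077
p4 = probability_plus_one(VERTICES[4])   # three learned neighbours, all labelled 1
p6 = probability_plus_one(VERTICES[6])   # two learned neighbours, both labelled 1
p5 = probability_plus_one(VERTICES[5])   # two learned neighbours with opposite labels
```

For an unlearned vertex the unmasked terms vanish, so the answer is decided entirely by the learned neighbours that differ in one component.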

In the general case, we combine all such matrices diagInline graphic that set less than or equal to a selected positive integer Inline graphic of components of Inline graphic equal to zero into the following masking matrix

graphic file with name M326.gif 22

where $2^j$ is used to compensate for the factor $2^j$ in Inline graphic in the important property stated above, and $2^{-8j}$ is an example weight selected to differentiate between different levels j of maskings. What this weight really is in biological neural networks needs to be found by biological experiments. So is the positive integer Inline graphic.
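Since every mask is a diagonal matrix, the masking matrix is itself diagonal and can be stored as a vector. The sketch below builds that diagonal under an assumed reading of the description: the identity, plus, for each level j up to a chosen maximum and each choice of j components to zero, the weight $2^{-8j} \cdot 2^{j}$ times the orthogonal expansion of the all-ones vector with those components zeroed. The function name and the parameters `max_masked` and `base_weight` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def orthogonal_expansion(v):
    expansion = [1]
    for component in v:
        expansion = expansion + [component * e for e in expansion]
    return expansion


def masking_diagonal(m, max_masked, base_weight=Fraction(1, 2 ** 8)):
    """Diagonal of an assumed masking matrix for m-dimensional subvectors."""
    diagonal = [Fraction(1)] * (2 ** m)                 # the identity term
    for j in range(1, max_masked + 1):
        weight = (base_weight ** j) * (2 ** j)          # level weight times compensation
        for zeroed in combinations(range(m), j):
            ones = [0 if i in zeroed else 1 for i in range(m)]
            mask = orthogonal_expansion(ones)           # 1 where the product survives
            diagonal = [d + weight * e for d, e in zip(diagonal, mask)]
    return diagonal


diag = masking_diagonal(m=3, max_masked=1)
```

The number of masks grows combinatorially with the subvector length and the maximum number of masked components, which is another reason to keep subvectors short.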

Let us denote Inline graphic by M here for abbreviation. Note that for k = 1,…, R, we have the following:

  • If Inline graphic ≠ 0, then
    graphic file with name M331.gif
  • If Inline graphic, but Inline graphic, then
    graphic file with name M334.gif
  • If Inline graphic, Inline graphic, but Inline graphic, then
    graphic file with name M338.gif

Continuing in this manner, it is seen that Inline graphic and Inline graphic always use the greatest number of uncorrupted, undistorted or unoccluded components of Inline graphic in estimating Inline graphic, Inline graphic, and Inline graphic.

Corresponding to Inline graphic, Inline graphic and Inline graphic defined in (7), (8) and (9), a general masking matrix is defined as follows:

graphic file with name M348.gif 23

where the right side is a matrix with Inline graphic, u = 1, 2,…, U, as diagonal blocks and zero elsewhere.

If the masking matrix Inline graphic is used, the symbols Inline graphic, Inline graphic, Inline graphic are defined as follows:

graphic file with name M354.gif 24
graphic file with name M355.gif 25
graphic file with name M356.gif 26

With the masking matrix Inline graphic, the symbols Inline graphic, Inline graphic, Inline graphic, Inline graphic are in turn defined as follows:

graphic file with name M362.gif 27
graphic file with name M363.gif 28
graphic file with name M364.gif 29

where Inline graphic is a general orthogonal expansion (GOE) and Inline graphic and Inline graphic are general expansion correlation matrices (GECMs) for PUInline graphic. It follows that

graphic file with name M369.gif 30
graphic file with name M370.gif 31
graphic file with name M371.gif 32

It is easy to see that Inline graphic, and Inline graphic. If Inline graphic, then we set Inline graphic. If Inline graphic, then Inline graphic, where Inline graphic is the probability that the k-th component Inline graphic of the label Inline graphic of Inline graphic is +1 based on Inline graphic and Inline graphic. It follows that

graphic file with name M384.gif

is a representation of a probability distribution of the label Inline graphic of Inline graphic.

It is mentioned in "Expansion correlation matrices" that selecting sufficiently small subvectors Inline graphic, u = 1,…, U, has the advantage of making Inline graphic sufficiently small. The formula (22) shows that selecting sufficiently small subvectors Inline graphic, u = 1,…, U, also has the advantage of making the number of terms in the formula sufficiently small. The use of subvectors Inline graphic, u = 1,…, U, helps enhance the generalization capability of PUInline graphic in another way: If the number of corrupted, distorted or occluded components of a subvector Inline graphic of Inline graphic exceeds Inline graphic, then Inline graphic does not contribute to Inline graphic or the output Inline graphic of PU(n). This eliminates the effect of a subvector Inline graphic that contains too many errors and allows PU(n) to produce a better estimate of the subjective probability distribution of a label on better subvectors of Inline graphic.

If some terms in (22) are missing, PU(n) suffers only graceful degradation of its generalization capability. We hypothesize that a masking matrix is a mathematical idealization and organization of a large number of nested and overlapped dendritic trees.

Processing units and supervised and unsupervised learning

We are ready to assemble a PU (processing unit) and see how supervised and unsupervised learning are performed. A processing unit, PU(n), on a feature subvector index n, is shown in Fig. 2. It has essentially two functions: retrieving a “point estimate” of the label of a feature subvector Inline graphic from the memory (i.e., GECMs), and learning a feature subvector and its label, which is either provided from outside the PU (in supervised learning) or generated by the PU itself (in unsupervised learning). PU(n) comprises an Orthogonal Expander, a label SPD (subjective probability distribution) Estimator, a Spike Generator, a GECM (general expansion correlation matrix) Adjuster, and a storage of the GECMs, C(n) and Inline graphic. The Orthogonal Expander models dendritic trees with NXORs (or XORs) as tree nodes, Inline graphic and Inline graphic model the synaptic weights in a biological neural network, and the label SPD Estimator and Spike Generator jointly model R neurons of one type and 1 neuron of another type in the same layer of a biological neural network. These two types of neuron will be described below.

During retrieving, a feature subvector Inline graphic on the FSI (feature subvector index) n is first expanded into a GOE (general orthogonal expansion) Inline graphic by the Orthogonal Expander. Inline graphic is then processed by the SPD (subjective probability distribution) Estimator, using the GECMs (general expansion correlation matrices), Inline graphic and Inline graphic to obtain a representation Inline graphic of an SPD of the label of the feature subvector Inline graphic. The Spike Generator converts Inline graphic into a ternary vector Inline graphic, which is the output of the PU. This process of generating Inline graphic and Inline graphic by PU(n) is called retrieval of a label of the feature subvector Inline graphic by PU(n).

The SPD Estimator and Spike Generator may be viewed as R neurons of one type and one neuron of another type that jointly output a “point estimate” Inline graphic of the label of Inline graphic. The former type is called D-neurons and the latter C-neuron. The C-neuron does a simple multiplication

graphic file with name M421.gif

For j = 1,…, R, D-neuron j performs the following tasks:

  1. Input Inline graphic If Inline graphic = 0, set Inline graphic; else compute Inline graphic and set Inline graphic equal to Inline graphic.

  2. Compute the subjective probability Inline graphic = (Inline graphic that the j-th component of the label of Inline graphic is +1.

  3. Generate a pseudo-random number in accordance with the probability distribution of a random variable Inline graphic and Inline graphic, and set Inline graphic equal to the resultant pseudo-random number. This is a point estimate of the j-th component of the label of Inline graphic.
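The division of labour between the C-neuron and a D-neuron can be sketched as two small functions. The formulas in the steps above are not legible in this copy, so the sketch assumes that the C-neuron outputs the product of the C row with the (masked) expansion, that a D-neuron forms y = d/c (or 0 when c = 0), and that the probability of a +1 spike is p = (y + 1)/2. The function names and the sample numbers, which reuse the counts of Example 2, are illustrative only.

```python
import random


def dot(row, vector):
    return sum(p * q for p, q in zip(row, vector))


def c_neuron(C_row, expansion):
    """Graded output c shared by all D-neurons of the same processing unit."""
    return dot(C_row, expansion)


def d_neuron(D_row, expansion, c, rng):
    """Return (y, p, spike) for one label component."""
    if c == 0:
        y = 0.0                                  # nothing learned: no preference
    else:
        y = dot(D_row, expansion) / c
    p = (y + 1) / 2                              # assumed probability of a +1 spike
    spike = 1 if rng.random() < p else -1        # point estimate of the component
    return y, p, spike


rng = random.Random(1)                           # seed assumed for repeatability
expansion = [1, 1, -1, -1]                       # expansion of the vector (1, -1)
C_row = [10, 10, -10, -10]                       # 10 learned copies in total
D_row = [6, 6, -6, -6]                           # 8 copies with +1, 2 copies with -1
c = c_neuron(C_row, expansion)
y, p, spike = d_neuron(D_row, expansion, c, rng)
unknown = d_neuron([0, 0, 0, 0], expansion, 0, rng)
```

When c is zero the spike is a fair coin flip, which is how a purely random label arises for an input that has never been learned.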

Figure 2 provides a “flow chart” of the general PU. A structural diagram of an example PU is shown in Fig. 3, where the PU is that of Example 3. Input to the PU is the feature subvector Inline graphic. A dendritic tree encodes xτ into the orthogonal expansion Inline graphic, whose components are multiplied by the synapses, denoted by ⊗, and the resultant products are distributed to the D-neuron and C-neuron. In learning, the general expansion correlation matrices, D and C, are incremented by Inline graphic and Inline graphic, respectively. In supervised learning of D, rτ is provided from outside the PU. In unsupervised learning of D, rτ is set equal to x{yτ}, which is generated by D-neurons. C is the accumulation of Inline graphic. A possible way to perform this accumulation by the Hebb rule is for the C-neuron to have a second output that is always equal to the constant 1.

If a label Inline graphic of Inline graphic from outside the PU is available for learning, and learning Inline graphic and Inline graphic is wanted, supervised learning is performed by the PU. In supervised learning, the class label Inline graphic is received through a lever represented by a thick solid line with a solid dot at its end in Fig. 2 by the GECM Adjuster, which receives also Inline graphic from the Orthogonal Expander and adjusts ECMs by formulas (5)–(6) and assembles the resultant ECMs, Inline graphic and Inline graphic, u = 1,…,U, into general ECMs, Inline graphic and Inline graphic, by (8) and (9).

These Inline graphic and Inline graphic are then stored, after a one-numbering delay (or a unit-time delay), in the storage, from which they are sent to the SPD Estimator.

If a label Inline graphic of Inline graphic from outside the PU is unavailable but learning Inline graphic is wanted, unsupervised learning is performed by the PU. In this case, the lever in Fig. 2 should be in the unsupervised training position represented by the lower dashed line with a solid dot at its end in Fig. 2. The feature subvector Inline graphic is first processed by the Orthogonal Expander, SPD Estimator, and Spike Generator as in performing retrieval described above. The resultant bipolar vector Inline graphic, which is a point estimate of the label of Inline graphic, is received, through the lever in the unsupervised training position, and used by the GECM Adjuster as the label Inline graphic of Inline graphic. The GECM Adjuster also receives Inline graphic and adjusts GECMs, Inline graphic and Inline graphic, using the update formulas, (5)–(6), in the same way as in supervised learning.

Let us now see how a “vocabulary” is created by the PU through unsupervised learning: If a feature subvector Inline graphic or a slightly different version of it has not been learned by PUInline graphic, and Inline graphic = 0; then Inline graphic = 0 and Inline graphic = (1/2)I, where I = Inline graphic. The SPD Estimator and Spike Generator use this probability vector to generate a purely random label Inline graphic. Once this Inline graphic has been learned and stored in Inline graphic and Inline graphic, if Inline graphic is input to PU(n) and is to be learned without supervision for the second time, then Inline graphic and one more copy of the pair (Inline graphic, Inline graphic) is included in Inline graphic and Inline graphic.

If a feature subvector Inline graphic or a slightly different version of it has been learned by PU(n) with different labels for different numbers of times, then Inline graphic and Inline graphic. For example, assume that two labels, Inline graphic and Inline graphic of the same feature subvector Inline graphic have been learned with relative frequencies, 0.7 and 0.3, respectively. Since these two labels may have common components, the point estimate of the label resembles Inline graphic with a probability higher than 70% and resembles Inline graphic with a probability greater than 30%. To learn this probability, a number of such point estimates need to be learned. This is one of the reasons for each PU to generate multiple spikes for each exogenous feature vector, which is to be discussed in "Spike trains for each exogenous feature vector".

If no learning is to be performed by PU(n), the lever represented by a thick solid line with a solid dot in Fig. 2 is placed in the neutral position, through which 0 is sent as the label Inline graphic of Inline graphic to the GECM Adjuster, which then keeps Inline graphic and Inline graphic unchanged. Here is a condition for setting Inline graphic = 0 to skip learning (supervised or unsupervised): If Inline graphic generated by a PU’s estimation means in retrieving is a bipolar vector or sufficiently close to a bipolar vector by some criterion, which indicates that the input feature subvector Inline graphic is adequately learned, then the lever is placed in the middle position and no learning is performed. This avoids “saturating” the expansion correlation matrices with too many copies of one feature subvector and its label.

We note that a well-known unsupervised learning method based on a kind of Hebb rule is the Oja learning algorithm that generates the principal components of the input vectors (Oja 1982). Oja’s method gets the principal components only asymptotically and the principal components must taper down fast enough, which is true only if the input vectors do not have too many major features.

We use the PU of Example 3 to illustrate unsupervised learning in the following example.

Example 4 In this example, the PU in Example 3 with D, C, M in (17), (18), (19) will learn x6 and then x5 without supervision.

Recall that

t    1    xt1   xt2   xt2xt1   xt3   xt3xt1   xt3xt2   xt3xt2xt1
6    1     1    −1    −1        1     1       −1       −1
5    1    −1    −1     1        1    −1       −1        1

and

graphic file with name M497.gif

and the subjective probability that the label of x6 is 1 is Inline graphic = 1. Hence, Inline graphic = 1 and thus the spike x{y6} generated by the PU is 1. To learn x6 without supervision, the GECM adjuster in Fig. 2 (i.e., Hebbian learning mechanism in Fig. 3) uses this spike as r6 in (5) and (6) and updates D and C in (17) and (18) into

graphic file with name M500.gif 33
graphic file with name M501.gif 34

To learn x5, the SPD Estimator in Fig. 2 (i.e., D- and C-neurons in Fig. 3) first processes it to obtain

graphic file with name M502.gif 35
graphic file with name M503.gif 36

Inline graphic. The Spike Generator then uses the subjective probability p5 = (y5 + 1)/2 = 2/3 to output the spike +1 with probability 2/3 and −1 with probability 1/3. The resultant spike is then used as r5 in (5) and (6) to update D and C in (35) and (36) into

graphic file with name M505.gif

or

graphic file with name M506.gif

depending on whether r5 = 1 or r5 = −1, respectively. For D and C to learn the subjective probability p5 = 2/3, the feature subvector x5 needs to be learned a number of times. This is one of the motivations for PUs to generate spike trains for each exogenous feature vector input to the THPAM. Generating spike trains in a THPAM is discussed in "Spike trains for each exogenous feature vector".
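The unsupervised steps of Example 4 can be traced with the sketch below, under the same assumptions as for Example 3: D and C are plain sums over the learned pairs, M is the identity plus the three single-component masks with an assumed effective weight of $2^{-7}$, and p = (y + 1)/2 with y = d/c. After x6 is learned with the self-generated label 1, presenting x5 gives y5 = 1/3 and p5 = 2/3. Function names are hypothetical.

```python
from fractions import Fraction


def orthogonal_expansion(v):
    expansion = [1]
    for component in v:
        expansion = expansion + [component * e for e in expansion]
    return expansion


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


W = Fraction(1, 2 ** 7)   # assumed effective weight of each single-component mask


def masked_expansion(x):
    result = orthogonal_expansion(x)
    for k in range(len(x)):
        zeroed = list(x)
        zeroed[k] = 0
        result = [r + W * e for r, e in zip(result, orthogonal_expansion(zeroed))]
    return result


def learn(D, C, x, r):
    """One Hebbian-style update with scaling constant and forgetting factor 1."""
    xe = orthogonal_expansion(x)
    return [d + r * e for d, e in zip(D, xe)], [c + e for c, e in zip(C, xe)]


def retrieve(D, C, x):
    """Return (y, p) for a 1-dimensional label."""
    mx = masked_expansion(x)
    d, c = dot(D, mx), dot(C, mx)
    y = Fraction(0) if c == 0 else d / c
    return y, (y + 1) / 2


training = [((-1, -1, -1), -1), ((1, -1, -1), 1), ((-1, 1, -1), 1),
            ((-1, 1, 1), 1), ((1, 1, 1), 1)]
D, C = [0] * 8, [0] * 8
for x, r in training:
    D, C = learn(D, C, x, r)

x6, x5 = (1, -1, 1), (-1, -1, 1)
y6, p6 = retrieve(D, C, x6)      # p6 equals 1, so the generated spike is +1
D, C = learn(D, C, x6, 1)        # unsupervised: the spike is used as the label r6
y5, p5 = retrieve(D, C, x5)      # now three learned neighbours: labels 1, 1 and -1
```

A single spike drawn with probability 2/3 cannot store the value 2/3 by itself; only the accumulated counts over repeated presentations approach it, which is the point made in the text.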

Learning to recognize rotated, translated or scaled patterns

In this section, we describe a method for PUs (processing units) to learn to recognize rotated, translated and scaled patterns. The method can be modified for PUs to learn to recognize translated and scaled temporal patterns such as speech and music. Since the method is valid for both supervised and unsupervised learning, labels Inline graphic to be referred to may be provided from outside THPAM in supervised learning or generated by the PUs in unsupervised learning. It is assumed in this section that feature vectors are arrays of ternary pixels.

Locations of ternary pixels in an array are assumed to be dense relative to the locations of the pixels selected as components of a feature subvector Inline graphic input to a PU. We identify the FSI (feature subvector index) n of a feature subvector with the locations of the pixels in Inline graphic. In other words, the components of n are also the numberings of the locations of the pixels included as components of Inline graphic.

Imagine a thin rubber disk with small holes at the locations of the pixels of the feature subvector with the FSI n. We translate the disk in some directions (e.g., 0, 15, 30, 45,…, 330, 345 degrees) a number of steps (e.g., 0, 1, 2,…), rotate the disk clockwise and counterclockwise a number (e.g., 0, 1, 2,…) of angles (e.g., 0, 5, 10, 15 degrees) at each translation, and expand and compress the rubber disk uniformly for a number of times (e.g., 0, 1, 2,…) at each translation for some percentages (e.g., 0, 5, 10%,…), to obtain other feature subvector indices of the same dimensionality as n. Note that in using the rubber disk to determine an FSI, if a hole in the rubber disk contains more than one pixel in the image, the one nearest to the center of the hole is included in the FSI.
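The rubber-disk construction can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in THPAM: pixels are assumed to lie on an integer grid, so snapping a hole to the nearest pixel is done by rounding, and the function names, the centre of rotation, and the particular directions, steps, angles and scale factors are all hypothetical.

```python
import math


def transform_fsi(points, center, shift=(0.0, 0.0), angle_deg=0.0, scale=1.0):
    """Scale and rotate pixel locations about `center`, translate, snap to grid."""
    cx, cy = center
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for px, py in points:
        dx, dy = (px - cx) * scale, (py - cy) * scale
        rx = cx + dx * cos_t - dy * sin_t + shift[0]
        ry = cy + dx * sin_t + dy * cos_t + shift[1]
        # Nearest pixel to the centre of the moved hole (integer grid assumed).
        out.append((round(rx), round(ry)))
    return tuple(out)


def rts_suite(points, center, directions_deg, steps, angles_deg, scales):
    """Collect the distinct FSIs obtained by rotating or scaling at each translation."""
    suite = []

    def add(fsi):
        if fsi not in suite:
            suite.append(fsi)

    for direction in directions_deg:
        d = math.radians(direction)
        for step in steps:
            shift = (step * math.cos(d), step * math.sin(d))
            for angle in angles_deg:   # rotations at this translation
                add(transform_fsi(points, center, shift, angle_deg=angle))
            for scale in scales:       # scalings at this translation
                add(transform_fsi(points, center, shift, scale=scale))
    return suite


n = ((0, 0), (2, 0), (0, 2))  # hypothetical FSI with three pixel locations
suite = rts_suite(n, center=(0, 0), directions_deg=[0, 90], steps=[0, 1],
                  angles_deg=[0, 90], scales=[1.0, 2.0])
```

Including step 0, angle 0 and scale 1.0 in the lists ensures that n itself is a member of the suite, and every member has the same dimensionality as n.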

Figure 4 shows examples of rotations, translations and scalings in an RTS (rotation, translation and scaling) suite of a feature subvector index n, which is shown in Fig. 4a. The components of n are the numberings (of a feature subvector) shown in the small circles within the rectangular box. The cross without arrow heads indicates the orientation and position of n. Figure 4b shows a translation to the right. Figure 4c shows a rotation of the translation in Fig. 4b. Figure 4d and e show a compression and an expansion of the translation in Fig. 4b. Five examples of translations of n are shown in Fig. 4f.

Let Inline graphic be a set of FSIs ω(i) identified with such rotations, translations, and scalings of n including n. Inline graphic is called a rotation/translation/scaling (RTS) suite of n, and Inline graphic denotes the number of elements in Inline graphic. Notice the digit 0 in the parentheses (e.g., 0, 1, 2,…) in the last paragraph. It indicates a rotation, a translation, or a scaling that is the feature subvector itself.

Although ω(i) is a rotation, translation, or scaling of n, this dependence on n is not indicated in the symbol ω(i) for notational simplicity. As n is rotated, translated or scaled into ω(i), Inline graphic as a subvector of n is rotated, translated or scaled into a subvector of ω(i). This subvector of ω(i) is denoted by Inline graphic. The set Inline graphic of such subvectors of Inline graphic, is denoted by Inline graphic and called a rotation/translation/scaling (RTS) suite of Inline graphic. Note that Inline graphic. The set Inline graphic, which is also denoted by Inline graphic, is called the rotation/translation/scaling (RTS) suite of Inline graphic on Inline graphic. In generating and summing orthogonal expansions on an RTS suite Inline graphic, elements in the RTS suite of Inline graphic on Inline graphic first go through orthogonal expansion. The resultant orthogonal expansions Inline graphic are then added up to form the sum Inline graphic on the RTS suite Inline graphic of Inline graphic.
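Generating and summing orthogonal expansions on an RTS suite can be sketched as below. The expansion ordering follows the row recalled in Example 4 (1, x1, x2, x2x1, x3, ...); the flat `image` list, the integer pixel locations and the two-member suite are hypothetical and serve only as an illustration.

```python
def orthogonal_expansion(x):
    """Return [1, x1, x2, x2*x1, x3, x3*x1, x3*x2, x3*x2*x1, ...]."""
    expansion = [1]
    for component in x:
        # Each new component doubles the list: old terms, then old terms times it.
        expansion = expansion + [component * term for term in expansion]
    return expansion


def sum_over_suite(image, suite):
    """Add the orthogonal expansions of the subvectors selected by each FSI."""
    total = None
    for fsi in suite:
        subvector = [image[location] for location in fsi]
        expansion = orthogonal_expansion(subvector)
        if total is None:
            total = expansion
        else:
            total = [a + b for a, b in zip(total, expansion)]
    return total


# Hypothetical image of six pixels and a suite of two FSIs.
image = [1, -1, 1, -1, -1, 1]
suite = [(0, 1, 2), (3, 4, 5)]
summed = sum_over_suite(image, suite)
```

Here the two subvectors are [1, −1, 1] and [−1, −1, 1], whose expansions are the two numeric rows recalled in Example 4, so the sum is their componentwise total.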

In both supervised learning and unsupervised learning, the subvectors, Inline graphic, Inline graphic, on Inline graphic are assigned the label Inline graphic of Inline graphic. ECMs (expansion correlation matrices), Inline graphic and Inline graphic, on Inline graphic are defined by

graphic file with name M541.gif 37
graphic file with name M542.gif 38

Inline graphic and Inline graphic can be adjusted to learn a pair Inline graphic, where λ is a forgetting factor, and Λ is a scaling constant. If Inline graphic, Inline graphic and Inline graphic are replaced respectively with Inline graphic and Inline graphic. If Inline graphic, then Inline graphic and Inline graphic are unchanged.
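One reading of this adjustment is sketched below for a scalar label. It assumes an exponentially weighted update in which D is replaced with λD + Λ r s and C with λC + Λ s, where s is the summed orthogonal expansion on the RTS suite, and that a label of 0 leaves both unchanged. The function name and argument names are hypothetical, and the exact forms are those of (37) and (38), which this sketch does not reproduce.

```python
def update_ecms(D, C, summed_expansion, label, forgetting=1.0, scaling=1.0):
    """Return updated (D, C) for one learned pair, under the assumed update form."""
    if label == 0:
        # Assumed: a zero label carries no information, so nothing is learned.
        return list(D), list(C)
    new_D = [forgetting * d + scaling * label * s
             for d, s in zip(D, summed_expansion)]
    new_C = [forgetting * c + scaling * s
             for c, s in zip(C, summed_expansion)]
    return new_D, new_C


expansion = [1, 1, -1, -1, 1, 1, -1, -1]  # hypothetical summed expansion
D, C = [0.0] * 8, [0.0] * 8
D, C = update_ecms(D, C, expansion, label=1)
D_same, C_same = update_ecms(D, C, expansion, label=0)
```

With a forgetting factor below 1, older pairs are gradually discounted relative to new ones; with the default values every learned pair is weighted equally.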

Sums Inline graphic of orthogonal expansions (OEs), and ECMs, Inline graphic and Inline graphic, u = 1,…, U, are respectively assembled into a general orthogonal expansion (GOE) Inline graphic and general expansion correlation matrices (GECMs), Inline graphic and Inline graphic, for PU(n) (the PU on the feature subvector index n) as follows:

graphic file with name M560.gif 39
graphic file with name M561.gif 40
graphic file with name M562.gif 41

If these are used in PU(n), Fig. 2 should be modified: Inline graphic should be replaced with Inline graphic. Inline graphic and Inline graphic in Fig. 2 should denote those Inline graphic and Inline graphic above. Note that the input feature subvector Inline graphic in Fig. 2 is not adequate. It should be replaced with Inline graphic to provide the RTS suite of Inline graphic on Inline graphic for u = 1,…, U.

Spike trains for each exogenous feature vector

Recall that a ternary vector Inline graphic output from a processing unit, PU(n), is obtained by converting a representation Inline graphic of a probability distribution of a label Inline graphic of a feature subvector Inline graphic. The spike generator in PU(n) uses a pseudo-random number generator to do the conversion. If some components of Inline graphic are greater than −1 and less than 1, then the corresponding components of Inline graphic generated by the pseudo-random number generator contain uncertainty (i.e., pseudo-randomness), which reflects probabilistic information contained in Inline graphic.

In retrieving, when a PU receives a feature subvector with such components with uncertainty, it uses masking matrices or general masking matrices to suppress or “filter out” those components that make the received feature subvector inconsistent with those stored in its ECMs or GECMs and to find a match between the received feature subvector and feature subvectors stored in those ECMs or GECMs. Masking matrices are described in "Masking matrices".

However, there is a chance for pseudo-random number generators to generate a ternary vector Inline graphic that is an outlier for the probability distribution Inline graphic. As Inline graphic is used as a label in unsupervised learning in PU(n) and is fed forward or fed back as input to PUs, such an outlier may have undesirable effects on learning and retrieving of THPAM in spite of masking matrices. To minimize such undesirable effects and to represent the subjective probabilities involved in the PUs, we let a THPAM complete a certain number of rounds of retrieving and learning for each exogenous feature vector Inline graphic so that many versions of Inline graphic are generated and learned by each PU for the same Inline graphic.

The subscript t or τ in Inline graphic, Inline graphic, and Inline graphic denotes the time or numbering of the quantities going through PU(n) in Section "A recurrent multilayer network of processing units" and in Sections "Expansion correlation matrices" to "Learning to recognize rotated, translated or scaled patterns". In the rest of this section, assume that each exogenous feature vector is presented to THPAM for one unit of time, and that during this one unit of time, there are ζ spikes in each spike train. Here, the subscript t or τ denotes only the time the exogenous feature vector Inline graphic or Inline graphic arrives at the input terminals of THPAM. For each exogenous feature vector Inline graphic, ζ rounds of retrieving and learning are performed by THPAM at times τ + i/ζ, i = 0, 1,…, ζ −1. Consequently, PU(n) generates a sequence of ternary vectors denoted by Inline graphic, i = 0, 1,…, ζ −1, for each exogenous feature vector Inline graphic. This sequence consists of R spike trains, each having ζ spikes, each lasting 1/ζ unit of time.

A feedback connection from layer l +k to layer l for k ≥ 0 must have at least one delay device to ensure stability. Each delay device holds a spike for 1/ζ unit of time before it is allowed to pass. Causes in patterns, temporal or spatial, usually form a hierarchy. Example 1: Phonemes, words, phrases, sentences, and paragraphs in speech. Example 2: Notes, intervals, melodic phrases, and songs in music. Example 3: Bananas, apples, peaches, salt shaker, pepper shaker, Tabasco, fruit basket, condiment tray, table, refrigerator, water sink, and kitchen in a house. Note that although Example 3 is a spatial hierarchy, when one looks around in the kitchen, the images received by the person’s retina form a temporal hierarchy.

The higher a layer in THPAM is, the higher in the hierarchy the causes the PUs in the layer treat, and the more time it takes for the causes to form and be recognized by the PUs. Therefore, the number of delay devices on a feedback connection is a monotone increasing function of k. This requirement is consistent with the workings in a biological neural network in the cortex. Note that it takes time (1) for PUs to process feature subvectors, (2) for spikes to travel along feedforward connections from a layer to the next layer, and (3) for spikes to travel along feedback connections from a layer to the same or lower-numbered layer. Note also that the times taken for (1) and (2) can be ignored in the feedforward connections, because the subscripts of the input vector Inline graphic and output vector Inline graphic of all layers are the same. However, a feedback Inline graphic from layer l +k to layer l for inclusion in Inline graphic must have a delay j that is proportional to the sum of the times taken for (1), (2) and (3) from the input terminals of layer l back to the same input terminals.

For illustration, two examples are given in the following:

Example 5 Let us set the number Nd of delay devices on the feedback connection from layer l +k to layer l and the number ζ of learning and retrieving rounds to 1 + 2k and 8, respectively, and consider the feedback connections for k = 0, 2, 3 and 7. Figure 5 shows layer l +2 and layer l in the THPAM. The feature subvector Inline graphic input to layer l consists of Inline graphic fed forward from layer l −1 and feedbacks from layer l and layers above it. Only the feedback Inline graphic from layer l and the feedback Inline graphic from layer l +2 are shown in the figure. The boxes containing 1/ζ are delay devices with delay duration 1/ζ.

  • On the feedback connection from layer l to layer l (k = 0): There is 1 delay device on the connection. At time τ, when the exogenous feature vector Inline graphic arrives, the feedback Inline graphic is the last output from layer l in response to the preceding exogenous feature vector Inline graphic. At time τ + i/ζ for i = 1,…, 7, the feedback is Inline graphic, which is an output from layer l in response to the same exogenous feature vector Inline graphic. This feedback connection is shown on the right side of Fig. 5.

  • On the feedback connection from layer l +2 to layer l (k = 2): There are 5 delay devices on the connection. At time τ, the exogenous feature vector Inline graphic arrives at the input terminals of THPAM, and the 5 delay devices on the feedback connection hold the 5 feedbacks, Inline graphic, Inline graphic, which are outputs from layer l +2 in response to the preceding exogenous feature vector Inline graphic. During the presence of Inline graphic, these 5 feedbacks are respectively included in the first 5 feature vectors, Inline graphic, Inline graphic,…,Inline graphic, input to layer l. The next 3 inputs, Inline graphic, Inline graphic, Inline graphic, to layer l include respectively the feedbacks, Inline graphic, Inline graphic, Inline graphic, output from layer l +2 in response to Inline graphic. This feedback connection is shown on the right side of Fig. 5.

  • On the feedback connection from layer l +3 to layer l (k = 3): There are 7 delay devices on the connection. At time τ, the exogenous feature vector Inline graphic arrives at the input terminals of THPAM, and the 7 delay devices on the feedback connection hold the 7 feedbacks, Inline graphic,…,Inline graphic,…,Inline graphic, which are outputs from layer l +3 in response to the preceding exogenous feature vector Inline graphic. During the presence of Inline graphic, these 7 feedbacks are respectively included in the first 7 feature vectors, Inline graphic, Inline graphic, Inline graphic, input to layer l. The eighth input Inline graphic to layer l includes the feedback Inline graphic output from layer l +3 in response to Inline graphic.

  • On the feedback connection from layer l +7 to layer l (k = 7): There are 15 delay devices on the connection. At time τ, the exogenous feature vector Inline graphic arrives at the input terminals of THPAM, and the 15 delay devices on the feedback connection hold the 15 feedbacks, Inline graphic,…,Inline graphic. The first 7 of them are outputs from layer l +7 in response to the exogenous feature vector Inline graphic. The next one, Inline graphic, is the first output from layer l +7 in response to the exogenous feature vector Inline graphic. During the presence of Inline graphic, these 8 feedbacks are respectively included in the 8 feature vectors, Inline graphic, Inline graphic, input to layer l in response to Inline graphic.
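The bookkeeping in the four cases above can be checked with a short sketch. It assumes that time is counted in steps of 1/ζ, that τ is an integer, and that the output of round j in response to the exogenous vector arriving at time τ is produced at time τ + j/ζ; the function names are hypothetical.

```python
def feedback_source(tau, i, k, zeta=8):
    """Return (arrival time of the exogenous vector, round index) of the output
    that reaches layer l as feedback from layer l+k at time tau + i/zeta."""
    n_delays = 1 + 2 * k            # Nd = 1 + 2k delay devices, each 1/zeta long
    step_now = tau * zeta + i       # current time in units of 1/zeta
    return divmod(step_now - n_delays, zeta)


def schedule(tau, k, zeta=8):
    """Feedback sources for the zeta rounds performed while x_tau is present."""
    return [feedback_source(tau, i, k, zeta) for i in range(zeta)]


for k in (0, 2, 3, 7):
    print(k, schedule(tau=10, k=k))
```

For k = 2, for instance, the first five rounds receive outputs produced in response to the preceding exogenous vector and the last three receive outputs produced in response to the current one, in agreement with the second bullet.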

During the presence of the exogenous feature vector Inline graphic, the feedbacks, output from layer l +k in response to Inline graphic, provide spatial associative information; and the feedbacks, output from layer l +k in response to Inline graphic provide less spatial and more temporal associative information. The further back that feedbacks are from, the less spatial and more temporal associative information is used in processing the current Inline graphic. Of course, if an exogenous feature vector is presented to THPAM for a large number of time units, all the feedbacks are actually from the same exogenous feature vector, and spatial associative information is thoroughly utilized by the use of the feedback connections.

Conclusion

The temporal hierarchical probabilistic associative memory (THPAM), proposed in this paper, is the only single mathematical model of biological neural networks that has all the eight features and answers coherently all the eight questions listed in the introductory section "Introduction". John von Neumann said: “We require exquisite numerical precision over many logical steps to achieve what brains accomplish in very few short steps” in his well-known 1958 book, The Computer and the Brain (von Neumann 1958). Showing that it is possible to achieve so many functions of biological neural networks in a few short logical steps by a single functional model is a small but perhaps significant step towards unraveling the brain.

THPAM’s mathematical structures, functions and their processing operations are hypothesized to be low-order approximates of those of biological neural networks. The integration of them, THPAM, is hypothesized to be a low-order approximate of the biological neural networks themselves. These hypotheses have been under examination. Insight into the inner workings and interactions of the components of a biological neural network is expected to be gained through the examination.

The work reported in this paper points to three research directions:

  1. Examine the components and processing operations of THPAM as biological hypotheses. If possible, justify these hypotheses to establish THPAM as a macroscopic model or low-order approximate of biological neuronal models.

  2. Expand and modify THPAM into functional models of the visual, auditory, somatosensory and (premotor, primary and supplementary) motor cortices.

  3. Test and apply THPAM to such applications as face detection and recognition, radiograph reading, baggage examination, financial time series prediction, video monitoring, text understanding, prostheses, etc.

Abbreviations

ECM: Expansion correlation matrix
FSI: Feature subvector index
GECM: General expansion correlation matrix
GOE: General orthogonal expansion
NXOR: Not-exclusive-or
OE: Orthogonal expansion
PU: Processing unit
PU(n): Processing unit on feature subvector index n
RTS: Rotation, translation and scaling
SPD: Subjective probability distribution
THPAM: Temporal hierarchical probabilistic associative memory
XOR: Exclusive-or

Footnotes

An erratum to this article can be found at http://dx.doi.org/10.1007/s11571-010-9127-8

References

  1. Arbib MA. The handbook of brain theory and neural networks. 2nd ed. Cambridge: The MIT Press; 2003.
  2. Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In: Advances in neural information processing systems. Cambridge: MIT Press; 2007.
  3. Bengio Y, LeCun Y. Scaling learning algorithms towards AI. In: Bottou et al., editors. Large-scale kernel machines. Cambridge: The MIT Press; 2007.
  4. Bishop CM. Pattern recognition and machine learning. New York: Springer Science; 2006.
  5. Dayan P, Abbott LF. Theoretical neuroscience: computational and mathematical modeling of neural systems. Cambridge: The MIT Press; 2001.
  6. Desjardins G, Bengio Y. Empirical evaluation of convolutional RBMs for vision. Technical Report 1327. Département d’Informatique et de Recherche Opérationnelle, Université de Montréal; 2008.
  7. Erhan D, Bengio Y, Courville A, Manzagol P, Vincent P, Bengio S. Why does unsupervised pre-training help deep learning? J Mach Learn Res. 2010 (to appear).
  8. Fromherz P, Gaede V. Exclusive-or function of single arborized neuron. Biol Cybern. 1993;69:337–344. doi: 10.1007/BF00203130.
  9. Geist J, Wilkinson R, Janet S, Grother P, Hammond B, Larsen N, Klear R, Matsko M, Burges C, Creecy R, Hull J, Vogl, Wilson C. The second census optical character recognition systems conference. Technical Report NIST 5452. National Institute of Standards and Technology; May 1994.
  10. George D, Hawkins J. Towards a mathematical theory of cortical micro-circuits. PLoS Comput Biol. 2009;5(10):1–26. doi: 10.1371/journal.pcbi.1000532.
  11. Granger R. Engines of the brain: the computational instruction set of human cognition. AI Mag. 2006;27:15–31.
  12. Grossberg S. Towards a unified theory of neocortex: laminar cortical circuits for vision and cognition. Prog Brain Res. 2007;165:79–104. doi: 10.1016/S0079-6123(06)65006-1.
  13. Hassoun MH. Associative neural memories, theory and implementation. New York: Oxford University Press; 1993.
  14. Hawkins J. On intelligence. New York: Henry Holt and Company; 2004.
  15. Haykin S. Neural networks and learning machines. 3rd ed. Upper Saddle River: Prentice Hall; 2009.
  16. Hecht-Nielsen R. Confabulation theory. New York: Springer; 2007.
  17. Hecht-Nielsen R, McKenna T. Computational models for neuroscience. New York: Springer; 2003.
  18. Hinton GE, Anderson JA. Parallel models of associative memory. Hillsdale: Lawrence Erlbaum Associates; 1989.
  19. Hinton GE, Osindero S, Teh Y. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18:1527–1554. doi: 10.1162/neco.2006.18.7.1527.
  20. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313:504–507. doi: 10.1126/science.1127647.
  21. Kandel ER, Schwartz JH, Jessell TM. Principles of neural science. New York: McGraw-Hill; 2000.
  22. Koch C. Biophysics of computation. Oxford: Oxford University Press; 1999.
  23. Kohonen T. Self-organization and associative memory. New York: Springer; 1988.
  24. LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989;1(4):541–551. doi: 10.1162/neco.1989.1.4.541.
  25. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–2324.
  26. Levitan IB, Kaczmarek LK. Simulated annealing and boltzmann machines. London: Wiley; 1993.
  27. Martin KAC. Microcircuits in visual cortex. Curr Opinion Neurobiol. 2002;12(4):418–425. doi: 10.1016/S0959-4388(02)00343-4.
  28. Mel BW. Information processing in dendritic trees. Neural Comput. 1994;6:1031–1085. doi: 10.1162/neco.1994.6.6.1031.
  29. Mel BW. Have we been hebbing down the wrong path? Neuron. 2002;34:275–288. doi: 10.1016/S0896-6273(02)00669-4.
  30. Mountcastle VB. An organizing principle for cerebral function: the unit model and the distributed system. In: Edelman GM, Mountcastle VB, editors. The mindful brain. Cambridge: MIT Press; 1978.
  31. Oja E. A simplified neuron model as a principal component analyzer. J Math Biol. 1982;15:267–273. doi: 10.1007/BF00275687.
  32. O’Reilly R, Munakata Y. Computational explorations in cognitive neuroscience. Cambridge: The MIT Press; 2000.
  33. Principe JC, Euliano NR, Lefebvre WC. Neural and adaptive systems: fundamentals through simulations. New York: Wiley; 2000.
  34. Ranzato M, Huang F, Boureau Y, LeCun Y. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In: Proceedings of computer vision and pattern recognition conference (CVPR 2007). New York: IEEE Press; 2007.
  35. Rieke F, Warland D, de Ruyter van Steveninck R, Bialek W. Spikes: exploring the neural code. Cambridge: The MIT Press; 1999.
  36. Salakhutdinov R, Hinton G. Deep Boltzmann machines. In: Proceedings of the 12th international conference on artificial intelligence and statistics (AISTATS), vol 5. Clearwater Beach, Florida; 2009. pp 448–455.
  37. Simard P, Steinkraus D, Platt JC. Best practices for convolutional neural networks. In: Proceedings of the seventh international conference on document analysis and recognition. 2003;2:958–962.
  38. Slepian D. A class of binary signaling alphabets. Bell Syst Tech J. 1956;35:203.
  39. Stuart G, Spruston N, Häusser M. Dendrites. 2nd ed. New York: Oxford University Press; 2008.
  40. von Neumann J. The computer and the brain. New Haven: Yale University Press; 1958.
  41. Wilkinson R, Geist J, Janet S, Grother P, Burges C, Creecy R, Hammond B, Hull J, Larsen N, Vogl, Wilson C. The first census optical character recognition systems conference. Technical Report NIST 4912. National Institute of Standards and Technology; August 1992.
  42. Zador AM, Claiborne BJ, Brown TH. Nonlinear pattern separation in single hippocampal neurons with active dendritic membrane. In: Moody J, Lippmann R, editors. Advances in neural information processing systems, vol 4. San Mateo: Morgan Kaufmann; 1992. pp 51–58.

Articles from Cognitive Neurodynamics are provided here courtesy of Springer Science+Business Media B.V.
