Information Loss Associated with Imperfect Observation and Mismatched Decoding

Masafumi Oizumi; Masato Okada; Shun-Ichi Amari

doi:10.3389/fncom.2011.00009

2011 Mar 2;5:9.

PMCID: PMC3084443 PMID: 21629857

Abstract

We consider two causes of information loss when neural activities are passed and processed in the brain. One is that the responses of upstream neurons to stimuli are imperfectly observed by downstream neurons. The other is that downstream neurons non-optimally decode the stimulus information contained in the activities of the upstream neurons. To investigate the importance of neural correlation in information processing in the brain, we specifically consider two situations. One is when neural responses are not simultaneously observed, i.e., neural correlation data are lost. This situation means that stimulus information is decoded without any specific assumption about neural correlations. The other is when stimulus information is decoded by a wrong statistical model in which neural responses are assumed to be independent even when they are not. We provide an information geometric interpretation of these two types of information loss and clarify their relationship. We then concretely evaluate these types of information loss in some simple examples. Finally, we discuss how these evaluations of information loss can be used to elucidate the importance of correlation in neural information processing.

Keywords: Fisher information, information loss, imperfect observation, mismatched decoding, information geometry, correlated activity, coincidence detection

Introduction

Neurons in early sensory areas represent the information of various stimuli from the external world by their noisy activities. The noise inherent in neural activities needs to be properly handled by the nervous system for the information to be accurately processed. One simple but powerful means for coping with the neural noise is population coding. Neurophysiological experiments have shown that many neurons with different selectivities respond to particular stimuli. These findings suggest that the nervous system represents information through population activities, which would be helpful for accurate information processing. This coding scheme for stimulus information is known as population coding.

An important feature of population coding is that the activity of neurons is correlated (Gray et al., 1989; Gawne and Richmond, 1993; Zohary et al., 1994; Meister et al., 1995; Lee et al., 1998; Ishikane et al., 2005; Averbeck et al., 2006; Ohiorhenuan et al., 2010; but see Ecker et al., 2010). A crucial question is how much information the nervous system can extract from correlated population activities. In general, it is difficult for the nervous system to extract the maximum amount of information from correlated activities because two conditions must be satisfied. First, downstream neurons must perfectly observe the responses of the upstream neurons (perfect observation of neural responses). Second, downstream neurons must optimally decode the information from the observed neural responses (optimal decoding of stimulus). In other words, if the observation is imperfect or the decoding is non-optimal, both of which are likely in the nervous system, stimulus information is inevitably degraded. In this work, we discuss the amount of information loss associated with imperfect observation and non-optimal decoding.

With regard to non-optimal decoding, several researchers including the authors of this paper have investigated how much information would be lost if neural correlation is ignored in decoding (Nirenberg et al., 2001; Wu et al., 2001; Golledge et al., 2003; Averbeck and Lee, 2006; Oizumi et al., 2010). This type of decoding is called mismatched decoding (Merhav et al., 1994; Oizumi et al., 2010) because the decoding does not match the actual neural activities, i.e., the actual neural activities are correlated but the correlations are ignored in decoding. With regard to imperfect observation, this work is the first to address this issue. As an example of imperfect observation, we specifically consider the situation that the activity of neurons is not simultaneously observed by downstream neurons (Figure 1). This is related to whether the coincidence detector plays an important role in information processing in the brain (Abeles, 1982; König et al., 1996). If a large proportion of the total information is lost when the responses of neurons are not simultaneously observed, coincidence detection would be necessary for accurate information processing.

**Schematic of simultaneous and non-simultaneous observation of neural responses**.

The framework of mismatched decoding was introduced to quantify the importance of correlated activity, because this importance can be quantified by the amount of information lost when neural correlation is ignored in decoding (Nirenberg et al., 2001; Wu et al., 2001; Nirenberg and Latham, 2003). Similarly, the importance of correlated activity can also be measured by the amount of information lost due to non-simultaneous observation. We therefore obtain two different measures of the importance of correlated activity by introducing the concepts of imperfect observation and mismatched decoding. We aim to clarify the relationship between the two measures and provide a simple guide on how to use them.

Combining the concepts of imperfect observation and mismatched decoding produces four types of situations where stimulus is inferred: (1) observation is perfect and decoding is optimal; (2) decoding is optimal but observation is imperfect; (3) observation is perfect but decoding is mismatched; and (4) observation is imperfect and decoding is mismatched. We discuss the inferences in these four types of situations from the viewpoint of information geometry (Amari and Nagaoka, 2000) and then clarify their relationships. We also specifically compute the amount of information obtained through these four inference types for neural responses described by the Gaussian model and by a binary probabilistic model.

This paper is organized as follows. First, we introduce the concept of an exponential family and two probability distributions that belong to the exponential family, i.e., the Gaussian distribution and the binary probabilistic model, which have both been intensively investigated as representative models of neural responses (Abbott and Dayan, 1999; Amari, 2001; Nakahara and Amari, 2002). Second, we provide information geometric interpretation of the four types of inference mentioned above and describe how to evaluate the information in each of the four types by using the Fisher information. Third, we compute the amount of information in each of the four types of inference in the Gaussian model and explain the relationship between the inference with imperfect observation and that with mismatched decoding. Fourth, we also compute the amount of information obtained by the four types of inference in simple binary probabilistic models. Finally, we summarize the results and discuss how to use the two measures introduced in this study for quantifying the importance of neural correlation and mention some of the future directions of this work.

Exponential Family of Probability Distributions

Let us denote the conditional probability distribution for a neural response r = (r₁, r₂,…,r_N) over a population of N neurons being evoked by a stimulus s as p(r; s), where s is a continuous variable. We assume that p(r; s) belongs to the exponential family. Probability distributions that belong to an exponential family can be written as

p (r; s) = \exp (\sum_{I = 1}^{n} R_{I} (r) θ_{I} (s) - ψ (s)),

(1)

where R_I(r) is a function of neural responses r, θ_I(s) is a function of stimulus s, and the normalization constant ψ(s) is also a function of stimulus s. θ_I(s) is called the natural parameter of the exponential family. Two examples of probability distributions are investigated in this paper.

Example 1. Gaussian distribution

The number of spikes emitted by a neuron over a fixed time period (time-averaged rate) or the total number of spikes emitted by a population of neurons (population-averaged rate), which is denoted by r, may be described by the Gaussian distribution

p (r; s) = \frac{1}{\sqrt{{(2 π)}^{N} \det [C (s)]}} \exp (- \frac{1}{2} {(r - f (s))}^{T} C^{- 1} (s) (r - f (s))),

(2)

where N is the number of neurons, f(s) is the average number of spikes, and C(s) is the covariance matrix. If we rewrite Eq. 2 as follows, we can see that the Gaussian distribution belongs to the exponential family

\begin{array}{l} p (r; s) = \exp (C^{- 1} (s) f (s) \cdot r - \frac{1}{2} C^{- 1} (s) \cdot r r^{T} - \frac{1}{2} \log \det [C (s)] \\ - \frac{1}{2} f^{T} (s) C^{- 1} (s) f (s)) . \end{array}

(3)

In this case,

R_{1} = {r_{i}} = {r_{1}, \dots, r_{N}},

(4)

R_{2} = {r_{i} r_{j}} = {r_{1} r_{1}, r_{1} r_{2}, \dots, r_{N - 1} r_{N}, r_{N} r_{N}},

(5)

θ_{1} = {\sum_{j = 1}^{N} C_{i j}^{- 1} (s) f_{j} (s)},

(6)

θ_{2} = {- \frac{1}{2} C_{i j}^{- 1} (s)},

(7)

ψ (s) = \frac{1}{2} \log \det [C (s)] + \frac{1}{2} f^{T} (s) C^{- 1} f (s) .

(8)

Example 2. Log-linear model of binary neural response

When we analyze neural responses within a short time period (∼1–10 ms, typically), the neural responses are considered stochastic binary variables: r_i = 1 when the ith neuron fires within the time bin, and r_i = 0 when it does not. The joint distributions of N random binary variables can be generally written in the following form (Amari, 2001; Nakahara and Amari, 2002):

\begin{array}{l} p (r; s) \\ = \exp (\sum_{i = 1}^{N} θ_{i} (s) r_{i} + \sum_{i < j} θ_{i j} (s) r_{i} r_{j} + \dots + θ_{12 \dots N} (s) r_{1} r_{2} \dots r_{N} - ψ (s)), \end{array}

(9)

where the normalization constant ψ(s) is given by

ψ (s) = - \log p (r_{1} = r_{2} = \dots = r_{N} = 0; s) .

(10)

This probability distribution is clearly in the exponential family form. In this case,

\begin{array}{l} R_{1} = {r_{i}} = {r_{1}, \dots, r_{N}}, \\ R_{2} = {r_{i} r_{j}}_{i < j} = {r_{1} r_{2}, r_{1} r_{3}, \dots, r_{N - 1} r_{N}}, \\ \dots, \\ R_{N} = {r_{1} r_{2} \dots r_{N}}, \end{array}

(11)

\begin{array}{l} θ_{1} = {θ_{i} (s)}, \\ θ_{2} = {θ_{i j} (s)}, \\ \dots, \\ θ_{N} = {θ_{12 \dots N} (s)} . \end{array}

(12)

Recent investigations have shown that the observed statistics of neural responses can be sufficiently captured by this type of probabilistic model, which contains up to second-order correlation terms (Schneidman et al., 2006; Shlens et al., 2006; Tang et al., 2008), although the importance of higher-order correlations has also been discussed (Amari et al., 2003; Montani et al., 2009; Ohiorhenuan et al., 2010). For simplicity, we consider only the second-order correlations and ignore higher-order correlations in this work, i.e.,

θ_{I} = 0, (for I > 2) .

(13)
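The pairwise model of Eqs. 9 and 13 can be sketched numerically for a small population. The following is a minimal illustration, not the authors' code: the function name `pairwise_loglinear` and the parameter values are hypothetical, and the model is enumerated exhaustively, which is only feasible for small N. It checks that the normalization constant agrees with Eq. 10.

```python
import itertools
import math

def pairwise_loglinear(theta1, theta2):
    """Enumerate p(r) for the log-linear model truncated at second order.

    theta1: list of first-order parameters theta_i (hypothetical values).
    theta2: dict mapping (i, j) with i < j to second-order parameters theta_ij.
    Returns the probability table {r: p(r)} and the normalization constant psi.
    """
    n = len(theta1)
    weights = {}
    for r in itertools.product((0, 1), repeat=n):
        exponent = sum(theta1[i] * r[i] for i in range(n))
        exponent += sum(theta2[(i, j)] * r[i] * r[j]
                        for i in range(n) for j in range(i + 1, n))
        weights[r] = math.exp(exponent)
    z = sum(weights.values())
    psi = math.log(z)  # exp(-psi) normalizes the distribution
    probs = {r: w / z for r, w in weights.items()}
    return probs, psi

# Illustrative parameters for N = 3 neurons (assumed, not taken from data).
theta1 = [-1.0, -0.5, -1.5]
theta2 = {(0, 1): 0.8, (0, 2): 0.0, (1, 2): -0.3}
probs, psi = pairwise_loglinear(theta1, theta2)

# Eq. 10: psi equals minus the log-probability of the all-silent pattern.
silent = (0, 0, 0)
print(psi, -math.log(probs[silent]))
```

Because the exponent vanishes for the all-silent pattern, its probability is exp(−ψ), which is exactly the statement of Eq. 10.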

Inference of Stimulus and Fisher Information

We consider the inference problem of how accurately the stimulus value s can be estimated when the stochastic neural response r is given. We assume that neural response r is observed many times. The neural response at the tth trial is denoted by r(t) and the number of trials by T. If each neural response is independent and identically distributed, the probability distribution for T observations of neural responses is given by

p_{T} (r^{T}; s) = \prod_{t = 1}^{T} p (r (t); s),

(14)

= \exp (T (\sum_{I = 1}^{2} {\bar{R}}_{I} (r^{T}) \cdot θ_{I} (s) - ψ (s))),

(15)

where ${\bar{R}}_{I}$ are sufficient statistics for the probability distribution and are given by

{\bar{R}}_{1} = {{\bar{r}}_{i}}, \quad {\bar{r}}_{i} = \frac{1}{T} \sum_{t = 1}^{T} r_{i} (t),

(16)

{\bar{R}}_{2} = {{\bar{r}}_{i j}}, \quad {\bar{r}}_{i j} = \frac{1}{T} \sum_{t = 1}^{T} r_{i} (t) r_{j} (t) .

(17)
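The role of the trial-averaged statistics in Eqs. 16 and 17 can be illustrated with a short sketch. It assumes the pairwise binary model of Example 2 with hypothetical parameter values and a hand-written list of trials; the helper names are illustrative only. It compares the log-likelihood summed over trials (Eq. 14) with the value computed from the averaged statistics alone, which is T times the single-trial exponent evaluated at the averages.

```python
import itertools
import math

def log_prob(r, theta1, theta2, psi):
    """Single-trial log-probability of the pairwise binary model."""
    n = len(r)
    exponent = sum(theta1[i] * r[i] for i in range(n))
    exponent += sum(theta2[(i, j)] * r[i] * r[j]
                    for i in range(n) for j in range(i + 1, n))
    return exponent - psi

def log_partition(theta1, theta2):
    """Normalization constant psi obtained by exhaustive enumeration."""
    n = len(theta1)
    return math.log(sum(math.exp(log_prob(r, theta1, theta2, 0.0))
                        for r in itertools.product((0, 1), repeat=n)))

# Hypothetical parameters and observations for two neurons.
theta1 = [-1.0, -0.5]
theta2 = {(0, 1): 0.8}
psi = log_partition(theta1, theta2)
trials = [(0, 0), (1, 0), (1, 1), (0, 1), (1, 1), (0, 0)]
T = len(trials)
n = len(theta1)

# Averaged first- and second-order statistics (Eqs. 16 and 17).
r_bar = [sum(r[i] for r in trials) / T for i in range(n)]
rr_bar = {key: sum(r[key[0]] * r[key[1]] for r in trials) / T for key in theta2}

# Log-likelihood from the raw trials versus from the averages only.
direct = sum(log_prob(r, theta1, theta2, psi) for r in trials)
via_stats = T * (sum(theta1[i] * r_bar[i] for i in range(n))
                 + sum(theta2[key] * rr_bar[key] for key in theta2)
                 - psi)
print(direct, via_stats)
```

The two numbers agree, which is what it means for the averages to be sufficient: the order of the trials and any other detail of the raw data do not enter the likelihood.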

We evaluate the accuracy of the estimate by using the Fisher information,

g (s) = - E_{p_{T}} [\frac{d^{2} l_{s}}{d s^{2}}] = - \int d r^{T} p_{T} (r^{T}; s) \frac{d^{2} \log p_{T} (r^{T}; s)}{d s^{2}},

(18)

where l_s = log p_T(r^T; s) and $E_{p_{T}}$ denotes the expectation with respect to the distribution p_T(r^T; s). Through the Cramér–Rao bound, the Fisher information bounds the average squared decoding error for an unbiased estimate as follows:

E_{p_{T}} [{(s - \hat{s})}^{2}] \geq \frac{1}{g (s)},

(19)

where s is the true stimulus value and $\hat{s}$ is the estimate. Since the inverse of the Fisher information is only a lower bound on the mean squared error, the behavior of the mean squared error and that of the Fisher information can differ in general (Brunel and Nadal, 1998; Yaeli and Meir, 2010). However, the maximum likelihood estimator, which takes as its estimate the value of s that maximizes the likelihood function p_T(r^T; s), achieves the Cramér–Rao bound (Eq. 19) as T → ∞. A Bayesian estimator, which is generally biased, can also achieve the Cramér–Rao bound as T → ∞ because it becomes equivalent to the maximum likelihood estimator in that limit.

We compute the Fisher information with respect to stimulus s from the Fisher information matrix with respect to the natural parameters,

g_{I J} (θ) = E_{p_{T}} [\partial_{I} l_{θ} \partial_{J} l_{θ}],

(20)

= - E_{p_{T}} [\partial_{I} \partial_{J} l_{θ}],

(21)

where l_θ = log p_T(r^T; θ) and ∂_I = ∂/∂θ_I. From the definition of the Fisher information matrix with respect to the natural parameters (Eqs. 20 and 21), its components are given by

g_{I J} (θ) = T^{2} E_{p_{T}} [({\bar{R}}_{I} - η_{I}) ({\bar{R}}_{J} - η_{J})],

(22)

= T \partial_{I} \partial_{J} ψ (θ),

(23)

where

η_{I} = E_{p_{T}} [{\bar{R}}_{I}] = E_{p} [R_{I}] .

(24)

By using g_IJ(θ), the Fisher information with respect to the stimulus s can be written as

g (s) = \sum_{I, J} g_{I J} (θ) \frac{\partial θ_{I}}{\partial s} \frac{\partial θ_{J}}{\partial s} .

(25)

The Fisher information g(s) described above determines the accuracy of the estimate of s under two conditions: (1) all sufficient statistics ${\bar{R}}_{I}$ are available, and (2) the likelihood function p(r; s) is exactly known. Regarding the first condition, downstream neurons may not be able to simultaneously access the responses of all upstream neurons. This imperfect observation of neural responses by downstream neurons leads to loss of information. Similarly, regarding the second condition, downstream neurons are unlikely to completely know the likelihood function p(r; s). Downstream neurons are more likely to only partially know p(r; s) and to decode the stimulus based on a decoding model q(r; s), which is not equal to p(r; s) but partially matches p(r; s) (Nirenberg et al., 2001; Wu et al., 2001; Oizumi et al., 2010). This mismatched decoding of stimuli by downstream neurons also results in loss of information. These two types of information loss are evaluated next.

Information Loss Caused by Imperfect Observation of Neural Responses

In this work, we specifically consider the situation that second-order sufficient statistics ${\bar{R}}_{2}$ are not accessible to downstream neurons and only first-order sufficient statistics ${\bar{R}}_{1}$ are available to them. This is related to whether coincidence detector neurons are needed to accurately estimate the stimulus. To evaluate the loss of information associated with loss of data, we first marginalize the joint probability distribution $p ({\bar{R}}_{1}, {\bar{R}}_{2}; s)$ over ${\bar{R}}_{2} :$

p ({\bar{R}}_{1}; s) = \int d {\bar{R}}_{2} p ({\bar{R}}_{1}, {\bar{R}}_{2}; s) .

(26)

When only ${\bar{R}}_{1}$ is observed, the Fisher information with respect to stimulus s is given by

g^{L} (s) = - \int d {\bar{R}}_{1} p ({\bar{R}}_{1}; s) \frac{d^{2} \log p ({\bar{R}}_{1}; s)}{d s^{2}} .

(27)

The information loss associated with loss of ${\bar{R}}_{2}$ is

Δ g = g (s) - g^{L} (s) .

(28)

Information Loss Caused by Mismatched Decoding of Stimulus

We evaluate the loss of information when downstream neurons infer the stimulus parameter s based not on the correct probability distribution p(r; s) but on a mismatched probability distribution q(r; s). We assume that q(r; s) also belongs to the exponential family and that the maximum likelihood estimation based on q(r; s) is consistent, i.e.,

E_{p} [\frac{d l_{q} (r; s)}{d s}] = 0,

(29)

where E_p denotes the expectation with respect to p(r; s) and l_q(r; s) = log q(r; s). We evaluate the squared decoding error of the maximum likelihood estimation with the mismatched likelihood function q(r; s) based on T observations of neural responses, r^T. The estimate ${\hat{s}}_{q}$ is given by

{\hat{s}}_{q} = \underset{s}{\arg \max} l_{q_{T}} (r^{T}; s),

(30)

= \underset{s}{\arg \max} \sum_{t = 1}^{T} l_{q} (r (t); s) .

(31)

By differentiating $l_{q_{T}} (r^{T}; s)$ with respect to s at ${\hat{s}}_{q},$ we obtain the quasi-likelihood estimating equation

\sum_{t = 1}^{T} \frac{d l_{q} (r (t); {\hat{s}}_{q})}{d s} = 0.

(32)

Expanding the left-hand side of Eq. 32 to first order around the true value of the stimulus parameter s gives

\begin{array}{l} 0 = \sum_{t = 1}^{T} \frac{d l_{q} (r (t); {\hat{s}}_{q})}{d s} \\ \approx \sum_{t = 1}^{T} \frac{d l_{q} (r (t); s)}{d s} + \sum_{t = 1}^{T} \frac{d^{2} l_{q} (r (t); s)}{d s^{2}} ({\hat{s}}_{q} - s) . \end{array}

(33)

According to the central limit theorem, the first term on the right-hand side of Eq. 33 converges to a Gaussian distribution with mean 0 and variance TE_p[(dl_q(r; s)/ds)²] as T → ∞. From the weak law of large numbers, the coefficient of the second term on the right-hand side of Eq. 33 becomes

\sum_{t = 1}^{T} \frac{d^{2} l_{q} (r (t); s)}{d s^{2}} = T E_{p} [\frac{d^{2} l_{q} (r; s)}{d s^{2}}] .

(34)

Solving Eq. 33 for ${\hat{s}}_{q} - s$ and taking the expectation of its square, we have

E_{p} [{({\hat{s}}_{q} - s)}^{2}] = \frac{1}{T} \frac{E_{p} [{(\frac{d l_{q} (r; s)}{d s})}^{2}]}{{(E_{p} [\frac{d^{2} l_{q} (r; s)}{d s^{2}}])}^{2}} .

(35)

If we regard the inverse of the mean squared decoding error as the Fisher information, the Fisher information for the mismatched decoding model q(r; s) is

g^{*} (s) = T \frac{{(E_{p} [\frac{d^{2} l_{q} (r; s)}{d s^{2}}])}^{2}}{E_{p} [{(\frac{d l_{q} (r; s)}{d s})}^{2}]} .

(36)

Note that when q(r; s) = p(r; s), g*(s) = g(s).

Information Geometric Interpretations

We now discuss, from the information geometric viewpoint (Amari and Nagaoka, 2000), the inference of the stimulus when neural responses are only partially observed and when a mismatched probability distribution is used for decoding. We consider the following four types of inference.

Inference 1 (Perfect observation and matched decoding) The complete data $({\bar{R}}_{1}, {\bar{R}}_{2})$ are available and the true probability distribution p(r; s) is used for decoding.

Inference 2 (Imperfect observation) The true probability distribution p(r; s) is used for decoding but only partial data $({\bar{R}}_{1})$ are available.

Inference 3 (Mismatched decoding) The complete data $({\bar{R}}_{1}, {\bar{R}}_{2})$ are available but a mismatched probability distribution q(r; s) is used for decoding.

Inference 4 (Imperfect observation and mismatched decoding) A mismatched probability distribution q(r; s) is used for decoding and only partial data $({\bar{R}}_{1})$ are available.

We assumed that both the true probability distribution p(r; s) and the mismatched probability distribution q(r; s) belong to the exponential family of probability distributions S given in Eq. 1. S is specified by the n-dimensional natural parameters θ = (θ_1, θ_2, …, θ_n). If we take θ as a coordinate system introduced in the set S of probability distributions, we can regard S as an n-dimensional manifold (space). A point in S represents a specific probability distribution determined by the parameters θ. The true statistical model p(r; s), denoted by M, and the mismatched statistical model q(r; s), denoted by M*, both of which are parameterized by a single variable s, are considered as curves in the manifold S, i.e., one-dimensional submanifolds having a coordinate s.

Inference 1 First, we describe Inference 1 from the viewpoint of information geometry (Amari, 1982; Amari and Nagaoka, 2000). Let us denote the observed data $({\bar{R}}_{1}^{o}, {\bar{R}}_{2}^{o})$ by $\bar{x} .$ $\bar{x}$ can be considered as a point in S, which we call the observed point. The observed point $\bar{x}$ is distributed near the point s that represents the true probability distribution when stimulus s is presented, p(r; s). The deviation of $\bar{x}$ from the point specified by the true stimulus parameter s can be decomposed into the deviation in the parallel direction to M and the deviation in the orthogonal direction to M. The maximum likelihood estimation corresponds to the minimizer of the Kullback–Leibler divergence between the distribution corresponding to the observed point $\bar{x}$ (which is not in M in general) and distributions in M,

\hat{s} = \underset{s}{\arg \min} K L [\bar{x} ({\bar{R}}_{1}^{o}, {\bar{R}}_{2}^{o}) ∥ p (r; s)],

(37)

where $\hat{s}$ is the maximum likelihood estimator. The geometric interpretation of the maximum likelihood estimation is the orthogonal projection to M from $\bar{x}$ (Figure 2). The orthogonal projection completely eliminates the deviation of $\bar{x}$ from s in the direction orthogonal to M, but the deviation in the direction parallel to M remains. The size of this remaining deviation is determined by the Fisher information of M (Eq. 18). If we use other estimators that are not orthogonal projections to M (e.g., the moment estimator), the decoding error necessarily becomes larger than that of the orthogonal projection.

**Information geometric picture of Inference 1 (perfect observation and matched decoding)**.

Inference 2 In the Inference 2 case (Amari, 1995), only ${\bar{R}}_{1}$ is observed. Let us define a submanifold D, which is formed by the set of observed points where ${\bar{R}}_{1}$ is fixed at the observed value ${\bar{R}}_{1}^{o}$ but the unobserved ${\bar{R}}_{2}$ takes arbitrary values. Submanifold D is called the data submanifold. The maximum likelihood estimation based on partially observed data corresponds to searching for the pair of points $\hat{x} \in D$ and $\hat{s} \in M$ that minimizes the Kullback–Leibler divergence between them (Figure 3), i.e.,

**Information geometric picture of Inference 2 (imperfect observation)**.

\min K L (D ∥ M) = \min_{x \in D, s \in M} K L (x ∥ s) .

(38)

The estimated value of s is expressed as

{\hat{s}}_{d} = \underset{s}{\arg \min} [\min_{{\bar{R}}_{2}} K L [\hat{x} ({\bar{R}}_{1}^{o}, {\bar{R}}_{2}) ∥ p (r; s)]] .

(39)

For a given candidate point $\hat{x}$ in D, the point in M that minimizes the Kullback–Leibler divergence is given by the orthogonal projection of $\hat{x}$ to M.

Inference 3 For Inference 3 (Figure 4), we assumed that the maximum likelihood estimation based on a mismatched model q(r; s) is consistent (Eq. 29). This condition corresponds to the case where the point p(r; s) in M and the point q(r; s) in M*, which both represent a given stimulus parameter s, are the mutually nearest points in S in terms of the Kullback–Leibler divergence, i.e.,

**Information geometric picture of Inference 3 (mismatched decoding)**.

q (r; s) = \underset{q (r; s^{'})}{\arg \min} K L [p (r; s) ∥ q (r; s^{'})] .

(40)

Differentiating the Kullback–Leibler divergence in Eq. 40 with respect to s′ and requiring the derivative to vanish at s′ = s yields Eq. 29. As in Inference 1, the maximum likelihood estimation based on a mismatched model M* corresponds to the minimization of the Kullback–Leibler divergence between the observed point $\bar{x}$ and points in M*,

{\hat{s}}_{q} = \underset{s}{\arg \min} K L [\bar{x} ({\bar{R}}_{1}^{o}, {\bar{R}}_{2}^{o}) ∥ q (r; s)] .

(41)

This corresponds to the orthogonal projection of the observed point $\bar{x}$ to M* (Figure 4). The orthogonal projection to M* cannot completely eliminate the deviation in the direction perpendicular to M, unless M and M* are in parallel. Thus, information is inevitably lost depending on the angle between M and M*. The Fisher information for the mismatched decoding model is given by Eq. 36.

Inference 4 In Inference 4 (Figure 5), the maximum likelihood estimation with partial observed data ${\bar{R}}_{1}^{o}$ and a mismatched probability distribution q(r; s) corresponds to searching for two points in the data submanifold D and the mismatched model M*:

**Information geometric picture of Inference 4 (imperfect observation and mismatched decoding)**.

{\hat{s}}_{q d} = \underset{s}{\arg \min} [\min_{{\bar{R}}_{2}} K L [\hat{x} ({\bar{R}}_{1}^{o}, {\bar{R}}_{2}) ∥ q (r; s)]] .

(42)

Relationship between Inference with Partial Observed Data and Inference with Mismatched Decoding Model: Gaussian Case

In this section, we compute the Fisher information obtained by the four types of inference described in the previous section when the probability distributions are Gaussian. We also discuss the relationship between the inferences.

One-dimensional case

Before we deal with the multidimensional Gaussian model, we first consider the one-dimensional case as a toy example. We specifically consider the Gaussian distribution with mean μ(s) = s and variance σ²(s) = s²:

p (r) = \frac{1}{\sqrt{2 π s^{2}}} \exp (- \frac{1}{2 s^{2}} {(r - s)}^{2}) .

(43)

The statistical model M = {N(s, s²)} is expressed as a curve in the manifold S = {N(μ, σ²)} with coordinates of mean μ and variance σ² (Figure 6). The probability distribution on T observations of r is given by

**Information geometric picture of four types of inference in one-dimensional Gaussian model**.

p (\bar{r}, \bar{R}) = \exp (T (\frac{\bar{r}}{s} - \frac{\bar{R}}{2 s^{2}} - \log (\sqrt{2 π s^{2}}) - \frac{1}{2})),

(44)

where $\bar{r}$ and $\bar{R}$ are sufficient statistics

\bar{r} = \frac{1}{T} \sum_{t = 1}^{T} r (t),

(45)

\bar{R} = \frac{1}{T} \sum_{t = 1}^{T} r^{2} (t) .

(46)

In this model, we compute the Fisher information and the information loss in two cases: when the variance data $\bar{R}$ are lost, and when a decoding model whose variance does not match the actual one is used.

Inference 1 When the data of neural responses $\bar{x} = (\bar{r}, \bar{R})$ are completely observed and the actual statistical model M is used in decoding, the maximum likelihood estimation of s corresponds to the orthogonal projection from $\bar{x}$ to M (Figure 6). In this case, we can compute the Fisher information as

I_{F 1} (s; \bar{r}, \bar{R}) = \frac{3 T}{s^{2}} .

(47)
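The value in Eq. 47 can be checked by applying the chain rule of Eq. 25 to the natural parameters of this model, θ₁ = 1/s and θ₂ = −1/(2s²). The sketch below is only an illustration: the function name is hypothetical, and it uses the standard Gaussian moments Var[r] = σ², Cov[r, r²] = 2μσ², and Var[r²] = 4μ²σ² + 2σ⁴ for the per-trial covariance of the statistics (r, r²).

```python
def fisher_from_natural(s, T=1):
    """Evaluate Eq. 25 for the model N(s, s^2) with statistics (r, r^2)."""
    mu, var = s, s ** 2
    # Per-trial covariance matrix of (r, r^2), i.e. g_IJ / T.
    g = [[var, 2 * mu * var],
         [2 * mu * var, 4 * mu ** 2 * var + 2 * var ** 2]]
    # Derivatives of theta_1 = 1/s and theta_2 = -1/(2 s^2) with respect to s.
    dtheta = [-1.0 / s ** 2, 1.0 / s ** 3]
    return T * sum(g[i][j] * dtheta[i] * dtheta[j]
                   for i in range(2) for j in range(2))

s, T = 2.0, 10
print(fisher_from_natural(s, T), 3 * T / s ** 2)  # both evaluate to 7.5
```

The three contributions are 1/s² from the mean term, −4/s² from the cross term, and 6/s² from the second-order term, which sum to 3/s² per trial.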

Inference 2 When data of variance $\bar{R}$ is lost, the data manifold D is given by

D = {N (μ, σ^{2}) | μ = \bar{r}, σ^{2} is arbitrary} .

(48)

In this case, the estimated value of s is the intersection point of D and M (Figure 6). By using Eq. 27, we can obtain the Fisher information,

I_{F 2} (s; \bar{r}) = \frac{T + 2}{s^{2}} \sim \frac{T}{s^{2}},

(49)

where we used the fact that the marginal distribution over $\bar{R}$ can be written as

p (\bar{r}) = \frac{1}{\sqrt{2 π {(\frac{s}{\sqrt{T}})}^{2}}} \exp (- \frac{1}{2 {(\frac{s}{\sqrt{T}})}^{2}} {(\bar{r} - s)}^{2}) .

(50)

$p (\bar{r})$ can be derived by noting that $\bar{r}$ also obeys a Gaussian distribution whose mean and variance are s and s²/T, respectively.

The information loss is given by

Δ I_{F 2} (s) = \frac{2 T}{s^{2}} .

(51)
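Eqs. 47 and 49 can both be reproduced from the Fisher information of a one-dimensional Gaussian whose mean and variance depend on s, namely (dμ/ds)²/σ² + (dσ²/ds)²/(2σ⁴), which is the scalar form of Eq. 64. The sketch below is a hedged illustration with hypothetical function names; it applies this formula once to a single response (scaled by T) and once to the sample mean of Eq. 50.

```python
def gaussian_fisher_1d(dmean, var, dvar):
    """Fisher information about s carried by one draw from N(mean(s), var(s))."""
    return dmean ** 2 / var + 0.5 * (dvar / var) ** 2

def one_dim_informations(s, T):
    """Return (I_F1, I_F2) for the model N(s, s^2) observed over T trials."""
    # All data: T independent draws, each with mean s and variance s^2.
    i_f1 = T * gaussian_fisher_1d(1.0, s ** 2, 2 * s)
    # Only the sample mean: a single draw with mean s and variance s^2 / T.
    i_f2 = gaussian_fisher_1d(1.0, s ** 2 / T, 2 * s / T)
    return i_f1, i_f2

s, T = 2.0, 100
i_f1, i_f2 = one_dim_informations(s, T)
print(i_f1, i_f2, i_f1 - i_f2)
```

For finite T the exact difference is (2T − 2)/s²; Eq. 51 keeps only the leading term in T, in line with the approximation already made in Eq. 49.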

Inference 3 We specifically consider the inference with the following mismatched decoding model to compare it with the inference when $\bar{R}$ is lost:

q (r) = \frac{1}{\sqrt{2 π σ^{2}}} \exp (- \frac{1}{2 σ^{2}} {(r - s)}^{2}) .

(52)

In this model, the mean is equal to the actual one but the variance is mismatched with the actual one. The maximum likelihood estimation based on the mismatched decoding model M* corresponds to the orthogonal projection from the observed point $\bar{x}$ to M* (Figure 6). By using Eq. 36, we obtain the Fisher information

I_{F 3} (s) = \frac{T}{s^{2}} .

(53)
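Eq. 53 can be checked directly from the definition in Eq. 36 without using the closed form. The following sketch is an assumption-laden illustration rather than the authors' procedure: expectations under p are evaluated by a simple Riemann sum on a grid, derivatives of log q by central finite differences, and the decoder variance is arbitrarily set to 1. All function names are hypothetical.

```python
import math

def log_normal(r, mean, var):
    """Log-density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (r - mean) ** 2 / (2 * var)

def mismatched_fisher(log_p, log_q, s, lo, hi, T=1, n=20001, h=1e-3):
    """Evaluate Eq. 36 numerically for scalar responses on the grid [lo, hi]."""
    dr = (hi - lo) / (n - 1)
    mean_d2 = 0.0      # E_p[d^2 l_q / ds^2]
    mean_d1_sq = 0.0   # E_p[(d l_q / ds)^2]
    for k in range(n):
        r = lo + k * dr
        w = math.exp(log_p(r, s)) * dr
        up, mid, down = log_q(r, s + h), log_q(r, s), log_q(r, s - h)
        d1 = (up - down) / (2 * h)
        d2 = (up - 2 * mid + down) / h ** 2
        mean_d2 += w * d2
        mean_d1_sq += w * d1 ** 2
    return T * mean_d2 ** 2 / mean_d1_sq

s = 2.0
true_model = lambda r, s: log_normal(r, s, s ** 2)  # p: N(s, s^2), Eq. 43
fixed_var = lambda r, s: log_normal(r, s, 1.0)      # q: N(s, 1), Eq. 52 with assumed variance 1

g_mismatched = mismatched_fisher(true_model, fixed_var, s, s - 10 * s, s + 10 * s)
g_matched = mismatched_fisher(true_model, true_model, s, s - 10 * s, s + 10 * s)
print(g_mismatched, 1 / s ** 2)  # mismatched decoder, compare with Eq. 53 at T = 1
print(g_matched, 3 / s ** 2)     # q = p, compare with Eq. 47 at T = 1
```

The result for the mismatched decoder does not depend on the variance assumed in q, because that variance cancels between the numerator and the denominator of Eq. 36.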

Inference 4 When the mismatched decoding model q(r), where the variance is independent of s, is used for decoding, the data of variance $\bar{R}$ does not affect the results of the inference. Thus, even if $\bar{R}$ is lost, no information is lost in this mismatched decoding. The Fisher information in the Inference 4 case is the same as that in the Inference 3 case:

I_{F 4} (s) = I_{F 3} (s) .

(54)

As Eqs. 49 and 53 show, I_F3(s) is equal to I_F2(s) to leading order in T. We can also easily show that I_F3(s) is equal to I_F2(s) in one-dimensional cases in general. However, in the multidimensional case, I_F3(s) is not equal to I_F2(s). In the next section, we explain the general relationship between I_F2(s) and I_F3(s) in the multidimensional Gaussian model.
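As a sanity check on the one-dimensional results, the decoding errors can be simulated. The sketch below is an illustration under stated assumptions: the values of s, T, the number of repetitions, and the random seed are arbitrary. It uses the fact that the maximum likelihood estimate under N(s, s²) is the positive root of s² + r̄s − R̄ = 0, obtained by setting the derivative of the log of Eq. 44 to zero, whereas the estimate that ignores R̄, or that uses the fixed-variance decoder of Eq. 52, is simply r̄.

```python
import math
import random

def simulate(s=2.0, T=100, reps=3000, seed=0):
    """Return the mean squared errors of the full-data MLE and of the sample mean."""
    rng = random.Random(seed)
    err_full = 0.0
    err_mean = 0.0
    for _ in range(reps):
        rs = [rng.gauss(s, s) for _ in range(T)]  # standard deviation s, variance s^2
        r_bar = sum(rs) / T
        R_bar = sum(r * r for r in rs) / T
        # Positive root of s^2 + r_bar * s - R_bar = 0 (uses both statistics).
        s_full = 0.5 * (-r_bar + math.sqrt(r_bar ** 2 + 4 * R_bar))
        err_full += (s_full - s) ** 2
        err_mean += (r_bar - s) ** 2
    return err_full / reps, err_mean / reps

mse_full, mse_mean = simulate()
print(mse_full, 2.0 ** 2 / (3 * 100))  # compare with 1 / I_F1
print(mse_mean, 2.0 ** 2 / 100)        # compare with 1 / I_F3
```

With these settings the simulated errors should lie close to s²/(3T) and s²/T, so discarding the variance data costs roughly a factor of three in squared error in this toy model.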

Multidimensional case

We next consider the multidimensional Gaussian distribution shown in Eq. 2. The probability distribution for T observations of neural responses r is given by

p (\bar{r}, \bar{R}) = \exp (T (θ (s) \cdot \bar{r} + Θ (s) \cdot \bar{R} - ψ (s))),

(55)

where the sufficient statistics $\bar{r}$ and $\bar{R}$ are

\bar{r} = \frac{1}{T} \sum_{t = 1}^{T} r (t),

(56)

\bar{R} = \frac{1}{T} \sum_{t = 1}^{T} r (t) r {(t)}^{T},

(57)

the natural parameters θ and Θ are

θ (s) = C^{- 1} (s) f (s),

(58)

Θ (s) = - \frac{1}{2} C^{- 1} (s),

(59)

and the normalization constant ψ(s) is

ψ (s) = \frac{1}{2} f^{T} (s) C^{- 1} (s) f (s) + \log (\sqrt{\det [C (s)]}) + \log (\sqrt{{(2 π)}^{N}}) .

(60)

Inference 1 First, let us consider the Fisher information in the Inference 1 case. The Fisher information matrix with respect to the natural parameters is given by

- \frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial θ_{i} \partial θ_{j}} = T C_{i j},

(61)

- \frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial θ_{i} \partial Θ_{j k}} = T (C_{j i} f_{k} + C_{k i} f_{j}),

(62)

- \frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial Θ_{i j} \partial Θ_{k l}} = T (2 C_{i k} f_{l} f_{j} + 2 C_{j k} f_{l} f_{i} + 2 C_{i k} C_{l j}) .

(63)

By using the Fisher information matrix with respect to the natural parameters, we can obtain the Fisher information with respect to stimulus s from Eq. 25:

\frac{I_{F 1} (s; \bar{r}, \bar{R})}{T} = {f^{'}}^{T} C^{- 1} f^{'} + \frac{1}{2} Tr [C^{'} C^{- 1} C^{'} C^{- 1}] .

(64)

Inference 2 Second, let us consider the Fisher information in the Inference 2 case. We consider the situation that the second-order sufficient statistics $\bar{R}$ in Eq. 55 are lost and only the first-order sufficient statistics $\bar{r}$ are observed. The marginalized distribution over missing data $\bar{R}$ is given by

\begin{array}{l} p (\bar{r}) \\ = \frac{1}{\sqrt{{(2 π)}^{N} \det [C (s) / T]}} \exp (- \frac{T}{2} {(\bar{r} - f (s))}^{T} C^{- 1} (s) (\bar{r} - f (s))), \end{array}

(65)

= \exp (T (θ (s) \cdot \bar{r} + Θ (s) \cdot \bar{r} {\bar{r}}^{T} - Ψ (s))),

(66)

where the natural parameters θ and Θ are

θ (s) = C^{- 1} (s) f (s),

(67)

Θ (s) = - \frac{1}{2} C^{- 1} (s),

(68)

and the normalization constant Ψ(s) is

\begin{array}{l} Ψ (s) = \frac{1}{2} f (s) C^{- 1} (s) f (s) + \frac{1}{T} \log (\sqrt{\det [C]}) + \frac{1}{T} \log \sqrt{{(\frac{2 π}{T})}^{N}}, \\ \sim \frac{1}{2} f (s) C^{- 1} (s) f (s), \end{array}

(69)

where we ignored the terms of the order of 1/T in the limit of T → ∞. In this case, the Fisher information matrix with respect to the natural parameters is computed as follows:

- \frac{\partial^{2} \log p (\bar{r})}{\partial θ_{i} \partial θ_{j}} = T C_{i j},

(70)

- \frac{\partial^{2} \log p (\bar{r})}{\partial θ_{i} \partial Θ_{j k}} = T (C_{j i} f_{k} + C_{k i} f_{j}),

(71)

- \frac{\partial^{2} \log p (\bar{r})}{\partial Θ_{i j} \partial Θ_{k l}} = T (2 C_{i k} f_{l} f_{j} + 2 C_{j k} f_{l} f_{i}) .

(72)

If we compare the components of the Fisher information matrix when $\bar{R}$ is missing with those when the data are complete, the information loss due to the missing data is seen to be represented in the components $\frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial Θ_{i j} \partial Θ_{k l}}$ (Eqs. 63 and 72). By using the Fisher information matrix with respect to the natural parameters, we can compute the Fisher information with respect to stimulus s as

\frac{I_{F 2} (s; \bar{r})}{T} = {f^{'}}^{T} C^{- 1} f^{'} .

(73)

From Eqs. 64 and 73, we find that the information loss due to the missing data $\bar{R}$ is

\frac{Δ I_{F 2} (s)}{T} = \frac{1}{2} Tr [C^{'} C^{- 1} C^{'} C^{- 1}] .

(74)

This information loss depends solely on C′C⁻¹ and is always non-negative; it vanishes only when C′ = 0, i.e., when the covariance matrix carries no information about s.
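As a numerical illustration of Eqs. 64, 73, and 74, the following minimal sketch evaluates the two terms of the Gaussian Fisher information per observation. The function name and the two-neuron values of f′, C, and C′ are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def gaussian_fisher_terms(f_prime, C, C_prime):
    """Return (I_F1/T, I_F2/T, loss/T) following Eqs. 64, 73, and 74."""
    C_inv = np.linalg.inv(C)
    mean_term = float(f_prime @ C_inv @ f_prime)   # Eq. 73: information carried by the means
    M = C_prime @ C_inv
    cov_term = 0.5 * float(np.trace(M @ M))        # Eq. 74: information carried by the covariance
    return mean_term + cov_term, mean_term, cov_term

# Hypothetical two-neuron values, chosen only for illustration.
f_prime = np.array([1.0, -0.5])                    # df/ds
C = np.array([[1.0, 0.3], [0.3, 2.0]])             # covariance C(s)
C_prime = np.array([[0.2, 0.1], [0.1, -0.4]])      # dC/ds (symmetric, may be indefinite)

I_F1, I_F2, loss = gaussian_fisher_terms(f_prime, C, C_prime)
print(I_F1, I_F2, loss)
```

Setting `C_prime` to the zero matrix makes the loss term vanish, which corresponds to a covariance matrix that does not depend on s.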

Inference 3 Third, let us consider the Fisher information in the Inference 3 case. In the Inference 2 case, we considered that the second-order sufficient statistics $\bar{R}$, which are the variance and covariance data of neural responses, are lost. A mismatched probability distribution q(r; s) that is comparable with the inference when $\bar{R}$ is lost would be one whose mean is the same as that in p(r; s) but whose covariance matrix does not match the true covariance matrix in p(r; s). As a simple example, we assume that the covariance matrix in the mismatched probability distribution is a constant matrix C_q that is independent of s:

q (r; s) = \frac{1}{\sqrt{{(2 π)}^{N} \det [C_{q}]}} \exp (- \frac{1}{2} {(r - f (s))}^{T} C_{q}^{- 1} (r - f (s))) .

(75)

In this case, we can show that the maximum likelihood estimation based on the mismatched probability distribution is consistent, i.e., the condition shown in Eq. 29 is satisfied, as follows:

\begin{array}{l} E_{p} [\partial_{s} l_{q} (r; s)] = E_{p} [r^{T} C_{q}^{- 1} \frac{\partial f}{\partial s} - f^{T} C_{q}^{- 1} \frac{\partial f}{\partial s}], \\ = 0. \end{array}

(76)

By using Eq. 36, we obtain the Fisher information for the mismatched model q(r; s):

\frac{I_{F 3} (s)}{T} = \frac{{({f^{'}}^{T} C_{q}^{- 1} f^{'})}^{2}}{{f^{'}}^{T} C_{q}^{- 1} C C_{q}^{- 1} f^{'}} .

(77)

Inference 4 and comparison Finally, we consider the Fisher information in the Inference 4 case and compare the four types of inference described above. It is obvious that the Fisher information obtained by Inference 1, I_F1, is the largest and the Fisher information obtained by Inference 4, I_F4, is the smallest. On the other hand, the relationship between the Fisher information obtained by Inference 2, I_F2, and that obtained by Inference 3, I_F3, is not clear, i.e.,

I_{F 1} \geq I_{F 2}, I_{F 3} \geq I_{F 4} .

(78)

However, the relationship between I_F2 and I_F3 can be clarified by considering the Fisher information obtained by Inference 4, I_F4. We assumed that the mismatched model related to the loss of the covariance data $\bar{R}$ is the statistical model whose covariance matrix is a constant matrix. In this case, the natural parameter Θ in Eq. 55, which is coupled with $\bar{R}$, does not depend on s. Thus, when we use a mismatched model q(r; s) whose covariance matrix is independent of s, the inference does not change even if the covariance data $\bar{R}$ are lost. Consequently, Inferences 3 and 4 result in the same estimate of s and the same Fisher information:

I_{F 4} (s) = I_{F 3} (s) .

(79)

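To summarize, because Inference 4 can never extract more information than Inference 2 (Eq. 78) and I_F4 = I_F3 (Eq. 79), the relationship between the Fisher information in each of the four inference cases is

I_{F 1} \geq I_{F 2} \geq I_{F 3} = I_{F 4} .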

(80)
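The ordering between I_F2 (Eq. 73) and I_F3 (Eq. 77) can be checked numerically. The sketch below is only an illustration: the three-neuron values are hypothetical, and the decoder covariance C_q is taken here to be the diagonal part of C, which is one possible choice of a constant mismatched covariance.

```python
import numpy as np

def fisher_mean_only(f_prime, C):
    """I_F2/T of Eq. 73."""
    return float(f_prime @ np.linalg.solve(C, f_prime))

def fisher_mismatched(f_prime, C, C_q):
    """I_F3/T of Eq. 77 for a decoder that assumes a fixed covariance C_q."""
    w = np.linalg.solve(C_q, f_prime)              # C_q^{-1} f'
    return float(f_prime @ w) ** 2 / float(w @ C @ w)

# Hypothetical values, chosen only for illustration.
f_prime = np.array([1.0, -0.5, 0.8])
C = np.array([[1.0, 0.4, 0.2],
              [0.4, 1.5, 0.3],
              [0.2, 0.3, 2.0]])
C_q = np.diag(np.diag(C))                          # decoder keeps variances, ignores correlations

I_F2 = fisher_mean_only(f_prime, C)
I_F3 = fisher_mismatched(f_prime, C, C_q)
print(I_F2, I_F3)
```

By the Cauchy-Schwarz inequality, `I_F3` cannot exceed `I_F2`, and the two coincide when `C_q` equals `C`.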

Information Loss in Log-Linear Model of Binary Neural Response

In this section, we evaluate the information loss associated with loss of data and mismatched decoding in the log-linear model of binary neural response.

Two-neuron model

As the simplest example, we first consider the two-neuron model,

p (r_{1}, r_{2}) = \exp (θ_{1} (s) r_{1} + θ_{2} (s) r_{2} + Θ (s) r_{1} r_{2} - \log Z (s)),

(81)

where Z(s) is a normalization constant,

Z (s) = \sum_{r_{1}, r_{2} = 0, 1} \exp (\sum_{i = 1}^{2} θ_{i} (s) r_{i} + Θ (s) r_{1} r_{2}) .

(82)

The probability distribution for T observations of neural responses r can be written as

\begin{array}{l} p ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12}) = W ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12}) \exp (T (θ_{1} {\bar{r}}_{1} + θ_{2} {\bar{r}}_{2} + Θ {\bar{R}}_{12} - \log Z)), \\ = \exp (T (θ_{1} {\bar{r}}_{1} + θ_{2} {\bar{r}}_{2} + Θ {\bar{R}}_{12} + \frac{\log W ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})}{T} - \log Z)), \end{array}

(83)

where ${\bar{r}}_{1}, {\bar{r}}_{2},$ and ${\bar{R}}_{12}$ are sufficient statistics,

{\bar{r}}_{1} = \frac{1}{T} \sum_{t = 1}^{T} r_{1}^{(t)},

(84)

{\bar{r}}_{2} = \frac{1}{T} \sum_{t = 1}^{T} r_{2}^{(t)},

(85)

{\bar{R}}_{12} = \frac{1}{T} \sum_{t = 1}^{T} r_{1}^{(t)} r_{2}^{(t)},

(86)

and $W ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})$ is the number of configurations of (r⁽¹⁾, r⁽²⁾,…,r^(T)) for which the sufficient statistics take the specific values ${\bar{r}}_{1}$, ${\bar{r}}_{2}$, and ${\bar{R}}_{12}$. $W ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})$ can be expressed as

\begin{array}{l} W ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12}) \\ = \frac{T!}{(T {\bar{R}}_{12})! (T ({\bar{r}}_{1} - {\bar{R}}_{12}))! (T ({\bar{r}}_{2} - {\bar{R}}_{12}))! (T (1 - {\bar{r}}_{1} - {\bar{r}}_{2} + {\bar{R}}_{12}))!} . \end{array}

(87)

In the limit of T → ∞, by using Stirling's formula, $W ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})$ can be approximated as

\begin{matrix} \log W = - T ({\bar{R}}_{12} \log {\bar{R}}_{12} + ({\bar{r}}_{1} - {\bar{R}}_{12}) \log ({\bar{r}}_{1} - {\bar{R}}_{12}) \\ + ({\bar{r}}_{2} - {\bar{R}}_{12}) \log ({\bar{r}}_{2} - {\bar{R}}_{12}) \\ + (1 - {\bar{r}}_{1} - {\bar{r}}_{2} + {\bar{R}}_{12}) \log (1 - {\bar{r}}_{1} - {\bar{r}}_{2} + {\bar{R}}_{12})) . \end{matrix}

(88)
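To see how accurate the leading-order Stirling approximation is for finite T, the exact logarithm of the multinomial coefficient in Eq. 87 can be compared with Eq. 88. This is a minimal sketch; the function names and the example frequencies are hypothetical.

```python
import math

def log_W_exact(T, r1, r2, R12):
    """Exact log of the multinomial coefficient in Eq. 87 (arguments are empirical frequencies)."""
    counts = [T * R12, T * (r1 - R12), T * (r2 - R12), T * (1 - r1 - r2 + R12)]
    return math.lgamma(T + 1) - sum(math.lgamma(c + 1) for c in counts)

def log_W_stirling(T, r1, r2, R12):
    """Leading-order approximation of Eq. 88: T times the entropy of the four cell frequencies."""
    cells = [R12, r1 - R12, r2 - R12, 1 - r1 - r2 + R12]
    return -T * sum(p * math.log(p) for p in cells if p > 0)

T = 10_000
exact = log_W_exact(T, 0.4, 0.3, 0.2)
approx = log_W_stirling(T, 0.4, 0.3, 0.2)
print(exact, approx, abs(exact - approx) / approx)
```

The neglected terms grow only logarithmically in T, so the relative error shrinks as T increases.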

Inference 1 We first compute the Fisher information in the Inference 1 case. The Fisher information matrix with respect to the natural parameters is given by

- \frac{\partial^{2} \log p ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})}{\partial θ_{i} \partial θ_{j}} = T (〈 r_{i} r_{j} 〉 - 〈 r_{i} 〉 〈 r_{j} 〉),

(89)

- \frac{\partial^{2} \log p ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})}{\partial Θ^{2}} = T (〈 r_{1} r_{2} 〉 - {〈 r_{1} r_{2} 〉}^{2}),

(90)

- \frac{\partial^{2} \log p ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})}{\partial θ_{i} \partial Θ} = T (〈 r_{1} r_{2} 〉 - 〈 r_{i} 〉 〈 r_{1} r_{2} 〉),

(91)

where 〈x〉 = E_p[x] = ∑_r x p(r₁, r₂). By using the Fisher information matrix with respect to the natural parameters, we can obtain the Fisher information with respect to stimulus s from Eq. 25:

\begin{array}{l} \frac{I_{F 1} (s; {\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})}{T} \\ = \sum_{i = 1}^{2} 〈 r_{i} 〉 (1 - 〈 r_{i} 〉) {(\frac{\partial θ_{i}}{\partial s})}^{2} + 〈 r_{1} r_{2} 〉 (1 - 〈 r_{1} r_{2} 〉) {(\frac{\partial Θ}{\partial s})}^{2} \\ + 2 (〈 r_{1} r_{2} 〉 - 〈 r_{1} 〉 〈 r_{2} 〉) \frac{\partial θ_{1}}{\partial s} \frac{\partial θ_{2}}{\partial s} + 2 \sum_{i = 1}^{2} 〈 r_{1} r_{2} 〉 (1 - 〈 r_{i} 〉) \frac{\partial θ_{i}}{\partial s} \frac{\partial Θ}{\partial s} . \end{array}

(92)
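Eq. 92 can be cross-checked by computing the single-observation Fisher information directly as the variance of the score, E_p[(∂ log p/∂s)²], over the four states of the two-neuron model. The sketch below is illustrative; the parameter values and function names are hypothetical.

```python
import numpy as np

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def probs(th1, th2, Th):
    """State probabilities of the two-neuron log-linear model (Eq. 81)."""
    w = np.array([np.exp(th1 * r1 + th2 * r2 + Th * r1 * r2) for r1, r2 in STATES])
    return w / w.sum()

def fisher_eq92(th, dth):
    """I_F1/T from Eq. 92; th = (theta1, theta2, Theta), dth = their derivatives in s."""
    p = probs(*th)
    m1, m2, m12 = p[2] + p[3], p[1] + p[3], p[3]   # <r1>, <r2>, <r1 r2>
    d1, d2, dT = dth
    return (m1 * (1 - m1) * d1**2 + m2 * (1 - m2) * d2**2 + m12 * (1 - m12) * dT**2
            + 2 * (m12 - m1 * m2) * d1 * d2
            + 2 * m12 * ((1 - m1) * d1 + (1 - m2) * d2) * dT)

def fisher_direct(th, dth):
    """Variance of the score, with d log p/ds = sum_I (R_I - <R_I>) d(theta_I)/ds."""
    p = probs(*th)
    feats = np.array([[r1, r2, r1 * r2] for r1, r2 in STATES], dtype=float)
    score = (feats - p @ feats) @ np.array(dth)
    return float(p @ score**2)

th, dth = (0.2, -0.4, 0.8), (1.0, 0.5, -0.3)       # hypothetical values
print(fisher_eq92(th, dth), fisher_direct(th, dth))
```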

Inference 2 We next compute the Fisher information when the data of neural correlation ${\bar{R}}_{12}$ are lost. When T is finite, it is difficult to marginalize the probability distribution $p ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})$ over ${\bar{R}}_{12}$ because there are many possible ${\bar{R}}_{12}$ when specific values of ${\bar{r}}_{1}$ and ${\bar{r}}_{2}$ are given. However, in the T → ∞ case, we only need to consider the most probable ${\bar{R}}_{12}$ when ${\bar{r}}_{1}$ and ${\bar{r}}_{2}$ are given. By differentiating the argument of the exponential function in Eq. 83 with respect to ${\bar{R}}_{12},$ we can obtain the equation for the most probable ${\bar{R}}_{12}$ :

Θ - \log \frac{{\bar{R}}_{12} (1 - {\bar{r}}_{1} - {\bar{r}}_{2} + {\bar{R}}_{12})}{({\bar{r}}_{1} - {\bar{R}}_{12}) ({\bar{r}}_{2} - {\bar{R}}_{12})} = 0.

(93)

We denote the solution of this equation by ${\bar{R}}_{12}^{*}$. Note that ${\bar{R}}_{12}^{*}$ depends on Θ but does not depend on θ₁ and θ₂. The marginalized probability distribution is written as

\begin{array}{l} p ({\bar{r}}_{1}, {\bar{r}}_{2}) \\ = \exp (T (θ_{1} {\bar{r}}_{1} + θ_{2} {\bar{r}}_{2} + Θ {\bar{R}}_{12}^{*} + \frac{\log W ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12}^{*})}{T} - \log Z)) . \end{array}

(94)

By using Eq. 94, we can compute the Fisher information when ${\bar{R}}_{12}$ is lost as

\begin{array}{l} \frac{I_{F 2} (s; {\bar{r}}_{1}, {\bar{r}}_{2})}{T} = \frac{I_{F 1} (s; {\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})}{T} \\ - E_{p ({\bar{r}}_{1}, {\bar{r}}_{2})} [\frac{\partial^{2}}{\partial Θ^{2}} (Θ {\bar{R}}_{12}^{*} + \frac{1}{T} \log W ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12}^{*}))] {(\frac{\partial Θ}{\partial s})}^{2}, \end{array}

(95)

where we used the fact that, among the natural parameters, ${\bar{R}}_{12}^{*}$ depends only on Θ. The information loss is given by

\begin{matrix} \frac{Δ I_{F 2} (s)}{T} = E_{p ({\bar{r}}_{1}, {\bar{r}}_{2})} [\frac{\partial^{2}}{\partial Θ^{2}} (Θ {\bar{R}}_{12}^{*} + \frac{1}{T} \log W ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12}^{*}))] {(\frac{\partial Θ}{\partial s})}^{2}, \\ = {(\frac{1}{〈 r_{1} r_{2} 〉} + \frac{1}{〈 r_{1} 〉 - 〈 r_{1} r_{2} 〉} + \frac{1}{〈 r_{2} 〉 - 〈 r_{1} r_{2} 〉} \\ + \frac{1}{1 - 〈 r_{1} 〉 - 〈 r_{2} 〉 + 〈 r_{1} r_{2} 〉})}^{- 1} {(\frac{\partial Θ}{\partial s})}^{2} . \end{matrix}

(96)

This information loss depends only on ∂Θ/∂s. Thus, when Θ is independent of s, there is no information loss even if the data ${\bar{R}}_{12}$ are lost.
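The coefficient of (∂Θ/∂s)² in the information loss is the second derivative of Θ R̄₁₂* + (1/T) log W with respect to Θ, which reduces to dR̄₁₂*/dΘ. Differentiating Eq. 93 gives this as the reciprocal of the sum of the reciprocals of the four cell probabilities. The sketch below checks that relation numerically. Solving Eq. 93 by bisection is a choice made here for simplicity, and the function names and parameter values are hypothetical.

```python
import math

def most_probable_R12(Theta, r1, r2, tol=1e-14):
    """Solve Eq. 93 for R12* by bisection (the left-hand side is monotone in R12)."""
    lo, hi = max(0.0, r1 + r2 - 1.0), min(r1, r2)
    def g(R):
        return math.log(R * (1 - r1 - r2 + R) / ((r1 - R) * (r2 - R))) - Theta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def loss_coefficient(r1, r2, R12):
    """Reciprocal of the summed reciprocal cell probabilities (coefficient of (dTheta/ds)^2)."""
    cells = [R12, r1 - R12, r2 - R12, 1 - r1 - r2 + R12]
    return 1.0 / sum(1.0 / c for c in cells)

Theta, r1, r2, h = 0.7, 0.4, 0.3, 1e-5              # hypothetical values
R_star = most_probable_R12(Theta, r1, r2)
slope = (most_probable_R12(Theta + h, r1, r2) - most_probable_R12(Theta - h, r1, r2)) / (2 * h)
print(R_star, slope, loss_coefficient(r1, r2, R_star))
```

For Θ = 0 the solver returns R̄₁₂* = r̄₁ r̄₂, the value expected for independent neurons.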

Inference 3 We next compute the Fisher information when neural correlation is ignored in decoding. We consider the mismatched decoding model q(r₁, r₂) that is the product of the marginal probability distributions of the actual distribution p(r₁, r₂):

\begin{array}{l} q (r_{1}, r_{2}; s) = \prod_{i = 1}^{2} p (r_{i}), \\ = \prod_{i = 1}^{2} \exp (θ_{i}^{D} (s) r_{i} - \log Z_{i}^{D} (s)), \end{array}

(97)

where p(r_i) is the marginalized distribution over r_j, i.e., $p (r_{i}) = \sum_{r_{j}} p (r_{i}, r_{j}),$ and the normalization constant $Z_{i}^{D} (s)$ is given by

Z_{i}^{D} (s) = 1 + \exp (θ_{i}^{D} (s)) .

(98)

From the definition of q(r₁, r₂), the averaged values of r₁ and r₂ over the mismatched model q(r₁, r₂) are equal to those over the actual model p(r₁, r₂). Thus, the following relationship holds between $θ_{i}^{D} (s)$ and the natural parameters in the actual probability distribution p(r₁, r₂):

\begin{array}{l} 〈 r_{i} 〉 = \frac{1}{Z (s)} \sum_{r_{1}, r_{2}} r_{i} \exp (θ_{1} r_{1} + θ_{2} r_{2} + Θ r_{1} r_{2}), \\ = \frac{\exp (θ_{i}^{D} (s))}{1 + \exp (θ_{i}^{D} (s))} . (for i = 1, 2) \end{array}

(99)

The maximum likelihood estimation based on this mismatched decoding model q(r₁, r₂) is shown to be consistent as follows:

E_{p} [\frac{\partial \log q (r_{1}, r_{2})}{\partial s}] = E_{p} [\sum_{i = 1}^{2} \frac{\partial θ_{i}^{D} (s)}{\partial s} (r_{i} - \frac{\exp (θ_{i}^{D} (s))}{1 + \exp (θ_{i}^{D} (s))})],

(100)

= 0.

(101)

When the estimation by a mismatched decoding model is consistent, the Fisher information obtained by the mismatched decoding model can be computed by Eq. 36. The Fisher information is given by

\frac{I_{F 3} (s)}{T} = \frac{{(\sum_{i = 1}^{2} {(\frac{\partial θ_{i}^{D}}{\partial s})}^{2} 〈 r_{i} 〉 (1 - 〈 r_{i} 〉))}^{2}}{\sum_{i = 1}^{2} {(\frac{\partial θ_{i}^{D}}{\partial s})}^{2} 〈 r_{i} 〉 (1 - 〈 r_{i} 〉) + 2 \frac{\partial θ_{1}^{D}}{\partial s} \frac{\partial θ_{2}^{D}}{\partial s} (〈 r_{1} r_{2} 〉 - 〈 r_{1} 〉 〈 r_{2} 〉)},

(102)

where

\begin{array}{l} 〈 r_{i} 〉 (1 - 〈 r_{i} 〉) \frac{\partial θ_{i}^{D}}{\partial s} = \frac{\partial θ_{i}}{\partial s} 〈 r_{i} 〉 (1 - 〈 r_{i} 〉) + \frac{\partial θ_{j}}{\partial s} (〈 r_{1} r_{2} 〉 - 〈 r_{1} 〉 〈 r_{2} 〉) \\ + \frac{\partial Θ}{\partial s} 〈 r_{1} r_{2} 〉 (1 - 〈 r_{i} 〉) . (for i, j = 1, 2; j \neq i) \end{array}

(103)

As discussed in the previous section, I_F3(s) is never larger than I_F2(s). Although the information loss associated with the loss of data $\bar{R}$ depends only on ∂Θ/∂s, the information loss associated with ignoring correlation in decoding depends not only on ∂Θ/∂s but also on ∂θ₁/∂s and ∂θ₂/∂s. Thus, when neural correlation is ignored in decoding, information can be lost even if Θ is independent of s.

As a special case, if θ₁ = θ₂, which means 〈r₁〉 = 〈r₂〉 = 〈r〉, the information loss only depends on ∂Θ/∂s and is given by

\frac{Δ I_{F 3} (s)}{T} = \frac{〈 r_{1} r_{2} 〉 (〈 r 〉 - 〈 r_{1} r_{2} 〉) (1 - 2 〈 r 〉 + 〈 r_{1} r_{2} 〉)}{〈 r 〉 - 2 {〈 r 〉}^{2} + 〈 r_{1} r_{2} 〉} {(\frac{\partial Θ}{\partial s})}^{2} .

(104)

In this case, ΔI_F3(s) is equal to ΔI_F2(s) (Eq. 96). This is because if θ₁ = θ₂ = θ, only two parameters, namely θ and Θ, are in the statistical model. This is the same situation as in the one-dimensional Gaussian case illustrated in Figure 6, where ΔI_F3(s) is also equal to ΔI_F2(s).
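The symmetric special case can be verified numerically by evaluating I_F1 from Eq. 92 and I_F3 from Eq. 102 and comparing their difference with Eq. 104. In this sketch, ∂θ_i^D/∂s is obtained by differentiating Eq. 99, i.e., as ∂〈r_i〉/∂s divided by 〈r_i〉(1 − 〈r_i〉). The function names and parameter values are hypothetical.

```python
import numpy as np

def moments(th1, th2, Th):
    """Return <r1>, <r2>, <r1 r2> under Eq. 81."""
    w = np.exp([0.0, th2, th1, th1 + th2 + Th])    # states (0,0), (0,1), (1,0), (1,1)
    p = w / w.sum()
    return p[2] + p[3], p[1] + p[3], p[3]

def fisher_full(th, dth):
    """I_F1/T from Eq. 92."""
    m1, m2, m12 = moments(*th)
    d1, d2, dT = dth
    return (m1 * (1 - m1) * d1**2 + m2 * (1 - m2) * d2**2 + m12 * (1 - m12) * dT**2
            + 2 * (m12 - m1 * m2) * d1 * d2
            + 2 * m12 * ((1 - m1) * d1 + (1 - m2) * d2) * dT)

def fisher_independent(th, dth):
    """I_F3/T from Eq. 102."""
    m1, m2, m12 = moments(*th)
    d1, d2, dT = dth
    v1, v2, c = m1 * (1 - m1), m2 * (1 - m2), m12 - m1 * m2
    # d<r_i>/ds from the covariances, then d(theta_i^D)/ds = d<r_i>/ds / v_i
    g1 = (v1 * d1 + c * d2 + m12 * (1 - m1) * dT) / v1
    g2 = (c * d1 + v2 * d2 + m12 * (1 - m2) * dT) / v2
    s = g1**2 * v1 + g2**2 * v2
    return s**2 / (s + 2 * g1 * g2 * c)

def loss_eq104(th, dTheta):
    """Information loss of Eq. 104 (requires theta1 = theta2)."""
    r, _, a = moments(*th)
    return a * (r - a) * (1 - 2 * r + a) / (r - 2 * r**2 + a) * dTheta**2

th, dth = (0.3, 0.3, 0.9), (0.7, 0.7, -0.4)        # symmetric case: theta1 = theta2
gap = fisher_full(th, dth) - fisher_independent(th, dth)
print(gap, loss_eq104(th, dth[2]))
```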

Homogeneous N-neuron model

We next consider the case with a large number of neurons. In this case, the Fisher information cannot be analytically computed in general. To keep the analysis tractable, we restrict ourselves to a probabilistic model of a homogeneous neural population. In this model, θ_i(s) = θ(s) for any i and Θ_ij(s) = Θ(s)/N for any pair of i and j, i.e.,

p (r; s) = \exp (θ (s) \sum_{i = 1}^{N} r_{i} + \frac{Θ (s)}{N} \sum_{i < j} r_{i} r_{j} - \log Z (s)),

(105)

where the normalization constant Z(s) is given by

Z (s) = Tr \exp (θ (s) \sum_{i = 1}^{N} r_{i} + \frac{Θ (s)}{N} \sum_{i < j} r_{i} r_{j}),

(106)

where Tr stands for the sum over all possible combinations of the neuron state variables (r₁, r₂,…,r_N). The probability distribution of T observations is given by

p (\bar{r}, \bar{R}) = \exp (T (θ (s) \sum_{i = 1}^{N} {\bar{r}}_{i} + \frac{Θ (s)}{N} \sum_{i < j} {\bar{R}}_{i j} - \log Z (s))),

(107)

where $\bar{r} = \{{\bar{r}}_{i}\}$ and $\bar{R} = \{{\bar{R}}_{i j}\}$ are sufficient statistics,

{\bar{r}}_{i} = \frac{1}{T} \sum_{t = 1}^{T} r_{i} (t),

(108)

{\bar{R}}_{i j} = \frac{1}{T} \sum_{t = 1}^{T} r_{i} (t) r_{j} (t) .

(109)

Inference 1 We first compute the Fisher information when data are complete and the decoding is optimal. As Eq. 23 shows, the Fisher information can be computed if we evaluate log Z. For analytical tractability, we consider the limit of N→∞. In this case, Z can be calculated as

\begin{array}{l} Z = Tr \exp (θ \sum_{i} r_{i} + \frac{Θ}{N} \sum_{i < j} r_{i} r_{j}), \\ = Tr \exp ((θ - \frac{Θ}{2 N}) \sum_{i} r_{i} + \frac{Θ}{2 N} {(\sum_{i} r_{i})}^{2}), \\ = \int_{- \infty}^{\infty} d m \exp (N (\frac{Θ}{2} m^{2} + θ m) + \log W (m)), \end{array}

(110)

where m is

m = \frac{1}{N} \sum_{i = 1}^{N} r_{i}

(111)

and W(m) is the number of states where m takes a certain value, which is given by

W (m) = \frac{N!}{(N m)! (N (1 - m))!} .

(112)

In the limit of N→∞, by using Stirling's formula, W(m) can be approximated as

\log W (m) = - N (m \log m + (1 - m) \log (1 - m)) .

(113)

We denote the argument of the exponential function in Eq. 110 by F, where

F = N (\frac{Θ}{2} m^{2} + θ m - m \log m - (1 - m) \log (1 - m)) .

(114)

In the limit of N→∞, the integral in Eq. 110 can be approximated as

\int_{- \infty}^{\infty} d m \exp (F) = \exp (F^{*}),

(115)

where F* is the maximum of the function F. From ∂F/∂m = 0, the value of m that maximizes the function F is the solution of the self-consistent equation

m = \frac{\exp (θ + Θ m)}{1 + \exp (θ + Θ m)} .

(116)
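The self-consistent equation (Eq. 116) generally has no closed-form solution, but it can be solved numerically. The following minimal sketch uses plain fixed-point iteration, which is a choice made here rather than a method stated in the text. It converges when |Θ| m(1 − m) < 1, which holds whenever |Θ| < 4; for larger Θ the equation may have several solutions, and the one maximizing F is the relevant one. The function name and defaults are hypothetical.

```python
import math

def solve_mean_activity(theta, Theta, m0=0.5, tol=1e-12, max_iter=10_000):
    """Iterate m <- sigmoid(theta + Theta * m) until the update is smaller than tol."""
    m = m0
    for _ in range(max_iter):
        m_new = 1.0 / (1.0 + math.exp(-(theta + Theta * m)))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    raise RuntimeError("fixed-point iteration did not converge")

m_star = solve_mean_activity(theta=-0.5, Theta=1.0)
print(m_star)
```

With Θ = 0 the iteration returns the firing probability of an independent neuron, exp(θ)/(1 + exp(θ)).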

We denote the solution of Eq. 116 by m*(θ, Θ). The Fisher information matrix with respect to the natural parameters θ and Θ is given by

\frac{\partial^{2} F^{*}}{\partial θ^{2}} = N \frac{m^{*} (1 - m^{*})}{1 - Θ m^{*} (1 - m^{*})},

(117)

\frac{\partial^{2} F^{*}}{\partial θ \partial Θ} = N \frac{m^{* 2} (1 - m^{*})}{1 - Θ m^{*} (1 - m^{*})},

(118)

\frac{\partial^{2} F^{*}}{\partial Θ^{2}} = N \frac{m^{* 3} (1 - m^{*})}{1 - Θ m^{*} (1 - m^{*})} .

(119)

By using the Fisher information matrix, we obtain the Fisher information with respect to stimulus s from Eq. 25:

\begin{array}{l} \frac{I_{F 1} (s)}{N} = \frac{m^{*} (1 - m^{*})}{1 - Θ m^{*} (1 - m^{*})} {(\frac{\partial θ}{\partial s})}^{2} + 2 \frac{m^{* 2} (1 - m^{*})}{1 - Θ m^{*} (1 - m^{*})} \frac{\partial θ}{\partial s} \frac{\partial Θ}{\partial s} \\ + \frac{m^{* 3} (1 - m^{*})}{1 - Θ m^{*} (1 - m^{*})} {(\frac{\partial Θ}{\partial s})}^{2} . \end{array}

(120)
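The large-N expression in Eq. 120 can be compared with an exact finite-N computation, because the homogeneous model depends on the responses only through the spike count k = ∑ r_i. The sketch below computes the exact single-observation Fisher information as the variance of the score over the distribution of k and compares it with Eq. 120. The function names, parameter values, and the fixed-point solver are assumptions made for illustration.

```python
import math
import numpy as np

def exact_fisher_per_neuron(theta, Theta, dtheta, dTheta, N):
    """Exact Fisher information of Eq. 105 divided by N, computed from the spike count k."""
    k = np.arange(N + 1)
    log_w = np.array([math.lgamma(N + 1) - math.lgamma(j + 1) - math.lgamma(N - j + 1)
                      for j in range(N + 1)])      # log of the binomial coefficient
    pair = k * (k - 1) / (2.0 * N)                 # (1/N) * sum_{i<j} r_i r_j
    log_w = log_w + theta * k + Theta * pair
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    score = dtheta * k + dTheta * pair             # d log p / ds up to an additive constant
    return float(p @ (score - p @ score) ** 2) / N

def mean_field_fisher_per_neuron(theta, Theta, dtheta, dTheta):
    """Eq. 120, which factorizes as chi * (dtheta + m* dTheta)^2."""
    m = 0.5
    for _ in range(1000):                          # fixed-point iteration of Eq. 116
        m = 1.0 / (1.0 + math.exp(-(theta + Theta * m)))
    chi = m * (1 - m) / (1 - Theta * m * (1 - m))
    return chi * (dtheta + m * dTheta) ** 2

exact = exact_fisher_per_neuron(-0.5, 1.0, 1.0, 0.5, N=2000)
mean_field = mean_field_fisher_per_neuron(-0.5, 1.0, 1.0, 0.5)
print(exact, mean_field)
```

The two values should agree up to corrections that vanish as N grows.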

Inferences 2 and 3 We next consider the situation where the data of correlation, $\bar{R}$, are lost and the situation where a mismatched decoding model q(r; s) that ignores neural correlation is used. We first consider the inference with a mismatched decoding model (Inference 3). This model q(r; s) is defined from the actual probability distribution p(r; s) as follows:

q (r; s) = \prod_{i = 1}^{N} p (r_{i}),

= \prod_{i = 1}^{N} \exp (θ_{D} (s) r_{i} - \log Z_{D} (s)),

(121)

where the normalization constant Z_D is given by

Z_{D} (s) = 1 + \exp (θ_{D} (s)) .

(122)

The relationship between m* in the previous section and θ_D is

m^{*} = \frac{\exp (θ_{D})}{1 + \exp (θ_{D})} .

(123)

By comparing Eq. 123 with Eq. 116, we obtain

θ_{D} = θ + Θ m^{*} .

(124)

As in the previous section, the maximum likelihood estimation with the independent decoding model q(r; s) can be shown to be consistent as follows:

\begin{matrix} E_{p} [\partial_{s} l_{q} (r; s)] = E_{p} [(\sum_{i = 1}^{N} r_{i}) \frac{\partial θ_{D}}{\partial s} - N \frac{\exp (θ_{D})}{1 + \exp (θ_{D})} \frac{\partial θ_{D}}{\partial s}], \\ = 0. \end{matrix}

(125)

By using Eq. 36, we can compute the Fisher information obtained by the independent decoding model as

\begin{matrix} \frac{I_{F 3} (s)}{N} = \frac{m^{*} (1 - m^{*})}{1 - Θ m^{*} (1 - m^{*})} {(\frac{\partial θ}{\partial s})}^{2} + 2 \frac{m^{* 2} (1 - m^{*})}{1 - Θ m^{*} (1 - m^{*})} \frac{\partial θ}{\partial s} \frac{\partial Θ}{\partial s} \\ + \frac{m^{* 3} (1 - m^{*})}{1 - Θ m^{*} (1 - m^{*})} {(\frac{\partial Θ}{\partial s})}^{2} . \end{matrix}

(126)

Since I_F1(s) = I_F3(s) (see Eqs. 120 and 126), there is no information loss when the independent model is used for decoding in a homogeneous neural population (Wu et al., 2001). As in the Gaussian model, the Fisher information when the data of neural correlation $\bar{R}$ are lost, I_F2, is bounded below by I_F3 and above by I_F1. Since I_F1(s) = I_F3(s) in this case, I_F2(s) is also equal to I_F1(s). Thus,

I_{F 1} (s) = I_{F 2} (s) = I_{F 3} (s) .

(127)

In summary, no information is lost in a homogeneous neural population even when the data of neural correlation, $\bar{R}$, are lost or when the mismatched model that ignores neural correlation is used for decoding.
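The absence of information loss is a large-N statement, which can be illustrated with an exact finite-N computation of the ratio I_F3/I_F1. The sketch below assumes that the Fisher information of a consistent mismatched decoder has the form (E_p[∂²_s log q])² / E_p[(∂_s log q)²], which is the structure visible in Eq. 102. For the independent model of Eq. 121 this reduces to Cov_p(k, score)² / Var_p(k), where k = ∑ r_i. Function names and parameter values are hypothetical.

```python
import math
import numpy as np

def count_distribution(theta, Theta, N):
    """Exact distribution of the spike count k = sum_i r_i under Eq. 105."""
    k = np.arange(N + 1)
    log_w = np.array([math.lgamma(N + 1) - math.lgamma(j + 1) - math.lgamma(N - j + 1)
                      for j in range(N + 1)])
    log_w = log_w + theta * k + Theta * k * (k - 1) / (2.0 * N)
    p = np.exp(log_w - log_w.max())
    return k, p / p.sum()

def information_ratio(theta, Theta, dtheta, dTheta, N):
    """Return I_F3 / I_F1 for the independent decoder of Eq. 121."""
    k, p = count_distribution(theta, Theta, N)
    score = dtheta * k + dTheta * k * (k - 1) / (2.0 * N)   # true score, up to a constant
    k_c, s_c = k - p @ k, score - p @ score
    I_F1 = p @ s_c**2                                      # Var_p(score)
    I_F3 = (p @ (k_c * s_c)) ** 2 / (p @ k_c**2)           # Cov_p(k, score)^2 / Var_p(k)
    return float(I_F3 / I_F1)

for N in (10, 100, 1000):
    print(N, information_ratio(-0.5, 1.0, 1.0, 0.5, N))
```

By the Cauchy-Schwarz inequality the ratio cannot exceed 1, and it approaches 1 as N grows because the score becomes effectively linear in k.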

Discussion

In this work, we introduced a novel framework for investigating information processing in the brain in which we studied information loss caused by two situations: imperfect observations and mismatched decoding. By evaluating the information loss caused by non-simultaneous observations of neural responses, we can quantify the importance of correlated activity. This can also be quantified by similarly evaluating the information loss caused by mismatched decoding that ignores neural correlation. We discussed these two types of loss by giving the information geometric interpretations of inferences with partially observed data and those with a mismatched decoding model and elucidated their relationship. We showed that the information loss associated with ignoring correlation in decoding is never smaller than that caused by non-simultaneous observations of neural responses. This is because the inference based on an independent decoding model with complete data is equivalent to the inference based on an independent decoding model with “partial” data where the data of neural correlation are lost, which cannot be better than the inference based on a correct decoding model with the partial data. This can also be intuitively understood by considering that decoding without the data of correlation considers all possible models of neural correlations, including the correct one, whereas decoding with a mismatched model is locked into the wrong model.

Taking account of the relationship between the two inference methods, we give a simple guide on how to use the two different measures for quantifying the importance of correlation. To address the importance of coincidence detection by downstream neurons without making any specific assumption about the decoding process in higher-order areas in the brain, we should evaluate the information loss caused by non-simultaneous observations. The information loss quantified in this way can be used as the lower bound on the information conveyed by correlated activity. In contrast, the information loss quantified by using the independent decoding model can be used as the upper bound on the information conveyed by correlated activity because neural correlation is ignored not only in the observations but also in the decoding in this quantification. In summary, we consider that both measures should be computed when quantifying the importance of correlations and should be used as the lower bound and upper bound for the importance of correlations.

We considered the case that the stimulus is represented as a continuous variable. In this case, the Fisher information is a suitable measure for quantifying the maximal amount of information that can be extracted from neural responses. When considering a set of discrete stimuli, which is mostly the case for neurophysiological experiments, we should use the mutual information instead of the Fisher information. Similar to the Fisher information, the mutual information obtained by the inference with partially observed data can be computed by marginalizing the probability distribution of neural responses over the sufficient statistics that correspond to the lost data. In addition, the mutual information obtained by the mismatched decoding model can be computed using information derived by Merhav et al. (Merhav et al., 1994; Latham and Nirenberg, 2005; Amari and Nakahara, 2006; Oizumi et al., 2010). Thus, when we use the mutual information, we can also quantify the importance of correlation with the concepts of missing data and mismatched decoding. We analyzed the binary probabilistic model with a homogeneous neural population and found that neural correlation does not carry any information in this particular case. It is important to study how much information correlated activity could carry in an inhomogeneous neural population such as the hypercolumn model in V1. Another interesting future direction would be to evaluate the information conveyed by higher-order correlation (Amari et al., 2003); we considered only pair-wise correlation in this paper. These issues remain as future work.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This work was supported by a Grant-in-Aid for JSPS Fellows (08J08950) to Masafumi Oizumi. Masato Okada was supported by Grants-in-Aid for Scientific Research (18079003, 20240020, 20650019) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Appendix

Calculation of Fisher information

In this appendix, we provide derivations for Eqs. 61–64, 70–73, and 89–91. First, we derive a useful relationship which holds for the exponential family. From the definition of the exponential family (Eq. 2), we obtain

\frac{\partial \log p (r; s)}{\partial θ_{I}} = R_{I} (r) - \frac{\partial ψ (s)}{\partial θ_{I}} .

(128)

Taking the average of this equation, we find

E_{p} [R_{I} (r)] = \frac{\partial ψ (s)}{\partial θ_{I}} .

(129)

By using this relationship, we can derive Eqs. 61–63 as follows,

\begin{matrix} - \frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial θ_{i} \partial θ_{j}} = T \frac{\partial^{2} ψ (s)}{\partial θ_{i} \partial θ_{j}}, \\ = T \frac{\partial f_{i}}{\partial θ_{j}}, \\ = T C_{i j}, \end{matrix}

(130)

\begin{array}{l} - \frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial θ_{i} \partial Θ_{j k}} = T \frac{\partial (f_{j} f_{k} + C_{j k})}{\partial θ_{i}}, \\ = T (\frac{\partial f_{j}}{\partial θ_{i}} f_{k} + \frac{\partial f_{k}}{\partial θ_{i}} f_{j}), \\ = T (C_{j i} f_{k} + C_{k i} f_{j}), \end{array}

(131)

and

\begin{matrix} - \frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial Θ_{i j} \partial Θ_{k l}} = T \frac{\partial (f_{i} f_{j} + C_{i j})}{\partial Θ_{k l}}, \\ = T (\frac{\partial f_{i}}{\partial Θ_{k l}} f_{j} + \frac{\partial f_{j}}{\partial Θ_{k l}} f_{i} + \frac{\partial C_{i j}}{\partial Θ_{k l}}), \\ = T (2 C_{i k} f_{l} f_{j} + 2 C_{j k} f_{l} f_{i} + 2 C_{i k} C_{l j}), \end{matrix}

(132)

where we used Eqs. 58 and 59. We compute the Fisher information with respect to stimulus s by the following equation,

\begin{array}{l} I_{F} (s) = - \sum_{i, j} \frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial θ_{i} \partial θ_{j}} \frac{\partial θ_{i} (s)}{\partial s} \frac{\partial θ_{j} (s)}{\partial s} \\ - 2 \sum_{i, j, k} \frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial θ_{i} \partial Θ_{j k}} \frac{\partial θ_{i} (s)}{\partial s} \frac{\partial Θ_{j k} (s)}{\partial s} \\ - \sum_{i, j, k, l} \frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial Θ_{i j} \partial Θ_{k l}} \frac{\partial Θ_{i j} (s)}{\partial s} \frac{\partial Θ_{k l} (s)}{\partial s} . \end{array}

(133)

By using Eqs. 130–132, we have

\begin{array}{l} - \sum_{i, j} \frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial θ_{i} \partial θ_{j}} \frac{\partial θ_{i} (s)}{\partial s} \frac{\partial θ_{j} (s)}{\partial s} \\ = T \sum_{i, j, k, l} C_{i j} (C_{i k}^{- 1} {f^{'}}_{k} + {C^{'}}_{i k}^{- 1} f_{k}) (C_{j l}^{- 1} {f^{'}}_{l} + {C^{'}}_{j l}^{- 1} f_{l}), \\ = T ({f^{'}}^{T} C^{- 1} f^{'} + 2 f^{T} {C^{'}}^{- 1} C C^{- 1} f^{'} + f^{T} {C^{'}}^{- 1} C {C^{'}}^{- 1} f), \end{array}

(134)

\begin{array}{l} - 2 \sum_{i, j, k} \frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial θ_{i} \partial Θ_{j k}} \frac{\partial θ_{i} (s)}{\partial s} \frac{\partial Θ_{j k} (s)}{\partial s} \\ = 2 T \sum_{i, j, k, l} (C_{j i} f_{k} + C_{k i} f_{j}) (C_{i l}^{- 1} {f^{'}}_{l} + {C^{'}}_{i l}^{- 1} f_{l}) (- \frac{1}{2} {C^{'}}_{j k}^{- 1}), \\ = - T (2 f^{T} {C^{'}}^{- 1} C C^{- 1} f^{'} + 2 f^{T} {C^{'}}^{- 1} C {C^{'}}^{- 1} f), \end{array}

(135)

\begin{array}{l} - \sum_{i, j, k, l} \frac{\partial^{2} \log p (\bar{r}, \bar{R})}{\partial Θ_{i j} \partial Θ_{k l}} \frac{\partial Θ_{i j} (s)}{\partial s} \frac{\partial Θ_{k l} (s)}{\partial s} \\ = T \sum_{i, j, k, l} (2 C_{i k} f_{l} f_{j} + 2 C_{j k} f_{l} f_{i} + 2 C_{i k} C_{l j}) (- \frac{1}{2} {C^{'}}_{i j}^{- 1}) (- \frac{1}{2} {C^{'}}_{k l}^{- 1}), \\ = T (\frac{1}{2} Tr [C^{'} C^{- 1} C^{'} C^{- 1}] + f^{T} {C^{'}}^{- 1} C {C^{'}}^{- 1} f) . \end{array}

(136)

Taken together, we can obtain Eq. 64. By similar computations, we can also obtain Eqs. 70–73.
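The cancellation of the f-dependent terms can be confirmed numerically. The sketch below contracts the Fisher matrix components of Eqs. 130–132 (per observation, T = 1) with the derivatives of the natural parameters θ = C⁻¹f and Θ = −C⁻¹/2, and compares the total with the closed form of Eq. 64. The random test matrices and all variable names are hypothetical; `dC_inv` denotes the derivative of C⁻¹ with respect to s.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3
A = rng.normal(size=(N, N))
C = A @ A.T + N * np.eye(N)                        # covariance (positive definite)
B = rng.normal(size=(N, N))
dC = 0.5 * (B + B.T)                               # dC/ds (symmetric)
f, df = rng.normal(size=N), rng.normal(size=N)     # f(s) and df/ds

C_inv = np.linalg.inv(C)
dC_inv = -C_inv @ dC @ C_inv                       # d(C^{-1})/ds
dtheta = C_inv @ df + dC_inv @ f                   # derivative of theta = C^{-1} f
dTheta = -0.5 * dC_inv                             # derivative of Theta = -C^{-1}/2

# Fisher matrix components with respect to the natural parameters (Eqs. 130-132).
G_tT = np.einsum('ji,k->ijk', C, f) + np.einsum('ki,j->ijk', C, f)
G_TT = 2 * (np.einsum('ik,l,j->ijkl', C, f, f)
            + np.einsum('jk,l,i->ijkl', C, f, f)
            + np.einsum('ik,lj->ijkl', C, C))

term_tt = dtheta @ C @ dtheta
term_tT = 2 * np.einsum('ijk,i,jk->', G_tT, dtheta, dTheta)
term_TT = np.einsum('ijkl,ij,kl->', G_TT, dTheta, dTheta)
total = term_tt + term_tT + term_TT

expected = df @ C_inv @ df + 0.5 * np.trace(dC @ C_inv @ dC @ C_inv)   # Eq. 64
print(total, expected)
```

In this check the mixed θ-Θ term is negative and removes the f-dependent parts of the other two terms, leaving only the two terms of Eq. 64.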

Eqs. 89–91 can be derived as follows,

\begin{matrix} - \frac{\partial^{2} \log p ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})}{\partial θ_{i} \partial θ_{j}} \\ = T \frac{\partial}{\partial θ_{j}} (\frac{1}{Z} \sum_{r_{1}, r_{2} = 0, 1} r_{i} \exp (\sum_{i = 1}^{2} θ_{i} (s) r_{i} + Θ (s) r_{1} r_{2})) \\ = T (〈 r_{i} r_{j} 〉 - 〈 r_{i} 〉 〈 r_{j} 〉) . \end{matrix}

(137)

\begin{matrix} - \frac{\partial^{2} \log p ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})}{\partial Θ^{2}} \\ = T \frac{\partial}{\partial Θ} (\frac{1}{Z} \sum_{r_{1}, r_{2} = 0, 1} r_{1} r_{2} \exp (\sum_{i = 1}^{2} θ_{i} (s) r_{i} + Θ (s) r_{1} r_{2})) \\ = T (〈 r_{1} r_{2} 〉 - {〈 r_{1} r_{2} 〉}^{2}), \end{matrix}

(138)

\begin{array}{l} - \frac{\partial^{2} \log p ({\bar{r}}_{1}, {\bar{r}}_{2}, {\bar{R}}_{12})}{\partial θ_{i} \partial Θ} \\ = T \frac{\partial}{\partial θ_{i}} (\frac{1}{Z} \sum_{r_{1}, r_{2} = 0, 1} r_{1} r_{2} \exp (\sum_{i = 1}^{2} θ_{i} (s) r_{i} + Θ (s) r_{1} r_{2})) \\ = T (〈 r_{1} r_{2} 〉 - 〈 r_{i} 〉 〈 r_{1} r_{2} 〉) . \end{array}

(139)

References

Abbott L. F., Dayan P. (1999). The effect of correlated variability on the accuracy of a population code. Neural Comput. 11, 91–101.
Abeles M. (1982). Role of the cortical neuron: integrator or coincidence detector? Isr. J. Med. Sci. 18, 83–92.
Amari S. (1982). Differential geometry of curved exponential families – curvatures and information loss. Ann. Stat. 10, 357–385. doi:10.1214/aos/1176345779
Amari S. (1995). Information geometry of the EM and em algorithms for neural networks. Neural Netw. 8, 1379–1408. doi:10.1016/0893-6080(95)00003-8
Amari S. (2001). Information geometry on hierarchy of probability distributions. IEEE Trans. Inf. Theory 47, 1701–1711.
Amari S., Nagaoka H. (2000). Methods of Information Geometry. New York: Oxford University Press.
Amari S., Nakahara H. (2006). Correlation and independence in the neural code. Neural Comput. 18, 1259–1267. doi:10.1162/neco.2006.18.6.1259
Amari S., Nakahara H., Wu S., Sakai Y. (2003). Synchronous firing and higher-order interactions in neuron pool. Neural Comput. 15, 127–142.
Averbeck B. B., Latham P. E., Pouget A. (2006). Neural correlations, population coding and computation. Nat. Rev. Neurosci. 7, 358–366.
Averbeck B. B., Lee D. (2006). Effects of noise correlations on information encoding and decoding. J. Neurophysiol. 95, 3633–3644. doi:10.1152/jn.00919.2005
Brunel N., Nadal J. (1998). Mutual information, Fisher information, and population coding. Neural Comput. 10, 1731–1757.
Ecker A., Berens P., Keliris G., Bethge M., Logothetis N., Tolias A. (2010). Decorrelated neuronal firing in cortical microcircuits. Science 327, 584–587. doi:10.1126/science.1179867
Gawne T. J., Richmond B. J. (1993). How independent are the messages carried by adjacent inferior temporal cortical neurons? J. Neurosci. 13, 2758–2771.
Golledge H. D., Panzeri S., Zheng F., Pola G., Scannell J. W., Giannikopoulos D. V., Mason R. J., Tovée M. J., Young M. P. (2003). Correlations, feature-binding and population coding in primary visual cortex. Neuroreport 14, 1045–1050. doi:10.1097/00001756-200305230-00028
Gray C. M., König P., Engel A. K., Singer W. (1989). Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature 338, 334–337. doi:10.1038/338334a0
Ishikane H., Gangi M., Honda S., Tachibana M. (2005). Synchronized retinal oscillations encode essential information for escape behavior in frogs. Nat. Neurosci. 8, 1087–1095.
König P., Engel A. K., Singer W. (1996). Integrator or coincidence detector? The role of the cortical neuron revisited. Trends Neurosci. 19, 130–137. doi:10.1016/S0166-2236(96)80019-1
Latham P. E., Nirenberg S. (2005). Synergy, redundancy, and independence in population codes, revisited. J. Neurosci. 25, 5195–5206. doi:10.1523/JNEUROSCI.5319-04.2005
Lee D., Port N. L., Kruse W., Georgopoulos A. P. (1998). Variability and correlated noise in the discharge of neurons in motor and parietal areas of the primate cortex. J. Neurosci. 18, 1161–1170.
Meister M., Lagnado L., Baylor D. A. (1995). Concerted signaling by retinal ganglion cells. Science 270, 1207–1210. doi:10.1126/science.270.5239.1207
Merhav N., Kaplan G., Lapidoth A., Shamai Shitz S. (1994). On information rates for mismatched decoders. IEEE Trans. Inf. Theory 40, 1953–1967.
Montani F., Ince R. A. A., Senatore R., Arabzadeh E., Diamond M., Panzeri S. (2009). The impact of higher-order interactions on the rate of synchronous discharge and information transmission in somatosensory cortex. Philos. Trans. R. Soc. A 367, 3297–3310.
Nakahara H., Amari S. (2002). Information-geometric measure for neural spikes. Neural. Comput. 14, 2269–2316 [DOI] [PubMed] [Google Scholar]
Nirenberg S., Carcieri S. M., Jacobs A. L., Latham P. E. (2001). Retinal ganglion cells act largely as independent encoders. Nature 411, 698–701 10.1038/35079612 [DOI] [PubMed] [Google Scholar]
Nirenberg S., Latham P. E. (2003). Decoding neural spike trains: how important are correlations? Proc. Natl. Acad. Sci. U.S.A. 100, 7348–7353 [DOI] [PMC free article] [PubMed] [Google Scholar]
Ohiorhenuan I. E., Mechler F., Purpura K. P., Schmid A. M., Hu Q., Victor J. D. (2010). Sparse coding and high-order correlations in fine-scale cortical networks. Nature 466, 617–622 10.1038/nature09178 [DOI] [PMC free article] [PubMed] [Google Scholar]
Oizumi M., Ishii T., Ishibashi K., Hosoya T., Okada M. (2010). Mismatched decoding in the brain. J. Neurosci. 30, 4815–4826 10.1523/JNEUROSCI.4360-09.2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
Schneidman E., Berry M. J., II, Segev R., Bialek W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440, 1007–1012 10.1038/nature04701 [DOI] [PMC free article] [PubMed] [Google Scholar]
Shlens J., Field G. D., Gauthier J. L., Grivich M. I., Petrusca D., Sher A., Litke A. M., Chichilnisky E. J. (2006). The structure of multi-neuron firing patterns in primate retina. J. Neurosci. 260, 8254–8266 [DOI] [PMC free article] [PubMed] [Google Scholar]
Tang A., Jackson D., Hobbs J., Chen W., Smith J. L., Patel H., Prieto A., Petrusca D., Grivich M. I., Sher A., Hottowy P., Dabrowski W., Litke A. M., Beggs J. M. (2008). A maximum entropy model applied to spatial and temporal correlations from cortical networks in vitro. J. Neurosci. 28, 505–518 10.1523/JNEUROSCI.3359-07.2008 [DOI] [PMC free article] [PubMed] [Google Scholar]
Wu S., Nakahara H., Amari S. (2001). Population coding with correlation and an unfaithful model. Neural Comput. 13, 775–797 10.1162/089976601300014349 [DOI] [PubMed] [Google Scholar]
Yaeli S., Meir R. (2010). Error-based analysis of optimal tuning functions explains phenomena observed in sensory neurons. Front. Comput. Neurosci. 4:130 10.3389/fncom.2010.00130 [DOI] [PMC free article] [PubMed] [Google Scholar]
Zohary E., Shadlen M. N., Newsome W. T. (1994). Correlated neuronal discharge rate and its implications for psychophysical performance. Nature 370, 140–143 10.1038/370140a0 [DOI] [PubMed] [Google Scholar]
