Proceedings of the National Academy of Sciences of the United States of America. 2016 Dec 6;113(51):14817–14822. doi: 10.1073/pnas.1603583113

Unified framework for information integration based on information geometry

Masafumi Oizumi a,b,1, Naotsugu Tsuchiya b,c,d, Shun-ichi Amari a
PMCID: PMC5187746  PMID: 27930289

Significance

Measuring the degree of causal influences among multiple elements of a system is a fundamental problem in physics and biology. We propose a unified framework for quantifying any combination of causal relationships between elements in a hierarchical manner based on information geometry. Our measure of integration, called geometrical integrated information, quantifies the strength of multiple causal influences among elements by projecting the probability distribution of a system onto a constrained manifold. This measure overcomes mathematical problems of existing measures and enables an intuitive understanding of the relationships between integrated information and other measures of causal influence such as transfer entropy. Inspired by the integration of neural activity in consciousness studies, our measure should have general utility in analyzing complex systems.

Keywords: integrated information, mutual information, transfer entropy, information geometry, consciousness

Abstract

Assessment of causal influences is a ubiquitous and important subject across diverse research fields. Drawn from consciousness studies, integrated information is a measure that defines integration as the degree of causal influences among elements. Whereas pairwise causal influences between elements can be quantified with existing methods, quantifying multiple influences among many elements poses two major mathematical difficulties. First, overestimation occurs due to interdependence among influences if each influence is separately quantified in a part-based manner and then simply summed over. Second, it is difficult to isolate causal influences while avoiding noncausal confounding influences. To resolve these difficulties, we propose a theoretical framework based on information geometry for the quantification of multiple causal influences with a holistic approach. We derive a measure of integrated information, which is geometrically interpreted as the divergence between the actual probability distribution of a system and an approximated probability distribution where causal influences among elements are statistically disconnected. This framework provides intuitive geometric interpretations harmonizing various information theoretic measures in a unified manner, including mutual information, transfer entropy, stochastic interaction, and integrated information, each of which is characterized by how causal influences are disconnected. In addition to the mathematical assessment of consciousness, our framework should help to analyze causal relationships in complex systems in a complete and hierarchical manner.


Quantitative assessment of causal influences among elements in a complex system is a fundamental problem in many fields of science, including physics (1), economics (2), gene networks (3), social networks (4), ecosystems (5), and neuroscience (6). There have been many previous attempts to quantify causal influences between elements in stochastic systems. Information theory has played a pivotal role in these endeavors, leading to various measures, including predictive information (7), transfer entropy (8), and stochastic interaction (9). Drawn from consciousness studies involving measurement of integration of neural activity (10, 11), the mathematical concept of integrated information is also useful as a framework for analyzing causal relationships in complex systems with multiple elements.

Recent research suggests that the brain loses the ability to integrate information when consciousness is lost during dreamless sleep (12), general anesthesia (13), or vegetative states (14), suggesting that quantifying integration of information can serve as a neurophysiological marker of consciousness (10, 11, 15). The integrated information theory (IIT) of consciousness (16, 17) proposes a measure of integration called integrated information that quantifies multiple causal influences among elements of a system. Integrated information is theoretically motivated by the holistic property of consciousness experienced as a unified whole that is irreducible into separate parts or experiences. Whereas the original motivation for integrated information is intended to elucidate the neural substrate of consciousness, it can in principle be applied to many research fields.

Despite its broad potential impact, the application of integrated information (16, 18) to experimental data is severely limited (19, 20) because the original measure was derived under restricted conditions, wherein the probability distribution of past states in a system is assumed to be uniform and the variables to be discrete (18). In an effort to broaden its applicability, several measures have been proposed under general conditions (9, 19, 21). However, these proposed measures suffer from mathematical problems. Quantification of a pairwise causal influence from one element to another can be achieved with existing measures, but quantifying multiple causal influences among many parts poses the problems of overestimation and confounding noncausal influences. To overcome these problems, we propose a unified framework for quantifying causal influences based on information geometry (22). The measure we propose, called “geometric integrated information” Φ_G, overcomes the described difficulties, provides geometric interpretations of existing measures, and elucidates the relationships among the measures in a hierarchical manner. The mathematical solution we derive should have broad utility in elucidating complex systems.

Three Postulates on Strength of Influences

We propose a unified theoretical framework for quantifying the strength of spatiotemporal influences based on three postulates. Let us consider a stochastic dynamical system in which the past and present states of the system are given by X = {x_1, x_2, ..., x_N} and Y = {y_1, y_2, ..., y_N}, respectively, where N is the number of elements in the system. Information about X is integrated by influences among elements and transmitted to Y. The spatiotemporal influences of the system are fully characterized by the joint probability distribution p(X,Y). We call p(X,Y) a “full model.” In a dynamical system characterized by p(X,Y), there are three different types of influences. Influences between elements at the same time (called equal-time influences) can be quantified by analyzing only the marginal distributions p(X) or p(Y). Influences across different time points (called across-time influences) can be further divided into those among different units (cross-influences) and those within the same unit (self-influences). The across-time influences can be quantified from the conditional probability distribution p(Y|X). They are also known as causal influences (2, 8), in the sense that causality is statistically inferred from conditional probability distributions, although it does not necessarily mean actual physical causality (23). Here, we use the term causality in this context and focus on quantifying causal influences.

For quantifying causal influences (both self- and cross-influences) among elements of X and Y, consider approximating the probability distribution p(X,Y) by another probability distribution q(X,Y) in which the influences of interest are statistically disconnected. We call q(X,Y) a “disconnected model.” The strength of influences can be quantified by the extent to which the corresponding disconnected model q(X,Y) can approximate the full model p(X,Y). The goodness of the approximation can be evaluated by the difference between the two probability distributions p(X,Y) and q(X,Y). Minimizing the difference between p(X,Y) and q(X,Y) corresponds to finding the best approximation of p(X,Y) by a disconnected model q(X,Y). From this reasoning, we propose the first postulate as follows.

Postulate 1. Strength of influences is quantified by a minimized difference between the full model and a disconnected model.

The second postulate is used to define a disconnected model. Consider partitioning the elements of a system into m parts, X = {X_1, X_2, ..., X_m} and Y = {Y_1, Y_2, ..., Y_m}, where X_i and Y_i contain the same elements of the system. To avoid the confounds of noncausal influences, we should minimally disconnect only the influences of interest without affecting the rest. To define such a minimal operation of statistically disconnecting influences from X_i to Y_j, we propose the second postulate as follows.

Postulate 2. A disconnected model, where influences from X_i to Y_j are disconnected, satisfies the Markov condition X_i → \bar{X}_i → Y_j, where \bar{X}_i is the complement of X_i in X; that is, \bar{X}_i = X \setminus X_i.

The Markov condition X_i → \bar{X}_i → Y_j means that X_i and Y_j are conditionally independent given \bar{X}_i,

q(X_i, Y_j | \bar{X}_i) = q(X_i | \bar{X}_i) q(Y_j | \bar{X}_i). [1]

Under the Markov condition, there is no direct influence from X_i on Y_j when the states of the other elements \bar{X}_i are fixed.

The third postulate defines the measure of the difference between the full model and a disconnected model, which is denoted by D[p:q]. There are many possible ways to quantify the difference between two probability distributions (22, 24). We consider several theoretical requirements that the measure of difference should satisfy to have desirable mathematical properties (details in Supporting Information): (i) D[p:q] should be nonnegative and equal 0 if and only if p = q, (ii) D[p:q] should be invariant under invertible transformations of random variables, (iii) D[p:q] should be decomposable, and (iv) D[p:q] should be flat. We can prove that the only measure that satisfies all of these theoretical requirements is the well-known Kullback–Leibler (KL) divergence (22). Thus, we propose the third postulate as follows.

Postulate 3. A difference between the full model and a disconnected model is measured by KL divergence.

Taken together, the strength of causal influences from X_i to Y_j, ci[X_i → Y_j], is quantified by the minimized KL divergence,

ci[X_i → Y_j] = \min_{q(X,Y)} D_KL[p(X,Y) || q(X,Y)], [2]

under the constraint of the Markov condition given by Eq. 1.

A Unified Derivation of Existing Measures

In this section, we derive existing measures from the unified framework and provide interpretations of them.

Total Causal Influences: Mutual Information.

First, consider quantifying the total strength of causal influences between the past and present states. From the operation of disconnections given by Eq. 1, the influences from all elements X to Y are disconnected by forcing X and Y to be independent,

q(X,Y)=q(X)q(Y). [3]

The disconnected model is graphically represented in Fig. 1A. To introduce the perspective of information geometry, consider a manifold of probability distributions M_F, where each point in the manifold represents a probability distribution p(X,Y) (a full model). Consider also a manifold M_I where X and Y are independent, which means that there are no causal influences between X and Y. A probability distribution q(X,Y) (a disconnected model) is represented as a point in the manifold M_I. In general, the actual probability distribution p(X,Y) is represented as a point outside the submanifold M_I (Fig. 2). The difference between the two probability distributions is quantified by the KL divergence,

D_KL[p(X,Y) || q(X,Y)] = \sum_{X,Y} p(X,Y) log ( p(X,Y) / q(X,Y) ). [4]
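As a practical note, Eq. 4 can be evaluated directly for discrete distributions. The following minimal sketch (our own illustration, not code from the paper) assumes the joint distributions are stored as NumPy arrays that sum to 1.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL[p || q] in nats; p and q are arrays of the same shape that sum to 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > eps                      # terms with p = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```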

Fig. 1.

(A–D) Minimizing the Kullback–Leibler (KL) divergence between the full and the disconnected model leads to various information theoretic quantities: (A) mutual information, (B) transfer entropy, (C) integrated information, and (D) stochastic interaction. Constraints imposed on the disconnected model are graphically shown.

Fig. 2.

Information geometric picture for minimizing the KL divergence between the full model p(X,Y), which resides in the manifold M_F, and the disconnected model q(X,Y), which resides in the manifold M_I. q(X,Y) is the point in M_I that is closest to p(X,Y).

We consider finding the closest point q* to p within the submanifold M_I, which minimizes the KL divergence between p(X,Y) and q(X,Y) ∈ M_I (Fig. 2). This corresponds to finding the best approximation of p(X,Y). The minimizer of the KL divergence is derived by orthogonally projecting the point p(X,Y) onto the manifold M_I according to the projection theorem in information geometry (22) (Supporting Information). In the present case, p, the closest point q*, and any point q in M_I form an orthogonal triangle. Thus, the following Pythagorean relation holds: D(p||q) = D(p||q*) + D(q*||q). From the Pythagorean relation, we can find that the KL divergence is minimized when the marginal distributions of q(X,Y) over X and Y are both equal to those of the actual distribution p(X,Y); i.e., q(X) = p(X) and q(Y) = p(Y). The minimized KL divergence is given by

\min_q D_KL[p||q] = H(Y) − H(Y|X) [5]
= I(X;Y), [6]

where H(Y) is the entropy of Y, H(Y|X) is the conditional entropy of Y given X, and I(X;Y) is the mutual information between X and Y. From this derivation, we can interpret the mutual information between X and Y as the total causal influence between X and Y. The mutual information between the present and past states can also be interpreted as the degree of predictability of the present states given the past states and has been termed predictive information (7).
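As an illustration of this result, the sketch below (our own example, not from the paper) computes the mutual information of a small discrete system as the KL divergence from the joint distribution to the product of its marginals, i.e., to the closest disconnected model q*(X)q*(Y).

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) = D_KL[p(X,Y) || p(X)p(Y)] for a joint distribution p_xy[x, y]."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    q = p_x * p_y                                 # best disconnected model q*(X)q*(Y)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / q[mask])))

# Example: a binary channel y = x with flip probability 0.1 and uniform input.
p = np.array([[0.45, 0.05],
              [0.05, 0.45]])
print(mutual_information(p))                      # approximately 0.368 nats
```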

Partial Causal Influences: Conditional Transfer Entropy.

Next, consider quantifying a partial causal influence from one element to another in the system. From the operation of disconnections in Eq. 1, a partial causal influence from x_i to y_j is disconnected by q satisfying

q(x_i, y_j | \bar{x}_i) = q(x_i | \bar{x}_i) q(y_j | \bar{x}_i), [7]

where \bar{x}_i is the past states of all of the variables other than x_i. Under this constraint, the KL divergence is minimized when q(X) = p(X), q(y_j|X) = p(y_j|\bar{x}_i), and q(\bar{y}_j|X, y_j) = p(\bar{y}_j|X, y_j) (Supporting Information). The minimized KL divergence is found to be equal to the conditional transfer entropy,

\min_q D_KL[p||q] = H(y_j|\bar{x}_i) − H(y_j|X) [8]
= TE(x_i → y_j | \bar{x}_i), [9]

where TE(x_i → y_j | \bar{x}_i) is the conditional transfer entropy from x_i to y_j given \bar{x}_i. Thus, we can interpret the conditional transfer entropy as the strength of the partial causal influence from x_i to y_j.
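The following sketch (again our own illustration) computes this conditional transfer entropy for a two-unit discrete system whose joint distribution p(x_1, x_2, y_1, y_2) is given as a four-dimensional array; the axis convention is an assumption of the example.

```python
import numpy as np

def _cond_entropy(joint):
    """H(last axis | remaining axes) for a joint probability array."""
    marg = joint.sum(axis=-1, keepdims=True)
    ratio = np.divide(joint, marg, out=np.ones_like(joint), where=marg > 0)
    mask = joint > 0
    return -float(np.sum(joint[mask] * np.log(ratio[mask])))

def conditional_transfer_entropy(p):
    """TE(x1 -> y2 | x2) = H(y2 | x2) - H(y2 | x1, x2) for a joint array p[x1, x2, y1, y2]."""
    p_x1x2y2 = p.sum(axis=2)                      # marginalize over y1 -> p(x1, x2, y2)
    p_x2y2 = p_x1x2y2.sum(axis=0)                 # p(x2, y2)
    return _cond_entropy(p_x2y2) - _cond_entropy(p_x1x2y2)
```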

A Measure of Integrated Information

Integrated information is defined as a measure that quantifies the strength of all causal influences among the parts of a system. In the case of two units, integrated information should quantify both the causal influence from x_1 to y_2 and that from x_2 to y_1. It aims to quantify the extent to which the whole system exerts more synergistic influences on its future than the parts of the system independently do and, thus, the irreducibility of the whole system into independent parts (16). Accordingly, integrated information is theoretically required to be nonnegative and upper bounded by the total causal influence in the whole system, which in our framework is the mutual information between the past and present states I(X;Y), as shown above (20). Based on Postulates 1–3, we uniquely derive a measure of integrated information by imposing the corresponding constraints, and it naturally satisfies these theoretical requirements.

Consider again partitioning a system into m parts. By applying the operation in Eq. 1 for all pairs of i and j (i ≠ j), we can find that all causal influences among the parts are disconnected by the condition

q(Y_i|X) = q(Y_i|X_i) (for all i). [10]

To quantify integrated information, we consider a manifold M_G constrained by Eq. 10. Note that within M_G, the present states in a part Y_i directly depend only on its own past states, X_i, and thus the transfer entropies from one part X_i to all of the other parts Y_j (j ≠ i) are 0. Now we propose a measure of integrated information, called geometric integrated information Φ_G, as the minimized KL divergence between the actual distribution p(X,Y) and the disconnected distribution q(X,Y) within M_G:

Φ_G = \min_{q ∈ M_G} D_KL[p||q]. [11]

The manifold M_G formed by the constraints for integrated information (Eq. 10) includes the manifold M_I formed by the constraints for mutual information (Eq. 3); i.e., M_I ⊂ M_G. Because minimizing the KL divergence in a larger space always leads to a smaller value, Φ_G is always smaller than or equal to the mutual information I(X;Y):

0 ≤ Φ_G ≤ I(X;Y). [12]

Thus, Φ_G, uniquely derived from Postulates 1–3, naturally satisfies the theoretical requirements for integrated information.

Comparisons with Other Measures

The Sum of Transfer Entropies.

For simplicity, consider a system consisting of two variables (Fig. 1). Conceptually, a measure of integrated information should be designed to quantify the strength of the two causal influences from x_1 to y_2 and from x_2 to y_1 (Fig. 1C). Because each causal influence is quantified by the transfer entropy, TE(x_1 → y_2 | x_2) or TE(x_2 → y_1 | x_1), one may naively think that the sum of transfer entropies can be used as a valid measure of integrated information and may be the same as Φ_G. In contrast with this naive intuition, the sum of transfer entropies is not equal to Φ_G and, moreover, it can exceed the mutual information between X and Y, which violates the important theoretical requirement for a measure of integrated information (Eq. 12). When there is strong dependence between y_1 and y_2, simply taking the sum of transfer entropies leads to overestimation of the total strength of causal influences. An extreme case where such overestimation occurs is when y_1 and y_2 are copies of each other.

As a simple example, consider a system consisting of two binary units, each of which takes one of two states, 0 or 1. Assume that the probability distribution of the past states x_1 and x_2 is uniform; i.e., p(x_1, x_2) = 1/4. The present state of unit 1, y_1, is determined by the AND operation of the past states x_1 and x_2; that is, y_1 becomes 1 if both x_1 and x_2 are 1, and it becomes 0 otherwise. On the other hand, y_2 is determined by a “noisy” AND operation in which the state of y_2 flips with a certain probability r; i.e., p(y_2 = 1) = 1 − r if (x_1, x_2) = (1, 1) and p(y_2 = 1) = r if (x_1, x_2) = (0, 0), (0, 1), (1, 0), where r determines the noise level. As the noise level of the noisy AND operation decreases, the dependence between y_1 and y_2 gets stronger. When there is no noise, i.e., r = 0, y_1 and y_2 are completely equal. We varied the strength of dependence by changing the noise level and calculated the transfer entropies and Φ_G (see Supporting Information for the computation of Φ_G in the binary case) (Fig. 3). As the noise level decreases, the transfer entropy from x_1 to y_2 increases but the mutual information stays the same because y_2, which is a noisy AND gate, does not add any information about the input X beyond the information already provided by y_1, which is the perfect AND gate. When the noise level is low and thus the dependence between y_1 and y_2 is strong, the sum of transfer entropies exceeds the mutual information.
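A hedged numerical sketch of this example follows. It constructs the joint distribution of the AND / noisy-AND system for a given noise level r and compares the mutual information with the sum of the two transfer entropies; the helper functions and the axis ordering (x_1, x_2, y_1, y_2) are our own choices, not code from the paper.

```python
import numpy as np

def joint_and_noisy_and(r):
    """p[x1, x2, y1, y2] for the AND / noisy-AND system with flip probability r."""
    p = np.zeros((2, 2, 2, 2))
    for x1 in (0, 1):
        for x2 in (0, 1):
            y1 = x1 & x2                                     # unit 1: perfect AND gate
            for y2 in (0, 1):
                p_y2 = (1 - r) if y2 == (x1 & x2) else r     # unit 2: noisy AND gate
                p[x1, x2, y1, y2] = 0.25 * p_y2              # uniform past states
    return p

def entropy(p):
    mask = p > 0
    return -float(np.sum(p[mask] * np.log(p[mask])))

def mutual_information(p):
    """I(X;Y) with X = (x1, x2) and Y = (y1, y2)."""
    return entropy(p.sum(axis=(2, 3))) + entropy(p.sum(axis=(0, 1))) - entropy(p)

def transfer_entropy(p, src, dst):
    """TE(x_src -> y_dst | x_other) for the two-unit system."""
    p_xy = p.sum(axis=2 + (1 - dst))                         # marginalize out the other y
    h_full = entropy(p_xy) - entropy(p_xy.sum(axis=2))       # H(y_dst | x1, x2)
    p_red = p_xy.sum(axis=src)                               # drop the source unit
    h_red = entropy(p_red) - entropy(p_red.sum(axis=1))      # H(y_dst | x_other)
    return h_red - h_full

for r in (0.4, 0.2, 0.05, 0.0):
    p = joint_and_noisy_and(r)
    te_sum = transfer_entropy(p, 0, 1) + transfer_entropy(p, 1, 0)
    print(f"r={r:.2f}  I(X;Y)={mutual_information(p):.3f}  sum of TEs={te_sum:.3f}")
```

For small r the printed sum of transfer entropies should exceed I(X;Y), reproducing the overestimation described above.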

Fig. 3.

Comparison between integrated information and the sum of transfer entropies (TE). A system consists of two binary units whose states are determined by an AND gate and a noisy AND gate. When the noise level of the noisy AND gate is low and thus the dependence between the units is strong, the sum of transfer entropies (green line) exceeds the mutual information (black line) whereas integrated information ΦG (red line) is always less than the mutual information. Each transfer entropy (blue solid and dotted lines) is always less than or equal to ΦG.

On the other hand, ΦG never exceeds the amount of mutual information (Fig. 3). ΦG avoids the overestimation by simultaneously evaluating the strength of multiple influences. In contrast, the sum of transfer entropies separately quantifies causal influences by considering only parts of the system. For example, when the transfer entropy from x1 to y2 is quantified, y1 is not taken into consideration, which leads to the overestimation. To accurately evaluate the total strength of multiple influences, we need to take a holistic approach as we proposed to do with ΦG. The flaw of the simple sum of transfer entropies illuminates the limitation of the part-based approach and the advantage of the holistic approach.

A quantity related to the sum of transfer entropies has been proposed: causal density (21). Originally, causal density was proposed as the normalized sum of conditional Granger causalities from one element to another (21). Because transfer entropy is equivalent to Granger causality for Gaussian variables (25), the normalized sum of conditional transfer entropies can be considered a generalization of causal density. Although a simple sum of Granger causalities or transfer entropies is easy to evaluate and would be useful for approximately evaluating the total strength of causal influences, we need to be careful about the problem of overestimation.

Stochastic Interaction.

Another measure, called stochastic interaction (9), was proposed as a different measure of integrated information (19). In the derivation of stochastic interaction, Ay (9) considered a manifold M_S where the conditional probability distribution of Y given X is decomposed into the product of the conditional probability distributions of each part (Fig. 1D):

q(Y|X) = \prod_{i=1}^{m} q(Y_i|X_i). [13]

This constraint satisfies the constraint for integrated information (Eq. 10); thus, M_S ⊂ M_G. In addition, this constraint further imposes conditional independence among the present states of the parts given the past states of the whole system X:

q(Y|X) = \prod_{i=1}^{m} q(Y_i|X). [14]

This constraint corresponds to disconnecting equal-time influences among the present states of the parts given the past states of the whole in addition to across-time influences (Fig. 1D). On the other hand, the constraint in Eq. 10 corresponds to disconnecting only across-time influences (Fig. 1C).

The KL divergence is minimized when q(X) = p(X) and q(Y_i|X_i) = p(Y_i|X_i) (9). The minimized KL divergence is equal to stochastic interaction SI(X;Y):

\min_q D_KL[p||q] = \sum_i H(Y_i|X_i) − H(Y|X) [15]
= SI(X;Y). [16]

In contrast to the manifold M_G considered for Φ_G, the manifold M_S formed by the constraints for stochastic interaction (Eq. 13) does not include the manifold M_I formed by the constraints for the mutual information between X and Y (Eq. 3). This is because not only causal influences but also equal-time influences are disconnected in M_S (Fig. 1D). Stochastic interaction can therefore exceed the total strength of causal influences in the whole system, which violates the theoretical requirement for a measure of integrated information (Eq. 12). Notably, stochastic interaction can be nonzero even when there are no causal influences, i.e., when the mutual information is 0 (20). To summarize, stochastic interaction does not purely quantify causal influences but rather quantifies a mixture of causal influences and simultaneous (equal-time) influences.
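For completeness, stochastic interaction can be computed directly from Eq. 15 for discrete distributions. The sketch below assumes the same two-unit array convention p[x_1, x_2, y_1, y_2] used in the earlier examples; it is an illustration, not the authors' code.

```python
import numpy as np

def entropy(p):
    mask = p > 0
    return -float(np.sum(p[mask] * np.log(p[mask])))

def stochastic_interaction(p):
    """SI(X;Y) = H(y1|x1) + H(y2|x2) - H(Y|X) for a joint array p[x1, x2, y1, y2]."""
    h_Y_given_X = entropy(p) - entropy(p.sum(axis=(2, 3)))
    p1 = p.sum(axis=(1, 3))                     # p(x1, y1)
    p2 = p.sum(axis=(0, 2))                     # p(x2, y2)
    h1 = entropy(p1) - entropy(p1.sum(axis=1))  # H(y1|x1)
    h2 = entropy(p2) - entropy(p2.sum(axis=1))  # H(y2|x2)
    return h1 + h2 - h_Y_given_X
```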

Analytical Calculation for Gaussian Variables

Although we cannot derive a simple analytical expression for ΦG in general, it is possible to derive it for Gaussian variables. In this section, we analytically compute ΦG when the probability distribution of a system p(X,Y) is Gaussian. We also show a close relationship between the proposed measure of integrated information ΦG and multivariate Granger causality. Consider the following multivariate autoregressive model,

Y=AX+E, [17]

where X and Y are the past and present states of a system, A is the connectivity matrix, and E is a vector of Gaussian random variables with mean 0 and covariance matrix Σ(E), which are uncorrelated over time. The multivariate autoregressive model is the generative model of a multivariate Gaussian distribution. Regarding Eq. 17 as a full model, we consider the following as a disconnected model:

Y = A'X + E'. [18]

The constraints for Φ_G (Eq. 10) correspond to setting the off-diagonal elements of A' to 0:

A'_{ij} = 0 (i ≠ j). [19]

It is instructive to compare this with the constraints for the other information theoretic quantities introduced above: the constraints for mutual information (Fig. 1A), transfer entropy from x_1 to y_2 (Fig. 1B), and stochastic interaction (Fig. 1D). They correspond to A' = 0, A'_{21} = 0, and the off-diagonal elements of A' and Σ(E') being 0, respectively. Fig. 4 shows the relationship between the manifolds formed by the constraints for mutual information M_I, stochastic interaction M_S, and integrated information M_G. We can see that M_I and M_S are included in M_G. Thus, Φ_G is smaller than I(X;Y) or SI(X;Y). On the other hand, there is no inclusion relation between M_I and M_S.

Fig. 4.

Relationships between manifolds for mutual information M_I (gray line), stochastic interaction M_S (orange line), and integrated information M_G (green plane) in the Gaussian case. M_I is the line where A' = 0, M_S is the line where Σ(E')_{12} and A'_{12}, A'_{21} are 0, and M_G is the plane where A'_{12} and A'_{21} are 0.

By differentiating the KL divergence between the full model p(X,Y) and a disconnected model q(X,Y) with respect to Σ(X')^{−1}, A', and Σ(E')^{−1}, we can find the minimum of the KL divergence, using the following equations (details in Supporting Information):

Σ(X') = Σ(X), [20]
(Σ(X)(A − A')^T Σ(E')^{−1})_{ii} = 0, [21]
Σ(E') = Σ(E) + (A − A')Σ(X)(A − A')^T. [22]

By substituting Eqs. 20–22 into the KL divergence, we obtain

Φ_G = (1/2) log ( |Σ(E')| / |Σ(E)| ). [23]

|Σ(E)| is called the generalized variance, which is used as a measure of goodness of fit, i.e., the degree of prediction error, in multivariate Granger causality analysis (26, 27). In the Gaussian case, Φ_G can be interpreted as the difference in prediction error between the full model and the disconnected model, in which the off-diagonal elements of A' are set to 0. Thus, Φ_G is consistent with multivariate Granger causality based on the generalized variance. Φ_G can be rewritten as the difference between the conditional entropy in the disconnected model and that in the full model,

Φ_G = H(q(Y|X)) − H(p(Y|X)). [24]

For comparison, mutual information, transfer entropy, and stochastic interaction are given as I(X;Y) = (1/2) log ( |Σ(X)| / |Σ(E)| ), TE(x_i → y_j | x_j) = (1/2) log ( Σ(E')_{jj} / Σ(E)_{jj} ), and SI(X;Y) = (1/2) log ( Σ(E')_{11} Σ(E')_{22} / |Σ(E)| ), where Σ(E')_{jj} (j = 1, 2) is the variance of the conditional probability distribution p(y_j|x_j).
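The sketch below illustrates these Gaussian formulas for a hypothetical two-dimensional autoregressive model (the connectivity matrix and noise covariance are made-up values). Substituting Eqs. 20 and 22 into the KL divergence leaves a function of the diagonal connectivity A' alone, so Φ_G can be obtained with a generic numerical optimizer; SciPy is used here simply as one possible choice. The stationary covariance Σ(X) is obtained from a discrete Lyapunov equation, which assumes the process is stationary so that Σ(Y) = Σ(X).

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov
from scipy.optimize import minimize

A = np.array([[0.4, 0.2],
              [0.3, 0.5]])                    # hypothetical connectivity matrix
cov_E = np.array([[1.0, 0.3],
                  [0.3, 1.0]])                # hypothetical noise covariance Sigma(E)
cov_X = solve_discrete_lyapunov(A, cov_E)     # stationary Sigma(X), so Sigma(Y) = Sigma(X)

def kl_of_diag(a_diag):
    """Profiled KL divergence: (1/2) log |Sigma(E')| / |Sigma(E)| with Eq. 22 substituted."""
    dA = A - np.diag(a_diag)
    cov_Ep = cov_E + dA @ cov_X @ dA.T
    return 0.5 * np.log(np.linalg.det(cov_Ep) / np.linalg.det(cov_E))

phi_G = minimize(kl_of_diag, np.diag(A)).fun  # optimize over the diagonal of A'

I_XY = 0.5 * np.log(np.linalg.det(cov_X) / np.linalg.det(cov_E))

def transfer_entropy(j):
    """TE(x_i -> y_j | x_j): residual variance of y_j given x_j alone vs. given X."""
    var_yj = (A @ cov_X @ A.T + cov_E)[j, j]
    resid = var_yj - (A @ cov_X)[j, j] ** 2 / cov_X[j, j]   # Var(y_j | x_j)
    return 0.5 * np.log(resid / cov_E[j, j])

te_sum = transfer_entropy(0) + transfer_entropy(1)
SI = te_sum + 0.5 * np.log(cov_E[0, 0] * cov_E[1, 1] / np.linalg.det(cov_E))
print(f"Phi_G={phi_G:.4f}  I(X;Y)={I_XY:.4f}  TE sum={te_sum:.4f}  SI={SI:.4f}")
```

In this setting one can check numerically that 0 ≤ Φ_G ≤ I(X;Y) and that Φ_G does not exceed SI(X;Y), consistent with the inclusion relations in Fig. 4.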

Hierarchical Structure

We can construct a hierarchical structure of the disconnected models and then use it to systematically quantify all possible combinations of causal influences (28). For example, in a system consisting of two elements, there are four across-time influences, x_1 → y_1, x_1 → y_2, x_2 → y_1, and x_2 → y_2, which are denoted by T_11, T_12, T_21, and T_22, respectively. Although we consider only the cross-influences T_12 and T_21 for transfer entropy and integrated information, we can also quantify the self-influences T_11 and T_22 by imposing the corresponding constraints, such as q(y_1|x_1, x_2) = q(y_1|x_2) and q(y_2|x_1, x_2) = q(y_2|x_1), respectively. A set of all possible disconnected models forms a partially ordered set with respect to the KL divergence between the full and the disconnected models (Fig. 5). If a given disconnected model is related to another one by the removal or inclusion of an influence, the two models are connected by a line in Fig. 5. From Bottom to Top in Fig. 5, information loss increases as more influences are disconnected. Note that there is no ordering relationship between the disconnected models at the same level of the hierarchy. In Fig. 5, Top, all four influences are disconnected, and thus information loss is maximized, which corresponds to the mutual information I(X;Y). The hierarchical structure generalizes the related measures mentioned in this article and provides a clear perspective on the relationships among different measures.
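The partial order itself is easy to enumerate programmatically. The short sketch below (an illustration, not from the paper) lists the 16 disconnected models for two units as subsets of the disconnected influences, together with the covering relations that connect adjacent levels of the hierarchy.

```python
from itertools import combinations

influences = ("T11", "T12", "T21", "T22")     # x1->y1, x1->y2, x2->y1, x2->y2
models = [frozenset(c) for r in range(len(influences) + 1)
          for c in combinations(influences, r)]            # 16 disconnected models

# Covering relations: model b sits directly above a when it disconnects one more influence.
edges = [(a, b) for a in models for b in models if a < b and len(b - a) == 1]

for a, b in edges[:4]:                        # print a few covering pairs
    print(sorted(a) or "{}", "->", sorted(b))
```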

Fig. 5.

A hierarchical structure of the disconnected models where across-time influences are broken in a system consisting of two units. All possible combinations of influences retained in the disconnected model are displayed. If two models are related with the addition or removal of one influence, they are connected by a line. The KL divergence between the full and the disconnected model increases from Bottom to Top.

Discussion

In this paper, we proposed a unified framework based on information geometry, which enables us to quantify multiple influences without overestimation or confounds of noncausal influences. With this framework, we uniquely derived the measure of integrated information, Φ_G. Moreover, our framework enables a complete description of causal relationships within a system by quantifying any combination of causal influences in a hierarchical manner, as shown in Fig. 5. We expect that our framework can be used in diverse research fields, including neuroscience (29, 30), where network connectivity analysis has been an active research topic (31), and in particular by consciousness researchers (32–34), because information integration is considered to be a key prerequisite of conscious information processing in the brain (10, 11).

To apply the measure of integrated information to real data, we need to resolve several practical difficulties. First, the computational costs increase exponentially with the system size. Thus, some way of approximating the data is necessary. As we showed in this paper, the Gaussian approximation enables us to analytically compute integrated information, allowing us to compute integrated information in a large system (Eqs. 20–23). However, in real-world systems, including brains, nonlinearity can often be significant and the Gaussian approximation may fit the data poorly. In such cases, transforming time series data into a sequence of discrete symbols can result in a more accurate approximation (34, 35). Our measure of integrated information can be computed for such discrete distributions as shown in Supporting Information. Second, we need to find an appropriate partition of a system, which is an important problem in IIT (16). The computational cost of finding the optimal partition also increases exponentially. To overcome this difficulty, some effective optimization method needs to be used, possibly drawing on methods from discrete mathematics.

From a theoretical perspective, replacing Postulates 2 and 3 with different ones would be interesting future research. As for Postulate 2, which defines the operation of disconnecting causal influences, we could use the interventional formalism (23, 36), which quantifies causal influences based on the mechanisms of a system rather than observation of the system. As for Postulate 3, which defines the difference between the full model and a disconnected model, we could replace the KL divergence with other measures (24), such as the optimal transport distance, a.k.a. the earth mover's distance, which is considered to be important in IIT (17) and has also been shown to be useful in statistical machine learning (37). Our framework based on information geometry can be generally used for deriving different measures of causal influences from such different postulates and for analyzing the different geometric structures induced by them.

Manifold of Probability Distributions

Information geometry deals with a manifold of probability distributions and elucidates geometry in the manifold. Each point in the manifold represents a particular probability distribution. For example, consider a discrete probability distribution p(x) where x is a discrete random variable taking n + 1 different values x ∈ {0, 1, ..., n}. Let p_i be the probability that x takes the value i,

p_i = Prob[x = i], i = 0, 1, ..., n. [S1]

Because the sum of probabilities has to be 1,

\sum_{i=0}^{n} p_i = 1, [S2]

the probability distribution is parameterized by a vector of n probabilities

ξ = (p_1, p_2, ..., p_n). [S3]

Thus, the set of all possible probability distributions p(x) forms an n-dimensional manifold. The vector of n probabilities, ξ, is considered as a coordinate of the manifold. This manifold is called the probability simplex and is denoted by Sn.

Another example is a univariate Gaussian distribution,

p(x; μ, σ^2) = \frac{1}{\sqrt{2π}σ} exp ( −\frac{(x − μ)^2}{2σ^2} ), [S4]

where x is a continuous random variable, μ is the mean, and σ^2 is the variance. A Gaussian distribution is specified by the two variables μ and σ. Thus, the set of Gaussian distributions forms a two-dimensional manifold parameterized by μ and σ. The coordinate system of the manifold is ξ = (μ, σ).

Discrete probability distributions and a Gaussian distribution belong to a broad class of probability distributions called an exponential family. The probability distributions included in the exponential family are written in the form

p(x, ξ) = exp ( \sum_{i=1}^{n} ξ_i h_i(x) + k(x) − ψ(ξ) ), [S5]

where h_i(x) and k(x) are functions of a random variable x and ψ(ξ) is the normalization factor. A set of n parameters ξ = (ξ_1, ξ_2, ..., ξ_n) specifies the probability distributions. Thus, the exponential family of probability distributions forms an n-dimensional manifold with the coordinate system ξ.

Requirements for a Measure of Difference

As detailed in the main text, we postulated that the strength of influences is quantified by a minimized difference between the full model p and a disconnected model q, which is denoted by D[p:q]. There are many possible ways to quantify a difference between probability distributions (22, 24). We consider four theoretical requirements that should be satisfied by the measure of a difference so that the measure has desirable mathematical properties. We can prove that the only measure of a difference that satisfies all four requirements is the Kullback–Leibler (KL) divergence.

The first requirement is as follows.

Requirement 1. D[p(x):q(x)] should be a divergence.

A divergence, D[P:Q], is a quantity that measures a degree of separation between two points P and Q, satisfying the following criteria:

  • i) D[P:Q] ≥ 0.

  • ii) D[P:Q] = 0 if and only if P = Q.

  • iii) When P and Q are sufficiently close, by denoting their coordinates by ξ_P and ξ_Q = ξ_P + dξ, the Taylor expansion of D is written as

    D[ξ_P : ξ_Q] = (1/2) \sum_{i,j} g_{ij}(ξ_P) dξ_i dξ_j + O(|dξ|^3), [S6]

    and the matrix G = (g_{ij}) is positive definite.

The first and second criteria are considered as the minimum requirements so that the quantity can be interpreted as a measure of the separation between two points. A divergence is a weaker notion than a distance because it does not satisfy the axioms of distance; i.e., a divergence is not symmetric, D[P:Q] ≠ D[Q:P], and does not satisfy the triangle inequality. When the third criterion is satisfied, a manifold M determined by the positive-definite matrix g_{ij} is said to be Riemannian. As explained in the previous section, a probability distribution of the full model and a disconnected model can be represented as a point in the manifold of probability distributions, S_n. We consider the two points corresponding to the full model and a disconnected model as P and Q, respectively.

The second requirement is as follows.

Requirement 2. D[p(x):q(x)] should be invariant under invertible transformations of random variables.

D[p(x):q(x)] should be invariant when a random variable x is transformed by an invertible function y = k(x), i.e., when x and y are in one-to-one correspondence. In that case, there is no information loss due to the transformation of variables. Invariance is also a necessary condition because the measure of a difference should not depend on a particular choice of variables but rather should be invariant under such lossless transformations of variables,

D[p(x):q(x)]=D[p(y=k(x)):q(y=k(x))]. [S7]

We can arbitrarily construct countless invariant divergences. In general, when a certain invariant divergence D_in[p:q] is given, g(D_in[p:q]) is also an invariant divergence, where g is an arbitrary monotonic function that satisfies g(0) = 0 and g′(0) > 0. To resolve such arbitrariness, the third requirement is necessary.

Requirement 3. D[p(x):q(x)] should be decomposable.

When a divergence can be written in an additive form of component-wise divergences for some function d,

D[p:q] = \sum_{i=0}^{n} d(p_i, q_i), [S8]

it is said to be decomposable. It has been proved that an invariant and decomposable divergence is uniquely written in the form (22)

D[p:q] = \sum_{i=0}^{n} p_i f ( q_i / p_i ), [S9]

where f is a differentiable convex function satisfying

f(1)=0. [S10]

This type of divergence is called f divergence.

There still remains an arbitrariness in the choice among the f-divergence functions. The arbitrariness can be resolved by the requirement of flatness in the manifold of probability distributions.

Requirement 4. D[p(x):q(x)] should be flat.

A divergence induces a Riemannian metric and a dual pair of affine connections coupled by the metric. It is said to be “flat” when it induces a flat structure in the underlying manifold where its dual curvatures are 0. The dually flat manifold has useful properties as it can be considered as a generalization of a Euclidean space. Importantly, information geometry shows that the generalized Pythagorean theorem and the related projection theorem hold in a dually flat manifold. These theorems play pivotal roles in this paper as we show below, as well as in many applications in various fields including statistics, machine learning, and information theory because the minimization of the divergence can be easily solved by these theorems. As we quantify causal influences as the minimized divergence, this property is beneficial for our purpose.

With the requirement of dual flatness, the divergence is now uniquely determined and is found to be the well-known KL divergence (22),

D_KL[p(x)||q(x)] = \sum_x p(x) log ( p(x) / q(x) ). [S11]

To summarize, the KL divergence is the only divergence that is invariant, decomposable, and flat, satisfying all of the theoretical requirements for a measure of difference with the desired mathematical properties. Thus, as in Postulate 3 in the main text, we propose that the difference between the full model and a disconnected model should be quantified by the KL divergence.

Pythagorean Theorem

As stated in the previous section, the KL divergence induces a dually flat structure in the manifold of probability distributions. A dually flat manifold can be considered as a generalization of Euclidian space. In a dually flat manifold, a generalized Pythagorean theorem holds. Let us consider three points (probability distributions) p, q, and r in a dually flat manifold. When the dual geodesic connecting p and q is orthogonal to the geodesic connecting q and r, the following Pythagorean theorem holds:

DKL[p(x)||r(x)]=DKL[p(x)||q(x)]+DKL[q(x)||r(x)]. [S12]

Here, the term “geodesic” does not mean the shortest path connecting two points. It is used to mean a straight line connecting two points. The geodesic connecting q and r is represented as

θ_{qr}(t) = (1 − t) θ_q + t θ_r, [S13]

where θq and θr represent the positions of q and r, respectively, in a coordinate system θ. Its tangent vector is given by

dθ_{qr}/dt = θ_r − θ_q. [S14]

In a dually flat manifold, there is a dual coordinate system θ* that is coupled with θ via the Legendre transformation. The dual geodesic connecting p and q is represented as

θ*_{pq}(t) = (1 − t) θ*_p + t θ*_q, [S15]

where θ*_p and θ*_q represent the positions of p and q, respectively, in the dual coordinate system θ*. Its tangent vector is given by

dθ*_{pq}/dt = θ*_q − θ*_p. [S16]

That the two geodesics are orthogonal means that the two tangent vectors (Eqs. S14 and S16) are orthogonal:

⟨dθ*_{pq}/dt, dθ_{qr}/dt⟩ = 0,
(θ*_q − θ*_p) · (θ_r − θ_q) = 0.

It can be shown that the above equation representing the relationship of the orthogonal triangle is equivalent to the Pythagorean relation in Eq. S12 (22).

Projection Theorem and Pythagorean Relations

We consider minimizing the KL divergence between the full model p and a disconnected model q under constraints

D_KL[p||q*] = \min_{q ∈ M_D} D_KL[p||q], [S17]

where q* is the minimizer of the KL divergence and M_D is the submanifold where the constraints are satisfied. According to the projection theorem in information geometry, the closest point q* can be found by orthogonally projecting the point p onto the submanifold M_D. The orthogonal projection means that the dual geodesic connecting p and q* is orthogonal to any tangent vector in M_D at the intersection. If the submanifold M_D is flat, the dual geodesic connecting p and q* is orthogonal to the geodesics connecting q* and any point q ∈ M_D. In a flat submanifold, any geodesic connecting any two points in the submanifold is included in the submanifold. Thus, the three points p, the closest point q*, and any point q ∈ M_D form an orthogonal triangle. As explained in the previous section, the following Pythagorean relation holds for the orthogonal triangle:

D_KL[p||q] = D_KL[p||q*] + D_KL[q*||q]. [S18]

Total Causal Influences: Mutual Information

Total causal influences can be quantified by minimizing the KL divergence under the constraint where the past and future states of a system X and Y are independent. The constraint is given by

q(X,Y)=q(X)q(Y). [S19]

As explained in the previous section, the Pythagorean relation (Eq. S18) holds because the submanifold determined by the constraint is flat. From the Pythagorean relation, we have

D_KL[p||q] − (D_KL[p||q*] + D_KL[q*||q]) = 0,
\sum_{X,Y} (p(X,Y) − q*(X,Y)) log ( q*(X,Y) / q(X,Y) ) = 0,
\sum_X (p(X) − q*(X)) log ( q*(X) / q(X) ) + \sum_Y (p(Y) − q*(Y)) log ( q*(Y) / q(Y) ) = 0. [S20]

From Eq. S20, we find that q*(X) and q*(Y) must be equal to the marginal distributions of p(X,Y) over X and Y, respectively:

q*(X) = p(X), q*(Y) = p(Y).

The minimized KL divergence is given by

\min_{q(X,Y)} D_KL(p||q) = \sum_{X,Y} p(X,Y) log ( p(X,Y) / (p(X)p(Y)) )
= H(X) + H(Y) − H(X,Y)
= I(X;Y),

where H is the entropy and I(X;Y) is the mutual information between X and Y.
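The Pythagorean relation used above can be checked numerically. The following sketch (a made-up example) draws a random joint distribution p(X,Y), projects it onto the independent submanifold by taking the product of its marginals, and verifies that D_KL[p||q] = D_KL[p||q*] + D_KL[q*||q] for an arbitrary independent q.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = rng.random((3, 4)); p /= p.sum()                        # arbitrary joint p(X,Y)
q_star = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)

qx = rng.random((3, 1)); qx /= qx.sum()
qy = rng.random((1, 4)); qy /= qy.sum()
q = qx * qy                                                 # arbitrary independent q

print(kl(p, q), kl(p, q_star) + kl(q_star, q))              # the two sides agree
```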

Partial Causal Influences: Transfer Entropy

Partial causal influences from one element to another can be quantified by minimizing the KL divergence under the constraint that x_i and y_j are conditionally independent given the past states of the system excluding x_i. The constraint is given by

q(y_j|X) = q(y_j|\bar{x}_i), [S21]

where xi is the past states of a system except for xi. The constraint disconnects the causal interaction from xi to yj. Under the constraint, the disconnected model q(X,Y) is expressed as

q(X,Y) = q(\bar{y}_j|y_j, X) q(y_j|X) q(X)
= q(\bar{y}_j|y_j, X) q(y_j|\bar{x}_i) q(X),

where \bar{y}_j is the present states of the system except for y_j. The KL divergence between p and q can be decomposed into the following three KL divergences,

D_KL(p(X,Y)||q(X,Y)) = D_KL(p(X)||q(X)) + D_KL(p(y_j|X)||q(y_j|X)) + D_KL(p(\bar{y}_j|X, y_j)||q(\bar{y}_j|X, y_j)). [S22]

When we minimize the KL divergence in Eq. S22, we can minimize the three KL divergences separately. Because the constraint does not affect the first term and the third term, we can easily find that these terms are minimized (become 0) when q(X) and q(\bar{y}_j|X, y_j) are equal to the corresponding distributions p(X) and p(\bar{y}_j|X, y_j), respectively. Thus, the minimization of the KL divergence in Eq. S22 is simply expressed as

\min_{q(X,Y)} D_KL(p(X,Y)||q(X,Y)) = \min_{q(y_j|X)} D_KL(p(y_j|X)||q(y_j|X)). [S23]

We can find the minimizer of the KL divergence by using the Pythagorean relation because the submanifold determined by the constraint in Eq. S21 is flat. From the Pythagorean relation, we have

\sum_{X, y_j} (p(X, y_j) − q*(X, y_j)) log ( q*(y_j|X) / q(y_j|X) ) = 0,
\sum_{\bar{x}_i, y_j} p(\bar{x}_i) (p(y_j|\bar{x}_i) − q*(y_j|\bar{x}_i)) log ( q*(y_j|\bar{x}_i) / q(y_j|\bar{x}_i) ) = 0, [S24]

where the relation q*(X) = p(X) is used. From Eq. S24, we find that

q*(y_j|\bar{x}_i) = p(y_j|\bar{x}_i). [S25]

The minimized KL divergence is calculated as

\min_{q(X,Y)} D_KL(p(X,Y)||q(X,Y)) = \sum_{X, y_j} p(X, y_j) log ( p(y_j|X) / p(y_j|\bar{x}_i) )
= H(y_j|\bar{x}_i) − H(y_j|X)
= TE(x_i → y_j | \bar{x}_i),

where H(y_j|\bar{x}_i) [or H(y_j|X)] is the conditional entropy given \bar{x}_i (or X) and TE(x_i → y_j | \bar{x}_i) is the conditional transfer entropy from x_i to y_j given \bar{x}_i.

Integrated Information

We propose to quantify integrated information as the minimized KL divergence,

Φ_G = \min_q D_KL(p||q), [S26]

under the constraint

q(Yi|X)=q(Yi|Xi)(i), [S27]

where Xi and Yi represent the past and present states of the elements in the ith subsystem, respectively. In general, it is difficult to analytically solve the minimization of the KL divergence under the above constraints unlike in the case of mutual information or transfer entropy. Note that the submanifold determined by the constraints is not flat and thus the Pythagorean relation in Eq. S18 does not hold. We need to numerically solve the constrained minimization of KL divergence by using optimization methods such as Newton’s method.

Integrated Information for Discrete Variables

In the following, we show how to compute ΦG when the states of units are represented by discrete variables. As the simplest case, we consider a system consisting of two binary units whose states are 0 or 1. Generalization to the other cases is quite straightforward although computations are more complicated. In the case of two units, the constraints for integrated information are given by

q(y1|x1,x2)=q(y1|x1), [S28]
q(y2|x1,x2)=q(y2|x2). [S29]

Under the constraint, the disconnected model q(X,Y) is expressed as

q(X,Y) = q(y_2|x_1, x_2, y_1) q(y_1|x_1, x_2) q(X)
= q(y_2|x_1, x_2, y_1) q(y_1|x_1) q(X),

where we used the first constraint (Eq. S28). We cannot further simplify q(y2|x1,x2,y1) by using the second constraint (Eq. S29). Thus, we need to introduce the Lagrange multipliers for the second constraint. Note that as we are currently considering the binary variables, the second constraint is equivalent to the constraint

q(y_2|x_1 = 0, x_2) − q(y_2|x_1 = 1, x_2) = 0. [S30]

With the method of Lagrange multipliers, the Lagrangian is given by

L = D_KL(p||q) + ρ ( \sum_X q(X) − 1 ) + \sum_{x_1} λ(x_1) ( \sum_{y_1} q(y_1|x_1) − 1 ) + \sum_{x_1, x_2, y_1} μ(x_1, x_2, y_1) ( \sum_{y_2} q(y_2|x_1, x_2, y_1) − 1 ) + \sum_{x_2} β(x_2) ( q(y_2 = 1|x_1 = 0, x_2) − q(y_2 = 1|x_1 = 1, x_2) ), [S31]

where ρ, λ(x_1), μ(x_1, x_2, y_1), and β(x_2) are Lagrange multipliers. The constraint in Eq. S30 is imposed on only one of the states of y_2 (we set y_2 = 1 without loss of generality) because the constraint for the other state of y_2, i.e., q(y_2 = 0|x_1 = 0, x_2) − q(y_2 = 0|x_1 = 1, x_2) = 0, is automatically satisfied due to the normalization constraint \sum_{y_2} q(y_2|x_1, x_2, y_1) = 1.

The disconnected model q(X,Y) that minimizes the KL divergence can be found by differentiating the Lagrangian with respect to q(X), q(y1|x1), and q(y2|x1,x2,y1) and setting the partial derivatives to 0,

∂L/∂q(X) = −p(X)/q(X) + ρ = 0, [S32]
∂L/∂q(y_1|x_1) = −p(y_1, x_1)/q(y_1|x_1) + λ(x_1) + \sum_{x_2} β(x_2) (−1)^{x_1} q(y_2 = 1|x_1, x_2, y_1) = 0, [S33]
∂L/∂q(y_2|x_1, x_2, y_1) = −p(x_1, x_2, y_1, y_2)/q(y_2|x_1, x_2, y_1) + μ(x_1, x_2, y_1) + β(x_2) y_2 (−1)^{x_1} q(y_1|x_1) = 0. [S34]

From the first equation (Eq. S32), we can simply find that

q(X)=p(X). [S35]

However, the other two equations cannot be analytically solved. Thus, to find the solutions of the above equations, we need to resort to a numerical method. When we computed ΦG in a system consisting of two binary units in Fig. 3 of the main text, we used Newton’s method.
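As a concrete illustration of this computation, the sketch below minimizes D_KL(p||q) over joint distributions q(x_1, x_2, y_1, y_2) that satisfy Eqs. S28 and S29. It uses a generic constrained optimizer (SciPy's SLSQP) rather than the Newton's method used for Fig. 3, and the softmax parameterization of q is our own choice; it is meant as a sketch of the procedure, not a reproduction of the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def kl(p, q, eps=1e-12):
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def conditional(q, y_axis, given_axes=(0, 1)):
    """q(y=1 | x1, x2) as a 2x2 array indexed by (x1, x2)."""
    keep = tuple(sorted(given_axes + (y_axis,)))
    drop = tuple(ax for ax in range(4) if ax not in keep)
    marg = q.sum(axis=drop)
    y_pos = keep.index(y_axis)
    return np.take(marg, 1, axis=y_pos) / marg.sum(axis=y_pos)

def phi_g(p):
    """Geometric integrated information for a joint array p[x1, x2, y1, y2] of binary units."""
    def unpack(z):                              # softmax parameterization keeps q positive
        q = np.exp(z).reshape(2, 2, 2, 2)
        return q / q.sum()
    def constraints(z):
        q = unpack(z)
        c1 = conditional(q, 2)                  # q(y1=1 | x1, x2)
        c2 = conditional(q, 3)                  # q(y2=1 | x1, x2)
        return np.concatenate([
            c1[:, 0] - c1[:, 1],                # y1 must not depend on x2 (Eq. S28)
            c2[0, :] - c2[1, :],                # y2 must not depend on x1 (Eq. S29)
        ])
    z0 = np.log(p.ravel() + 1e-6)
    res = minimize(lambda z: kl(p, unpack(z)), z0, method="SLSQP",
                   constraints={"type": "eq", "fun": constraints})
    return res.fun
```

Applied to the AND / noisy-AND distribution constructed in the main-text example, phi_g(p) should stay below the mutual information for all noise levels.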

Integrated Information for Gaussian Variables

When the probability distributions are Gaussian distributions, ΦG can be analytically computed. For simplicity, we consider partitioning a system into individual elements, meaning that there are N subsystems, each subsystem containing only one unit. The constraints are expressed as

q(y_i|X) = q(y_i|x_i) (for all i). [S36]

Consider the following multivariate autoregressive model,

Y=AX+E, [S37]

where X is the past state of the system, Y is the present state of the system, A is the connectivity matrix, and E is a vector of Gaussian random variables that are uncorrelated over time. The multivariate autoregressive model is the generative model of a multivariate Gaussian distribution. The joint probability distribution of X and Y is expressed as

p(X,Y) = exp { −(1/2) ( X^T Σ(X)^{−1} X + (Y − AX)^T Σ(E)^{−1} (Y − AX) − ψ ) }, [S38]

where Σ(X) and Σ(E) are the covariance matrices of X and E, respectively. We consider combining X and Y and define a new vector Z,

Z = \begin{pmatrix} X \\ Y \end{pmatrix}. [S39]

Then, the joint probability distribution of X and Y is rewritten as

p(Z) = exp { −(1/2) ( Z^T Σ(Z)^{−1} Z − ψ ) }, [S40]

where the covariance matrix of Z is given by

Σ(Z) = \begin{pmatrix} Σ(X) & Σ(X)A^T \\ AΣ(X) & Σ(E) + AΣ(X)A^T \end{pmatrix}, [S41]

and the inverse of Σ(Z) is given by

Σ(Z)^{−1} = \begin{pmatrix} Σ(X)^{−1} + A^TΣ(E)^{−1}A & −A^TΣ(E)^{−1} \\ −Σ(E)^{−1}A & Σ(E)^{−1} \end{pmatrix}. [S42]

Similarly, the joint probability distributions in the disconnected model are given by

q(X,Y) = exp { −(1/2) ( X^T Σ(X')^{−1} X + (Y − A'X)^T Σ(E')^{−1} (Y − A'X) − ψ' ) } [S43]
= exp { −(1/2) ( Z^T Σ(Z')^{−1} Z − ψ' ) }, [S44]

where Σ(X') and Σ(E') are the covariance matrices of X and E' in the disconnected model, respectively, and A' is the connectivity matrix in the disconnected model. The constraints in Eq. S36 correspond to setting the off-diagonal elements of A' to 0:

A'_{ij} = 0 (i ≠ j). [S45]

The KL divergence between the full model and a disconnected model is given by

D_KL(p||q) = (1/2) ( log ( |Σ(Z')| / |Σ(Z)| ) + Tr(Σ(Z)Σ(Z')^{−1}) − 2N ), [S46]

where N is the number of variables in X or Y and 2N is the total number of variables in X and Y. The determinant of Σ(Z) can be calculated as

|Σ(Z)| = |Σ(X)| |Σ(E) + AΣ(X)A^T − AΣ(X)Σ(X)^{−1}Σ(X)A^T| [S47]
= |Σ(X)| |Σ(E)|. [S48]

Thus, the KL divergence is rewritten as

D_KL(p||q) = (1/2) ( log ( |Σ(X')| / |Σ(X)| ) + log ( |Σ(E')| / |Σ(E)| ) + Tr(Σ(Z)Σ(Z')^{−1}) − 2N ). [S49]

The trace of Σ(Z)Σ(Z')^{−1} is calculated as

Tr(Σ(Z)Σ(Z')^{−1}) = Tr ( Σ(X)Σ(X')^{−1} + Σ(X)(A'^T − 2A^T)Σ(E')^{−1}A' + (Σ(E) + AΣ(X)A^T)Σ(E')^{−1} ). [S50]

The derivatives of the KL divergence with respect to the components of Σ(X')^{−1}, A', and Σ(E')^{−1} can be calculated as

∂D_KL/∂Σ(X')^{−1}_{ij} = (1/2) ( Σ(X)_{ji} − Σ(X')_{ji} ), [S51]
∂D_KL/∂A'_{ii} = ( Σ(X)(A' − A)^T Σ(E')^{−1} )_{ii}, [S52]
∂D_KL/∂Σ(E')^{−1}_{ij} = (1/2) ( Σ(E)_{ji} − Σ(E')_{ji} + ((A − A')Σ(X)(A − A')^T)_{ji} ), [S53]

where we used

∂|A|/∂A_{ij} = |A| (A^{−1})_{ji}. [S54]

Thus, the KL divergence is minimized when

Σ(X') = Σ(X), [S55]
(Σ(X)(A − A')^T Σ(E')^{−1})_{ii} = 0, [S56]
Σ(E') = Σ(E) + (A − A')Σ(X)(A − A')^T. [S57]

By substituting Eqs. S55 and S57 into Eq. S50, we find that the trace of Σ(Z)Σ(Z')^{−1} is 2N,

Tr(Σ(Z)Σ(Z')^{−1}) = 2N. [S58]

Thus, the minimized KL divergence, ΦG, is simply written as

Φ_G = (1/2) log ( |Σ(E')| / |Σ(E)| ). [S59]

Acknowledgments

We thank Charles Yokoyama, Matthew Davidson, and Dror Cohen for helpful comments on the manuscript. M.O. was supported by a Grant-in-Aid for Young Scientists (B) from the Ministry of Education, Culture, Sports, Science, and Technology of Japan (26870860). N.T. was supported by the Future Fellowship (FT120100619) and the Discovery Project (DP130100194) from the Australian Research Council. M.O. and N.T. were supported by CREST, Japan Science and Technology Agency.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1603583113/-/DCSupplemental.

References

  • 1. Ito S, Sagawa T. Information thermodynamics on causal networks. Phys Rev Lett. 2013;111(18):180603. doi: 10.1103/PhysRevLett.111.180603.
  • 2. Granger CW. Some recent development in a concept of causality. J Econom. 1988;39(1):199–211.
  • 3. Bansal M, Belcastro V, Ambesi-Impiombato A, Di Bernardo D. How to infer gene networks from expression profiles. Mol Syst Biol. 2007;3(1):78. doi: 10.1038/msb4100120.
  • 4. Xiang R, Neville J, Rogati M. Modeling relationship strength in online social networks. In: Proceedings of the 19th International Conference on World Wide Web. Association for Computing Machinery; New York: 2010. pp. 981–990.
  • 5. Sugihara G, et al. Detecting causality in complex ecosystems. Science. 2012;338(6106):496–500. doi: 10.1126/science.1227079.
  • 6. Park HJ, Friston K. Structural and functional brain networks: From connections to cognition. Science. 2013;342(6158):1238411. doi: 10.1126/science.1238411.
  • 7. Bialek W, Nemenman I, Tishby N. Predictability, complexity, and learning. Neural Comput. 2001;13(11):2409–2463. doi: 10.1162/089976601753195969.
  • 8. Schreiber T. Measuring information transfer. Phys Rev Lett. 2000;85(2):461. doi: 10.1103/PhysRevLett.85.461.
  • 9. Ay N. Information geometry on complexity and stochastic interaction. Entropy. 2015;17(4):2432–2458.
  • 10. Koch C, Massimini M, Boly M, Tononi G. Neural correlates of consciousness: Progress and problems. Nat Rev Neurosci. 2016;17(5):307–321. doi: 10.1038/nrn.2016.22.
  • 11. Tononi G, Boly M, Massimini M, Koch C. Integrated information theory: From consciousness to its physical substrate. Nat Rev Neurosci. 2016;17(7):450–461. doi: 10.1038/nrn.2016.44.
  • 12. Massimini M, et al. Breakdown of cortical effective connectivity during sleep. Science. 2005;309(5744):2228–2232. doi: 10.1126/science.1117256.
  • 13. Alkire MT, Hudetz AG, Tononi G. Consciousness and anesthesia. Science. 2008;322(5903):876–880. doi: 10.1126/science.1149213.
  • 14. Gosseries O, Di H, Laureys S, Boly M. Measuring consciousness in severely damaged brains. Annu Rev Neurosci. 2014;37:457–478. doi: 10.1146/annurev-neuro-062012-170339.
  • 15. Casali AG, et al. A theoretically based index of consciousness independent of sensory processing and behavior. Sci Transl Med. 2013;5(198):198ra105. doi: 10.1126/scitranslmed.3006294.
  • 16. Tononi G. Consciousness as integrated information: A provisional manifesto. Biol Bull. 2008;215(3):216–242. doi: 10.2307/25470707.
  • 17. Oizumi M, Albantakis L, Tononi G. From the phenomenology to the mechanisms of consciousness: Integrated information theory 3.0. PLoS Comput Biol. 2014;10(5):e1003588. doi: 10.1371/journal.pcbi.1003588.
  • 18. Balduzzi D, Tononi G. Integrated information in discrete dynamical systems: Motivation and theoretical framework. PLoS Comput Biol. 2008;4(6):e1000091. doi: 10.1371/journal.pcbi.1000091.
  • 19. Barrett AB, Seth AK. Practical measures of integrated information for time-series data. PLoS Comput Biol. 2011;7(1):e1001052. doi: 10.1371/journal.pcbi.1001052.
  • 20. Oizumi M, Amari S, Yanagawa T, Fujii N, Tsuchiya N. Measuring integrated information from the decoding perspective. PLoS Comput Biol. 2016;12(1):e1004654. doi: 10.1371/journal.pcbi.1004654.
  • 21. Seth AK, Barrett AB, Barnett L. Causal density and integrated information as measures of conscious level. Philos Trans A Math Phys Eng Sci. 2011;369(1952):3748–3767. doi: 10.1098/rsta.2011.0079.
  • 22. Amari S. Information Geometry and Its Applications. Springer; Tokyo: 2016.
  • 23. Pearl J. Causality. Cambridge Univ Press; Cambridge, UK: 2009.
  • 24. Tegmark M. Improved measures of integrated information. 2016. arXiv:1601.02626.
  • 25. Barnett L, Barrett AB, Seth AK. Granger causality and transfer entropy are equivalent for Gaussian variables. Phys Rev Lett. 2009;103(23):238701. doi: 10.1103/PhysRevLett.103.238701.
  • 26. Geweke J. Measurement of linear dependence and feedback between multiple time series. J Am Stat Assoc. 1982;77(378):304–313.
  • 27. Barrett AB, Barnett L, Seth AK. Multivariate Granger causality and generalized variance. Phys Rev E. 2010;81(4):041907. doi: 10.1103/PhysRevE.81.041907.
  • 28. Ay N, Olbrich E, Bertschinger N, Jost J. A geometric approach to complexity. Chaos. 2011;21(3):037103. doi: 10.1063/1.3638446.
  • 29. Deco G, Tononi G, Boly M, Kringelbach ML. Rethinking segregation and integration: Contributions of whole-brain modelling. Nat Rev Neurosci. 2015;16(7):430–439. doi: 10.1038/nrn3963.
  • 30. Boly M, et al. Stimulus set meaningfulness and neurophysiological differentiation: A functional magnetic resonance imaging study. PLoS One. 2015;10(5):e0125337. doi: 10.1371/journal.pone.0125337.
  • 31. Bullmore E, Sporns O. Complex brain networks: Graph theoretical analysis of structural and functional systems. Nat Rev Neurosci. 2009;10(3):186–198. doi: 10.1038/nrn2575.
  • 32. Lee U, Mashour GA, Kim S, Noh GJ, Choi BM. Propofol induction reduces the capacity for neural information integration: Implications for the mechanism of consciousness and general anesthesia. Conscious Cogn. 2009;18(1):56–64. doi: 10.1016/j.concog.2008.10.005.
  • 33. Chang JY, et al. Multivariate autoregressive models with exogenous inputs for intracerebral responses to direct electrical stimulation of the human brain. Front Hum Neurosci. 2012;6:317. doi: 10.3389/fnhum.2012.00317.
  • 34. King JR, et al. Information sharing in the brain indexes consciousness in noncommunicative patients. Curr Biol. 2013;23(19):1914–1919. doi: 10.1016/j.cub.2013.07.075.
  • 35. Bandt C, Pompe B. Permutation entropy: A natural complexity measure for time series. Phys Rev Lett. 2002;88(17):174102. doi: 10.1103/PhysRevLett.88.174102.
  • 36. Ay N, Polani D. Information flows in causal networks. Adv Complex Syst. 2008;11(1):17–41.
  • 37. Cuturi M. Sinkhorn distances: Lightspeed computation of optimal transport. Adv Neural Inform Process Syst. 2013:2292–2300.
