Entropy. 2019 Sep 10; 21(9): 881. doi: 10.3390/e21090881

The Poincaré-Shannon Machine: Statistical Physics and Machine Learning Aspects of Information Cohomology

Pierre Baudot 1,2
PMCID: PMC7515411

Abstract

Previous works established that entropy is uniquely characterized as the first cohomology class in a topos and described some of its applications to the unsupervised classification of gene expression modules or cell types. These studies raised important questions regarding the statistical meaning of the resulting cohomology of information and its interpretation or consequences with respect to usual data analysis and statistical physics. This paper aims to present the computational methods of information cohomology and to propose its interpretations in terms of statistical physics and machine learning. In order to further underline the cohomological nature of information functions and chain rules, the computation of the cohomology in low degrees is detailed to show more directly that the k-multivariate mutual informations (Ik) are (k−1)-coboundaries. The (k−1)-cocycle condition corresponds to Ik = 0, which generalizes statistical independence to arbitrary degree k. Hence, the cohomology can be interpreted as quantifying the statistical dependences and the obstruction to factorization. I develop the computationally tractable subcase of simplicial information cohomology, represented by entropy (Hk) and information (Ik) landscapes and their respective paths, allowing investigation of Shannon's information in the multivariate case without the assumptions of independence or of identically distributed variables. I give an interpretation of this cohomology in terms of phase transitions in a model of k-body interactions, holding both for statistical physics without mean field approximations and for data points. The I1 components define a self-internal energy functional Uk, and the (−1)^k Ik, k ≥ 2, components define the contribution of the k-body interactions to a free energy functional Gk (the total correlation).
A basic mean field model is developed and computed on genetic data, reproducing the usual free energy landscapes with phase transition and sustaining the analogy of clustering with condensation. The set of information paths in simplicial structures is in bijection with the symmetric group and random processes, providing a trivial topological expression of the second law of thermodynamics. The local minima of free energy, related to conditional information negativity and conditional independence, characterize a minimum free energy complex. This complex formalizes the minimum free-energy principle in topology, provides a definition of a complex system and characterizes a multiplicity of local minima that quantifies the diversity observed in biology. I give an interpretation of this complex in terms of unsupervised deep learning, where the neural network architecture is given by the chain complex, and conclude by discussing future supervised applications.

Keywords: algebraic topology, machine learning, information theory, statistical physics, deep neural networks, unsupervised learning, multivariate mutual information, statistical dependences, k-body interactions, synergy, clustering


1 Introduction
   1.1 Observable Physics of Information
   1.2 Statistical Interpretation: Hierarchical Independences and Dependences Structures
   1.3 Statistical Physics Interpretation: K-Body Interacting Systems
   1.4 Machine Learning Interpretation: Topological Deep Learning
2 Information Cohomology
   2.1 A Long March through Information Topology
   2.2 Information Functions (Definitions)
   2.3 Information Structures and Coboundaries
      2.3.1 First Degree (k=1)
      2.3.2 Second Degree (k=2)
      2.3.3 Third Degree (k=3)
      2.3.4 Higher Degrees
3 Simplicial Information Cohomology
   3.1 Simplicial Substructures of Information
   3.2 Topological Self and Free Energy of K-Body Interacting System-Poincaré-Shannon Machine
   3.3 k-Entropy and k-Information Landscapes
   3.4 Information Paths and Minimum Free Energy Complex
      3.4.1 Information Paths (Definition)
      3.4.2 Derivatives, Inequalities and Conditional Mutual-Information Negativity
      3.4.3 Information Paths Are Random Processes: Topological Second Law of Thermodynamics and Entropy Rate
      3.4.4 Local Minima and Critical Dimension
      3.4.5 Sum over Paths and Mean Information Path
      3.4.6 Minimum Free Energy Complex
4 Discussion
   4.1 Statistical Physics
      4.1.1 Statistical Physics without Statistical Limit? Complexity through Finite Dimensional Non-Extensivity
      4.1.2 Naive Estimations Let the Data Speak
      4.1.3 Discrete Informational Analog of Renormalization Methods: No Mean-Field Assumptions Let the Objects Differentiate
      4.1.4 Combinatorial, Infinite, Continuous and Quantum Generalizations
   4.2 Data Science
      4.2.1 Topological Data Analysis
      4.2.2 Unsupervised and Supervised Deep Homological Learning
      4.2.3 Epigenetic Topological Learning—Biological Diversity
5 Conclusions
References

“Now what is science? ...it is before all a classification, a manner of bringing together facts which appearances separate, though they are bound together by some natural and hidden kinship. Science, in other words, is a system of relations. ...it is in relations alone that objectivity must be sought. ...it is relations alone which can be regarded as objective. External objects... are really objects and not fleeting and fugitive appearances, because they are not only groups of sensations, but groups cemented by a constant bond. It is this bond, and this bond alone, which is the object in itself, and this bond is a relation.”

H. Poincaré

1. Introduction

The present paper aims to provide a comprehensive introduction and interpretation in terms of statistics, statistical physics and machine learning of the information cohomology theory developed in References [1,2]. It presents the computational aspects of the application of information cohomology to data presented in Reference [3] and in the associated paper [2], which consists of an unsupervised classification of cell types or gene modules and provides a generic model for epigenetic co-regulation and differentiation.

1.1. Observable Physics of Information

In its application to empirically measured data, information cohomology sits at the crossroads of data analysis and statistical physics, and this article aims to give some keys to its interpretation within those two fields, a task that could be called "passing the information between disciplines" in reference to Mezard's review [4]. Just as topoi have been used as a communication bridge allowing the translation of theorems between different domains and therefore the unification of mathematical theories [5], information cohomology can help in further unraveling some equivalences between different disciplines (e.g., statistical physics and machine learning) and shall play a foundational role in both, as already proposed by Doering and Isham considering only probability structures [6,7]. In doing so, information theory goes one step toward a general mathematical and physical theory of communication (or of measure and observation), where information conservation is an isomorphism. In other terms, following the paths of topology, this paper pursues the work started by Brillouin [8], Jaynes [9,10], Landauer [11], Penrose [12], Wheeler [13] and Bennett [14] on an informational theory of physics, here further restricted to a theory of data: an austere empirical theory [15] with a minimum set of priors or axioms, a physics that "lets the data speak". It is the axiom of observability ("concepts which correspond to no conceivable observation should be eliminated from physics" [16], p. 264) which imposes that statistical physics and data science shall not be dissociated but unified in their foundations. Thermodynamics established that free energy is the energy that can be effectively used, whereas entropy was also called "lost heat".
The same holds generically for arbitrary data with mutual information and total correlations, kinds of relative or shared information: mutual information is the information that can be effectively used for pattern detection and classification. It hence appears that knowledge is a form of energy [17], and the results suggest that there are important resources of such information-energy in the k-body dependences, even beyond pairwise interactions.

1.2. Statistical Interpretation: Hierarchical Independences and Dependences Structures

First, in order to provide a mathematical interpretation of information cohomology, I give a brief bibliographical overview of its multiple and diverse origins, coming notably from the theory of motives, and recall the open conjecture on the higher classes. In References [1,2], entropy was uniquely characterized as the first cohomology class in a topos theory; this result could be extended to quantum information, Kullback-Leibler divergence and cross-entropy (Proposition 4 [1]) and has been further developed and extended by Vigneaux [18,19] to Tsallis entropies and differential entropy, following some preliminary results of Marcolli and Thorngren [20]. Here, we show directly, by computing the cohomology in the low degrees, that the multivariate k-mutual informations, denoted Ik, are (k−1)-coboundaries. Recalling a result of the associated paper (Theorem 2 [2]), establishing that n random variables are independent if and only if all the k-mutual informations of their subsets vanish (k ≥ 2), allows one to conclude that the (k−1)-cocycle condition corresponds to Ik = 0, which generalizes statistical independence to an arbitrary degree k. The interpretation and meaning of information cohomology is hence the quantification of the statistical dependences and of the obstruction to factorization, a result confirmed, with a different approach, by the work of Mainiero [21]. Notably, the introduction of a symmetric action of conditioning, following Gerstenhaber and Schack [22], gives a 1-cocycle condition that characterizes the famous information pseudo-metric discovered by Shannon, Rajski, Zurek, Crutchfield and Bennett [23,24,25,26,27,28] and that can be generalized to the k-multivariate case by new symmetric non-negative information functions, the pseudo k-volumes Vk = Hk − Ik.

1.3. Statistical Physics Interpretation: K-Body Interacting Systems

Second, to provide an interpretation of information cohomology in terms of statistical physics, this paper settles information structures in the context of a generic k-body interacting system. One can remark that the setting of information cohomology is equivalent to Potts models, which generalize spin models to arbitrary multivalued variables (see Reference [29] for a review), and considers all possible k-ary statistical interactions, in a similar way as multispin interaction models do (k-spin interaction models that generalize pairwise and nearest-neighbor models; see Reference [30] and references therein). I describe the computational and combinatorial restrictions from the lattice of partitions to the simplicial sublattice that allow one to compute information cohomology on data in practice, as in References [2,3], and that define entropy and information landscapes and their respective paths, providing the discrete informational analog of path integrals. The combinatorics of possible interactions computed by the information landscapes is equivalent to the computational "exponential wall" encountered in many-particle studies, notably density functional theory (DFT), as exposed by Kohn [31]. The I1 components define a self-internal energy functional Uk, and the (−1)^k Ik, k ≥ 2, components define the contribution of the k-body interactions to a total free-energy functional Gk (i.e., the total correlation). The total free energy is a Kullback-Leibler divergence and a special case (symmetric in the interacting bodies) of the free energy introduced by Baez and Pollard [32] (see also the appendix of the associated paper [2]). These definitions allow the recovery of usual equilibrium or semi-classical expressions in special cases. The set of all first critical points of information paths (a conditional-independence condition) gives a construction of the minimum free energy complex.
Mutual information negativity, also called synergy [33], a phenomenon known since the work of Matsuda [34] to provide the signature of frustrated states in glasses, is related here, in the context of the more general conditional mutual information, to a kind of (discrete) first-order transition, analogous to smooth phase transitions in small systems [35], yet seen topologically as the critical points of a simplicial complex.

To further settle the thermodynamical interpretation of information cohomology, it is relevant to ask what the cohomological expression of the first and second principles of thermodynamics could be. The set of information paths in simplicial structures is in bijection with the symmetric group and random processes, providing a trivial topological expression of the second law of thermodynamics as a consequence of entropy convexity, improving on Cover's theorem, which needed to assume Markov conditions [36]. Thanks to Noether's theorem [37], the first principle is expressed in terms of continuous symmetries; her theorem has been restated in more modern homological terms for finite elements by Mansfield [38] and for Markov chains by Baez and Fong [39]. Hence, the expression of the first principle in information cohomology, left as a conjecture here, should take the form of a Noether theorem for random discrete processes, that is, for the entire symmetric group Sn, a question already asked by Neuenschwander [40].

As presented by Kadanoff [41] and in References [42,43], close to phase transitions (notably in three dimensions), the failure of the mean field theories introduced by Van der Waals, Maxwell and Landau led to the development of renormalization group methods. Renormalization methods intrinsically rely on unrealistic asymptotic assumptions, such as an infinite number of particles [41,43], and neglect infinite quantities, which has raised fundamental criticism [44,45]. In order to compute an analog of a mean field model such as van der Waals interactions [46,47,48], we define and compute the mean information paths that correspond to the homogeneous case of identically distributed random variables. On a dataset of genes expected to interact, the mean information path reproduces the usual behavior of the free energy in the condensed phase (i.e., with a critical point), while for genes less expected to interact, the path exhibits a monotonic decrease without a non-trivial minimum, which corresponds to the usual free-energy potential in the uncondensed, disordered phase for which the n-body interactions are negligible. However, compared with the non-averaged original information paths presented in References [2,3], it is clear that this analog mean-field approach erases the multiplicity of critical points and the diversity or richness of the complexes.

Hence, with respect to statistical physics, the main novelty brought by the information cohomology approach is the introduction of purely finite and discrete methods that can account for transition phenomena in heterogeneous systems without mean field assumptions and of new measures of correlation (i.e., multivariate mutual informations that generalize the usual correlation coefficients to non-linear relations) [2,3]. Among the work left for further studies in this preliminary line, I conjecture that the infinite-dimensional continuous extension of the information cohomology formalism should be equivalent to renormalization methods, while I underline that, in practice, there is no need for these physically unrealistic assumptions.

1.4. Machine Learning Interpretation: Topological Deep Learning

The interpretation of information cohomology in terms of machine learning and data analysis first relies on the definition of random variables as partitions and of information structures and complexes on the lattice of partitions [1]. Partitions are equivalent to equivalence classes (see Ellerman [49,50] for a review) and hence the complex of random variables spans all possible equivalence classes of the data points. Therefore, the information cochain complex serves the function of a universal classifier. Moreover, these information structures defined on the whole lattice of partitions encompass all possible statistical dependences and relations, since by definition they consider all possible equivalence classes on a probability space; they hence fully answer the problem raised by James and Crutchfield [51], who remarked that some statistical dependences are not distinguished by the Ik simplicial structure. The combinatorics of these structures forbids any computation in practice until quantum computers become available, and the severe restriction to the simplicial case of the cohomology, with complexity O(2^n), implies that not all statistical dependences can be estimated, as shown by James and Crutchfield [51]. The interpretation makes two phenomena coincide (i.e., condensation in statistical physics and clustering of data points in data science), both signed by information and conditional information negativity, as studied in the companion paper [2]. This generalizes the idea and results obtained on networks by the team of Grassberger [52].
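To make this combinatorial wall concrete, here is a small Python sketch (not from the paper; the function name `bell` is mine) comparing the size of the full lattice of partitions, which grows as the Bell numbers of the number of atoms, with the 2^n size of the simplicial sublattice over n variables:

```python
from math import comb

def bell(n):
    # Bell numbers via the recurrence B(m+1) = sum_k C(m, k) * B(k):
    # number of partitions of a set of n atoms, i.e., of random variables on it.
    b = [1]
    for m in range(n):
        b.append(sum(comb(m, k) * b[k] for k in range(m + 1)))
    return b[n]

# The full partition lattice grows super-exponentially with the number of atoms,
# while the simplicial sublattice over n variables has only 2**n elements.
for n in (4, 8, 16):
    print(n, bell(n), 2 ** n)   # e.g., bell(4) = 15 versus 2**4 = 16
```

Already for 16 atoms the partition lattice has more than 10^10 elements, which is why the data analysis is restricted to the simplicial sublattice.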

On the side of applied algebraic topology, the identification of the topological structures of a dataset has motivated important research following the development of persistent homology [53,54,55]. Combining statistical and topological structures in a single framework remains an active challenge of data analysis that has already yielded some interesting results [56,57]. Some recent works have proposed information-theoretic approaches grounded in homology, defining persistent entropy [58,59], graph topological entropy [60], spectral entropy [61] or multilevel integration entropies [62]. The present work is formally different and arose independently of persistence; it provides a cohomology intrinsically based on probability, for which the invariants are arguably the most important features of statistics, free of any metric assumption. The interpretation in terms of deep neural networks is also straightforward. It is preliminarily developed and applied here and in Reference [2] for the unsupervised case, while the supervised subcase is only briefly discussed here and left for further work; see Reference [63] for a presentation and preliminary applications to the digit image database of the Mixed National Institute of Standards and Technology (MNIST). The informational approach to topological data analysis provides a direct probabilistic and statistical analysis of the structure of a dataset, which allows the gap with neural network analysis to be bridged and may be a step toward their mathematical formalization and the characterization of the network architecture necessary for a given dataset. The original work based on spin networks by Hopfield [64] formalized fully recurrent networks as n binary random variables (N=2). Ackley, Hinton and Sejnowski [65] followed up by imposing the Markov field condition, allowing the introduction of conditional independence to handle network structures with hidden layers and hidden nodes.
The result, the Boltzmann or Helmholtz machine [66], relies on the maximum entropy or free-energy minimization principle and originally on minimizing the relative entropy between the network and environmental states [65]. Reference [67] and references therein (notably the whole opus of Marcolli, with application to linguistics [68]) provide a review of the relevance of homology to artificial or natural cognition. Considering neurons as binary random variables or more generally N-ary variables (corresponding to a rate coding hypothesis) in the present context provides a homologically constrained approach to those neural networks, where the first input layer is represented by the marginals (single variables, degree 1 components) while hidden layers are associated to higher degrees. In a very naive sense, higher cohomological degrees distinguish higher-order patterns (or higher-dimensional patterns in the simplicial case), just as receptive fields of convolutional neural networks recognize higher-order features at greater depths of neural layers, as described in David Marr's original sketch [69] and now implemented efficiently in deep network structures. Notably, the notion of geodesic used in machine learning is replaced by the homotopical notion of path. On the data analysis side, this provides a new algorithm and tools for topological data analysis allowing one to rank and detect clusters and functional modules and to perform dimensionality reduction; indeed, all these classical tasks in data analysis have a direct homological meaning. I propose to call the data analysis method presented here the Poincaré-Shannon machine, since it implements simplicial homology (see Poincaré's Analysis Situs [70]) and information theory (see Shannon's theory of communication [71]) in a single framework, applied effectively to empirical data.

2. Information Cohomology

This section provides a short bibliographical note on the inscription of information and probability theory within homological theories (Section 2.1). We also recall the definition of the information functions (Section 2.2) and give a short description of information cohomology computed in the low degrees (Section 2.3), such that the interpretation of entropy and mutual information within Hochschild cohomology appears straightforward and clear. There are no new results in this section, but I hope to provide a simpler and more helpful presentation, for researchers outside the field of topology, of what can be found in References [1,18,19], which should be consulted for a more precise and detailed exposition.

2.1. A Long March through Information Topology

From the mathematical point of view, a motivation of information topology is to capture, in a common framework, the ambiguity theory of Galois, which is the essence of group theory or discrete symmetries (see André's reviews [72,73]), and Shannon's information uncertainty theory—a path already paved by some results on information inequalities (see Yeung's results [74]) and in algebraic geometry. In the work of Cathelineau [75], entropy first appeared in the computation of the degree-one homology of the discrete group SL(2,C) with coefficients in the adjoint action, by choosing a pertinent definition of the derivative of the Bloch-Wigner dilogarithm. It could be shown that the five-term functional equation of the dilogarithm implies the four-term functional equation of entropy. Kontsevitch [76] discovered that a finite truncated version of the logarithm appearing in cyclotomic studies also satisfies the functional equation of entropy, suggesting a higher-degree generalization of information analogous to the polylogarithms and hence showing that the functional equation of entropy holds in characteristic p as well as in characteristic 0. Elbaz-Vincent and Gangl used algebraic means to construct this generalization of information, which holds over finite fields [77] and in which information functions appear as derivations [78]. After entropy appeared in tropical and idempotent semiring analysis, in the study of the extension of the Witt semiring to the characteristic 1 limit [79], Marcolli and Thorngren developed the thermodynamic semiring, an entropy operad that could be constructed as a deformation of the tropical semiring [20]. Introducing Rota-Baxter algebras, this allowed the derivation of a renormalization procedure [80]. Defining the category of finite probability and using Fadeev's axiomatization, Baez, Fritz and Leinster showed that the only family of functions with the functorial property is Shannon information loss [81,82].
Basing his approach on information and Koszul geometry, Boyom developed a more geometrical view of statistical models that notably considers foliations in place of random variables [83]. Introducing a deformation-theoretic framework and a chain complex of random variables, Drumond-Cole, Park and Terilla [84,85,86] constructed a homotopy probability theory for which the cumulants coincide with the morphisms of the homotopy algebras. The probabilistic framework used here was introduced in Reference [1] and generalized to Tsallis entropies by Vigneaux [18,19]. The diversity of the formalisms employed in these independent but convergent approaches is astonishing. So, to the question "what is information topology?", it is only possible to answer that it is under development at the moment. The results of Cathelineau, Elbaz-Vincent and Gangl inscribe information into the theory of motives, which according to Beilinson's program is a mixed Hodge-Tate cohomology [87]. All along the development of the application to data, following the cohomology developed in References [1,18] on an explicit probabilistic basis, we aimed to preserve such a structure and to unravel its expression in information-theoretic terms. Moreover, following Aomoto's results [88,89], the actual conjecture [1] is that the higher classes of information cohomology should be some kind of polylogarithmic k-forms (k-differential volumes that are symmetric and additive and that correspond to the cocycle conditions for the cohomology of Lie groups [88]). The following developments suggest that these higher information groups should be the families of functions satisfying the functional equations of k-independence, Ik = 0, a rather vague but intuitive view that can be tested in special cases.

2.2. Information Functions (Definitions)

The information functions used in Reference [1] and the present study were originally defined by Shannon [71] and Kullback [90] and further generalized and developed by Hu Kuo Ting [91] and Yeung [92] (see also McGill [93]). These functions include the entropy, denoted $H_1 = H(X;P)$; the joint entropy, denoted $H_k = H(X_1,\ldots,X_k;P)$; the mutual information, denoted $I_2 = I(X_1;X_2;P)$; the multivariate k-mutual information, denoted $I_k = I(X_1;\ldots;X_k;P)$; and the conditional entropy and conditional mutual information, denoted $Y.H_k = H(X_1,\ldots,X_k|Y;P)$ and $Y.I_k = I(X_1;\ldots;X_k|Y;P)$. The classical expressions of these functions are the following (using $k = 1/\ln 2$, the usual bit unit):

  • The Shannon-Gibbs entropy of a single variable Xj is defined by [71]:
    $H_1 = H(X_j; P_{X_j}) = -k \sum_{x \in [N_j]} p(x) \ln p(x) = -k \sum_{i=1}^{N_j} p_i \ln p_i$, (1)
    where $[N_j] = \{1, \ldots, N_j\}$ denotes the alphabet of $X_j$.
  • The relative entropy or Kullback-Leibler divergence, also called "discrimination information" by Kullback [90], is defined for two probability mass functions p(x) and q(x) by:
    $D(p(x)\|q(x)) = D(X; p(x)\|q(x)) = -k \sum_{x \in X} p(x) \ln \frac{q(x)}{p(x)} = H(X; p(x), q(x)) - H(X; p(x))$, (2)
    where $H(X; p(x), q(x))$ is the cross-entropy and $H(X; p(x))$ the Shannon entropy. It hence generates minus the entropy as a special case, taking the deterministic constant probability $q(x) = 1$. With the convention $k = 1/\ln 2$, $D(p(x)\|q(x))$ is always positive or null.
  • The joint entropy is defined for any joint product of k random variables $(X_1,\ldots,X_k)$ and for a joint probability distribution $P(X_1,\ldots,X_k)$ by [71]:
    $H_k = H(X_1,\ldots,X_k; P_{X_1,\ldots,X_k}) = -k \sum_{x_1,\ldots,x_k \in [N_1 \times \cdots \times N_k]} p(x_1,\ldots,x_k) \ln p(x_1,\ldots,x_k)$, (3)
    where $[N_1 \times \cdots \times N_k] = \{1, \ldots, N_1 \times \cdots \times N_k\}$ denotes the alphabet of $(X_1,\ldots,X_k)$.
  • The mutual information of two variables X1, X2 is defined as [71]:
    $I_2 = I(X_1; X_2; P_{X_1,X_2}) = -k \sum_{x_1,x_2 \in [N_1 \times N_2]} p(x_1,x_2) \ln \frac{p(x_1)\,p(x_2)}{p(x_1,x_2)}$, (4)
    and it can be generalized to the k-mutual information (also called co-information) using the alternated sums given by Equation (17), as originally defined by McGill [93] and Hu Kuo Ting [91], giving:
    $I_k = I(X_1;\ldots;X_k; P) = -k \sum_{x_1,\ldots,x_k \in [N_1 \times \cdots \times N_k]} p(x_1,\ldots,x_k) \ln \frac{\prod_{I \subset [k];\,\mathrm{card}(I)=i;\,i\ \mathrm{odd}} p_I}{\prod_{I \subset [k];\,\mathrm{card}(I)=i;\,i\ \mathrm{even}} p_I}$. (5)
    For example, the 3-mutual information is the function:
    $I_3 = -k \sum_{x_1,x_2,x_3 \in [N_1 \times N_2 \times N_3]} p(x_1,x_2,x_3) \ln \frac{p(x_1)\,p(x_2)\,p(x_3)\,p(x_1,x_2,x_3)}{p(x_1,x_2)\,p(x_1,x_3)\,p(x_2,x_3)}$. (6)
    For $k \geq 3$, $I_k$ can be negative [91].
  • The total correlation, introduced by Watanabe [94], called integration by Tononi and Edelman [95] or multi-information by Studený and Vejnarova [96] and Margolin and colleagues [97], which we denote $C_k(X_1;\ldots;X_k;P)$, is defined by:
    $C_k = C_k(X_1;\ldots;X_k; P) = \sum_{i=1}^{k} H(X_i) - H(X_1,\ldots,X_k) = \sum_{i=2}^{k} (-1)^i \sum_{I \subset [k];\,\mathrm{card}(I)=i} I_i(X_I; P) = k \sum_{x_1,\ldots,x_k \in [N_1 \times \cdots \times N_k]} p(x_1,\ldots,x_k) \ln \frac{p(x_1,\ldots,x_k)}{p(x_1) \cdots p(x_k)}$. (7)
    For two variables, the total correlation equals the mutual information ($C_2 = I_2$). The total correlation has the favorable property of being a relative entropy between the joint distribution and the product of the marginals and hence of being always non-negative.
  • The conditional entropy of X1 knowing (or given) X2 is defined as [71]:
    $X_2.H_1 = H(X_1|X_2; P) = -k \sum_{x_1,x_2 \in [N_1 \times N_2]} p(x_1,x_2) \ln p_{x_2}(x_1) = -k \sum_{x_2 \in [N_2]} p(x_2) \sum_{x_1 \in [N_1]} p_{x_2}(x_1) \ln p_{x_2}(x_1)$, (8)
    where $p_{x_2}(x_1)$ denotes the conditional probability $p(x_1|x_2)$. The conditional joint entropy, $X_3.H(X_1,X_2)$ or $(X_1,X_2).H(X_3)$, is defined analogously by replacing the marginal probabilities with the joint probabilities.
  • The conditional mutual information of two variables X1, X2 knowing a third X3 is defined as [71]:
    $X_3.I_2 = I(X_1; X_2 | X_3; P) = -k \sum_{x_1,x_2,x_3 \in [N_1 \times N_2 \times N_3]} p(x_1,x_2,x_3) \ln \frac{p_{x_3}(x_1)\,p_{x_3}(x_2)}{p_{x_3}(x_1,x_2)}$. (9)
    Conditional mutual information generates all the preceding information functions as subcases, as shown by Yeung [92]: if $X_3 = \Omega$, it gives the mutual information; if $X_2 = X_1$, it gives the conditional entropy; and if both conditions are satisfied, it gives the entropy. Notably, we have $I_1 = H_1$.
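As a minimal numerical illustration of Equations (1), (2) and (4) (a Python sketch with my own naming, not code from the paper), the mutual information of a small joint distribution is recovered as the Kullback-Leibler divergence between the joint distribution and the product of its marginals:

```python
from math import log2

def H(p):
    """Shannon entropy in bits of a distribution {outcome: prob} (Eq. (1), k = 1/ln 2)."""
    return -sum(v * log2(v) for v in p.values() if v > 0)

def D(p, q):
    """Kullback-Leibler divergence D(p||q) in bits (Eq. (2))."""
    return sum(v * log2(v / q[x]) for x, v in p.items() if v > 0)

# A correlated pair (X1, X2) over {0,1}^2.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p1 = {0: 0.5, 1: 0.5}   # marginal of X1
p2 = {0: 0.5, 1: 0.5}   # marginal of X2
prod = {(a, b): p1[a] * p2[b] for a in p1 for b in p2}

I2 = D(joint, prod)     # Eq. (4): I(X1;X2) = D(joint || product of marginals)
print(H(p1), H(joint), I2)
```

The three printed values satisfy the identity $I_2 = H(X_1) + H(X_2) - H(X_1,X_2)$.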

We now give the few information equalities and inequalities that are of central use in the homological framework, in the information diagrams and for the estimation of the information from the data.

We have the chain rules (see Reference [36] for proofs):

$H(X_1,X_2; P) = H(X_1; P) + X_1.H(X_2; P) = H(X_2; P) + X_2.H(X_1; P)$, (10)
$I(X_1; X_2; P) = H(X_1; P) - X_2.H(X_1; P) = H(X_2; P) - X_1.H(X_2; P)$, (11)

which we can write more generally as (where the hat denotes the omission of the variable):

$H(X_1,\ldots,\widehat{X_i},\ldots,X_{k+1}; P) = H(X_1,\ldots,X_{k+1}; P) - (X_1,\ldots,\widehat{X_i},\ldots,X_{k+1}).H(X_i; P)$, (12)

which we can write in short as $H_{k+1} - H_k = (X_1,\ldots,X_k).H(X_{k+1})$, and

$I(X_1;\ldots;\widehat{X_i};\ldots;X_{k+1}; P) = I(X_1;\ldots;X_{k+1}; P) + X_i.I(X_1;\ldots;\widehat{X_i};\ldots;X_{k+1}; P)$, (13)

which we can write in short as $I_{k-1} - I_k = X_k.I_{k-1}$, generating the chain rule (11) as a special case.

These two equations provide recurrence relationships that give an alternative formulation of the chain rules in terms of a chosen path on the lattice of information structures:

$H_k = H(X_1,\ldots,X_k; P) = \sum_{i=1}^{k} (X_1,\ldots,X_{i-1}).H(X_i; P)$, (14)

where we assume H(X1;P)=X0.H(X1;P) and hence that X0 is the greatest element X0=Ω.

$I_k = I(X_1;\ldots;X_k; P) = I(X_1) - \sum_{i=2}^{k} X_i.I(X_1;\ldots;X_{i-1})$. (15)
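The telescoping chain rule (14) can be checked numerically on any joint distribution; in the following Python sketch (the helper names are mine, and variables are indexed by tuple positions), the conditional entropy is obtained through the chain rule (10) as H(given, target) − H(given):

```python
from math import log2

def H_marg(joint, axes):
    """Entropy (bits) of the marginal of a joint {tuple: prob} on the given axes."""
    m = {}
    for x, v in joint.items():
        key = tuple(x[i] for i in axes)
        m[key] = m.get(key, 0.0) + v
    return -sum(v * log2(v) for v in m.values() if v > 0)

def cond_H(joint, target, given):
    """(X_given).H(X_target) via the chain rule (10): H(given, target) - H(given)."""
    return H_marg(joint, tuple(given) + tuple(target)) - H_marg(joint, given)

# An arbitrary strictly positive joint distribution of three binary variables.
joint = {(0, 0, 0): 0.2, (0, 0, 1): 0.05, (0, 1, 0): 0.1, (0, 1, 1): 0.15,
         (1, 0, 0): 0.05, (1, 0, 1): 0.1, (1, 1, 0): 0.05, (1, 1, 1): 0.3}

# Eq. (14): H(X1,X2,X3) = H(X1) + X1.H(X2) + (X1,X2).H(X3)
lhs = H_marg(joint, (0, 1, 2))
rhs = (H_marg(joint, (0,)) + cond_H(joint, (1,), (0,))
       + cond_H(joint, (2,), (0, 1)))
print(lhs, rhs)   # equal up to rounding: the sum telescopes
```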

We have the alternated sums or inclusion–exclusion rules [1,34,91]:

$H_n(X_1,\ldots,X_n; P) = \sum_{i=1}^{n} (-1)^{i-1} \sum_{I \subset [n];\,\mathrm{card}(I)=i} I_i(X_I; P)$, (16)
$I_n(X_1;\ldots;X_n; P) = \sum_{i=1}^{n} (-1)^{i-1} \sum_{I \subset [n];\,\mathrm{card}(I)=i} H_i(X_I; P)$. (17)

For example: $H_3(X_1,X_2,X_3) = I_1(X_1) + I_1(X_2) + I_1(X_3) - I_2(X_1;X_2) - I_2(X_1;X_3) - I_2(X_2;X_3) + I_3(X_1;X_2;X_3)$.
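The inclusion-exclusion pair (16)-(17) can likewise be verified directly; the sketch below (Python, with my own helper names) computes each Ii from subset entropies via Equation (17) and then recovers H3 through Equation (16):

```python
from math import log2
from itertools import combinations

def H_marg(joint, axes):
    """Entropy (bits) of the marginal of a joint {tuple: prob} on the given axes."""
    m = {}
    for x, v in joint.items():
        key = tuple(x[i] for i in axes)
        m[key] = m.get(key, 0.0) + v
    return -sum(v * log2(v) for v in m.values() if v > 0)

def I(joint, axes):
    """k-mutual information via the alternated entropy sum of Eq. (17)."""
    return sum((-1) ** (len(s) - 1) * H_marg(joint, s)
               for i in range(1, len(axes) + 1)
               for s in combinations(axes, i))

# A strictly positive joint distribution of three binary variables (sums to 1).
joint = {(a, b, c): (1 + a + 2 * b * c) / 16.0
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

# Eq. (16): H3 = sum(I1) - sum(I2) + I3
h3 = H_marg(joint, (0, 1, 2))
rhs = (sum(I(joint, (i,)) for i in range(3))
       - sum(I(joint, s) for s in combinations(range(3), 2))
       + I(joint, (0, 1, 2)))
print(h3, rhs)   # equal: Eqs. (16) and (17) are Moebius inverses of each other
```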

The chain rule of mutual information goes together with the following inequalities, discovered by Matsuda [34]. For all random variables $X_1,\ldots,X_k$ with associated joint probability distribution P:

  • $X_k.I(X_1;\ldots;X_{k-1}; P) \geq 0$ if and only if $I(X_1;\ldots;X_{k-1}; P) \geq I(X_1;\ldots;X_k; P)$ (in short: $I_{k-1} \geq I_k$);

  • $X_k.I(X_1;\ldots;X_{k-1}; P) < 0$ if and only if $I(X_1;\ldots;X_{k-1}; P) < I(X_1;\ldots;X_k; P)$ (in short: $I_{k-1} < I_k$).

These equivalences fully characterize the phenomenon of information negativity as an increasing, or diverging, sequence of mutual information.
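A classical example of information negativity (a Python sketch; the XOR example is standard in the synergy literature, and the helper names are mine) is X3 = XOR(X1, X2) with two fair independent bits: I2 = 0 while I3 = −1 bit, so the conditional mutual information X3.I2 = I2 − I3 = 1 bit is positive, consistent with Matsuda's first equivalence:

```python
from math import log2
from itertools import combinations

def H_marg(joint, axes):
    """Entropy (bits) of the marginal of a joint {tuple: prob} on the given axes."""
    m = {}
    for x, v in joint.items():
        key = tuple(x[i] for i in axes)
        m[key] = m.get(key, 0.0) + v
    return -sum(v * log2(v) for v in m.values() if v > 0)

def I(joint, axes):
    """k-mutual information via the alternated entropy sum of Eq. (17)."""
    return sum((-1) ** (len(s) - 1) * H_marg(joint, s)
               for i in range(1, len(axes) + 1)
               for s in combinations(axes, i))

# Two fair independent coins and their XOR: pairwise independent, jointly dependent.
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

i2 = I(xor, (0, 1))       # 0.0: X1 and X2 are independent
i3 = I(xor, (0, 1, 2))    # -1.0: synergy, information negativity
x3_i2 = i2 - i3           # X3.I(X1;X2) = I2 - I3 = 1.0 bit
print(i2, i3, x3_i2)
# i2 >= i3 together with X3.I(X1;X2) >= 0 illustrates Matsuda's first equivalence.
```

Here conditioning on X3 *increases* the mutual information between X1 and X2, the signature of frustration mentioned above.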

2.3. Information Structures and Coboundaries

This section justifies the choice of functions and algorithm, the topological nature of the data analysis and the approximations we had to concede for the computation. In the general formulation of information cohomology, the random variables are partitions of the atomic probabilities of a finite probability space (Ω, B, P) (i.e., their equivalence classes). The joint variable (X1,X2) is the least fine partition that is finer than both X1 and X2; the whole lattice of partitions Π [98] corresponds to the lattice of joint random variables [1,99]. A general information structure is then defined as the triple (Ω, Π, P); a more modern and general expression in category theory and topos theory is given in References [1,18]. (X1,…,Xk;P) designates the image law of the probability P under the measurable function of the joint variables (X1,…,Xk). Figure 1 gives a simple example of the lattice of partitions for four atomic probabilities, with the simplicial sublattice used for data analysis. Atomic probabilities are also illustrated in a figure in the associated paper [2].

Figure 1.

Example of general and simplicial information structures. (a) Example of a lattice of random variables (partitions): the lattice of partitions of atomic-elementary events for a sample space of four atomic elements, $|\Omega|=4$ (e.g., two coins and $\Omega=\{00,01,10,11\}$), each element being denoted by a black dot in the circles representing the random variables. The joint operation on two partitions, denoted $(X,Y)$ or $XY$, is the least fine partition $Z$ that is finer than both $X$ and $Y$ ($Z$ divides $Y$ and $X$; $Z$ is the greatest common divisor of $Y$ and $X$). It is represented by the coincidence of two edges of the lattice. The joint operation has an identity element denoted $1=\Omega$ (which we will denote $0$ hereafter), with $(X,1)=(X,\Omega)=X$, and is idempotent: $(X,X)=X^2=X$. The structure is a partially ordered set (poset) with a refinement relation. (b) Illustration of the simplicial structure (sublattice) used for the data analysis ($|\Omega|=4$ as previously).

On this general information structure, we consider the real module of all measurable functions $F(X_1,\ldots,X_k;P)$ and the conditioning-expectation by $Y$ of measurable functions as the action of $Y$ on the functional module, denoted $Y.F(X_1,\ldots,X_k;P)$, such that it corresponds to the usual definition of conditional entropy (Equation (8)). We define our complexes of measurable functions of random variables, $X_k = \mathcal{F}(X_1,\ldots,X_k;P)$, and the cochain complexes $(X_k,\partial^k)$ as:

$0 \rightarrow X_0 \xrightarrow{\partial^0} X_1 \xrightarrow{\partial^1} X_2 \xrightarrow{\partial^2} \cdots \xrightarrow{\partial^{k-1}} X_k$,

where $\partial^k$ is the left-action coboundary that Hochschild proposed for associative and ring structures [100]. A similar construction of a random variable complex was given by Drumond-Cole, Park and Terilla [84,85]. We also consider the two other directly related cohomologies defined by considering a trivial left action [1] and a symmetric (left and right) action [22,101,102] of conditioning:

  • The left action Hochschild-information coboundary and cohomology (with trivial right action):
    $(\partial^k)F(X_1;X_2;\ldots;X_{k+1};P) = X_1.F(X_2;\ldots;X_{k+1};P) + \sum_{i=1}^{k}(-1)^i F(X_1;X_2;\ldots;(X_i,X_{i+1});\ldots;X_{k+1};P) + (-1)^{k+1}F(X_1;\ldots;X_k;P)$. (18)
    This coboundary, with a trivial right action, is the usual coboundary of Galois cohomology ([103], p. 2) and, in general, it is the coboundary of homological algebra obtained by Cartan and Eilenberg [104] and MacLane [105] (non-homogeneous bar complex).
  • The “topological-trivial” Hochschild-information coboundary and cohomology: consider a trivial left action in the preceding setting (e.g., $X_1.F(X_2;\ldots;X_{k+1}) = F(X_2;\ldots;X_{k+1})$). It is the subset of the preceding case which is invariant under the action of conditioning. We obtain the topological coboundary $(\partial_t^k)$ [1]:
    $(\partial_t^k)F(X_1;X_2;\ldots;X_{k+1};P) = F(X_2;\ldots;X_{k+1};P) + \sum_{i=1}^{k}(-1)^i F(X_1;X_2;\ldots;(X_i,X_{i+1});\ldots;X_{k+1};P) + (-1)^{k+1}F(X_1;\ldots;X_k;P)$. (19)
  • The symmetric Hochschild-information coboundary and cohomology: as introduced by Gerstenhaber and Schack [22], Kassel [102] (p. 13) and Weibel [101] (chap. 9), we consider a symmetric (left and right) action of conditioning, that is, $X_1.F(X_2;\ldots;X_{k+1}) = F(X_2;\ldots;X_{k+1}).X_1$. The left action module is essentially the same as considering a symmetric-action bimodule [22,101,102]. We hence obtain the following symmetric coboundary $(\partial_*^k)$:
    $(\partial_*^k)F(X_1;X_2;\ldots;X_{k+1};P) = X_1.F(X_2;\ldots;X_{k+1};P) + \sum_{i=1}^{k}(-1)^i F(X_1;X_2;\ldots;(X_i,X_{i+1});\ldots;X_{k+1};P) + (-1)^{k+1}X_{k+1}.F(X_1;\ldots;X_k;P)$. (20)

Based on these definitions, Baudot and Bennequin [1] computed the first homology class in the left-action Hochschild-information cohomology case and the coboundaries in higher degrees. We introduce here the symmetric case and detail the higher-degree cases by direct specialization of the coboundary formulas, such that it appears that information functions and chain rules are homological by nature. For notational clarity, we omit the probability $P$ in the writing of the functions and, when specifically stated, replace the notation $F$ by the usual corresponding informational function notation $H$ or $I$.

2.3.1. First Degree (k=1)

For the first degree k=1, we have the following results:

  • The left 1-coboundary is $(\partial^1)F(X_1;X_2) = X_1.F(X_2) - F(X_1,X_2) + F(X_1)$. The 1-cocycle condition $(\partial^1)F(X_1;X_2) = 0$ gives $F(X_1,X_2) = F(X_1) + X_1.F(X_2)$, which is the chain rule of information shown in Equation (10). Then, following Kendall [106] and Lee [107], it is possible to recover the functional equation of information and to characterize uniquely, up to the arbitrary multiplicative constant $k$, the entropy (Equation (1)) as the first class of cohomology [1,18]. This main theorem allows us to obtain the other information functions in what follows. Marcolli and Thorngren [20] and the group of Leinster, Fritz and Baez [81,82] independently obtained an analogous result using a measure-preserving function and a characteristic-one Witt construction, respectively. In these various theoretical settings, this result extends to relative entropy [1,20,82] and Tsallis entropies [18,20].

  • The topological 1-coboundary is $(\partial_t^1)F(X_1;X_2) = F(X_2) - F(X_1,X_2) + F(X_1)$, which corresponds to the definition of mutual information: $(\partial_t^1)H(X_1;X_2) = I(X_1;X_2) = H(X_1) + H(X_2) - H(X_1,X_2)$; hence $I_2$ is a topological 1-coboundary.

  • The symmetric 1-coboundary is $(\partial_*^1)F(X_1;X_2) = X_1.F(X_2) - F(X_1,X_2) + X_2.F(X_1)$, which corresponds to the negative of the pairwise mutual information: $(\partial_*^1)H(X_1;X_2) = X_2.H(X_1) + X_1.H(X_2) - H(X_1,X_2) = -I(X_1;X_2)$; hence $-I_2$ is a symmetric 1-coboundary. Moreover, the 1-cocycle condition $(\partial_*^1)F(X_1;X_2) = 0$ characterizes the functions satisfying $F(X_1,X_2) = X_2.F(X_1) + X_1.F(X_2)$, which corresponds to the information pseudo-metric discovered by Shannon [23], Rajski [24], Zurek [25] and Bennett [26], further applied to hierarchical clustering and finding categories in data by Kraskov and Grassberger [27]: $H(X_1\Delta X_2) = X_2.H(X_1) + X_1.H(X_2) = H(X_1,X_2) - I(X_1;X_2)$. Therefore, up to an arbitrary scalar multiplicative constant $k$, the information pseudo-metric $H(X_1\Delta X_2)$ is the first class of symmetric cohomology. This pseudo-metric is represented in Figure 3. It generalizes to pseudo $k$-volumes that we define by $V_k = H_k - I_k$ (particularly interesting symmetric non-negative functions computed by the provided software).
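The degree-1 computations above can be verified numerically; the sketch below (standard library only; the correlated pair is an arbitrary illustrative law, not from the paper) evaluates the topological 1-coboundary with $F = H$ and the symmetric pseudo-metric:

```python
from math import log2

# Illustrative correlated pair of bits (not from the paper's data)
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def H(joint, subset):
    """Joint entropy (bits) of the variables indexed by `subset`."""
    marg = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in subset)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

H1, H2, H12 = H(joint, (0,)), H(joint, (1,)), H(joint, (0, 1))

# Topological 1-coboundary with F = H: F(X2) - F(X1,X2) + F(X1) = I(X1;X2)
I2 = H2 - H12 + H1

# Shannon pseudo-metric: H(X1|X2) + H(X2|X1) = H(X1,X2) - I(X1;X2)
d = (H1 - I2) + (H2 - I2)  # the two conditional entropies, H12-H2 and H12-H1
```

The coboundary reproduces $I(X_1;X_2)$ and the pseudo-metric $d = H(X_1,X_2) - I(X_1;X_2)$ is non-negative, as expected for a pseudo-distance.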

2.3.2. Second Degree (k=2)

For the second degree k=2, we have the following results:

  • The left 2-coboundary is $\partial^2 F(X_1;X_2;X_3) = X_1.F(X_2;X_3) - F((X_1,X_2);X_3) + F(X_1;(X_2,X_3)) - F(X_1;X_2)$, which corresponds to minus the 3-mutual information: $\partial^2 I(X_1;X_2;X_3) = X_1.I(X_2;X_3) - I((X_1,X_2);X_3) + I(X_1;(X_2,X_3)) - I(X_1;X_2) = -I(X_1;X_2;X_3)$; hence $-I_3$ is a left 2-coboundary.

  • The topological 2-coboundary is $(\partial_t^2)F(X_1;X_2;X_3) = F(X_2;X_3) - F((X_1,X_2);X_3) + F(X_1;(X_2,X_3)) - F(X_1;X_2)$, which corresponds in information to $\partial_t^2 I(X_1;X_2;X_3) = I(X_2;X_3) - I((X_1,X_2);X_3) + I(X_1;(X_2,X_3)) - I(X_1;X_2) = 0$; hence the topological 2-coboundary is always null (trivial).

  • The symmetric 2-coboundary is $(\partial_*^2)F(X_1;X_2;X_3) = X_1.F(X_2;X_3) - F((X_1,X_2);X_3) + F(X_1;(X_2,X_3)) - X_3.F(X_1;X_2)$, which corresponds in information to $\partial_*^2 I(X_1;X_2;X_3) = X_1.I(X_2;X_3) - I((X_1,X_2);X_3) + I(X_1;(X_2,X_3)) - X_3.I(X_1;X_2) = 0$; hence the symmetric 2-coboundary is always null (trivial).

2.3.3. Third Degree (k=3)

For the third degree k=3, we have the following results:

  • The left 3-coboundary is $\partial^3 F(X_1;X_2;X_3;X_4) = X_1.F(X_2;X_3;X_4) - F((X_1,X_2);X_3;X_4) + F(X_1;(X_2,X_3);X_4) - F(X_1;X_2;(X_3,X_4)) + F(X_1;X_2;X_3)$, which corresponds in information to $\partial^3 I(X_1;X_2;X_3;X_4) = X_1.I(X_2;X_3;X_4) - I((X_1,X_2);X_3;X_4) + I(X_1;(X_2,X_3);X_4) - I(X_1;X_2;(X_3,X_4)) + I(X_1;X_2;X_3) = 0$; hence the left 3-coboundary is always null (trivial).

  • The topological 3-coboundary is $\partial_t^3 F(X_1;X_2;X_3;X_4) = F(X_2;X_3;X_4) - F((X_1,X_2);X_3;X_4) + F(X_1;(X_2,X_3);X_4) - F(X_1;X_2;(X_3,X_4)) + F(X_1;X_2;X_3)$, which corresponds in information to $\partial_t^3 I(X_1;X_2;X_3;X_4) = I(X_2;X_3;X_4) - I((X_1,X_2);X_3;X_4) + I(X_1;(X_2,X_3);X_4) - I(X_1;X_2;(X_3,X_4)) + I(X_1;X_2;X_3) = I(X_1;X_2;X_3;X_4)$; hence $I_4$ is a topological 3-coboundary.

  • The symmetric 3-coboundary is $(\partial_*^3)F(X_1;X_2;X_3;X_4) = X_1.F(X_2;X_3;X_4) - F((X_1,X_2);X_3;X_4) + F(X_1;(X_2,X_3);X_4) - F(X_1;X_2;(X_3,X_4)) + X_4.F(X_1;X_2;X_3)$, which corresponds in information to $\partial_*^3 I(X_1;X_2;X_3;X_4) = X_1.I(X_2;X_3;X_4) - I((X_1,X_2);X_3;X_4) + I(X_1;(X_2,X_3);X_4) - I(X_1;X_2;(X_3,X_4)) + X_4.I(X_1;X_2;X_3) = -I(X_1;X_2;X_3;X_4)$; hence $-I_4$ is a symmetric 3-coboundary.
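Since the identity $\partial_t^3 I = I_4$ holds for any joint distribution, it can serve as a numerical sanity check. The sketch below (standard library only; the weighted four-bit law is an arbitrary illustrative choice) evaluates the five terms of the topological 3-coboundary applied to mutual information and compares them with $I_4$:

```python
from itertools import combinations, product
from math import log2

# Arbitrary (illustrative) joint law on four bits
w = {x: 1.0 + 0.5 * x[0] + 0.25 * x[1] * x[2] + 0.125 * x[0] * x[3]
     for x in product((0, 1), repeat=4)}
Z = sum(w.values())
joint = {x: v / Z for x, v in w.items()}

def H(joint, idx):
    """Joint entropy (bits) of the variables indexed by `idx`."""
    marg = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def I(joint, groups):
    """Mutual information of grouped variables, e.g. I((X1,X2);X3;X4)."""
    k = len(groups)
    return sum((-1) ** (i - 1)
               * sum(H(joint, tuple(j for g in c for j in g))
                     for c in combinations(groups, i))
               for i in range(1, k + 1))

# Topological 3-coboundary (Eq. (19) at k = 3) applied to F = I
lhs = (I(joint, ((1,), (2,), (3,)))
       - I(joint, ((0, 1), (2,), (3,)))
       + I(joint, ((0,), (1, 2), (3,)))
       - I(joint, ((0,), (1,), (2, 3)))
       + I(joint, ((0,), (1,), (2,))))
rhs = I(joint, ((0,), (1,), (2,), (3,)))   # I4
```

Changing the weights `w` leaves the equality satisfied, reflecting that this is a coboundary identity and not a property of a particular law.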

2.3.4. Higher Degrees

For $k=4$, we obtain $\partial^4 F(X_1;X_2;X_3;X_4;X_5) = -I_5$, $\partial_t^4 F(X_1;X_2;X_3;X_4;X_5) = 0$ and $\partial_*^4 F(X_1;X_2;X_3;X_4;X_5) = 0$. For arbitrary $k$, the symmetric coboundaries are just the opposite of the topological coboundaries: $\partial_*^k = -\partial_t^k$. It is possible to generalize to arbitrary degrees [1] by remarking that:

  • For even degrees $2k$: we have $I_{2k} = \partial_t I_{2k-1}$ and then, iterating, $I_{2k} = \pm\,\partial_t\partial\partial_t\cdots\partial H$ with $2k-1$ coboundary terms. In conclusion, we have:
    $\partial^{2k}F = -I_{2k+1}$ and $\partial_*^{2k}F = \partial_t^{2k}F = 0$. (21)
  • For odd degrees $2k+1$: $I_{2k+1} = -\partial I_{2k}$ and then, iterating, $I_{2k+1} = \pm\,\partial\partial_t\cdots\partial_t H$ with $2k$ coboundary terms. In conclusion, we have:
    $\partial^{2k+1}F = 0$ and $\partial_*^{2k+1}F = -\partial_t^{2k+1}F = -I_{2k+2}$. (22)

In References [2,108] (Theorem 2), we show that the mutual independence of $n$ variables is equivalent to the vanishing of all $I_k$ functions for all $2 \leq k \leq n$. As a probabilistic interpretation and conclusion, the information cohomology hence quantifies statistical dependences at all degrees and the obstruction to factorization; moreover, $k$-independence coincides with cocycles. We therefore expect that the higher cocycles of information, conjectured to be polylogarithmic forms [1,77,78], are characterized by the functional equations $I_k = 0$ and quantify statistical $k$-independence.

3. Simplicial Information Cohomology

3.1. Simplicial Substructures of Information

The general information structure, relying on the information functions defined on the whole lattice of partitions, encompasses all possible statistical dependences and relations, since by definition it considers all possible equivalence classes on a probability space. One could hence expect this general structure to provide a promising theoretical framework for classification tasks on data and this is probably true in theory. However, this general case hardly allows any interesting computational investigation, as it implies an exhaustive exploration whose computational complexity follows Bell's combinatorics, in $O(\exp(\exp(N^n)))$ for $n$ $N$-ary variables. This fact was already remarked in the study of aggregation for artificial intelligence by Lamarche-Perrin and colleagues [109]. At each order $k$, the number of $k$-joint-entropies and $k$-mutual-informations to evaluate is given by the Stirling numbers of the second kind $S(n,k)$, which sum to the Bell number $B_n = \sum_{k=0}^{n}S(n,k)$. For example, considering 16 variables that can take 8 values each, we have $8^{16} = 2^{48} \approx 3\times10^{14}$ atomic probabilities and the partition lattice of variables exhibits on the order of $e^{e^{2^{48}}}$ elements to compute. This computational reef can be decreased by considering the sample size $m$, which is the number of trials, repetitions or points used to effectively estimate the empirical probability. It restricts the computation to $O(\exp(\exp(m)))$, which remains insurmountable in practice with our current classical Turing machines. To circumvent this computational barrier, data analysis is developed on the simplest and oldest subcase of Hochschild cohomology, the simplicial cohomology, which we hence call the simplicial information cohomology and structure and which corresponds to a subcase of the cohomology and structure introduced previously (see Figure 1b). It corresponds to Examples 1 and 4 in Reference [1] and to the python scripts shared on Github (cf. Supplementary Materials).
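The combinatorial gap between the general and the simplicial structures can be made concrete with a short computation (a minimal sketch using only the standard library; not the paper's software):

```python
from math import comb

def stirling2(n, k):
    """Stirling number of the second kind S(n, k): partitions of an n-set into k blocks."""
    S = [[0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[n][k]

def bell(n):
    """Bell number B_n = sum_k S(n, k): total number of partitions of an n-set."""
    return sum(stirling2(n, k) for k in range(n + 1))

# General (partition) structure vs. simplicial substructure for n = 5 variables:
n = 5
partition_elements = bell(n)                                     # 52
simplicial_elements = sum(comb(n, k) for k in range(1, n + 1))   # 2^n - 1 = 31
```

Already for $n=5$ the partition lattice has 52 elements against 31 non-empty tuples for the simplicial substructure, and the Bell numbers grow super-exponentially with $n$.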
For simplicity, we also denote the simplicial information structure $(\Omega,\Delta^n,P)$, with $\Delta^n = (X_1,\ldots,X_n;P)$, as we will not come back to the general setting. The joint $(X_1,X_2)$ and meet $(X_1;X_2)$ operations on random variables are the usual join-union and meet-intersection of Boolean algebra and define two opposite-dual monoids, freely generating the lattice of all subsets and its dual. The combinatorics of the simplicial information structure follow binomial coefficients: for each degree $k$ in an information structure of $n$ variables, we have $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ elements, which are in one-to-one correspondence with the $k$-faces (the $k$-tuples) of the $n$-simplex of random variables (or its barycentric subdivisions). It is a (simplicial) substructure of the general structure, since any finite lattice is a sub-lattice of the partition lattice [110]. This lattice embedding and the fact that simplicial cohomology is a special case of Hochschild cohomology can also be inferred directly from their coboundary expressions and has been explicitly formalized in homology: notably, Gerstenhaber and Schack showed that a functor, denoted $\Sigma \mapsto k\Sigma!$, induces an isomorphism between simplicial and Hochschild cohomology, $H^\bullet(\Sigma,k) \cong H^\bullet(k\Sigma!,k\Sigma!)$ (see [111] for details). A simplicial complex $X_k = \mathcal{F}(X_1,\ldots,X_k;P)$ of measurable functions is any subcomplex of this simplex $\Delta^n$ with $k\leq n$ and any simplicial complex can be realized as a subcomplex of a simplex (see p. 296 in Reference [112]). The information landscapes presented in Section 3.3 illustrate an example of such a lattice/information structure. Moreover, in this ordinary homological structure, the degree obviously coincides with the dimension of the data space (the data space is in general $\mathbb{R}^n$, the space of “co-ordinate” values of the variables). This homological (algebraic, geometric and combinatorial) restriction to the simplicial subcase can have some important statistical consequences.
In practice, whereas the consideration of the partition lattice ensured that no reasonable (up to logical equivalence) statistical dependences could be missed (since all the possible equivalence classes on the atomic probabilities were considered), the monoidal simplicial structure unavoidably misses some possible statistical dependences, as shown and exemplified by James and Crutchfield [51].

3.2. Topological Self and Free Energy of k-Body Interacting Systems: The Poincaré-Shannon Machine

The basic idea behind the development of topological quantum field theories [113,114,115] was to define the action and energy functionals on purely topological grounds, independently of any metric assumptions, and to derive from them the correlation or partition functions. Here, in an elementary model for applied purposes, we define, in the special case of classical and discrete probability, the $k$-mutual informations $I_k$ (which generalize the correlation functions to nonlinear relations [116]) as the contribution of the $k$-body interactions to the energy functional. Some further observations support such a definition: (i) as stated in Reference [1], the signed mutual informations $(-1)^k I_k$ defining the energy are sub-harmonic, a kind of weak convexity; (ii) in the next sections, we define the paths of information and show that they are equivalent to the discrete symmetry group; (iii) from the empirical point of view, the results in Section 3.4.5 show that these energy functionals estimated on real data behave as expected from the usual $k$-body homogeneous formalisms, such as the van der Waals model or the more refined density functional theory (DFT) [117,118]. Given here in the context of simplicial structures, these definitions generalize to the case of the partition lattice and altogether provide the usual thermodynamical and machine-learning expressions and interpretations of mutual-information quantities: new methods free of metric assumptions. There are two qualitatively and formally different components in the $I_k$, which give the two following definitions.

Definition 1.

Self-internal energy (definition): for $k=1$, the $I_1$ and their sum in an information structure expressed in Equation (16), namely $\sum_{T\subset[n];\,\mathrm{card}(T)=1} I_1(X_T;P)$, are a self-interaction component, since they sum over the marginal information entropies $I_1(X_i) = H_1(X_i)$. We call the first-dimension mutual-information component $U(X_1,\ldots,X_n;P_N)$ the self-information or internal energy, in analogy with usual statistical physics and notably DFT:

$U(X_1,\ldots,X_n;P_N) = \sum_{i=1}^{n} I_1(X_i;P_N)$. (23)

Note that in the present context, which is discrete and where the interactions do not depend on a metric, the self-interaction does not diverge. Divergence is a usual problem with metric continuous formalisms and was the original motivation for the infinite corrections of regularization and renormalization, considered by Feynman and Dirac as the mathematical defect of the formalism [44,45].

Definition 2.

$k$-free-energy and total-free-energy: for $k\geq2$, the $(-1)^k I_k$ and their sum in an information structure (Equation (16)) quantify the contribution of the $k$-body interactions. We call the $k$th-dimension mutual-information component $(-1)^k I_k$ given in Equation (5) the $k$-free-information-energy. We call the (cumulative) sum over dimensions of these $k$-free-information-energies, starting at pairwise interactions (dimension 2), the total $n$-free-information-energy and denote it $G(X_1,\ldots,X_n;P_N)$:

$G(X_1,\ldots,X_n;P_N) = \sum_{i=2}^{n}(-1)^{i}\sum_{I\subset[n];\,\mathrm{card}(I)=i} I_i(X_I;P_N) = C_n(X_1;\ldots;X_n;P_N)$. (24)

The total free energy is the total correlation (Equation (7)) introduced by Watanabe in 1960 [94], which quantifies statistical dependence in the work of Studený and Vejnarova [96] and Margolin and colleagues [97] and, among other examples, consciousness in the work of Tononi and Edelman [95]. In agreement with the results of Baez and Pollard in their study of biological dynamics using an out-of-equilibrium formalism [32] and with the appendix of the companion paper on Bayes free energy [2], the total free energy is a relative entropy. The consideration that the free energy is the peculiar case of total correlation within the set of relative entropies accounts for the fact that the free energy shall be a symmetric function of the variables associated with the various bodies (e.g., $f(X;Y)=f(Y;X)$ in the pairwise-interaction case). Moreover, whereas the $I_k$ energy components can be negative, the $G_k$ total energy component is always non-negative. Each $(-1)^k I_k$ term in the free energy can be understood as a free-energy correction accounting for the $k$-body interactions.

Entropy is given by the alternated sums of information (Equation (16)), which then read as the usual isotherm thermodynamic relation:

$H_n(X_1,\ldots,X_n;P_N) = U(X_1,\ldots,X_n;P_N) - G(X_1,\ldots,X_n;P_N)$. (25)
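The isotherm decomposition of Equation (25) can be checked directly on a three-bit XOR ("Borromean") law (a minimal sketch with the standard library; an illustrative choice, not the paper's software):

```python
from itertools import combinations
from math import log2

# Illustrative law: X1, X2 uniform independent bits, X3 = X1 XOR X2
joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
n = 3

def H(joint, subset):
    """Joint entropy (bits) of the variables indexed by `subset`."""
    marg = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in subset)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def I(joint, subset):
    """k-mutual information by alternated sums of entropies (Eq. (17))."""
    return sum((-1) ** (i - 1)
               * sum(H(joint, T) for T in combinations(subset, i))
               for i in range(1, len(subset) + 1))

U = sum(H(joint, (i,)) for i in range(n))          # internal energy, Eq. (23)
Hn = H(joint, tuple(range(n)))                     # joint entropy
G = sum((-1) ** i                                  # free energy, Eq. (24)
        * sum(I(joint, T) for T in combinations(range(n), i))
        for i in range(2, n + 1))
```

Here $U = 3$ bits, $H_3 = 2$ bits and $G = 1$ bit: the positive total correlation comes entirely from the $-I_3 = 1$ bit term, the pairwise terms being null.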

This information-theoretic formulation of the thermodynamic relation follows the original work of Jaynes [9,10], Landauer [11], Wheeler [119] and Bennett [14] and is general in the sense that it is finite and discrete and holds independently of the assumption that the system is at equilibrium (i.e., it holds for any finite probability). In more probabilistic terms, it does not assume that the variables are identically distributed, a required condition for the application of classical central limit theorems (CLTs) to obtain normal distributions in the asymptotic limit [120]. In the special case where one postulates that the probability follows the equilibrium Gibbs distribution, which is also the maximum entropy distribution [121,122], the expression of the joint entropy (with $k=1/\ln 2$) allows recovery of the equilibrium fundamental relation, as usually achieved in statistical physics (see Adami and Cerf [123] and Kapranov [124] for more details). Explicitly, let us consider Gibbs' distribution:

$p(X_1=x_1,\ldots,X_n=x_n) = p_{ij\ldots n} = \frac{1}{Z}e^{-E_{ij\ldots n}/k_B T}$ ($n$ indices), (26)

where $E_{ij\ldots n}$ is the energy of the elementary-atomic probability $p_{ij\ldots n}$, $k_B$ is Boltzmann's constant, $T$ is the temperature and $Z = \sum_{i,j,\ldots,n}^{N_i.N_j\cdots N_n} e^{-E_{ij\ldots n}/k_B T}$ is the partition function, such that $\sum_{i,j,\ldots,n}^{N_i.N_j\cdots N_n} p_{ij\ldots n} = 1$. Since $H(X_1,\ldots,X_n) = -k\sum_{i,j,\ldots,n}^{N_i.N_j\cdots N_n} p_{ij\ldots n}\ln p_{ij\ldots n}$ equals the thermodynamic entropy function $S$ up to the arbitrary Landauer constant factor $k_B\ln 2$, that is, $S = k_B\ln 2\cdot H(X_1,\ldots,X_n)$, the entropy for the Gibbs distribution gives:

$H(X_1,\ldots,X_n)/k = \sum_{i,j,\ldots,n}^{N_1.N_2\cdots N_n} p_{ij\ldots n}E_{ij\ldots n}/k_B T + \sum_{i,j,\ldots,n}^{N_1.N_2\cdots N_n} p_{ij\ldots n}\ln Z = (E-G)/k_B T$, (27)

which gives the expected thermodynamical relation:

$k_B T\ln 2\cdot H(X_1,\ldots,X_n) = E - G = U - G$, (28)

where $G$ is the free energy, $G = -k_B T\ln Z$.

In the general case of arbitrary random variables (not necessarily i.i.d.) and discrete probability space, the identification of marginal information with internal energy

$\sum_{k=1}^{n} H(X_k) = \sum_{i,j,\ldots,n}^{N_1.N_2\cdots N_n} p_{ij\ldots n}E_{ij\ldots n}$ (29)

implies by direct algebraic calculus that:

$\sum_{i,j,\ldots,n}^{N_1.N_2\cdots N_n} p_{ij\ldots n}E_{ij\ldots n} = -\sum_{i,j,\ldots,n}^{N_1.N_2\cdots N_n} p_{ij\ldots n}\ln\prod_{k=1}^{n} p_k$, (30)

where the marginal probability $p_k$ is the sum over all probabilities for which $X_k = x_k$. It is hence tempting to identify the elementary atomic energies $E_{ij\ldots n}$ with the elementary marginal informations $-\ln p_k$. This identification is achieved uniquely by considering that such an elementary energy function must satisfy the additivity axiom (extensivity): $E(X_i=x_i,X_j=x_j) = E_{i,j} = E_i + E_j$, which is the functional equation of the logarithm. The original proof goes back at least to Kepler; an elementary version was given by Erdős [125] and, in information-theoretic terms, it can be found in the proofs of uniqueness of the “single event information function” by Aczél and Daróczy ([126], p. 3). It establishes the following proposition:

Theorem 1.

Given a simplicial information structure, the elementary energies satisfying the extensivity axiom are the functions:

$E_{ij\ldots n} = -k\sum_{k=1}^{n}\ln p_k$, (31)

where $k$ is an arbitrary constant, set to $k=1/\ln 2$ for units in bits.

The geometric meaning of these elementary energies as logarithms of marginal elementary probability volumes (locally Euclidean) is illustrated in Figure 2 and further underlines that the $I_k$, $k\geq2$, are volume corrections accounting for the statistical dependences among the marginal variables.

Figure 2.

Elementary energy as the logarithm of locally Euclidean probability volumes. Example of an elementary energy $E_{ijk}$ associated with a probability $p_{ijk}$ ($n=3$ variables). The histograms of the marginal distributions of each variable are plotted beside the axes.

Examples: (i) In the example of three binary random variables ($n=3$, $N_1=N_2=N_3=2$, three Bernoulli variables) illustrated in the figure of the associated paper [2], we have $E_{000} = -\ln(p_0 p_0 p_0) = -\ln(p_{000}+p_{010}+p_{001}+p_{011}) - \ln(p_{000}+p_{100}+p_{001}+p_{101}) - \ln(p_{000}+p_{100}+p_{010}+p_{110})$, where each $p_0$ denotes the corresponding marginal probability. In the configuration of negative-entangled-Borromean information of the figure of the associated paper [2], we obtain $E_{000}=3$ in bit units and similarly $E_{001}=E_{010}=E_{011}=E_{100}=E_{101}=E_{110}=E_{111}=3$; we hence recover $U = \sum_{i,j,k}^{8} p_{ijk}E_{ijk} = \sum_{i=1}^{3}H(X_i) = 3$ bits. Note that the convention $0\ln 0 = 0$ avoids singularities of the elementary energies.
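The Borromean computation of example (i) can be reproduced as follows (an illustrative sketch, standard library only, with $k = 1/\ln 2$, i.e., logarithms in base 2):

```python
from math import log2

# Three-bit "Borromean" configuration: X1, X2 uniform bits, X3 = X1 XOR X2
joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

def marginal(joint, axis, value):
    """Marginal probability p(X_axis = value)."""
    return sum(p for x, p in joint.items() if x[axis] == value)

def E(x):
    """Elementary energy (Eq. (31)): minus the log of the marginal volumes."""
    return -sum(log2(marginal(joint, axis, v)) for axis, v in enumerate(x))

energies = {x: E(x) for x in joint}
U = sum(p * E(x) for x, p in joint.items())   # recovers the internal energy
```

Every configuration has elementary energy 3 bits because all three marginals are uniform; averaging over the law recovers the internal energy $U = \sum_i H(X_i) = 3$ bits.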

(ii) In the special case of identically distributed variables, $p_k = p_j$ for all $k,j$, we have $E_{ij\ldots n} = -nk\ln p_k$ and hence the marginal Gibbs distribution:

$p_k = e^{-E_{ij\ldots n}/nk}$. (32)

(iii) For independent identically distributed variables (non-interacting), we have $G_n = 0$ and hence:

$H_n(X_1,\ldots,X_n;P_N) = U(X_1,\ldots,X_n;P_N) = nH(X_i)$. (33)

(iv) Considering the variables to be the $6n$ variables of the phase space, with three coordinates of position and three of momentum per body (denoted $(X_{k1},X_{k2},X_{k3},P_{k1},P_{k2},P_{k3})$ for the $k$th body), it is possible to re-express the semi-classical formalism, according to which the entropy formulation is ([127], p. 22):

$H_{6n}(X_{11},X_{12},X_{13},P_{11},P_{12},P_{13},\ldots,P_{n3};P_N) = \log\frac{\Delta X\Delta P}{(2\pi\hbar)^{6n}}$. (34)

This is achieved by identifying the internal and free energies as follows:

$E = -6n\log(2\pi\hbar)$, (35)
$G = -\log(\Delta X\Delta P)$. (36)

This identifies the elementary volumes/probabilities with the Planck constant, the quantum of action (consistency of the units is realized in Section 3.4.3 by the introduction of time). The quantum of action can be illustrated by considering, in Figure 2, that it is the surface of the square/rectangle for two conjugate variables (considered as position and momentum). In this setting, $\Delta X\Delta P$ quantifies the non-extensivity of the volume in the phase space due to interactions; in other words, the mutual information accounts for the dependence of the subsystems, considered as open and exchanging energy. As noted by Baez and Pollard, the relative entropy provides a quantitative measure of how far from equilibrium the whole system is [32]. The basic principle of this expression of information theory in physics has been known at least since Jaynes's work [9,10].

As a conclusion, information topology applies, without imposing metric, symplectic or contact structures, to the physical formalism of $n$-body interacting systems relying on empirical measures. Considering the $3n$ or $6n$ dimensions (degrees of freedom) of a configuration or a phase space as random variables, it is possible to recover the (semi-)classical statistical physics formalism. It is also interesting to discuss the status of the analog of the temperature in the present formalism, whose role is played by the graining, that is, the size $N_i$ of the alphabet of a variable $X_i$. In usual thermodynamics we have $H(X_n;P_N) = T.S(X_n)$ and, to stay consistent, the temperature shall be a functional inverse of the graining $N$, the lowest temperature corresponding to the finest grain (large $N$) and the highest temperature to the coarsest grain (small $N$).

3.3. k-Entropy and k-Information Landscapes

Definition 3.

Information Landscapes: Information landscapes are a representation of the (semi-)lattice of information structures where each element is represented as a function of its corresponding value of entropy or mutual information. The abscissa gives the dimension $k$ and the ordinate the value of the information function for a given subset of $k$ variables.

In data-science terms, these landscapes provide a visualization of the potentially high-dimensional structure of the data points. In information-theoretic terms, they provide a representation of Shannon's work on lattices [23], further developed by Han [128]. $H_k$ and $I_k$, as real continuous functions, provide a ranking of the lattice at each dimension $k$. It is the ranking (i.e., the relative values of information) that matters and comes out of the homological approach, rather than the absolute values. The principle of the $H_k$ and $I_k$ landscapes is illustrated in Figure 3 for $n=4$. $H_k$ and $I_k$ analyze and quantify the variability-randomness and the statistical dependences, respectively, at all dimensions $k$ from 1 to $n$, $n$ being the total number of variables under study. The $H_k$ landscape represents the values of joint entropy for all $k$-tuples of variables as a function of the dimension $k$, the number of variables in the $k$-tuple, together with the associated edges-paths of the lattice (in grey). The $I_k$ landscape represents the values of mutual information for all $k$-tuples of variables as a function of the dimension $k$. Figure 3 gives two theoretical extremal examples of such landscapes: one for independent and identically distributed variables (totally disordered) and one for fully dependent identically distributed variables (totally ordered). The degeneracy of the $H_k$ and $I_k$ values is given by the binomial coefficients (color code in Figure 3), hence allowing one to derive the exact normal expression of the information landscapes in the asymptotic infinite-dimensional limit ($n\to\infty$) by application of the de Moivre-Laplace theorem. These are theoretical extremal examples: $H_k$ and $I_k$ landscapes effectively computed and estimated on biological data with a finite sample are shown in References [2,3,108] and, in practice, the finite sample size ($m$) may impose some bounds on the landscapes.
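The two extremal landscapes of Figure 3 can be generated numerically; the sketch below (standard library only; helper names are illustrative, not the paper's software) computes all $H_k$ and $I_k$ values for independent versus fully redundant bits:

```python
from itertools import combinations, product
from math import log2

def H(joint, subset):
    """Joint entropy (bits) of the variables indexed by `subset`."""
    marg = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in subset)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def I(joint, subset):
    """k-mutual information by alternated sums of entropies (Eq. (17))."""
    return sum((-1) ** (i - 1)
               * sum(H(joint, T) for T in combinations(subset, i))
               for i in range(1, len(subset) + 1))

def landscapes(joint, n):
    """Hk and Ik values for every k-tuple, as {k: sorted list of values}."""
    Hl = {k: sorted(H(joint, T) for T in combinations(range(n), k))
          for k in range(1, n + 1)}
    Il = {k: sorted(I(joint, T) for T in combinations(range(n), k))
          for k in range(1, n + 1)}
    return Hl, Il

n = 4
iid = {x: 1 / 16 for x in product((0, 1), repeat=n)}   # independent uniform bits
redundant = {(b, b, b, b): 0.5 for b in (0, 1)}        # fully redundant bits
H_iid, I_iid = landscapes(iid, n)
H_red, I_red = landscapes(redundant, n)
```

For i.i.d. uniform bits, $H_k = k$ and $I_k = 0$ for $k\geq2$ (flat information landscape); for fully redundant bits, $H_k = I_k = 1$ for all $k$, matching the two extremal cases of Figure 3.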

Figure 3.

Entropy and information landscapes. (a) Illustration of the principle of an entropy Hk landscape and (b) of a mutual information Ik landscape for n=4 random variables. The lattice of the simplicial information structure is depicted with grey lines. Theoretical examples of entropy and information landscapes. (c,d) Hk and Ik landscapes for n independent and identically distributed variables. The degeneracy of Hk and Ik values is represented by a color code: the number of k-tuples having the same information value. (e,f) Hk and Ik landscapes for n fully redundant variables. Such variables are equivalent from the information point of view; they are identically distributed and fully dependent.

3.4. Information Paths and Minimum Free Energy Complex

In this section we establish that information landscapes and paths directly encode the basic equalities, inequalities and functions of information theory and allow us to obtain the minimum free energy complex that we estimate on data.

3.4.1. Information Paths (Definition)

Definition 4.

Information Paths: On the discrete simplicial information lattice $\Delta^k$, we define a path of degree $k$ as a sequence of edges of the lattice that begins at the least element of the lattice (the identity constant “0”), travels along edges from vertex to vertex of increasing dimension and ends at the greatest element of the lattice, of dimension $k$. Information paths are defined on both the joint-entropy and the meet-mutual-information semi-lattices and the usual joint-entropy and mutual-information functions are defined on each element of such paths. The entropy path and the information path of degree $k$ are denoted $HP_k$ and $IP_k$, respectively, and the sets of all information paths are denoted $\mathcal{HP}_k = \{HP_i\}_{i\in\{1,\ldots,k!\}}$ for the entropy paths and $\mathcal{IP}_k = \{IP_i\}_{i\in\{1,\ldots,k!\}}$ for the mutual-information paths.

We have the theorem:

Theorem 2.

The two sets of all information paths $\mathcal{HP}_k$ and $\mathcal{IP}_k$ in the simplicial information structure $\Delta^k$ are both in bijection with the symmetric group $S_k$. Notably, there are $k!$ information paths in $\Delta^k$.

Proof. 

By simple enumeration: a node at dimension $m$ is connected to $k-m$ nodes at dimension $m+1$; the number of paths is hence $(k-0)\cdot(k-1)\cdots(k-k+2)\cdot(k-k+1) = k!$, hence the conclusion. □

A given path can be identified with a permutation, or total order, by extracting the variable missing from the previous node when increasing the dimension; for example, the mutual-information path in $\Delta^4$, $IP_i = 0\to(0,X_2)\to(0,X_1,X_2)\to(0,X_1,X_2,X_4)\to(0,X_1,X_2,X_3,X_4)$, can be noted as the permutation $\sigma$:

$\begin{pmatrix}0&1&2&3&4\\0&2&1&4&3\end{pmatrix}$ or $(01234)\xrightarrow{\sigma}(02143)$. (37)

We note an information path with arrows, giving for the previous example $IP_i = (0\to X_2\to X_1\to X_4\to X_3)$. These paths shall be seen as the automorphisms of $\{1,\ldots,k\}=[k]$ and the spaces of entropy and mutual-information paths can be endowed with the structure of two opposite symmetric groups, $S_k$ and $S_k^{op}$. The equivalence of the set of paths with the symmetric group only holds for the subcase of simplicial structures; the information paths in the lattice of partitions are obviously much richer. More precisely, the subset of simplicial information paths in the lattice of partitions corresponds to the automorphisms of the lattice: it is known that the finite symmetric group is the automorphism group of the finite partition lattice [129]. The geometrical realization of the information paths $\mathcal{IP}_k$ and $\mathcal{HP}_k$ consists of two dual permutohedra (see Postnikov [130]) and gives the informational version of the work of Matúš on conditional probability and permutohedra [131].
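The bijection of Theorem 2 can be made explicit by enumerating paths as permutations (a minimal sketch; names are illustrative):

```python
from itertools import permutations

n = 4
# Each information path in the simplicial lattice corresponds to the order in
# which the variables X1..Xn are appended along the path (Theorem 2).
paths = [tuple(f"X{i + 1}" for i in perm) for perm in permutations(range(n))]

# The example path 0 -> X2 -> (X1,X2) -> (X1,X2,X4) -> (X1,X2,X3,X4)
example = ("X2", "X1", "X4", "X3")
```

There are $4! = 24$ paths and each appears exactly once, realizing the bijection with $S_4$.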

3.4.2. Derivatives, Inequalities and Conditional Mutual-Information Negativity

Derivatives of information paths:

In the information landscapes, the paths $HP_i$ and $IP_i$ are piecewise linear functions, e.g., $IP_i(k) = I_k$, where $I_k$ is the mutual information of the $k$-tuple of variables pertaining to the path $IP_i$. We define the first derivatives of the paths for both the entropy and the mutual-information structures as piecewise linear functions:

First derivative of an entropy path: the first derivative of an entropy path $HP_i(k)$ is the conditional entropy $(X_1,\ldots,X_{k-1}).H(X_k;P)$:

$\frac{dHP_i(k)}{dk} = H(X_1,\ldots,X_k;P) - H(X_1,\ldots,X_{k-1};P) = (X_1,\ldots,X_{k-1}).H(X_k;P)$. (38)

This derivative is illustrated in the graph of Figure 4a. It implements the chain rule of entropy, $H_{k+1} - H_k = (X_1,\ldots,\hat{X_i},\ldots,X_{k+1}).H(X_i)$ (Equation (12)), and in homology provides a diagram where conditional entropy is a simplicial coface map, $(X_1,\ldots,\hat{X_i},\ldots,X_{k+1}).H(X_i) = d^i: X_k\to X_{k+1}$, as a simplicial special case of the Hochschild coboundaries (Section 2.3).

Figure 4.

Entropy and information paths. Illustration of an entropy path $HP_i = (0\to X_1\to X_4\to X_2\to X_3)$ (a) and of a mutual-information path $IP_i = (0\to X_1\to X_4\to X_3\to X_2)$ (b) for $n=4$ random variables (see text).

First derivative of a mutual-information path: the first derivative of an information path $IP_i(k)$ is minus the conditional mutual information, $-X_k.I(X_1;\ldots;X_{k-1};P)$:

$\frac{dIP_i(k)}{dk} = I(X_1;\ldots;X_k;P) - I(X_1;\ldots;X_{k-1};P) = -X_k.I(X_1;\ldots;X_{k-1};P)$. (39)

This derivative is illustrated in the graph of Figure 4b. It implements the chain rule of mutual information, $I_{k-1} - I_k = X_k.I_{k-1}$ (Equation (13)), and in homology provides a diagram where minus the conditional mutual information is a simplicial coface map, $-X_i.I(X_1;\ldots;\hat{X_i};\ldots;X_{k+1}) = d^i: X_k\to X_{k+1}$, introduced in Section 2.3.

Bounds of the derivatives and information inequalities

The slope of entropy paths is bounded by the usual conditional entropy bounds ([92], pp. 27-28). Its minimum is 0 and is achieved when $X_{k+1}$ is a deterministic function of $(X_1,\ldots,X_k)$ (lower dashed red line in Figure 4a). Its global upper bound is $\max H_{k+1} = k.\ln(N_1\cdots N_{k+1})$ (with $k$ here the multiplicative constant of Equation (1)) and its sharp bound, given by $(X_1,\ldots,\hat{X_i},\ldots,X_{k+1}).H(X_i)\leq H(X_i)$, is achieved when $X_{k+1}$ is independent of $(X_1,\ldots,X_k)$ (we then have $H_{k+1} = H_k + H(X_{k+1})$; higher dashed red line in Figure 4a). Hence, any entropy path lies in the (convex) entropy cone defined by the three points labeled $H_k$, $\min H_{k+1}$ and $\max H_{k+1}$: the three vertices of the cone depicted as a red surface in Figure 4a, called the Shannonian cone following Yeung's seminal work [132]. The behavior of a mutual-information path and the bounds on its slope are richer and more complex than for the preceding conditional entropy:

  • For $k=2$, the conditional information is the conditional entropy $X_i.I(X_j)=X_i.H(X_j)$ and has the same usual bounds $0\le X_i.I(X_j)\le I(X_j)$.

  • For $k=3$, the conditional mutual information $X_i.I(X_j;X_h)$ is always positive or null, $X_i.I(X_j;X_h)\ge 0$, and hence $I_2\ge I_3$ ([92], p. 26, the opposite of Theorem 2.40, p. 30), whereas the upper limit is given by $X_i.I(X_j;X_h)\le\min(X_i.H(X_j),X_i.H(X_h))$ ([34] th. 2.17), with equality iff $X_j$ and $X_h$ are conditionally independent given $X_i$. This positivity implies that the slope from $k=2$ to $k=3$ is never positive: information paths decrease (or stay constant) over this step in the $I_k$ landscape.

  • For $k>3$, $X_k.I(X_1;\ldots;X_{k-1})$ can be negative as a consequence of the preceding inequalities. In terms of the information landscape, this negativity means that the slope is positive, hence that the information path has crossed a critical point, a minimum. As expressed by the theorem due to Matsuda [34], $X_k.I(X_1;\ldots;X_{k-1})<0$ iff $I_{k-1}<I_k$. The minima correspond to zeros of conditional information (conditional independence) and hence detect cocycles in the data. The results on information inequalities define as "Shannonian" [133,134,135] the set of inequalities that are obtained from conditional information positivity ($X_i.I(X_j;X_h)\ge 0$) by linear combination, which forms a convex "positive" cone after closure. "Non-Shannonian" inequalities could also be exhibited [133,134], hence defining a new convex cone that includes and is strictly larger than the Shannonian one. Following Yeung's nomenclature and to underline the relation with his work, we call the positive conditional mutual-information cone (the surface colored in red in Figure 4b) the "Shannonian" cone and the negative conditional mutual-information cone (the surface colored in blue in Figure 4b) the "non-Shannonian" cone.
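A standard example of such negativity (a sketch using a hypothetical XOR triple, not data from the paper) is given by two independent uniform bits and their exclusive-or: every pair is independent ($I_2=0$) while $I_3=-1$ bit, placing the triple in the negative, synergistic region:

```python
import itertools
import math

# Two independent fair bits X1, X2 and X3 = X1 XOR X2 (hypothetical example).
p = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in itertools.product([0, 1], repeat=2)}

def H(subset):
    """Shannon entropy (bits) of the marginal on the given variable indices."""
    marg = {}
    for s, q in p.items():
        key = tuple(s[i] for i in subset)
        marg[key] = marg.get(key, 0.0) + q
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)

def I(subset):
    """k-mutual information I_k as the alternating sum of marginal entropies."""
    return sum((-1) ** (r + 1) * sum(H(t) for t in itertools.combinations(subset, r))
               for r in range(1, len(subset) + 1))

I2 = I((0, 1))     # 0: any pair of the three variables is independent
I3 = I((0, 1, 2))  # -1 bit: synergistic negativity
```

Here the path $I_1=1$, $I_2=0$, $I_3=-1$ keeps decreasing, consistent with the fact that negativity of the conditional information itself can only appear at higher dimensions.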

3.4.3. Information Paths Are Random Processes: Topological Second Law of Thermodynamics and Entropy Rate

Here we present the dynamical aspects of information structures. Information paths directly provide the standard definition of a stochastic process and impose how the time arrow appears in the homological framework, how time series can be analyzed, how entropy rates can be defined, and so forth.

Definition 5.

Random (stochastic) process ([136]): A random process $\{X_t, t\in T\}$ is a collection of random variables on the same probability space $(\Omega,\mathcal{B},P)$, where the index set $T$ is a totally ordered set.

A stochastic process is a collection of random variables indexed by time—the probabilistic version of a time series. We have the following lemma:

Lemma 1.

(Stochastic process and information paths): Let $(\Omega,\Delta^k,P)$ be a simplicial information structure; then the set of entropy paths $H_{P_k}$ and of mutual-information paths $I_{P_k}$ are in one-to-one correspondence with the set of stochastic processes $\{X_t, t\in T, |T|=k\}$.

Proof. 

Considering each symbol of a time series as a random variable, the definition of a stochastic process corresponds to the unique information paths $H_{P_i}$ and $I_{P_i}$ whose total order is the time order of the series. More formally, we can prove that the number of different total orders on a finite set $T=\{1,2,\ldots,k\}$ with $k$ elements is $k!$, such that we can establish a one-to-one correspondence of total orders on $T$ with permutations of $T$ and, by Theorem 2, with entropy and information paths. Let $T$ be a finite set with $k$ elements and $\le$ a total order relation on $T$. If $k=1$, then $T$ contains only one element and any relation is trivially a total order on $T$. Now consider $k>1$ and $T=\{t_1,t_2,\ldots,t_k\}$, and let $x\in T$. By the definition of a total order (Definition 9, p. 146, Bourbaki [137]), since $\le$ is a total order on $T$, for any $y\in T$ with $x\ne y$ we have $x<y$ or $y<x$. If $x<y$ for all $y\in T$, then we define $x$ to be the minimal element of $T$. Otherwise, there exists $y\in T$ such that $y<x$, and this process can be repeated until a minimal element is obtained. Without loss of generality, consider that $t_1$ is this minimal element. We then take the set $T\setminus\{t_1\}$ and repeat the preceding reasoning to find a minimal element $t_2$ of $T\setminus\{t_1\}$; by recurrence we obtain $t_1<t_2<\cdots<t_k$. Hence each total order on $T$ determines a permutation of its elements, and there are $k!$ total orders on $T$. □

In other words, these paths are the automorphisms of $\{1,2,\ldots,k\}=[k]$. We immediately obtain a topological version of the second law of thermodynamics, which follows from an elementary convexity property and improves the result of Cover [36]:
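The counting argument in the proof can be checked mechanically for small $k$ (a trivial sketch; the value of `k` is arbitrary):

```python
import itertools
import math

# Each entropy/information path corresponds to one total order on [k],
# i.e. to one permutation of {1, ..., k}; there are k! of them.
k = 4
paths = list(itertools.permutations(range(1, k + 1)))
n_paths = len(paths)  # k! = 24 for k = 4
```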

Theorem 3.

(Stochastic process and information paths): Let $(\Omega,\Delta^k,P)$ be a simplicial information structure and let $X_T=\{X_t, t\in T\}$ be a stochastic process defined by a collection of random variables on the same probability space $(\Omega,\mathcal{B},P)$ with cardinality $|X_T|=t$, where the index set $T$ is a totally ordered set. Then the entropy $H(X_T;P)$, where $X_T$ is the joint random variable of the variables in $X_T$, can only increase or stay constant with $t$.

Proof. 

Given the correspondence we just established in Lemma 1, the statement is equivalent to $H(X_1,\ldots,X_k)\ge H(X_1,\ldots,X_{k-1})$, or $H(X_1,\ldots,X_k)-H(X_1,\ldots,X_{k-1})\ge 0$, which by the chain rule of information (12) with $k=1/\ln 2$ gives $(X_1,\ldots,X_{k-1}).H(X_k)\ge 0$. It is hence sufficient to prove the non-negativity of conditional entropy, which can be found in Yeung ([92], p. 27). Consider the definition of conditional entropy given in Equation (8), which we write for simplicity:

$$H(X_k|(X_1,\ldots,X_{k-1});P)=k\sum_{(x_1,\ldots,x_{k-1})\in N_1\times\cdots\times N_{k-1}}p(x_1,\ldots,x_{k-1}).H(X_k|(X_1,\ldots,X_{k-1})=(x_1,\ldots,x_{k-1})), \quad (40)$$

with $(x_1,\ldots,x_{k-1})\in(X_1\times\cdots\times X_{k-1})$. It is hence sufficient to show that $H(X_k|(X_1,\ldots,X_{k-1})=(x_1,\ldots,x_{k-1}))\ge 0$, which follows from the fact that for $0\le p_{(x_1,\ldots,x_{k-1})}(x_k)\le 1$ we have $k\ln(1/p_{(x_1,\ldots,x_{k-1})}(x_k))\ge 0$ with $k=1/\ln 2$, since the logarithm is non-negative on $[1,\infty)$. The generalization with respect to the stationary Markov condition on $X_T$ used by Cover comes from the remark that in any case the indexing set of the variables is a total order. □

Remark 1.

The equality $H(X_1,\ldots,X_k)=H(X_1,\ldots,X_{k-1})$ corresponds to a statistical independence condition $I_2((X_1,\ldots,X_{k-1});X_k)=0$ and to an equilibrium condition. Note that the homological formalism imposes an "initial" minimally low entropy state $H(0)=I(0)=0$, which is a usual assumption in physics and corresponds to the constant, zero-degree homology, which must have at least one component for the cohomology to be defined. The meaning of this theorem in common terms was summarized by Gabor and Brillouin: "you cannot have something for nothing, not even an observation" [138]. This increase in entropy is illustrated in Figure 5a. The usual stochastic approach to time series assumes a Markov chain structure, imposing peculiar statistical dependences that restrict memory effects (cf. the associated paper [2], Proposition 8). The consideration of stochastic processes without restriction allows any kind of dependences and arbitrarily long, historical and "non-trivial" memory. From the biological point of view, it formalizes the phenomenon of arbitrarily long-lasting memory. From the physical point of view, without proof, such a framework appears as a classical analog of the consistent or decoherent histories developed notably by Griffiths [139], Omnès [140] and Gell-Mann and Hartle [141]. The information structures impose a stronger constraint of a totally ordered set (or more generally a weak ordering) than the preorder imposed by Lieb and Yngvason [142] to derive the second law.

Figure 5.

Figure 5

Example of mean entropy and information paths of gene expression. (a) Mean entropy path $\langle H_k\rangle$ for the 21 genes of interest, for population A (green line) and population B neurons (red line). (b) Mean information path $\langle I_k\rangle$ for the same pool of genes. (c) Mean information path $\langle I_k\rangle$ for the remaining 20 genes ("non-relevant"). The undersampling dimension introduced in the associated paper [2] is depicted with arrows.

It is also interesting to note that even in this classical probability framework, the entropy cone (the topological cone depicted in Figure 4a) imposed by information inequalities, when considered with this time ordering, is a time-like cone (much like the special relativity cone) but with the arguably remarkable fact that we did not introduce any metric.

The stochastic process definition allows definition of the finite and asymptotic information rate:

Definition 6.

Information rate: the finite information rate $r$ of an information path $H_{P_i}$ is $r=\frac{H_k}{k}$.

The asymptotic information rate $r_\infty$ of an information path $H_{P_i}$ is $r_\infty=\lim_{k\to\infty}\frac{H_k}{k}$. It requires the generalization of the present formalism to the infinite-dimensional setting or to infinite information structures, which is not trivial and will be investigated in further work. We also leave the expression of the first principle in information cohomology as an open problem. Question: recently, Baez and Fong published a Noether theorem for Markov processes [39]; can we derive a Noether theorem for discrete random processes in general, that is, for all the symmetric groups $S_n$, using the present construction? Such a theorem would provide the topological expression of the first law of thermodynamics. This question was asked by Neuenschwander [40]; related to this aim, Mansfield gave a Noether theorem for finite elements [38].
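For instance (a minimal sketch with hypothetical i.i.d. fair coin flips, where the finite rate is already constant), the rate $r=H_k/k$ equals 1 bit per variable for every $k$, which is also its asymptotic value:

```python
import itertools
import math

def joint_entropy_iid_fair_bits(k):
    """Joint entropy (bits) of k i.i.d. uniform binary variables: the joint
    distribution is uniform over 2**k outcomes, so H_k = k bits."""
    p = 1.0 / 2 ** k
    return -sum(p * math.log2(p) for _ in itertools.product([0, 1], repeat=k))

# Finite information rates r = H_k / k for k = 1..5: all equal to 1.0 bit.
rates = [joint_entropy_iid_fair_bits(k) / k for k in range(1, 6)]
```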

3.4.4. Local Minima and Critical Dimension

The derivative of information paths allows establishment of the lemma on which information path analysis is based. A critical point is said to be non-trivial if at this point the sign of the derivative of the path (i.e., the conditional information) changes.

Lemma 2.

Local minima of information paths: if $X_k.I(X_1;\ldots;X_{k-1})<0$, then all paths from 0 to $I_k$ passing through $I_{k-1}$ have at least one local minimum. In order for an information path to have a non-trivial critical point, it is necessary that $k>3$, the smallest possible dimension of a critical point being $k=3$.

Proof. 

It is a direct consequence of the definitions of paths and of the positivity of the conditional 2-mutual information $X_k.I_2$ ($X_k.I_2\ge 0$, cf. Theorem 3.4.2.2 [92]). □

Note that, by definition, a local minimum can be a global minimum. If it exists, we will call the dimension $k$ of the first local minimum of an information path the first informational critical dimension of the information path $I_{P_i}$ and denote it $k_i^1$. This allows us to define maximal information paths:

Definition 7.

Positive information path: A positive information path is an information path from 0 to a given $I_k$, corresponding to a given $k$-tuple of variables, such that $I_k<I_{k-1}<\cdots<I_1$.

Definition 8.

Maximal positive information path: A maximal positive information path is a positive information path of maximal length. More formally, a maximal positive information path is a positive information path that is not a proper subset of another positive information path.

The definitions make positive information paths and maximal positive information paths coincide with chains (faces) and maximal chains (facets), respectively. The maximal positive information path stops at the first local minimum of an information path, if it exists. The first informational critical dimension $k_i^1$ of a time series $I_{P_i}$, whenever it exists, gives a quantification of the duration of the memory of the system.
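The first informational critical dimension can be read off a path numerically; a minimal sketch (the helper name and the sample path values are hypothetical):

```python
def first_critical_dimension(path):
    """path: the values [I_1, I_2, ..., I_n] along one information path.
    Returns the dimension k (1-based) of the first local minimum, i.e. the
    first point after which the slope turns positive (conditional information
    turns negative), or None if the path never stops decreasing."""
    for k in range(1, len(path)):
        if path[k] > path[k - 1]:  # slope turned positive past dimension k
            return k               # the minimum sits at dimension k
    return None

# A hypothetical path with its first minimum at k = 3, the smallest possible
# critical dimension by Lemma 2; the maximal positive information path is the
# decreasing prefix I_1 > I_2 > I_3.
example = [2.0, 1.1, 0.4, 0.7, 0.9]
k_crit = first_critical_dimension(example)  # 3
```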

3.4.5. Sum over Paths and Mean Information Path

As previously, for $k=1$, $I_{P_i}(1)$ can be identified with the self-internal energy and, for $k\ge 2$, $I_{P_i}(k)$ corresponds to the $k$-free-energy of a single path $I_{P_i}$. The chain rule of mutual information (Equation (15)) and the derivative of an $I_{P_i}$ path (Equation (39)) imply that the $k$-free-energy can be obtained from a single path:

$$I_k=I(X_1;\ldots;X_k;P)=I(X_1)-\sum_{i=2}^{k}X_i.I(X_1;\ldots;X_{i-1})=I_{P_i}(1)+\sum_{j=2}^{k}\frac{dI_{P_i}(j)}{dj}. \quad (41)$$

Hence, the global thermodynamical relation (25) can be understood as a sum over all paths, a sum over informational histories: the classical, discrete and informational version of the path integrals of statistical physics [143]. Indeed, considering an inverse relation between time and dimension, $t=1/n$, in the probability expression (32) for i.i.d. processes gives the usual expression of a unitary evolution operator $p_k=e^{t.E_{ij}.nk}$. The free information energy integrates over the simplicial structure of the whole lattice of partitions over degrees $k\ge 2$, which further justifies its name of free energy.

In order to obtain a single state function instead of a group of $k!$ path functions, we can compute the mean behavior of the information structure, which is achieved by defining the mean $H_k$ and $I_k$, denoted $\langle H_k\rangle$ and $\langle I_k\rangle$:

$$\langle H_k\rangle=\frac{\sum_{T\subset[n];\,\mathrm{card}(T)=k}H_k(X_T;P)}{\binom{n}{k}}, \quad (42)$$

and

$$\langle I_k\rangle=\frac{\sum_{T\subset[n];\,\mathrm{card}(T)=k}I_k(X_T;P)}{\binom{n}{k}}. \quad (43)$$

For example, considering $n=3$, then $\langle I_2\rangle=\frac{I(X_1;X_2)+I(X_1;X_3)+I(X_2;X_3)}{3}$. This defines the mean mutual-information path and the mean entropy path, denoted $\langle H_P\rangle(k)$ and $\langle I_P\rangle(k)$, in the information landscape. The case $k=2$ of these functions, introduced in Reference [108], is studied in Merkh and Montúfar [144], with a characterization of the degeneracy of their maxima; they are called factorized mutual information. As previously, $\langle I_P\rangle(1)$ can be identified with the mean self-internal energy $U(X^n_{hom};P_N)$ and, for $k>1$, $\langle I_P\rangle(k)$ with the mean $k$-free-information-energy $G(X^n_{hom};P_N)$, giving the usual isotherm relation:

$$H(X^n_{hom};P_N)=U(X^n_{hom};P_N)-G(X^n_{hom};P_N). \quad (44)$$
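These relations can be checked numerically; the sketch below (arbitrary joint distribution, hypothetical helper names) computes the mean of Equation (43) with the binomial normalization and verifies that the total correlation $G=\sum_i H(X_i)-H_n$ equals the alternating sum of the $I_k$ components over degrees $k\ge 2$, so that $H_n=U-G$ as in Equation (44):

```python
import itertools
import math
import random

random.seed(1)
n = 3
states = list(itertools.product([0, 1], repeat=n))
weights = [random.random() for _ in states]
total = sum(weights)
p = {s: w / total for s, w in zip(states, weights)}

def H(subset):
    """Shannon entropy (bits) of the marginal on the given variable indices."""
    marg = {}
    for s, q in p.items():
        key = tuple(s[i] for i in subset)
        marg[key] = marg.get(key, 0.0) + q
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)

def I(subset):
    """k-mutual information I_k as the alternating sum of marginal entropies."""
    return sum((-1) ** (r + 1) * sum(H(t) for t in itertools.combinations(subset, r))
               for r in range(1, len(subset) + 1))

def mean_I(k):
    """<I_k>, Equation (43): average over the (n choose k) k-subsets."""
    return sum(I(t) for t in itertools.combinations(range(n), k)) / math.comb(n, k)

U = sum(H((i,)) for i in range(n))  # self-internal energy: sum of the I_1
Hn = H(tuple(range(n)))             # joint entropy H_n
G_direct = U - Hn                   # total correlation (free energy)
# The same G as the alternating sum of I_k contributions over degrees k >= 2:
G_alt = sum((-1) ** k * sum(I(t) for t in itertools.combinations(range(n), k))
            for k in range(2, n + 1))
```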

The computation of the mean paths corresponds to an idealized information structure $X^n_{hom}$ for which all the variables would be identically distributed, would have the same entropy and would share the same mutual information $I_k$ at each dimension $k$: a homogeneous information structure, with homogeneous high-dimensional $k$-body interactions. As usually achieved in physics, notably in mean-field theory (e.g., Weiss [145] or Hartree), it aims to provide a single function summarizing the average behavior of the system (we will see that in practice it misses the important biological structures, pointing out the constitutive heterogeneity of biological systems; see Section 4.1.3). Using the same dataset and results presented in References [2,3,108], the $\langle I_P\rangle(k)$ paths estimated on a gene expression dataset are shown for two populations of neurons (A and B) in Figure 5. We quantified the expression levels of 41 genes in two populations of cells (A or B), as presented in References [2,3,108]. We estimated the $H_k$ and $I_k$ landscapes for these two populations and for two sets of genes ("genes of interest" and "non-relevant") according to the computational and estimation methods presented in References [2,3,108]. The available computational power restricted the analysis to a maximum of $n=21$ variables (or 21 dimensions) and forced us to divide the genes between the two classes "genes of interest" and "non-relevant". The 21 genes of interest were selected from the 41 quantified genes according to their known specific involvement in the function of population A cells.

Figure 5 exhibits the critical phenomena usually encountered in condensed matter physics, as in the example of van der Waals interactions [146]. Like any $I_k$ path, $\langle I_P\rangle(k)$ can have a first minimum with a critical dimension $k_i^1$ that could be called the homogeneous critical dimension. For the 21 genes of interest (whose expression levels, given the literature, are expected to be linked in these cell types), the $\langle I_k\rangle$ path exhibited a clear minimum at the critical dimension $k_i^1=4$ for population A neurons and $k_i^1=5$ for population B neurons, reproducing the usual free-energy potential in the condensed phase, for which $n$-body interactions are non-negligible. For the 20 other genes, less expected to be related in these cell types, the $\langle I_k\rangle$ path exhibited a monotonic decrease without a non-trivial minimum, which corresponds to the usual free-energy potential in the uncondensed, disordered phase, for which the $n$-body interactions are negligible. Indeed, as shown in the work of Xie and colleagues [147], the tensor-network renormalization approach to $n$-body interacting quantum systems gives rise to an expression of the free energy as a function of the dimension of the interactions, in the same way as achieved here.

3.4.6. Minimum Free Energy Complex

The analysis of information paths that we now propose aims to determine all the first critical points of information paths, in other words, to determine all the information paths for which conditional information stays positive and all first local minima of the information landscape that can also be interpreted as a conditional independence criterion. Such an exhaustive characterization would give a good description of the landscape and of the complexity of the measured system. The qualitative reason for considering only the first extrema for the data analysis is that beyond that point, mutual information diverges (as explained in Section 3.4.4) and the maximal positive information paths correspond to stable functional modules in the application to data (gene expression).

A more mathematical justification is that they define the facets of a complex in our simplicial structure, which we will call the minimum energy complex of our information structure, underlining that this complex is the formalization of the minimum free-energy principle in a degenerate case.

We now obtain the theorem that our information path analysis aims to characterize empirically:

Theorem 4.

(Minimum free energy complex): the set of all maximal positive information paths forms a simplicial complex that we call the minimum free energy complex. Moreover, the dimension-degree of the minimum free energy complex is the maximum of all the first informational critical dimensions ($d=\max_i k_i^1$), if it exists, or else the dimension $n$ of the whole simplicial structure. The minimum free energy complex is denoted $X^{+d}$. A necessary condition for this complex not to be a simplex is that its dimension is greater than or equal to four ($d\ge 4$).

Proof. 

It is known that there is a one-to-one correspondence between simplicial complexes and their set of maximal chains (facets) (see Reference [148], p. 95, for an example). The last part follows from Lemma 2. □

In simple words, the maximal faces (i.e., the maximal positive information paths) encode all the structure of the minimum free energy complex. Figure 6 illustrates one of the simplest examples of a minimum free energy complex that is not a simplex, of dimension four in a five-dimensional simplicial structure of information $\Delta^5$.
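Since a complex is determined by its facets, its faces and Euler characteristic can be enumerated directly; a minimal sketch (the toy facets below are hypothetical, not those of Figure 6):

```python
import itertools

def faces_from_facets(facets):
    """All non-empty faces of the simplicial complex generated by the facets
    (here, the maximal positive information paths of Theorem 4)."""
    faces = set()
    for facet in facets:
        facet = frozenset(facet)
        for r in range(1, len(facet) + 1):
            faces.update(frozenset(c) for c in itertools.combinations(facet, r))
    return faces

def euler_characteristic(faces):
    """chi = sum over faces of (-1)^dim, with dim = card(face) - 1."""
    return sum((-1) ** (len(f) - 1) for f in faces)

# Toy complex: two triangles glued along an edge.
faces = faces_from_facets([{1, 2, 3}, {2, 3, 4}])
chi = euler_characteristic(faces)  # 4 vertices - 5 edges + 2 triangles = 1
```

The same enumeration, fed with the facets of a minimum free energy complex, yields the alternating counts used in the characteristic of Equation (45).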

Figure 6.

Figure 6

Example of maximal $I_k$ paths in an $I_k$ landscape for $n=5$, together with its corresponding minimum free energy complex. (a) Maximal $I_k$ paths in an $I_k$ landscape for $n=5$. The maximal positive information paths are depicted in red; for example, the paths 1234 but also 4321, 345 and 125 are maximal positive information paths (i.e., facets/maximal chains). The facet 1234 is a 3-simplex while 345 is a 2-simplex with critical dimension $k^1_{345}=3$. The usual dimension of the simplex is used here, but we could have augmented it by one, since we added the constant element "0" to the algebra (pointed space), such that the usual simplicial dimension and the critical dimension correspond. The maximal critical dimension of the positive information paths is the dimension of the complex, hence $d(X^{+k})=d(1234)=4$. (b) The minimum free energy complex corresponding to the preceding maximal $I_k$ paths. It is a subcomplex of the 4-simplex, also called the 5-cell, with only one four-dimensional cell among the five, depicted as the bottom tetrahedron {1234} with darker red volume. It has 5 vertices, 10 edges, 10 2-faces and 1 3-face (cell); hence its Euler characteristic is $\chi(X^{+k})=5-10+10-1=4$ and its minimum free energy characteristic is: $H^{+k}(X^{+k})=\sum_{X_i\in X^{+k}}^{5}I(X_i)-\sum_{(X_i;X_j)\in X^{+k}}^{10}I(X_i;X_j)+\sum_{(X_i;X_j;X_h)\in X^{+k}}^{10}I(X_i;X_j;X_h)-I(X_1;X_2;X_3;X_4)$.

We define the minimum free energy characteristic as:

$$H^{+k}(X^{+k};P)=\sum_{i=1}^{k}(-1)^{i-1}\sum_{I\subset X^{+};\,\mathrm{card}(I)=i}I_i(X_I;P), \quad (45)$$

where the component with dimension higher than one is a free energy. In the example of Figure 6, it gives:

$$H^{+k}(X^{+k})=\sum_{X_i\in X^{+k}}^{5}I(X_i)-\sum_{(X_i;X_j)\in X^{+k}}^{10}I(X_i;X_j)+\sum_{(X_i;X_j;X_h)\in X^{+k}}^{10}I(X_i;X_j;X_h)-I(X_1;X_2;X_3;X_4). \quad (46)$$

We propose that this complex defines a complex system:

Definition 9.

Complex system: A complex system is a minimum free energy complex.

It has the merit of providing a formal definition of complex systems as simple as the definition of an abstract simplicial complex and of being quite consensual with respect to some of the approaches in this domain, as reviewed by Newman [149]. Notably, it provides a formal basis for defining some of the important concepts in complex systems: emergence being the coboundary map, imergence the boundary map, synergy being information negativity, organization scales being the ranks of random-variable lattices, a collective interaction being a local minimum of free energy, diversity being the multiplicity of these minima quantified by the number of facets, a network being a 1-complex and a network of networks being a 1-complex in hyper-cohomology.

The interpretation in terms of sums over paths in the complex is direct, as it sums over paths until conditional independence. We called it the minimum free energy complex, but we could instead have called it the positive or instantaneous complex, because its facets appear as the boundaries of the "present" structure while it obviously contains all the past history and the memory of the structure (notably encoded in the negative $I_k$, which are necessarily non-Markovian). The topological formalization of the minimum energy allows the coexistence of numerous local minima, a situation usually encountered in complex systems (slow aging) such as frustrated glasses and K-SAT problems [4,150], whose settings correspond here to the case of $n$ binary random variables, $N_1=\cdots=N_n=2$. The existence of the frustration effect, due to the multiplicity of these local minima in the free energy landscape [151], has also been one of the main difficulties of condensed matter theory. Matsuda showed that $I_k$ negativity is a signature of frustration [34]. The first axioms of DFT consider that the probability densities of $n$ elementary bodies each live in a 3-dimensional space [117,118], defining a whole simplicial structure of dimension $3n$, commonly called the configuration space. When considered with the physical axiom of a configuration space, Theorem 4 implies that, while the minimum free energy complex of an elementary body can only be a simplex, the configuration space of $n$ elementary bodies can be a complex with (quite) arbitrary topology. In simple terms, this sets the elementary components of the configuration space as 3-simplices, whose composition can give arbitrarily complicated $k$-complexes. This idea is in resonance with the triangulations of space-time that arose notably from the work of Wheeler [119] and Penrose [12], like spin foams [152] and causal sets [153], although here we only considered classical probabilities.

4. Discussion

4.1. Statistical Physics

4.1.1. Statistical Physics without Statistical Limit? Complexity through Finite Dimensional Non-Extensivity

The measurement of entropy and information rates on data (the evolution of the entropy $H_k$ when the number of variables $k$ increases) has a long history. Originally, in the work of Strong and colleagues [154] and as usual in information theory and statistical physics, it was considered that the "true" entropy was given in the asymptotic limit $\lim_{n\to\infty}H_n$ under stationarity or stronger assumptions. As explained in Section 2.1 (see also the note of Kontsevich [76] in the work of Baez, Fritz and Leinster [81]) and extensively in the statistical physics works of Niven [155,156,157], entropy does not need asymptotic or infinite assumptions such as the Stirling approximation to be derived. Rather, entropy appears without approximations in a discrete formalism equivalent to Galois cohomology [103], as a first cohomology class that has to be completed by invariants in the higher degrees. Here and in the associated paper [2], we have tried to understand, explore and exploit this observation. Rather than being interested in the asymptotic limit (the infinite-dimensional case) and absolute values of information, the present analysis focuses on the finite version of the "slow approach of the entropy to its extensive asymptotic limit" that Grassberger [158] as well as Bialek, Nemenman and Tishby proposed to be "a sign of complexity" [159], "complexity through non-extensivity" (see also Tsallis [160]). In short, we consider the non-extensivity of information before considering its asymptotic limit. Considering a statistical physics without statistical limit could be pertinent for the study of "small" systems, which include biological systems. Their small size allows them to harness thermal fluctuations and imposes their investigation with out-of-equilibrium methods, as exposed in the work of Ritort and colleagues, reviewed in Reference [161]. Finiteness and discreteness could be an essential property of the physics of complex living systems.
The $H_k$ and $I_k$ landscapes presented here give a detailed expression of the "signs of complexity" and non-extensivity for such small-size systems (finite dimension $k$) and give a finite-dimensional geometric view of the "slow approach of the entropy to its extensive asymptotic limit". In a sense, what replaces the large-number limits, Avogadro-number considerations and so forth here is the combinatorial explosion of the different possible interactions: in the same way as in the van der Waals paradigm, a combinatorial number of weak interactions can lead to a strong global interaction. Among all possible data structures, one is universal: data and empirical measures are discrete and finite, as emphasized by Born [16]. This fully justifies the cohomological approach used here, originating in topos theory (designed by Grothendieck to hold the discrete and the continuous in a single hand [162]), which was originally constructed to handle Lie and Galois theory, as well as continuous and discrete symmetries, in a common framework. Notably, the principle originally enunciated by Willard Gibbs, which considers that a phase transition is a singularity in thermodynamic behavior and occurs only in infinite systems, is physically false: none of the observed systems are infinite and many of them, even small, present phase transitions [35]. Cohomological methods allow the Gibbs transition principle to be corrected with respect to experience while leaving the thermodynamic theory untouched.

4.1.2. Naive Estimations Let the Data Speak

One of the striking results of the data analysis, as presented here and in the associated paper [2], concerns the relatively low sample size ($m=41$ and $m=111$ for the analyses with cells as variables and with genes as variables, respectively) required to obtain satisfying results in relatively high dimensions ($k=10$ and $k=6$, respectively). Satisfying means here that the results predict findings already reported in the biological literature, or agree with experts' labels. In Reference [97], Nemenman and colleagues, who developed the study of the sampling problem, state in the introduction that "entropy may be estimated reliably even when inferences about details of the underlying probability distribution are impossible. Thus the direct estimation of dependencies has a chance even for undersampled problems" and conclude that "a major advantage of our definition of statistical dependencies in terms of the MaxEnt approximations is that it can be applied even when the underlying distributions are undersampled". The present analysis agrees with and confirms their conclusion. The methods applied here are quite elementary. They make no assumption of an expected or true distribution, of a maximum entropy distribution or pairwise-interaction Hamiltonian, coupling constant or metric, of stationarity, ergodicity or i.i.d. processes, Markov chains, or an underlying network structure, or any other prior that would speak in place of the data. They just consider numerical empirical probabilities as expressed by Kolmogorov's axioms ([163], chap. 1), which he called the "generalized fields of probability" because they do not assume the sixth axiom of continuity. Rather than fixing a model with priors, the present formalism allows the raw data to freely impose their specific structure on the model, which is usually called the naive approach or naive estimation. If one accepts that a frequentist theory and interpretation of probability is mathematically valid ([163], chap. 1), one may then conclude that a frequentist theory of entropy and information may also hold and, moreover, directly fulfills the usual requirement of observability in theoretical physics recalled by Born in his Nobel lecture [16]. This frequentist elementary consideration is not mathematically trivial, notably when considered from the number-theoretic point of view. For example, the combinatorics of the integer partitions of $m$ could be investigated in the general information structure (partition) context, which to our knowledge has not been achieved in the context of probability and information.

4.1.3. Discrete Informational Analog of Renormalization Methods: No Mean-Field Assumptions Let the Objects Differentiate

To the best of our knowledge, all previous studies that tried to quantify statistical dependences using information methods with more than three variables used total correlation [97,164] and crucially assumed that the interactions between the variables are homogeneous, which corresponds to the usual mean-field assumption and to the identically distributed case of mean information (Section 3.4.5) presented here. The proposed combinatorial decomposition allows heterogeneous classes within the set of variables to be identified [2,3], which would not have been possible under homogeneity assumptions, just as renormalization methods allowed the failures of mean-field models to be overcome [41].

4.1.4. Combinatorial, Infinite, Continuous and Quantum Generalizations

In place of the usual assumptions of statistical physics, and in order to compute information structures effectively on data, we had to concede a severe restriction to the simplest simplicial combinatorics, leaving the whole multinomial combinatorics as a potential and relevant extension of the present data analysis. It is likely (and left as an open question) that the generalization of the binomial combinatorics presented here to the multinomial case will allow extension of the theorem obtained in the associated paper, which shows that informations provide coordinates on the probability simplex for binary variables (Theorem 3 in Reference [2]). Even larger extensions have already been achieved theoretically, notably by Vigneaux, who developed q-multinomial deformations of this combinatorics associated with Tsallis entropies [19,165,166]. Moreover, as shown in Vigneaux's PhD thesis, this finite and discrete elementary setting extends nicely to the infinite and continuous case [19] within cohomology theory. If those structures currently appear gigantic and out of computational reach, one should expect that future generations of computing devices will allow us to investigate them. On the side of the generalization of information cohomology to quantum information [1,19] (see also Maniero [21]), Adami and Cerf showed that quantum conditional entropies and $I_2$ can be negative and that this happens precisely for entangled systems according to Bell inequalities [167,168]. The framework of Adami and Cerf provides a natural way to generalize the present classical $I_k$ structures to the quantum case and allows the interpretation that classical $I_k$ negativity detects classical entanglement-like relations.
According to their results, Bell inequalities impose that the "non-Shannonian" (quantum) cone in information landscapes should appear below the third dimension: in contrast to what we presented for the classical minimum energy complex in Section 3.4.6, quantum entanglement should allow non-simplex quantum minimum free energy complexes to appear in dimension below 3, a promising topological insight into the possible non-locality of the configuration space.

4.2. Data Science

4.2.1. Topological Data Analysis

Thanks to the recent success of topological data analysis, a recurrent question has concerned the relation of information cohomology with persistent homology. The answer is not easy, since both appeared independently and are quite different in formalism, algorithms and results on datasets. It is possible to provide an intuitive, non-rigorous interpretation of the present work in persistence terms. Most persistence methods consist of approximating the birth and death of the Betti numbers of the Čech complex obtained by considering balls around each data point while the radius of the balls grows. The Čech complex is given by the intersections of the balls and, for combinatorial computational reasons, most algorithms restrict to pairwise intersections, giving the Vietoris–Rips complex as an approximation of the Čech complex. Our method focuses on the intersection of random variables rather than of balls around data points: a theorem of Hu Kuo Ting [91] (Theorem 1 in Reference [2]) shows the equivalence of mutual-information functions with set-theoretic finite measurable functions endowed with the intersection operator, formalizing the usual naive Venn diagram interpretation of mutual information. Hence, leaving aside mathematical rigor to allow an interpretation of the current algorithm in the common language of persistence, I compute here an information-theoretic analog of the Čech complex, and it is not excluded that this analogy can be made formal, notably by establishing some nerve theorem for information structures (see Oudot for a review [169]). It turns out that a "zero information intersection" is exactly equivalent to statistical independence (Theorem 2 in Reference [2]).
Then, the balls can be viewed as a local estimation of the probability density under a metric space assumption, while computing the persistence is roughly analogous to varying the graining of the probability estimation, as achieved in Section 5.6 of Reference [2] and in the maximal information coefficient method proposed by Reshef et al. [116]. Hence, in regard to current topological data analysis methods, the methods presented here provide an intrinsically probabilistic cohomological framework: the differential operators are fundamental maps in probability-information theory. As a consequence, no metric assumption is required a priori: in practice, it is possible to compute the homology, for example, on position variables and/or on qualitative variables such as “nice” and “not nice” or “Alice” and “Bob”. The present method is topological and avoids the a priori introduction of such a metric; rather, a family of Shannon pseudometrics emerges from the formalism as a first cohomology class (Section 2.3.1). Considering a symmetric action of conditioning, we obtain Shannon’s metric parametrized by a scalar multiplicative constant.
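As an illustration of the “zero information intersection” criterion, the following minimal Python sketch (the naming is mine and it is not the Infotopo implementation) computes plug-in entropies and the pairwise mutual information I2 from discrete samples; I2 vanishes exactly for independent variables, the information-theoretic analog of an empty intersection in the Čech picture.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """I2(X;Y) = H(X) + H(Y) - H(X,Y); zero iff X and Y are independent."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Independent pair: the "information intersection" is empty, I2 = 0.
xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]
print(mutual_information(xs, ys))  # 0.0

# Fully redundant pair: I2 = H(X) = 1 bit.
print(mutual_information(xs, xs))  # 1.0
```

Varying a threshold on such plug-in estimates would play the role that the ball radius plays in a persistence filtration.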

4.2.2. Unsupervised and Supervised Deep Homological Learning

As underlined in the Introduction, complexes of random variables and the minimum free energy complex can be understood as providing a geometrically constrained architecture for deep neural networks, where the depth of the neural network corresponds to the dimension of the cochain complex and the marginal variables and information I1 correspond to the input layer. Notably, the multiplicity of facets forming the complexes allows consideration of neural networks with parallel-layer architectures that analyze conditionally independent features of the input, as effectively achieved in real nervous systems, such as the macroscopic “where and what” (dorsal and ventral, respectively) visual processing streams in the vertebrate cortex [170,171]. Moreover, the energy functional interpretation of the mutual information Ik and total correlation Gk functions directly follows from and generalizes the definitions of the hierarchical deep architectures of Boltzmann machines and Helmholtz machines [65,66], which explicitly expressed free energy in terms of Kullback-Leibler (KL) divergences between layers. Notably, the multiplicity decomposition of “energy functions” formalizes unsupervised learning in neural networks in terms of a combinatorial family of analytically independent functions Ik with independent gradients (Theorem 4 in Reference [2]): instead of a single energy and its associated gradient descent, mutual information provides a multiplicity of gradients. The application scope here is restricted to the general problem of unsupervised learning but the supervised subcase can be derived from it and is detailed with some examples of application to the MNIST digit-image dataset in Reference [63] and in a future publication. Notably, the mathematical depth of the back-propagation algorithm [172,173,174] comes from the fact that it implements the chain rule for derivation with respect to the parameters, allowing learning in the class of differentiable functions.
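The combinatorial family of Ik “energy functions” and the free energy decomposition can be sketched directly from the alternating (inclusion–exclusion) sums over entropies; the sketch below (my own illustrative naming, not the Infotopo code) checks numerically, on a XOR triple, that Ik can be negative (synergy) and that the total correlation Gk decomposes into the (−1)^|S| I|S| contributions of the k-body interactions, as stated in the Abstract.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy(columns):
    """Joint plug-in entropy (bits) of the variables given as sample columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())

def I(columns):
    """I_k = sum over nonempty subsets S of (-1)^(|S|+1) H(X_S)."""
    return sum((-1) ** (r + 1) * entropy(s)
               for r in range(1, len(columns) + 1)
               for s in combinations(columns, r))

def G(columns):
    """Total correlation G_k = sum_i H(X_i) - H(X_1,...,X_k)."""
    return sum(entropy([c]) for c in columns) - entropy(columns)

# Two bits and their XOR: pairwise independent but globally dependent.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]
cols = [x, y, z]

# I3 is negative (synergy) and G_k equals the alternating sum of the
# I_|S| over all subsets of size >= 2.
print(I(cols))  # -1.0
decomposition = sum((-1) ** r * I(s)
                    for r in range(2, len(cols) + 1)
                    for s in combinations(cols, r))
print(G(cols), decomposition)  # 1.0 1.0
```

Each subset's I function here is one member of the combinatorial family of energies whose gradients the text describes.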
In information cohomology, supervised learning appears as a subcase of unsupervised learning and defines the sublattice, information landscapes and complexes for which all chains contain the label variable Xi to be learned and the 2^(n−1) functions Hk, Ik whose (sub)gradients are independent in open dense subsets of the probability subsimplex ΔX/Xi (the subsimplex obtained by conditioning on the parameters EXi, that is, the conditional probability laws with respect to Xi). Notably, the marginal degree-1 layer is composed only of the label variable Xi, which corresponds to the usual definition of an input, and the simplicial structure is translated by one degree. Then, the maximal depth of a deep neural network achieving the classification given the data is the dimension of the information simplicial complex—a result that can be linked to the work of Montúfar on the dimension of restricted Boltzmann machines [175]. In this cohomological context, back-propagation is implemented by the information chain rule and runs forward, as imposed by the contravariant nature of the cohomology. The relation to information geometry and Amari’s natural gradient [176] follows from the relation of the Fisher information to the Hessian of the KL divergence [177] (or entropy [178]) and from the implementation of natural gradients for deep network training [179]; it is left for further courageous investigations.
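The supervised sublattice, in which every chain contains the label variable, can be illustrated by the degree-2 information conditioned on a label; in the sketch below (names and example are mine, under the plug-in estimation assumption) the label is the XOR of two independent bits, so I(X;Y) = 0 unconditionally while I(X;Y|label) = 1 bit: conditioning on the label reveals exactly the dependence that a supervised network must learn.

```python
from collections import Counter
from math import log2

def entropy(*columns):
    """Joint plug-in entropy (bits) of one or more sample columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())

def conditional_mi(xs, ys, label):
    """I(X;Y|L) = H(X,L) + H(Y,L) - H(X,Y,L) - H(L): degree-2 information
    in the supervised sublattice where every chain contains the label L."""
    return (entropy(xs, label) + entropy(ys, label)
            - entropy(xs, ys, label) - entropy(label))

# Label = XOR of two independent bits.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(x, y)]
print(entropy(x) + entropy(y) - entropy(x, y))  # 0.0 (unconditional I2)
print(conditional_mi(x, y, label))              # 1.0 (given the label)
```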

4.2.3. Epigenetic Topological Learning—Biological Diversity

In place of the MaxEnt principle, we proposed a least energy principle, equivalent here to a homological complex (finite and without metric assumptions). Mathematically, whereas the maximum of the entropy functional is always unique and in a sense normative, the minima of the Ik functionals exhibit a rich structure of degeneracy, corresponding to the “non-Shannonian set” [133,134,135] and conjectured to be at least as rich as topological links can be [2]. We proposed that this multiplicity of minima accounts for biological diversity or, more precisely, that the number of facets of this complex quantifies the diversity in the system. The application to cell-type identification presented in the associated paper [2] gives a preliminary validation of this quantification. Moreover, the definition of a complex system as the minimum free energy complex given in Section 3.4.6, underlining that diversity is just the multiplicity of the minima, is in agreement with Waddington’s original work [180] (see Figure 7b). In the allegory of Waddington’s epigenetic landscape, whatever the ball, it will always fall down—a statement that can be assimilated to the second law of thermodynamics. In doing so, however, it can take different paths: diversity comes from the multiple minima. Waddington explained this landscape as a “complex system of interactions”, which can be formalized by the minimum free energy complex with interactions corresponding to the Ik. Moreover, formalisms assuming that the variables are identically distributed, as for the homogeneous systems described in the section on mean paths (Section 3.4.5), display a single first minimum (one facet, a simplex) and hence no diversity. Sharing the same aims, Teschendorff and Enver, and then Jin and colleagues, proposed alternative interpretations of Waddington’s landscape in terms of signaling entropy [181] and of probability transitions [182], respectively.

Figure 7.

The epigenetic landscape of Waddington. (a) The epigenetic landscape of Waddington. A path of the ball in this landscape illustrates a cell’s developmental fate. (b) “The complex system of interactions underlying the epigenetic landscape”, with Waddington’s original legends [180].

Following Thom’s topological morphogenetic view of Waddington’s work [183], we propose that the Ik landscapes, paths and minimum free energy complex provide a possible informational formalization of Waddington’s epigenetic complex landscape and cell fates (cf. Figure 7). This formalization of Waddington’s epigenetic view is consistent with the machine learning formalization of Hebbian epigenetic plasticity. From the purely formal view, models of Hebbian neural learning such as Hopfield’s network, Boltzmann machines and the Infomax models proposed by Linsker, Nadal and Parga, as well as Bell and Sejnowski [184,185,186], can be viewed as binary-variable subcases of a generic N-ary variable epigenetic developmental process. For example, Potts models were implemented for the simulation of cell-based morphogenesis by Glazier and colleagues [187]. Hence, the topological approach can allow the treatment of neural learning and development on the ground of a common epigenetic formalism, in agreement with biological results pointing out the continuum and “entanglement” of the biological processes underlying development and learning [188]. In terms of the current problematics of neuroscience, such a generalization allows, on a formal level, consideration of an analog coding in place of a digital coding and the methods developed here can be applied to studies investigating (discrete) analog coding.

Moreover, following all the work of these past decades on the application of statistical physics to biological systems (some of them cited in this article), we propose that the epigenetic process implements the first two laws of thermodynamics, weak topological versions of which are proposed to hold in the raw data space (without phase space or symplectic structure, cf. Section 3.4.3). As previously underlined, the condition for such an inscription of living organism dynamics into classical statistical physics to be legitimate is that the considered variables correspond to phase space variables.

5. Conclusions

In this paper, we have proposed a unified view of and account for classical statistical physics and machine learning methods such as topological data analysis and deep neural networks under a common information cohomology framework. The particularity and main novelty of the methods are that they are finite and discrete and that such axioms, rather than being a defect, can have fundamental mathematical (Galois theory [103]), probabilistic (Kolmogorov’s foundations of finite probability fields [163]), physical (frequentist hypothesis and observability axiom [16]), computational (directly computable combinatorial expressions) and machine learning (numerical pattern classification) meaning. From the statistical physics point of view, we developed within the information cohomology framework an expression of internal and free energy, of the second law and, conjecturally, of the first law of thermodynamics, while developing a generic k-body interaction model that can be viewed as a discrete-finite and informational analog of renormalization methods (and hence more physically sound [44,45]), avoiding mean-field approximations. The application of these informational methods to genetic expression data reproduced the usual signatures of phase transitions of mean-field k-body models [46,47,48]. From the machine learning point of view, this paper provides a combinatorial and computational expression of the information cohomology, notably of the simplicial subcase that can be effectively computed, and underlines its universal classifier function. The methods applied in References [2,3] offer new algorithms for topological data analysis which are intrinsically probabilistic, as well as a cohomological insight into deep neural network architectures and training/learning algorithms, both supervised and unsupervised, while providing a generic biological model of epigenetic plasticity and development.

Acknowledgments

The author thanks the Max Planck Institute for Mathematics in the Sciences (MPI-MIS), the Complex Systems Institute Paris-Ile-de-France (ISC-PIF) and the Institut de Mathématiques de Jussieu–Paris Rive Gauche (IMJ-PRG) for their support and hosting since 2007. This work addresses a deep and warm acknowledgement to the researchers who helped its realization: D. Bennequin, J.M. Goaillard, Hong Van Le, G. Marrelec, M. Tapia and J.P. Vigneaux; or who supported and encouraged it: H. Atlan, F. Barbaresco, H. Bénali, P. Bourgine, F. Chavane, J. Jost, A. Mohammad-Djafari, J.P. Nadal, J. Petitot, A. Sarti and J. Touboul. A partial version of this work was deposited in the Methods section of bioRxiv 168740 in July 2017 and in preprints in 2018 [108].

Abbreviations

The following abbreviations are used in this manuscript:

iid Independent identically distributed
Hk Multivariate k-joint Entropy
Ik Multivariate k-mutual information
Gk Multivariate k-total-correlation or k-multi-information

Supplementary Materials

The software Infotopo is available at https://github.com/pierrebaudot/INFOTOPO.

Funding

This work was funded by the European Research Council (ERC consolidator grant 616827 CanaloHmics to J.M. Goaillard) and Median Technologies, developed at UNIS Inserm 1072—Université Aix-Marseille.

Conflicts of Interest

The author declares no conflict of interest.

References

  • 1.Baudot P., Bennequin D. The Homological Nature of Entropy. Entropy. 2015;17:3253–3318. doi: 10.3390/e17053253. [DOI] [Google Scholar]
  • 2.Baudot P., Tapia M., Bennequin D., Goaillard J.C. Topological Information Data Analysis. Entropy. 2019 doi: 10.3390/e21090869. in press. [DOI] [Google Scholar]
  • 3.Tapia M., Baudot P., Formizano-Tréziny C., Dufour M., Temporal S., Lasserre M., Marquèze-Pouey B., Gabert J., Kobayashi K., Goaillard J.M. Neurotransmitter identity and electrophysiological phenotype are genetically coupled in midbrain dopaminergic neurons. Sci. Rep. 2018;8:13637. doi: 10.1038/s41598-018-31765-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Mézard M. Passing Messages Between Disciplines. Science. 2003;301:1686. doi: 10.1126/science.1086309. [DOI] [PubMed] [Google Scholar]
  • 5.Caramello O. The unification of Mathematics via Topos Theory. [(accessed on 4 September 2019)];arXiv. 2010 Available online: https://arxiv.org/abs/1006.3930.1006.3930 [Google Scholar]
  • 6.Doering A., Isham C. Classical and quantum probabilities as truth values. J. Math. Phys. 2012;53:032101. doi: 10.1063/1.3688627. [DOI] [Google Scholar]
  • 7.Doering A., Isham C. A Topos Foundation for Theories of Physics: I. Formal Languages for Physics. J. Math. Phys. 2008;49:053515. doi: 10.1063/1.2883740. [DOI] [Google Scholar]
  • 8.Brillouin L. Science and Information Theory. Academic Press; Cambridge, MA, USA: 1956. [Google Scholar]
  • 9.Jaynes E. Information Theory and Statistical Mechanics II. Phys. Rev. 1957;108:171–190. doi: 10.1103/PhysRev.108.171. [DOI] [Google Scholar]
  • 10.Jaynes E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957;106:620–630. doi: 10.1103/PhysRev.106.620. [DOI] [Google Scholar]
  • 11.Landauer R. Irreversibility and heat generation in the computing process. IBM J. Res. Dev. 1961;5:183–191. doi: 10.1147/rd.53.0183. [DOI] [Google Scholar]
  • 12.Penrose R. Quantum Theory and Beyondum. Cambridge University Press; Cambridge, UK: 1971. Angular Momentum: An Approach to Combinatorial Space-Time; pp. 151–180. [Google Scholar]
  • 13.Wheeler J. Information, Physics, Quantum: The Search for Links. In: Zureked W.H., editor. Complexity, Entropy, and the Physics of Information. CRC Press; New York, NY, USA: 1990. [Google Scholar]
  • 14.Bennett C. Notes on Landauer’s principle, Reversible Computation and Maxwell’s Demon. Stud. Hist. Philos. Mod. Phys. 2003;34:501–510. doi: 10.1016/S1355-2198(03)00039-X. [DOI] [Google Scholar]
  • 15.Wheeler J.A. Physics and Austerity, Law Without Law. Center for Theoretical Physics, University of Texas; Austin, TX, USA: 1982. Numéro 122 de Pamhlets on Physics. [Google Scholar]
  • 16.Born M. The Statistical Interpretation of Quantum Mechanics. Nobel Lecture. [(accessed on 4 September 2019)];1954 Available online: https://www.nobelprize.org/prizes/physics/1954/born/lecture/
  • 17.Baudot P. Ph.D. Thesis. Université Pierre et Marie Curie-Paris VI; Paris, France: Sep, 2006. Natural Computation: Much Ado about Nothing? An Intracellular Study of Visual Coding in Natural Condition. [Google Scholar]
  • 18.Vigneaux J. The Structure of Information: From Probability to Homology. [(accessed on 4 September 2019)];arXiv. 2017 Available online: https://arxiv.org/abs/1709.07807.1709.07807 [Google Scholar]
  • 19.Vigneaux J. Ph.D. Thesis. Paris 7 Diderot University; Paris, France: Jun, 2019. Topology of Statistical Systems. A Cohomological Approach to Information Theory. [Google Scholar]
  • 20.Marcolli M., Thorngren R. Thermodynamic Semirings. [(accessed on 4 September 2019)];arXiv. 2011 doi: 10.4171/JNCG/159. Available online: https://arxiv.org/abs/1108.2874.10.4171/JNCG/159 [DOI] [Google Scholar]
  • 21.Mainiero T. Homological Tools for the Quantum Mechanic. [(accessed on 4 September 2019)];arXiv. 2019 Available online: https://arxiv.org/abs/1901.02011.1901.02011 [Google Scholar]
  • 22.Gerstenhaber M., Schack S. A hodge-type decomposition for commutative algebra cohomology. J. Pure Appl. Algebr. 1987;48:229–247. doi: 10.1016/0022-4049(87)90112-5. [DOI] [Google Scholar]
  • 23.Shannon C. Transactions of the IRE Professional Group on Information Theory. Volume 1. IEEE; Piscataway, NJ, USA: 1953. The lattice theory of information; pp. 105–107. [Google Scholar]
  • 24.Rajski C. A metric space of discrete probability distributions. Inform. Control. 1961;4:371–377. doi: 10.1016/S0019-9958(61)80055-7. [DOI] [Google Scholar]
  • 25.Zurek W. Thermodynamic cost of computation, algorithmic complexity and the information metric. Nature. 1989;341:119–125. doi: 10.1038/341119a0. [DOI] [Google Scholar]
  • 26.Bennett C.H., Gács P., Li M., Vitányi P.M., Zurek W.H. Information distance. IEEE Trans. Inf. Theory. 1998;44:1407–1423. doi: 10.1109/18.681318. [DOI] [Google Scholar]
  • 27.Kraskov A., Grassberger P. Information Theory and Statistical Learning. Springer; Boston, MA, USA: 2009. MIC: Mutual Information Based Hierarchical Clustering; pp. 101–123. [Google Scholar]
  • 28.Crutchfield J.P. Nonlinear Structures in Physical Systems. Springer; New York, NY, USA: 1990. Information and its metric; pp. 119–130. [Google Scholar]
  • 29.Wu F. The Potts model. Rev. Mod. Phys. 1982;54:235–268. doi: 10.1103/RevModPhys.54.235. [DOI] [Google Scholar]
  • 30.Turban L. One-dimensional Ising model with multispin interactions. J. Phys. A-Math. Theor. 2016;49:355002. doi: 10.1088/1751-8113/49/35/355002. [DOI] [Google Scholar]
  • 31.Kohn W. Nobel Lecture: Electronic structure of matter—Wave functions and density functionals. Rev. Mod. Phys. 1999;71:1253. doi: 10.1103/RevModPhys.71.1253. [DOI] [Google Scholar]
  • 32.Baez J., Pollard S. Relative Entropy in Biological Systems. Entropy. 2016;18:46. doi: 10.3390/e18020046. [DOI] [Google Scholar]
  • 33.Brenner N., Strong S.P., Koberle R., Bialek W., de Ruyter van Steveninck R.R. Synergy in a neural code. Neural Comput. 2000;12:1531–1552. doi: 10.1162/089976600300015259. [DOI] [PubMed] [Google Scholar]
  • 34.Matsuda H. Information theoretic characterization of frustrated systems. Physica A. 2001;294:180–190. doi: 10.1016/S0378-4371(01)00039-5. [DOI] [Google Scholar]
  • 35.Dunkel J., Hilbert S. Phase transitions in small systems: Microcanonical vs. canonical ensembles. Physica A. 2006;370:390–406. doi: 10.1016/j.physa.2006.05.018. [DOI] [Google Scholar]
  • 36.Cover T., Thomas J. Elements of Information Theory. John Wiley & Sons; Chichester, UK: 1991. [Google Scholar]
  • 37.Noether E. Invariant Variation Problems. Transport Theor. Stat. 1971;1:186–207. doi: 10.1080/00411457108231446. [DOI] [Google Scholar]
  • 38.Mansfield E.L. Foundations of Computational Mathematics, Santander. Cambridge University Press; Cambridge, UK: 2005. Noether’s Theorem for Smooth, Difference and Finite Element Systems; pp. 230–257. [Google Scholar]
  • 39.Baez J., Fong B. A Noether theorem for Markov processes. J. Math. Phys. 2013;54:013301. doi: 10.1063/1.4773921. [DOI] [Google Scholar]
  • 40.Neuenschwander D. Noether’s theorem and discrete symmetries. Am. J. Phys. 1995;63:489. doi: 10.1119/1.17888. [DOI] [Google Scholar]
  • 41.Kadanoff L.P. Phase Transitions: Scaling, Universality and Renormalization. [(accessed on 4 September 2019)];2010 Phase Transitions Dirac V2.4. Available online: https://jfi.uchicago.edu/~leop/TALKS/Phase 20TransitionsV2.4Dirac.pdf.
  • 42.Wilson K.G., Kogut J. The renormalization group and the epsilon expansion. Phys. Rep. 1974;12:75–200. doi: 10.1016/0370-1573(74)90023-4. [DOI] [Google Scholar]
  • 43.Zinn-Justin J. Phase Transitions and Renormalization Group. Oxford University Press; Oxford, UK: 2010. [Google Scholar]
  • 44.Feynman R. QED. The Strange Theory of Light and Matter. Princeton University Press; Princeton, NJ, USA: 1985. [Google Scholar]
  • 45.Dirac P. Directions in Physics. John Wiley & Sons; Chichester, UK: 1978. [Google Scholar]
  • 46.Van der Waals J.D. Over de Continuiteit van den Gas- en Vloeistoftoestand. Luitingh-Sijthoff; Amsterdam, The Netherlands: 1873. [Google Scholar]
  • 47.Maxwell J. Van der Waals on the Continuity of the Gaseous and Liquid States. Nature. 1874;10:407. [Google Scholar]
  • 48.Maxwell J. On the dynamical evidence of the molecular constitution of bodies. Nature. 1875;1:357–359. doi: 10.1038/011357a0. [DOI] [Google Scholar]
  • 49.Ellerman D. An introduction to partition logic. Log. J. IGPL. 2014;22:94–125. doi: 10.1093/jigpal/jzt036. [DOI] [Google Scholar]
  • 50.Ellerman D. The Review of Symbolic Logic. Volume 3. Cambridge University Press; Cambridge, UK: 2010. The logic of partitions: introduction to the dual of the logic of subsets; pp. 287–350. [Google Scholar]
  • 51.James R., Crutchfield J. Multivariate Dependence beyond Shannon Information. Entropy. 2017;19:531. doi: 10.3390/e19100531. [DOI] [Google Scholar]
  • 52.Foster D., Foster J., Paczuski M., Grassberger F. Communities, clustering phase transitions, and hysteresis: Pitfalls in constructing network ensembles. Phys. Rev. E. 2010;81:046115. doi: 10.1103/PhysRevE.81.046115. [DOI] [PubMed] [Google Scholar]
  • 53.Lum P., Singh G., Lehman A., Ishkanov T., Vejdemo-Johansson M., Alagappan M., Carlsson J., Carlsson G. Extracting insights from the shape of complex data using topology. Sci. Rep. 2013;3:1236. doi: 10.1038/srep01236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Epstein C., Carlsson G., Edelsbrunner H. Topological data analysis. Inverse Probl. 2011;27:120201. [Google Scholar]
  • 55.Carlsson G. Topology and data. Bull. Am. Math. Soc. 2009;46:255–308. doi: 10.1090/S0273-0979-09-01249-X. [DOI] [Google Scholar]
  • 56.Niyogi P., Smale S., Weinberger S. A Topological View of Unsupervised Learning from Noisy Data. SIAM J. Comput. 2011;20:646–663. doi: 10.1137/090762932. [DOI] [Google Scholar]
  • 57.Buchet M., Chazal F., Oudot S., Sheehy D. Efficient and Robust Persistent Homology for Measures. Comput. Geom. 2016;58:70–96. doi: 10.1016/j.comgeo.2016.07.001. [DOI] [Google Scholar]
  • 58.Chintakunta H., Gentimis T., Gonzalez-Diaz R., Jimenez M.J., Krim H. An entropy-based persistence barcode. Pattern Recognit. 2015;48:391–401. doi: 10.1016/j.patcog.2014.06.023. [DOI] [Google Scholar]
  • 59.Merelli E., Rucco M., Sloot P., Tesei L. Topological Characterization of Complex Systems: Using Persistent Entropy. Entropy. 2015;17:6872–6892. doi: 10.3390/e17106872. [DOI] [Google Scholar]
  • 60.Tadic B., Andjelkovic M., Suvakov M. The influence of architecture of nanoparticle networks on collective charge transport revealed by the fractal time series and topology of phase space manifolds. J. Coupled Syst. Multiscale Dyn. 2016;4:30–42. doi: 10.1166/jcsmd.2016.1094. [DOI] [Google Scholar]
  • 61.Maletic S., Rajkovic M. Combinatorial Laplacian and entropy of simplicial complexes associated with complex networks. Eur. Phys. J. 2012;212:77–97. doi: 10.1140/epjst/e2012-01655-6. [DOI] [Google Scholar]
  • 62.Maletic S., Zhao Y. Multilevel Integration Entropies: The Case of Reconstruction of Structural Quasi-Stability in Building Complex Datasets. Entropy. 2017;19:172. doi: 10.3390/e19040172. [DOI] [Google Scholar]
  • 63.Baudot P., Bernardi M. Information Cohomology Methods for Learning the Statistical Structures of Data. [(accessed on 4 September 2019)];2019 DS3 Data Science, Ecole Polytechnique. Available online: https://www.ds3-datascience-polytechnique.fr/wp-content/uploads/2019/06/DS3-426_2019_v2.pdf.
  • 64.Hopfield J. Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proc. Natl. Acad. Sci. USA. 1982;79:2554–2558. doi: 10.1073/pnas.79.8.2554. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Ackley D., Hinton G., Sejnowski T.J. A Learning Algorithm for Boltzmann Machines. Cogn. Sci. 1985;9:147–169. doi: 10.1207/s15516709cog0901_7. [DOI] [Google Scholar]
  • 66.Dayan P., Hinton G., Neal R., Zemel R. The Helmholtz Machine. Neural Comput. 1995;7:889–904. doi: 10.1162/neco.1995.7.5.889. [DOI] [PubMed] [Google Scholar]
  • 67.Baudot P. Elements of Consciousness and Cognition. Biology, Mathematic, Physics and Panpsychism: An Information Topology Perspective. [(accessed on 4 September 2019)];arXiv. 2018 Available online: https://arxiv.org/abs/1807.04520.1807.04520 [Google Scholar]
  • 68.Port A., Gheorghita I., Guth D., Clark J., Liang C., Dasu S., Marcolli M. Mathematics in Computer Science. Volume 1. Springer; Berlin/Heidelberger, Germany: 2018. Persistent Topology of Syntax; pp. 33–50. [Google Scholar]
  • 69.Marr D. Vision. MIT Press; Cambridge, MA, USA: 1982. [Google Scholar]
  • 70.Poincare H. Analysis Situs. Journal de l’École Polytechnique. 1895;1:1–121. [Google Scholar]
  • 71.Shannon C.E. A Mathematical Theory of Communication. Bell Labs Tech. J. 1948;27:379–423. doi: 10.1002/j.1538-7305.1948.tb01338.x. [DOI] [Google Scholar]
  • 72.Andre Y. Symétries I. Idées Galoisiennes. IRCAM; Paris, France: 2009. [(accessed on 5 september 2019)]. Chap 4 of Leçons de Mathématiques Contemporaines à l’IRCAM, Master; cel-01359200. Available online: https://cel.archives-ouvertes.fr/cel-01359200. [Google Scholar]
  • 73.Andre Y. Ambiguity theory, old and new. [(accessed on 4 September 2019)];arXiv. 2008 Available online: https://arxiv.org/abs/0805.2568.0805.2568 [Google Scholar]
  • 74.Yeung R. Communications, Information and Network Security. Volume 712. Springer; Boston, MA, USA: 2003. On Entropy, Information Inequalities, and Groups; pp. 333–359. [Google Scholar]
  • 75.Cathelineau J. Sur l’homologie de sl2 a coefficients dans l’action adjointe. Math. Scand. 1988;63:51–86. doi: 10.7146/math.scand.a-12225. [DOI] [Google Scholar]
  • 76.Kontsevitch M. The 11/2 Logarithm. 1995. Unpublished work.
  • 77.Elbaz-Vincent P., Gangl H. On poly(ana)logs I. Compos. Math. 2002;130:161–214. doi: 10.1023/A:1013757217319. [DOI] [Google Scholar]
  • 78.Elbaz-Vincent P., Gangl H. International Conference on Geometric Science of Information. Springer; Berlin/Heidelberger, Germany: 2015. Finite polylogarithms, their multiple analogues and the Shannon entropy. [Google Scholar]
  • 79.Connes A., Consani C. Noncommutative Geometry, Arithmetic, and Related Topics. JHU Press; Baltimore, MD, USA: 2009. Characteristic 1, entropy and the absolute point. [Google Scholar]
  • 80.Marcolli M., Tedeschi R. Entropy algebras and Birkhoff factorization. J. Geom. Phys. 2018;97:243–265. doi: 10.1016/j.geomphys.2015.07.018. [DOI] [Google Scholar]
  • 81.Baez J., Fritz T., Leinster T. A Characterization of Entropy in Terms of Information Loss. Entropy. 2011;13:1945–1957. doi: 10.3390/e13111945. [DOI] [Google Scholar]
  • 82.Baez J.C., Fritz T. A Bayesian characterization of relative entropy. Theory Appl. Categ. 2014;29:422–456. [Google Scholar]
  • 83.Boyom M. Foliations-Webs-Hessian Geometry-Information Geometry-Entropy and Cohomology. Entropy. 2016;18:433. doi: 10.3390/e18120433. [DOI] [Google Scholar]
  • 84.Drummond-Cole G., Park J.S., Terilla J. Homotopy probability theory I. J. Homotopy Relat. Struct. 2015;10:425–435. doi: 10.1007/s40062-013-0067-y. [DOI] [Google Scholar]
  • 85.Drummond-Cole G., Park J.S., Terilla J. Homotopy probabilty Theory II. J. Homotopy Relat. Struct. 2015;10:623–635. doi: 10.1007/s40062-014-0078-3. [DOI] [Google Scholar]
  • 86.Park J.S. Homotopy Theory of Probability Spaces I: Classical independence and homotopy Lie algebras. [(accessed on 4 September 2019)];aiXiv. 2015 Available online: https://arxiv.org/abs/1510.08289.1510.08289 [Google Scholar]
  • 87.Beilinson A., Goncharov A., Schechtman V., Varchenko A. The Grothendieck Festschrift. Volume 1. Birkhäuser; Boston, MA, USA: 1990. Aomoto dilogarithms, mixed Hodge structures and motivic cohomology of pairs of triangles on the plane; pp. 135–172. [Google Scholar]
  • 88.Aomoto K. Addition theorem of Abel type for Hyper-logarithms. Nagoya Math. J. 1982;88:55–71. doi: 10.1017/S0027763000020092. [DOI] [Google Scholar]
  • 89.Goncharov A. Regulators. Springer; Berlin/Heidelberger, Germany: 2005. pp. 297–324. [Google Scholar]
  • 90.Kullback S., Leibler R. On information and sufficiency. Ann. Math. Stat. 1951;22:79–86. doi: 10.1214/aoms/1177729694. [DOI] [Google Scholar]
  • 91.Hu K.T. On the Amount of Information. Theory Probab. Appl. 1962;7:439–447. [Google Scholar]
  • 92.Yeung R. Information Theory and Network Coding. Springer; Berlin/Heidelberger, Germany: 2007. [Google Scholar]
  • 93.McGill W. Multivariate information transmission. Psychometrika. 1954;19:97–116. doi: 10.1007/BF02289159. [DOI] [Google Scholar]
  • 94.Watanabe S. Information theoretical analysis of multivariate correlation. IBM J. Res. Dev. 1960;4:66–81. doi: 10.1147/rd.41.0066. [DOI] [Google Scholar]
  • 95.Tononi G., Edelman G. Consciousness and Complexity. Science. 1998;282:1846–1851. doi: 10.1126/science.282.5395.1846. [DOI] [PubMed] [Google Scholar]
  • 96.Studeny M., Vejnarova J. Learning in Graphical Models. MIT Press; Cambridge, MA, USA: 1999. The multiinformation function as a tool for measuring stochastic dependence; pp. 261–296. [Google Scholar]
  • 97.Margolin A., Wang K., Califano A., Nemenman I. Multivariate dependence and genetic networks inference. IET Syst. Biol. 2010;4:428–440. doi: 10.1049/iet-syb.2010.0009. [DOI] [PubMed] [Google Scholar]
  • 98.Andrews G. The Theory of Partitions. Cambridge University Press; Cambridge, UK: 1998. [Google Scholar]
  • 99.Fresse B. Koszul duality of operads and homology of partition posets. Contemp. Math. Am. Math. Soc. 2004;346:115–215. [Google Scholar]
  • 100.Hochschild G. On the cohomology groups of an associative algebra. Ann. Math. 1945;46:58–67. doi: 10.2307/1969145. [DOI] [Google Scholar]
  • 101.Weibel C. An Introduction to Homological Algebra. Cambridge University Press; Cambridge, UK: 1995. [Google Scholar]
  • 102.Kassel C. Homology and Cohomology of Associative Algebras—A Concise Introduction to Cyclic Homology. [(accessed on 4 September 2019)];2004 Advanced Course on Non-Commutative Geometry. Available online: https://cel.archives-ouvertes.fr/cel-00119891/
  • 103.Tate J. Galois Cohomology. Springer Science & Business Media; Berlin/Heidelberger, Germany: 1991. [Google Scholar]
  • 104.Cartan H., Eilenberg S. Homological Algebra. Princeton University Press; Princeton, NJ, USA: 1956. [Google Scholar]
  • 105.Mac Lane S. Homology. Springer Science & Business Media; Berlin/Heidelberger, Germany: 1975. [Google Scholar]
  • 106.Kendall D. Functional Equations in Information Theory. Probab. Theory Relat. Field. 1964;2:225–229. doi: 10.1007/BF00533380. [DOI] [Google Scholar]
  • 107.Lee P. On the Axioms of Information Theory. Ann. Math. Stat. 1964;35:415–418. doi: 10.1214/aoms/1177703765. [DOI] [Google Scholar]
  • 108.Baudot P., Tapia M., Goaillard J. Topological Information Data Analysis: Poincare-Shannon Machine and Statistical Physic of Finite Heterogeneous Systems. [(accessed on 5 September 2019)];2018 Preprints 2018040157. Available online: https://www.preprints.org/manuscript/201804.0157/v1.
  • 109.Lamarche-Perrin R., Demazeau Y., Vincent J. The Best-Partitions Problem: How to Build Meaningful Aggregations? Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT); Atlanta, GA, USA, 17–20 November 2013; p. 18.
  • 110.Pudlák P., Tůma J. Every finite lattice can be embedded in a finite partition lattice. Algebra Univ. 1980;10:74–95. doi: 10.1007/BF02482893.
  • 111.Gerstenhaber M., Schack S. Simplicial cohomology is Hochschild cohomology. J. Pure Appl. Algebr. 1983;30:143–156. doi: 10.1016/0022-4049(83)90051-8.
  • 112.Steenrod N. Products of Cocycles and Extensions of Mappings. Ann. Math. 1947;48:290–320. doi: 10.2307/1969172.
  • 113.Atiyah M. Topological quantum field theory. Publ. Math. IHÉS. 1988;68:175–186. doi: 10.1007/BF02698547.
  • 114.Witten E. Topological Quantum Field Theory. Commun. Math. Phys. 1988;117:353–386. doi: 10.1007/BF01223371.
  • 115.Schwarz A. Topological Quantum Field Theory. arXiv 2000, hep-th/0011260. Available online: https://arxiv.org/abs/hep-th/0011260 (accessed on 4 September 2019).
  • 116.Reshef D., Reshef Y., Finucane H., Grossman S., McVean G., Turnbaugh P., Lander E., Mitzenmacher M., Sabeti P. Detecting Novel Associations in Large Data Sets. Science. 2011;334:1518. doi: 10.1126/science.1205438.
  • 117.Hohenberg P., Kohn W. Inhomogeneous electron gas. Phys. Rev. 1964;136:B864–B871. doi: 10.1103/PhysRev.136.B864.
  • 118.Kohn W., Sham L.J. Self-Consistent Equations Including Exchange and Correlation Effects. Phys. Rev. 1965;140:A1133–A1138. doi: 10.1103/PhysRev.140.A1133.
  • 119.Wheeler J. Information, Physics, Quantum: The Search for Links. Proceedings of the 3rd International Symposium on Foundations of Quantum Mechanics; Tokyo, Japan, 28–31 August 1989; pp. 354–368.
  • 120.Von Bahr B. On the central limit theorem in R^k. Ark. Mat. 1967;7:61–69. doi: 10.1007/BF02591677.
  • 121.Ebrahimi N., Soofi E., Soyer R. Multivariate maximum entropy identification, transformation, and dependence. J. Multivar. Anal. 2008;99:1217–1231. doi: 10.1016/j.jmva.2007.08.004.
  • 122.Conrad K. Probability distributions and maximum entropy. Entropy. 2005;6:10.
  • 123.Adami C., Cerf N. Prolegomena to a non-equilibrium quantum statistical mechanics. Chaos Solitons Fractals. 1999;10:1637–1650.
  • 124.Kapranov M. Thermodynamics and the moment map. arXiv 2011, 1108.3472. Available online: https://arxiv.org/pdf/1108.3472.pdf (accessed on 4 September 2019).
  • 125.Erdős P. On the distribution function of additive functions. Ann. Math. 1946;47:1–20. doi: 10.2307/1969031.
  • 126.Aczél J., Daróczy Z. On Measures of Information and Their Characterizations. Academic Press; Cambridge, MA, USA: 1975.
  • 127.Lifshitz E.M., Landau L.D. Statistical Physics (Course of Theoretical Physics, Volume 5). Butterworth-Heinemann; Oxford, UK: 1969.
  • 128.Han T.S. Linear dependence structure of the entropy space. Inf. Control. 1975;29:337–368. doi: 10.1016/S0019-9958(75)80004-0.
  • 129.Björner A. Continuous partition lattice. Proc. Natl. Acad. Sci. USA. 1987;84:6327–6329. doi: 10.1073/pnas.84.18.6327.
  • 130.Postnikov A. Permutohedra, Associahedra, and Beyond. Int. Math. Res. Not. 2009;2009:1026–1106. doi: 10.1093/imrn/rnn153.
  • 131.Matúš F. Conditional probabilities and permutahedron. Annales de l’Institut Henri Poincaré (B) Probability and Statistics. Elsevier; Amsterdam, The Netherlands: 2003.
  • 132.Yeung R. Facets of entropy. Communications in Information and Systems. Volume 15. International Press of Boston; Somerville, MA, USA: 2015; pp. 87–117.
  • 133.Yeung R. A framework for linear information inequalities. IEEE Trans. Inf. Theory. 1997;43:1924–1934. doi: 10.1109/18.641556.
  • 134.Zhang Z., Yeung R.W. On Characterization of Entropy Function via Information Inequalities. IEEE Trans. Inf. Theory. 1998;44:1440–1452. doi: 10.1109/18.681320.
  • 135.Matúš F. Infinitely Many Information Inequalities. Proceedings of the IEEE International Symposium on Information Theory (ISIT); 2007; pp. 41–47. doi: 10.1109/ISIT.2007.4557201.
  • 136.Takács L. Stochastic Processes: Problems and Solutions. John Wiley & Sons; Chichester, UK: 1960.
  • 137.Bourbaki N. Theory of Sets (Elements of Mathematics). Addison-Wesley; Boston, MA, USA: 1968.
  • 138.Brillouin L. Scientific Uncertainty, and Information. Academic Press; Cambridge, MA, USA: 2014.
  • 139.Griffiths R. Consistent Histories and the Interpretation of Quantum Mechanics. J. Stat. Phys. 1984;35:219. doi: 10.1007/BF01015734.
  • 140.Omnès R. Logical reformulation of quantum mechanics I. Foundations. J. Stat. Phys. 1988;53:893–932. doi: 10.1007/BF01014230.
  • 141.Gell-Mann M., Hartle J. Quantum mechanics in the light of quantum cosmology. In: Zurek W., editor. Complexity, Entropy, and the Physics of Information. Addison-Wesley; Boston, MA, USA: 1990; pp. 425–458.
  • 142.Lieb E.H., Yngvason J. A Guide to Entropy and the Second Law of Thermodynamics. Statistical Mechanics. Volume 45. Springer; Berlin/Heidelberg, Germany: 1998; pp. 571–581.
  • 143.Feynman R. Space-Time Approach to Non-Relativistic Quantum Mechanics. Rev. Mod. Phys. 1948;20:367–387. doi: 10.1103/RevModPhys.20.367.
  • 144.Merkh T., Montúfar G. Factorized Mutual Information Maximization. arXiv 2019, 1906.05460. Available online: https://arxiv.org/abs/1906.05460 (accessed on 4 September 2019).
  • 145.Weiss P. L’hypothèse du champ moléculaire et la propriété ferromagnétique. J. Phys. Theor. Appl. 1907;6:661–690. doi: 10.1051/jphystap:019070060066100.
  • 146.Parsegian V. Van der Waals Forces: A Handbook for Biologists, Chemists, Engineers, and Physicists. Cambridge University Press; Cambridge, UK: 2006.
  • 147.Xie Z., Chen J., Yu J., Kong X., Normand B., Xiang T. Tensor Renormalization of Quantum Many-Body Systems Using Projected Entangled Simplex States. Phys. Rev. X. 2014;4:011025. doi: 10.1103/PhysRevX.4.011025.
  • 148.Hà H.T., Van Tuyl A. Resolutions of square-free monomial ideals via facet ideals: A survey. arXiv 2006, math/0604301. Available online: https://arxiv.org/abs/math/0604301 (accessed on 4 September 2019).
  • 149.Newman M.E.J. Complex Systems: A Survey. arXiv 2011, 1112.1440. Available online: http://arxiv.org/abs/1112.1440v1 (accessed on 4 September 2019).
  • 150.Mezard M., Montanari A. Information, Physics, and Computation. Oxford University Press; Oxford, UK: 2009.
  • 151.Vannimenus J., Toulouse G. Theory of the frustration effect. II. Ising spins on a square lattice. J. Phys. C Solid State Phys. 1977;10:L537. doi: 10.1088/0022-3719/10/18/008.
  • 152.Rovelli C. Notes for a brief history of quantum gravity. arXiv 2008, gr-qc/0006061. Available online: https://arxiv.org/pdf/gr-qc/0006061.pdf (accessed on 4 September 2019).
  • 153.Sorkin R. Finitary Substitute for Continuous Topology. Int. J. Theor. Phys. 1991;30:923–947. doi: 10.1007/BF00673986.
  • 154.Strong S.P., Van Steveninck R.D.R., Bialek W., Koberle R. On the application of information theory to neural spike trains. Proceedings of the Pacific Symposium on Biocomputing; Maui, HI, USA, 4–9 January 1998; pp. 621–632.
  • 155.Niven R. Non-asymptotic thermodynamic ensembles. EPL. 2009;86:1–6. doi: 10.1209/0295-5075/86/20010.
  • 156.Niven R. Combinatorial entropies and statistics. Eur. Phys. J. B. 2009;70:49–63. doi: 10.1140/epjb/e2009-00168-5.
  • 157.Niven R. Exact Maxwell–Boltzmann, Bose–Einstein and Fermi–Dirac statistics. Phys. Lett. A. 2005;342:286–293. doi: 10.1016/j.physleta.2005.05.063.
  • 158.Grassberger P. Toward a quantitative theory of self-generated complexity. Int. J. Theor. Phys. 1986;25:907–938. doi: 10.1007/BF00668821.
  • 159.Bialek W., Nemenman I., Tishby N. Complexity through nonextensivity. Physica A. 2001;302:89–99. doi: 10.1016/S0378-4371(01)00444-7.
  • 160.Tsallis C. Entropic Nonextensivity: A possible measure of Complexity. Chaos Solitons Fractals. 2002;13:371–391. doi: 10.1016/S0960-0779(01)00019-4.
  • 161.Ritort F. Nonequilibrium fluctuations in small systems: From physics to biology. Adv. Chem. Phys. 2008;137:31–123.
  • 162.Artin M., Grothendieck A., Verdier J. Théorie des Topos et Cohomologie Étale des Schémas (SGA 4), Vol. I, II, III. Springer; Berlin/Heidelberg, Germany: 1972.
  • 163.Kolmogorov A.N. Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer; Berlin/Heidelberg, Germany: 1933.
  • 164.Tkacik G., Marre O., Amodei D., Schneidman E., Bialek W., Berry M.N. Searching for collective behavior in a large network of sensory neurons. PLoS Comput. Biol. 2014;10:e1003408. doi: 10.1371/journal.pcbi.1003408.
  • 165.Vigneaux J. Information theory with finite vector spaces. IEEE Trans. Inf. Theory. 2019;99:1. doi: 10.1109/TIT.2019.2907590.
  • 166.Wada T., Suyari H. The κ-Generalizations of Stirling Approximation and Multinominal Coefficients. Entropy. 2013;15:5144–5153. doi: 10.3390/e15125144.
  • 167.Cerf N., Adami C. Negative entropy and information in quantum mechanics. Phys. Rev. Lett. 1997;79:5194. doi: 10.1103/PhysRevLett.79.5194.
  • 168.Cerf N., Adami C. Entropic Bell Inequalities. Phys. Rev. A. 1997;55:3371. doi: 10.1103/PhysRevA.55.3371.
  • 169.Oudot S. Persistence Theory: From Quiver Representations to Data Analysis. Volume 209. American Mathematical Society; Providence, RI, USA: 2015.
  • 170.Schneider G. Two visual systems. Science. 1969;163:895–902. doi: 10.1126/science.163.3870.895.
  • 171.Goodale M., Milner A. Separate visual pathways for perception and action. Trends Neurosci. 1992;15:20–25. doi: 10.1016/0166-2236(92)90344-8.
  • 172.Kelley H. Gradient theory of optimal flight paths. ARS J. 1960;30:947–954. doi: 10.2514/8.5282.
  • 173.Dreyfus S. The numerical solution of variational problems. J. Math. Anal. Appl. 1962;5:30–45. doi: 10.1016/0022-247X(62)90004-5.
  • 174.Rumelhart D., Hinton G., Williams R. Learning representations by back-propagating errors. Nature. 1986;323:533–536. doi: 10.1038/323533a0.
  • 175.Montúfar G., Morton J. Discrete Restricted Boltzmann Machines. J. Mach. Learn. Res. 2015;16:653–672.
  • 176.Amari S. Neural learning in structured parameter spaces—Natural Riemannian gradient. Advances in Neural Information Processing Systems. MIT Press; Cambridge, MA, USA: 1997; pp. 127–133.
  • 177.Martens J. New Insights and Perspectives on the Natural Gradient Method. arXiv 2017, 1412.1193. Available online: https://arxiv.org/abs/1412.1193 (accessed on 4 September 2019).
  • 178.Bengtsson I., Życzkowski K. Geometry of Quantum States: An Introduction to Quantum Entanglement. Cambridge University Press; Cambridge, UK: 2006.
  • 179.Pascanu R., Bengio Y. Revisiting Natural Gradient for Deep Networks. arXiv 2014, 1301.3584. Available online: https://arxiv.org/abs/1301.3584 (accessed on 4 September 2019).
  • 180.Waddington C.H. The Strategy of the Genes. Routledge; Abingdon-on-Thames, UK: 1957.
  • 181.Teschendorff A., Enver T. Single-cell entropy for accurate estimation of differentiation potency from a cell’s transcriptome. Nat. Commun. 2017;8:15599. doi: 10.1038/ncomms15599.
  • 182.Jin S., MacLean A., Peng T., Nie Q. scEpath: Energy landscape-based inference of transition probabilities and cellular trajectories from single-cell transcriptomic data. Bioinformatics. 2018;34:2077–2086. doi: 10.1093/bioinformatics/bty058.
  • 183.Thom R. Stabilité structurelle et morphogenèse. Poetics. 1974;3:7–19. doi: 10.1016/0304-422X(74)90010-2.
  • 184.Linsker R. Self-organization in a perceptual network. Computer. 1988;21:105–117. doi: 10.1109/2.36.
  • 185.Nadal J.P., Parga N. Sensory coding: information maximization and redundancy reduction. Proceedings of the Neuronal Information Processing; Cargèse, France, 30 June–12 July 1997; World Scientific: Singapore, 1999; pp. 164–171.
  • 186.Bell A.J., Sejnowski T. An information maximisation approach to blind separation and blind deconvolution. Neural Comput. 1995;7:1129–1159. doi: 10.1162/neco.1995.7.6.1129.
  • 187.Chen N., Glazier J., Izaguirre J., Alber M. A parallel implementation of the Cellular Potts Model for simulation of cell-based morphogenesis. Comput. Phys. Commun. 2007;176:670–681. doi: 10.1016/j.cpc.2007.03.007.
  • 188.Galvan A. Neural plasticity of development and learning. Hum. Brain Mapp. 2010;31:879–890. doi: 10.1002/hbm.21029.

Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)