Abstract
Recent progress in the theory and applications of statistical models with generalized exponential functions in statistical science has stimulated efforts to deform the standard structure of information geometry. For this purpose, various representing functions play central roles. In this paper, we consider two important notions in information geometry, i.e., invariance and dual flatness, from the viewpoint of representing functions. We first characterize a pair of representing functions that realizes the invariant geometry by solving a system of ordinary differential equations. Next, by proposing a new transformation technique, i.e., conformal flattening, we construct dually flat geometries from a certain class of non-flat geometries. Finally, we apply the results to demonstrate several properties of gradient flows on the probability simplex.
Keywords: representing functions, affine immersion, nonextensive statistical physics, invariance, dually flat structure, Legendre conjugate, gradient flow
1. Introduction
The theory of information geometry has elucidated abundant geometric properties of manifolds equipped with a Riemannian metric and mutually dual affine connections. When it is applied to the study of statistical models described by the exponential family, the logarithmic function plays a significant role in giving the standard information geometric structure to the models [1,2].
Inspired by recent progress in several areas of statistical physics and mathematical statistics [3,4,5,6,7,8,9,10], which has raised theoretical interest in and possible applications of generalized exponential families, one research direction in information geometry aims to construct deformed geometries based on the standard one while keeping its basic properties. A typical and classical example of such a deformation is the alpha-geometry [1,2], a statistical definition of which can be regarded as a replacement of the logarithmic function by suitable power functions. Hence, for the purposes of generalization and flexible applicability, much attention has been paid to the use of representing functions for such replacements [3,4,11,12].
Two major characteristics of the standard structure are dual flatness and invariance [2]. Dual flatness (or Hessian structure [13]) produces fruitful properties such as the existence of canonical coordinate systems, a pair of conjugate potential functions and the canonical divergence (relative entropy). In addition, they are connected by the Legendre duality relation, which is also fundamental in the generalization of statistical physics. On the other hand, the invariance of geometric structure is crucially valuable in developing mathematical statistics. It has been proved [14] that invariance holds only for the structure with a special triple of a Riemannian metric and a pair of mutually dual affine connections, which are respectively called the Fisher information and the alpha-connections (see Section 3 for their definitions). The study of these two characteristics from the viewpoint of representing functions would contribute to our geometrical understanding.
In this paper, we first characterize a pair of representing functions that realizes the invariant information geometric structure. Next, we propose a new transformation to obtain dually flat geometries from a certain class of non-flat information geometries, using concepts from affine differential geometry [15,16]. We call the transformation conformal flattening, which is a generalization of the way to realize the corresponding dually flat geometry from the alpha-geometry developed in [17,18]. As applications and easy consequences of the results, we finally show several properties of gradient flows associated with realized dually flat geometries. Focusing on geometric characteristics conserved by the transformation, we discuss the properties such as a relation between geodesics and flows, the first integral of the flows and so on. These properties are new and general. Hence, they refine the arguments of the flows in [18], where only the alpha-geometry is treated.
The paper is organized as follows. In Section 2, we introduce preliminary results, explaining several existing methods to construct information geometric structures such as a dually flat structure and the alpha-structure. We also give a short summary of concepts from affine differential geometry, which will be used in this paper. Section 3 provides a characterization of representing functions that realize invariant geometry, i.e., the one equipped with the Fisher information and a pair of the alpha-connections. The characterization is obtained by solving a simple system of ordinary differential equations. In Section 4, we first obtain a certain class of information geometric structures by regarding representing functions as immersions into an ambient affine space. Then, we demonstrate the conformal flattening to realize the corresponding dually flat structure, and discuss their properties and relations with generalized entropies or escort probabilities [19]. Section 5 exhibits the geometric properties of gradient flows with respect to a conformally realized Riemannian metric. These flows reduce to the well-known replicator flow [20] (Chapter 16) when we consider the standard information geometry. By suitably choosing its pay-off functions, we see that the flow follows a geodesic curve or conserves a divergence from an equilibrium. In the final section, some concluding remarks are made.
Throughout the paper, we use a probability simplex as a statistical model for the sake of simplicity.
2. Preliminaries
2.1. Information Geometry of the Probability Simplex and the Positive Orthant
Let us represent an element with its components as . Denote, respectively, the positive orthant by
and the relative interior of the probability simplex by
Let be a probability distribution of a random variable X taking a value in the finite sample space . We consider a set of distributions with positive probabilities, i.e., , defined by
which is identified with . A statistical model in is represented with parameters by
where each is smoothly parametrized by . For such a statistical model, can also be regarded as coordinates of the corresponding submanifold in . For simplicity, we shall consider the full model, i.e., and the parameter set is bijective with via ’s.
The information geometric structure [2] on denoted by is composed of the pair of mutually dual torsion-free affine connections ∇ and with respect to a Riemannian metric g. If we write , the mutual duality requires components of to satisfy
| (1) |
Let L and M be a pair of strictly monotone (i.e., one-to-one) smooth functions on the interval . One way of constructing such a structure is to define the components as follows [2,11]:
| (2) |
| (3) |
| (4) |
In this paper, we call L and M representing functions. It is easy to verify the mutual duality (1). (Positive definiteness of g needs additional conditions.)
When the curvature tensors of both ∇ and vanish, is called dually flat [2]. It is known that is dually flat if and only if there exist two special coordinate systems denoted by and , respectively, where is ∇-affine, is -affine and they are biorthogonal, i.e.,
We give examples. For a real number , define and . If we set and , then they derive the alpha-structure [2] , where is the Fisher information and are the alpha-connections (see Section 3). In particular, if we choose , it defines the standard dually flat structure , where and are called the e- and m-connection, respectively [2]. Similarly, the -log geometry [3] can also be introduced in the same way by taking and .
One traditional way to construct a general information geometric structure , without using representing functions, is by means of contrast functions (or divergences) [2,21]. In our case, let be a function on satisfying with equality if and only if . For a vector field , let denote its tangent vector at p. When we define
| (5) |
| (6) |
| (7) |
we can confirm that (1) holds. If g is positive definite, we say that is a contrast function or a divergence that induces the structure .
A contrast function of the form:
| (8) |
always induces the corresponding dually flat structure. Conversely, it is known [2] that if is dually flat, then there exists the unique contrast function of the form (8) that induces the structure. Hence, it is called the canonical divergence of and we say that the functions and are potentials. By setting , we see that a dually flat structure naturally gives the Legendre duality relations at each p, i.e., the function , is the Legendre conjugate of satisfying
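As a numerical illustration of the Legendre duality described above, the following minimal sketch uses the standard dually flat structure on the simplex. The concrete choices here are assumptions taken from the usual exponential-family setting in [2] rather than from the formulas of this section: coordinates θ^i = log(p_i/p_{n+1}) and η_i = p_i, potentials ψ = −log p_{n+1} and φ = Σ p_i log p_i, and a canonical divergence of the form ψ(p) + φ(r) − Σ θ^i(p) η_i(r). All function names are hypothetical.

```python
import math

def theta(p):
    # Assumed natural coordinates: theta^i = log(p_i / p_{n+1}), i = 1..n
    return [math.log(pi / p[-1]) for pi in p[:-1]]

def eta(p):
    # Assumed expectation coordinates: eta_i = p_i, i = 1..n
    return list(p[:-1])

def psi(p):
    # Assumed potential: psi = -log p_{n+1}
    return -math.log(p[-1])

def phi(p):
    # Assumed conjugate potential: negative Shannon entropy
    return sum(pi * math.log(pi) for pi in p)

def canonical_divergence(p, r):
    # Assumed form: psi(p) + phi(r) - sum_i theta^i(p) * eta_i(r)
    pairing = sum(t * e for t, e in zip(theta(p), eta(r)))
    return psi(p) + phi(r) - pairing

def kl(r, p):
    # Relative entropy sum_i r_i * log(r_i / p_i)
    return sum(ri * math.log(ri / pi) for ri, pi in zip(r, p))

p = [0.2, 0.3, 0.5]
r = [0.6, 0.1, 0.3]

# Legendre relation at a single point: psi + phi - <theta, eta> = 0
legendre_gap = canonical_divergence(p, p)
# Between two points the divergence agrees with the KL divergence
divergence_gap = canonical_divergence(p, r) - kl(r, p)
```

Under these assumptions both gaps vanish up to rounding, which is the sense in which the relative entropy plays the role of the canonical divergence of the standard structure.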
Another way to construct the information geometric structure is to apply ideas from affine hypersurface theory [15]. Let D be the canonical flat affine connection on . Consider an immersion f from into and a vector field on that is transversal to the hypersurface in . Such a pair , called an affine immersion, defines a torsion-free connection ∇ and the affine fundamental form g on via the Gauss formula as
| (9) |
where is the set of tangent vector fields on and denotes the differential of f. By regarding g as a (pseudo-) Riemannian metric, one can discuss the realized structure on .
We say that is non-degenerate and equiaffine if g is non-degenerate and is tangent to for any , respectively. The latter ensures that the volume element on defined by
is parallel to ∇ [15] (p.31). It is known [15,16] that there exists a torsion-free dual affine connection satisfying (1) if and only if is non-degenerate and equiaffine. In this case, the obtained structure on is not dually flat in general. However, there always exist a positive function and a dually flat structure on that satisfy the following relations [16]:
| (10) |
| (11) |
| (12) |
Furthermore, there exists a specific contrast function for called the geometric divergence. Then, a contrast function that induces is given by the conformal divergence . These properties of the structure realized by the non-degenerate and equiaffine immersion are called 1-conformal flatness [16].
3. Characterization of Invariant Geometry by Representing Functions
Suppose that a pair of representing functions defines an information geometric structure by (2), (3) and (4). In this section, we consider the condition of such that is invariant. This is equivalent [2,14] to g which is the Fisher information defined by
| (13) |
and a pair of dual connections satisfies and for a certain , where is the -connection defined by
| (14) |
Hence, expressed in (2) by functions and coincides with the Fisher information if and only if the following equation holds:
| (15) |
Similarly, we derive a condition for expressed in (3) to be the -connection. First, note that the following relations hold:
| (16) |
On the other hand, we have
| (17) |
| (18) |
Substituting (16), (17) and (18) into (3) and (14), and comparing them, we obtain (15) again and
| (19) |
Expressing and , we have the following ODE from (15) and (19):
| (20) |
By integrations, we get
| (21) |
and
| (22) |
where c and are constants with a constraint . Thus, is essentially a pair of representing functions that derives the alpha-geometry and there is only freedom of adjusting the constants for the invariance of geometry. If we require solely (15), which implies that only a Riemannian metric g is the Fisher information , there still remains much freedom for .
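The characterization above can be checked numerically. The sketch below assumes that condition (15) takes the form L'(u) M'(u) = 1/u, and that the alpha-representations are the power functions (2/(1−α)) u^{(1−α)/2} used in [2], with the logarithm as the limiting case. Both are assumptions made for illustration, and the helper names are hypothetical. The final rescaling check is one reading of the constraint on the constants, and is likewise an assumption.

```python
import math

def alpha_rep(alpha):
    # Assumed alpha-representation: (2 / (1 - alpha)) * u ** ((1 - alpha) / 2),
    # with the logarithm as the alpha = 1 case.
    if alpha == 1:
        return math.log
    k = (1.0 - alpha) / 2.0
    return lambda u: u ** k / k

def derivative(f, u, h=1e-5):
    # Central finite-difference approximation of f'(u)
    return (f(u + h) - f(u - h)) / (2.0 * h)

def metric_condition_residual(L, M, u):
    # Residual of the assumed condition u * L'(u) * M'(u) = 1
    return u * derivative(L, u) * derivative(M, u) - 1.0

alpha = 0.4
L = alpha_rep(alpha)     # alpha-representation
M = alpha_rep(-alpha)    # (-alpha)-representation

residuals = [metric_condition_residual(L, M, u) for u in (0.1, 0.3, 0.7)]

# Rescaling L by c and M by 1/c leaves the assumed condition intact
c = 3.0
scaled = metric_condition_residual(lambda u: c * L(u), lambda u: M(u) / c, 0.3)
```

The residuals are zero up to finite-difference error, which illustrates why the only remaining freedom lies in the constants.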
4. Affine Immersion of the Probability Simplex
Now we consider the affine immersion with the following assumptions.
Assumptions:
1. The affine immersion is nondegenerate and equiaffine,
2. The immersion f is given by the component-by-component and common representing function L, i.e.,
3. The representing function is sign-definite, concave with and strictly increasing, i.e., . Hence, the inverse of L denoted by E exists, i.e., .
4. Each component of satisfies on .
Remark 1.
From assumption 3, it follows that , and . Regarding sign-definiteness of L, note that we can adjust to by a suitable constant c without loss of generality since the resultant geometric structure is unchanged by the adjustment (see Theorem 1). For a fixed L satisfying assumption 3, we can choose ξ that meets assumptions 1 and 4. For example, if we take , then is called centro-affine, which is known to be equiaffine [15] (p.37). Assumptions 3 and 4 also ensure positive definiteness of g (the details are described in the proof of Theorem 1). Hence, is non-degenerate and we can regard g as a Riemannian metric on .
4.1. Conormal Vector and the Geometric Divergence
Define a function on by
then immersed in is expressed as a level surface of . Denote by the dual space of and by the pairing of and . The conormal vector [15] (p.57) for the affine immersion is defined by
| (23) |
for . Using the assumptions and noting the relations:
we have
| (24) |
where is a normalizing factor defined by
| (25) |
Then, we can confirm (23) using the relation for . Note that defined by
also satisfies
| (26) |
Furthermore, it follows, from (24), (25) and the assumption 4, that
for all .
It is known [15] (p.57) that the affine fundamental form g can be represented by
In our case, it is calculated via (26) as
| (27) |
Hence, g is positive definite from the assumptions 3 and 4, and we can regard it as a Riemannian metric.
Utilizing these notions from affine differential geometry, we can introduce a geometric divergence [16] as follows:
| (28) |
It is easily checked that is actually a contrast function of the 1-conformally flat structure using (5), (6) and (7).
4.2. Conformal Flattening Transformation
As described in the preliminary section, by 1-conformal flatness there exists a positive function, i.e., conformal factor that relates with a dually flat structure via the conformal transformation (10), (11) and (12). A contrast function that induces is given as the conformal divergence:
| (29) |
from the geometric divergence in (28).
For an arbitrary function L within our setting given by the four assumptions, we prove that we can construct a dually flat structure by choosing the conformal factor carefully. Hereafter, we call this transformation conformal flattening.
Define
then it is negative because each is negative. The conformal divergence to with respect to the conformal factor is
Theorem 1.
If the conformal factor is , then the information geometric structure on that is transformed from the 1-conformally flat structure via (10), (11) and (12) is dually flat. Furthermore, the conformal divergence that induces on is canonical where Legendre conjugate potential functions and coordinate systems are explicitly given by
(30)
(31)
(32)
(33)
Proof.
Using given relations, we first show that the conformal divergence is the canonical divergence for :
(34)
Next, let us confirm that .
Since , we have
by setting . Hence, we have
Differentiating by , we obtain
This implies that
Together with (34) and this relation, is confirmed to be the Legendre conjugate of .
The dual relation follows automatically from the property of the Legendre transform. □
The following corollary is straightforward because all the quantities in the theorem depend only on L:
Corollary 1.
Under the assumptions, the dually flat structure on , obtained by following the above conformal flattening, does not depend on the choice of the transversal vector ξ.
Remark 2.
Note that the conformal metric is given by and is positive definite. Furthermore, the relation (12) means that the dual affine connections and are projectively (or -1-conformally) equivalent [15,16]. Hence, is projectively flat. Furthermore, the above corollary implies that the realized affine connection ∇ is also projectively equivalent to the flat connection if we use the centro-affine immersion, i.e., [15,16]. See Proposition 3 for an application of projective equivalence of affine connections.
Remark 3.
In our setting, conformal flattening is geometrically regarded as normalization of the conormal vector ν. Hence, the dual coordinates can be interpreted as a generalization of the escort probability [10,19] (see the following example). Similarly, ψ and might be seen as the associated Massieu function and entropy, respectively.
Remark 4.
While the immersion f is composed of a representing function L under assumption 2, the corresponding M of a single variable does not generally exist for nor . From the expressions of the Riemannian metrics g in (27) and , we see that the counterparts of the representing functions would be, respectively, and , but note that they are multi-variable functions of .
4.3. Examples
If we take L to be the logarithmic function , then the conformally flattened geometry immediately defines the standard dually flat structure on the simplex . We see that is the entropy, i.e., and the conformal divergence is the KL divergence (relative entropy), i.e., .
Next, let the affine immersion be defined by the following L and :
and
with and . We see that the immersion is centro-affine scaled by the constant factor . Then, we see that the immersion realizes the alpha-structure on with . The geometric divergence is the alpha-divergence, i.e.,
Following the procedure of conformal flattening described above, we have [17]
and obtain a dually flat structure via the formulas in Theorem 1:
Here, and are the q-logarithmic function and the Tsallis entropy [10], respectively, defined by
Note that the escort probability appears as the dual coordinate .
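The appearance of the escort probability as a dual coordinate can be illustrated numerically. The following minimal sketch assumes the q-logarithm in the form (t^{1−q} − 1)/(1−q) as the representing function L, affine coordinates θ^i = L(p_i) − L(p_{n+1}), the potential ψ = −L(p_{n+1}), and dual coordinates proportional to 1/L'(p_i), i.e., the escort probabilities p_i^q / Σ_j p_j^q. These forms are assumptions chosen to be consistent with Remark 3 and this example, and the function names are hypothetical. The code checks the first-order Legendre relation dψ = Σ η_i dθ^i along a direction tangent to the simplex.

```python
import math

def ln_q(t, q):
    # q-logarithm, assumed in the form (t^(1-q) - 1) / (1 - q), q != 1
    return (t ** (1.0 - q) - 1.0) / (1.0 - q)

def escort(p, q):
    # Escort probabilities p_i^q / sum_j p_j^q
    w = [pi ** q for pi in p]
    z = sum(w)
    return [wi / z for wi in w]

def theta(p, q):
    # Assumed affine coordinates: theta^i = L(p_i) - L(p_{n+1})
    return [ln_q(pi, q) - ln_q(p[-1], q) for pi in p[:-1]]

def psi(p, q):
    # Assumed potential: psi = -L(p_{n+1})
    return -ln_q(p[-1], q)

q = 0.5
p = [0.2, 0.3, 0.5]
v = [0.7, -0.2, -0.5]      # tangent direction: components sum to zero
eps = 1e-4

p_plus = [pi + eps * vi for pi, vi in zip(p, v)]
p_minus = [pi - eps * vi for pi, vi in zip(p, v)]

# Central differences of psi and theta along the direction v
d_psi = (psi(p_plus, q) - psi(p_minus, q)) / (2.0 * eps)
d_theta = [(a - b) / (2.0 * eps)
           for a, b in zip(theta(p_plus, q), theta(p_minus, q))]

# Legendre relation d(psi) = sum_i eta_i * d(theta^i), with eta = escort
eta = escort(p, q)[:-1]
legendre_residual = d_psi - sum(e * dt for e, dt in zip(eta, d_theta))
```

The residual vanishes up to finite-difference error for any tangent direction v, which is what one expects if the escort probabilities are the gradient of ψ with respect to θ.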
5. An Application to Gradient Flows on the Probability Simplex
Recall the replicator flow on the simplex for given functions defined by
| (35) |
which is extensively studied in evolutionary game theory. It is known [20] (Chapter 16) that
(i) the solution to (35) is the gradient flow that maximizes a function satisfying
| (36) |
with respect to the Shahshahani metric (see below),
(ii) the KL divergence is a local Lyapunov function for an equilibrium called the evolutionary stable state (ESS) for the case of with .
The Shahshahani metric is defined on the positive orthant by
Note that the Shahshahani metric induces the Fisher metric on . Further, the KL divergence is the canonical divergence [2] of . Thus, the replicator dynamics (35) are closely related to the standard dually flat structure , which is associated with exponential and mixture families of probability distributions. In addition, investigation of the flow is also important from the viewpoint of statistical physics governed by the Boltzmann–Gibbs distributions when we choose as various physical quantities, e.g., free energy or entropy.
Similarly, when we consider various Legendre relations deformed by L, it would be of interest to investigate gradient flows on for a dually flat structure or a 1-conformally flat structure . Since g and can be naturally extended to as a diagonal form (we use the same notation for brevity):
| (37) |
from (27), we can define two gradient flows for on . One is the gradient flow for g, which is
| (38) |
for . It is verified that is tangent to , i.e., and gradient of V, i.e.,
In the same way, the other one for is defined by
| (39) |
Note that both the flows reduce to (35) when .
From (37), the following consequence is immediate:
Proposition 1.
The trajectories of the gradient flow (38) and (39) starting from the same initial point coincide while velocities of time-evolutions are different by the factor-.
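The reason behind this proposition can be written out in one line. The sketch below is a generic argument under the assumption that the two metrics are conformally related as $h = \sigma g$ with a positive function $\sigma$; the symbols $\operatorname{grad}_g V$ and $\operatorname{grad}_h V$ for the two gradients are introduced here for illustration.

```latex
% Assumption: h = \sigma g with a positive function \sigma on the simplex.
\begin{align*}
h(\operatorname{grad}_h V, X) &= dV(X) = g(\operatorname{grad}_g V, X)
  \quad \text{for all tangent vectors } X, \\
\sigma\, g(\operatorname{grad}_h V, X) &= g(\operatorname{grad}_g V, X)
  \;\Longrightarrow\;
  \operatorname{grad}_h V = \frac{1}{\sigma}\, \operatorname{grad}_g V .
\end{align*}
```

Since the two vector fields differ only by a positive scalar factor at each point, their integral curves trace the same trajectories with different time parametrizations.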
Taking account of the example with respect to the alpha-geometry and the conformally flattened one given in subsection 4.3, the following result shown in [18] can be regarded as a corollary of the above proposition:
Corollary 2.
The trajectories of the gradient flow (39) with respect to the conformal metric for coincide with those of the replicator flow (35) while velocities of time-evolutions are different by the factor-.
Next, we particularly consider the case when is a potential function or divergences. As for a gradient flow on a manifold equipped with a dually flat structure , the following result is known:
Proposition 2.
[22] Consider the potential function and the canonical divergence of for an arbitrary prefixed point r. The gradient flows for and follow -geodesic curves.
As is described in Remark 2, and are projectively equivalent. One geometrically interesting property of the projective equivalence is that - and - geodesic curves coincide up to their parametrizations (i.e., a curve is -pregeodesic if and only if it is -pregeodesic) [15] (p.17). Combining this fact with Propositions 1 and 2, we see that the following result holds:
Proposition 3.
Let be an arbitrary prefixed point. The gradient flows (38) for , and follow -geodesic curves.
Finally, we demonstrate here another aspect of the flow (39). Let us particularly consider the following functions :
| (40) |
Note that s are not integrable, i.e., non-trivial V satisfying (36) does not exist because of the anti-symmetry of . Hence, for this case, (39) is no longer a gradient flow. However, we can prove the following result:
Theorem 2.
Consider the flow (39) with the functions s defined in (40) and assume that there exists an equilibrium for the flow. Then, and are first integrals (conserved quantities) of the flow.
Proof.
By substituting (40) into in (39) and using the expression of in (37), we have
By the relation and (31), it holds that
Hence, we see that and the flow (39) reduces to
(41)
Since r is an equilibrium point, we see from (40) that
(42)
Then, using (34), (41) and (42), we have
Thus, is a first integral of the flow. It follows that is also a first integral of the flow from the definition of conformal divergence (29). □
Remark 5.
From Proposition 1, the same statement holds for the flow (38). The proposition implies the fact [20] that the KL divergence is the first integral for the replicator flow (35) with the function in (40) defined by and .
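The conservation of the KL divergence along the replicator flow can be observed in a small simulation. The sketch below assumes the standard form of the replicator equation from [20], dp_i/dt = p_i (f_i(p) − Σ_j p_j f_j(p)), with linear pay-off functions f_i(p) = Σ_j a_ij p_j for an antisymmetric matrix; the rock-paper-scissors matrix, the initial point, the step size and the function names are illustrative choices that are not taken from the text.

```python
import math

# Antisymmetric pay-off matrix (rock-paper-scissors), a_ij = -a_ji
A = [[0.0, -1.0, 1.0],
     [1.0, 0.0, -1.0],
     [-1.0, 1.0, 0.0]]
r = [1.0 / 3.0] * 3        # interior equilibrium, since A r = 0

def replicator(p):
    # Assumed standard form: dp_i/dt = p_i * (f_i(p) - sum_j p_j f_j(p))
    f = [sum(a * pj for a, pj in zip(row, p)) for row in A]
    mean = sum(pi * fi for pi, fi in zip(p, f))
    return [pi * (fi - mean) for pi, fi in zip(p, f)]

def rk4_step(p, dt):
    # One classical fourth-order Runge-Kutta step
    k1 = replicator(p)
    k2 = replicator([pi + 0.5 * dt * k for pi, k in zip(p, k1)])
    k3 = replicator([pi + 0.5 * dt * k for pi, k in zip(p, k2)])
    k4 = replicator([pi + dt * k for pi, k in zip(p, k3)])
    return [pi + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for pi, a, b, c, d in zip(p, k1, k2, k3, k4)]

def kl(r, p):
    # Relative entropy sum_i r_i * log(r_i / p_i)
    return sum(ri * math.log(ri / pi) for ri, pi in zip(r, p))

p = [0.5, 0.3, 0.2]
kl_start = kl(r, p)
for _ in range(2000):
    p = rk4_step(p, 0.005)
kl_end = kl(r, p)
drift = abs(kl_end - kl_start)
```

In this setting the state circulates around the equilibrium while the divergence from r stays constant up to the integration error, so `drift` remains small.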
6. Conclusions
We have considered two important aspects of information geometric structure, i.e., invariance and dual flatness, from the viewpoint of representing functions. As for the invariance of geometry, we have proved that a pair of representing functions that derives the alpha-structure is essentially unique. On the other hand, we have shown the explicit formula of conformal flattening that transforms 1-conformally flat structures on the simplex realized by affine immersions to the corresponding dually flat structures. Finally, we have discussed several geometric properties of gradient flows associated with the two structures.
Presently, our analysis is restricted to the probability simplex, i.e., the space of discrete probability distributions. For the continuous case, similar or related results are obtained in [23,24] without using affine immersions. Extensions of the results obtained in this paper to continuous probability spaces and the exploitation of relations to the literature are left for future work.
The conformal flattening can also be applied to the computationally efficient construction of a Voronoi diagram with respect to the geometric divergences [18]. Exploring the possibilities of other applications would be of interest.
Acknowledgments
Part of the results is adapted and reprinted with permission from Springer Customer Service Centre GmbH (licence No: 4294160782766): Springer Nature, Geometric Science of Information LNCS 10589 Nielsen, F., Barbaresco, F., Eds., (Article Name:) On affine immersions of the probability simplex and their conformal flattening, (Author:) A. Ohara, (Copyright:) Springer International Publishing AG 2017 [25]. The author is partially supported by JSPS Grant-in-Aid (C) 15K04997.
Conflicts of Interest
The author declares no conflict of interest.
References
1. Amari S.I. Differential-Geometrical Methods in Statistics. Springer; New York, NY, USA: 1985. (Lecture Notes in Statistics Series 28).
2. Amari S.I., Nagaoka H. Methods of Information Geometry. AMS & Oxford University Press; Oxford, UK: 2000. (Translations of Mathematical Monographs Series 191).
3. Naudts J. Continuity of a class of entropies and relative entropies. Rev. Math. Phys. 2004;16:809. doi: 10.1142/S0129055X04002151.
4. Eguchi S. Information geometry and statistical pattern recognition. Sugaku Expos. 2006;19:197.
5. Grünwald P.D., Dawid A.P. Game theory, maximum entropy, minimum discrepancy, and robust Bayesian decision theory. Ann. Statist. 2004;32:1367.
6. Fujisawa H., Eguchi S. Robust parameter estimation with a small bias against heavy contamination. J. Multivar. Anal. 2008;99:2053. doi: 10.1016/j.jmva.2008.02.004.
7. Naudts J. The q-exponential family in statistical physics. Cent. Eur. J. Phys. 2009;7:405. doi: 10.2478/s11534-008-0150-x.
8. Naudts J. Generalized Thermostatics. Springer; Berlin, Germany: 2010.
9. Ollila E., Tyler D., Koivunen V., Poor V. Complex elliptically symmetric distributions: Survey, new results and applications. IEEE Trans. Sig. Proc. 2012;60:5597. doi: 10.1109/TSP.2012.2212433.
10. Tsallis C. Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World. Springer; Berlin, Germany: 2009.
11. Zhang J. Divergence Function, Duality, and Convex Analysis. Neural Comput. 2004;16:159. doi: 10.1162/08997660460734047.
12. Wada T., Matsuzoe H. Conjugate representations and characterizing escort expectations in information geometry. Entropy. 2017;19:309. doi: 10.3390/e19070309.
13. Shima H. The Geometry of Hessian Structures. World Scientific; Singapore: 2007.
14. Chentsov N.N. Statistical Decision Rules and Optimal Inference. AMS; Providence, RI, USA: 1982.
15. Nomizu K., Sasaki T. Affine Differential Geometry. Cambridge University Press; Cambridge, UK: 1993.
16. Kurose T. On the divergences of 1-conformally flat statistical manifolds. Tohoku Math. J. 1994;46:427. doi: 10.2748/tmj/1178225722.
17. Ohara A., Matsuzoe H., Amari S.I. A dually flat structure on the space of escort distributions. J. Phys. Conf. Ser. 2010;201:012012. doi: 10.1088/1742-6596/201/1/012012.
18. Ohara A., Matsuzoe H., Amari S.I. Conformal geometry of escort probability and its applications. Mod. Phys. Lett. B. 2012;26:1250063. doi: 10.1142/S0217984912500637.
19. Tsallis C., Mendes M.S., Plastino A.R. The role of constraints within generalized nonextensive statistics. Physica A. 1998;261:534. doi: 10.1016/S0378-4371(98)00437-3.
20. Hofbauer J., Sigmund K. The Theory of Evolution and Dynamical Systems: Mathematical Aspects of Selection. Cambridge University Press; Cambridge, UK: 1988.
21. Eguchi S. Geometry of minimum contrast. Hiroshima Math. J. 1992;22:631.
22. Fujiwara A., Amari S.I. Gradient systems in view of information geometry. Physica D. 1995;80:317. doi: 10.1016/0167-2789(94)00175-P.
23. Amari S.I., Ohara A., Matsuzoe H. Geometry of deformed exponential families: Invariant, dually-flat and conformal geometries. Physica A. 2012;391:4308. doi: 10.1016/j.physa.2012.04.016.
24. Matsuzoe H. Hessian structures on deformed exponential families and their conformal structures. Diff. Geo. Appl. 2014;35:323. doi: 10.1016/j.difgeo.2014.06.003.
25. Ohara A. On affine immersions of the probability simplex and their conformal flattening. In: Nielsen F., Barbaresco F., editors. Geometric Science of Information. Springer; Berlin, Germany: 2017.
