Entropy. 2018 Mar 10;20(3):186. doi: 10.3390/e20030186

Conformal Flattening for Deformed Information Geometries on the Probability Simplex

Atsumi Ohara 1
PMCID: PMC7512704  PMID: 33265277

Abstract

Recent progress in the theory and applications of statistical models with generalized exponential functions has motivated deformations of the standard structure of information geometry. In such deformations, various representing functions play central roles. In this paper, we consider two important notions in information geometry, i.e., invariance and dual flatness, from the viewpoint of representing functions. We first characterize a pair of representing functions that realizes the invariant geometry by solving a system of ordinary differential equations. Next, by proposing a new transformation technique, i.e., conformal flattening, we construct dually flat geometries from a certain class of non-flat geometries. Finally, we apply the results to demonstrate several properties of gradient flows on the probability simplex.

Keywords: representing functions, affine immersion, nonextensive statistical physics, invariance, dually flat structure, Legendre conjugate, gradient flow

1. Introduction

The theory of information geometry has elucidated abundant geometric properties of manifolds equipped with a Riemannian metric and mutually dual affine connections. When it is applied to the study of statistical models described by the exponential family, the logarithmic function plays a significant role in giving the standard information geometric structure to the models [1,2].

Inspired by the recent progress of several areas in statistical physics and mathematical statistics [3,4,5,6,7,8,9,10] which have exploited theoretical interests and possible applications for generalized exponential families, one research direction in information geometry is pointing to constructions of deformed geometries based on the standard one, keeping its basic properties. A typical and classical example of such a deformation would be the alpha-geometry [1,2], a statistical definition of which can be regarded as a replacement of the logarithmic function by suitable power functions. Hence, for the purpose of the generalization and flexible applicability, much attention is paid to various uses of such replacements by representing functions as important tools [3,4,11,12].

Two major characteristics of the standard structure are dual flatness and invariance [2]. Dual flatness (or Hessian structure [13]) produces fruitful properties such as the existence of canonical coordinate systems, a pair of conjugate potential functions and the canonical divergence (relative entropy). In addition, they are connected with the Legendre duality relation, which is also fundamental in the generalization of statistical physics. On the other hand, the invariance of geometric structure is crucially valuable in developing mathematical statistics. It has been proved [14] that invariance holds for only the structure with a special triple of a Riemannian metric and a pair of mutually dual affine connections, which are respectively called the Fisher information and the alpha-connections (see Section 3 for their definitions). The study of these two characteristics from a viewpoint of representing functions would contribute to our geometrical understanding.

In this paper, we first characterize a pair of representing functions that realizes the invariant information geometric structure. Next, we propose a new transformation to obtain dually flat geometries from a certain class of non-flat information geometries, using concepts from affine differential geometry [15,16]. We call the transformation conformal flattening, which is a generalization of the way to realize the corresponding dually flat geometry from the alpha-geometry developed in [17,18]. As applications and easy consequences of the results, we finally show several properties of gradient flows associated with realized dually flat geometries. Focusing on geometric characteristics conserved by the transformation, we discuss the properties such as a relation between geodesics and flows, the first integral of the flows and so on. These properties are new and general. Hence, they refine the arguments of the flows in [18], where only the alpha-geometry is treated.

The paper is organized as follows. In Section 2, we introduce preliminary results, explaining several existing methods to construct information geometric structures, including dually flat structures and the alpha-structure. We also give a short summary of concepts from affine differential geometry, which will be used in this paper. Section 3 provides a characterization of representing functions that realize invariant geometry, i.e., the one equipped with the Fisher information and a pair of the alpha-connections. The characterization is obtained by solving a simple system of ordinary differential equations. In Section 4, we first obtain a certain class of information geometric structures by regarding representing functions as immersions into an ambient affine space. Then, we demonstrate the conformal flattening to realize the corresponding dually flat structure, and discuss its properties and relations with generalized entropies or escort probabilities [19]. Section 5 exhibits the geometric properties of gradient flows with respect to a conformally realized Riemannian metric. These flows reduce to the well-known replicator flow [20] (Chapter 16) when we consider the standard information geometry. Suitably choosing its pay-off functions, we see that the flow follows a geodesic curve or conserves a divergence from an equilibrium. In the final section, some concluding remarks are made.

Throughout the paper, we use a probability simplex as a statistical model for the sake of simplicity.

2. Preliminaries

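2.1. Information Geometry of $\mathcal{S}^n$ and $\mathbb{R}_+^{n+1}$

Let us represent an element $p \in \mathbb{R}^{n+1}$ with its components $p_i$, $i=1,\dots,n+1$, as $p=(p_i) \in \mathbb{R}^{n+1}$. Denote, respectively, the positive orthant by

$$\mathbb{R}_+^{n+1} := \{ p=(p_i) \in \mathbb{R}^{n+1} \mid p_i>0,\ i=1,\dots,n+1 \},$$

and the relative interior of the probability simplex by

$$\mathcal{S}^n := \left\{ p \in \mathbb{R}_+^{n+1} \;\middle|\; \sum_{i=1}^{n+1} p_i = 1 \right\}.$$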

Let $p(X)$ be a probability distribution of a random variable $X$ taking a value in the finite sample space $\Omega=\{1,2,\dots,n,n+1\}$. We consider a set of distributions $p(X)$ with positive probabilities, i.e., $p(i)=p_i>0$, $i=1,\dots,n+1$, defined by

$$p(X)=\sum_{i=1}^{n+1} p_i\,\delta_i(X),\qquad \delta_i(j)=\delta_i^j\ (\text{the Kronecker delta}),$$

which is identified with $\mathcal{S}^n$. A statistical model in $\mathcal{S}^n$ is represented with parameters $\zeta=(\zeta^j)$, $j=1,\dots,d\le n$, by

$$p_\zeta(X)=\sum_{i=1}^{n+1} p_i(\zeta)\,\delta_i(X),$$

where each $p_i$ is smoothly parametrized by $\zeta$. For such a statistical model, $\zeta^j$ can also be regarded as coordinates of the corresponding submanifold in $\mathcal{S}^n$. For simplicity, we shall consider the full model, i.e., $d=n$ and the parameter set is bijective with $\mathcal{S}^n$ via the $p_i(\zeta)$'s.

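The information geometric structure [2] on $\mathcal{S}^n$ denoted by $(g,\nabla,\nabla^*)$ is composed of the pair of mutually dual torsion-free affine connections $\nabla$ and $\nabla^*$ with respect to a Riemannian metric $g$. If we write $\partial_i:=\partial/\partial\zeta^i$, $i=1,\dots,n$, the mutual duality requires the components of $(g,\nabla,\nabla^*)$ to satisfy

$$\partial_i g_{jk}=\Gamma_{ij,k}+\Gamma^*_{ik,j}. \tag{1}$$

Let $L$ and $M$ be a pair of strictly monotone (i.e., one-to-one) smooth functions on the interval $(0,1)$. One way of constructing such a structure $(g,\nabla,\nabla^*)$ is to define the components as follows [2,11]:

$$g_{ij}(p)=\sum_{X\in\Omega}\partial_i L(p_\zeta(X))\,\partial_j M(p_\zeta(X)),\quad i,j=1,\dots,n, \tag{2}$$
$$\Gamma_{ij,k}(p)=\sum_{X\in\Omega}\partial_i\partial_j L(p_\zeta(X))\,\partial_k M(p_\zeta(X)),\quad i,j,k=1,\dots,n, \tag{3}$$
$$\Gamma^*_{ij,k}(p)=\sum_{X\in\Omega}\partial_k L(p_\zeta(X))\,\partial_i\partial_j M(p_\zeta(X)),\quad i,j,k=1,\dots,n. \tag{4}$$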

In this paper, we call L and M representing functions. It is easy to verify the mutual duality (1). (Positive definiteness of g needs additional conditions.)
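The verification of the mutual duality (1) amounts to one application of the product rule. As a brief sketch, using only the definitions (2), (3) and (4) and suppressing the argument $p_\zeta(X)$ in the second line:

```latex
\begin{aligned}
\partial_i g_{jk}
 &= \sum_{X\in\Omega} \partial_i\bigl(\partial_j L(p_\zeta(X))\,\partial_k M(p_\zeta(X))\bigr) \\
 &= \sum_{X\in\Omega} (\partial_i\partial_j L)\,(\partial_k M)
  + \sum_{X\in\Omega} (\partial_j L)\,(\partial_i\partial_k M)
  = \Gamma_{ij,k} + \Gamma^{*}_{ik,j}.
\end{aligned}
```

The second sum is (4) with the index triple $(i,j,k)$ replaced by $(i,k,j)$.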

When the curvature tensors of both $\nabla$ and $\nabla^*$ vanish, $(g,\nabla,\nabla^*)$ is called dually flat [2]. It is known that $(g,\nabla,\nabla^*)$ is dually flat if and only if there exist two special coordinate systems denoted by $\theta^i(p)$ and $\eta_i(p)$, $i=1,\dots,n$, respectively, where $(\theta^i)$ is $\nabla$-affine, $(\eta_i)$ is $\nabla^*$-affine and they are biorthogonal, i.e.,

$$g\left(\frac{\partial}{\partial\theta^i},\frac{\partial}{\partial\eta_j}\right)=\delta_i^j.$$

We give examples. For a real number $\alpha$, define $L^{(\alpha)}(u):=2u^{(1-\alpha)/2}/(1-\alpha)$ and $L^{(1)}(u):=\ln u$. If we set $L(u)=L^{(\alpha)}(u)$ and $M(u)=L^{(-\alpha)}(u)$, then they derive the alpha-structure [2] $(g^F,\nabla^{(\alpha)},\nabla^{(-\alpha)})$, where $g^F$ is the Fisher information and $\nabla^{(\pm\alpha)}$ are the alpha-connections (see Section 3). In particular, if we choose $\alpha=1$, it defines the standard dually flat structure $(g^F,\nabla^{(e)}:=\nabla^{(1)},\nabla^{(m)}:=\nabla^{(-1)})$, where $\nabla^{(e)}$ and $\nabla^{(m)}$ are called the e- and m-connection, respectively [2]. Similarly, the $\phi$-log geometry [3] can also be introduced in the same way by taking $L(u)=\log_\phi(u)$ and $M(u)=u$.

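One traditional way to construct a general information geometric structure $(g,\nabla,\nabla^*)$, without using representing functions, is by means of contrast functions (or divergences) [2,21]. In our case, let $\rho$ be a function on $\mathcal{S}^n\times\mathcal{S}^n$ satisfying $\rho(p,r)\ge 0$, $\forall p,r\in\mathcal{S}^n$, with equality if and only if $p=r$. For a vector field $\partial_i$, let $(\partial_i)_p$ denote its tangent vector at $p$. When we define

$$g_{ij}(p)=-(\partial_i)_p(\partial_j)_r\,\rho(p,r)\big|_{p=r},\quad i,j=1,\dots,n, \tag{5}$$
$$\Gamma_{ij,k}(p)=-(\partial_i)_p(\partial_j)_p(\partial_k)_r\,\rho(p,r)\big|_{p=r},\quad i,j,k=1,\dots,n, \tag{6}$$
$$\Gamma^*_{ij,k}(p)=-(\partial_i)_r(\partial_j)_r(\partial_k)_p\,\rho(p,r)\big|_{p=r},\quad i,j,k=1,\dots,n, \tag{7}$$

we can confirm that (1) holds. If $g$ is positive definite, we say that $\rho$ is a contrast function or a divergence that induces the structure $(g,\nabla,\nabla^*)$.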

A contrast function ρ of the form:

$$\rho(p,r)=\psi(\theta(p))+\varphi(\eta(r))-\sum_{i=1}^{n}\theta^i(p)\,\eta_i(r) \tag{8}$$

always induces the corresponding dually flat structure. Conversely, it is known [2] that if $(g,\nabla,\nabla^*)$ is dually flat, then there exists a unique contrast function of the form (8) that induces the structure. Hence, it is called the canonical divergence of $(g,\nabla,\nabla^*)$ and we say that the functions $\psi$ and $\varphi$ are potentials. By setting $p=r$, we see that a dually flat structure naturally gives the Legendre duality relations at each $p$, i.e., the function $\varphi$ is the Legendre conjugate of $\psi$ satisfying

$$\eta_i=\frac{\partial\psi}{\partial\theta^i},\qquad \theta^i=\frac{\partial\varphi}{\partial\eta_i}.$$

Another way to construct the information geometric structure is to apply affine hypersurface theory [15]. Let $D$ be the canonical flat affine connection on $\mathbb{R}^{n+1}$. Consider an immersion $f$ from $\mathcal{S}^n$ into $\mathbb{R}^{n+1}$ and a vector field $\xi$ on $\mathcal{S}^n$ that is transversal to the hypersurface $f(\mathcal{S}^n)$ in $\mathbb{R}^{n+1}$. Such a pair $(f,\xi)$, called an affine immersion, defines a torsion-free connection $\nabla$ and the affine fundamental form $g$ on $\mathcal{S}^n$ via the Gauss formula as

$$D_X f_*(Y)=f_*(\nabla_X Y)+g(X,Y)\,\xi,\quad X,Y\in\mathcal{X}(\mathcal{S}^n), \tag{9}$$

where $\mathcal{X}(\mathcal{S}^n)$ is the set of tangent vector fields on $\mathcal{S}^n$ and $f_*$ denotes the differential of $f$. By regarding $g$ as a (pseudo-) Riemannian metric, one can discuss the realized structure $(g,\nabla)$ on $\mathcal{S}^n$.

We say that $(f,\xi)$ is non-degenerate and equiaffine if $g$ is non-degenerate and $D_X\xi$ is tangent to $\mathcal{S}^n$ for any $X\in\mathcal{X}(\mathcal{S}^n)$, respectively. The latter ensures that the volume element $\theta$ on $\mathcal{S}^n$ defined by

$$\theta(X_1,\dots,X_n)=\det\bigl(f_*(X_1),\dots,f_*(X_n),\xi\bigr),\quad X_i\in\mathcal{X}(\mathcal{S}^n)$$

is parallel with respect to $\nabla$ [15] (p.31). It is known [15,16] that there exists a torsion-free dual affine connection $\nabla^*$ satisfying (1) if and only if $(f,\xi)$ is non-degenerate and equiaffine. In this case, the obtained structure $(g,\nabla,\nabla^*)$ on $\mathcal{S}^n$ is not dually flat in general. However, there always exist a positive function $\sigma$ and a dually flat structure $(\tilde g,\tilde\nabla,\tilde\nabla^*)$ on $\mathcal{S}^n$ that satisfy the following relations [16]:

$$\tilde g=\sigma g, \tag{10}$$
$$g(\tilde\nabla_X Y,Z)=g(\nabla_X Y,Z)-d(\ln\sigma)(Z)\,g(X,Y), \tag{11}$$
$$g(\tilde\nabla^*_X Y,Z)=g(\nabla^*_X Y,Z)+d(\ln\sigma)(X)\,g(Y,Z)+d(\ln\sigma)(Y)\,g(X,Z). \tag{12}$$

Furthermore, there exists a specific contrast function $\rho(p,r)$ for $(g,\nabla,\nabla^*)$ called the geometric divergence. Then, a contrast function $\tilde\rho(p,r)$ that induces $(\tilde g,\tilde\nabla,\tilde\nabla^*)$ is given by the conformal divergence $\tilde\rho(p,r)=\sigma(r)\rho(p,r)$. These properties of the structure $(g,\nabla)$ realized by the non-degenerate and equiaffine immersion are called 1-conformal flatness [16].

3. Characterization of Invariant Geometry by Representing Functions

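Suppose that a pair of representing functions $(L,M)$ defines an information geometric structure $(g,\nabla,\nabla^*)$ by (2), (3) and (4). In this section, we consider the condition on $(L,M)$ such that $(g,\nabla,\nabla^*)$ is invariant. Invariance is equivalent [2,14] to $g$ being the Fisher information $g^F$ defined by

$$g^F_{ij}(p)=\sum_{X\in\Omega}p_\zeta\,(\partial_i\ln p_\zeta)(\partial_j\ln p_\zeta) \tag{13}$$

and the pair of dual connections satisfying $\nabla=\nabla^{(\alpha)}$ and $\nabla^*=\nabla^{(-\alpha)}$ for a certain $\alpha\in\mathbb{R}$, where $\nabla^{(\alpha)}$ is the $\alpha$-connection defined by

$$\Gamma^{(\alpha)}_{ij,k}=\sum_{X\in\Omega}p_\zeta\left(\partial_i\partial_j\ln p_\zeta+\frac{1-\alpha}{2}(\partial_i\ln p_\zeta)(\partial_j\ln p_\zeta)\right)(\partial_k\ln p_\zeta). \tag{14}$$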

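Hence, $g_{ij}$ expressed in (2) by functions $L(u)$ and $M(u)$ coincides with the Fisher information if and only if the following equation holds:

$$\frac{dL}{du}\frac{dM}{du}=\frac{1}{u}. \tag{15}$$

Similarly, we derive a condition for $\Gamma_{ij,k}$ expressed in (3) to be the $\alpha$-connection. First, note that the following relations hold:

$$\partial_i\ln p_\zeta=(\partial_i p_\zeta)\frac{1}{p_\zeta},\qquad \partial_i\partial_j\ln p_\zeta=(\partial_i\partial_j p_\zeta)\frac{1}{p_\zeta}-(\partial_i p_\zeta)(\partial_j p_\zeta)\frac{1}{p_\zeta^2}. \tag{16}$$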

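On the other hand, we have

$$\partial_i\partial_j L(p_\zeta)=(\partial_i\partial_j p_\zeta)\frac{dL}{du}(p_\zeta)+(\partial_i p_\zeta)(\partial_j p_\zeta)\frac{d^2L}{du^2}(p_\zeta), \tag{17}$$
$$\partial_k M(p_\zeta)=(\partial_k p_\zeta)\frac{dM}{du}(p_\zeta). \tag{18}$$

Substituting (16), (17) and (18) into (3) and (14), and comparing them, we obtain (15) again and

$$\frac{d^2L}{du^2}\frac{dM}{du}=-\frac{1+\alpha}{2u^2}. \tag{19}$$

Writing $L':=dL/du$ and $L'':=d^2L/du^2$, we have the following ODE from (15) and (19):

$$\frac{L''}{L'}=-\frac{1+\alpha}{2u}. \tag{20}$$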

By integration, we get

$$\ln L'=-\frac{1+\alpha}{2}\ln u+c \tag{21}$$

and

$$L(u)=c_1u^{(1-\alpha)/2}+c_2,\qquad M(u)=c_3u^{(1+\alpha)/2}+c_4, \tag{22}$$

where $c$ and $c_i$, $i=1,\dots,4$, are constants with the constraint $c_1c_3=4/(1-\alpha^2)$. Thus, $(L,M)$ is essentially a pair of representing functions that derives the alpha-geometry, and the only remaining freedom for the invariance of geometry is the adjustment of the constants. If we require solely (15), which implies that only the Riemannian metric $g$ is the Fisher information $g^F$, there still remains much freedom for $(L,M)$.
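The conditions (15) and (19) can be checked numerically for the solution (22). The following is a minimal sketch, not part of the original derivation: the helper names `representing_pair`, `d1`, `d2` and `residuals` are hypothetical, the constants $c_2=c_4=0$ are an arbitrary choice, and the derivatives are approximated by central finite differences.

```python
def representing_pair(alpha, c1=1.0):
    """Pair (L, M) of Eq. (22) with c2 = c4 = 0 and c1 * c3 = 4 / (1 - alpha^2)."""
    c3 = 4.0 / ((1.0 - alpha ** 2) * c1)
    L = lambda u: c1 * u ** ((1.0 - alpha) / 2.0)
    M = lambda u: c3 * u ** ((1.0 + alpha) / 2.0)
    return L, M


def d1(f, u, h=1e-5):
    """Central finite-difference approximation of the first derivative."""
    return (f(u + h) - f(u - h)) / (2.0 * h)


def d2(f, u, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (f(u + h) - 2.0 * f(u) + f(u - h)) / (h * h)


def residuals(alpha, u, c1=1.0):
    """Residuals of Eq. (15) and Eq. (19) at u; both should be close to zero."""
    L, M = representing_pair(alpha, c1)
    res15 = d1(L, u) * d1(M, u) - 1.0 / u
    res19 = d2(L, u) * d1(M, u) + (1.0 + alpha) / (2.0 * u * u)
    return res15, res19
```

Calling `residuals(0.3, 0.4)` returns two values that vanish up to finite-difference error; the sketch assumes $\alpha\neq\pm1$, where the constraint on $c_1c_3$ is well defined.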

4. Affine Immersion of the Probability Simplex

Now we consider the affine immersion with the following assumptions.

Assumptions:

  1. The affine immersion $(f,\xi)$ is non-degenerate and equiaffine.

  2. The immersion $f$ is given componentwise by a common representing function $L$, i.e.,
    $$f:\mathcal{S}^n\ni p=(p_i)\mapsto x=(x^i)\in\mathbb{R}^{n+1},\quad x^i=L(p_i),\ i=1,\dots,n+1.$$

  3. The representing function $L:(0,1)\to\mathbb{R}$ is sign-definite, concave with $L''<0$ and strictly increasing, i.e., $L'>0$. Hence, the inverse of $L$ denoted by $E$ exists, i.e., $E\circ L=\mathrm{id}$.

  4. Each component of $\xi$ satisfies $\xi^i<0$, $i=1,\dots,n+1$, on $\mathcal{S}^n$.

Remark 1.

From assumption 3, it follows that $L'E'=1$, $E'>0$ and $E''>0$. Regarding sign-definiteness of $L$, note that we can adjust $L(u)$ to $L(u)+c$ by a suitable constant $c$ without loss of generality since the resultant geometric structure is unchanged by the adjustment (see Theorem 1). For a fixed $L$ satisfying assumption 3, we can choose $\xi$ that meets assumptions 1 and 4. For example, if we take $\xi^i=-|L(p_i)|$, then $(f,\xi)$ is called centro-affine, which is known to be equiaffine [15] (p.37). Assumptions 3 and 4 also assure positive definiteness of $g$ (the details are given in Section 4.1). Hence, $(f,\xi)$ is non-degenerate and we can regard $g$ as a Riemannian metric on $\mathcal{S}^n$.

4.1. Conormal Vector and the Geometric Divergence

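Define a function $\Psi$ on $\mathbb{R}^{n+1}$ by

$$\Psi(x):=\sum_{i=1}^{n+1}E(x^i),$$

then $f(\mathcal{S}^n)$ immersed in $\mathbb{R}^{n+1}$ is expressed as the level surface $\Psi(x)=1$. Denote by $\mathbb{R}_{n+1}$ the dual space of $\mathbb{R}^{n+1}$ and by $\langle\nu,x\rangle$ the pairing of $x\in\mathbb{R}^{n+1}$ and $\nu\in\mathbb{R}_{n+1}$. The conormal vector [15] (p.57) $\nu:\mathcal{S}^n\to\mathbb{R}_{n+1}$ for the affine immersion $(f,\xi)$ is defined by

$$\langle\nu(p),f_*(X)\rangle=0,\ \forall X\in T_p\mathcal{S}^n,\qquad \langle\nu(p),\xi(p)\rangle=1, \tag{23}$$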

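for $p\in\mathcal{S}^n$. Using the assumptions and noting the relations:

$$\frac{\partial\Psi}{\partial x^i}=E'(x^i)=\frac{1}{L'(p_i)}>0,\quad i=1,\dots,n+1,$$

we have

$$\nu_i(p):=\frac{1}{\Lambda}\frac{\partial\Psi}{\partial x^i}=\frac{1}{\Lambda(p)}E'(x^i)=\frac{1}{\Lambda(p)}\frac{1}{L'(p_i)},\quad i=1,\dots,n+1, \tag{24}$$

where $\Lambda$ is a normalizing factor defined by

$$\Lambda(p):=\sum_{i=1}^{n+1}\frac{\partial\Psi}{\partial x^i}\xi^i=\sum_{i=1}^{n+1}\frac{1}{L'(p_i)}\xi^i(p). \tag{25}$$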

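Then, we can confirm (23) using the relation $\sum_{i=1}^{n+1}X^i=0$ for $X=(X^i)\in\mathcal{X}(\mathcal{S}^n)$. Note that $v:\mathcal{S}^n\to\mathbb{R}_{n+1}$ defined by

$$v_i(p)=\Lambda(p)\,\nu_i(p)=\frac{1}{L'(p_i)},\quad i=1,\dots,n+1,$$

also satisfies

$$\langle v(p),f_*(X)\rangle=0,\quad \forall X\in T_p\mathcal{S}^n. \tag{26}$$

Furthermore, it follows, from (24), (25) and assumption 4, that

$$\Lambda(p)<0,\qquad \nu_i(p)<0,\quad i=1,\dots,n+1,$$

for all $p\in\mathcal{S}^n$.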

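It is known [15] (p.57) that the affine fundamental form $g$ can be represented by

$$g(X,Y)=-\langle\nu_*(X),f_*(Y)\rangle,\quad X,Y\in T_p\mathcal{S}^n.$$

In our case, it is calculated via (26) as

$$\begin{aligned}
g(X,Y)&=-\Lambda^{-1}\langle v_*(X),f_*(Y)\rangle-X(\Lambda^{-1})\langle v,f_*(Y)\rangle\\
&=-\frac{1}{\Lambda}\sum_{i=1}^{n+1}\left(\frac{1}{L'(p_i)}\right)'L'(p_i)\,X^iY^i=\frac{1}{\Lambda}\sum_{i=1}^{n+1}\frac{L''(p_i)}{L'(p_i)}X^iY^i,\quad X,Y\in T_p\mathcal{S}^n.
\end{aligned} \tag{27}$$

Hence, $g$ is positive definite from assumptions 3 and 4, and we can regard it as a Riemannian metric.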

Utilizing these notions from affine differential geometry, we can introduce a geometric divergence [16] as follows:

$$\rho(p,r)=\langle\nu(r),f(p)-f(r)\rangle=\sum_{i=1}^{n+1}\nu_i(r)\bigl(L(p_i)-L(r_i)\bigr)=\frac{1}{\Lambda(r)}\sum_{i=1}^{n+1}\frac{L(p_i)-L(r_i)}{L'(r_i)},\quad p,r\in\mathcal{S}^n. \tag{28}$$

It is easily checked that $\rho$ is actually a contrast function of the 1-conformally flat structure $(g,\nabla,\nabla^*)$ using (5), (6) and (7).
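The geometric divergence (28) is straightforward to evaluate once $L$, $L'$ and $\xi$ are fixed. The following is a minimal sketch under stated assumptions: the function name `geometric_divergence` and the argument names are hypothetical, points of $\mathcal{S}^n$ are plain lists of positive floats summing to one, and the usage lines pick $L=\ln$ with $\xi^i=-|L(p_i)|=\ln p_i$ purely as an illustration.

```python
import math


def geometric_divergence(p, r, L, dL, xi):
    """rho(p, r) of Eq. (28), i.e. the pairing <nu(r), f(p) - f(r)>.

    L, dL : representing function and its first derivative on (0, 1).
    xi    : callable returning the components of the transversal vector at a point.
    """
    lam = sum(x / dL(ri) for x, ri in zip(xi(r), r))   # Lambda(r), Eq. (25)
    nu = [1.0 / (lam * dL(ri)) for ri in r]            # conormal vector, Eq. (24)
    return sum(n * (L(pi) - L(ri)) for n, pi, ri in zip(nu, p, r))


# Illustration with L = ln and xi^i = ln p_i (negative on the open simplex).
log_xi = lambda point: [math.log(t) for t in point]
p = [0.2, 0.3, 0.5]
r = [0.5, 0.25, 0.25]
rho_pr = geometric_divergence(p, r, math.log, lambda t: 1.0 / t, log_xi)
rho_pp = geometric_divergence(p, p, math.log, lambda t: 1.0 / t, log_xi)
```

For this particular choice, $\Lambda(r)=\sum_i r_i\ln r_i$, so `rho_pr` equals the KL divergence $D^{(KL)}(r\|p)$ divided by the Shannon entropy of $r$, and `rho_pp` vanishes.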

4.2. Conformal Flattening Transformation

As is described in the preliminary section, by 1-conformal flatness there exists a positive function, i.e., a conformal factor $\sigma$, that relates $(g,\nabla,\nabla^*)$ with a dually flat structure $(\tilde g,\tilde\nabla,\tilde\nabla^*)$ via the conformal transformation (10), (11) and (12). A contrast function $\tilde\rho$ that induces $(\tilde g,\tilde\nabla,\tilde\nabla^*)$ is given as the conformal divergence:

$$\tilde\rho(p,r)=\sigma(r)\rho(p,r),\quad p,r\in\mathcal{S}^n, \tag{29}$$

from the geometric divergence $\rho$ in (28).

For an arbitrary function $L$ within our setting given by the four assumptions, we prove that we can construct a dually flat structure $(\tilde g,\tilde\nabla,\tilde\nabla^*)$ by choosing the conformal factor $\sigma$ carefully. Hereafter, we call this transformation conformal flattening.

Define

$$Z(p):=\sum_{i=1}^{n+1}\nu_i(p)=\frac{1}{\Lambda(p)}\sum_{i=1}^{n+1}\frac{1}{L'(p_i)},$$

then it is negative because each $\nu_i(p)$ is negative. The conformal divergence to $\rho$ with respect to the conformal factor $\sigma(r):=-1/Z(r)$ is

$$\tilde\rho(p,r)=-\frac{1}{Z(r)}\rho(p,r).$$

Theorem 1.

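If the conformal factor is $\sigma=-1/Z$, then the information geometric structure $(\tilde g,\tilde\nabla,\tilde\nabla^*)$ on $\mathcal{S}^n$ that is transformed from the 1-conformally flat structure $(g,\nabla,\nabla^*)$ via (10), (11) and (12) is dually flat. Furthermore, the conformal divergence $\tilde\rho$ that induces $(\tilde g,\tilde\nabla,\tilde\nabla^*)$ on $\mathcal{S}^n$ is canonical, where the Legendre conjugate potential functions and coordinate systems are explicitly given by

$$\theta^i(p)=x^i(p)-x^{n+1}(p)=L(p_i)-L(p_{n+1}),\quad i=1,\dots,n, \tag{30}$$
$$\eta_i(p)=P_i(p):=\frac{\nu_i(p)}{Z(p)}=\frac{1/L'(p_i)}{\sum_{k=1}^{n+1}1/L'(p_k)},\quad i=1,\dots,n, \tag{31}$$
$$\psi(p)=-x^{n+1}(p)=-L(p_{n+1}), \tag{32}$$
$$\varphi(p)=\frac{1}{Z(p)}\sum_{i=1}^{n+1}\nu_i(p)\,x^i(p)=\sum_{i=1}^{n+1}P_i(p)L(p_i). \tag{33}$$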

Proof. 

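Using the given relations, we first show that the conformal divergence $\tilde\rho$ is the canonical divergence for $(\tilde g,\tilde\nabla,\tilde\nabla^*)$:

$$\begin{aligned}
\tilde\rho(p,r)&=-\frac{1}{Z(r)}\langle\nu(r),f(p)-f(r)\rangle=\langle P(r),f(r)-f(p)\rangle=\sum_{i=1}^{n+1}P_i(r)\bigl(x^i(r)-x^i(p)\bigr)\\
&=\sum_{i=1}^{n+1}P_i(r)\,x^i(r)-\sum_{i=1}^{n}P_i(r)\bigl(x^i(p)-x^{n+1}(p)\bigr)-\left(\sum_{i=1}^{n+1}P_i(r)\right)x^{n+1}(p)\\
&=\varphi(r)-\sum_{i=1}^{n}\eta_i(r)\,\theta^i(p)+\psi(p).
\end{aligned} \tag{34}$$

Next, let us confirm that $\partial\psi/\partial\theta^i=\eta_i$.

Since $\theta^i(p)=L(p_i)+\psi(p)$, $i=1,\dots,n$, we have

$$p_i=E(\theta^i-\psi),\quad i=1,\dots,n+1,$$

by setting $\theta^{n+1}:=0$. Hence, we have

$$1=\sum_{i=1}^{n+1}E(\theta^i-\psi).$$

Differentiating by $\theta^j$, we obtain

$$0=\frac{\partial}{\partial\theta^j}\sum_{i=1}^{n+1}E(\theta^i-\psi)=\sum_{i=1}^{n+1}E'(\theta^i-\psi)\left(\delta^i_j-\frac{\partial\psi}{\partial\theta^j}\right)=E'(x^j)-\left(\sum_{i=1}^{n+1}E'(x^i)\right)\frac{\partial\psi}{\partial\theta^j}.$$

This implies that

$$\frac{\partial\psi}{\partial\theta^j}=\frac{E'(x^j)}{\sum_{i=1}^{n+1}E'(x^i)}=\eta_j.$$

Together with (34) and this relation, $\varphi$ is confirmed to be the Legendre conjugate of $\psi$.

The dual relation $\partial\varphi/\partial\eta_i=\theta^i$ follows automatically from the property of the Legendre transform. □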
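The two steps of the proof can be reproduced numerically. The following is a minimal sketch, not the author's implementation: the names `flatten`, `conformal_divergence` and `psi_of_theta` are hypothetical, points are lists of positive floats, and the relation $1=\sum_i E(\theta^i-\psi)$ is solved for $\psi$ by plain bisection, which assumes that $E$ is increasing on the bracketing interval supplied by the caller.

```python
def flatten(p, L, dL):
    """Return (theta, eta, psi, phi) of Eqs. (30)-(33) at a point p of the open simplex."""
    x = [L(pi) for pi in p]                       # x^i = L(p_i)
    w = [1.0 / dL(pi) for pi in p]                # E'(x^i) = 1 / L'(p_i)
    total = sum(w)
    P = [wi / total for wi in w]                  # P_i(p), sums to one
    theta = [xi - x[-1] for xi in x[:-1]]         # Eq. (30)
    eta = P[:-1]                                  # Eq. (31)
    psi = -x[-1]                                  # Eq. (32)
    phi = sum(Pi * xi for Pi, xi in zip(P, x))    # Eq. (33)
    return theta, eta, psi, phi


def conformal_divergence(p, r, L, dL):
    """First line of Eq. (34): sum_i P_i(r) * (x^i(r) - x^i(p))."""
    w = [1.0 / dL(ri) for ri in r]
    total = sum(w)
    return sum(wi / total * (L(ri) - L(pi)) for wi, ri, pi in zip(w, r, p))


def psi_of_theta(theta, E, lo, hi, iters=100):
    """Solve sum_i E(theta^i - psi) = 1 for psi by bisection, with theta^{n+1} := 0.

    Assumes E is increasing and that [lo, hi] brackets the root.
    """
    full = list(theta) + [0.0]
    excess = lambda s: sum(E(t - s) for t in full) - 1.0   # decreasing in s
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With, for instance, $L(t)=2\sqrt{t}$ and $E(x)=(x/2)^2$, the value `conformal_divergence(p, r, L, dL)` can be compared with $\varphi(r)-\sum_i\eta_i(r)\theta^i(p)+\psi(p)$ computed from `flatten`, and a central difference of `psi_of_theta` in $\theta^j$ can be compared with $\eta_j$.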

The following corollary is straightforward because all the quantities in the theorem depend on only L:

Corollary 1.

Under the assumptions, the dually flat structure $(\tilde g,\tilde\nabla,\tilde\nabla^*)$ on $\mathcal{S}^n$, obtained by following the above conformal flattening, does not depend on the choice of the transversal vector $\xi$.

Remark 2.

Note that the conformal metric is given by $\tilde g=-g/Z$ and is positive definite. Furthermore, the relation (12) means that the dual affine connections $\nabla^*$ and $\tilde\nabla^*$ are projectively (or $-1$-conformally) equivalent [15,16]. Hence, $\nabla^*$ is projectively flat. Furthermore, the above corollary implies that the realized affine connection $\nabla$ is also projectively equivalent to the flat connection $\tilde\nabla$ if we use the centro-affine immersion, i.e., $\xi^i=-L(p_i)$ [15,16]. See Proposition 3 for an application of projective equivalence of affine connections.

Remark 3.

In our setting, conformal flattening is geometrically regarded as normalization of the conormal vector $\nu$. Hence, the dual coordinates $\eta_i(p)=P_i(p)$ can be interpreted as a generalization of the escort probability [10,19] (see the following example). Similarly, $\psi$ and $-\varphi$ might be seen as the associated Massieu function and entropy, respectively.

Remark 4.

While the immersion $f$ is composed of a representing function $L$ under assumption 2, the corresponding $M$ of a single variable does not generally exist for $(g,\nabla,\nabla^*)$ or $(\tilde g,\tilde\nabla,\tilde\nabla^*)$. From the expressions of the Riemannian metrics $g$ in (27) and $\tilde g=-g/Z$, we see that the counterparts of the representing functions $M(p_i)$ would be, respectively, $\nu_i(p)$ and $P_i(p)$, but note that they are multi-variable functions of $p=(p_i)$.

4.3. Examples

If we take $L$ to be the logarithmic function $L(t)=\ln(t)$, then the conformally flattened geometry immediately defines the standard dually flat structure $(g^F,\nabla^{(1)},\nabla^{(-1)})$ on the simplex $\mathcal{S}^n$. We see that $-\varphi(p)$ is the entropy, i.e., $\varphi(p)=\sum_{i=1}^{n+1}p_i\ln p_i$, and the conformal divergence is the KL divergence (relative entropy), i.e., $\tilde\rho(p,r)=D^{(KL)}(r\|p)=\sum_{i=1}^{n+1}r_i(\ln r_i-\ln p_i)$.
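For $L=\ln$ every quantity of Theorem 1 has a closed form, which makes the statement easy to check. In the sketch below the function names are hypothetical; it uses $\eta_i(r)=r_i$, $\theta^i=\ln p_i-\ln p_{n+1}$ and the fact that $\psi=-\ln p_{n+1}$ becomes $\ln(1+\sum_i e^{\theta^i})$ when written in the $\theta$-coordinates.

```python
import math


def theta_coords(p):
    """theta^i = ln p_i - ln p_{n+1}, Eq. (30) with L = ln."""
    return [math.log(pi) - math.log(p[-1]) for pi in p[:-1]]


def psi(theta):
    """psi = -ln p_{n+1} expressed in theta (the log-partition function)."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))


def phi(p):
    """phi = sum_i p_i ln p_i, the negative Shannon entropy."""
    return sum(pi * math.log(pi) for pi in p)


def canonical_divergence(p, r):
    """phi(r) - sum_i eta_i(r) theta^i(p) + psi(p), with eta_i(r) = r_i."""
    th = theta_coords(p)
    return phi(r) - sum(ri * ti for ri, ti in zip(r[:-1], th)) + psi(th)


def kl(r, p):
    """KL divergence D(r || p)."""
    return sum(ri * (math.log(ri) - math.log(pi)) for ri, pi in zip(r, p))
```

Evaluating `canonical_divergence(p, r)` and `kl(r, p)` on the same pair of points gives matching values up to rounding, in line with the identification of the conformal divergence with $D^{(KL)}(r\|p)$.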

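Next, let the affine immersion $(f,\xi)$ be defined by the following $L$ and $\xi$:

$$L(t):=\frac{1}{1-q}t^{1-q},\qquad x^i(p)=\frac{1}{1-q}(p_i)^{1-q},$$

and

$$\xi^i(p)=-q(1-q)\,x^i(p),$$

with $0<q$ and $q\neq1$. We see that the immersion is centro-affine scaled by the constant factor $q(1-q)$. Then, the immersion realizes the alpha-structure $(g^F,\nabla^{(\alpha)},\nabla^{(-\alpha)})$ on $\mathcal{S}^n$ with $q=(1+\alpha)/2$. The geometric divergence is the alpha-divergence, i.e.,

$$\rho(p,r)=\frac{4}{1-\alpha^2}\left\{1-\sum_{i=1}^{n+1}(p_i)^{(1-\alpha)/2}(r_i)^{(1+\alpha)/2}\right\}.$$

Following the procedure of conformal flattening described above, we have [17]

$$\Psi(x)=\sum_{i=1}^{n+1}\bigl((1-q)x^i\bigr)^{1/(1-q)},\qquad \Lambda(p)=-q\ (\text{constant}),$$
$$\nu_i(p)=-\frac{1}{q}(p_i)^q,\qquad \sigma(p)=-\frac{1}{Z(p)}=\frac{q}{\sum_{k=1}^{n+1}(p_k)^q},$$

and obtain a dually flat structure $(\tilde g^F,\tilde\nabla,\tilde\nabla^*)$ via the formulas in Theorem 1:

$$\eta_i=P_i=\frac{(p_i)^q}{\sum_{k=1}^{n+1}(p_k)^q},\qquad \theta^i=\frac{1}{1-q}(p_i)^{1-q}-\frac{1}{1-q}(p_{n+1})^{1-q}=\ln_q(p_i)+\psi(p),$$
$$\psi(p)=-\ln_q(p_{n+1}),\qquad \varphi(p)=\ln_q\left(\frac{1}{\exp_q(S_q(p))}\right),\qquad \tilde g^F=-\frac{1}{Z(p)}g^F.$$

Here, $\ln_q$ and $S_q(p)$ are the $q$-logarithmic function and the Tsallis entropy [10], respectively, defined by

$$\ln_q(t)=\frac{t^{1-q}-1}{1-q},\qquad S_q(p)=\frac{\sum_{i=1}^{n+1}(p_i)^q-1}{1-q}.$$

Note that the escort probability appears as the dual coordinate $\eta_i$.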
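The claims of this example, namely that $\Lambda$ is the constant $-q$, that the geometric divergence (28) is the alpha-divergence with $\alpha=2q-1$, and that the dual coordinates are escort probabilities, can be checked with a few lines. This is a minimal sketch with hypothetical function names, assuming $0<q$, $q\neq1$ and points given as lists of positive floats summing to one.

```python
def q_example(p, r, q):
    """Lambda(r), the geometric divergence rho(p, r) and the escort distribution of r."""
    L = lambda t: t ** (1.0 - q) / (1.0 - q)
    dL = lambda t: t ** (-q)
    xi_r = [-q * (1.0 - q) * L(ri) for ri in r]               # scaled centro-affine xi
    lam = sum(x / dL(ri) for x, ri in zip(xi_r, r))           # Eq. (25)
    rho = sum((L(pi) - L(ri)) / dL(ri) for pi, ri in zip(p, r)) / lam   # Eq. (28)
    norm = sum(rk ** q for rk in r)
    escort = [ri ** q / norm for ri in r]                     # eta_i = P_i
    return lam, rho, escort


def alpha_divergence(p, r, alpha):
    """Alpha-divergence in the form stated in the text."""
    s = sum(pi ** ((1.0 - alpha) / 2.0) * ri ** ((1.0 + alpha) / 2.0)
            for pi, ri in zip(p, r))
    return 4.0 / (1.0 - alpha ** 2) * (1.0 - s)
```

For example, `q_example(p, r, 0.7)` can be compared with `alpha_divergence(p, r, 0.4)`, since $q=0.7$ corresponds to $\alpha=0.4$.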

5. An Application to Gradient Flows on $\mathcal{S}^n$

Recall the replicator flow on the simplex $\mathcal{S}^n$ for given functions $f_i(p)$ defined by

$$\dot p_i=p_i\bigl(f_i(p)-\bar f(p)\bigr),\quad i=1,\dots,n+1,\qquad \bar f(p):=\sum_{i=1}^{n+1}p_if_i(p), \tag{35}$$

which is extensively studied in evolutionary game theory. It is known [20] (Chapter 16) that

  • (i) the solution to (35) is the gradient flow that maximizes a function $V(p)$ satisfying
    $$f_i=\frac{\partial V}{\partial p_i},\quad i=1,\dots,n+1, \tag{36}$$
    with respect to the Shahshahani metric $g^S$ (see below),
  • (ii) the KL divergence is a local Lyapunov function for an equilibrium called the evolutionarily stable state (ESS) in the case of $f_i(p)=\sum_{j=1}^{n+1}a_{ij}p_j$ with $(a_{ij})\in\mathbb{R}^{(n+1)\times(n+1)}$.

The Shahshahani metric $g^S$ is defined on the positive orthant $\mathbb{R}_+^{n+1}$ by

$$g^S_{ij}(p)=\frac{\sum_{k=1}^{n+1}p_k}{p_i}\delta_{ij},\quad i,j=1,\dots,n+1.$$

Note that the Shahshahani metric induces the Fisher metric $g^F$ on $\mathcal{S}^n$. Further, the KL divergence is the canonical divergence [2] of $(g^F,\nabla^{(1)},\nabla^{(-1)})$. Thus, the replicator dynamics (35) are closely related to the standard dually flat structure $(g^F,\nabla^{(1)},\nabla^{(-1)})$, which is associated with exponential and mixture families of probability distributions. In addition, investigation of the flow is also important from the viewpoint of statistical physics governed by the Boltzmann–Gibbs distributions when we choose $V(p)$ as various physical quantities, e.g., free energy or entropy.

Similarly, when we consider various Legendre relations deformed by $L$, it would be of interest to investigate gradient flows on $\mathcal{S}^n$ for a dually flat structure $(\tilde g,\tilde\nabla,\tilde\nabla^*)$ or a 1-conformally flat structure $(g,\nabla,\nabla^*)$. Since $g$ and $\tilde g$ can be naturally extended to $\mathbb{R}_+^{n+1}$ in diagonal form (we use the same notation for brevity):

$$g_{ij}(p)=\frac{1}{\Lambda(p)}\frac{L''(p_i)}{L'(p_i)}\delta_{ij},\qquad \tilde g_{ij}(p)=-\frac{1}{Z(p)}g_{ij}(p),\quad i,j=1,\dots,n+1 \tag{37}$$

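from (27), we can define two gradient flows for $V(p)$ on $\mathcal{S}^n$. One is the gradient flow for $g$, which is

$$\dot p_i=g_{ii}^{-1}\bigl(f_i-\bar f^H\bigr),\qquad \bar f^H(p):=\sum_{k=1}^{n+1}H_k(p)f_k(p),\qquad H_i(p):=\frac{g_{ii}^{-1}(p)}{\sum_{k=1}^{n+1}g_{kk}^{-1}(p)}, \tag{38}$$

for $i=1,\dots,n+1$. It is verified that $\dot p$ is tangent to $\mathcal{S}^n$, i.e., $\dot p\in T_p\mathcal{S}^n$, and is the gradient of $V$, i.e.,

$$g(X,\dot p)=\sum_{i=1}^{n+1}f_iX^i-\bar f^H\sum_{i=1}^{n+1}X^i=\sum_{i=1}^{n+1}\frac{\partial V}{\partial p_i}X^i,\quad \forall X=(X^i)\in\mathcal{X}(\mathcal{S}^n).$$

In the same way, the other one for $\tilde g$ is defined by

$$\dot p_i=\tilde g_{ii}^{-1}\bigl(f_i-\bar f^H\bigr),\qquad \bar f^H(p):=\sum_{k=1}^{n+1}H_k(p)f_k(p),\quad i=1,\dots,n+1. \tag{39}$$

Note that both the flows reduce to (35) when $L=\ln$.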
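The right-hand sides of (38) and (39) can be evaluated directly from the diagonal metrics in (37). The following is a minimal sketch under stated assumptions: `flow_velocity` and `replicator_velocity` are hypothetical names, `dL` and `d2L` stand for $L'$ and $L''$, `xi` returns the components of the transversal vector, and `f` returns the list $(f_i(p))$.

```python
import math


def flow_velocity(p, f, dL, d2L, xi, conformal):
    """Velocity of flow (39) if conformal is True, of flow (38) otherwise."""
    lam = sum(x / dL(pi) for x, pi in zip(xi(p), p))     # Lambda(p), Eq. (25)
    Z = sum(1.0 / dL(pi) for pi in p) / lam              # Z(p), negative
    g = [d2L(pi) / (lam * dL(pi)) for pi in p]           # g_ii, Eq. (37)
    if conformal:
        g = [-gi / Z for gi in g]                        # g~_ii = -g_ii / Z
    g_inv = [1.0 / gi for gi in g]
    fv = f(p)
    f_bar = sum(gi * fi for gi, fi in zip(g_inv, fv)) / sum(g_inv)   # bar f^H
    return [gi * (fi - f_bar) for gi, fi in zip(g_inv, fv)]


def replicator_velocity(p, f):
    """Right-hand side of the replicator flow (35)."""
    fv = f(p)
    f_bar = sum(pi * fi for pi, fi in zip(p, fv))
    return [pi * (fi - f_bar) for pi, fi in zip(p, fv)]
```

With $L=\ln$ and, as an illustrative choice, $\xi^i=\ln p_i$, the conformal velocity agrees with `replicator_velocity`, while the velocity of (38) differs from it only by the positive scalar $-\Lambda(p)$. In both cases the components sum to zero, so the velocity is tangent to the simplex.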

From (37), the following consequence is immediate:

Proposition 1.

The trajectories of the gradient flows (38) and (39) starting from the same initial point coincide, while the velocities of the time evolutions differ by the factor $-Z(p)$.

Taking account of the example with respect to the alpha-geometry and the conformally flattened one given in Section 4.3, the following result shown in [18] can be regarded as a corollary of the above proposition:

Corollary 2.

The trajectories of the gradient flow (39) with respect to the conformal metric $\tilde g$ for $L(t)=t^{1-q}/(1-q)$ coincide with those of the replicator flow (35), while the velocities of the time evolutions differ by the factor $-Z(p)$.

Next, we particularly consider the case when V(p) is a potential function or divergences. As for a gradient flow on a manifold equipped with a dually flat structure (g˜,˜,˜*), the following result is known:

Proposition 2.

[22] Consider the potential function $\psi(p)$ and the canonical divergence $\tilde\rho(p,r)$ of $(\tilde g,\tilde\nabla,\tilde\nabla^*)$ for an arbitrary prefixed point $r$. The gradient flows for $V(p)=\pm\psi(p)$ and $V(p)=\pm\tilde\rho(p,r)$ follow $\tilde\nabla^*$-geodesic curves.

As is described in Remark 2, $\nabla^*$ and $\tilde\nabla^*$ are projectively equivalent. One geometrically interesting property of the projective equivalence is that $\nabla^*$- and $\tilde\nabla^*$-geodesic curves coincide up to their parametrizations (i.e., a curve is $\nabla^*$-pregeodesic if and only if it is $\tilde\nabla^*$-pregeodesic) [15] (p.17). Combining this fact with Propositions 1 and 2, we see that the following result holds:

Proposition 3.

Let $r\in\mathcal{S}^n$ be an arbitrary prefixed point. The gradient flows (38) for $V(p)=\pm\rho(p,r)=\pm\tilde\rho(p,r)/\sigma(r)$, $V(p)=\pm\tilde\rho(p,r)$ and $V(p)=\pm\psi(p)$ follow $\tilde\nabla^*$-geodesic curves.

Finally, we demonstrate another aspect of the flow (39). Let us consider the following functions $f_i$:

$$f_i(p):=\frac{L''(p_i)}{(L'(p_i))^2}\sum_{j=1}^{n+1}a_{ij}P_j(p),\qquad a_{ij}=-a_{ji}\in\mathbb{R},\quad i,j=1,\dots,n+1. \tag{40}$$

Note that the $f_i$'s are not integrable, i.e., a non-trivial $V$ satisfying (36) does not exist because of the anti-symmetry of $a_{ij}$. Hence, for this case, (39) is no longer a gradient flow. However, we can prove the following result:

Theorem 2.

Consider the flow (39) with the functions $f_i$ defined in (40) and assume that there exists an equilibrium $r\in\mathcal{S}^n$ for the flow. Then, $\rho(p,r)$ and $\tilde\rho(p,r)$ are first integrals (conserved quantities) of the flow.

Proof. 

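By substituting (40) into $\bar f^H(p)$ in (39) and using the expression of $\tilde g_{ii}$ in (37), we have

$$\bar f^H(p)=\frac{1}{\sum_{k=1}^{n+1}L'(p_k)/L''(p_k)}\sum_{i=1}^{n+1}\frac{1}{L'(p_i)}\sum_{j=1}^{n+1}a_{ij}P_j(p).$$

By the relation $E'(x^i)=1/L'(p_i)$, (31) and the anti-symmetry of $a_{ij}$, it holds that

$$\sum_{i=1}^{n+1}\frac{1}{L'(p_i)}\sum_{j=1}^{n+1}a_{ij}P_j(p)=\frac{1}{\sum_{k=1}^{n+1}E'(x^k)}\sum_{i=1}^{n+1}\sum_{j=1}^{n+1}a_{ij}E'(x^i)E'(x^j)=0.$$

Hence, we see that $\bar f^H=0$ and the flow (39) reduces to

$$\dot p_i=\tilde g_{ii}^{-1}(p)\,f_i(p),\quad i=1,\dots,n+1. \tag{41}$$

Since $r$ is an equilibrium point, we see from (40) that

$$\sum_{j=1}^{n+1}a_{ij}P_j(r)=0,\quad i=1,\dots,n+1. \tag{42}$$

Then, using (34), (41) and (42), we have

$$\begin{aligned}
\frac{d\tilde\rho(p,r)}{dt}&=-\sum_{i=1}^{n+1}P_i(r)L'(p_i)\,\dot p_i=Z(p)\Lambda(p)\sum_{i=1}^{n+1}P_i(r)\frac{(L'(p_i))^2}{L''(p_i)}f_i(p)\\
&=Z(p)\Lambda(p)\sum_{i=1}^{n+1}P_i(r)\sum_{j=1}^{n+1}a_{ij}P_j(p)=-Z(p)\Lambda(p)\sum_{i=1}^{n+1}\sum_{j=1}^{n+1}\bigl(P_i(p)-P_i(r)\bigr)a_{ij}P_j(p)\\
&=-Z(p)\Lambda(p)\sum_{i=1}^{n+1}\sum_{j=1}^{n+1}\bigl(P_i(p)-P_i(r)\bigr)a_{ij}\bigl(P_j(p)-P_j(r)\bigr)=0.
\end{aligned}$$

Thus, $\tilde\rho(p,r)$ is a first integral of the flow. It follows from the definition of the conformal divergence (29) that $\rho(p,r)$ is also a first integral of the flow. □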
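The conservation law can be checked pointwise by evaluating the time derivative of $\tilde\rho(p(t),r)$ along the vector field, without integrating the flow. The sketch below is an illustration under stated assumptions: `divergence_rate` is a hypothetical name, `a` is an anti-symmetric matrix given as a list of rows, and $r$ must satisfy (42). It uses $Z(p)\Lambda(p)=\sum_k 1/L'(p_k)$, so that $\tilde g_{ii}$ does not involve $\xi$.

```python
def divergence_rate(p, r, a, dL, d2L):
    """Return (d/dt of rho~(p(t), r), velocity) for flow (39) with f_i from Eq. (40)."""
    def escort(s):
        w = [1.0 / dL(si) for si in s]
        total = sum(w)
        return [wi / total for wi in w]

    n = len(p)
    P_p, P_r = escort(p), escort(r)
    f = [d2L(p[i]) / dL(p[i]) ** 2 * sum(a[i][j] * P_p[j] for j in range(n))
         for i in range(n)]                               # Eq. (40)
    S = sum(1.0 / dL(pi) for pi in p)                     # equals Z(p) * Lambda(p)
    g_inv = [-S * dL(pi) / d2L(pi) for pi in p]           # inverse of g~_ii, Eq. (37)
    f_bar = sum(gi * fi for gi, fi in zip(g_inv, f)) / sum(g_inv)
    p_dot = [gi * (fi - f_bar) for gi, fi in zip(g_inv, f)]
    rate = -sum(Pr * dL(pi) * v for Pr, pi, v in zip(P_r, p, p_dot))
    return rate, p_dot
```

A convenient test case is the rock-paper-scissors matrix with rows $(0,1,-1)$, $(-1,0,1)$, $(1,-1,0)$, for which the uniform distribution $r$ satisfies (42) whenever $P(r)$ is uniform, e.g., for $L(t)=2\sqrt{t}$. The returned rate then vanishes up to rounding although the velocity itself does not.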

Remark 5.

From Proposition 1, the same statement holds for the flow (38). Theorem 2 implies the fact [20] that the KL divergence is a first integral of the replicator flow (35) with the functions $f_i(p)$ in (40) defined by $L(t)=\ln t$ and $P_j(p)=p_j$.

6. Conclusions

We have considered two important aspects of information geometric structure, i.e., invariance and dual flatness, from the viewpoint of representing functions. As for the invariance of geometry, we have proved that a pair of representing functions that derives the alpha-structure is essentially unique. On the other hand, we have shown the explicit formula of conformal flattening that transforms 1-conformally flat structures on the simplex $\mathcal{S}^n$ realized by affine immersions to the corresponding dually flat structures. Finally, we have discussed several geometric properties of gradient flows associated with the two structures.

Presently, our analysis is restricted to the probability simplex, i.e., the space of discrete probability distributions. For the continuous case, the similar or related results are obtained in [23,24] without using affine immersions. Extensions of the results obtained in this paper to continuous probability space and the exploitation of relations to the literature are left for future work.

The conformal flattening can also be applied to the computationally efficient construction of a Voronoi diagram with respect to the geometric divergences [18]. Exploring the possibilities of other applications would be of interest.

Acknowledgments

Part of the results is adapted and reprinted with permission from Springer Customer Service Centre GmbH (licence No: 4294160782766): Springer Nature, Geometric Science of Information LNCS 10589 Nielsen, F., Barbaresco, F., Eds., (Article Name:) On affine immersions of the probability simplex and their conformal flattening, (Author:) A. Ohara, (Copyright:) Springer International Publishing AG 2017 [25]. The author is partially supported by JSPS Grant-in-Aid (C) 15K04997.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Amari S.I. Differential-Geometrical Methods in Statistics. Springer; New York, NY, USA: 1985. (Lecture Notes in Statistics Series 28).
  2. Amari S.I., Nagaoka H. Methods of Information Geometry. AMS & Oxford University Press; Oxford, UK: 2000. (Translations of Mathematical Monographs Series 191).
  3. Naudts J. Continuity of a class of entropies and relative entropies. Rev. Math. Phys. 2004;16:809. doi: 10.1142/S0129055X04002151.
  4. Eguchi S. Information geometry and statistical pattern recognition. Sugaku Expos. 2006;19:197.
  5. Grünwald P.D., Dawid A.P. Game theory, maximum entropy, minimum discrepancy, and robust Bayesian decision theory. Ann. Statist. 2004;32:1367.
  6. Fujisawa H., Eguchi S. Robust parameter estimation with a small bias against heavy contamination. J. Multivar. Anal. 2008;99:2053. doi: 10.1016/j.jmva.2008.02.004.
  7. Naudts J. The q-exponential family in statistical Physics. Cent. Eur. J. Phys. 2009;7:405. doi: 10.2478/s11534-008-0150-x.
  8. Naudts J. Generalized thermostatics. Springer; Berlin, Germany: 2010.
  9. Ollila E., Tyler D., Koivunen V., Poor V. Complex elliptically symmetric distributions: Survey, new results and applications. IEEE Trans. Sig. Proc. 2012;60:5597. doi: 10.1109/TSP.2012.2212433.
  10. Tsallis C. Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World. Springer; Berlin, Germany: 2009.
  11. Zhang J. Divergence Function, Duality, and Convex Analysis. Neural Comput. 2004;16:159. doi: 10.1162/08997660460734047.
  12. Wada T., Matsuzoe H. Conjugate representations and characterizing escort expectations in information geometry. Entropy. 2017;19:309. doi: 10.3390/e19070309.
  13. Shima H. The Geometry of Hessian Structures. World Scientific; Singapore: 2007.
  14. Chentsov N.N. Statistical Decision Rules and Optimal Inference. AMS; Providence, RI, USA: 1982.
  15. Nomizu K., Sasaki T. Affine Differential Geometry. Cambridge University Press; Cambridge, UK: 1993.
  16. Kurose T. On the divergences of 1-conformally flat statistical manifolds. Tohoku Math. J. 1994;46:427. doi: 10.2748/tmj/1178225722.
  17. Ohara A., Matsuzoe H., Amari S.I. A dually flat structure on the space of escort distributions. J. Phys. Conf. Ser. 2010;201:012012. doi: 10.1088/1742-6596/201/1/012012.
  18. Ohara A., Matsuzoe H., Amari S.I. Conformal geometry of escort probability and its applications. Mod. Phys. Lett. B. 2012;26:1250063. doi: 10.1142/S0217984912500637.
  19. Tsallis C., Mendes M.S., Plastino A.R. The role of constraints within generalized nonextensive statistics. Physica A. 1998;261:534. doi: 10.1016/S0378-4371(98)00437-3.
  20. Hofbauer J., Sigmund K. The Theory of Evolution and Dynamical Systems: Mathematical Aspects of Selection. Cambridge University Press; Cambridge, UK: 1988.
  21. Eguchi S. Geometry of minimum contrast. Hiroshima Math. J. 1992;22:631.
  22. Fujiwara A., Amari S.I. Gradient systems in view of information geometry. Physica D. 1995;80:317. doi: 10.1016/0167-2789(94)00175-P.
  23. Amari S.I., Ohara A., Matsuzoe H. Geometry of deformed exponential families: Invariant, dually-flat and conformal geometries. Physica A. 2012;391:4308. doi: 10.1016/j.physa.2012.04.016.
  24. Matsuzoe H. Hessian structures on deformed exponential families and their conformal structures. Diff. Geo. Appl. 2014;35:323. doi: 10.1016/j.difgeo.2014.06.003.
  25. Ohara A. On affine immersions of the probability simplex and their conformal flattening. In: Nielsen F., Barbaresco F., editors. Geometric Science of Information. Springer; Berlin, Germany: 2017.

Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
