Skip to main content
Entropy logoLink to Entropy
. 2020 Feb 20;22(2):240. doi: 10.3390/e22020240

Global Geometry of Bayesian Statistics

Atsuhide Mori 1
PMCID: PMC7516673  PMID: 33286014

Abstract

In the previous work of the author, a non-trivial symmetry of the relative entropy in the information geometry of normal distributions was discovered. The same symmetry also appears in the symplectic/contact geometry of Hilbert modular cusps. Further, it was observed that a contact Hamiltonian flow presents a certain Bayesian inference on normal distributions. In this paper, we describe Bayesian statistics and the information geometry in the language of current geometry in order to spread our interest in statistics through general geometers and topologists. Then, we foliate the space of multivariate normal distributions by symplectic leaves to generalize the above result of the author. This foliation arises from the Cholesky decomposition of the covariance matrices.

Keywords: information geometry, poisson structure, symplectic structure, contact structure, foliation, Cholesky decomposition

1. Introduction

Suppose that a smooth manifold U is embedded in the space of positive probability densities defined on a fixed domain. Then, the relative entropy defines a separating premetric D:U×UR0 on U. Here a premetric on U is a non-negative function on U×U vanishing along the diagonal set ΔU×U, and it is separating if it vanishes only at Δ. Its jet of order 3 at Δ induces a family of differential geometric structures on U, which is the main subject of the information geometry. There is a large body of literature on the information geometry (see [1,2] and references therein). It is worth noting that another “canonical” choice of premetric other than the above D is discussed in [3].

In the case where U is the space of univariate normal distributions, the half plane H=R×R>0(m,s) presents U, where m denotes the mean and s the standard deviation. Since the convolution of two normal densities is a normal density, it induces a product ∗ on H called the convolution product. On the other hand, since the pointwise product of two normal densities is proportional to a normal density, it induces another product · on H called the Bayesian product. Their expressions are

(m,s)(m,s)=m+m,s2+s2,(m,s)·(m,s)=ms2+ms2s2+s2,sss2+s2. (1)

In the previous work of the author ([4]), a Fourier-like transformation is defined as the diffeomorphism F^:HH sending (m,s) to (M,S)=ms2,1s. It is an involution interchanging the two operations ∗ and ·. Accordingly, the stereograph f:H×HR>0 of the above D is defined by

f:(m,s,M,S)12MS+sSms2+12(sS)212ln(sS)=12(mm)2+s2s21+lns2s2=D((m,s),(m,s)). (2)

The flow (m,s,M,S)(etm,ets,etM,etS) (tR) preserves f as well as the graph FH×H of F^. The same symmetry appears in the contact/symplectic geometry related to the algebraic geometry of Hilbert modular cusps. Moreover, there exists a contact Hamiltonian flow whose restriction to the graph F presents a certain Bayesian inference. Its application appears in [5].

In this paper, we describe Bayesian statistics in the language of current geometry in order to share the problems among general geometers and topologists. Then, generalizing the above result of the author, we foliate the space of multivariate normal distributions by using the Cholesky decomposition of the covariance matrices and define on each leaf the Fourier-like transformation, the stereograph of the relative entropy, and the contact Hamiltonian flow presenting a Bayesian inference. The ultimate aim of this research is to construct a Bayesian statistical model of space-time on which everything learns by changing its inner distribution along the leaf.

2. Results

2.1. Symplectic/Contact Geometry

Current geometry does not heavily use tensor calculus. Instead, it uses (exterior) differential forms, which can be integrated along cycles, pulled-back under smooth maps, and differentiated without affine connections. In symplectic/contact geometry, the readers must be familiar with differential forms. Then, this subsection is the minimal summary of definitions. For the details, refer to [6].

A (positive) symplectic form on an oriented 2n-manifold is a closed 2-form ω satisfying ωn>0, where ωn=ωω. If the orientation is reversed, the 2-form ω becomes a negative symplectic form. In either case, a symplectic form ω identifies a vector field X with an exact 1-form dH through the one-to-one correspondence defined by Hamilton’s equation ιXω=dH. Here ι denotes the interior product. Then, X is called a Hamiltonian vector field of the primitive function H (+constant). The flow generated by X preserves the symplectic form ω. Namely, the Lie derivative LXω(=ιXdω+dιXω) vanishes. A Lagrangian submanifold is an n-manifold which is immersed in a symplectic 2n-manifold so that the pull-back of the symplectic form vanishes. The word “symplectic” is a calque of “complex”. Indeed, there exists an almost complex structure J which is compatible with a given symplectic structure, i.e., for which the composition ω(·,J·) is a Riemannian metric. In the case where J is integrable, ω is called a Kähler form of the complex manifold.

On the other hand, a (positive) contact form on an oriented (2n1)-manifold N is a 1-form η satisfying η(dη)n1>0. A (co-oriented) contact structure on N is the conformal class of a contact form. It can be presented as the oriented hyperplane field kerη. The product manifold R(t)×N carries the exact symplectic form d(etη). Take a function h on N. Let X be the Hamiltonian vector field of the function eth defined on the product manifold R×N. Then, the push-forward Y of X under the projection of R×N to the second factor is well-defined. The vector field Y is called the contact Hamiltonian vector field of the function h on N. The pair of the equations η(Y)=h and ηLYη=0 uniquely determines Y. A Legendrian submanifold is an (n1)-manifold which is immersed in a contact (2n1)-manifold so that the pull-back of the contact form vanishes.

2.2. Bayesian Statistics

Suppose that any point y of a smooth manifold M equipped with a volume form dvol presents a positive probability density or probability ρy:WR>0 defined on a (possibly discrete) measurable space W, where ρy depends smoothly on y, and ρyρy for yyM. Let V be the space of positive volume forms with finite total volume on M. Take an arbitrary element VV and consider it as the initial state of the mind M of an agent. Here W stands for (a part of) the world for the agent. Finding a datum wW in his world, the agent can take the value ρy(w) as a smooth positive function ρ¯w:yρy(w) on M, which is called the likelihood of the datum w. Then, he can multiply the volume form V by ρ¯w>0 to obtain a new element of V. This defines the updating map

φ:W×V(w,V)ρ¯wVV. (3)

The “psychological” probability density pV on the mind M defined by pVdvol=V/MV is accordingly updated into the density pφ(w,V)pVρ¯w, which is called the conditional probability density given the datum w. Practically, Bayes’ rule on conditional probabilities is expressed as

P(yΔy|wΔw)=P(wΔw|yΔy)P(wΔw)·P(yΔy). (4)

Here P denotes the probability of an event, Δy (respectively, Δw) a small portion of M (respectively, W). Since the state of the world does not depend on the mind of the agent, the probability P(wΔw) is independent of y, and therefore approximates a constant on M. On the other hand, the conditional probability P(wΔw|yΔy) of the datum w approximates a function of y which is clearly proportional to the above likelihood. This implies that the factor P(wΔw|yΔy)P(wΔw) in the right-hand side of Bayes’ rule (Equation (4)) is approximately proportional to the likelihood. Thus, Bayes’ rule (Equation (4)) implies the updating of pV via the formula (Equation (3)). The Bayesian product · mentioned in the introduction appears in this context. Namely, the variable of the first factor is the mean y of a normal distribution on W. The density of the normal distribution at the datum w can be considered as a function of y, which is proportional to a normal distribution on M. Thus the Bayesian product of normal distributions on M presents the updating of the density of the predictive mean in the mind of the agent.

The aim of Bayesian updating is practical to many people. Indeed, the aim of the above updating is the estimation of the mean. Nevertheless, it is quite natural that a geometer multiplies a volume form by a positive function once he is given them. In this regard, we can say that the aim of Bayesian updating is a geometric setting of a dynamical system. In particular, a Bayesian updating in the conjugate prior is at first, simply the iteration of a Bayesian product.

2.3. The Information Geometry

Suppose that a manifold U is embedded in the subset {VVMV=1}={pVdvolVV}. Hereafter, we identify the element pVdvolU with the “psychological” probability density pV. We call U a conjugate prior for the updating map φ if the cone U˜={etVtR,VU} satisfies φ(W×U˜)U˜. (Whether there exists a preferred conjugate prior or not, how to determine the initial state of the mind is another interesting problem. For example, one may fix the asymptotic behavior of the state of mind according to the aim of the Bayesian inference and search for the optimal decision of the initial state. See [7] for an approach to this problem via the information geometry.)

Now we define the “distance” D˜:U˜×U˜R on U˜, which satisfies none of the axioms of distance, by

D˜(V1,V2)=MV1lnV1V2(therelativeentropy). (5)

From the convexity of ln, we see that the restriction D=D˜|U×UlnMV2=0 is a separating premetric on U, which is called the Kullback–Leibler divergence in information theory. This implies that the germ of D along the diagonal set Δ of U×U represents the zero section of the cotangent sheaf of U, that is, for any point x=(x1,,xn) of any chart of U, the Taylor expansion of Dx+12dx,x12dx has no linear terms. Thus the differential dD:TU×TUR also vanishes on the diagonal set Δ of TU×TU. We regard the 1-form on TU represented by the germ of dD along Δ as a quadratic tensor, and denote it by g (note that gx:TxU×TxUR is linear). It appears as two times the quadratic terms 12i,jgijdxidxj (gij=gji) in the above Taylor expansion. Of course, it also appears in the Taylor expansion of Dx12dx,x+12dx. Thus it can also be considered as the quadratic terms of the symmetric sum Dx+12dx,x12dx+Dx12dx,x+12dx. The symmetric matrix [gij] is called the Fisher information in information theory. From the non-negativity of D, we may assume generically that g is a Riemannian metric. We would like to notice that this construction of Riemannian metric by means of symmetric sum also works over U˜. Indeed, we have

D˜(et+12dtV,et12dtV)+D˜(et12dtV,et+12dtV)=etdt2+O(dt3). (6)

Let 0 be the Levi–Civita connection of g. We write the lowest degree terms in the Taylor expansions of Dx+12dx,x12dxDx12dx,x+12dx as 13i,j,kTijkdxidxjdxk (Tijk=Tjik=Tikj). This presents the symmetric cubic tensor T, which can be constructed from the anti-symmetric difference D(x1,x2)=D(x1,x2)D(x2,x1) similarly as above. One can use it to deform the Levi–Civita connection 0 into the α-connections α=0αgT (αR) without torsion, where g denotes the dual metric. Especially, we call 1 and 1 respectively the e-connection and the m-connection. The symmetric tensor T is sometimes called skewness since it presents the asymmetry of D. The information geometry concerns the family of α-connections as well as the Fisher information metric on U. We usually do not extend it over U˜ for the symmetric sum of D˜ lacks asymmetry.

2.4. The Geometry of Normal Distributions

We consider the space U of multivariate normal distributions on M=Rn. A vector μ=(μi)1inRn and an upper triangular matrix C=[cij]1i,jnMat(n,R) with positive diagonal entries parameterize U by declaring that μ presents the mean and CTC the Cholesky decomposition of the covariance matrix. Further we put

σi=cii,rij=cijcii(i,j{1,,n}),C=diag(σ)[rij]. (7)

The matrix [rij] is unitriangular, i.e., a triangular matrix whose diagonal entries are all 1. Then, each point x=(μ,σ,r)U=Rn×(R>0)n×Rn(n1)/2 presents the volume form

Vx=1(2π)nσ1σnexp12C(σ,r)T(yμ)2dvol. (8)

Let ·2 denote the sum of the squares of the entries of a matrix as well as a vector. The relative entropy defines the premetric D(x,x=(μ,σ,r)) by

D(x,x)=C(σ,r)T(μμ)22+C(σ,r)C(σ,r)12n2i=1nlnσiσi, (9)

Let 1n denote the unit matrix, and ΔC the difference C(σ+Δσ,r+Δr)C(σ,r). We have

D(x+Δx,x)=CTΔμ22+ΔCC122+tr(ΔCC1)ln1n+ΔCC1. (10)

Let rij be the entries of the inverse matrix of [rij]. We have

(theij-entryofΔCC1)=Δσiσi(i=j)σi+Δσiσjk=i+1jrkjΔrik(i<j)110(i>j). (11)

From Equations (10) and (11), we see that the Fisher information metric g is expressed as

g=k=1n1σki=1krikdμi2+2i=1ndσiσi2+l=1n1k=l+1nσlσki=l+1krikdrli2. (12)

Put

gμμ=gμi,μj=ki,jrikrjkσk2=C1CT,gσσ=diag2σi2,grr,l=grlirlji,j>l=σl2gμi,μji,j>l(l=1,,n1). (13)

Then, the representing matrix of g is the following block diagonal matrix:

diag(gμμ,gσσ,grr,2,,grr,n). (14)

Lowering the upper indices of the α-connection with LgKLΓαIJK=Γ{I,J},Kα, we have

Γ{μi,μj},σk0=Γ{μi,σk},μj0=rikrjkσk3, (15)
Γ{σi,σi},σi0=2σi3, (16)
Γ{μi,μj},rab0=Γ{μi,rab},μj0=k=bnrbk(riarjk+rikrja)2σk2, (17)
Γ{rli,rlj},σl0=Γ{rli,σl},rlj0=ki,jσlrikrjkσk2, (18)
Γ{rli,rlj},σk0=Γ{rli,σk},rlj0=σl2rikrjkσk3(ki,j), (19)
Γ{rli,rlj},rab0=Γ{rli,rab},rlj0=σl2Γ{μi,μj},rab0(a>l), (20)
Γ{I,J},K0=0(fortheotherchoicesof {I,J}andK). (21)

With respect to the same coordinates, the coefficients of the e-connection are

Γ{μi,σk},μj1=2Γ{μi,σk},μj0, (22)
Γ{σi,σi},σi1=3Γ{σi,σi},σi0, (23)
Γ{μi,rab},μj1=2Γ{μi,rab},μj0, (24)
Γ{rli,rlj},σl1=2Γ{rli,rlj},σl0, (25)
Γ{rli,σk},rlj1=2Γ{rli,σk},rlj0(ki,j), (26)
Γ{rli,rab},rlj1=2Γ{rli,rab},rlj0(a>l), (27)
Γ{I,J},K1=0(fortheotherchoicesof {I,J}andK). (28)

Those of the m-connection are

Γ{μi,μj},σk(1)=2Γ{μi,μj},σj0, (29)
Γ{σi,σi},σi(1)=Γ{σi,σi},σi0, (30)
Γ{μi,μj},rab(1)=2Γ{μi,μj},rab0, (31)
Γ{rli,σl},rlj(1)=2Γ{rli,σl},rlj0, (32)
Γ{rli,rlj},σk(1)=2Γ{rli,rlj},σk0(ki,j), (33)
Γ{rli,rlj},rab(1)=2Γ{rli,rlj},rab0(a>l), (34)
Γ{I,J},K(1)=0(fortheotherchoicesof {I,J}andK). (35)

There is a particular system of coordinates for describing the e-connection. Namely, all the coefficients vanish with respect to the natural parameter (C1CTμ,ξ), where ξ=(ξab)1abn is the upper half of C1CT. On the other hand, all the coefficients for the m-connection vanish with respect to the expectation parameter (μ,ν), where ν=(νab)1abn is the upper half of CTC+μμT.

2.5. The Generalization

This subsection is devoted to the generalization of the result of the author, which is mentioned in the introduction, to the above multivariate setting. We fix the third component r of the coordinate system (μ,σ,r), and change the presentation of the others. Namely, we take the natural projection π:U=Hn×Rn(n1)/2Rn(n1)/2 and replace the coordinates (μ,σ) on the fiber L(r)=π1(r) by (m,s) appearing in the next proposition. The generalization is then straightforward.

Proposition 1.

The fiber L(r)=π1(r) is an affine subspace of U with respect to the e-connection 1. It can be parameterized by affine parameters misi2 and 1si2, where m=[rij]Tμ and s=2σ.

Proof. 

The natural parameters C1CTμ=2k=inrikmksk21in and ξ=k=bnrakrbk1sk21abn are affine provided that rij are constant, and misi2 and 1si2 are affine. □

Note that dr(μαμ) is identically zero on some/any fiber L(r) if and only if α=1. The fiber satisfies the following two properties.

Proposition 2.

L(r) is closed under the convolution ∗ and the Bayesian product ·, and thus inherits them.

Proof. 

The covariance of the density at (μ,σ,r)(μ,σ,r) is

[rij]Tdiag((σi2))[rij]+[rij]Tdiag((σi2))[rij]=[rij]Tdiag((σi2+σi2))[rij]. (36)

This coincides with that of the density at μ+μ,σi2+σi2,r. Thus L(r) is closed under ∗, and inherits it as (m,s)(m,s)=m+m,si2+si2.

Put u=(σi2) and y=[rij]Tx. The density at (μ,σ,r) is proportional to expy(x)Tdiag(u)y(x)2mTdiag(u)y(x). From this we see that (μ,σ,r)·(μ,σ,r)=(μ,σ,r) implies r=r, u=u+u and diag(u)m=diag(u)m+diag(u). Thus L(r) is closed under ·, and inherits it as (m,s)·(m,s)=misi2+misi2si2+si2,sisisi2+si2. □

Proposition 3.

The fiber L(r) with the induced metric from g admits a Kähler complex structure.

Proof. 

The restriction of g is 2i=1ndmi2+dsi2si2. We define the complex structure J:TL(r)TL(r) by J(mi)=si (J(si)=mi). Then, the 2-form ω=2i=1ndmidsisi2 satisfies ω(·,J·)=g(·,·). □

We write the restriction D|L(r) of the premetric D using the coordinates (m,s) as follows, where we omit r for the expression does not depend on r.

D|L((m,s),(m,s))=12i=1nmisimisi2+si2si21lnsi2si2. (37)

We take the product U1×U2 of two copies of U. Then, the products L1(r)×L2(R) of the fibers foliate U1×U2. We call this the primary foliation of U1×U2. For each (r,R)Rn(n1), we have the coordinate system (m,s,M,S) on the leaf L1(r)×L2(R). From the Kähler forms ω1=2i=1ndmidsisi2 and ω2=2i=1ndMidSiSi2 respectively on L1(r) and L2(R), we define the symplectic forms ω1±ω2 on L1(r)×L2(R), which induce the mutually opposite orientations in the case where n is odd. Hereafter, we consider the pair of regular Poisson structures defined by these symplectic structures on the primary foliation, and fix the primitive 1-forms λ±=2i=1ndmisi±dMiSi. The corresponding pair of Poisson bi-vectors is Π±=12i=1nsi2misi±Si2MiSi defined on U1×U2. We take the 2n-dimensional submanifolds Fε,δ=misi+MiεiSi=0,siSi=δi(i=1,,n) of the leaf L1(r)×L2(R) for εRn and δ(R>0)n. The secondary foliation of U1×U2 foliates any leaf U(r)×U(r) by the 3n-dimensional submanifolds Fε=δ(R>0)nFε,δ for εRn. The tertiary foliation of U1×U2 foliates all leaves Fε of the secondary foliation by the 2n-dimensional submanifolds Fε,δ for δ(R>0)n.

Proposition 4.

With respect to the Kähler form dλ, the tertiary leaves Fε,δ are Lagrangian correspondences.

Proof. 

The tangent space TFε,δ is generated by simiSiMi and 2mimi+sisiSiSi. We have dλ(simiSiMi,mi)=0 and dλ(simiSiMi,sisiSiSi)=2si2si2(Si)2Si2=0. Thus Fε,δ are Lagrangian submanifolds of (L1(r)×L2(R),dλ). □

The restrictions of λ±|N to the hypersurface N=(m,s,M,S)L1×L2|i=1n(siSi)=1 are contact forms. Let η± denote them.

Proposition 5.

For any ε and δ with i=1nδi=1, the submanifold Fε,δN is a disjoint union of n-dimensional submanifolds {s=const}Fε,δ which are integral submanifolds of the hyper plane field kerη+ on N.

Proof. 

We have λ+(simiSiMi)=11=0. □

For each point (ε,δ)Hn, we have the diffeomorphism F^ε,δ:HnHn sending (m,s)Hn to (M,S)Hn with (m,s,M,S)Fε,δ. We put hi=lnsiSiδi and

fε,δ(m,s,M,S)=12i=1nMiεiSi+ehimisi2+e2hi1+2hi. (38)

Then, we have D|L((m,s),(m,s))=fε,δ((m,s),F^ε,δ(m,s)). For any ζ(R>0)2n, we define the diffeomorphism φε,ζ:(m,s,M,S)(ζ2i1mi),(ζ2i1si),(εi+ζ2i(Miεi)),(ζ2iSi)), which preserves the 1-forms λ±. It is easy to prove the following proposition.

Proposition 6.

In the case where ζ2i1ζ2i=1 for i=1,,n, the diffeomorphism φε,ζ preserves fε,δ.

For each εRn, we take the set fε=(fε,δ,Fε,δ)δ(R>0)n, and consider it as a structure of the secondary leaf Fε.

Proposition 7.

For any ζ(R>0)n, the diffeomorphism φε,ζ preserves the set fε for any εRn. In the case where ζ satisfies i=1n(ζ2i1ζ2i)=1, the diffeomorphism φε,ζ also preserves the hypersurface N.

Proof. 

Fε,δ maps to Fε,(ζ2i1ζ2iδi), and fε,(ζ2i1ζ2iδi)φε,ζ=fε,δ holds. Further i=1nδi=1 implies i=1n(ζ2i1ζ2iδi)=1 provided that i=1n(ζ2i1ζ2i)=1. □

For any δ(R>0)n, the diffeomorphism F^0,δ interchanges the operation (m,s)(m,s)=m+m,si2+si2 with the operation (m,s)·(m,s)=misi2+misi2si2+si2,sisisi2+si2.

Proposition 8.

If (m,s,M,S),(m,s,M,S)F0,δ, then

((m,s)·(m,s),(M,S)(M,S))F0,δ((m,s)(m,s),(M,S)·(M,S))F0,δ. (39)

Proof. 

Putting (m,s)=(m,s)·(m,s) we have Mi+Mi=δimisi2δimisi2=δimisi2 and Si2+Si2=δi2si2+δi2si2=δi2si2, hence the first equation. The second equation is similarly proved. □

A curve (m(t),s(t))Hn is a geodesic with respect to the e-connection 1 if and only if misi2 and 1si2 are affine functions of t for i=1,,n.

Definition 1.

We say that an e-geodesic (m(t),s(t))Hn is intensive if it admits an affine parameterization such that 1si2 are linear for i=1,,n.

Note that any e-geodesic is intensive in the case where n=1.

Proposition 9.

Given an intensive e-geodesic (m(t),s(t)), we can change the parameterization of its image M(t),S(t)=εimi(t)δisi2,δisi under the diffeomorphism F^ε,δ to obtain an intensive e-geodesic.

Proof. 

Put misi2=ait+bi and 1si2=cit. We have MiεSi2=aiciδibiciδi·1t and 1Si2=1ciδi2·1t. They are respectively an affine function and a linear function of 1t. □

We have the hypersurface N=i=1nsiSi=1Hn which carries the contact forms η±=2i=1ndmisi±dMiSi|N. This is defined on any leaf L1(r)×L2(R)H2n of the primary foliation of U1×U2. Now we state the main result.

Theorem 1.

The contact Hamiltonian vector field Y of the restriction of the function i=1nmisi to N for the contact form η+ coincides with that for the other contact form η. It is tangent to the tertiary leaves Fε,δ and defines flows on them. Here each flow line presents the correspondence between intensive e-geodesics in Proposition 9.

Corollary 1.

For any δ(R>0)n, the flow on the leaf F0,δ presents the iteration of the operation ∗ on the first factor of U1×U2 and that of the operation · on the second factor as is described in Proposition 8 (see Figure 1).

Figure 1.

Figure 1

On any leaf of the primary foliation of U1×U2, there is a bi-contact hypersurface N carrying the bi-contact Hamiltonian vector field Y. Because of the dimension, the surface F in the figure presents simultaneously a leaf of the secondary foliation and a leaf of the tertiary foliation of that leaf. The flow on the tertiary leaf F=F0,δ traces the common lift of the iteration of ∗ on L1 and the iteration of · on L2.

From Corollary 1, we see that the flow model of Bayesian inference studied in [4,5] also works for the multivariate case. We prove the theorem.

Proof. 

Take the vector field Y˜=14i=1n2mimi+sisiSiSi on U1×U1. It satisfies ιY˜λ±=i=1nmisi, L˜Yλ±=λ±, and ιY˜dln(siSi)=0, and thus its restriction to N is the contact Hamiltonian vector field Y. It also satisfies ιY˜dmisi+MiεiSi=14misi+MiεiSi (i=1,,n), where the right-hand sides vanish along Fε,δ. Given a point (P0,Q0)=(m0,s0,r0,M0,S0,R0)NL1(r0)×L2(R0)U×U, we have the integral curve (m(t),s(t),r(t),M(t),S(t),R(t))=(e2tm0,ets0,r0,M0,etS0,R0) of Y with initial point (P0,Q0). We can change the parameter of the curve (m(t),s(t),r0) on the first factor with t=e2t to obtain an intensive e-geodesic. □

2.6. The Symmetry

The diffeomorphism φε,ζ with i=1n(ζ2i1ζ2i)=1 in Proposition 7 also appears in the standard construction of Hilbert modular cusps in [8]. We sketch the construction.

Since the function H=lni=1n(siSi) is linear in the logarithmic space R2n(lnsi),(lnSi), we can take 2n1 points ((lnζ2i1(k)),(lnζ2i(k))){H=0} (k=1,,2n1) so that the quotient of the primary leaf L(r)×L(R) under the Z2n1-action generated by φε,ζ(1), ..., φε,ζ(2n1) is the total space of a vector bundle over T2n1×R. Here T2n1 is the quotient of {H=0} under the Z2n1-lattice generated by the above points, and the fiber R2n consists of the vectors (m,Mε).

In the univariate case (n=1), on the logarithmic sS-plane, we can take any point (lns,lnS)=(lnζ1(1),lnζ2(1)) of the line H=lns+lnS=0 other than the origin. That is, we put ζ1(1)=et and ζ2(1)=et (t0). Then, the map φε,ζ sends (m,s,M,S) to (etm,ets,et(Mε),etS). The Z-action generated by it rolls up the level sets of H, so that the quotient of the logarithmic sS-plane becomes the cylinder T1×R, which is the base space. On the other hand, the mM-plane, which is the fiber, expands horizontally and contracts vertically. This is the inverse monodromy along T1. In general, we obtain a similar R2n-bundle over T2n1×R if we take the 2n1 points of H=0 in general position.

From Proposition 7, we see that the leaf Fε of the secondary foliation with the set fε, as well as the pair of the 1-forms λ± with the function H, descends to the R2n-bundle. If there exists further a Z2n-lattice on the fiber R2n which is simultaneously preserved by the maps (mi),(Miεi)(ζ2i1(k)mi),ζ2i(k)(Miεi) (k=1,,2n1), we obtain a T2n-bundle over T2n1×R. Such a choice of ζ(k) would be number theoretical. Indeed, this is the case for Hilbert modular cusps. Moreover, we are considering the 1-forms λ±, which descend to the T2n-bundle. See [9] for the standard construction with special attention to these 1-forms.

We should notice that the vector field Y does not descend to the T2n-bundle. However, every actual Bayesian inference along Y eventually stops. Thus, we may take sufficiently large T2n and consider Y as a locally supported vector field to perform the inference in the quotient space.

3. Discussion

Finally, we would like to comment on the transverse geometry of the primary foliation. The author conjectures that it has some relation to the M-theory. See e.g., [10] for a relation between Poisson geometry and matrix theoretical or non-commutative geometrical physics.

The premetric D((μ+Δμ,σ+Δσ,r+Δr),(μ,σ,r)) on U can be decomposed as

D((μ+Δμ,σ+Δσ,r),(μ,σ,r))+i=1n1j=i+1nσi+Δσiσjk=i+1jrkjΔrik2. (40)

The first term on the right-hand side presents the fiber premetric D|L(m+Δm,s+Δs),(m,s), where Δm=[rij]TΔμ and Δs=2Δσ. If Δσ=0 (Δs=0), then we have the (non-information geometrical) Pythagorean-type formula

D((μ+Δμ,σ,r+Δr),(μ,σ,r))=D((μ+Δμ,σ,r),(μ,σ,r))+D((μ,σ,r+Δr),(μ,σ,r)). (41)

Note that each term in this expression does not depend on μ. The second term on the right-hand side can be expressed as

D((μ,σ,r+Δr),(μ,σ,r))=i=1n1j=i+1nσiσjk=i+1jrkjΔrik2. (42)

This presents the discretized version of the following restriction of the Fisher information g:

g|{(μ,σ)=const}=i=1n1j=i+1nσiσjk=i+1jrkjdrik2. (43)

We have the orthonormal frame with respect to this metric which consists of

eij=σiσjk=jnrjkrik(1i<jn). (44)

This frame satisfies the relations [eij,ekl]=δilekjδkjeil of the unitriangular algebra, where δ·,· denotes Kronecker’s delta. Using the dual coframe eij, the relations can be expressed as

deij=k=i+1j1eikekj. (45)

The transverse section of the primary foliation of U1×U2 is the product of two copies of the unitriangular Lie group, which we would like to call the bi-unitriangular group. We fix the frame (respectively, the coframe) of the transverse section consisting of the above eij (respectively, eij) in the first factor U1 and their copies Eij (respectively, Eij) in the second factor U2. The quotient manifold of the bi-unitriangular group by a cocompact lattice inherits a Riemannian metric from the sum of Fisher informations, and carries the following global (n2)-plectic structure Ω (dΩ=0, Ωn>0):

Ω=i=1nei,i+1ei,nEni+1,ni+2Eni+1,n. (46)

We notice that, in the symplectic case where n=3, the quotient manifold admits no Kähler structure (see [11]). However, it is still remarkable that the transverse symplectic 6-manifold is naturally ignored in the Bayesian inference described in this paper. Conjecturally, a similar model would help us to treat events in parallel worlds (or blanes) in the same “psychological” procedure.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

References

  • 1.Amari S. Information Geometry and Its Applications. Springer; Tokyo, Japan: 2016. [Google Scholar]
  • 2.Ay N., Jost J., Le H.V., Schwachhöfer L. Information Geometry. Springer; Gewerbesrasse, Switzerland: 2017. [Google Scholar]
  • 3.Felice N., Ay N. Towards a Canonical Divergence within Information Geometry. arXiv. 20181806.11363 [Google Scholar]
  • 4.Mori A. Information geometry in a global setting. Hiroshima Math. J. 2018;48:291–306. doi: 10.32917/hmj/1544238029. [DOI] [Google Scholar]
  • 5.Mori A. A concurrence theorem for alpha-connections on the space of t-distributions and its application. Hokkaido Math. J. 2020 in press. [Google Scholar]
  • 6.Geiges H. An Introduction to Contact Topology. Cambridge University Press; Cambridge, UK: 2008. [Google Scholar]
  • 7.Snoussi H. Bayesian information geometry: Application to prior selection on statistical manifolds. In: Hawkes P., editor. Advances in Imaging and Electron Physics 146. Elsevier; San Diego, CA, USA: 2007. pp. 163–207. [Google Scholar]
  • 8.Hirzebruch F. Hilbert modular surfaces. Enseign. Math. 1973;19:183–281. [Google Scholar]
  • 9.Massot P., Niederkrüger K., Wendl C. Weak and strong fillability of higher dimensional contact manifolds. Invent. Math. 2013;192:287–373. doi: 10.1007/s00222-012-0412-5. [DOI] [Google Scholar]
  • 10.Kuntner N., Steinacker H. On Poisson geometries related to noncommutative emergent gravity. J. Geom. Phys. 2012;62:1760–1777. doi: 10.1016/j.geomphys.2012.04.002. [DOI] [Google Scholar]
  • 11.Cordero L., Fernández M., Gray A. Symplectic manifolds with no Kähler structure. Topology. 1986;25:375–380. doi: 10.1016/0040-9383(86)90050-9. [DOI] [Google Scholar]

Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)

RESOURCES