Entropy. 2020 Apr 20;22(4):467. doi: 10.3390/e22040467

Weyl Prior and Bayesian Statistics

Ruichao Jiang 1, Javad Tavakoli 1,*, Yiqiang Zhao 2
PMCID: PMC7516948  PMID: 33286240

Abstract

When using Bayesian inference, one needs to choose a prior distribution for parameters. The well-known Jeffreys prior is based on the Riemann metric tensor on a statistical manifold. Takeuchi and Amari defined the α-parallel prior, which generalized the Jeffreys prior by exploiting a higher-order geometric object, known as a Chentsov–Amari tensor. In this paper, we propose a new prior based on the Weyl structure on a statistical manifold. It turns out that our prior is a special case of the α-parallel prior with the parameter α equaling −n, where n is the dimension of the underlying statistical manifold and the minus sign is a result of conventions used in the definition of α-connections. This makes the choice for the parameter α more canonical. We calculated the Weyl prior for the univariate and multivariate Gaussian distributions. The Weyl prior of the univariate Gaussian turns out to be the uniform prior.

Keywords: information geometry, Bayesian statistics, prior distributions, conformal geometry

1. Introduction

In Bayesian inference, a parameter is regarded as a random variable Θ. A density of Θ is called, by abuse of terminology, a prior distribution p(θ). After collecting some data, one obtains a conditional density p(x|θ), referred to as the likelihood function. Bayes’ theorem then gives the posterior distribution p(θ|x) in terms of p(θ) and p(x|θ). This is interpreted as an update of the information about the unknown parameter Θ. One choice of the prior distribution p(θ) is the Jeffreys prior ${}^{J}\omega$, which is the correct notion of a uniform distribution. Here, the word “uniform” means uninformative, that is, not favoring any particular value of the parameter.
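For reference, the update described above is the standard statement of Bayes' theorem. It is written here in the notation of this section, under the assumption that θ is continuous so that the normalizing constant is an integral over the parameter space Θ (the dummy variable θ′ is introduced only for this display):

```latex
p(\theta \mid x) \;=\; \frac{p(x \mid \theta)\, p(\theta)}{\int_{\Theta} p(x \mid \theta')\, p(\theta')\, d\theta'}
```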

Information geometry, in its narrowest sense, is an attempt to use differential geometry to study statistical inference. It has found applications in statistical inference, signal processing, and machine learning [1]. References [2,3] are two elementary introductions. In information geometry, geometric structures, for example metric tensors $g$ and affine connections $\nabla$, can be put on the set of prior distributions P(θ). These geometric structures help to single out some particular prior distributions, for example, the Jeffreys prior ${}^{J}\omega$, which, by the fundamental theorem of Riemannian geometry, is the unique volume form parallel with respect to the Levi–Civita connection ${}^{LC}\nabla$. Since the Jeffreys prior ${}^{J}\omega$ is provided by geometry, it is automatically invariant under reparametrization. This reflects the view that a transformation of parameters can at best preserve information, never add to it, which is encoded in the notion of sufficient statistics. Similarly, if one can find a unique prior distribution satisfying some specified geometric conditions, then that prior distribution is called canonically chosen. Matsuzoe, Takeuchi, and Amari used information geometry to define the α-parallel prior ${}^{\alpha}\omega$ such that, when α=0, it reduces to the Jeffreys prior [4].

Historically, Weyl proposed a generalization of general relativity to unify gravity and electromagnetism. Einstein soon pointed out that Weyl’s theory predicted a substantial spread in the characteristic lengths of atoms, which contradicts the sharp atomic spectral lines that are observed. Even though Weyl geometry failed to unify gravity and electromagnetism, which is still an open problem, it has found applications in possible generalizations of general relativity [5] and in the differential-geometric study of defects in continuum mechanics [6]. Weyl geometry has also survived as a subject in mathematics. The relation among affine differential geometry, Weyl geometry, and Riemannian geometry is as follows. Let $\pi: E \to B$ be a fibre bundle with base $B$ and let each fibre $\pi^{-1}(x)$, $x \in B$, be a Lie group $G$, called the structure group of the fibre bundle. For different $G$, we obtain different geometries as follows:

  1. GL(n), the general linear group, affine differential geometry;

  2. $C(n) := \{kA \mid k \in \mathbb{R}^{+},\ A \in O(n)\}$, the conformal group, Weyl geometry;

  3. O(n), the orthogonal group, Riemannian geometry.

We have the reduction of structure groups $O(n) \subset C(n) \subset GL(n)$, and the smaller the structure group, the more geometric properties are expected. In our case, the reduction of the group gives rise to a canonical choice for the parameter α of the α-parallel prior. For more about bundle-theoretic differential geometry, see [7].

In this paper, we will use Weyl geometry to define a prior distribution for Bayesian inference, which we call the Weyl prior. We will elucidate the relation between the dimension of a statistical manifold and the parameter α in Takeuchi and Amari’s α-parallel prior.

The organization of the paper is as follows: In Section 2, we review information geometry and the α-parallel prior. We discuss Weyl geometry in Section 3. We define the Weyl prior, and elucidate the relation between the Weyl prior and the α-parallel prior in Section 4. We calculate the Weyl prior for the univariate Gaussian distribution as an example in Section 5 and the multivariate Gaussian distribution in Section 6. All functions in the paper are real-valued and smooth, all connections are torsion-free, and the Einstein summation rule is used.

2. Information Geometry and α-Priors

In this section, we review some basics of information geometry. For more details, see [1].

Let us consider a statistical model P, which is a set of parametric densities P={p(x|θ)}. P can be geometrized as follows. First, we introduce the Fisher metric tensor, which is a 2nd order tensor.

Definition 1.

The Fisher metric tensor is defined by

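$$g_{ij} = E_\theta\left[\partial_i l\, \partial_j l\right], \tag{1}$$

where $E_\theta$ denotes the expectation with respect to the transition kernel $X \times \Theta \to [0, \infty)$, $l$ is the log-likelihood function, and $\partial_i$ is the partial derivative with respect to the $i$-th coordinate.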

Then, we introduce the Amari–Chentsov tensor, which is a 3rd order tensor.

Definition 2

(Amari–Chentsov Tensor). The Amari–Chentsov tensor C is defined by

$$C_{ijk} = E_\theta\left[\partial_i l\, \partial_j l\, \partial_k l\right]. \tag{2}$$

Remark 1.

The Amari–Chentsov tensor C defined above satisfies

$$C = \nabla g.$$

In other words, C is the covariant derivative of the metric tensor g.

In Riemannian geometry, $C$ vanishes everywhere, which is required if the length of a tangent vector is to be preserved under parallel transport. In information geometry, this requirement is dropped and thus a duality theory arises. Let $\nabla$ be an arbitrary torsion-free affine connection on a Riemannian manifold $(M,g)$. The dual connection $\nabla^*$ of $\nabla$ plays an important role in information geometry.

Definition 3

(Dual Connection). The dual connection $\nabla^*$ on a Riemannian manifold $(M,g)$ with affine connection $\nabla$ is defined as the unique affine connection satisfying the following equation:

$$X g_p(Y,Z) = g_p(\nabla_X Y, Z) + g_p(Y, \nabla^*_X Z), \tag{3}$$

where $p \in M$ and $X, Y, Z \in T_pM$.

Remark 2.

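The dual connection $\nabla^*$ preserves the metric tensor $g$ together with $\nabla$:

$$g_p(X,Y) = g_q(\Pi X, \Pi^* Y),$$

where $X, Y \in T_pM$, and $\Pi$ and $\Pi^*$ are the parallel transports induced by $\nabla$ and $\nabla^*$, respectively, along some curve from $p$ to $q$. In general, $g_p(X,Y) \neq g_q(\Pi X, \Pi Y)$ and $g_p(X,Y) \neq g_q(\Pi^* X, \Pi^* Y)$, unless $\nabla = \nabla^* = {}^{LC}\nabla$. See [1].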

Now, we introduce α-connections.

Definition 4.

The α-connections are defined in terms of Christoffel symbols by

$${}^{\alpha}\Gamma^{i}_{jk} = {}^{LC}\Gamma^{i}_{jk} - \frac{\alpha}{2}\, g^{il} C_{ljk}, \tag{4}$$

where $\alpha \in \mathbb{R}$ and LC stands for Levi–Civita.

Remark 3.

The dual connection of ${}^{\alpha}\nabla$ is then given by

$${}^{\alpha}\nabla^{*} = {}^{-\alpha}\nabla. \tag{5}$$
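Remark 3 can be checked directly from Definitions 3 and 4. The following is a short sketch, assuming that the α-connection carries a minus sign in front of the $\frac{\alpha}{2}$ term in Equation (4), using the total symmetry of $C$ (which is evident from Equation (2)), and writing $C(X,Y,Z)$ as shorthand for $C_{ijk}X^iY^jZ^k$:

```latex
\begin{aligned}
g({}^{\alpha}\nabla_X Y, Z) &= g({}^{LC}\nabla_X Y, Z) - \tfrac{\alpha}{2}\, C(X, Y, Z), \\
g(Y, {}^{-\alpha}\nabla_X Z) &= g(Y, {}^{LC}\nabla_X Z) + \tfrac{\alpha}{2}\, C(X, Z, Y), \\
g({}^{\alpha}\nabla_X Y, Z) + g(Y, {}^{-\alpha}\nabla_X Z)
  &= g({}^{LC}\nabla_X Y, Z) + g(Y, {}^{LC}\nabla_X Z) = X\, g(Y, Z).
\end{aligned}
```

The two $C$ terms cancel by symmetry, and the last equality is the metric compatibility of the Levi–Civita connection, so the pair satisfies Equation (3).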

Remark 4.

The α-parallel prior ${}^{\alpha}\omega$ is the volume form parallel with respect to ${}^{\alpha}\nabla$. Unlike the Jeffreys prior, which always exists, the α-parallel prior does not necessarily exist. An α-parallel prior exists if and only if the Ricci curvature tensor is symmetric [4]. However, if ${}^{\alpha}\omega$ exists for one nonzero $\alpha \in \mathbb{R}$, then it exists for all α [8].

The following characterization will be used in Section 4 to obtain the relation between the α-parallel prior and the Weyl prior defined therein.

Proposition 1.

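[4] Let $(M, g, {}^{\alpha}\nabla)$ be a statistical manifold. If there exists an exact 1-form $T = d\Omega$ for some function $\Omega$ determined by $\nabla$ and $g$, then the α-parallel prior is ${}^{\alpha}\omega = \exp\left\{-\frac{\alpha}{2}\Omega\right\}\sqrt{\det g}$.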

Remark 5.

dΩ is known as the Chebyshev 1-form. A differential form ϕ is called closed if the exterior derivative vanishes i.e., dϕ=0, and is called exact if there exists a differential form φ such that ϕ=dφ. By definition, every exact form is closed. By Poincare’s lemma, every closed form is locally exact. Because statistical manifolds are simply connected, closedness implies exactness.

3. Weyl Geometry

In this section, we review some concepts of Weyl geometry which are needed in the next section. For more details, see [9].

Two Riemannian metrics $g$ and $g'$ on a manifold $M$ are said to be conformally equivalent if $g' = e^{\lambda} g$ for some smooth function $\lambda$ on $M$.

A conformal structure $C$ on $M$ is an equivalence class of conformally equivalent Riemannian metrics, i.e., $C := \{g' \mid g' = e^{\lambda} g\}$.

A Weyl structure is a map $F: C \to \Lambda^{1}(M)$ from the conformal structure $C$ to the set of 1-forms on $M$, satisfying

$$F(e^{\lambda} g) = F(g) - d\lambda.$$

The image of $g$ under $F$ is called the Weyl 1-form $F(g) := \varphi$.

A Weyl structure enables us to translate a scalar product $(\,,)_p$ at $p$ to $(\,,)_q$ at $q$ along a curve $c: [0,1] \to M$:

$$(\,,)_q = \exp\left\{\int_0^1 c^{*}\varphi\right\} g_q, \tag{6}$$

where $c^{*}\varphi$ is the pullback of the Weyl 1-form $\varphi$ along the curve $c$. A Weyl manifold is a manifold with a Weyl structure.

Remark 6.

The meaning of this equation is as follows. If we start with a scalar product $(\,,)_p$ at a point $p$ arising from the conformal class $C$, then there exists a metric tensor $g \in C$ extending $(\,,)_p$, i.e., $g_p = (\,,)_p$. The value of this particular choice of $g$ at another point $q$ is $g_q$. However, a different choice of $g$ gives rise to a different $g_q$. The scalar product $(\,,)_q$ determined by Weyl translation is proven to be independent of $g$ [9]. Hence, by Weyl translation, we can compare lengths of vectors at different points on a Weyl manifold, whereas, with only the conformal structure $C$, we can only compare ratios of lengths.

An affine connection ∇ is said to be a Weyl connection if the parallel transport of a scalar product under ∇ coincides with the Weyl translation.

The Weyl connection is characterized by the following propositions.

Proposition 2

([9]). An affine connection $\nabla$ is a Weyl connection if and only if $\nabla g + \varphi \otimes g = 0$ for all $g \in C$.

Proposition 3

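(Fundamental Theorem of Weyl Geometry [9]). There exists a unique torsion-free Weyl connection ${}^{W}\nabla$ on a Weyl manifold $M$. The Christoffel symbols of ${}^{W}\nabla$ are given by

$${}^{W}\Gamma^{i}_{jk} = {}^{LC}\Gamma^{i}_{jk} + \frac{1}{2}\left(\delta^{i}_{j}\varphi_k + \delta^{i}_{k}\varphi_j - g^{im} g_{jk}\varphi_m\right), \tag{7}$$

where $\delta^{i}_{j}$ is the Kronecker delta.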
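As a consistency check between Propositions 2 and 3, one can verify in coordinates that the connection of Equation (7) satisfies the condition of Proposition 2. The sketch below assumes that the factor $\frac{1}{2}$ in Equation (7) multiplies all three $\varphi$ terms, that the last of them enters with a minus sign, and that $(\varphi \otimes g)_{kij} = \varphi_k g_{ij}$:

```latex
\begin{aligned}
{}^{W}\nabla_k g_{ij}
 &= \partial_k g_{ij} - {}^{W}\Gamma^{m}_{ki}\, g_{mj} - {}^{W}\Gamma^{m}_{kj}\, g_{im} \\
 &= {}^{LC}\nabla_k g_{ij}
    - \tfrac{1}{2}\left(g_{kj}\varphi_i + g_{ij}\varphi_k - g_{ki}\varphi_j\right)
    - \tfrac{1}{2}\left(g_{ik}\varphi_j + g_{ij}\varphi_k - g_{kj}\varphi_i\right) \\
 &= -\varphi_k\, g_{ij}.
\end{aligned}
```

Here ${}^{LC}\nabla_k g_{ij} = 0$, and the terms proportional to $\varphi_i$ and $\varphi_j$ cancel in pairs, which leaves ${}^{W}\nabla g + \varphi \otimes g = 0$.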

4. Weyl Prior

In this section, we define the Weyl prior and show its relation to the α-parallel prior.

First, we define the Weyl prior as follows.

Definition 5

(Weyl Prior). Let $(M,g)$ be an $n$-dimensional Riemannian manifold with the conformal structure $C = [g]$ and the Weyl structure $F$. Let ${}^{W}\nabla$ be the Weyl connection. The Weyl prior ${}^{W}\omega$ is defined as the unique volume form parallel with respect to ${}^{W}\nabla$.

Remark 7.

The uniqueness of the Weyl prior is the result of the uniqueness of the Weyl connection.

Now, we prove the main result of this paper.

Theorem 1.

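Let $(M,g)$ be a Riemannian manifold. Let ${}^{W}\nabla$ and ${}^{-n}\nabla$ be the Weyl connection and the $(-n)$-connection, i.e., the α-connection with $\alpha = -n$, where $n$ is the dimension of $M$. Suppose that the $(-n)$-parallel prior ${}^{-n}\omega$ exists. Then

$${}^{W}\omega = {}^{-n}\omega.$$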

Proof. 

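Consider an arbitrary volume form $f\sqrt{\det g}$, where $f$ is a positive function on $M$. For $f\sqrt{\det g}$ to be parallel with respect to ${}^{W}\nabla$, it is necessary and sufficient that

$${}^{W}\nabla\left(f\sqrt{\det g}\right) = f\,{}^{W}\nabla\sqrt{\det g} + \left({}^{W}\nabla f\right)\sqrt{\det g} = 0. \tag{8}$$

Componentwise, Equation (8) becomes

$$f\,{}^{W}\nabla_j\sqrt{\det g} + \left({}^{W}\nabla_j f\right)\sqrt{\det g} = f\,{}^{W}\nabla_j\sqrt{\det g} + \left(\partial_j f\right)\sqrt{\det g} = 0, \tag{9}$$

since the covariant derivative coincides with the partial derivative for functions.

Since $\sqrt{\det g}$ is a scalar density of weight 1, its covariant derivative is given by

$${}^{W}\nabla_j\sqrt{\det g} = \partial_j\sqrt{\det g} - {}^{W}\Gamma_j\sqrt{\det g}, \tag{10}$$

where ${}^{W}\Gamma_j$ is obtained by the contraction of Equation (7) over $i$ and $k$:

$$\begin{aligned}
{}^{W}\Gamma_j = {}^{W}\Gamma^{i}_{ji}
 &= {}^{LC}\Gamma^{i}_{ji} + \frac{1}{2}\left(\delta^{i}_{j}\varphi_i + \delta^{i}_{i}\varphi_j - g^{im} g_{ji}\varphi_m\right) \\
 &= \partial_j \ln\sqrt{\det g} + \frac{1}{2}\left(\varphi_j + n\varphi_j - \delta^{m}_{j}\varphi_m\right) \\
 &= \partial_j \ln\sqrt{\det g} + \frac{n}{2}\varphi_j.
\end{aligned}$$

Substituting Equation (10) into Equation (9), we obtain

$$\partial_j f = \frac{n}{2}\varphi_j f. \tag{11}$$

Since the covariant derivative coincides with the exterior derivative for functions, Equation (11) can be written without indices as

$$\varphi = \frac{2}{n}\, d\ln f. \tag{12}$$

Assume for now that the Weyl 1-form $\varphi$ is exact, that is, $\varphi = d\Omega$ for some function $\Omega$ on $M$. Then, from Equation (12), the Weyl prior is given by

$${}^{W}\omega = \exp\left\{\frac{n}{2}\Omega\right\}\sqrt{\det g}. \tag{13}$$

By comparison of Equation (13) with Proposition 1, the theorem is proved under the assumption of the exactness of the Weyl 1-form.

Since we proved that the Weyl prior ${}^{W}\omega$ is the α-parallel prior with $\alpha = -n$, and we required the existence of ${}^{-n}\omega$, our assumption of the exactness of the Weyl 1-form $\varphi$ is indeed true by Remark 4. □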
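The substitution step in the proof, from Equations (9) and (10) to Equation (11), can be spelled out as follows. This is only a sketch of the intermediate algebra: it uses the contracted symbol ${}^{W}\Gamma_j = \partial_j \ln\sqrt{\det g} + \frac{n}{2}\varphi_j$ obtained in the proof and the identity $\partial_j\sqrt{\det g} = \left(\partial_j \ln\sqrt{\det g}\right)\sqrt{\det g}$:

```latex
\begin{aligned}
{}^{W}\nabla_j \sqrt{\det g}
 &= \partial_j \sqrt{\det g}
    - \left(\partial_j \ln\sqrt{\det g} + \tfrac{n}{2}\,\varphi_j\right)\sqrt{\det g}
  = -\tfrac{n}{2}\,\varphi_j \sqrt{\det g}, \\
0 &= f\,{}^{W}\nabla_j\sqrt{\det g} + \left(\partial_j f\right)\sqrt{\det g}
  = \left(\partial_j f - \tfrac{n}{2}\,\varphi_j f\right)\sqrt{\det g}.
\end{aligned}
```

Dividing by $\sqrt{\det g} > 0$ gives $\partial_j f = \frac{n}{2}\varphi_j f$, and with $\varphi = d\Omega$ this integrates to $f = \exp\{\frac{n}{2}\Omega\}$ up to a positive constant factor.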

Remark 8.

The minus sign in $\alpha = -n$ is a result of the definition of the α-connection. By Remark 3, the dual connection of ${}^{\alpha}\nabla$ is ${}^{-\alpha}\nabla$; we would have $\alpha = n$ here, had we defined the α-connection to be its dual connection in Definition 4. It would seem more natural to consider the dual prior of the Weyl prior.

5. Weyl Prior for Gaussian Family

In this section, we calculate the Weyl prior of the Gaussian family as an example.

Example 1

(Gaussian Family). Consider the Gaussian family

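$$P = \left\{ p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\} \,\middle|\, (\mu,\sigma) \in M \right\}.$$

Choosing $(\mu, \sigma^2)$ as a coordinate system, we have

$$\partial_\mu l = \frac{x-\mu}{\sigma^2},$$
$$\partial_{\sigma^2} l = \frac{(x-\mu)^2}{2\sigma^4} - \frac{1}{2\sigma^2},$$

where $l$ is the log-likelihood function.

The first element of the Fisher metric tensor $g$ in the $(\mu, \sigma^2)$-coordinates is given by

$$g_{\mu\mu} = E_\theta\left[\partial_\mu l\, \partial_\mu l\right] = \int_{-\infty}^{\infty} \frac{(x-\mu)^2}{\sigma^4}\, \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\} dx = \frac{1}{\sigma^2},$$

where $E_\theta$ is the conditional expectation of $X$ given Θ. The other elements of the Fisher metric tensor are

$$g_{\mu\sigma^2} = g_{\sigma^2\mu} = 0,$$

and

$$g_{\sigma^2\sigma^2} = \frac{1}{2\sigma^4}.$$

Hence,

$$\sqrt{\det g} = \frac{1}{\sqrt{2}\,\sigma^3}. \tag{14}$$

To calculate the Weyl 1-form, we first calculate the Amari–Chentsov tensor $C$,

$$C_{\mu\mu\mu} = E_\theta\left[\partial_\mu l\, \partial_\mu l\, \partial_\mu l\right] = \int_{-\infty}^{\infty} \frac{(x-\mu)^3}{\sigma^6}\, \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\} dx = 0.$$

Similarly,

$$C_{\sigma^2\mu\mu} = C_{\mu\sigma^2\mu} = C_{\mu\mu\sigma^2} = \frac{1}{\sigma^4},$$
$$C_{\sigma^2\sigma^2\mu} = C_{\sigma^2\mu\sigma^2} = C_{\mu\sigma^2\sigma^2} = 0,$$

and

$$C_{\sigma^2\sigma^2\sigma^2} = \frac{1}{\sigma^6}.$$

Hence, the Weyl 1-form is given by

$$\varphi = \frac{1}{2} C_{ijk}\, g^{jk}\, d\theta^i = \frac{3}{2\sigma^2}\, d\sigma^2. \tag{15}$$

Now, it is easy to check that $\varphi = d\left(\frac{3}{2}\ln\sigma^2\right)$ is an exact form. Hence, for the Gaussian family $P$, a Weyl prior exists and is given by

$${}^{W}\omega = \exp\left\{\frac{2}{2}\cdot\frac{3}{2}\ln\sigma^2\right\}\frac{1}{\sqrt{2}\,\sigma^3} = \frac{1}{\sqrt{2}}. \tag{16}$$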

Remark 9.

Based on our calculation, we find that the Weyl prior for the univariate Gaussian distribution with unknown mean and unknown variance is just the uniform prior. This shows that the uniform prior is in fact an uninformative prior. This counter-intuitive result is related to the fact that every two-dimensional manifold is conformally flat, which can be proved using the existence of isothermal coordinates in two dimensions [10].
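The entries of the Fisher metric and the Amari–Chentsov tensor, the Weyl 1-form of Equation (15), and the constant value in Equation (16) can be cross-checked with exact rational arithmetic. The following is a minimal sketch and not part of the original derivation: the helper names (`gaussian_moment`, `poly_mul`, `expect`, `weyl_data`) are hypothetical, the scores are treated as polynomials in y = x − μ, and the only outside fact used is the Gaussian moment formula E[y^(2k)] = σ^(2k)·(2k−1)!!.

```python
from fractions import Fraction
from itertools import product


def gaussian_moment(k, v):
    """E[y^k] for y ~ N(0, v): 0 for odd k, v^(k/2) * (k-1)!! for even k."""
    if k % 2:
        return Fraction(0)
    double_factorial = 1
    for m in range(k - 1, 0, -2):
        double_factorial *= m
    return Fraction(v) ** (k // 2) * double_factorial


def poly_mul(p, q):
    """Multiply polynomials in y = x - mu, stored as {power: coefficient}."""
    out = {}
    for (a, ca), (b, cb) in product(p.items(), q.items()):
        out[a + b] = out.get(a + b, Fraction(0)) + ca * cb
    return out


def expect(p, v):
    """Expectation of a polynomial in y under N(0, v)."""
    return sum((c * gaussian_moment(k, v) for k, c in p.items()), Fraction(0))


def weyl_data(v):
    """Return (g, C, phi, det_g) at sigma^2 = v in (mu, sigma^2) coordinates."""
    v = Fraction(v)
    scores = [
        {1: 1 / v},                            # d l / d mu = y / v
        {2: 1 / (2 * v**2), 0: -1 / (2 * v)},  # d l / d sigma^2
    ]
    idx = range(2)
    # Fisher metric g_ij = E[s_i s_j] and Amari-Chentsov tensor C_ijk = E[s_i s_j s_k].
    g = [[expect(poly_mul(scores[i], scores[j]), v) for j in idx] for i in idx]
    C = [[[expect(poly_mul(poly_mul(scores[i], scores[j]), scores[k]), v)
           for k in idx] for j in idx] for i in idx]
    det_g = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det_g, -g[0][1] / det_g],
             [-g[1][0] / det_g, g[0][0] / det_g]]
    # Weyl 1-form components phi_i = (1/2) C_ijk g^jk.
    phi = [sum(C[i][j][k] * g_inv[j][k] for j in idx for k in idx) / 2 for i in idx]
    return g, C, phi, det_g


if __name__ == "__main__":
    for v in (Fraction(1, 2), 1, 3):
        g, C, phi, det_g = weyl_data(v)
        # Squared Weyl density with n = 2 and Omega = (3/2) ln sigma^2: v^3 * det g.
        print(v, phi, Fraction(v) ** 3 * det_g)
```

For each tested value of σ², the last printed quantity is the square of exp{(n/2)Ω}·√(det g) with n = 2 and Ω = (3/2) ln σ². It equals 1/2 in every case, in agreement with the constant 1/√2 of Equation (16).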

6. Multivariate Gaussian

The above example can be extended to the multivariate case. Consider the multivariate Gaussian distribution

$$f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\sqrt{\det\Sigma}}\exp\left\{-\frac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right\}, \tag{17}$$

where μ is the mean vector and Σ is the covariance matrix.

Using matrix calculus, we have

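$$\partial_\mu l = \Sigma^{-1}(x-\mu) \tag{18}$$

and

$$\begin{aligned}
\partial_\Sigma l
 &= -\frac{1}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1}(x-\mu)(x-\mu)^{\top}\Sigma^{-1} \\
 &= -\frac{1}{2}\Sigma^{-1} + \frac{1}{2}\left[\Sigma^{-1}(x-\mu)\right] \otimes \left[\Sigma^{-1}(x-\mu)\right] \\
 &= -\frac{1}{2}\Sigma^{-1} + \frac{1}{2}\left(\Sigma^{-1} \otimes \Sigma^{-1}\right)\left[(x-\mu) \otimes (x-\mu)\right].
\end{aligned} \tag{19}$$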

We can now compute the Fisher information matrix.

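$$\begin{aligned}
g_{\mu\mu} &= E_\theta\left[\partial_\mu l \otimes \partial_\mu l\right] \\
 &= \left(\Sigma^{-1} \otimes \Sigma^{-1}\right) E_\theta\left[(x-\mu) \otimes (x-\mu)\right] \\
 &= \left(\Sigma^{-1} \otimes \Sigma^{-1}\right)\Sigma \\
 &= \Sigma^{-1}\Sigma\Sigma^{-1} \\
 &= \Sigma^{-1},
\end{aligned} \tag{20}$$

where the step $E_\theta\left[(x-\mu) \otimes (x-\mu)\right] = \Sigma$ is the definition of the covariance matrix and the step $\left(\Sigma^{-1} \otimes \Sigma^{-1}\right)\Sigma = \Sigma^{-1}\Sigma\Sigma^{-1}$ is the action of the matrix tensor product.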

Similarly,

$$g_{\mu\Sigma} = g_{\Sigma\mu} = 0, \tag{21}$$

and

$$g_{\Sigma\Sigma} = \frac{1}{2}\Sigma^{-1} \otimes \Sigma^{-1}. \tag{22}$$

The Amari–Chentsov tensor can be computed in the same way:

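$$C_{\mu\mu\mu} = E_\theta\left[\partial_\mu l \otimes \partial_\mu l \otimes \partial_\mu l\right] = 0. \tag{23}$$
$$C_{\Sigma\mu\mu} = C_{\mu\Sigma\mu} = C_{\mu\mu\Sigma} = \Sigma^{-1} \otimes \Sigma^{-1}. \tag{24}$$
$$C_{\mu\Sigma\Sigma} = C_{\Sigma\mu\Sigma} = C_{\Sigma\Sigma\mu} = 0. \tag{25}$$
$$C_{\Sigma\Sigma\Sigma} = \Sigma^{-1} \otimes \Sigma^{-1} \otimes \Sigma^{-1}. \tag{26}$$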

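The detailed computation of Equation (26) is as follows, where we write $S = \Sigma^{-1}$ and $y = x - \mu$ for brevity:

$$\begin{aligned}
C_{\Sigma\Sigma\Sigma} &= E_\theta\left[\partial_\Sigma l \otimes \partial_\Sigma l \otimes \partial_\Sigma l\right] \\
 &= E_\theta\Big\{ -\tfrac{1}{8}\, S \otimes S \otimes S
   + \tfrac{1}{8}\, S \otimes (S \otimes S)(y \otimes y) \otimes S
   + \tfrac{1}{8}\, (S \otimes S)(y \otimes y) \otimes S \otimes S \\
 &\qquad - \tfrac{1}{8}\, (S \otimes S \otimes S \otimes S)(y \otimes y \otimes y \otimes y) \otimes S
   + \tfrac{1}{8}\, S \otimes S \otimes (S \otimes S)(y \otimes y) \\
 &\qquad - \tfrac{1}{8}\, S \otimes (S \otimes S)(y \otimes y) \otimes (S \otimes S)(y \otimes y)
   - \tfrac{1}{8}\, (S \otimes S)(y \otimes y) \otimes S \otimes (S \otimes S)(y \otimes y) \\
 &\qquad + \tfrac{1}{8}\, (S \otimes S \otimes S \otimes S \otimes S \otimes S)\left[y \otimes y \otimes y \otimes y \otimes y \otimes y\right] \Big\} \\
 &= \Sigma^{-1} \otimes \Sigma^{-1} \otimes \Sigma^{-1}.
\end{aligned} \tag{27}$$

The above expression can be evaluated using the 4th and 6th moments of the multivariate Gaussian.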

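The Weyl 1-form is then given by:

$$\begin{aligned}
\varphi &= \frac{1}{2} C_{ijk}\, g^{jk}\, d\theta^i
  = \frac{1}{2} C_{\Sigma\mu\mu}\, g^{\mu\mu}\, d\Sigma + \frac{1}{2} C_{\Sigma\Sigma\Sigma}\, g^{\Sigma\Sigma}\, d\Sigma \\
 &= \frac{1}{2}\left(\Sigma^{-1} \otimes \Sigma^{-1}\right)\Sigma\, d\Sigma
  + \frac{1}{2}\left(\Sigma^{-1} \otimes \Sigma^{-1} \otimes \Sigma^{-1}\right)\left(2\,\Sigma \otimes \Sigma\right) d\Sigma \\
 &= \frac{3}{2}\Sigma^{-1}\, d\Sigma = d\left(\frac{3}{2}\ln\det\Sigma\right).
\end{aligned} \tag{28}$$

The Weyl prior is thus given by:

$$\begin{aligned}
{}^{W}\omega &= \exp\left\{\frac{n + (n+1)n/2}{2}\cdot\frac{3}{2}\ln\det\Sigma\right\}\sqrt{\det\Sigma^{-1}\,\det\left(\tfrac{1}{2}\Sigma^{-1} \otimes \Sigma^{-1}\right)} \\
 &= \det\Sigma^{(3n^2+9n)/8}\;\frac{\det\Sigma^{(-2n-1)/2}}{2^{n/2}}
  = \frac{\det\Sigma^{(n-1)(3n+4)/8}}{2^{n/2}},
\end{aligned} \tag{29}$$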

where, in the first line, n+(n+1)n/2 is the dimension of the statistical manifold for the multivariate Gaussian distribution.

Remark 10.

The Weyl prior of the multivariate Gaussian distribution is generally not a uniform prior. However, Equation (29) shows that, when n=1, that is, in the univariate case, the Weyl prior is indeed the uniform prior. This is in accordance with our direct calculation for the univariate case.
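The exponent bookkeeping behind Equation (29) and Remark 10 can be checked mechanically. The sketch below only verifies the arithmetic of the exponents and takes the two factors as stated in Equation (29): the exponential contributes det Σ to the power (3n²+9n)/8 and the square root of det g contributes the power (−2n−1)/2. The function names are hypothetical.

```python
from fractions import Fraction


def manifold_dimension(n):
    """Dimension n + n(n+1)/2 of the multivariate Gaussian family."""
    return n + n * (n + 1) // 2


def weyl_exponent(n):
    """Exponent of det(Sigma) in Equation (29), assembled from its two factors."""
    from_exponential = Fraction(3 * manifold_dimension(n), 4)  # (dimension / 2) * (3 / 2)
    from_sqrt_det_g = Fraction(-2 * n - 1, 2)                  # as stated in Equation (29)
    return from_exponential + from_sqrt_det_g


if __name__ == "__main__":
    for n in range(1, 6):
        # The sum of the two exponents matches the closed form (n - 1)(3n + 4) / 8.
        assert weyl_exponent(n) == Fraction((n - 1) * (3 * n + 4), 8)
        print(n, weyl_exponent(n))
```

The exponent vanishes only at n = 1, which is the univariate case singled out in Remark 10.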

7. Discussion and Conclusions

We discussed Weyl geometry and the Weyl prior in this paper. We also calculated the Weyl prior for the Gaussian family as an example.

The underlying principle of the Jeffreys prior, the α-parallel prior, and the Weyl prior is the concept of invariance in statistics. The Jeffreys prior is invariant under a change of coordinates of the parameters. The Weyl prior and the α-parallel prior, as generalizations of the Jeffreys prior, automatically satisfy this invariance. Moreover, the Weyl prior, as a volume form defined on a Weyl manifold, is also invariant under a gauge transformation [11]. The generalized conjugate connection is likewise invariant under the gauge transformation [11].

One possible use of the Weyl prior is to motivate the use of the uniform prior for distributions with two parameters. This is because any two-dimensional manifold is conformally flat.

Acknowledgments

We thank the referees for their constructive comments and suggestions.

Author Contributions

R.J. contributed significantly to the paper. Throughout the process of this research, both J.T. and Y.Z. provided detailed advice, discussions, and suggestions. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Discovery Grants of the Natural Sciences and Engineering Research Council of Canada (NSERC) under No. 256233 and No. 163407. The APC was funded by NSERC No. 163407.

Conflicts of Interest

The authors declare no conflict of interest.

References

  • 1. Amari S. Information Geometry and Its Applications. Volume 194. Springer; Berlin/Heidelberg, Germany: 2016.
  • 2. Calin O., Udriste C. Geometric Modeling in Probability and Statistics. Springer; Basel, Switzerland: 2014.
  • 3. Nielsen F. An elementary introduction to information geometry. arXiv. 2018; arXiv:1808.08271. doi: 10.3390/e22101100.
  • 4. Matsuzoe H., Takeuchi J., Amari S. Equiaffine structures on statistical manifolds and Bayesian statistics. Differ. Geom. Its Appl. 2006;24:567–578. doi: 10.1016/j.difgeo.2006.02.003.
  • 5. Ciambelli L., Leigh R.G. Weyl Connections and their Role in Holography. arXiv. 2019; arXiv:1905.04339.
  • 6. Yavari A., Goriely A. Weyl geometry and the nonlinear mechanics of distributed point defects. Proc. R. Soc. A. 2012;468:3902–3922. doi: 10.1098/rspa.2012.0342.
  • 7. Kobayashi S., Nomizu K. Foundations of Differential Geometry. Volume 1. Wiley; New York, NY, USA: 1963.
  • 8. Takeuchi J., Amari S. α-parallel prior and its properties. IEEE Trans. Inf. Theory. 2005;51:1011–1023. doi: 10.1109/TIT.2004.842703.
  • 9. Folland G.B. Weyl manifolds. J. Differ. Geom. 1970;4:145–153. doi: 10.4310/jdg/1214429379.
  • 10. Kulkarni R. Conformally flat manifolds. Proc. Natl. Acad. Sci. USA. 1972;69:2675. doi: 10.1073/pnas.69.9.2675.
  • 11. Calin O., Matsuzoe H., Zhang J. Trends in Differential Geometry, Complex Analysis and Mathematical Physics. World Scientific; Singapore: 2009. Generalizations of conjugate connections; pp. 26–34.

Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
