Abstract
When using Bayesian inference, one needs to choose a prior distribution for the parameters. The well-known Jeffreys prior is based on the Riemannian metric tensor on a statistical manifold. Takeuchi and Amari defined the α-parallel prior, which generalizes the Jeffreys prior by exploiting a higher-order geometric object known as the Amari–Chentsov tensor. In this paper, we propose a new prior based on the Weyl structure on a statistical manifold. It turns out that our prior is a special case of the α-parallel prior with the parameter α equaling −n, where n is the dimension of the underlying statistical manifold and the minus sign is a result of conventions used in the definition of α-connections. This makes the choice of the parameter α more canonical. We calculate the Weyl prior for the univariate and multivariate Gaussian distributions. The Weyl prior of the univariate Gaussian turns out to be the uniform prior.
Keywords: information geometry, Bayesian statistics, prior distributions, conformal geometry
1. Introduction
In Bayesian inference, a parameter $\theta$ is regarded as a random variable. A density of $\theta$ is called, by abuse of terminology, a prior distribution $\pi(\theta)$. After collecting some data $x$, one obtains a conditional density $p(x \mid \theta)$, referred to as the likelihood function. Bayes' theorem then computes the posterior distribution $p(\theta \mid x)$ using $\pi(\theta)$ and $p(x \mid \theta)$. This is interpreted as an update of the information about the unknown parameter. One choice of prior distribution is the Jeffreys prior $\pi_J(\theta) \propto \sqrt{\det g(\theta)}$, where $g$ is the Fisher metric; it can be regarded as the appropriate notion of a uniform distribution. Here, the word "uniform" means uninformative, that is, not favoring any particular value of the parameter.
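The update from prior to posterior described above can be sketched numerically on a grid. This is only an illustration under assumptions that are not in the paper: the data are taken to be i.i.d. Gaussian with known unit standard deviation, the prior is flat on a bounded grid, and the function name `grid_posterior` is hypothetical.

```python
import math

def grid_posterior(data, thetas, prior, sigma=1.0):
    """Posterior density values on a uniform grid of parameter values.

    Assumes i.i.d. N(theta, sigma^2) observations (an illustrative choice).
    """
    step = thetas[1] - thetas[0]
    # Log-likelihood of the data at each grid point, up to an additive constant.
    log_like = [-sum((x - t) ** 2 for x in data) / (2 * sigma ** 2) for t in thetas]
    shift = max(log_like)  # subtract the maximum for numerical stability
    unnorm = [p * math.exp(l - shift) for p, l in zip(prior, log_like)]
    evidence = sum(unnorm) * step  # Riemann-sum approximation of the normalizer
    return [u / evidence for u in unnorm]

thetas = [-5 + 0.01 * i for i in range(1001)]  # grid on [-5, 5]
flat_prior = [1.0] * len(thetas)               # unnormalized flat prior
data = [0.8, 1.1, 1.4, 0.7]
posterior = grid_posterior(data, thetas, flat_prior)
mode = thetas[max(range(len(thetas)), key=posterior.__getitem__)]
```

With a flat prior the posterior mode sits at the sample mean of the data, which is what one would expect from the likelihood alone.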
Information geometry, in its narrowest sense, is an attempt to use differential geometry to study statistical inference. It has found applications in statistical inference, signal processing, and machine learning [1]. References [2,3] are two elementary introductions. In information geometry, geometric structures, for example metric tensors g and affine connections ∇, can be put on the set of prior distributions. These geometric structures help to single out some particular prior distributions, for example, the Jeffreys prior which, by the fundamental theorem of Riemannian geometry, is the unique volume form parallel with respect to the Levi–Civita connection. Since the Jeffreys prior is provided by geometry, it is automatically invariant under reparametrization. This reflects the view that a transformation of parameters can at best preserve information, which is encoded in the notion of sufficient statistics. Similarly, if one can find a unique prior distribution satisfying some specified geometric conditions, then that prior distribution is called canonically chosen. Matsuzoe, Takeuchi, and Amari used information geometry to define the α-parallel prior such that, when α = 0, it reduces to the Jeffreys prior [4].
Historically, Weyl proposed a generalization of general relativity to unify gravity and electromagnetism. Einstein soon pointed out that Weyl's theory predicted substantial broadening of the characteristic length of atoms, which contradicts the well-observed sharp atomic spectral lines. Even though Weyl geometry failed to unify gravity and electromagnetism, a problem that remains open, it has found applications in possible generalizations of general relativity [5] and in the differential-geometric study of defects in continuum mechanics [6]. Weyl geometry has also persisted within mathematics. The relation between affine differential geometry, Weyl geometry, and Riemannian geometry is as follows. Consider a fibre bundle with base B in which each fibre is a Lie group G, called the structure group of the fibre bundle. Different choices of G yield different geometries:
G the general linear group: affine differential geometry;
G the conformal group: Weyl geometry;
G the orthogonal group: Riemannian geometry.
These structure groups form a chain of reductions, from the general linear group to the conformal group to the orthogonal group, and the smaller the structure group, the more geometric properties one expects. In our case, the reduction of the structure group gives rise to a canonical choice for the parameter of the α-parallel prior. For more about bundle-theoretic differential geometry, see [7].
In this paper, we will use Weyl geometry to define a prior distribution for Bayesian inference, which we call the Weyl prior. We will elucidate the relation between the dimension of a statistical manifold and the parameter α in Takeuchi and Amari's α-parallel prior.
The organization of the paper is as follows: In Section 2, we review information geometry and the α-parallel prior. We discuss Weyl geometry in Section 3. We define the Weyl prior, and elucidate the relation between the Weyl prior and the α-parallel prior in Section 4. We calculate the Weyl prior for the univariate Gaussian distribution as an example in Section 5 and the multivariate Gaussian distribution in Section 6. All functions in the paper are real-valued and smooth, all connections are torsion-free, and the Einstein summation rule is used.
2. Information Geometry and α-Priors
In this section, we review some basics of information geometry. For more details, see [1].
A statistical model, that is, a set of parametric densities $\{p(x \mid \theta)\}$, can be geometrized as follows. First, we introduce the Fisher metric tensor, which is a second-order tensor.
Definition 1.
The Fisher metric tensor is defined by
$$g_{ij}(\theta) = \int \partial_i l \, \partial_j l \; p(x \mid \theta)\, dx, \tag{1}$$
where $p(x \mid \theta)$ is the transition kernel, $l = \log p(x \mid \theta)$ is the log-likelihood function, and $\partial_i$ is the partial derivative with respect to the $i$-th coordinate.
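Then, we introduce the Amari–Chentsov tensor, which is a third-order tensor.
Definition 2
(Amari–Chentsov Tensor). The Amari–Chentsov tensor C is defined by
$$C_{ijk}(\theta) = \int \partial_i l \, \partial_j l \, \partial_k l \; p(x \mid \theta)\, dx. \tag{2}$$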
Remark 1.
The Amari–Chentsov tensor C defined above satisfies
In other words, C is the covariant derivative of the metric tensor
In Riemannian geometry, C vanishes everywhere, which is required if the length of a tangent vector is to be preserved under parallel transport. In information geometry, this requirement is dropped and thus a duality theory arises. Let ∇ be an arbitrary torsion-free affine connection on a Riemannian manifold $(M, g)$. The dual connection $\nabla^*$ of ∇ plays an important role in information geometry.
Definition 3
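(Dual Connection). The dual connection $\nabla^*$ on a Riemannian manifold $(M, g)$ with affine connection ∇ is defined as the unique affine connection satisfying the following equation:
$$X\, g(Y, Z) = g(\nabla_X Y, Z) + g(Y, \nabla^*_X Z), \tag{3}$$
where X, Y, and Z are arbitrary vector fields on M.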
Remark 2.
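The dual connection preserves the metric tensor g together with ∇, in the sense that
$$g_q(\Pi X, \Pi^* Y) = g_p(X, Y),$$
where $X, Y \in T_pM$ and Π and $\Pi^*$ are the parallel transports induced by ∇ and $\nabla^*$, respectively, along some curve from p to q. In general, neither Π nor $\Pi^*$ preserves g by itself, unless $\nabla = \nabla^*$. See [1].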
Now, we introduce α-connections.
Definition 4.
The α-connections are defined in terms of Christoffel symbols by
(4) where α is a real parameter and LC stands for Levi–Civita.
Remark 3.
The dual connection of $\nabla^{(\alpha)}$ is then given by
$$\left(\nabla^{(\alpha)}\right)^* = \nabla^{(-\alpha)}. \tag{5}$$
Remark 4.
The α-parallel prior is the volume form parallel with respect to the α-connection. Unlike the Jeffreys prior, which always exists, the α-parallel prior does not necessarily exist. An α-parallel prior exists if and only if the Ricci curvature tensor is symmetric [4]. However, if an α-parallel prior exists for some α ≠ 0, then it exists for all α [8].
The following characterization will be used in Section 4 to obtain the relation between the α-parallel prior and the Weyl prior defined therein.
Proposition 1.
[4] Let be a statistical manifold. If there exists an exact 1-form for some function Ω determined by ∇ and g, then the α-parallel prior is
Remark 5.
The 1-form in Proposition 1 is known as the Chebyshev 1-form. A differential form ϕ is called closed if its exterior derivative vanishes, i.e., $d\phi = 0$, and is called exact if there exists a differential form φ such that $\phi = d\varphi$. By definition, every exact form is closed. By Poincaré's lemma, every closed form is locally exact. Because statistical manifolds are simply connected, closedness implies exactness.
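As a small worked illustration of closedness and exactness (our own example, not taken from the source), consider the half-plane with assumed coordinates $(\mu, \sigma)$, $\sigma > 0$, and the 1-form $\phi = (c/\sigma)\, d\sigma$ for an arbitrary constant $c$. Its only coefficient does not depend on $\mu$, so the form is closed, and a potential can be written down explicitly, so it is also exact:

```latex
\[
\phi = \frac{c}{\sigma}\, d\sigma
\quad\Longrightarrow\quad
d\phi = \partial_\mu\!\left(\frac{c}{\sigma}\right) d\mu \wedge d\sigma = 0,
\qquad
\phi = d\left(c \log \sigma\right).
\]
```

Here the potential $c \log \sigma$ plays the role of the function whose differential is the given 1-form.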
3. Weyl Geometry
In this section, we review some concepts of Weyl geometry which are needed in the next section. For more details, see [9].
Two Riemannian metrics $g$ and $\tilde{g}$ on a manifold M are said to be conformally equivalent if $\tilde{g} = e^{\lambda} g$ for some smooth function $\lambda$ on M.
A conformal structure on M is an equivalence class $[g]$ of conformally equivalent Riemannian metrics.
A Weyl structure is a map $F$ from the conformal structure to the set of 1-forms on M satisfying $F(e^{\lambda} g) = F(g) - d\lambda$.
The image $F(g)$ of g under F is called the Weyl 1-form.
A Weyl structure enables us to translate a scalar product at p to at q along a curve
(6) |
where the Weyl 1-form is pulled back along the curve. A Weyl manifold is a manifold with a Weyl structure.
Remark 6.
The meaning of this equation is: If we start with a scalar product at a point p arising from the conformal class , then there exists a metric tensor extending , i.e., . The value of this particular choice of g at another point q is . However, different choice of g gives rise to different . The scalar product determined by Weyl translation is proven to be independent of g [9]. Hence, by Weyl translation, we can compare lengths of vectors at different points on a Weyl manifold, whereas, with only the conformal structure , we can only compare ratios of lengths.
An affine connection ∇ is said to be a Weyl connection if the parallel transport of a scalar product under ∇ coincides with the Weyl translation.
The Weyl connection is characterized by the following propositions.
Proposition 2
([9]). An affine connection ∇ is a Weyl connection if and only if for all
Proposition 3
(Fundamental Theorem of Weyl Geometry [9]). There exists a unique torsion-free Weyl connection on a Weyl manifold The Christoffel symbols of are given by
(7) where is the Kronecker delta.
4. Weyl Prior
In this section, we define the Weyl prior and show its relation to the α-parallel prior.
First, we define the Weyl prior as follows.
Definition 5
(Weyl Prior). Consider an n-dimensional Riemannian manifold equipped with a conformal structure and a Weyl structure, together with the associated Weyl connection. The Weyl prior is defined as the unique volume form parallel with respect to the Weyl connection.
Remark 7.
The uniqueness of the Weyl prior is the result of the uniqueness of the Weyl connection.
Now, we prove the main result of this paper.
Theorem 1.
Let M be a Riemannian manifold of dimension n. Consider the Weyl connection and the (−n)-connection, i.e., the α-connection with α = −n. Suppose that the (−n)-parallel prior exists. Then the Weyl prior coincides with the (−n)-parallel prior.
Proof.
Consider an arbitrary volume form where f is a positive function on For to be parallel with respect to it is necessary and sufficient that
(8) Componentwise, Equation (8) becomes
(9) since covariant derivative coincides with partial derivative for functions.
Since is a scalar density of weight its covariant derivative is given by
(10) where is obtained by the contraction of Equation (7) over i and
Substituting Equation (10) into Equation (9), we obtain
(11) Since the covariant derivative coincides with exterior derivative for functions, collect indices in Equation (11)
(12) Assume for now that the Weyl 1-form is exact, that is, it is the differential of some function on the manifold. Then, from Equation (12), the Weyl prior is given by
(13) By comparison of Equation (13) with Proposition 1, the theorem is proved under the assumption of the exactness of the Weyl 1-form.
Since we proved that the Weyl prior is the α-parallel prior with α = −n, and we required the existence of the latter, our assumption of the exactness of the Weyl 1-form is indeed justified by Remark 4. □
Remark 8.
The minus sign in α = −n is a result of the definition of the α-connection. By Remark 3, the dual connection of $\nabla^{(\alpha)}$ is $\nabla^{(-\alpha)}$; we would have α = n here, had we defined the α-connection to be its dual connection in Definition 4. It would seem more natural to consider the dual prior of the Weyl prior.
5. Weyl Prior for Gaussian Family
In this section, we calculate the Weyl prior of the Gaussian family as an example.
Example 1
(Gaussian Family). Consider the Gaussian family
Choosing as a coordinate system, we have
where l is the log likelihood function.
The first element of the Fisher metric tensor g in the -coordinate is given by
where is the conditional expectation of X given The other elements of the Fisher metric tensor are
and
Hence,
(14) |
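The Fisher metric of the univariate Gaussian family can be checked numerically. The sketch below assumes the coordinate system $(\mu, \sigma)$ (mean and standard deviation), which may differ from the coordinates used in the example, and approximates the expectation $E[\partial_i l \, \partial_j l]$ with a midpoint rule; the function names are hypothetical. In these assumed coordinates the well-known result is $g = \mathrm{diag}(1/\sigma^2,\, 2/\sigma^2)$.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def score(x, mu, sigma):
    """Partial derivatives of the log-likelihood with respect to (mu, sigma)."""
    z = (x - mu) / sigma
    return (z / sigma, (z * z - 1.0) / sigma)

def fisher_metric(mu, sigma, half_width=12.0, steps=20000):
    """Approximate g_ij = E[d_i l * d_j l] by a midpoint rule on mu +/- half_width*sigma."""
    lo = mu - half_width * sigma
    dx = 2 * half_width * sigma / steps
    g = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        w = gaussian_pdf(x, mu, sigma) * dx  # probability mass of this cell
        s = score(x, mu, sigma)
        for i in range(2):
            for j in range(2):
                g[i][j] += w * s[i] * s[j]
    return g

g = fisher_metric(mu=0.3, sigma=1.5)
# Compare with the closed form diag(1/sigma^2, 2/sigma^2) in (mu, sigma) coordinates.
```

The numerical matrix does not depend on the value of `mu`, which reflects the translation invariance of the family.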
To calculate the Weyl 1-form, we first calculate the Amari–Chentsov tensor
Similarly,
and
Hence, the Weyl 1-form is given by
(15) |
Now, it is easy to check that the Weyl 1-form in Equation (15) is exact. Hence, for the Gaussian family, a Weyl prior exists and is given by
(16) |
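The components of the Amari–Chentsov tensor for the univariate Gaussian family can be checked in the same spirit. The following is a minimal numerical sketch, again assuming the coordinates $(\mu, \sigma)$ (index 0 for $\mu$, index 1 for $\sigma$) and a hypothetical function name. In these assumed coordinates the nonzero components, up to permutation of indices, are $C_{\mu\mu\sigma} = 2/\sigma^3$ and $C_{\sigma\sigma\sigma} = 8/\sigma^3$.

```python
import math
from itertools import product

def score(x, mu, sigma):
    """Partial derivatives of the log-likelihood with respect to (mu, sigma)."""
    z = (x - mu) / sigma
    return (z / sigma, (z * z - 1.0) / sigma)

def amari_chentsov(mu, sigma, half_width=12.0, steps=20000):
    """Approximate C_ijk = E[d_i l * d_j l * d_k l] by a midpoint rule."""
    lo = mu - half_width * sigma
    dx = 2 * half_width * sigma / steps
    norm = sigma * math.sqrt(2 * math.pi)
    c = {idx: 0.0 for idx in product(range(2), repeat=3)}
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        w = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / norm * dx
        s = score(x, mu, sigma)
        for i, j, m in c:
            c[i, j, m] += w * s[i] * s[j] * s[m]
    return c

C = amari_chentsov(mu=0.0, sigma=2.0)
# With sigma = 2: C[0, 0, 1] is close to 2/8 and C[1, 1, 1] is close to 8/8.
```

Components with an odd number of $\mu$ indices vanish because the corresponding integrands are odd functions of $x - \mu$.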
Remark 9.
Based on our calculation, we find that the Weyl prior for the univariate Gaussian distribution with unknown mean and unknown variance is just the uniform prior. This shows that the uniform prior is in fact an uninformative prior. This counter-intuitive result is related to the fact that every two-dimensional manifold is conformally flat, which can be proved using the existence of isothermal coordinates in two dimensions [10].
6. Multivariate Gaussian
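The above example can be extended to the multivariate case. Consider the multivariate Gaussian distribution
$$p(x \mid \mu, \Sigma) = \det(2\pi\Sigma)^{-1/2} \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right), \tag{17}$$
where $\mu$ is the mean vector and $\Sigma$ is the covariance matrix.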
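For concreteness, the logarithm of the multivariate Gaussian density can be evaluated with a Cholesky factorization of the covariance matrix. This is a minimal standard-library sketch, not the authors' computation; the function names `cholesky` and `mvn_logpdf` are hypothetical, and the covariance matrix is assumed to be symmetric positive definite.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def mvn_logpdf(x, mean, cov):
    """Log-density of the multivariate Gaussian with the given mean and covariance."""
    n = len(x)
    L = cholesky(cov)
    # Forward substitution: solve L y = x - mean, so that y.y is the quadratic form.
    y = []
    for i in range(n):
        r = x[i] - mean[i] - sum(L[i][k] * y[k] for k in range(i))
        y.append(r / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(v * v for v in y)
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)

value = mvn_logpdf([2.0, 0.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
```

Working with the Cholesky factor avoids forming the inverse covariance explicitly, which is a common design choice for numerical stability.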
Using matrix calculus, we have
(18) |
and
(19) |
We can now compute the Fisher information matrix.
(20) |
where the second-to-last line follows from the action of the matrix tensor product and the last line follows from the definition of the covariance matrix.
Similarly,
(21) |
and
(22) |
The Amari–Chentsov tensor can be computed in the same way:
(23) |
(24) |
(25) |
(26) |
The detailed computation of Equation (26) is as follows:
(27) |
The above expression can be evaluated using the fourth and sixth moments of the multivariate Gaussian distribution.
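Higher moments of a zero-mean Gaussian vector reduce to sums of products of covariances over all pairings of the indices (Isserlis' theorem). The sketch below is one way to evaluate the fourth and sixth moments mentioned above; it is not the authors' computation, and the helper names `pairings` and `gaussian_moment` are hypothetical.

```python
def pairings(idx):
    """Yield every way of splitting the list idx into unordered pairs."""
    if not idx:
        yield []
        return
    first, rest = idx[0], idx[1:]
    for p in range(len(rest)):
        remaining = rest[:p] + rest[p + 1:]
        for tail in pairings(remaining):
            yield [(first, rest[p])] + tail

def gaussian_moment(cov, idx):
    """E[y_{i1} ... y_{ik}] for a zero-mean Gaussian vector y with covariance cov."""
    if len(idx) % 2:
        return 0.0  # odd central moments vanish
    total = 0.0
    for pairing in pairings(list(idx)):
        term = 1.0
        for a, b in pairing:
            term *= cov[a][b]
        total += term
    return total

cov = [[2.0, 0.5], [0.5, 1.0]]
m4 = gaussian_moment(cov, (0, 0, 1, 1))        # cov00*cov11 + 2*cov01^2
m6 = gaussian_moment([[1.0]], (0, 0, 0, 0, 0, 0))  # sum over 15 pairings
```

A fourth moment involves 3 pairings and a sixth moment involves 15, so the enumeration stays small for the orders needed here.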
The Weyl 1-form is then given by:
(28) |
The Weyl prior is thus given by:
(29) |
where, in the first line, is the dimension of the statistical manifold for the multivariate Gaussian distribution.
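The dimension of the statistical manifold can be counted directly from the free parameters. As a small sketch, writing d for the dimension of the data (a symbol assumed here, not taken from the source), a d-variate Gaussian has d mean components and d(d+1)/2 independent entries in its symmetric covariance matrix; the function name is hypothetical.

```python
def gaussian_family_dimension(d):
    """Number of free parameters of a d-variate Gaussian: mean plus symmetric covariance."""
    return d + d * (d + 1) // 2

dims = [gaussian_family_dimension(d) for d in (1, 2, 3)]
```

For d = 1 this gives a two-dimensional manifold, matching the univariate family with unknown mean and unknown variance.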
Remark 10.
Our calculation shows that the Weyl prior of the multivariate Gaussian distribution is generally not a uniform prior. However, Equation (29) shows that, in the univariate case, the Weyl prior is indeed the uniform prior. This is in accordance with our direct calculation for the univariate case.
7. Discussion and Conclusions
We discussed Weyl geometry and the Weyl prior in this paper. We also calculated the Weyl prior for the Gaussian family as an example.
The underlying principle of the Jeffreys prior, the α-parallel prior, and the Weyl prior is the concept of invariance in statistics. The Jeffreys prior is invariant under a change of parameter coordinates. The Weyl prior and the α-parallel prior, as generalizations of the Jeffreys prior, automatically satisfy this invariance. Moreover, the Weyl prior, as a volume form defined on a Weyl manifold, is also invariant under gauge transformations [11]. The generalized conjugate connection is likewise invariant under gauge transformations [11].
One possible use of the Weyl prior is to adopt the uniform prior for distributions with two parameters. This is because any two-dimensional manifold is conformally flat.
Acknowledgments
We thank the referees for their constructive comments and suggestions.
Author Contributions
R.J. contributed significantly to the paper. Throughout the process of this research, both J.T. and Y.Z. provided detailed advice, discussions, and suggestions. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by Discovery Grants of the Natural Sciences and Engineering Research Council of Canada (NSERC) under No. 256233 and No. 163407. The APC was funded by NSERC No. 163407.
Conflicts of Interest
The authors declare no conflict of interest.
References
- 1. Amari S. Information Geometry and Its Applications. Volume 194. Springer; Berlin/Heidelberg, Germany: 2016.
- 2. Calin O., Udriste C. Geometric Modeling in Probability and Statistics. Springer; Basel, Switzerland: 2014.
- 3. Nielsen F. An elementary introduction to information geometry. arXiv. 2018;1808.08271. doi: 10.3390/e22101100.
- 4. Matsuzoe H., Takeuchi J., Amari S. Equiaffine structures on statistical manifolds and Bayesian statistics. Differ. Geom. Its Appl. 2006;24:567–578. doi: 10.1016/j.difgeo.2006.02.003.
- 5. Ciambelli L., Leigh R.G. Weyl Connections and their Role in Holography. arXiv. 2019;1905.04339.
- 6. Yavari A., Goriely A. Weyl geometry and the nonlinear mechanics of distributed point defects. Proc. R. Soc. A. 2012;468:3902–3922. doi: 10.1098/rspa.2012.0342.
- 7. Kobayashi S., Nomizu K. Foundations of Differential Geometry. Volume 1. Wiley; New York, NY, USA: 1963.
- 8. Takeuchi J., Amari S. α-parallel prior and its properties. IEEE Trans. Inf. Theory. 2005;51:1011–1023. doi: 10.1109/TIT.2004.842703.
- 9. Folland G.B. Weyl manifolds. J. Differ. Geom. 1970;4:145–153. doi: 10.4310/jdg/1214429379.
- 10. Kulkarni R. Conformally flat manifolds. Proc. Natl. Acad. Sci. USA. 1972;69:2675. doi: 10.1073/pnas.69.9.2675.
- 11. Calin O., Matsuzoe H., Zhang J. Generalizations of conjugate connections. In: Trends in Differential Geometry, Complex Analysis and Mathematical Physics. World Scientific; Singapore: 2009. pp. 26–34.