Strong consistency of nonparametric Bayes density estimation on compact metric spaces with applications to specific manifolds

Abhishek Bhattacharya; David B Dunson

doi:10.1007/s10463-011-0341-x

Author manuscript; available in PMC: 2013 Aug 1.

Published in final edited form as: Ann Inst Stat Math. 2011 Nov 18;64(4):687–714. doi: 10.1007/s10463-011-0341-x

PMCID: PMC3439825 NIHMSID: NIHMS341736 PMID: 22984295

Abstract

This article considers a broad class of kernel mixture density models on compact metric spaces and manifolds. Following a Bayesian approach with a nonparametric prior on the location mixing distribution, sufficient conditions are obtained on the kernel, prior and the underlying space for strong posterior consistency at any continuous density. The prior is also allowed to depend on the sample size n and sufficient conditions are obtained for weak and strong consistency. These conditions are verified on compact Euclidean spaces using multivariate Gaussian kernels, on the hypersphere using a von Mises-Fisher kernel and on the planar shape space using complex Watson kernels.

Keywords: Nonparametric Bayes, Density Estimation, Posterior consistency, Sample dependent prior, Riemannian manifold, Hypersphere, Shape space

1 Introduction

Density estimation on compact metric spaces, such as manifolds, is a fundamental problem in nonparametric inference on non-Euclidean spaces. Some applications include directional and axial data analysis, spatial modeling, shape analysis and dimensionality reduction problems in which the data lie on an unknown lower dimensional space. However, the literature on statistical theory and methods of density estimation in non-Euclidean spaces is still under-developed. Our focus is on Bayesian nonparametric approaches.

For nonparametric Bayes density estimation on the real line ℜ, there is a rich literature, with Dirichlet process mixtures of Gaussian kernels providing a commonly-used approach (Escobar and West (1995)[6]) that leads to dense support (Lo (1984)[14]) and weak and strong posterior consistency (Ghosal et al. (1999)[8]). From the celebrated theorem of Schwartz (1965)[16], weak posterior consistency results when the true density f₀ is in the Kullback-Leibler (KL) support of the prior, meaning that all KL neighborhoods around f₀ are assigned positive probability. In general, it is quite difficult to show KL support for new priors for a density, though Wu and Ghosal (2008)[20] provide useful conditions for a class of kernel mixture priors, with Bhattacharya and Dunson (2010a)[2] extending these conditions to general compact metric spaces. It is widely accepted that weak consistency is an insufficient property when the focus is on density estimation. For example, if f₀ is a density with respect to Lebesgue measure, weak consistency does not even ensure that the posterior assigns positive probability to the set of densities with respect to Lebesgue measure. Hence, it is important to provide stronger results.

Until very recently, essentially all the literature on theory of nonparametric Bayes density estimation focused on one-dimensional Euclidean spaces. An important development in multivariate Euclidean spaces is the article of Wu and Ghosal (2010)[21], who provide sufficient conditions for strong consistency in nonparametric Bayes density estimation from Dirichlet process mixtures of multivariate Gaussian kernels. However, severe tail restrictions are imposed on the kernel covariance, which become overly restrictive when the data are very high dimensional. Also, the theory developed in their paper is specialized and cannot be easily generalized to arbitrary kernel mixtures on more general spaces.

We are particularly interested in density estimation in the special case in which the compact metric space M corresponds to a Riemannian manifold. In order to extend kernel mixture models used in Euclidean spaces to manifolds M, the kernel needs to be carefully chosen. One approach is to introduce an invertible coordinate map between a subset of M and a Euclidean space (Hirsch (1976)[10]). Under such an approach, the density prior on M can be induced through a kernel mixture model in a Euclidean space. However, several major problems arise in using such an approach. Firstly, it is not possible to cover the entire manifold with a single smooth coordinate chart except for very simple manifolds, so unless the data are very concentrated one may obtain poor performance. Different local charts can be patched together to form an atlas, but this may introduce artifactual discontinuities in the resulting density. Because the coordinate map is not isometric, the geometry of the manifold can be heavily distorted. As good choices of coordinate frames necessarily depend on the observations, additional uncertainty is automatically induced. Due to these and other shortcomings of coordinate based methods, we focus on modeling approaches that are coordinate free in the sense that we build density models with respect to the invariant volume form on the manifold.

In Bhattacharya and Dunson (2010a)[2], a density model is presented on a general compact metric space with respect to any fixed base measure using a random mixture of probability kernels. Under mild conditions on the kernel and the mixing prior, it is shown that the prior probability of any uniform neighborhood of any continuous density f₀ is positive and, if f₀ is positive everywhere, it lies in the KL support of the prior. This in turn implies posterior consistency in the weak sense. Density estimation on the planar shape space is presented as a special case. However, strong posterior consistency is not addressed. The everywhere positivity restriction on the true density cannot be easily relaxed. Also, besides the location mixing distribution, the only other parameter in the model is a scalar band-width. This restricts the flexibility when the sample size is small.

Focusing on kernel mixture priors for densities on a compact metric space M, in this article we provide sufficient conditions on the kernel, prior and the underlying space to ensure strong consistency. Theorem 2 and Corollary 1 provide sufficient conditions to ensure that all total variation neighborhoods around f₀ will be assigned probability converging to one as the sample size increases. The theoretical development relies on the method of sieves and exponentially consistent tests discussed in Barron (1989)[1]. However, applying this framework outside multivariate spaces is not standard and requires careful use of differential geometry. Through Theorem 1, we prove weak consistency for a bigger class of kernels than Bhattacharya and Dunson (2010a)[2]. The only requirement on the true density is that it is continuous everywhere. To illustrate the theory, we focus on density estimation on the unit hypersphere using von Mises-Fisher kernels and on the planar shape space using complex Watson kernels. In both these cases, it is shown that the kernels satisfy the sufficient conditions. The results also apply to Gaussian mixture densities on ℜ^d whenever the true density has compact support. In that case, a truncated and transformed Wishart prior on the covariance inverse, with the transformation depending on the data dimension, is shown to suffice as in Wu and Ghosal (2010)[21]. Appropriate kernel choices are presented on other manifolds such as axial spaces, Stiefel manifolds and Grassmannians, which arise as generalisations of the two manifolds.

When the manifold is high-dimensional, priors satisfying conditions for strong consistency tend to put too little probability near bandwidths close to 0, which is undesirable for applications. A gamma prior on the inverse-bandwidth, for example, cannot be shown to satisfy the conditions. Hence, we extend the consistency results to cover cases with priors depending on the sample size n. Theorem 3 extends the Schwartz theorem to prove weak consistency, while Theorem 4 proves strong consistency using such priors. A gamma prior with scale decreasing with n at an appropriate rate satisfies the conditions for both weak and strong posterior consistency at an exponential rate. When using multivariate Gaussian mixtures, a truncated Wishart prior with hyper-parameters depending on n, is shown to work.

To maintain a free flow while reading, we put all the proofs together at the end in a section called Appendix.

2 Consistency theorems on compact metric spaces

2.1 Weak posterior consistency

Let (M, ρ) be a compact metric space, ρ being the distance metric, and let X be a random variable on M, defined on some probability space with sample space Ω and probability measure Q. We assume that the distribution of X has a density with respect to some fixed finite base measure λ on M. The natural choice for such a λ when M is a Riemannian manifold is the invariant volume form. We are interested in modelling this unknown density via a flexible model. Let K(m; μ, 𝒦) be a probability kernel on M with location μ ∈ M and other parameters 𝒦 ∈ N, N being a Polish space, that is, a space homeomorphic to a complete separable metric space. In the special case N = (0, ∞), K may be called a location-scale kernel.

Given the parameters (μ, 𝒦), K satisfies ∫_M K(m; μ, 𝒦)λ(dm) = 1. Then a location mixture density model for X is defined as

f (m; P, \mathcal{K}) = \int_{M} K (m; μ, \mathcal{K}) P (d μ)

(1)

with parameters P in the space ℳ(M) of all probability distributions on M and 𝒦 ∈ N. We call P the location mixing distribution. When N = (0, ∞), we view 𝒦⁻¹ as the band-width of the kernel and hence call 𝒦 the precision or inverse band-width parameter. More generally, 𝒦 comprises many other parameters in different spaces determining the kernel shape, modality, etc., and the precision is a particular function of 𝒦. The upcoming consistency theorems and examples will illustrate that function. Kernel mixture models are used routinely in Bayesian density estimation in Euclidean spaces, with Lennox et al. (2009)[13] applying such an approach to bivariate angular data and Bhattacharya and Dunson (2010a)[2], (2010b)[3] considering kernel mixtures on general metric spaces.

A prior Π₁ on (P, 𝒦) induces a prior Π on the space of densities 𝒟(M) on M through the model (1). Given a random realization X₁, …, X_n of X, we can compute the posterior of f. The Schwartz theorem provides a useful tool in proving that the posterior assigns probability converging to one in arbitrarily small neighborhoods of the true density f₀ as the sample size n → ∞. Let F₀ denote the probability distribution corresponding to f₀, let KL(f₀; f) = ∫_M f₀(m) log{f₀(m)/f(m)}λ(dm) denote the KL divergence of another density f from f₀, and let K_ε(f₀) denote the KL neighborhood {f ∈ 𝒟(M): KL(f₀; f) < ε}. The density f₀ is said to be in the KL support of Π, or Π is said to satisfy the KL condition at f₀, if Π{K_ε(f₀)} > 0 for all ε > 0.

Proposition 1 (Schwartz Theorem)

If (1) f₀ is in the KL support of Π, and (2) U ⊂ 𝒟(M) is such that there exists a uniformly exponentially consistent sequence of test functions for testing H₀: f = f₀ versus H₁: f ∈ U^c, then Π(U | X₁, …, X_n) → 1 as n → ∞ a.s. $F_{0}^{\infty}$.

The posterior probability of U^c can be expressed as

Π (U^{c} ∣ X_{1}, \dots, X_{n}) = \frac{\int_{U^{c}} \prod_{i = 1}^{n} \frac{f (X_{i})}{f_{0} (X_{i})} Π (d f)}{\int \prod_{1}^{n} \frac{f (X_{i})}{f_{0} (X_{i})} Π (d f)}

(2)

Condition (1), known as the KL condition, ensures that for any β > 0,

\liminf_{n \to \infty} \exp (n β) \int \prod_{i = 1}^{n} \frac{f (X_{i})}{f_{0} (X_{i})} Π (d f) = \infty \quad \text{a.s.}

(3)

while condition (2) implies that

\lim_{n \to \infty} \exp (n β_{0}) \int_{U^{c}} \prod_{i = 1}^{n} \frac{f (X_{i})}{f_{0} (X_{i})} Π (d f) = 0 \quad \text{a.s.}

for some β₀ > 0 (depending on U ) and therefore

\lim_{n \to \infty} \exp (n β_{0} / 2) Π (U^{c} ∣ X_{1}, \dots, X_{n}) = 0 \quad \text{a.s.}

Hence Proposition 1 provides conditions for posterior consistency at an exponential rate. Theorem 1 derives sufficient conditions on the kernel and the prior so that f₀ is in the uniform support and hence KL support of Π. They are

A1
The kernel K is continuous in its arguments.
A2
For any continuous function f: M → ℜ (written as f ∈ C(M)) and any ε > 0, there exists a compact subset N_ε of N with non-empty interior, such that
sup_{m∈M, 𝒦∈N_ε} |f(m) − ∫_M K(m; μ, 𝒦) f(μ) λ(dμ)| < ε.
A3
For any ε > 0, the set $\{F_{0}\} \times N_{ε}^{o}$ intersects the (weak) support of Π₁. Here A^o denotes the interior of a set A.
A4
f₀ is a continuous density.

Theorem 1

Under assumptions A1–A4, for any ε > 0,

Π \{f \in \mathcal{D} (M) : \sup_{m \in M} ∣ f (m) - f_{0} (m) ∣ < ε\} > 0,

which implies that f₀ is in the KL support of Π.

As a corollary, we obtain the KL property for the location-scale kernel, derived in Bhattacharya and Dunson (2010a)[2]. However, unlike in that article, we need not assume f₀ to be positive everywhere.

When U is a weakly open neighborhood of f₀, condition (2) in Proposition 1 is always satisfied. Hence under assumptions A1–A4, weak posterior consistency at an exponential rate follows. Assumptions A1 and A2 impose some mild conditions on the kernel choice, which are easily satisfied by several parametric families. In particular, A2 implies that, as a probability distribution on M, K(·; μ, 𝒦) can be made arbitrarily close in the weak sense to the degenerate point mass at μ, uniformly in μ, for an appropriate choice of 𝒦, thereby justifying the name ‘location’ for μ. When the compact neighborhood N_ε can be represented as the inverse image under some function ψ (ψ: N → ℜ⁺) of some neighborhood around infinity, then ψ(𝒦) can be viewed as the precision parameter. We will provide examples of kernels on some non-Euclidean manifolds arising in shape and directional data analysis which satisfy A1 and A2. A common choice for Π₁ satisfying A3 can be a full support product prior such as a Dirichlet process DP(w₀P₀) prior on P satisfying supp(P₀) = M and an independent everywhere positive density prior on 𝒦.

2.2 Strong consistency

When U is a total variation neighborhood of f₀, LeCam (1973)[12] and Barron (1989)[1] show that condition (2) of Proposition 1 will not be satisfied in most cases. In Barron (1989)[1] (also see Ghosal et al. (1999)[8]), a sieve method is considered to obtain sufficient conditions for the numerator in (2) to decay at an exponential rate and hence get strong posterior consistency at an exponential rate. This is stated in Proposition 2. In its statement, for a subset of 𝒟(M) and ε > 0, the L₁-metric entropy N(ε, ·) of that subset is defined as the logarithm of the minimum number of ε-sized (or smaller) L₁ subsets needed to cover it.

Proposition 2

If there exists a 𝒟ₙ ⊆ 𝒟(M) such that (1) for n sufficiently large, $Π (\mathcal{D}_{n}^{c}) < \exp (- n β)$ for some β > 0, and (2) N(ε, 𝒟ₙ)/n → 0 as n → ∞ for any ε > 0, then for any total variation neighborhood U of f₀, there exists a β₀ > 0 (depending on U) such that $\limsup_{n \to \infty} \exp (n β_{0}) \int_{U^{c}} \prod_{1}^{n} \frac{f (X_{i})}{f_{0} (X_{i})} Π (d f) = 0$ a.s. $F_{0}^{\infty}$. Hence if f₀ is in the KL support of Π, the posterior probability of any total variation neighborhood of f₀ converges to 1 almost surely.

Theorem 2, which is the main theorem of this paper, describes a 𝒟ₙ which satisfies condition (2). We assume that there exists a continuous function φ: N → [0, ∞) for which the following assumptions hold.

A5
There exist positive constants κ₁, a₁, A₁ such that for all κ ≥ κ₁, μ, ν ∈ M,
$\sup_{m \in M, \mathcal{K} \in φ^{- 1} [0, κ]} ∣ K (m; μ, \mathcal{K}) - K (m; ν, \mathcal{K}) ∣ \leq A_{1} κ^{a_{1}} ρ (μ, ν).$
A6
There exist positive constants a₂, A₂ such that for all 𝒦₁, 𝒦₂ ∈ φ⁻¹[0, κ], κ ≥ κ₁,
$\sup_{m, μ \in M} ∣ K (m; μ, \mathcal{K}_{1}) - K (m; μ, \mathcal{K}_{2}) ∣ \leq A_{2} κ^{a_{2}} ρ_{2} (\mathcal{K}_{1}, \mathcal{K}_{2}),$

ρ₂ metrizing the topology of N.
A7
For any κ ≥ κ₁, the subset φ⁻¹[0, κ] is compact and, given ε > 0, the minimum number of ε (or smaller) radius balls covering it (known as the ε-covering number) can be bounded by (κε⁻¹)^{b₂} for an appropriate positive constant b₂ (independent of κ and ε).
A8
There exist a₃, A₃ > 0 such that the ε-covering number of M is bounded by A₃ε^{−a₃} for any ε > 0.

For two positive sequences {a_n} and {b_n}, {a_n} is said to be ‘little-o’ of {b_n}, written as a_n = o(b_n), if the sequence {a_n/b_n} converges to 0 as n → ∞.

Theorem 2

For a positive sequence {κ_n} diverging to ∞, define

\mathcal{D}_{n} = \{f (P, \mathcal{K}) : P \in \mathcal{M} (M), \mathcal{K} \in φ^{- 1} [0, κ_{n}]\} .

Under assumptions A5–A8, given any ε > 0, for n sufficiently large, $N (ε, \mathcal{D}_{n}) \leq C (ε) κ_{n}^{a_{1} a_{3}}$ for some C(ε) > 0. Hence N(ε, 𝒟ₙ) is o(n) whenever κ_n is o(n^{(a₁a₃)⁻¹}).

As a corollary, we derive conditions on the prior Π₁ on (P, 𝒦) under which strong posterior consistency at an exponential rate follows.

Corollary 1

Under the hypotheses of Theorems 1 and 2 and

A9
Π₁(ℳ(M) × φ⁻¹(n^a, ∞)) < exp(−nβ) for some a < (a₁a₃)⁻¹ and β > 0,

the posterior probability of any total variation neighborhood of f₀ converges to 1 a.s. $F_{0}^{\infty}$ .

Theorem 1 ensures that f₀ is in the KL support of Π. Theorem 2 and assumption A9 ensure that 𝒟ₙ satisfies conditions (1) and (2) of Proposition 2. Hence the proof follows from Proposition 2.

When we use a location-scale kernel, that is, when N = (0, ∞), choose a prior Π₁ = Π₁₁⊗π₁ having full support and set φ to be the identity map, then a choice for π₁ for which Assumption A9 is satisfied is a Weibull density Weib(𝒦; α, β) ∝ 𝒦^{α−1} exp(−β𝒦^α) whenever the shape parameter α exceeds a₁a₃ (a₁, a₃ as in Assumptions A5 and A8).
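The Weibull claim can be checked directly from the tail of the distribution. The following worked step is only a sketch: it assumes the standard Weibull parametrization with shape α and rate β (density αβ t^{α−1} exp(−β t^α) on (0, ∞)), φ the identity map, and uses the symbol t for the precision purely for illustration.

```latex
% Tail of the Weibull prior on the precision t (assumed parametrization)
π_{1}\{(n^{a}, \infty)\}
  = \int_{n^{a}}^{\infty} α β\, t^{α - 1} \exp(-β t^{α})\, dt
  = \exp(-β n^{a α}).
% If α > a_1 a_3, pick a with 1/α < a < (a_1 a_3)^{-1}; then a α > 1 and
\exp(-β n^{a α}) < \exp(-n β) \quad \text{for all } n \geq 2,
% which is the exponential tail bound required in A9.
```

The argument only uses aα > 1, which is why the lower bound on the shape grows with a₁a₃.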

Remark 1

A gamma prior on 𝒦 satisfies the requirements for weak consistency but not A9 (unless a₁a₃ < 1). However, that does not prove that it is not eligible for strong consistency, because Corollary 1 provides only sufficient conditions. In Section 2.3, we prove it to be eligible as long as the hyperparameters are allowed to depend on sample size n in a suitable way.

When the underlying space is non-compact, such as ℜ^d, Corollary 1 applies to any true density f₀ supported on a compact set, say M. Then the kernel can be chosen to have non-compact support, such as a Gaussian, but to apply Theorem 2, we need to restrict the prior on the location mixing distribution to have support in ℳ(M). We can weaken assumptions A5 and A6 to

A5′
sup_{𝒦∈φ⁻¹[0, κ]} ||K(μ, 𝒦) − K(ν, 𝒦)|| ≤ A₁κ^{a₁}ρ(μ, ν) and
A6′
sup_{μ∈M} ||K(μ, 𝒦₁) − K(μ, 𝒦₂)|| ≤ A₂κ^{a₂}ρ₂(𝒦₁, 𝒦₂) ∀ 𝒦₁, 𝒦₂ ∈ φ⁻¹[0, κ]

respectively. Here ||f − g|| denotes the L₁ distance between densities f and g. The proof of Theorem 2 can be easily modified to show consistency under the new assumptions and is left to the reader.

The multivariate Gaussian kernel can be represented as

K (m; μ, \mathcal{K}) = (2 π)^{- d / 2} \det (\mathcal{K})^{1 / 2} \exp \{- \tfrac{1}{2} (m - μ)' \mathcal{K} (m - μ)\}, \quad m, μ \in ℜ^{d}, \ \mathcal{K} \in M^{+} (d),

M⁺(d) being the space of all d × d positive definite matrices. Hence 𝒦⁻¹ is the kernel covariance. The kernel satisfies A5′ and A6′, as shown in Proposition 3. Here by λ₁(𝒦), …, λ_d(𝒦) we denote the eigen-values of 𝒦 in increasing order.

Proposition 3

The multivariate Gaussian kernel satisfies A5′ with φ being the largest eigen-value function, φ(𝒦) = λ_d(𝒦), and a₁ = 1/2. It also satisfies A6′ once we restrict N to be the space of all positive definite matrices with the least eigen-value bounded below by some pre-specified positive constant, say, λ₁, i.e., N = {𝒦 ∈ M⁺(d): λ₁(𝒦) ≥ λ₁}. The space M⁺(d) (and hence N) satisfies A7, while any compact subset M of ℜ^d satisfies Assumption A8 of Theorem 2 with a₃ = d. Hence if

Π_{1} (\mathcal{M} (M) \times \{\mathcal{K} \in N : λ_{d} (\mathcal{K}) > n^{a}\}) < \exp (- n β)

for some a < 2/d and β > 0, and if f₀ is in the KL support of Π, strong posterior consistency follows.

Theorem 4 in Ghosal et al. (1999)[8] provides sufficient conditions on f₀ and Π₁ for the KL condition to be satisfied while using a Gaussian mixture model in the univariate setting to model a compactly supported density. It assumes Π₁ = Π₁₁⊗π₁ with F₀ ∈ supp(Π₁₁) and ∞ ∈ supp(π₁). The theorem can be extended to the multivariate setting with the condition on Π₁ relaxed to the following: for any κ > 0, there exists a 𝒦 ∈ M⁺(d) with λ₁(𝒦) ≥ κ such that (F₀, 𝒦) ∈ supp(Π₁). Therefore λ₁(𝒦) can be viewed as the kernel precision in this case. A full support product Π₁ on ℳ(M) × N will satisfy these requirements. Using a product prior, a choice for π₁ for which strong consistency also follows can be the so-called truncated transformed Wishart, defined as follows. Set 𝒦 = Λ^a for any a ∈ (0, 2/d), with Λ following a Wishart distribution restricted to N. Then 𝒦 is said to follow a truncated transformed Wishart with transformation power a.

Remark 2

The truncation restriction on the space N is not undesirable because, for a more precise fit, we are interested in low bandwidths, and the least eigen-value of 𝒦 can be viewed as the inverse of the band-width. However, the lower the transformation power, the lower the prior probability for high precisions, which is undesirable when sample sizes are not high.

In Wu and Ghosal (2010)[21], strong consistency is proved in the special case of Dirichlet process Gaussian mixtures used to model density f₀ having support as ℜ^d. It requires a to be less than 1/d resulting in even smaller precision. In the next section, we prove that no transformation is required (a = 1) as long as the hyper-parameters are allowed to depend on the sample size appropriately.

2.3 Consistency with sample size-dependent priors

When the dimension of the manifold is large, as is the case in shape analysis with a large number of landmarks, the constraints on the shape parameter in the proposed Weibull prior on the inverse bandwidth become overly-restrictive. In particular, for strong posterior consistency, the shape parameter needs to be very large in high-dimensional cases, implying a prior on the bandwidth that places very small probability in neighborhoods close to zero, which is undesirable in many applications. By instead allowing the prior to depend on sample size n, we can potentially obtain priors that may have better small sample operating characteristics, while still leading to strong consistency. However, for n-dependent priors, the KL condition is no longer sufficient to ensure that (3) holds and hence the Schwartz theorem breaks down. In this section, we will modify the conditions and derive weak and strong consistency results for n-dependent priors.

As recommended in earlier sections, we let P and 𝒦 be independent under Π₁. Then, taking the prior Π₁₁ on P to be constant in n, we focus on the case in which 𝒦 has a sample size-dependent prior density π_n with respect to some base measure λ₁ on N, that is, 𝒦 ~ π_n(𝒦)λ₁(d𝒦). We pick λ₁ to have full support. Depending on the context, π_n will refer to both the density and the distribution of 𝒦. Denote the resulting sequence of induced priors on 𝒟(M) as Π_n. Theorem 3 proves weak posterior consistency under the following assumptions on the prior.

A10
The prior Π₁₁ contains F₀ in its support.
A11
For any ε > 0, for all 𝒦 ∈ N_ε,
$\liminf_{n \to \infty} \exp (n ε) π_{n} (\mathcal{K}) = \infty .$

Here N_ε is as defined in Assumption A2.

Theorem 3

Under assumptions A1 and A2 on the kernel, A4 on the true density f₀, and A10 and A11 on the prior, the posterior probability of any weak neighborhood of f₀ converges to one a.s. $F_{0}^{\infty}$.

The proof is immediate from the following two lemmas.

Lemma 1

Under assumptions A1–A2, A4 and A10–A11, for any ε > 0,

\underset{n \to \infty}{lim inf} exp (n ε) \int \prod_{1}^{n} \frac{f (X_{i})}{f_{0} (X_{i})} Π_{n} (d f) = \infty

(4)

a.s. $F_{0}^{\infty}$ .

Lemma 2

If there exists a uniformly exponentially consistent sequence of test functions for testing H₀: f = f₀ versus H₁: f ∈ U^c, and Π_n(U^c) > 0 for all n > C with C a sufficiently large constant, then for some β₀ > 0,

lim_{n \to \infty} exp (n β_{0}) \int_{U^{c}} \prod_{1}^{n} \frac{f (X_{i})}{f_{0} (X_{i})} Π_{n} (d f) = 0

a.s. $F_{0}^{\infty}$ .

The proof of Lemma 2 is related to that of Lemma 4.4.2 of Ghosh and Ramamoorthi (2003)[9], which is stated for a constant prior Π but with the set U^c depending on n (they call this set V_n). There it is assumed that lim inf_{n→∞} Π(V_n) > 0, but that is not necessary as long as Π(V_n) > 0 for all large n. Lemma 1 is proved in the Appendix.

With a location-scale kernel, N being (0, ∞), a gamma prior π_n(𝒦) ∝ 𝒦^{α−1} exp(−β_n𝒦), α, β_n > 0, denoted by Gam(α, β_n), satisfies Assumption A11 on the entire N, as long as β_n is o(n).

With a multivariate Gaussian kernel on ℜ^d with dispersion 𝒦⁻¹, a Wishart prior on 𝒦,

π_{n} (\mathcal{K}; β_{n}, q) = 2^{- d q / 2} Γ_{d}^{- 1} (q / 2) β_{n}^{d q / 2} \exp (- (β_{n} / 2) Tr (\mathcal{K})) \det (\mathcal{K})^{(q - d - 1) / 2}, \quad q > d - 1, \ β_{n} > 0,

denoted as $Wish (β_{n}^{- 1} I_{d}, q)$, satisfies A11 on the entire M⁺(d), as long as β_n is o(n). Here Γ_d(.) denotes the multivariate gamma function defined as

Γ_{d} (q / 2) = \int_{M^{+} (d)} \exp (- Tr (\mathcal{K})) \det (\mathcal{K})^{(q - d - 1) / 2} d \mathcal{K} .

For strong consistency, we impose the following additional condition on π_n. Let a₁ and a₃ be as in Assumptions A5 (or A5′) and A8 respectively.

A12
For some β₀ > 0 and a < (a₁a₃)⁻¹,
$\lim_{n \to \infty} \exp (n β_{0}) π_{n} \{φ^{- 1} (n^{a}, \infty)\} = 0.$

This assumption is in place of A9 used for constant priors.

Theorem 4

Under Assumptions A1–A2, A4–A8 and A10–A12, the posterior probability of any total variation neighborhood of f₀ converges to 1 a.s. $F_{0}^{\infty}$.

The proof is very similar to that of Corollary 1. This is because under assumptions A1–A2, A4 and A10–A11, the conclusion (4) of Lemma 1 holds. The other assumptions are used to show that the L₁-metric entropy of 𝒟ₙ is o(n) while $Π_{n} (\mathcal{D}_{n}^{c})$ is exponentially small, 𝒟ₙ being defined in Theorem 2. Under these assumptions, the proof of Proposition 2 goes through to prove strong consistency with sample size dependent priors. This is also mentioned in §5 of Ghosal et al. (1999)[8]. They require lim inf_{n→∞} Π_n(K_ε(f₀)) > 0 in place of the assumption Π(K_ε(f₀)) > 0 for constant priors, but this is only to ensure that (4) holds.

Again as in §2.2, we can weaken assumptions A5 and A6 to A5′ and A6′ respectively.

For a location-scale kernel, a Gam(α, β_n) prior on the precision 𝒦 satisfies A12 when n^{1−a} is o(β_n) for some a ∈ (0, (a₁a₃)⁻¹). Hence, for example, we have weak and strong posterior consistency with β_n = b₁n/{log(n)}^{b₂} for any b₁, b₂ > 0.
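The two rate requirements on β_n (β_n = o(n) for weak consistency, n^{1−a} = o(β_n) for strong consistency) can be inspected numerically. The snippet below is a minimal sketch: the function names `beta_n` and `rate_ratios` and the exponent a = 0.25 are illustrative assumptions, not part of the original development.

```python
import math


def beta_n(n, b1=1.0, b2=1.0):
    """Rate sequence beta_n = b1 * n / (log n)^b2, defined for n >= 2."""
    return b1 * n / math.log(n) ** b2


def rate_ratios(n, a, b1=1.0, b2=1.0):
    """Return (beta_n / n, n^(1 - a) / beta_n).

    Both ratios should tend to 0 as n grows: the first reflects
    beta_n = o(n), the second reflects n^(1 - a) = o(beta_n).
    """
    b = beta_n(n, b1, b2)
    return b / n, n ** (1.0 - a) / b


if __name__ == "__main__":
    a = 0.25  # hypothetical exponent, assumed to lie in (0, 1/(a1*a3))
    for n in (10**2, 10**4, 10**8):
        print(n, rate_ratios(n, a))
```

Both ratios decay only slowly (the first like 1/log n), so the conditions are asymptotic statements and say little about a fixed small n.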

For a multivariate Gaussian kernel, to satisfy assumption A6′, we need to truncate the space N to {𝒦 ∈ M⁺(d): λ₁(𝒦) ≥ λ₁}, as proved in Proposition 3. Then we may set a truncated Wishart prior on 𝒦, defined as

π_{n} (\mathcal{K}) = \frac{\exp (- (β_{n} / 2) Tr (\mathcal{K})) \det (\mathcal{K})^{(q - d - 1) / 2}}{\int_{A \in N} \exp (- (β_{n} / 2) Tr (A)) \det (A)^{(q - d - 1) / 2} d A}, \quad \mathcal{K} \in N .

(5)

Then for Assumption A12 to be satisfied, we require n^{1−a} to be o(β_n) for some a ∈ (0, (a₁a₃)⁻¹). This is shown in Proposition 4. Hence we have weak and strong posterior consistency once we set β_n = b₁n/{log(n)}^{b₂} for any b₁, b₂ > 0. Unlike in §2.2, we impose no transformation constraints, which is very helpful especially when sample sizes are not that high while the data dimensions are huge.

Proposition 4

For a positive sequence {β_n} diverging to infinity, Assumption A12 is satisfied for the truncated Wishart density sequence π_n in (5) if there exists an a ∈ (0, (a₁a₃)⁻¹) for which β_n satisfies n^{1−a}/β_n → 0 as n → ∞.

In the subsequent sections, we present kernel choices for density estimation on some specific non-Euclidean manifolds that arise in several applications. We illustrate how to apply Theorems 1, 2, 3 and 4 and obtain weak and strong posterior consistency.

3 Application to unit hypersphere

Let M be the unit sphere S^d embedded in ℜ^d⁺¹. It is a compact Riemannian manifold of dimension d and a compact metric space under the chord distance ρ(u, v) = ||u−v||₂, ||.||₂ denoting the L²-norm. Spherical data on S² arise in the context of directional data analysis. Most of the shape spaces are quotients of high dimensional spheres. Hence it is important to develop consistent inference procedures on this space, and very few results exist in the context of Bayesian nonparametrics.

To define a probability density model as in (1) with respect to the volume form V, we need a suitable kernel which satisfies the assumptions in Section 2. One of the most commonly used probability densities on this space is the von Mises-Fisher (vMF) density which is given by

vMF (m; μ, \mathcal{K}) = c^{- 1} (\mathcal{K}) \exp (\mathcal{K} m^{T} μ), \quad m, μ \in S^{d}, \ \mathcal{K} \in [0, \infty),

(6)

with c being the normalizing constant which can be derived to be

\frac{2 π^{d / 2}}{Γ (\frac{d}{2})} \int_{- 1}^{1} \exp (\mathcal{K} t) (1 - t^{2})^{d / 2 - 1} d t .

(7)

The vMF density on S¹ was first derived in Von Mises (1918)[18] and the density in the case of S² was given by Fisher (1953)[7]. Watson and Williams (1953)[19] generalized this distribution to S^d and examined many of its properties. It can be shown that the parameter μ is the extrinsic mean (as defined in Bhattacharya and Patrangenaru (2003)[4]), and hence can be interpreted as the distribution location. The parameter 𝒦 is a measure of concentration, with 𝒦 = 0 corresponding to the uniform distribution having constant density equal to 1/∫_{S^d} V(dm). As 𝒦 diverges to ∞, the vMF distribution converges to a point mass at μ in an L¹ sense uniformly. This is proved in Theorem 5.
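As a numerical illustration of (6) and (7), the normalizing constant can be approximated by quadrature. The sketch below uses composite Simpson's rule and assumes d ≥ 2 so that the integrand in (7) is bounded; the function names `vmf_norm_const` and `vmf_density` and the argument name `kappa` (for the concentration parameter) are illustrative choices, not notation from the paper.

```python
import math


def vmf_norm_const(kappa, d, steps=2000):
    """Approximate the normalizing constant in (7) by composite Simpson's rule.

    Assumes d >= 2, for which exp(kappa*t) * (1 - t^2)^(d/2 - 1) is bounded
    on [-1, 1]; the case d = 1 has endpoint singularities and is not handled.
    """
    if d < 2:
        raise ValueError("this sketch assumes d >= 2")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = 2.0 / steps

    def g(t):
        return math.exp(kappa * t) * max(0.0, 1.0 - t * t) ** (d / 2.0 - 1.0)

    total = g(-1.0) + g(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(-1.0 + i * h)
    integral = total * h / 3.0
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0) * integral


def vmf_density(m, mu, kappa):
    """vMF density (6) at unit vector m, with location mu, both in R^(d+1)."""
    d = len(m) - 1
    dot = sum(a * b for a, b in zip(m, mu))
    return math.exp(kappa * dot) / vmf_norm_const(kappa, d)
```

For d = 2 the integral in (7) has the closed form 4π sinh(κ)/κ, which gives a convenient check of the quadrature; at concentration 0 the constant reduces to the surface area 4π of S².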

Theorem 5

The vMF kernel satisfies Assumptions A1 and A2.

Hence from Theorem 1, weak posterior consistency follows using the location mixture density model (1) with a Dirichlet process prior on P and an independent gamma prior on 𝒦. In the d = 2 special case, Lennox et al. (2009)[13] proposed a closely related model but did not consider theoretical properties. Theorem 6 verifies the assumptions for strong consistency.
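To make the model concrete, one draw from such a prior on S² can be simulated. The sketch below is illustrative only: it truncates the stick-breaking representation of the Dirichlet process at a fixed number of atoms (an approximation, not the exact prior), takes the base measure P₀ to be uniform on S², uses the closed form of the normalizing constant for d = 2, and draws the concentration from a hypothetical gamma distribution with shape 2 and rate 1. All function names are assumptions made for this example.

```python
import math
import random


def vmf_density_s2(m, mu, kappa):
    """vMF density on S^2 w.r.t. the volume form.

    Uses c(kappa) = 4*pi*sinh(kappa)/kappa, rewritten as
    kappa * exp(kappa*(m.mu - 1)) / (2*pi*(1 - exp(-2*kappa)))
    so that large kappa does not overflow.
    """
    if kappa < 1e-8:
        return 1.0 / (4.0 * math.pi)  # uniform limit as kappa -> 0
    dot = sum(a * b for a, b in zip(m, mu))
    return kappa * math.exp(kappa * (dot - 1.0)) / (
        2.0 * math.pi * -math.expm1(-2.0 * kappa))


def uniform_on_sphere(rng):
    """Uniform draw on S^2 obtained by normalizing a standard Gaussian vector."""
    while True:
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(v * v for v in x))
        if norm > 1e-12:
            return tuple(v / norm for v in x)


def draw_truncated_dp(rng, w0=1.0, atoms=50):
    """Truncated stick-breaking draw from DP(w0 * P0), P0 uniform on S^2."""
    weights, remaining = [], 1.0
    for _ in range(atoms - 1):
        v = rng.betavariate(1.0, w0)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom absorbs the leftover mass
    locations = [uniform_on_sphere(rng) for _ in range(atoms)]
    return weights, locations


def mixture_density(m, weights, locations, kappa):
    """Location mixture (1) for a discrete mixing distribution P."""
    return sum(w * vmf_density_s2(m, mu, kappa)
               for w, mu in zip(weights, locations))


rng = random.Random(0)
weights, locations = draw_truncated_dp(rng)
kappa = rng.gammavariate(2.0, 1.0)  # hypothetical gamma draw (shape 2, scale 1)
value = mixture_density((0.0, 0.0, 1.0), weights, locations, kappa)
```

Because the drawn mixing distribution is discrete, the integral in (1) reduces to a finite weighted sum of kernels, which is what `mixture_density` evaluates.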

Theorem 6

With φ(𝒦) = 𝒦, the vMF kernel on S^d satisfies assumption A5 with a₁ = d/2 + 1 and A6 with a₂ = d/2. The compact metric space (S^d, ρ) satisfies assumption A8 with a₃ = d.

As a result, a Weibull prior on 𝒦 with shape parameter exceeding a₁a₃ = d + d²/2 satisfies the condition of Corollary 1 and strong posterior consistency follows. The proofs of Theorems 5 and 6 use the following lemma which establishes certain properties of the normalizing constant.

Lemma 3

Define c̃(𝒦) = exp(−𝒦)c(𝒦), 𝒦 ≥ 0. Then c̃ is decreasing and for 𝒦 ≥ 1,

\tilde{c} (\mathcal{K}) \geq C \mathcal{K}^{- d / 2}

for some appropriate positive constant C.
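The two properties in Lemma 3 can be inspected numerically, which is no substitute for the proof. The sketch below assumes d ≥ 2 and evaluates the rescaled constant through the overflow-free form obtained by moving the exponential factor inside the integral in (7); the names `c_tilde` and `check_lemma3` and the argument name `kappa` (for the concentration parameter) are illustrative.

```python
import math


def c_tilde(kappa, d, steps=4000):
    """Rescaled constant exp(-kappa) * c(kappa), written as
    (2*pi^(d/2)/Gamma(d/2)) * int_{-1}^{1} exp(-kappa*(1-t)) (1-t^2)^(d/2-1) dt.

    Composite Simpson's rule with an even number of steps; assumes d >= 2
    so that the integrand is bounded on [-1, 1].
    """
    h = 2.0 / steps

    def g(t):
        return math.exp(-kappa * (1.0 - t)) * max(0.0, 1.0 - t * t) ** (d / 2.0 - 1.0)

    total = g(-1.0) + g(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(-1.0 + i * h)
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0) * total * h / 3.0


def check_lemma3(d, kappas):
    """Return (is_decreasing, min of kappa^(d/2) * c_tilde over kappa >= 1).

    `kappas` must be increasing and contain at least one value >= 1.
    """
    values = [c_tilde(k, d) for k in kappas]
    decreasing = all(x > y for x, y in zip(values, values[1:]))
    scaled = [k ** (d / 2.0) * v for k, v in zip(kappas, values) if k >= 1.0]
    return decreasing, min(scaled)
```

For d = 2 the rescaled constant equals 2π(1 − exp(−2κ))/κ in closed form, so the lower bound in the lemma holds there with C = 2π(1 − e⁻²).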

When d is large, as is often the case for spherical data, a more appropriate prior on 𝒦 for which weak and strong consistency hold can be Gam(α, β_n), as mentioned at the end of §2.3.
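How quickly the constant-prior conditions tighten with dimension can be tabulated from the exponents in Theorem 6. The sketch below assumes, as in the location-scale discussion following Corollary 1, that a Weibull shape parameter must exceed a₁a₃; the function names are illustrative.

```python
def sphere_exponents(d):
    """Exponents from Theorem 6 for the vMF kernel on S^d."""
    a1 = d / 2.0 + 1.0  # exponent in A5
    a3 = float(d)       # covering-number exponent in A8
    return a1, a3


def weibull_shape_threshold(d):
    """Product a1 * a3 = d + d^2 / 2 that the Weibull shape must exceed."""
    a1, a3 = sphere_exponents(d)
    return a1 * a3


def sieve_exponent_bound(d):
    """Upper limit (a1 * a3)^(-1) for the exponent a in A9 and A12."""
    return 1.0 / weibull_shape_threshold(d)


if __name__ == "__main__":
    for d in (2, 10, 50):
        print(d, weibull_shape_threshold(d), sieve_exponent_bound(d))
```

The threshold grows quadratically in d, which is the sense in which a fixed Weibull prior pushes mass away from small bandwidths in high dimensions, whereas the sample size-dependent gamma prior only constrains the rate of β_n.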

It is easy to check that the vMF density is the conditional distribution, given ||X|| = 1, of a Gaussian random vector X on ℜ^{d+1} with mean μ and dispersion matrix 𝒦⁻¹I_{d+1}. A more general family of distributions on S^d may be obtained as the conditional distribution, given ||X|| = 1, of a Normal X on ℜ^{d+1} with mean μ and dispersion matrix 𝒦⁻¹, with 𝒦 in the space M⁺(d + 1) of (d + 1) × (d + 1) positive definite matrices. Then we obtain the Fisher-Bingham family of kernels. It would be interesting to show that the resulting kernel mixture satisfies the assumptions of Theorems 1, 2, 3 and 4 and obtain posterior consistency. We postpone that to later work.

A generalization of the sphere is the Stiefel manifold V_{d+1,k}, the space of all k-dimensional orthonormal frames in ℜ^{d+1}. One can easily extend the vMF kernel to the so-called Fisher kernel on this manifold and carry out density estimation. Again, proving that consistency holds is postponed to future work.

Another important manifold arising in axial data analysis is RP^d, the space of all rays in ℜ^d⁺¹. This manifold can be obtained as the quotient of S^d after identifying antipodal points p and −p as identical. In the next section, we illustrate density estimation on its complex analogue, the complex projective space. It is easy and simpler to obtain analogous results on the real version.

4 Planar Shape Space

4.1 Background

Let M be the planar shape space $\Sigma_{2}^{k}$, which is defined as follows. Consider a set of k landmark locations, k > 2, on a 2D image, not all points being the same. We refer to such a set as a k-ad. The similarity shape of this k-ad is what remains after removing the effects of translation, rotation and scaling. We use the following shape representation first proposed by Kendall (1984)[11]. Denote the k-ad by a complex k-vector z in ℂ^k. To remove the effect of translation from z, let z_c = z − z̄, with $\bar{z} = (\sum_{j = 1}^{k} z_{j}) / k$ being the centroid. The centered k-ad z_c lies in a k − 1 dimensional complex subspace, and hence we can use k − 1 complex coordinates. The effect of scaling is then removed by normalizing the coordinates of z_c to obtain a point w on the complex unit sphere ℂS^{k−2} in ℂ^{k−1}. Since w contains the shape information of z along with rotation, it is called the preshape of z. The similarity shape of z is the orbit of w under all rotations in 2D, which is

[w] = {e^{i θ} w : θ \in (- π, π]} .

This represents a shape as the set of all intersection points of a unique complex line passing through the origin with ℂS^{k−2}, and the planar shape space $\Sigma_{2}^{k}$ is then the set of all such shapes. Hence $\Sigma_{2}^{k}$ can be identified with the space of all complex lines passing through the origin in ℂ^{k−1}, which is the complex projective space ℂP^{k−2} and is a compact Riemannian manifold of dimension 2k − 4. The space $\Sigma_{2}^{k}$ can be embedded into the space of all order k − 1 complex Hermitian matrices via the embedding J([w]) = ww^*, where ^* denotes the complex conjugate transpose. This embedding induces a distance on $\Sigma_{2}^{k}$ called the extrinsic distance, which generates the manifold topology and is given by

d_{E} ([u], [v]) = | | J ([u]) - J ([v]) | | = \sqrt{2 (1 - {∣ u^{*} v ∣}^{2})} ([u], [v] \in \Sigma_{2}^{k}) .

For more details, see Bhattacharya and Dunson (2010a)[2] and the references cited therein.
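To make the construction concrete, the following Python sketch computes a preshape and the extrinsic distance between the shapes of two k-ads. As a simplifying assumption it works with the centered and normalized k-vector directly rather than with k − 1 complex coordinates; this leaves |u^*v|, and hence d_E, unchanged provided the k − 1 coordinates are taken with respect to an orthonormal basis of the centered subspace. The function names and the example landmarks are illustrative only.

```python
import cmath
import math


def preshape(z):
    """Center a k-ad (list of complex landmarks) and scale it to unit norm."""
    k = len(z)
    centroid = sum(z) / k
    centered = [zj - centroid for zj in z]
    norm = math.sqrt(sum(abs(zj) ** 2 for zj in centered))
    if norm == 0.0:
        raise ValueError("all landmarks coincide, so the shape is undefined")
    return [zj / norm for zj in centered]


def extrinsic_distance(z1, z2):
    """d_E([u], [v]) = sqrt(2 (1 - |u^* v|^2)) for the preshapes u, v of z1, z2."""
    u, v = preshape(z1), preshape(z2)
    inner = sum(uj.conjugate() * vj for uj, vj in zip(u, v))  # u^* v
    return math.sqrt(max(0.0, 2.0 * (1.0 - abs(inner) ** 2)))


triangle = [0 + 0j, 1 + 0j, 0 + 1j]
# Same triangle after scaling by 3, rotating by 0.7 radians and translating.
moved = [3.0 * cmath.exp(0.7j) * zj + (2 - 1j) for zj in triangle]
# A genuinely different triangle.
other = [0 + 0j, 2 + 0j, 0 + 1j]

if __name__ == "__main__":
    print(extrinsic_distance(triangle, moved))
    print(extrinsic_distance(triangle, other))
```

The first distance vanishes up to rounding because the two k-ads differ only by a similarity transformation, while the second is strictly positive and bounded by √2.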

4.2 Density model

We define a location-mixture density on $\Sigma_{2}^{k}$ as in (1) with respect to the Riemannian volume form V and the kernel being a complex Watson density. This complex Watson density was used in Dryden and Mardia (1998)[5] for parametric density modelling and is given by

CW (m; μ, K) = c^{- 1} (K) exp {K ({∣ z^{*} ν ∣}^{2} - 1)} (m = [z], μ = [ν])

(8)

with c (K) = π^{k - 2} K^{2 - k} (1 - exp (- K) \sum_{r = 0}^{k - 3} \frac{K^{r}}{r!}) .

(9)

It is shown in Bhattacharya and Dunson (2010a)[2] that the complex Watson kernel satisfies assumptions A1 and A2 in §2. Using a Dirichlet Process prior on the location mixing distribution and an independent gamma prior on the precision parameter, Theorem 1 implies that the density model (1) has full support in the space of all positive continuous densities on $\Sigma_{2}^{k}$ in the uniform and KL sense, and hence the posterior is weakly consistent.

Theorem 7 verifies that the complex Watson kernel also satisfies the regularity conditions in A5 and A6.

Theorem 7

The complex Watson kernel CW(m; μ, K) on the compact metric space $\Sigma_{2}^{k}$ endowed with the extrinsic distance d_E satisfies assumption A5 with a₁ = k − 1 and A6 with a₂ = 3k − 8.

The proof uses Lemma 4 which verifies certain properties of the normalizing constant.

Lemma 4

Let c(K) be the normalizing constant for CW(μ, K) as defined in (9). Then c is decreasing on [0, ∞) with

lim_{K \to 0} c (K) = \frac{π^{k - 2}}{(k - 2)!} and lim_{K \to \infty} c (K) = 0.

If we define

\tilde{c} (K) = 1 - exp (- K) \sum_{r = 0}^{k - 3} \frac{K^{r}}{r!},

it follows that c̃ is increasing with

\begin{array}{r} lim_{K \to 0} \tilde{c} (K) = 0, lim_{K \to \infty} \tilde{c} (K) = 1 and \\ \tilde{c} (K) \geq (k - 2)!^{- 1} exp (- K) K^{k - 2} . \end{array}

The proof follows from direct computations.
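The claims of Lemma 4 are easy to probe numerically. The Python sketch below evaluates c and c̃ from (9) on a grid and checks the monotonicity, the small-K limit and the lower bound on c̃; the function names and the grid are assumptions of this illustration, and very small values of K are avoided because the formula for c̃ then suffers from cancellation.

```python
import math


def c_tilde(K, k):
    """1 - exp(-K) * sum_{r=0}^{k-3} K^r / r!, as in Lemma 4."""
    partial_sum = sum(K ** r / math.factorial(r) for r in range(k - 2))
    return 1.0 - math.exp(-K) * partial_sum


def c(K, k):
    """Normalizing constant (9) of the complex Watson density, for K > 0."""
    return math.pi ** (k - 2) * K ** (2 - k) * c_tilde(K, k)


def c_tilde_lower_bound(K, k):
    """exp(-K) K^(k-2) / (k-2)!, the lower bound on c_tilde stated in Lemma 4."""
    return math.exp(-K) * K ** (k - 2) / math.factorial(k - 2)


if __name__ == "__main__":
    k = 4
    for K in (0.01, 0.5, 2.0, 10.0):
        print(K, c(K, k), c_tilde(K, k), c_tilde_lower_bound(K, k))
    print("limit of c at 0:", math.pi ** (k - 2) / math.factorial(k - 2))
```

The printed values of c should decrease from about π^{k−2}/(k − 2)! towards 0, while c̃ should increase from 0 towards 1 and stay above the stated lower bound.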

Theorem 8 verifies that assumption A8 holds on $\Sigma_{2}^{k}$.

Theorem 8

The compact metric space ($\Sigma_{2}^{k}$, d_E) satisfies assumption A8 with a₃ = 2k − 3.

As a result, Corollary 1 implies that strong posterior consistency holds with Π₁ = (DP)(ω₀P₀)⊗π₁, for Weibull π₁ with shape parameter exceeding (2k − 3)(k − 1). Alternatively one may use a gamma prior on K with inverse-scale increasing with n at a suitable rate, and consistency then follows from Theorems 3 and 4.

The complex Watson kernel is a special case of the complex Bingham kernel, which has density proportional to exp(z^*Az) with respect to the volume form. This kernel has location corresponding to the shape of an eigenvector corresponding to the largest eigenvalue of A. Since it has more parameters, we expect a better fit in smaller samples. We will prove that weak and strong posterior consistency hold for this kernel in a later work.

When the landmarks are obtained from a 3D object, it is more appropriate to carry out an affine shape analysis, that is, to identify two k-configurations as identical if they are related by an affine transformation. One can identify the resulting shape space with the Grassmannian manifold, the space of all 3D subspaces of ℜ^{k−1}, a result of Sparr (1992)[17]. The Grassmannian is an extension of the real projective space and hence one may consider (real) Bingham kernels and construct kernel mixture density models on this space.

5 Summary & Future Work

We consider kernel mixture density models on general compact manifolds and obtain sufficient conditions on the kernel, priors and the space for the density estimate to be strongly consistent. Thereby we extend the existing literature on strong posterior consistency on ℜ^d using Gaussian kernels to more general non-Euclidean manifolds. The conditions are verified for specific kernels on two important manifolds, namely the hypersphere and the planar shape space. It is discussed how to extend the kernel choice on these manifolds and construct their counterparts on other manifolds arising as generalisations. The multivariate Gaussian mixture model with an appropriate truncated and transformed Wishart prior on the within cluster covariance inverse is also shown to satisfy the consistency conditions when used to model a compactly supported density on ℜ^d. We also allow the prior to depend on the sample size and obtain sufficient conditions for weak and strong consistency, while expecting better small sample operating characteristics. As a result a truncated Wishart prior on the covariance inverse of a multivariate Gaussian kernel is shown to satisfy the requirements for strong consistency.

In later works we plan to prove the results for other kernels on additional manifolds arising in applications. We also plan to extend the results to cover densities with non-compact support, in particular ℜ^d. Since most of the non-Euclidean manifolds arising in applications are compact, that is not a high priority.

6 Appendix

6.1 Proof of Theorem 1

The proof follows the lines of that of Theorem 1 in Bhattacharya and Dunson (2010a)[2].

Proof

First of all we show that the set

{P \in M (M) : sup_{m \in M, K \in N_{ε}} ∣ f (m; P, K) - f (m; F_{0}, K) ∣ < ε}

(10)

contains a weakly open neighborhood of F₀, F₀ being the distribution corresponding to f₀. Since the kernel is continuous by assumption A1, for any (m, K) ∈ M × N_ε,

W_{m, K} = {P : ∣ f (m; P, K) - f (m; F_{0}, K) ∣ < ε / 3}

defines an open neighborhood of F₀. The mappings from (m, K) to f(m; P, K) form a uniformly equicontinuous family of functions on M × N_ε, labeled by P ∈ M(M), because, for m₁, m₂ ∈ M and K₁, K₂ ∈ N_ε,

∣ f (m_{1}; P, K_{1}) - f (m_{2}; P, K_{2}) ∣ \leq \int_{M} ∣ K (m_{1}; μ, K_{1}) - K (m_{2}; μ, K_{2}) ∣ P (d μ)

and the kernel is uniformly continuous on M × M × N_ε. Therefore there exists a δ > 0 such that ρ₁₂((m₁, K₁), (m₂, K₂)) < δ implies that

sup_{P} ∣ f (m_{1}; P, K_{1}) - f (m_{2}; P, K_{2}) ∣ < ε / 3.

Here ρ₁₂ denotes any distance on M × N inducing the product topology. Cover M × N_ε by finitely many balls of radius $δ : M \times N_{ε} = \cup_{i = 1}^{J} B {(m_{i}, K_{i}), δ}$ . Let $W_{ε} = \cap_{i = 1}^{J} W_{m_{i}, K_{i}}$ which is an open neighborhood of F₀. Let P ∈ W_ε and (m, K) ∈ M × N_ε. Then there exists a (m_i, K_i) such that (m, K) ∈ B{(m_i, K_i), δ}. Then |f(m; P, K) − f(m; F₀, K)|

\begin{array}{l} \leq ∣ f (m; P, K) - f (m_{i}; P, K_{i}) ∣ + ∣ f (m_{i}; P, K_{i}) - f (m_{i}; F_{0}, K_{i}) ∣ + ∣ f (m_{i}; F_{0}, K_{i}) - f (m; F_{0}, K) ∣ \\ < \frac{ε}{3} + \frac{ε}{3} + \frac{ε}{3} = ε . \end{array}

This proves that the set in (10) contains W_ε, an open neighborhood of F₀.

For P ∈ W_ε and K ∈ N_ε, sup_m|f₀(m) − f(m; P, K)|

\leq {sup}_{m} ∣ f_{0} (m) - f (m; F_{0}, K) ∣ + {sup}_{m} ∣ f (m; F_{0}, K) - f (m; P, K) ∣ < 2 ε

because of assumptions A2 and A4. Hence

Π {f : sup_{m \in M} ∣ f (m) - f_{0} (m) ∣ < 2 ε} \geq Π_{1} (W_{ε} \times N_{ε}) > 0

because of assumption A3 and the fact that int(W_ε × N_ε) intersects supp(Π₁). This implies the KL property when f₀ is strictly positive (and hence bounded below), as shown in Corollary 1 of Bhattacharya and Dunson (2010a)[2].

In case f₀ is not bounded below, we use Lemma 4 in Wu and Ghosal (2008)[20] to get a continuous everywhere positive density f₁ (depending on f₀ and ε) for which $Π (K_{ε} (f_{1})) \leq Π (K_{2 ε + \sqrt{ε}} (f_{0}))$ . From what we have proved above, Π(K_ε(f₁)) > 0 and as a result the KL condition follows for f₀.

6.2 Proof of Theorem 2

In this proof and the subsequent ones, we shall use a general symbol C for any constant not depending on n (but possibly on ε).

Proof

Given δ₁ > 0 (≡ δ₁(ε, n)), cover M by N₁ (≡ N₁(δ₁)) many disjoint subsets of diameter at most $δ_{1} : M = \cup_{i = 1}^{N_{1}} E_{i}$ . Assumption A8 implies that for δ₁ sufficiently small, $N_{1} \leq C δ_{1}^{- a_{3}}$ . Pick μ_i ∈ E_i, i = 1, …, N₁, and define for a probability P,

P_{n} = \sum_{i = 1}^{N_{1}} P (E_{i}) δ_{μ_{i}}, P_{n} (E) = {(P (E_{1}), \dots, P (E_{N_{1}}))}^{T} .

(11)

Denoting the L₁-norm as ||.||, for any K with φ(K) ≤ κ_n,

\begin{array}{r} | | f (P, K) - f (P_{n}, K) | | \leq \sum_{i = 1}^{N_{1}} \int_{E_{i}} | | K (μ, K) - K (μ_{i}, K) | | P (d μ) \\ \leq C \sum_{i} \int_{E_{i}} {sup}_{m \in M} ∣ K (m; μ, K) - K (m; μ_{i}, K) ∣ P (d μ) \end{array}

(12)

\leq C κ_{n}^{a_{1}} δ_{1} .

(13)

The inequality in (13) follows from (12) using Assumption A5.

For K, K̃ in φ⁻¹[0, κ_n] and P ∈ M(M),

\begin{array}{r} | | f (P, K) - f (P, \tilde{K}) | | \leq C sup_{m, μ \in M} ∣ K (m; μ, K) - K (m; μ, \tilde{K}) ∣ \\ \leq C κ_{n}^{a_{2}} ρ_{2} (K, \tilde{K}), \end{array}

(14)

the inequality in (14) following from Assumption A6. Given δ₂ > 0 (≡ δ₂(ε, n)), cover φ⁻¹ [0, κ_n] by finitely many subsets of diameter at most δ₂, the number of such subsets required being at most $C {(κ_{n} δ_{2}^{- 1})}^{b_{2}}$ , from Assumption A7. Call the collection of these subsets W (δ₂, n).

Letting S_d = {x ∈ [0, 1]^d: Σx_i ≤ 1}, S_d is compact under the L¹-metric (||x||_L¹ = Σ|x_i|, x ∈ ℜ^d), and hence given any δ₃ > 0 (≡ δ₃(ε)), can be covered by finitely many subsets of the cube [0, 1]^d each of diameter at most δ₃. In particular cover S_d with cubes of side length δ₃/d lying partially or totally in S_d. Then an upper bound on the number N₂ ≡ N₂(δ₃, d) of such cubes can be shown to be $\frac{λ (S_{d} (1 + δ_{3}))}{{(δ_{3} / d)}^{d}}$ , λ denoting the Lebesgue measure on ℜ^d and S_d(r) = {x ∈ [0, ∞)^d: Σx_i ≤ r}. Since λ(S_d(r)) = r^d/d!, hence

N_{2} (δ_{3}, d) \leq \frac{d^{d}}{d!} {(\frac{1 + δ_{3}}{δ_{3}})}^{d} .

Let 𝒲(δ₃, d) denote the partition of S_d as constructed above.

Let d_n = N₁(δ₁). For 1 ≤ i ≤ N₂(δ₃, d_n), $1 \leq j \leq C {(κ_{n} δ_{2}^{- 1})}^{b_{2}}$ , define

D_{i j} = {f (P, K) : P_{n} (E) \in W_{i}, K \in W_{j}},

with W_i and W_j being elements of the partition 𝒲(δ₃, d_n) of S_{d_n} and of W(δ₂, n) respectively. We claim that this subset has L¹ diameter of at most ε. For f(P, K), f(P̃, K̃) in this set, ||f(P, K) − f(P̃, K̃)|| ≤

\begin{array}{r} | | f (P, K) - f (P_{n}, K) | | + | | f (P_{n}, K) - f ({\tilde{P}}_{n}, K) | | + \\ + | | f ({\tilde{P}}_{n}, K) - f (\tilde{P}, K) | | + | | f (\tilde{P}, K) - f (\tilde{P}, \tilde{K}) | | . \end{array}

(15)

From inequality (13), it follows that the first and third terms in (15) are at most $C κ_{n}^{a_{1}} δ_{1}$ . The second term can be bounded by

\sum_{i = 1}^{d_{n}} ∣ P (E_{i}) - \tilde{P} (E_{i}) ∣ < δ_{3}

and from the inequality in (14), the fourth term is bounded by $C κ_{n}^{a_{2}} δ_{2}$ . Hence the claim holds if we choose $δ_{1} = C κ_{n}^{- a_{1}}, δ_{2} = C κ_{n}^{- a_{2}}$ , and δ₃ = C. The number of such subsets covering D_n is at most $C N_{2} (δ_{3}, d_{n}) {(κ_{n} δ_{2}^{- 1})}^{b_{2}}$ . From Assumption A8, it follows that for n sufficiently large,

d_{n} = N_{1} (δ_{1}) \leq C κ_{n}^{a_{1} a_{3}} .

Using Stirling's formula, we can bound log(N₂(δ₃, d_n)) by Cd_n. Also $κ_{n} δ_{2}^{- 1}$ is bounded by $C κ_{n}^{a_{2} + 1}$ , so that N(ε, D_n) ≤

C + C log (κ_{n}) + C d_{n} \leq C κ_{n}^{a_{1} a_{3}}

for n sufficiently large.
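The step from the bound on N₂(δ₃, d) to log N₂(δ₃, d_n) ≤ Cd_n can be made explicit: since d! ≥ (d/e)^d, one has d log d − log d! ≤ d, so the logarithm of the bound is at most d{1 + log((1 + δ₃)/δ₃)}. The Python sketch below checks this numerically; the function names are assumptions of this illustration, and the explicit constant is one admissible choice of C rather than the one used by the authors.

```python
import math


def log_n2_bound(delta3, d):
    """Logarithm of (d^d / d!) * ((1 + delta3) / delta3)^d."""
    return (
        d * math.log(d)
        - math.lgamma(d + 1)  # lgamma(d + 1) = log(d!)
        + d * math.log((1.0 + delta3) / delta3)
    )


def linear_bound(delta3, d):
    """d * (1 + log((1 + delta3) / delta3)), using d! >= (d / e)^d."""
    return d * (1.0 + math.log((1.0 + delta3) / delta3))


if __name__ == "__main__":
    for d in (1, 10, 1000, 10 ** 6):
        print(d, log_n2_bound(0.1, d), linear_bound(0.1, d))
```

For fixed δ₃ the second printed column never exceeds the third, which is the linear growth in d_n used in the final entropy bound.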

6.3 Proof of Proposition 3

Proof

In Lemma 5 of Wu and Ghosal (2010)[21], it is shown that

| | K (μ, K) - K (ν, K) | | \leq C λ_{d} {(K)}^{1 / 2} ρ (μ, ν)

for all μ, ν ∈ ℜ^d and K ∈ M⁺(d). This means that A5′ is satisfied with a₁ = 1/2. Also by the geometry of ℜ^d, A8 is satisfied with a₃ = d.

To show A7, note that φ⁻¹ [0, κ] is a subset of M ⁺ (d) consisting of those positive matrices whose eigen values are bounded by κ. We equip M ⁺ (d) with the L₂ norm distance, i.e.

ρ_{2} (K_{1}, K_{2}) = {| | K_{1} - K_{2} | |}_{2}, {| | K | |}_{2}^{2} = \sum_{i j} K_{i j}^{2} = Trace (K^{2})

and view it as a subset of M(d), the space of all order d real matrices. Then φ⁻¹ [0, κ] is contained in a ball of radius $\sqrt{d κ}$ around the zero matrix. The ε-covering number of such a ball is of the order ${(\sqrt{κ} / ε)}^{d^{2}}$ . Hence A7 is also satisfied.

It remains to check A6′. Since φ⁻¹ [0, κ] is a convex subset of M(d), use Taylor's theorem to get

∣ K (x; μ, K_{1}) - K (x; μ, K_{2}) ∣ \leq ρ_{2} (K_{1}, K_{2}) sup_{K \in φ^{- 1} [0, κ]} {| | \frac{\partial}{\partial K} K (x; μ, K) | |}_{2}

for all x, μ ∈ ℜ^d, and K₁, K₂ ∈ φ⁻¹ [0, κ]. This in turn implies that

\begin{array}{r} | | K (μ, K_{1}) - K (μ, K_{2}) | | \leq \\ ρ_{2} (K_{1}, K_{2}) \int_{R^{d}} sup_{K \in φ^{- 1} [0, κ]} {| | \frac{\partial}{\partial K} K (x; μ, K) | |}_{2} d x . \end{array}

Some calculation will show that

\begin{array}{r} {| | \frac{\partial}{\partial K} K (x; μ, K) | |}_{2} \leq \\ C K (x; μ, K) (\sqrt{\sum_{1}^{d} λ_{j}^{- 2} (K)} + {| | x - μ | |}_{2}^{2}), \end{array}

C being some constant independent of x, μ or K. Since φ⁻¹ [0, κ] consists of all positive matrices whose eigenvalues lie in [λ₁, κ], this will imply that

| | K (μ, K_{1}) - K (μ, K_{2}) | | \leq C κ^{d / 2} ρ_{2} (K_{1}, K_{2})

which means that A6′ is also satisfied with a₂ = d/2. Here C denotes a constant independent of μ, K, or κ. The rest of the proof follows from Theorem 2 and Proposition 2.

6.4 Proof of Lemma 1

Proof

Fix ε > 0. Under assumptions A1, A2 and A4, it follows from the proof of Theorem 1 that there exists a weakly open neighborhood W of F₀ (depending on ε) such that K_ε(f₀) contains {f(P, K): P ∈ W, K ∈ N_ε}. Hence

\begin{array}{r} \int \prod_{1}^{n} \frac{f (X_{i})}{f_{0} (X_{i})} Π_{n} (d f) \geq \int_{K_{ε} (f_{0})} \prod_{1}^{n} \frac{f (X_{i})}{f_{0} (X_{i})} Π_{n} (d f) \\ \geq \int_{W \times N_{ε}} \prod_{1}^{n} \frac{f (X_{i}; P, K)}{f_{0} (X_{i})} π_{n} (K) Π_{11} (d P) λ_{1} (d K) . \end{array}

By the law of large numbers, for any f ∈ K_ε(f₀),

\frac{1}{n} \sum_{i} log {(f_{0} / f) (X_{i})} \to KL (f_{0}; f) < ε

a.s. $F_{0}^{\infty}$ as n → ∞. Therefore for any P ∈ W and K ∈ N_ε,

\begin{array}{r} \underset{n}{lim inf} exp (2 n ε) \prod_{1}^{n} \frac{f (X_{i}; P, K)}{f_{0} (X_{i})} = \\ \underset{n}{lim inf} exp [n [2 ε - (1 / n) \sum_{i} log {f_{0} (X_{i}) / f (X_{i}; P, K)}]] = \infty a . s . F_{0}^{\infty} . \end{array}

Also from Assumption A11, lim inf_n exp(nε)π_n(K) = ∞ ∀ K ∈ N_ε and hence

\underset{n}{lim inf} exp (3 n ε) \prod_{1}^{n} \frac{f (X_{i}; P, K)}{f_{0} (X_{i})} π_{n} (K) = \infty a . s . F_{0}^{\infty} .

By the Fubini-Tonelli theorem, there exists an Ω₀ ⊂ Ω with probability 1 such that for any ω ∈ Ω₀,

\underset{n}{lim inf} exp (3 n ε) \prod_{1}^{n} \frac{f (X_{i} (ω); P, K)}{f_{0} (X_{i} (ω))} π_{n} (K) = \infty

for all (P, K) ∈ W × N_ε outside of a Π₁₁(dP)⊗λ₁(dK) measure 0 subset. By Assumption A10, and since λ₁ has full support and N_ε has a non-empty interior, (Π₁₁⊗λ₁)(W × N_ε) > 0. Therefore, using Fatou's lemma, we conclude that

\begin{array}{r} \underset{n}{lim inf} exp (3 n ε) \int \prod_{1}^{n} \frac{f (X_{i})}{f_{0} (X_{i})} Π_{n} (d f) \geq \\ \int_{W \times N_{ε}} \underset{n}{lim inf} {exp (3 n ε) \prod_{1}^{n} \frac{f (X_{i}; P, K)}{f_{0} (X_{i})} π_{n} (K)} Π_{11} (d P) λ_{1} (d K) = \infty a . s . F_{0}^{\infty} . \end{array}

Since ε was arbitrary, the proof is completed.

6.5 Proof of Proposition 4

Proof

For any a > 0,

\begin{array}{l} π_{n} {φ^{- 1} (n^{a}, \infty)} & = P r (λ_{d} (K) > n^{a}), K ~ π_{n} \\ = P r (λ_{d} (X) > n^{a} ∣ λ_{1} (X) \geq λ_{1}), X ~ Wish (β_{n}^{- 1} I_{d}, q) \\ \leq P r (λ_{d} (X) > n^{a}) / P r (λ_{1} (X) \geq λ_{1}) \\ = P r (λ_{d} (Z) > β_{n} n^{a}) / P r (λ_{1} (Z) \geq β_{n} λ_{1}), Z ~ Wish (I_{d}, q), \end{array}

(16)

the last identity following because X equals $β_{n}^{- 1} Z$ in distribution. The numerator in (16) is less than Pr(Tr(Z) > β_n n^a). The trace of Z follows a Gam(1/2, qd/2) distribution, which has an exponentially decaying tail. Hence the numerator is less than exp(−Cβ_n n^a) for some C > 0 when n is sufficiently large.

Now we derive a lower bound for the probability in the denominator of (16). In Mallik (2003)[15], the joint density of λ₁(Z), …, λ_d(Z) has been shown to be

f (x_{1}, \dots, x_{d}) = \frac{{(\prod_{i = 1}^{d} x_{i})}^{q - d} exp (- \sum_{i = 1}^{d} x_{i}) \prod_{1 \leq i < j \leq d} {(x_{i} - x_{j})}^{2}}{\prod_{i = 1}^{d} (d - i)! (q - i)!}, 0 < x_{1} < \dots < x_{d} < \infty .

Hence Pr(λ₁(Z) ≥ β_nλ₁) = Pr(λ_j (Z) ≥ β_nλ₁ ∀j) =

\begin{array}{r} C \int_{β_{n} λ_{1} \leq x_{1} < \dots < x_{d} < \infty} {(\prod_{i = 1}^{d} x_{i})}^{q - d} exp (- \sum_{i = 1}^{d} x_{i}) \prod_{1 \leq i < j \leq d} {(x_{i} - x_{j})}^{2} \prod_{i = 1}^{d} {d x}_{i} \\ \geq C \int_{β_{n} λ_{1} \leq x_{1} < \dots < x_{d} < \infty} exp (- 2 \sum_{i = 1}^{d} x_{i}) \prod_{1 \leq i < j \leq d} {(x_{i} - x_{j})}^{2} \prod_{i = 1}^{d} {d x}_{i} \end{array}

(17)

for n sufficiently large. Integrate (17) by parts to get Pr(λ₁(Z) ≥ β_nλ₁) ≥ exp(−Cβ_n) for appropriate C > 0 when n is sufficiently large. Hence there exists a C > 0 such that the ratio in (16) is less than exp(−Cβ_nn^a). If we pick a as in the Proposition, for any β₀ > 0, it follows that

exp (n β_{0}) π_{n} {φ^{- 1} (n^{a}, \infty)} < exp {- n (C β_{n} n^{a - 1} - β_{0})}

which converges to zero because β_nn^a⁻¹ diverges to infinity. This verifies Assumption A12 with a as in the Proposition and β₀ being any positive constant.

6.6 Proof of Theorem 5

Proof

Denote by M the unit sphere S^d and by ρ the chord distance on it. Express the vMF kernel as

K (m; μ, K) = c^{- 1} (K) exp [K {1 - ρ^{2} (m, μ) / 2}] (m, μ \in M; K \in [0, \infty)) .

Since ρ is continuous on the product space M × M and c is continuous and non-vanishing on [0, ∞), K is continuous on M × M × [0, ∞) and assumption A1 follows.

For a given continuous function f on M, m ∈ M, K ≥ 0, define

I (m, K) = f (m) - \int_{M} K (m; μ, K) f (μ) V (d μ) = \int_{M} K (m; μ, K) {f (m) - f (μ)} V (d μ) .

Then assumption A2 follows once we show that

lim_{K \to \infty} (sup_{m \in M} ∣ I (m, K) ∣) = 0.

To simplify I(m, K), make a change of coordinates μ ↦ μ̃ = U(m)^T μ, μ̃ ↦ θ ∈ Θ_d ≡ (0, π)^{d−1} × (0, 2π), where U(m) is an orthogonal matrix with first column equal to m and θ = (θ₁, …, θ_d)^T are the spherical coordinates of μ̃ ≡ μ̃(θ), which are given by

{\tilde{μ}}_{j} = cos θ_{j} \prod_{h < j} sin θ_{h}, j = 1, \dots, d, {\tilde{μ}}_{d + 1} = \prod_{j = 1}^{d} sin θ_{j} .

Using these coordinates, the volume form can be written as

V (d μ) = V (d \tilde{μ}) = {sin}^{d - 1} (θ_{1}) {sin}^{d - 2} (θ_{2}) \dots sin (θ_{d - 1}) d θ_{1} \dots d θ_{d}

and hence I(m, K) equals

\begin{array}{r} \int_{Θ_{d}} c^{- 1} (K) exp {K cos (θ_{1})} {f (m) - f (U (m) \tilde{μ})} {sin}^{d - 1} (θ_{1}) \dots sin (θ_{d - 1}) d θ_{1} \dots d θ_{d} \\ = c^{- 1} (K) \int_{Θ_{d - 1} \times (- 1, 1)} exp (K t) {f (m) - f (U (m) \tilde{μ})} {(1 - t^{2})}^{d / 2 - 1} \\ {sin}^{d - 2} (θ_{2}) \dots sin (θ_{d - 1}) d θ_{2} \dots d θ_{d} d t \end{array}

(18)

where t = cos(θ₁), μ̃ = μ̃(θ(t)) and θ(t) = (arccos(t), θ₂, …, θ_d)^T. In the integrand in (18), the distance between m and U(m)μ̃ is $\sqrt{2 (1 - t)}$ . Substitute t = 1 − K⁻¹s in the integral with s ∈ (0, 2K). Define

Φ (s, K) = sup {∣ f (m) - f (\tilde{m}) ∣ : m, \tilde{m} \in M, ρ (m, \tilde{m}) \leq \sqrt{2 K^{- 1} s}} .

Then

∣ f (m) - f (U (m) \tilde{μ}) ∣ \leq Φ (s, K) .

Since f is uniformly continuous on (M, ρ), Φ is bounded on (ℜ⁺)² and lim_{K→∞} Φ(s, K) = 0. Hence from (18), we deduce that sup_{m∈M} |I(m, K)| ≤

\begin{array}{r} c^{- 1} (K) K^{- 1} \int_{Θ_{d - 1} \times (0, 2 K)} exp (K - s) Φ (s, K) {(K^{- 1} s (2 - K^{- 1} s))}^{d / 2 - 1} \\ {sin}^{d - 2} (θ_{2}) \dots sin (θ_{d - 1}) d θ_{2} \dots d θ_{d} d s \leq \\ C K^{- d / 2} {\tilde{c}}^{- 1} (K) \int_{0}^{\infty} Φ (s, K) e^{- s} s^{d / 2 - 1} d s . \end{array}

(19)

From Lemma 3, it follows that

\underset{K \to \infty}{lim sup} K^{- d / 2} {\tilde{c}}^{- 1} (K) < \infty .

This in turn, using the Lebesgue Dominated Convergence Theorem, implies that the expression in (19) converges to 0 as K → ∞. This verifies assumption A2.

6.7 Proof of Theorem 6

In the proof, B_d(r) denotes the ball of radius r around 0 in ℜ^d:

B_{d} (r) = {x \in R^{d} : {| | x | |}_{2} \leq r}

and B_d refers to B_d(1).

Proof

It is clear from (6) and (7) that the vMF kernel K is continuously differentiable on ℜ^d⁺¹ × ℜ^d ⁺¹× [0, ∞). Hence

sup_{m \in S^{d}, K \in [0, κ]} ∣ K (m; μ, K) - K (m; ν, K) ∣ \leq sup_{m \in S^{d}, x \in B_{d + 1}, K \in [0, κ]} {‖ \frac{\partial}{\partial x} K (m; x, K) ‖}_{2} {| | μ - ν | |}_{2} .

Since

\frac{\partial}{\partial x} K (m; x, K) = K {\tilde{c}}^{- 1} (K) exp {- K (1 - m^{T} x)} m,

its norm is bounded by K c̃⁻¹(K). Lemma 3 implies that this in turn is bounded by

κ {\tilde{c}}^{- 1} (κ) \leq C κ^{d / 2 + 1}

for K ≤ κ and κ ≥ 1. This proves assumption A5 with a₁ = d/2 + 1.

To verify A6, given K₁, K₂ ≤ κ, use the inequality,

sup_{m, μ \in S^{d}} ∣ K (m; μ, K_{1}) - K (m; μ, K_{2}) ∣ \leq sup_{m, μ \in S^{d}, K \leq κ} | \frac{\partial}{\partial K} K (m; μ, K) | ∣ K_{1} - K_{2} ∣ .

By direct computations, one can show that

\begin{array}{r} \frac{\partial}{\partial K} K (m; μ, K) = - \frac{\partial}{\partial K} \tilde{c} (K) {\tilde{c}}^{- 2} (K) exp {- K (1 - m^{T} μ)} \\ - {\tilde{c}}^{- 1} (K) exp {- K (1 - m^{T} μ)} (1 - m^{T} μ), \\ \frac{\partial}{\partial K} \tilde{c} (K) = - C \int_{- 1}^{1} exp {- K (1 - t)} (1 - t) {(1 - t^{2})}^{d / 2 - 1} d t, \\ | \frac{\partial}{\partial K} \tilde{c} (K) | \leq C \tilde{c} (K) . \end{array}

Therefore, using Lemma 3,

| \frac{\partial}{\partial K} K (m; μ, K) | \leq C {\tilde{c}}^{- 1} (K) \leq C {\tilde{c}}^{- 1} (κ) \leq C κ^{d / 2}

for any K ≤ κ and κ ≥ 1. Hence A6 is verified with a₂ = d/2.

Finally to verify A8, note that S^d ⊂ B_d₊₁ ⊂ [−1, 1]^d ⁺¹ which can be covered by finitely many cubes of side length ε/(d + 1). Each such cube has L₂ diameter ε. Hence their intersections with S^d provides a finite ε-cover for this manifold. If ε < 1, such a cube intersects with S^d only if it lies entirely in B_d₊₁(1 + ε) ∩ B_d₊₁(1 − ε)^c. The number of such cubes, and hence the ε-cover size can be bounded by

C ε^{- (d + 1)} {{(1 + ε)}^{d + 1} - {(1 - ε)}^{d + 1}} \leq C ε^{- d}

for some C > 0 not depending on ε. This verifies A8 for appropriate positive constant A₃ and a₃ = d.

6.8 Proof of Lemma 3

Proof

Express c̃(K) as

C \int_{- 1}^{1} exp {- K (1 - t)} {(1 - t^{2})}^{d / 2 - 1} d t

and it is clear that it is decreasing. This expression suggests that

\begin{array}{r} \tilde{c} (K) \geq C \int_{0}^{1} exp {- K (1 - t)} {(1 - t^{2})}^{d / 2 - 1} d t \\ \geq C \int_{0}^{1} exp {- K (1 - t^{2})} {(1 - t^{2})}^{d / 2 - 1} d t \\ = C \int_{0}^{1} exp (- K u) u^{d / 2 - 1} {(1 - u)}^{- 1 / 2} d u \\ \geq C \int_{0}^{1} exp (- K u) u^{d / 2 - 1} d u \\ = C K^{- d / 2} \int_{0}^{K} exp (- v) v^{d / 2 - 1} d v \\ \geq C {\int_{0}^{1} exp (- v) v^{d / 2 - 1} d v} K^{- d / 2} \end{array}

if K ≥ 1.

6.9 Proof of Theorem 7

Proof

Express the complex Watson kernel as

K (m; μ, K) = c^{- 1} (K) exp (\frac{- K}{2} d_{E}^{2} (m, μ)) .

Given K ≥ 0, define

φ (t) = exp (\frac{- K}{2} t^{2}), t \in [0, \sqrt{2}] .

Then $∣ φ^{'} (t) ∣ \leq \sqrt{2} K$ , so that

∣ φ (t) - φ (s) ∣ \leq \sqrt{2} K ∣ s - t ∣, s, t \in [0, \sqrt{2}]

which implies that

\begin{array}{l} ∣ K (m; μ, K) - K (m; ν, K) ∣ \leq c^{- 1} (K) \sqrt{2} K ∣ d_{E} (m, μ) - d_{E} (m, ν) ∣ \\ \leq \sqrt{2} K c^{- 1} (K) d_{E} (μ, ν) . \end{array}

(20)

For K ≤ κ, from Lemma 4, it follows that

\begin{array}{l} K c^{- 1} (K) \leq κ c^{- 1} (κ) = π^{2 - k} κ^{k - 1} {\tilde{c}}^{- 1} (κ) \\ \leq π^{2 - k} κ^{k - 1} {\tilde{c}}^{- 1} (1) \end{array}

provided κ ≥ 1. Hence for any κ ≥ 1,

sup_{K \in [0, κ]} K c^{- 1} (K) \leq C κ^{k - 1}

and from inequality (20), a₁ = k − 1 follows.

By direct computation, one can show that

\begin{array}{r} \frac{\partial}{\partial K} K (m; μ, K) = π^{k - 2} exp {- \frac{1}{2} K d_{E}^{2} (m, μ) - K} \times \\ c^{- 2} (K) K^{2 - k} [\sum_{r = k - 1}^{\infty} \frac{K^{r - 1}}{r!} {k - 2 - \frac{r}{2} d_{E}^{2} (m, μ)}] . \end{array}

(21)

Denote by S the sum in the second line of (21). Since $d_{E}^{2} (m, μ) \leq 2$ , it can be shown that

∣ k - 2 - \frac{r}{2} d_{E}^{2} (m, μ) ∣ \leq \begin{cases} k - 2 & if k - 1 \leq r \leq 2 k - 4, \\ r - k + 2 & if 2 k - 3 \leq r, \end{cases}

so that

\begin{array}{l} ∣ S ∣ & \leq (k - 2) \sum_{r = k - 1}^{2 k - 4} \frac{K^{r - 1}}{r!} + \sum_{r = 2 k - 3}^{\infty} \frac{K^{r - 1}}{r!} (r - k + 2) \\ = (k - 2) K^{k - 2} \sum_{r = 0}^{k - 3} \frac{K^{r}}{(r + k - 1)!} + K^{2 k - 4} \sum_{r = 0}^{\infty} \frac{K^{r}}{(r + 2 k - 3)!} (r + k - 1) \\ \leq C K^{k - 2} e^{K} + K^{2 k - 4} e^{K} . \end{array}

Plug the above inequality in (21) to get

\begin{array}{l} | \frac{\partial}{\partial K} K (m; μ, K) | \leq {C c}^{- 2} (K) K^{2 - k} exp {- \frac{1}{2} K d_{E}^{2} (m, μ)} (C K^{k - 2} + K^{2 k - 4}) \\ \leq {C c}^{- 2} (K) (C + K^{k - 2}) . \end{array}

(22)

For K ≤ κ and κ ≥ 1, using Lemma 4, we bound the expression in (22) by

\begin{array}{r} {C c}^{- 2} (κ) (C + κ^{k - 2}) = C κ^{2 k - 6} {\tilde{c}}^{- 2} (κ) (C + κ^{k - 2}) \\ \leq C κ^{2 k - 6} {\tilde{c}}^{- 2} (1) (C + κ^{k - 2}) \leq C κ^{3 k - 8} \end{array}

(23)

for κ sufficiently large. Since the kernel is continuously differentiable in its last argument K, from (23) it follows that there exists κ₁ > 0 such that for all κ ≥ κ₁ and K₁, K₂ ≤ κ,

\begin{array}{r} sup_{m, μ \in \Sigma_{2}^{k}} ∣ K (m; μ, K_{1}) - K (m; μ, K_{2}) ∣ \leq sup_{m, μ \in \Sigma_{2}^{k}, K \in [0, κ]} | \frac{\partial}{\partial K} K (m; μ, K) | ∣ K_{1} - K_{2} ∣ \\ \leq C κ^{3 k - 8} ∣ K_{1} - K_{2} ∣ . \end{array}

This proves Assumption A6 with a₂ = 3k − 8.

6.10 Proof of Theorem 8

In the proof, C_i, i = 1, 2, … denote positive constants possibly depending on k.

Proof

The preshape sphere ℂS^{k−2}, as a real manifold, can be identified with the real unit sphere S^{2k−3}. Endow it with the chord distance induced by the L²-norm

{| | u | |}_{2} = \sqrt{\sum_{i = 1}^{k - 1} {∣ u_{i} ∣}^{2}} (u = {(u_{1}, \dots, u_{k - 1})}^{T}) .

Then from Theorem 6, it follows that given any δ > 0, ℂS^{k−2} can be covered by finitely many subsets of diameter less than or equal to δ, the number of such subsets being bounded by C₁δ^{−(2k−3)} + C₂. The extrinsic distance d_E on $\Sigma_{2}^{k}$ can be bounded by the chord distance on ℂS^{k−2} as follows. For u, v ∈ ℂS^{k−2},

\begin{array}{l} {| | u - v | |}_{2}^{2} = 2 - 2 Re (u^{*} v) \geq 2 - 2 ∣ u^{*} v ∣ = 2 (1 - ∣ u^{*} v ∣) \\ \geq (1 + ∣ u^{*} v ∣) (1 - ∣ u^{*} v ∣) = \frac{1}{2} d_{E}^{2} ([u], [v]) . \end{array}

Hence $d_{E} ([u], [v]) \leq \sqrt{2} {| | u - v | |}_{2}$ , so that given any ε > 0, the shape image of a δ-cover for ℂS^{k−2} with $δ = ε / \sqrt{2}$ provides an ε-cover for $\Sigma_{2}^{k}$ . Hence the ε-covering size for $\Sigma_{2}^{k}$ can be bounded by C₁ε^{−(2k−3)} + C₂.
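The key inequality of this proof, d_E([u], [v]) ≤ √2 ||u − v||₂ for unit complex vectors u and v, can be spot-checked by simulation. The Python sketch below draws random points on the complex unit sphere and compares the two distances; the sampling scheme, the seed and the function names are assumptions of this illustration and play no role in the proof.

```python
import math
import random


def random_unit_complex_vector(n, rng):
    """Normalize a vector of independent complex Gaussians to unit L2 norm."""
    v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def chord_distance(u, v):
    """||u - v||_2 on the preshape sphere."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(u, v)))


def extrinsic_distance_from_preshapes(u, v):
    """d_E([u], [v]) = sqrt(2 (1 - |u^* v|^2)) for unit vectors u, v."""
    inner = sum(a.conjugate() * b for a, b in zip(u, v))
    return math.sqrt(max(0.0, 2.0 * (1.0 - abs(inner) ** 2)))


def max_ratio(k, trials, seed=0):
    """Largest observed d_E / ||u - v||_2 over random pairs of preshapes in C^(k-1)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        u = random_unit_complex_vector(k - 1, rng)
        v = random_unit_complex_vector(k - 1, rng)
        worst = max(worst, extrinsic_distance_from_preshapes(u, v) / chord_distance(u, v))
    return worst


if __name__ == "__main__":
    print(max_ratio(k=5, trials=2000), "should not exceed", math.sqrt(2.0))
```

The observed ratio stays below √2, consistent with the shape image of a δ-cover of the preshape sphere being a √2δ-cover of the shape space.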

Contributor Information

Abhishek Bhattacharya, Email: abhishek@isical.ac.in, Indian Statistical Institute, 203 B.T. Road, Kolkata, W.B. 700108, India.

David B. Dunson, Email: dunson@stat.duke.edu, Department of Statistical Science, Duke University, Durham, NC 27708, U.S.A.

References

1.Barron AR. Uniformly powerful goodness of fit tests. Annals of Statistics. 1989;17:107–24. [Google Scholar]
2.Bhattacharya A, Dunson D. Nonparametric Bayesian density estimation on manifolds with applications to planar shapes. Biometrika. 2010a;97(4):851–865. doi: 10.1093/biomet/asq044. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Bhattacharya A, Dunson D. Discussion Paper. Department of Statistical Science, Duke University; 2010b. Nonparametric Bayes classification and hypothesis testing on manifolds. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Bhattacharya RN, Patrangenaru V. Large sample theory of intrinsic and extrinsic sample means on manifolds. Annals of Statistics. 2003;31:1–29. [Google Scholar]
5.Dryden IL, Mardia KV. Statistical Shape Analysis. Wiley; N.Y: 1998. [Google Scholar]
6.Escobar MD, West M. Bayesian density-estimation and inference using mixtures. Journal of the American Statistical Association. 1995;90:577–588. [Google Scholar]
7.Fisher RA. Dispersion on a sphere. Proceedings of the Royal Society of London A. 1953;1130:295–305. [Google Scholar]
8.Ghosal S, Ghosh JK, Ramamoorthi RV. Posterior consistency of dirichlet mixtures in density estimation. Annals of Statistics. 1999;27:143–158. [Google Scholar]
9.Ghosh JK, Ramamoorthi RV. Bayesian Nonparametrics. Springer; N.Y: 2003. [Google Scholar]
10.Hirsch M. Differential Topology. Springer Verlag; New York: 1976. [Google Scholar]
11.Kendall DG. Shape manifolds, procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society. 1984;16:81–121. [Google Scholar]
12.LeCam L. Convergence of estimates under dimensionality restrictions. Annals of Statistics. 1973;1:38–53. [Google Scholar]
13.Lennox KP, Dahl DB, Vannucci M, Tsai JW. Density estimation for protein conformation angles using a bivariate von Mises distribution and Bayesian nonparametrics. Journal of the American Statistical Association. 2009;104:586–596. doi: 10.1198/jasa.2009.0024. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Lo AY. On a class of Bayesian nonparametric estimates. 1. density estimates. Annals of Statistics. 1984;12:351–357. [Google Scholar]
15.Mallik RK. The pseudo-wishart distribution and its application to mimo systems. IEEE Transactions on Information Theory. 2003;49(10):2761–2769. [Google Scholar]
16.Schwartz L. On Bayes procedures. Z Wahrsch Verw Gebiete. 1965;4:10–26.
17.Sparr G. Depth-computations from polyhedral images. Proceedings of 2nd European Conference on Computer Vision, ECCV-2; 1992. pp. 378–386.
18.von Mises RV. Über die “Ganzzahligkeit” der Atomgewichte und verwandte Fragen. Physik Z. 1918;19:490–500.
19.Watson GS, Williams EJ. Construction of significance tests on the circle and sphere. Biometrika. 1953;43:344–52.
20.Wu Y, Ghosal S. Kullback-Leibler property of kernel mixture priors in Bayesian density estimation. Electronic Journal of Statistics. 2008;2:298–331.
21.Wu Y, Ghosal S. The L1-consistency of Dirichlet mixtures in multivariate Bayesian density estimation. Journal of Multivariate Analysis. 2010;101:2411–2419.
