Abstract
The Euclidean scattering transform was introduced nearly a decade ago to improve the mathematical understanding of convolutional neural networks. Inspired by recent interest in geometric deep learning, which aims to generalize convolutional neural networks to manifold and graph-structured domains, we define a geometric scattering transform on manifolds. Similar to the Euclidean scattering transform, the geometric scattering transform is based on a cascade of wavelet filters and pointwise nonlinearities. It is invariant to local isometries and stable to certain types of diffeomorphisms. Empirical results demonstrate its utility on several geometric learning tasks. Our results generalize the deformation stability and local translation invariance of Euclidean scattering, and demonstrate the importance of linking the filter structures used in the network to the underlying geometry of the data.
Keywords: geometric deep learning, wavelet scattering, spectral geometry
1. Introduction
In an effort to improve our mathematical understanding of deep convolutional networks and their learned features, Mallat (2010, 2012) introduced the scattering transform for signals defined on ℝ^d. This transform has an architecture similar to convolutional neural networks (ConvNets), based on a cascade of convolutional filters and simple pointwise nonlinearities. However, unlike many other deep learning methods, this transform uses the complex modulus as its nonlinearity and does not learn its filters from data, but instead uses designed filters. As shown in Mallat (2012), with properly chosen wavelet filters, the scattering transform is provably invariant to the actions of certain Lie groups, such as the translation group, and is also provably Lipschitz stable to small diffeomorphisms, where the size of a diffeomorphism is quantified by its deviation from a translation. These notions were applied in Bruna and Mallat (2011, 2013); Sifre and Mallat (2012, 2013, 2014); Oyallon and Mallat (2015) using groups of translations, rotations, and scaling operations, with applications in image and texture classification. Additionally, the scattering transform and its deep filter bank approach have also proven to be effective in several other fields, such as audio processing (Andén and Mallat, 2011, 2014; Wolf et al., 2014, 2015; Andén et al., 2019), medical signal processing (Chudácek et al., 2014), and quantum chemistry (Hirn et al., 2017; Eickenberg et al., 2017, 2018; Brumwell et al., 2018). Mathematical generalizations to non-wavelet filters have also been studied, including Gabor filters as in the short time Fourier transform (Czaja and Li, 2019) and more general classes of semi-discrete frames (Grohs et al., 2016; Wiatowski and Bölcskei, 2015, 2018).
However, many data sets of interest have an intrinsically non-Euclidean structure and are better modeled by graphs or manifolds. Indeed, manifold learning models (e.g., Tenenbaum et al., 2000; Coifman and Lafon, 2006a; van der Maaten and Hinton, 2008) are commonly used for representing high-dimensional data in which unsupervised algorithms infer data-driven geometries to capture intrinsic structure in data. Furthermore, signals supported on manifolds are becoming increasingly prevalent, for example, in shape matching and computer graphics. As such, a large body of work has emerged to explore the generalization of spectral and signal processing notions to manifolds (Coifman and Lafon, 2006b) and graphs (Shuman et al., 2013a, and references therein). In these settings, functions are supported on the manifold or the vertices of the graph, and the eigenfunctions of the Laplace-Beltrami operator, or the eigenvectors of the graph Laplacian, serve as the Fourier harmonics. This increasing interest in non-Euclidean data geometries has led to a new research direction known as geometric deep learning, which aims to generalize convolutional networks to graph and manifold structured data (Bronstein et al., 2017, and references therein).
Inspired by geometric deep learning, recent works have also proposed an extension of the scattering transform to graph domains. These mostly focused on finding features that represent a graph structure (given a fixed set of signals on it) while being stable to graph perturbations. In Gama et al. (2019b), a cascade of diffusion wavelets from Coifman and Maggioni (2006) was proposed, and its Lipschitz stability was shown with respect to a global diffusion-inspired distance between graphs. These results were generalized in Gama et al. (2019a) to graph wavelets constructed from more general graph shift operators. A similar construction discussed in Zou and Lerman (2019) was shown to be stable to permutations of vertex indices, and to small perturbations of edge weights. Gao et al. (2019) established the viability of graph scattering coefficients as universal graph features for data analysis tasks (e.g., in social networks and biochemistry data). The wavelets used in Gao et al. (2019) are similar to those used in Gama et al. (2019b), but are constructed from an asymmetric lazy random walk matrix (the wavelets in Gama et al. (2019b) are constructed from a symmetric matrix). The constructions of Gama et al. (2019b) and Gao et al. (2019) were unified in Perlmutter et al. (2019), which introduced a large family of graph wavelets, including both those from Gama et al. (2019b) and Gao et al. (2019) as special cases, and showed that the resulting scattering transforms enjoyed many of the same theoretical properties as in Gama et al. (2019b).
In this paper we consider the manifold aspect of geometric deep learning. There are two basic tasks in this setting: (1) classification of multiple signals over a single, fixed manifold; and (2) classification of multiple manifolds. Beyond these two tasks, there are additional problems of interest such as manifold alignment, partial manifold reconstruction, and generative models. Fundamentally for all of these tasks, both in the approach described here and in other papers, one needs to process signals over a manifold. Indeed, even in manifold classification tasks and related problems such as manifold alignment, one often begins with a set of universal features that can be defined on any manifold, and which are processed in such a way that allows for comparison of two or more manifolds. In order to carry out these tasks, a representation of manifold supported signals needs to be stable to orientations, noise, and deformations over the manifold geometry. Working towards these goals, we define a scattering transform on compact smooth Riemannian manifolds without boundary, which we call geometric scattering. Our construction is based on convolutional filters defined spectrally via the eigendecomposition of the Laplace-Beltrami operator over the manifold, as discussed in Section 2. We show that these convolutional operators can be used to construct a wavelet frame similar to the diffusion wavelets constructed in Coifman and Maggioni (2006). Then, in Section 3, we construct a cascade of these generalized convolutions and pointwise absolute value operations that is used to map signals on the manifold to scattering coefficients that encode approximate local invariance to isometries, which correspond to translations, rotations, and reflections in Euclidean space. We then show in Section 4 that our scattering coefficients are also stable to the action of diffeomorphisms with a notion of stability analogous to the Lipschitz stability considered in Mallat (2012) on Euclidean space. Our results provide a path forward for utilizing the scattering mathematical framework to analyze and understand geometric deep learning, while also shedding light on the challenges involved in such generalization to non-Euclidean domains. Indeed, while these results are analogous to those obtained for the Euclidean scattering transform, we emphasize that the underlying mathematical techniques are derived from spectral geometry, which plays no role in the Euclidean analysis. Numerical results in Section 5 show that geometric scattering coefficients achieve competitive results for signal classification on a single manifold, and classification of different manifolds. We demonstrate that the geometric scattering method can capture both local and global features to generate useful latent representations for various downstream tasks. Proofs of technical results are provided in the appendices.
1.1. Notation
Let ℳ denote a compact, smooth, connected d-dimensional Riemannian manifold without boundary contained in ℝ^n, and let L2(ℳ) denote the set of functions that are square integrable with respect to the Riemannian volume dx. Let r(x, x′) denote the geodesic distance between two points, and let Δ denote the Laplace-Beltrami operator on ℳ. Let Isom(ℳ, 𝒩) denote the set of all isometries between two manifolds ℳ and 𝒩, and set Isom(ℳ) := Isom(ℳ, ℳ) to be the isometry group of ℳ. Likewise, we set Diff(ℳ) to be the diffeomorphism group on ℳ. For ζ ∈ Diff(ℳ), we let ||ζ||∞ := sup_{x∈ℳ} r(ζ(x), x) denote its maximum displacement.
2. Geometric wavelet transforms on manifolds
The Euclidean scattering transform is constructed using wavelet and low-pass filters defined on ℝ^d. In Section 2.1, we extend the notion of convolution against a filter (wavelet, low-pass, or otherwise) to manifolds using notions from spectral geometry. Many of the notions described in this section are geometric analogues of similar constructions used in graph signal processing (Shuman et al., 2013b). Section 2.2 utilizes these constructions to define Littlewood-Paley frames for L2(ℳ), and Section 2.3 describes a specific class of Littlewood-Paley frames which we call geometric wavelets.
2.1. Convolution on manifolds
On ℝ^d, the convolution of a signal f with a filter h is defined by translating h against f; however, translations are not well-defined on generic manifolds. Nevertheless, convolution can also be characterized using the Fourier convolution theorem, i.e., the Fourier transform of f ∗ h is the product f̂ ĥ. Fourier analysis can be defined on ℳ using the spectral decomposition of −Δ. Since ℳ is compact and connected, −Δ has countably many eigenvalues, which we enumerate as 0 = λ0 < λ1 ≤ λ2 ≤ ⋯ (repeating those with multiplicity greater than one), and there exists a sequence of eigenfunctions φ0, φ1, φ2, … such that {φk}k≥0 is an orthonormal basis for L2(ℳ) and −Δφk = λkφk. While one can take each φk to be real valued, we do not assume this choice of eigenbasis. One can show that φ0 is constant, which implies, by orthogonality, that φk has mean zero for k ≥ 1. We consider the eigenfunctions {φk}k≥0 as the Fourier modes of the manifold ℳ, and define the Fourier series of f ∈ L2(ℳ) through its coefficients f̂(k) := ⟨f, φk⟩ = ∫ℳ f(x)φk(x)* dx, where * denotes complex conjugation.
Since φ0, φ1, φ2, … form an orthonormal basis, we have
f = Σ_{k≥0} f̂(k)φk.  (1)
For f, h ∈ L2(ℳ), we define the convolution ∗ over ℳ between f and h as
f ∗ h(x) := Σ_{k≥0} f̂(k)ĥ(k)φk(x).  (2)
We let Th denote the corresponding operator, Thf(x) := f ∗ h(x), and note that we may write
Thf(x) = ∫ℳ Kh(x, y)f(y) dy,
where
Kh(x, y) := Σ_{k≥0} ĥ(k)φk(x)φk(y)*.
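Concretely, on a discretized manifold the operator Th reduces to a multiplication in the eigenbasis. The helper below is a minimal numerical sketch, assuming the Laplace-Beltrami operator has already been approximated by a symmetric matrix with orthonormal eigenvectors; the toy Laplacian in the usage example is purely illustrative and is not part of the construction above.

```python
import numpy as np

def spectral_conv(f, eigvals, eigvecs, spectral_fn):
    """Apply T_h f = sum_k H(lambda_k) <f, phi_k> phi_k, cf. equations (2) and (4).

    f           : (n,) signal sampled at n vertices
    eigvals     : (K,) eigenvalues of the approximate Laplace-Beltrami operator
    eigvecs     : (n, K) matrix whose columns are orthonormal eigenvectors phi_k
    spectral_fn : the spectral function H, evaluated at the eigenvalues
    """
    f_hat = eigvecs.T @ f               # Fourier coefficients <f, phi_k>
    h_hat = spectral_fn(eigvals)        # filter response H(lambda_k)
    return eigvecs @ (h_hat * f_hat)    # synthesis back to the vertex domain

# Illustrative use, with a toy graph Laplacian standing in for -Delta.
rng = np.random.default_rng(0)
A = rng.random((50, 50))
A = (A + A.T) / 2
L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)
f = rng.standard_normal(50)
f_smooth = spectral_conv(f, eigvals, eigvecs, lambda lam: np.exp(-lam))
```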
It is well known that convolution on ℝ^d commutes with translations. This equivariance property is fundamental to Euclidean ConvNets, and has spurred the development of equivariant neural networks on other spaces, e.g., Cohen and Welling (2016); Kondor and Trivedi (2018); Thomas et al. (2018); Kondor et al. (2018a); Cohen et al. (2018); Kondor et al. (2018b); Weiler et al. (2018). Since translations are not well-defined on ℳ, we instead seek to construct a family of operators which commute with isometries. Towards this end, we say a filter h ∈ L2(ℳ) is a spectral filter if λk = λℓ implies ĥ(k) = ĥ(ℓ). In this case, there exists a function H, which we refer to as the spectral function of h, such that ĥ(k) = H(λk) for all k ≥ 0.
In the proofs of our theorems, it will be convenient to group together eigenfunctions belonging to the same eigenspace. This motivates us to define
Λ := {λk : k ≥ 0}
as the set of all eigenvalues of −Δ, and to let
K(λ)(x, y) := Σ_{k : λk=λ} φk(x)φk(y)*  (3)
for each λ ∈ Λ. We note that if h is a spectral filter, then we may write
Kh(x, y) = Σ_{λ∈Λ} H(λ)K(λ)(x, y).  (4)
For a diffeomorphism ζ ∈ Diff(ℳ), we define the operator Vζ : L2(ℳ) → L2(ℳ) as
Vζf(x) := f(ζ^{−1}(x)).
The operator Vζ deforms the function f according to the diffeomorphism ζ of the underlying manifold ℳ. The following theorem shows that Th and Vζ commute if ζ is an isometry and h is a spectral filter. We note that the assumption that h is a spectral filter is critical: in general, Th does not commute with isometries if h is not a spectral filter. The proof is in Appendix A.
Theorem 2.1
For every spectral filter h ∈ L2(ℳ), every isometry ζ ∈ Isom(ℳ), and every f ∈ L2(ℳ),
VζThf = ThVζf.
2.2. Littlewood-Paley frames over manifolds
A family of spectral filters {hγ : γ ∈ Γ} (with Γ countable) is called a Littlewood-Paley frame if it satisfies the following condition, which implies that the hγ cover the frequencies of ℳ evenly:
Σ_{γ∈Γ} |ĥγ(k)|^2 = 1 for all k ≥ 0.  (5)
We define the corresponding frame analysis operator, ℋ, by
ℋf := {f ∗ hγ : γ ∈ Γ}.
The following proposition shows that if (5) holds, then ℋ preserves the energy of f.
Proposition 2.2
If {hγ : γ ∈ Γ} satisfies (5), then ℋ is an isometry, i.e.,
Σ_{γ∈Γ} ||f ∗ hγ||2^2 = ||f||2^2 for all f ∈ L2(ℳ).
The proof of Proposition 2.2 is nearly identical to the corresponding result in the Euclidean case. For the sake of completeness, we provide full details in Appendix B. Since the operator ℋ is linear, Proposition 2.2 also shows the operator ℋ is non-expansive, i.e., ||ℋf1 − ℋf2|| ≤ ||f1 − f2||2. This property is directly related to the L2 stability of a ConvNet of the form f ↦ σLℋL ⋯ σ1ℋ1f, where the σℓ are nonlinear functions. Indeed, if all the frame analysis operators ℋℓ and all the nonlinear operators σℓ are non-expansive, then the entire network is non-expansive as well.
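As a quick numerical illustration of Proposition 2.2, the check below reuses spectral_conv, eigvals, eigvecs, and f from the sketch in Section 2.1; the three filters are an illustrative family chosen so that their squared responses sum to one exactly, not filters used elsewhere in the paper.

```python
import numpy as np

spectral_fns = [lambda lam: np.cos(lam) * np.exp(-lam),
                lambda lam: np.sin(lam) * np.exp(-lam),
                lambda lam: np.sqrt(np.maximum(0.0, 1.0 - np.exp(-2.0 * lam)))]

# Littlewood-Paley condition (5): squared responses sum to one at every k.
lp = sum(np.abs(fn(eigvals)) ** 2 for fn in spectral_fns)
assert np.allclose(lp, 1.0)

# Energy preservation: sum_gamma ||f * h_gamma||^2 equals ||f||^2.
energy = sum(np.sum(spectral_conv(f, eigvals, eigvecs, fn) ** 2)
             for fn in spectral_fns)
assert np.isclose(energy, np.sum(f ** 2))
```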
2.3. Geometric wavelet transforms on manifolds
The geometric wavelet transform is a special type of Littlewood-Paley frame analysis operator in which the filters group the frequencies of f into dyadic packets. A spectral filter ϕ is said to be a low-pass filter if ϕ̂(0) = 1 and |ϕ̂(k)| is non-increasing with respect to k. Typically, ϕ̂(k) decays rapidly as k grows large. Thus, a low-pass filtering, Tϕf := f ∗ ϕ, retains the low frequencies of f while suppressing the high frequencies. A wavelet ψ is a spectral filter such that ψ̂(0) = 0. Unlike low-pass filters, wavelets have no frequency response at k = 0, but are generally well localized in the frequency domain away from k = 0.
We shall define a family of low-pass and wavelet filters, using the difference between low-pass filters at consecutive dyadic scales, in a manner which mimics standard wavelet constructions (see, e.g., Meyer, 1993). Let G : [0, ∞) → ℝ be a non-negative, non-increasing function with G(0) = 1. Define a low-pass spectral filter ϕ and its dilation at scale 2^j for j ∈ ℤ by:
ϕ̂(k) := G(λk),  ϕ̂j(k) := G(2^j λk).
Given the dilated low-pass filters ϕj, we define our wavelet filters ψj by
ψ̂j(k) := (G(2^{j−1}λk)^2 − G(2^j λk)^2)^{1/2}.  (6)
For J ∈ ℤ and f ∈ L2(ℳ), we let AJf := f ∗ ϕJ and Ψjf := f ∗ ψj. We then define the geometric wavelet transform WJ as
WJf := {AJf} ∪ {Ψjf : j ≤ J}.
The geometric wavelet transform extracts the low frequency, slow transitions of f over ℳ through AJf, and groups the high frequency, sharp transitions of f over ℳ into different dyadic frequency bands via the collection {Ψjf : j ≤ J}. The following proposition can be proved by observing that {ϕJ, ψj : j ≤ J} forms a Littlewood-Paley frame and applying Proposition 2.2. The proof is nearly identical to the corresponding result in the Euclidean case; however, we provide full details in Appendix C in order to help keep this paper self-contained.
Proposition 2.3
For any J ∈ ℤ, WJ is an isometry, i.e.,
||AJf||2^2 + Σ_{j≤J} ||Ψjf||2^2 = ||f||2^2 for all f ∈ L2(ℳ).
An important example is G(λ) = e^{−λ}. In this case the low-pass kernel KϕJ is the heat kernel on ℳ at diffusion time t = 2^J, and the wavelet operators Ψj are similar to the diffusion wavelets introduced in Coifman and Maggioni (2006). We also note that wavelet constructions similar to ours were used in Hammond et al. (2011) and Dong (2017). Figure 1 depicts these wavelets over manifolds from the FAUST data set (Bogo et al., 2014). Unlike many wavelets commonly used in computer vision, our wavelets are not directional. Indeed, on a generic manifold the concept of directional wavelets is not well-defined since the isometry group cannot be decomposed into translations, rotations, and reflections. Instead, our wavelets have a donut-like shape which is somewhat similar to the wavelet obtained by applying the Laplacian operator on ℝ^d to a d-dimensional Gaussian.
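For G(λ) = e^{−λ}, the construction above can be implemented in a few lines on top of the spectral_conv conventions from Section 2.1. This is a hedged sketch: the lower scale cutoff j_min is an illustrative truncation (in exact arithmetic the frame requires all j ≤ J), and the eigenvectors are assumed orthonormal.

```python
import numpy as np

def geometric_wavelet_transform(f, eigvals, eigvecs, J, j_min=-8):
    """Return A_J f and {Psi_j f : j_min <= j <= J}, built from G(lam) = exp(-lam)."""
    G = lambda lam: np.exp(-lam)
    f_hat = eigvecs.T @ f
    A_J = eigvecs @ (G(2.0 ** J * eigvals) * f_hat)       # low-pass part A_J f
    Psi = {}
    for j in range(j_min, J + 1):
        # equation (6): psi_j^(k)^2 = G(2^{j-1} lam_k)^2 - G(2^j lam_k)^2
        psi_hat = np.sqrt(np.maximum(
            0.0, G(2.0 ** (j - 1) * eigvals) ** 2 - G(2.0 ** j * eigvals) ** 2))
        Psi[j] = eigvecs @ (psi_hat * f_hat)              # wavelet coefficients Psi_j f
    return A_J, Psi
```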
3. The geometric wavelet scattering transform
The geometric wavelet scattering transform is a type of geometric ConvNet, constructed in a manner analogous to the Euclidean scattering transform (Mallat, 2012) as an alternating cascade of geometric wavelet transforms (defined in Section 2.3) and nonlinearities. As we shall show in Sections 3.3, 3.4, and 4, this transformation enjoys several desirable properties for processing data consisting of signals defined on a fixed manifold ℳ, in addition to tasks in which each data point is a different manifold and one is required to compare and classify manifolds. Tasks of the latter form are approachable due to the use of geometric wavelets that are derived from a universal frequency function that is defined independent of ℳ. Motivation for these invariance and stability properties is given in Section 3.1, and the geometric wavelet scattering transform is defined in Section 3.2. We note that much of our analysis remains valid when our wavelets are replaced with a general Littlewood-Paley frame. However, we will focus on the wavelet case for the ease of exposition and to emphasize the connections between the manifold scattering transform and its Euclidean analogue.
3.1. The role of invariance and stability
Invariance and stability play a fundamental role in many machine learning tasks, particularly in computer vision. For classification and regression, one often wants to consider two signals f1, f2 ∈ L2(ℳ), or two manifolds ℳ and 𝒩, to be equivalent if they differ by the action of a global isometry. Similarly, it is desirable that the action of small diffeomorphisms on f ∈ L2(ℳ), or on the underlying manifold ℳ, should not have a large impact on the representation of the input signal.
Thus, we seek to construct a family of representations, (Θt)t∈(0,∞), which are invariant to isometric transformations up to the scale t. In the case of analyzing multiple signals on a fixed manifold, such a representation should satisfy a condition of the form:
||Θt(Vζf) − Θt(f)|| ≤ α(ζ)β(t)||f||2 for all ζ ∈ Isom(ℳ),  (7)
where α(ζ) measures the size of the isometry with α(id) = 0, and β(t) decreases to zero as the scale t grows to infinity. For diffeomorphisms, invariance is too strong of a property since we are often interested in non-isometric differences between signals on a fixed manifold, or geometric differences between multiple manifolds, not just topological differences, e.g., we often wish to classify a doughnut differently than a coffee mug, even though they are both topologically a 2-torus. Instead, we want a family of representations that is stable to diffeomorphism actions, but not invariant. Combining this requirement with the isometry invariance condition (7) leads us to seek, for the case of a fixed manifold ℳ, a condition of the form:
||Θt(Vζf) − Θt(f)|| ≤ (α(ζ)β(t) + A(ζ))||f||2 for all ζ ∈ Diff(ℳ),  (8)
where A(ζ) measures how much ζ differs from being an isometry, with A(ζ) = 0 if ζ ∈ Isom(ℳ) and A(ζ) > 0 if ζ ∉ Isom(ℳ). We also develop analogous conditions for isometries in the case of multiple manifolds in Section 3.4.
At the same time, the representations (Θt)t∈(0,∞) should not be trivial. Different classes or types of signals are often distinguished by their high-frequency content, i.e., f̂(k) for large k. The same may also be true of two manifolds ℳ and 𝒩, in that differences between them are only readily apparent when comparing the high frequency eigenfunctions of their respective Laplace-Beltrami operators. Our problem is thus to find a family of representations for data defined on a manifold that is stable to diffeomorphisms, allows one to control the scale of isometric invariance, and discriminates between different types of signals, in both high and low frequencies. The wavelet scattering transform of Mallat (2012) achieves goals analogous to the ones presented here, but for Euclidean supported signals. We seek to construct a geometric version of the scattering transform, using filters corresponding to the spectral geometry of ℳ, and to show it has similar properties.
3.2. Defining the geometric wavelet scattering transform
The geometric wavelet scattering transform is a nonlinear operator constructed through an alternating cascade of geometric wavelet transforms WJ and nonlinearities. Its construction is motivated by the desire to obtain localized isometry invariance and stability to diffeomorphisms, as formulated in Section 3.1.
A simple way to obtain a locally isometry invariant representation of a signal f ∈ L2(ℳ) is to apply the low-pass averaging operator AJ. If G(λ) ≤ e^{−λ}, then one can use Theorem 2.1 to show that
||Vζf ∗ ϕJ − f ∗ ϕJ||2 ≤ C(ℳ) 2^{−dJ} ||ζ||∞ ||f||2.  (9)
In other words, the L2 difference between f ∗ ϕJ and Vζf ∗ ϕJ for a unit energy signal f (i.e., ||f||2 = 1) is no more than the size of the isometry ||ζ||∞ attenuated by a factor of 2^{dJ}, up to some universal constant that depends only on ℳ. Thus, the parameter J controls the degree of invariance.
However, by definition ϕ̂J(k) = G(2^J λk) decays rapidly as k increases, and so if f̂(k) ≠ 0 for large k, we see the high-frequency content of f is lost in the representation AJf. The high frequencies of f are recovered with the wavelet coefficients {Ψjf = f ∗ ψj : j ≤ J}, which are guaranteed to capture the remaining frequency content of f. However, the wavelet coefficients Ψjf are not isometry invariant and thus do not satisfy any bound analogous to (9). If we apply the averaging operator in addition to the wavelet coefficient operator, we obtain:
AJΨjf = f ∗ ψj ∗ ϕJ,
but by design the sequences (ϕ̂J(k))k≥0 and (ψ̂j(k))k≥0 have small overlapping support in order to satisfy the Littlewood-Paley condition (5), particularly in their largest responses, and thus f ∗ ψj ∗ ϕJ ≈ 0. In order to obtain a non-trivial invariant that also retains some of the high-frequency information in the signal f, we need to apply a nonlinear operator. Because it is non-expansive and commutes with isometries, we choose the absolute value (complex modulus) function as our non-linearity, and let
UJ[j]f := |Ψjf| = |f ∗ ψj|.
We then convolve the UJ[j]f with the low-pass filter ϕJ to obtain locally invariant descriptions of f, which we refer to as the first-order scattering coefficients:
SJ[j]f := |f ∗ ψj| ∗ ϕJ, j ≤ J.  (10)
The collection of all such coefficients can be written as
AJUJf := {AJUJ[j]f : j ≤ J},
where
UJf := {UJ[j]f = |f ∗ ψj| : j ≤ J}.
These coefficients satisfy a local invariance bound similar to (9), but encode multiscale characteristics of f over the manifold geometry, which are not contained in AJf. Nevertheless, the geometric scattering representation still loses information contained in the signal f. Indeed, even with the absolute value, the functions |f ∗ ψj| have frequency information not captured by the low-pass ϕJ. Iterating the geometric wavelet transform WJ recovers this information by computing WJUJ[j1]f = {AJ|f ∗ ψj1|, Ψj2|f ∗ ψj1| : j2 ≤ J}, which contains the first order invariants (10) but also retains the high frequencies of |f ∗ ψj1|. We then obtain second-order geometric wavelet scattering coefficients given by
SJ[j1, j2]f := ||f ∗ ψj1| ∗ ψj2| ∗ ϕJ, j1, j2 ≤ J,
the collection of which can be written as {SJ[j1, j2]f : j1, j2 ≤ J}. The corresponding geometric scattering transform up to order m = 2 computes {AJf, SJ[j1]f, SJ[j1, j2]f : j1, j2 ≤ J}, which can be thought of as a three-layer geometric ConvNet that extracts invariant representations of the input signal at each layer. Second order coefficients, in particular, decompose the interference patterns in |f ∗ ψj1| into dyadic frequency bands via a second wavelet transform. This second order transform has the effect of coupling two scales 2^{j1} and 2^{j2} over the geometry of the manifold ℳ.
The general geometric scattering transform iterates the wavelet transform and absolute value (complex modulus) operators up to an arbitrary depth. Formally, for f ∈ L2(ℳ) and j1, …, jm ≤ J we let
UJ[j1, …, jm]f := |||f ∗ ψj1| ∗ ψj2| ⋯ ∗ ψjm|
when m ≥ 1, and we let UJ[∅]f = f when m = 0. Likewise, we define
SJ[j1, …, jm]f := UJ[j1, …, jm]f ∗ ϕJ
and let SJ[∅]f = f ∗ ϕJ. We then consider the maps UJ and SJ given by
UJf := {UJ[j1, …, jm]f : m ≥ 0, j1, …, jm ≤ J}
and
SJf := {SJ[j1, …, jm]f : m ≥ 0, j1, …, jm ≤ J}.
An illustration of the map SJ, which we refer to as the geometric scattering transform at scale 2^J, is given by Figure 2. In practice, one only uses finitely many layers, which motivates us to also consider the L-layer versions of UJ and SJ defined for L ≥ 0 by
UJ^{(L)}f := {UJ[j1, …, jm]f : 0 ≤ m ≤ L, j1, …, jm ≤ J}
and
SJ^{(L)}f := {SJ[j1, …, jm]f : 0 ≤ m ≤ L, j1, …, jm ≤ J}.
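Putting the pieces together, the following is a hedged sketch of the L-layer transform SJ^{(L)} in the conventions of the earlier code sketches. The dictionary-based path enumeration and the scale truncation j_min are implementation assumptions, not part of the definition above.

```python
import numpy as np

def geometric_scattering(f, eigvals, eigvecs, J, L, j_min=-8):
    """Compute {S_J[j1,...,jm] f : 0 <= m <= L}, keyed by the path (j1, ..., jm)."""
    G = lambda lam: np.exp(-lam)
    def conv(sig, spec_vals):                       # spectral convolution, eq. (2)
        return eigvecs @ (spec_vals * (eigvecs.T @ sig))
    phi_J = G(2.0 ** J * eigvals)
    psi = {j: np.sqrt(np.maximum(0.0, G(2.0 ** (j - 1) * eigvals) ** 2
                                      - G(2.0 ** j * eigvals) ** 2))
           for j in range(j_min, J + 1)}
    coeffs = {(): conv(f, phi_J)}                   # zeroth order: S_J[empty] f
    layer = {(): f}                                 # U_J[p] f for paths of length m
    for _ in range(L):
        layer = {p + (j,): np.abs(conv(u, psi[j]))  # U_J[p, j] f = |U_J[p] f * psi_j|
                 for p, u in layer.items() for j in psi}
        coeffs.update({p: conv(u, phi_J) for p, u in layer.items()})
    return coeffs
```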
The invariance properties of SJ and SJ^{(L)} are described in Sections 3.3 and 3.4, whereas their diffeomorphism stability properties are described in Section 4. The following proposition shows that both SJ and SJ^{(L)} are non-expansive.
Proposition 3.1
Both the finite-layer and infinite-layer geometric wavelet scattering transforms are non-expansive. Specifically, for all f1, f2 ∈ L2(ℳ) and all L ≥ 0,
||SJ^{(L)}f1 − SJ^{(L)}f2|| ≤ ||SJf1 − SJf2|| ≤ ||f1 − f2||2.
The first inequality is trivial. The proof of the second inequality is nearly identical to Mallat (2012, Proposition 2.5), and is thus omitted.
3.3. Isometric invariance
The geometric wavelet scattering transform is invariant to the action of the isometry group Isom(ℳ) on the input signal f up to a factor that depends upon the frequency decay of the low-pass spectral filter ϕJ. In particular, the following theorem establishes isometric invariance up to the scale 2^J under the assumption that G(λ) ≤ e^{−λ}. The proof of Theorem 3.2 is given in Appendix D.
Theorem 3.2
Let ζ ∈ Isom(ℳ) and suppose G(λ) ≤ e^{−λ}. Then there is a constant C(ℳ) such that for all f ∈ L2(ℳ),
||SJVζf − SJf|| ≤ C(ℳ) 2^{−dJ} ||ζ||∞ ||UJf||2,2  (11)
and
||SJ^{(L)}Vζf − SJ^{(L)}f|| ≤ C(ℳ) 2^{−dJ} (L + 1)^{1/2} ||ζ||∞ ||f||2.  (12)
The factor ||UJf||2,2 for an infinite depth network is hard to bound in terms of ||f||2, which is also true for the Euclidean scattering transform (Mallat, 2012). However, for finite depth networks, a simple argument shows that ||UJ^{(L)}f||2,2 ≤ (L + 1)^{1/2}||f||2, which yields (12).
For manifold classification (or any task requiring rigid invariance), we take J → ∞. This limit is equivalent to replacing the low-pass operator AJ with an integration over ℳ, since for any f ∈ L2(ℳ),
lim_{J→∞} f ∗ ϕJ(x) = (1 / vol(ℳ)) ∫ℳ f(y) dy for all x ∈ ℳ.  (13)
Equation (13) motivates the definition of a non-windowed geometric scattering transform,
S̄f := {S̄[j1, …, jm]f : m ≥ 0, j1, …, jm ∈ ℤ},  S̄[j1, …, jm]f := ∫ℳ U[j1, …, jm]f(x) dx,
where U[j1, …, jm] is defined as UJ[j1, …, jm] in Section 3.2, but with the scales ji now ranging over all of ℤ. We also define S̄^{(L)}f as the L-layer version of S̄f, analogous to SJ^{(L)}f defined in Section 3.2. Unlike SJf, which consists of a countable collection of functions, S̄f consists of a countable collection of scalar values. Theorem 3.2 and (13) show that these values are invariant to global isometries acting on f. The following proposition shows they form a sequence in ℓ2. We give a proof in Appendix E.
Theorem 3.3
If f ∈ L2(ℳ), then S̄^{(L)}f ∈ ℓ2 when L < ∞.
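In a discretization, the integrals defining S̄^{(L)}f become quadrature sums over mesh vertices. The sketch below reuses the spectral conventions of the earlier code; the uniform quadrature weights and the scale range are assumptions appropriate only for near-uniform meshes.

```python
import numpy as np

def nonwindowed_scattering(f, eigvals, eigvecs, L, scales=range(-8, 1), weights=None):
    """S-bar[p] f = integral of U[p] f over the manifold, for all paths |p| <= L."""
    w = np.full(f.shape[0], 1.0 / f.shape[0]) if weights is None else weights
    G = lambda lam: np.exp(-lam)
    psi = {j: np.sqrt(np.maximum(0.0, G(2.0 ** (j - 1) * eigvals) ** 2
                                      - G(2.0 ** j * eigvals) ** 2))
           for j in scales}
    feats, layer = {(): w @ f}, {(): f}
    for _ in range(L):
        layer = {p + (j,): np.abs(eigvecs @ (psi[j] * (eigvecs.T @ u)))
                 for p, u in layer.items() for j in psi}
        feats.update({p: w @ u for p, u in layer.items()})
    return feats  # dictionary of isometry-invariant scalar features, one per path
```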
3.4. Isometric invariance between different manifolds
Let ℳ and 𝒩 be isometric manifolds. For shape matching tasks in which ℳ and 𝒩 should be identified as the same shape, it is appropriate to let J → ∞ and, inspired by (13), use the representation S̄f to carry out the computation; see Section 5.2 for numerical results along these lines. In such tasks, one selects a signal f that is defined intrinsically in terms of the geometry of ℳ, i.e., one that is chosen in such a way that given f on ℳ, one can compute a corresponding signal f′ = Vζf on 𝒩 without explicit knowledge of ζ ∈ Isom(ℳ, 𝒩). For example, if ℳ is a two-dimensional surface embedded in ℝ³, and 𝒩 is a three-dimensional rotation of ℳ by ζ ∈ SO(3), then the coordinate function is such a function. Indeed, let x = (x1, x2, x3) ∈ ℳ and suppose f(x) = xi is the coordinate function on ℳ for a fixed coordinate 1 ≤ i ≤ 3. Then the corresponding coordinate function on 𝒩 is given by f′ = Vζf. More sophisticated examples include the SHOT features of Tombari et al. (2010); Prakhya et al. (2015). The following proposition shows that the geometric scattering transform produces a representation that is invariant to isometries ζ ∈ Isom(ℳ, 𝒩). We give a proof in Appendix F.
Proposition 3.4
Let f ∈ L2(ℳ), let ζ ∈ Isom(ℳ, 𝒩), and let f′ := Vζf be the corresponding signal defined on 𝒩. Then S̄′f′ = S̄f, where S̄′ denotes the non-windowed scattering transform on 𝒩.
In other tasks, one may wish to have local isometric invariance between ℳ and 𝒩. We thus extend Theorem 3.2 in the following way. If ζ ∈ Isom(ℳ, 𝒩), then the operator Vζ maps L2(ℳ) into L2(𝒩). We wish to estimate how much (SJ)′ Vζf differs from SJf, where (SJ)′ denotes the geometric wavelet scattering transform on 𝒩. However, the difference SJf − (SJ)′ Vζf is not well-defined since SJf is a countable collection of functions defined on ℳ and (SJ)′ Vζf is a collection of functions defined on 𝒩. Therefore, in Theorem 3.5 we let ζ2 be a second isometry from ℳ to 𝒩 and estimate ||(SJ)′Vζf − Vζ2SJf||; see Appendix G for the proof.
Theorem 3.5
Let ζ, ζ2 ∈ Isom(ℳ, 𝒩) and assume that G(λ) ≤ e^{−λ}. Then there is a constant C(ℳ) such that
||(SJ)′Vζf − Vζ2SJf|| ≤ C(ℳ) 2^{−dJ} ||ζ2^{−1} ∘ ζ||∞ ||UJf||2,2  (14)
and
||(SJ^{(L)})′Vζf − Vζ2SJ^{(L)}f|| ≤ C(ℳ) 2^{−dJ} (L + 1)^{1/2} ||ζ2^{−1} ∘ ζ||∞ ||f||2.  (15)
It is worthwhile to contrast Proposition 3.4 and Theorem 3.5 with Theorem 3.2 stated in Section 3.3. As mentioned in the introduction, two of the basic tasks we consider are the classification of multiple signals over a single, fixed manifold and the classification of multiple manifolds. Since Theorem 3.2 considers an isometry ζ ∈ Isom(ℳ), it shows that the manifold scattering transform is well-suited for the former task. Proposition 3.4 and Theorem 3.5, on the other hand, assume that ζ is an isometry from one manifold ℳ to another manifold 𝒩, and therefore indicate that the manifold scattering transform is well-suited to the latter task as well.
4. Stability to Diffeomorphisms
In this section we show the scattering transform is stable to the action of diffeomorphisms on a signal f ∈ L2(ℳ). In Section 4.1, we show that when restricted to bandlimited functions, the geometric scattering transform is stable to diffeomorphisms. In Section 4.2, we show that under certain assumptions on ℳ, spectral filters are stable to diffeomorphisms (even if f is not bandlimited). As a consequence, it follows that finite width (i.e., a finite number of wavelets per layer) scattering networks are stable to diffeomorphisms on these manifolds.
4.1. Stability for bandlimited functions
Analogously to the Lipschitz diffeomorphism stability in Mallat (2012, Section 2.5), we wish to show the geometric scattering coefficients are stable to diffeomorphisms that are close to being an isometry. Similarly to Wiatowski and Bölcskei (2015); Czaja and Li (2019), we will assume the input signal f is λ-bandlimited for some λ > 0; that is, f̂(k) = 0 whenever λk > λ.
Theorem 4.1
Let λ > 0, and assume G(λ) ≤ e^{−λ}. Then there is a constant C(ℳ) such that if ζ = ζ1 ∘ ζ2 for some isometry ζ1 ∈ Isom(ℳ) and diffeomorphism ζ2 ∈ Diff(ℳ), then
(16)
and
(17)
for all functions f ∈ L2(ℳ) such that f̂(k) = 0 whenever λk > λ.
Theorem 4.1 achieves the goal set forth by (8), with two exceptions: (i) we restrict to bandlimited functions; and (ii) the infinite depth network has the term ||UJf||2,2 in the upper bound. We leave the vast majority of the work in resolving these issues to future work, although Section 4.2 takes some initial steps in resolving (i). We also leave for future work the case of quantifying the analogous stability for two diffeomorphic manifolds ℳ and 𝒩. When ζ is an isometry, Theorem 4.1 reduces to Theorem 3.2, since in this case we may choose ζ1 = ζ, ζ2 = id and note that ||id||∞ = 0. For a general diffeomorphism ζ, taking the infimum of ||ζ2||∞ over all factorizations ζ = ζ1 ∘ ζ2 leads to a bound where the first term depends on the scale of the isometric invariance and the second term depends on the distance from ζ to the isometry group in the uniform norm. Letting J → ∞, we may also prove an analogous theorem for the non-windowed scattering transform.
Theorem 4.2
Let λ > 0, and assume G(λ) ≤ e^{−λ}. Then there is a constant C(ℳ) such that if ζ = ζ1 ∘ ζ2 for some isometry ζ1 ∈ Isom(ℳ) and diffeomorphism ζ2 ∈ Diff(ℳ), then
(18)
for all functions f ∈ L2(ℳ) such that f̂(k) = 0 whenever λk > λ.
The proofs of Theorems 4.1 and 4.2 are given in Appendix H.
4.2. Single-Filter Stability
Theorems 4.1 and 4.2 prove diffeomorphism stability for the geometric wavelet scattering transform, but their proof techniques rely on f being bandlimited. In this section we discuss a possible approach to proving a stability result for all f ∈ L2(ℳ).
As stated in Theorem 2.1, spectral integral operators are equivariant to the action of isometries. This fact is crucial to proving Theorem 3.2 because it allows us to estimate
||VζThf − Thf||2  (19)
instead of
||ThVζf − Thf||2.  (20)
In Mallat (2012), it is shown that the Euclidean scattering transform SEuc is stable to the action of certain diffeomorphisms which are close to being translations. A key step in the proof is a bound on the commutator norm ||[SEuc,Vζ]||, which then allows the author to bound a quantity analogous to (19) instead of bounding (20) directly. This motivates us to study the commutator of spectral integral operators with Vζ for diffeomorphisms which are close to being isometries.
For technical reasons, we will assume that ℳ is two-point homogeneous, that is, for any two pairs of points, (x1, x2), (y1, y2) such that r(x1, x2) = r(y1, y2), there exists an isometry ζ ∈ Isom(ℳ) such that ζ(x1) = y1 and ζ(x2) = y2. In order to quantify how far a diffeomorphism differs from being an isometry we will consider two quantities:
A1(ζ) := sup_{x,y∈ℳ} |r(ζ(x), ζ(y)) − r(x, y)|  (21)
and
A2(ζ) := sup_{x∈ℳ} |1 − |det Dζ(x)||.  (22)
Intuitively, A1 is a measure of how much ζ distorts distances, and A2 is a measure of how much ζ distorts volumes. We let A(ζ) = max{A1(ζ),A2(ζ)} and note that if ζ is an isometry, then A(ζ) = 0. We remark that A(ζ) defined here differs from the notion of diffeomorphism size used in Theorems 4.1 and 4.2. It is an interesting research direction to understand the differences between these formulations, and to understand more generally which definitions of diffeomorphism size geometric deep networks are stable to. The following theorem, which is proved in Appendix I, bounds the operator norm of [Th, Vζ] in terms of A(ζ) and a quantity depending upon h.
Theorem 4.3
Assume that ℳ is two-point homogeneous, and let h be a spectral filter. Then there exists a constant C(ℳ) such that for any diffeomorphism ζ ∈ Diff(ℳ),
||[Th, Vζ]|| ≤ C(ℳ) A(ζ) B(h),
where B(h) is a constant depending on the spectral function of h, made explicit in the proof in Appendix I.
Theorem 4.3 leads to the following corollary, which we prove in Appendix J.
Corollary 4.4
Assume that ℳ is two-point homogeneous and that G(λ) ≤ e^{−λ}. Then
where
In practice, the wavelet transform is implemented using finitely many wavelets. By the triangle inequality, Corollary 4.4 leads to a commutator estimate for the finite wavelet transform. Therefore, by the arguments used in the proof of Theorem 2.12 in Mallat (2012), it follows that the geometric scattering transform is stable to diffeomorphisms on two-point homogeneous manifolds when implemented with finitely many wavelets at each layer. We do note, however, that B(ψj) increases exponentially as j decreases to −∞. Therefore, this argument only applies to a finite-wavelet implementation of the geometric scattering transform. Mallat (2012) overcomes this difficulty using an almost orthogonality argument. In the future, one might seek to adapt these techniques to the manifold setting. However, there are numerous technical difficulties which are not present in the Euclidean setting.
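Although the constants in Theorem 4.3 and Corollary 4.4 are not explicit, the behavior they describe is easy to probe numerically. Below is an illustrative sketch on the unit circle, which is two-point homogeneous; the diffeomorphism ζ(θ) = θ + ε sin θ, the heat filter H(λ) = e^{−λ}, and the Monte Carlo estimate of the operator norm are all assumptions of this experiment, not part of the theory above.

```python
import numpy as np

n = 512
theta = 2 * np.pi * np.arange(n) / n
lam = np.fft.fftfreq(n, d=1.0 / n) ** 2           # eigenvalues k^2 on the circle

def T_h(f):                                        # spectral filter H(lam) = exp(-lam)
    return np.real(np.fft.ifft(np.exp(-lam) * np.fft.fft(f)))

def V_zeta(f, eps):                                # V_zeta f = f o zeta^{-1}
    zeta = theta + eps * np.sin(theta)             # monotone for |eps| < 1
    zeta_inv = np.interp(theta, zeta, theta, period=2 * np.pi)
    return np.interp(zeta_inv, theta, f, period=2 * np.pi)

rng = np.random.default_rng(1)
for eps in [0.0, 0.05, 0.1, 0.2]:                  # eps ~ distance from an isometry
    norms = []
    for _ in range(20):                            # crude operator-norm estimate
        f = rng.standard_normal(n)
        comm = T_h(V_zeta(f, eps)) - V_zeta(T_h(f), eps)
        norms.append(np.linalg.norm(comm) / np.linalg.norm(f))
    print(eps, max(norms))                         # grows roughly linearly with eps
```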
5. Numerical results
In this section, we describe two numerical experiments to illustrate the utility of the geometric wavelet scattering transform. We consider both traditional geometric learning tasks, in which we compare to other geometric deep learning methods, as well as limited training tasks in which the unsupervised nature of the transform is particularly useful. In the former set of tasks, empirical results are not state-of-the-art, but they show that the geometric scattering model is a good mathematical model for geometric deep learning. Specifically, in Section 5.1 we classify signals, corresponding to digits, on a fixed manifold, the two-dimensional sphere. Then, in Section 5.2 we classify different manifolds which correspond to ten different people whose bodies are positioned in ten different ways. The back-end classifier for all experiments is an RBF kernel SVM.
In order to carry out our numerical experiments, it was necessary to discretize our manifolds and represent them as graphs. We use triangle meshes for all manifolds in this paper, which allows us to approximate the Laplace-Beltrami operator and integration on each manifold via the approach described in Solomon et al. (2014). We emphasize that this approximation of the Laplace-Beltrami operator is not the standard graph Laplacian of the triangular mesh, and thus the discretized geometric scattering transform is not the same as the versions of the graph scattering transform reported in Zou and Lerman (2019); Gama et al. (2019b,a); Gao et al. (2019).
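For readers who wish to reproduce this pipeline, the following is a stand-in sketch of a common cotangent/lumped-mass discretization of the Laplace-Beltrami operator on a triangle mesh. It is not identical to the scheme of Solomon et al. (2014) that we actually use; `verts` (an n×3 float array) and `faces` (an m×3 integer array) are assumed inputs.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def mesh_laplacian_eigs(verts, faces, n_eigs=128):
    """Eigenpairs of the generalized problem W phi = lam M phi on a triangle mesh."""
    n = len(verts)
    W = sp.lil_matrix((n, n))
    mass = np.zeros(n)
    for tri in faces:
        for c in range(3):                       # cotangent weight of each angle
            i, j, k = tri[c], tri[(c + 1) % 3], tri[(c + 2) % 3]
            e1, e2 = verts[j] - verts[i], verts[k] - verts[i]
            cot = (e1 @ e2) / np.linalg.norm(np.cross(e1, e2))
            W[j, k] -= 0.5 * cot
            W[k, j] -= 0.5 * cot
            W[j, j] += 0.5 * cot
            W[k, k] += 0.5 * cot
        area = 0.5 * np.linalg.norm(np.cross(verts[tri[1]] - verts[tri[0]],
                                             verts[tri[2]] - verts[tri[0]]))
        mass[tri] += area / 3.0                  # lumped (barycentric) mass matrix
    eigvals, eigvecs = spla.eigsh(W.tocsc(), k=n_eigs, M=sp.diags(mass).tocsc(),
                                  sigma=-1e-5)   # smallest eigenvalues first
    return eigvals, eigvecs
```

Note that the returned eigenvectors are orthonormal with respect to the mass matrix rather than the identity, so in the spectral sketches of Sections 2 and 3 the analysis step `eigvecs.T @ f` should become the mass-weighted inner product `eigvecs.T @ (M @ f)`.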
5.1. Spherical MNIST
In the first experiment, we project the MNIST dataset from Euclidean space onto a two-dimensional sphere using a triangle mesh with 642 vertices. During the projection, we generate two datasets consisting of not rotated (NR) and randomly rotated (R) digits. Using the NR spherical MNIST dataset, we first investigate in Figure 3(a) the power of the globally invariant wavelet scattering coefficients for different network depths L and with J → ∞, which is equivalent to using the representation S̄f defined in Section 3.3. Here f is the projection of the digit onto the sphere. We observe increasing accuracy but with diminishing returns across the range 0 ≤ L ≤ 3. Then on both the NR and R spherical MNIST datasets, we calculate the geometric scattering coefficients for J = −2 and L = 2. Other values of J are also reported in Appendix K, in addition to further details on how the spherical MNIST classification experiments were conducted. From Theorem 4.1, we know the scattering transform is stable to randomly generated rotations, and Figure 3(b) shows the scattering coefficients capture enough rotational information to correctly classify the digits.
5.2. FAUST
The FAUST dataset (Bogo et al., 2014) contains ten poses from ten people resulting in a total of 100 manifolds represented by triangle meshes. We first consider the problem of classifying poses. This task requires globally invariant features, and thus we compute the globally invariant geometric wavelet scattering transform of Section 3.3. Following the common practice of other geometric deep learning methods (see, e.g., Litany et al., 2017; Lim et al., 2019), we use 352 SHOT features (Tombari et al., 2010; Prakhya et al., 2015) as initial node features f. We used 5-fold cross validation for the classification tests with nested cross validation to tune the hyper-parameters of the RBF kernel SVM, as well as the network depth L. We remark that tuning the network depth of the geometric scattering transform is relatively simple as compared to fully learned geometric deep networks, since the filters are predefined geometric wavelets. This is particularly important for smaller data sets such as FAUST where there is a limited amount of training data. As indicated in Table 1, we achieve 95% overall accuracy using the geometric scattering features, compared to 92% accuracy achieved using only the integrals of SHOT features (i.e., restricting to L = 0). We note that Masci et al. (2015) also considered pose classification, but the authors used a different training/test split (50% for training and 50% for testing in a leave-one-out fashion), so our results are not directly comparable.
As a second task, we attempt to classify the people. This task is even more challenging than classifying the poses since some of the people are very similar to each other. We again performed 5-fold cross-validation, with each fold containing two poses from each person to ensure the folds are evenly distributed. As shown in Table 1, we achieved 76% accuracy on this task compared to the 61% accuracy using only integrals of SHOT features. In order to further emphasize the difference between the discretized geometric scattering transform and the graph scattering transform, we also attempted this task using the graph scattering transform derived from the graph Laplacian of the manifold mesh, applied to the SHOT features of each manifold. For this approach, which is representative of the aforementioned graph scattering papers, we obtained 58% accuracy. This result is similar to the 61% accuracy obtained by the baseline SHOT feature approach and empirically indicates the importance of encoding geometric information into the scattering transform for manifold-based tasks. More details regarding both tasks are in Appendix K.
Table 1: Classification accuracy on the FAUST dataset.

Task/Model | SHOT only | Geometric scattering
---|---|---
Pose classification | 0.92 | 0.95
Person classification | 0.61 | 0.76
6. Conclusion
We have constructed a geometric version of the scattering transform on a large class of Riemannian manifolds and shown this transform is non-expansive, invariant to isometries, and stable to diffeomorphisms. Our construction uses the spectral decomposition of the Laplace-Beltrami operator to construct a class of spectral filtering operators that generalize convolution on Euclidean space. While our numerical examples demonstrate geometric scattering on two-dimensional manifolds, our theory remains valid for manifolds of any dimension d, and therefore can be naturally extended and applied to higher-dimensional manifolds in future work. Finally, our construction provides a mathematical framework that enables future analysis and understanding of geometric deep learning.
Acknowledgments
This research was partially funded by: grant P42 ES004911, through the National Institute of Environmental Health Sciences of the NIH, supporting F.G.; IVADO (l’institut de valorisation des données) [G.W.]; the Alfred P. Sloan Fellowship (grant FG-2016-6607), the DARPA Young Faculty Award (grant D16AP00117), the NSF CAREER award (grant 1845856), and NSF grant 1620216 [M.H.]; NIH grant R01GM135929 [M.H. & G.W]. The content provided here is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.
Appendix A. Proof of Theorem 2.1
Theorem 2.1
For every spectral filter h ∈ L2(ℳ), every isometry ζ ∈ Isom(ℳ), and every f ∈ L2(ℳ),
VζThf = ThVζf.
We will prove a result that generalizes Theorem 2.1 to isometries between different manifolds. This more general result will be needed to prove Theorem 3.5.
Before stating our more general result, we introduce some notation. Let ℳ and 𝒩 be smooth, compact, connected Riemannian manifolds without boundary, and let ζ ∈ Isom(ℳ, 𝒩) be an isometry. Since ℳ and 𝒩 are isometric, their Laplace-Beltrami operators Δ and Δ′ have the same eigenvalues, and we enumerate the eigenvalues of −Δ (and also of −Δ′) in increasing order (repeating those with multiplicity greater than one) as 0 = λ0 < λ1 ≤ λ2 ≤ … Recall that if h ∈ L2(ℳ) is a spectral filter, then by definition, there exists a function H such that
ĥ(k) = H(λk) for all k ≥ 0,
and that
Kh(x, y) = Σ_{λ∈Λ} H(λ)K(λ)(x, y).
Therefore, we may define an operator T′h on L2(𝒩), which we consider the analogue of Th, as integration against the kernel
K′h(x, y) := Σ_{λ∈Λ} H(λ)K′(λ)(x, y), where K′(λ)(x, y) := Σ_{k : λk=λ} φ′k(x)φ′k(y)*,
and where {φ′k}k≥0 is an orthonormal basis of eigenfunctions on 𝒩 with −Δ′φ′k = λkφ′k. With this notation, we may now state a generalized version of Theorem 2.1 (to recover Theorem 2.1, we set 𝒩 = ℳ).
Theorem A.1
Let ζ ∈ Isom(ℳ, 𝒩) be an isometry. Then for every spectral filter h and every f ∈ L2(ℳ),
T′hVζf = VζThf.
Proof
For λ ∈ Λ, let πλ be the operator which projects a function onto the corresponding eigenspace Eλ, and let π′λ be the analogous operator defined on 𝒩. Since {φk : λk = λ} forms an orthonormal basis for Eλ, we may write πλ as integration against the kernel K(λ)(x, y) defined in (3), i.e.,
πλf(x) = ∫ℳ K(λ)(x, y)f(y) dy.
Recalling from (4) that
Kh(x, y) = Σ_{λ∈Λ} H(λ)K(λ)(x, y),
we see that
Th = Σ_{λ∈Λ} H(λ)πλ,
and likewise,
T′h = Σ_{λ∈Λ} H(λ)π′λ.
Therefore, by the linearity of Vζ, it suffices to show that
π′λVζf = Vζπλf
for all f ∈ L2(ℳ) and all λ ∈ Λ. Let f ∈ L2(ℳ) and write
f = fλ + fλ⊥,
where fλ := πλf and fλ⊥ := f − πλf. Since ζ is an isometry, we have Vζfλ ∈ E′λ and Vζfλ⊥ ⊥ E′λ, where E′λ denotes the eigenspace of −Δ′ corresponding to λ.
Therefore,
π′λVζf = Vζfλ = Vζπλf,
as desired. ■
Appendix B. Proof of Proposition 2.2
Proposition 2.2
If {hγ : γ ∈ Γ} satisfies (5), then ℋ is an isometry, i.e.,
Σ_{γ∈Γ} ||f ∗ hγ||2^2 = ||f||2^2 for all f ∈ L2(ℳ).
Proof
Analogously to Parseval's theorem, it follows from the Fourier inversion formula (1) and the fact that {φk}k≥0 is an orthonormal basis, that
||f||2^2 = Σ_{k≥0} |f̂(k)|^2.
Similarly, it follows from (2) that
||f ∗ hγ||2^2 = Σ_{k≥0} |f̂(k)|^2 |ĥγ(k)|^2.
Therefore, using the Littlewood-Paley condition (5), we see
Σ_{γ∈Γ} ||f ∗ hγ||2^2 = Σ_{k≥0} |f̂(k)|^2 Σ_{γ∈Γ} |ĥγ(k)|^2 = Σ_{k≥0} |f̂(k)|^2 = ||f||2^2.
■
Appendix C. Proof of Proposition 2.3
Proposition 2.3
For any J ∈ ℤ, WJ is an isometry, i.e.,
||AJf||2^2 + Σ_{j≤J} ||Ψjf||2^2 = ||f||2^2 for all f ∈ L2(ℳ).
Proof
We will show that the frame {ϕJ, ψj : j ≤ J} satisfies the Littlewood-Paley condition (5), i.e., that
|ϕ̂J(k)|^2 + Σ_{j≤J} |ψ̂j(k)|^2 = 1 for all k ≥ 0.
The result will then follow from Proposition 2.2. Recall that ϕJ is defined by ϕ̂J(k) = G(2^J λk) for some non-negative, non-increasing function G such that G(0) = 1. Therefore, from (6), we see that
|ψ̂j(k)|^2 = G(2^{j−1}λk)^2 − G(2^j λk)^2,
and so, by telescoping,
Σ_{j≤J} |ψ̂j(k)|^2 = lim_{j→−∞} G(2^j λk)^2 − G(2^J λk)^2 = G(0)^2 − |ϕ̂J(k)|^2 = 1 − |ϕ̂J(k)|^2.
■
Appendix D. The Proof of Theorem 3.2
Theorem 3.2
Let ζ ∈ Isom(ℳ) and suppose G(λ) ≤ e^{−λ}. Then there is a constant C(ℳ) such that for all f ∈ L2(ℳ),
||SJVζf − SJf|| ≤ C(ℳ) 2^{−dJ} ||ζ||∞ ||UJf||2,2  (11)
and
||SJ^{(L)}Vζf − SJ^{(L)}f|| ≤ C(ℳ) 2^{−dJ} (L + 1)^{1/2} ||ζ||∞ ||f||2.  (12)
The proof of Theorem 3.2 relies on the following two lemmas.
Lemma D.1
There exists a constant C(ℳ) such that for every spectral filter h and for every ζ ∈ Isom(ℳ),
Moreover, if G(λ) ≤ e^{−λ}, then there exists a constant C(ℳ) such that for any J ∈ ℤ,
Lemma D.2
For any f ∈ L2(ℳ) and any L ≥ 0, ||UJ^{(L)}f||2,2 ≤ (L + 1)^{1/2}||f||2.
Proof
[The Proof of Theorem 3.2] Theorem 2.1 proves that spectral filter convolution operators commute with isometries. Since the absolute value operator does as well, it follows that UJ[p]Vζf = VζUJ[p]f for every path p = (j1, …, jm), and therefore
Since SJ[p] = AJUJ[p], we see that
(23)
Since AJ = TϕJ and G(λ) ≤ e^{−λ}, Lemma D.1 shows that
Equation (12) follows from Lemma D.2, and (11) follows by letting L increase to infinity in (23). Therefore, the proof is complete pending the proof of Lemmas D.1 and D.2. ■
Proof
[The Proof of Lemma D.2] Let
Then, by construction,
(24)
where we adopt the convention that . Since the wavelet transform and the absolute value operator are both non-expansive, it follows that is non-expansive as well. Therefore, since , we see
Therefore, (24) implies
as desired. ■
In order to prove Lemma D.1, we will first prove the following lemma.
Lemma D.3
For λ ∈ Λ, let K(λ) be the kernel defined as in (3), and let m(λ) be the multiplicity of λ. Then, there exists a constant C(ℳ) such that
(25)
As a consequence, if Kh is a spectral kernel, then
(26)
Furthermore, if ℳ is homogeneous, i.e., if for all x, y ∈ ℳ there exists an isometry ζ ∈ Isom(ℳ) mapping x to y, then
(27)
and thus,
(28)
Proof
For any k such that λk = λ, it is a consequence of Hörmander’s local Weyl law (Hörmander (1968); see also Shi and Xu (2010)) that
Theorem 1 of Shi and Xu (2010) shows that
Therefore,
This implies (25). Now, if we assume that ℳ is homogeneous, then Theorem 3.2 of Evariste (1975) shows that
Substituting this into the above string of inequalities yields (27). Equations (26) and (28) follow by recalling from (4) that
Kh(x, y) = Σ_{λ∈Λ} H(λ)K(λ)(x, y),
and applying the triangle inequality. ■
Now we may prove Lemma D.1.
Proof [The Proof of Lemma D.1] Let Kh be the kernel of Th. Then by the Cauchy-Schwartz inequality and the fact that ,
It follows that
(29)
Lemma D.3 shows
and therefore
Now suppose that G(λ) ≤ e−λ. Theorem 2.4 of Bérard et al. (1994) proves that for any , α ≥ 0, and t > 0,
(30)
Integrating both sides over ℳ yields:
(31)
Using the assumption that G(λ) ≤ e^{−λ}, together with (29) and (31) with α = d/2 and t = 2^J, we see
■
Appendix E. The Proof of Theorem 3.3
Theorem 3.3
If f ∈ L2(ℳ), then S̄^{(L)}f ∈ ℓ2 when L < ∞.
Proof
Let p = ∅ or p = (j1, …, jm) denote a scattering path, and let
𝒫 := {∅} ∪ {(j1, …, jm) : ji ∈ ℤ, m ≥ 1}  (32)
denote the set of all scattering paths. Denote by the set of all paths with scales no larger than J,
Using (13), Fatou's Lemma implies that for any p ∈ 𝒫,
(33)
We now integrate both sides of (33) over ℳ, and apply Fatou's Lemma and Tonelli's Theorem to obtain
where the last inequality follows from Proposition 3.1 by setting f1 = f and f2 = 0. ■
Appendix F. The Proof of Proposition 3.4
Proposition 3.4
Let f ∈ L2(ℳ), let ζ ∈ Isom(ℳ, 𝒩), and let f′ := Vζf be the corresponding signal defined on 𝒩. Then S̄′f′ = S̄f.
Proof
Recall the set 𝒫 of all scattering paths from (32). We need to prove S̄′[p]f′ = S̄[p]f for all p ∈ 𝒫. If p = ∅ then ∫𝒩 f′ dx′ = ∫ℳ f dx since ζ is an isometry. Theorem A.1, stated in Appendix A, proves that T′hVζf = VζThf for any spectral filter h, where T′h is the analogue of Th on 𝒩 (defined precisely in Appendix A). Since the modulus operator also commutes with isometries, it follows that U′[p]Vζf = VζU[p]f for any p ∈ 𝒫. Thus, since ζ is an isometry,
S̄′[p]f′ = ∫𝒩 U′[p]Vζf(x′) dx′ = ∫ℳ U[p]f(x) dx = S̄[p]f.
■
Appendix G. The Proof of Theorem 3.5
Theorem 3.5
Let ζ, ζ2 ∈ Isom(ℳ, 𝒩) and assume that G(λ) ≤ e^{−λ}. Then there is a constant C(ℳ) such that
||(SJ)′Vζf − Vζ2SJf|| ≤ C(ℳ) 2^{−dJ} ||ζ2^{−1} ∘ ζ||∞ ||UJf||2,2  (14)
and
||(SJ^{(L)})′Vζf − Vζ2SJ^{(L)}f|| ≤ C(ℳ) 2^{−dJ} (L + 1)^{1/2} ||ζ2^{−1} ∘ ζ||∞ ||f||2.  (15)
Proof
As in the proof of Theorem 3.2, we observe that since spectral filter convolution operators and the absolute value operator both commute with isometries, it follows that (SJ)′Vζf = VζSJf.
Therefore,
||(SJ^{(L)})′Vζf − Vζ2SJ^{(L)}f|| = ||VζSJ^{(L)}f − Vζ2SJ^{(L)}f|| = ||SJ^{(L)}Vξf − SJ^{(L)}f||, where ξ := ζ2^{−1} ∘ ζ ∈ Isom(ℳ).
Equation (15) now follows by applying (12). The proof of (14) is similar and follows by applying (11). ■
Appendix H. The Proof of Theorems 4.1 and 4.2
Theorem 4.1
Let λ > 0, and assume G(λ) ≤ e^{−λ}. Then there is a constant C(ℳ) such that if ζ = ζ1 ∘ ζ2 for some isometry ζ1 ∈ Isom(ℳ) and diffeomorphism ζ2 ∈ Diff(ℳ), then
(16)
and
(17)
for all functions f ∈ L2(ℳ) such that f̂(k) = 0 whenever λk > λ.
Theorem 4.2
Let λ > 0, and assume G(λ) ≤ e^{−λ}. Then there is a constant C(ℳ) such that if ζ = ζ1 ∘ ζ2 for some isometry ζ1 ∈ Isom(ℳ) and diffeomorphism ζ2 ∈ Diff(ℳ), then
(18)
for all functions f ∈ L2(ℳ) such that f̂(k) = 0 whenever λk > λ.
In order to prove Theorem 4.1, we will need the following lemma.
Lemma H.1
If f ∈ L2(ℳ) is λ-bandlimited, i.e., ⟨f, φk⟩ = 0 whenever λk > λ, then there exists a constant C(ℳ) such that
for all ζ ∈ Diff(ℳ).
Proof
As in the proof of Theorem A.1, let Λ denote the set of unique eigenvalues of −Δ, and let πλ be the operator that projects a function onto the eigenspace Eλ. Let
Pλ := Σ_{λ′∈Λ : λ′≤λ} πλ′
be the operator which projects a function onto all eigenspaces with eigenvalues less than or equal to λ. Note that Pλ can be written as integration against the kernel
Σ_{λ′∈Λ : λ′≤λ} K(λ′)(x, y),
where K(λ) is defined as in (3). If f is any λ-bandlimited function in L2(ℳ), then Pλf = f, and so similarly to the proof of Lemma D.1, we see that
which implies
Lemma D.3 shows that for all λ′ ∈ Λ,
Therefore,
where N(λ) is the number of eigenvalues less than or equal to λ. Weyl's law (see for example Ivrii (2016)) implies that
N(λ) ≤ C(ℳ)(1 + λ)^{d/2},
and so
■
Proof [The Proof of Theorem 4.1]
Let ζ = ζ2 ∘ ζ1 be a factorization of ζ such that ζ1 is an isometry and ζ2 is a diffeomorphism. Then since Vζf = f ∘ ζ−1, we see that Vζ = Vζ2Vζ1. Therefore, for all λ-bandlimited functions f
By (12), we have that
and by Proposition 3.1 and Lemma H.1 we see
Since ζ1 is an isometry, we observe that ||Vζ1f||2 = ||f||2. Combining this with the two inequalities above completes the proof of (17). The proof of (16) is similar, but uses (11) instead of (12). ■
Proof [The Proof of Theorem 4.2]
Repeating the proof of Proposition 3.4 we see that S̄Vζ1f = S̄f for any isometry ζ1 ∈ Isom(ℳ).
Therefore, the result follows from Theorem 4.1 by taking J → ∞ on the right hand side of (16). ■
Appendix I. The Proof of Theorem 4.3
Theorem 4.3 Assume that ℳ is two-point homogeneous, and let h be a spectral filter. Then there exists a constant C(ℳ) such that for any diffeomorphism ζ ∈ Diff(ℳ),
||[Th, Vζ]|| ≤ C(ℳ) A(ζ) B(h),
where B(h) is a constant depending on the spectral function of h, made explicit below.
In order to prove Theorem 4.3, we will need an auxiliary result, which provides a commutator estimate for operators with radial kernels. We will say that a kernel operator
Tf(x) = ∫ℳ K(x, y)f(y) dy
is radial if
K(x, y) = κ(r(x, y))
for some κ : [0, ∞) → ℂ. The following theorem establishes a commutator estimate for operators with radial kernels.
Theorem I.1
Let T be a kernel integral operator with a radial kernel K(x, y) = κ(r(x, y)) for some κ ∈ C1([0, ∞)). Then there exist constants C1(ℳ) and C2(ℳ) such that
Here A1(ζ) and A2(ζ) are defined as in (21) and (22) respectively,
and
Proof We first compute
Therefore, by the Cauchy–Schwarz inequality,
We may bound the first integral by observing
To bound the second integral, observe that by the mean value theorem and the assumption that K is radial, we have
Lastly, since K(x, y) = κ(r(x, y)), we see that
which completes the proof. ■
Proof
[The Proof of Theorem 4.3] We write T = Th and K = Kh. If ℳ is two-point homogeneous and r(x, y) = r(x′, y′), then by the definition of two-point homogeneity there exists an isometry mapping x ⟼ x′ and y ⟼ y′. Therefore, we may use the proof of Theorem 2.1 to see that K(x′, y′) = K(x, y). It follows that K(x, y) is radial and so we may write K(x, y) = κ(r(x, y)) for some κ ∈ C1.
Applying Theorem I.1, we see that
Lemma D.3 implies that
and since {φk}k≥0 forms an orthonormal basis for L2(ℳ), it can be checked that
Therefore, the proof is complete since
and
■
Appendix J. The Proof of Corollary 4.4
Corollary 4.4
Assume that ℳ is two-point homogeneous and that G(λ) ≤ e^{−λ}. Then
where
Proof
By the definition of ψj and the assumption that G(λ) ≤ e^{−λ}, we have that ψ̂j(0) = 0 and
for k ≥ 1. Therefore, by Theorem 4.3,
where
Equation (30) implies that
and that
thus completing the proof. ■
Appendix K. Additional details of classification experiments
K.1. Spherical MNIST
For all spherical MNIST experiments, including those reported in Section 5.1, we used the following procedure. Since the digits six and nine are impossible to distinguish in spherical MNIST, we removed the digit six from the dataset. The mesh on the sphere consisted of 642 vertices, and to construct the wavelets on the sphere, all 642 eigenvalues and eigenfunctions of the approximate Laplace-Beltrami operator were used. For the range of scales we chose −8 ≤ j ≤ min(0, J). Training and testing were conducted using the standard MNIST training set of 60,000 digits and the standard testing set of 10,000 digits, projected onto the sphere. The training set was randomly divided into five folds, of which four were used to train the RBF kernel SVM, taking as input the relevant geometric scattering representation of each spherically projected digit, and one was used to validate the hyper-parameters of the RBF kernel SVM (see Appendix K.3).
On both the non-rotated and randomly rotated spherical MNIST datasets, we calculated the geometric scattering coefficients and downsampled the resulting scattering coefficient functions (e.g., f ∗ ϕJ(x) and |f ∗ ψj| ∗ ϕJ(x)). For J → ∞ we selected one coefficient since they are all the same (recall (13)). With J = 0, we selected 4 coefficients per function; with J = −1, we selected 16 coefficients; with J = −2, we selected 64 coefficients. The selected coefficients were determined by finding 4, 16, and 64 nearly equidistant points x on the sphere. Classification results on the test set for a fixed network depth of L = 2 and for the different values of J are reported in Table 2 below. Figure 3(a) reports classification results for J → ∞ and for 0 ≤ L ≤ 3.
Table 2: Spherical MNIST classification accuracy for network depth L = 2 and different values of J, on the non-rotated (NR) and randomly rotated (R) datasets.

J | NR | R
---|---|---
J → ∞ | 0.91 | 0.91
J = 0 | 0.94 | 0.94
J = −1 | 0.95 | 0.95
J = −2 | 0.95 | 0.95
K.2. FAUST
The FAUST dataset (Bogo et al., 2014) consists of 100 manifolds corresponding to ten distinct people in ten distinct poses. Each manifold is approximated by a mesh with 6890 vertices. We used the 512 smallest eigenvalues and corresponding eigenfunctions to construct the geometric wavelets. During cross validation, in addition to cross validating the SVM parameters (see Section K.3 below), we also cross validated the depth of the scattering network for 0 ≤ L ≤ 2. For the classification tests, we performed 5-fold cross validation with a training/validation/test split of 70%/10%/20% for both pose classification and person classification. The range of j is chosen as −11 ≤ j ≤ 0.
We report the frequency of each network depth L selected during the hyperparameter cross validation stage. Since there are five test folds and eight validation folds, the depth is selected 40 times per task. For pose classification, L = 0 was selected 19 times, L = 1 was selected 11 times, and L = 2 was selected 10 times. For person classification, L = 0 was selected 5 times, L = 1 was selected 29 times, and L = 2 was selected 6 times. The results indicate the importance of avoiding overfitting with needlessly deep scattering networks, while at the same time highlighting the task dependent nature of the network depth (compare as well to the MNIST results reported above and in the main text).
K.3. Parameters for RBF kernel SVM
We used an RBF kernel SVM for all classification tasks and cross validated the hyperparameters. In the two FAUST classification tasks of Section 5.2, for the kernel width γ we chose from {0.001, 0.005, 0.01, 0.02, 0.04}, while for the penalty C we chose from {50, 100, 250, 400, 500}. For the spherical MNIST classification task of Section 5.1, for γ we chose from {0.00001, 0.0001, 0.001} and for C we chose from {25, 100, 250, 500}.
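For completeness, the following is a minimal sketch of this hyperparameter search using scikit-learn; the variables `features` and `labels` are placeholders for the scattering representations and class labels, and the grid shown is the FAUST one from above.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

param_grid = {
    "gamma": [0.001, 0.005, 0.01, 0.02, 0.04],  # FAUST kernel widths
    "C": [50, 100, 250, 400, 500],              # FAUST penalty values
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                      cv=StratifiedKFold(n_splits=5), n_jobs=-1)
# search.fit(features, labels)
# print(search.best_params_, search.best_score_)
```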
Contributor Information
Michael Perlmutter, Michigan State University, Department of Computational Mathematics, Science & Engineering, East Lansing, Michigan, USA.
Feng Gao, Yale University, Department of Genetics, New Haven, Connecticut, USA.
Guy Wolf, Université de Montréal, Department of Mathematics and Statistics, Mila – Quebec Artificial Intelligence Institute, Montréal, Québec, Canada.
Matthew Hirn, Michigan State University, Department of Computational Mathematics, Science & Engineering, Department of Mathematics, Center for Quantum Computing, Science & Engineering, East Lansing, Michigan, USA.
References
- Joakim Andén and Stéphane Mallat. Multiscale scattering for audio classification. In Proceedings of the ISMIR 2011 conference, pages 657–662, 2011.
- Joakim Andén and Stéphane Mallat. Deep scattering spectrum. IEEE Transactions on Signal Processing, 62(16):4114–4128, August 2014.
- Joakim Andén, Vincent Lostanlen, and Stéphane Mallat. Joint time-frequency scattering. IEEE Transactions on Signal Processing, 67(14):3704–3718, 2019.
- Pierre Bérard, Gérard Besson, and Sylvain Gallot. Embedding Riemannian manifolds by their heat kernel. Geometric and Functional Analysis, 4(4):373–398, 1994.
- Federica Bogo, Javier Romero, Matthew Loper, and Michael J. Black. FAUST: Dataset and evaluation for 3D mesh registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Piscataway, NJ, USA, 2014.
- Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
- Xavier Brumwell, Paul Sinz, Kwang Jin Kim, Yue Qi, and Matthew Hirn. Steerable wavelet scattering for 3D atomic systems with application to Li-Si energy prediction. In NeurIPS Workshop on Machine Learning for Molecules and Materials, 2018. arXiv:1812.02320.
- Joan Bruna and Stéphane Mallat. Classification with scattering operators. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1561–1566, 2011.
- Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1872–1886, August 2013.
- Xu Chen, Xiuyuan Cheng, and Stéphane Mallat. Unsupervised deep Haar scattering on graphs. In Advances in Neural Information Processing Systems 27, pages 1709–1717, 2014.
- Václav Chudácek, Ronen Talmon, Joakim Andén, Stéphane Mallat, Ronald R. Coifman, Patrice Abry, and Muriel Doret. Low dimensional manifold embedding for scattering coefficients of intrapartum fetale heart rate variability. In the 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 6373–6376, 2014.
- Taco Cohen and Max Welling. Group equivariant convolutional networks. In Proceedings of the 33rd International Conference on Machine Learning (ICML), volume 48 of Proceedings of Machine Learning Research, pages 2990–2999, 2016.
- Taco S. Cohen, Mario Geiger, Jonas Koehler, and Max Welling. Spherical CNNs. In Proceedings of the 6th International Conference on Learning Representations, 2018.
- Ronald R. Coifman and Stéphane Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21:5–30, 2006a.
- Ronald R. Coifman and Stéphane Lafon. Geometric harmonics: A novel tool for multiscale out-of-sample extension of empirical functions. Applied and Computational Harmonic Analysis, 21(1):31–52, July 2006b.
- Ronald R. Coifman and Mauro Maggioni. Diffusion wavelets. Applied and Computational Harmonic Analysis, 21(1):53–94, 2006.
- Wojciech Czaja and Weilin Li. Analysis of time-frequency scattering transforms. Applied and Computational Harmonic Analysis, 47(1):149–171, 2019.
- Bin Dong. Sparse representation on graphs by tight wavelet frames and applications. Applied and Computational Harmonic Analysis, 42(3):452–479, 2017.
- Michael Eickenberg, Georgios Exarchakis, Matthew Hirn, and Stéphane Mallat. Solid harmonic wavelet scattering: Predicting quantum molecular energy from invariant descriptors of 3D electronic densities. In Advances in Neural Information Processing Systems 30 (NIPS 2017), pages 6540–6549, 2017.
- Michael Eickenberg, Georgios Exarchakis, Matthew Hirn, Stéphane Mallat, and Louis Thiry. Solid harmonic wavelet scattering for predictions of molecule properties. Journal of Chemical Physics, 148:241732, 2018.
- Evarist Giné M. The addition formula for the eigenfunctions of the Laplacian. Advances in Mathematics, 18(1):102–107, 1975.
- Fernando Gama, Joan Bruna, and Alejandro Ribeiro. Stability of graph scattering transforms. In Advances in Neural Information Processing Systems 32, 2019a.
- Fernando Gama, Alejandro Ribeiro, and Joan Bruna. Diffusion scattering transforms on graphs. In International Conference on Learning Representations, 2019b.
- Feng Gao, Guy Wolf, and Matthew Hirn. Geometric scattering for graph data analysis. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 2122–2131, Long Beach, California, USA, 2019.
- Philipp Grohs, Thomas Wiatowski, and Helmut Bölcskei. Deep convolutional neural networks on cartoon functions. In IEEE International Symposium on Information Theory, pages 1163–1167, 2016.
- David K. Hammond, Pierre Vandergheynst, and Rémi Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30:129–150, 2011.
- Matthew Hirn, Stéphane Mallat, and Nicolas Poilvert. Wavelet scattering regression of quantum chemical energies. Multiscale Modeling and Simulation, 15(2):827–863, 2017.
- Lars Hörmander. The spectral function of an elliptic operator. Acta Mathematica, 121:193–218, 1968.
- Victor Ivrii. 100 years of Weyl's law. Bulletin of Mathematical Sciences, 6(3):379–452, 2016.
- Chiyu Max Jiang, Jingwei Huang, Karthik Kashinath, Prabhat, Philip Marcus, and Matthias Niessner. Spherical CNNs on unstructured grids. In International Conference on Learning Representations, 2019.
- Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. In Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine Learning Research, pages 2747–2755, Stockholmsmässan, Stockholm, Sweden, 2018.
- Risi Kondor, Zhen Lin, and Shubhendu Trivedi. Clebsch-Gordan nets: a fully Fourier space spherical convolutional neural network. In Advances in Neural Information Processing Systems 31, pages 10117–10126, 2018a.
- Risi Kondor, Hy Truong Son, Horace Pan, Brandon Anderson, and Shubhendu Trivedi. Covariant compositional networks for learning graphs. In International Conference on Learning Representations (ICLR) Workshop, 2018b. arXiv:1801.02144.
- Isaak Lim, Alexander Dielen, Marcel Campen, and Leif Kobbelt. A simple approach to intrinsic correspondence learning on unstructured 3D meshes. In Laura Leal-Taixé and Stefan Roth, editors, Computer Vision – ECCV 2018 Workshops, pages 349–362, Cham, 2019. Springer International Publishing. ISBN 978-3-030-11015-4.
- Or Litany, Tal Remez, Emanuele Rodolà, Alex Bronstein, and Michael Bronstein. Deep functional maps: Structured prediction for dense shape correspondence. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 5660–5668, October 2017.
- Stéphane Mallat. Recursive interferometric representations. In 18th European Signal Processing Conference (EUSIPCO-2010), Aalborg, Denmark, 2010.
- Stéphane Mallat. Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10):1331–1398, October 2012.
- Jonathan Masci, Davide Boscaini, Michael M. Bronstein, and Pierre Vandergheynst. ShapeNet: Convolutional neural networks on non-Euclidean manifolds. CoRR, arXiv:1501.06297, 2015.
- Yves Meyer. Wavelets and Operators, volume 1. Cambridge University Press, 1993.
- Edouard Oyallon and Stéphane Mallat. Deep roto-translation scattering for object classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. arXiv:1412.8659.
- Michael Perlmutter, Feng Gao, Guy Wolf, and Matthew Hirn. Understanding graph neural networks with asymmetric geometric scattering transforms. arXiv:1911.06253, 2019.
- Sai Manoj Prakhya, Bingbing Liu, and Weisi Lin. B-SHOT: A binary feature descriptor for fast and efficient keypoint matching on 3D point clouds. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1929–1934, 2015.
- Yiqian Shi and Bin Xu. Gradient estimate of an eigenfunction on a compact Riemannian manifold without boundary. Annals of Global Analysis and Geometry, 38:21–26, 2010.
- David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013a.
- David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs. IEEE Signal Processing Magazine, pages 83–98, May 2013b.
- Laurent Sifre and Stéphane Mallat. Combined scattering for rotation invariant texture analysis. In Proceedings of the ESANN 2012 conference, 2012.
- Laurent Sifre and Stéphane Mallat. Rotation, scaling and deformation invariant scattering for texture discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013.
- Laurent Sifre and Stéphane Mallat. Rigid-motion scattering for texture classification. arXiv:1403.1687, 2014.
- Justin Solomon, Keenan Crane, and Etienne Vouga. Laplace-Beltrami: The Swiss army knife of geometry processing. 12th Symposium on Geometry Processing Tutorial, 2014.
- Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
- Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv:1802.08219, 2018.
- Federico Tombari, Samuele Salti, and Luigi Di Stefano. Unique signatures of histograms for local surface description. In European Conference on Computer Vision, pages 356–369. Springer, 2010.
- Laurens van der Maaten and Geoffrey Hinton. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
- Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco Cohen. 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. In Advances in Neural Information Processing Systems 31, pages 10381–10392, 2018.
- Thomas Wiatowski and Helmut Bölcskei. Deep convolutional neural networks based on semi-discrete frames. In Proceedings of the IEEE International Symposium on Information Theory, pages 1212–1216, 2015.
- Thomas Wiatowski and Helmut Bölcskei. A mathematical theory of deep convolutional neural networks for feature extraction. IEEE Transactions on Information Theory, 64(3):1845–1866, 2018.
- Guy Wolf, Stéphane Mallat, and Shihab A. Shamma. Audio source separation with time-frequency velocities. In 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Reims, France, 2014.
- Guy Wolf, Stéphane Mallat, and Shihab A. Shamma. Rigid motion model for audio source separation. IEEE Transactions on Signal Processing, 64(7):1822–1831, 2015.
- Dongmian Zou and Gilad Lerman. Graph convolutional neural networks via scattering. Applied and Computational Harmonic Analysis, 2019.