Shape Analysis of Functional Data with Elastic Partial Matching

Darshan Bryner; Anuj Srivastava

doi:10.1109/TPAMI.2021.3130535

Author manuscript; available in PMC: 2023 Dec 1.

Published in final edited form as: IEEE Trans Pattern Anal Mach Intell. 2022 Nov 7;44(12):9589–9602. doi: 10.1109/TPAMI.2021.3130535


PMCID: PMC9714315 NIHMSID: NIHMS1848507 PMID: 34818189

Abstract

Elastic Riemannian metrics have been used successfully for statistical treatments of functional and curve shape data. However, this usage suffers from a significant restriction: the function boundaries are assumed to be fixed and matched. In practice, functional data often comes with unmatched boundaries. It happens, for example, in dynamical systems with variable evolution rates, such as COVID-19 infection rate curves associated with different geographical regions. Here, we develop a Riemannian framework that allows for partial matching, comparing, and clustering of functions with phase variability and uncertain boundaries. We extend past work by (1) Defining a new diffeomorphism group G over the positive reals that is the semidirect product of a time-warping group and a time-scaling group; (2) Introducing a metric that is invariant to the action of G; (3) Imposing a Riemannian Lie group structure on G to allow for an efficient gradient-based optimization for elastic partial matching; and (4) Presenting a modification that, while losing the metric property, allows one to control the amount of boundary disparity in the registration. We illustrate this framework by registering and clustering shapes of COVID-19 rate curves, identifying basic patterns, minimizing mismatch errors, and reducing variability within clusters compared to previous methods.

Keywords: Functional data analysis, elastic partial matching, phase variability, COVID-19 rates, elastic Riemannian metric

1. Introduction

The field of functional data analysis (FDA) has seen tremendous growth and activity over the last few years [10], [12], [16], [23], [27]. This phenomenal interest in FDA stems in part from our growing ability to record, store, and transmit data streams that are indexed over near-continuous times. A number of books and papers, including the ones mentioned above, have highlighted clear benefits of analyzing these data streams as continuous functions rather than as discrete time-series. In FDA, one treats observed functions as elements of a function space, endows this space with a metric structure, and uses the geometry of this metric space to perform statistical analyses. The main challenges in the analysis of functional data come from the infinite dimensionality of function spaces, the contamination of data due to noise, and the presence of phase variability within the data. We highlight the last, and arguably most important, of these issues next.

Data with Phase Variability:

Functional data displays an interesting phenomenon that makes it unique from the perspective of statistical analysis. Real-life functional data often comes with phase variability, i.e., functions are often observed with perturbations or warpings of the time axis, resulting in horizontal movements of peaks and valleys. In other words, noise in real data manifests itself not only vertically (additively) but also horizontally (compositionally), reflecting an inherent lack of temporal synchronization across functions. For example, in the case of COVID-19 infection-rate curves, the high and low points for different regions occur at different times, implying different evolution rates of the virus cycle. Several papers [18], [19], [22], [29], [30] and a book [27] have documented and formally developed the concept of phase variability in functional data, with the guidance that ignoring this phase variability leads to inflated variance in the data, loss of structures, and reduced power in hypothesis testing. Indeed, these papers provide ways of separating phase and amplitude components (also termed alignment or registration of functions) and then performing either individualized or joint statistical analyses [15], [17], [22], [27], [31]. The proposed methods (for phase-amplitude separation) differ in their choices of metrics, tools for optimizations, and their definitions of phase; however, most of this work has a fundamental shortcoming. It assumes that the functions are fully observed over a common interval and, moreover, that the endpoints match perfectly across observations.

In more mathematical terms, let $\{f_{i} : [0, T] \to ℝ, i = 1, 2, \dots, n\}$ be the set of observed functions on an interval [0, T]. Isolating phase variability means finding time-warping functions {γ_i} such that the time-warped functions {f_i ○ γ_i} are matched (or aligned, or registered) where, in this context, a good matching generally refers to having functions with peaks and valleys co-located across observations. Here, the warping functions {γ_i} are constrained to be diffeomorphisms of [0, T] to itself such that γ_i(0) = 0 and γ_i(T) = T. This last property implies that the endpoints of the data, {f_i(0)} and {f_i(T)}, are assumed to be already matched across all functions.

Data with Sliding Right Boundary:

In many situations where the phase variability is present not just in the interior of [0, T] but also on the boundaries, the boundary-matching assumption breaks down and presents a major challenge. Sometimes one can assume that the left boundaries {f_i(0)} are either synchronized or can be synchronized perhaps through a simple shift. However, the right boundaries are still variable, i.e., γ_i(T) ≠ T. This situation arises, for example, in COVID-19 data where at the end of any observed interval, the infection rate curves for different regions are at different states of evolution. Another example of uncertain boundaries arises in censored data. Censoring is a process of randomly truncating the observation interval before the scheduled end is reached. In mathematical terms, such censored observations are denoted by $\{f_{i} : [0, T_{i}] \to ℝ, i = 1, 2, \dots, n\}$, with the end times $T_{i} \in ℝ_{+}$ being arbitrary. There is a significant literature for analyzing right-censored functional data [5], [8], especially in survival analysis [7], [14], where the aim is to model censoring times as random variables and use the distribution of min(T, T_i) explicitly in the likelihood functions. However, this literature does not account for phase variability and assumes that the functional data is fully registered. Thus, the analysis naturally deteriorates when phase variability is present in the actual data. A commonly used, albeit naive, solution is to simply time scale (linearly) each observation so that they all have the same domain [0, T]. However, this fails to address some deeper issues.

Data with Both Phase Variability and Sliding Right Boundary:

In this paper, we are concerned with functional data with: (1) Random phase variability; and (2) Variable right endpoints {T_i}. The challenge is to separate the phase from the amplitude when the right endpoints are no longer synchronized. In other words, we wish to align or register functions, essentially by matching their peaks and valleys inside the domain while simultaneously placing their floating right boundaries correctly. Some researchers refer to this problem as that of elastic partial matching [9], [21], [25]. An example of this situation arises in the analysis of Berkeley growth curves [24], where the natural start time of growth is birth. However, the end of the fixed 20-year observation period may not match across subjects due to different biological clocks. Another example mentioned earlier involves incidence rates of the COVID-19 virus for different geographical locations where different cities/states/countries exhibit different rates of evolution in the infections. The incidence’s start time can be defined as the first positive case, but due to epidemiological and demographic factors, the pandemic evolution is undoubtedly not well synchronized across regions. Different regions experience different growth and decline rates, and one needs to perform registration to understand these patterns.

How can we analyze such data while preserving salient structures in the data? We illustrate this problem with a simple example. Figure 1(a) shows a pair of functions, f₁ and f₂, where the left boundaries are matched well but the right boundaries are different, with the observed endpoints being T₁ = 2 and T₂ = 1. Furthermore, the two functions have similar shapes except that f₁ is missing a piece relative to f₂; thus, f₁ can only be well matched with a part of f₂, despite T₂ < T₁. If we linearly stretch f₂ to match their boundaries by a simple time scaling (t ↦ 2t), we obtain the result in (b). If, in addition to a linear scaling, we also time-warp f₂ to match with f₁, we get (c). Finally, if we stretch f₂ in such a way that f₁ is matched to a part of f₂ (t ↦ 3t) and then apply time warping, we get (d). Amongst these solutions, panel (d) provides the most satisfactory result.

Fig. 1. — Illustration of different potential solutions for matching curves with different shapes and flexible right boundaries.

In order to reach this solution, we need to infer two items: (1) which parts of the two functions match, i.e. how much time-scaling is needed, and (2) which time-warping aligns the two matched parts? If we have a set of functions, each with an uncertain right boundary, then the joint registration, modeling, and analysis of this data become even more challenging.

The following items summarize the main challenges in developing mathematical solutions:

Mathematical Representations & Metrics: One needs mathematical representations that can account for uncertain boundaries and time-warpings of data. Specifically, we need metrics and objective functions such that partial matching of functions can be posed as binary optimization problems. Similar to past works on elastic shape analysis of functions [27], we need metrics that are invariant to actions of nuisance groups applicable here – time-warping and time-scaling.
Interaction of Warping & Sliding Boundaries: While the solutions for the individual problems – time-warping to match extrema and time-scaling to match the boundaries – are well known, the combination makes the problem more difficult. The combination requires diffeomorphic transformations of domains but without a fixed right boundary. This, in turn, demands searching over all combinations of linear stretches and nonlinear warpings to match any two functions. One needs to impose additional structure to help efficiently optimize over the joint space.

This formulation resembles the problem of partial matching of shapes, and there are several papers in the literature on this topic, see [1], [2], [6], [11], [20], [33] for example. However, our focus is on Riemannian approaches as they provide a comprehensive toolbox for the statistical analysis of shapes, including geodesics, proper metrics, statistical summaries, and probabilistic modeling. This paper develops a Riemannian framework for elastic analysis of functional data with right floating boundaries. Similar to the past works, we develop a framework where we represent both time-warping and time-scaling as group actions on a set of functions. A fully automated optimization procedure compares any two functions by optimally time-warping their domains and varying the right boundaries. The essential contribution here is a parametrization of the joint action of these groups, which facilitates the partial matching of curves. Additionally, we develop an extension that allows us to choose a scalar weight λ > 0 for balancing the dual goals of matching the interiors and the right boundaries; however, some metric properties are lost when λ ≠ 1.

The rest of this paper is organized as follows. We develop a mathematical framework for partially aligning functions using an elastic Riemannian metric and a square-root representation in Section 2. Section 3 develops the optimization of an objective function for pairwise matching of functions. Section 4 presents a set of experimental results involving both simulated and real data (COVID-19 infection rate curves) to demonstrate the success of the proposed framework. The paper ends with a summary and some conclusions in Section 5.

2. Proposed Mathematical Framework

In this section, we develop an elastic Riemannian framework for representing, partially matching, and comparing functional data with phase variability and sliding right boundaries. Before we develop our approach, we summarize the past ideas for phase-amplitude separation, or registration of functions, with matched boundaries. We will follow the approach presented in [27], [28], [29].

2.1. Past Work in the Alignment of Functions with Fixed Boundaries

Let $ℱ = \{f : [0, T] \to ℝ ∣ f \text{ is absolutely continuous}\}$ be the set of functions of interest. Let Γ_T be the group of all positive diffeomorphisms from [0, T] to itself that preserve the boundaries. For any $f \in ℱ$ and γ ∈ Γ_T, the composition f ○ γ denotes a time warping of f while keeping the boundaries fixed (γ(0) = 0, γ(T) = T). In order to register functions, one represents them by their square-root velocity functions (SRVFs). For any $f \in ℱ$, its SRVF is given by $q (t) = \operatorname{sign} (\dot{f} (t)) \sqrt{| \dot{f} (t) |}$. In fact, this mapping f ↦ q forms a bijection between $ℱ$ and $L^{2} ([0, T], ℝ)$, up to a constant. One can reconstruct a function f from its SRVF q and f(0) using $f (t) = f (0) + \int_{0}^{t} q (s) | q (s) | d s, t \in [0, T]$.
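As a rough numerical illustration of the SRVF map and its inverse, the following sketch works on a uniformly sampled function. It is not taken from the original text: the function names `srvf` and `reconstruct`, the forward-difference derivative, and the left Riemann sum are all assumptions made for this example.

```python
import math

def srvf(f_vals, dt):
    """Approximate q = sign(f') * sqrt(|f'|) with forward differences."""
    q = []
    for k in range(len(f_vals) - 1):
        slope = (f_vals[k + 1] - f_vals[k]) / dt
        q.append(math.copysign(math.sqrt(abs(slope)), slope))
    return q

def reconstruct(q, f0, dt):
    """Invert the SRVF: f(t) = f(0) + integral of q|q| (left Riemann sum)."""
    f_vals = [f0]
    for qk in q:
        f_vals.append(f_vals[-1] + qk * abs(qk) * dt)
    return f_vals

# Illustrative input (not from the paper): f(t) = sin(2*pi*t) on [0, 1].
n = 200
dt = 1.0 / n
f = [math.sin(2.0 * math.pi * k * dt) for k in range(n + 1)]
q = srvf(f, dt)
f_rec = reconstruct(q, f[0], dt)
max_err = max(abs(u - v) for u, v in zip(f, f_rec))
```

With this particular discretization the reconstruction undoes the forward difference step by step, so `max_err` stays at the level of floating-point round-off; other discretizations would only recover f approximately.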

For any γ ∈ Γ_T, the SRVF of the time-warped function f ○ γ is given by $(q * γ) ≜ (q \circ γ) \sqrt{\dot{γ}}$. A very important property of SRVFs is that for any two functions $f_{1}, f_{2} \in ℱ$, and their SRVFs q₁, q₂, we have ∥q₁ − q₂∥ = ∥(q₁ * γ) − (q₂ * γ)∥ for all γ ∈ Γ_T. (The norm here is the standard $L^{2}$ norm: $‖ q ‖ = \sqrt{\int_{0}^{T} q {(t)}^{2} d t}$.) Due to this property, the $L^{2}$ norm between SRVFs is called the elastic metric. With the elastic metric, the problem of registration between functions can be solved as the following optimization:

\hat{γ} = \underset{γ \in Γ_{T}}{argmin} ‖ q_{1} - (q_{2} * γ) ‖ .

(1)

This optimization is solved either using the dynamic programming algorithm or a gradient-based approach, as needed, and the resulting registration is called dense elastic registration. For any t ∈ [0, T], the point f₁(t) is said to be matched to the point $f_{2} (\hat{γ} (t))$ .
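The isometry property that underlies Eqn. 1 can be checked with a short change of variables. The computation below is a supplementary sketch rather than part of the original text; it assumes γ ∈ Γ_T is differentiable with $\dot{γ} > 0$ and uses the substitution s = γ(t), with γ(0) = 0 and γ(T) = T.

```latex
\begin{aligned}
\| (q_1 * \gamma) - (q_2 * \gamma) \|^2
  &= \int_0^T \big( q_1(\gamma(t)) - q_2(\gamma(t)) \big)^2 \, \dot{\gamma}(t) \, dt \\
  &= \int_0^T \big( q_1(s) - q_2(s) \big)^2 \, ds
   = \| q_1 - q_2 \|^2 ,
  \qquad s = \gamma(t), \quad ds = \dot{\gamma}(t) \, dt .
\end{aligned}
```

The factor $\sqrt{\dot{γ}}$ in the action supplies exactly the Jacobian needed here; composing with γ alone would not preserve the $L^{2}$ norm.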

For alignment of multiple functions, one exploits the fact that the minimum in Eqn. 1 is actually a proper distance on the quotient space $L^{2} / Γ_{T}$. Using this distance, one can define a mean function under this metric and then align the given functions to this mean using Eqn. 1. For a given set of functions {f₁, f₂, . . . , f_n}, this framework results in a set of time-warping functions (also called phases) {γ_i} and the corresponding aligned functions {f_i ○ γ_i} (also called amplitudes). The full procedure for this Phase-Amplitude separation has been outlined in Chapter 8 of the textbook [27]. Figure 2 shows an example of this approach – the left panel shows the original functions {f_i}, the middle panel shows the time-warped aligned functions {f_i ○ γ_i}, and the right panel shows the optimal warping functions {γ_i}.

Fig. 2. — Phase-amplitude separation of uncensored functional data using SRVF representation.

This approach has been remarkably successful in the alignment of functional data in a variety of applications. However, as mentioned earlier, this framework assumes that both the endpoints t = 0 and t = T are preregistered in all observations. In Fig. 2, the endpoints are kept fixed while the interior is time-warped. Next, we consider the problem of aligning functional data with a sliding right boundary. Note that one of the boundaries, say the left one, can be matched by translation, and that leaves only the other boundary to be matched.

2.2. Joint Time-Warping and Time Scaling

Re-define the set of functions to be $ℱ = \{f : [0, \infty) \to ℝ ∣ f \text{ is absolutely continuous}\}$. Define the space $ℱ_{0} \subset ℱ$ as the space of all absolutely continuous, right-censored functions on [0, ∞) as

ℱ_{0} = \{f \in ℱ ∣ \exists \, c \geq 0 \text{ s.t. } f (t) = \text{const. for all } t > c\} .

Let $L_{0}^{2} \subset L^{2} ([0, \infty), ℝ)$ be the set of SRVFs of the elements of $ℱ_{0}$ . That is, any $q \in L_{0}^{2}$ is a square-integrable function such that q(t) = 0 for all t > c and some c > 0. Define the preshape distance between censored functions as the $L^{2}$ distance on their corresponding SRVFs.

Definition 1.

The preshape distance on $ℱ_{0}$ is given by the $L^{2}$ distance

d_{p} {(f_{1}, f_{2})}^{2} = \int_{0}^{\infty} {(q_{1} (t) - q_{2} (t))}^{2} d t,

where q₁, q₂ are SRVFs of f₁, f₂, respectively.

Next, we re-define the diffeomorphism group Γ to be on the domain [0, ∞) :

Γ = \{γ : [0, \infty) \to [0, \infty) ∣ γ (0) = 0, \ γ \text{ is a positive diffeomorphism}\} .

One can show that Γ has properties similar to Γ_T in the previous section. Namely, Γ is a group under composition as the binary operation, and Γ forms a right group action on the function space $ℱ_{0}$ also by function composition. In particular, for a function $f \in ℱ_{0}$ with censoring point c, the group action is f ↦ f ○ γ, which sends the censoring point c to γ⁻¹(c). Also similar to that of Γ_T, the corresponding right group action of Γ on $L_{0}^{2}$ is given by $q * γ = (q ○ γ) \sqrt{\dot{γ}}$ . Furthermore, the group Γ also acts by isometries with respect to the preshape distance d_p. That is, for any γ ∈ Γ and $f_{1}, f_{2} \in ℱ_{0}$ , the distance preserving property d_p(f₁, f₂) = d_p(f₁ ○ γ, f₂ ○ γ) holds.

At this stage, one can use the past formulation to register functions according to inf_γ∈Γ d_p(f₁, f₂ ○ γ). However, since the functions q₁ and q₂ are zero on the right sides, i.e., they contain no information to help guide function registration, the use of the full Γ is excessive. Instead, we will seek a proper subgroup of Γ that makes the optimization computationally efficient and the result more interpretable. We construct this new group G ⊂ Γ from a combination of the time-scaling H ⊂ Γ and time-warping N ⊂ Γ, which have the following definitions.

Definition 2.

The time-scaling set H is given by

H = \{h \in Γ ∣ h (t) = a t \text{ for some } a > 0\},

the set of all positive, linear functions on [0, ∞).

Definition 3.

The time-warping set N is given by

N = \bigcup_{b > 0} \{n \in Γ ∣ n (t) = t \text{ for } t \geq b\},

the set of fixed-interval diffeomorphisms on [0, b] completed with identity from b to infinity, unioned over all b > 0. The point b is termed the pivot point of the diffeomorphism n.

Now, any group element g ∈ G is constructed such that g = n ○ h, and since G ⊂ Γ, the group actions of G on $ℱ_{0}$ and $L_{0}^{2}$ remain the same as that of Γ. That is, for any g = n ○ h ∈ G and $f \in ℱ$ , the action of g on f is given by f ○ n ○ h. Also, the action of G on $ℱ$ is by isometries under the metric d_p simply because the larger group Γ does the same.

Later on, we establish that G is a proper subgroup of Γ, thus forming the quotient space $L_{0}^{2} / G$ and defining a shape distance on the quotient space. However, first, we provide intuition and motivation for this specific construction. This formulation allows us to restrict the search from a prohibitively large space Γ to only the pair $a \in ℝ_{+}$ and γ ∈ Γ_b, where b is the pivot mentioned above. Further, if the pivot point is known, we can simply perform the time-warping on the fixed space Γ₁ instead of Γ_b. We can thus leverage existing computational tools to optimize over this joint space $(a, γ) \in ℝ_{+} \times Γ_{1}$ . The resulting optimal registration is also highly interpretable – it is a global stretching/compression of the function combined with a nonlinear warping.

However, the question remains: What should the value of the pivot point b be? Suppose we are registering the function f₂ to f₁ in $ℱ_{0}$, with the respective censoring points being c₂ and c₁. That is, we solve for inf_{g∈G} d_p(f₁, f₂ ○ g) with g = n ○ h and h(t) = at. Since the SRVFs q₁ and q₂ are zero after their respective censoring points, there is no need to perform any time-warping beyond the minimum of the two censoring points. Moreover, since after time-scaling, the censoring point c₂ becomes h⁻¹(c₂) = c₂/a, we can set the pivot point b to be b = min(c₁, c₂/a). Thus, the pivot b only depends on the free parameter a, simplifying the subsequent optimization.

Now, we can write the functional form of g ∈ G in a parameterized form as follows. First, note that for h ∈ H with slope a and for n ∈ N with pivot point b, the operation h○n○h⁻¹ sends n to another element $\tilde{n} \in N$ with pivot point ab, since (h○n○h⁻¹)(t) = a n(t/a) equals t whenever t/a ≥ b. Furthermore, this operation does not change the shape of n and simply scales the time-warping function along the diagonal. We can thus write any n ∈ N as n = h ○ n₀ ○ h⁻¹, where n₀ ∈ N has a pivot of b = 1. That is, if we let h(t) = bt with the desired pivot point b, then n(t) can be written as n(t) = bn₀(t/b). By letting n₀(t) = γ(t) on 0 ≤ t ≤ 1 for γ ∈ Γ₁, we can identify any g ∈ G with the triplet (a, b, γ) with

g (t) = \begin{cases} a b \, γ (t / b), & t \leq b \\ a t, & t > b . \end{cases}

(2)

However, as explained in the previous paragraph, since we know the censoring points c₁ and c₂, we simplify this parameter space by setting the pivot value to b = min(c₁, c₂/a). Now, any g ∈ G can be identified with the pair $(a, γ) \in ℝ_{+} \times Γ_{1}$ .
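The parameterization of g by the pair (a, γ) can be sketched in a few lines. This is a minimal illustration of Eqn. 2 with the pivot rule b = min(c₁, c₂/a); the helper name `make_g` and the particular warp `gamma` are hypothetical choices made for this example, not part of the original method.

```python
def make_g(a, gamma, c1, c2):
    """Build g from Eqn. 2 with pivot b = min(c1, c2 / a); returns (g, b)."""
    b = min(c1, c2 / a)

    def g(t):
        if t <= b:
            return a * b * gamma(t / b)   # warped part on [0, b]
        return a * t                      # pure scaling beyond the pivot

    return g, b

def gamma(s):
    """Hypothetical warp in Gamma_1: gamma(0) = 0, gamma(1) = 1, derivative 0.5 + s > 0."""
    return 0.5 * s + 0.5 * s * s

g, b = make_g(a=2.0, gamma=gamma, c1=1.0, c2=1.5)
samples = [g(0.05 * k) for k in range(41)]   # g evaluated on [0, 2]
```

Because γ(1) = 1, the two branches agree at the pivot (g(b) = ab), so g is continuous and increasing on [0, ∞) for any admissible γ.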

2.3. G is a Proper Subgroup of Γ

In order to show that G is a proper subgroup of Γ, we first show that the time-scaling set H and the time-warping set N are both subgroups of Γ, with N being a normal subgroup. Then we show that G can be constructed as an outer semidirect product of H and N; hence, it is a proper subgroup of Γ.

Lemma 1.

The time-scaling set H is a subgroup of Γ.

Proof:

H ⊂ Γ by definition. The closure property is achieved due to linearity: h₁ ○ h₂ = h₁(a₂t) = a₂h₁(t) = a₂a₁t ∈ H. Any h ∈ H has an inverse h⁻¹(t) = t/a, and thus h⁻¹ ∈ H. The identity element e(t) = t is also in H. □

Lemma 2.

The time-warping set N is a normal subgroup of Γ.

Proof:

N ⊂ Γ by definition. To establish the closure property, note that for any n₁, n₂ ∈ N, the composition $\tilde{n} = n_{1} \circ n_{2}$ is also a member of N, with $\tilde{b} = \max (b_{1}, b_{2})$ such that $\tilde{n} (t) = t$ for $t \geq \tilde{b}$ .

To prove that N is a normal subgroup of Γ, we need to show that γ ○ n ○ γ⁻¹ ∈ N for all γ ∈ Γ and n ∈ N. Evaluating the composition, we get

(γ \circ n \circ γ^{- 1}) (t) = \begin{cases} γ (n (γ^{- 1} (t))), & 0 \leq t \leq γ (b) \\ γ (γ^{- 1} (t)) = t, & t > γ (b), \end{cases}

which is a member of N with pivot point γ(b), where b denotes the pivot point of n. □

We need to form a product of N and H to reach the desired subgroup G. However, since N and H do not commute, we cannot simply construct a direct product group N × H. Instead, using the fact that N and H are both groups, and that N ∩ H = {e}, the identity element, we can construct a new subgroup G ⊂ Γ as an outer semidirect product of both N and H. As a first step, we define the following homomorphism.

Definition 4.

Let ϕ : H → Aut(N) be the group homomorphism defined by ϕ(h)(n) = h○n○h⁻¹ for all h ∈ H and n ∈ N. We write this homomorphism as ϕ_h : N → N for brevity.

Before we proceed, we establish that this mapping is indeed a homomorphism.

Lemma 3.

The map h ↦ ϕ_h is a homomorphism ϕ : H → Aut(N), i.e. $ϕ_{h_{1}} \circ ϕ_{h_{2}} = ϕ_{h_{1} \circ h_{2}}$ .

Proof:

For simplicity, we suppress the symbol ○ for function composition in this proof. Since N is a normal subgroup of Γ, and H ⊂ Γ, it holds that $ϕ_{h_{1}} ϕ_{h_{2}}$ and $ϕ_{h_{1} h_{2}}$ are both functions from N → N. For any n ∈ N, we have that $ϕ_{h_{1}} ϕ_{h_{2}} (n) = ϕ_{h_{1}} (h_{2} n h_{2}^{- 1}) = h_{1} h_{2} n h_{2}^{- 1} h_{1}^{- 1} = (h_{1} h_{2}) n {(h_{1} h_{2})}^{- 1} = ϕ_{h_{1} h_{2}} (n)$ . □

Now we have all the pieces to form the subgroup required for registration of functions.

Definition 5.

Using the time-warping subgroup N, the time-scaling subgroup H, and the homomorphism ϕ : H → Aut(N), define the outer semidirect product G = N ⋊_ϕ H, denoted by G ⊂ Γ, to be the pairing (N, H) with the following properties:

Group operation: $(n_{1}, h_{1}) \cdot (n_{2}, h_{2}) = (n_{1} \circ ϕ_{h_{1}} (n_{2}), h_{1} \circ h_{2})$
Inverse element: ${(n, h)}^{- 1} = (ϕ_{h^{- 1}} (n^{- 1}), h^{- 1})$ .

Using Lemmas 1, 2, and 3 and the definition of the homomorphism ϕ, by definition of an outer semidirect product, we can show that G is a proper subgroup of Γ.
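One way to see that the group operation in Definition 5 agrees with ordinary composition, under the identification (n, h) ↦ n ○ h, is the short computation below. It is offered as a supplementary check rather than as part of the original argument; it only inserts the identity $h_1^{-1} \circ h_1$ and regroups.

```latex
\begin{aligned}
(n_1 \circ h_1) \circ (n_2 \circ h_2)
  &= n_1 \circ \big( h_1 \circ n_2 \circ h_1^{-1} \big) \circ (h_1 \circ h_2) \\
  &= \big( n_1 \circ \phi_{h_1}(n_2) \big) \circ (h_1 \circ h_2) .
\end{aligned}
```

Since N is normal in Γ (Lemma 2), $\phi_{h_1}(n_2) \in N$, so the first factor lies in N and the second in H, matching the pair $(n_{1} \circ ϕ_{h_{1}} (n_{2}), h_{1} \circ h_{2})$.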

Remarks:

By definition, any element in G can be expressed uniquely as g = n ○ h. (Therefore, G also contains functions g ∈ Γ that take the form g(t) = at for all t > b.) Henceforth, we will use this shorthand notation of g to indicate the time-warping and time-scaling pair (n, h).

Since the subgroup G acts on $ℱ_{0}$ (correspondingly on $L_{0}^{2}$ ) by isometries, the distance d_p descends to the quotient space $ℱ_{0} / G$ (correspondingly $L_{0}^{2} / G$ ) and provides a distance on this quotient space.

Definition 6.

The shape distance on $ℱ_{0}$ is given by

d_{s} (f_{1}, f_{2}) = inf_{g \in G} d_{p} (f_{1}, f_{2} \circ g)

(3)

Similar to the theory developed in [27], [28], [29], this distance d_s is a proper distance on the quotient space $ℱ_{0} / G$.

We illustrate this idea using a simple example in Fig. 3. Here, we first compute the shape distance between a pair of censored functions f₁ and f₂ (with respective censoring points c₁ and c₂) by aligning f₂ to f₁ according to Eq. 3. In the first row, the left plot shows the original function pair, the center plot shows the optimally aligned f₂ with respect to f₁, and the right plot shows the optimal g ∈ G needed to perform the time-warping of f₂. The second row shows the same progression of events but for when the index labeling of the two functions has been reversed. Here, since the shape distance is a proper distance and is hence symmetric, the shape distance is the same in both cases, and the optimal diffeomorphisms in G are inverses of each other.

3. Optimization Details for Pairwise Shape Distance

Next, we develop techniques for optimally aligning censored functions f₁ and f₂ (with known censoring points c₁ and c₂) in order to compute their shape distance as defined in Eq. 3. That is, we develop numerical recipes to find and apply a group element $\hat{g} \in G$ to f₂ in order to optimally match a fixed f₁ to minimize d_p. Towards this end, we define an energy:

E (g) = \int_{0}^{\infty} {(q_{1} (t) - (q_{2} * g) (t))}^{2} d t .

We have shown in Section 2.2 that any g ∈ G can be identified by the pair $(a, γ) \in ℝ_{+} \times Γ_{1}$ with b = min(c₁, c₂/a). Thus,

E (a, γ) = \int_{0}^{b} {(q_{1} (t) - q_{2} (a b γ (t / b)) \sqrt{a \dot{γ} (t / b)})}^{2} d t + \int_{b}^{\infty} {(q_{1} (t) - q_{2} (a t) \sqrt{a})}^{2} d t .

(4)

Then, we solve for $(\hat{a}, \hat{γ}) = \underset{(a, γ) \in ℝ_{+} \times Γ_{1}}{arginf} E (a, γ)$ , form $\hat{g} (t)$ using Eqn. 2, set $\hat{b} = \min (c_{1}, c_{2} / \hat{a})$ , and apply ${\hat{f}}_{2} = f_{2} \circ \hat{g}$ .

3.1. Grid-Search Algorithm

As a simpler first idea, we present a grid-search approach for this optimization in the following manner. We start by defining a finite sampling, or “grid,” on the time-scaling parameter $a \in ℝ_{+}$ given by {a_i, i = 1, . . . , J}. Then, for each grid point i = 1, . . . , J, we fix the time-scaling parameter a_i and pivot point b_i = min(c₁, c₂/a_i) and solve for the optimal time-warping γ_i ∈ Γ₁. One can solve for γ_i by first truncating the two functions to the common overlapping domain [0, b_i], rescaling both functions to have domain [0, 1], and then performing dense elastic registration over this interval as described in Subsection 2.1. One can perform this function registration using one of several approaches present in the literature. The most commonly used tool is the dynamic programming algorithm [27], but one can also exploit the geometry of the space Γ₁ to develop a BFGS-based gradient search [13]. The pair (a_i, γ_i) that yields the lowest value of the energy function E in Eqn. 4 defines the optimal diffeomorphism $\hat{g} \in G$ that best matches f₂ to f₁.

Algorithm 1 describes the grid-search algorithm for elastic partial matching of right-censored functions. The advantages of this algorithm are that: (1) It is relatively straightforward and uses mostly existing tools from the literature, and (2) It provides global solutions depending on the grid resolution. The disadvantage is that it is computationally expensive to solve for an optimal diffeomorphism γ_i ∈ Γ₁ for each candidate time-scaling parameter a_i. A more computationally efficient solution is to perform a gradient search over the joint domains. However, due to the high-dimensionality of the state space and complex nature of the objective function, any gradient-based optimization will be highly dependent on initialization. Therefore, in practice we use Algorithm 1, with a limited number of grid points, for obtaining a coarse initialization for a gradient-descent algorithm. The gradient descent is then used as a local refinement of the coarse initialization.
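The outer loop of the grid search can be illustrated with a deliberately simplified sketch. Note the assumption: the warp γ is held fixed at the identity, so only the time-scaling parameter a is searched. This is not Algorithm 1 itself, which solves a dense elastic registration at every grid point; the function names, the trapezoid rule, and the test functions are all choices made for this example.

```python
import math

def energy_scale_only(q1, q2, a, c1, c2, n=2000):
    """Eqn. 4 restricted to gamma = identity (trapezoid rule).

    Both SRVFs vanish past their censoring points, so integrating up to
    max(c1, c2 / a) captures the whole integrand.
    """
    upper = max(c1, c2 / a)
    dt = upper / n
    total = 0.0
    for k in range(n + 1):
        t = k * dt
        resid = q1(t) - math.sqrt(a) * q2(a * t)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * resid * resid * dt
    return total

def grid_search_scale(q1, q2, c1, c2, a_grid):
    """Return the (a, energy) pair with the lowest energy on the grid."""
    scored = [(a, energy_scale_only(q1, q2, a, c1, c2)) for a in a_grid]
    return min(scored, key=lambda pair: pair[1])

def q1(t):
    """Illustrative SRVF with censoring point c1 = 1."""
    return math.sin(math.pi * t) if 0.0 <= t <= 1.0 else 0.0

def q2(t):
    """SRVF of f1(2t): a copy of f1 run at twice the speed, so c2 = 0.5."""
    return math.sqrt(2.0) * q1(2.0 * t)

best_a, best_energy = grid_search_scale(q1, q2, 1.0, 0.5, [0.25, 0.5, 1.0, 2.0, 4.0])
```

In this constructed example f₂ is f₁ played at double speed, so the scaling a = 0.5 undoes the speed-up and drives the energy to round-off level. In the full procedure, each candidate a_i would additionally be paired with its own optimal γ_i before the energies are compared.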

3.2. Gradient-Based Joint Optimization

In order to derive a gradient-based optimization of E, we change to a more convenient mathematical representation for elements of the group G. For an element g ∈ G identified by parameters $(a, γ) \in ℝ_{+} \times Γ_{1}$ , consider the map M given by $M (a, γ) = (log (a), \sqrt{\dot{γ}})$ , the log-transformation of a and the SRVF of γ. Let $ξ \in ℝ$ be such that a = e^ξ and let the space of SRVFs of Γ₁ be Ψ. It is easy to see that for any γ ∈ Γ₁, its SRVF $ψ = \sqrt{\dot{γ}}$ is non-negative and has unit $L^{2}$ norm (please refer to [27]).

Definition 7.

Define the space of SRVFs of Γ₁ as the positive orthant of the unit Hilbert sphere

Ψ = \{ψ \in L^{2} ([0, 1], ℝ) ∣ ‖ ψ ‖ = 1, \ ψ > 0 \text{ a.e.}\} .

Define the parameter space $𝒫 = ℝ \times Ψ$ as the set of all the transformed variables (ξ, ψ). Now, in the context of censored function registration, we can form the group element g ∈ G from the parameters (ξ, ψ). Since $(a, γ) = M^{- 1} (ξ, ψ) = (e^{ξ}, \int_{0}^{t} ψ {(s)}^{2} d s)$ and $b = \min (c_{1}, c_{2} e^{- ξ})$, we can rewrite Eqn. 2 as

g (t) = \begin{cases} e^{ξ} b \int_{0}^{t / b} ψ {(s)}^{2} d s, & t \leq b \\ e^{ξ} t, & t > b . \end{cases}

(5)
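The change of variables M and its inverse can be sketched on a uniform grid of [0, 1]. The names `to_params` and `from_params`, the forward-difference derivative, and the left Riemann sum are assumptions of this illustration, as is the particular warp used as input.

```python
import math

def to_params(a, gamma_vals, ds):
    """M(a, gamma) = (log a, sqrt(gamma')), gamma' by forward differences."""
    psi = [math.sqrt((gamma_vals[k + 1] - gamma_vals[k]) / ds)
           for k in range(len(gamma_vals) - 1)]
    return math.log(a), psi

def from_params(xi, psi, ds):
    """M^{-1}(xi, psi) = (e^xi, cumulative integral of psi^2), left Riemann sum."""
    gamma_vals = [0.0]
    for p in psi:
        gamma_vals.append(gamma_vals[-1] + p * p * ds)
    return math.exp(xi), gamma_vals

# Hypothetical warp gamma(s) = 0.5*s + 0.5*s^2 sampled on a uniform grid of [0, 1].
n = 100
ds = 1.0 / n
gamma_vals = [0.5 * (k * ds) + 0.5 * (k * ds) ** 2 for k in range(n + 1)]
xi, psi = to_params(2.0, gamma_vals, ds)
a_back, gamma_back = from_params(xi, psi, ds)
psi_sq_norm = sum(p * p for p in psi) * ds
round_trip_err = max(abs(u - v) for u, v in zip(gamma_vals, gamma_back))
```

The discrete squared norm of ψ telescopes to γ(1) − γ(0) = 1, which mirrors the unit-norm condition in Definition 7.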

Moreover, the group action of G on $q_{2} \in L_{0}^{2}$ can be written in terms of the parameters $(ξ, ψ) \in 𝒫$ as such:

q * g = (q \circ g) \sqrt{\dot{g}} = \begin{cases} q (e^{ξ} b \int_{0}^{t / b} ψ {(s)}^{2} d s) \, e^{ξ / 2} ψ (t / b), & t \leq b \\ q (e^{ξ} t) \, e^{ξ / 2}, & t > b . \end{cases}

(6)

Thus, the energy function in Eqn. 4 can be re-written as

E (ξ, ψ) = \int_{0}^{b} {(q_{1} (t) - q_{2} (e^{ξ} b (\int_{0}^{t / b} ψ {(s)}^{2} d s)) e^{ξ / 2} ψ (t / b))}^{2} d t + \int_{b}^{\infty} {(q_{1} (t) - q_{2} (e^{ξ} t) e^{ξ / 2})}^{2} d t,

where $b = \min (c_{1}, c_{2} e^{- ξ})$.
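The action in Eqn. 6 can be checked numerically for norm preservation, which is the property that makes the energy above well behaved. The sketch below is an assumption-laden illustration: ψ and its integral γ are passed as closed-form callables, the helper names `act` and `sq_norm` are hypothetical, and the integrals use a plain trapezoid rule.

```python
import math

def act(q, xi, psi, gamma, c1, c2):
    """Return (q * g, b) following Eqn. 6, with b = min(c1, c2 * exp(-xi)).

    psi is the SRVF of the warp and gamma(s) is the integral of psi^2 on [0, s];
    both are callables on [0, 1] (an assumption of this sketch).
    """
    a = math.exp(xi)
    b = min(c1, c2 / a)
    root_a = math.sqrt(a)   # e^{xi / 2}

    def qg(t):
        if t <= b:
            return q(a * b * gamma(t / b)) * root_a * psi(t / b)
        return q(a * t) * root_a

    return qg, b

def sq_norm(fn, upper, n=4000):
    """Trapezoid approximation of the integral of fn(t)^2 over [0, upper]."""
    dt = upper / n
    vals = [fn(k * dt) ** 2 for k in range(n + 1)]
    return dt * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def q2(t):
    """Illustrative SRVF with censoring point c2 = 1."""
    return math.sin(math.pi * t) if 0.0 <= t <= 1.0 else 0.0

def psi(s):
    """Hypothetical warp SRVF with unit L2 norm on [0, 1]."""
    return math.sqrt(0.5 + s)

def gamma(s):
    """Integral of psi^2: gamma(s) = 0.5*s + 0.5*s^2."""
    return 0.5 * s + 0.5 * s * s

q2g, b = act(q2, math.log(2.0), psi, gamma, c1=0.8, c2=1.0)
norm_before = sq_norm(q2, 1.0)
norm_after = sq_norm(q2g, 1.0)
```

With ξ = log 2 the pivot lands at b = 0.5 (the rescaled censoring point c₂/a), and the two squared norms agree up to quadrature error, as expected for an action by isometries.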

What is the reason for changing the representation of elements of G from $ℝ_{+} \times Γ_{1}$ to $𝒫$ ? The reason is that the Riemannian geometry of $𝒫$ is less complex, relatively, in that we know the expressions for tangent spaces, exponential maps and geodesics in $𝒫$ . Similar to G, $𝒫$ is also a Lie group. Note that the group operation is $(ξ_{1}, ψ_{1}) \cdot (ξ_{2}, ψ_{2}) = (ξ_{1} + ξ_{2}, (ψ_{1} ○ \int_{0}^{t} ψ_{2} {(s)}^{2} d s) ψ_{2})$ , the inverse element is given by $(- ξ, 1 / ψ) \in 𝒫$ , and the identity element of $𝒫$ is given by $p_{i d} \equiv (ξ_{i d}, ψ_{i d}) \in 𝒫$ , where ξ_id = 0 and ψ_id(t) = 1 for all t ∈ [0, 1]. This Lie group structure allows us to formulate gradient vectors at each iteration in the tangent space of the identity element and apply these updates to the current estimate in a sequential manner.

Definition 8.

Define the tangent space of $𝒫$ at the identity element p_id = (ξ_id, ψ_id) = (0, 1) as

T_{p_{i d}} (𝒫) = \{v = (y, z) \in ℝ \times L^{2} ([0, 1], ℝ) ∣ \int_{0}^{1} z (t) d t = 0\} .

Definition 9.

Define the exponential mapping of $v = (y, z) \in T_{p_{i d}} (𝒫)$ as $\exp_{p_{i d}} (v) \in 𝒫$ via the formula

{exp}_{p_{id}} (v) = (y, cos (‖ z ‖) + sin (‖ z ‖) \frac{z}{‖ z ‖}) .
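A minimal sketch of this exponential map on sampled data is given below. The function name `exp_identity`, the grid `s`, and the trapezoidal approximation of the $L^{2}$ norm are assumptions for illustration; the small-norm guard is added only to avoid division by zero.

```python
import numpy as np

def exp_identity(y, z, s):
    """Exponential map of Definition 9 at p_id = (0, 1) for v = (y, z) with int z = 0."""
    z = np.asarray(z, dtype=float)
    # L2 norm of z on [0, 1], trapezoidal rule
    norm_z = np.sqrt(np.sum(0.5 * (z[1:] ** 2 + z[:-1] ** 2) * np.diff(s)))
    if norm_z < 1e-12:
        return y, np.ones_like(z)          # zero tangent vector maps to psi_id
    return y, np.cos(norm_z) + np.sin(norm_z) * z / norm_z
```

Because z is orthogonal to the constant function 1, the returned ψ has unit norm; positivity is not guaranteed for large ‖z‖ and has to be checked separately.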

In an iterative line-search, at iteration k, let the current estimate be $p_{k} = (ξ_{k}, ψ_{k}) \in 𝒫$ , a search direction vector v_k be in the tangent bundle of $𝒫$ , and a step size be α_k > 0. If $𝒫$ were a vector space, the updated estimate would be given simply by p_k+1 = p_k + α_kv_k. Since $𝒫$ is instead a non-linear manifold, one can compute the update using the exponential mapping $p_{k + 1} = {exp}_{p_{k}} (α_{k} v_{k})$ . However, since $𝒫$ is also a Lie group with an identity element and a group action, there is no need to compute search directions in any tangent space except at identity. For any k, we can compute $v_{k} \in T_{p_{i d}} (𝒫)$ and then apply a sequential update to p_k via $p_{k + 1} = p_{k} \cdot \exp_{p_{i d}} (α_{k} v_{k})$ , where · denotes the group operation of $𝒫$ .

Most often, the search direction v_k is based on the gradient of the objective function, or energy function, evaluated at the current estimate. Therefore, we need to define the energy $E_{k} : 𝒫 \to ℝ$ at step k and show how to compute its gradient as an element of $T_{p_{i d}} (𝒫)$ . First, define the current censored function as $q_{2, k} ≜ (q_{2} * g_{k})$ , where g_k is identified by $(ξ_{k}, ψ_{k}) \in 𝒫$ and $b_{k} = min (c_{1}, c_{2} e^{- ξ_{k}})$ .

Definition 10.

For an update $(ξ, ψ) \in 𝒫$ at iteration k, define the energy of updating the current censored function q_2,k with censoring point c_2,k by (ξ, ψ) as E_k(ξ, ψ)

= \int_{0}^{b} {(q_{1} (t) - q_{2, k} (e^{ξ} b (\int_{0}^{t / b} ψ {(s)}^{2} d s)) e^{ξ / 2} ψ (t / b))}^{2} d t + \int_{b}^{\infty} {(q_{1} (t) - q_{2, k} (e^{ξ} t) e^{ξ / 2})}^{2} d t,

(7)

where b = min(c₁, c_2,ke^−ξ).

Derivation of Gradient of E:

Next, we compute the gradient ∇E_k at identity and define a line-search update direction vector $v_{k} \in T_{p_{i d}} (𝒫)$ based on this gradient. Before writing the full analytical expression for the gradient, we develop a series of useful results.

Lemma 4.

For $q_{1}, q_{2} \in L_{0}^{2}$ with censoring points c₁, c₂ ≥ 0 and g ∈ G identified by $(ξ, ψ) \in 𝒫$ with pivot point b = min(c₁, c₂e^−ξ), let ${\tilde{q}}_{2} ≜ q_{2} * g$ according to Eqn. 6. The derivative of ${\tilde{q}}_{2}$ with respect to ξ evaluated at the identity element $(ξ_{i d}, ψ_{i d}) \in 𝒫$ is given by $\frac{\partial {\tilde{q}}_{2}}{\partial ξ} (ξ_{i d}, ψ_{i d}) = t {\dot{q}}_{2} + \frac{1}{2} q_{2}$ for all t ≥ 0.

Proof:

See Appendix A.

Lemma 5.

Let $q_{1}, q_{2} \in L_{0}^{2}$ with censoring points c₁, c₂ ≥ 0, let b = min(c₁, c₂) and let $F (ψ) = \int_{0}^{b} {(q_{1} (t) - q_{2} (b \int_{0}^{t / b} ψ {(s)}^{2} d s) ψ (t / b))}^{2} d t$ . Then, if x = t/b, the Riemannian gradient at identity $\nabla F \in T_{ψ_{i d}} (Ψ)$ is given by $\nabla F = w (x) - \int_{0}^{1} w (x) d x$ , with $w (x) = 4 b^{2} \int_{0}^{x} (q_{1} (b s) - q_{2} (b s)) {\dot{q}}_{2} (b s) d s - 2 b (q_{1} (b x) - q_{2} (b x)) q_{2} (b x)$ .

Proof:

See Appendix B.
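The statement of Lemma 5 can be checked numerically by comparing the inner product ⟨∇F, z⟩ with a central finite difference of F along a tangent direction z. The sketch below is such a check under stated assumptions: the test functions `q1`, `q2`, the value b = 0.8, the direction z, and all helper names are hypothetical choices made here, not data or code from the paper.

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal rule on a sampled grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def cumtrapz(y, x):
    """Cumulative trapezoidal integral starting from 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))

def F(psi, q1, q2, b, x):
    """F(psi) of Lemma 5 after the substitution x = t / b."""
    gamma = cumtrapz(psi ** 2, x)
    return b * trapz((q1(b * x) - q2(b * gamma) * psi) ** 2, x)

def grad_F_identity(q1, q2, dq2, b, x):
    """w(x) minus its mean, as stated in Lemma 5."""
    r = q1(b * x) - q2(b * x)
    w = 4.0 * b ** 2 * cumtrapz(r * dq2(b * x), x) - 2.0 * b * r * q2(b * x)
    return w - trapz(w, x)

# Hypothetical smooth test functions (not SRVFs of real data).
q1 = lambda t: np.sin(3.0 * t) + 0.5
q2 = lambda t: np.cos(2.0 * t)
dq2 = lambda t: -2.0 * np.sin(2.0 * t)

x = np.linspace(0.0, 1.0, 4001)
b = 0.8
z = np.cos(2.0 * np.pi * x)        # tangent direction at psi_id: integrates to zero
eps = 1e-5
finite_diff = (F(1.0 + eps * z, q1, q2, b, x) - F(1.0 - eps * z, q1, q2, b, x)) / (2.0 * eps)
analytic = trapz(grad_F_identity(q1, q2, dq2, b, x) * z, x)
```

The two numbers `finite_diff` and `analytic` should agree up to discretization error, which is a quick way to validate an implementation of w before using it inside a descent loop.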

Having obtained these useful results, we now derive the expression for gradient of E_k with respect to an incremental element of $𝒫$ .

Theorem 1.

At iteration k and for $q_{1}, q_{2, k} \in L_{0}^{2}$ with censoring points c₁, c_2,k ≥ 0, the gradient of E_k at identity $(ξ_{i d}, ψ_{i d}) \in 𝒫$ is written as the pair $\nabla E_{k} (ξ_{i d}, ψ_{i d}) = (\frac{\partial E_{k}}{\partial ξ} (ξ_{i d}, ψ_{i d}), \frac{\partial E_{k}}{\partial ψ} (ξ_{i d}, ψ_{i d})) \in T_{p_{i d}} (𝒫)$ . The two terms that comprise the gradient vector are defined as follows.

The partial derivative of E_k with respect to $ξ \in ℝ$ evaluated at identity is given by $\frac{\partial E_{k}}{\partial ξ} (ξ_{i d}, ψ_{i d})$

= - 2 \int_{0}^{\infty} (q_{1} - q_{2, k}) (t {\dot{q}}_{2, k} + \frac{1}{2} q_{2, k}) d t \in ℝ .

(8)
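On sampled data, Eqn. 8 reduces to a single quadrature. The following is a minimal sketch, assuming both SRVFs are sampled on a common grid `t` that covers their supports, and that the derivative of q_2,k is estimated by finite differences with `np.gradient`; the function names are illustrative.

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal rule on a sampled grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def dE_dxi_identity(q1, q2, t):
    """Eqn. 8: -2 * int (q1 - q2) (t q2' + q2 / 2) dt for arrays sampled on t."""
    dq2 = np.gradient(q2, t)                 # finite-difference estimate of q2'
    return -2.0 * trapz((q1 - q2) * (t * dq2 + 0.5 * q2), t)
```

The derivative vanishes when q₁ = q_2,k, and for smooth functions it matches a finite difference of the energy in ξ with ψ held at the identity.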

Let b = min(c₁, c_2,k), x = t/b, and define the function $w_{k} \in T_{ψ_{i d}} (Ψ)$ as

w_{k} (x) = 4 b^{2} \int_{0}^{x} (q_{1} (b s) - q_{2, k} (b s)) {\dot{q}}_{2, k} (b s) d s - 2 b (q_{1} (b x) - q_{2, k} (b x)) q_{2, k} (b x) .

Then, the partial derivative of E_k with respect to ψ ∈ Ψ evaluated at identity is given by

\frac{\partial E_{k}}{\partial ψ} (ξ_{i d}, ψ_{i d}) = w_{k} (x) - \int_{0}^{1} w_{k} (x) d x .

(9)

Proof:

For part (1), define the function ${\tilde{q}}_{2, k} = q_{2, k} * g$ , where g ∈ G is identified by $(ξ, ψ) \in 𝒫$ and b = min(c₁, c_2,ke^−ξ). Now, we compute the partial derivative as

\frac{\partial E_{k}}{\partial ξ} = - 2 \int_{0}^{\infty} (q_{1} - {\tilde{q}}_{2, k}) \frac{\partial {\tilde{q}}_{2, k}}{\partial ξ} d t .

Using Lemma 4, the above expression evaluated at identity becomes the expression given in Eqn. 8.

For part (2), define $F_{k} (ψ) = E_{k} (ξ_{i d}, ψ) - \int_{b}^{\infty} {(q_{1} (t) - q_{2, k} (t))}^{2} d t$ . Notice that F_k(ψ) takes the form of F as defined in Lemma 5 with q₂ replaced with q_2,k, and by construction $\nabla F_{k} (ψ_{i d}) = \frac{\partial E_{k}}{\partial ψ} (ξ_{i d}, ψ_{i d})$ . Thus, according to Lemma 5, the partial derivative is equal to the expression given in Eqn. 9. □

In order to implement a backtracking line-search method based on the Armijo-Goldstein condition, we must first define an inner product on $T_{p_{i d}} (𝒫)$ .

Definition 11.

For $v_{1}, v_{2} \in T_{p_{i d}} (𝒫)$ with v₁ = (y₁, z₁) and v₂ = (y₂, z₂), the chosen inner product on $𝒫$ is given by ⟨⟨v₁, v₂⟩⟩ = y₁y₂ + ⟨z₁, z₂⟩, where ⟨·,·⟩ is the standard $L^{2}$ inner product, and the corresponding norm is given by $‖ v ‖_{𝒫} = \sqrt{〈 〈 v, v 〉 〉}$ .

With this inner product and norm for elements of the tangent space, we can define the Armijo-Goldstein condition for the backtracking line-search method.

Definition 12. (Armijo-Goldstein Condition)

For a candidate update $p = (ξ, ψ) \in 𝒫$ , a scalar β ∈ (0, 1), a search direction $v_{k} \in T_{p_{i d}} (𝒫)$ , and a stepsize δ > 0, the Armijo-Goldstein condition is given by $E_{k} (p) \leq E_{k} (p_{i d}) + \frac{β δ}{{‖ v_{k} ‖}_{𝒫}} 〈 〈 \nabla E_{k} (p_{i d}), v_{k} 〉 〉$ .

One can see that in the special case of gradient descent where v_k = −∇E_k(p_id), the above condition simplifies to $E_{k} (p) \leq E_{k} (p_{i d}) - β δ {‖ \nabla E_{k} (p_{i d}) ‖}_{𝒫}$ . In addition to the Armijo-Goldstein condition, we also need the condition that (ψ_k * ψ)(t) ≥ 0 for all t ∈ [0, 1] in order to ensure that the incremental update of ψ_k remains in Ψ. If either of the two conditions is not satisfied, then one must reduce the stepsize by a factor of τ ∈ (0, 1) by updating δ ↦ τδ until both are satisfied or until δ becomes too small.
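The backtracking rule just described can be sketched generically as follows. The function `backtracking_step` and its callable arguments are hypothetical names introduced for illustration; how a candidate update is formed from δ and v_k is left to the caller, since that detail belongs to Algorithm 2.

```python
def backtracking_step(energy, candidate, feasible, e0, slope, delta,
                      beta=0.1, tau=0.5, delta_min=1e-12):
    """Shrink delta by tau until the Armijo-Goldstein and feasibility conditions both hold.

    energy(p)        -> E_k(p) for a candidate update p
    candidate(delta) -> candidate update p for step size delta (caller-defined)
    feasible(p)      -> True if the updated psi stays nonnegative
    e0               -> E_k(p_id)
    slope            -> <<grad E_k(p_id), v_k>> / ||v_k||  (negative for a descent direction)
    """
    while delta > delta_min:
        p = candidate(delta)
        if feasible(p) and energy(p) <= e0 + beta * delta * slope:
            return p, delta
        delta *= tau                      # delta -> tau * delta
    return None, 0.0                      # step size became too small
```

The default values β = 0.1 and τ = 0.5 mirror the settings reported in the experiments; returning `None` signals that no acceptable step was found.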

Algorithm 2 outlines the gradient descent method for elastic partial matching of functions.

3.2.

3.3. Modification of Energy Function

The energy defined above (Definition 10) depends in part on the $L^{2}$ norm of the extra portion of the longer function (the second term). Therefore, if that piece has a large $L^{2}$ norm, then it tends to dominate the total energy. In that case, an optimal alignment pushes the endpoints of the two functions to be closer together, and the results look more similar to the standard elastic registration with identical boundaries. In order to control the influence of this unmatched part, we modify the energy functional by multiplying the second term in E by a constant λ > 0, which results in:

E (ξ, ψ) = \int_{0}^{b} {(q_{1} (t) - q_{2} (e^{ξ} b (\int_{0}^{t / b} ψ {(s)}^{2} d s)) e^{ξ / 2} ψ (t / b))}^{2} d t + λ \int_{b}^{\infty} {(q_{1} (t) - q_{2} (e^{ξ} t) e^{ξ / 2})}^{2} d t,

(10)

where b = min(c₁, c₂e^−ξ). The resulting gradient expressions are altered in only a minor fashion and for the sake of brevity are not repeated here.
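For concreteness, a discretized version of the weighted energy in Eqn. 10 might look as follows. This is a sketch under stated assumptions: `q1` and `q2` are vectorized callables that vanish beyond their censoring points, ψ is sampled on a uniform grid, b > 0, and the names `modified_energy` and `trapz` are illustrative. Setting `lam=1.0` recovers the unweighted energy.

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal rule on a sampled grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def modified_energy(q1, q2, c1, c2, xi, psi, lam=1.0, n=2001):
    """Discretized Eqn. 10 for callables q1, q2 that are zero beyond c1 and c2."""
    psi = np.asarray(psi, dtype=float)
    s = np.linspace(0.0, 1.0, len(psi))
    steps = 0.5 * (psi[1:] ** 2 + psi[:-1] ** 2) * np.diff(s)
    gamma = np.concatenate(([0.0], np.cumsum(steps)))
    gamma /= gamma[-1]
    c2_new = c2 * np.exp(-xi)                       # censoring point of q2 * g
    b, end = min(c1, c2_new), max(c1, c2_new)       # pivot point, end of the longer function
    scale = np.exp(xi / 2.0)
    # First term: elastically matched part on [0, b].
    t_in = np.linspace(0.0, b, n)
    x = t_in / b
    q2_in = q2(np.exp(xi) * b * np.interp(x, s, gamma)) * scale * np.interp(x, s, psi)
    matched = trapz((q1(t_in) - q2_in) ** 2, t_in)
    # Second term: unmatched tail on [b, end], weighted by lam.
    unmatched = 0.0
    if end > b:
        t_out = np.linspace(b, end, n)
        unmatched = trapz((q1(t_out) - q2(np.exp(xi) * t_out) * scale) ** 2, t_out)
    return matched + lam * unmatched
```

Lowering `lam` shrinks only the contribution of the tail beyond the pivot point, which is what lets the optimizer tolerate a larger endpoint disparity.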

We note that for λ ≠ 1, the energy E is no longer the square of a proper metric, and some of the nice mathematical properties of a quotient space metric are lost. The energy minimization process results in a measure of “dissimilarity” rather than a proper distance squared. One can still perform a clustering analysis with these tools, as we demonstrate in the next Section.

4. Experimental Results

In this section we present some experiments demonstrating the strengths of this framework. We first introduce the two functional data sets – one simulated and one real – that we use to demonstrate and compare methods.

Dataset 1:

The simulated data set consists of 51 functions, each with 100 sample points, and separated into three classes of 17 functions each. The functions are all based on a mixture of three Gaussians on the interval [0, 1] with means at t = 0.3, t = 0.5, and t = 0.7 and with equal variance. The functions within each class have the same mixture coefficients, where, for class 1, 2 and 3, we use the coefficient pairs (0.2, 0.5, 0.8), (0.5, 0.5, 0.5), and (0.8, 0.5, 0.2), respectively. To generate a function f in the simulated dataset, we start with the appropriate Gaussian mixture on [0, 1] and alter it in the following manner: (1) select b ~ Uniform[0.625, 1] and truncate f to the interval [0, b]; (2) select a ~ Uniform[0.9, 1.1] and apply the time-scaling function h(t) = at to f; and (3) apply a random time-warping diffeomorphism in Γ_b/a to f. Fig. 4 shows the resulting simulated data set, colored according to class label with original uncensored functions plotted as dashed black lines.
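The three generation steps can be sketched as below. Several details are not specified in the text and are assumptions of this sketch: the common standard deviation `sigma`, the use of unnormalized Gaussian bumps, and the one-parameter power-law family used to draw the random warping. The function name is hypothetical.

```python
import numpy as np

def simulate_censored_function(coeffs, rng, n=100, sigma=0.1):
    """Generate one censored, scaled, and warped Gaussian-mixture function."""
    means = np.array([0.3, 0.5, 0.7])

    def mixture(u):
        # ASSUMPTION: unnormalized Gaussian bumps with common width sigma
        return sum(c * np.exp(-0.5 * ((u - m) / sigma) ** 2) for c, m in zip(coeffs, means))

    b = rng.uniform(0.625, 1.0)               # (1) truncation point
    a = rng.uniform(0.9, 1.1)                 # (2) time-scaling factor, h(t) = a t
    end = b / a                               # right endpoint after scaling
    t = np.linspace(0.0, end, n)
    power = np.exp(rng.uniform(-0.3, 0.3))    # ASSUMPTION: power-law warping family
    gamma = end * (t / end) ** power          # (3) a diffeomorphism of [0, b/a]
    return t, mixture(a * gamma), a, b
```

For example, `simulate_censored_function((0.2, 0.5, 0.8), np.random.default_rng(0))` draws one function with the class 1 coefficients on a domain [0, b/a].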

Fig. 4. — Simulated data set with three classes. The left-most panel shows all functions plotted in the same window and colored according to class label. The remaining three panels plot each of the three classes separately.

Dataset 2:

The real data used here comes from daily COVID-19 infection rates for each of the 50 United States plus the District of Columbia and Puerto Rico plus 47 European countries. The daily new case data is preprocessed via the following procedure. We translate each curve so that t = 0 represents the day of the first recorded infection. Thus, t ∈ [0, ∞) represents the number of days since a state or country’s first case. Then, we apply a seven day moving average to smooth the data, and then we re-sample each curve via spline interpolation to have 100 uniformly spaced time sample points. Finally, we normalize the rate curve so that it has integral 1. We execute this process three times, each on raw data truncated at different ending dates – July 31, September 30, and November 30 of the year 2020 – to create three datasets of normalized infection rate curves. Fig. 5 shows a plot of all 99 normalized infection rate curves for the United States and Europe. The first two columns separate the US and European curves, respectively, for visual purposes.
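A minimal sketch of this preprocessing pipeline for one region is given below. It makes several assumptions that the text does not fix: the moving average is centered with zero padding at the edges, linear interpolation (`np.interp`) is swapped in for the spline interpolation used in the paper, the integral is taken over the time axis in days, and the input contains at least one positive entry and at least `window` days after the first case. The function name is hypothetical.

```python
import numpy as np

def preprocess_rate_curve(daily_new_cases, n_samples=100, window=7):
    """Shift to first case, smooth, resample to n_samples points, normalize to unit integral."""
    x = np.asarray(daily_new_cases, dtype=float)
    first = int(np.argmax(x > 0))                  # day of the first recorded infection
    x = x[first:]                                  # t = 0 is now the first-case day
    smooth = np.convolve(x, np.ones(window) / window, mode="same")  # moving average
    days = np.arange(len(x), dtype=float)
    t = np.linspace(0.0, days[-1], n_samples)      # uniformly spaced sample times
    f = np.interp(t, days, smooth)                 # linear stand-in for spline interpolation
    area = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))
    return t, f / area                             # rate curve with integral 1
```

Running this on raw series truncated at different ending dates yields the three datasets described above.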

Fig. 5. — United States and European COVID-19 normalized infection rate curves truncated at three different right boundaries – July 31, September 30, and November 30 of the year 2020. For each panel, the time axis indicates the number of days since the first recorded case. In the first and second columns, the US and European data is plotted separately for ease of visualization, and in the third column they are plotted together.

4.1. Pairwise Alignment Examples

Fig. 6 shows a few examples of pairwise alignments for both the simulated and the real data sets. For each function pair we compute both the standard elastic alignment with fixed and matched endpoints and our novel elastic partial matching with floating endpoints. For the partial matching, we use the modified energy function in Eq. 10 with λ = 0.01 for the simulated data and λ = 0.25 for the COVID-19 data. In each panel, we plot the original censored functions f₁ and f₂ in blue and red, respectively; the elastically registered version of f₂ in yellow with matched right endpoint; and the partially matched version via the methodology developed in Sections 2 and 3 in purple. In each case, the alignments are all noticeably different visually, with our elastic partial matching methodology providing a more natural registration than the standard elastic methodology. For some examples, we see that the curve with standard elastic registration is similar to our partially matched curve on a portion of the domain. However, as it approaches the endpoint of the fixed blue curve, the elastic registration tends to unnaturally compress or stretch the yellow curve in order to force the right endpoints to match. In other cases, the freedom offered by our partial matching methodology allows us to find a completely different matching across the entire domain. In particular, the elastic partial matching of North Dakota to Sweden’s infection rate curve allows for a surprisingly similar shape matching that the standard elastic registration was unable to uncover.

Below each pairwise matching result, we also show the corresponding optimal diffeomorphisms for the two elastic methods. Using the same color scheme and the same domain (time axis) scale, we plot the optimal $\hat{γ} \in Γ_{\hat{b}}$ and $\hat{g} \in G$ associated with the two elastic registrations. The circle along the purple colored diffeomorphism $\hat{g} \in G$ represents the point $\hat{g} (\hat{b})$ , where $\hat{b} = \min \{c_{1}, {\hat{c}}_{2}\}$ is the pivot point. Recall that beyond this point the function $\hat{g}$ is linear with slope $\hat{g} (\hat{b}) / \hat{b}$ and extends to infinity. In cases where the optimally matched second function (purple) is shorter than the first (blue), we do not plot $\hat{g} (t)$ beyond the pivot point $t = \hat{b}$ .

4.2. Algorithm Performance and Parameter Selection

Performance Analysis:

Here, we present four numerical experiments to demonstrate the numerical performance of the elastic partial matching algorithm under various settings.

Experiment (1): Using the COVID dataset, we compute and plot the average computation time versus the number of function sample points N_f for three algorithm configurations: (i) Using only the grid-search algorithm; (ii) Using the gradient descent algorithm initialized with the original f₂, i.e. “no init”; and (iii) Using the gradient descent algorithm initialized with the grid-search result from (i).
Experiment (2): For both the COVID and the simulated datasets, we compute the average shape distance d_s obtained from using the “no init” gradient descent method as well as using the grid-search initialized gradient descent method.
Experiment (3): For both the COVID and the simulated datasets, we compute the average shape distance d_s versus the number of grid points J in the grid-search initialization to the gradient descent method.
Experiment (4): For both the COVID and the simulated datasets, we test the effect of λ on the partial matching result by computing the average “overlap ratio,” i.e. the proportion of function domain overlap after alignment, for several values of λ.

The parameters used for each experiment are the following. For all implementations of the gradient descent method, we use ϵ = 10⁻³, δ = 10⁻⁴, maxit = 300, β = 0.1, and τ = 0.5. Also, for all implementations of the grid-search method, for any value of J, we set the grid {ξ_i, i = 1, . . . , J} to be J uniformly spaced values between ξ₁ = −log(2) and ξ_J = log(2). Note that we form $a_{i} = e^{ξ_{i}}$ for use in Algorithm 1. In Experiments (1), (2), and (3) where λ is fixed, we use λ = 0.01 for the simulated dataset and λ = 0.25 for the COVID dataset. In Experiments (2), (3), and (4) where N_f is fixed, we use N_f = 100. In Experiments (1), (2), and (4) where J is fixed, we use J = 50. For Experiments (1), (2), and (3), we average results over 25 uniquely and randomly selected dataset pairs, and for Experiment (4), we use 100 unique and random dataset pairs. For Experiment (1) we use N_f = [25, 38, 56, 84, 127, 190, 285, 427, 641], for Experiment (3), we use J = [3, 6, 12, 24, 48, 96, 192, 384], and for Experiment (4) we use λ = [10⁻², . . . , 10¹] where the exponents are 10 uniformly spaced values between −2 and 1. Fig. 7 plots the results of Experiments (1) through (4) from left to right.
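The grids described above are straightforward to construct; the short sketch below does so with NumPy. The variable names and the placeholder censoring point `c2 = 1.0` are illustrative assumptions.

```python
import numpy as np

J = 50
xi_grid = np.linspace(-np.log(2.0), np.log(2.0), J)   # xi_1 = -log 2, ..., xi_J = log 2
a_grid = np.exp(xi_grid)                              # a_i = e^{xi_i}, from 1/2 to 2
lam_grid = np.logspace(-2.0, 1.0, 10)                 # lambda values for Experiment (4)

c2 = 1.0                                              # hypothetical original censoring point
c2_candidates = c2 * np.exp(-xi_grid)                 # proposed censoring points c2 e^{-xi}
```

With this range the proposed censoring points span the interval from c₂/2 to 2c₂.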

For Experiment (1), one can see from the left-most plot of Fig. 7 that the gradient descent computation time scales well in N_f compared to that of the grid-search algorithm. The complexity of the grid-search algorithm is determined almost entirely by Step 7 of Algorithm 1, the Dynamic Programming algorithm. One can also see that the gradient descent time is reduced for any value of N_f when provided with a good initialization. In fact, Experiment (2) shows that not only is the computation time reduced, but also the solution quality improves with a good initialization. Since the optimization is complex and over a high-dimensional space, the gradient descent method returns a highly localized solution. However, the grid-search method using the Dynamic Programming subroutine achieves a globally optimal solution based on the selected time-scaling grid points and function discretization. Thus, using the gradient descent method as a local refinement to the grid-search result can yield near globally optimal results with a large enough number of grid points and function samples. Experiment (3) explores the trade-off between the number of grid points J over a fixed range and the solution quality for each dataset. In both cases, the average shape distance improves only slightly after J = 24. Ultimately, since the computation time is not overwhelmingly large for N_f = 100, we select a larger value of J = 50 for the remainder of our results.

Remark:

Another important setting for the grid-search algorithm is the range of the grid points. Here, we use a range that allows for proposed censoring points to lie between c₂/2 and 2c₂, i.e. between half and twice the original censoring point. Of course, since the optimal solution of Algorithm 1 serves as the initialization for the gradient descent algorithm, the final censoring point ${\hat{c}}_{2}$ could lie anywhere on $ℝ_{+}$ . However, since there are typically many local solutions to the energy functional, the grid range tends to influence the range of ${\hat{c}}_{2}$ greatly.

Selection of λ:

In addition to the range of the grid points, the parameter λ in Eq. 10 is a free parameter that controls the amount of time-scaling freedom, or endpoint disparity, allowed by the elastic partial matching algorithms. A lower value of λ allows for a potentially larger difference in right endpoints as the optimal solution, whereas a higher value of λ will tend to force the endpoints to be closer together. The selection of λ could be guided by the application at hand and any prior knowledge of its physical limitations. For example, if the dataset is of human growth rate curves measured up to a particular age, as in the Berkeley growth curves [24], it makes sense to reduce endpoint disparity. Individuals’ biological clocks vary, but not by a lot. However, with states’ virus infection rate curves, there are many more factors at play that could accelerate or delay virus reproduction over time, such as population density and local policies. For this reason, it makes sense to allow for more endpoint disparity for the COVID-19 application than for human growth rate curves.

Experiment (4) helps to guide us to a data-driven selection of λ for each dataset using the overlap ratio, which is defined for a pairwise alignment as $ρ = \min ({c_{1}, {\hat{c}}_{2}}) / \max ({c_{1}, {\hat{c}}_{2}})$ . As shown in the right-most panel of Fig. 7, which plots the average overlap ratio $\bar{ρ}$ versus λ, the curves resemble logistic-type S-curves with an upper asymptote of 1 and lower asymptote of about 0.82 in both cases. With this plot, we select an appropriate λ for each dataset as follows. For the simulated data, we select a relatively small value of λ = 0.01 according to the $\bar{ρ}$ vs λ plot to allow for more flexible endpoints. For the COVID dataset, we anticipate smaller flexibility and select a λ = 0.25 that yields an intermediary value of $\bar{ρ}$ , the inflection point of the $\bar{ρ}$ vs λ S-curve.
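The overlap ratio and its average over a set of pairwise alignments can be computed as in the short sketch below; the function names are illustrative.

```python
import numpy as np

def overlap_ratio(c1, c2_hat):
    """rho = min(c1, c2_hat) / max(c1, c2_hat) for one pairwise alignment."""
    return min(c1, c2_hat) / max(c1, c2_hat)

def mean_overlap_ratio(pairs):
    """Average rho over an iterable of (c1, c2_hat) pairs."""
    return float(np.mean([overlap_ratio(c1, c2_hat) for c1, c2_hat in pairs]))
```

A ratio of 1 means the two right endpoints coincide after alignment, so larger values of λ are expected to push the average toward 1.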

4.3. Bayesian Clustering of Functional Data

Next, we perform unsupervised clustering of the data using three different pairwise distance (or dissimilarity) measures and compare the results. The three methods are: (a) the standard $L^{2}$ metric with fixed and matched right endpoints, (b) the standard elastic metric with fixed and matched right endpoints, and (c) our novel elastic partial matching of SRVFs with floating right endpoints. Note that method (a) operates on the function values themselves while methods (b) and (c) use SRVFs for comparison. For method (c), we use the grid-search method (Algorithm 1) to initialize the gradient descent method (Algorithm 2). For Algorithm 1, we use the same grid that we used for Experiments (1), (3), and (4) given in Subsection 4.2; additionally, we use the same gradient descent input parameters for Algorithm 2 as stated in that same Subsection. Finally, as stated earlier, for method (c) we use a weighting coefficient of λ = 0.01 for the simulated data and λ = 0.25 for the COVID data, and thus the pairwise comparison in this method is not a proper distance but rather a more general dissimilarity measure.

We use the clustering technique described in [34] to cluster the data for each case. This clustering method is Bayesian in nature and uses only a pairwise similarity matrix S as input – not the functional data itself – to determine the optimal number of clusters and the cluster members in an MCMC approach. It assumes a Wishart prior on the similarity matrix and uses a variant of the Chinese Restaurant Process to help determine the number of clusters. In the associated clustering software there are several hyperparameters that must be set; however, we simply use all default settings provided in the code, with an initial number of clusters set to 3 for the simulated data and 10 for the real data, to obtain clustering results. Here, we use the conversion formula S_ij = 1 − D_ij/max{D}, where D is the pairwise distance (or dissimilarity) matrix. For method (c), the entries D_ij are equal to the square root of the energy functional after alignment. This conversion formula ensures that all entries of the similarity matrix S are scaled similarly between 0 and 1.
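The conversion from a pairwise distance (or dissimilarity) matrix to the similarity matrix used as clustering input is a one-liner; a sketch with an illustrative function name follows. For method (c), the input entries would be the square roots of the aligned energies, as stated above.

```python
import numpy as np

def similarity_from_distance(D):
    """S_ij = 1 - D_ij / max(D), which maps distances into [0, 1]."""
    D = np.asarray(D, dtype=float)
    return 1.0 - D / D.max()
```

The diagonal of S equals 1 whenever the diagonal of D is zero, and the most dissimilar pair receives similarity 0.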

Dataset 1:

Fig. 8 shows the result of the Bayesian clustering on the simulated dataset for each of the three methods: (a), (b), and (c). From left to right, we show the pairwise similarity matrices, the color-coded block diagonal class inclusion matrices, the function clusters with consistent color scheme, and the associated cluster means for each cluster. In order to form block diagonal matrices, the data set indices have been optimally permuted as a result of the clustering algorithm. One can see that the number of clusters (3) is correct only for the elastic partial matching method shown in row (c). Furthermore, the cluster labeling is 100% correct for this method.

In both methods (b) and (c), the tiled cluster display shows functions that have been jointly aligned according to their respective alignment methods. For method (b), the joint alignment is a standard one, but for method (c) we use the following iterative approach. The idea is to find a template with the most “information” and then align each function to this template, in each iteration. How is such a template found? For any two functions, we first align them using our method and label the one with the larger censoring point (after the registration) as the one with more information. For a set of K functions, we select the initial template to be the function with the largest censoring point and then perform K − 1 pairwise alignments to that function. If after the alignments the function with the largest censoring point is the same function as the template, we have found the function with the most information and terminate the iterations; otherwise, we repeat the process with the new template function set as the function with the largest censoring point after the alignments. One can see that the functions align well within the clusters for our method, showcasing the three distinct classes. Since the data is partially observed, the other two methods tend to mislabel and/or divide the three classes unnecessarily into further subclasses.
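The template search described above can be sketched as a simple loop. In this sketch the pairwise alignment itself is abstracted into a hypothetical callable `aligned_c(template, j)` that returns the censoring point of function j after aligning it to the template; the assumptions that the template keeps its own censoring point, that each round aligns the original functions, and that a `max_iter` cap is used are ours.

```python
import numpy as np

def select_template(c, aligned_c, max_iter=50):
    """Iterate until the template is also the function with the largest censoring point.

    c         : initial censoring points of the K functions
    aligned_c : hypothetical callable (template, j) -> censoring point of j after alignment
    """
    template = int(np.argmax(c))                       # start from the longest function
    for _ in range(max_iter):
        c_new = np.array([c[template] if j == template else aligned_c(template, j)
                          for j in range(len(c))])     # K - 1 pairwise alignments
        best = int(np.argmax(c_new))
        if best == template:                           # template carries the most information
            return template
        template = best                                # otherwise switch template and repeat
    return template
```

The cap on iterations is only a safeguard against cycling between candidate templates.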

The right-most panels in Fig. 8 show the cluster members as well as the cross-sectional pointwise mean for each cluster. Additionally, we compute the average cross-sectional variance and display that value in the bottom of each panel. As is evident from these numbers, the standard $L^{2}$ analysis leads to artificially inflated variances due to lack of alignment. The standard elastic analysis helps to bring down the variance within clusters by aligning peaks and valleys, but since the data is partially observed, the restriction of identical right endpoints still drives up the variance unnecessarily. The average cross-sectional variances are 80.5 × 10⁻⁴ for method (a), 21.8 × 10⁻⁴ for method (b), and 0.342 × 10⁻⁴ for method (c).

Dataset 2:

Next, we apply the clustering algorithm on the real COVID-19 datasets. Figs. 9, 10, and 11 show the results of the Bayesian clustering using methods (a), (b), and (c), respectively. The organization of each of the three figures is the following. Groups of three rows show results for data up to July 31, September 30, and November 30, respectively. The first group of three rows shows the color-coded country/state maps, the permuted pairwise similarity matrices, and the associated block-diagonal class inclusion matrices with the same color scheme as their associated maps. The second group of three rows shows the resulting function clusters along with their cross-sectional mean functions in black and average cross-sectional variance value. Similar to Fig. 8 with methods (b) and (c), the cluster members in Figs. 10 and 11 are mutually aligned within clusters in the tiled cluster display. Also similarly, the vertical axis has been scaled in each cluster tile so that the maximum value is equal to the maximum function value in each cluster set.

Fig. 9. — Clustering results on the COVID-19 data set for the $L^{2}$ metric, method (a). Rows 1 and 4 correspond to the July 31 dataset, rows 2 and 5 correspond to the September 30 dataset, and rows 3 and 6 to the November 30 dataset. The first three rows show a US state & European country map colorized according to class label, the pairwise similarity matrix, and the pairwise class inclusion matrix. The next three rows show the corresponding clusters and their cross-sectional mean curves in black. The average cross-sectional variance (×10⁻⁶) is shown in the upper left corner of each cluster panel.

Fig. 10. — Clustering results on the COVID-19 data set for the elastic metric with fixed endpoints, method (b). The figure description is the same as that of Fig. 9.

Fig. 11. — Clustering results on the COVID-19 data set for the elastic metric with partial matching, method (c). The figure description is the same as that of Fig. 9.

One can draw the following inferences from the COVID-19 results. Firstly, there is a high degree of geographical correlation of cluster members; i.e. neighboring states/countries are typically more likely to belong to the same cluster than not. This geographical correlation is stronger for method (c) compared to others. Secondly, method (c) yields fewer clusters than the other two methods on average due to its relative invariance to partial observations. Since method (c) tends to eliminate clusters with redundant shapes, it preserves true virus trajectory features, and groups states/countries more accurately compared to methods (a) and (b). Note that shape-based features that discriminate between clusters present themselves as the number and relative intensity of waves of virus transmission. Finally, we can see that method (c) yields the lowest within-class cross-sectional variance on average per cluster, suggesting that these clusters are the tightest overall of the three methods. The mean values over all clusters for the three methods are the following: (a) 6.03 × 10⁻⁶, (b) 7.38 × 10⁻⁶, and (c) 5.82 × 10⁻⁶.

Another important observation that we can make is that the COVID-19 data clusters are more similar to each other with respect to method (c) than with other methods. This phenomenon is in stark contrast to that of the simulated data, where clusters are very distinct and separable. Fig. 12 provides evidence of the above claim. Here, each panel shows the kernel density estimated probability density functions for within-class similarity values (blue) and across-class similarity values (red) for each dataset and method. The number shown in each panel is our measure of density separability, which is computed as $\cos^{- 1} (\langle \sqrt{f_{i n}}, \sqrt{f_{o u t}} \rangle)$ , where f_in is the estimated within-class similarity value density and f_out is the estimated across-class similarity value density (see [27]). The separability measure for the simulated data increases for methods (a), (b), and (c), respectively, with values of 1.03, 1.33, and 1.57; in contrast, for the COVID data the separability values decrease for methods (a), (b), and (c), respectively, with values of 0.904, 0.684, and 0.634. This result provides evidence that states and countries have overall more similar virus trajectories with respect to method (c) than the other methods, and since the opposite effect is observed in simulated data, this phenomenon is due to the input data and not an artifact of the methodology used.
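Given two densities already estimated on a common grid, the separability measure can be computed as in the sketch below. The kernel density estimation step is not reproduced here; the function name, the renormalization, and the clipping that guards against round-off are assumptions of this sketch.

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal rule on a sampled grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def separability(f_in, f_out, x):
    """arccos of the L2 inner product of sqrt(f_in) and sqrt(f_out) on the grid x."""
    f_in = f_in / trapz(f_in, x)            # renormalize to unit integral
    f_out = f_out / trapz(f_out, x)
    inner = trapz(np.sqrt(f_in * f_out), x)
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))
```

The measure ranges from 0 for identical densities to π/2 ≈ 1.57 for densities with disjoint supports.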

Fig. 12. — Estimated density functions of within-class and across-class pairwise similarity values. The top row corresponds to the simulated dataset, and the bottom row to the real COVID-19 datasets. The columns from left to right correspond to methods (a), (b), and (c) respectively. The number value in the top left of each panel indicates the distance measure, or separability, of each pair.

5. Summary & Future Work

This paper presents a novel approach to overcoming an important limitation of the past works in elastic matching, comparisons, and analysis of functional data, namely that the boundaries are fixed and registered. In addition to exhibiting phase variability within the observation interval, real data often also exhibit boundary variability, and the past methods fail to account for such boundary issues. The proposed elastic partial matching method forms a joint action of time-warping and time-scaling groups and searches for an optimal nonlinear warping along with a sliding right boundary to best match functions. As seen through experiments on simulated and COVID-19 rate curves, this additional freedom allows for more natural alignments and tighter, more visually distinct clusters compared to past methods.

Although the use of a Riemannian elastic framework was motivated from a statistical perspective, the subsequent statistical tools are not developed in this paper due to a lack of space. The shape distance d_s defined here is a proper distance on the quotient space $ℱ_{0} / G$ and can be used to define sample means, sample covariance, and tangent PCA based statistical models. These models can be used further for testing and classification of functional data.

This framework can easily be applied to the problem of partially matching shapes of curves in Euclidean spaces [1], [11], [26], [33]. Very often, planar curves extracted from image data suffer from partial obscuration and missing parts. The proposed elastic partial framework naturally extends to curves in $ℝ^{n}$ and can be very useful in matching partially observed shapes in that context. Another important application of our elastic partial matching framework is human activity analysis [3], [4], [32]. As stated in these papers, classification of observed activities requires temporal synchronization, modulo variable execution rates, and this matching process can benefit from flexible boundaries.

Supplementary Material

supp1-3130535

NIHMS1848507-supplement-supp1-3130535.pdf (123.2 KB, pdf)

Acknowledgements

The authors are grateful to Prof. Eric Klassen of FSU for helpful discussions on topics covered in this paper. This research was supported in part by the grants NSF CDS&E DMS 1621787, NIH R01 GM135927, and NSF CDS&E DMS 1953087 to AS; and by the Naval Innovative Science and Engineering (NISE) program at the Naval Surface Warfare Center Panama City Division (NSWC PCD) to DB.

Biographies

Darshan Bryner is a research scientist and branch manager at the Naval Surface Warfare Center Panama City Division (NSWC PCD). He obtained his Ph.D. degree in Statistics from Florida State University in 2013, where his dissertation research focused on developing novel statistical shape models to improve image segmentation quality. As a research scientist at NSWC PCD, Dr. Bryner has conducted basic and applied research in the fields of statistical computer vision, functional data analysis, spatial statistics, optimization on nonlinear manifolds, and bioinformatics, leading to several publications in top-tier peer-reviewed journals and conferences. Since 2017, Dr. Bryner has served as the head of the Advanced Signal Processing and Automatic Target Recognition Branch (Code X23) at NSWC PCD.

Anuj Srivastava is a Professor and a Distinguished Research Professor at Florida State University. His research interests include statistical analysis on nonlinear manifolds, statistical computer vision, functional data analysis, and shape analysis. He is a fellow of AAAS, IAPR, IEEE, and ASA. He has held several visiting positions at European institutions, including INRIA, France; the University of Lille, France; and Durham University, UK. He has coauthored more than 230 papers in peer-reviewed journals and top-tier conferences, as well as several books, including the 2016 Springer textbook "Functional and Shape Data Analysis".

Contributor Information

Darshan Bryner, Naval Surface Warfare Center Panama City Division, Panama City, FL.

Anuj Srivastava, Department of Statistics, Florida State University, Tallahassee, FL.

References

[1] Maheshwari A, Sack JR, Shahbaz K, and Zarrabi-Zadeh H. Improved algorithms for partial curve matching. In Algorithms – ESA 2011, Lecture Notes in Computer Science, vol. 6942. Springer, Berlin, Heidelberg, 2011.
[2] Alt H and Godau M. Computing the Fréchet distance between two polygonal curves. International Journal of Computational Geometry and Applications, 5:75–91, March 1995.
[3] Ben Amor B, Su J, and Srivastava A. Action recognition using rate-invariant analysis of skeletal shape trajectories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1):1–13, 2016.
[4] Anirudh R, Turaga P, Su J, and Srivastava A. Elastic functional coding of Riemannian trajectories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(5):922–936, 2017.
[5] Belkis Altendji, Demongeot Jacques, Laksaci Ali, and Rachdi Mustapha. Functional data analysis: Estimation of the relative error in functional regression under random left-truncation. Journal of Nonparametric Statistics, 30(2):472–490, 2018.
[6] Buchin K, Buchin M, and Wang Y. Exact algorithms for partial curve matching via the Fréchet distance. In Proceedings of the 2009 Annual ACM-SIAM Symposium on Discrete Algorithms, pages 645–654, January 2009.
[7] Kong D, Ibrahim JG, Lee E, and Zhu H. FLCRM: Functional linear Cox regression model. Biometrics, 74(1):109–117, 2018.
[8] Delaigle Aurore and Hall Peter. Classification using censored functional data. Journal of the American Statistical Association, 108(504):1269–1283, 2013.
[9] Elmi Angelo F. Curve registration in functional data analysis with informatively censored event-times. PhD thesis, Univ. of Penn., 2009.
[10] Ferraty F and Vieu P. Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics. Springer, New York, 2006.
[11] Funkhouser T and Shilane P. Partial matching of 3D shapes with priority-driven search. In Eurographics Symposium on Geometry Processing, 2006.
[12] Hsing Tailen and Eubank Randall. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley, 2015.
[13] Huang W, Gallivan KA, Srivastava A, and Absil P-A. Riemannian optimization for registration of curves in elastic shape analysis. Journal of Mathematical Imaging and Vision, 54(3):320–343, 2016.
[14] Klein John P. and Moeschberger Melvin L. Survival Analysis: Techniques for Censored and Truncated Data. Springer Verlag, New York, 2003.
[15] Kneip A and Ramsay JO. Combining registration and fitting for functional models. Journal of the American Statistical Association, 103(483), 2008.
[16] Kokoszka Piotr and Reimherr Matthew. Introduction to Functional Data Analysis. Chapman and Hall, 2017.
[17] Lee Sungwon and Jung Sungkyu. Combined analysis of amplitude and phase variations in functional data. arXiv:1603.01775v2, 2017.
[18] Marron JS, Ramsay JO, Sangalli LM, and Srivastava A. Statistics of time warpings and phase variations. Electronic Journal of Statistics, 8(2):1697–1702, 2014.
[19] Marron JS, Ramsay JO, Sangalli LM, and Srivastava A. Functional data analysis of amplitude and phase variation. Statistical Science, 30(4):468–484, November 2015.
[20] McBride JC and Kimia BB. Archaeological fragment reconstruction using curve-matching. In 2003 Conference on Computer Vision and Pattern Recognition Workshop, volume 1, 2003.
[21] Cui M, Femiani J, Hu J, Wonka P, and Razdan A. Curve matching for open 2D curves. Pattern Recognition Letters, 30:1–10, 2009.
[22] Ramsay JO and Li X. Curve registration. Journal of the Royal Statistical Society, Ser. B, 60:351–363, 1998.
[23] Ramsay JO and Silverman BW. Functional Data Analysis, Second Edition. Springer Series in Statistics, 2005.
[24] Ramsay JO, Bock RD, and Gasser T. Comparison of height acceleration curves in the Fels, Zurich, and Berkeley growth data. Annals of Human Biology, 22(5):413–426, 1999.
[25] Robinson D. Functional data analysis and partial shape matching in square-root velocity framework. FSU Dissertation, 2012.
[26] Sebastian TB, Klein PN, and Kimia BB. On aligning curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(1):116–125, 2003.
[27] Srivastava A and Klassen E. Functional and Shape Data Analysis. Springer Series in Statistics, 2016.
[28] Srivastava A, Klassen E, Joshi SH, and Jermyn IH. Shape analysis of elastic curves in Euclidean spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(7):1415–1428, 2011.
[29] Srivastava A, Wu W, Kurtek S, Klassen E, and Marron JS. Registration of functional data using Fisher-Rao metric. arXiv:1103.3817, 2011.
[30] Takagishi M and Yadohisa H. Robust curve registration using the t distribution. Behaviormetrika, 46:177–198, 2019.
[31] Tucker JD, Wu W, and Srivastava A. Generative models for functional data using phase and amplitude separation. Computational Statistics and Data Analysis, 61:50–66, 2013.
[32] Veeraraghavan A, Srivastava A, Roy-Chowdhury AK, and Chellappa R. Rate-invariant recognition of humans and their activities. IEEE Transactions on Image Processing, 8(6):1326–1339, June 2009.
[33] Yang Chengzhuan, Wei Hui, and Yu Qian. A novel method for 2D nonrigid partial shape matching. Neurocomputing, 275:1160–1176, 2018.
[34] Zhang Z, Pati D, and Srivastava A. Bayesian clustering of shapes of curves. Journal of Statistical Planning and Inference, 166:171–186, 2015.
