On Matrix-Valued Monge–Kantorovich Optimal Mass Transport

Lipeng Ning; Tryphon T Georgiou; Allen Tannenbaum

doi:10.1109/TAC.2014.2350171

Author manuscript; available in PMC: 2016 Mar 18.

Published in final edited form as: IEEE Trans Automat Contr. 2014 Aug 21;60(2):373–382. doi: 10.1109/TAC.2014.2350171


PMCID: PMC4798256 NIHMSID: NIHMS757419 PMID: 26997667

Abstract

We present a particular formulation of optimal transport for matrix-valued density functions. Our aim is to devise a geometry which is suitable for comparing power spectral densities of multivariable time series. More specifically, the value of a power spectral density at a given frequency, which in the matricial case encodes power as well as directionality, is thought of as a proxy for a “matrix-valued mass density.” Optimal transport aims at establishing a natural metric in the space of such matrix-valued densities which takes into account differences between power across frequencies as well as misalignment of the corresponding principal axes. Thus, our transportation cost includes a cost of transference of power between frequencies together with a cost of rotating the principal directions of matrix densities. The two endpoint matrix-valued densities can be thought of as marginals of a joint matrix-valued density on a tensor product space. This joint density, very much as in the classical Monge–Kantorovich setting, can be thought to specify the transportation plan. Contrary to the classical setting, the optimal transport plan for matrices is no longer supported on a thin zero-measure set.

Index Terms: Convex optimization, matrix-valued density functions, optimal mass-transport

I. Introduction

The problem of optimal mass transport (OMT) dates back to the work of G. Monge in 1781 [1] while its modern formulation is due to L. Kantorovich [2]. In recent years the subject has been developing rapidly owing to its intrinsic significance and range of applications in physics, economics, and probability [3]–[5].

Our motivation for studying “matrix-valued transport” originates in the spectral analysis of multi-variable time series. Just as in scalar time series, spectral content is assessed based on (estimated) statistics of the underlying process, where these are simply moments of the corresponding power spectral density (PSD). Different metrics have been proposed to compare PSD’s for purposes of spectral approximation, estimation, and system modeling (see [6], [7] and the references therein). However, since spectra are estimated based on integrals, weak^* metrics¹ are preferable since they provide continuity of statistics with respect to perturbations in the PSD. Earlier metrics and so-called divergence measures typically fail in this respect (see [7]). For this reason, optimal mass transport, which endows the space of (scalar) probability/mass/power densities with a natural weak^* metric (the Wasserstein metric), is of particular interest. Our aim in this paper is to develop one possible generalization of the Wasserstein metric that allows comparison of matrix-valued density functions in a similar spirit.

The scalar OMT theory has been adapted in [8] to model slowly time-varying changes in power spectra of time series and has been used for statistical estimation, data assimilation, and morphing. While in scalar time series the power spectral content may drift across frequencies over time (e.g., when considering Doppler effects, echolocation of a moving target, etc.), in vector-valued time series the power spectral content may shift principal directions as well. In fact, such a rotation of the power-spectral content is typical in general antenna arrays when a scatterer changes position with respect to the array elements. Therefore, a concept of transport between matrix-valued densities requires that we take into account both the cost of shifting power across frequencies and the cost of rotating the corresponding principal axes. Besides our particular formulation of a “non-commutative” Monge–Kantorovich transportation problem and of a corresponding metric, the main results in this paper are that i) the optimal transport can be cast as a convex-optimization problem, ii) the geodesics and transport paths can be determined using convex programming, and iii) the optimal transport plan has support which, in contrast to the classical Monge–Kantorovich setting, is no longer contained in a thin zero-measure set. The relevance of the proposed metric is highlighted in examples on spectral morphing and spectral tracking in the final section of the paper.

II. Preliminaries on Optimal Mass Transport

Consider two probability density functions μ₀ and μ₁ supported on ℝ and let ℳ(μ₀, μ₁) be the set of probability measures m on ℝ × ℝ with μ₀ and μ₁ as marginals, i.e.,

\int_{ℝ} m (x, y) d y = μ_{0} (x), \int_{ℝ} m (x, y) d x = μ_{1} (y), m (x, y) \geq 0.

Clearly, ℳ(μ₀, μ₁) is not empty since already the product μ₀(x)μ₁(y) ∈ ℳ(μ₀, μ₁). Probability densities are thought of as distributions of mass and the optimal mass transport problem is to determine

T_{c} (μ_{0}, μ_{1}) : = inf_{m \in M (μ_{0}, μ_{1})} \int_{ℝ \times ℝ} c (x, y) m (x, y) d x d y

(1)

where c(x, y) is the cost of transporting one unit of mass from location x to y. In particular, when c(x, y) = |x − y|², the optimal cost gives rise to the 2-Wasserstein metric

W_{2} (μ_{0}, μ_{1}) = T_{2} {(μ_{0}, μ_{1})}^{\frac{1}{2}}

where

T_{2} (μ_{0}, μ_{1}) : = inf_{m \in M (μ_{0}, μ_{1})} \int_{ℝ \times ℝ} {∣ x - y ∣}^{2} m (x, y) d x d y .

(2)

In general, (1) is a linear program with dual

sup_{ϕ_{0}, ϕ_{1}} {\int_{ℝ} (ϕ_{0} (x) μ_{0} (x) - ϕ_{1} (x) μ_{1} (x)) d x ∣ ϕ_{0} (x) - ϕ_{1} (y) \leq c (x, y)}

(3)

where ϕ₀, ϕ₁ are continuous, see [3]. For the quadratic cost c(x, y) = |x − y|² and in one spatial dimension, 𝓣₂(μ₀, μ₁) can also be written explicitly in terms of the cumulative distribution functions

M_{i} (x) = \int_{- \infty}^{x} μ_{i} (s) d s for i = 0, 1

in the form

T_{2} (μ_{0}, μ_{1}) = \int_{0}^{1} {∣ M_{0}^{- 1} (t) - M_{1}^{- 1} (t) ∣}^{2} d t .

(4)
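For equal-weight empirical measures, formula (4) reduces to pairing sorted samples, because the quantile functions are then piecewise constant on a uniform grid. The following is a minimal sketch under that assumption; the function names and sample values are illustrative and do not come from the paper. The brute-force search over one-to-one pairings is included only as a sanity check on tiny inputs.

```python
from itertools import permutations


def t2_quantile(xs, ys):
    """Quadratic transport cost via (4) for two equal-weight samples of equal size."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    n = len(xs)
    # Sorting realizes the quantile functions M_0^{-1} and M_1^{-1} on a uniform grid.
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / n


def t2_bruteforce(xs, ys):
    """Minimum quadratic cost over all one-to-one pairings (tiny n only)."""
    n = len(xs)
    return min(
        sum((xs[i] - ys[p[i]]) ** 2 for i in range(n)) / n
        for p in permutations(range(n))
    )


xs = [5.0, 0.0, 2.0]  # hypothetical atoms of mu_0, each of mass 1/3
ys = [1.5, 7.0, 1.0]  # hypothetical atoms of mu_1, each of mass 1/3
print(t2_quantile(xs, ys), t2_bruteforce(xs, ys))
```

Both values agree on this example, which illustrates that the monotone (sorted) pairing is the cheapest one for the quadratic cost.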

In this case, the optimal joint probability density m ∈ ℳ(μ₀, μ₁) has support on (x, T (x)) where T (x) is the sub-differential of a convex lower semi-continuous function, see [3, p. 75]. More specifically, T (x) is uniquely defined by

M_{0} (x) = M_{1} (T (x)) .

(5)

Interestingly, a geodesic μ_τ (τ ∈ [0, 1]) between μ₀ and μ₁ can be written explicitly as well in terms of a corresponding cumulative function M_τ, for each τ, defined via

M_{τ} ((1 - τ) x + τ T (x)) = M_{0} (x) .

(6)

Indeed, it readily follows that:

\begin{array}{l} W_{2} (μ_{0}, μ_{τ}) = τ W_{2} (μ_{0}, μ_{1}) \\ W_{2} (μ_{τ}, μ_{1}) = (1 - τ) W_{2} (μ_{0}, μ_{1}) \end{array}

and that μ_τ (τ ∈ [0, 1]) is a geodesic.
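The two identities can be checked numerically for equal-weight empirical measures, where the map T of (5) pairs the i-th smallest atom of μ₀ with the i-th smallest atom of μ₁. This is only a sketch under that assumption; the helper names and sample values are hypothetical.

```python
def w2(xs, ys):
    """2-Wasserstein distance between equal-weight samples of equal size, via (4)."""
    n = len(xs)
    return (sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / n) ** 0.5


def geodesic_atoms(xs, ys, tau):
    """Atoms of mu_tau from (6): each x moves to (1 - tau) * x + tau * T(x)."""
    # For sorted samples, T sends the i-th smallest x to the i-th smallest y.
    return [(1 - tau) * a + tau * b for a, b in zip(sorted(xs), sorted(ys))]


xs = [0.0, 2.0, 5.0]  # hypothetical atoms of mu_0
ys = [1.0, 1.5, 7.0]  # hypothetical atoms of mu_1
tau = 0.25
zs = geodesic_atoms(xs, ys, tau)

# Each printed pair should agree up to rounding.
print(w2(xs, zs), tau * w2(xs, ys))
print(w2(zs, ys), (1 - tau) * w2(xs, ys))
```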

III. Matrix-Valued Optimal Mass Transport

We consider the one-dimensional family of matrix-valued functions

F : = {μ (\cdot) ∣ for x \in ℝ, μ {(x)}^{*} = μ (x) \in ℂ^{n \times n}, μ (x) \geq 0, tr (\int_{ℝ} μ (x) d x) = 1} .

These are Hermitian, positive semi-definite matrix-valued functions on ℝ normalized so that their trace integrates to 1. They will be referred to as matrix-valued densities and can be thought of as a generalization of probability density functions. The scalar-valued tr(μ(x)) represents mass at location x. Thus, all elements in ℱ have the same total mass over the support. Below, we motivate a particular cost of transportation between such matrix-valued functions and introduce a suitable generalization of the Monge–Kantorovich OMT to matrix-valued densities.

A. Tensor Product and Partial Trace

Consider two n-dimensional real or complex (Hilbert) spaces ℋ₀ and ℋ₁, let ℒ(ℋ₀) and ℒ(ℋ₁) denote the space of linear operators on ℋ₀ and ℋ₁, respectively, and let μ₀ ∈ ℒ(ℋ₀) and μ₁ ∈ ℒ(ℋ₁). Thus, in the present subsection, μ_i (i ∈ {0, 1}) are fixed matrices. We denote their tensor product by μ₀ ⊗ μ₁ ∈ ℒ(ℋ₀ ⊗ ℋ₁) which is formally defined via

μ_{0} \otimes μ_{1} : u \otimes v \mapsto μ_{0} u \otimes μ_{1} v .

Since our spaces are finite-dimensional, this can be identified with the Kronecker product of the corresponding matrix representations of the two operators. The space ℒ(ℋ₀ ⊗ ℋ₁) is the span of all products μ₀ ⊗ μ₁ with μ_i ∈ ℒ(ℋ_i) for i ∈ {0, 1}.

Consider μ ∈ ℒ(ℋ₀ ⊗ ℋ₁). The partial traces tr_ℋ₀ and tr_ℋ₁, or tr₀ and tr₁ for brevity, are linear maps

\begin{array}{l} {tr}_{1} : L (H_{0} \otimes H_{1}) \to L (H_{0}) : μ \mapsto {tr}_{1} (μ) \\ {tr}_{0} : L (H_{0} \otimes H_{1}) \to L (H_{1}) : μ \mapsto {tr}_{0} (μ) \end{array}

defined uniquely by the property that on simple products they act as follows:

{tr}_{1} (μ_{0} \otimes μ_{1}) = tr (μ_{1}) μ_{0} and {tr}_{0} (μ_{0} \otimes μ_{1}) = tr (μ_{0}) μ_{1}

for any μ₀ ∈ ℒ(ℋ₀) and μ₁ ∈ ℒ(ℋ₁). Alternatively, μ ∈ ℒ(ℋ₀ ⊗ ℋ₁) can be represented by a matrix [μ_{ik,ℓm}] of size n² × n² as it maps a basis element u_i ⊗ v_k ∈ ℋ₀ ⊗ ℋ₁ to Σ_{ℓ,m} μ_{ik,ℓm} u_ℓ ⊗ v_m. Then the partial trace tr₁(μ), for example, is represented by the n × n matrix with (i, ℓ)-th entry Σ_k μ_{ik,ℓk}, for 1 ≤ i, ℓ ≤ n. Likewise, the (k, m)-th entry of tr₀(μ) is Σ_i μ_{ik,im}, for 1 ≤ k, m ≤ n. See [9] for the significance of partial trace in the context of quantum mechanics.
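The index formulas for the two partial traces can be sketched directly on nested lists. The code below assumes both factors have the same dimension n and that the pair (i, k) is stored at flat index i·n + k, which matches the Kronecker-product convention; the function names are illustrative.

```python
def kron(a, b):
    """Kronecker product; entry ((i, k), (l, m)) equals a[i][l] * b[k][m]."""
    n, p = len(a), len(b)
    return [[a[i][l] * b[k][m] for l in range(n) for m in range(p)]
            for i in range(n) for k in range(p)]


def partial_trace_1(mu, n):
    """tr_1: trace out the second factor; entry (i, l) is sum_k mu[(i, k), (l, k)]."""
    return [[sum(mu[i * n + k][l * n + k] for k in range(n)) for l in range(n)]
            for i in range(n)]


def partial_trace_0(mu, n):
    """tr_0: trace out the first factor; entry (k, m) is sum_i mu[(i, k), (i, m)]."""
    return [[sum(mu[i * n + k][i * n + m] for i in range(n)) for m in range(n)]
            for k in range(n)]


a = [[1, 2], [3, 4]]  # arbitrary test matrix with tr(a) = 5
b = [[5, 6], [7, 8]]  # arbitrary test matrix with tr(b) = 13
mu = kron(a, b)
print(partial_trace_1(mu, 2))  # tr(b) * a
print(partial_trace_0(mu, 2))  # tr(a) * b
```

On simple products the output reproduces the defining property tr₁(μ₀ ⊗ μ₁) = tr(μ₁) μ₀ and tr₀(μ₀ ⊗ μ₁) = tr(μ₀) μ₁.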

B. Joint Matrix-Valued Density

We now return to considering matrix-valued density functions μ₀, μ₁ ∈ ℱ. A naive attempt is to seek a joint density m ≥ 0 having support on ℝ × ℝ and having μ₀, μ₁ as “marginals,” i.e., so that

\int_{ℝ} m (x, y) d y = μ_{0} (x), \int_{ℝ} m (x, y) d x = μ_{1} (y) .

(7)

However, in contrast to the scalar case, such an m does not exist in general. To see this, consider the case of matrix-valued measures

\begin{array}{l} μ_{0} (x) = [\begin{matrix} \frac{1}{2} & 0 \\ 0 & 0 \end{matrix}] δ (x - 1) + [\begin{matrix} 0 & 0 \\ 0 & \frac{1}{2} \end{matrix}] δ (x - 2), \\ μ_{1} (y) = [\begin{matrix} \frac{1}{4} & - \frac{1}{4} \\ - \frac{1}{4} & \frac{1}{4} \end{matrix}] δ (y - 1) + [\begin{matrix} \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} \end{matrix}] δ (y - 2) \end{array}

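where δ(·) denotes the Dirac delta. If m(x, y) were to exist, its support would be contained in {(1, 1), (1, 2), (2, 1), (2, 2)}. There cannot be a consistent selection of four 2 × 2 positive semi-definite matrices so that, in pairs, they sum up to the coefficients making up μ₀(x) and μ₁(y). Indeed, m(1, 1) + m(1, 2) = diag(1/2, 0) forces m(1, 1) to vanish outside its (1, 1) entry, and m(2, 1) + m(2, 2) = diag(0, 1/2) forces m(2, 1) to vanish outside its (2, 2) entry; hence m(1, 1) + m(2, 1) is diagonal and cannot equal the first coefficient of μ₁, whose off-diagonal entries are −1/4.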

Thus, any natural definition of a transportation plan requires that the joint density lives in a bigger space. A particular formulation is as follows: we seek

m (x, y) being an n^{2} \times n^{2} positive semi-definite matrix

(8a)

for (x, y) ∈ ℝ × ℝ, such that

m_{0} (x, y) : = {tr}_{1} (m (x, y)), m_{1} (x, y) : = {tr}_{0} (m (x, y))

(8b)

\int_{ℝ} m_{0} (x, y) d y = μ_{0} (x), \int_{ℝ} m_{1} (x, y) d x = μ_{1} (y)

(8c)

and denote

M (μ_{0}, μ_{1}) : = {m ∣ (8 a) - (8 c) are satisfied} .

Since μ₀ ⊗ μ₁ ∈ M(μ₀, μ₁), the set M(μ₀, μ₁) is clearly not empty.
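As an illustration, the product plan can be checked on the two-atom measures used in the counterexample above: although no plan of the form (7) exists for them, the lifted plan μ₀ ⊗ μ₁ satisfies (8b) and (8c), with integrals replaced by sums over atoms. This is a sketch with exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction as F


def kron(a, b):
    """Kronecker product; entry ((i, k), (l, m)) equals a[i][l] * b[k][m]."""
    n, p = len(a), len(b)
    return [[a[i][l] * b[k][m] for l in range(n) for m in range(p)]
            for i in range(n) for k in range(p)]


def tr1(mu, n):
    """Partial trace over the second factor."""
    return [[sum(mu[i * n + k][l * n + k] for k in range(n)) for l in range(n)]
            for i in range(n)]


def tr0(mu, n):
    """Partial trace over the first factor."""
    return [[sum(mu[i * n + k][i * n + m] for i in range(n)) for m in range(n)]
            for k in range(n)]


def add(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


h, q = F(1, 2), F(1, 4)
mu0 = [[[h, 0], [0, 0]], [[0, 0], [0, h]]]    # coefficients at x = 1, 2
mu1 = [[[q, -q], [-q, q]], [[q, q], [q, q]]]  # coefficients at y = 1, 2

# Product plan: a 4 x 4 block at each support point (x_i, y_j).
plan = [[kron(mu0[i], mu1[j]) for j in range(2)] for i in range(2)]
marg0 = [add(tr1(plan[i][0], 2), tr1(plan[i][1], 2)) for i in range(2)]
marg1 = [add(tr0(plan[0][j], 2), tr0(plan[1][j], 2)) for j in range(2)]
print(marg0 == mu0, marg1 == mu1)
```

Each block is positive semi-definite because the Kronecker product of positive semi-definite matrices is positive semi-definite, so (8a) holds as well.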

We next motivate a suitable transportation cost. This is a functional of the joint density m ∈ M(μ₀, μ₁), just as in the scalar case. However, besides penalizing transport of mass between two points x and y, we also impose a penalty on the corresponding rotation.

C. Transportation Cost

As indicated earlier, we interpret tr(m(x, y)) as the total “mass” that is being transferred from x to y. We consider a scalar cost² c(x, y) = (x − y)² and the “mass transference” cost

min_{m \in M (μ_{0}, μ_{1})} \int_{ℝ \times ℝ} c (x, y) tr (m (x, y)) d x d y .

(9)

This coincides with the optimal transportation cost between the scalar-valued densities tr(μ₀) and tr(μ₁). In particular, if tr(μ₀(x)) = tr(μ₁(x)) for all x, the optimal value of (9) is zero, since the problem reduces to optimal transport between identical scalar marginals. Hence, (9) fails to quantify mismatch of directionality between the given matrix-valued marginals. Below, we introduce a term that penalizes directionality mismatch.

We assume throughout that the marginals are positive definite pointwise. Then, for i ∈ {0, 1}, tr(μ_i(x)) represents the total mass at x while μ_i(x)/tr(μ_i(x)), normalized to have trace 1, encapsulates directional information. Likewise, for the joint density m(x, y), assuming that m(x, y) ≠ 0, we define the normalized partial traces

\begin{array}{l} {\underline{tr}}_{0} (m (x, y)) : = {tr}_{0} (m (x, y)) / tr (m (x, y)) \\ {\underline{tr}}_{1} (m (x, y)) : = {tr}_{1} (m (x, y)) / tr (m (x, y)) . \end{array}

Their difference captures the directional mismatch between the two partial traces. Hence, we introduce

tr ({‖ ({\underline{tr}}_{0} - {\underline{tr}}_{1}) m (x, y) ‖}_{F}^{2} m (x, y))

to quantify the rotational mismatch and we consider the cost functional

tr ((c (x, y) + λ {‖ ({\underline{tr}}_{0} - {\underline{tr}}_{1}) m (x, y) ‖}_{F}^{2}) m (x, y))

with λ > 0, to weigh in the relative significance of the linear and rotational penalties.

D. Optimal Transportation Problem

In view of the above, we define

T_{2, λ} (μ_{0}, μ_{1}) : = min_{m \in M (μ_{0}, μ_{1})} \int_{ℝ \times ℝ} tr ((c + λ {‖ ({\underline{tr}}_{0} - {\underline{tr}}_{1}) m ‖}_{F}^{2}) m) d x d y

(10)

with c(x, y) = (x − y)², and show next that (10) is in fact a convex optimization problem.

From the definition

\begin{array}{l} {\underline{tr}}_{0} (m) tr (m) = {tr}_{0} (m), \\ {\underline{tr}}_{1} (m) tr (m) = {tr}_{1} (m) \end{array}

and hence

\begin{array}{l} {‖ ({\underline{tr}}_{0} - {\underline{tr}}_{1}) m ‖}_{F}^{2} tr (m) = \frac{{‖ ({\underline{tr}}_{0} - {\underline{tr}}_{1}) m ‖}_{F}^{2} tr {(m)}^{2}}{tr (m)} \\ = \frac{{‖ ({tr}_{0} - {tr}_{1}) m ‖}_{F}^{2}}{tr (m)} . \end{array}
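The identity can be verified on a concrete matrix. The sketch below evaluates the rotational penalty in both forms, with normalized partial traces and as a ratio, for a product m = A ⊗ B with arbitrarily chosen A and B; exact rationals are used so that the two sides can be compared with equality. All names and values are illustrative.

```python
from fractions import Fraction as F


def kron(a, b):
    """Kronecker product; entry ((i, k), (l, m)) equals a[i][l] * b[k][m]."""
    n, p = len(a), len(b)
    return [[a[i][l] * b[k][m] for l in range(n) for m in range(p)]
            for i in range(n) for k in range(p)]


def tr1(mu, n):
    return [[sum(mu[i * n + k][l * n + k] for k in range(n)) for l in range(n)]
            for i in range(n)]


def tr0(mu, n):
    return [[sum(mu[i * n + k][i * n + m] for i in range(n)) for m in range(n)]
            for k in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def frob2(a, b):
    """Squared Frobenius norm of a - b (real entries)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def penalty_normalized(m, n):
    """|| normalized tr_0(m) - normalized tr_1(m) ||_F^2 * tr(m)."""
    t = trace(m)
    u0 = [[v / t for v in row] for row in tr0(m, n)]
    u1 = [[v / t for v in row] for row in tr1(m, n)]
    return frob2(u0, u1) * t


def penalty_ratio(m, n):
    """|| tr_0(m) - tr_1(m) ||_F^2 / tr(m)."""
    return frob2(tr0(m, n), tr1(m, n)) / trace(m)


A = [[F(2), F(0)], [F(0), F(1)]]  # arbitrary positive definite matrix
B = [[F(1), F(1)], [F(1), F(1)]]  # arbitrary rank-one positive semi-definite matrix
m = kron(A, B)
print(penalty_normalized(m, 2), penalty_ratio(m, 2))
```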

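Now let m(x, y) := tr(𝐦(x, y)) denote the scalar mass of the matrix-valued joint density, written here in boldface as 𝐦(x, y) to distinguish the two, and let m₀(x, y), m₁(x, y) be as in (8). The expression for the optimal cost in (10) is lower bounded by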

\begin{array}{l} min_{m_{0}, m_{1}, m} {\int (c (x, y) m (x, y) + λ \frac{{‖ m_{0} - m_{1} ‖}_{F}^{2}}{m}) d x d y ∣ \\ m_{0} (x, y), m_{1} (x, y) \geq 0, \\ tr (m_{0} (x, y)) = tr (m_{1} (x, y)) = m (x, y) \\ \int m_{0} (x, y) d y = μ_{0} (x), \int m_{1} (x, y) d x = μ_{1} (y)} . \end{array}

(11)

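For an optimal triple m̂, m̂₀, m̂₁ of (11), the matrix-valued density 𝐦̂ := (m̂₀ ⊗ m̂₁)/m̂, defined wherever m̂ > 0, satisfies tr₁(𝐦̂) = m̂₀, tr₀(𝐦̂) = m̂₁, and tr(𝐦̂) = m̂. It is therefore a minimizer of (10) that gives the same optimal value as (11). Thus, the optimal cost in (10) is equivalently written as (11).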

For x > 0, the expression (y − z)²/x is jointly convex in the arguments x, y, z, see e.g., [10, p. 72]. It readily follows that the integral in (11) is a jointly convex functional of its arguments. All additional constraints in (11) are convex as well and, therefore, so is the optimization problem.
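A quick numerical sanity check of the joint convexity claim tests the midpoint inequality f((p + q)/2) ≤ (f(p) + f(q))/2 at random points with x > 0. This is evidence rather than a proof; the sampling ranges, tolerance, and names below are arbitrary choices.

```python
import random


def f(x, y, z):
    """The quadratic-over-linear expression (y - z)^2 / x, defined for x > 0."""
    return (y - z) ** 2 / x


def midpoint_convexity_holds(trials=10000, seed=0):
    """Return True if no sampled pair of points violates midpoint convexity."""
    rng = random.Random(seed)

    def sample():
        return (rng.uniform(0.1, 5.0), rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0))

    for _ in range(trials):
        p, q = sample(), sample()
        mid = tuple((a + b) / 2 for a, b in zip(p, q))
        # Small tolerance absorbs floating-point rounding.
        if f(*mid) > (f(*p) + f(*q)) / 2 + 1e-9:
            return False
    return True


print(midpoint_convexity_holds())
```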

IV. On the Geometry of Optimal Mass Transport

An important result in the (scalar) OMT theory is that the transportation plan is the sub-differential of a convex function and has support on a thin zero-measure set, see e.g., [3, p. 92]. This property is not shared by the optimal transportation plan between matrix-valued density functions as we explain next.

In standard scalar OMT with convex transportation cost, the optimal transportation plan has a certain cyclically monotonic property [3]. More specifically, if (x₁, y₁), (x₂, y₂) are two points where the transportation plan has support (i.e., m(x, y) ≠ 0), then x₂ > x₁ implies y₂ ≥ y₁. The interpretation is that optimal transportation paths of mass elements do not cross. For the case of matrix-valued distributions as in (10), this property may not hold in the same way. However, interestingly, a weaker monotonicity property holds for the supporting set of the optimal matrix transportation plan. The property is defined next and the precise statement is given in Proposition 2 below.

Definition 1

A set 𝓢 ⊂ ℝ² is called ρ-monotonically nondecreasing, for ρ > 0, if for any two points (x₁, y₁), (x₂, y₂) ∈ 𝓢, it holds that

(x_{2} - x_{1}) (y_{1} - y_{2}) \leq ρ .

A geometric interpretation for a ρ-monotonically non-decreasing set is that if (x₁, y₁), (x₂, y₂) ∈ 𝓢 and x₂ > x₁, y₁ > y₂, then the area of the rectangle with vertices (x_i, y_j) (i, j ∈ {1, 2}) is not larger than ρ. The transportation plan of the scalar-valued optimal transportation problem with a quadratic cost has support on a 0-monotonically non-decreasing set.
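For a finite set of support points, Definition 1 can be checked by enumerating pairs. The sketch below returns the smallest ρ ≥ 0 for which a given set satisfies the inequality; the function names and the two example sets are illustrative.

```python
def monotonicity_defect(points):
    """Largest value of (x2 - x1) * (y1 - y2) over all pairs of points.

    Pairing a point with itself contributes 0, so the result is never negative.
    """
    return max(
        ((x2 - x1) * (y1 - y2) for (x1, y1) in points for (x2, y2) in points),
        default=0.0,
    )


def is_rho_monotone(points, rho):
    """True if the finite set satisfies the inequality of Definition 1 for this rho."""
    return monotonicity_defect(points) <= rho


non_crossing = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.5)]  # transport paths do not cross
crossing = [(0.0, 1.0), (1.0, 0.0)]                  # two paths that cross
print(monotonicity_defect(non_crossing), monotonicity_defect(crossing))
```

The crossing pair spans a unit square, so it is ρ-monotonically nondecreasing exactly when ρ ≥ 1.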

Proposition 2

Given μ₀, μ₁ ∈ ℱ, let m be the optimal transportation plan in (10) with c(x, y) = (x − y)² and λ > 0. Then m has support on at most a (4 · λ)-monotonically nondecreasing set.

Proof

See the Appendix.

Further, the optimal transportation cost 𝓣_{2,λ}(μ₀, μ₁) satisfies:

𝓣_{2,λ}(μ₀, μ₁) = 𝓣_{2,λ}(μ₁, μ₀),
𝓣_{2,λ}(μ₀, μ₁) ≥ 0,
𝓣_{2,λ}(μ₀, μ₁) = 0 if and only if μ₀ = μ₁.

Thus, although 𝓣_{2,λ}(μ₀, μ₁) can be used to compare matrix-valued densities, it is not a metric, and neither is 𝓣_{2,λ}^{1/2}, since the triangle inequality does not hold in general. We next introduce a slightly different formulation of a transportation problem which does give rise to a metric.

A. Optimal Transport on a Subset

In this subsection, we restrict attention to a certain subset of transport plans M(μ₀, μ₁) and show that the corresponding optimal transportation cost induces a metric. More specifically, let

M_{0} (μ_{0}, μ_{1}) : = {m ∣ m (x, y) = (μ_{0} (x) \otimes μ_{1} (y)) a (x, y), m \in M} .

For m(x, y) ∈ M₀(μ₀, μ₁),

\begin{array}{l} {\underline{tr}}_{0} (m (x, y)) : = μ_{1} (y) / tr (μ_{1} (y)) \\ {\underline{tr}}_{1} (m (x, y)) : = μ_{0} (x) / tr (μ_{0} (x)) . \end{array}

Given μ₀ and μ₁, the “orientation” of the mass of m(x, y) is fixed. Thus, in this case, the optimal transportation cost is

{\tilde{T}}_{2, λ} (μ_{0}, μ_{1}) : = min_{m \in M_{0} (μ_{0}, μ_{1})} \int tr ((c + λ {‖ ({\underline{tr}}_{0} - {\underline{tr}}_{1}) m (x, y) ‖}_{F}^{2}) m) d x d y .

(12)
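For plans in the restricted set, the rotational term depends only on (x, y), so (12) becomes a scalar transport problem for tr(m) with the augmented cost (x − y)² + λ‖μ₀(x)/tr(μ₀(x)) − μ₁(y)/tr(μ₁(y))‖²_F. The sketch below solves it when each marginal has two atoms, in which case feasible plans form a one-parameter family and the cost is affine in that parameter. The function name, argument layout, and example data are assumptions made for illustration.

```python
def frob2(a, b):
    """Squared Frobenius norm of a - b (real entries)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def restricted_cost_two_atoms(xs, ps, dirs0, ys, qs, dirs1, lam):
    """Minimize (12) when each marginal has two atoms.

    ps, qs: scalar masses tr(mu_0(x_i)) and tr(mu_1(y_j)), each summing to 1.
    dirs0, dirs1: trace-one matrices mu_0(x_i)/tr(mu_0(x_i)) and mu_1(y_j)/tr(mu_1(y_j)).
    Returns (optimal cost, mass t sent from x_1 to y_1).
    """
    c = [[(xs[i] - ys[j]) ** 2 + lam * frob2(dirs0[i], dirs1[j]) for j in range(2)]
         for i in range(2)]

    def total(t):
        masses = [[t, ps[0] - t], [qs[0] - t, 1.0 - ps[0] - qs[0] + t]]
        return sum(masses[i][j] * c[i][j] for i in range(2) for j in range(2))

    lo, hi = max(0.0, ps[0] + qs[0] - 1.0), min(ps[0], qs[0])
    # The cost is affine in t, so a minimum is attained at an endpoint.
    return min((total(lo), lo), (total(hi), hi))


e1 = [[1.0, 0.0], [0.0, 0.0]]
e2 = [[0.0, 0.0], [0.0, 1.0]]
# Same locations and masses, but the directions at the two locations are swapped.
args = ([0.0, 1.0], [0.5, 0.5], [e1, e2], [0.0, 1.0], [0.5, 0.5], [e2, e1])
print(restricted_cost_two_atoms(*args, lam=0.1))
print(restricted_cost_two_atoms(*args, lam=1.0))
```

With the small weight the mass stays in place and pays the rotational penalty; with the larger weight it is cheaper to move mass between the two locations, so the transport paths cross. The rectangle spanned by the crossing points has area 1, which is within the bound 2λ = 2 of Proposition 4.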

Proposition 3

For 𝓣̃_{2,λ} as in (12) with λ > 0 and μ₀, μ₁ ∈ ℱ,

d_{2, λ} (μ_{0}, μ_{1}) : = {({\tilde{T}}_{2, λ} (μ_{0}, μ_{1}))}^{\frac{1}{2}}

(13)

defines a metric on ℱ.

Proof

It is straightforward to see that

d_{2, λ} (μ_{0}, μ_{1}) = d_{2, λ} (μ_{1}, μ_{0}) \geq 0

and that d_{2,λ}(μ₀, μ₁) = 0 if and only if μ₀ = μ₁. We now show that the triangle inequality holds as well. For μ₀, μ₁, μ₂ ∈ ℱ, let

\begin{array}{l} \mathbf{m}_{01} (x, y) = \frac{μ_{0} (x)}{tr (μ_{0} (x))} \otimes \frac{μ_{1} (y)}{tr (μ_{1} (y))} m_{01} (x, y) \\ \mathbf{m}_{12} (y, z) = \frac{μ_{1} (y)}{tr (μ_{1} (y))} \otimes \frac{μ_{2} (z)}{tr (μ_{2} (z))} m_{12} (y, z) \end{array}

denote the optimal transportation plans for the pairs (μ₀, μ₁) and (μ₁, μ₂), respectively, where boldface marks a matrix-valued plan and m₀₁ and m₁₂ are two (scalar-valued) joint densities on ℝ² with marginals tr(μ₀), tr(μ₁) and tr(μ₁), tr(μ₂), respectively. Given m₀₁(x, y) and m₁₂(y, z) there is a joint density function m(x, y, z) on ℝ³ with m₀₁ and m₁₂ as the marginals on the corresponding subspaces [3, p. 208]. We set

\mathbf{m} (x, y, z) = \frac{μ_{0} (x)}{tr (μ_{0} (x))} \otimes \frac{μ_{1} (y)}{tr (μ_{1} (y))} \otimes \frac{μ_{2} (z)}{tr (μ_{2} (z))} m (x, y, z)

and note that it has 𝐦₀₁ and 𝐦₁₂ as matrix-valued marginal distributions. Now, let 𝐦₀₂(x, z) = (μ₀(x)/tr μ₀(x)) ⊗ (μ₂(z)/tr μ₂(z)) m₀₂(x, z) be the marginal of 𝐦(x, y, z) when tracing out the y-component. This 𝐦₀₂(x, z) is a possible transportation plan between μ₀ and μ₂. Hence

\begin{array}{l} d_{2, λ} (μ_{0}, μ_{2}) \leq {(\int_{ℝ^{2}} ({(x - z)}^{2} + λ {‖ \frac{μ_{0} (x)}{tr μ_{0} (x)} - \frac{μ_{2} (z)}{tr μ_{2} (z)} ‖}_{F}^{2}) m_{02} d x d z)}^{\frac{1}{2}} \\ = {(\int_{ℝ^{3}} ({(x - z)}^{2} + λ {‖ \frac{μ_{0} (x)}{tr μ_{0} (x)} - \frac{μ_{2} (z)}{tr μ_{2} (z)} ‖}_{F}^{2}) m d x d y d z)}^{\frac{1}{2}} \\ = {(\int_{ℝ^{3}} ({(x - y + y - z)}^{2} + λ {‖ \frac{μ_{0} (x)}{tr μ_{0} (x)} - \frac{μ_{1} (y)}{tr μ_{1} (y)} + \frac{μ_{1} (y)}{tr μ_{1} (y)} - \frac{μ_{2} (z)}{tr μ_{2} (z)} ‖}_{F}^{2}) m d x d y d z)}^{\frac{1}{2}} \\ \leq {(\int_{ℝ^{2}} ({(x - y)}^{2} + λ {‖ \frac{μ_{0} (x)}{tr μ_{0} (x)} - \frac{μ_{1} (y)}{tr μ_{1} (y)} ‖}_{F}^{2}) m_{01} d x d y)}^{\frac{1}{2}} + {(\int_{ℝ^{2}} ({(y - z)}^{2} + λ {‖ \frac{μ_{1} (y)}{tr μ_{1} (y)} - \frac{μ_{2} (z)}{tr μ_{2} (z)} ‖}_{F}^{2}) m_{12} d y d z)}^{\frac{1}{2}} \\ = d_{2, λ} (μ_{0}, μ_{1}) + d_{2, λ} (μ_{1}, μ_{2}) \end{array}

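where the last inequality is the triangle inequality in L₂ (Minkowski's inequality) with respect to the density m(x, y, z), combined with the fact that m has m₀₁ and m₁₂ as marginals.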

If λ = 0, then 𝓣̃_{2,0}(μ₀, μ₁) is exactly the OMT cost between the scalar-valued densities tr(μ₀) and tr(μ₁), as was explained earlier. In particular, for an optimal (scalar) transportation plan m(x, y) between tr(μ₀) and tr(μ₁), the matrix-valued transportation plan 𝐦(x, y) = (μ₀(x)/tr(μ₀(x))) ⊗ (μ₁(y)/tr(μ₁(y))) m(x, y) is optimal between μ₀ and μ₁ and satisfies 𝓣̃_{2,0}(μ₀, μ₁) = 𝓣₂(tr(μ₀), tr(μ₁)). Thus, 𝓣̃_{2,0}(μ₀, μ₁) = 0 if and only if tr(μ₀) = tr(μ₁). Hence, 𝓣̃_{2,0} fails to be a metric. Moreover, since 𝓣_{2,λ} ≤ 𝓣̃_{2,λ} for any λ ≥ 0, if tr(μ₀) = tr(μ₁) then 𝓣_{2,0}(μ₀, μ₁) is also equal to zero.

Proposition 4

Given μ₀, μ₁ ∈ ℱ, let m be the optimal transportation plan in (12). Then m has support on at most a (2 · λ)-monotonically non-decreasing set.

Proof

We need to prove that if m(x₁, y₁) ≠ 0 and m(x₂, y₂) ≠ 0, then x₂ > x₁, y₁ > y₂ implies

(y_{1} - y_{2}) (x_{2} - x_{1}) \leq 2 λ .

(14)

Assume that m evaluated at the four points (x_i, y_j) with i, j ∈ {1, 2}, is as follows:

m (x_{i}, y_{j}) = m_{i j} \cdot A_{i} \otimes B_{j}

with

A_{i} = \frac{μ_{0} (x_{i})}{tr (μ_{0} (x_{i}))}, B_{j} = \frac{μ_{1} (y_{j})}{tr (μ_{1} (y_{j}))}

and m₁₁, m₂₂ > 0. The steps of the proof are similar to those of Proposition 2 detailed in the Appendix: first, we assume that Proposition 4 fails and that

(y_{1} - y_{2}) (x_{2} - x_{1}) > 2 λ .

Then we show that a smaller cost can be incurred by rearranging the “mass.” Consider the situation when m₂₂ ≥ m₁₁ first and let m̂ be a new transportation plan with

\begin{array}{l} \hat{m} (x_{1}, y_{1}) = 0 \\ \hat{m} (x_{1}, y_{2}) = (m_{11} + m_{12}) \cdot A_{1} \otimes B_{2} \\ \hat{m} (x_{2}, y_{1}) = (m_{11} + m_{21}) \cdot A_{2} \otimes B_{1} \\ \hat{m} (x_{2}, y_{2}) = (m_{22} - m_{11}) \cdot A_{2} \otimes B_{2} . \end{array}

Then, m̂, m have the same marginals at the four points, the cost incurred by m is

\sum_{i = 1}^{2} \sum_{j = 1}^{2} m_{i j} ({(x_{i} - y_{j})}^{2} + λ {‖ A_{i} - B_{j} ‖}_{F}^{2})

(15)

and the cost incurred by m̂ is

(m_{11} + m_{12}) ({(x_{1} - y_{2})}^{2} + λ {‖ A_{1} - B_{2} ‖}_{F}^{2}) + (m_{11} + m_{21}) ({(x_{2} - y_{1})}^{2} + λ {‖ A_{2} - B_{1} ‖}_{F}^{2}) + (m_{22} - m_{11}) ({(x_{2} - y_{2})}^{2} + λ {‖ A_{2} - B_{2} ‖}_{F}^{2}) .

(16)

To show that (15) is larger than (16), after canceling common terms, it suffices to show that

{(y_{1} - x_{1})}^{2} + {(y_{2} - x_{2})}^{2} + λ {‖ A_{1} - B_{1} ‖}_{F}^{2} + λ {‖ A_{2} - B_{2} ‖}_{F}^{2} \geq {(y_{2} - x_{1})}^{2} + {(y_{1} - x_{2})}^{2} + λ {‖ A_{1} - B_{2} ‖}_{F}^{2} + λ {‖ A_{2} - B_{1} ‖}_{F}^{2} .

However, the above holds true since

\begin{array}{l} {(y_{1} - x_{1})}^{2} + {(y_{2} - x_{2})}^{2} + λ {‖ A_{1} - B_{1} ‖}_{F}^{2} + λ {‖ A_{2} - B_{2} ‖}_{F}^{2} \geq {(y_{1} - x_{1})}^{2} + {(y_{2} - x_{2})}^{2} \\ = {(y_{1} - x_{2})}^{2} + {(y_{2} - x_{1})}^{2} + 2 (x_{2} - x_{1}) (y_{1} - y_{2}) \\ > {(y_{1} - x_{2})}^{2} + {(y_{2} - x_{1})}^{2} + 4 λ \\ \geq {(y_{1} - x_{2})}^{2} + {(y_{2} - x_{1})}^{2} + λ ({‖ A_{1} - B_{2} ‖}_{F}^{2} + {‖ A_{2} - B_{1} ‖}_{F}^{2}) . \end{array}

The last inequality follows from:

{‖ A_{1} - B_{2} ‖}_{F}^{2} = tr (A_{1}^{2} + B_{2}^{2} - 2 A_{1} B_{2}) \leq tr (A_{1}^{2} + B_{2}^{2}) \leq 2.

The case m₁₁ > m₂₂ proceeds similarly.
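The bound ‖A − B‖²_F ≤ 2 for trace-one positive semi-definite matrices, which drives the last inequality of the proof, can be probed numerically. The sketch below is restricted to random real symmetric 2 × 2 matrices, which is an assumption made for brevity (the paper allows Hermitian n × n matrices); all names are hypothetical.

```python
import random


def frob2(a, b):
    """Squared Frobenius norm of a - b (real entries)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def random_direction(rng):
    """Random real 2 x 2 positive semi-definite matrix normalized to unit trace."""
    v = [rng.gauss(0.0, 1.0) for _ in range(2)]
    w = [rng.gauss(0.0, 1.0) for _ in range(2)]
    # Sum of two outer products is positive semi-definite.
    g = [[v[i] * v[j] + w[i] * w[j] for j in range(2)] for i in range(2)]
    t = g[0][0] + g[1][1]
    return [[e / t for e in row] for row in g]


def largest_squared_distance(trials=2000, seed=0):
    """Largest sampled value of ||A - B||_F^2 over random trace-one directions."""
    rng = random.Random(seed)
    return max(frob2(random_direction(rng), random_direction(rng)) for _ in range(trials))


e1 = [[1.0, 0.0], [0.0, 0.0]]
e2 = [[0.0, 0.0], [0.0, 1.0]]
print(largest_squared_distance(), frob2(e1, e2))
```

The orthogonal rank-one pair e1, e2 attains the value 2, so the bound cannot be improved.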

V. Examples

We give two different examples where matrix-valued OMT can be directly applied. Both relate to spectral analysis of multivariable time series.³

A. Spectral Morphing

We first highlight the relevance of matrix-valued OMT to spectral analysis with a numerical example on spectral morphing. The idea is to model slowly time-varying changes in the spectral domain by geodesics in a suitable geometry (see e.g., [7], [8]). The use of geodesic interpolation can be thought of as a regularization technique. Indeed, geodesics smoothly shift spectral power across frequencies, lessening the possibility of fade-in fade-out artifacts, and OMT, for scalar power spectra, has been used to this end in [7], [8]. Below we exemplify how geodesics appear in matrix-valued OMT.

Starting with μ₀, μ₁ ∈ ℱ we approximate the geodesic between the two by constructing N − 1 intermediate matrix densities. To this end, we set μ_{τ_0} = μ₀ and μ_{τ_N} = μ₁, and determine μ_{τ_k} ∈ ℱ for k = 1, …, N − 1 as the solution to

min_{μ_{τ_{k}}, 0 < k < N} \sum_{k = 0}^{N - 1} T_{2, λ} (μ_{τ_{k + 1}}, μ_{τ_{k}}) .

(17)

As noted in Section III-D, this can be obtained numerically via convex programming. The present example uses

\begin{array}{l} μ_{0} = [\begin{matrix} 1 & 0 \\ 0.2 e^{- j θ} & 1 \end{matrix}] [\begin{matrix} \frac{1}{{∣ a_{0} (e^{j θ}) ∣}^{2}} & 0 \\ 0 & 0.01 \end{matrix}] [\begin{matrix} 1 & 0.2 e^{j θ} \\ 0 & 1 \end{matrix}] \\ μ_{1} = [\begin{matrix} 1 & 0.2 \\ 0 & 1 \end{matrix}] [\begin{matrix} 0.01 & 0 \\ 0 & \frac{1}{{∣ a_{1} (e^{j θ}) ∣}^{2}} \end{matrix}] [\begin{matrix} 1 & 0 \\ 0.2 & 1 \end{matrix}] \end{array}

with

\begin{array}{l} a_{0} (z) = (z^{2} - 1.8 cos (\frac{π}{4}) z + {0.9}^{2}) (z^{2} - 1.4 cos (\frac{π}{3}) z + {0.7}^{2}) \\ a_{1} (z) = (z^{2} - 1.8 cos (\frac{π}{6}) z + {0.9}^{2}) (z^{2} - 1.5 cos (\frac{2 π}{15}) z + {0.75}^{2}) \end{array}

shown in Fig. 1. Since the value of a power spectral density at each point in frequency is a 2 × 2 Hermitian matrix, we have used the (1, 1), (1, 2), and (2, 2) subplots to display the magnitude of the corresponding entries, i.e., |μ(1, 1)|, |μ(1, 2)| (= |μ(2, 1)|), and |μ(2, 2)|, respectively, and the (2, 1) subplot to display the phase ∠μ(1, 2) (= −∠μ(2, 1)).

The 3-D plots in Fig. 2 refer to (17), with λ = 0.1, for an approximation of a geodesic. The two boundary plots represent the power spectra μ₀ and μ₁ shown in blue and red, respectively, using the same convention about magnitudes and phases explained above. There are in total 7 power spectra μ_{τ_k}, k = 1, …, 7 shown along the geodesic between μ₀ and μ₁, and the time-indices correspond to τ_k = k/8. It is interesting to observe the smooth shift of the energy over the geodesic path from the one “channel” to the other while, at the same time, the corresponding peak shifts from one frequency to another. One should bear in mind that the so-constructed geodesic is a non-parametric path interpolating/linking the given spectra.

B. Regularization Using Geodesics

Consider two time series

\begin{array}{l} x_{1} (t) = a_{1} (t) cos (θ_{1} (t) t + ϕ_{1 a}) + a_{2} (t) cos (θ_{2} (t) t + ϕ_{1 b}) + w_{1} (t) \\ x_{2} (t) = a_{2} (t) cos (θ_{1} (t) t + ϕ_{2 a}) + a_{1} (t) cos (θ_{2} (t) t + ϕ_{2 b}) + w_{2} (t) \end{array}

for t = 1, …, 2000, both consisting of sinusoidal signals with time-varying amplitude and frequency (chirp-like) with added white noise w₁(t) and w₂(t). The amplitude a₁(t) decreases from 1.2 to 0.1 while a₂(t) increases from 0.1 to 1.2. Frequency θ₁(t) decreases from (π/4) to (π/4) − (π/30) while θ₂(t) increases from (π/3) to (π/3) + (π/30). Then [w₁(t), w₂(t)]′ is white, with independent components, and sampled from a zero-mean Gaussian distribution with covariance $[\begin{matrix} 3 & 1.5 \\ 1.5 & 3 \end{matrix}]$ . The initial phases of the sinusoids are randomly selected in [0, 2π].

Since we are dealing with non-stationary time series, we truncate the observed time series to segments of length equal to 200, to retain resolution. Thus, we let x_{i,k}(t) := x_i(200k + t) with i = 1, 2, k = 0, …, 9 and process separately the segments {x_{i,k}(1), x_{i,k}(2), …, x_{i,k}(200)}. We obtain matrix-valued sample covariances {R_{k,−20}, …, R_{k,0}, …, R_{k,20}} for each. We then determine autoregressive models based on these sample covariances and, thereby, the corresponding power spectral density functions. More specifically, for ℓ = 0, 1, …, 20, we compute

R_{k, ℓ} : = \frac{1}{200} \sum_{i = ℓ}^{200} [\begin{matrix} x_{1, k} (i) \\ x_{2, k} (i) \end{matrix}] [x_{1, k} (i - ℓ), x_{2, k} (i - ℓ)]

and let $R_{k, - ℓ} = R_{k, ℓ}^{'}$ . We then solve the Yule-Walker equations

R_{k, ℓ} = \sum_{i = 1}^{20} A_{k, i} R_{k, ℓ - i}, for ℓ = 1, \dots, 20

for the autoregressive (matricial) coefficients A_{k,1}, …, A_{k,20}. We let $Ω : = R_{k, 0} - \sum_{i = 1}^{20} A_{k, i} R_{k, - i}$ be the corresponding innovation variance. The estimated power spectral density function for the kth segment, denoted as μ̂_{τ_k}, is given as μ̂_{τ_k}(θ) = A_k(e^{jθ})⁻¹ Ω (A_k(e^{jθ})^*)⁻¹ with

A_{k} (e^{j θ}) = (I - A_{k, 1} z - \dots - A_{k, 20} z^{20}) ∣_{z = e^{j θ}} .

We scale μ̂_{τ_k} so that the integral of its trace is normalized to one. Thus, the observation record is used to obtain 10 PSD’s denoted as μ̂_{τ_k}, for k = 1, …, 10. These represent estimates of the spectral power at intermediary points in time. We change the time scale so that τ₁ = 0 and τ₁₀ = 1. The spectrogram is shown in Fig. 3(a).
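The segmentation and sample-covariance steps above can be sketched as follows. The code uses 0-based indexing and starts each lagged sum at the first index where both samples exist inside the segment; these indexing conventions, the function names, and the toy data are assumptions for illustration. The Yule-Walker solve is not included.

```python
def segment(x, k, length=200):
    """Samples of the k-th segment of a series (0-based slice of the given length)."""
    return x[length * k: length * (k + 1)]


def sample_covariance(x1, x2, lag):
    """Lag-`lag` 2 x 2 sample covariance of one two-channel segment.

    Accumulates [x1(i), x2(i)]' [x1(i - lag), x2(i - lag)] / n over valid i.
    """
    n = len(x1)
    r = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(lag, n):
        cur = (x1[i], x2[i])
        past = (x1[i - lag], x2[i - lag])
        for a in range(2):
            for b in range(2):
                r[a][b] += cur[a] * past[b] / n
    return r


def transpose(r):
    """Matrix transpose, used for negative lags: R_{k,-l} is the transpose of R_{k,l}."""
    return [list(col) for col in zip(*r)]


# Toy two-channel segment of length 4 (hypothetical values).
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.0, 1.0, 0.0, 1.0]
r0 = sample_covariance(x1, x2, 0)
r1 = sample_covariance(x1, x2, 1)
print(r0, r1, transpose(r1))
```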

Fig. 3 — (a) Shows the estimated spectrogram of the observed time series and (b) corresponds to the geodesic-fitted spectrogram.

We construct an OMT-geodesic to regularize the estimated PSD’s. This idea was proposed and carried out in [8] for scalar time series and scalar PSD’s. For the present matrix-valued setting, the geodesic is obtained by solving

min_{μ_{τ_{k}}} {\sum_{k = 1}^{10} T_{2, λ} (μ_{τ_{k}}, {\hat{μ}}_{τ_{k}}) ∣ μ_{τ_{k}} are on an OMT geodesic} .

(18)

An explicit formula of the OMT geodesic is not available. However, in light of Proposition 2, (18) can be approximated for small λ as follows. Let tr(μ̂_{τ_k}), for k = 1, …, 10, be the scalar-valued PSD’s obtained by taking traces, and let M̂_{τ_k} denote the corresponding cumulative distribution functions. For λ small, following [8], we compute the scalar densities tr(μ₀) and tr(μ₁) via solving:

min_{μ_{0}, μ_{1}} \sum_{k = 1}^{10} \int_{0}^{1} {((1 - τ_{k}) M_{0}^{- 1} (v) + τ_{k} M_{1}^{- 1} (v) - {\hat{M}}_{τ_{k}}^{- 1} (v))}^{2} d v

with M₀ and M₁ representing the cumulative distribution functions of tr(μ₀) and tr(μ₁), respectively. Then, as was shown in [8], the scalar densities tr(μ_{τ_k}) for 1 < k < 10 can be computed via

M_{τ_{k}}^{- 1} (v) = (1 - τ_{k}) M_{0}^{- 1} (v) + τ_{k} M_{1}^{- 1} (v) .

The matrix-valued PSD’s μ₀ and μ₁ are obtained by solving

min_{μ_{0}, μ_{1}} \sum_{k = 1}^{10} \int_{0}^{1} {‖ (1 - τ_{k}) \frac{μ_{0} (M_{0}^{- 1} (v))}{tr (μ_{0} (M_{0}^{- 1} (v)))} + τ_{k} \frac{μ_{1} (M_{1}^{- 1} (v))}{tr (μ_{1} (M_{1}^{- 1} (v)))} - \frac{{\hat{μ}}_{τ_{k}} ({\hat{M}}_{τ_{k}}^{- 1} (v))}{tr ({\hat{μ}}_{τ_{k}} ({\hat{M}}_{τ_{k}}^{- 1} (v)))} ‖}_{F}^{2} d v

and the μ_{τ_k}’s for 1 < k < 10 are computed via

\frac{μ_{τ_{k}} (M_{τ_{k}}^{- 1} (v))}{tr (μ_{τ_{k}} (M_{τ_{k}}^{- 1} (v)))} = (1 - τ_{k}) \frac{μ_{0} (M_{0}^{- 1} (v))}{tr (μ_{0} (M_{0}^{- 1} (v)))} + τ_{k} \frac{μ_{1} (M_{1}^{- 1} (v))}{tr (μ_{1} (M_{1}^{- 1} (v)))} .

We display this geodesic-fitted spectrogram in Fig. 3(b). It can be seen that the shift of energy from one channel to another and between resonant frequencies is smoother than that shown in Fig. 3(a) (which is a spectrogram based on matricial auto-regressive models).

In order to compare the resolution between the two techniques (spectrogram based on AR-modeling vs. geodesic regularization), we identify the frequency and directionality of peak power for the two power spectral densities and compare principal directions. We explain this next, as the result is quite revealing.

For each μ̂_{τ_k} (θ), we find two frequencies θ₁ and θ₂ where the power spectral densities (PSD’s) have locally maximal power, i.e., the two frequencies where tr(μ̂_{τ_k} (θ)) has the largest peaks. Then we compute the (normalized) eigenvectors corresponding to the dominant eigenvalues of μ_{τ_k} (θ₁) and μ_{τ_k} (θ₂), respectively. These eigenvectors are shown in Fig. 4(a) using black dashed lines. The red and green plots in Fig. 4(a) represent the path of the two eigenvectors as τ_k increases from 0 to 1. The axes in Fig. 4(a) correspond to the two channels/components of the time series and τ_k. (Should all the power be present in one of the two channels, the eigenvector would line up, accordingly, to one of the axes.) The values of the eigenvector when projected onto the two channels/axes, reflect the energy of the signals in the corresponding channels. Thus, in antenna-array applications, the direction of eigenvectors corresponds to the direction of a scatterer relative to the array. Statistical errors are reflected in the jagged nature of the paths when these are based on a spectrogram as in Fig. 4(a). However, when comparing with the eigenvectors of the OMT regularized spectrogram/AR-models μ_{τ_k}, the corresponding paths shown in Fig. 4(b) are smooth. Direct comparison between Fig. 4(a) and (b) highlights the potential advantages of using geodesics as a means to regularize power distribution in non-stationary time series.

Fig. 4 — In (a), the trajectories of the dominant eigenvector of μ̂_{τ_k} (θ₁) and μ̂_{τ_k} (θ₂) are shown in red and blue, respectively. The corresponding trajectories of μ_{τ_k} (θ₁) and μ_{τ_k} (θ₂) are shown in (b).

VI. Conclusion

The geometry of Monge–Kantorovich optimal mass transport provides scalar densities with a natural metric structure (see [11], [12]; also [13] for a systems viewpoint and connections to image analysis and power spectra). Our interest has been in extending such a geometric structure to matrix-valued densities. To this end, we formulated one possible matrix-valued version of the Monge–Kantorovich transportation problem. Computations require convex programming and the framework directly extends the scalar case. An alternative generalization of the Monge–Kantorovich theory to a “non-commutative” setting has been given in the context of the theory of free probability [14]. However, this may not be suitable for matrix-valued power distributions as it is not weak^* continuous. Alternative non-commutative generalizations of the Wasserstein metric are given in [15]–[17]. Possible connections between the formulation herein and these alternative viewpoints are the subject of current investigation.

Biographies

Lipeng Ning received the B.S. and M.S. degrees in control science and engineering from Beijing Institute of Technology, Beijing, China, in 2006 and 2008, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Minnesota, Minneapolis, in 2013.

He is currently a Postdoc Research Fellow in Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA. His research interests include system identification, spectral estimation, sparse signal recovery, and diffusion magnetic resonance imaging.

Tryphon T. Georgiou (F’00) received the Diploma in mechanical and electrical engineering from the National Technical University of Athens, Athens, Greece, in 1979 and the Ph.D. degree from the University of Florida, Gainesville, in 1983.

He is a faculty member in the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, and the Vincentine Hermes-Luh Chair. He has served as a Co-Director of the Control Science and Dynamical Systems Center at the University of Minnesota (1990–2014), and on the Board of Governors of the Control Systems Society of the IEEE (2002–2005).

Dr. Georgiou has been a recipient of the George S. Axelby Outstanding Paper Award of the IEEE Control Systems Society for the years 1992, 1999, and 2003. He is a Foreign Member of the Royal Swedish Academy of Engineering Sciences (IVA).

Allen Tannenbaum (F’08) is a faculty member in computer science and applied mathematics at Stony Brook University, Stony Brook, NY, USA. He works in control, computer vision, and medical imaging.

Appendix

Appendix: Proof of Proposition 2

We need to show that if m(x₁, y₁) ≠ 0 and m(x₂, y₂) ≠ 0, then x₂ > x₁, y₁ > y₂ implies

(x_{2} - x_{1}) (y_{1} - y_{2}) \leq 4 λ .

(19)

Without loss of generality, let

m (x_{i}, y_{j}) = m_{i j} \cdot A_{i j} \otimes B_{i j}

(20)

with A_ij, B_ij ≥ 0, tr(A_ij) = tr(B_ij) = 1 and i, j ∈ {1, 2}. Note that m₁₂ and m₂₁ could be zero if m does not have support on the particular point. We assume that the condition in the proposition fails and that

(x_{2} - x_{1}) (y_{1} - y_{2}) > 4 λ .

(21)

We then show that by rearranging the mass, the cost can be reduced.

We first consider the situation when m₂₂ ≥ m₁₁. By rearranging the value of m at the four points (x_i, y_j) with i, j ∈ {1, 2}, we construct a new transportation plan m̃ at these four locations as follows:

\tilde{m} (x_{1}, y_{1}) = 0

(22a)

\tilde{m} (x_{1}, y_{2}) = (m_{11} + m_{12}) \cdot {\tilde{A}}_{12} \otimes {\tilde{B}}_{12}

(22b)

\tilde{m} (x_{2}, y_{1}) = (m_{11} + m_{21}) \cdot {\tilde{A}}_{21} \otimes {\tilde{B}}_{21}

(22c)

\tilde{m} (x_{2}, y_{2}) = (m_{22} - m_{11}) \cdot A_{22} \otimes B_{22}

(22d)

where

\begin{array}{ll} {\tilde{A}}_{12} = \frac{m_{11} A_{11} + m_{12} A_{12}}{m_{11} + m_{12}}, & {\tilde{B}}_{12} = \frac{m_{11} B_{22} + m_{12} B_{12}}{m_{11} + m_{12}} \\ {\tilde{A}}_{21} = \frac{m_{11} A_{22} + m_{21} A_{21}}{m_{11} + m_{21}}, & {\tilde{B}}_{21} = \frac{m_{11} B_{11} + m_{21} B_{21}}{m_{11} + m_{21}} . \end{array}

This new transportation plan m̃ has the same marginals as m at x₁, x₂ and y₁, y₂. The original cost incurred by m at these four locations is
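The claim that m̃ preserves the marginals can be checked numerically. The following is a minimal sketch under stated assumptions: the masses, the random unit-trace matrices, and the helper names `rand_density`, `partial_traces`, and `marginals` are illustrative choices, and the x-marginal is taken to be the partial trace over the second tensor factor (so that m_ij · A_ij ⊗ B_ij contributes m_ij A_ij at x_i and m_ij B_ij at y_j), consistent with (20).

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_density(n=2):
    """Random positive semidefinite matrix with unit trace."""
    g = rng.standard_normal((n, n))
    a = g @ g.T
    return a / np.trace(a)

def partial_traces(m, n=2):
    """Return (trace over 2nd factor, trace over 1st factor) of m."""
    t = m.reshape(n, n, n, n)   # t[i, k, j, l] = A[i, j] * B[k, l]
    return np.einsum('ikjk->ij', t), np.einsum('ikil->kl', t)

# Assumed masses with m22 >= m11, as in the first case of the proof.
mass = {(1, 1): 0.3, (1, 2): 0.2, (2, 1): 0.1, (2, 2): 0.5}
A = {ij: rand_density() for ij in mass}
B = {ij: rand_density() for ij in mass}
m = {ij: mass[ij] * np.kron(A[ij], B[ij]) for ij in mass}

m11, m12, m21, m22 = (mass[ij] for ij in [(1, 1), (1, 2), (2, 1), (2, 2)])
At12 = (m11 * A[1, 1] + m12 * A[1, 2]) / (m11 + m12)
Bt12 = (m11 * B[2, 2] + m12 * B[1, 2]) / (m11 + m12)
At21 = (m11 * A[2, 2] + m21 * A[2, 1]) / (m11 + m21)
Bt21 = (m11 * B[1, 1] + m21 * B[2, 1]) / (m11 + m21)

# Rearranged plan, following (22a)-(22d).
mt = {
    (1, 1): np.zeros((4, 4)),
    (1, 2): (m11 + m12) * np.kron(At12, Bt12),
    (2, 1): (m11 + m21) * np.kron(At21, Bt21),
    (2, 2): (m22 - m11) * np.kron(A[2, 2], B[2, 2]),
}

def marginals(plan):
    """Matrix-valued marginals at x1, x2 and at y1, y2."""
    mu_x = {i: sum(partial_traces(plan[i, j])[0] for j in (1, 2))
            for i in (1, 2)}
    mu_y = {j: sum(partial_traces(plan[i, j])[1] for i in (1, 2))
            for j in (1, 2)}
    return mu_x, mu_y

mx, my = marginals(m)
mtx, mty = marginals(mt)
same = all(np.allclose(mx[k], mtx[k]) and np.allclose(my[k], mty[k])
           for k in (1, 2))
print(same)
```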

\sum_{i = 1}^{2} \sum_{j = 1}^{2} m_{i j} ({(x_{i} - y_{j})}^{2} + λ {‖ A_{i j} - B_{i j} ‖}_{F}^{2})

(23)

while the cost incurred by m̃ is

(m_{11} + m_{12}) ({(x_{1} - y_{2})}^{2} + λ {‖ {\tilde{A}}_{12} - {\tilde{B}}_{12} ‖}_{F}^{2}) + (m_{11} + m_{21}) ({(x_{2} - y_{1})}^{2} + λ {‖ {\tilde{A}}_{21} - {\tilde{B}}_{21} ‖}_{F}^{2}) + (m_{22} - m_{11}) ({(x_{2} - y_{2})}^{2} + λ {‖ A_{22} - B_{22} ‖}_{F}^{2}) .

(24)

After simplification, to show that (23) is larger than (24), it suffices to show that

2 m_{11} (x_{2} - x_{1}) (y_{1} - y_{2})

(25)

is larger than

λ m_{11} (\sum_{i = 1}^{2} \sum_{j \neq i} {‖ {\tilde{A}}_{i j} - {\tilde{B}}_{i j} ‖}_{F}^{2} - \sum_{i = 1}^{2} {‖ A_{i i} - B_{i i} ‖}_{F}^{2})

(26a)

+ λ m_{12} ({‖ {\tilde{A}}_{12} - {\tilde{B}}_{12} ‖}_{F}^{2} - {‖ A_{12} - B_{12} ‖}_{F}^{2})

(26b)

+ λ m_{21} ({‖ {\tilde{A}}_{21} - {\tilde{B}}_{21} ‖}_{F}^{2} - {‖ A_{21} - B_{21} ‖}_{F}^{2}) .

(26c)
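For completeness, the term (25) can be recovered as follows (a short worked step, using only the quantities defined above). Subtracting the quadratic terms of (24) from those of (23), the contributions weighted by m₁₂, m₂₁ and m₂₂ cancel, which leaves

m_{11} ({(x_{1} - y_{1})}^{2} + {(x_{2} - y_{2})}^{2} - {(x_{1} - y_{2})}^{2} - {(x_{2} - y_{1})}^{2}) = 2 m_{11} (x_{1} y_{2} + x_{2} y_{1} - x_{1} y_{1} - x_{2} y_{2}) = 2 m_{11} (x_{2} - x_{1}) (y_{1} - y_{2})

while the remaining terms, those weighted by λ, are collected in (26a)–(26c).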

From (21), it follows that the value in (25) is greater than 8λm₁₁. We derive upper bounds for each term in (26). First,

(26 a) \leq λ m_{11} ({‖ {\tilde{A}}_{12} - {\tilde{B}}_{12} ‖}_{F}^{2} + {‖ {\tilde{A}}_{21} - {\tilde{B}}_{21} ‖}_{F}^{2}) \leq 4 λ m_{11}

where the last inequality follows from the fact that:

{‖ A - B ‖}_{F}^{2} = tr (A^{2} - 2 A B + B^{2}) \leq tr (A^{2} + B^{2}) \leq 2

for any A, B ≥ 0 with tr(A) = tr(B) = 1, since tr(AB) ≥ 0 and tr(A²) ≤ (tr(A))² = 1 for such matrices. Now consider

\begin{array}{l} {‖ {\tilde{A}}_{12} - {\tilde{B}}_{12} ‖}_{F}^{2} - {‖ A_{12} - B_{12} ‖}_{F}^{2} = tr (({\tilde{A}}_{12} - {\tilde{B}}_{12} + A_{12} - B_{12}) ({\tilde{A}}_{12} - {\tilde{B}}_{12} - A_{12} + B_{12})) \\ = \frac{m_{11}}{m_{11} + m_{12}} ({‖ A_{11} - B_{22} ‖}_{F}^{2} - {‖ A_{12} - B_{12} ‖}_{F}^{2} - \frac{m_{12}}{m_{11} + m_{12}} {‖ A_{11} - B_{22} - A_{12} + B_{12} ‖}_{F}^{2}) \\ \leq \frac{m_{11}}{m_{11} + m_{12}} {‖ A_{11} - B_{22} ‖}_{F}^{2} \\ \leq 2 \frac{m_{11}}{m_{11} + m_{12}} \end{array}

where the second equality follows from the definitions of Ã₁₂ and B̃₁₂ while the last inequality is obtained by bounding the terms in the trace. Thus, referring to each expression by its equation number,

(26 b) \leq 2 λ m_{12} \frac{m_{11}}{m_{11} + m_{12}} \leq 2 λ m_{11} .

In a similar manner, (26c) ≤ 2λm₁₁. Therefore

(26) \leq 8 λ m_{11} < (25)

which implies that the cost incurred by m̃ is smaller than the cost incurred by m.
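The conclusion can also be checked numerically on random instances. The sketch below is an illustration under assumed values, not part of the proof: the points x = (0, 1), y = (1, 0), the weight λ = 0.1 (so that (21) holds), the masses, and the helper names `rand_density`, `fro2`, and `costs` are all hypothetical choices.

```python
import numpy as np

def rand_density(rng, n=2):
    """Random positive semidefinite matrix with unit trace."""
    g = rng.standard_normal((n, n))
    a = g @ g.T
    return a / np.trace(a)

def fro2(a, b):
    """Squared Frobenius distance between a and b."""
    return np.linalg.norm(a - b, 'fro') ** 2

def costs(seed, lam=0.1, x=(0.0, 1.0), y=(1.0, 0.0),
          mass=(0.3, 0.2, 0.1, 0.5)):
    """Return (cost (23) of m, cost (24) of m-tilde, value of (25))."""
    rng = np.random.default_rng(seed)
    m11, m12, m21, m22 = mass            # assumed, with m22 >= m11
    (x1, x2), (y1, y2) = x, y            # x2 > x1 and y1 > y2
    A11, A12, A21, A22 = (rand_density(rng) for _ in range(4))
    B11, B12, B21, B22 = (rand_density(rng) for _ in range(4))
    At12 = (m11 * A11 + m12 * A12) / (m11 + m12)
    Bt12 = (m11 * B22 + m12 * B12) / (m11 + m12)
    At21 = (m11 * A22 + m21 * A21) / (m11 + m21)
    Bt21 = (m11 * B11 + m21 * B21) / (m11 + m21)
    old = (m11 * ((x1 - y1) ** 2 + lam * fro2(A11, B11))
           + m12 * ((x1 - y2) ** 2 + lam * fro2(A12, B12))
           + m21 * ((x2 - y1) ** 2 + lam * fro2(A21, B21))
           + m22 * ((x2 - y2) ** 2 + lam * fro2(A22, B22)))
    new = ((m11 + m12) * ((x1 - y2) ** 2 + lam * fro2(At12, Bt12))
           + (m11 + m21) * ((x2 - y1) ** 2 + lam * fro2(At21, Bt21))
           + (m22 - m11) * ((x2 - y2) ** 2 + lam * fro2(A22, B22)))
    return old, new, 2 * m11 * (x2 - x1) * (y1 - y2)

results = [costs(seed) for seed in range(20)]
for old, new, quad in results[:3]:
    # old - new equals (25) minus (26), and (26) is at most 8*lam*m11.
    print(round(old, 4), round(new, 4), round(quad - (old - new), 4))
```

With these values (x₂ − x₁)(y₁ − y₂) = 1 > 4λ = 0.4, so each random draw should show a strictly smaller cost for the rearranged plan, with the λ-weighted penalty never exceeding 8λm₁₁ = 0.24.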

For the case where m₁₁ > m₂₂, we can prove the claim by constructing a new transportation plan m̂ with values

\begin{array}{l} \hat{m} (x_{1}, y_{1}) = (m_{11} - m_{22}) \cdot A_{11} \otimes B_{11} \\ \hat{m} (x_{1}, y_{2}) = (m_{12} + m_{22}) \cdot {\hat{A}}_{12} \otimes {\hat{B}}_{12} \\ \hat{m} (x_{2}, y_{1}) = (m_{21} + m_{22}) \cdot {\hat{A}}_{21} \otimes {\hat{B}}_{21} \\ \hat{m} (x_{2}, y_{2}) = 0 \end{array}

with

\begin{array}{ll} {\hat{A}}_{12} = \frac{m_{12} A_{12} + m_{22} A_{11}}{m_{12} + m_{22}}, & {\hat{B}}_{12} = \frac{m_{12} B_{12} + m_{22} B_{22}}{m_{12} + m_{22}} \\ {\hat{A}}_{21} = \frac{m_{21} A_{21} + m_{22} A_{22}}{m_{21} + m_{22}}, & {\hat{B}}_{21} = \frac{m_{21} B_{21} + m_{22} B_{11}}{m_{21} + m_{22}} . \end{array}

The rest of the proof is carried out in a similar manner.

Footnotes

A sequence of measures dμ_n converges weak^* to dμ if and only if ∫fdμ_n → ∫fdμ for all continuous and bounded f.

It is interesting to consider functionals of the form tr(c(x, y)m(x, y)), with c(x, y) being matricial, and how to utilize such so as to reflect transportation cost with practical relevance.

Matlab code is available at http://www.ece.umn.edu/users/ningx015/research.html.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Contributor Information

Lipeng Ning, Email: lning@bwh.harvard.edu, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115 USA.

Tryphon T. Georgiou, Email: tryphon@umn.edu, Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA.

Allen Tannenbaum, Email: allen.tannenbaum@stonybrook.edu, Departments of Computer Science and Applied Mathematics, Stony Brook University, Stony Brook, NY 11794 USA.

References

1. Monge G. Open Library. De l’Imprimerie Royale; 1781. Mémoire sur la théorie des déblais et des remblais.
2. Kantorovich L. On the transfer of masses. Dokl Akad Nauk SSSR. 1942;37:227–229.
3. Villani C. Topics in optimal transportation. Vol. 58. Providence, RI: American Mathematical Society; 2003.
4. Ambrosio L. Lecture notes on optimal transport problems. Mathematical aspects of evolving interfaces. 2003:1–52.
5. Rachev S, Rüschendorf L. Mass Transportation Problems: Theory. Vol. 1. New York: Springer-Verlag; 1998.
6. Ferrante A, Pavon M, Ramponi F. Hellinger versus Kullback–Leibler multivariable spectrum approximation. IEEE Trans Autom Control. 2008 Apr;53(4):954–967.
7. Jiang X, Ning L, Georgiou TT. Distances and Riemannian metrics for multivariate spectral densities. IEEE Trans Autom Control. 2012 Jul;57(7):1723–1735.
8. Jiang X, Luo Z, Georgiou T. Geometric methods for spectral analysis. IEEE Trans Signal Process. 2012 Mar;60(3):1064–1074.
9. Petz D. Quantum Information Theory and Quantum Statistics (Theoretical and Mathematical Physics) Berlin, Germany: Springer; 2008.
10. Boyd S, Vandenberghe L. Convex optimization. Cambridge, MA: Cambridge Univ. Press; 2004.
11. Benamou J, Brenier Y. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numerische Mathematik. 2000;84(3):375–393.
12. Jordan R, Kinderlehrer D, Otto F. The variational formulation of the Fokker-Planck equation. SIAM J Mathemat Anal. 1998;29(1):1–17.
13. Tannenbaum E, Georgiou T, Tannenbaum A. Signals and control aspects of optimal mass transport and the Boltzmann entropy. Proc. 49th IEEE Conf. Decision and Control; 2010. pp. 1885–1890.
14. Biane P, Voiculescu D. A free probability analogue of the Wasserstein metric on the trace-state space. Geometric and Funct Anal. 2001;11(6):1125–1138.
15. Rieffel M. Metrics on state spaces. Doc Math J DMV. 1999;4:559–600.
16. Andrea FD, Martinetti P. A view on optimal transport from noncommutative geometry. SIGMA. 2010;6(057):24.
17. Martinetti P. Towards a Monge–Kantorovich metric in noncommutative geometry. Zap Nauch Semin POMI. 2013:411. arXiv preprint arXiv:1210.6573.
