Published in final edited form as: Linear Algebra Appl. 2017 Jan 17;520:44–66. doi: 10.1016/j.laa.2017.01.017

OPERATOR NORM INEQUALITIES BETWEEN TENSOR UNFOLDINGS ON THE PARTITION LATTICE

Miaoyan Wang 1, Khanh Dao Duc 1, Jonathan Fischer 2, Yun S Song 1,2,3

Abstract

Interest in higher-order tensors has recently surged in data-intensive fields, with a wide range of applications including image processing, blind source separation, community detection, and feature extraction. A common paradigm in tensor-related algorithms advocates unfolding (or flattening) the tensor into a matrix and applying classical methods developed for matrices. Despite the popularity of such techniques, how the functional properties of a tensor change upon unfolding is currently not well understood. In contrast to the body of existing work, which has focused almost exclusively on matricizations, we here consider all possible unfoldings of an order-k tensor, which are in one-to-one correspondence with the set of partitions of {1, …, k}. We derive general inequalities between the lp-norms of arbitrary unfoldings defined on the partition lattice. In particular, we demonstrate how the spectral norm (p = 2) of a tensor is bounded by that of its unfoldings, and obtain an improved upper bound on the ratio of the Frobenius norm to the spectral norm of an arbitrary tensor. For specially-structured tensors satisfying a generalized definition of orthogonal decomposability, we prove that the spectral norm remains invariant under specific subsets of unfolding operations.

Key words and phrases: higher-order tensors, general unfoldings, partition lattice, operator norm, orthogonality

1. Introduction

Tensors of order 3 or greater, known as higher-order tensors, have recently attracted increased attention in many fields across science and engineering. Methods built on tensors provide powerful tools to capture complex structures in data that lower-order methods may fail to exploit. Among numerous examples, tensors have been used to detect patterns in time-course data [7, 17, 22, 29] and to model higher-order cumulants [1, 2, 14]. However, tensor-based methods are fraught with challenges. Tensors are not simply matrices with more indices; rather, they are mathematical objects possessing multilinear algebraic properties. Indeed, extending familiar matrix concepts such as norms to tensors is non-trivial [12, 18], and computing these quantities has proven to be NP-hard [4, 6].

The spectral relations between a general tensor and its lower-order counterparts have yet to be studied. There are generally two types of approaches underlying many existing tensor-based methods. The first approach flattens the tensor into a matrix and applies matrix-based techniques in downstream analyses, notably higher-order SVD [3, 10] and TensorFace [25]. Flattening is computationally convenient because of the ubiquity of well-established matrix-based methods, as well as the connection between tensor contraction and block matrix multiplication [19]. However, matricization leads to a potential loss of the structure found in the original tensor. This motivates the key question of how much information a flattening retains from its parent tensor.

The second approach either handles the tensor directly or unfolds it into objects of order 3 or higher. Recent work on bounding the spectral norm of sub-Gaussian tensors reveals that solving for the convex relaxation of tensor rank by unfolding is suboptimal [24]. Interestingly, in the context of tensor completion, unfolding a higher-order tensor into a nearly cubic tensor requires smaller sample sizes than matricization [28]. These results are probabilistic in nature, however, and focus only on particular classes of tensors. Assessing the general impact of unfolding operations on an arbitrary tensor, and the role of the tensor's intrinsic structure, remains challenging.

The primary goal of this paper is to study the effect of unfolding operations on functional properties of tensors, where an unfolding is any lower-order representation of a tensor. We study the operator norm of a tensor viewed as a multilinear functional because this quantity is commonly used in both theory and applications, especially in tensor completion [15,28] and low-rank approximation problems [23, 27]. Given an order-k tensor, we represent each possible unfolding operation using a partition π of [k] = {1, …, k}, where a block in π corresponds to the set of modes that should be combined into a single mode. Each unfolding is a rearrangement of the elements of the original tensor into a tensor of lower order. Here we study the lp operator norms of all possible tensor unfoldings, which together define what we coin a “norm landscape” on the partition lattice. A partial order relation between partitions enables us to find a path between an arbitrary pair of unfoldings and establish our main inequalities relating their operator norms. For specially-structured tensors satisfying a generalized definition of orthogonal decomposability, we show that the spectral norm (p = 2) remains invariant under unfolding operations corresponding to a specific subset of partitions. To our knowledge, our results represent the first attempt to provide a full picture of the norm landscape over all possible tensor unfoldings.

The remainder of this paper is organized as follows. In Section 2, we introduce some notation, and relate the spectral norm and the general lp-norm of a tensor. We then describe in Section 3 general tensor unfoldings defined on the partition lattice. In Section 4, we present our main results on the inequalities between the operator norms of any two tensor unfoldings and describe how the norm landscape changes over the partition lattice. In Section 5, we generalize the notion of orthogonally decomposable tensors and prove that the spectral norm is invariant within a specific set of tensor unfoldings. We conclude in Section 6 with a discussion of our findings and avenues for future work.

2. Higher-order tensors and their operator norms

An order-k tensor 𝒜 = 〚a_{i_1⋯i_k}〛 ∈ 𝔽^{d_1×⋯×d_k} over a field 𝔽 is a hypermatrix with dimensions (d_1, …, d_k) and entries a_{i_1⋯i_k} ∈ 𝔽, for 1 ≤ i_n ≤ d_n, n = 1, …, k. In this paper, we focus on real tensors, 𝔽 = ℝ. The total dimension of 𝒜 is denoted by dim(𝒜) = ∏_{n=1}^k d_n.

The vectorization of 𝒜, denoted Vec(𝒜), is defined as the operation rearranging all elements of 𝒜 into a column vector. For ease of notation, we use the shorthand [n] to denote the n-set {1, …, n} for n ∈ ℕ⁺, and sometimes write ⊗_{n=1}^k x_n = x_1 ⊗ ⋯ ⊗ x_k when space is limited. We use S^{d−1} = {x ∈ ℝ^d : ‖x‖_2 = 1} to denote the (d − 1)-dimensional unit sphere, and I_d to denote the d × d identity matrix.

For any two tensors 𝒜 = 〚a_{i_1⋯i_k}〛, ℬ = 〚b_{i_1⋯i_k}〛 ∈ ℝ^{d_1×⋯×d_k} of identical order and dimensions, their inner product is defined as

$$\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1, \dots, i_k} a_{i_1 \cdots i_k}\, b_{i_1 \cdots i_k},$$

while the tensor Frobenius norm of 𝒜 is defined as

$$\|\mathcal{A}\|_F = \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle} = \sqrt{\sum_{i_1, \dots, i_k} |a_{i_1 \cdots i_k}|^2},$$

both of which are analogues of standard definitions for vectors and matrices.

Following [12], we define the covariant multilinear matrix multiplication of a tensor 𝒜 ∈ ℝ^{d_1×⋯×d_k} by matrices $M_1 = (m^{(1)}_{i_1 j_1}) \in \mathbb{R}^{d_1 \times s_1}, \dots, M_k = (m^{(k)}_{i_k j_k}) \in \mathbb{R}^{d_k \times s_k}$ as

$$\mathcal{A}(M_1, \dots, M_k) = \Big[\!\Big[ \sum_{i_1=1}^{d_1} \cdots \sum_{i_k=1}^{d_k} a_{i_1 \cdots i_k}\, m^{(1)}_{i_1 j_1} \cdots m^{(k)}_{i_k j_k} \Big]\!\Big],$$

which results in an order-k tensor in ℝ^{s_1×⋯×s_k}. This operation multiplies the nth mode of 𝒜 by the matrix M_n for all n ∈ [k]. Just as a matrix may be multiplied in up to two modes by matrices of consistent dimensions, an order-k tensor can be multiplied by up to k matrices in k modes. In the case of k = 2, 𝒜 is a matrix and 𝒜(M_1, M_2) = M_1ᵀ𝒜M_2. Sometimes we are interested in multiplying by vectors rather than matrices, in which case we obtain the k-multilinear functional 𝒜 : ℝ^{d_1} × ⋯ × ℝ^{d_k} → ℝ given by

$$\mathcal{A}(x_1, \dots, x_k) = \sum_{i_1=1}^{d_1} \cdots \sum_{i_k=1}^{d_k} a_{i_1 \cdots i_k}\, x^{(1)}_{i_1} \cdots x^{(k)}_{i_k} = \langle \mathcal{A},\ x_1 \otimes \cdots \otimes x_k \rangle, \tag{1}$$

where $x_n = (x^{(n)}_1, \dots, x^{(n)}_{d_n})^{\mathsf T} \in \mathbb{R}^{d_n}$, n ∈ [k]. Note that multiplying by a vector in r modes in the manner defined in (1) reduces the order of the output tensor to k − r, whereas multiplying by matrices leaves the order unchanged. Although the coordinate representation of a tensor as a hypermatrix provides a concrete description, viewing it instead as a multilinear functional provides a coordinate-free, basis-independent perspective which allows us to better characterize the spectral relations among different tensor unfoldings.
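To make the mechanics concrete, here is a minimal numpy sketch (ours, not part of the original text) of the covariant multilinear multiplication and of the functional form (1); the helper name multilinear_apply is hypothetical, and vectors are passed as d_n × 1 columns:

    import numpy as np

    def multilinear_apply(A, mats):
        # Compute A(M1, ..., Mk): contract the n-th mode of A with M_n,
        # where M_n has shape (d_n, s_n).
        out = A
        for n, M in enumerate(mats):
            out = np.tensordot(M, out, axes=(0, n))  # new axis 0 has length s_n
            out = np.moveaxis(out, 0, n)             # put it back in position n
        return out

    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 3, 4))
    xs = [rng.standard_normal((d, 1)) for d in A.shape]   # vectors as columns
    val = float(multilinear_apply(A, xs).squeeze())
    # Agreement with the inner-product form (1): <A, x1 (x) x2 (x) x3>.
    rank1 = np.einsum('i,j,k->ijk', *[x.ravel() for x in xs])
    assert np.isclose(val, np.sum(A * rank1))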

We define the operator norm, or induced norm, of a tensor 𝒜 using the associated k-multilinear functional (1).

Definition 2.1 (Lim [12])

Let 𝒜 ∈ ℝ^{d_1×⋯×d_k} be an order-k tensor. For any 1 ≤ p ≤ ∞, the lp-norm of the multilinear functional associated with 𝒜 is defined as

$$\|\mathcal{A}\|_p = \sup\left\{ \frac{\mathcal{A}(x_1, \dots, x_k)}{\|x_1\|_p \cdots \|x_k\|_p} : x_n \neq 0,\ x_n \in \mathbb{R}^{d_n},\ n \in [k] \right\} = \sup\left\{ \mathcal{A}(x_1, \dots, x_k) : \|x_n\|_p = 1,\ x_n \in \mathbb{R}^{d_n},\ n \in [k] \right\}, \tag{2}$$

where ‖x_n‖_p denotes the vector lp-norm of x_n.

Remark 2.2

The special case of p = 2 is called the spectral norm, frequently denoted ‖𝒜‖σ. By (2), ‖𝒜‖σ is the maximum value obtained as the inner product of the tensor 𝒜 with a rank-1 tensor, x_1 ⊗ ⋯ ⊗ x_k, of Frobenius norm 1 and of the same dimensions. This point of view provides an equivalent definition of ‖𝒜‖σ as determining the best rank-1 tensor approximation to 𝒜, and we note that the rank-1 constraint becomes weaker as more unfolding is applied. See Section 4 for further details.

Because we restrict all entries of 𝒜 and {x_n}_{n=1}^k to be real, we need not take the absolute value of 𝒜(x_1, …, x_k) as in [12]. It is worth mentioning that the notion of tensor lp-norms defined by (2) is not an extension of the classical matrix lp-norms when p ≠ 2. To see this, recall that for an m × n matrix A, one usually defines the lp operator norm as

$$\|A\|_p = \sup\left\{ \frac{\|Ax\|_p}{\|x\|_p} : x \neq 0,\ x \in \mathbb{R}^{n} \right\}. \tag{3}$$

In general, (2) and (3) are not equal even for matrices, as the following example illustrates:

Example 2.3

Let A = 〚a_{ij}〛 be the 2 × 2 matrix $A = \begin{pmatrix} 1 & 1 \\ 0 & 4 \end{pmatrix}$ and consider p = 1. Solving (3), we have

$$\|A\|_1 = \sup\left\{ \frac{\|Ax\|_1}{\|x\|_1} : x \neq 0,\ x \in \mathbb{R}^2 \right\} = \max_{j} \sum_{i} |a_{ij}| = 5.$$

However, instead using (2) gives

$$\|A\|_1 = \sup\left\{ \frac{x_1^{\mathsf T} A x_2}{\|x_1\|_1 \|x_2\|_1} : x_n \in \mathbb{R}^2,\ x_n \neq 0,\ n \in [2] \right\} = \sup\left\{ x_1^{\mathsf T} A x_2 : x_n \in \mathbb{R}^2,\ \|x_n\|_1 = 1,\ n \in [2] \right\} = 4,$$

which is neither the classical matrix l1-norm (equal to 5) nor the entry-wise l1-norm (equal to ∑_{i,j} |a_{ij}| = 6).
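As a quick numeric companion to Example 2.3 (our own illustration): for p = 1 both suprema are attained at extreme points of the l1 unit ball, so the operator norm (3) reduces to the maximum absolute column sum, while the bilinear-functional norm (2) reduces to a search over signed standard basis vectors:

    import itertools
    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 4.0]])

    # Definition (3): linear-operator l1-norm = maximum absolute column sum.
    op_norm = np.abs(A).sum(axis=0).max()

    # Definition (2): bilinear-functional l1-norm; the extreme points of the
    # l1 ball are +/- e_i, so this equals the largest |a_ij|.
    ext = [s * e for s in (1.0, -1.0) for e in np.eye(2)]
    bil_norm = max(x1 @ A @ x2 for x1, x2 in itertools.product(ext, ext))

    print(op_norm, bil_norm)   # 5.0 and 4.0, as computed in Example 2.3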

Throughout this paper, we adopt Definition 2.1 and always use ‖·‖p to denote the lp-norm defined therein, even for matrices. In fact, (3) defines an operator norm by viewing the matrix as a linear operator from ℝ^{d_2} to ℝ^{d_1}, whereas (2) defines an operator norm in which the matrix defines a bilinear functional from ℝ^{d_1} × ℝ^{d_2} to ℝ. These two definitions are equivalent when p = 2, but otherwise represent two different operators and result in two distinct operator norms. To be consistent with our treatment of tensors as k-multilinear functionals, we formulate matrices as bilinear functionals.

For a given tensor, its lp-norm and lq-norm mutually control each other, and the comparison bound is polynomial in the total dimension of the tensor, dim(𝒜) = ∏_{n=1}^k d_n.

Proposition 2.4 (lp-norm vs. lq-norm)

Let 𝒜 ∈ ℝ^{d_1×⋯×d_k} be an order-k tensor. Suppose ‖·‖p and ‖·‖q are two norms defined in (2) with q ≥ p ≥ 1. Then,

$$\|\mathcal{A}\|_p \le \|\mathcal{A}\|_q \le \dim(\mathcal{A})^{\frac{1}{p} - \frac{1}{q}}\, \|\mathcal{A}\|_p.$$

Proof

Starting from Definition 2.1, we have

$$\begin{aligned} \|\mathcal{A}\|_q &= \sup\left\{ \frac{\mathcal{A}(x_1, \dots, x_k)}{\|x_1\|_q \cdots \|x_k\|_q} : x_n \neq 0,\ x_n \in \mathbb{R}^{d_n},\ n \in [k] \right\} \\ &= \sup\left\{ \frac{\mathcal{A}(x_1, \dots, x_k)}{\|x_1\|_p \cdots \|x_k\|_p} \times \frac{\|x_1\|_p \cdots \|x_k\|_p}{\|x_1\|_q \cdots \|x_k\|_q} : x_n \neq 0,\ x_n \in \mathbb{R}^{d_n},\ n \in [k] \right\}. \end{aligned} \tag{4}$$

For any q ≥ p ≥ 1, the equivalence of vector norms tells us

$$\|x\|_q \le \|x\|_p \le d^{\frac{1}{p} - \frac{1}{q}}\, \|x\|_q, \quad \text{for all } x \in \mathbb{R}^d. \tag{5}$$

Applying (5) to xn, for n ∈ [k], gives

$$1 \le \prod_{n=1}^{k} \frac{\|x_n\|_p}{\|x_n\|_q} \le \left( \prod_{n=1}^{k} d_n \right)^{\frac{1}{p} - \frac{1}{q}}. \tag{6}$$

Inserting (6) into (4) and noting dim(𝒜) = ∏_{n=1}^k d_n, we find

$$\|\mathcal{A}\|_p \le \|\mathcal{A}\|_q \le \dim(\mathcal{A})^{\frac{1}{p} - \frac{1}{q}}\, \|\mathcal{A}\|_p,$$

which completes the proof.
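The only analytic ingredient in this proof is the vector-norm equivalence (5); a quick numeric sanity check (ours) on random vectors:

    import numpy as np

    rng = np.random.default_rng(1)
    p, q, d = 1.5, 3.0, 10                     # arbitrary q >= p >= 1
    for _ in range(1000):
        x = rng.standard_normal(d)
        lp, lq = np.linalg.norm(x, p), np.linalg.norm(x, q)
        assert lq <= lp <= d ** (1/p - 1/q) * lq + 1e-12   # inequality (5)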

3. Partitions and general tensor unfoldings

Any higher-order tensor can be transformed into different lower-order tensors by modifying its indices in various ways. The most common transformations are n-mode flattenings, or matricizations, which rearrange the elements of an order-k tensor into a d_n × ∏_{i≠n} d_i matrix. For example, the n-mode matricization of a tensor 𝒜 ∈ ℝ^{d_1×⋯×d_k} is obtained by mapping the fixed tensor index (i_1, …, i_k) to the matrix index (i_n, m), where

$$m = 1 + \sum_{a \in [k],\, a \neq n} (i_a - 1)\, J_a, \quad \text{where } J_a = \prod_{l \in [a-1],\, l \neq n} d_l. \tag{7}$$

Recently there has been much interest in studying the relationship between tensors and their matrix flattenings [9]. We present here a more general analysis by considering all possible lower-order tensor unfoldings rather than just matricizations. Using the blocks of a partition of [k] to specify which modes are combined into a single mode of the new tensor, we establish a one-to-one correspondence between the set of all partitions of [k] and the set of lower-order tensor unfoldings. The partition lattice then describes the underlying relationship between possible tensor unfoldings of a tensor 𝒜.

For any k ∈ ℕ⁺, a partition π of [k] is a collection $\{B_1^{\pi}, B_2^{\pi}, \dots, B_{\ell}^{\pi}\}$ of disjoint, nonempty subsets (or blocks) $B_i^{\pi}$ satisfying $\bigcup_{i=1}^{\ell} B_i^{\pi} = [k]$. The set of all partitions of [k] is denoted 𝒫[k]. We use |π| to denote the number of blocks in π, and $|B_i^{\pi}|$ to denote the number of elements in $B_i^{\pi}$. We say that a partition π is a level-ℓ partition if |π| = ℓ. The set of all level-ℓ partitions of [k] is denoted by 𝒫_ℓ[k], which is a set of S(k, ℓ) elements, where S(k, ℓ) is the Stirling number of the second kind.

The following partial order naturally relates partitions satisfying a basic compatibility constraint and the resulting structure plays a key role in our work.

Definition 3.1 (Partition Lattice)

A partition π1 ∈ 𝒫[k] is called a refinement of π2 ∈ 𝒫[k] if each block of π1 is a subset of some block of π2; conversely, π2 is said to be a coarsening of π1. This relationship defines a partial order, expressed as π1 ≤ π2, and we say that π1 is finer than π2 while π2 is coarser than π1. If either π1 ≤ π2 or π2 ≤ π1, then π1 and π2 are comparable. According to this partial order, the least element of 𝒫[k] is 0[k] ≔ {{1}, …, {k}}, while the greatest element is 1[k] ≔ {{1, …, k}}. Equipped with this notion, 𝒫[k] generates a partition lattice by connecting any two comparable partitions that differ by exactly one level. An example is illustrated in Figure 1. Henceforth, 𝒫[k] may represent either the set of all partitions of [k] or the partition lattice it generates depending on context.

Figure 1. The partition lattice 𝒫[4].

It is clear that many partitions are not comparable, including every pair of distinct partitions at the same level. To consider arbitrary partitions in tandem, we require an extension of this partial order. In general, for any two partitions π1, π2 ∈ 𝒫[k], we define their greatest lower bound π1 ∧ π2 as

$$\pi_1 \wedge \pi_2 \coloneqq \sup\{ \pi \in \mathcal{P}[k] : \pi \le \pi_1,\ \pi \le \pi_2 \}.$$

More concretely, π1 ∧ π2 consists of the collection of all nonempty intersections of blocks in π1 and π2 and is unique for a given pair (π1, π2).

Example 3.2

Figure 1 illustrates 𝒫[4] in lattice form. Recall that an edge connects two partitions if and only if they are comparable and their levels differ by exactly one. Partitions are comparable if and only if there exists a non-reversing path between them. To clarify the components of Definition 3.1, take π1 = {{1, 4}, {2, 3}} and π2 = {{1, 3, 4}, {2}}. Then π1 ∧ π2 = {{1, 4}, {2}, {3}}, 0[4] = {{1}, {2}, {3}, {4}}, and 1[4] = {{1, 2, 3, 4}}.
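Computationally, the meet is just the set of nonempty pairwise intersections of blocks; a small sketch (the function name meet is ours) reproducing this example:

    def meet(pi1, pi2):
        # Greatest lower bound of two partitions, each given as a list of sets.
        return [B & C for B in pi1 for C in pi2 if B & C]

    pi1 = [{1, 4}, {2, 3}]
    pi2 = [{1, 3, 4}, {2}]
    print(meet(pi1, pi2))   # [{1, 4}, {3}, {2}], i.e. {{1,4},{2},{3}}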

The following definition generalizes the concept of n-mode flattenings defined in (7) to general unfoldings induced by arbitrary partitions π ∈ 𝒫[k].

Definition 3.3 (General Tensor Unfolding)

Let 𝒜 ∈ ℝ^{d_1×⋯×d_k} be an order-k tensor and $\pi = \{B_1^{\pi}, \dots, B_{\ell}^{\pi}\} \in \mathcal{P}[k]$. The partition π defines a mapping $\phi^{\pi} : [d_1] \times \cdots \times [d_k] \to \big[ \prod_{j \in B_1^{\pi}} d_j \big] \times \cdots \times \big[ \prod_{j \in B_{\ell}^{\pi}} d_j \big]$ such that

$$\phi^{\pi}(i_1, \dots, i_k) = (m_1, \dots, m_{\ell}), \tag{8}$$

where

$$m_j = 1 + \sum_{r \in B_j^{\pi}} (i_r - 1)\, J_r, \quad \text{where } J_r = \prod_{l \in B_j^{\pi},\, l < r} d_l, \quad \text{for all } j \in [\ell].$$

Clearly, ϕ^π is a one-to-one mapping, so its inverse (ϕ^π)⁻¹ is well defined. Thus ϕ^π induces an unfolding action 𝒜 ↦ Unfold_π(𝒜) such that

$$\big( \mathrm{Unfold}_{\pi}(\mathcal{A}) \big)_{(m_1, \dots, m_{\ell})} = (\mathcal{A})_{(\phi^{\pi})^{-1}(m_1, \dots, m_{\ell})},$$

for all $(m_1, \dots, m_{\ell}) \in \big[ \prod_{j \in B_1^{\pi}} d_j \big] \times \cdots \times \big[ \prod_{j \in B_{\ell}^{\pi}} d_j \big]$ or, equivalently,

$$\big( \mathrm{Unfold}_{\pi}(\mathcal{A}) \big)_{\phi^{\pi}(i_1, \dots, i_k)} = (\mathcal{A})_{(i_1, \dots, i_k)},$$

for all (i_1, …, i_k) ∈ [d_1] × ⋯ × [d_k]. Thus Unfold_π(𝒜) is an order-ℓ tensor of dimensions $\big( \prod_{i \in B_1^{\pi}} d_i, \dots, \prod_{i \in B_{\ell}^{\pi}} d_i \big)$, and we call it the tensor unfolding of 𝒜 induced by π.

Remark 3.4

By the definition of ϕπ in (8), Unfold0[k](𝒜) = 𝒜 and Unfold1[k](𝒜) = Vec(𝒜).

Example 3.5

Consider an order-4 tensor 𝒜 = 〚aijkl〛 ∈ ℝ2×2×2×2. We provide a subset of the possible tensor unfoldings to elucidate both the manner in which the operation works and the natural association with partitions.

  • For π = {{1, 2}, {3}, {4}}, Unfold_π(𝒜) is an order-3 tensor of dimensions (4, 2, 2) with entries given by
    $$(\mathrm{Unfold}_{\pi}(\mathcal{A}))_{1kl} = a_{11kl}, \quad (\mathrm{Unfold}_{\pi}(\mathcal{A}))_{2kl} = a_{12kl},$$
    $$(\mathrm{Unfold}_{\pi}(\mathcal{A}))_{3kl} = a_{21kl}, \quad (\mathrm{Unfold}_{\pi}(\mathcal{A}))_{4kl} = a_{22kl},$$
    for all (k, l) ∈ [2] × [2].
  • For π = {{1, 2}, {3, 4}}, Unfold_π(𝒜) is the 4 × 4 matrix
    $$\mathrm{Unfold}_{\pi}(\mathcal{A}) = \begin{pmatrix} a_{1111} & a_{1112} & a_{1121} & a_{1122} \\ a_{1211} & a_{1212} & a_{1221} & a_{1222} \\ a_{2111} & a_{2112} & a_{2121} & a_{2122} \\ a_{2211} & a_{2212} & a_{2221} & a_{2222} \end{pmatrix}.$$
  • For π = {{1, 2, 3}, {4}}, Unfold_π(𝒜) is the 8 × 2 matrix
    $$\mathrm{Unfold}_{\pi}(\mathcal{A}) = \begin{pmatrix} a_{1111} & a_{1112} \\ a_{1121} & a_{1122} \\ a_{1211} & a_{1212} \\ a_{1221} & a_{1222} \\ a_{2111} & a_{2112} \\ a_{2121} & a_{2122} \\ a_{2211} & a_{2212} \\ a_{2221} & a_{2222} \end{pmatrix}.$$
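In numpy terms, Unfold_π is a transpose (grouping the modes block by block) followed by a reshape (merging each block into one axis); the within-block ordering this induces is one of the equivalent conventions discussed in Remark 3.6 below. A minimal sketch (function name ours):

    import numpy as np

    def unfold(A, pi):
        # Unfold_pi(A) for a partition pi of the modes of A, given as a
        # list of lists of 0-based mode indices, e.g. [[0, 1], [2], [3]].
        order = [m for block in pi for m in block]
        shape = [int(np.prod([A.shape[m] for m in block])) for block in pi]
        return A.transpose(order).reshape(shape)

    rng = np.random.default_rng(2)
    A = rng.standard_normal((2, 2, 2, 2))
    B = unfold(A, [[0, 1], [2, 3]])          # the 4 x 4 matrix shown above
    # Remark 3.7 below: the Frobenius norm is invariant under any unfolding.
    assert np.isclose(np.linalg.norm(A), np.linalg.norm(B))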

Remark 3.6

There are different conventions to order the elements within each transformed mode. In principle, the ordering of elements within each transformed mode is irrelevant, so we do not explicitly spell out their orderings hereafter.

Remark 3.7

The unfolding operation leaves the Frobenius norm unchanged; that is, ‖𝒜‖F = ‖Unfoldπ(𝒜)‖F for all π ∈ 𝒫[k]. More generally, the inner product remains invariant under all unfoldings: 〈𝒜, ℬ〉 = 〈Unfoldπ(𝒜), Unfoldπ(ℬ)〉 for all π ∈ 𝒫[k], where 𝒜, ℬ ∈ ℝd1×⋯×dk are two order-k tensors of the same dimensions.

4. Operator norm inequalities on the partition lattice

In this section, we compare the operator norms of different unfoldings of a tensor, in particular relative to that of the original tensor. We first focus on the spectral norm (p = 2) and then discuss extensions to general lp-norms.

Recall that for an order-k tensor 𝒜 ∈ ℝ^{d_1×⋯×d_k}, its spectral norm is defined as

$$\|\mathcal{A}\|_{\sigma} = \sup\big\{ \mathcal{A}(x_1, \dots, x_k) : \|x_n\|_2 = 1,\ x_n \in \mathbb{R}^{d_n},\ n \in [k] \big\}. \tag{9}$$

The maximization of the polynomial form 𝒜(x_1, …, x_k) on the unit sphere is closely related to the best rank-1 tensor approximation in the least-squares sense. Specifically, for an order-k tensor 𝒜 ∈ ℝ^{d_1×⋯×d_k}, the problem of determining the spectral norm is equivalent to finding a scalar λ and a rank-1, norm-1 tensor x_1 ⊗ ⋯ ⊗ x_k that minimize the function

$$f(x_1, \dots, x_k) = \|\mathcal{A} - \lambda\, x_1 \otimes \cdots \otimes x_k\|_F.$$

The corresponding value of λ is equal to ‖𝒜‖σ. Since x_1 ⊗ ⋯ ⊗ x_k must have the same order and dimensions as 𝒜, the rank-1 condition becomes less strict the more the tensor is unfolded. In particular, for the vectorized unfolding Vec(𝒜), the best rank-1 tensor approximation is simply Vec(𝒜) itself. For an unfolding into a matrix Mat(𝒜), the best rank-1 approximation is the outer product of the leading left and right singular vectors of Mat(𝒜). For higher-order unfoldings, the closed form of the best rank-1 approximation is not known in general. Nevertheless, as the order of the unfolding increases, the set of rank-1 tensors over which the supremum in (9) is taken becomes more restricted. This observation implies that the spectral norm of a tensor unfolding decreases as the order of the unfolded tensor increases; that is, the spectral norm preserves the partial order on partitions.

Proposition 4.1 (Monotonicity)

For all partitions π1, π2 ∈ 𝒫[k] satisfying π1 ≤ π2,

$$\|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma} \le \|\mathrm{Unfold}_{\pi_2}(\mathcal{A})\|_{\sigma}.$$

In particular, we have the global extrema

$$\|\mathcal{A}\|_{\sigma} = \|\mathrm{Unfold}_{0_{[k]}}(\mathcal{A})\|_{\sigma} = \min_{\pi \in \mathcal{P}[k]} \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma},$$
$$\|\mathcal{A}\|_F = \|\mathrm{Unfold}_{1_{[k]}}(\mathcal{A})\|_{\sigma} = \max_{\pi \in \mathcal{P}[k]} \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma}.$$

Proof

Suppose π_1, π_2 ∈ 𝒫[k], and π_1 is a one-step refinement of π_2. Without loss of generality, assume π_2 is obtained by merging two blocks B_1, B_2 ∈ π_1 into a single block B ∈ π_2. Let $\mathrm{Unfold}_{\pi_1}(\mathcal{A}) \in \mathbb{R}^{d_1 \times \cdots \times d_{\ell}}$ denote the tensor unfolding induced by π_1 = {B_1, …, B_ℓ}. Then Unfold_{π_2}(𝒜) is a (d_1 · d_2, d_3, …, d_ℓ)-dimensional tensor. By definition of the spectral norm,

$$\begin{aligned} \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma} &= \sup\big\{ \langle \mathrm{Unfold}_{\pi_1}(\mathcal{A}),\ x_1 \otimes \cdots \otimes x_{\ell} \rangle : x_n \in S^{d_n - 1},\ n \in [\ell] \big\} \\ &= \sup\big\{ \langle \mathrm{Unfold}_{\pi_2}(\mathcal{A}),\ \mathrm{Vec}(x_1 \otimes x_2) \otimes x_3 \otimes \cdots \otimes x_{\ell} \rangle : x_n \in S^{d_n - 1},\ n \in [\ell] \big\} \\ &\le \sup\big\{ \langle \mathrm{Unfold}_{\pi_2}(\mathcal{A}),\ y \otimes x_3 \otimes \cdots \otimes x_{\ell} \rangle : y \in S^{d_1 d_2 - 1},\ x_n \in S^{d_n - 1},\ n = 3, \dots, \ell \big\} \\ &= \|\mathrm{Unfold}_{\pi_2}(\mathcal{A})\|_{\sigma}, \end{aligned}$$

where the third line comes from the fact that the set $\{\mathrm{Vec}(x_1 \otimes x_2) : (x_1, x_2) \in S^{d_1 - 1} \times S^{d_2 - 1}\}$ is contained in the set $\{y : y \in S^{d_1 d_2 - 1}\}$.

In general, if π1 ≤ π2, we can obtain Unfoldπ2(𝒜) from Unfoldπ1(𝒜) by a series of single unfoldings. Hence, applying the above arguments to these successive unfoldings gives the desired result.
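Monotonicity is easy to observe numerically. Spectral norms of unfoldings of order 3 or higher admit no closed form, but an alternating (higher-order) power iteration provides a lower estimate adequate for illustration. The following sketch is ours: the helper names are hypothetical, unfold is as sketched in Section 3, and enough random restarts are assumed to avoid poor local maxima:

    import numpy as np

    def unfold(A, pi):
        order = [m for block in pi for m in block]
        shape = [int(np.prod([A.shape[m] for m in block])) for block in pi]
        return A.transpose(order).reshape(shape)

    def spectral_norm_est(A, iters=100, restarts=20, seed=0):
        # Lower estimate of ||A||_sigma via alternating power iteration.
        rng = np.random.default_rng(seed)
        best = 0.0
        for _ in range(restarts):
            xs = [rng.standard_normal(d) for d in A.shape]
            xs = [x / np.linalg.norm(x) for x in xs]
            for _ in range(iters):
                for n in range(A.ndim):
                    v = A
                    for m in range(A.ndim - 1, -1, -1):  # contract all modes but n
                        if m != n:
                            v = np.tensordot(v, xs[m], axes=(m, 0))
                    xs[n] = v / np.linalg.norm(v)
            val = A
            for m in range(A.ndim - 1, -1, -1):
                val = np.tensordot(val, xs[m], axes=(m, 0))
            best = max(best, abs(float(val)))
        return best

    rng = np.random.default_rng(3)
    A = rng.standard_normal((3, 3, 3, 3))
    chain = [[[0], [1], [2], [3]], [[0, 1], [2], [3]],
             [[0, 1], [2, 3]], [[0, 1, 2, 3]]]
    # Estimates along a refinement chain: approximately nondecreasing,
    # ending at ||A||_F for the fully vectorized unfolding.
    print([round(spectral_norm_est(unfold(A, pi)), 3) for pi in chain],
          round(np.linalg.norm(A), 3))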

The following lemma will play a key role in proving our main results:

Lemma 4.2 (One-Step Inequality)

For 1 < ℓ ≤ k, let ℬ ∈ ℝ^{d_1×⋯×d_ℓ} be an order-ℓ tensor unfolding of an order-k tensor 𝒜 induced by the partition {B_1, …, B_ℓ} ∈ 𝒫[k], and let 𝒞 be the order-(ℓ − 1) tensor unfolding of ℬ induced by merging blocks B_i and B_j for some i, j ∈ [ℓ]. Then,

$$\min(d_i, d_j)^{-1/2}\, \|\mathcal{C}\|_{\sigma} \le \|\mathcal{B}\|_{\sigma} \le \|\mathcal{C}\|_{\sigma}. \tag{10}$$

Proof

The upper bound follows readily from Proposition 4.1. To prove the lower bound, without loss of generality, assume 𝒞 corresponds to the merging of blocks B_{ℓ−1} and B_ℓ, so that 𝒞 is a (d_1, …, d_{ℓ−2}, d_{ℓ−1}·d_ℓ)-dimensional tensor. Note that $S^{d_1-1} \times \cdots \times S^{d_{\ell-2}-1} \times S^{d_{\ell-1}d_{\ell}-1}$ is a compact set, so the supremum (9) is attained in that set for 𝒞. Then, there exists $(x_1^*, \dots, x_{\ell-2}^*, y^*) \in S^{d_1-1} \times \cdots \times S^{d_{\ell-2}-1} \times S^{d_{\ell-1}d_{\ell}-1}$ such that

$$\|\mathcal{C}\|_{\sigma} = \big\langle \mathcal{C},\ x_1^* \otimes \cdots \otimes x_{\ell-2}^* \otimes y^* \big\rangle. \tag{11}$$

Define $\mathcal{C}^* = \mathcal{C}(x_1^*, \dots, x_{\ell-2}^*, I_{d_{\ell-1} d_{\ell}}) \in \mathbb{R}^{d_{\ell-1} d_{\ell}}$. By the self-duality of the Frobenius norm in $\mathbb{R}^{d_{\ell-1} d_{\ell}}$, we can rewrite (11) as

$$\|\mathcal{C}\|_{\sigma} = \sup\big\{ \langle \mathcal{C}^*, y \rangle : y \in S^{d_{\ell-1} d_{\ell} - 1} \big\} = \|\mathcal{C}^*\|_F. \tag{12}$$

On the other hand, if we define $\mathrm{Mat}(\mathcal{C}^*) \coloneqq \mathcal{B}(x_1^*, \dots, x_{\ell-2}^*, I_{d_{\ell-1}}, I_{d_{\ell}}) \in \mathbb{R}^{d_{\ell-1} \times d_{\ell}}$, then 𝒞* is simply the vectorization of Mat(𝒞*). Hence, by Remark 3.7, we obtain

$$\|\mathcal{C}^*\|_F = \|\mathrm{Mat}(\mathcal{C}^*)\|_F. \tag{13}$$

Using the definition of Mat(𝒞*), we can write

$$\begin{aligned} \|\mathcal{B}\|_{\sigma} &\ge \sup\big\{ \langle \mathcal{B},\ x_1^* \otimes \cdots \otimes x_{\ell-2}^* \otimes x_{\ell-1} \otimes x_{\ell} \rangle : (x_{\ell-1}, x_{\ell}) \in S^{d_{\ell-1} - 1} \times S^{d_{\ell} - 1} \big\} \\ &= \sup\big\{ \langle \mathrm{Mat}(\mathcal{C}^*),\ x_{\ell-1} \otimes x_{\ell} \rangle : (x_{\ell-1}, x_{\ell}) \in S^{d_{\ell-1} - 1} \times S^{d_{\ell} - 1} \big\} \\ &= \|\mathrm{Mat}(\mathcal{C}^*)\|_{\sigma}. \end{aligned} \tag{14}$$

Recall that Mat(𝒞*) is a matrix of size d_{ℓ−1} × d_ℓ, meaning [8]

$$\|\mathrm{Mat}(\mathcal{C}^*)\|_{\sigma} \ge \min(d_{\ell-1}, d_{\ell})^{-1/2}\, \|\mathrm{Mat}(\mathcal{C}^*)\|_F. \tag{15}$$

Using (14) in conjunction with (15), (13), and (12), we then obtain

$$\|\mathcal{B}\|_{\sigma} \ge \min(d_{\ell-1}, d_{\ell})^{-1/2}\, \|\mathrm{Mat}(\mathcal{C}^*)\|_F = \min(d_{\ell-1}, d_{\ell})^{-1/2}\, \|\mathcal{C}^*\|_F = \min(d_{\ell-1}, d_{\ell})^{-1/2}\, \|\mathcal{C}\|_{\sigma},$$

which proves the lower bound.

Remark 4.3

Both bounds in the one-step inequality (10) are sharp. The sharpness of the upper bound will be discussed in Section 5. For the lower bound, consider an order-ℓ tensor unfolding $\mathcal{B} = \big( \sum_{i=1}^{d_1} e_{1,i} \otimes e_{2,i} \big) \otimes \mathcal{D} \in \mathbb{R}^{d_1 \times d_2 \times \cdots \times d_{\ell}}$, where $\{e_{1,i} : i \in [d_1]\}$ is the standard orthonormal basis of ℝ^{d_1}, $\{e_{2,i} : i \in [d_1]\}$ is a set of d_1 standard basis vectors in ℝ^{d_2}, and 𝒟 is an arbitrary (d_3, …, d_ℓ)-dimensional tensor. Assume d_1 ≤ d_2 and consider the two blocks B_1 = {1}, B_2 = {2}. By the unfolding operation specified in Lemma 4.2 applied to B_1 and B_2, $\mathcal{C} = \big( \sum_{i=1}^{d_1} \mathrm{Vec}(e_{1,i} \otimes e_{2,i}) \big) \otimes \mathcal{D} \in \mathbb{R}^{d_1 d_2 \times d_3 \times \cdots \times d_{\ell}}$, which is the order-(ℓ − 1) tensor unfolding obtained by merging the first two modes of ℬ into a single mode. It follows that ‖ℬ‖σ = ‖𝒟‖σ and $\|\mathcal{C}\|_{\sigma} = \sqrt{d_1}\, \|\mathcal{D}\|_{\sigma}$, and therefore the left-hand inequality in (10) is saturated.
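For ℓ = 3 with 𝒟 a vector, both norms in this construction reduce to matrix and vector quantities, so the saturation can be checked directly (our own illustration):

    import numpy as np

    d1, d2, d3 = 3, 5, 4
    rng = np.random.default_rng(4)
    D = rng.standard_normal(d3)                    # the tensor D, here order 1
    E = np.zeros((d1, d2))
    E[np.arange(d1), np.arange(d1)] = 1.0          # sum_i e_{1,i} (x) e_{2,i}, d1 <= d2
    B = E[:, :, None] * D[None, None, :]           # the order-3 tensor B
    C = B.reshape(d1 * d2, d3)                     # merge the first two modes

    # ||B||_sigma = ||E||_sigma * ||D||_2 = ||D||_2, while
    # ||C||_sigma = ||Vec(E)||_2 * ||D||_2 = sqrt(d1) * ||D||_2.
    print(np.linalg.norm(C, 2) / np.linalg.norm(D))   # ~ sqrt(3) = sqrt(d1)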

More generally, we can establish inequalities relating the spectral norms of two tensor unfoldings corresponding to two arbitrary partitions π1, π2 ∈ 𝒫[k] that are not necessarily comparable. To do so, we must first introduce the following definition.

Definition 4.4

Given an order-k tensor 𝒜 ∈ ℝ^{d_1×⋯×d_k}, we define the map dim𝒜 : 𝒫[k] × 𝒫[k] → ℕ⁺ as

$$\dim_{\mathcal{A}}(\pi_1, \pi_2) = \prod_{B \in \pi_1} \max_{B' \in \pi_2} D_{\mathcal{A}}(B, B'), \quad \text{where} \quad D_{\mathcal{A}}(B, B') = \prod_{n \in B \cap B'} d_n,$$

where π1, π2 ∈ 𝒫[k].

We label this quantity as dim𝒜(·, ·) because it involves a product of a subset of the dimensions of 𝒜, and dim𝒜1, π2) ≤ dim(𝒜) with equality only when π1 = π1 ∧ π2. Intuitively, dim𝒜1, π2) reflects the overlap between the unfoldings induced by π1 and π1 ∧ π2. Example 4.6 presents a concrete illustration.

Remark 4.5

We set D𝒜(B, B′) = 0 when BB′ = ∅. This does not affect the multiplication in Definition 4.4 because a block of π1 cannot be disjoint from every block of π2. Note that D𝒜(B, B′) = D𝒜(B′, B), but in general dim𝒜1, π2) ≠ dim𝒜2, π1).

Example 4.6

To illustrate the above map, let 𝒜 ∈ ℝ^{d_1×d_2×d_3×d_4} and consider the partitions π_1 = {{1, 2}, {3, 4}} and π_2 = {{1, 2, 3}, {4}}, for which π_1 ∧ π_2 = {{1, 2}, {3}, {4}}. From Definition 4.4,

$$D_{\mathcal{A}}(\{1,2\}, \{1,2,3\}) = d_1 d_2, \qquad D_{\mathcal{A}}(\{3,4\}, \{1,2,3\}) = d_3,$$
$$D_{\mathcal{A}}(\{1,2\}, \{4\}) = 0, \qquad D_{\mathcal{A}}(\{3,4\}, \{4\}) = d_4.$$

Then,

$$\dim_{\mathcal{A}}(\pi_1, \pi_2) = \max\big\{ D_{\mathcal{A}}(\{1,2\}, \{1,2,3\}),\ D_{\mathcal{A}}(\{1,2\}, \{4\}) \big\} \times \max\big\{ D_{\mathcal{A}}(\{3,4\}, \{1,2,3\}),\ D_{\mathcal{A}}(\{3,4\}, \{4\}) \big\} = d_1 d_2 \max\{d_3, d_4\}.$$

Exchanging arguments, we find

$$\dim_{\mathcal{A}}(\pi_2, \pi_1) = \max\big\{ D_{\mathcal{A}}(\{1,2,3\}, \{1,2\}),\ D_{\mathcal{A}}(\{1,2,3\}, \{3,4\}) \big\} \times \max\big\{ D_{\mathcal{A}}(\{4\}, \{1,2\}),\ D_{\mathcal{A}}(\{4\}, \{3,4\}) \big\} = d_4 \max\{d_1 d_2, d_3\}.$$
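The maps D𝒜 and dim𝒜 are elementary to compute; a short sketch (function names ours) reproducing Example 4.6 with trial dimensions:

    import numpy as np

    dims = {1: 2, 2: 3, 3: 7, 4: 2}                # trial values for d1, ..., d4

    def D(B, C):
        # D_A(B, B') = product of d_n over n in B & B' (0 if the intersection is empty).
        I = B & C
        return int(np.prod([dims[n] for n in I])) if I else 0

    def dim_map(pi1, pi2):
        out = 1
        for B in pi1:
            out *= max(D(B, C) for C in pi2)
        return out

    pi1 = [{1, 2}, {3, 4}]
    pi2 = [{1, 2, 3}, {4}]
    print(dim_map(pi1, pi2))   # d1*d2*max(d3, d4) = 6*7 = 42
    print(dim_map(pi2, pi1))   # d4*max(d1*d2, d3) = 2*7 = 14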

Remark 4.7

As in Lemma 4.2, if π_2 is a one-step coarsening of π_1 obtained by merging two blocks $B_i^{\pi_1}$ and $B_j^{\pi_1}$ of dimensions d_i and d_j into a single block $B_1^{\pi_2}$, then

$$D_{\mathcal{A}}(B_i^{\pi_1}, B_1^{\pi_2}) = d_i, \qquad D_{\mathcal{A}}(B_j^{\pi_1}, B_1^{\pi_2}) = d_j.$$

Hence, Lemma 4.2 can be written as

$$\min\big\{ D_{\mathcal{A}}(B_i^{\pi_1}, B_1^{\pi_2}),\ D_{\mathcal{A}}(B_j^{\pi_1}, B_1^{\pi_2}) \big\}^{-1/2}\, \|\mathrm{Unfold}_{\pi_2}(\mathcal{A})\|_{\sigma} \le \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma} \le \|\mathrm{Unfold}_{\pi_2}(\mathcal{A})\|_{\sigma}.$$

Having introduced Definition 4.4, we can now state our main result on how the spectral norms of two arbitrary unfoldings of a tensor are related:

Theorem 4.8 (Spectral norm inequalities)

Let 𝒜 ∈ ℝ^{d_1×⋯×d_k} be an arbitrary order-k tensor, and π_1, π_2 any two partitions in 𝒫[k]. Then,

$$\left[ \frac{\dim(\mathcal{A})}{\dim_{\mathcal{A}}(\pi_1, \pi_2)} \right]^{-1/2} \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma} \le \|\mathrm{Unfold}_{\pi_2}(\mathcal{A})\|_{\sigma} \le \left[ \frac{\dim(\mathcal{A})}{\dim_{\mathcal{A}}(\pi_2, \pi_1)} \right]^{1/2} \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma}.$$

Remark 4.9

  (a) Note that π_1 and π_2 need not be comparable.

  (b) If d_n = d for all n ∈ [k], then the result reduces to
    $$d^{-c_1/2}\, \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma} \le \|\mathrm{Unfold}_{\pi_2}(\mathcal{A})\|_{\sigma} \le d^{c_2/2}\, \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma},$$
    where $c_1 = k - \sum_{B \in \pi_1} \max_{B' \in \pi_2} |B \cap B'|$ and $c_2 = k - \sum_{B \in \pi_2} \max_{B' \in \pi_1} |B \cap B'|$.
  (c) For k = 4, the inequalities in (b) are sharp. For example, consider the tensor 𝒜 = I_d ⊗ I_d, and partitions π = {{1, 2}, {3, 4}} and π′ = {{1}, {2}, {3}, {4}}, for which we have ‖Unfold_π(𝒜)‖σ = d and ‖Unfold_{π′}(𝒜)‖σ = 1. If π_1 = π and π_2 = π′, then c_1 = 2 and $d^{-c_1/2}\, \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma} = \|\mathrm{Unfold}_{\pi_2}(\mathcal{A})\|_{\sigma}$. On the other hand, if π_1 = π′ and π_2 = π, then c_2 = 2 and $\|\mathrm{Unfold}_{\pi_2}(\mathcal{A})\|_{\sigma} = d^{c_2/2}\, \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma}$. This particular tensor is discussed further in Example 4.15.

Proof of Theorem 4.8

The main idea is to apply Lemma 4.2 to some appropriate sequence of partitions connecting π1 and π2 in the partition lattice. To do so, we consider Unfoldπ1∧π2(𝒜) and compare its spectral norm to that of Unfoldπ1(𝒜). Since π1 ∧ π2 ≤ π1, from Proposition 4.1 we have

$$\|\mathrm{Unfold}_{\pi_1 \wedge \pi_2}(\mathcal{A})\|_{\sigma} \le \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma}. \tag{16}$$

Let $\pi_1 = \{B_i^{\pi_1} : i \in [\ell_1]\}$ and $\pi_2 = \{B_j^{\pi_2} : j \in [\ell_2]\}$, and note that $\pi_1 \wedge \pi_2 = \{B_i^{\pi_1} \cap B_j^{\pi_2} : i \in [\ell_1],\ j \in [\ell_2]\}$ (discarding empty intersections). Now, order the blocks in π_2 such that

$$D_{\mathcal{A}}(B_1^{\pi_1}, B_{m_1}^{\pi_2}) \ge D_{\mathcal{A}}(B_1^{\pi_1}, B_{m_2}^{\pi_2}) \ge \cdots \ge D_{\mathcal{A}}(B_1^{\pi_1}, B_{m_{\ell_2}}^{\pi_2}), \tag{17}$$

and define $B_{1,j} = B_1^{\pi_1} \cap B_{m_j}^{\pi_2}$. Consider a sequence $(\mathcal{T}_1, \mathcal{T}_2, \dots, \mathcal{T}_{\ell_2 - 1})$ of unfoldings of 𝒜, where 𝒯_j, for 1 ≤ j ≤ ℓ_2 − 1, is obtained from the tensor Unfold_{π_1∧π_2}(𝒜) by the unfolding operation that merges the blocks B_{1,1}, …, B_{1,j+1} into a single block. Using Lemma 4.2 and (17), we obtain

$$\|\mathrm{Unfold}_{\pi_1 \wedge \pi_2}(\mathcal{A})\|_{\sigma} \ge \min\big\{ D_{\mathcal{A}}(B_1^{\pi_1}, B_{m_1}^{\pi_2}),\ D_{\mathcal{A}}(B_1^{\pi_1}, B_{m_2}^{\pi_2}) \big\}^{-1/2}\, \|\mathcal{T}_1\|_{\sigma} = \big[ D_{\mathcal{A}}(B_1^{\pi_1}, B_{m_2}^{\pi_2}) \big]^{-1/2}\, \|\mathcal{T}_1\|_{\sigma}.$$

Similarly for all i ∈ [ℓ2 − 2],

$$\|\mathcal{T}_i\|_{\sigma} \ge \min\Big\{ D_{\mathcal{A}}(B_1^{\pi_1}, B_{m_{i+2}}^{\pi_2}),\ \prod_{j=1}^{i+1} D_{\mathcal{A}}(B_1^{\pi_1}, B_{m_j}^{\pi_2}) \Big\}^{-1/2}\, \|\mathcal{T}_{i+1}\|_{\sigma} = \big[ D_{\mathcal{A}}(B_1^{\pi_1}, B_{m_{i+2}}^{\pi_2}) \big]^{-1/2}\, \|\mathcal{T}_{i+1}\|_{\sigma}.$$

Combining these inequalities gives

$$\|\mathrm{Unfold}_{\pi_1 \wedge \pi_2}(\mathcal{A})\|_{\sigma} \ge \Big[ \prod_{j=2}^{\ell_2} D_{\mathcal{A}}(B_1^{\pi_1}, B_{m_j}^{\pi_2}) \Big]^{-1/2}\, \|\mathcal{T}_{\ell_2 - 1}\|_{\sigma} = \Bigg[ \frac{\prod_{j=1}^{\ell_2} D_{\mathcal{A}}(B_1^{\pi_1}, B_j^{\pi_2})}{\max_{j \in [\ell_2]} D_{\mathcal{A}}(B_1^{\pi_1}, B_j^{\pi_2})} \Bigg]^{-1/2}\, \|\mathcal{T}_{\ell_2 - 1}\|_{\sigma}.$$

We can iterate the same line of argument with $\{B_i^{\pi_1} \cap B_{m_j}^{\pi_2} : j \in [\ell_2]\}$ for i = 2, …, ℓ_1 to obtain

$$\begin{aligned} \|\mathrm{Unfold}_{\pi_1 \wedge \pi_2}(\mathcal{A})\|_{\sigma} &\ge \prod_{i=1}^{\ell_1} \Bigg[ \frac{\prod_{j=1}^{\ell_2} D_{\mathcal{A}}(B_i^{\pi_1}, B_j^{\pi_2})}{\max_{j \in [\ell_2]} D_{\mathcal{A}}(B_i^{\pi_1}, B_j^{\pi_2})} \Bigg]^{-1/2} \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma} \\ &= \Bigg[ \frac{\prod_{i=1}^{\ell_1} \prod_{j=1}^{\ell_2} D_{\mathcal{A}}(B_i^{\pi_1}, B_j^{\pi_2})}{\prod_{i=1}^{\ell_1} \max_{j \in [\ell_2]} D_{\mathcal{A}}(B_i^{\pi_1}, B_j^{\pi_2})} \Bigg]^{-1/2} \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma} \\ &= \Bigg[ \frac{\prod_{n \in [k]} d_n}{\dim_{\mathcal{A}}(\pi_1, \pi_2)} \Bigg]^{-1/2} \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma}. \end{aligned}$$

Together with (16), this last inequality means

$$\|\mathrm{Unfold}_{\pi_1 \wedge \pi_2}(\mathcal{A})\|_{\sigma} \le \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma} \le \Bigg[ \frac{\prod_{n \in [k]} d_n}{\dim_{\mathcal{A}}(\pi_1, \pi_2)} \Bigg]^{1/2} \|\mathrm{Unfold}_{\pi_1 \wedge \pi_2}(\mathcal{A})\|_{\sigma}. \tag{18}$$

By symmetry,

$$\|\mathrm{Unfold}_{\pi_1 \wedge \pi_2}(\mathcal{A})\|_{\sigma} \le \|\mathrm{Unfold}_{\pi_2}(\mathcal{A})\|_{\sigma} \le \Bigg[ \frac{\prod_{n \in [k]} d_n}{\dim_{\mathcal{A}}(\pi_2, \pi_1)} \Bigg]^{1/2} \|\mathrm{Unfold}_{\pi_1 \wedge \pi_2}(\mathcal{A})\|_{\sigma}. \tag{19}$$

Finally, combining (18) and (19) completes the proof.

We may immediately establish several corollaries of Theorem 4.8.

Corollary 4.10

All order-k tensors 𝒜 ∈ ℝ^{d_1×⋯×d_k} satisfy

$$\|\mathcal{A}\|_F \le \Bigg[ \frac{\dim(\mathcal{A})}{\max_{n \in [k]} d_n} \Bigg]^{1/2} \|\mathcal{A}\|_{\sigma}.$$

Proof

Taking π1 = 0[k] and π2 = 1[k] in Theorem 4.8 yields the result.

Corollary 4.10 gives the worst-case ratio of the Frobenius norm to the spectral norm of an arbitrary tensor. This bound is sharper than the one recently obtained by Friedland and Lim [5, Lemma 5.1], namely ‖𝒜‖F ≤ dim(𝒜)^{1/2} ‖𝒜‖σ.

We now give a set of inequalities comparing the spectral norms of unfoldings at level ℓ to that at either level k or level 1. For ease of exposition, we assume dn = d for all n ∈ [k].

Corollary 4.11 (Bottom-Up Inequality)

Let 𝒜 ∈ ℝ^{d×⋯×d} be an order-k tensor with the same dimension d in all modes. For all levels 1 ≤ ℓ ≤ k and partitions π ∈ 𝒫_ℓ[k],

$$d^{-(k-\ell)/2} \max_{\pi \in \mathcal{P}_{\ell}[k]} \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma} \le \|\mathcal{A}\|_{\sigma} \le \min_{\pi \in \mathcal{P}_{\ell}[k]} \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma}. \tag{20}$$

Proof

Take π2 = 0[k] in Theorem 4.8.

Remark 4.12

The existing work most closely related to our own is that of Hu [9], in which the author bounds the nuclear norm of a tensor by that of its matricization. Since the nuclear norm and spectral norm are dual to each other in tensor space, many of our results apply to the nuclear norm as well. In particular, letting ℓ = 2 in the bottom-up inequality (20) reproduces Hu's results.

Corollary 4.13 (Top-Down Inequality)

Let 𝒜 ∈ ℝ^{d×⋯×d} be an order-k tensor with the same dimension d in all modes. For all levels 1 ≤ ℓ ≤ k and partitions π ∈ 𝒫_ℓ[k],

$$d^{-\left( k - \max_{i \in [\ell]} |B_i^{\pi}| \right)/2}\, \|\mathcal{A}\|_F \le \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma} \le \|\mathcal{A}\|_F.$$

Proof

Take π1 = 1[k] in Theorem 4.8.

Corollary 4.14

Let 𝒜 ∈ ℝ^{d×⋯×d} be an order-k tensor with the same dimension d in all modes. For all levels 1 ≤ ℓ ≤ k,

$$d^{-\left( k - \lceil k/\ell \rceil \right)/2}\, \|\mathcal{A}\|_F \le \min_{\pi \in \mathcal{P}_{\ell}[k]} \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma}.$$

Proof

Note that the minimum of the maximal block sizes across all level-ℓ partitions of [k] is $\min_{\pi \in \mathcal{P}_{\ell}[k]} \max_{B \in \pi} |B| = \lceil k/\ell \rceil$, and apply Corollary 4.13.

The above corollaries bound the amount by which norms can vary over a specific level ℓ. They imply that the ratios ‖Unfold_π(𝒜)‖σ/‖𝒜‖σ and ‖Unfold_π(𝒜)‖σ/‖𝒜‖F fall in the intervals $[1, d^{(k-\ell)/2}]$ and $[d^{-(k - \lceil k/\ell \rceil)/2}, 1]$, respectively. Therefore, in the worst case, ‖Unfold_π(𝒜)‖σ only recovers ‖𝒜‖σ or ‖𝒜‖F at poly(d) precision. Note that the factor $d^{(k-\ell)/2}$ has an exponent linear in the difference between the orders of the original tensor and its flattening. This means that the potential deviation between their spectral norms depends only on the difference in their orders rather than the actual orders themselves, and that the deviation accumulates in multiplicative fashion, with a loss of a factor of $d^{1/2}$ in precision at each level. In contrast, the factor $d^{-(k - \lceil k/\ell \rceil)/2}$ depends on more than just the gap between k and ℓ, with a larger impact for unfoldings with orders close to k.

We provide a low-order example that reaches the poly(d) scaling factor in Corollary 4.11.

Example 4.15

Consider the order-4 tensor 𝒜 = I_d ⊗ I_d. A straightforward calculation shows that ‖𝒜‖σ = 1. Furthermore, by symmetry, the spectral norm of an unfolding induced by any partition in 𝒫_2[4] or 𝒫_3[4] must fall into one of the following five representative cases:

  • For π_1 = {{1}, {2}, {3, 4}} ∈ 𝒫_3[4], ‖Unfold_{π_1}(𝒜)‖σ = √d.

  • For π_2 = {{1}, {3}, {2, 4}} ∈ 𝒫_3[4], ‖Unfold_{π_2}(𝒜)‖σ = 1.

  • For π_3 = {{1, 2}, {3, 4}} ∈ 𝒫_2[4], ‖Unfold_{π_3}(𝒜)‖σ = d.

  • For π_4 = {{1}, {2, 3, 4}} ∈ 𝒫_2[4], ‖Unfold_{π_4}(𝒜)‖σ = √d.

  • For π_5 = {{1, 3}, {2, 4}} ∈ 𝒫_2[4], ‖Unfold_{π_5}(𝒜)‖σ = 1.

Therefore

$$\max_{\pi \in \mathcal{P}_3[4]} \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma} = d^{(4-3)/2}\, \|\mathcal{A}\|_{\sigma} \quad \text{and} \quad \min_{\pi \in \mathcal{P}_3[4]} \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma} = \|\mathcal{A}\|_{\sigma},$$
$$\max_{\pi \in \mathcal{P}_2[4]} \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma} = d^{(4-2)/2}\, \|\mathcal{A}\|_{\sigma} \quad \text{and} \quad \min_{\pi \in \mathcal{P}_2[4]} \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma} = \|\mathcal{A}\|_{\sigma}.$$
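The matrix unfoldings in this example have exact spectral norms via the SVD, so those cases are easy to confirm numerically (our own check; the order-3 cases π_1 and π_2 can be estimated with the power-iteration sketch following Proposition 4.1):

    import numpy as np

    d = 4
    I = np.eye(d)
    A = np.einsum('ij,kl->ijkl', I, I)         # A = I_d (x) I_d, a_ijkl = delta_ij delta_kl

    def unfold(A, pi):
        order = [m for block in pi for m in block]
        shape = [int(np.prod([A.shape[m] for m in block])) for block in pi]
        return A.transpose(order).reshape(shape)

    checks = [([[0, 1], [2, 3]], d),           # pi3: spectral norm d
              ([[0], [1, 2, 3]], np.sqrt(d)),  # pi4: spectral norm sqrt(d)
              ([[0, 2], [1, 3]], 1.0)]         # pi5: spectral norm 1
    for pi, expected in checks:
        assert np.isclose(np.linalg.norm(unfold(A, pi), 2), expected)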

We conclude this section by generalizing Theorem 4.8 to lp-norms.

Theorem 4.16 (lp-norm inequalities)

Let 𝒜 ∈ ℝ^{d_1×⋯×d_k} be an arbitrary order-k tensor, and π_1, π_2 any two partitions in 𝒫[k]. Then,

  (a) For any 1 ≤ p ≤ 2,
    $$[\dim(\mathcal{A})]^{-1/p}\, [\dim_{\mathcal{A}}(\pi_1, \pi_2)]^{1/2}\, \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_p \le \|\mathrm{Unfold}_{\pi_2}(\mathcal{A})\|_p \le [\dim(\mathcal{A})]^{1/p}\, [\dim_{\mathcal{A}}(\pi_2, \pi_1)]^{-1/2}\, \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_p.$$
  (b) For any 2 ≤ p ≤ ∞,
    $$[\dim(\mathcal{A})]^{\frac{1}{p} - 1}\, [\dim_{\mathcal{A}}(\pi_1, \pi_2)]^{1/2}\, \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_p \le \|\mathrm{Unfold}_{\pi_2}(\mathcal{A})\|_p \le [\dim(\mathcal{A})]^{1 - \frac{1}{p}}\, [\dim_{\mathcal{A}}(\pi_2, \pi_1)]^{-1/2}\, \|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_p.$$

Proof

We only prove (a) since (b) follows similarly. For any given 1 ≤ p ≤ 2, taking q = 2 in Proposition 2.4 implies that the bound between the lp-norm and spectral norm depends only on the total dimension of the tensor, dim(𝒜) = ∏n∈[k] dn. Because the total dimension is invariant under any unfolding operation, we have

$$\dim(\mathcal{A})^{\frac{1}{2} - \frac{1}{p}}\, \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma} \le \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_p \le \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma}, \tag{21}$$

for all π ∈ 𝒫[k]. Combining (21) with Theorem 4.8 gives the desired results.

5. Orthogonal decomposability and norm equality on upper cones

We have seen that, for an arbitrary 𝒜, an unfolding operation may change the spectral norm by up to a poly(d) factor. This is undesirable in many flattening-based algorithms, such as [3, 25]. However, for some specially-structured tensors, the operator norm on the partition lattice may not change much, either globally or locally. We demonstrate this behavior for the following class of tensors:

Definition 5.1 (π-orthogonal decomposable)

Let 𝒜 ∈ ℝ^{d_1×⋯×d_k} be an order-k tensor and consider any partition π ∈ 𝒫[k]. Then 𝒜 is called π-orthogonal decomposable, or π-OD, over ℝ if it admits the decomposition

$$\mathcal{A} = \sum_{n=1}^{r} \lambda_n\, a_1^{(n)} \otimes \cdots \otimes a_k^{(n)}, \tag{22}$$

where λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_r ≥ 0, and the set of vectors $\{a_i^{(n)} \in \mathbb{R}^{d_i} : i \in [k],\ n \in [r]\}$ satisfies

$$\Big\langle \bigotimes_{i \in B} a_i^{(n)},\ \bigotimes_{i \in B} a_i^{(m)} \Big\rangle = \delta_{nm},$$

for all B ∈ π and all n, m ∈ [r].

A concept similar to π-OD, referred to as biorthogonal eigentensor decomposition [28], is introduced in the tensor completion literature when k = 3 and π = {{1}, {2, 3}}. Informally speaking, π-OD imposes an orthogonality constraint on every block of singular vectors.

Remark 5.2 (0[k]-OD)

When π = 0[k] in Definition 5.1, we obtain the special case of 0[k]-OD tensors, which admit the decomposition (22) while satisfying

$$\big\langle a_i^{(n)}, a_i^{(m)} \big\rangle = \delta_{nm},$$

for all i ∈ [k] and all n, m ∈ [r].

The definition of 0[k]-OD tensors generalizes the definition of orthogonal decomposable tensors presented in [21], as we require neither symmetry nor equality of dimension across modes. In fact, a 0[k]-OD tensor 𝒜 is a diagonalizable tensor [3], meaning that the core tensor output from higher-order SVD is superdiagonal (i.e., entries are zero unless i1 = ⋯ = ik).

Lemma 5.3

Consider an order-k tensor 𝒜 ∈ ℝ^{d_1×⋯×d_k}.

  (a) Let π_1, π_2 ∈ 𝒫[k] with π_1 ≤ π_2. If 𝒜 is π_1-OD, then 𝒜 is π_2-OD.

  (b) Let π ∈ 𝒫[k] with π ≠ 1_{[k]}. If 𝒜 is π-OD, then
    $$\|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma} = \lambda_1. \tag{23}$$

Proof

Part (a): For any two finite sets of vectors {x_i} and {y_i} with x_i, y_i ∈ ℝ^{d_i}, we have

$$\Big\langle \bigotimes_{i \in B} x_i,\ \bigotimes_{j \in B} y_j \Big\rangle = \prod_{i \in B} \langle x_i, y_i \rangle, \tag{24}$$

for all B ⊂ [k]. Suppose B ∈ π2. If π1 ≤ π2, then there exist subsets C1, …, Cm ∈ π1 such that C1 ∪ ⋯ ∪ Cm = B. So,

$$\Big\langle \bigotimes_{i \in B} x_i,\ \bigotimes_{j \in B} y_j \Big\rangle = \prod_{i \in B} \langle x_i, y_i \rangle = \prod_{a=1}^{m} \prod_{i \in C_a} \langle x_i, y_i \rangle = \prod_{a=1}^{m} \Big\langle \bigotimes_{i \in C_a} x_i,\ \bigotimes_{j \in C_a} y_j \Big\rangle,$$

which implies that 𝒜 is π2-OD if 𝒜 is π1-OD.

Part (b): Suppose π ∈ 𝒫[k] is of the form $\pi = \{B_i^{\pi} : i \in [\ell]\}$. Note that π ≠ 1_{[k]} implies ℓ ≥ 2. Letting $\tau = \{B_1^{\pi}, (B_1^{\pi})^c\}$, where $(B_1^{\pi})^c$ denotes the complement of $B_1^{\pi}$ with respect to [k], we have $\tau \in \mathcal{P}_2[k]$ and π ≤ τ. By Lemma 5.3(a), 𝒜 is τ-OD, so 𝒜 admits a decomposition of the form

$$\mathcal{A} = \sum_{n=1}^{r} \lambda_n\, a_1^{(n)} \otimes \cdots \otimes a_k^{(n)}, \tag{25}$$

where

$$\Big\langle \bigotimes_{i \in B_1^{\pi}} a_i^{(n)},\ \bigotimes_{j \in B_1^{\pi}} a_j^{(m)} \Big\rangle = \delta_{nm} \quad \text{and} \quad \Big\langle \bigotimes_{i \in (B_1^{\pi})^c} a_i^{(n)},\ \bigotimes_{j \in (B_1^{\pi})^c} a_j^{(m)} \Big\rangle = \delta_{nm} \tag{26}$$

for all n, m ∈ [r].

Now define $x_n = \mathrm{Vec}\big( \bigotimes_{i \in B_1^{\pi}} a_i^{(n)} \big)$ and $y_n = \mathrm{Vec}\big( \bigotimes_{i \in (B_1^{\pi})^c} a_i^{(n)} \big)$ for all n ∈ [r]. By (26), both {x_n} and {y_n} are sets of orthonormal vectors. By the definition of Unfold_τ(𝒜), (25) implies

$$\mathrm{Unfold}_{\tau}(\mathcal{A}) = \sum_{n=1}^{r} \lambda_n\, x_n y_n^{\mathsf T},$$

which is simply the matrix SVD of Unfoldτ (𝒜). Hence ‖Unfoldτ (𝒜)‖σ = λ1. Using monotonicity (c.f., Proposition 4.1), we have

$$\|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma} \le \|\mathrm{Unfold}_{\tau}(\mathcal{A})\|_{\sigma} = \lambda_1. \tag{27}$$

Conversely, by the definition of the spectral norm, we have

$$\begin{aligned} \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma} &\ge \Big\langle \mathrm{Unfold}_{\pi}(\mathcal{A}),\ \mathrm{Vec}\Big( \bigotimes_{i \in B_1^{\pi}} a_i^{(1)} \Big) \otimes \cdots \otimes \mathrm{Vec}\Big( \bigotimes_{i \in B_{\ell}^{\pi}} a_i^{(1)} \Big) \Big\rangle \\ &= \sum_{n=1}^{r} \lambda_n \Big\langle \mathrm{Vec}\Big( \bigotimes_{i \in B_1^{\pi}} a_i^{(n)} \Big) \otimes \cdots \otimes \mathrm{Vec}\Big( \bigotimes_{i \in B_{\ell}^{\pi}} a_i^{(n)} \Big),\ \mathrm{Vec}\Big( \bigotimes_{i \in B_1^{\pi}} a_i^{(1)} \Big) \otimes \cdots \otimes \mathrm{Vec}\Big( \bigotimes_{i \in B_{\ell}^{\pi}} a_i^{(1)} \Big) \Big\rangle \\ &= \sum_{n=1}^{r} \lambda_n \prod_{j \in [\ell]} \Big\langle \mathrm{Vec}\Big( \bigotimes_{i \in B_j^{\pi}} a_i^{(n)} \Big),\ \mathrm{Vec}\Big( \bigotimes_{i \in B_j^{\pi}} a_i^{(1)} \Big) \Big\rangle \\ &= \sum_{n=1}^{r} \lambda_n\, \delta_{n,1} = \lambda_1, \end{aligned} \tag{28}$$

where the third line comes from (24) and the last line follows from the fact that 𝒜 is π-OD. Combining (27) and (28), we conclude ‖Unfoldπ(𝒜)‖σ = λ1.

Remark 5.4

The condition π ≠ 1_{[k]} in Lemma 5.3 is needed for (23) to hold. In fact, consider the 2 × 2 matrix A = 2e_1 ⊗ e_1 + e_1 ⊗ e_2, where {e_i : i ∈ [2]} is the canonical basis of ℝ². Then A is 1_{[2]}-OD with λ_1 = 2, but $\|\mathrm{Unfold}_{1_{[2]}}(A)\|_{\sigma} = \|A\|_F = \sqrt{5} \neq 2$.

Theorem 5.5 (Norm equality on upper cones)

If 𝒜 is π-OD, then for any partition τ in the upper cone $U_{\pi} = \{\tau \in \mathcal{P}[k] : \pi \le \tau < 1_{[k]}\}$ of π, we have

$$\|\mathrm{Unfold}_{\tau}(\mathcal{A})\|_{\sigma} = \|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma}.$$

Proof

If 𝒜 is π-OD, then by Lemma 5.3(b), we have ‖Unfoldπ(𝒜)‖σ = λ1. Given any τ ≥ π, by Lemma 5.3(a), 𝒜 is also τ-OD. Again, using Lemma 5.3(b), we have ‖Unfoldτ (𝒜)‖σ = λ1. Therefore, ‖Unfoldτ (𝒜)‖σ = ‖Unfoldπ(𝒜)‖σ.

Theorem 5.5 states that the spectral norm is invariant for π-OD 𝒜 under any unfolding induced by the partitions in the upper cone Uπ of π. This lies in contrast with the poly(d) factor we have seen for unstructured tensors.
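As a numeric illustration of Theorem 5.5 (ours, not from the paper), one can assemble a 0_{[3]}-OD tensor from orthonormal factors and confirm that every matricization (every level-2 partition in the upper cone of 0_{[3]}) has spectral norm λ_1, in line with Corollary 5.6 below:

    import numpy as np

    rng = np.random.default_rng(5)
    d, r = 6, 4
    lam = np.sort(rng.uniform(1.0, 3.0, r))[::-1]   # lambda_1 >= ... >= lambda_r > 0
    # One orthonormal set of factor vectors per mode (columns of each Q).
    Qs = [np.linalg.qr(rng.standard_normal((d, r)))[0] for _ in range(3)]
    A = np.einsum('n,in,jn,kn->ijk', lam, *Qs)      # A = sum_n lam_n a1 (x) a2 (x) a3

    def unfold(A, pi):
        order = [m for block in pi for m in block]
        shape = [int(np.prod([A.shape[m] for m in block])) for block in pi]
        return A.transpose(order).reshape(shape)

    # Every level-2 unfolding (matricization) has spectral norm lambda_1
    # for this 0_[3]-OD tensor.
    for pi in ([[0], [1, 2]], [[1], [0, 2]], [[2], [0, 1]]):
        assert np.isclose(np.linalg.norm(unfold(A, pi), 2), lam[0])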

Corollary 5.6

If 𝒜 is 0[k]-OD, then for all partitions π ≠ 1[k], we have

$$\|\mathrm{Unfold}_{\pi}(\mathcal{A})\|_{\sigma} = \|\mathcal{A}\|_{\sigma}.$$

Corollary 5.6 implies that for 0_{[k]}-OD tensors, the operator norm is invariant under any unfolding operation except vectorization. Lastly, the fact that π_1, π_2 ∈ U_{π_1 ∧ π_2} implies the following corollary:

Corollary 5.7

Let π_1, π_2 ∈ 𝒫[k]. If 𝒜 is (π_1 ∧ π_2)-OD, then

$$\|\mathrm{Unfold}_{\pi_1}(\mathcal{A})\|_{\sigma} = \|\mathrm{Unfold}_{\pi_2}(\mathcal{A})\|_{\sigma}.$$

6. Discussion

In this paper, we presented a new framework representing all possible tensor unfoldings by the partition lattice and established a set of general inequalities quantifying the impact of tensor unfoldings on the operator norms of the resulting tensors. We showed that the comparison bounds scale polynomially in the dimensions {dn} of the tensor, with powers depending on the corresponding partition and block sizes for any pair of tensor unfoldings being compared. As a direct consequence, we demonstrated how the operator norm of a general tensor is lower and upper bounded by that of its unfoldings.

In general, an unfolding operation may inflate the operator norm by up to a poly(d) factor, as seen in Corollary 4.11. Note that the quantity dim(𝒜) plays a key role in the worst-case inflation factor and is a manifestation of the curse of dimensionality. Specifically, dim(𝒜) can be quite large as the mode dimensions and tensor order increase, with particular sensitivity to the latter. In such settings, our main result seems to bode poorly for flattening-based algorithms; however, we believe that it should be interpreted with caution because our comparison bounds deal with arbitrary tensors rather than those often sought in applications. In fact, π-OD tensors permit much tighter bounds in which some unfoldings, including certain matricizations, leave the operator norm relatively unaffected. In practice, π-OD tensors, or those within a small neighborhood around π-OD tensors, arise widely in statistical and machine learning applications [1, 11, 16, 26].

Additionally, our work enables us to compare different unfoldings at the same level ℓ. Recent work on problems featuring nuclear-norm regularization has shown that not all n-mode flattenings are equally preferable [13]. Indeed, as illustrated in Example 4.15, the operator norm of level-2 unfoldings (i.e. matricizations) can be quite different. Recently, several algorithms have been proposed to account for this behavior. For example, Tomioka et al. [23] consider a weighted sum of the norms of all single-mode matricizations. Other techniques include two-mode matricization [26] and square matricization [15, 20], in which the original tensor is reshaped into a matrix by flattening along multiple modes. Our work provides general bounds to evaluate the effectiveness of such schemes. In particular, the results presented here are used in the theoretical analysis of a two-mode higher-order SVD algorithm proposed recently [26].

We have not attempted to characterize the degree to which operator norm relations on the partition lattice restrict the original tensor. Essentially, this is a converse problem asking whether π-OD is a necessary condition for Theorem 5.5 and Corollary 5.6 in addition to being sufficient. If not, it would be useful to determine the extent to which such equalities inform us about the intrinsic structure of the original tensor. From a practical standpoint, norm comparisons between different matricizations are relatively simple, but the optimal manner in which to use this information to learn about the original tensor remains unknown.

In closing, we emphasize that while this work focuses on theory rather than computational tractability, it possesses practical implications as well. Because direct calculation of the operator norm of a level-ℓ tensor is generally computationally prohibitive for ℓ ≥ 3, exploiting level-2 unfoldings may be attractive when the unfolding effect is small enough. Alternatively, for more precise calculations, a number of approximation algorithms exist for higher-order tensor problems [23, 27] at the cost of increased computation. Given that the trade-off between accuracy and computation is often unavoidable, our work may be of help in finding an appropriate application-specific balance when working with higher-order tensors.

Acknowledgments

This research is supported in part by a Math+X Research Grant from the Simons Foundation, a Packard Fellowship for Science and Engineering, and an NIH training grant T32-HG000047.


References

  • 1. Anandkumar A, Ge R, Hsu D, Kakade SM, Telgarsky M. Tensor decompositions for learning latent variable models. The Journal of Machine Learning Research. 2014;15(1):2773–2832.
  • 2. Cardoso J-F. Eigen-structure of the fourth-order cumulant tensor with application to the blind source separation problem. International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 1990:2655–2658.
  • 3. De Lathauwer L, De Moor B, Vandewalle J. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications. 2000;21(4):1253–1278.
  • 4. Friedland S, Lim L-H. Computational complexity of tensor nuclear norm. SIAM Journal on Optimization. 2016;26(4):2378–2393.
  • 5. Friedland S, Lim L-H. Nuclear norm of higher-order tensors. Mathematics of Computation. (to appear)
  • 6. Hillar CJ, Lim L-H. Most tensor problems are NP-hard. Journal of the ACM. 2013;60(6): Article 45.
  • 7. Hoff PD. Multilinear tensor regression for longitudinal relational data. The Annals of Applied Statistics. 2015;9(3):1169–1193. doi: 10.1214/15-AOAS839.
  • 8. Horn RA, Johnson CR. Matrix Analysis. Cambridge University Press; 2012.
  • 9. Hu S. Relations of the nuclear norm of a tensor and its matrix flattenings. Linear Algebra and its Applications. 2015;478:188–199.
  • 10. Kroonenberg PM. Three-Mode Principal Component Analysis: Theory and Applications. Vol. 2. DSWO Press; 1983.
  • 11. Kuleshov V, Chaganty A, Liang P. Tensor factorization via matrix factorization. Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS). 2015:507–516.
  • 12. Lim L-H. Singular values and eigenvalues of tensors: a variational approach. Proceedings of the IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP). 2005:129–132.
  • 13. Liu J, Musialski P, Wonka P, Ye J. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2013;35(1):208–220. doi: 10.1109/TPAMI.2012.39.
  • 14. McCullagh P. Tensor notation and cumulants of polynomials. Biometrika. 1984;71(3):461–476.
  • 15. Mu C, Huang B, Wright J, Goldfarb D. Square deal: Lower bounds and improved relaxations for tensor recovery. Proceedings of the 31st International Conference on Machine Learning (ICML). 2014:73–81.
  • 16. Nickel M, Tresp V, Kriegel H-P. A three-way model for collective learning on multi-relational data. Proceedings of the 28th International Conference on Machine Learning (ICML). 2011:809–816.
  • 17. Omberg L, Golub GH, Alter O. A tensor higher-order singular value decomposition for integrative analysis of DNA microarray data from different studies. Proceedings of the National Academy of Sciences. 2007;104(47):18371–18376. doi: 10.1073/pnas.0709146104.
  • 18. Qi L. Eigenvalues of a real supersymmetric tensor. Journal of Symbolic Computation. 2005;40(6):1302–1324.
  • 19. Ragnarsson S, Van Loan CF. Block tensor unfoldings. SIAM Journal on Matrix Analysis and Applications. 2012;33(1):149–169.
  • 20. Richard E, Montanari A. A statistical model for tensor PCA. Advances in Neural Information Processing Systems (NIPS). 2014;27:2897–2905.
  • 21. Robeva E. Orthogonal decomposition of symmetric tensors. SIAM Journal on Matrix Analysis and Applications. 2016;37:86–102.
  • 22. Sun J, Tao D, Faloutsos C. Beyond streams and graphs: dynamic tensor analysis. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006:374–383.
  • 23. Tomioka R, Hayashi K, Kashima H. Estimation of low-rank tensors via convex optimization. 2010. arXiv:1010.0789.
  • 24. Tomioka R, Suzuki T. Spectral norm of random tensors. 2014. arXiv:1407.1870.
  • 25. Vasilescu MAO, Terzopoulos D. Multilinear analysis of image ensembles: TensorFaces. Proceedings of the European Conference on Computer Vision (ECCV). 2002:447–460.
  • 26. Wang M, Song YS. Orthogonal tensor decompositions via two-mode higher-order SVD (HOSVD). 2016. arXiv:1612.03839.
  • 27. Yu Y, Cheng H, Zhang X. Approximate low-rank tensor learning. 7th NIPS Workshop on Optimization for Machine Learning. 2014.
  • 28. Yuan M, Zhang C-H. On tensor completion via nuclear norm minimization. Foundations of Computational Mathematics. 2015:1–38.
  • 29. Zhao L, Zaki MJ. TriCluster: an effective algorithm for mining coherent clusters in 3D microarray data. Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. 2005:694–705.
