Proceedings of the National Academy of Sciences of the United States of America
2021 Jul 7;118(28):e2015851118. doi: 10.1073/pnas.2015851118

Tensor-tensor algebra for optimal representation and compression of multiway data

Misha E Kilmer a, Lior Horesh b, Haim Avron c, Elizabeth Newman d,1
PMCID: PMC8285895  PMID: 34234014

Significance

Many real-world data are inherently multidimensional; however, data are often processed as two-dimensional arrays (matrices), even when they are naturally represented in higher dimensions. The common practice of matricizing high-dimensional data is due to the ubiquitousness and strong theoretical foundations of matrix algebra. Various tensor-based approximations have been proposed to exploit high-dimensional correlations. While these high-dimensional techniques have been effective in many applications, none have been theoretically proven to outperform matricization generically. In this study, we propose matrix-mimetic, tensor-algebraic formulations to preserve and process data in their native, multidimensional format. For a general family of tensor algebras, we prove the superiority of optimal truncated tensor representations over traditional matrix-based representations, with implications for other related tensorial frameworks.

Keywords: tensor, compression, multiway data, SVD, rank

Abstract

With the advent of machine learning and its overarching pervasiveness, it is imperative to devise ways to represent large datasets efficiently while distilling intrinsic features necessary for subsequent analysis. The primary workhorse used in data dimensionality reduction and feature extraction has been the matrix singular value decomposition (SVD), which presupposes that data have been arranged in matrix format. A primary goal in this study is to show that high-dimensional datasets are more compressible when treated as tensors (i.e., multiway arrays) and compressed via tensor-SVDs under the tensor-tensor product construct and its generalizations. We begin by proving Eckart–Young optimality results for families of tensor-SVDs under two different truncation strategies. Since such optimality properties can be proven in both matrix and tensor-based algebras, a fundamental question arises: Does the tensor construct subsume the matrix construct in terms of representation efficiency? The answer is positive, as proven by showing that a tensor-tensor representation of an equal-dimensional spanning space can be superior to its matrix counterpart. We then use these optimality results to investigate how the compressed representation provided by the truncated tensor SVD is related, both theoretically and empirically, to its two closest tensor-based analogs, the truncated higher-order SVD and the truncated tensor-train SVD.

1. Introduction

A. Overview.

Following the discovery of the spectral decomposition by Lagrange in 1762, the singular value decomposition (SVD) was discovered independently by Beltrami and Jordan in 1873 and 1874, respectively. Further generalizations of the decomposition came independently from Sylvester in 1889 and from Autonne in 1915 (1). Perhaps the most notable theoretical result associated with the decomposition is due to Eckart and Young, who provided the first optimality proof for the decomposition in 1936 (2).

The use of SVD in data analysis is ubiquitous. From a statistical point of view, the singular vectors of a (mean-subtracted) data matrix represent the principal component directions, i.e., the directions in which there is maximum variance corresponding to the largest singular values. However, the SVD is historically well-motivated by spectral analysis of linear operators. In typical data analysis tasks, matrices are treated as rectangular arrays of data which may not correspond directly to either statistical interpretations or representations of linear transforms. Hence, the prevalence of the SVD in data analysis applications requires further investigation.

The utility of the SVD in the context of data analysis is due to two key factors: the aforementioned Eckart–Young theorem (also known as the Eckart–Young–Mirsky theorem) and the fact that the SVD (or, in some cases, a partial decomposition or high-fidelity approximation) can be computed efficiently relative to the matrix dimensions and/or the desired rank of the partial decomposition (see, e.g., ref. 3 and references therein). Formally, the Eckart–Young theorem gives the solution to the problem of finding the best (in the Frobenius norm or 2-norm) rank-k approximation to a matrix of rank greater than k in terms of the first k terms of the SVD expansion of that matrix. The theorem implies, in some informal sense, that the majority of the informational content is captured by the dominant singular subspaces (i.e., the span of the singular vectors corresponding to the largest singular values), opening the door to compression, efficient representation, denoising, and so on. The SVD motivates modeling data as matrices, even in cases where a more natural model is a high-dimensional array (a tensor); the flattening of a tensor into a matrix is known as matricization.
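To make the matrix Eckart–Young statement concrete, the following short NumPy sketch (the example matrix, sizes, and seed are ours) builds the best rank-k approximation from the first k terms of the SVD and confirms that the Frobenius error equals the norm of the discarded singular values.

```python
import numpy as np

# Best rank-k approximation of a matrix via its truncated SVD (Eckart-Young).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))             # example data matrix
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]   # first k terms of the SVD expansion

# Frobenius-norm error equals the norm of the discarded singular values.
err = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(err, np.linalg.norm(s[k:])))  # True
```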

Intuitively, there is an inherent disadvantage to matricization of data that can naturally be represented as a tensor. For example, a grayscale image is intrinsically represented as a matrix of numbers, but a video is natively represented as a tensor since there is an additional dimension: time. Preservation of the dimensional integrity of the data can be imperative for subsequent analysis, e.g., to account for high-dimensional correlations embedded in the structures in which the data are organized. Even so, in practice, there is often a surprising dichotomy between the data representation and the algebraic constructs employed for its processing. Thus, in the last century there have been efforts to define decompositions of tensorial structures, e.g., CANDECOMP/PARAFAC (CP) (4–6), Tucker (7), higher-order SVD (HOSVD) (8), and Tensor-Train (9). However, none of the aforementioned decompositions offers an Eckart–Young-like optimality result.

In this study, we attempt to close this gap by proving a general Eckart–Young optimality theorem for a truncated tensor representation. An Eckart–Young-like theorem must revolve around some tensor decomposition and a metric. We consider tensor decompositions built around the idea of the t-product presented in ref. 10 and the tensor-tensor product extensions of a similar vein in ref. 11. In refs. 10 and 11 the authors define tensor-tensor products between third-order tensors and the corresponding algebra in which notions of identity, orthogonality, and transpose are all well defined. For the t-product in ref. 10, the authors show there exists an Eckart–Young type of optimality result in the Frobenius norm. Other popular tensor decompositions [e.g., Tucker, HOSVD, CP, Tensor-Train SVD (TT-SVD) (4, 7–9, 12)] do not lend themselves easily to a similar analysis, either because direct truncation does not lead to an optimal lower-term approximation or because they lack analogous definitions of orthogonality and energy preservation.

In this paper, for the generalization of the t-product, the M-product (11), we prove two Eckart–Young optimality results. Importantly, we prove the superiority of the corresponding compressed tensor representation to the matrix compressed representation of the same data. This is a theoretical proof of superiority of compression of multiway data using tensor decomposition as opposed to treating the data as a matrix. Leveraging our algebraic framework, we show that the HOSVD can be interpreted as a special case of the class of tensor-tensor products we consider. As such, we are able to apply our tensor Eckart–Young results to explain why truncation of the HOSVD will, in general, not give an optimal compressed representation when compared to the proposed truncated tensor SVD approach. Moreover, we show how our Eckart–Young results can be applied in the context of comparison of the proposed tensor-tensor framework to a truncated TT-SVD approximation.

The M-product-based algebra is orientation-dependent, meaning that the ability to compress the data in a meaningful way depends on the orientation of the tensor (i.e., how the data are organized in the tensor). For example, a collection of grayscale images of size m×n can be placed into a tensor as lateral slices or be rotated first and then placed as lateral slices, and so on. In many applications, there are good reasons to keep the second, lateral, dimension fixed (i.e., representing time, total number of samples, etc.), but in others there may be no obvious merit to preferentially treat one of the other two dimensions. Thus, we also consider variants of the t-product approach that offer optimal approximations to the tensorized data without handling one spatial orientation differently than another.

We primarily limit the discussion to third-order tensors, though in the final section we discuss how the ideas generalize to higher-order as well. Indeed, the potential for even greater compression gain for higher-order representations of the data exists, provided the data have higher-dimensional correlations to be exploited.

B. Paper Organization.

In Section 2, we give background notation and definitions. In Section 3, we define decompositions, the t-SVDM and its variant, the t-SVDMII, and prove an Eckart–Young-like theorem for each. In Section 5, we employ these theorems to prove the superior representation of the t-SVDM and t-SVDMII compared to the matrix SVD. To provide intuition, Section 4 and Section 5 discuss when and why some data are more amenable to optimal representation through the use of the truncated t-SVDM than through the matrix SVD. In Section 6, we relate the M-framework to other related tensorial frameworks, including HOSVD and TT-SVD. Section 7 discusses multisided tensor compression. Section 8 contains a numerical study and highlights extensions to higher-order data. A summary and future work are the subjects of Section 9.

2. Background

For the purposes of this paper, a tensor is a multidimensional array, and the order of the tensor is defined as the number of dimensions of this array. As we are concerned with third-order tensors throughout most of the discussion, we limit our notation and definitions to the third-order case here and generalize to higher order in the final section.

A. Notation and Indexing

A third-order tensor $\mathcal{A}$ is an object in $\mathbb{C}^{m\times p\times n}$. Its Frobenius norm, $\|\mathcal{A}\|_F$, is analogous to the matrix case; that is, $\|\mathcal{A}\|_F^2=\sum_{i,j,k}|\mathcal{A}_{i,j,k}|^2$. We use MATLAB notation for entries: $\mathcal{A}_{i,j,k}$ denotes the entry at row $i$ and column $j$ of the matrix going $k$ "inward." The fibers of a tensor $\mathcal{A}$ are defined by fixing two indices. Of note are the tube fibers, written as $\mathcal{A}_{i,j,:}$ or $\mathbf{a}_{ij}$, $i=1{:}m$, $j=1{:}p$. A slice of a third-order tensor $\mathcal{A}$ is a two-dimensional array defined by fixing one index. Of particular note are the frontal and lateral slices, as depicted in Fig. 1. The $i$th frontal slice is expressed as $\mathcal{A}_{:,:,i}$ and also referenced as $A^{(i)}$ for convenience in later definitions. The $j$th lateral slice is $\mathcal{A}_{:,j,:}$, equivalently expressed as $\vec{\mathcal{A}}_j$.

Fig. 1. Fibers and slices of an $m\times p\times n$ tensor $\mathcal{A}$. Left to right: frontal slices, denoted $A^{(i)}$ or $\mathcal{A}_{:,:,i}$; lateral slices, denoted $\vec{\mathcal{A}}_j$ or $\mathcal{A}_{:,j,:}$; tube fibers $\mathbf{a}_{ij}$ or $\mathcal{A}_{i,j,:}$.

Other notation that we use for convenience includes the vec and reshape operators, which map matrices to vectors by column unwrapping, and vice versa:

$$\mathbf{a}=\operatorname{vec}(A)\in\mathbb{C}^{mn}\quad\Longleftrightarrow\quad A=\operatorname{reshape}(\mathbf{a},[m,n]).$$

We can also define invertible mappings between $m\times n$ matrices and $m\times 1\times n$ tensors by twisting and squeezing* (13): i.e., $X\in\mathbb{C}^{m\times n}$ is related to $\vec{\mathcal{X}}\in\mathbb{C}^{m\times 1\times n}$ via

$$\vec{\mathcal{X}}=\operatorname{twist}(X)\qquad\text{and}\qquad X=\operatorname{sq}(\vec{\mathcal{X}}).$$

The mode-1, mode-2, and mode-3 unfoldings of $\mathcal{A}\in\mathbb{C}^{m\times p\times n}$ are $m\times pn$, $p\times mn$, and $n\times mp$ matrices, respectively, given by

$$\mathbf{A}_{(1)}\equiv[A^{(1)},\ldots,A^{(n)}],\qquad \mathbf{A}_{(2)}\equiv[(A^{(1)})^{\top},\ldots,(A^{(n)})^{\top}],\qquad \mathbf{A}_{(3)}\equiv[\operatorname{sq}(\mathcal{A}_{:,1,:})^{\top},\operatorname{sq}(\mathcal{A}_{:,2,:})^{\top},\ldots,\operatorname{sq}(\mathcal{A}_{:,p,:})^{\top}].\qquad[1]$$

These are useful in defining modewise tensor-matrix products (see ref. 12). For example, $\mathcal{A}\times_3 M$ for an $r\times n$ matrix $M$ is equivalent to computing the matrix-matrix product $M\mathbf{A}_{(3)}$, which is an $r\times mp$ matrix, and then reshaping the result into an $m\times p\times r$ tensor.
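As an illustration of the modewise product just described, the NumPy sketch below (function name and example sizes are ours) applies a matrix $M$ along the third mode and checks the tube-fiber interpretation; the einsum form avoids committing to a particular column ordering of the unfolding.

```python
import numpy as np

def mode3_product(A, M):
    """Compute A x_3 M: apply the matrix M along the third (tube) dimension.

    A is m x p x n and M is r x n; the result is m x p x r, equivalent to
    reshaping M @ A_(3) back into a tensor.
    """
    # (A x_3 M)[i, j, l] = sum_k M[l, k] * A[i, j, k]
    return np.einsum('lk,ijk->ijl', M, A)

# small sanity check against the tube-fiber definition
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3, 5))
M = rng.standard_normal((5, 5))
Ahat = mode3_product(A, M)
print(np.allclose(Ahat[2, 1, :], M @ A[2, 1, :]))   # True: M acts on each tube fiber
```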

The HOSVD (8) can be expressed as

$$\mathcal{A}=\mathcal{C}\times_1 U\times_2 V\times_3 W,\qquad[2]$$

where $U,V,W$ are the matrices containing the left singular vectors, corresponding to nonzero singular values, of the matrix SVDs of $\mathbf{A}_{(1)},\mathbf{A}_{(2)},\mathbf{A}_{(3)}$, respectively. For an $m\times p\times n$ tensor, $U$ is $m\times r_1$, $V$ is $p\times r_2$, and $W$ is $n\times r_3$, where $r_1,r_2,r_3$ are the ranks of the three respective unfoldings. Correspondingly, we say the tensor has HOSVD rank $(r_1,r_2,r_3)$. The $r_1\times r_2\times r_3$ core tensor is given by $\mathcal{C}\equiv\mathcal{A}\times_1 U^H\times_2 V^H\times_3 W^H$. While the columns of the factor matrices are orthonormal, the core need not be diagonal, and its entries need not be nonnegative. In practice, compression is achieved by truncating to an HOSVD rank $\mathbf{k}=(k_1,k_2,k_3)$, but unlike the matrix case such truncation does not lead to an optimal truncated approximation in a norm sense. It does, however, offer a quasi-optimal approximation in the following sense: $\|\mathcal{A}-\mathcal{B}_{\mathbf{k}}\|_F\le\sqrt{d}\,\|\mathcal{A}-\mathcal{B}^*\|_F$ (with $d$ the order of the tensor), which means that if there exists an optimal solution $\mathcal{B}^*$ with a small error, then the truncated HOSVD $\mathcal{B}_{\mathbf{k}}$ yields a bounded approximation to it (14–16).
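The following NumPy sketch (helper names are ours; the internal unfolding column order may differ from [1], which does not affect the factor subspaces) computes a full HOSVD from the SVDs of the three unfoldings and verifies exact reconstruction.

```python
import numpy as np

def mode_product(A, M, mode):
    """Multiply tensor A by matrix M along the given mode (0, 1, or 2)."""
    return np.moveaxis(np.tensordot(M, A, axes=(1, mode)), 0, mode)

def hosvd(A):
    """Full HOSVD of a third-order tensor: A = C x1 U x2 V x3 W."""
    factors = []
    for mode in range(3):
        unfolding = np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(U)
    core = A
    for mode, U in enumerate(factors):
        core = mode_product(core, U.conj().T, mode)   # core = A x1 U^H x2 V^H x3 W^H
    return core, factors

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 5, 4))
C, (U, V, W) = hosvd(A)
A_rec = mode_product(mode_product(mode_product(C, U, 0), V, 1), W, 2)
print(np.allclose(A, A_rec))   # True: the full HOSVD reproduces A exactly
```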

If $\mathbf{u},\mathbf{v},\mathbf{w}$ are vectors of length $m$, $p$, and $n$, respectively, then $\mathcal{B}\equiv\mathbf{u}\circ\mathbf{v}\circ\mathbf{w}$ (i.e., $\mathcal{B}_{i,j,k}=u_iv_jw_k$) is called a rank-1 tensor. A CP (4–6) decomposition of a tensor $\mathcal{A}$ is an expression of the tensor as a sum of rank-1 outer products:

$$\mathcal{A}=\sum_{i=1}^{r}s_i\,\bigl(\mathbf{u}^{(i)}\circ\mathbf{v}^{(i)}\circ\mathbf{w}^{(i)}\bigr)=:[\![\,S;U,V,W\,]\!],$$

where the factor matrices $U,V,W$ have the $\mathbf{u}^{(i)},\mathbf{v}^{(i)},\mathbf{w}^{(i)}$ as their columns, respectively, and $S$ is a diagonal matrix containing the weights $s_i$. If $r$ is minimal, then $r$ is said to be the rank of the tensor; that is, $r$ is the tensor rank. Although it is known that $r\le\min(np,mp,nm)$, determining the rank of a tensor is an NP-hard problem (17). Also, the factor matrices need not have orthonormal columns, nor be full rank.

However, if we are given a set of factor matrices with $k$ columns such that $\mathcal{A}=[\![U,V,W]\!]$ holds, this is still a CP decomposition, even if we do not know whether $k$ equals or exceeds the rank. In other words, given a CP decomposition whose factor matrices have $k$ columns, all we know is that the tensor represented by this decomposition has tensor rank at most $k$.
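As a small illustration of the CP format, the sketch below (NumPy, with hypothetical factor matrices and weights of our choosing) assembles $[\![S;U,V,W]\!]$ with an einsum and checks that a single outer product is indeed rank 1 when matricized.

```python
import numpy as np

def cp_to_tensor(weights, U, V, W):
    """Assemble sum_i s_i * (u_i o v_i o w_i) from factor matrices with columns u_i, v_i, w_i."""
    return np.einsum('r,ir,jr,kr->ijk', weights, U, V, W)

# a rank-1 tensor is a single outer product u o v o w
u, v, w = np.arange(3.0), np.ones(4), np.array([1.0, -1.0])
B = np.einsum('i,j,k->ijk', u, v, w)
print(B.shape, np.linalg.matrix_rank(B.reshape(3, -1)))   # (3, 4, 2) 1
```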

In the remaining subsections of this section, we provide background on a tensor-tensor product representation recently developed in the literature (10, 11, 13). The primary goal of this paper is to derive provably optimal (i.e., minimal, in the Frobenius norm) approximations to tensors under this matrix-mimetic framework. We compare these approximations to those derived by processing the same data in matrix form and show links between our approximations and direct tensorial factorizations that offer a notion of orthogonality, namely the HOSVD and TT-SVD.

B. A Family of Tensor-Tensor Products

The first closed multiplicative operation between a pair of third-order tensors of appropriate dimension was given in ref. 10. That operation was named the t-product, and the resulting linear algebraic framework is described in refs. 10 and 13. In ref. 11, the authors followed the theme from ref. 10 by describing a new family of tensor-tensor products, called M-product, and gave the associated algebraic framework. As the presentation in ref. 11 includes the t-product as one example, we will introduce the class of tensor-tensor products of interest and at times throughout the paper highlight the t-product as a special case.

Let $M$ be any invertible $n\times n$ matrix, and let $\mathcal{A}\in\mathbb{C}^{m\times p\times n}$. We will use hat notation to denote a tensor in the transform domain specified by $M$:

$$\hat{\mathcal{A}}\equiv\mathcal{A}\times_3 M,$$

where, since $M$ is $n\times n$, $\hat{\mathcal{A}}$ has the same dimensions as $\mathcal{A}$. Importantly, $\hat{\mathcal{A}}$ corresponds to applying $M$ to all tube fibers, although it is implemented according to the definition above by computing the matrix-matrix product $M\mathbf{A}_{(3)}$ and reshaping the result. The "hat" notation should be understood in context relative to the $M$ applied.

Algorithm 1:

Algorithm: $\mathcal{C}=\mathcal{A}\star_M\mathcal{B}$ for invertible $M$, from ref. 11

INPUT: $\mathcal{A}\in\mathbb{C}^{m\times p\times n}$, $\mathcal{B}\in\mathbb{C}^{p\times\ell\times n}$, invertible $M\in\mathbb{C}^{n\times n}$
1: Compute $\hat{\mathcal{A}}\equiv\mathcal{A}\times_3 M$ and $\hat{\mathcal{B}}\equiv\mathcal{B}\times_3 M$
2: for $i=1,\ldots,n$ do
3:   $\hat{\mathcal{C}}_{:,:,i}=\hat{\mathcal{A}}_{:,:,i}\hat{\mathcal{B}}_{:,:,i}$
4: end for
5: Compute $\mathcal{C}=\hat{\mathcal{C}}\times_3 M^{-1}$. Now $\mathcal{C}=\mathcal{A}\star_M\mathcal{B}\in\mathbb{C}^{m\times\ell\times n}$

From ref. 11, we define the M-product between $\mathcal{A}\in\mathbb{C}^{m\times p\times n}$ and $\mathcal{B}\in\mathbb{C}^{p\times\ell\times n}$ through the steps in Algorithm 1. The inner for-loop (steps 2 through 4) is the facewise product $\hat{\mathcal{C}}_{:,:,i}=\hat{\mathcal{A}}_{:,:,i}\hat{\mathcal{B}}_{:,:,i}$ and is embarrassingly parallelizable since the matrix-matrix products in the loop are independent. Steps 1 and 5 could in theory also be performed in $p$ independent blocks of matrix-matrix products (or matrix solves, in the case of $\times_3 M^{-1}$, to avoid computing the inverse).

Choosing $M$ as the (unnormalized) DFT matrix, we see that $\mathcal{A}\star_M\mathcal{B}$ reduces to the t-product operation $\mathcal{A}*\mathcal{B}$ defined in ref. 10. Thus, the t-product is a special instance of the M-product family.
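A minimal NumPy sketch of Algorithm 1 follows (function names are ours; $M$ is assumed invertible). Choosing $M$ as the unnormalized DFT matrix recovers the t-product, and the product of real tensors then comes back real up to roundoff.

```python
import numpy as np

def mode3(A, M):
    """Apply M along the tube fibers: (A x_3 M)[i, j, :] = M @ A[i, j, :]."""
    return np.einsum('lk,ijk->ijl', M, A)

def m_product(A, B, M):
    """M-product of A (m x p x n) and B (p x l x n), following Algorithm 1."""
    A_hat = mode3(A, M)                        # move to the transform domain
    B_hat = mode3(B, M)
    # facewise product: independent matrix-matrix products on each frontal slice
    C_hat = np.einsum('ijk,jlk->ilk', A_hat, B_hat)
    return mode3(C_hat, np.linalg.inv(M))      # return to the spatial domain

# t-product as a special case: M is the (unnormalized) DFT matrix
n = 4
M_dft = np.fft.fft(np.eye(n))
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3, n))
B = rng.standard_normal((3, 2, n))
C = m_product(A, B, M_dft)
print(np.allclose(C.imag, 0))   # the t-product of real tensors is real (up to roundoff)
```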

C. Tensor Algebraic Framework

Now we can introduce the remaining parts of the framework. For the t-product, the linear algebraic framework is given in refs. 10 and 13. However, as the t-product is a special case of the M-product as noted above, we will elucidate here the linear algebraic framework as described in ref. 11, with pointers to refs. 10 and 13 so we can cover all cases.

There exists a notion of conjugate transposition and an identity element:

Definition 2.1 (Conjugate Transpose): Given $\mathcal{A}\in\mathbb{C}^{m\times p\times n}$, its $p\times m\times n$ conjugate transpose under $M$, denoted $\mathcal{A}^H$, is defined facewise in the transform domain by

$$(\widehat{\mathcal{A}^H})_{:,:,i}=(\hat{\mathcal{A}}_{:,:,i})^H,\qquad i=1,\ldots,n.$$

As noted in ref. 11, this definition ensures the multiplication reversal property for the Hermitian transpose under $\star_M$: $\mathcal{A}^H\star_M\mathcal{B}^H=(\mathcal{B}\star_M\mathcal{A})^H$. This definition is consistent with the t-product transpose given in ref. 10 when $M$ is defined by the DFT matrix.

Definition 2.2 (Identity Tensor; Unit-Normalized Slices): The $m\times m\times n$ identity tensor $\mathcal{I}$ satisfies $\mathcal{A}\star_M\mathcal{I}=\mathcal{A}=\mathcal{I}\star_M\mathcal{A}$ for $\mathcal{A}\in\mathbb{C}^{m\times m\times n}$. For invertible $M$, this tensor always exists: each frontal slice of $\hat{\mathcal{I}}=\mathcal{I}\times_3 M$ is the $m\times m$ identity matrix. If $\vec{\mathcal{B}}$ is $m\times 1\times n$ and $\vec{\mathcal{B}}^H\star_M\vec{\mathcal{B}}$ is the $1\times 1\times n$ identity tensor under $M$, we say that $\vec{\mathcal{B}}$ is a unit-normalized tensor slice.

The concept of unitary and orthogonal tensors is now straightforward:

Definition 2.3 (Unitary/Orthogonality): Two $m\times 1\times n$ tensors $\vec{\mathcal{A}},\vec{\mathcal{B}}$ are called $M$-orthogonal slices if $\vec{\mathcal{A}}^H\star_M\vec{\mathcal{B}}$ is the zero tube fiber. A tensor $\mathcal{Q}\in\mathbb{C}^{m\times m\times n}$ ($\mathcal{Q}\in\mathbb{R}^{m\times m\times n}$) is called $M$-unitary ($M$-orthogonal) if

$$\mathcal{Q}^H\star_M\mathcal{Q}=\mathcal{I}=\mathcal{Q}\star_M\mathcal{Q}^H,$$

where $H$ is replaced by the transpose for real-valued tensors. Note that $\mathcal{I}$ must be the identity defined under the same $M$, and that the tube fibers on the diagonal correspond to unit-normalized slices $\vec{\mathcal{Q}}_j^H\star_M\vec{\mathcal{Q}}_j$, while the off-diagonal tube fibers are formed from products $\vec{\mathcal{Q}}_j^H\star_M\vec{\mathcal{Q}}_k$ with $k\ne j$.

3. Tensor $\star_M$-SVDs and Optimal Truncated Representation

As noted in the previous section and described in more detail in ref. 11, any invertible matrix M can be used to define a valid tensor-tensor product. However, in this paper we will focus on a specific class of matrices M for which unitary invariance under the Frobenius norm is preserved, as we discuss in the following. We then use this feature to develop Eckart–Young theory for our tensor decompositions later in this section.

A. Unitary Invariance

Unitary invariance of the Frobenius norm of real-valued orthogonal tensors under the t-product was shown in ref. 10. Here, we prove a more general result.

Theorem 3.1. With the choice of the $n\times n$ matrix $M=cW$ for unitary (orthogonal) $W$ and nonzero $c$, assume $\mathcal{Q}$ is $m\times m\times n$ and $M$-unitary ($M$-orthogonal). Then

$$\|\mathcal{Q}\star_M\mathcal{B}\|_F=\|\mathcal{B}\|_F,\qquad \mathcal{B}\in\mathbb{C}^{m\times k\times n}.$$

Likewise, if $\mathcal{B}\in\mathbb{C}^{p\times m\times n}$, then $\|\mathcal{B}\star_M\mathcal{Q}\|_F=\|\mathcal{B}\|_F$.

Proof.

Suppose $M=cW$ where $W$ is unitary. Then $M^{-1}=\frac{1}{c}W^H$. Next,

$$\|\hat{\mathcal{B}}\|_F=\|\mathcal{B}\times_3 M\|_F=\|cW\mathbf{B}_{(3)}\|_F=|c|\,\|\mathbf{B}_{(3)}\|_F=|c|\,\|\mathcal{B}\|_F.\qquad[3]$$

Let $\mathcal{C}=\mathcal{Q}\star_M\mathcal{B}$. Using [3],

$$\|\mathcal{B}\|_F^2=\frac{1}{|c|^2}\|\hat{\mathcal{B}}\|_F^2=\frac{1}{|c|^2}\sum_{i=1}^{n}\|\hat{\mathcal{Q}}_{:,:,i}\hat{\mathcal{B}}_{:,:,i}\|_F^2=\frac{1}{|c|^2}\|\hat{\mathcal{C}}\|_F^2=\|\mathcal{C}\|_F^2=\|\mathcal{Q}\star_M\mathcal{B}\|_F^2,$$

as each $\hat{\mathcal{Q}}_{:,:,i}$ is unitary. The other direction is similar.

We now have the framework we need to describe tensor SVDs induced by a fixed $M$ operator. These were defined, and their existence proven, in ref. 10 for the t-product over real-valued tensors and in ref. 11 for $M$ more generally.

Definition 3.2 (refs. 10 and 11): Let $\mathcal{A}$ be an $m\times p\times n$ tensor. The (full) $M$ tensor SVD (t-SVDM) of $\mathcal{A}$ is

$$\mathcal{A}=\mathcal{U}\star_M\mathcal{S}\star_M\mathcal{V}^H=\sum_{i=1}^{r}\mathcal{U}_{:,i,:}\star_M\mathcal{S}_{i,i,:}\star_M\mathcal{V}_{:,i,:}^H,\qquad[4]$$

where $\mathcal{U}\in\mathbb{R}^{m\times m\times n}$ and $\mathcal{V}\in\mathbb{R}^{p\times p\times n}$ are $M$-unitary, $\mathcal{S}\in\mathbb{R}^{m\times p\times n}$ is a tensor whose frontal slices are diagonal (such a tensor is called f-diagonal), and $r\le\min(m,p)$ is the number of nonzero tubes in $\mathcal{S}$. When $M$ is the DFT matrix, this reduces to the t-product-based t-SVD introduced in ref. 10.

Note that Definition 3.2 implies each frontal slice of $\hat{\mathcal{A}}$ has rank less than or equal to $r$. Clearly, if $m>p$, the second equality shows we can obtain a reduced t-SVDM by restricting $\mathcal{U}$ to have only $p$ orthonormal lateral slices and $\mathcal{S}$ to be $p\times p\times n$, as opposed to the full representation. Similarly, if $p>m$, we need only keep the $m\times m\times n$ portion of $\mathcal{S}$ and the first $m$ lateral slices of $\mathcal{V}$ to obtain the same representation. An illustration of the decomposition is presented in Fig. 2, Top.

Fig. 2. (Top) Illustration of the tensor SVD. (Bottom) Example showing different truncations across the different SVDs of the faces, based on Algorithm 3.

Algorithm 2:

Full t-SVDM, from ref. 11

INPUT: $\mathcal{A}\in\mathbb{C}^{m\times p\times n}$, invertible $M\in\mathbb{C}^{n\times n}$
1: $\hat{\mathcal{A}}\leftarrow\mathcal{A}\times_3 M$
2: for $i=1,\ldots,n$ do
3:   $[\hat{\mathcal{U}}_{:,:,i},\hat{\mathcal{S}}_{:,:,i},\hat{\mathcal{V}}_{:,:,i}]=\operatorname{svd}(\hat{\mathcal{A}}_{:,:,i})$
4: end for
5: $\mathcal{U}=\hat{\mathcal{U}}\times_3 M^{-1}$, $\mathcal{S}=\hat{\mathcal{S}}\times_3 M^{-1}$, $\mathcal{V}=\hat{\mathcal{V}}\times_3 M^{-1}$

Independent of the choice of M, the components of the t-SVDM are computed in transform space. We describe the full t-SVDM in Algorithm 2. As noted, the t-SVDM above was proposed already in ref. 11. However, when we restrict the class of M to nonzero multiples of unitary or orthogonal matrices, we can now derive an Eckart–Young theorem for tensors in general form. To do so, we first give a new corollary for the restricted class of M considered.
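The sketch below (NumPy; helper names are ours) mirrors Algorithm 2: it moves $\mathcal{A}$ to the transform domain, takes an SVD of each frontal face, and maps the factors back; the check verifies the facewise reconstruction in the transform domain.

```python
import numpy as np

def t_svdm(A, M):
    """Full t-SVDM of A (m x p x n), following Algorithm 2: facewise SVDs in the transform domain."""
    A_hat = np.einsum('lk,ijk->ijl', M, A)
    m, p, n = A_hat.shape
    r = min(m, p)
    U_hat = np.zeros((m, r, n), dtype=A_hat.dtype)
    S_hat = np.zeros((r, r, n), dtype=A_hat.dtype)
    V_hat = np.zeros((p, r, n), dtype=A_hat.dtype)
    for i in range(n):                                   # the faces are independent
        u, s, vh = np.linalg.svd(A_hat[:, :, i], full_matrices=False)
        U_hat[:, :, i] = u
        S_hat[:, :, i] = np.diag(s)
        V_hat[:, :, i] = vh.conj().T
    Minv = np.linalg.inv(M)
    back = lambda T: np.einsum('lk,ijk->ijl', Minv, T)   # return factors to the spatial domain
    return back(U_hat), back(S_hat), back(V_hat)

# verify A = U *_M S *_M V^H by checking the facewise products in the transform domain
rng = np.random.default_rng(4)
A = rng.standard_normal((5, 4, 3))
M = np.linalg.qr(rng.standard_normal((3, 3)))[0]         # an arbitrary orthogonal M
U, S, V = t_svdm(A, M)
to_hat = lambda T: np.einsum('lk,ijk->ijl', M, T)
U_hat, S_hat, V_hat = to_hat(U), to_hat(S), to_hat(V)
A_rec_hat = np.einsum('ijk,jlk,mlk->imk', U_hat, S_hat, V_hat.conj())
print(np.allclose(to_hat(A), A_rec_hat))                 # True
```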

Corollary 3.3. Assume $M=cW$, where $c\ne 0$ and $W$ is unitary. Then, given the t-SVDM of $\mathcal{A}$ under $M$ defined in Definition 3.2,

$$\|\mathcal{A}\|_F^2=\|\mathcal{S}\|_F^2=\sum_{i=1}^{\min(p,m)}\|\mathcal{S}_{i,i,:}\|_F^2.$$

Moreover, $\|\mathcal{S}_{1,1,:}\|_F^2\ge\|\mathcal{S}_{2,2,:}\|_F^2\ge\cdots$.

Proof: The first equality follows from Theorem 3.1 and the second from the definition of the Frobenius norm. To prove the ordering property, write each singular tube fiber as $\mathbf{s}_i\equiv\mathcal{S}_{i,i,:}$ and note, using [3], that

$$\|\mathbf{s}_i\|_F^2=\frac{1}{|c|^2}\|\hat{\mathbf{s}}_i\|_F^2=\frac{1}{|c|^2}\sum_{j=1}^{n}\bigl(\hat{\sigma}_i^{(j)}\bigr)^2,$$

where we have used $\hat{\sigma}_i^{(j)}$ to denote the $i$th largest singular value on the $j$th frontal face of $\hat{\mathcal{S}}$. Since $\hat{\sigma}_i^{(j)}\ge\hat{\sigma}_{i+1}^{(j)}$ for every $j$, the result follows.

This observation gives rise to a new definition.

Definition 3.4: We refer to r in the t-SVDM Definition 3.2 (see the second equality in [4]) as the t-rank,§ the number of nonzero singular tubes in the t-SVDM.

We can also extend the idea of multirank in ref. 13 to the general $M$ case:

Definition 3.5: The multirank of $\mathcal{A}$ under $M$ is the vector $\boldsymbol{\rho}$ whose $i$th entry $\rho_i$ denotes the rank of the $i$th frontal slice of $\hat{\mathcal{A}}$; that is, $\rho_i=\operatorname{rank}(\hat{\mathcal{A}}_{:,:,i})$.

Notice that a tensor with multirank $\boldsymbol{\rho}$ must have t-rank equal to $\max_{i=1,\ldots,n}\rho_i$.

Definition 3.6: The implicit rank of $\mathcal{A}_{\boldsymbol{\rho}}$ under $M$ is $r=\sum_{i=1}^{n}\rho_i$.

Note that $\mathcal{S}$ is uniquely defined in Definition 3.2; thus the t-rank and multirank of a tensor are also unique.

B. Eckart–Young Theorem for Tensors

The key aspect that has made the t-product-based t-SVD so instrumental in many applications (see, for example, refs. 18–21) is the tensor Eckart–Young theorem proven in ref. 10 for real-valued tensors under the t-product. In loose terms, truncating the t-product-based t-SVD gives an optimal low-t-rank approximation in the Frobenius norm. An Eckart–Young theorem for a general $M$ operator was not provided in ref. 11. We give a proof below for the special case we have been considering, in which $M$ is a multiple of a unitary matrix.

Theorem 3.7. Define $\mathcal{A}_k=\mathcal{U}_{:,1:k,:}\star_M\mathcal{S}_{1:k,1:k,:}\star_M\mathcal{V}_{:,1:k,:}^H$, where $M$ is a nonzero multiple of a unitary matrix. Then $\mathcal{A}_k$ is the best Frobenius-norm approximation to $\mathcal{A}$ over the set $\Gamma=\{\mathcal{C}=\mathcal{X}\star_M\mathcal{Y}\ |\ \mathcal{X}\in\mathbb{C}^{m\times k\times n},\ \mathcal{Y}\in\mathbb{C}^{k\times p\times n}\}$, the set of all t-rank-$k$ tensors under $M$ of the same dimensions as $\mathcal{A}$. The squared error is $\|\mathcal{A}-\mathcal{A}_k\|_F^2=\sum_{i=k+1}^{r}\|\mathbf{s}_i\|_F^2$, where $r$ is the t-rank of $\mathcal{A}$.

Proof: The squared-error expression follows easily from the results in the previous section. Now let $\mathcal{B}=\mathcal{X}\star_M\mathcal{Y}$. Then $\|\mathcal{A}-\mathcal{B}\|_F^2=\frac{1}{|c|^2}\|\hat{\mathcal{A}}-\hat{\mathcal{B}}\|_F^2=\frac{1}{|c|^2}\sum_{i=1}^{n}\|\hat{\mathcal{A}}_{:,:,i}-\hat{\mathcal{B}}_{:,:,i}\|_F^2$. By definition, $\hat{\mathcal{B}}_{:,:,i}=\hat{\mathcal{X}}_{:,:,i}\hat{\mathcal{Y}}_{:,:,i}$ has rank at most $k$. The best rank-$k$ approximation to $\hat{\mathcal{A}}_{:,:,i}$ is $\hat{\mathcal{U}}_{:,1:k,i}\hat{\mathcal{S}}_{1:k,1:k,i}\hat{\mathcal{V}}_{:,1:k,i}^H$, so $\|\hat{\mathcal{A}}_{:,:,i}-\hat{\mathcal{U}}_{:,1:k,i}\hat{\mathcal{S}}_{1:k,1:k,i}\hat{\mathcal{V}}_{:,1:k,i}^H\|_F^2\le\|\hat{\mathcal{A}}_{:,:,i}-\hat{\mathcal{B}}_{:,:,i}\|_F^2$, and the result follows.

In ref. 18, the authors used the Eckart–Young result for the t-product for compression of facial data and a PCA-like approach to recognition. They also gained additional compression with an algorithm they called the t-SVDII (defined only for the t-product on real-valued tensors), which, although not described as such in that paper, effectively reduces the multirank for further compression. Here, we provide the theoretical justification for the t-SVDII approach of ref. 18 while simultaneously extending the result to the M-product family with $M$ restricted to nonzero multiples of unitary matrices.

Theorem 3.8. Given the t-SVDM of $\mathcal{A}$ under $M$, define $\mathcal{A}_{\boldsymbol{\rho}}$ to be the approximation of multirank $\boldsymbol{\rho}$; that is,

$$(\hat{\mathcal{A}}_{\boldsymbol{\rho}})_{:,:,i}=\hat{\mathcal{U}}_{:,1:\rho_i,i}\,\hat{\mathcal{S}}_{1:\rho_i,1:\rho_i,i}\,\hat{\mathcal{V}}_{:,1:\rho_i,i}^H.$$

Then $\mathcal{A}_{\boldsymbol{\rho}}$ is the best multirank-$\boldsymbol{\rho}$ approximation to $\mathcal{A}$ in the Frobenius norm and

$$\|\mathcal{A}-\mathcal{A}_{\boldsymbol{\rho}}\|_F^2=\frac{1}{|c|^2}\sum_{i=1}^{n}\sum_{k=\rho_i+1}^{r_i}\bigl(\hat{\sigma}_k^{(i)}\bigr)^2,$$

where $r_i$ denotes the rank of the $i$th frontal face of $\hat{\mathcal{A}}$.

Proof: Follows similarly to the above and is omitted.

To use this in practice, we generalize the idea of the t-SVDII in ref. 18 to the M-product for $M$ a multiple of a unitary matrix. First, we need a suitable method to choose $\boldsymbol{\rho}$. We know

$$\|\hat{\mathcal{A}}\|_F^2=\sum_{i=1}^{n}\sum_{j=1}^{r_i}\bigl(\hat{\sigma}_j^{(i)}\bigr)^2,$$

where $r_i$ is the rank of the $i$th frontal face of $\hat{\mathcal{A}}$. Thus, there are $K\equiv\sum_{i=1}^{n}r_i\le n\min(m,p)$ total nonzero singular values. We order the $(\hat{\sigma}_j^{(i)})^2$ values in descending order and put them into a vector $\mathbf{v}$ of length $K$. We find the first index $J\le K$ such that $\bigl(\sum_{i=1}^{J}v_i\bigr)/\|\hat{\mathcal{A}}\|_F^2>\gamma$. Keeping $J$ total terms thus implies an approximation capturing a fraction $\gamma$ of the energy. Then let $\tau=v_J$; this is the square of the smallest singular value that should be included in the approximation. We run back through the $n$ faces, and for face $i$ we keep only the $\rho_i$ singular triplets for which $(\hat{\sigma}_j^{(i)})^2\ge\tau$. In other words, the relative energy of our approximation satisfies

$$\frac{\sum_{i=1}^{n}\sum_{j=1}^{\rho_i}\bigl(\hat{\sigma}_j^{(i)}\bigr)^2}{\|\hat{\mathcal{A}}\|_F^2}\ \ge\ \gamma.$$

The pseudo-code is given in Algorithm 3, and a cartoon illustration of the output is given in Fig. 2.

Algorithm 3:

Return the t-SVDMII under $M$ with multirank $\boldsymbol{\rho}$ chosen to meet an energy constraint

INPUT: $\mathcal{A}$; $M$, a nonzero multiple of a unitary matrix; desired energy $\gamma\in(0,1]$.
1: Compute the t-SVDM of $\mathcal{A}$ (Algorithm 2).
2: Concatenate the squared entries $(\hat{\mathcal{S}}_{j,j,i})^2$ for all $i,j$ into a vector $\mathbf{v}$.
3: $\mathbf{v}\leftarrow\operatorname{sort}(\mathbf{v},\text{descend})$.
4: Let $\mathbf{w}$ be the vector of cumulative sums; i.e., $w_k=\sum_{i=1}^{k}v_i$.
5: Find the first index $J$ such that $w_J/\|\hat{\mathcal{S}}\|_F^2>\gamma$.
6: Define $\tau\equiv v_J$.
7: for $i=1,\ldots,n$ do
8:   Set $\rho_i$ to the number of singular values of $\hat{\mathcal{A}}_{:,:,i}$ whose squares are greater than or equal to $\tau$.
9:   Keep only the $m\times\rho_i$ block $\hat{\mathcal{U}}_{:,1:\rho_i,i}$ and the coefficients $\hat{G}_{\boldsymbol{\rho}}\equiv\hat{\mathcal{S}}_{1:\rho_i,1:\rho_i,i}\hat{\mathcal{V}}_{:,1:\rho_i,i}^H$.
10: end for
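A compact NumPy sketch of the multirank selection in Algorithm 3 follows (function name is ours; only $\boldsymbol{\rho}$ and the threshold $\tau$ are returned, leaving the truncated factors to be stored in the transform domain as described above).

```python
import numpy as np

def tsvdmii_multirank(A, M, gamma):
    """Choose the multirank rho for energy level gamma, following Algorithm 3 (sketch)."""
    A_hat = np.einsum('lk,ijk->ijl', M, A)
    n = A_hat.shape[2]
    face_svals = [np.linalg.svd(A_hat[:, :, i], compute_uv=False) for i in range(n)]
    v = np.sort(np.concatenate(face_svals) ** 2)[::-1]       # squared values, descending
    w = np.cumsum(v)
    J = min(np.searchsorted(w / w[-1], gamma, side='right'), len(v) - 1)
    tau = v[J]                                               # squared threshold
    rho = np.array([int(np.count_nonzero(s ** 2 >= tau)) for s in face_svals])
    return rho, tau

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 6, 4))
M = np.linalg.qr(rng.standard_normal((4, 4)))[0]
rho, tau = tsvdmii_multirank(A, M, gamma=0.9)
print(rho, rho.sum())    # per-face truncation indices and the resulting implicit rank
```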

In Section 5, we compare our Eckart–Young results for tensors with the corresponding matrix approximations obtained by using the matrix-based Eckart–Young theorem, but first we need a few results that show what structure these tensor approximations inherit from M.

4. Latent Structure

To understand why the proposed tensor decompositions are efficient at compression and feature extraction we investigate the latent structure induced by the algebra in which we operate. We shall also capitalize on this structural analysis in the next section’s proofs stating how the proposed t-SVDM and t-SVDMII decompositions can be used to devise superior approximations compared to their matrix counterparts.

If $\mathbf{v},\mathbf{c}\in\mathbb{C}^{1\times 1\times n}$, from Algorithm 1 we have

$$\mathbf{v}\star_M\mathbf{c}=\bigl((\mathbf{v}\times_3 M)\odot(\mathbf{c}\times_3 M)\bigr)\times_3 M^{-1},$$

where $\odot$ indicates pointwise (scalar) products on each face. Using the definitions, this expression is tantamount to

$$\mathbf{v}\star_M\mathbf{c}=\operatorname{twist}\bigl[\bigl(M^{-1}\operatorname{diag}(M\mathbf{c}_{(3)})\,M\,\mathbf{v}_{(3)}\bigr)^{\top}\bigr]=\operatorname{twist}\bigl[\mathbf{v}_{(3)}^{\top}\,M^{\top}\operatorname{diag}(\hat{\mathbf{c}})\,M^{-\top}\bigr].\qquad[5]$$

Note that $M\mathbf{c}_{(3)}$ is mathematically equivalent to forming the tube fiber $\hat{\mathbf{c}}$, and diag applied to a tube fiber works analogously to diag applied to that tube fiber's column-vector equivalent. Further, the transpose here is a real-valued (nonconjugate) transpose, stemming from the definition of the mode-3 product.

Let $\vec{\mathcal{B}}$ be any element of $\mathbb{C}^{m\times 1\times n}$ and consider computing the product $\vec{\mathcal{Q}}\equiv\vec{\mathcal{B}}\star_M\mathbf{c}$. In ref. 11 it was shown that the $j$th tube fiber entry of $\vec{\mathcal{Q}}$ is effectively the product of the tubes $\vec{\mathcal{B}}_{j,1,:}\star_M\mathbf{c}$. From [5] we have

$$\operatorname{sq}(\vec{\mathcal{Q}})=\operatorname{sq}(\vec{\mathcal{B}})\,\bigl(M^{\top}\operatorname{diag}(\hat{\mathbf{c}})\,M^{-\top}\bigr).\qquad[6]$$

The matrix in parentheses on the right is an element of the space of matrices $\mathbb{X}_M=\{X:\,X=M^{\top}D\,M^{-\top}\}$, where $D$ is diagonal. This brings us to a major result.

Theorem 4.1. Given $\mathcal{A}=\mathcal{U}\star_M\mathcal{S}\star_M\mathcal{V}^H$, write $\mathcal{C}\equiv\mathcal{S}\star_M\mathcal{V}^H$ so that $\mathcal{A}=\sum_{i=1}^{t}\mathcal{U}_{:,i,:}\star_M\mathcal{C}_{i,:,:}$. Then, using $R[\mathbf{v}]\equiv M^{\top}\operatorname{diag}(\hat{\mathbf{v}})\,M^{-\top}$,

$$\operatorname{sq}(\mathcal{A}_{:,k,:})=\sum_{i=1}^{t}\operatorname{sq}(\mathcal{U}_{:,i,:})\,R[\mathcal{C}_{i,k,:}]=\sum_{i=1}^{t}U_i\,R[\mathcal{C}_{i,k,:}].\qquad[7]$$

Thus, each lateral slice of $\mathcal{A}$ is a weighted combination of "basis" matrices $U_i\equiv\operatorname{sq}(\mathcal{U}_{:,i,:})$, but the weights, instead of being scalars, are the matrices $R[\mathcal{C}_{i,k,:}]$ from the matrix algebra induced by the choice of $M$. For $M$ the DFT matrix, this matrix algebra is the algebra of circulants.
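For the t-product case, the claim that the induced algebra is the circulant algebra can be checked numerically. The sketch below verifies the classical identity underlying it: for the unnormalized DFT matrix $F$, conjugating $\operatorname{diag}(F\mathbf{v})$ by $F$ yields a circulant matrix (the exact transpose placement in $R[\cdot]$ only changes whether $\mathbf{v}$ appears as the first column or the first row, which stays within the circulant algebra).

```python
import numpy as np

n = 5
v = np.random.default_rng(6).standard_normal(n)
F = np.fft.fft(np.eye(n))                        # unnormalized DFT matrix
R = np.linalg.inv(F) @ np.diag(F @ v) @ F        # conjugation of diag(F v) by the DFT

# circulant matrix whose first column is v (column j is v cyclically shifted down by j)
C = np.column_stack([np.roll(v, j) for j in range(n)])
print(np.allclose(R, C))                         # True: this weight matrix is a circulant
```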

5. Tensors and Optimal Approximations

In ref. 18, the claim was made that a t-SVD truncated to $k$ terms could be superior to a matrix-SVD-based compression to $k$ terms. Here, we offer a formal proof and then discuss the relative meaning of $k$. In the next section, we discuss what can be done to obtain further compression.

A. Theory: T-rank vs. Matrix Rank

Let us assume that our data are a collection of $\ell$ matrices $D_i\in\mathbb{C}^{m\times n}$, $i=1,\ldots,\ell$. For example, $D_i$ might be a grayscale image, or it might be the values of a function discretized on a two-dimensional uniform grid. Let $\mathbf{d}_i=\operatorname{vec}(D_i)$, so that $\mathbf{d}_i$ has length $mn$.

We put the samples into a matrix (respectively, a tensor) from left to right:

$$A=[\mathbf{d}_1,\ldots,\mathbf{d}_\ell]\in\mathbb{C}^{mn\times\ell},\qquad \mathcal{A}=[\operatorname{twist}(D_1),\ldots,\operatorname{twist}(D_\ell)]\in\mathbb{C}^{m\times\ell\times n}.$$

Thus, $A$ and $\mathcal{A}$ represent the same data, just in different formats. It is instructive first to consider how the t-rank $t$ of $\mathcal{A}$ and the matrix rank $r$ of $A$ are related. We will then relate the optimal t-rank-$k$ approximation of $\mathcal{A}$ to the optimal rank-$k$ approximation of $A$.

Theorem 5.1. The t-rank $t$ of $\mathcal{A}$ is less than or equal to the rank $r$ of $A$. Additionally, since $t\le\min(m,\ell)$, if $m<r$, then $t<r$.

Proof: The problem's dimensions necessitate $t\le\min(m,\ell)$ and $r\le\ell$. Let $A=GH$ be a rank-$r$ factorization of $A$, where $G$ is $mn\times r$ and $H$ is $r\times\ell$. From the fact that $\hat{\mathcal{A}}=\mathcal{A}\times_3 M$, we can show that the $m\times\ell$ $i$th frontal face of $\hat{\mathcal{A}}$ satisfies

$$\hat{\mathcal{A}}_{:,:,i}=\sum_{j=1}^{n}m_{ij}\,\mathcal{A}_{:,:,j}=\sum_{j=1}^{n}m_{ij}\,G_{(j-1)m+1:jm,\,:}\,H=\Bigl(\sum_{j=1}^{n}m_{ij}\,G_{(j-1)m+1:jm,\,:}\Bigr)H.\qquad[8]$$

Clearly, the rank of this frontal slice is bounded above by $\min(m,r)$, since this is the maximal rank of the matrix in parentheses. Then the singular values of $\hat{\mathcal{A}}_{:,:,i}$ satisfy $\hat{\sigma}_1^{(i)}\ge\hat{\sigma}_2^{(i)}\ge\cdots\ge\hat{\sigma}_{r_i}^{(i)}$, where $r_i\le\min(m,r)$. Since $\hat{\mathcal{S}}_{j,j,i}=\hat{\sigma}_j^{(i)}$, the tube $\mathcal{S}_{j,j,:}=\hat{\mathcal{S}}_{j,j,:}\times_3 M^{-1}$ for a particular value $j$ is a nonzero tube fiber iff at least one $\hat{\sigma}_j^{(i)}$, $i=1,\ldots,n$, is nonzero. There can be at most $\min(m,r)$ nonzero tube fibers, so $t\le\min(m,r)$.

Note that the proof is independent of the choice of invertible $M$. In particular, it holds for $M=I$. This means that the mere act of "folding" the data matrix into a tensor may provide a reduced-rank approximation (a rank-$r$ matrix goes to a tensor of t-rank at most $r$). A nonidentity choice of $M$, though, may reveal $t\ll r$. To make the idea concrete, let us consider an example.

Example 5.2: Let $U\in\mathbb{R}^{n\times n}$ be invertible, and let $\mathbf{c}_i\in\mathbb{R}^{n}$, $i=1,\ldots,p$, with $p\le n$, be a set of linearly independent vectors. Define $\mathcal{A}$ such that $\mathcal{A}_{:,i,:}=\operatorname{twist}(U\operatorname{circ}(\mathbf{c}_i))$. Under the t-product, the t-rank of $\mathcal{A}$ is 1.

On the other hand, with $Z$ the circulant downshift matrix,

$$A=(I\otimes U)\,\operatorname{diag}(I,Z,\ldots,Z^{n-1})\begin{bmatrix}\mathbf{c}_1&\mathbf{c}_2&\cdots&\mathbf{c}_p\\\mathbf{c}_1&\mathbf{c}_2&\cdots&\mathbf{c}_p\\\vdots&\vdots&&\vdots\\\mathbf{c}_1&\mathbf{c}_2&\cdots&\mathbf{c}_p\end{bmatrix}.$$

The rank of $A$ is therefore $p$. If $U$ and $C=[\mathbf{c}_1,\ldots,\mathbf{c}_p]$ have orthonormal columns, we can show that $\|A-A_k\|_F^2=(p-k)n$ for any $k<p$.

B. Theory: Comparison of Optimal Approximations

In this subsection we compare the quality of approximations obtained by truncating the matrix SVD of the data matrix vs. truncating the t-SVDM of the same data organized as a tensor. We again assume that $mn>\ell$ and that $A$ has rank $r$.

Let $A=U\Sigma V^H$ be the matrix SVD of $A$, and denote its best rank-$k$ ($k<r$) approximation by

$$A_k\equiv U_{:,1:k}\,C_{1:k,:}\qquad\Longleftrightarrow\qquad (A_k)_{:,j}=\sum_{i=1}^{k}U_{:,i}\,c_{ij},\qquad[9]$$

for $j=1,\ldots,\ell$, where $C=\Sigma V^H$. Finally, we need the following matrix version of [9], which we reference in the proof:

$$\operatorname{reshape}\bigl((A_k)_{:,j},[m,n]\bigr)=\sum_{i=1}^{k}\operatorname{reshape}\bigl(U_{:,i},[m,n]\bigr)\,c_{ij}.\qquad[10]$$

Theorem 5.3. Given $A$, $\mathcal{A}$ as defined above, with $A$ having rank $r$ and $\mathcal{A}$ having t-rank $t$, let $A_k$ denote the best rank-$k$ matrix approximation to $A$ in the Frobenius norm, where $k\le r$. Let $\mathcal{A}_k$ denote the best t-rank-$k$ tensor approximation to $\mathcal{A}$ in the Frobenius norm under $M$, where $M$ is a multiple of a unitary matrix. Then

$$\|\mathcal{S}_{k+1:t,k+1:t,:}\|_F=\|\mathcal{A}-\mathcal{A}_k\|_F\le\|A-A_k\|_F.$$

Proof: Consider [10]. The multiplication by the scalar $c_{ij}$ in the sum is equivalent to multiplication from the right by $c_{ij}I$. However, since $M=cW$ for unitary $W$, we have $c_{ij}I=M^{\top}\operatorname{diag}(c_{ij}\mathbf{e})\,M^{-\top}$, where $\mathbf{e}$ is the vector of all ones. Define the tube fiber $\mathcal{C}_{i,j,:}$ from the matrix-vector product $c_{ij}M^{-1}\mathbf{e}$, oriented into the third dimension. Then $c_{ij}I=R[\mathcal{C}_{i,j,:}]$. Now we observe that [10] can be equivalently expressed as

$$\operatorname{reshape}\bigl((A_k)_{:,j},[m,n]\bigr)=\sum_{i=1}^{k}\operatorname{reshape}\bigl(U_{:,i},[m,n]\bigr)\,R[\mathcal{C}_{i,j,:}].\qquad[11]$$

These can be combined into the tensor equivalent

$$\mathcal{Z}_k\equiv\sum_{i=1}^{k}\mathcal{Q}_{:,i,:}\star_M\mathcal{C}_{i,:,:}=\mathcal{Q}\star_M\mathcal{C},\quad\text{where}\quad (\mathcal{Z}_k)_{:,j,:}=\operatorname{twist}\bigl(\operatorname{reshape}((A_k)_{:,j},[m,n])\bigr),\quad \mathcal{Q}_{:,i,:}=\operatorname{twist}\bigl(\operatorname{reshape}(U_{:,i},[m,n])\bigr).$$

Since $\hat{\mathcal{C}}_{:,:,i}=\Sigma_{1:k,1:k}V_{:,1:k}^H$ for each $i$, the t-rank of $\mathcal{C}$ is $k$. The t-rank of $\mathcal{Q}$ is also no larger than $k$, by Theorem 5.1.

Thus, given the definition of $\mathcal{A}_k$ as the minimizer over all such $k$-term "outer products" under $M$, it follows that

$$\|\mathcal{S}_{k+1:t,k+1:t,:}\|_F=\|\mathcal{A}-\mathcal{A}_k\|_F\le\|\mathcal{A}-\mathcal{Z}_k\|_F=\|A-A_k\|_F.$$

Here is an example showing strict inequality is possible. Additional supporting examples are in the numerical results.

Example 5.4: Let $M$ be the $2\times 2$ (orthonormal) Haar wavelet matrix, and let

$$A=\begin{bmatrix}1&1\\1&4\\0&0\\0&3\end{bmatrix},\qquad\text{with}\quad A_1=\sigma_1\mathbf{u}_1\mathbf{v}_1^H,$$

and let $\mathcal{A}$ be the corresponding $2\times 2\times 2$ tensor obtained from $A$ as in Section 5A (so $m=n=\ell=2$). It is easily shown that $\|A-A_1\|_F=\sigma_2=1$. Setting $\mathcal{A}_1=\mathcal{U}_{:,1,:}\star_M\mathcal{S}_{1,1,:}\star_M\mathcal{V}_{:,1,:}^H$, we observe

$$\|\mathcal{A}-\mathcal{A}_1\|_F=\|\mathcal{S}_{2,2,:}\|_F=\Bigl\|M^{-1}\begin{bmatrix}0\\\hat{\sigma}_2^{(2)}\end{bmatrix}\Bigr\|_F=\hat{\sigma}_2^{(2)}\approx 0.59236<1.$$

In the next subsection we discuss the level of approximation provided by the output of Algorithm 3 by relating it back to the truncated t-SVDM and also to truncated matrix SVD. First, we need a way to relate storage costs.

Theorem 5.5. Let $\mathcal{A}_k$ be the t-rank-$k$ t-SVDM approximation to $\mathcal{A}$, and suppose its implicit rank is $r$. Define $\mu=\|\mathcal{A}_k\|_F^2/\|\mathcal{A}\|_F^2$. There exists $\gamma\le\mu$ such that the t-SVDMII approximation $\mathcal{A}_{\boldsymbol{\rho}}$ obtained for this $\gamma$ in Algorithm 3 has implicit rank less than or equal to the implicit rank of $\mathcal{A}_k$ and

$$\|\mathcal{A}-\mathcal{A}_{\boldsymbol{\rho}}\|_F\le\|\mathcal{A}-\mathcal{A}_k\|_F\le\|A-A_k\|_F.$$

Proof: From Theorem 5.3, $\|\mathcal{A}-\mathcal{A}_k\|_F^2=\frac{1}{|c|^2}\sum_{j=1}^{n}\sum_{i=k+1}^{\min(m,p)}\bigl(\hat{\sigma}_i^{(j)}\bigr)^2$. The proof is by construction, using $\mathcal{A}_k$ as the starting point; that is, initialize $\rho_i=k$ for each frontal slice. Let $\hat{\sigma}_*$ be the largest singular value that was truncated; that is,

$$\hat{\sigma}_*=\max_{i=1,\ldots,n}\hat{\sigma}_{\rho_i+1}^{(i)}.$$

Let $C=\{\hat{\sigma}_j^{(i)}<\hat{\sigma}_*\ |\ 1\le j\le\rho_i,\ \forall i\}$ be the set of all singular values that were kept in the approximation and are smaller than $\hat{\sigma}_*$. If $C$ is empty, then $\mathcal{A}_{\boldsymbol{\rho}}=\mathcal{A}_k$ and we are done.

Otherwise, $\hat{\sigma}_*$ is larger than at least one of the singular values we kept. Let $i_*$ be the index of the frontal slice containing $\hat{\sigma}_*$, with corresponding rank $\rho_{i_*}$. Let $c_{\min}$ be the smallest element contained in $C$; by definition, $c_{\min}<\hat{\sigma}_*$. Let $i_{\min}$ be the index of the frontal slice containing $c_{\min}$, with corresponding rank $\rho_{i_{\min}}$. Note that $c_{\min}$ must be the smallest singular value kept from frontal slice $i_{\min}$. Set $\rho_{i_*}\leftarrow\rho_{i_*}+1$ and $\rho_{i_{\min}}\leftarrow\rho_{i_{\min}}-1$. Then the error $\|\mathcal{A}-\mathcal{A}_{\boldsymbol{\rho}}\|_F^2$ has changed by the amount $-\hat{\sigma}_*^2+c_{\min}^2\le 0$.

In practice, we can decrease the implicit rank further. For convenience, assume the elements of $C$ are labeled in increasing order ($c_1\le c_2\le c_3\le\cdots$). Let $\pi_p=\sum_{j=1}^{p}c_j^2$ for $p$ less than or equal to the cardinality of $C$. Choose the largest value of $p$ such that $\pi_p<\hat{\sigma}_*^2$. Again, set $\rho_{i_*}\leftarrow\rho_{i_*}+1$ and reduce the $p$ values of $\rho_i$ that correspond to $c_1,\ldots,c_p$. Then the error $\|\mathcal{A}-\mathcal{A}_{\boldsymbol{\rho}}\|_F^2$ has changed by the amount $-\hat{\sigma}_*^2+\pi_p\le 0$ and the implicit rank has decreased by $p-1$.

C. Storage Comparisons

Let us suppose that κ is the truncation parameter for the tensor approximation and k is the truncation parameter for the matrix approximation. Table 1 gives a comparison of storage for the methods we have discussed so far. Note that for the t-SVDMII it is necessary to work only in the transform domain, as moving back to the spatial domain would cause fill and unnecessary storage.

Table 1.

Comparison of storage costs for approximations of an m×p×n tensor A

Approximation | Basis (storage) | Coefficients (storage) | Total implicit storage
Matrix SVD, $k$ terms | $U_k$ ($kmn$) | $C_k=S_kV_k^H$ ($kp$) | $A_k$: $k(mn+p)$
t-SVDM, $\kappa$ terms | $\mathcal{U}_\kappa$ ($\kappa mn$) | $\mathcal{C}_\kappa=\mathcal{S}_\kappa\star_M\mathcal{V}_\kappa^H$ ($\kappa pn$) | $\mathcal{A}_\kappa$: $\kappa(mn+pn)$ + st[M]
t-SVDMII, multirank $\boldsymbol{\rho}$ | $\hat{\mathcal{U}}_{\boldsymbol{\rho}}$ ($rm$) | $\hat{\mathcal{C}}_{\boldsymbol{\rho}}=\hat{\mathcal{S}}_{\boldsymbol{\rho}}\hat{\mathcal{V}}_{\boldsymbol{\rho}}^H$ ($rp$) | $\mathcal{A}_{\boldsymbol{\rho}}$: $r(m+p)$ + st[M]

Top to bottom: a $k$-term truncated matrix SVD expansion, a $\kappa$-term truncated t-SVDM expansion, and a multirank-$\boldsymbol{\rho}$ t-SVDMII. Recall that for the t-SVDMII we store terms in the transform domain, and the implicit rank for $\mathcal{A}_{\boldsymbol{\rho}}$ is $r=\sum_{i=1}^{n}\rho_i$. The notation st[M] refers to the implicit storage of $M$, described below.

In practice, we can omit the storage cost of $M$ when using fast transform techniques, such as the DCT, DFT, or a discrete wavelet transform. When $M$ cannot be applied by a fast transform, the storage of $M$ for the t-SVDM is bounded by the number of nonzeros in $M$. As we show in Section 6D, if $\boldsymbol{\rho}$ has many zeros (that is, many faces in the transform domain contribute no singular values above the threshold), we can reduce this storage to nnz($\boldsymbol{\rho}$)$\cdot n$ for the t-SVDMII.

Discussion

If we assume we do not need to store the transformation matrix $M$, then for $\kappa=k$ Theorem 5.3 says the approximation error of the t-SVDM is at least as good as that of the corresponding matrix approximation. In applications where we need to store only the basis terms, e.g., to do projections, the basis for the tensor approximation is better in a relative-error sense than the basis for the matrix case, for the same storage. However, unless $n=1$, if we need to store both the basis and the coefficients, the tensor case requires more storage when $\kappa=k$. Fortunately, in practice $\|\mathcal{A}-\mathcal{A}_\kappa\|_F\le\|A-A_k\|_F$ for $\kappa<k$. Indeed, we already showed an example (see Example 5.2) where the error is zero for $\kappa=1$, while $k$ had to be much larger to achieve exact approximation. If $\kappa/k<(mn+p)/(mn+pn)$, then the total implicit storage of the $\kappa$-term tensor approximation is less than the total storage of the $k$-term matrix approximation.
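A quick arithmetic check of the storage comparison in Table 1 follows (the dimensions and truncation levels are arbitrary examples of ours; a fast-transform $M$ is assumed, so st[M] = 0).

```python
# Implicit storage (number of stored floats) for a k-term matrix SVD versus a
# kappa-term t-SVDM of an m x p x n tensor, following Table 1.
m, p, n = 120, 160, 100

def matrix_storage(k):
    return k * (m * n + p)            # basis k*m*n plus coefficients k*p

def tsvdm_storage(kappa):
    return kappa * (m * n + p * n)    # basis kappa*m*n plus tubal coefficients kappa*p*n

k, kappa = 40, 10                     # the tensor truncation is often far smaller for the same error
print(tsvdm_storage(kappa) < matrix_storage(k))              # True
print(kappa / k < (m * n + p) / (m * n + p * n))             # the break-even condition above
```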

Compared to the matrix SVD, the t-SVDMII approach can provide compression for an approximation level that is at least as good, or better, as indicated by the theorem. Of course, $M$ should be an appropriate choice given the latent structure in the data. The t-SVDMII approach allows us to account for the more "important" features (e.g., low frequencies and multidimensional correlations) and therefore impose a larger truncation index on the corresponding frontal faces, because those features contribute more to the global approximation. Truncation of the t-SVDM by a single truncation index, on the other hand, effectively treats all features equally and always truncates each $\hat{\mathcal{A}}_{:,:,i}$ to $k$ terms, which, depending on the choice of $M$, may not be as good. This is demonstrated in Section 8.

6. Comparison to Other Tensor Decompositions

Now, we compare the M decompositions to other types of tensor representations as described in Section 1.

A. Comparison to truncated HOSVD

In this section, we show how the truncated HOSVD (tr-HOSVD) can be expressed using an M-product. Then we can compare our truncated results to the tr-HOSVD. Truncation strategies other than the one discussed below could be considered, e.g., ref. 22, but they are outside the scope of this study.

The tr-HOSVD is formed by truncating the HOSVD factor matrices $Q,W,Z$ (the factors denoted $U,V,W$ in [2]) to $k_1,k_2,k_3$ columns, respectively, and forming the $k_1\times k_2\times k_3$ core tensor as

$$\mathcal{C}_{\mathbf{k}}\equiv\mathcal{A}\times_1 Q_{:,1:k_1}^H\times_2 W_{:,1:k_2}^H\times_3 Z_{:,1:k_3}^H,$$

where $\mathbf{k}$ denotes the triple $(k_1,k_2,k_3)$. The tr-HOSVD approximation is then

$$\mathcal{A}_{\mathbf{k}}=\mathcal{C}_{\mathbf{k}}\times_1 Q_{:,1:k_1}\times_2 W_{:,1:k_2}\times_3 Z_{:,1:k_3}.$$

We now prove the following theorem, which shows that the tr-HOSVD can be represented under the M-product when $M=Z^H$.

Theorem 6.1. Define the $n\times n$ matrix $M$ as $M=Z^H$ (since $Z$ is unitary, it follows that $M^{-1}=Z$), and define tensors $\mathcal{Q}$ and $\mathcal{W}$ in the transform space according to

$$\hat{\mathcal{Q}}_{:,:,i}=Q\quad\text{and}\quad \hat{\mathcal{W}}_{:,:,i}=W,\qquad i=1,\ldots,n,$$

so that $\mathcal{Q}_{:,1:k_1,:}$ is $m\times k_1\times n$ and $\mathcal{W}_{:,1:k_2,:}$ is $p\times k_2\times n$. Then $\mathcal{Q}=\hat{\mathcal{Q}}\times_3 M^{-1}$, $\mathcal{W}=\hat{\mathcal{W}}\times_3 M^{-1}$, and it is easy to show that $\mathcal{Q}$, $\mathcal{W}$ are unitary tensors under $M$. Define $\hat{\mathcal{P}}$ as the $p\times p\times n$ tensor with identity matrices on frontal faces 1 to $k_3$ and zero matrices on faces $k_3+1$ to $n$, and set $\mathcal{P}=\hat{\mathcal{P}}\times_3 M^{-1}$.

Let $\mathcal{C}=\mathcal{Q}_{:,1:k_1,:}^H\star_M\mathcal{A}\star_M\mathcal{W}_{:,1:k_2,:}$. Then

$$\mathcal{A}_{\mathbf{k}}=\mathcal{Q}_{:,1:k_1,:}\star_M\mathcal{C}\star_M\mathcal{W}_{:,1:k_2,:}^H\star_M\mathcal{P}.$$

Proof: First consider

$$\mathcal{C}\equiv\mathcal{A}\times_1 Q_{:,1:k_1}^H\times_2 W_{:,1:k_2}^H\times_3 Z^H=(\mathcal{A}\times_3 Z^H)\times_1 Q_{:,1:k_1}^H\times_2 W_{:,1:k_2}^H=\hat{\mathcal{A}}\times_1 Q_{:,1:k_1}^H\times_2 W_{:,1:k_2}^H,\qquad[12]$$

using properties of modewise products (see ref. 12). From the definition of the modewise product, the $i$th face of $\mathcal{C}$ as defined via [12] is $Q_{:,1:k_1}^H\hat{\mathcal{A}}_{:,:,i}W_{:,1:k_2}$. However, this means that we can equivalently represent $\mathcal{C}$ as $\mathcal{C}=\mathcal{Q}_{:,1:k_1,:}^H\star_M\mathcal{A}\star_M\mathcal{W}_{:,1:k_2,:}$.

Now, $\mathcal{B}\equiv\mathcal{Q}_{:,1:k_1,:}\star_M\mathcal{C}\star_M\mathcal{W}_{:,1:k_2,:}^H$ implies $\hat{\mathcal{B}}_{:,:,i}=Q_{:,1:k_1}\hat{\mathcal{C}}_{:,:,i}W_{:,1:k_2}^H$, $i=1,\ldots,n$. However, since $\hat{\mathcal{C}}_{:,:,i}=Q_{:,1:k_1}^H\hat{\mathcal{A}}_{:,:,i}W_{:,1:k_2}$ for $i=1,\ldots,n$, we only need to zero out frontal slices $k_3+1$ through $n$ of $\hat{\mathcal{B}}$ to arrive at $\mathcal{A}_{\mathbf{k}}$, which we do by taking the M-product with $\mathcal{P}$ on the right.

Note that in our theorem we assume $Z$ is square, but the transformation is effectively truncated by postmultiplying by $\mathcal{P}$, which allows for the M-product representation of the tr-HOSVD. Thus, we can now compare the theoretical results for the tr-HOSVD to those for our truncated methods.

Theorem 6.2. Given the tr-HOSVD approximation $\mathcal{A}_{\mathbf{k}}$, for $\kappa\ge\min(k_1,k_2)$,

$$\|\mathcal{A}-\mathcal{A}_\kappa\|_F\le\|\mathcal{A}-\mathcal{A}_{\mathbf{k}}\|_F,$$

with equality only if $\mathcal{A}_{\mathbf{k}}=\mathcal{A}_\kappa$.

Proof: Note that the t-ranks of $\mathcal{Q}_{:,1:k_1,:}$ and $\mathcal{W}_{:,1:k_2,:}\star_M\mathcal{P}$ are at most $k_1$ and $k_2$, respectively. Since $\mathcal{C}$ is $k_1\times k_2\times n$, the t-rank of $\mathcal{A}_{\mathbf{k}}$ cannot exceed $\min(k_1,k_2)\le\kappa$. As such, we know $\mathcal{A}_{\mathbf{k}}$ can be written as a sum of $\kappa$ outer products of tensors under $M$, and the result follows from the optimality of $\mathcal{A}_\kappa$.

Corollary 6.3. Given the tr-HOSVD approximation $\mathcal{A}_{\mathbf{k}}$, for $\kappa\ge\min(k_1,k_2)$ there exists $\gamma$ such that the $\mathcal{A}_{\boldsymbol{\rho}}$ returned by Algorithm 3 has implicit rank less than or equal to that of $\mathcal{A}_\kappa$ and

$$\|\mathcal{A}-\mathcal{A}_{\boldsymbol{\rho}}\|_F\le\|\mathcal{A}-\mathcal{A}_\kappa\|_F\le\|\mathcal{A}-\mathcal{A}_{\mathbf{k}}\|_F.$$

Note this is independent of the choice of $k_3$: the best tr-HOSVD approximation occurs when $k_3=n$.

While this has theoretical value, it raises the question of whether $Z$ needs to be stored explicitly. When $M$ is a matrix that can be applied quickly without explicit storage, such as a discrete cosine transform, this is not a consideration. We will say more about this at the end of this section.

B. Comparison to TT-SVD

The TT-SVD was introduced in ref. 9. In recent years, the (truncated) TT-SVD has been used successfully for compressed representations of matrix operators in high-dimensional spaces and for data compression.

For a third-order tensor, the first step of the TT-SVD algorithm is to perform a truncated matrix SVD of one unfolding. Thus, the results depend upon the choice of unfolding. In order to compare with our method, we shall choose the mode-2 unfolding.# We note that $\mathbf{A}_{(2)}=A^{\top}$, and since the SVD $A=U\Sigma V^{\top}$ has been used already, we observe that $\mathbf{A}_{(2)}=V\Sigma U^{\top}$ is the SVD of the mode-2 unfolding. The TT-SVD algorithm proceeds as follows for third-order tensors:

1. Truncate $V(\Sigma U^{\top})$ to $k$ terms, keeping $V_k$ (typically stored as a $1\times p\times k$ tensor). Let $C\equiv\Sigma_k U_k^{\top}$, and note that it is $k\times mn$.

2. Let $L(C)$ denote $C$ reshaped to an $mk\times n$ matrix. Compute the SVD $L(C)=WDQ^{\top}$ and truncate to $q$ terms. The $mk\times q$ matrix $W_q$ is folded into a $k\times m\times q$ tensor, and the remaining $E_q\equiv D_qQ_q^{\top}$ is stored as a $q\times n\times 1$ tensor.

The truncations $k,q$ are chosen based on a user-defined threshold $\epsilon$ such that $\|\mathcal{A}-\tilde{\mathcal{A}}\|_F\le\epsilon\|\mathcal{A}\|_F$, where $\tilde{\mathcal{A}}$ is the TT-SVD approximation. The storage cost of this implicit tensor approximation is $pk+mkq+qn$ numbers.
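For reference, a minimal NumPy sketch of the third-order TT-SVD follows, written for the standard ordering in which the first unfolding is along mode 1; as noted above, the paper instead permutes the tensor so that the first unfolding is the mode-2 unfolding. The function name and sizes are ours.

```python
import numpy as np

def tt_svd_3(A, r1, r2):
    """Truncated TT-SVD of a third-order tensor A (n1 x n2 x n3) with TT-ranks (r1, r2)."""
    n1, n2, n3 = A.shape
    # step 1: truncated SVD of the first unfolding (n1 x n2*n3)
    U, s, Vh = np.linalg.svd(A.reshape(n1, n2 * n3), full_matrices=False)
    G1 = U[:, :r1]                                   # first carriage, n1 x r1
    C = np.diag(s[:r1]) @ Vh[:r1, :]                 # remaining factor, r1 x (n2*n3)
    # step 2: reshape the remainder and truncate its SVD
    W, d, Qh = np.linalg.svd(C.reshape(r1 * n2, n3), full_matrices=False)
    G2 = W[:, :r2].reshape(r1, n2, r2)               # middle carriage
    G3 = np.diag(d[:r2]) @ Qh[:r2, :]                # last carriage, r2 x n3
    return G1, G2, G3

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 5, 4))
G1, G2, G3 = tt_svd_3(A, r1=6, r2=4)                 # no truncation: exact recovery
A_rec = np.einsum('ia,ajb,bk->ijk', G1, G2, G3)
print(np.allclose(A, A_rec))                         # True
```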

Theorem 6.4. Let $\tilde{\mathcal{A}}$ denote the $(k,q)$-truncated TT-SVD approximation, where $k,q$ are the truncation indices that satisfy the user-input threshold $\epsilon$. Then

$$\|\mathcal{A}-\mathcal{A}_k\|_F^2\le\|\mathcal{A}-\tilde{\mathcal{A}}\|_F^2,$$

with strict inequality if $q$ is less than the rank of $L(C)$. Furthermore, there exists $\boldsymbol{\rho}$ such that

$$\|\mathcal{A}-\mathcal{A}_{\boldsymbol{\rho}}\|_F^2\le\|\mathcal{A}-\tilde{\mathcal{A}}\|_F^2.$$

Proof: Let $\mathbf{A}_{(2),k}=V_kC$, let $r_1$ be the rank of $C$, and let $r_2$ be the rank of $L(C)$. The steps of the algorithm imply that $\tilde{\mathbf{A}}_{(2)}=V_kL^{-1}(W_qE_q)$. Thus,

$$\|\mathbf{A}_{(2)}-\tilde{\mathbf{A}}_{(2)}\|_F^2=\|\mathbf{A}_{(2)}-\mathbf{A}_{(2),k}+\mathbf{A}_{(2),k}-\tilde{\mathbf{A}}_{(2)}\|_F^2=\|C-L^{-1}(W_qE_q)\|_F^2+\|\Sigma_{k+1:r_1,k+1:r_1}U_{:,k+1:r_1}^{\top}\|_F^2=\sum_{i=k+1}^{r_1}\sigma_i^2+\|L(C)-W_qE_q\|_F^2=\|A-A_k\|_F^2+\sum_{j=q+1}^{r_2}d_j^2,$$

by employing the unitary invariance of the Frobenius norm.

Applying Theorem 5.5,

$$\|\mathcal{A}-\mathcal{A}_{\boldsymbol{\rho}}\|_F^2\le\|\mathcal{A}-\mathcal{A}_k\|_F^2\le\|A-A_k\|_F^2\le\|A-A_k\|_F^2+\sum_{j=q+1}^{r_2}d_j^2=\|\mathbf{A}_{(2)}-\tilde{\mathbf{A}}_{(2)}\|_F^2=\|\mathcal{A}-\tilde{\mathcal{A}}\|_F^2.$$

The inequality is strict if $q<r_2$; it is also strict, independent of $q$, if $\|\mathcal{A}-\mathcal{A}_k\|_F<\|A-A_k\|_F$, which, as noted earlier, is achievable and is almost always the case in practice.

C. Approximation in CP Form

The purpose of interpreting our decomposition in CP form is with an eye toward the implicit compressed representation discussed in the next subsection. Consider the formation of $\mathcal{A}_{\boldsymbol{\rho}}$ in Algorithm 3. In the final step of the loop, we keep $\rho_i$ terms of the matrix SVD of frontal slice $i$ of $\hat{\mathcal{A}}$. So

$$\hat{\mathcal{A}}_{\boldsymbol{\rho}}=\sum_{i=1}^{n}\sum_{j=1}^{\rho_i}\hat{\sigma}_j^{(i)}\,\bigl(\hat{U}_{:,j,i}\circ\hat{V}_{:,j,i}\circ\mathbf{e}_i\bigr),\qquad[13]$$

where $\mathbf{e}_i$ is the $i$th column of the $n\times n$ identity matrix. However, $\mathcal{A}_{\boldsymbol{\rho}}=\hat{\mathcal{A}}_{\boldsymbol{\rho}}\times_3 M^{-1}$. Thus,

$$\mathcal{A}_{\boldsymbol{\rho}}=\sum_{i=1}^{n}\sum_{j=1}^{\rho_i}\hat{\sigma}_j^{(i)}\,\bigl(\hat{U}_{:,j,i}\circ\hat{V}_{:,j,i}\circ(M^{-1}\mathbf{e}_i)\bigr).$$

Next, concatenate vectors to form three matrices of $r$ columns each, as follows. Let $U\equiv[\hat{U}_{:,1:\rho_1,1},\ldots,\hat{U}_{:,1:\rho_n,n}]$, and similarly for $V$. Let

$$W\equiv[\underbrace{\tilde{M}_{:,1},\ldots,\tilde{M}_{:,1}}_{\rho_1},\ \ldots,\ \underbrace{\tilde{M}_{:,n},\ldots,\tilde{M}_{:,n}}_{\rho_n}],\qquad[14]$$

where $\tilde{M}=M^{-1}$, so that $W$ also has $r$ columns, with repeats as dictated by the $\rho_i$. Let $S$ contain the corresponding $\hat{\sigma}_j^{(i)}$, $j=1,\ldots,\rho_i$. Then

$$\mathcal{A}_{\boldsymbol{\rho}}=[\![\,S;U,V,W\,]\!].$$

Note that if $M$ is an orthogonal (unitary) matrix, then $M^{-1}=M^{\top}$ ($M^H$ for complex $M$), so the columns of the rightmost factor matrix are columns of the transpose of $M$ and hence have unit norm. The columns of $U$ and $V$ also have unit norm.

D. Discussion: Storage of the Transformation Matrix

From [14], only the columns $i$ of $\tilde{M}$ for which $\rho_i>0$ need to be stored (along with a storage-negligible vector of integer pointers), so the storage drops to $O(\operatorname{nnz}(\boldsymbol{\rho})\,n)$; see Table 2. In the numerical results on hyperspectral data, this fact, in combination with Corollary 6.3, allows the t-SVDMII to give results superior to the tr-HOSVD in both error and storage.

Table 2.

Summary of st[M]

Transform | t-SVDM | t-SVDMII
Fast transform $M$ | 0 | 0
Unstructured $M$ | nnz($M$) | min(nnz($\boldsymbol{\rho}$)$\cdot n$, nnz($M$))

7. Multisided Tensor Compression

Consider the mapping $\mathbb{C}^{m\times p\times n}\to\mathbb{C}^{n\times p\times m}$ induced by matrix-transposing (without conjugation) each of the $p$ lateral slices. We use a superscript $P$ to denote such a permuted tensor:

$$\mathcal{A}^P=\operatorname{permute}(\mathcal{A},[3,2,1]),\qquad (\mathcal{A}^P)^P=\mathcal{A}.$$

In this section, we define techniques for compression using both orientations of the lateral slices in order to ensure a more balanced approach to the compression of the data.

A. Optimal Convex Combinations

Given $\mathcal{A}=\mathcal{U}\star_M\mathcal{S}\star_M\mathcal{V}^H$ and $\mathcal{A}^P=\mathcal{W}\star_B\mathcal{D}\star_B\mathcal{Q}^H$ (a t-SVDM of the permuted tensor under a possibly different transform $B$), compress each and form

$$\alpha\bigl(\mathcal{U}_{k_1}\star_M\mathcal{S}_{k_1}\star_M\mathcal{V}_{k_1}^H\bigr)+(1-\alpha)\bigl(\mathcal{W}_{k_2}\star_B\mathcal{D}_{k_2}\star_B\mathcal{Q}_{k_2}^H\bigr)^P.$$

Observe that $\operatorname{unfold}(\mathcal{A}^P)=PA$, where $A=\operatorname{unfold}(\mathcal{A})$ as before and $P$ denotes a stride permutation matrix. Since $P$ is orthogonal, the singular values and right singular vectors of $A$ are the same as those of $PA$, and the left singular vectors are row-permuted by $P$. For a truncation parameter $r$,

$$\|\mathcal{A}^P-(\mathcal{A}^P)_r\|_F\le\|A-A_r\|_F.$$

It follows that for $\beta\equiv 1-\alpha$,

$$\bigl\|\mathcal{A}-\alpha\mathcal{A}_{k_1}-\beta\bigl((\mathcal{A}^P)_{k_2}\bigr)^P\bigr\|_F\le\alpha\|\mathcal{A}-\mathcal{A}_{k_1}\|_F+\beta\|\mathcal{A}^P-(\mathcal{A}^P)_{k_2}\|_F\le\|A-A_{\min(k_1,k_2)}\|_F.$$

Similarly, for the optimal t-SVDMII approximations, define $\mathcal{A}_{\boldsymbol{\rho},\boldsymbol{\delta}}\equiv\alpha\mathcal{A}_{\boldsymbol{\rho}}+\beta\bigl((\mathcal{A}^P)_{\boldsymbol{\delta}}\bigr)^P$, where $\boldsymbol{\rho},\boldsymbol{\delta}$ are the multiranks for the two orientations, respectively, which may have been determined with different energy levels. From Theorem 3.8 and the convexity of the squared norm (taking $M$ and $B$ unitary so that the constants from [3] are 1),

$$\|\mathcal{A}-\mathcal{A}_{\boldsymbol{\rho},\boldsymbol{\delta}}\|_F^2\le\alpha\sum_{i=1}^{n}\sum_{j=\rho_i+1}^{r_i}\bigl(\hat{\sigma}_j^{(i)}\bigr)^2+\beta\sum_{k=1}^{m}\sum_{j=\delta_k+1}^{\tilde r_k}\bigl(\tilde{\sigma}_j^{(k)}\bigr)^2,$$

where $r_i$ is the rank of $\hat{\mathcal{A}}_{:,:,i}$ under $M$ for $i=1,\ldots,n$, $\tilde r_k$ is the rank of $\widehat{\mathcal{A}^P}_{:,:,k}$ under $B$, and the $\tilde{\sigma}_j^{(k)}$ are the corresponding singular values of $\widehat{\mathcal{A}^P}$ under $B$ as well.
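A short sketch of the two-orientation blend is given below (in Python; it assumes some truncation routine `approx_k`, e.g., one built from the t-SVDM sketch in Section 3, and its name is ours).

```python
import numpy as np

def blended_approximation(A, approx_k, k1, k2, alpha):
    """Blend approximations built in the two orientations: alpha*A_k1 + (1-alpha)*(A^P_k2)^P.

    approx_k(T, k) is assumed to return a truncated tensor approximation of T (e.g., a
    truncated t-SVDM); the permutation transposes every lateral slice, as in the text.
    """
    A_perm = np.transpose(A, (2, 1, 0))                  # the permute(A, [3, 2, 1]) mapping
    term1 = approx_k(A, k1)
    term2 = np.transpose(approx_k(A_perm, k2), (2, 1, 0))
    return alpha * term1 + (1.0 - alpha) * term2
```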

B. Sequential Compression

Sequential compression is also possible, as described in ref. 23. A second tensor SVD is applied to Cp, making each step locally optimal. Due to space constraints, we will not elaborate on this further here.

8. Numerical Examples

In the following discussion, the compression ratio (CR) is defined as the number of floating point numbers needed to store the uncompressed data divided by the number of floating point numbers needed to store the compressed representation (in its implicit form). Thus, the larger the ratio, the better the compression. The relative error (RE) is the ratio of the Frobenius-norm difference between the original data and the approximation over the Frobenius-norm of the original data.
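For clarity, the two reported quantities can be written as small helper functions (ours, not from the paper):

```python
import numpy as np

def compression_ratio(uncompressed_count, compressed_count):
    """CR: floats needed for the raw data divided by floats needed for the compressed form."""
    return uncompressed_count / compressed_count

def relative_error(A, A_approx):
    """RE: Frobenius norm of the difference divided by the Frobenius norm of the original."""
    return np.linalg.norm(A - A_approx) / np.linalg.norm(A)
```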

A. Compression of Yale B Data

In this section, we show the power of compression for the t-SVDMII approach with an appropriate choice of M, that is, an M that exploits structure inherent in the data. We create a third-order tensor from the Extended Yale B face database (24) by putting the training images in as lateral slices of the tensor. Then we apply Algorithm 3, varying ρ, for four different choices of M: a random orthogonal matrix, an orthogonal wavelet matrix, the unnormalized DCT matrix, and a data-driven M formed by taking the transpose of the left factor matrix Z of the mode-3 unfolding. We have chosen to include a random orthogonal matrix in this experiment to show that the compression power is relative to structure that is induced through the choice of M, so we do not expect, nor do we observe, value in choosing M to be random. In Fig. 3, we plot the CR against the RE of the approximation. We observe that, for RE on the order of 10 to 15%, the compression margins achieved by the t-SVDMII with the DCT, the wavelet transform, or the data-driven transform are quite large compared with treating the data in matrix form or with choosing a transform that, like the matrix case, does not exploit structure in the data. To evaluate only the compressibility associated with the transformation, we do not count the storage of M.

Fig. 3. Illustration of the compressive power of the t-SVDMII in Algorithm 3 for appropriate choices of M. Far more compression is achieved using the DCT, a wavelet transform, or a data-driven transform to define M, since these capitalize on structural features in the data.

B. Video Frame Data

For this experiment, we use video data available in MATLAB.** The video consists of 120, 120×160 frames in grayscale. The camera is positioned near one spot in the road, and cars travel on that road (more or less from the top to the bottom as the frames progress), so the only changes per frame are cars entering and disappearing.

We compare the performance of our truncated t-SVDMII, with M the DCT matrix, against the truncated matrix SVD and truncated HOSVD approximations. We orient the frames as transposed lateral slices in the tensor to confine the change from one lateral slice to the next to the rows where there is car movement.†† Thus, A is 160×120×120.

With both the truncated t-SVDMII and truncated matrix SVD approaches we can truncate based on the same energy value. Thus, we get RE in our respective approximations with about the same value, and then we can compare the relative compression. Alternatively, we can fix our energy value and compute our truncated t-SVDMII and find its CR. Then, we can compute the truncated matrix SVD approximation with similar CR and compare its RE to the tensor-based approximation. We show some results for each of these two ways of comparison.

There are many ways of choosing the truncation 3-tuple for the HOSVD. Trying to choose a 3-tuple that has a comparable relative approximation to our approach would be cumbersome and would still leave ambiguities in the selection process. Thus, we employ two truncation methods that yield an approximation best matching the CRs of our tensor approximation. The indices are chosen as follows: 1) compress only on the second mode (i.e., change k2, fix k1=m, k3=n) and 2) choose truncation parameters on all dimensions such that the modewise compression-to-dimension ratios are about the same. The second option amounts to looping over k2=1,…,p, setting k1=round(k2·m/n) and k3=k2. Thus, it is possible to compute the CR for the tr-HOSVD based on the dimensions in advance to find the closest match to the desired compression levels. The results are given in Table 3.

Table 3.

Video experiment results

Experiment 1 (γ1 = 0.998) | t-SVDMII | Matrix (same γ) | Matrix (same CR) | tr-HOSVD (m,k2,n) | tr-HOSVD (k1,k2,k2)
CR | 4.76 | 1.83 | 4.76 | 4.95 | 4.90
RE | 0.044 | 0.045 | 0.093 | 0.098 | 0.065

Experiment 2 (γ2 = 0.996) | t-SVDMII | Matrix (same γ) | Matrix (same CR) | tr-HOSVD (m,k2,n) | tr-HOSVD (k1,k2,k2)
CR | 10.10 | 2.54 | 10.87 | 10.75 | 10.42
RE | 0.063 | 0.064 | 0.120 | 0.125 | 0.090

Matrix-based compression can be run either for a predefined relative energy γ (which determines the RE) or to achieve a desired compression level, so we performed both. For experiment 1 with γ1, the k2 and (k1,k2,k3) values that gave comparable compression were 25 and (92, 69, 69), respectively; for the second experiment they were 11 and (70, 53, 53), respectively. Truncations for the matrix case were 65 and 25 for the γ1 experiment and 47 and 11 for the second.

We can also visualize the impact of the compression schemes. In Fig. 4 we show the corresponding reconstructions of frames 10 and 54 for the four methods under comparable CR for the second γ (columns 2 and 4 through 6 of the second row block of the table, i.e., the results corresponding to the most compression). Cars disappear altogether and/or artifacts make it appear as though cars may be in the frame when they are not; at these compression levels, the matrix and tr-HOSVD reconstructions suffer from a ghosting effect.

Fig. 4. Various reconstructions of frames 10 and 54 (Left and Right). Top to bottom: original, tr-t-SVDMII with γ2=0.996, truncated matrix SVD for the same γ2, tr-HOSVD (70,53,53), and tr-HOSVD (m,25,n). See Table 3 for numerical results.

C. Hyperspectral Image Compression

We compare the performance of the t-SVDMII and the truncated HOSVD, based on our results in Corollary 6.3, on hyperspectral data. The hyperspectral data come from flyover images of the Washington, DC mall, captured with a sensor that measured 210 bands of visible and infrared spectra (25). After removing the opaque atmospheric bands, the data consist of 191 images corresponding to different wavelengths, each of size 307 × 1,280.

To reduce the computational cost of this comparison, and to allow for equal truncation in both spatial dimensions, we resize the images to 300×300 using MATLAB's imresize. We store the data in a tensor A of size 300×300×191, where the third dimension indexes wavelength. We choose this orientation because we use M=Zᵀ, where Z contains the left singular vectors of A_(3). We can expect high correlation, and hence high compressibility (i.e., many ρi will be 0), along the third dimension because the entries of each tube fiber correspond to exactly the same spatial location at every wavelength.

For a fair comparison, we relate the t-SVDMII, the tr-HOSVD, and the TT-SVD via the truncation parameter κ of the truncated t-SVDM A_κ using M=Zᵀ. For the t-SVDMII, based on Theorem 5.5, we use the energy parameter γ=‖A_κ‖²_F/‖A‖²_F. For the tr-HOSVD, based on Theorem 6.2, we use a variety of multilinear truncations (κ,k2,k3) and (k1,κ,k3) for various choices of k1,k2 ≥ κ and k3. For the TT-SVD, based on Theorem 6.4, we choose the accuracy threshold ϵ such that the truncation of the first unfolding is κ. Note that we first permute the tensor A so that the first TT-SVD unfolding is the mode-2 unfolding. For completeness, we include the matrix SVD A_κ and apply Theorem 5.3. We display the RE-vs.-CR results in Fig. 5.

Fig. 5. Performance of various representations for the hyperspectral data compression. Each shape represents a different method. The methods are related by the truncation parameter κ, and each color represents a different choice of κ. The HOSVD results depicted by the squares correspond to multilinear rank (κ,κ,n): they maximally compress the first two dimensions such that Theorem 6.2 applies and do not compress the third dimension, to minimize the RE. The HOSVD results depicted by the asterisks are many different choices of multilinear rank (k1,k2,k3) such that Theorem 6.2 applies. For each κ, we depict all possible combinations of k3=10,30,…,190 and either k1=κ with k2 any larger truncation value, or vice versa. The RE depends most on the value of κ, and the CR depends most on the value of k3. This is due to the orientation and structure of the data; the third dimension is highly compressible, and thus truncating k3 does not significantly worsen the approximation. Comparing shapes, the t-SVDMII produces the best results, clearly outperforming all cases of the SVD and HOSVD and outperforming the TT-SVD, most noticeably for better approximations (lower RE). Comparing colors, for a fixed value of κ the t-SVDMII produces the smallest RE, providing empirical support for the theory.

The results use the t-SVDMII storage method described in Section 6D. Even when counting the storage of M, the t-SVDMII outperforms the matrix SVD and the tr-HOSVD, is highly competitive with the TT-SVD, and outperforms it for larger values of κ (i.e., less truncation but better approximation quality). For example, if we compare the t-SVDMII result with κ=250 (dark gray diamond) against the TT-SVD result with κ=290 (light gray circle), we see that the t-SVDMII produces a better approximation for roughly the same CR. The color-coded results in Fig. 5 follow directly from writing the tr-HOSVD in our M-framework and the optimality condition in Theorem 6.2, and from relating the TT-SVD to the matrix SVD and using our Eckart–Young result in Theorem 5.3. Specifically, we see that for each choice of κ the t-SVDMII approximation has a smaller RE than all other methods.

We also note that for the HOSVD in Fig. 5 a large number of combinations of multirank parameters (k1,k2,k3) are possible and can drastically change the approximation–compression relationship. Finding a reasonable tr-HOSVD multirank via trial and error may not be feasible in practice. In comparison, there is only one parameter for the t-SVDMII and TT-SVD which impacts the approximation–compression relationship more predictably, particularly because the approximation quality is known a priori.

D. Extension to Four Dimensions and Higher

Although the algorithms and optimality results were described for third-order tensors, the algorithmic approach and optimality theory can be extended to higher-order tensors since the definitions of the tensor-tensor products extend to higher-order tensors in a recursive fashion, as shown in ref. 26. Ideas based on its truncation for compression and high-order tensor completion and robust PCA can be found in the literature (19, 27). As noted in ref. 11, a similar recursive construct can be used for higher-order tensors for the M-product, or different combinations of transform-based products can be used along different modes. We give the four-dimensional t-SVDMII algorithm in ref. 23. Due to space constraints, we do not discuss this further here.

9. Conclusions and Ongoing Work

We have proved Eckart–Young theorems for the t-SVDM and t-SVDMII. Importantly, we showed that the truncated t-SVDM and t-SVDMII representations give a better approximation to the data than the corresponding matrix-based SVD compression. Although the superiority of tensor-based compression has been observed by many in the literature, our result proves this theoretically. By interpreting the HOSVD in the M-framework, we developed the relationship between the HOSVD and the t-SVDM and t-SVDMII and then applied our Eckart–Young results to show how our tensor approximations improve on the Frobenius-norm approximation error. We were also able to apply our Eckart–Young theorems to compare the Frobenius-norm error with that of a TT-SVD approximation.

We briefly considered both multisided compression and extensions of our work to higher-order tensors. In future work, we will investigate means of choosing α, k, and j such that the upper bound on the error is minimized while also minimizing the total storage.

The choice of M defining the tensor-tensor product should be tailored to the data for best compression. Therefore, how best to design M to suit a given dataset will be pursued in future work.

Acknowledgments

M.E.K.’s work was partially supported by grants from the IBM Thomas J. Watson Research Center, the Tufts T-Tripods Institute (under NSF Harnessing the Data Revolution grant CCF-1934553), and NSF grant DMS-1821148.

Footnotes

The authors declare no competing interest.

This article is a PNAS Direct Submission.

*If A is m×1×n, the MATLAB command squeeze(A) returns the m×n matrix.

†Since computation of the best rank-k CP decomposition requires an iterative method and does not impose orthogonality, CP results are not included for comparison in this study.

‡The reader should regard the elements of the tensor as 1×1×n tube fibers under M; these form a free module (see ref. 11). The analogy to elemental inner-product-like definitions over the underlying free module induced by M and the space of tube fibers is referenced in section 4 of ref. 11 for general M-products and in section 3 of ref. 13 for the t-product. The notion of orthogonal/unitary tensors is therefore consistent with this generalization of inner products, which is captured in the first part of the definition.

§The term “t-rank” is exclusive to the M tensor decomposition and should not be confused with the rank of a tensor, which was defined in Section 1.

¶Specifically, we mean the minimizer of the Frobenius norm of the discrepancy between the original and the approximated tensor.

#If we want to choose a different mode for the first unfolding we would correspondingly permute the tensor prior to decomposing with our method.

‖In MATLAB, this would be obtained by using the command permute(A,[3,2,1]).

**The video data are built into MATLAB and can be loaded using trafficVid = VideoReader('traffic.mj2').

††Orienting with transposed frames did not affect performance significantly.

Data Availability

There are no data underlying this work.

References

1. Stewart G. W., On the early history of the singular value decomposition. SIAM Rev. 35, 551–566 (1993).
2. Eckart C., Young G., The approximation of one matrix by another of lower rank. Psychometrika 1, 211–218 (1936).
3. Watkins D. S., Fundamentals of Matrix Computations (Wiley, ed. 3, 2010).
4. Hitchcock F. L., The expression of a tensor or a polyadic as a sum of products. J. Math. Phys. 6, 164–189 (1927).
5. Carroll J. D., Chang J., Analysis of individual differences in multidimensional scaling via an n-way generalization of ‘Eckart-Young’ decomposition. Psychometrika 35, 283–319 (1970).
6. Harshman R., “Foundations of the PARAFAC procedure: Models and conditions for an ‘explanatory’ multi-modal factor analysis” (UCLA Working Papers in Phonetics, 16, 1970).
7. Tucker L. R., “Implications of factor analysis of three-way matrices for measurement of change” in Problems in Measuring Change, Harris C. W., Ed. (University of Wisconsin Press, Madison, WI, 1963), pp. 122–137.
8. De Lathauwer L., De Moor B., Vandewalle J., A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21, 1253–1278 (2000).
9. Oseledets I. V., Tensor-train decomposition. SIAM J. Sci. Comput. 33, 2295–2317 (2011).
10. Kilmer M. E., Martin C. D., Factorization strategies for third-order tensors. Lin. Algebra Appl. 435, 641–658 (2011).
11. Kernfeld E., Kilmer M., Aeron S., Tensor–tensor products with invertible linear transforms. Lin. Algebra Appl. 485, 545–570 (2015).
12. Kolda T., Bader B., Tensor decompositions and applications. SIAM Rev. 51, 455–500 (2009).
13. Kilmer M. E., Braman K., Hao N., Hoover R. C., Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging. SIAM J. Matrix Anal. Appl. 34, 148–172 (2013).
14. Grasedyck L., Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl. 31, 2029–2054 (2010).
15. Hackbusch W., Tensor Spaces and Numerical Tensor Calculus (Springer Series in Computational Mathematics, Springer, 2012), vol. 42.
16. Vannieuwenhoven N., Vandebril R., Meerbergen K., A new truncation strategy for the higher-order singular value decomposition. SIAM J. Sci. Comput. 34, A1027–A1052 (2012).
17. Hillar C. J., Lim L., Most tensor problems are NP-hard. J. ACM 60, 1–39 (2013).
18. Hao N., Kilmer M. E., Braman K., Hoover R. C., Facial recognition using tensor-tensor decompositions. SIAM J. Imag. Sci. 6, 457–463 (2013).
19. Zhang Z., Ely G., Aeron S., Hao N., Kilmer M., “Novel methods for multilinear data completion and de-noising based on tensor-SVD” in 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2014), pp. 3842–3849.
20. Zhang Y., et al., “Multi-view spectral clustering via tensor-SVD decomposition” in 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI, 2017), pp. 493–497.
21. Sagheer S. V. M., George S. N., Kurien S. K., Despeckling of 3D ultrasound image using tensor low rank approximation. Biomed. Signal Process Contr. 54, 101595 (2019).
22. Ballester-Ripoll R., Lindstrom P., Pajarola R., TTHRESH: Tensor compression for multidimensional visual data. arXiv [Preprint] (2018). https://arxiv.org/abs/1806.05952 (Accessed 1 December 2020).
23. Kilmer M. E., Horesh L., Avron H., Newman E., Tensor-tensor algebra for optimal representation and compression. arXiv [Preprint] (2019). https://arxiv.org/abs/2001.00046 (Accessed 1 December 2020).
24. Georghiades A. S., Belhumeur P. N., Kriegman D. J., From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intell. 23, 643–660 (2001).
25. Landgrebe D., Biehl L., An introduction and reference for MultiSpec (2019). https://engineering.purdue.edu/biehl/MultiSpec/. Accessed 1 December 2020.
26. Martin C. D., Shafer R., LaRue B., An order-p tensor factorization with applications in imaging. SIAM J. Sci. Comput. 35, A474–A490 (2013).
27. Ely G., Aeron S., Hao N., Kilmer M., 5D seismic data completion and denoising using a novel class of tensor decompositions. Geophysics 80, V83–V95 (2015).


