Abstract
Generalizations of matrix decompositions to multidimensional arrays, called tensor decompositions, are simple yet powerful methods for analyzing datasets in the form of tensors. These decompositions model a data tensor as a sum of rank-1 tensors, whose factors are useful in a myriad of applications. Given the massive sizes of modern datasets, an important challenge is how well computational complexity scales with the data, balanced against how well decompositions approximate the data. Many efficient methods exploit a small subset of the tensor's elements, representing most of the tensor's variation via a basis over the subset. These methods' efficiency is often due to their randomized nature; however, deterministic methods can provide better approximations and can perform feature selection, highlighting a meaningful subset that well-represents the entire tensor. In this paper, we introduce an efficient subset-based form of the Tucker decomposition, selecting coresets from the tensor modes such that the resulting core tensor can well-approximate the full tensor. Furthermore, our method enables a novel feature selection scheme unlike those of other methods for tensor data. We introduce methods for random and deterministic coresets, minimizing error via a measure of discrepancy between the coreset and the full tensor. We perform the decompositions on simulated data, and on real-world fMRI data to demonstrate our method's feature selection ability. We demonstrate that, compared with other similar decomposition methods, our methods can typically better approximate the tensor with comparably low computational complexities.
INDEX TERMS: Tensor decomposition, Tucker decomposition, higher-order singular value decomposition, coresets, tensor CUR decomposition, subset selection, feature selection, fMRI
I. INTRODUCTION
Datasets in the modern era often take the form of large multidimensional arrays called tensors. A tensor can be understood as a collection of values (e.g. measurements) that are each associated with a corresponding list of $N$ array indices, where $N$ denotes the order of the tensor. Whereas a vector is a first order tensor and a matrix is a second order tensor, the analysis of third or higher order tensors is the focus of those methods formally called tensor decompositions. Tensor decompositions generalize matrix decompositions to higher order tensors, approximating a tensor dataset as a tensor product of several factor matrices that have various use cases. These generalizations notably endow tensor decompositions with the ability to model multilinear relationships within the data, concisely modeling the relationships across different modes of the tensor. Furthermore, tensor decompositions provide a low-rank model of the tensor that is typically orders of magnitude smaller in memory than the original tensor. A tensor decomposition's factors are typically useful for describing the latent characteristics of the tensor, and are often used for providing a generative model of the data. All in all, tensor decompositions provide tools for a wide range of uses, such as dimension reduction [1], [2], [3], [4], [5], feature extraction [6], [7], [8], [9], denoising [10], [11], [12], [13], [14], [15], missing data completion [15], [16], [17], [18], [19], [20], dictionary learning [21], [22], [23], [24], [25], signal processing [26], [27], [28], [29], [30], [31], [32], and various others. Applications of tensor decompositions are widespread and include chemometrics [1], [33], [34], psychometrics [35], [36], econometrics [37], [38], analysis of medical imaging modalities [7], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], radar and communication applications [30], [49], [50], applications to machine learning [26], [51], [52], [53], [54], [55], [56], and many others.
Perhaps one of the simplest tensor decompositions is what is often called the canonical polyadic decomposition (CPD) [57], [58], which approximates a tensor as the sum of $R$ rank-1 tensors, where $R$ is a user-defined positive integer. The CPD can be understood as a higher-order generalization of matrix low-rank decompositions, which decompose a matrix into a sum of rank-1 matrices that best approximates the original matrix. However, whereas matrix rank decompositions are typically not unique unless additional constraints are imposed, the CPD is often unique under much milder conditions. This results in unique factors that reveal the latent structure of the data under fewer required assumptions [1], [59], [60], [61].
Another useful form of tensor decomposition is the Tucker decomposition [62], [63]. The Tucker decomposition is a general form of tensor decomposition that represents an $N$th order tensor as the tensor product of $N$ factor matrices with an $N$th order ``core'' tensor: a small tensor that can be considered a compressed version of the original tensor. A notable specific type of Tucker decomposition is the higher-order singular value decomposition (HOSVD) [63], [64], the direct generalization of the matrix singular value decomposition (SVD) to tensors. HOSVD is analytically represented by its factor matrices being the left singular vectors of each ``unfolding'' (matricization) of the original tensor, in which case the core tensor can be interpreted as a tensorial form of principal components. While the CPD and the Tucker decomposition are perhaps the most widely used tensor decompositions, since their introduction a wide variety of other decompositions have been introduced and used successfully. These include the Tensor Train decomposition [65], [66], hierarchical Tucker decompositions [67], [68], tensor block-term decompositions [69], [70], coupled matrix-tensor factorizations [44], [71], [72], and online tensor decompositions [73], [74], [75], [76].
Most tensor decompositions perform their optimization routines by breaking the problem of estimating all factor matrices into simpler subproblems. This typically involves solving for each factor matrix one at a time, by unfolding the tensor with respect to each of the modes and subsequently solving for (or updating) a corresponding mode’s factor matrix. While these routines simplify optimization by allowing decompositions to be solved with matrix-based methods, tensor decompositions nevertheless rely on multiplying high-dimensional matrix representations of the tensor data, which can become computationally expensive with exceedingly massive tensors. These challenges have greatly motivated computationally efficient methods for tensor decompositions, especially those that retain simple models with excellent approximation and explainability.
Many efficient tensor decompositions are direct generalizations of matrix decompositions. With matrices, a particularly useful strategy has been to approximate a matrix by projecting it onto the span of only a small subset of its columns. These are referred to as column subset selection (CSS) methods, which include the matrix CUR decomposition [77], [78] that approximates a given matrix using both a subset of its columns and a subset of its rows. Subset selection methods are distinguished by whether they select a subset randomly, with a focus on faster decompositions, or deterministically, with a focus on better approximation and on performing feature selection: identifying particularly representative elements of the data that well-describe the rest of the data. Extensions of these matrix decompositions to tensors exist as types of Tucker decompositions that are called tensor CUR decompositions [79], [80], [81], [82], [83], [84], which use subsets of elements from multiple modes of a tensor to provide a multilinear basis for the entire tensor. Due to their simple procedures, tensor CUR decompositions are among the fastest tensor decompositions, and can also provide good approximations of tensors with reasonably large subset sizes, yet may suffer with smaller subset sizes. These methods exclusively select subsets randomly, rather than deterministically. Extensions of deterministic subset-based methods may also be desirable for tensors, especially in the interest of determining well-representative subsets of the data.
Tensorial feature selection has been accomplished in [85], [86], and [87] but only in the context of supervised learning for classification, where tensors are accompanied by labels and feature selection is a function of the labels. An unsupervised feature selection method for third order tensors was proposed in [88], which takes subsets from a single mode of the tensor. However, features in these subsets differ depending on what elements they correspond to in another ‘‘view’’ mode, and thus may be harder to interpret. Furthermore, these subsets are acquired after performing a CPD, whereas the methods we consider in this paper actually use the subsets to perform efficient tensor decompositions. To our knowledge there have been no other extensions of deterministic subset-based methods to tensor data in the general unsupervised setting, and none for multiple modes of the tensor.
In this paper, we introduce an efficient weighted subset-based type of Tucker decomposition, similar in form to the tensor CUR decompositions and the sequentially truncated higher-order singular value decomposition (ST-HOSVD) [89]. Notably, the deterministic variation of our method provides a novel unsupervised feature selection algorithm for tensor data, selecting subsets from one or more modes that are well able to summarize the structure of the tensor. Sequentially across a tensor's modes, we select from each mode a coreset, i.e. a weighted subset of elements [90], [91], [92], [93], [94], that reasonably minimizes a measure of discrepancy between the coreset and the entire mode, which in turn minimizes the mean squared error cost between the tensor and its approximation. We connect the discrepancy to the cost function of HOSVD, showing that the use of weighted subsets provides a better minimization of the cost than the unweighted subsets used in tensor CUR decompositions. We consider two methods: one based on random coreset selection, sampling according to a weighted probability distribution, and one based on deterministic coreset selection, utilizing an efficient weighted kernel herding (WKH) [95] procedure. For a given coreset, we select the corresponding coreset weights via an efficient nonnegative least squares (NNLS) procedure minimizing the discrepancy between the coreset and the entire mode. We analyze the performance of our two methods on large datasets, testing with both simulated data and real functional magnetic resonance imaging (fMRI) data via functional network connectivity (FNC) matrices arranged as a large tensor. Comparing with similar Tucker-type methods, such as variations of the tensor CUR decomposition and randomized HOSVD methods, we demonstrate that our methods are highly efficient, provide good approximation performance, and can be converted to a HOSVD decomposition with strong estimation quality.
The paper is organized as follows. Section II introduces preliminary concepts regarding matrices and tensors, including several basic methods for matrix and tensor decompositions. Section III explains efficient generalizations of subset-based methods for matrix decompositions to tensor decompositions, such as the tensor CUR decompositions. Section IV introduces our proposed sequential coreset-based tensor decomposition methods, which we refer to as tensor coreset decompositions (TCD). Section V provides results of our methods, compared with various other methods, on both simulated tensor datasets and a real fMRI FNC tensor dataset. Section VI concludes the paper and overviews the contributions.
II. PRELIMINARIES
A. NOTATION
Throughout the paper, we use notation that is summarized in Table 1, and is consistent with notation of other works that discuss tensor decompositions (e.g., [1]).
TABLE 1.
Notation used in this paper.
| Notation | Definition |
|---|---|
| $x$ | scalar |
| $\mathbf{x}$ | vector |
| $\mathbf{X}$ | matrix |
| $\mathcal{X}$ | tensor |
| $N$ | number of modes |
| $I_n$ | dimensionality of the $n$th mode, for $n = 1, \ldots, N$ |
| $R_n$ | n-rank: rank of $\mathbf{X}_{(n)}$ |
| $K_n$ | number of factors in the decomposition of the $n$th mode |
| $i_n$ | index of the $n$th mode, for $i_n = 1, \ldots, I_n$ |
| $\mathcal{S}_n$ | index set of the $n$th mode (with cardinality $\lvert\mathcal{S}_n\rvert$) |
| $\mathcal{S}$ | all modes' index sets |
| $\mathcal{X}_{i_n}$ | $i_n$th element of the $n$th mode |
| $\mathcal{X}_{\mathcal{S}_n}$ | subset of the $n$th mode corresponding to the indices $\mathcal{S}_n$ |
| $\mathbf{X}_{(n)}$ | mode-$n$ unfolding of $\mathcal{X}$ |
| $\mathcal{X} \times_n \mathbf{A}$ | mode-$n$ tensor product of $\mathcal{X}$ with $\mathbf{A}$ |
| $\mathbf{A}^T$ | matrix transpose |
| $\mathbf{A}^{\dagger}$ | matrix pseudoinverse |
| $\mathbf{P}$ | projection matrix |
| $\mathbf{M}_n$ | mode-$n$ "mapping" matrix corresponding to $\mathcal{S}_n$ |
We denote scalars by lowercase unbolded letters (e.g., $x$), vectors by lowercase bolded letters (e.g., $\mathbf{x}$), matrices by uppercase bolded letters (e.g., $\mathbf{X}$), and higher order tensors (order three or higher) by calligraphic bolded letters (e.g., $\mathcal{X}$).
The order of a tensor, $N$, also referred to as the number of modes, can be loosely thought of as the number of dimensions in the tensor, but more precisely it is the number of indices needed to index an entry in the tensor. For instance, a third order tensor has a corresponding element denoted by $x_{i_1 i_2 i_3}$. Each index corresponds to a different mode of the tensor, and is bounded by the dimensionality of that mode. For example, given a third-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, the dimensionality of the first mode is $I_1$. In general, when dealing with $N$th-order tensors, we refer to the dimensionality of the $n$th mode by $I_n$, and a particular index from that mode by $i_n$, for $n = 1, \ldots, N$ and $i_n = 1, \ldots, I_n$.
As our paper utilizes subsets of the tensor, we define a subtensor as a subset of elements in the tensor corresponding to some set of indices. We define index sets over a given mode of a tensor by unbolded calligraphic letters (e.g., $\mathcal{S}_n$ for the $n$th mode), and use a colon to otherwise indicate all elements of a mode. For example, $\mathcal{X}_{i_1}$ denotes the $i_1$th element of the first mode, and $\mathcal{X}_{\mathcal{S}_1}$ denotes a subset of elements in the first mode corresponding to the index set $\mathcal{S}_1$.
An important operation in tensor decompositions is the matricization of a tensor, also called the unfolding. We denote the mode-$n$ unfolding of a tensor $\mathcal{X}$ by the matrix $\mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times J_n}$, where $J_n = \prod_{m \neq n} I_m$ is the product of all other modes' dimensionalities. The $i_n$th row of the mode-$n$ unfolding is the vectorization of the $i_n$th element in the $n$th mode; e.g., $\mathbf{X}_{(1)}(3, :)$ denotes the third row of the first mode unfolding of $\mathcal{X}$ and is equal to $\mathrm{vec}(\mathcal{X}_3)^T$, the vectorized third element of the first mode.
The rank of a tensor $\mathcal{X}$ is defined as the smallest number of rank-1 tensors that sum exactly to $\mathcal{X}$. Unlike with matrices, determining the tensor rank is difficult for most real-world tensors. A more well-defined notion of a tensor's rank structure is given by the n-ranks $R_n$: the ranks of each unfolding $\mathbf{X}_{(n)}$.
If the mode-$n$ unfolding $\mathbf{X}_{(n)}$ of a tensor is left-multiplied by a matrix $\mathbf{A}$, the resulting product $\mathbf{A}\mathbf{X}_{(n)}$ is equivalently represented in the tensor domain by the mode-$n$ tensor product $\mathcal{X} \times_n \mathbf{A}$.
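As a concrete illustration of the unfolding and the mode-$n$ tensor product, the following MATLAB snippet sketches both operations for a small third-order example; the exact column ordering convention of the unfolding may differ from the one used in the paper's implementation, and all variable names are illustrative.

```matlab
% Mode-n unfolding and mode-n tensor product for a small 3rd-order example.
X  = randn(4, 5, 6);                                % example tensor
n  = 2;                                             % operate on mode 2
sz = size(X);
% mode-2 unfolding: each row is a vectorized element of mode 2 (5 x 24)
Xn = reshape(permute(X, [n, 1:n-1, n+1:ndims(X)]), sz(n), []);
A  = randn(3, sz(n));                               % matrix to multiply onto mode 2
Yn = A * Xn;                                        % left-multiply the unfolding
% fold back into a tensor: Y = X x_2 A, of size 4 x 3 x 6
Y  = ipermute(reshape(Yn, [size(A, 1), sz([1:n-1, n+1:end])]), [n, 1:n-1, n+1:ndims(X)]);
```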
The norm of a tensor $\mathcal{X}$ is defined by:

$$\|\mathcal{X}\| = \sqrt{\sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{i_1 i_2 \cdots i_N}^2}.$$
In the next subsections, we first discuss Tucker and HOSVD decompositions for tensors, and note their complexities. We then discuss column subset selection (CSS) methods for reducing complexities of matrix decompositions, and then discuss generalizations of these methods to tensors.
B. TUCKER AND HOSVD DECOMPOSITIONS
The Tucker decomposition [62], [63] is a general type of tensor decomposition that approximates an $N$th order tensor $\mathcal{X}$ by the tensor product of $N$ factor matrices $\mathbf{A}^{(n)} \in \mathbb{R}^{I_n \times K_n}$ with a smaller core tensor $\mathcal{G} \in \mathbb{R}^{K_1 \times K_2 \times \cdots \times K_N}$. The general cost function for Tucker decompositions takes the form:

$$\min_{\mathcal{G},\, \mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}} \Big\| \mathcal{X} - \mathcal{G} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \cdots \times_N \mathbf{A}^{(N)} \Big\|^2, \qquad (1)$$

where $\mathbf{A}^{(n)}$ is the $n$th mode's factor matrix, $\mathcal{G}$ is the core tensor, and the $K_n$ are the numbers of factors chosen for each mode, which are often closely related to the tensor's n-ranks $R_n$, for $n = 1, \ldots, N$.
The Tucker decomposition is not unique without any further constraints. There are a variety of ways to achieve a unique Tucker decomposition over a tensor, including several subset-based approaches such as the tensor CUR decomposition and the method that we later propose in this paper.
A useful Tucker decomposition is the HOSVD [63], [64], a natural generalization of the SVD to tensors. The HOSVD factor matrices of a tensor $\mathcal{X}$ are analytically given as the left singular vectors of each unfolding $\mathbf{X}_{(n)}$ of $\mathcal{X}$, and the core tensor is obtained from a tensor product of the (transposed) factor matrices with $\mathcal{X}$. The HOSVD procedure is described in Algorithm 1.
Several variations of HOSVD have been introduced since its inception to improve its efficiency, with one of the most used variations being the sequentially truncated HOSVD (ST-HOSVD) [89]. Across each mode of the tensor, ST-HOSVD first estimates the mode's factor matrix from the left singular vectors of the mode's unfolding (just as done with HOSVD), and then replaces $\mathcal{X}$ with the core tensor formed by the tensor product of this (transposed) factor matrix with $\mathcal{X}$. Over the calculation of the modes' factor matrices, the current tensor progressively reduces in size until it becomes the final core tensor and all factor matrices are obtained. The ST-HOSVD procedure is described in Algorithm 2.
If we denote the SVD of each $\mathbf{X}_{(n)}$ in the for loop of Algorithm 2 by $\mathbf{X}_{(n)} = \mathbf{U}_n \mathbf{\Sigma}_n \mathbf{V}_n^T$, such that $\mathbf{U}_n^T \mathbf{X}_{(n)} = \mathbf{\Sigma}_n \mathbf{V}_n^T$, it follows that ST-HOSVD's truncation strategy sequentially replaces $\mathcal{X}$ with its top $K_n$ right principal components (PCs) $\mathbf{\Sigma}_n \mathbf{V}_n^T$, thus best preserving the approximation of the original tensor while reducing the dimensionality of operations across all remaining modes.
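For concreteness, the following MATLAB snippet sketches the sequential truncation just described, processing the modes in the order $1, \ldots, N$; it is a minimal illustration rather than the Tensor Toolbox implementation used later in our experiments, and it assumes each $K_n$ does not exceed the corresponding unfolding's dimensions.

```matlab
% A minimal ST-HOSVD sketch: K is a vector of per-mode factor counts.
function [G, U] = st_hosvd_sketch(X, K)
N = ndims(X);
U = cell(1, N);
for n = 1:N
    sz = size(X);
    Xn = reshape(permute(X, [n, 1:n-1, n+1:N]), sz(n), []);       % mode-n unfolding
    [Un, ~, ~] = svd(Xn, 'econ');
    U{n} = Un(:, 1:K(n));                                         % leading left singular vectors
    Gn   = U{n}' * Xn;                                            % top right PCs (scaled), replacing the mode
    X    = ipermute(reshape(Gn, [K(n), sz([1:n-1, n+1:N])]), [n, 1:n-1, n+1:N]);  % truncate mode n
end
G = X;                                                            % core tensor
end
```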
We now compare the computational complexities of HOSVD and ST-HOSVD. For simplicity, we assume that the order of the modes truncated with ST-HOSVD is $1, 2, \ldots, N$. HOSVD's computational complexity is $\mathcal{O}\big(\sum_{n=1}^{N} I_n \prod_{m=1}^{N} I_m\big)$, dominated by the SVDs of the unfoldings for large tensors. ST-HOSVD considerably reduces this complexity, since the SVD for each successive mode is computed on a tensor that has already been truncated to $K_m$ factors in each preceding mode, where $K_n$ is the number of factors in the $n$th mode. However, with ST-HOSVD the first few modes' SVDs are similar in complexity to those calculated with HOSVD. This leads ST-HOSVD to still be computationally expensive when dealing with large tensors, motivating more scalable decomposition methods.
In the next section, we overview subset-based methods for reducing complexity of matrix decompositions, from which we then overview their various generalizations to tensors.
C. MATRIX DECOMPOSITIONS BY COLUMN SUBSET SELECTION (CSS)
This subsection gives a general overview of column subset selection methods for matrices. For a more detailed discussion of the topic, we refer the reader to [96], [97], [98].
CSS methods approximate a matrix $\mathbf{X} \in \mathbb{R}^{I \times J}$ by selecting a subset of its columns, either randomly or deterministically, and then approximating $\mathbf{X}$ by projecting it onto the span of the subset. If we denote $\mathbf{C} \in \mathbb{R}^{I \times c}$ as the matrix formed by a subset of $c$ columns, corresponding to some index set $\mathcal{S}$, the approximation error for some choice of $\mathbf{C}$ is given by:

$$\big\|\mathbf{X} - \mathbf{C}\mathbf{C}^{\dagger}\mathbf{X}\big\|_F^2 = \big\|\mathbf{X} - \mathbf{P}_{\mathbf{C}}\mathbf{X}\big\|_F^2, \qquad (2)$$

where $\mathbf{C}^{\dagger}$ is the pseudoinverse of $\mathbf{C}$ (such that $\mathbf{C}\mathbf{C}^{\dagger}\mathbf{X}$ is the projection of $\mathbf{X}$ onto the span of $\mathbf{C}$), and $\mathbf{P}_{\mathbf{C}} = \mathbf{C}\mathbf{C}^{\dagger}$ is the projection matrix corresponding to the column space of $\mathbf{C}$. This is equivalently given by:

$$\big\|\mathbf{X} - \mathbf{C}\mathbf{M}\big\|_F^2, \qquad (3)$$

where $\mathbf{M} = \mathbf{C}^{\dagger}\mathbf{X}$ is a matrix mapping columns of $\mathbf{X}$ onto the span of $\mathbf{C}$, which in later sections we refer to as a ``mapping'' matrix.
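The following MATLAB snippet illustrates (2)-(3) for an arbitrary, hand-picked index set; it is only a sketch of the error computation, not a selection method.

```matlab
% CSS approximation error for a fixed column subset, per (2)-(3).
X   = randn(100, 500);
S   = [3 17 42 99 250];            % an arbitrary column index set
C   = X(:, S);                     % column subset
M   = pinv(C) * X;                 % "mapping" matrix, C^+ X
err = norm(X - C * M, 'fro')^2;    % approximation error of the projection
```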
1). RANDOMIZED CSS
Randomized CSS methods operate by assigning a weighted probability distribution to the columns and then sampling according to this distribution. Uniform sampling of the columns (giving equal sampling probability to each column) generally produces bad approximations of a matrix, especially if the columns are heterogeneous. Instead, sampling distributions are often based on probabilities weighted by the squared norm of the columns, i.e. ``norm sampling'' [77], [99], or by approximated statistical leverage scores [100]. In our paper, we focus on norm sampling, which is the most computationally efficient of the sampling-based methods, and we note that norm sampling is also conventional in tensor-based methods [79], [82], [101]. It has been proven in [99] that norm sampling provides the following error guarantee: given a matrix $\mathbf{X}$, values for the subset size $c$, error tolerance $\epsilon$, failure probability $\delta$, and a defined upper limit $k$ on the rank of the approximation, a norm sampled selection for $\mathbf{C}$ satisfies the following error probability:

$$\Pr\Big(\big\|\mathbf{X} - \mathbf{P}_{\mathbf{C}}\mathbf{X}\big\|_F^2 \le \big\|\mathbf{X} - \mathbf{X}_k\big\|_F^2 + \epsilon\,\|\mathbf{X}\|_F^2\Big) \ge 1 - \delta,$$

where $\mathbf{X}_k$ is the best rank-$k$ approximation to $\mathbf{X}$, and $\delta$ is the probability of failure.
Furthermore, it has been proven in [77] that given a norm sampled subset of columns $\mathbf{C}$, after rescaling the columns of $\mathbf{C}$ so that they all have the same norm,

$$\tilde{\mathbf{c}}_t = \frac{\|\mathbf{X}\|_F}{\sqrt{c}\,\big\|\mathbf{x}_{j_t}\big\|_2}\,\mathbf{x}_{j_t}, \quad t = 1, \ldots, c, \qquad (4)$$

the following error probability is satisfied:

$$\Pr\Big(\big\|\mathbf{X}\mathbf{X}^T - \tilde{\mathbf{C}}\tilde{\mathbf{C}}^T\big\|_F \le \epsilon\,\|\mathbf{X}\|_F^2\Big) \ge 1 - \delta,$$

where $\mathbf{x}_{j_t}$ denotes the $t$th sampled column of $\mathbf{X}$ and $\tilde{\mathbf{C}}$ is the matrix of rescaled columns $\tilde{\mathbf{c}}_t$.
If we denote the SVD of $\mathbf{X}$ by $\mathbf{X} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$, this particular result suggests that with a high enough sample size $c$, a norm sampled $\mathbf{C}$ can adequately approximate the left PCs $\mathbf{U}$ of $\mathbf{X}$ with high probability, by rescaling the sampled columns according to (4). We later refer to this result when introducing our coreset-based method.
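As an illustration, the following MATLAB snippet sketches norm sampling of columns with a rescaling that gives all sampled columns the same norm, consistent with the reconstruction of (4) above; sampling is done with replacement for simplicity, and the exact scaling convention is an assumption.

```matlab
% Norm (squared-norm) sampling of c columns, with rescaling as in (4).
X    = randn(100, 500);
c    = 20;                                            % number of sampled columns
p    = sum(X.^2, 1) / norm(X, 'fro')^2;               % norm-sampling probabilities
cdfp = cumsum(p);
j    = arrayfun(@(u) find(cdfp >= u, 1), rand(1, c)); % sampled column indices (with replacement)
C    = X(:, j) ./ sqrt(c * p(j));                     % rescaled columns, each of norm ||X||_F / sqrt(c)
```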
2). DETERMINISTIC CSS
Deterministic CSS methods are combinatorial methods for selecting a ``best'' representative subset of columns, where ``best'' is relative to the method used. The problem of finding a subset that exactly minimizes the approximation cost over all possible subsets is known to be UG-hard (where ``UG'' refers to the unique games conjecture) [102], so deterministic algorithms mainly focus on obtaining a reasonably good subset in a reasonable amount of time. These methods can also effectively serve as feature selection methods, and thus there is a large overlap between methods used for feature selection and those used for deterministic CSS. However, the design of CSS methods typically puts a greater emphasis on scalability, especially given the high-dimensional combinatorial problems posed by large matrices or tensors.
Perhaps the most popular method for deterministic CSS is the greedy algorithm, which consecutively searches for a new column to add to the subset such that the resulting new subset best approximates the full matrix. The greedy CSS algorithm was first studied in [103], and has been demonstrated to be both scalable to large numbers of columns and capable of providing high-quality representative subsets [97], [104], [105], [106], [107], [108].
As one may expect, deterministic CSS methods provide better approximations than randomized methods and enjoy tighter error guarantees. The tightest bounds for deterministic CSS depend on the singular values of $\mathbf{X}$; intuitively, matrices whose singular values decay more rapidly are simpler and require far fewer columns to approximate well. In [98], a bound of this form was proven for greedy CSS, depending on the best rank-$k$ approximation to $\mathbf{X}$, the number of steps taken by the greedy algorithm, and the smallest singular value of $\mathbf{X}$. Similar results have been proven in Theorem 3 of [109]. A shared result amongst these works is that by taking only slightly more than $k$ columns with greedy CSS, the approximation error is within a small multiplicative factor of that attained by the optimal choice of columns.
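The following MATLAB snippet sketches the greedy CSS strategy in its most naive form, recomputing the projection error for every candidate column; practical implementations instead use rank-one updates of the projection, but the selection logic is the same.

```matlab
% Naive greedy CSS: grow the subset one column at a time, always adding the
% column that most reduces the projection error (2).
X = randn(50, 200);
c = 5;                                     % target subset size
S = [];
for t = 1:c
    best = inf;  bestj = 0;
    for j = setdiff(1:size(X, 2), S)
        Ctry = X(:, [S, j]);
        e    = norm(X - Ctry * (pinv(Ctry) * X), 'fro')^2;   % error with candidate column added
        if e < best, best = e; bestj = j; end
    end
    S = [S, bestj];                        % greedily grow the subset
end
```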
III. SUBSET METHODS GENERALIZED TO TENSORS (METHODS TO APPROXIMATE THE HOSVD)
As tensor decompositions frequently invoke matrix operations on the tensor unfoldings, matrix approximation techniques have found great use in accelerating tensor decompositions [79], [80], [81], [82], [83], [101]. As our proposed method is most analogous to the HOSVD, we focus only on those subset-based methods that perform a Tucker decomposition in the form of an approximated HOSVD.
These methods generally estimate a form of Tucker decomposition that is not a HOSVD decomposition, but can be used to approximate one. In order to provide an approximate HOSVD decomposition, we may convert any method’s corresponding Tucker decomposition to a HOSVD decomposition via the procedure [82] outlined in Algorithm 3.
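To make the conversion concrete, the following MATLAB snippet sketches one standard way to obtain an (approximate) HOSVD from a Tucker factorization with core `G` and factor matrices `A{n}`, without ever forming the full tensor; it re-uses the `st_hosvd_sketch` function above and may differ in detail from Algorithm 3 of [82].

```matlab
% Convert a Tucker factorization (G, A{1..N}) to HOSVD form (C, U{1..N}).
function [C, U] = tucker_to_hosvd_sketch(G, A)
N = ndims(G);
Q = cell(1, N);  U = cell(1, N);
for n = 1:N
    [Q{n}, T] = qr(A{n}, 0);               % orthogonalize each factor: A{n} = Q{n} * T
    sz = size(G);
    Gn = T * reshape(permute(G, [n, 1:n-1, n+1:N]), sz(n), []);   % absorb T into the core
    G  = ipermute(reshape(Gn, [size(T, 1), sz([1:n-1, n+1:N])]), [n, 1:n-1, n+1:N]);
end
[C, Ug] = st_hosvd_sketch(G, size(G));     % full (untruncated) HOSVD of the small core
for n = 1:N
    U{n} = Q{n} * Ug{n};                   % orthonormal HOSVD factors of the approximation
end
end
```

Because the approximated tensor equals the modified core multiplied by the orthonormal matrices $\mathbf{Q}^{(n)}$, its HOSVD factors are exactly $\mathbf{Q}^{(n)}$ times the HOSVD factors of the small core, so the conversion only requires decompositions of core-sized matrices.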
There are various strategies for providing a Tucker decomposition of a tensor by exploiting the previously discussed matrix approximation techniques over the tensor unfoldings $\mathbf{X}_{(n)}$. These strategies can generally be separated into two distinct camps with differing decompositions:
column-based subsets: approximate $\mathbf{X}_{(n)}$ by a subset of its columns, e.g. randomized sampling Tucker CUR [80]
row-based subsets: approximate $\mathbf{X}_{(n)}$ by a subset of its rows, e.g. Chidori CUR [79], [82], Fiber CUR [81], [82], and randomized-block HOSVD (RB-HOSVD) [101]
We briefly overview and contrast these two strategies in the following subsections.
A. COLUMN-BASED SUBSET METHODS FOR TENSOR UNFOLDINGS
Column-based subset methods approximate a tensor unfolding $\mathbf{X}_{(n)}$ using a subset of its columns. These columns are referred to as "fibers" in the tensor literature, and represent a fixed index in all modes of the tensor except for the $n$th mode. As an example, the column $\mathbf{X}_{(1)}(:, j)$ is a fiber of the first mode, which represents $\mathcal{X}(:, i_2, i_3)$ for the indices $i_2, i_3$ of the other modes that correspond to the fiber index $j$.
For the mode-$n$ unfolding $\mathbf{X}_{(n)}$ of a tensor $\mathcal{X}$, if we denote $\mathcal{C}_n$ as an index set for some subset of columns, and denote $\mathbf{C}_n$ as the matrix formed by these columns, then (2) is restated as:

$$\big\|\mathbf{X}_{(n)} - \mathbf{P}_{\mathbf{C}_n}\mathbf{X}_{(n)}\big\|_F^2 \qquad (5)$$

$$= \big\|\mathbf{X}_{(n)} - \mathbf{C}_n\mathbf{M}_n\big\|_F^2, \qquad (6)$$

where $\mathbf{P}_{\mathbf{C}_n} = \mathbf{C}_n\mathbf{C}_n^{\dagger}$ is the projection matrix corresponding to the column space of $\mathbf{C}_n$, $\mathbf{C}_n^{\dagger}$ is the pseudoinverse of $\mathbf{C}_n$, and $\mathbf{M}_n = \mathbf{C}_n^{\dagger}\mathbf{X}_{(n)}$ is the matrix mapping columns of $\mathbf{X}_{(n)}$ onto $\mathbf{C}_n$.
By denoting $\mathcal{C} = \{\mathcal{C}_1, \ldots, \mathcal{C}_N\}$ as the set of all modes' column index sets, for $n = 1, \ldots, N$, we can represent the resulting decomposition's cost in a manner similar to (1):

$$\Big\|\mathcal{X} - \mathcal{G} \times_1 \mathbf{C}_1 \times_2 \mathbf{C}_2 \cdots \times_N \mathbf{C}_N\Big\|^2, \qquad (7)$$

where the core tensor is given by $\mathcal{G} = \mathcal{X} \times_1 \mathbf{C}_1^{\dagger} \times_2 \mathbf{C}_2^{\dagger} \cdots \times_N \mathbf{C}_N^{\dagger}$, and the factor matrices are given by the column subsets $\mathbf{C}_n$.
This decomposition in (7) was first introduced in [80], referred to as "ApproxTensorSVD" in that paper. Later publications such as [83] refer to the algorithm as randomized sampling Tucker CUR (RST-CUR). This decomposition is perhaps the most direct generalization of the matrix CUR to the tensor domain, as the decomposition takes the exact form of the matrix CUR when $N = 2$. We refer to this decomposition as RST-CUR for the remainder of the paper.
The same advantages gained by the matrix CUR carry over to RST-CUR for tensors, notably a low-complexity way to approximate a tensor's HOSVD. Additionally, as the factor matrices are formed from fibers of the full tensor $\mathcal{X}$, they retain properties held by the original tensor, such as sparsity or nonnegativity. Retaining these qualities in the factor matrices may aid the interpretability of the decomposition.
A key difference between column-based subset methods and row-based subset methods over $\mathbf{X}_{(n)}$ is how the differences in dimensions affect the subset selection process. As $\mathbf{X}_{(n)}$ is in general a very wide matrix with $J_n \gg I_n$, the massive number of columns makes deterministic column subset selection methods intractable, as their complexities are typically quadratic (or worse) in the number of columns. Furthermore, even randomized methods typically resort to a uniform distribution for sampling the columns, since, e.g., with norm sampling it may also be intractable to calculate the norms of all $J_n$ columns of $\mathbf{X}_{(n)}$. This is a significant comparative disadvantage of column-based methods such as RST-CUR, as uniform sampling of the columns may lead to significantly worse approximations for a given subset size. While column subset methods over $\mathbf{X}_{(n)}$ are expensive, row-based subset methods, on the other hand, are typically tractable due to the much smaller number of rows $I_n$, as we discuss in the next subsection.
B. ROW-BASED SUBSET METHODS FOR TENSOR UNFOLDINGS
Row-based subset methods approximate a tensor unfolding $\mathbf{X}_{(n)}$ using a subset of its rows. Aside from more advanced sampling methods being tractable over the rows than over the columns of $\mathbf{X}_{(n)}$, another advantage of row-based methods is the interpretability of their subsets. Because rows of $\mathbf{X}_{(n)}$ are simply the (vectorized) elements of the $n$th mode, they are easier to interpret than the fiber columns of $\mathbf{X}_{(n)}$.
For the mode-$n$ unfolding $\mathbf{X}_{(n)}$ of a tensor $\mathcal{X}$, if we now denote $\mathcal{S}_n$ as an index set for some subset of rows, and denote $\mathbf{R}_n$ as the matrix formed by these rows, then (2) is restated as:

$$\big\|\mathbf{X}_{(n)} - \mathbf{X}_{(n)}\mathbf{P}_{\mathbf{R}_n}\big\|_F^2 \qquad (8)$$

$$= \big\|\mathbf{X}_{(n)} - \mathbf{M}_n\mathbf{R}_n\big\|_F^2, \qquad (9)$$

where $\mathbf{P}_{\mathbf{R}_n} = \mathbf{R}_n^{\dagger}\mathbf{R}_n$ is the projection matrix corresponding to the row space of $\mathbf{R}_n$, and $\mathbf{M}_n$ is the $n$th mode's mapping matrix, which maps rows of $\mathbf{X}_{(n)}$ onto the span of $\mathbf{R}_n$, and is given by:

$$\mathbf{M}_n = \mathbf{X}_{(n)}\mathbf{R}_n^{\dagger}. \qquad (10)$$
By denoting $\mathcal{S} = \{\mathcal{S}_1, \ldots, \mathcal{S}_N\}$ as the set of all modes' row index sets, for $n = 1, \ldots, N$, we can represent the resulting decomposition's cost in a form similar to (1):

$$\Big\|\mathcal{X} - \mathcal{X}_{\mathcal{S}} \times_1 \mathbf{M}_1 \times_2 \mathbf{M}_2 \cdots \times_N \mathbf{M}_N\Big\|^2, \qquad (11)$$

where the core tensor $\mathcal{X}_{\mathcal{S}}$ is the subtensor of $\mathcal{X}$ over the index sets $\mathcal{S}$, and the factor matrices are the mapping matrices $\mathbf{M}_n$, for $n = 1, \ldots, N$.
The characteristic difference between the decomposition in (11) and the decomposition in (7) is how elements of the tensor manifest as elements of the decomposition, relative to a tensor generalization of (3). In (7), elements of $\mathcal{X}$ manifest as fibers in the factor matrices $\mathbf{C}_n$, and the core tensor can be considered a tensor generalization of the mapping matrix. In (11), the opposite occurs: elements of $\mathcal{X}$ manifest as the core tensor $\mathcal{X}_{\mathcal{S}}$, and the factor matrices are the modes' mapping matrices $\mathbf{M}_n$. Thus with (11), the core tensor is the element of the decomposition that retains properties of the original tensor, which may yield more useful decompositions depending on the application.
Various tensor decompositions take the form of the decomposition in (11). This decomposition was first introduced in [79], shortly before the introduction of the RST-CUR decomposition. Later works such as [82] have provided significant insight into the error guarantees of this decomposition, and have referred to it by the name "Chidori CUR" decomposition.
A key feature of the Chidori CUR is that the subset indices are chosen prior to the decomposition, and that the mapping matrices are calculated only over those fibers of $\mathcal{X}$ that correspond to the subset indices of all other modes. In other words, in the calculation of $\mathbf{M}_n$ in (10), the matrix $\mathbf{X}_{(n)}$ is replaced by the unfolding of the $n$th mode's "Chidori beam" (the subtensor that retains all of mode $n$ but only the selected indices of every other mode), and $\mathbf{R}_n$ is the unfolding of the core tensor $\mathcal{X}_{\mathcal{S}}$ (a subtensor of the Chidori beam). Because the $\mathbf{M}_n$ are calculated over only the Chidori beams, the decomposition only requires access to the Chidori beams and is thus independent of all other elements in the tensor. This reliance on only a small subset of the tensor results in one of the most computationally efficient tensor decompositions. At the same time, however, the independence of the decomposition from elements outside the Chidori beams may result in a worse factorization than other decompositions, particularly when the subsets forming the core tensor are not well-representative of the rest of $\mathcal{X}$, or if $\mathcal{X}$ is otherwise heavily heterogeneous in nature.
A similar decomposition was later introduced in [81], and can be considered a generalization of the Chidori CUR in which the unfolding fibers used in $\mathbf{X}_{(n)}$ and $\mathbf{R}_n$ are not restricted to those indexed by the other modes' subsets, but can be any random corresponding subset of fibers from $\mathbf{X}_{(n)}$ and $\mathbf{R}_n$ over the entire tensor. This decomposition was also later studied in [82] and has been called the "Fiber CUR" decomposition. As the Fiber CUR allows access to any random subset of fibers for calculating the mapping matrices $\mathbf{M}_n$, its decomposition may be more robust to poorly chosen subsets of the data. However, as the column fibers in Fiber CUR are typically uniformly sampled, this may also lead the Fiber CUR to exhibit considerably higher variation in the quality of the estimated $\mathbf{M}_n$, which often leads to worse decompositions than those provided by the Chidori CUR.
As described in Section III-A, the massive number of columns in $\mathbf{X}_{(n)}$ makes column selection methods intractable, so such methods typically rely only on uniform sampling to select the columns. However, for row-based methods such as Chidori CUR and Fiber CUR, the much smaller number of rows $I_n$ allows for more sophisticated sampling methods such as norm sampling. When norm sampling is applied, index sets $\mathcal{S}_n$ are selected according to the norms of the elements of the original tensor, e.g. the element $\mathcal{X}_{i_n}$, which when vectorized is of dimension $J_n$. These sampling schemes require passes over the tensor to construct the index sets, and thus can still be of considerable expense. Perhaps as a result of this, row-based subset methods for tensors have exclusively used random subset methods such as uniform and norm sampling to obtain subsets of the tensor, and deterministic subset methods have not been explored.
Building on the ideas presented in previous sections, in the next section we introduce a new way of performing a subset-based Tucker decomposition that provides a good balance between efficiency and approximation quality, by exploiting weighted subsets of the data called coresets.
IV. TENSOR CORESET DECOMPOSITION
In this section, we introduce a method for the Tucker decomposition that operates by selecting coresets: weighted subsets of the data. As we later explain, these weighted subsets can provide a better approximation to the tensor by more effectively approximating the HOSVD's principal component core tensor. We later motivate additional differences versus the previously discussed methods, such as a sequentially truncated coresets approach analogous to ST-HOSVD, and the ability to represent symmetry in the tensor over multiple modes. Furthermore, instead of exclusively selecting subsets randomly for greater efficiency, we also motivate the ability to select subsets deterministically, for better approximation quality and for feature selection.
A. SUBSET DISCREPANCY–A MEASURE OF “REPRESENTATIVENESS”
To motivate weighted subsets within a tensor, we first refer back to the per-mode approximation error provided for row-based subsets in (9).
Denoting the SVD of $\mathbf{X}_{(n)}$ by $\mathbf{X}_{(n)} = \mathbf{U}_n\mathbf{\Sigma}_n\mathbf{V}_n^T$, the approximation error in (9) depends on how well the subset $\mathbf{R}_n$ can approximate the row space of $\mathbf{X}_{(n)}$, specifically in terms of approximating its right principal components $\mathbf{\Sigma}_n\mathbf{V}_n^T$, which in the tensor domain are represented by the HOSVD core tensor. These PCs are analytically given by the eigenvectors and corresponding eigenvalues of the quadratic form $\mathbf{X}_{(n)}^T\mathbf{X}_{(n)}$, and thus can be approximated from $\mathbf{R}_n$ via the corresponding form $\mathbf{R}_n^T\mathbf{R}_n$. As a result, an implicit distance between the PCs of $\mathbf{X}_{(n)}$ and those of $\mathbf{R}_n$ is given by the distance between the quadratic forms:

$$D(\mathcal{S}_n) = \Big\|\mathbf{X}_{(n)}^T\mathbf{X}_{(n)} - \mathbf{R}_n^T\mathbf{R}_n\Big\|_F^2. \qquad (12)$$

This can be understood as a nonparametric measure of discrepancy [90], [91], [92], [93], [94], [95] between the full set $\mathbf{X}_{(n)}$ and the subset $\mathbf{R}_n$, analogous to the maximum mean discrepancy [92], [93], [94] for a particular realization of distributional "embeddings" of the elements in the set. Specifically, for some element of the $n$th mode, given by the row $\mathbf{x}_{i_n}^T$ of $\mathbf{X}_{(n)}$, its corresponding embedding in this discrepancy is given by $\mathbf{x}_{i_n}\mathbf{x}_{i_n}^T$, and the mode's "full mode embedding" is given by the sum $\mathbf{\Phi}_n = \sum_{i_n=1}^{I_n}\mathbf{x}_{i_n}\mathbf{x}_{i_n}^T = \mathbf{X}_{(n)}^T\mathbf{X}_{(n)}$, which we seek to best approximate via the subset's embedding $\sum_{i_n \in \mathcal{S}_n}\mathbf{x}_{i_n}\mathbf{x}_{i_n}^T = \mathbf{R}_n^T\mathbf{R}_n$.
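A small MATLAB sketch of the unweighted discrepancy (12) for one mode of an example tensor is shown below; the Gram matrices are formed explicitly only for illustration, since in practice the kernel identities introduced in the next subsection avoid forming them.

```matlab
% Unweighted subset discrepancy, eq. (12), for mode 1 of an example tensor.
X   = randn(30, 20, 25);                     % example 3rd-order tensor
Xn  = reshape(X, size(X, 1), []);            % mode-1 unfolding (mode 1 is already leading)
idx = randperm(size(Xn, 1), 8);              % an arbitrary subset of 8 mode-1 elements
R   = Xn(idx, :);                            % subset rows
D   = norm(Xn' * Xn - R' * R, 'fro')^2;      % distance between the quadratic forms
```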
B. CORESETS–WEIGHTED SUBSETS
A subset's discrepancy can be further decreased by weighting the subset, i.e. assigning individual weights to each element in the subset. Utilizing these weighted subsets, called coresets, the discrepancy measure is given by:

$$D(\mathcal{S}_n, \mathbf{w}_n) = \Big\|\mathbf{\Phi}_n - \sum_{i_n \in \mathcal{S}_n} w_{i_n}\,\mathbf{x}_{i_n}\mathbf{x}_{i_n}^T\Big\|_F^2, \qquad (13)$$

where $\mathbf{w}_n = \{w_{i_n} : i_n \in \mathcal{S}_n\}$ is the set of coreset weights corresponding to each element in $\mathcal{S}_n$, and $\mathbf{\Phi}_n = \mathbf{X}_{(n)}^T\mathbf{X}_{(n)}$ is the mode's full mode embedding (a fixed quantity).

An important point to acknowledge here is that in (13), the weights are applied to the element embeddings $\mathbf{x}_{i_n}\mathbf{x}_{i_n}^T$, and not to the elements $\mathbf{x}_{i_n}$ themselves. Instead, it follows that the elements receive the square roots of the weights:

$$\big(\sqrt{w_{i_n}}\,\mathbf{x}_{i_n}\big)\big(\sqrt{w_{i_n}}\,\mathbf{x}_{i_n}\big)^T = w_{i_n}\,\mathbf{x}_{i_n}\mathbf{x}_{i_n}^T.$$

This requires us to later specify nonnegative weights in order for $\sqrt{w_{i_n}}$ to be real.
We now discuss the procedure for selecting weights that minimize the discrepancy (13). For simplicity, for now we assume that we have a particular realization of the subset $\mathcal{S}_n$, selected either randomly or deterministically (as we explain in the next subsection). With $\mathcal{S}_n$ fixed and the discrepancy only a function of the weights $\mathbf{w}_n$, we can equivalently write (13) in a form where all matrices are instead given as vectors:

$$D(\mathbf{w}_n) = \big\|\boldsymbol{\phi}_n - \mathbf{\Psi}_n\mathbf{w}_n\big\|_2^2, \qquad (14)$$

where we define $\boldsymbol{\psi}_{i_n} = \mathrm{vec}\big(\mathbf{x}_{i_n}\mathbf{x}_{i_n}^T\big)$ as the vectorization of the $i_n$th element's embedding, $\mathbf{\Psi}_n = \big[\boldsymbol{\psi}_{i_n}\big]_{i_n \in \mathcal{S}_n}$ as the horizontal concatenation of the $\boldsymbol{\psi}_{i_n}$, and $\boldsymbol{\phi}_n = \mathrm{vec}(\mathbf{\Phi}_n)$ as the vectorization of the full mode embedding.
This is a least squares problem for which the ordinary least squares (OLS) solution is $\mathbf{w}_n = \mathbf{\Psi}_n^{\dagger}\boldsymbol{\phi}_n$. However, as noted previously, we also require that the weights be nonnegative in order for the square roots of the weights (applied to the elements themselves) to be real. Therefore, we use an NNLS algorithm [110] to solve for $\mathbf{w}_n$. This is an efficient algorithm that does not explicitly form the embeddings, instead requiring only the kernels between embeddings, which are significantly easier to calculate. We define the kernel between the embeddings of two elements $\mathbf{x}_{i_n}$ and $\mathbf{x}_{j_n}$ as:

$$k\big(\mathbf{x}_{i_n}, \mathbf{x}_{j_n}\big) = \big\langle \mathbf{x}_{i_n}\mathbf{x}_{i_n}^T,\, \mathbf{x}_{j_n}\mathbf{x}_{j_n}^T \big\rangle = \big(\mathbf{x}_{i_n}^T\mathbf{x}_{j_n}\big)^2. \qquad (15)$$

The two quantities required by the algorithm are the kernel matrix $\mathbf{K}_{\mathcal{S}_n} = \mathbf{\Psi}_n^T\mathbf{\Psi}_n$ and the kernel vector $\mathbf{k}_{\mathcal{S}_n} = \mathbf{\Psi}_n^T\boldsymbol{\phi}_n$. The matrix $\mathbf{K}_{\mathcal{S}_n}$ provides all pairwise kernels within the subset, and is equal to $\big(\mathbf{R}_n\mathbf{R}_n^T\big)^{\circ 2}$, where $\circ 2$ denotes the Hadamard power (here, elementwise squaring). The vector $\mathbf{k}_{\mathcal{S}_n}$ provides the kernel of each element in the subset with the full mode embedding, and is equal to $\big(\mathbf{R}_n\mathbf{X}_{(n)}^T\big)^{\circ 2}\mathbf{1}$, where $\mathbf{1}$ is the vector of ones. For further efficiency, we initialize the NNLS algorithm with the mapping of the OLS solution to its nearest nonnegative vector. In our experience, we often observe that the OLS solution is already nonnegative, and thus exactly minimizes (14) without requiring the NNLS algorithm.
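The following MATLAB sketch shows one way to fit the coreset weights from the kernel quantities alone, under the assumptions stated above; `nnls_weights` is a hypothetical helper name (not the authors' code), and the Cholesky reformulation simply rewrites (14), up to an additive constant, in a form accepted by MATLAB's `lsqnonneg`.

```matlab
% Kernel-based coreset weight fit: K holds pairwise kernels within the subset,
% k the kernels with the full mode embedding, so that (14) equals
% w'*K*w - 2*w'*k + const.
function w = nnls_weights(K, k)
w = K \ k;                                                    % ordinary least squares solution
if any(~isfinite(w)) || any(w < 0)                            % fall back to NNLS if OLS is infeasible
    K = K + 1e-10 * trace(K) / size(K, 1) * eye(size(K, 1));  % small ridge so chol succeeds
    L = chol(K, 'lower');                                     % K = L * L'
    d = L \ k;                                                % then ||L'*w - d||^2 matches (14) up to a constant
    w = lsqnonneg(L', d);                                     % nonnegative least squares
end
end
```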
Having provided the means to optimize the weights , in the next section we discuss ways of selecting the subsets.
C. SUBSET SELECTION–RANDOMIZED OR DETERMINISTIC
As our method is a row-based subset method over $\mathbf{X}_{(n)}$, we can consider more advanced means of selecting subsets than uniform sampling of rows. We use different strategies depending on whether we seek random subsets, prioritizing computational efficiency over approximation quality, or deterministic subsets, prioritizing approximation quality along with the utility of feature selection.
For random subsets, we use norm sampling as done with the previously mentioned methods. While we unfortunately do not provide an approximation bound for random subsets using the NNLS weights discussed previously, intuitively these weights should yield a discrepancy that is less than or equal to that provided by the normalized weights discussed in (4), as those weights do not explicitly optimize over the discrepancy whereas the NNLS weights do. Therefore, we expect an error at most that of (4)'s weights. As we show in the next section, it is inexpensive to calculate the NNLS weights since the kernel quantities are required anyway to calculate the mode's mapping matrix $\mathbf{M}_n$. Selection of the random subset, along with calculation of the weights, has a complexity that is linear in the mode's dimensionality $I_n$.
For deterministic subsets, we retain the use of greedy methods in the interest of balancing approximation quality with computational efficiency. As discussed in Section II-C, greedy methods significantly outperform the error bounds of randomized methods and lead to subsets that rapidly converge to the properties of the full set. We specifically utilize the weighted kernel herding (WKH) method [95], which allows us to simultaneously and efficiently solve for the subset indices $\mathcal{S}_n$ and weights $\mathbf{w}_n$. Like the NNLS algorithm, the WKH algorithm is made more efficient by requiring only kernels to operate. It uses $\big(\mathbf{X}_{(n)}\mathbf{X}_{(n)}^T\big)^{\circ 2}$, the matrix of pairwise kernels between all elements in the mode, and has a complexity that is quadratic in the mode's dimensionality $I_n$.
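As a rough illustration of a deterministic, herding-style selection, the sketch below greedily adds the element whose embedding best aligns with the residual of the full mode embedding and then refits the weights with the `nnls_weights` helper sketched above. It is a simplified stand-in, and the actual WKH algorithm of [95] may differ in its selection criterion and weight updates.

```matlab
% Greedy, herding-style coreset selection for one mode unfolding Xn (I_n x J_n).
function [idx, w] = wkh_sketch(Xn, s)
K    = (Xn * Xn').^2;                          % pairwise kernels between all mode elements
kful = sum(K, 2);                              % kernel of each element with the full embedding
idx  = zeros(0, 1);  w = zeros(0, 1);
for t = 1:s
    score      = kful - K(:, idx) * w;         % alignment with the residual embedding
    score(idx) = -inf;                         % do not reselect elements
    [~, j]     = max(score ./ sqrt(diag(K)));  % normalized greedy choice
    idx        = [idx; j];
    w          = nnls_weights(K(idx, idx), kful(idx));   % refit the coreset weights
end
end
```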
In the next section, we introduce our tensor decomposition method as a sequentially truncated variation of the row-based subset model in (11), where we sequentially replace the tensor with a coreset of itself.
D. TENSOR DECOMPOSITION VIA SEQUENTIALLY TRUNCATED CORESETS
We now motivate our method for performing a coreset-based Tucker decomposition. We first revisit points mentioned in Section III-B, specifically the advantages and disadvantages of the Chidori CUR decomposition. As we noted previously, the Chidori CUR decomposition is efficient because it only requires processing small subsets of the tensor, the "Chidori beams", in order to calculate the modes' mapping matrices $\mathbf{M}_n$. However, this may also lead to significantly worse approximation quality, in the event that the randomly chosen subsets are not well representative of the entire tensor $\mathcal{X}$, or when decomposing tensors that are highly heterogeneous in nature. When approximation quality is a priority for both randomized and deterministic methods, it may be more prudent to use a method that does pass over all elements of the tensor, but preferably only once if computational efficiency is also a priority. Such decompositions can provide significantly more representative subsets of the data while still maintaining excellent computational efficiency.
With this focus in mind, in order to provide a good balance between approximation error and efficiency, we instead consider a method inspired by ST-HOSVD that utilizes sequentially truncated coresets to perform the decomposition. Like ST-HOSVD, for each mode of the tensor we learn the mapping matrix $\mathbf{M}_n$ and then replace the tensor with a truncated tensor, thus significantly decreasing the complexity of calculating $\mathbf{M}_n$ for all remaining modes. However, whereas ST-HOSVD replaces the tensor with the PCs of the mode, we instead replace the tensor with the mode's coreset. These methods are closely connected by the fact that the coresets attempt to best preserve the PCs of the tensor, as evidenced by the discrepancy costs in (12) and (13).
We now discuss the details of our method's implementation to assist in understanding the pseudocode provided in Algorithm 4. The factorization of the $n$th mode is initialized by selecting a subset $\mathcal{S}_n$, which as mentioned in Section IV-C is in general norm sampled for random subsets or selected by WKH for deterministic subsets. We then compute the pairwise inner products between the subset and the full set, given by the matrix $\mathbf{R}_n\mathbf{X}_{(n)}^T$, within which the submatrix $\mathbf{R}_n\mathbf{R}_n^T$ provides the pairwise inner products within the subset. With both of these matrices, we can obtain the kernels of embeddings $\mathbf{K}_{\mathcal{S}_n}$ and $\mathbf{k}_{\mathcal{S}_n}$ to perform NNLS and learn the coreset weights $\mathbf{w}_n$. Arranging the weights in the diagonal matrix $\mathbf{W}_n = \mathrm{diag}(\mathbf{w}_n)$, the approximation in (9) given by $\mathbf{M}_n\mathbf{R}_n$ can be weighted via $\mathbf{W}_n^{1/2}$, in which case the mapping matrix (accounting for the weights) is given by $\mathbf{M}_n = \mathbf{X}_{(n)}\big(\mathbf{W}_n^{1/2}\mathbf{R}_n\big)^{\dagger}$, and the weighted coreset is given by $\mathbf{W}_n^{1/2}\mathbf{R}_n$. Thus, with the weights calculated, we compute the mapping matrix $\mathbf{M}_n$, truncate the $n$th mode by replacing it with the weighted coreset $\mathbf{W}_n^{1/2}\mathbf{R}_n$, and finally un-matricize the tensor so that the entire process can be repeated for the remaining modes. Fig. 1 visualizes the tensor coreset decomposition (TCD).
FIGURE 1.

Visualization of the tensor coreset decomposition (TCD) applied to a 3rd-order tensor $\mathcal{X}$. The core tensor is a weighted subtensor (coreset) of $\mathcal{X}$.
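The following MATLAB sketch summarizes one full TCD-R-style pass (random, norm-sampled coresets) consistent with the description above and with Fig. 1; it re-uses the hypothetical `nnls_weights` helper sketched in Section IV-B and is an illustration rather than the authors' implementation.

```matlab
% Sequentially truncated tensor coreset decomposition (random coresets).
% X : input tensor; s : vector of coreset sizes, one per mode.
function [G, M] = tcd_r_sketch(X, s)
N = ndims(X);
M = cell(1, N);                                               % per-mode mapping (factor) matrices
for n = 1:N
    sz   = size(X);
    Xn   = reshape(permute(X, [n, 1:n-1, n+1:N]), sz(n), []); % mode-n unfolding
    p    = sum(Xn.^2, 2);  cdfp = cumsum(p / sum(p));         % norm-sampling distribution
    idx  = arrayfun(@(u) find(cdfp >= u, 1), rand(s(n), 1));  % sampled indices (with replacement)
    R    = Xn(idx, :);                                        % subset of mode-n elements
    G1   = R * Xn';                                           % inner products, subset vs. full set
    Ks   = G1(:, idx).^2;                                     % pairwise kernels within the subset
    ks   = (G1.^2) * ones(sz(n), 1);                          % kernels with the full mode embedding
    w    = nnls_weights(Ks, ks);                              % coreset weights
    C    = sqrt(w) .* R;                                      % weighted coreset rows
    M{n} = Xn * pinv(C);                                      % mapping matrix for the weighted coreset
    X    = ipermute(reshape(C, [s(n), sz([1:n-1, n+1:N])]), [n, 1:n-1, n+1:N]);  % truncate mode n
end
G = X;                                                        % weighted coreset core tensor
end
```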
After truncating over all modes, the resulting coreset core tensor is a subtensor of $\mathcal{X}$ weighted on each mode by the weight matrix $\mathbf{W}_n^{1/2}$, and serves as a compressed form of $\mathcal{X}$ analogous to the principal component core tensor from HOSVD. The weights are a key differentiator from other methods like the Chidori CUR decomposition, and their use within this sequentially truncated method allows for an excellent approximation to the HOSVD core tensor, and by extension, the tensor $\mathcal{X}$.
The method described in Algorithm 4 assumes that $\mathcal{X}$ is an asymmetric tensor, and does not preserve symmetry in the decomposition if $\mathcal{X}$ is symmetric across several modes. To retain symmetry in the decomposition in the event that $\mathcal{X}$ is symmetric, we simply compute a single factor matrix for one of the symmetric modes, reuse it for all other modes symmetric to that mode, and truncate those other modes in the same way.
In the next subsection, we compare our so-called tensor coreset decomposition (TCD) to the Chidori CUR decomposition from a computational complexity standpoint.
E. COMPUTATIONAL COMPLEXITY OF TCD
In this section, we discuss the complexities of TCD with random (norm sampled) or deterministic (WKH) subsets, compared to Chidori CUR with random (norm sampled) subsets. We retain the notation used for sequentially truncated methods like ST-HOSVD and TCD. For simplicity, we assume that the modes are truncated in the order $1, 2, \ldots, N$.
We first discuss the complexity of the Chidori CUR decomposition with random (norm sampled) subsets. The majority of the complexity is in the calculation of the norms of the elements across the modes of the original tensor $\mathcal{X}$, each mode requiring a pass over the tensor of complexity $\mathcal{O}\big(\prod_{m=1}^{N} I_m\big)$. This is followed by the significantly cheaper calculations of the mapping matrices over each Chidori beam, where we denote the subset sizes by $s_n = |\mathcal{S}_n|$. The total complexity is thus dominated by the sampling term, $\mathcal{O}\big(N\prod_{m=1}^{N} I_m\big)$.
We then consider the complexity of TCD with random (norm sampled) subsets, which we refer to as TCD-R. The majority of the complexity of each mode's truncation comes from the norm sampling, which requires a single pass over the current (partially truncated) tensor, and from the calculation of the inner products between the subset and the full set, of complexity $\mathcal{O}\big(s_n I_n J_n'\big)$, where $J_n' = \prod_{m < n} s_m \prod_{m > n} I_m$ is the number of columns of the current mode-$n$ unfolding. These quantities are then used by the significantly cheaper calculations of the coreset weights and of the mapping matrices (re-using the inner products from the coreset weights). The total complexity is thus dominated by $\mathcal{O}\big(\sum_{n=1}^{N} s_n I_n J_n'\big)$.
Lastly, we consider the complexity of TCD with deterministic (WKH) subsets, which we refer to as TCD-D. The majority of the complexity of each mode's truncation comes from the calculation of $\big(\mathbf{X}_{(n)}\mathbf{X}_{(n)}^T\big)^{\circ 2}$, the pairwise inner products over the entire mode, of complexity $\mathcal{O}\big(I_n^2 J_n'\big)$. This is followed by the WKH subset selection, which yields the indices $\mathcal{S}_n$ and weights $\mathbf{w}_n$, along with the significantly cheaper calculations of the mapping matrices (where we can also re-use the kernels from the WKH). The total complexity is thus dominated by $\mathcal{O}\big(\sum_{n=1}^{N} I_n^2 J_n'\big)$.
Table 2 provides complexities of these methods. We note that for symmetric tensors, all methods are capable of exploiting symmetry by re-using factor matrices across symmetric modes, as described in Section IV-D. In this case, the summation is truncated to only the number of unique modes (i.e., symmetric modes are only counted once).
TABLE 2.
Computational complexities of TCD and similar methods described in Sections II and III. For truncated methods, we assume that the truncation order is $1, 2, \ldots, N$.
| Method | Computational complexity |
|---|---|
| ST-HOSVD | |
| Chidori CUR | |
| TCD-R | |
| TCD-D | |
In the next section, we experimentally test the TCD methods vs other efficient Tucker decomposition methods for approximating the HOSVD. We first demonstrate performance of methods on simulated data under various generative conditions. Later, we demonstrate these methods on real fMRI data in the form of functional connectivity maps (FNCs).
V. NUMERICAL EXPERIMENTS
We first introduce the performance measures used to compare the tensor decomposition methods. Denoting a method's approximated tensor by $\hat{\mathcal{X}}$, the relative approximation error of $\hat{\mathcal{X}}$ is given by:

$$\mathrm{err}\big(\hat{\mathcal{X}}\big) = \frac{\big\|\mathcal{X} - \hat{\mathcal{X}}\big\|}{\|\mathcal{X}\|}.$$
As the methods discussed in this paper are often used to approximate the HOSVD or ST-HOSVD, we also use a measure of distance between factors of the ST-HOSVD and factors of a method's estimated HOSVD. We introduce this new measure as the "HOSVD distance", and note that its formulation utilizes the inter-symbol-interference (ISI) [111] used to evaluate the performance of blind source separation methods. To define the HOSVD distance, we denote $\mathbf{U}^{(n)}$ as the factor matrices of the "true" ST-HOSVD, and denote $\hat{\mathbf{U}}^{(n)}$ as a method's corresponding estimated HOSVD factor matrices, obtained by converting the method's factorization into a HOSVD via Algorithm 3. Then the HOSVD distance between a method's estimated HOSVD factors and the true factors is given by:

$$d_{\mathrm{HOSVD}} = \frac{1}{N}\sum_{n=1}^{N} \mathrm{ISI}\Big(\big(\hat{\mathbf{U}}^{(n)}\big)^T \mathbf{U}^{(n)}\Big), \qquad (16)$$
where the ISI of a square matrix $\mathbf{G} = [g_{ij}]$ measures how close the matrix is to a permuted diagonal matrix (a performance measure invariant to the sign and permutation ambiguities of the factors), and is given by:

$$\mathrm{ISI}(\mathbf{G}) = \frac{1}{2K(K-1)}\left[\sum_{i=1}^{K}\left(\frac{\sum_{j=1}^{K} |g_{ij}|}{\max_j |g_{ij}|} - 1\right) + \sum_{j=1}^{K}\left(\frac{\sum_{i=1}^{K} |g_{ij}|}{\max_i |g_{ij}|} - 1\right)\right]. \qquad (17)$$
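A MATLAB sketch of the ISI computation, assuming the standard normalized (Amari-style) index reconstructed in (17), is given below.

```matlab
% Normalized ISI of a square K-by-K matrix G, per (17).
function v = isi(G)
K = size(G, 1);
A = abs(G);
r = sum(sum(A ./ max(A, [], 2), 2) - 1);       % row-wise term
c = sum(sum(A ./ max(A, [], 1), 1) - 1);       % column-wise term
v = (r + c) / (2 * K * (K - 1));               % 0 when G is a scaled permutation matrix
end
```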
Finally, we also measure the CPU-time of the methods. For all performance evaluations done in Sections V and VI, we use the computational resources provided by the UMBC High Performance Computing Facility (HPCF), thus CPU-time is reflective of HPCF’s capabilities.
A. EXPERIMENTS WITH SIMULATED DATA
Our generative model of a tensor is as follows. For a common dimensionality $I$ across the modes ($I_n = I$ for all $n$), we model a tensor $\mathcal{X}$ as the sum of a low-rank signal tensor $\mathcal{S}$ and a full-rank noise tensor $\mathcal{N}$:

$$\mathcal{X} = \mathcal{S} + \sigma\,\mathcal{N}, \qquad (18)$$

where the noise scale $\sigma$ is set according to $\lambda$, the signal-to-noise ratio (SNR) of $\mathcal{X}$ (i.e., $\lambda = \|\mathcal{S}\|^2 / \|\sigma\mathcal{N}\|^2$).
The signal tensor is given in the form $\mathcal{S} = \mathcal{G} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \cdots \times_N \mathbf{A}^{(N)}$, where $\mathcal{G}$ is an $R \times \cdots \times R$ core tensor for some "true core size" $R$, and $\mathbf{A}^{(n)} \in \mathbb{R}^{I \times R}$, for $n = 1, \ldots, N$, are the factor matrices. The core tensor $\mathcal{G}$, factor matrices $\mathbf{A}^{(n)}$, and noise tensor $\mathcal{N}$ are all randomly generated with entries sampled from the standard Gaussian distribution.
We consider two sets of simulated experiments: one where the generative model follows a CPD model, and one where it follows a Tucker model (respectively referred to in our experiments as "CPD model data" and "Tucker model data"). These experiments use the same conditions described above except for the generation of the core tensor $\mathcal{G}$: the Tucker experiments generate all entries of $\mathcal{G}$ from the standard Gaussian distribution, whereas the CPD experiments specify $\mathcal{G}$ as a superdiagonal core tensor wherein the entries $g_{rr\cdots r}$, for $r = 1, \ldots, R$, are drawn from the standard Gaussian distribution and all other entries of $\mathcal{G}$ equal 0.
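For reference, the following MATLAB snippet sketches the generative model described above for the Tucker case; the exact SNR scaling of (18) and the parameter values here are assumptions, not the authors' settings.

```matlab
% Simulated data generation (Tucker model); illustrative sizes only.
N = 3;  I = 100;  R = 5;  snr = 10;
G = randn(R * ones(1, N));                        % dense Gaussian core (Tucker model)
% For the CPD model, instead use a superdiagonal core:
% G = zeros(R * ones(1, N));  G(1:(R^N - 1)/(R - 1):end) = randn(R, 1);
S = G;
for n = 1:N                                       % S = G x_1 A1 x_2 A2 ... x_N AN
    A   = randn(I, R);
    szS = size(S);
    Sn  = A * reshape(permute(S, [n, 1:n-1, n+1:N]), szS(n), []);
    S   = ipermute(reshape(Sn, [I, szS([1:n-1, n+1:N])]), [n, 1:n-1, n+1:N]);
end
Noise = randn(size(S));
X = S + norm(S(:)) / (sqrt(snr) * norm(Noise(:))) * Noise;   % assumed SNR normalization
```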
As our paper focuses on subset-based methods for a Tucker decomposition, particularly those that approximate the HOSVD, we limit our results to variations of these methods. We thus include ST-HOSVD [89], Chidori CUR [79], [82], RST-CUR [80], a random-projection variant of HOSVD called RP-HOSVD [83], [112], and another row-based tensor decomposition like those discussed in Section III-B, called RB-HOSVD [101]. While we also discuss Fiber CUR [81] in Section III-B, we do not include it in our experiments as we observed poor performance compared to the other algorithms.
All of the methods and experiments are coded in MATLAB. Following the tested methods' respective papers, we use norm sampling to obtain random row subsets of the tensor unfoldings for Chidori CUR and RB-HOSVD, and we use uniform sampling to obtain random column subsets for RST-CUR. Our implementation of ST-HOSVD is from the Tensor Toolbox version 3.6 [113], and all other methods are coded via the details given in their respective papers.
To simplify the experiments, we perform all of these methods with a common "estimated core size" $K$ that is shared across the modes of the estimated tensor. Each decomposition is then converted to an estimated HOSVD decomposition that also uses the same $K$ for all modes of the tensor. Therefore, the true HOSVD factor matrices are given by $\mathbf{U}^{(n)} \in \mathbb{R}^{I \times K}$ and the estimated factor matrices by $\hat{\mathbf{U}}^{(n)} \in \mathbb{R}^{I \times K}$, for $n = 1, \ldots, N$, in which case the matrices in (17) are of size $K \times K$. Note here that the true HOSVD is performed for some choice of estimated core size $K$ that may differ from the true core size $R$ of $\mathcal{X}$.
Under the generative model defined in (18), we vary the following qualities of the model to test the methods' performances: the estimated core size $K$, the true core size $R$, the dimensionality of the modes (mode size) $I$, the SNR $\lambda$ of the simulated data tensor, and the number of modes $N$. All of our experiments use a common set of default values for these parameters unless a parameter is the one being varied. Given the memory requirements of extremely large tensors, in the experiment that varies the number of modes, we restrict $N$ to be either 3 or 4 modes and we use a smaller default mode size.
For all plots where we display CPU time performance, we note that these plots were essentially identical for the CPD and Tucker modeled data, thus performance was effectively independent of the generative model’s core tensor structure. Therefore, we only show the plot for the CPD model data. Additionally, we do not show figures for CPU time vs. the true core size or the SNR , as these experiments feature CPU times that are constant with respect to these variables.
Fig. 2 plots the methods' CPU time performance with respect to the mode size $I$. In this experiment ST-HOSVD is the slowest of the methods, followed by Chidori CUR, RP-HOSVD, RST-CUR, TCD-D, and TCD-R. With the default estimated core size $K$, we note that TCD-D can maintain fast times in the event that $K$ is small, which works well for tensors that have reasonably low n-ranks. We also note that Chidori CUR's slower performance is mainly due to the norm sampling over the entire tensor for each mode, in contrast to sampling over truncated tensors as done in other methods. Chidori CUR is significantly faster when uniform sampling is used in place of norm sampling, with an accompanying degree of loss in approximation performance.
FIGURE 2.

CPU time w.r.t. mode size $I$.
Fig. 3 plots the methods' CPU time performance with respect to the estimated core size $K$. TCD-D faces larger complexity with higher $K$, whereas all other methods' complexities increase only slightly with increasing $K$. This may motivate methods other than TCD-D when CPU time is a priority and larger $K$ are desired. However, TCD-D is still unique among these methods for deterministically selecting elements from the modes. Thus, compared to these otherwise predominantly randomized methods, TCD-D is perhaps unique in its utility for feature selection.
FIGURE 3.

CPU time w.r.t. estimated core size $K$.
Fig. 4 plots the methods' CPU time performance with respect to the number of modes $N$, using the smaller default mode size noted above. TCD-D and TCD-R are among the fastest methods in this experiment, and interestingly, TCD-D is the fastest despite being deterministic. We observe that this is due to how MATLAB's efficiency varies with respect to different mathematical operations: MATLAB is especially efficient in computing the Gram matrix $\mathbf{X}_{(n)}\mathbf{X}_{(n)}^T$, so much so that it can actually be more efficient to compute than even the fastest methods for calculating the norms of the rows of $\mathbf{X}_{(n)}$, which is required by the norm sampling approaches like TCD-R, Chidori CUR, and RB-HOSVD. Depending on the efficiency of the calculations, the programming environment used, and the dimensions of the tensor, these methods may benefit from using $\mathbf{X}_{(n)}\mathbf{X}_{(n)}^T$ to calculate the norms. At the same time, this also demonstrates the efficiency of the WKH procedure in TCD-D for smaller $K$, since it does not lead to significant increases in complexity above the other methods.
FIGURE 4.

CPU time w.r.t. number of modes $N$.
Fig. 5 plots the methods' relative error performance with respect to the estimated core size $K$. All methods' decompositions exponentially approach the true tensor in approximation quality, with diminishing returns in $K$. The performance of TCD-R in this experiment is comparable to Chidori CUR, with these methods only beaten by ST-HOSVD and TCD-D for lower $K$.
FIGURE 5.

Relative error w.r.t. the estimated core size $K$. Left: CPD model data. Right: Tucker model data.
Fig. 6 plots the methods' relative error performance with respect to the true core size $R$. Given a fixed estimated core size $K$, all decompositions perform worse as the true core size increases, since with $R > K$ the decompositions are effectively underparametrizing and/or undersampling their model of the tensor. As in Fig. 5, TCD-D has an estimation performance that is only slightly worse than ST-HOSVD. After these methods, TCD-R has the third best performance, exceeding that of Chidori CUR for larger $R$.
FIGURE 6.

Relative error w.r.t. the true core size $R$. Left: CPD model data. Right: Tucker model data.
Fig. 7 plots the methods' relative error performance with respect to the mode size $I$. These performances are mostly constant in $I$ with the CPD data (left), but for the Tucker data, some methods like Chidori CUR and TCD-R exhibit slightly worse performance with larger $I$, up to diminishing returns. Most of the randomized methods have much more comparable relative errors for the CPD model data, with significantly higher spread for the Tucker model data.
FIGURE 7.

Relative error w.r.t. the mode size $I$. Left: CPD model data. Right: Tucker model data.
Fig. 8 plots the methods' relative error performance with respect to the signal-to-noise ratio (SNR) $\lambda$. Subject to diminishing returns, all methods perform significantly better in approximating the tensor with higher SNR, and TCD-D and TCD-R appear to provide some of the better approximations at lower SNR values. At higher SNR values, TCD-D's performance is comparable to ST-HOSVD and TCD-R's performance is comparable to Chidori CUR.
FIGURE 8.

Relative error w.r.t. the SNR $\lambda$. Left: CPD model data. Right: Tucker model data.
Fig. 9 plots the methods' relative error performance with respect to the number of modes $N$. An apparent disadvantage of Chidori CUR and TCD-R occurs at the larger number of modes, where these methods' performances suffer considerably, whereas all other methods are not as affected by the change of $N$.
FIGURE 9.

Relative error w.r.t. the number of modes $N$. Left: CPD model data. Right: Tucker model data.
We now discuss the methods' performances in terms of the HOSVD distance measure defined in (16). We note that we compare each algorithm's estimated HOSVD factors to the "true" factors estimated by ST-HOSVD for the same choice of $K$; thus, we do not include ST-HOSVD in these plots since it has a HOSVD distance of 0 with itself.
Fig. 10 plots the methods' HOSVD distances with respect to the estimated core size $K$. All plots feature a clear U-shaped performance curve where the best performance generally occurs at a $K$ slightly higher than the true core size $R$. Interestingly, these U-shaped HOSVD distance vs. $K$ plots are notably different in shape from the monotonically decreasing error vs. $K$ plots in Fig. 5. While the relative error of the decompositions only decreases as the decomposition's model allows for more complexity (via increasing $K$), the HOSVD distance is more of a measure of parameter estimation, where the desired parameters are the true ST-HOSVD factors, which are best estimated when the estimated number of factors is close to the true number.
FIGURE 10.

HOSVD distance w.r.t. the estimated core size $K$. Left: CPD model data. Right: Tucker model data.
Fig. 11 plots the methods' HOSVD distances with respect to the true core size $R$. Whereas Fig. 10 shows a U-shaped curve with varying $K$, Fig. 11 shows that increasing $R$ strictly worsens the methods' performances for a fixed $K$. All methods perform poorly when $R$ is too large for the Tucker model data. However, with the CPD model data, TCD-D performs significantly better than all other methods, especially with large $R$.
FIGURE 11.

HOSVD distance w.r.t. the true core size. Left: CPD model data. Right: Tucker model data.
Fig. 12 plots the methods’ HOSVD distances with respect to the mode size. As in Fig. 7, performances are mostly constant in the mode size for the CPD data (left), but for the Tucker data, all methods except TCD-D show slightly worse performance with larger mode sizes, whereas TCD-D actually improves slightly with larger mode sizes, with diminishing effect. We suspect TCD-D improves with larger mode sizes because, with all other variables fixed, the tensor is generated in the same way but offers more elements from which to select subsets; TCD-D’s deterministic WKH then has more candidate subsets with which to minimize the discrepancy measure, and can thus better match the principal components of the tensor.
FIGURE 12.

HOSVD distance w.r.t. the mode size. Left: CPD model data. Right: Tucker model data.
Fig. 13 plots the methods’ HOSVD distances with respect to the SNR. As in Fig. 8, all methods perform significantly better at higher SNR, subject to diminishing returns, with TCD-R performing slightly better than Chidori CUR but typically worse than RST-CUR and RP-HOSVD. Whereas in Fig. 8 all methods’ relative errors nearly converge to 0 with increasing SNR, in Fig. 13 TCD-D’s HOSVD distance converges to 0 significantly faster with increasing SNR than the other methods’ HOSVD distances.
FIGURE 13.

HOSVD distance w.r.t. the SNR. Left: CPD model data. Right: Tucker model data.
Fig. 14 plots the methods’ HOSVD distances with respect to the number of modes, with the remaining data parameters fixed. As in Fig. 9, TCD-R and Chidori CUR perform worse for certain numbers of modes with the Tucker model data, whereas the other methods’ HOSVD distances are much less affected by the number of modes.
FIGURE 14.

HOSVD distance w.r.t. the number of modes. Left: CPD model data. Right: Tucker model data.
To summarize these experiments, we observe that TCD-R is among the most efficient of these methods, and TCD-D is also efficient for smaller problem sizes. In most experiments, TCD-R yields better approximation error and HOSVD distance than other methods of similar time complexity. Furthermore, TCD-D’s performance is typically significantly better than that of all other tested methods, and competes closely with ST-HOSVD despite using only a subset of the tensor’s elements.
In the next section, we apply these methods to real data in the form of fMRI functional network connectivity (FNC) matrices, where we visually demonstrate the methods’ performance and also demonstrate the use of TCD-D for feature selection.
B. EXPERIMENT WITH FMRI DATA
Our experiments use resting-state fMRI data from the bipolar-schizophrenia network on intermediate phenotypes (B-SNIP) [114], [115], where our data tensor was obtained from the acquisition and preprocessing steps described in [116] and [117]. The main goals of these experiments are to:
Demonstrate performance of the tensor decomposition methods on real fMRI data in terms of estimation quality and computational efficiency.
Demonstrate TCD-D’s ability (unique among these methods) to perform feature selection within modes, selecting well-representative elements of the data. In our case, these elements are functional networks (FNs), which are typically used to characterize neurological phenomena.
We now detail how the data tensor was formed. The fMRI dataset includes 176 healthy controls and 176 schizophrenia patients, for a total of 352 subjects. The data was first preprocessed and then analyzed via constrained independent vector analysis (cIVA) to extract meaningful latent factors describing the data. From each subject’s data, 53 spatial factors were extracted that correspond to biologically important functional networks (FNs). These factors are representative of seven functional domains: subcortical (SC, 5 FNs), auditory (AUD, 2 FNs), sensorimotor (MOT, 9 FNs), visual (VIS, 9 FNs), cognitive control (CC, 17 FNs), default mode (DMN, 7 FNs), and cerebellar (CB, 4 FNs). Corresponding to each of these 53 spatial factors are time course factors, representing the amplitudes of the networks at each point of measurement, and the correlations between these time courses are particularly useful for representing relationships between the networks. The pairwise Pearson correlations between the 53 networks’ time courses are represented in a symmetric 53 × 53 matrix called a functional network connectivity (FNC) matrix. Our experiment constructs these FNC matrices for each of the 352 subjects, forming a 53 × 53 × 352 FNC tensor.
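As a concrete illustration of this construction, the Python/NumPy sketch below assembles an FNC tensor from per-subject network time courses. The array names, shapes, and simulated input are assumptions for illustration only, not the actual B-SNIP preprocessing pipeline.

```python
import numpy as np

def build_fnc_tensor(time_courses):
    """Stack per-subject FNC matrices into a tensor.

    time_courses : array of shape (n_subjects, n_timepoints, n_networks)
        cIVA network time courses per subject (illustrative layout).
    Returns an (n_networks, n_networks, n_subjects) tensor whose frontal
    slices are symmetric Pearson-correlation (FNC) matrices.
    """
    n_subjects, _, n_networks = time_courses.shape
    fnc = np.empty((n_networks, n_networks, n_subjects))
    for s in range(n_subjects):
        # np.corrcoef expects variables in rows, so pass (networks, time)
        fnc[:, :, s] = np.corrcoef(time_courses[s].T)
    return fnc

# Simulated stand-in: 352 subjects, 150 time points, 53 networks
rng = np.random.default_rng(0)
tc = rng.standard_normal((352, 150, 53))
fnc_tensor = build_fnc_tensor(tc)          # shape (53, 53, 352)
```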
A key factor in dealing with the data tensor is understanding its effective n-ranks given how the tensor was obtained. Our FNC data comes from functional networks extracted by cIVA, which maximizes statistical independence between networks, so the networks are expected to be maximally statistically independent from one another. Therefore, we expect low correlation between the spatial components of different networks, which can also result in time courses with low correlation between disparate networks. The result is a tensor with effectively high n-ranks; thus decompositions of such FNC tensors require higher numbers of factors to adequately approximate the FNCs. This makes it challenging for the decomposition methods to approximate the tensor with relatively few factors, allowing us to better magnify and compare the methods’ estimation capabilities.
Due to the higher n-ranks of the FNC tensor, we test the algorithms on two forms of the data: the original FNC tensor, and its elementwise square. The elementwise squaring provides R-squared values representing the degree of association between the network time courses. Taking the elementwise square of the FNCs effectively increases the spread of the singular values of each mode unfolding, allowing for better approximation with lower-rank models while still maintaining an interpretable decomposition.
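One way to examine this effect is to compare the normalized singular values of a mode unfolding before and after elementwise squaring. The sketch below (using simulated data in place of the real FNC tensor) illustrates the computation; on the actual FNC data we would expect the squared tensor’s spectrum to concentrate more of its energy in the leading singular values.

```python
import numpy as np

def mode_unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def normalized_singular_values(tensor, mode=0):
    """Singular values of the chosen mode unfolding, normalized to sum to 1."""
    s = np.linalg.svd(mode_unfold(tensor, mode), compute_uv=False)
    return s / s.sum()

# Simulated stand-in for an FNC tensor (symmetric frontal slices, values in [-1, 1])
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(53, 53, 352))
X = 0.5 * (X + X.transpose(1, 0, 2))

sv_raw = normalized_singular_values(X, mode=0)       # original "correlations"
sv_sq = normalized_singular_values(X**2, mode=0)     # elementwise "R-squared" values
# A larger leading share of the spectrum means a lower multilinear rank
# captures more of that mode's energy.
print("top-20 energy share, raw:    ", sv_raw[:20].sum())
print("top-20 energy share, squared:", sv_sq[:20].sum())
```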
For our experiments, we performed a prior exploratory analysis over several candidate numbers of estimated factors, and ultimately used the same choice, with 20 factors along the network modes, for both forms of the tensor. The reasoning for this choice was as follows: to better exemplify the approximation quality differences between the methods, to reasonably approximate the FNCs without too many factors, and to provide a more parsimonious model from which TCD-D can select networks whose R-squared values are “well representative” of all R-squared values in the tensor. These 20 selected networks can then be interpreted as particularly informative for approximating the relationships between any of the 53 networks.
We use the same tensor decomposition methods as in Section V-A to decompose our FNC tensor. To also exploit the tensor’s symmetry, we modify each of these methods to use the symmetry-exploiting process described at the end of Section IV-D. Since the first and second modes are symmetric (both pertaining to the 53 networks), the same factor matrix is used for both of these modes, and the core tensor is thus also symmetric with respect to these modes.
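As a minimal sketch of how this shared-factor symmetry appears in the resulting Tucker model, the snippet below reconstructs a tensor from a core that is symmetric in its first two modes, a single network-mode factor used for modes 1 and 2, and a subject-mode factor. The ranks and random factors are placeholders, not the values used in our experiments.

```python
import numpy as np

def tucker_reconstruct_symmetric(core, U_net, U_subj):
    """Reconstruct X ≈ core x_1 U_net x_2 U_net x_3 U_subj, reusing the same
    network-mode factor for the two symmetric modes."""
    # Contract each core mode with its factor matrix (optimized contraction order)
    return np.einsum('abc,ia,jb,kc->ijk', core, U_net, U_net, U_subj, optimize=True)

# Placeholder ranks and random factors, for illustration only
r_net, r_subj = 20, 10
rng = np.random.default_rng(1)
core = rng.standard_normal((r_net, r_net, r_subj))
core = 0.5 * (core + core.transpose(1, 0, 2))                  # symmetric in modes 1 and 2
U_net = np.linalg.qr(rng.standard_normal((53, r_net)))[0]      # shared network-mode factor
U_subj = np.linalg.qr(rng.standard_normal((352, r_subj)))[0]   # subject-mode factor
X_hat = tucker_reconstruct_symmetric(core, U_net, U_subj)      # (53, 53, 352), symmetric slices
```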
As in the previous section, our experiments measure performance via CPU time, relative error, and HOSVD distance. Additionally, we implement a measure of how consistent the methods’ approximated HOSVD decompositions are across different runs of the decompositions, which for the randomized methods corresponds to different random subsets per run. In defining this measure, we denote by $\{\mathbf{U}_n^{(k)}\}_{n=1}^{N}$ the approximated HOSVD factors from the $k$-th of $K$ runs of a decomposition method over the data. Then our measure of “cross-distance”, the average distance between any two runs of a decomposition, is given by:
$$d_{\mathrm{cross}} = \frac{2}{K(K-1)} \sum_{k=1}^{K-1} \sum_{l=k+1}^{K} d_{\mathrm{HOSVD}}\!\left(\{\mathbf{U}_n^{(k)}\}_{n=1}^{N}, \{\mathbf{U}_n^{(l)}\}_{n=1}^{N}\right), \qquad (19)$$

where $d_{\mathrm{HOSVD}}(\cdot,\cdot)$ denotes the HOSVD distance of (17).
This “cross-distance” can be considered a generalization of the “cross-ISI” measure used to quantify distances between runs of blind source separation (BSS) methods [118].
Along with using the cross-distance to measure the variability of the randomized methods, we also use it to identify the single run that is most representative of all other runs (the run with the minimum average distance to the others), and we plot the FNCs approximated by this run to visually compare the methods’ average approximation quality. The plotted average FNCs were obtained by reconstructing the most representative run’s approximate tensor from its factorization, and then averaging the approximated subject FNCs across the 352 subjects.
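A minimal sketch of the cross-distance of (19) and of selecting the most representative run is given below, assuming a callable hosvd_distance that implements the pairwise measure of (17) (its definition is not reproduced here).

```python
import itertools
import numpy as np

def cross_distance(runs, hosvd_distance):
    """Average pairwise distance between all runs, as in (19).

    runs : list of per-run HOSVD factor sets (one entry per run)
    hosvd_distance : callable returning the distance of (17) between two runs
    """
    pairs = itertools.combinations(range(len(runs)), 2)
    return float(np.mean([hosvd_distance(runs[i], runs[j]) for i, j in pairs]))

def most_representative_run(runs, hosvd_distance):
    """Index of the run with minimum average distance to all other runs."""
    n = len(runs)
    totals = [np.mean([hosvd_distance(runs[i], runs[j]) for j in range(n) if j != i])
              for i in range(n)]
    return int(np.argmin(totals))
```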
Fig. 15 and Fig. 16 exhibit the average FNCs extracted from a typical run of each method, on the FNC tensor and the squared FNC tensor, respectively. In both forms of the data, the FNCs typically feature two well-defined blocks on the diagonal. These correspond to the motor (upper block) and visual (lower block) groups of networks, which feature high correlation and R-squared values within each group. Because of the larger degree of association within these groups, their larger values in the tensor make them especially important for approximating it. Viewing the averages of the FNCs in Fig. 15, we observe that all methods are able to reasonably approximate at least one of these blocks, with ST-HOSVD, TCD-D, TCD-R, Chidori CUR, and RP-HOSVD recovering the two well-defined blocks, and with TCD-D and TCD-R performing closest to ST-HOSVD. Viewing the averages of the squared FNCs in Fig. 16, all methods except RB-HOSVD demonstrate two clearly defined blocks, again with TCD-D and TCD-R performing closest to ST-HOSVD.
FIGURE 15.

Plots of the average FNCs obtained from the approximated original FNC tensor, for each method’s most typical run (the run with the minimum cross-distance to all other runs). All methods used the same ranks.
FIGURE 16.

Plots of the average FNCs obtained from the approximated elementwise squared FNC tensor, for each method’s most typical run (the run with the minimum cross-distance to all other runs). All methods used the same ranks.
Tables 3 and 4 present each method’s performance measures on the FNC tensor. All methods exhibit relatively high relative errors, as the decomposition ranks are rather conservative for the more heterogeneous nature of the FNC tensor. While in practice one would select ranks that provide an approximation nearly identical to the original tensor, our choice of lower ranks is useful for better magnifying the methods’ approximation capabilities, which is clearly demonstrated by the much wider range of their error values. ST-HOSVD provides the best relative error, and TCD-D features a comparable error while simultaneously identifying representative networks. Among the more efficient methods, RST-CUR is the fastest but has the second worst error and the worst cross-distance, whereas TCD-R is the second fastest with the third lowest error, HOSVD distance, and cross-distance. This demonstrates that TCD-D and TCD-R provide good performance given their time complexities, and can provide reasonably good approximations to the tensor with fewer factors.
TABLE 3.
Performances of methods on the original FNC tensor, averaged over 1000 independent runs over the data. Best performances per measure are bolded.
| Method | CPU-time (sec) | relative error | HOSVD distance | cross-distance |
|---|---|---|---|---|
| ST-HOSVD | 0.026 | 0.500 | 0 | 0 |
| TCD-D | 0.025 | 0.650 | 0.128 | 0 |
| TCD-R | 0.005 | 0.686 | 0.130 | 0.145 |
| Chidori CUR | 0.007 | 0.687 | 0.133 | 0.150 |
| RST-CUR | 0.002 | 0.691 | 0.132 | 0.169 |
| RP-HOSVD | 0.006 | 0.689 | 0.207 | 0.169 |
| RB-HOSVD | 0.015 | 0.940 | 0.148 | 0.163 |
TABLE 4.
Performances of methods on the elementwise squared FNC tensor, averaged over 1000 independent runs over the data. Best performances per measure are bolded.
| Method | CPU-time (sec) | relative error | HOSVD distance | cross-distance |
|---|---|---|---|---|
| ST-HOSVD | 0.026 | 0.384 | 0 | 0 |
| TCD-D | 0.025 | 0.464 | 0.150 | 0 |
| TCD-R | 0.005 | 0.514 | 0.155 | 0.155 |
| Chidori CUR | 0.007 | 0.520 | 0.160 | 0.157 |
| RST-CUR | 0.002 | 0.549 | 0.156 | 0.185 |
| RP-HOSVD | 0.006 | 0.528 | 0.218 | 0.175 |
| RB-HOSVD | 0.015 | 0.890 | 0.176 | 0.183 |
Additionally, a key distinction between TCD-D and the other methods is that TCD-D deterministically selects elements that are well representative of the tensor. Thus, TCD-D is unique among these methods in its capability to perform feature selection on tensor data. With this fMRI dataset, TCD-D deterministically selects a reasonably “best” subset of the factor networks. We now consider the interpretation of the networks selected by TCD-D. We observed that several networks appeared both among the 20 selected from the original FNC data and among the 20 selected from the elementwise squared data, highlighting the importance of these networks: in total, 14 networks were shared between the two forms of the tensor, corresponding to the indices 5, 8, 9, 12, 15, 17, 23, 24, 27, 28, 33, 45, 49, and 51. Table 5 overviews these 14 networks identified over both forms of the data tensor, including each network’s factor index among the 53 networks, the region of the brain it corresponds to, and the functional domain it is associated with.
TABLE 5.
Descriptions of 14 factors selected by TCD-D, shared between the 20 selected from the original FNC tensor and the 20 from the elementwise-squared FNC tensor.
| Index | Region | Network |
|---|---|---|
| 5 | Thalamus | subcortical (SC) |
| 8 | Postcentral gyrus | sensorimotor (SM) |
| 9 | Left postcentral gyrus | sensorimotor (SM) |
| 12 | Superior parietal lobule | sensorimotor (SM) |
| 15 | Superior parietal lobule | sensorimotor (SM) |
| 17 | Calcarine gyrus | visual (VIS) |
| 23 | Inferior occipital gyrus | visual (VIS) |
| 24 | Lingual gyrus | visual (VIS) |
| 27 | Inferior parietal lobule | sensorimotor (SM) |
| 28 | Superior frontal gyrus | cognitive control (CC) |
| 33 | Inferior parietal lobule | sensorimotor (SM) |
| 45 | Anterior cingulate cortex | default-mode network (DMN) |
| 49 | Posterior cingulate cortex | default-mode network (DMN) |
| 51 | Cerebellar | cerebellar (CB) |
These identified networks, including regions such as the thalamus, superior temporal gyrus, superior frontal gyrus, and posterior cingulate cortex, are significant as they represent crucial functional “blocks of networks” within the brain. Each of these networks is associated with specific functional domains, such as sensorimotor (e.g., left postcentral gyrus, superior parietal lobule), visual (e.g., inferior occipital gyrus), cognitive control (e.g., inferior parietal lobule), and the default mode network (e.g., posterior cingulate cortex). Clinically, these functional networks have been reported as significant brain regions highly associated with various psychiatric disorders. For instance, the superior frontal gyrus and posterior cingulate cortex have been identified in previous research as valuable biomarkers for different psychiatric conditions [116], [119], [120], [121], [122]. Furthermore, the fact that 14 of the 20 networks were identified over both forms of the tensor (the original FNCs and the elementwise squared FNCs) demonstrates the robustness of the proposed TCD-D method, showing consistent identification of meaningful functional areas associated with several psychiatric disorders. For example, reduced connectivity between the posterior cingulate and frontal areas in patients with first-episode schizophrenia has been reported in [123], and the failure of appropriate posterior cingulate cortex deactivation has been reported as a potential biomarker in traumatic brain injury and in mental disorders such as ADHD, autism, and schizophrenia [124].
VI. CONCLUSION
This paper presents efficient Tucker decomposition methods that use a small subtensor as a multilinear basis over the full data tensor, which we refer to as tensor coreset decompositions (TCD). The methods operate by sequentially truncating the tensor, replacing it with a coreset of elements from one or more of the tensor’s modes, with the coreset calculated to minimize a discrepancy between itself and the HOSVD core tensor, i.e., the principal components of the tensor’s unfoldings. This process subsequently estimates mapping matrices that serve as the decomposition’s factor matrices, which can also be used to efficiently approximate the tensor’s HOSVD.
To quantify the “representativeness” of a coreset over the data tensor, we introduced a discrepancy-based measure that has straightforward connections to the HOSVD cost function. We used this measure to develop a new, efficient nonnegative least squares (NNLS) procedure for selecting the coreset weights, minimizing the discrepancy for a given choice of subset.
For decompositions that place greater emphasis on efficiency, we proposed “TCD-R”, which randomly selects the subsets using norm sampling. For decompositions that place greater emphasis on approximation quality and on selecting well-representative subsets for feature selection, we proposed “TCD-D”, which uses a deterministic subset selection scheme based on weighted kernel herding (WKH). Compared with previous methods, TCD-D is notable for its unique ability to perform unsupervised feature selection within the modes of the tensor data.
Finally, we experimentally demonstrated that our methods generally provide a good balance between efficiency, approximation error, and the quality of the factors when converted to an HOSVD. Furthermore, we demonstrated on real fMRI FNC data that TCD-D is able to identify meaningful subsets of functional networks that well-approximate the relationships between all networks in the FNC tensor.
ACKNOWLEDGMENT
The authors would like to thank Dr. Evrim Acar for her expertise and helpful feedback on the article. (Ben Gabrielson and Hanlu Yang are co-first authors.)
This work was supported in part by NSF under Grant 2316420; in part by NIH under Grant R01MH118695, Grant R01MH123610, and Grant R01AG073949; and in part by UMBC. The computational hardware used is part of the University of Maryland, Baltimore County (UMBC) High Performance Computing Facility (HPCF), funded by the U.S. NSF through the MRI and SCREMS Programs under Grant CNS-0821258, Grant CNS-1228778, Grant OAC-1726023, and Grant DMS-0821311.
Biographies

BEN GABRIELSON received the B.A. degree in physics from the Franklin and Marshall College, Lancaster, PA, USA, in 2015, and the M.S. degree in electrical engineering from the University of Maryland, Baltimore County, Baltimore, MD, USA, in 2020, where he is currently pursuing the Ph.D. degree in electrical engineering, under the supervision of Dr. Tülay Adali.
His research interests include matrix and tensor factorizations, with a particular emphasis on efficient implementations of blind source separation and joint blind source separation.

HANLU YANG (Graduate Student Member, IEEE) received the B.A. degree in electrical engineering from Jilin University, China, and the M.S. degree in electrical engineering from Temple University, Philadelphia, PA, USA. She is currently pursuing the Ph.D. degree in electrical engineering with the University of Maryland, Baltimore County, Baltimore, MD, USA, under the supervision of Dr. Tülay Adali.
Her research interests include matrix and tensor factorizations, machine/deep learning, and statistical/graph signal processing, with applications in community detection, precision medicine, and large-scale neuroimaging data analysis.

TRUNG VU received the B.S. degree in computer science from Hanoi University of Science and Technology, Hanoi, Vietnam, in 2014, and the Ph.D. degree in computer science from the School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA, in 2022.
He is currently a Postdoctoral Research Associate with the Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Baltimore, MD, USA. His research interests include optimization methods, independent component analysis, and matrix factorization with applications in machine learning and signal processing. He received the Best Student Paper Award at IEEE MLSP 2019.

VINCE CALHOUN (Fellow, IEEE) is currently the Founding Director of the Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), where he holds appointments with Georgia State, Georgia Tech, and Emory. He is the author of more than 1100 full journal articles. His work includes the development of flexible methods to analyze neuroimaging data, including blind source separation, deep learning, multimodal fusion and genomics, and neuroinformatics tools.
Dr. Calhoun is a fellow of the Institute of Electrical and Electronics Engineers, the American Association for the Advancement of Science, the American Institute for Medical and Biological Engineering, the American College of Neuropsychopharmacology, the Organization for Human Brain Mapping (OHBM), and the International Society for Magnetic Resonance in Medicine. He serves on the IEEE BISP Technical Committee and is also a member of the IEEE Data Science Initiative Steering Committee and the IEEE Brain Technical Committee.

TÜLAY ADALI (Fellow, IEEE) received the Ph.D. degree in electrical engineering from North Carolina State University, Raleigh, NC, USA, in 1992.
She joined the faculty of UMBC, in 1992. She is currently a Distinguished University Professor with the University of Maryland (UMBC), Baltimore County, Baltimore, MD, USA.
Prof. Adali is a fellow of AIMBE and AAIA, a Fulbright Scholar, and an IEEE SPS Distinguished Lecturer. She was a recipient of the SPS Meritorious Service Award, the Humboldt Research Award, the IEEE SPS Best Paper Award, the SPIE Unsupervised Learning and ICA Pioneer Award, the Presidential Research Professorship at UMBC, the University System of Maryland Regents’ Award for Research, and the NSF CAREER Award. She has been active in conference organizations and served or will serve as the Technical Chair, in 2017 and 2025, the Special Sessions Chair, in 2018 and 2024, the Publicity Chair, in 2000 and 2005, for the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), and the General/Technical Chair for the IEEE Machine Learning for Signal Processing (MLSP) and Neural Networks for Signal Processing Workshops, from 2001 to 2009, in 2014, and in 2023. She was the Chair of the NNSP/MLSP Technical Committee, from 2003 to 2005 and from 2011 to 2013, and served or is serving on numerous boards and technical committees of SPS. She served as the Chair of the IEEE Brain Technical Community, in 2023, and the Signal Processing Society (SPS) Vice President for Technical Directions, from 2019 to 2022. She is the Editor-in-Chief of Signal Processing Magazine.
REFERENCES
- [1].Kolda TG and Bader BW, “Tensor decompositions and applications,” SIAM Rev, vol. 51, no. 3, pp. 455–500, Aug. 2009. [Google Scholar]
- [2].Cichocki A, Lee N, Oseledets I, Phan A-H, Zhao Q, and Mandic DP, “Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions,” Found. Trends Mach. Learn, vol. 9, nos. 4–5, pp. 249–429, 2016. [Google Scholar]
- [3].Cichocki A, Lee N, Oseledets I, Phan A-H, Zhao Q, Sugiyama M, and Mandic DP, “Tensor networks for dimensionality reduction and large-scale optimization: Part 2 applications and future perspectives,” Found. Trends Mach. Learn, vol. 9, no. 6, pp. 431–673, 2017. [Google Scholar]
- [4].Zare A, Ozdemir A, Iwen MA, and Aviyente S, “Extension of PCA to higher order data structures: An introduction to tensors, tensor decompositions, and tensor PCA,” Proc. IEEE, vol. 106, no. 8, pp. 1341–1358, Aug. 2018. [Google Scholar]
- [5].Taherisadr M, Joneidi M, and Rahnavard N, “EEG signal dimensionality reduction and classification using tensor decomposition and deep convolutional neural networks,” in Proc. IEEE 29th Int. Workshop Mach. Learn. Signal Process. (MLSP), Oct. 2019, pp. 1–6. [Google Scholar]
- [6].Phan AH and Cichocki A, “Tensor decompositions for feature extraction and classification of high dimensional datasets,” Nonlinear Theory Appl., IEICE, vol. 1, no. 1, pp. 37–68, 2010. [Google Scholar]
- [7].Cong F, Lin Q-H, Kuang L-D, Gong X-F, Astikainen P, and Ristaniemi T, “Tensor decomposition of EEG signals: A brief review,” J. Neurosci. Methods, vol. 248, pp. 59–69, Jun. 2015. [DOI] [PubMed] [Google Scholar]
- [8].Taguchi Y-H, “Identification of candidate drugs using tensor-decomposition-based unsupervised feature extraction in integrated analysis of gene expression between diseases and DrugMatrix datasets,” Sci. Rep, vol. 7, no. 1, p. 13733, Oct. 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Gao Y, Zhang G, Zhang C, Wang J, Yang LT, and Zhao Y, “Federated tensor decomposition-based feature extraction approach for industrial IoT,” IEEE Trans. Ind. Informat, vol. 17, no. 12, pp. 8541–8549, Dec. 2021. [Google Scholar]
- [10].Fan H, Li C, Guo Y, Kuang G, and Ma J, “Spatial–spectral total variation regularized low-rank tensor decomposition for hyperspectral image denoising,” IEEE Trans. Geosci. Remote Sens, vol. 56, no. 10, pp. 6196–6213, Oct. 2018. [Google Scholar]
- [11].Lin T and Bourennane S, “Survey of hyperspectral image denoising methods based on tensor decompositions,” EURASIP J. Adv. Signal Process, vol. 2013, no. 1, pp. 1–11, Dec. 2013. [Google Scholar]
- [12].Gong X, Chen W, Chen J, and Ai B, “Tensor denoising using low-rank tensor train decomposition,” IEEE Signal Process. Lett, vol. 27, pp. 1685–1689, 2020. [Google Scholar]
- [13].Zhang H, Liu L, He W, and Zhang L, “Hyperspectral image denoising with total variation regularization and nonlocal low-rank tensor decomposition,” IEEE Trans. Geosci. Remote Sens, vol. 58, no. 5, pp. 3071–3084, May 2020. [Google Scholar]
- [14].Xue J, Zhao Y, Liao W, and Chan JC, “Nonlocal low-rank regularized tensor decomposition for hyperspectral image denoising,” IEEE Trans. Geosci. Remote Sens, vol. 57, no. 7, pp. 5174–5189, Jul. 2019. [Google Scholar]
- [15].Zhang Z, Ely G, Aeron S, Hao N, and Kilmer M, “Novel methods for multilinear data completion and de-noising based on tensor-SVD,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit, Jun. 2014, pp. 3842–3849. [Google Scholar]
- [16].Acar E, Dunlavy DM, Kolda TG, and Mørup M, “Scalable tensor factorizations for incomplete data,” Chemometric Intell. Lab. Syst, vol. 106, no. 1, pp. 41–56, Mar. 2011. [Google Scholar]
- [17].Ely G, Aeron S, Hao N, and Kilmer ME, “5D seismic data completion and denoising using a novel class of tensor decompositions,” Geophysics, vol. 80, no. 4, pp. 83–95, Jul. 2015. [Google Scholar]
- [18].Tan H, Feng G, Feng J, Wang W, Zhang Y-J, and Li F, “A tensor-based method for missing traffic data completion,” Transp. Res. C, Emerg. Technol, vol. 28, pp. 15–27, Mar. 2013. [Google Scholar]
- [19].Song Q, Ge H, Caverlee J, and Hu X, “Tensor completion algorithms in big data analytics,” ACM Trans. Knowl. Discovery Data, vol. 13, no. 1, pp. 1–48, Feb. 2019. [Google Scholar]
- [20].Chen H, Lin M, Liu J, Yang H, Zhang C, and Xu Z, “NT-DPTC: A non-negative temporal dimension preserved tensor completion model for missing traffic data imputation,” Inf. Sci, vol. 653, Jan. 2024, Art. no. 119797. [Google Scholar]
- [21].Barak B, Kelner JA, and Steurer D, “Dictionary learning and tensor decomposition via the sum-of-squares method,” in Proc. 47th Annu. ACM Symp. Theory Comput, Jun. 2015, pp. 143–151. [Google Scholar]
- [22].Zhang Y, Mou X, Wang G, and Yu H, “Tensor-based dictionary learning for spectral CT reconstruction,” IEEE Trans. Med. Imag, vol. 36, no. 1, pp. 142–154, Jan. 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [23].Zubair S and Wang W, “Tensor dictionary learning with sparse TUCKER decomposition,” in Proc. 18th Int. Conf. Digit. Signal Process. (DSP), Jul. 2013, pp. 1–6. [Google Scholar]
- [24].Gemechu D, “Sparse regularization based on orthogonal tensor dictionary learning for inverse problems,” Math. Problems Eng, vol. 2024, pp. 1–24, Feb. 2024. [Google Scholar]
- [25].Song Y, Gong Z, Chen Y, and Li C, “Tensor-based sparse Bayesian learning with intra-dimension correlation,” IEEE Trans. Signal Process, vol. 71, pp. 31–46, 2023. [Google Scholar]
- [26].Sidiropoulos ND, De Lathauwer L, Fu X, Huang K, Papalexakis EE, and Faloutsos C, “Tensor decomposition for signal processing and machine learning,” IEEE Trans. Signal Process, vol. 65, no. 13, pp. 3551–3582, Jul. 2017. [Google Scholar]
- [27].Cichocki A, Mandic D, De Lathauwer L, Zhou G, Zhao Q, Caiafa C, and Phan HA, “Tensor decompositions for signal processing applications: From two-way to multiway component analysis,” IEEE Signal Process. Mag, vol. 32, no. 2, pp. 145–163, Mar. 2015. [Google Scholar]
- [28].Lim L-H and Comon P, “Multiarray signal processing: Tensor decomposition meets compressed sensing,” Comp. Rendus. Mécanique, vol. 338, no. 6, pp. 311–320, Jun. 2010. [Google Scholar]
- [29].Lathauwer LD and Moor BD, “From matrix to tensor: Multilinear algebra and signal processing,” in Proc. Inst. Math. Appl. Conf. Ser, vol. 67, 1998, pp. 1–16. [Google Scholar]
- [30].Chen H, Ahmad F, Vorobyov S, and Porikli F, “Tensor decompositions in wireless communications and MIMO radar,” IEEE J. Sel. Topics Signal Process, vol. 15, no. 3, pp. 438–453, Apr. 2021. [Google Scholar]
- [31].Sørensen M and De Lathauwer L, “Coupled tensor decompositions for applications in array signal processing,” in Proc. 5th IEEE Int. Workshop Comput. Adv. Multi-Sensor Adapt. Process. (CAMSAP), Dec. 2013, pp. 228–231. [Google Scholar]
- [32].Pena-Pena K, Lau DL, and Arce GR, “T-HGSP: Hypergraph signal processing using t-product tensor decompositions,” IEEE Trans. Signal Inf. Process. Over Netw, vol. 9, pp. 329–345, 2023. [Google Scholar]
- [33].Smilde AK, Bro R, and Geladi P, Multi-Way Analysis: Applications in the Chemical Sciences. Hoboken, NJ, USA: Wiley, 2005. [Google Scholar]
- [34].Skantze V, Wallman M, Sandberg A-S, Landberg R, Jirstrand M, and Brunius C, “Identification of metabotypes in complex biological data using tensor decomposition,” Chemometric Intell. Lab. Syst, vol. 233, Feb. 2023, Art. no. 104733. [Google Scholar]
- [35].Bi Y, Lu Y, Long Z, Zhu C, and Liu Y, “Tensor decompositions: Computations, applications, and challenges,” Tensors Data Process, pp. 1–30, Jan. 2022. [Google Scholar]
- [36].Minoccheri C, Soroushmehr R, Gryak J, and Najarian K, “Tensor methods for clinical informatics,” in Artificial Intelligence in Healthcare and Medicine. Boca Raton, FL, USA: CRC Press, 2022, pp. 261–281. [Google Scholar]
- [37].Wang D, Zheng Y, and Li G, “High-dimensional low-rank tensor autoregressive time series modeling,” J. Econometrics, vol. 238, no. 1, Jan. 2024, Art. no. 105544. [Google Scholar]
- [38].Billio M, Casarin R, Iacopini M, and Kaufmann S, “Bayesian dynamic tensor regression,” 2017, arXiv:1709.09606. [Google Scholar]
- [39].Mahyari AG, Zoltowski DM, Bernat EM, and Aviyente S, “A tensor decomposition-based approach for detecting dynamic network states from EEG,” IEEE Trans. Biomed. Eng, vol. 64, no. 1, pp. 225–237, Jan. 2017. [DOI] [PubMed] [Google Scholar]
- [40].Wei S, Tang Y, Gao T, Wang Y, Wang F, and Chen D, “Scale-variant structural feature construction of EEG stream via component-increased dynamic tensor decomposition,” Knowl.-Based Syst, vol. 294, Jun. 2024, Art. no. 111747. [Google Scholar]
- [41].Sen B and Parhi KK, “Extraction of common task signals and spatial maps from group fMRI using a PARAFAC-based tensor decomposition technique,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2017, pp. 1113–1117. [Google Scholar]
- [42].Chatzichristos C, Kofidis E, Morante M, and Theodoridis S, “Blind fMRI source unmixing via higher-order tensor decompositions,” J. Neurosci. Methods, vol. 315, pp. 17–47, Mar. 2019. [DOI] [PubMed] [Google Scholar]
- [43].Acar E, Levin-Schwartz Y, Calhoun VD, and Adali T, “Tensor-based fusion of EEG and FMRI to understand neurological changes in schizophrenia,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2017, pp. 1–4. [Google Scholar]
- [44].Chatzichristos C, Kofidis E, Van Paesschen W, De Lathauwer L, Theodoridis S, and Van Huffel S, “Early soft and flexible fusion of electroencephalography and functional magnetic resonance imaging via double coupled matrix tensor factorization for multisubject group analysis,” Hum. Brain Mapping, vol. 43, no. 4, pp. 1231–1255, Mar. 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [45].Acar E, Aykut-Bingol C, Bingol H, Bro R, and Yener B, “Multiway analysis of epilepsy tensors,” Bioinformatics, vol. 23, no. 13, pp. 10–18, Jul. 2007. [DOI] [PubMed] [Google Scholar]
- [46].Acar E, Roald M, Hossain KM, Calhoun VD, and Adali T, “Tracing evolving networks using tensor factorizations vs. ICA-based approaches,” Frontiers Neurosci, vol. 16, Apr. 2022, Art. no. 861402. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [47].Kofidis E and Regalia PA, “Tensor approximation and signal processing applications,” Contemp. Math, pp. 103–133, Jan. 2001. [Google Scholar]
- [48].Nesaragi N, Patidar S, and Thangaraj V, “A correlation matrix-based tensor decomposition method for early prediction of sepsis from clinical data,” Biocybernetics Biomed. Eng, vol. 41, no. 3, pp. 1013–1024, Jul. 2021. [Google Scholar]
- [49].Kofidis E, “Adaptive joint channel estimation/data detection in flexible multicarrier MIMO systems—A tensor-based approach,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2024, pp. 8721–8725. [Google Scholar]
- [50].Xu F, Morency MW, and Vorobyov SA, “DOA estimation for transmit beamspace MIMO radar via tensor decomposition with Vandermonde factor matrix,” IEEE Trans. Signal Process, vol. 70, pp. 2901–2917, 2022. [Google Scholar]
- [51].Ji Y, Wang Q, Li X, and Liu J, “A survey on tensor techniques and applications in machine learning,” IEEE Access, vol. 7, pp. 162950–162990, 2019. [Google Scholar]
- [52].Rabanser S, Shchur O, and Günnemann S, “Introduction to tensor decompositions and their applications in machine learning,” 2017, arXiv:1711.10781. [Google Scholar]
- [53].Kaliyar RK, Goswami A, and Narang P, “DeepFakE: Improving fake news detection using tensor decomposition-based deep neural network,” J. Supercomput, vol. 77, no. 2, pp. 1015–1037, Feb. 2021. [Google Scholar]
- [54].Novikov A, Izmailov P, Khrulkov V, Figurnov M, and Oseledets IV, “Tensor train decomposition on TensorFlow (T3F),” J. Mach. Learn. Res, vol. 21, no. 30, pp. 1–7, 2020. [Google Scholar]
- [55].Tjandra A, Sakti S, and Nakamura S, “Tensor decomposition for compressing recurrent neural network,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2018, pp. 1–8. [Google Scholar]
- [56].Maruhashi K, Todoriki M, Ohwa T, Goto K, Hasegawa Y, Inakoshi H, and Anai H, “Learning multi-way relations via tensor decomposition with neural networks,” in Proc. AAAI Conf. Artif. Intell, vol. 32, 2018, pp. 1–8. [Google Scholar]
- [57].Hitchcock FL, “The expression of a tensor or a polyadic as a sum of products,” J. Math. Phys, vol. 6, nos. 1–4, pp. 164–189, Apr. 1927. [Google Scholar]
- [58].Hitchcock FL, “Multiple invariants and generalized rank of a P-way matrix or tensor,” J. Math. Phys, vol. 7, nos. 1–4, pp. 39–79, Apr. 1928. [Google Scholar]
- [59].Domanov I and De Lathauwer L, “On the uniqueness of the canonical polyadic decomposition of third-order tensors—Part I: Basic results and uniqueness of one factor matrix,” SIAM J. Matrix Anal. Appl, vol. 34, no. 3, pp. 855–875, Jan. 2013. [Google Scholar]
- [60].Domanov I and De Lathauwer L, “On the uniqueness of the canonical polyadic decomposition of third-order tensors—Part II: Uniqueness of the overall decomposition,” SIAM J. Matrix Anal. Appl, vol. 34, no. 3, pp. 876–903, Jan. 2013. [Google Scholar]
- [61].Domanov I and De Lathauwer L, “Canonical polyadic decomposition of third-order tensors: Relaxed uniqueness conditions and algebraic algorithm,” Linear Algebra Appl, vol. 513, pp. 342–375, Jan. 2017. [Google Scholar]
- [62].Tucker LR, “Implications of factor analysis of three-way matrices for measurement of change,” Problems Measuring Change, vol. 15, nos. 122–137, p. 3, 1963. [Google Scholar]
- [63].Tucker LR, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, Sep. 1966. [DOI] [PubMed] [Google Scholar]
- [64].Lathauwer LD, Moor BD, and Vandewalle J, “A multilinear singular value decomposition,” SIAM J. Matrix Anal. Appl, vol. 21, no. 4, pp. 1253–1278, Jan. 2000. [Google Scholar]
- [65].Oseledets IV, “Tensor-train decomposition,” SIAM J. Sci. Comput, vol. 33, no. 5, pp. 2295–2317, Jan. 2011. [Google Scholar]
- [66].Bigoni D, Engsig-Karup AP, and Marzouk YM, “Spectral tensor-train decomposition,” SIAM J. Sci. Comput, vol. 38, no. 4, pp. 2405–2439, Jan. 2016. [Google Scholar]
- [67].Grasedyck L, “Hierarchical singular value decomposition of tensors,” SIAM J. Matrix Anal. Appl, vol. 31, no. 4, pp. 2029–2054, Jan. 2010. [Google Scholar]
- [68].Cohen N, Sharir O, Levine Y, Tamari R, Yakira D, and Shashua A, “Analysis and design of convolutional networks via hierarchical tensor decompositions,” 2017, arXiv:1705.02302. [Google Scholar]
- [69].Lathauwer LD, “Decompositions of a higher-order tensor in block terms—Part II: Definitions and uniqueness,” SIAM J. Matrix Anal. Appl, vol. 30, no. 3, pp. 1033–1066, 2008. [Google Scholar]
- [70].Ye J, Wang L, Li G, Chen D, Zhe S, Chu X, and Xu Z, “Learning compact recurrent neural networks with block-term tensor decomposition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit, Jun. 2018, pp. 9378–9387. [Google Scholar]
- [71].Acar E, Kolda TG, and Dunlavy DM, “All-at-once optimization for coupled matrix and tensor factorizations,” 2011, arXiv:1105.3422. [Google Scholar]
- [72].Ermiş B, Acar E, and Cemgil AT, “Link prediction in heterogeneous data via generalized coupled tensor factorization,” Data Mining Knowl. Discovery, vol. 29, no. 1, pp. 203–236, Jan. 2015. [Google Scholar]
- [73].Zhou S, Vinh NX, Bailey J, Jia Y, and Davidson I, “Accelerating online CP decompositions for higher order tensors,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2016, pp. 1375–1384. [Google Scholar]
- [74].Du Y, Zheng Y, Lee K-C, and Zhe S, “Probabilistic streaming tensor decomposition,” in Proc. IEEE Int. Conf. Data Mining (ICDM), Nov. 2018, pp. 99–108. [Google Scholar]
- [75].Rontogiannis AA, Kofidis E, and Giampouras PV, “Online rank-revealing block-term tensor decomposition,” in Proc. 55th Asilomar Conf. Signals, Syst., Comput, Oct. 2021, pp. 1678–1682. [Google Scholar]
- [76].Huang F, Niranjan UN, Hakeem MU, and Anandkumar A, “Online tensor methods for learning latent variable models,” J. Mach. Learn. Res, vol. 16, no. 1, pp. 2797–2835, Jan. 2015. [Google Scholar]
- [77].Drineas P, Kannan R, and Mahoney MW, “Fast Monte Carlo algorithms for matrices I: Approximating matrix multiplication,” SIAM J. Comput, vol. 36, no. 1, pp. 132–157, Jan. 2006. [Google Scholar]
- [78].Mahoney MW and Drineas P, “CUR matrix decompositions for improved data analysis,” Proc. Nat. Acad. Sci, vol. 106, no. 3, pp. 697–702, Jan. 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [79].Mahoney MW, Maggioni M, and Drineas P, “Tensor-CUR decompositions for tensor-based data,” in Proc. 12th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2006, pp. 327–336. [Google Scholar]
- [80].Drineas P and Mahoney MW, “A randomized algorithm for a tensor-based generalization of the singular value decomposition,” Linear Algebra Appl, vol. 420, nos. 2–3, pp. 553–571, Jan. 2007. [Google Scholar]
- [81].Caiafa CF and Cichocki A, “Generalizing the column–row matrix decomposition to multi-way arrays,” Linear Algebra Appl, vol. 433, no. 3, pp. 557–573, Sep. 2010. [Google Scholar]
- [82].Cai H, Hamm K, Huang L, and Needell D, “Mode-wise tensor decompositions: Multi-dimensional generalizations of CUR decompositions,” J. Mach. Learn. Res, vol. 22, no. 185, pp. 1–36, 2021. [Google Scholar]
- [83].Ahmadi-Asl S, Abukhovich S, Asante-Mensah MG, Cichocki A, Phan AH, Tanaka T, and Oseledets I, “Randomized algorithms for computation of tucker decomposition and higher order SVD (HOSVD),” IEEE Access, vol. 9, pp. 28684–28706, 2021. [Google Scholar]
- [84].Ahmadi-Asl S, Caiafa CF, Cichocki A, Phan AH, Tanaka T, Oseledets I, and Wang J, “Cross tensor approximation methods for compression and dimensionality reduction,” IEEE Access, vol. 9, pp. 150809–150838, 2021. [Google Scholar]
- [85].Cao B, He L, Kong X, Yu PS, Hao Z, and Ragin AB, “Tensor-based multi-view feature selection with applications to brain diseases,” in Proc. IEEE Int. Conf. Data Mining, Dec. 2014, pp. 40–49. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [86].Smalter A, Huan J, and Lushington G, “Feature selection in the tensor product feature space,” in Proc. 9th IEEE Int. Conf. Data Mining, Dec. 2009, pp. 1004–1009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [87].Yu J, Kong Z, Zhan L, Shen L, and He L, “Tensor-based multi-modality feature selection and regression for Alzheimer’s disease diagnosis,” Comput. Sci. Inf. Technol, vol. 12, no. 18, p. 123, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [88].Zhang Y, Wang X, Cai Z, Zhou Y, and Yu PS, “Tensor-based unsupervised multi-view feature selection for image recognition,” in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2021, pp. 1–6. [Google Scholar]
- [89].Vannieuwenhoven N, Vandebril R, and Meerbergen K, “A new truncation strategy for the higher-order singular value decomposition,” SIAM J. Sci. Comput, vol. 34, no. 2, pp. 1027–1052, Jan. 2012. [Google Scholar]
- [90].Williamson SA and Henderson J, “Understanding collections of related datasets using dependent MMD coresets,” Information, vol. 12, no. 10, p. 392, Sep. 2021. [Google Scholar]
- [91].Dwivedi R and Mackey L, “Generalized kernel thinning,” 2021, arXiv:2110.01593. [Google Scholar]
- [92].Karnin Z and Liberty E, “Discrepancy, coresets, and sketches in machine learning,” in Proc. Conf. Learn. Theory, Jun. 2019, pp. 1975–1993. [Google Scholar]
- [93].Gretton A, Borgwardt K, Rasch MJ, Schölkopf B, and Smola AJ, “A kernel two-sample test,” J. Mach. Learn. Res, vol. 13, no. 1, pp. 723–773, Mar. 2012. [Google Scholar]
- [94].Tolstikhin I, Sriperumbudur BK, and Schölkopf B, “Minimax estimation of maximum mean discrepancy with radial kernels,” in Proc. Adv. Neural Inf. Process. Syst, vol. 29, Dec. 2016, pp. 1938–1946. [Google Scholar]
- [95].Huszár F and Duvenaud D, “Optimally-weighted herding is Bayesian quadrature,” 2012, arXiv:1204.1664. [Google Scholar]
- [96].Boutsidis C, Mahoney MW, and Drineas P, “An improved approximation algorithm for the column subset selection problem,” in Proc. 20th Annu. ACM-SIAM Symp. Discrete Algorithms, Jan. 2009, pp. 968–977. [Google Scholar]
- [97].Farahat AK, Elgohary A, Ghodsi A, and Kamel MS, “Greedy column subset selection for large-scale data sets,” Knowl. Inf. Syst, vol. 45, no. 1, pp. 1–34, Oct. 2015. [Google Scholar]
- [98].Altschuler JM, Bhaskara A, Fu G, Mirrokni V, Rostamizadeh A, and Zadimoghaddam M, “Greedy column subset selection: New bounds and distributed algorithms,” in Proc. Int. Conf. Mach. Learn, 2016, pp. 2539–2548. [Google Scholar]
- [99].Frieze A, Kannan R, and Vempala S, “Fast Monte-Carlo algorithms for finding low-rank approximations,” J. ACM, vol. 51, no. 6, pp. 1025–1041, Nov. 2004. [Google Scholar]
- [100].Rudi A, Calandriello D, Carratino L, and Rosasco L, “On fast leverage score sampling and optimal learning,” in Proc. Adv. Neural Inf. Process. Syst, vol. 31, 2018, pp. 1–11. [Google Scholar]
- [101].Jiang X, Wang X, Yang J, Chen S, and Qin X, “Faster TKD: Towards lightweight decomposition for large-scale tensors with randomized block sampling,” IEEE Trans. Knowl. Data Eng, vol. 35, no. 8, pp. 7966–7979, Aug. 2023. [Google Scholar]
- [102].Çivril A, “Column subset selection problem is UG-hard,” J. Comput. Syst. Sci, vol. 80, no. 4, pp. 849–859, Jun. 2014. [Google Scholar]
- [103].Çivril A and Magdon-Ismail M, “Column subset selection via sparse approximation of SVD,” Theor. Comput. Sci, vol. 421, pp. 1–14, Mar. 2012. [Google Scholar]
- [104].Farahat AK, Ghodsi A, and Kamel MS, “An efficient greedy method for unsupervised feature selection,” in Proc. IEEE 11th Int. Conf. Data Mining, Dec. 2011, pp. 161–170. [Google Scholar]
- [105].Farahat AK, Ghodsi A, and Kamel MS, “A fast greedy algorithm for generalized column subset selection,” 2013, arXiv:1312.6820. [Google Scholar]
- [106].Mirzasoleiman B, Badanidiyuru A, Karbasi A, Vondrák J, and Krause, “Lazier than lazy greedy,” in Proc. AAAI Conf. Artif. Intell, vol. 29, Jan. 2015, pp. 1812–1818. [Google Scholar]
- [107].Pi Y, Peng H, Zhou S, and Zhang Z, “A scalable approach to column-based low-rank matrix approximation,” in Proc. 23rd Int. Joint Conf. Artif. Intell, Aug. 2013, pp. 1600–1606. [Google Scholar]
- [108].Ordozgoiti B, Mozo A, and De Lacalle JGL, “Regularized greedy column subset selection,” Inf. Sci, vol. 486, pp. 393–418, Jun. 2019. [Google Scholar]
- [109].Dereziński M, Khanna R, and Mahoney MW, “Improved guarantees and a multiple-descent curve for column subset selection and the Nystrom method,” in Proc. Adv. Neural Inf. Process. Syst, vol. 33, 2020, pp. 4953–4964. [Google Scholar]
- [110].Bro R and De Jong S, “A fast non-negativity-constrained least squares algorithm,” J. Chemometrics, vol. 11, no. 5, pp. 393–401, Sep. 1997. [Google Scholar]
- [111].Amari SA, “Advances in neural information processing systems,” in Advances in neural information processing systems, vol. 8, pp. 757–763. Cambridge, MA, USA: MIT Press, 1996. [Google Scholar]
- [112].Zhou G, Cichocki A, and Xie S, “Decomposition of big tensors with low multilinear rank,” 2014, arXiv:1412.1885. [Google Scholar]
- [113].Kolda TG and Bader BW, “MATLAB tensor toolbox,” Sandia Nat. Lab. (SNL), Albuquerque, NM, and Livermore, CA, USA, Tech. Rep, 2006. [Google Scholar]
- [114].Tamminga CA, Ivleva EI, Keshavan MS, Pearlson GD, Clementz BA, Witte B, Morris DW, Bishop J, Thaker GK, and Sweeney JA, “Clinical phenotypes of psychosis in the bipolar-schizophrenia network on intermediate phenotypes (B-SNIP),” Amer. J. Psychiatry, vol. 170, no. 11, pp. 1263–1274, Nov. 2013. [DOI] [PubMed] [Google Scholar]
- [115].Tamminga CA, Pearlson G, Keshavan M, Sweeney J, Clementz B, and Thaker G, “Bipolar and schizophrenia network for intermediate phenotypes: Outcomes across the psychosis continuum,” Schizophrenia Bull, vol. 40, pp. 131–137, Mar. 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [116].Yang H, Vu T, Long Q, Calhoun V, and Adali T, “Identification of homogeneous subgroups from resting-state fMRI data,” Sensors, vol. 23, no. 6, p. 3264, Mar. 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [117].Vu T, Laport F, Yang H, Calhoun VD, and Adali T, “Constrained independent vector analysis with reference for multi-subject fMRI analysis,” 2023, arXiv:2311.05049. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [118].Long Q, Jia C, Boukouvalas Z, Gabrielson B, Emge D, and Adali T, “Consistent run selection for independent component analysis: Application to fmri analysis,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2018, pp. 2581–2585. [Google Scholar]
- [119].Matsumoto H, Simmons A, Williams S, Hadjulis M, Pipe R, Murray R, and Frangou S, “Superior temporal gyrus abnormalities in early-onset schizophrenia: Similarities and differences with adult-onset schizophrenia,” Amer. J. Psychiatry, vol. 158, no. 8, pp. 1299–1304, Aug. 2001. [DOI] [PubMed] [Google Scholar]
- [120].Li T, Wang Q, Zhang J, Rolls ET, Yang W, Palaniyappan L, Zhang L, Cheng W, Yao Y, and Liu Z, “Brain-wide analysis of functional connectivity in first-episode and chronic stages of schizophrenia,” Schizophrenia Bull, vol. 43, no. 2, pp. 436–448, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [121].Yang H, Akhonda MABS, Ghayem F, Long Q, Calhoun VD, and Adali T, “Independent vector analysis based subgroup identification from multisubject fMRI data,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2022, pp. 1471–1475. [Google Scholar]
- [122].Yang H, Ortiz-Bouza M, Vu T, Laport F, Calhoun VD, Aviyente S, and Adali T, “Subgroup identification through multiplex community structure within functional connectivity networks,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2024, pp. 2141–2145. [Google Scholar]
- [123].Liang S, Deng W, Li X, Wang Q, Greenshaw AJ, Guo W, Kong X, Li M, Zhao L, Meng Y, Zhang C, Yu H, Li X-M, Ma X, and Li T, “Aberrant posterior cingulate connectivity classify first-episode schizophrenia from controls: A machine learning study,” Schizophrenia Res, vol. 220, pp. 187–193, Jun. 2020. [DOI] [PubMed] [Google Scholar]
- [124].Leech R and Sharp DJ, “The role of the posterior cingulate cortex in cognition and disease,” Brain, vol. 137, no. 1, pp. 12–32, Jan. 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
