Hypergraph regularized nonnegative triple decomposition for multiway data analysis

Qingshui Liao; Qilong Liu; Fatimah Abdul Razak

doi:10.1038/s41598-024-59300-3

. 2024 Apr 20;14:9098. doi: 10.1038/s41598-024-59300-3

Hypergraph regularized nonnegative triple decomposition for multiway data analysis

Qingshui Liao ^1,^2,^✉, Qilong Liu ², Fatimah Abdul Razak ^1,^✉

PMCID: PMC11032410 PMID: 38643209

Abstract

Tucker decomposition is widely used for image representation, data reconstruction, and machine learning tasks, but the calculation cost for updating the Tucker core is high. Bilevel form of triple decomposition (TriD) overcomes this issue by decomposing the Tucker core into three low-dimensional third-order factor tensors and plays an important role in the dimension reduction of data representation. TriD, on the other hand, is incapable of precisely encoding similarity relationships for tensor data with a complex manifold structure. To address this shortcoming, we take advantage of hypergraph learning and propose a novel hypergraph regularized nonnegative triple decomposition for multiway data analysis that employs the hypergraph to model the complex relationships among the raw data. Furthermore, we develop a multiplicative update algorithm to solve our optimization problem and theoretically prove its convergence. Finally, we perform extensive numerical tests on six real-world datasets, and the results show that our proposed algorithm outperforms some state-of-the-art methods.

Keywords: Nonnegative tensor decomposition, Triple decomposition, Hypergraph regularization, Data anaylsis

Subject terms: Computational science, Applied mathematics

Introduction

A massive amount of high-dimensional data has been accumulated in social networks, neural networks, data mining, computer vision, and other domains as data extraction technology has advanced. A number of issues arise when analyzing and processing high-dimensional data, such as the need for long computation times and large memory spaces. As a result, dimensionality reduction is commonly conducted prior to further processing and analysis of these data. High-dimensional data is often vectorized to form a larger matrix. Matrix-based methods, such as principal component analysis (PCA)¹, singular value decomposition (SVD)², multiway extensions of the SVD³, and linear discriminant analysis (LDA)⁴, are then used for dimensionality reduction. However, the matrix-based dimensionality reduction methods ignore the internal structure of the data. Therefore, tensor decomposition techniques are used to gain a better understanding of data features. There are some widely used tensor decomposition methods, such as Eckart-Young decomposition⁵, CANDECOMP/PARAFAC (CP) decomposition⁶, Tucker decomposition (TD)⁷, and the family of principal component decomposition models related to TD^8–12. TD is the decomposition of a tensor into the product of the core tensor and some factor matrices in different directions. When the core tensor in TD is taken to be the unit tensor, it degenerates to CP decomposition. Different from CP, multiway versions of principal component decompositions related to TD focus on underlining different numbers of main influence components for various multiway data via feature extraction along different modes of models.

TD has been successfully applied in the fields of pattern recognition, cluster analysis, image denoising, and image complementation. Due to the powerful data representation capabilities of TD, many TD variants have been developed in recent years based on reasonable assumptions such as sparsity¹³, smoothness¹⁴, and convolution¹⁵. However, TD faces some challenges when dealing with high-dimensional data: (i) The size of the core tensor in TD grows rapidly as the order of the data increases, which may result in a high cost of calculation and estimation complexity; (ii) TD does not consider the variability in each direction. This variability is widespread in some real data, such as traffic and internet data, where the three modes of the third-order tensor have strong temporal, spatial, and periodic significance¹⁶. To remedy these shortcomings, Qi et al.¹⁷ proposed a bilevel form of triple decomposition (TriD). The triple decomposition for third-order tensors transforms a third-order tensor into a product of three third-order factor tensors. Each factor tensor represents a different meaning and is of lower dimension in two directions. TriD performs TD on a tensor and triple decomposes the Tucker core at the same time. The number of parameters in TriD is less than that of TD in substantial cases. Therefore, TriD is less costly than TD.

Although TriD has achieved better results in tensor data recovery experiments, it does not take into account the geometrical manifold structure of the data. In the past decade, manifold learning has been widely adopted to preserve the geometric information of original data. Cai et al.¹⁸ explored the geometrical information by constructing a k-nearest neighbor graph and proposed the graph regularized nonnegative matrix factorization (GNMF), which demonstrated promising performance in clustering analysis. To improve the robustness of GNMF, some variants of GNMF have been proposed as described in the literatiures^19–24. Li et al.²⁵ introduced a manifold regularization term on the core tensor and proposed a manifold regularization nonnegative Tucker decomposition (MR-NTD) method. Qiu et al.²⁶ proposed a graph regularized nonnegaitve Tucker decomposition (GNTD) method by applying Laplacian regularization to the last nonnegative factor matrix. Liu et al.²⁷ presented a technique known as graph regularized $L_{p}$ smooth NTD (GSNTD) via embedding graph regularization and $L_{p}$ smooth constraint into the original model of NTD. Subsequently, Wu et al.²⁸ proposed a manifold regularization nonnegative triple decomposition (MRNTriD) of tensor sets that takes advantage of tensor geometry information. These graph-based manifold learning methods perform well in clustering. They, however, only consider the pairwise relationship between samples and ignore the high-order relationship among samples. Hypergraph learning is a good candidate for solving this problem.

Using a hypergraph to model the high-order relationship between samples will improve classification performance. There are numerous significant methods combined with hypergraphs that work well in clustering tasks: Zeng et al.²⁹ presented a hypergraph regularized nonnegative matrix factorization (HNMF) method. Wang et al.³⁰ introduced a hypergraph regularization to $L_{1 / 2}$ -NMF (HSNMF) for exploiting spectral-spatial joint structure of hypespectral images. Huang et al.³¹ constructed a sparse hypergraph for better clustering and proposed a sparse hypergraph regularized NMF (SHNMF) method. Yin et al.³² proposed a hypergraph regularized nonnegative tensor factorization (HyperNTF) method by incorporating hypergraph into nonnegative tensor decomposition. Zhao et al.³³ introduced a hypergraph regularized term into the framework of the nonnegative tensor ring decomposition and proposed a hypergraph regularized nonnegative tensor ring decomposition (HGNTR). To reduce computational complexity and suppress noise, they applied the low-rank approximation trick to accelerate HGNTR (LraHGNTR)³³. Huang et al.³⁴ designed a method to dynamically update the hypergraph and proposed a dynamic hypergraph regularized nonnegative Tucker decomposition (DHNTD) method.

To the best of our knowledge, there is no method to consider higher-order relationships among data sample points in TriD. Inspired by the advantages of hypergraph learning and TriD, in this paper, we present a hypergraph regularized nonnegative triple decomposition (HNTriD) model. HNTriD can explore low-dimensional parts-based representations while preserving detailed complex geometrical information from high-dimensional tensor data. Then, we develop an iterative multiplicative updating algorithm to solve the HNTriD model. The following are the main contributions of this paper:

HNTriD is a novel dimensionality reduction method by incorporating hypergraph learning into TriD. It is good at dealing with the clustering tasks for tensor data, and the computation cost and containment resources could be greatly reduced.
HNTriD embraces the merit of the complex connections of observed samples while retaining raw data structural information in dimensionality reduction. We attribute this excellent performance to the hypergraph regularized term’s ability which can successfully approximate the inner relationships of original data.
HNTriD makes sense for some practical applications, such as clustering tasks, because it performs well at multiway data learning and can successfully preserve the important characteristics in dimensionality reduction. Experimental results in some popular datasets, including COIL20, GEORGIA, MNIST, ORL, PIE, and USPS, show that HNTriD outperforms existing rival approaches in cluster analysis.

The remainder of this paper is organized as follows: Section 2 goes over some fundamental concepts, such as NTD, TriD, and hypergraph learning, that will be used in the subsequent sections. The objective function of the HNTriD model is proposed in Section 3, and we discuss the HNTriD optimization algorithm in detail, including the updating rules for the parameters of the model, the convergence analysis of the proposed method, and the computation complexity analysis of HNTriD. In Section 4, we present some experimental results that can be used to validate the efficacy and accuracy of our proposed method. The last section is the conclusion.

Related work

In this section, we briefly overview some basic definitions, including NTD^32,34,35, TriD^17,25, hypergraph learning^36–38. The notations used in this paper are listed in Table 1.

Table 1.

List of the notations relevant to this paper.

Notaions	Descriptions	Notations	Descriptions
$U$	A matrix	$X$	A tensor
${‖ \cdot ‖}_{F}$	Frobenius norm	$Tr (\cdot)$	Trace
$〚 \cdot 〛$	Triple product	$Y_{(n)}$	The mode-n matricization of tensor $Y$
$\otimes$	Kronecker product	$X_{ijl}$	The (i, j, l) element of a third-order tensor
$⊛$	Hadamard product	$vec (\cdot)$	The operator vectorizes a subject into a vector
$O (\cdot)$	Computation cost	$\times_{n}$	The mode-n product

Open in a new tab

Nonnegative tensor decomposition (NTD)

TD is a popular class of methods for dimensionality reduction of high-dimensional data⁷. The data collected in real life are usually nonnegative, so it makes more physical sense to add nonnegative constraints to all factors in TD. Therefore, we focus on the nonnegative tensor decomposition (NTD). In fact, NTD is a multiway extension of nonnegative matrix factorization (NMF)³⁹, which imposes nonnegative constraints to the TD model³⁵, and it preserves the multilinear structure of data. Given a nonnegative third-order tensor $X \in R_{+}^{n_{1} \times n_{2} \times n_{3}}$ , NTD can be expressed as a core nonnegative tensor $\hat{X} \in R_{+}^{r_{1} \times r_{2} \times r_{3}}$ multiplied by three nonnegative factor matrices $U \in R_{+}^{n_{1} \times r_{1}}$ , $V \in R_{+}^{n_{2} \times r_{2}}$ , and $W \in R_{+}^{n_{3} \times r_{3}}$ , and it can be formulated as

\begin{matrix} X = \hat{X} \times_{1} U \times_{2} V \times_{3} W . \end{matrix}

If the smallest integers $r_{1}, r_{2}, r_{3}$ such that (1) holds, then we call the vector $(r_{1}, r_{2}, r_{3})$ the Tucker rank. In the process of solving the optimal solution, we usually use its transformation of the mode-n matricization, and (1) can be expressed in the following equivalent forms

\begin{matrix} X_{(1)} = U {\hat{X}}_{(1)} (W \otimes V)^{⊤}, X_{(2)} = V {\hat{X}}_{(2)} (W \otimes U)^{⊤}, and X_{(3)} = W {\hat{X}}_{(3)} (V \otimes U)^{⊤}, \end{matrix}

where $X_{(n)}$ denotes the mode-n matricization of the tensor $X$ , “ $\otimes$ ” denotes the Kronecker product of two matrices.

Bilevel form of triple decomposition (TriD)

In the TD and NTD methods, the size of the core tensor grows rapidly as the order of data increases, which may result in a high cost of calculation. To overcome this shortcoming, Qi et al.¹⁷ recently proposed a new form of triple decomposition for third-order tensors, which reduces a third-order tensor to the product of three third-order factor tensors.

Definition 1

¹⁷ Let $\hat{X} = ({\hat{x}}_{ijl}) \in R^{r_{1} \times r_{2} \times r_{3}}$ be a nonzero tensor. We say that $\hat{X}$ is the triple product of three third-order square tensors $A \in R^{r_{1} \times r \times r}$ , $B \in R^{r \times r_{2} \times r}$ , and $C \in R^{r \times r \times r_{3}}$ , triple product of the tensors is denoted by

\begin{matrix} \hat{X} = 〚 A B C 〛, \end{matrix}

where $A$ , $B$ , and $C$ are named horizontally square tensor, laterally square tensor, and frontally square tensor, respectively. For $i = 1, 2, \dots, r_{1}, j = 1, 2, \dots, r_{2}$ , $l = 1, 2, \dots, r_{3}$ , the elementwise definition of the triple product can be illustrated as

\begin{matrix} {\hat{X}}_{ijl} = \sum_{p = 1}^{r} \sum_{q = 1}^{r} \sum_{s = 1}^{r} A_{iqs} B_{pjs} C_{pql} . \end{matrix}

\begin{matrix} r \leq m i d {r_{1}, r_{2}, r_{3}}, \end{matrix}

where “ $m i d {\cdot}$ ” denotes the median, we call (3) is a low rank triple decomposition of $\hat{X}$ . $A, B$ , and $C$ are the factor tensors of $\hat{X}$ . The smallest value of r such that (4) holds is known as the triple rank of $\hat{X}$ , which is denoted as TriRank( $\hat{X}$ )=r. The triple rank of a zero tensor is defined as zero.

If a third-order tensor is decomposed by TD, and its Tucker core is triple decomposed into three tensors simultaneously. Then we get a bilevel form of the triple decomposition, that is shown below.

Definition 2

¹⁷ Based on the definition of NTD shown in (1), if the core tensor $\hat{X}$ has a triple decomposition $\hat{X} = 〚 A B C 〛$ , where TriRank( $\hat{X}$ )=r, $A \in R_{+}^{r_{1} \times r \times r}$ , $B \in R_{+}^{r \times r_{2} \times r}$ , and $C \in R_{+}^{r \times r \times r_{3}}$ . Then $X$ can be represented as

\begin{matrix} X = 〚 A B C 〛 \times_{1} U \times_{2} V \times_{3} W . \end{matrix}

We call (5) a bilevel form of the triple decomposition of $X$ , which is always referred as TriD. $A$ , $B$ , and $C$ are the inner factor tensors.

From (1), the minimum number in parameters of NTD of the third-order tensor $X$ is $n_{1} r_{1} + n_{2} r_{2} + n_{3} r_{3} + r_{1} r_{2} r_{3}$ , where $(r_{1}, r_{2}, r_{3})$ is the Tucker rank of $X$ . On the other hand, the number of parameters of TriD is $n_{1} r_{1} + n_{2} r_{2} + n_{3} r_{3} + (r_{1} + r_{2} + r_{3}) r^{2}$ , where r is the triple rank of $\hat{X}$ . Generally, the triple rank of $\hat{X}$ is far less than each of the Tucker rank’s components of the original tensor $X$ . Then there are substantial cases where the number of parameters of TriD is strictly less than that of the TD.

NTD and TriD are linear dimensionality reduction techniques that may miss the essential nonlinear data structure. Manifold learning, on the other hand, is an effective technique for discovering geometric structure in multiway data, and hypergraph learning is a promising manifold learning method.

Hypergraph learning

To improve clustering performance, it is necessary to maintain the internal hidden geometry structure information, which can be detailed by the hypergraph learning. Given $n_{3}$ grayscale image ${X_{1}, X_{2}, \dots, X_{n_{3}}}$ , each grayscale image can be viewed as a matrix of size $n_{1} \times n_{2}$ . These $n_{3}$ matrices are stacked to form a tensor $X$ of size $n_{1} \times n_{2} \times n_{3}$ . The i-th frontal slice $X (:, :, i)$ of the tensor $X$ is exactly the matrix $X_{i}$ . In addition, we can build a hypergraph $(V, E ; S)$ to encode the geometrical structure of raw data⁴⁰. Each node $v_{i} \in V$ represents a related data $X_{i}$ and every hyperedge $e_{i} \in E$ consists of several nodes that are clustered by some constraints. For each vertex $v_{i}$ , we form a hyperedge $e_{i}$ of $v_{i}$ and the k-neighbours of $v_{i}$ . For each hyperedge $e_{i}$ with a weight $s (e_{i})$ which is used to measure the similarity of the contained image nodes. The weight $s (e_{i})$ can be calculated as follows:

\begin{matrix} s (e_{i}) = \sum_{X_{j} \in e_{i}} e x p (\frac{- ‖ X_{i} - X_{j} ‖_{F}^{2}}{σ^{2}}), \end{matrix}

where $σ = \frac{1}{k n_{3}} \sum_{i = 1}^{3} \sum_{j \in e_{i}} {‖ X_{i} - X_{j} ‖}_{F}$ denotes the mean distance among all vertices in hyperedge $e_{i}$ . In particular, we can construct an incidence matrix $H$ as follows:

\begin{matrix} H (v_{i}, e_{q}) = \{\begin{matrix} 1, & if v_{i} \in e_{q}, \\ 0, & if v_{i} \notin e_{q} . \end{matrix}) \end{matrix}

The degrees of a node $v_{i}$ and a hyperedge $e_{q}$ can be expressed as

\begin{matrix} d (v_{i}) = \sum_{e_{q} \in E, v_{i} \in V} s (e_{q}) H (v_{i}, e_{q}) \end{matrix}

and

\begin{matrix} d (e_{q}) = | e_{q} | = \sum_{v_{i} \in V} H (v_{i}, e_{q}), \end{matrix}

respectively. We use $D_{v}, D_{e}$ , and $S_{e}$ to denote diagonal matrices whose elements are $d (v_{i})$ , $d (e_{q})$ , and $s (e_{q})$ , respectively.

To make the hypergraph more visual, we show the spatial structure in Figure 1. Herein every $v_{j} (j = 1, 2, \dots, 9)$ represents a node, and each $e_{i} (i = 1, 2, \dots, 5)$ denotes a hyperedge.

An example of hypergraph and its incident relationship.

If two matrix data $X_{i}$ and $X_{j}$ are similar in the original raw observation, it is reasonable to assume that their low-dimensional representations $w_{i}$ and $w_{j}$ are adjacent to each other. Combined with practical application and theoretical analysis of hypergraph^31–33, we can assume that $w_{i}$ and $w_{j}$ are the corresponding vectors that are related to the nodes $v_{i}$ and $v_{j}$ . Then, the following expression can be used to calculate the clustering similarity of the original data $X_{i}$ and $X_{j}$ in the low-dimensional approximation.

\begin{matrix} \frac{1}{2} \sum_{e_{q} \in E} \sum_{v_{i}, v_{j} \in V} \frac{s (e_{q}) H (v_{i}, e_{q}) H (v_{j}, e_{q})}{δ (e_{q})} {‖ w_{i} - w_{j} ‖}_{2}^{2} \\ = \sum_{e_{q} \in E} \sum_{v_{i} \in V} s (e_{q}) H (v_{i}, e_{q}) {‖ w_{i} ‖}_{2}^{2} \sum_{v_{j} \in V} \frac{H (v_{j}, e_{q})}{δ (e_{q})} \\ - \sum_{e_{q} \in E} \sum_{v_{i}, v_{j} \in V} \frac{s (e_{q}) H (v_{i}, e_{q}) H (v_{j}, e_{q})}{δ (e_{q})} w_{i}^{⊤} w_{j} \\ = Tr [W^{⊤} (D_{v} - H S_{e} D_{e}^{- 1} H^{⊤}) W] \\ = Tr (W^{⊤} L W), \end{matrix}

where $L = D_{v} - \bar{S}$ is a hypergraph Laplacian matrix that characterizes the data manifold, and $\bar{S} = H S_{e} D_{e}^{- 1} H^{⊤}$ .

Hypergraph regularized nonnegative triple decomposition (HNTriD)

TriD is a significant tensor data dimensional reduction algorithm, but it ignores higher-order relationships between the inner parts of raw data and does not consider nonnegative constraints, which may result in a big gap in data clustering performance. Modeling the high-order relationship among samples will help to improve performance. Hypergraph learning is an effective tool for illustrating the inner complex connections of multiway data. By incorporating the hypergraph Laplacian regularized term into the bilevel form of triple decomposition, we get a new method named HNTriD, as shown in the following subsection.

Objective function of HNTriD

Suppose $X \in R_{+}^{n_{1} \times n_{2} \times n_{3}}$ be a third-order nonnegative tensor which we stack the samples that are represented by $n_{3}$ second-order data $X_{i} \in R_{+}^{n_{1} \times n_{2}} (i = 1, 2, \dots, n_{3})$ as the elements of the third mode, each $X_{i}$ represents an original data sample of the raw data. Note that $X_{(3)} = {[vec (X_{1}), vec (X_{2}), \dots, vec (X_{3})]}^{⊤}$ , unfolding $X$ along the third mode we can simplify (1) into its matricization form that equals to the third equation of (2), which can be written as

\begin{matrix} X_{(3)}^{⊤} = [(V \otimes U), {\hat{X}}_{(3)}^{⊤}] W^{⊤} . \end{matrix}

Let $W = {[w_{1}, w_{2}, \dots, w_{n_{3}}]}^{⊤}$ , each $w_{k} \in R_{+}^{3}$ can be regarded as a low-dimensional representation for the data $X_{k}$ under the basis of $(V \otimes U) {\hat{X}}_{(3)}^{⊤}$ .

To improve the multiway data representation ability and brush up operational efficiency, we propose the following HNTriD model, which incorporates the hypergraph constraint into the TriD model. For a given nonnegative tensor, $X \in R_{+}^{n_{1} \times n_{2} \times n_{3}}$ , HNTriD aims to find three nonnegative tensors $A \in R_{+}^{r_{1} \times r \times r}$ , $B \in R_{+}^{r \times r_{2} \times r}$ , and $C \in R_{+}^{r \times r \times r_{3}}$ and three nonnegative factor matrices $U \in R_{+}^{n_{1} \times r_{1}}$ , $V \in R_{+}^{n_{2} \times r_{2}}$ , and $W \in R_{+}^{n_{3} \times r_{3}}$ such that

\begin{matrix} min_{A, B, C, U, V, W} f_{HNTriD} = \frac{1}{2} {‖ X - 〚 A B C 〛 \times_{1} U \times_{2} V \times_{3} W ‖}_{F}^{2} + \frac{α}{2} Tr (W^{⊤} L W), \\ s . t . A \geq 0, B \geq 0, C \geq 0, U \geq 0, V \geq 0, W \geq 0 . \end{matrix}

The first and second parts of (7) are the reconstruction error term and the hypergraph regularized term, respectively. The reconstruction error term in (7) can be seen as a deep nonnegative tensor decomposition with two layers. The first layer is a TD in the following form

\begin{matrix} X \approx Y \times_{1} U \times_{2} V \times_{3} W, \end{matrix}

where $Y \times_{1} U \times_{2} V$ denotes the set of multilinear bases of the original data $X$ and $W$ denotes the encoding matrix of $X$ under this set of multilinear bases. The second layer is the triple decomposition, which takes the following form

\begin{matrix} Y \approx 〚 A B C 〛, \end{matrix}

where each factor tensor represents a different meaning in different application problems. For example, in social networks and transportation data, different characteristics such as temporal stability, spatial correlation, and traffic periodicity may be reflected in each of these three factors. This two-layer decomposition not only reduces the computation required to update the core tensor, but also takes into account the respective advantages of the TD and the triple decomposition. The variable $α$ is an adjustment parameter that is used to measure the importance of the hypergraph regularization term. The hypergraph regularization term preserves the multilateral relationships among the data, so we establish model (7).

HNTriD is used to represent high-dimensional data in a low-dimensional form. To better show the implications of HNTriD, we draw a flowchart to provide a concise overview of the implementation procedure in Figure 2.

A flowchart used to show the implementation process of HNTriD in data analysis.

Optimization algorithm

When the parameters $A, B, C, U, V$ , and $W$ are considered simultaneously, the objective function $f_{HNTriD}$ of HNTriD in (7) is not convex. Therefore, obtaining the global optimal solution is difficult. To deal with it, we introduce an iterative algorithm that achieves a local minimum. To simplify the process of solving the optimal algorithm, we show two important lemmas that will be frequently used.

Lemma 1

¹⁷Let $\hat{X} = 〚 A B C 〛$ , we define three third-order tensors $F \in R_{+}^{r^{2} \times r_{2} \times r_{3}}$ , $G \in R_{+}^{r_{1} \times r^{2} \times r_{3}}$ , and $H \in R_{+}^{r_{1} \times r_{2} \times r^{2}}$ with entries

\begin{matrix} F_{kjt} = \sum_{p = 1}^{r} B_{pjs} C_{pqt}, G_{ilt} = \sum_{q = 1}^{r} A_{iqs} C_{pqt}, and H_{ijm} = \sum_{s = 1}^{r} A_{iqs} B_{pjs}, \end{matrix}

where $k = q + (s - 1) r$ , $l = p + (s - 1) r$ , and $m = p + (q - 1) r$ , respectively. Then

\begin{matrix} \hat{X} = F \times_{1} A_{(1)} = G \times_{2} B_{(2)} = H \times_{3} C_{(3)} . \end{matrix}

Lemma 2

Let $M \in R^{m \times n}$ , $N \in R^{n \times p}$ , $P \in R^{p \times q}$ , and $Q \in R^{m \times q}$ . Then

\begin{matrix} \frac{{\partial ‖ Q - M N P ‖}_{F}^{2}}{\partial N} = 2 M^{⊤} (M N P - Q) P^{⊤} . \end{matrix}

Proof

According to

\begin{matrix} {‖ Q - M N P ‖}_{F}^{2} = & Tr [{(Q - M N P)}^{⊤}, (Q - M N P)] \\ = & Tr [{(M N P)}^{⊤}, (M N P)] - 2 Tr (P Q^{⊤} M N) + Tr (Q^{⊤} Q), \end{matrix}

one has

\begin{matrix} \frac{{\partial ‖ Q - M N P ‖}_{F}^{2}}{\partial vec (N)} = \frac{\partial Tr [{(M N P)}^{⊤}, (M N P)]}{\partial vec (N)} - 2 \frac{\partial Tr (P Q^{⊤} M N)}{\partial vec (N)} . \end{matrix}

Combining it with

\begin{matrix} \frac{\partial Tr [{(M N P)}^{⊤}, (M N P)]}{\partial vec (N)} = & \frac{\partial Tr [{(M N P)}^{⊤}, (M N P)]}{\partial vec (M N P)} \cdot \frac{\partial vec (M N P)}{\partial vec (M N)} \cdot \frac{\partial vec (M N)}{\partial vec (N)} \\ = & 2 {[vec (M N P)]}^{⊤} (P^{⊤} \otimes I_{m}) (I_{q} \otimes M) \\ = & 2 {[vec (M N P)]}^{⊤} (P^{⊤} \otimes M) \\ = & 2 {[vec (M^{⊤} M N P P^{⊤})]}^{⊤} \end{matrix}

and

\begin{matrix} \frac{\partial Tr (P Q^{⊤} M N)}{\partial vec (N)} = {[vec (M^{⊤} Q P^{⊤})]}^{⊤} \end{matrix}

yields

\begin{matrix} \frac{{\partial ‖ Q - M N P ‖}_{F}^{2}}{\partial vec (N)} = 2 vec {[M^{⊤}, (M N P - Q), P^{⊤}]}^{⊤} . \end{matrix}

Therefore, $\frac{{\partial ‖ M N P - Q ‖}_{F}^{2}}{\partial N} = 2 M^{⊤} (M N P - Q) P^{⊤}$ is obtained. This completes the proof. $□$

Solutions of inner factor tensors

When the variables $B, C, U, V$ , and $W$ are fixed, then the objective function of HNTriD is equivalent to

\begin{matrix} min_{A \geq 0} \frac{1}{2} {‖ X - 〚 A B C 〛 \times_{1} U \times_{2} V \times_{3} W ‖}_{F}^{2} . \end{matrix}

The Lagrange function of the above optimization problem (10) is

\begin{matrix} L_{A} = \frac{1}{2} {‖ X - 〚 A B C 〛 \times_{1} U \times_{2} V \times_{3} W ‖}_{F}^{2} - Tr (Φ_{1} A_{(1)}^{⊤}) . \end{matrix}

The matricization form of (11) that along the mode-1 is

\begin{matrix} L_{A} = \frac{1}{2} {‖ X_{(1)} - U A_{(1)} F_{(1)} {(W \otimes V)}^{⊤} ‖}_{F}^{2} - Tr (Φ_{1} A_{(1)}^{⊤}), \end{matrix}

where $F_{(1)}$ is the unfolding form of $F$ that defined as (8). By Lemma 2, the gradient of $L_{A}$ with respect to $A_{(1)}$ is given by

\begin{matrix} \frac{\partial L_{A}}{\partial A_{(1)}} = U^{⊤} (U A_{(1)} F_{(1)} {(W \otimes V)}^{⊤} - X_{(1)}) (W \otimes V) F_{(1)}^{⊤} - Φ_{1} . \end{matrix}

According to⁴¹, we can take advantage of the Karush-Kuhn-Tucker (KKT) conditions $\frac{\partial L_{A}}{\partial A_{(1)}} = 0$ and $Φ_{1} ⊛ A_{(1)} = 0$ , then the following equation is satisfied,

\begin{matrix} (U^{⊤}, U, A_{(1)}, F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤}) ⊛ A_{(1)} - (U^{⊤}, X_{(1)}, (W \otimes V), F_{(1)}^{⊤}) ⊛ A_{(1)} = 0 . \end{matrix}

Based on the above equation, we obtain the following updating rule for $A$ , and

\begin{matrix} [A_{(1)}]_{ij} ⟵ {[A_{(1)}]}_{ij} \frac{{[U^{⊤}, X_{(1)}, (W \otimes V), F_{(1)}^{⊤}]}_{ij}}{{[U^{⊤}, U, A_{(1)}, F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤}]}_{ij}} . \end{matrix}

Using the same technique, updating rules for inner factor tensors $B$ and $C$ are obtained, which can be expressed as

\begin{matrix} [B_{(2)}]_{ij} ⟵ {[B_{(2)}]}_{ij} \frac{{[V^{⊤}, X_{(2)}, (W \otimes U), G_{(2)}^{⊤}]}_{ij}}{{[V^{⊤}, V, B_{(2)}, G_{(2)}, {(W \otimes U)}^{⊤}, (W \otimes U), G_{(2)}^{⊤}]}_{ij}} \end{matrix}

and

\begin{matrix} [C_{(3)}]_{ij} ⟵ {[C_{(3)}]}_{ij} \frac{{[W^{⊤}, X_{(3)}, (V \otimes U), H_{(3)}^{⊤}]}_{ij}}{{[W^{⊤}, W, C_{(3)}, H_{(3)}, {(V \otimes U)}^{⊤}, (V \otimes U), H_{(3)}^{⊤}]}_{ij}}, \end{matrix}

respectively.

Solutions of factor matrices

When the variables $A, B, C, U$ , and $V$ are fixed, then the objective function of HNTriD is equivalent to

\begin{matrix} min_{W \geq 0} \frac{1}{2} {‖ X - 〚 A B C 〛 \times_{1} U \times_{2} V \times_{3} W ‖}_{F}^{2} + \frac{α}{2} Tr (W^{⊤} L W) . \end{matrix}

The Lagrange function of the optimization problem (15) is

\begin{matrix} L_{W} = \frac{1}{2} {‖ X - 〚 A B C 〛 \times_{1} U \times_{2} V \times_{3} W ‖}_{F}^{2} + \frac{α}{2} Tr (W^{⊤} L W) - Tr (Ψ_{3} W^{⊤}) . \end{matrix}

By using a transformation of the mode-3 matricization of the tensor $X$ and $\hat{X}$ , (16) is obtained as follows

\begin{matrix} L_{W} = \frac{1}{2} {‖ X_{(3)} - W C_{(3)} H_{(3)} {(V \otimes U)}^{⊤} ‖}_{F}^{2} + \frac{α}{2} Tr (W^{⊤} L W) - Tr (Ψ_{3} W^{⊤}) . \end{matrix}

By Lemma 2, the gradient of $L_{W}$ with respect to $W$ is given by

\begin{matrix} \frac{\partial L_{W}}{\partial W} = (W C_{(3)} H_{(3)} {(V \otimes U)}^{⊤} - X_{(3)}) (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α L W - Ψ_{3} . \end{matrix}

Using the Karush-Kuhn-Tucker (KKT) conditions $\frac{\partial L_{W}}{\partial W} = 0$ and $Ψ_{3} ⊛ W = 0$ , the following equation is satisfied,

\begin{matrix} (W C_{(3)} H_{(3)} {(V \otimes U)}^{⊤} (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α D_{v} W) ⊛ W \\ - (X_{(3)} (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α \bar{S} W) ⊛ W = 0 . \end{matrix}

Based on the above equation, we obtain the following updating rule for $W$ , and

\begin{matrix} W_{ij} ⟵ W_{ij} \frac{{[X_{(3)} (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α \bar{S} W]}_{ij}}{{[W C_{(3)} H_{(3)} {(V \otimes U)}^{⊤} (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α D_{v} W]}_{ij}} . \end{matrix}

Using the same technique, updating rules for the inner factor matrices $U$ and $V$ are obtained, which can presented as

\begin{matrix} U_{ij} ⟵ U_{ij} \frac{{[X_{(1)}, (W \otimes V), F_{(1)}^{⊤}, A_{(1)}^{⊤}]}_{ij}}{{[U, A_{(1)}, F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤}, A_{(1)}^{⊤}]}_{ij}} \end{matrix}

and

\begin{matrix} V_{ij} ⟵ V_{ij} \frac{{[X_{(2)}, (W \otimes U), G_{(2)}^{⊤}, B_{(2)}^{⊤}]}_{ij}}{{[V, B_{(2)}, G_{(2)}, {(W \otimes U)}^{⊤}, (W \otimes U), G_{(2)}^{⊤}, B_{(2)}^{⊤}]}_{ij}}, \end{matrix}

respectively.

Convergence analysis theorically

In this subsection, the convergence of the iterative updating algorithm is investigated. Our proof will make use of an auxiliary function that is defined as below.

Definition 3

⁴² $G (x, \tilde{x})$ is an auxiliary function for $F (x)$ if the conditions

\begin{matrix} G (x, \tilde{x}) \geq F (x) and G (x, x) = F (x) \end{matrix}

are satisfied.

The auxiliary function is of great help due to the key property that is shown as follows:

Lemma 3

⁴² If $G (x, \tilde{x})$ is an auxiliary function of $F (x)$ , then $F (x)$ is non-increasing under the update

\begin{matrix} x^{t + 1} = arg min_{x} G (x, x^{t}) . \end{matrix}

Now, we are going to show that the update rule for $A_{(1)}$ shown in (12) is exactly the same as that shown in (20) with a proper auxiliary function. Considering the ith row and jth column entry ${[A_{(1)}]}_{ij}$ in $A_{(1)}$ , we use $F_{ij}$ to denote the part of the objective function (7) that is relevant only to ${[A_{(1)}]}_{ij}$ . The first and second derivatives of $F_{ij}$ are

\begin{matrix} F_{ij}^{'} = {[U^{⊤}, (U A_{(1)} F_{(1)} {(W \otimes V)}^{⊤} - X_{(1)}), (W \otimes V), F_{(1)}^{⊤}]}_{ij} \end{matrix}

and

\begin{matrix} F_{ij}^{″} = {(U^{⊤} U)}_{ii} {(F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤})}_{jj}, \end{matrix}

respectively.

Lemma 4

The function

\begin{matrix} G (x, {[A_{(1)}]}_{ij}^{t}) = & F_{ij} ({[A_{(1)}]}_{ij}^{t}) + F_{ij}^{'} ({[A_{(1)}]}_{ij}^{t}) (x - {[A_{(1)}]}_{ij}^{t}) \\ + \frac{{[U^{⊤}, U, A_{(1)}, F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤}]}_{ij}}{2 {[A_{(1)}]}_{ij}^{t}} {(x - {[A_{(1)}]}_{ij}^{t})}^{2} \end{matrix}

is an auxiliary function for $F_{ij}$ , which is only relevant to ${[A_{(1)}]}_{ij}$ .

Proof

Since $G (x, x) = F_{ij} (x)$ is obvious, we only need to show that the condition $G (x, {[A_{(1)}]}_{ij}^{t}) \geq F_{ij} (x)$ holds. To achieve this, we take into consideration the Taylor series expansion of $F_{ij} (x)$ which can be formalized as

\begin{matrix} F_{ij} (x) = F_{ij} ({[A_{(1)}]}_{ij}^{t}) + F_{ij}^{'} ({[A_{(1)}]}_{ij}^{t}) (x - {[A_{(1)}]}_{ij}^{t}) + \frac{1}{2} F_{ij}^{″} ({[A_{(1)}]}_{ij}^{t}) {(x - {[A_{(1)}]}_{ij}^{t})}^{2} . \end{matrix}

Comparing (21) with (22), we can get that $G (x, {[A_{(1)}]}_{ij}^{t}) \geq F_{ij} (x)$ is satisfied as long as

\begin{matrix} \frac{{[U^{⊤}, U, A_{(1)}, F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤}]}_{ij}}{2 {[A_{(1)}]}_{ij}^{t}} \geq \frac{1}{2} F_{ij}^{″} ({[A_{(1)}]}_{ij}^{t}) \end{matrix}

holds, which can be expressed as

\begin{matrix} {[U^{⊤}, U, A_{(1)}, F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤}]}_{ij} \\ \geq {[A_{(1)}]}_{ij}^{t} {(U^{⊤} U)}_{ii} {(F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤})}_{jj} . \end{matrix}

Since

\begin{matrix} {[U^{⊤}, U, A_{(1)}, F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤}]}_{ij} \\ = \sum_{i_{2} = 1}^{r_{1}} \sum_{i_{3} = 1}^{r^{2}} {(U^{⊤} U)}_{i i_{2}} {[A_{(1)}]}_{i_{2} i_{3}} {[F_{(1)} {(W \otimes V)}^{⊤} (W \otimes V) F_{(1)}^{⊤}]}_{i_{3} j} \\ = \sum_{i_{2} = 1, i_{2} \neq i}^{r_{1}} \sum_{i_{3} = 1, i_{3} \neq j}^{r^{2}} {(U^{⊤} U)}_{i i_{2}} {[A_{(1)}]}_{i_{2} i_{3}} {[F_{(1)} {(W \otimes V)}^{⊤} (W \otimes V) F_{(1)}^{⊤}]}_{i_{3} j} \\ + {[A_{(1)}]}_{ij}^{t} {(U^{⊤} U)}_{ii} {(F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤})}_{jj} \\ \geq {[A_{(1)}]}_{ij}^{t} {(U^{⊤} U)}_{ii} {(F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤})}_{jj}, \end{matrix}

which implies (23) holds, then $G (x, {[A_{(1)}]}_{ij}^{t}) \geq F_{ij} (x)$ is satisfied. This completes the proof. $□$

Theorem 1

The objective function of the HNTriD model (7) is non-increasing under the updating rule $A_{(1)}$ represented as (12).

Proof

Replacing the auxiliary function $G (x, x^{t})$ of (20) with (21) yields

\begin{matrix} {[A_{(1)}]}_{ij}^{t + 1} = arg min_{x} G (x, {[A_{(1)}]}_{ij}^{t}) . \end{matrix}

According to

\begin{matrix} \frac{\partial G (x, {[A_{(1)}]}_{ij}^{t})}{\partial x} \\ = F_{ij}^{'} ({[A_{(1)}]}_{ij}^{t}) + \frac{{[U^{⊤}, U, A_{(1)}, F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤}]}_{ij}}{{[A_{(1)}]}_{ij}^{t}} (x - {[A_{(1)}]}_{ij}^{t}) = 0, \end{matrix}

we have

\begin{matrix} [A_{(1)}]_{ij}^{t + 1} = {[A_{(1)}]}_{ij}^{t} \frac{{[U^{⊤}, X_{(1)}, (W \otimes V), F_{(1)}^{⊤}]}_{ij}}{{[U^{⊤}, U, A_{(1)}, F_{(1)}, {(W \otimes V)}^{⊤}, (W \otimes V), F_{(1)}^{⊤}]}_{ij}} . \end{matrix}

Then we can see that (24) agrees with (12), and the Lemma 4 guarantees that (21) is an auxiliary function of $F_{ij}$ . Based on this, in conjunction with Lemma 3, we can get that $f_{HNTriD}$ is non-increasing under the update rule of (12). The proof is then finished. $□$

We are going to state that the update for $W$ expressed as (17) is equal to the update (20) with an appropriate auxiliary function. Considering the ith row and jth column entry $W_{ij}$ in $W$ , we use ${\hat{F}}_{ij}$ to denote the part of the objective function (7) that is only relevant to $W_{ij}$ . The first and second derivatives of ${\hat{F}}_{ij}$ are shown below

\begin{matrix} {\hat{F}}_{ij}^{'} = {[(W C_{(3)} H_{(3)} {(V \otimes U)}^{⊤} - X_{(3)}) (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α L W]}_{ij} \end{matrix}

and

\begin{matrix} {\hat{F}}_{ij}^{″} = {[C_{(3)}, H_{(3)}, {(V \otimes U)}^{⊤}, (V \otimes U), H_{(3)}^{⊤}, C_{(3)}^{⊤}]}_{jj} + α L_{ii}, \end{matrix}

respectively.

Lemma 5

The function

\begin{matrix} \hat{G} (x, W_{ij}^{t}) = & {\hat{F}}_{ij} (W_{ij}^{t}) + {\hat{F}}_{ij}^{'} (W_{ij}^{t}) (x - W_{ij}^{t}) \\ + \frac{{[W C_{(3)} H_{(3)} {(V \otimes U)}^{⊤} (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α D_{v} W]}_{ij}}{2 W_{ij}^{t}} {(x - W_{ij}^{t})}^{2} \end{matrix}

is an auxiliary function for ${\hat{F}}_{ij}$ , which is only relevant to $W_{ij}$ .

Proof

Since $\hat{G} (x, x) = {\hat{F}}_{ij} (x)$ is obvious, we only need to illustrate that the condition $\hat{G} (x, x) \geq {\hat{F}}_{ij} (x)$ holds. To achieve this, we take into consideration the Taylor series expansion of ${\hat{F}}_{ij} (x)$ which can be expressed as follows

\begin{matrix} {\hat{F}}_{ij} (x) = {\hat{F}}_{ij} (W_{ij}^{t}) + {\hat{F}}_{ij}^{'} (W_{ij}^{t}) (x - W_{ij}^{t}) + \frac{1}{2} {\hat{F}}_{ij}^{″} (W_{ij}^{t}) {(x - W_{ij}^{t})}^{2} . \end{matrix}

Combing (25) with (26) we can find that $\hat{G} (x, W_{ij}^{t}) \geq {\hat{F}}_{ij} (x)$ is equivalent to

\begin{matrix} \frac{{[W C_{(3)} H_{(3)} {(V \otimes U)}^{⊤} (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α D_{v} W]}_{ij}}{2 W_{ij}^{t}} \geq \frac{1}{2} {\hat{F}}_{ij}^{″} (W_{ij}^{t}) . \end{matrix}

And the above equation can be rewritten as

\begin{matrix} {[W C_{(3)} H_{(3)} {(V \otimes U)}^{⊤} (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α D_{v} W]}_{ij} \\ \geq W_{ij}^{t} {[C_{(3)}, H_{(3)}, {(V \otimes U)}^{⊤}, (V \otimes U), H_{(3)}^{⊤}, C_{(3)}^{⊤}]}_{jj} + α W_{ij}^{t} L_{ii} . \end{matrix}

Since

\begin{matrix} {[W C_{(3)} H_{(3)} {(V \otimes U)}^{⊤} (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α D_{v} W]}_{ij} \\ = \sum_{k = 1}^{r_{3}} W_{ik} {[C_{(3)}, H_{(3)}, {(V \otimes U)}^{⊤}, (V \otimes U), H_{(3)}^{⊤}, C_{(3)}^{⊤}]}_{kj} + α {(D_{v})}_{ii} W_{ij}^{t} \\ = \sum_{k = 1, k \neq j}^{r_{3}} W_{ik} {[C_{(3)}, H_{(3)}, {(V \otimes U)}^{⊤}, (V \otimes U), H_{(3)}^{⊤}, C_{(3)}^{⊤}]}_{kj} \\ + W_{ij}^{t} {[C_{(3)}, H_{(3)}, {(V \otimes U)}^{⊤}, (V \otimes U), H_{(3)}^{⊤}, C_{(3)}^{⊤}]}_{jj} + α L_{ii} W_{ij}^{t} + α {\bar{S}}_{ii} W_{ij}^{t} \\ \geq W_{ij}^{t} {[C_{(3)}, H_{(3)}, {(V \otimes U)}^{⊤}, (V \otimes U), H_{(3)}^{⊤}, C_{(3)}^{⊤}]}_{jj} + α W_{ij}^{t} L_{ii}, \end{matrix}

which implies (27) holds, and $\hat{G} (x, W_{ij}^{t}) \geq {\hat{F}}_{ij} (x)$ is satisfied. This completes the proof. $□$

Theorem 2

The objective function of the HNTriD model (7) is non-increasing under the updating rule $W$ represented as (17).

Proof

Using (25) to replace the $G (x, x^{t})$ that lies in (20), we obtain

\begin{matrix} W_{ij}^{t + 1} = arg min_{x} \hat{G} (x, W_{ij}^{t}) . \end{matrix}

According to

\begin{matrix} \frac{\partial \hat{G} (x, W_{ij}^{t})}{\partial x} \\ = {\hat{F}}_{ij}^{'} (W_{ij}^{t}) + \frac{{[W C_{(3)} H_{(3)} {(V \otimes U)}^{⊤} (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α D_{v} W]}_{ij}}{W_{ij}^{t}} (x - W_{ij}^{t}) = 0, \end{matrix}

we have

\begin{matrix} U_{ij}^{t + 1} = U_{ij}^{t} \frac{{[X_{(3)} (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α \bar{S} W]}_{ij}}{{[W C_{(3)} H_{(3)} {(V \otimes U)}^{⊤} (V \otimes U) H_{(3)}^{⊤} C_{(3)}^{⊤} + α D_{v} W]}_{ij}} . \end{matrix}

It is worth noting that (28) is consistent with (17). Lemma 5 ensures that (25) is an auxiliary function of ${\hat{F}}_{ij}$ , which combined with Lemma 3 results in $f_{HNTriD}$ being non-increasing under the update rule (17). This brings the proof to a close. $□$

Applying the same techniques to parameters $B, C, U$ , and $V$ to check the convergence of HNTriD. To summarize, we can obtain that $f_{HNTriD}$ is non-increasing under each of the update rules for inner factor tensors and matrices $A, B, C, U, V$ , and $W$ while fixing the others. Before imposing our algorithm on real-world datasets for clustering tasks, it is necessary to simplify the calculation formulas of the parameters $A, B, C, U, V$ , and $W$ , as in the following Remark.

Remark 1

From the form of updating rules of $A, B, C, U, V$ , and $W$ , it is a fact that each update needs to calculate the Kronecker products which requires costly storage resources. To simplify the produce of updating for mentioned parameters, we take advantages of the tensor property of the mode-n unfolding. Then, we get

\begin{matrix} X_{(1)} (W \otimes V) = {(X \times_{2} V^{⊤} \times_{3} W^{⊤})}_{(1)} \end{matrix}

and

\begin{matrix} F_{(1)} {(W \otimes V)}^{⊤} (W \otimes V) = {(F \times_{2} V^{⊤} V \times_{3} W^{⊤} W)}_{(1)}, \end{matrix}

which means that (12) and (18) can be transformed as

\begin{matrix} [A_{(1)} {]_{ij} ⟵ [A_{(1)}]}_{ij} \frac{{[U^{⊤}, {(X \times_{2} V^{⊤} \times_{3} W^{⊤})}_{(1)}, F_{(1)}^{⊤}]}_{ij}}{{[U^{⊤}, U, A_{(1)}, {(F \times_{2} V^{⊤} V \times_{3} W^{⊤} W)}_{(1)}, F_{(1)}^{⊤}]}_{ij}} \end{matrix}

and

\begin{matrix} U_{ij} ⟵ U_{ij} \frac{{[{(X \times_{2} V^{⊤} \times_{3} W^{⊤})}_{(1)}, F_{(1)}^{⊤}, A_{(1)}^{⊤}]}_{ij}}{{[U, A_{(1)}, {(F \times_{2} V^{⊤} V \times_{3} W^{⊤} W)}_{(1)}, F_{(1)}^{⊤}, A_{(1)}^{⊤}]}_{ij}}, \end{matrix}

respectively. According to

\begin{matrix} X_{(2)} (W \otimes U) = {(X \times_{1} U^{⊤} \times_{3} W^{⊤})}_{(2)} \end{matrix}

and

\begin{matrix} G_{(2)} {(W \otimes U)}^{⊤} (W \otimes U) = {(G \times_{1} U^{⊤} U \times_{3} W^{⊤} W)}_{(2)}, \end{matrix}

(13) and (19) can be calculated as

\begin{matrix} [B_{(2)}]_{ij} ⟵ {[B_{(2)}]}_{ij} \frac{{[V^{⊤}, {(X \times_{1} U^{⊤} \times_{3} W^{⊤})}_{(2)}, G_{(2)}^{⊤}]}_{ij}}{{[V^{⊤}, V, B_{(2)}, {(G \times_{1} U^{⊤} U \times_{3} W^{⊤} W)}_{(2)}, G_{(2)}^{⊤}]}_{ij}} \end{matrix}

and

\begin{matrix} V_{ij} ⟵ V_{ij} \frac{{[{(X \times_{1} U^{⊤} \times_{3} W^{⊤})}_{(2)}, G_{(2)}^{⊤}, B_{(2)}^{⊤}]}_{ij}}{{[V, B_{(2)}, {(G \times_{1} U^{⊤} U \times_{3} W^{⊤} W)}_{(2)}, G_{(2)}^{⊤}, B_{(2)}^{⊤}]}_{ij}} . \end{matrix}

Similarly, (14) and (17) can be further rewritten as

\begin{matrix} [C_{(3)}]_{ij} ⟵ {[C_{(3)}]}_{ij} \frac{{[W^{⊤}, {(X \times_{1} U \times_{2} V)}_{(3)}, H_{(3)}^{⊤}]}_{ij}}{{[W^{⊤}, W, C_{(3)}, {(H \times_{1} U^{⊤} U \times_{2} V^{⊤} V)}_{(3)}, H_{(3)}^{⊤}]}_{ij}} \end{matrix}

and

\begin{matrix} W_{ij} ⟵ W_{ij} \frac{{[{(X \times_{1} U \times_{2} V)}_{(3)} H_{(3)}^{⊤} C_{(3)}^{⊤} + α \bar{S} W]}_{ij}}{{[W C_{(3)} {(H \times_{1} U^{⊤} U \times_{2} V^{⊤} V)}_{(3)} H_{(3)}^{⊤} C_{(3)}^{⊤} + α D_{v} W]}_{ij}}, \end{matrix}

respectively.

Hence, the learning rules for the objective function are obtained via the multiplicative update methods described as above. Specifically, we randomly initialize the tensors and factor matrices $A$ , $B$ , $C$ , $U, V$ , and $W$ , then iterate them by (29), (31), (33), (30), (32), and (34). Each iteration ends when the stopping criterion is met. After completing all iterations, we record the operations of the model and examine the convergence at the end of each iteration. The pseudo-code for HNTriD is given in Algorithm 1.

Computational complexity analysis

In this subsection, we analyze the computational complexity of the proposed HNTriD model. First, we consider the calculation cost for the tensor-tensor product in (8). In the process of computation tensors $F, G$ , and $H$ takes $O (r^{3} r_{2} r_{3})$ , $O (r^{3} r_{1} r_{3})$ , and $O (r^{3} r_{1} r_{2})$ operations, respectively. It requires $O (n_{1} r_{1}^{2} + n_{2} r_{2}^{2} + n_{3} r_{3}^{2})$ operations to calculate symmetric matrices $U^{⊤} U$ , $V^{⊤} V$ , and $W^{⊤} W$ . It takes $O (n_{1} n_{2} n_{3} r_{2} + n_{1} n_{3} r_{2} r_{3} + n_{1} r^{2} r_{2} r_{3})$ operations for calculating ${(X \times_{2} V^{⊤} \times_{3} W^{⊤})}_{(1)} F_{(1)}^{⊤}$ . It takes $O (r^{2} r_{2}^{2} r_{3} + r^{2} r_{2} r_{3}^{2} + r^{4} r_{2} r_{3} + r^{4} r_{1} + n_{1} r^{2} r_{1})$ operations to calculate $U A_{(1)} {(F \times_{2} V^{⊤} V \times_{3} W^{⊤} W)}_{(1)} F_{(1)}^{⊤}$ . Since ${(X \times_{2} V^{⊤} \times_{3} W^{⊤})}_{(1)} F_{(1)}^{⊤}$ and $U A_{(1)} {(F \times_{2} V^{⊤} V \times_{3} W^{⊤} W)}_{(1)} F_{(1)}^{⊤}$ are available, the computational cost for each term, including $U^{⊤} {(X \times_{2} V^{⊤} \times_{3} W^{⊤})}_{(1)} F_{(1)}^{⊤}$ , $U^{⊤} U A_{(1)} {(F \times_{2} V^{⊤} V \times_{3} W^{⊤} W)}_{(1)} F_{(1)}^{⊤}$ , ${(X \times_{2} V^{⊤} \times_{3} W^{⊤})}_{(1)} F_{(1)}^{⊤} A_{(1)}^{⊤}$ , and $U A_{(1)} {(F \times_{2} V^{⊤} V \times_{3} W^{⊤} W)}_{(1)} F_{(1)}^{⊤} A_{(1)}^{⊤}$ , is equal to $O (n_{1} r^{2} r_{1})$ . Then, the cost of computing the update rules of $A$ in (29) is about $O (r^{2} r_{1})$ . Assume that integers $r_{1}, r_{2}, r_{3}$ , and r are of the same order of magnitude and they are much smaller than $n_{1}, n_{2}$ , and $n_{3}$ . We claim that the total computational cost of computing the update rule of $A$ in (29) and $U$ in (30) is approximately

\begin{matrix} O (n_{1} n_{2} n_{3} r + n_{1} n_{3} r^{2} + n_{1} r^{4} + r^{6}) \end{matrix}

Similarly, the total computational cost of computing updating rules for $B$ in (31) and $V$ in (32) is about

\begin{matrix} O (n_{1} n_{2} n_{3} r + n_{2} n_{3} r^{2} + n_{2} r^{4} + r^{6}) . \end{matrix}

The total computational cost of updating the rules for $C$ in (33) and $W$ in (34) is approximately

\begin{matrix} O (n_{1} n_{2} n_{3} r + n_{1} n_{2} r^{2} + n_{3} r^{4} + r^{6}) . \end{matrix}

Therefore, we can get the total calculation cost of the HNTriD algorithm approximately as

\begin{matrix} O (n_{1} n_{2} n_{3} r + (n_{1} n_{3} + n_{2} n_{3} + n_{1} n_{2}) r^{2} + (n_{1} + n_{2} + n_{3}) r^{4} + r^{6}) . \end{matrix}

Experiments

To check the validation of our proposed HNTriD algorithm for clustering data with dimensionality reduction, we run experiments on six popular datasets and compare the results of (7) with that of the related state-of-the-art methods, including NMF⁴², GNMF¹⁸, HNMF²⁹, HSNMF³¹, SHNMF⁴³, HGNTR³³, LraHGNTR³³, HyperNTF³², and TriD¹⁷. All the simulations will be performed on a desktop computer equipped with an Intel (R) Core (TM) i5-10400F CPU at 2.90 GHz and 16 GB of memory, running MATLAB 2015a in Windows 10.

Datasets

The clustering performance is evaluated on six widely used datasets, including COIL20, GEORGIA, MNIST, ORL, PIE, and USPS. The general statistical information of the datasets is summarized in Table 2, including the samples, sizes, and categories that were used in the numerical modeling tests of this paper. A brief overview of the mentioned datasets is presented below.

COIL20 (https://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php): It is a grayscale image dataset comprised of photographs taken from 20 different individuals, and each person was photographed 72 pieces of images from different angles. After resizing each image to $32 \times 32$ , we can get a third-order tensor $Y \in R_{+}^{32 \times 32 \times 1, 440}$ .
GEORGIA (http://www.anefian.com/research/face_reco.htm): It is a colored JPG image dataset, every image was drawn from 50 people and each person was photographed 15 pieces of images with cluttered backgrounds. The images used in this paper have been converted to grayscale and resized to $32 \times 32$ . We can obtain a tensor of third order, which defined as $Y \in R_{+}^{32 \times 32 \times 750}$ .
MNIST (http://yann.lecun.com/exdb/mnist/): It is a handwritten digit image dataset, and each image is $28 \times 28$ in size. More than 60,000 digit images were collected in the MNIST dataset range from “0” to “9”. In the numerical tests of this paper, we chose 100 images randomly for each single digit. Thus, the chosen images can be presented as a third-order tensor $Y \in R_{+}^{28 \times 28 \times 1, 000}$ .
ORL (https://github.com/saeid436/Face-Recognition-MLP/tree/main/ORL): It is a dataset that includes 400 grayscale face images of 40 different people collected from different facial expressions, various facial details, and varying lighting, and each image is in size of $112 \times 92$ . A third-order tensor can be defined as $Y \in R_{+}^{112 \times 92 \times 400}$ .
PIE (http://www.ri.cmu.edu/projects/project_418.html): It is a dataset containing over 40,000 facial images collected from 68 different individuals. These images were taken in a variety of poses, lighting conditions, and expressions. We randomly selected 53 people with 22 different facial images for our numerical tests. We converted them to gray-level and resize them to $32 \times 32$ . Then the selected images can be expressed as a third-order tensor $Y \in R_{+}^{32 \times 32 \times 1, 166}$ .
USPS (https://www.csie.ntu.edu.tw/cjlin/libsvmtools/datasets/multiclass.html#usps): It is a dataset that includes 11,000 grayscale handwritten digits (from “0” to “9”) that are $16 \times 16$ in size. In the simulation tests of this paper, we chose 100 images at random for each digit. On this basis, we can build a third-order tensor $Y \in R_{+}^{16 \times 16 \times 1, 000}$ .

Table 2.

Descriptions of the relevant six datasets used in this paper.

Item	Dataset
Item	COIL20	GEORGIA	MNIST	ORL	PIE	USPS
Sample	1,440	750	1,000	400	1,166	1,000
Size	$32 \times 32$	$32 \times 32$	$28 \times 28$	$112 \times 92$	$32 \times 32$	$16 \times 16$
Category	20	50	10	40	53	10

Open in a new tab

Evaluation metrics

Clustering analysis groups samples only according to the sample data itself and its aim is to group different objects into different groups according to the controlled conditions. The way to evaluate the efficiency of the clustering methods is that objects within groups are similar to each other, while objects differ from group to group. The greater the similarity within the group, the greater the difference between the groups, the better the clustering effect. As we know, the ACC, NMI, and PUR are widely used assessment criteria^44,45 of clustering algorithm. The accuracy (ACC) can be defined as

\begin{matrix} ACC (\bar{x_{i}}, x_{i}) = \frac{1}{n} \sum_{1}^{n} δ (\bar{x_{i}}, m a p (x_{i})), \end{matrix}

where n is the number of samples in datasets, $\bar{x_{i}}$ and $x_{i}$ denote the cluster sample and the original sample, respectively. The symbol $m a p (\cdot)$ indicates the matchup relationship mapping function, which is responsible for matching the cluster samples and original samples. The symbol $δ (\cdot, \cdot)$ is the delta function shown as follows

\begin{matrix} δ (\bar{x_{i}}, m a p (x_{i})) = \{\begin{matrix} 1, & if x_{i} is mapped into \bar{x_{i}}, \\ 0, & otherwise . \end{matrix}) \end{matrix}

In general, the agreement between two clusters can be measured with the mutual information ( $MI$ ), which is widely used in clustering applications. Given two discrete random variables $\bar{X}$ and X which stand for the cluster label sets and true label sets, $\bar{x}$ and x are selected arbitrarily from $\bar{X}$ and X, respectively. Then, the $MI$ can be measured by

\begin{matrix} MI (\bar{X}, X) = \sum_{x \in X} \sum_{\bar{x} \in \bar{X}} p (\bar{x}, x) log (\frac{p (\bar{x}, x)}{p (\bar{x}) p (x)}), \end{matrix}

where $p (\bar{x})$ and p(x) are the edge probability distribution function which denote the probabilities of the samples. The $p (\bar{x}, x)$ denotes the joint probability distribution function of $\bar{X}$ and X which means that the object belongs to category $\bar{X}$ and category X at the same time. To force the score to have an upper bound, we take the $NMI$ as one of the evaluation criterion, and the definition is

\begin{matrix} NMI (\bar{X}, X) = \frac{MI (\bar{X}, X)}{m a x (T (\bar{X}), T (X))}, \end{matrix}

where $T (\bar{X})$ and $T (X)$ are the entropy of the cluster label set $\bar{X}$ and the entropy of the true label set X. In this way, the score ranges of $NMI (\bar{X}, X)$ is from 0 to 1.

The purity ( $PUR$ ) of a clustering algorithm is a simple assessment format which only have to calculate the proportion of the correct clustering to the total. In other words, the $PUR$ is to scale the degree of correctness of measurement, the $PUR$ score of a cluster is observed by a weighted sum of the $PUR$ values of the respective clusters, which is denoted by

\begin{matrix} PUR (\bar{X}, X) = \frac{1}{n} \sum_{j = 0}^{k} max_{i} | {\bar{x}}_{j} \cap x_{i} |, \end{matrix}

where $\bar{X} = ({\bar{x}}_{1}, {\bar{x}}_{2}, \dots, {\bar{x}}_{k})$ is the cluster category set, the ${\bar{x}}_{i}$ denotes the ith cluster set. $X = (x_{1}, x_{2}, \dots, x_{k})$ is the original datasets that need to be clustered, $x_{i}$ represents the ith original object. The total number of the objects is n that need to be clustered and the function $| \cdot |$ denotes the cardinality of a set.

Algorithms for comparison

To ensure the clustering performance, we compare the proposed HNTriD model with the following state-of-the-art clustering algorithms.

NMF⁴²: It incorporates nonnegative constraint into two factor matrices decomposed from the original matrix.
GNMF¹⁸: It imposes the graph constraint to the coefficient matrix of the NMF method.
HNMF²⁹: It incorporates the hypergraph constraint into the coefficient matrix of the NMF method.
HSNMF³⁰: It imposes the hypergraph constraint on the coefficient matrix based on the $L_{1 / 2}$ -NMF method.
SHNMF³¹: It takes the sparse hypergraph as a regularization and adds it to the NMF framework.
HGNTR³³: It includes the hypergraph constraint on the last TR core tensor and a nonnegative constraint on TR factor tensors.
LraHGNTR³³: It is the low-rank approximation of HGNTR.
HyperNTF³²: It imposes a hypergraph constraint on the last factor matrix of the CP model and limits all factor matrices to be nonnegative.
TriD¹⁷: It is a bilevel form of the triple decomposition of a third-order tensor.

Parameters selection

To achieve the best performance, some critical parameters in the experimental simulations needed to be adjusted. In all tests, let $ϵ = 10^{- 5}$ and the maximum number of iterations be 1000 unless otherwise specified. We set the regularized term $α$ at the grid of ${10^{- 3}, 10^{- 2}, 10^{- 1}, 1, 10, 100, 1000}$ , and the k-nearest neighbors are chosen from ${3, 4, 5, 6, 7}$ . The parameters $r_{1}$ and $r_{2}$ are integers empirically chosen from ${3, 4, \dots, 32}$ , and the integer r is chosen from ${2, 3, \dots, 20}$ . Furthermore, we choose the third mode, $r_{3}$ , as the number of categories in the related datasets, as shown in Table 2. In our experiments, we let one of the parameters $r_{1}, r_{2}, r, k, α$ varies in the grid given above, and the rest of the parameters were fixed, and the parameters corresponding to the maximum values of NMI in the experiments were recorded. The optimal parameters corresponding to each dataset are given in Table 3.

Table 3.

List of parameters’ values corresponding to the maximum NMI of HNTriD on six datasets.

Dataset	Optimal parameter
Dataset	$r_{1}$	$r_{2}$	$r_{3}$	r	k	$α$
COIL20	17	17	20	4	4	1000
GEORGIA	5	5	50	5	3	0.01
MNIST	15	32	10	15	5	10
ORL	11	11	40	11	5	0.01
PIE	8	6	53	19	6	0.01
USPS	3	16	10	2	4	100

Open in a new tab

In Figure 3, we show the effect of the parameters $α$ and k on the three indicators ACC, NMI, and PUR on different datasets. In subplots (a), (c), and (e) of Figure 3, the remaining parameters except $α$ are taken as in Table 3. In subplots (b), (d), and (f) of Figure 3, the remaining parameters except k are taken as in Table 3.

The clustering performance of the HNTriD model varies with different $α$ and the number of nearest neighbors k.

From Figure 3, we can conclude that when the parameter $α$ is set to $10^{3}, 10^{- 2}, 10, 10^{- 2}, 10^{- 2}$ , and $10^{2}$ , the ACC, NMI, and PUR all perform better in clustering tasks on COIL20, GEORGIA, MNIST, ORL, PIE, and USPS datasets. The parameter k is set to 4, 3, 5, 5, 6, and 4, the ACC, NMI, and PUR achieve better results on COIL20, GEORGIA, MNIST, ORL, PIE, and USPS datasets, respectively.

Convergence study experimentally

In Section 3.3, we demonstrated that our HNTriD algorithm is non-increasing in theory. Here, we validate it using six HNTriD convergence curves tested from six related datasets, which are shown in Figure 4. There are two important key features that can be identified from Figure 4. First, as the number of iterations increases, the objective function value of HNTriD decreases. Second, the convergence report states that the curve declines rapidly and reaches a relatively stable state within thirty iterations. To summarize, HNTriD experiments show that our method works well on the six datasets mentioned above.

Convergence report of the proposed algorithm on six datasets.

Experimental comparison

To validate the effectiveness of the proposed HNTriD method, we compare it to some state-of-the-art methods under the assessment criteria ACC, NMI, and PUR. For HNTriD, the parameters are selected as in Table 3. For each method embodied with manifold learning, we set the regularized term $α$ at the grid of ${10^{- 3}, 10^{- 2}, 10^{- 1}, 1, 10, 100, 1000}$ , and the k-nearest neighbors are chosen from {3, 4, 5, 6, 7}. For methods based on TD, such as TriD and HyperNTF, we take the dimension of the third direction of the core tensor to be the class number of the original data. For methods based on tensor ring decomposition, such as HGNTR and LraHGNTR, we take the product of the first order and the third order to be the class number of the original data. The remaining parameters in the comparison algorithm are adjusted on the grid taken by HNTriD. First, we run a number of numerical tests to compare the clustering effect across different datasets. Second, statistical significance comparison is performed on COIL20 and MNIST using the t-test. Third, we present 2-D visualizations of different methods for clustering results on the COIL20 dataset and then complete the comparison tests by means of the t-SNE technique⁴⁶. Finally, we compare the amount of time they took to finish clustering tasks on six related real-world datasets.

Numerical comparison results

All experiments are run on the same sub-raw datasets, which are chosen at random from the corresponding database. Each experimental result is obtained only after the process has been repeated 100 times. The numerical tests, in particular, are performed in two steps. The first step is to choose a group of objects at random from the raw data and then decompose them into corresponding sub-raw data based on the parametric form of the model. To ensure that the experimental results are as accurate as possible to the real-world data clustering situation. We repeat the first step 10 times to obtain 10 groups of sub-raw data. In the second step, we use the K-means method to compute the evaluation index value for each group of sub-raw data. As before, we repeat the second step 10 times to obtain 10 evaluation values for each group of sub-raw data. Throughout the experiment, we can receive 100 evaluation index values and calculate the average value as the performance result for each method. Finally, we report the average performance in Tables 4, 5, and 6. Simultaneously, we record the time spent by each method performing clustering tasks on each dataset, and the results are shown in Figure 6.

Table 4.

Quantitative clustering (ACC%±std%) of different methods on six datasets.

Method	Dataset
Method	COIL20	GEORGIA	MNIST	ORL	PIE	USPS
NMF	57.51±4.84	40.52±1.90	48.42±3.72	66.34±3.72	65.62±3.46	41.62±3.18
GNMF	69.32±3.45	41.54±2.10	$49.35 \pm 5.78$	68.29±3.61	64.84±4.07	44.64±3.35
HNMF	75.13±3.11	41.88±1.69	46.33±5.96	68.61±3.05	64.33±3.45	44.84±3.69
HSNMF	74.59±3.01	38.19±1.83	45.29±5.74	41.28±1.65	$\underline{76.65 \pm 1.96}$	40.73±2.52
SHNMF	58.03±4.88	40,52±1.84	48.25±3.59	66.67±3.66	65.63±4.08	41.39±2.43
HGNTR	75.97±3.24	35.79±2.35	44.90±6.42	65.91±3.59	58.90±4.51	50.65±3.21
LraHGNTR	75.61±3.07	$\underline{43.61 \pm 1.53}$	43.86±6.27	64.35±3.17	46.32±3.51	47.56±2.99
HyperNTF	$\underline{77.15 \pm 4.04}$	41.86±1.51	45.41±4.40	$\underline{71.78 \pm 2.23}$	52.34±2.74	$52.66 \pm 4.52$
TriD	50.57±4.18	31.50±1.53	43.62±4.08	54.30±3.10	74.54±4.25	50.96±5.15
HNTriD	$82.11 \pm 2.53$	$47.32 \pm 2.06$	$\underline{48.66 \pm 6.14}$	$72.99 \pm 2.64$	$83.85 \pm 3.27$	$\underline{52.13 \pm 4.08}$

Open in a new tab

Significant values are bold.

Table 5.

Quantitative clustering (NMI%±std%) of different methods on six datasets.

Method	Dataset
Method	COIL20	GEORGIA	MNIST	ORL	PIE	USPS
NMF	70.82±2.06	60.03±1.13	45.71±2.34	82.77±1.79	85.33±1.39	38.08±2.30
GNMF	82.92±2.58	60.94±1.19	$\underline{52.02 \pm 4.58}$	84.57±1.62	84.75±1.80	46.08±3.56
HNMF	87.81±1.52	61.32±1.02	49.92±4.55	84.98±1.52	84.69±1.45	46.18±2.97
HSNMF	87.65±1.60	57.74±1.04	49.26±4.52	61.52±1.09	90.56±0.67	37.08±1.69
SHNMF	72.40±2.37	60.02±1.11	45.66±2.48	83.04±1.82	85.19±1.80	38.30±2.27
HGNTR	88.24±1.43	56.07±1.67	49.42±0.58	82.53±1.64	82.81±1.81	52.33±2.30
LraHGNTR	88.15±1.29	$\underline{62.47 \pm 0.91}$	49.57±4.82	82.18±1.51	75.84±2.23	50.16±2.19
HyperNTF	$\underline{88.71 \pm 7.12}$	60.78±0.95	48.40±4.18	$\underline{85.27 \pm 0.87}$	79.19±1.24	$54.07 \pm 2.49$
TriD	66.60±2.42	52.10±1.24	38.35±3.39	73.84±2.13	$\underline{90.98 \pm 1.69}$	45.82±4.31
HNTriD	$91.04 \pm 0.83$	$65.29 \pm 1.35$	$53.22 \pm 4.23$	$86.44 \pm 1.22$	$94.12 \pm 1.05$	$\underline{53.95 \pm 2.42}$

Open in a new tab

Significant values are bold.

Table 6.

Quantitative clustering (PUR%±std%) of different methods on six datasets.

method	Dataset
method	COIL20	GEORGIA	MNIST	ORL	PIE	USPS
NMF	60.33±3.91	43.27±1.69	52.60±3.33	71.11±2.83	72.09±2.51	43.83±2.63
GNMF	75.21±2.83	44.41±1.80	$55.10 \pm 5.29$	73.25±2.76	71.31±3.05	47.34±3.17
HNMF	80.52±2.29	44.71±1.55	52.10±5.03	73.69±2.34	70.83±2.56	47.39±2.89
HSNMF	80.20±2.31	40.75±1.65	51.25±4.71	44.02±1.94	80.48±1.53	43.32±2.29
SHNMF	61.12±4.06	43.46±1.70	52.62±3.33	71.48±3.11	71.96±2.96	43.33±2.21
HGNTR	81.19±2.25	38.77±2.15	50.50±6.14	69.98±3.04	64.40±3.63	53.60±2.66
LraHGNTR	81.06±2.05	$\underline{46.71 \pm 1.41}$	50.00±5.75	68.67±2.59	51.16±3.41	50.18±2.62
HyperNTF	$\underline{81.80 \pm 3.30}$	45.90±1.20	51.20±4.15	$\underline{74.84 \pm 1.70}$	59.16±1.99	$56, 73 \pm 3.54$
TriD	54.16±3.40	34.25±1.51	47.27±3.96	59.55±2.73	$\underline{80.64 \pm 3.17}$	52.70±4.97
HNTriD	$85.65 \pm 1.64$	$52.33 \pm 1.62$	$\underline{54, 72 \pm 5.32}$	$76.36 \pm 2.00$	$87.63 \pm 2.35$	$\underline{55.55 \pm 3.55}$

Open in a new tab

Significant values are bold.

Comparison report of the running time in regard to the different methods on six datasets.

Tables 4, 5, and 6, present experimental results demonstrating correlation algorithm clustering performance on six datasets. The advanced clustering method can be found in the tables under the quantitative clustering rules. For ease of observation, we highlight the best data in bold and the second ones in a slight underline. The experimental results presented above lead us to the following conclusion: (i) In terms of clustering performance, as measured by ACC, NMI, and PUR, our proposed method outperforms others in the majority of cases, and our experimental results are second-best, if not the best. (ii) The best experimental results in the mass are located in tensor-based methods. Because tensor methods consider more information from the raw data. (iii) The HNTriD method outperforms other tensor-based decomposition methods in most cases. Because it inherits the previous algorithm’s excellent characteristics, including TriD, and preserves the data’s multi-linear structure. Experiments show that the proposed HNTriD algorithm performs well in clustering tasks.

Statistical significance comparison

A t-test is a statistical technique used to determine if there is a significant difference between two groups of data. It functions as an important tool in hypothesis testing and aids researchers in determining whether two groups are genuinely distinct^47,48. Subsequently, we examine the statistical significance of the disparity between HNTriD and some typical approaches using t-test. Similar to⁴⁹, we take the significance level of $p < 0.05$ in the t-test to draw the difference. If our approach outperforms a compared method in a comparison test and the difference is statistically significant (t-test, $p < 0.05$ ), we record it as significant better or worse for one time. If the difference between our approach and a compared method is not statistically significant, then we say that they are comparable. We use (a, b, c) to display comparison results. Three integers inside the brackets respectively correspond to the number of times that the performance of our method is significantly better than, comparable to, significantly worse than a related method. We compare 10 algorithms (including HNTriD) on both the COIL20 and MNIST datasets. In each comparison, we run all the compared algorithms 10 times, and we repeat each group of comparison experiments 10 times. Specifically, the statistical test results are presented in Tables 7 and 8.

Table 7.

The t-test comparison results of different methods on COIL20.

	Metrics	NMF	GNMF	HNMF	HSNMF	SHNMF	HGNTR	LraHGNTR	HyperNTF	TriD
HNTriD	ACC	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(9,1,0)
	NMI	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)
	PUR	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)	(10,0,0)
	overall	(30,0,0)	(30,0,0)	(30,0,0)	(30,0,0)	(30,0,0)	(30,0,0)	(30,0,0)	(30,0,0)	(29,1,0)

Open in a new tab

Table 8.

The t-test comparison results of different methods on MNIST.

	Metrics	NMF	GNMF	HNMF	HSNMF	SHNMF	HGNTR	LraHGNTR	HyperNTF	TriD
HNTriD	ACC	(3,5,2)	(3,5,2)	(5,3,2)	(10,0,0)	(2,4,4)	(3,6,1)	(7,2,1)	(7,1,2)	(5,5,0)
	NMI	(9,1,0)	(6,3,1)	(7,2,1)	(10,0,0)	(8,2,0)	(7,2,1)	(9,1,0)	(8,1,1)	(10,0,0)
	PUR	(6,4,0)	(6,3,1)	(6,6,2)	(10,0,0)	(4,4,2)	(3,6,1)	(6,4,0)	(9,0,1)	(9,1,0)
	overall	(18,10,2)	(15,11,4)	(18,7,5)	(30,0,0)	(14,10,6)	(13,14,3)	(22,7,1)	(24,2,4)	(24,6,0)

Open in a new tab

According to comparison tests show in Table 7, our method is significantly superior to the compared methods on the COIL20 dataset. The experimental results on MNIST demonstrate a clear decline in performance compared to the experimental findings on COIL20. However, from the overview of all metrics’ evaluation, the results still demonstrate a high level of performance when compared to other approaches. Based on the information provided in Tables 7 and 8, it is evident that our method demonstrates significant statistical advancements compared to the listed methods in most cases. The statistical test findings indicate that our method has a significantly bigger advantage over the other compared ones.

Visualization on clustering tasks

In order to visually demonstrate the clustering performance of HNTriD, we present cluster visualizations of several comparable approaches to assess the data learning capability of HNTriD. In this experiment, we choose the COIL20 dataset as a representative example to conduct comparative tests on clustering tasks. We specifically select 10 categories of data for analysis. The data analysis is shown in a two-dimensional space using t-SNE, and the cluster results are displayed in Figure 5 for visual comparison.

2-D visualizations of the clustering results of several algorithms using t-SNE on the COIL20 dataset. (a) NMF. (b) GNMF. (c) HNMF. (d) HSNMF. (e) SHNMF. (f) HGNTR. (g) LraHGNTR. (h) HyperNTF. (i) TriD. (j) HNTriD.

Figure 5 demonstrates that the HNTriD method, when applied to the mulitiway dataset, is capable of effectively discerning the differences between data samples. HNTriD outperforms other approaches in visually separating sample clusters in the COIL20 dataset, while some methods fail to completely separate samples from other clusters. This strategy enhances the reliability of the clustering data comparison experiment mentioned above and confirms that the inclusion of HNTriD improves the learning capability of multiway data.

Running time comparison

From the previous experimental results (including numerical experiments, statistical significance comparison, and visualization on clustering tasks), the HNTriD model shows better data analysis performance. However, it is important to take into account the time cost when applying mathematical models in real-life situations. This means that if we can improve the efficiency of calculations while preserving the quality of data analysis, the mathematical model will be more effective in practical applications. Based on this background, we figure out the time cost and use Figure 6 to record the running time of clustering tasks for each method on six related datasets. On each dataset, we compare the computational time required by each method to complete the same numerical tests described in Subsection 4.6. Each bar in the Figure 6 represents the total time needed for a method to complete the cluster analysis of a dataset, and different colors represent different algorithms. For example, for each dataset, the time cost of HNTriD is represented in yellow.

We can deduce the following statements from the bar graph: (i) Matrix-based decomposition methods are almost always faster than tensor-based ones. Matrix-based methods have an obvious advantage in terms of running speed for there are few factors that needed to be updated due to their special arithmetic expression. (ii) When compared to general methods, manifold learning ones take longer to complete clustering tasks in most cases. This occurs because manifold learning algorithms require updating more parameters in clustering data. (iii) When compared to matrix-based algorithms, the HNTriD algorithm takes longer to cluster tasks. Given the computational complexity of the algorithm, the experimental results are consistent with our expectations. The increase in computational time is due to the construction of the hypergraph and the depiction of raw data. (iv) Among the tensor-based methods, the HNTriD algorithm’s computation speed does not fall behind while maintaining its superior performance.

Conclusions

In this paper, the proposed HNTriD method performs well in multiway data learning because it combines the advantages of hypergraph learning and TriD. By constructing hypergraphs, it can reveal the complex structural information of more complex variables hidden among raw data. When combined with the TriD model, it can retain the multi-linear structure of high-order data while mining the potential information within the data and has strong data clustering abilities. Furthermore, we use the multiplicative update method to optimize the proposed HNTriD model, and experiments show that the new algorithm is convergent. The proposed algorithm is applied to six real-world datasets for clustering analysis, including COIL20, GEORGIA, MNIST, ORL, PIE, and USPS, and the data clustering results are compared to those of several existing algorithms. The experimental results demonstrate that the proposed HNTriD method is efficient and saves time in data analysis. In our current work, our hypergraph does not change once it is generated, which may result in a less-than-ideal hypergraph learned in some data with unexpected noise. The solution to this problem, however, is outside the scope of our current work, and we hope to improve it in the future.

Acknowledgements

Fatimah Abdul Razak is supported by Universiti Kebangsaan Malaysia under Grant GUP-2021-046. Qingshui Liao is supported by Scientific Research Foundation of Higher Education Institutions for Young Talents of Department of Education of Guizhou Province under Grant QJJ[2022]129. Qilong Liu is supported by Guizhou Provincial Basic Research Program (Natural Science) under Grant QKHJC-ZK[2023]YB245.

Author contributions

The idea of this article is proposed by Fatimah Abdul Razak, the exact algorithms are provided by Qingshui Liao, the numerical tests are conducted by Qingshui Liao, the English writing is finished by Qingshui Liao and polished by Qilong Liu and Fatimah Abdul Razak.

Data availability statements

The open datasets used in this manuscript have been linked in related parts, and the datasets generated during the current study are available from the corresponding author on reasonable request.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Qingshui Liao, Email: Liaoqingshui2021@163.com.

Fatimah Abdul Razak, Email: fatima84@ukm.edu.my.

References

1.Wold S, Esbensen K, Geladi P. Principal component analysis. Chemometr. Intell. Lab. 1987;2(1–3):37–52. [Google Scholar]
2.Stewart GW. On the early history of the singular value decomposition. SIAM Rev. 1993;35(4):551–566. [Google Scholar]
3.Beh, E. J., & Lombardo, R. Multiple and Multiway Correspondence Analysis. Wiley Interdiscip. Rev. Comput. Stat. 11 e1464. MR3999531, (2019). 10.1002/wics.1464
4.Martinez AM, Kak AC. PCA versus LDA. IEEE Trans. Pattern Anal. Mach. Intell. 2001;23(2):228–233. [Google Scholar]
5.Carroll JD, Chang JJ. Analysis of individual differences in multidimensional scaling via an n-way generalization of “eckart-young” decomposition. Psych. 1970;35(3):283–319. [Google Scholar]
6.Domanov I, Lathauwer LD. Canonical polyadic decomposition of third-order tensors: Reduction to generalized eigenvalue decomposition. SIAM J. Matrix Anal. App. 2014;35(2):636–660. [Google Scholar]
7.Kolda TG, Bader BW. Tensor decompositions and applications. SIAM Rev. 2009;51(3):455–500. [Google Scholar]
8.Ceulemans E, Kiers HA. Selecting among three-mode principal component models of different types and complexities: A numerical convex hull based method. Br. J. Math. Stat. Psychol. 2006;59(1):133–150. doi: 10.1348/000711005X64817. [DOI] [PubMed] [Google Scholar]
9.Kroonenberg, P. M. Applied Multiway Data Analysis. Wiley Series in Probability and Statistics, Wiley Interscience, Hoboken, NJ. MR2378349 (2008). 10.1002/9780470238004
10.Kiers HAL. Three-way methods for the analysis of qualitative and quantitative two-way data. Leiden, NL: DSWO Press; 1989. [Google Scholar]
11.Kroonenberg, P. M. Multiway extensions of the SVD. Advanced studies in behaviormetrics and data science T. Imaizumi, A. Nakayama, S. Yokoyama, (eds.) 141–157 (2020)
12.Lombardo R, Velden M, Beh EJ. Three-way correspondence analysis in R. R J. 2023;15(2):237–262. [Google Scholar]
13.Xu YY. Alternating proximal gradient method for sparse nonnegative Tucker decomposition. Math. Program. Comput. 2015;7:39–70. [Google Scholar]
14.Yokota T, Zdunek R, Cichocki A, Yamashita Y. Smooth nonnegative matrix and tensor factorizations for robust multi-way data analysis. Signal Process. 2015;113:234–249. [Google Scholar]
15.Wu Q, Zhang LQ, Cichocki A. Multifactor sparse feature extraction using convolutive nonnegative Tucker decomposition. Neurocomputing. 2014;129:17–24. [Google Scholar]
16.Tan HC, Yang ZX, Feng G, Wang WH, Ran B. Correlation analysis for tensor-based traffic data imputation method. Procedia Soc. Behav. Sci. 2013;96:2611–2620. [Google Scholar]
17.Qi LQ, Chen YN, Bakshi M, Zhang XZ. Triple decomposition and tensor recovery of third order tensors. SIAM J. Matrix Anal. Appl. 2021;42(1):299–329. [Google Scholar]
18.Cai D, He XF, Han JW, Huang TS. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011;33(8):1548–1560. doi: 10.1109/TPAMI.2010.231. [DOI] [PubMed] [Google Scholar]
19.Chen KY, Che HJ, Li XQ, Leung MF. Graph non-negative matrix factorization with alternative smoothed l 0 regularizations. Neural Comput. Appl. 2023;35(14):9995–10009. [Google Scholar]
20.Deng P, Li TR, Wang HJ, Horng SJ, Yu Z, Wang XM. Tri-regularized nonnegative matrix tri-factorization for co-clustering. Knowl-Based Syst. 2021;226:107101. [Google Scholar]
21.Li CL, Che HJ, Leung MF, Liu C, Yan Z. Robust multi-view non-negative matrix factorization with adaptive graph and diversity constraints. Inf. Sci. 2023;634:587–607. [Google Scholar]
22.Lv LS, Bardou D, Hu P, Liu YQ, Yu GH. Graph regularized nonnegative matrix factorization for link prediction in directed temporal networks using pagerank centrality. Chaos Solitons Fractals. 2022;159:112107. [Google Scholar]
23.Nasiri E, Berahmand K, Li YF. Robust graph regularization nonnegative matrix factorization for link prediction in attributed networks. Multimed. Tools Appl. 2023;82(3):3745–3768. [Google Scholar]
24.Wang Q, He X, Jiang X, Li XL. Robust bi-stochastic graph regularized matrix factorization for data clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2020;44(1):390–403. doi: 10.1109/TPAMI.2020.3007673. [DOI] [PubMed] [Google Scholar]
25.Li XT, Ng MK, Cong G, Ye YM, Wu QY. MR-NTD: Manifold regularization nonnegative Tucker decomposition for tensor data dimension reduction and representation. IEEE Trans. Neural Netw. Learn. Syst. 2016;28(8):1787–1800. doi: 10.1109/TNNLS.2016.2545400. [DOI] [PubMed] [Google Scholar]
26.Qiu YN, Zhou GX, Wang YJ, Zhang Y, Xie SL. A generalized graph regularized non-negative Tucker decomposition framework for tensor data representation. IEEE T. Cybern. 2020;52(1):594–607. doi: 10.1109/TCYB.2020.2979344. [DOI] [PubMed] [Google Scholar]
27.Liu Q, Lu L, Chen Z. Non-negative Tucker decomposition with graph regularization and smooth constraint for clustering. Pattern Recognit. 2024;148:110207. [Google Scholar]
28.Wu FS, Li CQ, Li YT. Manifold regularization nonnegative triple decomposition of tensor sets for image compression and representation. J. Optimiz. Theory App. 2022;192(3):979–1000. [Google Scholar]
29.Zeng K, Yu J, Li CH, You J, Jin T. Image clustering by hyper-graph regularized non-negative matrix factorization. Neurocomputing. 2014;138:209–217. [Google Scholar]
30.Wang WH, Qian YT, Tang YY. Hypergraph-regularized sparse NMF for hyperspectral unmixing. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens. 2016;9(2):681–694. [Google Scholar]
31.Huang S, Wang HX, Ge YX, Huangfu L, Zhang XH, Yang D. Improved hypergraph regularized nonnegative matrix factorization with sparse representation. Pattern Recognit. Lett. 2018;102:8–14. [Google Scholar]
32.Yin WG, Qu YZ, Ma ZM, Liu QY. HyperNTF: A hypergraph regularized nonnegative tensor factorization for dimensionality reduction. Neurocomputing. 2022;512:190–202. [Google Scholar]
33.Zhao XH, Yu YY, Zhou GX, Zhao QB, Sun WJ. Fast hypergraph regularized nonnegative tensor ring decomposition based on low-rank approximation. Appl. Intell. 2022;52(15):17684–17707. [Google Scholar]
34.Huang ZH, Zhou GX, Qiu YN, Yu YY, Dai H. A dynamic hypergraph regularized non-negative Tucker decomposition framework for multiway data analysis. Int. J. Mach. Learn. Cybern. 2022;13(12):3691–3710. [Google Scholar]
35.Kim, Y. D., & Choi, S. Nonnegative Tucker decomposition. In IEEE Comput. Vis. Pattern Recognit., pp. 1–8 (2007). IEEE
36.Gao Y, Zhang ZZ, Lin HJ, Zhao XB, Du SY, Zu CQ. Hypergraph learning: Methods and practices. IEEE Trans. Pattern Anal. Mach. Intell. 2020;44(5):2548–2566. doi: 10.1109/TPAMI.2020.3039374. [DOI] [PubMed] [Google Scholar]
37.Bretto A. Hypergraph Theory. New York: Springer; 2013. [Google Scholar]
38.Zhang ZH, Bai L, Liang YH, Hancock E. Joint hypergraph learning and sparse regression for feature selection. Pattern Recognit. 2017;63:291–309. [Google Scholar]
39.Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature. 1999;401(6755):788–791. doi: 10.1038/44565. [DOI] [PubMed] [Google Scholar]
40.Zhou, D. Y., Huang, J. Y., & Schölkopf, B. Learning with hypergraphs: Clustering, classification, and embedding. Adv. Neural Inf. Process. Syst.19 (2006)
41.Boyd S, Boyd SP, Vandenberghe L. Convex Optimization. Cambridge: Cambridge Univ. Press; 2004. [Google Scholar]
42.Lee DD, Seung HS. Algorithms for non-negative matrix factorization. Proc. Adv. Neural Inf. Process. Syst. 2001;1:556–562. [Google Scholar]
43.Wang CY, Yu N, Wu MJ, Gao YL, Liu JX, Wang J. Dual hyper-graph regularized supervised NMF for selecting differentially expressed genes and tumor classification. IEEE ACM Trans. Comput. Biol. Bioinf. 2020;18(6):2375–2383. doi: 10.1109/TCBB.2020.2975173. [DOI] [PubMed] [Google Scholar]
44.Razak, F. A. The derivation of mutual information and covariance function using centered random variables. In AIP Conference Proceedings, vol. 1635, pp. 883–889 (2014). AIP
45.Yin M, Gao JB, Xie SL, Guo Y. Multiview subspace clustering via tensorial t-product representation. IEEE Trans. Neural Netw. Learn. Syst. 2018;30(3):851–864. doi: 10.1109/TNNLS.2018.2851444. [DOI] [PubMed] [Google Scholar]
46.Li S, Li W, Lu H, Li Y. Semi-supervised non-negative matrix tri-factorization with adaptive neighbors and block-diagonal learning. Eng. Appl. Artif. Intell. 2023;121:106043. [Google Scholar]
47.Demšar J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006;7:1–30. [Google Scholar]
48.Huang D, Wang CD, Lai JH. Locally weighted ensemble clustering. IEEE Trans. Cybern. 2017;48(5):1460–1473. doi: 10.1109/TCYB.2017.2702343. [DOI] [PubMed] [Google Scholar]
49.Zhang GY, Zhou YR, He XY, Wang CD, Huang D. One-step kernel multi-view subspace clustering. Knowl. Based Syst. 2020;189:105126. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The open datasets used in this manuscript have been linked in related parts, and the datasets generated during the current study are available from the corresponding author on reasonable request.

[CR1] 1.Wold S, Esbensen K, Geladi P. Principal component analysis. Chemometr. Intell. Lab. 1987;2(1–3):37–52. [Google Scholar]

[CR2] 2.Stewart GW. On the early history of the singular value decomposition. SIAM Rev. 1993;35(4):551–566. [Google Scholar]

[CR3] 3.Beh, E. J., & Lombardo, R. Multiple and Multiway Correspondence Analysis. Wiley Interdiscip. Rev. Comput. Stat. 11 e1464. MR3999531, (2019). 10.1002/wics.1464

[CR4] 4.Martinez AM, Kak AC. PCA versus LDA. IEEE Trans. Pattern Anal. Mach. Intell. 2001;23(2):228–233. [Google Scholar]

[CR5] 5.Carroll JD, Chang JJ. Analysis of individual differences in multidimensional scaling via an n-way generalization of “eckart-young” decomposition. Psych. 1970;35(3):283–319. [Google Scholar]

[CR6] 6.Domanov I, Lathauwer LD. Canonical polyadic decomposition of third-order tensors: Reduction to generalized eigenvalue decomposition. SIAM J. Matrix Anal. App. 2014;35(2):636–660. [Google Scholar]

[CR7] 7.Kolda TG, Bader BW. Tensor decompositions and applications. SIAM Rev. 2009;51(3):455–500. [Google Scholar]

[CR8] 8.Ceulemans E, Kiers HA. Selecting among three-mode principal component models of different types and complexities: A numerical convex hull based method. Br. J. Math. Stat. Psychol. 2006;59(1):133–150. doi: 10.1348/000711005X64817. [DOI] [PubMed] [Google Scholar]

[CR9] 9.Kroonenberg, P. M. Applied Multiway Data Analysis. Wiley Series in Probability and Statistics, Wiley Interscience, Hoboken, NJ. MR2378349 (2008). 10.1002/9780470238004

[CR10] 10.Kiers HAL. Three-way methods for the analysis of qualitative and quantitative two-way data. Leiden, NL: DSWO Press; 1989. [Google Scholar]

[CR11] 11.Kroonenberg, P. M. Multiway extensions of the SVD. Advanced studies in behaviormetrics and data science T. Imaizumi, A. Nakayama, S. Yokoyama, (eds.) 141–157 (2020)

[CR12] 12.Lombardo R, Velden M, Beh EJ. Three-way correspondence analysis in R. R J. 2023;15(2):237–262. [Google Scholar]

[CR13] 13.Xu YY. Alternating proximal gradient method for sparse nonnegative Tucker decomposition. Math. Program. Comput. 2015;7:39–70. [Google Scholar]

[CR14] 14.Yokota T, Zdunek R, Cichocki A, Yamashita Y. Smooth nonnegative matrix and tensor factorizations for robust multi-way data analysis. Signal Process. 2015;113:234–249. [Google Scholar]

[CR15] 15.Wu Q, Zhang LQ, Cichocki A. Multifactor sparse feature extraction using convolutive nonnegative Tucker decomposition. Neurocomputing. 2014;129:17–24. [Google Scholar]

[CR16] 16.Tan HC, Yang ZX, Feng G, Wang WH, Ran B. Correlation analysis for tensor-based traffic data imputation method. Procedia Soc. Behav. Sci. 2013;96:2611–2620. [Google Scholar]

[CR17] 17.Qi LQ, Chen YN, Bakshi M, Zhang XZ. Triple decomposition and tensor recovery of third order tensors. SIAM J. Matrix Anal. Appl. 2021;42(1):299–329. [Google Scholar]

[CR18] 18.Cai D, He XF, Han JW, Huang TS. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011;33(8):1548–1560. doi: 10.1109/TPAMI.2010.231. [DOI] [PubMed] [Google Scholar]

[CR19] 19.Chen KY, Che HJ, Li XQ, Leung MF. Graph non-negative matrix factorization with alternative smoothed l 0 regularizations. Neural Comput. Appl. 2023;35(14):9995–10009. [Google Scholar]

[CR20] 20.Deng P, Li TR, Wang HJ, Horng SJ, Yu Z, Wang XM. Tri-regularized nonnegative matrix tri-factorization for co-clustering. Knowl-Based Syst. 2021;226:107101. [Google Scholar]

[CR21] 21.Li CL, Che HJ, Leung MF, Liu C, Yan Z. Robust multi-view non-negative matrix factorization with adaptive graph and diversity constraints. Inf. Sci. 2023;634:587–607. [Google Scholar]

[CR22] 22.Lv LS, Bardou D, Hu P, Liu YQ, Yu GH. Graph regularized nonnegative matrix factorization for link prediction in directed temporal networks using pagerank centrality. Chaos Solitons Fractals. 2022;159:112107. [Google Scholar]

[CR23] 23.Nasiri E, Berahmand K, Li YF. Robust graph regularization nonnegative matrix factorization for link prediction in attributed networks. Multimed. Tools Appl. 2023;82(3):3745–3768. [Google Scholar]

[CR24] 24.Wang Q, He X, Jiang X, Li XL. Robust bi-stochastic graph regularized matrix factorization for data clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2020;44(1):390–403. doi: 10.1109/TPAMI.2020.3007673. [DOI] [PubMed] [Google Scholar]

[CR25] 25.Li XT, Ng MK, Cong G, Ye YM, Wu QY. MR-NTD: Manifold regularization nonnegative Tucker decomposition for tensor data dimension reduction and representation. IEEE Trans. Neural Netw. Learn. Syst. 2016;28(8):1787–1800. doi: 10.1109/TNNLS.2016.2545400. [DOI] [PubMed] [Google Scholar]

[CR26] 26.Qiu YN, Zhou GX, Wang YJ, Zhang Y, Xie SL. A generalized graph regularized non-negative Tucker decomposition framework for tensor data representation. IEEE T. Cybern. 2020;52(1):594–607. doi: 10.1109/TCYB.2020.2979344. [DOI] [PubMed] [Google Scholar]

[CR27] 27.Liu Q, Lu L, Chen Z. Non-negative Tucker decomposition with graph regularization and smooth constraint for clustering. Pattern Recognit. 2024;148:110207. [Google Scholar]

[CR28] 28.Wu FS, Li CQ, Li YT. Manifold regularization nonnegative triple decomposition of tensor sets for image compression and representation. J. Optimiz. Theory App. 2022;192(3):979–1000. [Google Scholar]

[CR29] 29.Zeng K, Yu J, Li CH, You J, Jin T. Image clustering by hyper-graph regularized non-negative matrix factorization. Neurocomputing. 2014;138:209–217. [Google Scholar]

[CR30] 30.Wang WH, Qian YT, Tang YY. Hypergraph-regularized sparse NMF for hyperspectral unmixing. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens. 2016;9(2):681–694. [Google Scholar]

[CR31] 31.Huang S, Wang HX, Ge YX, Huangfu L, Zhang XH, Yang D. Improved hypergraph regularized nonnegative matrix factorization with sparse representation. Pattern Recognit. Lett. 2018;102:8–14. [Google Scholar]

[CR32] 32.Yin WG, Qu YZ, Ma ZM, Liu QY. HyperNTF: A hypergraph regularized nonnegative tensor factorization for dimensionality reduction. Neurocomputing. 2022;512:190–202. [Google Scholar]

[CR33] 33.Zhao XH, Yu YY, Zhou GX, Zhao QB, Sun WJ. Fast hypergraph regularized nonnegative tensor ring decomposition based on low-rank approximation. Appl. Intell. 2022;52(15):17684–17707. [Google Scholar]

[CR34] 34.Huang ZH, Zhou GX, Qiu YN, Yu YY, Dai H. A dynamic hypergraph regularized non-negative Tucker decomposition framework for multiway data analysis. Int. J. Mach. Learn. Cybern. 2022;13(12):3691–3710. [Google Scholar]

[CR35] 35.Kim, Y. D., & Choi, S. Nonnegative Tucker decomposition. In IEEE Comput. Vis. Pattern Recognit., pp. 1–8 (2007). IEEE

[CR36] 36.Gao Y, Zhang ZZ, Lin HJ, Zhao XB, Du SY, Zu CQ. Hypergraph learning: Methods and practices. IEEE Trans. Pattern Anal. Mach. Intell. 2020;44(5):2548–2566. doi: 10.1109/TPAMI.2020.3039374. [DOI] [PubMed] [Google Scholar]

[CR37] 37.Bretto A. Hypergraph Theory. New York: Springer; 2013. [Google Scholar]

[CR38] 38.Zhang ZH, Bai L, Liang YH, Hancock E. Joint hypergraph learning and sparse regression for feature selection. Pattern Recognit. 2017;63:291–309. [Google Scholar]

[CR39] 39.Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature. 1999;401(6755):788–791. doi: 10.1038/44565. [DOI] [PubMed] [Google Scholar]

[CR40] 40.Zhou, D. Y., Huang, J. Y., & Schölkopf, B. Learning with hypergraphs: Clustering, classification, and embedding. Adv. Neural Inf. Process. Syst.19 (2006)

[CR41] 41.Boyd S, Boyd SP, Vandenberghe L. Convex Optimization. Cambridge: Cambridge Univ. Press; 2004. [Google Scholar]

[CR42] 42.Lee DD, Seung HS. Algorithms for non-negative matrix factorization. Proc. Adv. Neural Inf. Process. Syst. 2001;1:556–562. [Google Scholar]

[CR43] 43.Wang CY, Yu N, Wu MJ, Gao YL, Liu JX, Wang J. Dual hyper-graph regularized supervised NMF for selecting differentially expressed genes and tumor classification. IEEE ACM Trans. Comput. Biol. Bioinf. 2020;18(6):2375–2383. doi: 10.1109/TCBB.2020.2975173. [DOI] [PubMed] [Google Scholar]

[CR44] 44.Razak, F. A. The derivation of mutual information and covariance function using centered random variables. In AIP Conference Proceedings, vol. 1635, pp. 883–889 (2014). AIP

[CR45] 45.Yin M, Gao JB, Xie SL, Guo Y. Multiview subspace clustering via tensorial t-product representation. IEEE Trans. Neural Netw. Learn. Syst. 2018;30(3):851–864. doi: 10.1109/TNNLS.2018.2851444. [DOI] [PubMed] [Google Scholar]

[CR46] 46.Li S, Li W, Lu H, Li Y. Semi-supervised non-negative matrix tri-factorization with adaptive neighbors and block-diagonal learning. Eng. Appl. Artif. Intell. 2023;121:106043. [Google Scholar]

[CR47] 47.Demšar J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006;7:1–30. [Google Scholar]

[CR48] 48.Huang D, Wang CD, Lai JH. Locally weighted ensemble clustering. IEEE Trans. Cybern. 2017;48(5):1460–1473. doi: 10.1109/TCYB.2017.2702343. [DOI] [PubMed] [Google Scholar]

[CR49] 49.Zhang GY, Zhou YR, He XY, Wang CD, Huang D. One-step kernel multi-view subspace clustering. Knowl. Based Syst. 2020;189:105126. [Google Scholar]

PERMALINK

Hypergraph regularized nonnegative triple decomposition for multiway data analysis

Qingshui Liao

Qilong Liu

Fatimah Abdul Razak

Abstract

Introduction

Related work

Table 1.

Nonnegative tensor decomposition (NTD)

Bilevel form of triple decomposition (TriD)

Definition 1

Definition 2

Hypergraph learning

Figure 1.

Hypergraph regularized nonnegative triple decomposition (HNTriD)

Objective function of HNTriD

Figure 2.

Optimization algorithm

Lemma 1

Lemma 2

Proof

Solutions of inner factor tensors

Solutions of factor matrices

Convergence analysis theorically

Definition 3

Lemma 3

Lemma 4

Proof

Theorem 1

Proof

Lemma 5

Proof

Theorem 2

Proof

Remark 1

Algorithm 1.

Computational complexity analysis

Experiments

Datasets

Table 2.

Evaluation metrics

Algorithms for comparison

Parameters selection

Table 3.

Figure 3.

Convergence study experimentally

Figure 4.

Experimental comparison

Numerical comparison results

Table 4.

Table 5.

Table 6.

Figure 6.

Statistical significance comparison

Table 7.

Table 8.

Visualization on clustering tasks

Figure 5.

Running time comparison

Conclusions

Acknowledgements

Author contributions

Data availability statements

Competing interests

Footnotes

Contributor Information

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases