PLoS ONE. 2018 Jul 25;13(7):e0200579. doi: 10.1371/journal.pone.0200579

CTD: Fast, accurate, and interpretable method for static and dynamic tensor decompositions

Jungwoo Lee 1, Dongjin Choi 1, Lee Sael 1,*
Editor: Wenjie Ruan
PMCID: PMC6059458  PMID: 30044837

Abstract

How can we find patterns and anomalies in a tensor, i.e., a multi-dimensional array, in an efficient and directly interpretable way? How can we do this in an online environment, where a new tensor arrives at each time step? Finding patterns and anomalies in multi-dimensional data has many important applications, including building safety monitoring, health monitoring, cyber security, terrorist detection, and fake user detection in social networks. Standard tensor decomposition results are not directly interpretable, and the few methods that propose to increase interpretability need to be made faster, more memory-efficient, and more accurate for large and quickly generated data in the online environment. We propose two versions of a fast, accurate, and directly interpretable tensor decomposition method, called CTD, based on an efficient sampling method. The first is the static version of CTD, i.e., CTD-S, which provably guarantees up to 11× higher accuracy than the state-of-the-art method. CTD-S is also up to 2.3× faster and up to 24× more memory-efficient than the state-of-the-art method, achieved by removing redundancy. The second is the dynamic version of CTD, i.e., CTD-D, which is the first interpretable dynamic tensor decomposition method ever proposed. It is also up to 82× faster than the already fast CTD-S, achieved by exploiting the factors at the previous time step and by reordering operations. With CTD, we demonstrate how the results can be effectively interpreted in online distributed denial of service (DDoS) attack detection and online troll detection.

Introduction

Given a tensor, or multi-dimensional array, how can we find patterns and anomalies in an efficient and directly interpretable way? How can we do this in an online environment, where new data arrive at each time step? Many real-world data are multi-dimensional and can be modeled as sparse tensors. Examples include network traffic data (source IP—destination IP—time), movie rating data (user—movie—time), IoT sensor data, and healthcare data. Finding patterns and anomalies in such tensor data is a very important problem with many applications such as building safety monitoring [1], patient health monitoring [2–5], cyber security [6], terrorist detection [7–9], and fake user detection in social networks [10, 11]. Tensor decomposition, a widely used tool in tensor analysis, has been used for this task. However, standard tensor decomposition methods such as PARAFAC [12] and Tucker [13] do not provide interpretability and are not applicable for real-time analysis in environments with high-velocity data.

Sampling-based tensor decomposition methods [14–16] arose as an alternative due to their direct interpretability. The direct interpretability not only reduces the time and effort involved in finding patterns and anomalies from the decomposed tensors but also provides clarity in interpreting the result. A sampling-based decomposition method for sparse tensors is also memory-efficient since it preserves the sparsity of the original tensor in the sampled factor matrices. However, existing sampling-based tensor decomposition methods are slow, have high memory usage, and produce low accuracy. For example, Tensor-CUR [16], the state-of-the-art sampling-based static tensor decomposition method, has many redundant fibers, including duplicates, in its factors. This redundancy causes higher memory usage and longer running time. Tensor-CUR is also not accurate enough for real-world tensor analysis.

In addition to interpretability, the demand for online methods applicable in a dynamic environment, where multi-dimensional data are generated continuously at a fast rate, is also increasing. Real-time analysis is not feasible with static methods since all the data, i.e., the historical and incoming tensors, need to be decomposed all over again at each time step. A few dynamic tensor decomposition methods have been proposed [17–19]. However, these methods are not directly interpretable and do not preserve sparsity. To the best of our knowledge, no sampling-based dynamic tensor decomposition method has been proposed.

In this paper, we propose CTD (Compact Tensor Decomposition), a fast, accurate, and interpretable sampling-based tensor decomposition method. CTD has two versions: CTD-S for static tensors and CTD-D for dynamic tensors. CTD-S is optimal after sampling and results in a compact tensor decomposition through careful sampling and redundancy elimination, thereby providing much better running time and memory efficiency than previous methods. CTD-D, the first sampling-based dynamic tensor decomposition method in the literature, minimally updates only the components altered by the incoming data, making the method applicable for real-time analysis in a dynamic environment. Table 1 compares CTD with the existing method, Tensor-CUR.

Table 1. Comparison of our proposed CTD and the existing Tensor-CUR.

The static method CTD-S outperforms the state-of-the-art Tensor-CUR in terms of time, memory usage, and accuracy. The dynamic method CTD-D is the fastest.

                   Existing           Proposed
                   Tensor-CUR [16]    CTD-S     CTD-D
 Interpretability  ✓                  ✓         ✓
 Time              fast               faster    fastest
 Memory usage      low                lower     low
 Accuracy          low                high      high
 Online                                         ✓

Our main contributions are as follows:

  • Method. We propose CTD, a fast, accurate, and directly interpretable tensor decomposition method. We prove the optimality of the static method CTD-S which makes it more accurate than the state-of-the-art method. Also, to the best of our knowledge, the dynamic method CTD-D is the first sampling-based dynamic tensor decomposition method.

  • Performance. CTD-S is up to 11× more accurate, 2.3× faster, and 24× more memory-efficient compared to Tensor-CUR, the state-of-the-art competitor. CTD-D is up to 82× faster than CTD-S.

  • Interpretable Analysis. We show how CTD results are directly interpreted to successfully detect DDoS attacks in network traffic data and trolls in social network data.

The code and datasets used in this paper are available at https://github.com/leesael/CTD. The rest of this paper is organized as follows. We first describe preliminaries and related works for tensors and sampling-based decompositions. We then describe our proposed method CTD and the experimental results. After presenting CTD at work, we conclude the paper.

Preliminaries and related works

In this section, we describe preliminaries and related works for tensor and sampling-based decompositions. Table 2 lists the definitions of symbols used in this paper.

Table 2. Table of symbols.

Symbol     Definition
𝒳          tensor (Euler script, bold letter)
X          matrix (uppercase, bold letter)
x          column vector (lowercase, bold letter)
x          scalar (lowercase, italic letter)
X_(n)      mode-n matricization of a tensor 𝒳
X^†        pseudoinverse of X
N          order of a tensor
×_n        n-mode product
‖•‖_F      Frobenius norm
nnz(X)     number of nonzero elements in X

Tensor

A tensor is a multi-dimensional array and is denoted by a boldface Euler script letter, e.g., 𝒳 ∈ ℝ^{I_1×⋯×I_N}, where N denotes the order (the number of axes) of 𝒳. Each axis of a tensor is also known as a mode or way. A fiber is a vector (1-mode tensor) obtained by fixing all indices but one; every index of a mode-n fiber is fixed except the n-th index. A fiber can be regarded as a higher-order analogue of a matrix row or column: a matrix column and row correspond to a mode-1 fiber and a mode-2 fiber, respectively. A slab is an (N − 1)-mode tensor which has only one fixed index. X_(α) ∈ ℝ^{I_α×N_α} denotes the mode-α matricization of 𝒳, where N_α = ∏_{n≠α} I_n. X_(α) is formed by rearranging the mode-α fibers of 𝒳 as the columns of X_(α). ‖𝒳‖_F is the Frobenius norm of 𝒳 and is defined by Eq (1).

$\|\mathcal{X}\|_F^2 = \sum_{i_1, i_2, \ldots, i_N} x_{i_1 i_2 \cdots i_N}^2$ (1)

𝒳 ×_n U ∈ ℝ^{I_1×⋯×I_{n−1}×J×I_{n+1}×⋯×I_N} denotes the n-mode product of a tensor 𝒳 ∈ ℝ^{I_1×⋯×I_N} with a matrix U ∈ ℝ^{J×I_n}. Elementwise,

$(\mathcal{X} \times_n U)_{i_1 \cdots i_{n-1} j i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 \cdots i_{n-1} i_n i_{n+1} \cdots i_N} u_{j i_n}$ (2)

𝒳 ×_n U has the property shown in Eq (3).

$\mathcal{Y} = \mathcal{X} \times_n U \;\Leftrightarrow\; Y_{(n)} = U X_{(n)}$ (3)

We assume that a matrix or tensor is stored in a sparse, unordered representation (i.e., only nonzero entries are stored, each as a pair of indices and the corresponding value). nnz(X) denotes the number of nonzero elements in X.
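
To make the notation concrete, the following is a minimal NumPy sketch (our own illustration; the released CTD code is in MATLAB) of mode-n matricization and the n-mode product, together with a numerical check of the property in Eq (3). The particular column ordering produced by this reshape is one valid matricization convention; only the set of mode-n fibers matters for the discussion below.

import numpy as np

def matricize(X, n):
    # Mode-n matricization: the mode-n fibers of X become the columns of the result.
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

def mode_n_product(X, U, n):
    # n-mode product X x_n U, computed through the matricized form: Y_(n) = U X_(n) (Eq (3)).
    Y_n = U @ matricize(X, n)
    shape = [U.shape[0]] + [s for i, s in enumerate(X.shape) if i != n]
    return np.moveaxis(Y_n.reshape(shape, order='F'), 0, n)

# Numerical check of Eq (3) on a random 3-mode tensor.
X = np.random.rand(4, 5, 6)
U = np.random.rand(3, 5)
Y = mode_n_product(X, U, 1)
assert np.allclose(matricize(Y, 1), U @ matricize(X, 1))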

We describe existing sampling-based matrix and tensor decomposition methods in the following subsections.

Sampling based matrix decomposition

Sampling-based matrix decomposition methods sample columns or rows from a given matrix and use them to build their factors. They produce directly interpretable factors which preserve sparsity, since those factors directly reflect the sparsity of the original data. In contrast, the singular value decomposition (SVD) generates factors which are hard to understand and dense, because each factor is a linear combination of columns or rows of the given matrix. Definition 1 defines the CX matrix decomposition [20], a kind of sampling-based matrix decomposition.

Definition 1. Given a matrix A ∈ ℝ^{m×n}, the matrix Ã = CX is a CX matrix decomposition of A, where the matrix C ∈ ℝ^{m×c} consists of actual columns of A and X is any matrix of size c × n.

We introduce well-known CX matrix decomposition methods: LinearTimeCUR, CMD, and Colibri.

LinearTimeCUR and CMD

Drineas et al. [21] proposed LinearTimeCUR, and Sun et al. [22] proposed CMD. In the initial step, LinearTimeCUR and CMD sample columns from an original matrix A, with replacement, according to probabilities proportional to the norm of each column. Drineas et al. [21] proved that this biased sampling provides an optimal approximation. They then project A onto the column space spanned by the sampled columns and use the projection as the low-rank approximation of A. LinearTimeCUR has many duplicates in its factors because a column or row with a higher norm is likely to be selected multiple times. These duplicates make LinearTimeCUR slow and memory-hungry. CMD handles the duplication issue by removing duplicate columns and rows from the factors of LinearTimeCUR, thereby reducing running time and memory significantly.
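
The norm-biased sampling step shared by these CX-type methods (and, as described later, by CTD) can be sketched as follows in NumPy. This is our own simplified illustration: X is taken as the pseudoinverse-based projection coefficient C^+A, rather than the scaled estimators and row sampling used in the original algorithms.

import numpy as np

def sample_columns(A, s, seed=0):
    # Sample s column indices with replacement, with P(i) proportional to |A(:, i)|^2.
    rng = np.random.default_rng(seed)
    p = np.sum(A ** 2, axis=0) / np.sum(A ** 2)
    return rng.choice(A.shape[1], size=s, replace=True, p=p)

def cx_decomposition(A, s):
    # C keeps actual (possibly duplicated) columns of A; X projects A onto span(C).
    idx = sample_columns(A, s)
    C = A[:, idx]
    X = np.linalg.pinv(C) @ A
    return C, X

A = np.random.rand(100, 50)
C, X = cx_decomposition(A, 10)
rel_err = np.linalg.norm(A - C @ X, 'fro') / np.linalg.norm(A, 'fro')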

Colibri

Tong et al. [23] proposed Colibri-S, which improves CMD by removing all types of linear dependencies, including duplicates. Colibri-S is much faster and more memory-efficient than LinearTimeCUR and CMD because the dimension of its factors is much smaller. Tong et al. [23] also proposed the dynamic version, Colibri-D. Although Colibri-D can update its factors incrementally, it fixes the indices of the initially sampled columns, which need to be updated over time. Our CTD-D not only handles general dynamic tensors but also does not have to fix those indices.

Sampling based tensor decomposition

Sampling-based tensor decomposition methods sample actual fibers or slabs from an original tensor. In contrast to PARAFAC [12] and Tucker [13], the most famous tensor decomposition methods, the resulting factors of sampling-based tensor decomposition are easy to understand and usually sparse. There are two types of sampling-based tensor decomposition: one based on Tucker and the other based on the LR tensor decomposition defined in Definition 2. In Tucker-type sampling-based tensor decomposition (e.g., ApproxTensorSVD [14] and FBTD (fiber-based tensor decomposition) [15]), factor matrices for all modes are either sampled or generated; the overhead of generating a factor matrix for each mode makes these methods too slow for real-time analysis. We focus on sampling methods based on the LR tensor decomposition, which are faster than those based on the Tucker decomposition.

Definition 2. (LR tensor decomposition) Given a tensor 𝒳 ∈ ℝ^{I_1×I_2×⋯×I_N}, 𝒳̃ = ℒ ×_α R is a mode-α LR tensor decomposition of 𝒳, where the matrix R ∈ ℝ^{I_α×c} consists of actual mode-α fibers of 𝒳 and ℒ is any tensor of size I_1 × ⋯ × I_{α−1} × c × I_{α+1} × ⋯ × I_N.

Tensor-CUR. Mahoney et al. [16] proposed Tensor-CUR, a mode-α LR tensor decomposition method. Tensor-CUR is an n-dimensional extension of LinearTimeCUR. Tensor-CUR samples fibers and slabs from an original tensor and builds its factors using the sampled ones. The only difference between LinearTimeCUR and Tensor-CUR is that Tensor-CUR exploits fibers and slabs instead of columns and rows. Thus, Tensor-CUR has drawbacks similar to those of LinearTimeCUR. Tensor-CUR has many redundant fibers in its factors and these fibers make Tensor-CUR slow and use a large amount of memory.

Proposed method

In this section, we describe our proposed CTD (Compact Tensor Decomposition), an efficient and interpretable sampling-based tensor decomposition method. We first describe the static version CTD-S, and then the dynamic version CTD-D of CTD.

CTD-S for static tensors

Overview

How can we design an efficient sampling-based static tensor decomposition method? Tensor-CUR, the existing state-of-the-art method, has many redundant fibers in its factors, and these fibers make Tensor-CUR slow and consume a large amount of memory. Our proposed CTD-S method removes all dependencies from the sampled fibers and maintains only independent fibers; thus, CTD-S is faster and more memory-efficient than Tensor-CUR.

Algorithm

Fig 1 shows the scheme for CTD-S. CTD-S first samples fibers with probability biased toward the norm of each fiber. Three different fibers (red, blue, green) are sampled in Fig 1. Many duplicates exist after the biased sampling process, since CTD-S samples fibers multiple times with replacement and a fiber with a higher norm is likely to be sampled many times. There also exist linearly dependent fibers, such as the green fiber, which can be expressed as a linear combination of the red and blue ones. Those linearly dependent fibers, including duplicates, are redundant in that they give no new information when interpreting the result. CTD-S removes those redundant fibers and stores only the independent fibers in its factor R to keep the result compact. In Fig 1, CTD-S keeps only one red fiber and one blue fiber in R.

Fig 1. The scheme for CTD-S.


CTD-S decomposes a tensor 𝒳 ∈ ℝ^{I_1×I_2×⋯×I_N} into one tensor 𝒞 ∈ ℝ^{I_1×⋯×I_{α−1}×s̃×I_{α+1}×⋯×I_N} and two matrices U ∈ ℝ^{s̃×s̃} and R ∈ ℝ^{I_α×s̃} such that 𝒳 ≈ 𝒞 ×_α RU. CTD-S is a mode-α LR tensor decomposition method and is interpretable since R consists of independent fibers sampled from 𝒳.

Algorithm 1 CTD-S for Static Tensor

Input: Tensor 𝒳 ∈ ℝ^{I_1×I_2×⋯×I_N}, mode α ∈ {1, ⋯, N}, sample size s ∈ {1, ⋯, N_α}, and tolerance ϵ

Output: 𝒞 ∈ ℝ^{I_1×⋯×I_{α−1}×s̃×I_{α+1}×⋯×I_N}, U ∈ ℝ^{s̃×s̃}, R ∈ ℝ^{I_α×s̃}

1: Let X_(α) be the mode-α matricization of 𝒳
2: Compute the column distribution for i = 1, ⋯, N_α: P(i) ← |X_(α)(:,i)|² / ‖X_(α)‖_F²
3: Sample s columns from X_(α) based on P(i). Let I = {i_1, ⋯, i_s}
4: Let I′ = {i′_1, ⋯, i′_{s′}} be the set consisting of the unique elements of I
5: Initialize R ← [X_(α)(:,i′_1)] and U ← 1 / (X_(α)(:,i′_1)^T X_(α)(:,i′_1))
6: for k = 2 : s′ do
7:   Compute the residual: res ← X_(α)(:,i′_k) − R U R^T X_(α)(:,i′_k)
8:   if ‖res‖ ≤ ϵ ‖X_(α)(:,i′_k)‖ then
9:     continue
10:  else
11:    Compute: δ ← ‖res‖² and y ← U R^T X_(α)(:,i′_k)
12:    Update U: $U \leftarrow \begin{bmatrix} U + yy^T/\delta & -y/\delta \\ -y^T/\delta & 1/\delta \end{bmatrix}$
13:    Expand R: R ← [R, X_(α)(:,i′_k)]
14:  end if
15: end for
16: Compute 𝒞 ← 𝒳 ×_α R^T
17: return 𝒞, U, R

Algorithm 1 shows the procedure of CTD-S. First, CTD-S computes the sampling probabilities of the mode-α fibers of 𝒳, which are proportional to the norm of each fiber, and then samples s fibers from 𝒳 according to these probabilities with replacement, in lines 1-3. Redundant fibers may exist among the sampled fibers at this step. CTD-S selects the unique fibers from the initially sampled s fibers in line 4, where s′ denotes the number of those unique fibers. This step reduces the number of iterations in lines 6-15 from s − 1 to s′ − 1. R is initialized with the first sampled fiber in line 5. In lines 6-15, CTD-S removes redundant mode-α fibers from the sampled fibers. The matrices U and R are computed incrementally in this step, and the columns of R always consist of independent mode-α fibers throughout the loop. In each iteration, CTD-S checks whether a sampled fiber is linearly independent of the column space spanned by R in lines 7-8, using the residual tolerance ϵ. If the fiber is independent, CTD-S updates U and expands R with the fiber in lines 10-13. Finally, CTD-S computes 𝒞 from 𝒳 and R in line 16.
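
A minimal NumPy sketch of Algorithm 1 is given below for illustration. It operates directly on the mode-α matricization X_(α) and returns the matricized factor C_(α) = R^T X_(α) (folding it back into the tensor 𝒞 is omitted); it also uses dense arrays, whereas the released MATLAB implementation exploits sparsity. All variable names are ours.

import numpy as np

def ctd_s_matricized(Xa, s, eps=1e-6, seed=0):
    # Xa is the mode-alpha matricization X_(alpha); its columns are mode-alpha fibers.
    rng = np.random.default_rng(seed)
    p = np.sum(Xa ** 2, axis=0) / np.sum(Xa ** 2)             # line 2: fiber distribution
    I = rng.choice(Xa.shape[1], size=s, replace=True, p=p)    # line 3: biased sampling
    unique_idx = list(dict.fromkeys(I))                       # line 4: drop duplicates
    r0 = Xa[:, [unique_idx[0]]]
    R = r0                                                    # line 5: initialize R and U
    U = np.array([[1.0 / (r0.T @ r0).item()]])
    for i in unique_idx[1:]:                                  # lines 6-15
        x = Xa[:, [i]]
        res = x - R @ (U @ (R.T @ x))                         # line 7: residual w.r.t. span(R)
        if np.linalg.norm(res) <= eps * np.linalg.norm(x):    # line 8: dependent fiber, skip
            continue
        delta = np.linalg.norm(res) ** 2                      # line 11
        y = U @ (R.T @ x)
        U = np.block([[U + y @ y.T / delta, -y / delta],      # line 12: block update of U
                      [-y.T / delta, np.array([[1.0 / delta]])]])
        R = np.hstack([R, x])                                 # line 13: keep the new fiber
    C_a = R.T @ Xa                                            # line 16 in matricized form
    return C_a, U, R

Xa = np.random.rand(200, 1000)
C_a, U, R = ctd_s_matricized(Xa, s=50)
rel_err = np.linalg.norm(Xa - R @ U @ C_a, 'fro') ** 2 / np.linalg.norm(Xa, 'fro') ** 2

With ϵ = 0, the reconstruction R U C_a is the projection of X_(α) onto the span of the sampled fibers, matching the error analyzed in Lemma 2 below.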

Lemma 1 shows the computational cost of CTD-S.

Lemma 1. The computational complexity of CTD-S is $O((\tilde{s}I_\alpha + s)N_\alpha + s(\tilde{s}^2 + nnz(R)) + s\log s + nnz(\mathcal{X}))$, where $N_\alpha = \prod_{n \neq \alpha} I_n$ and $\tilde{s} \leq s' \leq s$.

Proof. The mode-α matricization of 𝒳 in line 1 needs O(nnz(𝒳)) operations. Computing the column distribution in line 2 takes O(nnz(𝒳) + N_α), and sampling s columns in line 3 takes O(sN_α). O(s log s) operations are required to compute the unique elements of I in line 4. Computing R and U in lines 5-15 takes $O(s(\tilde{s}^2 + nnz(R)))$, as proved in Lemma 1 of [23]. Computing 𝒞 in line 16 takes $O(\tilde{s}I_\alpha N_\alpha)$. Overall, CTD-S needs $O((\tilde{s}I_\alpha + s)N_\alpha + s(\tilde{s}^2 + nnz(R)) + s\log s + nnz(\mathcal{X}))$ operations.

Lemma 2 shows that CTD-S has the optimal accuracy for given sampled fibers and ϵ = 0, thus is more accurate than Tensor-CUR.

Lemma 2. CTD-S has the minimum error, and thus is more accurate than Tensor-CUR, for a given R_0 consisting of the initially sampled fibers when the residual tolerance ϵ = 0.

Proof. CTD-S and Tensor-CUR are both mode-α LR tensor decomposition methods, and both sample fibers from 𝒳 in the same way in the initial step. Let R_0 be the matrix consisting of those initially sampled fibers, and assume the same R_0 is given to CTD-S and Tensor-CUR. Then, the reconstruction error of 𝒳 given R_0 is a function of L_(α), as shown in Eq (4). The equality comes from Eq (3).

$\|\mathcal{X} - \mathcal{L} \times_\alpha R_0\|_F = \|X_{(\alpha)} - R_0 L_{(\alpha)}\|_F$ (4)

The reconstruction error attains its global minimum when $L_{(\alpha)} = R_0^{\dagger} X_{(\alpha)}$. Eq (5) shows the minimum reconstruction error.

$\min_{L_{(\alpha)}} \|X_{(\alpha)} - R_0 L_{(\alpha)}\|_F = \|X_{(\alpha)} - R_0 R_0^{\dagger} X_{(\alpha)}\|_F$ (5)

Let R be the corresponding factor of CTD-S. R consists of the independent columns of R_0 since the tolerance ϵ = 0. We show in Eq (6) that CTD-S attains this minimum reconstruction error.

$\|X_{(\alpha)} - R_0 R_0^{\dagger} X_{(\alpha)}\|_F = \|X_{(\alpha)} - R R^{\dagger} X_{(\alpha)}\|_F = \|X_{(\alpha)} - R(R^T R)^{-1} R^T X_{(\alpha)}\|_F = \|X_{(\alpha)} - R U C_{(\alpha)}\|_F = \text{Error of CTD-S}$ (6)

The first equality in Eq (6) holds because $R_0 R_0^{\dagger} X_{(\alpha)}$ is the projection of X_(α) onto the column space of R_0, and R and R_0 have the same column space. The third equality holds because CTD-S uses (R^T R)^{-1} for its factor U (Theorem 1 in [23]) and R^T X_(α) for its factor C. In contrast, Tensor-CUR does not attain the minimum reconstruction error because its L_(α) differs from $R_0^{\dagger} X_{(\alpha)}$. Specifically, Tensor-CUR further samples rows (called slabs) from X_(α) to construct its L_(α).

CTD-D for dynamic tensors

Overview

How can we design an efficient sampling-based dynamic tensor decomposition method? In a dynamic setting, a new tensor arrives at every time step, and we want to keep the sampling-based tensor decomposition up to date. The main challenge is to update the factors quickly while preserving accuracy. Note that there has been no sampling-based dynamic tensor decomposition method in the literature. Applying CTD-S at every time step is not a feasible option since it recomputes its factors from scratch, and thus the running time increases rapidly as the tensor grows. We propose CTD-D, the first sampling-based dynamic tensor decomposition method. CTD-D samples mode-α fibers only from the newly arrived tensor, and then updates the factors appropriately using the sampled fibers. The main idea of CTD-D is to update the factors of CTD-S incrementally by (1) exploiting the factors at the previous time step and (2) reordering operations.

Algorithm 2 CTD-D for Dynamic Tensor

Input: Tensor Δ𝒳 ∈ ℝ^{I_1×⋯×I_{N−1}×1}, mode α ∈ {1, ⋯, N − 1}, 𝒞^{(t)}, U^{(t)}, R^{(t)}, sample size d ∈ {1, ⋯, ΔN_α}, and tolerance ϵ

Output: 𝒞^{(t+1)}, U^{(t+1)}, R^{(t+1)}

1: Let ΔX_(α) be the mode-α matricization of Δ𝒳
2: Compute the column distribution for i = 1, ⋯, ΔN_α: P(i) ← |ΔX_(α)(:,i)|² / ‖ΔX_(α)‖_F²
3: Sample d columns from ΔX_(α) based on P(i). Let I = {i_1, ⋯, i_d}
4: Let I′ = {i′_1, ⋯, i′_{d′}} be the set consisting of the unique elements of I
5: Initialize R^{(t+1)} ← R^{(t)}, U^{(t+1)} ← U^{(t)}, and ΔR ← [ ]
6: for k = 1 : d′ do
7:   Let x ← ΔX_(α)(:,i′_k)
8:   Compute the residual: res ← x − R^{(t+1)} U^{(t+1)} (R^{(t+1)})^T x
9:   if ‖res‖ ≤ ϵ ‖x‖ then
10:    continue
11:  else
12:    Compute: δ ← ‖res‖² and y ← U^{(t+1)} (R^{(t+1)})^T x
13:    Update U^{(t+1)}: $U^{(t+1)} \leftarrow \begin{bmatrix} U^{(t+1)} + yy^T/\delta & -y/\delta \\ -y^T/\delta & 1/\delta \end{bmatrix}$
14:    Expand R^{(t+1)} and ΔR: R^{(t+1)} ← [R^{(t+1)}, x] and ΔR ← [ΔR, x]
15:  end if
16: end for
Update C_(α)^{(t+1)}:
17: if ΔR is not empty then
18:   $C_{(\alpha)}^{(t+1)} \leftarrow \begin{bmatrix} C_{(\alpha)}^{(t)} & (R^{(t)})^T \Delta X_{(\alpha)} \\ (\Delta R)^T R^{(t)} U^{(t)} C_{(\alpha)}^{(t)} & (\Delta R)^T \Delta X_{(\alpha)} \end{bmatrix}$
19: else
20:   $C_{(\alpha)}^{(t+1)} \leftarrow [C_{(\alpha)}^{(t)} \;\; (R^{(t)})^T \Delta X_{(\alpha)}]$
21: end if
22: Fold C_(α)^{(t+1)} into 𝒞^{(t+1)}
23: return 𝒞^{(t+1)}, U^{(t+1)}, R^{(t+1)}

Algorithm

Fig 2 shows the scheme for CTD-D. At each time step, CTD-D samples fibers from the newly arrived tensor and updates the factors by checking the linear dependency of the sampled fibers against the factor from the previous time step. The purple and green fibers are sampled from the newly arrived tensor in Fig 2. Note that the purple fiber is added to the factor R since it is linearly independent of the fibers in the factor at the previous time step, while the linearly dependent green fiber is ignored.

Fig 2. The scheme for CTD-D.


For any time step t, CTD-D maintains its factors 𝒞^{(t)} ∈ ℝ^{I_1×⋯×I_{α−1}×d̃_t×I_{α+1}×⋯×I_{N−1}×t}, U^{(t)} ∈ ℝ^{d̃_t×d̃_t}, and R^{(t)} ∈ ℝ^{I_α×d̃_t} such that 𝒳^{(t)} ≈ 𝒞^{(t)} ×_α R^{(t)}U^{(t)}, where the superscript (t) indicates that the factor is at time step t. 𝒳^{(t)} grows along the time mode, and we assume that the N-th mode is the time mode in the dynamic setting, where N denotes the order of 𝒳^{(t)}. At the next time step t + 1, CTD-D receives a newly arrived tensor Δ𝒳 ∈ ℝ^{I_1×I_2×⋯×I_{N−1}×1} and updates 𝒞^{(t)}, U^{(t)}, and R^{(t)} into 𝒞^{(t+1)} ∈ ℝ^{I_1×⋯×I_{α−1}×d̃_{t+1}×I_{α+1}×⋯×I_{N−1}×(t+1)}, U^{(t+1)} ∈ ℝ^{d̃_{t+1}×d̃_{t+1}}, and R^{(t+1)} ∈ ℝ^{I_α×d̃_{t+1}}, respectively, such that 𝒳^{(t+1)} ≈ 𝒞^{(t+1)} ×_α R^{(t+1)}U^{(t+1)}.

Algorithm 2 shows the procedure of CTD-D. First, CTD-D computes the sampling probabilities of the mode-α fibers of Δ𝒳, which are proportional to the norm of each fiber, and then samples d fibers according to these probabilities with replacement in lines 1-3. CTD-D selects the d′ unique fibers in line 4 and initializes R^{(t+1)}, U^{(t+1)}, and ΔR with R^{(t)}, U^{(t)}, and an empty matrix, respectively, in line 5, where ΔR collects the columns added to R^{(t)} to form R^{(t+1)}. In lines 6-16, CTD-D expands R^{(t+1)} with the sampled fibers by sequentially evaluating the linear dependency of each fiber on the column space of R^{(t+1)}; R^{(t+1)} and U^{(t+1)} are updated in this step. Finally, C_(α)^{(t+1)} is updated in lines 17-21.

In the following, we describe the two main ideas that CTD-D uses to update C_(α)^{(t+1)}, R^{(t+1)}, and U^{(t+1)} efficiently while preserving accuracy: exploiting the factors at the previous time step, and reordering operations.

(1) Exploiting factors at the previous time step: First, we explain how we update R^{(t+1)} and U^{(t+1)} using this idea. In line 5 of Algorithm 1, CTD-S initializes R and U using one of the sampled fibers, because CTD-S requires R to consist of linearly independent columns and this trivially holds when R has only one fiber. Since R^{(t)} already consists of linearly independent columns, we initialize R^{(t+1)} and U^{(t+1)} with R^{(t)} and U^{(t)}, respectively, in line 5 of Algorithm 2. In lines 6-16, we check the linear independence of each fiber sampled from ΔX_(α) against R^{(t+1)}. If the fiber is linearly independent, we expand R^{(t+1)} and update U^{(t+1)} as in lines 11-13 of Algorithm 1.

Second, we describe how we update 𝒞^{(t+1)} using this idea. We assume that ΔR is not empty after line 16 of Algorithm 2. At time step t and its successor t + 1, CTD-S satisfies Eqs (7) and (8), where C_(α)^{(t)} has size d̃_t × N_α^{(t)} and C_(α)^{(t+1)} has size d̃_{t+1} × N_α^{(t+1)}.

$C_{(\alpha)}^{(t)} \leftarrow (R^{(t)})^T X_{(\alpha)}^{(t)}$ (7)
$C_{(\alpha)}^{(t+1)} \leftarrow (R^{(t+1)})^T X_{(\alpha)}^{(t+1)}$ (8)

We can rewrite R^{(t+1)} and X_(α)^{(t+1)} as Eqs (9) and (10), respectively, where ΔR has size I_α × Δd̃ and ΔX_(α) has size I_α × ΔN_α such that N_α^{(t+1)} = N_α^{(t)} + ΔN_α and d̃_{t+1} = d̃_t + Δd̃.

$R^{(t+1)} = [R^{(t)} \;\; \Delta R]$ (9)
$X_{(\alpha)}^{(t+1)} = [X_{(\alpha)}^{(t)} \;\; \Delta X_{(\alpha)}]$ (10)

We replace R^{(t+1)} and X_(α)^{(t+1)} in Eq (8) with the expressions in Eqs (9) and (10), respectively, to obtain Eq (11).

$C_{(\alpha)}^{(t+1)} \leftarrow \begin{bmatrix} (R^{(t)})^T \\ (\Delta R)^T \end{bmatrix} [X_{(\alpha)}^{(t)} \;\; \Delta X_{(\alpha)}] = \begin{bmatrix} (R^{(t)})^T X_{(\alpha)}^{(t)} & (R^{(t)})^T \Delta X_{(\alpha)} \\ (\Delta R)^T X_{(\alpha)}^{(t)} & (\Delta R)^T \Delta X_{(\alpha)} \end{bmatrix}$ (11)

CTD-S computes all four blocks of Eq (11) ((R^{(t)})^T X_(α)^{(t)}, (R^{(t)})^T ΔX_(α), (ΔR)^T X_(α)^{(t)}, and (ΔR)^T ΔX_(α)) from scratch, and hence requires a large amount of computation. To make the computation of C_(α)^{(t+1)} incremental, we exploit the existing factors at time step t: C_(α)^{(t)}, R^{(t)}, and U^{(t)}. First, we use C_(α)^{(t)} instead of (R^{(t)})^T X_(α)^{(t)}, as in Eq (7). Second, we must replace X_(α)^{(t)} in (ΔR)^T X_(α)^{(t)} with the factors at time step t, since CTD-D, unlike CTD-S, does not have X_(α)^{(t)} as its input. We substitute R^{(t)}U^{(t)}C_(α)^{(t)} for X_(α)^{(t)}, because CTD-S ensures 𝒳^{(t)} ≈ 𝒞^{(t)} ×_α R^{(t)}U^{(t)}, which can be rewritten as X_(α)^{(t)} ≈ R^{(t)}U^{(t)}C_(α)^{(t)} by Eq (3). Eq (12) shows the final form of C_(α)^{(t+1)}, which is the same as line 18 in Algorithm 2.

$C_{(\alpha)}^{(t+1)} \leftarrow \begin{bmatrix} C_{(\alpha)}^{(t)} & (R^{(t)})^T \Delta X_{(\alpha)} \\ (\Delta R)^T R^{(t)} U^{(t)} C_{(\alpha)}^{(t)} & (\Delta R)^T \Delta X_{(\alpha)} \end{bmatrix}$ (12)

(ΔR)^T R^{(t)}U^{(t)}C_(α)^{(t)} and (ΔR)^T ΔX_(α) are ignored when ΔR is empty, as expressed in line 20 of Algorithm 2.

(2) Reordering computations: The order in which the block (ΔR)^T R^{(t)}U^{(t)}C_(α)^{(t)} is computed matters, since each parenthesization has a different computational cost. We want to determine the optimal parenthesization among the possible ones. It can be shown that (((ΔR)^T R^{(t)})U^{(t)})C_(α)^{(t)}, i.e., multiplying from the left, is optimal and requires $O(\Delta\tilde{d}\,\tilde{d}_t(I_\alpha + \tilde{d}_t + N_\alpha^{(t)}))$ operations.
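
Putting the two ideas together, one CTD-D step can be sketched as follows in the same matricized NumPy convention as the CTD-S sketch above. This is our own dense-array illustration (the released code is sparse MATLAB); dXa stands for ΔX_(α), and the returned C_a corresponds to C_(α)^{(t+1)}.

import numpy as np

def ctd_d_update(C_a, U, R, dXa, d, eps=1e-6, seed=0):
    # One CTD-D step (Algorithm 2): update the matricized factors (C_a, U, R) of time t
    # with the newly arrived slab dXa = Delta X_(alpha).
    R_t, U_t, C_t = R, U, C_a                                  # keep the time-t factors for Eq (12)
    rng = np.random.default_rng(seed)
    p = np.sum(dXa ** 2, axis=0) / np.sum(dXa ** 2)            # lines 1-2
    I = rng.choice(dXa.shape[1], size=d, replace=True, p=p)    # line 3
    dR = np.zeros((R.shape[0], 0))                             # line 5: Delta R starts empty
    for i in dict.fromkeys(I):                                 # lines 6-16 (unique indices only)
        x = dXa[:, [i]]
        res = x - R @ (U @ (R.T @ x))
        if np.linalg.norm(res) <= eps * np.linalg.norm(x):
            continue                                           # dependent fiber: skip
        delta = np.linalg.norm(res) ** 2
        y = U @ (R.T @ x)
        U = np.block([[U + y @ y.T / delta, -y / delta],
                      [-y.T / delta, np.array([[1.0 / delta]])]])
        R = np.hstack([R, x])
        dR = np.hstack([dR, x])
    top = np.hstack([C_t, R_t.T @ dXa])                        # first block row of Eq (12)
    if dR.shape[1] > 0:
        # optimal left-to-right parenthesization (((dR)^T R^(t)) U^(t)) C_(alpha)^(t)
        bottom = np.hstack([((dR.T @ R_t) @ U_t) @ C_t, dR.T @ dXa])
        C_a = np.vstack([top, bottom])
    else:
        C_a = top                                              # line 20
    return C_a, U, R

After the update, X_(α)^{(t+1)} ≈ R U C_a continues to hold for the returned factors.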

We prove that CTD-D is faster than CTD-S in Lemma 3.

Lemma 3. CTD-D is faster than CTD-S. The computational complexity of CTD-D is $O(\Delta\tilde{d}\,\tilde{d}_t(N_\alpha^{(t)} + I_\alpha) + (\tilde{d}_{t+1}I_\alpha + d)\Delta N_\alpha + d(\tilde{d}_{t+1}^2 + nnz(R^{(t+1)})) + d\log d + nnz(\Delta\mathcal{X}))$.

Proof. Lines 1-4 of Algorithm 2 for CTD-D are similar to those of Algorithm 1 for CTD-S; the only difference is that CTD-D samples d columns from ΔX_(α) while CTD-S samples s columns from X_(α). Thus, lines 1-4 take $O(nnz(\Delta\mathcal{X}) + d\Delta N_\alpha + d\log d)$. Updating R^{(t+1)} and U^{(t+1)} in lines 5-16 needs $O(d(\tilde{d}_{t+1}^2 + nnz(R^{(t+1)})))$ operations, as proved in Lemma 1 of [23]. In updating 𝒞^{(t+1)} in lines 17-18, computing (R^{(t)})^T ΔX_(α) costs $O(\tilde{d}_t I_\alpha \Delta N_\alpha)$, (ΔR)^T ΔX_(α) costs $O(\Delta\tilde{d}\, I_\alpha \Delta N_\alpha)$, and (ΔR)^T R^{(t)}U^{(t)}C_(α)^{(t)} costs $O(\Delta\tilde{d}\,\tilde{d}_t(I_\alpha + \tilde{d}_t + N_\alpha^{(t)}))$. Overall, CTD-D takes $O(\Delta\tilde{d}\,\tilde{d}_t(N_\alpha^{(t)} + I_\alpha) + (\tilde{d}_{t+1}I_\alpha + d)\Delta N_\alpha + d(\tilde{d}_{t+1}^2 + nnz(R^{(t+1)})) + d\log d + nnz(\Delta\mathcal{X}))$.

CTD-D is faster than CTD-S because the complexity of CTD-S contains the term $\tilde{s}I_\alpha N_\alpha$, which is much larger than every term in the complexity of CTD-D.

Experiments

We perform experiments to answer the following questions.

Q1: What is the performance of our static method CTD-S compared to the competing method Tensor-CUR?

Q2: How does the performance of CTD-S and Tensor-CUR change with the sample size parameter?

Q3: What is the performance of our dynamic method CTD-D compared to the static method CTD-S?

Q4: What are the results of applying CTD-D for online DDoS attack detection and online troll detection?

Experimental settings

Machine

All the experiments are performed on a machine with a 10-core Intel 2.20 GHz CPU and 256 GB RAM.

Competing method

We compare our proposed method CTD with Tensor-CUR [16], the state-of-the-art sampling-based tensor decomposition method. Both methods are implemented in MATLAB.

Measure

We define three metrics (1. Relative Error, 2. Memory, and 3. Time) as follows. First, the Relative Error is defined by Eq (13), where 𝒳 denotes the original tensor and 𝒳̃ is the tensor reconstructed from the factors of 𝒳; for example, 𝒳̃ = 𝒞 ×_α RU in CTD-S.

$\text{Relative Error} = \|\tilde{\mathcal{X}} - \mathcal{X}\|_F^2 \,/\, \|\mathcal{X}\|_F^2$ (13)

Second, Memory is defined as Eq (14). It measures the relative amount of memory needed for storing the resulting factors. The denominator and numerator indicate the amount of memory needed for storing the original tensor and the resulting factors, respectively.

$\text{Memory} = (nnz(\mathcal{C}) + nnz(U) + nnz(R)) \,/\, nnz(\mathcal{X})$ (14)

Finally, Time denotes running time in seconds.
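
In matricized form, the two quantities can be computed directly from the factors. The helpers below are our own illustration and count nonzeros with np.count_nonzero, which matches Eq (14) exactly only when the factors are stored sparsely.

import numpy as np

def relative_error(Xa, C_a, U, R):
    # Eq (13) with matricized factors: squared Frobenius norm of the reconstruction error.
    return np.linalg.norm(Xa - R @ U @ C_a, 'fro') ** 2 / np.linalg.norm(Xa, 'fro') ** 2

def memory_ratio(Xa, C_a, U, R):
    # Eq (14): nonzeros stored in the factors relative to nonzeros in the original data.
    return (np.count_nonzero(C_a) + np.count_nonzero(U) + np.count_nonzero(R)) / np.count_nonzero(Xa)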

Data

Table 3 shows the data we used in our experiments.

Table 3. Summary of the tensor data used.
Name I1 I2 I3 Nonzeros
Facebook-wall [25] 63,891 63,890 1,504 738,485
Facebook-wall (synthetic) [30] 63,891 63,890 1,504 1,169,656
Hyperspectral Image [26] 538 323 148 25,715,854
Infectious [27] 407 410 1,392 17,298
Hypertext 2009 [28] 112 113 5,246 20,818
Haggle [29] 77 274 1,567 27,972
CAIDA [30] 189 189 1,000 20,511
CAIDA (synthetic) [30] 189 189 1,000 46,102

Input parameters

All methods take as input a tensor 𝒳 generated from each dataset, a mode α, and a sample size s, because they are LR tensor decomposition methods. In each experiment, we give the same input and compare the performance. We fix α = 1 and perform experiments under various sample sizes s. We set the number of slabs to sample r = s and the rank k = 10 in Tensor-CUR, and set ϵ = 10^{-6} in CTD.

Performance of CTD-S

We measure the performance of CTD-S to answer Q1 and Q2. In summary, compared to the Tensor-CUR, CTD-S is more accurate, and its running time and memory usage are relatively constant over various sample sizes.

Fig 3 shows the Time vs. Relative Error and the Memory vs. Relative Error of CTD-S compared to Tensor-CUR under various sample sizes, answering Q1. We measure error at a similar level of running time by choosing the pair of results with the smallest difference in running time (horizontal lines in Fig 3). We find that CTD-S is up to 11× more accurate than Tensor-CUR for the same level of running time. This coincides with Lemma 2, which theoretically guarantees that CTD-S is more accurate than Tensor-CUR. Likewise, we choose the pair of points with the smallest error difference between the two methods to compare running time and memory (vertical lines in Fig 3). Although no significant improvement in speed or memory usage is found under similar sample sizes, CTD-S is up to 2.3× faster and 24× more memory-efficient than Tensor-CUR under similar error rates.

Fig 3. Error, running time, and memory usage of CTD-S compared to Tensor-CUR varying sample sizes.


CTD-S is more accurate, faster and more memory-efficient than Tensor-CUR.

Fig 4 shows the Relative Error, Time, and Memory of CTD-S compared to those of Tensor-CUR over increasing sample size s for the Haggle dataset, answering Q2. The error of CTD-S decreases as s increases because more data are available to sample the important fibers that describe the original tensor well. The running time and memory usage of CTD-S are relatively constant compared to those of Tensor-CUR. This is because CTD-S keeps only the linearly independent fibers, whose number is bounded by the rank of X_(α). There are small fluctuations in the graphs since the sampling processes of both CTD-S and Tensor-CUR are randomized. Although we show the results for only the Haggle dataset, the overall trend persists across the other datasets.

Fig 4. Error, running time, and memory usage of CTD-S compared to those of Tensor-CUR over sample size s for the Haggle dataset.


CTD-S is more accurate over various sample sizes, and its running time and memory usage are relatively constant compared to the Tensor-CUR.

We further investigate the differences in accuracy improvement across datasets by characterizing each dataset and relating its characteristics to the performance of CTD-S. We characterize the datasets by density (dense or sparse) and by the fiber independence rate, which is measured as follows:

$\text{fiber independence rate} = \dfrac{\text{number of independent mode-}\alpha\text{ fibers}}{\text{number of all mode-}\alpha\text{ fibers of a tensor}}$
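
For reference, this rate can be estimated from the mode-α matricization with a numerical rank; the helper below is our own illustration and not part of the released code.

import numpy as np

def fiber_independence_rate(Xa):
    # Xa is the mode-alpha matricization; its columns are the mode-alpha fibers.
    return np.linalg.matrix_rank(Xa) / Xa.shape[1]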

Table 4 reports the accuracy of CTD-S compared to Tensor-CUR together with the fiber independence rate. Among the sparse datasets, those with a lower fiber independence rate show, by and large, better accuracy. This accords with the expectation that when the fiber independence rate is low, most of the independent fibers are likely to be obtained within a given sample size, leading to high accuracy for CTD-S. The Hyperspectral Image data, which forms a dense tensor, shows relatively high accuracy even though its proportion of independent fibers is high under this strict measure of independence. However, image data are known to have high redundancy, and CTD-S samples fibers well even when the strict independence rate does not reflect that redundancy.

Table 4. The accuracy of CTD-S compared to Tensor-CUR and the fiber independence rate.

Name Density Accuracy compared to Tensor-CUR Fiber independence rate
CAIDA sparse 48.3× 3.81 × 10^{-6}
Haggle sparse 30.1× 1.24 × 10^{-6}
Hypertext 2009 sparse 11× 1.79 × 10^{-4}
Facebook-wall sparse 2.2× 4.40 × 10^{-4}
Infectious sparse 2.8× 6.43 × 10^{-4}
Hyperspectral Image dense 10× 1.13 × 10^{-2}

Performance of CTD-D

We compare the performance of CTD-D with that of CTD-S to answer Q3. In summary, CTD-D is up to 82× faster than CTD-S for the same level of error. The details are as follows.

To simulate a dynamic environment, we divide a given dataset into two parts along the time mode. We use the first 80% of the dataset as historical data and the remaining 20% as incoming data. We assume that the historical data are already given and the incoming data arrive sequentially at every time step, so that the whole data grows along the time mode. We measure the performance of CTD-D and CTD-S at each time step and report the average. We set the sample size d of CTD-D to be much smaller than that of CTD-S because CTD-D samples fibers only from the increment Δ𝒳 while CTD-S samples from the whole data 𝒳. We set d of CTD-D to 0.01 times the s of CTD-S, α = 1, and ϵ = 10^{-6}.
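
The simulation setup itself is straightforward; a small sketch with a toy 3-mode NumPy tensor whose last axis is the time mode is shown below (the 80/20 split mirrors the description above, while the tensor and its sizes are our own placeholders).

import numpy as np

X = np.random.rand(60, 60, 100)            # toy (mode-1, mode-2, time) tensor
t_hist = int(0.8 * X.shape[-1])            # first 80% of time steps: historical data
X_hist = X[..., :t_hist]                   # decomposed once, e.g., with a static method
for t in range(t_hist, X.shape[-1]):       # remaining 20% arrives one slab per time step
    new_slab = X[..., t:t + 1]             # shape (60, 60, 1): input to the dynamic update
    # ... update the factors with new_slab (CTD-D) or re-decompose X[..., :t+1] (CTD-S) ...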

Fig 5 shows the Time vs. Relative Error and Memory vs. Relative Error of CTD-D compared to those of CTD-S. Note that CTD-D is much faster than CTD-S for all the datasets. CTD-D is especially fast on the Hyperspectral Image dataset because this dataset has relatively many dependent fibers compared to the other datasets, which lets CTD-D skip updating U. CTD-D uses the same or slightly more memory than CTD-S. This is because multiplications between sparse matrices used in updating 𝒞 do not always produce sparse output, so the number of nonzero entries in 𝒞 increases slightly over time.

Fig 5. Error, running time, and memory usage of CTD-D compared to those of CTD-S over varying sample sizes.


CTD-D is faster and has smaller error while using the same or slightly larger memory space compared to CTD-S.

CTD at work

In this section, we apply CTD-D to online DDoS attack detection in network traffic data and online troll detection in social network data. We show how CTD-D’s interpretability can help successfully detect DDoS attacks and trolls.

Online DDoS attack detection

A DDoS attack makes an online service unavailable by sending a huge amount of traffic to the server from multiple sources. DDoS attacks remain a major threat to many companies: 20% of financial companies report a revenue loss of $1 million per hour and 43% lose more than $250,000 hourly under a DDoS attack, while 74% take more than one hour to shut down the attacks [24].

Our goal is to detect DDoS attacks in network traffic data efficiently in an online fashion. We propose a novel online DDoS attack detection method based on CTD-D's interpretability, show that CTD-D is a feasible option for online DDoS attack detection, and show how it detects attacks successfully. In contrast to the standard PARAFAC [12] and Tucker [13] decomposition methods, CTD-D can determine DDoS attacks from its decomposition result without expensive overhead. We aim to dynamically find the victim (destination host) and the corresponding attackers (source hosts) of each DDoS attack in network traffic data, that is, when a victim receives a huge amount of traffic from a large number of attackers.

The online DDoS attack detection method based on CTD-D works as follows. First, we apply CTD-D to the network traffic data, a 3-mode tensor of the form (source IP—destination IP—time). We assume an online environment where each slab of the network traffic data, of the form (source IP—destination IP), arrives sequentially at every time step. We use the source IP mode as mode α. Second, we inspect the factor R of CTD-D, which consists of actual mode-α fibers from the original data. R is composed of important mode-α fibers that signify major activities such as a DDoS attack or heavy traffic to a main server. Thanks to CTD, we can directly find the destination host and the occurrence time of the major activity represented by a fiber in R, simply by tracking the indices of the fibers. We regard fibers with the same destination host index as representing the same major activity, and consider the first fiber among those with the same destination host index to be the representative of that activity. Then, we select the representative fibers whose norm is higher than the average and suggest them as candidates of DDoS attacks, because DDoS attacks have much higher norms than normal traffic does.
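
A sketch of this candidate selection in NumPy is shown below. Here R is the factor produced by CTD-D with the source IP mode as α, fiber_cols records the column index of X_(α) from which each fiber in R was taken, and the index arithmetic assumes a column ordering in which the destination IP index varies fastest; all names and the index bookkeeping are our own assumptions, not part of the released code.

import numpy as np

def ddos_candidates(R, fiber_cols, n_dst):
    # R[:, j] is a sampled mode-alpha (source IP) fiber; fiber_cols[j] is its column
    # index in X_(alpha), which encodes the (destination IP, time) pair of the fiber.
    cols = np.asarray(fiber_cols)
    dst = cols % n_dst                                  # destination host of each fiber
    t = cols // n_dst                                   # time step of each fiber
    _, first = np.unique(dst, return_index=True)        # first fiber per destination host
    norms = np.linalg.norm(R[:, first], axis=0)
    keep = norms > norms.mean()                         # above-average norm: DDoS candidates
    return list(zip(dst[first][keep], t[first][keep]))

Each returned pair is a (destination host, time step) suggested as a DDoS attack candidate.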

We generate network traffic data by injecting DDoS attacks into the real-world CAIDA network traffic dataset [30]. We assume that a randomly selected 20% of source hosts participate in each DDoS attack. Table 5 shows the result of the DDoS attack detection method based on CTD-D. CTD-D achieves a high F1 score with notable precision for various numbers n of injected DDoS attacks. We set d = 10 and ϵ = 0.15.

Table 5. The result of online DDoS attack detection method based on CTD-D.

CTD-D achieves a high F1 score with notable precision for various n, where n denotes the number of injected DDoS attacks.

n Recall Precision F1 score
1 1.000 1.000 1.000
3 1.000 1.000 1.000
5 0.880 1.000 0.931
7 0.857 1.000 0.921

Online troll detection

Recent social network services (SNS) such as Facebook and Twitter have billions of users; one of their main concerns is to detect trolls, or abnormal users, since trolls can severely undermine the service. Our goal is to detect trolls in social network tensor data in an online fashion. We define a troll as an abnormal user who posts on other users' walls much more than normal users do. We show how CTD-D finds trolls successfully using its interpretability.

We use a process similar to the online DDoS attack detection method described in the previous section to find trolls. We use the real-world Facebook-wall social network tensor, a 3-mode tensor of (User 1—User 2—time) triplets, where each entry denotes the number of posts for the corresponding triplet; a triplet (User 1—User 2—time) means that User 2 posted on User 1's wall. We assume an online environment where a new data point of the form (User 1—User 2) arrives at every time step. We apply CTD-D with the User 1 mode as α, so that each fiber collected in the factor R represents some User 2's behavior at some time. By tracking the indices of the fibers in the factor R, we can reveal which fiber represents the behavior of which user at which time. We then identify trolls (User 2) by picking fibers whose norm is larger than the average.

We test the ability of CTD to interpretably detect trolls by inserting synthetic trolls into the Facebook-wall dataset. Table 6 shows the result of online troll detection in the Facebook-wall dataset based on CTD-D. Notably, we detect all the inserted trolls with a very small sample size, 10^{-4}% of the entire fibers, for various numbers of trolls.

Table 6. The result of online troll detection in the Facebook-wall dataset based on CTD-D.

CTD-D detects all the inserted trolls (recall = 1) for various n, where n denotes the number of injected troll users. Note that we used only 10^{-4}% of the entire fibers as the sample size.

n Recall Precision F1 score
1 1.000 0.200 0.333
3 1.000 0.500 0.667
5 1.000 0.556 0.714
10 1.000 0.833 0.909

Conclusion

We propose CTD, a fast, accurate, and directly interpretable tensor decomposition method based on sampling. The static version CTD-S is up to 11× more accurate, 2.3× faster, and 24× more memory-efficient than the state-of-the-art method. The dynamic version CTD-D is up to 82× faster than CTD-S in an online environment. CTD-D is the first method providing interpretable dynamic tensor decomposition. Utilizing the interpretability of CTD, we were able to successfully detect online DDoS attacks and trolls from network data. The interpretability of CTD comes from the assumption that the original data fibers themselves are sparse and interpretable, such as IP addresses or words in documents. Although not all real-world data have this property (e.g., values in gene expression data), a wide range of social and technological data in the online environment is sparse and interpretable. In such cases, CTD is capable of dynamically detecting important or abnormal data in an online environment.

Data Availability

All data and code are available from GitHub (https://github.com/leesael/CTD).

Funding Statement

LS received support from the Basic Science Research Program through the National Research Foundation of Korea (http://www.nrf.re.kr/), grant number NRF-2015R1C1A2A01055739.

References

  • 1. Khoa NLD, Zhang B, Wang Y, Liu W, Chen F, Mustapha S, et al. On Damage Identification in Civil Structures Using Tensor Analysis. In PAKDD 2015; 2015.
  • 2. Prada MA, Toivola J, Kullaa J, Hollmén J. Three-way Analysis of Structural Health Monitoring Data. Neurocomputing. 2012;80(15):119–128. doi: 10.1016/j.neucom.2011.07.030
  • 3. Wang Y, Chen R, Ghosh J, Denny JC, Kho A, Chen Y, et al. Rubik: Knowledge Guided Tensor Factorization and Completion for Health Data Analytics. In KDD'15; 2015; p. 1265–1274.
  • 4. Cyganek B, Woźniak M. Tensor based representation and analysis of the electronic healthcare record data. In BIBM; 2015; p. 1383–1390.
  • 5. Perros I, Chen R, Vuduc R, Sun J. Sparse Hierarchical Tucker Factorization and Its Application to Healthcare. In ICDM; 2015; p. 943–948.
  • 6. Thuraisingham BM, Khan K, Masud MM, Hamlen KW. Data Mining for Security Applications. In IEEE/IFIP EUC 2008; 2008; p. 585–589.
  • 7. Phua C, Lee VCS, Smith-Miles K, Gayler RW. A Comprehensive Survey of Data Mining-based Fraud Detection Research. In ICICTA 2010; 2010; p. 50–53.
  • 8. Allanach J, Tu H, Singh S, Willett P, Pattipati K. Detecting, tracking, and counteracting terrorist networks via hidden Markov models. In IEEE Aerospace; 2004; p. 3257.
  • 9. Arulselvan A, Commander CW, Elefteriadou L, Pardalos PM. Detecting Critical Nodes in Sparse Graphs. Comput Oper Res. 2009;36(7):2193–2200. doi: 10.1016/j.cor.2008.08.016
  • 10. Cao Q, Sirivianos M, Yang X, Pregueiro T. Aiding the Detection of Fake Accounts in Large Scale Social Online Services. In NSDI'12; 2012; p. 197–210.
  • 11. Kontaxis G, Polakis I, Ioannidis S, Markatos EP. Detecting social network profile cloning. In PERCOM Workshops; 2011; p. 295–300.
  • 12. Harshman RA. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics. 1970;16(1).
  • 13. Tucker LR. Some mathematical notes on three-mode factor analysis. Psychometrika. 1966;31(3):279–311. doi: 10.1007/BF02289464
  • 14. Drineas P, Mahoney MW. A randomized algorithm for a tensor-based generalization of the singular value decomposition. Linear Algebra and its Applications. 2007;430(2-3):553–571. doi: 10.1016/j.laa.2006.08.023
  • 15. Caiafa CF, Cichocki A. Generalizing the column–row matrix decomposition to multi-way arrays. Linear Algebra and its Applications. 2010;433(3):557–573. doi: 10.1016/j.laa.2010.03.020
  • 16. Mahoney MW, Maggioni M, Drineas P. Tensor-CUR decompositions for tensor-based data. SIAM J Matrix Anal Appl. 2008;30(3):957–987. doi: 10.1137/060665336
  • 17. Sun J, Tao D, Faloutsos C. Beyond streams and graphs: dynamic tensor analysis. In ACM SIGKDD'06; 2006; p. 374–383.
  • 18. Sun J, Papadimitriou S, Yu PS. Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams. In ICDM'06; 2006; p. 1076–1080.
  • 19. Zhou S, Nguyen XV, Bailey J, Jia Y, Davidson I. Accelerating Online CP Decompositions for Higher Order Tensors. In ACM SIGKDD'16; 2016; p. 1375–1384.
  • 20. Drineas P, Mahoney MW, Muthukrishnan S. Relative-error CUR matrix decompositions. SIAM J Matrix Anal Appl. 2008;30(2):844–881. doi: 10.1137/07070471X
  • 21. Drineas P, Kannan R, Mahoney M. Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition. SIAM J Comput. 2006;36(1):184–206. doi: 10.1137/S0097539704442696
  • 22. Sun J, Xie Y, Zhang H, Faloutsos C. Less is More: Compact Matrix Decomposition for Large Sparse Graphs. In SDM'07; 2007; p. 366–377.
  • 23. Tong H, Papadimitriou S, Sun J, Yu PS, Faloutsos C. Colibri: fast mining of large static and dynamic graphs. In ACM SIGKDD'08; 2008; p. 686–694.
  • 24. Korolov M. DDoS costs, damages on the rise. CSO News; 2016.
  • 25. http://socialnetworks.mpi-sws.org/data-wosn2009.html
  • 26. http://www.imageval.com/scene-database-4-faces-3-meters/
  • 27. http://konect.uni-koblenz.de/networks/sociopatterns-infectious
  • 28. http://konect.uni-koblenz.de/networks/sociopatterns-hypertext
  • 29. http://konect.uni-koblenz.de/networks/contact
  • 30. https://github.com/leesael/CTD


