Abstract
Since multi-view learning leverages complementary information from multiple feature sets to improve model performance, a tensor-based data fusion layer for neural networks, called Multi-View Data Tensor Fusion (MV-DTF), is used. It fuses $M$ feature spaces $\{\mathcal{X}^{(m)}\}_{m=1}^{M}$, referred to as views, into a new latent tensor space $\mathcal{Z}$ of order $P$ and dimension $J_1 \times \cdots \times J_P$, and it is defined in the space of affine mappings composed of a multilinear map—represented as the Einstein product between a $(P+M)$-order tensor $\mathcal{A}$ and a rank-one tensor $\mathbf{x}^{(1)} \otimes \cdots \otimes \mathbf{x}^{(M)}$, where $\mathbf{x}^{(m)}$ is the $m$-th view—and a translation. Unfortunately, as the number of views increases, the number of parameters that determine the MV-DTF layer grows exponentially, and consequently, so does its computational complexity. To address this issue, we enforce low-rank constraints on certain subtensors of $\mathcal{A}$ using canonical polyadic decomposition, from which $M$ other tensors $\{\mathcal{U}^{(m)}\}_{m=1}^{M}$, called here Hadamard factor tensors, are obtained. We found that the Einstein product $\mathcal{A} *_M (\mathbf{x}^{(1)} \otimes \cdots \otimes \mathbf{x}^{(M)})$ can be approximated by a sum of $R$ Hadamard products of $M$ Einstein products, where $R$ is related to the decomposition rank of the subtensors of $\mathcal{A}$; for this relationship, the lower the rank values, the more computationally efficient the approximation. To the best of our knowledge, this relationship has not previously been reported in the literature. As a case study, we present a multitask model of vehicle traffic surveillance for the occlusion detection and vehicle-size classification tasks, built on a low-rank MV-DTF layer, achieving significant improvements in the normalized weighted Matthews correlation coefficient metric on both individual tasks compared with single-task, single-view models.
Keywords: Einstein product, Hadamard product, Hadamard factor tensors, multi-view learning, multitask learning, vehicle traffic surveillance
1. Introduction
Vehicle traffic surveillance (VTS) systems are key components of intelligent transportation systems (ITSs), as they enable the automated video content analysis of traffic scenes to extract valuable traffic data. These data include crucial aspects of vehicle behavior, such as trajectories and speed, as well as traffic parameters, e.g., lane occupancy, traffic volume, and density, and they serve as the cornerstone for a variety of high-level ITS applications, including collision detection [1,2], route planning, and traffic control [3,4]. Currently, there exist several mathematical models for various tasks related to vehicle traffic, each with different conditions and traffic network topologies. For a comprehensive overview of vehicle traffic models, see, e.g., [5].
However, due to the complex nature of vehicle traffic, VTS systems are usually broken down into a set of smaller tasks, including vehicle detection, occlusion handling, and classification [6,7,8,9,10,11,12,13,14]. Each task is represented by a feature model, which should relate to the underlying task-specific explanatory factors and is either developed by human experts (hand-crafted) or learned automatically. These features focus on specific aspects of vehicles, such as texture, color, and shape, which provide complementary information to one another. Therefore, finding a highly descriptive feature model is crucial for enhancing the learning process in every VTS task.
Such feature diversity has made data fusion (DF) attractive for leveraging shared and complementary information. DF allows for the integration of data from different sources to enhance our understanding and analysis of the underlying process [15]. In this context, there are two common DF levels [16]: low-level, where data are combined before analysis, and decision-level, where processed data from each source are integrated at a higher level, such as in ensemble learning [17]. Moreover, the diverse nature of data sources poses challenges, such as heterogeneity across sources, high-dimensional data, missing values, and substantial redundancy, which DF algorithms should address [18,19].
As part of DF, multi-view learning (MVL) is a machine learning (ML) paradigm that exploits the shared and complementary information contained in multiple data sources, called views, obtained from different feature sets [20]. Here, data represented by M views are referred to as M-view data. For instance, an image represented by texture, edges, and color features can be regarded as three-view data. MVL methods can be grouped into three categories: co-training, multiple kernel learning, and subspace learning (SL) [21,22]. Among these, SL-based methods focus on learning a low-dimensional latent subspace that captures the shared information across views [23].
On the other hand, multitask learning (MTL) is another ML paradigm where multiple related tasks are learned simultaneously to leverage their shared knowledge, with the ultimate aim of improving generalization and performance in individual tasks [24,25,26,27,28].
Recently, artificial neural networks (ANNs) have shown superior performance in vision-based VTS systems. ANNs are computational models built from a composition of functions, called layers, which together capture the underlying relationships between the so-called input and output spaces to solve a given task, such as regression or classification [29]. Such layers, including fully connected (FC) and convolutional (Conv), are parameterized by weights and biases structured as tensors, matrices, or vectors, which are learned during training. Notably, the first layers usually act as feature extractors, whereas higher layers capture the relationships between extracted features and the output space.
Furthermore, higher-order tensors [30], or multidimensional arrays, have gained significant attention over the last decade due to their ability to naturally represent multi-modal data, e.g., images and videos, and their interactions. They have been successfully applied in various domains, including signal processing [31], machine learning [32,33,34,35,36,37], computer vision [38], and wireless communications [39,40]. For instance, tensor methods such as decomposition models have been employed for the low-rank approximation of tensor data, enabling more efficient and effective analysis of such data.
In this work, we propose a computationally efficient tensor-based multi-view data fusion layer for neural networks, here expressed as the Einstein product. Our approach leverages multiple feature spaces to address the limitations inherent to single-view models, such as reduced data representation capacity and model overfitting. It offers improved flexibility and scalability, as it enables the integration of additional views without significantly increasing the computational burden. Finally, we present a case study with a multitask, multi-view VTS model, demonstrating significant performance improvements in vehicle-size classification and occlusion-detection tasks.
1.1. Related Work
Occlusion detection is a challenging problem in vision-based tasks, in which vehicles or some parts of them are hidden by other elements in the traffic scene, making their detection a difficult task. Early works have explored approaches based on empirical models, which infer the presence of occlusion by assuming specific geometric patterns, such as concavity in the shape of occluded vehicles [41,42,43,44,45,46,47]. Recently, deep learning (DL) has also been employed for occlusion detection [48,49,50,51,52], where such models are even capable of reconstructing the occluded parts [53,54].
Several algorithms based on ML and DL have been proposed for intra- and inter-class vehicle classification [6,8,9,55,56,57,58,59,60]. In [8], Hsieh et al. employ an optimal classifier to categorize vehicles as cars, buses, or trucks by leveraging the linearity and size features of vehicles, achieving an accuracy of up to 97.0%. Moussa [9] introduces two levels of vehicle classification: the multiclass level, which categorizes vehicles as small, midsize, and large, and the intra-class level, in which midsize vehicles are classified as pickups, SUVs, and vans. In [6], we proposed a one-class support vector machine (OC-SVM) classifier with a radial basis kernel to classify vehicles as small, midsize, and large. By representing vehicles in a 3D feature space (area, width, and aspect ratio), high recall, precision, and F-measure values were achieved for the midsize class. Other techniques include the gray-level co-occurrence matrix (GLCM) [61], 3D appearance models [62,63,64], eigenvehicles [65], and non-negative factorization [66,67,68]. Recently, CNN-based classifiers have been employed, outperforming previous works [55,58,59,60,69].
Other works based on MVL and MTL have also been developed for VTS systems. For instance, Wang et al. [70] proposed an MVL approach to foreground detection, where three-view heterogeneous data (brightness, chromaticity, and texture variations) are employed to improve detection performance. Then, their conditional probability densities are estimated via kernel density estimation, followed by pixel labeling through a Markov random field. In [71], a multi-view object retrieval approach to surveillance videos integrates semantic structure information from CNNs trained on ImageNet and deep color features, using locality-sensitive hashing (LSH) to encode the features into short binary codes for efficient retrieval. Chu et al. [72] present vehicle detection with multitask CNNs and a region-of-interest (RoI) voting scheme. This framework simultaneously addresses supervision with subcategory, region overlap, bounding-box regression, and category information to enhance detection performance. In [73], a multi-task CNN for traffic scene understanding is proposed. The CNN consists of a shared encoder and specific decoders for road segmentation and object detection, generating complementary representations efficiently. Additionally, the detection stage predicts object orientation, aiding in 3D bounding box estimation. Finally, Liu et al. [74] introduce the Multi-Task Attention Network (MTAN), a shared network with global feature pooling and task-specific soft-attention modules to learn task-specific features from global features while allowing feature sharing across tasks.
Although our work focuses on multi-view and multitask VTS systems, some works related to other domains are also overviewed. In [36], a tensor-based, multi-view feature selection method called DUAL-TMFS is proposed for effective disease diagnosis. This approach integrates clinical, imaging, immunologic, serologic, and cognitive data into a joint space using tensor products, and it employs SVM with recursive feature elimination to select relevant features, improving classification performance in neurological disorder datasets. Zadeh et al. [75] introduce a novel model called a tensor fusion network for multimodal sentiment analysis. It leverages the outer product between modalities to model both the intra-modality and inter-modality dynamics. On the other hand, Liu et al. [76] propose an efficient multimodal fusion scheme using low-rank tensors. Experimental validations across multimodal sentiment analysis, speaker trait analysis, and emotion recognition tasks demonstrate competitive performance and robustness across a variety of low-rank settings.
Table 1 offers a comprehensive overview of existing research related to our approach and to VTS systems. It highlights the use of ML and DL approaches, fed either by hand-crafted features or raw data with automatic feature learning, to capture the underlying task patterns. While DL features generally achieve superior performance, they require large, high-quality training sets and high computational complexity models to find suitable representations. Conversely, hand-crafted features can perform competitively for specific tasks, but determining the optimal feature representation is challenging, as no single hand-crafted feature can fully describe the underlying task’s relationships.
Furthermore, the emerging trend towards the adoption of ANN models on VTS systems is evident. However, despite their high performance, these models demand substantial memory and computational resources for learning and inference, as their layers are usually overparameterized. To address these challenges, various techniques such as sparsification, quantization, and low-rank approximation have been proposed to compress the parameters of pre-trained layers [77,78,79,80,81,82,83]. Among these techniques, low-rank approximation is very often employed. In [79,80], Denil et al. compress FC layers using matrix decomposition models. Conv layers are compressed via tensor decompositions, including canonical polyadic decomposition (CPD) [81,82] and Tucker decomposition [83]. However, compressing pre-trained layers usually results in an accuracy loss, and a fine-tuning procedure is often employed to recover the accuracy drop [82,84,85,86]. Therefore, some authors have suggested the incorporation of low-rank constraints into the optimization problem [87,88,89]. Other works have found that compressing raw images before training also contributes to computational complexity reduction, as suggested in [32,90]. Additionally, in [91], tensor contraction layers (TCLs) and tensor regression layers (TRLs) are introduced in CNNs for dimensionality reduction and multilinear regression tasks, respectively. This approach imposes low-rank constraints via Tucker decomposition on the weights of TCLs and TRLs to speed up their computations.
Table 1.
Reference | Input | Method | Contribution |
---|---|---|---|
[6,8,9,10,14] | Single-view | ML | Hand-crafted geometric features represent vehicles for detection and classification using ML-based algorithms |
[11,12] | Single-view | DL | CNN models are proposed to perform automatic feature learning for vehicle detection and classification |
[65] | Single-view | Eigenvalue decomposition | Eigenvehicles are introduced as an unsupervised feature representation method for vehicle recognition |
[66,67,68] | Single-view | Nonnegative factorization | A part-based model is employed for vehicle recognition via non-negative matrix/tensor factorization |
[72,73,74] | Single-view | DL-based MTL | MTL models based on DL are employed to simultaneously perform multiple tasks, including road segmentation, vehicle detection and classification |
[92] | Multi-view | DL | This work employs a YOLO-based model that fuses camera and LiDAR data at multiple levels |
[61,93,94] | Single-view | ML | Single-view features, such as HOG, Haar wavelets, or GLCM, represent vehicles for classification in ML models |
[95] | Multi-view | Tucker decomposition | A tensor decomposition is employed for feature selection of HOG, LBP, and FDF features |
[70,71,96] | Multi-view | MVL | MVL approaches are proposed to enhance vehicle detection, classification, and background modeling by learning richer data representations from color features |
[30,97,98,99,100] | − | − | These works provide theoretical foundations on tensors and their operations, such as the Einstein and Hadamard products, with applications across multiple domains |
[32,77,78,79,80,81,82,83,90] | − | DL | Matrix and tensor decompositions are employed for speeding up CNNs by compressing FC and Conv layers and reducing the dimensionality of their input space |
[91] | − | DL | Multilinear layers are introduced for dimensionality reduction and regression purposes in CNNs, leveraging tensor decompositions for efficient computation. |
1.2. Contributions
The main contributions of this work are the following:
We found a novel mathematical relationship between the Einstein and Hadamard products for tensors (for details, see Section 5.2). From this connection, further algorithms for efficient approximations of the Einstein product can be developed.
Since multi-view models provide a more comprehensive input space than single-view models, we employ a tensor-based data fusion layer, here called multi-view data tensor fusion (MV-DTF). Unlike other works, our approach maps the multiple feature spaces (views) into a latent tensor space $\mathcal{Z}$ using a multilinear map, here expressed as the Einstein product (see Section 5), followed by a translation.
A major drawback of the MV-DTF layer is its high computational complexity, which grows exponentially with the number of views. To address this issue, a low-rank approximation for the MV-DTF layer, here called the low-rank multi-view data tensor fusion (LRMV-DTF) layer, is also proposed. This approach leverages the novel relationship between the Einstein and Hadamard products (see Section 5.2), where the lower the rank values, the more computationally efficient the operation.
As a case study, we introduce a high-performance multitask ANN model for VTS systems capable of simultaneously addressing various VTS tasks but which is here limited to occlusion detection and vehicle-size classification. This model incorporates the proposed LRMV-DTF layer as a multi-view feature extractor to provide a more comprehensive input space compared to the individual view spaces.
1.3. How to Read This Article
For a comprehensive understanding of this paper, the following reading order is suggested: Section 1 presents the motivation behind our research on VTS systems, as well as a review of related works, while Section 2 introduces tensor algebra and multilinear maps, which are essential for understanding the subsequent mathematical definitions; however, if you are already familiar with their theoretical foundations, you can proceed directly to Section 3 to delve into the problem statement and its mathematical formulation, where the main objectives are stated. These objectives are important for understanding the major results of the paper. Section 4 provides a comprehensive overview of VTS systems and their associated tasks as an important case study. If you are already familiar with these concepts, proceed to Section 5 for the technical and mathematical details of the MV-DTF layer. Section 5 is particularly important because it presents the novel connection between the Einstein and Hadamard products. Section 6 presents the results and their analysis, complemented by figures and tables to facilitate data interpretation. Finally, Section 7 provides the conclusions of this work, summarizing the key points and suggesting directions for future research.
2. Mathematical Background
2.1. Notation
In this study, we adopt the conventional notation established in [30], along with other commonly used symbols. Table 2 provides a comprehensive overview of the symbols utilized in this paper. An $N$th-order tensor is denoted by $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, where the dimension $I_n$ is usually referred to as the $n$-mode of $\mathcal{X}$. The $i$th entry of a vector $\mathbf{x}$ is denoted as $x_i$; the $(i, j)$th entry of a matrix $\mathbf{X}$ by $x_{ij}$; while the $(i_1, \ldots, i_N)$th entry of an $N$th-order tensor $\mathcal{X}$ is denoted as $x_{i_1 \cdots i_N}$, where $i_n \in \{1, \ldots, I_n\}$ is called the $n$-mode index. The $n$-mode fiber of an $N$th-order tensor is the $I_n$-dimensional vector that results from fixing every index but $i_n$, i.e., $\mathcal{X}(i_1, \ldots, i_{n-1}, :, i_{n+1}, \ldots, i_N)$, where the colon mark : denotes all possible values of the $n$-mode index $i_n$. The $i$th $n$-mode slice of an $N$th-order tensor is the $(N-1)$th-order tensor defined by fixing just the index $i_n = i$. Finally, for any two functions $f$ and $g$, $f \circ g$ denotes their function composition. For an understanding of tensor algebra, we refer the interested reader to the comprehensive work by Kolda and Bader [30].
Table 2.
Symbol | Description
---|---
$\mathbb{R}$, $\mathbb{N}$, $\mathbb{B}$ | The fields/sets of real, natural, and binary numbers
≃ | It denotes isomorphism between two structures
$\{1, \ldots, N\} \subset \mathbb{N}$ | The subset of the first $N$ natural numbers
$\mathcal{X}$, $\mathbf{X}$, $\mathbf{x}$, $x$ | Tensor, matrix, column vector, and scalar
$\dim(V)$ | The dimension of a vector space $V$
⊙ | Hadamard product
⊗ | Tensor product
$\times_n$ | The $n$-mode tensor-matrix product
$*_N$ | The Einstein product along the last $N$ modes
$x^{(m)}$ | The $m$-th element of a sequence $\{x^{(m)}\}_{m=1}^{M}$, where $x$ can be a scalar, vector, or tensor
$\mathcal{X}^{(m)}$ | The feature space for the $m$-th view
$\mathcal{Y}^{(t)}$ | The output space for the $t$-th task
$\mathcal{T}_t$ | The $t$-th classification task
$g$, $\hat{g}$ | The MV-DTF layer and its low-rank approximation
$h_t$ | The $t$-th task-specific function
$\mathcal{H}_t$ | Hypothesis space of the classifiers for the $t$-th task
$\mathcal{Z}$ | The latent tensor space
$P$ | The order of the latent tensor space
$J_1 \times \cdots \times J_P$ | The dimension of the latent tensor space
$R_{j_1 \cdots j_P}$ | For a tensor $\mathcal{A}$, it denotes the tensor rank of the $(j_1, \ldots, j_P)$-th subtensor $\mathcal{A}_{j_1 \cdots j_P}$
2.2. Multilinear Algebra
This section provides an overview of basic concepts of multilinear algebra, such as tensors and their operations over a set of vector spaces.
Definition 1
(Multilinear map [101]). Let $V_1, \ldots, V_M$ and $W$ be vector spaces over a field $\mathbb{F}$. And let $\phi: V_1 \times \cdots \times V_M \to W$ be a function that maps an ordered $M$-tuple of vectors $(\mathbf{v}^{(1)}, \ldots, \mathbf{v}^{(M)})$ into an element $\phi(\mathbf{v}^{(1)}, \ldots, \mathbf{v}^{(M)}) \in W$, where $\mathbf{v}^{(m)} \in V_m$. If, for all $\lambda \in \mathbb{F}$, $m \in \{1, \ldots, M\}$, and $\mathbf{u}, \mathbf{v}^{(m)} \in V_m$, Equation (1) holds, then $\phi$ is said to be a multilinear map (or an $M$-linear map); i.e., it is linear in each argument.

$$\phi\left( \mathbf{v}^{(1)}, \ldots, \lambda \mathbf{u} + \mathbf{v}^{(m)}, \ldots, \mathbf{v}^{(M)} \right) = \lambda\, \phi\left( \mathbf{v}^{(1)}, \ldots, \mathbf{u}, \ldots, \mathbf{v}^{(M)} \right) + \phi\left( \mathbf{v}^{(1)}, \ldots, \mathbf{v}^{(m)}, \ldots, \mathbf{v}^{(M)} \right) \quad (1)$$
Definition 2
(Tensor product). Let $V_1, \ldots, V_M$ and $W$ be real vector spaces, where $\dim(V_m) = I_m$ for $m \in \{1, \ldots, M\}$. Then, the tensor product of the set of $M$ vector spaces $\{V_m\}_{m=1}^{M}$, denoted as $V_1 \otimes \cdots \otimes V_M$, is another vector space of dimension $\prod_{m=1}^{M} I_m$, called a tensor space, together with a multilinear map $\otimes: V_1 \times \cdots \times V_M \to V_1 \otimes \cdots \otimes V_M$ that satisfies the following universal mapping property [101,102]: for any multilinear map $\phi: V_1 \times \cdots \times V_M \to W$, there exists a unique linear map $\Phi: V_1 \otimes \cdots \otimes V_M \to W$ such that $\phi = \Phi \circ \otimes$.
Definition 3
(Tensor). Let $V_1, \ldots, V_M$ be vector spaces over some field $\mathbb{F}$, where $\dim(V_m) = I_m$. An $M$-order tensor, denoted as $\mathcal{X}$, is an element in the tensor product $V_1 \otimes \cdots \otimes V_M$.
Definition 4
(m-mode matricization [30]). The $m$-mode matricization is a mapping that rearranges the $m$-mode fibers of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_M}$ into the columns of a matrix $\mathbf{X}_{(m)} \in \mathbb{R}^{I_m \times \prod_{n \neq m} I_n}$.
Definition 5
(Rank-one tensor). Let $\mathcal{X}$ be an $M$th-order tensor, and let $\{\mathbf{v}^{(m)}\}_{m=1}^{M}$ be a set of $M$ vectors, where $\mathbf{v}^{(m)} \in \mathbb{R}^{I_m}$ for all $m$. Then, if $\mathcal{X}$ can be written using the tensor product $\mathcal{X} = \mathbf{v}^{(1)} \otimes \cdots \otimes \mathbf{v}^{(M)}$, it is said to be a rank-one tensor, and its $(i_1, \ldots, i_M)$-th entry will be determined by $x_{i_1 \cdots i_M} = v^{(1)}_{i_1} \cdots v^{(M)}_{i_M}$.
Definition 6
(Tensor decomposition rank). The decomposition rank $R$ of a tensor $\mathcal{X}$ is the smallest number of rank-one tensors that reconstruct $\mathcal{X}$ exactly as their sum. Then, $\mathcal{X}$ is called a rank-$R$ tensor.
Definition 7
(Tensor multilinear rank). For any $M$th-order tensor $\mathcal{X}$, its multilinear rank, denoted as $\operatorname{rank}_{ML}(\mathcal{X})$, is the $M$-tuple $(R_1, \ldots, R_M)$, whose $m$th entry $R_m$ corresponds to the dimension of the column space of $\mathbf{X}_{(m)}$, i.e., $R_m = \operatorname{rank}(\mathbf{X}_{(m)})$, formally called the $m$-mode rank.
Definition 8
(Tensor m-mode product). Given a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_M}$ and a matrix $\mathbf{U} \in \mathbb{R}^{J \times I_m}$, their $m$-mode product, denoted as $\mathcal{X} \times_m \mathbf{U}$, produces another tensor $\mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_{m-1} \times J \times I_{m+1} \times \cdots \times I_M}$, whose $(i_1, \ldots, i_{m-1}, j, i_{m+1}, \ldots, i_M)$th entry is given by Equation (2). Therefore, $\mathbf{Y}_{(m)} = \mathbf{U} \mathbf{X}_{(m)}$.

$$y_{i_1 \cdots i_{m-1}\, j\, i_{m+1} \cdots i_M} = \sum_{i_m = 1}^{I_m} x_{i_1 \cdots i_M}\, u_{j i_m} \quad (2)$$
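For concreteness, the following is a minimal numpy sketch of the m-mode product via the matricization identity $\mathbf{Y}_{(m)} = \mathbf{U} \mathbf{X}_{(m)}$ stated above; the function name and shapes are illustrative, not from the paper.

```python
import numpy as np

# A minimal sketch of the m-mode tensor-matrix product of Definition 8.
def mode_product(X, U, m):
    """Contract mode m of tensor X with matrix U of shape (J, I_m)."""
    # Move mode m to the front, multiply the unfolding, and move it back:
    # Y_(m) = U @ X_(m), as stated after Equation (2).
    Xm = np.moveaxis(X, m, 0).reshape(X.shape[m], -1)   # m-mode matricization
    Y = U @ Xm                                          # J x prod(I_n, n != m)
    new_shape = (U.shape[0],) + tuple(np.delete(X.shape, m))
    return np.moveaxis(Y.reshape(new_shape), 0, m)

X = np.random.rand(3, 4, 5)
U = np.random.rand(7, 4)
Y = mode_product(X, U, m=1)
assert Y.shape == (3, 7, 5)
```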
2.3. Einstein and Hadamard Products
In this section, the fundamental concepts for the mathematical modeling of the MV-DTF layer are presented, including the Hadamard and Einstein products.
Definition 9
(Inner product). For any two tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, their inner product is defined as the sum of the products of their entries, as Equation (3) shows:

$$\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1 = 1}^{I_1} \cdots \sum_{i_N = 1}^{I_N} x_{i_1 \cdots i_N}\, y_{i_1 \cdots i_N} \quad (3)$$
Definition 10
(Hadamard product). The Hadamard product of two $N$th-order tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, denoted as $\mathcal{X} \odot \mathcal{Y}$, results in an $N$th-order tensor $\mathcal{Z} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ such that its $(i_1, \ldots, i_N)$th entry is equal to the element-wise product $x_{i_1 \cdots i_N}\, y_{i_1 \cdots i_N}$.
Definition 11
(Einstein product [100,103]). Given two tensors $\mathcal{A} \in \mathbb{R}^{J_1 \times \cdots \times J_P \times I_1 \times \cdots \times I_N}$ and $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, of order $P + N$ and $N$, their Einstein product, or tensor contraction, denoted as $\mathcal{A} *_N \mathcal{X}$, produces a $P$-order tensor $\mathcal{Y} \in \mathbb{R}^{J_1 \times \cdots \times J_P}$, whose $(j_1, \ldots, j_P)$th entry is given by the inner product between the subtensor $\mathcal{A}_{j_1 \cdots j_P} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ and $\mathcal{X}$, as Equation (4) shows:

$$y_{j_1 \cdots j_P} = \left\langle \mathcal{A}_{j_1 \cdots j_P}, \mathcal{X} \right\rangle = \sum_{i_1 = 1}^{I_1} \cdots \sum_{i_N = 1}^{I_N} a_{j_1 \cdots j_P i_1 \cdots i_N}\, x_{i_1 \cdots i_N} \quad (4)$$
The product $\mathcal{A} *_N \cdot$ can be understood as a linear map $\Phi: \mathbb{R}^{I_1 \times \cdots \times I_N} \to \mathbb{R}^{J_1 \times \cdots \times J_P}$; i.e., for any two scalars $\alpha, \beta \in \mathbb{R}$ and tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, the following properties hold:
Distributive: $\mathcal{A} *_N (\mathcal{X} + \mathcal{Y}) = \mathcal{A} *_N \mathcal{X} + \mathcal{A} *_N \mathcal{Y}$.
Homogeneity: $\mathcal{A} *_N (\alpha \mathcal{X}) = \alpha \left( \mathcal{A} *_N \mathcal{X} \right)$.
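The Einstein product of Definition 11 reduces to a single contraction over the trailing modes, which the following minimal numpy sketch illustrates; all shapes are illustrative.

```python
import numpy as np

# A minimal sketch of Definition 11: the Einstein product A *_N X contracts
# the last N modes of A with all N modes of X.
def einstein_product(A, X):
    N = X.ndim
    return np.tensordot(A, X, axes=N)        # contract trailing/leading N modes

J1, J2, I1, I2, I3 = 2, 3, 4, 5, 6
A = np.random.rand(J1, J2, I1, I2, I3)       # order P + N with P = 2, N = 3
X = np.random.rand(I1, I2, I3)
Y = einstein_product(A, X)                   # order-P tensor of shape (J1, J2)
assert Y.shape == (J1, J2)

# Entry-wise check against Equation (4): y_{j1 j2} = <A_{j1 j2}, X>.
assert np.isclose(Y[0, 1], np.sum(A[0, 1] * X))
```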
2.4. Subspace Learning
Recent advances in sensing and storage technologies have resulted in the generation of massive amounts of complex data, commonly referred to as big data [104,105]. These data are often represented in a high-dimensional space, making their visualization and analysis a challenging task. To address these challenges, subspace learning methods have emerged as a powerful approach to learning a low-dimensional representation of high-dimensional data [106,107], such as the spatial and temporal information encoded in videos. In this section, a brief review of linear and multilinear methods for subspace learning is presented, highlighting their advantages and disadvantages.
2.4.1. Linear Subspace Learning (LSL)
Given a dataset of $N$ samples, arranged in matrix form as $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N] \in \mathbb{R}^{I \times N}$, whose $n$-th column vector corresponds to the $n$-th sample $\mathbf{x}_n \in \mathbb{R}^{I}$, LSL seeks to find a linear subspace of $\mathbb{R}^{I}$ that best explains the data. The resulting subspace can be spanned by a set of $J$ linearly independent basis vectors $\{\mathbf{u}_j\}_{j=1}^{J}$, where $J \ll I$. By leveraging this subspace, high-dimensional data can be projected onto a lower-dimensional space, as Equation (5) shows:

$$\mathbf{Z} = \mathbf{U}^{\top} \mathbf{X} \quad (5)$$

where $\mathbf{U} \in \mathbb{R}^{I \times J}$ is called the factor matrix, whose columns correspond to the basis vectors, and $\mathbf{Z} \in \mathbb{R}^{J \times N}$ is the projection of the input matrix onto the subspace.
A wide variety of techniques have been proposed to address the LSL problem, ranging from unsupervised approaches, such as principal component analysis [108], factor analysis (FA) [109], independent component analysis [110], canonical correlation analysis [111], and singular value decomposition [112], to supervised approaches like linear discriminant analysis [113]. Such techniques aim to estimate $\mathbf{U}$ by solving optimization problems such as maximizing the variance or minimizing the reconstruction error of the projected data.
Although LSL methods have shown great effectiveness in modeling vector-based observations, they face difficulties when addressing multidimensional data. Then, to apply LSL methods on tensor data, it is necessary to vectorize them. Unfortunately, this transformation very often leads to a computationally intractable problem due to the large number of parameters to be estimated, and the model may suffer from overfitting. Furthermore, vectorization also destroys the inherent multidimensional structure and correlations across modes of tensor data [30,106].
2.4.2. Multilinear Subspace Learning (MSL)
Multilinear subspace learning is a mathematical framework for exploring, analyzing, and modeling complex relationships over tensor data, preserving their inherent multidimensional structure. According to Lu [106], the MSL problem can be formulated as follows: Given a dataset arranged in tensor form as $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_M \times N}$, where subtensor $\mathcal{X}_n \in \mathbb{R}^{I_1 \times \cdots \times I_M}$ corresponds with the $n$-th data point, MSL seeks to find a set of $M$ subspaces that best explains the data, where the $m$th subspace resides in $\mathbb{R}^{I_m}$ and is spanned by a set of $J_m \ll I_m$ linearly independent basis vectors. The MSL problem can be formally defined using Equation (6):

$$\left\{ \mathbf{U}^{(m)} \right\}_{m=1}^{M} = \arg\max_{\mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(M)}} \Psi\left( \mathcal{X} \times_1 \mathbf{U}^{(1)\top} \times_2 \cdots \times_M \mathbf{U}^{(M)\top} \right) \quad (6)$$

where $\mathbf{U}^{(m)} \in \mathbb{R}^{I_m \times J_m}$ is a matrix whose columns correspond to the basis vectors of the $m$-th subspace, and $\Psi$ denotes a function to be maximized.
A classical MSL technique is the Tucker decomposition [30], which aims to approximate a given $M$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_M}$ by a core tensor $\mathcal{G}$ multiplied along the $m$-mode by a matrix $\mathbf{U}^{(m)}$, for all $m \in \{1, \ldots, M\}$, as Equation (7) shows:

$$\mathcal{X} \approx \mathcal{G} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \times_3 \cdots \times_M \mathbf{U}^{(M)} \quad (7)$$

where $\mathbf{U}^{(m)} \in \mathbb{R}^{I_m \times J_m}$ is the $m$-th factor matrix associated with the $m$-mode fiber space of $\mathcal{X}$, the core tensor $\mathcal{G} \in \mathbb{R}^{J_1 \times \cdots \times J_M}$ captures the level of interaction on each factor matrix, and $J_m \leq I_m$.
Similarly, canonical polyadic decomposition [30] aims to approximate a given $M$th-order tensor $\mathcal{X}$ as a sum of $R$ rank-one tensors, as Equation (8) shows:

$$\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r\, \mathbf{u}_r^{(1)} \otimes \mathbf{u}_r^{(2)} \otimes \cdots \otimes \mathbf{u}_r^{(M)} \quad (8)$$

where $\lambda_r$ is the $r$-th weighting term, and $\mathbf{u}_r^{(m)} \in \mathbb{R}^{I_m}$ is the $m$-mode factor vector for the $r$-th rank-one tensor, while Equation (8) is exact iff $R$ is the decomposition rank.
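To make Equation (8) concrete, the following is a minimal numpy sketch that reconstructs a tensor from given CPD weights and factor matrices; the names and shapes are illustrative, not the paper's code.

```python
import numpy as np

# A minimal sketch of Equation (8): reconstructing a tensor from a rank-R CPD.
def cpd_reconstruct(weights, factors):
    """weights: (R,); factors: list of M arrays, the m-th of shape (I_m, R)."""
    shape = tuple(f.shape[0] for f in factors)
    X = np.zeros(shape)
    for r in range(weights.shape[0]):
        # Outer product of the r-th column of every factor matrix.
        rank_one = factors[0][:, r]
        for f in factors[1:]:
            rank_one = np.multiply.outer(rank_one, f[:, r])
        X += weights[r] * rank_one
    return X

R = 3
factors = [np.random.rand(I, R) for I in (4, 5, 6)]
X = cpd_reconstruct(np.ones(R), factors)      # a tensor of rank at most 3
assert X.shape == (4, 5, 6)
```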
While MSL effectively mitigates several drawbacks related to LSL methods, it has some disadvantages. First, the intricate mathematical operations required for MSL methods very often involve high computational complexity, impacting both time and storage requirements. Moreover, MSL requires a substantial amount of data to effectively capture the intricate relationships of multilinear subspaces. Therefore, addressing these challenges is crucial to ensuring proper learning.
3. Problem Statement and Mathematical Definition
In this section, the problem to be addressed is formulated in natural language, outlining specific tasks related to VTS systems. Subsequently, the inherent challenges are mathematically formulated.
3.1. Problem Statement
Given a traffic surveillance video of $\tau$ seconds, recorded with a static camera, where multiple moving vehicles are observed, we aim to comprehensively model vehicle traffic using a multitask, multi-view learning approach. This model simultaneously addresses various tasks, such as vehicle detection, classification, and occlusion detection, each of them represented by specific views that partially describe the underlying problem. By projecting multi-view data into a unified, low-dimensional latent tensor space, which builds a new input space for the tasks, our approach should improve the model performance and provide a more comprehensive representation of different study cases, e.g., the traffic scene, compared to single-task, single-view learning models.
3.2. Mathematical Definition
3.2.1. Multitask, Multi-View Dataset: The Input and Output Spaces
Consider a collection of $T$ supervised classification tasks related to VTS systems, such as vehicle detection, classification, and occlusion detection, where, to the $t$-th task, corresponds a dataset $\mathcal{D}^{(t)}$ composed of $K_t$ $M$-view labeled instances, e.g., moving vehicles, as Equation (9) shows:

$$\mathcal{D}^{(t)} = \left\{ \left( \left( \mathbf{x}_k^{(1)}, \ldots, \mathbf{x}_k^{(M)} \right), \mathbf{y}_k^{(t)} \right) \right\}_{k=1}^{K_t} \quad (9)$$

where $\mathbf{x}_k^{(m)}$ is the feature vector of the $k$-th instance over the $m$-th view and $t$-th task, belonging to the feature space $\mathcal{X}^{(m)}$, i.e., $\mathbf{x}_k^{(m)} \in \mathcal{X}^{(m)} \subseteq \mathbb{R}^{I_m}$; the $M$-tuple $\left( \mathbf{x}_k^{(1)}, \ldots, \mathbf{x}_k^{(M)} \right)$ is an element of the input data space $\mathcal{X}^{(1)} \times \cdots \times \mathcal{X}^{(M)}$; and $\mathbf{y}_k^{(t)}$ is its corresponding true label in an output space $\mathcal{Y}^{(t)}$.
3.2.2. Task Functions
For the $t$-th task, we aim to learn a multi-view classification function $f_t: \mathcal{X}^{(1)} \times \cdots \times \mathcal{X}^{(M)} \to \mathcal{Y}^{(t)}$ that predicts, with high probability, the true label of the $k$-th instance, as Equation (10) shows, where $f_t$ belongs to some hypothesis space $\mathcal{H}_t$.

$$\hat{\mathbf{y}}_k^{(t)} = f_t\left( \mathbf{x}_k^{(1)}, \ldots, \mathbf{x}_k^{(M)} \right) \quad (10)$$

Consequently, the dimension of the output space $\mathcal{Y}^{(t)}$ represents the number of classes in the $t$-th learning task.
3.2.3. The Parametric Model
Considering the high dimension of the input data space, it seems reasonable to project multi-view data onto a low-dimensional latent space $\mathcal{Z}$ by learning some mapping $g: \mathcal{X}^{(1)} \times \cdots \times \mathcal{X}^{(M)} \to \mathcal{Z}$, as Equation (11) shows:

$$\mathcal{Z}_k = g\left( \mathbf{x}_k^{(1)}, \ldots, \mathbf{x}_k^{(M)} \right) \quad (11)$$

where $\mathcal{Z}_k \in \mathcal{Z}$ is the projection of the $k$-th instance, and $\mathcal{Z}$ can be either unidimensional (e.g., $\mathbb{R}^{J}$) or multidimensional (e.g., $\mathbb{R}^{J_1 \times \cdots \times J_P}$). If we need a more efficient mapping, a low-rank approximation function $\hat{g}$ is required instead of $g$.
Let $h_t: \mathcal{Z} \to \mathcal{Y}^{(t)}$ be the $t$-th task-specific mapping that predicts the label from the $k$-th instance embedded in the latent space $\mathcal{Z}$, as shown in Equation (12), where $h_t$ can be represented by, e.g., ANN, SVM, or random forest (RF) algorithms. In consequence, the function composition $f_t = h_t \circ g$ can determine the $t$-th task function.

$$\hat{\mathbf{y}}_k^{(t)} = h_t\left( \mathcal{Z}_k \right) \quad (12)$$
3.2.4. The Optimization Problem
For a given multitask, multi-view dataset $\{\mathcal{D}^{(t)}\}_{t=1}^{T}$, our problem can be reduced to learning simultaneously the set of functions $\{f_t\}_{t=1}^{T}$ that minimizes the multi-objective empirical risk of Equation (13) [114]:

$$\min_{f_1, \ldots, f_T} \sum_{t=1}^{T} \alpha_t\, \frac{1}{K_t} \sum_{k=1}^{K_t} \mathcal{L}_t\left( \mathbf{y}_k^{(t)}, f_t\left( \mathbf{x}_k^{(1)}, \ldots, \mathbf{x}_k^{(M)} \right) \right) \quad (13)$$

where $f_t$ belongs to some hypothesis space $\mathcal{H}_t$, $\mathcal{L}_t$ is the loss function related to the $t$-th task that measures the discrepancy between the true label and the predicted one, and $\alpha_t > 0$ is a weighting parameter, determined either statically or dynamically, which controls the relative importance of the $t$-th task.
3.2.5. Objectives
The main objectives are as follows:
For a multi-view input space of $M$ views, to learn a mapping $g: \mathcal{X}^{(1)} \times \cdots \times \mathcal{X}^{(M)} \to \mathcal{Z}$, where $\mathcal{Z}$ is a low-dimensional latent tensor space with $\dim(\mathcal{Z}) = J_1 \times \cdots \times J_P$ or $J$ (see Section 5.1, particularly Equation (20)).
To reduce the computational complexity of $g$, a low-rank approximation $\hat{g}$ needs to be learned.
For a set of $T$ tasks, e.g., VTS tasks, the set of task-specific functions $\{h_t\}_{t=1}^{T}$ must be learned, where $h_t: \mathcal{Z} \to \mathcal{Y}^{(t)}$, and $\mathcal{Y}^{(t)}$ is the output space of the $t$-th task.
To evaluate the performance of our approach, a multitask, multi-view model for the case study of VTS systems (see Section 6.2) is employed.
4. Vehicle Traffic Surveillance System: Multitask, Multi-View Input Space Formation
In this section, we provide a general description of several tasks associated with a typical vision-based VTS system, including background and foreground segmentation, occlusion handling, and vehicle-size classification. Together, these tasks enable the estimation of traffic parameters, such as traffic density, vehicle count, and lane occupancy, inferred from the video. Specifically, these parameters are essential for high-level ITS applications.
4.1. Background and Foreground Segmentation
Let $\mathcal{V} \in \mathbb{R}^{W \times H \times B \times N_f}$ be a fourth-order tensor representing a traffic surveillance video, recorded at a frame rate $F$ with a duration of $\tau$ seconds. Here, $W$ and $H$ represent the image spatial dimensions, corresponding to width and height, respectively, and $B$ is the dimensionality of the image spectral coordinate system, i.e., the color space in which each pixel lives, or the number of spectral bands in hyper-spectral imaging (HSI). For example, $B = 1$ corresponds to grayscale, while $B = 3$ corresponds to the RGB color space. Finally, $N_f = \lceil F \tau \rceil$ denotes the number of frames in the video.
From the aforementioned tensor $\mathcal{V}$, it is important to note the following:
The $n$th frontal slice $\mathcal{V}(:, :, :, n)$ represents the $n$th frame of the video at time $t_n = n / F$.
The third-mode fiber $\mathcal{V}(w, h, :, n)$ denotes the $(w, h)$th pixel value at frame $n$, where $(w, h)$ is the pixel location belonging to the image spatial domain $\{1, \ldots, W\} \times \{1, \ldots, H\}$.
Each pixel value is quantized using $D$ bits per spectral band. For simplicity, here, we assume the 8-bit grayscale color space, i.e., $B = 1$ and $D = 8$, but it can be extended to other color spaces. Consequently, $\mathcal{V}$ reduces to a third-order tensor $\mathcal{V} \in \mathbb{R}^{W \times H \times N_f}$.
Every $(w, h)$th pixel value can be modeled as a discrete random variable $X_{wh}$ with a probability mass function (pmf), denoted as $p_{X_{wh}}(x)$, where $x \in \{0, 1, \ldots, 2^D - 1\}$.
For any observation time $t_n$, the pmf of any pixel can be estimated, denoted as $\hat{p}_{X_{wh}}(x; t_n)$.
Then, tensor $\mathcal{V}$ can be decomposed as Equation (14) shows and Figure 1 illustrates:

$$\mathcal{V} = \mathcal{B} \odot \overline{\mathcal{M}} + \mathcal{F} \quad (14)$$

where $\mathcal{B}$ is called the background tensor, $\mathcal{F}$ is the foreground tensor, and $\mathcal{M} \in \mathbb{B}^{W \times H \times N_f}$ is the binary mask of the foreground tensor, whose $(w, h, n)$-th entry equals 1 if the $(w, h)$th pixel value of $\mathcal{V}$ at frame $n$ is part of the foreground tensor $\mathcal{F}$ and 0 otherwise; $\overline{\mathcal{M}}$ is the complement of $\mathcal{M}$, and $\mathcal{F}$ can be obtained from the Hadamard product $\mathcal{V} \odot \mathcal{M}$.
4.2. Blob Formation
After decomposing into the background and foreground tensors, various moving objects, including vehicles, pedestrians, and cyclists, can be extracted by analyzing or its mask, . One such technique is called connected components analysis (CCA) [115,116]. CCA recursively searches at every nth frontal slice for connected pixel regions (see Definition 12), referred to in the literature as binary large objects (blobs), which can contain pixels associated with moving objects.
Definition 12
(Blob). A blob, denoted as $S$, is a set of pixel locations connected by a specified connectivity criterion (e.g., four-connectivity or eight-connectivity [117]). Specifically, a pixel located at $(w, h)$ belongs to blob $S$ if there exists another pixel location $(w', h') \in S$ such that the connectivity criterion is met, as Equation (15) shows:

$$d\left( (w, h), (w', h') \right) \leq \epsilon \quad (15)$$

where $d(\cdot, \cdot)$ is an inter-pixel distance that establishes the connectivity criterion given some threshold value $\epsilon > 0$.
For every blob $S_u$ detected at frame $n$, a blob mask $\mathcal{M}_u \in \mathbb{B}^{W \times H}$ can be formed, whose entries are given by Equation (16). Note that the pixel values of blob $S_u$ can be obtained from the product $\mathcal{V}(:, :, n) \odot \mathcal{M}_u$.

$$\mathcal{M}_u(w, h) = \begin{cases} 1 & \text{if } (w, h) \in S_u \\ 0 & \text{otherwise} \end{cases} \quad (16)$$
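As an illustration of blob formation, the following sketch extracts per-blob masks in the sense of Equation (16) from a binary foreground mask via CCA, assuming OpenCV; the random mask, eight-connectivity, and area threshold are illustrative choices, not values from the paper.

```python
import cv2
import numpy as np

# A minimal sketch of blob formation via connected components analysis (CCA).
foreground_mask = (np.random.rand(240, 420) > 0.95).astype(np.uint8)  # stand-in for M(:,:,n)

# 8-connectivity CCA over the binary foreground mask of frame n.
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(
    foreground_mask, connectivity=8)

blobs = []
for u in range(1, num_labels):                  # label 0 is the background
    if stats[u, cv2.CC_STAT_AREA] < 20:         # discard spurious tiny blobs
        continue
    blob_mask = (labels == u).astype(np.uint8)  # Equation (16): per-blob mask
    blobs.append(blob_mask)
```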
4.3. Vehicle Feature Extraction and Selection
Feature extraction can be considered a mapping $\varphi$ that transforms a given blob $S$ into a low-dimensional point $\mathbf{x}$, called the feature vector, as shown in Equation (17):

$$\mathbf{x} = \varphi(S) \quad (17)$$

where $\mathbf{x} \in \mathcal{X} \subseteq \mathbb{R}^{I}$, and the feature space $\mathcal{X}$ captures specific aspects of blobs, e.g., color, shape, or texture.
The image moments (IMs) are a classical hand-crafted feature extractor that provides information about the spatial distribution, shape, and intensity of a blob image. Typical features extracted via the IM include the centroid, area, orientation, and eccentricity. Formally, the $(i, j)$-th raw IM for blob $S$ is given by the bilinear map of Equation (18):

$$m_{ij} = \mathbf{a}_i^{\top}\, \mathcal{M}_S\, \mathbf{b}_j \quad (18)$$

where $\mathcal{M}_S \in \mathbb{B}^{W \times H}$ is the blob mask of $S$, and $\mathbf{a}_i \in \mathbb{R}^{W}$, $\mathbf{b}_j \in \mathbb{R}^{H}$ are vectors whose $w$-th and $h$-th entries are $w^i$ and $h^j$, respectively.
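A minimal numpy sketch of Equation (18) is given below, computing raw moments of a blob mask as a bilinear form; the 1-based coordinate convention is an illustrative assumption.

```python
import numpy as np

# A minimal sketch of the raw image moments of Equation (18) as a bilinear map.
def raw_moment(blob_mask, i, j):
    W, H = blob_mask.shape
    a = np.arange(1, W + 1, dtype=float) ** i   # entries w^i
    b = np.arange(1, H + 1, dtype=float) ** j   # entries h^j
    return a @ blob_mask @ b                    # bilinear form a^T M_S b

mask = np.zeros((8, 8)); mask[2:5, 3:7] = 1.0   # a small rectangular blob
area = raw_moment(mask, 0, 0)                   # m_00: blob area (12 pixels)
cw = raw_moment(mask, 1, 0) / area              # centroid along width
ch = raw_moment(mask, 0, 1) / area              # centroid along height
```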
4.4. Vehicle Occlusion Task
Assuming there are $V_n$ vehicles on the road at the $n$-th frame, each associated with a specific blob $S_v$, let $\{S_v\}_{v=1}^{V_n}$ denote the set of these blobs, and let $\{\tilde{S}_u\}_{u=1}^{U_n}$ be the set of blobs detected via CCA in the $n$-th frame, where $U_n \leq V_n$. The $v$-th vehicle, with blob $S_v$, is occluded by the $u$-th detected blob if the conditions in Equation (19) are met.

$$S_v \cap \tilde{S}_u \neq \emptyset \quad \text{and} \quad S_v \neq \tilde{S}_u \quad (19)$$

Given the set of detected blobs in the $n$-th frame, vehicle occlusion detection aims to predict, with high probability, the set of blobs each containing more than one vehicle. To achieve this, an occlusion feature space $\mathcal{X}^{occ}$ is constructed using a feature extraction mapping $\varphi^{occ}$ to capture the vehicle occlusion patterns. In this space, every detected blob $\tilde{S}_u$ is represented by an $I^{occ}$-dimensional feature vector. Assuming occlusions are only composed of partially observed vehicles, a classification function can be built to predict whether a detected blob $\tilde{S}_u$ contains more than one vehicle.
4.5. Vehicle Classification Task
Given a set of vehicle-size labels (e.g., {small, midsize, large, very large}) represented in a vector space $\mathcal{Y}$, called the output space, the vehicle classification task aims to predict, with high probability, the true label for an unseen vehicle blob instance $S_v$ at frame $n$. First, each blob $S_v$ is mapped into some feature space $\mathcal{X}^{cls}$ using a feature extraction mapping $\varphi^{cls}$, constructed to explain the vehicle-size patterns. From this space, a feature vector $\mathbf{x}_v$ associated with $S_v$ is derived. Then, a classification function $f^{cls}: \mathcal{X}^{cls} \to \mathcal{Y}$ can be built to predict the label of a vehicle blob instance.
5. A Multi-View Data Tensor Fusion Layer and the Connection Between the Einstein and Hadamard Products
In this section, the concept of a multi-view data tensor fusion (MV-DTF) layer and its connection with Einstein and Hadamard products are introduced. Basically, MV-DTF is a form of an FC layer for multi-view data; i.e., it is an affine function, but instead of using a linear map, our layer employs a multilinear map to encode the interactions across views. Additionally, a low-rank approximation for the MV-DTF layer is also proposed to reduce its computational complexity.
5.1. Multi-View Tensor Data Fusion Layer: The Mapping g as an Einstein Product
Inspired by previous works [36,75,76], we restrict the function space of the MV-DTF layer to the affine functions characterized by a multilinear map $\phi: \mathbb{R}^{I_1} \times \cdots \times \mathbb{R}^{I_M} \to \mathbb{R}^{J_1 \times \cdots \times J_P}$, followed by a translation and, possibly, a non-linear map $\sigma$, as Equation (20) shows:

$$\mathcal{Z}_k = g\left( \mathbf{x}_k^{(1)}, \ldots, \mathbf{x}_k^{(M)} \right) = \sigma\left( \phi\left( \mathbf{x}_k^{(1)}, \ldots, \mathbf{x}_k^{(M)} \right) + \mathcal{B} \right) \quad (20)$$

where $g$ is the MV-DTF layer; the $P$-order tensor $\mathcal{Z}_k$, called the fused tensor, is the projection of the $k$-th instance onto the latent tensor space $\mathcal{Z}$ with dimension $J_1 \times \cdots \times J_P$; $\mathcal{B} \in \mathbb{R}^{J_1 \times \cdots \times J_P}$ is the translational term, formally called the bias; and the mapping $\sigma$ acts element-wise.
Definition 13 specifies how a multilinear map can be represented using coordinate systems, and from this representation, a tensor can be induced for every multilinear map.
Definition 13
(Coordinate representation of a multilinear map [101]). Let $V_1, \ldots, V_M$ and $W$ be real vector spaces, where $\dim(V_m) = I_m$ for all $m$, and $\dim(W) = J$. Let $\{\mathbf{e}_j\}_{j=1}^{J}$ be the standard basis for $W$. And let $\phi: V_1 \times \cdots \times V_M \to W$ be a multilinear map. Given an ordered $M$-tuple $\left( \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(M)} \right)$, where $\mathbf{x}^{(m)} \in V_m$, the map is completely determined by a linear combination of basis vectors $\mathbf{e}_j$ and scalars $a_{j i_1 \cdots i_M}$, as Equation (21) shows.

$$\phi\left( \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(M)} \right) = \sum_{j=1}^{J} \left( \sum_{i_1=1}^{I_1} \cdots \sum_{i_M=1}^{I_M} a_{j i_1 \cdots i_M}\, x_{i_1}^{(1)} \cdots x_{i_M}^{(M)} \right) \mathbf{e}_j \quad (21)$$

The collection of scalars can then be arranged into an $(M+1)$th-order tensor, denoted as $\mathcal{A}$, which determines $\phi$, and whose $(j, i_1, \ldots, i_M)$-th entry corresponds with $a_{j i_1 \cdots i_M}$.
Next, Definition 14 establishes a connection between the Einstein product and multilinear maps via the universal property of multilinear maps (see Definition 2).
Definition 14.
Let $\{\mathbf{x}^{(m)}\}_{m=1}^{M}$ be a set of vectors, where $\mathbf{x}^{(m)} \in \mathbb{R}^{I_m}$ for all $m$. And let $\phi$ be the multilinear map induced via the tensor $\mathcal{A}$, and $\otimes$ the multilinear map associated with the tensor product $\mathbb{R}^{I_1} \otimes \cdots \otimes \mathbb{R}^{I_M}$. For $\mathcal{A}$, the Einstein product $\mathcal{A} *_M \cdot$ can be understood as a linear map, $\Phi: \mathbb{R}^{I_1 \times \cdots \times I_M} \to \mathbb{R}^{J}$. Then, $\phi$ and $\Phi$ are related by the universal property of multilinear maps, as Equation (22) shows.

$$\phi\left( \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(M)} \right) = \left( \Phi \circ \otimes \right)\left( \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(M)} \right) = \mathcal{A} *_M \left( \mathbf{x}^{(1)} \otimes \cdots \otimes \mathbf{x}^{(M)} \right) \quad (22)$$
For the multilinear map $\phi$ in Equation (20), Definition 13 ensures the existence of a tensor $\mathcal{A} \in \mathbb{R}^{J_1 \times \cdots \times J_P \times I_1 \times \cdots \times I_M}$ that determines $\phi$, and Definition 14 provides the associated linear map of $\phi$. From the above definitions, Equation (20) can be rewritten in tensor form as Equation (23) shows, where $\mathcal{X}_k = \mathbf{x}_k^{(1)} \otimes \cdots \otimes \mathbf{x}_k^{(M)}$ is the rank-one tensor resulting from the tensor product of the $M$ view vectors associated with the $k$-th instance.

$$\mathcal{Z}_k = g\left( \mathbf{x}_k^{(1)}, \ldots, \mathbf{x}_k^{(M)} \right) = \sigma\left( \mathcal{A} *_M \mathcal{X}_k + \mathcal{B} \right) \quad (23)$$

Note that Equation (23) represents a differentiable expression with respect to tensors $\mathcal{A}$ and $\mathcal{B}$. Consequently, their values can be learned using optimization algorithms such as stochastic gradient descent (SGD), where the number of parameters to learn, denoted as $L$, corresponds with the number of entries of tensors $\mathcal{A}$ and $\mathcal{B}$, as Equation (24) shows. Note that $L$ scales exponentially with the number of views, $M$, and the order $P$ of $\mathcal{Z}$. Specifically, for $P = 1$, $L$ is reduced to $J\left( \prod_{m=1}^{M} I_m + 1 \right)$. This exponential growth can lead to computational challenges while increasing the risk of overfitting due to the induced curse of dimensionality [118,119,120]; i.e., the number of samples needed to train a model grows exponentially with its dimension.

$$L = \underbrace{\prod_{p=1}^{P} J_p \prod_{m=1}^{M} I_m}_{\text{entries of } \mathcal{A}} + \underbrace{\prod_{p=1}^{P} J_p}_{\text{entries of } \mathcal{B}} \quad (24)$$
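To make Equations (23) and (24) concrete, the following is a minimal PyTorch sketch of the full MV-DTF layer for $M = 3$ views and $P = 1$, where the Einstein product with the rank-one tensor is expressed as a single einsum; the shapes, names, and tanh activation are illustrative.

```python
import torch

# A minimal sketch of the full MV-DTF layer of Equation (23), M = 3, P = 1.
I1, I2, I3, J = 4, 3, 2, 8
A = torch.randn(J, I1, I2, I3, requires_grad=True)   # tensor of the multilinear map
b = torch.zeros(J, requires_grad=True)               # bias (translation term)

x1, x2, x3 = torch.randn(I1), torch.randn(I2), torch.randn(I3)

# Einstein product of A with the rank-one tensor x1 (x) x2 (x) x3:
# contract the last M = 3 modes of A with the three view vectors.
z = torch.einsum('jabc,a,b,c->j', A, x1, x2, x3) + b
z = torch.tanh(z)                                    # optional non-linear map sigma
```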
5.2. Hadamard Products of Einstein Products and Low-Rank Approximation Mapping
Low-rank approximation is a well-known technique that not only allows for reducing model parameter storage requirements but also helps in alleviating the computational burden of neural network models [81,82,85,86,87,88,89,121]. Based on these facts, in this work, we explore a CPD-based low-rank structure, illustrated in Figure 2, to overcome the curse of dimensionality induced via the MV-DTF layer. This structure helps reduce the number of parameters required for the MV-DTF layer, and it is computationally more efficient (see Proposition 1). But before presenting this structure, the concept of Hadamard factor tensors is first introduced in Definition 15.
Definition 15
(Hadamard factor tensors). Let $\mathcal{A} \in \mathbb{R}^{J_1 \times \cdots \times J_P \times I_1 \times \cdots \times I_M}$ be a $(P+M)$-order tensor, whose $(j_1, \ldots, j_P)$-th subtensor $\mathcal{A}_{j_1 \cdots j_P} \in \mathbb{R}^{I_1 \times \cdots \times I_M}$ results from fixing every index but the last $M$ modes, i.e., for all $j_p \in \{1, \ldots, J_p\}$, and can be approximated as a rank-$R_{j_1 \cdots j_P}$ tensor using the CPD, as Equation (25) shows:

$$\mathcal{A}_{j_1 \cdots j_P} \approx \sum_{r=1}^{R_{j_1 \cdots j_P}} \mathbf{u}_r^{(1, j_1 \cdots j_P)} \otimes \cdots \otimes \mathbf{u}_r^{(M, j_1 \cdots j_P)} \quad (25)$$

where the number of subtensors in $\mathcal{A}$ corresponds to the dimension of the latent space $\mathcal{Z}$, i.e., $\prod_{p=1}^{P} J_p$; each $(j_1, \ldots, j_P)$-th subtensor has a specific rank $R_{j_1 \cdots j_P}$, which can be different across subtensors; and, for the $m$-mode factor vector $\mathbf{u}_r^{(m, j_1 \cdots j_P)} \in \mathbb{R}^{I_m}$, the superscripts identify the $(j_1, \ldots, j_P)$-th subtensor to which it corresponds and its mode $m$, while the subscript identifies its associated $r$-th rank-one tensor in the CPD. Then, the set of factor vectors along the $m$-mode can be rearranged into a $(P+2)$-order tensor $\mathcal{U}^{(m)} \in \mathbb{R}^{J_1 \times \cdots \times J_P \times R \times I_m}$, here called the $m$-mode Hadamard factor tensor, whose $(P+2)$-mode fibers are given by Equation (26):

$$\mathcal{U}^{(m)}\left( j_1, \ldots, j_P, r, : \right) = \begin{cases} \mathbf{u}_r^{(m, j_1 \cdots j_P)} & \text{if } r \leq R_{j_1 \cdots j_P} \\ \mathbf{0} & \text{otherwise} \end{cases} \quad (26)$$

where $\mathbf{0}$ is the zero vector, and $R = \max_{j_1, \ldots, j_P} R_{j_1 \cdots j_P}$ is the maximum rank across subtensors, employed to avoid inconsistencies due to different rank values between subtensors.
Figure 2 illustrates the concept of Hadamard factor tensors for the multilinear map $\phi$ with associated tensor $\mathcal{A}$. Here, there are two-view data ($M = 2$) with dimensions $I_1$ and $I_2$, respectively; the order and dimension of the latent tensor space are $P = 1$ and $J = 3$; and hence, there are three subtensors, $\mathcal{A}_1$, $\mathcal{A}_2$, and $\mathcal{A}_3$, associated with the tensor $\mathcal{A}$. Subtensor $\mathcal{A}_1$ has rank $R_1 = 3$; subtensor $\mathcal{A}_2$ has rank $R_2 = 1$; and subtensor $\mathcal{A}_3$ has rank $R_3 = 2$. From these factor vectors, two Hadamard factor tensors, $\mathcal{U}^{(1)}$ and $\mathcal{U}^{(2)}$, can be constructed, corresponding to the first and second views, respectively. The second-mode dimension of these tensors corresponds to the greatest subtensor rank, i.e., $R = 3$, to avoid heterogeneous rank values across subtensors. Hence, the second and third subtensors incorporate two and one additional zero vectors, respectively, as Figure 2 shows.
Proposition 1 presents the primary result of this work, i.e., the mathematical relationship between the Einstein and Hadamard products. To the best of our knowledge, this relationship has not been reported previously.
Proposition 1.
Let $\mathcal{X} = \mathbf{x}^{(1)} \otimes \cdots \otimes \mathbf{x}^{(M)}$ be a rank-one tensor, where $\mathbf{x}^{(m)} \in \mathbb{R}^{I_m}$ for all $m$. And let $\mathcal{A} \in \mathbb{R}^{J_1 \times \cdots \times J_P \times I_1 \times \cdots \times I_M}$ be a $(P+M)$-order tensor induced via the multilinear map $\phi$, which can be decomposed into a set of $M$ factor tensors $\{\mathcal{U}^{(m)}\}_{m=1}^{M}$ for a given rank $R$, where $\mathcal{U}^{(m)} \in \mathbb{R}^{J_1 \times \cdots \times J_P \times R \times I_m}$ for all $m$. Then, $\mathcal{A} *_M \mathcal{X}$ can be approximated by a sum of $R$ Hadamard products of Einstein products, as Equation (27) shows:

$$\mathcal{A} *_M \mathcal{X} \approx \left( \bigodot_{m=1}^{M} \left( \mathcal{U}^{(m)} *_1 \mathbf{x}^{(m)} \right) \right) *_1 \mathbf{1}_R \quad (27)$$

where $\mathcal{U}^{(m)}$ is the $m$-mode Hadamard factor tensor, $\mathbf{1}_R \in \mathbb{R}^{R}$ denotes a vector of all ones, and $\bigodot$ denotes the repeated Hadamard product over the $M$ views.
Proof.
In Appendix A. □
By leveraging Proposition 1 for tensor $\mathcal{A}$ in Equation (23), the MV-DTF layer can be approximated through a more efficient low-rank mapping $\hat{g}$, called the low-rank multi-view data tensor fusion (LRMV-DTF) layer, defined in Equation (28), where the $m$-mode factor tensor $\mathcal{U}^{(m)}$, associated with the $m$-th view, contributes to building every $k$-th fused tensor $\mathcal{Z}_k$.

$$\mathcal{Z}_k = \hat{g}\left( \mathbf{x}_k^{(1)}, \ldots, \mathbf{x}_k^{(M)} \right) = \sigma\left( \left( \bigodot_{m=1}^{M} \left( \mathcal{U}^{(m)} *_1 \mathbf{x}_k^{(m)} \right) \right) *_1 \mathbf{1}_R + \mathcal{B} \right) \quad (28)$$

From this approximation, the number of parameters required for the LRMV-DTF layer, denoted as $L_R$, is provided in Equation (29). Note that the product of the $I_m$-dimensions related to the views in $L$ (Equation (24)) has been replaced with a summation, which yields fewer parameters to learn compared to those in the MV-DTF layer, reducing the risk of overfitting.

$$L_R = \prod_{p=1}^{P} J_p \left( R \sum_{m=1}^{M} I_m + 1 \right) \quad (29)$$
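As a numerical sanity check on Proposition 1, the following PyTorch sketch builds a tensor $\mathcal{A}$ whose subtensors have exact rank $R$ from two Hadamard factor tensors and verifies that the right-hand side of Equation (27) reproduces the full Einstein product; all shapes are illustrative.

```python
import torch

# A minimal check of Proposition 1 / Equation (28) for M = 2 views and P = 1.
J, R, I1, I2 = 3, 2, 4, 5
U1 = torch.randn(J, R, I1)                       # 1-mode Hadamard factor tensor
U2 = torch.randn(J, R, I2)                       # 2-mode Hadamard factor tensor

# Build A from its rank-R subtensors: A[j] = sum_r u_r^(1,j) (x) u_r^(2,j).
A = torch.einsum('jra,jrb->jab', U1, U2)         # shape (J, I1, I2)

x1, x2 = torch.randn(I1), torch.randn(I2)

# Left-hand side: full Einstein product with the rank-one tensor x1 (x) x2.
lhs = torch.einsum('jab,a,b->j', A, x1, x2)

# Right-hand side: Hadamard product of per-view Einstein products, summed over r.
rhs = ((U1 @ x1) * (U2 @ x2)).sum(dim=1)         # (J, R) * (J, R) -> sum over R
assert torch.allclose(lhs, rhs, atol=1e-5)       # exact here: A is built from its factors
```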
An illustration of our layers is shown in Figure 3a (MV-DTF) and Figure 3b (LRMV-DTF). Here, the number of views is $M = 2$, with dimensions $I_1$ and $I_2$, respectively. The order of the latent space is $P = 1$, and its dimension is $J$. Consequently, the multilinear map is $\phi: \mathbb{R}^{I_1} \times \mathbb{R}^{I_2} \to \mathbb{R}^{J}$, with associated tensor $\mathcal{A} \in \mathbb{R}^{J \times I_1 \times I_2}$ and bias $\mathbf{b} \in \mathbb{R}^{J}$. However, the bias vector is fixed to zero for simplicity. For the low-rank approximation, the rank of every $j$-th subtensor is $R$. Hence, according to Definition 15, tensor $\mathcal{A}$ can be decomposed into two factor tensors, $\mathcal{U}^{(1)} \in \mathbb{R}^{J \times R \times I_1}$ and $\mathcal{U}^{(2)} \in \mathbb{R}^{J \times R \times I_2}$, associated with the first and second views, respectively.
This relationship between the Einstein and Hadamard products enables a rank-$R$ CPD for every subtensor of $\mathcal{A}$ and, consequently, a low-rank approximation of the MV-DTF layer.
5.3. Dimension J or $J_1 \times \cdots \times J_P$, Order P of the Latent Space $\mathcal{Z}$, and the Rank R: The Hyperparameters of the MV-DTF and LRMV-DTF Layers
The proposed layers introduce three hyperparameters to tune: the order $P$ and the dimension $J_1 \times \cdots \times J_P$ (or $J$ when $P = 1$) of $\mathcal{Z}$, and the rank value $R$:
Latent space dimension: It determines the expressiveness of the latent space to capture complex patterns across views. High-dimensional spaces enhance expressiveness but also increase the risk of overfitting, while low-dimensional spaces reduce expressiveness but mitigate the risk of overfitting.
Latent space order: It is determined by the architecture of the ANN. For instance, in multi-layer perceptron (MLP) architectures, the input space is unidimensional, e.g., $\mathbb{R}^{J}$; hence, $P = 1$. In contrast, the input space of CNNs is multidimensional; thereby, $P > 1$.
Rank: It determines the computational complexity of the MV-DTF layer. For low rank values on the subtensors of $\mathcal{A}$, the number of parameters to learn can be reduced, but the layer may not capture complex interactions across views effectively, limiting the model performance. Conversely, high rank values increase the capacity to learn complex patterns in data, but they may lead to overfitting.
5.4. MV-DTF and LRMV-DTF on Neural Network Architectures: The Mapping Set
According to the desired level of fusion [16], two primary configurations can be employed where our data fusion layer can be incorporated in an ANN architecture:
Feature extraction: The MV-DTF layer $g$ can be integrated into an ANN to map the multi-view input space into some latent space $\mathcal{Z}$ for multi-view feature extraction; see Figure 4a,c. Here, both the order $P$ and dimension $J_1 \times \cdots \times J_P$ of $\mathcal{Z}$ must correspond with the order and dimension of the input layer in the architecture of the ANN.
Multilinear regression: The MV-DTF layer performs multilinear regression to capture the multilinear relationships between the multi-view latent space and the output space for single-task learning (see Figure 4b). Here, $\mathcal{Z}^{(m)}$ is the $m$-th single-view latent space obtained from the mapping $g_m: \mathcal{X}^{(m)} \to \mathcal{Z}^{(m)}$, where $\mathcal{X}^{(m)}$ is the $m$-th single-view input space. Consequently, the dimension and order of the latent space must correspond with those of the output space.
6. Results and Discussion
6.1. Dataset Description
To test the effectiveness of the proposed MV-DTF layer, we conduct experiments on four real-world traffic surveillance videos, encompassing more than 50,000 frames of footage with a resolution of 420 × 240 pixels and recorded at a frame rate of 25 FPS (accessible via [122]). Sample images from each test video can be observed in Figure 5, while technical details are provided in Table 3.
Table 3.
Video | Duration (s) | Tracked Vehicles | Temporal Samples of Tracked Vehicles |
---|---|---|---|
V1 | 146 | 137 | 6132 |
V2 | 326 | 333 | 19,194 |
V3 | 216 | 239 | 14,457 |
V4 | 677 | 720 | 91,870 |
A collection of over 92,000 images of vehicles was then extracted from the test videos using the background and foreground segmentation method. Each image was manually labeled for two tasks ($T = 2$): (1) occlusion detection, where vehicles are categorized as occluded or non-occluded (labeled as 1 and 0, respectively), and (2) vehicle-size classification, where non-occluded vehicles are categorized as small (S), midsize (M), large (L), or very large (XL), with labels 1 to 4. Next, one-hot encoding was used to represent the class labels of each task. Consequently, the output spaces for the classification and occlusion detection tasks become four- and two-dimensional, respectively; i.e., their one-hot label vectors live in $\mathbb{B}^{4}$ and $\mathbb{B}^{2}$.
In addition, three subsets of image moment-based features were extracted and normalized for each vehicle image: (1) a 4D feature space $\mathcal{X}^{(1)}$ (i.e., $I_1 = 4$), consisting of the vehicle blob solidity, orientation, eccentricity, and compactness features; (2) a 3D feature space $\mathcal{X}^{(2)}$ (i.e., $I_2 = 3$), encompassing the vehicle's width, area, and aspect ratio; and (3) a 2D feature space $\mathcal{X}^{(3)}$ (i.e., $I_3 = 2$), representing the vehicle centroid coordinates. Together, the three feature spaces form a three-view input space of dimension $4 \times 3 \times 2$; i.e., the number of views is $M = 3$.
As a result, two task-specific datasets were created from the test videos: one for the occlusion detection task and one for the vehicle-size classification task. Both datasets encompass vehicle instances represented in a three-view feature space, available in [123]. Table 4 and Table 5 provide a summary of our datasets, detailing the distribution of images across the occlusion and vehicle-size classification tasks.
Table 4.
Video | Occluded | Unoccluded |
---|---|---|
V1 | 4671 | 1461 |
V2 | 12,684 | 6510 |
V3 | 10,384 | 4073 |
V4 | 41,002 | 11,084 |
Table 5.
Video | Small (Class 1) | Midsize (Class 2) | Large (Class 3) | Very Large (Class 4) |
---|---|---|---|---|
V1 | 45 | 4018 | 390 | 218 |
V2 | 169 | 11,687 | 676 | 152 |
V3 | 206 | 8426 | 1179 | 573 |
V4 | 777 | 35,843 | 3101 | 1282 |
6.2. The Multitask, Multi-View Model Architecture and Training
6.2.1. The Multitask, Multi-View Model Architecture
To learn the two tasks, a multitask, multi-view ANN model based on the MLP architecture was employed. The proposed model is structured in four main stages (see Figure 6): (1) hand-crafted feature extractors (shown in red), (2) an MV-DTF/LRMV-DTF layer (in green), (3) the neck (in yellow), and (4) the task-specific heads (in blue). Stages 1 and 2 form the backbone of the model, serving as a feature extractor to capture both low-level and high-level features from the raw data. Stage 3 refines the features extracted from the backbone. Finally, stage 4 performs prediction or inference. In addition, dropout is applied at the end of each stage to reduce the risk of overfitting and enhance the model’s generalization.
The MV-DTF/LRMV-DTF layer provides the mapping $g: \mathcal{X}^{(1)} \times \mathcal{X}^{(2)} \times \mathcal{X}^{(3)} \to \mathcal{Z}$, where $I_1 = 4$, $I_2 = 3$, and $I_3 = 2$. The order $P$ of the latent space is fixed to one, i.e., $P = 1$, without loss of generality, which simplifies its dimension to $J$, a hyperparameter to tune. Therefore, the parameters of the MV-DTF/LRMV-DTF layer are either the tensor $\mathcal{A} \in \mathbb{R}^{J \times 4 \times 3 \times 2}$ or the associated Hadamard factor tensors $\mathcal{U}^{(1)}$, $\mathcal{U}^{(2)}$, and $\mathcal{U}^{(3)}$, where the rank $R$ is a hyperparameter to tune, along with the bias tensor. Consequently, Equation (28) is reduced to Equation (30) (Equation (30) holds with equality when $R$ is the tensor decomposition rank of every subtensor of $\mathcal{A}$):

$$\mathbf{z}_k = \hat{g}\left( \mathbf{x}_k^{(1)}, \mathbf{x}_k^{(2)}, \mathbf{x}_k^{(3)} \right) = \sigma\left( \left( \bigodot_{m=1}^{3} \left( \mathcal{U}^{(m)} *_1 \mathbf{x}_k^{(m)} \right) \right) *_1 \mathbf{1}_R + \mathbf{b} \right) \quad (30)$$

where the fused tensor and bias are reduced to the vectors $\mathbf{z}_k \in \mathbb{R}^{J}$ and $\mathbf{b} \in \mathbb{R}^{J}$, respectively, and $\mathcal{U}^{(m)} \in \mathbb{R}^{J \times R \times I_m}$.
To solve this problem (see Section 3), the multi-objective optimization defined in Equation (13) is employed with $T = 2$ and $f_t = h_t \circ g$, where $h_1$ and $h_2$ are the task-specific occlusion detection and vehicle classification functions, $g$ can be either the MV-DTF or the LRMV-DTF layer, $\mathcal{L}_1$ and $\mathcal{L}_2$ are the binary cross-entropy and multiclass cross-entropy losses (see Definitions 16 and 17, respectively) for the above tasks, and the task importance weighting hyperparameters $\alpha_1$ and $\alpha_2$ were selected from a finite set of values through cross-validation, a technique often employed by other authors [69].
Definition 16
(Binary cross-entropy (BCE)). Let $y \in \{0, 1\}$ be the true label of an instance, and let $\hat{y} \in (0, 1)$ be the predicted probability for the positive class. The BCE between $y$ and $\hat{y}$ is given by the following:

$$\mathcal{L}_{BCE}(y, \hat{y}) = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right] \quad (31)$$
Definition 17
(Multiclass cross entropy (MCE)). Let $\mathbf{y} \in \{0, 1\}^{C}$ be the true label of an instance, related to some multi-classification problem with $C$ classes, encoded in one-hot format. And let $\hat{\mathbf{y}} \in (0, 1)^{C}$ be the predicted probability vector, where $\hat{y}_c$ is the probability that the instance belongs to the $c$-th class. The MCE between $\mathbf{y}$ and $\hat{\mathbf{y}}$ is given by the following:

$$\mathcal{L}_{MCE}(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{c=1}^{C} y_c \log \hat{y}_c \quad (32)$$
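The following minimal PyTorch sketch combines Definitions 16 and 17 into the weighted two-task objective of Equation (13); the task weights and random batch data are illustrative stand-ins, not the cross-validated values.

```python
import torch
import torch.nn.functional as F

# A minimal sketch of the weighted two-task objective of Equation (13),
# pairing BCE (occlusion detection) with MCE (vehicle-size classification).
alpha1, alpha2 = 0.5, 0.5                             # illustrative task weights

occ_logits = torch.randn(16, 1, requires_grad=True)   # occlusion head output
occ_labels = torch.randint(0, 2, (16, 1)).float()
cls_logits = torch.randn(16, 4, requires_grad=True)   # 4 vehicle-size classes
cls_labels = torch.randint(0, 4, (16,))

# BCE-with-logits implements Equation (31); cross_entropy implements Equation (32).
loss = (alpha1 * F.binary_cross_entropy_with_logits(occ_logits, occ_labels)
        + alpha2 * F.cross_entropy(cls_logits, cls_labels))
loss.backward()                                       # joint gradient over both tasks
```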
6.2.2. Training and Validation
From the total number of tracked vehicles in Table 3, a subset was selected from the four test videos via stratified random sampling for training and validation purposes. Including all temporal instances of a particular vehicle can cause data leakage; i.e., the model may learn specific patterns from highly correlated temporal samples, resulting in reduced generalization to unseen vehicles. To prevent this, only uncorrelated temporal instances were considered for each selected vehicle.
Vehicles from the subset, along with their uncorrelated temporal instances, were partitioned into two sets: (1) the training set $\mathcal{D}_{\mathrm{train}}^{(t)}$, containing the majority of the vehicles and their temporal instances, and (2) the validation set $\mathcal{D}_{\mathrm{val}}^{(t)}$, with the remaining vehicles and their instances, where the superscript $t$ indexes the task-specific dataset (one for vehicle-size classification and one for occlusion detection). The vehicles and instances not included in this subset comprise the testing set, denoted as $\mathcal{D}_{\mathrm{test}}^{(t)}$.
Adaptive moment estimation (Adam) [124] was employed to optimize the parameters of our model. Training was performed for a maximum of 200 epochs, with an early stopping scheme to avoid overfitting by halting training when performance on $\mathcal{D}_{\mathrm{val}}^{(t)}$ no longer improved. The training strategy for our multitask, multi-view model is shown in Algorithm 1, where $\mathcal{B}_k^{(t)}$ denotes a mutually exclusive batch of the $t$-th task, i.e., $\mathcal{B}_k^{(t)} \cap \mathcal{B}_{k'}^{(t)} = \emptyset$ for $k \neq k'$, with $\bigcup_{k=1}^{K} \mathcal{B}_k^{(t)} = \mathcal{D}_{\mathrm{train}}^{(t)}$, and $K$ is the number of batches.
Algorithm 1. Training scheme.
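In the spirit of Algorithm 1, the following runnable sketch jointly optimizes the shared fusion layer and two task heads with Adam, reusing the LRMVDTF sketch above; the random data, head sizes, and hyperparameters are illustrative, and early stopping is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# An illustrative multitask training loop: shared LRMV-DTF backbone + two heads.
torch.manual_seed(0)
fuse = LRMVDTF(view_dims=(4, 3, 2), J=32, R=1)   # shared fusion layer (sketch above)
head_occ = nn.Linear(32, 1)                      # occlusion-detection head
head_cls = nn.Linear(32, 4)                      # vehicle-size classification head
params = [*fuse.parameters(), *head_occ.parameters(), *head_cls.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-3)
alpha1 = alpha2 = 0.5                            # illustrative task weights

for step in range(100):
    views = [torch.randn(16, I) for I in (4, 3, 2)]   # stand-in for a task batch
    y_occ = torch.randint(0, 2, (16, 1)).float()
    y_cls = torch.randint(0, 4, (16,))
    z = fuse(views)                              # shared fused representation
    loss = (alpha1 * F.binary_cross_entropy_with_logits(head_occ(z), y_occ)
            + alpha2 * F.cross_entropy(head_cls(z), y_cls))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```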
All experiments were conducted and implemented in Python 3.10 and the PyTorch framework on a computer equipped with an Intel Core i7 processor running at 2.2 GHz. To accelerate the processing time, an NVIDIA GTX 1050 TI GPU was employed.
6.3. Performance Evaluation Metrics
In this work, we evaluate the performance of the proposed multitask, multi-view model using five main metrics: accuracy (ACC), F1-measure (F1), geometric mean (GM), normalized Matthews correlation coefficient (MCCn), and normalized Bookmaker informedness (BMn), as detailed in Table 6 (see details of these metrics in [125]). For binary classification, where vehicle instances are categorized into two classes—positive and negative—the performance metrics were directly derived from the entries of a confusion matrix (CM), characterized by true positives (TPs), false negatives (FNs), false positives (FPs), and true negatives (TNs). In multiclass classification with $C$ classes, the notions of TP, FN, FP, and TN are less straightforward than in binary classification, as the confusion matrix becomes a $C \times C$ matrix whose $(i, j)$-th entry represents the number of samples that truly belong to the $i$-th class but were classified as the $j$-th class. In order to derive the performance metrics, a one-vs.-rest approach is typically employed to reduce the multiclass CM into $C$ binary CMs, where the $c$-th matrix is formed by treating the $c$-th class as positive and the rest as the negative class [125,126]. Figure 7 illustrates the CM notion for binary classification (Figure 7a) and multiclass classification (Figure 7b); both were obtained from the mean values of 30 runs.
Table 6.
Metric | Equation | Weighted Metric |
---|---|---|
ACC * | $\mathrm{ACC}_c = \dfrac{TP_c + TN_c}{TP_c + FN_c + FP_c + TN_c}$ | $\mathrm{ACC}_w = \sum_{c=1}^{C} \frac{n_c}{M}\, \mathrm{ACC}_c$ |
F1 * | $\mathrm{F1}_c = \dfrac{2\, \mathrm{PRC}_c\, \mathrm{SNS}_c}{\mathrm{PRC}_c + \mathrm{SNS}_c}$ | $\mathrm{F1}_w = \sum_{c=1}^{C} \frac{n_c}{M}\, \mathrm{F1}_c$ |
MCC * | $\mathrm{MCC}_c = \dfrac{TP_c\, TN_c - FP_c\, FN_c}{\sqrt{(TP_c + FP_c)(TP_c + FN_c)(TN_c + FP_c)(TN_c + FN_c)}}$ | $\mathrm{MCC}_w = \sum_{c=1}^{C} \frac{n_c}{M}\, \mathrm{MCC}_c$ |
GM | $\mathrm{GM}_c = \sqrt{\mathrm{SNS}_c\, \mathrm{SPC}_c}$ | $\mathrm{GM}_w = \sum_{c=1}^{C} \frac{n_c}{M}\, \mathrm{GM}_c$ |
BM | $\mathrm{BM}_c = \mathrm{SNS}_c + \mathrm{SPC}_c - 1$ | $\mathrm{BM}_w = \sum_{c=1}^{C} \frac{n_c}{M}\, \mathrm{BM}_c$ |
SNS | $\mathrm{SNS}_c = \dfrac{TP_c}{TP_c + FN_c}$ | $\mathrm{SNS}_w = \sum_{c=1}^{C} \frac{n_c}{M}\, \mathrm{SNS}_c$ |
SPC | $\mathrm{SPC}_c = \dfrac{TN_c}{TN_c + FP_c}$ | $\mathrm{SPC}_w = \sum_{c=1}^{C} \frac{n_c}{M}\, \mathrm{SPC}_c$ |
PRC * | $\mathrm{PRC}_c = \dfrac{TP_c}{TP_c + FP_c}$ | $\mathrm{PRC}_w = \sum_{c=1}^{C} \frac{n_c}{M}\, \mathrm{PRC}_c$ |
Global GM | $\mathrm{GM} = \left( \prod_{c=1}^{C} \mathrm{SNS}_c \right)^{1/C}$ | - |
Global BM | $\mathrm{BM} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{BM}_c$ | - |
multiclass MCC * | $\mathrm{MCC} = \dfrac{M \sum_{c} cm_{cc} - \sum_{c} \hat{n}_c\, n_c}{\sqrt{\left( M^2 - \sum_{c} \hat{n}_c^2 \right)\left( M^2 - \sum_{c} n_c^2 \right)}}$ | - |
In Table 6, $C$ denotes the number of classes of interest, $M$ is the number of classified instances, $\mathbf{CM} \in \mathbb{N}^{C \times C}$ is a multiclass CM with entries $cm_{ij}$, and metrics with subscript $c$ refer to those computed from the $c$-th binary CM, obtained by reducing the multiclass CM by treating the $c$-th class as positive. Metrics with subscript $w$ denote weighted metrics, which consider the individual contributions of each class by weighting the metric value of the $c$-th class by the number of samples, $n_c$, of class $c$ ($\hat{n}_c$ denotes the number of instances predicted as class $c$). This approach provides a "fair" evaluation by considering the impact of imbalanced class distributions on the overall performance.
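As a small illustration of the model-selection metric, the following sketch computes a normalized multiclass MCC, assuming scikit-learn; the $(\mathrm{MCC} + 1)/2$ normalization maps $[-1, 1]$ to $[0, 1]$, in line with the normalization proposed by Luque et al. [127], and the labels are toy values.

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

# A small sketch of the normalized multiclass MCC used for model selection.
y_true = np.array([0, 1, 1, 2, 2, 2, 3, 3])
y_pred = np.array([0, 1, 2, 2, 2, 2, 3, 1])
mcc_n = (matthews_corrcoef(y_true, y_pred) + 1.0) / 2.0   # map [-1, 1] -> [0, 1]
print(f"normalized multiclass MCC: {mcc_n:.3f}")
```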
Furthermore, in order to quantify how much compression is achieved via the LRMV-DTF layer, a compression ratio, $\eta$, between the number of parameters in the MV-DTF layer and those in the LRMV-DTF layer, i.e., $L$ and $L_R$, is defined in Equation (33). Note that, since both $L$ and $L_R$ share the factor $\prod_{p=1}^{P} J_p$, $\eta$ is independent of the latent space dimension, and it depends only on the view dimensions and the rank. This ensures that the compression ratio is consistent, regardless of the latent tensor space dimension.

$$\eta = \frac{L}{L_R} \quad (33)$$
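The following small sketch reproduces the $L$, $L_R$, and $\eta$ columns of Table 7, assuming, as the tabulated values suggest, that $L$ counts the multilinear weights $J \prod_m I_m$ and $L_R = J (R \sum_m I_m + 1)$; these closed forms are our reading of the table, not formulas quoted from it.

```python
import math

# An illustrative computation of the parameter counts and compression ratio.
def compression(view_dims, J, R):
    L = J * math.prod(view_dims)               # MV-DTF weight count
    L_R = J * (R * sum(view_dims) + 1)         # LRMV-DTF parameter count
    return L, L_R, round(L / L_R, 2)

for J in (2, 8, 32):
    for R in (1, 2, 3, 4):
        print((J, R), *compression((4, 3, 2), J, R))   # e.g., (2, 1) -> 48 20 2.4
```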
6.4. Hyperparameter Tuning: The Latent Space Dimension J and the Rank R Values
To determine suitable hyperparameters for the low-rank MV-DTF layer, cross-validation via a grid search was employed [128]. Two finite sets contain the candidate values for the rank R and the latent space dimension J, respectively. The grid search trains the multitask, multi-view model built with each candidate pair (J, R) on the training set and subsequently evaluates its performance on the validation set using some performance metric. The most suitable pair of values is the one that achieves the highest performance metric over the validation set.
For our case study, we fixed a set of candidate J values to study the impact of J on the classification metrics across tasks, whereas the candidate rank values were selected as those that reduce the number of parameters in the LRMV-DTF layer according to the compression ratio (see Table 7), with additional rank values included for performance analysis only. Since our datasets exhibit class imbalance, the MCC was used as the evaluation metric, given its robustness to imbalanced classes, as explained by Luque et al. in [127]. Through this empirical process, we found the pair of values that achieves the best trade-off between model performance and the compression ratio. The candidate sets and the most suitable pair of values must be determined for each multitask, multi-view dataset.
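A minimal sketch of this grid search follows; `build_model`, `train`, and `evaluate` are caller-supplied stand-ins for the pipeline described above, with `evaluate` returning the validation MCC.

```python
import itertools

def grid_search(J_candidates, R_candidates, build_model, train, evaluate):
    """Return the (J, R) pair maximizing the validation metric."""
    best_score, best_pair = float('-inf'), None
    for J, R in itertools.product(J_candidates, R_candidates):
        model = build_model(J=J, R=R)   # multitask, multi-view model
        train(model)                    # Adam + early stopping, as above
        score = evaluate(model)         # MCC: robust to class imbalance [127]
        if score > best_score:
            best_score, best_pair = score, (J, R)
    return best_pair
```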
Table 7.
The first and second groups of three columns correspond to two different multi-view input space dimensions, one of which is our case study.
(J, R) | L | $\tilde{L}$ | CR | L | $\tilde{L}$ | CR
---|---|---|---|---|---|---
(2, 1) | 48 | 20 | 2.4 | 48,000 | 182 | 263.7 |
(2, 2) | 48 | 38 | 1.26 | 48,000 | 362 | 132.6 |
(2, 3) | 48 | 56 | 0.86 | 48,000 | 542 | 88.56 |
(2, 4) | 48 | 74 | 0.65 | 48,000 | 722 | 66.48 |
(8, 1) | 192 | 80 | 2.4 | 192,000 | 728 | 263.7 |
(8, 2) | 192 | 152 | 1.26 | 192,000 | 1448 | 132.6 |
(8, 3) | 192 | 224 | 0.86 | 192,000 | 2168 | 88.56 |
(8, 4) | 192 | 296 | 0.65 | 192,000 | 2888 | 66.48 |
(32, 1) | 768 | 320 | 2.4 | 768,000 | 2912 | 263.7 |
(32, 2) | 768 | 608 | 1.26 | 768,000 | 5792 | 132.6 |
(32, 3) | 768 | 896 | 0.86 | 768,000 | 8672 | 88.56
(32, 4) | 768 | 1184 | 0.65 | 768,000 | 11,552 | 66.48 |
6.5. Performance Evaluation
In this section, the performance of our multitask, multi-view case study in the occlusion detection and vehicle classification tasks is evaluated. Our experiments focused on evaluating the impact of the rank, R, and dimension, J, of the latent tensor space on computational complexity and model performance. To ensure the consistency of our results, each experiment was repeated 30 times. We first provide the results for the space savings achieved using different R and J values in the MV-DTF layer and its low-rank approximation, LRMV-DTF, followed by an analysis of their effects on the learning phases and model performance.
Table 7 provides a comparison of the compression achieved across different pairs of (J, R) values and two multi-view input space dimensions. It is noteworthy that compression is achieved only when CR > 1, and the larger the multi-view input space, the higher the compression. Specifically, for the smaller multi-view space, compression is achieved only for R ≤ 2 (see Figure 8a), while for the larger multi-view space, compression can be achieved for higher rank values (see Figure 8b). In consequence, the condition CR > 1 provides an upper rank bound, beyond which compression is no longer achieved. For tensors with larger dimensions or a higher order, the upper rank bound is greater (see Figure 8).
Figure 8 illustrates the relationship between the compression ratio and the rank R for various multi-view spaces with different orders and dimensionalities. For each space, we observe that the compression ratio decreases as the rank R increases. Figure 8a,b show the compression for multi-view spaces with the same order but different dimensionalities, while Figure 8c,d show the compression for higher-order multi-view spaces.
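Under the parameter counts assumed after Equation (33), this upper rank bound admits a closed form:

$$\mathrm{CR} = \frac{L}{\tilde{L}} = \frac{\prod_{m=1}^{M} I_m}{R \sum_{m=1}^{M} I_m + 1} > 1 \iff R < \frac{\prod_{m=1}^{M} I_m - 1}{\sum_{m=1}^{M} I_m}, \quad \text{so} \quad R_{\max} = \left\lfloor \frac{\prod_{m=1}^{M} I_m - 1}{\sum_{m=1}^{M} I_m} \right\rfloor .$$

For instance, for a multi-view space with $\prod_m I_m = 24$ and $\sum_m I_m = 9$, this gives $R_{\max} = 2$, consistent with the first column group of Table 7, where compression is lost at $R = 3$.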
Figure 9 shows the training and validation loss curves over epochs for the model using either the MV-DTF or LRMV-DTF layer. From this figure, distinct behaviors in the loss curves can be observed during the training and validation phases:
For a low-dimensional latent tensor space (see Figure 9a,d), convergence is stable but slower, and higher loss values are observed for both training and validation. This indicates that the model may be underfitting.
For a high-dimensional latent tensor space (Figure 9c,f), a lower training loss is achieved. However, the validation loss exhibits fluctuations, especially for Task 2 (Figure 9f). This suggests that the model begins to overfit as J increases, leading to probable instability in validation performance. The marginal gains in training loss do not justify the increased risk of overfitting.
For the intermediate dimension (Figure 9b,e), the most balanced performance across both tasks is achieved, showing faster convergence and smoother validation loss curves than the lower and higher dimensions. It achieves a lower training loss while maintaining minimal validation loss variability, indicating good generalization.
In the subsequent subsections, the performance evaluation for each task on the tested videos is presented, highlighting the impact of the selected hyperparameters in the model’s generalization.
6.5.1. Vehicle Occlusion Detection Results
This section presents the comparison results of the proposed multitask, multi-view model, with either the MV-DTF or LRMV-DTF layer and different pairs of (J, R) values, on the occlusion detection task. Figure 10 shows the mean values of the performance metrics obtained from our model over 30 different training runs, evaluated across the test videos. Each row corresponds to a specific test video, while each column reflects a particular latent tensor space dimension J. As illustrated in Figure 10, the performance drop across different rank R values is very small, especially in high-dimensional spaces (the second and third columns of Figure 10). However, in low-dimensional spaces (the first column of Figure 10), the rank choice has a slightly greater impact on the performance, and fine-tuning the rank R value becomes necessary as the dimension J decreases.
Additionally, Figure 10 is complemented by Table A1, which presents the mean and standard deviation of the performance metrics across multiple runs, with the best and worst values highlighted in blue and red, respectively. From this table, the selected pair of values shows the lowest standard deviations across most metrics, providing a good balance between computational complexity (see Table 7) and performance competitive with the MV-DTF layer. Although high-dimensional spaces yield high performance, they also tend to exhibit large standard deviations, potentially increasing the risk of overfitting despite their higher mean values.
Finally, Figure 11 presents a performance comparison between our multi-view, multitask model, using the LRMV-DTF layer with the selected hyperparameters, and single-task learning (STL), single-view learning (SVL) models based on SVM and RF, evaluated across the test videos. Figure 11 highlights that the proposed model exhibits higher and more consistent performance than the STL-SVL models on all metrics and videos, particularly in V2, V3, and V4, where the SVM and RF models show a noticeable performance drop. Overall, the proposed model improves the performance in the MCCnw metric by up to 6%, which represents a significant improvement over the SVM and RF models.
6.5.2. Vehicle-Size Classification Results
This section presents the comparison results of the proposed multitask, multi-view model, incorporating either the MV-DTF or LRMV-DTF layer with different pairs of (J, R) values, on the vehicle-size classification task. Figure 12 shows the mean values of the performance metrics over 30 different training runs on the test videos, where each row corresponds to a specific test video and each column to a latent tensor space dimension. From this figure, we observe that the lower the J value, the worse the performance. Similarly, high R values generally contribute to improved performance. For the lowest J value, there is a noticeable drop in performance, especially in the GMw, BMnw, and MCCnw metrics, suggesting that low-dimensional spaces fail to capture the complexity of the task. However, as J increases to 16 and 64, the metrics stabilize, and the performance drop across ranks becomes negligible, particularly for the ACCw and F1w metrics.
Additionally, Table A2 shows the mean and standard deviation of the performance metrics across runs, with the best and worst values highlighted in blue and red, respectively. From this table, we found that high-dimensional spaces tend to yield not only a higher mean performance but also a lower standard deviation, indicating more stable and consistent outcomes across the test videos. In contrast, low-dimensional latent spaces are more sensitive to the rank R hyperparameter, particularly for the GMw, BMnw, and MCCnw metrics, so selecting an appropriate rank becomes crucial for low J values to avoid significant drops in performance. Consequently, a computationally efficient LRMV-DTF layer can be achieved in high-dimensional spaces by selecting low rank values without a significant performance drop.
Finally, in Figure 13, a comparison between our multi-view, multitask model and the STL-SVL models (SVM and RF) across the test videos is presented. This figure highlights the superiority of our multitask model, particularly in videos V3 and V4, where the SVM and RF models again exhibit a significant performance drop. Overall, the proposed model improves the performance in the MCCnw metric by up to 7%, which represents a significant improvement over the SVM and RF models.
6.5.3. Comparison with a Multitask Single-View Model
We also provide a comparison of the proposed multitask, multi-view model with its corresponding multitask single-view model in Table A3. The latter is basically the proposed model without the MV-DTF layer, so its input space is restricted to a single view, which is fixed in this work. Finally, for a fair comparison, this model incorporates a layer that maps the single-view feature space onto a latent space of dimension J.
The results provided in Table A3 show the overall mean value of the weighted metrics across all videos, where it can be observed that incorporating the MV-DTF layer into this single-view model achieves an improvement on the BMnw and MCCnw metrics. These results are also consistent across all latent space dimensions.
Except for the F1 metric, the experimental results show that the performance of the single-view models is not negatively impacted when the fusion layer is incorporated. Furthermore, even though the number of model parameters increases, incorporating the MV-DTF layer offers several advantages, including that its approximation through Hadamard products allows selecting rank values that, unlike the classical CPD, achieve higher compression rates.
Finally, Figure A1 and Figure A2 show the results for the occlusion detection and vehicle-size classification tasks, respectively. In contrast to the performance shown in Figure 10 and Figure 12, Figure A1 and Figure A2 show each metric independently for more detail.
6.6. Discussion
The promising results of the MV-DTF layer and its low-rank approximation, LRMV-DTF, can be summarized as follows:
The performance and consistency of the multitask, multi-view model are significantly influenced by the dimensionality of the latent tensor space (see Figure 10 and Figure 12). There exists a critical dimension below which the model tends to underfit and above which it is prone to overfitting to the training data.
A negligible performance drop was observed in our case study as the compression ratio approaches one when the LRMV-DTF layer is employed. This result provides empirical evidence of an underlying low-rank structure in the subtensors of the MV-DTF layer's tensor.
The maximum allowable rank value (the upper rank bound) that still achieves parameter compression increases as the dimensions of the multi-view space grow and/or as the tensor order increases.
The major limitations of the MV-DTF layer are as follows:
Selecting suitable hyperparameters, i.e., the dimension J of the latent tensor space and the rank R of the LRMV-DTF layer, is a challenging task.
A high-dimensional latent space increases the risk of overfitting, while very low-dimensional spaces may not fully capture the underlying relationships across views, resulting in underfitting.
Reducing the rank of subtensors tends to decrease performance and increase the risk of underfitting classification models for low-dimensional latent spaces. Although higher rank values may improve model performance, they also increase the risk of overfitting.
The choice of rank R involves a trade-off: higher values increase computational complexity but can capture more complex patterns, while lower values reduce the computational burden but may limit the expressiveness of the model, resulting in decreased performance.
7. Conclusions
In this work, we found a novel connection between the Einstein and Hadamard products for tensors. It is a mathematical relationship involving the Einstein product between the tensor associated with a multilinear map and a rank-one tensor formed from the views. By enforcing low-rank constraints on the subtensors that result from fixing every index but the last M, each subtensor is approximated as a low-rank tensor through the CPD. By exploiting this structure, a set of M third-order tensors, here called the Hadamard factor tensors, is obtained. We found that the Einstein product can then be approximated by a sum of R Hadamard products of M Einstein products, where R corresponds to the maximum decomposition rank across the subtensors.
Since multi-view learning leverages complementary information from multiple feature sets to enhance model performance, a tensor-based data fusion layer for neural networks, called Multi-View Data Tensor Fusion, is here employed. This layer projects M feature spaces, referred to as views, into a unified latent tensor space through a mapping g. Here, we constrain g to the space of affine mappings composed of a multilinear map, represented by the Einstein product with its induced tensor, followed by a translation. Unfortunately, as the number of views increases, the number of parameters that determine g grows exponentially, and consequently, its computational complexity also grows.
To mitigate the curse of dimensionality in the MV-DTF layer, we exploit the mathematical relationship between the Einstein product and the Hadamard product, which yields a low-rank approximation of the Einstein product that is useful when the compression ratio satisfies CR > 1.
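For concreteness, a minimal PyTorch sketch of such a layer for a unidimensional latent space (P = 1) is shown below, using the Hadamard factor-tensor parameterization; this is our illustrative rendering under the parameter counts assumed earlier, not the exact implementation.

```python
import torch
import torch.nn as nn

class LRMVDTF(nn.Module):
    """Low-rank multi-view fusion: one factor tensor of shape (J, R, I_m)
    per view, combined by Hadamard products and summed over the R terms."""
    def __init__(self, view_dims, J, R):
        super().__init__()
        self.factors = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(J, R, i)) for i in view_dims]
        )
        self.bias = nn.Parameter(torch.zeros(J))   # the translation term

    def forward(self, views):
        # views: list of M tensors of shape (batch, I_m)
        z = None
        for U, x in zip(self.factors, views):
            p = torch.einsum('jri,bi->bjr', U, x)  # per-view Einstein products
            z = p if z is None else z * p          # Hadamard product across views
        return z.sum(dim=-1) + self.bias           # sum over the R rank terms

# y = LRMVDTF([20, 30, 40], J=16, R=2)([x1, x2, x3])  # y: (batch, 16)
```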
The use of the LRMV-DTF layer based on the Hadamard product does not necessarily imply an improvement in model performance compared to the MV-DTF layer based on the Einstein product. In fact, the dimension J of the latent space and the rank R of the subtensors must be tuned via cross-validation (see Section 6.4). When the decomposition rank of the subtensors is less than the upper rank bound, an efficient low-rank approximation of the Einstein-product-based MV-DTF layer is obtained.
From our experiments, we show that the introduction of the MV-DTF and LRMV-DTF layers in a case study multitask VTS model for vehicle-size classification and occlusion detection tasks improves its performance compared to single-task and single-view models. For our case study, i.e., a particular case using the LRMV-DTF layer with the selected hyperparameters, our model achieved an MCCnw of up to 95.10% for vehicle-size classification and 92.81% for occlusion detection, representing significant improvements of 7% and 6%, respectively, over single-task single-view models, while reducing the number of parameters by a factor of 1.3.
Finally, for every case study, the dimension of the latent tensor space, J, and the decomposition rank, R, must be tuned. Additionally, the choice between an MV-DTF layer and an LRMV-DTF layer must be made while taking into account the trade-off between model performance and computational complexity.
Open Issues
A computational complexity analysis should be conducted to evaluate the efficiency of the LRMV-DTF layer.
For VTS systems, other high-dimensional feature spaces should be integrated to improve the expressiveness of the latent tensor space and its computational efficiency.
Other tensor decomposition models, such as the tensor-train model, should be explored for more efficient algorithms on high-dimensional data.
Our work should be extended to more complex network architectures.
Other VTS tasks should be addressed within the MTL framework to obtain a more comprehensive vehicle traffic model.
Acknowledgments
The authors acknowledge the Consejo Nacional de Ciencia y Tecnología (CONACYT) for Ph.D. student grant no. CB-253955. The first author also expresses gratitude to the Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV IPN) and its academic staff, especially to Torres, D. Special thanks are extended to the reviewers for their valuable feedback and constructive comments, which greatly contributed to improving the quality of this paper.
Abbreviations
The following abbreviations are used in this manuscript:
ITS | Intelligent Transportation Systems |
VTS | Vehicle traffic surveillance |
MVL | Multi-view learning |
MTL | Multitask learning |
MV-DTF | Multi-View Data Tensor Fusion |
LRMV-DTF | Low-Rank Multi-view Data Tensor Fusion |
Appendix A. Mathematical Proofs
Appendix A.1. Proof of Proposition
Let be a multilinear map, and let be its associated tensor. Also, let be M vectors, where the m-th vector , and let be the image of the tuple under , i.e., . According to Definition 14, for , can be expressed as , whose -th entry is given as follows:
(A1)
From the CPD (Section 2.4.2), each subtensor can be approximated as a rank- tensor, as Equation (A2) shows, where denotes its r-th rank-one tensor; Equation (A2) holds with equality when equals the decomposition rank of .
(A2)
Substituting Equation (A2) into Equation (A1) results in Equation (A3):
(A3)
By exploiting the distributive property of the Einstein product, tensor can be distributed along the summation, as Equation (A4) shows.
(A4)
Note that, as is a rank-one tensor, it can be factorized into the tensor product of N vectors (see Definition 7), as shown in Equation (A5), where denotes its m-th vector, also called the factor vector, which is related to the m-mode.
(A5)
Since is also a rank-one tensor, i.e., , it follows that and . Hence, Equation (A4) can be rewritten as follows:
(A6)
By leveraging the independence of the terms involved in the summations, the expression can be rearranged as a sum of products of inner products, as shown in Equation (A7).
(A7)
From Equation (A7), two cases can be distinguished:
The tensor ranks of all subtensors are equal.
The tensor ranks of all subtensors are different.
Appendix A.1.1. The Tensor Ranks of Subtensors Are Equal
Since the tensor rank is equal across subtensors, Equation (A6) becomes Equation (A8), where .
(A8)
From the inner product , it should be noted that the term is indexed by the indices , and . Let be the th entry of a th-order tensor , here called the m-th factor tensor, whose -mode fiber . From this tensor, the inner product of Equation (A8) can be rewritten as the Einstein product , as Equation (A9) shows:
(A9)
To simplify this proof, we restrict the order P of to the unidimensional case, i.e., P = 1; however, it can easily be generalized to any arbitrary order. As a consequence, the tensor and become the vector , which is given by Equation (A10):
(A10)
The summations and products in Equation (A10) can be rewritten using the Hadamard product, as Equation (A11) shows:
(A11)
Note that the entries of vector are inner products, i.e., . This resembles standard matrix-vector multiplication, carried out along the first and third modes of with vector , as illustrated by Equation (A12):
(A12)
Finally, note that the outer summation is carried out along the r index, and it can be expressed as the Einstein product of the Hadamard product of Einstein products with an all-ones vector , as shown in Equation (A13), which concludes the proof.
(A13)
Appendix A.1.2. The Tensor Ranks of Subtensors Are Different
Here, we also restrict the order P to the unidimensional case, i.e., P = 1. Let R be the maximum decomposition rank across subtensors, as Equation (A14) shows:
(A14)
Then, Equation (A6) can be rewritten as depicted in Equation (A15).
(A15)
We also define a tensor , whose th entry is given as follows:
(A16)
Note that Equation (A15) is consistent with Equation (A7). Consequently, Equation (A13) also holds for the second case, which completes this proof.
Appendix A.2. Low-Rank Multi-View Tensor Data Fusion Layer for Unidimensional Latent Spaces
A particular case of the MV-DTF layer is that of unidimensional latent spaces, i.e., P = 1. In this scenario, the multilinear transformation is induced via the tensor . From the Einstein product (Definition 11), the j-th entry of is given by Equation (A17), where and .
(A17)
Using the CPD, each j-th subtensor can then be decomposed into rank-one tensors, as shown in Equation (A18), where is the m-mode factor vector for the r-th rank-one tensor .
(A18)
Then, similar to Equation (A7), each subtensor can also be expressed as a sum of products of inner products, as Equation (A19) shows.
(A19)
Consequently, it yields the same tensor forms derived in Appendices A.1.1 and A.1.2, i.e.,
(A20)
where is the m-mode factor tensor associated with the m-th view, whose third-mode fiber is the corresponding factor vector.
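The identity can be checked numerically for this unidimensional case. The following sketch, with shapes and names of our choosing, builds a tensor from known CP factors of its subtensors and verifies that the full Einstein-product contraction agrees with the sum of Hadamard products of per-view Einstein products, as in Equation (A20).

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, R = [3, 4, 5], 2, 2                        # view dims, latent dim, rank
x = [rng.standard_normal(i) for i in I]          # the M = 3 views
U = [rng.standard_normal((J, R, i)) for i in I]  # Hadamard factor tensors

# Tensor of shape (J, I1, I2, I3) whose j-th subtensor is the rank-R CPD
# with factor vectors U[m][j, r, :]:
T = np.zeros((J, *I))
for j in range(J):
    for r in range(R):
        T[j] += np.einsum('a,b,c->abc', U[0][j, r], U[1][j, r], U[2][j, r])

# Left-hand side: Einstein product with the rank-one tensor x1 o x2 o x3
lhs = np.einsum('jabc,a,b,c->j', T, *x)

# Right-hand side: sum over r of Hadamard products of per-view products
rhs = np.zeros(J)
for r in range(R):
    term = np.ones(J)
    for m in range(len(I)):
        term *= U[m][:, r, :] @ x[m]   # inner products <u_jr^(m), x^(m)>
    rhs += term

assert np.allclose(lhs, rhs)
```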
Appendix B. Results
Table A1.
Video | (J, R) | ACCw | F1w | GMw | MCCnw | BMnw |
---|---|---|---|---|---|---|
V1 | ||||||
V1 | ||||||
V1 | ||||||
V2 | ||||||
V2 | ||||||
V2 | ||||||
V3 | ||||||
V3 | ||||||
V3 | ||||||
V4 | ||||||
V4 | ||||||
V4 | ||||||
Values highlighted in red denote the worst values achieved on every group. Values highlighted in blue denote the best values achieved on every group.
Table A2.
Video | (J,R) | ACCw | F1w | GMw | MCCnw | BMnw |
---|---|---|---|---|---|---|
V1 | ||||||
V1 | ||||||
V1 | ||||||
V2 | ||||||
V2 | ||||||
V2 | ||||||
V3 | ||||||
V3 | ||||||
V3 | ||||||
V4 | ||||||
V4 | ||||||
V4 | ||||||
Values highlighted in red denote the worst values achieved on every group. Values highlighted in blue denote the best values achieved on every group.
Table A3.
Method | ACCw | F1w | GMw | MCCnw | BMnw
---|---|---|---|---|---
MT-SV (J = 8) | 0.9483 | 0.9282 | 0.7878 | 0.8388 | 0.8087 |
MT-MV (J = 8, R = −) | 0.9514 | 0.9286 | 0.8075 | 0.8494 | 0.8248 |
MT-MV (J = 8, R = 1) | 0.9501 | 0.9279 | 0.8026 | 0.8455 | 0.8205 |
MT-MV (J = 8, R = 2) | 0.9504 | 0.9285 | 0.8098 | 0.8476 | 0.8264 |
MT-MV (J = 8, R = 3) | 0.9516 | 0.9288 | 0.8073 | 0.8498 | 0.8245 |
MT-SV (J = 10) | 0.9493 | 0.9290 | 0.7909 | 0.8420 | 0.8112 |
MT-MV (J = 10, R = −) | 0.9504 | 0.9276 | 0.8091 | 0.8471 | 0.8257 |
MT-MV (J = 10, R = 1) | 0.9499 | 0.9273 | 0.8063 | 0.8456 | 0.8235 |
MT-MV (J = 10, R = 2) | 0.9503 | 0.9282 | 0.8006 | 0.8490 | 0.8190 |
MT-MV (J = 10, R = 3) | 0.9508 | 0.9284 | 0.8045 | 0.8476 | 0.8222 |
MT-SV (J = 12) | 0.9483 | 0.9285 | 0.7874 | 0.8389 | 0.8084 |
MT-MV (J = 12, R = −) | 0.9510 | 0.9283 | 0.8089 | 0.8487 | 0.8256 |
MT-MV (J = 12, R = 1) | 0.9512 | 0.9288 | 0.8089 | 0.8492 | 0.8257 |
MT-MV (J = 12, R = 2) | 0.9498 | 0.9272 | 0.8065 | 0.8455 | 0.8237 |
MT-MV (J = 12, R = 3) | 0.9511 | 0.9279 | 0.7984 | 0.8469 | 0.8176 |
Values highlighted in red denote the worst values achieved on every group. Values highlighted in blue denote the best values achieved on every group.
Author Contributions
F.H.-R. and D.T.-R. wrote the article. D.T.-R. contributed the initial concept of applying multitask learning and tensors to VTS systems. F.H.-R. conceptualized the multi-view learning for VTS systems, developed and implemented the associated algorithms, and identified the mathematical relationship between the Einstein and Hadamard products. The theoretical discussion of Einstein and Hadamard products was written by F.H.-R. and D.T.-R. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The datasets used to support the findings of this study, particularly for the multi-view multitask model, are publicly available at the following GitHub repository: https://github.com/fhermosillo/VTSMultiviewDatasets (accessed on 20 October 2024). Unfortunately, the code employed in this work is currently private, but it can be made available upon request to reviewers for evaluation purposes. Please refer to the repository or contact the corresponding author for further inquiries or additional data requests.
Conflicts of Interest
The authors declare no conflicts of interest.
Funding Statement
This research received no external funding.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
References
- 1.Hou Z., Chen Y. A real time vehicle collision detecting and reporting system based on internet of things technology; Proceedings of the 2017 3rd IEEE International Conference on Computer and Communications (ICCC); Chengdu, China. 13–16 December 2017; pp. 1135–1139. [Google Scholar]
- 2.Ijjina E.P., Chand D., Gupta S., Goutham K. Computer vision-based accident detection in traffic surveillance; Proceedings of the 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT); Kanpur, India. 6–8 July 2019; pp. 1–6. [Google Scholar]
- 3.Niu Y., Zhang Y., Li L. Road Monitoring and Traffic Control System Design; Proceedings of the 2009 International Conference on Information Engineering and Computer Science; Wuhan, China. 19–20 December 2009; pp. 1–4. [Google Scholar]
- 4.Desai Y., Rungta Y., Reshamwala P. Automatic Traffic Management and Surveillance System; Proceedings of the 2020 International Conference on Smart Innovations in Design, Environment, Management, Planning and Computing (ICSIDEMPC); Aurangabad, India. 30–31 October 2020; pp. 131–133. [Google Scholar]
- 5.Chan M.N., Tint T. A Review on Advanced Detection Methods in Vehicle Traffic Scenes; Proceedings of the 2021 6th International Conference on Inventive Computation Technologies (ICICT); Coimbatore, India. 20–22 January 2021; pp. 642–649. [Google Scholar]
- 6.Velazquez-Pupo R., Sierra-Romero A., Torres-Roman D., Shkvarko Y.V., Santiago-Paz J., Gómez-Gutiérrez D., Robles-Valdez D., Hermosillo-Reynoso F., Romero-Delgado M. Vehicle Detection with Occlusion Handling, Tracking, and OC-SVM Classification: A High Performance Vision-Based System. Sensors. 2018;18:374. doi: 10.3390/s18020374. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Buch N., Velastin S.A., Orwell J. A review of computer vision techniques for the analysis of urban traffic. IEEE Trans. Intell. Transp. Syst. 2011;12:920–939. doi: 10.1109/TITS.2011.2119372. [DOI] [Google Scholar]
- 8.Hsieh J.W., Yu S.H., Chen Y.S., Hu W.F. Automatic traffic surveillance system for vehicle tracking and classification. IEEE Trans. Intell. Transp. Syst. 2006;7:175–187. doi: 10.1109/TITS.2006.874722. [DOI] [Google Scholar]
- 9.Moussa G.S. Vehicle type classification with geometric and appearance attributes. Int. J. Archit. Environ. Eng. 2014;8:277–282. [Google Scholar]
- 10.Chen Z., Pears N., Freeman M., Austin J. A Gaussian mixture model and support vector machine approach to vehicle type and colour classification. ET Intell. Transp. Syst. 2014;8:135–144. doi: 10.1049/iet-its.2012.0104. [DOI] [Google Scholar]
- 11.Al Okaishi W.A.H.B.A.N., Zaarane A., Slimani I., Atouf I., Benrabh M. A Traffic Surveillance System in Real-Time to Detect and Classify Vehicles by Using Convolutional Neural Network; Proceedings of the 2019 International Conference on Systems of Collaboration Big Data, Internet of Things & Security; Casablanca, Morocco. 12–13 December 2019; pp. 1–5. [Google Scholar]
- 12.Wu Z., Sang J., Zhang Q., Xiang H., Cai B., Xia X. Multi-Scale Vehicle Detection for Foreground-Background Class Imbalance with Improved YOLOv2. Sensors. 2019;19:3336. doi: 10.3390/s19153336. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Chen X.-Z., Chang C.-M., Yu C.-W., Chen Y.-L. A Real-Time Vehicle Detection System under Various Bad Weather Conditions Based on a Deep Learning Model without Retraining. Sensors. 2020;20:5731. doi: 10.3390/s20205731. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Chen Y., Hu W. Robust Vehicle Detection and Counting Algorithm Adapted to Complex Traffic Environments with Sudden Illumination Changes and Shadows. Sensors. 2020;20:2686. doi: 10.3390/s20092686. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Meng T., Jing X., Yan Z., Pedrycz W. A survey on machine learning for data fusion. Inf. Fusion. 2020;57:115–129. doi: 10.1016/j.inffus.2019.12.001. [DOI] [Google Scholar]
- 16.Castanedo F. A review of data fusion techniques. Sci. World J. 2013;2013:704504. doi: 10.1155/2013/704504. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Polikar R. Ensemble based systems in decision making. IEEE Circuits Syst. Mag. 2006;6:21–45. doi: 10.1109/MCAS.2006.1688199. [DOI] [Google Scholar]
- 18.Lahat D., Adali T., Jutten C. Multimodal data fusion: An overview of methods, challenges, and prospects. Proc. IEEE. 2015;103:1449–1477. doi: 10.1109/JPROC.2015.2460697. [DOI] [Google Scholar]
- 19.Lahat D., Adalý T., Jutten C. Challenges in multimodal data fusion; Proceedings of the 2014 22nd European Signal Processing Conference (EUSIPCO); Lisbon, Portugal. 1–5 September 2014; pp. 101–105. [Google Scholar]
- 20.Zhao J., Xie X., Xu X., Sun S. Multi-view learning overview: Recent progress and new challenges. Inf. Fusion. 2017;38:43–54. doi: 10.1016/j.inffus.2017.02.007. [DOI] [Google Scholar]
- 21.Yan X., Hu S., Mao Y., Ye Y., Yu H. Deep multi-view learning methods: A review. Neurocomputing. 2021;448:106–129. doi: 10.1016/j.neucom.2021.03.090. [DOI] [Google Scholar]
- 22.Xu C., Tao D., Xu C. A survey on multi-view learning. arXiv 2013, arXiv:1304.5634. [Google Scholar]
- 23.Zhang Q., Zhang L., Du B., Zheng W., Bian W., Tao D. MMFE: Multitask multiview feature embedding; Proceedings of the 2015 IEEE International Conference on Data Mining; Atlantic City, NJ, USA. 14–17 November 2015; pp. 1105–1110. [Google Scholar]
- 24.Caruana R. Multitask learning. Mach. Learn. 1997;28:41–75. doi: 10.1023/A:1007379606734. [DOI] [Google Scholar]
- 25.Zhang Y., Yang Q. A survey on multi-task learning. IEEE Trans. Knowl. Data Eng. 2021;34:5586–5609. doi: 10.1109/TKDE.2021.3070203. [DOI] [Google Scholar]
- 26.Zhang Y., Yang Q. An overview of multi-task learning. Natl. Sci. Rev. 2018;5:30–43. doi: 10.1093/nsr/nwx105. [DOI] [Google Scholar]
- 27.Crawshaw M. Multi-task learning with deep neural networks: A survey. arXiv 2020, arXiv:2009.09796. [Google Scholar]
- 28.Yang Y., Hospedales T. Deep multi-task representation learning: A tensor factorisation approach. arXiv 2016, arXiv:1605.06391. [Google Scholar]
- 29.Fausett L.V. Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Pearson Education India; New Delhi, India: 2006. [Google Scholar]
- 30.Kolda T., Bader B. Tensor decompositions and applications. SIAM Rev. 2009;51:455–500. doi: 10.1137/07070111X. [DOI] [Google Scholar]
- 31.Cong F., Lin Q.H., Kuang L.D., Gong X.F., Astikainen P., Ristaniemi T. Tensor decomposition of EEG signals: A brief review. J. Neurosci. Methods. 2015;248:59–69. doi: 10.1016/j.jneumeth.2015.03.018. [DOI] [PubMed] [Google Scholar]
- 32.López J., Torres D., Santos S., Atzberger C. Spectral Imagery Tensor Decomposition for Semantic Segmentation of Remote Sensing Data through Fully Convolutional Networks. Remote Sens. 2020;12:517. doi: 10.3390/rs12030517. [DOI] [Google Scholar]
- 33.Wimalawarne K., Sugiyama M., Tomioka R. Multitask learning meets tensor factorization: Task imputation via convex optimization. Adv. Neural Inf. Process. Syst. 2014;27:2825–2833. [Google Scholar]
- 34.Romera-Paredes B., Aung H., Bianchi-Berthouze N., Pontil M. Multilinear multitask learning. Int. Conf. Mach. Learn. 2013;28:1444–1452. [Google Scholar]
- 35.Zhang Z., Xie Y., Zhang W., Tang Y., Tian Q. Tensor multi-task learning for person re-identification. IEEE Trans. Image Process. 2019;29:2463–2477. doi: 10.1109/TIP.2019.2949929. [DOI] [PubMed] [Google Scholar]
- 36.Cao B., He L., Kong X., Philip S.Y., Hao Z., Ragin A.B. Tensor-based multi-view feature selection with applications to brain diseases; Proceedings of the 2014 IEEE International Conference on Data Mining; Shenzhen, China. 14–17 December 2014; pp. 40–49. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Sidiropoulos N.D., De Lathauwer L., Fu X., Huang K., Papalexakis E.E., Faloutsos C. Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process. 2017;65:3551–3582. doi: 10.1109/TSP.2017.2690524. [DOI] [Google Scholar]
- 38.Vasilescu M.A.O., Terzopoulos D. Multilinear analysis of image ensembles: Tensorfaces; Proceedings of the 7th European Conference on Computer Vision; Copenhagen, Denmark. 28–31 May 2002; pp. 447–460. [Google Scholar]
- 39.de Almeida A.L., Favier G., Mota J.C.M. PARAFAC-based unified tensor modeling for wireless communication systems with application to blind multiuser equalization. Signal Process. 2007;87:337–351. doi: 10.1016/j.sigpro.2005.12.014. [DOI] [Google Scholar]
- 40.da Costa M.N., Favier G., Romano J.M.T. Tensor modelling of MIMO communication systems with performance analysis and Kronecker receivers. Signal Process. 2018;145:304–316. doi: 10.1016/j.sigpro.2017.12.015. [DOI] [Google Scholar]
- 41.Zhang W., Wu Q.J., Yang X., Fang X. Multilevel framework to detect and handle vehicle occlusion. IEEE Trans. Intell. Transp. Syst. 2008;9:161–174. doi: 10.1109/TITS.2008.915647. [DOI] [Google Scholar]
- 42.Pang C.C.C., Lam W.W.L., Yung N.H.C. A novel method for resolving vehicle occlusion in a monocular traffic-image sequence. IEEE Trans. Intell. Transp. Syst. 2004;5:129–141. doi: 10.1109/TITS.2004.833769. [DOI] [Google Scholar]
- 43.Wu B.F., Kao C.C., Jen C.L., Li Y.F., Chen Y.H., Juang J.H. A relative-discriminative-histogram-of-oriented-gradients-based particle filter approach to vehicle occlusion handling and tracking. IEEE Trans. Ind. Electron. 2013;61:4228–4237. doi: 10.1109/TIE.2013.2284131. [DOI] [Google Scholar]
- 44.Yung N.H., Lai A.H. Detection of vehicle occlusion using a generalized deformable model. Detect. Veh. Occlusion Using Gen. Deform. Model. 1998;4:154–157. [Google Scholar]
- 45.Chang J., Wang L., Meng G., Xiang S., Pan C. Vision-based occlusion handling and vehicle classification for traffic surveillance systems. IEEE Intell. Transp. Syst. Mag. 2018;10:80–92. doi: 10.1109/MITS.2018.2806619. [DOI] [Google Scholar]
- 46.Phan H.N., Pham L.H., Tran D.N.N., Ha S.V.U. Occlusion vehicle detection algorithm in crowded scene for traffic surveillance system; Proceedings of the 2017 International Conference on System Science and Engineering (ICSSE); Ho Chi Minh City, Vietnam. 21–23 July 2017; pp. 215–220. [Google Scholar]
- 47.Heidari V., Ahmadzadeh M.R. A method for vehicle classification and resolving vehicle occlusion in traffic images; Proceedings of the 2013 First Iranian Conference on Pattern Recognition and Image Analysis (PRIA); Birjand, Iran. 6–8 March 2013; pp. 1–6. [Google Scholar]
- 48.Ke L., Tai Y.W., Tang C.K. Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; Nashville, TN, USA. 20–25 June 2021; pp. 4019–4028. [Google Scholar]
- 49.Qi J., Gao Y., Hu Y., Wang X., Liu X., Bai X., Belongie S., Yuille A., Torr P.H.S., Bai S. Occluded Video Instance Segmentation: A Benchmark. arXiv 2021, arXiv:2102.01558. doi: 10.1007/s11263-022-01629-1. [DOI] [Google Scholar]
- 50.Saleh K., Szénási S., Vámossy Z. Occlusion Handling in Generic Object Detection: A Review; Proceedings of the 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI); Herl’any, Slovakia. 21–23 January 2021; pp. 000477–000484. [Google Scholar]
- 51.Yuan X., Kortylewski A., Sun Y., Yuille A. Robust Instance Segmentation through Reasoning about Multi-Object Occlusion; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; Nashville, TN, USA. 20–25 June 2021; pp. 11141–11150. [Google Scholar]
- 52.Feng P., She Q., Zhu L., Li J., Zhang L., Feng Z., Wang C., Li C., Kang X., Ming A. MT-ORL: Multi-Task Occlusion Relationship Learning; Proceedings of the IEEE/CVF International Conference on Computer Vision; Montreal, QC, Canada. 10–17 October 2021; pp. 9364–9373. [Google Scholar]
- 53.Zhan X., Pan X., Dai B., Liu Z., Lin D., Loy C.C. Self-Supervised Scene De-Occlusion; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; Seattle, WA, USA. 13–19 June 2020; pp. 3784–3792. [Google Scholar]
- 54.Yan X., Wang F., Liu W., Yu Y., He S., Pan J. Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery; Proceedings of the IEEE/CVF International Conference on Computer Vision; Seoul, Republic of Korea. 27 October–2 November 2019; pp. 7618–7627. [Google Scholar]
- 55.Lin J.P., Sun M.T. A YOLO-based traffic counting system; Proceedings of the 2018 Conference on Technologies and Applications of Artificial Intelligence (TAAI); Taichung, Taiwan. 30 November–2 December 2018; pp. 82–85. [Google Scholar]
- 56.Kim K.J., Park S.M., Choi Y.J. Deciding the number of color histogram bins for vehicle color recognition; Proceedings of the 2008 IEEE Asia-Pacific Services Computing Conference; Yilan, Taiwan. 9–12 December 2008; pp. 134–138. [Google Scholar]
- 57.Ge P., Hu Y. Vehicle Type Classification based on Improved HOG_SVM; Proceedings of the 3rd International Conference on Mechatronics Engineering and Information Technology (ICMEIT 2019); Dalian, China. 29–30 March 2019; pp. 640–647. [Google Scholar]
- 58.Kim J.A., Sung J.Y., Park S.H. Comparison of Faster-RCNN, YOLO, and SSD for real-time vehicle type recognition; Proceedings of the 2020 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia); Seoul, Republic of Korea. 1–3 November 2020; pp. 1–4. [Google Scholar]
- 59.Naik U.P., Rajesh V., Kumar R. Implementation of YOLOv4 algorithm for multiple object detection in image and video dataset using deep learning and artificial intelligence for urban traffic video surveillance application; Proceedings of the 2021 Fourth International Conference on Electrical, Computer and Communication Technologies (ICECCT); Erode, India. 15–17 September 2021; pp. 1–6. [Google Scholar]
- 60.Pavani K., Sriramya P. Comparison of KNN, ANN, CNN and YOLO algorithms for detecting the accurate traffic flow and build an Intelligent Transportation System; Proceedings of the 2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM), Gautam Buddha Nagar; India. 23–25 February 2022; pp. 628–633. [Google Scholar]
- 61.Zhao Y., Lu Z. Ford Vehicle Identification based on gray-level co-occurrence matrix and genetic neural network; Proceedings of the 2019 Third World Conference on Smart Trends in Systems Security and Sustainablity; London, UK. 30–31 July 2019; pp. 275–279. [Google Scholar]
- 62.Leotta M.J., Mundy J.L. Vehicle surveillance with a generic, adaptive, 3d vehicle model. IEEE Trans. Pattern Anal. Mach. Intell. 2010;33:1457–1469. doi: 10.1109/TPAMI.2010.217. [DOI] [PubMed] [Google Scholar]
- 63.Sochor J., Herout A., Havel J. Boxcars: 3d boxes as cnn input for improved fine-grained vehicle recognition; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Las Vegas, NV, USA. 27–30 June 2016; pp. 3006–3015. [Google Scholar]
- 64.Prokaj J., Medioni G. 3-D model based vehicle recognition; Proceedings of the 2009 Workshop on Applications of Computer Vision; Snowbird, UT, USA. 7–8 December 2009; pp. 1–7. [Google Scholar]
- 65.Shahin O.R., Alruily M. Vehicle Identification Using Eigenvehicles; Proceedings of the 2019 IEEE International Conference on Electrical, Computer and Communication Technologies; Coimbatore, India. 20–22 February 2019; pp. 1–6. [Google Scholar]
- 66.Shi C., Wu C. Vehicle Face Recognition Algorithm Based on Weighted Nonnegative Matrix Factorization with Double Regularization Terms. Ksii Trans. Internet Inf. Syst. 2020;14:2171–2185. [Google Scholar]
- 67.Ban J.M., Lee B.R., Kang H.C. Vehicle recognition using NMF in urban scene. J. Korean Inst. Commun. Inf. Sci. 2012;37:554–564. [Google Scholar]
- 68.Ban J.M., Kang H. Vehicle Recognition using Non-negative Tensor Factorization. J. Inst. Electron. Inf. Eng. 2015;52:136–146. [Google Scholar]
- 69.Redmon J., Divvala S., Girshick R., Farhadi A. You only look once: Unified, real-time object detection; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Las Vegas, NV, USA. 27–30 June 2016; pp. 779–788. [Google Scholar]
- 70.Wang K., Liu Y., Gou C., Wang F.Y. A multi-view learning approach to foreground detection for traffic surveillance applications. IEEE Trans. Veh. Technol. 2015;65:4144–4158. doi: 10.1109/TVT.2015.2509465. [DOI] [Google Scholar]
- 71.Guo H., Wang J., Xu M., Zha Z.J., Lu H. Learning multi-view deep features for small object retrieval in surveillance scenarios; Proceedings of the 23rd ACM International Conference on Multimedia; New York, NY, USA. 26–30 October 2015; pp. 859–862. [Google Scholar]
- 72.Chu W., Liu Y., Shen C., Cai D., Hua X.S. Multi-task vehicle detection with region-of-interest voting. IEEE Trans. Image Process. 2017;27:432–441. doi: 10.1109/TIP.2017.2762591. [DOI] [PubMed] [Google Scholar]
- 73.Oeljeklaus M., Hoffmann F., Bertram T. A fast multi-task CNN for spatial understanding of traffic scenes; Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC); Maui, HI, USA. 4–7 November 2018; pp. 2825–2830. [Google Scholar]
- 74.Liu S., Johns E., Davison A.J. End-to-end multi-task learning with attention; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; Long Beach, CA, USA. 15–20 June 2019; pp. 1871–1880. [Google Scholar]
- 75.Zadeh A., Chen M., Poria S., Cambria E., Morency L.P. Tensor fusion network for multimodal sentiment analysis. arXiv 2017, arXiv:1707.07250. [Google Scholar]
- 76.Liu Z., Shen Y., Lakshminarasimhan V.B., Liang P.P., Zadeh A., Morency L.P. Efficient low-rank multimodal fusion with modality-specific factors. arXiv 2018, arXiv:1806.00064. [Google Scholar]
- 77.Guo Y., Zhang C., Zhang C., Chen Y. Sparse dnns with improved adversarial robustness. Adv. Neural Inf. Process. Syst. 2018;31:240–249. [Google Scholar]
- 78.Oh Y.H., Quan Q., Kim D., Kim S., Heo J., Jung S., Jang J., Lee J.W. A portable, automatic data qantizer for deep neural networks; Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques; Limassol, Cyprus. 1–4 November 2018; pp. 1–14. [Google Scholar]
- 79.Denil M., Shakibi B., Dinh L., Ranzato M.A., De Freitas N. Predicting parameters in deep learning. Adv. Neural Inf. Process. Systems. 2013;26:1–9. [Google Scholar]
- 80.Mai A., Tran L., Tran L., Trinh N. VGG deep neural network compression via SVD and CUR decomposition techniques; Proceedings of the 2020 7th NAFOSTED Conference on Information and Computer Science (NICS); Ho Chi Minh City, Vietnam. 26–27 November 2020; pp. 118–123. [Google Scholar]
- 81.Jaderberg M., Vedaldi A., Zisserman A. Speeding up convolutional neural networks with low rank expansions. arXiv 2014, arXiv:1405.3866. [Google Scholar]
- 82.Lebedev V., Ganin Y., Rakhuba M., Oseledets I., Lempitsky V. Speeding-up convolutional neural networks using fine-tuned cp-decomposition. arXiv 2014, arXiv:1412.6553. [Google Scholar]
- 83.Kim Y.D., Park E., Yoo S., Choi T., Yang L., Shin D. Compression of deep convolutional neural networks for fast and low power mobile applications. arXiv 2015, arXiv:1511.06530. [Google Scholar]
- 84.Tukan M., Maalouf A., Weksler M., Feldman D. No fine-tuning, no cry: Robust svd for compressing deep networks. Sensors. 2021;21:5599. doi: 10.3390/s21165599. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Tai C., Xiao T., Zhang Y., Wang X. Convolutional neural networks with low-rank regularization. arXiv 2015, arXiv:1511.06067. [Google Scholar]
- 86.Xu Y., Li Y., Zhang S., Wen W., Wang B., Dai W., Qi Y., Qi Y., Lin W., Xiong H. Trained rank pruning for efficient deep neural networks; Proceedings of the 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS); Vancouver, BC, Canada. 13 December 2019; pp. 14–17. [Google Scholar]
- 87.Novikov A., Podoprikhin D., Osokin A., Vetrov D.P. Tensorizing neural networks. Adv. Neural Inf. Process. Syst. 2015;28:442–450. [Google Scholar]
- 88.Newman E., Horesh L., Avron H., Kilmer M. Stable tensor neural networks for rapid deep learning. arXiv 2018, arXiv:1811.06569. doi: 10.3389/fdata.2024.1363978. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Lee D., Kwon S.J., Kim B., Wei G.Y. Learning low-rank approximation for cnns. arXiv 2019, arXiv:1905.10145. [Google Scholar]
- 90.Padilla-Zepeda E., Torres-Roman D., Mendez-Vazquez A. A Semantic Segmentation Framework for Hyperspectral Imagery Based on Tucker Decomposition and 3DCNN Tested with Simulated Noisy Scenarios. Remote Sens. 2023;15:1399. doi: 10.3390/rs15051399. [DOI] [Google Scholar]
- 91.Kossaifi J., Lipton Z.C., Kolbeinsson A., Khanna A., Furlanello T., Anandkumar A. Tensor regression networks. J. Mach. Learn. Res. 2020;21:4862–4882. [Google Scholar]
- 92.Zhu J., Li X., Jin P., Xu Q., Sun Z., Song X. Mme-yolo: Multi-sensor multi-level enhanced yolo for robust vehicle detection in traffic surveillance. Sensors. 2021;21:27. doi: 10.3390/s21010027. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Cao X., Wu C., Yan P., Li X. Linear SVM classification using boosting HOG features for vehicle detection in low-altitude airborne videos; Proceedings of the 2011 18th IEEE International Conference on Image Processing; Brussels, Belgium. 11–14 September 2011; pp. 2421–2424. [Google Scholar]
- 94.Wen X., Yuan H., Yang C., Song C., Duan B., Zhao H. Improved Haar wavelet feature extraction approaches for vehicle detection; Proceedings of the 2007 IEEE Intelligent Transportation Systems Conference; Bellevue, WA, USA. 30 September–3 October 2007; pp. 1050–1053. [Google Scholar]
- 95.Kuang H., Chen L., Chan L.L.H., Cheung R.C., Yan H. Feature selection based on tensor decomposition and object proposal for night-time multiclass vehicle detection. IEEE Trans. Syst. Man Cybern. Syst. 2018;49:71–80. doi: 10.1109/TSMC.2018.2872891. [DOI] [Google Scholar]
- 96.Wang W., Zhang M. Tensor deep learning model for heterogeneous data fusion in Internet of Things. IEEE Trans. Emerg. Top. Comput. Intell. 2018;4:32–41. doi: 10.1109/TETCI.2018.2876568. [DOI] [Google Scholar]
- 97.Brazell M., Li N., Navasca C., Tamon C. Solving multilinear systems via tensor inversion. SIAM J. Matrix Anal. Appl. 2013;34:542–570. doi: 10.1137/100804577. [DOI] [Google Scholar]
- 98.Rogers M., Li L., Russell S.J. Multilinear dynamical systems for tensor time series. Adv. Neural Inf. Process. Syst. 2013;26:1–9. [Google Scholar]
- 99.Chen C., Surana A., Bloch A., Rajapakse I. Multilinear time invariant system theory; Proceedings of the 2019 Conference on Control and its Applications; Chengdu, China. 19–21 June 2019; pp. 118–125. [Google Scholar]
- 100.Pandey D., Leib H. A tensor framework for multi-linear complex MMSE estimation. IEEE Open J. Signal Process. 2021;2:336–358. doi: 10.1109/OJSP.2021.3084541. [DOI] [Google Scholar]
- 101.Greub W.H. Multilinear Algebra. Springer; Berlin/Heidelberg, Germany: 1978. Tensor algebra; pp. 60–83. [Google Scholar]
- 102.Lim L.H. Tensors in computations. Acta Numerica. 2021;30:555–764. doi: 10.1017/S0962492921000076. [DOI] [Google Scholar]
- 103.Panigrahy K., Mishra D. Extension of Moore–Penrose inverse of tensor via Einstein product. Linear Multilinear Algebra. 2022;70:750–773. doi: 10.1080/03081087.2020.1748848. [DOI] [Google Scholar]
- 104.Sagiroglu S., Sinanc D. Big data: A review; Proceedings of the 2013 International Conference on Collaboration Technologies and Systems (CTS); San Diego, CA, USA. 20–24 May 2013; pp. 42–47. [Google Scholar]
- 105.Fan J., Han F., Liu H. Challenges of big data analysis. Natl. Sci. Rev. 2014;1:293–314. doi: 10.1093/nsr/nwt032. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 106.Lu H., Plataniotis K.N., Venetsanopoulos A.N. A survey of multilinear subspace learning for tensor data. Pattern Recognit. 2011;44:1540–1551. doi: 10.1016/j.patcog.2011.01.004. [DOI] [Google Scholar]
- 107.De La Torre F., Black M.J. A framework for robust subspace learning. Int. J. Comput. Vis. 2003;54:117–142. doi: 10.1023/A:1023709501986. [DOI] [Google Scholar]
- 108.Pearson K. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1901;2:559–572. doi: 10.1080/14786440109462720. [DOI] [Google Scholar]
- 109.Spearman C. “General Intelligence” Objectively Determined and Measured. Am. J. Psychol. 1904;15:201–292. doi: 10.2307/1412107. [DOI] [Google Scholar]
- 110.Hyvärinen A., Oja E. Independent component analysis: Algorithms and applications. Neural Netw. 2000;13:411–430. doi: 10.1016/S0893-6080(00)00026-5. [DOI] [PubMed] [Google Scholar]
- 111.Hotelling H. Breakthroughs in Statistics. Springer; New York, NY, USA: 1992. Relations between two sets of variates; pp. 162–190. [Google Scholar]
- 112.Eckart C., Young G. The approximation of one matrix by another of lower rank. Psychometrika. 1936;1:211–218. doi: 10.1007/BF02288367. [DOI] [Google Scholar]
- 113.Klecka W.R., Iversen G.R., Klecka W.R. Discriminant Analysis. Volume 19 Sage; Thousand Oaks, CA, USA: 1980. [Google Scholar]
- 114.Sener O., Koltun V. Multi-task learning as multi-objective optimization. Adv. Neural Inf. Process. Syst. 2018;31:525–536. [Google Scholar]
- 115.Samet H., Tamminen M. Efficient component labeling of images of arbitrary dimension represented by linear bintrees. IEEE Trans. Pattern Anal. Mach. Intell. 1988;10:579–586. doi: 10.1109/34.3918. [DOI] [Google Scholar]
- 116.Dillencourt M.B., Samet H., Tamminen M. A general approach to connected-component labeling for arbitrary image representations. J. ACM. 1992;39:253–280. doi: 10.1145/128749.128750. [DOI] [Google Scholar]
- 117.Dougherty E.R., Lotufo R.A. Hands-On Morphological Image Processing. SPIE Press; Bellingham, WA, USA: 2003. p. 59. [Google Scholar]
- 118.Cichocki A., Mandic D., De Lathauwer L., Zhou G., Zhao Q., Caiafa C., Phan H.A. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Process. Mag. 2017;32:145–163. doi: 10.1109/MSP.2013.2297439. [DOI] [Google Scholar]
- 119.Zhang C., Bengio S., Hardt M., Recht B., Vinyals O. Understanding deep learning (still) requires rethinking generalization. Commun. ACM. 2021;64:107–115. doi: 10.1145/3446776. [DOI] [Google Scholar]
- 120.Vervliet N., Debals O., Sorber L., De Lathauwer L. Breaking the curse of dimensionality using decompositions of incomplete tensors: Tensor-based scientific computing in big data analysis. IEEE Signal Process. Mag. 2014;31:71–79. doi: 10.1109/MSP.2014.2329429. [DOI] [Google Scholar]
- 121.Phan A., Sobolev K., Sozykin K., Ermilov D., Gusak J., Tichavsky P., Glukhov V., Oseledets I., Cichocki A. Stable low-rank tensor decomposition for compression of convolutional neural network; Proceedings of the Computer Vision–ECCV 2020, 16th European Conference; Glasgow, UK. 23–28 August 2020; Cham, Switzerland: Springer; 2020. pp. 522–539. Proceedings, Part XXIX 16. [Google Scholar]
- 122.Fernando Hermosillo Reynoso—Youtube. [(accessed on 20 November 2023)]. Available online: https://www.youtube.com/watch?v=ZWWX4nojMos&list=PLKng1hWmrHM2wWBQXrA8zxOoPzFSj-Upj.
- 123.Fernando Hermosillo Reynoso—Github. [(accessed on 20 November 2023)]. Available online: https://github.com/fhermosillo/TDF.
- 124.Kingma D.P., Ba J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- 125.Markoulidakis I., Kopsiaftis G., Rallis I., Georgoulas I. Multi-class confusion matrix reduction method and its application on net promoter score classification problem; Proceedings of the 14th PErvasive Technologies Related to Assistive Environments Conference; Corfu, Greece. 29 June–2 July 2021; pp. 412–419. [Google Scholar]
- 126.Gonzalez-Ramirez A., Lopez J., Torres-Roman D., Yañez-Vargas I. Analysis of multi-class classification performance metrics for remote sensing imagery imbalanced datasets. J. Quant. Stat. Anal. 2021;8:11–17. doi: 10.35429/JQSA.2021.22.8.11.17. [DOI] [Google Scholar]
- 127.Luque A., Carrasco A., Martín A., de Las Heras A. The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognit. 2019;91:216–231. doi: 10.1016/j.patcog.2019.02.023. [DOI] [Google Scholar]
- 128.Arlot S., Celisse A. A survey of cross-validation procedures for model selection. Stat. Surv. 2010;4:40–79. doi: 10.1214/09-SS054. [DOI] [Google Scholar]