Table 6.
Approach | Disadvantage | Advantage | Idea | Method |
---|---|---|---|---|
Supervised learning | (i) High computation time when assigning a new instance to a class. (ii) Requires selecting an appropriate similarity measure. | (i) Relatively high classification accuracy. (ii) Comprehensively evaluated in empirical studies of time series classification. (iii) Simplicity. (iv) Good performance against a large number of supervised methods. | Classification relies on the similarity between the training set and new examples: a new instance is assigned to a class by a majority vote of its nearest neighbors (see the KNN sketch after the table). | KNN |
Supervised learning | (i) Natively limited to two classes. (ii) Building the training model on large-scale data is costly. (iii) Possibly low performance on large datasets. (iv) High training time. (v) Ignores distant data points. (vi) Low performance on datasets with high noise. | (i) Linear separation in the specified space. (ii) Fast at detection time. (iii) Suitable even with little training data. (iv) Strong generalizability. (v) Suitable for complex activities. (vi) Robust against heteroscedastic noise. | Uses statistical learning theory to maximize the margin between the separator and the data. | SVM |
Supervised learning | (i) Accuracy depends directly on the selected feature set. (ii) Risk of overfitting with small datasets and very deep trees. (iii) Requires large datasets. | (i) Excellent computational performance. (ii) Resistant to data noise. (iii) Efficient on large datasets. (iv) Suitable when the dataset does not have values for all features. | The DT uses static features of the time series data and operates on a sliding window. | DT |
Supervised learning | (i) Needs a lot of labeled data to achieve good performance. (ii) Low performance with little data. | (i) Improves on the performance of the DT. (ii) Compatible with multiclass problems. (iii) Identifies important features for classification. | A random forest combines multiple decision trees and classifies by a majority vote over the trees' individual decisions. | RF |
Supervised learning | (i) Challenges in the data collection phase. (ii) User-related inaccuracy. (iii) A user-independent model leads to lower accuracy. (iv) Accuracy may decrease on big data. | (i) Noise injection is used to improve activity detection models. (ii) High accuracy and reduced false-positive rates. (iii) Less vulnerable to changing conditions. (iv) Good generalizability. | To create general recognition models for e-health, a small core dataset is used and the region it covers is expanded by injecting noise. | QDA |
Supervised learning | (i) Little detail about the apparently desirable parameters. (ii) Lack of systematic exploration of deep learning capabilities. (iii) Requires selecting an appropriate deep learning method. | (i) Provides high-level abstraction models of the data. (ii) High accuracy. | Deep learning has emerged as a branch of learning models that builds a deep, multilayered architecture for automated feature design. | DLC |
Supervised learning | (i) Retains only the previous step. (ii) High computational cost. (iii) Vanishing gradients. (iv) Exploding gradients. (v) Difficulty modeling long-term dynamics. | (i) Compatible with variable-length input. (ii) Retains the previous step for higher accuracy. | Consists of nonlinear units with internal states that can learn dynamic temporal behavior from continuous input of arbitrary length. | RNN |
Supervised learning | (i) High model complexity. (ii) Poor decoding efficiency. (iii) High training and decoding costs. | (i) Traceable learning. (ii) Suitable for a variety of activities. (iii) Suitable for modeling complicated temporal relationships. (iv) Suitable for group activities. (v) Counteracts the effect of vanishing gradients. | At each step, the memory content of the first layer carries discriminative information describing the person's movement and past changes in their activity. Over time, the cells learn to output, overwrite, or ignore their internal memory based on the current input and past state history, yielding a system that can retain information across hundreds of steps. | LSTM |
Supervised learning | (i) A difficult balance between learning rate and learning accuracy. | (i) Better performance than the perceptron. (ii) Avoids getting stuck at local minima. (iii) Finds patterns in massive data. (iv) An effective remedy for vanishing gradients. (v) Useful with small training sets. | As with the convolutional approach, a set of sample matrices is first generated; the average of the samples' signals in each matrix is then used as the DBN input. | DBN |
Supervised learning | (i) CNN processing units must be applied along the temporal dimension. (ii) CNN units must be shared or integrated across different sensors. (iii) Choosing a smaller window step to increase the number of samples raises computational cost. (iv) Requires long computation time and large memory. | (i) More powerful learned features. (ii) Exploits local signals and local dependencies. (iii) Invariance to changes in scale. | Based on a deep architecture that contains at least one temporal convolutional layer, one pooling layer, and one fully connected layer before the classifier. | CNN |
Unsupervised learning | (i) Convergence is not guaranteed in many cases. (ii) Depends on the initialization of the EM algorithm. | (i) Suitable for detecting most activities. (ii) Good performance on sparse data with high diversity. | A probabilistic method commonly used in unsupervised classification that models the data with a weighted sum of Gaussian component densities. | GMM |
Unsupervised learning | (i) Poor performance when clusters overlap. (ii) Uncertainty about data assignment, especially in overlapping regions. (iii) Merges two different clusters when k is smaller than the actual number. (iv) Clustering results and iteration time depend on the initial cluster centers. (v) Can converge very slowly with a poor initialization. | (i) Minimizes the total within-cluster variance (distortion) as its cost function. (ii) Low computational complexity. (iii) High performance on large datasets. (iv) Nearly linear time complexity. | An unsupervised method that clusters n samples into k classes by repeatedly estimating the cluster centers and assigning each sample to a cluster according to its distance (for example, Euclidean) from the cluster center until convergence (see the k-means sketch after the table). | K-means |
Unsupervised learning | (i) Poor performance when clusters overlap. (ii) Uncertainty about data classification. (iii) Merges two different clusters when k is smaller than the actual number. (iv) Clustering results and iteration time depend on the initial cluster centers. (v) Can converge very slowly with a poor initialization. (vi) Convergence is not guaranteed. (vii) Recognizes a sequence that contains more than one activity as a single activity. (viii) Not suitable for complex activities. | (i) A dynamic method. (ii) High performance for detecting short-term activities. (iii) Compatible with sequential data models. | A Markov chain describes a discrete-time random process over a finite number of states in which the current state depends on the previous one. In HAR, each activity is represented by a state. | Markov |
Semisupervised learning | (i) Difficult to analyze because it is a wrapper algorithm. | (i) Limited labeling cost. (ii) Good performance in some cases. | A wrapper algorithm that repeatedly applies a supervised learning method: a supervised classifier is first trained on a small amount of labeled data and is then used iteratively to label the remaining unlabeled data (see the self-training sketch after the table). | Self-training |
Semisupervised learning | (i) Requires data samples that can be described by two feature subsets which are each sufficient and redundant. (ii) Used mainly in particular applications such as text classification. (iii) Each classifier's labeling confidence must be measured carefully to decide which samples to label. (iv) This measurement process can be very time-consuming. | (i) An excellent approach for using unlabeled data to improve learning efficiency. | Follows the process of repeated self-training while aiming to improve it by strengthening training with an additional source of information. | Co-training |
Semisupervised learning | (i) There is not always an identifiable mixture distribution from which the generative model can be built. (ii) Not suitable for all semisupervised learning tasks. | (i) Handles missing data in the classification problem. (ii) No cost of data labeling by an expert. | The core of the generative model for semisupervised learning is to use large amounts of unlabeled data to identify the mixture components; the unlabeled data of each class are then sufficient to fully determine the mixture distribution. | Generative |
Semisupervised learning | (i) High learning cost. (ii) Many parameters. | (i) Can handle both unlabeled and labeled data points. (ii) Relatively high accuracy. | Most well-known deep learning methods, such as CNN and LSTM, can be formulated as generative or discriminative models, so it is not surprising that they can learn directly from unlabeled data. | DLS |
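For concreteness, the sketch below illustrates the KNN row of the table: feature windows are classified by a majority vote of their nearest neighbors. It is a minimal sketch under assumptions not made in the table — the feature matrix, labels, and k are synthetic placeholders (a real HAR pipeline would extract statistical features from sliding windows over sensor signals), and scikit-learn is assumed to be available.

```python
# Hedged sketch: KNN classification of windowed activity features (synthetic data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical feature matrix: 600 windows x 12 statistical features
# (e.g., per-axis mean, std, energy) with 3 activity labels.
X = rng.normal(size=(600, 12))
y = rng.integers(0, 3, size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Each test window receives the majority label of its k nearest
# training windows (Euclidean distance by default).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("KNN accuracy:", knn.score(X_test, y_test))
```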
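Similarly, a minimal k-means sketch for the unsupervised row: unlabeled feature windows are grouped by repeatedly updating cluster centers and reassigning points. The data and the choice of k = 4 are illustrative assumptions, not values from the table.

```python
# Hedged sketch: k-means clustering of unlabeled activity windows (synthetic data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 12))          # 500 unlabeled feature windows

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)          # repeated center estimation + assignment

# inertia_ is the within-cluster sum of squared distances (the distortion
# the algorithm minimizes, as listed among the advantages in the table).
print("cluster sizes:", np.bincount(labels))
print("within-cluster distortion:", kmeans.inertia_)
```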
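Finally, a self-training sketch for the semisupervised row: a base supervised classifier is first fit on a small labeled subset and then iteratively pseudo-labels the unlabeled windows it is most confident about. The base estimator (an SVC), the confidence threshold, and the data are illustrative assumptions; scikit-learn's SelfTrainingClassifier marks unlabeled samples with -1.

```python
# Hedged sketch: self-training with a wrapped supervised classifier (synthetic data).
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 12))
y_true = rng.integers(0, 3, size=400)

# Keep labels for only 10% of the windows; the rest are "unlabeled" (-1).
y = np.full(400, -1)
labeled = rng.choice(400, size=40, replace=False)
y[labeled] = y_true[labeled]

base = SVC(probability=True)            # base estimator must expose predict_proba
self_training = SelfTrainingClassifier(base, threshold=0.75)
self_training.fit(X, y)

# transduction_ holds the labels assigned during fitting, including pseudo-labels.
pseudo = int((self_training.transduction_ != -1).sum()) - len(labeled)
print("pseudo-labeled samples:", pseudo)
```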