Abstract
The increasing use of motion sensors is causing major changes in the process of monitoring people's activities. One of the main applications of these sensors is the detection of sports activities; for example, they can be used to monitor the condition of athletes or analyze the quality of sports training. Although existing sensor-based activity recognition systems can recognize basic activities such as walking, running, or sitting, they do not perform well in recognizing different types of sports activities. This article introduces a new model based on machine learning (ML) techniques to more accurately distinguish between sports and everyday activities. In the proposed method, the data needed to detect the type of activity is collected through two sensors: an accelerometer and a gyroscope attached to a person's foot. For this purpose, the input signals are first preprocessed, and then the short-time Fourier transform (STFT) is used to describe the characteristics of each signal. In the next step, each STFT matrix is used as input to a convolutional neural network (CNN). This CNN describes the various motion characteristics of the sensor in the form of vectors. Finally, a classification model based on error-correcting output codes (ECOC) is used to classify the extracted features and detect the type of sports activity (SA). The performance of the proposed SA recognition method is evaluated using the DSADS database, and the results are compared with previous methods. Based on the results, the proposed method can recognize sports activities with an accuracy of 99.71%. Furthermore, the precision and recall of the proposed method are 99.72% and 99.71%, respectively, which are better than the compared methods.
Keywords: Sports activity recognition, Deep learning, Error-correcting output codes (ECOC), Convolutional neural network (CNN)
1. Introduction
In the last decade, sensor manufacturing technologies and artificial intelligence (AI) techniques have been developing simultaneously. This has led to the development of intelligent systems for the automatic processing of information generated by sensors. By using these systems, a wide range of costly and exhausting processes can be performed automatically and accurately [1]. Recognition and analysis of sports activities (SA) is one of these issues. In the past, SA analysis was done manually by experienced experts. This process is time-consuming, energy-consuming, and not error-free [2]. For this reason, the problem of automatic SA recognition can be solved more efficiently than before by taking advantage of AI techniques. In recent years, various studies have been conducted in the field of SA recognition using AI and ML techniques.
Some early studies performed SA recognition using machine vision techniques [3]. In these methods, the input data is recorded through a camera, and by processing the resulting images, SA can be recognized. Nevertheless, methods based on machine vision face limitations that arise from the type of data used. The movement limitation of the camera; noise, vibration, and reflection in the images; and the possibility of error in identifying the subject are some of these challenges [4]. The rapid development of technologies for manufacturing wearable sensors, as well as the rapid expansion of the Internet of Things (IoT), has also affected the strategies for recognizing SA, so that in recent years, SA recognition systems have tended more toward processing sensor data [5]. In these methods, the data needed to recognize SA is gathered through wearable sensors such as accelerometers, gyroscopes, and magnetometers [6].
Examining the previously presented solutions in this field shows that recognition methods based on sensor data face two basic challenges. First, some existing methods are limited in the number of recognizable activities; for instance, some works recognize only a small number of sports activities [7], whereas a practical system should be able to cover a wide range of people's daily activities. Second, in most research, accurate sports activity recognition (SAR) requires the use of multiple sensors. In this case, in addition to the body, sensors must be installed on both hands and both feet of the person. This requirement, besides possibly restricting the individual's activities, increases implementation costs and reduces the feasibility of use in real conditions. The increasing popularity of motion sensors has revolutionized the way human activities are monitored and analyzed. Among the many applications, SAR stands out as a crucial tool for various purposes, including athlete monitoring, training evaluation, and injury prevention. While existing SAR systems demonstrate proficiency in identifying basic activities like walking, running, and sitting, their ability to accurately distinguish between various sports disciplines remains a challenge. Addressing the above limitations has motivated the current research. In this research, we propose a novel SAR framework that leverages machine learning techniques to achieve enhanced accuracy and robustness. Our approach utilizes a combination of two sensors, an accelerometer and a gyroscope, strategically placed on the foot to gather comprehensive data. The gathered signals undergo preprocessing, followed by STFT to extract temporal and frequency features. These features are then fed into a CNN, which extracts high-level, discriminative representations of the movement patterns. Finally, an ECOC classifier is employed to categorize the extracted features and identify the specific sports activity.
Therefore, in this article, an attempt is made to solve the challenges in the research problem by presenting a new hybrid strategy for SAR. The proposed method, through efficient feature extraction and also by using an accurate classification model, can recognize a wide range of people's daily and sports activities only based on the data of two sensors. The contribution of the current article includes the following.
• This study presents a new feature extraction model based on CNN to effectively describe human motion features. In this feature extraction model, the motion data obtained through the sensors in each dimension is processed by a separate CNN model.
• The paper presents a new classification model based on the combination of ECOC and support vector machines (SVM), in which weighted binary classifiers are used to reduce prediction error.
• 10-fold cross-validation was used to evaluate the performance of the proposed model and demonstrate its generalizability.

The rest of this article is organized as follows. Section 2 provides an overview of previous efforts in the SAR field. Section 3 describes the details of the proposed method, and Section 4 discusses the implementation results. Finally, the research conclusions are summarized in the last section.
2. Research background
Based on the type of data used, SAR methods can be divided into machine vision-based methods and sensor-based methods. Considering the current research strategy, this review mainly focuses on sensor-based methods; afterward, several recent vision-based SAR methods are also reviewed.
2.1. SAR based on sensor data
In general, sensor-based methods include steps for pre-processing, signal segmentation, feature extraction, feature selection (or feature reduction), and classification. In Ref. [8], a method based on a CNN was presented for SAR. In this method, the input signal is preprocessed, segmented, and normalized after passing through high-pass and low-pass filters. The data for each sensor is then described in the form of an STFT matrix. These matrices are merged, and the CNN model is used to determine the type of SA. This method performs SAR using the data from four sensors: two accelerometers and two gyroscopes worn on the person's right hand and right foot.
In [9], a SAR method based on deep learning techniques was proposed. This method collects data from 15 sensors, including 5 accelerometers, 5 gyroscopes, and 5 magnetometers attached to the person's torso, hands, and feet. In this method, the sensor data is preprocessed through a process of segmentation and sampling. Then, these data are merged in a matrix format to generate the input of the deep learning models. The article compares the performance of five deep learning models for SA recognition, namely CNN, Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Bidirectional GRU (BiGRU). Among these, the BiGRU model achieves higher accuracy than the other methods. In Ref. [10], the authors tried to improve the performance of the BiGRU model in recognizing sports and daily activities by adding multimodal features. This model also achieves acceptable detection accuracy using five sensors. In Ref. [11], an aggregate system based on deep learning was introduced for SAR. This method gathers the required data through 9 sensors worn on the person's upper body, right hand, and right leg. In this method, the sensor data are combined after segmentation and simultaneously used as input to several CNN models. Then, the weighted values of the output layers of these models are combined, and the activity type is identified based on the maximum total weight. In this method, the number of recognizable sports activities is limited to 4 classes. In Ref. [12], the transformer model was used to detect people's activities based on the movement data of the sensors. The transformer model is a deep learning model that, due to its use of the attention mechanism, can recognize people's activity characteristics over time more accurately than basic CNN models. In this method, the movement data is normalized and, after being converted into matrix form, is passed to the transformer model.
In this research, an experimental strategy was used to adjust the parameters of the learning model.
In [13], an attempt was made to increase the accuracy of recognizing people's activities based on data from wearable sensors by combining classification models. In this method, four types of data are used: acceleration, angular velocity, magnetic field, and direction. Each of these data types is processed by an LSTM model, and the type of activity is recognized using a majority voting strategy. In this method, the number of recognizable activities is limited to 7 classes. In Ref. [14], a method based on deep learning and motion sensor data analysis was proposed to detect different types of movements in a Frisbee game. This method uses an accelerometer and a gyroscope sensor worn on the player's dominant hand. The acceleration and gyroscope data are each processed separately by similar CNN models. These CNN models include 3 convolution layers, and their last fully connected layers are joined so that the two feature sets of acceleration and gyroscope can be merged. Finally, by classifying the merged features with a SoftMax layer, the input data are placed in one of the 9 movements defined by this model.
The study done in Ref. [15] focuses on analyzing sensor movement data to identify various football movement types. This method uses a CNN model as the learning model and a BiLSTM for feature categorization. Five sensors provide the data needed for this learning model, and each data record is categorized into one of the five established movement classes. In Ref. [16], a combination of CNN and LSTM based on the attention mechanism was utilized to recognize people's activities through inertial sensors. In this method, first, the input data is pre-processed and segmented, and then each created segment is sent to a CNN network with 16 filters. The CNN output is passed through three layers of LSTM based on the attention mechanism to finally identify the type of activity. The method presented in Ref. [17] uses the data of smartphone sensors to recognize people's activities. This method first pre-processes the data and then extracts its features. The appropriate features are then chosen using the Lukasiewicz similarity (LS) criterion. A multi-class SVM model is then used to optimize and classify the chosen features. Research in Ref. [18] presented a GRU-INC model for human activity recognition. The model combines the advantages of gated recurrent units (GRUs) for temporal modeling with inception modules and convolutional block attention modules (CBAMs) for spatial modeling. Research in Ref. [19] presented a multi-class algorithm for human activity recognition using an ensemble of auto-encoders (AEs), where each AE in the ensemble is associated with a unique class and the final classification is determined by the ensemble. Also, in Ref. [20], a CNN-GRU model for human activity recognition was presented.
2.2. Vision-based SAR
Vision-based models use video analysis methods to recognize sports activities. Reflection, camera movement, and accurate target tracking can be considered the main challenges in these methods. In a notable advancement in human action recognition, researchers in Ref. [21] introduced a novel approach that integrates deep neural networks (DNNs) with multiview features.
This method harnesses pre-trained convolutional neural networks (CNNs) for the DNN features while drawing upon horizontal and vertical directional gradients for the multiview features. Relative entropy, mutual information, and the strong correlation coefficient (SCC) are three criteria used to determine which elements of the combined feature set are most significant. A Naive Bayes classifier receives these condensed features to make the final recognition. Table 1 summarizes the studied works.
Table 1. Summary of the studied works.
| Reference | Year | Method | Sensors | Dataset | Classes |
|---|---|---|---|---|---|
| [8] | 2019 | CNN | 4 (2 accelerometers, 2 gyroscopes) | DSADS | 7 |
| [9] | 2022 | CNN, LSTM, BiLSTM, GRU, BiGRU | 15 (5 accelerometers, 5 gyroscopes, 5 magnetometers) | UCI-HAR | 9 |
| [10] | 2022 | BiGRU | 5 | UCI-HAR | 12 |
| [11] | 2022 | Aggregate CNN | 9 | Custom dataset | 4 |
| [12] | 2022 | Transformer | 4 (accelerometer, gyroscope, magnetometer, direction) | Custom dataset | 4 |
| [13] | 2022 | Combined classification models (LSTM) | 4 (acceleration, angular velocity, magnetic field, direction) | Custom dataset | 7 |
| [14] | 2022 | Convolutional neural networks (CNN) | 2 (accelerometer, gyroscope) | Custom dataset | 9 |
| [15] | 2022 | CNN + BiLSTM | 5 | Custom dataset | 5 |
| [16] | 2022 | CNN + LSTM + attention mechanism | Inertial sensors | Custom dataset | 12 |
| [17] | 2022 | Smartphone sensors + SVM | 9 (accelerometer, gyroscope, magnetometer, light, pressure, humidity) | UCI-HAR | 6 |
| [18] | 2023 | GRU + CBAM | 15 (5 accelerometers, 5 gyroscopes, 5 magnetometers) | UCI-HAR, OPPORTUNITY, PAMAP2, WISDM | – |
| [19] | 2021 | Ensemble of AEs | 15 (5 accelerometers, 5 gyroscopes, 5 magnetometers) | WISDM, MHealth, PAMAP2 | 5 |
| [20] | 2023 | CNN + GRU | 9 | UCI-HAR, OPPORTUNITY, Mhealth | 6 |
3. Proposed model for recognizing SA
The suggested model for SAR employing deep learning and ML approaches is detailed in this section. The suggested method extracts features with a CNN and classifies and recognizes SA with the ECOC strategy. In summary, the suggested approach consists of the following steps:

1. Pre-processing and signal normalization
2. Feature description based on STFT
3. CNN-based feature extraction
4. Classification based on ECOC
In Fig. 1, the steps of recognizing SA by the proposed method are shown. According to this diagram, the proposed method starts with pre-processing of the input signal, using low-pass and high-pass filters, to reduce the destructive effect of noise and improve recognition accuracy. Then the pre-processed signal is normalized. By doing this, the signal magnitudes are ignored and the normalization result only includes the patterns of data changes. In the next step, a spectrogram is extracted from the obtained signals for each dimension. The obtained matrices are combined to create the input of the CNN model. This CNN model is responsible for extracting features related to a person's movement pattern. Therefore, the vector obtained from the last fully connected layer of the CNN model is considered as the extracted features. The extracted features are used as input to the ECOC model. This ECOC model includes a set of SVM classifiers, which together can increase the accuracy of recognition in multi-class problems. The ECOC model of the suggested method expresses the efficacy of each binary classifier as a weight value. This strategy has not been studied previously and distinguishes the suggested method from past publications. The proposed method is described in detail in the following subsections.
Fig. 1.
Steps of the proposed method for recognizing SA.
3.1. Signal preprocessing and normalization
The signals gathered via motion sensors always include noise. This noise can be caused by sensor error, environmental conditions, disturbances caused by other equipment, etc. The presence of noise in the signal adds irrelevant data to it and can cause errors in the recognition of SA. For this reason, it is necessary to pre-process the signal to remove the destructive effect of noise. On the other hand, accelerometer sensors are affected by the gravity component, and to retain only the movement data related to SA, this component should also be removed from the accelerometer signals [22]. In the proposed method, this is done using low-pass and high-pass filters.
Removing noise and unwanted data from the input signals is done with two filters. The first filter is for removing noise. Noise appears as high-frequency data in motion signals. To remove this data, a 10-point moving-average low-pass filter is utilized. After passing the input signals through the low-pass filter, the gravity component can be removed from the accelerometer signals. For this purpose, each accelerometer signal is processed with a high-pass filter. The filter used in the proposed method is a third-order elliptic filter with a cutoff frequency of 0.005 Hz. The gyroscope signal passes only through the low-pass filter to remove the noise contained in the signal, while the accelerometer signal passes through the aforementioned high-pass filter after the low-pass filter, which removes the influence of the gravitational component in addition to noise. This concludes the signal preprocessing process. To automatically segment motion from the filtered accelerations and angular velocities generated by the sensors, we utilized the adaptive magnitude threshold method proposed in Ref. [23]. This method consists of four steps.
1. Calculation of signal vector magnitudes,
2. Calculation of adaptive magnitudes of the signal vector magnitudes,
3. Finding start points of sport activity motion, and
4. Finding end points of sport activity motion.
The detailed procedure for sport motion segmentation can be found in Ref. [23] and therefore, its description is omitted.
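The two-filter preprocessing stage described above can be sketched as follows. The 10-point moving-average window, the third-order elliptic design, and the 0.005 Hz cutoff come from the text; the passband ripple (0.1 dB) and stopband attenuation (40 dB) of the elliptic filter are assumed values, since the paper does not state them.

```python
import numpy as np
from scipy.signal import ellip, filtfilt

FS = 25.0  # DSADS sampling rate (Hz)

def moving_average_lowpass(signal, window=10):
    """10-point moving-average low-pass filter for noise removal."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

def remove_gravity(signal, fs=FS, cutoff=0.005):
    """Third-order elliptic high-pass filter that strips the gravity (DC)
    component. Passband ripple and stopband attenuation are assumptions."""
    b, a = ellip(3, 0.1, 40.0, cutoff, btype="highpass", fs=fs)
    return filtfilt(b, a, signal)

def preprocess(acc, gyro):
    """Accelerometer: low-pass then high-pass; gyroscope: low-pass only."""
    acc_f = remove_gravity(moving_average_lowpass(acc))
    gyro_f = moving_average_lowpass(gyro)
    return acc_f, gyro_f
```

Zero-phase filtering via `filtfilt` is used here to avoid shifting the motion patterns in time; the paper does not specify the filtering direction.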
All preprocessed signals are then normalized. Normalization involves converting all input signals to a specific scale. Depending on the magnitude of the signal values, errors may otherwise occur in the recognition process. In other words, in the process of recognizing SA, the pattern of signal changes is of great importance, not the signal values. For example, swimming can be done at high or low speed. In both cases (which should be classified in the same class), the magnitudes of the accelerometer signals differ, but the pattern of changes is similar. Based on this, in the proposed method, (1) is used to normalize the pre-processed signals:
$$\bar{s}(t) = \frac{s(t) - s_{\min}}{s_{\max} - s_{\min}} \qquad (1)$$

where $s(t)$ represents the input signal, and $s_{\min}$ and $s_{\max}$ represent the minimum and maximum values of the signal, respectively.
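The min–max normalization of (1) can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def min_max_normalize(signal):
    """Eq. (1): rescale the signal to [0, 1] so that only the pattern of
    change remains and absolute magnitudes are discarded."""
    s_min, s_max = signal.min(), signal.max()
    return (signal - s_min) / (s_max - s_min)
```

Note that two signals with the same shape but different amplitudes map to the same normalized pattern, which is exactly the property the method relies on.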
3.2. STFT-based feature description
After preprocessing and normalizing the input signals, the STFT model is used to describe the characteristics of each signal and form its spectrogram matrix. The STFT model is a suitable solution for representing signal changes in the time–frequency domain. This model provides suitable input for deep learning techniques such as CNN. It is also well suited for describing interpretable features that show signal differences at closely spaced points. The spectrogram obtained from this model is a time–frequency matrix, obtained from the squared amplitude of the STFT of the normalized signal. To form this matrix, first, the input signal is divided into a set of segments of the same length using a window function. Then the Fourier transform is calculated for each segment. The STFT of a signal can be described as follows [24]:
$$\mathrm{STFT}\{\bar{s}\}(\tau, f) = \int_{-\infty}^{+\infty} \bar{s}(t)\, w(t - \tau)\, e^{-j 2 \pi f t}\, dt \qquad (2)$$

where $\bar{s}(t)$ represents the normalized signal. Also, $w(t - \tau)$ specifies the window function centered at $\tau$. After calculating the STFT of the signal based on (2), the spectrogram of the signal can be generated as its squared magnitude [24]:
$$S(\tau, f) = \left|\mathrm{STFT}\{\bar{s}\}(\tau, f)\right|^{2} \qquad (3)$$
The result of (3) is a spectrogram matrix whose horizontal and vertical dimensions describe time and frequency, respectively. After calculating the spectrogram matrices for all input signals (in the $x$, $y$, and $z$ axes), these matrices are combined to form the input of the CNN model in the next step. For this purpose, first, the accelerometer and gyroscope matrices obtained from the different sensors are concatenated. This process is done for all three axes $x$, $y$, and $z$ to organize the movement features as a three-dimensional matrix. The spectrogram matrix integration procedure in this step is drawn in Fig. 2.
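The spectrogram computation of (2)–(3) can be sketched with `scipy.signal.stft`. The window type, segment length, and overlap are assumptions here, as the paper does not state them:

```python
import numpy as np
from scipy.signal import stft

FS = 25.0  # DSADS sampling rate (Hz)

def spectrogram_matrix(signal, fs=FS, nperseg=32, noverlap=16):
    """Eqs. (2)-(3): windowed Fourier transform, then squared magnitude.
    Rows correspond to frequency bins, columns to time frames."""
    f, t, Z = stft(signal, fs=fs, window="hann",
                   nperseg=nperseg, noverlap=noverlap)
    return np.abs(Z) ** 2  # spectrogram = squared STFT magnitude
```

For a pure sinusoid, the energy of the resulting matrix concentrates in the frequency bin closest to the tone, which is the property the CNN later exploits to separate movement rhythms.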
Fig. 2.
Spectrogram matrices integration procedure for describing motion features in the proposed method.
In the upper part of Fig. 2, the data gathered via the accelerometer and gyroscope sensors for the right hand and foot are displayed in the form of spectrogram matrices. In the first line, the spectrogram matrices obtained from the accelerometer sensor on the person's right hand are drawn for the three axes $x$, $y$, and $z$. The second line shows the spectrogram matrices obtained from the gyroscope sensor on the right hand. The third and fourth lines show the results of the feature description for the accelerometer and gyroscope sensors on the person's right leg, respectively. In all these matrices, the horizontal axis indicates time and the vertical axis indicates frequency. The length of all the signals is equal, and as a result, the spectrogram matrices obtained for all the signals have the same dimensions. To integrate the spectrogram matrices, the matrices related to each axis for all sensors are concatenated along the frequency axis. For example, the four accelerometer and gyroscope matrices corresponding to the x-axis (hand accelerometer, hand gyroscope, foot accelerometer, and foot gyroscope) are stacked in this way. The arrangement pattern of these matrices after integration is shown in the lower part of Fig. 2. The matrix resulting from spectrogram integration has dimensions $T \times F \times 3$, where $T$ represents time and $F$ represents the stacked frequency dimension. Also, the first to third layers of this matrix describe the movement data in the $x$, $y$, and $z$ axes, respectively. The obtained matrix is used as the input of the CNN model.
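The integration step can be sketched as follows; the stream names and the dictionary layout of the input are illustrative assumptions, not the authors' data structures:

```python
import numpy as np

def build_cnn_input(spec):
    """Stack per-sensor spectrograms along the frequency axis, then stack
    the three axes as depth layers.

    `spec[axis][stream]` holds one F x T spectrogram for each of the four
    sensor streams (hand/foot accelerometer and gyroscope).
    """
    layers = []
    for axis in ("x", "y", "z"):
        # concatenate the four matrices of this axis along frequency
        stacked = np.concatenate(
            [spec[axis][s] for s in
             ("acc_hand", "gyro_hand", "acc_foot", "gyro_foot")], axis=0)
        layers.append(stacked.T)  # transpose to time x frequency
    return np.stack(layers, axis=-1)  # shape: T x 4F x 3
```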
3.3. Feature extraction based on CNN
After integrating the spectrogram matrices, the resulting structure is applied as input to the CNN model. This CNN model is responsible for feature extraction and describing motion features in vector form. The features extracted from motion signals by the suggested method are, in effect, compact STFT representations of these signals, extracted by the proposed CNN model. In other words, the CNN model utilized in the proposed method serves to describe the STFT characteristics of the signal in a compact form. The proposed CNN model for extracting motion features is shown in Fig. 3. Each input sample is the three-layer matrix described in the previous section. Each input sample is processed using two convolution components to identify the patterns in the data using its convolution filters. Each convolution component includes a convolution layer, a ReLU activation function, and a MaxPool layer. The sampling dimensions for the MaxPool layers in both convolution components are set equal to 2 × 2. The convolution layer in the first component has 32 filters with dimensions of 7 × 7, while the convolution layer in the second component includes 64 filters with dimensions of 5 × 5. In this model, the first convolution component is considered for extracting the main data patterns, and the second convolution component then describes deeper data patterns. In both convolution and pooling layers, the stride is set to 1. After the second convolution component, two fully connected layers are used, which have dimensions of 512 and 256, respectively. The task of these layers is to represent the features extracted by the CNN model in a vector format. Based on the experimental results, the use of two fully connected layers (instead of one layer) to gradually reduce the features prevents removing useful features and can lead to higher accuracy in the detection of SA. It should be noted that the configuration of the hyperparameters of the proposed CNN model is done using the BayesOpt tool.
Each input sample is processed based on the proposed CNN model and the weight values obtained from the last fully connected layer of this network are considered as the extracted features for that sample.
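A PyTorch sketch of the described architecture follows. The padding choices and the adaptive pooling stage (used here to fix the fully connected input size regardless of the spectrogram dimensions) are assumptions, not details taken from the paper:

```python
import torch
import torch.nn as nn

class MotionFeatureCNN(nn.Module):
    """Two convolution components followed by two fully connected layers;
    the 256-d output of the last FC layer is the motion feature vector."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=7, stride=1, padding=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=1),  # stride 1 as in the text
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=1),
        )
        self.pool = nn.AdaptiveAvgPool2d((4, 4))  # assumed, fixes FC input
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 512),
            nn.ReLU(),
            nn.Linear(512, 256),  # 256-d feature vector
        )

    def forward(self, x):
        return self.fc(self.pool(self.features(x)))
```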
Fig. 3.
Structure of the proposed CNN model for extracting motion features.
3.4. Classification based on ECOC
In the final step of the proposed method, the features extracted by the CNN are classified by the ECOC model to identify the type of activity. ECOC is a classification method based on a combination of binary classifiers that attempts to reduce problem complexity and improve accuracy by dividing a multiclass classification problem into a series of two-class problems. In this method, a binary coding matrix is used to analyze the classes. The ECOC coding matrix has rows corresponding to the target classes, and the number of matrix columns is equal to the number of binary classifiers. Each target class is described as a unique binary code string, and the binary classifiers are trained based on the codes defined in each column of the code matrix [25]. There are various methods for forming the code matrix. In the proposed method, the ordinal strategy is used to form the coding matrix. In this method, the number of classifiers (code matrix columns) is equal to C − 1, where C represents the number of target classes. In this coding method, for the first classifier, the first class is coded as negative (bit 0) and the other classes are coded as positive (bit 1). For the second classifier, the first two classes are coded as negative and the other classes as positive, and the remaining classifiers are coded in the same way [26]. The classification model used in the proposed method is an SVM with a linear kernel function. With these explanations, to classify the extracted features, first, a code matrix is created based on the ordinal method. Each row of the matrix is assigned to one of the target classes, and each SVM model is trained based on the codes in one of the columns of the code matrix. In the test phase, to predict the type of SA in new samples, the features of the sample are first processed by each of the trained SVM models to create the binary output of each model. Then, by merging the bit codes of the classifiers, a code string is created.
This code string is compared with the rows of the code matrix. Finally, the sample is assigned to the class whose code string has the smallest distance from the binary outputs of the SVM models. In conventional ECOC models, the Hamming distance criterion is used to calculate the difference between two code strings. In this case, all classifiers have the same influence. But since the accuracy of one classifier can differ from the others, in the proposed method a weight value is assigned to each classifier and the distance is calculated based on these weight values. In this case, the classifiers with higher performance have a larger weight value. Therefore, in the proposed method, the distance between the output code string of the classifiers and the codes of the target classes is calculated based on the combination of these weight values and the Hamming function. This criterion is called the Weighted Hamming Distance (WHD) and is defined as follows:
$$WHD(O, c) = \sum_{i=1}^{L} w_i \left( O_i \oplus M_{c,i} \right) \qquad (4)$$

where $WHD(O, c)$ specifies the weighted Hamming distance between the code string $O$ and class $c$ in the code matrix. Also, $L$ indicates the length of the code string, or the number of classifiers of the ECOC model. $O_i$ specifies the output bit generated by the i-th SVM model, and $M_{c,i}$ determines the value of the i-th bit in the class $c$ code string (in other words, the element in the c-th row and i-th column of the coding matrix). Finally, $\oplus$ represents the bitwise XOR operator, and the weight value assigned to the i-th SVM model is denoted as $w_i$. In the proposed method, the weight value assigned to each SVM classifier is a natural number. Also, to determine the weight values, an exhaustive search strategy is used: all weight combinations are checked, and the combination of weight values that results in the least training error is assigned to the classifiers.
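The ordinal coding matrix and the WHD decoding of (4) can be sketched as follows. This is a minimal illustration of the scheme described above, not the authors' implementation:

```python
import numpy as np

def ordinal_code_matrix(n_classes):
    """Ordinal ECOC coding: C-1 binary columns; column j codes the first
    j+1 classes as 0 (negative) and the remaining classes as 1 (positive)."""
    M = np.zeros((n_classes, n_classes - 1), dtype=int)
    for j in range(n_classes - 1):
        M[j + 1:, j] = 1
    return M

def whd_decode(output_bits, code_matrix, weights):
    """Eq. (4): weighted Hamming distance between the classifiers' output
    code string and each class row; returns the closest class index."""
    xor = np.bitwise_xor(code_matrix, output_bits)  # broadcasts over rows
    distances = xor @ weights                       # sum_i w_i * (O_i XOR M_ci)
    return int(np.argmin(distances))
```

With unit weights this reduces to plain Hamming decoding; unequal weights let more reliable classifiers dominate the decision.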
4. Implementation and results
The suggested method was implemented in the MATLAB 2018a software environment using the Daily and Sports Activities Dataset (DSADS) [27]. The effectiveness of the suggested SA recognition method was evaluated using the criteria of accuracy, precision, recall, and F-measure. To confirm the validity of the suggested method, the results were compared with previous methods. In the remainder of this section, we describe the database characteristics and evaluation criteria, and then review our findings.
4.1. Database
The DSADS database [27] contains various motion sensor data collected during sports and daily activities. This database contains 19 types of activities, 10 of which are sports. The database samples were collected from 8 individuals, including 4 males and 4 females, aged between 20 and 30 years. The number of signal samples in each class is 480. Thus, the entire database contains 9120 samples, each of which describes the signals gathered in a 5-s interval through 5625 features. The data sampling rate in this database is 25 Hz. In this database, each sample includes data from sensors attached to the torso, both hands, and both legs. The movement of each limb is gathered through a set of accelerometer, gyroscope, and magnetometer sensors along the three x, y, and z axes. In the proposed method, only the data of the accelerometer and gyroscope sensors attached to the right leg were used to recognize the type of activity. Also, all target classes are used. In Fig. 4, the signals extracted from the accelerometer and gyroscope sensors are given for some examples of SA in the database.
Fig. 4.
Signals extracted from accelerometer and gyroscope sensors for some samples of SA in the database.
Based on the sample activities shown in Fig. 4, the extracted signals for some sports activities such as running, cycling or jumping have an almost regular rhythm, and it seems that the type of sports activity can be predicted with high accuracy by examining the characteristics of the signal rhythm. Meanwhile, the signals extracted for some other activities such as basketball (due to the player's freedom of movement) and rowing (due to the high influence of water movement) have an irregular rhythm, which makes the analysis of these signals challenging.
4.2. Evaluation criteria
A 10-fold cross-validation technique was used to evaluate the effectiveness of the proposed method in recognizing sports (and everyday) activities. In this method, 90% of the samples are used to train the learning model, and the remaining samples are used for testing. All DSADS samples are tested by repeating this process 10 times and using a new test fold in each iteration. After each iteration, the quality of the classification was measured based on the criteria of accuracy, precision, recall, and F-measure. The accuracy criterion shows the ratio of correctly classified samples to the total test samples. On the other hand, precision and recall criteria are often used for two-class problems. Considering that the number of target classes in this research is 19, these criteria are calculated separately for each target class. In this case, each target class is in turn assumed to be the positive class and the other classes the negative class, and then the precision and recall criteria are calculated. The precision metric describes the classification quality of each class separately and indicates the percentage of correct positive results of the classification algorithm. The recall criterion, on the other hand, indicates what proportion of positive class samples were correctly classified. These two criteria can be formulated as follows:
Precision = TP / (TP + FP)  (5)
Recall = TP / (TP + FN)  (6)
Here, TP represents the number of positive samples correctly detected, FP represents the number of negative samples marked as positive, and FN indicates the number of positive samples incorrectly classified into other classes. Using the above two criteria, the F-Measure can be formulated as follows:
F-Measure = 2 × (Precision × Recall) / (Precision + Recall)  (7)
In the following, the results of the evaluation of the proposed method based on these criteria are discussed.
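The per-class (one-vs-rest) computation of Eqs. (5)–(7) can be sketched as follows. This is an illustrative helper, not the authors' code; it assumes a confusion matrix laid out as in Fig. 7, with columns holding the actual labels and rows holding the predicted labels.

```python
import numpy as np

def per_class_metrics(conf):
    """Per-class precision, recall, and F-measure from a confusion matrix.

    conf[i, j] = number of samples with true class j that were labeled class i
    (columns are actual labels, rows are predictions, as in Fig. 7).
    Each class is treated in turn as the positive class (one-vs-rest).
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                 # correctly labeled positives
    fp = conf.sum(axis=1) - tp         # other classes labeled as this class
    fn = conf.sum(axis=0) - tp         # this class labeled as other classes
    precision = tp / (tp + fp)         # Eq. (5)
    recall = tp / (tp + fn)            # Eq. (6)
    f_measure = 2 * precision * recall / (precision + recall)  # Eq. (7)
    return precision, recall, f_measure

# toy 3-class confusion matrix (hypothetical counts, not DSADS results)
conf = np.array([[8, 1, 0],
                 [2, 9, 0],
                 [0, 0, 10]])
p, r, f = per_class_metrics(conf)
```

Averaging the three returned vectors over all 19 classes yields the aggregate scores reported in Table 2.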
4.3. Results and discussion
According to the process described in the previous section, the proposed method was implemented with the DSADS data in the MATLAB software environment. All tests were performed on a personal computer with an Intel Core i7 processor clocked at 3.2 GHz and 32 GB of RAM. The processing related to the CNN model was done using the CUDA capability of an Nvidia GeForce GTX 1080 graphics card.
All samples in the DSADS database were tested using a 10-fold cross-validation strategy. Some examples of spectrograms extracted from DSADS instances are shown in Fig. 5. In all these spectrograms, the horizontal axis shows time in seconds and the vertical axis represents frequency in Hertz. Due to the large number of target classes, only examples from the four classes of walking, rowing, jumping, and basketball are given. In this figure, each pair of rows represents the spectrograms extracted for samples of one of the mentioned classes: the first row shows the spectrograms extracted from the accelerometer sensor data, and the second row shows the spectrograms of the gyroscope sensor. All the samples belong to the sensor worn on the right foot; during the tests, the data from the other sensors were omitted.
Fig. 5.
Some examples of spectrograms extracted from database samples.
In Fig. 5, the first column shows the spectrograms extracted from the x-axis data of the sensor, and the second and third columns depict these results for the y- and z-axis data, respectively. As shown in Fig. 5, the spectrograms obtained for different classes exhibit obvious differences in their data patterns. It can therefore be said that the resulting spectrograms provide useful information about a person's movement pattern, and that using these features can be effective in accurately recognizing the type of activity.
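The spectrogram extraction applied to each sensor axis can be sketched as below. This is a minimal illustration, not the paper's exact pipeline: the 25 Hz sampling rate (the DSADS rate), the synthetic test signal, and the STFT window/overlap parameters are all assumptions for demonstration.

```python
import numpy as np
from scipy.signal import stft

fs = 25  # assumed DSADS sampling rate in Hz
rng = np.random.default_rng(0)
t = np.arange(0, 5, 1 / fs)
# synthetic accelerometer x-axis signal standing in for a 5 s DSADS segment:
# a 2 Hz rhythmic component (e.g. stepping) plus noise
signal = np.sin(2 * np.pi * 2.0 * t) + 0.3 * rng.standard_normal(t.size)

# short-time Fourier transform; nperseg/noverlap are illustrative choices,
# not the exact parameters used in the article
freqs, seg_times, Z = stft(signal, fs=fs, nperseg=32, noverlap=24)
spectrogram = np.abs(Z)  # magnitude matrix of the kind fed to the CNN
```

One such magnitude matrix per sensor axis (three accelerometer and three gyroscope axes) yields the six spectrogram columns shown in Fig. 5 for each sample.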
To verify the effectiveness of the proposed weighted ECOC model in improving the accuracy of activity recognition, we compared its performance with the baseline ECOC model during the experiments. The case of classifying features directly with the CNN model was also considered: here, the features extracted by the CNN are classified using a softmax-based classifier without the weighted ECOC model. In addition to the two cases mentioned above, the performance of the proposed method was compared with the method of Hsu et al. [8] and the BiGRU model [9]. Note that all compared methods used the same data to evaluate performance. Fig. 6 compares the accuracy of the proposed method with the other methods in recognizing sports and daily activities in the DSADS database. Fig. 6a shows the accuracy of each SAR method during each cross-validation fold. This graph shows that the proposed method maintains its high performance across the different folds of cross-validation and performs SAR with a lower error level. In addition to higher accuracy in the different folds, the range of error variation in the proposed model is smaller than that of the compared methods, which confirms its stable performance. Fig. 6b reports the average accuracy of each method over all iterations. It should be noted that the results presented in this figure and the other diagrams in this section are obtained by aggregating the outcomes of the 10 folds of the cross-validation experiments.
Fig. 6.
The accuracy of the proposed method and other methods in recognizing sports and daily activities in the DSADS database (a) accuracy in each fold (b) average accuracy (c) accuracy of CNN during training epochs.
Based on Fig. 6b, the proposed method recognizes sports and daily activities in the sample database with an accuracy of 99.71%, outperforming the other methods. If the proposed weighted ECOC model is replaced by the baseline ECOC model, the detection accuracy drops to 98.13%. Likewise, if the ECOC model is removed and the features are classified by the CNN model alone, activities can be recognized with 97.92% accuracy. These comparisons show that:
- First, although the CNN model is reasonably effective at extracting signal features, using the ECOC model is useful for improving classification accuracy.
- Secondly, the proposed weighted ECOC model increases recognition accuracy further still, which demonstrates the efficiency of the proposed classification model.
These results show that the proposed weighted ECOC model increases recognition accuracy by at least 1.58%. This gain can be attributed to the weighting strategy, which weights the binary classifiers according to their performance. Finally, Fig. 6c illustrates the accuracy of the proposed CNN model during its training epochs. Note that this figure refers to the case where a softmax layer is appended to the last layer of the proposed CNN feature extractor. Fig. 6c clearly shows the CNN model's gradual improvement in accuracy as it is trained, providing evidence of the model's learning ability and convergence. Examining the values reported in Fig. 6c shows that the highest accuracy of the CNN model after training is 97.94%, which is lower than the accuracy of the proposed method (the combination of CNN and WECOC). These results show that although the CNN model is a powerful tool for extracting data features, classifiers more efficient than softmax can be used for more accurate recognition, and the proposed method meets this goal through the WECOC model.
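The weighted decoding step that distinguishes WECOC from the baseline ECOC model can be sketched as follows. This is an illustrative interpretation, not the authors' implementation: the function name, the toy code matrix, and the weight values are hypothetical, with the weights standing in for per-SVM performance scores as described in the weighting strategy.

```python
import numpy as np

def wecoc_predict(binary_outputs, code_matrix, weights):
    """Weighted ECOC decoding (illustrative sketch).

    binary_outputs : (n_classifiers,) array of {-1, +1} SVM decisions
    code_matrix    : (n_classes, n_classifiers) ECOC code matrix over {-1, +1}
    weights        : per-classifier weights, e.g. derived from each SVM's
                     validation performance
    The class whose codeword has the smallest weighted Hamming distance
    to the binary outputs is returned.
    """
    disagree = (code_matrix != binary_outputs).astype(float)  # broadcast rows
    distances = disagree @ weights  # weighted disagreement per class
    return int(np.argmin(distances))

# toy example: 4 classes encoded with 3 binary dichotomies
codes = np.array([[+1, +1, +1],
                  [+1, -1, -1],
                  [-1, +1, -1],
                  [-1, -1, +1]])
weights = np.array([0.9, 0.8, 0.95])   # hypothetical per-SVM weights
predicted = wecoc_predict(np.array([+1, -1, -1]), codes, weights)
```

With uniform weights this reduces to standard Hamming decoding; the performance-based weights let reliable binary classifiers dominate the vote, which is the intuition behind the 1.58% accuracy gain reported above.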
The closest method to the proposed method in terms of accuracy is the BiGRU model, which achieves an accuracy of 99.04%. The confusion matrix provides more detail on how each classification method recognizes the various activities. Fig. 7 shows the confusion matrices of the proposed method and the BiGRU method in classifying the database samples. In these matrices, each column represents the actual labels of the test samples, while the rows show the labels assigned by each classification method. As an example, in Fig. 7a, out of 480 samples belonging to the Sitting class (the sum of the values in the first column of the matrix), 479 samples were correctly classified by the proposed method and only one sample was incorrectly assigned to the Elevator2 class, which corresponds to standing in the elevator. The BiGRU method, on the other hand, correctly recognized 477 samples from this class and incorrectly classified three samples into the "lying on the back", "walking on the treadmill", and "running" classes. In general, the comparison of Fig. 7a and 7b shows that the proposed method is superior to the BiGRU method in classifying the samples of all classes, increasing the accuracy by 0.66 percentage points. In other words, the proposed method makes 60 fewer errors on the DSADS database samples than the BiGRU method, an error reduction that is significant in real applications. The confusion matrices produced by the basic ECOC model and by the method of Hsu et al. [8] are shown in Fig. 8. Comparing Fig. 7a with Fig. 8a and 8b leads to a similar conclusion: the proposed method is superior to the compared methods in correctly classifying samples of all classes.
Fig. 7.
Confusion matrix (a) proposed method, and (b) BiGRU method in classifying database samples.
Fig. 8.
Confusion matrix (a) basic ECOC model and (b) Hsu et al.'s method in classifying database samples.
Fig. 9 compares the efficiency of different methods in recognizing sports and daily activities based on precision, recall, and F-Measure criteria for the DSADS database.
Fig. 9.
Comparing the efficiency of different methods based on the criteria, (a) precision, (b) recall, and (c) F-Measure.
In each of the diagrams depicted in Fig. 9, the first dimension shows the activity classes and the second dimension corresponds to the compared methods. These graphs show that the proposed method, using the weighted ECOC model, classifies the different classes more efficiently than the other methods. Fig. 10 reports the average precision, recall, and F-measure values, showing the overall performance of the different methods on these qualitative classification criteria. The numerical results of the tests conducted in this section are also given in Table 2.
Fig. 10.
Average precision, recall, and F-Measure measures.
Table 2.
Efficiency comparison of the proposed method with other methods.
| Method | Accuracy | F-measure | Recall | Precision | Test time (sec.) | Training time (min.) | Std. (accuracy) |
|---|---|---|---|---|---|---|---|
| Proposed (CNN + WECOC) | 99.7149 | 99.7150 | 99.7149 | 99.7155 | 0.16 | 73.60 | 0.173 |
| CNN + ECOC | 98.1250 | 98.1249 | 98.1250 | 98.1289 | 0.157 | 69.15 | 0.741 |
| CNN + SoftMax | 97.9167 | 97.9167 | 97.9167 | 97.9201 | 0.121 | 72.48 | 0.490 |
| Hsu et al [8] | 98.0811 | 98.0803 | 98.0811 | 98.0826 | 0.192 | 76.91 | 0.439 |
| BiGRU [9] | 99.0461 | 99.0463 | 99.0461 | 99.0487 | 0.207 | 83 | 0.274 |
Comparing the accuracy, precision, recall, and F-measure criteria in Table 2 and Fig. 10 confirms that the proposed method recognizes sports and daily activities with higher quality than the compared methods, improving all three of precision, recall, and F-measure. The higher precision of the proposed method confirms that the labels it assigns to each class are correct with higher probability than those of the other methods, while the higher recall indicates that the proposed method correctly identifies a larger proportion of the samples belonging to the various activities. Table 2 also shows that the proposed model requires less training time than the models presented in Refs. [8,9]. This means that the combined use of CNN and WECOC not only improves SAR accuracy but also allows the task to be performed in a shorter time. Fig. 11 shows the ROC curves obtained from the classification of sports and daily activities in the tested database. The corners of the curves are shown enlarged to make the behavior of each method clearer. The graph shows that the proposed method has a higher true positive rate (TPR), a lower false positive rate (FPR), and a larger area under the ROC curve than the compared methods. It can be concluded that the method proposed in this article achieves high average accuracy in correctly classifying activities from movement data.
Fig. 11.
ROC curve resulting from the classification of DSADS database samples.
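The TPR/FPR pairs underlying such one-vs-rest ROC curves can be computed as sketched below. This is a generic illustration of the metric, not code from the article; the function name, scores, and labels are hypothetical.

```python
import numpy as np

def roc_points(scores, labels):
    """TPR/FPR pairs for a one-vs-rest ROC curve (illustrative helper).

    scores : classifier scores for the positive (target activity) class
    labels : 1 for samples of the positive class, 0 otherwise
    Sweeping the decision threshold over the sorted scores yields one
    (FPR, TPR) point per threshold.
    """
    order = np.argsort(-np.asarray(scores))   # descending score
    labels = np.asarray(labels)[order]
    tps = np.cumsum(labels)                   # true positives per threshold
    fps = np.cumsum(1 - labels)               # false positives per threshold
    tpr = tps / labels.sum()
    fpr = fps / (labels.size - labels.sum())
    return fpr, tpr

# toy scores and ground-truth labels (hypothetical values)
scores = np.array([0.9, 0.8, 0.7, 0.4])
labels = np.array([1, 1, 0, 1])
fpr, tpr = roc_points(scores, labels)
```

The area under the resulting curve (e.g. via the trapezoidal rule over the FPR/TPR points) gives the AUC values compared in Fig. 11.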
5. Conclusion
Recognizing the type of SA is one of the interesting applications of AI techniques in the real world and has been discussed in various research so far. The purpose of all these methods is to accurately distinguish different types of SA or daily activities by processing an individual's movement information. This article presented a new and efficient method for solving this problem. The proposed method recognizes SA with deep learning and ML techniques. For this purpose, a CNN model was presented to efficiently extract signal features, allowing the information of the input signals to be described in a compact format. A weighted SVM combination in the form of an ECOC model is then used to classify the movement features: a weight value is assigned to each SVM classifier in the ECOC model, and the outputs of the SVM classifiers are combined based on these values. The effectiveness of the suggested method for detecting SA was evaluated using the DSADS database and the findings were compared with previously presented methods. Comparing the performance of the suggested weighted ECOC model with the traditional ECOC model shows that the suggested strategy decreases the detection error by 1.58%. Furthermore, in addition to SA, the suggested method can detect a wide range of daily-life activities and distinguish among these activities with 99.71% accuracy. The performance of the proposed method in terms of the precision and recall criteria is 99.72 and 99.71, respectively, which is better than the compared methods. Moreover, the suggested method used only the information from the gyroscope and acceleration sensors installed on the person's foot. Thus, the model proposed in this research performs the recognition process with the minimum necessary sensors, a feature that can reduce the cost of implementing the proposed method in real conditions.
One limitation of the proposed method is the small range of weight values that can be assigned to the binary classifiers in the ECOC model. Weight values could be selected from a wider range so that a greater performance improvement might be achieved in the weighted ECOC model. It should be noted, however, that doing so makes the search space grow explosively, so the optimal weight values cannot be determined by exhaustive search. In this situation, optimization techniques can be used to determine the weight values of the binary classifiers, which we will address in future research.
Data availability
All data generated or analysed during this study are included in this published article.
CRediT authorship contribution statement
Lu Lyu: Investigation. Yong Huang: Investigation.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- 1. Demrozi F., Pravadelli G., Bihorac A., Rashidi P. Human activity recognition using inertial, physiological and environmental sensors: a comprehensive survey. IEEE Access. 2020;8:210816–210836. doi: 10.1109/access.2020.3037715.
- 2. Yadav S.K., Tiwari K., Pandey H.M., Akbar S.A. A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions. Knowl. Base Syst. 2021;223.
- 3. Pareek P., Thakkar A. A survey on video-based human action recognition: recent updates, datasets, challenges, and applications. Artif. Intell. Rev. 2021;54:2259–2322.
- 4. Beddiar D.R., Nini B., Sabokrou M., Hadid A. Vision-based human activity recognition: a survey. Multimed. Tool. Appl. 2020;79:30509–30555.
- 5. Dang L.M., Min K., Wang H., Piran M.J., Lee C.H., Moon H. Sensor-based and vision-based human activity recognition: a comprehensive survey. Pattern Recogn. 2020;108.
- 6. Fu B., Damer N., Kirchbuchner F., Kuijper A. Sensing technology for human activity recognition: a comprehensive survey. IEEE Access. 2020;8:83791–83820.
- 7. Slim S.O., Atia A., Elfattah M.M., Mostafa M.S.M. Survey on human activity recognition based on acceleration data. Int. J. Adv. Comput. Sci. Appl. 2019;10(3).
- 8. Hsu Y.L., Chang H.C., Chiu Y.J. Wearable sport activity classification based on deep convolutional neural network. IEEE Access. 2019;7:170199–170212.
- 9. Mekruksavanich S., Jitpattanakul A. Multimodal wearable sensing for sport-related activity recognition using deep learning networks. J. Adv. Inf. Technol. 2022.
- 10. Mekruksavanich S., Jitpattanakul A. Sport-related activity recognition from wearable sensors using Bidirectional GRU network. Intelligent Automation & Soft Computing. 2022;34(3).
- 11. Pajak G., Krutz P., Patalas-Maliszewska J., Rehm M., Pajak I., Dix M. An approach to sport activities recognition based on an inertial sensor and deep learning. Sensor Actuator Phys. 2022;345.
- 12. Dirgová Luptáková I., Kubovčík M., Pospíchal J. Wearable sensor-based human activity recognition with transformer model. Sensors. 2022;22(5):1911. doi: 10.3390/s22051911.
- 13. Webber J., Mehbodniya A., Arafa A., Alwakeel A. Improved human activity recognition using majority combining of reduced-complexity sensor branch classifiers. Electronics. 2022;11(3):392.
- 14. Link J., Perst T., Stoeve M., Eskofier B.M. Wearable sensors for activity recognition in ultimate frisbee using convolutional neural networks and transfer learning. Sensors. 2022;22(7):2560. doi: 10.3390/s22072560.
- 15. Cuperman R., Jansen K.M., Ciszewski M.G. An end-to-end deep learning pipeline for football activity recognition based on wearable acceleration sensors. Sensors. 2022;22(4):1347. doi: 10.3390/s22041347.
- 16. Khatun M.A., Yousuf M.A., Ahmed S., Uddin M.Z., Alyami S.A., Al-Ashhab S., … Moni M.A. Deep CNN-LSTM with self-attention model for human activity recognition using wearable sensor. IEEE Journal of Translational Engineering in Health and Medicine. 2022;10:1–16. doi: 10.1109/JTEHM.2022.3177710.
- 17. Azmat U. Human activity recognition via smartphone embedded sensor using multi-class SVM. In: 2022 24th International Multitopic Conference (INMIC). IEEE; 2022. pp. 1–7.
- 18. Mim T.R., Amatullah M., Afreen S., Yousuf M.A., Uddin S., Alyami S.A., … Moni M.A. GRU-INC: an inception-attention based approach using GRU for human activity recognition. Expert Syst. Appl. 2023;216.
- 19. Garcia K.D., de Sá C.R., Poel M., Carvalho T., Mendes-Moreira J., Cardoso J.M., … Kok J.N. An ensemble of autonomous auto-encoders for human activity recognition. Neurocomputing. 2021;439:271–280.
- 20. Khatun M.A., Yousuf M.A., Moni M.A. Deep CNN-GRU based human activity recognition with automatic feature extraction using smartphone and wearable sensors. In: 2023 International Conference on Electrical, Computer and Communication Engineering (ECCE). IEEE; 2023. pp. 1–6.
- 21. Khan M.A., Javed K., Khan S.A., Saba T., Habib U., Khan J.A., Abbasi A.A. Human action recognition using fusion of multiview and deep features: an application to video surveillance. Multimed. Tool. Appl. 2020:1–27.
- 22. Karantonis D.M., Narayanan M.R., Mathie M., Lovell N.H., Celler B.G. Implementation of a real-time human movement classifier using a triaxial accelerometer for ambulatory monitoring. IEEE Trans. Inf. Technol. Biomed. 2006;10(1):156–167. doi: 10.1109/titb.2005.856864.
- 23. Hsu Y.L., Yang S.C., Chang H.C., Lai H.C. Human daily and sport activity recognition using a wearable inertial sensor network. IEEE Access. 2018;6:31715–31728.
- 24. Le Roux J., Kameoka H., Ono N., Sagayama S. Fast signal reconstruction from magnitude STFT spectrogram based on spectrogram consistency. In: Proc. DAFx. 2010. pp. 397–403.
- 25. Dietterich T.G., Bakiri G. Error-correcting output codes: a general method for improving multiclass inductive learning programs. In: The Mathematics of Generalization. CRC Press; 2018. pp. 395–407.
- 26. Joutsijoki H., Haponen M., Rasku J., Aalto-Setälä K., Juhola M. Error-correcting output codes in classification of human induced pluripotent stem cell colony images. BioMed Res. Int. 2016. doi: 10.1155/2016/3025057.
- 27. Barshan B. DSADS: daily and sports activities data set. UCI Repository of machine learning databases. 2010. https://archive.ics.uci.edu/ml/datasets/Daily+and+Sports+Activities (Accessed: Feb 2023).