Abstract
Hand gesture recognition (HGR) systems based on electromyography (EMG) bracelet-type sensors are currently widely used in preference to other HGR technologies. However, bracelets are susceptible to electrode rotation, which decreases HGR performance. In this work, HGR systems with an algorithm for orientation correction are proposed. The proposed orientation correction method is based on computing the maximum energy channel using a synchronization gesture. The EMG channels are then rearranged into a new sequence that starts with the maximum energy channel, and this new sequence is used for both training and testing. After the EMG channels are rearranged, the signal passes through the following stages: pre-processing, feature extraction, classification, and post-processing. We implemented user-specific and user-general HGR models based on a common architecture that is robust to rotations of the EMG bracelet. Four experiments were performed, taking into account two different metrics, classification and recognition accuracy, for both models implemented in this work, where each model was evaluated with and without rotation of the bracelet. Classification accuracy measures how well a model predicts which gesture is contained somewhere in a given EMG, whereas recognition accuracy measures how well a model predicts when the gesture occurred, how long it lasted, and which gesture is contained in a given EMG. The results of the experiments executed without and with orientation correction show an increase in performance from 44.5% to 81.2% for classification and from 43.3% to 81.3% for recognition in user-general models, while in user-specific models, the results show an increase in performance from 39.8% to 94.9% for classification and from 38.8% to 94.2% for recognition. The results obtained in this work show that the proposed orientation correction method makes the performance of an HGR system robust to rotations of the EMG bracelet.
Keywords: hand gesture recognition, orientation correction, electrodes displacement, Myo armband
1. Introduction
Hand gesture recognition (HGR) systems are human–machine interfaces that are responsible for determining which gesture was performed and when it was performed [1]. Hand gestures are a common and effective type of non-verbal communication which can be learned easily through direct observation [2]. In recent years, several applications of HGR have proven useful. For example, these models have been applied in sign language recognition (English, Arabic, Italian) [3,4,5], in prosthesis control [6,7,8,9], in robotics [10,11], in biometric technology [12], and in gesture recognition of activities of daily living [13], among others. In the medical field, hand gesture recognition has also been applied to data visualization [14] and image manipulation during medical procedures [15,16], as well as to biomedical signal processing [17,18]. Although there are many fields of application, HGR models have not reached their full potential, nor have they been widely adopted. This is caused mainly by three factors. First, the performance of HGR systems can still be improved (i.e., recognition accuracy, processing time, and number of gestures). Second, the protocols used for evaluating these models are often not rigorous or are ambiguous, and thus the results are hardly comparable. Third, HGR implementations are commonly cumbersome. This is partly because they are not easy or intuitive to use (i.e., an HGR implementation is expected to be real-time, non-invasive, and wireless), or because they require some training or a strict procedure before usage.
In this work, an HGR model focused on this third issue (procedures before usage, intuitive interface, and training/testing requirements) for HGR based on electromyography (EMG) signals is presented. In the following paragraphs, the problem is fully described.
1.1. Structure of Hand Gesture Recognition Systems
An HGR system is composed of five modules: data acquisition, pre-processing, feature extraction, classification, and post-processing. Data acquisition consists of measuring, via some physical sensors, the signals generated when a person performs a gesture [1]. All sorts of technologies have been used for data acquisition, such as inertial measurement units (IMUs) [19,20], cameras [21], force and flexion sensors (acquired through sensory gloves) [6,22], and sensors of electrical muscle activity (EMG) [23]. EMG signals can be captured via needle electrodes inserted in the muscle (intramuscular EMG, iEMG) or using surface electrodes which are placed over the skin (surface EMG, sEMG). The iEMG is used especially for medical diagnosis and has greater accuracy because needles can be directed at specific muscles [24]. On the other hand, sEMG is considered to be non-invasive. In this work, a non-invasive commercial device (Myo bracelet), which captures EMG signals, was used for data acquisition. EMG signals stand out among all other technologies because of their potential for capturing the intention of movement in amputees [25]. Pre-processing is the second module of an HGR system, which is in charge of organizing and homogenizing all sorts of acquired signals (i.e., sensor fusion) to match the feature extraction module. Common techniques used at this stage include filtering for noise reduction [7], normalization [26], or segmentation [27]. The next module of an HGR system is feature extraction. Its goal is to extract distinctive and non-redundant information from the original signals [28]. Features are intended to share similar patterns between elements of the same class. Feature extraction can be carried out using automatic feature extractors such as convolutional neural networks (CNNs) or autoencoders [29,30,31,32,33,34,35]. Other features can be selected manually with an arbitrary selection of the feature extraction functions. These functions can be extracted from time, frequency, or time–frequency domains [36]. However, most real-time HGR models use time-domain features because the controller delay for their computation is smaller than for the others. We found that the mean absolute value (MAV) was the most used feature for HGR applications. Nevertheless, we observed that other time-related features can also be used, such as root mean square (RMS), waveform length (WL), variance (VAR), fourth-order auto-regressive coefficients (AR-Coeff), standard deviation (SD), energy ratio (ER), slope sign changes (SSC), mean, median, integrated EMG (iEMG), sample entropy (SampEn), mean absolute value ratio (MAVR), modified mean absolute value (MMAV), simple square integral (SSI), log detector (LOG), average amplitude change (AAC), maximum fractal length (MFL), dynamic time warping (DTW), and quantization-based position weight matrix (QuPWM) [1,3,6,8,9,11,12,13,17,18].
The classifier module is composed of a supervised learning algorithm that maps a feature vector to a label. Common classifiers used for HGR applications are k-nearest neighbor [10], tree-based classifiers [12], support vector machines (SVM) [6,11,37,38,39,40], Bayesian methods [41], neural networks (NN) [42,43,44], and recurrent neural networks [45,46,47,48]. Among these methods, it has been observed that SVM and CNN stand out: SVM shows high efficiency with light computational requirements and fast responses, whereas CNN has very high recognition performance but requires hardware with more processing capacity and has longer inference times. The last module is post-processing. Its objectives are to filter spurious predictions to produce a smoother response [49] and to adapt the responses of the classifier to final applications (e.g., a drone or robot).
1.2. Evaluation of Hand Gesture Recognition Systems
The performance of a hand gesture recognition system is analyzed based on three parameters: classification accuracy, recognition accuracy, and processing time. The concepts of classification and recognition are differentiated in this work. Classification identifies the corresponding class of a given sample. The evaluation of classification simply compares the predicted label with the true label of the EMG sample. Results of classification are usually presented in confusion matrices where sensitivity, precision, and accuracy are summarized per gesture. Recognition goes further than classification because it not only involves assigning a sample to a label but also requires determining the instants of time when the gesture was performed. The evaluation of recognition accuracy, hence, compares the vector of predictions of an HGR system with the ground truth corresponding to the given EMG sample. The ground truth is a Boolean vector set over the points with muscle activity; this information is included in every sample of the data set, and it was obtained beforehand by a manual segmentation procedure. There could be several ways of comparing the vector of predictions with the ground truth. In this work, the evaluation protocol previously defined in [50] is followed. This protocol calculates an overlapping factor between both vectors and considers a sample correctly recognized when the overlapping factor is above a threshold of 25%. This comparison is only carried out for a valid vector of predictions. A vector of predictions is valid when there is only one segment of continuous predictions with the same label that is different from the relax position. This can be considered a strict evaluation because any point of the signal labeled differently will cause an incorrect recognition. Moreover, any relax label predicted in the middle of predictions of a different class will also imply an incorrect recognition. This way of evaluating recognition provides a true perspective of the HGR behavior in real applications. As a result, classification accuracy will be higher than recognition accuracy.
A demanding requirement for an HGR system is real-time operation. For human–machine interfaces, a system works in real time when a person uses the system and does not perceive a delay in the response [1]. This implies that real-time operation depends on the application and on user perception. There is much debate in the literature about the maximum time limit for a system to be considered real time (e.g., 300 ms [51]). In this work, the threshold of 100 ms reported by [52] is considered. This time (also known as controller delay) is measured from the moment when the system receives the signal until it returns a response. Additionally, real-time operation is assured based on the time responses obtained over offline simulations. An offline simulation in this context is a simulation with previously obtained data. In contrast, an online evaluation involves recording new data every time the system is evaluated. Additionally, HGR systems evaluated in online scenarios usually suffer from being tested over a small set of users (e.g., [53]). An offline evaluation has the advantage of using already collected data, and it also allows the experiments to be replicated and compared. An offline approach is suitable in our case, where a large amount of data is required to evaluate the models. In our experiments, real-time data acquisition is simulated using a sliding window approach.
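To make the offline simulation concrete, the following is a minimal sketch (in Python, not the authors' MATLAB implementation) of how a pre-recorded EMG signal can be replayed through a sliding window to emulate real-time acquisition; the window length and stride are the values reported later in Section 2, and the data here are random placeholders.

```python
import numpy as np

def sliding_windows(emg, window=200, stride=25):
    """Replay a pre-recorded EMG recording (samples x channels) as a stream of
    overlapping windows, emulating real-time acquisition in an offline simulation."""
    for start in range(0, emg.shape[0] - window + 1, stride):
        yield start, emg[start:start + window, :]

# Example: a 5 s recording at 200 Hz with 8 channels (random placeholder data).
emg = np.random.randn(1000, 8)
for start, w in sliding_windows(emg):
    pass  # each window would be pre-processed, featurized, and classified here
```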
1.3. User-Specific and User-General HGR Systems
HGR systems are divided into two types: user-specific (dependent or individual models) and user-general (independent models). A user-specific system requires collecting samples for training or tuning each time a new user uses the system. On the other hand, user-general models are trained once over a multi-user data set, and these systems do not require additional collection of samples when a new user wants to use the system [54]. Although user-specific models are trained with fewer training samples, they usually obtain higher accuracies because they are trained and calibrated for each person. Meanwhile, user-general models are easier to use and set up. However, these models have considerably lower performance for a significant portion of users in the data set [29]. Developing user-general HGR systems is still an open research challenge because it requires not only large data sets but also robust and adaptable machine learning algorithms.
1.4. The Rotation Problem with Bracelet-Shaped Devices and Related Works
One of the main drawbacks of general HGR systems using bracelet-shaped EMG devices is their dependence on the location of the sensor. This problem is usually downplayed in the literature because HGR models are trained and evaluated assuming the exact location of the bracelet on the forearm of the user. There are also reported examples in the literature of the downside effects of electrode displacement. For instance, Hargrove et al. [55] proposed a classifier training strategy in order to reduce the effect of electrode displacements on classification accuracy. Here, the system must be trained carefully, and samples corresponding to some rotation conditions were included in the training data. Sueaseenak et al. [56] proposed an optimal electrode position for the surface EMG sensor of the Myo bracelet. They found that the position to get the best surface EMG recording is in the middle of the forearm's length. This approach of wearing a bracelet sensor in its optimal position is not practical because it requires placing the bracelet in exactly the same position every time the system is used. In [57], different experiments related to sensor orientation were applied when the testing data were shifted. The experiments demonstrated that shifting the sensor 2 cm causes the SVM's and the kNN's accuracy to drop significantly, to between 50% and 60%. It is noticeable that sensor rotation degrades the performance of HGR systems and sometimes even makes them unusable. Therefore, it is important to have a system that corrects the variation in the orientation of the sensor. In this context, several researchers have tried to solve this problem with different methods. In [58], the bracelet was rotated every 45 degrees and the EMG signals were recorded. Then, a remapping was made according to the predicted angle, and the distribution was marked on the user's arm prior to the signal recording. However, the calculation time was high, and it only worked well in steps of 45 degrees because of the high complexity of the algorithm. In [59], a classification system that uses the Myo bracelet and a correction for the rotation of the bracelet was applied, showing a classification accuracy of 94.7%. However, the classification time was 338 ms, which is not applicable in real-time scenarios. Although most of the previous works found in the literature address the sensor rotation problem, recognition was not evaluated in most of them, and only classification was performed. As a result, it is important to build a system that performs classification and recognition in conjunction with orientation correction.
1.5. Article Overview
The main contribution of this paper is a method for electrode rotation compensation based on identifying the maximum energy channel (MEC) to detect the reference pod and compensate for the variation in the orientation of the bracelet. The maximum energy is calculated using a reference hand gesture; then, the data are rearranged, creating a new sensor order. This method is executed each time a person uses the recognition system, needing a maximum time of 4 s for the calibration process. After the calibration procedure, a person can use the proposed HGR system wearing the bracelet with a different rotation (i.e., any angle on the forearm). The proposed orientation correction algorithm was evaluated over a larger data set and with a stricter evaluation procedure for classification and recognition than previous works [50]. The data set has 612 users and was divided into two groups: 50% (i.e., 306 users) for training and 50% for testing. This work also implemented and compared user-specific and user-general models. One of the advantages of the implemented HGR system is its low computational cost together with high recognition and classification accuracy.
Following this introduction, the remainder of this paper is organized as follows. Section 2 presents Materials and Methods, including the EMG device used for collecting the data set, the gestures included, and the proposed model architecture to fix the displacement problem. In Section 3, the experiments designed for testing the proposed model are described. These include a comprehensive combination of user-specific and user-general models, original pod position and synthetic rotation, and HGR systems with and without orientation correction. The results of these experiments are presented and analyzed in Section 4. In Section 5, further discussion of the results is presented. In Section 6, the findings of this research, as well as the outline of future work, are presented.
2. Materials and Methods
The architecture for the HGR system based on EMG signals that we developed in this work is presented in Figure 1. As can be observed, the proposed system is composed of five stages, which are data acquisition, pre-processing, feature extraction, classification, and post-processing. The mentioned stages are explained as follows.
2.1. Data Acquisition
This work uses the data set collected in previous research [60], which can be found in [61]. Additionally, the code has been uploaded to GitHub [62]. To simulate rotations of the bracelet, we assume that, by default, the pods of the Myo armband are ordered according to the sequence $1, 2, \ldots, 8$. Then, with uniform probability, we randomly selected an integer offset $r$ (a whole number of 45-degree steps, positive or negative). We then simulated the rotation of the bracelet by computing the new sequence of pods $c'_i = ((c_i + r - 1) \bmod 8) + 1$, where $c_i$ is the original pod index and $i = 1, \ldots, 8$. Note that, in this way, we simulated rotations of the bracelet clockwise and counterclockwise in steps of 45 degrees.
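A minimal sketch of the simulated rotation described above, assuming the channels are stored as columns of a NumPy array; the offset range and function name are illustrative, not the authors' code.

```python
import numpy as np

def rotate_channels(emg, r):
    """Simulate a bracelet rotation of r x 45 degrees by circularly shifting
    the pod (channel) order of an EMG array of shape (samples, 8)."""
    new_order = [(c + r) % 8 for c in range(8)]  # new pod sequence
    return emg[:, new_order]

rng = np.random.default_rng(0)
emg = rng.uniform(-1, 1, (1000, 8))   # placeholder recording
r = int(rng.integers(-4, 5))          # rotation offset in 45-degree steps (illustrative range)
rotated = rotate_channels(emg, r)
```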
The EMG signals were acquired with the Myo bracelet, which has eight differential electrodes with a sampling frequency of 200 Hz. This device also has an inertial measurement unit with nine degrees of freedom (accelerometer, gyroscope, and magnetometer) and haptic feedback, but in this work, we only used EMG information. The Myo bracelet is able to transmit the collected data via Bluetooth to a computer. The Myo bracelet sensor is illustrated in Figure 2a, the suggested manufacturer position of the Myo bracelet is observed in Figure 2b, and a sample of the Myo bracelet rotated in a different angle can be visualized in Figure 2c.
The protocol followed for acquiring EMG signals indicates that the Myo bracelet must be placed in the same area of the right or left forearm during the acquisition over all the users. In this research, the signals used are from people who wear the bracelet placed only on the right forearm, no matter if they were right- or left-handed. The data set is composed of 612 users and was divided into two groups: 50% for training and 50% for testing (i.e., 306 users for each one). It has to be noted that the data set is composed of 96% right-handed people and 4% left-handed people, as well as 66% men and 34% women. The age distribution of the data set has a higher concentration of users between 18 and 25 years old; this is because the data are from undergraduate students. An illustration of the statistical information related to the data set is presented in Figure 3.
The data set used in this work consists of five gestures, which are the same as those detected by the Myo manufacturer's software. The mentioned hand gestures are waveIn, waveOut, fist, open, and pinch, plus the relax state (noGesture), as can be observed in Figure 4. The total number of repetitions performed by each user is 300, which corresponds to 50 repetitions for each gesture. Each repetition was recorded during 5 s, and every gesture repetition starts in the relax position and ends in the same relax position.
The data set also includes information on the limits of muscle activity, which was manually segmented within the 5 s of the measured EMG signal. This information is useful to identify the moments when every gesture was performed. For the rest of the paper, we use the term ground truth for the manual segmentation of the muscular activity.
2.1.1. General and Specific Models
In this work, we train and evaluate two different approaches for hand gesture recognition based on a general and a specific model, respectively. We first created a general model based on a training set composed of EMG information from all users, and then each user tested the model to evaluate the recognition results. On the other hand, we also created a specific model based on a training set that only uses one user at a time, and again each user tested their respective model to evaluate the recognition results. To work with general or specific models, it is necessary to create a matrix organized per sensor, user, and gesture category to train the classifier. Equation (1) shows the EMG training matrix $\mathbf{X}_k$ for each user $k$.

$$\mathbf{X}_k = \begin{bmatrix} \mathbf{X}_k^{g_1} & \mathbf{X}_k^{g_2} & \mathbf{X}_k^{g_3} & \mathbf{X}_k^{g_4} & \mathbf{X}_k^{g_5} & \mathbf{X}_k^{g_6} \end{bmatrix}^{T} \tag{1}$$

where $\mathbf{X}_k^{g}$ represents the EMG measures of user $k$ for each gesture $g$: waveOut ($g_1$), waveIn ($g_2$), fist ($g_3$), open ($g_4$), pinch ($g_5$), and noGesture ($g_6$). Each matrix $\mathbf{X}_k^{g}$ is composed of a set of EMG measures denoted by $\mathbf{e}_{c,p}^{T}$, which represents the transposed vector of channel $c$ for repetition $p$ of the gesture, as shown in Equation (2).

$$\mathbf{X}_k^{g} = \begin{bmatrix} \mathbf{e}_{1,1}^{T} & \mathbf{e}_{2,1}^{T} & \cdots & \mathbf{e}_{8,1}^{T} \\ \mathbf{e}_{1,2}^{T} & \mathbf{e}_{2,2}^{T} & \cdots & \mathbf{e}_{8,2}^{T} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{e}_{1,P}^{T} & \mathbf{e}_{2,P}^{T} & \cdots & \mathbf{e}_{8,P}^{T} \end{bmatrix} \tag{2}$$

Notice that the dimensions of each matrix $\mathbf{X}_k$ are $(6 \cdot 7 \cdot P) \times (8 \cdot 200)$, where $P$ is the number of repetitions of a gesture, with seven sliding windows for each measure, six classes, and 200 points extracted for each sliding window. It is worth mentioning that consecutive sliding windows are separated from each other by 25 points. Since the Myo sensor has eight EMG channels, the EMG training matrix dimension can be written as $(42P) \times 1600$.

Finally, the data of each user are appended in a total training matrix $\mathbf{X}$. Equation (3) shows how the total training matrix for the user-general model is composed of $Q$ users.

$$\mathbf{X} = \begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_Q \end{bmatrix} \tag{3}$$

where the dimension of the total EMG training matrix is $(Q \cdot 42P) \times 1600$. The parameter $Q$ represents the number of users included in the model: $Q = 306$ for the user-general model and $Q = 1$ for the user-specific model, in which case the training matrix is composed only of signals belonging to that specific user. It has to be noted that, for each measure related to a gesture $g$ and repetition $p$, a label $Y$ is added to train the model; $Y$ denotes the label corresponding to the current EMG gesture sample and to the seven sliding windows within it.
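The following sketch illustrates how a per-user training matrix of the form of Equations (1)–(3) could be assembled from windowed repetitions; the function name, label encoding, and placeholder data are illustrative assumptions, and in the actual system each window is further passed through feature extraction (Section 2.3).

```python
import numpy as np

def user_training_matrix(repetitions, labels, window=200, stride=25, n_windows=7):
    """Stack seven 200-point windows of every repetition of every gesture into a
    matrix X (one row per window, 200 points x 8 channels per row) with labels Y."""
    X, Y = [], []
    for emg, label in zip(repetitions, labels):       # emg: (samples, 8)
        for w in range(n_windows):
            start = w * stride
            win = emg[start:start + window, :]        # 200 x 8 window
            X.append(win.T.reshape(-1))               # flatten to a 1600-value row
            Y.append(label)                           # same label for all 7 windows
    return np.array(X), np.array(Y)

# Example with placeholder data: two repetitions of gesture class 0 for one user.
reps = [np.random.uniform(-1, 1, (1000, 8)) for _ in range(2)]
X_k, Y_k = user_training_matrix(reps, labels=[0, 0])
print(X_k.shape, Y_k.shape)   # (14, 1600) (14,)
# A user-general matrix is obtained by stacking the X_k of the Q training users.
```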
2.1.2. Orientation Considerations for the EMG Sensor
In this research, two approaches were tested regarding the orientation problem of the Myo armband sensor, which are with and without orientation correction. Both methods were applied over the user-specific and user-general models previously explained.
Typically, the user-general and user-specific models that do not consider orientation correction present poor performance when the user places the bracelet in a different orientation. In this work, we propose an orientation correction algorithm to solve the problem of the orientation variation of the Myo bracelet. This approach uses the maximum energy channel (MEC) of a synchronization gesture, which provides high robustness to rotation and allows the bracelet to be placed at any angle, similar to [63]. Furthermore, it avoids the need to re-record the training signals every time the system is going to be used.
For this purpose, a gesture to synchronize the HGR models was used. The synchronization gesture lets the sensor be used in a different position. All five gestures were tested as synchronization signals (waveIn, waveOut, fist, open, and pinch). The results of the tests for selecting the best synchronization gesture are presented in Appendix A. These results demonstrated which gesture obtained the best performance; thus, we selected that gesture for our experiments.
By performing the synchronization gesture during a period of time, the pod showing the location of the maximum activity in the signal is obtained. The EMG data are then rearranged according to this maximum energy channel (MEC), obtaining a new sensor orientation for the HGR system. For this purpose, the average energy in every EMG window of 200 points is calculated over $T$ repetitions, and then the maximum value is found in a specific pod. It is worth mentioning that one, two, three, or four windows of 200 points can be used as synchronization signals to identify the MEC. The procedure to get the pod information in the synchronization stage starts with the data acquisition of the EMG signals of the sensor in the vector $\mathbf{x}_c$, as stated as follows:

$$\mathbf{x}_c = \left[ x_{c,1}, x_{c,2}, \ldots, x_{c,L} \right] \tag{4}$$

where $c = 1, \ldots, 8$ is the pod (channel) number and $L$ is the length of the signal. It has to be noted that the sample values from each channel are normalized in the range between $-1$ and $1$. Then, the energy of the samples of each channel is given by

$$E_c = \sum_{i=1}^{L} \left| x_{c,i} \right|^{2} \tag{5}$$

where $E_c$ refers to the energy in each pod. The average energy value of a channel over $T$ repetitions of the synchronization gesture is represented by

$$\bar{E}_c = \frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{L} \left| x_{c,i,t} \right|^{2} \tag{6}$$

where $\left| \cdot \right|$ refers to the absolute value, $T$ is the number of synchronization repetitions, $c$ represents the pod number, $L$ is the length of the signal, and $x_{c,i,t}$ is the $i$th point of the signal of channel $c$ in repetition $t$. Then, the reference sensor is identified through the max function, which gives the pod with the maximum average energy value of the vector $\bar{\mathbf{E}} = [\bar{E}_1, \ldots, \bar{E}_8]$, as stated as follows:

$$MEC = \underset{c}{\arg\max} \; \bar{E}_c \tag{7}$$

Finally, the new channel order for all gestures is organized according to the following equation:

$$c'_j = \left( \left( MEC + j - 2 \right) \bmod 8 \right) + 1, \quad j = 1, \ldots, 8 \tag{8}$$

where $\bmod$ refers to the remainder after division, and the maximum value of $c'_j$ is 8 because there are eight pods. Notice that the default order coming from the Myo bracelet is as follows:

$$\left( 1, 2, 3, 4, 5, 6, 7, 8 \right) \tag{9}$$

As an example, if the detected MEC is pod 5, the new channel order is arranged as follows: $\left( 5, 6, 7, 8, 1, 2, 3, 4 \right)$.
After obtaining the reference sensor through the maximum energy channel (MEC), we use it in both the training and testing procedures. It is important to highlight that the reference pod may not be the same for all recordings across users and gestures. The calibration process must be executed every time a user wants to test the recognition system after taking the bracelet off.
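A minimal sketch of the maximum-energy-channel computation and channel rearrangement of Equations (4)–(9); the helper names are illustrative, and the synchronization windows here are random placeholders.

```python
import numpy as np

def maximum_energy_channel(sync_reps):
    """Average the per-channel energy over T synchronization repetitions
    (Equations (5)-(6)) and return the pod with maximum energy (Equation (7))."""
    energies = np.zeros(8)
    for rep in sync_reps:                      # rep: (L, 8), values normalized to [-1, 1]
        energies += np.sum(np.abs(rep) ** 2, axis=0)
    energies /= len(sync_reps)
    return int(np.argmax(energies)) + 1        # pods numbered 1..8

def rearrange_by_mec(emg, mec):
    """Reorder the channels so that the maximum energy channel comes first,
    e.g., MEC = 5 gives the order 5, 6, 7, 8, 1, 2, 3, 4 (Equation (8))."""
    order = [(mec - 1 + j) % 8 for j in range(8)]
    return emg[:, order]

# Example with four placeholder synchronization windows of 200 points each.
sync_reps = [np.random.uniform(-1, 1, (200, 8)) for _ in range(4)]
mec = maximum_energy_channel(sync_reps)
corrected = rearrange_by_mec(np.random.uniform(-1, 1, (1000, 8)), mec)
```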
For reproducing the results of the proposed models, the code and the dataset used for this paper are located in [64].
2.2. Pre-Processing
As part of the pre-processing stage, the EMG energy (Equation (15)) is used to identify whether the current analyzed window needs to be classified or not. Every EMG window must exceed an energy threshold to be passed to the classifier. A threshold of 17% was considered in this research based on multiple tests with different energy thresholds. Whenever the energy of an analyzed window exceeds the threshold, the EMG window goes to the next stage, which is feature extraction. This process avoids unnecessary classifications when the threshold is not reached and, therefore, reduces the computational cost. It has to be noted that the energy threshold is calculated using the synchronization gesture, consecutively adding the energy calculated from each channel to obtain the energy value E.
To perform the pre-processing procedure, the eight pods of the Myo bracelet have been divided into two groups. Every group is composed of four pods, $G_1 = \{1, 2, 3, 4\}$ and $G_2 = \{5, 6, 7, 8\}$ in the corrected channel order, and each group is analyzed individually with respect to its energy $E$ and the threshold of 17%. Some gestures produce a muscle activation pattern that is detected mainly through one group of sensors, whereas other gestures, for example waveIn, are sensed mainly through the other group. The channel division by groups therefore allows the detection of gestures that activate different groups of muscles. The energy for $G_1$ corresponds to the energy of pods 1, 2, 3, and 4, as stated in Equation (10), while the energy for $G_2$ corresponds to the energy of pods 5, 6, 7, and 8, as shown in Equation (11).

$$E_{G_1} = E_1 + E_2 + E_3 + E_4 \tag{10}$$

$$E_{G_2} = E_5 + E_6 + E_7 + E_8 \tag{11}$$
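A sketch of the energy gate described above, assuming that the two groups are the first and last four pods of the corrected channel order and that the 17% threshold is taken relative to a reference energy obtained from the synchronization gesture; these are assumptions for illustration only.

```python
import numpy as np

def passes_energy_gate(window, reference_energy, threshold=0.17):
    """Decide whether a 200-point window is sent to feature extraction by
    comparing the energy of each half of the bracelet (Equations (10)-(11))
    against a fraction of a reference energy."""
    e_group1 = np.sum(window[:, 0:4] ** 2)   # pods 1-4, Equation (10)
    e_group2 = np.sum(window[:, 4:8] ** 2)   # pods 5-8, Equation (11)
    return max(e_group1, e_group2) >= threshold * reference_energy

reference_energy = 50.0                       # illustrative value from the sync gesture
window = np.random.uniform(-1, 1, (200, 8))
print(passes_energy_gate(window, reference_energy))
```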
2.3. Feature Extraction
Five feature extraction functions are used in this paper, which are applied over every EMG recording (see Figure 5) contained in a sliding window, only when the window surpasses the energy threshold.
The set of functions that were used is briefly explained as follows (a code sketch follows the list):

- Standard deviation (SD): This feature measures the dispersion of the EMG signal. It indicates how the data are scattered with respect to the average and is expressed as:

$$SD = \sqrt{ \frac{1}{L-1} \sum_{i=1}^{L} \left( x_i - u \right)^{2} } \tag{12}$$

where $x_i$ is a sample of the EMG signal, $u$ is the average, and $L$ is the total number of points of the EMG;

- Absolute envelope (AE): It uses the Hilbert transform for calculating the instantaneous attributes of a time series, especially amplitude and frequency [65]:

$$AE = \left| x_i + j\,H\{x_i\} \right| \tag{13}$$

where $H\{\cdot\}$ is the Hilbert transform and $x_i$ is the EMG signal;

- Mean absolute value (MAV): It is a popular feature used in EMG-based hand gesture recognition applications. The mean absolute value is the average of the absolute value of the EMG signal amplitude, and it is defined as follows:

$$MAV = \frac{1}{L} \sum_{i=1}^{L} \left| x_i \right| \tag{14}$$

where $x_i$ is a sample of the EMG signal and $L$ is the total number of points of the EMG;

- Energy (E): It is a feature for measuring the energy distribution, and it can be represented as [66]:

$$E = \sum_{i=1}^{L} \left| x_i \right|^{2} \tag{15}$$

where $x_i$ is a sample of the EMG signal and $L$ is the total length of the EMG signal;

- Root mean square (RMS): It describes the muscle force and non-fatigue contraction [51]. Mathematically, the RMS can be defined as:

$$RMS = \sqrt{ \frac{1}{L} \sum_{i=1}^{L} x_i^{2} } \tag{16}$$

where $x_i$ is a sample of the EMG signal and $L$ is the total number of points of the EMG.
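A sketch of the feature computation for one window is shown below; the absolute envelope, which is a time series, is summarized here by its mean (an assumption made for illustration), and SciPy's Hilbert transform is used.

```python
import numpy as np
from scipy.signal import hilbert

def extract_features(window):
    """Compute SD, AE, MAV, E, and RMS (Equations (12)-(16)) for every channel
    of a 200 x 8 EMG window and return a single flat feature vector."""
    sd = np.std(window, axis=0, ddof=1)                     # standard deviation
    ae = np.mean(np.abs(hilbert(window, axis=0)), axis=0)   # mean of the absolute envelope
    mav = np.mean(np.abs(window), axis=0)                   # mean absolute value
    energy = np.sum(window ** 2, axis=0)                    # energy
    rms = np.sqrt(np.mean(window ** 2, axis=0))             # root mean square
    return np.concatenate([sd, ae, mav, energy, rms])       # 5 features x 8 channels

features = extract_features(np.random.uniform(-1, 1, (200, 8)))
print(features.shape)   # (40,)
```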
2.4. Classification
A support vector machine (SVM) was chosen for the hand gesture classification. The SVM is a machine learning technique used to find the optimal separation hyper-plane in data classification [38,39,67]. It uses a kernel function on the input data to remap it into a new space that facilitates the separation between classes. In this research, a polynomial kernel of third order with a one-vs.-one strategy was implemented to carry out the classification procedure. The parameters used to configure the SVM can be observed in Table 1; the SVM was implemented in MATLAB for all the experiments.
Table 1. SVM configuration parameters.

| MATLAB Variable | Value |
|---|---|
| Kernel Function | polynomial |
| Polynomial Order | 3 |
| Box Constraint | 1 (regularization parameter) |
| Standardize | true: $(x - \mu)/\sigma$, where $\mu$ = mean and $\sigma$ = standard deviation |
| Coding | one vs. one |
For this research, SVM multi-class classification was utilized. The multi-class problem is broken down into multiple binary classification cases, which is also called one-vs.-one coding [67]. The number of classifiers necessary for one-vs.-one multi-class classification is $n(n-1)/2$, where $n$ is the number of gesture classes; for the six classes considered here, 15 binary classifiers are needed.
In the one-vs.-one approach, each classifier separates points of two different classes, and uniting all one-vs.-one classifiers leads to a multi-class classifier. We use SVM since it is a classifier that allows portability of HGR systems due to its low computational cost and real-time operation [38,39,67]. In addition, in experiments conducted in [68,69], the authors demonstrate that SVM is able to reach a higher performance than k-nearest neighbor (KNN) for EMG signal classification.
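The classifier was configured in MATLAB with the parameters of Table 1. As an illustrative sketch only (not the authors' implementation), a roughly equivalent configuration in scikit-learn would be:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Third-order polynomial kernel, box constraint C = 1, standardized inputs, and
# one-vs-one coding: with n = 6 classes, 6 * (6 - 1) / 2 = 15 binary classifiers.
svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, C=1.0, decision_function_shape="ovo"),
)
# svm.fit(X_train, Y_train); predictions = svm.predict(X_test)
```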
In our research, the SVM training process was performed offline, obtaining different sets of support vectors for the user-specific and user-general models. It is worth mentioning that, when a user-general model is used, the set of created support vectors influences the classifier inference time because this type of model is trained with a large amount of data; therefore, more support vectors have to be analyzed before the classifier gives a response. When the SVM classifies an EMG window, a score matrix with values related to each gesture is generated, as stated as follows:

$$\mathbf{S} = \left[ s_{g_1}, s_{g_2}, s_{g_3}, s_{g_4}, s_{g_5}, s_{g_6} \right]$$

where $s_{g}$ is the corresponding score value of gesture $g$. The score matrix is composed of negative scores, and the SVM gives as the selected label the one nearest to zero. These scores are mapped into a positive range, and they are used to determine a maximum positive value each time a window is analyzed, as presented as follows:

$$s_{max} = \max \left( s'_{g_1}, s'_{g_2}, \ldots, s'_{g_6} \right) \tag{17}$$

where $s'_{g}$ denotes the score of gesture $g$ mapped into the positive range. Whenever the maximum positive score exceeds a threshold of 0.9 (set based on different experiments), the label predicted by the classifier is considered valid; otherwise, the default label is noGesture. Algorithm 1 describes the operation of the SVM and the handling of the values of the score matrix for each of the classification windows.
Algorithm 1: SVM classification and score validation.
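Since the body of Algorithm 1 is not reproduced here, the following sketch illustrates the score-validation logic described above; the mapping of the negative SVM scores to a positive range is an illustrative assumption, as the exact mapping is not detailed in this section.

```python
import numpy as np

def validate_prediction(scores, class_names, threshold=0.9):
    """Map the (negative) SVM scores of a window to a positive range, take the
    maximum, and accept the corresponding label only if it exceeds the threshold;
    otherwise return the default noGesture label."""
    positive = 1.0 / (1.0 + np.abs(scores))   # illustrative mapping: closer to zero -> closer to 1
    best = int(np.argmax(positive))
    return class_names[best] if positive[best] >= threshold else "noGesture"

class_names = ["waveIn", "waveOut", "fist", "open", "pinch", "noGesture"]
print(validate_prediction(np.array([-0.05, -1.2, -0.9, -2.1, -1.7, -0.4]), class_names))
```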
2.5. Post-Processing
During classification, each sliding window of 200 points with 20 points of separation was used to analyze the EMG signal; a vector with the probability of each gesture class was then obtained, and only the most probable class was considered as the result of the classification stage. The post-processing stage then receives each of those class results, and a vector of labels is created by concatenating them. The vector of labels is finished when the number of sliding windows analyzed covers the 5 s of recording. Then, we compute the mode of every four labels, and the result is stored in a new vector of labels, which is key to removing spurious labels that might appear among the classification results. In addition, we assign each of those label results to a point in the time domain depending on the position of each sliding window. A sample of the vector of labels in the time domain is illustrated in Figure 6, where we can observe a set of noGesture labels, followed by a set of gesture labels, and again a set of noGesture labels. The ground truth can also be observed, which was obtained from the manual segmentation of the muscular activity that corresponds to a gesture. Finally, a recognition is considered successful if the vector of labels corresponds to the ground-truth label and if the vector of labels is aligned in the time domain with the manual segmentation, as illustrated in Figure 6. For this purpose, we used a minimum overlapping factor of 25% as a threshold to decide whether the recognition is correct. The overlapping factor is described in Equation (18),

$$\rho = \frac{2 \left| A \cap B \right|}{\left| A \right| + \left| B \right|} \tag{18}$$

where $A$ is the set of points where the muscle activity is located by the manual segmentation, and $B$ is the set of points where the gesture was detected by the model during post-processing.
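A sketch of the post-processing steps is given below, assuming the overlap definition reconstructed in Equation (18); the mode filter replaces every block of four window labels by its most frequent label.

```python
from collections import Counter

def smooth_labels(labels, group=4):
    """Replace every block of `group` consecutive window labels by its mode,
    removing spurious isolated predictions."""
    return [Counter(labels[i:i + group]).most_common(1)[0][0]
            for i in range(0, len(labels) - group + 1, group)]

def overlap_factor(ground_truth_points, predicted_points):
    """Overlapping factor between ground-truth and predicted gesture points
    (Equation (18)); recognition is accepted when it exceeds 0.25."""
    gt, pred = set(ground_truth_points), set(predicted_points)
    return 2 * len(gt & pred) / (len(gt) + len(pred)) if (gt or pred) else 1.0

print(smooth_labels(["fist", "fist", "open", "fist", "fist", "fist", "fist", "fist"]))
print(overlap_factor(range(100, 400), range(150, 380)))   # ~0.87
```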
3. Experimental Setup
The HGR classification and recognition experiments were carried out considering both user-specific and user-general models, and for each of them, we consider if each of those systems works with or without orientation correction. The information related to the experiments’ setup is illustrated in Figure 7. In addition, a brief explanation of each experiment can be found as follows.
Experiment 1: This experiment represents the ideal scenario suggested by the Myo bracelet manufacturer where each user trains and tests the recognition model, placing the bracelet in the same orientation recommended by the manufacturer. This orientation implies that a user should wear the bracelet in such a way that pod number 4 is always parallel to the palm of the hand (see Figure 2b). There is no orientation correction for this experiment;
Experiment 2: The training EMG signals were acquired with the sensor placed in the orientation recommended by the manufacturer. However, when testing the model, the bracelet was rotated artificially (see Figure 2c). This experiment simulates the scenario where a user wears the sensor without taking into account the suggested positions for the testing procedure, which usually is the most common scenario. However, there is no orientation correction for this experiment;
Experiment 3: The training EMG signals were acquired with the sensor placed in the orientation recommended by the manufacturer. For testing, the bracelet was rotated, simulating different angles. The orientation correction algorithm was applied for both training and testing data;
Experiment 4: In this experiment, the performance of the proposed method is evaluated when there is rotation of the bracelet for training and testing, and the orientation correction algorithm was applied for both training and testing data.
4. Results
In this section, we present the HGR performance results for the Myo armband sensor manufacturer’s model, as well as our results for the user-specific and user-general models. In addition, we also compare our user-specific and user-general results with each other, and then we compare such results with other approaches that can be found in the literature. For this purpose, we use confusion matrices where accuracy, precision, and sensitivity information values can be visualized.
To calculate the accuracy, the number of true positive (TP) values is divided by the total set of samples analyzed, which includes true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The accuracy value, which is considered our main metric of evaluation, is useful to analyze the proportion of correct predictions over a set of measures, as can be observed in Equation (19).

$$Accuracy = \frac{TP}{TP + TN + FP + FN} \tag{19}$$

We also calculated the sensitivity and precision values as support metrics of evaluation. The sensitivity (also known as recall) is the fraction of the total amount of relevant instances that were actually retrieved, i.e., how many of the relevant gestures are recognized. On the other hand, the precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, i.e., how many of the recognized gestures are relevant. The sensitivity and precision metrics can be observed in Equations (20) and (21), respectively.

$$Sensitivity = \frac{TP}{TP + FN} \tag{20}$$

$$Precision = \frac{TP}{TP + FP} \tag{21}$$
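As an illustration, the following sketch computes these metrics from a confusion matrix laid out as in Tables 2–10 (rows are predictions, columns are targets); the function name and the toy matrix are illustrative.

```python
import numpy as np

def metrics_from_confusion(cm):
    """cm[i, j] = number of windows predicted as class i whose target is class j.
    Returns the overall accuracy plus per-class sensitivity and precision."""
    tp = np.diag(cm).astype(float)
    accuracy = tp.sum() / cm.sum()        # Equation (19)
    sensitivity = tp / cm.sum(axis=0)     # Equation (20), per target column
    precision = tp / cm.sum(axis=1)       # Equation (21), per prediction row
    return accuracy, sensitivity, precision

cm = np.array([[90, 10],
               [ 5, 95]])                  # toy 2-class confusion matrix
print(metrics_from_confusion(cm))
```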
4.1. Myo Bracelet Model Results Using Manufacturer’s Software
The classification results obtained using the Myo bracelet manufacturer's model are presented in Table 2. It is worth mentioning that the Myo bracelet manufacturer's recognition system provides an answer every 20 ms. As can be observed, the accuracy obtained for classification is 64.66% using the position suggested by the manufacturer.
Table 2. Classification confusion matrix of the Myo bracelet manufacturer's model.

| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 4831 | 431 | 164 | 211 | 218 | 3 | 5858 (82.47%) |
| waveOut | 368 | 5370 | 262 | 682 | 406 | 3 | 7091 (75.73%) |
| fist | 1047 | 548 | 5361 | 1009 | 1588 | 29 | 9582 (55.95%) |
| open | 334 | 458 | 404 | 4072 | 795 | 2 | 6065 (67.14%) |
| pinch | 105 | 253 | 337 | 342 | 2437 | 3 | 3477 (70.09%) |
| noGesture | 965 | 590 | 1122 | 1334 | 2206 | 7610 | 13827 (55.04%) |
| Targets Count (Sensitivity %) | 7650 (63.15%) | 7650 (70.2%) | 7650 (70.08%) | 7650 (53.23%) | 7650 (31.86%) | 7650 (99.48%) | 45,900 (64.66%) |
4.2. User-Specific HGR Model Results
The classification results of Experiments 1, 2, 3, and 4 for the user-specific models are presented in Table 3, Table 4, Table 5 and Table 6, respectively. As can be observed, the classification accuracy obtained was 94.99% for Experiment 1, 39.83% for Experiment 2, 94.93% for Experiment 3, and 94.96% for Experiment 4. The worst scenario was Experiment 2, with a classification accuracy of 39.83%. This is because the bracelet sensor was rotated for the test set, and there was no orientation correction for this experiment. On the other hand, the best result among the experiments involving rotation of the bracelet for the user-specific models was Experiment 4, with a classification accuracy of 94.96%. This is usually the most common scenario in practice because it takes into account simulated rotation during both training and testing. The approach used for Experiment 4 also considered the orientation correction, which helps to achieve high classification results. The best precision and sensitivity results reached 98.89% and 97.66%, respectively. It has to be noted that we present only the best results for Experiments 3 and 4, which were obtained using four repetitions of the synchronization gesture to select the maximum average energy sensor (MEC). The other results for these experiments (with one, two, and three synchronization repetitions) can be found in Appendix B.
Table 3. Classification confusion matrix of the user-specific model for Experiment 1.

| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 7339 | 65 | 73 | 57 | 36 | 168 | 7738 (94.84%) |
| waveOut | 86 | 7416 | 64 | 54 | 32 | 136 | 7788 (95.22%) |
| fist | 18 | 10 | 7305 | 43 | 19 | 136 | 7531 (97%) |
| open | 79 | 94 | 100 | 7385 | 113 | 138 | 7909 (93.37%) |
| pinch | 34 | 41 | 53 | 49 | 7232 | 150 | 7559 (95.67%) |
| noGesture | 94 | 24 | 55 | 62 | 218 | 6922 | 7375 (93.86%) |
| Targets Count (Sensitivity %) | 7650 (95.93%) | 7650 (96.94%) | 7650 (95.49%) | 7650 (96.54%) | 7650 (94.54%) | 7650 (90.48%) | 45,900 (94.99%) |
Table 4. Classification confusion matrix of the user-specific model for Experiment 2.

| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 2961 | 2265 | 2104 | 2231 | 2155 | 291 | 12007 (24.66%) |
| waveOut | 1204 | 2320 | 970 | 1030 | 756 | 136 | 6416 (36.16%) |
| fist | 1763 | 1874 | 2862 | 1714 | 1579 | 254 | 10046 (28.49%) |
| open | 515 | 526 | 594 | 1389 | 516 | 127 | 3667 (37.88%) |
| pinch | 869 | 566 | 874 | 965 | 2052 | 143 | 5469 (37.52%) |
| noGesture | 338 | 99 | 246 | 321 | 592 | 6699 | 8295 (80.76%) |
| Targets Count (Sensitivity %) | 7650 (38.71%) | 7650 (30.33%) | 7650 (37.41%) | 7650 (18.16%) | 7650 (26.82%) | 7650 (87.57%) | 45,900 (39.83%) |
Table 5. Classification confusion matrix of the user-specific model for Experiment 3 (four synchronization repetitions).

| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 7338 | 49 | 80 | 46 | 39 | 171 | 7723 (95.01%) |
| waveOut | 75 | 7460 | 65 | 54 | 29 | 134 | 7817 (95.43%) |
| fist | 22 | 13 | 7301 | 44 | 22 | 137 | 7539 (96.84%) |
| open | 76 | 68 | 118 | 7381 | 123 | 139 | 7905 (93.37%) |
| pinch | 31 | 40 | 43 | 36 | 7175 | 149 | 7474 (96%) |
| noGesture | 108 | 20 | 43 | 89 | 262 | 6920 | 7442 (92.99%) |
| Targets Count (Sensitivity %) | 7650 (95.92%) | 7650 (97.52%) | 7650 (95.44%) | 7650 (96.48%) | 7650 (93.79%) | 7650 (90.46%) | 45,900 (94.93%) |
Table 6. Classification confusion matrix of the user-specific model for Experiment 4 (four synchronization repetitions).

| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 7335 | 50 | 86 | 46 | 40 | 173 | 7730 (94.89%) |
| waveOut | 77 | 7471 | 59 | 50 | 28 | 134 | 7819 (95.55%) |
| fist | 27 | 10 | 7307 | 43 | 24 | 141 | 7552 (96.76%) |
| open | 72 | 67 | 113 | 7386 | 125 | 137 | 7900 (93.49%) |
| pinch | 33 | 35 | 41 | 33 | 7174 | 150 | 7466 (96.09%) |
| noGesture | 106 | 17 | 44 | 92 | 259 | 6915 | 7433 (93.03%) |
| Targets Count (Sensitivity %) | 7650 (95.88%) | 7650 (97.66%) | 7650 (95.52%) | 7650 (96.55%) | 7650 (93.78%) | 7650 (90.39%) | 45,900 (94.96%) |
4.3. User-General HGR Model Results
The classification results of Experiments 1, 2, 3, and 4 for the user-general models are presented in Table 7, Table 8, Table 9 and Table 10, respectively. As can be observed, the classification accuracy obtained was 81.6% for Experiment 1, 44.52% for Experiment 2, 81.2% for Experiment 3, and 81.22% for Experiment 4. The worst scenario was Experiment 2, with a classification accuracy of 44.52%. This is because the bracelet sensor was rotated for the test set, and there was no orientation correction for this experiment. On the other hand, the best result among the experiments involving rotation of the bracelet for the user-general models was Experiment 4, with a classification accuracy of 81.22%. This is usually the most common scenario in practice because it takes into account simulated rotation during both training and testing. The approach used for Experiment 4 also considered the orientation correction, which helps to achieve high classification results. The best precision and sensitivity results reached 88.02% and 89.9%, respectively. It has to be noted that we present only the best results for Experiments 3 and 4, which were obtained using four repetitions of the synchronization gesture to select the maximum average energy sensor (MEC). The other results for these experiments (with one, two, and three synchronization repetitions) can be found in Appendix C.
Table 7. Classification confusion matrix of the user-general model for Experiment 1.

| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 6421 | 151 | 201 | 239 | 549 | 186 | 7747 (82.88%) |
| waveOut | 198 | 6544 | 112 | 516 | 270 | 134 | 7774 (84.18%) |
| fist | 467 | 26 | 6696 | 278 | 682 | 153 | 8302 (80.66%) |
| open | 209 | 799 | 358 | 6070 | 891 | 170 | 8497 (71.44%) |
| pinch | 160 | 79 | 173 | 395 | 4832 | 116 | 5755 (83.96%) |
| noGesture | 195 | 51 | 110 | 152 | 426 | 6891 | 7825 (88.06%) |
| Targets Count (Sensitivity %) | 7650 (83.93%) | 7650 (85.54%) | 7650 (87.53%) | 7650 (79.35%) | 7650 (63.16%) | 7650 (90.08%) | 45,900 (81.6%) |
Table 8. Classification confusion matrix of the user-general model for Experiment 2.

| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 3490 | 3049 | 2426 | 2991 | 2986 | 413 | 15355 (22.73%) |
| waveOut | 1333 | 3007 | 561 | 488 | 189 | 100 | 5678 (52.96%) |
| fist | 1619 | 715 | 3437 | 1362 | 1775 | 203 | 9111 (37.72%) |
| open | 341 | 645 | 561 | 2146 | 710 | 91 | 4494 (47.75%) |
| pinch | 568 | 117 | 422 | 377 | 1594 | 81 | 3159 (50.46%) |
| noGesture | 299 | 117 | 243 | 286 | 396 | 6762 | 8103 (83.45%) |
| Targets Count (Sensitivity %) | 7650 (45.62%) | 7650 (39.31%) | 7650 (44.93%) | 7650 (28.05%) | 7650 (20.84%) | 7650 (88.39%) | 45,900 (44.52%) |
Table 9. Classification confusion matrix of the user-general model for Experiment 3 (four synchronization repetitions).

| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 6666 | 143 | 344 | 296 | 744 | 215 | 8408 (79.28%) |
| waveOut | 197 | 6482 | 85 | 370 | 260 | 139 | 7533 (86.05%) |
| fist | 341 | 39 | 6612 | 251 | 663 | 160 | 8066 (81.97%) |
| open | 163 | 892 | 387 | 6257 | 1069 | 170 | 8938 (70%) |
| pinch | 92 | 30 | 121 | 265 | 4373 | 87 | 4968 (88.02%) |
| noGesture | 191 | 64 | 101 | 211 | 541 | 6879 | 7987 (86.13%) |
| Targets Count (Sensitivity %) | 7650 (87.14%) | 7650 (84.73%) | 7650 (86.43%) | 7650 (81.79%) | 7650 (57.16%) | 7650 (89.92%) | 45,900 (81.2%) |
Table 10. Classification confusion matrix of the user-general model for Experiment 4 (four synchronization repetitions).

| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 6651 | 138 | 336 | 302 | 725 | 217 | 8369 (79.47%) |
| waveOut | 207 | 6550 | 83 | 416 | 264 | 139 | 7659 (85.52%) |
| fist | 359 | 29 | 6614 | 262 | 656 | 160 | 8080 (81.86%) |
| open | 147 | 849 | 391 | 6165 | 1034 | 162 | 8748 (70.47%) |
| pinch | 95 | 30 | 126 | 295 | 4424 | 95 | 5065 (87.34%) |
| noGesture | 191 | 54 | 100 | 210 | 547 | 6877 | 7979 (86.19%) |
| Targets Count (Sensitivity %) | 7650 (86.94%) | 7650 (85.62%) | 7650 (86.46%) | 7650 (80.59%) | 7650 (57.83%) | 7650 (89.9%) | 45,900 (81.22%) |
4.4. Comparison between User-Specific and User-General Results
In this section, we summarize and compare the best classification results obtained with the proposed HGR system. We also include in this section the recognition results for each experiment, which are obtained after the post-processing stage. Both classification and recognition are presented in terms of accuracy. In Figure 8, we present the results for all the users without taking into account sex or handedness information. Figure 9 presents the results considering the user's sex, and Figure 10 presents the results considering handedness. The presented results correspond to the best configuration for each experiment, which means that for Experiments 1 and 2 there is no synchronization gesture, and for Experiments 3 and 4 we used four repetitions of the synchronization gesture to select the maximum average energy sensor (MEC).
As can be seen in Figure 8, when the user-general model is used, the accuracy of the system (without taking into account sex or handedness information) decreases by up to 13.7% for classification and up to 13.9% for recognition. It also decreases by up to 15.9% for classification and 15.9% for recognition in the experiments considering the user's sex (Figure 9). Moreover, the accuracy also decreases by up to 16.7% for classification and 16.9% for recognition in the experiments considering handedness (Figure 10). However, it is observed in Figure 8 that only for Experiment 2 does the user-general model obtain slightly better results than the user-specific model (up to 7.6% better). Nevertheless, Experiment 2 also obtains the worst results for classification (from 39.8% to 44.5%) and recognition (from 38.8% to 43.4%). This behavior is repeated in Figure 9 and Figure 10. The observed decrease in accuracy when using a user-general model is a common behavior in classification and recognition systems because performance typically tends to decrease when a large data set is used to analyze the generalization properties of a proposed model. For this reason, and based on the aforementioned results, we consider the generalization capabilities of the proposed HGR system acceptable, since its performance does not decrease drastically when we compare user-specific with user-general models.
To analyze the effect of the orientation correction algorithm over all the experiments, we focus on the general results presented in Figure 8. It can be seen that, when the orientation correction is used, the classification and recognition performances increase by up to 45.4% and 36.9%, respectively. This indicates that the orientation correction approach has a positive and substantial impact on the performance of the HGR models. This behavior is repeated in Figure 9 and Figure 10.
In order to analyze the results related to the user's sex over all the experiments, we focus on the results presented in Figure 9. It can be observed that women obtain better results in the user-specific model (up to 1.6% better), while men obtain better results in the user-general model (up to 3.1% better). This might be due to the fact that there are more men (66%) than women (34%) in the overall data set, which decreases the performance for women when using the user-general models.
To analyze the handedness-related results over all experiments, we focus on the results presented in Figure 10. It can be observed that left-handed users present better results in the user-specific model (up to 8.1% better), while right-handed users present better results in the user-general model (up to 3.6% better). This might be due to the fact that there are more right-handed users (96%) than left-handed users (4%) in the overall data set, which decreases the performance for left-handed users when using the user-general models.
Finally, in Table 11, we show the average classification time for the user-general and user-specific models. It can be observed that the average time of the user-general models is higher than in the user-specific case. This is because the general model is trained with data from several users, so there is a greater number of support vectors that must be analyzed before the classifier gives a label response. However, the response time of both the user-specific and user-general models is close to 100 ms, which is considered real time for this application.
Table 11. Average classification time for the user-specific and user-general models.

| Model | Specific | General |
|---|---|---|
| Time (ms) | | |
4.5. Comparison of Results with Other Papers
We compare our user-specific and user-general HGR models with other proposals in terms of classification and recognition in Table 12. Although recognition evaluation is mentioned in those proposals, in most of them only classification was performed. Moreover, several experiments in these papers were carried out without considering sensor rotation. For example, a rotation correction was performed in [63], but that work does not evaluate recognition. Another approach is presented in [59], where no recognition evaluation was presented but a rotation correction algorithm was proposed.
Table 12. Comparison of the proposed models with other works.

| Paper | Device | Pods Sensors | Gestures | Train/Test Users | Class. (%) | Recog. (%) | HGR Model | Recognition Evaluated | Rotation Performed | Correction of Rotation |
|---|---|---|---|---|---|---|---|---|---|---|
| [39] | MYO | 8 * | 5 | 12/12 | 97.80 | - | S | no | no | no |
| [70] | Delsys | 12 | 6 | 40/40 | 79.68 | - | S | no | no | no |
| [39] | MYO | 8 * | 5 | 12/12 | 98.70 | - | S | no | no | no |
| [55] | Sensors | 5 | 11 | 4/4 | 81.00 | - | S | no | yes | no |
| [57] | High Density | 96 | 11 | 1/1 | 60.00 | - | S | no | yes | no |
| [58] | MYO | 8 * | 15 | 1/1 | 91.47 | - | S | no | yes | yes |
| [59] | MYO | 8 * | 6 | 10/10 | 94.70 | - | S | no | yes | yes |
| [63] | MYO | 8 * | 5 | 40/40 | 92.40 | - | G | no | yes | yes |
| S-HGR ** | MYO | 8 * | 5 | 306/306 | 94.96 | 94.20 | S | yes | yes | yes |
| G-HGR ** | MYO | 8 * | 5 | 306/306 *** | 81.22 | 80.31 | G | yes | yes | yes |
* Myo bracelet used; ** proposed specific (S-HGR) and general (G-HGR) models, with user-specific (S) and user-general (G) HGR model types; *** training users are different from testing users. A description of the gestures studied in the analyzed papers can be found in Appendix D.
As can be observed, our proposed user-general model obtained better results compared to [55,57,70]. Moreover, our user-general system performed better even when trained on 306 users, while the others only trained their models with a user-specific approach. On the other hand, our user-specific model also obtained better results compared to [55,57,58,59,70], which are also user-specific-based models. The only approach that obtained better results than our proposed user-general approach is [63]. However, that approach does not use a recognition criterion for evaluation, and it trained and tested the model using only 40 users, which makes it difficult to compare its generalization capabilities with those of our proposed model, which uses 306 users for training and 306 different users for testing.
5. Discussion
During the experiments, we noticed that the recognition performance for most of the experiments is significantly lower than the classification performance. This is because, for classification, the time at which a gesture is executed is not relevant. On the other hand, recognition requires information about the time when the gestures were detected. This is a key aspect since recognition needs a minimum overlap between the predicted and ground-truth signals to indicate that the prediction of a gesture was successful.
The best classification and recognition results were obtained during Experiment 1 for both user-specific and user-general models (see Figure 8). During Experiment 1, the users always wore the Myo bracelet in exactly the same orientation, following the considerations of the Myo manufacturer for both training and testing, which can be considered an ideal scenario. Thus, the accuracy results obtained during Experiment 1 for classification are 95% and 81.6% for the user-specific and user-general models, respectively. On the other hand, the accuracy results for recognition are 94.2% and 80.6% for the user-specific and user-general models, respectively. Nevertheless, Experiment 4 reached almost the same results using the orientation correction algorithm, even though the bracelet was rotated for the training and test sets, which demonstrates the effectiveness of the proposed orientation correction algorithm. The accuracy results obtained during Experiment 4 for classification are 95% and 81.2% for the user-specific and user-general models, respectively. On the other hand, the accuracy results for recognition are 94.2% and 80.3% for the user-specific and user-general models, respectively.
The worst classification and recognition results were obtained during Experiment 2 for both user-specific and user-general models. During Experiment 2, the angle of the Myo bracelet was changed for the testing procedure, and no orientation correction was performed, which can be considered the worst possible scenario. The accuracy results obtained during Experiment 2 for classification are 39.8% and 44.5% for the user-specific and user-general models, respectively. On the other hand, the accuracy results for recognition are 38.8% and 43.4% for the user-specific and user-general models, respectively.
For Experiment 3, we started to notice the positive effects of using the orientation correction approach, which allowed us to increase the accuracy results for both user-specific and user-general models. During Experiment 3, the sensor was not rotated for training, but it was rotated for testing, and the orientation correction was applied to both training and testing data. The accuracy results obtained during Experiment 3 for classification are 94.9% and 81.2% for the user-specific and user-general models, respectively. On the other hand, the accuracy results for recognition are 94.2% and 80.3% for the user-specific and user-general models, respectively. Since the only difference between Experiment 2 and Experiment 3 was that the latter used orientation correction on the training and testing data, Experiment 3 was useful to evaluate the effect of orientation correction. If we compare Experiment 3 with Experiment 2, the classification accuracy increased by up to 55.1% and 36.7% for the user-specific and user-general models, respectively. Moreover, similar behavior was observed for the recognition performance, which increased by up to 55.4% and 37% for the user-specific and user-general models, respectively. This suggests that the orientation correction approach has a positive and substantial impact on the performance of the HGR models.
In Experiment 4, we also observed the positive effects of using the orientation correction approach, which allowed us to increase the accuracy results for both user-specific and user-general models. During Experiment 4, the sensor was rotated for both training and testing data, and the orientation correction was also applied to both of them. The accuracy results obtained during Experiment 4 for classification are 95% and 81.2% for the user-specific and user-general models, respectively. On the other hand, the accuracy results for recognition are 94.2% and 80.3% for the user-specific and user-general models, respectively. These results show that, although the Myo sensor was rotated for the training and test data in Experiment 4, we obtained results comparable to Experiment 3, where the sensor was rotated only for the test. This suggests that the orientation correction approach has a positive and substantial impact on the performance of the HGR models even if the training and test sets are collected with the Myo sensor rotated.
The results obtained using the Myo sensor manufacturer’s model show an acceptable performance as long as the bracelet is placed in the suggested position. However, the proposed user-specific and user-general models considerably improved performance even when the Myo bracelet was rotated for the training and test sets. If we compare the results obtained with orientation correction against the Myo manufacturer’s model, the classification accuracy increases by up to 30.6% and 16.6% for the user-specific and user-general models, respectively. A similar behavior is observed for the recognition performance, which increases by up to 29.5% and 15.7% for the user-specific and user-general models, respectively.
Classification and recognition performance usually decreases when a large data set is used to analyze the generalization properties of an HGR model. Consistent with this, we observed during the experiments that the performance of the HGR model decreases when a user-general model is used. However, the decrease is not drastic; thus, we consider the generalization capabilities of the proposed HGR system to be acceptable.
During the experiments, it was observed that the correct selection of the synchronization gesture is a key point for obtaining promising results. Better results were obtained in the experiments that use orientation correction when the synchronization gesture was repeated four times for the selection of the maximum average energy sensor.
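To make this procedure concrete, the following is a minimal Python/NumPy sketch of the energy-based channel reordering described above. The function names and the use of the sum of squared samples as the energy measure are illustrative assumptions, not the implementation released with this work.

```python
import numpy as np

def channel_energy(emg):
    """Per-channel energy of one EMG recording.

    emg: array of shape (n_samples, n_channels), one repetition of the
    synchronization gesture. Returns the sum of squared samples per channel.
    """
    return np.sum(np.square(emg), axis=0)

def max_energy_channel(sync_repetitions):
    """Index of the channel with the maximum average energy over the repeated
    synchronization gesture (e.g., four repetitions)."""
    energies = np.mean([channel_energy(rep) for rep in sync_repetitions], axis=0)
    return int(np.argmax(energies))

def reorder_channels(emg, reference_channel):
    """Circularly shift the channels so that the sequence starts at the
    reference (maximum-energy) channel. Because the bracelet electrodes form a
    ring around the forearm, a rotation of the bracelet approximately
    corresponds to a circular shift of the channel order."""
    n_channels = emg.shape[1]
    order = [(reference_channel + k) % n_channels for k in range(n_channels)]
    return emg[:, order]

# Usage sketch: estimate the reference channel once per session from the
# synchronization gesture, then apply the same reordering to every training
# and testing EMG recorded in that session.
# sync_reps = [rep1, rep2, rep3, rep4]   # arrays of shape (n_samples, 8) from the Myo
# ref = max_energy_channel(sync_reps)
# emg_corrected = reorder_channels(raw_emg, ref)
```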
6. Conclusions
In this work, a method to correct the orientation of the Myo bracelet sensor for user-specific and user-general hand gesture recognition models was presented. The orientation correction algorithm is based on finding the maximum energy channel over a set of EMG samples of the synchronization gesture (waveOut). Based on the maximum average energy sensor found by the orientation correction algorithm, a new order of the sensor pods is obtained, and the information from the Myo bracelet sensor pods is rearranged so that the sequence starts with the sensor with the highest energy. Our experiments evaluated user-specific and user-general hand gesture recognition models combined with artificial rotations of the bracelet, and the classification and recognition results obtained were encouraging. The proposed orientation correction algorithm improves the classification and recognition performance of the hand gesture recognition system even if the Myo bracelet sensor is rotated between the training and test sets.
Although the obtained results are promising, there are still improvements that can be made to the performance of the user-specific and user-general models and that might allow us to fine-tune our method in future work, for example, testing more sophisticated classifiers, improving the feature extraction, and using a different post-processing method, among others.
Appendix A. Synchronization Gesture Selection
In this appendix, we describe how the synchronization gesture used as a reference for the orientation correction procedure was selected. To select the synchronization gesture, a set of tests was carried out with a group of 50 users selected randomly from the training subset. The selection of the synchronization gesture is based on tests with the user-general HGR model, since a user-general model can be trained with a large amount of data from multiple users, which gives a better overview of the behavior of each gesture.
All five gestures (waveIn, waveOut, fist, open, and pinch) were tested as synchronization signals. The results of these tests for the selection of the gesture used as reference for the synchronization signal are presented in Table A1. These results show that the best performance was obtained with the waveOut gesture; thus, we selected that gesture for all our experiments.
Finally, the detailed confusion matrices related to the tests for choosing the synchronization gesture are included as follows (see Table A2, Table A3, Table A4, Table A5 and Table A6).
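As a reading aid for the confusion matrices below, where rows correspond to predicted classes and columns to target classes, the per-class precision and sensitivity values reported in the tables can be reproduced with a short helper such as the following NumPy sketch; the helper is illustrative and not part of the original code release.

```python
import numpy as np

def confusion_matrix_summary(cm):
    """Summarize a confusion matrix whose rows are predictions and columns are targets.

    Returns (precision, sensitivity, accuracy), where
      precision[i]   = cm[i, i] / sum of row i      (share of correct predictions of class i)
      sensitivity[j] = cm[j, j] / sum of column j   (share of class-j targets recovered)
      accuracy       = trace / total count
    """
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    precision = diag / cm.sum(axis=1)
    sensitivity = diag / cm.sum(axis=0)
    accuracy = diag.sum() / cm.sum()
    return precision, sensitivity, accuracy

# Example with the waveIn row/column of Table A2: 892 correct waveIn predictions
# out of 1308 waveIn predictions gives a precision of about 68.2%, and out of
# 1250 waveIn targets gives a sensitivity of about 71.36%, matching the table.
```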
Table A1. User-general models: classification and recognition accuracy for each candidate synchronization gesture.

| Gesture | Classification (%) | Recognition (%) |
|---|---|---|
| waveOut | 75.61 | 74.57 |
| waveIn | | |
| fist | | |
| pinch | | |
| open | | |
Table A2.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 892 | 39 | 89 | 75 | 180 | 33 | 1308 (68.2%) |
| waveOut | 120 | 1103 | 55 | 165 | 72 | 9 | 1524 (72.38%) |
| fist | 62 | 15 | 959 | 62 | 59 | 4 | 1161 (82.6%) |
| open | 44 | 59 | 53 | 801 | 140 | 28 | 1125 (71.2%) |
| pinch | 107 | 28 | 80 | 126 | 755 | 15 | 1111 (67.96%) |
| noGesture | 25 | 6 | 14 | 21 | 44 | 1161 | 1271 (91.35%) |
| Targets Count (Sensitivity %) | 1250 (71.36%) | 1250 (88.24%) | 1250 (76.72%) | 1250 (64.08%) | 1250 (60.4%) | 1250 (92.88%) | 7500 (75.61%) |
Table A3.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 841 | 232 | 233 | 249 | 300 | 23 | 1878 (44.78%) |
| waveOut | 80 | 851 | 54 | 118 | 45 | 14 | 1162 (73.24%) |
| fist | 130 | 55 | 824 | 120 | 154 | 21 | 1304 (63.19%) |
| open | 95 | 61 | 61 | 621 | 168 | 5 | 1011 (61.42%) |
| pinch | 38 | 38 | 59 | 108 | 517 | 7 | 767 (67.41%) |
| noGesture | 66 | 13 | 19 | 34 | 66 | 1180 | 1378 (85.63%) |
| Targets Count (Sensitivity %) | 1250 (67.28%) | 1250 (68.08%) | 1250 (65.92%) | 1250 (49.68%) | 1250 (41.36%) | 1250 (94.4%) | 7500 (64.45%) |
Table A4.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 667 | 133 | 157 | 153 | 310 | 31 | 1451 (45.97%) |
| waveOut | 126 | 926 | 31 | 95 | 32 | 8 | 1218 (76.03%) |
| fist | 154 | 43 | 816 | 76 | 199 | 7 | 1295 (63.01%) |
| open | 77 | 56 | 115 | 730 | 109 | 7 | 1094 (66.73%) |
| pinch | 165 | 64 | 104 | 120 | 540 | 16 | 1009 (53.52%) |
| noGesture | 61 | 28 | 27 | 76 | 60 | 1181 | 1433 (82.41%) |
| Targets Count (Sensitivity %) | 1250 (53.36%) | 1250 (74.08%) | 1250 (65.28%) | 1250 (58.4%) | 1250 (43.2%) | 1250 (94.48%) | 7500 (64.8%) |
Table A5.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 645 | 132 | 124 | 165 | 134 | 19 | 1219 (52.91%) |
| waveOut | 165 | 948 | 59 | 126 | 120 | 36 | 1454 (65.2%) |
| fist | 163 | 43 | 863 | 115 | 114 | 8 | 1306 (66.08%) |
| open | 68 | 58 | 74 | 704 | 148 | 5 | 1057 (66.6%) |
| pinch | 165 | 56 | 119 | 92 | 711 | 12 | 1155 (61.56%) |
| noGesture | 44 | 13 | 11 | 48 | 23 | 1170 | 1309 (89.38%) |
| Targets Count (Sensitivity %) | 1250 (51.6%) | 1250 (75.84%) | 1250 (69.04%) | 1250 (56.32%) | 1250 (56.88%) | 1250 (93.6%) | 7500 (67.21%) |
Table A6.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 826 | 71 | 150 | 66 | 182 | 12 | 1307 (63.2%) |
| waveOut | 153 | 1034 | 57 | 165 | 81 | 11 | 1501 (68.89%) |
| fist | 102 | 28 | 960 | 34 | 81 | 11 | 1216 (78.95%) |
| open | 54 | 85 | 33 | 867 | 140 | 7 | 1186 (73.1%) |
| pinch | 56 | 24 | 39 | 80 | 727 | 14 | 940 (77.34%) |
| noGesture | 59 | 8 | 11 | 38 | 39 | 1195 | 1350 (88.52%) |
| Targets Count (Sensitivity %) | 1250 (66.08%) | 1250 (82.72%) | 1250 (76.8%) | 1250 (69.36%) | 1250 (58.16%) | 1250 (95.6%) | 7500 (74.79%) |
Appendix B. Confusion Matrices of User-Specific Models
Table A7.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 7305 | 64 | 70 | 55 | 40 | 171 | 7705 (94.81%) |
| waveOut | 80 | 7450 | 68 | 56 | 26 | 135 | 7815 (95.33%) |
| fist | 26 | 15 | 7306 | 43 | 23 | 138 | 7551 (96.76%) |
| open | 75 | 66 | 106 | 7291 | 111 | 143 | 7792 (93.57%) |
| pinch | 37 | 30 | 44 | 39 | 7052 | 148 | 7350 (95.95%) |
| noGesture | 127 | 25 | 56 | 166 | 398 | 6915 | 7687 (89.96%) |
| Targets Count (Sensitivity %) | 7650 (95.49%) | 7650 (97.39%) | 7650 (95.5%) | 7650 (95.31%) | 7650 (92.18%) | 7650 (90.39%) | 45,900 (94.38%) |
Table A8.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 6471 | 203 | 362 | 317 | 646 | 173 | 8172 (79.19%) |
| waveOut | 261 | 6475 | 87 | 420 | 280 | 123 | 7646 (84.68%) |
| fist | 369 | 25 | 6635 | 331 | 698 | 164 | 8222 (80.7%) |
| open | 199 | 865 | 324 | 6070 | 1106 | 165 | 8729 (69.54%) |
| pinch | 102 | 24 | 127 | 239 | 4231 | 87 | 4810 (87.96%) |
| noGesture | 248 | 58 | 115 | 273 | 689 | 6938 | 8321 (83.38%) |
| Targets Count (Sensitivity %) | 7650 (84.59%) | 7650 (84.64%) | 7650 (86.73%) | 7650 (79.35%) | 7650 (55.31%) | 7650 (90.69%) | 45,900 (80.22%) |
Table A9.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 7223 | 126 | 162 | 77 | 133 | 179 | 7900 (91.43%) |
| waveOut | 108 | 7293 | 85 | 60 | 24 | 137 | 7707 (94.63%) |
| fist | 53 | 66 | 7144 | 137 | 56 | 138 | 7594 (94.07%) |
| open | 70 | 85 | 120 | 7147 | 131 | 140 | 7693 (92.9%) |
| pinch | 60 | 34 | 73 | 63 | 6942 | 158 | 7330 (94.71%) |
| noGesture | 136 | 46 | 66 | 166 | 364 | 6898 | 7676 (89.86%) |
| Targets Count (Sensitivity %) | 7650 (94.42%) | 7650 (95.33%) | 7650 (93.39%) | 7650 (93.42%) | 7650 (90.75%) | 7650 (90.17%) | 45,900 (92.91%) |
Table A10.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 7166 | 136 | 203 | 131 | 166 | 168 | 7970 (89.91%) |
| waveOut | 89 | 7268 | 82 | 157 | 86 | 132 | 7814 (93.01%) |
| fist | 104 | 99 | 7061 | 123 | 90 | 149 | 7626 (92.59%) |
| open | 77 | 68 | 128 | 6991 | 151 | 129 | 7544 (92.67%) |
| pinch | 93 | 54 | 110 | 65 | 6777 | 170 | 7269 (93.23%) |
| noGesture | 121 | 25 | 66 | 183 | 380 | 6902 | 7677 (89.9%) |
| Targets Count (Sensitivity %) | 7650 (93.67%) | 7650 (95.01%) | 7650 (92.3%) | 7650 (91.39%) | 7650 (88.59%) | 7650 (90.22%) | 45,900 (91.86%) |
Table A11.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 7281 | 88 | 86 | 44 | 43 | 165 | 7707 (94.47%) |
| waveOut | 118 | 7409 | 92 | 63 | 27 | 136 | 7845 (94.44%) |
| fist | 35 | 26 | 7264 | 46 | 41 | 135 | 7547 (96.25%) |
| open | 72 | 65 | 118 | 7311 | 115 | 136 | 7817 (93.53%) |
| pinch | 30 | 40 | 44 | 74 | 7113 | 154 | 7455 (95.41%) |
| noGesture | 114 | 22 | 46 | 112 | 311 | 6924 | 7529 (91.96%) |
| Targets Count (Sensitivity %) | 7650 (95.18%) | 7650 (96.85%) | 7650 (94.95%) | 7650 (95.57%) | 7650 (92.98%) | 7650 (90.51%) | 45,900 (94.34%) |
Table A12.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 6562 | 157 | 285 | 311 | 703 | 178 | 8196 (80.06%) |
| waveOut | 184 | 6421 | 92 | 395 | 216 | 125 | 7433 (86.39%) |
| fist | 432 | 58 | 6676 | 265 | 687 | 156 | 8274 (80.69%) |
| open | 162 | 892 | 374 | 6207 | 1071 | 162 | 8868 (69.99%) |
| pinch | 97 | 41 | 115 | 262 | 4374 | 94 | 4983 (87.78%) |
| noGesture | 213 | 81 | 108 | 210 | 599 | 6935 | 8146 (85.13%) |
| Targets Count (Sensitivity %) | 7650 (85.78%) | 7650 (83.93%) | 7650 (87.27%) | 7650 (81.14%) | 7650 (57.18%) | 7650 (90.65%) | 45,900 (80.99%) |
Appendix C. Confusion Matrices of User-General Models
Table A13.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 6488 | 205 | 393 | 319 | 652 | 168 | 8225 (78.88%) |
| waveOut | 256 | 6422 | 79 | 399 | 292 | 127 | 7575 (84.78%) |
| fist | 343 | 27 | 6597 | 318 | 684 | 160 | 8129 (81.15%) |
| open | 200 | 908 | 323 | 6099 | 1053 | 168 | 8751 (69.69%) |
| pinch | 107 | 28 | 147 | 241 | 4286 | 86 | 4895 (87.56%) |
| noGesture | 256 | 60 | 111 | 274 | 683 | 6941 | 8325 (83.38%) |
| Targets Count (Sensitivity %) | 7650 (84.81%) | 7650 (83.95%) | 7650 (86.24%) | 7650 (79.73%) | 7650 (56.03%) | 7650 (90.73%) | 45,900 (80.25%) |
Table A14.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 6471 | 203 | 362 | 317 | 646 | 173 | 8172 (79.19%) |
| waveOut | 261 | 6475 | 87 | 420 | 280 | 123 | 7646 (84.68%) |
| fist | 369 | 25 | 6635 | 331 | 698 | 164 | 8222 (80.7%) |
| open | 199 | 865 | 324 | 6070 | 1106 | 165 | 8729 (69.54%) |
| pinch | 102 | 24 | 127 | 239 | 4231 | 87 | 4810 (87.96%) |
| noGesture | 248 | 58 | 115 | 273 | 689 | 6938 | 8321 (83.38%) |
| Targets Count (Sensitivity %) | 7650 (84.59%) | 7650 (84.64%) | 7650 (86.73%) | 7650 (79.35%) | 7650 (55.31%) | 7650 (90.69%) | 45,900 (80.22%) |
Table A15.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 6527 | 150 | 364 | 312 | 776 | 171 | 8300 (78.64%) |
| waveOut | 191 | 6405 | 59 | 374 | 255 | 124 | 7408 (86.46%) |
| fist | 390 | 44 | 6566 | 310 | 705 | 151 | 8166 (80.41%) |
| open | 200 | 908 | 388 | 6171 | 1100 | 171 | 8938 (69.04%) |
| pinch | 107 | 67 | 139 | 230 | 4179 | 80 | 4802 (87.03%) |
| noGesture | 235 | 76 | 134 | 253 | 635 | 6953 | 8286 (83.91%) |
| Targets Count (Sensitivity %) | 7650 (85.32%) | 7650 (83.73%) | 7650 (85.83%) | 7650 (80.67%) | 7650 (54.63%) | 7650 (90.89%) | 45,900 (80.18%) |
Table A16.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 6459 | 194 | 377 | 310 | 721 | 162 | 8223 (78.55%) |
| waveOut | 214 | 6357 | 87 | 430 | 252 | 126 | 7466 (85.15%) |
| fist | 419 | 85 | 6558 | 363 | 728 | 164 | 8317 (78.85%) |
| open | 208 | 908 | 380 | 6079 | 1073 | 164 | 8812 (68.99%) |
| pinch | 105 | 32 | 131 | 239 | 4204 | 84 | 4795 (87.67%) |
| noGesture | 245 | 74 | 117 | 229 | 672 | 6950 | 8287 (83.87%) |
| Targets Count (Sensitivity %) | 7650 (84.43%) | 7650 (83.1%) | 7650 (85.73%) | 7650 (79.46%) | 7650 (54.95%) | 7650 (90.85%) | 45,900 (79.75%) |
Table A17.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 6557 | 175 | 303 | 307 | 699 | 187 | 8228 (79.69%) |
| waveOut | 224 | 6450 | 82 | 406 | 231 | 127 | 7520 (85.77%) |
| fist | 396 | 40 | 6652 | 234 | 653 | 154 | 8129 (81.83%) |
| open | 153 | 864 | 367 | 6212 | 1026 | 161 | 8783 (70.73%) |
| pinch | 106 | 45 | 126 | 260 | 4437 | 94 | 5068 (87.55%) |
| noGesture | 214 | 76 | 120 | 231 | 604 | 6927 | 8172 (84.77%) |
| Targets Count (Sensitivity %) | 7650 (85.71%) | 7650 (84.31%) | 7650 (86.95%) | 7650 (81.2%) | 7650 (58%) | 7650 (90.55%) | 45,900 (81.12%) |
Table A18.
| Predictions \ Targets | waveIn | waveOut | fist | open | pinch | noGesture | Predictions Count (Precision %) |
|---|---|---|---|---|---|---|---|
| waveIn | 6562 | 157 | 285 | 311 | 703 | 178 | 8196 (80.06%) |
| waveOut | 184 | 6421 | 92 | 395 | 216 | 125 | 7433 (86.39%) |
| fist | 432 | 58 | 6676 | 265 | 687 | 156 | 8274 (80.69%) |
| open | 162 | 892 | 374 | 6207 | 1071 | 162 | 8868 (69.99%) |
| pinch | 97 | 41 | 115 | 262 | 4374 | 94 | 4983 (87.78%) |
| noGesture | 213 | 81 | 108 | 210 | 599 | 6935 | 8146 (85.13%) |
| Targets Count (Sensitivity %) | 7650 (85.78%) | 7650 (83.93%) | 7650 (87.27%) | 7650 (81.14%) | 7650 (57.18%) | 7650 (90.65%) | 45,900 (80.99%) |
Appendix D. Description of Gestures Used in Other Works Found in the Literature
Author Contributions
Conceptualization, L.I.B.L., Á.L.V.C. and M.E.B.; investigation, L.I.B.L., Á.L.V.C., V.H.V., M.Á. and M.E.B.; project administration, L.I.B.L., Á.L.V.C. and M.E.B.; resources, L.I.B.L., Á.L.V.C., M.Á. and M.E.B.; supervision, L.I.B.L., Á.L.V.C. and M.E.B.; validation, L.I.B.L., Á.L.V.C. and M.E.B.; data curation, V.H.V., J.A.Z. and M.Á.; software, V.H.V. and J.A.Z.; visualization, V.H.V. and J.P.V.; methodology, M.E.B.; formal analysis, M.E.B.; funding acquisition, M.Á. and M.E.B.; writing—original draft, L.I.B.L., Á.L.V.C., V.H.V., J.A.Z. and J.P.V.; writing—review & editing, L.I.B.L., Á.L.V.C., V.H.V., J.P.V. and M.E.B. All authors have read and agreed to the published version of the manuscript.
Funding
The authors gratefully acknowledge the financial support provided by the Escuela Politécnica Nacional (EPN) and the Corporación Ecuatoriana para el Desarrollo de la Investigación y la Academia (CEDIA) for the development of the research projects PIE-CEPRA-XIII-2019-13 and CEPRA-XIII-2019-13-Reconocimiento de Gestos, respectively.
Conflicts of Interest
The authors declare no conflict of interest.
Dataset and Code Availability
The dataset is available at https://laboratorio-ia.epn.edu.ec/es/recursos/dataset/2020_emg_dataset_612, and the source code is available at https://github.com/laboratorioAI/2020_ROT_SVM_EPN.
Footnotes
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Jaramillo-Yánez A., Benalcázar M.E., Mena-Maldonado E. Real-Time Hand Gesture Recognition Using Surface Electromyography and Machine Learning: A Systematic Literature Review. Sensors. 2020;20:2467. doi: 10.3390/s20092467. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Archer D. Unspoken diversity: Cultural Differences in Gestures. Qual. Sociol. 1997;20:79–105. doi: 10.1023/A:1024716331692. [DOI] [Google Scholar]
- 3.Saggio G., Orengo G., Pallotti A., Errico V., Ricci M. Sensory Systems for Human Body Gesture Recognition and Motion Capture; Proceedings of the 2018 International Symposium on Networks, Computers and Communications (ISNCC); Rome, Italy. 19–21 June 2018; pp. 1–6. [DOI] [Google Scholar]
- 4.Athira P.K., Sruthi C.J., Lijiya A. A Signer Independent Sign Language Recognition with Co-Articulation Elimination from Live Videos: An Indian Scenario. J. King Saud Univ. Comput. Inf. Sci. 2019 doi: 10.1016/j.jksuci.2019.05.002. [DOI] [Google Scholar]
- 5.Sidig A.A.I., Luqman H., Mahmoud S.A. Lecture Notes on Data Engineering and Communications Technologies. Springer; Cham, Switzerland: 2017. Arabic Sign Language Recognition Using Optical Flow-Based Features and HMM; pp. 297–305. [DOI] [Google Scholar]
- 6.Wang N., Lao K., Zhang X. Design and Myoelectric Control of an Anthropomorphic Prosthetic Hand. J. Bionic Eng. 2017;14:47–59. doi: 10.1016/S1672-6529(16)60377-3. [DOI] [Google Scholar]
- 7.Tavakoli M., Benussi C., Lourenco J.L. Single Channel Surface EMG Control of Advanced Prosthetic Hands: A Simple, Low Cost and Efficient Approach. Expert Syst. Appl. 2017;79:322–332. doi: 10.1016/j.eswa.2017.03.012. [DOI] [Google Scholar]
- 8.Ullah A., Ali S., Khan I., Khan M.A., Faizullah S. Effect of Analysis Window and Feature Selection on Classification of Hand Movements Using EMG Signal. In: Arai K., Kapoor S., Bhatia R., editors. Intelligent Systems and Applications. Springer International Publishing; Cham, Switzerland: 2020. pp. 400–415. [Google Scholar]
- 9.Bermeo-Calderon J., Velasco M.A., Rojas J.L., Villarreal-Lopez J., Galvis Resrepo E. Movement Control System for a Transradial Prosthesis Using Myoelectric Signals. In: Cortes Tobar D.F., Hoang Duy V., Trong Dao T., editors. AETA 2019—Recent Advances in Electrical Engineering and Related Sciences: Theory and Application. Springer International Publishing; Cham, Switzerland: 2020. pp. 273–282. [Google Scholar]
- 10.Liu H., Wang L. Gesture Recognition for Human-Robot Collaboration: A Review. Int. J. Ind. Ergon. 2018;68:355–367. doi: 10.1016/j.ergon.2017.02.004. [DOI] [Google Scholar]
- 11.Wang J., Tang L., Bronlund J.E. Pattern Recognition-Based Real Time Myoelectric System for Robotic Hand Control; Proceedings of the 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA); Xi’an, China. 19–21 June 2019; pp. 1598–1605. [Google Scholar]
- 12.Lu L., Mao J., Wang W., Ding G., Zhang Z. A Study of Personal Recognition Method Based on EMG Signal. IEEE Trans. Biomed. Circuits Syst. 2020;14:681–691. doi: 10.1109/TBCAS.2020.3005148. [DOI] [PubMed] [Google Scholar]
- 13.Chang J., Phinyomark A., Bateman S., Scheme E. Wearable EMG-Based Gesture Recognition Systems During Activities of Daily Living: An Exploratory Study; Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine Biology Society (EMBC); Montreal, QC, Canada. 20–24 July 2020; pp. 3448–3451. [DOI] [PubMed] [Google Scholar]
- 14.Wachs J., Stern H., Edan Y., Gillam M., Feied C., Smith M., Handler J. Advances in Soft Computing. Volume 36. Springer; Berlin/Heidelberg, Germany: 2006. A Real-Time Hand Gesture Interface for Medical Visualization Applications; pp. 153–162. [DOI] [Google Scholar]
- 15.Wipfli R., Dubois-Ferrière V., Budry S., Hoffmeyer P., Lovis C. Gesture-Controlled Image Management for Operating Room: A Randomized Crossover Study to Compare Interaction Using Gestures, Mouse, and Third Person Relaying. PLoS ONE. 2016;11:e0153596. doi: 10.1371/journal.pone.0153596. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Jacob M.G., Wachs J.P., Packer R.A. Hand-Gesture-Based Sterile Interface for the Operating Room Using Contextual Cues for the Navigation of Radiological Images. J. Am. Med. Inform. Assoc. 2013;20:e183–e186. doi: 10.1136/amiajnl-2012-001212. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Andronache C., Negru M., Neacsu A., Cioroiu G., Radoi A., Burileanu C. Towards extending real-time EMG-based gesture recognition system; Proceedings of the 2020 43rd International Conference on Telecommunications and Signal Processing (TSP); Milan, Italy. 7–9 July 2020; pp. 301–304. [Google Scholar]
- 18.Chahid A., Khushaba R., Al-Jumaily A., Laleg-Kirati T. A Position Weight Matrix Feature Extraction Algorithm Improves Hand Gesture Recognition; Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine Biology Society (EMBC); Montreal, QC, Canada. 20–24 July 2020; pp. 5765–5768. [DOI] [PubMed] [Google Scholar]
- 19.Iyer D., Mohammad F., Guo Y., Al Safadi E., Smiley B.J., Liang Z., Jain N.K. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 9729. Springer; Cham, Switzerland: 2016. Generalized Hand Gesture Recognition for Wearable Devices in IoT: Application and Implementation Challenges; pp. 346–355. [DOI] [Google Scholar]
- 20.Moschetti A., Fiorini L., Esposito D., Dario P., Cavallo F. Recognition of Daily Gestures with Wearable Inertial Rings and Bracelets. Sensors. 2016;16:1341. doi: 10.3390/s16081341. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Palmeri M., Vella F., Infantino I., Gaglio S. Smart Innovation, Systems and Technologies. Volume 76. Springer; Cham, Switzerland: 2018. Sign Languages Recognition Based on Neural Network Architecture; pp. 109–118. [DOI] [Google Scholar]
- 22.Abhishek K.S., Qubeley L.C.K., Ho D. Glove-Based Hand Gesture Recognition Sign Language Translator Using Capacitive Touch Sensor; Proceedings of the 2016 IEEE International Conference on Electron Devices and Solid-State Circuits (EDSSC); Hong Kong, China. 3–5 August 2016; pp. 334–337. [Google Scholar]
- 23.Benatti S., Rovere G., Bosser J., Montagna F., Farella E., Glaser H., Schonle P., Burger T., Fateh S., Huang Q., et al. A Sub-10mW Real-Time Implementation for EMG Hand Gesture Recognition Based on a Multi-Core Biomedical SoC; Proceedings of the 2017 7th International Workshop on Advances in Sensors and Interfaces (IWASI); Vieste, Italy. 15–16 June 2017; pp. 139–144. [Google Scholar]
- 24.Weiss L.D., Weiss J.M., Silver J.K. Easy EMG: A Guide to Performing Nerve Conduction Studies and Electromyography. Elsevier; Amsterdam, The Netherlands: 2015. p. 304. [Google Scholar]
- 25.Farina D., Jiang N., Rehbaum H., Holobar A., Graimann B., Dietl H., Aszmann O.C. The Extraction of Neural Information from the Surface EMG for the Control of Upper-Limb Prostheses: Emerging Avenues and Challenges. IEEE Trans. Neural Syst. Rehabil. Eng. 2014;22:797–809. doi: 10.1109/TNSRE.2014.2305111. [DOI] [PubMed] [Google Scholar]
- 26.Barros P., Maciel-Junior N.T., Fernandes B.J., Bezerra B.L., Fernandes S.M. A Dynamic Gesture Recognition and Prediction System Using the Convexity Approach. Comput. Vis. Image Underst. 2017;155:139–149. doi: 10.1016/j.cviu.2016.10.006. [DOI] [Google Scholar]
- 27.Benalcazar M.E., Motoche C., Zea J.A., Jaramillo A.G., Anchundia C.E., Zambrano P., Segura M., Benalcazar Palacios F., Perez M. Real-Time Hand Gesture Recognition Using the Myo Armband and Muscle Activity Detection; Proceedings of the 2017 IEEE 2nd Ecuador Technical Chapters Meeting (ETCM); Salinas, Ecuador. 16–20 October 2017; pp. 1–6. [DOI] [Google Scholar]
- 28.Scherer R., Rao R. Handbook of Research on Personal Autonomy Technologies and Disability Informatics. IGI Global; Hershey, PA, USA: 2011. Non-Manual Control Devices; pp. 233–250. [DOI] [Google Scholar]
- 29.Chung E.A., Benalcázar M.E. Real-time Hand Gesture Recognition Model Using Deep Learning Techniques and EMG Signals; Proceedings of the European Signal Processing Conference. European Signal Processing Conference (EUSIPCO); A Coruna, Spain. 2–6 September 2019; [DOI] [Google Scholar]
- 30.Fajardo J.M., Gomez O., Prieto F. EMG hand gesture classification using handcrafted and deep features. Biomed. Signal Process. Control. 2020;63:102210. doi: 10.1016/j.bspc.2020.102210. [DOI] [Google Scholar]
- 31.Pinzón-Arenas J.O., Jiménez-Moreno R., Rubiano A. Percentage estimation of muscular activity of the forearm by means of EMG signals based on the gesture recognized using CNN. Sens. Bio-Sens. Res. 2020;29:100353. doi: 10.1016/j.sbsr.2020.100353. [DOI] [Google Scholar]
- 32.Zanghieri M., Benatti S., Burrello A., Kartsch V., Conti F., Benini L. Robust real-time embedded emg recognition framework using temporal convolutional networks on a multicore iot processor. IEEE Trans. Biomed. Circuits Syst. 2019;14:244–256. doi: 10.1109/TBCAS.2019.2959160. [DOI] [PubMed] [Google Scholar]
- 33.Asif A.R., Waris A., Gilani S.O., Jamil M., Ashraf H., Shafique M., Niazi I.K. Performance Evaluation of Convolutional Neural Network for Hand Gesture Recognition Using EMG. Sensors. 2020;20:1642. doi: 10.3390/s20061642. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Chen H., Zhang Y., Li G., Fang Y., Liu H. Surface electromyography feature extraction via convolutional neural network. Int. J. Mach. Learn. Cybern. 2020;11:185–196. doi: 10.1007/s13042-019-00966-x. [DOI] [Google Scholar]
- 35.Yang W., Yang D., Liu Y., Liu H. EMG pattern recognition using convolutional neural network with different scale signal/spectra input. Int. J. Humanoid Robot. 2019;16:1950013. doi: 10.1142/S0219843619500130. [DOI] [Google Scholar]
- 36.Raez M.B.I., Hussain M.S., Mohd-Yasin F. Techniques of EMG Signal Analysis: Detection, Processing, Classification and Applications. Biol. Proced. Online. 2006;8:11–35. doi: 10.1251/bpo115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Ameur S., Khalifa A.B., Bouhlel M.S. A Comprehensive Leap Motion Database for Hand Gesture Recognition; Proceedings of the 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT); Hammamet, Tunisia. 18–20 December 2016; pp. 514–519. [DOI] [Google Scholar]
- 38.Winarno H., Poernama A., Soesanti I., Nugroho H. Journal of Physics: Conference Series. Volume 1577. IOP Publishing; Bristol, UK: 2020. Evaluation on EMG Electrode Reduction in Recognizing the Pattern of Hand Gesture by Using SVM Method; p. 012044. [Google Scholar]
- 39.Zhang Z., Yang K., Qian J., Zhang L. Real-time surface emg real pattern recognition for hand gestures based on an artificial neural network. Sensors. 2019;19:3170. doi: 10.3390/s19143170. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Jaramillo-Yanez A., Unapanta L., Benalcázar M.E. Short-Term Hand Gesture Recognition using Electromyography in the Transient State, Support Vector Machines, and Discrete Wavelet Transform; Proceedings of the 2019 IEEE Latin American Conference on Computational Intelligence (LA-CCI); Guayaquil, Ecuador. 11–15 November 2019; pp. 1–6. [Google Scholar]
- 41.Pamungkas D.S., Simatupang I. Comparison EMG Pattern Recognition Using Bayes and NN Methods; Proceedings of the 2020 3rd International Conference on Mechanical, Electronics, Computer, and Industrial Technology (MECnIT); Medan, Indonesia. 25–27 June 2020; pp. 1–4. [Google Scholar]
- 42.Mohanty A., Rambhatla S.S., Sahay R.R. Advances in Intelligent Systems and Computing. Volume 460. Springer; Singapore: 2017. Deep Gesture: Static Hand Gesture Recognition Using CNN; pp. 449–461. [DOI] [Google Scholar]
- 43.Jabbari M., Khushaba R.N., Nazarpour K. EMG-Based Hand Gesture Classification with Long Short-Term Memory Deep Recurrent Neural Networks; Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC); Montreal, QC, Canada. 20–24 July 2020; pp. 3302–3305. [DOI] [PubMed] [Google Scholar]
- 44.Neacsu A.A., Cioroiu G., Radoi A., Burileanu C. Automatic emg-based hand gesture recognition system using time-domain descriptors and fully-connected neural networks; Proceedings of the 2019 42nd International Conference on Telecommunications and Signal Processing (TSP); Budapest, Hungary. 1–3 July 2019; pp. 232–235. [Google Scholar]
- 45.Kocejko T., Brzezinski F., Polinski A., Ruminski J., Wtorek J. Neural network based algorithm for hand gesture detection in a low-cost microprocessor applications; Proceedings of the 2020 13th International Conference on Human System Interaction (HSI); Tokyo, Japan. 6–8 June 2020; pp. 204–209. [Google Scholar]
- 46.Zea J.A., Benalcázar M.E. Real-Time Hand Gesture Recognition: A Long Short-Term Memory Approach with Electromyography; Proceedings of the International Conference on Computer Science, Electronics and Industrial Engineering (CSEI); Ambato, Ecuador. 28–31 October 2019; Berlin/Heidelberg, Germany: Springer; 2019. pp. 155–167. [Google Scholar]
- 47.Shanmuganathan V., Yesudhas H.R., Khan M.S., Khari M., Gandomi A.H. R-CNN and wavelet feature extraction for hand gesture recognition with EMG signals. Neural Comput. Appl. 2020;32:16723–16736. doi: 10.1007/s00521-020-05349-w. [DOI] [Google Scholar]
- 48.Simão M., Neto P., Gibaru O. EMG-based online classification of gestures with recurrent neural networks. Pattern Recognit. Lett. 2019;128:45–51. doi: 10.1016/j.patrec.2019.07.021. [DOI] [Google Scholar]
- 49.Benalcázar M.E., Anchundia C.E., Zea J.A., Zambrano P., Jaramillo A.G., Segura M. Real-Time Hand Gesture Recognition Based on Artificial Feed-Forward Neural Networks and EMG; Proceedings of the European Signal Processing Conference; Rome, Italy. 3–7 September 2018; pp. 1492–1496. [DOI] [Google Scholar]
- 50.Zea J.A., Benalcázar M.E. Real-Time Hand Gesture Recognition: A Long Short-Term Memory Approach with Electromyography. In: Nummenmaa J., Pérez-González F., Domenech-Lega B., Vaunat J., Oscar Fernández-Peña F., editors. Advances and Applications in Computer Science, Electronics and Industrial Engineering. Springer International Publishing; Cham, Switzerland: 2020. pp. 155–167. [Google Scholar]
- 51.Hudgins B., Parker P., Scott R.N. A New Strategy for Multifunction Myoelectric Control. IEEE Trans. Biomed. Eng. 1993;40:82–94. doi: 10.1109/10.204774. [DOI] [PubMed] [Google Scholar]
- 52.Miller R.B. AFIPS ’68 (Fall, Part I): Proceedings of the December 9–11, 1968, Fall Joint Computer Conference, Part I. ACM Press; New York, NY, USA: 1968. Response Time in Man-Computer Conversational Transactions; p. 267. [DOI] [Google Scholar]
- 53.Li G., Zhang R., Ritchie M., Griffiths H. Sparsity-Based Dynamic Hand Gesture Recognition Using Micro-Doppler Signatures; Proceedings of the 2017 IEEE Radar Conference (RadarConf); Seattle, WA, USA. 8–12 May 2017; pp. 0928–0931. [Google Scholar]
- 54.Kerber F., Puhl M., Krüger A. MobileHCI ’17: Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services. Association for Computing Machinery, Inc.; New York, NY, USA: 2017. User-Independent Real-Time Hand Gesture Recognition Based on Surface Electromyography; pp. 1–7. [DOI] [Google Scholar]
- 55.Hargrove L., Englehart K., Hudgins B. A Training Strategy to Reduce Classification Degradation due to Electrode Displacements in Pattern Recognition based Myoelectric Control. Biomed. Signal Process. Control. 2008;3:175–180. doi: 10.1016/j.bspc.2007.11.005. [DOI] [Google Scholar]
- 56.Sueaseenak D., Uburi T., Tirasuwannarat P. Optimal Placement of Multi-Channels sEMG Electrod for Finger Movement Classification; Proceedings of the 2017 4th International Conference on Biomedical and Bioinformatics Engineering; Seoul, Korea. 14 November 2017; pp. 78–83. [Google Scholar]
- 57.Boschmann A., Platzner M. Reducing Classification Accuracy Degradation of Pattern Recognition Based Myoelectric Control Caused by Electrode Shift Using a High Density Electrode Array; Proceedings of the 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society; San Diego, CA, USA. 28 August–1 September 2012; pp. 4324–4327. [DOI] [PubMed] [Google Scholar]
- 58.Zhang Y., Chen Y., Yu H., Yang X., Lu W., Liu H. Wearing-Independent Hand Gesture Recognition Method Based on EMG Armband. Pers. Ubiquitous Comput. 2018;22:511–524. doi: 10.1007/s00779-018-1152-3. [DOI] [Google Scholar]
- 59.Xu Z., Shen L., Qian J., Zhang Z. Advanced Hand Gesture Prediction Robust to Electrode Shift with an Arbitrary Angle. Sensors. 2020;20:1113. doi: 10.3390/s20041113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Benalcázar M., Barona L., Valdivieso L., Aguas X., Zea J. EMG-EPN-612 Dataset. [(accessed on 28 October 2020)];2020 doi: 10.5281/zenodo.4027874. Available online: [DOI]
- 61.Artificial Intelligence and Computer Vision Research Lab, Escuela Politécnica Nacional EMG-EPN-612. [(accessed on 28 October 2020)]; Available online: https://laboratorio-ia.epn.edu.ec/es/recursos/dataset/2020_emg_dataset_612.
- 62.Artificial Intelligence and Computer Vision Research Lab, Escuela Politécnica Nacional Code for the Paper “An Energy-Based Method for Orientation Correction of EMG Bracelet Sensors in Hand Gesture Recognition Systems”. [(accessed on 28 October 2020)]; doi: 10.3390/s20216327. Available online: https://github.com/laboratorioAI/2020_ROT_SVM_EPN. [DOI] [PMC free article] [PubMed]
- 63.Vimos V.H., Benalcázar M., Oña A.F., Cruz P.J. A Novel Technique for Improving the Robustness to Sensor Rotation in Hand Gesture Recognition Using sEMG; Proceedings of the International Conference on Computer Science, Electronics and Industrial Engineering (CSEI); Ambato, Ecuador. 28–31 October 2019; Berlin/Heidelberg, Germany: Springer; 2019. pp. 226–243. [Google Scholar]
- 64.Artificial Intelligence and Computer Vision Research Lab, Escuela Politécnica Nacional 2020_ROT_SVM_EPN. [(accessed on 28 October 2020)]; Available online: https://laboratorio-ia.epn.edu.ec/es/recursos/dataset-y-aplicaciones-2/2020_rot_svm_epn.
- 65.Feldman M. Hilbert Transform, Envelope, Instantaneous Phase, and Frequency. John Wiley & Sons, Ltd.; Hoboken, NJ, USA: 2009. [DOI] [Google Scholar]
- 66.Reig Albiñana D. Ph.D. Thesis. Universitat Politècnica de València; Valencia, Spain: 2015. Implementación de Algoritmos Para la Extracción de Patrones Característicos en Sistemas de Reconocimiento De Voz en Matlab. [Google Scholar]
- 67.Vapnik V. The Nature of Statistical Learning Theory. Springer; Berlin/Heidelberg, Germany: 2000. Statistics for engineering and information science. [Google Scholar]
- 68.Paul Y., Goyal V., Jaswal R.A. Comparative analysis between SVM & KNN classifier for EMG signal classification on elementary time domain features; Proceedings of the 2017 4th International Conference on Signal Processing, Computing and Control (ISPCC); Solan, India. 21–23 September 2017; pp. 169–175. [Google Scholar]
- 69.Hasan M.T. Comparison between kNN and SVM for EMG Signal Classification. Int. J. Recent Innov. Trends Comput. Commun. 2015;3:6799–6801. [Google Scholar]
- 70.Wahid M.F., Tafreshi R., Langari R. A Multi-Window Majority Voting Strategy to Improve Hand Gesture Recognition Accuracies Using Electromyography Signal. IEEE Trans. Neural Syst. Rehabil. Eng. 2019;28:427–436. doi: 10.1109/TNSRE.2019.2961706. [DOI] [PubMed] [Google Scholar]