Skip to main content
Computational and Mathematical Methods in Medicine logoLink to Computational and Mathematical Methods in Medicine
. 2022 Apr 8;2022:2323625. doi: 10.1155/2022/2323625

Interpatient ECG Arrhythmia Detection by Residual Attention CNN

Pengyao Xu 1, Hui Liu 1, Xiaoyun Xie 1, Shuwang Zhou 1,2, Minglei Shu 1, Yinglong Wang 1,
PMCID: PMC9012615  PMID: 35432590

Abstract

The precise identification of arrhythmia is critical in electrocardiogram (ECG) research. Many automatic classification methods have been suggested so far. However, efficient and accurate classification is still a challenge due to the limited feature extraction and model generalization ability. We integrate attention mechanism and residual skip connection into the U-Net (RA-UNET); besides, a skip connection between the RA-UNET and a residual block is executed as a residual attention convolutional neural network (RA-CNN) for accurate classification. The model was evaluated using the MIT-BIH arrhythmia database and achieved an accuracy of 98.5% and F1 scores for the classes S and V of 82.8% and 91.7%, respectively, which is far superior to other approaches.

1. Introduction

The latest survey statistics on global causes of mortality and disability of the World Health Organization demonstrate that cardiovascular disease (CVD) is one of the most serious diseases that threaten human health. The ECG signal reflects the electrical activity of the heart and is the primary basis for the diagnosis of CVD. With the development of computer technology, automatic arrhythmia detection technology has become a research hotspot.

Traditional machine learning approaches such as independent component analysis [13], principal component analysis (PCA) [4], support vector machine (SVM) [5], and K-nearest neighbor (KNN) [6] have been utilized to identify arrhythmias. However, these methods require artificial feature extraction and intervention. With the development of technology, deep learning has gradually become the mainstream method for automatic ECG classification [7]. There are mainly two kinds of deep learning approaches from the perspective of the dimension of ECG representation, i.e., one-dimensional (1-D) and two-dimensional (2-D).

Some studies exploit the original ECG signal as the model input. Although the proposed 1-D deep convolutional neural network (CNN) has achieved good classification results [8, 9], however, beat-by-beat classification cannot be achieved due to the fixed time window size. Lin et al. [10] proposed a method based on normalized and nonnormalized RR intervals that extract ECG morphology by wavelet analysis and linear prediction model, but this method requires lots of signal preprocessing and has low prediction accuracy. Llamedo and Martínez [11] proposed a method based on a linear classifier and a clustering algorithm; however, the clustering algorithm cannot effectively represent class at the edge, making more likely arrhythmia misjudgment. In addition, the abovementioned 1-D studies also introduced a small degree of preprocessing.

The ECG signal can also be converted from one dimension into two dimensions in various manners, such as frequency spectrum and time-frequency images. Al Rahhal et al. [12] use the continuous wavelet transform (CWT) to generate time-frequency information, then migration learning. However, denoising and data augmentation operations reduce model efficiency. Xia et al. [13] use the heartbeat extraction method to convert multiple signals contained within 5 s into an image. However, the proposed structure not only limits the effect of the model due to the immutability of the short-time Fourier transform window but also easily causes misjudgment of normal data in verification because as long as one of the multiple heartbeats contained in the image is abnormal, the entire image will be marked as abnormal. Li et al. [14] exploited three distinct types of wavelet transforms paired with CNN to create a depth technique for automatically distinguishing time-frequency images, which identified ventricular ectopic heartbeat (V) as more than 97%; however, preprocessing operations such as noise reduction increase the complexity of the model. Salem et al. [15] utilized DenseNet to classify ECG spectra from the perspective of transfer learning, but it also has the same risk of misjudgment as [13]. But in terms of overall performance, the 2-D ECG data is weaker than the 1-D signal noise interference, which has also been proved in the research [16, 17].

In order to solve the problems of cumbersome preprocessing and difficult beat-by-beat classification in the above research, inspired by structural variants such as fully convolutional network, U-Net, residual network, and attention mechanism [1827] that have been successfully used in various tasks (such as natural image classification and medical image segmentation), this paper proposes an RA-CNN model for the classification of arrhythmia between patients. Firstly, the CWT is used to convert the ECG heartbeat into an image and classes with much fewer samples are enhanced by data augmentation techniques. Secondly, the attention mechanism and residual skip connection are integrated into the U-Net which is called residual attention U-Net (RA-UNET). Finally, the RA-CNN constitutes by a skip connection between the RA-UNET and a residual block. We trained and tested the models on the MIT-BIH database, and the final experimental results demonstrate the superiority of the proposed method.

The main advantages of the proposed method are summarized as follows:

  1. The converted 2-D ECG will improve the effective area that the model can learn and use data enhancement methods to make up for the deficiency of waveforms [28]. The data enhancement on 1-D ECG may change its time domain information, but this problem does not exist in 2-D images

  2. A new residual block (R-block) with judgment branches is proposed as the basic module of RA-CNN; it judges whether to retain the original feature map and thus solves the performance degradation

  3. RA-UNET integrates the “split-transform-fusion” principle, splits the feature map into two groups after each sampling operation, uses the two branches of spatial and channel generate attention weights in parallel, and then fuses the weight feature maps of the two branches together to guide model learning

The rest of this paper is organized as follows. The proposed model is discussed in detail in Section 2, followed by the experimental design and verification in Section 3. Conclusions are finally drawn in Section 4.

2. Methodology

2.1. Database

The suggested approach is trained and evaluated using the MIT-BIH arrhythmia database [29]. It was developed in collaboration between the Massachusetts Institute of Technology and Beth Israel Hospital in Boston and is now considered one of the three primary databases in academic circles. The database contains 48 Holter records from 25 men and 22 women between the ages of 32 and 89 (of which 201 and 202 are from the same male), all of which have significant variances. Each recording is a dual-channel signal with a sampling rate of 360 Hz and a length of slightly more than 30 minutes, with the R peak value of each heartbeat indicated.

2.2. Preprocessing

2.2.1. ECG Heartbeat Segmentation

Because each heartbeat in an ECG has a distinct duration, the length of it segmented from an ECG is not equal. Different methods of heartbeat segmentation were employed in the literature [3032] in the study of 2-D.

We directly used the R peak position in the MIT-BIH database without additional positioning and confirmed the beat length after positioning the QRS complex according to the R peak position [33]. Rcurrent, Rprevious, and Rlast represent the R wave peaks of the currently located heartbeat and the adjacent heartbeats before and after; the R-R interval between two adjacent R waves is regarded as a segment. In order to fully ensure the integrity of the segmented heartbeat medical information, the middle 3/4 position of the two R peaks of Rprevious and Rlast is taken as the intercepted heartbeat length; therefore, the intercepted n-th heartbeat can be expressed as Formula (1) (Figure 1):

EBeat=3RlastRprevious4, (1)

where EBeat represent the extracted heartbeat, Rprevious and Rlast, respectively, represent the abscissa values of the previous and next heartbeat of the extracted heartbeat on the coordinate axis. If the extracted heartbeat has no heartbeat Rprevious or Rlast, the coordinates correspond to the heartbeat; then the current heartbeat will not be segmented.

Figure 1.

Figure 1

Heartbeat segmentation schematic diagram.

2.2.2. Transforming the 1-D ECG into 2-D ECG

After determining the sampling length of each beat, the 1-D ECG is converted to the time-frequency domain by CWT [28]. The choice of CWT is motivated by its success at analyzing ECG signals. The dimension of this output is higher than the dimension of the input. Unlike feature reduction, overcomplete representations allow finding more robust and sparse feature representations from the data [12]. For ECG time series, its CWT relative to a given mother wavelet EBeat is defined as follows:

Ca,bEBeatt=1a1/2EBeattψtbadt. (2)

Among them, a and b are the scale and translation parameters, respectively. EBeat(t) is the given signal; ψ is the mother wavelet.

2.2.3. Heartbeat Augmentation

Even in patients with arrhythmia, the majority of the swings in the ECG analysis are normal signals, leading to fewer damage data in the ECG database. The use of data augmentation techniques to boost damage data can effectively make up for the absence of training data. Decrease the danger of overfitting, and increase the algorithm's robustness.

According to the characteristics of the 2-D ECG waveform, this article will move the beat to the left and right, move up, and move down to obtain multiple enhanced heartbeat images. The signal characteristics in the original ECG can be significantly retained by using the augmented images [3436]. Multiple focal heartbeat data can be created after performing the preceding technique on the original ECG. In Figure 2, step (i) depicts the process of turning the extracted heartbeat into an image and step (ii) depicts a portion of the data augmentation impacts.

Figure 2.

Figure 2

2-D data generation and enhancement.

The abovementioned heartbeat enhancement approach is utilized to improve the data in DS1 (introduced in detail in this work 3.1.3). Following processing, the data balance is achieved in order to properly train the RA-CNN model. Table 1 shows the number and percentage of heartbeats before and after enhancement.

Table 1.

Data comparison before and after dataset enhancement.

Database Enhancement Type Number of heart beats Total
N S V F
MIT-BIH Before Amount 90042 2779 7007 802 100630
Percentage (%) 89.48 2.76 6.96 0.80
After Amount 90042 20696 41099 8668 160505
Percentage (%) 56.10 12.89 25.61 5.40

2.3. Model Architecture

Figure 3 shows the overall flowchart of the proposed RA-CNN model to classify arrhythmia. The encoding as images module (left) is the preprocessing process in this work 2.2 to use CWT transform the 1-D ECG into 2-D ECG heartbeat. The RA-CNN model (middle) is designed to learn 2-D ECG features so as to transform it to the forms that easy to classify. The arrhythmia prediction module (right) realizes the classification in terms of the output of RA-CNN according to arrhythmias in the AAMI standard.

Figure 3.

Figure 3

RA-CNN model training flowchart.

The RA-CNN model consists of three parts: top layer, middle layer, and bottom layer (as shown in Figure 4). The left part of the top layer uses conv2d, avg pooling, and R-block to perform a certain degree of feature reduction on the 2-D ECG image, which is conducive to reducing the size of the input (record the output as initial feature map) and expanding the receptive fields. The right part of the top layer reduces the image dimension to 1 × 1 in order to classify by multiple consecutive R-block and avg pooling. The skip connection in the top layer is to connect the initial feature map and the output features of the other two layers. In the middle layer, the initial feature map passes through only an R-block and then connects with the output of the bottom layer. The bottom layer is residual attention U-Net (RA-UNET) which is an hourglass structure from top-to-bottom to bottom-to-top, i.e., from downsampling to upsampling; the downsampling is achieved by R-block that extracts the essential features from high-dimensional images and upsampling to be done by bilinear interpolation. A-block is applied after each downsampling and upsampling to intensify the output by generating the attention weight distribution, so that the model can efficiently focus on the appropriate area of the ECG feature. At the same time, each output of downsampling is used as a carrier to save the characteristics of the feature map via the skip connection with the output of the upsampling in the same size, which prevents inaccurate feature reconstruction. The number of image channels and size changes in the RA-CNN model structure are shown in Table 2.

  1. Residual block (R-block): it is an encapsulated residual module with several convolution layers as the network infrastructure; it performs general feature learning operations or dimensionality reduction operations (such as 2.3.1).

  2. Residual Attention UNET (RA-UNET): it includes a complete downsampling and upsampling process through the hourglass structure; the module has fully learned the inherent characteristics of 2-D ECG. RA-UNET converts the intrinsic feature map output of each upsampling into an attention mask to guide the feature learning of the model through skip connection, so that the model can suppress the worthless area of the feature map while enhancing specific important information (such as 2.3.2).

  3. Attention block (A-block): channel attention and spatial attention are learned in parallel by grouping feature maps along the channel axis to achieve more accurate attention to important information areas (such as 2.3.3).

Figure 4.

Figure 4

RA-CNN model.

Table 2.

The number of channels and output dimensions of each layer.

Layer name Operate Kernel size Stride Output size Channels
Input 224 × 224 3
Top layer conv2d 7 × 7 2 112 × 112 16
Max Pool2d 3 × 3 2 56 × 56 16
R-block1 conv2d,1×1,4conv2d,3×3,4conv2d,1×1,16×1 1
1
1
56 × 56 16
Middle layer R-block5 conv2d,1×1,4conv2d,3×3,4conv2d,1×1,16×1 1
1
1
56 × 56 16
Bottom layer RA-UNET 56 × 56 16
Top layer R-block2 conv2d,1×1,8conv2d,3×3,8conv2d,1×1,32×1 1
2
1
28 × 28 32
R-block3 conv2d,1×1,16conv2d,3×3,16conv2d,1×1,64×1 1
2
1
14 × 14 64
R-block4 conv2d,1×1,16conv2d,3×3,16conv2d,1×1,64×1 1
2
1
7 × 7 64
Avg Pool2d 7 × 7 1 1 × 1 64
Output 4

2.3.1. R-Block

R-block is a basic residual block with judgment branches, which is made up of three BatchNorm2d-Relu-Conv2d layers and then distributed throughout the RA-CNN model to accomplish the general function of feature processing.

Figure 5 shows the structural details of the R-block, which was inspired by the ResNet to solve the “degradation” problem caused by very deep levels and designed a structure with a judgment function (the Exit? branch shown in Figure 5), which decides whether to retain more original feature information by setting different steps and channels, so the purpose of it is to ensure that the essential characteristics of the feature map will not be destroyed to the maximum extent. Therefore, we can set appropriate parameters for different needs, followed by the residual connection.

Figure 5.

Figure 5

R-block.

For the input XR of R-block, the expected output R(XR) can be expressed as

RXR=fiθTσ1XRXRi,fiθTσ1XRf1θTσ1XRii, (3)

where fi(∙) is the ith BatchNorm2d-Relu-Conv2d operation, θ is a convolution operation, σ1(∙) is a ReLU function, and ⊕ denotes the element-wise sum. In the Exit? process of judgment, when the number of input and output channels is equal or the convolution step is 1, the flow is shown in process (i) of Figure 5 and the expected output R(XR) is shown in the formula 3-(i). If not, the flow is shown in process (ii) in Figure 5 and the expected output R(XR) is shown in formula 3-(ii). Then the final output feature map R(XR) ∈ C×H×W.

The R-block solves the problem of degradation and gradient disappearance through the residual connection with judging branches, which improves the network performance and reduces the feature dimension by changing the number of channels or stride in the branch structure.

2.3.2. RA-UNET

RA-UNET is an improvement of the U-Net [1822] by incorporating residual and attention mechanisms. RA-UNET is an encoder-decoder structure (as shown in Figure 6), which extracts high-level information based on three layers of downsampling and then reconstructs the feature by three layers of upsampling. In our design, the most significant thing is the attention block (A-block) inserted after each downsampling and upsampling, which can assist the model in accurate and efficient feature reduction and reconstruction. We will introduce its implementation in detail:

  1. Encoder: using max pooling to realize the resample of vital information of the input image, i.e., down sampling, at the same time, the A-block is used to strengthen the effect of key areas.

  2. Decoder: the upsampling operation is accomplished through the bilinear interpolation layer, which can be intuitively understood as the restoration process of the feature map. After each step of the upsampling operation, the A-block is also used to encourage the model to use the learned knowledge to learn more feature map information.

  3. Skip connections: in order to better train the deep network, after downsampling and completing the A-block, the R-block for feature processing not only better integrates contextual semantic features and prevents the disappearance of gradients caused by the stacking of coding layers but also acts as a carrier to save the characteristics; it can better restore the details of the same size feature map during the upsampling process, so as to improve the recognition effect of the network on the diversity of waveform changes.

Figure 6.

Figure 6

RA-UNET structure diagram.

The specific size changes and convolution kernel size during RA-UNET processing are shown in Table 3.

Table 3.

Each layer structure and input size of RA-UNET.

Name Layer Kernel size Output size Channels
Encoder Max Pool2d 3 × 3, stride 2 28 × 28 16
A-block 28 × 28 16
R-block conv2d,1×1,4conv2d,3×3,4conv2d,1×1,16×1 28 × 28 16
Max Pool2d 3 × 3, stride 2 14 × 14 16
A-block 14 × 14 16
R-block conv2d,1×1,4conv2d,3×3,4conv2d,1×1,16×1 14 × 14 16
Max Pool2d 3 × 3, stride 2 7 × 7 16
A-block 7 × 7 16

Decoder Upsample Size (14, 14) 14 × 14 16
A-block 14 × 14 16
Upsample Size (28, 28) 28 × 28 16
A-block 28 × 28 16
Upsample Size (56, 56) 56 × 56 16
A-block 56 × 56 16
R-block conv2d,1×1,4conv2d,3×3,4conv2d,1×1,16×1 56 × 56 16

Output 56 × 56 16

2.3.3. A-Block

A-block captures remote contextual information in the spatial dimension and channel dimension, respectively. The attention mechanism is an improvement in the article [24], which is used to automatically learn and calculate the contribution of input data to output data. First, the sampled feature map is divided into n groups along the channel axis, and each group of features is split into two branches for channel attention and spatial attention, respectively, and then concatenates the attention results of the two branches together. Finally, the n groups of features are merged to obtain a feature map with the same size as the input. Figure 7 shows in detail one group of attention mechanisms after channel grouping.

Figure 7.

Figure 7

A-block.

Take the feature map X ∈ C×H×W as an example, which is the output after the first use of max pooling in RA-UNET. First, divide its channel dimension into n groups of subfeatures Xi(c/nH×W (1 ≤ in); then split each subfeature along the channel axis into two branches Xi1, Xi2(c/2nH×W (1 ≤ in); hence, the channel attention is performed on the first branch to embed global information and generate channel statistical attention weight distribution by average pooling layer and softmax function. Then, the channel attention weight distribution is imposed on Xi1 to help model focus on the distinct channel, followed with the residual connection. The final output feature map Xi1′ of the channel attention can be realized as follows:

Xi1=σ2W1AVGXi1+b1Xi1Xi1. (4)

Among them, σ2(∙) represents the softmax function, AVG (∙) is the average pooling operation, W1(c/2n)×1×1 and b1(c/2n)×1×1 are parameters used for scaling and translation, and ⊗ stands for matrix multiplication.

Next, the spatial attention is performed on the second branch to generate the spatial attention map which pays more attention to the important pixel area that stands for the principal character of the feature map. What is different from channel attention is that Xi2 obtained the spatial attention weight distribution via group normalization, and other operations are similar. The final output feature map Xi2′ of spatial attention can be achieved as follows:

Xi2=σ2W2GNXi2+b2Xi2Xi2. (5)

Among them, GN(∙) denotes the group normalization, W2(c/2n)×1×1 and b2(c/2n)×1×1 are model parameters need to be trained.

In order to maintain the consistency of channel dimensions after the attention operation, the channel attention feature map and the spatial attention feature map are spliced along the channel axis.

Xi=Concat Xi1,Xi2, (6)

where Concat {∙} denotes the dimension concatenating operation and Xi′ ∈ (c/nH×W (1 ≤ in).

Finally, after n groups of feature maps are also aggregated along the channel dimension, the final attention feature map containing the weight coefficient is generated: X′ = Concat {X1′, X2′, ⋯Xn′}.

2.3.4. Arrhythmia Predication

Finally, the RA-CNN model uses a fully connected layer to perform a fully connected operation on the learned attention feature map to achieve arrhythmia classification.

3. Experimental Design

3.1. Experimental Setup

3.1.1. Experimental Environment

The data preparation section of this paper is done on an i7-10700K processor. The experiment was done with the NVIDIAA 100 graphics card and completed on the Ubuntu 18.04.3 operating system. Run PyTorch, and then use WFDB packet to process the ECG signal.

3.1.2. Classification Standard of ECG

This study used the widely used [3742] American progressive association AAMI to develop medical device ANSI/AAMI EC57:2012 standards to classify arrhythmias. Arrhythmias are divided into five classes, as shown in Table 4.

Table 4.

Classification of ECG in the MIT-BIH database using AAMI standard.

Types Contains heartbeat type
Normal (N) Normal (NOR), left bundle branch block (LBBB), right bundle branch block (RBBB), atrial escape (AE), node (junction) escape heartbeat (NE)
Ventricular ectopic heartbeat (V) Premature ventricular contraction (PVC), ventricular escape heartbeat (VE)
Fusion heartbeat (F) Fusion of ventricular and normal (FVN)
Supraventricular ectopic heartbeat or premature heartbeat (S) Atrial premature (AP), aberrant atrial premature (AaP), nodal (junctional) premature (NP), supraventricular premature (SP)
Unknown heartbeat (Q) Paced (/), fusion of paced and normal (FPN), unclassified (U), undetermined (?)

3.1.3. Database Set

The data from MIT-BIH is used to train the model in this work. This paper strictly follows the AAMI classification standard, ignoring 4 records with severe noise among the 48 records. For the remaining records, an interpatient division scheme proposed in [3742] is used. Divide into training set (DS1) and test set (DS2). DS1 contains 22 records for training and parameter determination. DS2 is only used as a test set for final performance evaluation. Using this partitioning method, there is no need to worry about including the same patient's heartbeat in both training and test sets. The number of heart beats after division is shown in Table 5.

Table 5.

Interpatient dataset partitioning scheme.

Database Datasets Partition Number of heart beats Total
N S V F
MIT-BIH DS1 Training 45824 18860 37880 8280 110844
Percentage (%) 41.34 17.01 34.17 7.47 100
DS2 Testing 44218 1836 3219 388 49661
Total 90042 20696 41099 8668 160505

3.1.4. Training Parameter Setting

The learning rate is a key training parameter in the proposed RA-CNN model. We optimize the parameters in order to train the model for the best performance in arrhythmia classification.

We set the initial learning rate to 0.001 and drop to the original 0.1 every 20 epochs. In order to reduce the memory, use a smaller batch size for training, and set the batch size to a small batch of 16; the loss function uses cross entropy error, and the optimization function uses Adam.

3.1.5. Evaluation Metrics

This study utilized the MIT-BIH arrhythmia database to evaluate the RA-CNN model according to the AAMI standard in order to test its performance. These indicators have also been employed extensively in research [3742]: classification accuracy (Acc), sensitivity (Sen), positive prediction rate (Ppr), and F1-score.

Acc is the proportion of correctly classified ECG samples to the total sample and is also the most commonly used evaluation index in all classification problems.

Acc=TP+TNTP+TN+FP+FN×100%. (7)

Sen only processes positive heartbeats, which means the ratio of the detected true positive heartbeats to the actual positive heartbeats.

Sen=TPTP+FN×100%. (8)

Ppr represents the proportion of positive heartbeats that are correctly detected among all positive heartbeats.

Ppr=TPTP+FP×100%. (9)

F 1-score is a comprehensive evaluation index of precision rate and recall rate, used to reflect the overall situation.

F1=2×Sen×PprSen+Ppr×100%. (10)

Among the above four evaluation indicators, false positive (FP) is the number of heartbeats that are misclassified. For example, it is actually a heartbeat of class N but is classified into one of the classes V, F, or S. False negative (FN) is the number of heartbeats classified in different categories; it is also a misclassification of samples. True positive (TP) is the number of heartbeats that are correctly classified. True negative (TN) is the number of heartbeats that do not belong to a certain category and are not classified as such.

3.2. Experimental Verification

3.2.1. Analysis of the Impact of A-Block on Classification Results

Figure 8 shows the heartbeat display of channel attention and spatial attention after A-block processing in the process of using RA-UNET. A-block explores attention by assigning higher weights to pixels that are helpful for accurate classification. Therefore, as the depth of the RA-UNET deepens, the pixel area that represents the ECG curve in the feature map will become more and more obvious. The RA-UNET model will not only focus more precisely on the specific area of the lower part of the image where the waveform changes more but also filter the background information. Thereby, it can “do no useless work” and has the effect of improving the classification accuracy. In the figure, (i) shows 8 beats randomly selected from 2-D ECG, (ii) shows the visualization results output by Channel attention in A-block for the first time, and (iii) shows the output result of spatial attention structure processing. Obviously, it can be seen that (iii) pays more attention to the lower area of the image than (ii) and realizes that the large-scale, multichannel features are concentrated in the key positions of the various waveforms at the bottom of the image.

Figure 8.

Figure 8

2-D ECG data processing results of the two branches of A-block.

3.2.2. Data Enhancement Experiment

Figures 9 and 10, respectively, show the best results of classification of classes N, S, V, and F ECG using RA-CNN when only setting variables for data enhancement. It can be found that the number of correctly classified samples after enhancement has increased compared with that before enhancement.

Figure 9.

Figure 9

Confusion matrix without data augmentation.

Figure 10.

Figure 10

Confusion matrix enhanced with data augmentation.

Table 6 shows the evaluation results before and after data enhancement using the indicators mentioned in 3.1.5. It can be seen that with the basic settings unchanged, the average accuracy of the data enhancement method proposed in this work has increased by about 0.8%. Other indicators have also improved, so the data enhancement method proposed in this work can promote the classification results.

Table 6.

Comparison of effects before and after data enhancement.

Enhancement ACC N (%) S (%) V (%)
SEN Ppr F 1 SEN Ppr F 1 SEN Ppr F 1
Without 97.6 98.16 98.29 98.23 75.93 71.56 73.68 89.93 82.95 86.30
Proposed 98.5 98.87 98.64 98.75 83.06 82.48 82.77 93.46 90.04 91.72

The final experimental results show that the model has a good classification effect on class N and class V, while the class S classification effect is significantly lower than the other two classes. The main reason is that the number of training samples for class S is significantly less than the other two categories even with data enhancement. The second is that the similarity of the waveforms between class S and class N is extremely high, causing the two types of samples to overlap more in the distribution, and the classification effect is not ideal.

3.2.3. Ablation Study

It has been proved by 3.2.2 that the data enhancement method proposed in this work is effective. Therefore, the effectiveness of the proposed two basic structures of R-block and A-block is verified in the same situation using the enhancement method proposed in this work. Table 7 presents the results of our ablation experiments.

Table 7.

Data analysis of ablation experiments.

Works ACC N (%) S (%) V (%)
SEN Ppr F 1 SEN Ppr F 1 SEN Ppr F 1
Without R-block 97.4 97.49 98.41 97.94 77.72 71.85 74.67 91.92 78.26 84.54
Without A-block 97.2 97.72 98.15 97.94 71.35 66.23 68.69 88.38 78.77 83.30
Without channel attention 97.7 98.18 98.21 98.20 76.68 68.84 72.61 89.84 86.35 88.06
Without spatial attention 97.5 97.95 98.08 98.02 75.44 68.40 71.74 88.75 83.51 86.05
Without top layer 96.5 96.36 97.88 97.11 74.83 64.87 69.50 90.59 72.86 80.76
Without middle layer 97.3 97.28 98.48 97.88 80.39 72.60 76.30 92.17 77.19 84.02
Proposed 98.5 98.87 98.64 98.75 83.06 82.48 82.77 93.46 90.04 91.72

First of all, we verify the influence of the R-block module on the model effect. We use conv2d (the same as the conv2d used in R-block) to replace the R-block that implements the downsampling effect in the model and remove the R-block that implements the general feature processing function. The final implementation result (as shown without R-block) shows that the classification effect would be reduced without R-block, so R-block is effective for improving the classification effect.

Secondly, we verify the effectiveness of A-block. First, remove the A-block used to capture contextual information after the sampling step. The experimental results show that A-block also has a greater impact on the accuracy of classification. Then, the effectiveness of the channel attention branch and the spatial attention branch in the A-block were verified. By removing the two branches, respectively, it was proved that the two branches also have an important influence on the context information capture of the A-block, through the evaluation of the three classes of N, S, and V through the general evaluation indicators.

Finally, we verify the effectiveness of the skip connection used in the top layer and middle layer. The reason why the skip connection structure is used is that RA-UNET uses the function of ReLU in the feature learning process, which will make the output result between (0, 1); therefore, the value of the feature map will decrease over time as a result of a series of feature learning operations, resulting in unsatisfactory learning effects. Through the addition of the relatively original features of the top layer and middle layer, it is possible to minimize the loss of important information without attention learning. The final experimental findings also fully validate the efficacy of this step.

3.2.4. Performance Comparison

We compared this study to similar studies in recent years to verify the advanced nature of RA-CNN in the classification of arrhythmia. Table 8 displays the research findings based on data from the MIT-BIH arrhythmia database, which has been segmented in the same way as this paper. Each method's name, the year it was proposed, and its performance in the classification task are listed in the table.

Table 8.

Comparison of related experiments.

Works ACC N (%) S (%) V (%)
SEN Ppr F 1 SEN Ppr F 1 SEN Ppr F 1
Dictionary
(2018) [38]
95.1 90.9 99.4 94.2 80.8 48.8 60.8 82.2 85.4 83.8
DCNN
(2018) [39]
94.0 90.6 98.8 94.5 82.3 38.1 52.1 92.0 72.1 80.9
MPCNN
(2019)[40]
96.4 98.8 97.4 98.1 76.5 76.6 76.6 85.7 94.1 89.7
DDCNN + CLSM (2020) [41] 95.1 97.5 97.6 97.6 83.8 59.4 69.5 80.4 90.2 85.0
Linear discriminant (2021) [42] 87.3 78.7 99.3 87.8 89.4 37.5 52.9 86.5 93.0 89.6
Proposed 98.5 98.9 98.6 98.8 83.1 82.5 82.8 93.5 90.1 91.7

[38] used traditional methods for classification research, introduced 60 features for the classification step. Not only was the preprocessing process complicated, but also the class S Ppr value was 48.8%, which is not ideal. [39] It is necessary to read multiple heartbeat features for heartbeat classification, which undoubtedly increases the amount of calculation. [40] In addition to inputting the original signal as input, the model also introduces RR interval information, which requires additional feature extraction operations, and the obtained classification effect is also worse than this study [41]. After completing the initial classification using a deep dual-channel CNN (DDCNN), it is necessary to further use the central-towards LSTM supportive model (CLSM) to distinguish classes N and S; however, the classification effect of category S is still unsatisfactory. [42] not only performed tedious noise reduction processing but also introduced the RR interval relationship as a feature for learning, which undoubtedly increased the difficulty of feature extraction. Compared with the above experiments, this model not only has a simple feature extraction process but also has a higher F1 value for beat-by-beat classification, which is superior in class S pathology identification [3842].

4. Conclusion

In this work, we propose a novel and effective RA-CNN model. Experiments on arrhythmia data interpatients show that the model has a high ECG recognition ability, strong generalization, and robustness. When doctors diagnose electrocardiograms, they are mostly obtained in the form of images, and two-dimensional research is more conducive to visualization, thereby improving the efficiency of diagnosis and prevention of CVD. The data does not require any form of noise reduction operation and manual feature extraction, which avoids the loss of detailed information in the original ECG data and affects the feature extraction effect [16, 17]. The preprocessing does not need to strictly extract a single heartbeat. Even if the heartbeat is mixed with the information of the front and back heartbeats, the ECG characterization information can be better expressed through the CWT, and finally, a good classification performance can be achieved.

In a further work, we will investigate the improved ECG network and further improve the classification performance of different types of diseases [4346]. On the clinical side, we will develop an ECG system that can be deployed on wearable medical devices and automatic diagnosis algorithm, test, and improve its performance [9, 47].

Acknowledgments

This work was supported by the Major Research Plan of Shandong Province: Research and Integrated Application of Key Technologies for Smart Healthcare (grant number 2020CXGC010903).

Data Availability

The ECG signal data used to support the findings of this study have been deposited in the MIT-BIH Arrhythmia Database repository (https://www.physionet.org/content/mitdb/1.0.0/).

Conflicts of Interest

There are no conflicts of interest declared by the authors.

References

  • 1.Sahoo J. P. Analysis of ECG signal for detection of cardiac arrhythmias . Rourkela: National Institute Of Technology; 2011. [Google Scholar]
  • 2.Romdhane T. F., Pr M. A. Electrocardiogram heartbeat classification based on a deep convolutional neural network and focal loss. Computers in Biology and Medicine . 2020;123:p. 103866. doi: 10.1016/j.compbiomed.2020.103866. [DOI] [PubMed] [Google Scholar]
  • 3.Jiang X., Zhang L., Zhao Q., Albayrak S. ECG arrhythmias recognition system based on independent component analysis feature extraction. TENCON 2006-2006 IEEE Region 10 Conference; 2006. [Google Scholar]
  • 4.Martis R. J., Acharya U. R., Lim C. M., Suri J. S. Characterization of ECG beats from cardiac arrhythmia using discrete cosine transform in PCA framework. Knowledge-Based Systems . 2013;45:76–82. doi: 10.1016/j.knosys.2013.02.007. [DOI] [Google Scholar]
  • 5.Suchetha M., Kumaravel N. Classification of arrhythmia in electrocardiogram using EMD based features and support vector machine with margin sampling. International Journal of Computational Intelligence and Applications . 2013;12(3):p. 1350015. doi: 10.1142/S1469026813500156. [DOI] [Google Scholar]
  • 6.Park J., Lee K., Kang K. Arrhythmia detection from heartbeat using k-nearest neighbor classifier. 2013 IEEE International Conference on Bioinformatics and Biomedicine; Dec. 2013; pp. 15–22. [Google Scholar]
  • 7.Gupta V., Saxena N. K., Kanungo A., Gupta A., Kumar P., Salim A review of different ECG classification/detection techniques for improved medical applications. International Journal of System Assurance Engineering and Management . 2022:1–15. doi: 10.1007/s13198-021-01548-3. [DOI] [Google Scholar]
  • 8.Kiranyaz S., Ince T., Gabbouj M. Real-time patient-specific ECG classification by 1-D convolutional neural networks. IEEE Transactions on Biomedical Engineering . 2016;63(3):664–675. doi: 10.1109/TBME.2015.2468589. [DOI] [PubMed] [Google Scholar]
  • 9.Ribeiro H. D. M., Arnold A., Howard J. P., et al. ECG-based real-time arrhythmia monitoring using quantized deep neural networks: a feasibility study. Computers in Biology and Medicine . 2022;143:p. 105249. doi: 10.1016/j.compbiomed.2022.105249. [DOI] [PubMed] [Google Scholar]
  • 10.Lin C. C., Yang C. M. Heartbeat classification using normalized RR intervals and morphological features. Mathematical Problems in Engineering . 2014;2014:11. doi: 10.1155/2014/712474. [DOI] [Google Scholar]
  • 11.Llamedo M., Martínez J. P. An automatic patient-adapted ECG heartbeat classifier allowing expert assistance. IEEE Transactions on Biomedical Engineering . 2012;59(8):2312–2320. doi: 10.1109/TBME.2012.2202662. [DOI] [PubMed] [Google Scholar]
  • 12.Al Rahhal M. M., Bazi Y., Al Zuair M., Othman E., BenJdira B. Convolutional neural networks for electrocardiogram classification. Journal of Medical and Biological Engineering . 2018;38(6):1014–1025. doi: 10.1007/s40846-018-0389-7. [DOI] [Google Scholar]
  • 13.Xia Y., Wulan N., Wang K., Zhang H. Detecting atrial fibrillation by deep convolutional neural networks. Computers in Biology and Medicine . 2018;93:84–92. doi: 10.1016/j.compbiomed.2017.12.007. [DOI] [PubMed] [Google Scholar]
  • 14.Li Q., Liu C., Li Q., et al. Ventricular ectopic beat detection using a wavelet transform and a convolutional neural network. Physiological Measurement . 2019;40(5, article 055002) doi: 10.1088/1361-6579/ab17f0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Salem M., Taheri S., Yuan J. S. ECG arrhythmia classification using transfer learning from 2-dimensional deep CNN features. 2018 IEEE biomedical circuits and systems conference; 2018; pp. 1–4. [Google Scholar]
  • 16.Izci E., Ozdemir M. A., Degirmenci M., Akan A. Cardiac arrhythmia detection from 2D ECG images by using deep learning technique. Medical Technologies Congress (TIPTEKNO) . 2019;2019:1–4. doi: 10.1109/TIPTEKNO.2019.8895011. [DOI] [Google Scholar]
  • 17.Huang J., Chen B., Yao B., He W. ECG arrhythmia classification using STFT-based spectrogram and convolutional neural network. IEEE Access . 2019;7:92871–92880. doi: 10.1109/ACCESS.2019.2928017. [DOI] [Google Scholar]
  • 18.Long J., Shelhamer E., Darrell T. Fully convolutional networks for semantic segmentation. Proceedings of the IEEE conference on computer vision and pattern recognition; 2015; [DOI] [PubMed] [Google Scholar]
  • 19.Ronneberger O., Fischer P., Brox T. U-net: convolutional networks for biomedical image segmentation . Cham: Springer; 2015. [DOI] [Google Scholar]
  • 20.Huang H., Lin L., Tong R., et al. Unet 3+: a full-scale connected UNet for medical image segmentationC//ICASSP. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing; 2020; pp. 1055–1059. [Google Scholar]
  • 21.Guan S., Khan A. A., Sikdar S., Chitnis P. V. Fully dense UNet for 2-D sparse photoacoustic tomography artifact removal. IEEE Journal of Biomedical and Health Informatics . 2020;24(2):568–576. doi: 10.1109/JBHI.2019.2912935. [DOI] [PubMed] [Google Scholar]
  • 22.Wu S., Wang Z., Liu C., Zhu C., Wu S., Xiao K. Automatical segmentation of pelvic organs after hysterectomy by using dilated convolution u-net++. International Conference on Software Quality, Reliability and Security Companion; 2019; pp. 362–367. [Google Scholar]
  • 23.Russakovsky O., Deng J., Su H., et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision . 2015;115(3):211–252. doi: 10.1007/s11263-015-0816-y. [DOI] [Google Scholar]
  • 24.Zhang Q. L., Yang Y. B. Sa-net: shuffle attention for deep convolutional neural networks. ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing; 2021; pp. 2235–2239. [Google Scholar]
  • 25.Soans N., Asali E., Hong Y., Doshi P. Sa-net: robust state-action recognition for learning from observations. International Conference on Robotics and Automation; 2020; pp. 2153–2159. [Google Scholar]
  • 26.Wang F., Jiang M., Qian C., et al. Residual attention network for image classification. Proceedings of the IEEE conference on computer vision and pattern recognition; 2017; pp. 3156–3164. [Google Scholar]
  • 27.Sharif S. M. A., Naqvi R. A., Biswas M. Learning medical image denoising with deep dynamic residual attention network. Mathematics . 2020;8(12):p. 2192. doi: 10.3390/math8122192. [DOI] [Google Scholar]
  • 28.He R., Wang K., Zhao N., et al. Automatic detection of atrial fibrillation based on continuous wavelet transform and 2D convolutional neural networks. Frontiers in Physiology . 2018;9:p. 1206. doi: 10.3389/fphys.2018.01206. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Mark R., Moody G. MIT-BIH Arrhythmia Database Directory, Cambridge, MA . USA: MIT; 1988. [Google Scholar]
  • 30.Luo K., Li J., Wang Z., Cuschieri A. Patient-specific deep architectural model for ECG classification. Journal of Healthcare Engineering . 2017;2017:13. doi: 10.1155/2017/4108720.4108720 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Yan G., Liang S., Zhang Y., Liu F. Fusing transformer model with temporal features for ECG heartbeat classification. IEEE International Conference on Bioinformatics and Biomedicine . 2019;2019:898–905. doi: 10.1109/BIBM47256.2019.8983326. [DOI] [Google Scholar]
  • 32.Wu L., Xie X., Wang Y. ECG enhancement and R-peak detection based on window variabilityC//healthcare. Multidisciplinary Digital Publishing Institute . 2021;9(2):p. 227. doi: 10.3390/healthcare9020227. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Houssein E. H., Hassaballah M., Ibrahim I. E., AbdElminaam D. S., Wazery Y. M. An automatic arrhythmia classification model based on improved marine predators algorithm and convolutions neural networks. Expert Systems with Applications . 2022;187:p. 115936. doi: 10.1016/j.eswa.2021.115936. [DOI] [Google Scholar]
  • 34.Li F., Xu Y., Chen Z., Liu Z. Automated heartbeat classification using 3-D inputs based on convolutional neural network with multi-fields of view. IEEE Access . 2019;7:76295–76304. doi: 10.1109/ACCESS.2019.2921991. [DOI] [Google Scholar]
  • 35.Goodfellow I., Pouget-Abadie J., Mirza M., et al. Generative adversarial nets. Advances in Neural Information Processing Systems . 2014;27 [Google Scholar]
  • 36.Cubuk E. D., Zoph B., Mane D., Vasudevan V., Le Q. V. AutoAugment: Learning Augmentation Policies from Data . Computer Vision and Pattern Recognition; 2018. http://arxiv.org/abs/1805.09501. [Google Scholar]
  • 37.Jiang K., Liang S., Meng L., Zhang Y., Wang P., Wang W. A two-level attention-based sequence-to-sequence model for accurate inter-patient arrhythmia detection. IEEE International Conference on Bioinformatics and Biomedicine . 2020. pp. 1029–1033.
  • 38.Raj S., Ray K. C. Sparse representation of ECG signals for automated recognition of cardiac arrhythmias. Expert Systems with Applications . 2018;105:49–64. doi: 10.1016/j.eswa.2018.03.038. [DOI] [Google Scholar]
  • 39.Sellami A., Hwang H. A robust deep convolutional neural network with batch-weighted loss for heartbeat classification. Expert Systems with Applications . 2019;122:75–84. doi: 10.1016/j.eswa.2018.12.037. [DOI] [Google Scholar]
  • 40.Niu J., Tang Y., Sun Z., Zhang W. Inter-patient ECG classification with symbolic representations and multi-perspective convolutional neural networks. IEEE Journal of Biomedical and Health Informatics . 2020;24(5):1321–1332. doi: 10.1109/JBHI.2019.2942938. [DOI] [PubMed] [Google Scholar]
  • 41.He J., Rong J., Sun L., Wang H., Zhang Y. Pacific-Asia Conference on Knowledge Discovery and Data Mining . Cham: Springer; 2020. An advanced two-step DNN-based framework for arrhythmia detection; pp. 422–434. [DOI] [Google Scholar]
  • 42.Dias F. M., Monteiro H. L., Cabral T. W., Naji R., Kuehni M., Luz E. J. D. S. Arrhythmia classification from single-lead ECG signals using the inter-patient paradigm. Computer Methods and Programs in Biomedicine . 2021;202:p. 105948. doi: 10.1016/j.cmpb.2021.105948. [DOI] [PubMed] [Google Scholar]
  • 43.Kaya Y., Pehlivan H. Classification of premature ventricular contraction in ECG. International Journal of Advanced Computer Science and Applications . 2015;6(7) doi: 10.14569/IJACSA.2015.060706. [DOI] [Google Scholar]
  • 44.Sivapalan G., Nundy K., Dev S., Cardiff B., Deepu J. ANNet: a lightweight neural network for ECG anomaly detection in IoT edge sensors. IEEE Transactions on Biomedical Circuits and Systems . 2022:p. 1. doi: 10.1109/TBCAS.2021.3137646. [DOI] [PubMed] [Google Scholar]
  • 45.Kaya Y. Detection of bundle branch block using higher order statistics and temporal features. The International Arab Journal of Information Technology . 2021;18(3):279–285. doi: 10.34028/iajit/18/3/3. [DOI] [Google Scholar]
  • 46.Ganeshkumar M., Ravi V., Sowmya V., Gopalakrishnan E. A., Soman K. P. Explainable deep learning-based approach for multilabel classification of electrocardiogram. IEEE Transactions on Engineering Management . 2021:1–13. doi: 10.1109/TEM.2021.3104751. [DOI] [Google Scholar]
  • 47.Shaker A. M., Tantawi M., Shedeed H. A., Tolba M. F. Generalization of convolutional neural networks for ECG classification using generative adversarial networks. IEEE Access . 2020;8:35592–35605. doi: 10.1109/ACCESS.2020.2974712. [DOI] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The ECG signal data used to support the findings of this study have been deposited in the MIT-BIH Arrhythmia Database repository (https://www.physionet.org/content/mitdb/1.0.0/).


Articles from Computational and Mathematical Methods in Medicine are provided here courtesy of Wiley

RESOURCES