Summary
The detection and classification of arrhythmias are crucial steps in diagnosing cardiovascular diseases. However, current deep learning-based classification methods often fail to consider both the morphological and temporal features of the electrocardiogram (ECG) simultaneously. Therefore, we propose a hybrid heartbeat classification method that combines a Transformer with multi-branch convolutional neural networks (CNNs) and uses a fusion module to merge the features obtained from the different classifiers. To validate our method, we evaluated three heartbeat classification protocols on the MIT-BIH arrhythmia (MIT-BIH-AR) database and analyzed performance on the supraventricular ectopic beat (SVEB) and ventricular ectopic beat (VEB) classes. The first was an intra-patient protocol, which yielded an overall accuracy of 99.5%, with a sensitivity (Sen) and specificity (Spe) of 92.4% and 99.9% on SVEB and 98.2% and 99.9% on VEB. The latter two were inter-patient protocols in which the training and test sets were drawn from different records; they yielded overall accuracies of 98.8% and 97.2%, respectively.
Subject areas: Biomedical engineering, Artificial intelligence
Graphical abstract

Highlights
- Proposing a multi-branch fusion ECG classification method
- The proposed method is validated on the MIT-BIH database
- Three different forms of data segmentation are used
- The proposed method is of great significance for clinical application
Introduction
Cardiovascular disease has become one of the most serious threats to human health in the world today, with the highest mortality rate among all causes of death.1 Cardiac arrhythmia is one of the most common heart diseases, and it is characterized by the abnormal generation or conduction of electrical impulses in the cardiac muscle cells during heartbeats. Cardiac arrhythmia is often a direct cause of sudden cardiac death and has attracted widespread attention. Common cardiac diagnostic methods include stress tests, cardiac CT, and the electrocardiogram (ECG). Stress testing increases the load on the heart through exercise or drugs and diagnoses disease by observing how well the heart works under that load; it is simple to operate but requires specialized equipment. Cardiac CT uses radiation to examine the heart and blood vessels and can construct a three-dimensional picture to visualize the condition of the heart, but the diagnosis depends on the doctor’s personal experience. An ECG is a visualization of a time series that records the cardiac electrical activity of the human body. In 1903, Willem Einthoven recorded the first clear human ECG signal. Today, the ECG is widely used in clinical examinations of heart-related diseases, and relatively comprehensive ECG diagnostic criteria exist. The ECG method diagnoses heart disease by measuring the electrical activity of each part of the heart, and it is both accurate and fast. In addition, the ECG is easy to obtain, which is conducive to real-time monitoring of the patient’s status and to disease prevention.
ECGs play a crucial role in diagnosing abnormalities in the generation and conduction of cardiac electrical signals. However, accurately distinguishing abnormal heart rhythms from normal sinus rhythms based on the ECG alone requires professional medical expertise, and patients themselves may not promptly perceive the occurrence of cardiac arrhythmias. Furthermore, some ECG features of specific arrhythmias are related to individual characteristics, such as gender and age. They are also closely related to the participants’ physical condition and physiological factors, such as pregnancy.2
Computer-aided analysis of ECGs has significant applications in cloud computing service platforms and wearable ECG devices for primary care. It has also found utility in ECG monitoring, dynamic electrocardiography (DCG), ECG auto-analysis, ECG body surface calibration, high frequency ECG, late ventricular potentials, His bundle electrocardiogram (HBE), fetal ECG, and heart rate variability. These advancements are of great importance in terms of enabling timely and effective prevention and control of heart disease, ultimately contributing to saving patients’ lives.
ECG signals are one-dimensional electrical signals, and numerous researchers have explored time-series methods for ECG classification. Currently, computer-aided ECG diagnosis methods fall into two main categories. The first is the shallow model-based ECG classification method, which relies on manually extracted ECG features. Vapnik et al.3,4 studied the small-sample machine learning problem in the 1960s, but, limited by the research conditions of the time, the theory could not yet be put into practice. In 1995, Cortes et al.5 proposed the support vector machine (SVM), which was adopted in early ECG classification. Sandeep et al.6 used an SVM with particle swarm optimization for classification, and Fejtová et al.7 analyzed ECG records with the wavelet transform (WT) and classified them with a decision tree. Luz et al.8 were the first to apply an optimum path forest (OPF) classifier to ECG signals in a cardiac classification task and compared its performance with that of three well-known expert system classifiers. Other methods include k-nearest neighbor (KNN),9 decision trees, naive Bayes, SVM,2 hidden Markov models (HMM),10,11 artificial neural networks,12 and hybrid methods for expert systems.13 Shallow models have had a profound impact on the field of ECG classification; however, feature extraction usually relies on domain experts, which is not only time-consuming and labor-intensive but also directly affects classification performance. The second class of ECG classification methods is based on deep learning, which differs from shallow models in that it has strong learning ability and does not rely on manual feature extraction.
In addition, deep learning can automatically learn features from raw data, obtaining richer information and higher-level feature representations, which improves both model performance and generalization ability. Zhang et al.14 proposed a deep convolutional neural network (DCNN)-based approach that transforms the ECG signal to the time-frequency domain with the Fourier transform and then applies a two-dimensional deep CNN architecture. The architecture trains a specific DCNN for ECG samples of a specific length to learn the features of ECG wave signals and performs classification in an end-to-end manner. Liman et al.15 conducted classification studies using a convolutional recurrent neural network (CRNN) and SVM, and Xie et al.16 proposed an ECG classification algorithm that combined a bidirectional recurrent neural network (BiRNN) with a CNN. Ganguly et al.17 achieved superior classification performance using a bidirectional long short-term memory (bi-LSTM) model, and Guan et al.18 introduced a low-dimensional denoised embedding Transformer model for ECG classification that embeds the signal into a low-dimensional space for feature learning.
For complex ECG datasets, how to effectively improve the performance of computer-assisted ECG classification algorithms is still a key issue. CNNs are known for their ability to extract local features and are good at mining feature representations that generalize well from raw ECG data; combining the features mined by multiple branches can provide richer information, but CNNs may have difficulty capturing global feature representations. The Transformer, on the other hand, is good at capturing long-distance feature dependencies through the self-attention mechanism, which gives it unique advantages in processing temporal information, but it may ignore the details of local features. In this study, we propose a new ECG classification method that combines multi-branch CNNs with the Transformer architecture. The proposed method involves several steps. First, the raw ECG data are segmented into heartbeats and preprocessed. Next, the multi-branch CNN and Transformer classifiers each extract features from the preprocessed data. Finally, fusion blocks integrate the feature representations learned by the different classifier modules to produce the final heartbeat classification results.
We conducted experiments to validate the effectiveness of our model using three different data organization approaches. The experimental results demonstrated that our model achieves accurate and robust performance in ECG heartbeat classification.
Related work
ECG classification usually involves the following steps. First, the raw ECG data are preprocessed. An ECG machine records, on the body surface, the electrical signals of the heart in each cycle of activity; these signals are nonlinear, nonstationary, and highly random. In addition, the ECG is susceptible to interference from various factors, such as operator proficiency, instrument variation, human activity, and the acquisition environment, which can introduce noise during acquisition. Therefore, it is necessary to preprocess the ECG data to remove these adverse factors before performing feature extraction and classification. ECG noise mainly falls into three types: baseline drift, myoelectric (EMG) interference, and power-line interference. Kozumplík et al.19 proposed a linear time-varying filtering method for suppressing the baseline drift of ECG signals, and Tulyakova et al.20 proposed an adaptive method for real-time filtering of unknown nonstationary ECG noise; the filter performs well on a variety of real signals containing nonstationary EMG noise. Next, the preprocessed signal is segmented into heartbeats so that the model receives complete heartbeats as input. Feature extraction is then performed on the data. Diagnosing disease from an ECG requires observing changes in the waveform of the ECG signal. Such changes are not always easy to detect, especially when they are small. Therefore, advanced feature extraction methods are needed to study these small differences.21
In 1995, Gramatikov et al.22 published the first use of a short-time Fourier transform (STFT) to extract ECG signal features, after which the processed data were learned and classified. Rahul et al.23 proposed an artificial intelligence-based method for atrial fibrillation (AF) detection. Two-stage median filtering and least squares smoothing filtering were used to preprocess the ECG, and the one-dimensional ECG signal was then converted into a two-dimensional time-frequency representation and classified using a deep learning model based on bidirectional LSTM, which reached an average accuracy of 98.85%. Sinha et al.24 proposed a new multilevel feature analysis and deep learning strategy in which the multilevel features of ECG signals were obtained using empirical mode decomposition (EMD). Mahboobeh et al.25 used cardiac magnetic resonance (CMR) images and deep learning methods to detect cardiovascular diseases (CVDs). Qin et al.26 proposed the ECG-ADGAN model, which embeds a bidirectional long short-term memory (Bi-LSTM) layer in the GAN structure and uses a minibatch discrimination training strategy in the discriminator to synthesize ECG signals and improve the convergence stability of the GAN. The accuracy of this method is 95.5%, and the AUC is 95.9%. Yan et al.27 proposed a Transformer-based model for classifying arrhythmic heartbeats in which the RR interval was integrated to exploit both ECG signal morphology and temporal information. Table 1 summarizes the models commonly used for classification.
Table 1.
Common methods of ECG classification
| Reference | Method | Dataset | Description |
|---|---|---|---|
| Tesfai et al.28 | 1D CNN | AHA | A lightweight CNN model based on ShuffleNet architecture is proposed and implemented. |
| Ahmed et al.29 | 2D CNN | CPSC2021 | Convert 1D ECG signals into 2D scale maps in the wavelet domain. |
| de Santana et al.30 | 2D CNN | MIT-BIH-AR | Convert 1D ECG recordings to 2D grayscale images. |
| Xu et al.31 | BiLSTM | PTB | Extract a variety of time series features as input. |
| Natarajan et al.32 | Transformer | PhysioNet Challenge 2020 | The random forest model was used to select 22 important features, such as RR interval median and P wave correlation coefficient. |
In recent years, an increasing number of fusion models have been proposed in the field. Lu et al.33 introduced a hybrid LSTM-CNN model for short-term ECG positive abnormality classification. The model combines an LSTM and a CNN. It can learn the structural features of ECG signals independently, possesses memory and inference functions, and can effectively explore the temporal correlations between ECG signal points. On the MIT-BIH-AR database, the model achieved an accuracy of 99.7%, a sensitivity of 99.69%, and a specificity of 99.7%. On the Chinese Cardiovascular Disease Database (CCDD), the model achieved an accuracy of 93.39%, a sensitivity of 91.18%, and a specificity of 95.21%. Ma et al.34 proposed two deep neural network structures to conduct multi-class experiments on the MIT-BIH-AR database. One structure is a CNN model composed of one-dimensional convolutions, and the other is a fusion model composed of a BiLSTM and a CNN. Comparative experiments show that the fusion model significantly improves both the overall classification accuracy and the classification of rare classes. Zeeshan et al.35 proposed two multimodal fusion frameworks for ECG heartbeat classification. The authors converted the raw ECG data into three different images and fused them into a single image modality used as input to the CNN; an SVM classifier was then used for classification, achieving accuracies of 99.7% and 99.2%. Many previous works first use a CNN to extract low-level features and then pass them through Transformers to model global interactions.36 In this paper, we use multi-branch CNNs to learn rich ECG features. Most existing methods consider only the morphological features of the ECG signals when classifying different types of arrhythmia, so we integrate four temporal features, i.e., the four RR intervals described in the RR interval section, into the multi-branch CNNs.
In order to fully leverage the advantages of CNNs and Transformer, we use a fusion module to integrate multibranch CNNs and Transformer in this paper.
Our proposed method
Data processing
The general flow of the proposed method is to first preprocess the original signal, then feed the training and validation sets into the model for training, and finally evaluate the trained model on the test set.
MIT-BIH-AR database
The MIT-BIH-AR database is a widely recognized arrhythmia database provided by the Massachusetts Institute of Technology (MIT). It contains a large amount of data that have been extensively annotated and labeled by qualified experts, and it is widely used in the classification and clinical research of arrhythmias. The database consists of 48 dual-channel ambulatory ECG records obtained from 47 subjects and selected from over 4,000 24-h ambulatory ECG recordings collected by the Beth Israel Hospital (BIH) Arrhythmia Laboratory. Each record is a 30-min segment sampled at 360 Hz, and approximately 30% of the data contain abnormal heartbeats. In this study, we selected this database to evaluate the classification performance of our model. Following the recommendation of the ANSI/AAMI EC 57:1998 standard,37 four records from patients with implanted pacemakers (102, 104, 107, and 217) were excluded from the dataset, and we used the lead II signal of each record. As shown in Table 2, the MIT-BIH classes are mapped onto the five recommended AAMI classes, which include normal or bundle branch block beats (N), supraventricular ectopic beats (S or SVEB), ventricular ectopic beats (V or VEB), fusion beats (F), and beats that could not be classified (Q).
Table 2.
AAMI Standard Classification
| AAMI | Normal (N) | Supraventricular (S) | Ventricular (V) | Fusion (F) | Unknown (Q) |
|---|---|---|---|---|---|
| MIT-BIH-AR classification | Normal beat (NOR) | Supraventricular premature beat (SP) | Premature ventricular contraction (PVC) | Fusion of ventricular and normal beat (fVN) | Paced beat (P) |
| | Left bundle branch block beat (LBBB) | Atrial premature beat (AP) | Ventricular escape beat (VE) | | Unclassifiable beat (U) |
| | Right bundle branch block beat (RBBB) | Aberrated atrial premature beat (aAP) | | | Fusion of paced and normal beat (fPN) |
| | Atrial escape beat (AE) | Nodal (junctional) premature beat (NP) | | | |
| | Nodal (junctional) escape beat (NE) | | | | |
Electrocardiogram preprocessing
In this section, we introduce the data preprocessing techniques used in our method. While the MIT-BIH-AR database contains almost noiseless data, it still contains some unavoidable noise, such as noise generated by muscle contractions and respiratory movements, noise caused by electrode placement, and other sources of noise. Therefore, the main task during the preprocessing stage is to remove the noise present in the ECG data.
The discrete wavelet transform (DWT) discretizes the scale and translation of the basic wavelet. Wang et al.38 proposed a new WE-ANN-CSO method for identifying CAP, which was the first work to apply wavelet analysis to CCT images; the experimental results show that this method is superior to four other methods: GANN, CADe, SVM, and DT. Wang et al.39 proposed a new automated detection method for COVID-19 based on chest CT, in which wavelet Renyi entropy is used to extract image features and a proposed three-segment biogeography-based optimization (3SBBO) algorithm optimizes the weights, biases, and Renyi entropy of the network. This method outperforms the state-of-the-art RBFNN, KELM, and ELMBA methods. In our method, we first apply a 9-level DWT for denoising, which decomposes the ECG signal into approximation and detail components at each level. In many signals, the approximation component is crucial because it often contains the signal’s essential features, while the detail component reveals the signal’s finer details or differences. Moreover, for signals contaminated with noise, the main energy of the noise resides in the detail components obtained from the wavelet decomposition. Therefore, we apply thresholding to the detail components to filter out the noise present in the ECG signal. In signal processing, filters preserve desired frequency components while eliminating undesired ones, and removing certain frequencies reduces the noise in the signal. Commonly used filters include the low-pass filter (LPF), high-pass filter (HPF), and bandpass filter (BPF), where the bandpass filter combines the characteristics of the low-pass and high-pass filters. In our approach, we apply a bandpass filter with a range of 0.1–100 Hz to the ECG signal and then digitize the filtered ECG signal at a sampling frequency of 360 Hz.40,41
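The wavelet thresholding step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the text does not name the mother wavelet, the threshold rule, or the threshold value, so the Haar wavelet, soft thresholding, and the `threshold` argument are assumptions, and all function names are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: return (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one level of the Haar DWT."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero by `threshold` (assumed rule)."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)

def wavelet_denoise(signal, levels=9, threshold=0.1):
    """Decompose, threshold the detail components, and reconstruct."""
    signal = np.asarray(signal, dtype=float)
    if signal.size % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = signal
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(soft_threshold(detail, threshold))
    # Rebuild from the coarsest level back to the original resolution.
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx
```

With `threshold=0.0` the function reconstructs its input exactly, which is a convenient sanity check before choosing a real threshold.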
Electrocardiogram cropping
Because the MIT-BIH-AR database consists of long-term ECG records, these records must be segmented before they are fed to deep learning models. After preprocessing and sampling, the long-duration ECG is divided into short-duration segments, which are used for ECG classification. We follow the heartbeat segmentation approach proposed by Xiao et al.42 The R-peak annotations provided with the MIT-BIH-AR database are used to locate the R-peaks. Each heartbeat segment is constructed by taking 133 samples before the R-peak, the R-peak sample itself, and 266 samples after the R-peak, resulting in a total of 400 samples per heartbeat.
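The cropping rule can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and skipping beats whose window would run past either end of the record is an assumption, since the text does not say how boundary beats are handled.

```python
def segment_heartbeats(signal, r_peaks, before=133, after=266):
    """Cut one fixed-length window around each annotated R-peak.

    Each window holds `before` samples, the R-peak sample, and `after`
    samples (133 + 1 + 266 = 400 by default).
    """
    beats = []
    for r in r_peaks:
        start, stop = r - before, r + after + 1
        # Assumption: drop beats whose window does not fit in the record.
        if start < 0 or stop > len(signal):
            continue
        beats.append(signal[start:stop])
    return beats
```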
RR interval
The basic structure of an ECG signal consists of the P-wave, T-wave, and QRS complex. The interval between two consecutive R-peaks is referred to as the RR interval (illustrated in Figure 1), and it provides important reference information for the diagnosis of cardiac diseases. The time elapsed between two consecutive R-waves in the QRS signal is also widely recognized as an important feature of ECG signals.43,44 The dynamic nature of ECG captured from wearable devices causes the amplitude and morphology to be highly susceptible to various noises. However, even in the presence of severe noise, the detection of RR intervals can be achieved with acceptable accuracy.44,45 Therefore, using RR interval features can enable a more accurate and robust classification of ECG data. Li et al.46 employed four RR features, including the pre-RR interval, post-RR interval, average RR interval, and local mean RR interval, in their classification model to improve performance. In this study, we utilize the pre-RR interval, post-RR interval, average RR interval, and local mean RR interval of the last 10 heartbeats as our interval features.
(1) Pre-RR interval: The RR interval between a given heartbeat and its preceding heartbeat.
(2) Post-RR interval: The RR interval between a given heartbeat and its subsequent heartbeat.
(3) Average RR interval: The ratio between the pre-RR interval and the post-RR interval.
(4) Local mean RR interval: The average of the RR intervals between a given heartbeat and its preceding 10 heartbeats.
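The four interval features listed above can be computed directly from the annotated R-peak positions. The sketch below is an assumption-laden illustration: the function name is hypothetical, the first and last beats are skipped because they lack a neighbor, and beats with fewer than 10 predecessors use however many are available, none of which is specified in the text.

```python
def rr_features(r_peaks, fs=360, local_window=10):
    """Return (pre_rr, post_rr, ratio, local_mean) for each interior beat.

    Intervals are in seconds; the third feature is the unitless ratio of
    the pre-RR to the post-RR interval, as defined in item (3).
    """
    features = []
    for i in range(1, len(r_peaks) - 1):
        pre_rr = (r_peaks[i] - r_peaks[i - 1]) / fs
        post_rr = (r_peaks[i + 1] - r_peaks[i]) / fs
        ratio = pre_rr / post_rr
        # Mean of the RR intervals spanning up to `local_window` preceding beats.
        first = max(0, i - local_window)
        local_mean = (r_peaks[i] - r_peaks[first]) / (i - first) / fs
        features.append((pre_rr, post_rr, ratio, local_mean))
    return features
```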
Figure 1.
A segment of the RR interval
Short-time Fourier transform
Time-domain, frequency-domain, time-frequency domain, and statistical techniques are commonly used to capture features of ECG signals.47 The classical Fourier transform (FT) is also used to obtain the frequency spectrum of the ECG, but it captures only global frequency information and does not indicate when each frequency occurs in time. The short-time Fourier transform (STFT), which is essentially a windowed Fourier transform, bridges the gap between the time domain and frequency domain of a signal. In practice, computing the STFT involves dividing a longer time signal into shorter segments of equal length and calculating the FT of each segment. For ECG signals, the STFT is defined as follows:
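| $\mathrm{STFT}(\tau, f) = \int_{-\infty}^{+\infty} x(t)\,\omega(t - \tau)\,e^{-j 2 \pi f t}\,\mathrm{d}t$ | (Equation 1) |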
In this method, x(t) represents the ECG signal with a sampling rate of 360 Hz, and ω(t) is the window function. We use the Hanning window with a window size of 64, and the Hanning window is defined as follows:
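| $\omega(t) = 0.5\left[1 - \cos\left(\frac{2 \pi t}{N - 1}\right)\right],\quad 0 \le t \le N - 1,\ N = 64$ | (Equation 2) |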
The spectrum sequence obtained from the ECG signal through STFT is used as the input signal for the Transformer model, and a normalization method is employed to keep the values of the sequence within the range of [-1, 1], ensuring stability and correctness during the deep neural network model’s learning process. The mathematical representation of normalization is defined as follows:
| (Equation 3) |
where x(t) denotes the ECG signal with a sampling rate of 360 Hz and Max(x) is the maximum value.
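The windowed transform and the scaling to [-1, 1] can be sketched with NumPy as follows. This is a hedged illustration, not the authors' code: the hop size of 32 samples, the use of the magnitude spectrum, and normalization by the largest absolute value are assumptions, and the function names are hypothetical. Only the Hanning window of size 64 comes from the text.

```python
import numpy as np

def stft_magnitude(x, window_size=64, hop=32):
    """Magnitude STFT using a Hanning window of `window_size` samples."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(window_size)
    n_frames = 1 + (x.size - window_size) // hop
    frames = np.stack([
        x[k * hop:k * hop + window_size] * window for k in range(n_frames)
    ])
    # One row per time frame, one column per non-negative frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))

def normalize(values):
    """Scale so that every value lies in [-1, 1] (assumed max-abs scaling)."""
    peak = np.max(np.abs(values))
    return values / peak if peak > 0 else values
```

For a 400-sample heartbeat these settings give 11 time frames of 33 frequency bins each, which would form the input sequence of the Transformer branch under the stated assumptions.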
Fusion model
Fusing multiple machine learning models tends to improve overall predictive power. Wang et al.48 proposed a diagnostic method for chest CT images in which a self-created CNN extracts features and learns individual image-level representations, while a graph convolutional network (GCN) learns relationship-aware representations; deep feature fusion (DFF) then fuses the individual image-level features from the CNN with the relationship-aware features from the GCN. Wang et al.49 proposed a new CCSHNet model for COVID-19 detection in CCT, together with a novel selection algorithm of pretrained networks for fusion (SAPNF) that selects the best two pretrained models; deep CCT fusion by discriminant correlation analysis then fuses the features from the two models. Like most sequence-to-sequence models, the Transformer is an encoder-decoder architecture. However, due to the absence of a standard translation for ECG signals, Yan et al.27 used only the encoder part for ECG signal classification. Zhang et al.36 proposed a novel fusion technique called the BiFusion module that effectively combines multilevel features from both the CNN and Transformer branches, and Niu et al.50 introduced a multi-perspective convolutional neural network (MPCNN) to extract high-level features related to the local morphological structures and global shape information of ECG signals.
Our proposed ECG arrhythmia classification model is illustrated in Figure 2. The raw ECG signal is first preprocessed, and the RR interval features are extracted. The processed data are then used as input for both the Transformer model and the CNN model. In the CNNs channel, the ECG signal is passed through the CNNs model, and the RR interval features are appended. In the Transformer channel, the ECG signal is input into the Transformer model after the time-frequency sequence has been extracted with the STFT. Finally, the fusion module fuses the outputs of the two channels to obtain the final five-class classification result.
Figure 2.
Architecture of the proposed model
Convolutional neural networks block
CNNs extract features of the target by layer-by-layer abstraction, where an important concept is the receptive field. If the receptive field is too small, only local features can be observed, and if the receptive field is too large, too many invalid features are easily obtained.
Therefore, we use a multi-branch CNNs model to learn the morphological features of the ECG signal, and the structure of the CNNs is shown in Figure 3. The model consists of 1D convolutional layers, which extract features from ECG segments. Since time series data are one-dimensional and have spatial properties along that dimension, 1D convolution, in which the kernel slides along a single dimension, is a natural choice that has proven effective in many applications.51,52 The five branches use different convolutional kernel sizes, which is equivalent to giving them receptive fields of different sizes, so that the morphological features of the ECG signal are learned at multiple scales. Branches 3 and 5 use two convolutional layers with kernel sizes of 1 × 25 and 1 × 15 and of 1 × 30 and 1 × 15, respectively, and branches 1, 2, and 4 use a single convolutional layer with kernel sizes of 1 × 20, 1 × 25, and 1 × 30, respectively; all convolutions use a stride of 1. After each convolutional layer, a rectified linear unit (ReLU) introduces non-linearity into the network, and max pooling then reduces the number of parameters and the computational effort while retaining the main features, which helps prevent overfitting. Finally, the outputs of the five branches are concatenated (Concat). Table 3 lists the parameters of the CNNs block.
Figure 3.
Architecture of the CNNs block
Table 3.
Parameter information for the CNN block
| Type | Conv1d (layer 1) | BN | Max Pooling | Output Shape | Conv1d (layer 2) | BN | Max Pooling | Output Shape |
|---|---|---|---|---|---|---|---|---|
| b1 | (20,1) | 32 | 105 | 32 × 3 | – | – | – | – |
| b2 | (25,1) | 32 | 105 | 32 × 3 | – | – | – | – |
| b3 | (25,1) | 64 | 7 | 64 × 54 | (15,1) | 32 | 3 | 32 × 14 |
| b4 | (30,1) | 32 | 192 | 32 × 1 | – | – | – | – |
| b5 | (30,1) | 64 | 7 | 64 × 53 | (15,1) | 32 | 3 | 32 × 13 |
| Cat | – | – | – | – | – | – | – | 544 × 2 |
| Flatten | – | – | – | – | – | – | – | 1088 |
| Cat RR | – | – | – | – | – | – | – | 1092 |
| Fully Connected | – | – | – | – | – | – | – | 64 |
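The way kernel size and pooling size determine each branch's output length can be checked with simple arithmetic. The sketch below assumes a 400-sample input, no padding, stride-1 convolutions, and max pooling whose stride equals its window with the remainder discarded; these conventions and the helper names are assumptions, not details confirmed by the text. Under them, the sketch gives the output lengths listed in Table 3 for branches 1, 2, 4, and 5, and a different padding or rounding convention would shift a result by one sample.

```python
def conv1d_length(length, kernel, stride=1):
    """Output length of an unpadded 1D convolution."""
    return (length - kernel) // stride + 1

def pool_length(length, pool):
    """Output length of max pooling with stride equal to the pool size."""
    return length // pool

def branch_length(length, layers):
    """Apply a sequence of (kernel, pool) stages to an input length."""
    for kernel, pool in layers:
        length = pool_length(conv1d_length(length, kernel), pool)
    return length

# (kernel, pool) pairs read from Table 3 for a 400-sample heartbeat.
print(branch_length(400, [(20, 105)]))          # branch 1
print(branch_length(400, [(30, 192)]))          # branch 4
print(branch_length(400, [(30, 7), (15, 3)]))   # branch 5
```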
Transformer encoder
In this paper, Transformer is used as a processing branch for the temporal dimension features. The original ECG signal is first transformed using STFT to extract the time frequency features, which are then used as the input to Transformer. The Transformer decoder is replaced with a linear layer, relying only on the encoder for ECG signal classification, and the output size of the input embedding is 512. However, when using the embedding as an input representation, there is a problem that the embedding itself does not contain relative positional information. Therefore, it is necessary to incorporate positional encoding to express the positional information of the signal. In this paper, the positional encoding is represented using the method of sine and cosine functions, and the mathematical definition is as follows:
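| $PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d}}\right)$ | (Equation 4) |
| $PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d}}\right)$ | (Equation 5) |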
In this context, pos represents the position within the sequence, d represents the embedding dimension, 2i corresponds to even dimensions, and (2i + 1) corresponds to odd dimensions.
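A sinusoidal positional encoding of this kind can be generated as follows. This is a minimal sketch assuming the standard formulation with base 10000 and an even embedding dimension; the function name is hypothetical, and the sequence length of 11 in the usage line is only an example.

```python
import numpy as np

def positional_encoding(seq_len, d=512):
    """Sine on even dimensions (2i), cosine on odd dimensions (2i + 1)."""
    pos = np.arange(seq_len)[:, None]        # positions, shape (seq_len, 1)
    two_i = np.arange(0, d, 2)[None, :]      # even indices, shape (1, d / 2)
    angle = pos / np.power(10000.0, two_i / d)
    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = positional_encoding(11)  # added element-wise to the 512-dimensional embeddings
```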
The structure of the Transformer encoder block is illustrated in Figure 4; it consists of a stack of N encoders, and in this paper N is set to 3. Each encoder is divided into two parts: the multi-head attention mechanism and the feed-forward neural network. Both parts include residual connections and layer normalization. We set the number of heads in the multi-head attention mechanism to 4, and Q, K, and V are obtained by linear transformations of the embeddings. The self-attention mechanism calculates the similarity between the current token and the other tokens using the query Q and key K. The similarity is then used as a weight to compute a weighted sum of the values V, which becomes the representation of the current token in the subsequent layer. The formula for calculating the attention score matrix is as follows:
| $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$ | (Equation 6) |
Figure 4.
Architecture of the Transformer encoder
Here, $d_k$ is the dimension of K. Because of the nature of the softmax function, very large input values push it into regions where the gradient is very small. Therefore, the scale factor $1/\sqrt{d_k}$ is used to counteract this effect.
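The scaled dot-product attention of a single head can be sketched with NumPy. This is an illustrative, assumption-based example: the function names are hypothetical, and the learned linear projections, the four-head split, and the residual and normalization layers described above are omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """Return the attended values and the attention weights."""
    d_k = k.shape[-1]
    # Dividing by sqrt(d_k) keeps the logits in a range where softmax
    # still has useful gradients.
    weights = softmax(q @ k.T / np.sqrt(d_k))
    return weights @ v, weights
```

Each row of `weights` sums to 1, so every output token is a convex combination of the value vectors.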
Fusion block
To integrate the results obtained by both Transformer and the CNNs, inspired by ref. 36, we incorporated a self-attention mechanism and multimodal fusion mechanism into the module. The mathematical definition is as follows:
| (Equation 7) |
| (Equation 8) |
| (Equation 9) |
Here, t represents the features from the Transformer branch, to which a channel attention mechanism is first applied to further focus on its global features, and g represents the features from the CNNs branch. The Hadamard product is then used to calculate the interaction feature bp of t and g, where ω1∈ R(D×L) and ω2∈ R(C×L) are weight matrices. Finally, we concatenate the interaction feature bp with t and g and pass the result through the residual blocks to obtain the fused features.
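The interaction and concatenation steps can be illustrated with a small NumPy sketch. It is deliberately partial and rests on assumptions: ω1 and ω2 are taken to project t (dimension D) and g (dimension C) into a shared L-dimensional space before the Hadamard product, the channel attention and residual blocks are omitted, and the function name `fuse` is hypothetical.

```python
import numpy as np

def fuse(t, g, w1, w2):
    """Concatenate the Hadamard interaction feature with both branch features.

    t: Transformer features, shape (D,); g: CNN features, shape (C,).
    w1: shape (D, L); w2: shape (C, L). The projection step is an assumption.
    """
    bp = (t @ w1) * (g @ w2)            # element-wise (Hadamard) product, shape (L,)
    return np.concatenate([bp, t, g])   # shape (L + D + C,)
```

In the full model, the concatenated vector would then pass through the residual blocks before classification.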
Experiments
In this section, the performance of the proposed fusion model is evaluated, with metrics such as Sen, Spe, and others. A comparison is also made with several other methods to assess its effectiveness.
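Sensitivity (Sen) and specificity (Spe) for one class can be computed from true and predicted labels as sketched below, assuming the usual one-vs-rest definitions Sen = TP / (TP + FN) and Spe = TN / (TN + FP). The function name and the example labels are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    sen = tp / (tp + fn) if tp + fn else 0.0
    spe = tn / (tn + fp) if tn + fp else 0.0
    return sen, spe

# Toy example using AAMI class labels.
y_true = ["N", "N", "S", "S", "V", "N"]
y_pred = ["N", "S", "S", "N", "V", "N"]
sen, spe = sensitivity_specificity(y_true, y_pred, positive="S")
```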
Experimental platform
The experiments were implemented in Python using the PyTorch deep learning framework. The hardware was a computer with the following specifications: the CPU was 2 × Intel(R) Xeon(R) Gold 6330 CPU @ 2.00 GHz with 28 cores and 56 threads, with 512 GB of memory; the GPU was an NVIDIA A100 with 40 GB of memory; and the operating system was Ubuntu 20.04.3 LTS.
Experimental setup
The mainstream classification experiments in ECG analysis follow two approaches: intra-patient experiments and inter-patient experiments. In intra-patient experiments, each patient’s data are randomly partitioned, with one part assigned to the training set and the remainder used as the test set. Although intra-patient classification achieves high accuracy, the model is trained on some heartbeats of a patient and tested on other heartbeats of the same patient, so this approach may overestimate performance. Inter-patient experiments are considered more meaningful. In inter-patient experiments, the entire ECG record of each patient is assigned as a single unit to either the training set or the test set. This approach better reflects real-world scenarios and evaluates the model’s ability to generalize to unseen patients, which is why inter-patient experiments are preferred as a more realistic assessment of the model’s performance.
We reviewed several neural network-based ECG classification methods16,42,53,54,55,56,57,58,59,60 and found that they employed different evaluation protocols, mainly consisting of three experimental setups. Therefore, we adopted three dataset partitioning approaches, one for each experiment:
(1) EXP1: corresponds to the intra-patient dataset partitioning used in previous work.16,55,56,57,59 In this setup, 50% of the data were randomly extracted as the training set, and the remaining 50% were used as the test set.
(2) EXP2: corresponds to the inter-patient dataset partitioning used in previous work.53,54 Following the AAMI standard, 44 of the 48 records were selected. Among them, 22 records were used as the training set (DS1), and the remaining 22 records were used as the test set (DS2). Additionally, the first 5 min of each record from the test set were added to the training set for auxiliary training. The details are presented in Table 4.
(3) EXP3: corresponds to the inter-patient dataset partitioning used in previous work.58,60 In this setup, 20 records were selected as the training set (DS100), while the remaining 24 records were used as the test set (DS200). The details are presented in Table 5.
Table 4.
EXP2 data division
| Dataset | Records |
|---|---|
| DS1 | 101 106 108 109 112 114 115 116 118 119 122 124 201 203 205 207 208 209 215 220 223 230 |
| DS2 | 100 103 105 111 113 117 121 123 200 202 210 212 213 214 219 221 222 228 231 232 233 234 |
Table 5.
EXP3 data division
| Dataset | Records |
|---|---|
| DS100 | 100 101 103 105 106 108 109 111 112 113 114 115 116 117 118 119 121 122 123 124 |
| DS200 | 200 201 202 203 205 207 208 209 210 212 213 214 215 219 220 221 222 223 228 230 231 232 233 234 |
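The record-level splits in Tables 4 and 5 can be written down and sanity-checked directly. The sketch below copies the record numbers from the two tables and adds a hypothetical helper for the EXP2 rule about the first 5 min of each test record. The beat layout `(record, r_peak_sample, label)` is an assumption made for illustration, as is the choice to remove those first 5 min from the test set; the text only states that they were added to the training set. The 360 Hz value is the sampling rate of the MIT-BIH-AR recordings.

```python
DS1 = [101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122,
       124, 201, 203, 205, 207, 208, 209, 215, 220, 223, 230]
DS2 = [100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210,
       212, 213, 214, 219, 221, 222, 228, 231, 232, 233, 234]
DS100 = [100, 101, 103, 105, 106, 108, 109, 111, 112, 113,
         114, 115, 116, 117, 118, 119, 121, 122, 123, 124]
DS200 = [200, 201, 202, 203, 205, 207, 208, 209, 210, 212, 213, 214,
         215, 219, 220, 221, 222, 223, 228, 230, 231, 232, 233, 234]

DS1_SET, DS2_SET = set(DS1), set(DS2)

FS = 360                 # MIT-BIH-AR sampling rate in Hz
FIVE_MIN = 5 * 60 * FS   # number of samples in the first 5 min of a record


def split_exp2(beats):
    """Split beats for EXP2.

    beats: iterable of (record, r_peak_sample, label) tuples (hypothetical layout).
    Assumption: the first 5 min of each DS2 record go to training only.
    Beats from records outside DS1 and DS2 are ignored.
    """
    train, test = [], []
    for beat in beats:
        record, r_peak_sample, _label = beat
        if record in DS1_SET:
            train.append(beat)
        elif record in DS2_SET:
            (train if r_peak_sample < FIVE_MIN else test).append(beat)
    return train, test


# Both inter-patient splits use the same 44 records and never share a record.
print(len(DS1), len(DS2), len(DS100), len(DS200))
print(DS1_SET.isdisjoint(DS2_SET), set(DS100).isdisjoint(DS200))

demo = [(101, 10, "N"), (100, 10, "N"), (100, FIVE_MIN, "V"), (999, 0, "N")]
train_beats, test_beats = split_exp2(demo)
print(len(train_beats), len(test_beats))
```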
In addition, to achieve optimal learning performance, the datasets obtained from the aforementioned three partitioning approaches were augmented by duplicating the heartbeats, and the augmented datasets were then randomly divided into training and validation sets in an 8:2 ratio.
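A minimal sketch of duplication-based augmentation followed by an 8:2 split is given below. The target of duplicating each class up to the size of the largest class is an assumption, since the text does not state how many copies were made, and the function names are hypothetical.

```python
import random
from collections import Counter


def oversample_by_duplication(samples, rng):
    """Duplicate beats of smaller classes until every class matches the largest one.

    samples: list of (beat, label) pairs. The balancing target is an assumption.
    """
    by_label = {}
    for item in samples:
        by_label.setdefault(item[1], []).append(item)
    target = max(len(items) for items in by_label.values())
    augmented = []
    for items in by_label.values():
        augmented.extend(items)
        augmented.extend(rng.choice(items) for _ in range(target - len(items)))
    return augmented


def split_train_val(samples, rng, train_fraction=0.8):
    """Shuffle a copy of the samples and split it into training and validation parts."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(42)
# Toy data: 8 beats labeled "N" and 2 beats labeled "V".
toy = [(f"n{i}", "N") for i in range(8)] + [(f"v{i}", "V") for i in range(2)]
balanced = oversample_by_duplication(toy, rng)
train_set, val_set = split_train_val(balanced, rng)
print(Counter(label for _, label in balanced), len(train_set), len(val_set))
```

One consequence of duplicating before splitting is that copies of the same beat can land in both the training and validation sets, so the validation loss may look more optimistic than the loss on unseen records.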
The stochastic gradient descent (SGD) optimization algorithm is a core component in many scientific and engineering fields, because many theoretical and engineering problems can be formulated as the minimization of an objective function. However, SGD presents some challenges, such as the difficulty of choosing an initial learning rate (lr) and its limited ability to adjust lr during training. The Adam optimizer effectively addresses these issues. Therefore, we chose the Adam optimizer with an initial lr of 1e-4 and employed a dynamic learning rate strategy in which lr decreased by 10% every 5 epochs. During training, cross-entropy loss was selected as the model's loss function, and the batch size was set to 32. To keep the best-performing model, we incorporated an early stopping mechanism that stopped training if there was no significant decrease in the validation loss for 10 consecutive epochs. All trained models were then evaluated on the test set.
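The learning rate schedule and the early stopping rule can be sketched without any framework, as below. The `min_delta` threshold stands in for "no significant decrease", whose exact value is not given in the text, so it is an assumption; the class and function names are hypothetical. In PyTorch, the same schedule would typically be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.9)`.

```python
def step_lr(epoch, base_lr=1e-4, step_size=5, gamma=0.9):
    """Learning rate at a given epoch: multiplied by 0.9 (a 10% drop) every 5 epochs."""
    return base_lr * gamma ** (epoch // step_size)


class EarlyStopping:
    """Signal a stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # assumed threshold for a "significant" decrease
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy validation losses: two improving epochs, then a plateau.
losses = [1.0, 0.9] + [0.95] * 15
stopper = EarlyStopping(patience=10)
stop_epoch = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stop_epoch = epoch
        break
print(step_lr(0), step_lr(5), stop_epoch)
```

In practice the model weights would be saved whenever `best` is updated, so that the checkpoint kept after stopping is the one with the lowest validation loss.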
Evaluation metrics
The performance metrics for the classifier were adopted from Xiao et al.42 In our work, we evaluated the performance of the classifier on the 5-class classification using the Overall Accuracy. Additionally, we assessed the proposed model's ability to classify VEB beats and SVEB beats by using metrics such as accuracy (Acc), specificity (Spe), sensitivity (Sen), predictivity (Ppr), and the F1 score (F1). These metrics are defined as follows:
| $\text{Overall Accuracy} = \dfrac{\sum_{i=1}^{5} \varphi_i}{N}$ | (Equation 10) |
| $\text{Acc} = \dfrac{TP + TN}{TP + TN + FP + FN}$ | (Equation 11) |
| $\text{Spe} = \dfrac{TN}{TN + FP}$ | (Equation 12) |
| $\text{Sen} = \dfrac{TP}{TP + FN}$ | (Equation 13) |
| $\text{Ppr} = \dfrac{TP}{TP + FP}$ | (Equation 14) |
| $\text{F1} = \dfrac{2 \times \text{Ppr} \times \text{Sen}}{\text{Ppr} + \text{Sen}}$ | (Equation 15) |
where $\varphi_i$ is the number of samples of class i that were correctly predicted and N is the total number of samples. TP represents a true positive, FN represents a false negative, FP represents a false positive, and TN represents a true negative.
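As an illustration, the metrics named above can be computed from a confusion matrix with the standard one-vs-rest definitions. The sketch assumes that rows are true classes and columns are predicted classes and that every class has at least one true and one predicted sample; the function names and the toy 3-class matrix are hypothetical and are not results from this study.

```python
def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def class_metrics(cm, k):
    """One-vs-rest metrics for class index k (rows = true class, columns = predicted)."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # true class k, predicted as another class
    fp = sum(row[k] for row in cm) - tp   # another class, predicted as k
    tn = total - tp - fn - fp
    sen = tp / (tp + fn)
    ppr = tp / (tp + fp)
    return {
        "Acc": (tp + tn) / total,
        "Sen": sen,
        "Spe": tn / (tn + fp),
        "Ppr": ppr,
        "F1": 2 * sen * ppr / (sen + ppr),
    }


# Toy 3-class confusion matrix, for illustration only.
cm = [[90, 5, 5],
      [2, 8, 0],
      [1, 0, 9]]
oa = overall_accuracy(cm)
m = class_metrics(cm, 1)
print(round(oa, 4), {name: round(value, 4) for name, value in m.items()})
```

Because normal beats dominate the data, Acc and Spe for a minority class stay high even when many of its beats are missed, which is why Sen, Ppr, and F1 are reported alongside them.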
Results and discussion
Results
This section presents the experimental results for the three aforementioned experimental setups and compares them with several methods from the literature. The MIT-BIH-AR database is a small-scale database, so training on it can easily lead to overfitting. However, this issue can be mitigated by employing the early stopping mechanism. Figure 5 shows the loss curves during the EXP1, EXP2, and EXP3 training processes, with different colors representing the training and validation curves. It can be observed that between roughly epochs 20 and 50, the model reaches lower and more stable loss values on both the training and validation sets. Figure 6 shows the accuracy curves of the training process, from which it can be seen that the training accuracy gradually improves and then stabilizes as training proceeds.
Figure 5.
Loss curves for three schemes
(A) Loss of EXP1, (B) Loss of EXP2, (C) Loss of EXP3. The blue curve represents the training loss, and the yellow curve represents the validation loss. The red dashed line represents the epoch when the early stop mechanism was triggered.
Figure 6.
Accuracy curves for three schemes
(A) Train acc of EXP1, (B) Train acc of EXP2, (C) Train acc of EXP3.
Figure 7 presents the confusion matrix for the EXP1 experiment, indicating excellent results for all five beat classes with an Overall Accuracy of 99.5%. Good classification performance is also achieved for the SVEB and VEB classes, with Acc of 99.7% and 99.8%, respectively. Figure 8 displays the confusion matrix for the EXP2 experiment, showing an Overall Accuracy of 98.8% for the five classes, with Acc of 99.1% and 99.7% for the SVEB and VEB classes, respectively. Figure 9 presents the confusion matrix for the EXP3 experiment, revealing that the N and F classes maintain high classification performance, while the SVEB and VEB classes exhibit a slight decline in classification performance, with Acc of 98.0% and 99.0%, respectively. The Overall Accuracy for the five classes is 97.2%.
Figure 7.
Confusion matrix for EXP1
Figure 8.
Confusion matrix for EXP2
Figure 9.
Confusion matrix for EXP3
To verify the effectiveness of the method, we also performed ablation experiments. Table 6 reports the ablation experiment on the model branches, in which we decomposed the model into four schemes: using only the CNN branch, using only the Transformer branch, using the CNN and Transformer branches without the Fusion Block (FB), and using the complete model. The complete model achieves the highest Overall Accuracy in all three experiments. Table 7 reports the experiment on the effectiveness of the hand-crafted feature, for which we used the RR interval. After adding the RR interval, the Overall Accuracy and most of the SVEB and VEB indicators improve. We also performed ablation experiments for different inputs, as shown in Table 8. If both the CNN branch and the Transformer branch use the original ECG signal, the overall performance is slightly reduced. If both branches use STFT-processed signals as input, performance drops in all three experiments.
Table 6.
Ablation experiments for our model, the units for all evaluation indicators in the table are (%)
| Method | Scheme | VEB Acc | VEB Sen | VEB Spe | VEB Ppr | VEB F1 | SVEB Acc | SVEB Sen | SVEB Spe | SVEB Ppr | SVEB F1 | Overall Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CNN | EXP1 | 99.8 | 98.0 | 99.9 | 99.0 | 98.5 | 99.5 | 92.5 | 99.7 | 89.7 | 91.1 | 99.4 |
| CNN | EXP2 | 99.6 | 97.8 | 99.7 | 96.0 | 96.9 | 98.6 | 77.1 | 99.5 | 85.7 | 81.2 | 98.3 |
| CNN | EXP3 | 98.9 | 95.8 | 99.2 | 92.7 | 94.2 | 97.8 | 64.9 | 99.5 | 85.5 | 73.8 | 96.9 |
| Transformer | EXP1 | 99.2 | 96.8 | 99.3 | 91.1 | 93.8 | 99.0 | 88.6 | 99.2 | 74.8 | 81.1 | 98.0 |
| Transformer | EXP2 | 99.2 | 94.4 | 99.5 | 92.7 | 93.6 | 97.8 | 71.1 | 98.9 | 71.9 | 71.5 | 97.1 |
| Transformer | EXP3 | 98.5 | 92.1 | 99.2 | 92.5 | 92.3 | 98.0 | 63.5 | 99.7 | 91.0 | 74.7 | 96.8 |
| CNN+Transformer | EXP1 | 99.8 | 98.0 | 99.9 | 98.4 | 98.1 | 99.6 | 90.8 | 99.9 | 94.1 | 92.4 | 99.3 |
| CNN+Transformer | EXP2 | 99.5 | 97.5 | 99.7 | 95.6 | 96.5 | 98.3 | 74.8 | 99.3 | 81.1 | 77.8 | 98.0 |
| CNN+Transformer | EXP3 | 99.1 | 96.0 | 99.5 | 95.0 | 95.5 | 97.8 | 64.7 | 99.4 | 85.2 | 73.5 | 97.1 |
| CNN+Transformer+FB | EXP1 | 99.8 | 98.2 | 99.9 | 99.0 | 98.6 | 99.7 | 92.4 | 99.9 | 95.8 | 94.1 | 99.5 |
| CNN+Transformer+FB | EXP2 | 99.7 | 97.6 | 99.8 | 98.0 | 97.6 | 99.1 | 81.5 | 99.8 | 94.9 | 87.7 | 98.8 |
| CNN+Transformer+FB | EXP3 | 99.0 | 94.1 | 99.6 | 96.0 | 95.0 | 98.0 | 64.8 | 99.6 | 89.0 | 75.0 | 97.2 |
Table 7.
Ablation experiments on the RR interval, the units for all evaluation indicators in the table are (%)
| Method | Scheme | VEB Acc | VEB Sen | VEB Spe | VEB Ppr | VEB F1 | SVEB Acc | SVEB Sen | SVEB Spe | SVEB Ppr | SVEB F1 | Overall Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Without RR | EXP1 | 99.8 | 98.2 | 99.9 | 98.7 | 98.4 | 99.6 | 93.5 | 99.7 | 89.2 | 91.3 | 99.3 |
| Without RR | EXP2 | 99.6 | 97.7 | 99.7 | 96.4 | 97.0 | 98.7 | 77.4 | 99.5 | 87.2 | 82.0 | 98.4 |
| Without RR | EXP3 | 98.9 | 96.0 | 99.2 | 92.6 | 94.3 | 97.6 | 63.0 | 99.3 | 82.4 | 71.4 | 96.8 |
| With RR | EXP1 | 99.8 | 98.2 | 99.9 | 99.0 | 98.6 | 99.7 | 92.4 | 99.9 | 95.8 | 94.1 | 99.5 |
| With RR | EXP2 | 99.7 | 97.6 | 99.8 | 98.0 | 97.6 | 99.1 | 81.5 | 99.8 | 94.9 | 87.7 | 98.8 |
| With RR | EXP3 | 99.0 | 94.1 | 99.6 | 96.0 | 95.0 | 98.0 | 64.8 | 99.6 | 89.0 | 75.0 | 97.2 |
Table 8.
Ablation experiments on model inputs, the units for all evaluation indicators in the table are (%)
| Method | Scheme | VEB Acc | VEB Sen | VEB Spe | VEB Ppr | VEB F1 | SVEB Acc | SVEB Sen | SVEB Spe | SVEB Ppr | SVEB F1 | Overall Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ECG original signal | EXP1 | 99.8 | 98.5 | 99.9 | 98.1 | 98.3 | 99.6 | 91.3 | 99.8 | 92.9 | 92.1 | 99.3 |
| ECG original signal | EXP2 | 99.7 | 98.1 | 99.8 | 96.8 | 97.4 | 98.7 | 75.4 | 99.7 | 90.0 | 82.0 | 98.4 |
| ECG original signal | EXP3 | 98.5 | 90.6 | 99.4 | 94.0 | 92.3 | 98.2 | 65.8 | 99.8 | 93.6 | 77.2 | 97.0 |
| STFT features | EXP1 | 99.6 | 95.4 | 99.9 | 97.9 | 96.7 | 99.4 | 81.4 | 99.9 | 94.7 | 87.6 | 98.8 |
| STFT features | EXP2 | 93.7 | 95.9 | 93.5 | 50.6 | 66.2 | 96.2 | 77.0 | 97.0 | 50.8 | 61.2 | 90.1 |
| STFT features | EXP3 | 97.5 | 91.9 | 98.1 | 84.0 | 87.8 | 96.8 | 60.0 | 98.6 | 67.7 | 63.5 | 94.6 |
| ECG original signal + STFT features | EXP1 | 99.8 | 98.2 | 99.9 | 99.0 | 98.6 | 99.7 | 92.4 | 99.9 | 95.8 | 94.1 | 99.5 |
| ECG original signal + STFT features | EXP2 | 99.7 | 97.6 | 99.8 | 98.0 | 97.6 | 99.1 | 81.5 | 99.8 | 94.9 | 87.7 | 98.8 |
| ECG original signal + STFT features | EXP3 | 99.0 | 94.1 | 99.6 | 96.0 | 95.0 | 98.0 | 64.8 | 99.6 | 89.0 | 75.0 | 97.2 |
Based on the experimental results presented earlier, we compared our results with the existing literature,16,42,53,54,55,56,57,58,59,60 and Tables 9, 10, and 11 display the comparison results. In each table, bold values are the maximum and underlined values are the second largest.
Table 9.
Comparison of the proposed method with the existing EXP1 method, the units for all evaluation indicators in the table are (%)
| Method | VEB Acc | VEB Sen | VEB Spe | VEB Ppr | VEB F1 | SVEB Acc | SVEB Sen | SVEB Spe | SVEB Ppr | SVEB F1 | Overall Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018 Xie et al.16 | 99.6 | 98.8 | 99.6 | 95.5 | 97.1 | 99.1 | 92.7 | 99.3 | 80.2 | 86.0 | N/A |
| 2022 Xiao et al.42 | 99.7 | 97.2 | 99.8 | 97.5 | 97.3 | 99.6 | 86.6 | 99.9 | 97.3 | 91.7 | 99.1 |
| 2020 Wang et al.55 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 96.3 |
| 2019 Wang et al.56 | 99.1 | 91.8 | 99.6 | 95.3 | 93.5 | 99.4 | 79.5 | 99.9 | 96.3 | 87.2 | 98.4 |
| 2016 Li et al.57 | 97.4 | 92.1 | 98.1 | 88.2 | 90.1 | 98.4 | 88.6 | 99.1 | 88.8 | 88.7 | 94.8 |
| 2018 Murugesan et al.59 | N/A | 97.0 | N/A | 97.0 | 97.0 | N/A | 96.0 | N/A | 75.0 | 84.0 | 97.6 |
| Proposed | 99.8 | 98.2 | 99.9 | 99.1 | 98.6 | 99.7 | 92.4 | 99.9 | 95.8 | 94.1 | 99.5 |
Table 10.
Comparison of the proposed method with the existing EXP2 method, the units for all evaluation indicators in the table are (%)
| Method | VEB Acc | VEB Sen | VEB Spe | VEB Ppr | VEB F1 | SVEB Acc | SVEB Sen | SVEB Spe | SVEB Ppr | SVEB F1 | Overall Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022 Xiao et al.42 | 99.6 | 96.9 | 99.8 | 97.1 | 97.0 | 99.1 | 77.7 | 99.9 | 97.7 | 86.6 | 98.6 |
| 2019 Saadatnejad et al.53 | 99.6 | 95.8 | 99.9 | 97.8 | 96.8 | 99.0 | 75.6 | 99.9 | 98.9 | 85.7 | N/A |
| 2019 Amirshahi et al.54 | 97.9 | 80.2 | 99.8 | 97.3 | 88.0 | N/A | N/A | N/A | N/A | N/A | 97.3 |
| Proposed | 99.7 | 97.6 | 99.8 | 97.6 | 97.6 | 99.1 | 81.5 | 99.8 | 94.9 | 87.7 | 98.8 |
Table 11.
Comparison of the proposed method with the existing EXP3 method, the units for all evaluation indicators in the table are (%)
| Method | VEB Acc | VEB Sen | VEB Spe | VEB Ppr | VEB F1 | SVEB Acc | SVEB Sen | SVEB Spe | SVEB Ppr | SVEB F1 | Overall Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022 Xiao et al.42 | 99.4 | 95.9 | 99.8 | 98.3 | 97.1 | 98.0 | 59.5 | 99.9 | 95.9 | 73.4 | 97.3 |
| 2015 Kiranyaz et al.58 | 98.4 | 95.0 | 98.7 | 88.1 | 91.4 | 96.6 | 64.6 | 98.1 | 62.0 | 63.3 | 95.1 |
| 2019 Xu et al.60 | 98.9 | 93.1 | 99.5 | 95.6 | 94.3 | 96.7 | 78.3 | 99.7 | 92.5 | 84.8 | 97.6 |
| Proposed | 99.0 | 94.1 | 99.6 | 95.9 | 95.0 | 98.0 | 64.8 | 99.6 | 89.0 | 75.0 | 97.2 |
Wang et al.55 addressed the lack of interpretability in end-to-end trained deep learning models by proposing an interpretable arrhythmia classification method based on human-machine collaborative knowledge representation, achieving an Overall Accuracy of 96.3%. Wang et al.56 proposed a two-stage end-to-end neural network and a diagnosis-based adaptive compression scheme for energy-efficient wearable intelligent ECG monitoring, achieving an Overall Accuracy of 98.4%. Li et al.57 used a parallel general regression neural network for heartbeat classification, achieving an Overall Accuracy of 94.8%, and Murugesan et al.59 proposed three robust deep neural network architectures for feature extraction and classification of 2-s ECG signals, achieving an Overall Accuracy of 97.6%. Xie et al.16 proposed a combined network using bidirectional recurrent neural networks and convolutional neural networks for classification, and Xiao et al.42 proposed the ULECGNet method, which had an Overall Accuracy of 99.1%. Compared with the aforementioned literature, our method achieved the highest Overall Accuracy of 99.5% in the EXP1 scheme, surpassing the best previous method by 0.4 percentage points. Moreover, we also achieved excellent performance on the SVEB and VEB classes.
Amirshahi et al.54 used a spiking neural network-based model and employed spike-timing-dependent plasticity (STDP) and reward-modulated STDP (R-STDP) techniques, achieving an Overall Accuracy of 97.3%. Saadatnejad et al.53 utilized a novel architecture consisting of a wavelet transform and multiple long short-term memory (LSTM) recurrent neural networks for classification, resulting in improved performance in the SVEB and VEB classes, and Xiao et al.42 demonstrated a significant performance improvement with an Overall Accuracy of 98.6%. In comparison, our method achieved the highest Overall Accuracy of 98.8% in the EXP2 scheme. For the SVEB and VEB classes, our method also achieved the highest values for all metrics except Spe and Ppr, and almost all of its Spe and Ppr values are the second-best results.
Kiranyaz et al.58 proposed a patient-specific one-dimensional convolutional neural network model that integrates feature extraction and classification for ECG classification into a single learning body. The proposed model was validated in EXP3. By comparing the results, we validated the effectiveness of our method, which achieved high Overall Accuracy across the 5 categories compared with the methods in the literature. Additionally, for EXP1 and EXP2, our method achieved the highest Overall Accuracy among the compared references, and most of the SVEB and VEB class indicators were also the highest. For EXP3, our SVEB and VEB performance is second only to that of ref. 42. Finally, Xu et al.60 introduced a deep neural network utilizing patient-specific identity vectors. This model not only utilized the raw ECG waveforms from specific patients but also provided compact representations of ECG features through vectors. They achieved an Overall Accuracy of 97.6%. Our method had a slightly lower Overall Accuracy than their work, with a difference of 0.4 percentage points. For a more comprehensive analysis, we also compared other key metrics. Our SVEB and VEB performance is generally better than that of ref. 60, and while our EXP3 results are slightly lower than those of ref. 42, our EXP1 and EXP2 results are better than those of ref. 42. However, there is still room for improvement in the performance on EXP3.
It can be observed that our method achieved the highest Overall Accuracy in both the EXP1 and EXP2 schemes. In the EXP3 scheme, our method maintains a high Overall Accuracy together with good performance on the small classes. Comparing the confusion matrices above, it can be found that the classification performance of the EXP3 scheme on SVEB and VEB is lower than that of the EXP2 scheme, which may be because the EXP3 data partitioning reduces the proportion of these two classes in the training set.
Discussion
In this study, we examined the features of the ECG in the two dimensions of time and morphology and extracted these features with a fusion model to improve the accuracy of 5-class classification. The classification results show that our method is effective and that the solution can be used for ECG classification tasks in which the training data are unstructured and unbalanced and can be represented as a one-dimensional time series with a standard duration, as is the case for portable single-lead ECG devices. Our model achieves its best results after fewer training rounds, and the results are more stable. The addition of the RR interval also reduces the time required for training. Similar to previous ECG classification results, our method does not perform as well in the inter-patient paradigm as in the intra-patient paradigm, possibly because the database provides fewer features relevant to the individual. In addition, the RR interval has been shown to be effective in improving classification results as a hand-extracted feature, especially for some specific diseases.
Conclusions
In this paper, we proposed a novel ECG classification method that combines CNNs and a Transformer. First, the data were denoised, and RR intervals were extracted. Data augmentation techniques were then applied to address the issue of data imbalance. Next, the ECG signals were input into separate branches to learn morphological and temporal features, and finally, the outputs were fused to obtain the final classification results. We validated the performance of our method on the publicly available MIT-BIH-AR database using three different schemes. The Overall Accuracy values were 99.5%, 98.8%, and 97.2% for the respective schemes. Our experimental results also demonstrated the effectiveness and efficiency of SVEB and VEB detection, which further promotes the practical application of our proposed method.
Cardiovascular disease is related to the body's circulatory system and can cause complications such as arrhythmias and heart failure; in severe cases, it can be life-threatening. Being able to correctly detect and automatically identify cardiovascular diseases is of great significance for the timely and effective prevention and control of such diseases. Our research helps doctors identify potential patients with cardiovascular disease by relying on devices such as ECG machines, which can also monitor patients in real time and provide early warning at onset.
Limitations of the study
Despite the success of this method, there are still some limitations. For instance, abnormal heartbeats occur less frequently than normal heartbeats, and even with the use of data augmentation techniques, it may be challenging to achieve optimal performance. Larger databases with more abnormal signals would greatly contribute to better training of the model for heart rhythm classification. The MIT-BIH-AR database records data from 47 subjects, so this study has a small sample size. Although the performance in this study is good, further research may be needed before clinical use. In the future, we intend to continue our research on large databases and to explore better solutions to the problem of data imbalance. We will try to extend our approach to the CCDD clinical databases and refine our model to improve its generalization performance. In addition, ECG methods for classifying specific diseases will be studied. We will also explore the application of different temporal models in ECG classification studies.
STAR★Methods
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Deposited data | ||
| Dataset | This paper | https://physionet.org/content/mitdb/1.0.0/ |
| Code | This paper | https://github.com/wjnGXNU/Fusion-model |
| Software and algorithms | ||
| Python 3.7 | Python | https://www.python.org/downloads/windows/ |
| PyTorch 1.11.0 | PyTorch | https://pytorch.org/ |
| Pandas 1.2.4 | Pandas | https://pandas.pydata.org/ |
| Numpy 1.21.5 | Numpy | https://numpy.org/ |
Resource availability
Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Feiyan Zhou (zhfyyf15@126.com).
Materials availability
Materials are available upon request to Feiyan Zhou (zhfyyf15@126.com).
Data and code availability
• The code is available at https://github.com/wjnGXNU/Fusion-model.
• The data are publicly accessible at https://physionet.org/content/mitdb/1.0.0/.
• Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Experimental model and study participant details
The dataset for ECG classification comes from the internationally recognized and widely used MIT-BIH-AR database provided by the Massachusetts Institute of Technology (MIT). See the experiments section for specific experiment details.
Method details
On the MIT-BIH-AR database, we conducted a 5-category study of ECGs, both intra-patient and inter-patient. The acquired ECG signals were preprocessed to improve the signal quality, and two additional features, RR interval and STFT, were extracted to improve the performance of the method. For more information about data organization, see Data processing.
Traditional machine learning models include support vector machines, k-nearest neighbors (KNN), decision trees, and ensemble learning, among others. The main deep learning models include CNNs and Transformers. The use of fusion models avoids manual feature extraction and fuses different features of the ECG. For details about the description and parameters of the model, see Fusion model.
Three different data divisions were used to validate the model, using commonly used evaluation indicators such as Acc, Sen, and Spe. For more information, see the Experiments section.
Quantification and statistical analysis
The evaluation indicators of the model include the Overall Accuracy, Acc, Spe, Sen, Ppr and F1. For more information, please refer to evaluation metrics.
Acknowledgments
This work was supported by Guangxi Science and Technology Base and Talent Special Project (No. GuiKe AD21159003), and the National Natural Science Foundation of China (No. 62006055).
Author contributions
Feiyan Zhou: Supervision, Writing-review and editing, Funding acquisition. Jiannan Wang: Investigation, Methodology, Experimental design, Writing-original draft.
Declaration of interests
The authors have no disclosures to declare.
Published: February 23, 2024
References
- 1. World Health Organization. 2023. Cardiovascular Diseases (CVDs). https://www.who.int/en/newsroom/fact-sheets/detail/cardiovascular-diseases-(cvds)/
- 2. Osowski S., Hoai L.T., Markiewicz T. Support vector machine-based expert system for reliable heartbeat recognition. IEEE Trans. Biomed. Eng. 2004;51:582–589. doi: 10.1109/TBME.2004.824138.
- 3. Vapnik V.N. Springer; 1995. The Nature of Statistical Learning Theory. https://link.springer.com/book/10.1007/978-1-4757-3264-1
- 4. Vapnik V. Wiley; 1998. Statistical Learning Theory. https://www.researchgate.net/publication/220694713_Statistical_Learning_Theory
- 5. Cortes C., Vapnik V. Support-Vector Networks. Mach. Learn. 1995;20:273–297. https://link.springer.com/article/10.1007/BF00994018
- 6. Raj S., Ray K.C., Shankar O. Cardiac arrhythmia beat classification using DOST and PSO tuned SVM. Comput. Methods Programs Biomed. 2016;136:163–177. doi: 10.1016/j.cmpb.2016.08.016.
- 7. Fejtová M., Macek J., Lhotská L. Final Programme Proceedings EUNITE; 2001. ECG Events Detection and Classification Using Wavelet Transform and Decision Trees; pp. 99–101. https://www.researchgate.net/publication/229028591_ECG_Events_Detection_and_Classification_Using_Wavelet_Transform_and_Decision_Tress
- 8. Luz E.J.d.S., Nunes T.M., De Albuquerque V.H.C., Papa J.P., Menotti D. ECG arrhythmia classification based on optimum-path forest. Expert Syst. Appl. 2013;40:3561–3573. doi: 10.1016/j.eswa.2012.12.063.
- 9. Sharma M., Tan R.S., Acharya U.R. A novel automated diagnostic system for classification of myocardial infarction ECG signals using an optimal biorthogonal filter bank. Comput. Biol. Med. 2018;102:341–356. doi: 10.1016/j.compbiomed.2018.07.005.
- 10. Javadi M., Ebrahimpour R., Sajedin A., Faridi S., Zakernejad S. Improving ECG classification accuracy using an ensemble of neural network modules. PLoS One. 2011;6 doi: 10.1371/journal.pone.0024386.
- 11. Liang W., Zhang Y., Tan J., Li Y. A novel approach to ECG classification based upon two-layered HMMs in body sensor networks. Sensors. 2014;14:5994–6011. doi: 10.3390/s140405994.
- 12. Rai H.M., Trivedi A., Shukla S. ECG signal processing for abnormalities detection using multi-resolution wavelet transform and Artificial Neural Network classifier. Measurement. 2013;46:3238–3246. doi: 10.1016/j.measurement.2013.05.021.
- 13. Hassan A.R., Haque M.A. An expert system for automated identification of obstructive sleep apnea from single-lead ECG using random under sampling boosting. Neurocomputing. 2017;235:122–130. doi: 10.1016/j.neucom.2016.12.062.
- 14. Zhang J., Tian J., Cao Y., Yang Y., Xu X., Wen C. Fine-Grained ECG Classification Based on Deep CNN and Online Decision Fusion. Preprint at arXiv. 2019. https://www.researchgate.net/publication/330553668_Fine-grained_ECG_Classification_Based_on_Deep_CNN_and_Online_Decision_Fusion
- 15. Limam M., Precioso F. 2017. Atrial Fibrillation Detection and ECG Classification Based on Convolutional Recurrent Neural Network.
- 16. Xie P., Wang G., Zhang C., Chen M., Zhang P. IEEE Access; 2018. Bidirectional Recurrent Neural Network and Convolutional Neural Network (BiRCNN) for ECG Beat Classification.
- 17. Ganguly B., Ghosal A., Das A., Das D., Chatterjee D., Rakshit D. Automated Detection and Classification of Arrhythmia From ECG Signals Using Feature-Induced Long Short-Term Memory Network. IEEE Sens. Lett. 2020;4:1–4. doi: 10.1109/LSENS.2020.3006756.
- 18. Guan J., Wang W., Feng P., Wang X., Wang W. Low-dimensional denoising embedding transformer for ECG classification. IEEE Access. 2021;41:1285–1289. doi: 10.1109/ICASSP39728.2021.9413766.
- 19. Kozumplík J., Provazník I. Fast time-varying linear filters for suppression of baseline drift in electrocardiographic signals. Biomed. Eng. Online. 2017;16:24. doi: 10.1186/s12938-017-0316-0.
- 20. Tulyakova N., Trofymchuk O. Real-time filtering adaptive algorithms for non-stationary noise in electrocardiograms. Biomed. Signal Process Control. 2022;72 doi: 10.1016/j.bspc.2021.103308.
- 21. Gupta V., Mittal M., Mittal V., Saxena N.K. A critical review of feature extraction techniques for ECG signal analysis. J. Inst. Eng. India. Ser. B. 2021;102:1049–1060. doi: 10.1007/s40031-021-00606-5.
- 22. Gramatikov B., Georgiev I. Wavelets as alternative to short-time Fourier transform in signal-averaged electrocardiography. Med. Biol. Eng. Comput. 1995;33:482–487. doi: 10.1007/BF02510534.
- 23. Rahul J., Sharma L.D. Artificial intelligence-based approach for atrial fibrillation detection using normalised and short-duration time-frequency ECG. Biomed. Signal Process Control. 2022;71 doi: 10.1016/j.bspc.2021.103270.
- 24. Sinha N., Kumar Tripathy R., Das A. ECG beat classification based on discriminative multilevel feature analysis and deep learning approach. Biomed. Signal Process Control. 2022;78 doi: 10.1016/j.bspc.2022.103943.
- 25. Jafari M., Shoeibi A., Khodatars M., Ghassemi N., Moridian P., Alizadehsani R., Khosravi A., Ling S.H., Delfan N., Zhang Y.-D., et al. Automated diagnosis of cardiovascular diseases from cardiac magnetic resonance imaging using deep learning models: A review. Comput. Biol. Med. 2023;160 doi: 10.1016/j.compbiomed.2023.106998.
- 26. Qin J., Gao F., Wang Z., Wong D.C., Zhao Z., Relton S.D., Fang H. A novel temporal generative adversarial network for electrocardiography anomaly detection. Artif. Intell. Med. 2023;136 doi: 10.1016/j.artmed.2023.102489.
- 27. Yan G., Liang S., Zhang Y., Liu F. IEEE; 2019. Fusing Transformer Model with Temporal Features for ECG Heartbeat Classification; pp. 898–905.
- 28. Tesfai H., Saleh H., Al-Qutayri M., Mohammad M.B., Tekeste T., Khandoker A., Mohammad B. Lightweight Shufflenet Based CNN for Arrhythmia Classification. IEEE Access. 2022;10:111842–111854. doi: 10.1109/ACCESS.2022.3215665.
- 29. Mostayed A., Luo J., Shu X., Wee W. Classification of 12-Lead ECG signals with bi-directional LSTM network. Preprint at arXiv. 2018. doi: 10.48550/arXiv.1811.02090.
- 30. de Santana J.G., Costa M.G., Costa Filho C.F.F. IEEE; 2021. A New Approach to Classify Cardiac Arrythmias Using 2D Convolutional Neural Networks; pp. 566–570.
- 31. Xu W., Wang L., Wang B., Cheng W. Intelligent Recognition Algorithm of Multiple Myocardial Infarction Based on Morphological Feature Extraction. Processes. 2022;10:2348. doi: 10.3390/pr10112348.
- 32. Natarajan A., Chang Y., Mariani S., Rahman A., Boverman G., Vij S., Rubin J. IEEE; 2020. A Wide and Deep Transformer Neural Network for 12-lead ECG Classification; pp. 1–4.
- 33. Lu P., Guo S., Wang Y., Qi L., Han X., Wang Y. Springer; 2019. Ecg Classification Based on Long Short-Term Memory Networks; pp. 129–140.
- 34. Guanglong M., Xiangqing W., Junsheng Y. IOP Publishing; 2019. ECG Signal Classification Algorithm Based on Fusion Features. https://iopscience.iop.org/article/10.1088/1742-6596/1207/1/012003
- 35. Ahmad Z., Tabassum A., Guan L., Khan N.M. ECG heartbeat classification using multimodal fusion. IEEE Access. 2021;9:100615–100626. doi: 10.1109/ACCESS.2021.3097614.
- 36. Zhang Y., Liu H., Hu Q. Springer; 2021. Transfuse: Fusing Transformers and Cnns for Medical Image Segmentation; pp. 14–24.
- 37. Arlington V. 1998. Testing and reporting performance results of cardiac rhythm and st segment measurement algorithms. ANSI-AAMI EC57.
- 38. Wang S.-H., Zhou J., Zhang Y.-D. Community-acquired pneumonia recognition by wavelet entropy and cat swarm optimization. Mobile Netw. Appl. 2022;27:1–18. doi: 10.1007/s11036-021-01897-0.
- 39. Wang S.-H., Wu X., Zhang Y.-D., Tang C., Zhang X. Diagnosis of COVID-19 by wavelet Renyi entropy and three-segment biogeography-based optimization. Int. J. Comput. Intell. Syst. 2020;13:1332–1344. doi: 10.2991/ijcis.d.200828.001.
- 40. Anis M.T., Sharma V. IEEE; 2022. Classification of ECG Signal Using CNN Algorithm; pp. 185–189.
- 41. Raj S., Ray K.C. A personalized point-of-care platform for real-time ECG monitoring. IEEE Trans. Consumer Electron. 2018;64:452–460. doi: 10.1109/TCE.2018.2877481.
- 42. Xiao J., Liu J., Yang H., Liu Q., Wang N., Zhu Z., Chen Y., Long Y., Chang L., Zhou L., Zhou J. ULECGNet: An ultra-lightweight end-to-end ECG classification neural network. IEEE J. Biomed. Health Inform. 2022;26:206–217. doi: 10.1109/JBHI.2021.3090421.
- 43. Lanfranchi P.A., Somers V.K. Principles and Practice of Sleep Medicine. Fifth Edition. Elsevier Inc; 2010. Cardiovascular physiology: autonomic control in health and in sleep disorders; pp. 226–236.
- 44. Li J., Ashraf A., Cardiff B., Panicker R.C., Lian Y., John D. Low power optimisations for iot wearable sensors based on evaluation of nine qrs detection algorithms. IEEE Open J. Circuits Syst. 2020;1:115–123. doi: 10.1109/OJCAS.2020.3009822.
- 45. John A., Redmond S.J., Cardiff B., John D. A multimodal data fusion technique for heartbeat detection in wearable IoT sensors. IEEE Internet Things J. 2022;9:2071–2082. doi: 10.1109/JIOT.2021.3093112.
- 46. Xiaolin L., Xiang F., Panicker R.C., Cardiff B., John D. IEEE; 2023. Classification of Ecg Based on Hybrid Features Using Cnns for Wearable Applications; pp. 1–4.
- 47. Wasimuddin M., Elleithy K., Abuzneid A.-S., Faezipour M., Abuzaghleh O. Stages-based ECG signal analysis from traditional signal processing to machine learning approaches: A survey. IEEE Access. 2020;8:177782–177803. doi: 10.1109/ACCESS.2020.3026968.
- 48.Wang S.-H., Govindaraj V.V., Górriz J.M., Zhang X., Zhang Y.-D. Covid-19 classification by FGCNet with deep feature fusion from graph convolutional network and convolutional neural network. Inf. Fusion. 2021;67:208–229. doi: 10.1016/j.inffus.2020.10.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Wang S.-H., Nayak D.R., Guttery D.S., Zhang X., Zhang Y.-D. COVID-19 classification by CCSHNet with deep fusion using transfer learning and discriminant correlation analysis. Inf. Fusion. 2021;68:131–148. doi: 10.1016/j.inffus.2020.11.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Niu J., Tang Y., Sun Z., Zhang W. Inter-patient ECG classification with symbolic representations and multi-perspective convolutional neural networks. IEEE J. Biomed. Health Inform. 2020;24:1321–1332. doi: 10.1109/JBHI.2019.2942938. [DOI] [PubMed] [Google Scholar]
- 51.Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z. 2016. Rethinking the Inception Architecture for Computer Vision; pp. 2818–2826.https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.html [Google Scholar]
- 52.Dai W., Dai C., Qu S., Li J., Das S. IEEE; 2017. Very Deep Convolutional Neural Networks for Raw Waveforms; pp. 421–425. [DOI] [Google Scholar]
- 53.Saadatnejad S., Oveisi M., Hashemi M. LSTM-based ECG classification for continuous monitoring on personal wearable devices. IEEE J. Biomed. Health Inform. 2020;24:515–523. doi: 10.1109/JBHI.2019.2911367. [DOI] [PubMed] [Google Scholar]
- 54.Amirshahi A., Hashemi M. ECG classification algorithm based on STDP and R-STDP neural networks for real-time monitoring on ultra low-power personal wearable devices. IEEE Trans. Biomed. Circuits Syst. 2019;13:1483–1493. doi: 10.1109/TBCAS.2019.2948920. [DOI] [PubMed] [Google Scholar]
- 55.Wang J., Li R., Li R., Fu B., Xiao C., Chen D.Z. Towards interpretable arrhythmia classification with human-machine collaborative knowledge representation. IEEE Trans. Biomed. Eng. 2021;68:2098–2109. doi: 10.1109/TBME.2020.3024970. [DOI] [PubMed] [Google Scholar]
- 56.Wang N., Zhou J., Dai G., Huang J., Xie Y. Energy-efficient intelligent ECG monitoring for wearable devices. IEEE Trans. Biomed. Circuits Syst. 2019;13:1112–1121. doi: 10.1109/TBCAS.2019.2930215. [DOI] [PubMed] [Google Scholar]
- 57.Li P., Wang Y., He J., Wang L., Tian Y., Zhou T.-s., Li T., Li J.-s. High-performance personalized heartbeat classification model for long-term ECG signal. IEEE Trans. Biomed. Eng. 2017;64:78–86. doi: 10.1109/TBME.2016.2539421. [DOI] [PubMed] [Google Scholar]
- 58.Kiranyaz S., Ince T., Gabbouj M. Real-time patient-specific ECG classification by 1-D convolutional neural networks. IEEE Trans. Biomed. Eng. 2016;63:664–675. doi: 10.1109/TBME.2015.2468589. [DOI] [PubMed] [Google Scholar]
- 59.Murugesan B., Ravichandran V., Ram K., Preejith S., Joseph J., Shankaranarayana S.M., Sivaprakasam M. IEEE; 2018. Ecgnet: Deep Network for Arrhythmia Classification; pp. 1–6. [DOI] [Google Scholar]
- 60.Xu S.S., Mak M.-W., Cheung C.-C. I-vector-based patient adaptation of deep neural networks for automatic heartbeat classification. IEEE J. Biomed. Health Inform. 2020;24:717–727. doi: 10.1109/JBHI.2019.2919732. [DOI] [PubMed] [Google Scholar]
Data Availability Statement
- The code is available at: https://github.com/wjnGXNU/Fusion-model.
- The data are publicly accessible at: https://physionet.org/content/mitdb/1.0.0/.
- Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.









