Heartbeat Classification Based on Multifeature Combination and Stacking-DWKNN Algorithm

Shasha Ji; Runchuan Li; Shengya Shen; Bicao Li; Bing Zhou; Zongmin Wang

doi:10.1155/2021/8811837

2021 Jan 28;2021:8811837. doi: 10.1155/2021/8811837


PMCID: PMC7861929 PMID: 33575022

Abstract

Arrhythmia is one of the most common cardiac abnormalities and can threaten human life. To distinguish arrhythmias more accurately, this paper proposes a classification strategy based on multifeature combination and the Stacking-DWKNN algorithm. The method consists of four modules. In the preprocessing module, the signal is denoised and segmented. Then, multiple features are extracted: single-heartbeat morphology, P length, QRS length, T length, PR interval, ST segment, QT interval, RR interval, R amplitude, and T amplitude. Subsequently, the features are combined and normalized, and the effect of different feature combinations on heartbeat classification is analyzed to select the optimal combination. Finally, four types of normal and abnormal heartbeats are identified using the Stacking-DWKNN algorithm. The method is evaluated on the MIT-BIH arrhythmia database. The results show a sensitivity of 89.42% and a positive predictive value of 94.90% for S-type beats, and a sensitivity of 97.21% and a positive predictive value of 97.07% for V-type beats. The average accuracy is 99.01%. Compared with other models using the same features, this method improves accuracy and achieves a higher positive predictive value and sensitivity, which is important for clinical decision-making.

1. Introduction

Cardiovascular disease is one of the main diseases that endanger human health [1]. Arrhythmia is a common cardiovascular syndrome, and its accurate identification is an essential part of preventing cardiovascular disease. Most arrhythmias are harmless, but some may immediately threaten a patient's life, and early detection followed by proper treatment can prolong life. The electrocardiogram (ECG) is a popular and mature diagnostic tool. It contains basic physiological information for analyzing cardiac function [2] and is the most basic method for diagnosing arrhythmia. Different classes of arrhythmia can be detected by analyzing changes in the ECG waveform, but the recording usually has to be made at the onset of the episode. Because some patients' symptoms appear infrequently, a conventional ECG may miss the episode, and dynamic ECG is needed to record long-term cardiac electrical activity [3].

Relying on manual analysis of ECG signals can be time-consuming and impractical. Moreover, because of noise interference and the diversity of ECG waveforms, arrhythmia is difficult to diagnose accurately and is easily misdiagnosed. Manual reading of electrocardiograms also cannot be done in real time, which may delay the best window for treatment. Applying computer-aided intelligent diagnosis to arrhythmia classification can help doctors diagnose arrhythmias more accurately and reduce their workload. Numerous algorithms have been proposed in the literature to classify heartbeats accurately, and they fall mainly into deep learning-based approaches and feature extraction-based approaches.

Deep neural networks usually work in an end-to-end way, do not require manual feature extraction, and are widely used for ECG classification [4]. However, although they are good at learning feature representations and have produced very competitive performance in a wide range of applications, they cannot analyze the impact of specific features on classification performance.

Traditional feature extraction methods have achieved good performance in ECG classification. Researchers usually feed the extracted features to a machine learning model to classify heartbeats. Deep learning-based approaches have produced classification performance competitive with feature extraction-based methods. However, simple machine learning models can still match the classification performance of deep learning models, which suggests that feature extraction-based methods still have room for improvement.

In this paper, a heartbeat classification method based on multifeature combination and the Stacking-DWKNN model is proposed to address the shortcomings of deep learning methods and traditional machine learning methods. The distance-weighted KNN algorithm (DWKNN) improves the KNN model by weighting each neighbor's vote according to its distance. The proposed method further improves classification performance. The main contributions of this paper are as follows:

Different feature combinations are constructed. The suitability of every single feature is evaluated, and the results of different feature combinations on classification are analyzed to obtain the optimal feature combination.
Different model fusion methods are used for heartbeat classification to obtain the optimal model fusion method.
The Stacking-DWKNN model with the optimal feature combination is employed to distinguish normal beats (N), supraventricular ectopic beats (S), ventricular ectopic beats (V), and fusion beats (F), which is of great significance for clinical diagnosis.
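
To make the distance-weighting idea concrete, the following is a minimal sketch of a distance-weighted KNN vote. The inverse-distance weight, the function name `dwknn_predict`, and the toy data are assumptions made for illustration; the exact weighting scheme used in the paper is not given in this part of the text.

```python
import math
from collections import defaultdict

def dwknn_predict(train_X, train_y, query, k=3, eps=1e-9):
    """Classify `query` by a distance-weighted vote among its k nearest neighbors.

    Assumption: each neighbor votes with weight 1 / (distance + eps).
    """
    # Euclidean distance from the query to every training sample.
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest samples.
    neighbors = sorted(dists, key=lambda pair: pair[0])[:k]
    # Accumulate inverse-distance weights per class.
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += 1.0 / (dist + eps)
    return max(votes, key=votes.get)

# Toy data: the query has one very close "N" neighbor and two farther "V" neighbors.
X = [(0.0, 0.0), (1.0, 0.0), (1.1, 0.0), (5.0, 5.0)]
y = ["N", "V", "V", "N"]
print(dwknn_predict(X, y, (0.1, 0.0), k=3))
```

In this toy case an unweighted 3-nearest-neighbor majority vote would pick "V" (two votes against one), whereas the weighted vote favors the much closer "N" sample. That difference is the motivation for weighting by distance.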

The other parts of this paper are structured as follows. Section 2 introduces related work. The methods of heartbeat classification are introduced in Section 3. Experimental analysis and classification results are described in Section 4. Section 5 summarizes the full text.

2. Related Work

In the early days, the diagnosis of arrhythmias was based on the experience of the doctor. However, due to the diversity of arrhythmias and the corresponding complexity of the ECG waveform, manual analysis alone is no longer practical. Intelligent ECG analysis has become a research focus in recent years, and researchers have developed a variety of classification methods for arrhythmias.

2.1. Arrhythmia Classification Based on Deep Learning

Deep learning does not require the manual design of feature extractors. It can automatically learn the features of ECG and extract the key features. It has very good robustness and makes the classification of heartbeat more efficient.

Some researchers [5–8] employed convolutional neural networks (CNNs), which automatically extract ECG features and significantly improve the final prediction. Some works [9, 10] proposed a deep learning architecture based on a convolutional recurrent neural network (GRNN) to detect arrhythmias. Li et al. [11] designed a deep neural network architecture, CraftNet, for accurately recognizing the features, and assembled multiple child classifiers to classify heartbeats. Li et al. [12] used a long short-term memory (LSTM) model to distinguish different categories of heartbeats. Ebrahimzadeh et al. [13] extracted a balanced combination of Hermite features and interval features and then employed a number of multilayer perceptron (MLP) neural networks to classify heartbeats.

These studies achieved remarkable results. Deep learning integrates feature learning into the modeling process, which makes heartbeat classification simple and effective. However, searching for the optimal combination of features within a deep learning model remains challenging.

2.2. Arrhythmia Classification Based on Feature Extraction

Traditional machine learning (ML) involves direct feature engineering, which makes the algorithms easy to interpret and understand. In addition, a comprehensive understanding of the algorithm and the structure of the data makes it easier to modify the model. In recent years, researchers have developed numerous approaches for automatic classification. Feature extraction and classification are the two most critical steps in the process and have been studied in depth. Researchers have used numerous features to describe ECG heartbeats, including Hermite functions [13], morphological features [14, 15], wavelet features [16, 17], high-order statistical features [18, 19], QRS amplitude vector [20], QRS complex wave area [21], and heartbeat intervals [22–24]. Over the past few decades, numerous algorithms have been developed to distinguish different types of arrhythmias, including linear classifiers [25–27], decision trees [28, 29], k-nearest neighbor [30–32], support vector machines [33, 34], random forests [35, 36], and ensemble classifiers [37–41].

In [27], researchers extracted ECG morphology, heartbeat intervals, and RR intervals and then applied a linear classifier to the classification task using these features. Sharma et al. [32] performed wavelet decomposition of the ECG signals using a stop-band energy (SBE) minimized dyadic orthogonal filter bank and then extracted fuzzy entropy, Renyi entropy, and fractal dimension features for accurate classification. Ensemble classifiers fuse the classification results of multiple different classifiers to achieve better performance than a single classifier. Mondéjar-Guerra et al. [34] trained a specific support vector machine (SVM) model for each feature and then combined the multiple SVMs to classify heartbeats. Shi et al. [37] constructed a hierarchical classifier improved by a threshold and an extreme gradient boosting classifier, which improved classification performance. Javadi et al. [38] integrated multiple neural network models with a stacking algorithm for ECG classification, which reduced the classification error rate. Pandey et al. [39] employed an ensemble of SVMs to classify heartbeats into four classes. Rajesh et al. [40] used intrinsic mode functions to obtain the final features and employed the AdaBoost classifier to classify heartbeats. Shi et al. [41] employed a regional feature extraction method and used an ensemble classifier to distinguish heartbeats.
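
The fusion idea behind stacking can be sketched in a few lines. The example below is an illustrative toy, not the pipeline of any cited work: the two base learners (nearest centroid and 1-NN), the interleaved two-fold split, the Hamming-distance meta-learner, and all function names are assumptions chosen to keep the sketch dependency-free.

```python
import math
from collections import defaultdict

def nearest_centroid(train_X, train_y, query):
    """Base learner 1 (stand-in): label of the closest class centroid."""
    groups = defaultdict(list)
    for x, label in zip(train_X, train_y):
        groups[label].append(x)
    centroids = {
        label: tuple(sum(col) / len(col) for col in zip(*pts))
        for label, pts in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], query))

def one_nn(train_X, train_y, query):
    """Base learner 2 (stand-in): label of the single nearest training sample."""
    idx = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return train_y[idx]

def hamming(a, b):
    """Number of positions where two base-prediction vectors disagree."""
    return sum(u != v for u, v in zip(a, b))

def dw_vote(meta_X, meta_y, meta_query, k=3, eps=1e-9):
    """Meta-learner: distance-weighted vote over base-prediction vectors."""
    ranked = sorted(zip(meta_X, meta_y), key=lambda p: hamming(p[0], meta_query))[:k]
    votes = defaultdict(float)
    for row, label in ranked:
        votes[label] += 1.0 / (hamming(row, meta_query) + eps)
    return max(votes, key=votes.get)

def stacking_predict(X, y, queries, base_learners, n_folds=2):
    """Two-level stacking: out-of-fold base predictions train the meta-learner."""
    meta_X = [None] * len(X)
    for fold in range(n_folds):
        kept = [i for i in range(len(X)) if i % n_folds != fold]
        fit_X = [X[i] for i in kept]
        fit_y = [y[i] for i in kept]
        # Each held-out sample is predicted by learners that never saw it.
        for i in range(fold, len(X), n_folds):
            meta_X[i] = tuple(f(fit_X, fit_y, X[i]) for f in base_learners)
    results = []
    for q in queries:
        # At prediction time the base learners use all training data.
        meta_q = tuple(f(X, y, q) for f in base_learners)
        results.append(dw_vote(meta_X, y, meta_q))
    return results

X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["N", "N", "N", "N", "V", "V", "V", "V"]
print(stacking_predict(X, y, [(0.5, 0.5), (5.5, 5.2)], [nearest_centroid, one_nn]))
```

The key design point is that the meta-learner is trained only on out-of-fold predictions, so it learns how far to trust each base learner on data that learner has not seen.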

Although the aforementioned studies achieved good classification results, they have several limitations: few medically meaningful features are extracted, part of the information hidden in the ECG is not revealed, the classification accuracy can still be improved, the classifiers are not evaluated with cross-validation, and robustness needs to be improved. The literature discussed in this section is summarized in Table 1.

Table 1.

Summary of related work.

Methods	Classifier	References
Deep learning	CNN	[5–8]
	GRNN	[9, 10]
	CraftNet	[11]
	LSTM	[12]
	MLP	[13]

Machine learning	SVM	[14, 16, 20, 22, 33, 34]
	HMM	[15]
	KNN	[18, 30–32]
	SVM&ICA-PCAnet	[19]
	RF	[21, 35, 36]
	LDC	[23]
	Linear classifier	[25–27]
	DT	[28, 29]
	Ensemble	[37–41]

Records	Fdr (%)	Se (%)	+p (%)	Acc (%)
100	0.04	99.96	100.00	99.96
101	0.27	100.00	99.73	99.73
103	0	100.00	100.00	100.00
106	0.59	99.56	99.85	99.41
109	0.28	100.00	99.72	99.72
113	0.06	99.94	100.00	99.94
114	0.27	99.95	99.79	99.73
117	0.07	100.00	99.93	99.93
119	0	100.00	100.00	100.00
124	0.06	99.94	100.00	99.94
202	0.38	99.62	100.00	99.62
205	0.11	99.89	100.00	99.89
208	0.82	99.53	99.66	99.18
212	0	100.00	100.00	100.00
213	0.03	99.97	100.00	99.97
215	0.03	100.00	99.97	99.97
220	0	100.00	100.00	100.00
223	0.12	99.96	99.92	99.88
228	8.19	99.90	92.51	91.81
231	0.06	99.94	100.00	99.94
232	0.79	100.00	99.22	99.21
234	0.07	99.93	100.00	99.93
Total avg	0.56	99.91	99.56	99.44

Number	ECG signal features	Introduction of ECG signal feature parameters
1	Morph	235 points of a single heartbeat
2	P_len	The time between the start and end of the P wave
3	QRS_len	The time between the start and end of the QRS complex
4	T_len	The time between the start and end of the T wave
5	RR_inter	The time between two adjacent R waves
6	PR_inter	The time from the start of the P wave to the start of the QRS complex
7	ST_seg	The time from the end of the QRS wave to the start of the T wave
8	QT_inter	The time between the QRS wave and the T wave
9	R_amp	The maximal of the R wave
10	T_amp	The maximal of the T wave
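
As a rough illustration of how the duration features in this table could be derived once wave boundaries are located, the sketch below converts fiducial sample indices into seconds. The dictionary keys, the function name `interval_features`, and the example indices are hypothetical. The 360 Hz sampling rate is that of the MIT-BIH Arrhythmia Database, and QT is assumed here to run from QRS onset to T offset.

```python
FS_HZ = 360  # sampling rate of the MIT-BIH Arrhythmia Database recordings

def interval_features(fid, prev_r_peak, fs=FS_HZ):
    """Turn fiducial sample indices of one beat into durations in seconds.

    `fid` is a hypothetical dict of sample indices for wave onsets and offsets.
    """
    def seconds(n_samples):
        return n_samples / fs

    return {
        "P_len": seconds(fid["p_off"] - fid["p_on"]),
        "QRS_len": seconds(fid["qrs_off"] - fid["qrs_on"]),
        "T_len": seconds(fid["t_off"] - fid["t_on"]),
        "RR_inter": seconds(fid["r_peak"] - prev_r_peak),
        "PR_inter": seconds(fid["qrs_on"] - fid["p_on"]),
        "ST_seg": seconds(fid["t_on"] - fid["qrs_off"]),
        # Assumption: QT is measured from QRS onset to T offset.
        "QT_inter": seconds(fid["t_off"] - fid["qrs_on"]),
    }

# Made-up fiducial points for a single beat (sample indices).
beat = {
    "p_on": 328, "p_off": 364,
    "qrs_on": 382, "r_peak": 400, "qrs_off": 418,
    "t_on": 454, "t_off": 526,
}
features = interval_features(beat, prev_r_peak=112)
print(features)
```

With these made-up indices the beat has an RR interval of 0.8 s (75 beats per minute) and a QT interval of 0.4 s.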

Features					Evaluation metrics (%)
P-QRS-T	RR_inter	PR_inter	ST_seg	QT_inter	Acc
•	•				92.17
•		•			90.89
•			•		92.85
•				•	91.78
•	•	•			92.72
•	•		•		94.71
•	•			•	94.68
•		•	•		93.92
•		•		•	93.10
•			•	•	94.53
•	•	•	•		95.39
•	•	•		•	95.27
•	•		•	•	94.76
•		•	•	•	95.26
•	•	•	•	•	95.41
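
The rows of this table correspond to every nonempty subset of the four interval features added to the P-QRS-T group. A minimal sketch of that enumeration follows; the helper name `feature_combinations` is an assumption, and the code only generates the combinations and does not evaluate them.

```python
from itertools import combinations

BASE = "P-QRS-T"
INTERVALS = ["RR_inter", "PR_inter", "ST_seg", "QT_inter"]

def feature_combinations(base=BASE, optional=INTERVALS):
    """List the base feature group joined with every nonempty subset of `optional`."""
    combos = []
    for size in range(1, len(optional) + 1):
        for subset in combinations(optional, size):
            combos.append((base, *subset))
    return combos

combos = feature_combinations()
for combo in combos:
    print(" + ".join(combo))
print(len(combos))  # 4 + 6 + 4 + 1 = 15 combinations
```

Each combination would then be used to train and score a classifier, and the combination with the highest accuracy kept.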

	Morph				Inter				Amp
	n	s	v	f	n	s	v	f	n	s	v	f
N	9009	12	12	2	8939	26	56	14	8960	20	53	2
S	40	271	1	0	135	169	8	0	306	6	0	0
V	23	0	690	3	162	5	544	5	527	3	186	0
F	12	1	8	57	47	0	7	24	68	0	9	1

	Morph				Inter				Amp
	Se(%)	Sp(%)	+p(%)	Acc(%)	Se(%)	Sp(%)	+p(%)	Acc(%)	Se(%)	Sp(%)	+p(%)	Acc(%)
N	99.71	93.22	99.17	99.00	98.94	68.90	96.29	95.66	99.17	18.54	90.86	90.38
S	86.86	99.87	95.42	99.47	54.17	99.68	84.50	98.28	19.20	99.77	20.69	96.76
V	96.37	99.78	97.05	99.54	75.98	99.25	88.46	97.60	25.98	99.34	75.00	94.16
F	73.08	99.95	91.94	99.74	30.77	99.81	55.81	99.28	12.80	99.98	33.33	99.22
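
The per-class values above can be reproduced from the confusion matrices with one-vs-rest counts. The sketch below does this for the Morph confusion matrix, assuming rows are true classes and columns are predicted classes; the function name `per_class_metrics` is illustrative.

```python
LABELS = ["N", "S", "V", "F"]

# Morph confusion matrix: rows are true classes, columns are predicted classes.
MORPH_CM = [
    [9009, 12, 12, 2],
    [40, 271, 1, 0],
    [23, 0, 690, 3],
    [12, 1, 8, 57],
]

def per_class_metrics(cm, idx):
    """One-vs-rest Se, Sp, +p, and Acc (in %) for the class at index `idx`."""
    total = sum(sum(row) for row in cm)
    tp = cm[idx][idx]
    fn = sum(cm[idx]) - tp                 # rest of the true-class row
    fp = sum(row[idx] for row in cm) - tp  # rest of the predicted-class column
    tn = total - tp - fn - fp
    return {
        "Se": 100 * tp / (tp + fn),
        "Sp": 100 * tn / (tn + fp),
        "+p": 100 * tp / (tp + fp),
        "Acc": 100 * (tp + tn) / total,
    }

def overall_accuracy(cm):
    """Share of all beats on the diagonal, in %."""
    total = sum(sum(row) for row in cm)
    return 100 * sum(cm[i][i] for i in range(len(cm))) / total

for i, label in enumerate(LABELS):
    metrics = per_class_metrics(MORPH_CM, i)
    print(label, {name: round(value, 2) for name, value in metrics.items()})
print("overall", round(overall_accuracy(MORPH_CM), 2))
```

Under the stated row and column convention, the rounded output matches the Morph columns of the table.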

	Morph + Inter				Morph + Amp				Inter + Amp				Morph + Inter + Amp
	n	s	v	f	n	s	v	f	n	s	v	f	n	s	v	f
N	9010	12	11	2	9010	12	11	2	8942	27	54	12	9009	13	11	2
S	42	269	1	0	40	271	1	0	77	224	11	0	37	274	1	0
V	22	2	691	1	23	0	690	3	85	8	619	4	22	2	691	1
F	15	0	7	56	12	1	8	57	28	0	9	41	15	0	7	56

	Morph + Inter				Morph + Amp				Inter + Amp				Morph + Inter + Amp
	Se%	Sp%	+p%	Acc%	Se%	Sp%	+p%	Acc%	Se%	Sp%	+p%	Acc%	Se%	Sp%	+p%	Acc%
N	99.72	92.86	99.13	98.97	99.72	93.22	99.17	99.01	98.97	82.82	97.92	97.21	99.71	93.31	99.19	99.01
S	86.22	99.86	95.05	99.44	86.86	99.87	95.42	99.47	71.79	99.64	86.49	98.79	87.82	99.85	94.81	99.48
V	96.51	99.79	97.32	99.57	96.37	99.79	97.18	99.55	86.45	99.21	89.32	98.31	96.51	99.80	97.32	99.57
F	71.79	99.97	94.92	99.76	73.08	99.95	91.94	99.74	52.56	99.84	71.93	99.48	71.79	99.97	94.92	99.75

Feature combination			Classifier
Morph	Inter	Amp	LDA	LR	SVM	DT	GBDT	RF	KNN	DWKNN	Stacking-DWKNN
•			91.71	92.39	98.61	97.35	97.61	98.55	98.88	98.95	98.94
	•		89.06	89.01	93.69	93.47	94.45	95.73	95.41	95.36	95.59
		•	90.42	89.14	90.59	87.98	90.58	87.81	90.26	88.47	90.43
•	•		93.03	93.25	98.69	97.43	97.73	98.57	98.87	98.97	98.97
•		•	91.74	92.40	98.69	97.47	97.55	98.58	98.89	98.94	98.95
	•	•	90.14	90.94	95.71	95.74	96.01	97.08	96.89	96.93	97.09
•	•	•	93.82	94.24	98.74	97.80	98.15	98.83	98.91	98.96	99.01

Reference	Features	Classifier	Performance
Mar et al. [26]	Statistical features + SFFS; temporal features; morphological features	Weighted LD, MLP	Acc = 89.9%;
			Se_n = 89.6%; +P_n = 99.1%;
			Se_s = 83.2%; +P_s = 33.5%;
			Se_v = 86.8%; +P_v = 75.9%;
			Se_f = 61.1%; +P_f = 16.6%;

Zhang et al. [33]	ECG-intervals and segments; RR interval; morphological features	Combined SVM	Acc = 86.66%;
			Se_n = 88.94%; +P_n = 98.98%;
			Se_s = 79.06%; +P_s = 35.98%;
			Se_v = 85.48%; +P_v = 92.75%;
			Se_f = 93.81%; +P_f = 13.73%;

Zhu et al. [14]	ECG morphology	SVM	Acc = 97.80%;
			Se_n = 99.27%; +P_n = 98.48%;
			Se_s = 87.47%; +P_s = 95.25%;
			Se_v = 94.71%; +P_v = 95.22%
			Se_f = 73.88%; +P_f = 86.09%

Mondéjar-Guerra [34]	RR interval; HOS; ECG morphology; wavelet coefficients	Ensemble SVM	Acc = 94.5%;
			Se_n = 95.9%; +P_n = 98.2%;
			Se_s = 78.1%; +P_s = 49.7%;
			Se_v = 94.7%; +P_v = 93.9%
			Se_f = 12.4%; +P_f = 23.6%

Shi et al. [37]	ECG morphology	Hierarchical classifier	Se_n = 92.1%; +P_n = 99.5%;
			Se_s = 91.7%; +P_s = 46.2%;
			Se_v = 95.1%; +P_v = 88.1%;
			Se_f = 61.6%; +P_f = 15.2%;

Sharma et al. [32]	Fuzzy entropy; Renyi entropy; fractal dimension	KNN	Acc = 94.5%;
			Se_n = 99.59%; Sp_n = 91.92%; +P_n = 98.34%;
			Se_s = 73.64%; Sp_s = 99.84%; +P_s = 92.09%;
			Se_v = 92.11%; Sp_v = 99.75%; +P_v = 96.37%;
			Se_f = 64.46%; Sp_f = 99.94%; +P_f = 88.38%;

Singh et al. [8]	Gabor; wave; interval	DCNN	Acc = 93.19%;
			Se = 93.98%;
			Sp = 95%;

Li et al. [11]	R-R intervals; wavelet transform; Morph; higher-order statistics	CraftNet	Acc = 89.24%;
			Se_n = 88.16%; Sp_n = 94.34%
			Se_s = 85.37%; Sp_s = 94.85%
			Se_v = 94.53%; Sp_v = 99.70%
			Se_f = 88.92%; Sp_f = 94.28%

Proposed	Intervals; P-QRS-T wave; amplitude; ECG morphology	Stacking-DWKNN	Acc = 99.01%;
			Se_n = 99.65%; Sp_n = 94.94%; +P_n = 99.38%;
			Se_s = 89.42%; Sp_s = 99.85%; +P_s = 94.90%;
			Se_v = 97.21%; Sp_v = 99.78%; +P_v = 97.07%;
			Se_f = 80.77%; Sp_f = 99.94%; +P_f = 88.73%;

	Training set	Testing set	Total
N	81,560	9,035	90,595
S	2,528	253	2,781
V	6,450	785	7,235
F	723	79	802

	n	s	v	f	Total
N	Nn	Ns	Nv	Nf	∑N
S	Sn	Ss	Sv	Sf	∑S
V	Vn	Vs	Vv	Vf	∑V
F	Fn	Fs	Fv	Ff	∑F
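
Using the notation of this confusion matrix (uppercase letter for the true class, lowercase for the predicted class), the per-class indicators can be written out for class N as follows. These are the standard one-vs-rest definitions and are assumed here rather than quoted from the paper; they are consistent with the per-class values reported in the result tables. The symbols FP_N and TN_N are introduced only for this illustration, and the other classes follow by symmetry.

```latex
\begin{aligned}
\mathrm{FP}_N &= Sn + Vn + Fn, \qquad
\mathrm{TN}_N = \textstyle\sum S + \sum V + \sum F - \mathrm{FP}_N \\[4pt]
\mathrm{Se}_N &= \frac{Nn}{\sum N}, \qquad
{+p}_N = \frac{Nn}{Nn + \mathrm{FP}_N} \\[4pt]
\mathrm{Sp}_N &= \frac{\mathrm{TN}_N}{\mathrm{TN}_N + \mathrm{FP}_N}, \qquad
\mathrm{Acc}_N = \frac{Nn + \mathrm{TN}_N}{\sum N + \sum S + \sum V + \sum F}
\end{aligned}
```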

	N			S			V			F			Acc
	Se(%)	Sp(%)	+p(%)	Se(%)	Sp(%)	+p(%)	Se(%)	Sp(%)	+p(%)	Se(%)	Sp(%)	+p(%)	Num
Stacking-SVM	99.83	91.32	98.95	81.09	99.96	98.44	96.51	99.78	97.05	69.32	99.98	96.43	98.79
Stacking-DT	98.93	90.96	98.89	81.73	99.50	83.88	93.99	99.51	93.60	70.51	99.75	68.75	97.83
Stacking-GBDT	99.68	88.16	98.57	74.68	99.90	95.88	94.41	99.71	96.16	66.67	99.94	89.66	98.28
Stacking-RF	99.86	91.68	98.99	82.05	99.98	99.22	96.51	99.78	97.05	70.51	99.98	96.49	98.85
Stacking-DWKNN	99.65	94.94	99.38	89.42	99.85	94.90	97.21	99.78	97.07	80.77	99.94	88.73	99.01

	N			S			V			F			Avg
	Se(%)	Sp(%)	+p(%)	Se(%)	Sp(%)	+p(%)	Se(%)	Sp(%)	+p(%)	Se(%)	Sp(%)	+p(%)	Acc (%)
Baseline1	99.60	94.85	99.37	88.78	99.79	92.95	97.20	99.80	97.35	80.77	99.92	88.73	98.96
Baseline2	99.58	92.86	99.13	87.18	99.77	92.20	95.95	99.76	96.76	70.51	99.94	91.67	97.83
Baseline3	99.67	94.39	99.32	89.10	99.81	93.60	96.51	99.81	97.46	78.21	99.93	89.71	98.95
Proposed	99.65	94.94	99.38	89.42	99.85	94.90	97.21	99.78	97.07	80.77	99.94	88.73	99.01
