TABLE I.
Author | Heartbeat types<sup>a</sup> | Model architecture | Performance | Dataset<sup>b</sup> |
---|---|---|---|---|
Zhang et al. 2017[51] | N, VEB, SVEB and F. | The raw ECG signal segment was fed into a 2-layer stacked LSTM. | The accuracies were 99.6% and 99.0% for detecting SVEB and VEB, respectively. | MITDB |
Maknickas et al. 2017[52] | N, AF, O, and noise. | • 23 handcrafted features were extracted from each ECG segment and then concatenated in time order to form an input sequence. • The model was composed of a 3-layer LSTM followed by a dense layer. | Macro-averaged F1-score was 0.78. | CinC |
Schwab et al. 2017[22] | N, AF, O, and noise. | • Handcrafted and autoencoder-derived features were extracted from beat-wise ECG segments and then concatenated in time order to form an input sequence. • An ensemble model contained 15 RNNs and 4 Hidden Semi-Markov models. The outputs of the sub-models were concatenated as input to a dense layer to obtain the final result. | Macro-averaged F1-score was 0.79. | CinC |
Warrick et al. 2017[40] | N, AF, O, and noise. | The ECG signal segment was first fed into one CNN-1D layer. Three LSTM layers were then stacked on top of the CNN layer. | Macro-averaged F1-score was 0.80. | CinC |
Zihlmann et al. 2017[39] | N, AF, O, and noise. | • Proposed an ensemble model with 5 deep networks, either CNN-only models or combined CNN and RNN models. • The authors first calculated the logarithmic time-frequency spectrogram as an input image. • The RNN part was composed of a 3-layer bidirectional LSTM. | Macro-averaged F1-score was 0.821. | CinC |
Xiong et al. 2018[41] | N, AF, O, and noise. | • The ECG signal segment was first fed into a CNN part composed of 16 residual blocks with CNN-1D layers. • Three Elman RNN layers were then stacked on top of the CNN part. | Macro-averaged F1-score was 0.864. | CinC |
Singh et al. 2018[37] | N and A. | The authors applied three kinds of RNNs: Elman, LSTM, and GRU. Each RNN had three stacked layers, and the ECG signal segment was used as the input sequence. | LSTM showed the best accuracy of 88.1%. | MITDB |
Shashikumar et al. 2018[11] | AF and O. | • The wavelet power spectrum was calculated in 30 s non-overlapping windows. A 5-layer CNN was applied to each spectrum to extract local features, forming a feature sequence over 10 min. • The feature sequence was fed into a bidirectional Elman RNN with a soft attention layer, followed by a dense layer for decision-making. | • Testing accuracy was 96%. • Transfer learning was conducted with PPG recordings as input data and obtained 97% accuracy. | Collected from 2850 patients |
Yildirim 2018[24] | N, LBBB, RBBB, PB and VPC | • The detail coefficients were calculated from 4 levels of the discrete wavelet transform. These coefficients, together with the original signal, were combined into a 5-channel time sequence as input. • The sequence was fed into 2-layer unidirectional and bidirectional LSTMs separately. | • For the unidirectional LSTM, the accuracy was 99.25%. • For the bidirectional LSTM, the accuracy was 99.39%. | MITDB |
Oh et al. 2018[17] | N, LBBB, RBBB, APB and VPC | • The ECG signal segment was first fed into the CNN part, composed of 3 CNN-1D layers. • The CNN part was followed by a 1-layer LSTM. The last time step's output was fed to 3 dense layers for final classification. | The accuracy was 98.10%. | MITDB |
Chang et al. 2018[18] | N and AF | • The authors calculated the time-frequency spectrum of each ECG recording, which contained multiple heartbeats. • The spectrum was fed into the LSTM model in time order. | The accuracies were 98.3% and 87.0% for within- and cross-subject experiments, respectively. | MITDB, CinC and a self-recorded dataset |
Tan et al. 2018[12] | N and CAD | • Each ECG segment was sliced into multiple shorter segments with overlapping windows. These were then stacked into a 2D matrix as input. • The input was first fed into a 2-layer CNN, with a 3-layer LSTM stacked on top of the CNN. | • For the non-patient-specific task, the accuracy was 99.85%. • For the patient-specific task, the accuracy was 95.76%. | INCARTDB |
Yildirim et al. 2019[25] | N, LBBB, RBBB, APB and VPC | • The authors applied a CNN-based autoencoder to extract features from each ECG segment. • The feature sequence was fed into a 1-layer LSTM. | The accuracy was 99.11%. | MITDB |
Hou et al. 2019[45] | Two tasks: 1. N, LBBB, RBBB, APB and VPC; 2. AAMI standard<sup>c</sup> | • The authors applied an RNN-based autoencoder to extract the features from each ECG signal segment. • Both encoder and decoder were designed with a 1-layer LSTM. • An SVM was used for the final decision-making based on the features. | • For task 1, the accuracy was 99.74%. • For task 2, the accuracy was 99.45%. | MITDB |
Wang et al. 2019[32] | N, VEB, SVEB and F | • The ECG signal segment was fed into the RNN as a morphological vector. • The RR interval features were fed into a dense layer as temporal input with an extra clinical label. • RNN and dense layer outputs were combined and fed to another dense layer for final decision-making. | • Accuracies for detecting SVEB on the three datasets were 99.7%, 99.0% and 99.9%. • Accuracies for detecting VEB on the three datasets were 99.7%, 99.8% and 99.4%. | MITDB, SVDB and INCARTDB |
Liu et al. 2019[26] | N, VPC, RBBB and APB | • An ensemble model with 3 CNNs and 3 RNNs was proposed. • The signal was decomposed into 6 levels of intrinsic mode functions (IMFs) with empirical mode decomposition. • Three low IMFs were fed into three bidirectional LSTMs, and three high IMFs were fed into three CNNs. • The sub-models were first trained separately, and an SVM was used as a fusion layer. | The accuracy was 99.1%. | INCARTDB |
Saadatnejad et al. 2019[53] | VEB vs. non-VEB and SVEB vs. non-SVEB | • The authors proposed an ensemble model with two sub-RNN models. • The original signal, RR-interval features, and wavelet features were combined in pairs as the inputs of the sub-RNN models. | The model was evaluated on three different subsets of MITDB. The average accuracies for the two tasks were 99.4% and 98.6%. | MITDB |
Mousavi et al. 2019[54] | NA, SA, VA, FA, and QA | • A Seq2seq model was designed. The input was a sequence of multiple beats, and the output was sequential labels. • The ECG signals were first fed into CNN-1D layers to extract feature sequences, which were then fed into an encoder. • Both encoder and decoder were bidirectional Elman RNNs. | The intra-patient (within-subject) paradigm achieved 99.92% accuracy. The inter-patient (cross-subject) paradigm achieved 99.53% accuracy. | MITDB |
<sup>a</sup> N: Normal rhythm; A: Arrhythmia; AF: Atrial fibrillation rhythm; APB: Atrial premature beats or atrial premature complex; CAD: Coronary artery disease; F: Fusion of a ventricular ectopic beat and a normal beat; LBBB: Left bundle branch block; PB: Paced beat; RBBB: Right bundle branch block; VEB: Ventricular ectopic beat; VPC: Ventricular premature contraction; SVPC: Supraventricular premature contraction; SVEB: Supraventricular ectopic beat; O: Other types of rhythms.
<sup>b</sup> CinC: A database provided by the Computing in Cardiology 2017 challenge[15]; MITDB: MIT-BIH arrhythmia database[16]; SVDB: MIT-BIH Supraventricular Arrhythmia Database; INCARTDB: St. Petersburg Institute of Cardiological Technics 12-lead Arrhythmia Database[15].
<sup>c</sup> The AAMI standard includes 5 classes: NA: Normal beat, LBBB, RBBB, atrial escape beat, nodal escape beat; SA: APB, SVPC, aberrated atrial premature beat, nodal premature beat; VA: VPC, VEB; FA: Fusion of normal and ventricular beat; QA: PB, fusion of paced and normal beat, unclassified beat.
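
Several rows in the table report a macro-averaged F1-score rather than accuracy. As a rough illustration of that metric, the sketch below computes per-class F1 and takes the unweighted mean over classes. The function name `macro_f1` and its `labels` argument are hypothetical choices made for this example and do not come from any of the cited works; the scoring rules of a specific challenge may differ, for instance in which classes enter the average.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores (illustrative sketch).

    `labels` optionally restricts or extends the set of classes that are
    averaged; by default every class seen in either list is used.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[t] += 1      # t was the true class but was missed

    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = ["N", "N", "AF", "O"]
    preds = ["N", "AF", "AF", "O"]
    print(round(macro_f1(truth, preds), 4))
```

Because each class contributes equally regardless of how many samples it has, a macro-averaged score penalizes poor performance on rare rhythm classes more than overall accuracy does, which is one reason the two kinds of figures in the table are not directly comparable.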