Author manuscript; available in PMC: 2024 Oct 5.

Published in final edited form as: IEEE Trans Neural Netw Learn Syst. 2023 Oct 5;34(10):6983–7003. doi: 10.1109/TNNLS.2022.3145365

TABLE I.

Summary of works carried out using RNN structure with ECG Signals

Author	Heartbeat types^a	Model architecture	Performance	Dataset^b
Zhang et al. 2017[51]	N, VEB, SVEB and F.	The raw ECG signal segment was fed into a 2-layer stacked LSTM.	The accuracies were 99.6% and 99.0% for detecting SVEB and VEB, respectively.	MITDB
Maknickas et al. 2017[52]	N, AF, O, and noise.	• 23 handcrafted features were extracted from each ECG segment and then connected in time order to form an input sequence. • The model was composed of a 3-layer LSTM followed by a dense layer.	Macro-averaging F1-score was 0.78.	CinC
Schwab et al. 2017[22]	N, AF, O, and noise.	• Handcrafted and autoencoder-calculated features were extracted from beat-wise ECG segments and then connected in time order to form an input sequence. • An ensemble model contained 15 RNNs and 4 Hidden Semi-Markov models. The outputs of the sub-models were concatenated as the input to a dense layer to obtain the final result.	Macro-averaging F1-score was 0.79.	CinC
Warrick et al. 2017[40]	N, AF, O, and noise.	The ECG signal segment was first fed into one CNN-1D layer. Three LSTM layers were then applied on top of the CNN layer.	Macro-averaging F1-score was 0.80.	CinC
Zihlmann et al. 2017[39]	N, AF, O, and noise.	• The authors proposed an ensemble model with 5 deep networks, either CNN-only models or CNN and RNN models. • The logarithmic time-frequency spectrogram was first calculated as an input image. • The RNN part was composed of a 3-layer bidirectional LSTM.	Macro-averaging F1-score was 0.821.	CinC
Xiong et al. 2018[41]	N, AF, O, and noise.	• The ECG signal segment was first fed into a CNN part composed of 16 residual blocks with CNN-1D layers. • Three Elman RNN layers were then applied on top of the CNN part.	Macro-averaging F1-score was 0.864.	CinC
Singh et al. 2018[37]	N and A.	The authors applied three kinds of RNNs: Elman, LSTM, and GRU. Each RNN had three stacked layers, and the ECG signal segment was used as the input sequence.	LSTM showed the best accuracy of 88.1%.	MITDB
Shashikumar et al. 2018[11]	AF and O.	• The wavelet power spectrum was calculated in 30 s non-overlapping windows. A 5-layer CNN was applied to each spectrum to extract local features, forming a feature sequence over 10 min. • The feature sequence was fed into a bidirectional Elman RNN with a soft attention layer, followed by a dense layer for decision-making.	• Testing accuracy was 96%. • Transfer learning with PPG recordings as input data obtained 97% accuracy.	Collected from 2850 patients
Yildirim 2018[24]	N, LBBB, RBBB, PB and VPC	• The detail coefficients were calculated from 4 levels of the discrete wavelet transform. These coefficients and the original signal were combined into a 5-channel time sequence as input. • The sequence was fed into 2-layer unidirectional and bidirectional LSTMs separately.	• For the unidirectional LSTM, the accuracy was 99.25%. • For the bidirectional LSTM, the accuracy was 99.39%.	MITDB
Oh et al. 2018[17]	N, LBBB, RBBB, APB and VPC	• The ECG signal segment was first fed into the CNN part, composed of 3 CNN-1D layers. • The CNN part was followed by a 1-layer LSTM. The output of the last time step was fed to 3 dense layers for final classification.	The accuracy was 98.10%.	MITDB
Chang et al. 2018[18]	N and AF	• The authors calculated the time-frequency spectrum of each ECG recording, which contained multiple heartbeats. • The spectrum was fed into the LSTM model in time order.	The accuracies were 98.3% and 87.0% for within- and cross-subject experiments, respectively.	MITDB, CinC and self-recorded dataset
Tan et al. 2018[12]	N and CAD	• Each ECG segment was sliced into multiple shorter segments with overlapping windows, which were then stacked into a 2D matrix as input. • The input was first fed into a 2-layer CNN, and a 3-layer LSTM was placed on top of the CNN.	• For the non-patient-specific task, the accuracy was 99.85%. • For the patient-specific task, the accuracy was 95.76%.	INCARTDB
Yildirim et al. 2019[25]	N, LBBB, RBBB, APB and VPC	• The authors applied a CNN-based autoencoder to extract features from each ECG segment. • The feature sequence was fed into a 1-layer LSTM.	The accuracy was 99.11%.	MITDB
Hou et al. 2019[45]	Two tasks: 1. N, LBBB, RBBB, APB and VPC; 2. AAMI standard^c	• The authors applied an RNN-based autoencoder to extract the features from each ECG signal segment. • Both the encoder and decoder were designed with a 1-layer LSTM. • An SVM was used for the final decision-making based on the features.	• For task 1, the accuracy was 99.74%. • For task 2, the accuracy was 99.45%.	MITDB
Wang et al. 2019[32]	N, VEB, SVEB and F	• The ECG signal segment was fed into the RNN as a morphological vector. • The RR interval features were fed into a dense layer as temporal input with an extra clinical label. • RNN and dense layer outputs were combined and fed to another dense layer for final decision-making.	• Accuracies for detecting SVEB on the three datasets were 99.7%, 99.0% and 99.9%. • Accuracies for detecting VEB on the three datasets were 99.7%, 99.8% and 99.4%.	MITDB, SVDB and INCARTDB
Liu et al. 2019[26]	N, VPC, RBBB and APB	• An ensemble model with 3 CNNs and 3 RNNs was proposed. • The signal was decomposed into 6 levels of intrinsic mode functions (IMFs) with empirical mode decomposition. • The three low IMFs were fed into three bidirectional LSTMs, and the three high IMFs were fed into three CNNs. • The sub-models were first trained separately, and an SVM was used as a fusion layer.	The accuracy was 99.1%.	INCARTDB
Saadatnejad et al. 2019[53]	VEB vs. non-VEB and SVEB vs. non-SVEB	• The authors proposed an ensemble model with two sub-RNN models. • The original signal, RR-interval features, and wavelet features were combined in pairs as the inputs of the sub-RNN models.	The model was evaluated with three different subsets of MITDB. The average accuracies for the two tasks were 99.4% and 98.6%.	MITDB
Mousavi et al. 2019[54]	NA, SA, VA, FA, and QA	• A Seq2seq model was designed. The input was a sequence of multiple beats, and the output was sequential labels. • The ECG signals were first fed into CNN-1D layers to extract feature sequences, which were then fed into an encoder. • Both the encoder and decoder were bidirectional Elman RNNs.	• The intra-patient (within-subject) paradigm achieved 99.92% accuracy. • The inter-patient (cross-subject) paradigm achieved 99.53% accuracy.	MITDB

^a N: Normal rhythm; A: Arrhythmia; AF: Atrial fibrillation rhythm; APB: Atrial premature beats or atrial premature complex; CAD: Coronary artery disease; F: Fusion of a ventricular ectopic beat and a normal beat; LBBB: Left bundle branch block; PB: Paced beat; RBBB: Right bundle branch block; VEB: Ventricular ectopic beat; VPC: Ventricular premature contraction; SVPC: Supraventricular premature contraction; SVEB: Supraventricular ectopic beat; O: Other types of rhythms.

^b CinC: A database provided by the Computing in Cardiology 2017 challenge[15]; MITDB: MIT-BIH arrhythmia database[16]; SVDB: MIT-BIH Supraventricular Arrhythmia Database; INCARTDB: St. Petersburg Institute of Cardiological Technics 12-lead Arrhythmia Database[15].

^c The AAMI standard includes 5 classes: NA: Normal beat, LBBB, RBBB, atrial escape beat, nodal escape beat; SA: APB, SVPC, aberrated atrial premature beat, nodal premature beat; VA: VPC, VEB; FA: Fusion of normal and ventricular beat; QA: PB, fusion of paced and normal beat, unclassified beat.
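
The AAMI grouping in the note above, and the macro-averaged F1-score reported in several rows of the table, can be illustrated with a short sketch. This is a minimal example and not code from any of the cited works: the names `AAMI_GROUPS`, `to_aami`, and `macro_f1` are hypothetical, the key `"N"` is assumed to stand for "Normal beat", and beat types that the note spells out in full are used as plain strings. The F1 function is the generic unweighted mean of per-class F1 scores; individual studies may have averaged over a different set of classes.

```python
# Grouping of beat types into the five AAMI classes, as listed in the table note.
# Assumption: "N" stands for "Normal beat"; spelled-out names are kept verbatim.
AAMI_GROUPS = {
    "NA": ["N", "LBBB", "RBBB", "atrial escape beat", "nodal escape beat"],
    "SA": ["APB", "SVPC", "aberrated atrial premature beat", "nodal premature beat"],
    "VA": ["VPC", "VEB"],
    "FA": ["fusion of normal and ventricular beat"],
    "QA": ["PB", "fusion of paced and normal beat", "unclassified beat"],
}

# Inverted lookup: beat type -> AAMI class.
BEAT_TO_AAMI = {
    beat: group for group, beats in AAMI_GROUPS.items() for beat in beats
}


def to_aami(labels):
    """Map a list of beat-type labels to their AAMI classes."""
    return [BEAT_TO_AAMI[label] for label in labels]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = to_aami(["N", "LBBB", "VPC", "VEB"])
    preds = to_aami(["N", "VPC", "VPC", "VEB"])
    print(truth, preds, round(macro_f1(truth, preds), 4))
```

Because macro-averaging weights every class equally, a rare class such as FA affects the score as much as the dominant normal class, which is one reason the F1-scores and the accuracies in the table are not directly comparable.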