PLOS One. 2025 Mar 25;20(3):e0271270. doi: 10.1371/journal.pone.0271270

Synthetic ECG signal generation using generative neural networks

Edmond Adib 1,*, Fatemeh Afghah 2, John J Prevost 1
Editor: Zahid Mehmood
PMCID: PMC11936209  PMID: 40132047

Abstract

Electrocardiogram (ECG) datasets tend to be highly imbalanced due to the scarcity of abnormal cases. Additionally, the use of real patients’ ECGs is highly regulated due to privacy issues. Therefore, there is always a need for more ECG data, especially for training automatic diagnosis machine learning models, which perform better when trained on a balanced dataset. We studied the synthetic ECG generation capability of 5 different models from the generative adversarial network (GAN) family and compared their performances, focusing only on Normal cardiac cycles. Dynamic Time Warping (DTW), Fréchet, and Euclidean distance functions were employed to quantitatively measure performance. Five different methods for evaluating generated beats were proposed and applied. We also proposed 3 new concepts (threshold, acceptable beat, and productivity rate) and employed them along with the aforementioned methods as a systematic way to compare the models. The results show that all the tested models can, to an extent, successfully mass-generate acceptable heartbeats with high similarity in morphological features, and potentially all of them can be used to augment imbalanced datasets. However, visual inspection of the generated beats favors the BiLSTM-DC GAN and WGAN, as they produce statistically more acceptable beats. With regard to productivity rate, the Classic GAN is superior with a 72% productivity rate. We also designed a simple experiment with a state-of-the-art classifier (ECGResNet34) to show empirically that augmenting an imbalanced dataset with synthetic ECG signals can significantly improve classification performance.

Introduction

Cardiovascular diseases are one of the major causes of death (for example, 31% of all deaths in 2016) [1]. Electrocardiogram (ECG) analysis is routine in any complete medical evaluation, mostly because it is painless, noninvasive, and can easily reveal arrhythmia. The classification and diagnosis of arrhythmias is usually done by domain experts, which is time-consuming and prone to human error. Therefore, automatic ECG analysis and diagnosis is of crucial importance. Classical shallow supervised machine-learning algorithms have been employed extensively for the classification of abnormalities in ECG [2-4]. Deep learning unsupervised algorithms have also been used successfully and reached state-of-the-art results, reducing or eliminating the need for external feature engineering [5].

One of the challenges in applying ML algorithms to ECG is that the datasets are usually highly imbalanced (with regard to the number of samples per class), which causes automatic diagnosis models to perform poorly on them [6]. Moreover, collected ECG data are sometimes noisy and accompanied by different types of artifacts, which may render some samples unusable or require preprocessing [7]. On the other hand, even with transfer learning, ML algorithms generally still require huge datasets for training. All these challenges suggest and justify the need for more synthetic ECG data and for richer, larger, artificially augmented and balanced datasets. Another issue that justifies the need for synthetic ECG beat generation is privacy: unlike synthetic data, the ECG data of real patients contain personal information and are thus considered highly sensitive. Because of this, their use, even for scientific and research purposes, is highly regulated [8]. To address all these issues, the generation of synthetic ECG signals has been the focus of many studies [9-14].

The main objective of this research is to assess the capability of 5 models from the GAN family in generating synthetic Normal ECG heartbeats. Additionally, we present 5 different methods to systematically evaluate the performance of the models in generating synthetic ECG beats. To this end, three similarity measures were incorporated: Dynamic Time Warping (DTW), Fréchet, and Euclidean distance functions. This study differs from previous works [10,11,14] in that: (1) we employed more models from the GAN family, (2) we incorporated WGAN, (3) we used lead I of the two available leads ([11] used lead II), and (4) we present a systematic way to evaluate the performance of the models in generating synthetic data. In addition, we introduce three new concepts: threshold, acceptable beat, and productivity rate. Thresholds are used to mathematically define “acceptable beats” as well as to screen out low-quality generated beats. We suggest a way to compute the threshold and believe the productivity rate is a key indicator in performance evaluation. To the best of our knowledge, this is the first time such a systematic way of comparison has been presented. We also designed a binary classification experiment and showed that augmenting imbalanced datasets with synthetically generated ECG beats can improve classification performance to a level comparable with an all-real balanced dataset.

Related works

Hong et al. [15] presented a comprehensive review and summary of the existing deep learning methods as well as challenges and opportunities in ECG analysis.

Delaney et al. [11] developed a range of GAN architectures to synthetically generate ECG beats. They used two evaluation metrics, Maximum Mean Discrepancy (MMD) and DTW, to quantitatively evaluate the generated beats and their suitability for real-world applications and used the Euclidean distance function in their “privacy disclosure test”.

Hyland et al. [16] used two-layer LSTM architectures in both the generator and discriminator to generate synthetic ECG beats. For the evaluation of the performance of their models, they used MMD plus two innovative methods.

Zhu et al. [14] proposed a novel BiLSTM-CNN GAN to generate ECG beats and reported better performance compared to other existing models.

Wang et al. [10] used a 14-layer ACGAN to generate synthetic ECG beats for data augmentation. For the evaluation process, they used Euclidean, Pearson Correlation Coefficient (PCC), and Kullback-Leibler similarity measures.

Zhang et al. [13] proposed a GAN model whose generator was composed of a two-dimensional BiLSTM plus a CNN layer. In the discriminator, they used CNN and FC layers. They used standard 12-lead ECG signals and studied four classes of arrhythmia, employing mono-class GAN models.

Wulan et al. [12] used three different GAN-based models to generate 3 classes of ECG heartbeats: Normal, Left Bundle Branch Block beat, and Right Bundle Branch Block beat. For evaluation, they used a Support Vector Machine (SVM) classifier and GAN-train and GAN-test scores.

A comparison between some major works and our study is given in Tables 1, 2 and 3.

Table 1. Comparison with major related works - I.

Ref. Year Main Objective GAN Variant Architecture (Gen. - Discr.)
[11] 2019 Generating Realistic Synthetic ECG Signal Regular LSTM-4CNN, BiLSTM-4CNN
[10] 2019 Dataset Augmentation and Balancing ACGAN 14CNN-16CNN
[14] 2019 Generating Realistic Synthetic ECG Signal Regular BiLSTM-(2CNN+FC)
[13] 2021 Fully Automated Synthetic ECG Generation Regular 2D BiLSTM 5CNN-2D 4CNN FC
[12] 2020 Generating Realistic Synthetic ECG Signal DCGAN (SpectroGAN) Regular (WaveletGAN) 2D 4TrCNN-2D 4CNN (SpectroGAN) 2D 3FC-2D 3FC (WaveletGAN)
ThS 2 2021 Generating Realistic Synthetic ECG Signal Regular, WGAN FC-FC, DC-DC, BiLSTM-DC, AE/VAE-FC, DC-DC (WGAN)

Table 2. Comparison with major related works - II.

Ref. Dataset Multiclass (study/model) Mode Collapse Prevention Metrics Pre- processing
[11] MIT-BIH (Lead II) No/No (only Normal) MBD 3 (didn’t work) MMD 4, DTW Concat. of beats
[10] MIT-BIH (Lead II) Yes/Yes BN 5, DO 6 (in Discr.) ED, PCC 7, KL Div. (used templates) NM
[14] MIT-BIH (one lead) NM 8 DO PRD 9, RMS, FD 10 NM
[13] 12 lead, PTB-XL, CCDD, CSE, Chapman, Private Domain Yes/No NM MMD, IQR 11, SK 12, KU 13 (between train, test and synthetic sets) FWS 14
[12] MIT-BIH (Lead II 1) Yes/Yes IN 15 SVM 16, GTrTs 17 4-second segmentation
ThS MIT-BIH (Lead I) No/No BN Visual Inspection, Original Methods Pan-Tompkins

Table 3. Comparison with major related works - III.

Ref. Batch Size Optimization Learning Rate Hyper-Parameter Fine-Tuning No. of Epochs
[11] NM Adam NM NM 60
[10] NM Adam 0.0001 (G) 0.0002 (D) NM 150
[14] NM NM NM NM NM
[13] 32 NM NM NM max 1000 (10 min.)
[12] NM RMSProp 0.0001 (SectroGAN) 0.00015 (WaveletGAN) NM NM
ThS 9 Adam 0.0002 Used Recommended Suggestions 30
Footnotes: 1 MLII; 2 This Study; 3 Minibatch Discrimination; 4 Maximum Mean Discrepancy; 5 Batch Normalization; 6 Dropout; 7 Pearson Correlation Coefficient; 8 Not Mentioned; 9 Percent Root Mean Square Difference; 10 Fréchet Distance; 11 Interquartile Range; 12 Skewness; 13 Kurtosis; 14 Fixed Window Segmentation; 15 Instance Normalization; 16 Support Vector Machine; 17 GAN-train/GAN-test Score

Materials and mathematical background

Generative models

Generative neural network models are powerful tools for learning the true underlying distribution of any kind of dataset in unsupervised settings. Two of the most commonly used families of generative models are (Variational) Autoencoder-Decoder (AE/VAE) and Generative Adversarial Networks (GAN).

Autoencoder-decoders.

Through the assumption that data have been originally generated by a much lower-dimension latent variable space (Z), Autoencoder-Decoder (AE) models learn the distribution of the latent space and map from Z (latent space) to X (real data space) [17].

Variational autoencoders.

In Variational Autoencoder-Decoder (VAE) networks, the bottleneck will be a distribution rather than a reduced dimension vector.

If X is the input, Z the latent random variables, Q(Z|X) the encoder, and P(X|Z) the decoder, then the objective function of VAE can be summarized as:

\log P(X) - D_{KL}\left[Q(Z|X)\,\|\,P(Z|X)\right] = \mathbb{E}_{Z \sim Q}\left[\log P(X|Z)\right] - D_{KL}\left[Q(Z|X)\,\|\,P(Z)\right] \quad (1)

The objective function can be interpreted as follows: maximizing the expected reconstruction log-likelihood E[log P(X|Z)] while minimizing the KL distance (Kullback-Leibler divergence, D_KL) between the encoder distribution Q(Z|X) and the prior P(Z) [17].

To make back-propagation feasible, a technique called the reparameterization trick [17] is used in VAEs: instead of sampling Z directly, the encoder outputs a mean vector and a standard deviation vector, and the latent sample is formed as z = μ + σ ⊙ ε with ε ∼ N(0, I), so that gradients can flow through μ and σ.
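As a minimal illustration of this trick in PyTorch (a sketch with our own names and shapes, not necessarily the implementation used in this study):

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Sample z = mu + sigma * eps with eps ~ N(0, I); the randomness sits in
    eps, so gradients flow through mu and logvar during back-propagation."""
    std = torch.exp(0.5 * logvar)   # standard deviation from the log-variance
    eps = torch.randn_like(std)     # noise sample, outside the gradient path
    return mu + eps * std
```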

Adversarial networks.

GAN architectures consist of two blocks of networks: the generator and the discriminator [18]. The generator, G, takes an input random vector z and maps it into the data space (referred to as fake/synthesized data). The generator aims to fool the discriminator, i.e. make the discriminator mistakenly classify it as real.

The discriminator takes an input (either fake/synthesized or real) and outputs a number between zero and one, D(.) ∈ [0, 1], representing the probability that the input is real rather than fake/synthesized.

The generator and the discriminator play a two-player zero-sum minimax game whose value function, V(G, D), is defined as follows [18]:

\min_{G} \max_{D} V(G, D) = \mathbb{E}_{x \sim P_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim P_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right] \quad (2)

The GAN model implicitly finds the underlying distribution of the real data without any linkage or traceability between the generated data and the real data, a property that privacy concerns require. Thus, the data synthesized by the generator have the same distribution as the real data and can be used to enrich a dataset or to balance an imbalanced one. GAN models form a family, and each member is named differently depending on the architectures used in the generator and the discriminator, e.g., classic, LSTM (Long Short-Term Memory), DC (Deep Convolutional), et cetera.
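As a minimal, hedged sketch of how Eq 2 is optimized in practice (function and variable names are ours; the binary cross-entropy matches the loss described later, and the generator update uses the common non-saturating variant of Eq 2; the WGAN variant instead uses a critic loss without sigmoid/BCE):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def gan_step(G, D, opt_G, opt_D, real, latent_dim=100):
    """One alternating update: train D to separate real from G(z), then train G
    to fool D. G and D are any generator/discriminator pair whose D outputs a
    probability in [0, 1]."""
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator step: push D(real) -> 1 and D(G(z)) -> 0
    z = torch.randn(batch, latent_dim)
    fake = G(z).detach()
    loss_D = bce(D(real), ones) + bce(D(fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step: push D(G(z)) -> 1 (non-saturating form of Eq 2)
    z = torch.randn(batch, latent_dim)
    loss_G = bce(D(G(z)), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```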

Experimental setup

Dataset and segmentation

The MIT-BIH Arrhythmia [19,20] dataset is one of the most common benchmarks for ECG signal analysis and is used in this study as well. This dataset includes 48 thirty-minute two-channel ambulatory ECG records from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. The recordings were digitized at 360 samples per second per channel with 11-bit resolution over a 10 mV range. The dataset is fully annotated with both beat-level and rhythm-level diagnoses. When segmented, it comprises 109,338 individual beats, of which 90,502 are in the Normal class.

The MIT-BIH dataset is highly imbalanced, and the beats are divided into 5 main classes: N: Normal (82.8%), V: Premature ventricular contraction (6.6%), F: Fusion of ventricular and normal beat (0.7%), S: Supraventricular premature beat (2.5%), and Q: Unclassifiable beat (7.3%). The class Q is not in fact a class per se, because any heartbeat that could not be classified has been put in this class; therefore, beats in this class do not follow a pattern as the other classes do. One sample from the Normal class is shown in Fig 6(b). As the main objective of this study is to compare the capabilities of models in generating synthetic ECG beats, we focused on generating only one class of beats, N, and the approach can be generalized to the other classes.

Fig 6. Templates.

Fig 6

We borrowed the segmented dataset from Mousavi et al. [6], who used the Pan-Tompkins method [21] for segmentation. Their segmented beats have a uniform length of 280 samples, which were resampled to 256 samples in this study using the scipy.signal.resample function:

V = \{ v_i^k \}, \quad i = 1, \ldots, N_V^k, \quad k = 1, \ldots, K \quad (3)
v_i^k = [v_{i,1}^k, \ldots, v_{i,256}^k] \quad (4)

where k is the class and N_V^k is the number of beats in class k. The dataset was filtered and only the Normal beat class was kept, so k = 1 and it is dropped hereafter. There are N_V = 90,502 individual beats (v_i) in the filtered dataset space (V).
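The resampling step can be reproduced with a short snippet like the following (the array here is a random placeholder standing in for the real segmented beats, not the actual data):

```python
import numpy as np
from scipy.signal import resample

# beats_280: segmented beats of length 280 (placeholder values for illustration)
beats_280 = np.random.randn(1000, 280)
beats_256 = resample(beats_280, 256, axis=1)   # Fourier-based resampling to 256 samples
print(beats_256.shape)                         # (1000, 256)
```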

Model designs

A total of five models were utilized in this experiment, each of which is identified by a two-digit code (01 to 05) for ease of reference. The details of the model architectures are shown in Tables 4, 5, 6, 7, and 8.

Table 4. Classic GAN (01).

Layer Generator Discriminator
1 FC(100x128), L-ReLU(0.2) FC(256x512), L-ReLU(0.2)
2 FC(128x256), BN, L-ReLU(0.2) FC(512, 256), L-ReLU(0.2)
3 FC(256x512), BN, L-ReLU(0.2) FC(256,1), sigmoid
4 FC(512, 1024), BN, L-ReLU(0.2) -
5 FC(1024, 256), tanh -

Table 5. DC-DC GAN (02).

Layer Generator Discriminator
1 ConvTr1d(100x512), BN, L-ReLU(0.2) Conv1d(1, 64), L-ReLU(0.2)
2 ConvTr1d(512, 256), BN, L-ReLU(0.2) Conv1d(64, 128), BN, L-ReLU(0.2)
3 ConvTr1d(256, 128), BN, L-ReLU(0.2) Conv1d(128, 256), BN, L-ReLU(0.2)
4 ConvTr1d(128, 64), BN, L-ReLU(0.2) Conv1d(256, 512), BN, L-ReLU(0.2)
5 ConvTr1d(64, 1), BN, L-ReLU(0.2) Conv1d(512, 1), FC(13, 1), sigmoid
6 FC (64, 256), tanh -

Table 6. BiLSTM-DC GAN (03).

Layer Generator Discriminator
1 BiLSTM(100, 1000), 2 layers, Conv1d(1, 64), L-ReLU(0.2)
2 FC(1000*2, 256), tanh Conv1d(64, 128), BN, L-ReLU(0.2)
3 - Conv1d(128, 256), BN, L-ReLU(0.2)
4 - Conv1d(256, 512), BN, L-ReLU(0.2)
5 - Conv1d(512, 1), FC(13, 1), sigmoid

Table 7. AE/VAE-DC GAN (04).

Layer Encoder Decoder Discriminator
1 FC(256, 512), L-ReLU(0.2) FC(10, 512), L-ReLU(0.2) FC(10, 512), L-ReLU(0.2)
2 FC(512, 512), BN, LReLU FC(512, 512), BN, L-ReLU(0.2) FC(512, 256), L-ReLU(0.2)
3 (mu) FC(512, 10) FC(512, 256), tanh() FC(256, 1), sigmoid
4 (logvar) FC(512, 10)) - -
5 (Output layer) Reparameterization (mu, logvar) - -

Table 8. WGAN (05).

Layer Generator Discriminator
1 ConvTr1d(100x2048), BN, ReLU Conv1d(1, 64), L-ReLU(0.2)
2 ConvTr1d(2048, 1024), BN, ReLU Conv1d(64, 128), BN, L-ReLU(0.2)
3 ConvTr1d(1024, 512), BN, ReLU Conv1d(128, 256), BN, L-ReLU(0.2)
4 ConvTr1d(512, 256), BN, ReLU Conv1d(256, 512), BN, L-ReLU(0.2)
5 ConvTr1d(256, 128), BN, ReLU Conv1d(512, 1024), BN, L-ReLU(0.2)
6 ConvTr1d(128, 64), BN, ReLU Conv1d(1024, 2048), BN, L-ReLU(0.2)
7 Conv1d (64, 1), tanh Conv1d(2048, 1)

Hyperparameter settings.

In all the models, the number of epochs is 30 with a batch size of 9. The optimizer used is Adam with β1 and β2 equal to 0.5 and 0.999, respectively. A latent variable of dimension 100 is used at the generator input. Binary cross-entropy is used as the loss function. A Gaussian normal distribution with μ of 0 and σ of 0.02 is used for parameter initialization. In model 04, which is a hybrid of VAE and GAN, the generator loss function is a weighted sum of the adversarial loss and the L1 loss between the decoded beats and the real beats.
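A minimal sketch of this setup in PyTorch (G and D stand for any generator/discriminator pair from Tables 4-8; the learning rate of 0.0002 is taken from Table 3, and the BatchNorm initialization shown is a common convention and an assumption rather than a detail stated here):

```python
import torch
import torch.nn as nn

def weights_init(m):
    """Draw initial weights from N(0, 0.02), as described above.
    The BatchNorm treatment (gamma ~ N(1, 0.02), beta = 0) is an assumption."""
    if isinstance(m, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
    elif isinstance(m, nn.BatchNorm1d):
        nn.init.normal_(m.weight, mean=1.0, std=0.02)
        nn.init.zeros_(m.bias)

# G.apply(weights_init); D.apply(weights_init)
# opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
# opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
# criterion = nn.BCELoss()   # binary cross-entropy, as stated above
```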

Graphical representations of architectures.

Graphical representations of models 01 to 05 are shown in Figs 1, 2, 3, 4, and 5, respectively.

Fig 1. Graphical representation of model 01.

Fig 1

Fig 2. Graphical representation of model 02.

Fig 2

Fig 3. Graphical representation of model 03.

Fig 3

Fig 4. Graphical representation of model 04.

Fig 4

Fig 5. Graphical representation of model 05.

Fig 5

Similarity measures (distance functions)

Currently, there is a lack of consensus on the best evaluation metric for the performance of generative models [22], and researchers mostly resort to subjective expert-eye evaluation. In general, a distance function DF has a scalar output that quantifies the proximity (or distance) between its two input beats:

DF(x, y): \mathbb{R}^{256} \times \mathbb{R}^{256} \to \mathbb{R} \quad (5)

Dynamic time warping measure (DTW).

DTW belongs to a family of measures known as “elastic dissimilarity measures” and it works by optimally aligning (warping) the time scale in a way that the accumulated cost of this alignment is minimal [23]. It constructs a cost matrix D based on the two time-series being compared, x and y. The elements of matrix D are defined, by a recurrent formula:

D_{i,j} = f(x_i, y_j) + \min\{D_{i-1,j},\, D_{i,j-1},\, D_{i-1,j-1}\}

for i = 1, \ldots, M and j = 1, \ldots, N, where M and N are the lengths of the two time series. The local cost function f(., .), also called the sample dissimilarity function, is usually the Euclidean distance. The final DTW value corresponds to the total accumulated cost, i.e., d_{DTW}(x, y) = D_{M,N} [23].
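A direct, unoptimized implementation of this recurrence (our own sketch, not necessarily the routine used in the experiments) is:

```python
import numpy as np

def dtw(x: np.ndarray, y: np.ndarray) -> float:
    """Accumulated-cost DTW with a 1-D Euclidean local cost; returns D[M, N]."""
    M, N = len(x), len(y)
    D = np.full((M + 1, N + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            cost = abs(x[i - 1] - y[j - 1])          # f(x_i, y_j)
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[M, N]
```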

Fréchet distance measure.

If P = (u_1, u_2, \ldots, u_p) and Q = (v_1, v_2, \ldots, v_q) are two time series, a coupling L between P and Q is defined as a sequence of links:

(u_{a_1}, v_{b_1}), (u_{a_2}, v_{b_2}), \ldots, (u_{a_m}, v_{b_m}) \quad (6)

such that a_1 = 1, b_1 = 1, a_m = p, and b_m = q, and for all i = 1, \ldots, m - 1, a_{i+1} = a_i or a_{i+1} = a_i + 1, and b_{i+1} = b_i or b_{i+1} = b_i + 1. The length \|L\| of the coupling L is defined as the longest link in L (the maximum Euclidean distance):

\|L\| = \max_{i = 1, \ldots, m} d(u_{a_i}, v_{b_i}) \quad (7)

Then the Fréchet distance between P and Q is defined as [24]:

\delta_{dF}(P, Q) = \min\{\|L\| : L \text{ is a coupling between } P \text{ and } Q\} \quad (8)
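The discrete Fréchet distance defined by Eqs 6-8 can be computed with the standard dynamic-programming scheme; the following is an illustrative sketch for 1-D sequences, not the exact code used in this study:

```python
import numpy as np

def frechet(P: np.ndarray, Q: np.ndarray) -> float:
    """Discrete Frechet distance (Eq 8) via the standard dynamic-programming table."""
    p, q = len(P), len(Q)
    ca = np.zeros((p, q))
    for i in range(p):
        for j in range(q):
            d = abs(P[i] - Q[j])          # local Euclidean distance for 1-D samples
            if i == 0 and j == 0:
                ca[i, j] = d
            elif i == 0:
                ca[i, j] = max(ca[0, j - 1], d)
            elif j == 0:
                ca[i, j] = max(ca[i - 1, 0], d)
            else:
                ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d)
    return ca[-1, -1]
```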

Euclidean distance measure.

The Euclidean distance between two time series of equal length n, P = (u_1, u_2, \ldots, u_n) and Q = (v_1, v_2, \ldots, v_n), is defined as:

d(P, Q) = \sqrt{(u_1 - v_1)^2 + \cdots + (u_n - v_n)^2} \quad (9)

The Fréchet distance fulfills all the properties required of a metric (e.g., symmetry and the triangle inequality) and can therefore be used as a metric. The DTW measure, however, does not satisfy the triangle inequality and is not a metric in the strict sense.

Templates

For each class, there is one template which is the quintessential time-series of that class and distinctly represents all the morphological features and patterns of the class. Distance functions take the template as well as a generated beat as inputs and generate a scalar number, which signifies the proximity of the two time-series. The following two approaches are available for developing/selecting templates.

Statistically-Averaged Beat (SAB) approach.

Since all beats have an equal number of time steps (256), it is sensible that the template is defined as some sort of “mean of the class” such that the value at each time step is computed as the mean across all the beats of the class at that time step:

\bar{v}_j = \frac{1}{N_V} \sum_{i=1}^{N_V} v_{i,j}, \quad j = 1, \ldots, 256 \quad (10)
t = [\bar{v}_1, \ldots, \bar{v}_{256}] \quad (11)

where N_V is the number of beats in the set and t is the template. One sample of an SAB template is shown in Fig 6(a).
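In NumPy, Eqs 10 and 11 reduce to a per-column mean; the array below is a random placeholder standing in for the real filtered dataset V:

```python
import numpy as np

beats = np.random.randn(90502, 256)     # placeholder for the Normal-class beats
template_sab = beats.mean(axis=0)       # t = [v_bar_1, ..., v_bar_256]
```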

Expert-eye/random approach.

In this approach, the original dataset is visually inspected by a domain expert to find the “most fit sample” that meets all the morphological characteristics of that class. In this experiment, the expert-eye approach is employed to select the template, which is shown in Fig 6(b).

Evaluating the generated beats and comparison between models

Evaluating the quality of the generated beats and comparing the performances of the models can be accomplished through one of the following five methods.

Method 1.

To assess the proximity of the two sets, the whole set of generated beats should be cross-compared with the whole original dataset, element by element. The outcome of this analysis (i.e., the mean distance) is a single deterministic number (with no randomness) representing the average distance between the two sets. If V is the dataset space with N_V elements in it and G is the space of generated beats, i.e.:

G = \{ g_i \}, \quad i = 1, \ldots, N_G \quad (12)
g_i = [g_{i,1}, \ldots, g_{i,256}] \quad (13)

then the average distance between the two sets, d_{ave}^{DF}, is:

s_1^{DF} = d_{ave}^{DF} = \frac{1}{N_V N_G} \sum_{i=1}^{N_V} \sum_{j=1}^{N_G} DF(v_i, g_j) \quad (14)

However, this analysis is usually not practical, as the required computational power grows rapidly with the number of elements in the sets. As an approximation, one can instead apply Eq 14 to two randomly selected portions of the two sets, of sizes N_V and N_G. Of course, the way these portions are sampled plays a significant role in the outcome and makes the process stochastic. The size of the portions depends on the available computational power (the larger they are, the more accurate the results will be). In this experiment, we used N_G = 300 generated beats from each model (10 beats from each of the 30 epochs) and cross-compared them against N_V = 300 randomly selected beats from the original dataset. This process is stochastic, and the outcome depends on the particular portions selected (Table 9).
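A sketch of this sampling-based approximation of Eq 14 (function and parameter names are ours; dist_fn can be any of the three distance functions defined above):

```python
import numpy as np

def method1_score(real_beats, gen_beats, dist_fn, n_v=300, n_g=300, rng=None):
    """Cross-compare randomly sampled portions of the real and generated sets
    and return the mean pairwise distance (stochastic estimate of s1)."""
    rng = rng or np.random.default_rng()
    V = real_beats[rng.choice(len(real_beats), n_v, replace=False)]
    G = gen_beats[rng.choice(len(gen_beats), n_g, replace=False)]
    return np.mean([dist_fn(v, g) for v in V for g in G])
```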

Table 9. Method 1 (portions of the two sets compared).
No. Model s1DTW s1Fré s1Euc
01 Classic GAN 3.953 0.589 8.325
02 DC-DC GAN 5.313 0.862 9.390
03 BiLSTM-DC GAN 4.535 0.625 8.557
04 AE/VAE-DC GAN 4.357 0.622 8.230
05 WGAN 4.401 0.681 8.486

Method 2.

In this approach, the template is randomly selected from the original dataset and all the generated beats are compared with it. The average distance of all generated beats from the template is the score for that model:

s_2^{DF} = \frac{1}{N_G} \sum_{i=1}^{N_G} DF(g_i, t) \quad (15)

This method is also obviously stochastic, as the outcome depends on the initial choice of the template. However, there is a constraint on the selected template which must have all the morphological features required by the class. Therefore, the variation is very limited and the results are more reliable. To select the best model, the scores are compared with each other in Table 10.

Table 10. Method 2 (all generated beats compared with one randomly selected template, averages).
No. Model s2DTW s2Fré s2Euc
01 Classic GAN 4.13 0.595 8.44
02 DC-DC GAN 5.66 0.863 9.75
03 BiLSTM-DC GAN 4.33 0.594 8.29
04 AE/VAE-DC GAN 4.52 0.627 8.34
05 WGAN 4.59 0.693 8.71

Method 3.

In this method, a template is randomly selected from the original dataset as in Method 2. Then, all the beats generated by each model are measured against the template, and the beat that produces the minimum distance function value is reported as the “best beat” for that model. The score of the model is the distance of its best beat from the template. This method measures the ultimate power of each model in getting as close to the template as possible (Table 11):

v_{best}^{DF} = \arg\min_{g_i \in G} DF(g_i, t) \quad (16)
s_3^{DF} = DF(v_{best}^{DF}, t) \quad (17)

Table 11. Method 3 (best generated beat - minimum distance function values).
No. Model s3DTW s3Fré s3Euc
01 Classic GAN 0.510 0.0844 0.890
02 DC-DC GAN 0.505 0.120 1.38
03 BiLSTM-DC GAN 0.425 0.0966 3.42
04 AE/VAE-DC GAN 0.505 0.108 1.02
05 WGAN 0.311 0.0981 0.610

Method 4.

In this method, a threshold is defined for each similarity measure (\eta_{DF}). Any generated beat (g_i) with a distance function value less than the threshold is considered an acceptable beat with respect to that distance function, i.e.:

G_{acc,DF} = \{ g_i : DF(g_i, t) \leq \eta_{DF}, \; i = 1, \ldots, N_G \} \quad (18)
N_{G_{acc,DF}} = n(G_{acc,DF}) \quad (19)

where n(.) is the number of elements in a set. Then, the Productivity Rate (i.e., the percentage of acceptable beats among all the generated beats) is calculated as the discriminating factor between the models:

s_4^{DF} = Prod_{DF} = \frac{n(G_{acc,DF})}{n(G)} = \frac{N_{G_{acc,DF}}}{N_G} \quad (20)

The model which produces the highest productivity rate is selected as the best in performance (Table 12). The choice of the threshold value is rather arbitrary, experience-based, and must be validated by a domain expert. It can be set as a factor of the minimum distance:

\eta_{DF} = a \cdot s_3^{DF}, \quad a \in \mathbb{R} \quad (21)

In this experiment, the threshold for each distance function is computed separately as the arithmetic mean between the minimum and the average of the values of that particular distance function over all the generated beats:

\eta_{DF} = \frac{s_3^{DF} + s_2^{DF}}{2} \quad (22)

Table 12. Method 4 (productivity rate - percent of acceptable beats).
No. Model s4DTW s4Fré s4Euc
01 Classic GAN 72.3 60.0 10.5
02 DC-DC GAN 26.8 18.6 11.2
03 BiLSTM-DC GAN 54.2 47.0 0.437
04 AE/VAE-DC GAN 49.7 37.7 9.80
05 WGAN 49.0 38.05 8.50
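Putting Eqs 18 to 22 together, the productivity rate for one model and one distance function could be computed as in the following sketch (names are illustrative; dist_fn is any of the three distance functions):

```python
import numpy as np

def productivity_rate(gen_beats, template, dist_fn):
    """Method 4: a beat is 'acceptable' if its distance to the template is below
    eta = (s3 + s2) / 2 (Eq 22); return the acceptance percentage (Eq 20)."""
    d = np.array([dist_fn(g, template) for g in gen_beats])
    s3, s2 = d.min(), d.mean()              # best-beat distance and average distance
    eta = 0.5 * (s3 + s2)                   # threshold per Eq 22
    return 100.0 * np.mean(d <= eta)
```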

Method 5.

In this method, an expert with domain-specific knowledge looks at the entire set of generated beats and gives their subjective judgment on the performance of the model. This is accomplished by inspecting the existence of the morphological features of the beat class in the set of generated beats. This method can also be used simply to validate the other aforementioned methods.

Efficacy of synthetic ECG augmentation

A simple experiment was designed to show the efficacy of the augmentation. A subset of the MIT-BIH Arrhythmia dataset was selected which contains only two classes: N (Normal Sinus Beat) and L (Left Bundle Branch Block). Then, a state-of-the-art classifier (ECGResNet34) is trained on (i) a balanced binary real dataset (L: 6455 and N: 6457) and (ii) an imbalanced dataset created by sampling the original dataset (L: 6460 and N: 500) so that the classifier performs poorly. Finally, (iii) the imbalanced dataset is balanced again (L: 6458 and N: 6454) by augmenting it with synthetically generated beats. The classifier is trained on each of these three training sets, and the classification metrics and confusion matrices are compared across the three cases. The test set (unseen data) is the same in all three cases (L: 1607 and N: 1609). The classifier used is ResNet34 [25], a 34-layer model that is state-of-the-art in image (2D) classification. It incorporates residual building blocks following the residual-stream logic F(x) + x. Each building block is comprised of two 3 × 3 convolutional layers, where the residual stream, x, goes directly from the input to the output of the block, which prevents the deterioration of training accuracy in deeper models [25]. This classifier is pretrained on the ImageNet dataset (more than 100,000 images in 200 classes). We used its 1D implementation [26] to classify the heartbeats.
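For illustration only, a 1-D residual block implementing F(x) + x might look as follows; this is a generic sketch and not the ECGResNet34 implementation of [26]:

```python
import torch
import torch.nn as nn

class ResidualBlock1d(nn.Module):
    """Illustrative 1-D residual block: out = ReLU(F(x) + x), where F is two
    kernel-3 convolutions with batch normalization."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm1d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)   # the residual stream x skips to the block output
```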

Platform

Two different machines have been utilized in this experiment: a Dell Alienware with Intel i9-9900k at 3.6 GHz (8 cores, 16 threads) microprocessor, 64 GB RAM, and NVIDIA GeForce RTX 2080 Ti graphics card with 24 GB RAM, and a personal Dell G7 laptop with an Intel i7-8750H at 2.2 GHz (6 cores, 12 threads) microprocessor, 20 GB of RAM, and NVIDIA GeForce 1060 MaxQ graphics card with 6 GB of RAM.

The code was written in Python 3.8, and PyTorch 1.7.1 was used as the main deep learning library, as it makes migration between CPU and GPU, as well as back-propagation and optimization, much easier thanks to its dynamic computational graph. The code is available on the GitHub page of the paper (https://github.com/mah533/Synthetic-ECG-Generation—GAN-Models-Comparison).

Results and discussion

Templates and typical normal beat

A statistically-averaged beat (SAB) template is shown in Fig 6(a). The downside to SABs is that, although all the beats have the same number of time steps, small horizontal shifts of morphological features along the temporal axis are inevitable (for instance, as a result of the segmentation process or heart rate variability). Consequently, in the calculation of the mean values (\bar{v}_j), more often than not, not all of the corresponding points are averaged together. Therefore, the resulting template (Fig 6(a)) looks distorted and quite different from a Normal beat; in other words, the characteristic morphological features of the class are no longer visually distinguishable. Nevertheless, from a statistical standpoint the SAB template is the best representation of the information from all the samples in the class, and it has been used in similar studies [10].

Generated beats

Some of the beats generated by the different models are presented in Figs 7, 8, 9, 10, and 11. The figures in columns (a), (b), and (c) are the generated beats with minimum DTW, Fréchet, and Euclidean distance function values (i.e., v_best^DTW, v_best^Fré, and v_best^Euc), respectively. The calculated values of all three distance functions are shown on each plot for comparison. In column (d), a beat selected from the last batch of the last iteration in the last epoch is shown, which represents the output of the maximally trained models. It can be seen that, after convergence, more training does not necessarily produce a better beat, neither in appearance nor in terms of quantitative proximity. Finally, in column (e), a beat that is visually close enough to the template in terms of morphological features is selected by an expert.

Fig 7. Generated beats, Classic GAN (01).

Fig 7

Fig 8. Generated beats, DC-DC GAN (02).

Fig 8

Fig 9. Generated beats, BiLSTM-DC GAN (03).

Fig 9

Fig 10. Generated beats, AE/VAE GAN (04).

Fig 10

Fig 11. Generated beats, WGAN (05).

Fig 11

Distance and loss functions

The trends of all three similarity measures, as well as the generator and discriminator loss functions, are plotted against the epoch number in Fig 12. All the curves of the DC-DC GAN model (02) suffer from severe fluctuations, which is the result of a convergence issue. Fluctuations exist in the other models as well, but they are not as severe.

Fig 12. Similarity measures and loss functions vs epoch numbers.

Fig 12

Performance metrics

Table 9 summarizes the performance metric s1DF (i.e., s1DTW, s1Fré, and s1Euc) for the five models. As shown there, the Classic GAN (in terms of the DTW and Fréchet distance functions) and the AE/VAE-DC GAN (in terms of the Euclidean distance function) seemingly generate the sets of beats closest to the original dataset. However, it should be noted that this analysis is stochastic, as the outcome depends on the sampling, i.e., the way the portions are selected. Therefore, the resulting outcomes are just one realization of the corresponding random variables and cannot be used for deterministic judgments. Nevertheless, these numbers show that all the models fall roughly in the same range.

Table 10 shows the result of the comparison using Method 2 (s2DF, the average distance from the template). Similar to Method 1, this process is also random because it depends on the choice of template. However, since the selected template is constrained to have all the morphological features of the class, as any other qualified candidate would be, the outcome numbers are much more reliable. The results show that the Classic GAN performs best with respect to the DTW distance function, the Classic and BiLSTM-DC GAN models perform equally well with respect to the Fréchet distance function, and the BiLSTM-DC GAN performs best with respect to the Euclidean distance function.

The results of the analysis using Method 3, s3DF (the distance of the best generated beat from the template), are shown in Table 11. They show that in terms of the DTW and Euclidean distance functions, WGAN performs the best, and in terms of the Fréchet distance function, the Classic GAN performs the best.

Assessment in terms of productivity rates (s4DF, Table 12) reveals that 72.3% and 60.0% of the beats generated by the Classic GAN (01) model are acceptable with respect to the DTW threshold (ηDTW) and the Fréchet threshold, respectively.

The Euclidean measure selects the DC-DC GAN with only about 11% productivity. This method, like Method 2, is essentially random, but since the selected template is constrained, its randomness is limited.

Visual inspection of the generated beats by a domain expert (Method 5) suggests, subjectively, that the WGAN and BiLSTM-DC GAN models produce more acceptable beats than the other models.

Efficacy of augmentation

Comparing the results in Tables 14 and 15 shows that augmenting the imbalanced dataset with synthetically generated beats improves the classification drastically, almost to the level of the real balanced dataset (Table 13). The same trend is noticeable in the confusion matrices (Tables 16a, 16b, and 16c).

Table 13. Real data, balanced.

Cl. Precision Recall F1-Score Support
L 0.95 0.96 0.96 1609
N 0.96 0.95 0.95 1607
Accuracy 0.95 3216
Macro avg 0.96 0.95 0.95 3216
Weighted avg 0.96 0.95 0.95 3216

Table 14. Real data, imbalanced.

Cl. Precision Recall F1-Score Support
L 0.52 1.00 0.68 1608
N 1.00 0.08 0.15 1608
Accuracy 0.54 3216
Macro avg 0.76 0.54 0.42 3216
Weighted avg 0.76 0.54 0.42 3216

Table 15. Augmented data, balanced.

Cl. Precision Recall F1-Score Support
L 0.99 0.95 0.97 1607
N 0.95 0.99 0.97 1609
Accuracy 0.97 3216
Macro avg 0.97 0.97 0.97 3216
Weighted avg 0.97 0.97 0.97 3216

Table 16. Confusion matrices.

a) Real data, balanced.
- L N
L 1552 57
N 88 1519
b) Real data, imbalanced.
- L N
L 1608 0
N 1480 128
c) Augmented data, balanced.
- L N
L 1521 86
N 9 1600

Conclusion

Machine learning automatic ECG diagnosis models classify ECG signals based on morphological features. ECG datasets are usually highly imbalanced because abnormal cases are scarce compared with the abundant Normal cases. Additionally, because of privacy concerns, not all the data collected from real patients are available as training sets. Therefore, it is necessary that realistic synthetic ECG signals can be generated and made publicly available. In this study, we compared the efficiency of several DL models in generating synthetic ECG signals using 5 different methods. The 3 introduced concepts (threshold, acceptable beat, and productivity rate) are employed to systematically evaluate the models. The results from Method 1 suggest that all the tested models compete very closely in generating synthetic ECG beats (Table 9). The fact that all the results are numerically in the same range shows that, by this method (metric s1DF), all models behave more or less equally well in generating acceptable beats.

What matters in generating synthetic beats for augmenting datasets is the productivity rate (s4DF), i.e., the efficiency of models in terms of time and computational power, which translates into the percentage of the acceptable beats. In fact, a good model is the one that generates more acceptable beats per unit of time and computational power. We believe the productivity rate (Method 4) is a very efficient way to assess the capability of models in end-to-end generation of the synthetic ECG signals.

Performance analysis using Method 4 shows that the Classic GAN has the highest productivity rate in terms of the DTW distance function, whereas the percentages of the BiLSTM-DC, AE/VAE-DC GAN, and WGAN models are all slightly lower but in the same range, and the productivity rate of the DC-DC GAN is the lowest. Using the Fréchet distance function produces the same trend, although at a slightly lower level. Thus, using Method 4, the Classic GAN has the highest percentage of acceptable beats and is the most efficient model with respect to the DTW and Fréchet similarity measures. This might seem counter-intuitive at first, but FC architectures are very powerful and can approximate highly complicated non-linear functions, so they can map the latent space to the real data space very well. The values of the Euclidean measure are so low altogether that it does not seem to be a suitable distance function for this purpose. For instance, Fig 9(c) shows one generated beat with minimum Euclidean distance that nevertheless contains none of the morphological features. The fact that both the DTW and Fréchet distance functions show the same trend indicates that both are suitable for the comparison, and the choice is just a matter of computational power. Visual inspection of the generated beats (Method 5) shows that the BiLSTM-DC GAN and WGAN generate acceptable beats more often than the others.

Contrary to classification tasks, in which the performance metrics are standardized, there is a lack of a systematic way to assess the performance of models in data generation tasks; in practice, performance is measured based on the quality and quantity of the generated data on a case-by-case basis. We believe Methods 1 to 4 can fill this gap and provide quantitative measures for assessing GAN-family models. Our simple experiment with the state-of-the-art classifier (ECGResNet34) showed empirically that augmenting imbalanced ECG datasets and balancing them with synthetic ECG signals can improve classification performance drastically.

Future works

A better similarity measure that can capture the similarity between time series more reliably and can eliminate the supervision of humans would help greatly.

Using different loss functions in the algorithms with various regularizations that can capture the difference between time series in a better way can result in a better convergence and alleviate fluctuations in error/loss functions.

Data Availability

The dataset underlying the results presented in the study is available from the PhysioNet MIT-BIH Arrhythmia Database (https://physionet.org/content/mitdb/1.0.0/).

Funding Statement

This research was partially supported by the Open Cloud Institute (OCI) at UTSA. The work of Fatemeh Afghah is supported by the National Science Foundation under Grant Number 2213915. There was no additional external funding received for this study.

References

  • 1. WHO. Cardiovascular Diseases, Fact Sheet; 2017. https://www.who.int/en/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
  • 2. Ye C, Coimbra M, Vijaya Kumar B. Arrhythmia detection and classification using morphological and dynamic features of ECG signals. In: 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology. 2010. p. 1918–21.
  • 3. Escalona-Morán MA, Soriano MC, Fischer I, Mirasso CR. Electrocardiogram classification using reservoir computing with logistic regression. IEEE J Biomed Health Inform. 2015;19(3):892–8. doi: 10.1109/JBHI.2014.2332001
  • 4. Yu S, Chou K. Integration of independent component analysis and neural networks for ECG beat classification. Expert Syst Appl. 2008;34(4):2841–6. doi: 10.1016/j.eswa.2007.05.006
  • 5. Rahhal MMA, Bazi Y, AlHichri H, Alajlan N, Melgani F, Yager RR. Deep learning approach for active classification of electrocardiogram signals. Inf Sci. 2016;345:340–54. doi: 10.1016/j.ins.2016.01.082
  • 6. Mousavi S. ECG Heartbeat Classification Seq2Seq Model; 2019. https://github.com/MousaviSajad/ECG-Heartbeat-Classification-seq2seq-model
  • 7. Panda R, Pati UC. Removal of artifacts from electrocardiogram using digital filter. In: 2012 IEEE Students’ Conference on Electrical, Electronics and Computer Science. 2012. p. 1–4. doi: 10.1109/sceecs.2012.6184767
  • 8. Hodge JG Jr. Health information privacy and public health. J Law Med Ethics. 2003;31(4):663–71. doi: 10.1111/j.1748-720x.2003.tb00133.x
  • 9. Golany T, Radinsky K. PGANs: personalized generative adversarial networks for ECG synthesis to improve patient-specific deep ECG classification. AAAI. 2019;33(01):557–64. doi: 10.1609/aaai.v33i01.3301557
  • 10. Wang P, Hou B, Shao S, Yan R. ECG arrhythmias detection using auxiliary classifier generative adversarial network and residual network. IEEE Access. 2019;7:100910–22. doi: 10.1109/access.2019.2930882
  • 11. Delaney A, Brophy E, Ward T. Synthesis of realistic ECG using generative adversarial networks. arXiv preprint. 2019. https://arxiv.org/abs/1909.09150
  • 12. Wulan N, Wang W, Sun P, Wang K, Xia Y, Zhang H. Generating electrocardiogram signals by deep learning. Neurocomputing. 2020;404:122–36. doi: 10.1016/j.neucom.2020.04.076
  • 13. Zhang Y, Babaeizadeh S. Synthesis of standard 12-lead electrocardiograms using two dimensional generative adversarial network. arXiv preprint. 2021. https://arxiv.org/abs/2106.03701
  • 14. Zhu F, Ye F, Fu Y, Liu Q, Shen B. Electrocardiogram generation with a bidirectional LSTM-CNN generative adversarial network. Sci Rep. 2019;9(1):6734. doi: 10.1038/s41598-019-42516-z
  • 15. Hong S, Zhou Y, Shang J, Xiao C, Sun J. Opportunities and challenges of deep learning methods for electrocardiogram data: a systematic review. Comput Biol Med. 2020;122:103801. doi: 10.1016/j.compbiomed.2020.103801
  • 16. Esteban C, Hyland S, Rätsch G. Real-valued (medical) time series generation with recurrent conditional GANs. arXiv preprint. 2017. https://arxiv.org/abs/1706.02633
  • 17. Doersch C. Tutorial on variational autoencoders. arXiv preprint. 2016. https://arxiv.org/abs/1606.05908
  • 18. Goodfellow I. NIPS 2016 tutorial: generative adversarial networks. arXiv preprint. 2016.
  • 19. Moody GB, Mark RG. The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol Mag. 2001;20(3):45–50. doi: 10.1109/51.932724
  • 20. Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov P, Mark R, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):e215–20.
  • 21. Pan J, Tompkins WJ. A real-time QRS detection algorithm. IEEE Trans Biomed Eng. 1985;32(3):230–6. doi: 10.1109/TBME.1985.325532
  • 22. Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA. Generative adversarial networks: an overview. IEEE Signal Process Mag. 2018;35(1):53–65. doi: 10.1109/msp.2017.2765202
  • 23. Berndt DJ, Clifford J. Using dynamic time warping to find patterns in time series. In: KDD Workshop. vol. 10, no. 16. Seattle, WA, USA; 1994. p. 359–70.
  • 24. Aronov B, Har-Peled S, Knauer C, Wang Y, Wenk C. Fréchet distance for curves, revisited. In: European Symposium on Algorithms. Springer; 2006.
  • 25. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016.
  • 26. Lyashuk A. ECG classification; 2021. https://github.com/lxdv/ecg-classification/blob/master/README.md

Decision Letter 0

Zahid Mehmood

1 Apr 2022

PONE-D-21-38425
Synthetic ECG Signal Generation Using Generative Neural Networks
PLOS ONE

Dear Dr. Adib,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 16, 2022, 11:59 PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Zahid Mehmood, PhD

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating in your Funding Statement:

“This research was partially supported by the Open Cloud Institute (OCI) at UTSA. The work of Fatemeh Afghah is supported by the National Science Foundation under Grant Number 1657260 and by the National Institute on Minority Health and Health Disparities of the National Institutes of Health under Award Number U54MD012388”

Please provide an amended statement that declares *all* the funding or sources of support (whether external or internal to your organization) received during this study, as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now.  Please also include the statement “There was no additional external funding received for this study.” in your updated Funding Statement.

Please include your amended Funding Statement within your cover letter. We will change the online submission form on your behalf.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This research was partially supported by the Open Cloud Institute (OCI) at UTSA. The work of Fatemeh Afghah is supported by the National Science Foundation under Grant Number 1657260 and by the National Institute on Minority Health and Health Disparities of the National Institutes of Health under Award Number U54MD012388.”

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“FA,

National Science Foundation, Grant Number 1657260

National Institute on Minority Health and Health Disparities of the National Institutes of Health under Award Number U54MD012388

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper provides a method for ECG beat generation using Generative Neural Networks (GAN) to solve the problem of imbalanced ECG datasets. The authors developed different models (FC-FC (classic), DC-DC, BiLSTM-DC, AE/VAE-FC, and DC-DC WGAN) to find the best model that can achieves the higher performance. The authors applied their methods on the MIT-BIH dataset with five classes from it. Regarding the paper quality and structure of the paper is very well organized and clear, while the methodology is superiorly described with clear and high-resolution Figures that describe the method. Finally, the results are providing clear and discussed and compared very well. Following are my comments to authors:

1. Regarding the proposed GANs network, the author must provide details about the used training optimization method and if they can provide a comparison between different fine-tuning results using the used optimization techniques.

2. Add more Figure to show the generated ECG beats especially for the main 5 classes and labelled them.

3. Add the equations for performance evaluations.

4. I think that authors should make the data available for the public using IEEE DataPort so others can use it to evaluate their algorithms for detection of fake ECG beats.

5. Add a table the compare the proposed method with methods in and must include more recent techniques and research and organized based on the number of beats.

6. Add a plots for the developed models not only tables.

Reviewer #2: The manuscript is well organized and the idea is interesting. My main concern is that the applicability of the approach in realistic application. Specifically, I suggest to design extra experiments to compare the model evaluational performance with generated ECG sample compared to real ECG sample, for example in ECG arrhythmia classification tasks.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 1

Zahid Mehmood

28 Jun 2022

Synthetic ECG Signal Generation Using Generative Neural Networks

PONE-D-21-38425R1

Dear Dr. Adib,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Zahid Mehmood, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear Authors,

Thank you very much for your great response to the reviewer comments, the paper is now informative and contains all the needed information.

Reviewer #2: The main concern of the realistic use of the proposed approach has been well addressed, the author provided experiments and experimental results to validate the efficacy of augmentation.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Ali Mohammad Alqudah

Reviewer #2: No

**********

Acceptance letter

Zahid Mehmood

PONE-D-21-38425R1

Synthetic ECG signal generation using generative neural networks

Dear Dr. Adib:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Zahid Mehmood

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: plosone_rebutal_rev1.docx

    pone.0271270.s001.docx (29.7KB, docx)


