Abstract
Objective:
The aim of this work is to demonstrate the performance of the ECG noise extraction tool (ECGNExT) which provides estimates of ECG noise that are not significantly different from the inherent noise in an ECG generated by motion artifacts and other sources. In addition, this paper elaborates on use of ECGNExT in an algorithm evaluation context comparing two QRS detection algorithms.
Methods:
140 simultaneous pairs of clean ECGs and ECGs corrupted with motion-induced noise from 29 participants under five different and separate motion conditions were collected and analyzed. Estimates of the noise component of the ECGs recorded with noise were obtained using ECGNExT and were then added to the clean ECGs yielding estimated ECGs with noise. Root mean squared error (RMSE) between the recorded and estimated ECGs with noise was calculated for temporal comparison, and band powers of the signals were calculated for spectral comparison.
Results:
A t-test revealed that the mean RMSE was below 150 μV (p-value < 0.001), and equivalence tests showed that the band powers of the two ECGs were statistically equivalent.
Conclusion:
ECGNExT can reliably estimate the underlying ECG noise while preserving temporal and spectral features.
Significance:
We previously proposed ECGNExT as a component of ECG analysis algorithm testing during noise conditions and reported its performance based on simulated ECG data. This work provides additional support of the performance and functionality of the ECGNExT algorithm from a study with pairs of simultaneously recorded ECGs with and without noise from human subjects.
Keywords: Arrhythmias, Cardiac activity monitoring, ECG analysis algorithms, ECG noise estimation, Heart rate variability, Heartbeat detection
I. Introduction
Wearable electrocardiograph (ECG) devices for monitoring cardiac activity and detecting arrhythmias are increasingly used in clinical and home healthcare applications [1–4]. These devices have drawn attention [5, 6] due to their compact form factor that facilitates long-term ECG detection and recording [7]. However, a challenge in these devices is the noise that contaminates the ECGs at the point of detection, particularly motion-induced noise for wearable devices [8] that could lead to missed or false detection of significant cardiac events [9]. Depending on the location of a wearable device on the body and the coupling of its sensors with the skin, the sources of noise can be a combination of various factors including motion [8], [10], myoelectric activity [11], electrode-to-skin coupling/movement [12], and other sources.
The effects of noise in ECGs can lead to various performance degradations for analysis algorithms such as false or incorrect detection of events or arrhythmias [9, 13], inaccurate heart rate and heart-rate variability estimations [14], and other adverse outcomes that can impact the diagnosis of relevant cardiac conditions. To assess the performance of ECG processing algorithms, international standards (e.g., ANSI/AAMI EC57, ANSI/AAMI/IEC 60601-2-47) [15, 16] include certain tests to be conducted using previously recorded databases, such as the MIT-BIH Arrhythmia database [17–20].
Applying additive noise to annotated ECGs from previously recorded databases is one approach that can be used to assess impacts of noise on an arrhythmia analysis algorithm’s performance. A common set of signals for this is the MIT-BIH Noise Stress Test (NST) Database [21] available via PhysioNet [20]. These noise datasets include half-hour recordings of muscle artifacts, electrode motions, and baseline wander and can be added to clean ECGs to simulate a noisy ECG. However, due to device design differences for new wearable ECG devices such as anatomical sensor location (e.g., wrist [22] vs. chest [2]), skin-sensor interface differences (e.g., contact [2, 22] vs. non-contact capacitive [23] electrodes), sensor modalities (e.g., wet vs. dry [22] electrodes), or other characteristics (e.g., hardware and software impacting ECG acquisition, filtering, and processing) it may not always be clear if the simulated noisy ECGs using previously recorded datasets may be representative of the ECGs recorded by a device during realistic use conditions. Using the noise samples from databases with ECGs recorded with different sensor technologies may not account for variations due to the different factors.
Obtaining realistic estimates of underlying signal noise from ECGs recorded with the device that will be used can provide more representative noise signals to be added to clean ECGs for testing the algorithms associated with the device of interest. This approach not only makes the noise samples relevant to the application a device is designed for, but also accounts for variations stemming from differences in device use (e.g., electrode technology, electrode location, patient activity, and other factors that may impact the noise/artifacts introduced on the ECG). This is possible because the noise estimates are obtained using the same sensing approach used by the device in question.
To that end, we previously proposed an ECG noise extraction tool (ECGNExT) to obtain samples of realistic noise from an ECG [24]. This tool provides a means to extract the noise present in an ECG recorded from any device, that can then be added to annotated reference ECGs to provide different signal-to-noise ratios (SNRs) for testing. In our previous work, the validation of the algorithm was limited to simulated ECGs where the noise was added from the NST database to those simulated ECGs.
In this paper, we present a clinical performance analysis of the algorithm proposed in [24]. We recorded simultaneous ECGs at two locations (wrist and collar bone) under certain motion conditions at the wrist site, which allows for a comparison of the noise estimates from ECGNExT. In addition to showing that the noise estimated by the algorithm is representative of the underlying ECG noise that could impact the performance of an ECG processing algorithm, we also provide an example of the use of the ECGNExT algorithm in the performance evaluation of two simple heartbeat detection algorithms. The performance of these heartbeat detection algorithms is evaluated at varying SNRs, obtained by adding noise estimated by the ECGNExT algorithm at varying amplitudes.
II. Methods
A. Algorithm Description
The ECG noise extraction tool (ECGNExT) estimates samples of noise from an ECG recorded during noise conditions. A detailed description is available in [24]. As illustrated in Fig. 1, the inputs of the algorithm used in this paper are a ‘noisy’ ECG, a band-pass-filtered version of that ECG, and the positions (indices) of the R peaks marking each heartbeat. ECGNExT is designed to accept accurate QRS (R-peak) locations as an input. Here, we used the publicly available Pan-Tompkins algorithm [26] for detecting R peaks. ECGNExT users can use more advanced R-peak detectors or, for example, a separate (cleaner) ECG channel for R-peak detection [24]. The filtered input is a band-pass (0.5 to 40 Hz) filtered version of the noisy ECG and is used to create a median-beat template. A residual signal, residuals, expected to contain high-frequency and low-frequency components, is computed by subtracting the filtered ECG from the noisy ECG. The QRS estimator subsystem estimates a template of the PQRST complex by aligning the QRS complexes in the filtered ECG using the R-peak indices and extracting, for every beat, the region around the R peak that includes the PQRST segments. For each R peak, this region starts 1/3 of the smallest R-to-R interval (RRI) to the left of the R peak and ends 2/3 of the smallest RRI to the right of it. A median-beat template is then created as the sample-by-sample median of these segments over all beats in a 60 s ECG record. The Synthetic ECG Generator subsystem then generates a synthetic ECG by repeating the template at each of the input R-peak indices (Fig. 1). Subtracting the synthetic ECG from the filtered ECG yields an initial estimate of the noise in the ECG, from which a period of ±40 ms around each R peak is removed by the QRS Removal subsystem to mitigate any residual differences between the synthetic and recorded ECG for each beat. The resulting signal is added to residuals, from which the QRS complexes are similarly removed by the same QRS Removal subsystem.
The result is the final output of the algorithm: an extracted noise signal similar to that impacting the original ECG. Further details of the process are available in our previous work [24], and MATLAB (The MathWorks, Inc.) code implementing the algorithm is available at: https://github.com/dbp-osel/ECGNExT [25].
Fig. 1:

Block diagram of the ECGNExT Algorithm.
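The processing steps described in this subsection can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' MATLAB implementation: the function names `extract_noise` and `remove_qrs` are hypothetical, the band-pass-filtered ECG is taken as a ready-made input, QRS removal is assumed to mean deleting the samples around each R peak, and samples not covered by any template window are assumed to be zero in the synthetic ECG.

```python
from statistics import median


def remove_qrs(signal, r_peaks, half_width):
    """Delete the samples within +/- half_width samples of every R peak."""
    drop = set()
    for r in r_peaks:
        drop.update(range(r - half_width, r + half_width + 1))
    return [v for i, v in enumerate(signal) if i not in drop]


def extract_noise(noisy, filtered, r_peaks, fs, qrs_half_width_s=0.040):
    """Sketch of the noise-extraction steps (hypothetical helper).

    noisy    : recorded ECG samples
    filtered : band-pass-filtered (0.5 to 40 Hz) copy of `noisy`
    r_peaks  : sorted R-peak sample indices
    fs       : sampling rate in samples per second
    """
    if len(noisy) != len(filtered):
        raise ValueError("noisy and filtered must have the same length")
    if len(r_peaks) < 2:
        raise ValueError("at least two R peaks are needed")
    n = len(noisy)

    # Residuals: what the band-pass filter took out of the recording.
    residuals = [x - f for x, f in zip(noisy, filtered)]

    # Beat window: 1/3 of the smallest RRI before and 2/3 after each R peak.
    min_rri = min(b - a for a, b in zip(r_peaks, r_peaks[1:]))
    left, right = min_rri // 3, (2 * min_rri) // 3

    # Median-beat template from all beats whose window fits in the record.
    beats = [filtered[r - left:r + right] for r in r_peaks
             if r - left >= 0 and r + right <= n]
    if not beats:
        raise ValueError("no complete beat window fits inside the record")
    template = [median(column) for column in zip(*beats)]

    # Synthetic ECG: the template repeated at every R-peak index.
    synthetic = [0.0] * n
    for r in r_peaks:
        for k, value in enumerate(template):
            i = r - left + k
            if 0 <= i < n:
                synthetic[i] = value

    # Initial noise estimate, then QRS removal on both signals before summing.
    initial = [f - s for f, s in zip(filtered, synthetic)]
    half = int(round(qrs_half_width_s * fs))
    kept_initial = remove_qrs(initial, r_peaks, half)
    kept_residuals = remove_qrs(residuals, r_peaks, half)
    return [a + b for a, b in zip(kept_initial, kept_residuals)]
```

Because samples are deleted around each R peak, the returned noise estimate is shorter than the input record.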
B. Study Design
The study was approved under FDA IRB Study #2020-CDRH-007. Informed consent was received from 30 participants (22 male, 8 female, age: 41+/−9) who were then enrolled in the study. Consistent with the protocol, all participants were willing and able to perform a series of simple instructed motions, had no known cardiac conditions or active implantable devices, and were not pregnant. Gel electrodes were placed on the right collar bone, right wrist, and two on the left iliac bone for reference. Single-lead ECGs were simultaneously recorded from the right collar bone and right inner wrist sites using the left iliac bone electrodes as references. All recordings were performed using a two-channel bio amplifier (FE232) and a PowerLab 16/35 data acquisition unit from ADInstruments. The sampling rate was set to 1000 samples per second and the “Mains Filtering” feature of the unit (a filtering feature offered by the data acquisition software (LabChart8) to attenuate powerline noise) was set to active.
Participants were asked to sit still and calm in a chair and perform the following five independent and separate activities with their right hand for a duration of at least one minute each:
Finger Tapping: Tapping their fingers on the chair arm where they were sitting.
Arm Waving: Waving their arm from elbow down.
Wrist Rotating: Rotations around the wrist area without moving the rest of the arm.
Electrode Tapping: Tapping on the outer electrode surface on the right wrist using their left hand.
Fist Making: Different activities involving making a fist and other finger motions.
Each activity was performed so as to isolate movement to the wrist location only. Pairs of ECGs from the collar bone and wrist locations were simultaneously recorded from each participant. The collar-bone ECG is expected to be free from artifacts induced by the motion, while the wrist ECG is expected to contain them. From these recordings, motion-induced artifacts can then be isolated from the wrist ECG by removing the collar-bone ECG. Each activity was preceded by a one-minute period without any motion for collecting baseline ECGs. Of the 30 participants, one dataset did not cover the entire set of activities listed above and was not included in the analysis. A total of 145 (5 activities × 29 participants) pairs of collar-bone and wrist signals, each 60 s long, were collected as the activities were being performed. Although data collection for each activity lasted 60 s, 24 segments from 10 participants had significant noise and motion artifacts that caused errors in R-peak detection, which is necessary for the study analysis. Of these 24 signals, 5 (4 from one participant and 1 from another) were determined to be unusable over the entire 60 s and were not included in further analyses. This reduced the total number of ECGs studied to 140. At least 30 s of usable ECG was identified in each of the remaining 19 corrupted signals, and these were included in the analyses. A breakdown of the total number of ECGs analyzed for each activity is provided in Table I.
Table I:
Summary of ECG segments in each activity and number of RMSE outliers obtained with the unfiltered and the band-pass-filtered reference ECG.
| Activity Type | Act. Code | # of ECGs | # of ECGs < 60 s | # of cases with RMSE > 150 μV (unfiltered reference ECG) | # of cases with RMSE > 150 μV (band-pass-filtered reference ECG) |
|---|---|---|---|---|---|
| Finger Tap | ACT1 | 28 | 4 | 6 | 2 |
| Arm Wave | ACT2 | 28 | 6 | 5 | 2 |
| Wrist Rotate | ACT3 | 27 | 3 | 3 | 1 |
| Electrode Tap | ACT4 | 28 | 3 | 3 | 2 |
| Make Fist | ACT5 | 29 | 3 | 1 | 1 |
| Total | | 140 | 19 | 18 | 8 |
| No Motion | NoMot | 145 | 12 | 10 | 3 |
Fifteen datasets (including one from the 14 shortened ECGs) had signals with significant noise and motion artifacts throughout their recording periods while their R peaks could still be detected. A sample noisy (left panel of Fig. 2) is compared with a noise-free (right panel of Fig. 2). Existence of levels of noise in such as what is depicted in the left panel of Fig. 2 prevents us from accurately investigating the error associated with . Therefore, we applied a band-pass filter with a pass band between 0.5 and 40 Hz to all signals. We denote the resulting band pass-filtered as . The results based on both scenarios of using and in (1) are summarized in the tables of the Results section and in the Appendix, whereas the figures report results based on scenario only unless otherwise indicated.
Fig. 2:

from two participants while performing finger tapping. The left resulted in poor R peak detection whereas the right is acceptable.
C. Extracted Noise Analysis
First R-peaks were detected from using Pan-Tompkins QRS detection algorithm [26]. signals along with the R-peak locations were passed through the ECGNExT algorithm to estimate the recorded noise and other induced motion artifacts. We then obtained as in (1) to compare it with .
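$$\widehat{\mathrm{ECG}}_{\text{noisy}} = \mathrm{ECG}_{\text{clean}} + \hat{n} \tag{1}$$
where $\mathrm{ECG}_{\text{clean}}$ is the clean reference ECG, $\hat{n}$ is the noise estimated by ECGNExT, and $\widehat{\mathrm{ECG}}_{\text{noisy}}$ is the resulting estimated noisy ECG.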
For reference, see Fig. 3 for a 10-second sample of these signals collected during a finger tapping activity from one participant. From the top of the figure, panel 1 shows the clean ECG, panel 2 shows the recorded noisy ECG, panel 3 shows the estimated noise, and panel 4 shows the estimated noisy ECG.
Fig. 3:

Sample signals for one subject during the finger tapping activity. Top to bottom: panel 1 is the clean ECG, panel 2 is the recorded noisy ECG from the wrist, panel 3 is the noise estimated from the wrist ECG, and panel 4 is the estimated noisy ECG obtained by adding the estimated noise to the clean ECG.
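To demonstrate that the estimated noise output from the ECGNExT algorithm is representative of the motion artifact in the recorded ECG, we tested the hypothesis that the root mean squared error (RMSE) between the recorded noisy ECG, $\mathrm{ECG}_{\text{noisy}}$, and the estimated noisy ECG, $\widehat{\mathrm{ECG}}_{\text{noisy}}$, as defined in (2), was less than 150 μV and 50 μV using a one-sided t-test. These thresholds were selected from the ECG monitoring standard IEC 60601-2-27 for rejecting QRS complexes and from the internal noise allowed in ambulatory ECG per the standard IEC 60601-2-47, respectively [15].
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{ECG}_{\text{noisy}}[i] - \widehat{\mathrm{ECG}}_{\text{noisy}}[i]\right)^{2}} \tag{2}$$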
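Here, $N$ is the number of samples in an ECG segment recorded during an activity from each participant. Because the primary difference between the clean ECG, $\mathrm{ECG}_{\text{clean}}$, and the recorded noisy ECG, $\mathrm{ECG}_{\text{noisy}}$, is the presence of motion-induced artifacts, subtracting the former from the latter gives a direct estimate of the recorded noise for evaluating the estimated noise $\hat{n}$. We measured and visually compared the recorded noise $n_{\text{rec}}$ with $\hat{n}$, where $n_{\text{rec}}$ is the difference between $\mathrm{ECG}_{\text{noisy}}$ and $\mathrm{ECG}_{\text{clean}}$:
$$n_{\text{rec}} = \mathrm{ECG}_{\text{noisy}} - \mathrm{ECG}_{\text{clean}} \tag{3}$$
The difference between $n_{\text{rec}}$ and $\hat{n}$, as defined in (4), is also reported.
$$e = n_{\text{rec}} - \hat{n} \tag{4}$$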
The QRS complexes of , and were removed when used in (1) to (4) because has its QRS removed as part of the ECGNExT algorithm.
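The RMSE comparison and the one-sided test described above can be illustrated with a minimal Python sketch. The function names are hypothetical, the signals are assumed to be equal-length lists of samples in μV, and only the t statistic is computed (the p-value would come from a t distribution with n − 1 degrees of freedom, which is not part of the Python standard library).

```python
import math
from statistics import mean, stdev


def rmse(recorded, estimated):
    """Root mean squared error between two equal-length signals."""
    if len(recorded) != len(estimated) or not recorded:
        raise ValueError("signals must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(recorded, estimated))
    return math.sqrt(squared / len(recorded))


def one_sided_t_statistic(rmse_values, threshold_uv):
    """t statistic for H0: mean RMSE >= threshold vs. H1: mean RMSE < threshold.

    Large negative values favour H1 (mean RMSE below the threshold).
    """
    n = len(rmse_values)
    standard_error = stdev(rmse_values) / math.sqrt(n)
    return (mean(rmse_values) - threshold_uv) / standard_error
```

In use, `rmse` would be evaluated once per ECG pair, and the resulting list of RMSE values would be passed to `one_sided_t_statistic` with a threshold of 150 or 50.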
To evaluate differences in spectral content between the estimated and recorded signals, spectral powers and were obtained for and , respectively. For this purpose, the signals were detrended after their means were removed, and then their power spectral densities were calculated based on Welch’s windowing method [27] using the MATLAB function pwelch.m with a window length of 30 s and 50% overlap. Finally, spectral powers P1 and P2 were obtained in three bands of 0.5 – 5 Hz, 5 – 25 Hz, and 25 – 40 Hz [28, 29] by calculating the area under the obtained power spectral density curves with widths defined by the mentioned frequency bands. here was based on and QRS complexes were not removed from and when obtaining P1 and P2. For the sample power curves in Fig. 8, the involved signals were first normalized based on their standard deviations before their powers were estimated, whereas the rest of frequency analysis was based on unnormalized signals. Please note that was used instead of as a reference signal when plotting Fig. 8 to cover for all the frequency components involved. In contrast, was used as a reference when showing aggregate powers of the signals involved in the mentioned three bands in the boxplots of Fig. 9 to be consistent with the rest of figures for time-domain analyses. The differences between and , were then analyzed through an equivalence test with an equivalence interval (EI) as defined in (5) for means of and at in the mentioned frequency bands:
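$$\mathrm{EI} = [\theta_L,\ \theta_U] \tag{5}$$
$\theta_L$ and $\theta_U$ are the lower and upper bounds of the EI in (5) and are used to define the alternative hypothesis as:
$$H_1:\ \theta_L < \mu_1 - \mu_2 < \theta_U \tag{6}$$
where $\mu_1$ and $\mu_2$ are the means of $P_1$ and $P_2$, respectively. We used the two one-sided tests (TOST) method, implemented by code provided in the MathWorks® File Exchange repository [30]. The test result is a confidence interval (CI); if it lies within the proposed EI, the null hypothesis is rejected. In this study, we use the common standard deviation of $P_1$ and $P_2$, $\sigma$, to define $\theta_L$ and $\theta_U$ as $-0.36\sigma$ and $0.36\sigma$, respectively, as proposed in [31] with:
$$\sigma = \sqrt{\frac{(n_1 - 1)\,\sigma_1^{2} + (n_2 - 1)\,\sigma_2^{2}}{n_1 + n_2 - 2}} \tag{7}$$
where $\sigma_1$ and $\sigma_2$ are the standard deviations of $P_1$ and $P_2$, respectively, with their corresponding numbers of samples denoted by $n_1$ and $n_2$.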
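The confidence-interval form of the equivalence test can be sketched as follows. This is not the File Exchange code used in the study: the function names are hypothetical, the margin factor and significance level are left as caller-supplied parameters, and the interval uses a large-sample normal quantile in place of the t quantile.

```python
import math
from statistics import NormalDist, mean, stdev


def pooled_sd(s1, n1, s2, n2):
    """Common (pooled) standard deviation of two samples."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def tost_by_confidence_interval(p1, p2, margin_factor, alpha):
    """Return (CI, EI, equivalent) for the difference of means of p1 and p2.

    margin_factor : EI half-width as a multiple of the pooled SD (assumed input)
    alpha         : significance level of each one-sided test (assumed input)
    """
    n1, n2 = len(p1), len(p2)
    sigma = pooled_sd(stdev(p1), n1, stdev(p2), n2)
    theta = margin_factor * sigma                 # EI is [-theta, +theta]
    diff = mean(p1) - mean(p2)
    se = sigma * math.sqrt(1.0 / n1 + 1.0 / n2)   # standard error of the difference
    z = NormalDist().inv_cdf(1.0 - alpha)         # normal approximation to t quantile
    ci = (diff - z * se, diff + z * se)
    equivalent = -theta < ci[0] and ci[1] < theta
    return ci, (-theta, theta), equivalent
```

Equivalence is declared only when the whole confidence interval falls inside the equivalence interval, which is the decision rule stated in the text.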
Fig. 8:

Sample power curves comparing spectral characteristics of the (dark grey dashed line) and (grey solid line).
Fig. 9:

Summary of absolute powers of (black) and (grey) powers in three bands ordered from top to bottom as in Table III, grouped based on each activity.
When obtaining and , we considered both cases of and with and without QRS complexes. For the case of removing QRS, the QRS components of were first removed and then was added to it to obtain per (1). In the case of with QRS, the was cut short to comply with the length of to allow addition per (1). These two cases were examined with both scenarios of using and in (1). The outcomes of this equivalence test are summarized in the Results section for the only scenario where is used in (1) without removing the QRS components of the resulting and the . The outcomes for other scenarios are summarized in the Appendix.
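The band-power step (area under the power spectral density curve within each band) can be illustrated with a short Python sketch. It assumes the PSD has already been estimated on a frequency grid, for example by Welch's method; the function name `band_power` is hypothetical, and band edges that fall between grid points are not interpolated.

```python
# Frequency bands used for the spectral comparison, in Hz.
BANDS_HZ = [(0.5, 5.0), (5.0, 25.0), (25.0, 40.0)]


def band_power(freqs_hz, psd, f_lo, f_hi):
    """Trapezoidal area under the PSD for grid points with f_lo <= f <= f_hi."""
    points = [(f, p) for f, p in zip(freqs_hz, psd) if f_lo <= f <= f_hi]
    if len(points) < 2:
        raise ValueError("band contains fewer than two PSD samples")
    return sum(0.5 * (p0 + p1) * (f1 - f0)
               for (f0, p0), (f1, p1) in zip(points, points[1:]))


def band_powers(freqs_hz, psd, bands=BANDS_HZ):
    """Band power for every band in `bands`, in the same order."""
    return [band_power(freqs_hz, psd, lo, hi) for lo, hi in bands]
```

With the PSD in μV²/Hz, the returned values are in μV².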
D. Sample Size Analysis
The number of participants for this study was based on simulated results from the previous work [24], where an RMSE of less than 50 μV was suggested based on the IEC 60601-2-47 standard [15]. We used the “sampsizepwr.m” function from MATLAB to obtain the minimum number of datasets needed to ensure a power of 0.9 at the chosen significance level for the proposed alternative hypothesis. At least 76 ECG pairs were required under these conditions. It was initially unclear how many of the activity periods would provide clean and noisy ECGs acceptable for our analysis, but we estimated that 3 segments could be obtained from each participant. Therefore, a minimum of 26 subjects (76/3, rounded up) was determined to be needed, with additional subjects included in case of subject dropout.
E. ECGNExT in a Sample ECG Analysis Algorithm Evaluation Context
This section presents an example utilization of ECGNExT, mimicking a hypothetical design problem, where a developer needs to compare two available ECG analysis algorithms to determine which one performs better in the presence of noise. To that end, we consider two example heartbeat detection algorithms, Pan-Tompkins [26] and Perry [32] – a QRS detection algorithm available in the WFDB software package [33] of PhysioNet [20]. We evaluate their QRS detection performance in presence of the realistic noise estimates captured during this study. We used the 145 segments that were obtained using ECGNExT algorithm as described earlier. Adding each segment at a time with a single selected segment from the pool of ECGs collected during no-motion episodes yielded 145 SNR-controlled signals. These signals were then fed to both QRS detection algorithms at SNR levels varying from −12 dB to 12 dB in steps of 6 dB to evaluate their performances at each SNR level and with each signal. Fig. 4 illustrates the processes of obtaining the from using ECGNExT algorithm, making the , and performance evaluation of the QRS detection algorithms.
Fig. 4:

Workflow of algorithm evaluation using the noise estimates from ECGNExT algorithm.
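The step of combining a clean ECG with an extracted noise segment at a prescribed SNR can be sketched as below. The text does not spell out the SNR definition, so this sketch assumes the ratio of mean signal power to mean noise power in dB; the function names are hypothetical, and the clean ECG is truncated to the noise length because the extracted noise is shorter after QRS removal.

```python
import math


def mean_power(x):
    """Mean squared value of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR (assumed definition)."""
    n = min(len(clean), len(noise))
    clean, noise = clean[:n], noise[:n]
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise segment has zero power")
    # Gain that brings the noise power to p_signal / 10^(snr_db / 10).
    gain = math.sqrt(mean_power(clean) / (p_noise * 10 ** (snr_db / 10.0)))
    return [c + gain * v for c, v in zip(clean, noise)]


# SNR levels from -12 dB to 12 dB in steps of 6 dB, as in the evaluation.
SNR_LEVELS_DB = [-12, -6, 0, 6, 12]
```

Each noise segment would be mixed with the selected no-motion ECG once per entry in `SNR_LEVELS_DB` before being passed to the QRS detectors.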
Detection rate (in percentage), false-detection rate (in beats per minute (bpm)), and heart rate (HR) in bpm were used to compare the performances of the Pan-Tompkins and Perry algorithms at different SNRs. Detection rate was defined as number of correctly detected R peaks divided by the number of ground truth (GT) R peaks. Only one peak detected within 150 ms of each GT R peak was considered a correctly detected R peak per ECG standard EC57 clause 4.32 [16] and the rest were reported as false detections. Number of false detections within the first and last GT R peaks divided by the time between these two GT R peaks defines the false-detection rate in this paper. HR reported in this paper is the average of reciprocals of RRIs.
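The three metrics defined above can be written out as a small Python sketch. The function names are hypothetical, peak positions are assumed to be sorted sample indices, and the nearest unused detection within the tolerance is assumed to be the one matched to each ground-truth peak (the text only states that one detection per GT peak counts as correct).

```python
def match_detections(gt_peaks, detected, fs, tol_s=0.150):
    """Match at most one detection (within tol_s) to each ground-truth R peak."""
    tol = tol_s * fs
    used = set()
    for g in gt_peaks:
        best = None
        for j, d in enumerate(detected):
            if j in used or abs(d - g) > tol:
                continue
            if best is None or abs(d - g) < abs(detected[best] - g):
                best = j
        if best is not None:
            used.add(best)
    false_detections = [d for j, d in enumerate(detected) if j not in used]
    return len(used), false_detections


def detection_metrics(gt_peaks, detected, fs):
    """Detection rate (%) and false-detection rate (per minute)."""
    true_positives, false_detections = match_detections(gt_peaks, detected, fs)
    detection_rate = 100.0 * true_positives / len(gt_peaks)
    first, last = gt_peaks[0], gt_peaks[-1]
    minutes = (last - first) / fs / 60.0
    n_false = sum(1 for d in false_detections if first <= d <= last)
    return detection_rate, n_false / minutes


def mean_heart_rate(peaks, fs):
    """Average of the reciprocals of the R-to-R intervals, in bpm."""
    rris_s = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return sum(60.0 / r for r in rris_s) / len(rris_s)
```

Only false detections that fall between the first and last GT R peaks enter the false-detection rate, matching the definition in the text.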
III. Results
A total of 140 ECGs from 29 participants were included in the analyses. Table I provides a breakdown of the number of recordings used from each activity after excluding some ECGs as described in the Methods. In addition, the table provides the number of ECGs that resulted in RMSEs greater than 150 μV when using the unfiltered versus the band-pass-filtered reference ECG in (1).
A. Time Domain Results
Fig. 5 provides examples of signals for no motion (topmost panel) and the rest of activity types in the rest of the panels as labeled in Table I. The signal related to fist making (ACT5) appears to be the noisiest. Random spikes in the signal affected by electrode tapping (ACT4) due to tapping events are also evident in the figure.
Fig. 5:

Sample signals for each motion type as encoded in Table I.
The null hypothesis that the mean RMSE is at least 150 μV was rejected by the one-sided t-test with a p-value < 0.001 at the chosen significance level, supporting that the RMSE remains below 150 μV on average. Table II provides a summary of the RMSE statistics for the two scenarios of using the unfiltered and the band-pass-filtered reference ECG. Rows 1 and 3 are based on the case in which the unfiltered reference ECG was used in (1) for the no-motion and activity periods, respectively. Similarly, Rows 2 and 4 are based on the case in which the band-pass-filtered reference ECG was used instead in (1) for the no-motion and activity periods, respectively. The one-sided t-test results are based on the data summarized in the fourth row of the table.
Table II:
RMSE statistics for the scenarios of using the unfiltered and the band-pass-filtered reference ECG.
| No. | Reference ECG | RMSE Mean (μV) | RMSE STD (μV) | Activity |
|---|---|---|---|---|
| 1 | Unfiltered | 62.7 | 78.2 | N |
| 2 | Band-pass filtered | 34.6 | 66.7 | N |
| 3 | Unfiltered | 78.4 | 84.6 | Y |
| 4 | Band-pass filtered | 56.2 | 96.5 | Y |
The number of signals causing the outliers is provided in Table I. We also performed the t-test with a 50 μV threshold. However, this threshold was too stringent for the null hypothesis to be rejected.
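As a rough plausibility check, the two test outcomes can be reproduced from the summary statistics in the fourth row of Table II. This sketch assumes that the tabulated STD is the sample standard deviation of the per-ECG RMSE values and that the sample size is the 140 analyzed ECG pairs; the lower-tail p-value uses a normal approximation rather than the exact t distribution.

```python
import math
from statistics import NormalDist


def t_from_summary(mean_uv, std_uv, n, threshold_uv):
    """One-sample t statistic computed from summary statistics."""
    return (mean_uv - threshold_uv) / (std_uv / math.sqrt(n))


# Row 4 of Table II (mean 56.2, STD 96.5); n = 140 is an assumption of this sketch.
t_150 = t_from_summary(56.2, 96.5, 140, 150.0)
t_50 = t_from_summary(56.2, 96.5, 140, 50.0)

# Lower-tail p-values for H1: mean RMSE < threshold (normal approximation).
p_150 = NormalDist().cdf(t_150)
p_50 = NormalDist().cdf(t_50)

print(f"150 uV threshold: t = {t_150:.2f}, p = {p_150:.3g}")
print(f" 50 uV threshold: t = {t_50:.2f}, p = {p_50:.3g}")
```

Under these assumptions the statistic is strongly negative for the 150 μV threshold and positive for the 50 μV threshold, which is consistent with rejecting the null hypothesis in the first case only.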
Fig. 3 shows sample signals involved in the process to observe the similarities between (the second panel from the top) and (the fourth panel from the top). As can be seen, the effects of (the third panel from the top) on the (the first panel from the top) is comparable to that of motion noise effects observed in the . This is despite the loss of time alignment between and due to QRS removal which shows that the algorithm preserves the overall morphology of the underlying ECG noise.
Fig. 6 shows an overlay of a sample after QRS removal only in contrast with . As can be seen, the effects of T-waves are attenuated in as opposed to with only QRS removed. This illustrates an advantage offered by the algorithm which avoids altering the T-Waves in an ECG after adding the estimated noise for algorithm evaluation.
Fig. 6:

Overlay of and for one subject during a finger tapping episode.
Fig. 7 shows boxplots of the RMS values of and on the top panel, and the RMS values of on the bottom. The boxplots are based on data summarized on row 4 of Table II but grouped based on activity types. A good conformity can be seen between and from the top panel of the figure as closely follows the pattern in terms of mean values and the variations around the mean for each activity. The difference between and , captured by , remains around 50 μV on average for each activity (except arm waving) based on the bottom panel of the figure. The largest difference is displayed for Arm Wave activity in the bottom panel of the figure. However, that difference is because of the activity affecting the electrode which is influenced more by the Arm Wave activity than other activities and introduces additional noise in the comparison that is not part of the obtained from ECGNExT algorithm. Rather, those noises are part of the since in arm waving activity the noise could not be completely isolated to the wrist area.
Fig. 7:

RMS values for in grey and in black (top panel); RMS values for (bottom panel).
B. Frequency Domain Results
Table III summarizes the power statistics of and at three different frequency bands and indicates that the CI’s are within their corresponding EI’s. Comparing the obtained CI’s of the mean difference for each spectral band power as listed in Table III and their corresponding EI’s defined in (5), the null hypothesis can be rejected with an . This means that spectral characteristics of the and are not significantly different on average for the listed frequency bands. This can be observed in the sample spectral plot in Fig. 8 as curves related to (dashed dark grey) and (solid grey) are closely following each other. In addition, the aggregate spectral characteristics depicted in the boxplots of Fig. 9 suggest that both and exhibit similar patterns across all the study participants and for each activity for most of the case. was used in (1) to obtain and the QRS components were not removed from and when obtaining the power for the contents of Table III. Results of the other combinations described in II.C are provided in the Appendix.
Table III:
Summary of the spectral characteristics of and signals.
| No. | ECG | QRS | Frequency Band | Mean P1 × 10^5 (μV²) | Mean P2 × 10^5 (μV²) | STD P1 × 10^5 (μV²) | STD P2 × 10^5 (μV²) | CI × 10^5 (μV²) | EI × 10^5 (μV²) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | Y | 0.5 – 5 Hz | 0.8814 | 0.9132 | 3.9565 | 3.8709 | [−1.120, 1.056] | [−1.4090, 1.4090] |
| 2 | | Y | 5 – 25 Hz | 0.4133 | 0.4729 | 1.1114 | 1.0470 | [−0.360, 0.241] | [−0.3887, 0.3887] |
| 3 | | Y | 25 – 40 Hz | 0.0256 | 0.0265 | 0.0571 | 0.0507 | [−0.016, 0.014] | [−0.0194, 0.0194] |
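The relationship between the tabulated EIs and the common standard deviation in (7) can be checked numerically from the values in Table III. The sketch below assumes equal sample sizes for the two signals, in which case the pooled standard deviation reduces to the root of the mean of the two variances; the dictionary and function names are hypothetical.

```python
import math

# (STD of first signal, STD of second signal, upper EI bound), all x 1e5 uV^2,
# copied from the three rows of Table III.
TABLE_III = {
    "0.5-5 Hz": (3.9565, 3.8709, 1.4090),
    "5-25 Hz": (1.1114, 1.0470, 0.3887),
    "25-40 Hz": (0.0571, 0.0507, 0.0194),
}


def pooled_sd_equal_n(s1, s2):
    """Pooled standard deviation when both samples have the same size."""
    return math.sqrt((s1 ** 2 + s2 ** 2) / 2.0)


# Ratio of the EI half-width to the pooled SD for each band.
ratios = {band: ei / pooled_sd_equal_n(s1, s2)
          for band, (s1, s2, ei) in TABLE_III.items()}

for band, ratio in ratios.items():
    print(f"{band}: EI half-width / pooled SD = {ratio:.3f}")
```

The ratio comes out at about 0.36 in all three bands, so the tabulated EIs appear to be a fixed multiple of the common standard deviation. This is an observation drawn from the tabulated values under the equal-sample-size assumption.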
C. Algorithm Demo Test Results
Fig. 10 summarizes the performance of the two R-peak detection algorithms (Pan-Tompkins and Perry) when processing the with varying SNRs. Performance improvements against SNR can be observed in Fig. 10.
Fig. 10:

Performance metrics reported for Pan-Tompkins (black) and Perry (grey) QRS detection algorithms.
IV. Discussions
An RMSE, our estimate of the difference between the recorded and estimated noisy ECGs, of less than 150 μV shows that the noise estimated by the ECGNExT algorithm (available online at: https://github.com/dbp-osel/ECGNExT [25]) is representative of the inherent noise in the ECGs collected under motion conditions. This means that a clean test ECG combined with noise estimated by the ECGNExT algorithm can provide a representative substitute for the noisy ECGs that an ECG analysis algorithm may be exposed to. It follows that relevant performance tests can be conducted on an ECG analysis algorithm using noise estimates from the ECGNExT algorithm in conjunction with a clean ECG.
While an RMSE of 150 μV is a generous threshold, it needs to be emphasized that it allows some room for errors due to the inherent differences between ECGs collected from two different anatomical sites (wrist vs. collar bone) and for any noise present in the reference ECG. This is evident from the second row of Table II, where an RMSE of up to about 100 μV is observed for the no-motion case alone. This shows that part of the difference between the recorded and estimated noisy ECGs in our analyses is due to differences in the location of electrode placement. Comparing the second and fourth rows, there is an increase of about 20 μV in mean RMSE as we introduce motion in the ECG collected from the wrist. This also implies that the differences in ECG collection sites contribute more to the RMSE than motion artifacts do. Thus, the algorithm tends to produce repeatable output under motion and no-motion conditions.
The larger difference between and observed in the bottom panel of Fig. 7 related to the Arm Wave activity is due to the increased noise in . This is because the Arm Wave activity is less isolated to the wrist area and affects collected from the collar bone. Thus, using these segments to obtain results in increased amounts of noise from the which is not accounted for by the ECGNExT algorithm. On the other hand, the RMS Error between and (bottom panel of Fig. 7) for electrode tapping is relatively small despite the noise magnitude asserted by this activity (5th panel from the top in Fig. 5, top panel of Fig. 7, and boxplots of Fig. 9). This is because this activity has a better isolation from the shoulder where is recorded and thus from ECGNExT follows very closely.
Frequency analysis of and reveal similar outcomes. That is, the ECGNExT algorithm preserves the spectral components of the noise as shown by the equivalence testing. In addition, the power curves of Fig. 8 show that both samples from and exhibit very similar patterns. Similarly, the boxplots of the aggregate powers in Fig. 9 capture the overall conformity of the spectral powers of to that of . The slight, but consistent increase in powers of can again be attributed to inherent differences in recording locations where noises in also contribute based on (1). This can further be supported by noting the same increases in the no motion case.
The response of ECG analysis algorithms to motion noise presented in III.C highlights the effects of ECGs with different SNRs on the performance of the algorithm in question. This also outlines a possible application of the ECGNExT algorithm in an ECG analysis algorithm evaluation context. Based on the plots of Fig. 10, the performance of the two sample algorithms improves as the ECG SNR increases. While the improvement is not necessarily linear in SNR, the overall pattern shows an improving trend in algorithm performance as the SNR increases. Here, the ECGNExT algorithm facilitates a comparative study by making available realistic noise that is relevant to the device application, which helps derive application-relevant insights. For example, we can observe that the Pan-Tompkins algorithm is more robust to higher levels of noise than the Perry algorithm. Although the former presents higher false-detection rates, it reaches the false-detection rates of the latter at an SNR of 0 dB. This, combined with the better detection rate offered by the Pan-Tompkins algorithm, demonstrates better overall performance than the Perry algorithm under the same ECG noise conditions. On the other hand, the Perry algorithm tends to demonstrate better HR estimation performance, owing to its lower false-detection rates even under higher levels of noise.
One of the main advantages of the ECGNExT algorithm is that it estimates representative noise that can be collected under conditions similar to those in which an ECG device will be used. This allows for comprehensive tests without burdensome ECG collection from the target patient populations that an algorithm under test is intended for. This is feasible by using the ECGNExT algorithm to obtain realistic noise estimates from a healthy population, which can then be added to annotated reference ECGs containing pathologies that a device is expected to encounter in clinical use. In addition, this algorithm can extensively remove morphological characteristics of the ECG (e.g., T-wave effects) that the noise can carry over from the source ECG. This feature helps avoid the adverse effects that those ECG components may have on an algorithm being evaluated using noise estimates from the ECGNExT algorithm. Such adverse effects are prevented because the internal mechanism of the ECGNExT algorithm attenuates most of the morphological features related to physiological characteristics of the underlying noisy ECG.
One of the main limitations of this study is noise in the reference ECG, which could not be fully avoided. Our method extracts any noise from the noisy ECG, and the error quantification used in this paper assumes the reference ECG to be free from noise. However, any noise present in the reference ECG would be measured as error, even if the ECGNExT algorithm worked perfectly. This creates uncertainty in the evaluation of the results, as it is not possible to distinguish errors due to incorrect ECGNExT output from apparent errors due to noise in the reference ECG. As a mitigation effort, we excluded recordings that showed significant noise in the reference ECG and used the band-pass filtered reference ECG instead of the unfiltered one in (1), because the reference ECG in this equation needs to be ideally clean so that the estimated ECG with noise does not carry artifacts from it. That is, band-pass filtering brings the reference ECGs closer to ideally clean ECGs. While filtering significantly reduced the number of RMSE outliers, a few outliers remain. These are due to noise that remains in the reference ECG but does not appear in the recorded noisy ECG and, in some cases, to a filtering artifact that introduces an end-effect anomaly into one signal that is not present in the other two.
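The effect described in this paragraph can be shown with a small numerical sketch. It assumes, as in the Methods, that the estimated ECG with noise is the reference ECG plus the noise estimate and that the error is the RMSE between the recorded and estimated ECGs with noise. The function names and the toy sample values are hypothetical.

```python
import math


def rmse(a, b):
    """Root mean squared error between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def estimated_noisy_ecg(reference_ecg, noise_estimate):
    """Reference ECG plus noise estimate, sample by sample."""
    return [r + n for r, n in zip(reference_ecg, noise_estimate)]


# Toy values in microvolts (made up for illustration).
true_ecg = [0.0, 100.0, 0.0, -50.0]
motion_noise = [10.0, -20.0, 30.0, -40.0]
reference_noise = [3.0, -3.0, 3.0, -3.0]  # residual noise in the reference

recorded_noisy = [e + n for e, n in zip(true_ecg, motion_noise)]
reference_ecg = [e + n for e, n in zip(true_ecg, reference_noise)]

# Even with a perfect noise estimate, the reference noise appears as error.
perfect_estimate = motion_noise
error = rmse(recorded_noisy, estimated_noisy_ecg(reference_ecg, perfect_estimate))
```

In this toy case the RMSE equals the RMS amplitude of the residual reference noise (3 μV), even though the noise estimate itself is exact.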
Another limitation is that the motion artifacts introduced at the wrist ECG may not cover the full range of ECG noise scenarios. Because the focus of the study is on noise with an amplitude range similar to that of the QRS complex, so as to challenge a given algorithm within its detection limits, whole-body movements such as walking, running, or sitting-standing were not considered. Such movements would produce overwhelming levels of noise in which QRS characteristics could not be identified. Rather, the artifacts considered here are representative signal patterns that influence the ECG. The study was intentionally limited to these artifacts to avoid losing QRS complexes entirely through saturation and/or other noise amplitude effects that are beyond what ECG analysis algorithms can detect. In addition, the clean reference ECG needed as an evaluation baseline for this paper could not be maintained under whole-body motion conditions. The excluded forms of motion may be relevant for designing other studies that apply ECGNExT to evaluate a device under the motion conditions relevant to its use.
Finally, while the study was conducted on signals up to 60 s long, the method could be applied to longer signals using a windowing approach.
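One possible form of such a windowing approach is sketched below. The callable `extract_noise` is a hypothetical placeholder standing in for a noise extractor such as ECGNExT and does not reflect its actual interface. Consecutive non-overlapping windows are an assumption, and a practical implementation would likely need to handle discontinuities or end effects at window boundaries.

```python
def extract_noise_windowed(ecg, fs_hz, extract_noise, window_s=60.0):
    """Apply `extract_noise` to consecutive windows of a long ECG.

    ecg           : sequence of samples
    fs_hz         : sampling frequency in Hz
    extract_noise : hypothetical callable mapping a segment to a noise
                    estimate of the same length
    window_s      : window length in seconds (last window may be shorter)
    """
    window_len = int(round(window_s * fs_hz))
    if window_len <= 0:
        raise ValueError("window_s * fs_hz must be at least one sample")
    noise = []
    for start in range(0, len(ecg), window_len):
        segment = ecg[start:start + window_len]
        noise.extend(extract_noise(segment))
    return noise
```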
V. Conclusion
A performance evaluation of the ECGNExT algorithm was presented through comparative analyses of simultaneously recorded ECGs with and without noise. The analyses compared both temporal and spectral characteristics of a noisy ECG with those of a clean ECG contaminated with noise estimates obtained from the same noisy ECG using the ECGNExT algorithm. The results indicate that the ECGNExT algorithm can estimate noise with sufficient similarity to the actual underlying noise in an ECG, enabling an effective and modernized approach to evaluating the robustness of ECG analysis algorithms to noise. A realistic noise estimate obtained using the ECGNExT algorithm supports efficient, clinically relevant test methods, as illustrated by the sample comparative study of QRS detection performance.
Appendix
The table below provides the remaining results of the spectral analysis of the recorded and estimated ECGs with noise that were not included in Table III. These results are for the remaining scenarios examined and described in II.C.
| No. | ECG | QRS | Frequency band | Power mean × 10⁵ (μV²) | Power mean × 10⁵ (μV²) | Power STD × 10⁵ (μV²) | Power STD × 10⁵ (μV²) | CI × 10⁵ (μV²) | EI × 10⁵ (μV²) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | N | 0.5 – 5 Hz | 0.8134 | 0.9683 | 3.9739 | 3.8414 | [−1.242, 0.932] | [−1.4070, 1.4070] |
| 2 | | N | 5 – 25 Hz | 0.2099 | 0.2178 | 1.0436 | 1.0377 | [−0.297, 0.281] | [−0.3746, 0.3746] |
| 3 | | N | 25 – 40 Hz | 0.0127 | 0.0136 | 0.0507 | 0.0508 | [−0.015, 0.013] | [−0.0183, 0.0183] |
| 4 | | Y | 0.5 – 5 Hz | 0.8814 | 0.9546 | 3.9565 | 3.8746 | [−1.162, 1.016] | [−1.4097, 1.4097] |
| 5 | | Y | 5 – 25 Hz | 0.4133 | 0.4731 | 1.1114 | 1.0470 | [−0.360, 0.240] | [−0.3887, 0.3887] |
| 6 | | Y | 25 – 40 Hz | 0.0256 | 0.0286 | 0.0571 | 0.0509 | [−0.018, 0.012] | [−0.0195, 0.0195] |
| 7 | | N | 0.5 – 5 Hz | 0.8134 | 0.8713 | 3.9739 | 3.8260 | [−1.143, 1.027] | [−1.4042, 1.4042] |
| 8 | | N | 5 – 25 Hz | 0.2099 | 0.2178 | 1.0436 | 1.0377 | [−0.297, 0.281] | [−0.3746, 0.3746] |
| 9 | | N | 25 – 40 Hz | 0.0127 | 0.0134 | 0.0507 | 0.0507 | [−0.015, 0.013] | [−0.0182, 0.0182] |
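As a reading aid for the CI and EI columns, the sketch below checks whether each confidence interval lies entirely within its equivalence interval. It assumes that equivalence is concluded under this inclusion rule, which is the usual confidence-interval form of an equivalence test; the function name `ci_within_ei` is hypothetical, and the numbers are copied from the table above.

```python
def ci_within_ei(ci, ei):
    """Return True when the confidence interval lies strictly inside
    the equivalence interval (assumed decision rule)."""
    ci_low, ci_high = ci
    ei_low, ei_high = ei
    return ei_low < ci_low and ci_high < ei_high


# (CI, EI) pairs for rows 1-9 of the table, in units of 1e5 uV^2.
TABLE_ROWS = {
    1: ((-1.242, 0.932), (-1.4070, 1.4070)),
    2: ((-0.297, 0.281), (-0.3746, 0.3746)),
    3: ((-0.015, 0.013), (-0.0183, 0.0183)),
    4: ((-1.162, 1.016), (-1.4097, 1.4097)),
    5: ((-0.360, 0.240), (-0.3887, 0.3887)),
    6: ((-0.018, 0.012), (-0.0195, 0.0195)),
    7: ((-1.143, 1.027), (-1.4042, 1.4042)),
    8: ((-0.297, 0.281), (-0.3746, 0.3746)),
    9: ((-0.015, 0.013), (-0.0182, 0.0182)),
}

results = {no: ci_within_ei(ci, ei) for no, (ci, ei) in TABLE_ROWS.items()}
```

Under this assumed rule, every row of the table passes the check, since each CI falls inside the corresponding EI.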
Footnotes
Disclaimer
The mention of commercial products, their sources, or their use in connection with material reported herein is not to be construed as either an actual or implied endorsement of such products by the Department of Health and Human Services. This article reflects the views of the author and should not be construed to represent FDA’s views or policies.
References
- [1] Hannun AY et al., “Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network,” Nature Medicine, vol. 25, no. 1, pp. 65–69, 2019.
- [2] Turakhia MP et al., “Diagnostic utility of a novel leadless arrhythmia monitoring device,” The American Journal of Cardiology, vol. 112, no. 4, pp. 520–524, 2013.
- [3] Turnbull S et al., “Utility of a Handheld, Single-Lead ECG Device for Diagnosis of Cardiac Arrhythmias,” Journal of the American College of Cardiology, vol. 81, no. 23, pp. 2292–2294, 2023.
- [4] Haverkamp HT, Fosse SO, and Schuster P, “Accuracy and usability of single-lead ECG from smartphones - A clinical study,” Indian Pacing and Electrophysiology Journal, vol. 19, no. 4, pp. 145–149, 2019, doi: 10.1016/j.ipej.2019.02.006.
- [5] Bansal A and Joshi R, “Portable out-of-hospital electrocardiography: A review of current technologies,” Journal of Arrhythmia, vol. 34, no. 2, pp. 129–138, 2018, doi: 10.1002/joa3.12035.
- [6] Gambarotta N, Aletti F, Baselli G, and Ferrario M, “A review of methods for the signal quality assessment to improve reliability of heart rate and blood pressures derived parameters,” Medical & Biological Engineering & Computing, vol. 54, no. 7, pp. 1025–1035, 2016, doi: 10.1007/s11517-016-1453-5.
- [7] Liu Y, Chen J, Bao N, Gupta BB, and Lv Z, “Survey on atrial fibrillation detection from a single-lead ECG wave for Internet of Medical Things,” Computer Communications, vol. 178, pp. 245–258, 2021, doi: 10.1016/j.comcom.2021.08.002.
- [8] Tanantong T, Nantajeewarawat E, and Thiemjarus S, “Toward continuous ambulatory monitoring using a wearable and wireless ECG-recording system: A study on the effects of signal quality on arrhythmia detection,” Bio-Medical Materials and Engineering, vol. 24, no. 1, pp. 391–404, 2014.
- [9] Drew BJ et al., “Insights into the problem of alarm fatigue with physiologic monitor devices: a comprehensive observational study of consecutive intensive care unit patients,” PLoS ONE, vol. 9, no. 10, p. e110274, 2014.
- [10] Lanatà A, Scilingo EP, Nardini E, Loriga G, Paradiso R, and De-Rossi D, “Comparative evaluation of susceptibility to motion artifact in different wearable systems for monitoring respiratory rate,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 2, pp. 378–386, 2009.
- [11] Shi F, “A review of noise removal techniques in ECG signals,” in 2022 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS), 2022: IEEE, pp. 237–240.
- [12] Li BM et al., “Influence of armband form factors on wearable ECG monitoring performance,” IEEE Sensors Journal, vol. 21, no. 9, pp. 11046–11060, 2021.
- [13] Satija U, Ramkumar B, and Manikandan MS, “A new automated signal quality-aware ECG beat classification method for unsupervised ECG diagnosis environments,” IEEE Sensors Journal, vol. 19, no. 1, pp. 277–286, 2018.
- [14] Tobon DP, Jayaraman S, and Falk TH, “Spectro-temporal electrocardiogram analysis for noise-robust heart rate and heart rate variability measurement,” IEEE Journal of Translational Engineering in Health and Medicine, vol. 5, pp. 1–11, 2017.
- [15] ANSI/AAMI IEC, IEC 60601-2-47: 2012 Medical electrical equipment - Part 2-47: Particular requirements for the basic safety and essential performance of ambulatory electrocardiographic systems, 2012.
- [16] ANSI/AAMI, EC57, Testing and reporting performance results of cardiac rhythm and ST segment measurement algorithms, 2012.
- [17] Mark RG and Moody GB, “Evaluation of Automated Arrhythmia Monitors Using an Annotated ECG Database,” in Ambulatory Monitoring: Cardiovascular system and allied applications. Proceedings of a workshop held in Pisa, April 11–12, 1983. Sponsored by the Commission of the European Communities, as advised by the Committee on Medical and Public Health Research, Marchesi C, Ed. Dordrecht: Springer Netherlands, 1984, pp. 339–357.
- [18] Moody GB and Mark RG, “The impact of the MIT-BIH arrhythmia database,” IEEE Engineering in Medicine and Biology Magazine, vol. 20, no. 3, pp. 45–50, 2001.
- [19] Moody GB and Mark RG, “How can we predict real-world performance of an arrhythmia detector,” Computers in Cardiology, vol. 10, pp. 71–76, 1983.
- [20] Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, … & Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. [Online]. Available: https://physionet.org/content/nstdb/1.0.0/
- [21] Moody GB, Muldrow WE, and Mark RG, “A noise stress test for arrhythmia detectors,” presented at the Computers in Cardiology, 1984.
- [22] Funston R et al., “Comparative study of a single lead ECG in a wearable device,” Journal of Electrocardiology, vol. 74, pp. 88–93, 2022.
- [23] Wang X et al., “Flexible non-contact electrodes for wearable biosensors system on electrocardiogram monitoring in motion,” Frontiers in Neuroscience, vol. 16, p. 900146, 2022.
- [24] Galeotti L and Scully CG, “A method to extract realistic artifacts from electrocardiogram recordings for robust algorithm testing,” Journal of Electrocardiology, vol. 51, no. 6, Supplement, pp. S56–S60, 2018, doi: 10.1016/j.jelectrocard.2018.08.023.
- [25] Suliman A et al. (2023, January). ECG Noise Extraction Tool. ECGNExT. [Online]. Available: https://github.com/dbp-osel/ECGNExT
- [26] Pan J and Tompkins WJ, “A real-time QRS detection algorithm,” IEEE Transactions on Biomedical Engineering, no. 3, pp. 230–236, 1985.
- [27] Welch P, “The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms,” IEEE Transactions on Audio and Electroacoustics, vol. 15, no. 2, pp. 70–73, 1967.
- [28] Daluwatte C, Johannesen L, Galeotti L, Vicente J, Strauss D, and Scully C, “Assessing ECG signal quality indices to discriminate ECGs with artefacts from pathologically different arrhythmic ECGs,” Physiological Measurement, vol. 37, no. 8, p. 1370, 2016.
- [29] Di Marco LY et al., “Evaluation of an algorithm based on single-condition decision rules for binary classification of 12-lead ambulatory ECG recording quality,” Physiological Measurement, vol. 33, no. 9, p. 1435, 2012.
- [30] Anisha (2023). TOST(sample1, sample2, d1, d2, alpha). [Online]. Available: https://www.mathworks.com/matlabcentral/fileexchange/63204-tost-sample1-sample2-d1-d2-alpha
- [31] Wellek S, Testing statistical hypotheses of equivalence. Chapman and Hall/CRC, 2002.
- [32] Pery (2006). QRS Onset Detector.
- [33] Moody G, Pollard T, and Moody B, “WFDB software package (version 10.7.0),” PhysioNet, 2022.
