PLOS Digital Health. 2024 Aug 13;3(8):e0000538. doi: 10.1371/journal.pdig.0000538

QRS detection in single-lead, telehealth electrocardiogram signals: Benchmarking open-source algorithms

Florian Kristof 1, Maximilian Kapsecker 1,2, Leon Nissen 2, James Brimicombe 3, Martin R Cowie 4, Zixuan Ding 3, Andrew Dymond 3, Stephan M Jonas 2, Hannah Clair Lindén 5, Gregory Y H Lip 6,7, Kate Williams 3, Jonathan Mant 3, Peter H Charlton 3,*; on behalf of the SAFER Investigators
Editor: Calvin Or 8
PMCID: PMC7617317  EMSID: EMS202110  PMID: 39137171

Abstract

Background and objectives

A key step in electrocardiogram (ECG) analysis is the detection of QRS complexes, particularly for arrhythmia detection. Telehealth ECGs present a new challenge for automated analysis as they are noisier than traditional clinical ECGs. The aim of this study was to identify the best-performing open-source QRS detector for use with telehealth ECGs.

Methods

The performance of 18 open-source QRS detectors was assessed on six datasets. These included four datasets of ECGs collected under supervision, and two datasets of telehealth ECGs collected without clinical supervision. The telehealth ECGs, consisting of single-lead ECGs recorded between the hands, included a novel dataset of 479 ECGs collected in the SAFER study of screening for atrial fibrillation (AF). Performance was assessed against manual annotations.

Results

A total of 12 QRS detectors performed well on ECGs collected under clinical supervision (F1 score ≥0.96). However, fewer performed well on telehealth ECGs: five performed well on the TELE ECG Database; six performed well on high-quality SAFER data; and performance was poorer on low-quality SAFER data (three QRS detectors achieved F1 of 0.78-0.84). The presence of AF had little impact on performance.

Conclusions

The Neurokit and University of New South Wales QRS detectors performed best in this study. These performed sufficiently well on high-quality telehealth ECGs, but not on low-quality ECGs. This demonstrates the need to handle low-quality ECGs appropriately to ensure only ECGs which can be accurately analysed are used for clinical decision making.

Author summary

The electrocardiogram (ECG) is a vital tool for assessing heart health. Traditionally, ECGs are recorded in clinical settings, but with advances in technology, mobile devices and smartwatches can now be used to record ECGs in daily life. However, ECG recordings from these devices often contain more noise, posing challenges for accurate analysis. In this study, we evaluated 18 different algorithms for detecting heartbeats in ECGs. Our aim was to identify the best-performing algorithm for use with ECGs recorded using mobile devices. We tested each algorithm on 995 ECG recordings and compared their performance against manually-annotated heartbeats. From our analysis, we identified the two best-performing algorithms. These algorithms performed well when analysing high-quality ECGs obtained under clinical supervision and from mobile devices. However, their performance degraded significantly when analysing noisy ECGs from mobile devices. These findings highlight the importance of selecting robust algorithms for ECG analysis, particularly for data collected outside clinical environments. Furthermore, the study demonstrates the need to ensure that only ECGs which can be accurately analysed are used for clinical decision making.

Introduction

The electrocardiogram (ECG) is one of the most widely used physiological measurement techniques, providing detailed information on heart function. Traditionally, ECG measurements have been confined to clinical settings. However, it has recently become possible to measure the ECG in telehealth settings using handheld devices or smartwatches [1, 2]. This presents the opportunity to conduct health assessments beyond the clinical setting, with potential applications including remote health monitoring, personalized diagnosis, rehabilitation, and screening for atrial fibrillation (AF). Indeed, the recent COVID-19 pandemic has acted as a strong catalyst for innovation in this area [3]. However, the increasing use of wearable and telehealth technologies also presents new challenges.

A key challenge is that telehealth ECGs can be of lower quality than those collected in clinical settings, and so ECG analysis algorithms must be able to handle increased noise levels. Telehealth ECGs can be of lower quality for several reasons [4]: the ECG is often measured further away from the heart (such as at the hands rather than the chest); devices typically use dry electrodes rather than the more conductive adhesive electrodes; and there is less quality control since measurements are taken by a non-expert user without clinical supervision. Therefore, there is a need to understand how well ECG analysis algorithms perform in the telehealth setting.

QRS detection is a fundamental task in ECG analysis. QRS complexes indicate ventricular depolarisation, i.e. the electrical impulse which causes the heart to pump blood into the circulation. QRS detection is widely used for heart rate and rhythm monitoring, and heart rate variability analysis. Furthermore, QRS detection is frequently the first step towards extraction of more detailed ECG features such as QT intervals and P-waves. A range of QRS detection algorithms have been proposed [5, 6], most of which were developed using ECGs collected in clinical settings ([4] being a notable exception). Therefore, there is a need to assess their performance with telehealth ECGs. QRS detectors should firstly be accurate, correctly identifying QRS complexes. They should ideally remain accurate in the presence of pathologies such as AF (which results in an irregular heart rhythm), and in the presence of noise. In addition, QRS detection algorithms should be stable, with low execution times, to ensure they are suitable for rapid and long-term analyses.

Previous studies have compared the performance of QRS detection algorithms across databases recorded in different settings. Liu et al. assessed ten QRS detectors across five datasets including one telehealth dataset [5]. The algorithms, chosen for their computational efficiency, achieved F1 scores of >99% on high-quality signals, ≤80% on low-quality signals, and ≥94% during pacing and in the presence of arrhythmias. The study concluded that an optimized knowledge-based algorithm [7] performed best. Llamedo and Martinez assessed six QRS detectors on 12 databases covering five categories: normal sinus rhythm, arrhythmia, ST and T morphology changes, stress, and long-term monitoring [6]. The study concluded that the gqrs algorithm performed best. Research in [8] assessed 12 QRS detectors across five publicly available datasets. The study concluded that the neurokit (nk) algorithm performed best when considering both accuracy and execution time. Previous work in this area addresses established algorithms and benchmark ECG databases, but there is a lack of knowledge about the latest algorithms and their application to new telehealth databases, especially their performance on self-recorded ECGs.

The aim of this study was to identify the best-performing open-source QRS detector for use with telehealth ECGs. The performance of 18 algorithms was assessed on multiple datasets including a novel dataset collected using handheld devices during screening for AF. Performance was assessed primarily in terms of the accuracy of QRS detection (quantified using the F1 score), and also in terms of the execution time and error rate of algorithms. The findings address the gap in knowledge about how well QRS detection algorithms perform in telehealth settings. They are particularly relevant given the rapid introduction of single-lead ECG technology in consumer devices such as smartwatches, and clinical devices such as handheld ECG recorders.

Methods

QRS detection algorithms

The 18 QRS detectors assessed in this study are summarised in Table 1 (with source links provided in Table A in S1 Text). The QRS detectors were identified through a search for open-source algorithms. The majority of algorithms were found in either the ‘NeuroKit’ [8] or ‘ecgdetector’ [9] Python packages. Some algorithms were available in both packages with slightly different implementations, in which case the faster implementation was used. Python implementations were used where available to provide a fair comparison of algorithm execution times. In four cases Python implementations were not available and Matlab implementations were used instead (jqrs, rdeco, rpeak, and unsw). Six additional algorithms were identified but not used in this study for one of the following reasons: (i) no Python or Matlab implementation was available; (ii) the available implementation only accepted particular sampling frequencies; (iii) the available implementation predominantly led to errors; or (iv) the execution time was substantially longer than that of other algorithms. Further details are provided in Table B in S1 Text.

Table 1. Datasets.

Description | No. beats | No. recordings | Recording duration (min) | Total time (min) | Sampling frequency (Hz) | Source
Supervised ECG recordings
SIN: Recordings from patients without arrhythmias: excerpts from long-term recordings collected at an Arrhythmia Laboratory. | 185,253 | 18 | 120 | 2,160 | 500 | MIT-BIH NSR database [10]
ARR: Recordings from patients with and without arrhythmias: excerpts from 24-hour ambulatory recordings collected at an Arrhythmia Laboratory. | 112,599 | 48 | 30 | 1,440 | 360 | MIT-BIH arrhythmia database [10, 11]
HIGH: High-quality recordings from patients and healthy volunteers: collected from multimodal devices such as bedside monitors. | 72,315 | 100 | 10 | 1,000 | 250 | 2014 PhysioNet/CinC challenge training set [10, 12]
LOW: Low-quality recordings from patients and healthy volunteers: collected from multimodal devices such as bedside monitors. | 78,518 | 100 | 10 | 1,000 | 360 | 2014 PhysioNet/CinC challenge augmented training set [10, 12]
Unsupervised, telehealth ECG recordings
TELE: Single-lead, telehealth ECGs from home-dwelling patients: collected by patients without supervision using a device which records the ECG from the hands. | 5,932 | 250 | 0.50 | 125 | 500 | Harvard dataverse TELE database [4]
SAFER: Single-lead, telehealth ECGs from home-dwelling AF screening participants: collected by participants without supervision using a handheld device. Split into subsets according to presence of AF (AF or non-AF) and ECG quality (HIGH or LOW): | 18,279 | 479 | 0.50 | 239.5 | 500 | SAFER Feasibility Study (private) [13]
- SAFER-AF-HIGH | 8,456 | 183 | 0.50 | 91.5 | 500
- SAFER-nonAF-HIGH | 7,065 | 199 | 0.50 | 99.5 | 500
- SAFER-nonAF-LOW | 2,758 | 97 | 0.50 | 48.5 | 500

Datasets

The performance of QRS detectors was assessed using six datasets, including datasets collected in inpatient, outpatient, and home settings. The datasets are summarised in Table 2 and described in the following paragraphs. Full source links for the datasets are provided in Table C in S1 Text.

Table 2. QRS detection algorithms.

Abbreviation Description Reference(s)
christ christov: Detect QRS complexes as points exceeding an adaptive threshold consisting of the sum of: (i) steep-slope threshold (a linear reduction from 200ms to 1200ms 200ms after a QRS); (ii) adaptive integrating threshold (increases in the presence of electromyogram noise); and (iii) adaptive beat expectation threshold (a linear reduction between 2/3 and 1 mean RR-interval after a QRS). [14]
engz engzee, sqrs: Detection of QRS complexes as points exceeding a threshold based on a filtered signal: (i) removal of baseline wandering; (ii) application of a threshold to detect R-peaks; (iii) application of a refractory period to prevent multiple detection of a single R-peak. [15, 16]
fnvg FastNVG: A natural visibility graph (NVG) based R-peak detector: (i) representing the ECG signal as a graph using NVG; (ii) calculation of a node metric in the graph domain for weighting the signal to emphasize R-peak positions; (iii) applying thresholding as with pan-tomp. [17, 18]
fwhvg FastWHVG: An R-peak detector based on the horizontal visibility graph (HVG): The method is similar to the fnvg algorithm, but is computationally more efficient since the HVG is a subset of the NVG. [17, 18]
gamb gamboa: Detection using amplitude histogram and critical points: (i) signal normalization using the amplitude histogram; (ii) detection of critical points in the first derivative exceeding a threshold; (iii) elimination of false beats through constraints on detected ECG signal beats; (iv) Computation of the mean ECG wave to obtain QPRS features. [19]
gqrs gqrs: The ECG beat detection algorithm initiates with: (i) employing a trapezoid low-pass filter to the signal, followed by a QRS matched filter convolution. (ii) The parameters of recent intervals and peak thresholds are adjusted without recording QRS locations. (iii) Sample detection occurs, identifying larger samples and peaks that surpass the QRS threshold, thus marking them as QRS complexes. If no peak is detected, the system lowers the peak detection threshold. (iv) Primary and secondary peak identification differentiates between peak types based on neighborhood size and relevance to a previous primary peak or associated T-wave. [10]
hamilt hamilton, eplimited: Detect QRS complexes using filtering, differentiation, rectification, and a moving window method: (i) applying low-pass and high-pass filtering to the signal; (ii) calculating the signal’s derivative; (iii) rectifying the signal and utilizing a moving window of 80 ms; (iv) detecting QRS complexes following a predefined rule set. [20, 21]
jqrs jqrs: QRS detection enhanced by sliding window and custom filter: (i) window-based peak energy detector; (ii) band-pass filter with QRS matched filter (Mexican hat); (iii) reject detections based on heuristic during flat lines; (iv) search-back procedure for suspected missed beats. [22–24]
kali Kalidas and Tamil, Stationary Wavelet Transform (swt): Peak detection using Stationary Wavelet Transform: (i) resample signal to 80 Hz for real-time processing; (ii) compute 2-level SWT using ‘db3’ wavelet; (iii) squaring and moving-window averaging (MWA) to enhance QRS peaks; (iv) threshold-based peak detection; (v) detect missed beats based on RR intervals; (vi) determine actual R-peak location within 0.10 seconds. [8, 25]
mart martinez, wavedet, Continuous Wavelet Transform (CWT): The algorithm functions by executing a continuous wavelet transformation of the ECG signal across five distinct scales: (i) Each scale calculates a standard deviation, epsilon, from the transformed signal and peaks exceeding this epsilon are identified. (ii) The algorithm then filters these identified peaks across each scale, keeping only those closely associated with preceding scale peaks. (iii) It locates R-peaks by pinpointing zero-crossings in scale one within a specified range. [26]
nab nabian: Usage of sliding window with adaptive thresholds and domain knowledge to detect PQRST points: (i) Filter using Elliptic, Gaussian, or Butterworth (default: Elliptic); (ii) Detect potential R-peaks using global maxima in sliding window; (iii) Eliminate R-peaks below amplitude threshold; (iv) Find missing R-peaks using R-R interval; (v) Detect PQST points using predefined R-based locations. [27]
nk neurokit: Usage of signal smoothing and gradients to detect QRS complexes: (i) Computing the gradient and average gradient threshold of the highpass-filtered raw ECG signal. (ii) Identifying the start and end of QRS complexes by comparing the signal’s smoothed gradient with the gradient threshold. (iii) Ignoring QRS complexes that are too short by setting a minimum length. (iv) Identifying R-peaks within each QRS. (v) Ensuring peaks identified are not too close together by enforcing a minimum delay.
pan-tomp pan tompkins: Filtering of the signal to segment the QRS complex: (i) low-pass and high-pass filtering; (ii) derivative of the signal; (iii) squaring of the derivative to amplify the QRS complex; (iv) adaptive thresholding with a refractory period using a moving window approach. [28]
rdeco r-deco: QRS detection using an envelope-based method: (i) using the difference between the lower and upper envelopes to flatten the signal; (ii) limiting the search range by considering segments whose value is higher than the value 80 ms later and whose upward slope lasts longer than 80 ms; (iii) selecting the segments of maximal value; (iv) defining the R-peaks using the Pan-Tompkins adaptive thresholding method; (v) eliminating false detections by performing a 50 ms backward search for each peak. [29]
rpeak rpeakdetect: Periodic adjustment of thresholds and parameters to detect QRS complexes using sensitivity-appropriate filtering: (i) cascaded low-pass and high-pass filtering to reduce signal noise; (ii) approximation of a derivative and application of an amplitude squaring operation; (iii) use of a moving window integrator with adaptive thresholds to segment the locations of QRS complexes. [21, 28]
two-avg two average, elgendi: Application of statistical thresholds and moving averages to generate blocks of interest: (i) bandpass filtering; (ii) integration of moving averages to generate blocks of interest; (iii) rejection of blocks smaller than the QRS complex length of the healthy adult and detection of the R peak as the maximum value of the remaining blocks. [30]
unsw unsw: Application of an adaptive threshold to a feature enriched signal: (i) filtering the signal with detrending, median filtering, and bandpass filters; (ii) calculating the QRS feature using the differentiated and filtered signal; (iii) smoothing the QRS features signal frequencies using the fundamental frequency as the lower bound; (iv) Adaptive threshold calculation using windows of different lengths on the filtered QRS feature signal; (v) Identification of possible QRS complexes using a peak-through detector on the filtered QRS feature signal; (vi) Rejection of erroneous QRS complexes. [4]
wqrs wqrs: Transformation to a curve-length signal to apply an adaptive threshold: (i) low-pass filtering; (ii) non-linear scaling of the signal to amplify the QRS complex and reduce noise; (iii) an adaptive threshold reveals the onset and duration of the QRS complex. [31]
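To make the step-by-step descriptions above more concrete, the following is a minimal, dependency-free Python sketch loosely modelled on the nk description (gradient, smoothed gradient compared with an averaged-gradient threshold, minimum QRS length, R-peak within each QRS, minimum delay between peaks). It is not the NeuroKit implementation: the high-pass filter is omitted, the R-peak is simply taken as the sample of maximum amplitude within each candidate QRS, and the function names and parameter values (0.1 s smoothing window, 0.75 s averaging window, threshold weight 1.5, minimum-length weight 0.4, 0.3 s minimum delay) are illustrative assumptions.

```python
def moving_average(x, width):
    """Centred moving average over 2*(width//2)+1 samples, truncated at the edges."""
    half = width // 2
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)  # cumulative sums give O(n) averaging
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def detect_qrs_gradient(ecg, fs, smooth_s=0.1, avg_s=0.75, grad_weight=1.5,
                        min_len_weight=0.4, min_delay_s=0.3):
    """Hypothetical gradient-threshold QRS detector; returns R-peak sample indices."""
    # (i) absolute gradient of the raw signal (high-pass filtering omitted here)
    grad = [0.0] + [abs(ecg[i] - ecg[i - 1]) for i in range(1, len(ecg))]
    smooth = moving_average(grad, max(1, round(smooth_s * fs)))
    avg = moving_average(smooth, max(1, round(avg_s * fs)))
    thresh = [grad_weight * v for v in avg]

    # (ii) candidate QRS = contiguous run where smoothed gradient exceeds threshold
    segments, start = [], None
    for i, (s, t) in enumerate(zip(smooth, thresh)):
        if s > t and start is None:
            start = i
        elif s <= t and start is not None:
            segments.append((start, i))
            start = None
    if not segments:
        return []

    # (iii) discard candidates shorter than a fraction of the mean candidate length
    mean_len = sum(e - s for s, e in segments) / len(segments)
    segments = [(s, e) for s, e in segments if e - s >= min_len_weight * mean_len]

    # (iv) R-peak = maximum-amplitude sample within each candidate QRS
    # (v) keep a peak only if it is more than min_delay after the previous peak
    min_delay = round(min_delay_s * fs)
    peaks = []
    for s, e in segments:
        peak = max(range(s, e), key=lambda i: ecg[i])
        if not peaks or peak - peaks[-1] > min_delay:
            peaks.append(peak)
    return peaks
```

The sketch is intended only to show how the described steps fit together; the published implementations in the NeuroKit and ecgdetector packages differ in detail and should be preferred for real analyses.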

MIT-BIH Normal Sinus Rhythm Database (SIN)

The MIT-BIH Normal Sinus Rhythm Database (SIN) contains 18 24-hour ECG recordings from patients referred to the Arrhythmia Laboratory at Boston’s Beth Israel Hospital, who were found not to have significant arrhythmias [10]. The first two hours of each recording were used in this study. The subjects consisted of 13 women and 5 men, aged 20 to 50. Each recording contains two ECG channels of unknown leads, the first of which was used in this analysis.

MIT-BIH Arrhythmia Database (ARR)

The MIT-BIH Arrhythmia Database (ARR) contains 48 30-minute ECG recordings from 47 patients referred to the same Arrhythmia Laboratory [10, 11]. This dataset consists of 23 recordings which were selected at random from a larger dataset and a further 25 recordings which were manually selected to include examples of significant but uncommon arrhythmias. The subjects included 22 women and 25 men aged 23 to 89. The first ECG channel in each recording was analysed, which was the modified limb lead II in most cases.

PhysioNet/Computing in Cardiology Challenge 2014 Datasets (HIGH and LOW)

The PhysioNet/Computing in Cardiology Challenge 2014 datasets consist of 10-minute ECG recordings from patients and healthy volunteers [10, 12]. The two publicly available datasets were used in this study: (i) the Training Set (HIGH), which contains 100 recordings which are generally of high quality; and (ii) the Augmented Training Set (LOW), which contains 100 recordings that are generally of low quality. Each record in these datasets contains a single ECG lead. The LOW dataset contains the following leads: lead II (78 records), lead III (5), lead aVF (3), lead aVL (1), and no lead label (13). No lead labels are provided in the HIGH dataset.

TELE ECG database (TELE)

The TELE ECG Database contains 250 30-second lead-I ECG recordings from home-dwelling patients suffering from chronic obstructive pulmonary disease and/or congestive heart failure [4, 32]. Recordings were acquired without clinical supervision using the TeleMedCare Health Monitor (TeleMedCare Pty. Ltd. Sydney, Australia). The device records an ECG from the hands using dry metal electrodes. This dataset contains 221 ECGs randomly selected from 120 patients, and an additional 29 ECGs specifically selected to represent poor-quality data. The dataset contains manual annotations of QRS complexes. One ECG in the dataset lasted longer than 30s, and was truncated to 30s for this study.

SAFER ECG dataset (SAFER)

The SAFER ECG Dataset contains 479 30-second lead-I ECG recordings from home-dwelling subjects aged 65 and over, collected in an AF screening study (the SAFER Feasibility Study, ISRCTN 16939438) [13].

ECG recordings were acquired without clinical supervision using the Zenicor EKG-2 device shown in Fig 1 (Zenicor Medical Systems AB, Sweden). The device records an ECG from the thumbs using dry metal electrodes. This dataset contains: 183 high-quality ECGs exhibiting AF (denoted SAFER-AF-HIGH) collected from 48 participants (13 female and 35 male); 199 high-quality ECGs from participants without AF (SAFER-nonAF-HIGH) collected from 199 participants (100 female and 99 male); and 97 low-quality ECGs from participants without AF (SAFER-nonAF-LOW) collected from 97 participants (49 female and 48 male). ECG quality was assessed using the Cardiolund ECG Parser algorithm (Cardiolund AB). R-peaks were manually annotated specifically for this study.

The presence of AF was determined as described in [13]: (i) the Cardiolund algorithm was used to identify ECGs with potential abnormalities; and (ii) expert reviewers manually reviewed ECGs to identify AF (as described in [13, 33]). In more detail, ECGs were classified as AF or non-AF based on ad hoc review by two cardiologists. An ECG was classified as AF if either: (i) both cardiologists agreed that the ECG contained AF; or (ii) one cardiologist made an AF diagnosis and the other provided no diagnosis. An ECG was classified as non-AF if either: (i) the Cardiolund algorithm did not identify abnormalities in the ECG, the cardiologists did not identify an arrhythmia, and the participant was not diagnosed with AF; or (ii) both cardiologists agreed that the ECG did not contain an arrhythmia.
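The AF and non-AF labelling rules described above can be expressed as a small decision function. This is a hypothetical sketch: the `EcgReview` fields and the diagnosis labels (`"AF"`, `"no_arrhythmia"`, `"other_arrhythmia"`, or `None` for no diagnosis) are invented for illustration and do not reflect the SAFER study's data format, and returning `None` for ECGs that meet neither rule (i.e. treating them as excluded from both subsets) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EcgReview:
    # Hypothetical fields, not the SAFER study's data schema.
    algorithm_flagged: bool         # Cardiolund algorithm flagged a potential abnormality
    cardiologist_1: Optional[str]   # "AF", "no_arrhythmia", "other_arrhythmia", or None
    cardiologist_2: Optional[str]
    participant_diagnosed_af: bool


def classify_af(review: EcgReview) -> Optional[str]:
    """Return "AF", "non-AF", or None if neither rule applies (assumed excluded)."""
    diagnoses = [review.cardiologist_1, review.cardiologist_2]
    n_af = diagnoses.count("AF")

    # AF: both cardiologists diagnose AF, or one does and the other gives no diagnosis
    if n_af == 2 or (n_af == 1 and None in diagnoses):
        return "AF"

    # non-AF rule (i): no algorithm flag, no arrhythmia identified, no AF diagnosis
    no_arrhythmia_found = all(d in (None, "no_arrhythmia") for d in diagnoses)
    if (not review.algorithm_flagged and no_arrhythmia_found
            and not review.participant_diagnosed_af):
        return "non-AF"

    # non-AF rule (ii): both cardiologists agree there is no arrhythmia
    if diagnoses.count("no_arrhythmia") == 2:
        return "non-AF"

    return None
```

Note that a disagreement (one cardiologist diagnosing AF and the other explicitly finding no arrhythmia) satisfies neither rule in this sketch.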

Fig 1. Zenicor-EKG device.

The handheld Zenicor-EKG device used to record 30-second ECGs in the SAFER ECG Dataset.

Ethics statement

The SAFER Feasibility Study in which the SAFER ECG dataset was acquired was approved by the London Central NHS Research Ethics Committee (18/LO/2066). All participants gave written informed consent to participate in the study. The study was conducted in accordance with the Declaration of Helsinki. Ethical approval was not required for the use of the remaining datasets as these were pre-existing, anonymised datasets.

Statistical analysis

The performance of QRS detectors was primarily assessed using the F1 score (following a precedent in [5, 34]). The F1 score is the harmonic mean of the sensitivity (SEN) and positive predictive value (PPV). These three statistics were calculated from: the number of reference QRS complex annotations ($n_{\mathrm{ref}}$, corresponding to the number of actual positives, P); the number of QRS complexes identified by an algorithm ($n_{\mathrm{alg}}$, corresponding to the number of predicted positives, i.e. true positives + false positives, TP + FP); and the number of QRS complexes which were correctly identified ($n_{\mathrm{correct}}$, corresponding to the number of true positives, TP).

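$$\mathrm{SEN}\,(\%) = \frac{TP}{P} \times 100 = \frac{n_{\mathrm{correct}}}{n_{\mathrm{ref}}} \times 100 \tag{1}$$
$$\mathrm{PPV}\,(\%) = \frac{TP}{TP + FP} \times 100 = \frac{n_{\mathrm{correct}}}{n_{\mathrm{alg}}} \times 100 \tag{2}$$
$$\mathrm{F1}\,(\%) = \frac{2 \times \mathrm{PPV} \times \mathrm{SEN}}{\mathrm{PPV} + \mathrm{SEN}} \tag{3}$$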
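As an illustrative check using hypothetical counts (not results from this study), substituting the count-based forms of SEN and PPV, expressed as fractions, into the harmonic mean shows that the F1 score depends only on the three counts:

```latex
\mathrm{F1}
  = \frac{2 \cdot \frac{n_{\mathrm{correct}}}{n_{\mathrm{alg}}} \cdot \frac{n_{\mathrm{correct}}}{n_{\mathrm{ref}}}}
         {\frac{n_{\mathrm{correct}}}{n_{\mathrm{alg}}} + \frac{n_{\mathrm{correct}}}{n_{\mathrm{ref}}}}
  = \frac{2\, n_{\mathrm{correct}}^{2} / (n_{\mathrm{alg}}\, n_{\mathrm{ref}})}
         {n_{\mathrm{correct}}\,(n_{\mathrm{ref}} + n_{\mathrm{alg}}) / (n_{\mathrm{alg}}\, n_{\mathrm{ref}})}
  = \frac{2\, n_{\mathrm{correct}}}{n_{\mathrm{ref}} + n_{\mathrm{alg}}}

% Hypothetical example: n_ref = 100, n_alg = 105, n_correct = 95
\mathrm{SEN} = \tfrac{95}{100} = 95\%, \qquad
\mathrm{PPV} = \tfrac{95}{105} \approx 90.5\%, \qquad
\mathrm{F1} = \tfrac{2 \times 95}{100 + 105} = \tfrac{190}{205} \approx 92.7\%
```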

$n_{\mathrm{correct}}$ was calculated as the number of reference QRS complex annotations for which at least one QRS complex was identified by an algorithm within ±75 ms of the reference annotation, as shown in Fig 2.
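A minimal Python sketch of this matching rule and of Eqs (1) to (3), assuming reference and detected QRS locations are given as sample indices at a known sampling frequency. The function names are hypothetical. Following the definition above, a reference beat counts as correct if any detection falls within the tolerance; one-to-one matching between detections and references is not enforced. F1 is computed via the algebraically equivalent form 2·n_correct/(n_ref + n_alg).

```python
import bisect


def count_correct(ref_samples, det_samples, tol_samples):
    """Number of reference beats with at least one detection within +/- tol_samples."""
    det = sorted(det_samples)
    n_correct = 0
    for r in ref_samples:
        # index of the first detection at or after r - tol
        i = bisect.bisect_left(det, r - tol_samples)
        if i < len(det) and det[i] <= r + tol_samples:
            n_correct += 1
    return n_correct


def qrs_metrics(ref_samples, det_samples, fs, tol_ms=75.0):
    """Return SEN, PPV and F1 (all in %) for one ECG window."""
    tol_samples = tol_ms * fs / 1000.0
    n_ref, n_alg = len(ref_samples), len(det_samples)
    n_correct = count_correct(ref_samples, det_samples, tol_samples)
    sen = 100.0 * n_correct / n_ref if n_ref else float("nan")
    ppv = 100.0 * n_correct / n_alg if n_alg else float("nan")
    # 2*PPV*SEN/(PPV+SEN) equals 2*n_correct/(n_ref+n_alg); this form also
    # returns 0 rather than 0/0 when no beats are matched
    f1 = 200.0 * n_correct / (n_ref + n_alg) if (n_ref + n_alg) else float("nan")
    return {"SEN": sen, "PPV": ppv, "F1": f1}


# Example at 500 Hz, where +/-75 ms corresponds to +/-37.5 samples
metrics = qrs_metrics([100, 600, 1100, 1600], [110, 590, 1100, 1400, 1650], fs=500)
```

With these inputs, three of the four reference beats are matched (the beat at sample 1600 is missed because the nearest detection is 100 ms away), giving SEN = 75%, PPV = 60% and F1 ≈ 66.7%.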

Fig 2. Assessing whether QRS complexes were correctly identified.

An ECG signal is shown with dotted red lines marking reference R-peak annotations, grey areas showing the tolerance of ± 75ms around these annotations within which QRS complexes are deemed to be correctly identified, and markers for the R-peaks identified by the 18 QRS detectors used in this study.

A threshold of ±75 ms was chosen to classify QRS detections as correct or incorrect for the following reasons. QRS complexes typically last <120 ms in health [35], although they can last longer in disease [36]. A tolerance of ±75 ms was identified as a conservative threshold which would classify any QRS detection lying on a QRS complex as correct, whilst classifying any detection on other ECG waves (such as P- or T-waves) as incorrect. This was based on the assumptions that: a QRS complex lasts up to approximately 150 ms; the R-wave is located approximately in the centre of a QRS complex; and reference QRS annotations are at the locations of R-waves. To investigate the suitability of this threshold, we assessed the performance of the QRS detectors on the telehealth (TELE and SAFER) datasets for thresholds ranging from ±1 to ±140 ms. The results (shown in Fig A in S1 Text) show that for all QRS detectors performance was poorer at low thresholds, with performance generally approaching a maximum between 20 and 100 ms (e.g. ≈20 ms for nk and unsw, and ≈60–80 ms for pan-tomp and two-avg). Therefore, a threshold of ±75 ms appeared a reasonable choice. In comparison, previous work in this area has used tolerances of 50 ms [5] and ±150 ms [22].
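The tolerance analysis can be sketched as re-computing F1 over a range of tolerances. The example below uses hypothetical detections that are all offset by a constant 30 ms from the reference R-peaks (as might happen if a detector marked a fiducial point other than the R-wave peak). F1 is then 0 for tolerances below 30 ms and 1 from 30 ms onwards, illustrating how a systematic offset shifts the tolerance at which the F1 curve reaches its maximum; it is not an explanation of the curves in Fig A in S1 Text. Function names are hypothetical, and F1 is expressed as a fraction here.

```python
import bisect


def f1_at_tolerance(ref, det, fs, tol_ms):
    """F1 (as a fraction) for one window at a given +/- tolerance in ms."""
    tol = tol_ms * fs / 1000.0
    det = sorted(det)
    n_correct = 0
    for r in ref:
        i = bisect.bisect_left(det, r - tol)
        if i < len(det) and det[i] <= r + tol:
            n_correct += 1
    total = len(ref) + len(det)
    return 2.0 * n_correct / total if total else float("nan")


def sweep_tolerances(ref, det, fs, tolerances_ms=range(1, 141)):
    """Map each tolerance (ms) to the corresponding F1 score."""
    return {t: f1_at_tolerance(ref, det, fs, t) for t in tolerances_ms}


# Hypothetical example: 500 Hz, detections consistently 15 samples (30 ms) late
fs = 500
ref = [500 * k for k in range(1, 11)]
det = [r + 15 for r in ref]
curve = sweep_tolerances(ref, det, fs)
```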

F1 scores were calculated for each ECG window, and are reported as the median and inter-quartile range across windows.
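A sketch of this summary using Python's standard library, assuming the per-window F1 scores are already available as a list. The paper does not state which quartile convention was used; `method="inclusive"` is an assumption (it corresponds to the linear interpolation used by default in NumPy's `percentile`).

```python
import statistics


def summarise_f1(per_window_f1):
    """Median and inter-quartile range (Q1, Q3) of per-window F1 scores."""
    median = statistics.median(per_window_f1)
    q1, _, q3 = statistics.quantiles(per_window_f1, n=4, method="inclusive")
    return median, (q1, q3)


# Hypothetical per-window F1 scores for one detector on one dataset
median, (q1, q3) = summarise_f1([0.5, 0.9, 0.95, 0.98, 0.99])
```

Summarising per window gives each ECG window equal weight, so a few very poor windows shift the lower quartile rather than dominating a pooled score.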

Two additional performance measures were used: algorithm error rate and execution time. Error rates were defined as the percentage of 30s ECG segments in which an algorithm encountered an error and did not return identified QRS complexes. Execution times were assessed as the median time taken for an algorithm to process each 30s ECG segment. The analysis was performed on a MacBook Air (M1, 2020, 16 GB RAM, 8 cores) without parallelization. The assessment was run in Visual Studio Code 1.73.0, using Python 3.9, and calling MATLAB R2022a for QRS detectors written in MATLAB code.
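The two additional measures can be sketched as follows, assuming each detector is a Python callable that takes one 30 s segment and returns QRS locations. Treating a raised exception or a `None` result as an error, and excluding failed runs from the timing, are assumptions; the paper does not specify these details. The function name `benchmark` is hypothetical.

```python
import statistics
import time


def benchmark(detector, segments):
    """Return (error rate in %, median execution time in ms) for one detector.

    A run counts as an error if the detector raises an exception or returns None
    (assumption). Execution times are collected only for successful runs (assumption).
    """
    n_errors = 0
    times_ms = []
    for segment in segments:
        start = time.perf_counter()  # monotonic, high-resolution clock
        try:
            result = detector(segment)
        except Exception:
            n_errors += 1
            continue
        elapsed_ms = 1000.0 * (time.perf_counter() - start)
        if result is None:
            n_errors += 1
            continue
        times_ms.append(elapsed_ms)
    error_rate = 100.0 * n_errors / len(segments) if segments else float("nan")
    median_ms = statistics.median(times_ms) if times_ms else float("nan")
    return error_rate, median_ms


def toy_detector(segment):
    """Hypothetical detector that fails on empty input."""
    if not segment:
        raise ValueError("empty segment")
    return [i for i, v in enumerate(segment) if v > 0.5]


rate, median_ms = benchmark(toy_detector, [[0.0, 1.0, 0.0], [], [1.0, 0.0, 1.0], []])
```

Here two of the four segments are empty and raise an error, so the error rate is 50%.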

The two-sided Mann-Whitney U test was used to test for statistically significant differences between F1 scores at a significance level of 0.05. A Bonferroni correction was used to account for the multiple comparisons (one comparison for each QRS detector). This non-parametric test was used because the F1 scores were not normally distributed and the groups being compared were independent of each other. Comparisons were made between: (i) supervised and telehealth ECGs; (ii) high- and low-quality ECGs; (iii) AF and non-AF ECGs; and (iv) female and male subjects. Comparisons between female and male subjects were made on the SIN, ARR and SAFER datasets, but not on the HIGH, LOW and TELE datasets as, to the best of our knowledge, they do not contain information on gender.
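SciPy provides this test as `scipy.stats.mannwhitneyu` (with `alternative="two-sided"`). The dependency-free sketch below instead uses the large-sample normal approximation with tie and continuity corrections, so its p-values may differ from exact or library results, particularly for small samples; the function names are hypothetical. With one comparison per QRS detector, the Bonferroni-corrected threshold is 0.05/18 ≈ 0.0028.

```python
import math


def _average_ranks(values):
    """1-based ranks with ties averaged, plus the tie term sum(t^3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation). Returns (U1, p)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    ranks, tie_term = _average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # all values tied: no evidence of a difference
        return u1, 1.0
    z = (abs(u1 - mu) - 0.5) / math.sqrt(var)  # continuity correction
    p = math.erfc(max(z, 0.0) / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u1, min(p, 1.0)


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values below alpha divided by the number of comparisons."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


# Hypothetical per-window F1 scores for one detector on two datasets
u1, p = mann_whitney_u([0.95, 0.96, 0.97, 0.98, 0.99, 0.995, 0.97, 0.985],
                       [0.5, 0.6, 0.55, 0.7, 0.65, 0.62, 0.58, 0.61])
```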

Results

Algorithm performance

The performance of the algorithms is presented in Fig 3 using the F1 score. When using an F1 score of ≥0.96 to identify good performance, a total of 12 out of 18 algorithms performed well on ECGs collected under clinical supervision (ARR, HIGH, LOW, and SIN). The exceptions were engz, gamb, jqrs, mart, nab and rpeak. Fewer algorithms performed well on telehealth ECGs: five algorithms performed well on the TELE dataset (gqrs, nk, rdeco, two-avg, and unsw); six performed well on high-quality SAFER data (fnvg, fwhvg, nk, rdeco, two-avg, and unsw); and performance was considerably poorer on low-quality SAFER data, with only three algorithms scoring ≥0.78 (fnvg, nk, and unsw) and none scoring higher than 0.84.

Fig 3. The performance of QRS detectors, expressed as the F1 score.

Results are shown for the 18 QRS detectors (on the y-axis) and the six datasets (x-axis). Dataset definitions: ARR—MIT-BIH Arrhythmia Database; HIGH—PhysioNet/Computing in Cardiology Challenge 2014 training set; LOW—PhysioNet/Computing in Cardiology Challenge 2014 augmented training set; SAFER-AF-HIGH—SAFER ECG Dataset subset of high-quality ECGs exhibiting AF; SAFER-nonAF-HIGH—SAFER ECG Dataset subset of high-quality ECGs not exhibiting AF; SAFER-nonAF-LOW—SAFER ECG Dataset subset of low-quality ECGs not exhibiting AF; SIN—MIT-BIH Normal Sinus Rhythm Database; TELE—TELE ECG Database.

Therefore, overall the nk and unsw algorithms performed best, with consistently high F1 scores on datasets of supervised ECG recordings, and the highest F1 scores on self-recorded ECGs (TELE and SAFER datasets).

Additional results for the positive predictive value (PPV) and sensitivity (SEN) are provided in Figs B and C in S1 Text. These metrics show that gamb performed poorly because of a low PPV, indicating that it falsely detected additional QRS complexes, and that mart and engz had a low SEN, indicating that they frequently missed QRS complexes.

Fig 4 shows the error rates of each QRS detector (the proportion of ECG windows for which the QRS detector algorithm failed to execute, i.e. encountered an error). Most QRS detectors had no or very few errors. The best-performing algorithms (nk and unsw) had 0.0% errors on all datasets. The engz and gamb algorithm implementations frequently produced errors, and some errors were encountered for the gqrs, jqrs, rdeco and rpeak algorithms. Of particular note, the gamb algorithm exhibited higher error rates on SAFER data, reaching ≥99% (in keeping with a previous study [8]). This was due to the algorithm’s use of a fixed amplitude threshold, which was often not met by SAFER ECGs.

Fig 4. The error rates for each QRS detector (expressed as percentages).

These indicate the proportion of ECG windows for which the QRS detector algorithms failed to execute (i.e. encountered an error).

Fig 5 shows the median execution time of each QRS detector. The fastest QRS detector, rpeak, had an execution time of 1.1 ms (i.e. 0.004% of the signal duration). Of the best-performing QRS detectors (nk and unsw), nk had a short execution time of 2.7 ms (0.009% of the signal duration), whereas unsw was slower at 37.1 ms (0.124% of the signal duration). Four QRS detectors had much longer execution times (christ, engz, gqrs, and wqrs), although we note that C code implementations are available for some of these, which would have led to shorter execution times. Most QRS detectors had similar median and mean execution times: the mean execution time was between 87 and 124% of the median for all QRS detectors except christ, whose mean execution time was substantially longer than its median (373% of the median), primarily due to exceptionally high runtimes on the SAFER-nonAF-LOW dataset.

Fig 5. QRS detector execution times.

The median execution time of each QRS detector was calculated across all datasets, where QRS detectors were implemented in either Python (blue) or Matlab (red).

Comparison between supervised and telehealth ECGs

For most QRS detectors, performance was higher on supervised ECG recordings than on unsupervised, telehealth ECGs. A total of 17 (out of 18) QRS detectors had a significantly higher F1 score on the supervised SIN dataset than on the unsupervised SAFER-nonAF-HIGH dataset (mart showed no significant difference). Similarly, 16 QRS detectors had a significantly higher F1 score on the supervised ARR dataset than on the unsupervised SAFER-AF-HIGH dataset (hamilt and nk showed no significant difference). As shown in Fig 3, some QRS detectors performed below average on unsupervised ECGs despite having performed well on supervised ECGs: gqrs achieved F1 scores of ≥0.98 on supervised ECGs (SIN, ARR, HIGH, LOW), but ≤0.70 on SAFER; and rpeak achieved ≥0.81 on supervised ECGs, but ≤0.53 on the TELE and SAFER datasets.

The results for positive predictive value (PPV) and sensitivity (SEN) (in Figs B and C in S1 Text) show that most QRS detectors which performed poorly on self-recorded ECGs had a low PPV, indicating false-positive QRS detections. In addition, some QRS detectors had low sensitivities, indicating missed QRS complexes (e.g. engz, gamb, jqrs, mart, nab, and wqrs).
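SEN, PPV and F1 all rest on matching detections to reference annotations. According to Reviewer #3's comments in the peer review history, a detection was counted as correct if it fell within ±150 ms of a reference QRS annotation. The sketch below assumes strict one-to-one matching within that tolerance (the study's exact matching rule is not given in this section), and the function names are illustrative. Under one-to-one matching, the number of matched reference beats equals the number of matched detections, so a single TP count serves as the numerator of both SEN = TP/(TP+FN) and PPV = TP/(TP+FP).

```python
from typing import Sequence, Tuple


def match_beats(reference: Sequence[int], detected: Sequence[int],
                fs: float, tol_s: float = 0.15) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) using one-to-one matching within ±tol_s seconds.

    A two-pointer sweep over sorted sample indices: each reference beat is
    paired with at most one detection and vice versa.
    """
    tol = tol_s * fs                      # tolerance in samples
    ref, det = sorted(reference), sorted(detected)
    i = j = tp = 0
    while i < len(ref) and j < len(det):
        diff = det[j] - ref[i]
        if abs(diff) <= tol:              # close enough: true positive
            tp += 1
            i += 1
            j += 1
        elif diff < 0:                    # detection too early for any remaining reference
            j += 1
        else:                             # reference beat has no detection within tolerance
            i += 1
    return tp, len(det) - tp, len(ref) - tp


def detection_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (SEN, PPV, F1); F1 = 2TP / (2TP + FP + FN), the harmonic mean of SEN and PPV."""
    nan = float("nan")
    sen = tp / (tp + fn) if tp + fn else nan
    ppv = tp / (tp + fp) if tp + fp else nan
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else nan
    return sen, ppv, f1


counts = match_beats([100, 350, 600], [110, 400, 900], fs=250)   # (1, 2, 2)
print(detection_metrics(*counts))                                 # each value is 1/3
```

Passing `tol_s=0.04` evaluates the stricter ±40 ms window that Reviewer #3 suggests for HRV-oriented use; a lower SEN at that tolerance would indicate detections that find the beat but not its precise timing.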

Algorithm errors predominantly occurred in unsupervised telehealth ECGs (see Fig 4).

The impact of signal quality

Low signal quality was associated with poorer performance of QRS detectors in the telehealth setting. The F1 scores for all QRS detectors except gamb were significantly lower on low-quality unsupervised ECGs (SAFER-nonAF-LOW) than high-quality unsupervised ECGs (SAFER-nonAF-HIGH). For instance, the best-performing QRS detectors (nk and unsw) performed well on high-quality unsupervised ECGs (TELE, SAFER-nonAF-HIGH, SAFER-AF-HIGH) with F1 scores of ≥0.97, but performed less well on low-quality unsupervised ECGs (SAFER-nonAF-LOW) with F1 scores of ≤0.84. Indeed, all remaining QRS detectors showed F1 scores of ≤0.78 on low-quality ECGs (SAFER-nonAF-LOW) in the unsupervised telehealth environment.

Signal quality had a smaller but nonetheless significant impact on QRS detectors when using supervised ECGs. Almost all algorithms performed well on high-quality ECGs (the SIN, ARR, and HIGH datasets) with F1 scores of ≥0.97 (except gamb and mart), and most of these algorithms continued to perform relatively well on low-quality supervised ECGs (the LOW dataset) with F1 scores of ≥0.97 (except engz, gamb, jqrs, mart, nab and rpeak). The small differences in F1 scores between HIGH and LOW were significant for all QRS detectors except wqrs and hamilt.

Other influencing factors

The presence of arrhythmia did not have a large effect on F1 scores for either supervised ECGs (comparing ARR and SIN) or unsupervised ECGs (comparing SAFER-AF-HIGH and SAFER-nonAF-HIGH) (see Fig 3). Whilst the differences were mostly small, F1 scores were significantly lower during arrhythmias in ARR compared to SIN for 6 out of 18 QRS detectors, and in SAFER-AF-HIGH compared to SAFER-nonAF-HIGH for 8 QRS detectors. Amongst the best-performing QRS detectors (nk and unsw), the only significant difference was for nk in the comparison of SAFER-AF-HIGH and SAFER-nonAF-HIGH, although this difference was small, with median F1 scores of 0.99 and 0.98 on the two datasets.

Sex had little impact on performance on unsupervised ECGs: there were no significant differences in performance between female and male subjects on high-quality, non-AF SAFER signals (see Fig 6A), and significant differences for only two QRS detectors on high-quality, AF SAFER signals (see Fig 6B). There were no significant differences in performance between sexes on the SIN and ARR datasets (see Fig D in S1 Text), although we note the small number of subjects in each group in the SIN dataset (13 female and 5 male).

Fig 6. Comparison of the performance of QRS detectors between female (F) and male (M) SAFER participants.


A: SAFER-nonAF-HIGH: High-quality, non-AF ECGs (including 100 female and 99 male subjects). B: SAFER-AF-HIGH: High-quality, AF ECGs (including 92 female and 91 male subjects). Definitions: ns—no significant difference.

p-values for all statistical comparisons are provided in Tables D and E in S1 Text.
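Reviewer #1's comments in the peer review history indicate that the between-dataset comparisons used the Mann-Whitney U test. A minimal pure-Python sketch of a two-sided test follows, assuming it is applied to per-window F1 scores from two independent datasets (for example SIN vs SAFER-nonAF-HIGH). It assigns average ranks to ties and uses the tie-corrected normal approximation without a continuity correction, so p-values may differ slightly from library routines such as `scipy.stats.mannwhitneyu`, which can use exact distributions or a continuity correction.

```python
import math
from typing import List, Sequence, Tuple


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test: returns (U for x, p-value).

    Normal approximation with tie correction, no continuity correction.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = _average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2

    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u <= 0:            # all values tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
```

Because F1 scores near 1.0 are often tied, the tie correction matters in practice; without it the variance, and hence the p-value, would be misestimated.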

Discussion

Summary of findings

This study assessed the performance of open-source QRS detectors on single-lead, telehealth ECGs. The neurokit (nk) and UNSW (unsw) QRS detectors were identified as the best-performing of the 18 QRS detectors. They performed well both on telehealth ECGs recorded without clinical supervision and on ECGs recorded in clinical settings, achieving F1 scores of ≥0.98 on high-quality telehealth ECGs and ≥0.97 on ECGs recorded in clinical settings. Performance was lower (F1 ≥0.78) when analysing low-quality telehealth ECGs. Performance was not substantially affected by heart rhythm or sex. nk had one of the fastest execution times (0.009% of the signal duration), whereas unsw was over ten times slower (0.124%).

Comparison with literature

Several studies have compared the performance of multiple QRS detection algorithms across databases of different quality [5, 6, 8, 22]. These studies assessed 6–12 algorithms, compared to 18 in the current study. Several of the high-performing algorithms included in the current study were not widely assessed in previous comparison studies: nk and two-avg were only included in [8]; unsw was only included in [5]; and rdeco was not included in any of these studies. In addition, previous studies mostly focused on performance on supervised ECG recordings rather than in the telehealth setting. Only [5] included telehealth data; the current study analysed both this dataset and data from the SAFER AF screening study, which adds the challenge of QRS detection during AF.

The current study adds to our understanding of how best to detect QRS complexes in telehealth ECGs, and demonstrates the need to develop techniques to handle low-quality ECGs appropriately. Previously, QRS detectors had been found to perform worse on telehealth data, and in particular on the TELE dataset [5]. We also observed worse performance on telehealth data, although the best QRS detectors performed adequately on high-quality telehealth data, and performance was only substantially worse on low-quality telehealth data. This provides two complementary directions for future work: (i) QRS detectors could be developed to perform well even in the presence of noise (e.g. through denoising [37] or improved algorithm design [4]); and (ii) ECG signal quality algorithms could be developed to identify low-quality recordings in which QRS complexes cannot be accurately identified [22, 38].
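Direction (ii) is often approached with agreement-based quality indices. In the bSQI family of measures (the subject of [38]), a window is scored by how well two independent QRS detectors agree: the number of matched beats divided by the number of distinct beats found by either detector. A minimal sketch follows; the ±150 ms matching window, the 0.9 acceptance threshold and the function names are illustrative assumptions, not values taken from this study.

```python
from typing import Sequence


def beat_agreement_sqi(beats_a: Sequence[int], beats_b: Sequence[int],
                       fs: float, tol_s: float = 0.15) -> float:
    """bSQI-style agreement: matched / (n_a + n_b - matched), in [0, 1]."""
    tol = tol_s * fs                       # tolerance in samples
    a, b = sorted(beats_a), sorted(beats_b)
    i = j = matched = 0
    while i < len(a) and j < len(b):       # one-to-one matching of sorted beats
        diff = b[j] - a[i]
        if abs(diff) <= tol:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1
        else:
            i += 1
    union = len(a) + len(b) - matched
    # No beats from either detector: treat the window as unanalysable.
    return matched / union if union else 0.0


def is_analysable(sqi: float, threshold: float = 0.9) -> bool:
    """Hypothetical gate: only pass windows whose two detectors largely agree."""
    return sqi >= threshold
```

In a pipeline, windows failing the gate would be routed to manual review or discarded rather than used for automated RR-interval analysis.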

The current study also has implications for future research. We observed that the performance of QRS detectors on supervised or high-quality ECG recordings is not necessarily indicative of their performance on unsupervised recordings, in keeping with [6]. This highlights the importance of assessing performance in the target setting, such as AF screening as in this study. We also observed quite different performance on the TELE dataset from that reported previously: whereas the highest-performing algorithm achieved an F1 score of 0.80 on TELE in [5], six of the algorithms included in the present study achieved F1 scores of 0.90–1.00. Whilst this may partly be explained by the inclusion of additional algorithms in this study, it is notable that the jqrs algorithm's performance on this dataset was substantially higher in the present study (0.93) than in the previous study (0.79). This difference may be explained, at least in part, by the use of different tolerance windows. Nonetheless, it demonstrates the need to share open-source algorithm implementations and the code used to perform algorithm assessments. To address this, we have provided a repository of open-source algorithms and assessment code to accompany this article: https://github.com/floriankri/ecg_detector_assessment.

Strengths and limitations

The key strengths of this study are the assessment of QRS detectors in a real-world AF screening setting, and the inclusion of recently developed, high-performance QRS detectors. The key limitation is that algorithms were run retrospectively on a computer, rather than in real-time on a telehealth device. Some algorithms were implemented in Python, and others in Matlab. Therefore, the comparison of algorithm execution times reported in this study may not be truly representative of the relative execution times which would be observed on devices: the comparison of Python and Matlab execution times may not be fair; different algorithms may have been optimised to different extents; and some algorithms may be more amenable to further optimisation for use on devices than others (such as through implementation in C, as is already the case for parts of unsw). We note that in this study we did not investigate the potential benefit of additional ECG filtering beyond that already incorporated into each of the QRS detector algorithms: potentially performance could be improved further by including additional linear or non-linear filtering steps [39, 40]. Furthermore, we did not investigate the accuracy of RR-intervals derived from QRS detections, nor their suitability for heart rate variability analysis or arrhythmia detection. We note that additional processing steps may be required to accurately derive RR-intervals, such as locating the R-wave on each detected QRS complex.
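As noted above, deriving RR-intervals may require locating the R-wave on each detected QRS complex. One simple approach, sketched below under stated assumptions, takes the sample with the largest absolute amplitude within a short search window around each detection (±50 ms here, an illustrative choice) and then differences the refined locations. It assumes the ECG has already been filtered and that the R-wave is the dominant deflection, which may not hold for every lead or QRS morphology; the function names are hypothetical.

```python
from typing import List, Sequence


def refine_r_peaks(signal: Sequence[float], qrs_locs: Sequence[int],
                   fs: float, search_s: float = 0.05) -> List[int]:
    """Move each QRS detection to the largest |amplitude| sample within ±search_s."""
    half = int(round(search_s * fs))
    peaks = set()
    for loc in qrs_locs:
        lo, hi = max(0, loc - half), min(len(signal), loc + half + 1)
        if lo >= hi:          # detection lies outside the signal: skip it
            continue
        # Index of the dominant deflection (assumed to be the R-wave).
        peak = max(range(lo, hi), key=lambda k: abs(signal[k]))
        peaks.add(peak)
    return sorted(peaks)      # duplicates collapse if two detections share a peak


def rr_intervals_ms(r_peaks: Sequence[int], fs: float) -> List[float]:
    """Successive differences of R-peak locations, converted to milliseconds."""
    return [1000.0 * (b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]
```

Using the absolute amplitude lets the same rule handle inverted QRS complexes, which can occur in single-lead recordings taken between the hands, but it could select a deep S-wave instead of the R-wave in some morphologies.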

Implications

This study identified leading QRS detector algorithms for use with telehealth ECGs. The best-performing algorithms detected QRS complexes with a very high degree of accuracy on high-quality telehealth ECG data, demonstrating the potential utility of telehealth devices for assessments based on RR-intervals (such as arrhythmia detection). Furthermore, the study demonstrates the importance of selecting a high-performance QRS detector, since performance on telehealth ECGs can vary greatly, even between well-established algorithms. The study also demonstrates the difficulty of analysing low-quality telehealth ECGs. Telehealth ECGs can be of particularly low quality, perhaps due to increased artifact, the use of dry electrodes, being self-recorded without clinical supervision, and acquisition at the hands rather than the chest [4].

The findings are particularly relevant to telehealth settings where ECG signals are recorded without clinical supervision. Several such settings arise in the detection and management of atrial fibrillation at home, including: (i) virtual wards to reduce hospitalisation for atrial fibrillation [41]; (ii) screening for paroxysmal atrial fibrillation [42]; and (iii) detecting recurrent atrial fibrillation after ablation or cardioversion [43]. In each of these examples an accurate QRS detector is a key step in processing the intermittent ECGs acquired by patients at home, where signal quality may be lower than in the clinical setting.

Conclusion

This study identified two leading QRS detectors for use with single-lead, telehealth ECGs: the nk and unsw algorithms. These algorithms provided accurate QRS detection with fast execution times. Whilst most other algorithms performed well on data collected under clinical supervision, many did not perform as well on telehealth data, demonstrating the importance of selecting a high-performance algorithm for use in clinical analysis. The performance of even the leading algorithms was substantially lower on low-quality telehealth ECGs, highlighting the need to handle low-quality ECGs appropriately in an analysis pipeline. All the QRS detection algorithms used in this study are openly available, so they can readily be used in future research. The code used to assess algorithm performance is also available, at: https://github.com/floriankri/ecg_detector_assessment.

Supporting information

S1 Text. Supplementary Material.

The Supplementary Material provides additional results, details of the study methodology, and links to algorithms and datasets.

(PDF)


Acknowledgments

Reference [5] provided the foundations for the selection of datasets and their presentation in Table 2. ChatGPT (OpenAI, San Francisco, CA, USA) was used for language editing.

The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.

Data Availability

The following datasets are publicly available at the links provided in Table C in S1 Text: (i) MIT-BIH Arrhythmia Database (https://www.physionet.org/physiobank/database/mitdb/); (ii) PhysioNet/Computing in Cardiology Challenge 2014 training dataset and augmented training dataset (https://physionet.org/content/challenge-2014/1.0.0/); (iii) MIT-BIH Normal Sinus Rhythm Database (https://physionet.org/physiobank/database/nsrdb/); and (iv) TELE ECG Database (https://doi.org/10.7910/DVN/QTG0EP). The SAFER dataset cannot be shared due to ethical restrictions. Requests for access to the SAFER dataset should be directed to the SAFER study coordinator (SAFER@medschl.cam.ac.uk) and will be considered by the investigators, in accordance with participant consent.

Funding Statement

This study is funded by the British Heart Foundation (FS/20/20/34626 awarded to PHC), and the National Institute for Health and Care Research (NIHR) Programme Grants for Applied Research Programme (RP-PG0217-20007 awarded to JM), and the NIHR School for Primary Care Research (SPCR-2014-10043, project 410 awarded to JM). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Lopez Perales CR, Van Spall HGC, Maeda S, Jimenez A, Laţcu DG, Milman A, Kirakoya-Samadoulougou F, et al. Mobile health applications for the detection of atrial fibrillation: a systematic review. EP Europace. 2021;23(1):11–28. doi: 10.1093/europace/euaa139
  • 2. Lu L, Zhang J, Xie Y, Gao F, Xu S, Wu X, et al. Wearable health devices in health care: narrative systematic review. JMIR mHealth and uHealth. 2020;8(11):e18907. doi: 10.2196/18907
  • 3. Lee SM, Lee D. Opportunities and challenges for contactless healthcare services in the post-COVID-19 Era. Technol Forecast Soc Change. 2021;167:120712. doi: 10.1016/j.techfore.2021.120712
  • 4. Khamis H, Weiss R, Xie Y, Chang CW, Lovell NH, Redmond SJ. QRS detection algorithm for telehealth electrocardiogram recordings. IEEE Trans Biomed Eng. 2016;63(7):1377–1388. doi: 10.1109/TBME.2016.2549060
  • 5. Liu F, Liu C, Jiang X, Zhang Z, Zhang Y, Li J, et al. Performance analysis of ten common QRS detectors on different ECG application cases. J Healthcare Eng. 2018;2018:e9050812. doi: 10.1155/2018/9050812
  • 6. Llamedo M, Martínez JP. QRS detectors performance comparison in public databases. Computing in Cardiology. 2014; p. 357–360.
  • 7. Elgendi M. Fast QRS Detection with an Optimized Knowledge-Based Method: Evaluation on 11 Standard ECG Databases. PLoS ONE. 2013;8(9):e73557. doi: 10.1371/journal.pone.0073557
  • 8. Makowski D, Pham T, Lau ZJ, Brammer JC, Lespinasse F, Pham H, et al. NeuroKit2: A Python toolbox for neurophysiological signal processing. Behav Res Methods. 2021 Aug;53(4):1689–1696. doi: 10.3758/s13428-020-01516-y
  • 9. Porr B, Howell L. py-ecg-detectors: Seven ECG heartbeat detection algorithms and heartrate variability analysis. Version 1.3.2. Available from: https://github.com/berndporr/py-ecg-qrs-detectors.
  • 10. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation. 2000 Jun 13;101(23):E215–220. doi: 10.1161/01.CIR.101.23.e215
  • 11. Moody GB, Mark RG. The impact of the MIT-BIH Arrhythmia Database. IEEE Eng Med Biol Mag. 2001 May;20(3):45–50. doi: 10.1109/51.932724
  • 12. Moody G, Moody B, Silva I. Robust detection of heart beats in multimodal data: The PhysioNet/Computing in Cardiology Challenge 2014. In: Computing in Cardiology 2014. 2014 Sep; p. 549–552.
  • 13. Pandiaraja M, Brimicombe J, Cowie M, Dymond A, Lindén HC, Lip GYH, et al. Screening for atrial fibrillation: improving efficiency of manual review of handheld electrocardiograms. Eng Proc. 2020;2(1):78. doi: 10.3390/ecsa-7-08195
  • 14. Christov II. Real time electrocardiogram QRS detection using combined adaptive threshold. BioMedical Engineering OnLine. 2004 Aug 27;3(1):28. doi: 10.1186/1475-925X-3-28
  • 15. Engelse WAH, Zeelenberg C. A single scan algorithm for QRS-detection and feature extraction. Computers in Cardiology. 1979;6:37–42.
  • 16. Lourenco A, Silva H, Leite P, Lourenco R, Fred A. Real time electrocardiogram segmentation for finger based ECG biometrics. In: Proceedings of the International Conference on Bio-inspired Systems and Signal Processing. 2012; p. 49–54.
  • 17. Emrich J, Koka T, Wirth S, Muma M. Accelerated Sample-Accurate R-Peak Detectors Based on Visibility Graphs. In: Proceedings of the European Signal Processing Conference. 2023; p. 1090–1094.
  • 18. Koka T, Muma M. Fast and Sample Accurate R-Peak Detection for Noisy ECG Using Visibility Graphs. In: Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society. 2022; p. 121–126.
  • 19. Gamboa H. Multi-modal Behavioral Biometrics Based on HCI and Electrophysiology. PhD Thesis, Universidade Técnica de Lisboa. 2008.
  • 20. Hamilton P. Open Source ECG Analysis. In: Computers in Cardiology 2002. Volume 29. Memphis, TN, USA: IEEE; 2002. p. 101–104.
  • 21. Hamilton PS, Tompkins WJ. Quantitative Investigation of QRS Detection Rules Using the MIT/BIH Arrhythmia Database. IEEE Transactions on Biomedical Engineering. 1986 Dec;BME-33(12):1157–1165. doi: 10.1109/TBME.1986.325695
  • 22. Johnson AEW, Behar J, Andreotti F, Clifford GD, Oster J. Multimodal heart beat detection using signal quality indices. Physiol Meas. 2015 Jul;36(8):1665–1677. doi: 10.1088/0967-3334/36/8/1665
  • 23. Behar J, Oster J, Clifford GD. Non-invasive FECG extraction from a set of abdominal sensors. In: Computing in Cardiology 2013. 2013 Sep; p. 297–300.
  • 24. Behar J, Oster J, Clifford GD. Combining and benchmarking methods of foetal ECG extraction without maternal or scalp electrode data. Physiol Meas. 2014 Jul;35(8):1569–1589. doi: 10.1088/0967-3334/35/8/1569
  • 25. Kalidas V, Tamil L. Real-time QRS detector using Stationary Wavelet Transform for Automated ECG Analysis. In: Proceedings of the 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE). 2017 Oct; p. 457–461.
  • 26. Martinez JP, Almeida R, Olmos S, Rocha AP, Laguna P. A wavelet-based ECG delineator: evaluation on standard databases. IEEE Trans Biomed Eng. 2004 Apr;51(4):570–581. doi: 10.1109/TBME.2003.821031
  • 27. Nabian M, Yin Y, Wormwood J, Quigley KS, Barrett LF, Ostadabbas S. An open-source feature extraction tool for the analysis of peripheral physiological data. IEEE J Transl Eng Health Med. 2018;6:2800711. doi: 10.1109/JTEHM.2018.2878000
  • 28. Pan J, Tompkins WJ. A real-time QRS detection algorithm. IEEE Trans Biomed Eng. 1985 Mar;32(3):230–236. doi: 10.1109/TBME.1985.325532
  • 29. Moeyersons J, Amoni M, Van Huffel S, Willems R, Varon C. R-DECO: an open-source Matlab based graphical user interface for the detection and correction of R-peaks. PeerJ Comput Sci. 2019 Oct 21;5:e226. doi: 10.7717/peerj-cs.226
  • 30. Elgendi M, Jonkman M, DeBoer F. Frequency bands effects on QRS detection. In: International Conference on Bio-inspired Systems and Signal Processing. Valencia, Spain: SciTePress; 2010. p. 428–431.
  • 31. Zong W, Moody GB, Jiang D. A robust open-source algorithm to detect onset and duration of QRS complexes. In: Computers in Cardiology. 2003 Sep; p. 737–740.
  • 32. Redmond SJ, Xie Y, Chang D, Basilakis J, Lovell NH. Electrocardiogram signal quality measures for unsupervised telehealth environments. Physiol Meas. 2012;33(9):1517–1533. doi: 10.1088/0967-3334/33/9/1517
  • 33. Adeniji M, Brimicombe J, Cowie M, Dymond A, Lindén HC, Lip GYH, et al. Prioritising electrocardiograms for manual review to improve the efficiency of atrial fibrillation screening. In: Proc IEEE EMBS. 2022; p. 3239–3242.
  • 34. Laguna P, Jané R, Caminal P. Automatic detection of wave boundaries in multilead ECG signals: validation with the CSE database. Comput Biomed Res. 1994 Feb;27(1):45–60. doi: 10.1006/cbmr.1994.1006
  • 35. Hnatkova K, Smetana P, Toman O, Schmidt G, Malik M. Sex and race differences in QRS duration. EP Europace. 2016;18(12):1842–1849.
  • 36. Wang NC, Maggioni AP, Konstam MA, Zannad F, Krasa HB, Burnett JC, et al. Clinical implications of QRS duration in patients hospitalized with worsening heart failure and reduced left ventricular ejection fraction. JAMA. 2008;299(22):2656–2666. doi: 10.1001/jama.299.22.2656
  • 37. Beni NH, Jiang N. Heartbeat detection from single-lead ECG contaminated with simulated EMG at different intensity levels: A comparative study. Biomed Signal Process Control. 2023;83:104612. doi: 10.1016/j.bspc.2023.104612
  • 38. Liu F, Liu C, Zhao L, Jiang X, Zhang Z, Li J, et al. Dynamic ECG Signal Quality Evaluation Based on the Generalized bSQI Index. IEEE Access. 2018;6:41892–41902. doi: 10.1109/ACCESS.2018.2860056
  • 39. Clifford GD. Linear Filtering Methods. In: Advanced Methods and Tools for ECG Data Analysis. Artech; 2006. p. 135–170.
  • 40. McSharry PE, Clifford GD. Nonlinear Filtering Methods. In: Advanced Methods and Tools for ECG Data Analysis. Artech; 2006. p. 171–196.
  • 41. Kotb A, Armstrong S, Koev I, Antoun I, Vali Z, Panchal G, et al. Digitally enabled acute care for atrial fibrillation: conception, feasibility and early outcomes of an AF virtual ward. Open Heart. 2023;10(1):e002272. doi: 10.1136/openhrt-2023-002272
  • 42. Svennberg E, Engdahl J, Al-Khalili F, Friberg L, Frykman V, Rosenqvist M. Mass screening for untreated atrial fibrillation: the STROKESTOP study. Circulation. 2015;131(25):2176–2184. doi: 10.1161/CIRCULATIONAHA.114.014343
  • 43. Goldenthal I, Sciacca RR, Riga T, Bakken S, Baumeister M, Biviano AB, et al. Recurrent atrial fibrillation/flutter detection after ablation or cardioversion using the AliveCor KardiaMobile device: iHEART results. Journal of Cardiovascular Electrophysiology. 2019;30(11):2220–2228. doi: 10.1111/jce.14160
PLOS Digit Health. doi: 10.1371/journal.pdig.0000538.r001

Decision Letter 0

Calvin Or

20 Feb 2024

PDIG-D-24-00016

QRS detection in single-lead, telehealth electrocardiogram signals: benchmarking open-source algorithms

PLOS Digital Health

Dear Dr. Charlton,

Thank you for submitting your manuscript to PLOS Digital Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Digital Health's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days (by Apr 20 2024 11:59PM). If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at digitalhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pdig/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Calvin Or, PhD

Section Editor

PLOS Digital Health

Journal Requirements:

1. Please provide separate figure files in .tif or .eps format only and remove any figures embedded in your manuscript file. Please also ensure that all files are under our size limit of 10MB.

For more information about figure files please see our guidelines:

https://journals.plos.org/digitalhealth/s/figures

https://journals.plos.org/digitalhealth/s/figures#loc-file-requirements

2. We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex.

Additional Editor Comments (if provided):


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Partly

--------------------

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

--------------------

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

--------------------

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

--------------------

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study undertakes an extensive statistical analysis, amalgamating QRS detectors with diverse datasets and employing F1 score and error rate as performance metrics. However, a meticulous examination of the congruence between reported results in Figure 3 and Figure 4 is recommended to ensure coherence. Specifically, the instance where both the F1 score and error rate for the cell (mart, SAFER-nonAF-LOW) are recorded as 0 necessitates careful scrutiny, as their concurrent occurrence may signal potential calculation errors.

Furthermore, the application of the Mann-Whitney U test to assess performance across various subsets is acknowledged. Nonetheless, the absence of a comprehensive presentation of detailed testing results and statistical values introduces a notable limitation. To enhance the study's robustness, it is advisable to furnish explicit p-values and statistics derived from the Mann-Whitney U test. Relying solely on boxplot representations poses a potential constraint, particularly when disparities between median values seemingly contradict the non-significant (ns) result, assumed to arise from the U test. A more exhaustive reporting of statistical outcomes is imperative for a nuanced and compelling interpretation of the comparative analyses.

Reviewer #2: This paper benchmarked 18 open-source QRS detection algorithms to identify the best-performing one for ECG signals. The study compares their performance across datasets, including a novel dataset collected during AF screening. The manuscript has good quality and the results are interesting. I have a few comments:

1. Figures 7, 8 and 9 are presented without any explanation. What is the purpose of these experiments, and what message do the results convey?

2. What is the computing environment of running python and matlab program? How many CPU cores are used? Are there any parallelization used?

3. Could the authors specify what are the number of male/female samples in each dataset?

Reviewer #3: This work describes the performance of several QRS detection algorithms. While the work is not novel, in and of itself, there certainly is a welcome place in the literature for a thorough evaluation of these approaches. As the authors note, QRS detection forms an important part of several clinically relevant tasks (e.g., arrhythmia detection, HRV analyses, etc.). However, the work suffers from technical flaws that require clarification before it can be published.

1) The authors' definitions of sensitivity and specificity are not correct as written and it is not clear whether this represents a typographical error or a true error in how these values were calculated. For example, the authors suggest that the numerator for both the sensitivity and the positive predictive value calculations is the same. This is incorrect. The sensitivity is the true positive rate and therefore the numerator refers to the number of reference QRS annotations that are also correct according to the algorithm. By contrast, the numerator in the equation for the positive predictive value (PPV) is the number of algorithmic predictions that are correct. This is a very important point that needs to be clarified/corrected.

2) The authors define a correct prediction as one that is within +/- 150ms of the reference QRS annotation. This is a large range. In the introduction, the authors point to HRV and arrhythmia detection as important tasks that depend on QRS detection. However, such a large range (+/- 150ms of the QRS reference) will certainly not lead to accurate HRV estimates and will certainly not help with arrhythmia detection. The fact that this has been used in other studies is a poor reason to rely on this standard here. It is imperative that the authors discuss results using cutoffs that will yield more reliable HRV estimates (e.g., +/- 40ms).

--------------------

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

--------------------

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLOS Digit Health. doi: 10.1371/journal.pdig.0000538.r003

Decision Letter 1

Calvin Or

27 May 2024

QRS detection in single-lead, telehealth electrocardiogram signals: benchmarking open-source algorithms

PDIG-D-24-00016R1

Dear Dr Charlton,

We are pleased to inform you that your manuscript 'QRS detection in single-lead, telehealth electrocardiogram signals: benchmarking open-source algorithms' has been provisionally accepted for publication in PLOS Digital Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team. 

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact digitalhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Digital Health.

Best regards,

Calvin Or, PhD

Section Editor

PLOS Digital Health

***********************************************************

Reviewer Comments (if any, and for reference):

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: (No Response)

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: "To address this, we have provided a repository of open-source algorithms and assessment code to accompany this article: https://github.com/floriankri/ecg_detector_assessment".

The link does not exist. If the authors do not plan to make the code available, I suggest removing the false statement.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

**********

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. Supplementary Material.

    The Supplementary Material provides additional results, details of the study methodology, and links to algorithms and datasets.

    (PDF)

    pdig.0000538.s001.pdf (603.5KB, pdf)
    Attachment

    Submitted filename: Response to reviewers - 2024-04 (20240503).pdf

    pdig.0000538.s002.pdf (138.4KB, pdf)

    Data Availability Statement

    The following datasets are publicly available at the links provided in Table C in S1 Text: (i) MIT-BIH Arrhythmia Database (https://www.physionet.org/physiobank/database/mitdb/); (ii) PhysioNet/Computing in Cardiology Challenge 2014 training dataset and augmented training dataset (https://physionet.org/content/challenge-2014/1.0.0/); (iii) MIT-BIH Normal Sinus Rhythm Database (https://physionet.org/physiobank/database/nsrdb/); and (iv) TELE ECG Database (https://doi.org/10.7910/DVN/QTG0EP). The SAFER dataset cannot be shared due to ethical restrictions. Requests for access to the SAFER dataset should be directed to the SAFER study coordinator (SAFER@medschl.cam.ac.uk) and will be considered by the investigators, in accordance with participant consent.


    Articles from PLOS Digital Health are provided here courtesy of PLOS
