CJC Open. 2022 Aug 4;4(11):939–945. doi: 10.1016/j.cjco.2022.07.011

Comparison of Apple Watch vs KardiaMobile: A Tale of Two Devices

Calvin Lee a, Charles Lee a, Carlos Fernando b, Chi-Ming Chow b,c,d
PMCID: PMC9700214  PMID: 36444370

Abstract

Background

The Apple Watch Series 4 (AW4) and the KardiaMobile single bipolar lead model (KM) are 2 of the most popular US Food & Drug Administration (FDA)-approved commercial heart trackers. However, a lack of knowledge remains regarding their rhythm-detection accuracy in real-life clinical situations. This paper aims to determine the practicality of using an AW4 or a KM in modern medical practice, by assessing the accuracy of each in identifying heart rhythms and heart rate.

Methods

Participants from the Toronto Heart Centre clinic were enrolled from January 2019 to December 2019. Each participant underwent a 12-lead electrocardiogram (ECG), then wore the AW4 (OS 5.3) and pressed on the KM electrode plates, with all recordings taken within 5 minutes of one another. Each session yielded a 12-lead ECG, an ECG from each device, and a reading from the AW4’s photoplethysmography function (APPG).

Results

Of 200 participants, 162 (81%) were in sinus rhythm, and 38 (19%) had atrial fibrillation. The rhythm-detection accuracy for sinus rhythm was 100% for the AW4, and 99.03% for the KM. For atrial fibrillation, accuracy was 90.48% for the AW4, and 100% for the KM. The heart rate accuracy for sinus rhythm was 94.39% for the KM, 90.65% for the APPG, and 96.26% for the Apple ECG function. The heart rate accuracy for atrial fibrillation was 91.30% for the KM, 82.61% for the APPG, and 86.96% for the Apple ECG function.

Conclusions

Both the AW4 and the KM could reliably detect rhythm and heart rate in real-life clinical situations. However, a nonsignificant trend occurred toward better rhythm detection and accuracy with the KM than with the AW4. The difference is mainly due to artifacts (eg, tremors) and the fit of the strap for the AW4. The findings have important implications for how these consumer devices can be used in real-life clinical settings.



One of the most popular personal health technologies is the smartwatch, namely the Apple Watch (Apple, Inc., Cupertino, CA). In fact, global smartwatch production grew substantially in 2021, reaching an excess of 40 million units, the largest quarterly shipment ever.1 We decided to focus on the Apple Watch Series 4 (AW4; Apple, Inc.) due to its popularity on digital and watch blogs and because it was the leader in smartwatch sales, accounting for almost a third of the worldwide smartwatch market in quarter 4 of 2021.2

The AW4 implements 2 methods to detect heart rate (HR) and rhythm: photoplethysmography (PPG) and a single-lead electrocardiogram (ECG). PPG is an affordable, noninvasive technique that uses several green light-emitting diodes (LEDs) and optical sensors to measure changes in blood volume.3 The AW4 PPG function (APPG) provides long-term surveillance of HR. Additionally, the AW4’s user-triggered ECG (AECG) recording is activated via 2 installed electrodes, one in the digital crown and the other in the back of the watch.4

For the KardiaMobile (KM) (AliveCor Inc., San Francisco, CA), by contrast, a more focused approach has been used in the design; it comprises only a single bipolar lead that resembles lead I in a 12-lead ECG. However, this seemingly simple ECG mechanism is paired with an artificial intelligence (AI) algorithm based on a deep neural network trained with ECG data from over 200,000 Mayo Clinic patients.5

Despite the widespread popularity of smartwatches, consumer reports and clinical studies often have been contradictory in their claims about device accuracy. For example, one report claims that the AW4 has high accuracy.6 In contrast, some cardiologists have raised concerns about the Apple Watch’s inability to differentiate between long-term vs short-term arrhythmias, emphasizing the danger of presenting users with false positives.7 Many studies have demonstrated the high accuracy of the KM.8, 9, 10, 11 Yet, complaints regarding its practicality have been made, such as the need for wearers to be perfectly still when using the device and its inability to provide continuous monitoring.12

We aimed, therefore, to determine the practicality of incorporating digital health devices, specifically the AW4 and the KM, into contemporary medical practice, by assessing and comparing their accuracies in identifying heart rhythm and HR.

Methods

A total of 200 participants were recruited from the Toronto Heart Centre, a community-based cardiology clinic in downtown Toronto (Ontario, Canada), between January 2019 and December 2019, all of whom were scheduled for a new consultation or a routine follow-up visit.

The study team (C.L., C.L., C.M.C.) recruited potential participants, and those who signed the informed consent form underwent their usual scheduled 12-lead ECG. Immediately afterward, they were asked to sit down and were properly fitted with both the AW4 and the KM for the study duration. The AW4 was strapped onto the wrist of their nondominant hand, in the typical position of a wristwatch, ensuring that it rested snugly against their skin. The participants were instructed to use the ECG and HR functions on the AW4, taking 2 sets of 30-second readings for each mode. Next, the participants were instructed to hold the KM with both hands such that both their thumbs touched the sensors, taking 2 sets of 30-second readings. The first set of recordings was used to test for proper strap fit and to teach the patients how to use the device, and the second set was used for the study. The HR trackers were thoroughly cleaned and disinfected after each use. Approval was obtained from the Veritas institutional review board (protocol no. 16450-17), Montreal, Quebec, on October 21, 2019. A blinded review of the HR recordings was not employed, so that the patients could use the devices as naturally and realistically as possible.

Data collection

We obtained the patients’ demographic information, medical history, risk factors, current treatment record, and reasons for referral by interviewing the patients and reviewing their health records. We also noted patients’ awareness of the ability of the digital health devices to detect HR and heart rhythm. Patient identifiers included name, gender, date of birth, and medical record number. These identifiers were required to access patients’ clinical records for later review.

Statistical analysis

The 12-lead ECG recordings were interpreted by a cardiologist (C.M.C.) who was blinded to the reported findings of the AW4 and the KM. Whenever indicated, data were presented as a percentage of patients; AW4 and KM rhythm diagnoses were considered accurate if they matched the rhythm detected by the 12-lead ECG. HR readings were considered correct if they deviated a maximum of ± 5 beats per minute from the 12-lead ECG, taking into account variability in atrial fibrillation (AF). Rates were analyzed using χ2 tests to compare the devices, with a P value of < 0.05 accepted as statistically significant. Additionally, Cohen’s kappa coefficients were calculated for device accuracy in detecting heart rhythm, sinus rhythm (SR), and AF. Continuous data were analyzed using independent-samples t-tests. We used the statistical software SAS Enterprise Guide 6.1 for Windows (SAS Institute, Cary, NC).
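The accuracy criteria described above can be illustrated with a minimal sketch. The `Reading` structure, the function names, and the sample values are hypothetical and are not taken from the study data. The sketch also assumes that inconclusive device readings are excluded from the accuracy denominator, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional

HR_TOLERANCE_BPM = 5.0  # tolerance stated in the text: +/- 5 beats per minute


@dataclass
class Reading:
    """One device reading paired with the 12-lead ECG reference (hypothetical structure)."""
    reference_rhythm: str            # "SR" or "AF", read from the 12-lead ECG
    device_rhythm: Optional[str]     # "SR", "AF", or None when the device was inconclusive
    reference_hr: float              # beats per minute from the 12-lead ECG
    device_hr: Optional[float]       # beats per minute from the device, None if unavailable


def rhythm_is_accurate(r: Reading) -> Optional[bool]:
    """True or False when the device gave a diagnosis, None when it was inconclusive."""
    if r.device_rhythm is None:
        return None
    return r.device_rhythm == r.reference_rhythm


def hr_is_accurate(r: Reading, tolerance: float = HR_TOLERANCE_BPM) -> Optional[bool]:
    """True when the device HR is within the tolerance of the 12-lead ECG HR."""
    if r.device_hr is None:
        return None
    return abs(r.device_hr - r.reference_hr) <= tolerance


def accuracy_percent(flags: List[Optional[bool]]) -> float:
    """Percent correct among conclusive readings.

    Assumption: inconclusive readings (None) are left out of the denominator.
    """
    conclusive = [f for f in flags if f is not None]
    if not conclusive:
        return float("nan")
    return 100.0 * sum(conclusive) / len(conclusive)


if __name__ == "__main__":
    # Made-up readings for illustration only.
    readings = [
        Reading("SR", "SR", 72, 74),
        Reading("AF", "AF", 96, 104),
        Reading("AF", None, 88, None),
        Reading("SR", "AF", 65, 60),
    ]
    print(accuracy_percent([rhythm_is_accurate(r) for r in readings]))
    print(accuracy_percent([hr_is_accurate(r) for r in readings]))
```

If inconclusive readings were instead counted as errors, the denominator would be the full number of readings, and the reported percentages would be lower.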

Results

Demographic data

A total of 200 participants were recruited from the Toronto Heart Centre, of whom 41% (82) were women. The mean age was 65.6 ± 14.6 years (mean ± standard deviation), with the youngest participant being 26 years old and the oldest 94 years old. Fewer than half (41%) were aware of the AW4’s ability to record ECGs, and far fewer (3%) were aware of the KM’s ability to do so. Overall, 81% of participants (162) were in SR, and 19% (38) were in AF.

A Shapiro-Wilk test was performed and did not show evidence of non-normality (W = 0.987, P = 0.215). Parametric tests for comparisons were used based on this outcome. A summary of the demographic data for the 200 participants is provided in Table 1.

Table 1.

Participant demographics

Demographic variable                 Value
Age, y                               65.6 ± 14.6
Sex, male                            59.2
Hypertension                         49.6
Diabetes                             18.5
Dyslipidemia                         41.5
Coronary artery disease              3.7
Stroke/TIA                           0.7
Vascular disease                     0.7
Awareness of AW4’s ECG function      41
Awareness of KM’s ECG function       3

Values are %, unless otherwise indicated.

AW4, Apple Watch Series 4 (Apple, Inc, Cupertino, CA); ECG, electrocardiogram; KM, KardiaMobile (AliveCor, San Francisco, CA); TIA, transient ischemic attack.

The SR detection accuracy was 100% for the AW4, and 99% for the KM. No inconclusive SR readings occurred with the AW4, but 2 inconclusive readings occurred with the KM. The AF detection accuracy was 90.5% with the AW4, and 100% with the KM. The AW4 had 19 inconclusive AF readings, whereas the KM had none. Cohen’s kappa coefficients (k) for correctly identifying heart rhythm were 0.966 and 0.969 for the AW4 and KM, respectively.
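For readers unfamiliar with Cohen’s kappa, the following is a minimal sketch of how the coefficient can be computed from paired rhythm labels. The function name and the sample labels are hypothetical, and the sketch does not reproduce the SAS procedure used in the study.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(reference: Sequence[str], device: Sequence[str]) -> float:
    """Cohen's kappa for two label sequences of equal length.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected is the
    agreement expected by chance from the marginal label frequencies.
    """
    if len(reference) != len(device) or len(reference) == 0:
        raise ValueError("both sequences must be non-empty and of equal length")
    n = len(reference)
    p_observed = sum(r == d for r, d in zip(reference, device)) / n
    ref_counts = Counter(reference)
    dev_counts = Counter(device)
    labels = set(ref_counts) | set(dev_counts)
    p_expected = sum(ref_counts[label] * dev_counts[label] for label in labels) / (n * n)
    if p_expected == 1.0:
        # Degenerate case: both raters used a single identical label throughout.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Made-up labels: 12-lead ECG reference vs a device, one AF case missed.
    reference = ["SR"] * 8 + ["AF"] * 2
    device = ["SR"] * 8 + ["AF", "SR"]
    print(cohens_kappa(reference, device))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because most participants were in SR.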

SR HR accuracies for the devices were as follows: 96.5% for the AECG, 90.5% for the APPG, and 94% for the KM. There were 7 inconclusive SR HR readings for the AECG, 19 for the APPG, and 12 for the KM. AF HR accuracies were as follows: 87% for the AECG, 83% for the APPG, and 91% for the KM. A total of 26 inconclusive AF HR readings occurred for the AECG, 34 for the APPG, and 18 for the KM.
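The χ2 comparison of device accuracy rates named in the statistical analysis can be sketched as follows. The counts in the example are hypothetical, because the per-device numerators and denominators are not given in the text. The sketch uses the uncorrected Pearson statistic for a 2 × 2 table, which may differ from the options used in the study’s SAS analysis.

```python
import math
from typing import Tuple


def chi_square_2x2(correct_a: int, total_a: int,
                   correct_b: int, total_b: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) comparing two proportions.

    Returns the chi-square statistic and the P value for 1 degree of freedom.
    """
    table = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    n = total_a + total_b
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("chi-square is undefined when a row or column total is zero")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: device A correct in 45 of 50 readings, device B in 40 of 50.
    chi2, p = chi_square_2x2(45, 50, 40, 50)
    print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```

When expected cell counts are small, as they can be for the AF subgroup, an exact test is generally preferred over the χ2 approximation.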

Discussion

This study demonstrates that the AW4 and KM are both capable digital health devices that can reliably and accurately detect heart rhythm and HR. Both devices performed superbly in detecting SR (100% for the AW4 and 99% for the KM), as shown in Figure 1. With AF, the rhythm detection performance decreased with the AW4 (90.5%) but not with the KM (100%). For assessment of HR in SR, the ECG method was highly accurate with both the AW4 (96.5%) and the KM (94%), as shown in Figure 2. With the PPG method used by the AW4, HR measurements were slightly more variable (90.5%) than those with the ECG method. For patients with AF, HR detection variability was even higher, which may be attributable to the slight time differences between the ECG and the mobile device recordings (87% for the AECG, 83% for the APPG, and 91% for the KM).

Figure 1.

Bar graph of percent of rhythms correctly identified by the Apple Watch Series 4 (AW4) and the KardiaMobile (KM). Afib, atrial fibrillation.

Figure 2.

Bar graph of heart rate accuracies for sinus rhythm and atrial fibrillation (Afib) for each device method. ECG, electrocardiogram; PPG, photoplethysmography.

The heart rhythm results show that SR was detected with greater overall accuracy than was AF. Multiple studies have similarly noted decreased accuracy in digital health devices’ HR readings for patients with AF, leading particularly to underestimation of HR.13,14 A likely explanation for this discrepancy is that the regularly spaced QRS complexes of SR are easier to detect than the irregularly spaced QRS complexes of AF. The devices’ rhythm-detection algorithms can therefore classify the more stable SR data more easily. However, consideration of each device’s rhythm detection individually indicated that the AW4 was more accurate for SR detection.

In contrast, the KM was more accurate for AF detection, and this can be attributed to differences in the AI programs used by the 2 devices. Each program has characteristics that allow its respective device to achieve a higher degree of accuracy in a particular heart rhythm, due to differences such as ECG datasets.15 Another possible explanation for the AW4’s lower AF rhythm detection is that its AI program misinterpreted certain rhythm variations16; for example, in the case of 2 patients, the AW4 recorded AF that was later confirmed to be premature atrial complexes (PACs).
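The contrast between regular and irregular QRS intervals discussed above can be illustrated with a small sketch that derives HR and a simple irregularity measure from RR intervals. This is not the algorithm used by either device, whose AI programs are proprietary. The function names and the interval values are made up for illustration.

```python
import statistics
from typing import Sequence


def heart_rate_bpm(rr_intervals_s: Sequence[float]) -> float:
    """Mean heart rate in beats per minute from RR intervals given in seconds."""
    return 60.0 / statistics.mean(rr_intervals_s)


def rr_coefficient_of_variation(rr_intervals_s: Sequence[float]) -> float:
    """Sample standard deviation of the RR intervals divided by their mean (unitless)."""
    return statistics.stdev(rr_intervals_s) / statistics.mean(rr_intervals_s)


if __name__ == "__main__":
    # Made-up RR intervals in seconds: nearly constant spacing vs highly variable spacing.
    regular = [0.80, 0.82, 0.79, 0.81, 0.80, 0.78]
    irregular = [0.55, 0.95, 0.62, 1.10, 0.48, 0.87]
    for name, rr in (("regular", regular), ("irregular", irregular)):
        print(name, round(heart_rate_bpm(rr), 1), round(rr_coefficient_of_variation(rr), 3))
```

With irregular intervals, an HR estimate depends heavily on which beats fall within the recording window, which is one reason that sequential rather than simultaneous recordings can disagree more in AF.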

The devices’ HR results also show a higher accuracy for SR than for AF, as compared in Figures 3 and 4, respectively. A factor that may have contributed to the relatively lower accuracy of the AF patients’ HR readings was the difficulty of obtaining clean recordings from elderly patients.17 As reported in a study that examined the management of AF patients,18 most AF patients are elderly. These patients often presented with tremors, which created artifacts in the ECG recordings. Elderly female patients in particular had an improper strap fit for the AW4, as even the smallest available Apple Watch strap was too large for their wrists. Given that the AW4 relies on its sensors being pressed snugly against the wearer’s skin in order for the PPG and ECG methods to work properly, both methods for the AW4 would have decreased accuracy with an ill-fitting strap. Of note, although the newer multilead KM model could have provided better readings than previous versions, it was not available for purchase at the time this study was conducted. This device warrants further study.

Figure 3.

Scatter plot of measured heart rates vs true heart rates in patients with sinus rhythm. Solid line represents perfect accuracy (R2 = 1). bpm, beats per minute; ECG, electrocardiogram; PPG, photoplethysmography.

Figure 4.

Scatter plot of measured heart rates vs true heart rates in patients with atrial fibrillation. Solid line represents perfect accuracy (R2 = 1). bpm, beats per minute; ECG, electrocardiogram; PPG, photoplethysmography.

Examination of the 2 AW4 methods showed that the APPG function performed the worst when measuring HR. This finding was similar to results of previous studies,19,20 with PPG being less accurate than ECG, especially in situations requiring increased physical activity. Another drawback in the practicality of using PPG for HR recordings is that a decrease in temperature, such as exposure to cold weather, will affect the detection of heart complexes by making the fiducial point more difficult to locate.21 Lastly, PPG has been found to have greater inaccuracy for users with darker skin tones.22

An examination of patient demographic factors revealed that gender and age influenced device accuracy. As shown in Figure 5, the accuracy of the KM heart rhythm detection was high for both genders, with no significant difference between them. In contrast, the accuracy of the AW4 rhythm detection was slightly lower than that of the KM, and women experienced lower rates of rhythm detection than men.

Figure 5.

Bar graph of percent of rhythms correctly identified, by gender.

Further research

A possible approach to further investigation is to take ECG recordings throughout various physical activity intensities. Such data would provide valuable information for those seeking to measure and interpret their HR and rhythm recordings during and immediately after activities that are physiologically stressing, such as an exercise workout. Incorporation of the newer KM device could reveal whether the addition of leads would result in significant improvements in HR and rhythm detection. Lastly, sampling from a population that includes a greater number of younger patients would be beneficial, in order to be inclusive of a larger portion of the consumer market for these devices.

Limitations

Although the study has strengths, it is not without limitations. Due to the constraints of the clinical operation at the Toronto Heart Centre, the device recordings had to be taken in sequence; the 12-lead ECG was used first, followed by the 2 AW4 modes (AECG and APPG), and finally the KM. Ideally, these should be used simultaneously to achieve more comparable results. However, to limit the variation of HR, each of the recordings was taken once the patients returned to their resting state. All the readings were completed within 1-2 minutes of each other.

Another limitation was that the ECG recordings were taken only at rest, as both the AW4 and the KM gave suboptimal ECG recordings when participants moved their hands. Motion artifacts often caused recordings to fail; in such cases, an error message appeared, and the recording had to be repeated.

Conclusion

This study demonstrates that the AW4 and the KM can reliably and accurately detect heart rhythm and HR, supporting previous findings demonstrating the capacity of these digital health devices for use in patient screening and monitoring.23 Although the ECG function on the AW4 is more accurate than the PPG function, it does have greater limitations in terms of artifacts. Of note, the AW4 costs $519, and the KM costs $99 (both in Canadian dollars),24,25 yet the AW4 can be operated from the wrist alone, whereas the KM requires the additional use of a smartphone.

Acknowledgments

Funding Sources

The project is entirely self-funded, as the devices were purchased directly from the open market by the authors. The authors have no funding sources to declare.

Disclosures

None of the authors have collaborated with or are affiliated with Apple Inc. or AliveCor Inc. The authors have no conflicts of interest to disclose.

Footnotes

Ethics Statement: Approval was obtained from the Veritas institutional review board (protocol no. 16450-17), Montreal, Quebec, on October 21, 2019. The research reported has adhered to the relevant ethical guidelines.


References


Articles from CJC Open are provided here courtesy of Elsevier
