NIHPA Author Manuscripts. Author manuscript; available in PMC: 2023 Jan 1.
Published in final edited form as: Nat Med. 2022 Nov 14;28(12):2497–2503. doi: 10.1038/s41591-022-02053-1

Prospective evaluation of smartwatch-enabled detection of left ventricular dysfunction

Zachi I Attia 1, David M Harmon 1,2, Jennifer Dugan 1, Lukas Manka 3, Francisco Lopez-Jimenez 1, Amir Lerman 1, Konstantinos C Siontis 1, Peter A Noseworthy 1, Xiaoxi Yao 1,4, Eric W Klavetter 1, John D Halamka 5, Samuel J Asirvatham 1, Rita Khan 3, Rickey E Carter 6, Bradley C Leibovich 3,7, Paul A Friedman 1
PMCID: PMC9805528  NIHMSID: NIHMS1853235  PMID: 36376461

Abstract

Although artificial intelligence (AI) algorithms have been shown to be capable of identifying cardiac dysfunction, defined as ejection fraction (EF) ≤ 40%, from 12-lead electrocardiograms (ECGs), identification of cardiac dysfunction using the single-lead ECG of a smartwatch has yet to be tested. In the present study, a prospective study in which patients of Mayo Clinic were invited by email to download a Mayo Clinic iPhone application that sends watch ECGs to a secure data platform, we examined patient engagement with the study app and the diagnostic utility of the ECGs. We digitally enrolled 2,454 unique patients (mean age 53 ± 15 years, 56% female) from 46 US states and 11 countries, who sent 125,610 ECGs to the data platform between August 2021 and February 2022; 421 participants had at least one watch-classified sinus rhythm ECG within 30 d of an echocardiogram, of whom 16 (3.8%) had an EF ≤ 40%. The AI algorithm detected patients with low EF with an area under the curve of 0.885 (95% confidence interval 0.823–0.946) and 0.881 (0.815–0.947), using the mean prediction within a 30-d window or the closest ECG relative to the echocardiogram that determined the EF, respectively. These findings indicate that consumer watch ECGs, acquired in nonclinical environments, can be used to identify patients with cardiac dysfunction, a potentially life-threatening and often asymptomatic condition.


AI has enabled the standard 12-lead ECG to detect cardiovascular diseases not apparent to expert human readers1–3. Multiple independent groups have developed and validated AI-enhanced ECGs (AI-ECGs) to screen for conditions such as left ventricular systolic dysfunction (LVSD), valvular heart disease, hypertrophic cardiomyopathy (HCM), electrolyte abnormalities, silent arrhythmias not present at the time of signal acquisition and other conditions4–9. A consistent finding has been the ability of the AI-ECG to detect occult and impending disease.

Asymptomatic LVSD is present in 2% of the population (9% in those aged > 60 years) and confers a 4.6-fold increased risk of clinical heart failure and a 1.8-fold increase in all-cause mortality10–12. The AI-ECG algorithm that screens for LVSD has been found to perform well across global populations, and to be stable over time and robust with regard to race and sex13–16. A community-based, pragmatic, randomized trial that integrated the AI-ECG into routine clinical practice within primary care enrolled over 20,000 subjects in 8 months and found that the AI-ECG alert increased the new diagnosis of LVSD by > 30%17.

Previous studies of the AI-ECG to screen for LVSD have utilized 12-lead ECGs or been performed in clinical environments. We hypothesized that adaptation of the 12-lead AI-ECG to a single-lead consumer Apple Watch ECG (now accessible via Apple Health Kit) would allow massive scaling of this tool for screening and monitoring individuals in nonclinical environments. To test this hypothesis, we performed an analysis of watch ECGs acquired from a cohort of remotely enrolled and followed participants with linked health-care records.

Background characteristics

A total of 134,493 study invitations were emailed to existing patients who utilized the Mayo Clinic patient app between August 2021 and February 2022 (Extended Data Fig. 1). Of the survey respondents who indicated interest in the study (n = 5,177), 3,884 went on to provide digital consent and were enrolled into the study. Of these, 2,463 (63.5%) uploaded at least one ECG during the study period. Nine subjects were excluded due to an early software app malfunction that precluded data collection (subsequently remedied), leaving a final analysis set of 2,454 patients from 46 states and 11 countries who recorded and uploaded one or more Apple Watch ECGs (Fig. 1). The mean age was 53 years (s.d. ± 15 years, range 18–94 years). Most of the population was female (56%; n = 1,374) and white (88%; n = 2,160). Baseline characteristics of the study cohort are detailed in Table 1. More than 125,000 ECGs were recorded by this 2,454-patient cohort (Fig. 2) and 78.5% of these recordings were classified as sinus rhythm by the Apple Watch (n = 98,603). The remaining Apple Watch ECGs were classified as atrial fibrillation (AF; 5.1%) or inconclusive (16.4%).

Fig. 1 |. Patient enrollment in the remote digital study.

Fig. 1 |

The flow sheet (left) is a CONSORT diagram summarizing patient enrollment. The map (right) depicts the geographic distribution of enrolled patients within the United States.

Table 1 |.

Sample characteristics

Overall (n = 2,454)
Age (years) 54 (18, 94)
Gender (female) 1,364 (55.6%)
Race
 White 2,237 (91.2%)
 Black 70 (2.9%)
 Other 147 (6.0%)
Ethnicity (Hispanic or Latino) 95 (3.9%)
CHF 218 (8.9%)
Peripheral vascular disease 349 (14.2%)
Cerebrovascular disease 187 (7.6%)
Renal disease 201 (8.2%)
Chronic pulmonary disease 500 (20.4%)
Connective tissue disease–rheumatic disease 136 (5.5%)
Myocardial infarction 113 (4.6%)
Diabetes 271 (11.0%)
Hypertension, combined 915 (37.3%)

Fig. 2 |. Study participant engagement with the customized Mayo Clinic iPhone application.

Fig. 2 |

a, Number of transmitted ECGs versus time. The soft launch included ‘friends and family’ to assess the system function (first arrow); batch enrollment commenced as indicated by the second arrow (official launch). b, Number of days from first app use to last app use. c, Unique daily uses per patient (multiple uses on the same day counted as single use). d, Normalized length of time that individuals used the app. This calculation takes into account the varying amount of time during the study that each subject could use the app because those enrolled later in the study course had less opportunity to use it.

Patient engagement

During the study period, 92% (n = 2,258) of patients used the customized ECG application more than once and 50% (n = 1,227) used it more than five times. On average, patients used the app 2.1 times per month, and use correlated positively with patient age. Each patient contributed 51 ECGs on average (median: 25; interquartile range: 12–54) during the study period, with an average of 7.8 daily uses (Fig. 2).

ECGs from Apple Watch to clinical ECG dashboard

After the creation of our Mayo Clinic ECG study app, patient-recorded Apple Watch ECGs were transmitted to their medical record in our ECG dashboard. This dashboard, paired with data from the electronic medical record (EMR), now features a ‘Mobile ECGs tab’ where patients’ recorded Apple Watch ECGs are located after upload (Extended Data Fig. 2). From here, providers can review these results in real time to assist with patient care (Extended Data Fig. 3). During our study period, 860 different providers used the ECG dashboard to search for 1,743 patients, of whom 366 had an Apple Watch ECG uploaded from the present study.

Detection of LVSD using Apple Watch

A total of 421 unique patients were identified with a recorded sinus rhythm Apple Watch ECG and transthoracic echocardiography (TTE) within 30 d of each other. Of these patients, 16 (3.8%) had an EF ≤ 40%. Patient demographics and comorbidities of this subgroup are described in detail in Table 2. Application of the retrained single-lead AI algorithm to detect LVSD yielded an area under the curve (AUC; 95% confidence interval (CI)) of 0.885 (0.823–0.946) when using the single nearest ECG to TTE. The diagnostic performance using both the mean model predictions and the single closest model predictions was similarly robust (Extended Data Table 1). Using the optimal threshold (0.67), the sensitivity and specificity were 68.8% (11 out of 16; 95% CI 41.3–89.0%) and 83.7% (339 out of 405; 79.7–87.2%), respectively, with the mean prediction. Using the single closest prediction, these estimates were 75.0% (12 out of 16; 47.6–92.7%) and 78.5% (318 out of 405; 74.2–82.4%), respectively. As shown in Extended Data Table 1, the model performance in terms of observed sensitivity and specificity varied with the cut value: selecting a cut value as low as 0.6 resulted in 87.5% (61.7–98.4%) sensitivity and 80.7% (76.6–84.5%) specificity with the mean model prediction. Review of clinical records found that 12 of the 16 patients had minimal or no symptoms of LVSD (stage A/B; Extended Data Table 1).
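As an illustration of how the reported operating points follow from the model outputs, the calculation of sensitivity and specificity at a fixed cut value (0.67 above) can be sketched as follows. This is a minimal sketch with hypothetical names, not the study's analysis code:

```python
# Illustrative calculation of sensitivity and specificity for a binary
# low-EF screen at a fixed threshold. scores: one model probability per
# patient; labels: True if the reference test showed EF <= 40%.

def screen_metrics(scores, labels, threshold=0.67):
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    sensitivity = tp / (tp + fn)  # fraction of low-EF patients flagged
    specificity = tn / (tn + fp)  # fraction of normal-EF patients cleared
    return sensitivity, specificity
```

Lowering the threshold, as in the 0.6 sensitivity analysis above, trades specificity for sensitivity in exactly this calculation.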

Table 2 |.

Characteristics of the LVSD subgroup

LVEF ≤ 40% (n = 16) LVEF > 40% (n = 405) All patients (n = 421) P valuea
Age (years) 66 (36, 78) 61 (18, 94) 61 (18, 94) 0.23
Gender (female) 8 (50.0%) 187 (46.2%) 195 (46.3%) 0.76
Race 0.36
 White 15 (93.8%) 379 (93.6%) 394 (93.6%)
 Black 1 (6.2%) 8 (2.0%) 9 (2.1%)
 Other 0 (0.0%) 18 (4.4%) 18 (4.3%)
Ethnicity (Hispanic or Latino) 2 (12.5%) 8 (2.0%) 10 (2.4%) 0.007
Heart rate (beats min−1) 69 (57, 89) 72 (51, 113) 72 (51, 113) 0.88
Left ventricular EF (LVEF) (%) 37 (15, 40) 61 (41, 77) 60 (15, 77) <0.001
CHF 16 (100.0%) 100 (24.7%) 116 (27.6%) <0.001
Peripheral vascular disease 14 (87.5%) 143 (35.3%) 157 (37.3%) <0.001
Cerebrovascular disease 1 (6.2%) 55 (13.6%) 56 (13.3%) 0.40
Renal disease 4 (25.0%) 55 (13.6%) 59 (14.0%) 0.20
Chronic pulmonary disease 5 (31.2%) 90 (22.2%) 95 (22.6%) 0.40
Connective tissue disease–rheumatic disease 3 (18.8%) 28 (6.9%) 31 (7.4%) 0.075
Myocardial infarction 6 (37.5%) 45 (11.1%) 51 (12.1%) 0.002
Diabetes 3 (18.8%) 60 (14.8%) 63 (15.0%) 0.66
Hypertension, combined 9 (56.2%) 229 (56.5%) 238 (56.5%) 0.98
a

P < 0.05 denoted in bold.

A separate analysis that included watch ECGs acquired in either normal sinus rhythm (NSR) or AF added 6 patients to the cohort in whom all watch ECGs were acquired during AF (all others had both NSR and AF watch ECGs), leading to a cohort of 427 patients. Of the 427, 16 were positive for low EF. In this expanded analysis, which removed the need to filter ECGs by rhythm, the AUC was 0.873 for both the closest watch ECG score (95% CI 0.805–0.941; sensitivity: 68.8% (11 out of 16; 41.3–89.0%), specificity: 83.2% (342 out of 411; 79.2–86.7%)) and the average watch score (0.801–0.945; sensitivity 68.8% (11 out of 16; 41.3–89.0%) and specificity 77.6% (342 out of 411; 79.2–86.7%)).

When using the 12-lead AI-ECG as the reference test, 38 (5.4%) of the 701 had LVSD. The AUC using the average watch AI-ECG score was 0.900 (95% CI 0.854–0.947), with sensitivity of 76.3% (29 out of 38; 59.8–88.6%) and specificity of 87.2% (578 out of 663; 84.4–89.6%) (Fig. 3). When using the temporally closest watch ECG score, the AUC was 0.872 (0.810–0.934) with sensitivity 76.3% (59.8–88.6%) and specificity 83.9% (80.8–86.6%) (Supplementary Data Table 1).

Fig. 3 |. Assessment of EF using the watch AI-ECG.

Fig. 3 |

a, Distribution of echocardiogram-derived EFs among individuals with a watch ECG acquired within 30 d of a clinically ordered echocardiogram. b, The ROC curves of the watch AI-ECG for determining LVSD (EF ≤ 40%) for single (AUC 0.88) and averaged (AUC 0.89) Apple Watch ECG recordings.

Temporal trends analysis of the AI scores

When the algorithm was applied to all patients who provided ECGs classified as sinus rhythm, 50% (1,226 out of 2,442) had at least one ECG that exceeded the model threshold. After smoothing the model predictions with a 5-d moving average, the number of positive screens dropped to 477 (20%). Patients with at least one positive screen tended to be older, to test more frequently and to have more comorbidities (Extended Data Table 2).
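The effect of the 5-d moving-average smoothing can be illustrated with a simplified sketch (hypothetical names; the actual analysis smoothed scores over calendar days): a patient is flagged only when the rolling mean of daily AI scores crosses the threshold, which suppresses isolated high single-ECG predictions.

```python
# Simplified sketch: flag a patient as a positive screen only if a 5-day
# rolling mean of their daily AI scores reaches the threshold.
# daily_scores: chronologically ordered mean score per day with data.

def smoothed_positive(daily_scores, threshold=0.67, window=5):
    for i in range(len(daily_scores) - window + 1):
        if sum(daily_scores[i:i + window]) / window >= threshold:
            return True
    return False
```

A single spike (for example, one score of 0.9 among otherwise low days) no longer triggers a positive screen, consistent with the drop from 1,226 to 477 flagged patients described above.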

Discussion

In this decentralized, prospective, pragmatic study, a widely used consumer watch securely transmitted ECG information from nonmedical environments to a medical center using a downloaded app enabling AI signal analysis to screen for LVSD, a potentially life-threatening and frequently asymptomatic disease18,19. Furthermore, the feasibility of the democratized collection of ECGs for research purposes was established by means of electronic consent and specialized phone-based applications. The built-in notification system within the app allowed for continued collection of ECGs over the study period and a high level of engagement.

As a secondary objective, the utility of using the remotely collected ECGs was evaluated by developing and internally validating a new AI-ECG algorithm. The performance of the AI-ECG watch test for LVSD (AUC 0.88) compares favorably with other commonly used screening tests, including mammography for breast cancer (AUC 0.67–0.84), cervical cytology for cervical cancer (AUC near 0.7) and B-type natriuretic peptide for heart failure (AUC 0.6–0.7)20–22. In contrast to previous studies of wearable or portable ECG recording devices for detecting arrhythmias such as AF, for which an ECG itself is the gold standard test23, the present study implemented an AI analysis of watch ECG signals to identify a condition typically detected with expensive, technology-intensive imaging studies such as echocardiograms, computed tomography scans or magnetic resonance images. The ability to screen for LVSD at home in 30 s may have important health-care implications for many at-risk individuals, including those with hypertension, diabetes and coronary artery disease, and those undergoing chemotherapy24,25. Numerous prospective randomized trials and professional society guidelines have demonstrated that pharmacological and device therapies for LVSD prevent morbidity and mortality18,19,26,27.

Using digital tools, we recruited a geographically and age-diverse population from 46 states and 11 countries, highlighting the capabilities of remote recruitment. Subjects were engaged in the trial, with > 125,000 ECGs collected during 5 months from approximately 2,500 individuals. The high level of engagement may reflect enrollment from a cohort of watch-owning Mayo Clinic patients (as opposed to consumers). Although enrolled subjects were informed that their transmitted ECGs would not undergo automatic review, the availability of Apple Watch ECG tracings to clinicians via an EMR-linked ECG dashboard may have motivated participation (Extended Data Fig. 4). In the Apple Heart study23, > 400,000 watch owners were enrolled, of whom 2,161 received an irregular heart rate alert (0.52% of the total). The vast majority (79%) of recipients of an initial notice of an irregular heartbeat did not complete the study, and 3,070 subjects with a new diagnosis of AF at the end of the study were never notified by the watch, highlighting both the opportunities and the challenges of consumer-based enrollment. Similarly, the Huawei watch study engaged 246,541 individuals and used photoplethysmography monitoring to identify 227 individuals with confirmed AF28. Subjects were entered into a program of integrated AF management, resulting in anticoagulation in 80%.

Previous studies have demonstrated the ability of the AI-ECG to detect numerous conditions, including LVSD, valvular heart disease, amyloid heart disease and noncardiac conditions such as hyperkalemia and cirrhosis, among other conditions4,5,7,29,30. Importantly, these have predominantly utilized standardized 12-lead ECGs acquired in medical environments, typically in a supine position with skin preparation to ensure signal quality. Biological factors such as skin preparation, body position and physical activity are known to affect ECG signal quality and the diagnostic performance of human and computer interpretation31–35. Signal-related factors, including sampling rate and filtering, also impact interpretation. In the present study, we demonstrated the ability to retrain a convolutional neural network developed using medical-grade data to function effectively using wearable ECG data. An important finding was the ease of acquiring ECGs, so that multiple samples can be obtained and the higher-quality signal selected for analysis. An automated filter selected clean NSR tracings, increasing the AUC to 0.88, versus arbitrarily selecting the ECG closest temporally to the echocardiogram (0.85). Given the ease of watch ECG acquisition, for some clinical use cases multiple recordings may be acquired to generate test output. In addition, signal conditioning for the watch ECG differed from that of the standard 12-lead ECG model, and the optimal network threshold differed, underscoring the need to consider signal source input requirements before application of the AI model to ensure robust test output.

Congestive heart failure (CHF) affects more than 5 million people and consumes over US$30 billion in health-care expenditure in the United States alone36,37. The incidence of CHF is higher among African–Americans, related to differences in the prevalence of hypertension and diabetes, as well as socioeconomic status38. The relatively low cost of a watch and smartphone compared with medical ECG acquisition equipment, and the ease of deployment, may help address health-care disparities. Although an ECG watch may be expensive for an individual consumer, placing one in a clinic or other screening environment as a shared resource may permit early disease detection. Previous work demonstrated the robustness of the AI-ECG low EF algorithm across sex, race and ethnicity13–16.

The potential influx of data from consumer devices may further tax an overburdened health-care system. Currently, Apple Watch users record an ECG, manually download a PDF of the individual recording and either upload the file to their provider via an EMR portal or, more commonly, email the file, resulting in disjointed, difficult-to-locate data and added health-care clerical burden. In the present study, we used an iPhone app to automatically and securely send all ECGs to the Mayo Clinic secure unified data platform (UDP). From the UDP, all watch ECGs are made available to an EMR-linked ECG dashboard that integrates, streamlines and organizes ECG viewing with a timeline, a summary table that includes the watch-classified rhythm and the 30-s recorded ECG (Extended Data Fig. 2). It also enables ECG comparisons and visual operations (zoom, measure) in a dynamic web-based environment for review. This process facilitates provider review of arrhythmia data within the patient chart and patient-to-provider/provider-to-provider communication in the remote, outpatient environment (Extended Data Fig. 3). Furthermore, as the app transmits all recorded watch ECGs, including those labeled ‘inconclusive’, there is an opportunity for physician overread and identification of other electrophysiological phenomena (for example, premature ventricular contractions) that may trigger ‘inconclusive’ Apple Watch ECG results and warrant further investigation on manual ECG review39.

Our work is best understood in the context of its limitations. Although use of patient-owned signal acquisition hardware permits relatively inexpensive, geographically diverse screening, the cost of the devices may exacerbate the digital divide and health-care inequities. Whether this might be mitigated by clinic-based use of ECG watches is not known. The gender identity, racial and ethnic composition of the present study cohort was limited, although a previous analysis of the 12-lead ECGs demonstrated effective test performance in men and women, and across various ethnicities and races16. The current model is designed to identify patients with EF ≤ 40%; patients with heart failure with preserved EF or mildly reduced EF are not effectively detected. However, identification of patients with EF ≤ 40% has important therapeutic implications in accordance with multiple professional society and national guidelines. We recruited a geographically diverse group of participants by using a fully digital, site-less, pragmatic study design. However, although ‘site-less’, we included only patients who had been in contact with our medical system and had downloaded the Mayo Clinic app. Although the study design ensured that only patients who had never been used in the development of the AI-ECG were included in the evaluation of the model’s performance (that is, mimicking an external validation study with data not available at the time of model training)40, additional external validation of these preliminary findings will be required. Such studies will be needed to support the generalizability of these findings and to enable regulatory approval. Given the small number of patients tested during AF, no meaningful conclusion can be made about model performance during AF.

False-positive results may lead to anxiety and increased cost through cardiac imaging overutilization. However, a recent prospective trial using AI-enhanced 12-lead ECGs to screen for LVSD found more effective imaging utilization by increasing the yield of important findings in subjects referred based on AI-ECG guidance17. Moreover, false-positive 12-lead AI-ECGs for low EF identify a high-risk group with a fivefold increased risk of developing heart failure in the future. We did not perform a cost-effectiveness analysis in the present study; however, previous work using simulated, universal screening with the 12-lead AI-ECG algorithm demonstrated relative cost-effectiveness41. We did not assess the incremental value of this model on top of models based on traditional clinical risk factors; such information would be important in assessing the added clinical utility of this approach. Although patients are able to report symptoms in association with an Apple Watch ECG recording, we did not provide instructions to patients regarding use of this feature and we did not analyze these data.

In summary, the present study applied AI to an Apple Watch ECG acquired in nonclinical environments, demonstrating its utility for effective identification of left ventricular dysfunction, a potentially life-threatening and often asymptomatic disease. In contrast to consumer-based trials, clinic-enrolled patients remained highly engaged. This suggests an opportunity to use AI in remote care and to clinically validate AI-ECG models in geographically dispersed populations at lower cost using patients’ own devices, in a potentially massively scalable manner that could improve quality of life, lower cost through earlier detection and strengthen patient engagement.

Online content

Any methods, additional references, Nature Research reporting summaries, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41591-022-02053-1.

Methods

The Institutional Review Board at the Mayo Clinic in Rochester, MN, USA approved this prospective, decentralized, pragmatic study. Mayo Clinic investigators designed the study, developed associated software and carried out the protocol with no financial or technical support from Apple.

Subject enrollment

Enrollment was entirely digital and remote. Patients who had previously downloaded the Mayo Clinic app to their personal Apple iPhone (iOS 14.0 or above) were sent an email survey (in batches) between August 2021 and February 2022 (n = 134,493; Extended Data Fig. 1). Recipients were asked whether they had an Apple Watch capable of recording ECGs (that is, Series 4 or later) and, if so, were invited to participate in the study. Interested individuals were sent a digital consent form and a link to download the Mayo Clinic ECG Study app, created by the Mayo Clinic Center for Digital Health (CDH). Once uploaded, watch ECGs were made available in the secure Mayo Clinic ECG dashboard for facilitated clinician review (Extended Data Fig. 2). Patients independently acquired ECGs from their watches outside medical environments.

Mayo Clinic ECG study app

To facilitate the collection of ECGs from participants, the Mayo Clinic ECG study mobile app was developed. The app uses Apple Research Kit elements to guide patients through research onboarding and to outline data usage and privacy policies. The app is integrated with Mayo Clinic’s patient identity management system and single sign-on (SSO), so that users can log in using their Mayo Clinic username and password. During user login, the app also checks the Mayo Clinic Participant Tracking System (PTrax) to confirm that the user has consented to participate in the ECG study. If informed consent is not documented, the user is unable to log in to the app and upload ECGs. Users are asked to allow the app to access Apple Health ECG data using the Apple Health Kit. When access is granted, the app retrieves all stored ECGs from Apple and transfers them to a Mayo Clinic database using a custom API. Each time the app is subsequently opened, it transfers any ECGs newly stored in Apple Health since the last transfer. Users are reminded via push notifications to take an Apple Watch ECG and open the ECG study app at least every 2 weeks. Users can end their participation in the study at any time by removing the ECG study app from their device.
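The incremental transfer step described above (uploading only ECGs stored since the last transfer) can be sketched as follows. This is an illustrative Python sketch with hypothetical names; the actual app is an iPhone application built on Apple Research Kit and Health Kit:

```python
from datetime import datetime

# Sketch of incremental sync: keep a persisted timestamp of the last
# successful transfer and, on each app open, forward only ECGs recorded
# after it. All names and the data model are hypothetical.

class EcgSyncState:
    def __init__(self):
        self.last_transfer = datetime.min  # persisted across app launches

    def new_ecgs(self, stored_ecgs):
        """stored_ecgs: list of (recorded_at, payload) from the health store.
        Returns only ECGs not yet transferred; these would then be POSTed
        to the secure data platform via the custom API."""
        fresh = [e for e in stored_ecgs if e[0] > self.last_transfer]
        if fresh:
            self.last_transfer = max(e[0] for e in fresh)
        return fresh
```

Keying the sync on a high-water-mark timestamp means re-opening the app never re-uploads previously transferred recordings.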

AI-ECG dashboard

We previously developed a web dashboard that connects to the EMR and allows clinicians to view all 12-lead ECGs that were recorded at the Mayo Clinic and are saved in the GE MUSE system, with their AI scores over time for six published AI-ECG models, including detection of LVSD, amyloidosis, HCM, aortic stenosis, patient ECG age and ECG sex (Extended Data Fig. 2)4,6,29,42,43. As part of this project, we extended the ECG dashboard to allow clinicians to review Apple Watch ECG tracings that were recorded by the patients and were uploaded by our study app (Extended Data Fig. 2). The ECGs are shown with the Apple Watch-determined rhythm and exact date and time of the recording. All watch tracings are saved in a secure Mayo server of the UDP. Providers were not notified when patients sent recordings to the ECG dashboard and there was no specific action required or recommended for abnormal rhythm findings as a part of the present study.

Assessment of LVSD

A second objective of the present study was to evaluate whether the watch-obtained ECGs could be used for the determination of LVSD. We used two confirmatory tests to determine whether the watch AI-ECG could identify LVSD: EF measured from clinically indicated echocardiograms performed at the Mayo Clinic and EF assessed from AI analysis of clinically ordered 12-lead ECGs (because the 12-lead AI-ECG has been validated internationally and in a pragmatic prospective trial and found to be a reliable screening test for LVSD)4,17. Watch ECGs recorded during sinus rhythm (determined by the Apple Watch algorithm) from patients who underwent clinical TTE or 12-lead ECG at the Mayo Clinic within 30 d of the watch ECG were included in the LVSD analysis (n = 421 unique patients with TTE and 700 with 12-lead AI-ECG, of whom 377 also had a TTE). Two analyses were conducted to assess the algorithm’s performance. The first used only the single watch ECG temporally closest to the confirmatory test; the second averaged all watch ECGs performed within 30 d of the index confirmatory test. None of the patients in this analysis contributed ECGs for the initial derivation of the algorithm and all ECGs in the analysis were acquired after the model had been fully developed.

All echocardiograms included one or more EF measurements. For studies with more than one LVEF measurement, we used a heuristic technique to select the most accurate assessment4. The preferred measurement was (from most to least accurate): three-dimensional (3-D) echocardiography, biplane imaging using the Simpson method, 2-D methods, M-mode measurements and, in the absence of any of the preceding, the reported visually estimated EF. Determination of LVSD using AI analysis of the 12-lead ECG has been previously described4.

AI-ECG algorithm to detect LVSD

In the present study, a previously developed AI-ECG algorithm to detect LVSD was adapted to process the watch ECG4. Briefly, a convolutional neural network analyzed a matrix comprising 10-s, 12-lead ECG data resampled to 500 Hz (5,000 × 12 values per ECG). Each matrix row contained the raw amplitude for each of the 12 leads at that timestamp. The model used seven convolutional blocks, each with a convolutional layer, batch normalization, a ReLU activation function and a max-pooling layer, followed by two fully connected blocks4. The model was developed using 44,959 unique patients and was tested on 52,870 patients not used to develop the model. In the original testing cohort, the model detected an EF ≤ 35% with an AUC of 0.93 and an EF ≤ 40% with an AUC of 0.91.
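The block structure described above can be sketched as follows. This is a minimal PyTorch sketch for illustration only: the channel counts and kernel sizes are placeholders, not the published hyperparameters.

```python
import torch
import torch.nn as nn

# Sketch of the network shape described above: seven convolutional blocks
# (Conv1d -> BatchNorm -> ReLU -> MaxPool) over a 12 x 5,000 input,
# followed by two fully connected blocks ending in a probability.

def conv_block(c_in, c_out, k=5):
    return nn.Sequential(
        nn.Conv1d(c_in, c_out, kernel_size=k, padding=k // 2),
        nn.BatchNorm1d(c_out),
        nn.ReLU(),
        nn.MaxPool1d(2),
    )

class LowEfNet(nn.Module):
    def __init__(self):
        super().__init__()
        chans = [12, 16, 16, 32, 32, 64, 64, 64]  # placeholder widths
        self.features = nn.Sequential(
            *[conv_block(chans[i], chans[i + 1]) for i in range(7)]
        )
        # 5,000 samples halved seven times -> 39 time steps remain
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 39, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x):  # x: (batch, 12, 5000)
        return self.head(self.features(x))
```

Each max-pool halves the temporal dimension, so the 10-s, 500-Hz input is reduced from 5,000 to 39 time steps before the fully connected head.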

As the original model was trained and validated on 12-lead ECGs, it can only be tested on these signals. To allow the model to work on the Apple Watch ECGs, we created a new model based on the same derivation cohort4, but with certain adaptations: (1) the model used only a single lead, utilizing lead I of the 12-lead ECG because it is the most biologically similar to the watch ECG recordings; (2) the ECGs were filtered using a 4-pole Butterworth bandpass filter that kept only the information between 0.2 Hz and 25 Hz, to mimic the morphology of the Apple Watch tracing and force the network not to learn high-frequency features that cannot be used in the filtered Apple tracing (Extended Data Fig. 4); (3) because mobile form factors often produce artifacts and baseline wander, we elected to use the median beat, which is less sensitive to these artifacts; and (4) similar to our earlier work, when retraining the network we used a threshold of EF ≤ 40% per the definition of LVSD in the international guidelines18,19. The original derivation cohort was subsetted to include only records currently authorized for research purposes and to exclude any records of patients who participated in the prospective Apple Watch study.
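Preprocessing steps (2) and (3) can be sketched as follows, assuming SciPy for the Butterworth filter. Beat detection is stubbed out here; in practice the beat onsets would come from a QRS detector.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Sketch of the single-lead preprocessing described above: a 4-pole
# Butterworth bandpass (0.2-25 Hz) followed by a median beat computed
# across aligned beat windows. Function names are illustrative.

def bandpass(signal, fs=500.0, low=0.2, high=25.0, order=4):
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, signal)  # zero-phase filtering

def median_beat(signal, beat_starts, beat_len):
    """Stack fixed-length windows at each beat onset and take the
    sample-wise median, which suppresses motion artifacts and drift."""
    beats = np.stack([signal[s:s + beat_len] for s in beat_starts])
    return np.median(beats, axis=0)
```

The sample-wise median across beats is what makes the representation robust to the transient artifacts typical of wrist-acquired signals.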

Quantification of patient engagement

Measuring patient engagement with the Mayo Clinic ECG study app was an objective of the study. Three metrics were developed to quantify the engagement. The first was the number of times each subject used the app to upload watch ECGs during the study period. The second was the number of ECGs uploaded on average per subject during the study period. Due to batch enrollment, not all subjects were in the study for the same duration. Thus, the third metric was the normalized time of use (app engagement) per subject defined as:

Normalized time = (Date(last upload) − Date(first upload)) / (Date(study end) − Date(first upload)).
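Computed directly from the dates, the metric is the fraction of each subject's available study window (first upload to study end) during which they were still uploading ECGs. A minimal sketch:

```python
from datetime import date

# Normalized engagement time, as defined above: days from first to last
# upload divided by days from first upload to the end of the study.

def normalized_time(first_upload, last_upload, study_end):
    available = (study_end - first_upload).days
    used = (last_upload - first_upload).days
    return used / available if available > 0 else 0.0
```

A value near 1 indicates a subject who kept uploading until the study ended, regardless of when they enrolled.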

Diagnostic utility of watch-collected ECGs

As a secondary objective, the diagnostic performance of the retrained LVSD detection algorithm applied to the Apple Watch ECGs was evaluated. In the LVSD subgroup analysis, receiver operating characteristic (ROC) curves were generated to evaluate the performance of the AI-ECG algorithm for LVSD on this single-lead device, for both the single ECG recording nearest to the TTE date and a filtered median-beat ECG sample as described above. To arrive at binary test predictions, we used the testing set from the derivation cohort (n = 50,654) to find the optimal threshold based on Youden’s J-index and, separately, on maximization of the sum of sensitivity and specificity. A total of 500 bootstrap replicates was generated, which resulted in median estimates for the optimal threshold of 0.66 and 0.68; the final threshold was selected to be 0.67. A model output ≥ 0.67 was therefore treated as a positive AI screen in the primary summary of performance. Sensitivity analyses examining diagnostic performance over a range of possible thresholds were also generated.
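The threshold-selection step can be sketched as follows (hypothetical helper names; the published analysis used the derivation testing set and 500 bootstrap replicates to arrive at 0.67):

```python
import numpy as np

# Sketch of threshold selection: pick the cut value maximizing Youden's J
# (sensitivity + specificity - 1), then take the median optimum over
# bootstrap resamples for stability.

def youden_threshold(scores, labels):
    scores, labels = np.asarray(scores), np.asarray(labels, dtype=bool)
    best_t, best_j = 0.0, -1.0
    for t in np.unique(scores):  # candidate cuts at observed scores
        sens = np.mean(scores[labels] >= t)
        spec = np.mean(scores[~labels] < t)
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t

def bootstrap_threshold(scores, labels, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    n = len(scores)
    ts = [youden_threshold(np.asarray(scores)[idx], np.asarray(labels)[idx])
          for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    return float(np.median(ts))
```

Taking the median over resamples guards against a single optimistic cut point chosen on one finite sample.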

Sex and gender analyses

The study protocol did not prespecify sex- or gender-based analyses. Sex at birth and gender were not ascertained directly from patients during recruitment. Gender, as previously reported in the EMR, was available through electronic chart review for the study. The EMR is currently being updated to capture both sex at birth and gender; however, these data will not have been verified by all patients included in the present study. To minimize misrepresentation of sex- and gender-based results, only sample frequencies of gender, as currently available in the EMR, are provided.

Statistical considerations

By design, the study gathered limited data directly from the participants. To characterize the consented patients who provided evaluable ECGs, the EMR was electronically searched to provide a range of comorbidities. A set of established International Classification of Disease (ICD)-9/ICD-10 codes was cross-referenced with the patient records to establish an indicator for the presence of each comorbidity on or before 28 February 2022. Analyses of these data were descriptive and not considered to be definitive diagnoses suitable for detailed cohort identification. Diagnostic performance was summarized using standard measures (for example, the ROC AUC, sensitivity and specificity). CIs (DeLong method for AUC; exact intervals for sensitivity and specificity) were used to summarize the estimated statistical precision in the data. A unique challenge for consumer-based ECG acquisition is the ability of participants to take many ECGs within short periods of time. To address this in the statistical analysis, two approaches were used: first, the single ECG most temporally related to the echocardiogram that determined the LVEF was considered; second, the mean of the model predictions for all ECGs acquired in the 30 d before and after the echocardiogram was used.
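The two ECG-echocardiogram pairing strategies can be sketched with pandas. The data below are hypothetical and the column and variable names are illustrative, not the study's.

```python
import pandas as pd

# Hypothetical per-patient ECG predictions and an echo date.
ecgs = pd.DataFrame({
    "date": pd.to_datetime(["2021-10-01", "2021-10-20", "2021-11-15", "2022-01-05"]),
    "pred": [0.30, 0.72, 0.65, 0.40],
})
echo_date = pd.Timestamp("2021-11-01")

# 1) The single most temporally related ECG.
closest = ecgs.loc[(ecgs["date"] - echo_date).abs().idxmin(), "pred"]

# 2) The mean prediction over the +/- 30-day window around the echo.
window = ecgs[(ecgs["date"] - echo_date).abs() <= pd.Timedelta(days=30)]
mean_pred = window["pred"].mean()
```

Here the closest-ECG approach keeps only the 20 October recording, while the window approach averages the two recordings that fall within 30 d of the echocardiogram.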

A temporal trend analysis was conducted to mimic how the model would have performed had the algorithm been running at the time of ECG acquisition. Two levels of summary were conducted: first, the 0.67 threshold was applied to all ECGs that were in sinus rhythm (that is, excluding inconclusive readings and readings reporting AF). Although such an approach may yield the highest sensitivity, it would be expected to have an unacceptably high false-positive rate. To dampen the false-positive rate, a moving-average smoother was applied to the summarized predictions. First, the daily maximum model score was selected. Then, a rolling mean of these maximum daily scores was computed over the five most recent days with an ECG recording (consecutive days were not required). If this rolling mean exceeded the threshold of 0.67, the participant was considered to have had a positive AI screen.

Data analysis was conducted using Python v.3.7.6 and R v.4.0.3. Threshold selection was supported by analysis using the R package cutpointr44.

Extended Data

Extended Data Fig. 1 | Patient study invitations by month.

Batch invitations sent to Mayo Clinic app patients on a monthly basis.

Extended Data Fig. 2 | EMR-integrated AI dashboard with Apple Watch ECG data.

Panel A shows the 12-lead ECGs and AI-derived scores from multiple AI-ECG models; panel B shows the watch ECG tracings from the same patient.

Extended Data Fig. 3 | Morphological differences between the Apple Watch ECG and lead I of the 12-lead ECG.

12-lead ECG recording versus post-processed Apple Watch ECG. Substantial high-frequency signal loss removed some features visible on the standard 12-lead ECG, such as pacing spikes, which are absent from the Apple Watch tracing.

Extended Data Fig. 4 | Patient-provider interaction using the AI dashboard.

Case example of a patient with new atrial fibrillation detected on an Apple Watch ECG, without a prior clinical history of atrial fibrillation.

Extended Data Table 1 |.

Clinical characteristics of patients with ventricular dysfunction

| HF acuity (acute/chronic) | Primary symptom | LVEF | Stage | Underlying/suspected diagnosis | Comments |
| --- | --- | --- | --- | --- | --- |
| chronic | DOE with stairs | 40 | B/C | PVC-related CM | |
| chronic | ambulating without symptoms | 40 | B | dilated CM | |
| chronic | no symptoms | 40 | B | dilated CM | |
| unknown | asymptomatic | 40 | A/B | tachyarrhythmia-induced CM | recovery after cardioversion |
| acute | DOE with few steps | 39 | C | pacing-mediated CM | with malignancy, undergoing chemotherapy |
| acute | mild DOE but quite active | 39 | B | arrhythmogenic CM | |
| chronic | asymptomatic | 39 | A/B | nonischemic CM | |
| acute | STEMI | 38 | x | cardiogenic shock | recovery after intervention |
| chronic | no symptoms | 36 | A | AV block with pacing-mediated CM | |
| chronic | no HF Sx (PVC symptoms, stress related) | 30 | B | familial dilated CM | symptomatology paired with stress-associated PVC |
| chronic | chronic symptoms | 29 | C/D | ischemic CM | with underlying COPD |
| chronic | no cardiac symptoms | 29 | B | ischemic CM | |
| chronic | no cardiac symptoms | 27 | A/B | ischemic cardiomyopathy | Sx with occasional chest pain |
| acute | no clinical HF symptoms, occasional chest pain | 20 | A/B | unspecified CM | |
| acute | cardiac arrest | 20 | x | cardiogenic shock | rapid improvement following stent for STEMI |
| chronic | no major symptoms | 15 | B | nonischemic CM | |

Abbreviations: AV-atrioventricular; CM-cardiomyopathy; DOE-dyspnea on exertion; LVEF-left ventricular ejection fraction; HF-heart failure; PVC-premature ventricular contraction; STEMI-ST-segment elevation myocardial infarction; Sx-symptoms

Disease characteristics for patients with echocardiographic LVEF ≤ 40%.

Extended Data Table 2 |.

Patient comorbidities

| | One or more positive watch AI ECGs (N = 477) | Watch AI ECG negative (N = 1,965) | All patients (N = 2,442) | P value |
| --- | --- | --- | --- | --- |
| Number of ECGs | | | | <0.001 |
|  Median | 34 | 20 | 21 | |
|  Q1, Q3 | 15, 75 | 10, 39 | 11, 44 | |
|  Range | 1, 905 | 1, 980 | 1, 980 | |
| Age | | | | <0.001 |
|  Median | 61 | 53 | 54 | |
|  Q1, Q3 | 50, 70 | 41, 64 | 42, 65 | |
|  Range | 18, 89 | 18, 94 | 18, 94 | |
| Female gender | 238 (49.9%) | 1122 (57.1%) | 1360 (55.7%) | 0.004 |
| Race | | | | 0.60 |
|  White | 440 (92.2%) | 1786 (90.9%) | 2226 (91.2%) | |
|  Black | 13 (2.7%) | 57 (2.9%) | 70 (2.9%) | |
|  Other | 24 (5.0%) | 122 (6.2%) | 146 (6.0%) | |
| Hispanic | 13 (2.7%) | 82 (4.2%) | 95 (3.9%) | 0.14 |
| Congestive heart failure | 133 (27.9%) | 82 (4.2%) | 215 (8.8%) | <0.001 |
| Peripheral vascular disease | 134 (28.1%) | 210 (10.7%) | 344 (14.1%) | <0.001 |
| Cerebrovascular disease | 48 (10.1%) | 136 (6.9%) | 184 (7.5%) | 0.020 |
| Renal disease | 67 (14.0%) | 131 (6.7%) | 198 (8.1%) | <0.001 |
| Chronic pulmonary disease | 107 (22.4%) | 391 (19.9%) | 498 (20.4%) | 0.22 |
| Connective tissue disease-rheumatic disease | 30 (6.3%) | 105 (5.3%) | 135 (5.5%) | 0.42 |
| Myocardial infarction | 43 (9.0%) | 68 (3.5%) | 111 (4.5%) | <0.001 |
| Diabetes | 85 (17.8%) | 183 (9.3%) | 268 (11.0%) | <0.001 |
| Hypertension, combined | 246 (51.6%) | 664 (33.8%) | 910 (37.3%) | <0.001 |

P values are from t-tests for number of ECGs and age; chi-square tests were used for the remaining variables.

Tabulation of the model predictions and patient comorbidities by the number of positive screens. A positive screen was defined as a prediction ≥ 0.67 on the 5-d moving average of the maximum daily model score.

Supplementary Material

Supplement

Acknowledgements

This publication was made possible through the support of the Ted and Loretta Rogers Cardiovascular Career Development Award Honoring H. C. Smith (to Z.I.A.). The Mayo Clinic CDH funded and developed the iPhone app used in the present study. The ECG dashboard used for clinician review was developed and supported by the Department of Cardiovascular Medicine. P.A.N. receives research funding from the National Institutes of Health (NIH, including the National Heart, Lung, and Blood Institute (grant nos. R21AG 62580-1, R01HL 131535-4 and R01HL 143070-2) and the National Institute on Aging (grant no. R01AG 062436-1)), the Agency for Healthcare Research and Quality (grant no. R01HS 25402-3), the Food and Drug Administration (FDA; grant no. FD 06292) and the American Heart Association (grant no. 18SFRN34230146). D.M.H. receives support from the NIH StARR Resident Investigator Award (grant no. 5R38HL150086-02). No technical or financial support was received from Apple.

Footnotes

Competing interests

The AI-ECG algorithm to detect left ventricular dysfunction was licensed by Mayo Clinic to Anumana and Eko Health. P.A.F., Z.I.A., F.L.J., R.E.C., S.J.A. and other inventors and advisors to these entities may benefit financially from their commercialization. The remaining authors declare no competing interests.

Additional information

Extended data is available for this paper at https://doi.org/10.1038/s41591-022-02053-1.

Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41591-022-02053-1.

Peer review information Nature Medicine thanks Partho Sengupta, Jill Waalen and Mohamed Elshazly for their contribution to the peer review of this work. Primary Handling Editor: Michael Basson, in collaboration with the Nature Medicine team.

Reprints and permissions information is available at www.nature.com/reprints.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Code availability

The AI algorithm architecture has been previously published17. The code itself cannot be shared because it constitutes proprietary intellectual property (patent pending) that has been licensed and is under FDA review. We have been advised that, without FDA approval, the AI algorithm cannot be used in routine practice outside the Mayo Clinic.

Data availability

The data are not publicly available because they are electronic health records. Sharing these data externally without additional consent might compromise patient privacy and would violate the study’s Institutional Review Board approval. If other investigators are interested in performing additional analyses, requests can be made to the corresponding author, P.F., and analyses could be performed in collaboration with the Mayo Clinic.

References

  • 1.Attia ZI, Harmon DM, Behr ER & Friedman PA Application of artificial intelligence to the electrocardiogram. Eur. Heart J. 42, 4717–4730 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Siontis KC, Noseworthy PA, Attia ZI & Friedman PA Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nat. Rev. Cardiol. 18, 465–478 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Harmon DM, Attia ZI & Friedman PA Current and future implications of the artificial intelligence electrocardiogram: the transformation of healthcare and attendant research opportunities. Cardiovasc. Res. 118, e23–e25 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Attia ZI et al. Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram. Nat. Med. 25, 70–74 (2019). [DOI] [PubMed] [Google Scholar]
  • 5.Kwon JM et al. Deep learning-based algorithm for detecting aortic stenosis using electrocardiography. J. Am. Heart Assoc. 9, e014717 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Ko WY et al. Detection of hypertrophic cardiomyopathy using a convolutional neural network-enabled electrocardiogram. J. Am. Coll. Cardiol. 75, 722–733 (2020). [DOI] [PubMed] [Google Scholar]
  • 7.Galloway CD et al. Development and validation of a deep-learning model to screen for hyperkalemia from the electrocardiogram. JAMA Cardiol. 4, 428–436 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Kwon JM et al. Artificial intelligence for detecting electrolyte imbalance using electrocardiography. Ann. Noninvasive Electrocardiol. 26, e12839 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Attia ZI et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 394, 861–867 (2019). [DOI] [PubMed] [Google Scholar]
  • 10.Echouffo-Tcheugui JB, Erqou S, Butler J, Yancy CW & Fonarow GC Assessing the risk of progression from asymptomatic left ventricular dysfunction to overt heart failure: a systematic overview and meta-analysis. JACC Heart Fail. 4, 237–248 (2016). [DOI] [PubMed] [Google Scholar]
  • 11.Ammar KA et al. Prevalence and prognostic significance of heart failure stages: application of the American College of Cardiology/American Heart Association heart failure staging criteria in the community. Circulation 115, 1563–1570 (2007). [DOI] [PubMed] [Google Scholar]
  • 12.McDonagh TA, McDonald K & Maisel AS Screening for asymptomatic left ventricular dysfunction using B-type natriuretic peptide. Congest Heart Fail 14, 5–8 (2008). [DOI] [PubMed] [Google Scholar]
  • 13.Attia ZI et al. Prospective validation of a deep learning electrocardiogram algorithm for the detection of left ventricular systolic dysfunction. J. Cardiovasc. Electrophysiol. 30, 668–674 (2019). [DOI] [PubMed] [Google Scholar]
  • 14.Attia IZ et al. External validation of a deep learning electrocardiogram algorithm to detect ventricular dysfunction. Int. J. Cardiol. 329, 130–135 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Adedinsewo D et al. Artificial intelligence-enabled ECG algorithm to identify patients with left ventricular systolic dysfunction presenting to the emergency department with dyspnea. Circ. Arrhythm. Electrophysiol. 13, e008437 (2020). [DOI] [PubMed] [Google Scholar]
  • 16.Noseworthy PA et al. Assessing and mitigating bias in medical artificial intelligence: the effects of race and ethnicity on a deep learning model for ECG analysis. Circ. Arrhythm. Electrophysiol. 13, e007988 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Yao X et al. Artificial intelligence-enabled electrocardiograms for identification of patients with low ejection fraction: a pragmatic, randomized clinical trial. Nat. Med. 27, 815–819 (2021). [DOI] [PubMed] [Google Scholar]
  • 18.McDonagh TA et al. 2021 ESC guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur. Heart J. 42, 3599–3726 (2021). [DOI] [PubMed] [Google Scholar]
  • 19.Yancy CW et al. 2017 ACC/AHA/HFSA focused update of the 2013 ACCF/AHA Guideline for the Management of Heart Failure: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines and the Heart Failure Society of America. Circulation 136, e137–e161 (2017). [DOI] [PubMed] [Google Scholar]
  • 20.Pisano ED et al. Diagnostic performance of digital versus film mammography for breast-cancer screening. N. Engl. J. Med. 353, 1773–1783 (2005). [DOI] [PubMed] [Google Scholar]
  • 21.Cárdenas-Turanzas M et al. The accuracy of the Papanicolaou smear in the screening and diagnostic settings. J. Low. Genit. Trac. Dis. 12, 269–275 (2008). [DOI] [PubMed] [Google Scholar]
  • 22.Bhalla V et al. Diagnostic ability of B-type natriuretic peptide and impedance cardiography: testing to identify left ventricular dysfunction in hypertensive patients. Am. J. Hypertens. 18, 73s–81s (2005). [DOI] [PubMed] [Google Scholar]
  • 23.Perez MV et al. Large-scale assessment of a smartwatch to identify atrial fibrillation. N. Engl. J. Med. 381, 1909–1917 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Benjamin EJ et al. Correction to: Heart disease and stroke statistics—2018 update: a report from the American Heart Association. Circulation 137, e493 (2018). [DOI] [PubMed] [Google Scholar]
  • 25.Bui AL, Horwich TB & Fonarow GC Epidemiology and risk profile of heart failure. Nat. Rev. Cardiol. 8, 30–41 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Dargie HJ Effect of carvedilol on outcome after myocardial infarction in patients with left-ventricular dysfunction: the CAPRICORN randomised trial. Lancet 357, 1385–1390 (2001). [DOI] [PubMed] [Google Scholar]
  • 27.Pfeffer MA et al. Effect of captopril on mortality and morbidity in patients with left ventricular dysfunction after myocardial infarction. Results of the survival and ventricular enlargement trial. The SAVE Investigators. N. Engl. J. Med. 327, 669–677 (1992). [DOI] [PubMed] [Google Scholar]
  • 28.Guo Y et al. Mobile photoplethysmographic technology to detect atrial fibrillation. J. Am. Coll. Cardiol. 74, 2365–2375 (2019). [DOI] [PubMed] [Google Scholar]
  • 29.Grogan M et al. Artificial intelligence-enhanced electrocardiogram for the early detection of cardiac amyloidosis. Mayo Clin. Proc. 96, 2768–2778 (2021). [DOI] [PubMed] [Google Scholar]
  • 30.Ahn JC et al. Development of the AI-cirrhosis-ECG score: an electrocardiogram-based deep learning model in cirrhosis. Am. J. Gastroenterol. 117, 424–432 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Bailey JJ et al. Recommendations for standardization and specifications in automated electrocardiography: bandwidth and digital signal processing. A report for health professionals by an ad hoc writing group of the Committee on Electrocardiography and Cardiac Electrophysiology of the Council on Clinical Cardiology, American Heart Association. Circulation 81, 730–739 (1990). [DOI] [PubMed] [Google Scholar]
  • 32.Shiner Z, Baharav A & Akselrod S Detection of different recumbent body positions from the electrocardiogram. Med. Biol. Eng. Comput. 41, 206–210 (2003). [DOI] [PubMed] [Google Scholar]
  • 33.Nelwan SP, Meij SH, van Dam TB & Kors JA Correction of ECG variations caused by body position changes and electrode placement during ST-T monitoring. J. Electrocardiol. 34, 213–216 (2001). [DOI] [PubMed] [Google Scholar]
  • 34.Williams GC et al. The impact of posture on cardiac repolarization: more than heart rate? J. Cardiovasc. Electrophysiol. 17, 352–358 (2006). [DOI] [PubMed] [Google Scholar]
  • 35.Schijvenaars BJ, Kors JA, van Herpen G, Kornreich F & van Bemmel JH Effect of electrode positioning on ECG interpretation by computer. J. Electrocardiol. 30, 247–256 (1997). [DOI] [PubMed] [Google Scholar]
  • 36.Heidenreich PA et al. Forecasting the impact of heart failure in the united states: a policy statement from the American Heart Association. Circ. Heart Fail. 6, 606–619 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Mozaffarian D et al. Heart disease and stroke statistics—2015 update: a report from the American Heart Association. Circulation 131, e29–e322 (2015). [DOI] [PubMed] [Google Scholar]
  • 38.Bahrami H et al. Differences in the incidence of congestive heart failure by ethnicity: the multi-ethnic study of atherosclerosis. Arch. Intern. Med. 168, 2138–2145 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Apple Inc. Using Apple Watch for Arrhythmia Detection https://www.apple.com/ca/healthcare/docs/site/Apple_Watch_Arrhythmia_Detection.pdf (Apple, 2020; ). [Google Scholar]
  • 40.Steyerberg EW & Harrell FE Jr. Prediction models need appropriate internal, internal–external, and external validation. J. Clin. Epidemiol. 69, 245–247 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Tseng AS et al. Cost effectiveness of an electrocardiographic deep learning algorithm to detect asymptomatic left ventricular dysfunction. Mayo Clinic Proc. 96, 1835–1844 (2021). [DOI] [PubMed] [Google Scholar]
  • 42.Cohen-Shelly M et al. Electrocardiogram screening for aortic valve stenosis using artificial intelligence. Eur. Heart J. 42, 2885–2896 (2021). [DOI] [PubMed] [Google Scholar]
  • 43.Attia ZI et al. Age and sex estimation using artificial intelligence from standard 12-lead ECGs. Circ. Arrhythm. Electrophysiol. 12, e007284 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Thiele C & Hirschfeld G cutpointr: Improved estimation and validation of optimal cutpoints in R. J. Stat. Softw. 98, 1–27 (2021). [Google Scholar]
