Abstract
Background
Wrist-worn monitors claim to provide accurate measures of heart rate and energy expenditure. People wishing to lose weight use these devices to monitor energy balance; however, the accuracy of these devices in measuring such parameters has not been established.
Aim
To determine the accuracy of four wrist-worn devices (Apple Watch, Fitbit Charge HR, Samsung Gear S and Mio Alpha) to measure heart rate and energy expenditure at rest and during exercise.
Methods
Twenty-two healthy volunteers (50% female; aged 24.9 ± 5.6 years) completed approximately 1-hour protocols involving supine and seated rest, walking and running on a treadmill and cycling on an ergometer. Data from the devices collected during the protocol were compared with reference methods: electrocardiography (heart rate) and indirect calorimetry (energy expenditure).
Results
None of the devices performed significantly better overall; however, heart rate was consistently more accurate than energy expenditure across all four devices. Correlations between the devices and reference methods were moderate to strong for heart rate (0.67–0.95 [0.35 to 0.98]) and weak to strong for energy expenditure (0.16–0.86 [-0.25 to 0.95]). All devices underestimated both outcomes compared to reference methods. The percentage error for heart rate was small across the devices (range: 1–9%) but greater for energy expenditure (9–43%). Similarly, limits of agreement were considerably narrower for heart rate (ranging from -27.3 to 13.1 bpm) than for energy expenditure (ranging from -266.7 to 65.7 kcal) across devices.
Conclusion
These devices accurately measure heart rate. However, estimates of energy expenditure are poor and would have implications for people using these devices for weight loss.
Introduction
The benefits of participating in regular physical activity are well documented [1], yet physical inactivity remains the largest risk factor for the development of cardiometabolic disease worldwide [2]. Wearable devices have become a popular method of measuring activity-based outcomes and facilitating behavior change to effectuate weight loss [3]. It was estimated that approximately 25 million of these devices would be sold in 2015 and worldwide sales are expected to increase to approximately 12.6 billion U.S. dollars by 2018 [4]. Notably, wrist-worn monitors are predicted to account for 87% of wearable devices shipped in 2018 [5]. These devices claim to provide accurate measures of energy expenditure and, more recently, heart rate via photoplethysmography.
Previous studies investigating the validity of energy expenditure estimates have been limited to devices that do not include a measure of heart rate. These studies have demonstrated moderate validity, typically underestimating total energy expenditure compared to reference methods by approximately 10–30% depending on the device measured [6–9].
With the inclusion of sophisticated photoplethysmography technology, new-generation devices such as the Apple Watch and Fitbit Charge HR have the potential to use heart rate-derived algorithms to contribute to estimates of energy expenditure based on activity intensity [10,11]. Recent evidence suggests this method has acceptable validity; however, there is inherent variability, with the accuracy of these devices dependent on the device used, the type and intensity of activity, and skin photosensitivity [12,13]. Melanin concentration and skin pigmentation can attenuate the light wavelength emitted from these devices, thereby reducing pulse rate detection [14]. It is important to recognize, however, that the devices that have previously been evaluated were typically designed for sports performance, and contemporary activity trackers (e.g. Apple Watch, Fitbit Charge HR) have not yet been evaluated.
Given the rapid consumer uptake of these devices, and their potential to have a major influence on lifestyle behavior and weight management, it is critical to determine their accuracy in measuring these variables across a variety of modes and intensities. The aim of this study was therefore to determine the ability of four popular wrist-worn devices (Apple Watch, Fitbit Charge HR, Samsung Gear S and Mio Alpha) to accurately measure heart rate and energy expenditure during approximately one hour of rest, cycling and treadmill walking.
Methods
Participants
Healthy male and female volunteers aged 18–70 years were invited to participate. All participants were recruited from a large metropolitan university. The study received ethical clearance from the University of Queensland (HMS15/2403). Research staff screened all participants for any medical indications that might exclude them from exercise testing and obtained written informed consent. Participants were advised to refrain from ingesting caffeine and alcohol and to avoid vigorous physical activity for 24 hours, and to avoid consuming a large meal for four hours, prior to each visit. A standardised meal replacement beverage (Up & Go, Sanitarium, Australia) was provided to participants two hours before all testing sessions. A 24-hour physical activity and dietary diary was completed prior to the second testing session and participants were asked to replicate these behaviors before the final trial.
Experimental Protocols
Participants attended the research laboratory on three separate occasions, separated by between 48 hours and 7 days. Visit one included measures of height, weight and skin type assessment via the Fitzpatrick Skin Type scale [15]. Maximal oxygen uptake (V̇O2max) was also assessed via indirect calorimetry (MetaMax 3B, Cortex, Germany) using a Bruce treadmill protocol. Standard calibration of gas analysers (two-point calibration against room air and a known gas concentration of 4.07% CO2/15.95% O2) and volume (3 L Hans Rudolph calibration syringe, Kansas, United States) was performed prior to each assessment as per the manufacturer's instructions. Measurements of oxygen consumption, carbon dioxide production and minute ventilation were obtained at rest and during exercise.
Visits two and three were testing sessions, with two devices tested per session (one on each arm), using a randomized and counterbalanced method. Each visit involved the simultaneous recording of heart rate and energy expenditure from the devices during a range of activities for comparison with reference methods. As three of the devices also measured steps (Apple Watch, Fitbit Charge HR, Samsung Gear S), total steps for the duration of the testing session were also recorded for these devices. To ensure participants were adequately hydrated, urine osmolality was assessed on arrival (Osmocheck Pocket Refractometer, Vitech Scientific Ltd, Tokyo). Activities at rest (lying, sitting, standing) and exercise (walking, cycling) were chosen for the 58-minute protocol (Fig 1). Participants initially performed five-minute periods of supine, sitting and standing rest, respectively. Three stages of a Bruce graded treadmill exercise protocol were then undertaken, followed by five minutes of seated rest. Participants then completed six three-minute stages of a 25-watt step test (commencing at 25 W) on a cycle ergometer, followed by a final five minutes of seated rest.
The devices tested were the Apple Watch (Apple Inc., California, United States), Fitbit Charge HR (Fitbit Inc., San Francisco, United States), Samsung Gear S (Samsung Electronics Co., Ltd., Suwon, South Korea) and Mio Alpha (Mio Global, Canada). As per manufacturer instructions, the devices were individualized for age, gender and anthropometrical data. Devices with compatible smartphone software were synchronized via Bluetooth to an appropriate smartphone to assist with data collection (ease of visualization).
Reference Methods
Electrocardiography (ECG) electrodes (3-lead, CASE exercise testing system, GE Healthcare, UK) were fitted at each visit, and heart rate from the ECG and devices was manually recorded every 15 seconds during the protocol. Energy expenditure was measured using indirect calorimetry with a portable gas-analysis system (MetaMax 3B, Cortex, Germany). Participants were video recorded while walking on the treadmill and step count was determined from the recording retrospectively via visual inspection at half-speed playback.
Statistical analysis
Pearson (r) or Spearman rank correlation coefficients (rho), for normally and non-normally distributed data, respectively, intraclass correlation coefficients and Bland-Altman plots with mean bias and upper and lower limits of agreement (LoA) were used to assess criterion validity and agreement between the device and the reference. After visual examination of the plots, systematic bias was assessed using linear regression to determine whether mean difference and/or limits of agreement varied across average values of the device and the reference [16]. Where mean difference and/or limits of agreement varied with average values, estimates were calculated for the mean of the average values. All statistical analyses were conducted using SPSS (Version 22, SPSS Inc.) and data are presented as mean ± SD. The strength of correlation coefficients was interpreted based on the following definitions: weak (r < 0.5), moderate (0.5–0.7) and strong (r ≥ 0.7).
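The Bland-Altman quantities described above can be illustrated with a short Python sketch. This is not the authors' SPSS procedure: the function names, the sample readings and the 1.96 multiplier are assumptions made for illustration, although the multiplier is consistent with the point estimates in Table 1 (for example, an Apple Watch heart rate bias of -1.3 ± 4.4 bpm gives limits of about -9.9 to 7.3 bpm). The regression of differences on averages mirrors the check for systematic bias.

```python
from statistics import mean, stdev

# Assumed multiplier for 95% limits of agreement (not stated in the paper).
LOA_MULTIPLIER = 1.96


def limits_from_summary(bias, sd_of_differences):
    """Return (lower, upper) limits of agreement from a bias and SD."""
    half_width = LOA_MULTIPLIER * sd_of_differences
    return bias - half_width, bias + half_width


def bland_altman(device, reference):
    """Compute bias, limits of agreement and a proportional-bias slope.

    Differences are device minus reference, so a negative bias means the
    device underestimates the reference method.
    """
    if len(device) != len(reference) or len(device) < 3:
        raise ValueError("need paired samples with at least 3 observations")

    differences = [d - r for d, r in zip(device, reference)]
    averages = [(d + r) / 2 for d, r in zip(device, reference)]

    bias = mean(differences)
    sd = stdev(differences)  # sample SD (n - 1 denominator)
    lower, upper = limits_from_summary(bias, sd)

    # Ordinary least squares of difference on average: a slope far from
    # zero suggests the bias changes with the magnitude of the measurement.
    avg_bar = mean(averages)
    sxx = sum((a - avg_bar) ** 2 for a in averages)
    sxy = sum((a - avg_bar) * (d - bias) for a, d in zip(averages, differences))
    slope = sxy / sxx if sxx else 0.0
    intercept = bias - slope * avg_bar

    return {
        "bias": bias,
        "sd": sd,
        "lower_loa": lower,
        "upper_loa": upper,
        "slope": slope,
        "intercept": intercept,
    }


if __name__ == "__main__":
    # Hypothetical paired heart rates (bpm), not study data.
    device_hr = [98, 101, 118, 140]
    ecg_hr = [100, 105, 120, 143]
    print(bland_altman(device_hr, ecg_hr))
```

A formal analysis would also test whether the slope differs significantly from zero, which this sketch leaves out.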
Results
Twenty-two individuals (11 women) volunteered to participate [age: 24.9 ± 5.6 years; height: 173.1 ± 9.9 cm; weight: 72.7 ± 11.8 kg; V̇O2max: 50.1 ± 7.8 mL·kg⁻¹·min⁻¹; maximum heart rate: 189.6 ± 6.9 beats per minute; Fitzpatrick Skin Type scale <IV (n = 15) and >IV (n = 7)]. Participants were euhydrated prior to testing sessions (<700 mOsmol). All participants wore each device once; however, energy expenditure data were missing for three participants and step count data were missing for two due to a data recording error. Both trials increased heart rate to ~70–80% of maximum, with mean oxygen consumption of 13.8 ± 1.4 mL·kg⁻¹·min⁻¹ and 14.3 ± 2.0 mL·kg⁻¹·min⁻¹ for trials one and two, respectively. The mean ± SD relative oxygen consumption (mL·kg⁻¹·min⁻¹) for individual stages of both trials was as follows: supine (5.0 ± 0.7), quiet sitting 1 (4.5 ± 0.8), standing (4.9 ± 1.1), treadmill stage 1 (14.3 ± 1.6), treadmill stage 2 (22.4 ± 2.2), treadmill stage 3 (32.6 ± 3.0), quiet sitting 2 (8.4 ± 2.1), cycling stage 1 (12.2 ± 1.8), cycling stage 2 (14.8 ± 2.3), cycling stage 3 (17.7 ± 2.8), cycling stage 4 (21.5 ± 3.7), cycling stage 5 (25.3 ± 4.5), cycling stage 6 (29.2 ± 5.6), and quiet sitting 3 (7.2 ± 1.3).
Correlations and Bland-Altman findings (mean difference and limits of agreement) are presented in Table 1. No one device performed better overall; however, heart rate was consistently measured more accurately than energy expenditure across all four devices. Correlations between device measures and reference methods varied depending upon the outcome and the device used, ranging from moderate to strong for heart rate (0.67–0.95 [0.35 to 0.98]), and from weak to strong for energy expenditure (0.16–0.86 [-0.25 to 0.95]) (Table 1).
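As a rough illustration of how the reported coefficients map onto the weak, moderate and strong labels, the Python sketch below computes a Pearson coefficient and applies the thresholds given in the statistical analysis. The function names are hypothetical, and two choices are assumptions rather than details from the paper: the magnitude of r is classified, and a value of exactly 0.7 is treated as strong.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need paired samples with at least 2 observations")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_strength(r):
    """Label a coefficient using the thresholds in the statistical analysis.

    Assumption: the magnitude of r is classified, and the boundary value
    0.7 counts as strong (the paper gives moderate as 0.5-0.7, strong >= 0.7).
    """
    magnitude = abs(r)
    if magnitude < 0.5:
        return "weak"
    if magnitude < 0.7:
        return "moderate"
    return "strong"


if __name__ == "__main__":
    # Point estimates reported for heart rate and energy expenditure.
    for label, r in [("lowest HR", 0.67), ("highest HR", 0.95),
                     ("lowest EE", 0.16), ("highest EE", 0.86)]:
        print(label, r, correlation_strength(r))
```

Applied to the extremes quoted above, this gives moderate to strong for heart rate and weak to strong for energy expenditure, matching the wording of the Results.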
Table 1. Sample size, mean, correlation, agreement between device and reference methods and Bland-Altman outcomes for heart rate (bpm) and energy expenditure (kcal).
| Outcome | Measure | Apple Watch | Fitbit Charge HR | Samsung Gear S | Mio ALPHA |
|---|---|---|---|---|---|
| Heart rate (bpm) | N | 22 | 22 | 22 | 22 |
| | Device mean ± SD | 100.7 ± 14.0 | 92.7 ± 11.5 | 93.4 ± 13.9 | 97.7 ± 14.6 |
| | ECG mean ± SD | 102.0 ± 14.4 | 102.0 ± 14.5 | 100.5 ± 14.6 | 102.0 ± 13.4 |
| | r/Rho (95% CI) | 0.95 (0.88 to 0.98) | 0.81 (0.59 to 0.92) | 0.67* (0.35 to 0.85) | 0.87 (0.71 to 0.94) |
| | ICC (95% CI) | 0.98 (0.94 to 0.99) | 0.78 (-0.02 to 0.93) | 0.80 (0.40 to 0.93) | 0.91 (0.72 to 0.97) |
| | Mean difference ± SD | -1.3 ± 4.4 | -9.3 ± 8.5 | -7.1 ± 10.3 | -4.3 ± 7.2 |
| | Upper LoA | 7.3 | 7.4 | 13.1 | -0.44 × avg + 52.69† |
| | Lower LoA | -9.9 | -26.0 | -27.3 | 0.4 × avg - 61.2† |
| Energy expenditure (kcal) | N | 22 | 22 | 19⌃ | 22 |
| | Device mean ± SD | 162.6 ± 33.0 | 236.8 ± 77.0 | 261.4 ± 47.5 | 189.5 ± 95.3 |
| | Indirect calorimetry mean ± SD | 285.7 ± 50.2 | 299.1 ± 46.0 | 287.5 ± 45.1 | 290.3 ± 46.3 |
| | r/Rho (95% CI) | 0.16 (-0.28 to 0.54) | 0.64 (0.30 to 0.84) | 0.86 (0.67 to 0.95) | 0.46* (0.05 to 0.74) |
| | ICC (95% CI) | 0.05 (-0.05 to 0.17) | 0.56 (-0.18 to 0.83) | 0.86 (0.15 to 0.96) | 0.32 (-0.24 to 0.68) |
| | Mean difference ± SD | -123.1 ± 55.6 | 0.61 × avg - 224.6 ± 59.1† | -26.1 ± 24.2 | 0.91 × avg - 318.77 ± 84.8† |
| | Upper LoA | -14.1 | 1.3 × avg - 334.28† | 21.3 | 0.91 × avg - 318.77 + 166.2† |
| | Lower LoA | -232.1 | -0.11 × avg - 114.92† | -73.5 | 0.91 × avg - 318.77 - 166.2† |
Notes: ICC = intraclass correlation coefficient, CI = confidence interval, kcal = kilocalories, ECG = electrocardiography, bpm = beats per minute, SD = standard deviation, avg = average. Correlations r/Rho are Pearson’s correlation coefficient (r) except where indicated by * where they are Spearman rank correlation coefficients (Rho) due to non-normally distributed data.
† Where Bland-Altman parameters were systematically biased (mean difference/limits of agreement), values are presented as linear equations rather than point estimates.
⌃ Missing values (n = 3) due to a data recording error.
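To show how the † linear equations relate to the point estimates quoted in the text, the following Python sketch evaluates the Mio ALPHA energy expenditure equations at the mean of the average values. It assumes the mean of the per-participant averages equals the midpoint of the device and reference means in Table 1 (which holds when the same 22 participants contribute to both), and the function name is hypothetical.

```python
def evaluate_line(slope, intercept, avg):
    """Evaluate a Bland-Altman parameter expressed as slope * avg + intercept."""
    return slope * avg + intercept


# Mio ALPHA energy expenditure values taken from Table 1 (kcal).
device_mean = 189.5
reference_mean = 290.3

# Assumption: mean of the paired averages = midpoint of the two means.
mean_of_averages = (device_mean + reference_mean) / 2

# Mean difference: 0.91 * avg - 318.77; limits are the bias +/- 166.2.
bias = evaluate_line(0.91, -318.77, mean_of_averages)
half_width = 166.2
lower_loa = bias - half_width
upper_loa = bias + half_width

print(f"bias at mean of averages: {bias:.1f} kcal")
print(f"limits of agreement: {lower_loa:.1f} to {upper_loa:.1f} kcal")
```

Under these assumptions the limits come out at about -266.7 to 65.7 kcal, the range reported for the Mio ALPHA in the Results.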
Bland-Altman plots indicated that all devices underestimated all outcome measures compared to the reference method (Fig 2). The average underestimation for devices compared to reference methods ranged from 1–9% for heart rate and 9–43% for energy expenditure. The Samsung Gear S demonstrated the greatest variability for heart rate (Lower LoA–Upper LoA; -27.3 to 13.1 bpm) (Fig 3). Furthermore, the Mio ALPHA demonstrated the greatest variability for estimated energy expenditure (-266.7 to 65.7 kcal) (Fig 4). Systematic bias was identified for energy expenditure and heart rate outcomes for the Fitbit Charge HR and Mio ALPHA devices. There were no statistical differences between correlations for heart rate based on skin color [Fitzpatrick Skin Type scale <IV (n = 15) and >IV (n = 7)], except for the Apple Watch, where the correlation for Fitzpatrick Skin Type scale >IV (r = 1.00) was statistically different to <IV (r = 0.94) (p<0.05).
Three of the devices measured step count. Correlations between measured steps and the reference method for the Apple Watch (0.70 [0.38 to 0.87]), Fitbit Charge HR (0.67 [0.34 to 0.85]) and Samsung Gear S (0.88 [0.72 to 0.95]) were considered moderate to strong. However, the Fitbit Charge HR demonstrated the greatest variability for step count (-353 to 235 steps) (Fig 5). The average error of underestimation for these devices ranged from 4–6%.
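The percentage ranges above can be approximately reproduced from the group means in Table 1. The Python sketch below assumes percentage error is the mean device value minus the mean reference value, divided by the mean reference value; the paper does not spell out the calculation, so this definition and the function names are illustrative assumptions.

```python
# (device mean, reference mean) pairs copied from Table 1.
TABLE_1_MEANS = {
    "heart_rate_bpm": {
        "Apple Watch": (100.7, 102.0),
        "Fitbit Charge HR": (92.7, 102.0),
        "Samsung Gear S": (93.4, 100.5),
        "Mio ALPHA": (97.7, 102.0),
    },
    "energy_expenditure_kcal": {
        "Apple Watch": (162.6, 285.7),
        "Fitbit Charge HR": (236.8, 299.1),
        "Samsung Gear S": (261.4, 287.5),
        "Mio ALPHA": (189.5, 290.3),
    },
}


def percent_error(device_mean, reference_mean):
    """Signed error as a percentage of the reference (negative = underestimate)."""
    return 100.0 * (device_mean - reference_mean) / reference_mean


def error_range(outcome):
    """Smallest and largest absolute percentage error across devices, rounded."""
    errors = [
        abs(percent_error(device, reference))
        for device, reference in TABLE_1_MEANS[outcome].values()
    ]
    return round(min(errors)), round(max(errors))


if __name__ == "__main__":
    for outcome, devices in TABLE_1_MEANS.items():
        for name, (device, reference) in devices.items():
            print(f"{outcome} {name}: {percent_error(device, reference):.1f}%")
        print(outcome, "range:", error_range(outcome))
```

Every signed error is negative under this definition, in line with the finding that all devices underestimated both outcomes.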
Discussion
This is the first study to examine the accuracy of four popular wrist-worn devices: the Apple Watch, Fitbit Charge HR, Samsung Gear S and Mio Alpha, to measure heart rate and energy expenditure during rest, cycling and treadmill walking. Our findings demonstrate that all devices underestimated heart rate and energy expenditure. No single device demonstrated consistently greater accuracy across these measures and the magnitude of error varied depending on the outcome of interest.
Device estimates of heart rate via photoplethysmography were within 1–9% of reference estimates. Heart rate is commonly used to monitor and prescribe cardiovascular-based exercise intensity [17] and therefore accurate measures are important for precise exercise prescription. Our findings indicate that wrist-worn devices utilizing photoplethysmography offer consumers a convenient and satisfactory method to monitor heart rate while exercising. This is consistent with a recent investigation examining the accuracy of the wrist-based Mio ALPHA, and the forearm-worn Scosche myRhythm, to measure heart rate during rest, exercise and hand-based activities compared to electrocardiography [12]. Overall, the devices had a mean error of <2%; however, this varied between the devices for the type of activity. The Mio ALPHA demonstrated the largest mean error during cycling (-4.8%), whilst the largest mean error for the Scosche myRhythm was during walking (-3.13%) [12]. Similarly, Spierer and colleagues (2015) assessed the accuracy of the Mio ALPHA and the Omron HR500U during rest, and aerobic and resistance exercise [13]. All devices assessed demonstrated measurement error compared to the reference method, which was significant during resistance exercise for the Mio ALPHA (mean ± standard error: 23.3 ± 31.94 bpm; p<0.01) [13].
The addition of heart rate measures to traditional accelerometry-based devices that measure physical activity would be expected to improve the accuracy of energy expenditure predictions [10]. However, our findings demonstrate significant variability in the accuracy of energy expenditure estimation, with up to 43% difference between the device and the reference method. As increased energy expenditure through physical activity is recommended as a part of a weight management strategy [18], the inability to accurately estimate energy expenditure is a limitation across these devices. It is difficult to speculate what contributed to errors of this magnitude. It is assumed that each device has a specific algorithm for the determination of energy expenditure. Technical assistance was sought from each company to ascertain information regarding the algorithms used to determine energy expenditure; however, this information was not disclosed.
The accuracy of several commercially available activity trackers, including two wrist-worn devices (Jawbone UP and Misfit Shine), in measuring a variety of physical activity-related outcomes during free-living conditions was recently evaluated [7]. Although these devices were not designed to measure photoplethysmography-derived heart rate, the results highlighted that, consistent with our findings, all devices significantly underestimated energy expenditure (Jawbone = -898 kcal; Misfit Shine = -479 kcal), with only a modest association with reference methods (r = 0.74–0.79). Similarly, Sasaki and co-workers (2015) recently validated a hip-worn Fitbit Classic device against indirect calorimetry during a variety of lab-based activities including walking, running and simulated free-living conditions [9]. This study was the first to validate activity-specific estimates of energy expenditure compared to continuous estimates as previously described [6,8]. The Fitbit Classic underestimated energy expenditure for a variety of activities of daily living [-3.1 ± 4.2 kcal/6 min (95% limits of agreement (LoA): -11 to 5.2 kcal/6 min)], locomotion [-5.6 ± 12 kcal/6 min (95% LoA: -29 to 18 kcal/6 min)] and sports [-2.1 ± 12 kcal/6 min (95% LoA: -26 to 22 kcal/6 min)].
Of interest was the observation that the Samsung Gear S does not incorporate heart rate into estimations of energy expenditure, whereas the others do. Instead, the Samsung Gear S appears to use an accelerometry-based algorithm during walking/running and predictive equations during cycling. Consistent with previous research [7,19], step count estimates for the Apple Watch, Fitbit Charge HR and Samsung Gear S were acceptable (within 4–6% of the reference).
Limitations of this study included the relatively young and apparently healthy sample of participants (mean: 24.9 ± 5.6 years, range: 19–41 years), and therefore results may not be generalizable to a broader consumer market. Furthermore, the findings associated with a laboratory-based protocol cannot be generalized to the free-living context. Finally, it is suggested that the accuracy of these devices may be reduced during higher intensity or resistance-based exercise as a result of movement artefact [13], which was not addressed in this investigation.
Conclusion
The four devices accurately measure heart rate; however, estimates of energy expenditure are poor. This limits their use for monitoring energy balance, and therefore as a weight loss aid.
Supporting Information
Data Availability
All relevant data are within the paper and its Supporting Information files.
Funding Statement
The principal investigator on the study (Coombes) received an unrestricted grant from Coca Cola that was used to partially fund this study (Research Master Number 2014002786, http://transparency.coca-colajourney.com.au). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. There was no additional external funding received for this study.
References
1. Warburton DE, Nicol CW, Bredin SS. Health benefits of physical activity: the evidence. CMAJ. 2006;174: 801–809.
2. Haskell WL, Lee I-M, Pate RR, Powell KE, Blair SN, Franklin BA, et al. Physical activity and public health: updated recommendation for adults from the American College of Sports Medicine and the American Heart Association. Med Sci Sport Exerc. 2007;39: 1423–1434.
3. Patel MS, Asch DA, Volpp KG. Wearable devices as facilitators, not drivers, of health behavior change. JAMA. 2015;313: 459–460. doi:10.1001/jama.2014.14781
4. Forecast unit sales of health and fitness trackers worldwide from 2014 to 2015 (in millions), by region. Statista. 2015. Available: http://www.statista.com/statistics/413265/health-and-fitness-tracker-worldwide-unit-sales-region/.
5. Smart watches and Smart Bands Dominate Fast-Growing Wearables Market. CCS Insight. 2014. Available: http://www.ccsinsight.com/press/company-news/1944-smartwatchesand-smart-bands-dominate-fast-growing-wearables-market.
6. Dannecker KL, Sazonova NA, Melanson EL, Sazonov ES, Browning RC. A comparison of energy expenditure estimation of several physical activity monitors. Med Sci Sport Exerc. 2013;45: 2105–2112.
7. Ferguson T, Rowlands AV, Olds T, Maher C. The validity of consumer-level, activity monitors in healthy adults worn in free-living conditions: a cross-sectional study. Int J Behav Nutr Phys Act. 2015;12: 42. doi:10.1186/s12966-015-0201-9
8. Lee J-M, Kim Y, Welk GJ. Validity of consumer-based physical activity monitors. Med Sci Sport Exerc. 2014;46: 1840–1848.
9. Sasaki JE, Hickey A, Mavilia M, Tedesco J, John D, Kozey Keadle S, et al. Validation of the Fitbit wireless activity tracker for prediction of energy expenditure. J Phys Act Health. 2015;12: 149–154. doi:10.1123/jpah.2012-0495
10. Keytel LR, Goedecke JH, Noakes TD, Hiiloskorpi H, Laukkanen R, van der Merwe L, et al. Prediction of energy expenditure from heart rate monitoring during submaximal exercise. J Sport Sci. 2007;23: 289–297.
11. Luke A, Maki KC, Barkey N, Cooper R, McGee D. Simultaneous monitoring of heart rate and motion to assess energy expenditure. Med Sci Sport Exerc. 1997;29: 144–148.
12. Parak J, Korhonen I. Evaluation of wearable consumer heart rate monitors based on photoplethysmography. IEEE. 2014: 3670–3673.
13. Spierer DK, Rosen Z, Litman LL, Fujii K. Validation of photoplethysmography as a method to detect heart rate during rest and exercise. J Med Eng Technol. 2015;39: 264–271. doi:10.3109/03091902.2015.1047536
14. Fallow BA, Tarumi T, Tanaka H. Influence of skin type and wavelength on light wave reflectance. J Clin Monit Comput. 2013;27: 313–317. doi:10.1007/s10877-013-9436-7
15. Fitzpatrick TB. The validity and practicality of sun-reactive skin types I through VI. Arch Dermatol. 1988;124: 869–871.
16. Brown R, Richmond S. An update on the analysis of agreement for orthodontic indices. Eur J Orthod. 2005;27: 286–291.
17. Mann T, Lamberts RP, Lambert MI. Methods of prescribing relative exercise intensity: physiological and practical considerations. Sports Med. 2013;43: 613–625. doi:10.1007/s40279-013-0045-x
18. Donnelly JE, Blair SN, Jakicic JM, Manore MM, Rankin JW, Smith BK, et al. American College of Sports Medicine Position Stand. Appropriate physical activity intervention strategies for weight loss and prevention of weight regain for adults. Med Sci Sport Exerc. 2009;41: 459–471.
19. Case MA, Burwick HA, Volpp KG, Patel MS. Accuracy of smartphone applications and wearable devices for tracking physical activity data. JAMA. 2015;313: 625–626. doi:10.1001/jama.2014.17841