Review of Validity and Reliability of Garmin Activity Trackers

Kelly R Evenson; Camden L Spade

doi:10.1123/jmpb.2019-0035

Author manuscript; available in PMC: 2021 Jun 1.

Published in final edited form as: J Meas Phys Behav. 2020 Jun;3(2):170–185. doi: 10.1123/jmpb.2019-0035

PMCID: PMC7323940 NIHMSID: NIHMS1580332 PMID: 32601613

Abstract

Purpose:

A systematic review to summarize the validity and reliability of steps, distance, energy expenditure, speed, elevation, heart rate, and sleep assessed by Garmin activity trackers.

Methods:

Searches included studies published through December 31, 2018. Correlation coefficients (CC) were assessed as low (<0.60), moderate (0.60-<0.75), good (0.75-<0.90), or excellent (>=0.90). Mean absolute percentage errors (MAPE) were assessed as acceptable at <5% in controlled conditions and <10% for free-living.

Results:

Overall, 32 studies of adults documented validity. Four of these studies also documented reliability. The sample size ranged from 1 to 95 for validity and 4 to 31 for reliability testing. Step inter- and intra-reliability was good-to-excellent and speed intra-reliability was excellent. No other features were explored for reliability. Step validity, across 16 studies, generally indicated good-to-excellent CC and acceptable MAPE. Distance validity, tested in three studies, generally indicated poor CC and MAPE that exceeded acceptable limits, with both over and underestimation. Energy expenditure validity, across 12 studies, generally indicated wide variability in CC and MAPE that exceeded acceptable limits. Heart rate validity in five studies had low-to-excellent CC and all MAPE exceeded acceptable limits. Speed, elevation, and sleep validity were assessed in only one or two studies each; for sleep, the criterion relied on self-report rather than polysomnography.

Conclusion:

This systematic review of Garmin activity trackers among adults indicated higher validity of steps; few studies on speed, elevation, and sleep; and lower validity for distance, energy expenditure, and heart rate. Intra- and inter-device feature reliability needs further testing.

Keywords: activity tracker, digital health, heart rate, physical activity, steps, wearables

Introduction

Wearables are worn devices that can provide a variety of feedback. From a search conducted in 2017, 423 unique wearables distributed across 132 brands were identified (Henriksen et al., 2018). This was an increase from only 3 wearables identified in 2011. In line with the proliferation of wearables, based on a 2018 survey of more than two thousand health professionals from around the world, “wearable technology” was considered the leading fitness trend (Thompson, 2019).

Activity trackers, a subset of wearables, have quickly caught on for personal use, such as to promote changes in physical activity (Strath and Rowley, 2018). In support of this, the Community Guide recommended activity trackers to increase physical activity among overweight or obese adults (de Vries et al., 2016). Consumers are also using activity trackers to communicate with healthcare providers and make more informed health-related decisions (Strath and Rowley, 2018; Wright et al., 2017). In addition, activity trackers are being extensively used for research purposes, both for intervention and measurement, as indicated in both the clinicaltrials.gov database of clinical trials and in the National Institutes of Health RePORTER database of United States’ governmental funded studies (Wright et al., 2017). Researchers who wish to use activity trackers must decide from a plethora of device options and features.

With the rise in the choice of activity trackers comes the integration of new sensors that can provide diverse features to the devices, including photoplethysmography, global positioning systems (GPS), barometry, and altimetry (Henriksen et al., 2018). When researchers consider which activity tracker to use, best practice indicates that the information output from the device (i.e., features) should be both valid and reliable (Duking et al., 2018). However, the literature assessing activity trackers is voluminous, with varied protocols, brands and versions, locations worn, and modes of testing. This makes it challenging to assess which device and features within devices to use for research purposes.

Systematic reviews on activity trackers from the same company offer the opportunity to document the history and lineage of their devices. Activity trackers are probably operationally more similar within company than across companies. For example, proprietary algorithms differ across companies and are likely repurposed within the same company. Previously, this type of review was conducted for Fitbit and Jawbone devices (Evenson et al., 2015). We proposed a similar review on Garmin activity trackers.

Garmin (Garmin Ltd., Olathe, Kansas) was founded in 1989 and, as early as 2006, offered activity trackers. Based on second quarter 2018 data, Garmin ranked fifth worldwide in activity tracker shipments at 5.3% (International Data Corporation, 2018). In December 2018, an announcement indicated that Garmin would be partnering with ActiGraph, one of the leaders in research-grade accelerometry, for a future product (Muoio, 2018; Plasqui et al., 2013). Garmin devices are also being used in clinical settings both for intervention and measurement. A search of the clinical trials database (clinicaltrials.gov) on December 16, 2019 revealed 41 studies using a Garmin wearable device.

In order to facilitate use of activity trackers in research, we conducted a systematic review of Garmin activity trackers. Specifically, we summarized the validity and reliability of wrist-worn Garmin activity trackers to assess steps, distance, energy expenditure, speed, elevation, heart rate, and sleep.

Methods

Literature Search

Searches of PubMed, Web of Science, and SPORTDiscus were conducted to include only full-length studies through December 31, 2018. The final search is described in Appendix 1. No start date was imposed in the search. The studies identified from the searches were compiled into Covidence (Melbourne, Victoria) and the two authors selected abstracts for full text review.

Abstracts, conference proceedings, and papers that did not provide the full text in English were excluded. Validity and reliability studies of Garmin trackers that were not activity trackers (for example, Duncan et al., 2007) were excluded. Studies focused on special populations that might have gait or mobility impairments that could affect the measures under study (for example, Lamont et al., 2018; Madigan, 2019; Treacy et al., 2017) were also excluded. The review focused on locomotor speed and distance; therefore, we did not include other measures of speed and distance, such as those assessed during skiing (Gloersen et al., 2018) or swimming (Mooney et al., 2017). The review also focused on heart rate measured at the wrist; assessments of heart rate straps worn in conjunction with the Garmin wrist-worn activity tracker were not included (for example, Cassirame et al., 2017).

Abstraction and Analysis

First, descriptive information on the activity trackers (models, release date, placement, size, weight, and cost) from the Garmin website was recorded. Second, an abstraction tool used for this review was expanded from a tool initially created by De Vries et al. (2009) to document study characteristics and measurement properties of the activity trackers. Specifically, we extracted information on the study population, protocol, statistical analysis, and results related to validity and reliability. A primary reviewer extracted details and a second reviewer checked each entry, with discrepancies resolved by consensus. For abstracted information missing from the publication, we attempted to contact at least one study author to obtain the information. In total, we contacted authors from 15 papers, among which 12 responded. Summary tables were created from the abstracted information.

Reliability of the activity trackers included (Duking et al., 2018): (i) intra-device reliability, defined as reproducibility within the same tracker; and (ii) inter-device reliability, defined as reproducibility across different trackers. Validity of the activity trackers included (Higgins and Straub, 2006): (i) criterion validity, defined by comparing the trackers to a criterion measure; and (ii) construct validity, defined by comparing the trackers to other constructs that should track or correlate positively (convergent validity) or negatively (divergent validity).

If reported, we abstracted correlation coefficients (CC). We interpreted the CC using the following ratings: <0.60 low, 0.60-<0.75 moderate, 0.75-<0.90 good, and >=0.90 excellent. If reported, we abstracted the mean percentage error (MPE), which captured over- and under-estimation, defined as [(criterion value minus Garmin tracker value)/criterion value]*100. If reported, we also abstracted the mean absolute percentage error (MAPE), which captured the magnitude of mis-estimation, defined as the absolute value of [(criterion value minus Garmin tracker value)/criterion value]*100. A smaller MAPE represented better accuracy and accounted for both over- and underestimation. We interpreted a MAPE<5% in laboratory or controlled conditions (Fokkema et al., 2017) and a MAPE<10% in free-living conditions (Chen et al., 2016; Crouter et al., 2003; Nelson et al., 2016; Tudor-Locke et al., 2006) as practically equivalent to the criterion measure. Anything over those thresholds was considered a practically relevant difference. We also summarized results from the Bland-Altman plots when presented (Bland and Altman, 1986).
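The rating bands and error definitions above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the review: the function names and the sample step counts are hypothetical, and averaging the per-observation percentage errors is an assumption about how the included studies aggregated their data. With the sign convention stated above (criterion minus tracker), a positive MPE means the tracker underestimated.

```python
from statistics import mean


def rate_cc(cc):
    """Map a correlation coefficient to the review's four rating bands."""
    if cc < 0.60:
        return "low"
    if cc < 0.75:
        return "moderate"
    if cc < 0.90:
        return "good"
    return "excellent"


def percentage_errors(criterion, tracker):
    """Signed error per observation: (criterion - tracker) / criterion * 100."""
    return [(c - t) / c * 100 for c, t in zip(criterion, tracker)]


def mpe(criterion, tracker):
    """Mean percentage error; over- and underestimates can cancel out."""
    return mean(percentage_errors(criterion, tracker))


def mape(criterion, tracker):
    """Mean absolute percentage error; captures magnitude regardless of sign."""
    return mean(abs(e) for e in percentage_errors(criterion, tracker))


def mape_acceptable(value, free_living=False):
    """Apply the <5% (controlled) or <10% (free-living) threshold."""
    return value < (10.0 if free_living else 5.0)


# Hypothetical step counts for three treadmill bouts (not data from the review).
criterion_steps = [100, 200, 400]
tracker_steps = [95, 210, 400]

print(round(mpe(criterion_steps, tracker_steps), 2))   # errors of +5% and -5% cancel
print(round(mape(criterion_steps, tracker_steps), 2))  # magnitude is still visible
print(rate_cc(0.82))  # good
```

The example shows why both statistics were abstracted: an MPE near zero can hide errors in opposite directions that the MAPE still reports.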

Reporting study quality is standard practice for systematic reviews. However, we could locate no assessment tools specific to testing validity and reliability of a device. Therefore, we developed a 10-item assessment, guided both by a paper describing reporting suggestions for wearable sensors (Duking et al., 2018) and a critical appraisal tool developed originally to assess the quality of cross-sectional studies (Downes et al., 2016). The questions asked:

Was the research question clearly stated?
Was the study population clearly defined?
Was the testing protocol clearly specified?
Is the way the tracker is worn on the wrist specified? (e.g., dominant or non-dominant hand, randomized)
Were free-living activities included in the protocol?
Were usability results presented?
Were the app set-up details described for the Garmin activity tracker?
Was the threat for specification error (gold standard not used) minimized?
Was intra-device reliability included?
Was inter-device reliability included?

Yes or no responses were recorded for all 10 items, with “yes” indicating higher study quality.
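As a rough illustration of how the 10 yes/no items could be recorded per study, the sketch below stores one answer per item and counts the "yes" responses. The item labels are our own shorthand, the example study is hypothetical, and summing the answers into a single count is an assumption made for illustration; the review itself reports the item-level responses.

```python
# Shorthand labels for the 10 quality items (labels are hypothetical).
QUALITY_ITEMS = (
    "research_question_stated",
    "population_defined",
    "protocol_specified",
    "wrist_placement_specified",
    "free_living_included",
    "usability_reported",
    "app_setup_described",
    "specification_error_minimized",
    "intra_device_reliability",
    "inter_device_reliability",
)


def count_yes(responses):
    """Return the number of 'yes' answers; every item must be answered."""
    missing = [item for item in QUALITY_ITEMS if item not in responses]
    if missing:
        raise ValueError(f"unanswered items: {missing}")
    return sum(1 for item in QUALITY_ITEMS if responses[item])


# Hypothetical laboratory-only validity study with no reliability testing.
example = {item: True for item in QUALITY_ITEMS}
example.update(
    free_living_included=False,
    usability_reported=False,
    intra_device_reliability=False,
    inter_device_reliability=False,
)

print(count_yes(example))  # 6
```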

Results

In total, the search captured 164 unique papers (including 3 papers identified using other sources), with 42 receiving full text review and 32 studies included in the review (Appendix 2). All 32 studies documented validity and 4 of these studies also documented reliability of Garmin activity trackers. Trackers assessed for validity included the Forerunner 225, 235, 305, 310XT, 910XT, and 920XT; Vivoactive; Vivofit, Vivofit 2; and Vivosmart, Vivosmart HR, and Vivosmart HR+ (Table 1). Trackers assessed for reliability included the Forerunner 305, Vivofit, and Vivosmart. All of these products were wrist-worn, with detailed descriptions found in Appendix 3. Although the search was not limited by age, all studies enrolled adults only.

Table 1:

Garmin studies of reliability and validity (listed by author’s last name and publication year)

Garmin Device	Validity or Reliability	Steps	Distance	Speed	Elevation	Energy Expenditure	Heart Rate	Sleep
Forerunner 225	Validity					Dooley 2017	Claes 2017; Dooley 2017
Forerunner 235	Validity						Gillinov 2017
Forerunner 305	Validity			Hovsepian 2014		Hongu 2013
	Reliability			Hovsepian 2014
Forerunner 310XT	Validity				Menaspa 2014
Forerunner 910XT	Validity				Ammann 2016
Forerunner 920XT	Validity	Wahl 2017	Wahl 2017			Roos 2017; Wahl 2017
Vivoactive	Validity	Wahl 2017	Wahl 2017			Wahl 2017
Vivofit	Validity	Alsubheen 2016; An 2017; Chen 2016; Ehrler 2016; El-Amrawy 2015; Huang 2016; O’Connell 2016; Simunek 2016; Wahl 2017	Huang 2016; Wahl 2017			Alsubheen 2016; Brooke 2017; Pribyslavska 2018; Price 2017; Wahl 2017; Woodman 2017		Brooke 2017
	Reliability	Chen 2016; O’Connell 2016
Vivofit 2	Validity	Gaz 2018; Hochsmann 2018; Leth 2017; Munck 2018; Wang 2017	Gaz 2018			Yavelberg 2018
Vivosmart, Vivosmart HR, and Vivosmart HR+	Validity	Fokkema 2017; Sears 2017; Wahl 2017	Wahl 2017			Boudreaux 2018; Reddy 2018; Wahl 2017	Boudreaux 2018; Reddy 2018	Lee 2018
	Reliability	Fokkema 2017


Studies were conducted in Australia (n=1), Belgium (n=1), Canada (n=3), China (n=2), Czech Republic (n=1), Denmark (n=2), Egypt (n=1), Germany (n=1), Ireland (n=1), Italy (n=1), the Netherlands (n=1), Switzerland (n=4), Taiwan (n=1), and the United States (n=13) (Table 2). One study reported two countries (Canada and United States) (Reddy et al., 2018). Data collection study dates ranged from 2014 to 2018, as well as one study in 2012 (Menaspà et al., 2014).

Table 2:

Characteristics of studies included in the systematic review (listed by author’s last name and publication year)

Author (Year)	Location of Lab or Recruitment Area	Sample Size for Validity and Reliability Studies^*	% Female	Mean Age (SD), Range	Mean body mass index (SD), range in kilograms/ meters squared	Data Collection Period	Inclusion Criteria	Features Tested	Number Garmin Features Tested	Number Devices Tested^**

Alsubheen (2016)	Newfoundland, Canada	13 (V)	38	40 (11.9)	27.0 (3.4)	2015	Apparently healthy adult using PARQ as a screener	EE, S	2	1
Ammann (2016)	Switzerland	3 (V)	0	25.5 (1.3)	not reported	not reported	Recreational runners, practicing endurance sports more than 300 minutes/week with differing heights	E	1	3
An (2017)	Omaha, Nebraska, USA	35 (V)	51	31.0 (11.8), 19–65	23.8 (3.1)	not reported	Apparently healthy, completed PARQ, able to walk/run safely on treadmill and around an indoor track, does not use a walking aid, not pregnant, does not have an implanted electromagnetic device	S	1	10
Boudreaux (2018)	Hammond, Louisiana, USA	50 (V)	56	Females 22.7 (3.0), Males 22.0 (2.7), All 18–35	Females 25.8 (4.8), Males 27.1 (3.6)	October 2015 to June 2016	No cardiovascular disease or musculoskeletal injury within the past 6 months	EE, HR	2	8
Brooke (2017)	Omaha, Nebraska, USA	95 (V)	64	28.5 (9.9), 19–60	25.7 (3.4), 17–34.3	not reported	Able to perform activities of daily living without limitations, completed PARQ, does not require walking aids or have walking impairments	EE, SL	2	8
Chen (2016)	Kaohsiung City, Taiwan	30 (V and R)	50	21.5 (2.0)	21.5 (1.9)	February 2015 to May 2015	At least 20 years old, normal body mass index, could ambulate without assistance, normal gait pattern	S	1	3
Claes (2017)	Leuven, Belgium	12 (V)	50	28.0 (4.79)	22.14 (3.46)	October 2015 to April 2016	Regularly physically active men or women between 20–40 years of age and no known musculoskeletal pathology or cardiovascular, respiratory or metabolic disease.	HR	1	1
Dooley (2017)	Austin, Texas, USA	62 (V)	58	22.6 (4.3), 18–38	24.6 (4.8), 17.1–45.0	not reported	Caffeine free for 12 hours, fasted for 3 hours, non-smoker, no disability contraindicated for exercise, and no tattoos, piercings, or braces where device would be worn	EE, HR	2	3
Ehrler (2016)	Geneva, Switzerland	21 (V)	57	34.5 (15.7)	not reported	not reported	Healthy volunteers, able to walk at least 500m and not have any walking disability	S	1	4
El-Amrawy (2015)	Alexandria, Egypt	4 (V)	0	26.5 (12.8)	not reported	March 2014-June 2015	Apparently healthy adult 22–36 years	S	1	17
Fokkema (2017)	Groningen, The Netherlands	30 to 31 (V and R)	48	32 (12)	22.6 (2.4)	Fall 2016	Apparently healthy adult volunteers	S	1	10
Gaz (2018)	Rochester, Minnesota, USA	32 (V)	69	36 (8), 26–56	26.8 (5.2), 18.2–41.8	not reported	No known orthopedic limitations, no absolute contraindications to physical activity, employees of the institution	D, S	2	6
Gillinov (2017)	Cleveland, Ohio, USA	25 (V)	54	38 (12)	25 (3.5)	June 2016 to August 2016	At least 18 years old, could safely perform an 18 minute exercise protocol, and no known cardiovascular or lung disease, presence of cardiac pacemaker, treatment with beta-blockers or heart rhythm medications, and self-reported chest pain, dizziness, or loss of balance	HR	1	6
Hochsmann (2018)	Basel, Switzerland	20 (V)	Group 1: 60; Group 2: 80	Group 1: 22, 21–23; Group 2: 53, 52–66	Group 1: 23, 21–25; Group 2: 24, 22–29	January 2017 to March 2017	Apparently healthy volunteers	S	1	7
Hongu (2013)	Tucson, Arizona, USA	16 (V)	56	Females: 22.6 (2.6); Males: 21.3 (1.5)	Females: 22.5 (1.4); Males: 22.8 (2.3)	not reported	Apparently healthy college students free from cardiovascular or metabolic diseases or physical impairments that would interfere with walking	EE	1	4+1
Hovsepian (2014)	La Crosse, Wisconsin, USA	13 (V and R)	not reported	25.3 (2.5)	not reported	not reported	At least 18 years and self-reported running an average of >=10 miles/week during the past year	S	1	2
Huang (2016)	Shanghai, China	40 (V)	25	23.9 (2.8)	21.4 (2.5)	September 2014 to October 2014	>18 years old, able to walk on flat ground for at least 10 minutes and up or down stairs for at least 6 minutes continuously, body mass index <32 kg/m2, and no previous history of injury or disease inhibiting normal gait	D, S	2	5
Lee (2018)	Omaha, Nebraska, USA	40 (V)	54	27.6 (11.0), 19–66	25.3 (4.6), 19.4–39.7	not reported	>=19 years old, no insomnia	SL	1	6
Leth (2017)	Aalborg, Denmark	22 (V)	50	31.1 (8.0), 22–52	not reported	November 2015 to June 2016	No walking disabilities that could lead to unnatural walking patterns	S	1	5+1
Menaspa (2014)	Varese, Italy (study 2)	1 (V)^#	not reported	not reported	not reported	September 2012	not reported	E	1	4
Munck (2018)	Aalborg, Denmark	22 (V)	50	27 (7.25), 21–49	25.0 (3.8), 20.1–36.4	November 2015	>18 years old, capable of understanding Danish, did not suffer from previous neurologic, musculoskeletal, or mental illness, no use of walking aids, not pregnant	S	1	6
O’Connell (2016)	Galway, Ireland	15 (V)	53	21.1 (1.1)	Females 21.9 (1.8), Males 23.6 (2.7)	February 2015 to July 2015	No history of cardiovascular disease or neurological disorder	S	1	4
Pribyslavska (2018)	Murfreesboro, Tennessee, USA	34 (V)	32	25.8 (4.9)	24.4 (4.4)	Fall 2016- Spring 2017	classified as either low or moderate risk according to the American College of Sports Medicine cardiovascular risk classification, physically active	EE	1	3
Price (2017)	Melbourne, Australia	14 (V)	21	23.0 (6.0)	22.8 (2.6)	September 2014 to September 2015	Able to walk and run continuously on a treadmill unaided, healthy and free of factors associated with exercise risk as determined through standard screening procedures	EE	1	3
Reddy (2018)	Portland, Oregon, USA and Toronto, Canada	20 (V)	55	27.5 (6.0)	22.5 (2.3)	December 2017- February 2018	Healthy adults, screening used PARQ	EE, HR	2	2+1
Roos (2017)	Magglingen, Switzerland	20 (V)	40	23.9 (1.9)	not reported	January 2016 to March 2016	Recreational or competitive runner, no injury to lower extremities within past year.	EE	1	3
Sears (2017)	Buies Creek, North Carolina, USA	10 (V)	50	23.3 (5.2), 18–40	not reported	Spring 2016	Recreationally active, low- or moderate-risk for cardiovascular disease	S	1	5
Simunek (2016)	Olomouc, Czech Republic	20 (V)	30	34.0 (6.3), 25–52	24.3 (4.0)	December 2014 to February 2015	No history of injury or illness affecting mobility	S	1	2+2
Wahl (2017)	Cologne, Germany	20 (V)	50	Females: 24.2 (1.9), Males: 26.1 (2.8)	not reported	not reported	Apparently healthy and active sport students	D, EE, S	3	11
Wang (2017)	Hangzhou, China	9 (V)	44	22.0 (1.0)	not reported	Spring 2015	Apparently healthy participants	S	1	7
Woodman (2017)	Knoxville, Tennessee, USA	28 (V)	29	25.5 (3.7)	24.9 (2.6)	January 2015 to May 2015	Completed PARQ, not currently pregnant, obese, or have orthopedic or musculoskeletal issues that would limit activity, able to run on treadmill for 5 min at 134.1m/min with 0% grade	EE	1	5
Yavelberg (2018)	Toronto, Ontario, Canada	25 (V) but smaller sample wore Garmin	44	25.0 (7.6), 18–55	Females without diabetes 23.8 (2.7), Females with diabetes 24.7 (1.5), Males without diabetes 26.0 (2.2), Males with diabetes 23.6 (2.2)	2014 to 2016	At least 16 years, otherwise healthy and active with moderate to high levels of physical activity. Eight participants had a diagnosis of type 1 diabetes.	EE	1	5


Abbreviations: D, distance; E, elevation; EE, energy expenditure; HR, heart rate; PARQ, Physical Activity Readiness Questionnaire; R, reliability; SL, sleep; SD, standard deviation; S, steps; V, validity

^* The sample size was based on the number of participants in the article who tested a Garmin device. For some studies, this was less than the full sample described for gender, age, and body mass index.

^** The number of devices tested is listed as two numbers if the gold standard assessment included a device (e.g., accelerometer, activity tracker).

^# Only results from study 2 were included, since study 1 did not use an activity tracker.

The sample size ranged from 1 (Menaspà et al., 2014) to 95 (Brooke et al., 2017) for validity and 4 (O’Connell et al., 2016) to 31 (Fokkema et al., 2017) for reliability testing. The mean percentage of female participants ranged from 0 (Ammann et al., 2016) to 80 (Hochsmann et al., 2018). The assessment of steps, distance, speed, elevation, energy expenditure, heart rate, and sleep is summarized next, with reliability presented first followed by validity evidence. Study quality, along with the questions used for the assessment, is reported in Appendix 4 for each study.

Steps

An assessment of inter-device reliability of steps from 30 Vivofits indicated very small mean differences while on the treadmill (0 to 5 step mean difference over 5 minutes at each of four speeds), but larger differences when compared to carrying a bag (16 step mean difference over 5 minutes) or pushing a stroller (37 step mean difference over 5 minutes) (Appendix 5) (Chen et al., 2016). Another assessment of inter-device reliability of steps from 4 Vivofits indicated a 13.7% difference between units (O’Connell et al., 2016). An assessment of intra-device reliability (n=30–31), comparing steps from the same Vivosmart at two different treadmill sessions, indicated an acceptable MAPE (1.2–3.5%) during three treadmill speeds, but a larger variation in intraclass correlation coefficients (ICCs; 0.51 to 0.79) (Fokkema et al., 2017).

Sixteen studies assessed the validity of Garmin activity trackers for measuring steps, including the Forerunner 920XT (Wahl et al., 2017), Vivoactive (Wahl et al., 2017), Vivofit (Alsubheen et al., 2016; An et al., 2017; Chen et al., 2016; Ehrler et al., 2016; El-Amrawy and Nounou, 2015; Huang et al., 2016; O’Connell et al., 2016; Simunek et al., 2016; Wahl et al., 2017), Vivofit 2 (Gaz et al., 2018; Hochsmann et al., 2018; Leth et al., 2017; Munck et al., 2018; Wang et al., 2017), and Vivosmart with (Sears et al., 2017) and without heart rate (Fokkema et al., 2017; Wahl et al., 2017) (Appendix 6). Assessments occurred mostly in the laboratory, although some studies included field-based testing or at-home monitoring (An et al., 2017; Gaz et al., 2018; Huang et al., 2016; Simunek et al., 2016; Wahl et al., 2017; Wang et al., 2017). Tracker-measured steps were compared against criterion measures that included video observation (Alsubheen et al., 2016; Chen et al., 2016; Ehrler et al., 2016; Hochsmann et al., 2018; Huang et al., 2016; O’Connell et al., 2016; Wahl et al., 2017), a gait measurement and analysis device (Wahl et al., 2017), hand-tally of steps (An et al., 2017; El-Amrawy and Nounou, 2015; Fokkema et al., 2017; Gaz et al., 2018; Munck et al., 2018; Sears et al., 2017), a pedometer (An et al., 2017; Simunek et al., 2016), and an accelerometer (Leth et al., 2017; Simunek et al., 2016; Wang et al., 2017).

Generally the activity trackers underestimated steps taken on the treadmill (Alsubheen et al., 2016; Chen et al., 2016; Gaz et al., 2018; Hochsmann et al., 2018), except while on an incline (Alsubheen et al., 2016). Agreement, as indicated by CC between the Garmin activity trackers and walking or running on the treadmill, was good to excellent for the Forerunner 920XT (Wahl et al., 2017), Vivosmart (Fokkema et al., 2017; Wahl et al., 2017), Vivofit (Wahl et al., 2017), and Vivoactive (Wahl et al., 2017), but lower for the Vivosmart HR at 3.5 and 4.0 mph (Sears et al., 2017) (Figure 1a). The CC were lower with faster speed only for the Vivofit (Wahl et al., 2017) and Vivosmart HR (Sears et al., 2017).

Figure 1: — Correlation coefficients and mean absolute percentage error (MAPE) for steps taken on the treadmill at zero percent grade measured with Garmin activity trackers

MAPE was acceptable (<5%) at treadmill speeds of 2 to 3 mph across activity trackers (An et al., 2017; Fokkema et al., 2017; Hochsmann et al., 2018; Wahl et al., 2017; Wang et al., 2017) (Figure 1b). Between 3.1 and 4.0 mph, the MAPE exceeded 5% in two studies (An et al., 2017; Fokkema et al., 2017), but not in others (Chen et al., 2016; Hochsmann et al., 2018). Between 4.1 and 8.1 mph, the MAPE never exceeded 5% (An et al., 2017; Chen et al., 2016; Wahl et al., 2017). However, other studies found higher error with slower walking speeds (Ehrler et al., 2016; Munck et al., 2018).

Other studies explored validity of the activity trackers to assess steps beyond the treadmill. The Vivofit underestimated steps when walking on flat ground and upstairs, but overestimated walking downstairs (Huang et al., 2016). Two other validation studies reported excellent agreement for above-ground walking for the Vivofit (El-Amrawy and Nounou, 2015) and Vivofit 2 (Leth et al., 2017). For the Vivofit, MAPE was acceptable (<5%) for slower but not faster speeds on the track (An et al., 2017), while another study found acceptable MAPE across a variety of activities except when pushing a stroller (Chen et al., 2016). One study tested various surfaces and found that steps on the Vivofit varied slightly across surfaces (e.g., natural lawn, gravel, linoleum, asphalt, ceramic tile) but the MAPE remained acceptable (O’Connell et al., 2016). In a study wherein participants wore the Vivofit at home, MAPE was large (17.8%), but the Pearson CC to another device (New Lifestyles pedometer) was excellent (An et al., 2017). In another study where the Vivofit and Yamax pedometer were worn for one week, the Vivofit underestimated daily steps (Simunek et al., 2016).

Distance

No studies reporting on reliability of Garmin-measured distance were identified. Three studies assessed validity of the Garmin activity trackers to assess distance including the Forerunner 920XT (Wahl et al., 2017), the Vivoactive (Wahl et al., 2017), the Vivofit (Huang et al., 2016; Wahl et al., 2017), the Vivofit 2 (Gaz et al., 2018), and the Vivosmart (Wahl et al., 2017) (Appendix 6). Criterion assessments included both known treadmill distance (Gaz et al., 2018; Huang et al., 2016; Wahl et al., 2017) and measured outdoor distance (Gaz et al., 2018; Huang et al., 2016; Wahl et al., 2017).

Generally, the CC for assessing distance were poor (Figure 2). Starting with the most comprehensive study that included four Garmin activity trackers, distance was overestimated at slower treadmill speeds and underestimated at faster treadmill speeds (Wahl et al., 2017). Another study indicated that the Vivofit overestimated distance during level walking, with the MPE highest at slower walking speeds, and greatly overestimated distance when traveling both up and down stairs (Huang et al., 2016). A third study also found that distance was overestimated at slower treadmill speeds, but underestimated when participants walked on their own (Gaz et al., 2018).

Energy Expenditure

No studies reporting on reliability of Garmin-measured energy expenditure were identified. Twelve studies assessed validity of the Garmin activity trackers to assess energy expenditure including the: Forerunner 225 (Dooley et al., 2017), Forerunner 305 (Hongu et al., 2013), Forerunner 920XT (Roos et al., 2017; Wahl et al., 2017), Vivoactive (Wahl et al., 2017), Vivofit (Alsubheen et al., 2016; Brooke et al., 2017; Pribyslavska et al., 2018; Price et al., 2017; Wahl et al., 2017; Woodman et al., 2017), Vivofit 2 with a chest strap (Yavelberg et al., 2018), and Vivosmart with (Boudreaux et al., 2018; Reddy et al., 2018) and without heart rate (Wahl et al., 2017) (Appendix 7).

Generally, CC comparing agreement ranged from low to substantial (Boudreaux et al., 2018; Brooke et al., 2017; Price et al., 2017; Reddy et al., 2018; Wahl et al., 2017), with high variability across devices and studies (Figure 3a). In most cases, the MAPE was unacceptable (Figure 3b) (Boudreaux et al., 2018; Brooke et al., 2017; Dooley et al., 2017; Pribyslavska et al., 2018; Reddy et al., 2018; Roos et al., 2017; Wahl et al., 2017; Woodman et al., 2017). The MPE was also large for many different activities (Pribyslavska et al., 2018; Reddy et al., 2018). Three studies not reporting CC or MAPE found large mean differences between the Garmin assessment of energy expenditure and the criterion measure during physical activity (Alsubheen et al., 2016; Hongu et al., 2013; Yavelberg et al., 2018).

Figure 3: — Correlation coefficients and mean absolute percentage error (MAPE) for energy expenditure measured with Garmin activity trackers

Footnote: Rest= 1; Activities of daily living= 2; Resistance training= 3; Walking=4; Running= 5; Running maximal= 6; Cycling=7; Cycling maximal= 8; Two days of wear= 9; Intermittent activity= 10; Outdoor activity= 11

Speed

An assessment of intra-device reliability of speed from the Forerunner 305 indicated good to excellent agreement, with ICCs ranging from 0.84 to 0.99 while running under different conditions on a track (Appendix 5) (Hovsepian et al., 2014). This was also the only study to report validity of speed measurement, compared to recordings on a track using photoelectric timing lights. For 13 participants, the Forerunner generally slightly underestimated speed (Appendix 8), with the agreement ranging from good to excellent.

Elevation

No studies reporting on reliability of Garmin-measured elevation were identified. Two studies assessed the validity of elevation measurement using the Forerunner 310XT (Menaspà et al., 2014) and Forerunner 910XT (Ammann et al., 2016) (Appendix 8). In the earlier study, a Forerunner and two SRM PowerControl 7 devices mounted to a car roof rack were compared over 6 tests, repeating the same 16 kilometer mountain climb at different times of day and weather conditions (Menaspà et al., 2014). The Forerunner overestimated elevation, with smaller differences found when elevation correction was not used. The latter study conducted 40 trials for 3 participants using four speeds on a level track, with any elevation gained assumed to be error (Ammann et al., 2016). Across the four speeds, the hip recording (secured by using the wrist strap mounted to a waist-worn belt) produced less elevation gained compared to the wrist recording. At the wrist, where 15% of recordings were outliers, error was higher as speed increased.

Heart Rate

No studies reporting on reliability of Garmin-measured heart rate were identified. Five studies reported on validity using the Forerunner 225 (Claes et al., 2017; Dooley et al., 2017), Forerunner 235 (Gillinov et al., 2017), and the Vivosmart HR+ (Boudreaux et al., 2018; Reddy et al., 2018) (Appendix 8). Two studies used a Polar chest transmitter to assess heart rate as the criterion measure (Dooley et al., 2017; Reddy et al., 2018), while three studies used a 3- to 12-lead electrocardiogram (ECG) (Boudreaux et al., 2018; Claes et al., 2017; Gillinov et al., 2017).

Three studies assessed the Forerunner tracker, with CC lower for activities that used arms (e.g., elliptical), but higher for rest and treadmill locomotion on flat or elevated grades (Claes et al., 2017; Gillinov et al., 2017) (Figure 4a). However, all MAPE exceeded 5% across rest and various laboratory activities (Figure 4b) (Dooley et al., 2017; Gillinov et al., 2017). For example, 25 participants in a laboratory-based study assessed heart rate using the Forerunner 235 compared to a 12-lead ECG (Gillinov et al., 2017). The MAPE was 6% at rest, and was higher with increasing intensity, particularly when arm movement was involved. Based on the Bland-Altman plots, heart rate varied widely across the range of intensity, with 95% of the values falling between −27 and 33 beats/minute of the ECG value.
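The interval quoted from the Bland-Altman plot corresponds to the 95% limits of agreement, conventionally computed as the mean difference plus or minus 1.96 standard deviations of the differences (Bland and Altman, 1986). A minimal sketch follows; the function name, the tracker-minus-criterion sign convention, and the paired heart rate values are all assumptions for illustration and are not data from any included study.

```python
from statistics import mean, stdev


def limits_of_agreement(tracker, criterion):
    """Return (bias, lower, upper) for paired measurements.

    Differences are tracker minus criterion (an assumed convention), so a
    negative bias means the tracker read lower than the criterion on average.
    """
    diffs = [t - c for t, c in zip(tracker, criterion)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample standard deviation
    return bias, bias - half_width, bias + half_width


# Hypothetical paired heart rates in beats/minute (not from the review).
tracker_hr = [70, 82, 95, 120]
ecg_hr = [72, 80, 100, 118]

bias, lower, upper = limits_of_agreement(tracker_hr, ecg_hr)
print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")
```

Wide limits, such as the interval reported above, indicate that individual readings can be far from the ECG value even when the average bias is small.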

Figure 4: — Correlation coefficients and mean absolute percentage error (MAPE) for heart rate measured with Garmin activity trackers

Footnote: Rest= 1; Activities of daily living= 2; Resistance training= 3; Walking=4; Elliptical (no arms)= 5; Elliptical (with arms)= 6; Treadmill= 7; Walking with grade= 8; Running= 9; Running maximal= 10; Cycling= 11; Cycling maximal= 12

Two studies assessed heart rate recordings using the Vivosmart, with CC varying widely across activities and the MAPE exceeding 5% in all cases (Figure 4) (Boudreaux et al., 2018; Reddy et al., 2018), with the MPE and Bland Altman plots indicating generally an underestimate of heart rate (Reddy et al., 2018). In one study (Reddy et al., 2018), heart rate assessment was best when the activity mode setting was used. In addition, this study assessed the Vivosmart HR+ while off the body, simulating motion on a shaker table, and found spurious heart rate recordings. In the second study comparing to ECG recorded heart rate, the Vivosmart heart rate values differed from the ECG heart rate values for 10 of the 12 resistance exercises, underestimating heart rate during all 12 of them (Boudreaux et al., 2018).

Sleep

No studies reporting on reliability of Garmin-measured sleep were identified. Two studies assessed validity using the Vivofit (Brooke et al., 2017) and the Vivosmart (Lee et al., 2018) (Appendix 8). The earlier study included 95 participants who wore the Vivofit for two days, enabled sleep mode at bedtime, and kept a sleep log as the criterion measure (Brooke et al., 2017). Mean sleep time was similar between measures, with good CC and acceptable MAPE. The latter study included 40 participants who wore the Vivosmart (Lee et al., 2018). Mean sleep time was overestimated, with low agreement compared to diary measures. Other measures of sleep (e.g., time in bed, sleep efficiency, wake after sleep onset) were also not well measured.

Discussion

This review summarized the evidence for validity and reliability of Garmin activity trackers, identifying 32 studies published between 2013 and 2018. Specifically, the features of steps, distance, energy expenditure, speed, elevation, heart rate, and sleep were reviewed, with limited studies on reliability and variation in validity findings. All studies enrolled adults only.