A Device Agnostic Approach to Predict Children’s Activity from Consumer Wearable Accelerometer Data: A Proof-of-Concept Study

R Glenn Weaver; James White; Olivia Finnegan; Srihari Nelakuditi; Xuanxuan Zhu; Sarah Burkart; Michael Beets; Trey Brown; Russ Pate; Gregory J Welk; Massimiliano de Zambotti; Rahul Ghosal; Yuan Wang; Bridget Armstrong; Elizabeth L Adams; Layton Reesor-Oyer; Christopher D Pfledderer; Meghan Bastyr; Lauren von Klinggraeff; Hannah Parker

doi:10.1249/MSS.0000000000003294

Author manuscript; available in PMC: 2025 Feb 1.

Published in final edited form as: Med Sci Sports Exerc. 2023 Sep 12;56(2):370–379. doi: 10.1249/MSS.0000000000003294

PMCID: PMC10841245 NIHMSID: NIHMS1928270 PMID: 37707503

Abstract

Introduction:

This study examined the potential of a device agnostic approach for predicting physical activity from consumer wearable accelerometry compared to research-grade accelerometry.

Methods:

Seventy-five 5–12-year-olds (58% male, 63% White) participated in a 60-minute protocol. Children wore wrist-placed consumer wearables (Apple Watch Series 7 and Garmin Vivoactive 4) and a research-grade device (ActiGraph GT9X) concurrently with an indirect calorimeter (Cosmed K5). Activity intensities (i.e., inactive, light, moderate-to-vigorous physical activity [MVPA]) were estimated via indirect calorimetry (criterion), and the Hildebrand thresholds were applied to the raw accelerometer data from the consumer wearables and research-grade device. Epoch-by-epoch (e.g., weighted sensitivity, specificity) and discrepancy (e.g., mean bias, absolute error) analyses evaluated agreement between accelerometry-derived and criterion estimates. Equivalence testing evaluated the equivalence of estimates produced by the consumer wearables and ActiGraph.

Results:

Estimates derived from the raw accelerometry data from ActiGraph, Apple, and Garmin showed similar criterion agreement, with weighted sensitivity=68.2% (95CI=67.1%, 69.3%), 73.0% (95CI=71.8%, 74.3%), and 66.6% (95CI=65.7%, 67.5%), respectively; and weighted specificity=84.4% (95CI=83.6%, 85.2%), 82.0% (95CI=80.6%, 83.4%), and 75.3% (95CI=74.7%, 75.9%), respectively. Apple Watch produced the lowest mean bias (inactive=−4.0±4.5, light activity=2.1±4.0) and absolute error (inactive=4.9±3.4, light activity=3.6±2.7) for inactive and light physical activity minutes. For MVPA, ActiGraph produced the lowest mean bias (1.0±2.9) and absolute error (2.8±2.4). No consumer wearable device estimates were statistically significantly equivalent to the ActiGraph estimates.

Conclusions:

Estimates of inactive and light activity from raw accelerometry collected via wrist-placed consumer wearables performed similarly to, if not better than, estimates from a research-grade device when compared to indirect calorimetry. This proof-of-concept study highlights the potential of device-agnostic methods for quantifying physical activity intensity via consumer wearables.

Keywords: Physical activity tracking, free-living activity monitoring, validation, open source

INTRODUCTION

Nearly all consumer wearable devices (e.g., Apple Watch, Garmin) include accelerometers. This, coupled with their design for 24/7 wear (i.e., waterproof, small design, and comfortable fit on the wrist), makes them ideal for research on children’s free-living movement behaviors (1). Consumer wearable devices also frequently have a long battery life (i.e., up to 54 days) (2) and support remote data capture and monitoring. The utility of consumer wearable devices is evidenced by their widespread uptake by researchers (3) and consumers alike (4). For instance, the NIH-funded All of Us Research Program, which aims to enroll 1 million participants, recently incorporated a consumer wearable device as a measure of physical activity (5).

However, a key limitation of consumer wearable devices is that the metrics produced by these devices rely on proprietary algorithms that are not available for review by researchers (6). This is problematic because it is unclear how these algorithms were developed and what sensors they rely on for these metrics. This makes it impossible to examine the limitations of these algorithms and to apply them appropriately in a research setting. Additionally, the metrics yielded from these algorithms could be updated at any time without notification from the company. This makes comparing findings collected from the same device across time problematic because changes in the physical activity metrics may be, at least partially, attributable to changes in the underlying algorithms that produce these metrics.

Adopting a device-agnostic approach offers one solution to this problem (7, 8). A device-agnostic approach relies on the underlying data collected via sensors rather than the metrics produced by the proprietary algorithms on these devices. In this approach, the raw sensor data are accessed, stored, and then processed via an open-source algorithm. This is possible because many consumer wearable devices now allow access to the raw data collected via the sensors on board, including the x-, y-, and z-axis accelerometry data in ɡ’s (hereafter referred to as raw accelerometry data), via application programming interfaces (9). This approach has the potential to leverage the strengths of consumer wearable devices while overcoming the major limitation associated with these devices. All previous studies have examined the validity of the physical activity metrics produced by the proprietary algorithms of consumer wearables (10–12), while no studies have explored the validity of applying a device-agnostic approach to predicting activity intensity levels of children using consumer wearable devices. Therefore, the purpose of this proof-of-concept study was to explore the potential of a device agnostic approach when applied to consumer wearable devices and to evaluate the agreement of physical activity estimates produced by raw accelerometry data collected via two widely used consumer wearable devices and a research-grade device compared to a criterion of indirect calorimetry. To accomplish this purpose, the same cut-points were applied to the raw acceleration data collected from the consumer wearable and research-grade devices to produce estimates of inactive, light, and moderate-to-vigorous physical activity (MVPA). Error in these estimates was then quantified by comparing activity intensity estimates to a criterion of indirect calorimetry. Differences in error were compared across the consumer wearable and research-grade devices to ascertain the feasibility of a device agnostic approach. The equivalence of physical activity estimates produced by raw accelerometer data collected from the consumer wearable devices and a research-grade device was then explored.

METHODS

Setting and Participants.

The current study took place in one southeastern state in the U.S. from May to September of 2022. A total of 89 children were recruited to participate in the study with 75 included in the final analytic sample. Data from participants were not included in the final analytic sample for the following reasons: three children did not show for their scheduled protocol, three children withdrew prior to completing the protocol, the K5 indirect calorimetry device failed to collect data on two children, the Apple device failed to collect data on three children, the Garmin device failed to collect data on one child, and both the Apple and Garmin failed to collect data on two children.

Participants were children (5–12 years old) recruited from a summer day camp operated by a local recreation department. Parents received informational fliers about the study protocols during drop-off and pickup at the camp. Interested parents completed an informed consent document. Children with parental consent were provided with additional information and provided verbal assent prior to participation in the study. To be included, children had to be age 5 to 12 years and have the ability to be physically active without an assistive device, such as a wheelchair. Study procedures were approved by the first author’s institutional review board prior to enrollment of the first participant. Participants received a $40 gift card upon completion of the study protocol.

Procedures

Children participated in the study procedures at the summer day camp. Upon arrival at the day camp, children were fitted with all devices (see Supplemental Figure 1, Supplemental Digital Content, Placement of devices) by trained research assistants. Participants then engaged in a series of simulated free-living activities that ranged from inactive to vigorous intensity based on the Youth Compendium of Physical Activities (13). A detailed description of the activities is provided in Supplemental Table 1 (Supplemental Digital Content, Simulated free-living protocol). The first activity was a 10-minute supine rest. Children lay on their backs on a blanket while watching a video of their choice on a tablet computer. The tablet computer was held by the children. Following the rest session, children engaged in two blocks of activities. The first block included four activities, lasting five minutes each, that progressed in intensity from inactive to vigorous. Following this block of activities, children were randomly assigned to engage in one additional block of four activities, lasting five minutes each, that ranged in intensity from inactive to vigorous. There were a total of three additional blocks to which a child could be randomized. These activities were designed to simulate common activities that children engage in during their everyday life (see Supplemental Table 1, Supplemental Digital Content). Activities in the second block were conducted in a random order to simulate free-living activity.

Measures

Portable Cosmed K5 (criterion measure of activity intensity).

The COSMED K5 portable metabolic system (COSMED, Rome, Italy) was used to estimate breath-by-breath oxygen consumption via expired respiratory gases. Pediatric face masks of various sizes, which are suitable for children as young as 3 years of age, were used (14). All calibration steps specified by the manufacturer were carried out prior to each protocol. These included flowmeter and gas calibration (including scrubber, reference gas, room air, and delay). Sampling lines were checked weekly for cracks or discoloration, and turbines were cleaned and examined for proper function after each protocol. Prior to calibration, the K5 was turned on and allowed to warm up for an hour. The K5 was placed on the upper back of the participant by using a harness to ensure it was secure throughout testing.

Research-Grade and Consumer Wearable Devices.

Children wore three devices on their non-dominant wrist while simultaneously wearing the COSMED K5 during the research study. The devices included a research-grade ActiGraph GT9X Link (v6.13.4, ActiGraph; ActiGraph LLC, Pensacola, FL), and two consumer wearable devices: an Apple Watch Series 7 (Apple; Apple Technology Company, Cupertino, CA) and a Garmin Vivoactive 4 (Garmin; Garmin Ltd., Olathe, KS). The dynamic range of the ActiGraph GT9X Link accelerometer is ±8g (15). To the authors’ knowledge, the dynamic accelerometer range of the consumer wearable devices is not publicly documented. However, based on the current study, the dynamic range of Apple appears to be ±16g while Garmin is ±8g. Device order of placement on the child’s wrist was counterbalanced for the consumer wearable devices. The ActiGraph was always worn proximally and the counterbalanced consumer devices were always worn distally. This decision was made to ensure device placement on the wrist did not impact comparisons between the research-grade ActiGraph and consumer wearable devices. The ActiGraph device was initialized to collect data at 50Hz with idle sleep mode enabled. Idle sleep mode was enabled in order to emulate parameters often used for longer duration assessments in free-living physical activity research in children that require up to 14-day wear protocols (16). All devices were placed on the child by a trained research assistant. Accelerometry data points were extracted from the research-grade device via the Actilife software (ActiGraph LLC, version v6.13.4, Pensacola, FL). The accelerometry data were extracted from the consumer wearable devices via user-written apps that leverage the device-specific application programming interface to collect the underlying sensor data on the respective devices. RawLogger (version 1.0.20211201a) was used for the Garmin device while SensorLog (version 5.2) was used for the Apple device. RawLogger is available for download through the Connect IQ™ store on the Garmin Connect™ app, and SensorLog is available for download through the App Store. RawLogger allows researchers to export accelerometer data from the Garmin at 25Hz, while SensorLog allows Apple accelerometer data to be exported at 50Hz.

Data Processing.

Prior to analyses, all data from all devices were trimmed to include only time between the beginning and end of the protocol as documented by research assistants. Data from the consumer wearable and research-grade accelerometry devices and criterion COSMED K5 were then temporally aligned and merged at the second level. The research-grade accelerometer and COSMED K5 were initialized on the same computer, ensuring that clocks were synchronized. The consumer wearable devices were initialized on the same phone, ensuring that clocks were synchronized. Research assistants checked that the computer and phone clocks were synchronized to the second at the beginning of every protocol. Data were classified as inactive, light, or MVPA using standardized thresholds for the COSMED and accelerometry. Similar to past research (17–19), resting VO₂ was estimated for each child by taking the average VO₂ during the middle 5 minutes of the 10-minute resting period. The VO₂ estimates were then converted into youth metabolic equivalents for each child by dividing the estimated VO₂ for each activity by the calculated resting VO₂ for each child. Metabolic equivalents were then converted into activity intensity using standardized activity intensity thresholds of 1.5 metabolic equivalents for inactive (20), and 3.0 metabolic equivalents for MVPA (21). For accelerometry, data were converted into Euclidian norm minus one (ENMO) values by calculating the vector magnitude of each accelerometer X-, Y-, and Z-axis reading (i.e., at the Hz level) and subtracting one (i.e., 1 g), with values less than 0 rounded up to 0 (22). Euclidian norm minus one was selected for a variety of reasons, not the least of which is that it is the default data metric in GGIR (23), one of the most widely used data processing platforms for accelerometer data in physical activity research. As described by GGIR (24), ENMO has shown the ability to describe the variance in energy expenditure, has been correlated with questionnaire data, and is able to describe patterns in physical activity. Unlike other summary metrics of accelerometry, ENMO is also easy to describe mathematically, thus improving reproducibility. In addition, ENMO has been shown to have similar validity to other more complicated filter-based metrics (25). Consistent with common processing practice in physical activity research (26), Euclidian norm minus one values were autocalibrated to account for local gravity and temperature as described elsewhere (22, 24). The established Hildebrand absolute intensity thresholds were applied to the raw accelerometry data collected from the consumer wearable and research-grade devices to classify inactive (<35.6mg), light (35.6mg-201.4mg), and MVPA (>201.4mg) (17, 27). To account for the intermittent, sporadic and transitory patterns of children’s physical activity (28–30), all data streams (i.e., accelerometer and breath-by-breath COSMED) were averaged across 15-second windows.
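The processing steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, autocalibration is omitted, dropping a trailing partial epoch is an assumption, and the handling of values that fall exactly on the 1.5 and 3.0 metabolic equivalent boundaries is also an assumption because the text does not specify it.

```python
import math

# Hildebrand absolute intensity thresholds quoted in the text, in milli-g (mg).
INACTIVE_MAX_MG = 35.6
LIGHT_MAX_MG = 201.4


def enmo_mg(x, y, z):
    """ENMO for one raw sample (axes in g): vector magnitude minus 1 g,
    with negative values set to 0, returned in mg."""
    magnitude = math.sqrt(x * x + y * y + z * z)
    return max(magnitude - 1.0, 0.0) * 1000.0


def epoch_means(values, sample_rate_hz, epoch_seconds=15):
    """Average a per-sample series over consecutive, non-overlapping epochs.
    A trailing partial epoch is dropped (assumption, not stated in the text)."""
    per_epoch = int(sample_rate_hz * epoch_seconds)
    n_epochs = len(values) // per_epoch
    return [
        sum(values[i * per_epoch:(i + 1) * per_epoch]) / per_epoch
        for i in range(n_epochs)
    ]


def classify_enmo(epoch_mg):
    """Inactive (<35.6 mg), light (35.6-201.4 mg), MVPA (>201.4 mg)."""
    if epoch_mg < INACTIVE_MAX_MG:
        return "inactive"
    if epoch_mg <= LIGHT_MAX_MG:
        return "light"
    return "mvpa"


def classify_mets(vo2, resting_vo2):
    """Criterion intensity from child-specific METs (VO2 / resting VO2).
    Boundary handling (<1.5 inactive, >=3.0 MVPA) is an assumption."""
    mets = vo2 / resting_vo2
    if mets < 1.5:
        return "inactive"
    if mets < 3.0:
        return "light"
    return "mvpa"
```

Because the same `enmo_mg` and `classify_enmo` steps apply to any triaxial stream, only `sample_rate_hz` changes between devices in this sketch (25 for the Garmin export and 50 for the Apple and ActiGraph data, per the text).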

Power

A two-step approach was adopted to determine adequate sample size for the present study. First, best practice in measurement studies calls for sufficient numbers of participants (i.e., 10/group) to represent different ages (31–33). For our study, the age range was 5–12 years, and thus, a sample of 70 children was necessary to include ~10 children per age. Second, a power analysis was completed to identify adequate power to detect equivalence of inactive, light, and MVPA estimates produced by the raw data collected from the consumer wearable devices compared to the research-grade device (see Analyses section below). For an equivalence test, power is determined by identifying the likelihood that the difference between two estimates is within prespecified equivalence bounds (34). Power is then determined based upon the smallest acceptable width of the equivalence bounds. For the current study, power was calculated to detect equivalence between devices for estimates of light activity and MVPA. This decision was made because these estimates would require the most precise equivalence bounds, given that, by protocol design, children would spend less time in light activity and MVPA than inactive. With a sample of 75 participants, assuming an alpha of 0.05, and a standard deviation of the difference between MVPA estimates produced by data collected via the consumer wearables and research-grade devices of 4 minutes, the study was adequately powered (power=0.8) to detect equivalence bounds from −1.0 to 1.0 minutes using standard statistical tests.

Analyses

All analyses were completed via Stata (version 16.1, StataCorp LLC, College Station, Texas). Consistent with best practice (35–37), analyses for the current study were conducted to explore epoch-by-epoch and overall discrepancy for the activity intensity estimates produced by the raw accelerometry data collected via the consumer wearable and research-grade devices compared to the activity intensity estimates produced by the criterion indirect calorimetry device. For the epoch-by-epoch analysis, confusion matrices were constructed, and sensitivity and specificity were calculated for inactive, light, and MVPA against all other activity intensities for each individual participant. Overall weighted sensitivity and specificity were computed by calculating the mean binary metric weighted by the number of samples in each activity intensity for each individual participant. Weighted sensitivity and specificity were calculated to account for the imbalance in epochs classified as inactive, light, and MVPA (38). Following the calculation of the individual metrics an overall mean and standard deviation were then calculated from individual participant estimates to complete a confusion matrix for the entire sample. An overall mean and a 95% confidence interval from the individual participant estimates were also calculated for sensitivity and specificity.
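As an illustration of the epoch-by-epoch metrics for a single participant, the following Python sketch computes one-vs-rest sensitivity and specificity and a weighted summary. It is not the authors' Stata code: the function names are hypothetical, and weighting each class by its number of criterion epochs is one reading of "weighted by the number of samples in each activity intensity".

```python
CLASSES = ("inactive", "light", "mvpa")


def one_vs_rest(criterion, predicted, target):
    """Sensitivity and specificity for `target` against all other classes.
    Returns None for a metric whose denominator is zero."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(criterion, predicted):
        if truth == target:
            if guess == target:
                tp += 1
            else:
                fn += 1
        elif guess == target:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


def weighted_metrics(criterion, predicted):
    """Class-wise metrics averaged with weights equal to the number of
    criterion epochs in each class (assumed weighting scheme)."""
    sens_num = spec_num = 0.0
    sens_den = spec_den = 0
    for label in CLASSES:
        n = criterion.count(label)
        sensitivity, specificity = one_vs_rest(criterion, predicted, label)
        if sensitivity is not None:
            sens_num += n * sensitivity
            sens_den += n
        if specificity is not None:
            spec_num += n * specificity
            spec_den += n
    weighted_sens = sens_num / sens_den if sens_den else None
    weighted_spec = spec_num / spec_den if spec_den else None
    return weighted_sens, weighted_spec
```

In the study design described above, these per-participant values would then be averaged across children to obtain the sample mean and 95% confidence interval.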

Following the epoch-by-epoch analysis, a discrepancy analysis was completed comparing the overall minutes classified as inactive, light, or MVPA by the data collected from the research-grade and consumer wearable devices compared to the criterion indirect calorimetry estimated minutes of inactive, light, or MVPA. Pearson’s correlation examined the relation of the variance in minutes of inactive, light, and MVPA as estimated by the data collected from the research-grade and consumer wearable devices compared with the variance in minutes of inactive, light, and MVPA as estimated by the criterion indirect calorimetry device. Lin’s Concordance Correlation Coefficient (39) was then calculated to explore the overall agreement between the proxy and criterion estimated inactive, light, and MVPA minutes. Mean bias and limits of agreement were calculated and Bland-Altman plots were constructed to allow for visual inspection of the bias, limits of agreement, and trend in the agreement (40). Consistent with best practice (35), absolute and absolute percent error were also calculated for inactive, light, and MVPA minutes estimated from the raw accelerometry data collected via the research-grade and consumer wearable devices.
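The discrepancy statistics named above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, limits of agreement are taken as mean bias ± 1.96 standard deviations of the differences, and Lin's coefficient uses variances and covariance with divisor n.

```python
import math
from statistics import mean, stdev


def discrepancy_stats(proxy, criterion):
    """Agreement statistics for per-child minutes (proxy minus criterion).
    Assumes at least two children and criterion minutes greater than zero."""
    n = len(proxy)
    diffs = [p - c for p, c in zip(proxy, criterion)]
    bias = mean(diffs)
    sd_diff = stdev(diffs)
    # Limits of agreement: bias +/- 1.96 SD of the differences (assumed form).
    loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)
    mae = mean([abs(d) for d in diffs])
    mape = mean([abs(d) / c * 100.0 for d, c in zip(diffs, criterion)])

    mean_p, mean_c = mean(proxy), mean(criterion)
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(proxy, criterion)) / n
    var_p = sum((p - mean_p) ** 2 for p in proxy) / n
    var_c = sum((c - mean_c) ** 2 for c in criterion) / n
    pearson = cov / math.sqrt(var_p * var_c)
    # Lin's CCC penalizes both scatter and systematic shift between methods.
    ccc = 2.0 * cov / (var_p + var_c + (mean_p - mean_c) ** 2)
    return {"bias": bias, "loa": loa, "mae": mae, "mape": mape,
            "pearson": pearson, "ccc": ccc}
```

A proxy that tracks the criterion perfectly but is shifted by a constant yields a Pearson correlation of 1 and a concordance coefficient below 1, which is why both statistics are reported.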

Following the discrepancy analyses above, the Two-One-Sided-Tests method (41) was adopted to assess the equivalence of inactive, light, and MVPA minute estimates produced by the raw accelerometry data collected from the consumer wearable devices compared with the research-grade device (42). Because the primary objective of this study was to compare equivalence between research-grade and consumer wearable devices when adopting a device agnostic approach, and because past research has shown that estimates of activity intensity produced by accelerometers are not equivalent to indirect calorimetry when using the Hildebrand absolute intensity thresholds (17), equivalence between the accelerometer produced estimates and the K5 criterion monitor was not tested. In this approach, the null hypothesis is that estimates produced from the raw data collected via the consumer wearable devices and estimates produced from the raw data collected via the research-grade device are not equivalent. To test this with 95% confidence, the 90% confidence interval of the difference between consumer wearable and research-grade estimates is required to fall within the equivalence bounds (43). An equivalence zone of ±10% of the research-grade estimated mean was adopted based upon previous work and industry standards (43, 44). Thus, should the 90% confidence interval of the consumer wearable estimated minutes in activity (i.e., inactive, light, or MVPA) fall completely within ±10% of the research-grade estimated mean, equivalence of the two measures is concluded. The ‘tost’ command in Stata was used to complete all equivalence analyses. Follow-up analyses were completed to identify the minimum equivalence zone that showed statistically significant equivalence. This approach indicates how wide the equivalence zone would need to be set in order for statistically significant equivalence to be concluded in the current data.
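The decision rule described above can be sketched in Python. This is not the Stata routine the authors used: the function name is hypothetical, the devices are assumed to be compared via paired within-child differences, and a normal approximation replaces the t distribution for the 90% confidence interval.

```python
import math
from statistics import NormalDist, mean, stdev


def tost_paired(consumer, reference, zone_pct=10.0, alpha=0.05):
    """Equivalence check on paired minutes (consumer minus reference).

    Equivalence is concluded when the (1 - 2*alpha) confidence interval of
    the mean difference lies entirely within +/- zone_pct percent of the
    reference mean. Uses a normal approximation (assumption).
    """
    diffs = [c - r for c, r in zip(consumer, reference)]
    mean_diff = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    ref_mean = mean(reference)
    bound = zone_pct / 100.0 * ref_mean

    z = NormalDist().inv_cdf(1.0 - alpha)  # about 1.645 for alpha = 0.05
    lower, upper = mean_diff - z * se, mean_diff + z * se
    equivalent = (-bound < lower) and (upper < bound)

    # Smallest zone (as a percent of the reference mean) that would
    # contain the whole interval.
    min_zone_pct = max(abs(lower), abs(upper)) / ref_mean * 100.0
    return {"ci": (lower, upper), "bound": bound,
            "equivalent": equivalent, "min_zone_pct": min_zone_pct}
```

The `min_zone_pct` value corresponds to the follow-up question of how wide the equivalence zone would need to be for equivalence to be concluded in a given data set.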

RESULTS

The demographic characteristics of the sample are presented in Table 1.

Table 1.

Demographics of the participating children

Sex	n	%
Female	32	42.6
Male	43	57.3
Race
Black	19	25.3
White	50	66.7
More than one race	4	5.3
Asian	2	2.7
Ethnicity
Not Hispanic or Latino	72	96.0
Hispanic or Latino	3	4.0
BMI Category
Underweight	2	2.7
Normal Weight	53	70.7
Overweight	12	16.0
Obese	8	10.7
	Mean	SD
Age	8.4	1.8
BMI z-score	0.28	1.1

Abbreviations: “SD” Standard Deviation

Epoch-by-epoch Analyses.

Confusion matrices for the activity intensity levels predicted from the raw data collected by each device are presented in Table 2. Corresponding sensitivity and specificity are presented in Table 3. Consistent with past research implementing the Hildebrand absolute intensity thresholds (17), the activity levels predicted by the raw data collected from the research-grade and consumer wearable devices displayed overall poor agreement with indirect calorimetry estimated activity intensity. However, similar trends of agreement with the criterion activity levels, the focus of this study, were observed. For instance, sensitivity for inactive was 66.7% (95CI=64.4%, 69.0%), 76.8% (95CI=75.8%, 77.9%), and 63.3% (95CI=61.4%, 65.1%) for the raw accelerometry data collected via ActiGraph, Apple, and Garmin. Specificity for inactive was 89.2% (95CI=88.3%, 90.1%), 78.7% (95CI=77.7%, 79.7%), and 70.6% (95CI=69.9%, 71.4%) for the raw accelerometry data collected via ActiGraph, Apple, and Garmin. Similar trends in sensitivity and specificity were also observed for light and MVPA across the research grade and consumer wearable devices.

Table 2.

Confusion Matrices for Research Grade and Consumer Wearable Devices Compared to Indirect Calorimetry

	Actigraph GT9X Link
		Inactive (SD)	Light (SD)	MVPA (SD)
Cosmed K5 (Indirect Calorimetry)	Inactive %	66.7 (18.8)	31.5 (18.1)	1.8 (2.5)
	Light %	15.1 (10.5)	66.6 (15.3)	18.3 (10.5)
	MVPA %	4.1 (3.3)	21.1 (13.5)	74.7 (14.3)
	Apple Watch Series 7
		Inactive (SD)	Light (SD)	MVPA (SD)
Cosmed K5 (Indirect Calorimetry)	Inactive %	76.8 (9.3)	21.2 (7.9)	2.0 (3.4)
	Light %	15.5 (11.9)	63.5 (13.1)	21.0 (12.4)
	MVPA %	4.1 (3.2)	18.2 (13.7)	77.7 (18.8)
	Garmin Vivoactive 4
		Inactive (SD)	Light (SD)	MVPA (SD)
Cosmed K5 (Indirect Calorimetry)	Inactive %	63.3 (16.1)	34.5 (15.6)	2.2 (3.4)
	Light %	12.0 (11.3)	62.7 (13.1)	25.3 (14.1)
	MVPA %	2.5 (2.8)	16.1 (13.2)	81.4 (13.8)

Table 3.

Sensitivity and Specificity by Device

	Sensitivity	95CI	Specificity	95CI
Inactive
Actigraph GT9X Link	66.7	(64.4, 69.0)	89.2	(88.3, 90.1)
Apple Watch Series 7	76.8	(75.8, 77.9)	78.7	(77.7, 79.7)
Garmin Vivoactive 4	63.3	(61.4, 65.1)	70.6	(69.9, 71.4)
Light
Actigraph GT9X Link	66.6	(65.2, 68.0)	71.3	(69.6, 73.0)
Apple Watch Series 7	63.5	(62.2, 64.8)	79.6	(79.0, 80.3)
Garmin Vivoactive 4	62.7	(61.3, 64.2)	70.5	(69.2, 71.8)
MVPA
Actigraph GT9X Link	74.7	(73.1, 76.4)	92.1	(91.6, 92.7)
Apple Watch Series 7	77.7	(76.0, 79.3)	94.5	(93.9, 95.2)
Garmin Vivoactive 4	81.4	(79.8, 83.0)	95.3	(94.6, 96.0)
Overall Weighted
Actigraph GT9X Link	68.2	(67.1, 69.3)	84.4	(83.6, 85.2)
Apple Watch Series 7	73.0	(71.8, 74.3)	82.0	(80.6, 83.4)
Garmin Vivoactive 4	66.6	(65.7, 67.5)	75.3	(74.7, 75.9)

Discrepancy Analyses

Statistics are presented in Table 4 and Bland-Altman plots are presented in Figures 1, 2, and 3. Apple and Garmin ENMO demonstrated high Pearson Correlation and Lin’s Concordance Correlation with ActiGraph, except for during inactive time. Total minutes inactive predicted from raw accelerometry data collected via Apple demonstrated the highest Pearson Correlation (0.83) and Lin’s Concordance Correlation (0.71) with the K5. This was followed by Garmin at 0.61 and 0.36, and ActiGraph at 0.51 and 0.37. Light activity predicted by the raw accelerometry data collected via Apple demonstrated the highest Pearson Correlation and Lin’s Concordance Correlation with the criterion predicted minutes at 0.70 and 0.63, followed by Garmin at 0.47 and 0.29, and ActiGraph at 0.46 and 0.28. MVPA predicted by the raw accelerometry data collected via ActiGraph demonstrated the highest Pearson Correlation and Lin’s Concordance Correlation with the criterion predicted minutes in MVPA at 0.74 and 0.72, followed by Garmin at 0.69 and 0.58, and Apple at 0.67 and 0.62.

Table 4.

Table of the Validity Statistics for the Research Grade and Consumer Wearable Devices

Variable	Device	Mean	SD	Pearson Correlation	Lin’s CCC	Mean Bias	SD	LOA (lower)	LOA (upper)	MAE	SD	MAPE^a	SD^a
Overall ENMO	Actigraph GT9X Link	153.5	240.9	-	-	-	-	-		-	-	-	-
	Apple Watch Series 7	165.8	291.6	0.98	0.97	12.4	64.4	−113.9	138.6	28.7	59.0	-	-
	Garmin Vivoactive 4	183.5	278.9	0.96	0.94	29.0	91.4	−150.2	208.2	48.9	82.5	-	-
Inactive ENMO	Actigraph GT9X Link	15.0	10.9	-	-	-	-	-		-	-	-	-
	Apple Watch Series 7	12.7	12.6	0.52	0.51	−2.2	11.6	−25.0	20.5	8.2	8.5	-	-
	Garmin Vivoactive 4	22.2	23.6	0.27	0.19	7.2	23.3	−38.4	52.8	17.1	17.3	-	-
Light ENMO	Actigraph GT9X Link	90.0	43.4	-	-	-	-	-		-	-	-	-
	Apple Watch Series 7	91.0	61.3	0.90	0.85	1.0	29.2	−56.3	58.3	19.1	22.2	-	-
	Garmin Vivoactive 4	107.9	78.6	0.81	0.66	17.9	50.2	−80.5	116.3	36.4	38.9	-	-
MVPA ENMO	Actigraph GT9X Link	539.7	288.8	-	-	-	-	-		-	-	-	-
	Apple Watch Series 7	601.6	351.0	0.95	0.91	62.0	122.7	−178.4	302.4	86.3	106.9	-	-
	Garmin Vivoactive 4	631.6	370.2	0.89	0.83	91.9	171.6	−244.4	428.2	133.6	141.6	-	-
Inactive Minutes	Cosmed K5	31.8	8.1	-	-	-	-	-		-	-	-	-
	Actigraph GT9X Link	24.6	8.6	0.51	0.37	−7.3	8.2	(−23.5	8.9)	8.4	7.2	30.0	43.7
	Apple Watch Series 7	27.9	6.6	0.83	0.71	−4.0	4.5	(−12.8	4.7)	4.9	3.4	19.9	44.0
	Garmin Vivoactive 4	22.7	7.6	0.61	0.36	−9.2	6.9	(−22.6	4.3)	9.5	6.4	31.2	25.2
Light Minutes	Cosmed K5	18.7	4.9	-	-	-	-	-	-	-	-	-	-
	Actigraph GT9X Link	25.0	8.7	0.46	0.28	6.3	7.8	(−8.9	21.5)	7.1	7.0	41.7	43.4
	Apple Watch Series 7	20.8	5.3	0.70	0.63	2.1	4.0	(−5.6	9.9)	3.6	2.7	22.8	25.4
	Garmin Vivoactive 4	24.6	7.9	0.47	0.29	5.9	7.1	(−8.0	19.9)	7.3	5.6	44.6	44.4
MVPA Minutes	Cosmed K5	12.0	5.3	-	-	-	-	-	-	-	-	-	-
	Actigraph GT9X Link	12.9	4.5	0.74	0.72	1.0	3.6	(−6.1	8.0)	2.8	2.4	29.2	37.4
	Apple Watch Series 7	13.9	4.8	0.67	0.62	1.9	4.1	(−6.1	9.9)	3.3	3.0	34.6	42.1
	Garmin Vivoactive 4	15.2	5.0	0.69	0.58	3.2	4.0	(−4.7	11.1)	4.0	3.2	43.5	53.7

Abbreviations: “ENMO” Euclidian Norm Minus One, “CCC” Concordance Correlation Coefficient, “SD” Standard Deviation, “LOA” Limits of Agreement, “MAE” Mean Absolute Error, “MAPE” Mean Absolute Percent Error

^a Mean Absolute Percent Error is not presented for ENMO because percent error is inflated when values approach zero, as ENMO values often do (i.e., values below zero are rounded to zero)

Figure 1. — Bland Altman Plot of ActiGraph GT9X Link (a), Apple Watch Series 7 (b), and Garmin Vivoactive 4 (c) Inactive Minutes Compared to K5 Estimated Inactive Minutes

Abbreviations: “K5” Cosmed K5 Indirect Calorimetry Device, “LOA” Limits of Agreement

b_xy = Trend in agreement from regression analysis with bias (i.e., proxy minus criterion) as the dependent variable and criterion estimated inactive minutes as the independent variable

Figure 2. — Bland Altman Plot of ActiGraph GT9X Link (a), Apple Watch Series 7 (b), and Garmin Vivoactive 4 (c) Light Physical Activity Minutes Compared to K5 Estimated Light Minutes

Abbreviations: “K5” Cosmed K5 Indirect Calorimetry Device, “LOA” Limits of Agreement

b_xy = Trend in agreement from regression analysis with bias (i.e., proxy minus criterion) as the dependent variable and criterion estimated light minutes as the independent variable

Figure 3. — Bland Altman Plot of ActiGraph GT9X Link (a), Apple Watch Series 7 (b), and Garmin Vivoactive 4 (c) MVPA Minutes Compared to K5 Estimated MVPA Minutes

Abbreviations: “K5” Cosmed K5 Indirect Calorimetry Device, “LOA” Limits of Agreement, “MVPA” Moderate-to-Vigorous Physical Activity

b_xy = Trend in agreement from regression analysis with bias (i.e., proxy minus criterion) as the dependent variable and criterion estimated MVPA minutes as the independent variable

Similar trends in mean bias were observed for ENMO and for minutes of inactive, light, and MVPA predicted from the raw accelerometry data collected via the ActiGraph, Apple, and Garmin when compared to the criterion. Apple ENMO showed smaller mean bias and absolute error than Garmin when compared to ActiGraph. Apple predicted minutes inactive demonstrated the smallest mean bias at −4.0 minutes, compared to −9.2 minutes from Garmin and −7.3 minutes from ActiGraph. Again, raw accelerometry collected via Apple showed the smallest mean bias when predicting light activity at 2.1 minutes, followed by Garmin at 5.9, and ActiGraph at 6.3. Finally, the raw accelerometry collected via ActiGraph showed the smallest mean bias when predicting MVPA at 1.0, followed by Apple at 1.9, and Garmin at 3.2. Similar trends were observed for absolute error and absolute percent error.

Equivalence Testing

Findings from the equivalence tests between ENMO, minutes in inactive, light, and MVPA estimates produced by the raw data collected via the two consumer wearable devices compared to ActiGraph are presented in Figure 4. ENMO of Apple and ActiGraph were found to be equivalent while the Garmin and ActiGraph ENMO were not equivalent. No activity intensity estimates from Apple or Garmin were found to be statistically equivalent when compared to ActiGraph. For raw accelerometry data collected via Apple, the minimum statistically significant equivalence zones for inactive, light, and MVPA were ±21.8%, ±24.7%, and ±17.1%, respectively. For raw accelerometry data collected via Garmin, the minimum statistically significant equivalence zones for inactive, light, and MVPA were ±16.5%, ±10.4%, and ±27.1%, respectively.

Figure 4. Summary of Differences and Equivalence Bounds for Consumer Wearables when Compared to ActiGraph Accelerometry: (a) ENMO, (b) Inactive, (c) Light, and (d) MVPA Minutes.

Abbreviations: “Apple” Apple Watch Series 7, “Garmin” Garmin Vivoactive 4, and “ENMO” Euclidean Norm Minus One

Shaded blue area represents the 10% equivalence bound (the 90% confidence intervals in green must be completely within the bounds for the estimates to be statistically significantly equivalent); whiskers at the end of the green represent the 95% confidence interval

DISCUSSION

The purpose of this proof-of-concept study was to demonstrate the potential of a device agnostic approach when applied to consumer wearable devices and to explore the performance of raw accelerometry data collected via consumer wearable devices and a research-grade device for predicting activity intensity when compared to activity intensity predicted from a criterion. The raw accelerometry data collected via Apple demonstrated the best agreement with the criterion when predicting time spent in inactive and light physical activity. Raw accelerometry data collected via the research-grade device demonstrated the best agreement with the criterion when predicting MVPA. These differences led to activity intensity estimates produced by the raw accelerometry data collected via the consumer wearable and research-grade devices that were not equivalent. These findings suggest that utilizing the raw accelerometry data collected via consumer wearable devices to predict activity intensity is possible; however, new approaches such as those identified in the Granada consensus statement (i.e., machine learning and alternative intensity metrics) (45) may be necessary for a truly device agnostic approach to be adopted.

The findings herein make a significant contribution to the field of physical activity measurement in children. Consumer wearables hold significant promise for use in research because they are designed to be small and sleek, they fit comfortably on the wrist, many models include non-wear sensors, and many models are waterproof. These features, which are ubiquitous in consumer wearables, are largely absent from research-grade devices and have the potential to increase participant wear compliance, reduce data loss, increase the accuracy of activity estimates by combining heart rate and activity (46–49), increase the number of days of data that can be collected, which may be necessary for accurately measuring habitual activity in youth (16), and increase the richness of the data collected, which is necessary for data collection techniques like intensive longitudinal monitoring (50). However, these features cannot be capitalized upon because the activity estimates provided by these devices rely on proprietary algorithms developed by the corporations that make them. This is problematic because manufacturers could make estimate-altering changes to these algorithms without the knowledge of the scientific community, which would render estimates collected at different points in time on the same device incompatible. Collecting the raw accelerometry data from consumer wearable devices and processing it via open-source algorithms, rather than relying on the estimates produced by proprietary black box algorithms, has the potential to overcome this limitation. This study provides preliminary evidence that this strategy may be viable for use with consumer wearable devices.

However, while the physical activity intensity estimates produced by the raw accelerometry data collected via the consumer wearable and research-grade devices demonstrated similar agreement with the criterion, the estimates were not statistically significantly equivalent. These differences may be due to several differences in the hardware and software of the devices. One reason may be related to the accelerometers included in the devices. The dynamic range of the ActiGraph GT9X Link accelerometer is ±8 g (15), whereas, based on the current study, the dynamic range of Apple is ±16 g. Differences in the accelerometer range would certainly account for some of the differences in ENMO and potentially in estimates of activity intensity. Another reason may be that the devices collect accelerometer readings at differing frequencies. In this study, Garmin collected accelerometer readings at 25 Hz while ActiGraph and Apple collected data at 50 Hz. Past research in adults has shown that accelerometer output and activity estimates can be influenced by accelerometer recording frequency (51). Unfortunately, this difference was unavoidable because Garmin does not allow adjustment of the sampling frequency. Still another reason that may at least partially explain the differences is the placement of the devices, as the ActiGraph was always placed proximally and the consumer wearables were always placed distally. Previous research in adults has shown that distal vs. proximal placement may produce differences in estimates of steps per day (52). Finally, differences in the way that devices preprocess “raw” accelerometer data may account for the differences observed herein. To mimic how ActiGraph devices are frequently used in the field (53–55), this study collected data with idle sleep mode enabled for the ActiGraph devices. According to ActiGraph (56),

“Devices with idle sleep mode enabled will enter a sleep or low power state after experiencing 10 seconds of inactivity (fluctuation on the accelerometer < +/− 40 milliGs). After entering this low power state, the device checks once every second to determine if the unit has moved. While in sleep mode, the last sampled accelerometer value is written into memory at the device’s preset sample rate. [For example, a device set to sample at 30Hz would store the last known accelerometer reading 30 times every second; the device would then wake up and check for movement. If no movement were detected, this pattern would continue. Otherwise, the unit would exit sleep mode (i.e., “wake up”) and continue sampling in normal fashion.] After filtering (conversion to *.agd), this data becomes 0 counts since no movement is detected during this time period.”
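The behavior described in the quotation can be mimicked with a simplified simulation, which helps show why low-level movement disappears from the stored signal. This is only a sketch under stated assumptions and not ActiGraph firmware: the function name `apply_idle_sleep` is hypothetical, the signal is processed in one-second windows, and “fluctuation” is interpreted as every axis staying within 40 milli-g of a reference sample.

```python
def apply_idle_sleep(samples, rate_hz, still_seconds=10, tol_g=0.040):
    """Return a copy of (x, y, z) samples (in g) with simulated idle sleep mode.

    Assumptions: one-second windows; a window is 'still' when every axis of
    every sample is within tol_g of the reference sample.
    """
    def is_still(window, ref):
        return all(abs(v - r) < tol_g for s in window for v, r in zip(s, ref))

    out, still_count, asleep, held = [], 0, False, None
    for start in range(0, len(samples), rate_hz):
        window = samples[start:start + rate_hz]
        if asleep:
            if is_still(window, held):
                # Still asleep: the last sampled value is written at the sample rate.
                out.extend([held] * len(window))
                continue
            asleep, still_count = False, 0   # movement detected: wake up
        out.extend(window)
        still_count = still_count + 1 if is_still(window, window[0]) else 0
        if still_count >= still_seconds:
            asleep, held = True, window[-1]
    return out

# Made-up 2 Hz signal: 12 s of small jitter followed by 1 s of clear movement.
quiet = [(0.0, 0.0, 1.0), (0.01, 0.0, 1.0)] * 12
moving = [(0.5, 0.2, 1.3), (-0.4, 0.1, 0.8)]
stored = apply_idle_sleep(quiet + moving, rate_hz=2)
```

In this toy signal the jitter in seconds 11 and 12 is replaced by a constant value, so any metric computed from the stored data sees no variation in that period.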

The use of idle sleep mode makes it problematic to compare ActiGraph data with data from other accelerometer brands, given that there is currently no way to replicate data produced with idle sleep mode on (57). Given this information, it is not surprising that raw accelerometry data collected via ActiGraph produced inferior estimates of minutes inactive and/or light activity when compared to estimates produced by data collected via Apple and Garmin. Further, these findings indicate that, at least when idle sleep mode is enabled, estimates between these devices are not equivalent. New validation studies to create activity intensity thresholds specific to Apple, Garmin, and other consumer wearable devices could be undertaken but are most likely not the best course of action. While it is possible to develop monitor-specific thresholds, the field will be better served by efforts to develop and test methods that can be applied regardless of what monitor is used to collect the data. Unfortunately, a true device-agnostic approach may not be achievable at the current moment. Nevertheless, the approach outlined herein can still be used to address the “black box” nature of the proprietary algorithms employed by consumer wearable devices.
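The hardware differences discussed above (dynamic range and sampling frequency) can be made concrete with a short ENMO computation. The sketch below is illustrative only: the function name `enmo_mg` and the sample values are hypothetical, saturation is modeled as per-axis clipping, and the 2:1 reduction in sampling rate is a naive decimation without filtering.

```python
from math import sqrt

def enmo_mg(samples, range_g=None):
    """ENMO per sample in milli-g, with negative values set to zero.

    samples: iterable of (x, y, z) in g.
    range_g: optional per-axis saturation limit, modeling the dynamic range.
    """
    out = []
    for x, y, z in samples:
        if range_g is not None:
            # A sensor with range +/- range_g cannot report larger magnitudes.
            x, y, z = (max(-range_g, min(range_g, v)) for v in (x, y, z))
        out.append(max(sqrt(x * x + y * y + z * z) - 1.0, 0.0) * 1000.0)
    return out

# Made-up samples chosen for easy arithmetic: rest, a large movement, a hard impact, rest.
samples = [(0.0, 0.0, 1.0), (3.0, 4.0, 0.0), (12.0, 0.0, 0.0), (0.0, 0.0, 1.0)]

wide = enmo_mg(samples, range_g=16)             # impact is recorded in full
narrow = enmo_mg(samples, range_g=8)            # impact saturates at 8 g
half_rate = enmo_mg(samples[::2], range_g=16)   # every second sample only
```

With these made-up values the ±8 g sensor records 7000 milli-g for the impact where the ±16 g sensor records 11000 milli-g, and halving the sampling rate drops one of the two movement samples entirely, so epoch averages would differ between devices even for identical movement.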

The findings of this study should be interpreted in the context of its strengths and weaknesses. This study used raw accelerometry data from two consumer wearable devices to estimate physical activity intensity. This is a considerable strength given that virtually all previous validation studies of consumer wearable devices have relied upon the estimates produced by the devices themselves, which depend on black-box proprietary algorithms. Another strength of the current study is that its design, analysis, and reporting followed best practice statements (31–33, 35–37). This should increase the ability of others to replicate the findings herein.

The current study is also limited in several ways. First, the findings herein are based upon a single 60-minute protocol designed to simulate free-living physical activity. While the activities within were carefully selected to engage children in activity intensities from inactive to vigorous, they do not encompass all activities in which children may engage in real-world settings. The activities herein also may not reflect the intermittent and variable nature of children’s activities (i.e., it is not common for a child to engage in a single activity for 5 continuous minutes). Thus, full-day free-living studies that use an alternative criterion measure (i.e., direct observation) may be a reasonable next step. Further, this design does not allow for examination of the variability of activity levels within participants between days, which may affect the findings herein. The findings are also limited to two consumer wearable devices (i.e., Apple Watch Series 7 and Garmin Vivoactive 4) and one research-grade device (ActiGraph GT9X). Future studies should replicate the findings with additional consumer wearable and research-grade devices. Finally, the study must be interpreted within the context of the limitations of cutpoints. The Hildebrand (17, 27) absolute intensity thresholds applied herein were developed in a different sample of children using a different calibration protocol. Thus, as demonstrated in the current study, they are likely to misclassify activity intensity when applied in a new sample with a different protocol. This is a well-documented issue in research using accelerometers to measure physical activity and has led to calls for alternative approaches, including machine learning methods for predicting free-living activity and alternative accelerometer activity outcomes (45).
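The cutpoint approach referred to above amounts to comparing each epoch's mean ENMO with fixed thresholds. The following sketch shows the mechanics only: the function names are hypothetical, the default thresholds (35.6 and 201.4 milli-g) are the values commonly quoted from Hildebrand et al. for wrist-worn ActiGraph devices in children and are an assumption here, and the epoch length is a free parameter. The thresholds and epoch length stated in the Methods should be used in their place.

```python
def classify_epochs(epoch_enmo_mg, inactive_below=35.6, mvpa_from=201.4):
    """Label each epoch's mean ENMO (milli-g) as inactive, light, or mvpa.

    Assumption: default thresholds are illustrative values for children's
    wrist-worn data, not necessarily those applied in this study.
    """
    labels = []
    for value in epoch_enmo_mg:
        if value < inactive_below:
            labels.append("inactive")
        elif value < mvpa_from:
            labels.append("light")
        else:
            labels.append("mvpa")
    return labels

def minutes_by_intensity(labels, epoch_seconds):
    """Total minutes per intensity category for a sequence of epoch labels."""
    minutes = {"inactive": 0.0, "light": 0.0, "mvpa": 0.0}
    for label in labels:
        minutes[label] += epoch_seconds / 60.0
    return minutes

# Made-up epoch-level ENMO values in milli-g (not study data), one per minute.
epochs = [10.0, 35.6, 100.0, 201.4, 500.0, 20.0]
labels = classify_epochs(epochs)
totals = minutes_by_intensity(labels, epoch_seconds=60)
```

Because the boundaries are fixed, a device that systematically reports slightly higher or lower ENMO moves epochs near a threshold into a different category, which is one way the between-device differences described in this study can arise.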

Activity intensity estimates produced by raw accelerometry data collected by consumer wearable devices were not statistically significantly equivalent to those produced by a research-grade device in this study. However, this may be due to the structured nature of the simulated free-living protocol. Additional research is needed to test equivalence under more naturalistic conditions, with monitors capturing lifestyle behaviors across a whole day. Further refinement and application of the device-agnostic methodologies deployed in this study could unlock the potential of consumer wearables for use in research, as they address the major limitation of consumer wearable devices (i.e., activity intensity estimates produced by proprietary, black box algorithms).

Supplementary Material

Supplemental Data File (.doc, .tif, pdf, etc.)

NIHMS1928270-supplement-Supplemental_Data_File___doc___tif__pdf__etc__.docx (99.7 KB, docx)

Acknowledgements

Research reported in this publication was supported in part by the National Institute of Diabetes and Digestive and Kidney Diseases Award Number R01DK129215. Olivia Finnegan was supported by National Institute of General Medical Sciences Award Number T32GM081740 while James White was supported by National Institute of Diabetes and Digestive and Kidney Diseases Award Number F31DK136205. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Unrelated to this work Dr. Weaver and Dr. Armstrong report board membership and ownership shares in Trackster LLC. Unrelated to this work Dr. de Zambotti reports grants from Noctrix Health and Verily Life Science LLC (Alphabet Inc.), and is a co-founder and Chief Scientific Officer at Lisa Health Inc. and has ownership of shares in Lisa Health. The results of the study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation. The results of the present study do not constitute endorsement by the American College of Sports Medicine.

Footnotes

SUPPLEMENTAL DIGITAL CONTENT

SDC 1: Supplemental Digital Content.docx

REFERENCES

1. Casado-Robles C, Viciana J, Guijarro-Romero S, Mayorga-Vega D. Effects of consumer-wearable activity tracker-based programs on objectively measured daily physical activity and sedentary behavior among school-aged children: a systematic review and meta-analysis. Sports Med Open. 2022;8(1):18.
2. Garmin. Instinct® Solar 2020. Available from: https://buy.garmin.com/en-US/US/p/679335.
3. Wright SP, Hall Brown TS, Collier SR, Sandberg K. How consumer physical activity monitors could transform human physiology research. Am J Physiol Regul Integr Comp Physiol. 2017;312(3):R358–67.
4. Depner CM, Cheng PC, Devine JK, et al. Wearable technologies for developing sleep and circadian biomarkers: a summary of workshop discussions. Sleep. 2020;43(2):zsz254.
5. National Institutes of Health. All of Us research program expands data collection efforts with Fitbit 2019. [cited 2020 July 7th]. Available from: https://allofus.nih.gov/news-events-and-media/announcements/all-us-research-program-expands-data-collection-efforts-fitbit.
6. Argent R, Hetherington-Rauth M, Stang J, et al. Recommendations for determining the validity of consumer wearables and smartphones for the estimation of energy expenditure: expert statement and checklist of the INTERLIVE network. Sports Med. 2022;52(8):1817–32.
7. Åkerberg A, Arwald J, Söderlund A, Lindén M. An approach to a novel device agnostic model illustrating the relative change in physical behavior over time to support behavioral change. J Technol Behav Sci. 2022;7(2):240–51.
8. Willetts M, Hollowell S, Aslett L, Holmes C, Doherty A. Statistical machine learning of sleep and physical activity phenotypes from sensor data in 96,220 UK Biobank participants. Sci Rep. 2018;8(1):7961.
9. Terra API. This is it… a comprehensive list of wearable data accessible through APIs today 2022. Available from: https://blog.tryterra.co/comprehensive-list-of-all-the-wearable-data-that-are-available-through-apis-2bcd35a7307f.
10. Martinko A, Karuc J, Jurić P, Podnar H, Sorić M. Accuracy and Precision Of Consumer-Grade Wearable Activity Monitors For Assessing Time Spent In Sedentary Behavior In Children And Adolescents: Systematic Review. JMIR Mhealth and Uhealth. 2022;10(8):e37547.
11. Evenson KR, Goto MM, Furberg RD. Systematic review of the validity and reliability of consumer-wearable activity trackers. Int J Behav Nutr Phys Act. 2015;12:159.
12. Germini F, Noronha N, Borg Debono V, et al. Accuracy and acceptability of wrist-wearable activity-tracking devices: systematic review of the literature. J Med Internet Res. 2022;24(1):e30791.
13. Butte NF, Watson KB, Ridley K, et al. A youth compendium of physical activities: activity codes and metabolic intensities. Med Sci Sports Exerc. 2018;50(2):246–56.
14. Pate RR, Almeida MJ, McIver KL, Pfeiffer KA, Dowda M. Validation and calibration of an accelerometer in preschool children. Obesity (Silver Spring). 2006;14(11):2000–6.
15. ActiGraph Corp. ActiGraph GT9X Link 2023. Available from: https://actigraphcorp.com/actigraph-link/.
16. Barreira TV, Schuna J, Tudor-Locke C, et al. Reliability of accelerometer-determined physical activity and sedentary behavior in school-aged children: a 12-country study. Int J Obes Suppl. 2015;5(2):S29–35.
17. Hildebrand M, van Hees VT, Hansen BH, Ekelund U. Age group comparability of raw accelerometer output from wrist-and hip-worn monitors. Med Sci Sports Exerc. 2014;46(9):1816–24.
18. Hibbing PR, Bassett DR, Coe DP, LaMunion SR, Crouter SE. Youth metabolic equivalents differ depending on operational definitions. Med Sci Sports Exerc. 2020;52(8):1846–53.
19. Evenson KR, Catellier DJ, Gill K, Ondrak KS, McMurray RG. Calibration of two objective measures of physical activity for children. J Sports Sci. 2008;26(14):1557–65.
20. Saint-Maurice PF, Kim Y, Welk GJ, Gaesser GA. Kids are not little adults: what MET threshold captures sedentary behavior in children? Eur J Appl Physiol. 2016;116(1):29–38.
21. Tudor-Locke C, Schuna JM, Han H, et al. Cadence (steps/min) and intensity during ambulation in 6–20 year olds: the CADENCE-kids study. Int J Behav Nutr Phys Act. 2018;15(1):20.
22. van Hees VT, Fang Z, Langford J, et al. Autocalibration of accelerometer data for free-living physical activity assessment using local gravity and temperature: an evaluation on four continents. J Appl Physiol (1985). 2014;117(7):738–44.
23. van Hees V, editor. Accelerometer data processing with GGIR–a success story in Research Software. Open Science Days 2019; 2019.
24. van Hees V. Accelerometer data processing with GGIR 2023. Available from: https://cran.r-project.org/web/packages/GGIR/vignettes/GGIR.html#58_Why_use_data_metric_ENMO_as_default.
25. Van Hees VT, Gorzelniak L, Dean León EC, et al. Separating movement and gravity components in an acceleration signal and implications for the assessment of human daily physical activity. PLoS One. 2013;8(4):e61691.
26. Bakrania K, Yates T, Rowlands AV, et al. Intensity thresholds on raw acceleration data: Euclidean norm minus one (ENMO) and mean amplitude deviation (MAD) approaches. PLoS One. 2016;11(10):e0164045.
27. Hildebrand M, Hansen BH, van Hees VT, Ekelund U. Evaluation of raw acceleration sedentary thresholds in children and adults. Scand J Med Sci Sports. 2017;27(12):1814–23.
28. Bailey RC, Olson J, Pepper SL, Porszasz J, Barstow TJ, Cooper DM. The level and tempo of children’s physical activities: an observational study. Med Sci Sports Exerc. 1995;27(7):1033–41.
29. Baquet G, Stratton G, Van Praagh E, Berthoin S. Improving physical activity assessment in prepubertal children with high-frequency accelerometry monitoring: a methodological issue. Prev Med. 2007;44(2):143–7.
30. Vale S, Santos R, Silva P, Soares-Miranda L, Mota J. Preschool children physical activity measurement: importance of epoch length choice. Pediatr Exerc Sci. 2009;21(4):413–20.
31. Kim Y, Beets MW, Welk GJ. Everything you wanted to know about selecting the “right” Actigraph accelerometer cut-points for youth, but…: a systematic review. J Sci Med Sport. 2012;15(4):311–21.
32. Freedson P, Pober D, Janz KF. Calibration of accelerometer output for children. Med Sci Sports Exerc. 2005;37(11 Suppl):S523–30.
33. Freedson PS, Miller K. Objective monitoring of physical activity using motion sensors and heart rate. Res Q Exerc Sport. 2000;71 Suppl 2:21–9.
34. Lakens D. Equivalence tests: a practical primer for t tests, correlations, and meta-analyses. Soc Psychol Personal Sci. 2017;8(4):355–62.
35. Welk GJ, Bai Y, Lee J-M, Godino J, Saint-Maurice PF, Carr L. Standardizing analytic methods and reporting in activity monitor validation studies. Med Sci Sports Exerc. 2019;51(8):1767–80.
36. Lobelo F, Kelli HM, Tejedor SC, et al. The wild wild west: a framework to integrate mHealth software applications and wearables to support physical activity assessment, counseling and interventions for cardiovascular disease risk reduction. Prog Cardiovasc Dis. 2016;58(6):584–94.
37. Keadle SK, Lyden KA, Strath SJ, Staudenmayer JW, Freedson PS. A framework to evaluate devices that assess physical behavior. Exerc Sport Sci Rev. 2019;47(4):206–14.
38. Luque A, Carrasco A, Martín A, de Las Heras A. The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognit. 2019;91:216–31.
39. Jinyuan L, Wan T, Guanqin C, Yin L, Changyong F. Correlation and agreement: overview and clarification of competing concepts and measures. Shanghai Arch Psychiatry. 2016;28(2):115–20.
40. Bland JM, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;327(8476):307–10.
41. Schuirmann DJ. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J Pharmacokinet Biopharm. 1987;15(6):657–80.
42. Rogers JL, Howard KI, Vessey JT. Using significance tests to evaluate equivalence between two experimental groups. Psychol Bull. 1993;113(3):553–65.
43. Dixon PM, Saint-Maurice PF, Kim Y, Hibbing P, Bai Y, Welk GJ. A primer on the use of equivalence testing for evaluating measurement agreement. Med Sci Sports Exerc. 2018;50(4):837–45.
44. Chowdhury EA, Western MJ, Nightingale TE, Peacock OJ, Thompson D. Assessment of laboratory and daily energy expenditure estimates from consumer multi-sensor physical activity monitors. PLoS One. 2017;12(2):e0171720.
45. Migueles JH, Aadland E, Andersen LB, et al. GRANADA consensus on analytical approaches to assess associations with accelerometer-determined physical behaviours (physical activity, sedentary behaviour and sleep) in epidemiological studies. Br J Sports Med. 2022;56(7):376–84.
46. Corder K, Brage S, Wareham NJ, Ekelund U. Comparison of PAEE from combined and separate heart rate and movement models in children. Med Sci Sports Exerc. 2005;37(10):1761–7.
47. Eston RG, Rowlands AV, Ingledew DK. Validity of heart rate, pedometry, and accelerometry for predicting the energy cost of children’s activities. J Appl Physiol (1985). 1998;84(1):362–71.
48. Zakeri I, Adolph AL, Puyau MR, Vohra FA, Butte NF. Application of cross-sectional time series modeling for the prediction of energy expenditure from heart rate and accelerometry. J Appl Physiol (1985). 2008;104(6):1665–73.
49. Zakeri IF, Adolph AL, Puyau MR, Vohra FA, Butte NF. Cross-sectional time series and multivariate adaptive regression splines models using accelerometry and heart rate predict energy expenditure of preschoolers. J Nutr. 2013;143(1):114–22.
50. National Institutes of Health. Intensive Longitudinal Analysis of Health Behaviors: Leveraging New Technologies to Understand Health Behaviors (U24) 2017. Available from: Intensive Longitudinal Analysis of Health Behaviors: Leveraging New Technologies to Understand Health Behaviors (U24).
51. Small S, Khalid S, Dhiman P, et al. Impact of reduced sampling rate on accelerometer-based physical activity monitoring and machine learning activity classification. J Meas Phys Behav. 2021;4(4):298–310.
52. Park S, Toth LP, Crouter SE, Springer CM, Marcotte RT, Bassett DR. Effect of monitor placement on the daily step counts of wrist and hip activity monitors. J Meas Phys Behav. 2020;3(2):164–9.
53. Brazendale K, Beets MW, Weaver RG, et al. Comparing measures of free-living sleep in school-aged children. Sleep Med. 2019;60:197–201.
54. Arvidsson D, Fridolfsson J, Börjesson M, et al. Re-examination of accelerometer data processing and calibration for the assessment of physical activity intensity. Scand J Med Sci Sports. 2019;29(10):1442–52.
55. Jaeschke L, Steinbrecher A, Boeing H, et al. Factors associated with habitual time spent in different physical activity intensities using multiday accelerometry. Sci Rep. 2020;10(1):774.
56. Actigraph. Idle Sleep Mode Explained 2018. [cited 2022 December 19]. Available from: https://actigraphcorp.my.site.com/support/.
57. van Hees V. GGIR now able to read .gt3x files via R package read.gt3x 2023. Available from: https://www.accelting.com/updates/ggir-release-2-6-0/.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental Data File (.doc, .tif, pdf, etc.)

NIHMS1928270-supplement-Supplemental_Data_File___doc___tif__pdf__etc__.docx^{(99.7KB, docx)}

[R1] 1.Casado-Robles C, Viciana J, Guijarro-Romero S, Mayorga-Vega D. Effects of consumer-wearable activity tracker-based programs on objectively measured daily physical activity and sedentary behavior among school-aged children: a systematic review and meta-analysis. Sports Med Open. 2022;8(1):18. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] 2.Garmin. Instinct^® Solar 2020. Available from: https://buy.garmin.com/en-US/US/p/679335.

[R3] 3.Wright SP, Hall Brown TS, Collier SR, Sandberg K. How consumer physical activity monitors could transform human physiology research. Am J Physiol Regul Integr Comp Physiol. 2017;312(3):R358–67. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] 4.Depner CM, Cheng PC, Devine JK, et al. Wearable technologies for developing sleep and circadian biomarkers: a summary of workshop discussions. Sleep. 2020;43(2):zsz254. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] 5.National Institutes of Health. All of Us research program expands data collection efforts with Fitbit 2019. [cited 2020 July 7th]. Available from: https://allofus.nih.gov/news-events-and-media/announcements/all-us-research-program-expands-data-collection-efforts-fitbit.

[R6] 6.Argent R, Hetherington-Rauth M, Stang J, et al. Recommendations for determining the validity of consumer wearables and smartphones for the estimation of energy expenditure: expert statement and checklist of the INTERLIVE network. Sports Med. 2022;52(8):1817–32. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] 7.Åkerberg A, Arwald J, Söderlund A, Lindén M. An approach to a novel device agnostic model illustrating the relative change in physical behavior over time to support behavioral change. J Technol Behav Sci. 2022;7(2):240–51. [Google Scholar]

[R8] 8.Willetts M, Hollowell S, Aslett L, Holmes C, Doherty A. Statistical machine learning of sleep and physical activity phenotypes from sensor data in 96,220 UK Biobank participants. Sci Rep. 2018;8(1):7961. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Terra API. This is it… a comprehensive list of wearable data accessible through APIs today 2022. Available from: https://blog.tryterra.co/comprehensive-list-of-all-the-wearable-data-that-are-available-through-apis-2bcd35a7307f.

[R10] 10.Martinko A, Karuc J, Jurić P, Podnar H, Sorić M. Accuracy and Precision Of Consumer-Grade Wearable Activity Monitors For Assessing Time Spent In Sedentary Behavior In Children And Adolescents: Systematic Review. JMIR Mhealth and Uhealth. 2022;10(8):e37547. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Evenson KR, Goto MM, Furberg RD. Systematic review of the validity and reliability of consumer-wearable activity trackers. Int J Behav Nutr Phys Act. 2015;12:159. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] 12.Germini F, Noronha N, Borg Debono V, et al. Accuracy and acceptability of wrist-wearable activity-tracking devices: systematic review of the literature. J Med Internet Res. 2022;24(1):e30791. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] 13.Butte NF, Watson KB, Ridley K, et al. A youth compendium of physical activities: activity codes and metabolic intensities. Med Sci Sports Exerc. 2018;50(2):246–56. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Pate RR, Almeida MJ, McIver KL, Pfeiffer KA, Dowda M. Validation and calibration of an accelerometer in preschool children. Obesity (Silver Spring). 2006;14(11):2000–6. [DOI] [PubMed] [Google Scholar]

[R15] 15.ActiGraph Corp. ActiGraph GT9X Link 2023. Available from: https://actigraphcorp.com/actigraph-link/.

[R16] 16.Barreira TV, Schuna J, Tudor-Locke C, et al. Reliability of accelerometer-determined physical activity and sedentary behavior in school-aged children: a 12-country study. Int J Obes Suppl. 2015;5(2):S29–35. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] 17.Hildebrand M, VT VH, Hansen BH, Ekelund U. Age group comparability of raw accelerometer output from wrist-and hip-worn monitors. Med Sci Sports Exerc. 2014;46(9):1816–24. [DOI] [PubMed] [Google Scholar]

[R18] 18.Hibbing PR, Bassett DR, Coe DP, LaMunion SR, Crouter SE. Youth metabolic equivalents differ depending on operational definitions. Med Sci Sports Exerc. 2020;52(8):1846–53. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] 19.Evenson KR, Catellier DJ, Gill K, Ondrak KS, McMurray RG. Calibration of two objective measures of physical activity for children. J Sports Sci. 2008;26(14):1557–65. [DOI] [PubMed] [Google Scholar]

[R20] 20.Saint-Maurice PF, Kim Y, Welk GJ, Gaesser GA. Kids are not little adults: what MET threshold captures sedentary behavior in children? Eur J Appl Physiol. 2016;116(1):29–38. [DOI] [PubMed] [Google Scholar]

[R21] 21.Tudor-Locke C, Schuna JM, Han H, et al. Cadence (steps/min) and intensity during ambulation in 6–20 year olds: the CADENCE-kids study. Int J Behav Nutr Phys Act. 2018;15(1):20. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] 22.van Hees VT, Fang Z, Langford J, et al. Autocalibration of accelerometer data for free-living physical activity assessment using local gravity and temperature: an evaluation on four continents. J Appl Physiol (1985). 2014;117(7):738–44. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] 23.van Hees V, editor. Accelerometer data processing with GGIR–a success story in Research Software. Open Science Days 2019; 2019. [Google Scholar]

[R24] 24.van Hees V Accelerometer data processing with GGIR 2023. Available from: https://cran.r-project.org/web/packages/GGIR/vignettes/GGIR.html#58_Why_use_data_metric_ENMO_as_default.

[R25] 25.Van Hees VT, Gorzelniak L, Dean León EC, et al. Separating movement and gravity components in an acceleration signal and implications for the assessment of human daily physical activity. PLoS One. 2013;8(4):e61691. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R26] 26.Bakrania K, Yates T, Rowlands AV, et al. Intensity thresholds on raw acceleration data: Euclidean norm minus one (ENMO) and mean amplitude deviation (MAD) approaches. PLoS One. 2016;11(10):e0164045. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R27] 27.Hildebrand M, Hansen BH, van Hees VT, Ekelund U. Evaluation of raw acceleration sedentary thresholds in children and adults. Scand J Med Sci Sports. 2017;27(12):1814–23. [DOI] [PubMed] [Google Scholar]

[R28] 28.Bailey RC, Olson J, Pepper SL, Porszasz J, Barstow TJ, Cooper DM. The level and tempo of children’s physical activities: an observational study. Med Sci Sports Exerc. 1995;27(7):1033–41. [DOI] [PubMed] [Google Scholar]

[R29] 29. Baquet G, Stratton G, Van Praagh E, Berthoin S. Improving physical activity assessment in prepubertal children with high-frequency accelerometry monitoring: a methodological issue. Prev Med. 2007;44(2):143–7.

[R30] 30. Vale S, Santos R, Silva P, Soares-Miranda L, Mota J. Preschool children physical activity measurement: importance of epoch length choice. Pediatr Exerc Sci. 2009;21(4):413–20.

[R31] 31. Kim Y, Beets MW, Welk GJ. Everything you wanted to know about selecting the “right” Actigraph accelerometer cut-points for youth, but…: a systematic review. J Sci Med Sport. 2012;15(4):311–21.

[R32] 32. Freedson P, Pober D, Janz KF. Calibration of accelerometer output for children. Med Sci Sports Exerc. 2005;37(11 Suppl):S523–30.

[R33] 33. Freedson PS, Miller K. Objective monitoring of physical activity using motion sensors and heart rate. Res Q Exerc Sport. 2000;71 Suppl 2:21–9.

[R34] 34. Lakens D. Equivalence tests: a practical primer for t tests, correlations, and meta-analyses. Soc Psychol Personal Sci. 2017;8(4):355–62.

[R35] 35. Welk GJ, Bai Y, Lee J-M, Godino J, Saint-Maurice PF, Carr L. Standardizing analytic methods and reporting in activity monitor validation studies. Med Sci Sports Exerc. 2019;51(8):1767–80.

[R36] 36. Lobelo F, Kelli HM, Tejedor SC, et al. The wild wild west: a framework to integrate mHealth software applications and wearables to support physical activity assessment, counseling and interventions for cardiovascular disease risk reduction. Prog Cardiovasc Dis. 2016;58(6):584–94.

[R37] 37. Keadle SK, Lyden KA, Strath SJ, Staudenmayer JW, Freedson PS. A framework to evaluate devices that assess physical behavior. Exerc Sport Sci Rev. 2019;47(4):206–14.

[R38] 38. Luque A, Carrasco A, Martín A, de Las Heras A. The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognit. 2019;91:216–31.

[R39] 39. Jinyuan L, Wan T, Guanqin C, Yin L, Changyong F. Correlation and agreement: overview and clarification of competing concepts and measures. Shanghai Arch Psychiatry. 2016;28(2):115–20.

[R40] 40. Bland JM, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;327(8476):307–10.

[R41] 41. Schuirmann DJ. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J Pharmacokinet Biopharm. 1987;15(6):657–80.

[R42] 42. Rogers JL, Howard KI, Vessey JT. Using significance tests to evaluate equivalence between two experimental groups. Psychol Bull. 1993;113(3):553–65.

[R43] 43. Dixon PM, Saint-Maurice PF, Kim Y, Hibbing P, Bai Y, Welk GJ. A primer on the use of equivalence testing for evaluating measurement agreement. Med Sci Sports Exerc. 2018;50(4):837–45.

[R44] 44. Chowdhury EA, Western MJ, Nightingale TE, Peacock OJ, Thompson D. Assessment of laboratory and daily energy expenditure estimates from consumer multi-sensor physical activity monitors. PLoS One. 2017;12(2):e0171720.

[R45] 45. Migueles JH, Aadland E, Andersen LB, et al. GRANADA consensus on analytical approaches to assess associations with accelerometer-determined physical behaviours (physical activity, sedentary behaviour and sleep) in epidemiological studies. Br J Sports Med. 2022;56(7):376–84.

[R46] 46. Corder K, Brage S, Wareham NJ, Ekelund U. Comparison of PAEE from combined and separate heart rate and movement models in children. Med Sci Sports Exerc. 2005;37(10):1761–7.

[R47] 47. Eston RG, Rowlands AV, Ingledew DK. Validity of heart rate, pedometry, and accelerometry for predicting the energy cost of children’s activities. J Appl Physiol (1985). 1998;84(1):362–71.

[R48] 48. Zakeri I, Adolph AL, Puyau MR, Vohra FA, Butte NF. Application of cross-sectional time series modeling for the prediction of energy expenditure from heart rate and accelerometry. J Appl Physiol (1985). 2008;104(6):1665–73.

[R49] 49. Zakeri IF, Adolph AL, Puyau MR, Vohra FA, Butte NF. Cross-sectional time series and multivariate adaptive regression splines models using accelerometry and heart rate predict energy expenditure of preschoolers. J Nutr. 2013;143(1):114–22.

[R50] 50. National Institutes of Health. Intensive Longitudinal Analysis of Health Behaviors: Leveraging New Technologies to Understand Health Behaviors (U24). 2017.

[R51] 51. Small S, Khalid S, Dhiman P, et al. Impact of reduced sampling rate on accelerometer-based physical activity monitoring and machine learning activity classification. J Meas Phys Behav. 2021;4(4):298–310.

[R52] 52. Park S, Toth LP, Crouter SE, Springer CM, Marcotte RT, Bassett DR. Effect of monitor placement on the daily step counts of wrist and hip activity monitors. J Meas Phys Behav. 2020;3(2):164–9.

[R53] 53. Brazendale K, Beets MW, Weaver RG, et al. Comparing measures of free-living sleep in school-aged children. Sleep Med. 2019;60:197–201.

[R54] 54. Arvidsson D, Fridolfsson J, Börjesson M, et al. Re-examination of accelerometer data processing and calibration for the assessment of physical activity intensity. Scand J Med Sci Sports. 2019;29(10):1442–52.

[R55] 55. Jaeschke L, Steinbrecher A, Boeing H, et al. Factors associated with habitual time spent in different physical activity intensities using multiday accelerometry. Sci Rep. 2020;10(1):774.

[R56] 56. Actigraph. Idle Sleep Mode Explained. 2018 [cited 2022 December 19]. Available from: https://actigraphcorp.my.site.com/support/.

[R57] 57. van Hees V. GGIR now able to read .gt3x files via R package read.gt3x. 2023. Available from: https://www.accelting.com/updates/ggir-release-2-6-0/.
