Abstract
Introduction
Wrist-worn accelerometers can capture stepping behavior passively, continuously, and remotely. Methods utilizing peak detection, threshold crossing, and frequency analysis have been used to detect steps from wrist-worn accelerometer data, but it remains unclear how different approaches perform across a range of walking speeds and free-living activities. In this study, we evaluated the performance of four open-source methods for deriving step counts from wrist-worn accelerometry data, when applied to data from a range of structured locomotion and free-living activities. In addition, we assessed how modifying the parameters of these methods would affect their performance.
Methods
Twenty-one participants (ages 20–33 years) wore an ActiGraph CentrePoint Insight Watch (CPIW; ActiGraph LLC) on their non-dominant wrist while completing structured locomotion activities in a motion capture laboratory and during a free-living period in a mock apartment. Criterion step counts were determined from motion capture heel-strike events during the structured activities and from a StepWatch 3 (Modus Health LLC) during the free-living period. Four open-source methods implementing different algorithmic approaches were applied to the CPIW data to derive step counts. The quantity and timing of method-derived and criterion steps during each type of activity were then compared.
Results
During structured locomotion, methods that relied on a single parameter, such as peak detection or threshold crossing, demonstrated the lowest bias of the methods investigated. Three of the four methods overestimated step counts during slow walking and underestimated them during fast walking, while the fourth method consistently missed at least half of the recorded steps at all speeds. During free-living activities, the method relying on frequency analysis exhibited the lowest percent error of all methods. Finally, incorporating a locomotion classifier, so that steps were estimated only during identified locomotion periods, reduced error for two methods when applied to data across structured and free-living settings.
Conclusion
In studying the performance of different step-counting approaches across different settings, we found a tradeoff between performance during structured walking and that during free-living activities. These findings highlight the opportunity for novel, context-aware methods for accurate step counting across real-world settings.
Keywords: Accelerometer, Analytical validation, Physical activity, Step counting, Wrist-worn device
Introduction
Wearable activity monitors are an established class of digital health technologies that can measure many aspects of physical behavior. These technologies provide researchers with opportunities to measure individuals’ real-world behaviors remotely with low burden and high granularity; as a result, the use of wearable activity monitors in observational and clinical research is on the rise [1, 2]. This shift towards a patient-centric approach offers the chance to gain a more comprehensive understanding of treatment outcomes and well-being across therapeutic areas [2, 3].
Many wearable activity monitors incorporate a tri-axial accelerometer to measure acceleration. Although activity monitors have typically been worn on the hip or waist (e.g., [4]), studies are increasingly incorporating wrist-worn activity monitors to improve wear compliance [5]. Acceleration data from wrist-worn devices can be processed and translated into measures reflecting many aspects of physical behavior, such as physical activity, sedentary behavior, and sleep. Steps are a physical activity metric commonly derived from wearable activity monitors because they are readily interpretable and are associated with chronic disease morbidity, mortality, and overall health status [6–9]. Beyond chronic disease risk prevention, step count and other mobility variables, including gait cadence, speed, and regularity, can serve as valuable digital biomarkers for assessing disease progression and severity [10, 11]. In practice, these digital biomarkers can provide complementary insights into clinically relevant and meaningful patient-reported health outcomes, such as fatigability, functional mobility, or independence [12, 13]. As a result, steps and step-related outcomes are increasingly recognized as components of meaningful aspects of health [14] and as determinants of health outcomes in both healthy individuals and those with clinical conditions [15, 16]. Understanding the relationship between stepping behaviors and health outcomes will inform the development of targeted interventions and treatment strategies.
Realizing these benefits hinges on identifying and validating methods that accurately and reliably detect steps from wrist-worn accelerometer data. The V3 framework (verification, analytical validation, and clinical validation) is a cornerstone of methodological rigor in the assessment of digital measures, emphasizing analytical validity to ensure that technologies and algorithms accurately measure what they purport to measure [17]. By adhering to this framework, researchers can establish whether digital measures are suitable and reliable for their intended purposes, thereby bolstering confidence in study outcomes.
Various algorithmic approaches have been used to detect steps from wrist-worn accelerometry data [18]. These methods leverage the periodic characteristics of locomotion acceleration profiles, detecting steps via approaches such as peak detection [19, 20], threshold crossing [21], and frequency-domain analysis [22]. Several methods have been validated in healthy adults within controlled laboratory settings [23, 24]. Investigating the validity of step-count methods during laboratory-based activities (e.g., treadmill walking) can provide insights into performance during various ambulation speeds and gait characteristics. However, there is also a growing body of evidence indicating that performance declines when methods are applied to data collected during real-world behavior [25, 26]. Accordingly, validation of various step-count methods in real-world settings is needed.
In addition, previous studies have validated the accuracy of proprietary step-counting methods developed by commercial entities [27, 28]. While proprietary step-count methods can be evaluated for overall performance, their parameters cannot be adjusted to investigate how such parameter modifications affect performance. In contrast, the parameters of open-source step-count methods can be tuned and tested; in this sense, open-source methods can contribute to a more comprehensive understanding of when certain methods perform best.
The purpose of this study was to compare the performance of four open-source step-count methods that implement diverse algorithmic approaches. Brief descriptions of the evaluated methods and their tunable parameters are provided in Table 1. For each method, we evaluated performance by comparing the quantity and timing of method-estimated and criterion steps during a range of structured locomotion and simulated free-living activities. In addition, we aimed to understand how modifying the parameters of these step-count methods would affect their performance during different activities.
Table 1.
Overview of open-source step-count methods applied to raw wrist accelerometer data
| Method | Brief description | Tunable parameters |
|---|---|---|
| ADAMO | Adaptive threshold crossing based on midrange | |
| Ducharme | Peak detection with a fixed absolute magnitude threshold | |
| Kang | Frequency analysis of power spectrum | |
| Verisense | Peak detection with multiple thresholds | |
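To make the two simplest approaches in Table 1 concrete, the sketch below shows generic fixed-threshold peak detection (the idea behind the Ducharme entry) and threshold crossing against a per-window midrange (the idea behind the ADAMO entry). It is an illustration under assumed settings, not the published ADAMO or Ducharme code: the input is assumed to be a 1-D acceleration magnitude signal sampled at `fs` Hz, and the threshold, window length, and refractory gap are hypothetical values.

```python
import numpy as np


def peaks_above(x, fs, threshold, min_gap_s=0.25):
    """Times (s) of local maxima in `x` that exceed a fixed threshold.

    A peak closer than `min_gap_s` to the previously accepted peak is skipped
    (a hypothetical refractory period, not taken from any published method).
    """
    x = np.asarray(x, dtype=float)
    mid = x[1:-1]
    is_peak = (mid > x[:-2]) & (mid >= x[2:]) & (mid > threshold)
    times, last = [], -np.inf
    for i in np.flatnonzero(is_peak) + 1:  # +1 maps back to indices of x
        t = i / fs
        if t - last >= min_gap_s:
            times.append(t)
            last = t
    return np.array(times)


def midrange_crossings(x, fs, window_s=2.0):
    """Times (s) of upward crossings of a threshold set to each window's midrange."""
    x = np.asarray(x, dtype=float)
    n = max(2, int(round(window_s * fs)))
    times = []
    for start in range(0, x.size, n):
        # One extra sample so a crossing straddling the window edge is not lost.
        seg = x[start:start + n + 1]
        if seg.size < 2:
            break
        thr = 0.5 * (seg.max() + seg.min())  # adaptive midrange threshold
        up = np.flatnonzero((seg[:-1] < thr) & (seg[1:] >= thr))
        times.extend((start + up + 1) / fs)
    return np.array(times)
```

On a clean 1-Hz oscillation both functions return one event per cycle; on free-living data, any arm movement that produces a peak or midrange crossing is also counted, which is the failure mode discussed later in the paper.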
Methods
Participants
Participants were recruited from western Massachusetts. Individuals were eligible to participate if they were aged 18–64 years, were ostensibly healthy with no prior diagnosis of chronic disease, had not experienced an acute orthopedic injury affecting gait within the past 6 months, and were willing to complete laboratory and free-living activities at the University of Massachusetts Amherst Institute for Applied Life Science Center. All procedures were approved by the Institutional Review Board at the University of Massachusetts Amherst under protocol number 2009. All participants provided written informed consent. Participants were compensated USD 50 in gift cards for each visit.
Protocol
In visit 1, participants completed structured activities within a motion capture laboratory. In visit 2, participants completed an unstructured free-living assessment within a mock apartment space.
Activity Monitoring
During both visit 1 and visit 2, participants wore an ActiGraph CentrePoint Insight Watch (CPIW; ActiGraph LLC, Pensacola, FL, USA) on the dorsal aspect of their non-dominant wrist. CPIW devices were initialized to sample at 128 Hz and affixed to the wrist using manufacturer-provided wristbands. Devices were oriented so that the y-axis was parallel to the longitudinal axis of the forearm.
Visit 1: Structured Activities in Motion Capture Laboratory
Figure 1 illustrates the sequence of tasks completed during the structured activities protocol. Participants were asked to complete eight walking tasks at varying speeds (treadmill walking: 0.6, 1.0, and 1.4 m/s; overground walking: self-selected speed, 0.6, 1.0, and 1.4 m/s; overground 6-min walk test) and two additional walking tasks that were part of a Short Physical Performance Battery (casual walk, balance walk). Treadmill walking was always completed first, with the order of speed sub-conditions randomized within participants. The remaining activities were completed in a counterbalanced, randomized order across participants. Tasks were repeated multiple times, and motion capture data were recorded during segments of each task, henceforth referred to as trials.
Fig. 1.
Task completion order for a sample participant. 6MWT, 6-min walk test.
Throughout the structured activities protocol, center of mass and overground gait events were captured. Kinematics and kinetics of participants’ legs were captured using an eight-camera infrared motion capture system. Missing marker data were interpolated using cubic splines fit to 20 data points before and after each missing-data period. Criterion steps during each trial were determined from heel-strike events derived from motion analysis of the marker data.
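The gap-filling step can be sketched as below. This is a minimal illustration, not the authors' processing pipeline: it assumes each marker coordinate is a 1-D NumPy array with NaN marking missing frames, interprets "20 data points before and after" as up to 20 valid samples on each side of a gap, and uses SciPy's `CubicSpline` with its default not-a-knot boundary condition. The function name `fill_gaps` is hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def fill_gaps(x, n_context=20):
    """Fill NaN runs in a 1-D marker trajectory with a cubic spline.

    For each contiguous run of NaNs, a spline is fit to up to `n_context`
    originally valid samples before and after the run (assumed interpretation).
    Gaps touching either end of the array are left unfilled.
    """
    x = np.asarray(x, dtype=float).copy()
    missing = np.isnan(x)  # mask of originally missing frames
    idx = np.arange(x.size)
    i = 0
    while i < x.size:
        if not missing[i]:
            i += 1
            continue
        start = i
        while i < x.size and missing[i]:
            i += 1
        end = i  # first index after the gap (or x.size)
        if start == 0 or end == x.size:
            continue  # no anchor on one side: leave edge gap as NaN
        ctx = np.concatenate([idx[max(0, start - n_context):start],
                              idx[end:end + n_context]])
        ctx = ctx[~missing[ctx]]  # drop samples from neighboring gaps
        if ctx.size < 4:
            continue  # too few points for a cubic fit
        spline = CubicSpline(ctx, x[ctx])
        x[start:end] = spline(idx[start:end])
    return x
```

Gaps at the start or end of a recording stay NaN because there is no data on one side to anchor the spline.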
Visit 2: Unstructured Free-Living Assessment in Mock Apartment
Participants were asked to occupy a mock apartment space for one 2-h period. The apartment contained different rooms for participants to inhabit (kitchen, living room, bedroom, bathroom, laundry room). Every room except the bathroom was equipped with a camera to record activities. Participants were provided with a list of activities to complete during their time in the apartment (computer work, watching TV, doing laundry, vacuuming, and reading). They were asked to complete every activity on the list, but no further instructions were given regarding the timing, intensity, or duration of activities. Participants were also asked to periodically interrupt their sedentary time with spontaneous posture changes and walking (i.e., take a break from sitting every 15 min, get up and walk on 5 different occasions, and complete at least 1 bout of 10 continuous minutes without sitting).
During visit 2 only, each participant wore a StepWatch 3 (Modus Health LLC, Washington, DC, USA) device on their non-dominant ankle, directly above the lateral malleolus. StepWatch data were downloaded using Modus software and exported in 3-s epochs to provide criterion step counts for the free-living period.
Step-Count Method Application
Step-Count Methods
Four different step-count methods implementing different algorithmic approaches (ADAMO, Ducharme, Verisense, Kang; Table 1) were applied to raw CPIW data to derive step estimates during structured and free-living activities. The Kang method’s parameters were adapted and modified for accelerometer data because the originally reported parameters were specific to gyroscope data [22]. Further information on each method can be found in the original research (Table 1; online suppl. Fig. 1; for all online suppl. material, see https://doi.org/10.1159/000542850) and the code provided at https://github.com/VivoSense/step-validation_public.
Method Evaluation and Comparison
Assessment of Method-Derived versus Criterion Step Counts. For structured activities, criterion and method-estimated step counts from each task were used to compute the method percent error ($\mathrm{PE} = 100 \times (S_{\mathrm{method}} - S_{\mathrm{criterion}}) / S_{\mathrm{criterion}}$) and the absolute percent error ($100 \times |S_{\mathrm{method}} - S_{\mathrm{criterion}}| / S_{\mathrm{criterion}}$, whose mean is the mean absolute percent error, MAPE), where $S$ denotes a step count. These values were entered into linear mixed effects regression models with random intercepts for participants to estimate means and 95% confidence intervals. For the free-living assessment, criterion and method-derived step counts were used in linear regression models to compute mean percent error and absolute percent error and their 95% confidence intervals. Bland-Altman plots were generated for the motion capture and free-living assessments to visualize systematic bias between method estimates and the criterion measures.
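The per-observation quantities defined above can be computed as in the following sketch (NumPy only; the function names are hypothetical). The Bland-Altman limits of agreement use the conventional mean difference ± 1.96 SD, which is an assumption because the construction of the plots is not detailed here, and this simple version ignores the repeated-measures structure that the mixed effects models account for.

```python
import numpy as np


def percent_error(method_steps, criterion_steps):
    """Signed percent error: 100 * (method - criterion) / criterion."""
    m = np.asarray(method_steps, dtype=float)
    c = np.asarray(criterion_steps, dtype=float)
    return 100.0 * (m - c) / c


def absolute_percent_error(method_steps, criterion_steps):
    """Absolute percent error; its mean across observations is the MAPE."""
    return np.abs(percent_error(method_steps, criterion_steps))


def bland_altman(method_steps, criterion_steps):
    """Return (bias, lower LoA, upper LoA) as mean difference +/- 1.96 SD."""
    diff = (np.asarray(method_steps, dtype=float)
            - np.asarray(criterion_steps, dtype=float))
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

For example, method counts of 110, 90, and 100 against a criterion of 100 steps each give percent errors of 10%, −10%, and 0%, a MAPE of about 6.7%, and zero bias; the signed and absolute metrics diverge exactly when errors of opposite sign cancel.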
Best 6-Min Effort Assessment. Best 6-min effort (B6ME) has been proposed as a measure of physical function that can be derived from free-living step data [29]. As another means of evaluating the step-count methods during the free-living period, B6ME was computed for each method and from the StepWatch data as the number of steps during the most active 6-min period, i.e., the period containing the highest volume of steps. Spearman rank correlations were used to test associations between StepWatch-derived and method-estimated B6ME.
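A minimal sketch of the B6ME computation follows. It assumes that step counts are first binned into 3-s epochs (matching the StepWatch export) and that the 6-min window slides one epoch at a time; the sliding step and the helper names `events_to_epochs` and `best_6min_effort` are assumptions, not details reported for this study.

```python
import numpy as np


def events_to_epochs(step_times_s, duration_s, epoch_s=3.0):
    """Bin timestamped step events (s) into step counts per fixed-length epoch."""
    edges = np.arange(0.0, duration_s + epoch_s, epoch_s)
    counts, _ = np.histogram(step_times_s, bins=edges)
    return counts


def best_6min_effort(epoch_steps, epoch_s=3.0, window_s=360.0):
    """Maximum number of steps in any contiguous 6-min window.

    The window slides one epoch at a time (120 epochs for 3-s epochs).
    If the recording is shorter than the window, the total is returned.
    """
    steps = np.asarray(epoch_steps, dtype=float)
    n = int(round(window_s / epoch_s))  # epochs per window
    if steps.size <= n:
        return float(steps.sum())
    csum = np.concatenate([[0.0], np.cumsum(steps)])
    window_sums = csum[n:] - csum[:-n]  # sum of each length-n window
    return float(window_sums.max())
```

Method-estimated and StepWatch-derived B6ME values across participants could then be compared with `scipy.stats.spearmanr`.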
Assessment of Method-Derived versus Criterion Step Timing. The transition pairing method (TPM) was used to derive step event detection and timing metrics for the methods that output timestamped step events (ADAMO, Ducharme, and Verisense). The TPM was originally designed to evaluate sedentary bout transitions [30]. It applies a modified Gale-Shapley matching algorithm to two time series of events (criterion and predicted) and determines the optimal event pairings for a range of user-specified pairing windows, henceforth referred to as spurious pairing thresholds [30]. For each spurious pairing threshold, the TPM yields two event detection metrics: precision ($TP / (TP + FP)$) and recall ($TP / (TP + FN)$). As applied here to step-count method evaluation, true positives (TP) are criterion and method-derived step events paired within the spurious pairing threshold, false negatives (FN) are criterion steps with no method-derived step paired within the threshold, and false positives (FP) are method-derived steps with no criterion step paired within the threshold.
In addition to these event detection metrics, the TPM produces an event timing metric based on the optimal pairings at each spurious pairing threshold: the root mean squared error (RMSE) of the time difference between each criterion heel-strike event and its paired method-derived step event. An aggregated indicator of overall performance can be derived by first converting RMSE to a relative scale ($\%\mathrm{RMSE} = 1 - \mathrm{RMSE}/\tau$, where $\tau$ is the spurious pairing threshold) bounded between 0 and 1, with higher values indicating better event timing. The average of precision, recall, and %RMSE then serves as a single metric of a method’s overall performance, reflecting both event detection and timing.
Because step-to-step intervals were typically shorter than 1 s, all spurious pairing thresholds investigated were sub-second. To test for differences in each TPM metric among step-count methods, a linear mixed effects regression model with activity and step-count method as fixed effects and random intercepts for participants was fit for each spurious pairing threshold.
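The pairing and metric logic of the TPM can be sketched as follows. This is not the published TPM implementation [30]: it assumes that criterion and predicted events both prefer the partner closest in time, that a pair is admissible only within the spurious pairing threshold τ, and that %RMSE = 1 − RMSE/τ. The function names are hypothetical.

```python
import math


def stable_pairing(criterion, predicted, threshold):
    """Pair criterion and predicted event times (s) by Gale-Shapley matching.

    Only pairs with |time difference| <= threshold are admissible. Criterion
    events propose to predicted events in order of increasing time difference;
    each predicted event keeps the closest proposer. Returns sorted
    (criterion_index, predicted_index) pairs.
    """
    prefs = []
    for c in criterion:
        cand = [j for j, p in enumerate(predicted) if abs(p - c) <= threshold]
        cand.sort(key=lambda j: abs(predicted[j] - c))
        prefs.append(cand)
    next_choice = [0] * len(criterion)  # next preference to try per event
    partner = {}                        # predicted index -> criterion index
    free = list(range(len(criterion)))
    while free:
        i = free.pop()
        if next_choice[i] >= len(prefs[i]):
            continue                    # no admissible partner left: unmatched
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        k = partner.get(j)
        if k is None:
            partner[j] = i
        elif abs(predicted[j] - criterion[i]) < abs(predicted[j] - criterion[k]):
            partner[j] = i              # j prefers the closer event i
            free.append(k)
        else:
            free.append(i)              # i is rejected and tries its next choice
    return sorted((i, j) for j, i in partner.items())


def tpm_metrics(criterion, predicted, threshold):
    """Precision, recall, RMSE, %RMSE (assumed 1 - RMSE/threshold), and mean."""
    pairs = stable_pairing(criterion, predicted, threshold)
    tp = len(pairs)
    fp = len(predicted) - tp            # unmatched method-derived steps
    fn = len(criterion) - tp            # unmatched criterion steps
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if criterion else 0.0
    if tp:
        sq = [(predicted[j] - criterion[i]) ** 2 for i, j in pairs]
        rmse = math.sqrt(sum(sq) / tp)
        pct_rmse = 1.0 - rmse / threshold
    else:
        rmse, pct_rmse = float("nan"), 0.0
    return {"precision": precision, "recall": recall, "rmse": rmse,
            "pct_rmse": pct_rmse, "overall": (precision + recall + pct_rmse) / 3.0}
```

Because every matched pair differs by at most τ, RMSE cannot exceed τ, which keeps the assumed %RMSE within [0, 1].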
Cadence Assessment. For each method that provided timestamped step events, we also computed gait cadence, given its associations with physical function, fatigability, and mobility across various therapeutic areas [31, 32]. For each activity trial, the time difference between consecutive detected steps was computed and used to estimate the instantaneous cadence for each step (60 divided by the step-to-step interval in seconds, in steps/min). Trial cadence was computed as the median of the instantaneous cadences. The correlation between method-estimated and criterion cadence was assessed using the “rmcorr” R package (version 0.7.0) to account for repeated measures within participants. Method validity statistics (bias, percent error, MAPE) were evaluated using linear mixed effects regression with random intercepts for participants.
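The cadence computation described above reduces to a few lines. The sketch assumes the step timestamps (in seconds) for a single trial, and the function name is hypothetical. Taking the median makes trial cadence robust to an occasional false positive step, which splits one interval into two shorter ones.

```python
import numpy as np


def trial_cadence(step_times_s):
    """Median instantaneous cadence (steps/min) for one trial.

    Instantaneous cadence for each step is 60 / (time since the previous
    step in seconds); the trial-level value is the median across steps.
    """
    t = np.sort(np.asarray(step_times_s, dtype=float))
    if t.size < 2:
        return float("nan")
    intervals = np.diff(t)
    intervals = intervals[intervals > 0]  # guard against duplicate timestamps
    if intervals.size == 0:
        return float("nan")
    return float(np.median(60.0 / intervals))
```

Steps every 0.5 s give 120 steps/min, and inserting one spurious step between two true steps leaves that median unchanged.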
Step-Count Method Parameter Tuning and Assessment
To understand how modifying the parameters of each method would affect performance during free-living activities, each method was optimized via grid-search parameter tuning. For the simpler methods that did not already incorporate a locomotion classifier prior to step detection (i.e., ADAMO and Ducharme), the Kang locomotion classifier with default parameters was added to determine the impact of a two-stage approach on free-living step-count performance (i.e., first identify locomotion periods, then derive step events only within those periods). The parameter values investigated in the grid search are detailed in online supplementary Table S2. Briefly, to ensure that low, moderate, and high step-volume sessions were represented during optimization, a training set of twelve participants was randomly selected, stratified by criterion StepWatch step volume (four participants from each of the low, medium, and high tertiles). These data were used to tune the methods and determine the parameter set that minimized RMSE. The resulting optimized methods were then applied to free-living data from the remaining seven participants and evaluated for total steps (percent error and MAPE). Means and 95% confidence intervals of the original and optimized methods’ performance metrics were compared.
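The stratified training split and grid search described above can be sketched as follows. The step-count interface `count_steps(signal, **params)`, the helper names, and the random seed are hypothetical, and the sketch assumes RMSE is computed on per-participant total step counts; it only mirrors the stated design of four training participants per step-volume tertile and an RMSE-minimizing parameter set.

```python
import itertools

import numpy as np


def stratified_split(criterion_totals, n_per_tertile=4, seed=0):
    """Split participants into train/test sets balanced across step-volume tertiles.

    Participants are ranked by criterion step volume and divided into three
    near-equal groups (low, medium, high); `n_per_tertile` are drawn at random
    from each group for training and the rest form the test set.
    """
    rng = np.random.default_rng(seed)
    order = np.argsort(np.asarray(criterion_totals))
    train = []
    for tertile in np.array_split(order, 3):
        picks = rng.choice(tertile, size=n_per_tertile, replace=False)
        train.extend(int(i) for i in picks)
    test = sorted(set(range(len(order))) - set(train))
    return sorted(train), test


def grid_search(count_steps, signals, criterion_totals, grid):
    """Return (best_params, best_rmse) over the Cartesian product of `grid`.

    `count_steps(signal, **params)` is a hypothetical step-count interface;
    RMSE is computed between estimated and criterion total steps.
    """
    target = np.asarray(criterion_totals, dtype=float)
    names = list(grid)
    best_params, best_rmse = None, np.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        est = np.array([count_steps(s, **params) for s in signals], dtype=float)
        rmse = float(np.sqrt(np.mean((est - target) ** 2)))
        if rmse < best_rmse:
            best_params, best_rmse = params, rmse
    return best_params, best_rmse
```

With 19 free-living participants, `np.array_split` yields tertiles of 7, 6, and 6, so drawing four from each leaves seven participants for testing, matching the split reported above.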
Results
Twenty-one individuals aged 20–33 years participated in the study. A summary of participant characteristics, as well as activities completed in the laboratory, is presented in online supplementary Table S1. One participant’s StepWatch device malfunctioned, resulting in no usable data, and another participant completed only the free-living portion of the study. In addition, visual inspection revealed that one participant’s StepWatch-derived step counts were abnormally high during observed sitting periods, accounting for 60% of total steps in the free-living session; this participant’s free-living data were therefore excluded from analysis. This resulted in N = 20 participants with usable data for analysis of structured activities and N = 19 for analysis of free-living activities.
The agreement between method step estimates and criterion steps is shown in Figure 2 and online supplementary Figure 2. The ADAMO and Ducharme methods yielded the lowest MAPE across all structured walking activities. In contrast, Verisense missed at least half of criterion steps during all structured activities. The ADAMO, Ducharme, and Kang methods all significantly overestimated steps during slow walking (0.6 m/s) and underestimated steps during fast walking (1.4 m/s) in both the treadmill and overground settings. For walking at medium speed (1.0 m/s), these methods significantly underestimated steps during treadmill walking but yielded unbiased estimates during overground walking. Steps during the balance and casual walks were unbiased for the ADAMO and Ducharme methods, whereas the Kang and Verisense methods significantly underestimated them. All methods significantly underestimated steps during the 6-min walk test. During the free-living period, all methods yielded biased step estimates relative to StepWatch: steps were significantly overestimated by the ADAMO (mean [95% CI]: 1,054% [829.6%, 1,277.9%]), Ducharme (128% [91.0%, 165.7%]), and Kang (28% [11.4%, 43.9%]) methods, whereas the Verisense method underestimated steps (−34% [−52.1%, −15.1%]).
Fig. 2.
Step-count method percent error (a) and absolute percent error (b) during structured activities and the free-living assessment. Points indicate mean values. Error bars indicate 95% confidence intervals (CIs).
Associations between StepWatch-derived and method-estimated B6ME during the free-living assessment are illustrated in Figure 3. Of the step-count methods, Kang-derived B6ME demonstrated the strongest positive association with StepWatch-derived B6ME (rho [95% CI]: 0.554 [0.147, 0.800], p = 0.011). Ducharme-derived B6ME was also positively correlated with StepWatch-derived B6ME. There was no significant association between Verisense- and StepWatch-derived B6ME (rho [95% CI]: 0.371 [−0.086, 0.699], p = 0.107). StepWatch-derived B6ME was significantly negatively correlated with ADAMO-derived B6ME.
Fig. 3.
Associations between method-estimated and StepWatch-measured B6ME during the free-living assessment. Solid lines reflect the linear fit. Dashed lines are lines of identity.
Results of the TPM analysis comparing the timing of criterion and predicted step events are presented in Figure 4. Compared with ADAMO and Verisense, the Ducharme method generally had significantly higher TPM metrics at most spurious pairing thresholds. The TPM metrics stabilized at spurious pairing thresholds ≥ 0.3 s, a reasonable timing tolerance for step-count methods; unless stated otherwise, the results in this paragraph refer to this 0.3-s threshold. All methods achieved >90% precision, and two of the three methods achieved ≥ 74% recall (Ducharme = 84%; ADAMO = 81%). The Verisense method had the lowest recall of all methods (16%), indicating more false negative steps than the other methods, which is consistent with its high MAPE (Fig. 2). For matched step timing (%RMSE, higher is better), %RMSE was significantly higher for Ducharme than for both ADAMO and Verisense. Overall, the Ducharme method demonstrated higher aggregated performance across precision, recall, and %RMSE than the other methods at all spurious pairing thresholds.
Fig. 4.
Transition pairing method metrics for steps detected by each step-count method during motion capture activities. Data presented are the grand means and standard deviations across all tasks completed during the motion capture activities. Points indicate mean values. Error bars indicate 95% confidence intervals. Asterisks indicate that the respective result for ADAMO was significantly different from that for Ducharme. Crosses indicate that the respective result for Verisense was significantly different from that for Ducharme.
Summary statistics for cadence estimated using the Ducharme, ADAMO, and Verisense methods are presented in Table 2. Cadence estimated with the Ducharme method exhibited the strongest association with motion capture-determined cadence (r = 0.504 [95% CI: 0.448, 0.556]), compared with cadence estimated with the ADAMO (r = 0.225 [0.155, 0.292]) and Verisense (r = 0.340 [0.232, 0.440]) methods. Group-level ADAMO cadence estimates were unbiased. However, individual-level precision of cadence estimates was best for the Ducharme method, as evidenced by its lower mean absolute percent error (MAPE: 9.6% [6.2%, 13.1%]). Lower precision in cadence estimates was attributed to false positive and false negative steps, which shorten and lengthen step-to-step intervals and thus lead to over- and underestimation of cadence, respectively (online suppl. Fig. S3).
Table 2.
Validation of method-derived versus motion capture-derived gait cadence during laboratory activities
| Metric | ADAMO | Ducharme | Verisense |
|---|---|---|---|
| Correlation, r | 0.225 [0.155, 0.292] | 0.504 [0.448, 0.556] | 0.340 [0.232, 0.440] |
| Bias, steps/min | −4.7 [−9.9, 0.4] | −8.1 [−12.3, −3.9] | 58.8 [12.0, 105.3] |
| Percent error, % | −3.3 [−7.4, 0.9] | −6.3 [−9.7, −2.8] | 45.3 [6.5, 83.7] |
| Absolute percent error, % | 13.3 [8.4, 18.1] | 9.6 [6.2, 13.1] | 97.8 [65.2, 130.1] |
Correlations are repeated measures correlation coefficients (rmcorr) [95% CI]. All other values are mean [95% CI] derived from linear mixed effects regression accounting for repeated measures within participants.
The performance of methods optimized via grid-search parameter tuning was assessed by percent error and MAPE for structured and free-living activities combined (Fig. 5). MAPE was lower for the optimized versions of the ADAMO and Ducharme methods than for the original methods (online suppl. Table S3). Percent error differed significantly between the original and optimized versions of Kang (27.3% [3.1%, 51.5%] vs. −25.4% [−41.0%, −9.9%]) and Verisense (−12.7% [−45.1%, 19.7%] vs. 72.8% [40.2%, 105.4%]), as indicated by non-overlapping confidence intervals; however, MAPE did not differ between the original and optimized versions of either method (online suppl. Table S3). Univariate distributions illustrating the impact of various parameter modifications on method performance are shown in online supplementary Figure S4. Of all the modifications, adding a locomotion classifier had the largest positive influence on the performance of the ADAMO and Ducharme methods.
Fig. 5.
Percent error and mean absolute percent error of original and optimized step-count method performance during the free-living activities. Points indicate mean values. Error bars indicate 95% confidence intervals (CIs).
Discussion
In this study, we investigated the analytical validity of four open-source step-count methods applied to wrist-worn accelerometer data collected during structured and free-living activities. Methods that rely on a single parameter to detect steps, such as peak detection (Ducharme) or threshold crossing (ADAMO), demonstrated the lowest bias of all methods investigated during structured activities. However, during free-living activities, these methods performed worse than the remaining methods. Because the Ducharme and ADAMO methods were originally developed from data collected during structured locomotion activities (e.g., treadmill, overground, stair ascent/descent), this pattern is consistent with prior findings that laboratory-developed methods perform better during structured activities and worse during free-living activities. In contrast, the Verisense method, which was developed and tuned to minimize error on data from free-living behaviors, exhibited relatively low error on free-living data but underestimated steps during structured locomotion activities.
The frequency-based Kang method performed similarly to the Ducharme and ADAMO methods during structured activities and achieved the best performance of all methods during free-living conditions. Although it was originally developed using data from laboratory settings, its free-living performance may stem from its parameterization of observed frequency-domain patterns and theoretical locomotion characteristics, which it uses to infer stride frequency and, from it, the number of steps. Furthermore, the Kang method’s initial locomotion classifier, which considers steps only during periods whose dominant frequency lies within the typical frequency range of human walking (0.6–2.0 Hz), may further mitigate false positive step errors in free-living settings. Results of our optimization analysis further support the value of a locomotion classifier: adding a locomotion classifier to the Ducharme and ADAMO methods significantly reduced error compared with the methods’ original, unmodified versions. Recent findings indicate that step estimates derived from the Verisense method may also be improved when combined with a self-supervised machine learning locomotion classifier [33].
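The frequency-domain locomotion check described for the Kang method can be illustrated with the sketch below. It is not the Kang implementation: it assumes a 1-D acceleration magnitude window sampled at 128 Hz (the CPIW sampling rate), that the dominant spectral frequency equals stride frequency, and that each stride contains two steps. The function name is hypothetical; the 0.6–2.0 Hz band is taken from the text.

```python
import numpy as np


def frequency_step_estimate(window, fs=128.0, band=(0.6, 2.0)):
    """Classify a window as locomotion and estimate its steps from the spectrum.

    Assumes the dominant frequency of the acceleration magnitude corresponds
    to stride frequency with two steps per stride. Returns
    (is_locomotion, estimated_steps).
    """
    x = np.asarray(window, dtype=float)
    x = x - x.mean()                        # remove gravity/DC offset
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power[0] = 0.0                          # ignore the zero-frequency bin
    f_dom = freqs[np.argmax(power)]
    if not (band[0] <= f_dom <= band[1]):
        return False, 0.0                   # not classified as locomotion
    duration_s = x.size / fs
    return True, 2.0 * f_dom * duration_s   # strides/s * 2 steps * seconds
```

Windows whose dominant frequency falls outside the band contribute no steps. This is how such a classifier suppresses false positives from non-walking arm movement, and also why walking or running outside the band is missed.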
While the Kang method effectively estimates the number of steps within a specified window, its output lacks timestamps for individual step events. This restricts its application to step-count metrics (e.g., steps/day, B6ME) and forgoes the granular analyses enabled by peak detection and threshold-based methods, which provide timestamped step events and thereby allow deeper investigation of gait characteristics relevant to clinical outcomes. Given the critical role of precise step event detection in deriving numerous gait characteristics, TPM metrics offer a novel approach to comprehensively evaluate both step detection sensitivity and step event timing. The Ducharme method consistently demonstrated superior TPM performance across all metrics compared with the other methods investigated. Similarly, Ducharme-derived cadence estimates corresponded most closely with criterion cadence of the methods evaluated. Therefore, optimizing TPM metrics during the development of algorithms that produce discrete step events may also improve clinically relevant gait characteristics.
Taken together, our results suggest that peak detection-based step-count methods require stringent filtering to avoid counting spurious peaks as steps during free-living activities. This is particularly relevant for wrist-worn activity monitors, because the upper limbs are frequently involved in real-world activities of daily living. We also found that a two-stage approach, in which locomotion is classified first and the step-count method is then applied only to locomotion periods, significantly improved free-living step estimates. Furthermore, while frequency-based approaches performed well during locomotion within their predefined frequency range, they exhibited increased susceptibility to error during locomotion outside that range. This limitation can yield inaccurate step estimates for non-canonical walking patterns or higher-intensity locomotion (e.g., running).
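The two-stage approach can be sketched as a gating step applied after any timestamped step detector. The sketch assumes the locomotion classifier outputs one boolean label per fixed-length, non-overlapping window; the window length and the function name `gate_steps` are hypothetical.

```python
import numpy as np


def gate_steps(step_times_s, window_flags, window_s):
    """Keep step events that fall inside windows labeled as locomotion.

    `window_flags[k]` is True when the classifier labeled the interval
    [k * window_s, (k + 1) * window_s) as locomotion. Steps before time 0 or
    after the last labeled window are discarded.
    """
    t = np.asarray(step_times_s, dtype=float)
    flags = np.asarray(window_flags, dtype=bool)
    k = np.floor(t / window_s).astype(int)  # window index of each step
    valid = (k >= 0) & (k < flags.size)
    keep = np.zeros(t.size, dtype=bool)
    keep[valid] = flags[k[valid]]
    return t[keep]
```

For example, with 10-s windows labeled [locomotion, non-locomotion, locomotion], steps at 1, 3, 12, 15, 25, and 31 s are reduced to 1, 3, and 25 s: the peaks during the non-locomotion window are discarded as likely arm movements.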
This study has several additional limitations. First, because of space constraints, overground walking trials in the motion capture laboratory were short, which limited the number of steps sampled. This potentially inflated percent error estimates, although results from overground walking were similar to those from treadmill activities, for which longer trials provided larger step samples. Future research investigating method validity across the entire waking day or with longer structured trials is warranted. Second, the mock apartment space limited opportunities for locomotion, resulting in predominantly short, frequent locomotion bouts during the free-living period. Although behaviors were intermittent and emphasized activities of daily living, for which step detection is notoriously difficult [25, 26], the simulated free-living setting provided a unique opportunity to assess real-world performance in a highly relevant context. Third, participants were predominantly young and healthy, and the sample did not include individuals with low functional capacity or slow or altered gait, so findings may not generalize to populations with restricted physical abilities or pathological gait. Lastly, the sample size was small (N = 20 for structured activities; N = 19 for free-living activities), particularly for the optimization analysis, which split the sample. However, even with this sample size, the statistically significant relative improvement in method step estimates provides confidence in the overall trends and insights presented, which should be considered in future algorithm development. Future research should explore the analytical validity of step-count methods in individuals with diverse demographics, restricted mobility, and altered gait. Additionally, ensemble-based approaches warrant exploration for method development, given that some methods performed better within specific contexts.
Our aim in comparing the performance of four open-source step-count methods applied to wrist-worn accelerometry data collected during structured and free-living activities was to identify their strengths and weaknesses and thereby inform the development of novel methods that accurately detect steps across settings. Results indicated that the frequency-based Kang method demonstrated low error across both structured and free-living activities. Conversely, peak detection via the Ducharme method and threshold crossing via the ADAMO method performed better than the other methods during structured locomotion activities but worse in free-living scenarios. The Verisense method, on the other hand, performed better during free-living behaviors. Incorporating a locomotion classifier into the Ducharme and ADAMO methods improved their performance. Our findings highlight the tradeoff between better performance in specific contexts and generalizability across contexts, and point to an opportunity for context-aware methods to detect steps robustly in real-world applications. Drug development is likely to benefit from robust step-count methods, as they can enhance the precision, sensitivity, and reliability of outcome measures involving stepping behavior. This could improve the detection of clinically meaningful change over time and optimize sample size estimates, leading to more efficient and cost-effective clinical studies [34, 35]. Ultimately, robust step-count methods could support the selection of appropriate endpoints and the design of studies tailored to detect clinically meaningful treatment effects.
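The two-stage, classifier-gated idea can be sketched as follows. This is a minimal, hypothetical illustration rather than the locomotion classifier or any step-count method used in this study: stage 1 labels each 5-s window as locomotion only if its signal variability and implied cadence (assumed plausible range of 60–200 steps/min) pass simple checks, and stage 2 counts thresholded, minimum-spaced peaks only within those windows. The input is assumed to be a gravity-removed acceleration magnitude in g; all thresholds and function names are assumptions.

```python
import math
from statistics import pstdev


def sine(freq_hz, amp, fs, seconds):
    """Synthetic test signal: one sinusoid sampled at `fs` Hz (illustrative only)."""
    n = int(round(seconds * fs))
    return [amp * math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]


def count_peaks(signal, threshold, min_gap):
    """Count local maxima above `threshold` that are at least `min_gap` samples apart."""
    count, last = 0, -min_gap
    for i in range(1, len(signal) - 1):
        # '>=' on the left and '>' on the right count a flat two-sample top only once
        is_peak = signal[i] >= signal[i - 1] and signal[i] > signal[i + 1]
        if is_peak and signal[i] > threshold and i - last >= min_gap:
            count += 1
            last = i
    return count


def looks_like_locomotion(window, fs, threshold, min_gap,
                          sd_min=0.05, cadence_spm=(60.0, 200.0)):
    """Hypothetical stage-1 classifier: enough variability and a plausible cadence."""
    if pstdev(window) < sd_min:
        return False
    cadence = count_peaks(window, threshold, min_gap) * 60.0 * fs / len(window)
    return cadence_spm[0] <= cadence <= cadence_spm[1]


def two_stage_steps(signal, fs, win_s=5.0, threshold=0.1, min_gap_s=0.25):
    """Stage 2: count peaks only in windows that stage 1 labels as locomotion."""
    n = int(round(win_s * fs))
    min_gap = max(1, int(round(min_gap_s * fs)))
    total = 0
    for start in range(0, len(signal) - n + 1, n):  # trailing partial window is ignored
        window = signal[start:start + n]
        if looks_like_locomotion(window, fs, threshold, min_gap):
            total += count_peaks(window, threshold, min_gap)
    return total
```

On a synthetic 10-s, 2-Hz walking segment followed by a 10-s, 0.2-Hz reaching-like movement (both sampled at 30 Hz), peak detection alone counts 22 peaks (20 + 2), whereas the gated version counts only the 20 walking steps, because the slow movement implies a cadence of 12 steps/min and is rejected in stage 1.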
Statement of Ethics
All procedures were approved by the Institutional Review Board (IRB) at the University of Massachusetts Amherst under protocol number 2009. All participants provided written, informed consent prior to participation.
Conflict of Interest Statement
R.T.M. was a paid intern at VivoSense, Inc. S.L.B., Y.Z., I.C., and K.L. are employees of VivoSense, Inc. I.C. is on the Editorial Board of Karger Digital Biomarkers and the Scientific Advisory Board for IMI IDEA FAST and has received fees for lectures and consulting on digital health at ETH Zürich and FHNW Muttenz.
Funding Sources
Research reported in this publication was funded by VivoSense, Inc.
Author Contributions
K.L. designed the study, developed the protocol, and supervised data collection. R.T.M. conducted the data analysis under the instruction of Y.Z. in collaboration with S.L.B. and I.C. R.T.M. wrote the first draft of the manuscript. All authors contributed to the interpretation, writing, reviewing, and editing of the manuscript. All authors accept responsibility for the integrity of the work and approve the final version of the manuscript.
Data Availability Statement
The data that support the findings of this study are not publicly available because they contain information that could compromise the privacy of research participants; they are available from R.T.M. (robert.marcotte@vivosense.com) upon reasonable request.
Supplementary Material
References
1. Wijndaele K, Westgate K, Stephens SK, Blair SN, Bull FC, Chastin SFM, et al. Utilization and harmonization of adult accelerometry data: review and expert consensus. Med Sci Sports Exerc. 2015;47(10):2129–39.
2. Izmailova ES, Wagner JA, Ammour N, Amondikar N, Bell-Vlasov A, Berman S, et al. Remote digital monitoring for medical product development. Clin Transl Sci. 2021;14(1):94–101.
3. Clay I, Peerenboom N, Connors DE, Bourke S, Keogh A, Wac K, et al. Reverse engineering of digital measures: inviting patients to the conversation. Digit Biomark. 2023;7(1):28–44.
4. Mueller A, Hoefling HA, Muaremi A, Praestgaard J, Walsh LC, Bunte O, et al. Continuous digital monitoring of walking speed in frail elderly patients: noninterventional validation study and longitudinal clinical trial. JMIR MHealth UHealth. 2019;7(11):e15191.
5. Freedson PS, John D. Comment on “estimating activity and sedentary behavior from an accelerometer on the hip and wrist.” Med Sci Sports Exerc. 2013;45(5):962–3.
6. Bassett DR, Toth LP, LaMunion SR, Crouter SE. Step counting: a review of measurement considerations and health-related applications. Sports Med. 2017;47(7):1303–15.
7. Paluch AE, Bajpai S, Bassett DR, Carnethon MR, Ekelund U, Evenson KR, et al. Daily steps and all-cause mortality: a meta-analysis of 15 international cohorts. Lancet Public Health. 2022;7(3):e219–28.
8. Tudor-Locke C, Bassett DR. How many steps/day are enough? Sports Med. 2004;34(1):1–8.
9. Piercy KL, Troiano RP, Ballard RM, Carlson SA, Fulton JE, Galuska DA, et al. The physical activity guidelines for Americans. JAMA. 2018;320(19):2020–8.
10. Sasaki JE, Bertochi GFA, Meneguci J, Motl RW. Pedometers and accelerometers in multiple sclerosis: current and new applications. Int J Environ Res Public Health. 2022;19(18):11839.
11. Soltani A, Abolhassani N, Marques-Vidal P, Aminian K, Vollenweider P, Paraschiv-Ionescu A. Real-world gait speed estimation, frailty and handgrip strength: a cohort-based study. Sci Rep. 2021;11(1):18966.
12. Griffiths P, Rofail D, Lehner R, Mastey V. The patient matters in the end(point). Adv Ther. 2022;39(11):4847–52.
13. Rao C, Di Lascio E, Demanse D, Marshall N, Sopala M, De Luca V. Association of digital measures and self-reported fatigue: a remote observational study in healthy participants and participants with chronic inflammatory rheumatic disease. Front Digit Health. 2023;5:1099456.
14. Manta C, Patrick-Lake B, Goldsack JC. Digital measures that matter to patients: a framework to guide the selection and development of digital measures of health. Digit Biomark. 2020;4(3):69–77.
15. Polgar O, Patel S, Walsh JA, Barker RE, Clarke SF, Man WD-C, et al. Minimal clinically important difference for daily pedometer step count in COPD. ERJ Open Res. 2021;7(1):00823-2020.
16. Gardner AW, Montgomery PS, Wang M, Shen B. Minimal clinically important differences in daily physical activity outcomes following supervised and home-based exercise in peripheral artery disease. Vasc Med. 2022;27(2):142–9.
17. Goldsack JC, Coravos A, Bakker JP, Bent B, Dowling AV, Fitzer-Attas C, et al. Verification, analytical validation, and clinical validation (V3): the foundation of determining fit-for-purpose for Biometric Monitoring Technologies (BioMeTs). Npj Digit Med. 2020;3(1):55.
18. Rhudy MB, Mahoney JM. A comprehensive comparison of simple step counting techniques using wrist- and ankle-mounted accelerometer and gyroscope signals. J Med Eng Technol. 2018;42(3):236–43.
19. Ducharme SW, Lim J, Busa MA, Aguiar EJ, Moore CC, Schuna JM, et al. A transparent method for step detection using an acceleration threshold. J Meas Phys Behav. 2021;4(4):311–20.
20. Maylor BD, Edwardson CL, Dempsey PC, Patterson MR, Plekhanova T, Yates T, et al. Stepping towards more intuitive physical activity metrics with wrist-worn accelerometry: validity of an open-source step-count algorithm. Sensors. 2022;22(24):9984.
21. Magistro D, Brustio PR, Ivaldi M, Esliger DW, Zecca M, Rainoldi A, et al. Validation of the ADAMO Care Watch for step counting in older adults. PLoS One. 2018;13(2):e0190753.
22. Kang X, Huang B, Qi G. A novel walking detection and step counting algorithm using unconstrained smartphones. Sensors. 2018;18(1):297.
23. Toth LP, Park S, Pittman WL, Sarisaltik D, Hibbing PR, Morton AL, et al. Validity of activity tracker step counts during walking, running, and activities of daily living. Transl J Am Coll Sports Med. 2018;3(7):52–9.
24. Toth LP, Park S, Pittman WL, Sarisaltik D, Hibbing PR, Morton AL, et al. Effects of brief intermittent walking bouts on step count accuracy of wearable devices. J Meas Phys Behav. 2019;2(1):13–21.
25. Kerr J, Patterson RE, Ellis K, Godbole S, Johnson E, Lanckriet G, et al. Objective assessment of physical activity: classifiers for public health. Med Sci Sports Exerc. 2016;48(5):951–7.
26. Pavey TG, Gilson ND, Gomersall SR, Clark B, Trost SG. Field evaluation of a random forest activity classifier for wrist-worn accelerometer data. J Sci Med Sport. 2017;20(1):75–80.
27. Toth L, Paluch AE, Bassett DRJ, Rees-Punia E, Eberl EM, Park S, et al. Comparative analysis of ActiGraph step counting methods in adults: a systematic literature review and meta-analysis. Med Sci Sports Exerc. 2024;56(1):53–62.
28. Fuller D, Colwell E, Low J, Orychock K, Tobin MA, Simango B, et al. Reliability and validity of commercially available wearable devices for measuring steps, energy expenditure, and heart rate: systematic review. JMIR MHealth UHealth. 2020;8(9):e18694.
29. Lyden K, Abraham N, Boucher R, Wei G, Gonce V, Carle J, et al. Predicting hospitalization from real-world measures in patients with chronic kidney disease: a proof-of-principle study. Digit Health. 2023;9:20552076231181234.
30. Hibbing PR, LaMunion SR, Hilafu H, Crouter SE. Evaluating the performance of sensor-based bout detection algorithms: the transition pairing method. J Meas Phys Behav. 2020;3(3):219–27.
31. Rubin DS, Ranjeva SL, Urbanek JK, Karas M, Madariaga MLL, Huisingh-Scheetz M. Smartphone-based gait cadence to identify older adults with decreased functional capacity. Digit Biomark. 2022;6(2):61–70.
32. Urbanek JK, Zipunnikov V, Harris T, Crainiceanu C, Harezlak J, Glynn NW. Validation of gait characteristics extracted from raw accelerometry during walking against measures of physical function, mobility, fatigability, and fitness. J Gerontol A Biol Sci Med Sci. 2018;73(5):676–81.
33. Small SR, Chan S, Walmsley R, Fritsch LV, Acquah A, Mertes G, et al. Development and validation of a machine learning wrist-worn step detection algorithm with deployment in the UK biobank. medRxiv. 2023:2023.02.20.23285750.
34. Durán CO, Bonam M, Björk E, Hughes R, Ghiorghiu S, Massacesi C, et al. Implementation of digital health technology in clinical trials: the 6R framework. Nat Med. 2023;29(11):2693–7.
35. Taylor-Rodriguez DM, Lovitz DM, Mattek N, Wu C-Y, Kaye J, Dodge HH, et al. Reducing the sample size in high-frequency biomarkers RCTs. Alzheimers Dement. 2020;16(S5):e042005.