Med Sci Sports Exerc. 2021 May 25;53(11):2445–2454. doi: 10.1249/MSS.0000000000002705

The CNN Hip Accelerometer Posture (CHAP) Method for Classifying Sitting Patterns from Hip Accelerometers: A Validation Study

MIKAEL ANNE GREENWOOD-HICKMAN 1, SUPUN NAKANDALA 2, MARTA M JANKOWSKA 3, DORI E ROSENBERG 1, FATIMA TUZ-ZAHRA 4, JOHN BELLETTIERE 4, JORDAN CARLSON 5,6, PAUL R HIBBING 5, JINGJING ZOU 4, ANDREA Z LACROIX 4, ARUN KUMAR 2, LOKI NATARAJAN 4
PMCID: PMC8516667  NIHMSID: NIHMS1704504  PMID: 34033622

Supplemental digital content is available in the text.

Key Words: MACHINE LEARNING; HEALTHY AGING; SIT-TO-STAND TRANSITIONS; ACTIVPAL; ACTIGRAPH; FREE-LIVING; OLDER ADULT

ABSTRACT

Introduction

Sitting patterns predict several healthy aging outcomes. These patterns can potentially be measured using hip-worn accelerometers, but current methods are limited by an inability to detect postural transitions. To overcome these limitations, we developed the Convolutional Neural Network Hip Accelerometer Posture (CHAP) classification method.

Methods

CHAP was developed on 709 older adults who wore an ActiGraph GT3X+ accelerometer on the hip, with ground-truth sit/stand labels derived from concurrently worn thigh-mounted activPAL inclinometers, for up to 7 d. The CHAP method was compared with traditional cut-point methods of sitting pattern classification as well as a previous machine-learned algorithm (two-level behavior classification [TLBC]).

Results

For minute-level sitting versus nonsitting classification, CHAP performed better (93% agreement with activPAL) than did other methods (74%–83% agreement). CHAP also outperformed the other methods in sensitivity for detecting sit-to-stand transitions: cut-point (73%), TLBC (26%), and CHAP (83%). CHAP’s positive predictive value for capturing sit-to-stand transitions was also superior: cut-point (30%), TLBC (71%), and CHAP (83%). Day-level sitting pattern metrics, such as mean sitting bout duration, derived from CHAP did not differ significantly from activPAL, whereas those from the other methods did: activPAL (15.4 min of mean sitting bout duration), CHAP (15.7 min), cut-point (9.4 min), and TLBC (49.4 min).

Conclusion

CHAP was the most accurate method for classifying sit-to-stand transitions and sitting patterns from free-living hip-worn accelerometer data in older adults. This promotes enhanced analysis of older adult movement data, resulting in more accurate measures of sitting patterns and opening the door for large-scale cohort studies into the effects of sitting patterns on healthy aging outcomes.


Sedentary behavior is a serious and prevalent health risk for older adults, comprising 10–14 h of their day (1–6). Recent evidence suggests that there may be additional risk associated with sitting for prolonged periods of time, independent of the total time spent sitting (7–9). The latter findings have led to increased interest in the study of “sitting patterns,” which refers to the number and duration of sitting bouts (i.e., continuous periods of sitting) versus nonsitting bouts (i.e., continuous periods of standing or stepping), as well as the postural transitions between them. Sitting patterns can be quantified using metrics such as number of daily sit-to-stand transitions, number of daily sitting bouts, number of daily prolonged sitting bouts (≥30 min), mean sitting bout duration (total daily sitting time/total sit-to-stand transitions), and usual bout duration (the sitting bout duration at or above which 50% of an individual’s sitting time is accumulated) (8,10).
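
To make these metric definitions concrete, the following Python sketch (ours, not part of any published pipeline; function and variable names are illustrative) computes them from one day's list of sitting bout durations, treating each sitting bout as ending in one sit-to-stand transition:

```python
import numpy as np

def sitting_pattern_metrics(bout_durations_min):
    """Compute common sitting pattern metrics from one day's sitting bout
    durations (minutes), treating each bout as ending in one transition."""
    bouts = np.asarray(bout_durations_min, dtype=float)
    total_sitting = bouts.sum()
    n_transitions = len(bouts)
    # Mean sitting bout duration: total sitting time / total transitions.
    mean_bout = total_sitting / n_transitions
    # Prolonged bouts: sitting bouts lasting >= 30 min.
    n_prolonged = int((bouts >= 30).sum())
    # Usual bout duration: the bout length at or above which 50% of total
    # sitting time is accumulated (a time-weighted median of bout length).
    sorted_bouts = np.sort(bouts)
    cumulative = np.cumsum(sorted_bouts)
    usual_bout = sorted_bouts[np.searchsorted(cumulative, 0.5 * total_sitting)]
    return {"total_sitting_min": total_sitting,
            "n_transitions": n_transitions,
            "mean_bout_min": mean_bout,
            "n_prolonged_bouts": n_prolonged,
            "usual_bout_min": usual_bout}

# Example: a day with six sitting bouts (minutes).
print(sitting_pattern_metrics([5, 12, 45, 8, 30, 20]))
```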

Sitting patterns are generally measured using thigh- or hip-worn accelerometers; to date, however, hip-worn accelerometry is the best approach for measuring motion and movement (sedentary behavior), whereas thigh-worn devices are better at measuring posture and postural transitions (sitting patterns) (11–13). Although systems using several sensors can measure both sedentary behavior and sitting patterns (14), it is desirable for participant ease and comfort to have one device that can measure both with high validity. Measures of sitting patterns derived from cut-point–based hip-worn accelerometer data do not adequately capture the postural transitions that form the basis of sitting pattern metrics: they overestimate the number of sit-to-stand transitions and underestimate prolonged sitting time (15–17). Progress in machine learning techniques may make it possible to address hip-worn accelerometry’s major limitation and close the gap in sitting pattern measurement between hip-worn and thigh-worn accelerometers, as evidenced by developments in related areas such as activity type and intensity classification (18–21). However, the ability of current algorithms to identify the postural transitions (sit-to-stand) needed to measure sitting patterns in free-living populations is low, and there is a lack of algorithms that are specifically trained to identify transitions (22–24).

Thigh-worn inclinometers such as activPAL have been shown to accurately capture sit-to-stand transitions and can be used as high-frequency ground truth in posture labeling because data are provided many times per second (25). In previous work, we demonstrated that activPAL data can be used to train machine learning models for capturing postural transitions in free-living hip-worn accelerometer data, although a small sample with low generalizability was used (26,27). Here we build on this previous work and describe the training and validation of a convolutional neural network (CNN) + bidirectional long short-term memory (BiLSTM) model designed to classify sitting patterns as well as sedentary behavior from hip-worn ActiGraph accelerometer data. We dub this algorithm the CNN Hip Accelerometer Posture (CHAP) method and detail its superior accuracy for identifying sit-to-stand transitions using data from 709 older men and women who concurrently wore hip-worn ActiGraph accelerometers and thigh-worn activPAL inclinometers for up to 7 d.

METHODS

Parent Study

Data were obtained from the Adult Changes in Thought (ACT) study, an ongoing longitudinal cohort study that maintains an active enrollment of approximately 2000 older adults (≥65 yr old) in Washington State. The ACT study began in 1994 to investigate risk factors for development of dementia and has since provided a unique opportunity to additionally study a wide range of noncognitive factors of healthy aging. Starting in 2016, the ACT activity monitor substudy (ACT-AM) added a device-based activity monitoring component to capture the spectrum of sedentary and physically active patterns (28). Participants were excluded from ACT-AM if they were wheelchair bound, receiving hospice or care for a critical illness, or residing in a nursing home, or if memory problems became evident during testing. The remaining participants were asked to wear two devices. The first was a hip-worn ActiGraph wGT3X+ (ActiGraph LLC, Pensacola, FL), activated using ActiLife software to capture 30-Hz triaxial (i.e., captured from three spatial axes) data and worn on an elastic belt so that the device rested on the right side at the level of the suprailiac crest. The second was a thigh-worn activPAL micro3 (PAL Technologies, Glasgow, Scotland, United Kingdom), activated using a 10-s minimum threshold for labeling postural transitions and secured to the front, center thigh with waterproofed materials. Participants were asked to wear both devices 24 h·d−1 for 1 wk. Although some participants elected to wear only one device, most wore both simultaneously. Participants also recorded self-reported sleep logs throughout their device wear. Ethics approval was obtained from the Kaiser Permanente Washington institutional review board (approval no. 821300). All participants provided written informed consent.

Data Cleaning and Preprocessing

In-bed and accelerometer nonwear time were removed from the device data. Self-reported sleep logs were used to identify and remove in-bed time; missing sleep log information was imputed using person-specific means, when available, or the sample average. To identify and remove periods of nonwear, ActiGraph accelerometer data were processed using the Choi algorithm (29,30) applied to vector magnitude counts per minute, with a 90-min window, 30-min stream frame, and 2-min tolerance.

For inclusion in this study, data were required from both the ActiGraph and activPAL devices simultaneously; participants were excluded if data from either monitor were missing or invalid. No minimum wear time criteria were imposed; all days with concurrent device wear for any length of time were considered valid and were included in the sample. After restricting to waking wear time on both devices, visual inspection was used to flag invalid data based on time drift between the monitors, a phenomenon in which data collected from one device seem to gradually lose or gain time when compared with another device, resulting in the two data streams no longer aligning (Figure, Supplemental Digital Content–Appendix, which depicts an example of drift between activPAL and ActiGraph, http://links.lww.com/MSS/C335) (31).

CHAP Design

The CHAP method was developed using a deep neural network (32) to classify sitting versus nonsitting behavioral postures and postural transitions from 10-Hz triaxial ActiGraph data (downsampled from 30 Hz via boxcar aggregation to reduce the size of the dataset). All computations were made on 10-s nonoverlapping windows of continuous 10-Hz data, each containing 100 triaxial acceleration values. The 10-s window size was chosen to align with activPAL’s 10-s minimum threshold for labeling postural transitions. We used a model architecture family called CNN-BiLSTM (33), which has three main components: 1) a CNN base (34), 2) a BiLSTM network (35), and 3) a softmax output layer akin to a logistic regression classifier (36). The first component automatically extracted features for identifying sitting versus nonsitting at each time point, the second refined these features by considering neighboring time points and the most likely sequence of events, and the third converted the extracted features to a final classification label (sitting or nonsitting). Hereinafter, detailed descriptions are given for each component of CHAP and the unique way these components work synergistically.
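
The published CHAP code is not reproduced here, but the stated preprocessing can be illustrated with a short Python sketch (function names are ours) that performs the 30- to 10-Hz boxcar aggregation and splits the result into nonoverlapping 10-s windows of 100 triaxial samples:

```python
import numpy as np

def boxcar_downsample(acc_30hz):
    """Downsample 30-Hz triaxial data (shape [n, 3]) to 10 Hz by averaging
    each nonoverlapping block of 3 consecutive samples."""
    n = (acc_30hz.shape[0] // 3) * 3          # trim to a multiple of 3
    return acc_30hz[:n].reshape(-1, 3, 3).mean(axis=1)

def make_windows(acc_10hz, samples_per_window=100):
    """Split 10-Hz data into nonoverlapping 10-s windows of 100 triaxial
    samples each, i.e., output shape [n_windows, 100, 3]."""
    n = (acc_10hz.shape[0] // samples_per_window) * samples_per_window
    return acc_10hz[:n].reshape(-1, samples_per_window, 3)

acc = np.random.randn(30 * 60, 3)             # one minute of synthetic 30-Hz data
windows = make_windows(boxcar_downsample(acc))
print(windows.shape)                          # (6, 100, 3): six 10-s windows
```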

CNN

After partitioning both activPAL and triaxial ActiGraph data into nonoverlapping 10-s increments, features were extracted for each window. Unlike traditional machine learning models that target certain predefined features (e.g., time- or frequency-domain summary values), the CNN automatically learned its own features by repeatedly convolving the raw triaxial data, with each convolution using a different kernel. During training, the model learned the parameters of each kernel, which enabled the convolution-based features to capture the relevant information for the posture classification task.

BiLSTM

The CNN classifications were made under the assumption that all 10-s windows contained independent and identically distributed data (37). Human behavior does not meet these conditions, as a given action will generally be influenced by the preceding actions. Therefore, it was important to account for this temporal dependence (38), which necessitated layering the BiLSTM on top of the CNN. The BiLSTM component automatically learned temporal features from the patterns of variations across time to differentiate activities. The BiLSTM component took in a sequence of features produced by the CNN component for a window of input data and output another sequence of BiLSTM-extracted features corresponding to each 10-s window of the input. During training, the parameters of the BiLSTM component were adjusted to properly smooth the output so that there was minimal opportunity for the model to insert spurious interruptions during continuous sitting or nonsitting bouts.

CNN and BiLSTM featurization relationship

The CNN and BiLSTM components have a complementary relationship in how they featurized the data for classification. The CNN captured behaviors at a lower temporal granularity using the immediate temporal relationships within the classification window (10 s). This helped identify sudden changes in the base accelerometer features, for example, those caused by transitions. In a sense, similar to how two-dimensional CNNs exploit spatial dependencies in image pixels to extract relevant features, our one-dimensional CNN effectively treated time series as “one-dimensional images” across time. The BiLSTM’s memory cells “remembered” patterns in the extracted CNN features over time to discern higher-level behaviors with longer temporal relationships. This helped identify both nonchanges in the base features, for example, those during sitting (or nonsitting) bouts, as well as reoccurring changes, for example, back-to-back transitions. Together, these capabilities demonstrated the power of modern deep learning in automatically featurizing low-level sequence data: myriad manually tuned temporal thresholds are replaced with compact end-to-end learned neural architectures.

Softmax output layer

The output of the BiLSTM component was a sequence of intermediate features corresponding to a window of input data. To perform the final behavior classification on the extracted features, we used a softmax layer, which converted the input features into final probabilities of each 10-s time interval belonging to sitting or nonsitting behavior. We then selected the most probable label as the final classification.
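
As a rough illustration of this three-component architecture, the Keras sketch below wires a per-segment one-dimensional CNN, a BiLSTM over the sequence of segments, and a per-segment softmax output. It is a minimal sketch, not the published CHAP configuration: the layer sizes, kernel widths, and the 7-min (42-segment) window shown here are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Assumed dimensions: a 7-min BiLSTM window holds 42 ten-second segments,
# each containing 100 triaxial (10-Hz) samples.
SEGMENTS, SAMPLES, AXES, CLASSES = 42, 100, 3, 2

inputs = layers.Input(shape=(SEGMENTS, SAMPLES, AXES))

# 1) CNN base: 1-D convolutions applied identically to every 10-s segment,
#    learning features in place of hand-engineered summaries.
cnn = models.Sequential([
    layers.Conv1D(32, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
])
x = layers.TimeDistributed(cnn)(inputs)       # -> (batch, 42, 64)

# 2) BiLSTM: refines each segment's features using neighboring segments
#    in both temporal directions.
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)

# 3) Softmax output: per-segment probabilities of sitting vs nonsitting.
outputs = layers.TimeDistributed(layers.Dense(CLASSES, activation="softmax"))(x)

model = models.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy")
model.summary()
```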

CHAP Development and Evaluation

The sample was divided into a training sample (n = 399 participants), a holdout validation sample (n = 97), and a test sample (n = 213). The training and validation samples were used to determine the optimal settings for CHAP, whereas the test sample was withheld until final models were selected and used for a performance comparison of CHAP and two other commonly used sitting pattern classification methods (described hereinafter). Given the large number of steps and parameter tuning that occurs when building CNN models, a test dataset was critical for obtaining unbiased estimates of model performance.

Model development

The CHAP method was trained end to end using the backpropagation technique (32): in a forward pass, output from each layer was sequentially fed into the subsequent layer to generate a final output, and the resulting error was then propagated backward to update all layers jointly. During training, we fed each window of input ActiGraph data through CHAP, generating classifications for each 10-s time interval in each input window. We then compared these classifications with the activPAL-derived ground-truth labels corresponding to the same 10-s input windows, which were assigned based on the majority activPAL-designated posture in a given 10-s window (note: in the case of a tie, the sitting label was chosen). Based on this comparison, backpropagation updated the learnable parameters in the model to minimize the cross-entropy (i.e., maximize agreement) between the predicted classifications and the ground-truth labels. This process was completed for all input training data and repeated several times.
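
The ground-truth labeling rule described above (majority activPAL posture per 10-s window, with ties labeled sitting) can be expressed compactly; the sketch below is our illustration, not the published implementation:

```python
import numpy as np

def window_label(activpal_samples):
    """Assign a 10-s window's ground-truth label from per-sample activPAL
    postures (1 = sitting, 0 = nonsitting): take the majority posture,
    labeling ties as sitting, per the rule described above."""
    sitting = int(np.sum(activpal_samples))
    nonsitting = len(activpal_samples) - sitting
    return 1 if sitting >= nonsitting else 0

print(window_label([1] * 50 + [0] * 50))  # tie -> 1 (sitting)
print(window_label([1] * 30 + [0] * 70))  # majority nonsitting -> 0
```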

Training neural networks is a complex process involving multiple parameters and tuning steps that could lead to models that overfit the data. It is therefore unwise to use training data alone for model selection, given that the goal is to apply the algorithm to future data that are independent of the training set (39). We therefore fitted several model configurations on the training data and compared their performance when applied to the holdout validation data. Model configurations varied on four dimensions: BiLSTM window size (7 and 9 min), number of neurons in a CNN layer (3200 and 6400), learning rate (0.001 and 0.0001), and regularization coefficient (0.001 and 0.0001). All 16 unique combinations of these values were tested. These comparisons enabled us to identify the best model configuration based on several performance metrics (Table 1). Metrics included overall and balanced classification accuracy, ability to accurately capture transitions (i.e., changes in posture), sitting and nonsitting bout deviations, and Kolmogorov–Smirnov statistics comparing CHAP-predicted versus true (activPAL) probability distributions of sitting and nonsitting bouts. Models with low accuracy or high variance, relative to competing models, on any of these metrics were eliminated. Three models performed equally well on all metrics, and these were used to create a hybrid ensemble model that made classifications by majority vote. This ensemble model represents the complete CHAP method. For each of the three best-performing models and the final ensemble model, we calculated the means and SDs of the evaluation metrics described in Table 1.
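
For illustration, the 16-configuration grid and the majority-vote ensemble can be sketched as follows (our code; the key names are illustrative, not CHAP's internal names):

```python
from itertools import product

# The four tuning dimensions and the two values tried for each,
# yielding the 16 configurations described above.
grid = {
    "bilstm_window_min": [7, 9],
    "cnn_neurons": [3200, 6400],
    "learning_rate": [1e-3, 1e-4],
    "l2_coefficient": [1e-3, 1e-4],
}
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 16

def ensemble_predict(labels_a, labels_b, labels_c):
    """Majority vote over the three best models' window labels
    (1 = sitting, 0 = nonsitting) to form the final classification."""
    return [1 if a + b + c >= 2 else 0
            for a, b, c in zip(labels_a, labels_b, labels_c)]
```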

TABLE 1.

Definitions and interpretations of accuracy and error metrics.

Confusion matrix of actual and predicted 10-s segments:

                      Predicted Sitting    Predicted Nonsitting
Actual sitting                a                      b
Actual nonsitting             c                      d

Metric                Definition†                      Interpretation
Accuracy              (a + d)/(a + b + c + d)          Proportion of segments correctly predicted
Sensitivity           a/(a + b)                        Proportion of activPAL sitting segments correctly predicted as sitting
Specificity           d/(c + d)                        Proportion of activPAL nonsitting segments correctly predicted as nonsitting
Balanced accuracy     0.5a/(a + b) + 0.5d/(c + d)      Average of sensitivity and specificity
Sitting time MAPE     100|(a + b) − (a + c)|/(a + b)   Absolute percent error in total predicted sitting time (vs total actual sitting time)
Nonsitting time MAPE  100|(c + d) − (b + d)|/(c + d)   Absolute percent error in total predicted nonsitting time (vs total actual nonsitting time)

†Refers to letters defined in the confusion matrix.
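
The definitions in Table 1 translate directly into code; the following sketch (ours) computes each metric from the four confusion matrix cells:

```python
def classification_metrics(a, b, c, d):
    """Metrics from Table 1, where a = actual sitting predicted sitting,
    b = actual sitting predicted nonsitting, c = actual nonsitting
    predicted sitting, and d = actual nonsitting predicted nonsitting."""
    accuracy = (a + d) / (a + b + c + d)
    sensitivity = a / (a + b)
    specificity = d / (c + d)
    balanced = 0.5 * sensitivity + 0.5 * specificity
    sitting_mape = 100 * abs((a + b) - (a + c)) / (a + b)
    nonsitting_mape = 100 * abs((c + d) - (b + d)) / (c + d)
    return dict(accuracy=accuracy, sensitivity=sensitivity,
                specificity=specificity, balanced_accuracy=balanced,
                sitting_time_mape=sitting_mape,
                nonsitting_time_mape=nonsitting_mape)

# Example with hypothetical counts of 10-s segments.
print(classification_metrics(a=500, b=20, c=40, d=440))
```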

Model evaluation

Using data from the test set, we compared the performance of CHAP with that of two other classification approaches commonly used in the field: 1) the standard ActiGraph cut-point (AG cut-point) method and 2) a previously developed two-level behavior classification (TLBC) machine-learned model designed to differentiate sitting from standing postures. The AG cut-point method is designed to capture sedentary, nonmovement bouts, which are sometimes used as a proxy for sitting bouts (7). Sedentary bouts were defined using 1-min epoch data, in which a minute was classified as sedentary if the vertical-axis counts were less than 100 (40). Consecutive sedentary minutes were classified as bouts, with no minimum duration required and no allowance for interruptions. TLBC sequentially applies a pretrained random forest and hidden Markov model to 30-Hz triaxial accelerometer data and was trained using annotated images captured from person-worn SenseCams (41–43). TLBC first converts the 30-Hz triaxial accelerometer data into a set of 41 engineered features, which are used to classify each minute as sitting or riding in a vehicle (which collectively represent sitting) or as standing or walking/running (which collectively represent nonsitting). We defined sitting bouts as any period labeled by TLBC as a sitting posture, specifically sitting and riding in a vehicle.
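
As an illustration of the AG cut-point procedure described above (our sketch, not the ActiLife implementation), the following code labels minutes as sedentary below the 100 counts-per-minute threshold and groups consecutive sedentary minutes into bouts:

```python
import numpy as np

def cutpoint_sedentary(vertical_counts_per_min, threshold=100):
    """Classify each minute as sedentary when vertical-axis counts fall
    below the threshold, then group consecutive sedentary minutes into
    bouts (no minimum duration, no allowance for interruptions)."""
    sedentary = np.asarray(vertical_counts_per_min) < threshold
    bouts, length = [], 0
    for flag in sedentary:
        if flag:
            length += 1
        elif length:
            bouts.append(length)
            length = 0
    if length:
        bouts.append(length)
    return sedentary, bouts  # per-minute labels and bout durations (min)

labels, bouts = cutpoint_sedentary([30, 50, 250, 10, 5, 5, 400])
print(bouts)  # [2, 3] -> two sedentary bouts of 2 and 3 minutes
```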

The methods were compared using the same classification metrics that were used during validation (Table 1). Because the TLBC and AG cut-point methods yielded results at the minute level, for model comparison purposes, CHAP’s 10-s-level classifications were aggregated to the minute level using majority vote for sitting versus nonsitting labels. We also included comparisons of common person-level sitting pattern metrics, including mean sitting bout duration (total sitting time/number of sitting bouts), average daily sitting time (total sitting time/number of days), and average daily number of sitting bouts (number of sitting bouts/number of days). A final performance indicator was how well each method was able to predict the timing of postural transitions at a 10-s granularity within a 1-min window. This analysis was done using the transition pairing method (44), which uses an extended Gale-Shapley algorithm to pair actual and predicted transitions together for analysis. The method allowed for the exclusion of nonsequential pairings and any pairings that exceeded a specified lag time (tolerance), which was 1 min for this study. One minute was the minimum tolerance level after which the number of successful pairings leveled off (Supplemental Table 1, Supplemental Digital Content–Appendix, which shows transition pair sensitivity and precision results at different tolerance levels, from no tolerance to 5 min, across methods, http://links.lww.com/MSS/C335). The pairings were analyzed to determine the true positive rate (recall) and positive predictive value (PPV; precision) of predicted transitions.
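
The transition pairing method itself uses an extended Gale-Shapley algorithm (44); the simplified greedy sketch below (ours, omitting the method's exclusion of nonsequential pairings) illustrates the bookkeeping that yields transition recall and PPV under a tolerance:

```python
def pair_transitions(actual, predicted, tolerance_s=60):
    """Greedy one-to-one pairing of actual and predicted transition times
    (seconds), closest pairs first, within the tolerance. A simplification
    of the extended Gale-Shapley pairing of the transition pairing method,
    but the recall/PPV bookkeeping is the same."""
    candidates = sorted(
        (abs(t_a - t_p), i, j)
        for i, t_a in enumerate(actual)
        for j, t_p in enumerate(predicted)
        if abs(t_a - t_p) <= tolerance_s
    )
    used_a, used_p, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_p:
            used_a.add(i)
            used_p.add(j)
            pairs.append((i, j))
    recall = len(pairs) / len(actual) if actual else 0.0       # sensitivity
    ppv = len(pairs) / len(predicted) if predicted else 0.0    # precision
    return pairs, recall, ppv

pairs, recall, ppv = pair_transitions([100, 400, 900], [110, 440, 880, 1500])
print(recall, ppv)  # 1.0 0.75 -> all true transitions paired; 1 false positive
```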

Performance metrics were calculated for each person and method. Summary statistics were then calculated across participants, and boxplots were used to visually examine variability across test subjects. In addition to model performance metrics, we also compared commonly used sitting pattern metrics (mean sitting bout duration, mean daily sitting time, and mean number of daily sitting bouts), derived using each method, to the activPAL ground truth. Generalized estimating equations (GEE), accounting for nesting of methods within participants, were used to evaluate differences in performance between methods and whether sitting pattern metrics derived from different methods differed significantly from those derived from activPAL. The GEE was implemented using an exchangeable correlation structure and robust standard errors. Finally, to allow inference about individual-level, in addition to sample-level, agreement, sitting pattern metrics derived from each modeling approach (AG cut-point, TLBC, and CHAP) were also compared with activPAL using mean absolute error (MAE).
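
A GEE of the kind described can be specified in a few lines with statsmodels; the data frame below is hypothetical and only illustrates the model specification (exchangeable working correlation; GEE standard errors are robust by default):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per participant x method, with a
# sitting pattern metric (here, mean bout duration) as the outcome.
df = pd.DataFrame({
    "participant_id": [1] * 4 + [2] * 4 + [3] * 4,
    "method": ["activPAL", "CHAP", "cutpoint", "TLBC"] * 3,
    "mean_bout_min": [15.2, 15.9, 9.1, 48.0,
                      15.6, 15.5, 9.7, 50.8,
                      14.8, 15.1, 9.3, 47.5],
})

# Exchangeable working correlation nests methods within participants;
# coefficients estimate each method's difference from activPAL.
model = smf.gee(
    "mean_bout_min ~ C(method, Treatment(reference='activPAL'))",
    groups="participant_id",
    data=df,
    cov_struct=sm.cov_struct.Exchangeable(),
    family=sm.families.Gaussian(),
)
print(model.fit().summary())
```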

RESULTS

Sample partitioning and characteristics

Figure 1 summarizes data loss and partitioning, and Table 2 shows participant characteristics for the final sample. Participant characteristics for the included ACT-AM sample were similarly distributed across the training (n = 399), validation (n = 97), and test (n = 213) sets.

FIGURE 1. Flow diagram from the ACT study for inclusion into this study and random division into training and testing data sets. (1) Nonconcurrent wear represents data in which the devices are not worn concurrently. (2) Drift is a phenomenon in which data collected from one device seem to gradually lose or gain time when compared with another device, such that, over time, the two data streams no longer align. See Figure, Supplemental Digital Content, for an example of drift in this sample, http://links.lww.com/MSS/C335.

TABLE 2.

Participant characteristics for the full, training, validation, and test sets.

Characteristic                            Full Sample    Training^a     Validation^a   Test
                                          (n = 709)      (n = 399)      (n = 97)       (n = 213)
Age, yr, mean (SD)                        76.70 (6.52)   76.87 (6.38)   76.60 (6.84)   76.44 (6.64)
Sex, n (%)
  Female                                  415 (58.5)     234 (58.6)     54 (55.7)      127 (59.6)
Race/ethnicity, n (%)
  Hispanic or non-White                   70 (9.9)       31 (7.8)       16 (16.5)      23 (10.9)
Education, n (%)
  Less than high school                   10 (1.4)       7 (1.8)        1 (1.0)        2 (0.9)
  Completed high school                   52 (7.3)       25 (6.3)       8 (8.2)        19 (8.9)
  Some college                            113 (15.9)     68 (17.0)      13 (13.4)      32 (15.0)
  Completed college                       534 (75.3)     299 (74.9)     75 (77.3)      160 (75.1)
BMI, kg·m−2, n (%)
  ≤29                                     537 (77.4)     293 (74.7)     81 (88.0)      163 (77.6)
  >29                                     157 (22.6)     99 (25.3)      11 (12.0)      47 (22.4)
Self-rated health, n (%)
  Good, poor, or very poor                279 (39.4)     164 (41.1)     37 (38.1)      78 (36.6)
Difficulty in walking half a mile, n (%)
  Some or more                            168 (23.7)     99 (24.8)      21 (21.6)      48 (22.5)

^a Differences between the training and validation sets and the test set were not statistically significant at the 5% level using two-sample t-tests for continuous variables and χ2 tests for categorical variables.

Model accuracy

Ten-second-level summary statistics of the three best CNN model configurations (labeled A, B, and C) and the ensemble CHAP model are displayed in Table 3. Here we focus on the accuracy and mean absolute percent error (MAPE) metrics defined in Table 1, which estimate agreement and deviation between the actual and predicted values, across the three CNN model configurations.

TABLE 3.

Test set performance of top 3 performing CNN models and ensemble CHAP at the 10-s level (mean (SD) of metrics).

Model            Accuracy (%)   Balanced Accuracy (%)   Sitting Time MAPE (%)   Nonsitting Time MAPE (%)   Transition Recall (%)^a   Transition PPV (%)^a
A                93.5 (3.9)     91.8 (4.7)              5.3                     7.7                        76.7 (10.3)               74.5 (12.6)
B                93.7 (3.8)     91.9 (5.1)              5.2                     8.7                        76.2 (11.1)               76.7 (12.3)
C                93.7 (3.6)     92.4 (4.2)              5.5                     9.8                        75.8 (9.9)                77.0 (11.6)
CHAP (ensemble)  94.1 (3.6)     92.6 (4.5)              5.2                     8.2                        77.1 (10.8)               80.0 (12.5)

^a Transition sensitivity (recall) and PPV (precision) at 1-min tolerance, i.e., detection of transitions within ±6 10-s epochs of ActiGraph data.

At the minute level, CHAP was superior to the other methods across all performance metrics (Fig. 2). For balanced accuracy, which is the average of sensitivity and specificity, the AG cut-point method performed worst (74%), followed by TLBC (83%) versus CHAP (93%). All models had high sensitivity for classifying sitting, ranging from 88% (AG cut-point) to 97% (CHAP). Specificity varied markedly between models: 60% for AG cut-point, 74% for TLBC, and 89% for CHAP. The differences in balanced accuracy, sensitivity, and specificity between CHAP and the AG cut-point method, and between CHAP and TLBC, were statistically significant at the 5% level. The MAPE values for sitting and nonsitting classification were not symmetric: although all methods were able to accurately classify true sitting, the AG cut-point and TLBC methods classified between 25% and 40% of true (activPAL-registered) nonsitting as sitting. Of note, the variation in these metrics was also higher for the AG cut-point and TLBC than for CHAP, indicating superior individual-level agreement for the latter method.

FIGURE 2. Minute-level performance (balanced accuracy, sensitivity/recall, specificity) in classifying sitting vs not sitting, comparing AG cut-point (pink), TLBC (blue), and CHAP (green).

Participant-level sitting pattern classification

Figure 3 shows results of the sitting pattern analyses. The average mean bout duration from CHAP (15.7 min) did not significantly differ from activPAL (15.4 min; MAE = 2 min). Average mean bout duration using the AG cut-point (9.4 min) and TLBC (49.4 min) methods did differ significantly at the 5% level relative to activPAL (MAE = 6 and 34 min, respectively). Average daily sitting time derived using the AG cut-point (643.2 min·d−1) and TLBC (616.2 min·d−1) methods was significantly different relative to activPAL (594.6 min·d−1; MAE = 75 and 50 min, respectively), but average daily sitting time derived from CHAP (595.4 min·d−1) was not (MAE = 31 min). Average daily number of sitting bouts was significantly different from activPAL for all three methods. Of the three, the average daily number of sitting bouts derived using CHAP (41.8 per day) was the closest to activPAL (43.9 per day; MAE = 5), and the difference was not deemed relevant in practice. The average daily number of sitting bouts derived using the AG cut-point (79.2 per day) and TLBC (14.1 per day) deviated much more from activPAL (MAE = 35 and 30, respectively). These results suggest that the latter two methods are unable to accurately capture sitting patterns: the AG cut-point method overpredicted the number of transitions by roughly a factor of two, explaining why its mean bout duration was lower than activPAL’s, whereas TLBC underpredicted transitions by roughly two-thirds relative to activPAL, explaining why its mean bout duration was higher. Despite CHAP’s superior performance relative to the other two methods, it had slightly lower person-to-person variability (i.e., lower SDs) compared with activPAL.

FIGURE 3. Person-level sitting pattern metrics (mean sitting bout duration, average daily sitting time in minutes, average daily number of sitting bouts) comparing activPAL (orange), AG cut-point (pink), TLBC (blue), and CHAP (green).

Classifying the timing of sit-to-stand transitions

We examined accuracy in predicting sit-to-stand transitions within a 1-min window by the three methods compared with activPAL (Fig. 4). Transition sensitivity estimates the percentage of true transitions (as registered by the activPAL) that were captured by each method. Sensitivity for transition detection was relatively high for both the AG cut-point (72%) and CHAP (83%), whereas it was only 26% for TLBC, likely because of oversmoothing. Transition PPV, or precision, estimates the proportion of predicted transitions that are true activPAL transitions. In contrast to the sensitivity results, PPV was relatively high for CHAP (83%) and TLBC (71%), whereas it was only 30% for the AG cut-point. The differences in transition sensitivity and transition PPV between CHAP and the AG cut-point method, and between CHAP and TLBC, were statistically significant at the 5% level.

FIGURE 4. Assessment of minute-level performance in timing of classification of sit-to-stand transitions within a 1-min window (tolerance) using paired actual and predicted transitions for AG cut-point (pink), TLBC (blue), and CHAP (green).

DISCUSSION

The CHAP model had higher accuracy than existing methods for classifying sitting bouts and sit-to-stand transitions from free-living hip-worn accelerometer data in older adults. As such, it represents an important step forward in the field of sitting pattern measurement in this population. CHAP will allow for less cumbersome study protocols in older adults by requiring only one hip-worn device to measure both posture and motion. CHAP can also be used to reprocess previously collected hip-worn accelerometer data among older adults, resulting in more accurate measures of true sitting time and patterns in existing cohort studies as well as in future studies that choose to use hip-worn accelerometers.

The AG cut-point method overestimated true sitting time and failed to capture the sit-to-stand transitions that are key to the measurement of sitting patterns (15–17,45). This underscores the importance of using methods for their intended purpose: cut-point methods are meant to capture movement intensity and nonmovement, not changes in posture. The main shortcoming of the cut-point method was that it misclassified approximately 40% of activPAL-registered nonsitting time as sitting while simultaneously overpredicting sit-to-stand transitions, such that approximately 70% of the transitions it predicted were not activPAL transitions, resulting in inaccurate measures of sitting patterns. These findings are in line with other studies that support the use of hip-worn accelerometry for measuring motion and movement but suggest thigh-worn devices for measuring posture and postural transitions (11–13,15–17). Thus, evidence on sitting patterns measured using ActiGraph cut-points should be interpreted with caution. It is not clear whether such misestimation has major impacts on the ability to detect associations between sitting patterns and health. Nonetheless, there is sufficient evidence to suggest that sitting pattern estimates derived from ActiGraph cut-points should not be compared with studies that employed posture-based measures such as activPAL, or used to inform specific thresholds of sitting patterns when generating intervention or public health recommendations.

Transitions have been a persistent challenge for the field, even with the application of machine-learned algorithms. Machine learning approaches most often rely on single-label classification within a given window or period (e.g., 5 min); an inherent assumption is therefore that only one activity type occurs within each window (22). Laboratory-based training data contain few transitions, yielding algorithms with high predictive accuracy, but algorithms trained on data obtained from free-living populations must account for the inherent messiness of human postural changes and movement. The TLBC method was designed to address some of these limitations by training against free-living images collected by a body-worn camera. However, the camera captured images triggered by changes in light and movement, meaning TLBC was unable to reliably capture postural transitions or their exact timing, leading to an underestimation of postural transitions (44). Solutions proposed in the literature to allow better identification of transitions by machine learning models include activity-based windowing and adaptive sliding-window segmentation, in both of which windows are adjusted so that one activity is represented per window and window size can vary throughout the dataset (46,47). Alternatively, CHAP uses a BiLSTM component with a fixed window that automatically learns to capture transitions during training. We found that, although model accuracy did not vary meaningfully (at most 2%) with the chosen BiLSTM window size, the window size significantly affected the model’s ability to capture transitions correctly. As the window size increased from 1 to 9 min, transition recall decreased by 6 percentage points (from 83% to 77%) and PPV increased by 23 percentage points (from 56% to 79%). In practice, we found that a window size of 7–9 min works well for our data, which had a mean activPAL sitting bout time of 15.4 min and a mean nonsitting bout time of 7.9 min. More experimental results on model sensitivity to the chosen BiLSTM window size are provided in Supplemental Table 2 (see Table, Supplemental Digital Content, Appendix, http://links.lww.com/MSS/C335).

Deep learning methods to improve measures derived from accelerometer data are of growing interest in the field. For instance, Nawaratne et al. (48) leverage a CNN model architecture to derive measures of physical activity intensity from wrist-worn ActiGraph that are of equal caliber to those measured from the hip-worn ActiGraph. Although the goals of Nawaratne et al.’s model differ from those of CHAP, making the results not directly comparable, their work demonstrates the utility of CNN model architecture in constructing machine-learned approaches to processing accelerometer data. CHAP builds on this approach, adding a BiLSTM layer for improved measurement of activity transitions.

We were able to find only one other study that uses hip-worn ActiGraph data to classify sedentary behavior and sitting patterns in a free-living population with high accuracy. Kuster et al. (49) developed an algorithm utilizing hip-worn ActiGraph data in a sample of office workers (n = 38) to detect prolonged sitting bouts (≥5 and ≥10 min). Their method used a random forest classifier on 563 engineered ActiGraph signal features, followed by a bagged classification tree ensemble method. The model achieved a low bias of ≤7 min·d−1, when classifying time spent in prolonged sitting bouts (≥5 and ≥10 min) relative to activPAL. CHAP builds on the model of Kuster et al. in several ways. Most importantly, it was developed, validated, and tested on a larger and more representative cohort (n = 709) of free-living older adults. Through the CNN + BiLSTM architecture, CHAP was also able to automate the feature extraction process rather than relying on human-engineered features. As a result, CHAP requires less human input than the Kuster et al. model and is a versatile and flexible model that can be used to derive various person-level sitting pattern variables beyond prolonged sitting bouts. This application in the older adult population of the ACT cohort represents only the first test-case for CHAP. Future work will apply this method in other populations to assess performance and generalizability of CHAP in other age groups, and refine the model for broader generalizability across age, sex, and other key demographic factors.

Researchers interested in more deeply exploring the CHAP algorithm or applying CHAP to their existing hip-worn accelerometer data to derive postural transition and sitting pattern metrics are invited to explore the study’s GitHub repository. CHAP and associated user documentation are available for download from https://github.com/ADALabUCSD/DeepPostures.

Our study has several limitations that should be considered. We used thigh-worn activPAL data as ground truth rather than direct observation, which could lead to compounding of the activPAL’s inherent measurement error. However, we believe the benefit of obtaining large amounts of free-living data outweighs limitations of activPAL. Furthermore, activPAL has been shown to be a highly valid instrument for measuring postural transitions (25). Notably, CHAP had slightly lower person-to-person variability (i.e., lower SDs for derived sitting pattern metrics) compared with activPAL, which could potentially result in reduced statistical power in studies of associations between sitting patterns and health outcomes, and should be addressed in future studies. However, because our CHAP model predictions have similar probability distributions to that of the ground truth (activPAL), in practice, we do not expect substantial negative effects on study power when using CHAP predictions. Despite these limitations, our study had considerable strengths, including the large sample size and rigorous machine learning procedures used. Although CHAP allows posture-based classification from a single device, the hip-worn ActiGraph, it is important to acknowledge that methods for integrating both types of sensors (e.g., activPAL and ActiGraph) to achieve systems for postural and motion measurement have been previously developed (14). In addition, recent studies have developed accurate classification methods of wrist-worn accelerometer data for both sedentary behavior and sitting patterns (50,51).

CHAP performed much better than currently available methods, and it established a novel and powerful framework for models that use hip-worn data. This advance will allow researchers to better understand the epidemiology of sitting patterns, including norms among healthy and unhealthy people and how sitting patterns are causally associated with a myriad of healthy aging outcomes. In addition, it will reduce participant burden by allowing for accurate measurement of posture and motion using one hip-worn device, rather than necessitating several devices. Ultimately, these data will be needed to help inform future guidelines for sedentary behavior among older adults.

Supplementary Material

SUPPLEMENTARY MATERIAL
msse-53-2445-s001.docx (101.3KB, docx)
msse-53-2445-s002.docx (286KB, docx)

Acknowledgments

This work was supported by grant number U01AG006781 from the National Institute on Aging and R01DK114945 from the National Institute of Diabetes and Digestive and Kidney Diseases. It was also supported in part by a Hellman Fellowship, an NSF CAREER Award under award number 1942724, and a gift from VMware. The content is solely the responsibility of the authors and does not necessarily represent the views of any of these organizations. We thank the members of UC San Diego’s Database Lab and Center for Networked Systems for their feedback on this work.

The authors have no conflicts of interest to declare. Results of the present study do not constitute endorsement by the American College of Sports Medicine. Results of the study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation.

Footnotes

M.A.G.-H. and S.N. are co-first authors.

A.K. and L.N. are co-senior authors.

Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s Web site (www.acsm-msse.org).

Contributor Information

SUPUN NAKANDALA, Email: snakanda@eng.ucsd.edu.

MARTA M. JANKOWSKA, Email: majankowska@ucsd.edu.

DORI E. ROSENBERG, Email: dori.e.rosenberg@kp.org.

FATIMA TUZ-ZAHRA, Email: ftuzzahra@ad.ucsd.edu.

JOHN BELLETTIERE, Email: jbellettiere@ucsd.edu.

JORDAN CARLSON, Email: jacarlson@cmh.edu.

PAUL R. HIBBING, Email: prhibbing@cmh.edu.

JINGJING ZOU, Email: j2zou@ucsd.edu.

ANDREA Z. LACROIX, Email: alacroix@health.ucsd.edu.

ARUN KUMAR, Email: arunkk@eng.ucsd.edu.

LOKI NATARAJAN, Email: lnatarajan@ucsd.edu.

REFERENCES

1. Copeland JL, Ashe MC, Biddle SJ, et al. Sedentary time in older adults: a critical review of measurement, associations with health, and interventions. Br J Sports Med. 2017;51(21):1539.
2. Biswas A, Oh PI, Faulkner GE, et al. Sedentary time and its association with risk for disease incidence, mortality, and hospitalization in adults: a systematic review and meta-analysis. Ann Intern Med. 2015;162:123–32.
3. Knaeps S, Bourgois JG, Charlier R, Mertens E, Lefevre J, Wijndaele K. Ten-year change in sedentary behaviour, moderate-to-vigorous physical activity, cardiorespiratory fitness and cardiometabolic risk: independent associations and mediation analysis. Br J Sports Med. 2018;52(16):1063–8.
4. De Rezende LF, Rodrigues Lopes M, Rey-López JP, Matsudo VK, Luiz Odo C. Sedentary behavior and health outcomes: an overview of systematic reviews. PLoS One. 2014;9(8):e105620.
5. Matthews CE, Chen KY, Freedson PS, et al. Amount of time spent in sedentary behaviors in the United States, 2003–2004. Am J Epidemiol. 2008;167:875–81.
6. Harvey JA, Chastin SF, Skelton DA. How sedentary are older people? A systematic review of the amount of sedentary behavior. J Aging Phys Act. 2015;23:471–87.
7. Healy GN, Dunstan DW, Salmon J, et al. Breaks in sedentary time: beneficial associations with metabolic risk. Diabetes Care. 2008;31(4):661–6.
8. Chastin SF, Granat MH. Methods for objective measure, quantification and analysis of sedentary behaviour and inactivity. Gait Posture. 2010;31(1):82–6.
9. Bellettiere J, Winkler EAH, Chastin SFM, et al. Associations of sitting accumulation patterns with cardio-metabolic risk biomarkers in Australian adults. PLoS One. 2017;12(6):e0180119.
10. Boerema ST, van Velsen L, Vollenbroek MM, Hermens HJ. Pattern measures of sedentary behaviour in adults: a literature review. Digit Health. 2020;6:2055207620905418.
11. Janssen X, Cliff DP. Issues related to measuring and interpreting objectively measured sedentary behavior data. Meas Phys Educ Exerc Sci. 2015;19(3):116–24.
12. Kim Y, Barry VW, Kang M. Validation of the ActiGraph GT3X and activPAL accelerometers for the assessment of sedentary behavior. Meas Phys Educ Exerc Sci. 2015;19:125–37.
13. Montoye AHK, Pivarnik JM, Mudd LM, Biswas S, Pfeiffer KA. Validation and comparison of accelerometers worn on the hip, thigh, and wrists for measuring physical activity and sedentary behavior. AIMS Public Health. 2016;3(2):298–312.
14. Myers A, Gibbons C, Butler E, et al. A novel integrative procedure for identifying and integrating three-dimensions of objectively measured free-living sedentary behaviour. BMC Public Health. 2017;17:979.
15. Barreira TV, Zderic TW, Schuna JM Jr, Hamilton MT, Tudor-Locke C. Free-living activity counts-derived breaks in sedentary time: are they real transitions from sitting to standing? Gait Posture. 2015;42(1):70–2.
16. Carlson JA, Bellettiere J, Kerr J, et al. Day-level sedentary pattern estimates derived from hip-worn accelerometer cut-points in 8–12-year-olds: do they reflect postural transitions? J Sports Sci. 2019;37(16):1899–909.
17. Bellettiere J, Tuz-Zahra F, Carlson J, et al. Agreement of sedentary behaviour metrics derived from hip-worn and thigh-worn accelerometers among older adults: with implications for studying physical and cognitive health. J Meas Phys Behav. 2021;4:79–88.
18. Sasaki JE, Hickey AM, Staudenmayer JW, John D, Kent JA, Freedson PS. Performance of activity classification algorithms in free-living older adults. Med Sci Sports Exerc. 2016;48(5):941–50.
19. Marcotte RT, Petrucci GJ, Cox MF, Freedson PS, Staudenmayer JW, Sirard JR. Estimating sedentary time from a hip- and wrist-worn accelerometer. Med Sci Sports Exerc. 2020;52(1):225–32.
20. Wullems JA, Verschueren SMP, Degens H, Morse CI, Onambélé GL. Performance of thigh-mounted triaxial accelerometer algorithms in objective quantification of sedentary behaviour and physical activity in older adults. PLoS One. 2017;12(11):e0188215.
21. Ellis K, Godbole S, Marshall S, Lanckriet G, Staudenmayer J, Kerr J. Identifying active travel behaviors in challenging environments using GPS, accelerometers, and machine learning algorithms. Front Public Health. 2014;2:36.
22. Farrahi V, Niemelä M, Kangas M, Korpelainen R, Jämsä T. Calibration and validation of accelerometer-based activity monitors: a systematic review of machine-learning approaches. Gait Posture. 2019;68:285–99.
23. Narayanan A, Desai F, Stewart T, Duncan S, MacKay L. Application of raw accelerometer data and machine-learning techniques to characterize human movement behavior: a systematic scoping review. J Phys Act Health. 2020;17(3):360–83.
24. Rosenberg D, Godbole S, Ellis K, et al. Classifiers for accelerometer-measured behaviors in older women. Med Sci Sports Exerc. 2017;49(3):610–6.
25. Giurgiu M, Bussmann JBJ, Hill H, et al. Validating accelerometers for the assessment of body position and sedentary behavior. J Meas Phys Behav. 2020;3(3):253–63.
26. Kerr J, Carlson J, Godbole S, Cadmus-Bertram L, Bellettiere J, Hartman S. Improving hip-worn accelerometer estimates of sitting using machine learning methods. Med Sci Sports Exerc. 2018;50(7):1518–24.
27. Nakandala S, Jankowska MM, Tuz-Zahra F, et al. Application of convolutional neural network algorithms for advancing sedentary and activity bout classification. J Meas Phys Behav. 2021;4(2):102–10.
28. Rosenberg D, Walker R, Greenwood-Hickman MA, et al. Device-assessed physical activity and sedentary behavior in a community-based cohort of older adults. BMC Public Health. 2020;20:1256.
29. Choi L, Liu Z, Matthews CE, Buchowski MS. Validation of accelerometer wear and nonwear time classification algorithm. Med Sci Sports Exerc. 2011;43:357–64.
30. Choi L, Ward SC, Schnelle JF, Buchowski MS. Assessment of wear/nonwear time classification algorithms for triaxial accelerometer. Med Sci Sports Exerc. 2012;44(10):2009–16.
31. Steel C, Bejarano C, Carlson JA. Time drift considerations when using GPS and accelerometers. J Meas Phys Behav. 2019;2(3):203–7.
32. Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016. Available from: http://www.deeplearningbook.org.
33. Yoon J, Kim H. Multi-channel lexicon integrated CNN-BiLSTM models for sentiment analysis. In: Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, ROCLING 2017. Taipei, Taiwan; 2017.
34. Kiranyaz S, Ince T, Gabbouj M. Real-time patient-specific ECG classification by 1-D convolutional neural networks. IEEE Trans Biomed Eng. 2016;63(3):664–75.
35. Wang R, Liang X, Zhu X, Xie Y. A feasibility of respiration prediction based on deep Bi-LSTM for real-time tumor tracking. IEEE Access. 2018;6:51262–8.
36. Bridle JS. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In: Soulie FF, Herault J, editors. Neurocomputing. NATO ASI Series (Series F: Computer and Systems Sciences). Berlin: Springer; 1990.
37. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. Springer Series in Statistics. New York: Springer; 2009. pp. 191–218.
38. Ray EL, Sasaki JE, Freedson PS, Staudenmayer J. Physical activity classification with dynamic discriminative methods. Biometrics. 2018;74(4):1502–11.
39. Li L, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A. Hyperband: a novel bandit-based approach to hyperparameter optimization. J Mach Learn Res. 2017;18(1):6765–816.
40. Migueles JH, Cadenas-Sanchez C, Ekelund U, et al. Accelerometer data collection and processing criteria to assess physical activity and other outcomes: a systematic review and practical considerations. Sports Med. 2017;47:1821–45.
41. Kerr J, Patterson RE, Ellis K, et al. Objective assessment of physical activity: classifiers for public health. Med Sci Sports Exerc. 2016;48(5):951–7.
42. Ellis K. TLBC: Two-Level Behavior Classification. R package version 1.1. 2016. Available from: https://github.com/sieberts/TLBC. Accessed January 30, 2021.
43. Kerr J, Marshall SJ, Godbole S, et al. Using the SenseCam to improve classifications of sedentary behavior in free-living settings. Am J Prev Med. 2013;44(3):290–6.
44. Hibbing PR, LaMunion SR, Hilafu H, Crouter SE. Evaluating the performance of sensor-based bout detection algorithms: the transition pairing method. J Meas Phys Behav. 2020;3(3):219–27.
45. Lyden K, Kozey Keadle SL, Staudenmayer JW, Freedson PS. Validity of two wearable monitors to estimate breaks from sedentary time. Med Sci Sports Exerc. 2012;44(11):2243–52.
46. Allahbakhshi H, Hinrichs T, Huang H, Weibel R. The key factors in physical activity type detection using real-life data: a systematic review. Front Physiol. 2019;10:75.
47. Noor MHM, Salcic Z, Wang KIK. Adaptive sliding window segmentation for physical activity recognition using a single tri-axial accelerometer. Pervasive Mob Comput. 2017;38(1):41–59.
48. Nawaratne R, Alahakoon D, De Silva D, et al. Deep learning to predict energy expenditure and activity intensity in free living conditions using wrist-specific accelerometry. J Sports Sci. 2021;39(6):683–90.
49. Kuster RP, Grooten WJA, Baumgartner D, Blom V, Hagströmer M, Ekblom Ö. Detecting prolonged sitting bouts with the ActiGraph GT3X. Scand J Med Sci Sports. 2020;30:572–82.
50. Straczkiewicz M, Glynn NW, Zipunnikov V, Harezlak J. Fast and robust algorithm for detecting body posture using wrist-worn accelerometers. J Meas Phys Behav. 2020;3(4):285–93.
51. Twaites J, Everson R, Langford J, Hillsdon M. Transition detection for automatic segmentation of wrist-worn acceleration data: a comparison of new and existing methods. J Meas Phys Behav. 2020;3(1):19–28.
