Physical Therapy. 2016 Jun 1;96(6):898–907. doi: 10.2522/ptj.20150407

Reliability of Physical Activity Measures During Free-Living Activities in People After Total Knee Arthroplasty

Gustavo J Almeida 1,*, James J Irrgang 2, G Kelley Fitzgerald 3, John M Jakicic 4, Sara R Piva 5
PMCID: PMC6410954  PMID: 26586856

Abstract

Background

Few instruments that measure physical activity (PA) can accurately quantify PA performed at light and moderate intensities, which is particularly relevant in older adults. The evidence of their reliability in free-living conditions is limited.

Objective

The study objectives were: (1) to determine the test-retest reliability of the Actigraph (ACT), SenseWear Armband (SWA), and Community Healthy Activities Model Program for Seniors (CHAMPS) questionnaire in assessing free-living PA at light and moderate intensities in people after total knee arthroplasty; (2) to compare the reliability of the 3 instruments relative to each other; and (3) to determine the reliability of commonly used monitoring time frames (24 hours, waking hours, and 10 hours from awakening).

Design

A one-group, repeated-measures design was used.

Methods

Participants wore the activity monitors for 2 weeks, and the CHAMPS questionnaire was completed at the end of each week. Test-retest reliability was determined by using the intraclass correlation coefficient (ICC [2,k]) to compare PA measures from one week with those from the other week.

Results

Data from 28 participants who reported similar PA during the 2 weeks were included in the analysis. The mean age of these participants was 69 years (SD=8), and 75% of them were women. Reliability ranged from moderate to excellent for the ACT (ICC=.75–.86) and was excellent for the SWA (ICC=.93–.95) and the CHAMPS questionnaire (ICC=.86–.92). The 95% confidence intervals (95% CI) of the ICCs from the SWA were the only ones within the excellent reliability range (.85–.98). The CHAMPS questionnaire showed systematic bias, with less PA being reported in week 2. The reliability of PA measures in the waking-hour time frame was comparable to that in the 24-hour time frame and reflected most PA performed during this period.

Limitations

Reliability may be lower for time intervals longer than 1 week.

Conclusions

All PA measures showed good reliability. The reliability of the ACT was lower than those of the SWA and the CHAMPS questionnaire. The SWA provided more precise reliability estimates. Wearing PA monitors during waking hours provided sufficiently reliable measures and can reduce the burden on people wearing them.


Regular physical activity (PA) contributes to health enhancement in many ways. It helps maintain weight, prevents chronic diseases such as diabetes and hypertension, and reduces mortality.1 People who undergo total knee arthroplasty (TKA) for end-stage knee osteoarthritis are typically older adults who have an inactive lifestyle and perform most of their activities at light intensities because of persistent knee pain and functional limitations.2–4 Thus, measuring PA with reliable measurement tools that can capture light-intensity activities is warranted.

Several instruments for the measurement of free-living PA in older adults are available; free-living PA is defined as daily activities performed by people at their own pace and not under supervised or controlled conditions.5,6 However, only a few instruments can accurately distinguish light (eg, mopping, walking to do errands) from moderate (eg, lawn mowing, brisk walking) intensities of PA.7–12 Of those instruments, 2 activity monitors, the accelerometer-based Actigraph (ACT, Actigraph LLC, Pensacola, Florida) and the multisensor SenseWear Armband (SWA, BodyMedia Inc, Pittsburgh, Pennsylvania), and a self-report questionnaire, the Community Healthy Activities Model Program for Seniors (CHAMPS) questionnaire, have been validated for the measurement of PA in older adults and are widely used in research involving this population.13–15

The ACT is a waist-mounted accelerometer-based device, and the SWA is an arm-mounted multisensor device. Their advantage is that they can measure PA in real time, whereas the CHAMPS questionnaire relies on an individual's recall. Although the cost of activity monitors is substantial compared with that of questionnaires, their cost is reasonable for large-scale research. Also, they are far less expensive than doubly labeled water, which is the reference standard for the measurement of free-living PA. A drawback of activity monitors is that they need to be worn for several hours per day on several days per week to capture relevant PA information. The advantages of the CHAMPS questionnaire include the fact that it provides information about types of PA behaviors (eg, dancing and walking to do errands); the cost is very low; and it is easy to administer, with minimal burden on people (15 minutes to complete). However, PA questionnaires have been found to overestimate moderate PA and to underestimate sedentary behavior.12,16

At present, there is limited information on the reliability of these instruments for the measurement of PA in older adults with arthritis of the lower extremities. To our knowledge, no studies have determined the reliability of the SWA in free-living conditions. Studies in which the ACT and the CHAMPS questionnaire were used to measure free-living PA were mainly done with people who were healthy.7,12,15,17–19 Results from people who are healthy may not be applicable to people with knee osteoarthritis, who usually have pain and functional limitations that can lead to gait abnormalities, an inactive lifestyle, and obesity.20,21 More importantly, to our knowledge, no studies have compared the performance of the ACT, the SWA, and the CHAMPS questionnaire. It is therefore important to compare, within the same sample, the reliability of PA measures obtained from activity monitors based on distinct technologies and from a self-report questionnaire. This comparison will provide evidence that researchers and clinicians can use to make well-informed choices of tools for the assessment of PA in people with arthritis of the lower extremities.

Specific to accelerometry-based devices, evidence about the appropriate time frame for wearing the monitors also is limited. Most studies investigating the consistency of measures of PA for different monitoring time frames have involved the number of days; a few have involved the number of hours per day.6,22–24 The monitoring time frames most commonly used to assess free-living PA in research have been 10 hours, waking hours, and 24 hours.6,22–24 Investigating which monitoring time frame provides more reliable estimates of PA may help reduce the burden on people if shorter monitoring periods are found to yield consistent PA measures.

The main purpose of this study was to determine the test-retest reliability of the ACT, the SWA, and the CHAMPS questionnaire for the assessment of free-living PA at light and moderate intensities in people who underwent TKA for knee osteoarthritis. We also aimed to compare the reliability across the 3 instruments. Additionally, we aimed to determine the test-retest reliability of monitoring time frames that are commonly used to assess PA with the ACT and the SWA.

Method

Design and Participants

This was an ancillary study with a one-group, repeated-measures design. People participating in a randomized trial investigating the effects of rehabilitation approaches on physical function after TKA were invited to take part in this reliability study. This study took place from October 2011 to March 2013 in the Department of Physical Therapy, University of Pittsburgh. All participants signed a consent form approved by the university's Institutional Review Board before participation.

In the parent study, people were included if they had undergone unilateral TKA in the preceding 3 to 6 months and were at least 50 years old. To be included in this ancillary study, people also had to be willing to wear the activity monitors for 2 consecutive weeks, to complete the CHAMPS questionnaire twice, and to agree to perform similar activities during the 2 weeks of data collection. For safety reasons, people were excluded from the parent study if they reported more than 2 falls in the preceding year, were unable to ambulate a distance of 31 m without an assistive device, or had medical conditions that precluded safe exercise participation.

Study Protocol

Participants attended 3 testing visits. In the first visit, they completed demographic and self-report questionnaires on pain and physical function. Height and weight were also measured. This information was used to characterize the sample. At the end of the first visit, participants were fitted with the ACT at the waist and the SWA at the upper arm and were asked to wear the monitors for 7 days (24 h/d), except during showering or water activities. Participants were asked to perform similar activities during the 2 weeks of data collection. At the end of week 1 (test) and week 2 (retest), participants returned to our research facility so that we could download data from the activity monitors and they could complete the CHAMPS questionnaire. The CHAMPS questionnaire asked about PA participation in the preceding week, which corresponded to the period when the monitors were worn.

Data from the activity monitors were inspected to ensure that they were sufficient, defined as at least 5 days with at least 10 hours of data per day.25 Participants whose data were not sufficient were asked to wear the monitors for an additional week and to complete the CHAMPS questionnaire at the end of that week. To assess the stability of PA during the 2 weeks, participants were asked whether they performed “more,” “less,” or “about the same” activities in the second week relative to the first week. Only sufficient data from participants who reported “about the same” activities during the 2 weeks were used in the reliability analysis.
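The sufficiency and stability criteria above can be expressed as a small check. This is a minimal Python sketch, not the study's software: it assumes that the minutes of valid monitor data per day have already been derived (after nonwear periods are removed), the function names are hypothetical, and the extra week of monitoring for participants with insufficient data is not modeled.

```python
from typing import Sequence

# Thresholds from the text: at least 5 days with at least 10 hours of data.
MIN_VALID_DAYS = 5
MIN_WEAR_MINUTES_PER_DAY = 10 * 60


def week_is_sufficient(daily_wear_minutes: Sequence[int]) -> bool:
    """Return True if enough days in one week reach the wear-time threshold.

    `daily_wear_minutes` holds the minutes of valid monitor data for each day
    (assumed to be computed after nonwear periods have been removed).
    """
    valid_days = sum(1 for minutes in daily_wear_minutes
                     if minutes >= MIN_WEAR_MINUTES_PER_DAY)
    return valid_days >= MIN_VALID_DAYS


def include_in_reliability(week1: Sequence[int], week2: Sequence[int],
                           self_report: str) -> bool:
    """Hypothetical inclusion rule: both weeks sufficient and PA reported stable."""
    return (week_is_sufficient(week1) and week_is_sufficient(week2)
            and self_report == "about the same")
```

Keeping the threshold check separate from the inclusion rule mirrors the text, where sufficiency is assessed per week and the stability question is asked once after week 2.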

Measures of PA

The main outcome of this study was the duration of daily PA estimated with the ACT, the SWA, and the CHAMPS questionnaire in minutes per day. The daily number of steps was also assessed with the ACT and the SWA. The PA measurement tools used in this study use distinct approaches to classify PA intensity as light, moderate, or vigorous. The measures of moderate and vigorous PA were combined into the moderate category because our sample engaged in negligible amounts of vigorous PA. The PA categories compared across the 3 instruments in this study were light, moderate, and light to moderate (combination of light and moderate intensities); the numbers of steps were compared for the ACT and the SWA only.

The ACT is a small triaxial accelerometer-based monitor (∼5.0 × ∼3.8 × ∼1.5 cm [2 × 1.5 × 0.6 in]) worn around the waist, at the level of the hip-bone, over the right anterior superior iliac spine. The ACT model GT3X and the ActiLife 5 software (Actigraph LLC) were used. The ACT measures body acceleration in activity counts and was set to collect data at 1-minute intervals. Participants' demographic information (sex, age, height, and weight) was entered into the software when the device was initialized. The ACT generates data on activity counts per minute (cpm) and number of steps.

To categorize the duration of daily activities in minutes per day, the software uses the following cpm cutoff points: 760 to 1,951 cpm for light PA, >1,951 cpm for moderate PA, and ≥760 cpm for light to moderate PA.26 The ActiLife 5 software identifies periods during which the ACT is not worn (nonwear periods) on the basis of a threshold of 60 consecutive minutes with 0 cpm, which indicates no movement at all, and allows for up to 2 minutes with 1 to 100 cpm within the 60 minutes. Nonwear periods were also identified by visual inspection. The ACT has been shown to have moderate accuracy (r=.49) for the measurement of PA in free-living conditions relative to doubly labeled water in older adults13 and to have moderate to excellent reliability for the assessment of PA in free-living conditions in adults who are healthy and older adults (intraclass correlation coefficient [ICC]=.68–.90).17,18
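The cutoff points and the nonwear rule can be sketched in Python as follows. This is a simplified illustration, not ActiLife's implementation: it assumes one day of nonnegative per-minute counts, the function names are hypothetical, and the way interruptions are tolerated (only between zero-count minutes) is an assumption that may differ from the commercial software.

```python
from typing import Dict, List, Sequence

LIGHT_MIN_CPM = 760        # light PA: 760 to 1,951 cpm
MODERATE_MIN_CPM = 1952    # moderate PA: >1,951 cpm (counts are integers)


def classify_minute(cpm: int) -> str:
    """Map one minute of accelerometer counts to an intensity label."""
    if cpm >= MODERATE_MIN_CPM:
        return "moderate"
    if cpm >= LIGHT_MIN_CPM:
        return "light"
    return "below_light"


def nonwear_mask(counts: Sequence[int], window: int = 60,
                 max_spikes: int = 2, spike_max: int = 100) -> List[bool]:
    """Flag minutes inside runs of >= `window` minutes of zero counts that
    tolerate up to `max_spikes` minutes with 1 to `spike_max` counts.
    Simplified assumption: a run must start and end on a zero-count minute."""
    n = len(counts)
    mask = [False] * n
    start = 0
    while start < n:
        if counts[start] != 0:
            start += 1
            continue
        end, spikes = start, 0
        while end < n:
            c = counts[end]
            if c == 0:
                end += 1
            elif c <= spike_max and spikes < max_spikes:
                spikes += 1
                end += 1
            else:
                break
        # Do not let a candidate nonwear run end on a tolerated spike.
        while counts[end - 1] != 0:
            end -= 1
        if end - start >= window:
            for i in range(start, end):
                mask[i] = True
            start = end
        else:
            start += 1
    return mask


def daily_minutes(counts: Sequence[int]) -> Dict[str, int]:
    """Minutes of light, moderate, and light to moderate PA during wear time."""
    off = nonwear_mask(counts)
    totals = {"light": 0, "moderate": 0}
    for cpm, is_off in zip(counts, off):
        if is_off:
            continue
        label = classify_minute(cpm)
        if label in totals:
            totals[label] += 1
    totals["light_to_moderate"] = totals["light"] + totals["moderate"]
    return totals
```

Light to moderate PA is computed as the sum of the 2 bins, which is equivalent to counting minutes at ≥760 cpm.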

The SWA is a small multisensor device (∼8.6 × ∼5.3 × ∼2.0 cm [3.4 × 2.1 × 0.8 in]) worn on the back of the right arm, over the triceps muscle, at the midpoint between the shoulder and the elbow. The SWA Pro-3 and the Professional software v6.1 (BodyMedia Inc) were used. The device combines information from a biaxial accelerometer, heat flux (heat being dissipated by the body), a galvanic signal (onset, peak, and recovery of maximal sweat rates), and skin temperature. Information from the sensors is integrated and processed by the software with proprietary algorithms that account for participants' demographic characteristics (sex, age, height, and weight).

The SWA provides data on the duration (minutes per day) of light (2–2.9 metabolic equivalents [METs]), moderate (≥3 METs), and light to moderate (≥2 METs) PA. It also computes number of steps. The SWA turns off automatically when not worn, enabling the software to recognize nonwear periods. Data also were visually screened to identify nonwear periods. The SWA has been shown to have moderate accuracy (r=.48) relative to doubly labeled water in older adults13 and good to excellent reliability in controlled conditions (ICC=.62–.99).27,28 However, the reliability of the SWA has not been investigated in free-living conditions.
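Once per-minute MET estimates are available, the MET categories above reduce to simple thresholds. The sketch below assumes the armband software can export one MET value per worn minute (an assumption about the data format; the proprietary algorithm that produces those estimates is not reproduced), reads “2–2.9 METs” as 2 ≤ MET < 3, and uses a hypothetical function name.

```python
from typing import Dict, Sequence


def swa_minutes_by_intensity(minute_mets: Sequence[float]) -> Dict[str, int]:
    """Count worn minutes in each MET-based category.

    Assumes `minute_mets` holds one MET estimate per worn minute; minutes
    when the armband was off are simply absent from the sequence.
    """
    light = sum(1 for m in minute_mets if 2.0 <= m < 3.0)   # 2-2.9 METs
    moderate = sum(1 for m in minute_mets if m >= 3.0)      # >=3 METs
    return {"light": light, "moderate": moderate,
            "light_to_moderate": light + moderate}          # >=2 METs
```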

The CHAMPS questionnaire is a self-report questionnaire that queries the type, frequency, and duration of 41 activities usually performed by older adults, ranging from light to vigorous PA, such as cleaning, gardening, and sports activities. The duration of each activity (hours per week) is selected from a range of less than 1 h/wk to greater than or equal to 9 h/wk, and activity is categorized into 2 intensity levels according to the CHAMPS questionnaire activity code book.15 Light to moderate PA (≥2 METs) represents all exercise-related activities queried by the questionnaire, such as light to heavy household chores and recreational and sports activities. Moderate PA (≥3 METs) represents activities such as heavy household chores, calisthenics, and sports.

To allow direct comparisons between the CHAMPS questionnaire and the activity monitors, we subtracted moderate PA from light to moderate PA to create a light PA category for the CHAMPS questionnaire. Data on water activities described by the CHAMPS questionnaire (ie, items 31–33) were excluded from the calculation of PA scores because participants were required to remove the activity monitors during those activities. The CHAMPS questionnaire has been shown to have low accuracy relative to doubly labeled water for the measurement of PA in older adults (r=.28)13 and fair to good reliability (ICC=.27–.84) for the measurement of PA in people with musculoskeletal disorders and older adults who are healthy.7,15,19,29
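The derivation of the CHAMPS light PA category can be sketched as below. This is an illustration under stated assumptions, not the official CHAMPS scoring procedure: each response is assumed to be already converted from the questionnaire's duration categories to numeric hours per week, each item's MET value is assumed to come from the code book, and dividing weekly minutes by 7 to obtain min/d is an assumed conversion. Item numbers and MET values in the test are hypothetical.

```python
from typing import Dict, Tuple

WATER_ITEMS = {31, 32, 33}  # excluded because the monitors were removed


def champs_min_per_day(responses: Dict[int, Tuple[float, float]]) -> Dict[str, float]:
    """Score CHAMPS-style responses into minutes per day of PA.

    `responses` maps item number -> (hours per week, MET value).
    """
    ltm_hours = 0.0
    moderate_hours = 0.0
    for item, (hours, mets) in responses.items():
        if item in WATER_ITEMS:
            continue
        if mets >= 2.0:
            ltm_hours += hours        # light to moderate: >=2 METs
        if mets >= 3.0:
            moderate_hours += hours   # moderate: >=3 METs
    light_hours = ltm_hours - moderate_hours  # light = (light to moderate) - moderate
    per_day = 60.0 / 7.0                      # h/wk -> min/d (assumed conversion)
    return {"light": light_hours * per_day,
            "moderate": moderate_hours * per_day,
            "light_to_moderate": ltm_hours * per_day}
```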

Reliability of Monitoring Time Frames

The test-retest reliability of the ACT and the SWA was assessed with 3 time frames commonly used in previous studies: 24 hours, waking hours, and 10 hours from awakening.6,22,23 The 24-hour time frame corresponded to a whole day of activities, including sleep time but excluding the time when the monitors were removed for water activities. The waking-hour time frame represented the duration from wake-up time to sleep time. The 10-hour-from-awakening time frame was determined by counting 10 hours from wake-up time. Wake-up and sleep times were identified with the SWA, which has been shown to be accurate for detecting sleep.30–32 The times identified in the SWA software were used to define times for the ACT.
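The 3 time frames can be represented as minute ranges within a day and applied to per-minute intensity labels. This minimal sketch assumes a day indexed as minutes 0 to 1439, wake-up and sleep times (taken from the armband) that both fall inside that day, and nonwear minutes stored as None; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

MINUTES_PER_DAY = 24 * 60
TEN_HOURS = 10 * 60


def time_frames(wake: int, sleep: int) -> Dict[str, Tuple[int, int]]:
    """Half-open [start, end) minute ranges for the 3 monitoring time frames."""
    if not 0 <= wake < sleep <= MINUTES_PER_DAY:
        raise ValueError("expected 0 <= wake < sleep <= 1440")
    return {
        "24_hours": (0, MINUTES_PER_DAY),
        "waking_hours": (wake, sleep),
        "10_hours_from_awakening": (wake, min(wake + TEN_HOURS, MINUTES_PER_DAY)),
    }


def minutes_in_frame(labels: List[Optional[str]], frame: Tuple[int, int],
                     category: str) -> int:
    """Count minutes with the given intensity label inside one time frame.
    Nonwear minutes are represented as None and are never counted."""
    start, end = frame
    return sum(1 for label in labels[start:end] if label == category)
```

Because the 10-hour frame and the waking-hour frame share the same start, any activity after 10 hours from awakening appears only in the longer frames.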

Data were entered into the time frame reliability analysis only if accelerometry data from almost 24 hours (excluding showering) were available and if the monitoring days and times from the ACT matched those identified with the SWA. This approach was used to ensure that both monitors were measuring the same activities performed during the same times.

Data Analysis

Descriptive statistics for continuous variables included means and standard deviations or medians and 25th to 75th percentiles according to the distribution of the data. Counts and frequencies were used for categorical variables. Demographics and biomedical characteristics of participants who reported “about the same” activities during the 2 weeks and those who reported “more” or “less” activities in week 2 were compared by use of independent-sample t tests or Mann-Whitney U tests for continuous data and chi-square tests for categorical data.

The test-retest reliability of PA measures from week 1 and week 2 was determined by use of the ICC (2,k) with absolute agreement. As a general guideline for interpretation, an ICC of less than .50 indicates poor reliability, an ICC of .50 to .75 indicates moderate reliability, and an ICC of greater than .75 indicates excellent reliability.33 Measurement error was estimated by use of the standard error of measurement (SEM) with the equation SEM = SD × √(1 − ICC). We also calculated the minimum detectable change (MDC), a threshold at a defined level of statistical confidence beyond which true change exceeding measurement error can be considered to have occurred, by using the equation MDC = z × √2 × SEM, where z is the z score for the chosen level of confidence (1.96 for 95% confidence [MDC95] and 1.65 for 90% confidence [MDC90]).33,34 Bland-Altman plots were used to assess systematic bias and outliers.35 To compare the reliability of PA measures across the ACT, the SWA, and the CHAMPS questionnaire, we examined whether the point estimate of the ICC from one instrument was contained in the 95% CI of the ICC from another instrument. If the point estimate was not within the 95% CI, the reliability of the 2 instruments was considered statistically different.36
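The reliability indexes defined above can be computed from an n × k table of scores (n participants, k = 2 weeks here). The sketch below uses the standard two-way random-effects, absolute-agreement, average-measures form, ICC(2,k) = (MS_R − MS_E) / (MS_R + (MS_C − MS_E)/n), with mean squares from a two-way analysis of variance without replication, and also returns the F statistic for the session (week) effect. It is an illustration, not the SPSS routine used in the study; the function names are hypothetical, and which SD enters the SEM is left to the caller because the text does not specify it.

```python
import math
from typing import Sequence, Tuple

# z values as stated in the text (1.65 rather than 1.645 for 90%).
Z_SCORES = {0.90: 1.65, 0.95: 1.96}


def icc_2k(scores: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Two-way random-effects, absolute-agreement, average-measures ICC.

    `scores` has one row per participant and one column per session.
    Returns (ICC, F), where F = MS_sessions / MS_error tests for a
    systematic session effect. Assumes no missing values and MS_error > 0.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # sessions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return icc, ms_cols / ms_error


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc(sem_value: float, confidence: float = 0.95) -> float:
    """Minimum detectable change: z * sqrt(2) * SEM."""
    return Z_SCORES[confidence] * math.sqrt(2.0) * sem_value
```

For example, with the SEM reported for ACT light PA (13.0 min/d), `mdc(13.0, 0.95)` evaluates to about 36.0 min/d and `mdc(13.0, 0.90)` to about 30.3 min/d.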

We also explored whether the magnitudes of PA measures differed across instruments. Paired t tests or Wilcoxon signed rank tests were used according to the distribution of the data (ACT versus SWA, ACT versus CHAMPS questionnaire, and SWA versus CHAMPS questionnaire). The alpha level was set at .05 for all analyses, with no correction for multiple comparisons to minimize type II errors. Analyses were performed with IBM SPSS Statistics 21 (IBM Corp, Armonk, New York) and Microsoft Excel 2010 (Microsoft Corp, Redmond, Washington).

Role of the Funding Source

Funding for this study was provided by the Pepper Center Scholars Pilot Program (grant P30-AG024827), University of Pittsburgh; the Rehabilitation Institute Pilot Program, University of Pittsburgh Medical Center; and the SHRS Research Development Fund, University of Pittsburgh.

Results

We invited all 44 participants from the parent trial to participate in this ancillary study. Two people declined to participate because they were not willing to wear the activity monitors for 2 weeks. For 7 of the remaining 42 participants, data from one of the weeks were not sufficient, so these participants were asked to wear the monitors for an additional week. Data from 5 of these participants were excluded because they were insufficient, even after the participants wore the devices for an additional week. Of the remaining 37 participants, 9 reported “more” or “less” activities in week 2, leaving data from 28 participants with similar PA during both weeks for the reliability analysis (eFig. 1, available at ptjournal.apta.org). The monitoring time frames for these participants were similar during both weeks, with a mean of 21 h/d (SD=3 h/d). The characteristics of participants whose data were included in the reliability analysis and those whose data were excluded from the reliability analysis were similar (Tab. 1).

Table 1.

Demographic and Biomedical Characteristics of Participantsᵃ

| Characteristic | Included in Reliability Analysis (n=28) | Excluded From Reliability Analysis (n=16) | Pᵇ |
|---|---|---|---|
| Age, y | 68.6 (7.5) | 66.8 (5.3) | .18 |
| Sex, n (% women) | 21 (75.0) | 9 (56.3) | .31 |
| BMI, kg/m² | 29.6 (4.3) | 30.7 (4.0) | .58 |
| Race, n (% white) | 26 (92.9) | 14 (87.5) | .61 |
| Education, n (%) | | | 1.00 |
| High school | 12 (42.9) | 7 (43.8) | |
| College | 16 (57.1) | 9 (56.3) | |
| Duration of arthritis, n (%) | | | .85 |
| 1–10 y | 20 (71.4) | 11 (68.8) | |
| >10 y | 8 (28.6) | 5 (31.3) | |
| Time from TKA, n (%) | | | .68 |
| 3–4.9 mo | 19 (67.9) | 12 (75.0) | |
| 5–6 mo | 9 (32.1) | 4 (25.0) | |
| Knee pain,ᶜ median (Q25, Q75) | | | |
| Surgical side | 2 (1, 3) | 3 (2, 5) | .13 |
| Nonsurgical side | 3 (0, 6) | 3 (0, 6) | .67 |
| Physical functionᵈ | 18.8 (8.3) | 20.0 (12.3) | .31 |

ᵃ Values are means (SD), unless otherwise indicated. BMI=body mass index, TKA=total knee arthroplasty, Q25=25th percentile, Q75=75th percentile.

ᵇ P value for difference between participants whose data were included in the reliability analysis and those whose data were excluded from the reliability analysis.

ᶜ Assessed with an 11-point numeric pain scale ranging from 0 (“no pain”) to 10 (“worst imaginable pain”).39,40

ᵈ Assessed with the 17-item Western Ontario and McMaster Universities Osteoarthritis Index—physical function subscale. Each item was scored from 0 (no limitation) to 4 (extreme limitation), with a total score of up to 68 points. Higher scores indicated worse function.41

Table 2 summarizes PA during the 2 weeks and the estimates for test-retest reliability, SEM, and MDC. The 3 measurement tools showed moderate to excellent test-retest reliability (ICC≥.75), with 95% CIs ranging from poor to excellent (.43–.98). Bland-Altman plots revealed no systematic bias between measures from the 2 weeks for the activity monitors across PA categories (eFig. 2, available at ptjournal.apta.org). However, the plots demonstrated systematic bias for CHAMPS questionnaire scores in the duration of light PA and the duration of light to moderate PA. The line of equality (zero) was not contained in the 95% CI of the difference between the weeks, indicating that PA values from week 1 were significantly higher than those from week 2. These differences were consistent with the significant F tests from the analysis of variance (used in the ICC calculation) that examined the difference between PA measures from week 1 and those from week 2 (Tab. 2).
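The systematic-bias criterion described above (zero outside the 95% CI of the mean week 1 minus week 2 difference) can be sketched as follows. This is an illustrative Python sketch with hypothetical function names; to avoid a SciPy dependency, the caller supplies the two-sided 95% critical value of t with n − 1 degrees of freedom (about 2.05 for 28 participants).

```python
import math
import statistics
from typing import Dict, Sequence


def bland_altman(week1: Sequence[float], week2: Sequence[float],
                 t_crit: float) -> Dict[str, float]:
    """Summarize paired week 1 - week 2 differences for a Bland-Altman check."""
    diffs = [a - b for a, b in zip(week1, week2)]
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    half_width = t_crit * sd / math.sqrt(n)   # 95% CI of the mean difference
    return {
        "bias": bias,
        "ci_low": bias - half_width,
        "ci_high": bias + half_width,
        "loa_low": bias - 1.96 * sd,          # 95% limits of agreement
        "loa_high": bias + 1.96 * sd,
    }


def has_systematic_bias(summary: Dict[str, float]) -> bool:
    """Systematic bias if zero lies outside the 95% CI of the mean difference."""
    return not (summary["ci_low"] <= 0.0 <= summary["ci_high"])
```

The limits of agreement describe the spread of individual differences, whereas the CI of the mean difference is what the zero-line criterion in the text relies on.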

Table 2.

Average Duration of Physical Activity Obtained With Various Measurement Tools in 2 Weeksᵃ

| Measurement Tool | Physical Activity Category | Week 1, X̅ (SD) | Week 2, X̅ (SD) | Differenceᵇ (95% CI) | Significance of F Testᶜ | ICC (95% CI) | SEM | 90% MDC | 95% MDC |
|---|---|---|---|---|---|---|---|---|---|
| ACT | Light (min/d) | 67.3 (33.0) | 65.0 (36.3) | 2.2 (–8.5, 13.0) | .67 | .86ᵈ (.66, .94) | 13.0 | 30.3 | 36.0 |
| | Moderate (min/d) | 11.7 (13.0) | 9.0 (8.2) | 2.6 (–1.5, 6.8) | .20 | .75ᵈ (.43, .89) | 5.3 | 12.4 | 14.7 |
| | Light to moderate (min/d) | 78.9 (42.9) | 74.0 (43.2) | 4.9 (–8.8, 18.6) | .47 | .85ᵈ (.64, .94) | 16.9 | 39.4 | 46.8 |
| | No. of steps/d | 4,677 (2,032) | 4,414 (1,694) | 263 (–368, 894) | .39 | .85ᵈ (.63, .94) | 709 | 1,655 | 1,966 |
| SWA | Light (min/d) | 138.9 (102.8) | 149.7 (98.7) | –10.8 (–27.9, 6.3) | .21 | .95ᵉ (.90, .98) | 22.5 | 52.6 | 62.5 |
| | Moderate (min/d) | 45.3 (44.7) | 43.5 (36.6) | 1.8 (–6.7, 10.3) | .67 | .93 (.85, .97) | 10.8 | 25.1 | 29.8 |
| | Light to moderate (min/d) | 184.3 (138.8) | 193.2 (127.4) | –9.0 (–32.1, 14.2) | .43 | .95 (.89, .98) | 29.8 | 69.5 | 82.5 |
| | No. of steps/d | 6,262 (3,648) | 6,372 (3,432) | –110 (–763, 544) | .17 | .95 (.88, .98) | 937 | 2,186 | 2,596 |
| CHAMPS questionnaire | Light (min/d) | 67.8 (37.4) | 58.5 (31.6) | 11.4 (3.0, 19.8) | .01 | .86 (.66, .94) | 12.9 | 30.1 | 35.8 |
| | Moderate (min/d) | 41.1 (39.8) | 40.0 (46.1) | 0.61 (–8.6, 9.8) | .89 | .92ᶠ (.83, .96) | 12.2 | 28.4 | 33.7 |
| | Light to moderate (min/d) | 108.8 (57.6) | 98.5 (61.4) | 12.0 (0.18, 23.8) | .05 | .92 (.82, .97) | 16.8 | 39.3 | 46.7 |

ᵃ ACT=Actigraph, SWA=SenseWear Armband, CHAMPS=Community Healthy Activities Model Program for Seniors, CI=confidence interval, ICC=intraclass correlation coefficient, SEM=standard error of measurement, MDC=minimum detectable change.

ᵇ Raw difference between week 1 and week 2.

ᶜ Significance of the F test from the analysis of variance used to calculate ICCs.

ᵈ The point estimates of the ICCs from the ACT were not contained in the 95% CIs of the ICCs from the SWA.

ᵉ The point estimates of the ICCs from the SWA were not contained in the 95% CIs of the ICCs from the CHAMPS questionnaire.

ᶠ The point estimates of the ICCs from the CHAMPS questionnaire were not contained in the 95% CIs of the ICCs from the ACT.

The comparison of ICCs across PA measurement tools is shown in Table 2; ICCs that were statistically significantly different are indicated. The reliability of the ACT was significantly lower than that of the SWA across all PA categories (ICC estimates from the ACT were not contained in the 95% CIs from the SWA). The reliability of the ACT also was significantly lower than that of the CHAMPS questionnaire for moderate PA but not for the other categories. The reliability of the CHAMPS questionnaire was significantly lower than that of the SWA for light PA.
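The comparison rule from the Data Analysis section (a point estimate outside another instrument's 95% CI is treated as a statistically different reliability) can be checked directly against the Table 2 values for the 3 duration categories. This is a minimal Python sketch with a hypothetical function name; it encodes only the containment rule stated in the text, which is directional, so each call checks one instrument's point estimate against the other's interval.

```python
from typing import Dict, Tuple

# ICC point estimate and 95% CI for each (instrument, category), from Table 2.
TABLE2_ICC: Dict[Tuple[str, str], Tuple[float, Tuple[float, float]]] = {
    ("ACT", "light"): (0.86, (0.66, 0.94)),
    ("ACT", "moderate"): (0.75, (0.43, 0.89)),
    ("ACT", "light_to_moderate"): (0.85, (0.64, 0.94)),
    ("SWA", "light"): (0.95, (0.90, 0.98)),
    ("SWA", "moderate"): (0.93, (0.85, 0.97)),
    ("SWA", "light_to_moderate"): (0.95, (0.89, 0.98)),
    ("CHAMPS", "light"): (0.86, (0.66, 0.94)),
    ("CHAMPS", "moderate"): (0.92, (0.83, 0.96)),
    ("CHAMPS", "light_to_moderate"): (0.92, (0.82, 0.97)),
}


def point_outside_ci(instrument_a: str, instrument_b: str, category: str) -> bool:
    """True if the ICC point estimate of `instrument_a` falls outside the
    95% CI of the ICC of `instrument_b` for the same PA category."""
    point_a, _ = TABLE2_ICC[(instrument_a, category)]
    _, (low_b, high_b) = TABLE2_ICC[(instrument_b, category)]
    return not (low_b <= point_a <= high_b)
```

For instance, `point_outside_ci("SWA", "CHAMPS", "light")` is True while the same check for moderate PA is False, matching the single CHAMPS versus SWA difference reported above.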

The Figure shows the magnitude of PA duration measured by each instrument during week 1. Measurements from the ACT were significantly lower than those from the SWA across the 3 PA categories (P<.05) and were significantly lower than those from the CHAMPS questionnaire for moderate PA (P=.001) and light to moderate PA (P=.02). The CHAMPS questionnaire scores were significantly lower than measures from the SWA for light PA (P=.004) and light to moderate PA (P=.02). The same analysis performed with data from week 2 generated similar results.

Figure.


Comparison of magnitudes of duration (min/d) of physical activity (PA) measured by the Actigraph (ACT), the SenseWear Armband (SWA), and the Community Healthy Activities Model Program for Seniors (CHAMPS) questionnaire in week 1. *Magnitudes of duration were significantly different between measurement tools (P<.05).

For 7 of the 28 participants whose data were included in the reliability analysis, either data from the 24-hour time frame were not available or the monitoring days and times from the ACT did not match those from the SWA. For the 21 participants whose data were included in the analysis of reliability in the monitoring time frames, the minimum wear time was 22 h 47 min per day. Reliability in the monitoring time frames is shown in Tables 3, 4, and 5 (for 24 hours, 10 hours from awakening, and waking hours, respectively). The reliability of the ACT and the SWA in each of the monitoring time frames ranged from moderate to excellent (ICC=.75–.96). The ICCs for each instrument were similar across all time frames, indicating no statistical difference. Bland-Altman plots revealed no systematic bias between measures from the 2 weeks for the activity monitors across PA categories and time frames (eFig. 2).

Table 3.

Reliability in the 24-Hour Time Frameᵃ

| Measurement Tool | Physical Activity Category | Week 1, X̅ (SD) | Week 2, X̅ (SD) | Differenceᵇ (95% CI) | Significance of F Testᶜ | ICC (95% CI) |
|---|---|---|---|---|---|---|
| ACT | Light (min/d) | 65.5 (34.2) | 66.3 (37.9) | 0.2 (–11.4, 11.8) | .97 | .86 (.64, .94) |
| | Moderate (min/d) | 10.4 (12.8) | 8.7 (8.0) | 1.7 (–2.7, 6.1) | .43 | .75ᵈ (.46, .90) |
| | Light to moderate (min/d) | 76.9 (44.0) | 75.0 (44.8) | 1.9 (–12.8, 16.6) | .79 | .84 (.61, .94) |
| | No. of steps/d | 4,570 (2,041) | 4,374 (1,684) | 196 (–404, 796) | .50 | .89 (.70, .96) |
| SWA | Light (min/d) | 135.8 (89.8) | 148.9 (92.6) | –13.1 (–34.6, 8.4) | .22 | .93 (.82, .97) |
| | Moderate (min/d) | 45.3 (39.8) | 43.8 (34.1) | 1.5 (–8.1, 11.1) | .75 | .91 (.79, .97) |
| | Light to moderate (min/d) | 181.1 (118.1) | 192.6 (116.1) | –11.6 (–39.7, 16.5) | .40 | .93 (.82, .97) |
| | No. of steps/d | 6,104 (3,506) | 6,096 (3,134) | 8 (–759, 776) | .98 | .92 (.81, .97) |

ᵃ Daily physical activity duration and intraclass correlation coefficient (ICC) in 21 participants with matching data from the Actigraph (ACT) and the SenseWear Armband (SWA) in the 24-hour time frame during the 2 weeks of data collection.

ᵇ Difference between week 1 and week 2 and corresponding 95% confidence interval (CI).

ᶜ Significance of the F test from the analysis of variance used to calculate ICCs.

ᵈ The point estimates of the ICCs from the ACT were not contained in the 95% CIs of the ICCs from the SWA for the duration of moderate activities.

Table 4.

Reliability in the 10-Hour-From-Awakening Time Frameᵃ

| Measurement Tool | Physical Activity Category | Week 1, X̅ (SD) | Week 2, X̅ (SD) | Differenceᵇ (95% CI) | Significance of F Testᶜ | ICC (95% CI) |
|---|---|---|---|---|---|---|
| ACT | Light (min/d) | 50.3 (29.7) | 47.3 (28.9) | 2.9 (–6.7, 12.5) | .53 | .88 (.71, .95) |
| | Moderate (min/d) | 7.7 (8.5) | 6.7 (6.7) | 1.0 (–1.7, 3.6) | .45 | .85 (.63, .94) |
| | Light to moderate (min/d) | 57.9 (35.8) | 54.0 (34.7) | 3.9 (–7.5, 15.4) | .48 | .89 (.72, .95) |
| | No. of steps/d | 3,325 (1,484) | 3,238 (1,316) | 88 (–287, 462) | .63 | .92 (.80, .97) |
| SWA | Light (min/d) | 102.1 (67.2) | 109.6 (62.9) | –7.5 (–22.4, 7.5) | .31 | .94 (.86, .98) |
| | Moderate (min/d) | 36.5 (35.5) | 34.9 (27.7) | 1.5 (–7.1, 10.2) | .71 | .92 (.80, .97) |
| | Light to moderate (min/d) | 138.6 (94.7) | 144.7 (83.1) | –6.1 (–27.4, 15.1) | .56 | .94 (.85, .98) |
| | No. of steps/d | 4,501 (2,895) | 4,515 (2,422) | –14 (–666, 638) | .97 | .95 (.87, .98) |

ᵃ Daily physical activity duration and intraclass correlation coefficient (ICC) in 21 participants with matching data from the Actigraph (ACT) and the SenseWear Armband (SWA) in the 10-hour-from-awakening time frame during the 2 weeks of data collection.

ᵇ Difference between week 1 and week 2 and corresponding 95% confidence interval (CI).

ᶜ Significance of the F test from the analysis of variance used to calculate ICCs.

Table 5.

Reliability in the Waking-Hour Time Frameᵃ

| Measurement Tool | Physical Activity Category | Week 1, X̅ (SD) | Week 2, X̅ (SD) | Differenceᵇ (95% CI) | Significance of F Testᶜ | ICC (95% CI) |
|---|---|---|---|---|---|---|
| ACT | Light (min/d) | 64.0 (33.3) | 63.6 (37.4) | 0.3 (–11.5, 12.2) | .96 | .86 (.66, .94) |
| | Moderate (min/d) | 9.9 (12.8) | 8.3 (7.7) | 1.6 (–2.8, 5.9) | .46 | .82ᵈ (.56, .93) |
| | Light to moderate (min/d) | 73.8 (42.9) | 71.9 (43.9) | 1.9 (–13.0, 16.8) | .79 | .86 (.65, .94) |
| | No. of steps/d | 4,506 (2,045) | 4,352 (1,693) | 154 (–446, 754) | .60 | .89 (.70, .96) |
| SWA | Light (min/d) | 134.0 (88.9) | 147.7 (91.0) | –12.7 (–34.0, 8.6) | .23 | .94 (.85, .98) |
| | Moderate (min/d) | 45.1 (39.6) | 43.8 (34.3) | 1.3 (–8.3, 10.9) | .78 | .93 (.83, .97) |
| | Light to moderate (min/d) | 180.1 (117.6) | 191.5 (114.8) | –11.5 (–39.3, 16.4) | .40 | .94 (.85, .98) |
| | No. of steps/d | 5,992 (3,483) | 6,002 (3,127) | –10 (–774, 754) | .98 | .96 (.89, .98) |

ᵃ Daily physical activity duration and intraclass correlation coefficient (ICC) in 21 participants with matching data from the Actigraph (ACT) and the SenseWear Armband (SWA) in the waking-hour time frame during the 2 weeks of data collection.

ᵇ Difference between week 1 and week 2 and corresponding 95% confidence interval (CI).

ᶜ Significance of the F test from the analysis of variance used to calculate ICCs.

ᵈ The point estimates of the ICCs from the ACT were not contained in the 95% CIs of the ICCs from the SWA for the duration of moderate activities.

Discussion

To our knowledge, this is the first study to concurrently determine the test-retest reliability of the ACT, the SWA, and the CHAMPS questionnaire during free-living PA performed at light and moderate intensities after TKA. The results of the reliability analysis revealed good test-retest reliability for the 3 instruments. Measures of PA from the SWA were shown to be more reliable than those from the ACT and the CHAMPS questionnaire. The reliability of the duration of PA measured by the SWA was better than the reliability of the duration of PA measured by the ACT across all PA categories and was also better than the reliability of measures of light PA from the CHAMPS questionnaire. The better reliability of PA measures from the SWA was further supported by the lower bounds of the 95% CIs, which were consistently above the threshold of excellent reliability (ie, ICC>.75).

Unlike the ACT and the SWA, the CHAMPS questionnaire showed systematic bias, with higher scores in week 1 than in week 2, for light PA and light to moderate PA. Despite the high reliability for those categories (ICC>.85), the statistically significant differences in CHAMPS questionnaire scores between weeks call into question its ability to provide consistent scores. We presumed that the differences in scores occurred because most of the participants were wearing activity monitors for the first time, which made them more aware of the activities they performed during week 1. That awareness probably faded during week 2 as the participants became accustomed to wearing the devices, and this reduced awareness would explain why less activity was reported on the CHAMPS questionnaire in the second week than in the first week. Systematic bias in light to moderate PA was attributed to the contribution of light PA because systematic bias was not demonstrated for moderate PA.

To our knowledge, no prior study has determined the reliability of PA measures from the SWA in free-living conditions, so our SWA results cannot be placed in the context of previous work. Our results for the ACT and the CHAMPS questionnaire, however, can be compared with those of earlier studies, although those studies investigated the reliability of each instrument separately rather than concurrently in the same sample. Studies of the ACT, which reported ICCs ranging from .74 to .90, were conducted with middle-aged and older adults who were healthy.17,18 Although the ICCs from our study fall within that range, those findings cannot be generalized to our population because older adults who undergo TKA tend to be less active than their healthy, younger counterparts.20

Several studies with older adults who were healthy investigated the reliability of the CHAMPS questionnaire, and the reported reliability was lower than that in the present study (ICC=.62–.81).7,12,15,19 In a study with adults who had fibromyalgia, the reported reliability also was lower than that in the present study (ICC=.27–.76).29 Of these studies, only one involved a 1-week interval to determine the reliability of PA measured by the CHAMPS questionnaire, and the estimates reported in that study (ICC=.79–.81)19 were comparable to those reported in the present study (ICC=.82–.92). The lower ICCs reported in the other studies could be attributed to the longer time interval between test and retest, which ranged from 2 weeks to 6 months. Future, larger studies should investigate the comparative reliability of activity monitors and questionnaires with different test-retest intervals.

The present study is also unique in providing SEMs and MDCs for the PA measures; we are not aware of other studies reporting these indexes of measurement error for PA measured by the ACT, the SWA, or the CHAMPS questionnaire. The SEM and the MDC are expressed in the same units as the measurement (ie, min/d or steps/d) and are clinically helpful because they can be used to judge whether a change in PA exceeds measurement error. Whereas the SEM quantifies the measurement error itself, the MDC uses the SEM to compute a threshold beyond which a true change can be inferred with a defined level of statistical confidence.
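Under the conventional formulas, which this section does not spell out and which are assumed here, the SEM is derived from the between-participant SD and the ICC, and the MDC scales the SEM by a z value and by √2 to account for error in both the test and the retest. A minimal sketch with hypothetical function names and inputs:

```python
import math
from statistics import NormalDist


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement, SD * sqrt(1 - ICC), in the units of the measure."""
    return sd * math.sqrt(1.0 - icc)


def mdc(sem_value: float, confidence: float = 0.90) -> float:
    """Minimal detectable change: z * sqrt(2) * SEM for a two-sided confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 (90%) or 1.960 (95%)
    return z * math.sqrt(2) * sem_value


# Hypothetical inputs: SD of 60 min/d and ICC of .90 for some PA measure.
s = sem(60.0, 0.90)
print(round(s, 1), round(mdc(s, 0.90), 1), round(mdc(s, 0.95), 1))
```

Some studies instead take the SEM as the square root of the ANOVA error mean square; that choice changes the numbers but not the logic of the threshold.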

We reported the MDC at 2 levels of confidence, the MDC90 and the MDC95, to allow different degrees of strictness when interpreting changes in PA over time. For instance, if the MDC90 were used for the duration of light PA measured by the SWA (53 min/d) and a person increased the duration of light activities by more than 53 min/d, one could be 90% confident that a real change in PA, beyond measurement error, had occurred. Although 53 min/d seems to be a large change, activities falling in this intensity category (2–2.9 METs) included mopping, gardening, and slow walking, which were spread throughout the day and could be done by people with arthritis of the lower extremities.
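The interpretation above reduces to a threshold check on an individual's change score. The sketch below uses the MDC90 of 53 min/d quoted for light PA measured by the SWA; the function name and participant values are hypothetical, and the check treats decreases the same way as increases.

```python
def exceeds_mdc(baseline: float, follow_up: float, mdc_value: float) -> bool:
    """True when the absolute change between two assessments is larger than the MDC."""
    return abs(follow_up - baseline) > mdc_value


MDC90_SWA_LIGHT = 53.0  # min/d, MDC90 for light PA duration measured by the SWA (from the text)

# Hypothetical participant assessed twice (light PA, min/d).
print(exceeds_mdc(180.0, 245.0, MDC90_SWA_LIGHT))  # True: +65 min/d exceeds 53 min/d
print(exceeds_mdc(180.0, 220.0, MDC90_SWA_LIGHT))  # False: +40 min/d does not exceed 53 min/d
```

Applying the stricter MDC95 for the same measure would simply mean passing a larger threshold.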

When the magnitudes of PA durations were compared across instruments, we observed that measures from the ACT were consistently lower than those from the SWA and lower than those from the CHAMPS questionnaire for moderate and light to moderate PA. These findings are in agreement with those of some studies on the validity of the ACT and the SWA in other populations.8,9,14,37 In those studies, measures from the activity monitors were concurrently compared against indirect calorimetry (reference standard) in young to middle-aged adults who were healthy,8,37 older adults with chronic obstructive pulmonary disease,9 and older adults after TKA.14 Measures of PA from the SWA were reported to be closer to those of the reference standard (ie, more accurate) than were those from the ACT. It was suggested that the SWA captured more activities than the ACT because of its placement on the upper arm and its multisensor technology. These characteristics enabled the SWA to detect most daily activities, including nonambulatory ones that the ACT, worn at the hip, was less likely to capture.14

To our knowledge, the present study is also the first to provide evidence about the comparative reliability of PA measured by activity monitors across different monitoring time frames. Despite the excellent reliability of PA measured during the 10-hour period from awakening (ICC≥.85), the magnitudes of the PA measures indicated that this period may not represent daily activities well. Compared with the 24-hour period, the 10-hour time frame contained 19% less data for moderate PA and 29% less data for light PA, whereas the waking-hour time frame contained only 0% to 5% less PA data. These differences between time frames are in line with the results of a study in which PA in adults who were healthy was measured by the ACT.38 In that study, a 10-hour period had up to 42% less PA data than a 14-hour period (waking hours).38 Therefore, using data from periods shorter than waking hours may underestimate daily PA because many people spread their activities throughout the day. Measuring PA during waking hours rather than for 24 hours can reduce the burden on participants because most of them feel uncomfortable wearing monitors while sleeping. Also, data processing for waking hours is less cumbersome than it is for a 10-hour period from awakening.
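The time-frame comparison can be reproduced on minute-level data once each minute has been assigned an intensity category. The sketch below assumes such labels already exist (derived, for example, from device-specific cut points or MET estimates) and that wake and sleep times are known; it uses a single hypothetical day, it is not the authors' processing pipeline, and all names and values are illustrative.

```python
from collections import Counter

MINUTES_PER_DAY = 1440


def pa_minutes(labels: list[str], start: int, end: int) -> Counter:
    """Count minutes per intensity label from minute `start` (inclusive) to `end` (exclusive)."""
    return Counter(labels[start:end])


def percent_less(reference: int, window: int) -> float:
    """Percentage of the reference frame's PA minutes that a shorter window misses."""
    return 100.0 * (reference - window) / reference if reference else 0.0


# Hypothetical day of per-minute labels; midnight is minute 0.
labels = ["sedentary"] * MINUTES_PER_DAY
for start, end, label in [(480, 540, "light"), (600, 630, "moderate"),
                          (960, 1020, "light"), (1200, 1260, "light")]:
    labels[start:end] = [label] * (end - start)

wake, sleep = 420, 1380  # hypothetical 07:00 wake time and 23:00 bedtime
frames = {
    "24 h": pa_minutes(labels, 0, MINUTES_PER_DAY),
    "waking hours": pa_minutes(labels, wake, sleep),
    "10 h from awakening": pa_minutes(labels, wake, wake + 10 * 60),
}
reference = frames["24 h"]
for name, counts in frames.items():
    print(f"{name}: light {counts['light']} min "
          f"({percent_less(reference['light'], counts['light']):.0f}% less), "
          f"moderate {counts['moderate']} min "
          f"({percent_less(reference['moderate'], counts['moderate']):.0f}% less)")
```

In this made-up day the evening light activity falls outside the 10-hour window, so that window captures about one third less light PA than the 24-hour and waking-hour frames, mirroring the direction (though not the size) of the differences reported above.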

Because PA measured during waking hours seemed to be more indicative of participants' daily activities, we also calculated the SEM from PA measured by the ACT and the SWA in each intensity category with waking hours as a reference. The magnitudes of the SEMs derived from PA measured during waking hours were similar to those for the 24-hour period. For light PA, moderate PA, and light to moderate PA and the number of steps, the SEMs from the ACT were 13.2 min/d, 4.9 min/d, 16.3 min/d, and 618 steps per day, respectively, and the SEMs from the SWA were 22.3 min/d, 9.9 min/d, 28.9 min/d, and 686 steps per day, respectively.
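As a worked check on the statement that waking-hour SEMs were similar to 24-hour SEMs, the SWA's waking-hour SEM for light PA can be converted to an MDC90, assuming the conventional formula MDC90 = 1.645 × √2 × SEM (the formula is not stated in this section):

```latex
\mathrm{MDC}_{90} = 1.645 \times \sqrt{2} \times \mathrm{SEM}
                  = 1.645 \times 1.414 \times 22.3~\text{min/d}
                  \approx 51.9~\text{min/d}
```

This is close to the MDC90 of 53 min/d quoted in the Discussion for light PA measured by the SWA, which, if that value comes from the 24-hour analysis, is consistent with the similarity described above.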

We acknowledge that the sample size in the present study was small. Although our primary analysis involved data from only 28 participants, the small sample did not appear to compromise the ICC estimates, which generally had narrow CIs. We also acknowledge that estimates of reliability and measurement error may differ for longer test-retest intervals. In addition, although our sample was representative of people who undergo TKA for knee osteoarthritis, our results may not generalize to older adults who are healthy and do not have lower-extremity dysfunction.

In conclusion, our results indicated that the reliability of the duration of PA measured by the SWA was better than the reliability of the duration of PA measured by the ACT across intensity categories and was also better than the reliability of measures of light PA from the CHAMPS questionnaire. Measuring daily PA with the ACT or the CHAMPS questionnaire may not be ideal because the ACT measures significantly less PA than the SWA and the CHAMPS questionnaire and because measures of light PA from the CHAMPS questionnaire were significantly different between weeks. Furthermore, it seems that monitoring of PA during waking hours provides reliable data that resemble those from a 24-hour time frame. The use of the waking-hour monitoring time frame may help reduce the burden on participants wearing activity monitors without compromising reliability.

Supplementary Material

ptj0898-eFigure-1

ptj0898-eFigure-2

Footnotes

The study was approved by the University of Pittsburgh Institutional Review Board. All participants signed a consent form before participation in the study.

Funding for this study was provided by the Pepper Center Scholars Pilot Program (grant P30-AG024827), University of Pittsburgh; the Rehabilitation Institute Pilot Program, University of Pittsburgh Medical Center; and the SHRS Research Development Fund, University of Pittsburgh.

References

  • 1. Paterson DH, Warburton DE. Physical activity and functional limitations in older adults: a systematic review related to Canada's Physical Activity Guidelines. Int J Behav Nutr Phys Act. 2010;7:38.
  • 2. de Groot IB, Bussmann JB, Stam HJ, Verhaar JA. Actual everyday physical activity in patients with end-stage hip or knee osteoarthritis compared with healthy controls. Osteoarthritis Cartilage. 2008;16:436–442.
  • 3. Brandes M, Ringling M, Winter C, et al. Changes in physical activity and health-related quality of life during the first year after total knee arthroplasty. Arthritis Care Res (Hoboken). 2011;63:328–334.
  • 4. Holsgaard-Larsen A, Roos EM. Objectively measured physical activity in patients with end stage knee or hip osteoarthritis. Eur J Phys Rehabil Med. 2012;48:577–585.
  • 5. Cawthon PM, Blackwell TL, Cauley JA, et al. Objective assessment of activity, energy expenditure, and functional limitations in older men: the Osteoporotic Fractures in Men study. J Gerontol A Biol Sci Med Sci. 2013;68:1518–1524.
  • 6. Herrmann SD, Barreira TV, Kang M, Ainsworth BE. Impact of accelerometer wear time on physical activity data: a NHANES semisimulation data approach. Br J Sports Med. 2014;48:278–282.
  • 7. Harada ND, Chiu V, King AC, Stewart AL. An evaluation of three self-report physical activity instruments for older adults. Med Sci Sports Exerc. 2001;33:962–970.
  • 8. Calabró MA, Lee JM, Saint-Maurice PF, et al. Validity of physical activity monitors for assessing lower intensity activity in adults. Int J Behav Nutr Phys Act. 2014;11:119.
  • 9. Van Remoortel H, Raste Y, Louvaris Z, et al. Validity of six activity monitors in chronic obstructive pulmonary disease: a comparison with indirect calorimetry. PLoS One. 2012;7:e39198.
  • 10. Lee JM, Kim Y, Welk GJ. Validity of consumer-based physical activity monitors. Med Sci Sports Exerc. 2014;46:1840–1848.
  • 11. Wetten AA, Batterham M, Tan SY, Tapsell L. Relative validity of 3 accelerometer models for estimating energy expenditure during light activity. J Phys Act Health. 2014;11:638–647.
  • 12. Hekler EB, Buman MP, Haskell WL, et al. Reliability and validity of CHAMPS self-reported sedentary-to-vigorous intensity physical activity in older adults. J Phys Act Health. 2012;9:225–236.
  • 13. Colbert LH, Matthews CE, Havighurst TC, et al. Comparative validity of physical activity measures in older adults. Med Sci Sports Exerc. 2011;43:867–876.
  • 14. Almeida GJ, Wert DM, Brower KS, Piva SR. Validity of physical activity measures in individuals after total knee arthroplasty. Arch Phys Med Rehabil. 2015;96:524–531.
  • 15. Stewart AL, Mills KM, King AC, et al. CHAMPS physical activity questionnaire for older adults: outcomes for interventions. Med Sci Sports Exerc. 2001;33:1126–1141.
  • 16. Neilson HK, Robson PJ, Friedenreich CM, Csizmadi I. Estimating activity energy expenditure: how valid are physical activity questionnaires? Am J Clin Nutr. 2008;87:279–291.
  • 17. Sirard JR, Forsyth A, Oakes JM, Schmitz KH. Accelerometer test-retest reliability by data processing algorithms: results from the Twin Cities Walking Study. J Phys Act Health. 2011;8:668–674.
  • 18. Buman MP, Hekler EB, Haskell WL, et al. Objective light-intensity physical activity associations with rated health in older adults. Am J Epidemiol. 2010;172:1155–1165.
  • 19. Giles K, Marshall AL. Repeatability and accuracy of CHAMPS as a measure of physical activity in a community sample of older Australian adults. J Phys Act Health. 2009;6:221–229.
  • 20. Naal FD, Impellizzeri FM. How active are patients undergoing total joint arthroplasty? A systematic review. Clin Orthop Relat Res. 2010;468:1891–1904.
  • 21. Milner CE. Is gait normal after total knee arthroplasty? Systematic review of the literature. J Orthop Sci. 2009;14:114–120.
  • 22. Almeida GJ, Wasko MC, Jeong K, et al. Physical activity measured by the SenseWear Armband in women with rheumatoid arthritis. Phys Ther. 2011;91:1367–1376.
  • 23. Troiano RP, Berrigan D, Dodd KW, et al. Physical activity in the United States measured by accelerometer. Med Sci Sports Exerc. 2008;40:181–188.
  • 24. Miller GD, Jakicic JM, Rejeski WJ, et al. Effect of varying accelerometry criteria on physical activity: the Look AHEAD study. Obesity (Silver Spring). 2013;21:32–44.
  • 25. Trost SG, McIver KL, Pate RR. Conducting accelerometer-based activity assessments in field-based research. Med Sci Sports Exerc. 2005;37(11 suppl):S531–S543.
  • 26. Freedson PS, Melanson E, Sirard J. Calibration of the Computer Science and Applications, Inc. accelerometer. Med Sci Sports Exerc. 1998;30:777–781.
  • 27. Brazeau AS, Karelis AD, Mignault D, et al. Test-retest reliability of a portable monitor to assess energy expenditure. Appl Physiol Nutr Metab. 2011;36:339–343.
  • 28. Brazeau AS, Beaudoin N, Bélisle V, et al. Validation and reliability of two activity monitors for energy expenditure assessment. J Sci Med Sport. 2016;19:46–50.
  • 29. Kaleth AS, Ang DC, Chakr R, Tong Y. Validity and reliability of community health activities model program for seniors and short-form international physical activity questionnaire as physical activity assessment tools in patients with fibromyalgia. Disabil Rehabil. 2010;32:353–359.
  • 30. Teller A. A platform for wearable physiological computing. Interact Comput. 2004;16:917–937.
  • 31. Miwa H, Sasahara S, Matsui T. Roll-over detection and sleep quality measurement using a wearable sensor. Conf Proc IEEE Eng Med Biol Soc. 2007;2007:1507–1510.
  • 32. BaHammam A, Alrajeh M, Albabtain M, et al. Circadian pattern of sleep, energy expenditure, and body temperature of young healthy men during the intermittent fasting of Ramadan. Appetite. 2010;54:426–429.
  • 33. Portney LG, Watkins MP. Foundations of Clinical Research: Applications to Practice. Stamford, CT: Appleton & Lange; 1993.
  • 34. Nunnally JC, Bernstein IH. Psychometric Theory. New York, NY: McGraw-Hill; 1994.
  • 35. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–310.
  • 36. Cumming G, Finch S. Inference by eye: confidence intervals and how to read pictures of data. Am Psychol. 2005;60:170–180.
  • 37. Berntsen S, Hageberg R, Aandstad A, et al. Validity of physical activity monitors in adults participating in free-living activities. Br J Sports Med. 2010;44:657–664.
  • 38. Herrmann SD, Barreira TV, Kang M, Ainsworth BE. How many hours are enough? Accelerometer wear time may provide bias in daily activity estimates. J Phys Act Health. 2013;10:742–749.
  • 39. Katz J, Melzack R. Measurement of pain. Surg Clin North Am. 1999;79:231–252.
  • 40. Marx RG, Jones EC, Allen AA, et al. Reliability, validity, and responsiveness of four knee outcome scales for athletic patients. J Bone Joint Surg Am. 2001;83:1459–1469.
  • 41. Bellamy N, Buchanan WW, Goldsmith CH, et al. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol. 1988;15:1833–1840.

Articles from Physical Therapy are provided here courtesy of Oxford University Press
