Author manuscript; available in PMC: 2025 Mar 3.
Published in final edited form as: Physiother Theory Pract. 2022 Oct 25;40(3):647–657. doi: 10.1080/09593985.2022.2135149

Use of a treadmill, lift, and carry battery as a composite functional performance test: Analysis of data from a pragmatic randomized controlled trial in a military population attending a functional restoration program

Tyler Snow a, Larisa Burke b, Dana C Sanford a, Asha Mathew b, Alana D Steffen b, Diane M Flynn a, Ardith Z Doorenbos b,c
PMCID: PMC10126206  NIHMSID: NIHMS1845544  PMID: 36282735

Abstract

Background:

The treadmill, lift, and carry (TLC) battery is a composite functional performance test created to measure the effectiveness of a functional restoration (FR) program in a military population.

Purpose:

To determine the validity, reliability, and minimal clinically important differences (MCIDs) of the individual tests and the composite TLC battery.

Methods:

We assessed the validity by mean differences, effect sizes, and standardized response means pre- and post-FR; and by correlations between the TLC battery and other established measures. We assessed reliability by correlating pre- and post-FR scores. We used principal component analysis (PCA) to create a composite measure. We determined MCIDs via distribution methods and receiver operator curve analysis.

Results:

Significant (p < 0.001) mean changes and large effect sizes (0.6–0.8) pre- to post-FR. Pre- and posttest Spearman’s correlations ranged from 0.5 to 0.6. Spearman’s correlations between TLC battery scores and other measures were small (± 0.3–0.4) and significant (p < 0.001). PCA supported use of a single-component composite. MCIDs were treadmill time: 3 minutes; metabolic equivalent of task: 1.5 units; floor-to-waist lift: 15 lbs; waist-to-shoulder lift: 10 lbs; 40-foot carry: 10 lbs; and composite score: 6 units.

Conclusion:

This secondary data analysis provides preliminary support for the validity and reliability of the TLC battery for use in military populations.

Keywords: functional restoration, pain, military, validity, reliability

INTRODUCTION

Chronic pain is a significant burden to both civilian and military populations (Cohen, Vase, and Hooten, 2021; Smith, Taubman, and Clark, 2020). Service members are at particular risk for chronic pain due to the unique challenges of military duties, ongoing training for force readiness, and prior deployment injuries (Knox et al., 2011; McGeary and McGeary, 2014). Functional restoration (FR) has gained considerable attention and integration into the care of patients with chronic pain since its beginnings in 1988 (Mayer and Gatchel, 1988). Functional restoration is guided by the biopsychosocial model and takes a sports medicine approach that focuses on the patient as a working athlete. FR targets occupational skills and exercises that require physical strength, agility, and stamina (Mayer and Gatchel, 1988). An important advantage of the FR approach to treating chronic pain is that it simultaneously addresses multiple outcomes, including self-reported pain and disability, objective physical function, and socioeconomic outcomes (Gatchel and Mayer, 2008).

Researchers engaged in the ongoing development of FR have sought optimal ways to determine the effect of chronic pain treatment on physical function. While several well-known questionnaires such as the Patient-Reported Outcomes Measurement Information System (PROMIS) (National Institutes of Health, 2012) collect patient-reported measures of physical function, clinician-observed measures have also been used to assess treatment effectiveness. A “functional performance test” can provide observed information on specialized movements in sport, exercise, and occupations. Currently there is a lack of a validated test battery that assesses the impact of chronic pain treatment on function. To address this gap, the aim of this study was to determine the validity, reliability, and minimal clinically important differences (MCIDs) of individual tests of a battery as well as of the composite measure of the complete set of tests, thus expanding the knowledge base of functional performance tests.

METHODS

Sample

The study was performed in a military population and was part of a larger pragmatic randomized controlled trial of a FR program conducted in the Interdisciplinary Pain Management Center (IPMC) at the Madigan Army Medical Center (MAMC) in Tacoma, Washington. The study procedures were approved by the Regional Health Command-Pacific IRB and all participants provided informed consent. The MAMC IPMC was established at the direction of the US Army Medical Command in 2010 and offers a broad spectrum of pain therapies, including medications, physical and occupational therapies, psychological approaches, complementary and integrative health therapies, and interventional therapies. A medical provider at the MAMC IPMC identifies patients to be candidates for interdisciplinary pain care. Patients experiencing chronic pain (> 3 months) following musculoskeletal injury who were referred by their healthcare provider participated in this study.

The parent study included baseline functional performance assessments conducted by a physical or occupational therapist to determine study eligibility. Minimal functional eligibility criteria included the ability to independently sit down on and stand up from the floor; walk or jog on a treadmill for at least 6 minutes; and complete at least two of the following tasks: lift 20 lbs from floor to waist height and/or from waist to shoulder height, and/or carry 20 lbs a distance of 40 feet without an increase in pain intensity. Those who met eligibility and agreed to participate were individually randomized to the intervention arm or usual care. An intention-to-treat analysis was conducted to determine study results. More details of the parent study are described elsewhere (Flynn et al, 2018; Flynn et al, 2022); ClinicalTrials.gov Identifier: NCT04656340.

Intervention

Participants who were randomized to the intervention arm completed a 3-week course of twice-weekly chiropractic, acupuncture, and yoga in addition to usual care. The control group completed a 3-week course of usual care alone (i.e. physical therapy, occupational therapy, and psychoeducation). Then, over the next 3–6 weeks, all participants engaged in an intensive FR program, which included 12 full days of therapy. Each FR program treatment day included approximately 5 hours of physical activity, 1 hour of cognitive behavioral therapy for chronic pain, and 1 hour of pain education. For eligible participants unable to commit to 4 days of therapy per week over a 3-week period, we offered the option of 2 days of therapy per week over a 6-week period for the same number of contact hours. Thus, participants received 12 days of therapy with approximately 84 contact hours over 3–6 weeks, depending on individual schedules.

For the current study, we conducted a secondary analysis of functional performance measures collected in the parent study that included a graded treadmill test, a floor-to-waist lift, a waist-to-shoulder lift, and a 40-foot carry. Although these treadmill, lift, and carry (TLC) measures were collected at multiple timepoints for the primary study, we used only the pre- and post-FR assessments for the current analyses. Although functional performance test scores could differ with site of chronic pain, all participants had to fulfill the minimal functional eligibility criteria (described above), and participants’ scores on their original TLC battery were considered as the baseline scores. At the conclusion of the FR program, the measures were typically collected on the last treatment day, but we considered a posttest acceptable up to 2 treatment days before the end of FR and up to 30 calendar days after the end of FR.

Measures

Treadmill, Lift, and Carry (TLC) Battery

We used a standardized protocol for the TLC battery assessment. Before participants began, the tester asked the participant to assess their overall pain level. For the purpose of this analysis, we defined this starting overall pain level as the participant’s baseline pain intensity. For each of the tests administered, we instructed participants to notify the tester when pain intensity increased from baseline. We terminated a test if the participant’s pain intensity increased from baseline, if the participant felt unsafe, or if the tester deemed the participant unsafe. We explained all tests to participants in a clear and understandable way. We did not coach mechanics prior to the tests. The TLC battery was administered by adequately trained and certified physical therapists, physical therapist assistants, occupational therapists, or occupational therapist assistants.

Treadmill Test

In this test, we used an adapted form of the modified Naughton treadmill protocol. We adapted the protocol because of constraints of our treadmill: its incline went only to 15% and could be adjusted only in whole-number increments. Our protocol increased in intensity by an estimated average of one metabolic equivalent of task (MET) per 2-minute stage by changing speed, incline, or a combination of the two (Table 1), similar to the modified Naughton treadmill protocol. While the modified Naughton treadmill protocol and others like it are commonly used as stress test protocols for functional capacity (Harb et al., 2020), we are unaware of a treadmill protocol that is validated in a chronic pain population. With our protocol, we instructed the participant to ambulate, without holding on to the treadmill, until their pain intensity increased from their established baseline.

Table 1.

Treadmill Test Protocol

Time (minutes) Speed (MPH) Incline (%) METs MET increase
0–2 1.0 0 2.0
2–4 1.5 0 2.5 0.5
4–6 2.0 3 3.8 1.3
6–8 2.0 7 4.9 1.1
8–10 2.0 10 5.7 0.8
10–12 3.0 7 6.5 0.8
12–14 3.0 10 7.3 0.8
14–16 3.0 12 7.9 0.6
16–18 3.0 15 8.7 0.8
18–20 3.5 15 9.9 1.2
20–22 4.0 15 11.1 1.2
22–24 4.5 15 12.4 1.3
24–26 5.0 15 13.7 1.3
26–28 5.5 15 14.6 0.9
28–30 6.0 15 15.6 1.0
30–32 6.5 15 16.5 0.9
32–34 7.0 15 17.4 0.9
34–36 7.5 15 18.3 0.9
36–38 8.0 15 19.3 1.0
38–40 8.5 15 20.2 0.9
40–42 9.0 15 21.2 1.0
42–44 9.5 15 22.1 0.9
44–46 10.0 15 23.1 1.0

MET = metabolic equivalent of task; MPH = miles per hour.
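
As an illustration only, the stage table can be encoded as a lookup from stopping time to MET level. The sketch below assumes that the recorded MET value is that of the stage in progress when the test stopped; the scoring rule is not stated in the text, and the names STAGES, mets_at, and mean_met_increase are hypothetical.

```python
# Stages transcribed from Table 1:
# (start_min, end_min, speed_mph, incline_pct, mets)
STAGES = [
    (0, 2, 1.0, 0, 2.0), (2, 4, 1.5, 0, 2.5), (4, 6, 2.0, 3, 3.8),
    (6, 8, 2.0, 7, 4.9), (8, 10, 2.0, 10, 5.7), (10, 12, 3.0, 7, 6.5),
    (12, 14, 3.0, 10, 7.3), (14, 16, 3.0, 12, 7.9), (16, 18, 3.0, 15, 8.7),
    (18, 20, 3.5, 15, 9.9), (20, 22, 4.0, 15, 11.1), (22, 24, 4.5, 15, 12.4),
    (24, 26, 5.0, 15, 13.7), (26, 28, 5.5, 15, 14.6), (28, 30, 6.0, 15, 15.6),
    (30, 32, 6.5, 15, 16.5), (32, 34, 7.0, 15, 17.4), (34, 36, 7.5, 15, 18.3),
    (36, 38, 8.0, 15, 19.3), (38, 40, 8.5, 15, 20.2), (40, 42, 9.0, 15, 21.2),
    (42, 44, 9.5, 15, 22.1), (44, 46, 10.0, 15, 23.1),
]


def mets_at(minutes):
    """Return the MET level of the stage in progress at `minutes`.

    Assumption: a stop exactly on a stage boundary counts toward the
    stage that is just beginning.
    """
    for start, end, _speed, _incline, mets in STAGES:
        if start <= minutes < end:
            return mets
    raise ValueError("time is outside the 0-46 minute protocol")


def mean_met_increase():
    """Average MET step between consecutive stages."""
    return (STAGES[-1][4] - STAGES[0][4]) / (len(STAGES) - 1)


print(mets_at(2.2))                   # 2.5
print(round(mean_met_increase(), 2))  # 0.96
```

The average step of roughly 0.96 MET per stage is consistent with the stated design target of about one MET per 2-minute stage.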

Floor-to-Waist Lift

We instructed the participant to lift an empty crate (height-11 in., width-13 in., depth-13 in.) from the floor onto a shelf 29 inches high. If the participant was able to complete the lift without increased pain from their established baseline, then we instructed them to place the crate back on the floor and added a 5 or 10-lb plate to the crate. We then instructed the participant to repeat the lift. We repeated this process until the participant reported an increase of pain from the established baseline. We recorded the weight of the last completed lift. If the participant could not complete a single lift without a pain increase, we recorded zero.

Waist-to-Shoulder Lift

We instructed the participant to lift the same empty crate from a shelf 29 inches high to a shelf 45 inches high, then repeated with the same weight progression as described for the floor-to-waist lift test.

40-Foot Carry

For this test, we instructed the participant to lift the same empty crate from a shelf 29 inches high and then carry the crate against the body with elbows at approximately 90 degrees. We instructed the participant to carry the crate to a point 20 feet away, then turn around and return the crate to the same shelf. We then applied the same weight progression as for the lifting tests.

Psychometric Assessments of the TLC Battery

PROMIS Physical Function

The PROMIS Physical Function item bank contains a large pool of physical function items ranging from self-care to strenuous activities (Rose et al, 2014).

PROMIS Pain Interference

The PROMIS Pain Interference items specifically focus on interference of pain in daily activities involving physical, psychological, and social functioning (Amtmann et al, 2010).

Defense and Veterans Pain Rating Scale (DVPRS)

The DVPRS average pain intensity during the previous 7 days is rated on a 0–10 scale (0 = no pain, 10 = “as bad as it could be, nothing else matters”) with color, graphic, and verbal descriptors associated with each number (Buckenmaier et al, 2013; Cook, Buckenmaier, and Gershon, 2014).

NIH Research Task Force (Pain) Impact Score (PIS)

The NIH Task Force on Research Standards for Chronic Low Back Pain recommends a pain impact score (Deyo et al, 2014) that is a composite score of pain intensity, pain interference with usual activities, and functional status. These items have major prognostic and discriminatory importance (Deyo et al, 2016) and were calculated in this study from the DVPRS and the PROMIS Physical Function and Pain Interference scores. Total PIS scores range from 8 to 50.

Roland-Morris Disability Questionnaire (RMDQ)

The Roland-Morris Disability Questionnaire (RMDQ) is a 24-item self-report questionnaire about how low-back pain affects functional activities (Roland and Morris, 1983). The RMDQ is scored by adding up the number of items checked. Each question is worth 1 point, so scores can range from 0 (no disability) to 24 (severe disability).

Statistical Analyses

Validity and Reliability

Prior to evaluating validity and reliability, the distributions of the functional measures were assessed and found generally to be skewed to the right and leptokurtic. Therefore, non-parametric effect estimates and statistical tests were conducted. Responsiveness is a type of validity and shows the ability of a test to measure differences across time (Hays and Hadorn, 1992). Responsiveness of the TLC battery measures was established by assessing the mean/median change in scores pre- and post-intervention, testing the change in scores using Wilcoxon signed rank tests, and calculating the effect size (r = Z/√N). Test-retest reliability assesses the stability of an instrument across time. We assessed test-retest reliability using the correlation between the pre- and posttest values (Spearman’s correlations) and the intraclass correlation coefficients (ICC) from a one-way random effects model (ICC1,1) (Bruton, Conway, and Holgate, 2000). Though we expected some change in scores pre- to post-FR, we anticipated there would be a level of consistency for functional tests taken by the same person over time. Similar methods for assessing reliability over time for pain patients have been used previously (Deyo et al, 2016). Convergent validity of a measure is established when it is highly correlated with other measures that capture similar constructs (Heale and Twycross, 2015). Convergent validity was assessed by correlating the TLC battery measures with other established measures of physical function (PROMIS Physical Function and RMDQ), pain (PROMIS Pain Interference and DVPRS), or both (NIH-PIS). Spearman’s correlations were used for these tests.
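
The two less familiar quantities in this paragraph, the effect size r = Z/√N and the one-way random effects ICC(1,1), can be sketched in a few lines. This is a minimal illustration using the standard one-way ANOVA formula, not the software used in the study; the function names and the toy data are hypothetical.

```python
import math


def wilcoxon_effect_size(z, n):
    """Effect size r = Z / sqrt(N) for a Wilcoxon signed rank test."""
    return z / math.sqrt(n)


def icc_1_1(rows):
    """ICC(1,1) from a one-way random effects model.

    `rows` holds one tuple per participant with k repeated scores
    (here k = 2: pre-FR and post-FR).
    """
    n = len(rows)
    k = len(rows[0])
    grand_mean = sum(sum(r) for r in rows) / (n * k)
    subject_means = [sum(r) / k for r in rows]
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum((x - m) ** 2
                    for r, m in zip(rows, subject_means) for x in r)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy data (hypothetical): every participant improves by exactly 1 unit.
shifted = [(1, 2), (2, 3), (3, 4)]
print(round(icc_1_1(shifted), 3))     # 0.6

# Hypothetical Z and N, chosen only to show the arithmetic.
print(wilcoxon_effect_size(6.0, 144))  # 0.5
```

In the toy data the rank order of participants is perfectly preserved, yet ICC(1,1) is only 0.6 because a uniform improvement counts as within-person variation in a one-way model. This is one reason an ICC computed across an intervention can be lower than the corresponding rank correlation.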

Minimal Clinically Important Differences

A multi-approach method was used to assess the MCIDs of the tests in the TLC battery. This approach included distributional and anchor-based methods (Revicki, Hays, Cella, and Sloan, 2008). The distribution methods consisted of calculating half of the baseline standard deviation and the standard error of measurement (SEM = baseline SD * sqrt(1 − correlation between pre- and posttest scores)). In addition, receiver operator curve (ROC) analyses were run for each functional performance test using a selection of physical function or pain anchor measures, including the NIH-PIS, DVPRS, PROMIS Pain Interference and Physical Function scales, and RMDQ.
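
The two distribution-based estimates are simple enough to write out directly. The sketch below uses hypothetical function names and, as inputs, the rounded treadmill-time values reported in Table 5 (baseline SD 5.4, pre-post Spearman correlation 0.5), so its SEM only approximates the tabulated figure, which was presumably computed from unrounded values.

```python
import math


def half_sd(baseline_sd):
    """Half of the baseline standard deviation."""
    return 0.5 * baseline_sd


def standard_error_of_measurement(baseline_sd, pre_post_r):
    """SEM = baseline SD * sqrt(1 - pre/post correlation)."""
    return baseline_sd * math.sqrt(1.0 - pre_post_r)


# Treadmill time, rounded inputs from Table 5.
print(half_sd(5.4))                                       # 2.7
print(round(standard_error_of_measurement(5.4, 0.5), 1))  # 3.8
```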

MCIDs for the anchor measures were established from previously reported literature: an improvement of 2 points for the PROMIS measures and 3 for the NIH-PIS score (Deyo et al, 2016); and a 30% reduction in RMDQ (Jordan, Dunn, Lewis, and Croft, 2006). Though the MCID for the DVPRS has not been directly assessed (Hassett, Whibley, Kratz, and Williams, 2020) the MCID for comparable 0 to 10 pain scales has been found to be 30% or a 2-point reduction (Farrar et al, 2001). We found all the anchor measures met at minimum a ± 0.3 correlation with each of the functional performance tests (Revicki, Hays, Cella, and Sloan, 2008).
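
For the anchor-based analyses, each participant must first be labeled as improved or not improved on the anchor. A minimal sketch of that labeling step follows, using the thresholds stated above and the 2-point criterion for pain intensity shown in Table 2. The anchor keys and the function name are hypothetical, and treating a baseline RMDQ of zero as not improved is an assumption.

```python
def is_responder(anchor, pre, post):
    """Return True if the pre-to-post change meets the anchor's MCID."""
    if anchor == "promis_physical_function":
        return post - pre >= 2              # higher scores are better
    if anchor in ("promis_pain_interference", "dvprs"):
        return pre - post >= 2              # lower scores are better
    if anchor == "nih_pis":
        return pre - post >= 3
    if anchor == "rmdq":
        # 30% reduction; undefined for a baseline of zero (assumption:
        # such a participant is counted as not improved).
        return pre > 0 and (pre - post) / pre >= 0.30
    raise ValueError("unknown anchor: " + anchor)


print(is_responder("rmdq", 10, 6))     # True (40% reduction)
print(is_responder("nih_pis", 30, 28)) # False (only a 2-point decrease)
```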

We used the CUTPT STATA program to conduct the ROC analyses (Clayton, 2013). Three options for determining the MCID are available within the program. We present the results from all three methods of assessing MCID in Table 2. Results from the ROC analyses were included in the table only if the area under the curve met or exceeded 0.6 and the significance value of the test was p < 0.05. We used the results from all distributions and anchor-based determined cut-points to make an overall recommendation for the MCID.
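
To make the three cut-point criteria concrete, the sketch below computes them on toy data. It is not a reimplementation of the CUTPT program; it assumes the usual definitions (Liu: maximize sensitivity × specificity; Youden: maximize sensitivity + specificity − 1; nearest: minimize the distance to the top-left corner of the ROC plot), and the function name and data are hypothetical.

```python
def roc_cutpoints(changes, improved):
    """Return the Liu, Youden, and nearest-to-(0,1) cut-points.

    `changes`  : change scores on a functional performance test
    `improved` : booleans, True if the participant met the anchor MCID
    A participant is classified positive when change >= cut-point.
    """
    pos = [c for c, y in zip(changes, improved) if y]
    neg = [c for c, y in zip(changes, improved) if not y]
    if not pos or not neg:
        raise ValueError("need both improved and not-improved cases")

    rows = []
    for cut in sorted(set(changes)):
        sens = sum(c >= cut for c in pos) / len(pos)
        spec = sum(c < cut for c in neg) / len(neg)
        rows.append((cut, sens, spec))

    liu = max(rows, key=lambda r: r[1] * r[2])[0]
    youden = max(rows, key=lambda r: r[1] + r[2] - 1)[0]
    nearest = min(rows, key=lambda r: (1 - r[1]) ** 2 + (1 - r[2]) ** 2)[0]
    return {"liu": liu, "youden": youden, "nearest": nearest}


# Toy data (hypothetical): improved participants all changed by >= 5.
changes = [0, 1, 2, 5, 6, 7]
improved = [False, False, False, True, True, True]
print(roc_cutpoints(changes, improved))
# {'liu': 5, 'youden': 5, 'nearest': 5}
```

With perfectly separated groups the three criteria agree. On real data they can diverge, which is why reporting all three alongside the distribution-based estimates gives a range rather than a single value.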

Table 2.

Minimal Clinically Important Difference Calculations for Functional Performance Tests

Change in outcome Method Treadmill time METs Floor-to-waist lift Waist-to-shoulder lift 40-foot carry Functional t-score
Distribution-based ½ SD 2.7 1.2 14.1 9.6 10.0 5.1
SEM 3.9 1.7 16.4 11.9 12.1 5.7
Pain impact score
(decrease of 3 or more)
Liu 1.0 15.0 5.0 7.0 4.3
Youden 7.0 4.3
Nearest 1.0 7.0 5.9
Pain Intensity
(decrease of 2 points)
Liu 3.9 2.3 15.0 8.6
Youden 3.3 2.3 15.0 8.6
Nearest 3.9 1.4 15.0 7.0 8.6
Pain Interference
(decrease of 2 points)
Liu 3.6 1.3 15.0 10.0 6.4
Youden 2.6 1.0 15.0 6.4
Nearest 15.0 10.0 6.4
Physical Function
(increase of 2 points)
Liu 3.6 1.0 15.0 5.0 7.0 6.5
Youden 2.4 1.0 15.0 7.0 5.4
Nearest 15.0 5.0 7.0 6.5
RMDQ
(decrease of 30%)
Liu 3.3 1.8 15.0 10.0 5.0 6.0
Youden 3.3 1.8 15.0 6.0
Nearest 3.3 1.8 15.0 10.0 6.0
MCID recommendation 3 minute increase 1.5 unit increase 15 lb increase 10 lb increase 10 lb increase 6 unit increase

MCID = Minimal Clinically Important Difference. RMDQ = Roland-Morris Disability Questionnaire. ROC cut-points included only if AUC ≥ 0.60 and p < 0.05. SD = standard deviation, where ½ SD = ½ standard deviation of baseline scores. SEM = standard error of measurement, where SEM = baseline SD * sqrt (1-correlation pre- and posttest scores). lb = pound. MET = metabolic equivalent of task. Treadmill time, METs, and functional t-score were adjusted.

Composite Functional Performance Measure

We created a composite measure to summarize the overall functional ability of the participant based on their individual functional performance test results, including the METs treadmill score, floor-to-waist lift test, waist-to-shoulder lift test, and 40-foot-carry test. To create a composite score, the strength of the correlations between the 4 functional measures was assessed. Then a principal component analysis (PCA) was performed on the pre- and post-FR functional measure variables using the STATA 17 PCA program. Eigenvalues and the proportion of variation explained by each component were reviewed (Table 3).

Table 3.

Principal Component Analysis of Functional Performance Tests for Creation of a Composite Score

Components
1 2 3 4
Eigenvector coefficients
 METs treadmill score 0.42 0.90 0.10 0.05
 Floor-to-waist lift test 0.53 −0.15 −0.84 0.01
 Waist-to-shoulder lift test 0.52 −0.32 0.40 0.69
 40-foot-carry test 0.53 −0.24 0.37 −0.73
Eigenvalue 3.14 0.56 0.17 0.13
% variance explained 0.78 0.14 0.04 0.03

Note: Baseline n = 193. Post-FR n = 137.

n = number of participants. FR = functional restoration. MET = metabolic equivalent of task. % = percent.

To create the single composite score, each functional performance test was first converted to a z-score using the baseline means and standard deviations from the parent study as the reference population given that a reference dataset for the functional performance tests being assessed was not available. We summed the z-scores to create a total functional score. Lastly, we converted the total score to a t-score where the mean of the reference population is 50 and standard deviation is 10. The details of composite score calculation are shown in Table 4.

Table 4.

Composite Functional Score Calculations

Z-score METs = (METs score − Reference mean (7.3)) / Reference SD (2.4)
Z-score floor-to-waist lift = (Floor-to-waist lift score − Reference mean (44.1)) / Reference SD (27.3)
Z-score waist-to-shoulder lift = (Waist-to-shoulder lift score − Reference mean (39.5)) / Reference SD (19.1)
Z-score 40-foot-carry = (40-foot-carry score − Reference mean (37.8)) / Reference SD (19.5)
Total functional score = Z-score METs + Z-score floor-to-waist + Z-score waist-to-shoulder + Z-score 40-foot-carry
Total functional t-score = 50 + 10 * (Total functional score / Reference SD (3.4)) (Reference mean of 50 and SD of 10)

MET = metabolic equivalent of task. SD = standard deviation; Reference values are values of the functional performance tests from the IMPPPORT study pain-clinic participants at pre-intervention. Functional score was converted to a t-score where the mean = 50 and SD = 10 for the reference population.
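
As a worked illustration, the calculation in Table 4 can be expressed as a short function, reading each z-score row as (score − reference mean) / reference SD. The reference values are those given in Table 4; the dictionary keys and the function name are hypothetical.

```python
# Reference (mean, SD) per test, from Table 4.
REFERENCE = {
    "mets": (7.3, 2.4),
    "floor_to_waist": (44.1, 27.3),
    "waist_to_shoulder": (39.5, 19.1),
    "carry_40ft": (37.8, 19.5),
}
TOTAL_SCORE_SD = 3.4  # reference SD of the summed z-scores (Table 4)


def composite_t_score(scores):
    """Convert the four test scores to the composite functional t-score."""
    total_z = sum((scores[name] - mean) / sd
                  for name, (mean, sd) in REFERENCE.items())
    return 50.0 + 10.0 * total_z / TOTAL_SCORE_SD


# A participant scoring exactly at the reference means gets 50.
at_reference = {name: mean for name, (mean, _sd) in REFERENCE.items()}
print(composite_t_score(at_reference))  # 50.0

# Post-FR group means from Table 5, used here as one set of inputs.
post_means = {"mets": 8.9, "floor_to_waist": 64.4,
              "waist_to_shoulder": 53.1, "carry_40ft": 49.5}
print(round(composite_t_score(post_means), 1))  # 58.0
```

Because the transformation is linear, feeding in the post-FR group means gives approximately the post-FR mean t-score reported in Table 5 (57.9); the small difference reflects rounding and slightly different sample sizes per test.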

RESULTS

Participant Characteristics

The TLC battery was administered to 194 patients at baseline and 140 patients post-FR, resulting in 140 participants with completed measures at both time points. The majority of these participants were males (n=118/140; 84.3%), younger (<40 years old: n=117/140; 84%), had post-high-school education (n=103/137; 75%), were married (n=102/135; 76%), in the Army, excluding Reserve and Guard (n=112/135; 83%), and were rank E5 or higher (86/136; 63%).

Validity, Reliability, Composite Score, and Minimal Clinically Important Differences

The PCA conducted to support creation of a composite score found that a single component explained 78% of the variation in the four individual functional performance tests, and that adding a second component (e.g. looking at the treadmill test results separately from the lift tests) would explain only an additional 14% of the variation. The eigenvalue was 3.14 for the first component and 0.56 for the second component (Table 3); the second component therefore fell well below the traditional eigenvalue cutoff of 1.
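
The link between the eigenvalues and the percentages of variation is simple arithmetic: each eigenvalue is divided by the sum of all eigenvalues, which equals the number of variables (four) when the analysis is run on standardized measures. The sketch below assumes that setup and uses the rounded eigenvalues from Table 3; the function names are hypothetical.

```python
EIGENVALUES = [3.14, 0.56, 0.17, 0.13]  # rounded values from Table 3


def variance_explained(eigenvalues):
    """Proportion of total variance carried by each component."""
    total = sum(eigenvalues)
    return [e / total for e in eigenvalues]


def components_above_cutoff(eigenvalues, cutoff=1.0):
    """Number of components retained under an eigenvalue > cutoff rule."""
    return sum(e > cutoff for e in eigenvalues)


for share in variance_explained(EIGENVALUES):
    print(f"{share:.3f}")
print(components_above_cutoff(EIGENVALUES))  # 1
```

The first share comes out near 0.785 and the second at 0.14, in line with the 78% and 14% reported, and only one component exceeds the cutoff of 1.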

Average values for the functional measures pre- and post-intervention are shown in Table 5. The effect sizes for the change in scores were medium to large (0.4 to 0.5). The pre- and posttest functional scores were positively correlated (r = 0.50 to 0.60), and the ICC1,1 values ranged from 0.30 to 0.50, which demonstrated that 30%–50% of the variation in functional performance test scores could be attributed to variation between individuals.

Table 5.

Validity and Reliability of Functional Performance Tests and Composite Measure

Responsiveness validity Test-retest reliability
Baseline Post-functional restoration Change score Effect size r ICC1,1
n Mean Median SD Min Max Mean Median SD Min Max Mean Median SD Min Max
Treadmill time 139 12.6 11.0 5.4 2.2 27.7 16.2 16.7 4.9 4.6 27.3 3.6 3.4 5.2 −17.5 16.8 0.4 0.5 0.3
METs 139 7.4 6.5 2.4 2.5 14.6 8.9 8.7 2.3 3.8 14.6 1.5 1.4 2.3 −8.6 8.1 0.4 0.5 0.4
Floor-to-waist lift 139 44.5 35.0 28.2 5.0 160.0 64.4 60.0 36.8 5.0 285.0 19.8 15.0 28.8 −80.0 125.0 0.4 0.5 0.5
Waist-to-shoulder lift 139 39.6 35.0 19.2 5.0 110.0 53.1 50.0 21.2 10.0 110.0 13.5 13.0 18.0 −55.0 70.0 0.5 0.5 0.4
40-foot carry 138 37.2 30.0 20.0 15.0 145.0 49.5 45.0 21.0 10.0 115.0 12.3 10.0 18.2 −40.0 65.0 0.4 0.5 0.5
Functional t-score 137 50.0 46.6 10.1 37.7 89.8 57.9 57.1 12.0 34.1 105.6 7.9 7.6 9.1 −28.2 30.6 0.5 0.6 0.5

Change score = change in score from pre- to post-functional restoration program. Wilcoxon signed rank tests for pre-post change all p < 0.001; Effect size = z-score from Wilcoxon test/square root (number of observations). Effect size guide: small (0.1), medium (0.3), and large (0.5); n = number of participants; SD = standard deviation; Min = Minimum; Max = Maximum; r = Spearman correlation between pre- and posttest scores. All correlations p < 0.001; ICC = intraclass correlation coefficient; MET = metabolic equivalent of task.

Correlations between the functional performance tests, the composite functional score, and other established physical function and pain measures are shown in Table 6. The individual functional performance measures and composite functional score had significant weak negative correlations with survey measures for pain and disability (r = −0.3 to −0.4) and significant weak positive correlations with the physical function measure (r = 0.3 to 0.4). Strong positive correlations were observed among the weight-based functional measures (r = 0.9). The treadmill test measures were moderately positively correlated with the weight-based measures (r = 0.5 to 0.6). Strong positive correlations were also observed between the composite functional measure and the individual functional performance tests (r = 0.8 to 0.9).

Table 6.

Correlations for Functional and Pain Measures as Measured at Baseline or Post-Functional Restoration

Change in Outcome Pain Impact Score Pain Intensity Pain Interference Physical Function RMDQ Treadmill time METs Floor-to-waist lift Waist-to-shoulder lift 40-foot carry
n r n r n r n r n r n r n r n r n r n r
Pain Intensity 300 0.7
Pain Interference 300 0.9 300 0.6
Physical Function 300 -0.8 300 -0.5 304 -0.6
RMDQ 181 0.7 181 0.5 183 0.6 183 -0.6
Treadmill time 273 -0.4 274 -0.3 277 -0.3 277 0.4 206 -0.3
METs 273 -0.4 274 -0.3 277 -0.3 277 0.4 206 -0.3 333 1.0
Floor-to-waist lift 273 -0.4 274 -0.3 277 -0.3 277 0.4 205 -0.4 331 0.6 331 0.6
Waist-to-shoulder lift 273 -0.4 274 -0.3 277 -0.4 277 0.4 205 -0.3 331 0.5 331 0.5 332 0.9
40-foot carry 272 -0.4 273 -0.3 276 -0.3 276 0.4 205 -0.4 330 0.6 330 0.6 331 0.9 331 0.9
Functional t-score 271 -0.5 272 -0.4 275 -0.4 275 0.5 204 -0.4 330 0.8 330 0.8 330 0.9 330 0.9 330 0.9

Functional t-score = composite score for all functional tests. r = Spearman’s correlation coefficient (≥ 0.7 = large, 0.5–0.7 = medium, 0.3–0.4 = small). n = number of participants. All p < 0.01. RMDQ = Roland-Morris Disability Questionnaire. MET = metabolic equivalent of task.

The MCIDs for treadmill time ranged from 2.4 to 3.9 minutes, and we recommend an increase of 3 minutes in treadmill time as a standard (Table 2). The MCIDs for METs scores ranged from 1.0 to 2.3 units, and we recommend an increase of 1.5 units as a standard. The MCIDs for the floor-to-waist lift test ranged from 14.4 to 15 lbs, and we recommend an increase of 15 lbs as a standard. The MCIDs for the waist-to-shoulder lift test ranged from 5 to 11.9 lbs, and we recommend an increase of 10 lbs as a standard. The MCIDs for the 40-foot carry test ranged from 5 to 12.1 lbs, and we recommend an increase of 10 lbs as a standard. The MCIDs for the composite functional score ranged from 4.3 to 8.6 units, and we recommend an increase of 6 units as a standard.

DISCUSSION

This study used secondary data from the parent study to examine the responsiveness, validity, and MCIDs of the TLC battery. Our study findings demonstrate that a brief functional assessment battery can be reliably used to measure functional improvement following FR. The individual functional performance tests and the composite functional scores were responsive to change from pre- to post-FR. There was modest convergent validity between the four functional performance tests and the self-reported physical function and pain questionnaires, demonstrated by small to moderate correlations in the expected direction. This suggests that both types of assessments should be collected, as the TLC battery and self-report questionnaires may provide different information on physical function. The moderate correlations between the pre- and posttest functional scores demonstrated a consistency in functional performance tests completed by the same participant, where higher pre-intervention scores were associated with post-intervention scores at the higher end of the range. ICC1,1 values demonstrated that 30%–50% of the variation in functional performance test scores could be attributed to variation between individuals, again showing some consistency in repeat testing of the same individual despite an intervention occurring between the first and second test. The composite functional measure was moderately correlated, in the expected direction, with the reference self-reported physical function and pain questionnaires and highly correlated with the individual functional performance tests of the TLC battery.

Administration of the TLC Battery

The way in which the TLC is administered is of utmost importance. For each test administered in this study, we instructed the participant to notify the tester when pain intensity increased from baseline. The objective is to determine how much physical stress we can apply before the participant’s pain intensity increases. This practice is different from the often used assessment procedure where the patient is asked to continue until fatigued or experiencing shortness of breath (Jain, Yardi, and Rai, 2016). The goal of this approach is to teach or reinforce the concept of activity pacing. For participants who tend to avoid activity as a coping strategy for pain this approach communicates that the human body needs to move. For participants who tend to persist in an activity despite increasing pain this approach communicates that movement does not need to exacerbate pain.

It is also important to understand that the TLC battery is not meant to be a strength test or an endurance test. Rather, we administered the TLC battery to determine when pain becomes a limiting factor for physical function. For each of the tests, participants notified the tester when pain intensity increased from baseline. Conceptually, we liken this to a patient notifying the examiner when pain intensity increases during a straight leg raise test (SLRT). During an SLRT, if a patient reports discomfort at 60 degrees of movement but denies radicular symptoms, clinicians conclude that the patient is more likely limited by myofascial tension than neural tension. Similarly, during the TLC battery, if participants report fatigue onset before their pain changes, clinicians may conclude that the participant is more likely limited by strength or endurance than by pain.

Clinical Value of the TLC Battery

Considering the reported mismatch between how individuals believe they function and how they actually function as observed by others (Strand, Moe-Nilssen, and Ljunggren, 2002), the clinical utility of the TLC battery is significant. The TLC battery could serve as a valuable tool that can supplement subjective self-report measures and thus enable a comprehensive assessment of change in physical function (Strand, Moe-Nilssen, and Ljunggren, 2002).

Specifically, when the TLC battery is used in conjunction with self-report functional measures, four combinations of findings could be observed, each with clinical value: 1) functional improvement on both the TLC battery and self-report measures, which would indicate both perceived and demonstrated benefits from treatment; 2) improvement on self-report but not on the TLC battery, which would possibly indicate that the patient has learned the concepts of pain stabilization and pain management through activity pacing (essentially, the patient has learned to listen to their body better, does not consistently push through a pain increase to accomplish a task, and therefore subjectively endorses improvement); 3) improvement on the TLC battery but not on self-report measures, which is of clinical value because a clinician can demonstrate to a potentially discouraged patient that their effort is making a difference, and which may also be valuable in creating cognitive dissonance for a patient who holds to the idea that pain has to be debilitating; and 4) no improvement on either the TLC battery or self-report measures, which may be valuable as an indicator to redirect treatment. Thus, with the use of the TLC battery, assessment of patient outcomes becomes more meaningful and clinically valuable.

Comparison of Study Findings

Our analysis of construct validity found weak correlations (–0.30 ≤ r ≤ 0.40) between TLC battery scores and self-reported measures of physical function and pain, and all correlations were in the expected direction, with better self-reported measures associated with better functional performance. Gouttebarge et al. (2009) reported somewhat weaker correlations (–0.29 ≤ r ≤ 0.05) between five Ergo-Kit functional capacity evaluation lifting tests and the Von Korff pain questionnaire. Weak correlations between clinician-measured and patient-reported measures of function and pain might be interpreted as evidence of poor construct validity of functional performance tests. However, our interpretation is that neither self-reported nor clinician-measured functional assessments tell the whole story, and both should be collected whenever feasible to provide a more comprehensive assessment of outcomes. Patients do not always recognize a positive response to treatment, and clinician measurements of functional improvement can encourage patients to continue to engage in treatment.

We compared our use of anchor-based and distribution-based methods for estimating the MCID with similar use in previous research. Benaim et al. (2019) estimated the MCID of six commonly used performance tests, using the anchor-based method as a reference, supplemented by distribution-based and opinion-based approaches. They found that the anchor-based method could not provide valid estimations for three of the six scales used and relied on distribution-based and opinion-based methods to provide rough values of MCIDs for those scales. In our study, both anchor-based and distribution-based methods provided valid estimations of MCIDs. Thus, our findings support the value of the TLC battery in assessing function and represent an incremental improvement over previous work in this field.
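To make the two families of MCID methods concrete, the following sketch shows common textbook versions of each. It is an illustration under stated assumptions, not the procedure used in this study: the half-standard-deviation and standard-error-of-measurement formulas are widely used distribution-based conventions, the Youden index is one common way to pick a cutpoint from a receiver operating characteristic analysis, and all function names and data are hypothetical.

```python
from math import sqrt
from statistics import stdev


def half_sd_mcid(baseline_scores):
    """Distribution-based estimate: half the baseline standard deviation."""
    return 0.5 * stdev(baseline_scores)


def sem_mcid(baseline_scores, reliability):
    """Distribution-based estimate: standard error of measurement,
    SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return stdev(baseline_scores) * sqrt(1.0 - reliability)


def youden_cutpoint(changes, improved):
    """Anchor-based estimate: the change score that maximizes
    sensitivity + specificity - 1 against a yes/no anchor of improvement."""
    n_pos = sum(1 for y in improved if y)
    n_neg = len(improved) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("anchor must contain both improved and not-improved cases")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(changes)):
        true_pos = sum(1 for c, y in zip(changes, improved) if y and c >= cut)
        true_neg = sum(1 for c, y in zip(changes, improved) if not y and c < cut)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical floor-to-waist lift data (lbs) and a hypothetical reliability.
baseline_lift = [40, 60, 80, 50, 70]
print(half_sd_mcid(baseline_lift), sem_mcid(baseline_lift, reliability=0.75))
```

Comparing the estimates from the two approaches, as both Benaim et al. (2019) and the present study did, gives a sense of how sensitive the MCID is to the method chosen.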

Regarding the creation of a composite score, Tiftikçioğlu (2018) used procedures similar to ours to create a Multiple Sclerosis Functional Composite from three different physical function measures. Because the scores were on different scales, they were converted to z-scores, and then the transformed scores were summed. The reference population chosen when calculating z-scores can affect the interpretation of the composite score. A reference value can be the mean value (baseline or pooled) of the test from the study cohort or the mean value from an external reference dataset (of a disease-specific or general population). Tiftikçioğlu (2018) used an external database to calculate reference values, which is preferred if available. A reference database with scores for our functional performance tests was not available, so our study used baseline values of patients from the parent randomized controlled trial.
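The composite procedure described above (standardize each test against a reference sample, then sum the z-scores) can be sketched as follows. This is a minimal illustration rather than the study's code: the data and function names are hypothetical, and the final rescaling assumes that the summed z-scores are re-standardized against the reference sample and mapped to a T-score metric with a mean of 50 and a standard deviation of 10, a detail the text does not specify.

```python
from statistics import mean, stdev


def z_scores(values, ref_values):
    """Standardize values against the mean and SD of a reference sample."""
    ref_mean = mean(ref_values)
    ref_sd = stdev(ref_values)
    return [(v - ref_mean) / ref_sd for v in values]


def composite_t_scores(tests, reference):
    """Sum per-test z-scores, then rescale the sums to a T-score metric.

    tests, reference: dicts mapping test name -> list of scores,
    one entry per participant, in the same participant order.
    Assumption: T = 50 + 10 * (sum - reference mean sum) / reference SD of sums.
    """
    names = list(tests)
    z_now = [z_scores(tests[n], reference[n]) for n in names]
    z_ref = [z_scores(reference[n], reference[n]) for n in names]
    sums_now = [sum(per_person) for per_person in zip(*z_now)]
    sums_ref = [sum(per_person) for per_person in zip(*z_ref)]
    ref_mean, ref_sd = mean(sums_ref), stdev(sums_ref)
    return [50 + 10 * (s - ref_mean) / ref_sd for s in sums_now]


# Hypothetical baseline scores for five participants.
baseline = {
    "mets": [6.0, 8.0, 10.0, 7.0, 9.0],
    "floor_to_waist_lb": [40, 60, 80, 50, 70],
    "waist_to_shoulder_lb": [30, 40, 50, 35, 45],
    "carry_lb": [40, 50, 70, 45, 60],
}
baseline_t = composite_t_scores(baseline, baseline)
print([round(t, 1) for t in baseline_t])
```

Passing post-treatment scores as `tests` while keeping baseline scores as `reference` would express follow-up performance relative to the baseline cohort, which mirrors the choice of reference population discussed above.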

Finally, our study findings need to be interpreted in the context of chronic pain in a military population compared to the general population. While the prevalence of chronic pain among the general population is 20% (Dahlhamer et al., 2018), its prevalence among the military population is as high as 31–44% (Reif et al., 2018; Toblin et al., 2014). Active duty service members are likely to be younger, fully employed, more physically conditioned, and less likely to have other chronic medical conditions than the patient population in most pain management practices. Despite the differences between military and general populations, given its brevity and modest equipment requirements, the TLC battery may be useful for assessing change in function following rehabilitative pain care in nonmilitary populations.

Study Limitations

The study was conducted in a single military center, which may limit the generalizability of the findings to service members cared for at other military centers as well as to nonmilitary populations. Further validation studies of the TLC battery outside military populations may enhance its generalizability. This study was a secondary analysis of data from the parent study and was limited in that the parent study was not designed to assess the validity and reliability of the TLC battery. Additionally, data regarding the different musculoskeletal injuries were not collected as part of this study. Thus, subcategory analyses could not be conducted.

CONCLUSIONS

This secondary data analysis of the psychometric properties of the TLC battery found preliminary support for the validity and reliability of these tests. Additionally, use of the single composite from the battery holds promise to help providers assess clinically meaningful change in physical function. Further research into the psychometric properties of the TLC battery and its composite measure is warranted.

Figure 1.


Scree plot of eigenvalues in Principal Component Analysis of Functional Performance Tests

PCA = Principal Component Analysis; CI = Confidence Interval. Composite score calculation: sum the z-scores for METs and the 3 lift tests, then convert the sum to a t-score.

ACKNOWLEDGEMENTS

This work was supported by grants from the U.S. Army Medical Research and Materiel Command under grant number W81XWH-14-DMRDP-CRI-IRA-MTI and the National Institutes of Health under grant number K24NR015340 and grant number K24AT011995. Tandem Editing LLC provided professional editing support.

Footnotes

Disclosure

All authors report no conflict of interest.

REFERENCES

  1. Amtmann D, Cook KF, Jensen MP, Chen WH, Choi S, Revicki D, Cella D, Rothrock N, Keefe F, Callahan L et al. 2010. Development of a PROMIS item bank to measure pain interference. Pain 150: 173–182.
  2. Benaim C, Blaser S, Léger B, Vuistiner P, Luthi F 2019. “Minimal clinically important difference” estimates of 6 commonly-used performance tests in patients with chronic musculoskeletal pain completing a work-related multidisciplinary rehabilitation program. BMC Musculoskeletal Disorders 20: 16.
  3. Bruton A, Conway JH, Holgate ST 2000. Reliability: What is it, and how is it measured? Physiotherapy 86: 94–99.
  4. Buckenmaier CC, Galloway KT, Polomano R, McDuffie M, Kwon N, Gallagher RM 2013. Preliminary validation of the Defense and Veterans Pain Rating Scale (DVPRS) in a military population. Pain Medicine 14: 110–123.
  5. Clayton P 2013. CUTPT: Stata Module for Empirical Estimation of Cutpoint for a Diagnostic Test. Statistical Software Components. https://EconPapers.repec.org/RePEc:boc:bocode:s457719.
  6. Cohen SP, Vase L, Hooten WM 2021. Chronic pain: An update on burden, best practices, and new advances. Lancet 397: 2082–2097.
  7. Cook KF, Buckenmaier C, Gershon RC 2014. PASTOR/PROMIS (R) pain outcomes system: What does it mean to pain specialists? Pain Management 4: 277–283.
  8. Dahlhamer J, Lucas J, Zelaya C, Nahin R, Mackey S, DeBar L, Kerns R, Von Korff M, Porter L, Helmick C 2018. Prevalence of chronic pain and high-impact chronic pain among adults - United States, 2016. Morbidity and Mortality Weekly Report 67: 1001–1006.
  9. Deyo RA, Dworkin SF, Amtmann D, Andersson G, Borenstein D, Carragee E, Carrino J, Chou R, Cook K, Delitto A et al. 2014. Report of the NIH Task Force on research standards for chronic low back pain. Journal of Pain 15: 569–585.
  10. Deyo RA, Katrina R, Buckley DI, Michaels LA, Kobus A, Eckstrom E, Forro V, Morris C 2016. Performance of a Patient Reported Outcomes Measurement Information System (PROMIS) short form in older adults with chronic musculoskeletal pain. Pain Medicine 17: 314–324.
  11. Farrar JT, Young JP, LaMoreaux L, Werth JL, Poole RM 2001. Clinical importance of changes in chronic pain intensity measured on an 11-point numerical pain rating scale. Pain 94: 149–158.
  12. Flynn DM, McQuinn H, Fairchok A, Eaton LH, Langford DJ, Snow T, Doorenbos AZ 2018. Enhancing the success of functional restoration using complementary and integrative therapies: Protocol and challenges of a comparative effectiveness study in active duty service members with chronic pain. Contemporary Clinical Trials Communications 13: 100311.
  13. Flynn DM, McQuinn H, Burke L, Steffen AD, Fairchok A, Snow T, Doorenbos AZ 2022. Use of complementary and integrative health therapies before intensive functional restoration in active duty service members with chronic pain. Pain Medicine 23: 844–856.
  14. Galan-Martin MA, Montero-Cuadrado F, Lluch-Girbes E, Coca-López MC, Mayo-Iscar A, Cuesta-Vargas A 2020. Pain neuroscience education and physical therapeutic exercise for patients with chronic spinal pain in Spanish physiotherapy primary care: A pragmatic randomized controlled trial. Journal of Clinical Medicine 9: 1201.
  15. Gatchel RJ, Mayer TG 2008. Psychological evaluation of the spine patient. Journal of the American Academy of Orthopaedic Surgeons 16: 107–112.
  16. Gouttebarge V, Wind H, Kuijer PP, Sluiter JK, Frings-Dresen MH 2009. Construct validity of functional capacity evaluation lifting tests in construction workers on sick leave as a result of musculoskeletal disorders. Archives of Physical Medicine and Rehabilitation 90: 302–308.
  17. Harb SC, Bhat P, Cremer PC, Wu Y, Cremer LJ, Berger S, Cho L, Menon V, Gulati M, Jaber WA 2020. Prognostic value of functional capacity in different exercise protocols. Journal of the American Heart Association 9: e015986.
  18. Hassett AL, Whibley D, Kratz A, Williams DA 2020. Measures for the assessment of pain in adults. Arthritis Care and Research 72: 342–357.
  19. Hays RD, Hadorn D 1992. Responsiveness to change: An aspect of validity, not a separate dimension. Quality of Life Research 1: 73–75.
  20. Heale R, Twycross A 2015. Validity and reliability in quantitative studies. Evidence-Based Nursing 18: 66–67.
  21. Jain M, Yardi S, Rai R 2016. Comparative analysis of Bruce, Balke and Naughton treadmill protocols in normal subjects. Indian Journal of Physiotherapy and Occupational Therapy 10: 112.
  22. Jordan K, Dunn KM, Lewis M, Croft P 2006. A minimal clinically important difference was derived for the Roland-Morris Disability Questionnaire for low back pain. Journal of Clinical Epidemiology 59: 45–52.
  23. Knox J, Orchowski J, Scher DL, Owens BD, Burks R, Belmont PJ 2011. The incidence of low back pain in active duty United States military service members. Spine 36: 1492–1500.
  24. Mayer TG, Gatchel RJ 1988. Functional Restoration for Spinal Disorders: The Sports Medicine Approach. Philadelphia: Lea and Febiger.
  25. McGeary CA, McGeary DD 2014. New trends of musculoskeletal disorders in the military. In: Handbook of Musculoskeletal Pain and Disability Disorders in the Workplace, pp. 143–158. Springer.
  26. National Institutes of Health 2012. Patient-Reported Outcomes Measurement Information System (PROMIS). Bethesda, MD.
  27. Reif S, Adams RS, Ritter GA, Williams TV, Larson MJ 2018. Prevalence of pain diagnoses and burden of pain among active duty soldiers, FY2012. Military Medicine 183: e330–e337.
  28. Revicki D, Hays RD, Cella D, Sloan J 2008. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. Journal of Clinical Epidemiology 61: 102–109.
  29. Roland M, Morris R 1983. A study of the natural history of low-back pain. Part II: Development of guidelines for trials of treatment in primary care. Spine 8: 145–150.
  30. Rose M, Bjorner JB, Gandek B, Bruce B, Fries JF, Ware JE 2014. The PROMIS physical function item bank was calibrated to a standardized metric and shown to improve measurement efficiency. Journal of Clinical Epidemiology 67: 516–526.
  31. Smith HJ, Taubman SB, Clark LL 2020. A burden and prevalence analysis of chronic pain by distinct case definitions among Active Duty US military service members, 2018. Pain Physician 23: E429–E440.
  32. Strand LI, Moe-Nilssen R, Ljunggren AE 2002. Back Performance Scale for the assessment of mobility-related activities in people with back pain. Physical Therapy 82: 1213–1223.
  33. Tiftikçioğlu BI 2018. Multiple Sclerosis Functional Composite (MSFC): Scoring instructions. Archives of Neuropsychiatry 55 (Suppl 1): S46–48.
  34. Toblin RL, Quartana P, Riviere L, Clarke-Walper K, Hoge CW 2014. Chronic pain and opioid use in US soldiers after combat deployment. JAMA Internal Medicine 174: 1400–1401.
