Reliability and validity of two square agility test and workability rate of manipulation in adults with musculoskeletal disorders

Rick Wickstrom; Inga Wang

doi:10.1589/jpts.38.171

2026 Apr 1;38(4):171–178. doi: 10.1589/jpts.38.171

PMCID: PMC13038375 PMID: 41924117

Abstract

[Purpose] This study assessed the test-retest reliability, minimal detectable change (MDC₉₅), concurrent validity, and known-groups validity of the Two Square Agility Test (TSAT) and WorkAbility Rate of Manipulation (WRM) in adults with musculoskeletal disorders. [Participants and Methods] Fifty-five participants (mean age: 61.3 ± 15.3 years) transitioning from physical therapy completed the TSAT and the WRM subtests for Turning (WRMT) and Placing (WRMP) in both sessions. In session 2, participants also completed Grip Strength (GS), the 10-Meter Walk at usual (GSU) and fast (GSF) pace, and Grooved Pegboard Placing (GPP) and Remove (GPR). [Results] Test-retest reliability was good for TSAT (ICC=0.89) and WRMT (ICC=0.86), and excellent for WRMP (ICC=0.93). MDC₉₅ was 0.74 steps/sec for TSAT, 13.1 parts/min for WRMT, and 6.5 parts/min for WRMP. TSAT correlated moderately with GSF (r=0.52) and GSU (r=0.50). WRMT and WRMP correlated highly (r=0.71). WRMT showed a high correlation with GPP (r=0.74), a low correlation with GPR (r=0.46), and a negligible correlation with GS (r=0.17). WRMP showed a high correlation with GPP (r=0.79), a moderate correlation with GPR (r=0.55), and a negligible correlation with GS (r=0.20). WRM differed by age, and TSAT differed by age and body mass index. [Conclusions] The findings support the reliability and validity of the TSAT and WRM for assessing adults with musculoskeletal disorders.

Key words: Functional status, Physical fitness, Musculoskeletal system

INTRODUCTION

Musculoskeletal disorders (MSDs) are the leading cause of pain, disability, sickness absence, and productivity loss for adults in the United States¹⁾. MSDs affecting the upper extremities can impair dexterity, limiting the ability to perform grasping, placing, or turning movements with the fingers. MSDs of the spine and lower extremities can restrict functional mobility, which has been associated with reduced quality of life, increased obesity risk, higher fall risk, and diminished functional activity performance. Hand dexterity and agility are motor functions that may decline with age or because of various health conditions. MSDs are particularly prevalent among midlife and older adults in the workforce, contributing to reduced work capacity and shorter work-life expectancy²⁾.

A proposed strategy to accelerate improvements in population health is to optimize the health of populations over individual lifespans by forging partnerships between health care providers and communities to improve service delivery and supportive community environments³⁾. This management approach depends on implementing surveillance systems to monitor population health status and investigate health problems identified in the community. This call to measure what matters inspired the National Institutes of Health (NIH) to develop a free NIH Toolbox® App to administer brief, reliable, and norm-referenced tests of motor function for gait speed, hand dexterity, grip strength, standing balance, and walk endurance to assess physical health status, disease burden, and outcomes relevant to daily function across the lifespan⁴⁾.

Clinical practice guidelines to optimize work participation after injury or illness recommend the use of valid and reliable physical performance tests throughout episodes of care to measure the individual’s work ability and to inform treatment and prognosis⁵⁾. The value of physical performance tests after the onset of an injury or illness is increased by having baseline comparison tests of motor function, which may be collected through new hire and wellness exams to promote musculoskeletal health from hire to retire. This surveillance approach to workforce health is consistent with the Total Worker Health® model to reduce worker health risks and job hazards that was advanced by the National Institute for Occupational Safety and Health (NIOSH)⁶⁾.

The Two Square Agility Test (TSAT) and WorkAbility Rate of Manipulation (WRM) are performance-based measures of functional mobility and hand dexterity, respectively, that can be administered within a small, private space. Developed by the first author, these tests were designed to monitor objective performance of workers as domains for movement health across the employment span, from pre-employment fitness screenings to wellness assessments, musculoskeletal primary care, and functional capacity evaluations.

TSAT is a brief timed test of functional mobility that assesses lower-extremity coordination, speed, and agility and can be administered in a small room to individuals with a broad range of physical fitness and cognitive abilities. Compared to gait speed and other functional mobility tests, this eliminates the need for an unobstructed walking path and places the examiner in a stable position to guard the participant. The reliability and concurrent validity of TSAT as a functional mobility measure for health promotion was previously established in a study of healthy, working-age adults⁷⁾. Intra-rater reliability was found to be excellent (intraclass correlation coefficient [ICC]=0.94), and test-retest reliability was good (ICC=0.87). In that study of healthy adults, TSAT best time correlated moderately with other functional mobility tests that included Timed Up and Go (r=0.63), Five Times Sit to Stand (r=0.62), and Maximum Step Length (r=−0.54), supporting its concurrent validity. TSAT performance was found to be better in male, younger, more physically active, and non-obese groups. The minimal detectable change at a 95% confidence level (MDC₉₅) was 1.37 s for the total time to complete 5 cycles of stepping forward and back across a marked line (20 steps).

The WRM is a third-generation adaptation of the Minnesota Rate of Manipulation Test (MRMT), which was developed in the 1930s by the Minnesota Employment Stabilization Research Institute and is no longer manufactured. The original MRMT was used to validate the method adopted by the American Medical Association (AMA) for rating physical disability of participants with hand impairments in the 1971 version of the Guides to the Evaluation of Permanent Impairment⁸⁾. Lafayette Instrument introduced a similar test in 1991 with a compact folding board design called the Minnesota Manual Dexterity Test (MMDT). The MRMT and MMDT both have the same subtests for turning and placing 60 disks on boards with a 4 × 15 hole pattern; however, these test versions cannot be used interchangeably because of other design differences⁹⁾. A review of physical fitness tests in older people by Hilgenkamp et al.¹⁰⁾ favored the Box and Block Test over the MMDT because its easier instructions improve acceptability for test administration to people with lower cognitive function. This review inspired simplification of the instructions for the WRM turning and placing protocols to improve acceptability for assessment of persons with lower cognitive abilities.

The reliability, validity, and practical utility of the WRM as a functional hand dexterity test was initially studied in a comparison study of sixty-six healthy participants who also performed the MMDT¹¹⁾. Results supported good test-retest reliability of the WRM (placing test ICC=0.88–0.90 and turning test ICC=0.68–0.82) using 60 disks. The WRM correlated with the MMDT (r=0.81 for the placing sub-test and r=0.44–0.57 for the turning sub-test). Overall, participants felt that the WRM instructions were easier to follow (44%) and preferred the setup, color, and depth of its test board (49%). The time required to complete 1 panel of 20 disks correlated highly with the time needed to finish a complete trial of 60 disks in both the MMDT (r=0.91–0.97) and the WRM (r=0.88–0.95). The test-retest reliability was also good for completing only 1 panel of 20 disks (r=0.87 for the placing sub-test and r=0.75–0.82 for the turning sub-test). This finding of comparable reliability for one panel of 20 disks inspired changes to the board design of the WRM to reduce administration time: turning and placing now use 20 disks in a 4 × 5 pattern of wells on two boards, which also store the disks and are easy to reposition between different heights to simulate different work postures.

TSAT and WRM have not previously been studied for reliability or validity in clinical populations with MSDs. Therefore, this study aimed to (1) determine the test-retest reliability between sessions conducted by the same assessor, (2) estimate the MDC at the 95% confidence level (MDC₉₅), (3) assess the concurrent validity of the TSAT and WRM with self-report and performance-based measures of physical function, and (4) evaluate their known-groups validity by age group, gender, body mass index (BMI), and MSD impairment body areas.

PARTICIPANTS AND METHODS

This prospective cohort study used an adapted COSMIN risk of bias checklist to evaluate the methodological rigor of health status measurements¹²⁾. The Institutional Review Board of the University of Wisconsin-Milwaukee granted approval for the study (protocol #21.003). All participants provided written informed consent.

A convenience sample of adults with one or more MSD conditions was recruited within two weeks of discharge from outpatient physical therapy. Inclusion criteria comprised: (1) age 18 years or older, (2) presence of a musculoskeletal condition for more than two weeks, (3) ability to ambulate independently with or without an assistive aid, (4) ability to pinch with either hand to pick up a pencil, and (5) self-reported resting pain of 6/10 or less. Exclusion criteria were: (1) musculoskeletal surgery within the past three months, (2) pregnancy, and (3) inability to understand written or spoken instructions in English.

Participants underwent two clinic visits. The flooring of the measurement environment was carpeted, and participants wore non-slip athletic footwear. During the first session, participants provided demographic information and were assessed on the Two Square Agility Test (TSAT) and the WorkAbility Rate of Manipulation Turning (WRMT) and Placing (WRMP) subtests by a physical therapist assistant (PTA) who was also certified as a personal fitness trainer. During the second session, held within 3 weeks, participants received anthropometric measurements, self-report surveys of physical function, reassessment on the TSAT, WRMT, and WRMP, and comparative physical performance tests. Performance-based tests were administered in a randomized sequence and included the 10-Meter Walk at Usual (GSU) and Fast (GSF) pace, Grip Strength (GS), and the Grooved Pegboard Placing (GPP) and Remove (GPR) tasks.

Anthropometric measures: Participants were measured for height (cm), body weight (kg), and waist girth (cm) at the level of the umbilicus¹³⁾. BMI was computed by dividing the participant’s weight in kilograms by the square of height in meters¹⁴⁾.
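The BMI calculation described above, together with the weight classes used in Table 1, can be sketched in a few lines of Python. This is only an illustration: the function names and the example weight and height are hypothetical, and the label for values below 18.5 is an assumption, since Table 1 lists no band for that range.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kg divided by the square of height in m."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0  # the study recorded height in cm
    return weight_kg / (height_m ** 2)


def bmi_class(value: float) -> str:
    """Map a BMI value to the bands labelled in Table 1."""
    if value < 18.5:
        # Assumption: Table 1 has no band below 18.5, so it is flagged here.
        return "Below 18.5 (no band listed in Table 1)"
    if value < 25.0:
        return "Normal"
    if value < 30.0:
        return "Overweight"
    if value < 35.0:
        return "Obese Class 1"
    if value < 40.0:
        return "Obese Class 2"
    return "Obese Class 3"


if __name__ == "__main__":
    # Hypothetical participant, not taken from the study data.
    value = bmi(weight_kg=90.0, height_cm=170.0)
    print(f"{value:.1f} kg/m^2 -> {bmi_class(value)}")
```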

TSAT: The TSAT assesses lower-extremity coordination, speed, and agility. As shown in Fig. 1, the participant begins by standing with both feet in one square behind a boundary line. Upon the start signal, the timer is started when the participant contacts the ground after stepping across a marked line. Then the participant steps across quickly with the other leg to land with both feet in front of the marked line. The participant then immediately steps with each leg to return to the starting square, before stepping forward across the marked line with the lead leg to complete the first cycle. The participant is instructed to continue stepping forward and backward across the marked line as a practice trial until notified to stop after completing a total of five cycles. Then the participant is instructed to repeat the same stepping sequence “as fast as safely possible” for three timed trials at a fast pace. The total number of steps (20) required to complete five cycles was divided by the best (lowest) completion time across three trials. The resulting TSAT score was expressed as speed (steps per second), so that higher values indicate better agility speed as a measure of lower body motor coordination and functional mobility.
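As a minimal sketch of the TSAT scoring rule just described (20 steps divided by the best of three timed trials), the following Python function may help. The function name and the trial times are hypothetical and are not taken from the study data.

```python
def tsat_speed(trial_times_sec, steps=20):
    """Return TSAT agility speed in steps/sec from timed trials.

    The score is the number of steps in five cycles (20) divided by
    the best (lowest) completion time across the trials.
    """
    if not trial_times_sec:
        raise ValueError("at least one timed trial is required")
    best_time = min(trial_times_sec)
    if best_time <= 0:
        raise ValueError("completion times must be positive")
    return steps / best_time


if __name__ == "__main__":
    # Hypothetical completion times (sec) for three fast-paced trials.
    times = [7.9, 7.2, 7.5]
    print(f"TSAT score: {tsat_speed(times):.2f} steps/sec")
```

Because the score is a rate, a shorter completion time yields a higher value, which is why higher TSAT scores indicate better performance.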

WRM: This is a brief assessment of hand dexterity with demonstrated reliability when compared with the MMDT¹¹⁾. In this study, the starting hand for each session was randomly assigned. During Session 1, participants completed three timed trials of the WRMT subtest using alternate hands, followed by the WRMP subtest. In the WRMT (Fig. 2), participants used one hand to turn over all disks as fast as possible until the opposite color was facing up. In the WRMP (Fig. 3), participants used one hand to move and place disks one at a time onto an empty board as fast as possible. For each subtest, the best (lowest) completion time of three trials was identified separately for the right and left hands. These times were then converted to dexterity speed (parts/min) using the formula: Dexterity Speed = (20 parts / time in sec) × 60 sec/min. The overall dexterity speed for each subtest was calculated as the average of the right- and left-hand best trial speeds. Reporting dexterity speed (parts/min) means that higher values indicate better performance, as a measure of hand manipulation and upper body coordination.
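The conversion from completion time to dexterity speed, and the averaging of right- and left-hand best trials, can be illustrated with a short Python sketch. The function names and trial times below are hypothetical. The `parts` parameter is an assumption added so that the same helper could also cover a 25-part task such as the Grooved Pegboard.

```python
def dexterity_speed(time_sec, parts=20):
    """Convert one completion time (sec) to speed in parts/min."""
    if time_sec <= 0:
        raise ValueError("completion time must be positive")
    return parts / time_sec * 60.0


def overall_dexterity_speed(right_times_sec, left_times_sec, parts=20):
    """Average of the right- and left-hand best-trial speeds (parts/min)."""
    right_best = dexterity_speed(min(right_times_sec), parts)
    left_best = dexterity_speed(min(left_times_sec), parts)
    return (right_best + left_best) / 2.0


if __name__ == "__main__":
    # Hypothetical completion times (sec) for three trials per hand.
    right = [18.0, 17.0, 17.5]
    left = [19.0, 18.5, 20.0]
    print(f"Overall speed: {overall_dexterity_speed(right, left):.1f} parts/min")
```

Note that the best time is selected per hand before conversion, so the overall score averages two speeds rather than converting an averaged time.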

Fig. 2. WorkAbility Rate of Manipulation Turning (WRMT) right hand test.

Fig. 3. WorkAbility Rate of Manipulation Placing (WRMP) right hand test.

GSU and GSF: An adapted 10-Meter Walk Test (10MWT) protocol was used to measure gait speed at both usual (GSU) and fast (GSF) walking speeds¹⁵⁾. Timing was recorded only during the middle 6 meters of the 10-meter walkway (from 2 to 8 meters) to exclude acceleration and deceleration phases. Gait speed was calculated by dividing the 6-meter distance by the best (lowest) completion time (sec) from two trials in each condition.

GS: GS was measured using a Jamar dynamometer following a standardized protocol. The starting hand was randomly assigned. Participants were instructed to squeeze the dynamometer as hard as possible while keeping the arm at their side and the elbow flexed at 90°¹⁶⁾. Three trials were performed for each hand at the position 2 handle setting. The best (highest) force from the three trials was recorded for each hand, and the overall grip strength was calculated as the average of the right- and left-hand best scores.

GPP and GPR: Finger dexterity was evaluated using the Lafayette Instrument Company’s Grooved Pegboard Test, a measure of fine motor function that is widely used by neuropsychologists¹⁷⁾. The starting hand was randomly assigned. Two trials of timed placing and removal tasks were conducted for the right and left hands, consistent with Schmidt et al.¹⁸⁾, who reported performance improvements on the second trial due to a training effect. The best (lowest) completion time for each hand was identified and converted to speed (parts/min) using the formula: Speed = (25 parts / time in sec) × 60 sec/min. The overall speed for each task was calculated as the average of the best trial speeds for the right and left hands, where higher values indicate better performance.

Data were analyzed using SPSS statistical package version 29 (IBM Corp., Armonk, NY, USA). Descriptive statistics were used to describe the score distribution and frequency counts. For test-retest reliability, intraclass correlation coefficients (ICC; model 2,1) were calculated for TSAT, WRMT, and WRMP scores. ICC values were considered poor if less than 0.5, moderate if 0.5 to less than 0.75, good if 0.75 to less than 0.9, and excellent if 0.9 to 1.0¹⁹⁾. To quantify measurement error, MDC₉₅ was calculated as the smallest change that can be detected by the instrument beyond measurement error, using the formula: MDC₉₅ = 1.96 × √2 × SEM, where SEM is the standard error of measurement. SEM was determined as SEM = SD × √(1 − r), where “SD” represents the standard deviation of the scores and “r” represents the reliability coefficient of the test²⁰⁾. Concurrent validity was analyzed with Pearson correlations of TSAT, WRMT, and WRMP with upper body and lower body physical performance tests. The best speed trials were used for TSAT, GSU, and GSF. The average of right- and left-hand best trials was used for WRMT, WRMP, GPP, and GS. The correlation strength for concurrent validity was defined by Pearson r values and interpreted using the criteria recommended by Mukaka²¹⁾: negligible if less than 0.3, low if 0.3 to less than 0.5, moderate if 0.5 to less than 0.7, high if 0.7 to less than 0.9, and very high if 0.9 to 1.0. Known-groups validity was assessed by comparing TSAT, WRMT, and WRMP scores across classification groups for age, gender, BMI, and MSD impairment types (upper vs. lower body).
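To make the SEM and MDC₉₅ formulas and the interpretation bands concrete, here is a minimal Python sketch. It is an illustration rather than the authors' SPSS procedure. The function names are hypothetical, and the SD of 0.8 steps/sec in the example is an assumed value chosen for illustration; the study does not report the SD used in its MDC₉₅ calculation.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def mdc95(sd, reliability):
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem(sd, reliability)


def icc_band(icc):
    """Reliability bands as stated in the text (poor/moderate/good/excellent)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


def correlation_band(r):
    """Correlation strength bands as stated in the text, applied to |r|."""
    size = abs(r)
    if size < 0.3:
        return "negligible"
    if size < 0.5:
        return "low"
    if size < 0.7:
        return "moderate"
    if size < 0.9:
        return "high"
    return "very high"


if __name__ == "__main__":
    # Assumed SD of 0.8 steps/sec (illustrative) with the reported TSAT ICC.
    print(f"MDC95: {mdc95(sd=0.8, reliability=0.89):.2f} steps/sec")
    print(icc_band(0.89), icc_band(0.93))
    print(correlation_band(0.17), correlation_band(0.79))
```

Because MDC₉₅ scales with √(1 − r), a test with the same score spread but higher reliability has a smaller detectable change.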

RESULTS

A total of 55 participants were included in the study (Table 1). The mean age was 61.3 ± 15.3 years, and the sample comprised 20 males (36.4%) and 35 females (63.6%). Most participants identified as Caucasian (98.2%). Regarding physical activity levels, 50.9% reported light activity, 45.5% moderate activity, and 3.6% vigorous activity. Musculoskeletal disorders were primarily localized to the lower body (81.8%), while 49.1% had upper-body impairments. The mean BMI was 31.3 ± 11.1 kg/m², with 25.5% classified as normal weight, 32.7% overweight, 20.0% obese class I, 7.3% obese class II, and 14.5% obese class III.

Table 1. Participant characteristics (n=55).

Characteristics*
Age (years)		61.3 ± 15.3
Gender: males/females		20 (36.4%)/35 (63.6%)
Race: Caucasian/all other		54 (98.2%)/1 (1.8%)
Recent physical activity
	Vigorous	2 (3.6%)
	Moderate	25 (45.5%)
	Light	28 (50.9%)
Musculoskeletal impairments
	Upper body	27 (49.1%)
	Lower body	45 (81.8%)
Body mass index (BMI)		31.3 ± 11.1
	Normal: BMI is 18.5 to <25	14 (25.5%)
	Overweight: BMI is 25.0 to <30	18 (32.7%)
	Obese Class 1: BMI of 30 to <35	11 (20.0%)
	Obese Class 2: BMI of 35 to <40	4 (7.3%)
	Obese Class 3: BMI of 40 or higher	8 (14.5%)

*Data shown as mean ± standard deviation or number (%).

Table 2 presents the test-retest reliability and MDC₉₅ of TSAT and WRM scores. Test-retest ICCs for TSAT, WRMT, and WRMP scores were 0.89, 0.86, and 0.93, respectively, demonstrating good to excellent reliability. The MDC₉₅ values for TSAT, WRMT, and WRMP scores were 0.74 steps/sec, 13.1 parts/min, and 6.5 parts/min, respectively.

Table 2. Test-retest reliability and minimal detectable change of TSAT and WRM scores.

Tests	Session 1	Session 2	Difference	p	MDC₉₅	ICC (2,1)	95% CI
TSAT	2.78	2.91	−0.13	0.009	0.74	0.89	(0.801, 0.936)
WRMT	69.36	70.62	−1.26	0.169	13.1	0.86	(0.766, 0.913)
WRMP	55.45	55.89	−0.44	0.338	6.5	0.93	(0.876, 0.956)

TSAT: Two Square Agility Test in steps/sec; WRMT: WorkAbility Rate of Manipulation Turning in parts/min; WRMP: WorkAbility Rate of Manipulation Placing in parts/min; MDC₉₅: minimal detectable change at the 95% confidence level; ICC: intraclass correlation coefficient; CI: confidence interval.

For concurrent validity, TSAT showed moderate correlations with the lower-body performance tests GSF (r=0.52) and GSU (r=0.50). The two WRM subtests, WRMT and WRMP, were highly correlated (r=0.71). WRMT demonstrated a high correlation with the upper-body test GPP (r=0.74), a low correlation with GPR (r=0.46), and a negligible correlation with GS (r=0.17). Similarly, WRMP correlated highly with GPP (r=0.79), moderately with GPR (r=0.55), and negligibly with GS (r=0.20). TSAT showed a moderate correlation with WRMP (r=0.61) and GPP (r=0.50), a low correlation with WRMT (r=0.38), and a negligible correlation with GS (r=0.22).

Table 3 presents the comparisons of TSAT, WRM subtests, and other performance measures across age, gender, BMI, and MSD classification groups. Significant group differences were observed by age, with younger participants (<50 years) demonstrating better performance on TSAT (p=0.001), WRMT (p=0.013), WRMP (p=0.001), GPP (p=0.003), and GS (p=0.040). No significant differences were found by gender for TSAT, WRM, or GPP, although GS was significantly greater among males (p<0.001). By BMI, participants with lower BMI (<30) performed better on TSAT (p=0.012), GSU (p=0.001), and GSF (p=0.033). No significant group differences were observed between upper- and lower-body MSD impairment groups.

Table 3. Known-groups validity by age, gender, BMI, and MSD.

		n*	TSAT	GSU	GSF	WRMT	WRMP	GPP	GPR	GS
Age
	Age <50	12	3.54 (0.8)	0.90 (0.2)	1.38 (0.2)	78.56 (14.9)	62.88 (10.3)	25.13 (5.7)	75.21 (14.8)	78.63 (42.9)
	Age ≥50	43	2.74 (0.7)	0.88 (0.2)	1.23 (0.3)	68.40 (11.3)	53.94 (7.1)	20.56 (4.1)	69.72 (12.2)	59.44 (22.3)
	p-value**		0.001	0.773	0.067	0.013	0.001	0.003	0.193	0.040
Gender
	Male	20	2.91 (0.8)	0.87 (0.2)	1.28 (0.3)	67.92 (13.6)	53.80 (9.0)	20.32 (5.2)	66.79 (11.6)	89.83 (31.0)
	Female	35	2.91 (0.8)	0.90 (0.2)	1.25 (0.2)	72.16 (12.1)	57.08 (8.3)	22.27 (4.5)	73.28 (13.1)	48.66 (11.8)
	p-value**		0.999	0.503	0.726	0.237	0.178	0.150	0.071	<0.001
BMI
	BMI <30	31	3.15 (0.7)	0.95 (0.2)	1.32 (0.2)	72.2 (13.4)	57.8 (8.1)	22.25 (4.8)	72.33 (14.7)	60.81 (29.3)
	BMI ≥30	23	2.61 (0.8)	0.81 (0.1)	1.18 (0.3)	68.65 (12.0)	53.8 (8.9)	20.7 (4.9)	69.25 (10.3)	67.07 (28.7)
	p-value**		0.012	0.001	0.033	0.319	0.091	0.251	0.392	0.438
MSD
	Upper body	10	3.23 (0.7)	0.87 (0.2)	1.27 (0.3)	74.74 (15.4)	57.92 (11.4)	23.83 (4.9)	74.71 (8.8)	58.85 (17.1)
	Lower body	28	2.87 (0.8)	0.88 (0.2)	1.24 (0.3)	72.26 (9.9)	56.96 (8.5)	22.07 (4.2)	70.49 (14.8)	69.91 (32.5)
	p-value**		0.229	0.972	0.754	0.562	0.780	0.287	0.403	0.314

*Data shown under the n column is the number of participants in each comparison group. **p-value is the significance based on comparison difference in the mean (± standard deviation) for classification groups for each functional performance test. TSAT: two square agility test; GSU: gait speed usual; GSF: gait speed fast; WRMT: WorkAbility rate of manipulation turning; WRMP: WorkAbility rate of manipulation placing; GPP: grooved pegboard test placing; GPR: grooved pegboard test remove; GS: grip strength; BMI: body mass index; MSD: musculoskeletal disorder.

DISCUSSION

Movement screens for MSDs aim to identify physical impairments and functional limitations that may contribute to injury risk and disability. Such assessments offer greater value when a pre-injury baseline of physical function can be established and monitored throughout a person’s lifespan. To be widely applicable, screening tools must be easy to administer, require minimal space and equipment, and be suitable as outcome measures in diverse clinical or workplace settings. This need is driven by the limitations of traditional post-injury evaluations, which rely heavily on symptoms and lack objective baseline data; the demand for time-efficient tools in busy settings; and the value of simple, reliable, and cost-effective tests for tracking functional progress and readiness to resume physical activity.

This study was the first to examine test-retest reliability, MDC₉₅, and concurrent validity of the TSAT and the WRM in adults with MSDs. The results supported the reliability and validity of these assessments within this clinical population of adults. Designed for ease of use, both TSAT and WRM were successfully administered in a small space and can be readily implemented in various settings, including clinics, workplaces, and potentially home environments. The simplicity and minimal equipment requirements make these tests particularly suited for use in resource-limited or high-throughput contexts, offering a practical means to gather objective functional data that informs rehabilitation planning and supports safe participation in work or daily lifestyle activities.

The TSAT employs a reciprocal forward and backward stepping sequence, making it a practical functional mobility assessment for both low- and high-functioning adults. Unlike walking speed tests, the TSAT can be safely administered in a private examination room by a single examiner, who can provide guarding from a stationary stance, an important advantage for individuals with privacy or fall risk concerns. Its simple instructions and reliance on gross motor movements enhance usability among patients with physical or cognitive impairments. This may provide a more acceptable alternative for individuals recovering from stroke²²⁾ or vestibular disorders²³⁾, who often struggle to complete more complex tests such as the Four Square Step Test (FSST), which involves multidirectional stepping. Simplifying the movement sequence also reduces motor learning effects, which may be an advantage when testing adults with varying levels of cognitive functioning. In this study, TSAT demonstrated good test-retest reliability in adults with musculoskeletal disorders. It showed moderate correlations with the lower body comparison measures of usual and fast gait speed.

In this study, WRMT demonstrated good test-retest reliability, and WRMP demonstrated excellent reliability. Both tests showed high correlations with the GPP sub-test, which is often cited as a primary reference test for normative comparisons in neuropsychological evaluations. The negligible correlations of WRMT and WRMP with GS suggest that dexterity and grip strength are different constructs of hand function, so both should be measured to comprehensively assess upper-extremity function. These results support the validity and specificity of WRMT and WRMP for assessing upper-body performance that requires functional manipulation.

The known-groups analysis demonstrated significant differences by age and BMI, supporting the expected discriminative validity of the performance-based measures. Younger participants performed better on TSAT, WRMT, and WRMP, while participants with lower BMI showed superior performance on TSAT. These findings align with prior evidence that age-related declines in muscle strength, coordination, and agility, as well as higher BMI, are associated with reduced physical performance. In contrast, no significant group differences were observed between participants with upper- and lower-body MSD conditions. This finding may be explained by the uneven and limited subgroup sizes, with only 10 participants reporting upper-body-only impairments and 28 with lower-body-only impairments, while 17 participants reported both upper and lower body MSDs. Such overlap likely obscured clear between-group distinctions. Additionally, the small sample size and heterogeneity of impairment severity and compensatory strategies may have further limited the ability to detect meaningful differences. Future studies with larger and more balanced samples, along with stratification by single- and multi-region MSD involvement, are needed to more accurately assess the discriminative validity of these measures.

This study had several limitations. The sample size was relatively small and drawn from a convenience population, limiting the generalizability of the findings. Participants were primarily patients transitioning out of physical therapy, who may not represent the broader population of people with MSDs who are at a more stable point of recovery. While the study assessed reliability and validity, it did not evaluate sensitivity to change over time (i.e., responsiveness to rehabilitation or functional recovery). The administration of multiple tests may have introduced practice or fatigue effects, potentially influencing performance and reliability results.

Several future directions may expand the utility and impact of this study. Establishing age- and sex-specific normative values for TSAT and WRM in the general population and across occupational groups is an important next step. In parallel, developing reference cut-off scores to support risk stratification, early detection of functional decline, and assessment of work capacity could enhance the role of these tests in wellness screening and preventive care. As this study recruited participants solely from a physical therapy clinic, future research should also focus on validating the TSAT and WRM in specific clinical populations, including individuals with neurological disorders (e.g., stroke, Parkinson’s disease), vestibular dysfunction, or age-related frailty. We did not collect an external anchor to estimate the Minimal Clinically Important Difference (MCID), and participants completed only two visits within a 3-week interval without admission-to-discharge follow-up. Thus, MCID values could not be calculated in this study. Future research is warranted to determine the MCID for TSAT and WRM. Additionally, further validation is needed to assess the sensitivity of the TSAT and WRM to changes in function over time, particularly from the initiation of skilled therapy through to stabilization of functional recovery. With ongoing advancements in mobile application development and artificial intelligence, a promising direction involves creating a mobile app that integrates motion analysis, digital data collection forms, automated scoring, and step-by-step instructions. Such a tool would streamline data collection, improve accessibility across clinical and non-clinical settings, and facilitate standardized administration of the TSAT and WRM.

Conflict of interest

The lead author (Wickstrom) designed the TSAT and WRM tests for use in the commercial WorkAbility Systems method that he teaches, and supplies the WRM and reporting software to health professionals. The second author (Wang) declares no conflict of interest.

Acknowledgments

The authors would like to acknowledge research assistance with participant recruitment and data collection by Paul Kaple, PT and Juliet Bay, PTA of Rehab Associates.

REFERENCES

1. United States Bone and Joint Initiative: The Burden of Musculoskeletal Diseases in the United States (BMUS): The Hidden Impact of Musculoskeletal Disorders on Americans, 4th ed. 2018. https://www.bmus-ors.org/.
2. Palmer KT, Goodson N: Ageing, musculoskeletal health and work. Best Pract Res Clin Rheumatol, 2015, 29: 391–404.
3. Delgado P, Binzer K, Shah A, et al.: Accelerating population health improvement. BMJ, 2021, 373: n966.
4. Fox RS, Zhang M, Amagai S, et al.: Uses of the NIH Toolbox® in clinical samples: a scoping review. Neurol Clin Pract, 2022, 12: 307–319.
5. Daley D, Payne LP, Galper J, et al.: Clinical guidance to optimize work participation after injury or illness: the role of physical therapists. J Orthop Sports Phys Ther, 2021, 51: CPG1–CPG102.
6. Chari R, Chang CC, Sauter SL, et al.: Expanding the paradigm of occupational safety and health: a new framework for worker well-being. J Occup Environ Med, 2018, 60: 589–593.
7. Wickstrom RJ, Wang YC, Wickstrom NE, et al.: A new two square agility test for workplace health-reliability, validity and minimal detectable change. J Phys Ther Sci, 2019, 31: 823–830.
8. Gloss DS, Wardle MG: Use of the Minnesota Rate of Manipulation Test for disability evaluation. Percept Mot Skills, 1982, 55: 527–532.
9. Surrey LR, Nelson K, Delelio C, et al.: A comparison of performance outcomes between the Minnesota Rate of Manipulation Test and the Minnesota Manual Dexterity Test. Work, 2003, 20: 97–102.
10. Hilgenkamp TI, van Wijck R, Evenhuis HM: Physical fitness in older people with ID-concept and measuring instruments: a review. Res Dev Disabil, 2010, 31: 1027–1038.
11. Wang YC, Wickstrom R, Yen SC, et al.: Assessing manual dexterity: comparing the WorkAbility Rate of Manipulation Test with the Minnesota Manual Dexterity Test. J Hand Ther, 2018, 31: 339–347.
12. Mokkink LB, Terwee CB, Patrick DL, et al.: The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res, 2010, 19: 539–549.
13. Mason C, Katzmarzyk PT: Variability in waist circumference measurements according to anatomic measurement site. Obesity (Silver Spring), 2009, 17: 1789–1795.
14. Obesity: identification, assessment and management. London: National Institute for Health and Care Excellence (NICE); 2023.
15. Moore JL, Potter K, Blankshain K, et al.: A core set of outcome measures for adults with neurologic conditions undergoing rehabilitation: a clinical practice guideline. J Neurol Phys Ther, 2018, 42: 174–220.
16. Wang YC, Bohannon RW, Li X, et al.: Hand-grip strength: normative reference values and equations for individuals 18 to 85 years of age residing in the United States. J Orthop Sports Phys Ther, 2018, 48: 685–693.
17. Rabin LA, Paolillo E, Barr WB: Stability in test-usage practices of clinical neuropsychologists in the United States and Canada over a 10-year period: a follow-up survey of INS and NAN members. Arch Clin Neuropsychol, 2016, 31: 206–230.
18. Schmidt SL, Oliveira RM, Rocha FR, et al.: Influences of handedness and gender on the grooved pegboard test. Brain Cogn, 2000, 44: 445–454.
19. Koo TK, Li MY: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med, 2016, 15: 155–163.
20.Haley SM, Fragala-Pinkham MA: Interpreting change scores of tests and measures used in physical therapy. Phys Ther, 2006, 86: 735–743. [PubMed] [Google Scholar]
21.Mukaka MM: Statistics corner: a guide to appropriate use of correlation coefficient in medical research. Malawi Med J, 2012, 24: 69–71. [PMC free article] [PubMed] [Google Scholar]
22.Blennerhassett JM, Jayalath VM: The Four Square Step Test is a feasible and valid clinical test of dynamic standing balance for use in ambulant people poststroke. Arch Phys Med Rehabil, 2008, 89: 2156–2161. [DOI] [PubMed] [Google Scholar]
23.Whitney SL, Marchetti GF, Morris LO, et al. : The reliability and validity of the Four Square Step Test for people with balance deficits secondary to a vestibular disorder. Arch Phys Med Rehabil, 2007, 88: 99–104. [DOI] [PubMed] [Google Scholar]

