ABSTRACT
Purpose: To examine reliability and validity of the Lunge Test (LT) of dorsiflexion range of motion and determine the impact of different approaches to obtain a score on these parameters. Methods: Fifty-three patients with ankle injury/dysfunction provided initial assessment data for cross-sectional convergent and known-groups validity analysis with the Pearson coefficient (r) and paired t-test, respectively; data after 4–8 weeks of treatment for longitudinal validity analysis with coefficient r; and data 3 days later for test–retest reliability using the intra-class correlation coefficient (ICC) and minimal detectable change (MDC). LT scores were determined for the affected leg only (LTAff) and for the difference between the two limbs (LTDiff). Two strategies were used to calculate LT scores: a single series and the mean of three series of lunges. LTs were correlated with the Lower Extremity Functional Scale and Global Foot and Ankle Scale. Results: Reliability coefficients were high (ICC=0.93–0.99). The MDC=1.0/1.5 cm, LTAff/LTDiff, respectively. Cross-sectional validity was confirmed for LTDiff (r=−0.40 to −0.50). Between-limb differences (p<0.05) supported known-groups validity. Longitudinal validity was supported for both LT change scores (r=0.39–0.63). The number of series of lunges used did not impact results. Conclusions: A single series of lunges produces a reliable LT score. From a validity perspective, clinicians should use LTDiff on initial assessment and either LT to assess change.
Key Words: reproducibility of results; ankle; range of motion, articular; weight-bearing
RÉSUMÉ
Objectif : Vérifier la fiabilité et la validité du test fonctionnel de flexion du genou vers l'avant (lunge test, LT) pour vérifier l'amplitude de la flexion dorsale du pied et évaluer les effets de diverses approches visant à obtenir une cote plus élevée pour ces paramètres. Méthode : Un échantillon de 53 patients avec blessure ou dysfonction de la cheville a permis de recueillir des données initiales pour procéder à une analyse convergente croisée et des groupes connus en vue d'évaluer la validité des données à l'aide du coefficient de Pearson (r) et du test de t, respectivement; des données après 4 à 8 semaines de traitement pour l'analyse de la validité longitudinale avec le coefficient r, et des données 3 ans plus tard pour la fiabilité test retest avec coefficient de corrélation intraclasse (CCI) et changement minimal détectable (CMD). Les pointages LT ont été établis pour la jambe touchée uniquement (LTAff) et pour les différences entre les deux membres (LTDiff). Deux stratégies ont été utilisées pour calculer les pointages LT: une série simple et la moyenne de trois séries de flexions. Les LT ont ensuite été corrélés avec les échelles fonctionnelles pour les membres inférieurs, le pied dans son ensemble et la cheville. Résultats : Les coefficients de fiabilité sont élevés (CCI=0,93–0,99). Le CMD=1,0/1,5 cm pour le LTAff/LTDiff, respectivement. La validité transversale a été confirmée pour LTDiff (r=−0,40 à −0,50). Les différences entre les deux membres (p<0,05) appuyaient la fiabilité des groupes connus. La validité longitudinale était appuyée pour les deux pointages de changement LT (r=0,39–0,63). Le nombre de séries de flexions utilisées n'a pas eu d'effet sur les résultats. Conclusions : Une seule série de flexions permet d'obtenir un pointage à LT fiable. Du point de vue de la validité, les cliniciens devraient utiliser le LTDiff pour l'évaluation initiale, et l'un ou l'autre des LT pour évaluer les changements.
Mots clés : fiabilité et validité, cheville, amplitude de mouvement, mise en charge
Ankle dorsiflexion occurs naturally during many lower-extremity tasks. Reduced ankle dorsiflexion range of motion (DF-ROM) is common in many orthopaedic conditions that confront physiotherapists, including ankle fractures1 and sprains.2 Clinicians pay attention to the arthro- and osteokinematics of the ankle during weight-bearing dorsiflexion. Normally, during this movement, the tibia moves forward over the foot as the tibial plafond glides anteriorly on the talar dome.3 When this accessory glide is limited—for example, due to an anteriorly positioned talus in people with chronic ankle instability4—the resulting decrease in DF-ROM prevents the ankle joint from achieving a close-packed position of bony stability, making it more vulnerable to inversion and internal rotation forces about the ankle.3 This is believed to increase the risk of repeated injury, as people with functional ankle instability have demonstrated increases in the vertical component of the ground reaction force in the presence of suboptimal ankle joint positioning when landing from a jumping activity.5 The importance of normalizing the relationship between physiologic and accessory ankle movements in weight bearing is evident in the development and investigation of treatment approaches that target these components to improve weight-bearing DF-ROM and spatiotemporal postural control.6,7
Techniques to measure DF-ROM can be grouped into three categories, based on measurement method and body position: visual estimation, goniometric measurement in non-weight-bearing positions, and measurement in a weight-bearing position. Each grouping demonstrates different levels of reliability. Visual estimation has poor measurement qualities and has not been recommended for use in clinical settings.8,9 Non-weight-bearing measures of DF-ROM have variable reports of intrarater reliability,9–13 with intra-class correlation coefficient (ICC) values varying from 0.649 to 0.9713 and interrater ICCs as high as 0.87.10 For weight-bearing measures of DF-ROM, reports of both types of reliability have been uniformly high, with intrarater ICCs from 0.9314 to 0.992,15 and interrater ICCs of 0.9814 and 0.99.16
Bennell16 introduced the weight-bearing Lunge Test (LT) for quantifying DF-ROM using a simple tape measure secured to the floor (see Figure 1). One key aspect of the testing protocol is its iterative nature: a series of lunges is performed to determine a single numeric value, and then this procedure is repeated three times, so that a mean of three values represents DF-ROM.
One question about this research method is whether it can be translated directly to clinical practice. In the literature, researchers have used up to six17 series of lunges to generate a mean value for characterizing DF-ROM. Some investigators have measured only the affected limb;2,15,16 others have used the difference between limbs as the measure of abnormal DF-ROM.18 Clinicians working under time constraints in busy treatment settings may choose to use a single series of lunges with the affected limb only; we do not know the impact of using a single versus multiple series of lunges with affected versus bilateral limbs on the reliability of the measure.
The validity of the LT has not been examined to the same extent as its reliability. In healthy study participants, ultrasound images have shown that LT values do correlate with gastrocnemius/soleus muscle fascicle lengths and pennation angles.19 Among patients with an ankle fracture, two studies have demonstrated the predictive validity of affected-limb LT scores on activity limitation.1,20
The present study was designed as an initial parameter estimation study of the psychometric properties of the LT in a sample of orthopaedic patients. Our objectives were to examine the reliability and validity of the LT and to determine the impact on these measurement properties of using a single or multiple series of lunges with one or both limbs.
METHODS
Study design
Figure 2 outlines how data were used to achieve the study objectives. Participants were recruited from four outpatient physiotherapy clinics. Data were collected at the initial assessment (Time 1) for convergent and known-groups validity, during the fourth to eighth week of rehabilitation (Time 2) for longitudinal validity, and within 3 days of Time 2 for test–retest reliability (Time 3). The Health Sciences Research Ethics Board of the University of Western Ontario approved this study, and all participants provided written informed consent.
Participants
Inclusion criteria were as follows: age >18 years, attending physiotherapy for post-surgical or non-surgical unilateral ankle dysfunction of musculoskeletal origin, loss of DF-ROM as judged by the treating physiotherapist, able to read/follow instructions in English, willing to attend testing sessions, and able to provide informed consent. Exclusion criteria were inability to successfully complete the LT, contraindication for full weight bearing, or presence of a concomitant neurological disorder or ankle arthrodesis. The target sample size of 35 was based on two test sessions and parameter estimation of ICC=0.85 (95% CI width of 0.20)21 and a 10% loss to follow-up.
Data collection
At Time 1, physiotherapists conducted a typical initial examination. For descriptive purposes, the following components of this assessment were recorded: age (y); sex; height (cm) and weight (kg) for body mass index (BMI) in kg/m2; mechanism of injury; and date of injury/surgery.
Measures
In addition to the LT, two self-report measures of function were administered at all three time points: the Lower Extremity Functional Scale (LEFS)22 and the American Academy of Orthopedic Surgeons (AAOS) Global Foot and Ankle Scale (GFAS).23 At Time 2 and Time 3, the clinician and the participant completed a Global Rating of Change (GRC) score.24
Lunge Test of ankle DF-ROM
The LT (see Figure 1) was performed following procedures outlined by Bennell.16 First, the unaffected foot was placed forward, with the great toe and centre of the heel on the tape measure. With both feet stationary, a controlled forward lunge was performed such that the knee flexed as the participant attempted to touch it to a vertical line marked on the wall with adhesive tape. During this movement, the physiotherapist maintained the foot's alignment on the tape measure secured to the floor, monitored the heel to ensure contact with the floor, and watched for knee contact with the wall. Pronation or supination of the foot was not controlled. An attempt was considered successful if the participant was able to touch the knee to the wall while maintaining the proper foot alignment and heel contact with the tape on the floor. Upon successfully touching the knee to the wall, the participant moved the foot further from the wall and performed another forward lunge, once again attempting to touch the knee to the wall. The participant was given up to five attempts to achieve the greatest distance between the large toe and the wall. Using the tape measure on the floor, this distance was recorded in millimetres to indicate a single value of DF-ROM. The entire process was then repeated with the affected limb. To obtain three values of DF-ROM for each ankle, the limb testing sequence was unaffected–affected–unaffected–affected–unaffected–affected.
Lower Extremity Functional Scale (LEFS)
The LEFS is a 20-item region-specific self-report questionnaire that asks participants to rate their perceived ability to perform various lower-extremity tasks on a 5-point scale (0=extreme difficulty or unable to perform activity, 4=no difficulty). Ratings are summed for a total LEFS score from 0 (very poor function) to 80 (excellent function). First described by Binkley,22 the LEFS has been shown to have excellent test–retest reliability in people with a variety of musculoskeletal complaints presenting to physiotherapy; good test–retest reliability has been reported in people with ankle sprains.25
AAOS Global Foot and Ankle Scale (GFAS)
The GFAS is a 20-item region-specific self-report questionnaire with four domains (pain, function, stiffness and swelling, and giving way) and varying response scales depending on the item. The standardized score (0–100, indicating very poor to excellent function) was used in this study. The GFAS has good internal consistency and acceptable test–retest reliability, and there is evidence to support its validity.23
Global Rating of Change scores (GRC)
The GRC uses a 15-point ordinal scale (−7=very great deal worse, +7=very great deal better).26 Norman27 has questioned the use of retrospective measures of change to recall functional status; to address this concern, Stratford28 has suggested using both clinician and participant ratings to create an average GRC.
Rater training
The raters used in the study were nine physiotherapists, one physiotherapy assistant, one kinesiologist, and two physiotherapy students. Clinical experience varied from 0 (i.e., the students) to 17 years. A 20-minute training session was used to demonstrate how to conduct the LT and to discuss inclusion and exclusion criteria, obtaining informed consent, administering questionnaires, recording of data, and the study timelines. Raters were not blinded, but they were asked to not review previous findings before performing the testing at Time 2 and Time 3. Periodic visits were conducted during data collection to review procedures. All four clinics were equipped with identical tape measures, and the set-up was reviewed by the primary investigator.
Analysis
Participant characteristics were summarized by means of descriptive statistics. LT scores (mm) were calculated for analyses involving the affected limb (LTAff) and for the difference between affected and unaffected limbs (LTDiff). For LTDiff, the affected side (LTAff) was subtracted from the unaffected side (LTUnaff), so a positive value indicates that the unaffected ankle had more DF-ROM than the affected ankle. Values for LTAff and LTDiff were calculated in two ways: one set of scores used only the first value from the LT protocol, while the other set used the mean of all three measurements obtained from the testing protocol.
Reliability
Test–retest reliability used Time 2 and Time 3 data, because we felt that ankle status should not vary over 1–3 days by the 4- to 8-week mark after beginning treatment. To test this assumption, we compared the means of the LT, LEFS, and GFAS at these two time points using paired t-tests.29 We calculated the ICC2,1 with its 95% CI,29 as well as the standard error of measurement (SEM) and the minimal detectable change at the 90% CI (MDC90).24,29 The SEM CIs were calculated following Stratford and Goldsmith.30
It has been suggested that measures of agreement are more appropriate than reliability measures for tools that will be used to assess clinical change.31 Therefore, we determined the 95% limits of agreement between the test–retest LT values32 and calculated the percentage of patients for whom test–retest scores differed by less than two threshold values31 (5 mm and 10 mm).
We performed two sets of reliability analyses, the first using values from the first measurement obtained from the LT protocol and the second using the mean of all three measurements from the protocol.
Construct validity
Time 1 and Time 2 data were used to examine validity. We used a construct-validation process33 to examine cross-sectional and longitudinal convergent and known-groups validity, as well as sensitivity to change.
To examine cross-sectional convergent validity, we used the Pearson product–moment correlation coefficient (r)29 to assess the correlation of LTAff and LTDiff with LEFS and GFAS, both at Time 1 and at Time 2. The hypothesis being tested was that greater DF-ROM should be correlated with better ankle-related function. Since higher LEFS and GFAS scores reflect better function, we expected a positive correlation with LTAff, which increases as DF-ROM improves; since an individual with very little side-to-side difference in DF-ROM would have a low LTDiff, we expected a negative correlation with LEFS and GFAS scores. Cross-sectional known-groups validity was examined at Time 1 by comparing LTAff with LTUnaff using a paired t-test. A significant difference between these means would indicate that the LT was able to differentiate between these two known groups.
To examine longitudinal convergent validity, we correlated the change in LTDiff and LTAff between Time 1 and Time 2 with change scores for the LEFS and the GFAS using the coefficient r. The magnitude of these correlations provides information about the extent to which a change in DF-ROM, as measured by the LT, is related to a change in functional ability. We wanted improvement in all measures to be reflected by positive change scores, so that a positive association was reflected by a positive coefficient r. We anticipated that the magnitude of LTDiff would decrease as DF-ROM of the affected ankle improved over time; therefore, Time 2 LTDiff scores were subtracted from Time 1 scores, so that a positive value indicated an improvement in DF-ROM. We expected that LTAff would increase as DF-ROM of the affected ankle improved; therefore, Time 1 LTAff scores were subtracted from Time 2 scores, so that a positive value would reflect an improvement in DF-ROM. We also anticipated that function would improve over time, and so a participant's score on the LEFS and GFAS would show improvement by increasing in value; therefore, we subtracted participants' LEFS and GFAS Time 1 scores from their Time 2 scores, anticipating a positive correlation.
Sensitivity to change was analyzed using the approach for a heterogeneous sample of individuals, most of whom were expected to change by different amounts.34 This analysis used the average GRC scores from the participants and physiotherapists. First, we used the ICC3,1 with its 2-sided 95% CI29 to examine the reliability of the average GRC scores between Time 2 and Time 3. Pearson's r was then calculated to examine the relationship between the average GRC at Time 2 and the change in LTAff and LTDiff among participants. A positive correlation was anticipated.
We also calculated the effect size (ES) and standardized response mean (SRM), which, while often considered an inappropriate approach to analyzing sensitivity to change for a heterogeneous sample,35 are nonetheless frequently reported in the literature. The ES was calculated as the average change between Time 1 and Time 2 divided by the standard deviation of the initial scores,36 and the SRM as the average change between Time 1 and Time 2 divided by the standard deviation of that change score.37
We performed two sets of validity analyses, the first using values from the first measurement obtained from the LT protocol and the second using the mean of all three measurements from the protocol.
RESULTS
Participant characteristics
Study participants were predominantly young, active adults with a mean (SD) age of 34.6 (13.9) years and a BMI of 25.3 (3.0) kg/m2. As defined by referral diagnosis, the largest group of participants (55%) had an inversion sprain; 15% had an ankle fracture, 11% had tendinopathy, and 7% had an eversion sprain. The rest were referred for osteoarthritis, Achilles tendon repair, surgical stabilization, calf strain, posterior impingement, contusion, or gunshot wound. Other characteristics are shown in Table 1. Of the 53 participants recruited at Time 1, 43 remained at Time 2 (after 4 weeks of rehabilitation), the rest having self-discharged from physiotherapy. The 37 participants who remained at Time 3 were those able to attend the retest session within the 1- to 3-day time window.
Table 1.
Testing occasion; no. (%) of patients* |
|||
---|---|---|---|
Time 1 |
Time 2 |
Time 3 |
|
Characteristic | Initial assessment (n=53) |
4–8 wk after Time 1 (n=43) |
1–3 d after Time 2 (n=37) |
Female sex | 24 (45) | 20 (47) | 17 (46) |
Age, y | |||
18–25 | 11 (21) | 9 (21) | 9 (24) |
26–35 | 24 (45) | 20 (47) | 16 (43) |
36–45 | 7 (13) | 3 (7) | 3 (8) |
46–55 | 5 (9) | 5 (12) | 4 (11) |
56–65 | 2 (4) | 2 (5) | 1 (3) |
>65 | 4 (8) | 4 (9) | 4 (11) |
Affected ankle, right side | 26 (49) | 23 (53) | 20 (54) |
Time since injury/surgery | |||
Acute (≤3 d) | 2 (4) | 2 (5) | 1 (3) |
Subacute (4 d to <2 wk) | 7 (13) | 6 (14) | 6 (16) |
Early chronic (2–4 wk) | 14 (26) | 9 (21) | 8 (22) |
Chronic | |||
(1–3 mo) | 15 (28) | 12 (28) | 10 (27) |
(3–6 mo) | 11 (21) | 10 (23) | 9 (24) |
(6–12 mo) | 0 (0) | 0 (0) | 0 (0) |
Longstanding (>1 y) | 4 (8) | 4 (9) | 3 (8) |
LEFS (0–100); mean (SD) | 49.0 (12.4) | 62.2 (12.5) | 64.9 (11.6) |
GFAS (0–100); mean (SD) | 68.9 (14.4) | 84.1 (11.3) | 84.6 (11.0) |
Unless otherwise specified.
LEFS=Lower Extremity Functional Scale (worst–best); GFAS=American Academy of Orthopedic Surgeons Global Foot and Ankle Scale (worst–best).
Reliability
Test–retest reliability findings are shown in Table 2. Across all approaches for generating a LT score, there was no difference between testing occasions (LTAff: first test, t=1.70, df=36, p=0.10; mean of 3 tests, t=2.06, df=36, p=0.05. LTDiff: first test, t=−1.36, df=36, p=0.18; mean of 3 tests, t=−1.70, df=36, p=0.10). All ICC values were >0.90, SEM varied from 4.0 to 5.7 mm, and MDC90 varied from 9.4 to 13.3 mm. The GFAS scores were no different between Time 2 and Time 3 (t=0.18, df=36, p=0.86), but LEFS scores at Time 2 differed from those at Time 3 (t=2.47, df=36, p=0.019).
Table 2.
Group; mean (SD), mm* |
||||
---|---|---|---|---|
Affected only (LTAff) |
Unaffected − affected (LTDiff) |
|||
Findings | 1st test | Mean of 3 tests | 1st test | Mean of 3 tests |
LT values | ||||
Test occasion | ||||
Time 2 (test) | 85.6 (37.3) | 89.4 (37.3) | 39.9 (21.6) | 39.2 (21.3) |
Time 3 (retest) | 87.4 (36.6) | 91.3 (36.9) | 38.1 (21.5) | 37.3 (21.5) |
Time 3–Time 2 | 1.8 (6.5) | 1.9 (5.5) | −1.8 (8.0) | −2.0 (7.1) |
Reliability | ||||
Parameter | ||||
ICC (95% CI) | 0.98 (0.98–0.99) | 0.99 (0.98–0.99) | 0.93 (0.87–0.96) | 0.94 (0.89–0.97) |
SEM (95% CI) | 4.7 (3.8–6.1) | 4.0 (3.3–5.2) | 5.7 (4.7–7.4) | 5.1 (4.2–6.6) |
MDC90 | 10.9 | 9.4 | 13.3 | 11.9 |
Agreement | ||||
Parameter | ||||
95% limits of agreement | −14.5, 10.9 | −12.6, 8.8 | −13.9, 17.4 | −11.9, 15.8 |
% ≤5 mm† | 65 | 68 | 41 | 65 |
% ≤10 mm‡ | 92 | 86 | 81 | 84 |
Unless otherwise indicated.
Percentage of patients with LT values differing ≤5 mm between test occasions.
Percentage of patients with LT values differing ≤10 mm between test occasions.
LT=Lunge Test; MDC90=minimal detectable change at the 90% CI.
For the agreement parameters in Table 2, across all approaches for calculating an LT score, more than 80% of patients had LT scores that differed by ≤10 mm between test occasions. This proportion dropped to less than 70% when the threshold for this difference was ≤5 mm.
Validity
For known-groups validity, LTAff scores were different from LTUnaff scores (t=−13.71, df=52, p<0.001). Mean (SD) values for the first measurement from the testing protocol were 56.8 (38.1) mm and 116.2 (35.0) mm, respectively. The corresponding values for the mean of three measurements from the testing protocol (not reported) were similar.
Correlational validity findings are shown in Table 3. For cross-sectional convergent validity of LTAff, all CIs for Pearson's r spanned the null value. For LTDiff, by contrast, no CIs for Pearson's r spanned the null value, and the point estimates varied from −0.40 to −0.50. Similar findings (not reported) were found for correlations at Time 2. For longitudinal validity, regardless of the approach to measuring the LT, values of r varied from 0.57 to 0.63 for the LEFS and 0.39 to 0.59 for the GFAS, with no CIs spanning the null value. For sensitivity to change, improvement in DF-ROM was associated with average GRC at Time 2. Use of the average GRC was supported by reliable test–retest ratings: ICC3,1=0.91 (95% CI, 0.84–0.95). Across all approaches for generating an LT score, the ES and SRM varied from 0.70 to 0.73 and 0.99 to 1.00, respectively.
Table 3.
Scoring strategy; Pearson correlation coefficient (95% CI) |
||||
---|---|---|---|---|
Affected only (LTAff) |
Unaffected-affected (LTDiff) |
|||
Type of validity | 1st test | Mean of 3 tests | 1st test | Mean of 3 tests |
Cross-sectional* | ||||
LEFS | 0.18 (−0.10–0.43) | 0.18 (−0.10–0.43) | −0.40 (−0.61 to −0.15) | −0.42 (−0.62 to −0.17) |
GFAS | 0.20 (−0.08–0.45) | 0.20 (−0.08–0.45) | −0.47 (−0.66 to −0.23) | −0.50 (−0.68 to −0.27) |
Longitudinal† | ||||
LEFS | 0.59 (0.35–0.76) | 0.57 (0.33–0.74) | 0.59 (0.35–0.76) | 0.63 (0.41–0.78) |
GFAS | 0.41 (0.13–0.63) | 0.39 (0.10–0.62) | 0.55 (0.30–0.73) | 0.59 (0.35–0.75) |
Sensitivity to change‡ | 0.54 (0.29–0.72) | 0.56 (0.31–0.74) | 0.33 (0.03–0.57) | 0.40 (0.11–0.63) |
Correlation between Time 1 values (n=53).
Correlation between change scores (n=43).
Correlation between change scores and Time 2 global ratings of change (n=43).
LT=Lunge Test; LEFS=Lower Extremity Functional Scale; GFAS=American Academy of Orthopaedic Surgeons Global Foot and Ankle Scale.
DISCUSSION
This study found good test–retest reliability of the LT in a sample of patients presenting with orthopaedic ankle dysfunction. We have also shown that the LT provided acceptable agreement findings and evidence supporting the validity of the test.
Reliability and agreement
For test–retest reliability, the current findings are similar to published ICC values noted above, which have been consistently high whether participants had no ankle dysfunction2,15,38 or whether the LTDiff score was used.18
Similarly, published values for the SEM and MDC (3 mm and 8 mm, respectively, for intra-observer reliability)38 are close to those found in the current study. Comparing the MDC90 values in Table 2 shows how quicker approaches to quantifying the LT affect the measure's ability to detect true change. If a single series of lunges is used, the MDC90 is about 1.5 mm larger than that produced by the more time-consuming approach of taking an average of three tests; if a score is obtained from the affected limb only, the MDC90 is about 2.5 mm larger than that produced by measuring both limbs.
In looking further at the MDC90 and agreement findings in Table 2, clinicians could conclude that true change in DF-ROM has occurred when LTAff changes by about 1 cm. The high percentage of patients with test–retest values that differed by ≤1 cm supports the use of this value for the MDC90 when measuring the affected ankle. When the LTDiff score is the variable of interest, clinicians could conclude that true change in DF-ROM has taken place when the LTDiff score changes by about 1.5 cm. Once again, the high percentage of patients with test–retest values that differed by ≤1 cm supports the use of this higher value for the MDC90 when measuring both limbs. These MDC values align well with the fact that tape measures typically mark 1 cm and 0.5 cm increments prominently. Future intervention studies should report the proportion of study participants who achieve these MDC values, to strengthen their clinical utility.
Validity
Our validity results agree with previous reports that performance-based measures and self-report functional measures are, at best, moderately correlated (r<0.60).25,39 For example, Alcock and Stratford25 found a correlation of 0.36 between non-weight-bearing DF-ROM and LEFS scores for people with ankle sprains. In addition, the cross-sectional correlations between LTAff and the LEFS in our study are similar to those reported for patients with an ankle fracture after cast removal (95% CIs for r=0.07–0.21 at 6 weeks and 0.05–0.20 at 6 months).1
As Table 3 shows, the validity findings suggest a measurement strategy that may minimize clinical time spent measuring DF-ROM. The absence of a cross-sectional relationship between LTAff and LEFS and GFAS scores, in the presence of a longitudinal relationship between their change scores, suggests that measurement of LTAff is a valid means of documenting change. The moderate cross-sectional and longitudinal correlations between LTDiff and LEFS and GFAS scores support the cross-sectional and longitudinal convergent validity of the LTDiff, which suggests that clinicians should measure LTDiff at the initial assessment if the goal is to provide a valid measure of the current status of ankle mobility relative to the uninvolved limb. When clinicians seek to document within-limb change over time, longitudinal validity findings support the use of the less time-consuming LTAff.
Our study has several limitations. First, the findings are not generalizable outside of the active adult population presenting with orthopaedic ankle conditions. Second, only individuals who could perform the LT as described by Bennell16 were included; patients with weight-bearing restrictions and those unable to perform the LT are therefore not represented in our results. Third, statements about the validity of the test are made from the perspective that this portion of the study, as a parameter-estimation study, was intended to begin the process of examining LT validity. Fourth, the actual time taken to perform the various LT scoring methods was not measured; future study is warranted to determine the relationship between LT scoring strategies and their completion time in clinical settings.
CONCLUSIONS
Our study has shown that the LT has sound reliability and agreement qualities. Known-groups validity of the test is supported. Cross-sectional convergent validity is supported when both limbs are measured. Longitudinal convergent validity is supported when one or both limbs are measured. A single series of up to five lunges can be used to obtain a reliable LT score.
KEY MESSAGES
What is already known on this topic
The Lunge Test (LT) to measure weight-bearing ankle dorsiflexion is a reliable test; however, there is limited evidence on its validity. It is not known whether a quicker testing protocol than the one described for research settings is valid and reliable.
What this study adds
A single series of up to five lunges can be used to generate a reliable LT score. From a validity perspective, our findings suggest that a LT difference score between a participant's ankles should be measured at the initial assessment, but measures of the affected limb alone can be made for the purpose of documenting clinical change.
Physiotherapy Canada 2012; 64(4);347–355; doi:10.3138/ptc.2011-41
REFERENCES
- 1.Hancock MJ, Herbert RD, Stewart M. Prediction of outcome after ankle fracture. J Orthop Sports Phys Ther. 2005;35(12):786–92. doi: 10.2519/jospt.2005.35.12.786. Medline:16848099. [DOI] [PubMed] [Google Scholar]
- 2.Collins N, Teys P, Vicenzino B. The initial effects of a Mulligan's mobilization with movement technique on dorsiflexion and pain in subacute ankle sprains. Man Ther. 2004;9(2):77–82. doi: 10.1016/S1356-689X(03)00101-2. http://dx.doi.org/10.1016/S1356-689X(03)00101-2. Medline:15040966. [DOI] [PubMed] [Google Scholar]
- 3.Hertel J. Functional anatomy, pathomechanics, and pathophysiology of lateral ankle instability. J Athl Train. 2002;37(4):364–75. Medline:12937557. [PMC free article] [PubMed] [Google Scholar]
- 4.Wikstrom EA, Hubbard TJ. Talar positional fault in persons with chronic ankle instability. Arch Phys Med Rehabil. 2010;91(8):1267–71. doi: 10.1016/j.apmr.2010.04.022. http://dx.doi.org/10.1016/j.apmr.2010.04.022. Medline:20684909. [DOI] [PubMed] [Google Scholar]
- 5.Delahunt E, Monaghan K, Caulfield B. Changes in lower limb kinematics, kinetics, and muscle activity in subjects with functional instability of the ankle joint during a single leg drop jump. J Orthop Res. 2006;24(10):1991–2000. doi: 10.1002/jor.20235. http://dx.doi.org/10.1002/jor.20235. Medline:16894592. [DOI] [PubMed] [Google Scholar]
- 6.Hoch MC, McKeon PO. The effectiveness of mobilization with movement at improving dorsiflexion after ankle sprain. J Sport Rehabil. 2010;19(2):226–32. doi: 10.1123/jsr.19.2.226. Medline:20543222. [DOI] [PubMed] [Google Scholar]
- 7.Hoch MC, McKeon PO. Joint mobilization improves spatiotemporal postural control and range of motion in those with chronic ankle instability. J Orthop Res. 2011;29(3):326–32. doi: 10.1002/jor.21256. http://dx.doi.org/10.1002/jor.21256. Medline:20886654. [DOI] [PubMed] [Google Scholar]
- 8.Croxford P, Jones K, Barker K. Inter-tester comparison between visual estimation and goniometric measurement of ankle dorsiflexion. Physiother Theory Pract. 1998;14(2):107–13. http://dx.doi.org/10.3109/09593989809057153. [Google Scholar]
- 9.Youdas JW, Bogard CL, Suman VJ. Reliability of goniometric measurements and visual estimates of ankle joint active range of motion obtained in a clinical setting. Arch Phys Med Rehabil. 1993;74(10):1113–8. doi: 10.1016/0003-9993(93)90071-h. http://dx.doi.org/10.1016/0003-9993(93)90071-H. Medline:8215866. [DOI] [PubMed] [Google Scholar]
- 10.Diamond JE, Mueller MJ, Delitto A, et al. Reliability of a diabetic foot evaluation. Phys Ther. 1989;69(10):797–802. doi: 10.1093/ptj/69.10.797. Medline:2780806. [DOI] [PubMed] [Google Scholar]
- 11.Elveru RA, Rothstein JM, Lamb RL. Goniometric reliability in a clinical setting. Subtalar and ankle joint measurements. Phys Ther. 1988;68(5):672–7. doi: 10.1093/ptj/68.5.672. Medline:3362980. [DOI] [PubMed] [Google Scholar]
- 12.Jonson SR, Gross MT. Intraexaminer reliability, interexaminer reliability, and mean values for nine lower extremity skeletal measures in healthy naval midshipmen. J Orthop Sports Phys Ther. 1997;25(4):253–63. doi: 10.2519/jospt.1997.25.4.253. Medline:9083944. [DOI] [PubMed] [Google Scholar]
- 13.Van Gheluwe B, Kirby KA, Roosen P, et al. Reliability and accuracy of biomechanical measurements of the lower extremities. J Am Podiatr Med Assoc. 2002;92(6):317–26. doi: 10.7547/87507315-92-6-317. Medline:12070231. [DOI] [PubMed] [Google Scholar]
- 14.Venturini C, Ituassu NT, Teixeira LM, et al. Intrarater and interrater reliability of two methods for measuring the active range of motion for ankle dorsiflexion in healthy subjects. Rev Bras Fisioter. 2006;10:377–81. [Google Scholar]
- 15.Vicenzino B, Prangley I, Martin D. The initial effect of two Mulligan mobilisation with movement treatment techniques on ankle dorsiflexion; A sports medicine odyssey—challenges, controversies and change: proceedings of the Australian Conference of Science and Medicine in Sport; 2001 Oct 23–27; Perth, Australia. Available from: http://fulltext.ausport.gov.au/fulltext/2001/acsms. [Google Scholar]
- 16.Bennell KL, Talbot RC, Wajswelner H, et al. Intra-rater and inter-rater reliability of a weight-bearing lunge measure of ankle dorsiflexion. Aust J Physiother. 1998;44(3):175–80. doi: 10.1016/s0004-9514(14)60377-9. Medline:11676731. [DOI] [PubMed] [Google Scholar]
- 17.Jones R, Carter J, Moore P, et al. A study to determine the reliability of an ankle dorsiflexion weight-bearing device. Physiotherapy. 2005;91(4):242–9. http://dx.doi.org/10.1016/j.physio.2005.04.005. [Google Scholar]
- 18.Vicenzino B, Branjerdporn M, Teys P, et al. The initial effects in posterior talar glide and dorsiflexion after mobilization with movement in individuals with recurrent ankle sprain. J Orthop Sport Phys. 2006;36(7):464–71. doi: 10.2519/jospt.2006.2265. http://dx.doi.org/10.2519/jospt.2006.2265. [DOI] [PubMed] [Google Scholar]
- 19.Hallet G, McEwan I, Thom J. Validation of the dorsiflexion lunge test; Proceedings of the XIV International Congress on Sports Rehabilitation and Traumatology; 2005 Apr 9–10; Bologna, Italy. [Google Scholar]
- 20.Lin CW, Moseley AM, Herbert RD, et al. Pain and dorsiflexion range of motion predict short- and medium-term activity limitation in people receiving physiotherapy intervention after ankle fracture: an observational study. Aust J Physiother. 2009;55(1):31–7. doi: 10.1016/s0004-9514(09)70058-3. http://dx.doi.org/10.1016/S0004-9514(09)70058-3. Medline:19400023. [DOI] [PubMed] [Google Scholar]
- 21.Bonett DG. Sample size requirements for estimating intraclass correlations with desired precision. Stat Med. 2002;21(9):1331–5. doi: 10.1002/sim.1108. http://dx.doi.org/10.1002/sim.1108. Medline:12111881. [DOI] [PubMed] [Google Scholar]
- 22.Binkley JM, Stratford PW, Lott SA, et al. North American Orthopaedic Rehabilitation Research Network. The Lower Extremity Functional Scale (LEFS): scale development, measurement properties, and clinical application. Phys Ther. 1999;79(4):371–83. Medline:10201543. [PubMed] [Google Scholar]
- 23.Johanson NA, Liang MH, Daltroy L, et al. American Academy of Orthopaedic Surgeons lower limb outcomes assessment instruments. Reliability, validity, and sensitivity to change. J Bone Joint Surg Am. 2004;86-A(5):902–9. doi: 10.2106/00004623-200405000-00003. Medline:15118030. [DOI] [PubMed] [Google Scholar]
- 24.Stratford PW, Binkley J, Solomon P, et al. Defining the minimum level of detectable change for the Roland-Morris questionnaire. Phys Ther. 1996;76(4):359–65. doi: 10.1093/ptj/76.4.359. discussion 366–8 Medline:8606899. [DOI] [PubMed] [Google Scholar]
- 25.Alcock GK, Stratford PW. Validation of the lower extremity functional scale on athletic subjects with ankle sprains. Physiother Can. 2002;54(4):233–40. [Google Scholar]
- 26.Jaeschke R, Singer J, Guyatt GH. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10(4):407–15. doi: 10.1016/0197-2456(89)90005-6. http://dx.doi.org/10.1016/0197-2456(89)90005-6. Medline:2691207. [DOI] [PubMed] [Google Scholar]
- 27.Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol. 1997;50(8):869–79. doi: 10.1016/s0895-4356(97)00097-8. http://dx.doi.org/10.1016/S0895-4356(97)00097-8. Medline:9291871. [DOI] [PubMed] [Google Scholar]
- 28.Stratford PW, Binkley JM, Riddle DL, et al. Sensitivity to change of the roland-morris back pain questionnaire: Part 1. Phys Ther. 1998;78(11):1186–96. doi: 10.1093/ptj/78.11.1186. Medline:9806623. [DOI] [PubMed] [Google Scholar]
- 29.Portney LG, Watkins MP. Foundations of clinical research: applications to practice. 2nd ed. Upper Saddle River (NJ): Prentice Hall Health; 2000. [Google Scholar]
- 30.Stratford PW, Goldsmith CH. Use of the standard error as a reliability index of interest: an applied example using elbow flexor strength data. Phys Ther. 1997;77(7):745–50. doi: 10.1093/ptj/77.7.745. Medline:9225846. [DOI] [PubMed] [Google Scholar]
- 31.de Vet HCW, Terwee CB, Knol DL, et al. When to use agreement versus reliability measures. J Clin Epidemiol. 2006;59(10):1033–9. doi: 10.1016/j.jclinepi.2005.10.015. http://dx.doi.org/10.1016/j.jclinepi.2005.10.015. Medline:16980142. [DOI] [PubMed] [Google Scholar]
- 32.Bland JM, Altman DG. Applying the right statistics: analyses of measurement studies. Ultrasound Obstet Gynecol. 2003;22(1):85–93. doi: 10.1002/uog.122. http://dx.doi.org/10.1002/uog.122. Medline:12858311. [DOI] [PubMed] [Google Scholar]
- 33.Pedhauzer EJ, Schmelkin LP. Measurement, design, and analysis: an integrated approach. Hillsdale (NJ): Lawrence Erlbaum; 1991. [Google Scholar]
- 34.Stratford PW, Riddle DL. Assessing sensitivity to change: choosing the appropriate change coefficient. [cited 2011 Aug 16];Health Qual Life Out [Internet] 2005 Apr;3(24) doi: 10.1186/1477-7525-3-23. http://dx.doi.org/10.1186/1477-7525-3-23. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Stratford PW, Spadoni G, Kennedy D, et al. Seven points to consider when investigating a measure's ability to detect change. Physiother Can. 2002;54(1):16–24. [Google Scholar]
- 36.Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care. 1989;27(3 Suppl):S178–89. doi: 10.1097/00005650-198903001-00015. http://dx.doi.org/10.1097/00005650-198903001-00015. Medline:2646488. [DOI] [PubMed] [Google Scholar]
- 37.Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care. 1990;28(7):632–42. doi: 10.1097/00005650-199007000-00008. http://dx.doi.org/10.1097/00005650-199007000-00008. Medline:2366602. [DOI] [PubMed] [Google Scholar]
- 38.Dennis RJ, Finch CF, Elliott BC, et al. The reliability of musculoskeletal screening tests used in cricket. Phys Ther Sport. 2008;9(1):25–33. doi: 10.1016/j.ptsp.2007.09.004. http://dx.doi.org/10.1016/j.ptsp.2007.09.004. Medline:19083701. [DOI] [PubMed] [Google Scholar]
- 39.Finch E, Brooks D, Stratford PW, et al. Physical rehabilitation outcome measures: a guide to enhanced clinical decision making. Toronto: Canadian Physiotherapy Association; 2002. [Google Scholar]