Author manuscript; available in PMC: 2011 Dec 1.
Published in final edited form as: J Rehabil Med. 2010 Apr;42(4):323–331. doi: 10.2340/16501977-0537

Using psychometric techniques to improve the Balance Evaluation System’s Test: the mini-BESTest

Franco Franchignoni 1, Fay Horak 2, Marco Godi 3, Antonio Nardone 3,4, Andrea Giordano 5
PMCID: PMC3228839  NIHMSID: NIHMS239251  PMID: 20461334

Abstract

Objective

To improve, with the aid of psychometric analysis, the Balance Evaluation System’s Test (BESTest), a tool designed to analyse several postural control systems that may contribute to poor functional balance in adults.

Methods

We examined performance of the BESTest in a convenience sample of 115 consecutive adult patients with diverse neurological diagnoses and disease severity, referred to rehabilitation for balance disorders. Factor analysis (both exploratory and confirmatory) and Rasch analysis were used to process the data in order to produce a new, reduced and coherent balance measurement tool.

Results

Factor analysis selected 24 out of the 36 original BESTest items likely to represent the unidimensional construct of ‘dynamic balance’. Rasch analysis was then used to: 1) improve the rating categories, and 2) delete 10 items (misfitting or showing local dependency). The model consisting of the remaining 14 tasks was verified with confirmatory factor analysis to meet the stringent requirements of modern measurement.

Conclusion

The new 14-item scale (dubbed mini-BESTest) focuses on dynamic balance, can be conducted in 10-15 minutes, and contains items belonging evenly to four of the six sections from the original BESTest. Further studies are needed to confirm the usefulness of the mini-BESTest in clinical settings.

Keywords: postural balance, outcome assessment, psychometrics

INTRODUCTION

Assessment of balance and mobility in clinical settings can help determine both risk of falling (1) and the most suitable measures to reduce postural instability (2-3). Laboratory studies have shown that postural control embraces different subdomains, including stability during quiet stance, postural reactions to external disturbances, anticipatory postural adjustments to perturbations caused by self-initiated movements (e.g. lifting an object), and dynamic balance during gait (4). However, until recently clinical balance tests did not systematically evaluate all these subdomains (5-6).

Recently, a new clinical tool for assessing subdomains underlying balance deficits has been presented: the Balance Evaluation Systems Test (BESTest) (7). The BESTest is a comprehensive balance assessment tool developed to identify the postural control systems underlying poor functional balance, so that treatments can be targeted to the specific balance deficit. Since the BESTest encompasses 4-6 items for each of six different balance domains, it takes about 35 minutes to administer compared to only about 15 min for other balance scales (e.g. the Berg Balance Scale, BBS) (8). This is an important shortcoming of the BESTest, limiting its routine use. On the other hand, the main disadvantage of other popular balance scales, including the BBS, is that they do not include important aspects of dynamic balance control such as the capability to react to postural perturbations, to stand on a compliant or inclined surface, or to walk while performing a cognitive task. All these features of balance control are known to be important in assessing balance disorders in different types of patients and reflect balance challenges during activities of daily living (5,7,9). Therefore, there is need for a comprehensive balance assessment tool that can be administered in a short time period.

To develop and validate a new clinical instrument, there is a growing trend to use Rasch analysis (10). Whereas traditional psychometric approaches focus on an instrument’s total score, Item Response Theory (IRT) models, such as the Rasch measurement models, are founded on the probability that a person will make a particular response according to their level of the underlying latent variable. In this framework, it is possible to evaluate how well an item performs in terms of its relevance or contribution for measuring the underlying construct, the level of the underlying construct targeted by the question, the possible redundancy of the item relative to other items in the scale, and the appropriateness of the response categories (11). For these reasons, Rasch analysis has been recommended as a complementary method to assess the scaling properties of new clinical instruments, in addition to the traditional psychometric criteria for disability outcomes research (12).

The purpose of this study was to use both classical psychometric techniques and Rasch analysis to evaluate the BESTest, investigating a wide range of measurement requirements (e.g., dimensionality, quality of the rating categories, construct validity, reliability indexes) in order to improve the structure and measurement qualities of the test. Based on this analysis, we present a new, mini-BESTest that focuses on dynamic balance and can be conducted in 10-15 minutes.

MATERIAL AND METHODS

Patients

One hundred and fifteen patients (53 men and 62 women), with a mean age of 62.7 years (standard deviation, SD, 16), were studied. They represent a convenience sample of patients with balance disorders, recruited with a consecutive sampling method. Diagnoses were as follows: 22 hemiparesis (12 right, 10 left), 21 Parkinson’s disease, 15 neuromuscular diseases, 14 hereditary ataxia, 11 multiple sclerosis, 10 unspecific age-related balance disorders, 7 peripheral vestibular disorders, 6 traumatic brain injury, 4 diffuse encephalopathy, 3 cervical myelopathy and 2 CNS neoplasm. All subjects were inpatients referred to the Scientific Institute of Veruno for rehabilitation assessment and treatment. Inclusion criteria were: ability to walk with or without a cane; absence of severe cognitive or communication impairments; ability to tolerate the balance tasks without fatigue. Prior to taking part in the study, all participants signed the informed consent form that had been approved by the Central Ethics Committee of the ‘Salvatore Maugeri’ Foundation.

Instrument and procedure

The Balance Evaluation Systems Test (BESTest) (7) contains six subscales, covering a broad spectrum of performance tasks: 1) biomechanical constraints, 2) stability limits, 3) transitions and anticipatory postural adjustments, 4) postural responses to perturbation, 5) sensory orientation while standing on a compliant or inclined base of support, 6) dynamic stability in gait with and without a cognitive task (Table 1). The BESTest consists of 27 items but some of them are subdivided into 2-4 subitems (e.g. for left and right sides) for a total of 36 tasks. Each item is scored on a 4-category ordinal scale from 0 (worst performance) to 3 (best performance). Specific patient and rating instructions, and stopwatch and ruler values are used to improve reliability (see www.bestest.us). Patients were rated by a physical therapist (M.G.) with four years of practice experience in balance assessment, who participated in a 1 week training course on the BESTest, at the Balance Disorders Laboratory - Oregon Health & Science University.

Table 1.

Summary of BESTest items and subsystem categories. The 14 items forming the mini-BESTest for dynamic balance are marked with an asterisk (*). Only the worst performance in items 11 ‘Stand on one leg’ and 18 ‘Lateral stepping’ has to be taken into account for the score. Moreover, the performance in item 27 ‘Cognitive Get up and go’ must be compared with that in the baseline item 26.

I Biomechanical Constraints
  1. Base of Support
  2. Alignment
  3. Ankle Strength
  4. Hip Strength
  5. Sit on Floor and Stand Up

II Stability Limits
  6. a. Lateral Lean L
     b. Lateral Lean R
     c. Sitting Verticality L
     d. Sitting Verticality R
  7. Reach Forward
  8. a. Reach L
     b. Reach R

III Anticipatory - Transitions
  9. Sit to Stand *
  10. Rise to Toes *
  11. Stand on one leg (both right and left) *
  12. Alternate Stair Touch
  13. Standing Arm Raise

IV Postural Responses
  14. In-place forward
  15. In-place backward
  16. Stepping forward *
  17. Stepping backward *
  18. Lateral stepping (both right and left) *

V Sensory Orientation
  19. a. Stance EO (firm surface) *
      b. Stance EC (firm surface)
      c. Foam EO
      d. Foam EC *
  20. Incline EC *

VI Dynamic Gait
  21. Gait Natural
  22. Change Speed *
  23. Head Turns *
  24. Pivot Turns *
  25. Obstacles *
  26. Get up and Go
  27. Cognitive Get up and Go *

Legend: L= Left; R= Right; EO= Eyes Open; EC= Eyes Closed

Statistical analysis

Unidimensionality, i.e. whether items are measuring one underlying dimension or several separate dimensions, is one of the key requisites for test analysis and must be verified before applying Rasch models (13). To test the dimensionality of the BESTest, we performed the following statistical steps.

  1. A confirmatory factor analysis for categorical data (CFA, LISREL 8.80 software, Scientific Software International, Inc. Lincolnwood, IL 60712, U.S.A.) was performed to evaluate the fit of the scale to a unidimensional model. The extent to which the model can be used to reproduce the sample data was determined by examining the following indexes: the non-normed fit index (NNFI, also known as the Tucker-Lewis index), the comparative fit index (CFI), the root mean square error of approximation (RMSEA) and the standardized root mean square residual (SRMR). NNFI and CFI scores range from 0 to 1 with higher values indicating better fit: values greater than 0.95 are indicative of an acceptable model fit. An RMSEA value lower than 0.08 reflects an adequate fit and an RMSEA value equal to or less than 0.05-0.06 suggests a good fit. An SRMR value between 0.05 and 0.10 is reflective of an acceptable fit (14-15).

  2. In the event of a poor fit (i.e. multidimensionality is suspected) the following statistical steps were performed sequentially:

    1. Horn’s parallel analysis (16) was used to estimate the number of meaningful dimensions in the response matrix: the size of eigenvalues obtained from principal component analysis (PCA) was compared with those obtained from a randomly generated data set of the same size and number of variables. Only factors with eigenvalues exceeding the values obtained from the corresponding random dataset were retained for further investigation. Parallel analysis was conducted using ViSta (17) Parallel Analysis plugin (http://www.mdp.edu.ar/psicologia/vista/)

    2. Exploratory factor analysis (EFA, STATA 10.1 software, StataCorp LP College Station TX 77845 U.S.A.) was performed with a principal factor analysis using the number of factors suggested by the parallel analysis. After varimax rotation, the relationships between the test items and retained factors were taken into account. For a solution that is stable and approximates the population pattern, given the sample size, only items with loading > 0.50 were considered as correlated to the factors (18).

    3. Item exclusion, based on the EFA results and expert review, was performed leading to a preliminary reduced set of test items.
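The parallel analysis step described above can be illustrated with a minimal sketch. It is not the ViSta plugin used in the study: the function name `parallel_analysis`, the use of Pearson correlations on normally distributed random data, the 95th-percentile threshold and the number of random replications are all assumptions made for illustration only.

```python
import numpy as np

def parallel_analysis(data, n_iter=200, percentile=95, seed=0):
    """Horn-style parallel analysis on a persons-by-items score matrix.

    Returns (n_factors, observed_eigenvalues, random_thresholds).
    Assumption: Pearson correlations and normal random data are used here
    as a simplification; ordinal item scores may call for other choices.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    # Eigenvalues from random data sets of the same size and number of variables.
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        noise = rng.standard_normal((n, p))
        corr = np.corrcoef(noise, rowvar=False)
        random_eigs[i] = np.sort(np.linalg.eigvalsh(corr))[::-1]
    thresholds = np.percentile(random_eigs, percentile, axis=0)
    # Retain the leading components whose eigenvalue exceeds the random one.
    exceeds = observed > thresholds
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    return n_factors, observed, thresholds
```

Applied to a 115 x 36 matrix of item scores, the first returned value would be the number of dimensions to carry forward into the exploratory factor analysis.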

Following the above analysis and item exclusion, the matrix of item responses of the 24 retained items for each subject underwent Rasch analysis using the WINSTEPS software (Linacre JM. WINSTEPS Rasch measurement computer program - version 3.68. Chicago: Winsteps.com; 2009) (19).

As a first step, we investigated whether the rating scale of each BESTest item was used in the expected manner. We evaluated the rating scale categories (partial credit model) using criteria suggested by Linacre (20, 21): a) at least 10 observations per response option; b) even distribution of category use; c) monotonic increase in both average measures of persons with a given score/category and thresholds (thresholds - or step calibrations - are the ability levels at which the response to either of two adjacent categories is equally likely); d) category outfit mean square (MnSq) values less than 2 (see below); and e) threshold differences larger than 1.4 and lower than 5 logits. We collapsed categories following these guidelines, and compared different collapsing solutions, examining not only the category diagnostics, but also reliability indices. We were guided by the intention to select a solution that maximized statistical indices and clinical meaningfulness.

After this rating scale modification, a new Rasch analysis was performed, including PCA on the standardized residuals to evaluate: i) the presence of sub-dimensions, as an independent confirmation of the unidimensionality of the scale, and ii) the local independence of items.

  1. ‘Unidimensionality’ assumes that - after removal of the trait that the scale intended to measure (the ‘Rasch factor’) - the residuals will be uncorrelated and normally distributed (i.e. there are no principal components) (19). The following criteria were used to determine whether additional factors were likely to be present in the residuals: a) a cutoff of 50% of the variance explained by the Rasch factor; and b) eigenvalue of the first residual factor smaller than 3 (19).

  2. ‘Local independence’ between items indicates that they neither duplicate some feature of each other nor both incorporate some shared dimension. Item pairs with a standardized residual correlation > 0.30 were considered as possibly dependent components (22). Based on examination of the respective item information functions and expert judgement, we progressively eliminated all dependencies, either by removing one of the items or, in the case of dependent items related to the same task performed in different directions (e.g. scores assessing right and left sides), by collapsing the items into a new one reporting only the worst performance.
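As an illustration of the local-independence screen, the sketch below flags item pairs whose standardized residuals correlate above the 0.30 cutoff. It assumes the residuals have already been exported from a Rasch program; the function names and the residual values in the usage example are invented for illustration and are not study data.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def dependent_pairs(residuals, cutoff=0.30):
    """Return (item_a, item_b, r) for every pair with residual correlation > cutoff.

    residuals: dict mapping an item label to its standardized residuals,
    one value per person (hypothetical input layout).
    """
    flagged = []
    for a, b in combinations(residuals, 2):
        r = pearson(residuals[a], residuals[b])
        if r > cutoff:
            flagged.append((a, b, round(r, 2)))
    return flagged

# Made-up residuals for three items and five persons.
example = {
    "11a": [1.0, -1.0, 2.0, -2.0, 0.5],
    "11b": [0.9, -1.1, 2.1, -1.8, 0.4],
    "23": [-1.0, 1.0, 0.5, 0.2, -0.7],
}
print(dependent_pairs(example))
```

In this toy example only the left and right versions of the same task are flagged, which is the situation handled in the study by keeping the worst of the two performances.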

Internal validity of the scale was assessed by evaluating the fit of individual test items to determine if the pattern of item difficulty was consistent with the model predictions. We estimated the goodness-of-fit of the observed data to data predicted by the Rasch model (23, 24). Information-weighted (infit) and outlier-sensitive (outfit) mean-square statistics (MnSq) for each item were calculated to test if there were items that did not fit the model expectancies. Both of these fit statistics are expected to approach 1 if the data fit the model. In accordance with the literature (10), we considered MnSq >0.7 and <1.3 as an indicator of acceptable fit.
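The two fit statistics can be sketched as follows, using their usual definitions: outfit is the mean of the squared standardized residuals, and infit is the sum of squared residuals divided by the sum of the model variances. The function names are hypothetical, and the expected scores and variances are assumed to come from an already fitted Rasch model.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item.

    observed: scores given to each person on the item
    expected: model-expected score for each person
    variance: model variance of each person's score on the item
    """
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(sq_resid)
    # Infit: information-weighted version, less sensitive to outliers.
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit

def acceptable_fit(mnsq, low=0.7, high=1.3):
    """Apply the MnSq window used in this study (>0.7 and <1.3)."""
    return low < mnsq < high
```

Values well above 1 signal underfit (unexpectedly high variability) and values well below 1 signal overfit (an overly predictable pattern).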

We also estimated the level of difficulty of each item (‘item difficulty’) and the ability of each individual subject, and then we examined the data for floor and ceiling effects. Item difficulty and subject ability are expressed - on a common interval scale - in logit units, a logit being the natural logarithm of the ratio (odds) of mutually exclusive alternatives (e.g. pass vs. fail, or higher vs. lower response option) (23, 24). Logit-transformed measures represent linear measures. By convention, 0 logit was ascribed to the mean item difficulty. For Rasch analysis, a sample size of more than 100 persons will estimate item difficulty with an alpha of 0.05 within ± 0.5 logits (25).
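To make the logit scale concrete, the sketch below uses the dichotomous Rasch model, a simplification of the partial credit model applied in this study. The function names are illustrative.

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model.

    Both arguments are in logits; the result depends only on their difference.
    """
    return 1 / (1 + math.exp(-(ability - difficulty)))

# A person located exactly at an item's difficulty has a 50% chance of success.
print(rasch_probability(0.15, 0.15))
```

Because person ability and item difficulty share one scale, a difference of 1 logit always corresponds to the same change in the odds of success, wherever it occurs along the scale.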

Reliability was evaluated in terms of “separation” across test items, defined as the ratio of the true spread of the measures to their measurement error (23, 24). Two indexes were calculated: the item separation index and the person separation index, that give an estimate (in standard error units) of the spread or “separation” of items and persons along the measurement construct, respectively. A separation of 2.0 is considered good (24). Related indexes are the reliability of the item separation index and of the person separation index. These provide the degree of confidence that can be placed in the consistency of the estimates. This confidence ranges from 0 to 1, and coefficients >0.80 and >0.90 are considered respectively good and excellent (23).
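Separation (G) and separation reliability (R) are two expressions of the same information. A minimal sketch, assuming the conventional Rasch relationship R = G² / (1 + G²), which is not stated explicitly in the text; the function names are illustrative.

```python
import math

def separation_to_reliability(g):
    """Convert a separation index G into a reliability coefficient R."""
    return g ** 2 / (1 + g ** 2)

def reliability_to_separation(r):
    """Convert a reliability coefficient R (0 <= R < 1) into a separation index G."""
    return math.sqrt(r / (1 - r))

# A separation of 2.0 corresponds to a reliability of 0.80.
print(separation_to_reliability(2.0))
```

Under this relationship, a separation of 2.0 and a reliability of 0.80 describe the same "good" threshold.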

RESULTS

The confirmatory factor analysis (CFA), using all the items in the BESTest, gave an inadequate fit (NNFI= 0.91, CFI= 0.91, RMSEA= 0.12, SRMR= 0.15). Horn’s Parallel Analysis (PA) revealed three factors with empirical eigenvalues exceeding those from the random data. These three factors explained 43%, 11%, and 8% of the variance, respectively. To investigate the contribution of each item to the scale, we tested the three-factor model suggested by PA using exploratory factor analysis for ordinal data (EFA) with a principal axis factor extraction method. After varimax rotation, 24 items loaded > 0.50 on the first factor, 4 items (6a-d) on the second factor, and 3 items (7, 8a and 8b) on the third factor, while items 1-4 and 13 failed to load meaningfully on any factor.

Taking into account these results and expert opinion, 12 items (1-4, 6a-d, 7, 8a-b, 13) were deemed as not belonging to the main trait and therefore were dropped from subsequent analyses. The expert review judged the remaining 24 items to potentially measure a factor likely to represent “dynamic balance” in a variety of functional conditions. These 24 items underwent Rasch analysis.

Rating scale diagnostics showed that the 0-3 level rating categories did not comply with our pre-set criteria for category function. The model best meeting the criteria reduced the rating scale from 4 to 3 levels by combining categories 0 (absent) and 1 (mild) or 1 (mild) and 2 (moderate) (Table 2), with different collapsing strategies used across items.

Table 2.

Mean difficulty estimates for each of the 14 items of the miniBESTest with standard errors (S.E.) and infit and outfit mean-square statistics (MnSq). The more difficult the item estimate, the less likely it is for any subject to gain a high score. Alongside each item is its number in the original BESTest (see Table 1). The rating scale column shows how the 4 scaling categories were collapsed into 3 categories, e.g. 0012 means that categories 0 and 1 have been collapsed and then the remaining three categories have been re-numbered accordingly.

ITEM                                        Mean difficulty  S.E.  Infit MnSq  Outfit MnSq  Rating scale
11 a/b - Stand on L/R leg                    2.43            0.25  0.90        1.07         0112
18 a/b - Postural Stepping L/R               1.10            0.22  0.84        0.76         0112
23 - Head turns                              1.00            0.19  0.91        0.83         0012
17 - Postural Stepping backward              0.93            0.22  0.97        1.08         0112
27 - Timed “Get Up and Go” with dual task    0.77            0.24  1.07        1.08         0112
10 - Rise to toes                            0.65            0.20  0.94        1.11         0012
19d - Foam Surface EC                        0.54            0.20  1.04        1.12         0112
25 - Obstacles                               0.10            0.21  0.75        0.73         0112
16 - Postural Stepping forward              -0.03            0.21  1.14        1.23         0112
20 - Incline EC                             -0.64            0.21  1.12        1.00         0112
24 - Pivot turns                            -0.85            0.21  0.99        1.32         0112
22 - Change speed                           -1.00            0.20  0.89        0.78         0112
9 - Sit to stand                            -1.78            0.24  1.30        1.32         0012
19a - Stance EO                             -2.51            0.39  1.12        0.66         0012
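The rating scale codes in Table 2 can be read as a lookup from the original 0-3 category to the collapsed 0-2 category: the position in the string is the original score and the digit at that position is the new score. A minimal sketch, with the hypothetical names `RATING_SCALE` and `collapse`; the codes themselves are copied from Table 2.

```python
# Recoding string per original BESTest item number, as listed in Table 2.
RATING_SCALE = {
    "9": "0012", "10": "0012", "11": "0112", "16": "0112", "17": "0112",
    "18": "0112", "19a": "0012", "19d": "0112", "20": "0112", "22": "0112",
    "23": "0012", "24": "0112", "25": "0112", "27": "0112",
}

def collapse(original_score, code):
    """Map an original 0-3 score onto the collapsed 0-2 scale."""
    return int(code[original_score])

# Item 23 ('Head turns') merges categories 0 and 1; item 11 merges 1 and 2.
print([collapse(s, RATING_SCALE["23"]) for s in range(4)])
print([collapse(s, RATING_SCALE["11"]) for s in range(4)])
```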

After combining these rating scale categories, 22 out of the 24 items fitted the underlying construct of dynamic balance that the scale was intended to measure (infit and outfit MnSq between 0.7 and 1.3). Item 5 ‘Sit on floor and stand up’ was underfitting (i.e. with unexpectedly high variability) and item 26 ‘Get up and go’ was overfitting (i.e. with an overly predictable pattern), so they were eliminated. The PCA of standardized residuals showed several high (> 0.30) residual correlations between items. Based on examination of the respective item information functions and expert judgment, all misfitting items and residual correlations > 0.30 were eliminated one by one, and the Rasch analysis was rerun. Correlated (redundant) items were removed either by deleting one of them, or by maintaining only the worst performance in items 11 and 18, which assessed the same task on both right and left side. At the end of these iterations, only 14 test items remained. This set of items (called the mini-Balance Evaluation Systems Test of dynamic balance, mini-BESTest) (see Table 1) underwent further analyses.

All of the final 14 items showed good infit and outfit MnSq values (Table 2). The variance explained by the estimated Rasch measures was 58.8%, whereas only 5.3% of the variance was explained by the first residual factor (eigenvalue 1.8). Regarding the hierarchic ordering of items, Figures 1 and 2 show, according to the Rasch model, the distribution of subject ability and item difficulty. Item difficulty showed a fairly even spread (from the easiest item ‘Stand with eyes open on a firm surface’ to the most difficult item ‘Stand on one leg’), and subject ability presented a normal distribution spanning from −5 to +4.9 logits, with an average measure of +0.15 (mean S.E. 0.59). Only two subjects showed extreme maximum scores: the precision of their ability estimates was quite low, the S.E. being about 30% of the corresponding measure. No floor effect was found. Overall, these findings demonstrate an adequate sample-item distribution. The item difficulty estimates spanned from −4 to +2.5 logits. The reliability indices of the mini-BESTest were as follows: Item separation index = 7.35 and Item separation reliability = 0.98; Person separation index = 2.50 and Person separation reliability = 0.86.

Figure 1.


Subject-ability and item-difficulty maps of the mini-BESTest (n=115). In both maps, the vertical line represents the measure of the variable, in linear logit units. The left-hand column locates each patient’s ability, from best to worst dynamic balance. The right-hand column locates each item’s relative difficulty for this sample (for each item, the difficulty estimate represents the mean calibration of the threshold parameters according to the partial credit model). From bottom to top, measures indicate better balance for patients and higher difficulty for items. By convention, the average difficulty of items in the test is set at 0 logits (and indicated with M’) and patients with average ability are located at M.

Figure 2.


Expected scores for the mini-BESTest (n=115). Distance between points is equal-interval. Logit measure at top of key, centered at the mean item difficulty. The rating scale is collapsed from 4 to 3 categories renumbered 0 (severely impaired), 1 (moderately impaired), 2 (normal). The threshold between adjacent categories is marked by ‘:’. At the bottom is the distribution of the person measures (subject ability): each marker is a single person.

A final CFA confirmed the unidimensionality of the mini-BESTest, supporting the unidimensional model with the following indexes: NNFI= 0.98, CFI= 0.99, RMSEA= 0.064, SRMR= 0.098. The final version of the mini-BESTest is shown in the Appendix.
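The comparison of the two CFA runs against the cutoffs given in the Statistical analysis section can be written as a small check. The function name `check_fit` is hypothetical; the thresholds are those stated in the Methods (NNFI and CFI > 0.95, RMSEA < 0.08, SRMR < 0.10).

```python
def check_fit(nnfi, cfi, rmsea, srmr):
    """Return, per index, whether the value meets the acceptability cutoff."""
    return {
        "NNFI": nnfi > 0.95,
        "CFI": cfi > 0.95,
        "RMSEA": rmsea < 0.08,
        "SRMR": srmr < 0.10,
    }

# Full 36-task BESTest versus the final 14-item mini-BESTest (values from the Results).
print(check_fit(0.91, 0.91, 0.12, 0.15))
print(check_fit(0.98, 0.99, 0.064, 0.098))
```

The full BESTest fails all four criteria, whereas the mini-BESTest meets all four.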

DISCUSSION

The original BESTest is composed of a comprehensive battery of 36 balance tasks, developed to analyse six different postural control systems that may contribute to poor functional balance in adults of any age (7). Thus, it is not surprising that this test failed to meet a unidimensionality assumption (i.e. that a single dimension underlies all item responses), when applied to 115 patients with a wide range of diagnoses and severity of disease.

Our dimensionality assessment extracted from the test battery 24 items assumed to define ‘dynamic balance’. On these items we performed an analysis of category and item properties using Rasch psychometric methods, which led to the definition of the 14 most psychometrically useful and practical items: the refined mini-BESTest measures the unidimensional construct of ‘dynamic balance’ without redundant items or significant ceiling/floor effects (26) and takes 10-15 minutes to administer.

The rating scale diagnostics (21) performed on the 24 items retained after EFA showed that the original 4 levels were redundant (23). This finding was expected, since some BESTest items were borrowed (with modifications) from the Berg Balance Scale (BBS) and the Dynamic Gait Index. These two well-known balance and mobility scales have been shown to include sub-optimal category functioning (27, 28) when strict diagnostic criteria are applied (20). In addition, it has already been demonstrated that the BBS (and other balance scales) show essentially identical psychometric properties – including responsiveness – when used with a 3-category, instead of a 4- or 5-category rating scale (29). Appropriate combination of levels 0-1 or 1-2 eliminated underutilized rating categories, and ensured that each rating category was distinct from the others in representing a distinct balance ability.

After collapsing the categories to three distinct levels, the data from the 24-item set were reanalyzed to calculate fit statistics and the PCA of the residuals. This analysis enabled us to eliminate 10 misfitting or redundant items without loss of measurement information and with the great advantage of improving test acceptability and feasibility. For the remaining 14 items (the mini-BESTest), we calculated fit statistics, extracted Rasch-modeled parameters of ability and difficulty, and then examined internal validity and test reliability. The average ability of this group of patients (+0.15 logits) was very close to the mean item difficulty of 0 logits, which means that the test is well targeted to the sample. Moreover, the person-ability and item-difficulty map showed a broad range for both person ability and item difficulty (see Figure 1). The 1.7% of subjects (2/115) with extreme maximum scores (the two “X” at the top of the left-hand column in Figure 1) constituted a minor trend toward a ceiling effect in very highly functioning subjects. No floor effect was found. However, one should interpret the extreme results with caution, since these person measures have the least precision due to their larger errors of measurement. On the other hand, the high item separation reliability indicates that great confidence can be placed in the consistency of the item difficulty estimates across future samples.

Content validity of the dynamic miniBESTest is high since many items included in the test are part of well-known balance batteries: a) ‘Sit to stand’ is from the Berg Balance Scale (30) and the Performance-Oriented Mobility Assessment (31); b) ‘Stand on one leg’ is from the Ataxia Test Battery (32) and the Berg Balance Scale; c) ‘Stance – eyes open’ and ‘Stance on foam – eyes closed’ are from the modified Clinical Test of Sensory Integration of Balance (33, 34); d) Gait when balance is challenged by changing speed, head rotations, pivot turns, or stepping over obstacles comes from the Dynamic Gait Index (35); e) the ‘Get Up and Go’ test (36) and the ‘Get up and Go with a simultaneous cognitive task’ (37) are stand-alone tests. In the BESTest, Horak et al. (7) made only minor modifications to some of the above original items, in order to increase their challenge and improve their consistency and reliability. Novel items in the mini-BESTest have been adapted from laboratory tests where they were shown to distinguish different types of balance disorders: a) postural reactions to external perturbations (38); b) rise to toes (39); and c) stance on an inclined surface with eyes closed (40).

As an additional demonstration of the internal construct validity of the scale, the general hierarchic arrangement found by Rasch analysis (Table 2) is consistent with clinical expectations. For example, the maintenance of feet-together stance, eyes open on a firm surface (‘Stance EO’) is the easiest task and ‘Stand on one leg’ the most difficult task item (28). In fact, ‘Stance EO’ makes few sensory demands and requires low effort, whereas ‘Stand on one leg’ is very challenging because of the narrow base of support and musculoskeletal demands. In addition, the results of Rasch analysis of the mini-BESTest show a hierarchical order of item difficulty: ‘Gait with horizontal head turns’, ‘Stand on one leg’, and ‘Lateral stepping responses’ were the most difficult items, whereas ‘Stance EO’ and ‘Sit to Stand’ were the easiest items. The high difficulty of the item ‘Gait with horizontal head turns’ may be attributed to vestibular influences (35) and is in line with the results of the two Rasch studies on the Dynamic Gait Index (28, 41).

The mini-BESTest contains 14 items belonging evenly to four of the six sections from the original BESTest (Table 1): section III ‘Anticipatory Postural Adjustments’ (sit to stand, rise to toes, stand on one leg); section IV ‘Postural Responses’ (stepping in four different directions); section V ‘Sensory Orientation’ (stance - eyes open; foam surface - eyes closed; incline - eyes closed); and section VI ‘Balance during Gait’ (gait with change of speed, head turns, pivot turns, obstacles; timed ‘Get Up and Go’ with dual task).

Our factor analysis procedure (42) isolated a number of items, primarily in the first two sections of the BESTest, that did not contribute to the dominant trait (dynamic balance), suggesting that parts I ‘Biomechanical constraints’ and II ‘Stability limits’ of the BESTest warrant separate psychometric studies. Biomechanical constraints (such as orthopedic limitations on the base of foot support, postural alignment and strength) and stability limits (ability to lean to perceived limits of stability and perception of verticality) are also important facets of postural control but appear to be independent of the construct ‘dynamic balance’.

This study has several limitations that restrict the generalization of our results to different groups, settings, and raters. In particular, the selection criteria of our convenience sample (recruited with a consecutive sampling method) may represent a threat to external validity. Our sample was a cross-section of adults drawn from a single rehabilitation facility and with balance disorders of very different origins and severities. Moreover, we used only one rater, although, to improve the reliability of results, he participated in a 1-week training course on the BESTest, held by one of its developers (FBH).

In conclusion, the new mini-BESTest offers a unique, brief clinical rating scale for dynamic balance with excellent psychometric characteristics. The potential interest of the mini-BESTest in clinical settings is high, but further studies are needed. They should include: a) analysis of the actual performance of the new 3-level response structure; b) a study of differential item functioning, i.e. the stability of item hierarchy across sub-samples defined according to potentially relevant clinical criteria; c) relation of the scores to fall risk and to other clinical tests of balance; and d) age-related normative values.

Acknowledgments

Fay Horak was supported by a grant from the National Institute on Aging, AG-06457 (U.S.A.).

APPENDIX

MINI-BESTest of DYNAMIC BALANCE - Balance Evaluation System’s Test © 2009

Subjects should be tested with flat-heeled shoes, OR shoes and socks off. If subject must use an assistive device for an item, score that item one category lower. If subject requires physical assistance to perform an item, score the lowest category (0) for that item.
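The two adjustment rules above, together with the worst-side rule that Table 1 gives for ‘Stand on one leg’ and ‘Lateral stepping’, can be sketched as follows. The function names are hypothetical, and flooring the assistive-device penalty at 0 is an assumption, since the form does not say what happens when the observed category is already the lowest.

```python
def item_score(observed, assistive_device=False, physical_assistance=False):
    """Apply the general scoring rules to one item rated 0-2."""
    if physical_assistance:
        # Physical assistance always yields the lowest category.
        return 0
    if assistive_device:
        # One category lower; assumed not to go below 0.
        return max(observed - 1, 0)
    return observed

def bilateral_score(left, right):
    """Keep only the worst performance for items tested on both sides."""
    return min(left, right)

# A subject rated 2 while using a cane is scored 1.
print(item_score(2, assistive_device=True))
```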

  1. SIT TO STAND

    • (2) Normal: Comes to stand without use of hands and stabilizes independently.

    • (1) Moderate: Comes to stand with use of hands on first attempt.

    • (0) Severe: Impossible to stand up from chair without assistance, OR several attempts with use of hands.

  2. RISE TO TOES

    • (2) Normal: Stable for >3 s with maximum height.

    • (1) Moderate: Heels up, but not full range (smaller than when holding hands), OR noticeable instability for >3 s.

    • (0) Severe: ≤ 3 s.

  3. STAND ON ONE LEG

    Left: Time in Sec. Trial 1: _____ Trial 2: _____

    • (2) Normal: 20 s.

    • (1) Moderate: < 20 s.

    • (0) Severe: Unable.

    Right: Time in Sec. Trial 1: _____ Trial 2: _____

    • (2) Normal: 20 s.

    • (1) Moderate: < 20 s.

    • (0) Severe: Unable.

  4. COMPENSATORY STEPPING CORRECTION - FORWARD

    • (2) Normal: Recovers independently a single, large step (second realignment step is allowed).

    • (1) Moderate: More than one step used to recover equilibrium.

    • (0) Severe: No step, OR would fall if not caught, OR falls spontaneously.

  5. COMPENSATORY STEPPING CORRECTION - BACKWARD

    • (2) Normal: Recovers independently a single, large step.

    • (1) Moderate: More than one step used to recover equilibrium.

    • (0) Severe: No step, OR would fall if not caught, OR falls spontaneously.

  6. COMPENSATORY STEPPING CORRECTION - LATERAL

    Left

    • (2) Normal: Recovers independently with 1 step (crossover or lateral OK).

    • (1) Moderate: Several steps to recover equilibrium.

    • (0) Severe: Falls, or cannot step.

    Right

    • (2) Normal: Recovers independently with 1 step (crossover or lateral OK).

    • (1) Moderate: Several steps to recover equilibrium.

    • (0) Severe: Falls, or cannot step.

  7. EYES OPEN, FIRM SURFACE (FEET TOGETHER) Time in Sec:________

    • (2) Normal: 30s.

    • (1) Moderate: < 30s.

    • (0) Severe: Unable.

  8. EYES CLOSED, FOAM SURFACE (FEET TOGETHER) Time in Sec:________

    • (2) Normal: 30s.

    • (1) Moderate: < 30s.

    • (0) Severe: Unable.

  9. INCLINE - EYES CLOSED (TOES UP) Time in Sec:________

    • (2) Normal: Stands independently 30 s and aligns with gravity.

    • (1) Moderate: Stands independently <30 s, OR aligns with surface.

    • (0) Severe: Unable to stand >10 s, OR will not attempt independent stance.

  10. CHANGE IN GAIT SPEED

    • (2) Normal: Significantly changes walking speed without imbalance.

    • (1) Moderate: Unable to change walking speed, OR signs of imbalance.

    • (0) Severe: Unable to achieve significant change in speed AND signs of imbalance.

  11. WALK WITH HEAD TURNS – HORIZONTAL

    • (2) Normal: performs head turns with no change in gait speed and good balance.

    • (1) Moderate: performs head turns with reduction in gait speed.

    • (0) Severe: performs head turns with imbalance.

  12. WALK WITH PIVOT TURNS

    • (2) Normal: Turns with feet close, FAST (≤ 3 steps) with good balance.

    • (1) Moderate: Turns with feet close SLOW (≥ 4 steps) with good balance.

    • (0) Severe: Cannot turn with feet close at any speed without imbalance.

  13. STEP OVER OBSTACLES

    • (2) Normal: Able to step over 2 stacked shoe boxes with minimal change of speed and with good balance.

    • (1) Moderate: Steps over shoe boxes but touches box, OR displays cautious behavior by slowing gait.

    • (0) Severe: Cannot step over shoe boxes, OR stops, OR steps around box.

  14. TIMED UP & GO WITH DUAL TASK Single Task: ______sec; Dual Task: ______sec

    • (2) Normal: No noticeable change between sitting and standing in the rate or accuracy of backwards counting and no change in gait speed compared to Timed Up and Go without cognitive task.

    • (1) Moderate: The dual task affects either the cognitive task or walking (slower walking than without the dual task).

    • (0) Severe: Can’t count backward while walking or stops walking while talking.
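
As a rough illustration of how the sheet could be totalled, the sketch below sums the 14 item ratings. With each item rated 0 to 2 and contributing one rating, the raw sum would range from 0 to 28. The sheet does not state how the left and right ratings of items 3 and 6 are combined into one item rating; taking the lower (worse) side is an assumption made here, and all names are hypothetical.

```python
N_ITEMS = 14          # items on the mini-BESTest sheet
MAX_PER_ITEM = 2      # 3-level rating: 0, 1, 2


def combine_sides(left: int, right: int) -> int:
    """Collapse a left/right pair into one item rating.

    Assumption (not stated on the sheet): use the worse side.
    """
    return min(left, right)


def total_score(item_ratings) -> int:
    """Sum 14 item ratings, each 0, 1 or 2, into a raw total."""
    ratings = list(item_ratings)
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(ratings)}")
    if any(r not in (0, 1, 2) for r in ratings):
        raise ValueError("each item rating must be 0, 1 or 2")
    return sum(ratings)


if __name__ == "__main__":
    ratings = [2] * N_ITEMS
    ratings[2] = combine_sides(left=2, right=1)   # item 3, stand on one leg
    ratings[5] = combine_sides(left=1, right=1)   # item 6, lateral stepping
    print(total_score(ratings), "out of", N_ITEMS * MAX_PER_ITEM)
```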

INSTRUCTIONS

1. SIT TO STAND
Examiner Instructions: Note the initiation of the movement, and whether the patient uses their hands on the arms of the chair or on their thighs, or thrusts their arms forward. Patient: Cross arms across your chest. Try not to use your hands unless you must. Don’t let your legs lean against the back of the chair when you stand. Please stand up now.
2. RISE TO TOES
Examiner Instructions: Allow the patient to try it twice. Record the best score. (If you suspect that the subject is using less than their full height, ask them to rise up while holding the examiner’s hands.) Make sure subjects look at a non-moving target 4-12 ft / 1.2-3.7 m away. Patient: Place your feet shoulder width apart. Place your hands on your hips. Try to rise as high as you can onto your toes. I’ll count out loud to 3 seconds. Try to hold this pose for at least 3 seconds. Look straight ahead. Rise now.
3. STAND ON ONE LEG
Examiner Instructions: Allow the patient two attempts and record the best. Record the number of seconds they can hold the posture, up to a maximum of 30 s. Stop timing when the subject moves their hands off their hips or puts a foot down. Make sure subjects look at a non-moving target 4-12 ft / 1.2-3.7 m ahead. Patient: Look straight ahead. Keep your hands on your hips. Bend one leg behind you. Don’t touch your raised leg on your other leg. Stay standing on one leg as long as you can. Look straight ahead. Lift now.
Repeat other side.
4. STEPPING - FORWARD
Examiner Instructions: Stand in front of and to the side of the patient with one hand on each shoulder and ask them to push forward. Make sure there is room for them to step forward. Require them to lean until their shoulders and hips are in front of their toes. Suddenly release your push when the subject is in place and providing constant pressure to a level just before the heels lift off. The test must elicit a step. NOTE: Be prepared to catch patient. Patient: Stand with your feet shoulder width apart, arms at your sides. Lean forward against my hands beyond your forward limits. When I let go, do whatever is necessary, including taking a step, to avoid a fall.
5. STEPPING - BACKWARD
Examiner Instructions: Stand behind and to the side of the patient with one hand on each scapula and ask them to push backward. Make sure there is room for them to step backward. Require them to lean until their shoulders and hips are in back of their heels. Release your push when the subject is in place and providing constant pressure to a level just before the heels lift off. The test must elicit a step. NOTE: Be prepared to catch patient. Patient: Stand with your feet shoulder width apart, arms down at your sides. Lean backward against my hands beyond your backward limits. When I let go, do whatever is necessary, including taking a step, to avoid a fall.
6. STEPPING - LATERAL
Examiner Instructions: Stand behind the patient, place one hand on either the right (or left) side of the pelvis, and get them to lean their whole body into your hand. Require them to lean until the midline of pelvis is over the right (or left) foot and then suddenly release your hold. NOTE: Be prepared to catch patient. Patient: Stand with your feet together, arms down at your sides. Lean into my hand beyond your sideways limit. When I let go, step if you need to, to avoid a fall.
7. STANCE – EYES OPEN ; 8. STANCE ON FOAM – EYES CLOSED
Examiner Instructions: Do the tests in order. Record the time the patient was able to stand in each condition to a maximum of 30 s. Repeat condition if not able to stand for 30 s and record both trials (average for category). In # 8, use medium density Temper® foam, 4” / 10 cm thick. Assist subject in stepping onto foam. Have the subject step off the foam between trials. Include leaning or hip strategy during a trial as “instability” (31). Patient: For the next 2 assessments, you’ll either be standing on the normal ground (# 7) or on this foam (# 8), with your eyes open or closed. Place your hands on your hips. Place your feet together until almost touching. Look straight ahead. Each time, stay as stable as possible until I say stop.
9. INCLINE - EYES CLOSED
Examiner Instructions: Aid the patient onto the ramp. Once the patient closes their eyes, begin timing; record both times and average them. Note if sway is greater than when standing on firm, level surface with eyes closed or if there is poor alignment to vertical. Assist includes a cane or light touch any time during the trial. Patient: I will be timing this next assessment. Please stand on the incline ramp with your toes toward the top. Place your feet shoulder width apart. Keep arms at your sides. Place your hands on your hips. I will start timing when you close your eyes.
10. Change in Speed
Examiner Instructions: Allow the patient to take 3-5 steps at their normal speed, and then say “fast”; after 3-5 fast steps, say “slow”. Allow 3-5 slow steps before they stop walking. Patient: Begin walking at your normal speed. When I tell you “fast”, walk as fast as you can. When I say “slow”, walk very slowly.
11. Walk With Head Turns – Horizontal
Examiner Instructions: Allow the patient to reach their normal speed, and give the commands “right, left” every 3-5 steps. Score if you see a problem in either direction. If patient has severe cervical restrictions allow combined head and trunk movements (en bloc). Patient: Begin walking at your normal speed, when I say “right”, turn your head and look to the right. When I say “left” turn your head and look to the left. Try to keep yourself walking in a straight line.
12. Walk With Pivot Turns
Examiner Instructions: Demonstrate a pivot turn. Once the patient is walking at normal speed, say “turn and stop”. Count the steps from “turn” until the subject is stable. Instability may be indicated by wide stance width, extra stepping or trunk motion. Patient: Begin walking at your normal speed. When I tell you to “turn and stop”, turn as quickly as you can to face the opposite direction and stop. After the turn, your feet should be close together.
13. Step over obstacle
Examiner Instructions: Place the 2 stacked boxes (9” / 23 cm height) 10 ft / 3 m away from where the patient will begin walking. Use a stopwatch to time gait duration to calculate average velocity by dividing the number of seconds into 20 ft / 6 m. Patient: Begin walking at your normal speed. When you come to the shoe boxes (9” / 23 cm height), step over them, not around them, and keep walking.
14. Timed get Up & Go (TUG) with cognitive task
Examiner Instructions: First, time the patient performing the TUG without a cognitive task. Then, while sitting, ask the patient to count backward from a number between 80 and 100 by 3s, and keep track of how many numbers they can subtract within 10 s. Then, ask the patient to count backwards from a different number and after a few numbers say “go” for the TUG. Time the patient from when you say “go” until they return to sitting. Stop timing when the patient’s buttocks touch the chair bottom. The chair should be firm with arms to push from, if necessary. Patient: a) Practice counting out loud, backwards from a number between 80 and 100 by 3s while sitting in the chair. b) I will see how long it takes you to get up from the chair, walk past the tape on the floor and turn around to walk back to the chair and sit down. c) Now count backwards from a number between 80 and 100 by 3s and when I say “go,” stand up from the chair, walk at your normal speed across the tape on the floor, turn around, and come back to sit in the chair but continue backward counting.
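
Two of the instructions above involve simple arithmetic: averaging repeated stance trials before assigning a category (items 7 and 8), and deriving average gait velocity from the timed 20 ft walk (item 13). The sketch below illustrates both under stated assumptions. Averaging the trial times and then categorizing is one reading of "average for category"; representing "Unable" as no recorded trial or 0 s is an assumption; and the function names are hypothetical.

```python
FT_TO_M = 0.3048  # exact definition of the international foot in metres


def stance_category(trial_times_s, target_s: float = 30.0) -> int:
    """Map recorded stance times (seconds) to the 2/1/0 category.

    Assumptions: times are capped at the target, trials are averaged,
    and an empty list or a mean of 0 s stands for "Unable".
    """
    if not trial_times_s:
        return 0
    capped = [min(t, target_s) for t in trial_times_s]
    mean_time = sum(capped) / len(capped)
    if mean_time >= target_s:
        return 2   # Normal: held for the full target time
    if mean_time > 0:
        return 1   # Moderate: held for less than the target time
    return 0       # Severe: unable


def average_gait_velocity(duration_s: float, distance_ft: float = 20.0) -> float:
    """Average velocity in m/s: distance walked divided by elapsed time."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return distance_ft * FT_TO_M / duration_s


if __name__ == "__main__":
    print(stance_category([30.0]))        # single full trial
    print(stance_category([20.0, 30.0]))  # repeated after a short first trial
    print(round(average_gait_velocity(8.0), 3), "m/s")
```

Keeping the distance as a parameter allows the same calculation to be reused if a different walkway length is timed.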

References

  • 1. Thurman DJ, Stevens JA, Rao JK. Practice parameter: Assessing patients in a neurology practice for risk of falls (an evidence-based review): report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology. 2008;70:473–479. doi: 10.1212/01.wnl.0000299085.18976.20.
  • 2. Gillespie LD, Robertson MC, Gillespie WJ, Lamb SE, Gates S, Cumming RG, et al. Interventions for preventing falls in older people living in the community. Cochrane Database Syst Rev. 2009 Apr 15;2:CD007146. doi: 10.1002/14651858.CD007146.pub2.
  • 3. Horak FB. Clinical assessment of balance disorders. Gait Posture. 1997;6:76–84.
  • 4. Horak FB, Macpherson JM. Postural orientation and equilibrium. In: Shepard J, Rowell L, editors. Regulation and Integration of Multiple Systems Handbook of Physiology: Section 12, Exercise. New York: Oxford University Press; 1996. pp. 255–292.
  • 5. Horak FB. Postural orientation and equilibrium: what do we need to know about neural control of balance to prevent falls? Age Ageing. 2006;35:ii7–ii11. doi: 10.1093/ageing/afl077.
  • 6. Pérennou D, Decavel P, Manckoundia P, Penven Y, Mourey F, Launay F, et al. Evaluation of balance in neurologic and geriatric disorders. Ann Readapt Med Phys. 2005;48:317–335. doi: 10.1016/j.annrmp.2005.04.009.
  • 7. Horak FB, Wrisley DM, Frank J. The Balance Evaluation Systems Test (BESTest) to differentiate balance deficits. Phys Ther. 2009;89:484–498. doi: 10.2522/ptj.20080071.
  • 8. Franchignoni F, Tesio L, Martino MT, Ricupero C. Reliability of four simple, quantitative tests of balance and mobility in healthy elderly females. Aging (Milano) 1998;10:26–31. doi: 10.1007/BF03339630.
  • 9. Woollacott M, Shumway-Cook A. Motor Control - Theory and Practical Applications. Baltimore: Lippincott Williams & Wilkins; 2001. Abnormal postural control; pp. 248–270.
  • 10. Tesio L. Measuring behaviours and perceptions: Rasch analysis as a tool for rehabilitation. J Rehabil Med. 2003;35:105–115. doi: 10.1080/16501970310010448.
  • 11. Reeve BB, Fayers P. Applying item response theory modeling for evaluating questionnaire item and scale properties. In: Fayers P, Hays RD, editors. Assessing Quality of Life in Clinical Trials: Methods of Practice. 2. Oxford, NY: Oxford University Press; 2005. pp. 55–73.
  • 12. Andresen EM. Criteria for assessing the tools of disability outcomes research. Arch Phys Med Rehabil. 2000;81(Suppl 2):S15–S20. doi: 10.1053/apmr.2000.20619.
  • 13. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, et al. PROMIS Cooperative Group. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS) Med Care. 2007;45(Suppl 1):S22–S31. doi: 10.1097/01.mlr.0000250483.85507.04.
  • 14. Schermelleh-Engel K, Moosbrugger HH, Müller H. Evaluating the fit of structural equation models: tests of significance and descriptive goodness-of-fit measures. Methods of Psychological Research online. 2003;8:23–72.
  • 15. Cook KF, Teal CR, Bjorner JB, Cella D, Chang CH, Crane PK, et al. IRT health outcomes data analysis project: an overview and summary. Qual Life Res. 2007;16(Suppl 1):121–132. doi: 10.1007/s11136-007-9177-5.
  • 16. Horn JL. A rationale and test for the number of factors in factor analysis. Psychometrika. 1965;30:179–185. doi: 10.1007/BF02289447.
  • 17. Young FW. ViSta: The Visual Statistics System, UNC L.L. Thurstone Psychometric Laboratory Research Memorandum 94-1(c) 1996
  • 18. Guadagnoli E, Velicer WF. Relation of sample size to the stability of component patterns. Psychol Bull. 1988;103:265–275. doi: 10.1037/0033-2909.103.2.265.
  • 19. Linacre JM. Program manual 3.68.0. Chicago, IL: WINSTEPS.com; 2009. A user’s guide to WINSTEPS-MINISTEP: Rasch-model computer programs. Retrieved September 24, 2009, from http://www.winsteps.com/a/winsteps.pdf.
  • 20. Linacre JM. Optimizing rating scale category effectiveness. J Appl Meas. 2002;3:85–106.
  • 21. Wolfe EW, Smith EV., Jr Instrument development tools and activities for measure validation using Rasch models: part II – validation activities. J Appl Meas. 2007;8:204–234.
  • 22. Davidson M. Rasch analysis of 24-, 18- and 11-item versions of the Roland-Morris Disability Questionnaire. Qual Life Res. 2009;18:473–81. doi: 10.1007/s11136-009-9456-4.
  • 23. Bond TG, Fox CM. Applying the Rasch model: fundamental measurement in the human sciences. 2. Mahwah: Lawrence Erlbaum Associates; 2001.
  • 24. Wright BD, Masters GN. Rating scale analysis. Chicago: Mesa Press; 1982.
  • 25. Linacre JM. Sample size and item calibration stability. Rasch Meas Trans. 1994;7:328.
  • 26. Hobart JC, Lamping DL, Freeman JA, Langdon DW, McLellan DL, Greenwood RJ, et al. Evidence-based measurement: which disability scale for neurologic rehabilitation? Neurology. 2001;57:639–644. doi: 10.1212/wnl.57.4.639.
  • 27. Kornetti DL, Fritz SL, Chiu YP, Light KE, Velozo CA. Rating scale analysis of the Berg Balance Scale. Arch Phys Med Rehabil. 2004;85:1128–1135. doi: 10.1016/j.apmr.2003.11.019.
  • 28. Chiu YP, Fritz SL, Light KE, Velozo CA. Use of item response analysis to investigate measurement properties and clinical validity of data for the dynamic gait index. Phys Ther. 2006;86:778–787.
  • 29. Wang CH, Hsueh IP, Sheu CF, Yao G, Hsieh CL. Psychometric properties of 2 simplified 3-level balance scales used for patients with stroke. Phys Ther. 2004;84:430–438.
  • 30. Berg KO, Wood-Dauphinee SL, Williams JI, Maki B. Measuring balance in the elderly: validation of an instrument. Can J Public Health. 1992;83(Suppl 2):S7–S11.
  • 31. Tinetti ME, Richman D, Powell L. Falls efficacy as a measure of fear of falling. J Gerontol. 1990;45:P239–P243. doi: 10.1093/geronj/45.6.p239.
  • 32. Graybiel A, Fregly AR. A new quantitative ataxia test battery. Acta Otolaryngol. 1966;6:292–312.
  • 33. Shumway-Cook A, Horak FB. Assessing the influence of sensory interaction of balance. Suggestion from the field. Phys Ther. 1986;66:1548–1550. doi: 10.1093/ptj/66.10.1548.
  • 34. Cohen H, Blatchly CA, Gombash LL. A study of the clinical test of sensory interaction and balance. Phys Ther. 1993;73:346–351. doi: 10.1093/ptj/73.6.346.
  • 35. Whitney SL, Hudak MT, Marchetti GF. The dynamic gait index relates to self-reported fall history in individuals with vestibular dysfunction. J Vestib Res. 2000;10:99–105.
  • 36. Podsiadlo D, Richardson S. The timed “Up & Go”: a test of basic functional mobility for frail elderly persons. J Am Geriatr Soc. 1991;39:142–148. doi: 10.1111/j.1532-5415.1991.tb01616.x.
  • 37. Shumway-Cook A, Brauer S, Woollacott M. Predicting the probability for falls in community-dwelling older adults using the Timed Up & Go Test. Phys Ther. 2000;80:896–903.
  • 38. Henry SM, Fung J, Horak FB. EMG responses to maintain stance during multidirectional surface translations. J Neurophysiol. 1998;80:1939–1950. doi: 10.1152/jn.1998.80.4.1939.
  • 39. Nardone A, Schieppati M. Postural adjustments associated with voluntary contraction of leg muscles in standing man. Exp Brain Res. 1988;69:469–480. doi: 10.1007/BF00247301.
  • 40. Kluzik J, Horak FB, Peterka RJ. Differences in preferred reference frames for postural orientation shown by after-effects of stance on an inclined surface. Exp Brain Res. 2005;162:474–489. doi: 10.1007/s00221-004-2124-6.
  • 41. Marchetti GF, Whitney SL. Construction and validation of the 4-item dynamic gait index. Phys Ther. 2006;86:1651–1660. doi: 10.2522/ptj.20050402.
  • 42. Coste J, Bouée S, Ecosse E, Leplège A, Pouchot J. Methodological issues in determining the dimensionality of composite health measures using principal component analysis: case illustration and suggestions for practice. Qual Life Res. 2005;14:641–654. doi: 10.1007/s11136-004-1260-6.
