Abstract
Context: Establishing the psychometric and measurement properties of concussion assessments is important before clinicians use these assessments. To date, data on these properties for neurocognitive and postural stability testing have been limited, especially in younger athletic populations.
Objective: To determine the test-retest reliability and reliable change indices of concussion assessments in athletes participating in youth sports. A secondary objective was to determine the relationship between the Standardized Assessment of Concussion (SAC) and neuropsychological assessments in young athletes.
Design: We used a repeated-measures design to evaluate the test-retest reliability of the concussion assessments in young athletes. Correlations were calculated to determine the relationship between the measures. All subjects underwent 2 test sessions 60 days apart.
Setting: Sports medicine laboratory and school or home environment.
Patients or Other Participants: Fifty healthy young athletes between the ages of 9 and 14 years.
Main Outcome Measure(s): Scores from the SAC, Balance Error Scoring System, Buschke Selective Reminding Test, Trail Making Test B, and the Coding and Symbol Search subtests of the Wechsler Intelligence Scale for Children were used in the analysis.
Results: Test-retest reliability was poor to good across the assessments, ranging from r = .46 to .83. Good reliability was found for the Coding and Symbol Search tests. The reliable change scores provided a way of determining a meaningful change in score for each assessment. We found a weak relationship (r < .36) between the SAC and each of the neuropsychological assessments; however, stronger relationships (r > .70) were found between certain neuropsychological measures.
Conclusions: We found moderate test-retest reliability on the cognitive tests that assessed attention, concentration, and visual processing and the Balance Error Scoring System. Our results demonstrated only a weak relationship between performance on the SAC and the selected neuropsychological tests, so it is likely that these tests assess somewhat different areas of cognitive function. Our correlational findings provide more evidence for using the SAC along with a more complex neuropsychological assessment battery in the evaluation of concussion in young athletes.
Keywords: neuropsychological testing, brain injury, athletic injuries, reliability
Sport-related concussion is a significant problem in all levels of athletic participation. Because of the potential complications and long-term consequences of returning to competition too early, 1–3 sports medicine professionals are using more objective tools to assess athletes after a concussive injury. Several mental status and neuropsychological tests have been commonly used in high school 1, 4–6 and collegiate 6–10 athletes to assess various cognitive domains after injury. Additionally, measures of postural stability have made their way into many concussion assessment protocols. 11–14
An ideal sport concussion assessment battery should consist of tests that are objective, reliable, valid, easy to administer, and time efficient. 15 Part of the efficiency equation is finding tests that assess the areas susceptible to deficits after concussion yet do not overlap in their assessment domains. Therefore, an understanding of the relationships among the various tests will aid the clinician in creating an appropriate test battery. Additionally, before such assessments are used to evaluate sport-related concussion, they should be validated through establishment of test-retest reliability, sensitivity, validity, reliable change index (RCI) scores, and clinical utility. 16
Test-retest reliability denotes the correlation between 2 test administrations and refers to the stability of the instrument. The test-retest reliability on some measures of cognitive function has been studied in professional rugby players. 17 Test-retest values were moderate to good for the Speed of Comprehension Test (r = .78), the Digit Symbol Test (r = .74), and the Symbol Digit Test (r = .72). At the high school level, Barr 18 evaluated the test-retest effects, RCIs, and sex differences on the Wechsler Intelligence Scale for Children (WISC-III) Digit Span and Processing Speed Tests, Trail Making Test, Controlled Oral Word Association Test, and the Hopkins Verbal Learning Test. Test-retest reliabilities ranged from r = .39 to .78.
Test-retest reliability for various computerized neuropsychological platforms has also been reported, most often demonstrating moderate to good reliability. Iverson et al 19 found moderate to good reliability on the composite scores for Verbal Memory (r = .70), Visual Memory (r = .67), Reaction Time (r = .79), and Processing Speed (r = .86) of ImPACT, version 2.0. Reliability coefficients of .68 to .82 have been reported on the HeadMinder Concussion Resolution Index, 20 and moderate to good reliability has been reported using CogSport for speed indices (r = .69 to .82); however, lower reliability coefficients were reported for the accuracy indices (r = .31 to .51). 21
Interest has grown recently in defining the most clinically useful methods for detecting change in neuropsychological test scores when utilizing the test-retest paradigm. In interpreting postconcussion test scores, statistical methods involving a control group are not necessarily helpful for the clinician who needs to determine whether fluctuations on one athlete's assessment represent meaningful changes or normal variability in performance. 22 Much attention has focused on the use of the RCI and standardized regression-based methods, techniques developed and refined in studies of outcomes from psychotherapy and surgical treatment. The RCI has been useful to help account for differences in the test-retest reliability and practice effects with serial testing and has been utilized to assess intraindividual differences over time. 17, 23–25 The RCI analysis includes adjustments for practice effects and can help with the predictive capabilities of a test instrument. 23 Reliable change scores have been published for both pencil-and-paper 17, 18 and computerized test batteries. 19
Although cognitive and postural stability assessments for sport-related concussion now are often performed on high school athletes, 4, 18 research into the use of these tools in athletes younger than high school age has been limited. Recently, Valovich McLeod et al 26 found the Standardized Assessment of Concussion (SAC) and the Balance Error Scoring System (BESS) to be appropriate tests for 9- to 14-year-old athletes. They also demonstrated a practice effect with serial BESS testing (subjects improved on their balance performance by the third time they performed the task) 26; however, little is known about the psychometric and measurement properties of other concussion assessments in this younger population. Therefore, our purpose was to determine the test-retest reliability and RCIs of cognitive and balance tests in a youth sports population. Because many different assessment tools are available to the clinician and because little is known about the relationships among these tools, our secondary purpose was to determine the relationship between the SAC and several neuropsychological assessments in this age group.
METHODS
We used a quasi-experimental, repeated-measures design to evaluate the test-retest reliability of the concussion assessments in young athletes. All subjects underwent an initial test consisting of administration of the SAC, BESS, and a neuropsychological test battery designed for children between the ages of 9 and 14 years. All subjects returned approximately 60 days after their initial test for 1 follow-up test session (mean test-retest interval = 57.94 ± 4.15 days). This time interval was chosen to reflect an interval between baseline testing and the latter part of an athletic season 18 and represented an adequate time frame for demonstrating the test-retest reliability of measures used to study sport-related concussion. 16
Subjects
Fifty youth sport participants were recruited from the local community to participate in this study. Descriptive data are presented in Table 1. Male (n = 24) and female (n = 26) participants were selected based on the following general criteria: (1) participation in recreational or competitive athletics (baseball, softball, soccer, gymnastics, or swimming), (2) no lower extremity musculoskeletal injuries in the 6 months before testing, (3) no history of head injury, (4) no diagnosed visual, vestibular, or balance disorders, and (5) no diagnosis of attention deficit disorder or learning difficulty. All inclusion and exclusion criteria were determined from self-report and parental report. Before participation, each subject and his or her parent or guardian read and signed an informed consent form approved by the university's Institutional Review Board for the Protection of Human Subjects, which also approved the study. Test-retest data were excluded for 1 subject who sustained a concussion before the follow-up test.
Table 1. Participants' Descriptive Data (Mean ± SD).
Instrumentation
Neuropsychological Test Battery
The neuropsychological test battery consisted of 4 tests with established norms for this age group: the Buschke Selective Reminding Test (SRT), Trail Making Test B (Trails B), and the Symbol Search and Coding subtests of the WISC-III (The Psychological Corp, San Antonio, TX). We chose each neuropsychological assessment because of the cognitive domain assessed as well as the similarity to the neuropsychological batteries commonly employed to assess older athletes after concussion.
The Buschke SRT was used to measure verbal learning and memory during a multiple-trial list-learning task. 27 The test involved reading the subject a list of words and then having the subject recall as many of the words as possible. Each subsequent reading of the list involved only the items that were not recalled on the immediately preceding trial. For the age group in our study, we presented 12 words for 8 trials or until the child recalled all 12 words on 2 consecutive trials. Scores were calculated for sum total (total number of words recalled), continuous long-term recall (CLTR) (number of words recalled continuously), and delayed recall (number of words recalled after 20 minutes). The 2 alternate forms of the SRT for children were counterbalanced between subjects and test sessions. 28
The Trails B from the Halstead-Reitan Neuropsychological Test Battery (Reitan Neuropsychological Laboratory, Tucson, AZ) was used to test speed of attention, sequencing (a measure of mental organization and tracking with a set order of priority), mental flexibility, visual scanning (a measure of visual processing and target detection), and motor function. 29 The test was essentially a connect-the-dots activity, which required the child to alternate between the numeric and alphabetic sequencing systems (progressing from 1 to A to 2 to B to 3, etc). We used the children's, or intermediate, form for this study. We recorded the length of time it took the participant to complete the task and the number of errors committed.
The Symbol Search subtest of the WISC-III assessed attention, visual perception, and concentration. 30 The subjects scanned 2 groups of symbols and indicated whether the target symbol appeared in the search group. They had 120 seconds to complete as many items as possible.
The Coding subtest of the WISC-III assessed processing speed (rate of cognitive processing), concentration, and attention. 30 The testing tool consisted of rows of blank squares, each with a number from 1 to 9 printed above it. A key printed at the top of the test sheet paired each number with a symbol. The child's task was to fill in the blanks with the correct symbol as quickly as possible. The child was given 90 seconds to complete the coding task.
Standardized Assessment of Concussion
The SAC is an instrument designed to assess acute neurocognitive impairment on the sideline and includes measures of orientation, immediate memory, concentration, and delayed recall. 6, 31 The instrument requires 5 to 7 minutes to administer and was designed for use by individuals with no prior expertise in neurocognitive test administration, including certified athletic trainers. Alternate forms (A, B, and C) of the SAC were designed to minimize practice effects during follow-up testing. Previous researchers 32, 33 have demonstrated multiple form equivalence with no differences among the 3 forms. Orientation was assessed by asking the subject to provide the day of the week, date, month, year, and time. A 5-word list of unrelated terms was used to measure immediate memory. The list was read to the subject for immediate recall, and the procedure was repeated for a total of 3 trials. Concentration was assessed by having the subject repeat strings of digits in the reverse order from that read by the examiner and recite the months of the year in reverse order. Delayed recall of the 5-word list was also recorded. A composite score, with a maximum of 30 points, was derived. We used all 3 SAC forms and counterbalanced them among subjects and test sessions.
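As a concrete illustration of the composite, the sketch below assumes the standard SAC point allocation across the 4 domains (orientation 5, immediate memory 15, concentration 5, and delayed recall 5, summing to the 30-point maximum); because the text above specifies only the 30-point composite, the per-domain maxima and the function itself are assumptions for illustration.

```python
# Minimal sketch of the 30-point SAC composite, assuming the standard point
# allocation per domain (orientation 5, immediate memory 15, concentration 5,
# delayed recall 5). Only the 30-point maximum is stated in the text, so the
# per-domain maxima and the function name are assumptions for illustration.
SAC_MAXIMA = {
    "orientation": 5,
    "immediate_memory": 15,
    "concentration": 5,
    "delayed_recall": 5,
}

def sac_composite(domain_scores):
    """Sum the four domain scores, capping each at its assumed maximum."""
    return sum(min(domain_scores.get(d, 0), cap) for d, cap in SAC_MAXIMA.items())

print(sac_composite({"orientation": 5, "immediate_memory": 13,
                     "concentration": 4, "delayed_recall": 4}))  # 26 of 30
```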
Balance Error Scoring System
The BESS consists of 6 separate 20-second balance tests that the subjects perform in different stances and on different surfaces. 34 A 16-in × 16-in (40.64-cm × 40.64-cm) piece of medium-density (60 kg/m³) foam (Exertools, Inc, Novato, CA) was used to create an unstable surface for the subjects. The test consisted of 3 stance conditions (double leg, single leg, and tandem) and 2 surfaces (firm and foam). Errors were recorded as the quantitative measurement of postural stability under the different testing conditions. These errors included (1) opening the eyes, (2) stepping, stumbling, or falling out of the test position, (3) lifting the hands off the iliac crests, (4) lifting the toes or heels, (5) moving the leg into more than 30° of flexion or abduction, and (6) remaining out of the test position for more than 5 seconds.
Procedures
Testing was conducted either in the Sports Medicine/Athletic Training Research Laboratory at the University of Virginia or at the child's school or home. For each subject, testing was performed in a quiet room at the same location for both test administrations. Approximately half of the subjects were tested at the University and half at their schools or homes. Testing consisted of 2 test sessions scheduled approximately 60 days apart. Participants were not restricted from sport participation or recreational activities in the time between the test sessions. A single investigator performed all test administrations to ensure optimal consistency of administration procedures. No other individuals were present in the room during the test sessions. Before data collection, the principal investigator completed training in administration of the neuropsychological assessments with a pediatric neuropsychologist and laboratory technician in the Neuropsychology Assessment Laboratory at the University of Virginia.
During each test session, the assessments were performed in the following order: SAC, BESS, Buschke SRT, Trails B, Coding, Symbol Search, and Delayed SRT. The alternate forms of the SAC and Buschke SRT were used and counterbalanced among subjects and test sessions. The neuropsychological test battery took approximately 15 minutes to administer and consisted of the Buschke SRT, Trails B (version for those 9 to 14 years of age), and the Coding and Symbol Search subtests of the WISC-III.
The BESS testing took approximately 10 minutes per subject, and all scores were recorded on a form by the primary investigator. The order of trials followed a format that progressively increased the demands placed on the sensory systems: double leg, single leg, and tandem on firm surface and then foam surface. To ensure consistency among subjects tested at different sites, all trials on the firm surface were performed on a thin carpet over a tile or linoleum floor. Subjects were asked to assume the required stance by placing their hands on their iliac crests, and once they closed their eyes, the test began. During the single-leg stances, the subjects were asked to maintain the contralateral limb in 20° of hip flexion and 40° of knee flexion. Additionally, subjects were asked to stand quietly and as motionless as possible in the stance position, keeping their hands on their iliac crests and their eyes closed. Subjects were told that upon losing their balance, they should make any necessary adjustments and return to the stance position as quickly as possible. Performance was assessed by individual trial scores and by adding the error points for each of the 6 trials. Trials were considered incomplete if the subject could not sustain the stance position for longer than 5 seconds. In these instances, subjects were then assigned a standard maximum score of 10 for that stance, 34 a situation that occurred on only 3 occasions (once each in 3 subjects).
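To make the scoring rule explicit, the following is a minimal sketch of how a BESS total could be tallied under the procedure above (error points summed across the 6 trials, with any trial the subject cannot hold for at least 5 seconds assigned the standard maximum of 10); the condition labels, the per-trial cap on complete trials, and the function name are illustrative assumptions rather than a prescribed data format.

```python
# Minimal sketch of the BESS scoring rule described above: error points are
# summed across the 6 twenty-second trials, and any trial the subject cannot
# hold for at least 5 seconds receives the standard maximum score of 10.
# The condition labels and the cap on complete trials are assumptions.
MAX_TRIAL_SCORE = 10

def bess_total(trial_errors, incomplete=()):
    """trial_errors: condition -> error count; incomplete: unheld conditions."""
    total = 0
    for condition, errors in trial_errors.items():
        if condition in incomplete:
            total += MAX_TRIAL_SCORE                # standard maximum of 10
        else:
            total += min(errors, MAX_TRIAL_SCORE)   # assumed per-trial cap
    return total

errors = {
    "double-firm": 0, "single-firm": 3, "tandem-firm": 2,
    "double-foam": 1, "single-foam": 5, "tandem-foam": 4,
}
print(bess_total(errors))  # 15 total errors
```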
Statistical Analyses
Separate scores from the Buschke SRT trials were calculated for a sum total score (SRT Sum) and a CLTR, creating 6 neuropsychological test scores in our analyses. A 2 × 2 analysis of variance was used to evaluate sex and age-group differences in the initial test data. We used intraclass correlation coefficients (ICC[2,1]) and the Pearson product moment correlation (r) to determine the test-retest reliability of each assessment. Pearson correlations were included because they have commonly been used in the test-retest literature regarding concussion assessments 17–19 and are needed to calculate the RCI scores. Coefficients less than .50 were considered to indicate poor reliability; coefficients from .50 to .75, moderate reliability; and coefficients greater than .75, good reliability. 35 Separate paired-samples t tests were performed to determine whether significant differences existed between the initial test and the retest. Reliable change index scores were calculated from the Pearson correlations and the SD of the initial score, as recommended by Jacobson and Truax. 36 We corrected for practice effects by adding the mean change score to the confidence interval, as suggested by Chelune et al. 37 We also used Pearson product moment correlations to determine the relationship between the SAC and each of the 6 neuropsychological test scores. All analyses were performed using SPSS (version 12.0; SPSS Inc, Chicago, IL), and significance was set a priori at P ≤ .05.
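To make the reliable-change computation concrete, the following is a minimal sketch of the Jacobson-Truax reliable change interval with the Chelune practice-effect adjustment described above; the array values, function name, and use of NumPy/SciPy are illustrative assumptions, not a reproduction of the authors' SPSS procedures.

```python
# Minimal sketch of the reliability and reliable-change computations described
# above: Pearson r, the SEM-based Jacobson-Truax reliable change interval, and
# the Chelune practice-effect adjustment.
import numpy as np
from scipy import stats

def reliable_change_interval(baseline, retest, z=1.645):
    """Return the practice-adjusted reliable change interval for paired scores.

    z = 1.645 gives the 90% confidence interval; approximately 1.28 and 1.04
    give the narrower 80% and 70% intervals reported in Table 7.
    """
    baseline = np.asarray(baseline, dtype=float)
    retest = np.asarray(retest, dtype=float)

    r, _ = stats.pearsonr(baseline, retest)       # test-retest correlation
    sem = baseline.std(ddof=1) * np.sqrt(1 - r)   # standard error of measurement
    s_diff = np.sqrt(2 * sem ** 2)                # SE of the difference score
    practice = float(np.mean(retest - baseline))  # mean practice effect

    # Chelune adjustment: centre the interval on the expected practice effect.
    return practice - z * s_diff, practice + z * s_diff

# Hypothetical baseline and 60-day retest scores: a postinjury change falling
# outside this interval would be interpreted as a reliable change.
base = [25, 27, 24, 28, 26, 23, 29, 27]
post = [26, 27, 25, 29, 27, 24, 28, 28]
low, high = reliable_change_interval(base, post)
print(f"90% reliable change interval: {low:+.2f} to {high:+.2f}")
```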
RESULTS
Initial Testing
The means and SDs for the entire sample from the initial test session are presented in Table 2. Separate results are also listed for male and female athletes and for younger (9-to-11-year-old) and older (12-to-14-year-old) athletes, chosen to represent the younger and older 3-year spans in our sample. Significant differences were found between the sexes on the SAC, with females outperforming males. With respect to age, the older athletes scored better on the BESS, Trails B, and Coding tests; no significant differences were noted on the other tests.
Table 2. Initial Test Session Scores for the Entire Sample.
Test-Retest Reliability
Our reliability values are presented in Table 3, and the test scores for the initial and retest sessions are listed in Table 4. Our test-retest indices for each of the 8 assessments ranged from poor (ICC = .46, r = .46) to good (ICC = .83, r = .83). We found significant decreases in BESS errors (t48 = 3.010, P = .004) and the time to complete the Trails B (t48 = −3.496, P = .001) at the retest session compared with the initial test (Table 5). For both of these assessments, a significant decrease demonstrated improved performance. No other assessments were significantly different between test sessions (Table 5). We did find some observable differences in the test-retest reliability between the sexes and between the 9-to-11-year-old and the 12-to-14-year-old subjects (Table 6).
Table 3. Reliability of Assessments Across the Entire Sample.
Table 4. Test Scores From the Initial and Retest Sessions for the Entire Sample.
Table 5. Paired-Samples t Tests Between the Initial and Retest Sessions.
Table 6. Test-Retest Reliability by Sex and Age Group (Intraclass Correlation Coefficient [2,1]).
Reliable Change Indices
The RCIs for the assessments are listed in Table 7 for the 90%, 80%, and 70% confidence intervals (CIs) for both the raw score and whole-number units. Based on the 70% CI, which is the most conservative index, a decrease of 2 SAC points, 5 Buschke SRT words, 16 CLTR words, 2 Coding points, 2 Symbol Search points, or 2 SRT delayed-recall words, as well as an increase of 3 BESS errors or 14 seconds in the Trails B time, would indicate a change in performance consistent with impairment on these measures.
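As an illustration of how a clinician might apply these cutoffs, the short sketch below encodes three of the 70% CI thresholds quoted above and flags a baseline-to-follow-up change that meets or exceeds them; the dictionary, function name, and example scores are hypothetical, while the threshold values themselves come from the text.

```python
# Illustrative application of three of the 70% CI cutoffs quoted above.
# The dictionary, function name, and example scores are hypothetical; the
# threshold values are taken from the text.
CUTOFFS_70 = {
    "SAC": -2,             # decrease of 2 or more points
    "BESS": +3,            # increase of 3 or more errors
    "Trails B (s)": +14,   # slowing of 14 or more seconds
}

def meets_70ci_cutoff(test, baseline, follow_up):
    """True if the baseline-to-follow-up change meets the impairment cutoff."""
    change = follow_up - baseline
    cutoff = CUTOFFS_70[test]
    return change <= cutoff if cutoff < 0 else change >= cutoff

print(meets_70ci_cutoff("BESS", baseline=12, follow_up=16))  # True (4 >= 3 errors)
print(meets_70ci_cutoff("SAC", baseline=27, follow_up=26))   # False (drop of 1 < 2)
```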
Table 7. Adjusted Reliable Change Indices Calculated for 90%, 80%, and 70% Confidence Intervals.
Relationship Between Standard Assessment of Concussion and Neuropsychological Tests
The correlation matrix describing the relationship between the SAC and the neuropsychological test scores for all 50 subjects during their initial test is found in Table 8. We noted significant (P < .05) correlations between the SAC and 4 of the 6 neuropsychological scores: positive correlations with the Symbol Search (r = .32), SRT Sum (r = .28), and SRT Delayed Recall (r = .36) and a negative correlation with the Trails B (r = −.29). Higher scores on the SAC tended to be associated with higher scores on the Symbol Search, SRT Sum, and SRT Delayed Recall and with faster times on the Trails B.
Table 8. Correlations Between the Standard Assessment of Concussion and Neuropsychological Tests.
We found stronger correlations among the neuropsychological assessments themselves: the Trails B was negatively correlated with the Coding (r = −.53) and Symbol Search (r = −.73), and positive correlations were found between the Symbol Search and Coding (r = .65), the SRT Sum and CLTR (r = .92), and the SRT Sum and SRT Delayed Recall (r = .54) tests.
DISCUSSION
Our findings provide some insight into the psychometric and measurement properties of various concussion assessment tools that could be used to evaluate concussion in young athletes. Although more evidence exists on the use of various assessments in professional and collegiate athletes and although high school athletes are increasingly being studied, our investigation is one of the first to research the measurement properties of neuropsychological and balance tests in a youth sports population.
Initial Testing
Although both age and sex effects on neuropsychological test performance have been studied in high school athletes, 18 published data on these variables in professional, collegiate, or younger athletes are limited. We did find significant sex differences on performance of the SAC, with females scoring higher than males. Although previous authors 31 have indicated a trend toward females achieving slightly higher scores, those findings did not reach statistical significance. Our lack of significant differences with respect to the verbal memory tests (Buschke SRT Sum, CLTR, Delayed Recall) is surprising considering that both age and sex affect performance on the Buschke SRT. 38 However, the SRT is also moderately related to IQ, which was not measured in this investigation. Our findings indicate that separate norms for males and females should be used on the SAC for younger athletes, as reported previously for high school athletes. 18
We also found that older athletes (aged 12 to 14 years) performed better on the BESS, Coding, and Trails B than did the younger athletes in our sample. Although our groups are close in age from a developmental standpoint, the finding that the older group performed better on the Trails B and Coding is substantiated by improvement in the age-appropriate norms cited in the literature for those tests. 30, 38 The age-appropriate norms for the Trails B indicate that performance improves with each year's increase in age, 38 whereas the norms for the WISC-III Coding improve with each 3-month increase in age. 30 With respect to the BESS, previous authors 39 have demonstrated improved performance on postural stability tests with increasing age. In addition, investigators who were specifically using the BESS as a measure of postural stability have found lower BESS scores in healthy college 40 and high school athletes, 41 compared with our findings.
Test-Retest Reliability
Test-retest reliability is important in all measures as a means to identify practice effects, a factor that could influence the test result. During serial testing, assessments with low reliability can cause profound variability in the scores of individuals with no alterations in cognitive function or deficits in balance ability. Test-retest reliability is the first step in the process to validate cognitive batteries for the assessment of concussion. 16 In the sport-related concussion literature, reports have been published on the test-retest reliability of paper-and-pencil neuropsychological assessments, 17, 18 computerized test batteries, 19–21 and the SAC in older populations. 23 However, to date, no authors have reported the test-retest reliability of the BESS or neuropsychological assessments specific to the pediatric athletic population.
With respect to the cognitive assessments, we found poor to good reliability coefficients, ranging from .46 to .83. We noted lower test-retest reliability (ICC = .51 to .65, r = .51 to .65) in the tests that assessed verbal learning and memory and in the SAC (ICC = .46, r = .46). We did show differences in the reliability of the verbal learning tests between male and female subjects, with females exhibiting better test-retest reliability on the Buschke SRT, CLTR, and Delayed Recall. Further investigation of these differences revealed an outlier among the males, which likely led to lower reliability in the male subjects. Slight practice effects were observed on most of the tests, but significant changes at time 2, indicating practice effects, were observed on only 2 of the measures (Trails B, BESS). These findings are consistent with those of previous authors who tested both uninjured high school athletes 18 and healthy adults. 25 In a sample of high school athletes, Barr 18 reported reliability coefficients of r = .54 for the Hopkins Verbal Learning Test total score and r = .56 for the Hopkins Verbal Learning Test Delayed Recall score. Similarly, in adults, poor to moderate test-retest reliability has been reported for the Buschke SRT total (r = .62), CLTR (r = .54), and SRT Delayed Recall (r = .46) 25 and for the California Verbal Learning Test (r = .29 to .67). 42
We did find moderate to good test-retest indices (ICC = .65 to .83, r = .71 to .83) for the tests that assessed attention, concentration, and visual processing, including the Trails B, Coding, and Symbol Search tests. Two groups studying adult populations reported higher reliability coefficients than we found; however, this could be due to the increased variability in performance often seen on these tests in younger subjects. 38 Dikmen et al 25 noted good test-retest reliability for the Trails B (r = .89) and the Digit Symbol Test (r = .89) in subjects with a mean age of 43.6 years after a test-retest interval that ranged from 2 to 12 months, whereas Hinton-Bayre et al 43 also found good reliability coefficients in professional rugby players (19.4 ± 2.1 years of age) on the Speed of Comprehension Test (r = .78), Digit Symbol (r = .74), and Symbol Digit (r = .72) assessments after a 1- to 2-week test-retest interval. In contrast, Barr 18 found lower test-retest reliability on the Symbol Search (r = .58), Trails B (r = .65), and Digit Symbol (r = .73) tests in high school athletes with a mean age of 15.9 ± 0.98 years, with a test-retest interval of 60 days.
Test-retest reliability of composite scores for various computerized neuropsychological platforms has also been reported. Using a 2-week test-retest interval on the HeadMinder Concussion Resolution Index (HeadMinder Inc, New York, NY), good reliability was found on indices for Simple Reaction Time (r = .70) and Processing Speed (r = .82), whereas moderate reliability was noted for Complex Reaction Time (r = .68). 20 Similarly, Iverson et al 19 reported moderate to good reliability on the composite scores for Verbal Memory (r = .70), Visual Memory (r = .67), Reaction Time (r = .79), and Processing Speed (r = .86) of ImPACT (version 2.0; ImPACT Applications Inc, Pittsburgh, PA) after a mean test-retest interval of 5.8 days. Using a 1-week test-retest interval, moderate to good reliability has been reported using CogSport for speed indices (r = .69 to .82); however, lower reliability coefficients were noted for the accuracy indices (r = .31 to .51). 21 Although most of the measures reported from the computerized assessments demonstrate moderate to good reliability, all of the authors mentioned above studied test-retest reliability over a shorter period of time (1 to 2 weeks) rather than the 60 days used in our investigation. It should be noted that both the age of the participants and the test-retest interval may affect the test-retest reliability coefficients reported across the various studies.
One contributing factor that may explain our low test-retest reliability on some assessments is the large SEM. For example, our scores for the CLTR ranged from 28 to 91 on the initial test and from 12 to 96 on the retest, with an SEM of 12.12 and an Sdiff of 17.15. Similarly, we had large SEM (11.67) and Sdiff (16.50) values for the Trails B. However, these findings are less surprising for the latter test, given the results of other sport-related concussion studies in which the Trails B was administered without prior administration of the Trail Making Test A. Guskiewicz et al 44 reported SDs ranging from 14.09 to 18.23 for the Trails B in a collegiate population, and McCrea et al 9 noted SDs of 18.69 and 22.12 in collegiate control athletes and those with concussions, respectively.
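For reference, these values follow the standard reliable-change formulas, in which SD_baseline is the standard deviation of the initial scores and r is the test-retest correlation; as a check against the figures quoted above, √2 × 12.12 ≈ 17.14 for the CLTR and √2 × 11.67 ≈ 16.50 for the Trails B.

```latex
\[
  \mathrm{SEM} = \mathrm{SD}_{\text{baseline}} \sqrt{1 - r}, \qquad
  S_{\mathrm{diff}} = \sqrt{2\,\mathrm{SEM}^{2}} = \sqrt{2}\,\mathrm{SEM}
\]
```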
Another explanation for the lower reliability of some measures is that our subjects scored within a truncated range on tests with a restricted range of available scores. 25 Having subjects who score within a truncated range has been shown to produce lower reliability correlations. 45–47 For example, the SAC is scored between 0 and 30 points. In our specific population, the range of scores for the SAC was 20 to 30 on the initial test and 23 to 30 on the retest. Our subjects represented a more homogeneous group on this test and showed less variability, which produced a lower test-retest reliability coefficient. 46 On the other hand, assessments without a limited scoring range, such as the Trails B, usually provide higher test-retest reliabilities. 45–47
Although several groups have investigated the test-retest reliability of various cognitive assessments in athletes, 17–21 the literature is somewhat lacking with respect to the test-retest reliability of postural stability assessments in athletes. Previous authors have found that eyes-closed balancing tasks are novel for most children; therefore, the possibility of a learning effect exists, which can influence the retest scores. 48 Such a finding has been noted with the BESS: a learning effect has been reported upon serial testing. 26, 40, 41 We found acceptable (r = .70) reliability of BESS performance in our subjects, but the significant improvement during the retest session likely affected our reliability. Additionally, increased variability in children with regard to measures of balance is well documented until children reach adult-like postural stability, near the age of 11 years. 39 In 2 studies testing children younger than those in our investigation, the test-retest reliabilities on an eyes-closed, single-leg stance ranged from .59 to .77, and on a tiltboard test of balance, for both eyes-open and eyes-closed conditions, reliabilities were low (r = .45). 48 Similarly, Westcott et al, 49 using the Pediatric Clinical Test of Sensory Interaction and Balance, found reliability coefficients for combined sensory condition scores ranging from .45 to .69 with the feet together and from .44 to .83 during a heel-to-toe (tandem) stance in children between 4 and 9 years of age. It is important to note that no authors have compared the BESS with the aforementioned balance tests; therefore, one explanation for the different reliabilities could be that the tests assess balance differently. Additionally, our subjects were older than those in the above studies; it is plausible that they exhibited better balance ability and, therefore, improved reliability.
Reliable Change Indices
Although use of the RCI has become more popular in determining change in cognitive function after concussion, 17–19 no method has yet been accepted for determining how many “points” on a particular assessment indicates a cognitive or balance deficit. Some authors 17, 23, 50 have identified the change score as clinically meaningful if it lies outside of the 90% CI. However, the use of the 90% CI might be too conservative for sport-related concussion, because the impairments are often subtle and resolve rather quickly. 23 Yet this recommendation was based on the results of a study of high school and collegiate athletes. Recently, it has been suggested that sports medicine clinicians should be more conservative when making decisions regarding a young athlete who has sustained a concussion 15; therefore, our 70% CI scores may not be too conservative to use with our younger population. It should also be noted that the RCI method we used included a correction for practice effects, as suggested by Chelune et al. 37
Clinicians should not use a single RCI as the sole determinant in return-to-play decision making after a concussion. The RCI values are intended to help the clinician decide what constitutes a meaningful change in an athlete's score and should be interpreted along with the individual's clinical examination, concussion history, presenting symptoms, and other assessment data. Additionally, limitations of using the RCI to determine change include the need to understand the statistical procedure and the existence of alternative methods for detecting change (eg, standardized regression-based approaches) that may provide better results.
Relationship Between the Standard Assessment of Concussion and Neuropsychological Tests
We found that the SAC was significantly correlated with 4 of 6 neuropsychological measures; however, these correlations demonstrated a weak relationship, accounting for only 8.2% to 13.3% of the variance in SAC scores. Two possible reasons for the weak relationships are the restricted range of scores and the domains tested. The SAC is known to have a restricted range of scores in a normative sample, 25 as was the case in our study. This factor may help explain why we found only weak relationships between it and the neuropsychological tests.
Another possible explanation for the weak relationships is that these measures likely assess somewhat different domains of cognitive function. The gap here may lie between the measurement of global cognitive functioning and specific cognitive abilities. The correlations between total SAC score (ie, as an indicator of overall cognitive functioning) and scores on measures of specific cognitive functions (eg, memory) were weak. The SAC is advocated as a sideline mental status assessment tool useful in the first 48 hours after injury and is a valid and moderately reliable gross measure of cognitive functioning in a mixed sample of high school and collegiate athletes during this acute phase of injury. 6, 9, 23, 51 In contrast, more advanced neuropsychological testing is often useful in detecting subtle cognitive abnormalities in specific cognitive domains, in determining prolonged deficits in cognitive function, or in aiding in return-to-play decision making once the athlete is asymptomatic. 7, 15, 44, 52–54 Cognitive areas, including verbal learning and memory (Buschke SRT), attention and concentration (Trails B, Coding, Symbol Search), visual-motor function (Trails B, Coding, Symbol Search), sequencing (Trails B), and processing speed (Trails B, Coding, Symbol Search), were assessed with the neuropsychological battery. 38 Although the SAC assesses the domains of memory and concentration, the brevity of those sections as well as the entire SAC may prevent an extensive assessment of those cognitive areas further out from the acute injury. These results support the notion that both the SAC and more complex neuropsychological assessments should be used in the evaluation of an athlete after concussion.
In an attempt to better isolate various cognitive domains, we performed a post hoc analysis of the relationship between each of the 4 SAC domains and the 6 neuropsychological measures. No meaningful relationships were revealed; thus, it is likely that these assessments do serve different purposes in a concussion assessment protocol.
Also of clinical importance are the high correlations found among some of the neuropsychological assessments, specifically the Trails B, which was highly correlated with 2 of the other neuropsychological measures. Those clinicians using a neuropsychological test battery should select tests that will assess the various cognitive domains typically affected by concussion and that can be administered in a short amount of time. Whereas a standard neuropsychological examination consists of multiple cognitive domains and requires 3 to 6 hours to administer, baseline and follow-up examinations for sport-related concussion typically last 20 to 30 minutes and target neurocognitive functioning most sensitive to impairments after a concussive injury. 55 We feel that the tests used in our study could be added to a battery for prospective injury investigations. These tests can be administered in 20 minutes and assess the domains typically affected by concussion: memory (Buschke SRT), attention (Coding, Symbol Search, Trails B), and speed and flexibility of cognitive processing (Coding, Trails B). 55 The inclusion of additional tests related to reaction time, visual memory, and complex attention would strengthen the battery and provide a more global measure of cognitive function. Future authors should address the psychometric properties of additional assessments.
Although we determined that we cannot predict SAC scores very well from the neuropsychological test scores, we found that better performance on the neuropsychological tests was associated with better performance on the SAC. Our results indicate that, psychometrically, the SAC behaves similarly in young athletes, adolescents, and adults, in whom the instrument has demonstrated reliability, validity, and clinical sensitivity and specificity in detecting neurocognitive impairment after concussion. However, further research in injured subjects is required to determine the sensitivity and specificity in detecting cognitive dysfunction during the acute period after concussion in younger athletes.
Limitations
We acknowledge that certain limitations exist in this investigation. We identified a neuropsychological test battery that assessed domains similar to those reported in studies of high school and collegiate populations; however, some neuropsychological domains (verbal memory, reaction time, complex attention) were not included in our battery. In addition, our study included a small sample size and contained more females in the older age group.
Clinical Relevance
In conclusion, healthy young athletes performed comparably with older athletes and adults with respect to test-retest reliability, with the exception of the lower reliability we found on the SAC. These findings begin to establish a group of cognitive tests appropriate for use in young sports participants and “set the table” for clinical studies to evaluate the validity of these measures in a younger population. Future authors should continue to refine the test battery with additional neuropsychological assessments and begin to collect preinjury and postinjury data. The weak relationship found between the SAC and the selected neuropsychological test measures provides further evidence that both the SAC and traditional neuropsychological tests should be used in a concussion assessment battery when evaluating young athletes. Clinicians should also understand that the results from these assessment tools should be used in conjunction with the clinical examination and the athlete's concussion history when making return-to-play decisions. It is also important to recognize that neuropsychological assessments are designed by and for the trained neuropsychologist and should only be interpreted by qualified personnel.
Acknowledgments
We thank Robert Diamond, PhD, and the Division of Neuropsychology, Department of Psychiatric Medicine at the University of Virginia for their assistance with this study. We thank Curt Bay, PhD, for his assistance with the manuscript revisions. Sections of this manuscript were presented in abstract form at the Fourth Annual Neurotrauma and Sports Medicine Review in Phoenix, AZ (February 2004), and at the Rocky Mountain Athletic Trainers' Association District Meeting in Broomfield, CO (March 2004).
REFERENCES
- Collins MW, Lovell MR, Iverson GL, Cantu RC, Maroon JC, Field M. Cumulative effects of concussion in high school athletes. Neurosurgery. 2002;51:1175–1181. doi: 10.1097/00006123-200211000-00011.
- Guskiewicz KM, McCrea M, Marshall SW, et al. Cumulative effects associated with recurrent concussion in collegiate football players: the NCAA Concussion Study. JAMA. 2003;290:2549–2555. doi: 10.1001/jama.290.19.2549.
- Macciocchi SN, Barth JT, Littlefield L, Cantu RC. Multiple concussions and neuropsychological functioning in collegiate football players. J Athl Train. 2001;36:303–306.
- Field M, Collins MW, Lovell MR, Maroon J. Does age play a role in recovery from sports-related concussion? A comparison of high school and collegiate athletes. J Pediatr. 2003;142:546–553. doi: 10.1067/mpd.2003.190.
- Lovell MR, Collins MW, Iverson GL, et al. Recovery from mild concussion in high school athletes. J Neurosurg. 2003;98:296–301. doi: 10.3171/jns.2003.98.2.0296.
- McCrea M, Kelly JP, Randolph C, et al. Standardized assessment of concussion (SAC): on-site mental status evaluation of the athlete. J Head Trauma Rehabil. 1998;13:27–35. doi: 10.1097/00001199-199804000-00005.
- Guskiewicz KM, Ross SE, Marshall SW. Postural stability and neuropsychological deficits after concussion in collegiate athletes. J Athl Train. 2001;36:263–273.
- Barth JT, Macciocchi SN, Giordani B, Rimel R, Jane JA, Boll TJ. Neuropsychological sequelae of minor head injury. Neurosurgery. 1983;13:529–533. doi: 10.1227/00006123-198311000-00008.
- McCrea M, Guskiewicz KM, Marshall SW, et al. Acute effects and recovery time following concussion in collegiate football players: the NCAA Concussion Study. JAMA. 2003;290:2556–2563. doi: 10.1001/jama.290.19.2556.
- Macciocchi SN, Barth JT, Alves W, Rimel RW, Jane JA. Neuropsychological functioning and recovery after mild head injury in collegiate athletes. Neurosurgery. 1996;39:510–514.
- Guskiewicz KM, Perrin DH, Gansneder BM. Effect of mild head injury on postural stability in athletes. J Athl Train. 1996;31:300–306.
- Guskiewicz KM, Riemann BL, Perrin DH, Nashner LM. Alternative approaches to the assessment of mild head injury in athletes. Med Sci Sports Exerc. 1997;29(suppl 7):S213–S221. doi: 10.1097/00005768-199707001-00003.
- Peterson CL, Ferrara MS, Mrazik M, Piland S, Elliott R. Evaluation of neuropsychological domain scores and postural stability following cerebral concussion in sports. Clin J Sport Med. 2003;13:230–237. doi: 10.1097/00042752-200307000-00006.
- Riemann BL, Guskiewicz KM. Effects of mild head injury on postural stability as measured through clinical balance testing. J Athl Train. 2000;35:19–25.
- Guskiewicz KM, Bruce SL, Cantu RC, et al. National Athletic Trainers' Association position statement: management of sport-related concussion. J Athl Train. 2004;39:280–297.
- Randolph C, McCrea M, Barr WB. Is neuropsychological testing useful in the management of sport-related concussion? J Athl Train. 2005;40:139–152.
- Hinton-Bayre AD, Geffen GM, Geffen LB, McFarland KA, Friis P. Concussion in contact sports: reliable change indices of impairment and recovery. J Clin Exp Neuropsychol. 1999;21:70–86. doi: 10.1076/jcen.21.1.70.945.
- Barr WB. Neuropsychological testing of high school athletes: preliminary norms and test-retest indices. Arch Clin Neuropsychol. 2003;18:91–101.
- Iverson GL, Lovell MR, Collins MW. Interpreting change in ImPACT following sport concussion. Clin Neuropsychol. 2003;17:460–467. doi: 10.1076/clin.17.4.460.27934.
- Erlanger D, Feldman D, Kutner K, et al. Development and validation of a Web-based neuropsychological test protocol for sports-related return-to-play decision-making. Arch Clin Neuropsychol. 2003;18:293–316.
- Collie A, Maruff P, Makdissi M, McCrory PR, McStephen M, Darby D. CogSport: reliability and correlation with conventional cognitive tests used in postconcussion medical evaluations. Clin J Sport Med. 2003;13:28–32. doi: 10.1097/00042752-200301000-00006.
- Heaton RK, Temkin N, Dikmen S, et al. Detecting change: a comparison of three neuropsychological measures, using normal and clinical samples. Arch Clin Neuropsychol. 2001;16:75–91.
- Barr WB, McCrea M. Sensitivity and specificity of standardized neurocognitive testing immediately following sports concussion. J Int Neuropsychol Soc. 2001;7:693–702. doi: 10.1017/s1355617701766052.
- Erlanger DM, Saliba E, Barth J, Almquist J, Webright W, Freeman J. Monitoring resolution of postconcussion symptoms in athletes: preliminary results of a Web-based neuropsychological test protocol. J Athl Train. 2001;36:280–287.
- Dikmen SS, Heaton RK, Grant I, Temkin NR. Test-retest reliability and practice effects of Expanded Halstead-Reitan Neuropsychological Test Battery. J Int Neuropsychol Soc. 1999;5:346–356.
- Valovich McLeod TC, Perrin DH, Guskiewicz KM, Diamond R, Gansneder BM, Shultz SJ. Serial administration of clinical concussion assessments and learning effects in healthy youth sports participants. Clin J Sport Med. 2004;14:287–295. doi: 10.1097/00042752-200409000-00007.
- Buschke H, Fuld PA. Evaluating storage, retention, and retrieval in disordered memory and learning. Neurology. 1974;24:1019–1025. doi: 10.1212/wnl.24.11.1019.
- Clodfelter CJ, Dickson AL, Newton Wilkes C, Johnson RB. Alternate forms of the selective reminding for children. Clin Neuropsychol. 1987;1:243–249.
- Reitan RM, Wolfson D. The Halstead-Reitan Neuropsychological Test Battery. Tucson, AZ: Neuropsychology Press; 1985.
- Wechsler D. Wechsler Intelligence Scale for Children. 3rd ed. San Antonio, TX: The Psychological Corp, Harcourt Brace Jovanovich Inc; 1991.
- McCrea M, Kelly JP, Randolph C. The Standardized Assessment of Concussion: Manual for Administration, Scoring, and Interpretation. 2nd ed. Waukesha, WI: CNS Inc; 2000.
- McCrea M, Kelly JP, Kluge J, Ackley B, Randolph C. Standardized assessment of concussion in football players. Neurology. 1997;48:586–588. doi: 10.1212/wnl.48.3.586.
- McCrea M, Kelly JP, Randolph C, Cisler R, Berger L. Immediate neurocognitive effects of concussion. Neurosurgery. 2002;50:1032–1042. doi: 10.1097/00006123-200205000-00017.
- Riemann BL, Guskiewicz KM, Shields EW. Relationship between clinical and forceplate measures of postural stability. J Sport Rehabil. 1999;8:71–82.
- Portney LG, Watkins MP. Foundations of Clinical Research: Application to Practice. Upper Saddle River, NJ: Prentice Hall; 2000.
- Jacobson NS, Truax P. Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol. 1991;59:12–19. doi: 10.1037//0022-006x.59.1.12.
- Chelune GJ, Naugle RI, Luders H, Sedlack J, Awad IA. Individual change after epilepsy surgery: practice effects and base-rate information. Neuropsychology. 1993;7:41–52.
- Spreen O, Strauss E. A Compendium of Neuropsychological Tests. New York, NY: Oxford University Press; 1998.
- Riach CL, Hayes KC. Maturation of postural sway in young children. Dev Med Child Neurol. 1987;29:650–658. doi: 10.1111/j.1469-8749.1987.tb08507.x.
- Mancuso JJ, Guskiewicz KM, Onate JA, Ross SE. An investigation of the learning effect for the Balance Error Scoring System and its clinical implications [abstract]. J Athl Train. 2002;37(suppl):S-10.
- Valovich TC, Perrin DH, Gansneder BM. Repeat administration elicits a practice effect with the Balance Error Scoring System but not with the Standardized Assessment of Concussion in high school athletes. J Athl Train. 2003;38:51–56.
- Duff P, Westervelt HJ, McCaffrey RJ, Haase RF. Practice effects, test-retest stability, and dual baseline assessments with the California Verbal Learning Test in an HIV sample. Arch Clin Neuropsychol. 2001;16:461–476.
- Hinton-Bayre AD, Geffen GM, McFarland KA. Mild head injury and speed of information processing: a prospective study of professional rugby league players. J Clin Exp Neuropsychol. 1997;19:275–289. doi: 10.1080/01688639708403857.
- Guskiewicz KM, Marshall SW, Broglio SP, Cantu RC, Kirkendall DT. No evidence of impaired neurocognitive performance in collegiate soccer players. Am J Sports Med. 2002;30:157–162. doi: 10.1177/03635465020300020201.
- Clark LA, Watson D. Constructing validity: basic issues in objective scale development. Psychol Assess. 1995;7:309–319.
- Garrett HE. Statistics in Psychology and Education. New York, NY: David McKay Co; 1962.
- Nunnally JC. Psychometric Theory. New York, NY: McGraw-Hill Book Co; 1978.
- Atwater SW, Crowe TK, Deitz JC, Richardson PK. Interrater and test-retest reliability of two pediatric balance tests. Phys Ther. 1991;70:79–87. doi: 10.1093/ptj/70.2.79.
- Westcott SL, Crowe TK, Deitz JC, Richardson P. Test-retest reliability of the Pediatric Clinical Test of Sensory Interaction for Balance. Phys Occup Ther Pediatr. 1994;14:1–22.
- McSweeney AJ, Naugle RI, Chelune GJ, Luders H. “T scores for change”: an illustration of a regression approach to depicting change in clinical neuropsychology. Clin Neuropsychol. 1993;7:300–312.
- McCrea M. Standardized mental status testing on the sideline after sport-related concussion. J Athl Train. 2001;36:274–279.
- Bleiberg J, Halpern EL, Reeves D, Daniel JC. Future directions for the neuropsychological assessment of sports concussion. J Head Trauma Rehabil. 1998;13:36–44. doi: 10.1097/00001199-199804000-00006.
- Collins MW, Grindel SH, Lovell MR, et al. Relationship between concussion and neuropsychological performance in college football players. JAMA. 1999;282:964–970. doi: 10.1001/jama.282.10.964.
- McCrory P, Johnston K, Meeuwisse W, et al. Summary and agreement statement of the 2nd International Conference on Concussion in Sport, Prague 2004. Clin J Sport Med. 2005;15:48–55. doi: 10.1097/01.jsm.0000159931.77191.29.
- Randolph C. Implementation of neuropsychological testing models for the high school, collegiate, and professional sport settings. J Athl Train. 2001;36:288–296.