Author manuscript; available in PMC: 2022 Dec 1.
Published in final edited form as: J Psychosoc Rehabil Ment Health. 2021 Apr 29;8(3):231–246. doi: 10.1007/s40737-021-00218-8

Rasch analysis of the Behavioral Assessment Screening Tool (BAST) in chronic traumatic brain injury

Shannon Juengst 1,2, Emily Grattan 3, Brittany Wright 2, Lauren Terhorst 4
PMCID: PMC8673913  NIHMSID: NIHMS1730568  PMID: 34926129

Abstract

The Behavioral Assessment Screening Tool (BAST) measures neurobehavioral symptoms in adults with traumatic brain injury (TBI). Exploratory Factor Analyses established five subscales: Negative Affect, Fatigue, Executive Function, Impulsivity, and Substance Abuse. In the current study, we assessed all the subscales except Substance Abuse using Rasch analysis following the Rasch Reporting Guidelines in Rehabilitation Research (RULER) framework. RULER identifies unidimensionality and fit statistics, item hierarchies, targeting, and symptom severity strata as areas of interest for Rasch analysis. The BAST displayed good unidimensionality, with only one item from the Impulsivity subscale exhibiting potential item misfit (MnSq 1.40). However, removing this item resulted in a lower average domain measure (−1.42 to −1.49) and a higher standard error (0.34 to 0.43), so the item was retained. Items in each of the four subscales also ranged in difficulty (i.e., endorsement of symptom frequency), with more severe symptoms endorsed in the Fatigue subscale and milder symptoms endorsed in the Impulsivity subscale. Though Negative Affect and Executive Function displayed appropriate targeting, the Fatigue and Impulsivity subscales had larger average domain values (1.35 and −1.42, respectively), meaning that more items may need to be added to these subscales to capture differences across a wider range of symptom severity. The BAST displayed excellent reliability via item and person separation indices and distinct strata for each of the four subscales. Future work should use Rasch analysis in a larger, more representative sample, include more items for the Fatigue and Impulsivity subscales, and include the Substance Abuse subscale.

Keywords: traumatic brain injury, psychometrics, measurement, behavior, emotions

Introduction

The Behavioral Assessment Screening Tool (BAST) assesses frequency of self-reported multidimensional neurobehavioral symptoms in community-dwelling adults with traumatic brain injury (TBI). Active collaboration with clinical and research experts in brain injury rehabilitation and with individuals with TBI and their care partners resulted in a comprehensive screening measure of neurobehavioral symptoms common after and relevant to brain injury, designed to be completed remotely and independently by individuals living in the community (Juengst, Terhorst, Dicianno, et al., 2018; Osborne et al., 2019). In addition to strong content validity, established through multiple approaches (Juengst, Terhorst, Dicianno, et al., 2018; Osborne et al., 2019), the BAST uses accessible language (8th–9th grade reading level or below) in English and Spanish (Higashi & Juengst, 2019). The factor structure of the BAST, established through factor analysis, and the good internal consistency reliabilities of its five subscales reflect the multidimensional, theoretically-based concept model upon which it was based (Juengst et al., 2017; Juengst, Terhorst, & Wagner, 2018). Though the BAST comprehensively covers the broad construct of neurobehavioral symptoms with the purpose of capturing patterns in problematic symptoms across multiple domains, the BAST subscales each assess a unidimensional construct and can therefore also stand alone. The Negative Affect, Fatigue, and Executive Function BAST subscales also demonstrate good initial convergent and discriminant validity. The Negative Affect subscale also demonstrates good known-groups validity for classifying those with moderate-severe depressive symptoms and moderate-severe anxiety symptoms (Juengst et al., 2019). Though very promising, the psychometric properties of the BAST have, to date, been primarily established in a single sample of 110 community-dwelling adults with TBI using classical measurement methods.
Further, the BAST subscale scores present averages of an ordinal scale, rather than interval scores that account for the interplay between item difficulty and person ability. To ensure accurate and reliable neurobehavioral symptom screening in this population, validation studies are required for each of the BAST subscales.

The objective of the current study was to apply Rasch Measurement Theory to the BAST subscales following elements from the Rasch Reporting Guidelines in Rehabilitation Research (RULER) framework (Rasch Reporting Guidelines Task Force, n.d.). Rasch analysis provides an opportunity to transform ordinal scales into interval-like scores to account for differences in item-difficulty and differences across categories in an ordinal response scale. Additionally, unidimensionality of scales or subscales can be verified by examination of item-fit, and items that do not fit within a construct can be revised or eliminated based on the objective(s) of the scale or subscale. Specifically, we aimed to determine: 1) whether all items in each subscale contributed to a unidimensional construct; 2) the extent to which the items were of appropriate difficulty for the sample; 3) the hierarchy of items from least to most difficult within each subscale; and 4) how well the items distinguish different levels of neurobehavioral symptom severity within each subscale.

Methods

Design & Participants

This was a cross-sectional measurement development study to further validate the BAST in community-dwelling adults with chronic TBI. Informed consent and data collection occurred electronically, via REDCap (Harris et al., 2009), and all procedures were approved by the University of Texas Southwestern Medical Center Institutional Review Board. Inclusion criteria were: 1) at least one self-reported lifetime TBI, following established definitions (Corrigan & Bogner, 2007; Menon et al., 2010); 2) >3 months post-injury; 3) ≥18 years old; 4) English fluency. Individuals were excluded if they reported current active bipolar or psychotic disorder or dementia. Participants were recruited through study flyers posted in outpatient Physical Medicine & Rehabilitation clinics at University of Texas Southwestern Medical Center; through email and paper mailings to participants in past studies requiring medical documentation of TBI, including the North Texas TBI Model System, the University of Pittsburgh TBI Model System, and the ConTex concussion registry; through emails to the University of Texas Southwestern Medical Center Acquired Brain Injury Research Registry; and through community organizations serving individuals with TBI. Flyers and emails included a link to the REDCap electronic survey where participants completed informed consent and all self-reported measures for this study.

Lifetime history of TBI was determined through an electronic self-reported adaptation of the OSU-TBI (Corrigan & Bogner, 2007; Lequerica et al., 2018). To characterize lifetime TBI, we derived worst lifetime TBI, history of TBI with loss of consciousness, total number of lifetime TBI, and age at first TBI from the modified OSU-TBI. Identification of mild TBI required history of an injury to the head resulting in either loss of consciousness or being dazed/having memory lapses. Identification of moderate to severe TBI required an injury to the head resulting in a loss of consciousness of >30 minutes (>24 hours differentiated severe from moderate TBI).
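The severity decision rules above can be expressed as a small sketch; the function and argument names here are hypothetical illustrations, not part of the OSU-TBI instrument:

```python
def classify_worst_tbi(loc_minutes, dazed_or_memory_lapse):
    """Classify a single self-reported head injury using the decision rules
    described above. loc_minutes is None when no loss of consciousness (LOC)
    was reported; dazed_or_memory_lapse flags being dazed or having memory
    lapses. Names are illustrative (hypothetical), not from the OSU-TBI."""
    if loc_minutes is not None and loc_minutes > 24 * 60:   # LOC > 24 hours
        return "severe"
    if loc_minutes is not None and loc_minutes > 30:        # LOC > 30 minutes
        return "moderate"
    if (loc_minutes is not None and loc_minutes > 0) or dazed_or_memory_lapse:
        return "mild"
    return "no TBI identified"
```

Worst lifetime TBI would then be the most severe label across all reported injuries.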

Primary Measure: Behavioral Assessment Screening Tool (BAST)

The BAST is available in both English and Spanish (Higashi & Juengst, 2019; Juengst, Terhorst, Dicianno, et al., 2018; Juengst, Terhorst, & Wagner, 2018; Osborne et al., 2019), though this study presents validation data for the English version only. Participants rated up to 48 items (42 primary items plus 6 sub-items asked only when participants endorsed feeling stressed) on a 5-level ordinal scale (never, rarely, sometimes, often, very often). The five primary subscales, identified through prior Exploratory Factor Analysis (Juengst, Terhorst, & Wagner, 2018) and derived from the 42 primary items, are: Negative Affect, Executive Function, Fatigue, Impulsivity, and Substance Abuse. After initial factor analysis, one new Substance Abuse item was added to the original items based on literature review and expert consultation. Follow-up factor analysis in the current sample resulted in removal of five items and movement of two items to different subscales (one from Executive Function to Impulsivity and one from Fatigue to Negative Affect), the latter changes consistent with the conceptual make-up of the subscales (Juengst & Terhorst, 2020). Therefore, the BAST version on which we performed Rasch analysis has 37 items across its five subscales: Negative Affect (10 items), Executive Function (10 items), Fatigue (5 items), Impulsivity (5 items), and Substance Abuse (3 items). The six sub-items related to coping provide contextual information for clinicians but are not recommended for use as distinct subscales at this time. Given the questionable internal consistency reliability of the Substance Abuse subscale in the present sample (α=.616) and item-level reliabilities (Juengst & Terhorst, 2020), we are conducting further item development prior to validation of this subscale using Rasch analysis. Therefore, the focus of the present validation study was on the Negative Affect, Executive Function, Fatigue, and Impulsivity subscales.
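For orientation, classical BAST subscale scores are averages of the 5-level ordinal responses, with positively worded items reverse-scored. A minimal sketch, assuming the usual 1–5 coding of the response scale; the names are illustrative, not the published scoring code:

```python
# Hypothetical scoring sketch; the 1-5 coding is assumed from the 5-level scale.
RESPONSE_SCALE = {"never": 1, "rarely": 2, "sometimes": 3, "often": 4, "very often": 5}

def subscale_average(responses, reverse=False):
    """Average ordinal score for one subscale.
    reverse=True flips positively worded items (score -> 6 - score),
    as done for the Executive Function subscale."""
    scores = [RESPONSE_SCALE[r] for r in responses]
    if reverse:
        scores = [6 - s for s in scores]
    return sum(scores) / len(scores)
```

Such averages remain ordinal-based scores; the Rasch analysis below is what converts them to interval-like measures.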

Statistical Analysis

All data were analyzed using the Rasch Model with Winsteps, version 4.5.5. The Masters partial credit model was used for the analysis given the ordinal nature of the item response options and the expectation that the metric distance between the thresholds that separate response options 1 and 2 would not be the same as the distance between other points (e.g., between 3 and 4) for all items (Masters, 1982). The items in each subscale were examined separately, given that prior psychometric work established that subscales measured distinct constructs. Items indicating positive symptoms or behaviors (e.g., “I was organized”, “I planned ahead”), which make up the Executive Function subscale, were reverse scored prior to analysis.
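Masters' partial credit model gives each item its own step difficulties, so the metric distance between adjacent response categories can differ across items. A minimal sketch of the category probabilities (illustrative only; Winsteps performs the actual estimation):

```python
import math

def pcm_category_probs(theta, step_difficulties):
    """Category probabilities for one item under Masters' partial credit model.
    theta: person measure (logits); step_difficulties: delta_1..delta_m, the
    item-specific thresholds between adjacent response categories."""
    # Cumulative sums of (theta - delta_j); the bottom category has an empty sum (0).
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    exp_terms = [math.exp(c) for c in cumulative]
    total = sum(exp_terms)
    return [e / total for e in exp_terms]
```

As theta increases (more severe symptoms), probability mass shifts toward the higher response categories, which is the monotonicity property examined below.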

Unidimensionality

Although subscale unidimensionality was established using exploratory factor analysis in prior studies, we further verified this property in the Rasch Model by assessing item fit statistics. The fit statistics, labeled infit and outfit mean squares (MnSq), are calculated as the ratio of observed to expected variance and represent the extent to which persons and items fit the intended construct (i.e., negative affect, executive function, fatigue, impulsivity). Item fit values <1.4 are considered desirable (Applying the Rasch Model, n.d.); for example, a MnSq value of 1.4 indicates that the item or person shows 40% more variance than the ideal value of 1. Person fit values <2.0 are considered desirable (Table 6.1 Person Statistics in Misfit Order: Winsteps Help, n.d., p. 1), with higher values indicating unexpected responses or unmodeled noise.
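To make the observed-to-expected variance ratio concrete, here is a dichotomous-case sketch of infit and outfit mean squares (the BAST items are polytomous and Winsteps computes the published values; this is illustrative only):

```python
import math

def rasch_prob(theta, b):
    """Expected score for a dichotomous Rasch item with difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Infit (information-weighted) and outfit (unweighted) mean squares
    for one dichotomous item; values near 1 indicate good fit."""
    resid_sq, variances, z_sq = [], [], []
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        v = p * (1 - p)                 # model variance of the response
        resid_sq.append((x - p) ** 2)   # squared residual
        variances.append(v)
        z_sq.append((x - p) ** 2 / v)   # squared standardized residual
    outfit = sum(z_sq) / len(z_sq)          # sensitive to outlying responses
    infit = sum(resid_sq) / sum(variances)  # weighted toward well-targeted persons
    return infit, outfit
```

Responses that match model expectations yield mean squares below 1; surprising responses push them above 1, e.g., past the 1.4 item cut-point used here.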

Sample Size Considerations and Handling of Misfitting Items

Recommendations for sample size adequacy vary based on the aims of the Rasch analysis (i.e., establishing structural validity vs. measurement accuracy), how well participants align with the items, and how frequently each response category is endorsed. If a subscale is well-targeted, a sample size as small as 50 can provide 99% confidence of stable person measures within ±1 logit (Sample Size and Item Calibration or Person Measure Stability, n.d.). Experts recommend that at least 10 responses per rating scale category be obtained to accurately examine the rating scale step/category structure (Linacre, 1999b, 1999a, 2002). Our sample size of n=132 participants exceeds the recommended minimum for stable person measures and should provide saturation across rating scale categories; however, we examined these properties to verify appropriateness.

Item fit was assessed using infit/outfit statistics, the average domain measures, and separation indices. All items fell within established thresholds and cut points; thus, all were retained for further analysis. Person fit is reported, but no persons were removed, as we wished to keep the study sample intact and different participants 'misfit' on different subscales.

Item Hierarchies

Next, to examine the hierarchy of items, we examined item difficulty values, estimated as logits in Rasch analysis, with higher logits indicating more difficulty (i.e., likely to be endorsed only by those with more severe symptoms). In the context of the BAST measure, items that are too 'easy' would indicate that most respondents, even those with mild symptoms, are reporting the symptom (i.e., endorsing the 'very often' response), whereas items that are 'difficult' would indicate that only those with severe symptoms would select the 'very often' response. As symptom severity increases, we would expect the probability of selecting a high category to increase, and as symptom severity decreases, we would expect that probability to decrease. This provides evidence of monotonicity for item responses.

Targeting

The extent to which the items were of appropriate difficulty for the sample was assessed by examining the average domain (subscale) measure. Average domain measures that approach zero indicate that items capture symptom severity within the sample (i.e., perfect targeting), whereas values further from zero indicate mistargeting: positive values suggest that persons report more severity than the items capture, and negative values suggest that persons report less severity than the items capture. Average domain measures ≥ .5 units from 0 indicate slight mistargeting, and values ≥ 1 unit from 0 indicate stronger mistargeting. The information obtained from examining average domain measures is useful to determine whether items measuring milder or more severe symptoms should be added to the BAST subscales to capture the entire range of symptoms.
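The targeting cut-offs described above reduce to a simple rule on the absolute value of the average domain measure; a sketch with a hypothetical labeling helper:

```python
def targeting_label(avg_domain_measure):
    """Label targeting quality from the average domain (subscale) measure in
    logits, using the cut-offs described above (hypothetical helper)."""
    distance = abs(avg_domain_measure)
    if distance < 0.5:
        return "well targeted"
    if distance < 1.0:
        return "slight mistargeting"
    return "stronger mistargeting"
```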

To examine how well the items distinguished different levels of neurobehavioral symptom severity within each subscale, we assessed person and item separation indices. A larger separation index indicated more distinct levels of a construct could be distinguished by the BAST subscale items. We used the threshold of 1.5 to 2.0 as ‘acceptable’ to ‘good’ levels of separation.

Symptom Severity Strata

From the separation index, we calculated the number of strata – symptom severity levels distinguishing groups of persons – for each subscale, using the equation ((4*Separation Index)+1)/3 (Silverstein et al., 1992; Wright & Masters, 1982). We also computed separation reliability indices, which can be interpreted like Cronbach's alpha: in Winsteps, a separation index of 1.5 corresponds to a separation reliability of approximately .70, whereas a separation index of 2.0 corresponds to approximately .80.
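Both quantities can be checked directly: the strata equation is given above, and reliability = Sep²/(1 + Sep²) is the standard Rasch relation between separation and reliability, shown here as a sketch:

```python
def strata(separation_index):
    """Number of statistically distinct severity levels: (4*Sep + 1) / 3
    (Wright & Masters)."""
    return (4 * separation_index + 1) / 3

def separation_reliability(separation_index):
    """Standard Rasch relation between separation and reliability."""
    return separation_index ** 2 / (1 + separation_index ** 2)
```

For example, a person separation of 2.90 yields about 4.2 strata, and a separation of 2.0 corresponds to a reliability of .80.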

Results

Participants

Participant characteristics can be found in Table 1. A total of n=135 adult participants (n=55 men; n=78 women; n=1 Transgender/Other; n=1 gender not reported) with a history of mild to severe TBI completed the BAST for this psychometric validation study. Participants were, on average, 44.2 years old (SD=15.7). The majority identified as non-Hispanic White (80.0%), and all participants had at least a high school diploma or equivalent, with the majority (57.7%) having a Bachelor's degree or higher. A quarter of participants (25.6%) had a history of at least one moderate to severe TBI, with the rest having a history of mild TBI with or without loss of consciousness. A third of the sample had a history of more than one lifetime TBI.

Table 1.

Participant Characteristics (n=133)

Characteristic Mean (SD), Range
Age (years) 44.2 (15.7), 18–81
Time since most recent TBI (years) 7.6 (11.0), <1–57
Time since first TBI (years) 14.9 (16.4), <1–62
BAST Subscale Averages
 Negative Affect 3.18 (0.80), 1.50–4.90
 Fatigue 3.58 (0.92), 1.00–5.00
 Executive Function* 2.47 (0.70), 1.10–4.10
 Impulsivity 2.25 (0.74), 1.00–4.80
n (%)
Gender
 Men 54 (40.6)
 Women 77 (57.9)
 Transgender/Other/Unknown 2 (1.6)
Race and Ethnicity±
 White, non-Hispanic 107 (80.5)
 Black, non-Hispanic 9 (6.8)
 White, Hispanic 6 (4.5)
 Black Hispanic 2 (1.5)
 Asian or Asian American 7 (5.3)
 Middle Eastern or North African 0 (0)
 Native American, Alaska, or Hawaiian 3 (2.3)
Education
 <High School 0 (0)
 High School Diploma or GED 45 (34.8)
 Undergraduate College Degree 54 (40.7)
 Graduate Degree 34 (25.6)
TBI Injury Severity (worst lifetime injury)
 Mild, no LOC 43 (32.4)
 Mild, with LOC 56 (42.1)
 Moderate 7 (5.3)
 Severe 27 (20.3)
Total lifetime number of TBI (all severities)
 1 93 (69.9)
 2 23 (17.3)
 3 11 (8.3)
 4+ 6 (4.5)

BAST=Behavioral Assessment Screening Tool; TBI=Traumatic Brain Injury

*

Items are reverse-scored

±

Participants could select multiple Race and Ethnicity categories

Unidimensionality

Given the previous work on the BAST subscales, it was not surprising that only one item, 'I made inappropriate sexual jokes or comments' from the Impulsivity subscale, was possibly misfitting, with an infit value right at the recommended cut-point (MnSq 1.40; see Table 2).

Table 2.

Item Fit Statistics in Order of Difficulty within Subscales

Subscale Items Measure across categories Measure in middle category SE Infit MNSq Infit Zstd Outfit MNsq Outfit Zstd
Negative Affect I did not enjoy activities that are usually important to me 0.87 0.59 0.10 0.90 −0.79 0.94 −0.45
I felt depressed or hopeless 0.87 0.68 0.11 0.86 −1.15 0.81 −1.54
I got mad easily 0.66 0.46 0.12 1.22 1.75 1.19 1.52
I felt guilty 0.50 0.27 0.11 1.24 1.89 1.29 2.21
I felt anxious 0.00 −0.11 0.11 0.92 −0.63 0.89 −0.83
Thoughts got stuck in my head and I could not stop thinking about them −0.01 0.14 0.11 1.25 2.02 1.35 2.55
I felt worried −0.56 −0.23 0.11 0.90 −0.84 0.84 −1.21
I felt stressed −0.61 −0.05 0.11 0.88 −0.99 0.84 −1.18
I felt overwhelmed −0.77 −0.19 0.11 0.70 −2.88 0.80 −1.50
When something upset me, I kept thinking about it −0.94 −0.33 0.12 1.13 1.08 1.11 0.77
Executive Function I finished things that I started 0.41 −0.34 0.12 0.74 −2.40 0.74 −2.15
I understood how my actions made other people feel 0.32 −0.38 0.13 1.14 1.08 1.15 1.14
I planned ahead 0.28 −0.40 0.11 1.19 1.50 1.15 1.11
I started activities on my own 0.28 −0.33 0.12 0.95 −0.42 0.94 −0.44
I thought about how others were feeling 0.24 −0.23 0.11 1.20 1.45 1.18 1.28
When I had a problem, I could think of multiple solutions 0.07 −0.43 0.12 1.27 2.07 1.25 1.72
I followed through on my responsibilities −0.03 0.00 0.13 0.72 −2.60 0.73 −2.33
I was able to adapt when things did not go as planned −0.10 −0.47 0.13 0.83 −1.42 0.83 −1.33
I was organized −0.44 −0.57 0.11 0.96 −0.32 0.96 −0.32
I was able to pay attention to more than one thing at a time. −1.03 −1.07 0.11 0.97 −0.20 0.99 −0.09
Fatigue I limited my physical activities because of fatigue 0.79 0.76 0.13 1.25 1.92 1.24 1.83
I felt too tired to finish tasks that required thinking 0.66 0.85 0.14 1.00 0.03 1.03 0.25
I need to rest to get through my day 0.12 0.09 0.13 1.23 1.76 1.26 1.91
I had low energy −0.69 −0.35 0.15 0.79 −1.69 0.82 −1.30
I felt tired −0.87 −0.07 0.15 0.65 −2.91 0.61 −2.94
Impulsivity I made inappropriate sexual comments or jokes 0.80 −0.15 0.13 1.40 2.53 1.41 1.91
I did things that were unsafe 0.51 −0.17 0.13 0.77 −1.87 0.81 −1.35
I acted rudely 0.00 0.98 0.15 0.93 −0.51 0.92 −0.63
I took unnecessary risks −0.17 −0.48 0.13 0.96 −0.28 0.97 −0.23
I reacted without thinking −1.14 −1.03 0.12 0.96 −0.31 0.95 −0.38

Sample Size Considerations and Handling of Misfitting Items

Based on our examination of the separation indices, the sample appears to have been well-aligned with the items, and our sample size was sufficient to produce accurate estimates. Although the Fatigue and Impulsivity subscales showed moderate to strong mistargeting based on average domain measures, the separation indices and reliabilities were strong for all four subscales. With respect to the recommendation of at least 10 responses per rating category, frequencies fell below 10 in the higher categories (i.e., very often, often) of the rating scale for the Executive Function and Impulsivity subscales, and in the lowest category (i.e., never) for the Negative Affect and Fatigue subscales.

Given this potential misfit, we reran the analysis of the Impulsivity subscale without the 'I made inappropriate sexual jokes or comments' item, resulting in the person average domain measure decreasing from −1.42 to −1.49 and the person standard error increasing from 0.15 to 0.19. Additionally, the item standard error increased from 0.34 to 0.43. Because removing the item worsened targeting and increased error, we concluded it was not poorly fitting and retained it on the Impulsivity subscale.

Person Fit

For Negative Affect, infit MNSQ values ranged from .15–4.16 and outfit MNSQ values ranged from .14–3.15, with 12 participants' values >2.0. For Executive Function, infit MNSQ values ranged from .09–4.23 and outfit MNSQ values ranged from .09–4.43, with 9 participants' values >2.0. For Fatigue, infit MNSQ values ranged from .06–4.40 and outfit MNSQ values ranged from .07–4.92, with 14 participants' values >2.0. For Impulsivity, infit MNSQ values ranged from .06–4.09 and outfit MNSQ values ranged from .06–3.85, with 18 participants' values >2.0. Given that the four subscales are highly correlated (r=.330–.600), unexpected values may represent overlap with other symptom domains.

Item Difficulty Hierarchy.

Items are ranked by difficulty for each subscale in Table 2, with logits reported across all response categories and for the middle category. In Table 2, items are arranged with respect to their difficulties across response categories, with larger difficulties representing more endorsement for ‘very often’, or more severe symptoms. Logits ranged from −1.14 to 0.87 across response categories for all subscales, with the ‘easiest’ item in the Impulsivity scale (i.e., “I reacted without thinking”) and the ‘hardest’ items in the Negative Affect scale (i.e., “I did not enjoy activities that are usually important to me”, “I felt depressed or hopeless”). This indicates that participants, even those experiencing mild impulsivity symptoms, were generally endorsing the ‘often’ or ‘very often’ category for reacting without thinking, and only those experiencing more severe negative affect symptoms were endorsing the ‘very often’ category for not enjoying activities and feeling depressed.

In Figures 1–4, average item scores within each domain are presented across rating scale categories from "never" to "very often" experiencing symptoms (items) in the past two weeks. Items higher on the y axis are more likely to be endorsed by those experiencing more frequent symptoms. On the y axis, X's indicate participants. Items spread well across all subscales, with no clustering noted near the bottom or top, indicating no substantial floor or ceiling effects. However, for Fatigue, the overall distribution of participants was skewed toward the more severe symptom end of the scale, and for Impulsivity, the overall distribution was skewed toward the milder end. This may be because most participants in our sample were experiencing more severe fatigue symptoms and few were experiencing severe impulsivity symptoms, which would be consistent with symptom patterns noted in chronic TBI. Item ordering within each domain made conceptual sense.

Figure 1. Negative affect item map of average measures across rating categories.

Figure 1.

NOTE: Each X=1 person

Figure 4. Impulsivity item map of average measures across rating categories.

Figure 4.

NOTE: Each .=1 person and each X=2 persons

Targeting

Average domain measures along with item and person separation indices can be found in Table 3. Examination of the average domain measures indicated that the Negative Affect (0.37) and Executive Function (−0.90) subscales were adequately capturing symptom severity in the sample. However, the Fatigue and Impulsivity subscales may not have been able to capture symptom severity in this sample, given the larger average domain values of 1.35 and −1.42, respectively. In other words, the items in these subscales may be unable to differentiate between those with less severe and more severe symptoms. The Fatigue subscale positive value suggests that persons were reporting higher severity on these items than expected (i.e., higher than average); whereas, the Impulsivity subscale negative value suggests that persons were reporting less severity on these items than expected (i.e., lower than average), consistent with what we noted in Figures 3 & 4.

Table 3.

Person and Item Separation Statistics

Persons Items
BAST Subscale Average Subscale Measure SE of Measurement Separation Index Separation Reliability SE of Measurement Separation Index Separation Reliability
Negative Affect 0.37 0.11 2.90 .89 0.22 5.98 .97
Executive Function −0.90 0.11 2.69 .88 0.14 3.41 .92
Fatigue 1.35 0.20 2.42 .85 0.33 4.69 .96
Impulsivity −1.42 0.15 1.97 .79 0.34 5.08 .96

BAST=Behavioral Assessment Screening Tool

Figure 3. Fatigue item map of average measures across rating categories.

Figure 3.

NOTE: Each .=1 person and each X=2 persons

Item and person separation indices, also reported in Table 3, show excellent reliability for the subscales: all separation reliabilities were ≥ .79, all item separation indices exceeded the recommended 2.0 for 'good' separation, and person separation indices ranged from 1.97 to 2.90, at or above the 'acceptable' to 'good' threshold.

Symptom Severity Strata

The BAST subscales distinguished three to four distinct strata of persons with regard to symptom severity. The Negative Affect subscale distinguished the most strata (4.2), followed by Executive Function (3.92), Fatigue (3.57), and Impulsivity (2.95). This indicates that the Negative Affect and Executive Function subscales can discern those with high, above average, below average, and low symptom severity, and that the Fatigue and Impulsivity subscales can discern those with high, average, and low symptom severity. Table 4 presents ordinal to normed continuous score conversions, with means indicated for each subscale and values for logits indicating a statistically significant change in score.

Table 4.

Ordinal to normed scale scores and percentiles conversion table

Ordinal Score  Negative Affect (Mean: 47.19; Logit: 7.65)  Executive Function (Mean: 56.90; Logit: 7.66)  Fatigue (Mean: 44.05; Logit: 4.43)  Impulsivity (Mean: 58.08; Logit: 5.76)
  Normed Percentile  Normed Percentile  Normed Percentile  Normed Percentile
5 -- -- -- -- 19.4 1 42.5 2
6 -- -- -- -- 25.0 1 32.5 6
7 -- -- -- -- 28.5 1 38.2 11
8 -- -- -- -- 30.9 2 42.3 20
9 -- -- -- -- 32.8 3 45.7 31
10 3.1 0 12.1 0 34.7 5 48.5 39
11 12.8 0 21.8 1 36.5 6 51.0 48
12 18.8 0 27.7 2 38.2 10 53.3 60
13 22.5 0 31.4 5 40.0 17 55.4 72
14 25.3 0 34.1 7 41.9 23 57.4 79
15 27.5 1 36.3 9 43.8 29 59.3 86
16 29.5 2 38.3 12 45.8 35 61.1 91
17 31.3 2 40.0 15 47.7 42 63.0 93
18 32.9 3 41.6 18 49.5 50 64.8 95
19 34.4 5 43.1 23 51.3 58 66.6 97
20 35.9 7 44.5 30 53.2 63 68.4 98
21 37.2 9 45.8 34 55.2 68 70.4 98
22 38.6 11 47.1 38 57.4 76 72.6 98
23 39.8 14 48.4 41 60.1 85 75.5 99
24 41.0 18 49.7 45 64.0 91 79.9 99
25 42.2 22 51.0 48 69.8 96 87.1 100
26 43.4 25 52.3 54 -- -- -- --
27 44.5 30 53.6 62 -- -- -- --
28 45.6 34 54.9 70 -- -- -- --
29 46.7 38 56.2 75 -- -- -- --
30 47.8 44 57.5 79 -- -- -- --
31 48.9 48 58.8 83 -- -- -- --
32 49.9 51 60.1 86 -- -- -- --
33 51.0 55 61.4 89 -- -- -- --
34 52.0 60 62.7 92 -- -- -- --
35 53.1 66 64.0 93 -- -- -- --
36 54.1 70 65.3 94 -- -- -- --
37 55.2 73 66.6 95 -- -- -- --
38 56.3 77 67.9 95 -- -- -- --
39 57.4 80 69.2 96 -- -- -- --
40 58.6 84 70.6 97 -- -- -- --
41 59.9 86 72.1 99 -- -- -- --
42 61.2 88 73.7 100 -- -- -- --
43 62.6 89 75.4 100 -- -- -- --
44 64.2 91 77.4 100 -- -- -- --
45 66.0 93 79.7 100 -- -- -- --
46 68.1 95 82.6 100 -- -- -- --
47 70.7 97 86.4 100 -- -- -- --
48 74.2 98 92.7 100 -- -- -- --
49 79.9 99 102.6 100 -- -- -- --
50 89.4 100 -- -- -- -- -- --

Discussion

The purpose of our investigation was to assess the structural validity and measurement accuracy of the BAST in a sample of participants with chronic TBI using Rasch analysis. This is the first time the BAST items have been analyzed using modern test theory, as previous investigations focused on classical methods to establish the BAST’s psychometric properties. Overall, the BAST items all performed well and were all maintained, and ordinal to continuous normed scores were generated for each subscale to aid in clinical interpretation. Our results suggest that the BAST is a psychometrically strong and potentially useful clinical surveillance measure for community-dwelling adults with chronic TBI that could be implemented in a proactive chronic symptom-monitoring program to improve neurobehavioral symptoms and mitigate their adverse consequences.

The results of our analyses indicated that the four BAST subscales we tested were unidimensional and measured the intended constructs. Although one item from the Impulsivity scale had an infit statistic on the threshold of misfitting, removal of that item did not improve the average domain measure for the subscale, hence the item was retained. The average domain measures for the Impulsivity and Fatigue subscales were mistargeting to a moderate/strong degree, indicating that more items should be added to these subscales to represent the spectrum of symptom severity for individuals with chronic TBI.

For each subscale, several participants had unexpected responses, indicating "unmodeled noise". Though each subscale was unidimensional, the BAST was developed as an overarching multidimensional measure of neurobehavioral symptoms. The theoretical model upon which it is based (Juengst et al., 2017; Juengst, Terhorst, Dicianno, et al., 2018) and correlations noted between its subscales (excepting Substance Abuse; Juengst, Terhorst, & Wagner, 2018; Juengst & Terhorst, 2020) support that these distinct constructs still overlap. The unmodeled noise of some participants on one subscale may therefore be capturing modeled variance on another subscale. Therefore, we recommend that – though individual subscale scores can be calculated – all subscales of the BAST should be administered and interpreted together when applied clinically. Future work will examine the BAST using Multidimensional Rasch Analysis to improve the measure further (Briggs & Wilson, 2003). Additionally, single time point assessment of neurobehavioral symptoms has known limitations with regard to recall bias and notable within-person variability (Juengst et al., 2021; Juengst, Terhorst, et al., 2019; Terhorst et al., 2018), which may also account for unmodeled noise in symptom reporting. We recommend that the BAST be used for repeated measurement in clinical surveillance of chronic neurobehavioral symptoms after TBI, to identify potential neurobehavioral issues early in their development, rather than as a diagnostic tool or absolute indicator of symptom severity.

Limitations and Future Directions

The main limitation of the current investigation was the sample size, as there were not 10 responses per rating scale category for all items. These deficiencies occurred at either the lowest or highest end of the rating scale, indicating that our sample may not have been representative of individuals experiencing symptoms at the extremes. Though one alternative could be to collapse the rating scale, previous work on the BAST already determined that a 5-level ordinal scale was preferred by the target population and performed better psychometrically than a 3-level ordinal scale (Juengst, Terhorst, & Wagner, 2018). Participants in our sample were also highly educated, English-speaking, and predominantly non-Hispanic White, and were recruited predominantly through connections with large medical systems. Past work with the BAST suggests that education, language, ethnicity, and geographic location may be related to neurobehavioral symptoms (Juengst et al., 2020; Juengst, Nabasny, et al., 2019). Future investigations using Rasch analysis should be performed with a larger, more representative sample. This will also allow for additional analyses, such as Differential Item Functioning and Multidimensional Rasch Analysis, which we were not able to perform with this sample. We also relied on self-reported history of TBI. Other studies have shown that, while many individuals under-report TBI (especially mild TBI), 'false positives' are rare (Alderman et al., 2001; Bailie et al., 2017). However, characterizing loss of consciousness and confusion/PTA is less valid via self-report (Rabinowitz et al., 2020). So, while we are confident that participants in our study did sustain a TBI, the severity of injury classification may be less reliable.

Results of these Rasch analyses provide new scoring paradigms for the BAST that address problems associated with using ordinal response scales to assess symptom severity (a continuous construct). Further, these results will inform development of a short form of the BAST for use in Ecological Momentary Assessment of neurobehavioral symptoms. Future validation using modern test theory approaches, for both the English- and Spanish-language versions of the BAST, will also include the Substance Abuse subscale. The BAST could then be used in long-term clinical surveillance systems to monitor neurobehavioral symptoms in adults with chronic TBI.
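The ordinal-to-interval conversion underlying these scoring paradigms can be illustrated with the Andrich rating scale model, in which the probability of endorsing each ordinal category depends on the person's location and the item's difficulty on a shared logit scale. The sketch below is a minimal illustration under hypothetical threshold values, not the estimation procedure used in this study.

```python
import math

def rating_scale_probs(theta, delta, thresholds):
    """Andrich rating scale model: probability of each ordinal category
    0..m for a person at `theta` logits on an item of difficulty `delta`
    with m category thresholds (tau_1..tau_m)."""
    # Numerator log-odds: cumulative sum of (theta - delta - tau_j)
    # over the thresholds crossed; category 0 has an empty sum.
    logits = [0.0]
    cum = 0.0
    for tau in thresholds:
        cum += theta - delta - tau
        logits.append(cum)
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]

def expected_score(theta, delta, thresholds):
    """Model-expected ordinal response at a given logit location."""
    probs = rating_scale_probs(theta, delta, thresholds)
    return sum(k * p for k, p in enumerate(probs))
```

Because the expected ordinal response increases smoothly and monotonically with the logit measure, differences on the logit scale carry interval-level meaning that raw ordinal scores do not.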

Figure 2. Executive function item map of average measures across rating categories.

NOTE: Each X = 1 person. All items on this subscale are worded as positive symptoms or behaviors (e.g., "I planned ahead", "I was organized") and are reverse scored for analysis (a response of "never" on this subscale equals 5 rather than 1 point). For ease of interpretation, the response scale options in this figure reflect the original responses.

Funding:

This work was funded by the National Institutes of Health, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NIH/NICHD), Grant No. R03HD09445 (PI: Juengst).

Footnotes

Conflicts of interest/Competing interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Ethics Approval: Approval was obtained from the Institutional Review Board (IRB) at UT Southwestern Medical Center and the study was performed in line with the principles of the Declaration of Helsinki.

Consent to Participate: Participants involved in this study consented to the research.

Consent for Publication: Participants involved in this study were made aware of and consented to publication.

Availability of data and material:

The dataset generated for this study will not be made publicly available. The corresponding author can provide the dataset upon request and execution of the necessary data use agreements.

References

1. Alderman N, Dawson K, Rutterford NA, & Reynolds PJ (2001). A comparison of the validity of self-report measures amongst people with acquired brain injury: A preliminary study of the usefulness of EuroQol-5D. Neuropsychological Rehabilitation, 11(5), 529–537. 10.1080/09602010042000231
2. Applying the Rasch Model: Fundamental Measurement in the Human Sciences, Third Edition. (n.d.). CRC Press. Retrieved July 8, 2020, from https://www.routledge.com/Applying-the-Rasch-Model-Fundamental-Measurement-in-the-Human-Sciences/Bond/p/book/9780415833424
3. Bailie J, Babakhanyan I, Jolly M, Ekanayake V, Sargent P, Duckworth J, Ekanayke V, & Ekanayake V (2017). Accuracy of Self-Reported Questions for Assessment of TBI History. Archives of Clinical Neuropsychology, 32(6), 656–666. 10.1093/arclin/acx075.14
4. Briggs DC, & Wilson M (2003). An Introduction to Multidimensional Measurement using Rasch Models. https://nepc.colorado.edu/publication/an-introduction-multidimensional-measurement-using-rasch-models
5. Corrigan JD, & Bogner J (2007). Initial Reliability and Validity of the Ohio State University TBI Identification Method. Journal of Head Trauma Rehabilitation, 22(6), 318–329. 10.1097/01.HTR.0000300227.67748.77
6. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, & Conde JG (2009). Research electronic data capture (REDCap)—A metadata-driven methodology and workflow process for providing translational research informatics support. Journal of Biomedical Informatics, 42(2), 377–381. 10.1016/j.jbi.2008.08.010
7. Higashi R, & Juengst SB (2019). Patient-centered measure development and Spanish validation exemplar. Health Literacy Research and Practice.
8. Juengst SB, Nabasny A, & Terhorst L (2019). Neurobehavioral Symptoms in Community-Dwelling Adults With and Without Chronic Traumatic Brain Injury: Differences by Age, Gender, Education, and Health Condition. Frontiers in Neurology, 10. 10.3389/fneur.2019.01210
9. Juengst SB, Nabasny A, & Terhorst L (2020). Cohort Differences in Neurobehavioral Symptoms in Chronic Mild to Severe Traumatic Brain Injury. Frontiers in Neurology, 10. 10.3389/fneur.2019.01342
10. Juengst SB, Switzer G, Oh BM, Arenth PM, & Wagner AK (2017). Conceptual model and cluster analysis of behavioral symptoms in two cohorts of adults with traumatic brain injuries. Journal of Clinical and Experimental Neuropsychology, 39(6), 513–524. 10.1080/13803395.2016.1240758
11. Juengst SB, Terhorst L, Dicianno BE, Niemeier JP, & Wagner AK (2018). Development and content validity of the behavioral assessment screening tool (BASTβ). Disability and Rehabilitation, 1–7. 10.1080/09638288.2017.1423403
12. Juengst SB, Terhorst L, Kew CL, & Wagner AK (2019). Variability in daily self-reported emotional symptoms and fatigue measured over eight weeks in community dwelling individuals with traumatic brain injury. Brain Injury. https://www.tandfonline.com/doi/abs/10.1080/02699052.2019.1584333
13. Juengst SB, Terhorst L, Nabasny A, Wallace T, Weaver JA, Osborne CL, Burns SP, Wright B, Wen P-S, Kew C-LN, & Morris J (2021). Use of mHealth Technology for Patient-Reported Outcomes in Community-Dwelling Adults with Acquired Brain Injuries: A Scoping Review. International Journal of Environmental Research and Public Health, 18(4). 10.3390/ijerph18042173
14. Juengst SB, Terhorst L, & Wagner AK (2018). Factor structure of the Behavioral Assessment Screening Tool (BAST) in traumatic brain injury. Disability and Rehabilitation, 1–6. 10.1080/09638288.2018.1496487
15. Juengst S, Conley M, & Terhorst L (2019). Convergent and Divergent Validity of the Behavioral Assessment Screening Tool (BAST) in Traumatic Brain Injury. Archives of Physical Medicine and Rehabilitation, 100(10), e59. 10.1016/j.apmr.2019.08.165
16. Juengst S, & Terhorst L (2020). Further Psychometric Development of the Behavioral Assessment Screening Tool (BAST). Archives of Physical Medicine and Rehabilitation, 101(12), e140. 10.1016/j.apmr.2020.10.045
17. Lequerica AH, Lucca C, Chiaravalloti ND, Ward I, & Corrigan JD (2018). Feasibility and Preliminary Validation of an Online Version of the Ohio State University Traumatic Brain Injury Identification Method. Archives of Physical Medicine and Rehabilitation, 99(9), 1811–1817. 10.1016/j.apmr.2018.03.023
18. Linacre JM (1999a). Investigating rating scale category utility. Journal of Outcome Measurement, 3(2), 103–122.
19. Linacre JM (1999b). Understanding Rasch measurement: Estimation methods for Rasch measures. Journal of Outcome Measurement, 3, 382–405.
20. Linacre JM (2002). Understanding Rasch Measurement: Optimizing Rating Scale Category Effectiveness. Journal of Applied Measurement, 3(1), 85–106.
21. Masters GN (1982). A Rasch Model for Partial Credit Scoring. Psychometrika, 47(2), 149–174.
22. Menon DK, Schwab K, Wright DW, Maas AI, & Demographics and Clinical Assessment Working Group of the International and Interagency Initiative toward Common Data Elements for Research on Traumatic Brain Injury and Psychological Health. (2010). Position statement: Definition of traumatic brain injury. Archives of Physical Medicine and Rehabilitation, 91(11), 1637–1640. 10.1016/j.apmr.2010.05.017
23. Osborne CL, Kauvar DS, & Juengst SB (2019). Linking the behavioral assessment screening tool to the international classification of functioning, disability, and health as a novel indicator of content validity. Disability and Rehabilitation, 1–8. 10.1080/09638288.2018.1539128
24. Rabinowitz AR, Chervoneva I, Hart T, O'Neil-Pirozzi TM, Bogner J, Dams-O'Connor K, Brown AW, & Johnson-Greene D (2020). Influence of Prior and Intercurrent Brain Injury on 5-Year Outcome Trajectories After Moderate to Severe Traumatic Brain Injury. The Journal of Head Trauma Rehabilitation, 35(4), E342–E351. 10.1097/HTR.0000000000000556
25. Rasch Reporting Guidelines Task Force. (n.d.). ACRM. Retrieved November 9, 2020, from https://acrm.org/acrm-communities/measurement/misig-task-forces/rasch-reporting-guidelines-task-force/
26. Sample Size and Item Calibration or Person Measure Stability. (n.d.). Retrieved January 10, 2021, from https://www.rasch.org/rmt/rmt74m.htm
27. Silverstein B, Fisher WP, Kilgore KM, Harley JP, & Harvey RF (1992). Applying psychometric criteria to functional assessment in medical rehabilitation: II. Defining interval measures. Archives of Physical Medicine and Rehabilitation, 73(6), 507–518. 10.5555/uri:pii:000399939290184X
28. Table 6.1 Person statistics in misfit order: Winsteps Help (n.d.). Retrieved April 1, 2021, from https://www.winsteps.com/winman/table6_1.htm
29. Terhorst L, Juengst SB, Beck KB, & Shiffman S (2018). People can change: Measuring individual variability in rehabilitation science. Rehabilitation Psychology. 10.1037/rep0000214
30. Wright B, & Masters G (1982). Rating scale analysis. Measurement and Statistics. https://research.acer.edu.au/measurement/2
