European Journal of Physical and Rehabilitation Medicine. 2023 Sep 25;59(4):458–473. doi: 10.23736/S1973-9087.23.07908-X

The structure of the Early Rehabilitation Barthel Index (ERBI) should be modified: evidence from a Rasch analysis study

Leonardo PELLICCIARI 1, Lucia F LUCCA 2, Antonio DE TANTI 3, Rita FORMISANO 4, Anna ESTRANEO 5, Francesca C CAVA 6, Donatella SAVIOLA 3, Fabio LA PORTA 1,*
PMCID: PMC10595071  PMID: 37534887

Abstract

BACKGROUND

The Early Rehabilitation Barthel Index (ERBI) comprises seven items of the Early Rehabilitation Index and ten items of the Barthel Index. The ERBI is usually used to measure functional changes in patients with severe acquired brain injury (sABI), but its measurement properties have yet to be extensively assessed.

AIM

To study the unidimensionality and internal construct validity (ICV) of the ERBI through Confirmatory Factor Analysis (CFA), Mokken Analysis (MA), and Rasch Analysis (RA).

DESIGN

Multicenter prospective study.

SETTING

Inpatients from five intensive rehabilitation centers.

POPULATION

Two hundred and forty-seven subjects with sABI.

METHODS

ERBI was administered on admission and discharge to study its unidimensionality through CFA and MA and its ICV, reliability, and targeting through RA.

RESULTS

The preliminary analyses showed a lack of unidimensionality (RMSEA=0.460 >0.06; SRMR=0.176 >0.06; CFI=1.000 >0.950; TLI=1.000 >0.950). According to CFA, the “Confusional state” and “Behavioral disturbances” items showed low factor loadings (<0.40), whereas these two items composed a separate scale within the MA. Furthermore, the baseline RA showed that three items misfitted (“Mechanical ventilation,” “Confusional state,” “Behavioral disturbances”) and that several ICV requirements were not met. After deletion of the three misfitting items and further non-structural modifications (i.e., creation of testlets to absorb local dependence between items and item misfit), the solution obtained showed adequate ICV and adequate reliability for measurements at the individual level (PSI>0.85), although with a frank floor effect. This final solution was successfully replicated in the total sample. After post-hoc modifications of the score structure of two out of three misfitting items, the subsequent CFA (RMSEA=0.044 <0.06; SRMR=0.056 <0.06; CFI=1.000 >0.950; TLI=1.000 >0.950) and MA showed the resolution of the unidimensionality issues.

CONCLUSIONS

Although the ERBI is a potentially valuable tool for measuring functioning in the coma-to-community continuum, our analyses suggested that it lacks ICV, partly due to an incorrect scoring design of some items. A new prospective multicenter study is proposed to validate a modified version of the ERBI that overcomes the problems highlighted in this analysis.

CLINICAL REHABILITATION IMPACT

Our results do not support the use of the original structure of the ERBI in clinical practice and research, as a lack of ICV was highlighted.

Key words: Brain injuries, Health care outcome assessment, Psychometrics, Rehabilitation


Severe acquired brain injuries (sABIs) are associated with a wide heterogeneity of clinical, cognitive, behavioral, and psychosocial impairments, leading to reduced functional independence. Accurate and reliable measurement of functional independence is crucial to plan an appropriate and tailored rehabilitation treatment and to make an accurate functional prognosis.1 The Barthel Index (BI)2 was developed to measure functional independence in patients with different diseases. The BI has been widely used as a measurement instrument in patients with sABI in several kinds of research.3, 4 Moreover, the psychometric properties of the BI were confirmed in several neurological conditions, such as stroke,5 Parkinson’s disease,6 and brain injury.3 As its administration and total score calculation are straightforward, the BI is widely used in clinical practice.

However, some functional changes in such severely disabled patients during the early rehabilitation stay may go undetected using the BI. Indeed, for these patients, the initial rehabilitation goals are often related to medical stabilization, weaning from mechanical ventilation and other medical devices, recovery of swallowing, and prevention of the consequences of hypomobility.7 These crucial rehabilitation outcomes are not captured by the BI. Moreover, several studies8, 9 demonstrated a BI floor effect (the percentage of patients with the lowest possible score) exceeding the maximum recommended threshold of 15%.10 The presence of a floor effect compromises the ability of the BI to discriminate between patients with lower functional independence.11

To reduce this floor effect, Schönle et al.12 proposed adding to the BI the Early Rehabilitation Index (ERI), a scale composed of seven highly relevant items investigating variables typical of neurological and neurosurgical patients in the early post-acute rehabilitation phase, such as intensive medical monitoring, tracheotomy tube management and weaning, mechanical ventilation use, confusional state, behavioral disturbances, communication deficit, and dysphagia treatment; thus, the Early Rehabilitation Barthel Index (ERBI) was developed to include these domains. Although the ERBI has been administered in several studies in persons with sABI,13, 14 its psychometric properties have been assessed in detail by only a few studies.7, 15, 16 In particular, Rollnik et al.15 demonstrated high correlations between nurses’ and physicians’ assessments (r=0.849) and moderate correlations with the BI (r=0.438 and r=0.430 when administered by nurses or physicians, respectively) in 273 neurological rehabilitation patients. Furthermore, Reis et al.16 showed that a Brazilian Portuguese version of the ERBI had moderate to excellent inter-rater reliability (kappa ranging from 0.54 to 1.00) and that its total score correlated significantly with several other functional indicators in 122 patients admitted to the general intensive care unit. Finally, Rollnik et al.15, 17 demonstrated the predictive validity of low ERBI total scores in terms of significantly longer length of stay15, 17 and significant morbidities.15

The available psychometric evidence on the validity and reliability of the ERBI focused on the total score paradigm of Classical Test Theory (CTT). However, specific analysis has yet to be undertaken to demonstrate the unidimensionality of the ERBI items. In other words, so far, there is no evidence that ERI and BI items can be summated to generate a valid total score. As such, all the reliability and validity data accrued so far should be interpreted cautiously, as they were based on total scores, which may be internally invalid because they may be multi-dimensional. On the other hand, other classical and modern psychometric approaches could provide evidence of unidimensionality. Amongst modern psychometric models, the Rasch Measurement Theory (RMT) framework is worth mentioning, as it can provide evidence of unidimensionality and of the other requirements for internal construct validity (ICV) and reliability.18 The RMT is unique in that, if the data generated by the scale meet its requirements, the scale total score can be transformed into continuous estimates of ability on an interval scale. Considering that these interval-level estimates also hold the invariance property, it could be said that a scale validated within the RMT can deliver the three fundamental tenets of measurement: invariance, unidimensionality, and interval-level estimates.11

Thus, the aims of this study were: 1) to assess the ERBI unidimensionality according to a variety of classical and modern psychometric methods; 2) to assess in more detail the ERBI ICV and reliability within the RMT; 3) to provide clues on possible strategies to improve the ERBI measurement properties.

Materials and methods

Study design and setting

Data were collected across five Italian intensive neurorehabilitation centers with expertise in diagnosing and caring for adults with sABI within two multicenter prospective observational studies, where the ERBI was employed as an outcome measure. The methods of these studies are described in detail elsewhere.13, 19

The study protocols were approved by the local Ethical Committee (No. 244, 24th October 2017), and the studies were carried out according to the principles outlined in the Declaration of Helsinki. In addition, participants or their respective caregivers signed a written informed consent before any study-related procedures.

Participants

Patients were enrolled through convenience sampling and were included in this study if they met the following inclusion criteria: aged over 18 years at the onset, with a first event of traumatic or non-traumatic (i.e., anoxic or vascular) sABI, with a clinical diagnosis of a disorder of consciousness based on standardized clinical criteria,20 first admission to intensive rehabilitation unit, no longer than three months since sABI. Subjects were excluded if: they presented a mixed etiology or they reported a premorbid history of psychiatric or neurodegenerative diseases.

Outcome measures

The Early Rehabilitation Barthel Index (ERBI)12 comprises seven items from the ERI and ten items from the BI (Table I). Items from the ERI are scored dichotomously (-50 or 0 points; -25 or 0 points for item A6), whereas items from the BI are scored in 5-point steps from 0 up to 5, 10, or 15 points, depending on the item. All item scores are summed to generate a total score that ranges from -325 to 100 points (the higher the score, the greater the functional independence).

Table I. The Early Rehabilitation Barthel Index.

| Early Rehabilitation Index (ERI) item | Yes | No |
|---|---|---|
| A1. Intensive medical monitoring | −50 | 0 |
| A2. Tracheostoma requiring special treatment (suctioning) | −50 | 0 |
| A3. Intermittent (or continuous) mechanical ventilation | −50 | 0 |
| A4. Confusional state requiring special supervision | −50 | 0 |
| A5. Behavioral disturbances requiring special care (patient poses a risk to himself or his environment) | −50 | 0 |
| A6. Severe communication deficits | −25 | 0 |
| A7. Swallowing disorders requiring special supervision | −50 | 0 |

| Barthel Index (BI) item | Unable | Needs help | Independent |
|---|---|---|---|
| B1. Feeding | 0 | 5 | 10 |
| B2. Transfers | 0 | 5-10 | 15 |
| B3. Grooming | 0 | | 5 |
| B4. Toilet use | 0 | 5 | 10 |
| B5. Bathing | 0 | | 5 |
| B6. Mobility | 0 | 5-10 | 15 |
| B7. Stairs | 0 | 5 | 10 |
| B8. Dressing | 0 | 5 | 10 |
| B9. Bowels | 0 | 5 | 10 |
| B10. Bladder | 0 | 5 | 10 |
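
The scoring rule described above can be illustrated with a minimal sketch. The item score options are transcribed from Table I; reading "5-10" for Transfers and Mobility as two separate categories (5 and 10) is an assumption, and the names `ERI_ITEMS`, `BI_ITEMS`, `erbi_total`, and `score_range` are hypothetical helpers, not part of any published scoring software.

```python
# Allowed scores per item, transcribed from Table I.
# Assumption: "5-10" for B2 and B6 means two distinct categories (5 and 10).
ERI_ITEMS = {
    "A1": (-50, 0), "A2": (-50, 0), "A3": (-50, 0), "A4": (-50, 0),
    "A5": (-50, 0), "A6": (-25, 0), "A7": (-50, 0),
}
BI_ITEMS = {
    "B1": (0, 5, 10), "B2": (0, 5, 10, 15), "B3": (0, 5),
    "B4": (0, 5, 10), "B5": (0, 5), "B6": (0, 5, 10, 15),
    "B7": (0, 5, 10), "B8": (0, 5, 10), "B9": (0, 5, 10), "B10": (0, 5, 10),
}
ALL_ITEMS = {**ERI_ITEMS, **BI_ITEMS}


def erbi_total(responses):
    """Sum the 17 item scores after checking each against its allowed options."""
    if set(responses) != set(ALL_ITEMS):
        raise ValueError("responses must contain exactly the 17 ERBI items")
    for item, score in responses.items():
        if score not in ALL_ITEMS[item]:
            raise ValueError(f"invalid score {score} for item {item}")
    return sum(responses.values())


def score_range():
    """Return the (minimum, maximum) possible total score."""
    lowest = sum(min(options) for options in ALL_ITEMS.values())
    highest = sum(max(options) for options in ALL_ITEMS.values())
    return lowest, highest


# A patient with every ERI problem present and fully dependent on all BI items.
worst_case = {item: min(options) for item, options in ALL_ITEMS.items()}
print(score_range(), erbi_total(worst_case))
```

Under these assumptions the computed range matches the -325 to 100 range stated in the text.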

Procedures

Demographic and clinical characteristics of the patients were collected at admission to each rehabilitation center. Moreover, ERBI was administered both on admission and discharge. The participating centers adopted a shared protocol to collect data to reduce inter-rater variability.

Data manipulation

Considering that the available data included observations collected at two different time points (i.e., on admission and at discharge), we applied the procedure proposed by Mallinson21 to build a sample that contained only one randomly chosen assessment per enrolled subject. Therefore, we generated a unique assessment sample, where each patient had either an admission or a discharge assessment chosen randomly. Finally, we generated a total sample, which contained all available observations. The data manipulation procedure is detailed in Figure 1.

Figure 1. Generation of the unique assessment sample. We built a unique assessment sample that randomly contained an admission or discharge assessment.
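
The random selection described above can be sketched as follows. This is only an illustration of the idea (one randomly chosen time point per patient), not the authors' actual procedure; the function name, the patient identifiers, and the fixed seed are assumptions made for reproducibility of the example.

```python
import random


def build_unique_assessment_sample(patient_ids, seed=0):
    """Pick, for each patient, either the admission or the discharge assessment.

    Returns a dict mapping patient id -> chosen time point, so that no patient
    contributes more than one observation.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return {pid: rng.choice(("admission", "discharge")) for pid in patient_ids}


# Hypothetical identifiers for the 247 enrolled patients.
patients = [f"P{i:03d}" for i in range(1, 248)]
unique_sample = build_unique_assessment_sample(patients)
print(len(unique_sample))
```

The total sample, by contrast, simply stacks both assessments of every patient (2 x 247 = 494 observations).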

Planned analyses

The following analyses were undertaken:

  • descriptive statistics for the total sample, scale, and items;

  • preliminary assessment of unidimensionality on the total sample:

  • Classical test theory item analysis (CTT-IA);

  • Mokken analysis (MA);

  • confirmatory factor analysis (CFA);

  • ICV with Rasch analysis (RA) on the unique assessment sample and total sample.

Statistical analysis

Descriptive sample and scale statistics

Descriptive statistics were computed to describe all the collected variables. Mean and standard deviation, median with first and third quartile, and absolute frequency with percentage were calculated for the interval, ordinal, and nominal variables, respectively.

Preliminary assessment of unidimensionality on the total sample

Classical item analysis

We assessed the internal consistency of the total sample by calculating the following statistics:

  • at the total score level, the Cronbach’s alpha (α),22 where values between 0.70 and 0.95 were considered satisfactory;23

  • at the item level:

  • the average inter-item correlations, i.e., the mean of the inter-item correlations (based on Spearman’s correlation coefficient (rs)24) between each pair of items; values ≥0.2 were recommended;24

  • the Cronbach’s alpha with an item deleted; values below the total α were expected for each item;24

  • the item-to-total correlations, based on rs between each item and its rest score (i.e., the total score minus the item score); values ≥0.40 were considered acceptable.24
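
The statistics listed above can be sketched with NumPy alone. This is a minimal illustration on a persons-by-items score matrix, not the SPSS procedure used in the study; the function names are hypothetical, and the Spearman coefficient is computed here as a Pearson correlation on average ranks.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)


def alpha_if_item_deleted(scores):
    """Alpha recomputed after dropping each item in turn."""
    scores = np.asarray(scores, dtype=float)
    return [cronbach_alpha(np.delete(scores, j, axis=1))
            for j in range(scores.shape[1])]


def average_ranks(x):
    """Ranks starting at 1, with ties replaced by their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman's rs as the Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def item_rest_correlations(scores):
    """Correlation of each item with the total score minus that item."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    return [spearman(scores[:, j], total - scores[:, j])
            for j in range(scores.shape[1])]


# Toy data: three perfectly consistent items answered by four persons.
toy = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(toy), item_rest_correlations(toy))
```

An item whose deletion raises α above the full-scale value, or whose item-rest correlation falls below 0.40, would be flagged under the criteria above.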

Mokken analysis

We computed the Scalability coefficients (H)25 for each item. Particularly, we considered the following rules of thumb for the interpretability of this coefficient:25

  • <0.30: lack of scalability;

  • 0.30-0.39: low scalability;

  • 0.40-0.49: moderate scalability;

  • ≥0.50: high scalability.
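
To make the coefficient concrete, the following sketch computes Loevinger's item scalability coefficient for dichotomous items only (observed covariances divided by the maximum covariances attainable given the item marginals) and applies the rules of thumb listed above. It is not the algorithm of the R package used in the study, which also handles polytomous items; the function names are hypothetical, and treating a value of exactly 0.50 as "high" is an assumption.

```python
import numpy as np


def item_scalability(data):
    """Item H coefficients for a persons x items matrix of 0/1 responses."""
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    p = x.mean(axis=0)                       # proportion scoring 1 per item
    joint = x.T @ x / n                      # P(X_i = 1 and X_j = 1)
    cov = joint - np.outer(p, p)             # observed covariances
    cov_max = np.minimum.outer(p, p) - np.outer(p, p)  # maximum given marginals
    off_diagonal = ~np.eye(k, dtype=bool)
    return (cov * off_diagonal).sum(axis=1) / (cov_max * off_diagonal).sum(axis=1)


def interpret_h(h):
    """Map an H value onto the rules of thumb used in the text."""
    if h < 0.30:
        return "lack of scalability"
    if h < 0.40:
        return "low scalability"
    if h < 0.50:
        return "moderate scalability"
    return "high scalability"


# A perfect Guttman pattern: every item should reach H = 1.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(item_scalability(guttman), interpret_h(0.66))
```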

Confirmatory factor analysis (CFA)

A CFA was run to assess the ERBI unidimensional structure. The assessment of model fit was performed using the following indexes:24, 26

  • the model Chi-square (χ2), an overall indicator of model fit, measures the discrepancy between the covariance matrices of the model and the sample.27 For adequate fit to the model, the χ2 probability values should be not significant;

  • the Root Mean Square Error of Approximation (RMSEA) measures the discrepancy between the covariance matrix predicted by the model and the population covariance matrix.27 Values≤0.06 are considered sufficient to indicate a good fit to the model;28

  • the Standardized Root Mean Square Residual (SRMR) is the average value across all residual values derived from the comparison between the predicted and the observed variance-covariance matrix.27 Values≤0.08 indicate an adequate fit to the model;28

  • the Tucker-Lewis Index (TLI) and the Comparative Fit Index (CFI) measure the proportionate improvement in the model fit by comparing the hypothesized and the null models. Values≥0.95 are considered indicative of a good fit to the model;28

  • a representative loading of each item on the latent factor>0.4 was considered acceptable.23

Should the initial baseline analysis fail to fit the CFA unidimensional model, we would assess the modification indices (MIs) on item pairs.27 MIs27 indicate the presence of residual covariance between items, as in the case of local dependency between items, where the response to one of the two items within the pair is influenced by the response to the other item.29, 30 In these cases, the model would be re-specified after accounting for local dependency by allowing correlation of the error terms of the items in the pair.24, 27 After accounting for local dependency, we would reassess the fit of the item set to a unidimensional model.
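
As an illustration of how the cut-offs above are applied, the sketch below checks a set of fit indices against them. The thresholds are those given in this section (note that the abstract and Table IV list 0.06 rather than 0.08 for SRMR); the dictionary and function names are hypothetical.

```python
# Cut-offs as stated in this section: (direction, threshold).
CUTOFFS = {
    "RMSEA": ("<=", 0.06),
    "SRMR": ("<=", 0.08),
    "CFI": (">=", 0.95),
    "TLI": (">=", 0.95),
}


def check_fit(indices):
    """Return, for each index, whether it meets its recommended cut-off."""
    result = {}
    for name, value in indices.items():
        direction, threshold = CUTOFFS[name]
        result[name] = value <= threshold if direction == "<=" else value >= threshold
    return result


# Baseline CFA values reported in the abstract and Table IV.
baseline = {"RMSEA": 0.460, "SRMR": 0.176, "CFI": 1.000, "TLI": 1.000}
print(check_fit(baseline))
```

With the baseline values, RMSEA and SRMR fail while CFI and TLI pass, which is the pattern the study reports as a lack of fit to a unidimensional structure.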

Analysis of internal construct validity with Rasch analysis

A full description of all the RA parameters and relative indexes employed is available in Supplementary Digital Material 1 (Supplementary Table I, Supplementary Text File 1). The RAs were based on the partial credit Rasch model, considering that the items had an unequal number of scoring categories. In brief, we based the interpretation of the analysis findings on the following summary statistics:

Fit to the Rasch model relates to the stochastically invariant ordering of the items. It was assessed by computing the mean and standard deviation of the item and person fit residuals, with a standard deviation ≤1.4 indicating good fit.31 Furthermore, we also assessed a summary Chi-square item-trait interaction statistic, which should not be significant (i.e., its P value should be above the Bonferroni-adjusted P value) when there are no deviations from the model’s expectations.32-34
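
The Bonferroni adjustment can be sketched in one line. The cut-offs reported in Table V (0.0029, 0.0036, 0.0100, 0.0083, 0.0071) are numerically consistent with dividing 0.05 by the number of items or testlets in each analysis (17, 14, 5, 6, and 7, respectively); that correspondence is an inference from the numbers, not something the text states explicitly, and the function name is hypothetical.

```python
def bonferroni_cutoff(alpha, n_tests):
    """Bonferroni-adjusted significance threshold."""
    return alpha / n_tests


# Assumed number of tests per analysis: one per item (or testlet / split item).
for n_items in (17, 14, 5, 6, 7):
    print(n_items, round(bonferroni_cutoff(0.05, n_items), 4))
```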

We also assessed the following ICV requirements:

  • unidimensionality, i.e., all scale items must measure a single underlying construct.33 We tested the unidimensionality with a post-hoc procedure,35 consisting of a paired t-test on separate estimates for each subject (derived from subsets of items identified by principal component analysis of the standardized Rasch residuals). Unidimensionality was considered ‘strict’ when both the proportion of significant t-tests (PST) and the lower bound of the binomial confidence interval for proportions (BCI) were below 5%.36 Otherwise, unidimensionality was considered ‘acceptable’ when only the BCI was <5%;36

  • monotonicity, i.e., the increase of the underlying latent trait should be associated with an increased probability of endorsing a scoring category indicative of higher functional independence within an item.36, 37 Therefore, this requirement was summarized as the percentage of items with disordered thresholds (T-DT), expecting a value of 0% for adequate monotonicity;38

  • local independence, i.e., all the variation among responses to an item is accounted for by the person’s ability only. Therefore, there should be no further systematic relationship among responses for the same value of ability.37 Thus, we considered items to be locally dependent if their residual correlation was above a local dependency relative cutoff (LDRC), calculated by adding 0.2 to the average of the residual correlations after excluding the correlation of each item with itself.33, 39 Then, we summed all the residual correlation coefficients above the LDRC to obtain a total value of LD (T-LD), where 0 indicates the complete absence of LD;38

  • absence of differential item functioning (DIF), i.e., each item must be invariant also across relevant subgroups (or person factors), such as gender or age.36, 40 A two-way ANOVA tested the presence of DIF for each item, where scores are compared across each level of the person factor and different ability levels, as summarized by the class intervals. If the ANOVA P values were below the Bonferroni-corrected threshold, DIF was present.41 We summarized the amount of DIF (T-DIF) by obtaining the absolute value of the base-ten logarithm of the sum of all significant P values across all items and all person factors.38 Thus, T-DIF values ranged from zero to infinity, where zero indicated the absence of DIF. We tested the following person factors within the DIF analysis: age (<40 years, between 41 and 59 years, >60 years); gender (male, female); etiology (traumatic brain injury, vascular, other); acute length of stay (<28 days, between 29-48 days, >49 days); time since lesion (<65 days, between 66-123 days, >124 days); and Levels of Cognitive Functioning (LCF) (levels 1-3: disorder of consciousness; levels 4-6: global neuropsychological dysfunction; levels ≥7: selective neuropsychological deficits).
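
Two of the requirements above lend themselves to a short numerical sketch: the paired t-test approach to unidimensionality and the relative cut-off for local dependence. The code below is an illustration under stated assumptions, not the RUMM2030 implementation: the confidence interval uses a normal approximation to the binomial, the critical t value is fixed at 1.96, and all function names are hypothetical.

```python
import numpy as np


def proportion_significant_t_tests(est_a, se_a, est_b, se_b, critical=1.96):
    """PST and lower 95% bound from person estimates on two item subsets."""
    est_a, se_a = np.asarray(est_a, float), np.asarray(se_a, float)
    est_b, se_b = np.asarray(est_b, float), np.asarray(se_b, float)
    t = (est_a - est_b) / np.sqrt(se_a ** 2 + se_b ** 2)
    pst = float(np.mean(np.abs(t) > critical))
    half_width = 1.96 * np.sqrt(pst * (1 - pst) / t.size)  # normal approximation
    return pst, pst - half_width


def classify_unidimensionality(pst, lower_bound, limit=0.05):
    """'strict' if PST and lower bound are below 5%, 'acceptable' if only the bound is."""
    if pst < limit and lower_bound < limit:
        return "strict"
    if lower_bound < limit:
        return "acceptable"
    return "violated"


def local_dependency(residual_corr, margin=0.2):
    """LDRC = mean off-diagonal residual correlation + margin; T-LD = sum above it."""
    r = np.asarray(residual_corr, dtype=float)
    upper = r[np.triu_indices(r.shape[0], k=1)]   # each item pair once, no diagonal
    ldrc = upper.mean() + margin
    return float(ldrc), float(upper[upper > ldrc].sum())


# Toy inputs: four persons, and a 3 x 3 residual correlation matrix.
pst, lower = proportion_significant_t_tests([0, 0, 0, 0], [0.5] * 4,
                                            [0, 0, 0, 3], [0.5] * 4)
toy_corr = [[1.0, 0.5, -0.1], [0.5, 1.0, -0.1], [-0.1, -0.1, 1.0]]
print(pst, classify_unidimensionality(pst, lower), local_dependency(toy_corr))
```

In the toy matrix the mean off-diagonal correlation is 0.1, so the cut-off is 0.3 and only the 0.5 pair counts towards T-LD.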

Targeting and reliability were summarized as follows:

  • targeting (i.e., how well the measurement range of the scale matches the distribution of the calibrating sample)42 was studied by the visual inspection of the targeting graphs and by the presence of ceiling and floor effects. The latter were deemed significant if the highest or the lowest possible score were detected, respectively, for more than 15% of the subjects in the sample;10

  • separation reliability (i.e., the ability of the instrument to separate persons effectively based on their level of ability)37 was expressed by the Person Separation Index (PSI) and α.43 In this context, we considered PSI or α values ≥0.85 and ≥0.70 but <0.85 as sufficient for individual-level and group-level measurements, respectively.30, 44
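
The two decision rules above (the 15% criterion for floor and ceiling effects, and the reliability bands) can be expressed as a small sketch. The default minimum and maximum scores are those of the original ERBI; the function names and the toy data are hypothetical.

```python
def floor_ceiling(total_scores, min_score=-325, max_score=100, threshold=0.15):
    """Proportion of persons at the extreme scores and whether it exceeds 15%."""
    n = len(total_scores)
    floor = sum(score == min_score for score in total_scores) / n
    ceiling = sum(score == max_score for score in total_scores) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        "floor_effect": floor > threshold,
        "ceiling_effect": ceiling > threshold,
    }


def reliability_level(value):
    """Interpret a PSI or alpha value using the bands given in the text."""
    if value >= 0.85:
        return "individual-level"
    if value >= 0.70:
        return "group-level"
    return "insufficient"


# Toy sample: two of ten persons obtain the lowest possible score.
toy_scores = [-325, -325, 0, 0, 0, 0, 0, 0, 0, 0]
print(floor_ceiling(toy_scores), reliability_level(0.902))
```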

Should the data not fit the Rasch model (as it is often the case), several subsequent analyses would be undertaken to perform some adjustments to the scale aiming at controlling for the violations of the ICV requirements. This process, which we would undertake iteratively, would be based on post-hoc modifications that could be either:

  • structural, where the scale structure is actively modified at the item level, by either rescoring or deleting items (i.e., these modifications affect the total score range). For item rescoring, collapsing adjacent response categories of the same item would resolve monotonicity violations. Published guidelines would be followed,45 and the rescoring pattern would be chosen to maximize statistical indexes and clinical meaning.26 Instead, an item would be deleted in case of a severe misfit to the model requirements, also taking into account the findings of the preliminary unidimensionality analysis;

  • statistical, where items would undergo testlet creation or item splitting. In these cases, the scale structure would be unmodified (i.e., the total score range would be unchanged), as these procedures mainly affect the conversion of the total score into interval-level estimates of ability. According to this approach, testlet creation (i.e., item grouping) would be performed on item pairs to account for LD,46 whereas item splitting would be used to account for violations of the absence of uniform-DIF requirement, as detected by means of ANOVA.40 However, should DIF be revealed, the influence of the item-splitting procedure on the person estimates would be examined by employing the procedure reported by Maritz et al.31 before incorporating the item splitting in the final solution. In particular, after item splitting, we would anchor the split-item solution (i.e., without DIF) to the solution without splitting (i.e., with DIF); subsequently, we would compare the person estimates of the two solutions with a paired t-test, assessing the size of any difference by Cohen’s d. Thus, should Cohen’s d be <0.2, we would conclude that the DIF is negligible and, therefore, we would not adjust for it; instead, in case of Cohen’s d ≥0.2, we would correct the DIF by splitting the item, and the split solution would be the final solution.
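
The decision rule for DIF adjustment described above can be sketched as follows. The text does not specify which variant of Cohen's d was used; this sketch assumes the paired version (mean difference divided by the standard deviation of the differences), and the function names and toy estimates are hypothetical.

```python
import numpy as np


def paired_cohens_d(estimates_split, estimates_unsplit):
    """Effect size for paired person estimates: |mean difference| / SD of differences."""
    diff = np.asarray(estimates_split, float) - np.asarray(estimates_unsplit, float)
    return float(abs(diff.mean()) / diff.std(ddof=1))


def dif_decision(d, threshold=0.2):
    """Keep the unsplit solution when the effect on person estimates is negligible."""
    return "DIF negligible: keep unsplit item" if d < threshold else "split item"


# Toy person estimates (logits) from the split and unsplit solutions.
d = paired_cohens_d([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(d, dif_decision(d))
```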

Fitness to the Rasch model, ICV requirements, reliability, and targeting were assessed for the original scale (base analysis) and, after each scale modification, to ascertain whether adequate model fit was achieved. This process was repeated cyclically until no further changes were needed and/or possible.

The RA was conducted on the unique assessment sample (i.e., validating sample). Once a solution was found, the same analysis would be replicated on the total sample (i.e., confirmatory sample), which would then be anchored to the validating sample. The solution would be considered stable enough should adequate fit to the Rasch model be demonstrated also for the anchored confirmatory sample.

Statistical notes, software, and sample size issues

Descriptive statistics and internal consistency analyses were computed with SPSS software (v. 21 for Windows; SPSS, Inc., Chicago, IL, USA). MA was run with the R package mokken v. 2.8.4. CFA was performed using the Mplus software v. 6.0 (Muthen & Muthen, Los Angeles, CA, USA). RA was performed with RUMM2030 software v. 5.1 (Perth, Australia). A significance value of 0.05 was used and corrected for the number of tests by Bonferroni correction.47 Finally, we used the RUMM Logbook™ v. 1.9.5, an ad-hoc Excel 2007™ application developed using Microsoft Visual Basic™ macros to facilitate the interpretation of the results of each RA. A free copy of this application is available from the corresponding author upon request. For CFA, it was estimated that 247 subjects would guarantee a subject-parameter ratio of 9.1, which is close to the recommended ratio of 10:1,48 given 27 score points for ERBI. For RA, the same sample size would be sufficient to estimate item difficulty with α of 0.01 to <±0.5 logits.49
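
The subject-parameter ratio quoted above can be checked with a few lines. The sketch assumes that the "27 score points" correspond to the number of thresholds (categories minus one, summed over items), with category counts read from Table I and "5-10" counted as two categories; this reading is an interpretation, not something the text states.

```python
# Number of response categories per item, read from Table I (assumption:
# "5-10" for Transfers and Mobility counts as two categories).
ERI_CATEGORIES = [2] * 7
BI_CATEGORIES = [3, 4, 2, 3, 2, 4, 3, 3, 3, 3]

# One threshold between each pair of adjacent categories.
n_thresholds = sum(k - 1 for k in ERI_CATEGORIES + BI_CATEGORIES)
ratio = 247 / n_thresholds
print(n_thresholds, round(ratio, 1))
```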

Data availability statement

The raw data associated with the article are publicly available for download from www.zenodo.org (under the Creative Commons Attribution 4.0 International license) at the following link: https://doi.org/10.5281/zenodo.8171871.

Results

Participants

Two hundred and forty-seven patients with sABI were included in this study. Demographic and clinical characteristics are presented in Table II.

Table II. Demographic and clinical characteristics of the sample (N.=247).

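| Variable | N. | % | Median | Min-max | Mean | SD |
|---|---|---|---|---|---|---|
| Age (years) | | | 51 | 0-82 | 47.7 | 19.1 |
| Gender | | | | | | |
| Male | 164 | 66.4 | | | | |
| Female | 83 | 33.6 | | | | |
| Etiology | | | | | | |
| TBI | 138 | 55.9 | | | | |
| Vascular | 87 | 35.2 | | | | |
| Anoxic | 20 | 8.1 | | | | |
| Infective | 2 | 0.8 | | | | |
| LOS-A (days) | | | 34 | 10-199 | 41.9 | 26.1 |
| LOS-R (days) | | | 142 | 4-605 | 158.4 | 26.1 |
| Discharge GOS | | | | | | |
| Good recovery | 16 | 6.5 | | | | |
| Moderate disability | 47 | 19.0 | | | | |
| Severe disability | 115 | 46.5 | | | | |
| Vegetative state | 68 | 27.5 | | | | |
| Death | 1 | 0.4 | | | | |
| Center | | | | | | |
| ISA | 144 | 58.3 | | | | |
| CCF | 51 | 20.6 | | | | |
| IMOR | 20 | 8.1 | | | | |
| FDG | 20 | 8.1 | | | | |
| FSL | 12 | 4.9 | | | | |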

N: number; Min: minimum; Max: maximum; SD: standard deviation; TBI: traumatic brain injury; LOS-A: acute length of stay; LOS-R: rehabilitation length of stay; GOS: Glasgow Outcome Scale; ISA: Istituto Sant’Anna; CCF: Centro Cardinal Ferrari; IMOR: Istituto di Montecatone Ospedale di Riabilitazione; FDG: Fondazione Don Gnocchi; FSL: Fondazione Santa Lucia.

Given the availability of an admission and a discharge assessment, we built a total sample including both assessments, giving 494 observations. Next, a unique assessment sample was created (N.=247), containing a randomly selected observation for each patient, with no repeated observations (Figure 1). Detailed clinical characteristics for the unique assessment sample are presented in Table III.

Table III. Detailed clinical characteristics for each sample and assessment.

Variable Admission (N.=247) Discharge (N.=247) Unique assessment sample (N.=247)
N. % Med Min-Max Mean SD N. % Med Min-max Mean SD N. % Med Min-max Mean SD
TSL (days) 36 10-428 49.4 52.0 183 31-587 198.8 106.5 81 15-587 125.7 112.8
Diagnosis
VS 86 34.8 35 14.2 62 25.1
MCS 79 32.0 47 19.0 61 24.7
E-MCS 82 33.2 165 66.8 124 50.2
Missing 0 0.0 0 0.0 0 0.0
CRS-R 12.5 1-23 13.5 8.0 23 3-23 17.8 6.8 17 1-23 15.6 7.6
Missing 5 2.0 25 10.1 14 5.7
LCF
1 1 0.4 0 0.0 0 0.0
2 85 34.4 36 14.6 62 25.1
3 70 28.3 47 19.0 58 23.5
4 37 15.0 5 2.0 16 6.5
5 24 9.7 26 10.5 23 9.3
6 25 10.1 48 19.4 37 15.0
7 4 1.6 69 27.9 42 17.0
8 1 0.4 15 6.1 9 3.6
Missing 0 0.0 1 0.4 0 0.0
LCF Groups
1-3 156 63.2 83 33.6 120 48.6
4-6 86 34.8 79 32.0 76 30.8
7-8 5 2.0 84 34.0 51 20.6
Missing 0 0.0 1 0.4 0 0.0
Tracheostomia
Yes 200 81.0 69 27.9 130 52.6
No 47 19.0 178 72.1 117 47.4
Missing 0 0.0 0 0.0 0 0.0
Respiration
Spontaneous 122 49.4 214 86.6 173 70.0
Spontaneous+O2 102 41.3 27 10.9 58 23.5
Mechanical 10 4.0 6 2.4 8 3.2
Missing 13 5.3 0 0.0 8 3.2
Feeding
TPN 5 2.0 2 0.8 4 1.6
NGT 100 40.5 7 2.8 49 19.8
PEG 99 40.1 89 36.0 95 38.5
Oral 42 17.0 149 60.3 99 40.1
Missing 1 0.4 0 0.0 0 0.0
Bladder
No 239 96.8 45 18.2 137 55.5
Yes 8 3.2 202 81.8 110 44.5
Missing 0 0.0 0 0.0 0 0.0
Pressure sores
No 154 62.3 208 84.2 180 72.9
Yes 93 37.7 38 15.4 67 27.1
Missing 0 0.0 1 0.4 0 0.0
Ashworth Scale a
No 210 85.0 169 68.4 187 75.7
Yes 37 15.0 58 23.5 51 20.6
Missing 0 0.0 20 8.1 9 3.6
HO
No 4 1.6 20 8.1 10 4.0
Yes 243 98.4 207 83.8 228 92.3
Missing 0 0.0 20 8.1 9 3.6
Craniectomy
No 53 21.5 26 10.5 37 15.0
Yes 194 78.5 221 89.5 210 85.0
Missing 0 0.0 0 0.0 0 0.0
Hydrocephalus
No 20 8.1 23 9.3 21 8.5
Yes 207 83.8 204 82.6 206 83.4
Missing 20 8.1 20 8.1 20 8.1

N: number; Med: median; Min: minimum; Max: maximum; SD: standard deviation; TSL: time since lesion; VS: vegetative state; MCS: minimally conscious state; E-MCS: emergence from minimally conscious state; CRS-R: Coma Recovery Scale-Revised; LCF: Levels of Cognitive Functioning; O2: oxygen; TPN: total parenteral nutrition; NGT: nasogastric tube; PEG: percutaneous endoscopic gastrostomy; HO: heterotopic ossification. a Ashworth Scale score ≥3 points in >4 joints.

Analyses on the total sample

Internal consistency

Considering the total sample, at the total score level, α was satisfactory (α=0.964). Similar findings were reported for the average inter-item correlation (0.544, above the recommended 0.200). At the item level, we observed item-to-total correlation coefficients above 0.400 for all items but three (i.e., A3 [Mechanical ventilation], A4 [Confusional state], and A5 [Behavioral disturbances]). α with an item deleted showed that the deletion of five items (i.e., A1 [Intensive medical monitoring], A2 [Tracheotomy], A3 [Mechanical ventilation], A4 [Confusional state], and A5 [Behavioral disturbances]) increased α (Table IV).

Table IV. Results of classical item analysis, Mokken analysis and confirmatory factor analysis on original items (N.=494).
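Summary analysis

| Analysis | Statistic | Observed value | Recommended value |
|---|---|---|---|
| Classical item analysis | aIIC | 0.544 | >0.200 |
| Classical item analysis | α | 0.964 | >0.700 |
| Confirmatory factor analysis - baseline | RMSEA | 0.460* | ≤0.06 |
| Confirmatory factor analysis - baseline | SRMR | 0.176* | ≤0.06 |
| Confirmatory factor analysis - baseline | CFI | 1.000 | ≥0.950 |
| Confirmatory factor analysis - baseline | TLI | 1.000 | ≥0.950 |
| Confirmatory factor analysis - baseline | χ2df | 12557.7 (df=119) | n.s. |
| Confirmatory factor analysis - baseline | P value | 0.0000* | ≥0.05 |
| Confirmatory factor analysis - final | RMSEA | 0.048 | ≤0.06 |
| Confirmatory factor analysis - final | SRMR | 0.084* | ≤0.06 |
| Confirmatory factor analysis - final | CFI | 1.000 | ≥0.950 |
| Confirmatory factor analysis - final | TLI | 1.000 | ≥0.950 |
| Confirmatory factor analysis - final | χ2df | 174.9 (df=82) | n.s. |
| Confirmatory factor analysis - final | P value | 0.0000* | ≥0.05 |

Item analysis

| Item | ITC | α-iid | H (Mokken scale 1) | H (Mokken scale 2) | Factor loading (CFA baseline) | SE (CFA baseline) | Factor loading (CFA final) | SE (CFA final) |
|---|---|---|---|---|---|---|---|---|
| A1 Intensive medical monitoring | 0.615 | 0.964* | 1.00 | | 1.000 | 0.000 | 1.000 | 0.000 |
| A2 Tracheostoma | 0.676 | 0.964* | 0.97 | | 0.946 | 0.013 | 0.952 | 0.013 |
| A3 Mechanical ventilation | 0.092* | 0.968* | 0.98 | | 0.998 | 0.000 | 0.999 | 0.000 |
| A4 Confusional state | 0.147* | 0.969* | | 0.66 | 0.296* | 0.068 | 0.372* | 0.083 |
| A5 Behavioral disturbances | 0.132* | 0.969* | | 0.66 | 0.292* | 0.078 | 0.205* | 0.080 |
| A6 Severe communication deficits | 0.734 | 0.963 | 0.96 | | 0.931 | 0.013 | 0.928 | 0.014 |
| A7 Swallowing disorders | 0.758 | 0.963 | 0.97 | | 0.999 | 0.000 | 0.966 | 0.010 |
| B1 Feeding | 0.946 | 0.959 | 1.00 | | 0.985 | 0.000 | 0.985 | 0.000 |
| B2 Transfers | 0.961 | 0.959 | 0.98 | | 0.990 | 0.000 | 0.990 | 0.000 |
| B3 Grooming | 0.959 | 0.959 | 0.99 | | 0.988 | 0.000 | 0.986 | 0.000 |
| B4 Toilet use | 0.926 | 0.960 | 0.98 | | 0.988 | 0.000 | 0.987 | 0.000 |
| B5 Bathing | 0.866 | 0.961 | 0.98 | | 0.986 | 0.000 | 0.986 | 0.000 |
| B6 Mobility | 0.947 | 0.959 | 0.98 | | 0.987 | 0.000 | 0.986 | 0.000 |
| B7 Stairs | 0.901 | 0.960 | 0.98 | | 0.988 | 0.000 | 0.986 | 0.000 |
| B8 Dressing | 0.940 | 0.959 | 0.99 | | 0.986 | 0.000 | 0.986 | 0.000 |
| B9 Bowels | 0.938 | 0.960 | 0.97 | | 0.986 | 0.000 | 0.986 | 0.000 |
| B10 Bladder | 0.929 | 0.960 | 0.97 | | 0.986 | 0.000 | 0.986 | 0.000 |
| Recommended values | >0.400 | <0.964 | >0.30 | >0.30 | >0.400 | | >0.400 | |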

aIIC: average inter-item correlation; α: Cronbach’s alpha; RMSEA: Root Mean Square Error of Approximation; SRMR: Standardized Root Mean Square Residual; CFI: Comparative Fit Index; TLI: Tucker Lewis Index; χ2df: Chi-square with degree of freedom; ITC: item-to-total correlation; α-iid: alpha if an item was deleted; H: scalability coefficient; SE: standard error; n.s.: not significant. *Statistics values outside the recommended cut-off.

Mokken analysis

At the scale level, MA on the total sample showed that all items were scalable on one main scale, except A4 [Confusional state] and A5 [Behavioral disturbances], which were scalable onto a separate scale. At the item level, H coefficients showed high scalability for all items on both scales (all H≥0.66, above the 0.50 threshold for high scalability) (Table IV).

Confirmatory factor analysis

Baseline CFA on the total sample showed a lack of fit to a unidimensional structure (RMSEA=0.460). Furthermore, A4 [Confusional state] and A5 [Behavioral disturbances] again showed markedly lower factor loadings (<0.30) in comparison to all other items (>0.93). Finally, high MIs suggested local dependence between several pairs of items (Table IV). In particular, we observed MIs >0.994 amongst several pairs of BI items. Indeed, after accounting for local dependence, the scale did fit a unidimensional model (RMSEA=0.048), although the factor loadings for A4 and A5 remained markedly lower (<0.38) in comparison to the other items (>0.929) (Table IV).
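
The RMSEA values quoted above can be reproduced from the Chi-square statistics in Table IV with the standard formula RMSEA = sqrt(max(χ2 − df, 0) / (df · (N − 1))). The sketch assumes that the fused table entries "12557.7119" and "174.982" are to be read as χ2=12557.7 with 119 degrees of freedom and χ2=174.9 with 82 degrees of freedom; 119 is also the number of degrees of freedom of a one-factor model with 17 indicators under the usual count (17·18/2 − 2·17). The estimator used by the CFA software may apply a slightly different formula, so this is a plausibility check rather than a replication.

```python
import math


def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation from chi-square, df and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def one_factor_df(n_items):
    """Degrees of freedom of a one-factor model: unique (co)variances minus
    the free loadings and residual variances (factor variance fixed)."""
    return n_items * (n_items + 1) // 2 - 2 * n_items


print(one_factor_df(17))                      # degrees of freedom, 17 items
print(round(rmsea(12557.7, 119, 494), 3))     # baseline model
print(round(rmsea(174.9, 82, 494), 3))        # model after accounting for LD
```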

Rasch analysis

As reported in Table V, the base RA on the unique assessment sample showed that the data failed to meet the requirements of stochastic invariance (χ2df=221.517; P≤0.001), local independence (10 items presented their residual correlation above the LDRC, here set at 0.168), and absence of DIF (six items presented uniform and non-uniform DIF). Furthermore, 11 items misfitted to the model. However, there were no violations of the monotonicity requirements (all items had an ordered threshold structure), and the scale appeared to be strictly unidimensional (PST=1.3%, lower BCI=-0.9%). At the item level, A3 (Mechanical ventilation), A4 (Confusional state), and A5 (Behavioral disturbances) misfitted the model.

Table V. Summary of the Rasch analyses for each sample.
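Column groups: fitness to the Rasch model (item fit residual, person fit residual, item-trait interaction: χ2df, P value, cut-off); internal construct validity requirements (unidimensionality: PST, lower BCI; other ICV requirements: T-DT, T-LD, T-DIF); separation reliability (PSI, α).

| Sample | Analysis name | N/CI | Item fit residual, mean | Item fit residual, SD | Person fit residual, mean | Person fit residual, SD | χ2df | P value | Cut-off a | PST (%) b | Lower BCI (%) b | T-DT c | T-LD d | T-DIF e | PSI | α |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unique assessment sample | Base | 247/2 | -1.462 | 1.821* | -0.462 | 0.468 | 221.517 | 0.0000* | 0.0029 | 1.3% | -0.9% | 0.0% | 3.352* | 135.5* | 0.960 | 0.965 |
| Unique assessment sample | Deleting A3-A4-A5 | 247/2 | -0.751 | 1.065 | -0.289 | 0.435 | 25.214 | 0.0331 | 0.0036 | 5.3% | 1.5% | 0.0% | 2.320* | 43.5* | 0.951 | 0.980 |
| Unique assessment sample | Five testlets | 247/2 | -0.633 | 0.919 | -0.324 | 0.402 | 7.25 | 0.2070 | 0.0100 | 3.1% | -0.7% | 0.0% | 0.000 | 21.5* | 0.902 | 0.936 |
| Total sample | Base | 494/4 | -1.742 | 2.315* | -0.458 | 0.440 | 866.934 | 0.0000* | 0.0029 | 2.6% | 1.5% | 0.0% | 1.576* | 121.8* | 0.953 | 0.964 |
| Total sample | Deleting A3-A4-A5 | 494/4 | -1.020 | 1.466* | -0.349 | 0.452 | 491.242 | 0.0000* | 0.0036 | 4.0% | 2.0% | 0.0% | 1.536* | 112.0* | 0.940 | 0.979 |
| Total sample | Five testlets | 494/4 | -0.712 | 0.706 | -0.359 | 0.410 | 19.315 | 0.1988 | 0.0100 | 2.4% | 0.4% | 0.0% | 0.000 | 64.1* | 0.909 | 0.937 |
| Total sample | Anchored from unique assessment sample | 494/4 | -0.442 | 0.694 | -0.353 | 0.427 | 20.615 | 0.1518 | 0.0100 | 3.6% | 1.6% | 0.0% | 0.000 | 57.3 | 0.894 | 0.937 |
| Total sample | Splitting ST4 for time since lesion | 494/4 | -0.770 | 0.657 | -0.359 | 0.411 | 19.117 | 0.3260 | 0.0083 | - | - | 0.0% | 0.200 | 50.1 | 0.909 | - |
| Total sample | Splitting ST5 for Levels of Cognitive Function | 494/4 | -0.847 | 0.722 | -0.352 | 0.392 | 21.920 | 0.3526 | 0.0071 | - | - | 0.0% | 0.236 | 40.3 | 0.920 | - |
| Recommended values | | | - | ≤1.4 | - | ≤1.4 | - | >Cut-off | - | <5.0 b | <5.0 | 0% | 0 | 0 | ≥0.85 f | ≥0.85 f |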

ICV: internal construct validity; N/CI: the ratio between sample size and class intervals; SD: standard deviation; χ2df: unconditional Chi-square for model fit with degree of freedom; PST: the proportion of significant t-tests carried out on the estimates that, within a principal component analysis of residuals, loaded positively and negatively (factor loading >0.30) on the first component; BCI: binomial (95%) confidence interval for proportions of significant t-tests; T-DT: percentage of items with disordered thresholds; T-LD: total local dependency load; T-DIF: total DIF load; PSI: Person Separation Index; α: Cronbach’s alpha. a Bonferroni-corrected P value, which varies by analysis and is used to interpret the corresponding Chi-square P value; b unidimensionality is considered achieved either when PST is <5% or when the lower bound of its BCI is <5%; c the T-DT statistic is calculated as the percentage of items with disordered thresholds out of the total number of items. The values range from zero to 100%, where zero indicates the absence of items with disordered thresholds; d the T-LD summary statistic is calculated by summing all the residual correlation values above the local dependency relative cut-off, which is calculated as the mean of all the residual correlations (excluding the correlation of items with themselves) plus 0.2. The values range from zero to infinity, where zero indicates the absence of local dependency; e the T-DIF summary statistic is calculated as the sum of the absolute values of the base-ten logarithms of all P values for uniform and non-uniform DIF, across all items and all person factors, that are below the Bonferroni-corrected P value. The values range from zero to infinity, where zero indicates no DIF; f a value of ≥0.850 suggests a precision of measurement also at the individual level, whereas a value between 0.700 and 0.849 indicates precision only at the group level. *Values outside the recommended range.
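As an illustration of footnote d, the following minimal Python sketch computes a local dependency relative cut-off and a T-LD value from a residual correlation matrix. It assumes that the cut-off lies 0.2 above the mean off-diagonal residual correlation and that each item pair is counted once; the function name `ld_summary` and the example matrix are hypothetical and are not study data or the software actually used for the analyses.

```python
from itertools import combinations


def ld_summary(residual_corr, offset=0.2):
    """Return (relative cut-off, T-LD) for a square residual correlation matrix.

    Assumption: the cut-off is the mean off-diagonal residual correlation
    plus `offset`, and each item pair contributes once to the sum.
    """
    n = len(residual_corr)
    # Upper triangle only: the diagonal (items with themselves) is excluded.
    pairs = [residual_corr[i][j] for i, j in combinations(range(n), 2)]
    cutoff = sum(pairs) / len(pairs) + offset
    # T-LD: sum of the residual correlations that exceed the cut-off.
    t_ld = sum(r for r in pairs if r > cutoff)
    return cutoff, t_ld


# Hypothetical 4-item residual correlation matrix (illustrative values only).
example = [
    [1.00, 0.45, -0.20, -0.25],
    [0.45, 1.00, -0.15, -0.30],
    [-0.20, -0.15, 1.00, -0.10],
    [-0.25, -0.30, -0.10, 1.00],
]
cutoff, t_ld = ld_summary(example)
print(f"cut-off={cutoff:.3f}, T-LD={t_ld:.3f}")
```

In this toy matrix only the first pair of items exceeds the cut-off, so T-LD equals that single residual correlation; a value of zero would indicate no local dependency.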

Given the lack of fit to the Rasch model and the results of the preliminary unidimensionality analyses, we decided to delete A3 (Mechanical ventilation), A4 (Confusional state), and A5 (Behavioral disturbances). The subsequent RA on the reduced item set showed that the data satisfied the requirements of the stochastic ordering of the items (χ2df=25.214; P=0.0331), but failed to meet the requirements of local independence (6 items had residual correlation values higher than the LDRC, here set at 0.145) and absence of DIF (two items presented uniform and non-uniform DIF). Moreover, B2 misfitted the model. However, the unidimensionality was acceptable (PST=5.3%, lower BCI=1.5%), and no item showed DT (Table V).

Afterward, according to the item content and the local dependency pattern, we created testlets to absorb local dependency and item misfit. Particularly, we created the following five testlets:

  • ST1: Intensive medical monitoring (A1) + Tracheostomy (A2);

  • ST2: Severe communication deficits (A6) + Swallowing disorders (A7) + Feeding (B1);

  • ST3: Transfers (B2) + Mobility (B6) + Stairs (B7);

  • ST4: Grooming (B3) + Toilet use (B4) + Bathing (B5);

  • ST5: Bowels (B9) + Bladder (B10).

This five-testlet solution showed a good fit to the model (χ2df=7.25; P=0.2070) and satisfied the requirements of monotonicity (no items with DT), strict unidimensionality (PST=3.1%, lower BCI=-0.7%), and local independence. There was uniform and non-uniform DIF by etiology for ST1 (A1+A2) (Table V). However, the impact of this DIF on the person estimates appeared to be negligible, as the difference between the paired person estimates of the split versus the unsplit solution yielded a Cohen’s d=0.135. Therefore, DIF was not accounted for. The PSI value (0.902) suggested that the scale had sufficient precision for measurement on single subjects (PSI >0.850). The item hierarchy indicated that ST1 (A1+A2) and ST4 (B3+B4+B5) were, respectively, the easiest and the most difficult testlet (Table VI). The item fit statistics and the scoring model for the final solution are reported in Table VI. The targeting graph of the final solution showed that participants were spread across seventeen logits, with a frank floor effect as 102 (47.4%) subjects presented the lowest score (Figure 2).

Table VI. Items’ parameter, fit statistics, scoring model for the final solution in unique assessment sample (N.=247).
| Item description | Location | SE | FR | χ2 | Prob a | Scoring model |
|---|---|---|---|---|---|---|
| ST01 – A1-A2 | -4.727 | 0.209 | 0.089 | 0.316 | 0.574 | 0-2 |
| ST02 – A6-A7-B1 | -1.334 | 0.152 | -0.378 | 0.839 | 0.360 | 0-4 |
| ST05 – B9-B10 | 1.265 | 0.171 | -0.032 | 0.040 | 0.841 | 0-4 |
| ST03 – B2-B6-B7 | 2.211 | 0.145 | -2.192 | 5.820 | 0.016 | 0-6 |
| ST04 – B3-B4-B5 | 2.585 | 0.124 | -0.651 | 0.173 | 0.678 | 0-6 |

SE: standard error; FR: fit residual; χ2: Chi-square; Prob: χ2 probability. The location is expressed in logits. The degrees of freedom for each χ2 were 1 for all items. a Bonferroni-corrected P value was set at 0.01, indicative of statistical significance at the 0.05 level.
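The Bonferroni-corrected cut-off quoted above can be reproduced with simple arithmetic. The sketch below assumes that the correction divides a nominal 0.05 level by the number of items (or testlets) in each analysis; the function name `bonferroni_cutoff` is hypothetical. Under this assumption, the values agree with the cut-offs listed in Table V for 17 items, 14 items, and 5 testlets; the counts of 6 and 7 for the two split solutions are our inference from the reported cut-offs, not figures stated in the text.

```python
def bonferroni_cutoff(n_tests, alpha=0.05):
    """Nominal significance level divided by the number of tests."""
    return alpha / n_tests


# Item counts: 17 (base), 14 (after deleting A3-A4-A5), 5 (testlets),
# 6 and 7 (assumed item counts after splitting ST4 and then ST5).
cutoffs = {n: round(bonferroni_cutoff(n), 4) for n in (17, 14, 5, 6, 7)}
for n_items, cutoff in cutoffs.items():
    print(n_items, cutoff)
```

A Chi-square probability above the relevant cut-off is read as non-significant, that is, as no evidence of misfit for that item or testlet.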

Figure 2. Targeting (person-thresholds distribution) graphs for unique assessment sample. For each graph, persons (N.=247) and item thresholds are displayed, respectively, in the upper and the lower part of the chart, separated by the logit scale. Grouping set to interval length of 0.20, making 85 groups for base and final analyses. Freq: frequency; No: number; SD: standard deviation.

The base RA on the total sample showed that these data also failed to satisfy the requirements of the stochastic ordering of the items (χ2df=866.934; P≤0.001), local independence (five items had residual correlation values higher than the LDRC, here set at 0.166), and absence of DIF (five items presented uniform and non-uniform DIF). Moreover, eleven items misfitted the model. However, the unidimensionality was strict (PST=2.6%, lower BCI=1.5%), and no item presented DT (Table V). Following this, we replicated on the total sample the same solution achieved for the unique assessment sample. This solution showed that the data satisfied the model requirements of stochastic invariance (χ2df=19.315; P=0.1988), strict unidimensionality (PST=2.4%, lower BCI=0.4%), and monotonicity (no DT was reported). No testlet misfitted the model, although four testlets presented five uniform DIFs and two non-uniform DIFs (Table V).

Finally, we anchored the item difficulty estimates from the unique assessment sample to the final solution of the total sample. Good fit to the model (χ2df=20.615; P=0.1518) and satisfaction of all the other ICV requirements (including unidimensionality) were confirmed, except for the presence of four uniform DIFs and two non-uniform DIFs. Notably, ST4 presented a uniform DIF for time since lesion; the difference between the paired person estimates of the split versus the unsplit solution yielded a Cohen’s d value of 0.906 (large effect size). Consequently, a new Rasch analysis was run to split ST4 for time since lesion (<65 days vs. between 66-123 days and >124 days). The data met the model expectation for stochastic invariance (χ2df=19.117; P=0.3260) and monotonicity, and no testlet misfitted the model. However, three testlets presented uniform and non-uniform DIFs. In particular, ST5 showed a uniform DIF for LCF; the difference between the paired person estimates of the split vs. the unsplit solution yielded a Cohen’s d of 0.832, which, again, was a large effect size. Therefore, a new Rasch analysis was run to split ST5 for LCF. The data satisfied the model requirements of stochastic invariance (χ2df=21.920; P=0.3526) and monotonicity. No testlet misfitted the model, and three testlets presented uniform and non-uniform DIFs. In particular, ST1 presented uniform and non-uniform DIFs for etiology, whereas ST2 presented a uniform DIF for LCF and a DIF for time since lesion. However, the impact of these item biases on the estimates was small or negligible (Cohen’s d of 0.333, 0.125, and 0.216, respectively). Therefore, DIF for both testlets was not accounted for. For this solution, too, the PSI (0.920) was compatible with measurement on single subjects (>0.85).
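The decision rule applied above (split a testlet only when DIF has a non-negligible impact on person estimates) can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not specify which Cohen’s d variant was used, so the sketch takes the mean of the paired differences divided by their standard deviation, and it labels effect sizes with Cohen’s conventional 0.2/0.5/0.8 benchmarks. The function names and the example estimates are hypothetical.

```python
from statistics import mean, stdev


def cohens_d_paired(split_estimates, unsplit_estimates):
    """Cohen's d for paired data: mean difference / SD of the differences.

    Assumption: this is one common paired-samples variant; the study may
    have used a different standardizer.
    """
    diffs = [s - u for s, u in zip(split_estimates, unsplit_estimates)]
    return mean(diffs) / stdev(diffs)


def label_effect(d):
    """Label |d| using Cohen's conventional benchmarks (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Effect sizes reported in the text, labelled with the benchmarks above.
for d in (0.135, 0.906, 0.832, 0.333, 0.125, 0.216):
    print(d, label_effect(d))

# Hypothetical person estimates in logits (not study data).
d_example = cohens_d_paired([1.0, 2.0, 3.0], [0.5, 1.0, 2.5])
```

With these benchmarks, the values that led to splitting ST4 and ST5 are labelled large, whereas the remaining ones fall in the negligible or small range, in line with the choices described above.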

Analysis of simulated data

Given the scoring model of A4 (Confusional state) and A5 (Behavioral disturbance), we hypothesized that their misfit could be due to a mismatch between their current scoring structure and the temporal evolution of the related conditions. Indeed, confusional state and behavioral disturbance (i.e., agitation) are clinically undetectable both in persons with disorders of consciousness and in individuals who have reached a higher level of independence. In other words, for an item to measure these two sub-constructs correctly, its structure should be unfolding and based on a three-level rating scale (0 = confusional state/behavioral disturbance non-detectable because of disorder of consciousness; 1 = clinical condition present; 2 = confusional state/behavioral disturbance resolved). Thus, we hypothesized that fit to a unidimensional model could improve by replicating this unfolding structure at the level of the two items.

To test this hypothesis, we applied the above rating scale to the two items. In particular, the lowest score was assigned to patients with an LCF score ranging from 1 to 3 (i.e., absence of confusional state and behavioral disturbance because of a disorder of consciousness). In contrast, the two other score levels were unchanged. Following this, we performed a CTT-IA, MA, and CFA to test whether these two items’ new unfolding structure could better satisfy the unidimensionality requirement.
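The rescoring rule described above can be expressed as a short function. The sketch below is a hypothetical illustration, not the procedure actually run on the dataset: it assumes that each original dichotomous item is available as a boolean flag (`condition_present`) together with the patient’s LCF level, and the function name `rescore_item` is ours.

```python
def rescore_item(condition_present, lcf_level):
    """Three-level unfolding score for A4 or A5.

    0 = not detectable because of a disorder of consciousness (LCF 1-3)
    1 = clinical condition present
    2 = clinical condition resolved (absent with LCF above 3)
    """
    if lcf_level < 1:
        raise ValueError("LCF levels start at 1")
    if lcf_level <= 3:
        # Lowest score regardless of the original dichotomous rating.
        return 0
    return 1 if condition_present else 2


# Hypothetical patients: (condition present?, LCF level).
patients = [(False, 2), (True, 5), (False, 7)]
print([rescore_item(present, lcf) for present, lcf in patients])
```

Under this rule, the item score increases along the same direction as recovery: a patient with a disorder of consciousness no longer receives the same score as a patient in whom the condition has resolved.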

At the total score level, α (0.972) improved compared with the α of the original data (0.964), as did the average inter-item correlation (0.660 vs. 0.544) (Table VII). At the item level, all items showed an item-to-total correlation coefficient above 0.400 except A3 [Mechanical ventilation]. Moreover, the item-to-total correlation coefficients for mA4 [Confusional state] and mA5 [Behavioral disturbances] improved significantly (>0.730) (Table VII). At the scale level, the MA now showed that all items could be scalable on a single scale, with H coefficients indicating high scalability for all items (H>0.30) (Table VII). The baseline CFA showed that the modified data still misfitted a unidimensional model (RMSEA=0.400), although now the factor loadings for mA4 and mA5 were both >0.980. After accounting for local dependence, the fit to the unidimensional model improved compared with the original scale (RMSEA=0.044 vs. 0.048) (Table VII).
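For readers less familiar with the classical statistics reported above, the following minimal sketch shows how Cronbach’s alpha can be computed from a persons-by-items score matrix using the standard formula α = k/(k-1) × (1 - Σ item variances / variance of the total score). The function name `cronbach_alpha` and the toy scores are hypothetical; the study analyses were run with dedicated software, not with this code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-person item score lists."""
    k = len(rows[0])
    # Sample variance of each item across persons.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the total (summed) score across persons.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores of four persons on three items (not study data).
scores = [
    [0, 0, 0],
    [1, 1, 0],
    [2, 1, 1],
    [2, 2, 2],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 4))
```

An alpha-if-item-deleted value can be obtained by dropping one column and calling the same function again, which is how a row-wise comparison against the full-scale alpha would be built.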

Table VII. Results of classical item analysis, Mokken analysis and confirmatory factor analysis on the modified scale (N.=494).

Summary analysis

| | aIIC | α | Mokken scale | Baseline CFA RMSEA | Baseline CFA SRMR | Baseline CFA CFI | Baseline CFA TLI | Baseline CFA χ2df | Baseline CFA P value | Final CFA RMSEA | Final CFA SRMR | Final CFA CFI | Final CFA TLI | Final CFA χ2df | Final CFA P value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 0.660 | 0.972 | 1 | 0.400* | 0.159* | 1.000 | 1.000 | 9521.2119 | 0.0000* | 0.044 | 0.053 | 1.000 | 1.000 | 165.784 | 0.0000* |
| Recommended values | >0.200 | >0.700 | | ≤0.06 | ≤0.06 | ≥0.950 | ≥0.950 | n.s | ≥0.05 | ≤0.06 | ≤0.06 | ≥0.950 | ≥0.950 | n.s | ≥0.05 |

Item analysis

| Item | ITC | α-iid | H | Baseline CFA factor loading | Baseline CFA SE | Final CFA factor loading | Final CFA SE |
|---|---|---|---|---|---|---|---|
| A1 Intensive medical monitoring | 0.641 | 0.972* | 1.00 | 1.000 | 0.000 | 1.000 | 0.000 |
| A2 Tracheostoma | 0.737 | 0.971 | 0.97 | 0.942 | 0.012 | 0.952 | 0.012 |
| A3 Mechanical ventilation | 0.104* | 0.976* | 0.98 | 0.998 | 0.000 | 0.998 | 0.000 |
| mA4 Confusional state | 0.758 | 0.972* | 0.95 | 0.984 | 0.007 | 0.921 | 0.015 |
| mA5 Behavioral disturbances | 0.738 | 0.972* | 0.95 | 0.992 | 0.007 | 0.921 | 0.016 |
| A6 Severe communication deficits | 0.785 | 0.971 | 0.96 | 0.930 | 0.012 | 0.940 | 0.012 |
| A7 Swallowing disorders | 0.807 | 0.971 | 0.97 | 0.999 | 0.000 | 0.971 | 0.009 |
| B1 Feeding | 0.941 | 0.968 | 1.00 | 0.984 | 0.000 | 0.986 | 0.000 |
| B2 Transfers | 0.952 | 0.968 | 0.98 | 0.990 | 0.000 | 0.988 | 0.000 |
| B3 Grooming | 0.940 | 0.968 | 0.99 | 0.987 | 0.000 | 0.986 | 0.000 |
| B4 Toilet use | 0.899 | 0.969 | 0.99 | 0.987 | 0.000 | 0.986 | 0.000 |
| B5 Bathing | 0.839 | 0.970 | 0.98 | 0.986 | 0.000 | 0.986 | 0.000 |
| B6 Mobility | 0.927 | 0.968 | 0.98 | 0.986 | 0.000 | 0.986 | 0.000 |
| B7 Stairs | 0.873 | 0.969 | 0.98 | 0.988 | 0.000 | 0.986 | 0.000 |
| B8 Dressing | 0.915 | 0.969 | 0.99 | 0.986 | 0.000 | 0.984 | 0.000 |
| B9 Bowels | 0.924 | 0.968 | 0.97 | 0.986 | 0.000 | 0.986 | 0.000 |
| B10 Bladder | 0.905 | 0.969 | 0.97 | 0.986 | 0.000 | 0.986 | 0.000 |
| Recommended values | >0.400 | <0.964 | >0.30 | >0.400 | N/A | >0.400 | N/A |

aIIC: average inter-item correlation; α: Cronbach’s alpha; RMSEA: Root Mean Square Error of Approximation; SRMR: Standardized Root Mean Square Residual; CFI: Comparative Fit Index; TLI: Tucker Lewis Index; χ2df: Chi-square with degree of freedom; ITC: item-to-total correlation; α-iid: alpha if an item was deleted; H: scalability coefficient; SE: standard error; n.s.: not significant. *Statistics values outside the recommended cut-off.

Discussion

To the best of our knowledge, this is the first study to investigate in depth the ICV of the ERBI through classical (i.e., CTT-IA and CFA) and modern (i.e., MA and RA) psychometric techniques. The preliminary assessment of unidimensionality conducted with CTT-IA, MA, and CFA clearly showed that both A4 (Confusional state) and A5 (Behavioral disturbances) contributed less than the other items to the measurement of early functional changes. The RA confirmed these findings and also showed the misfit of A3, A4, and A5 and the violation of stochastic invariance, local independence, and measurement invariance (i.e., presence of DIF), but not of monotonicity and unidimensionality. After the deletion of the three misfitting items and further non-structural modifications (i.e., testlet creation to absorb LD between items and item misfit), the final solution showed stochastic and measurement invariance, local independence, strict unidimensionality, and adequate reliability for measurements at the individual level, albeit with a frank floor effect. This final solution was successfully replicated in the total sample of the subjects. After post-hoc modifications of the scoring structure of two out of three misfitting items, the subsequent CFA and MA showed the resolution of the unidimensionality issues.

Within this study, we consecutively enrolled patients admitted to early rehabilitation across five different Italian rehabilitation centers. Furthermore, the enrolled patients presented different levels of consciousness, a wide range of duration of ICU stay, and various etiologies (i.e., traumatic, vascular, anoxic, and rare ones such as encephalitis). Overall, the characteristics of our sample are similar to those of the sample enrolled in the GISCAR study, an Italian epidemiological study.50 Therefore, our sample could be considered sufficiently representative of the Italian sABI population admitted to early rehabilitation.

Mechanical ventilation (A3) showed a lack of unidimensionality and several fit issues in RA. This could be explained by the fact that few of the patients within the sample (4.0% and 1.6%, on admission and discharge, respectively) were mechanically ventilated. Furthermore, some of these patients may be difficult to wean, especially if they present a brain stem injury. As score variance under this condition was limited, this could explain why this item did not work as expected when summed up with the other items and, hence, was deleted. However, should this scale be employed in acute settings (e.g., intensive care unit), the prevalence of mechanically ventilated patients could be higher, making this item more appropriate for the targeted sample.

Within the RA, we deleted A4 (Confusional state) and A5 (Behavioral disturbances) given the lack of unidimensionality and the RA findings. Indeed, the latter showed highly significant Chi-squares, indicating a serious misfit to the stochastic invariance requirement of the Rasch model. We hypothesized that this could be the consequence of a violation of the monotonicity requirement due to the incorrect dichotomous scoring structure. Indeed, what is expected is that as the level of independence increases (i.e., the ERBI total score increases), the score of each item increases. However, the dichotomous structure allowed the attribution of the higher score for these two items both to patients with a severe disorder of consciousness and to those with a much higher level of independence. We tested this hypothesis by simulating an unfolding structure for these two items, where the absence of the related behaviors was scored as the lowest in the presence of a disorder of consciousness. This post-hoc scoring modification recreated the same quantitative hierarchy of all the other items, significantly improving the unidimensionality shown by MA and CFA. These findings suggest the need for a refinement of the scoring structure of these two items rather than their deletion, as their content coverage is relevant and valuable to assess the level of independence of patients in the early post-acute neurorehabilitation phase.

After deleting three items and creating five testlets, fit to the Rasch model was achieved, even if the targeting graph showed a frank floor effect. Different results were obtained with the Brazilian version of the ERBI,16 for which the authors found no floor effect. This discrepancy could be explained by the different populations and settings of the two studies. Indeed, Silva et al. enrolled patients admitted to the intensive care unit with surgical outcomes or internal and respiratory pathologies. Furthermore, most of them were discharged at the end of their hospitalization in the ICU, probably with higher functionality than that of our patients with sABI. Therefore, we hypothesize that the tendency towards a floor effect of the ERBI for sABI patients admitted to early rehabilitation could be reduced by adding items that are specifically relevant for this population and better able to discriminate at lower levels of ability. For instance, items that explore the level of consciousness, the presence of bedsores, and the presence of other devices (e.g., bladder catheter, nasogastric tube, and devices for the administration of intravenous medications) could help overcome this frank floor effect, as these items would investigate elements of clinical complexity that are more likely to be present on admission to rehabilitation than on discharge. Thus, we propose a revision of the content of the ERBI for patients admitted to early rehabilitation that, together with the revision of the scoring structure of A4 and A5, may lead to a drastic reduction of the floor effect. Of course, this hypothesis about the improvement of the measurement properties of the ERBI should be tested and confirmed within the context of a new prospective multicenter study.

We did not perform a Rasch analysis on the simulated data because the scoring structure of the behavioral disturbance and confusional state items was modified deterministically, using the LCF as guidance. Indeed, the Rasch model expects some degree of randomness in the data, which cannot be achieved with data simulated deterministically.51 On the other hand, the purpose of the post-hoc analyses on simulated data was to test the hypothesis that unidimensionality could improve after modification of these two items. This hypothesis was successfully tested within this study with less restrictive models (i.e., MA and CFA) and will be further tested using Rasch analysis in future prospective longitudinal studies.

We did not report in this study the recommended conversion table from raw scores to interval-level estimates of ability.36 Although we have achieved a valid scale structure according to the Rasch model, the modified ERBI presents a higher floor effect than its original counterpart. Although this is expected considering the deletion of three items, we believe that clinicians and researchers should not be encouraged to use this modified version, as they would be if we provided a conversion table. Indeed, our conclusion is that the design of the scale needs to be improved on the basis of the indications provided by this study and that the validity of the new scale should be tested with a proper, ad-hoc, prospective longitudinal study.

Our results show that, although the ERBI meets the ICV requirements within the Rasch model after structural modifications, it showed a frank floor effect that effectively limits its use in clinical practice and research. Our post-hoc analyses demonstrated that modifying the scoring structure of two of the deleted misfitting items could improve the unidimensionality of the ERBI; moreover, other items with appropriate content for the sample examined could reduce the floor effect. Therefore, the ERBI cannot be used in this form in clinical practice and research to measure functional changes in patients with sABI; consequently, we suggest modifying its structure and verifying its psychometric properties before using it in new scientific studies.

Limitations of the study

This study presents several limitations that deserve to be highlighted. First, although our sample could be considered sufficiently representative of the Italian sABI population, our findings cannot be generalized to patients with different diseases admitted to other settings (e.g., ICU). Second, we performed the rescoring of A4 and A5 based on the LCF score. This post-hoc imputation, based on a deterministic approach, would need confirmation in a further analysis based on newly collected real data. Third, since the data analyzed in this article come from previously published studies, the procedures may not have been the same across the different studies; this issue could introduce a measurement bias, i.e., it may lead to systematic errors or differences in how the data were collected or interpreted. However, the assessors involved in these studies were experienced, and these assessments were part of routine clinical practice. Finally, in this study, we could not exert any control on the ERBI administration procedures, nor could we assess classical inter- and intra-rater reliability, as we performed a secondary analysis built upon data collected from previously published studies.

Conclusions

The results of this study suggest that the ERBI total score cannot be considered a valid measure of the functional changes occurring in patients with sABI admitted to early rehabilitation. However, this study has also suggested the possibility of improving the scoring of some items and of reducing the frank floor effect we found by adding other items whose content might be more appropriate to the construct that the ERBI intends to measure. Future research is needed to refine the instrument and make it a valuable measuring tool for this population.

Supplementary Digital Material 1

Supplementary Table I

Assessment of the measurement quality of an instrument within the Rasch analysis framework.

Footnotes

Conflicts of interest: The authors certify that there is no conflict of interest with any financial organization regarding the material discussed in the manuscript.

Funding: The publication of this article was supported by the “Ricerca Corrente” funding from the Italian Ministry of Health.

References

Data Availability Statement

The raw data associated with the article are publicly available for download from www.zenodo.org (under the Creative Commons Attribution 4.0 International license) at the following link: https://doi.org/10.5281/zenodo.8171871.


Articles from European Journal of Physical and Rehabilitation Medicine are provided here courtesy of Edizioni Minerva Medica S.p.A.
