Author manuscript; available in PMC: 2022 Mar 4.
Published in final edited form as: Eval Health Prof. 2016 Nov 16;41(1):25–43. doi: 10.1177/0163278716676873

Linking Existing Instruments to Develop an Activity of Daily Living Item Bank

Chih-Ying Li 1,2, Sergio Romero 3,4, Heather S Bonilha 1, Kit N Simpson 2, Annie N Simpson 2, Ickpyo Hong 5, Craig A Velozo 6
PMCID: PMC8895526  NIHMSID: NIHMS1780491  PMID: 27856680

Abstract

This study examined the dimensionality and item-level psychometric properties of an item bank measuring activities of daily living (ADL) across inpatient rehabilitation facilities and community living centers. The common person equating method was applied to a retrospective data set of veterans. Dimensionality, model fit, local independence, and monotonicity were examined using factor analyses, fit statistics, and principal component analysis (PCA), and differential item functioning (DIF) was examined using Rasch analysis. Following the elimination of invalid data, 371 veterans who completed both the Functional Independence Measure (FIM) and minimum data set (MDS) within 6 days were retained. The FIM-MDS item bank demonstrated good internal consistency (Cronbach’s α = .98) and met three rating scale diagnostic criteria and three of the four model fit statistics (comparative fit index/Tucker–Lewis index = 0.98, root mean square error of approximation = 0.14, and standardized root mean residual = 0.07). PCA of Rasch residuals showed that the item bank explained 94.2% of the variance. The item bank covered a range of θ from −1.50 to 1.26 for items and from −3.57 to 4.21 for persons, with person strata of 6.3. The findings indicated that the ADL physical function item bank constructed from the FIM and MDS measured a single latent trait with overall acceptable item-level psychometric properties, suggesting that it is an appropriate source for developing efficient test forms such as short forms and computerized adaptive tests.

Keywords: continuity of patient care, activities of daily living, veterans, outcome assessment (health care), psychometrics


Depending on how their disease progresses, patients need health-care services in a variety of postacute care (PAC) settings to meet their evolving needs. The term “trajectory of care” has been coined to describe the care that a patient receives during the recovery process. “A trajectory of care” is synonymous with the term “episode of care,” used in section 5008 of the Deficit Reduction Act in 2005, meaning “the care a patient receives in order to treat a spell of illness associated with a hospitalization. A trajectory may include one or more settings,” whereas “a spell of illness” covers “all readmission and skilled nursing facility service use” based on Medicare’s definition (Centers for Medicare & Medicaid Services, 2012).

A trajectory of PAC is provided across varied facilities, such as inpatient rehabilitation facilities (IRFs), skilled nursing facilities (SNFs; known as community living centers [CLCs] in the veterans health-care system), home health agencies, long-term acute care hospitals, and outpatient therapy services. Based on a 5% national sample of 2006 Medicare claims data, over a third (35.2%, n = 109,236) of all beneficiaries discharged from acute care facilities transitioned to at least one type of PAC facility (Research Triangle Institute [RTI] International, 2013). In addition, 52% of this group of beneficiaries went on to use at least one additional PAC service after the first PAC site (RTI International, 2013). In 2007, the Medicare Payment Advisory Commission (MedPAC) reported that Medicare spent over US$45 billion on PAC (RTI International, 2013). Given its high utilization rate and cost, PAC plays an important role for patients, health-care practitioners, and health-care policy makers.

One major challenge resulting from the continuum of PAC is assessing and monitoring the function of patients as they transfer across different facilities. To enhance the quality of care and outcomes of veterans who receive PAC, the vital first step is to measure patients’ functional progress continually. Measuring patients’ recovery progression continually allows practitioners to avoid duplicate assessments and design realistic and continuous treatment plans. Different facilities currently use different but conceptually similar functional instruments to assess patients. For instance, the required PAC site-specific patient assessment tools include the IRF Patient Assessment Instrument (i.e., the Functional Independence Measure [FIM™] with additional demographic data such as age and gender) for the IRFs and the minimum data set (MDS) for the SNFs/CLCs. The use of different instruments across the PAC continuum results in two major issues: (1) an individual patient’s functional scores from different instruments are not comparable or not easily translated from one facility to the next and (2) since scores are not exchangeable, it is difficult to track quality indicators and patient outcomes across settings (MedPAC, 2013).

There are two potential solutions to the above-mentioned challenges. One solution is to use a single measurement system for all PAC venues. A major advantage of using a single instrument is that scores across different facilities would be comparable. However, the development and implementation of a new, single measurement system requires enormous effort and cost for instrument development (e.g., examining interrater reliability), testing (e.g., testing new payment models), and implementation (e.g., modification of electronic medical records or payment systems). Data collection software systems will require complete change or replacement, and instrument implementation will require extensive personnel training for assessment administration, leading to additional burden for the practitioners. Increased measurement error is likely at the beginning of implementing the new instrument. A major disadvantage of using a new item set is that administrators and researchers will not be able to use historical data from previous years as a comparison or as part of the study data set unless some kind of linking methodology is used.

An alternative solution, which has none of the aforementioned limitations, is to use modern test theory, such as item response theory (IRT)/latent trait models, to link existing instruments and translate scores from different instruments across the PAC continuum. IRT methods provide solutions for the challenges associated with using different test instruments and allow measures to be translated across instruments. This methodological advantage rests on the IRT-based assumption, the latent trait model, that a single construct (i.e., activities of daily living [ADL]) can be cocalibrated across different instruments, and thus, the estimated scores of a respondent can be used to predict or explain test performance based on the latent traits of a person (Hambleton, Swaminathan, Cook, Eignor, & Gifford, 1978). The IRT method is hypothesized to be able to establish an item bank that measures the same latent trait.

The study findings are specifically applicable to the veterans health-care system. Since 2003, the FIM has been used as an IRF performance measure and supportive quality indicator (Department of Veterans Affairs [VA], 2011). Similarly, the MDS is used as a quality indicator to monitor patients’ performance and improvement in CLCs (Department of VA, 2014). Maintaining existing assessments is important to sustain current Veterans Administration outcome reporting systems that have been well established for decades. In addition, continued use of the FIM and MDS will maintain links to extensive and elaborate historical health data in the VA, since the VA system maintains some of the largest and most complete health-care databases in the world.

An initial demonstration of the latent trait model (which supports using existing instruments to measure an equivalent construct across the PAC continuum) is to determine whether the items on different instruments can be linked. Several studies have made such efforts to validate this assumption and found promising results in linking different instruments measuring the same latent trait (Dorans, Pommerich, & Holland, 2010; Haley et al., 2011; Kolen & Brennan, 2004; Velozo, Byers, Wang, & Joseph, 2007; Wang, Byers, & Velozo, 2008a). This study aimed to use IRT-based linking procedures to establish an FIM-MDS item bank and to validate whether the FIM-MDS item bank measures the same latent trait. By examining the item-level psychometrics of the FIM-MDS item bank based on IRT methodologies, the hypothesis was that the FIM-MDS item bank would show acceptable IRT item-level psychometrics, with an assumption that both instruments measure the same latent trait, ADL physical function.

Method

Ethical Disclosure

This study was approved by the Health Center Institutional Review Board at the University of Florida (IRB Project # 440–2011) and the Veterans Health Service Human Research Protection Program at the North Florida/South Georgia Veterans Health System. This study did not require obtaining informed consent from the participants because it was a secondary data analysis. Because the analysis used deidentified data, this study was approved as “Not Human Subjects Research” by the Office of Research Integrity at the Medical University of South Carolina.

Participants

Data for the study were extracted from the existing databases maintained by the Veterans Austin Information Technology Center (AITC). The FIM and the MDS data resided in two separate databases at the AITC. FIM data were contained in the Function Status and Outcomes Dataset (FSOD), and the MDS data were maintained in the data set for the Office of the Assistant Deputy Under Secretary for Health at the Patient Care Services. These two data sets were merged by patient identifiers and later deidentified at the University of Florida. The subsequent deidentified data analysis was performed at the Medical University of South Carolina.

The data were limited to veterans who had one of four medical conditions: (1) new stroke, (2) lower extremity amputation, (3) knee replacement, and (4) hip replacement. The four distinct diagnoses were chosen in order to minimize the possibility that the same individual would be classified into more than one functional-related group in the following validation study. For inclusion in the study, the two assessments had to be administered within 6 days of each other during the period of October 2008 to September 2010. This study only included veterans who completed both instruments (FIM and MDS) without any missing items. The four impairment groups were chosen because (a) they are common diagnoses for postacute rehabilitation care, (b) these four diagnoses have distinctive illness characteristics (thus the same patient rarely has more than one of them at the same time), and (c) they allow for comparisons to a previous study (Wang et al., 2008a).

Statistical Analysis

SAS Version 9.4 was used to merge and match data and to perform descriptive/inferential analyses. Mplus Version 7.1 was used for factor analysis and the residual correlation matrix. Winsteps Version 3.57.2 was used for Rasch analysis, including fit statistics, rating scale diagnoses (e.g., monotonicity), person strata, and principal component analysis (PCA). The Winsteps Rasch–Welch (logistic regression) t-test was used to examine items for differential item functioning (DIF).

Linking Procedures

The Rasch analysis common person equating method was used in this study. Common person equating requires the same person to respond to both instruments. The cocalibration approach used in this study followed three steps: (a) using a preidentified set of 26 items from the FIM and MDS measuring an equivalent construct of ADL, (b) removing invalid responses, and (c) anchoring MDS and FIM person measures based on the cocalibrated FIM-MDS item difficulties and item step thresholds (i.e., the item calibration between a rating of 1 and 2, 2 and 3, 3 and 4, etc.).
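To make the role of item difficulties and step thresholds concrete, the following is a minimal sketch of how category probabilities could be computed for one polytomous item. It assumes an Andrich rating scale formulation, which the text does not explicitly confirm; the function names and the example threshold values are hypothetical and are not taken from the study's calibration.

```python
import math


def category_probabilities(theta, item_difficulty, step_thresholds):
    """Return the probability of each rating category for one item.

    Assumes a Rasch rating scale model (an assumption, not stated in the
    source). All arguments are in logits. step_thresholds[k] is the
    threshold between category k and category k + 1, so a 7-category
    scale needs 6 thresholds.
    """
    # Cumulative sums of (theta - difficulty - threshold); the lowest
    # category has a cumulative sum of 0 by convention.
    cumulative = [0.0]
    for tau in step_thresholds:
        cumulative.append(cumulative[-1] + (theta - item_difficulty - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probabilities, lowest_category=1):
    """Expected rating, with categories numbered from lowest_category."""
    return sum((lowest_category + k) * p for k, p in enumerate(probabilities))


# Hypothetical thresholds for a 3-category item, for illustration only.
example = category_probabilities(theta=0.0, item_difficulty=0.0,
                                 step_thresholds=[-1.0, 1.0])
```

Under this formulation, anchoring means that the item difficulties and step thresholds are held fixed at their cocalibrated values while person measures are estimated separately for each instrument.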

A stratified random sample of 500 veterans was drawn from a cohort of 3,000 veterans across the four above-mentioned impairment groups (stroke, lower extremity amputation, knee replacement, and hip replacement). The person measures (i.e., the unified person ability scores) for the FIM and MDS were generated by anchoring separate analyses on item measures (i.e., the unified item difficulty scores) and step measures (i.e., the unified item rating scale difficulty scores) from a cocalibration of the 500 veterans. In this study, “measure” refers to the logit transformation from Rasch analysis that places person ability and item difficulty on the same unit (logit) so that they can be compared. To ensure that each patient responded consistently to both instruments before developing an item bank, veterans with FIM-MDS person measures that fell outside the 95% confidence interval of the identity line were excluded, leaving a sample of 371 (74.2%) veterans in the final analytical data set. This remaining sample size (n = 371) was above the acceptable number of 300 suggested by previous literature when using IRT methodologies to link health outcome measures (Cook, Taylor, Dodd, Teal, & McHorney, 2007; Fisher, 1997; Fischer, Wahl, Fliege, Klapp, & Rose, 2012).
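The screening step can be illustrated with a short sketch. The text does not give the exact formula for the 95% band, so the version below assumes a common convention in which two measures agree if their difference is within 1.96 joint standard errors; the record fields and the example values are hypothetical.

```python
import math


def within_identity_band(measure_a, se_a, measure_b, se_b, z=1.96):
    """True if two person measures agree within a z * joint-SE band.

    Assumption: the band around the identity line is defined by the joint
    standard error sqrt(se_a^2 + se_b^2); the source does not state this.
    """
    joint_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(measure_a - measure_b) <= z * joint_se


def screen_common_persons(records, z=1.96):
    """Keep only persons whose FIM and MDS measures fall inside the band."""
    return [
        r for r in records
        if within_identity_band(r["fim"], r["fim_se"], r["mds"], r["mds_se"], z)
    ]


# Hypothetical persons (measures and standard errors in logits).
records = [
    {"id": "A", "fim": 0.5, "fim_se": 0.3, "mds": 0.7, "mds_se": 0.3},
    {"id": "B", "fim": 2.0, "fim_se": 0.4, "mds": -1.0, "mds_se": 0.4},
]
kept = screen_common_persons(records)  # person B is screened out

# Retention rate reported in the text: 371 of 500 persons.
retention_pct = round(371 / 500 * 100, 1)
```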

Item Bank Testing Based on Rasch Model Requirements

The FIM-MDS item bank of 371 veterans was examined to determine whether it fulfilled the Rasch model assumptions, including unidimensionality, local independence, and monotonicity. Unidimensionality means a scale measures only one construct (Tennant & Pallant, 2006). Local independence means the response to any item is unrelated to the response to any other item (Bond & Fox, 2007). Monotonicity means the probability of endorsing a rating scale response indicative of better function should increase as person ability increases (Bond & Fox, 2007). DIF items were also identified in order to investigate whether the probability of endorsing an item differed on the basis of age and diagnosis. The MDS rating scale was converted from its original form (i.e., 012348) to match the rating scale of the FIM (i.e., 1234567). Converting the rating scale enabled the scores from both instruments to represent the patient’s ability in the same direction. We also ensured that the rating scales were converted equivalently based on their conceptual meanings.

Confirmatory factor analysis (CFA) and Rasch fit statistics were used to determine whether the FIM-MDS item bank was “essentially” unidimensional (Bond & Fox, 2007). Rasch fit statistics index the difference between the scores estimated by the Rasch model and the observed scores (Linacre, 2004). Mean square standardized residuals (MnSqs), representing observed variance divided by expected variance, were used to assess how well each item conformed to the unidimensional model. For clinical scales, a reasonable range of MnSq fit values was suggested as 0.5 to 1.7, along with associated standardized fit statistics within ±2.0 (Wright & Linacre, 1994). The CFA used a polychoric correlation matrix with a weighted least squares estimator and was evaluated against four model fit indices: the comparative fit index (CFI > 0.95), Tucker–Lewis index (TLI > 0.95), root mean square error of approximation (RMSEA < 0.06), and standardized root mean residual (SRMR < 0.08; Hu & Bentler, 1996). The factor loadings and average absolute residual correlations were also used to confirm the factor structure (Hu & Bentler, 1996).
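As an illustration of the mean square fit statistics described above, the sketch below computes infit and outfit for a single item from observed responses, model-expected scores, and model variances. It uses the standard textbook definitions rather than the Winsteps implementation; the function names and toy inputs are hypothetical.

```python
def mean_square_fit(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item.

    observed: observed ratings, one per person
    expected: model-expected ratings for the same persons
    variance: model variance of each rating
    Outfit is the unweighted mean of squared standardized residuals, so it
    is sensitive to outliers; infit weights residuals by the model variance.
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    outfit = sum(r / w for r, w in zip(squared_residuals, variance)) / len(observed)
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit


def is_acceptable_mnsq(mnsq, low=0.5, high=1.7):
    """Apply the 0.5 to 1.7 range cited in the text for clinical scales."""
    return low <= mnsq <= high


# Toy dichotomous example: residuals exactly match the model variance.
infit, outfit = mean_square_fit(
    observed=[1, 0, 1, 1],
    expected=[0.5, 0.5, 0.5, 0.5],
    variance=[0.25, 0.25, 0.25, 0.25],
)
```

Values near 1.0 indicate that the observed variation matches what the model expects; values above the upper bound indicate erratic responses, and values below the lower bound indicate overly predictable responses.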

The Rasch residual PCA was used to assess whether meaningful structure remained in the residuals after extracting the primary Rasch dimension. The first contrast in the Rasch residual PCA represents the first PCA component in the correlation matrix of the residuals after extracting the Rasch dimension (Linacre, 2004, 2010, 2012; Patient-Reported Outcome Measurement Information System [PROMIS®], 2014). Unidimensionality of an instrument is supported when the Rasch dimension explains more than 40% of the variance in the data and the first contrast of the Rasch residuals explains less than 5% of the variance (PROMIS®, 2014). Local independence was assessed using the residual correlation matrix produced by the factor analyses in Mplus. Items with residual correlations beyond ±.2 were identified as violating local independence (PROMIS®, 2014; Reeve et al., 2007).
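The two decision rules in this paragraph can be sketched as follows. This is an illustrative helper, not the Winsteps or Mplus procedure; the function names are hypothetical. In the example, 0.272 and -0.242 are the residual correlations the article reports for the MDS walk in corridor item, and 0.05 is a made-up filler value.

```python
def supports_unidimensionality(rasch_variance_pct, first_contrast_pct):
    """Rasch dimension above 40% and first contrast below 5% of variance."""
    return rasch_variance_pct > 40.0 and first_contrast_pct < 5.0


def flag_locally_dependent_pairs(item_names, residual_corr, cutoff=0.2):
    """List item pairs whose residual correlation lies beyond +/- cutoff.

    residual_corr is a symmetric matrix (list of lists) aligned with
    item_names; only the upper triangle is scanned.
    """
    flagged = []
    for i in range(len(item_names)):
        for j in range(i + 1, len(item_names)):
            r = residual_corr[i][j]
            if abs(r) > cutoff:
                flagged.append((item_names[i], item_names[j], r))
    return flagged


items = ["MDS walk in corridor", "MDS walk in room", "MDS eating"]
residual_corr = [
    [1.0, 0.272, -0.242],
    [0.272, 1.0, 0.05],   # 0.05 is hypothetical
    [-0.242, 0.05, 1.0],
]
flagged = flag_locally_dependent_pairs(items, residual_corr)
```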

The rating scale structure was evaluated based on three criteria: (1) at least 10 responses in each rating category, (2) a monotonic pattern of category logit measures, and (3) an outfit mean square value below 2.0 for each rating category (Linacre, 1999, 2002). The outfit mean square is an outlier-sensitive fit statistic. Monotonicity was examined by checking whether the probability of endorsing a higher rating scale response increased as person ability increased. If the predicted order is reversed, the item “violates” monotonicity. The Rasch–Welch (logistic regression) t-test was used to detect DIF items by examining group differences across age (equal to or under 65 vs. over 65 years) and diagnosis (stroke vs. orthopedic impairments). An item was identified as showing slight-to-moderate DIF if the DIF contrast was ≥0.43 logits at a significance level of p < .05 and as showing moderate-to-large DIF if the DIF contrast was ≥0.64 logits at a significance level of p < .05 (Zwick, Thayer, & Lewis, 1999).
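The DIF classification rule can be written as a small function. This sketch assumes that the significance requirement means a p value below .05 and that the size of the contrast is judged by its absolute value; the function name, the "negligible" label, and the example p values are hypothetical.

```python
def classify_dif(contrast_logits, p_value, alpha=0.05, slight=0.43, large=0.64):
    """Classify an item's DIF from its contrast (logits) and p value.

    Thresholds follow the 0.43 and 0.64 logit cutoffs cited in the text.
    Items that are not significant or fall below 0.43 logits are labeled
    "negligible" (a label introduced here for illustration).
    """
    size = abs(contrast_logits)
    if p_value >= alpha or size < slight:
        return "negligible"
    if size >= large:
        return "moderate-to-large"
    return "slight-to-moderate"


# Contrast sizes of the kind discussed in the article, with made-up p values.
label_a = classify_dif(0.56, p_value=0.01)
label_b = classify_dif(0.77, p_value=0.01)
label_c = classify_dif(0.77, p_value=0.20)
```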

All psychometric analyses were conducted using the 371 veterans. Items in the item bank that did not fit the unidimensional model, had residual correlations beyond ±.2, or had significant DIF values or poor discrimination were reviewed by the research team to determine whether they should be removed. Clinical relevance was also considered in making final item elimination decisions. The final item bank, which met the essential requirement of unidimensionality, was used for Rasch analysis to generate point-measure correlations, person strata, and an item–person map. The point-measure correlation is the Pearson correlation coefficient between the item observations and the corresponding Rasch measures (estimated including the current response; Linacre, 1998). An absolute value larger than 0.3 was considered acceptable. The person separation index was used to calculate the number of levels of person ability (person strata) distinguished by the item difficulties, calculated as (4Gp + 1)/3, where Gp is person separation (Wright & Masters, 1982). An item–person map was used to determine ceiling/floor effects. More than 5% of the sample at the ceiling or floor was considered a significant ceiling/floor effect in this study (Velozo, Choi, Zylstra, & Santopoalo, 2006).
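The person strata formula and the ceiling/floor rule are simple enough to check by hand. The sketch below applies them to the person separation index (4.51) and the ceiling/floor counts (0 of 371) reported in the article; the function names are hypothetical.

```python
def person_strata(person_separation):
    """Number of statistically distinct ability levels: (4 * Gp + 1) / 3."""
    return (4 * person_separation + 1) / 3


def has_extreme_effect(n_at_extreme, n_total, cutoff=0.05):
    """True if more than 5% of the sample sits at the floor or ceiling."""
    return n_at_extreme / n_total > cutoff


strata = person_strata(4.51)          # (4 * 4.51 + 1) / 3 = 19.04 / 3
ceiling = has_extreme_effect(0, 371)  # no persons at the ceiling
```

Rounded to one decimal place, the strata value is 6.3, matching the figure the article reports.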

Results

Descriptive Statistics

Participants had a mean age of 67.0 years (SD = 11.0), with a range from 22 to 90 years. Six (1.6%) veterans who were 90 years or older were grouped together and recorded as 90 years old. The majority of the participants in this study were male (n = 354, 95.4%), White (n = 233, 62.8%), and married (n = 161, 43.4%; Table 1). The average number of days since onset of the given diagnosis, as recorded by a physician, was 173.4 ± 1,331.3 days, about 6 months. The mean time between the administrations of the FIM and the MDS was 3.1 days (SD = 2.1), with a range from 0 to 6 days. There were 164 (44.2%) veterans with stroke, 77 (20.8%) with lower extremity amputation, 74 (19.9%) with knee replacement, and 56 (15.1%) with hip replacement (Table 1).

Table 1.

Demographic Characteristics of Participants in this Study.

Community-Dwelling Veterans (n = 371)

Variables                                                     Number         %

Age (range: 22–90 y/o)                                        Mean = 67.0    SD = 11.0
Average number of days since onset                            Mean = 173.4   SD = 1,331.3
Age-group
  ≤65 y/o                                                     203            54.7
  >65 y/o                                                     168            45.3
Gender
  Male                                                        354            95.4
  Female                                                      14             3.8
  Missing                                                     3              0.8
Ethnicity
  White                                                       233            62.8
  Black                                                       83             22.4
  Native American                                             4              1.1
  Hispanic                                                    19             5.1
  Other                                                       19             5.1
  Missing                                                     13             3.5
Diagnoses
  Stroke                                                      164            44.2
  Lower extremity amputee                                     77             20.8
  Knee replacement                                            74             19.9
  Hip replacement                                             56             15.1
Marital status
  Single                                                      37             10.3
  Married                                                     161            43.4
  Widowed                                                     26             7.0
  Separated                                                   18             4.9
  Divorced                                                    118            31.8
  Missing                                                     11             3.0
Days between administrations of FIM and MDS (range = 0–6)     Mean = 3.1     SD = 2.1
FIM raw score                                                 Mean = 63.5    SD = 22.8
FIM anchored measure score                                    Mean = 0.36    SD = 1.5
MDS raw score                                                 Mean = 30.0    SD = 25.8
MDS anchored measure score                                    Mean = 0.55    SD = 1.3

Note. n = 371. y/o = years old; FIM = Functional Independence Measure; MDS = minimum data set.

Factor Structure of the FIM-MDS Item Bank

The findings from the CFA and the PCA of Rasch residuals supported our hypothesis that the FIM-MDS item bank has a one-factor structure measuring a single latent trait of ADL. The FIM-MDS item bank met three of the four model fit criteria (CFI and TLI = 0.98, above 0.95; SRMR = 0.07, below 0.08), whereas RMSEA = 0.14 exceeded the 0.06 cutoff; overall, this pattern indicated that the item bank measured one factor (Table 2). The PCA showed that the Rasch dimension (person and item measures) explained 94.2% of the variance of the scale, far above 40%, and the first contrast of the Rasch residuals explained 0.8% of the variance, far below the 5% criterion. The person reliability (similar to Cronbach’s α) of the 26-item FIM-MDS item bank was 0.98. The raw scores of the FIM and the MDS correlated at −.93, and the measure scores (i.e., standardized scores) correlated at .85. After adjusting for rating scale direction, the raw scores and the anchored measure scores after linking correlated at .93 and .85, respectively.
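The count of fit criteria met can be verified with a short check. The cutoffs and reported values below come from the article; the dictionary layout and function name are hypothetical.

```python
# Cutoffs cited in the Method section (Hu & Bentler criteria).
FIT_CRITERIA = {
    "CFI": ("greater", 0.95),
    "TLI": ("greater", 0.95),
    "RMSEA": ("less", 0.06),
    "SRMR": ("less", 0.08),
}


def evaluate_fit(indices):
    """Return a dict mapping each fit index to True if it meets its cutoff."""
    results = {}
    for name, value in indices.items():
        direction, cutoff = FIT_CRITERIA[name]
        results[name] = value > cutoff if direction == "greater" else value < cutoff
    return results


# Values reported for the FIM-MDS item bank.
reported = {"CFI": 0.98, "TLI": 0.98, "RMSEA": 0.14, "SRMR": 0.07}
outcome = evaluate_fit(reported)
n_met = sum(outcome.values())  # RMSEA is the only index that misses its cutoff
```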

Table 2.

Item-Level Psychometric Properties of the FIM-MDS Item Bank.

FIM-MDS Item Bank

Person reliability (Cronbach’s α) 0.98
Person separation index 4.51
Person strata 6.3
Person ability M = 0.49, SD = 0.20
Minimum = −3.57, Maximum = 4.21 (range = 7.78)
Item difficulty M = 0, SD = 0.05
Minimum = −1.50, Maximum = 1.26 (range = 2.76)
Misfitting items (both high and low fit) 42.3% (11/26 items)
Floor effect 0% (0/371 persons)
Ceiling effect 0% (0/371 persons)

Note. n = 371. FIM = Functional Independence Measure; MDS = minimum data set.

Item-Level Psychometrics of the FIM-MDS Item Bank

All test items met the three rating scale criteria and showed local independence, except for 1 item (MDS walk in corridor), which had residual correlations beyond ±.2 with 2 items: MDS walk in room (.272) and MDS eating (−.242; Table 2). All items had point-measure correlations larger than .3 (range from .56 to .90), indicating that all items measured the same construct.

A total of 15 items (57.7%) from the item bank showed fit statistics between 0.5 and 1.7. Misfitting items included 5 items with high infit values and 6 items with low infit values. Items with high fit values are erratic, that is, they do not fit the expected pattern of the Rasch model, whereas items with low fit values are Guttman-like items (they fit the model too well). For practical reasons, the items with high fit values were of greater concern; these were MDS bladder continence, bowel continence, locomotion off unit, walk in corridor, and walk in room. The items with low fit values were FIM dressing upper body, dressing lower body, shower/bathing self, toileting hygiene, toilet (transfer), and bed to chair/wheelchair (transfer). The items with high fit values were all MDS items, and the items with low fit values were all FIM items. Overall, the average person ability (mean = 0.49, SD = 0.20) was higher than the average item difficulty of the item bank (mean = 0.0, SD = 0.05). The distribution of person measures was skewed toward higher ability (Figure 1).

Figure 1.

Item–person map of the Functional Independence Measure and minimum data set item bank.

The range of item difficulty of the item bank is 2.76 logits (min = −1.50, max = 1.26), while the range of person ability is 7.78 logits (min = −3.57, max = 4.21). Overall, the MDS items were slightly more difficult (0.55 ± 1.3) than the FIM items (0.36 ± 1.5). The MDS items covered a wider range of item difficulty (range = 2.76 logits) and had the easiest and the most difficult items in the item bank compared to the FIM items (range = 1.98 logits; Figure 1). The person separation index was 4.51 and person strata was 6.3, indicating the FIM-MDS item bank can distinguish the respondents’ ability into six hierarchical levels.

DIF

Across age-groups (below/equal vs. above 65 years old), 1 item, MDS bowel continence, was more difficult for the above 65 years old group, showing a DIF contrast of 0.56 (p < .05), indicating slight-to-moderate DIF. Across diagnostic groups (stroke vs. orthopedic), 1 item, FIM eating, was more difficult for the stroke group, showing a DIF contrast of 0.77 (p < .05), indicating moderate-to-large DIF; 3 items, MDS eating, MDS walk in room, and MDS walk in corridor, showed slight-to-moderate DIF contrasts (within the range of 0.43–0.64; all p < .05). All DIF items for the diagnostic groups were more difficult for the veterans with stroke than for those with orthopedic impairments (i.e., amputation, knee replacement, and hip replacement). To determine the influence of DIF on person measures, the researchers compared person measures generated with and without the FIM eating item (the item that showed moderate-to-large DIF). The person measures with and without the DIF item correlated at .999, indicating minimal influence of DIF on person measures.

Discussion

This study was the first step for developing a PAC continuum measurement by cocalibrating two existing ADL instruments currently used across PAC settings and establishing a psychometrically sound ADL item bank. The FIM-MDS item bank demonstrated overall good item-level psychometric properties, including good internal consistency, good person strata, and good point-measure correlation. In addition, the item bank demonstrated overall good model fit and acceptable fit statistics for 21 of the 26 items, indicating that both instruments measure the same construct (ADL physical function). The unidimensionality of the FIM-MDS item bank was also supported by the high correlations of both the raw scores and the measure scores. One item, FIM eating, had moderate-to-large DIF, and 1 item, MDS walk in corridor, had high residual correlations. However, both items were kept in the final item bank in order to cover a full spectrum of item difficulty levels, since these 2 items were the easiest and the most difficult items, respectively. In addition, removing the DIF item did not have a significant impact on the person measures. Furthermore, the CFA results supported a one-factor model of all 26 items. Thus, all 26 items were retained in the final FIM-MDS item bank. Future studies in this line of research can minimize the concerns of item redundancy by excluding multiple items with high correlations or by flagging only one of the highly correlated items during the development of short forms (SFs) from the item bank.

This study demonstrated psychometric properties of the FIM-MDS item bank for veterans with disabilities similar to those of a previous study (Velozo et al., 2007), but with a larger sample size and a more restrictive number of days between FIM and MDS administrations, indicating more reliable results. The FIM-MDS item bank in this study demonstrated better internal consistency (0.98 vs. 0.94), better point-measure correlations (0.56–0.90 vs. 0.54–0.84), similar raw score and person measure correlations (−0.93, 0.85 vs. −0.81, 0.72), but more misfit items (11 vs. 5 misfit items). The higher percentage of misfit items may be due to veterans in this study having an overall higher ability. Consistent with the previous study, four MDS items misfit: MDS bladder control, MDS locomotion off unit, MDS walk in corridor, and MDS walk in room. This finding was also consistent with several studies suggesting that incontinence and ambulation items may represent constructs separate from ADL (Nilsson, Sunnerhagen, & Grimby, 2005; Velozo et al., 2007; Velozo, Magalhaes, Pan, & Leiter, 1995). Our current study used CFA, PCA, and residual correlations to determine the factor structure of the item bank, whereas the previous study used only Rasch analysis to determine unidimensionality of the scale (Velozo et al., 2007). In summary, the results of this study supported the conclusion that the ADL physical function items of the FIM and MDS measured the same construct with acceptable to good item-level psychometric properties.

This study showed an ADL item difficulty hierarchy consistent with previous findings (Linacre, Heinemann, Wright, Granger, & Hamilton, 1994; Velozo et al., 1995, 2007; Wang et al., 2008a; Wang, Byers, & Velozo, 2008b). Overall, eating was the easiest item and walking was the most difficult item, supporting the concept of creating a global functional measure for physical ADL tasks. This global physical ADL item difficulty hierarchy could be used across diagnostic groups and across different populations, such as the general population and veterans (Linacre et al., 1994; Velozo et al., 1995, 2007; Wang et al., 2008a, 2008b).

The current study focused on cocalibrating the FIM and MDS items and developing a psychometrically sound item bank rather than developing a raw score conversion table between instruments, because the goal of the current and follow-up studies is to generate a feasible linking measurement in efficient administration formats, such as SFs and computerized adaptive tests (CATs), to decrease assessment burden for practitioners across PAC settings. Establishing a well-developed item bank is the first step toward developing easier assessment forms. Thus, the positive findings of this study are a crucial first step toward developing a unified instrument from existing instruments across the PAC continuum. Because the data were collected for clinical and administrative reporting purposes, the results of this study have clear implications for future clinical applications.

Both the FIM and MDS play important roles in determining functional outcome and rehabilitation (supportive) treatments within the VA (Department of VA, 2011, 2014). It is VA policy that all VA medical centers with inpatient or CLC beds must use the FSOD (which contains the FIM) to track rehabilitation outcomes for all acute stroke, lower extremity amputation, and traumatic brain injury patients admitted to VA inpatient units (Department of VA, 2011). The FSOD is used by the Comprehensive Interdisciplinary Inpatient Rehabilitation Program to monitor ongoing process and quality improvements in the veterans health-care system. FIM scores and FIM-derived functional classifications are used to determine rehabilitation treatment intensity and monitor functional outcomes. Similar to the FIM, and as suggested by the Commission on Accreditation of Rehabilitation Facilities (CARF), the MDS is the primary quality indicator for CLC services in the VA. The goal of CLC supportive rehabilitation is to optimize functional ability and to prevent or delay further functional loss. The MDS is used to monitor and measure performance and improvement for veterans receiving supportive rehabilitation. Developing an item bank that connects the FIM and MDS provides considerable benefits to the VA. Our study preserves the integrity of the VA’s efforts to measure quality improvement in the IRFs and CLCs while creating a mechanism to measure veterans’ function across the continuum of care.

In addition, the results of the current study can be used to develop SFs and CATs from a cocalibrated FIM-MDS item bank, facilitating feasible and practical assessments across settings without the need to develop a new single instrument for the PAC continuum. Also, the ability to use items from the FIM and MDS to monitor the progress of patients may provide an important first step toward measuring patients’ progress across the continuum of care. The ability to measure functional progress across settings is essential for understanding how best to optimize care to improve patient outcomes and conserve financial resources. Thus, a valid item bank may serve as a crucial basis for improving researchers’ ability to assess comparative effectiveness and relevant health-care costs for populations who receive a mix of services, with potential benefit to patients, practitioners, and payers.

Study Limitations

There are several limitations of this study. First, to reduce the influence of functional changes, this study included only veterans who had completed both the FIM and the MDS within 6 days. However, patients’ function could still have changed over this short period, which may have introduced noise into the data. Second, the data were limited to veterans, and most individuals in this study were male and tended to be older than the general population. Thus, the results might have limited external generalizability. However, the psychometric results of the item bank were not expected to differ much, since only slight differences associated with patients’ demographic characteristics were observed. Nevertheless, we recognize that selection bias might exist because we included only those who completed both the FIM and MDS. This selection may also have resulted in a more homogeneous group that may not represent the larger population. Furthermore, this study used retrospective data that were not prospectively designed and collected for this research purpose. Thus, existing limitations such as rater bias could not be controlled in this study. Also, this study did not differentiate or control for whether cognitive disability affected physical functional performance; the same patient may have had both physical and cognitive comorbidities, which could affect functional outcomes. Lastly, removing person measures that differed significantly between the FIM and MDS before cocalibrating the two instruments may have favored more promising psychometric qualities. Note that the logic behind this “cleaning” of the data is to build the item bank using only valid responses (i.e., an individual who scored high on one instrument, indicating high functional ability, and low on the other instrument, indicating low functional ability, is likely to reflect invalid scoring), since the purpose of the current study was to establish a well-developed item bank. The second phase of our larger study, the validity testing, will use the data from all subjects (i.e., no elimination of invalid responses) and allow for the evaluation of the influence of the data cleaning.
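The data-cleaning step described in this limitation can be illustrated with a minimal sketch. This section does not state the exact criterion used to decide that two person measures "differed significantly," so the sketch assumes a standardized difference between each person's FIM-based and MDS-based measure, with a two-sided cutoff of 1.96. The class name, function names, cutoff, and sample values are all hypothetical, and the two measures are assumed to be expressed on a common logit scale.

```python
import math
from dataclasses import dataclass


@dataclass
class PersonMeasures:
    """Hypothetical record: one person's measure (logits) and standard error per instrument."""
    person_id: str
    fim_logit: float
    fim_se: float
    mds_logit: float
    mds_se: float


def standardized_difference(p: PersonMeasures) -> float:
    """Difference between the two measures divided by its pooled standard error."""
    pooled_se = math.sqrt(p.fim_se ** 2 + p.mds_se ** 2)
    return (p.fim_logit - p.mds_logit) / pooled_se


def split_by_consistency(people, z_crit=1.96):
    """Return (kept, flagged); z_crit = 1.96 is an assumed cutoff, not the study's stated rule."""
    kept, flagged = [], []
    for p in people:
        if abs(standardized_difference(p)) > z_crit:
            flagged.append(p)  # measures disagree more than their errors allow
        else:
            kept.append(p)
    return kept, flagged


# Illustrative values only; these are not data from the study.
sample = [
    PersonMeasures("A", fim_logit=1.20, fim_se=0.30, mds_logit=1.00, mds_se=0.35),
    PersonMeasures("B", fim_logit=2.50, fim_se=0.40, mds_logit=-1.80, mds_se=0.45),
]
kept, flagged = split_by_consistency(sample)
```

Running the same split with and without the flagged persons is one way the planned second phase could quantify how much the cleaning influences the item calibrations.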

Conclusion

This study found that the FIM-MDS item bank had acceptable to good item-level psychometric properties, suggesting that a single construct, ADL, was measured by these two instruments. The researchers will use these results to develop SFs and CATs to decrease assessment burden for clinical practitioners and to enable researchers and policy makers to begin measuring outcomes for patient groups that receive the continuum of care. In addition, future studies are needed to validate the measurement precision and accuracy of the item bank and its multiple test forms across the PAC continuum, such as investigating whether the SFs generated from the linked item bank produce functional results similar to the original instrument scores.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Department of Veterans Affairs, Health Services Research and Development, Grant IR 11-223-1, “Item Banking across the Continuum of Care,” from North Florida/South Georgia Veterans Health System, Center of Innovation on Disability and Rehabilitation Research (CINDRR).

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

  1. Bond TG, & Fox CM (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
  2. Centers for Medicare & Medicaid Services. (2012). U.S. Department of Health and Human Services. Report to Congress: Post Acute Care Payment Reform Demonstration (PAC-PRD). Retrieved May 22, 2013, from http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Reports/downloads/Flood_PACPRD_RTC_CMS_Report_Jan_2012.pdf
  3. Cook KF, Taylor PW, Dodd BG, Teal CR, & McHorney CA (2007). Evidence-based practice for equating health status items: Sample size and IRT model. Journal of Applied Measurement, 8, 175–189.
  4. Department of Veterans Affairs. (2011). Physical medicine and rehabilitation outcomes for acute stroke, traumatic brain injury, and lower-extremity amputation patients (Veterans Health Administration Directive 2011–017). Washington, DC: Veterans Health Administration.
  5. Department of Veterans Affairs. (2014). Rehabilitation continuum of care (Veterans Health Administration Handbook 1170.04). Washington, DC: Veterans Health Administration.
  6. Dorans NJ, Pommerich M, & Holland PW (2010). Statistics for social and behavioral sciences: Linking and aligning scores and scales. In Pommerich M (Ed.), Concordance: The good, the bad, and the ugly (Chapter 11, pp. 202–216). New York, NY: Springer Science + Business Media, LLC.
  7. Fischer HF, Wahl I, Fliege H, Klapp BF, & Rose M (2012). Impact of cross-calibration methods on the interpretation of a treatment comparison study using 2 depression scales. Medical Care, 50, 320–326.
  8. Fisher WP Jr. (1997). Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1, 87–113.
  9. Haley SM, Ni P, Lai JS, Tian F, Coster WJ, Jette AM, ... Cella D (2011). Linking the activity measure for post acute care and the quality of life outcomes in neurological disorders. Archives of Physical Medicine and Rehabilitation, 92, S37–S43. doi: 10.1016/j.apmr.2011.01.026
  10. Hambleton RK, Swaminathan H, Cook LL, Eignor DR, & Gifford JA (1978). Developments in latent trait theory: A review of models, technical issues, and applications. Review of Educational Research, 48, 467–510.
  11. Hu L, & Bentler PM (1996). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55. doi: 10.1080/10705519909540118
  12. Kolen MJ, & Brennan RL (2004). Test equating, scaling, and linking: Methods and practice (2nd ed.). In Kolen MJ & Brennan RL (Eds.), Item response theory methods (Chapter 6, pp. 176–205). New York, NY: Springer Science + Business Media, LLC.
  13. Linacre JM (1998). Table 13.1: Item statistics in measure order. Rasch Measurement Forum. Retrieved December 12, 2013, from http://www.winsteps.com/winman/index.htm?correlations.htm
  14. Linacre JM (1999). Investigating rating scale category utility. Journal of Outcome Measurement, 3, 103–122.
  15. Linacre JM (2002). Optimizing rating scale category effectiveness. Journal of Applied Measurement, 3, 85–106.
  16. Linacre JM (2004). Rasch model estimation: Further topics. Journal of Applied Measurement, 5, 95–110.
  17. Linacre JM (2010). Predicting responses from Rasch measures. Journal of Applied Measurement, 11, 1–10.
  18. Linacre JM (2012). A user’s guide to Winsteps ministep 3.70.0: Rasch model computer programs. Chicago, IL: Winsteps.
  19. Linacre J, Heinemann AW, Wright BD, Granger CV, & Hamilton BB (1994). The structure and stability of the functional independence measure. Archives of Physical Medicine and Rehabilitation, 75, 127–132.
  20. Medicare Payment Advisory Commission. (2013). Report to the congress: Medicare and the health care delivery system. In Approaches to bundling payment for postacute care (Chapter 3, pp. 59–88).
  21. Nilsson AL, Sunnerhagen KS, & Grimby G (2005). Scoring alternatives for FIM in neurological disorders applying Rasch analysis. Acta Neurologica Scandinavica, 111, 264–273.
  22. Patient-Reported Outcome Measurement Information System. (2014). Instrument development and psychometric evaluation scientific standards. Retrieved June 6, 2014, from http://www.nihpromis.org/Documents/PROMISStandards_Vers2.0_Final.pdf
  23. Reeve BB, Burke LB, Chiang YP, Clauser SB, Colpe LJ, Elias JW, ... Werner EM (2007). Enhancing measurement in health outcomes research supported by Agencies within the US Department of Health and Human Services. Quality of Life Research, 16, 175–186.
  24. Research Triangle Institute International. (2013). Examining post acute care relationships in an integrated hospital system: Final report for 2009. Retrieved July 17, 2013, from http://aspe.hhs.gov/health/reports/09/pacihs/report.shtml
  25. Tennant A, & Pallant JF (2006). Unidimensionality matters! (A tale of two Smiths?). Rasch Measurement Transactions, 20, 1048–1051.
  26. Velozo CA, Byers KL, Wang YC, & Joseph BR (2007). Translating measures across the continuum of care: Using Rasch analysis to create a crosswalk between the Functional Independence Measure and the Minimum Data Set. Journal of Rehabilitation Research & Development, 44, 467–478.
  27. Velozo CA, Choi B, Zylstra SE, & Santopoalo R (2006). Measurement qualities of a self-report and therapist-scored functional capacity instrument based on the Dictionary of Occupational Titles. Journal of Occupational Rehabilitation, 16, 109–122.
  28. Velozo CA, Magalhaes LC, Pan AW, & Leiter P (1995). Functional scale discrimination at admission and discharge: Rasch analysis of the Level of Rehabilitation Scale-III. Archives of Physical Medicine and Rehabilitation, 76, 705–712.
  29. Wang YC, Byers KL, & Velozo CA (2008a). Validation of FIM™-MDS crosswalk conversion algorithm. Journal of Rehabilitation Research & Development, 45, 1065–1076.
  30. Wang YC, Byers KL, & Velozo CA (2008b). Rasch analysis of Minimum Data Set mandated in skilled nursing facilities. Journal of Rehabilitation Research & Development, 45, 1385–1399.
  31. Wright BD, & Linacre JM (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8, 370.
  32. Wright BD, & Masters GN (1982). Rating scale analysis. Chicago, IL: Mesa Press.
  33. Zwick R, Thayer DT, & Lewis C (1999). An empirical Bayes approach to Mantel-Haenszel DIF analysis. Journal of Educational Measurement, 36, 1–28.
