Abstract
Background.
Having psychometrically strong disability measures that minimize response burden is important in the assessment of older adults.
Methods.
Using the original 48 items from the Late-Life Function and Disability Instrument and newly developed items, a 158-item Activity Limitation and a 62-item Participation Restriction item pool were developed. The item pools were administered to a convenience sample of 520 community-dwelling adults 60 years or older. Confirmatory factor analysis and item response theory were employed to identify content structure, calibrate items, and build the computer-adaptive tests (CATs). We evaluated real-data simulations of 10-item CAT subscales. We collected data from 102 older adults to validate the 10-item CATs against the Veteran’s Short Form-36 and assessed test–retest reliability in a subsample of 57 subjects.
Results.
Confirmatory factor analysis revealed a bifactor structure, and multidimensional item response theory was used to calibrate an overall Activity Limitation Scale (141 items) and an overall Participation Restriction Scale (55 items). Fit statistics were acceptable (Activity Limitation: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error of approximation = 0.03; Participation Restriction: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error of approximation = 0.05). Correlations of the 10-item CATs with the full item banks were substantial (Activity Limitation: r = .90; Participation Restriction: r = .95). Test–retest reliability estimates were high (Activity Limitation: r = .85; Participation Restriction: r = .80). Strength and pattern of correlations with Veteran’s Short Form-36 subscales were as hypothesized. Each CAT, on average, took 3.56 minutes to administer.
Conclusions.
The Late-Life Function and Disability Instrument CATs demonstrated strong reliability, validity, accuracy, and precision. The Late-Life Function and Disability Instrument CAT can achieve psychometrically sound disability assessment in older persons while reducing respondent burden. Further research is needed to assess their ability to measure change in older adults.
Keywords: Function, Disability, Computer-adaptive testing, Activity, Participation
Disability associated with aging is a major issue facing American society (1). Rising numbers of older adults in the population increase the burden of disability and have major implications for the quality of life of older adults and their families, for health care costs, and for related resource utilization (1). Understanding the process of disablement for older adults is critical for a wide range of stakeholders. Several important developments have advanced the fundamental tools available for this endeavor, including a theoretical model of disablement (2), an international language of health and disability (3), and improved measurement instruments (4).
Traditional fixed-form instruments have suffered from important concerns regarding response burden, inadequate measurement precision (5,6), floor and ceiling effects, and administration of redundant or irrelevant items to respondents (7). Short-form solutions have provided relief of response burden, with important limitations to measurement precision (8,9) and ability to measure clinically important change (10). Item response theory (IRT) and computer-adaptive testing (CAT) are promising contemporary measurement approaches adopted in the health field in recent years to overcome limitations of traditional fixed-form measures (7,11–13). Item response theory and CAT-based measures have been shown to significantly reduce assessment time compared with fixed-form testing while maintaining good precision (14).
CATs require a comprehensive item bank that contains items representing the construct of interest from low to high levels of ability. A computer algorithm tailors the administration to the respondent by selecting items based on prior responses and can be programmed to stop once a specified score precision or predetermined number of items is reached (7,8).
The Late-Life Function and Disability Instrument (LLFDI) is a fixed-form self-report measure developed to assess the domains of function and disability of community-dwelling older adults (15–17). In previous research, a prototype LLFDI-CAT developed from the items contained in the original fixed-form LLFDI demonstrated promising psychometric properties (18). However, because the prototype CAT was based on a relatively small pool of 48 items contained in the original version, we undertook further work to expand the item pools, aiming to improve the instrument’s breadth and depth of measurement, particularly at the low and high ends of each scale. The purpose of this project was to develop and test expanded LLFDI item banks and CAT and evaluate their precision, reliability, validity, and efficiency to assess function and disability in community-dwelling older adults.
METHODS
All study procedures were approved by the Boston University Institutional Review Board, and all subjects provided written informed consent prior to participation.
The World Health Organization’s International Classification of Functioning, Disability, and Health (ICF) was used as a framework for organizing the new LLFDI-CAT’s item pools as recommended by the IOM Report, The Future of Disability in America (1,3). We modified our terminology to describe the new LLFDI-CAT domains in language consistent with the ICF framework (19). The original LLFDI structure and terms are consistent with the Nagi framework (2), in which “functional limitation” refers to limitation of a person in performance of specific functional tasks and activities; the corresponding term in the language of the ICF is “activity limitation.” The original LLFDI uses the term “disability” to describe limitation in performing social roles and activities; within ICF the term “participation restriction” is used.
Development and testing of the CAT involved the following stages: item bank development, item bank calibration, CAT development from the calibrated item bank, and LLFDI-CAT field testing and validation. Calibration involves administering the items in the bank to subjects and estimating parameters that characterize the respondent’s location on the scale (his/her ability level) and the difficulty of the task presented by each item.
Instruments
Late-Life Function and Disability Instrument.—
The original fixed-form LLFDI was used as the foundation of the expanded LLFDI item pools (15–17). It includes a Function domain scale consisting of 32 items and a Disability domain scale consisting of 16 items that assess performance of social roles in two dimensions: frequency and limitation.
Veteran’s Rand-36.—
The VR-36 (20,21), a modified version of the Medical Outcomes Short Form-36 (22), is a self-report measure of health status that consists of 36 questions within eight subscales and two summary components: physical and mental. The 10-item Physical Functioning subscale of the SF-36 (PF-10) was used to screen subjects for eligibility. The PF-10 assesses extent of limitation in 10 tasks on a scale of 1–3, with a raw score range of 10–30.
To develop the new LLFDI-CAT item banks, we included all items from the original LLFDI and conducted an extensive literature review to identify relevant new items. We used the ICF framework to categorize these items, allowing us to evaluate gaps in content. We conducted two focus groups with nine older adults to elicit opinions about item content and wording and held two geriatric clinician focus groups. Audiotapes of the sessions were analyzed, and results were categorized according to content by ICF domain structure and compared with existing item content. Novel items were written to address the newly identified gaps.
Instrument Construction
For the Activity Limitation content domain, questions asked, “How much difficulty do you currently have doing a particular activity?” or “How much help from another person do you currently need doing a particular activity?” The response options include “None at all; A little; A lot; Unable to do; and Does not apply.” Respondents were instructed that currently means “how you ‘typically’ or ‘usually’ perform the activity at this point in your life” and that “Does not apply” means that “you do not do the activity for reasons other than your physical or mental health.” Questions in the Participation Restriction domain asked, “Because of your physical or mental health, to what extent do you feel limited in doing a particular activity?” Response options included “Not at all; A little; A lot; Completely; and Does not apply.” Feedback from use in the field and focus groups suggested that the limitation (restriction) dimension was most critical, and therefore, the frequency dimension of the Disability Scale was eliminated.
We conducted cognitive testing of the preliminary item bank among 13 older adults, so that each item was exposed twice to elicit their interpretation of the questions and to identify any questions that needed to be revised or removed. The final Activity Limitation item pool used for the calibration study consisted of 158 items including 124 core items, seven gender-specific items, and 27 walking aid or wheelchair-specific items. The final Participation Restriction item pool included 62 items.
Calibration Study
Subjects and sampling procedures.—
We recruited 520 older adults using advertisements in various media, and from Councils on Aging, senior housing, and geriatric and physical therapy practices in eastern MA. Subjects were eligible if they were 60 years of age or older, spoke and understood English, lived in the community, reported limitation on at least one item of the PF-10 (22), could provide contact information, and were oriented to person, year, and month.
Data Collection
Trained interviewers administered demographic questions and new LLFDI-CAT item pools to participants in-person or by telephone. They also administered gender-specific items as appropriate. Participants who reported using a wheelchair and/or walking aid answered the appropriate assistive device questions. Clarifications and probes were scripted for the interviewers.
Evaluation of Instrument Structure
Confirmatory factor analyses were conducted using MPlus software, version 6 (Muthén & Muthén, Los Angeles, CA) (23) to assess the dimensionality of the Activity Limitation and Participation Restriction domains separately by comparing the results across different confirmatory factor analysis models: (1) one-factor unidimensional; (2) two-factor multidimensional; and (3) three bifactor multidimensional models. Mean- and variance-adjusted weighted least squares estimation with polychoric correlation matrices was used. The model comparison was based on a likelihood ratio chi-square, the number of parameters estimated, and the following information criteria indices: Akaike Information Criterion, Bayesian Information Criterion, and sample-size adjusted Bayesian Information Criterion. Lower information values indicate better model fit (24). To evaluate the model, we calculated several fit statistics and compared them with acceptable levels as follows: chi-square test; comparative fit index and Tucker Lewis Index ≥0.90 (25–27); and root mean square error of approximation (RMSEA) <0.08 as acceptable fit, and 0.05 as very good fit (28,29).
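The information criteria named above follow standard definitions, which can be sketched as below. This is a minimal illustration, not the MPlus implementation: the function names are hypothetical, the sample-size adjustment uses the common (n + 2)/24 form, and the `n_obs` value in the usage lines is a placeholder assumption because the analysis sample size for this step is not stated here. The log-likelihoods and parameter counts in the usage lines are taken from Appendix 1 (Participation Restriction).

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return AIC, BIC, and sample-size adjusted BIC for one fitted model.

    Lower values indicate better fit. n_obs is only needed for BIC and
    the adjusted BIC; AIC depends on loglik and n_params alone.
    """
    deviance = -2.0 * loglik
    return {
        "aic": deviance + 2 * n_params,
        "bic": deviance + n_params * math.log(n_obs),
        # Common sample-size adjustment: replace n with (n + 2) / 24.
        "sabic": deviance + n_params * math.log((n_obs + 2) / 24.0),
    }


def likelihood_ratio(loglik_reduced, k_reduced, loglik_full, k_full):
    """Return the likelihood ratio chi-square statistic and its degrees of freedom."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, k_full - k_reduced


# Usage with Appendix 1 values; n_obs=515 is an assumed placeholder.
unidimensional = information_criteria(-19304.010, 221, n_obs=515)
bifactor_two = information_criteria(-18681.579, 275, n_obs=515)
lr_stat, lr_df = likelihood_ratio(-19304.010, 221, -18681.579, 275)
print(unidimensional["aic"], bifactor_two["aic"], lr_stat, lr_df)
```

The likelihood ratio statistic would then be referred to a chi-square distribution with `lr_df` degrees of freedom.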
One important assumption of IRT is that of local independence. In a bifactor model, this assumption holds that a person’s response to an item is only determined by the general factor and the subfactors. Once the general factor and the subfactors are controlled, there should be no significant association among item responses. We checked this assumption by examining the inter-item residual correlations. High residual correlation (≥.20) was considered as violating this assumption and showing local dependence. We interpreted inter-item residual correlations of ≤.20 as indicative of acceptable fit and appropriate attribution of inter-item correlation to the primary factor (30).
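The residual-correlation screen described above can be sketched as a simple pairwise scan. This is an illustrative helper only: the function name, the input layout (a square matrix of inter-item residual correlations), and the use of the absolute value are assumptions, and only the .20 threshold comes from the text.

```python
def flag_local_dependence(residual_corr, item_names, threshold=0.20):
    """List item pairs whose residual correlation meets or exceeds the threshold.

    residual_corr: square, symmetric matrix (list of lists) of inter-item
    residual correlations after the general factor and subfactors are
    controlled. Taking the absolute value is an assumption of this sketch.
    """
    flagged = []
    n_items = len(item_names)
    for i in range(n_items):
        for j in range(i + 1, n_items):
            r = residual_corr[i][j]
            if abs(r) >= threshold:
                flagged.append((item_names[i], item_names[j], r))
    return flagged


# Hypothetical residual correlations for three items.
names = ["item_a", "item_b", "item_c"]
residuals = [
    [1.00, 0.05, 0.27],
    [0.05, 1.00, -0.10],
    [0.27, -0.10, 1.00],
]
print(flag_local_dependence(residuals, names))
```

Pairs returned by the scan are candidates for review; the decision about which item of a pair to remove is a separate judgment not covered by this sketch.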
Item Calibration and CAT Construction
We used a bifactor multidimensional IRT model to calibrate the data. This model is a multidimensional logistic graded response model, as it estimates both the discrimination parameters and the ordered location parameters or difficulty estimates for each item. Marginal maximum likelihood estimation was used to estimate the item parameters. The person scores for each domain and subdomain were estimated using maximum a posteriori estimation (31,32); scores were expressed in log-odds units, or logits, and then transformed into a scale with mean = 50 and standard deviation = 10, with lower scores indicating more difficulty or limitation. For example, each respondent received a score for the Activity Limitation domain and for each of the two subdomains: Basic Mobility & Handling, and Daily Activities. Similarly, each respondent received an overall score for the Participation Restriction domain and subscores for each of the two subdomains: Social Roles and Instrumental Roles. Item fit was assessed by the Z index, which is the standardized difference between the observed and expected log likelihood of response patterns. This was a one-sided test; under the null hypothesis, the z scores follow a normal distribution, and the cutoff was −1.645 (33,34). IRT analyses were conducted using IRTPRO (35).
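To make the model concrete, the sketch below computes category probabilities under a multidimensional logistic graded response model and applies the linear transformation to a mean-50, SD-10 scale. It assumes a slope-intercept parameterization, P(Y ≥ k | θ) = logistic(a·θ + c_k), with intercepts ordered from largest to smallest; this parameterization, the function names, and the reference mean and SD defaults are assumptions for illustration, not a description of IRTPRO's internals. The example item parameters are the first row of Appendix 2.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probs(theta, slopes, intercepts):
    """Category probabilities for one graded-response item.

    theta: person scores on each dimension (general factor and subfactors).
    slopes: discrimination parameters, one per dimension.
    intercepts: ordered from largest to smallest, one per category boundary.
    Returns len(intercepts) + 1 probabilities that sum to 1.
    """
    z = sum(a * t for a, t in zip(slopes, theta))
    # Cumulative probabilities of responding at or above each boundary.
    cumulative = [1.0] + [logistic(z + c) for c in intercepts] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


def to_scaled_score(theta, ref_mean=0.0, ref_sd=1.0):
    """Linear transform of a logit score to a scale with mean 50 and SD 10."""
    return 50.0 + 10.0 * (theta - ref_mean) / ref_sd


# "Taking part in active recreation?" (Appendix 2): a = (1.92, 0.56, 0), c = (0.87, -0.7, -2.33)
probs = grm_category_probs([0.0, 0.0, 0.0], [1.92, 0.56, 0.0], [0.87, -0.7, -2.33])
print([round(p, 3) for p in probs], to_scaled_score(0.0))
```

Which response label corresponds to each category index depends on how the responses were coded, which this sketch leaves open.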
We also evaluated the breadth of coverage of the item bank by comparing subjects’ score distribution to the item bank score distribution, which was created by mapping the item response category expected value onto the general factor scale. In a unidimensional IRT model, we could map each response category onto the general factor through the expected score. Because of the multidimensional nature of bifactor model used in this study, at each general factor score level the expected value depends on the scores at the subfactor level. We took the average of those expected values and mapped the average expected value onto the general factor score level.
CAT algorithms were created for each domain using HDR software (HDR, Boston University, Boston, MA) to be administered using a computer or web-based platform. In each CAT, the first question was selected from the middle of the difficulty range; maximum a posteriori estimation was used to estimate the subject’s score and standard error; the item with the maximum test information matrix at current score level was selected; the program updated the subject’s score based on that response and continued until the stopping rule had been satisfied. The stopping rule can be chosen as the maximum desired number of items to be administered or the level of precision as specified by the standard error of the score estimate.
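The administration loop described above can be sketched in a deliberately simplified form. The sketch below swaps in a unidimensional, dichotomous two-parameter logistic model with a standard normal prior and a grid search for the maximum a posteriori score, whereas the LLFDI-CAT uses a bifactor graded response model; all names, the grid, and the toy item bank are assumptions for illustration and do not reflect the HDR software.

```python
import math

GRID = [g / 100.0 for g in range(-400, 401)]  # candidate scores from -4 to 4


def p_endorse(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def map_estimate(responses):
    """Grid-search MAP score and standard error under a N(0, 1) prior."""
    def log_posterior(theta):
        total = -0.5 * theta * theta
        for (a, b), u in responses:
            p = p_endorse(theta, a, b)
            total += math.log(p if u == 1 else 1.0 - p)
        return total

    theta_hat = max(GRID, key=log_posterior)
    # Posterior information = prior information (1) + test information.
    info = 1.0 + sum(item_information(theta_hat, a, b) for (a, b), _ in responses)
    return theta_hat, 1.0 / math.sqrt(info)


def run_cat(bank, answer, max_items=10, se_target=None):
    """bank: list of (a, b) items; answer: callable returning 0 or 1 for an item."""
    remaining = sorted(bank, key=lambda item: item[1])
    first = remaining.pop(len(remaining) // 2)  # start mid-range in difficulty
    responses = [(first, answer(first))]
    theta, se = map_estimate(responses)
    while remaining and len(responses) < max_items:
        if se_target is not None and se <= se_target:
            break  # precision-based stopping rule
        best = max(remaining, key=lambda item: item_information(theta, *item))
        remaining.remove(best)
        responses.append((best, answer(best)))
        theta, se = map_estimate(responses)
    return theta, se, len(responses)


# Toy bank and a deterministic respondent who endorses items easier than b = 1.0.
toy_bank = [(1.5, b / 2.0) for b in range(-6, 7)]
print(run_cat(toy_bank, lambda item: 1 if item[1] < 1.0 else 0, max_items=10))
```

Passing `se_target` switches the stopping rule from a fixed item count to a precision criterion, mirroring the two options named in the text.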
Differential Item Functioning
Differential item functioning (DIF) was examined in the calibration data using ordinal logistic regression (36) to evaluate whether the pattern of subjects’ responses was influenced by group membership, including subjects’ gender and age. The dependent variable was the response to an item, and the independent variables were participants’ domain score, group membership, and an interaction term between the total score and group membership. The analytic strategy was to successively add total score, group membership, and the interaction term into the model in three steps, and the procedure was repeated for each item. The test statistic was the −2 log likelihood difference between models, which is distributed as chi-square with two degrees of freedom, and the effect size was the R² change between models (37). The following criteria were set for DIF analysis: if the likelihood ratio test was statistically significant and the R² change was greater than .07 for an item, that item exhibited severe DIF; if the likelihood ratio test was statistically significant and the R² change was between .035 and .07, that item exhibited moderate DIF; otherwise, values indicated negligible DIF (37). The DIF analysis was conducted separately for each item in each domain.
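The DIF decision rule above can be written as a small classifier. This sketch covers only the classification step, not the ordinal logistic regression fits; the function name is hypothetical, and the significance level `alpha=0.05` is an assumption because the text does not state the level used.

```python
def classify_dif(lr_p_value, r2_change, alpha=0.05):
    """Classify one item's DIF from the likelihood ratio test and R-squared change.

    lr_p_value: p value of the -2 log likelihood difference test (2 df).
    r2_change: R-squared change between the nested models.
    alpha: assumed significance level (not specified in the text).
    """
    significant = lr_p_value < alpha
    if significant and r2_change > 0.07:
        return "severe"
    if significant and 0.035 <= r2_change <= 0.07:
        return "moderate"
    return "negligible"


# Hypothetical items: (p value, R-squared change)
for p_value, delta_r2 in [(0.001, 0.09), (0.01, 0.04), (0.20, 0.09), (0.001, 0.01)]:
    print(p_value, delta_r2, classify_dif(p_value, delta_r2))
```

An item is flagged only when both conditions hold, so a large R² change with a nonsignificant test is still treated as negligible.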
Simulation Studies
To conduct initial evaluation of the new LLFDI-CAT performance, we used the calibration study data to compare characteristics of simulated 10-item CATs to the full item banks. The CAT selected questions according to the algorithm, and participant responses were fed to the CAT as they were selected, creating a score and standard error for each participant for each scale. We evaluated accuracy, precision, and conditional reliability. To evaluate accuracy, we calculated the correlation, bias, and root mean square error between mean scores generated by the CATs and those of the full item bank. Precision was assessed by calculating the standard errors across the range of scores for the CATs and for the items from the original fixed-form scales. Conditional reliability was estimated across the scale as 1/[1 + (standard error)²]. Areas with reliabilities <0.70 were considered insufficient. In this paper, we report on the results for the domain scores.
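The accuracy and reliability summaries above reduce to a few short formulas, sketched here with hypothetical function names and toy scores. The conditional reliability follows the expression given in the text, 1/[1 + (standard error)²], with the standard error assumed to be on the logit (standardized) metric; bias is taken as the mean of CAT minus full-bank scores, which is an assumption about sign convention.

```python
import math


def conditional_reliability(standard_error):
    """Reliability at a given score level: 1 / (1 + SE^2)."""
    return 1.0 / (1.0 + standard_error ** 2)


def bias(cat_scores, full_scores):
    """Mean signed difference, CAT minus full item bank."""
    return sum(c - f for c, f in zip(cat_scores, full_scores)) / len(cat_scores)


def rmse(cat_scores, full_scores):
    """Root mean square error between CAT and full item bank scores."""
    squared = [(c - f) ** 2 for c, f in zip(cat_scores, full_scores)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy scores (hypothetical, not study data).
cat = [48.0, 52.5, 61.0, 39.5]
full = [47.0, 53.0, 60.0, 41.0]
print(pearson_r(cat, full), bias(cat, full), rmse(cat, full))
print(conditional_reliability(0.5))  # an SE of 0.5 logits clears the 0.70 bar
```

Under this formula, the 0.70 reliability threshold corresponds to a standard error of roughly 0.65 on the logit metric.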
Floor and Ceiling
We evaluated potential floor and ceiling effects by calculating the percent at the floor and ceiling using the response data at the participant level. If the participant responded at the highest category for all the items, he/she was grouped at the ceiling; if the participant responded at the lowest category for all the items, he/she was grouped at the floor.
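The participant-level floor and ceiling rule can be sketched as follows. The function name, the numeric category coding, and the handling of unanswered or "Does not apply" items (coded as `None` and skipped) are assumptions for illustration.

```python
def floor_ceiling_percent(responses_by_person, lowest, highest):
    """Percent of participants at the floor and at the ceiling.

    responses_by_person: one list of category codes per participant;
    None marks an item that was not answered or did not apply (assumed
    to be ignored). A participant is at the floor if every answered item
    is in the lowest category, and at the ceiling if every answered item
    is in the highest category.
    """
    n_people = len(responses_by_person)
    at_floor = at_ceiling = 0
    for responses in responses_by_person:
        answered = [r for r in responses if r is not None]
        if not answered:
            continue
        if all(r == lowest for r in answered):
            at_floor += 1
        elif all(r == highest for r in answered):
            at_ceiling += 1
    return 100.0 * at_floor / n_people, 100.0 * at_ceiling / n_people


# Hypothetical responses coded 1 (lowest) to 4 (highest).
sample = [[1, 1, 1], [4, 4, None], [1, 2, 4], [4, 4, 4]]
print(floor_ceiling_percent(sample, lowest=1, highest=4))
```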
Initial Field Testing and Validation of 10-Item CAT
To assess validity, test–retest reliability, and acceptance, we conducted a study in which trained interviewers administered 10-item LLFDI-CAT scales and the Veterans’ Rand-36 (20,21) survey to 102 community-dwelling older adults using the same recruitment and enrollment procedures used in the calibration study. Time to complete the instruments was recorded, and Pearson correlation coefficients were calculated to evaluate the construct validity of the new LLFDI-CAT scales relative to the VR-36 subscales. For a subset of 57 subjects, the LLFDI-CAT scales were readministered within 7–14 days along with questions about the acceptability of the LLFDI-CAT. Intraclass correlation coefficients were calculated to assess test–retest reliability and reliability was considered high if r > .80 and substantial if between 0.61 and 0.80 (38). We hypothesized that correlations would be moderate to strong (>.60) between the new LLFDI-CAT subscales and relevant VR-36 subscales (eg, LLFDI-CAT mobility and VR-36 physical function) (39).
RESULTS
The demographic characteristics of the study samples are summarized in Table 1.
Table 1.
Background Characteristics of the Study Samples
| | Calibration Sample (N = 520) | Validation Sample (N = 102) | Test–Retest Sample (N = 57) |
| Age range, y | 60–101 | 59–98 | 62–98 |
| Mean age, y (SD) | 76.19 (9.08) | 78.22 (9.35) | 77.82 (8.06) |
| Female, # (%) | 389 (74.8) | 83 (81.4) | 50 (87.7) |
| Ethnicity, # (%) N = 513 | |||
| Hispanic or Latino | 6 (1.2) | 4 (4.0) | 3 (5.3) |
| Race # (%) N = 517 | |||
| Asian | 13 (2.5) | 2 (2.0) | 1 (1.8) |
| Other | 10 (1.9) | 5 (4.9) | 3 (5.3) |
| African American | 79 (15.3) | 21 (20.6) | 14 (24.6) |
| White | 411 (79.5) | 73 (71.6) | 39 (68.4) |
| American Indian or Alaskan native | 4 (0.8) | 1 (0.9) | |
| Education, # (%) N = 512 | |||
| Post-graduate | 99 (19.3) | 16 (15.7) | 8 (14) |
| College graduate | 105 (20.5) | 17 (16.7) | 5 (8.8) |
| Some college | 138 (27.0) | 27 (26.5) | 15 (26.3) |
| High school graduate | 109 (21.3) | 19 (18.6) | 18 (31.6) |
| Less than high school | 61 (11.9) | 23 (22.5) | 11 (19.3) |
| Health level, # (%) N = 512 | |||
| Excellent | 48 (9.4) | ||
| Good | 200 (39.1) | ||
| Fair | 110 (21.5) | ||
| Poor | 35 (6.8) | ||
| Very poor | 119 (23.2) | ||
| SF-36 Physical Function Score (22) | |||
| Mean (SD) | 21.4 (5.2) | 19.9 (5.5) | |
| Min, max | 10, 29 | 10, 29 | |
| 10–20 | 43.05 (223) | 55.88 (57) | |
| 21–29 | 56.95 (295) | 44.12 (45) | |
| Walking aids, # (%) N = 511 | |||
| Do not use walking aid/wheelchair | 329 (64.4) | 55 (53.9) | 29 (50.9) |
| Use walking aid only | 131 (25.6) | 37 (36.3) | 21 (36.8) |
| Use both walking aid and wheelchair | 42 (8.2) | 7 (6.9) | 5 (8.8) |
| Use wheelchair and never walk | 9 (1.8) | 3 (2.9) | 2 (3.5) |
Confirmatory Factor Analysis and Item Calibration
Confirmatory factor analysis revealed that the bifactor model with two subfactors had the lowest Akaike Information Criterion, Bayesian Information Criterion, and adjusted Bayesian Information Criterion values in both domains. The likelihood ratio chi-square test also demonstrated that the bifactor model with two subfactors had significantly better fit than the other models in each domain. The model comparison results are presented in Appendix 1. The Activity Limitation domain consisted of Basic Mobility & Handling and Daily Activities subdomains, and the Participation Restriction domain consisted of Social Roles and Instrumental Roles subdomains. Fit statistics for the Activity Limitation Scale were comparative fit index = 0.95; Tucker Lewis Index = 0.95; and root mean square error of approximation = 0.03, and statistics for the Participation Restriction Scale were comparative fit index = 0.95; Tucker Lewis Index = 0.95; and root mean square error of approximation = 0.05. After removing 17 items from the Activity Limitation Scale and 7 from the Participation Restriction Scale due to local dependence, 141 and 55 items remained in each scale, respectively. The new LLFDI items are presented in Appendix 2 by domain.
Appendix 1. Confirmatory Factor Analysis Model Comparison Based on a Likelihood Ratio Chi-Square, the Number of Parameters Estimated, and the Following Information Criteria Indices: Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and Sample-Size Adjusted BIC. Lower Information Values Indicate Better Model Fit
| Domain | Model | Akaike (AIC) | Bayesian (BIC) | Sample-Size Adjusted BIC | Loglikelihood | Number of Parameters Estimated |
| Participation restriction | Unidimensional | 39,050.02 | 39,987.981 | 39,286.49 | −19,304.010 | 221 |
| | Two factor model | 38,658.89 | 39,601.099 | 38,896.43 | −19,107.447 | 222 |
| | Bifactor with one subfactor: instrumental roles | 38,554.71 | 39,607.268 | 38,820.07 | −19,029.357 | 247 |
| | Bifactor with one subfactor: social roles | 38,530.54 | 39,578.848 | 38,794.83 | −19,018.269 | 248 |
| | Bifactor with two subfactors | 37,913.16 | 39,080.305 | 38,207.41 | −18,681.579 | 275 |
| Activity limitation | Unidimensional | 78,569.09 | 80,371.897 | 79,026.03 | −38,860.545 | 424 |
| | Two factor model | 77,410.57 | 79,217.631 | 77,868.59 | −38,280.286 | 425 |
| | Bifactor with one subfactor: basic mobility & handling | 76,279.14 | 78,345.562 | 76,802.90 | −37,653.569 | 486 |
| | Bifactor with one subfactor: daily activities | 77,181.09 | 79,166.725 | 77,684.37 | −38,123.543 | 467 |
| | Bifactor with two subfactors | 75,844.06 | 78,097.566 | 76,415.24 | −37,392.028 | 530 |
Appendix 2. Item Difficulty and Fit Statistics
| Domain | Items | Discrimination a1 | Discrimination a2 | Discrimination a3 | Location c1 | Location c2 | Location c3 | Difficulty |
| Participation restriction | Taking part in active recreation? | 1.92 | 0.56 | 0 | 0.87 | −0.7 | −2.33 | −0.72 |
| Taking part in a regular fitness program? This may include walking for exercise, stationary biking, weight lifting, or exercise classes. | 1.64 | 0 | 0.03 | 2.28 | 1.02 | −0.46 | 0.95 | |
| Caring for a sick or disabled adult? | 1.93 | 0 | −0.33 | 2.49 | 1.14 | −0.51 | 1.04 | |
| Driving yourself in a car? | 1.74 | −0.41 | 0 | 1.75 | 1.34 | 0.24 | 1.11 | |
| Going on the sort of trips and holidays you want to go on? | 2.19 | 0.24 | 0 | 3.27 | 1.13 | −0.69 | 1.24 | |
| Having a sexual relationship? | 1.25 | 0.44 | 0 | 2.37 | 1.28 | 0.27 | 1.31 | |
| Working at a volunteer job outside your home? | 2 | 0 | 0.16 | 2.7 | 1.46 | 0.25 | 1.47 | |
| Providing care or assistance to others? This may include providing personal care, transportation, and running errands for family members and friends. | 3.07 | 0 | −0.4 | 3.5 | 1.92 | −0.51 | 1.64 | |
| Getting housework done by yourself when you want it done? | 2.01 | 0 | 0.44 | 3.63 | 1.79 | −0.36 | 1.69 | |
| Traveling out of town for at least an overnight stay? | 2.19 | 0.19 | 0 | 3.15 | 1.59 | 0.32 | 1.69 | |
| Filing your taxes? | 1.37 | 0 | 0.28 | 3.02 | 1.68 | 0.69 | 1.80 | |
| Traveling by bus, train or ferry? | 1.96 | −0.34 | 0 | 3.48 | 1.85 | 0.24 | 1.86 | |
| Completing forms for insurance or disability benefits? | 1.19 | 0 | 0.27 | 3.41 | 2.07 | 0.17 | 1.88 | |
| Going to parks or other outdoor recreational areas? | 2.1 | 0.09 | 0 | 3.53 | 2.07 | 0.22 | 1.94 | |
| Taking care of the inside of your home? This includes managing and taking responsibility for homemaking, laundry, housecleaning, and minor household repairs. | 2.28 | 0 | 0.47 | 4.26 | 2.2 | −0.13 | 2.11 | |
| Inviting people into your home for a meal or entertainment? | 1.8 | 0.87 | 0 | 4.05 | 2.09 | 0.46 | 2.20 | |
| Going to a sport, social, or other club? (female) | 2.09 | 0.93 | 0 | 4.05 | 2.27 | 0.38 | 2.23 | |
| How much work you do (including work at home)? | 2.25 | 0 | 1.74 | 6.09 | 2.58 | −0.89 | 2.59 | |
| Baking or cooking something special for others? | 2.32 | 0 | 0.02 | 4.03 | 2.87 | 0.95 | 2.62 | |
| Your ability to meet the needs of those who depend on you? | 2.76 | 0 | 0.28 | 5.03 | 2.87 | 0.19 | 2.70 | |
| Doing things for fun outside your home? | 2.06 | 0.82 | 0 | 4.97 | 2.7 | 0.64 | 2.77 | |
| Taking part in organized social activities? | 2.1 | 0.96 | 0 | 4.7 | 2.95 | 0.75 | 2.80 | |
| Writing checks, paying, bills, balancing checkbook, keeping financial records? | 1.37 | 0 | 0.66 | 4.48 | 2.7 | 1.25 | 2.81 | |
| Taking care of household business and finances? This may include managing and taking responsibility for your money, paying bills, dealing with a landlord or tenants, dealing with utility companies or governmental agencies. | 1.53 | 0 | 0.55 | 4.57 | 2.79 | 1.14 | 2.83 | |
| Working on a hobby or project? | 2 | 0.77 | 0 | 5.05 | 2.86 | 0.69 | 2.87 | |
| Reading books, magazines or newspapers? | 1.2 | 0.5 | 0 | 4.35 | 2.8 | 1.5 | 2.88 | |
| Going to a sport, social, or other club? (male) | 2.84 | 0.63 | 0 | 5.43 | 2.87 | 0.43 | 2.91 | |
| Doing your job the way you want to? (≥75 y) | 2.63 | 0 | 1.56 | 5.78 | 3.36 | 0 | 3.05 | |
| Getting minor housework done by yourself? | 2.29 | 0 | 0.5 | 5.08 | 3.34 | 0.8 | 3.07 | |
| Doing the work you want to do? | 3.16 | 0 | 2.58 | 7.56 | 2.73 | −1.05 | 3.08 | |
| Doing things for your grandchildren? | 2.36 | 0 | −0.01 | 5.25 | 3.4 | 0.67 | 3.11 | |
| Doing your job the way you want to? (<75 y) | 2.27 | 0 | 1.52 | 6 | 3.31 | 0.12 | 3.14 | |
| Taking your car in for regular maintenance? | 2.73 | 0 | −0.03 | 4.53 | 3.46 | 1.94 | 3.31 | |
| Doing things for your spouse? | 3.07 | 0.1 | 0 | 5.99 | 3.45 | 0.72 | 3.39 | |
| Seeing people as often as you want? | 2.45 | 1.21 | 0 | 6.68 | 3.31 | 0.38 | 3.46 | |
| Doing things for your friends? | 2.88 | 0 | 0.07 | 5.8 | 3.75 | 0.85 | 3.47 | |
| How much difficulty do you currently have keeping in touch with others with letters, phone, or email? | 1 | 1.5 | 0 | 5.86 | 3.46 | 1.46 | 3.59 | |
| The amount of time you spend visiting friends? | 2.85 | 1.57 | 0 | 6.23 | 3.65 | 0.93 | 3.60 | |
| Doing your job as carefully and accurately as others with similar jobs? | 2.32 | 0 | 1.27 | 5.96 | 3.94 | 0.95 | 3.62 | |
| The amount of time you spend doing work (include work at home)? | 3.16 | 0 | 3.06 | 8.84 | 3.14 | −1.02 | 3.65 | |
| Playing cards or games such as bingo? | 1.71 | 0.96 | 0 | 5.05 | 3.9 | 2.13 | 3.69 | |
| Taking care of your own health? This may include managing daily medications, following a special diet, scheduling doctor’s appointments. | 1.51 | 0 | 0.72 | 5.93 | 3.63 | 1.61 | 3.72 | |
| Visiting friends and family in their homes? (≥75 y) | 2.72 | 0.78 | 0 | 6.01 | 3.71 | 1.75 | 3.82 | |
| Taking care of local errands? This may include managing and taking responsibility for shopping for food and personal items, and going to the bank, library, or dry cleaner. | 3.47 | 0 | −0.2 | 6.09 | 4.3 | 1.92 | 4.10 | |
| Going out with others to public places such as restaurants or movies? | 2.84 | 0.93 | 0 | 6.3 | 3.86 | 2.16 | 4.11 | |
| Preparing meals for yourself? This includes planning, cooking, serving and cleaning up. | 2.83 | 0 | 0.33 | 5.94 | 4.33 | 2.19 | 4.15 | |
| Getting along with other people you live with? | 1.23 | 1.32 | 0 | 6.53 | 4.53 | 2 | 4.35 | |
| Visiting friends and family in their homes? (<75 y) | 3.23 | 1.31 | 0 | 6.71 | 4.38 | 1.99 | 4.36 | |
| The amount of contact you have with family and friends? | 2.38 | 1.65 | 0 | 7.41 | 4.15 | 1.53 | 4.36 | |
| Preparing your own meals with help? | 1.99 | 0 | 0.4 | 6.11 | 4.58 | 2.6 | 4.43 | |
| Getting together with your neighbors? | 2.53 | 1.01 | 0 | 7.27 | 4.37 | 1.71 | 4.45 | |
| Doing the work that is really important to you (include work at home)? | 3.71 | 0 | 2.92 | 9.28 | 4.17 | −0.09 | 4.45 | |
| Going outside the home, for example to shop or visit a doctor's office? | 2.68 | 0 | 0.03 | 7.15 | 4.62 | 2.4 | 4.72 | |
| Taking care of your own personal care needs? This includes bathing, dressing, and toileting. | 2.09 | 0 | 0.55 | 6.95 | 4.53 | 2.74 | 4.74 | |
| Voting in elections? | 1.99 | 0.82 | 0 | 6.45 | 4.7 | 3.45 | 4.87 | |
| Talking to people close to you? | 1.71 | 2.11 | 0 | 8.77 | 5.11 | 3.06 | 5.65 | |
| Having a close friendship? | 1.92 | 2.54 | 0 | 8.79 | 5.41 | 3.01 | 5.74 | |
| Maintaining a friendship? | 2.36 | 3.15 | 0 | 10.3 | 7.13 | 3.93 | 7.12 | |
| Activity limitation | Running 1/2 mile or more? | 1.59 | 0 | 1.71 | −2.8 | −4.6 | −8.58 | −5.33 |
| Hiking a couple of miles on uneven surfaces, including hills? | 2.79 | 0 | 2.75 | −6.9 | −4.3 | −2.13 | −4.43 | |
| Lifting 50 pounds (eg, a large full suitcase) or more? | 1.93 | 0 | 0.6 | −1.6 | −3 | −4.77 | −3.13 | |
| Running a short distance, such as to catch a bus? | 2.2 | 0 | 1.94 | −1.3 | −2.3 | −4.37 | −2.64 | |
| Taking a 1 mile, brisk walk without stopping to rest? | 2.39 | 0 | 2.26 | −0.9 | −2.4 | −4.17 | −2.51 | |
| Going up a flight of stairs outside, without using a handrail? | 3.38 | 0 | 3.18 | 0.06 | −2 | −5.04 | −2.33 | |
| Lifting 25 pounds from the ground to a table? (eg, dog food or a large bag of fertilizer)? (female) | 2.06 | 0 | 0.6 | −0.3 | −1.8 | −4.29 | −2.12 | |
| Carrying something in both arms while climbing a flight of stairs (eg, laundry basket)? | 3.06 | 0 | 2.3 | −0.2 | −1.6 | −4.34 | −2.04 | |
| Standing for one hour? | 2.2 | 0 | 0.9 | −0.1 | −2 | −3.95 | −2.00 | |
| Walking on a slippery surface outdoors? | 1.74 | 0 | 1.4 | 0.87 | −1.8 | −4.84 | −1.93 | |
| Walking up steep unpaved inclines (eg, steep gravel driveway)? | 2.53 | 0 | 1.91 | 1.08 | −1.7 | −4.47 | −1.71 | |
| Getting up from the floor (as if you were laying on the ground)? | 2.62 | 0 | 0.86 | 1.18 | −1.5 | −4.09 | −1.47 | |
| Going up and down 3 flights of stairs inside, using a handrail? | 2.25 | 0 | 1.57 | 0.99 | −1.1 | −3.17 | −1.08 | |
| Opening a stuck window? (female) | 1.51 | 0 | 0.31 | 1.08 | −0.8 | −3.3 | −1.02 | |
| Descending 3–5 steps without a handrail? | 2.99 | 0 | 2.52 | 0.88 | −0.5 | −2.88 | −0.84 | |
| Walking a mile, taking rests as necessary? | 2.72 | 0 | 2.23 | 0.67 | −0.7 | −2.28 | −0.75 | |
| Getting into a squatting position (eg, when gardening)? | 2.31 | 0 | 0.62 | 1.18 | −0.5 | −2.69 | −0.68 | |
| Climbing 3 to 5 steps without a handrail? | 3.29 | 0 | 2.73 | 1.18 | −0.4 | −2.66 | −0.62 | |
| Unscrewing the lid off a previously unopened jar without using any devices? (female) | 1.09 | 0.59 | 0 | 1.78 | −0.2 | −2.99 | −0.46 | |
| Unscrewing the lid off a previously unopened jar without using any devices? (male) | −0 | −0.04 | 0 | 3.78 | 1.46 | −0.59 | 1.55 | |
| Walking several blocks (several hundred yards or lengths of a football field)? | 2.79 | 0 | 2.03 | 1.48 | −0.5 | −2.09 | −0.36 | |
| Cutting your toenails? | 1.5 | 0.41 | 0 | 0.87 | −0.1 | −1.58 | −0.26 | |
| Standing for 20 min (eg, waiting in a line)? | 2.61 | 0 | 0.8 | 1.86 | 0.03 | −2.32 | −0.14 | |
| Fastening a necklace behind your neck? | 1.63 | 0.79 | 0 | 1.78 | 0.19 | −1.98 | 0.00 | |
| Using a step stool to reach into a high cabinet? | 2.67 | 0 | 1.04 | 1.38 | 0.38 | −1.38 | 0.13 | |
| Unloading a car trunk or hatchback (eg, packages or equipment)? | 2.83 | 0 | 0.67 | 1.74 | 0.57 | −1.54 | 0.26 | |
| Lifting 25 pounds from the ground to a table (eg, dog food or a large bag of fertilizer)? (male) | 0.5 | 0 | −0.43 | 1.8 | 0.36 | −1.28 | 0.29 | |
| Walking on an uneven surface (eg, grass, dirt road or sidewalk)? | 2.52 | 0 | 1.48 | 3.12 | 0.28 | −2.53 | 0.29 | |
| Carrying 2 plastic grocery bags with handles at your side for 50 feet? | 3.2 | 0 | 1.47 | 1.75 | 0.7 | −1.37 | 0.36 | |
| Carrying a large object, requiring two hands (eg, tray of food) while walking? | 3 | 0 | 1.01 | 1.6 | 0.76 | −1.27 | 0.36 | |
| Fastening clothing behind your back? | 2.13 | 0.74 | 0 | 2.18 | 0.76 | −1.49 | 0.48 | |
| Crossing the road at a 4-lane traffic light with curbs? | 2.28 | 0 | 1.29 | 2.24 | 0.56 | −1.21 | 0.53 | |
| Opening a heavy, outside door? | 2.29 | 0 | 0.29 | 2.86 | 0.81 | −1.96 | 0.57 | |
| Sitting down in a low, soft couch? | 2.23 | 0 | 0.21 | 2.71 | 0.84 | −1.01 | 0.85 | |
| Stepping on and off a bus? | 2.68 | 0 | 1.07 | 2.93 | 1.14 | −1.51 | 0.85 | |
| Using a computer? | 0.75 | 0.74 | 0 | 1.92 | 1.01 | −0.3 | 0.88 | |
| Pounding a nail in straight with a hammer to hang a picture? | 1.92 | 0 | 0.1 | 2.21 | 1.08 | −0.6 | 0.90 | |
| Opening a stuck window? (male) | 0.58 | 0 | −0.56 | 2.55 | 1.19 | −1.01 | 0.91 | |
| Going up and down a flight of stairs inside, using a handrail? | 2.84 | 0 | 1.55 | 3.24 | 1 | −1.31 | 0.98 | |
| Ripping open a package of snack food (eg, cellophane wrapping on crackers) using only your hands? | 1.33 | 0.62 | 0 | 3.59 | 1.29 | −1.65 | 1.08 | |
| Bending over from a standing position to pick up a piece of clothing from the floor? | 2.61 | 0 | 0.29 | 3.08 | 1.23 | −1.08 | 1.08 | |
| Cleaning up spills on the floor (eg, with a rag)? | 3.13 | −0.5 | 0 | 3.49 | 1.2 | −1.29 | 1.13 | |
| Holding a screw and screwing it tight with a manual screwdriver? | 1.77 | 0 | 0.08 | 2.44 | 1.47 | −0.43 | 1.16 | |
| Walking one block (about 100 yards or the length of one football field)? | 2.64 | 0 | 1.76 | 2.76 | 1.5 | −0.61 | 1.22 | |
| Making a bed, including spreading and tucking in bed sheets? | 2.28 | −0.03 | 0 | 3.23 | 1.39 | −0.79 | 1.28 | |
| Lifting 5 pounds from the ground to a table (for example, a bag of flour or sugar)? | 2.69 | 0 | 0.54 | 2.7 | 1.45 | −0.32 | 1.28 | |
| Cleaning the floor using a broom and dustpan? | 3.21 | −0.57 | 0 | 2.92 | 1.58 | −0.49 | 1.34 | |
| Walking backwards 3 steps? | 2.24 | 0 | 0.82 | 2.54 | 1.64 | −0.1 | 1.36 | |
| Remembering a list of 4 or 5 errands without writing it down? | 0.79 | 0.17 | 0 | 3.7 | 1.51 | −1.01 | 1.40 | |
| Reaching behind your back as if to put a belt through a belt loop? | 2.88 | 0.74 | 0 | 3.21 | 1.75 | −0.54 | 1.47 | |
| Walking quickly indoors to answer the telephone? | 3.18 | 0 | 1.53 | 3.46 | 1.95 | −0.58 | 1.61 | |
| Picking up a kitchen chair and moving it, in order to clean? | 2.99 | 0 | 0.73 | 3.1 | 2.1 | −0.28 | 1.64 | |
| Reaching overhead while standing, as if to pull a light cord? | 1.93 | 0 | 0.48 | 3.17 | 1.81 | 0.07 | 1.68 | |
| Standing up from an armless straight chair (for example, a dining room chair)? | 2.57 | 0 | 0.67 | 3.9 | 2.04 | −0.74 | 1.73 | |
| Biting and chewing on hard foods (eg, a firm apple or celery)? | 1.05 | 0.59 | 0 | 2.95 | 1.92 | 0.38 | 1.75 | |
| How much help from another person do you currently need getting into a tub? | 2.27 | 0 | 0.64 | 2.45 | 1.69 | 1.13 | 1.76 | |
| Using an escalator? | 2.6 | 0 | 1.41 | 2.81 | 2.17 | 0.48 | 1.82 | |
| Standing, while leaning on the sink for 10 minutes? | 2.26 | 0 | 0.4 | 3.66 | 2.03 | −0.07 | 1.87 | |
| Stepping up and down from a curb? | 2.77 | 0 | 1.27 | 4.37 | 2.23 | −0.79 | 1.94 | |
| Turning over in bed (including adjusting bedclothes, sheets and blankets)? | 2.31 | 0 | 0.03 | 4.36 | 2 | −0.18 | 2.06 | |
| Getting into and out of a car/taxi (sedan)? | 2.64 | 0 | 0.54 | 5.16 | 2.12 | −1 | 2.09 | |
| Pouring from a large pitcher? | 2.02 | 0.52 | 0 | 4.47 | 2.32 | −0.21 | 2.19 | |
| Tying shoes? | 3.18 | 0.42 | 0 | 3.6 | 2.46 | 0.52 | 2.19 | |
| Walking around inside a building (50 ft.) on the same level (for example, hospital hallway, around a doctor’s office or supermarket)? | 3.17 | 0 | 1.18 | 3.94 | 2.64 | 0.49 | 2.36 | |
| Searching a crowded grocery shelf to find the brand of cereal you prefer? | 1.21 | 0.27 | 0 | 4.14 | 2.68 | 0.28 | 2.37 | |
| Putting on socks? | 2.49 | 0.27 | 0 | 3.99 | 2.58 | 0.54 | 2.37 | |
| Washing dishes, pots, and utensils by hand while standing at sink? | 2.96 | −0.19 | 0 | 4.01 | 2.71 | 0.47 | 2.40 | |
| How much help from another person do you currently need getting into a tub using a tub seat? | 1.98 | 0 | 0.52 | 3.25 | 2.53 | 1.45 | 2.41 | |
| Operating an ATM (automatic teller) to get cash or make deposits? | 0.9 | 0.65 | 0 | 3.35 | 2.64 | 1.28 | 2.42 | |
| Reaching behind you to get your seatbelt? | 2.26 | 0 | 0.03 | 4.8 | 2.58 | 0.18 | 2.52 | |
| How much help from another person do you currently need climbing a full flight of stairs with a railing? | 3.04 | 0 | 1.73 | 3.72 | 2.68 | 1.63 | 2.68 | |
| Reaching into the back pocket of a pair of pants? (<75 y) | 0.48 | 0 | −0.48 | 4 | 2.64 | 1.42 | 2.69 | |
| Looking up a phone number or address in the phone book or in your own address book? | 1.07 | 0.39 | 0 | 4.24 | 2.81 | 1.06 | 2.70 | |
| Putting on a (button down) shirt or a blouse? (<75 y) | 0 | 0.13 | 0 | 4.47 | 2.45 | 1.2 | 2.71 | |
| Washing and rinsing your hair? | 2.57 | 0.71 | 0 | 3.61 | 2.85 | 1.7 | 2.72 | |
| Washing your lower body while giving yourself a sponge bath? (male) | −0.1 | −0.25 | 0 | 4.08 | 2.68 | 1.43 | 2.73 | |
| Sitting down in an armless straight chair (eg, a dining room chair)? | 2.8 | 0 | 0.32 | 4.59 | 2.93 | 0.86 | 2.79 | |
| Using a washer and dryer, including loading clothes, and setting the dials? | 2.21 | 0.43 | 0 | 3.7 | 3.06 | 1.78 | 2.85 | |
| Chopping or slicing vegetables (eg, onions or peppers)? | 2.24 | 0.78 | 0 | 4.44 | 3.26 | 1.05 | 2.92 | |
| Keeping track of time (eg, using a clock)? | 1.08 | 0.33 | 0 | 4.1 | 1.75 | 2.93 | ||
| Styling your hair? | 2.19 | 1.43 | 0 | 4.42 | 2.92 | 1.47 | 2.94 | |
| Carrying a small object in one hand (eg, something fragile or a glass of water) while walking indoors? | 2.98 | 0 | 0.88 | 4.16 | 3.52 | 1.24 | 2.97 | |
| Cleaning kitchen surfaces thoroughly (eg, stove, sink, or counter tops)? | 2.22 | 0.04 | 0 | 5.13 | 3 | 0.99 | 3.04 | |
| Moving from lying on your back to sitting on the side of the bed? | 2.82 | 0 | 0.01 | 6.45 | 3.04 | −0.32 | 3.06 | |
| Walking around one floor of your home, taking into consideration doors, furniture, and a variety of floor coverings. | 2.73 | 0 | 0.71 | 4.37 | 3.64 | 1.29 | 3.10 | |
| Using a microwave to heat up foods? (male) | 1.28 | 0.78 | 0 | 4.16 | 3.73 | 1.55 | 3.15 | |
| Standing for at least one minute? | 2.83 | 0 | 0.64 | 4.7 | 3.86 | 1.47 | 3.34 | |
| Holding a book while reading? | 1.93 | 0 | −0.32 | 5.03 | 3.79 | 1.33 | 3.38 | |
| Trimming and filing your fingernails? | 2.01 | 1.58 | 0 | 4.69 | 3.67 | 1.99 | 3.45 | |
| Opening car doors? | 2.88 | 0 | −0.1 | 4.81 | 4.08 | 1.58 | 3.49 | |
| Holding a full glass of water in one hand? | 1.95 | 0 | −0.21 | 5.04 | 3.8 | 1.75 | 3.53 | |
| Zipping a jacket? | 1.81 | 1.11 | 0 | 5.38 | 3.89 | 1.44 | 3.57 | |
| Reaching into the back pocket of a pair of pants? (≥75 y) | 2.58 | 0 | −0.15 | 5.45 | 3.6 | 1.73 | 3.59 | |
| Holding a plate full of food? | 2.51 | 0.89 | 0 | 6.03 | 3.72 | 1.21 | 3.65 | |
| Using common utensils for preparing meals (for example, can opener, potato peeler, or sharp knife)? | 2.48 | 1.43 | 0 | 5.68 | 3.86 | 1.49 | 3.68 | |
| Putting on a (button down) shirt or a blouse? (≥75 years) | 2.02 | 0.93 | 0 | 6.05 | 3.79 | 1.62 | 3.82 | |
| How much help from another person do you currently need stepping into a shower? | 3.05 | 0 | 1.07 | 5.27 | 3.68 | 2.55 | 3.83 | |
| Doing zippers, snaps, or hooks on pants? | 3.11 | 1.36 | 0 | 6.04 | 4.23 | 1.54 | 3.94 | |
| Putting on long pants (including managing fasteners)? | 3.06 | 0.48 | 0 | 6.64 | 3.88 | 1.36 | 3.96 | |
| Putting on makeup accurately (for example, lipstick, foundation, eyeliner)? | 1.63 | 0.74 | 0 | 6.09 | 4.19 | 1.75 | 4.01 | |
| Putting on a coat or jacket? | 3.05 | 1.02 | 0 | 7.9 | 4.7 | 1.56 | 4.72 | |
| Putting on a pullover shirt? | 2.55 | 1.23 | 0 | 7.66 | 4.61 | 1.92 | 4.73 | |
| Washing your lower body while giving yourself a sponge bath? (female) | 2.64 | 0.65 | 0 | 7.3 | 5.03 | 2.23 | 4.85 | |
| Applying spreads to bread using a knife? | 2.39 | 1.15 | 0 | 6.65 | 5.15 | 3.01 | 4.94 | |
| How much help from another person do you currently need moving from a bed to a chair (including a wheelchair)? | 3.13 | 0 | 0.35 | 6.42 | 3.54 | 4.98 | ||
| Washing your upper body while giving yourself a sponge bath? | 3 | 0.57 | 0 | 7.42 | 5.12 | 2.44 | 4.99 | |
| Using a microwave to heat up foods? (female) | 2.28 | 1.23 | 0 | 6.55 | 3.89 | 5.22 | ||
| Preparing the toothbrush and brushing teeth? | 2.27 | 1.21 | 0 | 7.23 | 3.78 | 5.51 | ||
| Cutting your own food (such as meat, fruit, etc.)? | 2.98 | 2.24 | 0 | 8.1 | 5.86 | 3.54 | 5.83 | |
| Combing and parting hair? | 3.24 | 1.64 | 0 | 7.97 | 6.51 | 3.42 | 5.97 | |
| Using a spoon to eat a meal? | 2.15 | 1.93 | 0 | 7.55 | 4.54 | 6.05 | ||
| Using a fork to eat a meal? | 2.25 | 1.85 | 0 | 8.24 | 7.05 | 4.28 | 6.52 | |
| How much help from another person do you currently need managing toileting aftercare, including wiping yourself and putting clothes back on? | 4.39 | 0 | 0.18 | 9.38 | 6.46 | 4.39 | 6.74 | |
*Difficulty, or location, parameters were estimated using a multidimensional logistic graded response model, which estimates both the discrimination parameters and the ordered location parameters for each item. Marginal maximum likelihood estimation was used to estimate the item parameters. For the bifactor model in this study, there are three discrimination parameters for each item: one for the general factor (a1) and one for each of the two sub-factors (a2 and a3).
†Difficulty parameters were estimated by calculating the mean of the ordered location parameters (c1, c2, and c3) of an item.
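The two footnotes can be illustrated with a short sketch. It assumes the common slope-intercept form of the multidimensional graded response model, in which the probability of responding in category k or higher is logistic(a·θ + c_k). The source does not state the exact parameterization or the scoring direction, so that form, the function names, and the example θ values are illustrative assumptions. The difficulty calculation follows the second footnote directly and reproduces the tabled value for one item.

```python
import math

def cumulative_probs(a, c, theta):
    """P(response >= k) for k = 1..len(c), ASSUMING logistic(a . theta + c_k)."""
    z = sum(ai * ti for ai, ti in zip(a, theta))
    return [1.0 / (1.0 + math.exp(-(z + ck))) for ck in c]

def category_probs(a, c, theta):
    """Probabilities of the len(c) + 1 ordered response categories."""
    cum = [1.0] + cumulative_probs(a, c, theta) + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def difficulty(c):
    """Item difficulty as the mean of the ordered location parameters."""
    return sum(c) / len(c)

# Parameters copied from the table row "Using a fork to eat a meal?"
a = (2.25, 1.85, 0.0)    # a1 (general factor), a2, a3 (sub-factors)
c = (8.24, 7.05, 4.28)   # c1, c2, c3

theta = (0.0, 0.0, 0.0)  # hypothetical respondent at the mean of every factor
probs = category_probs(a, c, theta)

print([round(p, 4) for p in probs])
print(round(difficulty(c), 2))  # 6.52, matching the tabled difficulty
```

Because the location parameters are listed in descending order, the cumulative probabilities decrease with k, so every category probability is non-negative and the set sums to 1.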
Breadth of coverage for the Activity Limitation/Participation Restriction item bank is illustrated in Figure 1 by plotting the general factor score distribution of the calibration sample opposite that of the item bank.
Figure 1. Breadth of coverage as represented by the general factor score distribution for the sample and the item banks. Activity Limitation/Participation Restriction score distribution for the calibration sample is presented on the left side of the plot. The score distribution for the item bank is presented on the right, calculated by taking the average of the expected values at the sub-factor level and mapping the average expected value onto the general factor score level. (a) Activity Limitation Scale. (b) Participation Restriction Scale.
Simulation Studies
Activity Limitation Scale.—
In terms of accuracy, scores from the 10-item CAT demonstrated a correlation of r = .90 with the total item bank, a bias of −0.30, and an RMSE of 3.26. The 10-item CAT scores achieved a correlation of r = .87 with the original 32-item fixed-form LLFDI Function Scale. In regard to precision, the Activity Limitation score standard errors were smaller in the middle of the scale than at the extremes of the difficulty continuum (Figure 2a). The 10-item Activity Limitation CAT scores were associated with standard errors less than 4 in the score range of 15–60. The standard errors of the Basic Mobility & Handling and Daily Activities subscale scores were less than 4 in the score range of 20–50. Score precision of the 10-item CAT was comparable or superior to the precision of the 32 items from the original fixed-form LLFDI Function Scale. Conditional reliability estimates were acceptable, with 66% of the estimates demonstrating r = .90 or greater.
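The three accuracy statistics reported here can be sketched as follows. This is a minimal illustration, assuming that bias is the mean of (CAT score − full-bank score) and that RMSE is the root mean squared difference between the two scores; the source does not give the formulas, and the scores below are hypothetical values, not study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def bias(cat, full):
    """Mean signed difference, ASSUMED here to be CAT minus full bank."""
    return sum(c - f for c, f in zip(cat, full)) / len(cat)

def rmse(cat, full):
    """Root mean squared difference between CAT and full-bank scores."""
    return math.sqrt(sum((c - f) ** 2 for c, f in zip(cat, full)) / len(cat))

# Hypothetical scores on a mean-50, SD-10 metric (not study data)
full_bank = [35.0, 42.0, 50.0, 58.0, 66.0]
cat_10 = [37.0, 40.0, 49.0, 57.0, 63.0]

print(round(pearson_r(cat_10, full_bank), 3))
print(bias(cat_10, full_bank))            # -1.0
print(round(rmse(cat_10, full_bank), 3))  # 1.949
```

A negative bias under this convention would mean that the short CAT scores run slightly below the full-bank scores on average.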
Figure 2. Precision of two 10-item CATs compared with the fixed-form LLFDI and the full item banks. Precision is represented by the standard error of the scores; mean = 50, SD = 10. (a) Activity Limitation. (b) Participation Restriction.
Participation Restriction Scale.—
In terms of accuracy, the 10-item Participation Restriction CAT demonstrated a correlation of r ≥ .95 with the item bank, a bias of −0.20, and an RMSE of 2.9; its correlation with the original 16 items from the fixed-form LLFDI Disability scale was r = .93. Precision in the middle of the Participation Restriction scale was better than at the extremes, as indicated by standard errors less than 4 across the 20–55 score range and less than 4 across the 20–50 score range for the subscale scores (Figure 2b). Precision of the 10-item Participation Restriction scale was comparable to that of the 16 items from the original fixed-form LLFDI. Conditional reliability estimates were acceptable, with 60% of the estimates larger than r = .90.
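The link between the standard-error thresholds and the conditional reliability estimates can be made concrete with a small sketch. It assumes the conventional relation reliability = 1 − SE²/SD² on the score metric used in Figure 2 (SD = 10). The source does not state how conditional reliability was computed, so this formula and the function names are assumptions.

```python
import math

SCALE_SD = 10.0  # score metric from the Figure 2 caption: mean = 50, SD = 10

def conditional_reliability(se, sd=SCALE_SD):
    """Reliability implied by a standard error, ASSUMING 1 - SE^2 / SD^2."""
    return 1.0 - (se / sd) ** 2

def se_for_reliability(reliability, sd=SCALE_SD):
    """Largest standard error compatible with a target reliability."""
    return sd * math.sqrt(1.0 - reliability)

print(round(conditional_reliability(4.0), 2))  # 0.84
print(round(se_for_reliability(0.90), 2))      # 3.16
```

Under this assumption, a standard error of 4 corresponds to a reliability of about .84, and a reliability of .90 requires a standard error of roughly 3.16 or less.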
Differential Item Functioning
In the Activity Limitation Scale, seven items demonstrated DIF, five related to gender, and two related to age. Examples of those related to gender included “lifting 25 lbs … ” and “unscrewing the lid off a previously unopened jar without using any devices.” Examples related to age included “reaching into a back pocket … ” and “putting on a button down shirt/blouse.” In the Participation Restriction Scale, one item demonstrated DIF by gender (“going to a sport, social, or other club”) and two items by age (“visiting friends and family in their homes” and “doing your job the way you want to”). These DIF items were calibrated separately for different gender or age groups, and therefore, item parameters are provided by group (Appendix 2).
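Separate calibration by group means that scoring must pick the parameter set that matches the respondent. The sketch below shows one way such a lookup could be organized; the data structure and the `params_for` function are hypothetical and are not taken from the LLFDI-CAT software. The parameter values are copied from the item parameter table, and the age split at 75 years follows the table's item labels.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class ItemParams:
    a: Tuple[float, float, float]  # discriminations a1, a2, a3
    c: Tuple[float, float, float]  # ordered locations c1, c2, c3

# Each entry: (attribute that shows DIF, or None) and parameters per group.
ITEM_BANK = {
    "Opening a stuck window?": ("gender", {
        "female": ItemParams((1.51, 0.0, 0.31), (1.08, -0.8, -3.3)),
        "male": ItemParams((0.58, 0.0, -0.56), (2.55, 1.19, -1.01)),
    }),
    "Reaching into the back pocket of a pair of pants?": ("age", {
        "<75": ItemParams((0.48, 0.0, -0.48), (4.0, 2.64, 1.42)),
        ">=75": ItemParams((2.58, 0.0, -0.15), (5.45, 3.6, 1.73)),
    }),
    "Tying shoes?": (None, {
        "all": ItemParams((3.18, 0.42, 0.0), (3.6, 2.46, 0.52)),
    }),
}

def params_for(item: str, gender: str, age: int) -> ItemParams:
    """Return the calibration that applies to this respondent."""
    dif, by_group = ITEM_BANK[item]
    if dif == "gender":
        return by_group[gender]
    if dif == "age":
        return by_group["<75" if age < 75 else ">=75"]
    return by_group["all"]

print(params_for("Opening a stuck window?", gender="male", age=80))
print(params_for("Reaching into the back pocket of a pair of pants?",
                 gender="female", age=80))
```

Keeping the DIF items with group-specific parameters preserves content coverage while letting the same response contribute differently to the score for each group.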
Validity and Reliability Testing
Test–retest reliability estimates were r = .85 for the Activity Limitation Scale and r = .80 for the Participation Restriction scale. The strength and pattern of correlations of the new LLFDI-CAT scales with VR-36 subscales were moderate to strong as hypothesized (Table 2). The new LLFDI-CAT scales demonstrated strong to moderate correlations with the Physical Component Summary Score of the VR-36 (r = .73 for Activity Limitation and r = .58 for Participation Restriction) and weak correlations with the Mental Component Summary Score (r = .01 and r = .07, respectively). We excluded one participant's time-to-complete data because of an extended interruption to the interview. Mean time to complete the two 10-item CATs was 7 minutes and 12 seconds compared with 13 minutes and 32 seconds for the VR-36. Responses to acceptability questions ranged from 90% to 98% positive for amount of time required; understandability; importance of questions relative to function and disability; giving a good picture of ability to take part in daily activities; and willingness to answer in the future. One person (2%) found questions upsetting.
Table 2.
Pearson Correlation Coefficients for LLFDI 10-Item CATs and VR-36 Subscales (n = 102)
| VR-36 subscale | Activity Limitation: General | Activity Limitation: Basic Mobility & Handling | Activity Limitation: Daily Activities | Participation Restriction: General | Participation Restriction: Social Roles | Participation Restriction: Instrumental Roles |
| PF | 0.81 | 0.52 | 0.62 | 0.65 | 0.53 | 0.53 |
| RP | 0.46 | 0.42 | 0.43 | 0.64 | 0.55 | 0.51 |
| RE | 0.11 | 0.23 | 0.11 | 0.21 | 0.27 | 0.21 |
| SF | 0.24 | 0.18 | 0.13 | 0.57 | 0.44 | 0.45 |
| VT | 0.38 | 0.18 | 0.22 | 0.29 | 0.30 | 0.38 |
| BP | 0.37 | 0.23 | 0.39 | 0.32 | 0.23 | 0.18 |
| MH | 0.21 | 0.27 | 0.17 | 0.37 | 0.39 | 0.33 |
| GH | 0.43 | 0.28 | 0.31 | 0.31 | 0.14 | 0.30 |
Note: PF = Physical Functioning; RP = Role Physical; RE = Role Emotional; SF = Social Functioning; VT = Vitality; BP = Bodily Pain; MH = Mental Health; GH = General Health.
Floor and Ceiling
Overall in the validation study, 40 of the 55 items in the Participation Restriction Scale and 74 of the 141 items in the Activity Limitation Scale were administered. There were no subjects at the floor in the calibration or validation study samples for the new LLFDI-CATs. In the calibration study sample using the item pool, 0.2% of subjects were at the ceiling for the Activity Limitation scale and 3% for the Participation Restriction scale. No subjects were found at the ceiling on either CAT scale in the validation study sample. For the VR-36 subscales, 4.2% of the validation sample was at the floor in Physical Functioning. The percentages at the ceiling, by subscale, were Physical Functioning: 2.5%; Role Physical: 11.7%; Social Functioning: 29.2%; Bodily Pain: 12.5%; Mental Health: 7.5%; and General Health: 4.2%.
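Floor and ceiling percentages of this kind can be computed as in the sketch below. It assumes that "floor" and "ceiling" mean scoring at the lowest and highest attainable score of a scale; the source does not spell out its definition, and the scores and the 0–100 range are hypothetical.

```python
def percent_at_extremes(scores, minimum, maximum):
    """Return (% at floor, % at ceiling) for a list of scale scores.

    ASSUMES floor/ceiling = scoring at the lowest/highest attainable score.
    """
    n = len(scores)
    at_floor = sum(1 for s in scores if s <= minimum)
    at_ceiling = sum(1 for s in scores if s >= maximum)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n

# Hypothetical subscale scores on a 0-100 range (not study data)
scores = [0, 25, 50, 75, 100, 100, 60, 40]
floor_pct, ceiling_pct = percent_at_extremes(scores, minimum=0, maximum=100)

print(floor_pct, ceiling_pct)  # 12.5 25.0
```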
DISCUSSION
The aim of this project was to develop and test a comprehensive CAT version of the LLFDI by building and calibrating expanded item banks and to examine whether the CATs sacrificed or improved upon psychometric quality compared with the original fixed-form LLFDI. Simulation studies indicated that 10-item LLFDI-CATs based on the expanded 141-item Activity Limitation Scale and 55-item Participation Restriction Scale provided promising content breadth, accuracy, precision, and reliability, with no floor effects and minimal to no ceiling effects in a sample of 520 community-dwelling older adults. In the validation sample, each 10-item CAT took, on average, 3.56 minutes to administer, with no loss of measurement precision compared with the original fixed-form instrument, which took 20–30 minutes to administer (18). Additionally, using the calibration study data to compare reliability of the new 10-item CATs with the items from the original LLFDI scales, the 10-item CATs had comparable or better reliability across the functional continuum. The 10-item CATs demonstrated strong test–retest reliability compared with results from earlier research with the original fixed forms. The test–retest ICC for the 10-item Activity Limitation CAT in this study was r = .85, compared with r = .96 for the 32-item Function Scale in prior research (15). The 10-item Participation Restriction CAT demonstrated a test–retest ICC of r = .80 compared with r = .82 for the 16-item Limitation subscale of the Disability Scale (16). This indicates that the reliability of the CATs remains high relative to the fixed forms. Field testing of the new LLFDI-CATs revealed that they were acceptable to older adults and demonstrated good test–retest reliability and concurrent validity compared with the VR-36.
The new LLFDI-CAT meets several criteria suggested for assessing disability outcomes in older adults, including a conceptual basis (ICF), brief administration time (3.5 min/scale), strong psychometric quality, and high acceptability for older respondents (40). CAT instruments have the potential to allow researchers and clinicians who work with older adults to achieve highly efficient patient-reported disability assessment without loss of measurement accuracy, precision, or reliability. Comparing responses to the full item banks with simulated CAT scores provided a unique opportunity to consider the most fundamental performance characteristics of the 10-item CATs, and CATs have been shown to offer strong measurement properties combined with a conceptual basis for understanding the disablement process. Earlier work to develop a prototype CAT for the original LLFDI demonstrated promising precision; however, when compared with the results from this study, the real-data simulation results demonstrated lower standard errors across much of the score range (18).
In the original version of the LLFDI, there were two dimensions of Participation Restriction represented: limitation and frequency. In the new LLFDI-CAT, we limited the Participation Restriction Scale to perceived limitation, in large part based on our conceptual definition of participation from the individual’s perspective, combined with feedback from the field that the perceived limitation was most critical to the measurement of participation restriction. Hammel and colleagues (41), for example, addressed the difficulty in using set norms or frequency of performance as a measure of participation in their investigation of the meaning of participation for persons who self-identified with disabilities. Although Hammel’s study supported a much broader concept of participation than is represented by the new LLFDI-CAT, it provided important insight into the issues involved in measuring participation from various perspectives and supports the decision to focus on limitation in participation from the perspective of the respondent rather than to measure frequency of participation. This decision was also partly due to practical considerations involved in determining how large an item pool could be administered to the calibration sample. Because calibration could involve administration of each item in the item bank to each participant, focusing on the limitation dimension allowed us to expand the range of content in the Activity Limitation and Participation Restriction domains.
Strengths of this project include using a conceptual framework to guide instrument development, eliciting input from stakeholders at every step of the development process, and the range of physical functioning represented by the samples. Limitations include underrepresentation of ethnic and racial minorities. Although real-data simulations may overestimate the agreement between CATs and the item banks, they provide a reasonable estimation of the performance of the prototype CATs. One challenge in developing large item banks is to include unique items that cover the full range of abilities and represent relevant content without violating assumptions of the measurement model. We removed items that demonstrated local dependence from the item banks. However, some items in our banks demonstrated DIF due to gender or age, meaning that responses were related to gender or age as well as difficulty. One approach to this problem is to remove the items that demonstrate DIF. We chose to keep the items in the instrument to preserve the coverage of content and calibrate the items separately by group.
CONCLUSIONS
This study reveals that CAT methodology can be applied successfully to assess patient-reported disability by older adults, reducing the time required for administration without loss of accuracy or precision while maintaining acceptable levels of reliability and validity. Although further work is needed to assess whether the markedly expanded item banks confer superior ability to measure change, these results suggest that the CAT approach offers a viable solution to the long-standing conflict between the need for accuracy in outcome assessment and the equal need for practicality of administration.
ACCESSING THE LLFDI-CAT
Access to the LLFDI-CAT for Windows can be obtained at the Health & Disability Research Institute: http://sph.bu.edu/HDRI/outcome-measures/menu-id-617525.html, or at http://iTunes.apple.com/us/app/latelife-cat-for-iPad/id496103142, or by searching for “latelife” on the Apple iPad App Store.
FUNDING
This work was supported by the National Institute on Aging (grant numbers 5R42AG027620-03 and 1P30AG031679), the Eunice Kennedy Shriver National Institute of Child Health and Human Development (grant number 5F32HD056763), and a fellowship from the Foundation for Physical Therapy, the New Investigator Fellowship Training Initiative, to Dr. C.M.M.
CONFLICT OF INTEREST
Dr. A.M.J. and Mr. R.M. hold stock in CRECare, a small business they created to disseminate and provide technical support for computer adaptive outcome instrument users.
References
- 1. Institute of Medicine. The Future of Disability in America. Washington, DC: The National Academies Press; 2007.
- 2. Nagi SZ. An epidemiology of disability among adults in the United States. Milbank Mem Fund Q Health Soc. 1976;54:439–467.
- 3. World Health Organization. International Classification of Functioning, Disability and Health (ICF). Geneva, Switzerland: World Health Organization; 2001.
- 4. Guralnik JM, Ferrucci L. Assessing the building blocks of function: utilizing measures of functional limitation. Am J Prev Med. 2003;25(3 suppl 2):112–121. doi: 10.1016/s0749-3797(03)00174-0.
- 5. Haywood KL, Garratt AM, Fitzgerald R. Older people specific health status and quality of life: a structured review of self-assessed instruments. J Eval Clin Pract. 2005;11(4):315–327. doi: 10.1111/j.1365-2753.2005.00538.x.
- 6. Ware JE Jr. Conceptualization and measurement of health-related quality of life: comments on an evolving field. Arch Phys Med Rehabil. 2003;84(suppl 2):S43–S51. doi: 10.1053/apmr.2003.50246.
- 7. Jette AM, Haley SM. Longitudinal outcome monitoring across post-acute care (PAC) settings. Uniform Patient Assessment for Post-acute Care Final Report. Aurora, CO: Division of Health Care Policy and Research, University of Colorado at Denver and Health Sciences Center; 2006. pp. 100–120.
- 8. Jette AM, Haley SM. Contemporary measurement techniques for rehabilitation outcomes assessment. J Rehabil Med. 2005;37:339–345. doi: 10.1080/16501970500302793.
- 9. Haley SM, Coster WJ, Andres PL, Kosinski M, Ni P. Score comparability of short forms and computerized adaptive testing: Simulation study with the activity measure for post-acute care. Arch Phys Med Rehabil. 2004 Apr;85(4):661–666. doi: 10.1016/j.apmr.2003.08.097.
- 10. Rubenach S, Shadbolt B, McCallum J. Assessing health-related quality of life following myocardial infarction: Is the SF-12 useful? J Clin Epidemiol. 2002;55:306–309. doi: 10.1016/s0895-4356(01)00426-7.
- 11. Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Med Care. 2007 May;45(5 suppl 1):S3–S11. doi: 10.1097/01.mlr.0000258615.42478.55.
- 12. Hambleton RK. Applications of Item Response Theory to improve health outcomes assessment: developing item banks, linking Instruments, and computer-adaptive testing. In: Lipscomb J, Gotay CC, Snyder C, editors. Outcomes Assessment in Cancer. Cambridge, UK: Cambridge University Press; 2005.
- 13. Hays R, Morales L, Reise S. Item response theory and health outcomes measurement in the 21st century. Med Care. 2000;38(II):28–42. doi: 10.1097/00005650-200009002-00007.
- 14. Haley SM, Siebens H, Coster WJ, et al. Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes. Arch Phys Med Rehabil. 2006 Aug;87(8):1033–1042. doi: 10.1016/j.apmr.2006.04.020.
- 15. Haley SM, Jette AM, Coster WJ, et al. Late Life Function and Disability Instrument: II. Development and evaluation of the function component. J Gerontol A Med Sci. 2002 Apr;57(4):M217–M222. doi: 10.1093/gerona/57.4.m217.
- 16. Jette AM, Haley SM, Coster WJ, et al. Late life function and disability instrument: I. Development and evaluation of the disability component. J Gerontol A Med Sci. 2002 Apr;57(4):M209–M216. doi: 10.1093/gerona/57.4.m209.
- 17. Sayers SP, Jette AM, Haley SM, Heeren TC, Guralnik JM, Fielding RA. Validation of the Late-Life Function and Disability Instrument. J Am Geriatr Soc. 2004 Sep;52(9):1554–1559. doi: 10.1111/j.1532-5415.2004.52422.x.
- 18. Jette AM, Haley SM, Ni P, Olarsch S, Moed R. Creating a computer adaptive test version of the late-life function and disability instrument. J Gerontol A Biol Sci Med Sci. 2008 Nov;63(11):1246–1256. doi: 10.1093/gerona/63.11.1246.
- 19. Jette AM. Toward a common language of disablement. J Gerontol A Biol Sci Med Sci. 2009 Nov;64(11):1165–1168. doi: 10.1093/gerona/glp093.
- 20. Kazis LE, Miller DR, Clark JA, et al. Improving the response choices on the Veterans SF-36 health survey role functioning scales: results from the Veterans Health Study. J Ambul Care Manage. 2004 Jul–Sep;27(3):263–280. doi: 10.1097/00004479-200407000-00010.
- 21. Kazis LE, Miller DR, Skinner KM, et al. Patient-reported measures of health: The Veterans Health Study. J Ambul Care Manage. 2004 Jan–Mar;27(1):70–83. doi: 10.1097/00004479-200401000-00012.
- 22. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30(6):473–483.
- 23. Muthen B, Muthen L. Mplus Statistical Analysis with Latent Variables User's Guide. Los Angeles, CA: Muthén & Muthén; 2007.
- 24. Kass R, Wasserman L. A reference Bayesian test for nested hypotheses and its relationship to the Schwartz criterion. J Am Stat Assoc. 1995;90:928–934.
- 25. Anderson JC, Gerbing DW. The effect of sampling error on convergence, improper solutions, and goodness-of-fit indexes for maximum-likelihood confirmatory factor-analysis. Psychometrika. 1984;49:155–173.
- 26. Bentler PM. On the fit of models to covariance and methodology. Psychol Bull. 1992;112:400–404. doi: 10.1037/0033-2909.112.3.400.
- 27. Marsh HW, Balla JR, McDonald RP. Goodness of fit indexes in confirmatory factor analysis: the effect of sample size. Psychol Bull. 1988;103:391–410.
- 28. Browne MW, Cudeck R. Alternative ways of assessing model fit. In: Bollen KA, Long JS, editors. Testing Structural Equation Models. Thousand Oaks, CA: Sage; 1993. pp. 136–162.
- 29. Hu LT, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equat Model. 1999;6(1):1–55.
- 30. Reeve BB, Hays RD, Bjorner JB. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007 May;45(5 suppl 1):S22–S31. doi: 10.1097/01.mlr.0000250483.85507.04.
- 31. Mislevy RJ. Bayes modal estimation in item response models. Psychometrika. 1986;51:177–195.
- 32. Segall DO. General ability measurement: an application of multidimensional item response theory. Psychometrika. 2001;66:79–97.
- 33. Drasgow F, Levine M, Williams E. Appropriateness measurement with polytomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology. 1985;38:67–86.
- 34. Ackerman T, Hombo C, Neustel S, editors. Evaluating indices used to assess the goodness-of-fit of the compensatory multidimensional item response theory model. Poster presented at the Annual Meeting of the National Council on Measurement in Education; 2002; New Orleans, LA.
- 35. Thissen D, editor. The MEDPRO project: An SBIR project for a comprehensive IRT and CAT software system—IRT software. Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing; Minneapolis, MN, 2009.
- 36. Crane P, Gibbons L, Ocepek-Welikson K, et al. A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression. Qual Life Res. 2007;16(0):69–84. doi: 10.1007/s11136-007-9185-5.
- 37. Jodoin MG, Gierl MJ. Evaluating Type I error and power rates using an effect size measure with the Logistic Regression Procedure for DIF detection. Appl Measure Educat. 2001;14(4):329–349.
- 38. Fleiss JL. Statistical Methods for Rates and Proportions. New York: John Wiley & Sons; 1981.
- 39. McDowell I. Measuring Health: A Guide to Rating Scales and Questionnaires. 3rd ed. New York: Oxford University Press; 2006.
- 40. Andresen EM. Criteria for assessing the tools of disability outcomes research. Arch Phys Med Rehabil. 2000 Dec;81(12 suppl 2):S15–S20. doi: 10.1053/apmr.2000.20619.
- 41. Hammel J, Magasi S, Heinemann A, Whiteneck G, Bogner J, Rodriguez E. What does participation mean? An insider perspective from people with disabilities. Disabil Rehabil. 2008;30(19):1445–1460. doi: 10.1080/09638280701625534.
