Abstract
Objective:
To assess psychometric properties of the improved Work Disability Functional Assessment Battery (WD-FAB 2.0).
Design:
Longitudinal study
Setting:
Community
Participants
Three samples of working-age (21–66) adults: (1) unable to work because of a physical condition (n=375); (2) unable to work because of a mental health condition (n=296); (3) general US working age sample (n=335)
Intervention
NA
Main Outcome Measures
All samples completed the WD-FAB 2.0; the second administration came 5 days after the first. Construct validity was examined by convergent and divergent correlational analysis using legacy measures. Test-retest reliability was assessed by intraclass correlation coefficients (ICC3,1). Standard error of measurement (SEM) and minimal detectable change (MDC90) were calculated to measure scale precision and sensitivity.
Results
Physical function ICCs ranged from 0.69 to 0.77 in the general sample, and 0.66 to 0.86 in the disability sample. Mental health function scales ICCs ranged from 0.62 to 0.73 in the general sample, and 0.74 to 0.76 in the disability sample. SEMs for all scales indicated good discrimination; those for the physical function scales were generally lower than those for the mental health scales. MDC90 values ranged from 3.41 to 10.55. Correlations between all WD-FAB 2.0 scales and legacy measures were in the expected direction.
Conclusions
The study provides substantial support for the reliability and construct validity of the WD-FAB 2.0 among three diverse samples. Although the WD-FAB 2.0 was initially developed for use within the Social Security Administration, these results suggest that it could be used for assessment and measurement of work-related physical and mental health function in other contexts as well.
Keywords: Disability Evaluation, Outcomes Assessment, Rehabilitation
Introduction/Background
The Boston University Health and Disability Research Institute (BU-HDR) and the National Institutes of Health’s Rehabilitation Medicine Department (NIH-RMD) collaborated with the United States (U.S.) Social Security Administration (SSA) on the development of a new instrument that measures work-relevant physical and mental health functioning using modern measurement methodology guided by a contemporary framework of disability defined by the WHO’s International Classification of Functioning, Disability and Health (ICF).1 That development process included an extensive literature review, the identification of hypothesized key dimensions of work-related functioning, the development of comprehensive lists of questionnaire items (called ‘item pools’) within each functional domain with input from a panel of experts, and the refinement of those items through cognitive interviewing with potential respondents.
We administered 289 items to samples of SSA claimants and working age adults from the general U.S. population, and identified discrete functional dimensions underlying these item pools. Item response theory (IRT) analyses were used to calibrate items along a continuum of functional difficulty within each domain, thus creating a set of unidimensional scales called the Work Disability Functional Assessment Battery (WD-FAB). Computer-adaptive testing (CAT) methods were used to administer WD-FAB items, which allowed scale scores to be estimated from responses to a small subset of questions from the large item pools. CAT uses algorithms that can tailor item selection in real time, presenting the item that will provide the most information at the current estimated score, thus avoiding the need to administer all the items in the WD-FAB.2
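The item-selection idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a dichotomous two-parameter logistic (2PL) model, whereas the WD-FAB items have multiple response categories, and the item bank, its parameters, and the function names are hypothetical rather than taken from the WD-FAB.

```python
import math

def prob_2pl(theta, a, b):
    """Probability of endorsing an item under a 2PL model (assumed here for illustration)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at score theta: a^2 * p * (1 - p)."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, item_bank, administered):
    """Return the id of the not-yet-administered item with maximum information at theta."""
    candidates = {k: v for k, v in item_bank.items() if k not in administered}
    return max(candidates, key=lambda k: item_information(theta, *candidates[k]))

# Hypothetical item bank: item id -> (discrimination a, difficulty b)
item_bank = {"easy": (1.2, -1.5), "medium": (1.2, 0.0), "hard": (1.2, 1.5)}

# With the current score estimate near 0, the item whose difficulty is closest is chosen.
next_item = select_next_item(theta=0.1, item_bank=item_bank, administered=set())
```

In a full CAT the score estimate would be updated after each response and the selection repeated until a stopping rule is met; that loop is omitted here.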
This initial work produced WD-FAB 1.0, which measured physical functioning across four domains: Changing and Maintaining Body Position; Whole Body Mobility; Upper Body Function; and Upper Extremity Fine Motor Functioning.3–5 The WD-FAB 1.0 also characterized mental health function along four domains: Mood & Emotions, Social Interactions, Self-Efficacy, and Behavioral Control.3,6 Initial psychometric results have been published previously.7,8
A key advantage of IRT-based assessment instruments is that they can be updated and expanded by adding new items and integrating them into the existing item pools and scoring algorithms – a process known as replenishment.9 Replenishment is common in educational testing, but has been applied infrequently in health measurement. In this work, we undertook a comprehensive replenishment of the WD-FAB 1.0 item pools to expand the previously developed scales to improve their breadth, particularly at the floor and ceiling levels of function.10,11 In addition, we sought to incorporate new content representing additional areas of work-relevant function. Specifically, we (1) expanded the range of functional content in all of the physical scales and (2) added the assessment of community mobility using transportation. We also identified a need to extend content coverage to reflect a broader range of functional aspects of mental conditions that could lead a person to experience difficulty working. Regarding the latter, we aimed to (1) add to the tool the domains of communication and cognitive function, (2) replenish the Social Interactions scale so as to include a broader range of interpersonal interactions items, and (3) replenish the Self-Efficacy scale so as to capture the more work-related construct of “resilience,” defined as a person’s capacity to adapt and respond to the pressures of daily life demands.
This extensive replenishment work, along with the addition of new domains, resulted in the WD-FAB 2.0, which consists of four physical function scales (Basic Mobility, 56 items; Upper Body Function, 34 items; Fine Motor Function, 45 items; and Community Mobility, 11 items) and four scales assessing work-related mental health functioning (Communication & Cognition, 68 items; Self-Regulation, 34 items; Resilience & Sociability, 29 items; and Mood & Emotions, 34 items).10,11 The WD-FAB 2.0 utilized two primary item structures: agreement-based and ability-based. The agreement items asked individuals to “Specify your level of agreement” on a 4-point Likert-type response scale ranging from “Strongly Agree” to “Strongly Disagree,” with some items in the physical function domain including a fifth “Unable to do” option. The ability-based items asked “Are you able to” with five-point response categories ranging from “Yes, without difficulty” to “Unable to do.” Both item structures also allowed an opt-out option of “I don’t know.”
The goal of the present study was to conduct a rigorous psychometric evaluation of WD-FAB 2.0 by (1) examining test-retest reliability of each WD-FAB 2.0 scale, and (2) assessing construct validity by (a) examining the pattern of convergent and discriminant correlations between the WD-FAB 2.0 scales and established legacy measures of similar constructs, and (b) comparing the WD-FAB 2.0 scale scores in samples known to have different disability levels: general working age adults, and adults unable to work due to a permanent physical and/or mental disability.
Methods
Procedure
The study team contracted data collection to YouGov, a survey research organization that maintains an opt-in internet panel of 1.5 million US residents. To assess test-retest reliability, YouGov administered the WD-FAB 2.0 CAT to each individual twice over the internet, with the second survey administration (T2) done 5 days after the first (T1) administration. To assess construct validity, YouGov administered several legacy measures along with the WD-FAB 2.0 CAT at T2. The legacy measures included: the Participation Measure for Post-Acute Care (PM-PAC) Community Mobility Scale; the PROMIS Physical Function 20-item short form; the La Trobe Communication Questionnaire; the General Self-Efficacy Scale; the BASIS 24 (excluding drug/alcohol use questions); and the Activity Measure for Post-Acute Care (AM-PAC) Applied Cognition short form. The legacy measures were selected to target the same domains of function as represented by the WD-FAB 2.0 scales for purposes of assessing construct validity. The internal consistency reliabilities for all legacy measures were examined in the study samples, with Cronbach alpha coefficients ranging from 0.82 to 0.96 among T2 respondents overall. Within the three disability status subgroups, the lowest observed alpha was 0.72.
Sample
We hypothesized that scores for people who reported no disability would differ significantly from those of people who reported permanent disability. Therefore we recruited three samples from the YouGov national panel. Sample A was obtained using the proximity matching method developed by YouGov to identify 335 working age adults (aged 21–66 years) matched to the U.S. adult population on age, sex, race, ethnicity and education level.12 Self-reported disability status previously collected by YouGov at the time of panel enrollment was then used to select panel members for the known-groups discriminant analysis. Sample B was recruited from those adults in the YouGov panel in the same age range who had previously reported their employment status as ‘permanently disabled’ for any physical condition (n=375), and sample C was recruited from those adults in the YouGov panel in the same age range who had previously reported their employment status as ‘permanently disabled’ for any mental health condition (n=296). All 3 samples received the same assessments. Disability status and the physical or mental nature of the disabling condition were confirmed by a screener question administered at the Time 1 survey. A target sample size of 225 per group was set based on the range of ICC coefficients and convergent/discriminant correlations observed in prior work with the WD-FAB 1.0. This number would provide 90% assurance probability that the 95% confidence intervals around the observed metrics (ICC, r) would include values as low or high as previously observed. We deliberately overshot this target number at the Time 1 data collection to provide a buffer against a potentially low return rate for the second data collection.
Data Analysis
WD-FAB 2.0 raw scores are calculated in logits, then standardized to z-scores which are transformed to T-scores where the mean = 50 and SD=10 based on the general adult sample as the reference score; higher scores indicate better functioning.
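The scoring transformation just described (logit to z-score to T-score) can be written out as a short sketch. The reference mean and SD below are hypothetical placeholders for the general adult sample's logit distribution, which is not reported in this section, and the function name is ours.

```python
def logit_to_t_score(logit, ref_mean, ref_sd):
    """Convert a logit-scale score to a T-score (mean 50, SD 10 in the reference sample)."""
    z = (logit - ref_mean) / ref_sd  # standardize against the reference sample
    return 50.0 + 10.0 * z           # rescale so the reference mean is 50 and SD is 10

# Hypothetical reference-sample values on the logit scale (assumptions, not study data).
REF_MEAN, REF_SD = 0.5, 1.2

t_at_mean = logit_to_t_score(0.5, REF_MEAN, REF_SD)       # a respondent at the reference mean
t_one_sd_above = logit_to_t_score(1.7, REF_MEAN, REF_SD)  # one reference SD above the mean
```

On this metric a 10-point difference corresponds to one standard deviation in the reference sample, which is the yardstick used later when interpreting MDC90 values.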
To assess test-retest reliability, we computed intraclass correlation coefficients for each WD-FAB 2.0 scale using a two-way mixed model (ICC3,1) separately for each of the three samples. At T2 we included a question asking respondents if their physical or mental health had improved, worsened, or stayed the same over the past 5 days. This allowed us to conduct a sensitivity analysis by removing those who had reported a change in health status and re-computing the ICCs for the remaining sample to see if the “changers” in the sample had attenuated scale reliability estimates.
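For readers who want to see the computation, the following is a minimal sketch of ICC(3,1) (two-way mixed model, consistency, single measurement) built from the usual ANOVA mean squares. It is not the study's analysis code; the function name and the example scores are hypothetical.

```python
def icc_3_1(scores):
    """ICC(3,1) for a list of rows, one row per respondent, one column per administration."""
    n = len(scores)     # number of respondents
    k = len(scores[0])  # number of administrations (2 for test-retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between respondents
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between administrations
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Hypothetical T1/T2 T-scores for five respondents (illustrative only).
example_scores = [[45.0, 47.0], [38.0, 36.5], [52.0, 50.0], [41.0, 44.0], [60.0, 58.5]]
example_icc = icc_3_1(example_scores)
```

Because ICC(3,1) is a consistency coefficient, a uniform shift in everyone's score between T1 and T2 does not lower it; only changes in respondents' relative standing do.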
We also calculated the standard error of measurement (SEM), which quantifies the precision of individual scores on each WD-FAB 2.0 scale. SEM was calculated as the average of the score standard errors at baseline and follow-up.13 We then calculated the minimal detectable change (MDC90), the smallest change in an individual’s score that can be statistically distinguished from measurement error. MDC90 was calculated as SEM × 1.645 × √2, where 1.645 is the z-value corresponding to a two-sided 90% confidence interval around no change.14,15
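The MDC90 arithmetic is simple enough to show directly. The sketch below follows the formula stated above; the helper names are ours, and pooling the baseline and follow-up standard errors into one mean is our reading of "average", not a detail confirmed by the source.

```python
import math

def average_sem(se_baseline, se_followup):
    """Mean of the per-respondent score standard errors across both administrations (assumed pooling)."""
    all_se = list(se_baseline) + list(se_followup)
    return sum(all_se) / len(all_se)

def mdc90(sem):
    """Minimal detectable change at 90% confidence: SEM * 1.645 * sqrt(2)."""
    return sem * 1.645 * math.sqrt(2)

def exceeds_mdc90(score_t1, score_t2, sem):
    """True if the observed change is larger than the minimal detectable change."""
    return abs(score_t2 - score_t1) > mdc90(sem)

# Worked example with a hypothetical SEM of 2.0 T-score points.
threshold = mdc90(2.0)  # about 4.65 points
```

With that threshold, a change from 45 to 50 would count as detectable, whereas a change from 45 to 49 would not.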
To assess construct validity we examined the pattern of convergent and discriminant Pearson correlations between the established measures and the WD-FAB 2.0 scales at T2. We hypothesized that the WD-FAB 2.0 measures would be positively and significantly correlated with the established measures of their respective domains. Evidence of discriminant validity was obtained by comparing the strength of same-domain and cross-domain correlations. For these comparisons, we regarded correlations ≤0.20 to be small, ≥0.50 to be large, and values in between to be moderate. We hypothesized that convergent correlations would in general be in the moderate range or higher, and that discriminant correlations would in general be small.
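The magnitude thresholds above translate into a small classification rule. The sketch below applies it to the absolute value of a correlation, since several legacy measures are scored in the opposite direction; the function name is hypothetical.

```python
def classify_correlation(r):
    """Label a correlation as small, moderate, or large using the thresholds stated in the text."""
    magnitude = abs(r)  # direction depends on the legacy measure's scoring convention
    if magnitude <= 0.20:
        return "small"
    if magnitude >= 0.50:
        return "large"
    return "moderate"

# Illustrative values only.
labels = [classify_correlation(r) for r in (0.73, -0.62, 0.35, -0.12)]
```

Under the stated hypotheses, convergent correlations would be expected to fall mostly in the "moderate" or "large" categories and discriminant correlations mostly in the "small" category.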
Results
The working age adult sample was 44% male and 78% white with an average age of 43.4 years (SD 12.9). The adults with a physical work disability sample were 49% male and 81% white with an average age of 53.6 (SD 8.3) years. The adults with a mental health-related work disability were 47% male and 84% white with an average age of 49.9 (SD 10.5) years. The average duration of work disability was 11.9 (SD 9.4) years and 12.9 (SD 9.9) years for the physical and mental-health disability groups, respectively (Table 1).
Table 1.
Respondent Demographics
Demographic Characteristic | Working Age Adults N=335 | Work-Disabled: Physical N=375 | Work-Disabled: Behavioral N=296 |
---|---|---|---|
Age mean (sd) | 43.39 (12.86) | 53.62 (8.32) | 49.89 (10.53) |
Work disability duration (years) mean (sd) | n/a | 11.95 (9.4) Missing: 25 | 12.9 (9.86) Missing: 29 |
Sex: Male n (%) | 148 (44.18%) | 185 (49.33%) | 138 (46.62%) |
Race n (%) | |||
White | 260 (77.61%) | 303 (80.8%) | 249 (84.12%) |
Black | 38 (11.34%) | 40 (10.67%) | 13 (4.39%) |
Other | 37 (11.04%) | 30 (8%) | 31 (10.47%) |
Missing | 0 | 2 (0.53%) | 3 (1.01%) |
Hispanic n (%) | |||
Yes | 50 (14.93%) | 22 (5.87%) | 26 (8.78%) |
No | 283 (84.48%) | 351 (93.6%) | 268 (90.54%) |
Refused | 2 (0.6%) | 2 (0.53%) | 2 (0.68%) |
Education n (%) | |||
<high school | 12 (3.58%) | 16 (4.27%) | 9 (3.04%) |
High school | 88 (26.27%) | 147 (39.2%) | 78 (26.35%) |
>High School | 235 (70.15%) | 212 (56.53%) | 209 (70.60%) |
Relationship status n (%) | |||
Never married | 90 (26.87%) | 76 (20.27%) | 92 (31.08%) |
Married/Living with a Partner | 199 (59.40%) | 196 (52.27%) | 117 (39.53%) |
Divorced/Separated | 32 (9.55%) | 83 (22.13%) | 70 (23.65%) |
Widowed | 10 (2.99%) | 20 (5.33%) | 16 (5.41%) |
Refused | 4 (1.19%) | 0 | 1 (0.34%) |
Change in past week: Physical Health n (%) | |||
No change | 218 (65.07%) | 254 (67.73%) | 195 (65.88%) |
Got better | 35 (10.45%) | 18 (4.8%) | 16 (5.41%) |
Got worse | 18 (5.37%) | 48 (12.8%) | 40 (13.51%) |
Missing | 64 (19.1%) | 55 (14.67%) | 45 (15.2%) |
Change in past week: Mental Health n (%) | |||
No change | 210 (62.69%) | 263 (70.13%) | 180 (60.81%) |
Got better | 32 (9.55%) | 26 (6.93%) | 28 (9.46%) |
Got worse | 29 (8.66%) | 31 (8.27%) | 43 (14.53%) |
Missing | 64 (19.1%) | 55 (14.67%) | 45 (15.2%) |
Table 2 reports the results of the test-retest reliability and score precision analyses. For the WD-FAB 2.0 Physical Function scales, ICCs ranged from 0.69 to 0.77 in the general adult sample, and 0.66–0.86 in the physical work-disability sample. For the WD-FAB 2.0 Mental Health Function scales, ICCs ranged from 0.62 to 0.73 in the general adult sample, and 0.74–0.76 in the mental health work-disability sample. When the “changers” in each domain were removed from the sample as a sensitivity analysis, across all scales, the mean (SD) of the differences in ICCs was −0.023 (0.029) and ranged from −0.109 to 0.014. We judged this to be of no practical significance, and included the “changers” in all analyses, a conservative strategy in that including these respondents would, if anything, be expected to lower test-retest reliability.
Table 2.
Reliability of the WD-FAB in three subsamples
WD-FAB Scale: Physical | Working Age Adults | Work-Disabled: Physical | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
n | Mean (SD) | ICC (95% CI) | SEM | MDC90 | n | Mean (SD) | ICC (95% CI) | SEM | MDC90 | |
Basic Mobility | 335 | 45.27 (6.26) | 0.77 (0.72–0.81) | 2.00 | 4.68 | 375 | 34.32 (5.73) | 0.86 (0.83–0.88) | 1.50 | 3.51 |
Upper Body Function | 335 | 44.69 (5.10) | 0.74 (0.69–0.78) | 1.96 | 4.59 | 375 | 34.34 (5.47) | 0.84 (0.80–0.87) | 1.47 | 3.44 |
Fine Motor Function | 335 | 45.36 (5.09) | 0.57 (0.50–0.64) | 3.52 | 8.22 | 375 | 38.34 (6.08) | 0.76 (0.72–0.80) | 2.10 | 4.91 |
Community Mobility: Driving | 284 | 34.43 (1.07) | 0.69 (0.62–0.74) | 2.85 | 6.66 | 248 | 34.05 (1.61) | 0.66 (0.58–0.72) | 2.71 | 6.34 |
Community Mobility: Public Transportation | 62 | 46.71 (5.19) | 0.70 (0.54–0.80) | 3.95 | 9.22 | 54 | 41.32 (5.91) | 0.75 (0.60–0.84) | 2.69 | 6.28 |
WD-FAB Scale: Mental | Working Age Adults | Work-Disabled: Mental | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
n | Mean (SD) | ICC (95% CI) | SEM | MDC90 | n | Mean (SD) | ICC (95% CI) | SEM | MDC90 | |
Cognition & Communication | 335 | 50.21 (7.70) | 0.68 (0.62–0.73) | 3.02 | 7.06 | 296 | 39.96 (6.5) | 0.75 (0.70–0.80) | 2.13 | 4.98 |
Self-Regulation | 335 | 52.41 (10.93) | 0.56 (0.48–0.63) | 3.68 | 8.59 | 296 | 44.66 (8.28) | 0.72 (0.66–0.77) | 2.93 | 6.84 |
Resilience & Sociability | 335 | 51.33 (9.73) | 0.62 (0.55–0.68) | 3.51 | 8.19 | 296 | 41.30 (9.20) | 0.74 (0.68–0.79) | 3.19 | 7.45 |
Mood & Emotions | 335 | 51.38 (15.23) | 0.73 (0.68–0.78) | 4.52 | 10.55 | 296 | 36.72 (12.62) | 0.76 (0.70–0.80) | 3.74 | 8.72 |
The SEMs obtained for all WD-FAB 2.0 scales ranged from 1.46 to 4.79 points, indicating good discriminating ability. SEMs for the WD-FAB 2.0 physical function scales were generally lower than those for the WD-FAB 2.0 mental health scales across all three samples.
Overall, across all three samples, MDC90 values ranged from 3.41 to 10.55 points.
Basic descriptive statistics and information regarding the direction of scoring (whether high scores indicate higher or poorer functioning) are reported for the legacy measures in Table 3. Pearson correlations between the legacy instruments and WD-FAB 2.0 scales are reported for the Physical Function scales in Table 4 and for the Mental Health scales in Table 5. WD-FAB 2.0 scales are consistently scored such that higher scores are indicative of higher functional ability. Thus a conceptually consistent relationship between a WD-FAB 2.0 measure and a legacy measure will generally be positive, but may be negative depending on the scoring convention of the latter.
Table 3.
Legacy Measures: Basic Descriptive Statistics
Measure | Mean (SD) | ||
---|---|---|---|
Working Age Adults N=271 | Work-Disabled: Physical N=320 | Work-Disabled: Mental N=251 | |
PROMIS Physical Function* | 50.35 (9.67) | 34.66 (7.88) | 40.59 (8.30) |
PM-PAC Community Mobility* | 36.27 (5.31) | 28.20 (7.22) | 29.76 (6.71) |
BASIS 24¥ | |||
Depression/ Functioning | 0.74 (0.81) | 1.13 (0.89) | 1.88 (0.98) |
Relationships | 1.00 (1.06) | 1.03 (1.04) | 1.53 (1.02) |
Self-Harm | 0.23 (0.61) | 0.26 (0.63) | 0.80 (1.16) |
Emotional Lability | 0.76 (0.86) | 0.88 (0.84) | 1.63 (1.08) |
Psychosis | 0.32 (0.64) | 0.27 (0.54) | 0.79 (0.98) |
Generalized Self-Efficacy Scale (GSE)* | 32.06 (6.00) | 30.76 (5.29) | 26.41 (6.18) |
LaTrobe Communication Questionnaire¥ | 50.74 (11.71) | 51.51 (10.45) | 60.70 (12.99) |
AM-PAC Applied Cognition* | 47.78 (10.40) | 45.24 (8.39) | 40.06 (6.63) |
* Higher scores indicate higher functioning
¥ Higher scores indicate poorer functioning
Table 4.
Physical Function: Pearson Correlations (r) between Established and WD-FAB Measures in 3 Subsamples
Working Age Adults | |||||
---|---|---|---|---|---|
Legacy Measure | WD-FAB Physical Function Scale | ||||
Basic Mobility | Upper Body Function | Fine Motor Function | Community Mobility: Driving | Community Mobility: Public Transportation | |
PROMIS Physical | 0.73 | 0.53 | 0.52 | 0.37 | 0.58 |
PM-PAC Mobility | 0.43 | 0.59 | 0.44 | 0.50 | 0.58 |
B24 - Depression | −0.43 | −0.42 | 0.28 | −0.33 | −0.39 |
B24 - Relationships | −0.16 | −0.26 | 0.28 | −0.03 | −0.12 |
B24 - Self-Harm | −0.12 | −0.11 | −0.23 | −0.18 | −0.37 |
B24 - Emotions | −0.35 | −0.30 | −0.27 | −0.22 | −0.37 |
B24 - Psychosis | −0.16 | −0.14 | −0.28 | −0.25 | −0.42 |
GSE | 0.20 | 0.27 | 0.28 | 0.25 | 0.15 |
LaTrobe | −0.27 | −0.18 | −0.30 | −0.26 | −0.51 |
AM-PAC Cognition | 0.27 | 0.20 | 0.28 | 0.25 | 0.26 |
Work-Disabled: Physical | |||||
Legacy Measure | WD-FAB Physical Function Scale | ||||
Basic Mobility | Upper Body Function | Fine Motor Function | Community Mobility: Driving | Community Mobility: Public Transportation | |
PROMIS Physical | 0.82003* | 0.75284* | 0.60312* | 0.25137* | 0.57190* |
PM-PAC Mobility | 0.53401* | 0.55125* | 0.34220* | 0.29365* | 0.48108* |
B24 - Depression | −0.28279* | −0.28506* | −0.33019* | −0.30223* | −0.43018* |
B24 - Relationships | −0.05280 | 0.00111 | −0.02579 | −0.25402* | −0.38838* |
B24 - Self-Harm | −0.05280 | −0.10238 | −0.19636* | −0.08910 | −0.03349 |
B24 - Emotions | −0.19159* | −0.17315* | −0.26366* | −0.21267* | −0.49703* |
B24 - Psychosis | −0.09549 | −0.06256 | −0.16452* | −0.16238* | −0.35968* |
GSE | 0.15668* | 0.14482* | 0.13576* | 0.27178* | 0.35344* |
LaTrobe | −0.08954 | −0.14982* | −0.19268* | −0.17600* | −0.30269* |
AM-PAC Cognition | 0.15790* | 0.18580* | 0.24276* | 0.22672* | 0.28455 |
Note. Shaded cells indicate convergent correlations, hypothesized to be higher.
* Significant correlation (p<0.05)
Table 5.
Mental Function: Pearson Correlations (r) between Established and WD-FAB Measures in 3 Subsamples
Working Age Adults | ||||
---|---|---|---|---|
Legacy Measure | WD-FAB Mental Function Scale | |||
Cognition & Communication | Self-Regulation | Resilience & Sociability | Mood & Emotions | |
PROMIS Physical | 0.41558* | 0.21940* | 0.21071* | 0.40053* |
PM-PAC Mobility | 0.32552* | 0.20598* | 0.17893* | 0.27495* |
B24 - Depression | −0.44138* | −0.38955* | −0.29717* | −0.62267* |
B24 - Relationships | −0.26778* | −0.33685* | −0.13491* | −0.22832* |
B24 - Self-Harm | −0.15553* | −0.27080* | −0.01567 | −0.26204* |
B24 - Emotions | −0.36432* | −0.42701* | −0.22979* | −0.50273* |
B24 - Psychosis | −0.28890* | −0.37198* | −0.05863 | −0.26185* |
GSE | 0.44340* | 0.45170* | 0.29947* | 0.36260* |
LaTrobe | −0.39484* | −0.29560* | −0.18416* | −0.35290* |
AM-PAC Cognition | 0.41646* | 0.32093* | 0.16331* | 0.40960* |
Work-Disabled: Mental | ||||
Legacy Measure | WD-FAB Mental Function Scale | |||
Cognition & Communication | Self-Regulation | Resilience & Sociability | Mood & Emotions | |
PROMIS Physical | 0.41382* | 0.15507* | 0.22302* | 0.18853 |
PM-PAC Mobility | 0.39204* | 0.23138* | 0.19256* | 0.30780* |
B24 - Depression | −0.55907* | −0.39602* | −0.45195* | −0.71111* |
B24 - Relationships | −0.27416* | −0.43156* | −0.47680* | −0.43630* |
B24 - Self-Harm | −0.37584* | −0.32266 | −0.34136* | −0.52632 |
B24 - Emotions | −0.44112* | −0.58840* | −0.43123* | −0.54281* |
B24 - Psychosis | −0.45943* | −0.41650* | −0.33335* | −0.43223* |
GSE | 0.54906* | 0.34482* | 0.56723* | 0.52016* |
LaTrobe | −0.58380* | −0.33296 | −0.41407* | −0.43887* |
AM-PAC Cognition | 0.59812* | 0.29817* | 0.39898* | 0.41647* |
Note. Shaded cells indicate convergent correlations, hypothesized to be higher.
* Significant correlation (p<0.05)
All correlations were in the expected direction. The comparison of convergent and discriminant correlations is summarized in Table 6 based on the absolute value of all correlations. In the physical domain, the median correlations of WD-FAB 2.0 scales with the legacy measures of physical functioning (same-domain correlations) were 2 to almost 3 times higher than the median correlations of the WD-FAB 2.0 scales with the legacy measures of mental health functioning (cross-domain correlations). For example, among those work-disabled for physical reasons, the median WD-FAB 2.0 same-domain correlation with legacy measures was 0.54 as compared to the cross-domain median correlation of 0.19. In the mental health domain the evidence for distinction among the measures was not as strong but still substantial. The median correlations of the WD-FAB 2.0 mental health scales with same-domain legacy measures were 1.2 to 1.9 times higher than the median cross-domain correlations. For example, among those work-disabled for mental health reasons, the median WD-FAB 2.0 same-domain correlation with legacy measures was 0.43 as compared to the cross-domain median correlation of 0.23. The differentiation between convergent and discriminant correlations was lowest within the (non-disabled) working age adult sample.
Table 6.
Construct Validity: Summary of Convergent and Discriminant Correlations
Sample | Working Age Adults | Work-Disabled* | ||||||
---|---|---|---|---|---|---|---|---|
Domain & Correlation Type | Mean | Median | Min | Max | Mean | Median | Min | Max |
Physical Domains | ||||||||
Convergent | 0.53 | 0.52 | 0.37 | 0.73 | 0.52 | 0.54 | 0.25 | 0.82 |
Discriminant | 0.26 | 0.26 | 0.03 | 0.51 | 0.20 | 0.19 | 0.00 | 0.50 |
Mental Health Domains | ||||||||
Convergent | 0.31 | 0.31 | 0.02 | 0.62 | 0.45 | 0.43 | 0.27 | 0.71 |
Discriminant | 0.28 | 0.25 | 0.18 | 0.42 | 0.26 | 0.23 | 0.16 | 0.41 |
* Results for WD-FAB 2.0 physical domains based on the subsample with physical work disability; results for WD-FAB 2.0 mental domains based on the subsample with a mental health work disability.
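The "times higher" comparisons reported in the text can be reproduced from the medians in Table 6. The short sketch below does that arithmetic; the dictionary layout and key names are ours, while the numbers are the Table 6 medians.

```python
# (convergent median, discriminant median) taken from Table 6.
table6_medians = {
    ("physical", "working_age"): (0.52, 0.26),
    ("physical", "work_disabled"): (0.54, 0.19),
    ("mental", "working_age"): (0.31, 0.25),
    ("mental", "work_disabled"): (0.43, 0.23),
}

# Ratio of same-domain to cross-domain median correlation for each domain and sample.
ratios = {key: conv / disc for key, (conv, disc) in table6_medians.items()}

for (domain, sample), ratio in ratios.items():
    print(f"{domain:8s} {sample:13s} convergent/discriminant = {ratio:.2f}")
```

The resulting ratios fall at roughly 2.0 and 2.8 for the physical domain and roughly 1.2 and 1.9 for the mental health domain, consistent with the ranges quoted in the text.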
Discussion
This study provides substantial support for the psychometric quality of the WD-FAB 2.0 CAT in both the physical and mental domains. Regarding test-retest reliability, other widely used self-report measures of physical function and mental health typically display ICCs above 0.7. ICC values for the WD-FAB 2.0 physical function scales exceeded .70 in 7 of 10 possible instances in the working age adult and physical disability samples. Two of the three instances where WD-FAB 2.0 physical function scales demonstrated ICC values below that threshold involved Community Mobility-Driving in the working age adult sample (0.69) and the physical disability sample (0.66). In both instances, however, 95% CI upper limits exceeded .70. The third instance involved the Fine Motor scale. Although the ICC for that scale exceeded .70 in the physical disability sample, reliability was only .57 in the working-age adult sample. As one might anticipate, the mean Fine Motor function score was higher in the working age adult sample than it was in the physical disability sample (45.4 and 38.3, respectively), and there was less variation in the working age adult sample (SDs 5.1 and 6.1, respectively). The low variability in the non-disabled sample may explain in part the lower ICC in that group. To examine this further we computed the absolute score difference between test and retest, and estimated the score standard deviation (SD) at first test. In the general adult sample the first test score standard deviation was 0.88, and in the physical disability sample, the first score standard deviation was 1.05. This suggests that the lower ICC in the working age sample is related to the lower variation in that group because, for a given amount of measurement error, greater between-person variation in scores yields higher ICCs.
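The link between score variability and the ICC can be illustrated with a simplified model in which the ICC is the share of total variance attributable to true between-person differences. This is a conceptual sketch, not the ICC(3,1) estimator used in the study, and the SD values are hypothetical.

```python
def expected_icc(between_sd, error_sd):
    """Reliability under a simple variance-components model: true variance / total variance."""
    true_var = between_sd ** 2
    error_var = error_sd ** 2
    return true_var / (true_var + error_var)

# Same hypothetical measurement error, two hypothetical amounts of between-person spread.
icc_narrow_sample = expected_icc(between_sd=3.0, error_sd=3.0)  # more homogeneous sample
icc_wide_sample = expected_icc(between_sd=6.0, error_sd=3.0)    # more heterogeneous sample
```

Holding measurement error fixed, the more homogeneous sample yields the lower coefficient, which is the pattern suggested for the Fine Motor scale in the working age adult sample.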
Regarding the reliability of the mental health scales, ICC values exceeded .70 for all four scales in the mental health disability sample, and for Mood and Emotions in the working age adult sample. ICC values fell below that threshold, however, for the other three mental health scales in the non-disabled group. In one instance, Cognition and Communication (ICC .68), the upper limit of the 95% CI exceeded .70. The other exceptions involved Self-Regulation (ICC .56) and Resilience and Sociability (ICC .62). As one would expect, the mean mental health function scores of the working age adult sample, as reported in Table 2, were higher than those of the mental health disability sample on both of these scales. However, again on both scales, there was greater variability in function in the non-disabled sample, suggesting that mental health function may vary within a general population sample more than it does in a sample of individuals who are currently disabled for mental health reasons. Additional replenishment work is recommended to clarify the meaning of the Self-Regulation and Resilience and Sociability constructs in the context of work function, and thereby potentially improve the test-retest reliability of these scales.
The SEM provides an indication of the dispersion of measurement errors when trying to estimate a subject’s true score from their observed scores, and is a function of the variability of scores on the measure in question and that measure’s reliability. The SEM sets a confidence interval (CI) around a given respondent’s score, with lower SEMs indicating greater precision. When comparing scores between different subjects on the same scale, if the CIs around the two subjects’ scores do not overlap, then we can be reasonably confident that the two subjects have different levels of functional ability. The SEMs obtained for the WD-FAB 2.0 physical function scales across the two relevant samples (working age adult; physically disabled) ranged from 1.47 to 3.95, indicating good discriminant ability. The four mental health scales displayed similarly favorable SEMs, ranging from 2.13 to 4.52 across the working age adult and mental disability samples.
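The overlap check described above can be sketched as follows. The 95% level (z = 1.96) is an assumption made for illustration, since the text does not state which confidence level is intended, and the scores and SEM are hypothetical.

```python
def confidence_interval(score, sem, z=1.96):
    """Interval around an observed score: score +/- z * SEM (z = 1.96 assumed for a 95% CI)."""
    half_width = z * sem
    return (score - half_width, score + half_width)

def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Two hypothetical respondents on the same scale, each with an SEM of 2.0 T-score points.
ci_first = confidence_interval(50.0, 2.0)
ci_second = confidence_interval(40.0, 2.0)
scores_distinguishable = not intervals_overlap(ci_first, ci_second)
```

With a 10-point gap and an SEM of 2.0 the intervals do not overlap, so the two scores would be treated as reflecting different levels of function; with a 5-point gap they would overlap.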
The MDC90 may be interpreted as the smallest detectable change that falls outside the measurement error of the instrument. This value is of primary concern when evaluating change, as an amount of change less than the MDC90 cannot be distinguished from random fluctuation. For the physical function scales, the MDC90 ranged from 3.44 to 9.22 across the two relevant samples. Thus none of the MDC90 values exceeded 10, the standard deviation for all scales. For the mental function scales, the MDC90 ranged from 4.98 to 10.55 across the two relevant samples. The highest MDC90 value was observed on the Mood and Emotions scale in the working age adult sample and was slightly above 10, the standard deviation for all scales. This suggests that for that scale in a general population sample a respondent’s score would have to change by about one standard deviation (a large amount of change) before one could be confident that a real change had occurred.
Finally, the pattern of convergent and discriminant correlations between WD-FAB 2.0 scales and established measures of related and cross-domain constructs, respectively, provided strong evidence of construct validity. Regarding physical function, median convergent correlations between the WD-FAB 2.0 scales and established measures of physical function were in the low .50s, as compared to median discriminant correlations with established mental health measures that ranged from .19 to .26. A similar pattern was obtained regarding mental health function, where median convergent correlations between the WD-FAB 2.0 scales and established measures of mental health ranged from .31 to .43, compared to median discriminant correlations with established measures of physical function in the low to mid .20s.
Limitations
While broadly supportive of the psychometric quality of the WD-FAB 2.0, two limitations of the present study should be noted. First, physical and mental disability status was determined by self-report and could not be independently verified. In addition, it would be valuable to gain some experience in the actual application of the WD-FAB 2.0 in more individuals with work disability to inform possible future replenishment activities.
Conclusions
The present study provided substantial evidence for the reliability and validity of the WD-FAB 2.0, a CAT-based assessment of work-relevant physical and mental health function. Although initially developed for use by the Social Security Administration to help standardize the disability assessment process, the performance of these scales in samples of working age adults and adults with work disability in this study suggests that the WD-FAB 2.0 could be used for assessment and measurement of work-related function in other contexts as well.
Funding Acknowledgement:
This study was supported by Social Security Administration-National Institutes of Health Interagency Agreements under the National Institutes of Health (contract nos. HHSN269200900004C, HHSN269201000011C, HHSN269201100009I, HHSN269201200005C), and by the National Institutes of Health Intramural Research Program.
Footnotes
There are no conflicts of interest to declare.
References:
- 1. World Health Organization. International classification of functioning, disability and health (ICF). 2007.
- 2. Gibbons RD, Weiss DJ, Kupfer DJ, et al. Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatric Services. 2008.
- 3. Marfeo EE, Haley SM, Jette AM, et al. Conceptual foundation for measures of physical function and behavioral health function for social security work disability evaluation. Arch Phys Med Rehabil. 2013;94(9):1645–1652.e2. doi: 10.1016/j.apmr.2013.03.015.
- 4. Ni P, McDonough CM, Jette AM, et al. Development of a computer-adaptive physical function instrument for social security administration disability determination. Arch Phys Med Rehabil. 2013;94(9):1661–1669.
- 5. McDonough CM, Jette AM, Ni P, et al. Development of a self-report physical function instrument for disability assessment: Item pool construction and factor analysis. Arch Phys Med Rehabil. 2013;94(9):1653–1660.
- 6. Marfeo EE, Ni P, Haley SM, et al. Development of an instrument to measure behavioral health function for work disability: Item pool construction and factor analysis. Arch Phys Med Rehabil. 2013;94(9):1670–1678.
- 7. Meterko M, Marfeo EE, McDonough CM, et al. The work disability functional assessment battery (WDFAB): Feasibility and psychometric properties. Arch Phys Med Rehabil. 2014.
- 8. Marino ME, Meterko M, Marfeo EE, et al. Work-related measures of physical and behavioral health function: Test-retest reliability. Disability and Health Journal. 2015;8(4):652–657.
- 9. Haley SM, Ni P, Jette AM, et al. Replenishing a computerized adaptive test of patient-reported daily activity functioning. Quality of Life Research. 2009;18(4):461–471.
- 10. McDonough CM, Ni P, Peterik K, et al. Improving measures of work-related physical functioning. Quality of Life Research. 2016:1–10.
- 11. Marfeo M, Ni P, McDonough C, et al. Improving measures of work-related mental health functioning. Quality of Life Research. Under Review.
- 12. Rivers D. Sample matching: Representative sampling from internet panels. Polimetrix White Paper Series. 2006.
- 13. Yost KJ, Eton DT, Garcia SF, Cella D. Minimally important differences were estimated for six patient-reported outcomes measurement information system-cancer scales in advanced-stage cancer patients. J Clin Epidemiol. 2011;64(5):507–516.
- 14. Wyrwich KW, Tierney WM, Wolinsky FD. Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol. 1999;52(9):861–873.
- 15. Wyrwich KW. Minimal important difference thresholds and the standard error of measurement: Is there a connection? J Biopharm Stat. 2004;14(1):97–110.