Abstract
Purpose
To improve the mental health component of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration’s (SSA) disability determination process. Specifically, our goal was to expand the WD-FAB scales of mood & emotions, resilience, social interactions, and behavioral control to improve the depth and breadth of the current scales and to extend content coverage to include aspects of cognition & communication function.
Methods
Data were collected from a random, stratified sample of 1695 claimants applying for SSA work disability benefits and a general population sample of 2025 working-age adults. We developed 169 new items to replenish the WD-FAB scales and analyzed them using factor analysis and item response theory (IRT) analysis to construct unidimensional scales. We conducted computer adaptive test (CAT) simulations to examine the psychometric properties of the WD-FAB.
Results
Analyses supported the inclusion of four mental health subdomains: Cognition & Communication (68 items), Self-Regulation (34 items), Resilience & Sociability (29 items) and Mood & Emotions (34 items). All scales yielded acceptable psychometric properties.
Conclusions
IRT methods were effective in expanding the WD-FAB to assess mental health function. The WD-FAB has the potential to enhance work disability assessment both within the context of the SSA disability programs as well as other clinical and vocational rehabilitation settings.
Keywords: Employment, Mental health, Disability assessment, Measurement
Introduction
The national burden of mental health related work disability is increasing and has significant health, social, and economic consequences for individuals with disabilities and their families. Disability has been identified as an underdeveloped area of public health research [1]. The US Social Security Administration’s Social Security Disability Insurance (SSDI) and Supplemental Security Income (SSI) programs are the primary source of funding for many US disabled workers and their families. In 2014, payments to disabled beneficiaries were provided to just over 10.2 million people [2]. Approximately 35% of the current SSDI beneficiaries receive benefits due to a mental disorder. Such disorders include autistic disorders, developmental disorders, intellectual disability, organic mood disorders, mood disorders, schizophrenic and other psychotic disorders [2]. These types of disabling conditions represent one of the larger diagnostic categories affecting the US working-age population today.
The early steps of the Social Security Administration’s (SSA) work disability determination process focus primarily on an individual’s symptoms or impairments without systematically capturing the related functional consequences of their condition. Current research and practice show that the relationship between symptoms and work performance is not always clear, and the weak relationship between them has been increasingly recognized as one of the fundamental challenges in work disability assessment [3–5]. Because the SSI/SSDI programs provide financial stability to individuals with disabilities, it is essential that a person’s functional abilities are fully characterized to determine their ability to return to work or obtain meaningful employment.
This is supported by a recent Institute of Medicine report that recommends SSA support the development of promising alternative approaches to evaluating eligibility for Social Security disability benefits, including the development and use of novel assessments that incorporate function into their current disability assessment process in a systematic way [6]. To more fully inform SSA’s disability assessment process, we developed a new assessment tool called the Work Disability Functional Assessment Battery (WD-FAB), using modern psychometric methods and contemporary notions of disability defined by the World Health Organization’s International Classification of Functioning, Disability and Health (ICF) [4]. The ICF is a biopsychosocial model that views disability and functioning as outcomes of interactions between an individual’s health conditions and contextual factors [7]. As applied to the work disability assessment process, this means we should not only consider physical or mental impairments related to a given health condition but should also consider a “whole person” approach to measuring work-related function [4, 5]. For the purposes of our work, we aimed to develop an instrument that can efficiently and systematically measure aspects of cognition and psychosocial function in order to characterize a profile of work-related mental health function.
In previous work, the WD-FAB characterized mental health function along the following domains: mood & emotions, social interactions, self-efficacy, and behavioral control. Mood & emotions represented a range of a person’s internal emotional states that can affect their ability to work and encompassed feelings such as depression and anxiety [8–12]. Social interactions focused on a person’s ability to interact with others. Self-efficacy represented a range of concepts such as resilience, adaptability, trust, and motivation. Lastly, behavioral control captured traits such as emotional regulation and anger. All of these factors have been cited in the literature as associated with a person’s ability to function in the workplace, yet they remain underdeveloped from a disability assessment perspective [4, 13–15].
Although the first version of the WD-FAB represented significant conceptual and psychometric progress, there was a need to expand the WD-FAB to ensure it reflected item content that was applicable for use among people with a wider range of mental health conditions. This manuscript reports the findings of a replenishment study we conducted to expand and enhance the mental health elements of the WD-FAB. To that end, the primary objectives were to (1) expand the WD-FAB to include a scale related to cognitive and communication function, (2) expand the social interaction scale to assess a broader range of self-regulation skills, and (3) improve the construct clarity of the self-efficacy scale to capture additional work-related items supporting the construct of “Resilience & Sociability” (defined by a person’s capacity to adapt and respond to the pressure of daily life demands).
Methods
Sample Selection
The samples for this study included a random sample of SSA claimants, stratified by geographic region and urban/rural designation, and a general population sample of working-age adults in the US. Data collection for the SSA claimants was conducted via phone and internet by the Westat research organization. Claimants were invited to participate in the study via mailed letter. Westat data collectors then followed up with a phone call to determine interest and eligibility. Seven follow-up phone calls were made if initial attempts were unsuccessful. Data for the general population sample were collected via internet by the YouGov research organization. YouGov maintains a large (>1 million) panel of voluntary Internet survey participants. YouGov invited subjects to participate via email, with up to two follow-up invitations if needed [16]. The general population sample was stratified by age (21–66 years), gender, and geographic region to approximate the distribution of working-age adults in the US based on data from the decennial Census. Individuals who identified as African American or Asian race were oversampled to support differential item functioning analysis. The study was approved by [blinded for review] University institutional review board, and all subjects provided informed consent prior to participation in any study activities.
Instruments
Demographics
Basic demographics, work history, and alleged disabling condition (physical, mental, or both) were obtained for both samples for descriptive purposes. Specifically, age, sex and race were collected to assess differential item functioning (DIF) by any of these participant characteristics.
WD-FAB
Both new items and some existing items (that served as anchors to the original item calibrations) were used to calibrate and replenish the WD-FAB scales. See Fig. 1 for the number of items within each domain that were developed and field tested. Item development involved the use of focus groups, expert item review, and cognitive interviewing [9, 10, 17, 18]. A content expert panel (seven experts from the fields of mental health rehabilitation, speech language pathology, and measurement development) was used to draft and evaluate new items to broaden the range of functional activity covered within a construct or to fill potential content gaps. Specifically, we asked experts to evaluate the current items in the WD-FAB to enhance the construct clarity of the scales and to expand any areas that were missing from key mental health functional areas related to work. The items were developed using two primary item structures: agreement-based and ability-based. The agreement items asked individuals to “Specify your level of agreement” and included a response scale from “Strongly Agree” to “Strongly Disagree.” The ability-based items asked “Are you able to” and included response categories ranging from “Yes, without difficulty” to “Unable to do.” Both item structures allowed an opt-out option of “I don’t know.” After the new items were developed, all items underwent cognitive testing, which entailed having each item reviewed and interpreted by the target users in order to minimize response errors. Items found difficult to understand, or interpreted differently than intended, were discarded or reworded. After cognitive testing, we administered 226 new and anchor items to the general population sample and the sample of SSA claimants who filed for disability benefits (see Fig. 1 “field tested items”).
Fig. 1.

Factor structure replenishment results. Note This figure represents the number of items field tested for each hypothesized domain of mental health function, followed by the final item count for each confirmed scale of the WD-FAB mental health function component. *The mood & emotions scale was a priority area for this replenishment study, so this scale was retained from our previous WD-FAB measurement development work
Analysis Plan
The analysis was conducted in two phases. The first phase focused on the claimant sample; the analyses were then replicated in the US general population sample. This dual-sample approach allows the WD-FAB to generate comparative scores between the claimant and general population samples. Additionally, we integrated the calibration data from our previous work to retain comparability across both existing and new items. To establish the initial structure of the new items within the context of the previously developed domains, a series of exploratory factor analyses (EFA) was conducted. We determined the number of factors based on the interpretation of the factor content and on maximizing parsimony. Item loadings greater than 0.3 were considered sufficient to retain [19]. For items that loaded on two factors, the higher loading was used to identify the best factor [20]. For items with similar loadings, the content model was used to guide item categorization. Next, we performed confirmatory factor analyses (CFA) to assess the unidimensionality of each factor. Optimal model fit was defined as RMSEA ≤ 0.08 and CFI and TLI ≥ 0.9 [21, 22]. For the mood & emotions scale, we retained the results from our previous work, where the criteria included RMSEA < 0.1 [9, 10, 17, 18]. After we determined the model with optimal fit, we examined each solution from a conceptual perspective to ensure the items retained relevant content. The final model solutions balanced parsimony, statistical fit, and content coverage.
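The retention and fit rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SAS code used in the study: the function names, the use of absolute loadings, and the `tie_margin` value that operationalizes "similar loadings" are assumptions, since the text gives no numeric definition of similarity.

```python
def assign_factor(loadings, min_loading=0.3, tie_margin=0.05):
    """Pick a factor for one item from its EFA loadings.

    loadings: dict mapping factor name -> loading.
    Returns (factor, needs_content_review). `tie_margin` is a hypothetical
    cutoff for "similar loadings"; the paper does not state one.
    """
    # Keep only loadings above the retention cutoff (absolute value assumed).
    kept = {f: abs(v) for f, v in loadings.items() if abs(v) > min_loading}
    if not kept:
        return None, False  # item does not load sufficiently on any factor
    ranked = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    best_factor, best_loading = ranked[0]
    # Near-tied cross-loadings are flagged so the content model can decide.
    needs_review = len(ranked) > 1 and (best_loading - ranked[1][1]) < tie_margin
    return best_factor, needs_review


def acceptable_fit(rmsea, cfi, tli, rmsea_max=0.08, cfi_min=0.9, tli_min=0.9):
    """Apply the stated CFA criteria: RMSEA <= 0.08, CFI and TLI >= 0.9."""
    return rmsea <= rmsea_max and cfi >= cfi_min and tli >= tli_min
```

Passing `rmsea_max=0.1` would apply the looser RMSEA criterion that was retained for the mood & emotions scale.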
We calibrated the items using the graded response IRT model, which is based on the assumption that responses to the items indicate individual differences on a single underlying, or latent, construct for each scale [23, 24]. Item fit was assessed with the Pearson χ² item-fit statistic (S-χ²) [25], and a Bonferroni-corrected p value was used to identify misfitting items. We calculated score distributions; the percentage of respondents at the lowest (floor) and highest (ceiling) scores; the marginal reliability; and the correlations between the CAT simulations (5 and 10 items) and the full item banks for each scale. The 5–10 item CAT simulation used the following stopping rules: minimum number of items = 5, maximum number of items = 10, or standard error < 0.32 (corresponding to a reliability of approximately 0.9). To link the existing items to the new items, we used concurrent calibration methods to place the new item parameters on each existing WD-FAB scale. We applied a two-group IRT model and used the original WD-FAB sample as the reference population (with mean = 0 and standard deviation = 1). We applied separate calibrations with the Stocking-Lord method to link the original WD-FAB scales onto general population scales [26].
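As a minimal sketch of how the stopping rules might operate, the following Python fragment encodes one reading of them: administer at least 5 items, stop at 10, and stop earlier once the standard error falls below 0.32. The function names are hypothetical, and the reliability conversion assumes a trait metric with unit variance, on which reliability equals 1 − SE².

```python
def reliability_from_se(se, trait_variance=1.0):
    """Reliability implied by a standard error on a metric with the given
    trait variance (1.0 for a standard normal latent trait)."""
    return 1.0 - (se ** 2) / trait_variance


def should_stop(n_administered, se, min_items=5, max_items=10, se_target=0.32):
    """One reading of the CAT stopping rules described in the text."""
    if n_administered < min_items:
        return False  # always give the minimum number of items
    if n_administered >= max_items:
        return True   # never exceed the maximum number of items
    return se < se_target  # otherwise stop once precision is sufficient
```

Under this assumption a standard error of 0.32 corresponds to a reliability of about 0.90 (1 − 0.32² = 0.8976).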
In conjunction with item calibrations, we employed IRT methods to analyze differential item functioning (DIF), that is, to identify whether new item parameter estimates differed significantly by demographic characteristics such as age, gender, or race, or by the calibration sample (i.e., claimant vs. general population and previous WD-FAB vs. new WD-FAB items). DIF occurs when people at the same estimated ability level in a particular content domain respond differently to the same item on that subdomain based on some other variable. We used a two-step DIF analysis, applying Langer’s method to screen for potential item DIF [27, 28]. We then evaluated the potential effect of DIF by examining differences in the Item Characteristic Curves (ICCs) using the weighted Area Between the expected score Curves (wABC) [29]. Items with wABC > 0.3 (5-category items) or wABC > 0.24 (4-category items) were identified as having DIF. Items that demonstrated DIF were considered carefully, and the items retained include calibrations specific to the relevant characteristic (e.g., male vs. female). Statistical analysis was conducted using SAS and IRTPRO software [30–32].
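To make the wABC screening step concrete, the sketch below computes a weighted area between the expected score curves of one graded-response item calibrated in two groups. It is an illustration rather than the IRTPRO procedure used in the study: the item parameters are invented, the integration grid is arbitrary, and the weights come from a standard normal density used here as a stand-in for the focal group's trait distribution (an assumption).

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Graded response model: probability of each ordered response category."""
    # Cumulative P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(theta, a, thresholds):
    """Expected item score with categories scored 0, 1, ..., K-1."""
    return sum(k * p for k, p in enumerate(grm_category_probs(theta, a, thresholds)))


def wabc(ref_item, focal_item, lo=-4.0, hi=4.0, steps=400):
    """Weighted area between two expected score curves (trapezoid rule).

    Each item is a tuple (a, thresholds). Weights are an unnormalized
    standard normal density, normalized by their own integral.
    """
    h = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        weight = math.exp(-0.5 * theta * theta)
        trap = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        gap = abs(expected_score(theta, *ref_item) - expected_score(theta, *focal_item))
        num += trap * weight * gap
        den += trap * weight
    return num / den


def flag_dif(wabc_value, n_categories):
    """Cutoffs stated in the text: 0.30 (5 categories), 0.24 (4 categories)."""
    cutoffs = {5: 0.30, 4: 0.24}
    return wabc_value > cutoffs[n_categories]
```

For example, `wabc((1.5, [-1.0, 0.0, 1.0, 2.0]), (1.5, [-0.5, 0.5, 1.5, 2.5]))` compares a hypothetical 5-category item whose thresholds are shifted by 0.5 in the focal group, and `flag_dif` applies the category-specific cutoff to the result.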
Results
Sample Characteristics
Table 1 displays the demographic characteristics of both the SSA claimant and general population samples. The SSA claimant sample consisted of 1695 claimants with an average age of 44.7 years; the majority were female (60%), white (67%), and non-Hispanic (90%), and had greater than a high school education (55%). The US working-age adult sample included 2025 participants with an average age of 42 years; the majority were female (55%), white (53%), and non-Hispanic (77%), and had greater than a high school education (64%). Missing data for all item responses were evaluated. “Unscalable” and “I don’t know” responses were treated as missing for calibration purposes. Overall, missing data were below 10%. The “I don’t know” response was selected on average 1.41% of the time in the claimant sample and 1.32% of the time in the general population sample. There was no systematic response pattern to the “I don’t know” responses, so scales were developed using the maximum available data for each scale.
Table 1.
Characteristics of the general population and claimant samples for replenishment of the FAB scales
| Characteristic | SSA claimants (N = 1695), N | SSA claimants, % | General population sample (N = 2025), N | General population sample, % |
|---|---|---|---|---|
| Age, mean (SD) | 44.73 (11.64) | | 42.19 (12.76) | |
| Under 40 | 575 | 33.92 | 913 | 45.09 |
| 40–55 | 693 | 40.88 | 654 | 32.30 |
| 55+ | 423 | 24.96 | 458 | 22.62 |
| Missing | 4 | 0.24 | 0 | 0 |
| Sex | | | | |
| Female | 1023 | 60.35 | 1113 | 54.96 |
| Male | 667 | 39.35 | 912 | 45.04 |
| Missing | 5 | 0.30 | 0 | 0 |
| Race | | | | |
| White | 1136 | 67.02 | 1083 | 53.48 |
| Black/African American | 368 | 21.71 | 412 | 20.35 |
| Other | 176 | 10.38 | 483 | 23.85 |
| Missing | 15 | 0.89 | 47 | 2.32 |
| Hispanic ethnicity | | | | |
| Yes | 147 | 8.67 | 447 | 22.07 |
| No | 1528 | 90.15 | 1561 | 77.09 |
| Refuse | 20 | 1.18 | 17 | 0.84 |
| Education | | | | |
| Less than high school | 252 | 14.87 | 81 | 4.00 |
| High school/GED | 507 | 29.91 | 636 | 31.41 |
| Greater than high school | 933 | 55.04 | 1302 | 64.29 |
| Missing | 3 | 0.18 | 6 | 0.30 |
| General health | | | | |
| Excellent–very good | 97 | 5.72 | 848 | 41.87 |
| Good–fair | 800 | 47.20 | 1011 | 49.92 |
| Poor | 768 | 45.31 | 128 | 6.32 |
| Refused/don’t know | 30 | 1.77 | 38 | 1.88 |
| Mental health | | | | |
| Excellent–very good | 104 | 6.14 | 1139 | 56.24 |
| Good–fair | 857 | 50.56 | 756 | 37.33 |
| Poor | 711 | 41.95 | 93 | 4.59 |
| Refused/don’t know | 23 | 1.36 | 37 | 1.83 |
WD-FAB Structure and IRT Scale Properties
The resulting structure from factor analyses yielded four distinct WD-FAB mental health domains (see Fig. 1): cognition & communication, resilience & sociability, self-regulation, and mood & emotions. With the exception of the mood & emotions scale, all mental health scales represent expanded content areas relative to the original WD-FAB. The Cognition & Communication scale includes items that characterize aspects of function such as organizational skills, attention, following instructions, and oral and written communication. Resilience & Sociability represents a range of content such as handling stress, accomplishing goals, learning from mistakes, and interacting with others. Self-Regulation characterizes attributes of function such as controlling temper, respecting others, following rules, and social appropriateness. Lastly, the mood & emotions scale, which was retained from our previous work, characterizes aspects of a person’s emotional stability, depressive feelings, and anxiety. Sample items for each scale include: Cognition & Communication: Are you able to get your point across when speaking with other people? [Unable to do; Yes, with a lot of difficulty; Yes, with some difficulty; Yes, with a little difficulty; Yes, without difficulty; I don’t know]; Please specify your level of agreement: I can keep up a conversation. [Strongly Disagree, Disagree, Agree, and Strongly Agree, I don’t know]; Self-Regulation: Please specify your level of agreement: I have difficulty following the rules. [Strongly Disagree, Disagree, Agree, and Strongly Agree, I don’t know]; Please specify your level of agreement: When I am stressed, I find myself losing control. [Strongly Disagree, Disagree, Agree, and Strongly Agree, I don’t know]; Resilience & Sociability: Please specify your level of agreement: I usually accomplish what I set out to do. [Strongly Disagree, Disagree, Agree, and Strongly Agree, I don’t know]; Please specify your level of agreement: When there is a problem I am able to work things out with other people. [Strongly Disagree, Disagree, Agree, and Strongly Agree, I don’t know]; and Mood & Emotions: Please specify your level of agreement: I am so tired when I wake up, it’s hard to get going. [Strongly Disagree, Disagree, Agree, and Strongly Agree, I don’t know]; Please specify your level of agreement: I dwell on my problems. [Strongly Disagree, Disagree, Agree, and Strongly Agree, I don’t know].
The factor structure and item fit that emerged in the claimant sample were successfully replicated in the general population sample (factor loadings for both samples are available in Supplemental Table A1). This co-calibration allows us to provide score comparisons on the same metric for the claimant sample vs. the general population sample. Results from the CFA indicated acceptable fit across all mental health subdomains in both samples, with all scales meeting the RMSEA ≤ 0.08 and CFI and TLI ≥ 0.9 criteria (see Table 2).
Table 2.
Comparison of CFA result for SSA claimant sample and US general population sample
| Fit index | Cognition & communication, claimant sample | Cognition & communication, US general population sample | Self-regulation, claimant sample | Self-regulation, US general population sample | Resilience & sociability, claimant sample | Resilience & sociability, US general population sample | Mood & emotions, claimant sample | Mood & emotions, US general population sample |
|---|---|---|---|---|---|---|---|---|
| CFI | 0.980 | 0.904 | 0.911 | 0.911 | 0.912 | 0.921 | 0.947 | 0.947 |
| TLI | 0.0976 | 0.901 | 0.917 | 0.095 | 0.907 | 0.916 | 0.942 | 0.942 |
| RMSEA | 0.042 | 0.064 | 0.068 | 0.017 | 0.072 | 0.067 | 0.079 | 0.079 |
Table 3 presents results from IRT analyses comparing the full item bank and CAT simulations. WD-FAB scores are IRT-based standardized T-scores referenced to the general population sample (the norm-reference sample), with a mean of 50 and a standard deviation of 10. This allows scores for SSA claimants to be compared with those of an age- and gender-matched working-age adult in the US general population. Higher scores on each scale indicate higher mental health function. Based on these results, the greatest differences between the claimant and general population samples were seen in the cognition & communication and mood & emotions scales, where claimant scores are on average more than 1 SD below the average general population score. The claimant Self-Regulation and Resilience & Sociability scale scores also demonstrated lower functioning (slightly less than 1 SD) compared to the general population sample. Ceiling effects were less than 15% for all scales. Figure 2a–d illustrates the differences in the score distributions between the claimant and normative samples for each scale. The marginal reliability of the CATs vs. the full item bank was >0.89, and correlations between the CAT simulations and the full item bank were >0.95.
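The T-score metric described above is a linear rescaling of the IRT trait estimate against the norm-reference sample. A minimal Python sketch follows; the function names are hypothetical, and the default norm mean and SD of 0 and 1 assume the trait estimate is already expressed on the general population metric.

```python
def to_t_score(theta, norm_mean=0.0, norm_sd=1.0):
    """Rescale an IRT trait estimate so the norm sample has mean 50, SD 10."""
    return 50.0 + 10.0 * (theta - norm_mean) / norm_sd


def sd_units_below_norm(t_score):
    """Distance below the norm-reference mean, in norm SD units."""
    return (50.0 - t_score) / 10.0
```

Applied to the claimant CAT means reported in Table 3, `sd_units_below_norm(38.8)` gives about 1.12 SD for Cognition & Communication, and `sd_units_below_norm(42.7)` gives about 0.73 SD for Self-Regulation.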
Table 3.
Descriptive statistics for simulated CATs and full item bank for claimants based on general population data, by content subdomain
| Subdomain | Mode | N* | Mean (SD) | Ceiling, n (%) | Floor, n (%) |
|---|---|---|---|---|---|
| Cognition & communication | 5–10 item CAT | 1692 | 38.8 (6.78) | 4 (0.24%) | 0 |
| | Full item bank | 1692 | 38. (7.04) | 4 (0.24%) | 0 |
| | Full item bank (general sample) | 2005 | 50.0 (10.2) | 158 (7.88%) | 2 (0.1%) |
| Self-regulation | 5–10 item CAT | 1413 | 42.7 (9.57) | 1 (0.07%) | 0 |
| | Full item bank | 1413 | 42.7 (10.6) | 0 | 0 |
| | Full item bank (general sample) | 2020 | 49.5 (11.94) | 95 (4.7%) | 17 (0.84%) |
| Resilience & sociability | 5–10 item CAT | 1409 | 43.4 (10.21) | 9 (0.64%) | 0 |
| | Full item bank | 1409 | 43.3 (10.44) | 8 (0.57%) | 0 |
| | Full item bank (general sample) | 1989 | 49.5 (12.14) | 258 (12.97%) | 3 (0.15%) |
| Mood & emotions | 5–10 item CAT | 1015 | 35.0 (11.91) | 0 | 5 (0.49%) |
| | Full item bank | 1015 | 34.9 (12.33) | 0 | 3 (0.13%) |
| | Full item bank (general sample) | 1000 | 49.9 (10.25) | 3 (0.3%) | 0 |
*Sample size varies due to “I don’t know” responses. Each scale was developed using the maximum amount of complete data for each domain
Fig. 2.

Score distribution of claimant and general population samples. a cognition & communication, b self-regulation, c resilience & sociability, d mood & emotions. Note The dark grey represents the SSA claimant sample, the light grey represents the US general sample, and the middle grey represents overlap in these two samples. The distributions are centered on the US general sample mean at 50 ± 10 SD
Lastly, the DIF analysis revealed a minimal degree of DIF by demographic characteristics within both samples. In the claimant sample, there was DIF by age (Self-Regulation: 1 item), gender (Cognition & Communication: 1 item, Self-Regulation: 1 item, Mood & Emotions: 2 items), and race (Cognition & Communication: 1 item, Self-Regulation: 1 item). Even less DIF was observed in the general population sample: DIF by age (Interpersonal Interaction: 1 item, Mood & Emotions: 1 item) and gender (Mood & Emotions: 1 item). Our approach was to retain these items in the item pools and calibrate them separately based on the resultant age, gender, or racial DIF. No DIF was observed between the calibration samples from our previous work and the current WD-FAB, indicating that items from our previous work were successfully linked with the results of this WD-FAB expansion study.
Discussion
In this study, we successfully expanded the mental health content in the WD-FAB to develop a tool that could comprehensively characterize mental health function related to work. Consistent with previous work, this study demonstrates the utility of developing health outcomes measures that are built upon a clear conceptual framework and use IRT methods. We were able to maintain and expand item banks assessing mental health function important to work and to add new items that allow the characterization of four relevant domains: cognition & communication, resilience & sociability, self-regulation, and mood & emotions. We simultaneously increased the precision and mental health content coverage of the WD-FAB and paired these gains with complementary results from the development of the physical domains [4, 17, 18]. The resulting WD-FAB represents an innovative tool that advances the field of disability assessment both conceptually and psychometrically.
In terms of psychometric properties, all scales yielded good precision and accuracy across the potential score distribution range. Additionally, the CAT simulations indicate strong correlations between the 5- and 10-item CATs and the full item bank. These results support the notion that using IRT/CAT is a feasible approach to capturing a broad array of function without administering all items for each scale [33]. Each scale also demonstrated differences between the SSA claimant and general population samples in the expected direction, reflecting the WD-FAB’s utility in describing claimant mental health functioning. However, there is more overlap between the claimant and general population distributions for the Self-Regulation and Resilience & Sociability scales than for the other mental health subdomains. Even with this overlap, the difference between the claimant sample distribution and the general adult sample distribution approaches 1 SD, which is substantial. One potential reason for the overlap is that these two content areas may be more complex and subject to greater variation at baseline among the general working-age adult population as compared to more objective domains such as Cognition & Communication and Mood & Emotions. Overall, the WD-FAB scales characterize four important domains of mental health that can affect a person’s ability to successfully function in the workplace.
The WD-FAB is novel in the use of item response theory (IRT) and computer adaptive testing (CAT). IRT and CAT are contemporary methodologies in measurement scale development that allow efficient, precise measurement of complex, multifactorial aspects of health and functioning [34]. Specifically, IRT is used to calibrate an item pool, which is then administered through computer adaptive test (CAT). IRT is a model that describes the association between a respondent’s underlying level on a latent trait and the probability of a particular item response using a nonlinear monotonic function. The primary assumptions for using IRT are unidimensionality (only 1 construct is measured by the items in a scale) and local independence (items are uncorrelated with each other when the latent trait or traits have been controlled for) [34]. As a result, IRT arranges the items by difficulty levels for each unidimensional scale. Using CAT algorithms, items are administered at the trait level of the respondent. Similar methodologies have been used for decades in educational settings in the administration of standardized tests such as the SAT and GRE. The IRT methods create an instrument that can characterize a person’s functional status along multiple domains of function (i.e., scales) rather than being constrained to a single domain [34]. Ultimately this approach results in an efficient, standardized, and comprehensive way to characterize the profile of a person’s functional level for a given construct—in our case, work related mental health function.
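To illustrate how a CAT can target items to a respondent's trait level, the sketch below computes graded response model item information and picks the unadministered item that is most informative at the current trait estimate. Maximum-information selection is a common CAT rule, but the text does not state which selection rule the WD-FAB uses, so this is an assumption; the item names and parameters in the usage note are invented.

```python
import math


def cumulative_probs(theta, a, thresholds):
    """GRM cumulative curves P(X >= k), padded with 1 and 0 at the ends."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]


def item_information(theta, a, thresholds):
    """Fisher information of a graded-response item at trait level theta."""
    cum = cumulative_probs(theta, a, thresholds)
    # Derivative of each cumulative curve with respect to theta (0 at the ends).
    slope = [a * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]  # probability of category k
        if p_k > 0.0:
            info += (slope[k] - slope[k + 1]) ** 2 / p_k
    return info


def next_item(theta, bank, administered):
    """Return the name of the most informative item not yet administered.

    bank: dict mapping item name -> (a, thresholds).
    Raises ValueError if every item has already been administered.
    """
    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=lambda name: item_information(theta, *bank[name]))
```

With a hypothetical bank such as `{"easy": (1.5, [-2.5, -2.0, -1.5]), "hard": (1.5, [1.5, 2.0, 2.5])}`, a respondent with a low trait estimate (theta = -2) would receive the "easy" item first, because its thresholds lie closest to that trait level.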
We know of no other instrument of this type, breadth, or efficiency being used in other social welfare systems internationally. Some countries (such as the UK) have relied on in-person clinical assessments to systematically quantify residual functional abilities. However, such assessments are both lengthy and costly and are therefore not practical to implement in countries with a high volume of applications annually (such as the U.S.). All social security programs are challenged with linking functional capacity with workplace demand. The approaches used to make this linkage vary among national programs and are influenced by programmatic demand and available resources. The lack of consensus in identifying a gold standard coupled with variable reliability among assessing professionals supports more systematic and comprehensive approaches to assessment processes. Thus, the WD-FAB could potentially be a useful tool to inform disability determination processes both within and outside of the U.S.
Challenges and Limitations
The WD-FAB instrument offers several psychometric and conceptual advancements in measuring aspects of mental health in the context of work; however, a few limitations should be noted. First, although the WD-FAB allows for characterization of four distinct dimensions of mental health function, other important aspects of a person’s ability to work, such as environmental factors and job demands, should be taken into account when assessing the full spectrum of a person’s potential ability to work. Second, this research represents the final stages of instrument development. Additional research on the completed instrument should be conducted to evaluate psychometric properties such as test–retest reliability, criterion validity, and predictive validity. Additionally, future work examining how individual mental health function, as characterized by the WD-FAB, may be linked to job demands may help narrow the gap between the individual and environmental characteristics involved in assessing a person’s potential ability to participate in work. We did not intend for the WD-FAB to determine eligibility for disability benefits or their level. Rather, if the WD-FAB functional profiles are examined in the context of work demands, they can suggest the potential fit with certain jobs. Thus, the WD-FAB is well suited for use in countries focused on re-integration into work, since it would suggest areas where functional abilities would need to be enhanced to optimize performance at certain jobs. Alternatively, it could suggest jobs where the fit with current (residual) abilities is good and the likelihood of successful return to work is high. Future work on implementing the WD-FAB in disability programs should be conducted to identify the optimal way in which the WD-FAB may inform decision-making processes related to a person’s ability to return to work.
Clinical and Policy Applications
This work represents significant psychometric and conceptual advancement in the area of assessment related to work in several ways. First, instruments developed using IRT are considered “dynamic” instruments. The WD-FAB’s use of IRT/CAT methods will allow for continual updating and improvement over time. Updated forms of the WD-FAB can be created from the existing item bank while maintaining the underlying scale of measurement. This feature of IRT methods allows for future WD-FAB versions to be comparable with earlier versions.
Best practice in work disability assessment is constantly changing. A tool that can be updated to reflect current scientific and conceptual views on work-related disability has the potential to be a valuable resource for a wide array of stakeholders who are interested in systematically and efficiently assessing work-related mental function. Practically, the WD-FAB generates profiles along several key dimensions of mental health function important for work. The efficiencies gained by CAT administration, together with the preserved breadth of content coverage, allow for potential policy-relevant and clinical applications of the WD-FAB. The WD-FAB’s representation of work disability from a functional perspective is consistent with contemporary notions of disability [3, 5]. The WD-FAB represents a promising method to systematically and efficiently characterize an array of important mental health functioning factors that relate to a person’s ability to work. From initial screening to re-evaluation of disability status, there is clear potential for the WD-FAB to play an important role in improving disability assessment both within the context of the SSA disability programs and in other clinical and vocational rehabilitation settings. The WD-FAB could be informative for guiding clinical care targeting return-to-work interventions and could assist vocational rehabilitation counselors in identifying areas of strength and weakness as they match underlying functional ability with the potential job demands of the work environment. The WD-FAB is a tool that can provide data to support and improve decision making around work disability assessment.
Acknowledgements
This study was supported by Social Security Administration-National Institutes of Health Interagency Agreements under the National Institutes of Health (contract nos. HHSN269200900004C, HHSN269201000011C, HHSN269201100009I, HHSN269201200005C), and by the National Institutes of Health Intramural Research Program.
Footnotes
Electronic supplementary material: The online version of this article (doi:10.1007/s10926-017-9710-5) contains supplementary material, which is available to authorized users.
Conflict of interest: The authors declare they have no conflict of interest to disclose.
Ethical approval: All procedures performed were in accordance with the ethical standards of the University research committee and with the 1964 Helsinki declaration and its later amendments and standards.
Informed consent: All subjects provided informed consent prior to participating in any study activities.
References
1. Krahn GL, Walker DK, Correa-De-Araujo R. Persons with disabilities as an unrecognized health disparity population. Am J Public Health. 2015;105(Suppl 2):S198–S206.
2. Social Security Administration. Annual statistical report on the social security disability insurance program. Baltimore, MD: Social Security Administration; 2014. Available at https://www.ssa.gov/policy/docs/statcomps/di_asr/. Accessed 14 Dec 2015.
3. Marfeo EE, Eisen S, Ni P, Rasch EK, Rogers ES, Jette A. Do claimants over-report behavioral health dysfunction when filing for work disability benefits? Work. 2015;51(2):187–194.
4. Marfeo EE, Haley SM, Jette AM, Eisen SV, Ni P, Bogusz K, Meterko M, McDonough CM, Chan L, Brandt DE, Rasch EK. Conceptual foundation for measures of physical function and behavioral health function for social security work disability evaluation. Arch Phys Med Rehab. 2013;94(9):1645–1652.
5. Brandt DE, Houtenville AJ, Huynh MT, Chan L, Rasch EK. Connecting contemporary paradigms to the social security administration’s disability evaluation process. J Disab Pol Stud. 2011;22(2):116–128.
6. Mathiowetz N, Wunderlich GS, eds. Institute of Medicine (US) and National Research Council (US) Committee to Review the Social Security Administration’s Disability Decision Process Research. Washington (DC): National Academies Press (US); 2000.
7. Escorpizo R, Gmünder HP, Stucki G. Introduction to special section: advancing the field of vocational rehabilitation with the international classification of functioning, disability and health (ICF). J Occup Rehab. 2011;21(2):121–125.
8. Marfeo EE, Ni P, Chan L, Rasch EK, McDonough CM, Brandt DE, Bogusz K, Jette AM. Interpreting physical and behavioral health scores from new work disability instruments. J Rehab Med. 2015;47(5):394–402.
9. Marfeo EE, Ni P, Haley SM, Bogusz K, Meterko M, McDonough CM, Chan L, Rasch EK, Brandt DE, Jette AM. Scale refinement and initial evaluation of a behavioral health function measurement tool for work disability evaluation. Arch Phys Med Rehab. 2013;94(9):1679–1686.
10. Marfeo EE, Ni P, Haley SM, Jette AM, Bogusz K, Meterko M, McDonough CM, Chan L, Brandt DE, Rasch EK. Development of an instrument to measure behavioral health function for work disability: item pool construction and factor analysis. Arch Phys Med Rehab. 2013;94(9):1670–1680.
11. Marino ME, Meterko M, Marfeo EE, McDonough CM, Jette AM, Ni P, Bogusz K, Rasch EK, Brandt DE, Chan L. Work-related measures of physical and behavioral health function: test-retest reliability. Disab Health J. 2015;8(4):652–657.
12. Meterko M, Marfeo EE, McDonough CM, Jette AM, Ni P, Bogusz K, Rasch EK, Brandt DE, Chan L. Work disability functional assessment battery: feasibility and psychometric properties. Arch Phys Med Rehab. 2015;96(6):1028–1035.
13. Franche RL, Krause N. Readiness for return to work following injury or illness: conceptualizing the interpersonal impact of health care, workplace, and insurance factors. J Occup Rehab. 2002;12(4):233–256.
14. Saunders SL, Nedelec B. What work means to people with work disability: a scoping review. J Occup Rehab. 2014;24(1):100–110.
15. Escorpizo R, Stucki G. Disability evaluation, social security, and the international classification of functioning, disability and health: the time is now. J Occup Environ Med. 2013;55(6):644–651.
16. Rivers D. Sample matching: representative sampling from internet panels. A white paper on the advantages of the sample matching methodology. Palo Alto, CA: YouGovPolymetrix; 2002.
17. McDonough CM, Jette AM, Ni P, Bogusz K, Marfeo EE, Brandt DE, Chan L, Meterko M, Haley SM, Rasch EK. Development of a self-report physical function instrument for disability assessment: item pool construction and factor analysis. Arch Phys Med Rehab. 2013;94(9):1653–1660.
18. Ni P, McDonough CM, Jette AM, Bogusz K, Marfeo EE, Rasch EK, Brandt DE, Meterko M, Haley SM, Chan L. Development of a computer-adaptive physical function instrument for social security administration disability determination. Arch Phys Med Rehab. 2013;94(9):1661–1669.
19. Floyd FJ, Widaman KF. Factor analysis in the development and refinement of clinical assessment instruments. Psychol Assess. 1995;7(3):286–299.
20. Brown TA. Confirmatory factor analysis for applied research. 2nd ed. New York, NY: Guilford Publications; 2015.
21. Hu LT, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equat Model. 1999;6(1):1–55.
22. Steiger JH. Structural model evaluation and modification: an interval estimation approach. Multivar Behav Res. 1990;25(2):173–180.
23. Gibbons RD, Bock RD, Hedeker D, Weiss DJ, Segawa E, Bhaumik DK, Kupfer DJ, Frank E, Grochocinski VJ, Stover A. Full-information item bifactor analysis of graded response data. Appl Psychol Measur. 2007;31(1):4–19.
24. Edelen MO, Reeve BB. Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement. Qual Life Res. 2007;16(1):5–18.
25. Orlando M, Thissen D. Further investigation of the performance of S-X2: an item fit index for use with dichotomous item response theory models. Appl Psychol Measur. 2003;27(4):289–298.
26. Raju NS, Van der Linden WJ, Fleer PF. IRT-based internal measures of differential functioning of items and tests. Appl Psychol Measur. 1995;19(4):353–368.
27. Woods CM, Cai L, Wang M. The Langer-improved Wald test for DIF testing with multiple groups: evaluation and comparison to two-group IRT. Educ Psychol Measur. 2013;73(3):532–547.
28. Cai L, Du Toit S, Thissen D. IRTPRO: flexible, multidimensional, multiple categorical IRT modeling [Computer software]. Chicago, IL: Scientific Software International; 2011.
29. Edelen MO, Stucky BD, Chandra A. Quantifying ‘problematic’ DIF within an IRT framework: application to a cancer stigma index. Qual Life Res. 2015;24(1):95–103.
30. Cai L, du Toit S, Thissen D. IRTPRO: flexible, multidimensional, multiple categorical IRT modeling. Chicago, IL: Scientific Software International; 2011.
31. Bjorner J, Smith K, Stone C, Sun X. IRTFIT: a macro for item fit and local dependence tests under IRT models. Lincoln, RI: QualityMetric Incorporated; 2007.
32. SAS Institute. SAS 9.1.3. Cary, NC: 2004.
33. Quinn H, Thissen D, Liu Y, Magnus B, Lai JS, Amtmann D, Varni JW, Gross HE, DeWalt DA. Using item response theory to enrich and expand the PROMIS® pediatric self report banks. Health Qual Life Outcomes. 2014;12(1):160–170.
34. Hays RD, Morales LS, Reise SP. Item response theory and health outcomes measurement in the 21st century. Med Care. 2000;38(9 Suppl):II28–II42.
