Abstract
Aims:
Well-designed score reports can support therapists in accurately interpreting assessments. We piloted a score report for the Pediatric Evaluation of Disability Inventory-Patient Reported Outcome (PEDI-PRO) and evaluated: 1) To what extent can occupational and physical therapists (OTs, PTs) accurately interpret item response theory (IRT)-based PEDI-PRO assessment results? 2) What is the perceived clinical utility of the pilot score report?
Methods:
Exploratory, sequential mixed methods design. Focus groups with OTs and PTs (n = 20) informed the development of the final score report; revisions were made in response to feedback. Next, OTs and PTs (n = 33) reviewed score reports for two fictional clients and answered survey questions about the interpretation of the PEDI-PRO results. Additional questions evaluated clinical utility.
Results:
Focus groups: Visual cues supported score interpretation, but therapists requested additional explanations for advanced IRT measurement concepts. Survey: Therapists accurately interpreted foundational IRT concepts (e.g., identifying most/least difficult items, highest scores), but were less accurate when interpreting advanced concepts (e.g., fit, unexpected responses). Therapists anticipated sharing different components of the score report with family members, clinicians, and payers to support their clinical practice.
Conclusions:
The pilot PEDI-PRO score report was highly endorsed by therapists, but therapists may need additional training to interpret advanced IRT concepts.
Keywords: assessment interpretation, item response theory, patient reported outcome measures
Patient-reported outcome measures (PROMs) are an essential component of client-centered care (Basford & Cheville, 2022; Mroz et al., 2015; Rapport et al., 2014). Through PROMs, clients can report their subjective experience of a range of health-related constructs, such as mental wellbeing, quality of life, pain, participation, and function. Development and adoption of PROMs have increased over the past decade (Basford & Cheville, 2022; Churruca et al., 2021). In the United States, this may be driven in part by the Affordable Care Act, which mandated the inclusion of patient-reported outcomes (Mroz et al., 2015). However, barriers to the use of PROMs persist.
One barrier is occupational therapists’ (OTs’) and physical therapists’ (PTs’) confidence and ability to interpret PROM results (Foster et al., 2018; Santana et al., 2015). Recent research has endeavored to facilitate clinicians’ abilities to interpret PROM results by evaluating the types of score reports, or data displays, that facilitate accurate interpretations. Findings suggest that graphical displays of results and use of color (e.g., green for within typical level of functioning) can facilitate interpretation at both single time points and over time (Bantug et al., 2016; Brundage et al., 2015; Snyder et al., 2017). Clinicians also demonstrate a preference for score reports that utilize color and other visual markers (e.g., circles) for clinically relevant information, such as low or concerning scores (Bantug et al., 2016; Brundage et al., 2015; Snyder et al., 2017; van Muilekom et al., 2021).
In some cases, interpretation of PROMs can be further complicated by the measurement model utilized. Newly developed PROMs and computer-administered PROMs are increasingly constructed and scored using item response theory (IRT). IRT, broadly, is an approach to measurement in which each item on a PROM reflects a unique level of difficulty or skill of the construct being assessed, with clients’ final scores associated with a location along the continuum assessed (Velozo et al., 2012). This contrasts with PROMs that utilize classical test theory, in which each item is hypothesized to equally represent the construct assessed and scores are derived by summing responses to represent the total amount of the construct present (or absent). When well-constructed, the use of IRT models in PROMs can result in a more precise score and can inform intervention planning by supporting therapists to identify the next skills that must be developed to make progress on the underlying construct. However, these more complex IRT measurement models can pose additional interpretation challenges for therapists, because assessment results can include unfamiliar information, such as both item- and respondent-fit statistics and reports of skipped responses. Further, many therapists may not be prepared to interpret IRT-based PROMs, as interpretation of IRT-based assessments is not a specific educational standard for either entry-level OT or PT educational curricula (Accreditation Council for Occupational Therapy Education (ACOTE), 2018; Commission on Accreditation in Physical Therapy Education, 2024). Therefore, PROM developers must identify strategies that support clinicians to efficiently and accurately interpret PROM results, regardless of their prior understanding of IRT.
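To make this contrast concrete, the dichotomous Rasch model (one simple IRT model) expresses the probability that respondent $n$ succeeds on item $i$ as a function of the difference between the respondent's ability $\theta_n$ and the item's difficulty $b_i$:

```latex
% Dichotomous Rasch model (shown in its simplest form; rating-scale
% items use a polytomous extension of the same idea)
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

Because $\theta_n$ and $b_i$ lie on the same continuum, a client's score is a location among the item difficulties, which is what allows an item map to display scores and items together and to point toward the next skills to target.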
We developed and evaluated a pilot score report for the Pediatric Evaluation of Disability Inventory-Patient Reported Outcome (PEDI-PRO). The PEDI-PRO is a computer-based assessment of functional performance designed to be accessible for youth and young adults ages 14–22 with intellectual and developmental disabilities (Kramer & Schwartz, 2017, 2018; Schwartz et al., 2021). The PEDI-PRO assesses three of the same functional domains as the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test (PEDI-CAT): Daily Activities, Social/Cognitive, and Mobility (Haley et al., 2011), an assessment widely used by pediatric OTs and PTs. In contrast to the PEDI-CAT and other measures of functional performance, in which items are administered by domains, the PEDI-PRO organizes the administration of items by familiar and important everyday life situations (e.g., working at a job, cooking, playing sports and exercising). Items within each everyday life situation are automatically mapped by the PEDI-PRO software onto the three functional domains to generate IRT-based domain scores (Kramer et al., 2021). Initial testing of the PEDI-PRO items demonstrated that the items are interpreted as intended (Kramer & Schwartz, 2017) and are reliable, socially valid, and perceived as usable by therapists (Kramer et al., 2021). The PEDI-PRO is scored according to the Rasch model, a specific IRT model.
We developed a pilot score report specifically designed to support therapists’ interpretation of the PEDI-PRO results. The purpose of this study was to examine if our score report would support use of the PEDI-PRO, given its unique conceptual measurement framework and use of an IRT-scoring approach. Our research questions were: 1) What design features support OTs’ and PTs’ interpretation of the PEDI-PRO score report? 2) To what extent can OTs and PTs accurately interpret IRT-based PEDI-PRO assessment results using the pilot score report? 3) What is the perceived clinical utility of the pilot score report?
Methods
Design
We used an exploratory, sequential mixed methods design (Creswell & Creswell, 2017; Leech & Onwuegbuzie, 2009), in which the qualitative and quantitative phases had equal priority and influence on the design of the score report. First, we used qualitative focus groups with OTs and PTs to obtain feedback on and refine the design of the PEDI-PRO pilot score report. Second, we administered an online survey to evaluate the perceived clinical utility of the revised pilot score report and to evaluate the extent to which the score report could be used by therapists to accurately interpret PEDI-PRO assessment results. The survey results were analyzed using a quantitative approach. All procedures were approved by the University of Florida IRB and all participants provided informed consent.
Participants
OTs and PTs were recruited across the United States using professional contacts, organizations serving people with disabilities, schools/school personnel, fairs/conferences/symposiums, professional organizations, and public records of licensed PTs and OTs in Florida. OTs and PTs were invited to participate in the study if they: 1) worked with youth (age 14–22 years) with intellectual and developmental disabilities, 2) held a license in either OT or PT, 3) had worked in their OT/PT role for at least 1 year, and 4) worked in school settings or inpatient/outpatient rehabilitation settings. Therapists were excluded if they: 1) were unable to understand or communicate in English, or 2) did not have experience using assessments that produce standardized scores.
Focus Groups
Twenty therapists participated in focus groups (9 OTs, 11 PTs; all female; Table 1). Focus group participants had an average of 23.3 years (SD = 14.5) experience working in their discipline and 22.1 years (SD = 12.9) with youth and young adults with intellectual and developmental disabilities. Most clinicians identified as White and non-Hispanic (85% White, 5% Asian/Pacific Islander, 5% Multiracial, 5% preferred not to answer; 95% not Hispanic/Latino, 5% preferred not to answer), similar to the overall workforce (American Physical Therapy Association, 2020; Banks, 2022).
Table 1.
Participant Professional Experience
| Professional Experience | Focus Group (n = 20) | Survey (n = 33) |
|---|---|---|
| Education | | |
| Undergraduate | 2 (10%) | 6 (18.18%) |
| Master’s degree | 9 (45%) | 13 (39.39%) |
| Clinical Doctoral Degree | 8 (40%) | 12 (36.36%) |
| Research Doctoral Degree | 1 (5%) | 2 (6.06%) |
| Experience with child ages | | |
| Early intervention (0–5) | 17 (85%) | 27 (81.82%) |
| Elementary school age (K–5) | 20 (100%) | 30 (90.90%) |
| Middle school age (6–8) | 20 (100%) | 31 (93.94%) |
| High school age (9–12) | 20 (100%) | 32 (96.97%) |
| Youth over 18 or post-diploma | 18 (90%) | 28 (84.85%) |
| Practice setting | | |
| Regular school settings | 18 (90%) | 15 (45.45%) |
| Outpatient Rehabilitation | 16 (80%) | 19 (57.57%) |
| Community-based clinic | 12 (60%) | 12 (36.36%) |
| Specialized school settings | 11 (55%) | 19 (57.57%) |
| Inpatient Hospital | 4 (20%) | 4 (12.12%) |
| Other | 2 (10%) | 6 (18.18%) |
| Frequency administering questionnaires with youth with IDD | | |
| Often | 5 (25%) | 6 (18.18%) |
| Sometimes | 11 (55%) | 16 (48.48%) |
| Never or rarely | 4 (20%) | 11 (33.33%) |
| Frequency administering computer-based assessments | | |
| Often | 6 (30%) | 6 (18.18%) |
| Sometimes | 11 (55%) | 18 (54.55%) |
| Never or rarely | 3 (15%) | 9 (27.27%) |
Survey
Thirty-three therapists participated in the survey (18 OTs, 15 PTs; 32 female, 1 male; 6 had also participated in the focus groups; Table 1). The therapists who completed the survey had an average of 21.2 years (SD = 14.1) of experience working in their discipline and 12 years (SD = 17.1) of experience working with youth and young adults with intellectual and developmental disabilities. Most clinicians identified as White and non-Hispanic (81.82% White, 3.03% Asian/Pacific Islander, 9.1% Multiracial, 3.0% Black or African American, 3.0% preferred not to answer; 90.9% non-Hispanic/Latino, 6.06% Hispanic or Latino, 3.0% preferred not to answer).
Table 2.
PEDI-PRO Score Report Sections and Design Features
| PEDI-PRO Report Section | Description | Design Features to Support Interpretation |
|---|---|---|
| Cover page | • Describes PEDI-PRO purpose and intended age range • Guidance to interpret and use PEDI-PRO response categories, as informed by previous research | • All information on one page reduces the need to refer to a separate assessment manual • Each response choice is consistently associated with a different shade of blue throughout the report to support interpretation |
| Summary report | • Domain scaled scores, 95% confidence intervals, fit, and number of items skipped for each domain • Brief information describing each domain; how to interpret criterion scaled scores; confidence intervals; fit; and skipped items | • All information on one page reduces the need to refer to a separate assessment manual • Warning icon appears when fit scores for a domain are not acceptable |
| Item responses by domain / Item responses by everyday life situation | • Responses to each individual item (very easy, a little easy, a little hard, skip), organized by both domain and everyday life situation | • Each response choice is consistently associated with a different shade of blue throughout the report to support interpretation |
| Item maps | • Item maps for each domain, depicting the relative difficulty of each item from those requiring less to more functional ability • The scaled score with 95% confidence interval is overlaid on the item map • Client responses are circled in red on the item maps | • Each response choice is consistently associated with a different shade of blue throughout the report to support interpretation |
| Change report summary table | • A summary table directly comparing domain scores, 95% CI, and fit from two administrations | • An up or down arrow indicates change outside of the 95% confidence interval |
| Change in item by domain | • Responses to each individual item in each domain from two administrations | • Each response choice is consistently associated with a different shade of blue throughout the report to support interpretation |
| Change graph | • Bar graphs depicting each domain scaled score at each administration | • Parallel lines representing the 95% confidence interval at the first administration are overlaid on the second administration bar graph to identify change outside of the 95% confidence interval |
Procedures
PEDI-PRO Pilot Score Report
The PEDI-PRO pilot score report is 8 pages long, comprises 7 sections, and incorporates several design features hypothesized to support clinicians in interpreting assessment results (Table 2; see also the Appendix for an example of a summary report). These features included color coding, a warning icon for out-of-range values (poor fit), and arrows to indicate significant change, as informed by previous literature regarding the importance of visual cues for data interpretation (Bantug et al., 2016; Brundage et al., 2015; Rothrock et al., 2020; Snyder et al., 2017) and principles of health literacy (Jacobson & Parker, 2014).
Focus Groups
Three focus groups were held via Zoom (5–9 clinicians/group) and lasted approximately 1.5 hours. During the focus groups, therapists viewed an example score report, with different options for layout, colors, symbols, and report wording. Therapists qualitatively indicated preferences and anticipated challenges when interpreting the different score reports, and suggested alternative wording, layouts, and formats.
Survey
Therapists received a Qualtrics link for the survey and a PDF of score reports for two fictional cases for reference during the survey. Therapists were asked to view a three-minute video introducing the PEDI-PRO assessment format and administration. Cases were designed to reflect the types of youth and young adult clients appropriate for the PEDI-PRO. “David” was a 15-year-old with Down syndrome whose mother wanted him to gain independence in daily routines, safely navigate his environment, and socialize in the community. “Lucia” was an 18-year-old with autism planning to attend a post-secondary program and live in the dorm, which requires her to independently complete self-care and manage two meals a day.
Multiple choice and short answer questions were developed to assess foundational and advanced skills in interpreting IRT-based assessments. Foundational skills were those required for correct interpretation of domain criterion scores at one and multiple time points. Advanced skills support a more in-depth understanding of functional skills and facilitate effective goal identification but are not required to correctly use the PEDI-PRO (see Results for more detail on foundational and advanced skills). Each skill was assessed in both cases, using parallel question and choice wording across cases to ensure case equivalence. After both cases were completed, clinicians reported their perceived level of confidence interpreting different components of PEDI-PRO assessment results (e.g., domain scores, confidence intervals, fit) using a Likert response scale. Therapists also reported the perceived clinical utility of the score report and potential barriers to usability in practice.
Data Analyses
Focus Groups
We transcribed and de-identified focus group recordings. Two team members independently coded transcripts following a three-level codebook: 1) classify the PEDI-PRO score report section referenced (i.e., summary table, item response by domain, item map); 2) classify the content of the feedback (e.g., aesthetics, interpreting fit); 3) classify the recommendation, if any (i.e., add, revise, or remove). The coders discussed discrepancies and revised the codebook and coding until consensus was reached. Patterns in code levels were reviewed inductively (reading all quotes coded the same) and deductively (counting frequencies of codes, cross-tabulating codes) to identify revisions (Creswell & Creswell, 2017). These revisions were implemented prior to survey administration.
Survey
Survey questions were categorized into foundational and advanced IRT interpretation skills (Table 3). Responses to multiple choice questions were scored as correct (1) or incorrect (0) and organized by specific skills. For the two skills that had multiple correct responses (see Table 3), responses were scored as correct if participants identified at least one correct response and no incorrect responses. We calculated the number of therapists who had 100% correct responses for each skill, both within each case study and across both cases (the total number of possible responses per skill ranged from 2 to 6). Frequencies were calculated for post-case survey questions about confidence and clinical utility.
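The multi-answer scoring rule described above can be sketched in a few lines (an illustrative sketch; the function and response names are ours, not part of the study materials):

```python
def score_skill(selected, correct_options):
    """Score a multi-answer survey question: correct (1) only if the
    participant chose at least one correct option and no incorrect
    options; otherwise incorrect (0)."""
    selected, correct_options = set(selected), set(correct_options)
    chose_a_correct_option = bool(selected & correct_options)
    chose_an_incorrect_option = bool(selected - correct_options)
    return 1 if chose_a_correct_option and not chose_an_incorrect_option else 0

# Hypothetical responses to a question with two correct options:
print(score_skill({"response A"}, {"response A", "response B"}))                # 1
print(score_skill({"response A", "response C"}, {"response A", "response B"}))  # 0
```

Under this rule a partially complete answer still counts as correct, but any incorrect selection makes the whole response incorrect.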
Table 3:
IRT-based score report interpretation: Percentage of correct responses
| Skill | Mode Correct/# Questions, Case 1 | Mode Correct/# Questions, Case 2 | % with 100% Correct (n), Case 1 (n=33) | % with 100% Correct (n), Case 2 (n=32) | % with 100% Correct (n), Across Both (n=32) |
|---|---|---|---|---|---|
| Foundational IRT Interpretation Skills | | | | | |
| Identify highest and lowest domain scaled score | 2/2 | 2/2 | 97.0% (32) | 100% (32) | 96.9% (31) |
| Identify 1 or more skip responses | 1/1 | 1/1 | 97.0% (32) | 100% (32) | 96.9% (31) |
| Understand how to interpret scaled score confidence interval | 1/1 | 1/1 | 78.8% (26) | 90.6% (29) | 75.0% (24) |
| Identify domain scaled scores with poor fit | 3/3 | 3/3 | 63.6% (21) | 71.9% (23) | 53.1% (17) |
| Identify significant change in domain scaled scores | 1/1 | 1/1 | 81.8% (27) | 75.0% (24) | 65.6% (21) |
| Understand how to interpret criterion-referenced scaled scores | 1/1 | 1/1 | 57.6% (19) | 71.9% (23) | 53.1% (17) |
| Identify most & least difficult items using the item map | 2/2 | 2/2 | 84.9% (28) | 90.6% (29) | 84.4% (27) |
| Advanced IRT Interpretation Skills | | | | | |
| Interpret directionality of poor fit scores (positive or negative) | 1/1 | 1/1 | 66.7% (22) | 77.4% (24)^a | 54.8% (17)^a |
| Interpret domain scaled scores with poor fit | 1/1 | 1/1 | 100% (33) | 93.6% (29)^a | 93.6% (29)^a |
| Interpret why difference between domain scaled scores is or is not significant | 1/1 | 1/1 | 77.4% (24)^a | 65.6% (21) | 61.3% (19) |
| Identify at least one unexpected response two rating scale choices lower than expected | 1/1^c | 1/1^d | 68.8% (22)^b | 67.7% (21)^a | 67.7% (21)^a |
| Identify at least one unexpected response two rating scale choices higher than expected | 0/1^d | 1/1^c | 15.6% (5)^b | 80.7% (25)^a | 9.7% (3)^a |
| Identify at least one unexpected response one rating scale choice lower than expected | 0/1^e | 0/1^d | 18.8% (6)^b | 9.7% (3)^a | 6.5% (2)^a |
| Identify at least one unexpected response one rating scale choice higher than expected | 0/1^c | 0/1^f | 6.3% (2)^b | 32.3% (10)^a | 25.8% (8)^a |

^a 31 responses. ^b 32 responses. ^c 2 possible correct responses. ^d 1 possible correct response. ^e 6 possible correct responses. ^f 5 possible correct responses.
Results
Focus Group Results: Therapists’ Preferences for Score Report Design Features
Visual Cues to Support Interpretation
Overall, therapists endorsed the score report’s design features. They emphasized the importance of providing visual cues to support interpretation, such as the warning sign to indicate poor fit. Relatedly, they described the role of color in supporting them to quickly interpret scores. They found that the different shades of blue for different responses were “easy and nice,” and “bring[s] attention quickly.” One therapist noted, “the description below [the summary tables] makes it [the colors] easy to interpret,” underscoring the importance of also providing a written interpretation of color coding and other visual cues. Therapists felt that color could also be used to share information about change and unexpected responses, for example sharing: “Highlight the outliers for people, so that it would be more obvious...I’d make it yellow. Something that would call your attention to those things that are outliers that you should be paying attention to”; and “I do like having color on the change scale or differentiating clinical meaningful increase in some way.” While there was some discussion of utilizing “stoplight” colors (green, yellow, and red) for the different PEDI-PRO response choices and increases/decreases in scores, other therapists cautioned against this, due to the social interpretation of these colors. For example, “please don’t make it red and green, because the connection to that is the one [red] is wrong and the other one [green] is right.” Finally, therapists felt that while colors can be helpful, many clinics and schools use black and white printing, so all visual supports for interpretation should be clear and distinguishable in grayscale.
In response to feedback regarding visual cues, we added a dashed line on the change graph to identify the high range of the 95% confidence interval, so therapists could more easily identify if the score from the second administration exceeded the 95% confidence interval of the score from the first administration. The use of graduated blue shading was maintained to represent clients’ response choices.
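The decision rule behind these change-report cues (the up/down arrows and the dashed confidence-interval line) can be sketched as follows; the function names and score values are illustrative, not the PEDI-PRO software's actual implementation:

```python
def change_is_significant(ci1_low, ci1_high, score2):
    """A second administration's score is flagged as significant change
    when it falls outside the first administration's 95% CI."""
    return score2 < ci1_low or score2 > ci1_high

def change_arrow(score1, score2):
    """Direction of the arrow shown on the change report summary table."""
    return "up" if score2 > score1 else "down"

# Hypothetical scaled scores: first administration 55, 95% CI [50, 60]
print(change_is_significant(50, 60, 63))  # True: 63 exceeds the upper bound
print(change_is_significant(50, 60, 58))  # False: 58 falls inside the CI
print(change_arrow(55, 63))               # up
```

Overlaying the first administration's CI on the second administration's bar lets a reader apply this rule visually, without computing anything.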
Advanced IRT Concepts Require Explanation
While some therapists were familiar with fit, others were not. These therapists felt that an extended description of fit would be helpful to include with the Summary Table section. Additionally, they were interested in understanding the impact of skipped items on scores, as this can vary across assessments. In response to this feedback, we refined the text summarizing how to interpret fit and added the interpretation of skipped responses to the interpretation table on the cover page.
Interactive Score Reports Facilitate Clinical Reasoning and Communication
Several therapists desired an interactive score report, where they would be able to sort and select information based on the different clinical scenarios and ways in which they personally made sense of the assessment information. Therapists thought it would be beneficial to have the option to sort information in the “Item Responses by Domain” section either by client response (e.g., first view all items rated as “A little hard”) or item difficulty. Additionally, they endorsed a score report that showed responses by both domain and everyday life situation as clinically useful: “If you’re reporting the scores based on the domain, then I’d probably want to see what [item] fits in every domain, but [when] I would want to look at in setting goals or thinking about what’s important to a family or youth, I’d rather see them by everyday life situations.” Participants felt that being able to select information and how it is displayed could support communication with diverse audiences, sharing “different people that you might be sharing it with might have different understanding of it all and might just be interested in one or two different pieces.” These suggestions were all noted for the future, when the PEDI-PRO score report can be developed to have interactive functionality. In addition, therapists identified the need for a more accessible score report to share with youth and young adults. The subsequent development of this accessible score report is described elsewhere (Camacho et al., 2024).
Survey Results
Therapists’ Use and Perceptions of the Revised Pilot Score Report
Accuracy Interpreting the Score Report.
Overall, therapists were able to utilize the score report to accurately interpret several components considered “foundational skills” (e.g., identifying highest and lowest domain scores, skip responses, and most/least difficult items on the item map) (Table 3). Therapists had the most difficulty interpreting domain scaled scores that had poor fit and criterion-referenced scaled scores. On average, poor fit was identified 83.3% of the time, and only 53.1% of clinicians responded correctly to all questions about poor fit. Most incorrect responses relating to criterion-referenced scaled scores were due to clinicians interpreting the score as the percentage of items the young adult could perform. Some therapists had difficulty interpreting confidence intervals; most incorrect respondents believed that the confidence interval represented the percentage of time a specific score was attained, rather than the range of “true scores.” Across the two cases, only 65.6% of therapists correctly used confidence intervals and graphs to determine if change over time was significant.
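The intended reading of the confidence interval (a range of plausible “true scores” around the reported score, not a frequency) can be illustrated with a normal approximation; the score and standard error below are invented for illustration:

```python
def ci_95(scaled_score, standard_error):
    """95% CI under a normal approximation: score +/- 1.96 * SE,
    interpreted as the plausible range of the client's 'true score'."""
    margin = 1.96 * standard_error
    return (scaled_score - margin, scaled_score + margin)

low, high = ci_95(55.0, 2.5)  # hypothetical domain scaled score and SE
print(round(low, 1), round(high, 1))  # 50.1 59.9
```

In IRT-based scoring the standard error varies by client (it depends on how much information the administered items provide near the client's ability level), which is why each domain score on the report carries its own interval.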
Therapists had more difficulty interpreting advanced IRT-based assessment scores. While most therapists understood that domain scores with poor fit should be interpreted with caution, only 54.8% correctly interpreted the directionality of a domain score’s poor fit (i.e., positive or negative out-of-fit values) (see the Appendix for an example interpretation of fit scores). Using the item maps, therapists were most consistently able to identify unexpected responses when they were 2 response categories higher than expected. However, overall, they had difficulty identifying unexpected responses that were only 1 response category higher or lower than expected.
Confidence Interpreting the Score Report.
Consistent with these findings, therapists reported the most confidence interpreting skip responses, domain scaled scores, change scores, and change graphs (>84% reported “very confident” or “somewhat confident”) (Table 4). They reported the least confidence interpreting item maps and item fit. The observation that 81.2% of therapists reported being “very confident” or “somewhat confident” interpreting confidence intervals contrasted with their observed abilities when using confidence intervals to draw clinical interpretations for the hypothetical cases.
Table 4:
Confidence interpreting PEDI-PRO score report components (n=32)
| Score Report Component | Very confident % (n) | Somewhat confident % (n) | A little confident % (n) | Not at all confident % (n) |
|---|---|---|---|---|
| Skip responses | 65.6% (21) | 25.0% (8) | 9.4% (3) | 0.0% (0) |
| Domain scaled scores | 40.6% (13) | 46.9% (15) | 9.4% (3) | 3.1% (1) |
| Change scores | 40.6% (13) | 46.9% (15) | 9.4% (3) | 3.1% (1) |
| Change graphs | 40.6% (13) | 43.8% (14) | 9.4% (3) | 6.3% (2) |
| Confidence intervals | 31.3% (10) | 50.0% (16) | 15.6% (5) | 3.1% (1) |
| Item maps | 34.4% (11) | 40.6% (13) | 21.9% (7) | 3.1% (1) |
| Fit | 28.1% (9) | 40.6% (13) | 21.9% (7) | 9.4% (3) |
Features Perceived as Supporting Score Report Interpretation.
The icons were most highly endorsed as supportive, as 87.5% of clinicians reported they “helped a lot.” Therapists also felt that written information under the Summary Table (84.4%) and on the Cover Page of the score report (65.6%) helped “a lot,” with few respondents reporting that this information “neither helped or made it harder” (information under tables: 3.13%; first page: 6.3%). The feature perceived as relatively least supportive was the use of color, with only 56.3% of participants reporting it “helped a lot” and 9.4% reporting it “neither helped or made it harder.”
Perceived Clinical Utility of the PEDI-PRO Score Report.
Therapists were likely to use many components of the score reports, with >85% of participants reporting they would be “very likely” or “somewhat likely” to use domain scores, scaled scores, fit scores, change graphs, and responses to individual items when discussing PEDI-PRO scores. They were least likely to discuss skipped items (Table 5). Therapists anticipated sharing different score report components with different stakeholders. For example, they were more likely to discuss skipped responses with family members (93.8%) than with other clinicians (56.3%), payors (e.g., insurance) (12.5%), or other stakeholders (e.g., client, school personnel) (21.9%). With the exception of skip responses and change graphs (63.3%), at least 80% of therapists anticipated sharing all other score report components with other clinicians. At least 90% of therapists anticipated sharing domain scores, scaled score confidence intervals, item responses, skip responses, and change graphs with family members (Table 5). However, they were less likely to share several components with payors; among these, they most anticipated sharing domain scores (53.1%) and change graphs (63.3%) with payors.
Table 5:
Type of stakeholder with whom therapists would share PEDI-PRO score report components (n=32)
| Score Report Component | Clinicians % (n) | Payors % (n) | Family Members % (n) | Other^a % (n) |
|---|---|---|---|---|
| Domain scores (32) | 87.5% (28) | 53.1% (17) | 90.6% (29) | 53.1% (17) |
| Scaled score confidence interval (31) | 80.7% (25) | 34.5% (11) | 61.3% (19) | 9.7% (3) |
| Fit score (31) | 80.7% (25) | 32.3% (10) | 74.2% (23) | 12.9% (4) |
| Skip responses (32) | 56.3% (18) | 12.5% (4) | 93.8% (30) | 21.9% (7) |
| Item response (32) | 81.3% (26) | 37.5% (12) | 93.8% (30) | 18.8% (6) |
| Item maps (30) | 83.3% (25) | 36.7% (11) | 80.0% (24) | 13.3% (4) |
| Change graphs (30) | 63.3% (19) | 63.3% (19) | 93.3% (28) | 16.7% (5) |
^a Other: client, patient, school personnel.
Discussion
This study found that experienced occupational and physical therapists were able to accurately interpret some components of the PEDI-PRO score report that required foundational knowledge of IRT concepts. However, therapists had more difficulty interpreting advanced IRT-based concepts, such as fit and using item maps. Aligned with this observation, therapists also expressed the least confidence interpreting these components. These findings align with previous research, in which therapists express a lack of confidence interpreting PROM results (Santana et al., 2015). Therefore, regardless of score-report design, therapists may benefit from training on how to interpret PROM results. Indeed, one study found that clinical supervisors had more positive attitudes towards PROMs and higher perceived self-efficacy for supporting the clinicians they supervise to implement and interpret PROM results after a 3-day training (Fullerton et al., 2018). Still, such intensive training requires resources. PROM developers must continue to innovate the design of score reports to enhance accuracy and reduce burden (Foster et al., 2018; Stover et al., 2021). As suggested by the therapists in our focus groups and explored in other studies, PROM developers may consider incorporating multimodal approaches to reporting results, including alternative visual presentations (Bantug et al., 2016; Brundage et al., 2015; Rothrock et al., 2020; Snyder et al., 2017), and creating interactive score reports that allow therapists to quickly identify needed and relevant information (Zapata-Rivera & Katz, 2014).
Therapists in this study highly valued score report features that supported their intervention planning. PROM developers may also consider the potential value of including clinical decision-making tools within their score reports. Challenges incorporating PROM scores into clinical decision-making can be a barrier to adoption, especially within larger systems of care (Foster et al., 2018; Stover et al., 2021). Clinical decision-making tools could link criterion scores or cut points to specific suggestions for intervention, or decisional support algorithms for identifying priorities and setting goals (Jarvis et al., 2019; Marfeo et al., 2022; Parham et al., 2021). While item maps derived from IRT-based assessments, in which items are depicted in their sequential order (e.g., more difficult to less difficult), can help clinicians identify “just right challenges” aligned with developmental sequences of skill development (Haley & Fragala-Pinkham, 2006), our research and others suggests these maps may not be easily interpreted by novice or untrained clinicians (Rothrock et al., 2020). Therefore, future development of the PEDI-PRO score report and other IRT-based assessments that incorporate item maps can use evidence-based approaches to develop specific recommendations for intervention strategies or recommendations for next steps that are linked to domain scores and item maps (Cohen et al., 2023; Marfeo et al., 2022).
Given the increasing number of IRT-based PROMs used in rehabilitation (Basford & Cheville, 2022), it is critical to identify how best to implement this innovative assessment methodology among an experienced workforce. At the individual level, there is a need to build capacity among experienced therapists who may have received their clinical education prior to the proliferation of IRT-based PROMs. Our sample represented a highly experienced group of clinicians working in a diverse set of clinical settings. The majority reported “sometimes” administering self-report questionnaires to youth with intellectual and/or developmental disabilities and “sometimes” using computer-based assessments. For experienced clinicians, high-quality professional development activities, including work-based training and communities of practice, may facilitate the incorporation of emerging IRT-based PROMs into their evidence-based practice (Barry et al., 2017; Foster et al., 2018; Leahy et al., 2020). Further, to ensure there is workforce capacity to use IRT-based assessments, OT and PT educational programs should consider adding standards to their curricula to ensure new graduates have these skills that are increasingly relevant to entry-level practice.
Efforts to build individual clinician capacity must also be supported by system- and organizational-level supports that are responsive to the local context and needs (Barry et al., 2017; Foster et al., 2018; Leahy et al., 2020; Stover et al., 2021). Successful adoption of IRT-based PROMs in practice is facilitated when the purpose of the new assessment is aligned with the goals of the organization, and when there is a shared value for the potential of such tools to enhance client experiences and outcomes (Foster et al., 2018; Stover et al., 2021). Health services research and creative leadership are needed to design and test effective ways to integrate PROM use into existing workflows, troubleshoot technology challenges, and ensure PROM results can be incorporated into the organization’s electronic health records.
Limitations and Future Research
The survey respondents reported an average of more than 21 years of experience and therefore may not have represented newly educated clinicians who are better versed in PROMs, computer-based assessments, and IRT. Conversely, their length of time in practice may have supported enhanced interpretation of the cases. Future studies on the usability of score reports for IRT-based PROMs should include therapists with a wider range of clinical experience. An additional limitation is that no therapists had the opportunity to administer the PEDI-PRO prior to interpreting assessment results. Hands-on experience administering the PEDI-PRO and seeing the full assessment, rather than examples in a video, may have supported more accurate interpretation of the score report.
While the David and Lucia cases were designed to be as equivalent as possible, this equivalence was not tested prior to the survey. Further, because all respondents completed the cases in the same order, the slightly improved accuracy interpreting the second case (Lucia) could reflect a practice effect. However, this possibility points to a potential mechanism for enhancing therapists’ knowledge of IRT-based assessments: future research could determine whether feedback regarding interpretation accuracy improves subsequent interpretation and problem solving. Future development of the PEDI-PRO will also include features valued by therapists across both phases, including interactive components and decision-making tools.
Conclusions
IRT-based PROMs are increasingly used in clinical practice to enhance client engagement and generate precise scores efficiently. This study contributes to the emerging body of work on score report features that support clinical interpretation of assessment results. Colors, icons, and integrated text descriptions of measurement concepts may support therapists to accurately interpret IRT-based score reports. Overall, therapists interpreted the advanced IRT concepts in the PEDI-PRO pilot score report less accurately and reported less confidence in doing so, suggesting a need for continuing education and organizational support. Therapists endorsed the clinical utility of the PEDI-PRO pilot score report for communicating with families, other professionals, and payers.
Acknowledgements:
Thank you to the clinicians who shared their perspectives over the years to contribute to the development of the PEDI-PRO, especially members of our Advisory Board. Additionally, the PEDI-PRO would not be possible without the ongoing commitment of the Inclusive Cool Cats Research team, whose lived experience has shaped the development and evaluation of the PEDI-PRO since 2013.
Research reported in this publication was supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health under Award Number R42HD090772. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Disclosure Statement:
Dan Davies: In accordance with Taylor & Francis policy and my ethical obligation as a researcher, I am reporting that I have financial and/or business interests in the commercialization of the PEDI-PRO system, should any proceeds result from the sale or licensing of this system, through my company AbleLink Smart Living Technologies, LLC, a company that may be affected by the research reported in the enclosed paper. Through this notice I am disclosing those interests fully to Taylor & Francis.
Erik Mugele: In accordance with Taylor & Francis policy and my ethical obligation as a partner in this project, I am reporting that I have general financial and/or business interests in the commercialization of the PEDI-PRO system, should any proceeds result from the sale or licensing of this system, through my role as Director of Technology at AbleLink Smart Living Technologies, LLC, a company that may be affected by the research reported in the enclosed paper. Through this notice I am disclosing those interests fully to Taylor & Francis.
Interpretability and Clinical Utility of the Pediatric Evaluation of Disability Inventory-Patient Reported Outcome (PEDI-PRO) Score Report
Appendix: Example pages from the PEDI-PRO Score Report


Footnotes
All other authors have nothing to disclose.
Contributor Information
Ariel Schwartz, Institute on Disability, University of New Hampshire, Durham, NH 03824.
Fiorella Guerrero Calle, University of Florida, Gainesville, FL.
Elizabeth Barbour, Department of Occupational Therapy, University of Florida, 1225 Center Drive, Gainesville, FL 32603.
Andrew Persch, Colorado State University, Fort Collins, CO.
Beth Pfeiffer, Temple University, 1913 North Broad Street, Philadelphia, PA 19122.
Daniel K. Davies, Founder and President, AbleLink Smart Living Technologies, LLC, 6745 Rangewood Drive, Suite 210, Colorado Springs, CO 80918
Erik J. Mugele, Director of Technology, AbleLink Smart Living Technologies, LLC, 6745 Rangewood Drive, Suite 210, Colorado Springs, CO 80918
Jessica Kramer, Department of Occupational Therapy, University of Florida, 1225 Center Drive, Gainesville, FL 32603.
References
- Accreditation Council for Occupational Therapy Education (ACOTE). (2018). ACOTE Accreditation Standards and Interpretive Guide. https://acoteonline.org/accreditation-explained/standards/
- Bantug ET, Coles T, Smith KC, Snyder CF, Rouette J, & Brundage MD (2016). Graphical displays of patient-reported outcomes (PRO) for use in clinical practice: What makes a pro picture worth a thousand words? Patient Education and Counseling, 99(4), 483–490. 10.1016/j.pec.2015.10.027
- Barry M, Kuijer-Siebelink W, Nieuwenhuis L, & Scherpbier-de Haan N. (2017). Communities of practice: A means to support occupational therapists’ continuing professional development. A literature review. Australian Occupational Therapy Journal, 64(2), 185–193. 10.1111/1440-1630.12334
- Basford JR, & Cheville A. (2022). Patient-Reported Outcome Measures: An exploration of their utility in functional assessment and rehabilitation. Archives of Physical Medicine and Rehabilitation, 103(5, Supplement), S1–S2. 10.1016/j.apmr.2022.02.005
- Brundage MD, Smith KC, Little EA, Bantug ET, Snyder CF, & The PRO Data Presentation Stakeholder Advisory Board. (2015). Communicating patient-reported outcome scores using graphic formats: Results from a mixed-methods evaluation. Quality of Life Research, 24(10), 2457–2472. 10.1007/s11136-015-0974-y
- Camacho B, Nonga-Mann C, Thomas E, & Kramer JM (2024). Figuring out an accessible PEDI-PRO Score Report for young adults with disabilities. Inclusion, 12(1), 20–25. 10.1352/2326-6988-12.1.20
- Churruca K, Pomare C, Ellis LA, Long JC, Henderson SB, Murphy LED, Leahy CJ, & Braithwaite J. (2021). Patient-reported outcome measures (PROMs): A review of generic and condition-specific measures and a discussion of trends and issues. Health Expectations, 24(4), 1015–1024. 10.1111/hex.13254
- Cohen ML, Harnish SM, Lanzi AM, Brello J, Hula WD, Victorson D, Nandakumar R, Kisala PA, & Tulsky DS (2023). Establishing severity levels for patient-reported measures of functional communication, participation, and perceived cognitive function for adults with acquired cognitive and language disorders. Quality of Life Research, 32(6), 1659–1670. 10.1007/s11136-022-03337-2
- Commission on Accreditation in Physical Therapy Education. (2024). Standards and Required Elements for Accreditation of Physical Therapist Education Programs. https://www.capteonline.org/faculty-and-program-resources/resource_documents/accreditation-handbook
- Creswell JW, & Creswell JD (2017). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. SAGE Publications.
- Foster A, Croot L, Brazier J, Harris J, & O’Cathain A. (2018). The facilitators and barriers to implementing patient reported outcome measures in organisations delivering health related services: A systematic review of reviews. Journal of Patient-Reported Outcomes, 2(1), 46. 10.1186/s41687-018-0072-3
- Fullerton M, Edbrooke-Childs J, Law D, Martin K, Whelan I, & Wolpert M. (2018). Using patient-reported outcome measures to improve service effectiveness for supervisors: A mixed-methods evaluation of supervisors’ attitudes and self-efficacy after training to use outcome measures in child mental health. Child and Adolescent Mental Health, 23(1), 34–40. 10.1111/camh.12206
- Haley SM, Coster WJ, Dumas HM, Fragala-Pinkham MA, Kramer J, Ni P, Tian F, Kao Y-C, Moed R, & Ludlow LH (2011). Accuracy and precision of the Pediatric Evaluation of Disability Inventory Computer Adapted Test (PEDI-CAT). Developmental Medicine and Child Neurology, 53(12), 1100–1106. 10.1111/j.1469-8749.2011.04107.x
- Haley SM, & Fragala-Pinkham MA (2006). Interpreting change scores of tests and measures used in physical therapy. Physical Therapy, 86(5), 735–743. 10.1093/ptj/86.5.735
- Jacobson KL, & Parker RM (2014). Health literacy principles: Guidance for making information understandable, useful, and navigable. Institute of Medicine, Washington, DC. https://nam.edu/wp-content/uploads/2015/06/HealthLiteracyGuidance.pdf
- Jarvis JM, Gurga A, Greif A, Lim H, Anaby D, Teplicky R, & Khetani MA (2019). Usability of the Participation and Environment Measure Plus (PEM+) for client-centered and participation-focused care planning. The American Journal of Occupational Therapy, 73(4), 7304205130p1–7304205130p8. 10.5014/ajot.2019.032235
- Kramer JM, & Schwartz A. (2017). Refining the Pediatric Evaluation of Disability Inventory – Patient-Reported Outcome (PEDI-PRO) item candidates: Interpretation of a self-reported outcome measure of functional performance by young people with neurodevelopmental disabilities. Developmental Medicine and Child Neurology, 59(10), 1083–1088.
- Kramer JM, & Schwartz AE (2018). Development of the Pediatric Disability Inventory-Patient Reported Outcome (PEDI-PRO) measurement conceptual framework and item candidates. Scandinavian Journal of Occupational Therapy, 25(5), 335–346. 10.1080/11038128.2018.1502344
- Kramer JM, Schwartz AE, Davies DK, Stock S, & Ni P. (2021). Usability and reliability of an accessible Patient Reported Outcome Measure (PROM) software: The PEDI-PRO. American Journal of Occupational Therapy, 75(1), 7501205010p1–7501205010p10. 10.5014/ajot.2020.040733
- Leahy E, Chipchase L, Calo M, & Blackstock FC (2020). Which learning activities enhance physical therapist practice? Part 2: Systematic review of qualitative studies and thematic synthesis. Physical Therapy, 100(9), 1484–1501. 10.1093/ptj/pzaa108
- Leech NL, & Onwuegbuzie AJ (2009). A typology of mixed methods research designs. Quality & Quantity, 43(2), 265–275. 10.1007/s11135-007-9105-3
- Marfeo E, Ni P, Wang C, Weiss D, & Cheville AL (2022). Identifying clinically relevant functional strata to direct mobility preservation among patients hospitalized with medical conditions. Archives of Physical Medicine and Rehabilitation, 103(5, Supplement), S78–S83.e1. 10.1016/j.apmr.2021.05.009
- Mroz TM, Pitonyak JS, Fogelberg D, & Leland NE (2015). Client centeredness and health reform: Key issues for occupational therapy. The American Journal of Occupational Therapy, 69(5), 6905090010p1–6905090010p8. 10.5014/ajot.2015.695001
- Parham D, Ecker CL, Kuhaneck H, Henry DA, & Glennon TJ (2021). Sensory Processing Measure, Second Edition (SPM™-2). https://www.wpspublish.com/spm-2
- Rapport MJ, Furze J, Martin K, Schreiber J, Dannemiller LA, DiBiasio PA, & Moerchen VA (2014). Essential competencies in entry-level pediatric physical therapy education. Pediatric Physical Therapy, 26(1), 7. 10.1097/PEP.0000000000000003
- Rothrock NE, Amtmann D, & Cook KF (2020). Development and validation of an interpretive guide for PROMIS scores. Journal of Patient-Reported Outcomes, 4(1), 16. 10.1186/s41687-020-0181-7
- Santana MJ, Haverman L, Absolom K, Takeuchi E, Feeny D, Grootenhuis M, & Velikova G. (2015). Training clinicians in how to use patient-reported outcome measures in routine clinical practice. Quality of Life Research, 24(7), 1707–1718. 10.1007/s11136-014-0903-5
- Schwartz AE, Kramer JM, & PEDI-PRO Youth Team. (2021). Inclusive approaches to developing content valid patient reported outcome measure response scales for youth with intellectual/developmental disabilities. British Journal of Learning Disabilities, 49(1), 100–110. 10.1111/bld.12346
- Snyder CF, Smith KC, Bantug ET, Tolbert EE, Blackford AL, Brundage MD, & The PRO Data Presentation Stakeholder Advisory Board. (2017). What do these scores mean? Presenting patient-reported outcomes data to patients and clinicians to improve interpretability. Cancer, 123(10), 1848–1859. 10.1002/cncr.30530
- Stover AM, Haverman L, van Oers HA, Greenhalgh J, Potter CM, Ahmed S, Greenhalgh J, Gibbons E, Haverman L, Manalili K, Potter C, Roberts N, Santana M, Stover AM, van Oers H, & On behalf of the ISOQOL PROMs/PREMs in Clinical Practice Implementation Science Work Group. (2021). Using an implementation science approach to implement and evaluate patient-reported outcome measures (PROM) initiatives in routine care settings. Quality of Life Research, 30(11), 3015–3033. 10.1007/s11136-020-02564-9
- van Muilekom MM, Luijten MAJ, van Oers HA, Terwee CB, van Litsenburg RRL, Roorda LD, Grootenhuis MA, & Haverman L. (2021). From statistics to clinics: The visual feedback of PROMIS® CATs. Journal of Patient-Reported Outcomes, 5(1), 55. 10.1186/s41687-021-00324-y
- Velozo CA, Seel RT, Magasi S, Heinemann AW, & Romero S. (2012). Improving measurement methods in rehabilitation: Core concepts and recommendations for scale development. Archives of Physical Medicine and Rehabilitation, 93(8 Suppl), S154–163. 10.1016/j.apmr.2012.06.001
- Zapata-Rivera JD, & Katz IR (2014). Keeping your audience in mind: Applying audience analysis to the design of interactive score reports. Assessment in Education: Principles, Policy & Practice, 21(4), 442–463. 10.1080/0969594X.2014.936357
