Author manuscript; available in PMC: 2023 Sep 18.
Published in final edited form as: Qual Life Res. 2020 May 14;29(9):2573–2584. doi: 10.1007/s11136-020-02513-6

Identifying clinically meaningful severity categories for PROMIS pediatric measures of anxiety, mobility, fatigue, and depressive symptoms in juvenile idiopathic arthritis and childhood‑onset systemic lupus erythematosus

C M Mann 1, L E Schanberg 2, M Wang 3, E von Scheven 4, N Lucas 1, A Hernandez 1, S Ringold 5, B B Reeve 1,2
PMCID: PMC10505945  NIHMSID: NIHMS1889073  PMID: 32410143

Abstract

Purpose

A key limitation to widespread adoption of patient-reported outcome (PRO) measures is the lack of interpretability of scores. We aim to identify clinical severity thresholds to distinguish categories of no problems, mild, moderate, and severe along the PROMIS® Pediatric T-score metric for measures of anxiety, mobility, fatigue, and depressive symptoms for use in populations with juvenile idiopathic arthritis (JIA) and childhood-onset systemic lupus erythematosus (cSLE).

Methods

We used a modified standard setting methodology from educational testing to identify clinical severity thresholds (clinical cut scores). Using item response theory-based parameters from PROMIS item banks, we developed a series of clinical vignettes that represented different severity or ability levels along the PROMIS Pediatric T-score metric. In stakeholder workshops, participants worked individually and together to reach consensus on clinical cut scores. Median cut-score placements were taken when consensus was not reached. Focus groups were recorded and qualitative analysis was conducted to identify decision-making processes.

Results

Nine adolescents (age 13–17 years) with JIA (33% female) and their caregivers, five adolescents (age 14–16 years) with cSLE (100% female) and their caregivers, and 12 pediatric rheumatologists (75% female) participated in bookmarking workshops. Placement of thresholds for bookmarks was highly similar across stakeholder groups (differences from 0 to 5 points on the PROMIS T-score metric) for all but one bookmark placement.

Conclusion

This study resulted in clinical thresholds for severity categories for PROMIS Pediatric measures of anxiety, mobility, fatigue, and depressive symptoms, providing greater interpretability of scores in JIA and cSLE populations.

Keywords: Patient-reported outcomes, PROMIS, Juvenile idiopathic arthritis, Childhood-onset systemic lupus erythematosus, Reference values

Background

Juvenile idiopathic arthritis (JIA) and childhood-onset systemic lupus erythematosus (cSLE) are chronic autoimmune diseases with no known cures. JIA affects between 4 and 14 in every 100,000 children in the United States (US) [1], is the most common cause of acquired childhood disability in the US, and the fifth most common chronic childhood disease [2]. Children with JIA experience an unpredictable disease course, with periods of improved disease control intermixed with episodes of flare [3–5]. Even when a child with JIA is well controlled, they may still report poorer health-related quality of life (HRQOL) than healthy peers [6–8].

Children with cSLE, similar to those with JIA, may have disease flares interspersed with periods of disease quiescence, but are rarely able to stop all medications. The prevalence rate for cSLE is 3.3–8.8 in 100,000 children [9] and children typically experience severe phenotypes, including organ disease [10]. In addition to life-threatening disease manifestations, secondary morbidities and psychosocial difficulties, such as mood disorders, body image problems, and academic and social challenges, complicate cSLE.

Patient-reported outcome (PRO) measures provide an opportunity to assess more fully the impact of JIA and cSLE and their associated therapies, providing actionable health information to make treatment and symptom management more personalized. Wide adoption of PRO measures for use in JIA and cSLE clinical trials and health care delivery settings requires: (1) substantial evidence supporting the reliability and validity of the PRO measures in these populations [11], and (2) a way to interpret PRO scores that is meaningful to researchers, clinicians, patients, and caregivers.

The Patient-Reported Outcomes Measurement Information System® (PROMIS®) includes a set of pediatric and adult questionnaires for assessing symptoms and function that have undergone extensive psychometric evaluation in a range of diseases, including rheumatic diseases [12–15]. The PROMIS Pediatric measures provide a score for each domain (e.g., fatigue) on a T-score metric based on reference group norms, with a mean of 50 and a standard deviation of 10. The reference group included children from the general population and children with a range of chronic conditions and diseases [16]. Higher PROMIS Pediatric symptom scores (e.g., Fatigue, Anxiety, Depressive Symptoms) are associated with increased symptom burden. Higher PROMIS Pediatric function scores (e.g., Mobility) are associated with better function. A difference or change of 3 points in PROMIS Pediatric scores constitutes a minimally important difference (MID) [17].
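The T-score metric described above can be sketched in a few lines. This is a minimal illustration, assuming the usual linear rescaling of a standardized score (mean 0, SD 1 in the reference group) to a mean of 50 and an SD of 10. The function names are hypothetical and not part of any PROMIS software, and treating a change of at least 3 points as meeting the MID is one plausible reading of the threshold rather than a rule stated in the text.

```python
# Illustrative sketch only; helper names are hypothetical.
T_MEAN = 50.0      # reference-group mean on the T-score metric
T_SD = 10.0        # reference-group standard deviation
MID_POINTS = 3.0   # minimally important difference cited in the text [17]


def theta_to_t(theta: float) -> float:
    """Rescale a standardized score (mean 0, SD 1) to the T-score metric."""
    return T_MEAN + T_SD * theta


def t_to_theta(t_score: float) -> float:
    """Inverse of theta_to_t."""
    return (t_score - T_MEAN) / T_SD


def meets_mid(t_before: float, t_after: float) -> bool:
    """Assumption: a change of at least MID_POINTS counts as meeting the MID."""
    return abs(t_after - t_before) >= MID_POINTS


if __name__ == "__main__":
    print(theta_to_t(1.0))          # one SD above the reference mean
    print(meets_mid(55.0, 57.0))    # a 2-point change
```

Note that the direction of a meaningful change depends on the domain: a higher score is worse for symptom domains and better for function domains.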

Currently, there is little data contributing to the interpretation of pediatric PROMIS scores in terms of symptom severity or level of function. Stand-alone scores are difficult for clinicians, caregivers, and children to interpret. To enhance the adoption of PROMIS Pediatric measures in clinical practice and research settings, guidance is needed regarding PROMIS T-score ranges that differentiate between normal symptoms or function and severe symptoms or functional impairment [18, 19].

Our study used abbreviated bookmarking procedures [20–23] with key stakeholders (adolescents, caregivers, and clinicians) to derive cut scores along the continuous PROMIS T-score metric, creating ordinal categories of no problems, mild, moderate, and severe that reflect symptom severity levels or functional limitation levels for key PRO domains relevant to adolescents with JIA and cSLE.

Methods

Study design

Investigators held separate bookmarking workshops for each stakeholder group: adolescents with JIA, adolescents with cSLE, caregivers of adolescents with JIA, caregivers of adolescents with cSLE, and pediatric rheumatologists who treat both conditions. Participants reviewed ordered vignettes of symptom/functional status and placed “bookmarks” on cut scores between severity categories. Consistent with previous bookmarking methodologies [24, 25] in education, from which the current methodologies were derived, we used the group median score when consensus was not reached for a given bookmark. Table 1 shows the domains completed by each stakeholder group. Focus group sessions were recorded and qualitative analysis was conducted using Rapid Assessment Process (RAP) [26] to identify decision-making processes around bookmark placement and clinical utility of cut scores.

Table 1.

Domains by stakeholder group

Adolescents JIA Parents JIA Adolescents SLE Parents SLE Clinician group 1 Clinician group 2
Anxiety X X X X X X
Fatigue X X X X X X
Physical function—mobility X X X
Depressive symptoms X X X

Participants

Based on previous studies [21–23], each stakeholder group included 5–9 participants. JIA and cSLE adolescent participants between 13 and 17 years of age, diagnosed at least 6 months prior to study enrollment, were eligible for inclusion. Caregiver participants were caregivers of an eligible adolescent but did not have to be the caregiver of an enrolled participant. Participants in the adolescent and caregiver groups were recruited at the 2017 Houston Arthritis Foundation Juvenile Arthritis Conference (JIA) and from Duke University Medical Center Pediatric Rheumatology Clinic (cSLE). Caregivers and adolescents received $60 each for participating in the bookmarking workshop. Seven pediatric rheumatologists with at least two years of experience practicing pediatric rheumatology were recruited at the 2017 CARRA Annual Scientific Meeting in Houston and five clinicians were recruited at the 2018 CARRA Annual Scientific Meeting in Denver. Clinicians received $100 for their participation in the bookmarking workshop. Due to the rarity of JIA and SLE, convenience sampling was used.

PROMIS pediatric measures

Each PROMIS Pediatric item bank is comprised of multiple items to assess domain symptoms or function (anxiety—15 items, fatigue—25 items, physical function-mobility—24 items, and depressive symptoms—14 items). All four domains have five-point Likert scale response options. Symptom domains utilize “never,” “almost never,” “sometimes,” “often,” and “almost always” response options. Function domains use the response options “with no trouble,” “with a little trouble,” “with some trouble,” “with a lot of trouble,” and “not able to do.”

Study investigators (pediatric rheumatologists) determined the domains most clinically relevant for JIA (anxiety, mobility, and fatigue) and cSLE (depressive symptoms, mobility, and fatigue). Patients with JIA are significantly more likely to suffer from mobility issues, as arthritis is a defining feature of this group of conditions. In contrast, in cSLE the involvement of critical organs, such as the brain, lungs, heart, and kidneys, is more important to morbidity, mortality, and life experience. Children with cSLE commonly report depressive symptoms, likely related to a combination of the underlying disease involvement of the central nervous system, chronic exposure to mood-altering medications, frequent issues with body image, and significant morbidity and risk for mortality. Patients with both conditions frequently experience fatigue.

Vignette development

For each PRO domain, vignettes were designed as bulleted lists of four PROMIS Pediatric items that describe the level of symptom severity or functional limitation experienced by a hypothetical adolescent with JIA or cSLE (Fig. 1). Bulleted lists were used rather than narrative vignettes because they are more easily understood by both adults and adolescents and research has shown that adolescents prefer smaller text chunks when consuming written information [27].

Fig. 1. Fatigue vignette examples for PROMIS T-score levels of 40 and 45

For each individual vignette, we selected four specific items and responses from the PROMIS Pediatric item bank for a given domain. We chose items based on item response theory (IRT) parameters derived from the PROMIS Pediatric calibration samples [16], resulting in vignettes representing varying levels of severity across the symptom or function continuum. For a given PROMIS Pediatric measure (e.g., anxiety), the most likely response was identified for every item at each predefined PROMIS T-score location (on a scale from approximately 30 to 80, in 5-point increments, with the actual range varying depending on the PROMIS domain). Likelihoods of each response category for each item at the predefined T-score locations were first computed in R [28] (Online Resource 1) using IRT parameters calibrated in large pediatric samples. All existing calibrations were performed based on the premise that the unidimensional graded-response IRT models fit closely to observed data with minimal local dependency and differential item functioning issues, which ensures the quality and transferability of T-scores adopted for vignette generation [16, 29–33]. Once the likelihoods were calculated, the response categories with the highest likelihoods were saved in an Excel spreadsheet, with rows representing individual items and columns representing responses at the predefined T-score locations. To generate vignettes for a symptom or function experience, we selected four items and their corresponding most likely responses at each predefined T-score location, while maximizing the overall item and response diversity. Each PRO domain included six to ten vignettes. Please see Fig. 1 for example vignettes.
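The most-likely-response step can be illustrated with a short sketch. The study's own computation was done in R (Online Resource 1) and is not reproduced here; the Python below is an independent illustration of a graded response model under stated assumptions. The item parameters (`a` and `thresholds`) are hypothetical and are not actual PROMIS calibrations, the helper names are invented for this example, and the T-score is assumed to map to the latent scale as (T − 50) / 10.

```python
import math

# Symptom-domain response options, as listed in the text.
LABELS = ["never", "almost never", "sometimes", "often", "almost always"]

# HYPOTHETICAL item parameters for illustration only (not PROMIS calibrations):
# one discrimination and four ordered thresholds for a five-category item.
HYPOTHETICAL_ITEM = {"a": 2.0, "thresholds": [-0.75, 0.25, 1.25, 2.25]}


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model."""
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = K).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def most_likely_response(t_score, a, thresholds, labels):
    """Label of the response category with the highest probability at a T-score."""
    theta = (t_score - 50.0) / 10.0  # assumed T-score to latent-scale mapping
    probs = grm_category_probs(theta, a, thresholds)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


if __name__ == "__main__":
    # One row of the item-by-T-score table: T = 30, 35, ..., 80.
    for t in range(30, 85, 5):
        print(t, most_likely_response(t, labels=LABELS, **HYPOTHETICAL_ITEM))
```

Repeating this for every item in a bank gives the item-by-T-score table from which four items per vignette could then be chosen.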

Workshop procedures

A moderator trained in qualitative data collection, a psychometrician, and a study coordinator staffed each 4-h workshop. To begin, the moderator gave a presentation on PRO measures and the purpose of the workshop. Next, participants completed a warm-up exercise to familiarize themselves with the bookmarking procedure by classifying the “fanciness” of a range of desserts using bookmarking procedures. After the warm-up exercise, participants worked through one PROMIS domain at a time. For each domain, participants first completed the PROMIS Pediatric short form for that domain. Adolescents completed the survey themselves, caregivers completed the survey with their child in mind, and clinicians completed the survey with a single patient in mind. Stakeholders then participated in a discussion about what it means to have “no problems,” “mild,” “moderate,” and “severe” symptoms or functional impairment in the given domain. In addition, clinicians discussed how different symptom or functional levels may be associated with different treatment responses.

After the discussion, we gave participants a set of ordered vignettes, with the PROMIS T-score for that vignette in the top right corner. Participants laid the vignettes out in order from lowest to highest for symptom domains and highest to lowest for function domains. Then, participants individually placed bookmarks between vignettes to indicate where they thought cut scores between “no problems,” “mild,” “moderate,” and “severe” levels of a symptom or function fell. After individual bookmark placement, the focus group facilitator worked with participants to discuss bookmark placements, reasons for specific placements, and to seek consensus on bookmark placement across the group. During this process, the research coordinator tracked the placement and movement of bookmarks in real time using presentation software so that the group could easily visualize the process and bookmark movements. Once final bookmark placements were made, cut scores were calculated as the half-way point between the T-scores on the two bordering vignettes.

Figure 2 shows an example of the real time bookmarking visuals. We selected the median score for the bookmark when consensus was not reached.
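The cut-score arithmetic described above is simple enough to sketch. The example below is illustrative only: the function names are hypothetical and the individual placements are made-up values, not study data.

```python
from statistics import median


def cut_score(lower_vignette_t, upper_vignette_t):
    """Half-way point between the T-scores of the two vignettes bordering a bookmark."""
    return (lower_vignette_t + upper_vignette_t) / 2.0


def group_cut_score(individual_cut_scores):
    """Group value when consensus is not reached: the median of individual placements.

    If every participant agrees, the median equals the shared value.
    """
    return median(individual_cut_scores)


if __name__ == "__main__":
    # A bookmark placed between the T = 55 and T = 60 vignettes.
    print(cut_score(55, 60))
    # Hypothetical placements from five participants (not study data).
    print(group_cut_score([57.5, 57.5, 62.5, 57.5, 52.5]))
```

With an even number of participants, `statistics.median` averages the two middle placements, so the group value can fall between two possible bookmark positions. How such cases were handled in the workshops is not described in the text.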

Fig. 2. Real time bookmarking slides illustrating placements of fatigue cut scores by participants

Once initial group bookmark placements were determined during consensus seeking, participants received the scored PROMIS Pediatric measure they had completed at the beginning of the session. As a measure of consequential validity [34], participants looked at their PROMIS score to determine which category their score fell into based on the placement of bookmarks by the group and then assessed whether that category was appropriate. Participants could then change their bookmark placements if this activity had changed their mind about their earlier bookmark placement.

Because discussion length varied between groups, each group worked through as many domains as possible in the 4-h workshop, resulting in some groups not completing all four domains. All sessions were audio recorded and transcribed for qualitative analysis around decision-making processes and PRO use in clinical care.

Qualitative analysis

We used Rapid Assessment Process (RAP), which includes triangulation, iterative data analysis, and additional data collection, to quickly understand qualitative data sets. Two coders who participated in the bookmarking workshops reviewed each transcript. Initially, framework matrices were developed using a priori codes derived from workshop guides and qualitative analysis goals. As transcripts were reviewed, relevant information was summarized in the framework matrices. This process allowed for quick summarization of data by multiple coders. Discrepancies in coding were discussed and reconciled by the coders, and the framework matrices were amended as new relevant information was identified by subsequent coders or in subsequent workshop transcripts.

Results

Sample characteristics

JIA

Participants included nine adolescents with JIA (awJIA) aged 13–17 years and nine caregivers of children with JIA (cwJIA) aged 34–50, from six US states who were attending a patient conference. Despite open recruitment criteria permitting parents and adolescents to participate either in pairs or as individuals, all enrolled caregivers ended up being the caregiver of a participating adolescent.

SLE

Participants included five adolescents with SLE (awcSLE) aged 13–16 years and five caregivers of children with SLE (cwcSLE) aged 34–63, from a North Carolina Pediatric Rheumatology clinic. The caregiver group consisted of four parents and one grandparent of the participating adolescents.

Clinicians

Participants included 12 pediatric rheumatologists (aged 37–64 years) from seven US states and one Canadian Province who were attending an international professional conference. Clinicians had been practicing pediatric rheumatology between 6 and 27 years (median 11, mean 13.5).

Table 2 shows additional demographic information on all participant groups.

Table 2.

Participant demographics

Adolescents JIA (n = 9) Parents JIA (n = 9) Adolescents cSLE (n = 5) Parents cSLE (n = 5) Clinicians G1 (n = 7) Clinicians G2 (n = 5)
Female 3 7 5 4 5 4
Age (mean) 14.6 43 15.6 44 45.7 45.4
Asian, not Hispanic or Latino^a 0 0 0 0 3 1
Black or African American, not Hispanic or Latino^a 1 1 4 4 0 1
White, not Hispanic or Latino^a 8 8 1 1 4 3
PROMIS pediatric mobility score mean (range) 37.0 (27.8–58.5)
PROMIS pediatric anxiety score mean (range) 53.5 (33.5–67.6) 54.5 (47.0–63.2)
PROMIS pediatric fatigue score mean (range) 55.0 (38.3–82.6) 60.4 (52.7–68.3)
PROMIS pediatric depressive symptoms score mean (range) 56.4 (47.7–61.0)
Mean years practicing pediatric rheumatology 13.1 14.3
^a No participants reported Hispanic ethnicity

Severity cut scores by PROMIS pediatric domains

Figures 3 and 4 show consensus cut scores for no problems, mild, moderate, and severe categories for each PROMIS domain by stakeholder group. For 22% of bookmark placements, participants reached 100% consensus. For another 13%, between 78% and 99% agreement was reached.

Fig. 3. PROMIS® pediatric mobility cut scores by stakeholder group: JIA

Fig. 4. PROMIS® pediatric anxiety, fatigue, and depression cut scores by stakeholder group: JIA & cSLE

Overall, cut scores for anxiety and mobility had less variability than those for fatigue and depressive symptoms. JIA cut scores had less variability than those in the cSLE groups. However, the majority of the cut-score placements across stakeholder groups differed by no more than 10 points (1 standard deviation on the PROMIS T-Score metric) from one another. See Table 3 for final cut-score recommendations.

Table 3.

Consensus cut scores by domain and by stakeholder group

PROMIS domain Adjacent categories Group cut-score placements
Final recommended cut scores
awJIA cwJIA awcSLE cwcSLE Clinician G1 Clinician G2 JIA SLE
Physical function—mobility No problems to mild 42.5* 42.5* 42.5** 42.5
Mild to moderate 32.5 37.5 37.5** 37.5
Moderate to severe 22.5* 22.5** 22.5* 22.5
Anxiety No problems to mild 52.5** 52.5 52.5** 52.5 52.5 47.5 52.5 52.5
Mild to moderate 62.5* 62.5* 67.5 67.5 62.5* 62.5 62.5 65
Moderate to severe 72.5* 72.5* 72.5 72.5 72.5** 72.5* 72.5 72.5
Fatigue No problems to mild 42.5 42.5 47.5 42.5 47.5 42.5** 42.5 45
Mild to moderate 57.5* 57.5 62.5 57.5 57.5 52.5** 57.5 57.5
Moderate to severe 67.5 67.5 67.5 72.5 67.5** 67.5 67.5 67.5
Depressive symptoms No problems to mild 47.5 47.5* 52.5** 47.5
Mild to moderate 62.5 57.5 57.5** 57.5
Moderate to severe 77.5 72.5 72.5** 72.5

Values in bold and followed by two asterisks are values for which 100% consensus was reached. Values in bold with a single asterisk are values for which 78% or greater agreement was reached prior to taking the median
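To show how the final recommended cut scores might be applied, the sketch below maps a T-score to a severity category. The cut scores are transcribed from the final two columns of Table 3. Everything else is an assumption made for illustration: the function and dictionary names are hypothetical, and the handling of a score that falls exactly on a cut score (here assigned to the more severe category) is not specified in the text.

```python
CATEGORIES = ["no problems", "mild", "moderate", "severe"]

# Final recommended cut scores transcribed from Table 3, ordered
# (no problems/mild, mild/moderate, moderate/severe).
CUT_SCORES = {
    ("JIA", "mobility"): (42.5, 37.5, 22.5),
    ("JIA", "anxiety"): (52.5, 62.5, 72.5),
    ("JIA", "fatigue"): (42.5, 57.5, 67.5),
    ("SLE", "anxiety"): (52.5, 65.0, 72.5),
    ("SLE", "fatigue"): (45.0, 57.5, 67.5),
    ("SLE", "depressive symptoms"): (47.5, 57.5, 72.5),
}

# Function domains: higher T-scores mean better function, so severity
# increases as the score decreases.
FUNCTION_DOMAINS = {"mobility"}


def severity_category(disease, domain, t_score):
    """Classify a T-score using the Table 3 cut scores.

    Assumption: a score exactly equal to a cut score is placed in the
    more severe of the two adjacent categories.
    """
    cuts = CUT_SCORES[(disease, domain)]
    if domain in FUNCTION_DOMAINS:
        crossed = sum(t_score <= c for c in cuts)
    else:
        crossed = sum(t_score >= c for c in cuts)
    return CATEGORIES[crossed]


if __name__ == "__main__":
    print(severity_category("JIA", "fatigue", 55.0))
    print(severity_category("JIA", "mobility", 37.0))
    print(severity_category("SLE", "anxiety", 65.0))
```

A lookup of this kind is what the consequential validity exercise asks participants to do by hand: find the category their own score falls into and judge whether it seems appropriate.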

Consequential validity

For most participants, the PROMIS short form scores completed prior to the bookmarking exercise fell within the severity category they expected when group consensus cut scores were used to delineate severity categories. For participants whose PROMIS scores were incongruent with the severity category (0 to 2 people per domain in each group), the majority chose not to change their bookmark placements, with the exception of one clinician. Participants cited different reasons for not changing their bookmarks. Many explained that they had focused only on their own situation when completing the short form, whereas during the bookmarking activity they considered the range of experiences represented in the vignettes, some of which may have been more or less severe than their own. Rather than moving their bookmarks, participants suggested that the previously set group bookmarks were correct based on the broader range of experiences represented in the activity.

Qualitative findings

Using RAP [26], we ascertained that decisions around bookmarking placements across disease groups were largely based on (1) perceived severity of individual items, (2) frequency of individual items (never, almost never, sometimes, often, almost always), or (3) a combination of both severity and frequency. Additionally, with regard to item severity, the definition of “normal” was a primary point of conversation around differentiating no problems from mild symptoms or function. For example, one awJIA participant explaining their cut-score placement said:

… with Lee [vignette name] he sometimes felt tired, and I just figured most people get tired when they do things. Like after a tough school day, you might feel tired but that doesn’t necessarily mean that you have mild fatigue. It just means that you’re tired.

Similarly, impact on activities of daily living (such as attending school or being able to shower) seemed to be one specific facet of item severity that was salient across groups, with one cwJIA participant commenting:

What pushed me to moderate for Christine [vignette name] was ‘get in and out of a car with a little trouble’, ‘get in to bed by herself with some trouble’. Getting into bed is such a common, normal thing, and if they can’t even get into bed by themselves their mobility is very severe…

Additionally, child age played a large role in how clinicians interpreted severity within the vignettes. Clinicians reported varying expectations of what acceptable symptom levels are for children of different ages, as patients have varying levels of fatigue, anxiety, mobility, or depressive symptoms at varying developmental stages. For example, one clinician commented that:

… so they’re each worried about, sort of… sometimes, one thing but depending on the age of the child that could be normal but it could be super abnormal…

With regard to the clinical utility of cut scores and PROs, clinicians reported interest in using PROs in clinic, but that obtaining scores in a timely manner and interpreting those scores were barriers to use of PROs in clinical settings. As expressed by one clinician in a bookmarking group:

I think that is always one of the challenges with all these PROMIS measures is that yeah, there’s a lot of good information. Yeah, we can give you a score but then if there’s no context with that score then you’re sort of like oh well, I don’t really know what I’m supposed to do with this now that you’re a sixty-five. Holy cow. We missed an opportunity and I think that’s ultimately one of the things I would like to feedback is how do you get that score back to you quick enough to be able to act on it?

Further, clinicians commented across all domains that with clinical severity categories, they would begin treating patients when symptoms or functional impacts reached moderate levels, if due to disease activity. Treatment would become more aggressive the more severe symptoms became, but cut scores separating moderate from severe may not be as important or necessary as the cut score for moderate, with one clinician commenting:

I would probably treat anything from moderate … and so I think it’s a little bit arbitrary for where you do the severe part because for me once you go into the moderate category it’s sort of like well we need to address this and figure out what we need to do with it.

If the symptoms were unrelated to disease activity, they would refer patients out for treatment. For example, clinicians noted that there isn’t much they can do for fatigue other than to refer out for sleep studies or specialty treatment.

Discussion

This study identified clinical cut scores for PROMIS Pediatric measures in JIA and cSLE populations using bookmarking methodology. Clinicians, caregivers, and adolescents placed bookmarks with a high degree of similarity, sometimes with perfect agreement, both within and across groups.

Unique contributions: comparison across multiple groups of cut scores

One component of the current study that differs from previous bookmarking studies is the inclusion of multiple clinician stakeholder groups, which allows us to evaluate consistency in scores across separate clinician groups for the same domain (anxiety and fatigue, Fig. 4). Cut scores across the clinician groups were highly similar, and in some cases identical. Agreement between the clinician groups tended to be greater at the more severe end of the scale, and even at the low end their cut scores differed by no more than 5 points.

A previous bookmarking study by Morgan et al. [23] estimated similar cut scores for PROMIS Pediatric domains of Physical Function-Mobility and Fatigue, allowing for comparison with our study (Figs. 5, 6). The Morgan study was conducted with JIA adolescents (15 to 20 years of age), caregivers of JIA adolescents (with children between 13 and 20 years of age), and clinicians (5–35 years of pediatric rheumatology experience). The differences in cut-score placements between the two studies, by stakeholder group and across stakeholder groups, varied by 2.5–7.5 points, with one exception. The caregiver placement of the “no problems” to “mild” cut score for mobility in the Morgan study had a group cut score differing 10 points from the corresponding cut-point in our study. That score also differed by 7.5 points from the adolescent group cut score and 12.5 points from the clinician group cut score in the Morgan study. Given how closely most of the other bookmarks were placed across stakeholder groups and across studies, the caregiver group in the Morgan study may be an outlier.

Fig. 5. Comparison of PROMIS® pediatric mobility function cut scores in the current study and a study by Morgan et al.

Fig. 6. Comparison of PROMIS® pediatric fatigue symptoms cut scores in the current study and a study by Morgan et al.

Similarity of bookmark placements within our study (e.g., among clinician groups) and across studies (i.e., our study and the Morgan study) provides support for the usefulness of data from single bookmarking sessions when stakeholder groups are represented. However, multiple bookmarking sessions with each stakeholder group may be needed to include a more diverse group of participants, resulting in more generalizable cut-score placements.

Clinical utility of PROs

The need to improve interpretability of PRO scores is well documented [35, 36]. Recent studies aimed specifically at assessing effective display of PRO data in clinical care and research settings identified the need for clinical severity categories/anchors [19]. Furthermore, information needs to be displayed in a useable format for clinicians, patients, and caregivers [18], which is done most clearly using clinical severity thresholds that are delineated graphically [37]. Our qualitative data suggest clinicians want to use PROs in clinical care, but they must be easy and quick to use for this to be feasible. For a pediatric rheumatologist, the ability to classify the degree of mobility impairment is critical to developing individualized treatment plans aimed at optimizing overall physical function. Establishing clinical cut scores that define T-score ranges on PROMIS Pediatric measures via bookmarking allows for identification of severity levels requiring intervention. Cut scores can also provide information regarding the magnitude of improved function (e.g., a change from severe to no problems in function), which may then be used effectively in graphic reports for patients and clinicians alike.

Limitations

The small number of stakeholder groups and participants in each stakeholder group may limit the generalizability of study findings. In particular, our cSLE groups were small, including only 5 people, which reduced our ability to recruit a diverse sample, particularly with regard to gender. Cut scores for Depressive Symptoms and Mobility were derived from smaller sample sizes than those for Anxiety and Fatigue. Future bookmarking studies in pediatric rheumatology should incorporate larger sample sizes via multiple stakeholder groups to assess the variability between different stakeholder groups and allow across group analysis to increase generalizability. In addition, with access to a large enough clinical sample, the use of purposive sampling rather than convenience sampling could increase diversity of participant groups.

We purposely selected the 13–17 age group because we recognize the bookmarking process is cognitively burdensome and we wanted to make sure the approach would be well understood by participants. Future studies should additionally explore the validity of these methods for children between 8 and 12 years.

An additional limitation was asking specialists in pediatric rheumatology who treat both JIA and cSLE to think about only one disease when participating in their bookmarking sessions. Group leaders asked clinicians to think about a specific patient, but did not specify the patient’s disease, which may have contributed to discrepancies between clinician and patient/caregiver cut scores, as clinicians may rate symptom severity differently in JIA than in cSLE. Finally, clinicians were disease specialists, rather than domain specialists. Clinician participants mentioned that some domains were difficult for them to work with because they are not area experts and suggested that domain experts participate in bookmarking, particularly for psychological domains.

Participant feedback

Overall, participants provided positive feedback about their role in the study and the usefulness of the bookmarks. Participants expressed interest in using clinical severity category ranges and cut scores in clinic, with one provider saying,

I think it would be helpful if I knew I was [working with a patient] on the mild closer to moderate versus the mild closer to normal. Right? So as long as you give maybe that information to the provider and her patient, I think it would be helpful to know where on the spectrum you are.

Many adolescents and caregivers also expressed appreciation at being involved in disease-related research and meeting other people with similar experiences.

Conclusion

PROs are critical in bringing the patient voice into clinical care. However, there are barriers to the successful integration of PRO measures into clinical settings, including the lack of knowledge around how to interpret and use PRO scores. The current study has identified clinical cut scores for severity categories for PROMIS Pediatric measures of anxiety, mobility, and fatigue in JIA and anxiety, fatigue, and depressive symptoms in cSLE, enhancing the interpretability of scores in two pediatric rheumatic disease populations. In addition, the qualitative data revealed several issues around the integration of PROs into clinical care and research settings. Future studies are needed to replicate the results, particularly across larger groups.

Supplementary Material

R-code for vignette item selection

Acknowledgements

This study was conducted as part of the Duke PEPR Center, which is funded by the National Institutes of Health through the following grant administered by the National Institute of Arthritis and Musculoskeletal and Skin Diseases: U19AR069519. The research reported is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We would like to acknowledge the adolescents, parents, and clinicians who participated in this study. We are grateful for their contributions. We would also like to acknowledge the Childhood Arthritis and Rheumatology Research Alliance (CARRA) and the Arthritis Foundation for their help in setting up bookmarking workshops at their annual meetings and conferences.

Footnotes

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11136-020-02513-6) contains supplementary material, which is available to authorized users.

Compliance with ethical standards

Conflict of interest: Laura Schanberg, MD, Sarah Ringold, MD, and Emily von Scheven work for the Childhood Arthritis and Rheumatology Research Alliance (CARRA). Dr. Schanberg is a former Board Chair and currently sits on the Registry and Research Oversight Committee along with Dr. Sarah Ringold. Dr. Emily von Scheven will be the next Board Chair for CARRA. All other authors declare no conflict of interest.

Ethical approval: All procedures performed in studies involving human participants were conducted in accordance with the ethical standards of the institutional review boards (Duke Health IRB, Pro00091284, and UNC IRB #15–2442) and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed consent: Informed consent was obtained from all individual participants included in the study. For participants under the age of 18, caregiver consent was obtained in addition to child assent.

References

  • 1. Helmick CG, Felson DT, Lawrence RC, Gabriel S, Hirsch R, Kwoh CK, et al. (2008). Estimates of the prevalence of arthritis and other rheumatic conditions in the United States. Part I. Arthritis and Rheumatism, 58(1), 15–25. 10.1002/art.23177.
  • 2. Sacks JJ, Helmick CG, Luo YH, Ilowite NT, & Bowyer S (2007). Prevalence of and annual ambulatory health care visits for pediatric arthritis and other rheumatologic conditions in the United States in 2001–2004. Arthritis and Rheumatism, 57(8), 1439–1445. 10.1002/art.23087.
  • 3. Wallace CA, Huang B, Bandeira M, Ravelli A, & Giannini EH (2005). Patterns of clinical remission in select categories of juvenile idiopathic arthritis. Arthritis and Rheumatism, 52(11), 3554–3562. 10.1002/art.21389.
  • 4. Ringold S, Seidel KD, Koepsell TD, & Wallace CA (2009). Inactive disease in polyarticular juvenile idiopathic arthritis: Current patterns and associations. Rheumatology (Oxford), 48(8), 972–977. 10.1093/rheumatology/kep144.
  • 5. Magni-Manzoni S, Pistorio A, Labo E, Viola S, Garcia-Munitis P, Panigada S, et al. (2008). A longitudinal analysis of physical functional disability over the course of juvenile idiopathic arthritis. Annals of the Rheumatic Diseases, 67(8), 1159–1164. 10.1136/ard.2007.078121.
  • 6. Gutierrez-Suarez R, Pistorio A, Cespedes Cruz A, Norambuena X, Flato B, Rumba I, et al. (2007). Health-related quality of life of patients with juvenile idiopathic arthritis coming from 3 different geographic areas. The PRINTO multinational quality of life cohort study. Rheumatology (Oxford), 46(2), 314–320. 10.1093/rheumatology/kel218.
  • 7. Seid M, Opipari L, Huang B, Brunner HI, & Lovell DJ (2009). Disease control and health-related quality of life in juvenile idiopathic arthritis. Arthritis and Rheumatism, 61(3), 393–399. 10.1002/art.24477.
  • 8. Haverman L, Grootenhuis MA, van den Berg JM, van Veenendaal M, Dolman KM, Swart JF, et al. (2012). Predictors of health-related quality of life in children and adolescents with juvenile idiopathic arthritis: Results from a web-based survey. Arthritis Care & Research (Hoboken). 10.1002/acr.21609.
  • 9. Kamphuis S, & Silverman ED (2010). Prevalence and burden of pediatric-onset systemic lupus erythematosus. Nature Reviews Rheumatology, 6(9), 538–546. 10.1038/nrrheum.2010.121.
  • 10. Tucker LB, Menon S, Schaller JG, & Isenberg DA (1995). Adult- and childhood-onset systemic lupus erythematosus: A comparison of onset, clinical features, serology, and outcome. British Journal of Rheumatology, 34(9), 866–872.
  • 11. Reeve BB, Wyrwich KW, Wu AW, Velikova G, Terwee CB, Snyder CF, et al. (2013). ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research. Quality of Life Research, 22(8), 1889–1905. 10.1007/s11136-012-0344-y.
  • 12. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, et al. (2007). Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the patient-reported outcomes measurement information system (PROMIS). Medical Care, 45(5 Suppl 1), S22–31. 10.1097/01.mlr.0000250483.85507.04.
  • 13. Cella D, Riley W, Stone A, Rothrock N, Reeve B, Yount S, et al. (2010). The patient-reported outcomes measurement information system (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. Journal of Clinical Epidemiology, 63(11), 1179–1194. 10.1016/j.jclinepi.2010.04.011.
  • 14. DeWalt DA, Gross HE, Gipson DS, Selewski DT, DeWitt EM, Dampier CD, et al. (2015). PROMIS® pediatric self-report scales distinguish subgroups of children within and across six common pediatric chronic health conditions. Quality of Life Research, 24(9), 2195–2208. 10.1007/s11136-015-0953-3.
  • 15. Reeve BB, Edwards LJ, Jaeger BC, Hinds PS, Dampier C, Gipson DS, et al. (2018). Assessing responsiveness over time of the PROMIS® pediatric symptom and function measures in cancer, nephrotic syndrome, and sickle cell disease. Quality of Life Research, 27(1), 249–257. 10.1007/s11136-017-1697-z.
  • 16. Quinn H, Thissen D, Liu Y, Magnus B, Lai JS, Amtmann D, et al. (2014). Using item response theory to enrich and expand the PROMIS® pediatric self report banks. Health and Quality of Life Outcomes, 12, 160. 10.1186/s12955-014-0160-x.
  • 17. Thissen D, Liu Y, Magnus B, Quinn H, Gipson DS, Dampier C, et al. (2016). Estimating minimally important difference (MID) in PROMIS pediatric measures using the scale-judgment method. Quality of Life Research, 25(1), 13–23. 10.1007/s11136-015-1058-8.
  • 18. Snyder C, Smith K, Holzner B, Rivera YM, Bantug E, Brundage M, et al. (2019). Making a picture worth a thousand numbers: Recommendations for graphically displaying patient-reported outcomes data. Quality of Life Research, 28, 345–356. 10.1007/s11136-018-2020-3.
  • 19. Brundage MD, Smith KC, Little EA, Bantug ET, Snyder CF, & PRO Data Presentation Stakeholder Advisory Board (2015). Communicating patient-reported outcome scores using graphic formats: Results from a mixed-methods evaluation. Quality of Life Research, 24(10), 2457–2472. 10.1007/s11136-015-0974-y.
  • 20. Cook KF, Cella D, & Reeve BB (2019). PRO-bookmarking to estimate clinical thresholds for patient-reported symptoms and function. Medical Care, 57(Suppl 5), S13–S17. 10.1097/MLR.0000000000001087.
  • 21. Cook KF, Victorson DE, Cella D, Schalet BD, & Miller C (2014). Creating meaningful cut-scores for Neuro-QOL measures of fatigue, physical functioning and sleep disturbance using standard setting with patients and providers. Quality of Life Research. 10.1007/s11136-014-0790-9.
  • 22. Cella D, Choi S, Garcia S, Cook KF, Rosenbloom S, Lai JS, et al. (2014). Setting standards for severity of common symptoms in oncology using the PROMIS item banks and expert judgment. Quality of Life Research. 10.1007/s11136-014-0732-6.
  • 23. Morgan EM, Mara CA, Huang B, Barnett K, Carle AC, Farrell JE, et al. (2017). Establishing clinical meaning and defining important differences for patient-reported outcomes measurement information system (PROMIS®) measures in juvenile idiopathic arthritis using standard setting with patients, parents, and providers. Quality of Life Research, 26(3), 565–586. 10.1007/s11136-016-1468-2.
  • 24. Buckendahl CW, Smith RW, Impara JC, & Plake BS (2002). A comparison of Angoff and bookmark standard setting methods. Journal of Educational Measurement, 39(3), 253–263.
  • 25. Cizek GJ, & Bunch MB (2006). The Bookmark Method. In Cizek G & Bunch MB (Eds.), Standard setting: A guide to establishing and evaluating performance standards on tests. Cleveland: SAGE Publications Inc.
  • 26. Beebe J (2001). Rapid assessment process: An introduction. Walnut Creek: AltaMira Press.
  • 27. Teens Know What They Want from Online News. (2013). Do You? Newspaper Association of America Foundation and the Media Management Center at Northwestern University.
  • 28. R Core Team (2013). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
  • 29. Irwin DE, Stucky B, Langer MM, Thissen D, Dewitt EM, Lai JS, et al. (2010). An item response analysis of the pediatric PROMIS anxiety and depressive symptoms scales. Quality of Life Research, 19(4), 595–607. 10.1007/s11136-010-9619-3.
  • 30. Varni JW, Thissen D, Stucky BD, Liu Y, Magnus B, Quinn H, et al. (2014). PROMIS® parent proxy report scales for children ages 5–7 years: An item response theory analysis of differential item functioning across age groups. Quality of Life Research, 23(1), 349–361. 10.1007/s11136-013-0439-0.
  • 31. Irwin DE, Stucky BD, Thissen D, Dewitt EM, Lai JS, Yeatts K, et al. (2010). Sampling plan and patient characteristics of the PROMIS pediatrics large-scale survey. Quality of Life Research, 19(4), 585–594. 10.1007/s11136-010-9618-4.
  • 32. DeWitt EM, Stucky BD, Thissen D, Irwin DE, Langer M, Varni JW, et al. (2011). Construction of the eight-item patient-reported outcomes measurement information system pediatric physical function scales: Built using item response theory. Journal of Clinical Epidemiology, 64(7), 794–804. 10.1016/j.jclinepi.2010.10.012.
  • 33. Lai JS, Stucky BD, Thissen D, Varni JW, DeWitt EM, Irwin DE, et al. (2013). Development and psychometric properties of the PROMIS® pediatric fatigue item banks. Quality of Life Research, 22(9), 2417–2427. 10.1007/s11136-013-0357-1.
  • 34. Messick S (1998). Test validity: A matter of consequence. Social Indicators Research, 45(1–3), 35–44.
  • 35. Guyatt GH, Feeny DH, & Patrick DL (1993). Measuring health-related quality of life. Annals of Internal Medicine, 118(8), 622–629. 10.7326/0003-4819-118-8-199304150-00009.
  • 36. Spertus J (2014). Barriers to the use of patient-reported outcomes in clinical care. Circulation: Cardiovascular Quality and Outcomes, 7(1), 2–4. 10.1161/circoutcomes.113.000829.
  • 37. Snyder CF, Smith KC, Bantug ET, Tolbert EE, Blackford AL, & Brundage MD (2017). What do these scores mean? Presenting patient-reported outcomes data to patients and clinicians to improve interpretability. Cancer, 123(10), 1848–1859. 10.1002/cncr.30530.
