Abstract
Objective:
To demonstrate how to interpret PROMIS pediatric patient-reported outcome measure (PROM) scores for patients with pediatric inflammatory bowel disease (IBD).
Methods:
Using data from a prospective cohort study of patients ages 8–23 years with IBD (n = 1,049), we established disease-specific percentiles and computed the minimal clinically important difference (MCID) change score for six pediatric PROMs. We applied these results, general population percentiles, and the reliable change index to interpret PROM scores in a clinical trial sample of patients ages 8–20 years with IBD (n = 294) in which PROMIS PROMs were obtained at baseline and 3 months later.
Results:
Application of general population percentiles showed that the clinical trial sample at baseline had moderately worse self-reported health than the general population (22% of patients at or above the 95th percentile on Fatigue; 21% on Pain Interference). IBD-specific percentiles showed that the sample was somewhat worse than the reference IBD sample (8% of patients at or above the 95th percentile on Fatigue; 11% on Pain Interference). Application of the MCID threshold indicated that among the subgroup of patients who improved by 15 or more points on the short Pediatric Crohn’s Disease Activity Index (n = 38), 45% also improved on IBD Symptoms, 47% on Fatigue, and 65% on Pain Interference.
Conclusion:
This study established IBD-specific percentiles for six pediatric PROMIS measures and demonstrated the application of percentiles and other methods for interpreting PROM scores.
Keywords: patient reported outcome measures, treatment outcome, inflammatory bowel diseases, Crohn disease, child
Introduction
Patient-reported outcome measures (PROMs) quantify and systematically evaluate patients’ perspectives on symptoms, functional status, or well-being.1,2 For PROMs to be impactful in clinical decision-making, users of these measures must be able to interpret scores at a single point in time and changes in these scores over time. One approach that is useful clinically, particularly for cross-sectional comparisons, is to link PROM scores to percentile ranks from a reference population. This is akin to educational standardized tests, such as the Scholastic Assessment Test (SAT), which provide both scores and percentile ranks. Percentiles may be updated over time without changing the scoring, which is critical for longitudinal research and comparing scores collected at different points in a measure’s history. Percentiles may also be used to establish clinically meaningful cut-points, similar to the definition of obesity as the 95th percentile or greater in body mass index for children of the same age and sex.3
To interpret longitudinal differences in PRO scores, it is useful to consider whether a PROM score’s change is large enough to be meaningful to patients and clinicians. This issue is particularly salient when evaluating group change because it may be deemed statistically significant even when the difference is small.4–7 There is no single method to define a minimal clinically important difference (MCID) for a change between scores obtained at different time points. An MCID is considered the smallest difference in a score that would be perceived as beneficial or would have implications for patients’ management.7–10 An MCID is commonly estimated by examining the magnitude of change in PROM scores observed in a group of patients that experienced a minimally important amount of change on a separate but associated clinical measure, such as disease severity. One-half the standard deviation of observed scores can also be used as a rule of thumb for identifying meaningful change when no clinical measure is available.11 Finally, the reliable change index (RCI)12 provides a useful metric to examine whether true change has likely occurred for an individual, as opposed to change due to measurement error. These approaches offer different perspectives and advantages when interpreting PROM score changes.
The NIH-funded Patient Reported Outcomes Measurement Information System (PROMIS) has produced dozens of pediatric PROMs that are now commonly used in clinical research and increasingly in clinical practice.13–17 These measures were developed to be universally applicable, rather than disease-specific. PROMIS scores are predictions of an individual’s level of health based on item response theory (IRT). This involves using the items’ IRT-based psychometric properties and the individual’s pattern of responses to the items to create an IRT-based score,18 as opposed to creating a score using a simple sum or average of the responses. The IRT scores, interpreted relative to a standard normal metric, are then transformed to the PROMIS T-score metric by multiplying the IRT score by 10 and adding 50. Thus, theoretically, each PROMIS measure may be interpreted relative to a normal distribution with a mean of T = 50 and standard deviation of 10. However, some of the samples used to set the metric and establish the mean and SD of the distributions were not representative of the general population, nor were they representative of a clinical population. Hence, a T-score of 50 does not always correspond to a meaningful average. To aid PROMIS T-score interpretation, we recently published nationally representative general population percentiles for each PROMIS Pediatric measure based on large samples of children (approximately 1,000 for each measure) in the United States aged 8 to 17 years.19 These percentiles allow one to interpret a score relative to the distribution of scores among children in the US. For example, a PROMIS Fatigue T-score of 60 corresponds to the 95th percentile, indicating that the score is higher than or equal to 95% of Fatigue scores in the US general population.
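The T-score transformation described above can be written as a one-line function. The sketch below is illustrative only: the function name is hypothetical, and the IRT scoring step that produces the input (which requires the published item parameters) is not shown.

```python
def irt_to_t_score(irt_score: float) -> float:
    """Convert an IRT score on the standard normal metric to the T-score metric.

    Implements T = 10 * score + 50, the transformation stated in the text.
    """
    return 10.0 * irt_score + 50.0


# An IRT score of 0 sits at the centre of the metric; each unit is 10 T-score points.
for theta in (-1.0, 0.0, 1.0):
    print(theta, irt_to_t_score(theta))
```

Because the transformation is linear, a difference of 0.5 on the IRT metric always corresponds to 5 T-score points, regardless of where on the scale it occurs.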
The purpose of this study is to demonstrate approaches for understanding PROM scores and to establish IBD-specific percentiles for six PROMIS measures. We examine alternative interpretation approaches that may be applied to pediatric clinical populations using data from two studies of children and adolescents with inflammatory bowel disease (IBD). Using data from a large prospective cohort study of patients ages 8–23 years (n = 1,049), we established disease-specific percentiles and MCID estimates for six PROMs relevant to pediatric IBD. We applied these results and other analyses to interpret PROM scores in a clinical trial for patients ages 8–20 years with IBD (n = 294).
Methods
Study Samples
In the first study, hereafter referred to as the IBD standard sample, participants were enrolled from February to June 2015 in an observational study on patient-reported outcomes. Recruitment took place in 14 pediatric gastroenterology practices in the ImproveCareNow Pediatric IBD Learning Health System learning network.20 Eligibility criteria were age 8–23 years, diagnosis of inflammatory bowel disease (IBD), and capacity to complete self-administered questionnaires.
In the second study, hereafter referred to as the COMBINE clinical trial, participants were enrolled at 35 study centers from October 2016 to December 2019 in the Low Dose Oral Methotrexate in Pediatric Crohn’s Disease Patients Initiating Anti-Tumor Necrosis Factor (anti-TNF) Therapy (COMBINE) trial (NCT02772965). Eligibility criteria were age 8–20 years, diagnosis of Crohn’s disease, and initiation of anti-TNF therapy. Patients were randomized to receive methotrexate or placebo. Assignment to these two groups remained blinded as of the writing of this manuscript. Thus, we report results for the entire cohort of participants in the trial.
In both studies, participants completed PROMs at study baseline and 3 months later. Participants 18 years or older or parents of children provided informed consent, and children provided assent. For the IBD standard sample, the Institutional Review Board of the Children’s Hospital of Philadelphia approved the study protocol (IRB #14–011233). For the COMBINE clinical trial, the Central Institutional Review Board at Cincinnati Children’s Hospital approved the study protocol (IRB #2018–0234C).
PROMs
The questionnaire administered to participants at baseline and follow-up in the IBD standard sample included a newly developed IBD Symptoms questionnaire (see Appendix A for details) and five PROMIS Pediatric instruments completed by patients (not parent proxies): Global Health 721 and 4-item short forms for Pain Interference,22 Fatigue,23 Life Satisfaction,24 and Psychological Stress Experiences.25 The questionnaire administered in the COMBINE clinical trial included the IBD Symptoms measure and 8-item short forms for Pain Interference22 and Fatigue.23 PROMIS instruments are publicly available (see healthmeasures.net). We used Mplus 826 and the published item parameters to produce IRT scores18 and the standard error of measurement (SEM) for each score. Scores were converted to a T-scale (T-score = 10 × score + 50). The measures are scored in the direction of their concept name, so that greater Global Health scores correspond to better health, whereas greater Fatigue scores correspond to worse fatigue.
Clinical Data
Clinical data for both studies were extracted from the ImproveCareNow registry. Disease severity indices included the Physician Global Assessment27 (PGA) and the short Pediatric Crohn’s Disease Activity Index28 (sPCDAI). The PGA is a physician rating of disease activity using a Likert scale of 1 (inactive) to 4 (severe). The sPCDAI is scored from 0–90 with items that assess abdominal pain, stools, weight, extra-intestinal manifestations, and well-being. Scores are classified as inactive (< 15), mild (15 to 25), or moderate-severe (> 25).
IBD-Specific Percentiles
To enable comparisons of PROMIS T-scores to children and adolescents with IBD, we used the baseline data from the IBD standard sample to develop disease-specific percentiles for each PROM. The pctile command in STATA29 was used to order scores from lowest to highest and calculate the score in the dataset that has p percent of scores at or below it, where p ranges from 1 to 99.
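The percentile definition in the preceding paragraph can be sketched in a few lines of Python. This is not the Stata routine used in the study: the function names are hypothetical, and the rule applied here (take the smallest observed score that has at least p percent of scores at or below it) is an assumption that may differ from Stata's handling of ties and boundary cases.

```python
def percentile_table(scores):
    """Return {p: score} for p = 1..99.

    For each p, the value is the smallest observed score such that at least
    p percent of all scores are at or below it (an assumed definition).
    """
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one score is required")
    table = {}
    for p in range(1, 100):
        # Smallest 1-based rank k with k / n >= p / 100, using integer arithmetic.
        rank = (p * n + 99) // 100
        table[p] = ordered[rank - 1]
    return table


def percentile_rank(score, reference_scores):
    """Percentage of reference scores that are at or below the given score."""
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100.0 * at_or_below / len(reference_scores)


# Toy reference sample of T-scores (made-up values, not study data).
toy_reference = list(range(31, 81))
toy_table = percentile_table(toy_reference)
print(toy_table[50], toy_table[95], percentile_rank(55, toy_reference))
```

The first function builds a lookup table like the one reported for the IBD standard sample; the second goes the other way and places a single patient's score within a reference distribution.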
Data Analysis
To demonstrate the application of percentiles to the interpretation of scores, we used the IBD percentiles and the general population percentiles to categorize participants’ baseline PRO scores in the COMBINE clinical trial. For each PROM, we calculated the percentage of participants in each of three percentile-based severity categories: < 75th percentile; 75th – 94th percentile; 95th percentile or above.
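The three-category classification can be sketched as follows for measures on which higher scores indicate worse health. The function name and the example T-score are hypothetical; the cut-points are the Fatigue 75th and 95th percentile T-scores reported in Table 2, and treating a score exactly equal to a cut-point as falling in the higher category is an assumption.

```python
def severity_category(t_score, p75_cut, p95_cut):
    """Assign a percentile-based severity category for a symptom measure.

    p75_cut and p95_cut are the T-scores at the 75th and 95th percentiles
    of the chosen reference group.
    """
    if t_score >= p95_cut:
        return "95th percentile or above"
    if t_score >= p75_cut:
        return "75th-94th percentile"
    return "< 75th percentile"


# Fatigue cut-points taken from Table 2.
FATIGUE_CUTS = {
    "IBD sample": (57, 70),
    "US general population": (47, 60),
}

example_t_score = 62  # hypothetical patient score
for group, (p75, p95) in FATIGUE_CUTS.items():
    print(group, "->", severity_category(example_t_score, p75, p95))
```

The same score lands in different categories depending on the reference group, which is why the two sets of percentiles are treated as complementary.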
To interpret group-level change, we estimated clinical anchor-based MCIDs using data from the IBD standard sample. Patients in the IBD standard sample completed a global assessment of disease activity at baseline and at follow-up using the same item: “In the past seven days, my IBD was a problem.” Response options ranged from 1 = “Never” to 5 = “Always.” This item was used as the clinical anchor for MCID estimation. We defined minimally important change on this anchor as a change between 4–5 (Often or Always) and 2–3 (Rarely or Sometimes) or between 2–3 and 1 (Never). Using the longitudinal data from the IBD standard sample, we first selected the subgroup of patients that showed minimally important improvement on the anchor and the subgroup that showed minimally important worsening on the anchor. For each subgroup, we used the lme4 package30 in R31 to construct mixed models with the scores on each PROM regressed on time (baseline/follow-up), age, gender, and the number of days that elapsed between baseline and follow-up and included by-subject random intercepts. The MCID estimate was the model estimate of the adjusted difference in score between baseline and follow-up. Tests of statistical significance used alpha = 0.05. To demonstrate the application of MCID thresholds to score interpretation, we determined whether group average improvement in the clinical trial sample exceeded the MCID estimates. In addition, we calculated one-half the standard deviation in baseline scores for each PROM in the COMBINE clinical trial and applied this value as a threshold for change.
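Two of the steps above lend themselves to a short sketch: classifying change on the anchor item, and computing the one-half standard deviation threshold. The mixed models fitted with lme4 are not reproduced here. The function names are hypothetical, and two details are assumptions: changes that skip a band (for example, Always to Never) are treated as more than minimal and labeled "other", and the sample (n − 1) standard deviation is used.

```python
from statistics import stdev


def anchor_band(response):
    """Collapse the 1-5 anchor response into the three bands used in the text."""
    if response == 1:
        return 0  # Never
    if response in (2, 3):
        return 1  # Rarely or Sometimes
    if response in (4, 5):
        return 2  # Often or Always
    raise ValueError("anchor response must be an integer from 1 to 5")


def anchor_change(baseline, follow_up):
    """Label change on the anchor as minimally improved, minimally worsened, or other."""
    shift = anchor_band(follow_up) - anchor_band(baseline)
    if shift == -1:
        return "minimally improved"
    if shift == 1:
        return "minimally worsened"
    return "other"  # same band, or a jump across two bands (assumption)


def half_sd_threshold(baseline_scores):
    """One-half of the standard deviation of baseline scores."""
    return 0.5 * stdev(baseline_scores)


print(anchor_change(4, 3))              # Often -> Sometimes
print(anchor_change(2, 1))              # Rarely -> Never
print(half_sd_threshold([40, 50, 60]))  # toy scores, not study data
```

Patients labeled "minimally improved" or "minimally worsened" form the two subgroups whose adjusted baseline-to-follow-up differences serve as the MCID estimates.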
We employed the reliable change index (RCI)12 to identify responders. The RCI provides a confidence interval for the difference in scores from baseline to follow-up, accounting for the measurement error associated with each of the two time points: $\mathrm{RCI} = (\text{Time 2 Score} - \text{Time 1 Score}) / \sqrt{(\text{Time 1 SEM})^2 + (\text{Time 2 SEM})^2}$. Significant change was defined as an RCI whose absolute value exceeded 1.28 (the z-score corresponding to a two-tailed 80% confidence level). Consistent with prior work,32–34 we also used the MCID estimates to assess responder status.
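A minimal sketch of the RCI calculation and the resulting responder classification follows. The function names, the example scores, and the SEM values are hypothetical; the 1.28 threshold is the one stated in the text, and the `higher_is_worse` flag is an added convenience for measures such as Global Health that are scored in the opposite direction.

```python
import math


def reliable_change_index(score_t1, score_t2, sem_t1, sem_t2):
    """Score difference divided by the combined standard error of measurement."""
    return (score_t2 - score_t1) / math.sqrt(sem_t1 ** 2 + sem_t2 ** 2)


def rci_status(rci, threshold=1.28, higher_is_worse=True):
    """Classify an individual as improved, worsened, or no change."""
    if abs(rci) <= threshold:
        return "no change"
    score_went_down = rci < 0
    if higher_is_worse:
        return "improved" if score_went_down else "worsened"
    return "worsened" if score_went_down else "improved"


# Hypothetical Fatigue T-scores and SEMs for one patient.
rci = reliable_change_index(score_t1=60.0, score_t2=52.0, sem_t1=3.0, sem_t2=4.0)
print(round(rci, 2), rci_status(rci))
```

Because each score carries its own SEM, the same 8-point drop could count as reliable change for one patient and not for another whose scores were measured less precisely.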
Results
Participants
Participants included 1,049 children in the IBD standard sample and 294 children in the COMBINE clinical trial sample (Table 1). The age range at study baseline was 8–23 years (mean = 16; SD = 3) in the IBD standard sample and 8–20 years (mean = 14; SD = 3) in the COMBINE trial. Participants in the IBD standard sample completed baseline questionnaires using paper and pen (n = 949) or electronic administration online (n = 100). Follow-up PROMs were completed by 493 participants (47%); 243 completed paper questionnaires and 250 completed electronic questionnaires. Participants in the COMBINE clinical trial completed questionnaires using paper and pen. Follow-up PROMs were completed by 234 participants (80%). The average time between baseline and follow-up PROM completion dates was 3.8 months (range = 2–8) in the IBD standard sample and 3.5 months (range = 2–6) in the COMBINE trial.
Table 1.
Baseline Participant Characteristics
| | IBD Standard Sample | COMBINE Clinical Trial Sample | |
|---|---|---|---|
| Total study sample | n = 1,049 | n = 294 | |
| Age at baseline in years (mean; SD) | 16; 3 | 14; 3 | |
| | n (%) | n (%) | p-value |
| Age at baseline (years) | |||
| 8–12 | 231 (22%) | 106 (36%) | <0.01 |
| 13–17 | 559 (53%) | 175 (60%) | |
| 18–23 | 259 (25%) | 13 (4%) | |
| Gender | |||
| Male | 569 (54%) | 193 (66%) | <0.01 |
| Female | 480 (46%) | 101 (34%) | |
| Race/Ethnicity | |||
| Hispanic/Latino | 49 (5%) | 8 (3%) | 0.21 |
| Black or African American | 112 (12%) | 30 (10%) | |
| White | 711 (78%) | 240 (82%) | |
| Other | 41 (5%) | 14 (5%) | |
| Diagnosis | |||
| Crohn’s disease | 726 (72%) | 294 (100%) | <0.01 |
| Ulcerative colitis | 224 (22%) | 0 (0%) | |
| Indeterminate colitis | 63 (6%) | 0 (0%) | |
| Patient-reported Disease Activity | |||
| Never | 440 (45%) | 95 (33%) | <0.01 |
| Rarely/Sometimes | 446 (45%) | 133 (46%) | |
| Often/Always | 103 (10%) | 63 (21%) | |
| Physician Global Assessment 1 | |||
| Inactive | 593 (70%) | 69 (33%) | <0.01 |
| Mild | 167 (20%) | 94 (45%) | |
| Moderate-Severe | 80 (10%) | 45 (22%) | |
| Medications 1 | |||
| Azathioprine | 134 (16%) | 0 (0%) | <0.01 |
| Steroids | 119 (14%) | 98 (45%) | <0.01 |
| Methotrexate | 133 (15%) | 0 (0%) | <0.01 |
| Other Measurements1 | Mean (SD) | Mean (SD) | p-value |
| Weight (kg) | 57 (18) | 49 (17) | <0.01 |
| Height (cm) | 162 (14) | 157 (16) | <0.01 |
| Disease duration (years) | 3.8 (3.1) | 0.7 (1.7) | <0.01 |
| C-reactive protein (mg/L) | 3.2 (8.8) | 3.6 (8.4) | 0.67 |
| Erythrocyte sedimentation rate (mm/h) | 15.3 (14.9) | 17.2 (15.7) | 0.17 |
| Albumin (g/dL) | 4.1 (0.5) | 4.0 (0.5) | <0.01 |
| Hematocrit (%) | 38.6 (4.4) | 37.5 (4.4) | <0.01 |
| Crohn’s Disease Patients | n (%) | n (%) | p-value |
| Short Pediatric Crohn’s Disease Activity Index 1 | |||
| Inactive (< 15) | 323 (72%) | 104 (51%) | <0.01 |
| Mild (15–25) | 72 (16%) | 64 (32%) | |
| Moderate-Severe (>25) | 53 (12%) | 35 (17%) | |
| Lower GI disease | |||
| Ileal only | 119 (16%) | 58 (22%) | 0.05 |
| Colonic only | 177 (24%) | 48 (18%) | |
| Ileocolonic | 443 (59%) | 156 (58%) | |
| None | 10 (1%) | 6 (2%) | |
| Upper GI disease proximal to the ligament of Treitz | 301 (42%) | 138 (54%) | <0.01 |
| Upper GI disease distal to the ligament of Treitz | 129 (20%) | 65 (28%) | 0.02 |
| Disease behavior | |||
| Inflammatory (non-penetrating, non-stricturing) | 568 (87%) | 227 (86%) | 0.90 |
| Stricturing only | 35 (5%) | 13 (5%) | |
| Penetrating only | 34 (5%) | 15 (6%) | |
| Both stricturing and penetrating | 15 (2%) | 8 (3%) | |
| Perianal disease | 140 (18%) | 31 (21%) | 0.39 |
| Ulcerative Colitis Patients | n (%) | ||
| Pediatric Ulcerative Colitis Activity Index 1 | |||
| Inactive (<10) | 132 (69%) | ||
| Mild (10–30) | 36 (19%) | ||
| Moderate-Severe (>30) | 24 (12%) | ||
| Disease Extent | |||
| Ulcerative Proctitis (rectum only) | 16 (7%) | ||
| Left sided Ulcerative Colitis | 39 (16%) | ||
| Extensive Ulcerative Colitis | 23 (10%) | ||
| Pancolitis (the entire colon) | 160 (67%) |
Note. The denominator for each percentage is the number of participants with data available for the variable.
1 Data were included if the clinic visit occurred within 30 days of the study baseline questionnaire.
Interpreting Cross-sectional Scores
Selected percentiles for the IBD standard sample are reported in Table 2, and the full percentiles are reported in Appendix B. Appendix C includes a figure displaying the distributions of T-scores corresponding to the 1st through 99th percentiles for the general population and for the IBD standard sample. Compared with the US general pediatric population, the T-score marking the most severe 5% of the IBD standard sample (the 95th percentile for symptom measures and the 5th percentile for Global Health and Life Satisfaction) was higher by 10 points for Fatigue, 6 for Pain Interference, and 5 for Psychological Stress, and lower by 6 for Global Health and 2 for Life Satisfaction (see Table 2).
Table 2.
Select IBD Percentiles and US General Population Percentiles
| High scores & percentiles = poorer functioning | ||||||
|---|---|---|---|---|---|---|
| PROM | Reference Group | 5th percentile | Median (50th percentile) | 75th percentile | 95th percentile | T-score Range |
| IBD Symptoms | IBD sample | 40 | 50 | 56 | 65 | 40–80 |
| | US general population | n/a | n/a | n/a | n/a | n/a |
| Fatigue | IBD sample | 25 | 48 | 57 | 70 | 25–88 |
| | US general population | 20 | 37 | 47 | 60 | 20–89 |
| Pain Interference | IBD sample | 31 | 43 | 54 | 64 | 31–77 |
| | US general population | 25 | 36 | 49 | 58 | 25–80 |
| Psychological Stress Experiences | IBD sample | 41 | 55 | 62 | 71 | 41–80 |
| | US general population | 37 | 50 | 56 | 66 | 39–75 |
| High scores & percentiles = greater functioning | ||||||
| PROM | Reference Group | 95th percentile | Median (50th percentile) | 25th percentile | 5th percentile | T-score Range |
| Global Health | IBD sample | 58 | 43 | 37 | 30 | 15–61 |
| | US general population | 61 | 48 | 41 | 36 | 16–64 |
| Life Satisfaction | IBD sample | 61 | 49 | 44 | 33 | 21–61 |
| | US general population | 63 | 49 | 44 | 35 | 20–63 |
Note. Cell values are T-scores corresponding to the specified percentile. US general population percentiles were established in prior work.19
The IBD percentiles and the US general population percentiles provided two complementary reference points to interpret the severity of fatigue and pain interference among patients in the COMBINE clinical trial (see Table 3). Compared with the general population, a relatively high percentage of patients in the clinical trial fell at or above the 95th percentile: 22% for Fatigue and 21% for Pain Interference. Using the IBD-specific percentiles, the percentage of patients at or above the 95th percentile was just 8% for Fatigue and 11% for Pain Interference, suggesting that the COMBINE clinical trial sample had a somewhat higher burden of fatigue and pain interference compared with the unselected group of children with IBD.
Table 3.
COMBINE Clinical Trial Baseline PRO Scores by Percentile Categories
| PROM | Reference Group | < 75th percentile, n (%) | 75th – 94th percentile, n (%) | 95th percentile or above, n (%) |
|---|---|---|---|---|
| IBD Symptoms | IBD sample | 175 (60%) | 102 (35%) | 15 (5%) |
| | US general population | n/a | n/a | n/a |
| Fatigue | IBD sample | 212 (72%) | 59 (20%) | 22 (8%) |
| | US general population | 157 (53%) | 72 (25%) | 64 (22%) |
| Pain Interference | IBD sample | 193 (66%) | 69 (23%) | 31 (11%) |
| | US general population | 156 (53%) | 76 (26%) | 61 (21%) |
Interpreting Change in PROM Scores
In the IBD standard sample, we observed minimally important improvement in disease activity for 68 patients (14%) and minimally important worsening in disease activity for 84 patients (17%). Data from these patient groups were used to estimate MCID thresholds for each PROM (see Table 4). MCIDs could not be obtained for improvement or worsening on Life Satisfaction or for worsening on Psychological Stress because the minimally improved group of patients did not show significant change on Life Satisfaction (p = 0.13), and the worsened group did not show significant change on Life Satisfaction (p = 0.07) or Psychological Stress (p = 0.85).
Table 4.
Thresholds for Important Change
| PROM | Clinical anchor-based MCID: Improvement | Clinical anchor-based MCID: Worsening | 0.5 Standard deviation |
|---|---|---|---|
| IBD Symptoms | −5.6 (CI: −7.1 to −4.2) | 5.0 (CI: 3.7 to 6.3) | 3.3 |
| Fatigue | −8.7 (CI: −12.0 to −5.4) | 3.6 (CI: 0.4 to 6.8) | 7.6 |
| Pain Interference | −5.2 (CI: −8.0 to −2.4) | 3.6 (CI: 1.0 to 6.3) | 7.2 |
| Psychological Stress Experiences | −4.4 (CI: −6.5 to −2.3) | n/a | n/a |
| Global Health | 2.4 (CI: 0.7 to 4.0) | −2.5 (CI: −3.9 to −1.1) | n/a |
Notes. CI = 95% confidence interval for the minimal clinically important difference (MCID) estimate. 0.5 standard deviation was calculated using baseline data from the COMBINE clinical trial sample. The MCID analyses showed no significant associations between PRO scores and covariates (i.e., gender, age in years, and days between baseline and follow-up) for the improved subgroup, with the exception of higher psychological stress associated with older age (β = 0.7, SE = 0.3, p = .01). For the worsened group, higher psychological stress was associated with older age (β = 0.6, SE = 0.2, p = .02) and girls compared to boys (β = −3.3, SE = 1.6, p = .04); greater baseline to follow-up interval was weakly associated with lower IBD symptoms (β = −0.1, SE = 0.03, p < .01), lower fatigue (β = −0.2, SE = 0.1, p < .01), higher global health (β = 0.1, SE = 0.03, p = .04), and higher life satisfaction (β = 0.1, SE = 0.03, p < .01).
After establishing the MCID estimates using change in self-assessed disease severity as a clinical anchor, we examined PROM score changes for the full sample of COMBINE clinical trial participants with follow-up data (n = 234) and for the subset that showed improvement in disease severity between baseline and follow-up (n = 38), as defined by a decrease of at least 15 points on the sPCDAI.30 Overall, group average scores for patients in the COMBINE clinical trial decreased by 2.3 points on IBD Symptoms, 4.5 on Fatigue, and 5.2 on Pain Interference. Hence, the clinical trial sample overall did not surpass the MCID thresholds for meaningful improvement on the PROMs. In addition, group average change was less than one-half the baseline standard deviation for each measure (see Table 4). In contrast, the clinically improved subgroup surpassed both the MCID and SD thresholds for improvement on each PROM. The average score changes in this subgroup were a decrease of 5.9 on IBD Symptoms, 9.2 on Fatigue, and 12.7 on Pain Interference.
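The group-level comparisons in this paragraph amount to simple arithmetic, sketched below with the improvement MCIDs and one-half SD values from Table 4 and the mean changes reported above. The variable and function names are hypothetical, and reading "surpass" as strictly exceeding the threshold is an assumption.

```python
# Thresholds from Table 4 (negative MCID = decrease in symptom score).
MCID_IMPROVEMENT = {"IBD Symptoms": -5.6, "Fatigue": -8.7, "Pain Interference": -5.2}
HALF_SD = {"IBD Symptoms": 3.3, "Fatigue": 7.6, "Pain Interference": 7.2}

# Mean score changes reported in the text (follow-up minus baseline).
MEAN_CHANGE = {
    "full sample (n = 234)": {
        "IBD Symptoms": -2.3, "Fatigue": -4.5, "Pain Interference": -5.2},
    "clinically improved subgroup (n = 38)": {
        "IBD Symptoms": -5.9, "Fatigue": -9.2, "Pain Interference": -12.7},
}


def surpasses_thresholds(prom, change):
    """Check whether a mean decrease goes beyond the MCID and half-SD thresholds."""
    return {
        "MCID": change < MCID_IMPROVEMENT[prom],  # strictly larger decrease
        "0.5 SD": -change > HALF_SD[prom],
    }


for group, changes in MEAN_CHANGE.items():
    for prom, change in changes.items():
        print(group, prom, surpasses_thresholds(prom, change))
```

Under these assumptions the full sample clears neither threshold on any measure, whereas the clinically improved subgroup clears both on all three, matching the interpretation given in the text.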
When identifying responders, the RCI provided a slightly stricter threshold than the MCID and ½ SD thresholds (Table 5 and Appendix C). Percentages of improved patients were higher than percentages of worsened patients on each PROM. These differences were particularly pronounced for the clinically improved subgroup, with most estimates indicating over 40% of patients improved and less than 10% of patients worsened on each PROM.
Table 5.
Responder Status in the COMBINE Clinical Trial
| Responder Status | |||||||
|---|---|---|---|---|---|---|---|
| | | Participants that improved on IBD activity index* (n = 38) | | | Participants that did not improve on IBD activity index* (n = 105) | | |
| PROM | Threshold applied | n (%) Improved | n (%) No Change | n (%) Worsened | n (%) Improved | n (%) No Change | n (%) Worsened |
| IBD Symptoms | RCI | 15 (39%) | 23 (61%) | 0 (0%) | 15 (14%) | 79 (77%) | 9 (9%) |
| | MCID | 17 (45%) | 19 (50%) | 2 (5%) | 26 (25%) | 62 (60%) | 15 (15%) |
| | 0.5 SD | 22 (58%) | 13 (34%) | 3 (8%) | 41 (40%) | 37 (36%) | 25 (24%) |
| Fatigue | RCI | 16 (42%) | 19 (50%) | 3 (8%) | 35 (34%) | 53 (51%) | 16 (15%) |
| | MCID | 18 (47%) | 14 (37%) | 6 (16%) | 38 (36%) | 38 (36%) | 28 (27%) |
| | 0.5 SD | 18 (47%) | 17 (45%) | 3 (8%) | 42 (40%) | 44 (42%) | 18 (17%) |
| Pain Interference | RCI | 23 (62%) | 12 (33%) | 2 (5%) | 38 (36%) | 56 (53%) | 11 (11%) |
| | MCID | 24 (65%) | 10 (27%) | 3 (8%) | 42 (40%) | 47 (45%) | 16 (15%) |
| | 0.5 SD | 24 (65%) | 11 (30%) | 2 (5%) | 36 (34%) | 57 (54%) | 12 (12%) |
* Improvement is defined as a decrease of 15 points or more on sPCDAI between baseline and follow-up.
Notes. This table includes data from n = 143 participants with sPCDAI measurements collected within 30 days of baseline and follow-up PROM completion. RCI = reliable change index. MCID = minimal clinically important difference. SD = standard deviation. For each PROM, an “improved” participant is one whose score change surpassed the specified threshold for improvement; a “worsened” participant is one whose change surpassed the specified threshold for worsening; “no change” includes any change that fell in between the improvement and worsening thresholds.
Discussion
PROMs are increasingly used in clinical care and research to assess health outcomes from the perspectives of patients themselves. This study demonstrated approaches for interpreting PROM scores, cross-sectionally and longitudinally. In cross-sectional analyses, we applied general population and disease-specific percentiles to aid interpretation of PROMs. The disease-specific percentiles showed that scores in the IBD standard sample were more severe than in the general population for Pain Interference, Fatigue, Psychological Stress, and Global Health, but not Life Satisfaction. Consistent with prior results from children with chronic illness,35 these findings indicate that children with IBD have similar levels of satisfaction with their lives as children in the general population despite experiencing a greater burden of symptoms and lower general health status. The percentage of participants in the clinical trial sample with baseline fatigue or pain interference scores greater than the 95th percentile of the reference IBD sample was about twice as high, and the percentage with fatigue or pain interference scores greater than the 95th percentile of the general population was more than four times as high. Both interpretations are useful, and we suggest that additional disease-specific percentiles be developed for other health conditions.
Consistent with prior studies of health-related PROMs,11,36,37 estimates of clinically important change were approximately one-half standard deviation unit, ranging in score differences from 3 to 9 across the six PROMs in the study. Using the longitudinal data currently available from the COMBINE clinical trial (prior to unmasking), the clinical anchor-based MCID and one-half SD thresholds indicated that the group of patients that improved in disease activity also showed meaningful improvement in IBD symptoms, fatigue, and pain interference. The anchor-based MCID, one-half SD, and RCI approaches showed that the percentages of improved patients were higher than the percentages of worsened patients on each PROM.
Strengths and Limitations of PROM Interpretation Methods
Each approach for interpreting PROMs has advantages and disadvantages, as prior reviews have described.8,38–41 Few prior studies have applied percentiles to the interpretation of PROMs. Substantial work is required to develop nationally representative or disease-specific percentiles based on large sample sizes. Nevertheless, this approach is highly valuable to support the use of PROMs in clinical settings because clinicians and clinical researchers are familiar with the interpretation of percentiles for many quantitative test results which have normal ranges and abnormal thresholds.2
Clinical anchor-based approaches associate PROM score differences with an external measure of clinically important change. Once estimated, the MCID threshold offers a simple approach to evaluate longitudinal differences in scores. As noted in prior reviews, however, this approach has a number of weaknesses.8,39,40,42 MCID analyses must be conducted for the selected PROMs in the clinical population of interest, and the results are affected by the size and demographic and clinical characteristics of the study sample. MCID estimates also depend on a number of decisions that must be made based on the investigators’ judgment, including the choice of clinical anchors and covariates, the definition of minimally important change on the anchors, and the strength of association between an anchor and PROM that justifies calculation of a MCID. Determining an appropriate anchor measure for each PROM can be difficult because PROMs are most useful when they assess outcomes that are not highly correlated with standard clinical metrics. It is possible to observe meaningful change on a PROM but not on a disease activity anchor measure, or vice versa. MCID thresholds such as those reported in this paper should be understood as limited by these factors and used in combination with alternative approaches when interpreting change over time.38,39,41
The standard deviation, the RCI, and other methods based on standard error of measurement43 may be calculated for any sample and any PROM without relying on an external measure. These approaches have been criticized as being unintuitive for clinicians and lacking connection to the patients’ perspective of meaningful change.33,34 In addition, investigators must consider the tradeoff between false positives (identifying an individual as changed when true change did not occur) and false negatives (failing to detect true change when it occurred) when choosing a confidence interval for the RCI. In this study, we chose a relatively narrow confidence interval (80%) to decrease the probability of false negatives. Nevertheless, the RCI was the most conservative approach for classifying improved and worsened individuals. This study illustrates that if the change observed for an individual is statistically significant (i.e., surpasses the RCI threshold), it is unlikely to be small or clinically unimportant because individual scores are associated with greater measurement error than group averages.43
Like the MCID, the RCI may be most useful when considered in combination with other approaches. In the original report of the RCI method, the authors suggested that clinically meaningful change should be supported by surpassing the RCI threshold and by evidence that an individual’s post-treatment score falls outside the range of typical scores for a “dysfunctional” population or within the range of typical scores for a “functional” (i.e., healthy) population.44 A similar definition of meaningful change on a PROMIS measure would require (1) a magnitude of change that surpasses the RCI threshold and (2) a final score that falls below the 95th general population percentile.
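The two-part definition proposed above can be sketched for a symptom measure on which higher scores are worse. The function name and the patient values are hypothetical; the 1.28 threshold is the one used in this study, and the general population 95th percentile for Fatigue (T = 60) is taken from Table 2.

```python
import math


def meaningful_improvement(score_t1, score_t2, sem_t1, sem_t2,
                           general_pop_p95, rci_threshold=1.28):
    """Require both reliable change and a final score below the 95th percentile.

    Applies to symptom measures where a lower T-score means better health.
    """
    rci = (score_t2 - score_t1) / math.sqrt(sem_t1 ** 2 + sem_t2 ** 2)
    reliable_decrease = rci < -rci_threshold
    below_cutoff = score_t2 < general_pop_p95
    return reliable_decrease and below_cutoff


FATIGUE_P95_GENERAL = 60  # US general population, Table 2

# Two hypothetical patients who both start at T = 72 with SEMs of 3 and 4.
print(meaningful_improvement(72.0, 58.0, 3.0, 4.0, FATIGUE_P95_GENERAL))
print(meaningful_improvement(72.0, 63.0, 3.0, 4.0, FATIGUE_P95_GENERAL))
```

Both hypothetical patients show reliable change, but only the first ends below the general population cut-off, so only the first meets the combined definition.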
Study Limitations
The percentiles and MCID estimates reported in this paper are based on a large group of unselected children and adolescents with IBD, but the sample is not necessarily representative of all children with IBD. In particular, the study did not include children younger than age 8 years, and the sample included a higher proportion of adolescents than children 8–12 years old. PROMIS parent proxy measures are available for children younger than 8 years old, and similar interpretation approaches may be examined for these measures in future studies. In longitudinal analyses, the study focused on changes that occurred between two timepoints. Additional approaches are needed to examine trajectories of change in PROM scores over the course of three or more repeated measurements.
Conclusions
This study demonstrated relevant approaches for interpreting group mean and individual PROM scores cross-sectionally and longitudinally, including the development of condition-specific percentiles. Approaches like these can be applied to other pediatric clinical populations, thereby advancing the use of PROMs to incorporate patients’ perspectives in clinical research. These approaches may also be impactful in clinical practice, and future research should evaluate the effects of using PROMs as tools to initiate and inform patient-provider discussions and decision-making.
What’s New.
This study establishes inflammatory bowel disease-specific percentiles for six patient-reported outcome measures (PROMs) and demonstrates relevant approaches for interpreting PROM scores, including the application of PROMIS pediatric general population percentiles.
Acknowledgments
Research reported in this publication was supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases under award numbers U01AR057956 and U19AR069525; Patient Centered Outcomes Research Institute PCS-1406–18643; Helmsley Charitable Trust 2016PG-IBD003. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
The following ImproveCareNow and COMBINE authors participated in data collection and reviewed the manuscript: Jeremy Adler; Rana F. Ammoury; Keith Benkov; Brendan Boyle; José M. Cabrera; Jennifer L. Clegg; Jill M. Dorsey; Dawn R. Ebach; Lina M. Felipez; Ann M. Firestine; Arieda Gjikopulli; Ajay S. Gulati; Edward J. Hoffenberg; Traci W. Jester; Jess L. Kaplan; Mark E. Kusek; Dale Y. Lee; Tiffany M. Linville; Peter Margolis; Phillip Minar; Zarela Molle Rios; Jonathan Moses; B. Joanna Niklinska-Schirtz; Helen Pappa; Dinesh S. Pashankar; Shehzad A. Saeed; Charles M. Samson; Kelly C. Sandberg; Steven J. Steiner; Jillian S. Sullivan; Jeanne Tung; Prateek Wali.
Funding/Support
Research reported in this publication was supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) (grant numbers U01AR057956 to Forrest and U19AR069525 to Forrest); Patient Centered Outcomes Research Institute (grant number PCS-1406–18643 to Kappelman); Helmsley Charitable Trust (grant number 2016PG-IBD003 to Kappelman).
Role of Funder/Sponsor
The sponsors had no role in the design and conduct of the study.
Abbreviations:
- COMBINE: Low Dose Oral Methotrexate in Pediatric Crohn’s Disease Patients Initiating Anti-Tumor Necrosis Factor Therapy trial
- IBD: Inflammatory bowel disease
- MCID: Minimal clinically important difference
- PROM: Patient-reported outcome measure
- PROMIS: Patient Reported Outcomes Measurement Information System
- RCI: Reliable change index
- SEM: Standard error of measurement
- sPCDAI: short Pediatric Crohn’s Disease Activity Index
Appendix A
Pediatric Inflammatory Bowel Disease (IBD) Symptoms Scale
The Pediatric Inflammatory Bowel Disease (IBD) Symptoms Scale was developed using participants’ baseline questionnaire responses in the IBD standard sample. The measure is a 4-item scale that assesses self-reported IBD symptoms in the past 7 days. The Item Response Theory (IRT) assumptions of unidimensionality and local independence were evaluated by fitting a single factor confirmatory factor analysis model (CFA) to identify the optimal set of items that represented the construct of IBD symptoms. The combination of items listed in Appendix Table 1 fulfilled these assumptions and produced the most robust CFA model fit indices (CFI=1.00, TLI=0.99, RMSEA=0.014). The scale demonstrates acceptable internal consistency with a Cronbach’s alpha of 0.74. Samejima’s Graded Response Model (GRM) was used to estimate each item’s discrimination (a) and location (b1 – b4) parameters. Discrimination parameters (a) indicate the degree to which items differentiate respondents with varying levels of IBD symptoms. Location parameters (b1 – b4) indicate the point along the IBD symptoms continuum at which an item has the greatest measurement precision. Known-groups validity of the Pediatric IBD Symptoms Scale was examined using self-reported, physician-reported, and clinical indices of disease activity (Appendix Table 2).
Appendix Table 1.
Pediatric IBD Symptoms Items with Item Response Theory Parameters and Confirmatory Factor Analysis Loadings
| Item | a | b1 | b2 | b3 | b4 | CFA Loading |
|---|---|---|---|---|---|---|
| In the past 7 days, my poop was loose or watery | 2.59 | −0.04 | 0.79 | 1.46 | 2.04 | 0.83 |
| In the past 7 days, I rushed to the bathroom to avoid an accident | 2.38 | 0.55 | 1.21 | 1.91 | 2.62 | 0.82 |
| In the past 7 days, I had blood in my poop | 1.61 | 1.02 | 1.75 | 2.37 | 3.09 | 0.65 |
| In the past 7 days, I had a stomachache | 1.4 | −0.27 | 0.89 | 1.89 | 3.44 | 0.62 |
Note. Each of the four items is scored as 1 (Never), 2 (Rarely), 3 (Sometimes), 4 (Often), and 5 (Always).
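To illustrate how the discrimination and location parameters in Appendix Table 1 translate into response probabilities, the following sketch evaluates the graded response model for the first item. It assumes the logistic form of the GRM without a scaling constant and treats theta as a standardized latent IBD symptoms score; the exact parameterization used by the estimation software is not stated here, so the output should be read as illustrative. The function name is hypothetical.

```python
from math import exp


def grm_category_probs(theta: float, a: float, b: list[float]) -> list[float]:
    """Probabilities of each ordered response category under the GRM.

    P(X >= k) = 1 / (1 + exp(-a * (theta - b_k))) for each threshold b_k;
    category probabilities are differences of adjacent cumulative curves.
    """
    cumulative = [1.0] + [1.0 / (1.0 + exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(b) + 1)]


# Item "my poop was loose or watery" from Appendix Table 1.
a, b = 2.59, [-0.04, 0.79, 1.46, 2.04]
labels = ["Never", "Rarely", "Sometimes", "Often", "Always"]

for theta in (-1.0, 0.0, 1.0, 2.0):
    probs = grm_category_probs(theta, a, b)
    summary = ", ".join(f"{lab}={p:.2f}" for lab, p in zip(labels, probs))
    print(f"theta={theta:+.1f}: {summary}")
```

Because the item's location parameters are ordered, the cumulative curves do not cross and every category probability is non-negative. A larger discrimination parameter makes the transitions between adjacent categories steeper.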
Appendix Table 2.
IBD Symptoms by Disease Activity
| | n | IBD Symptoms T-score mean (SD) |
|---|---|---|
| All patients | 1,005* | 50 (8) |
| Physician Global Assessment | | |
| Inactive | 573 | 47 (7) |
| Mild | 157 | 54 (8) |
| Moderate/Severe | 77 | 60 (9) |
| Child/Youth Past 7 days IBD Was a Problem | | |
| Never | 433 | 45 (6) |
| Rarely | 256 | 50 (6) |
| Sometimes | 179 | 55 (7) |
| Often or Always | 102 | 62 (7) |
| Crohn’s Disease Patients | | |
| All Crohn’s disease patients | 696 | 50 (8) |
| Inactive (sPCDAI<15) | 304 | 47 (7) |
| Mild (sPCDAI 15–25) | 67 | 54 (7) |
| Mod-Severe (sPCDAI>25) | 51 | 59 (8) |
| Ulcerative Colitis Patients | | |
| All ulcerative colitis patients | 215 | 50 (9) |
| Inactive (PUCAI<10) | 128 | 47 (7) |
| Mild (PUCAI 10–30) | 34 | 54 (8) |
| Mod-Severe (PUCAI>30) | 23 | 66 (6) |
Note. The Pediatric IBD Symptoms Scale Score is on the T-scale (M=50, SD=10), with higher scores corresponding to greater symptoms. sPCDAI = short Pediatric Crohn’s Disease Activity Index. PUCAI = Pediatric Ulcerative Colitis Activity Index. Disease activity data were included if the assessment occurred within 30 days of the study baseline questionnaire.
*n = 44 participants in the study did not complete the IBD Symptoms Scale.
Appendix B
| Percentile | IBD Symptoms | Fatigue | Pain Interference | Psychological Stress | Global Health | Life Satisfaction |
|---|---|---|---|---|---|---|
| 1 | 40 | 25 | 31 | 41 | 26 | 28 |
| 2 | 40 | 25 | 31 | 41 | 27 | 30 |
| 3 | 40 | 25 | 31 | 41 | 28 | 31 |
| 4 | 40 | 25 | 31 | 41 | 29 | 32 |
| 5 | 40 | 25 | 31 | 41 | 30 | 33 |
| 6 | 40 | 25 | 31 | 41 | 31 | 34 |
| 7 | 40 | 25 | 31 | 41 | 31 | 35 |
| 8 | 40 | 25 | 31 | 41 | 32 | 36 |
| 9 | 40 | 25 | 31 | 41 | 32 | 36 |
| 10 | 40 | 25 | 31 | 41 | 33 | 37 |
| 11 | 40 | 25 | 31 | 41 | 33 | 37 |
| 12 | 40 | 25 | 31 | 41 | 33 | 37 |
| 13 | 40 | 25 | 31 | 41 | 34 | 38 |
| 14 | 40 | 25 | 31 | 41 | 34 | 38 |
| 15 | 40 | 25 | 31 | 41 | 35 | 39 |
| 16 | 40 | 25 | 31 | 41 | 35 | 39 |
| 17 | 40 | 25 | 31 | 41 | 36 | 40 |
| 18 | 40 | 25 | 31 | 46 | 36 | 41 |
| 19 | 40 | 25 | 31 | 46 | 37 | 41 |
| 20 | 40 | 25 | 31 | 46 | 37 | 42 |
| 21 | 40 | 25 | 31 | 46 | 37 | 42 |
| 22 | 40 | 25 | 31 | 46 | 37 | 43 |
| 23 | 40 | 25 | 31 | 46 | 37 | 44 |
| 24 | 40 | 35 | 31 | 46 | 37 | 44 |
| 25 | 40 | 35 | 31 | 47 | 37 | 44 |
| 26 | 42 | 35 | 31 | 48 | 38 | 44 |
| 27 | 45 | 35 | 31 | 48 | 38 | 44 |
| 28 | 45 | 36 | 31 | 48 | 38 | 45 |
| 29 | 45 | 36 | 31 | 50 | 39 | 45 |
| 30 | 45 | 36 | 31 | 50 | 39 | 45 |
| 31 | 45 | 37 | 31 | 50 | 39 | 45 |
| 32 | 45 | 37 | 31 | 50 | 40 | 45 |
| 33 | 45 | 37 | 31 | 50 | 40 | 45 |
| 34 | 45 | 38 | 31 | 50 | 40 | 45 |
| 35 | 45 | 39 | 31 | 51 | 40 | 45 |
| 36 | 46 | 41 | 31 | 52 | 40 | 45 |
| 37 | 46 | 41 | 31 | 52 | 41 | 45 |
| 38 | 46 | 41 | 31 | 52 | 41 | 45 |
| 39 | 46 | 42 | 31 | 52 | 41 | 45 |
| 40 | 46 | 42 | 39 | 53 | 41 | 46 |
| 41 | 47 | 43 | 39 | 53 | 41 | 46 |
| 42 | 48 | 44 | 39 | 53 | 42 | 46 |
| 43 | 48 | 45 | 40 | 54 | 42 | 47 |
| 44 | 48 | 45 | 41 | 54 | 42 | 47 |
| 45 | 48 | 46 | 41 | 54 | 42 | 47 |
| 46 | 48 | 46 | 41 | 54 | 43 | 47 |
| 47 | 48 | 47 | 41 | 54 | 43 | 48 |
| 48 | 49 | 47 | 41 | 54 | 43 | 48 |
| 49 | 50 | 47 | 43 | 55 | 43 | 48 |
| 50 | 50 | 48 | 43 | 55 | 43 | 49 |
| 51 | 50 | 48 | 44 | 55 | 44 | 49 |
| 52 | 50 | 48 | 45 | 55 | 44 | 51 |
| 53 | 50 | 49 | 45 | 56 | 44 | 51 |
| 54 | 50 | 49 | 45 | 56 | 44 | 51 |
| 55 | 50 | 49 | 45 | 56 | 45 | 51 |
| 56 | 51 | 49 | 46 | 56 | 45 | 51 |
| 57 | 51 | 49 | 47 | 57 | 45 | 53 |
| 58 | 51 | 51 | 47 | 57 | 46 | 53 |
| 59 | 51 | 51 | 47 | 57 | 46 | 53 |
| 60 | 52 | 51 | 48 | 57 | 46 | 53 |
| 61 | 52 | 51 | 48 | 57 | 46 | 55 |
| 62 | 52 | 52 | 49 | 57 | 46 | 55 |
| 63 | 53 | 52 | 49 | 58 | 46 | 55 |
| 64 | 53 | 52 | 50 | 58 | 47 | 55 |
| 65 | 53 | 54 | 50 | 58 | 47 | 55 |
| 66 | 54 | 54 | 50 | 59 | 47 | 55 |
| 67 | 54 | 54 | 50 | 59 | 48 | 55 |
| 68 | 54 | 54 | 51 | 60 | 48 | 55 |
| 69 | 54 | 54 | 51 | 60 | 48 | 61 |
| 70 | 55 | 55 | 52 | 60 | 48 | 61 |
| 71 | 55 | 56 | 52 | 60 | 48 | 61 |
| 72 | 55 | 57 | 52 | 61 | 49 | 61 |
| 73 | 56 | 57 | 53 | 62 | 49 | 61 |
| 74 | 56 | 57 | 53 | 62 | 49 | 61 |
| 75 | 56 | 57 | 54 | 62 | 50 | 61 |
| 76 | 57 | 58 | 54 | 62 | 50 | 61 |
| 77 | 57 | 59 | 55 | 62 | 50 | 61 |
| 78 | 57 | 60 | 55 | 62 | 50 | 61 |
| 79 | 57 | 60 | 55 | 62 | 51 | 61 |
| 80 | 58 | 60 | 56 | 64 | 51 | 61 |
| 81 | 58 | 60 | 56 | 64 | 51 | 61 |
| 82 | 58 | 62 | 57 | 64 | 51 | 61 |
| 83 | 59 | 62 | 57 | 64 | 51 | 61 |
| 84 | 59 | 63 | 57 | 64 | 52 | 61 |
| 85 | 59 | 63 | 58 | 66 | 52 | 61 |
| 86 | 60 | 63 | 58 | 66 | 53 | 61 |
| 87 | 60 | 65 | 58 | 66 | 53 | 61 |
| 88 | 61 | 65 | 59 | 66 | 53 | 61 |
| 89 | 61 | 65 | 60 | 67 | 54 | 61 |
| 90 | 61 | 67 | 60 | 67 | 54 | 61 |
| 91 | 62 | 68 | 61 | 67 | 55 | 61 |
| 92 | 62 | 68 | 61 | 67 | 55 | 61 |
| 93 | 63 | 68 | 62 | 69 | 56 | 61 |
| 94 | 64 | 70 | 63 | 69 | 57 | 61 |
| 95 | 65 | 70 | 64 | 71 | 58 | 61 |
| 96 | 66 | 72 | 66 | 71 | 59 | 61 |
| 97 | 68 | 74 | 67 | 73 | 60 | 61 |
| 98 | 69 | 76 | 69 | 75 | 62 | 61 |
| 99 | 71 | 81 | 72 | 80 | 62 | 61 |
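A percentile table of this kind can be applied to an individual score with a simple lookup. The sketch below uses only a few rows of the Fatigue column from the table above, so it returns a coarse lower bound, not an exact percentile rank. The function names and the choice of rows are illustrative assumptions.

```python
from bisect import bisect_right

# Selected rows from the Fatigue column of the table above: (percentile, T-score).
FATIGUE_ROWS = [(5, 25), (25, 35), (50, 48), (75, 57), (90, 67), (95, 70), (99, 81)]


def highest_percentile_reached(t_score, rows=FATIGUE_ROWS):
    """Return the highest listed percentile whose T-score is <= t_score.

    Returns None when the score is below every listed T-score. Only the rows
    in `rows` are considered, so the answer is a coarse lower bound.
    """
    scores = [score for _, score in rows]
    idx = bisect_right(scores, t_score)
    return rows[idx - 1][0] if idx else None


def at_or_above_p95(t_score, rows=FATIGUE_ROWS):
    """Flag scores at or above the T-score listed for the 95th percentile."""
    p95_score = dict(rows)[95]
    return t_score >= p95_score


for t in (20, 48, 60, 70):
    print(t, highest_percentile_reached(t), at_or_above_p95(t))
```

Many adjacent percentiles share a T-score (for example, the Fatigue T-score of 25 spans the 1st through 23rd percentiles), so a complete implementation would need an explicit rule for ties. For Global Health and Life Satisfaction, where higher scores indicate better health, a cut-point would instead be applied at the low end of the distribution.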
Appendix C
Appendix Figure 1.

Distributions of T-scores corresponding to the 1st through 99th percentiles for the US general population and for the IBD standard sample.
Appendix Table 3.
Responder Status in the IBD Standard Sample (n = 490 with follow-up data) and the COMBINE Clinical Trial Sample (n = 234 with follow-up data)
| PROM | Threshold applied | IBD Standard: n (%) Improved | IBD Standard: n (%) No Change | IBD Standard: n (%) Worsened | COMBINE: n (%) Improved | COMBINE: n (%) No Change | COMBINE: n (%) Worsened |
|---|---|---|---|---|---|---|---|
| IBD Symptoms | RCI | 47 (11%) | 339 (77%) | 51 (12%) | 46 (20%) | 166 (72%) | 19 (8%) |
| IBD Symptoms | MCID | 81 (19%) | 245 (56%) | 111 (25%) | 70 (30%) | 128 (56%) | 33 (14%) |
| IBD Symptoms | 0.5 SD | 102 (23%) | 202 (46%) | 133 (31%) | 98 (42%) | 81 (35%) | 52 (23%) |
| Fatigue | RCI | 106 (22%) | 296 (62%) | 75 (16%) | 77 (33%) | 124 (53%) | 32 (14%) |
| Fatigue | MCID | 135 (28%) | 195 (41%) | 147 (31%) | 82 (35%) | 94 (41%) | 57 (24%) |
| Fatigue | 0.5 SD | 143 (30%) | 223 (47%) | 111 (23%) | 91 (39%) | 103 (44%) | 39 (17%) |
| Pain Interference | RCI | 87 (18%) | 303 (64%) | 85 (18%) | 90 (39%) | 118 (50%) | 25 (11%) |
| Pain Interference | MCID | 124 (26%) | 210 (44%) | 141 (30%) | 96 (41%) | 93 (40%) | 44 (19%) |
| Pain Interference | 0.5 SD | 117 (25%) | 235 (49%) | 123 (26%) | 86 (37%) | 114 (49%) | 33 (14%) |
Note. RCI = reliable change index. MCID = minimal clinically important difference estimated using data from the IBD standard sample. SD = standard deviation calculated using the baseline data from each respective sample. For each PROM, an “improved” participant is one whose score change surpassed the specified threshold for improvement; a “worsened” participant is one whose change surpassed the specified threshold for worsening; “no change” includes any change that fell in between the improvement and worsening thresholds.
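The three-way responder classification described in the note can be sketched for the 0.5 SD threshold as follows. The scores are invented for illustration, the function name is hypothetical, and improvement is assumed to correspond to a decrease in score, as for the symptom measures in the table. Whether the original analysis used the sample or population SD formula is not stated; this sketch uses the sample SD of the baseline scores.

```python
from statistics import stdev


def tally_responders(baseline, follow_up, sd_fraction=0.5):
    """Count improved / no change / worsened using a fraction of the baseline SD."""
    threshold = sd_fraction * stdev(baseline)  # sample SD of baseline scores
    counts = {"improved": 0, "no change": 0, "worsened": 0}
    for before, after in zip(baseline, follow_up):
        change = after - before
        if change <= -threshold:
            counts["improved"] += 1   # score fell by at least the threshold
        elif change >= threshold:
            counts["worsened"] += 1   # score rose by at least the threshold
        else:
            counts["no change"] += 1
    return threshold, counts


# Invented T-scores for six patients (not study data).
baseline = [40, 45, 50, 55, 60, 65]
follow_up = [41, 38, 50, 49, 66, 58]

threshold, counts = tally_responders(baseline, follow_up)
n = len(baseline)
print(f"threshold = {threshold:.2f} T-score points")
for label, count in counts.items():
    print(f"{label}: {count} ({count / n:.0%})")
```

The same function could be used with an RCI or MCID threshold by replacing the computed value with a fixed number of T-score points.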
Footnotes
Declarations of Competing Interest
The authors have no conflicts of interest relevant to this article to disclose.
References
- 1. Broderick JE, DeWitt EM, Rothrock N, Crane PK, Forrest CB. Advances in patient-reported outcomes: the NIH PROMIS measures. EGEMS. 2013;1(1):1015.
- 2. Jensen RE, Bjorner JB. Applying PRO reference values to communicate clinically relevant information at the point-of-care. Med Care. 2019;57:S24–S30.
- 3. Centers for Disease Control and Prevention. About child & teen BMI. Page last reviewed: March 17, 2021. Accessed June 12, 2021. https://www.cdc.gov/healthyweight/assessing/bmi/childrens_bmi/about_childrens_bmi.html.
- 4. Hays RD, Farivar SS, Liu H. Approaches and recommendations for estimating minimally important differences for health-related quality of life measures. COPD. 2005;2:63–7.
- 5. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care. 1989:S178–S89.
- 6. Lydick E, Epstein R. Interpretation of quality of life changes. Qual Life Res. 1993;2:221–226.
- 7. Wyrwich KW, Bullinger M, Aaronson N, et al. Estimating clinically significant differences in quality of life outcomes. Qual Life Res. 2005;14:285–95.
- 8. Copay AG, Subach BR, Glassman SD, et al. Understanding the minimum clinically important difference: a review of concepts and methods. Spine J. 2007;7:541–6.
- 9. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Control Clin Trials. 1989;10:407–15.
- 10. McLeod LD, Coon CD, Martin SA, et al. Interpreting patient-reported outcome results: US FDA guidance and emerging methods. Expert Rev Pharmacoecon Outcomes Res. 2011;11:163–169.
- 11. Norman GR, Sloan JA, Wyrwich KW. Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care. 2003;582–92.
- 12. Jabrayilov R, Emons WHM, Sijtsma K. Comparison of classical test theory and item response theory in individual change assessment. Appl Psychol Meas. 2016;40:559–72.
- 13. Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Med Care. 2007;45:S3–S11.
- 14. Forrest CB, Bevans KB, Tucker C, et al. Commentary: the patient-reported outcome measurement information system (PROMIS) for children and youth: application to pediatric psychology. J Pediatr Psychol. 2012;37:614–21.
- 15. Selewski DT, Troost JP, Cummings D, et al. Responsiveness of the PROMIS measures to changes in disease status among pediatric nephrotic syndrome patients: a Midwest pediatric nephrology consortium study. Health Qual Life Outcomes. 2017;15(1):1–10.
- 16. Dampier C, Barry V, Gross HE, et al. Initial evaluation of the pediatric PROMIS health domains in children and adolescents with sickle cell disease. Pediatr Blood Cancer. 2016;63(6):1031–1037.
- 17. DeWalt DA, Gross HE, Gipson DS, et al. PROMIS pediatric self-report scales distinguish subgroups of children within and across six common pediatric chronic health conditions. Qual Life Res. 2015;24(9):2195–2208.
- 18. Bock RD, Mislevy RJ. Adaptive EAP estimation of ability in a microcomputer environment. Appl Psychol Meas. 1982;6:431–44.
- 19. Carle AC, Bevans KB, Tucker CA, Forrest CB. Using nationally representative percentiles to interpret PROMIS pediatric measures. Qual Life Res. 2020:1–8.
- 20. Crandall W, Kappelman MD, Colletti RB, et al. ImproveCareNow: The development of a pediatric inflammatory bowel disease improvement network. Inflamm Bowel Dis. 2011;17:450–7.
- 21. Forrest CB, Bevans KB, Pratiwadi R, et al. Development of the PROMIS pediatric global health (PGH-7) measure. Qual Life Res. 2014;23:1221–31.
- 22. Varni JW, Stucky BD, Thissen D, et al. PROMIS Pediatric Pain Interference Scale: an item response theory analysis of the pediatric pain item bank. J Pain. 2010;11:1109–19.
- 23. Lai JS, Stucky BD, Thissen D, et al. Development and psychometric properties of the PROMIS pediatric fatigue item banks. Qual Life Res. 2013;22:2417–27.
- 24. Forrest CB, Devine J, Bevans KB, et al. Development and psychometric evaluation of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions. Qual Life Res. 2018;27:217–34.
- 25. Bevans KB, Gardner W, Pajer K, et al. Qualitative development of the PROMIS pediatric stress response item banks. J Pediatr Psychol. 2013;38:173–91.
- 26. Muthén LK, Muthén BO. Mplus User’s Guide. Los Angeles, CA: Muthén & Muthén; 1998–2017.
- 27. Colletti RB, Baldassano RN, Milov DE, et al. Variation in care in pediatric Crohn disease. J Pediatr Gastroenterol Nutr. 2009;49:297–303.
- 28. Kappelman MD, Crandall WV, Colletti RB, et al. Short pediatric Crohn’s disease activity index for quality improvement and observational research. Inflamm Bowel Dis. 2011;17:112–7.
- 29. StataCorp. Stata Statistical Software: Release 15. College Station, TX: StataCorp LLC. 2017.
- 30. Bates D, Sarkar D, Bates MD, Matrix L. The lme4 package. R package version. 2007;2:74.
- 31. Team RC. R: A language and environment for statistical computing. 2013.
- 32. Food and Drug Administration. Patient-focused drug development guidance public workshop: incorporating clinical outcome assessments into endpoints for regulatory decision-making. 2019.
- 33. Turner D, Schunemann HJ, Griffith LE, et al. The minimal detectable change cannot reliably replace the minimal important difference. J Clin Epidemiol. 2010;63:28–36.
- 34. Coon CD, Cook KF. Moving from significance to real-world meaning: methods for interpreting change in clinical outcome assessment scores. Qual Life Res. 2018;27:33–40.
- 35. Blackwell CK, Elliott AJ, Ganiban J, et al. General health and life satisfaction in children with chronic illness. Pediatrics. 2019;143(6).
- 36. Kappelman MD, Long MD, Martin C, et al. Evaluation of the patient-reported outcomes measurement information system in a large cohort of patients with inflammatory bowel diseases. Clin Gastroenterol Hepatol. 2014;12(8):1315–1323.
- 37. Arvanitis M, DeWalt DA, Martin CF, et al. Patient-reported outcomes measurement information system in children with Crohn’s disease. J Pediatr. 2016;174:153–159.
- 38. Mouelhi Y, Jouve E, Castelli C, Gentile S. How is the minimal clinically important difference established in health-related quality of life instruments? Review of anchors and methods. Health Qual Life Outcomes. 2020;18:1–17.
- 39. Crosby RD, Kolotkin RL, Williams GR. Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol. 2003;56(5):395–407.
- 40. Hays RD, Woolley JM. The concept of clinically meaningful difference in health-related quality-of-life research. Pharmacoeconomics. 2000;18(5):419–423.
- 41. Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008;61(2):102–109.
- 42. Sedaghat AR. Understanding the minimal clinically important difference (MCID) of patient-reported outcome measures. Otolaryngol Head Neck Surg. 2019;161(4):551–560.
- 43. Hays RD, Brodsky M, Johnston MF, et al. Evaluating the statistical significance of health-related quality-of-life change in individual patients. Eval Health Prof. 2005;28:160–71.
- 44. Jacobson NS, Truax P. Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol. 1991;59(1):12–19.
