Pain Medicine: The Official Journal of the American Academy of Pain Medicine. 2015 Dec 22;17(2):314–324. doi: 10.1093/pm/pnv046

Performance of a Patient Reported Outcomes Measurement Information System (PROMIS) Short Form in Older Adults with Chronic Musculoskeletal Pain

Richard A Deyo, Katrina Ramsey, David I Buckley, LeAnn Michaels, Amy Kobus, Elizabeth Eckstrom, Vanessa Forro, Cynthia Morris
PMCID: PMC6281027  PMID: 26814279

Abstract

Objective. To assess reliability, validity, and responsiveness of a 29-item short-form version of the Patient Reported Outcomes Measurement Information System (PROMIS) and a novel “impact score” calculated from those measures.

Design. Prospective cohort study.

Setting. Rural primary care practices.

Subjects. Adults aged ≥ 55 years with chronic musculoskeletal pain, not currently receiving prescription opioids.

Methods. Subjects completed the PROMIS short form at baseline and after 3 months. Patient subsets were compared to assess reliability and responsiveness. Construct validity was tested by comparing baseline scores among patients who were or were not applying for worker's compensation; those with higher or lower catastrophizing scores; and those with or without recent falls. Responsiveness was assessed with mean score changes, effect sizes, and standardized response means.

Results. Internal consistency was good to excellent, with Cronbach's alpha between 0.81 and 0.95 for all scales. Among patients who rated their pain as stable, test-retest scores at 3 months were around 0.70 for most scales. PROMIS scores were worse among patients seeking or receiving worker's compensation, those with high catastrophizing scores, and those with recent falls. Among patients rating pain as “much less” at 3 months, absolute effect sizes for the various scales ranged from 0.24 (Depression) to 1.93 (Pain Intensity).

Conclusions. Results indicate that the PROMIS short 29-item form may be useful for the study of patients with chronic musculoskeletal pain. Our findings also support use of the novel “impact score” recommended by the National Institutes of Health (NIH) Task Force on Research Standards for Chronic Low Back Pain.

Keywords: Chronic Pain, Musculoskeletal, Measurement, Pain Medicine

Introduction

Measuring outcomes for patients with chronic pain conditions remains challenging. Clear and objective outcomes such as death or complete cure are relatively rare. Self-reported outcomes such as pain and physical function scores are widely used, and there are several familiar instruments for assessing these outcomes [ 1 , 2 ]. However, investigators frequently want to measure a wide range of other relevant and patient-centered outcomes, such as mood, sleep quality, social interactions, and others. An important measurement challenge has been the tradeoff between instrument breadth and length—seeking reliable, valid, and responsive measures of multiple constructs while minimizing respondent burden.

In recent years, the National Institutes of Health (NIH) have supported development of the Patient Reported Outcomes Measurement Information System (PROMIS) to assess self-reported outcomes in several domains relevant for multiple clinical conditions [ 3–6 ]. This effort began by identifying items from well-validated “legacy” measures, adapting them into large item banks, and modifying response options into standardized formats. Based on Item Response Theory, the items in each domain were ranked by difficulty or severity, using responses from large patient or general population samples. The aim was to create computer adaptive tests that were brief, yet would measure various outcomes precisely over a wide range of impairments. Computer adaptive tests adjust the sequential items asked according to previous responses, so an estimate of pain or function, for example, can be reached with the fewest possible questions.

In addition to the option of computer adaptive testing, PROMIS developers created several short-form versions of the measurement instruments [ 5 , 7 ]. These short forms are available in several lengths, and are intended to elicit scores that would match those obtained from computer adaptive testing. The measurement error is slightly greater with short forms [ 8 ], but they offer investigators an opportunity to use PROMIS measures in situations where Internet access is unfeasible or unavailable.

The NIH also recently sponsored a Research Task Force (RTF) to create standards for clinical research on patients with chronic low back pain [ 9 ]. The Task Force recommended use of the PROMIS measures or their short forms for characterizing a range of patient traits at baseline. The intent was that all clinical researchers assess and report a standard set of patient characteristics. For example, even surgical investigators would routinely report measures of patient mood, and even psychologist investigators would routinely report measures of physical function. If investigators are not using computer adaptive testing, the Task Force recommended use of scales from the 29-item short form, which assesses several constructs (e.g., physical function, depression, sleep disturbance) with just four items each. Pain intensity was measured with a single item on a 0-10 numerical rating scale.

The Task Force also recommended a novel “impact score” that combined results of the Pain Intensity, Physical Function, and Pain Interference PROMIS scores (nine items from the 29-item short form version) [ 9 ]. This was conceived as a means of stratifying patients aside from their anatomic or pathophysiologic characteristics, taking advantage of the prognostic and discriminatory features of pain intensity, physical function, and pain interference. Though this recommendation was made for studying chronic low back pain, the impact score may be relevant for many forms of chronic musculoskeletal pain.

There are a few reports of using the PROMIS measures for studying musculoskeletal pain or interventional pain procedures [ 10–16 ]. For example, the four-item measures of depression and anxiety showed high levels of internal consistency and strong correlations with legacy measures among patients with mixed musculoskeletal disorders [ 16 ]. The computer adaptive version of the physical function items showed high internal consistency and negligible ceiling and floor effects among back pain or orthopedic trauma patients [ 12 , 13 ]. Among patients with rheumatoid arthritis, a 20-item version of the PROMIS physical function scale was more responsive to change than two widely used legacy measures [ 15 ]. However, the PROMIS measures, especially the four-item short forms and the impact score, have not yet been widely used or examined among patients with chronic pain conditions. Thus, we have limited information about traditional psychometric performance of these measures in this population.

We examined performance of the 29-item version of the PROMIS short forms and the RTF “impact score” in a prospective cohort study of patients with chronic musculoskeletal pain. Our aims were to:

  • Assess the reliability and validity of the PROMIS short form versions of pain intensity, pain interference, physical function, depression, anxiety, fatigue, sleep disturbance, and satisfaction with social role;

  • Assess the responsiveness of these instruments to changes over time; and

  • Assess the psychometric properties of the novel “impact score” recommended by the NIH Task Force on Research Standards for Chronic Low Back Pain.

Methods

Setting

Subjects were recruited from five rural primary care practices that were members of the Oregon Rural Practice-based Research Network (ORPRN). The main purpose of the study was to track longitudinal changes in pain and neuropsychological measures according to type of pharmacologic therapy. All activities were approved by the Institutional Review Board of Oregon Health & Science University. Subjects provided written informed consent.

Subjects

We identified potential subjects from administrative records in each primary care office, using a list of ICD-9-CM diagnosis codes that included various forms of back pain, neck pain, shoulder pain, extremity pain, arthritis, or carpal tunnel syndrome. This included 214 ICD-9-CM codes that we thought would capture the vast majority of patients with musculoskeletal pain. Potential subjects were screened for inclusion by medical record review and telephone interview and then completed a written survey at baseline. Patients confirmed the presence of chronic musculoskeletal pain, and named “the pain that bothers you most.” Eligibility criteria included age ≥ 55 years, because the larger study targeted neuropsychological effects of analgesics that may be most common in older adults. We required at least two visits or contacts for musculoskeletal pain in the past year (spanning at least 3 months), and current pain intensity rated ≥ 5 on a 10-point scale. We sought patients with at least moderate pain (≥ 5 points) because of larger study goals, which depended on some fraction of subjects initiating opioid therapy, and having sufficient pain to observe important analgesic effects. Subjects were required to have received no prescription opioid therapy for at least one month prior to enrollment, as one of the broader study goals was to compare those who did or did not subsequently initiate opioid therapy. The absence of opioid use in the past month was confirmed by three methods for every patient: medical record review, patient self-report, and baseline urine screen. Subjects were also required to have a telephone, not be planning to move, and not have cognitive impairment, as assessed by the St. Louis University Mental Status Examination (SLUMS) [ 17 ] or the Telephone Interview for Cognitive Status (TICS) [ 18 , 19 ].

Exclusion factors were a known adverse reaction to opioids, life expectancy less than two years (inferred from the presence of major systemic illness such as metastatic cancer or multiple hospitalizations for chronic organ failure), current prescription opioid use or use in the past month, or inability to provide informed consent due to dementia or major psychiatric illness.

Any treatments provided to patients over the 3-month study interval were entirely at the discretion of the treating clinician, and were in no way constrained by the study.

Measures

We focused largely on self-reported pain symptoms, functional status, and potential neuropsychological adverse effects of treatment. We used the 29-item PROMIS short form to minimize respondent burden, especially in the face of repeated measurements. Measures included:

  • Demographics, pain history, work status, and disability compensation status;

  • The Pain Catastrophizing Scale (PCS) [20];

  • The PROMIS-29 profile, which includes: Pain Interference, four items; Pain Intensity, one item; Physical Function, four items; Fatigue, four items; Depression, four items; Anxiety, four items; Sleep Disturbance, four items; Satisfaction with Social Participation, four items [7];

  • At 3-month follow-up, a self-rating of pain as “much less”, “a little less”, “about the same”, “a little worse”, or “much worse”.

Each measure (except the PCS) was collected at baseline (in-person) and at 3 months. Follow-up for this project was mainly by mail. When mail contact failed, we attempted telephone contact. We provided a financial incentive of $10 for in-person contacts and $5 for telephone or mail-based contacts.

Data Analysis

Baseline characteristics and baseline scores on the outcome measures were tabulated as means with standard deviations or as percentages. We calculated the impact score as recommended by the NIH Task Force. This calls for reversing the usual scoring scale of the physical function items so that a score of one is least severe and five is most severe. The impact score was calculated as the sum of the reversed raw physical function score plus the raw pain interference score plus the pain intensity score. The resulting range of possible scores is from eight (least impact) to 50 (greatest impact) [ 9 ].
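The impact score calculation described above can be sketched as follows. This is a minimal illustration, not the Task Force's own code: the function name and argument names are hypothetical, and it assumes raw item responses coded 1 to 5 (with 5 as the best response for physical function items and the worst for pain interference items) and pain intensity coded 0 to 10.

```python
def impact_score(physical_function_items, pain_interference_items, pain_intensity):
    """Sum of reversed physical function, pain interference, and pain intensity.

    Assumed coding (not stated item by item in the text):
      physical_function_items: four raw responses, 1 (worst) to 5 (best)
      pain_interference_items: four raw responses, 1 (least) to 5 (most)
      pain_intensity: single 0-10 numerical rating
    """
    if len(physical_function_items) != 4 or len(pain_interference_items) != 4:
        raise ValueError("expected four items per four-item scale")
    for item in list(physical_function_items) + list(pain_interference_items):
        if not 1 <= item <= 5:
            raise ValueError("item responses must be between 1 and 5")
    if not 0 <= pain_intensity <= 10:
        raise ValueError("pain intensity must be between 0 and 10")

    # Reverse physical function so that 1 is least severe and 5 is most severe.
    reversed_function = sum(6 - item for item in physical_function_items)
    return reversed_function + sum(pain_interference_items) + pain_intensity


# Best and worst possible response patterns reproduce the stated 8-50 range.
least_impact = impact_score([5, 5, 5, 5], [1, 1, 1, 1], 0)
greatest_impact = impact_score([1, 1, 1, 1], [5, 5, 5, 5], 10)
```

The two boundary cases confirm the range given in the text: four reversed function items and four interference items contribute 4 to 20 points each, and pain intensity contributes 0 to 10.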

We used the recommended T-score method for reporting all PROMIS scores, allowing the use of population norms for interpretation. With this method, a score of 50 points represents the population mean for each scale, and 10 points represent one standard deviation. Higher scores always indicate more of the particular scale’s construct, which may represent a desirable outcome or an undesirable outcome. For example, higher scores for the Physical Function scale represent better function, whereas higher scores on the Depression scale indicate more depressive symptoms.
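To make the T-score metric concrete, the sketch below converts between T-scores and standard deviation units. It illustrates only the metric (mean 50, SD 10); in practice T-scores are assigned by the PROMIS scoring tools, which this sketch does not reproduce. The function names are hypothetical.

```python
POPULATION_MEAN_T = 50.0  # T-score of the reference population mean
POPULATION_SD_T = 10.0    # T-score points per population standard deviation


def t_score_from_z(z):
    """Express a standardized (z) score on the T-score metric."""
    return POPULATION_MEAN_T + POPULATION_SD_T * z


def sd_units_from_t(t_score):
    """Distance of a T-score from the population mean, in SD units."""
    return (t_score - POPULATION_MEAN_T) / POPULATION_SD_T


# Baseline means reported in Table 2 of this study.
pain_interference_sd = sd_units_from_t(60.6)  # above the mean: more interference
physical_function_sd = sd_units_from_t(40.9)  # below the mean: worse function
```

Read this way, the baseline Pain Interference mean of 60.6 lies about one standard deviation above the population mean, and the Physical Function mean of 40.9 lies a little less than one standard deviation below it.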

Reliability was assessed as both internal consistency of the items for each PROMIS measure (Cronbach's alpha) and as test-retest reliability of each PROMIS measure. In general, scores above 0.70 are considered acceptable for both types of reliability [21]. Test-retest reliability was assessed for the subset of patients who reported that their pain was about the same at 3 months compared to baseline. We also examined the subset of patients whose 3-month follow-up pain intensity scores were within one point (plus or minus) of their baseline score. We measured test-retest reliability using the intraclass correlation coefficient for agreement (ICC agreement) [21]. Unlike a simple product-moment correlation, in which scores can be highly correlated even if they are systematically different, the ICC considers not only the strength of correlation, but whether the slope and intercept differ from those expected with replicate measures. Values of the ICC range from 0 (indicating only random agreement) to 1 (indicating perfect agreement). The ICC is mathematically equivalent to the kappa statistic for nominal data or weighted kappa for ordinal data [21, 22].
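Both reliability statistics can be sketched in a few lines. This is an illustration under stated assumptions rather than the analysis code used in the study: the function names are hypothetical, and the ICC shown is the two-way, absolute-agreement, single-measurement form, which is one common reading of "ICC agreement" (the text does not specify the model).

```python
from statistics import mean, stdev, variance


def cronbach_alpha(items, standardize=False):
    """items: list of k lists, each holding one item's scores across respondents.

    With standardize=True, each item is rescaled to SD 1 before alpha is
    computed, which is how the Table 3 footnote describes the reported values.
    """
    k = len(items)
    if standardize:
        items = [[(x - mean(col)) / stdev(col) for x in col] for col in items]
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def icc_agreement(baseline, followup):
    """Two-way ICC for absolute agreement, single measurement, two occasions."""
    n, k = len(baseline), 2
    rows = list(zip(baseline, followup))
    grand = mean(list(baseline) + list(followup))
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in (baseline, followup))
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)            # between-subject variation
    ms_cols = ss_cols / (k - 1)            # systematic shift between occasions
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# A constant 5-point shift leaves the product-moment correlation at 1,
# but the agreement ICC drops sharply because the scores no longer match.
shifted_icc = icc_agreement([1, 2, 3, 4], [6, 7, 8, 9])
```

The shifted example illustrates the distinction drawn in the text: retest scores that are perfectly correlated but systematically different from baseline receive a low agreement ICC.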

For examining construct validity, we chose not to add redundant measures of multiple domains for comparison with the PROMIS short form. Instead, we relied on comparisons with other patient features that allowed hypotheses regarding associations with the short form. Construct validity of the baseline scores was assessed by comparing scores between subjects who were or were not seeking worker's compensation, those with high or low scores on the Pain Catastrophizing Scale, and those with or without a history of a fall within the past 3 months.

We hypothesized that subjects who were seeking or receiving worker's compensation would have worse scores on most PROMIS measures than patients who were not, based on earlier studies reporting worse self-reported pain, function, and mental health scores among patients with back pain who receive worker’s compensation [ 23 , 24 ]. Similarly, we hypothesized that patients with high pain catastrophizing scores would have worse scores on most PROMIS measures than those with lower pain catastrophizing scores, because of studies associating catastrophizing with worse pain, functional disability, depression, and anxiety [ 25 ]. We also hypothesized that subjects with recent falls would have worse scores than those without, given evidence that musculoskeletal pain, functional ability, and depression are all risk factors for falling [ 26–28 ].

To assess responsiveness of the scales, we first tabulated mean score changes for patients who reported at 3 months that their pain was much less, a little less, about the same, a little worse, or much worse. We calculated Spearman rank order correlations for PROMIS scale scores across these categories. Next, we calculated effect sizes for patients with self-reported improvement or worsening of pain. This was calculated as the mean change in score from baseline to follow-up divided by the standard deviation of baseline scores. We then tabulated standardized response means, calculated as the mean score change divided by the standard deviation of those score changes.
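The two responsiveness statistics differ only in their denominator, as the sketch below shows. The function name and example data are hypothetical. Change is taken as follow-up minus baseline, and the baseline standard deviation is computed within the same group of patients; the text does not state which baseline sample supplied the standard deviation, so that choice is an assumption.

```python
from statistics import mean, stdev


def responsiveness(baseline, followup):
    """Mean change, effect size, and standardized response mean for one group."""
    changes = [after - before for before, after in zip(baseline, followup)]
    mean_change = mean(changes)
    return {
        "mean_change": mean_change,
        # Effect size: mean change relative to the spread of baseline scores.
        "effect_size": mean_change / stdev(baseline),
        # Standardized response mean: mean change relative to the spread of changes.
        "srm": mean_change / stdev(changes),
    }


# Made-up pain intensity ratings for four patients who improved.
example = responsiveness(baseline=[6, 7, 5, 8], followup=[3, 5, 2, 4])
```

Because the change scores in this made-up group vary less than the baseline scores, the standardized response mean is larger in magnitude than the effect size; the two statistics need not agree.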

We examined ceiling and floor effects for the PROMIS scales by tabulating the proportion of subjects with the best or worst possible scores. For this purpose, we examined all the scores both at baseline and at 3-month follow-up.
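A floor or ceiling tabulation of this kind reduces to counting scores at the extremes. The sketch below is illustrative only; the function name is hypothetical, and the lowest and highest possible scores must be supplied for each scale.

```python
def floor_ceiling_percent(scores, lowest_possible, highest_possible):
    """Percent of assessments at the lowest and at the highest possible score."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s == lowest_possible)
    at_ceiling = sum(1 for s in scores if s == highest_possible)
    return 100 * at_floor / n, 100 * at_ceiling / n


# Made-up 0-10 pain intensity ratings pooled across two assessments.
floor_pct, ceiling_pct = floor_ceiling_percent([0, 0, 5, 10], 0, 10)
```

Pooling baseline and 3-month scores, as described above, means each patient can contribute up to two assessments to the denominator.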

Because our study subjects had a mix of musculoskeletal diagnoses, we conducted a stratified analysis comparing our largest diagnostic group (back pain) to patients with other diagnoses. All the analyses above were conducted for the two subgroups. Other individual diagnostic groups were not large enough for meaningful comparison.

Results

Participants

We enrolled 202 subjects, but four actively withdrew consent, leaving 198 enrolled participants. Their mean age was 66.5 years. Most patients were white, consistent with the general population of rural Oregon, and back pain was the most common single clinical condition. Almost 83% had pain for greater than two years, and approximately 40% had previous surgery for back, neck, or joint pain ( Table 1 ). We obtained 3-month follow-up data from 197 subjects (99%).

Table 1.

Baseline characteristics of study patients

Patient characteristic (n = 198)* | n (%) or mean (SD)
Mean age (SD) | 66.5 (8.2)
Gender, female n (%) | 123 (62.1)
Education, n (%)
 Did not graduate from high school | 19 (9.5)
 High school graduate | 50 (25.2)
 Some college, did not graduate | 72 (36.3)
 College graduate | 41 (20.7)
 Graduate work | 16 (8.0)
Non-Hispanic white, n (%) | 180 (92.3)
Hispanic, n (%) | 7 (3.6)
Current smoker, n (%) | 30 (15.1)
Pain that bothers you most, n (%)
 Back pain | 61 (30.8)
 Neck pain | 15 (7.5)
 Joint pain | 28 (14.1)
 Arthritis | 31 (15.6)
 Other | 63 (31.8)
Duration of current pain, n (%)
 Less than 3 months | 7 (3.5)
 3 months–1 year | 16 (8.1)
 Greater than 1 year | 11 (5.5)
 Greater than 2 years | 163 (82.7)
History of surgery for back, neck, joint or arthritis pain, n (%)
 Yes, more than once | 39 (19.8)
 Yes, once | 39 (19.8)
 No | 118 (60.2)
Reports using opioids in past year, n (%) | 29 (15.5)
Seeking or receiving disability compensation, n (%) | 29 (14.6)

*Missing values from 0–4 persons per variable (0–2%), except opioids in past year, with 11 missing (5%).

“Other” diagnoses included carpal tunnel syndrome and other arm or hand pain (n = 8); shoulder pain, including rotator cuff injuries (n = 9); fibromyalgia (n = 7); foot pain, including plantar fasciitis (n = 11); hip or knee pain (n = 15); leg or ankle pain (n = 8); and ambiguous conditions (n = 5).

Average scores on the PROMIS measures were relatively stable over the 3-month follow-up (Table 2). At the 3-month follow-up, the most common response to the query about change in pain was “about the same” (46%), and more patients reported their pain as worse (32%) than as improved (22%) (Table 2).

Table 2.

  Scores on patient-reported outcomes at baseline and follow-up

Measure (range) | Baseline (n = 198), mean (SD) | 3 months (n = 197), mean (SD)
PROMIS 29 domains
 Pain intensity (0–10) | 5.9 (1.8) | 5.4 (2.1)
Domains as T-scores (population mean 50, SD 10)
 Pain interference | 60.6 (5.8) | 59.8 (7.0)
 Fatigue | 53.8 (8.8) | 53.9 (9.2)
 Sleep disturbance | 52.5 (7.6) | 53.2 (7.9)
 Anxiety | 52.2 (8.5) | 51.1 (8.8)
 Depression | 49.9 (8.8) | 49.9 (8.8)
 Satisfaction with social role | 45.1 (9.5) | 44.8 (9.1)
 Physical function | 40.9 (6.7) | 40.4 (6.3)
 Impact score (raw score, 8–50) | 27.2 (7.8) | 26.6 (8.8)
Pain Catastrophizing Scale (PCS)
 Rumination (0–16) | 5.4 (4.2) | *
 Magnification (0–12) | 2.9 (2.6) | *
 Helplessness (0–24) | 5.6 (5.1) | *
 Total PCS Score (0–52) | 13.7 (11.0) | *
Compared to 3 months ago, your pain now is: | % | %
 Much less | 6.6 | 10.2
 A little less | 13.1 | 11.7
 About the same | 43.4 | 46.2
 A little worse | 23.7 | 23.9
 Much worse | 13.1 | 8.1

*PCS collected only at baseline.

Reliability

Internal consistency of the baseline PROMIS measures was generally high. Cronbach’s alpha ranged from 0.81 (sleep disturbance) to 0.95 (satisfaction with social role) (Table 3). The NIH Research Task Force (RTF) impact score had a Cronbach’s alpha of 0.91.

Table 3.

  Internal consistency: Cronbach's alpha for PROMIS measures and the derived impact score

PROMIS measure | Cronbach's alpha (standardized*)
Pain interference | 0.92
Physical function | 0.86
Fatigue | 0.94
Sleep disturbance | 0.81
Depression | 0.92
Anxiety | 0.85
Satisfaction with social role | 0.95
Impact score | 0.91

*Standardized so that each item has a standard deviation of 1 before computing Cronbach's coefficient alpha.

Among patients who rated their pain as “about the same” at 3 months of follow-up (n = 91), the PROMIS scores were relatively stable and provided an estimate of test-retest reliability ( Table 4 ). The ICCs ranged from 0.44 (for Pain Intensity) to 0.73 (for Depression), with most ICCs greater than 0.60. The Task Force impact score had an ICC of 0.73. Among patients whose 3-month Pain Intensity scores were within one point of baseline scores (n = 98), the PROMIS score ICCs ranged from 0.57 (Satisfaction with Social Role) to 0.76 (Fatigue), with all but one ICC greater than 0.60. The impact score had an ICC of 0.80.

Table 4.

  Test-retest reproducibility of PROMIS measures and the derived impact score

PROMIS measure | ICC* (95% CI), patient’s pain “about the same” (n = 91) | ICC* (95% CI), pain intensity rating changed within +/− 1 point (n = 98)
Pain intensity | 0.44 (0.29, 0.61) | n/a
Pain interference | 0.58 (0.44, 0.71) | 0.67 (0.56, 0.77)
Physical function | 0.68 (0.56, 0.78) | 0.70 (0.59, 0.79)
Fatigue | 0.68 (0.56, 0.78) | 0.76 (0.67, 0.84)
Sleep disturbance | 0.70 (0.58, 0.79) | 0.74 (0.64, 0.82)
Depression | 0.73 (0.62, 0.81) | 0.74 (0.64, 0.82)
Anxiety | 0.63 (0.50, 0.75) | 0.69 (0.58, 0.78)
Satisfaction with social role | 0.54 (0.39, 0.68) | 0.57 (0.43, 0.70)
RTF Impact Score | 0.73 (0.62, 0.82) | 0.80 (0.71, 0.86)

Values are intraclass correlation coefficients (ICCs) for patients who rated their pain as “about the same” or whose pain intensity was within 1 point of baseline value at 3 months following baseline assessment.

*ICC = Intraclass Correlation Coefficient.

Construct Validity

The evaluation of construct validity of the PROMIS measures is shown in Table 5. As hypothesized, patients seeking or receiving worker's compensation had statistically significantly worse scores on every measure compared with patients who were not seeking or receiving compensation. Similarly, those with high catastrophizing scores had significantly worse scores on each PROMIS measure than those with low catastrophizing scores. Patients with recent falls had statistically significantly worse scores than those without recent falls on all but two measures: Pain Intensity and Sleep Disturbance.

Table 5.

  Evidence of construct validity of baseline PROMIS measures and the derived impact score

PROMIS measure | Worker’s compensation: Yes | Worker’s compensation: No | P* | Catastrophizing score (total): <14 | Catastrophizing score (total): ≥14 | P* | Falls in previous 3 months: Yes | Falls in previous 3 months: No | P*
N | 29 | 169 | | 109 | 78 | | 57 | 139 |
Pain intensity | 6.9 (1.9) | 5.8 (1.7) | 0.002 | 5.5 (1.6) | 6.6 (1.9) | <.001 | 6.3 (1.8) | 5.8 (1.8) | 0.106
Pain interference | 65.0 (4.9) | 59.8 (5.6) | <.001 | 58.6 (5.9) | 63.4 (4.9) | <.001 | 62.7 (6.1) | 59.7 (5.5) | <.001
Physical function | 36.0 (4.4) | 41.8 (6.7) | <.001 | 43.2 (7.4) | 38.3 (6.2) | <.001 | 38.5 (5.9) | 41.9 (6.8) | 0.001
Fatigue | 59.2 (8.7) | 52.8 (8.5) | <.001 | 51.3 (8.3) | 57.4 (9.3) | <.001 | 56.9 (8.7) | 52.6 (8.6) | 0.002
Sleep disturbance | 56.2 (5.1) | 51.8 (7.8) | <.001 | 50.9 (8.3) | 54.5 (7.9) | 0.001 | 53.0 (8.0) | 52.2 (7.5) | 0.473
Depression | 57.0 (9.0) | 48.6 (8.1) | <.001 | 46.5 (8.5) | 54.2 (9.6) | <.001 | 53.8 (9.6) | 48.3 (8.0) | <.001
Anxiety | 57.5 (8.5) | 51.3 (8.2) | <.001 | 49.2 (8.8) | 56.2 (8.6) | <.001 | 55.9 (7.9) | 50.9 (8.3) | <.001
Satisfaction with social role | 39.1 (6.9) | 46.1 (9.5) | <.001 | 47.1 (9.4) | 42.5 (9.4) | <.001 | 42.4 (8.6) | 46.2 (9.8) | 0.014
Impact score | 34.2 (6.1) | 26.0 (7.4) | <.001 | 24.0 (6.8) | 31.3 (6.7) | <.001 | 30.6 (7.3) | 25.9 (7.6) | <.001

Tabled figures are all means (SD).

*T-test of means. Bolded P-values are significant (<0.05).

Responsiveness

Responsiveness of each measure was evaluated by tabulating the mean score change between baseline and 3-month follow-up, the effect size, and the standardized response means for patients who judged themselves improved, the same, or worse on the pain improvement scale ( Table 6 ). The mean score changes monotonically increased for Pain Intensity and Pain Interference, and monotonically decreased for Physical Function, in moving on the pain improvement scale from “much less” to “much worse”. Each of these correlations was statistically significant. Similarly, the changes in the mean impact score (derived from these three measures) showed a statistically significant monotonic increase across the pain improvement categories. Changes in Sleep Interference, Anxiety, and Social Satisfaction were also correlated with pain improvement categories (in the expected directions), but less strongly than were Pain Intensity, Physical Function, or Pain Interference. The correlations were statistically significant for Sleep Interference and Social Satisfaction, and borderline significant for anxiety (p = 0.06). Changes in Fatigue and Depression were minimal and not statistically significantly correlated with pain improvement categories.

Table 6.

  Responsiveness of PROMIS measures and derived impact score

Change in pain at 3 months compared to baseline
Measure | Much less (n = 20) | A little less (n = 23) | About the same (n = 91) | A little worse (n = 47) | Much worse (n = 16) | Spearman correlation coefficient | P
Mean score changes*
Pain intensity (10-point scale) | −3.60 | −1.48 | −0.37 | 0.45 | 1.25 | 0.500 | <.0001
Pain interference | −6.84 | −1.82 | −0.76 | 0.74 | 3.78 | 0.367 | <.0001
Physical function | 3.85 | 0.09 | −0.57 | −1.34 | −3.85 | −0.295 | <.0001
Fatigue | −2.47 | 1.45 | 0.20 | −0.39 | 1.64 | 0.057 | 0.43
Sleep interference | −2.43 | 1.17 | 0.49 | 1.32 | 3.62 | 0.188 | 0.01
Depression | −2.44 | 3.67 | −0.24 | −0.54 | 1.74 | 0.004 | 0.95
Anxiety | −5.82 | −0.19 | −1.19 | 0.48 | −1.32 | 0.137 | 0.06
Satisfaction with social role | 3.09 | −0.55 | 0.11 | −1.37 | −2.79 | −0.159 | 0.03
Impact score (8–50 scale) | −10.16 | −2.78 | −0.64 | 1.53 | 6.06 | 0.497 | <.0001
Effect sizes (change/baseline SD)
Pain intensity (10-point scale) | −1.93 | −0.79 | −0.20 | 0.24 | 0.67
Pain interference | −1.03 | −0.28 | −0.08 | 0.17 | 0.71
Physical function | 0.68 | 0.07 | −0.04 | −0.16 | −0.57
Fatigue | −0.37 | 0.20 | 0.05 | −0.05 | 0.14
Sleep interference | −0.32 | 0.10 | 0.04 | 0.17 | 0.41
Depression | −0.24 | 0.30 | −0.02 | −0.06 | 0.30
Anxiety | −0.57 | 0.03 | −0.14 | 0.04 | 0.02
Satisfaction with social role | 0.34 | 0.01 | 0.01 | −0.15 | −0.39
Impact score (8–50 scale) | −1.30 | −0.36 | −0.08 | 0.20 | 0.78
Standardized response means (change/SD of change)
Pain intensity (10-point scale) | −1.61 | −0.66 | −0.17 | 0.20 | 0.56
Pain interference | −1.07 | −0.29 | −0.08 | 0.18 | 0.74
Physical function | 0.87 | 0.09 | −0.05 | −0.20 | −0.72
Fatigue | −0.50 | 0.28 | 0.07 | −0.08 | 0.20
Sleep interference | −0.39 | 0.12 | 0.04 | 0.21 | 0.50
Depression | −0.29 | 0.37 | −0.02 | −0.08 | 0.37
Anxiety | −0.66 | 0.03 | −0.16 | 0.05 | 0.02
Satisfaction with social role | 0.39 | 0.01 | 0.01 | −0.17 | −0.44
Impact score (8–50 scale) | −1.53 | −0.42 | −0.10 | 0.23 | 0.91

*Means are T-scores with population mean of 50 unless otherwise specified. Correlations and P-values calculated on raw values.

P-values are the same as for the mean score changes.

Ceiling and Floor Effects

The percentage of scores at either the highest or lowest possible score for each measure was generally low (Table 7). Only two scales (Depression and Anxiety) had more than 10% of assessments at the lowest possible score.

Table 7.

  Evidence of floor and ceiling effects (or lack thereof) in PROMIS measures and the derived impact score

PROMIS measure | Percent of responses at lowest possible score | Percent of responses at highest possible score
Pain intensity | 1.8 | 3.3
Pain interference | 3.1 | 2.6
Physical function | 0.5 | 6.6
Fatigue | 4.4 | 2.3
Sleep interference | 1.8 | 1.6
Depression | 41.8 | 0.0
Anxiety | 28.3 | 0.0
Satisfaction with social role | 9.3 | 6.2
Impact score | 0.3 | 0.0

Percent of responses at baseline and 3 months with the lowest and highest possible scores. Bolded values are those exceeding 10%.

Analysis Stratified by Diagnostic Subgroups

The analyses above were also conducted separately for patients with back pain (n = 61), and compared to patients having all other diagnoses (n = 137). Results in general were similar for these two subgroups. For example, Cronbach’s alpha for the PROMIS physical function scale was 0.92 for the overall population and also for each subgroup. Cronbach’s alpha was also nearly identical for the impact score in both subgroups, and for all the PROMIS scales. Scores for the patients with back pain and those with other diagnoses had similar associations with the presence or absence of worker's compensation, with catastrophizing scores, and with the presence or absence of falls. Measures of responsiveness were also similar. The main difference between subgroups in the stratified analysis concerned test-retest reliability, where most values were lower in the back pain subgroup than the other diagnoses subgroup. For example, in the 3-month retest, the kappa value for the physical function subscale of PROMIS was 0.62 in the back pain group compared with 0.78 in the other diagnosis group. Similarly, the impact score had a kappa value of 0.63 in the back pain group compared with 0.77 in the other diagnoses group. With the smaller sample sizes in the subgroups, some variability in test performance can be expected, but it appeared that test performance overall was similar in the two diagnostic subgroups.

Discussion

In this longitudinal study of older adults with chronic musculoskeletal pain, we found the PROMIS-29 to have good reliability and validity by conventional psychometric assessments. The four-item scales for individual PROMIS measures had high levels of internal consistency and the measures had substantial test-retest reliability. Furthermore, baseline scores for each measure were significantly associated with seeking or receiving worker's compensation and with levels of catastrophizing, and, for most measures, with recent history of a fall. Scores showed sizable changes over time that were consistent in direction with patient self-reports of pain improvement or worsening, suggesting appropriate responsiveness to change. The association of score changes with the pain improvement scale augments the evidence of validity in addition to indicating responsiveness of the short form scales.

The test-retest reliability coefficients, most around 0.7, may seem low to some observers. Some constructs, such as pain intensity, are expected to change and to be variably interpreted by patients over time. Further, we note that these coefficients were based on a repeat test after 3 months, a longer time interval than often used for test-retest evaluation [ 21 ], but corresponding more closely to a usual interval between clinician visits. Furthermore, this interval minimizes memory effects, which may inflate test-retest reliability when survey points are scheduled close to each other [ 29 ]. The results are therefore more typical of clinical patient reports of change, but reflect the greater variability expected over a longer interval. If we use conventional interpretations of the kappa statistic as a guide to interpreting the ICC, a score of 0.7 would be judged as “substantial agreement” [ 30 ].

For responsiveness to change over time, a common rule of thumb is that an effect size of 0.5 is moderate and one of 0.8 or greater is large [ 31 , 32 ]. By these standards, patients who reported “much less” pain showed large effect sizes for pain intensity, pain interference, and the impact score. Physical function showed a moderate effect size. Among those judging themselves “much worse,” moderate effect sizes were observed for pain intensity, pain interference, physical function, and the impact score. Although interpretation of standardized response means is challenging and may differ from that of the effect size measure [ 32 ], the values in Table 6 are similar to the corresponding effect sizes.
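The rule of thumb can be applied mechanically to the effect sizes in Table 6, as in the sketch below. The function name and the label for values under 0.5 are illustrative choices; only the 0.5 and 0.8 cut points come from the text.

```python
def effect_size_label(effect_size):
    """Label an effect size using the 0.5 (moderate) and 0.8 (large) cut points."""
    magnitude = abs(effect_size)  # direction depends on the scale, so ignore sign
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    return "below moderate"


# Effect sizes from Table 6 for the two extreme change categories.
much_less = {"Pain intensity": -1.93, "Pain interference": -1.03,
             "Physical function": 0.68, "Impact score": -1.30}
much_worse = {"Pain intensity": 0.67, "Pain interference": 0.71,
              "Physical function": -0.57, "Impact score": 0.78}

much_less_labels = {name: effect_size_label(v) for name, v in much_less.items()}
much_worse_labels = {name: effect_size_label(v) for name, v in much_worse.items()}
```

The labels match the statements above: among patients reporting “much less” pain, only Physical Function falls in the moderate band, and all four measures are moderate among those reporting “much worse” pain.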

Identifying a minimal clinically important change from our data is crude, based on the relatively small numbers of patients who improved or worsened. We speculate that a minimally important change is one corresponding to patient reports of pain being between “slightly better” and “much better” (or between “slightly worse” and “much worse”). Based on the mean score changes tabulated in Table 6 , we might then estimate that for the pain intensity, pain interference, and physical function measures, score changes of around two points could be considered a minimal clinically important change. For the impact score, three points might be considered a minimal clinically important difference. Changes on the other scales in this study were not highly correlated with changes in pain intensity, and our data do not allow estimation of a minimal clinically important change.

These anchor-based estimates of minimally important change (using an external measure of change for comparison) are congruent with estimates in other studies of chronic musculoskeletal disease. Kroenke et al. suggested 2.0-2.5 for the PROMIS-29 Anxiety and Depression scales [ 16 ], and Hays et al. suggested 2 points on the PROMIS 20-item Physical Function short form [ 15 ].

Our study is limited by not including “legacy” measures of pain, function, or mood for direct comparison. Indirect comparisons between our data and other reports of well-validated legacy instruments must be made cautiously, because of different populations, methods, and measurement timing. Nonetheless, our findings for the PROMIS 29-item short form appear similar to the performance characteristics of several “legacy” instruments. As an example, studies of the Oswestry Disability Index for low back pain report Cronbach alpha statistics ranging from 0.71 to 0.87 [ 33 ]. Effect sizes for patients who judge themselves “better” or “much better” have been reported as 0.80 to 0.87 [ 34 , 35 ] with standardized response means in the range of 0.80 to 1.0 [ 35–37 ]. Studies of the more generic Physical Component Score (PCS) of the SF-12 have reported Cronbach alphas of 0.77 to 0.89 [ 38 , 39 ], and effect sizes of approximately 1.0 for patients judged “better” or “much better” [ 34 , 35 ]. Turner et al. reported no ceiling or floor effects for the Physical Component Score [ 40 ]. Few instruments have reported test-retest statistics for intervals as long as 3 months, but the Roland and Morris Disability Questionnaire for back pain had test-retest reliability of 0.76 to 0.81 at this interval [ 41 ].

Making more direct instrument comparisons remains an important agenda for future research. Nonetheless, an emerging literature is beginning to provide a “cross-walk” between PROMIS scores and legacy measures of various constructs [ 42–46 ].

Other limitations include the sample of a rural, older, primarily white, northwestern US population, which may limit generalizability of the results. The heterogeneity of diagnoses, with relatively small numbers of each, and with the likelihood of multiple diagnoses in some patients, made meaningful analysis by diagnosis (other than back pain vs. non-back pain) unfeasible. We also recognize that the sources of pain identified by patients are not mutually exclusive, and that important overlaps are possible. For example, “arthritis” often plays a major role in back pain, neck pain, and extremity pain, so precise pathological distinctions cannot be made here.

Our results support the use of PROMIS-29 measures in clinical studies of patients with chronic musculoskeletal pain, with a caution about possible floor effects in the anxiety and depression scales. The PROMIS-29 also appears promising for cost-effectiveness evaluation [ 47 ]. However, it seems likely that computer adaptive testing, using the full PROMIS item banks, would perform better than the 29-item short form [ 8 ]. Thus, although the PROMIS-29 may be useful for many purposes, the greater precision of a longer form, the computer adaptive version, or a legacy measure may be desirable in particular circumstances, for example when one of the subscales here (such as physical function) is used as a primary outcome measure.

Our findings further support use of the impact score recommended by the NIH Task Force on Research Standards for Chronic Low Back Pain, even among patients with a wider range of musculoskeletal pain conditions. The impact score (with nine items rather than four) was among the best-performing measures in every assessment. Further study is needed to evaluate how these measures compare with legacy measures and with other versions of the PROMIS measures.

Funding sources: Supported in part by grant number R21 AG042647 from the National Institute on Aging.

Conflicts of interest: The authors report no conflicts of interest relevant to this manuscript.

Disclosures: The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

References

  • 1. Dworkin RH, Turk DC, Wyrwich KW, et al. Interpreting the clinical importance of treatment outcomes in chronic pain clinical trials: IMMPACT recommendations. J Pain 2008;9:105–21.
  • 2. Dworkin RH, Turk DC, Farrar JT, et al. Core outcome measures for chronic pain clinical trials: IMMPACT recommendations. Pain 2005;113:9–19.
  • 3. Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): Progress of an NIH Roadmap cooperative group during its first two years. Med Care 2007;45(5 suppl 1):S3–11.
  • 4. Cella D, Riley W, Stone A, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005-2008. J Clin Epidemiol 2010;63:1179–94.
  • 5. Fries JF, Cella D, Rose M, Krishnan E, Bruce B. Progress in assessing physical function in arthritis: PROMIS short forms and computerized adaptive testing. J Rheumatol 2009;36:2061–6.
  • 6. Amtmann D, Cook KF, Johnson KL, Cella D. The PROMIS initiative: Involvement of rehabilitation stakeholders in development and examples of applications in rehabilitation research. Arch Phys Med Rehabil 2011;92(10 suppl 1):S12–19.
  • 7. Instruments available for use in Assessment Center. Available at: http://www.assessmentcenter.net/documents/InstrumentLibrary.pdf (accessed October 6, 2015).
  • 8. Choi SW, Reise SP, Pilkonis PA, Hays RD, Cella D. Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms. Qual Life Res 2010;19:125–36.
  • 9. Deyo RA, Dworkin SF, Amtmann D, et al. Report of the NIH task force on research standards for chronic low back pain. J Pain 2014;15:569–85.
  • 10. Durkin B, Romeiser J, Shroyer AL, et al. Report from a quality assurance program on patients undergoing the MILD procedure. Pain Med 2013;14:650–6.
  • 11. Karp JF, Yu L, Friedly J, Amtmann D, Pilkonis PA. Negative affect and sleep disturbance may be associated with response to epidural steroid injections for spine-related pain. Arch Phys Med Rehabil 2014;95:309–15.
  • 12. Hung M, Hon SD, Franklin JD, et al. Psychometric properties of the PROMIS physical function item bank in patients with spinal disorders. Spine 2014;39:158–63.
  • 13. Hung M, Stuart AR, Higgins TF, Salzman CL, Kubiak EN. Computerized adaptive testing using the PROMIS physical function item bank reduces test burden with less ceiling effects compared with the short musculoskeletal function assessment in orthopedic trauma patients. J Orthop Trauma 2014;28:439–43.
  • 14. Shahgholi L, Yost KJ, Kallmes DF. Correlation of the national institutes of health patient reported outcomes measurement information system scales and standard pain and functional outcomes in the spine augmentation. Am J Neuroradiol 2012;33:2186–90.
  • 15. Hays RD, Spritzer KL, Fries JF, Krishnan E. Responsiveness and minimally important difference for the Patient-Reported Outcomes Measurement Information System (PROMIS) 20-item physical functioning short form in a prospective observational study of rheumatoid arthritis. Ann Rheum Dis 2015;74:104–07.
  • 16. Kroenke K, Yu Z, Wu J, Kean J, Monahan PO. Operating characteristics of PROMIS four item depression and anxiety scales in primary care patients with chronic pain. Pain Med 2014;15:1892–901.
  • 17. Tariq SH, Tumosa N, Chibnall JT, Perry MH 3rd, Morley JE. Comparison of the Saint Louis University mental status examination and the mini-mental state examination for detecting dementia and mild neurocognitive disorder: A pilot study. Am J Geriatr Psychiatry 2006;14:900–10.
  • 18. Castanho TC, Amorim L, Zihl J, et al. Telephone-based screening tools for mild cognitive impairment and dementia in aging studies: A review of validated instruments. Front Aging Neurosci 2014;6:1–17.
  • 19. Brandt J, Spencer M, Folstein M. The telephone interview for cognitive status. Neuropsychiatry Neuropsychol Behav Neurol 1988;1:111–7.
  • 20. Sullivan MJ, Bishop SR, Pivik J. The Pain Catastrophizing Scale: Development and validation. Psychol Assess 1995;7:524–32.
  • 21. Terwee CB, Bot SDM, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007;60:34–42.
  • 22. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas 1973;33:613–9.
  • 23. Atlas SJ, Singer DE, Keller RB, Patrick DL, Deyo RA. Applications of outcomes research in occupational low back pain: The Maine Lumbar Spine Study. Am J Industr Med 1996;29:584–9.
  • 24. Atlas SJ, Chang Y, Kammann E, et al. Long term disability and return to work among patients who have a herniated lumbar disc: The effect of disability compensation. J Bone Joint Surg 2000;82A:4–15.
  • 25. Sullivan MJL, Thorn B, Haythornthwaite JA, et al. Theoretical perspectives on the relation between catastrophizing and pain. Clin J Pain 2001;17:52–64.
  • 26. Rubenstein LZ. Falls in older people: Epidemiology, risk factors and strategies for prevention. Age Ageing 2006;35(S2):ii37–41.
  • 27. Leveille SG, Jones RN, Kiely DP, et al. Chronic musculoskeletal pain and the occurrence of falls in an older population. JAMA 2009;302:2214–21.
  • 28. Eggermont LHP, Penninx BWJH, Jones RN, Leveille SG. Depressive symptoms, chronic pain, and falls in older community-dwelling adults: The MOBILIZE Boston study. J Am Geriatr Soc 2012;60:230–7.
  • 29. Reeve BB, Wyrwich KW, Wu AW, et al. ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research. Qual Life Res 2013;22:1889–905.
  • 30. Viera AJ, Garrett JM. Understanding interobserver agreement: The kappa statistic. Fam Med 2005;37:360–3.
  • 31. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989;27(3 suppl):S178–89.
  • 32. Middel B, van Sonderen E. Statistical significant change versus relevant or important change in (quasi) experimental design: Some conceptual and methodological problems in estimating magnitude of intervention-related change in health services research. Int J Integrated Care 2002;2:1–18.
  • 33. Roland M, Fairbank J. The Roland-Morris disability questionnaire and the Oswestry disability questionnaire. Spine 2000;25:3115–24.
  • 34. Taylor SJ, Taylor AE, Foy MA, Fogg AJB. Responsiveness of common outcome measures for patients with low back pain. Spine 1999;24:1805–12.
  • 35. Walsh TLL, Hanscom B, Lurie JD, Weinstein JN. Is a condition-specific instrument for patients with low back pain/leg symptoms really necessary? The responsiveness of the Oswestry Disability Index, MODEMS, and the SF-36. Spine 2003;28:607–15.
  • 36. Beurskens AJHM, de Vet HCW, Koke AJA. Responsiveness of functional status in low back pain: A comparison of different instruments. Pain 1996;65:71–6.
  • 37. Grotle M, Brox JI, Vollestad NK. Concurrent comparison of responsiveness in pain and functional status measures used for patients with low back pain. Spine 2004;29:E492–501.
  • 38. Luo X, George ML, Kakouras I, et al. Reliability, validity, and responsiveness of the short form 12-item survey (SF-12) in patients with back pain. Spine 2003;28:1739–45.
  • 39. Ware JE, Kosinski M, Keller S. A 12-item short-form health survey: Construction of scales and preliminary tests of reliability and validity. Med Care 1996;34:220–33.
  • 40. Turner JA, Fulton-Kehoe D, Franklin G, Wickizer TM, Wu R. Comparison of the Roland-Morris disability questionnaire and generic health status measures: A population-based study of workers compensation back injury claimants. Spine 2003;28:1061–7.
  • 41. Atlas SJ, Deyo RA, van den Ancker M, et al. The Maine-Seattle back questionnaire: A 12-item disability questionnaire for evaluating patients with lumbar sciatica or stenosis: Results of a derivation and validation cohort analysis. Spine 2003;28:1869–76.
  • 42. Choi SW, Schalet B, Cook KF, Cella D. Establishing a common metric for depressive symptoms: Linking the BDI-II, CES-D and PHQ-9 to PROMIS depression. Psychol Assess 2014;26:513–27.
  • 43. Schalet BD, Cook KF, Choi SW, Cella D. Establishing a common metric for self-reported anxiety: Linking the MASQ, PANAS, and GAD-7 to PROMIS anxiety. J Anxiety Dis 2014;28:88–96.
  • 44. Schalet BD, Revicki DA, Cook KF, et al. Establishing a common metric for physical function: Linking the HAQ-DI and SF-36 PF subscale to PROMIS physical function. J Gen Intern Med 2015;30:1517–23.
  • 45. Askew RL, Kim J, Chung H, et al. Development of a crosswalk for pain interference measured by the BPI and PROMIS pain interference short form. Qual Life Res 2013;22:2769–76.
  • 46. Cook KF, Schalet BD, Kallen MA, Rutsohn JP, Cella D. Establishing a common metric for self-reported pain: Linking BPI pain interference and SF-36 bodily pain subscale scores to the PROMIS pain interference metric. Qual Life Res 2015;24:2305–18.
  • 47. Craig BM, Reeve BB, Brown PM, et al. US valuation of health outcomes measured using the PROMIS-29. Value Health 2014;17:846–53.

Articles from Pain Medicine: The Official Journal of the American Academy of Pain Medicine are provided here courtesy of Oxford University Press
