Author manuscript; available in PMC: 2025 Jan 1.
Published in final edited form as: J Pain. 2023 Aug 5;25(1):142–152. doi: 10.1016/j.jpain.2023.07.028

Comparable Minimally Important Differences and Responsiveness of Brief Pain Inventory and PEG Pain Scales across Six Trials

David E Reed II 1,2, Timothy E Stump 3, Patrick O Monahan 3, Kurt Kroenke 4,5
PMCID: PMC10859144  NIHMSID: NIHMS1922969  PMID: 37544394

Abstract

The 3-item pain intensity (P), interference with enjoyment of life (E), and interference with general activity (G) scale, or PEG, has become one of the most widely used measures of pain severity and interference. The minimally important differences (MID) and responsiveness of the PEG are essential metrics for solidifying its role in research and clinical care. The current study aims to establish the MID and responsiveness of the PEG by synthesizing data from 1,710 participants across 6 controlled trials. MIDs were estimated using absolute score changes among individuals reporting their pain was “a little better” on a retrospective global change anchor as well as distribution-based estimates using standard deviation thresholds and 1 and 2 standard errors of measurement. Responsiveness was assessed using standardized response means, area under the curve, and treatment effect sizes. MID estimates for the PEG ranged from 0.60 to 1.05 when using 0.35 SD, and 0.78 to 1.22 using 1 standard error of measurement. MID estimates using the global anchor had somewhat more variability but most estimates ranged from 1.0 to 1.75. Responsiveness effect sizes were generally large (> .80) for standardized response means and moderate (> .50) for treatment effect. Similarly, most area under the curve values demonstrated an acceptable level of scale responsiveness (≥ .70). Importantly, MID estimates and responsiveness of the PEG and BPI scales were largely comparable when aggregating data across trials. Our synthesis indicates that 1 point is a reasonable MID estimate on these 0- to 10-point pain scales, with 2 points being an upper bound.

Keywords: PEG, Brief Pain Inventory, pain, psychometrics, measurement


Chronic pain remains one of the most significant and debilitating conditions in the United States, affecting millions of individuals.17,19 Pain is reported across a variety of patient populations, including in primary care24,47 and oncology1 clinics, where clinicians are tasked with efficiently assessing pain severity (ie, the intensity of pain) and interference (ie, the degree to which pain disrupts daily activities). Because of the need for a brief measure that assesses both pain severity and interference, the 3-item pain intensity (P), interference with enjoyment of life (E), and interference with general activity (G) scale, or PEG,24 has become one of the most widely used ultra-brief measures of pain severity and interference. However, establishing the minimally important difference (MID) of the PEG and its responsiveness is a critical step in supporting its increasing use.

The PEG was derived from the longer 11-item Brief Pain Inventory (BPI) legacy scale6 which includes assessments of pain severity (4 items), in addition to pain interference (7 items), across several health domains (eg, mood, walking ability, and sleep). The Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) recommendations11 and guidelines from the Veterans Health Administration Pain Measures Work Group28 include the assessment of both pain severity and interference. Recommendations specifically mention the BPI-Interference subscale and pain numeric scale (which is included in both the PEG and BPI-Severity subscale). An MID of 1 point for the BPI-Interference subscale and the numeric rating scale (NRS) has been proposed.12 While assessing both pain intensity and interference broadens the scope of information gathered in research and clinical settings, administering all items is less efficient. Comparatively, the PEG has only 3 items, efficiently integrating the assessment of both pain severity and interference into a single composite score.

The MID is defined as the least required amount of reported change on an assessment to be considered important for an individual.36 Responsiveness is defined as a measure’s capability to detect meaningful change over time due either to an intervention or change that occurs naturalistically.34 A variety of methods has been established to determine MIDs and responsiveness15, but they broadly encompass two strategies. Distribution-based methods use psychometric properties of the assessments (eg, standard deviations and reliability) to establish an MID. In contrast, anchor-based methods analyze change relative to a retrospective or prospective item that participants answer about self-perceived improvement.

Because different methods may produce somewhat different estimates of MID, some experts recommend triangulating both anchor-based and distribution-based methods when estimating MID.40,45 Common anchors include the patient-rated global impression of change and comparison with absolute change on a legacy measure of the same domain. Common distribution-based methods include 0.2 to 0.5 standard deviations (SDs) or 1 to 2 standard errors of measurement (SEMs) as the lower and upper bounds of an MID.5,26,30,39,40,42,51 Anchor-based methods have important advantages50 and are recommended by the FDA as a preferred method46; however, supplementing anchor-based with distribution-based approaches may be informative, especially when different methods demonstrate reasonable convergence.

Although the PEG has had relatively rapid clinical and research uptake, its use and interpretation will be substantially enhanced by further evidence regarding its MID and responsiveness. A recent push to ensure randomized controlled trials (RCTs) are adequately powered to detect clinically, as opposed to merely statistically, meaningful changes, further emphasizes the importance of establishing MIDs for the PEG.14 Moreover, the movement towards using measurement-based care to monitor and adjust treatment for physical and psychological symptoms13 highlights the importance of determining the responsiveness of patient-reported outcome measures. Thus, further establishing the MID and responsiveness of the PEG will enable clinicians and researchers to assess whether patients or participants are improving meaningfully over time, and will serve as accurate benchmarks to determine within and between group effect sizes and power calculations for clinical trials.

The current study is a psychometric synthesis of the PEG and aims to assess the MID and responsiveness of the PEG and compare its psychometric properties to the BPI legacy measure across 6 randomized clinical trials in a variety of clinical settings and patients with pain. The PEG was compared to the total BPI score, in addition to the BPI-Severity and BPI-Interference subscales. In integrating the findings across the 6 trials, we hypothesized that the ultra-brief PEG would have generally comparable MID estimates and responsiveness compared to its legacy, but longer, parent measure, the BPI.

Methods

We examined data across 6 randomized clinical trials to compare the BPI and PEG. These trials were chosen because they included the requisite data for calculating the MIDs and responsiveness for the PEG and 3 BPI scales. Some of the MID and responsiveness metrics have previously been published for the individual trials.4,5,19,22,32 In the current paper, however, we summarize and synthesize results across all 6 trials. Additionally, the integration of PEG findings is especially important given its emergence as an ultra-brief pain measure with accelerating clinical and research use.

The Stepped Care for Affective disorders and Musculoskeletal Pain trial (SCAMP; NCT00118430) included 427 adults diagnosed with co-occurring musculoskeletal pain and depression recruited from primary care clinics.22,25 The Indiana Cancer Pain and Depression trial (INCPAD; NCT00313573) included 274 adults with cancer diagnosed with pain and/or depression recruited from outpatient oncology clinics.31,32 The Stepped Care Optimizing Pain Care Effectiveness trial (SCOPE; NCT00926588) included 250 adults who had chronic musculoskeletal pain recruited from primary care clinics.19,29 The Care Management for the Effective use of Opioids trial (CAMEO; NCT01236521) included 261 adults with low back pain receiving long-term opioid therapy recruited from multiple Veterans Affairs (VA) primary care clinics.2,4,5 The Strategies for Prescribing Analgesics Comparative Effectiveness trial (SPACE; NCT01583985) included 240 adults with chronic back pain, or hip or knee osteoarthritis pain, recruited from multiple VA primary care clinics.4,5,23 The Stroke Survivors Self-Management trial (SSM; NCT01507688) included 258 adults with stroke recruited from several hospitals, including VA hospitals.4,5 Table 1 summarizes the trial information.

Table 1.

Key Characteristics of the Six Randomized Clinical Trials (n = 1710)

Trial SCAMP (n=427) INCPAD (n=274) SCOPE (n=250) CAMEO (n=261) SPACE (n=240) SSM (n=258)
Clinical Population Co-occurring chronic musculoskeletal pain and depression Cancer with pain and/or depression Chronic musculoskeletal pain Low Back Pain Chronic back, hip, or knee pain Post-Stroke
Setting Primary Care Oncology Primary Care Primary Care Primary Care Neurology
Age, mean (SD) 59.1 (13.0) 58.1 (10.5) 55.2 (8.5) 57.9 (9.5) 58.3 (13.7) 61.7 (10.8)
Sex, n (%)
 Men 199 (46.6) 93 (33.9) 207 (82.8) 241 (92.3) 208 (86.7) 209 (81.0)
 Women 228 (53.4) 181 (66.1) 43 (17.2) 20 (7.7) 32 (13.3) 49 (19.0)
Race, n (%)
 White 249 (58.3) 212 (77.4) 192 (76.8) 191 (73.2) 207 (86.2) 166 (64.3)
 Black 163 (38.2) 57 (20.8) 48 (19.2) 54 (20.7) 18 (7.5) 78 (30.2)
 Other 15 (3.5) 5 (1.8) 10 (4.0) 16 (6.1) 15 (6.3) 14 (5.4)
Intervention group Optimized Antidepressants and Pain Self-Management Optimized Antidepressants and Analgesics Optimized Analgesics Optimized Analgesics Optimized Opioid Analgesics Stroke Self-Management Program
Control group Usual Care Usual Care Usual Care Cognitive Behavioral Therapy Optimized Non-Opioid Analgesics Usual Care
Retrospective global anchor 7-item Likert Version A 7-item Version A 7-item Version B 7-item Version B 7-item Version B 7-item Version B
* Data in all studies represent sex assigned at birth.

The participant is asked whether, since the last assessment: “Overall would you say your pain is …”. The 7 response options in Version A are worse, about the same, a little better, somewhat better, moderately better, a lot better, completely better (pain is gone). The 7 response options in Version B are much better, moderately better, a little better, no change, a little worse, moderately worse, much worse.

Measures

Brief Pain Inventory.

The Brief Pain Inventory6,33 includes both a total score as well as 2 subscales for severity and interference. The severity subscale includes 3 items assessing pain at its worst, least, and average over the past 24 hours, and 1 item assessing current pain level. Participants respond on a scale ranging from 0 (no pain) to 10 (pain as bad as you can imagine). The Interference subscale has 7 items assessing how pain has interfered with general activity, mood, walking ability, work/housework, relationships, sleep, and enjoyment of life. Participants respond on a scale ranging from 0 (does not interfere) to 10 (completely interferes). Unweighted means of items are used to calculate the total scale score and subscales, and higher scores represent more pain severity and/or interference.

PEG.

The PEG24 is a 3-item assessment of pain severity and interference. Participants respond to the item “What number best describes your pain on average in the past week” on a 0 (no pain) to 10 (pain as bad as you can imagine) scale. Participants respond to the items “What number best describes how, during the past week, pain has interfered with your enjoyment of life?” and “What number best describes how, during the past week, pain has interfered with your general activity” on a 0 (does not interfere) to 10 (completely interferes) scale. The total scale score equals an unweighted mean of items, and higher scores represent more pain severity and/or interference.
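The scoring rule described above can be sketched in a few lines. This is an illustrative sketch only: the function name `peg_score` and the range check are assumptions added here, not part of the published instrument.

```python
def peg_score(pain_intensity: float, enjoyment: float, general_activity: float) -> float:
    """Return the PEG total score as the unweighted mean of its 3 items.

    Each item is rated from 0 to 10; higher scores mean more pain
    severity and/or interference. The name and the range check are
    illustrative assumptions.
    """
    items = (pain_intensity, enjoyment, general_activity)
    for value in items:
        if not 0 <= value <= 10:
            raise ValueError("each PEG item must be between 0 and 10")
    return sum(items) / len(items)


# Hypothetical respondent: P = 6, E = 7, G = 5
print(peg_score(6, 7, 5))  # 6.0
```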

Retrospective Global Change.

Individuals are asked on a 7-point Likert scale whether, since their last assessment: “Overall would you say your pain is …”. In SCAMP and INCPAD, the 7 response options are worse, about the same, a little better, somewhat better, moderately better, a lot better, and completely better (pain is gone). In the other 4 trials, the 7 response options are much better, moderately better, a little better, no change, a little worse, moderately worse, much worse.

Data Analysis

Using established methods to estimate MIDs and responsiveness18,27,39, we synthesized previous psychometric work4,5,19,22,32 coupled with new analyses of the PEG to estimate its MID and determine its responsiveness. Distribution-based approaches used baseline data (T1) from the full study sample without comparing subgroups. Anchor-based approaches used data from either the follow-up assessment (T2) for the retrospective anchor or from two timepoints – baseline (T1) and follow-up (T2) – for the longitudinal application of cross-sectional anchors, and compared subgroups according to patient-reported global change. One responsiveness metric (between-group treatment effect size) compared longitudinal change according to intervention versus control group status.

MID Estimates using an Anchor-Based Approach

Absolute changes in scale scores from baseline to follow-up (T1 to T2) were determined for individuals who reported they were “a little better” at follow-up (T2) on the 7-point retrospective global anchor.
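As a minimal sketch of this anchor-based calculation, the snippet below averages the T1 minus T2 change among respondents who chose “a little better.” The record layout (`t1`, `t2`, `anchor` keys), the function name, and the example values are hypothetical. Taking the mean of the changes is an assumption, since the text above specifies only absolute score changes.

```python
from statistics import mean


def anchor_based_mid(records, anchor_label="a little better"):
    """Mean T1 - T2 change among respondents choosing `anchor_label`.

    Positive values indicate improvement (a lower pain score at follow-up).
    `records` is a list of dicts with hypothetical keys: t1, t2, anchor.
    """
    changes = [r["t1"] - r["t2"] for r in records if r["anchor"] == anchor_label]
    if not changes:
        raise ValueError("no respondents selected the anchor category")
    return mean(changes)


# Made-up example data, not trial data
example = [
    {"t1": 6.0, "t2": 5.0, "anchor": "a little better"},
    {"t1": 7.0, "t2": 5.5, "anchor": "a little better"},
    {"t1": 5.0, "t2": 4.5, "anchor": "a little better"},
    {"t1": 6.0, "t2": 6.0, "anchor": "no change"},
    {"t1": 8.0, "t2": 4.0, "anchor": "much better"},
]
print(anchor_based_mid(example))  # 1.0
```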

MID Estimates using Distribution-Based Approaches

SD Thresholds.

Proportional values of the SD represent one type of effect size (ES) and are a common distribution-based method for estimating an MID. Small, medium, and large ES are considered .2 SD, .5 SD, and .8 SD, respectively.7 Therefore, as in previous work5,39, we considered .35 SD to be a reasonable point estimate of MID as it is midway between a small and medium ES. We also report .5 SD as a more conservative upper bound of the MID.5,19,22
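The SD-threshold arithmetic is simple enough to illustrate directly. The sketch below uses the SCAMP baseline PEG SD of 2.2 reported in Table 2; the function name `sd_based_mid` is an assumption made for illustration.

```python
def sd_based_mid(baseline_sd: float, fraction: float = 0.35) -> float:
    """Distribution-based MID as a fraction of the baseline SD.

    fraction=0.35 is the point estimate used in the text;
    fraction=0.5 gives the more conservative upper bound.
    """
    if baseline_sd < 0:
        raise ValueError("standard deviation cannot be negative")
    return fraction * baseline_sd


scamp_peg_sd = 2.2  # baseline PEG SD for SCAMP, from Table 2
print(round(sd_based_mid(scamp_peg_sd), 2))       # 0.77
print(round(sd_based_mid(scamp_peg_sd, 0.5), 2))  # 1.1
```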

Standard Error of Measurement.

The SEM is another distribution-based method of estimating an MID. The SEM is calculated by multiplying the standard deviation by the square root of 1-reliability. We used Cronbach’s alpha as the reliability estimate. Prior studies have found that 1 SEM corresponds to anchor-based MIDs.5,49,51 Depending on the context, 2 SEM can also be an appropriate approach to estimate the MID.48 Thus, we considered 1 SEM to be a reasonable point estimate of MID, with 2 SEM representing an upper bound. Of note, when reliability = .75, 1 SEM = .50 SD (See Table 2 footnote for SEM formula).
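A short sketch of the SEM calculation follows, using the SCAMP PEG values from Table 2 (SD = 2.2, Cronbach’s alpha = 0.73). The function name `sem` is illustrative. Small discrepancies from the tabled values can arise for other trials because the tabled SDs are rounded.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


one_sem = sem(2.2, 0.73)      # SCAMP PEG values from Table 2
print(round(one_sem, 2))      # 1.14
print(round(2 * one_sem, 2))  # 2.29

# When reliability = .75, 1 SEM equals .50 SD
print(sem(1.0, 0.75))         # 0.5
```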

Table 2.

Means, Reliability, and Minimally Important Differences of Brief Pain Inventory and PEG Pain Scales in 6 Trials (n=1,710)

Variable SCAMP (n=427) INCPAD (n=274) SCOPE (n=250) CAMEO (n=261) SPACE (n=240) SSM (n=258)
Population Primary care Oncology Primary care Primary care Primary care Post-Stroke
Mean (SD)
• BPI Severity 5.7 (1.8) 5.2 (1.8) 5.1 (1.7) 6.8 (1.6) 5.6 (1.4) 2.9 (2.8)
• BPI Interference 5.8 (2.4) 5.7 (2.6) 5.3 (2.2) 6.4 (2.1) 5.5 (1.9) 2.7 (3.0)
• BPI Total 5.7 (2.0) 5.5 (2.1) 5.2 (1.8) 6.5 (1.8) 5.5 (1.6) 2.8 (2.8)
• PEG 6.0 (2.2) 5.9 (2.2) 5.4 (2.0) 6.5 (1.9) 5.8 (1.7) 2.9 (3.0)
Cronbach’s alpha internal reliability
• BPI Severity 0.83 0.79 0.87 0.79 0.84 0.86
• BPI Interference 0.87 0.89 0.88 0.86 0.85 0.94
• BPI Total 0.88 0.89 0.91 0.87 0.87 0.94
• PEG 0.73 0.69 0.76 0.72 0.79 0.85
Anchor-Based MIDs
A Little Better*
• BPI Severity 1.36 -- 0.73 0.87 1.17 −0.39
• BPI Interference 2.02 -- 1.34 1.41 1.67 −0.52
• BPI Total 1.82 -- 1.04 1.26 1.48 −0.55
• PEG 2.05 -- 1.25 1.18 2.02 −0.52
Distribution-Based MIDs
0.35 standard deviation (SD)
• BPI Severity 0.63 0.63 0.60 0.56 0.49 0.98
• BPI Interference 0.84 0.91 0.77 0.74 0.67 1.05
• BPI Total 0.70 0.74 0.63 0.63 0.56 0.98
• PEG 0.77 0.77 0.70 0.67 0.60 1.05
0.50 standard deviation (SD)
• BPI Severity 0.90 0.90 0.85 0.80 0.70 1.40
• BPI Interference 1.20 1.3 1.10 1.05 0.95 1.50
• BPI Total 1.00 1.1 0.90 0.90 0.80 1.40
• PEG 1.10 1.1 1.00 0.95 0.85 1.50
1 SEM
• BPI Severity 0.74 0.82 0.61 0.72 0.57 1.04
• BPI Interference 0.87 0.86 0.76 0.77 0.75 0.74
• BPI Total 0.69 0.70 0.54 0.63 0.58 0.69
• PEG 1.14 1.22 0.98 1.01 0.78 1.15
2 SEM
• BPI Severity 1.48 1.65 1.23 1.44 1.13 2.08
• BPI Interference 1.73 1.72 1.52 1.53 1.49 1.48
• BPI Total 1.39 1.39 1.08 1.26 1.15 1.38
• PEG 2.29 2.45 1.96 2.02 1.55 2.30
* Absolute change in score from baseline to follow-up in individuals who reported on the retrospective global anchor that they were “a little better” at follow-up (4 trials). For SCAMP, which used Version A of the retrospective global anchor with 5 categories (rather than 3 categories) of improvement, score changes are for those reporting “a little” or “somewhat” better. Data not available for INCPAD.

A difference of 0.2 SD is often considered a small effect, and 0.5 SD is considered a moderate effect, with 0.35 SD midway between small and moderate. Some consider 0.35 to 0.50 SD differences as one distribution-based method for estimating the minimally important difference (MID).

SEM = standard error of measurement = $SD \times \sqrt{1 - \alpha}$, where α = Cronbach’s alpha. The SEM is a second distribution-based method of estimating an MID, for which 1 to 2 SEM are often considered the lower and upper bounds.

Responsiveness using Anchor-Based Approaches

Standardized Response Mean (SRM).

The SRM is calculated as a standardized difference in scale scores between 2 time points (ie, [T1 − T2]/SD of change scores) that corresponds to participants’ response to a global anchor item (improved, unchanged, or worse). Our analyses focused on the SRM for the patient group that reported improvement compared to those that did not improve. All 6 trials used a retrospective global anchor wherein respondents at the follow-up time point (T2) reported their change in pain compared to the initial time point (T1). Additionally, 3 trials used a prospective global anchor wherein respondents provided cross-sectional global pain estimates at two time points (T1 and T2) and the difference between the T1 and T2 global pain estimates was calculated to classify individuals as improved, unchanged, or worse.4 The SRM is a type of effect size for assessing responsiveness, and SRMs of .2, .5, and .8 are considered small, medium, and large effect sizes, respectively.7 Although these SRM thresholds were originally derived for Cohen’s d effect sizes,37 differences between SRM and Cohen’s d are generally quite small; therefore, Cohen’s d thresholds can serve as a reasonable approximation of thresholds for SRM20 as well as between-group treatment effect sizes (described below).19
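The SRM definition can be made concrete with a small sketch. The function name and the example scores are hypothetical, and using the sample SD (n − 1 denominator) for the change scores is an assumption, since the text does not say which SD estimator was used.

```python
from statistics import mean, stdev


def standardized_response_mean(t1_scores, t2_scores):
    """SRM = mean(T1 - T2) / SD(T1 - T2) for paired scores.

    Positive values indicate improvement (lower pain at T2).
    Uses the sample SD (n - 1 denominator), which is an assumption.
    """
    if len(t1_scores) != len(t2_scores):
        raise ValueError("T1 and T2 must contain paired scores")
    changes = [a - b for a, b in zip(t1_scores, t2_scores)]
    return mean(changes) / stdev(changes)


# Made-up scores for 4 respondents who reported improvement
t1 = [6, 7, 5, 8]
t2 = [4, 6, 4, 5]
print(round(standardized_response_mean(t1, t2), 2))  # 1.83
```

In this made-up example the SRM exceeds .8, so it would count as a large effect under the thresholds described above.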

Area Under the Curve.

The area under the curve (AUC) is an anchor-based method of assessing responsiveness and is determined by a receiver operating characteristic (ROC) analysis.41 The discriminatory strength of the scale for determining any improvement and moderate improvement using the retrospective global anchor was estimated by the AUC. Some experts recommend an AUC ≥ .70 as a threshold for responsiveness when using a criterion standard anchor but also acknowledge that criterion standards often do not exist for patient-reported outcomes.38,43 Comparable AUCs suggest similar responsiveness of scales.
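One way to see what the AUC measures is through its equivalence to the probability that a randomly chosen improved respondent has a larger score change than a randomly chosen non-improved respondent, with ties counted as one half. The sketch below computes the AUC that way. It illustrates the concept only and is not necessarily how the trial analyses were run; the function name and data are hypothetical.

```python
def auc_from_changes(improved_changes, not_improved_changes):
    """AUC as the proportion of (improved, not improved) pairs in which
    the improved respondent shows the larger score change.

    Ties count as 0.5. This pairwise proportion equals the area under
    the ROC curve for discriminating the two anchor groups.
    """
    if not improved_changes or not not_improved_changes:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for x in improved_changes:
        for y in not_improved_changes:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(improved_changes) * len(not_improved_changes))


# Made-up T1 - T2 change scores grouped by the global anchor
improved = [2.0, 1.0, 3.0, 0.5]
not_improved = [0.0, 1.0, -0.5]
print(auc_from_changes(improved, not_improved))  # 0.875
```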

Responsiveness to Treatment

Between Group Treatment Effect Sizes.

This responsiveness metric was calculated by subtracting the control group mean change from the intervention group mean change and dividing this difference by the standard deviation of the pooled change score. This metric could be calculated for the 3 trials that had a true control (usual care) group rather than an active comparator. Treatment effect sizes of .2, .5, and .8 represent small, medium, and large intervention effects, respectively.7
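A minimal sketch of this effect size follows. The text does not spell out how the pooled change-score SD was computed, so the usual pooled-variance formula (weighting each group's variance by its degrees of freedom) is assumed here. The function name and data are hypothetical.

```python
import math
from statistics import mean, variance


def between_group_effect_size(intervention_changes, control_changes):
    """(intervention mean change - control mean change) / pooled SD.

    The pooled SD uses the standard pooled-variance formula, which is an
    assumption about how 'pooled change score SD' was computed.
    """
    n1, n2 = len(intervention_changes), len(control_changes)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 change scores")
    pooled_var = (
        (n1 - 1) * variance(intervention_changes)
        + (n2 - 1) * variance(control_changes)
    ) / (n1 + n2 - 2)
    diff = mean(intervention_changes) - mean(control_changes)
    return diff / math.sqrt(pooled_var)


# Made-up T1 - T2 change scores (positive = improvement)
intervention = [2, 3, 1, 2]
control = [1, 0, 1, 0]
print(round(between_group_effect_size(intervention, control), 2))  # 2.12
```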

Results

Sample Characteristics

Study and participant characteristics are summarized in Table 1. Table 1 also summarizes the type of intervention or control exposure each group received in the two-arm trials as well as the retrospective global change anchor used. Patients were recruited from either primary care clinics (n=4), an oncology clinic (n=1), or after being diagnosed with stroke (n=1). The 6 trials included a total of 1,710 participants, with the sample size across trials ranging from 240 to 427. The average age of participants in each trial ranged from 55.2 to 61.7. Men constituted 67.7% of the total sample, ranging from 33.9% to 92.3% across the trials. Overall, 71.2% of participants were white (range across trials, 58.3%–86.2%) and 24.4% were black (range, 7.5%–38.2%).

In all but one trial (post-stroke), participants on average endorsed moderate pain severity and interference (Table 2). Specifically in these 5 trials, PEG scores ranged from 5.4 to 6.5; BPI-Severity, from 5.1 to 6.8; BPI-Interference, from 5.3 to 6.4; and BPI-Total, from 5.2 to 6.5. In the post-stroke trial, scale scores reflected milder pain.

Minimally Important Differences

Table 2 summarizes the means and internal consistency reliability of the scales, as well as the distribution-based and anchor-based estimates of MIDs. MID estimates using .35 SD ranged from .60 to 1.05 for the PEG, .49 to .98 for BPI severity, .67 to 1.05 for BPI interference, and .56 to .98 for BPI total. MID estimates using 1 SEM ranged from .78 to 1.22 for the PEG, .57 to 1.04 for BPI severity, .74 to .87 for BPI interference, and .54 to .70 for BPI total. Data for .5 SD and 2 SEM are also summarized in the table.

MID estimates using change scores from individuals reporting being “a little better” on the retrospective global anchor revealed somewhat greater variability. For 3 trials (SCOPE, CAMEO, and SPACE), most MID estimates for the 4 scales using the global anchor were in the 1.0 to 1.75 range. Conversely, the SSM trial showed unexpectedly small negative MIDs using this global anchor, whereas the SCAMP trial showed slightly larger MIDs, possibly because “a little better” and “somewhat better” were collapsed for the global anchor in SCAMP, which assessed 5 levels of improvement (rather than the 3 levels of improvement assessed in the other 4 trials).

Responsiveness

Table 3 summarizes data regarding responsiveness using anchor-based approaches (SRM and AUC) and treatment response. The SRM for improvement with the retrospective global anchor for 5 of the 6 trials (excluding the post-stroke trial which was an extreme outlier) ranged from .77 to 1.43 for the PEG, .71 to 1.18 for BPI severity, .76 to 1.20 for BPI interference, and .83 to 1.29 for BPI total. For the 3 trials which had a prospective global anchor, SRM estimates were generally similar to estimates using a retrospective anchor in 2 trials, but higher for the post-stroke trial. For 5 of the 6 trials (excluding the post-stroke trial), AUC values of the PEG demonstrated an acceptable level of scale responsiveness (≥ .70).

Table 3.

Responsiveness of Brief Pain Inventory and PEG Pain Scales in 6 Trials

Variable* SCAMP (n=427) INCPAD (n=274) SCOPE (n=244) CAMEO (n=261) SPACE (n=240) SSM (n=258)
Population Primary care Oncology Primary care Primary care Primary care Post-Stroke
Anchor-Based Responsiveness
SRM for Improvement, retrospective*
• BPI Severity 1.00 1.13 0.71 0.72 1.18 0.17
• BPI Interference 0.86 0.91 0.94 0.76 1.20 0.17
• BPI Total 1.02 1.10 0.93 0.83 1.29 0.17
• PEG 0.99 1.08 0.86 0.77 1.43 0.18
SRM for Improvement, prospective*
• BPI Severity -- -- -- 0.67 1.35 0.72
• BPI Interference -- -- -- 0.65 1.21 0.75
• BPI Total -- -- -- 0.73 1.31 0.76
• PEG -- -- -- 0.85 1.45 0.77
AUC for any improvement
• BPI Severity .82 .78 .73 .72 .76 .55
• BPI Interference .74 .73 .68 .73 .77 .53
• BPI Total .80 .79 .73 .75 .79 .54
• PEG .76 .74 .71 .72 .79 .54
AUC for moderate improvement
• BPI Severity .83 .81 .74 .74 .78 .60
• BPI Interference .72 .73 .69 .69 .76 .59
• BPI Total .79 .80 .74 .73 .80 .60
• PEG .75 .75 .72 .73 .75 .59
Responsiveness to Treatment
Between-group Treatment Effect Size
• BPI Severity .56 .58 .38 -- -- --
• BPI Interference .59 .46 .37 -- -- --
• BPI Total .64 .58 .42 -- -- --
• PEG .58 .52 .37 -- -- --
* SRM = standardized response mean, which is the within-group change effect size between two time points calculated as (T1 mean score – T2 mean score) / SD of change score. For each trial, SRM was calculated for three global change groups (improved, unchanged, worse). All trials used a retrospective global anchor question to estimate SRM, and three trials also used a prospective global anchor. The SRM is a type of effect size and therefore, SRMs of 0.2, 0.5, and 0.8 represent small, moderate, and large effect sizes respectively. In this table, the SRM is reported for the improved group only.

AUC = area under the curve as determined by ROC analysis. The discriminatory strength of the scale for determining any improvement and moderate improvement using the retrospective global anchor was estimated by the AUC. Whereas good AUCs for diagnostic tests are often > 0.80, AUCs for scales in determining improvement are typically lower, and what is more important is determining whether scales have comparable AUCs (i.e., similar responsiveness).

The between-group treatment effect size is calculated as: (intervention group mean change – control group mean change) / pooled change score SD. Treatment effect sizes of 0.2, 0.5, and 0.8 represent small, moderate, and large intervention effects, respectively. End-of-trial treatment effect size data was available for 3 of the 4 trials that had a usual care (rather than active comparator) control group.

Data in the SCOPE column are from the 244 SCOPE participants who completed both baseline and 3-month assessments.

The between-group treatment effect size in the 3 trials where this responsiveness metric could be calculated showed a moderate treatment effect in 2 trials (SCAMP and INCPAD) and a small to moderate treatment effect in 1 trial (SCOPE). Importantly, both the AUC values and treatment effect sizes were generally comparable for the 4 scales within each trial.

Synthesis of Metric Data

Table 4 provides a synthesis of the MID and responsiveness metrics across the 6 trials. Several important findings should be noted. First, the results were similar whether using the median or the weighted mean to integrate metrics across the 6 trials. Second, most metrics are relatively similar for the PEG and BPI scales, except for a somewhat higher SEM for the PEG (an expected consequence of shorter scales usually having a lower Cronbach’s alpha). Third, most MID estimates are around 1 point (± .3) on these 0- to 10-point scales. Fourth, responsiveness as assessed by the SRM revealed large effect sizes (> .80) and acceptable AUC (≥ .70) and was generally comparable for all 4 scales. Fifth, between-group (control vs treatment) effect sizes were in the .5 SD range, which is consistent with a moderate responsiveness to treatment. Sixth, many of the differences between the PEG and BPI scales in MID values were < .20, which is considered a lower threshold for a small difference.35 The most notable exception was the SEM having somewhat higher PEG-BPI differences due to a lower Cronbach’s alpha for the PEG, which is expected for a shorter scale.

Table 4.

Synopsis of Minimally Important Difference and Responsiveness Metrics Across Six Trials

Variable PEG BPI Total BPI Interference BPI Severity PEG-BPI Difference*
Cronbach’s alpha
 Median .73 .89 .88 .84 −.11 to −.16
 Weighted Mean .75 .89 .88 .83 −.08 to −.14
Minimally Important difference (MID)
Global anchor – A Little Better
 Median 1.25 1.26 1.41 0.87 −.16 to +.38
 Weighted Mean 1.28 1.06 1.28 0.81 .00 to +.47
0.35 standard deviation (SD)
 Median 0.74 0.67 0.81 0.62 −.07 to +.12
 Weighted Mean 0.77 0.71 0.84 0.65 −.07 to +.12
0.50 standard deviation (SD)
 Median 1.05 0.95 1.15 0.88 −.10 to +.17
 Weighted Mean 1.09 1.03 1.16 0.92 −.07 to +.17
1 SEM
 Median 1.08 0.66 0.77 0.73 +.31 to +.42
 Weighted Mean 1.06 0.64 0.80 0.75 +.26 to +.42
2 SEM
 Median 2.16 1.33 1.53 1.46 +.63 to +.83
 Weighted Mean 2.12 1.29 1.59 1.50 +.53 to +.83
Responsiveness
SRM for improvement, retrospective
 Median 0.93 0.98 0.89 0.86 −.05 to +.07
 Weighted Mean 0.89 0.90 0.81 0.84 −.01 to +.08
SRM for improvement, prospective
 Median 0.85 0.76 0.75 0.72 +.09 to +.13
 Weighted Mean 1.01 0.92 0.86 0.90 +.09 to +.15
AUC for any improvement
 Median 0.73 0.77 0.73 0.75 −.04 to +.00
 Weighted Mean 0.71 0.74 0.70 0.74 −.03 to +.01
AUC for moderate improvement
 Median 0.74 0.77 0.71 0.76 −.03 to +.04
 Weighted Mean 0.72 0.75 0.70 0.76 −.04 to +.02
Between-group Treatment Effect Size
 Median 0.52 0.58 0.46 0.56 −.06 to +.06
 Weighted Mean 0.51 0.57 0.50 0.52 −.06 to +.01
* Range of differences between the PEG and the 3 BPI MID/responsiveness metrics calculated as PEG metric minus BPI metric. For example, the PEG-BPI difference in the 6 trials for the 0.50 standard deviation weighted mean is 1.09 – 1.03 = +.06 for the PEG-BPI Total difference; 1.09 – 1.16 = −.07 for the PEG-BPI Interference difference; and 1.09 – 0.92 = +.17 for the PEG-BPI Severity difference. Thus, the range is −.07 to +.17.
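The PEG-BPI difference range in the Table 4 footnote can be reproduced with a few lines. The sketch uses the 0.50 SD weighted means reported in Table 4; the function name is an illustrative assumption.

```python
def peg_bpi_difference_range(peg_value, bpi_values):
    """Return (min, max) of PEG minus each BPI metric, rounded to 2 decimals."""
    diffs = [round(peg_value - bpi, 2) for bpi in bpi_values]
    return min(diffs), max(diffs)


# 0.50 SD weighted means from Table 4
peg = 1.09
bpi = [1.03, 1.16, 0.92]  # BPI Total, BPI Interference, BPI Severity
print(peg_bpi_difference_range(peg, bpi))  # (-0.07, 0.17)
```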

Because SSM was the only trial that did not explicitly enroll patients with elevated pain and because several of its psychometric findings differed substantially from the other 5 trials, we performed a sensitivity analysis by comparing the synthesis results with and without the SSM trial. As summarized in the Supplemental Table, results are generally similar with and without the SSM trial. Specifically, differences between the PEG and the 3 BPI scales fell into a similarly narrow range whether including or excluding the SSM trial. Also, changes in the values of specific metrics were typically quite small (< .10) when excluding the SSM trial.

Discussion

The current research examined MIDs and responsiveness of the PEG and compared these estimates to the legacy BPI subscales and total score across 6 clinical trials. The aggregate data across trials support a 1-point difference in the PEG as being clinically meaningful, which is generally consistent with other literature within the context of a 0- to 10-point scale.5,12,35 Indeed, when comparing group differences, evidence-based reviews use .5-, 1-, and 2-point changes on a 0- to 10-point numeric rating scale as indicative of small, moderate, and large treatment effects, respectively.35 Results also suggest that MID estimates and responsiveness of the PEG and BPI scales are largely comparable when using multiple psychometric approaches. Study strengths include the synthesis of data across more than 1,700 patients from 6 trials, heterogeneity of clinical settings and patient samples, which enhances generalizability, scoring of all 4 pain measures on a similar 0- to 10-point scale, and triangulation of psychometric estimates using a variety of accepted methods.

Our findings comprise the most comprehensive synthesis of empiric data aimed at systematically establishing an MID for the 3-item PEG, which has gained substantial uptake since its development over a decade ago. In 2016, the United States Surgeon General initiated a Turn the Tide opioid campaign and sent a letter to more than 2.3 million health care practitioners and public health leaders across the country to seek help in addressing the prescription opioid crisis. The campaign advised using a validated pain scale before prescribing and highlighted the PEG as a specific example. Consequently, the U.S. Centers for Disease Control and Prevention included the PEG in its pocket guide.3

Comparable results across trials and measures suggest that clinicians and researchers should feel confident using the 3-item PEG to detect between- and within-group change over time using the 1-point (best estimate) to 2-point (upper bound) threshold. The current research suggests clinical trials should be powered to detect a 1- to 2-point threshold. From a power perspective, 1 point is the more conservative threshold because larger sample sizes are needed to detect a smaller population difference. In addition, a 1-point difference is in line with the common usage of a 1-point change in the numeric rating scale (NRS) as being considered a meaningful difference. Using a 2-point threshold to define “treatment responder” in the sample data increases certainty that meaningful change occurred for that individual but increases the likelihood of false negatives. Conversely, using a 1-point change over time to define a “treatment responder” reduces false negatives but increases the likelihood (compared to using a 2-point change) that patients without meaningful change are categorized as treatment responders (false positives).
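The power argument above can be made concrete with a rough sample-size sketch. It uses the standard normal-approximation formula for comparing 2 group means with equal allocation, n per group = 2 × ((z for 1 − α/2 + z for power) × SD / Δ)². The function name is hypothetical, and the SD of 2.0 points, α of .05, and power of .80 are illustrative assumptions rather than values from the trials.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison.

    delta: between-group difference to detect, in scale points
    sd:    assumed common standard deviation of the outcome
    Uses the normal approximation, so it slightly understates t-test requirements.
    """
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, two-sided test
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Illustrative assumption: SD of 2.0 points on the 0- to 10-point scale.
assumed_sd = 2.0
for threshold in (1.0, 2.0):
    print(f"{threshold:.0f}-point difference: "
          f"{n_per_group(threshold, assumed_sd)} per group")
```

Because the required sample size varies with 1/Δ², halving the target difference from 2 points to 1 point roughly quadruples the number of participants needed per group, which is why the 1-point threshold is the more conservative choice for planning.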

The magnitude of an MID may vary depending upon whether one is measuring a difference or change at the level of an individual person versus using aggregated individual-level data to compare differences between groups in research or clinical populations.15,21 To be considered meaningful, change within an individual may need to be larger than differences that are detected between groups.10 Thus, a 1-point change may be appropriate as the MID for group changes in research studies whereas a larger change (1 to 2 points) may be considered when clinically monitoring individual patients.12 It should be pointed out that the majority of MID estimates across the trials are around 1 point or less (Table 4), supporting this as the best MID point estimate. Finally, we note there is debate on how best to use MIDs in relation to categorizing treatment responders.15
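To show how the choice of threshold changes who is counted as a treatment responder at the individual level, the sketch below flags patients whose score improved by at least a given number of points. The function name and the patient scores are hypothetical; improvement is assumed to mean a decrease in score, since higher scores indicate worse pain.

```python
def classify_responders(baseline, follow_up, threshold):
    """Flag each patient whose score decreased by at least `threshold` points.

    baseline, follow_up: equal-length sequences of 0- to 10-point scores
    threshold:           minimum improvement (baseline minus follow-up)
    Returns a list of booleans, one per patient.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must have the same length")
    return [(b - f) >= threshold for b, f in zip(baseline, follow_up)]


# Hypothetical scores for 4 patients (not trial data).
baseline_scores = [7.0, 6.0, 8.0, 5.0]
follow_up_scores = [6.0, 3.5, 7.5, 5.5]

for threshold in (1.0, 2.0):
    flags = classify_responders(baseline_scores, follow_up_scores, threshold)
    print(f"{threshold:.0f}-point threshold: {sum(flags)} of {len(flags)} responders")
```

In this illustration, the patient who improved by exactly 1 point is a responder only under the 1-point threshold, which is the trade-off between false positives and false negatives that the choice of threshold entails.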

Comparable MIDs and responsiveness of the PEG and BPI scales across trials also allow researchers and clinicians to choose the measure based on the needs and implications of the clinical or research question being asked. The PEG is less burdensome and provides a broad snapshot of symptoms, while the BPI subscales focus on a specific aspect of the pain experience but take more time to complete. Of note, both the 3-item PEG and 11-item BPI total score integrate pain severity and interference into a single composite score rather than 2 separate domain scores. A single score may have advantages when choosing a single primary outcome in research or when monitoring and adjusting pain treatment in clinical practice. The current research thus expands the options within clinical research settings. Clinicians now have a benchmark for using the PEG in settings where pain severity and interference are of interest. Importantly, because the pain numeric rating scale is part of the PEG, both can easily be used with a patient or within the same study using pain NRS-established benchmarks if the NRS is the principal outcome of interest.12

Results in the single trial of post-stroke patients, whose pain levels were mild, differed from the more consistent results across the other 5 trials. Thus, we conducted a sensitivity analysis by synthesizing results with and without the stroke trial. It is reassuring that the results were generally similar in that the differences between the synthesis of 6 versus 5 trials were relatively small. Nonetheless, further research in populations with different levels of pain, as well as different health conditions, is warranted.

Strengths

There are several noted strengths of the current research. Data were compiled from 6 separate clinical trials in a variety of settings. These heterogeneous samples increase the generalizability of the results. We also used a variety of methods to establish MIDs and responsiveness, increasing confidence in the results. Finally, all but the SSM post-stroke trial required at least moderate pain for inclusion, suggesting the 1-point best estimate for MID and the 2-point upper bound are appropriate for patients who meet common inclusion criteria for chronic pain trials, as well as patients being treated for pain in practice.

Limitations and Future Directions

Results are presented with limitations. Half of the trials were limited to a retrospective anchor in calculating the SRM and between-group comparisons. Our use of only 6 trials with heterogeneous samples limits the results’ generalizability. Most patients were recruited from primary care. Therefore, different MIDs may be found in samples of patients with more severe chronic pain or patients who have a more extensive history of chronic pain treatments. For example, a 1-point change may be more clinically meaningful for patients established in a pain clinic with consistent severe pain and who are considering a complicated neck surgery, compared to someone with moderate knee pain 60% of the time seeking treatment through primary care. Finally, the noted differences in the post-stroke trial suggest that results may be less generalizable to post-stroke patients specifically or, possibly, to non-pain samples more broadly.

Current results suggest potentially productive lines of future research. More research is needed to determine whether the current estimates are consistent among other populations of individuals with pain, including patients who receive their care in specialty pain clinics. Future research should examine how an MID on the PEG corresponds to other emotional well-being constructs relevant to individuals with chronic pain, such as depressive affect,12 self-concept,44 or meaning and purpose.8,9 In addition, the prevalence of co-occurring mental health conditions among individuals with chronic pain is high.16 Ensuring that the 1- to 2-point benchmark remains consistent among individuals with co-occurring chronic pain and, for example, depression or posttraumatic stress disorder will provide valuable insights into how different patient populations consider meaningful change.

Conclusions

The accurate and efficient assessment of chronic pain across a variety of settings has become an important component of the clinical encounter. The PEG was developed to assess both pain severity and interference using only 3 items. Examining data from 6 randomized clinical trials, we used several distribution- and anchor-based methods to establish 1 to 2 points as an MID for the PEG. Moreover, the PEG and BPI scales demonstrated comparable responsiveness. Results allow clinicians and researchers to assess whether patients are making meaningful improvements over time, categorize treatment responders, and power randomized clinical trials.

Perspective:

This article synthesizes data from 6 clinical trials to establish the minimally important difference (MID) and responsiveness of the 3-item PEG pain scale. The PEG demonstrated good responsiveness, and 1 to 2 points proved to be reasonable estimates for the lower and upper bounds of the MID.

Highlights:

  • Six RCTs were used to establish 1 to 2 points on the PEG as the lower and upper bounds of an MID.

  • MID estimates and responsiveness of the PEG and BPI scales were largely comparable.

  • Results provide guidance in determining meaningful improvement for patient care and future trials.

Disclosures

The studies from which data are derived were funded by the VA Office of Research and Development (NCT01507688, NCT01583985, NCT01236521, NCT00926588), the National Cancer Institute (NCT00313573), and the National Institute of Mental Health (NCT00118430).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

The authors have no conflicts of interest to disclose.

References
