Author manuscript; available in PMC: 2015 Dec 1.
Published in final edited form as: Arthritis Care Res (Hoboken). 2014 Dec;66(12):1783–1789. doi: 10.1002/acr.22392

CLINICALLY IMPORTANT CHANGES IN SHORT FORM-36 SCALES FOR USE IN RHEUMATOID ARTHRITIS CLINICAL TRIALS: THE IMPACT OF LOW RESPONSIVENESS

Michael M Ward 1, Lori C Guthrie 2, Maria I Alba 3
PMCID: PMC4245332  NIHMSID: NIHMS625400  PMID: 24980417

Abstract

Objective

Despite wide use of the Short-Form 36 (SF-36) in clinical trials of rheumatoid arthritis (RA), estimates of minimal clinically important improvement (MCII) for its scales are not well-established. We estimated MCIIs for SF-36 scales in patients with active RA.

Methods

In this prospective longitudinal study, we studied 243 patients who had active RA, and who completed the SF-36 before and after treatment escalation. We first assessed responsiveness with standardized response means (SRM). For scales with adequate responsiveness (SRM ≥ 0.50), we used patient judgments of improvement in arthritis status as anchors for estimating MCIIs. We used receiver operating characteristic curve analysis to identify the MCIIs as the change associated with a specificity of 0.80 for improvement.

Results

Patients had substantial improvement in RA activity with treatment. However, among SF-36 scales, only the physical functioning and bodily pain scales and the physical component summary had adequate responsiveness. Using 0.80 specificity for improvement as the criterion, the MCIIs were 7.1 for the physical functioning scale, 4.9 for the bodily pain scale, and 7.2 for the physical component summary.

Conclusions

Low responsiveness precluded estimation of valid MCIIs for many SF-36 scales in patients with RA, particularly the scales assessing mental health. Although the SF-36 has been included in many clinical trials to broaden the assessment of health status, low responsiveness limits the interpretation of changes in its mental health-related scales.


Measures of health status and health-related quality of life have been recognized as essential components of clinical evaluations because they provide patients’ perspectives on how they are affected by a disease (1,2). Consequently, patient-reported outcomes have been included as endpoints in many clinical trials, allowing treatment effects to be assessed more comprehensively. In rheumatoid arthritis (RA), measures such as pain scales and the Health Assessment Questionnaire Disability Index have been used for decades, and clinical trials have also increasingly included generic health status measures, particularly the Short-Form 36 (SF-36) (3–9). The SF-36 includes eight scales that assess pain, physical functioning, general health, fatigue/vitality, mental health, social functioning, and role limitations due to either physical or emotional problems. Two summary scores, the physical component summary (PCS) and mental component summary (MCS), can also be computed. The SF-36 therefore assesses a broader range of health concerns than RA-specific measures, and because it is a generic measure, comparisons are possible across diseases. The SF-36 is the most commonly used patient-reported measure worldwide, and has had extensive testing of reliability and validity (10–12).

Interpretation of SF-36 scores has been aided by development of population-based normative values, and by rescaling methods that allow patients’ scores to be directly compared to general population means. However, interpretation of changes in SF-36 scores is not intuitive, and requires knowledge of the degree of change that corresponds to improvement (or deterioration) in health that is recognized as important or meaningful to patients (13). Scores may improve significantly with treatment, but the magnitude of the change may not be clinically important. Determination of thresholds for clinically important improvement, and reporting the proportion of patients who had changes exceeding this threshold, were highlighted by the U.S. Food and Drug Administration as important considerations in using patient-reported outcomes in clinical trials (14).

Despite its wide use, estimates of minimal clinically important improvement (MCII) for SF-36 scales are not well-established (15). Thresholds of 3, 5, or 10 points on the individual scales, and 2.5, 3, or 5 points on the physical and mental component summaries, have been cited as MCIIs in studies of RA (5–8,16). A uniform threshold has commonly been applied to all scales, even though the interpretation of changes in individual scales may differ. The single study that served as the source of these estimates in RA used the original version of the SF-36, which has a 1-week recall period rather than the 4-week recall version more commonly used in RA clinical trials, and examined normalized scores for the PCS and MCS but not for the individual scales (17). In that study, MCIIs varied from 2.4 to 16.4 for individual scale scores, and were 4.4 and 3.1 for the PCS and MCS, respectively. The investigators noted that a single best estimate of MCII could not be established, and that estimates for the scales related to mental health had limited validity because they did not vary much with RA severity (17). They did not use these estimates in a subsequent study (18). Complicating matters further, it is difficult to generalize estimates of MCIIs across diseases and treatments, because different diseases impact different aspects of health, and different aspects of health are responsive to different treatments (19).

In this study, we sought to estimate MCIIs for the SF-36 for patients with active RA who are receiving treatment with disease-modifying medications, biologics, or prednisone, so that these may be applied in clinical trials that use the SF-36. We examined population-normalized scores for the SF-36 version 2 with 4-week recall, as this is the version most commonly used in clinical trials currently.

METHODS

Participants

We enrolled patients with RA who were receiving care in our clinics in a prospective longitudinal study to determine clinically important changes in RA activity measures (20). Inclusion criteria were age 18 or older, a clinical diagnosis of RA and fulfillment of the 1987 American College of Rheumatology classification criteria (21), active RA based on physician judgment and the presence of at least six tender joints, and escalation of anti-rheumatic treatment to treat active RA at the baseline visit. This escalation could be an increased dose of the patient’s current disease-modifying medication, initiation of a new disease-modifying medication or biologic, or initiation of prednisone. The specific treatment was chosen by the patient’s rheumatologist and was not dictated by this study. The study was approved by the institutional review board, and all patients provided written informed consent.

Study procedures

Participants had clinical assessments at a baseline visit and a follow-up visit either 1 month later (for those treated with prednisone) or 4 months later (for all others). Follow-up visits were earlier for the prednisone-treated patients because quicker responses were anticipated. At both assessments, we measured tender and swollen joint counts, physician global assessment by visual analog scale (0 – 100), erythrocyte sedimentation rate (ESR), and C-reactive protein (CRP) level. Patients completed a global assessment by visual analog scale (possible range 0 – 100 with anchors of very well and very poor), pain scale by visual analog scale (possible range 0 – 100 with anchors of no pain and severe pain), Health Assessment Questionnaire Disability Index (HAQ; possible range 0 – 3, with higher scores indicating more functional limitations) (3), the Center for Epidemiologic Studies-Depression scale (CES-D; 0 – 60, with higher scores indicating more depressive symptoms) (22), and the SF-36 (23). SF-36 scores can range from 0 to 100, with higher scores indicating better health. We normalized scores to U.S. population values with the population mean set at 50 and standard deviation of 10. We computed the Disease Activity Score-28 based on the ESR (DAS28-ESR) and the Simplified Disease Activity Index (SDAI) as measures of clinical RA activity (24,25).
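The normalization to U.S. population values can be written as a linear transformation. As a sketch, let x be a patient's 0 – 100 score on a scale, and let μ and σ be the U.S. general-population mean and standard deviation for that scale. These symbols are introduced here for illustration only; the scale-specific values come from the scoring manual (23) and are not reproduced here.

```latex
\[
  T = 50 + 10 \cdot \frac{x - \mu}{\sigma}
\]
```

Under this transformation, a normalized score of 40 lies one population standard deviation below the U.S. mean, whichever scale is being scored.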

At the follow-up visit, participants answered anchor questions on whether they judged that their arthritis overall had improved, worsened, or was unchanged since the baseline visit, and on the importance of any change on a 7-point scale (from “hardly important at all” to “extremely important”) (26). Similar questions were asked about changes in pain, ability to do things, joint swelling, stiffness, fatigue, and depression. Responses on the anchor questions were highly associated with measured changes in RA activity, indicating good construct validity of the anchor questions (20).

Statistical analysis

To provide valid estimates of MCIIs, measures must be sensitive to change. If a measure is not sensitive to change, patients may experience important improvements in health, but these improvements would correspond to only small changes in the measure. Small changes in the measure could therefore mistakenly be labeled as important. We first examined changes in SF-36 scores to establish their responsiveness. We measured responsiveness using standardized response means (SRM), computed as the mean change divided by the standard deviation of the change of each scale. SRMs ≥ 0.50 were considered to reflect acceptable responsiveness (27,28).
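The SRM calculation can be sketched as follows. This is a minimal illustration, not the SAS code used in the study; the function name `standardized_response_mean` and the five-patient data set are hypothetical.

```python
import numpy as np

def standardized_response_mean(baseline, follow_up):
    """SRM = mean of the change scores divided by the SD of the change scores."""
    change = np.asarray(follow_up, dtype=float) - np.asarray(baseline, dtype=float)
    # ddof=1 gives the sample standard deviation of the individual changes
    return change.mean() / change.std(ddof=1)

# Hypothetical normalized scores for five patients (illustration only)
baseline = [30.0, 35.0, 28.0, 40.0, 33.0]
follow_up = [36.0, 37.0, 35.0, 41.0, 42.0]

srm = standardized_response_mean(baseline, follow_up)
responsive = srm >= 0.50  # responsiveness threshold stated in the text
```

Because the denominator is the standard deviation of the change rather than of the baseline scores, a scale can show a statistically significant mean improvement and still have a low SRM when individual changes vary widely.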

We used an anchor-based approach rather than a distribution-based approach to estimate MCII, because anchor-based methods aim to assess clinical significance and can incorporate patients’ perspectives, while distribution-based approaches only define thresholds based on measurement reliability, which are not necessarily related to clinical significance (13). For scales with acceptable responsiveness, we used receiver operating characteristic (ROC) curves to derive MCIIs (29,30). An ROC curve is a plot of sensitivity versus (1 – specificity) of the association between measured changes in a scale and the presence or absence of an outcome of interest. In this study, the outcome of interest was improvement (versus no improvement) as judged by the patient, based on their response to the anchor questions. We used domain-specific anchor questions when possible. For example, the anchor question on pain was used to determine the MCII for the bodily pain scale, and the anchor question on ability to do things was used to determine the MCII for the physical functioning scale. For the PCS and MCS, we used the global arthritis and depression anchor questions, respectively. Area under the ROC curve provides an estimate of the discriminative ability of changes in the SF-36 scales, as well as of the consistency among patients in their ratings of improvement (31). Areas of 1.0 indicate perfect separation between those reporting improvement versus no improvement, while areas of 0.5 indicate discrimination no better than chance. We considered judgments of improvement to be consistent among patients if the lower 95% confidence limit for the ROC area was greater than 0.5. We computed ROC curves for 2000 bootstrapped samples to provide estimates with predictive validity, using nonparametric resampling and the bias corrected and accelerated method to compute confidence intervals (32).
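The ROC area and its bootstrap confidence interval can be sketched as below. This is an illustration under stated assumptions, not the study's implementation: the data are simulated, the function names are hypothetical, and a plain percentile bootstrap is substituted for the bias corrected and accelerated method to keep the example short.

```python
import numpy as np

def roc_area(change_improved, change_not_improved):
    """ROC area via the Mann-Whitney statistic: the probability that a randomly
    chosen improved patient has a larger change score than a randomly chosen
    non-improved patient, with ties counted as one half."""
    a = np.asarray(change_improved, dtype=float)[:, None]
    b = np.asarray(change_not_improved, dtype=float)[None, :]
    return float(np.mean((a > b) + 0.5 * (a == b)))

def bootstrap_ci(change_improved, change_not_improved, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for the ROC area (a simpler stand-in for BCa)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(change_improved, dtype=float)
    b = np.asarray(change_not_improved, dtype=float)
    areas = np.empty(n_boot)
    for i in range(n_boot):
        # resample patients with replacement within each anchor group
        areas[i] = roc_area(rng.choice(a, a.size), rng.choice(b, b.size))
    lower, upper = np.percentile(areas, [2.5, 97.5])
    return float(lower), float(upper)

# Simulated change scores (illustration only, not study data)
rng = np.random.default_rng(1)
improved = rng.normal(8.0, 9.0, 160)
not_improved = rng.normal(1.0, 9.0, 80)

area = roc_area(improved, not_improved)
lower, upper = bootstrap_ci(improved, not_improved)
consistent = lower > 0.5  # consistency criterion stated in the text
```

Resampling within each anchor group keeps the numbers of improved and non-improved patients fixed across bootstrap samples; resampling the whole cohort at once is an equally defensible choice.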

We also used the ROC curves to determine MCIIs. As the primary criterion for estimation of the MCII, we used the change in the SF-36 scale that had a specificity for improvement of 0.80, following previous studies (20,33,34). This threshold indicates the degree of change in the SF-36 scale that 80% or more of patients would indicate as being important. We also used the Youden index and the minimal distance to the upper left corner [0, 1] of the ROC plot as alternative criteria for determining MCIIs (35). The Youden index is the point on the ROC curve associated with the maximal difference between true positives and false positives. The minimal distance to [0, 1] is the point on the ROC curve with both maximum sensitivity and specificity for the outcome. These latter methods do not ensure a minimum threshold for specificity, and the Youden index may take more than one value, making them less suitable for determining MCIIs than the 0.80 specificity criterion.
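The three criteria can be compared in a short sketch. The assumptions are that a patient is classified as improved when the change score is at or above a cutoff, and that the 0.80 specificity criterion selects the smallest cutoff reaching that specificity. The function names and the 20-patient data set are hypothetical.

```python
import numpy as np

def roc_points(change, improved):
    """Candidate cutoffs with their sensitivity and specificity, where a patient
    is classified as improved when change >= cutoff."""
    change = np.asarray(change, dtype=float)
    improved = np.asarray(improved, dtype=bool)
    cutoffs = np.unique(change)  # sorted unique observed changes
    sens = np.array([(change[improved] >= c).mean() for c in cutoffs])
    spec = np.array([(change[~improved] < c).mean() for c in cutoffs])
    return cutoffs, sens, spec

def mcii_estimates(change, improved, min_specificity=0.80):
    cutoffs, sens, spec = roc_points(change, improved)
    reaches_target = spec >= min_specificity
    if not reaches_target.any():
        raise ValueError("no cutoff reaches the requested specificity")
    return {
        # specificity is non-decreasing in the cutoff, so the first hit is the smallest
        "specificity_criterion": float(cutoffs[np.argmax(reaches_target)]),
        # Youden index: maximize sensitivity + specificity - 1 (first maximum if tied)
        "youden": float(cutoffs[np.argmax(sens + spec - 1.0)]),
        # point closest to the upper left corner [0, 1] of the ROC plot
        "closest_to_0_1": float(cutoffs[np.argmin(np.hypot(1.0 - sens, 1.0 - spec))]),
    }

# Hypothetical change scores for 10 non-improved and 10 improved patients
not_improved = [-3, -1, 0, 1, 2, 2, 3, 4, 6, 9]
improved = [1, 3, 5, 5, 6, 7, 8, 10, 11, 12]
change = not_improved + improved
labels = [False] * 10 + [True] * 10

estimates = mcii_estimates(change, labels)
```

In this toy data set all three criteria select the same cutoff. With real data they can diverge, and `np.argmax` silently returns the first of several tied maxima, which is one practical form of the non-uniqueness of the Youden index noted above.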

We estimated that a sample of 250 patients would provide sufficient statistical power (beta = 0.20) that an ROC curve area as small as 0.6 would be significantly different from 0.5 with type 1 error (two-tailed) of 0.05, even if 225 patients reported improvement and 25 did not report improvement (i.e. a confidence interval of 0.53, 0.67). Statistical power would be greater if the proportions reporting improvement or no improvement were less skewed.

We used SAS programs version 9.3 (SAS Institute, Cary, NC) for statistical analysis.

RESULTS

Patient characteristics

Of 250 patients, we included 243 patients who had no missing data for the SF-36 on the baseline and follow-up visits. Patients were largely middle-aged women with established seropositive RA (Table 1). They had active RA, with a mean DAS28 of 6.14 and a mean SDAI of 38.4. At the baseline visit, 88 patients (36.2%) began a new disease-modifying drug or biologic, 54 patients (22.2%) were treated with prednisone, and 101 patients (41.6%) had an increase in dose of their current medication.

Table 1.

Patient characteristics at study entry (N = 243).*

Age, years 50.6 ± 13.5
Women 191 (79%)
White, non-Hispanic 99 (40.7%)
Hispanic 70 (28.8%)
Black 56 (23.0%)
Asian 17 (7.0%)
Multi-ethnic 1 (0.4%)
Formal education, years 12.7 ± 3.9
Duration of RA, years 6.3 (2.1, 14.8)
Seropositive 183 (75.2%)
Erosive 139 (64.6%)
Swollen joint count (0 – 66) 16.1 ± 9.0
Tender joint count (0 – 68) 25.2 ± 15.0
Physician global assessment (0 – 100) 48.2 ± 17.5
Patient global assessment (0 – 100) 55.0 ± 25.0
Pain (0 – 100) 60.4 ± 25.1
Health Assessment Questionnaire (0 – 3) 1.5 (0.75, 2.0)
Center for Epidemiologic Studies-Depression scale (0 – 60) 18.4 ± 11.7
Erythrocyte sedimentation rate (mm/hr) 40 ± 28
Disease Activity Score-28 (0 – 9.4) 6.14 ± 1.2
Simplified Disease Activity Index (0 – 86) 38.4 ± 14.9
* Plus-minus values are mean ± standard deviation. N (N, N) values are median (25th, 75th percentile). All other values are number (percentage).

The percentage with erosive disease is based on the 215 patients with radiographs.

Clinical Responses

RA activity improved substantially at the follow-up visit, with the mean DAS28 decreasing to 4.83 and mean SDAI to 23.8. Reflecting this improvement, the SRM of both the DAS28 and SDAI was 0.97. Other clinical measures were also responsive, with SRMs for the tender joint count, swollen joint count, pain scale, patient global assessment, and HAQ of 0.74, 0.70, 0.70, 0.67, and 0.66, respectively. However, there was only a small improvement in depressive symptoms as measured by the CES-D, with mean values decreasing from 18.4 to 15.0 (SRM 0.34).

At the follow-up visit, 164 patients (67%) judged their overall arthritis status had improved, while 160 patients (66%) judged their pain improved, and 149 patients (61%) judged their functional ability improved.

Changes in SF-36 scales

At baseline, scores for all SF-36 scales were substantially lower than the population mean of 50, although the mental health scale was less affected than others (Table 2). Scores on all scales improved significantly at the follow-up visit. However, only the physical functioning and bodily pain scales and PCS had SRMs ≥ 0.50. The general health, social functioning, role emotional, and mental health scales and MCS were particularly poorly responsive, with SRMs of 0.33 or less.

Table 2.

Changes in Short-Form 36 scales and summary measures during the study.*

| Measure | Baseline | Follow-up | Mean change | P for change | SRM (95% CI) |
|---|---|---|---|---|---|
| Physical functioning | 31.9 ± 11.3 | 36.9 ± 11.3 | 5.1 ± 9.2 | <0.0001 | 0.55 (0.46, 0.65) |
| Role physical | 34.8 ± 9.7 | 39.6 ± 11.6 | 4.8 ± 11.8 | <0.0001 | 0.40 (0.33, 0.49) |
| Bodily pain | 34.6 ± 9.0 | 41.4 ± 10.3 | 6.7 ± 10.4 | <0.0001 | 0.65 (0.57, 0.73) |
| General health | 38.6 ± 7.5 | 41.1 ± 5.3 | 2.5 ± 7.6 | <0.0001 | 0.33 (0.24, 0.41) |
| Vitality | 41.6 ± 10.6 | 46.5 ± 11.6 | 4.8 ± 10.3 | <0.0001 | 0.47 (0.39, 0.55) |
| Social functioning | 39.3 ± 12.7 | 42.7 ± 11.3 | 3.4 ± 12.1 | <0.0001 | 0.28 (0.20, 0.37) |
| Role emotional | 41.0 ± 14.2 | 43.8 ± 13.6 | 2.8 ± 14.1 | 0.003 | 0.20 (0.11, 0.28) |
| Mental health | 45.1 ± 11.6 | 47.5 ± 11.2 | 2.4 ± 10.7 | 0.0005 | 0.22 (0.15, 0.31) |
| Physical component summary (PCS) | 31.9 ± 9.3 | 37.5 ± 9.8 | 5.6 ± 9.0 | <0.0001 | 0.63 (0.55, 0.71) |
| Mental component summary (MCS) | 46.6 ± 12.3 | 48.5 ± 11.1 | 2.0 ± 11.2 | 0.008 | 0.17 (0.10, 0.25) |
* SRM = standardized response mean; CI = confidence interval. Plus-minus values are mean ± standard deviation.
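As a consistency check, the SRMs can be recomputed from the mean change and its standard deviation as printed in Table 2. The sketch below copies only those two columns from the table; the variable names are illustrative.

```python
# (mean change, SD of change) for each measure, copied from Table 2
table2 = {
    "Physical functioning": (5.1, 9.2),
    "Role physical": (4.8, 11.8),
    "Bodily pain": (6.7, 10.4),
    "General health": (2.5, 7.6),
    "Vitality": (4.8, 10.3),
    "Social functioning": (3.4, 12.1),
    "Role emotional": (2.8, 14.1),
    "Mental health": (2.4, 10.7),
    "Physical component summary (PCS)": (5.6, 9.0),
    "Mental component summary (MCS)": (2.0, 11.2),
}

# SRM = mean change / SD of change
srm = {name: mean / sd for name, (mean, sd) in table2.items()}

# Measures meeting the SRM >= 0.50 responsiveness threshold
responsive = sorted(name for name, value in srm.items() if value >= 0.50)
```

Values recomputed from the rounded table entries can differ from the published SRMs in the second decimal place, but the same three measures meet the 0.50 threshold.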

MCII estimates

Because of the poor responsiveness of many scales, we could estimate valid MCIIs only for the physical functioning and bodily pain scales and the PCS. ROC curve areas for these 3 measures ranged from 0.72 to 0.79, with lower 95% confidence limits all greater than 0.65, indicating good discrimination (Figure 1).

Figure 1.


Receiver operating characteristic curves for the physical functioning scale (top), bodily pain scale (middle), and physical component summary (bottom).

Using the 0.80 specificity criterion, the MCII was 7.1 for the physical functioning scale and 4.9 for the bodily pain scale (Table 3). A somewhat higher estimate was obtained for the physical functioning scale using the Youden index or the point closest to [0, 1] as the criterion, but this estimate also had lower sensitivity and higher specificity. Estimates were similar for the bodily pain scale using either the 0.80 specificity criterion, Youden index, or the point closest to [0, 1]. For the PCS, the MCII was 7.2 using the 0.80 specificity criterion. Thresholds were somewhat lower using the alternative criteria, with lower specificity but appreciably higher sensitivity.

Table 3.

Minimal clinically important improvement estimates for the physical functioning scale, bodily pain scale, and physical component summary of the Short-Form 36.

| Measure | Statistic | 80% specificity criterion | Youden index | Point closest to [0, 1] |
|---|---|---|---|---|
| Physical functioning scale | MCII | 7.1 (4.2, 8.9) | 8.4 (8.0, 12.6) | 8.4 (8.1, 14.7) |
| | Sensitivity | 0.54 (0.43, 0.64) | 0.49 (0.40, 0.57) | 0.49 (0.40, 0.57) |
| | Specificity | 0.80 | 0.86 (0.77, 0.92) | 0.86 (0.77, 0.92) |
| | ROC curve area | 0.72 (0.66, 0.78) | 0.72 (0.66, 0.78) | 0.72 (0.66, 0.78) |
| Bodily pain scale | MCII | 4.9 (3.9, 12.9) | 4.7 (3.8, 12.7) | 4.7 (3.4, 12.4) |
| | Sensitivity | 0.64 (0.50, 0.75) | 0.66 (0.59, 0.74) | 0.66 (0.59, 0.74) |
| | Specificity | 0.80 | 0.78 (0.67, 0.87) | 0.78 (0.67, 0.87) |
| | ROC curve area | 0.79 (0.73, 0.85) | 0.79 (0.73, 0.85) | 0.79 (0.73, 0.85) |
| Physical component summary | MCII | 7.2 (4.6, 8.0) | 5.1 (2.2, 10.7) | 5.1 (2.2, 10.7) |
| | Sensitivity | 0.46 (0.35, 0.67) | 0.64 (0.56, 0.71) | 0.64 (0.56, 0.71) |
| | Specificity | 0.80 | 0.74 (0.63, 0.84) | 0.74 (0.63, 0.84) |
| | ROC curve area | 0.73 (0.67, 0.79) | 0.73 (0.67, 0.79) | 0.73 (0.67, 0.79) |
ROC = receiver operating characteristic; MCII = minimal clinically important improvement. Values in parentheses are 95% confidence limits.
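The trade-off between the criteria can be made concrete by computing the Youden index, J = sensitivity + specificity − 1, at the thresholds reported in Table 3. The sketch below uses only the point estimates from the table; the dictionary layout and function name are illustrative.

```python
# (sensitivity, specificity) pairs copied from Table 3
table3 = {
    "Physical functioning": {"spec80": (0.54, 0.80), "youden": (0.49, 0.86)},
    "Bodily pain": {"spec80": (0.64, 0.80), "youden": (0.66, 0.78)},
    "Physical component summary": {"spec80": (0.46, 0.80), "youden": (0.64, 0.74)},
}

def youden_j(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0

# J at each reported threshold, rounded to the precision of the table
j_values = {
    measure: {criterion: round(youden_j(*pair), 2) for criterion, pair in rows.items()}
    for measure, rows in table3.items()
}
```

For the bodily pain scale the two thresholds give the same J to two decimals. For the PCS, fixing specificity at 0.80 lowers J from 0.38 to 0.26, which reflects the lower sensitivity of the 7.2 threshold.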

DISCUSSION

The main findings of our study were that low responsiveness precluded estimation of valid MCIIs for many SF-36 scales and for the MCS, that MCIIs for the physical functioning and bodily pain scales were 7.1 and 4.9, respectively, and that the MCII of the PCS was larger than previously estimated.

Low responsiveness indicates that the health status measure did not change much despite clinical improvement. In some cases, this may reflect the construction of the measure. Measures that can take on only a few values tend to be poorly responsive, because patients may need substantial improvement to register a change. Finely graded measures, such as visual analog scales, tend to be more responsive because they can capture incremental changes. Features of the sample can also affect responsiveness. If patients do not manifest abnormalities in a domain, they will have little opportunity to demonstrate improvement, and these ceiling or floor effects will result in low responsiveness. Additionally, responsiveness may be low if the treatment given is ineffective. Responsiveness is therefore situational, and dependent on not only the measure but also the patient and treatment.

The MCS and the social functioning, role emotional, mental health, and general health scales were poorly responsive, with SRMs of 0.33 or less, and the vitality and role physical scales were marginally responsive. It is unlikely that the response options of these scales contributed to low responsiveness, because these scales had as many response options as the pain and physical functioning scales, and both the MCS and PCS took as many values as there were patients. Patient characteristics also were unlikely to be the sole reason for low responsiveness, as patients’ baseline scores on the social functioning, role emotional, vitality, and general health scales were all substantially lower than population norms. Mean scores on the mental health scale and MCS were less abnormal, and may have contributed to their lower responsiveness.

More likely, differences in responsiveness between the mental health-oriented scales and the physical health-oriented scales were due to the nature of the treatments used. While anti-rheumatic treatments improve the core physical aspects of RA, and induced large changes in the pain and physical functioning scales and PCS, they do not directly target mood or social functioning. Effects on these aspects of health are indirect. The mental health-oriented scales improved with treatment, suggesting that they provide useful information on treatment effects in RA. However, these changes were small relative to the variation in these measures. The low responsiveness of the CES-D supports the conclusion that the mismatch between the health domain and the treatment, rather than the construction of the SF-36 scales, was primarily responsible for the low responsiveness of the mental health-oriented SF-36 scales in this study.

Although the SF-36 has generally been considered a responsive measure in RA, examination of previous studies indicates that this conclusion has often been based on overgeneralization of findings for the bodily pain, physical functioning, and (in some studies) role physical scales and the PCS. The MCS and mental health-oriented scales had uniformly low responsiveness in studies of anti-rheumatic treatment in RA (4,36–41). The vitality scale often had marginal responsiveness, as in our study. This observation suggests that MCIIs for the mental health-oriented scales need to be examined in studies that use a treatment known to improve psychological health in patients with RA, which would induce larger changes in these measures and enhance their responsiveness. Confronting similar issues in studies of the treatment of hepatitis C, hepatologists proposed an MCII estimate only for the vitality scale, because this was the only scale that was responsive to treatment of this condition (42). Use of poorly responsive measures can lead to the erroneous designation of trivial improvements or even worsenings as the MCII (43–45).

MCIIs for the bodily pain and physical functioning scales were for normalized values, which have a compressed scale compared to non-normalized values. Therefore, it is difficult to compare our results to the estimates of 11.0 and 7.7 proposed for non-normalized values of these scales (17). Use of the 0.80 specificity criterion is appealing because it establishes thresholds with uniform and high specificity across measures. Modestly higher estimates were obtained for the physical functioning scale using the Youden index or smallest distance to [0, 1] as the criterion. For the PCS, the MCII of 7.2 was larger than the estimate of 4.4 proposed in the original study (17), which was closer to our estimate of 5.1 using the Youden index and smallest distance to [0, 1]. In this study, the Youden Index and the smallest distance to [0, 1] identified the same threshold as the MCII for the physical functioning scale, pain scale, and the PCS, but this is not always the case. For the pain scale, the MCII was also similar using the 0.80 specificity criterion. Choice among these ROC-based approaches should be based on the priority given to maximizing specificity or sensitivity or both properties, or maximizing true positives. The 0.80 specificity criterion provides greater assurance that the estimates represent changes recognized as important by most patients, and so was the method we preferred. We should emphasize that these MCIIs are derived from responses of patients with active RA, and should be applied to patients with comparable levels of RA activity. The target population of this study was patients eligible for clinical trials, so that the MCII estimates could be used to interpret changes in the SF-36 in RA clinical trials.

The strengths of this study include the large sample with active RA, documented improvement in RA activity with treatment, and testing of responsiveness. Approximately one-third of patients did not report improvement, and this variation is needed to estimate MCIIs. The study is limited in that we only examined improvement and not worsening, the thresholds for which may differ. Because treatment is directed at improvement, we considered this aim to be more relevant. We used only a single anchor for each scale. Use of multiple types of anchors would have enhanced confidence in the MCII estimates. However, we used domain-specific anchors to increase specificity. Although we assessed responses at different times in the prednisone-treated group and the other treatment groups, MCIIs are largely unrelated to the assessment interval (46,47). Perhaps the greatest limitation is our inability to provide estimates of MCIIs for the MCS and many SF-36 scales that address aspects of health status other than pain and physical functioning. We believe this limitation highlights the importance of careful consideration of responsiveness and use of appropriate treatments when attempting to derive MCIIs for health status measures.

SIGNIFICANCE AND INNOVATION.

  • First study to estimate thresholds for clinically important improvement in SF-36 scales in relation to the sensitivity to change of the scales

  • Poor sensitivity to change precluded estimation of these thresholds for the mental health-related scales of the SF-36

  • Estimates of minimal clinically important improvement in the physical functioning, bodily pain, and physical component summary for patients with active RA in clinical trials are provided

ACKNOWLEDGEMENTS

This study was supported by the Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health.

Footnotes

None of the authors has any commercial or financial interests or conflicts related to this work.

Contributor Information

Michael M. Ward, Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health.

Lori C. Guthrie, Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health.

Maria I. Alba, Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health.

REFERENCES

  • 1.U.S. Centers for Disease Control and Prevention. [accessed on October 2, 2013];Healthy People 2020. www.cdc.gov/nchs/healthy_people/hp2020.htm.
  • 2.Wilson IB, Cleary PD. Linking clinical variables with health-related quality of life: A conceptual model of patient outcomes. JAMA. 1995;273:59–65. [PubMed] [Google Scholar]
  • 3.Fries JF, Spitz P, Kraines RG, Holman HR. Measurement of patient outcome in arthritis. Arthritis Rheum. 1980;23:137–145. doi: 10.1002/art.1780230202. [DOI] [PubMed] [Google Scholar]
  • 4.Tugwell P, Wells G, Strand V, Maetzel A, Bombardier C, Crawford B, et al. Clinical improvement as reflected in measures of function and health-related quality of life following treatment with leflunomide compared with methotrexate in patients with rheumatoid arthritis. Arthritis Rheum. 2000;43:506–514. doi: 10.1002/1529-0131(200003)43:3<506::AID-ANR5>3.0.CO;2-U. [DOI] [PubMed] [Google Scholar]
  • 5.Genovese MC, Schiff M, Luggen M, Becker J-C, Aranda R, Teng J, et al. Efficacy and safety of the selective co-stimulation modulator abatacept following 2 years of treatment in patients with rheumatoid arthritis and an inadequate response to anti-tumour necrosis factor therapy. Ann Rheum Dis. 2008;67:547–554. doi: 10.1136/ard.2007.074773. [DOI] [PubMed] [Google Scholar]
  • 6.Keystone E, Burmester GR, Furie R, Loveless JE, Emery P, Kremer J, et al. Improvement in patient-reported outcomes in a rituximab trial in patients with severe rheumatoid arthritis refractory to anti-tumor necrosis factor therapy. Arthritis Rheum. 2008;59:785–793. doi: 10.1002/art.23715. [DOI] [PubMed] [Google Scholar]
  • 7.Coombs JH, Bloom BJ, Breedveld FC, Fletcher MP, Gruben D, Kremer JM, et al. Improved pain, physical functioning and health status in patients with rheumatoid arthritis treated with CP-690,550, an orally active Janus kinase (JAK) inhibitor: results from a randomised, double-blind, placebo-controlled trial. Ann Rheum Dis. 2010;69:413–416. doi: 10.1136/ard.2009.108159. [DOI] [PubMed] [Google Scholar]
  • 8.Strand V, Rentz AM, Cifaldi MA, Chen N, Roy S, Revicki D. Health-related quality of life outcomes of adalimumab for patients with early rheumatoid arthritis: Results from a randomized multicenter study. J Rheumatol. 2012;39:63–72. doi: 10.3899/jrheum.101161. [DOI] [PubMed] [Google Scholar]
  • 9.Kalyoncu U, Dougados M, Daurès J-P, Gossec L. Reporting of patient-reported outcomes in recent trials in rheumatoid arthritis: a systematic literature review. Ann Rheum Dis. 2009;68:183–190. doi: 10.1136/ard.2007.084848. [DOI] [PubMed] [Google Scholar]
  • 10.Ware JE, Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30:473–483. [PubMed] [Google Scholar]
  • 11.McHorney CA, Ware JE, Jr, Raczek AE. The MOS 36-item short-form health survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care. 1993;31:247–263. doi: 10.1097/00005650-199303000-00006. [DOI] [PubMed] [Google Scholar]
  • 12.Jenkinson C, Stewart-Brown S, Petersen S, Paice C. Assessment of the SF-36 version 2 in the United Kingdom. J Epidemiol Community Health. 1999;53:46–50. doi: 10.1136/jech.53.1.46. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Guyatt GH, Osoba D, Wu AW, Wyrwich KW, Norman GR. Clinical Significance Consensus Meeting Group. Methods to explain the clinical significance of health status measures. Mayo Clin Proc. 2002;77:371–383. doi: 10.4065/77.4.371. [DOI] [PubMed] [Google Scholar]
  • 14.U.S. Department of Health and Human Services. Food and Drug Administration. Guidance for Industry. [accessed on Oct 3, 2013];Patient-reported outcome measures: Use in medical product development to support labeling claims. www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM193282.pdf.
  • 15.Samsa G, Edelman D, Rothman ML, Williams GR, Lipscomb J, Matchar D. Determining clinically important differences in health status measures. A general approach with illustration to the Health Utilities Index Mark II. Pharmacoeconomics. 1999;15:141–155. doi: 10.2165/00019053-199915020-00003. [DOI] [PubMed] [Google Scholar]
  • 16.Lubeck DP. Patient-reported outcomes and their role in the assessment of rheumatoid arthritis. Pharmacoeconomics. 2004;22(Suppl 1):27–38. doi: 10.2165/00019053-200422001-00004. [DOI] [PubMed] [Google Scholar]
  • 17.Kosinski M, Zhao SZ, Dedhiya S, Osterhaus JT, Ware JE., Jr Determining minimally important changes in generic and disease-specific health-related quality of life questionnaires in clinical trials of rheumatoid arthritis. Arthritis Rheum. 2000;43:1478–1487. doi: 10.1002/1529-0131(200007)43:7<1478::AID-ANR10>3.0.CO;2-M. [DOI] [PubMed] [Google Scholar]
  • 18.Emery P, Kosinski M, Li T, Martin M, Williams GR, Becker J-C, et al. Treatment of rheumatoid arthritis patients with abatacept and methotrexate significantly improved health-related quality of life. J Rheumatol. 2006;33:681–689.
  • 19.Wyrwich KW, Tierney WM, Babu AN, Kroenke K, Wolinsky FD. A comparison of clinically important differences in health-related quality of life for patients with chronic lung disease, asthma, or heart disease. Health Serv Res. 2005;40:577–591. doi: 10.1111/j.1475-6773.2005.00373.x.
  • 20.Ward MM, Guthrie LC, Alba MI. Clinically important changes in individual and composite measures of rheumatoid arthritis activity. Thresholds applicable to clinical trials. Ann Rheum Dis. 2014 doi: 10.1136/annrheumdis-2013-205079. (in press).
  • 21.Arnett FC, Edworthy SM, Bloch DA, McShane DJ, Fries JF, Cooper NS, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum. 1988;31:315–324. doi: 10.1002/art.1780310302.
  • 22.Radloff LS. The CES-D scale: A self-report depression scale for research in the general population. Appl Psychol Meas. 1977;1:385–401.
  • 23.Ware JE, Jr, Kosinski M, Dewey JE. How to score version 2 of the SF-36 health survey. Lincoln, RI: QualityMetric Incorporated; 2000.
  • 24.Prevoo ML, van’t Hof MA, Kuper HH, van Leeuwen MA, van de Putte LB, van Riel PL. Modified disease activity scores that include twenty-eight-joint counts. Development and validation in a prospective longitudinal study of patients with rheumatoid arthritis. Arthritis Rheum. 1995;38:44–48. doi: 10.1002/art.1780380107.
  • 25.Smolen JS, Breedveld FC, Schiff MH, Kalden JR, Emery P, Eberl G, et al. A simplified disease activity index for rheumatoid arthritis for use in clinical practice. Rheumatology. 2003;42:244–257. doi: 10.1093/rheumatology/keg072.
  • 26.Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10:407–415. doi: 10.1016/0197-2456(89)90005-6.
  • 27.Cohen J. Statistical power analysis for the behavioral sciences. Second ed. Hillsdale, NJ: Erlbaum; 1988.
  • 28.Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in health status: Reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol. 1997;50:79–93. doi: 10.1016/s0895-4356(96)00296-x.
  • 29.Ward MM, Marx AS, Barry NN. Identification of clinically important changes in health status using receiver operating characteristic curves. J Clin Epidemiol. 2000;53:279–284. doi: 10.1016/s0895-4356(99)00140-7.
  • 30.Turner D, Schünemann HJ, Griffith LE, Beaton DE, Griffiths AM, Critch JN, et al. Using the entire cohort in the receiver operating characteristic analysis maximizes precision of the minimal important difference. J Clin Epidemiol. 2009;62:374–379. doi: 10.1016/j.jclinepi.2008.07.009.
  • 31.Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36. doi: 10.1148/radiology.143.1.7063747.
  • 32.Carpenter J, Bithell J. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statist Med. 2000;19:1141–1164. doi: 10.1002/(sici)1097-0258(20000515)19:9<1141::aid-sim479>3.0.co;2-f.
  • 33.Kvamme MK, Kristiansen IS, Lie E, Kvien TK. Identification of cutpoints for acceptable health status and important improvement in patient-reported outcomes, in rheumatoid arthritis, psoriatic arthritis, and ankylosing spondylitis. J Rheumatol. 2010;37:26–31. doi: 10.3899/jrheum.090449.
  • 34.Aletaha D, Smolen JS, Ward MM. Measuring function in rheumatoid arthritis: Identifying reversible and irreversible components. Arthritis Rheum. 2006;54:2784–2792. doi: 10.1002/art.22052.
  • 35.Perkins NJ, Schisterman EF. The inconsistency of “optimal” cutpoints obtained using two criteria based on the receiver operating characteristic curve. Am J Epidemiol. 2006;163:670–675. doi: 10.1093/aje/kwj063.
  • 36.Ruta DA, Hurst NP, Kind P, Hunter M, Stubbings A. Measuring health status in British patients with rheumatoid arthritis: reliability, validity, and responsiveness of the short-form 36-item health survey (SF-36). Br J Rheumatol. 1998;37:425–436. doi: 10.1093/rheumatology/37.4.425.
  • 37.Hagen KB, Smedstad LM, Uhlig T, Kvien TK. The responsiveness of health status measures in patients with rheumatoid arthritis: Comparison of disease-specific and generic instruments. J Rheumatol. 1999;26:1474–1480.
  • 38.Wells G, Li T, Maxwell L, Maclean R, Tugwell P. Responsiveness of patient reported outcomes including fatigue, sleep quality, activity limitation, and quality of life following treatment with abatacept for rheumatoid arthritis. Ann Rheum Dis. 2008;67:260–265. doi: 10.1136/ard.2007.069690.
  • 39.Linde L, Sørensen J, Østergaard M, Hørslev-Petersen K, Hetland ML. Health-related quality of life: Validity, reliability, and responsiveness of SF-36, 15D, EQ-5D, RAQoL, and HAQ in patients with rheumatoid arthritis. J Rheumatol. 2008;35:1528–1537.
  • 40.Veehof MM, ten Klooster PM, Taal E, van Riel PLCM, van de Laar MAFJ. Comparison of internal and external responsiveness of the generic Medical Outcome Study Short Form-36 (SF-36) with disease-specific measures in rheumatoid arthritis. J Rheumatol. 2008;35:610–617.
  • 41.ten Klooster PM, Vonkeman HE, Taal E, Siemons L, Hendriks L, de Jong AJL, et al. Performance of the Dutch SF-36 version 2 as a measure of health-related quality of life in patients with rheumatoid arthritis. Health Qual Life Outcomes. 2013;11:77. doi: 10.1186/1477-7525-11-77.
  • 42.Spiegel BMR, Younossi ZM, Hays RD, Revicki D, Robbins S, Kanwal F. Impact of hepatitis C on health related quality of life: a systematic review and quantitative assessment. Hepatology. 2005;41:790–800. doi: 10.1002/hep.20659.
  • 43.Keurentjes JC, van Tol RF, Fiocco M, Schoones JW, Nelissen RG. Minimal clinically important difference in health-related quality of life after total hip or knee replacement. Bone Joint Res. 2012;1:71–77. doi: 10.1302/2046-3758.15.2000065.
  • 44.Angst F, Aeschlimann A, Stucki G. Smallest detectable and minimal clinically important differences of rehabilitation intervention with their implications for required sample sizes using WOMAC and SF-36 quality of life measurement instruments in patients with osteoarthritis of the lower extremities. Arthritis Care Res. 2001;45:384–391. doi: 10.1002/1529-0131(200108)45:4<384::AID-ART352>3.0.CO;2-0.
  • 45.Pope JE, Khanna D, Norrie D, Ouimet JM. The minimally important difference for the health assessment questionnaire in rheumatoid arthritis clinical practice is smaller than in randomized controlled trials. J Rheumatol. 2009;36:254–259. doi: 10.3899/jrheum.080479.
