Abstract
The Medical Outcomes Study Short Form-36 (SF-36) is a generic measure of health-related quality of life (HRQOL), validated and cross-culturally translated, which has been extensively utilised in rheumatology. In randomised controlled trials and observational studies, SF-36 provides rich data regarding HRQOL; but as typically portrayed, patterns of disease and treatment-associated effects can be difficult to discern. “Spydergrams” offer a simplified means to visualise complex results across all domains of SF-36 in a single figure: depicting disease and population-specific patterns of decrements in HRQOL compared with age and gender-matched normative data, as well as providing a tool for interpreting complex treatment-associated or longitudinal changes. Utilising spydergrams as a standard format to illustrate and report changes in SF-36 across different rheumatic diseases can greatly facilitate analyses and interpretations of clinical trial results, as well as providing patients an accessible means to compare baseline scores and treatment-associated improvements with normative data from individuals without arthritis. Furthermore, SF-6D utility scores based on mean changes across all eight domains of SF-36 are suggested as a quantitative means of summarising changes illustrated by spydergrams, offering a universal metric for cost-effectiveness analyses of therapeutic interventions.
The Medical Outcomes Study Short Form-36 (SF-36) was developed to measure self-reported health-related quality of life (HRQOL): 36 questions are combined into eight domains reflecting different dimensions of health,1, 2 which are grouped into composite physical and mental component summary (PCS and MCS) scores.3 SF-36 has been cross-culturally translated4 and is widely used for clinical research, health policy evaluations and general population surveys. A US Veterans Affairs version has also been derived and validated.5
Extensively validated in randomised controlled trials (RCT) and longitudinal observational studies, this generic instrument has demonstrated sensitivity to treatment effects and reflects the impact of various rheumatic diseases upon HRQOL, including rheumatoid arthritis (RA),6, 7 systemic lupus erythematosus (SLE),8 psoriatic arthritis,9 ankylosing spondylitis,10 gout,11 systemic sclerosis (SSc),12 fibromyalgia13 and osteoarthritis.14 SF-36 scores correlate well with improvements in physical function measured by the Health Assessment Questionnaire Disability Index (HAQ-DI) in RA.15 Over the past decade, RCT with new disease-modifying antirheumatic drugs have documented significant treatment-associated changes, including “improvement in physical function and HRQOL”, which have become established labelling claims for approved therapies.15
SPYDERGRAMS
Differences in the way individuals perceive and report HRQOL can be better interpreted by viewing baseline and change scores across domains, scored from 0 to 100, without the z-transformation and normalisation recommended in version 2 of SF-36, both of which reduce the magnitude of possible change. In contrast to the current practice of displaying SF-36 as eight-columned bar charts, “spydergrams” offer the ability to view changes more easily across all domains as a pattern recognition profile, depicting disease and population-specific “patterns” of decrements in baseline values compared with matched normative data, as well as treatment-associated or longitudinal changes. These “irregularly formed octagons or polygons” can be informative, reflecting different patterns of HRQOL and the impact of underlying disease on “multidimensional function”.
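The reduction in magnitude can be made explicit. The following is a sketch only, assuming that norm-based scoring standardises each 0 to 100 domain score against a general-population mean and standard deviation and rescales the result to a mean of 50 and a standard deviation of 10; the symbols x, μ, σ and T are introduced here for illustration and do not appear in the SF-36 manuals in this form.

```latex
z = \frac{x - \mu}{\sigma}, \qquad
T = 50 + 10\,z, \qquad
\Delta T = \frac{10}{\sigma}\,\Delta x
```

Under this assumption, a change of Δx points on the 0 to 100 scale appears smaller after transformation whenever the population standard deviation σ for that domain exceeds 10 points.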
For heuristic or analytic purposes, SF-36 domain score bar graphs are often presented as line graphs to aid the viewer in perceiving effects or trends. Similarly, in spydergrams categorical changes are connected (linked) to facilitate visual recognition of patterns, with the disclaimer that this is not intended to imply these are continuous scales. It is not unusual to see figures presented as line graphs that colour in the area below the line, not for significance as an “area under the curve” analysis, but to further facilitate visual recognition of differences. Spydergrams are an evolution from these standard practices, whereby the axis is simply rotated to connect with itself; bar graphs of baseline and changes in domain scores are connected with lines, and areas below the lines are shaded to facilitate pattern recognition.
To compare across disease states, the order of domains presented should be consistent, whether or not they reflect a certain sequence or priority of importance. Convention has dictated that the four physical domains are presented from left to right in a bar chart, followed by the mental domains. Thus in a “spydergram” physical function (PF) is at the top, at 12 o’clock, followed clockwise by role physical (RP), bodily pain (BP) and general health perceptions (GH); vitality (VT) sits at the 6 o’clock position, followed clockwise by social functioning (SF), role emotional (RE) and mental health index (MH) (fig 1A, B).7, 15 Domain scores are plotted from 0 (worst) at the centre to 100 (best) at the outside; demarcations along the domain axes mark changes of 10 points, representing one to two times the minimum clinically important difference (MCID). Changes in shape and thickness of these irregular octagonal rings offer a single graphic representation to: (1) compare baseline decrements with age and gender-matched normative values; (2) assess treatment-associated or longitudinal improvements in HRQOL and (3) compare and contrast scores across protocols and disease states. As spydergrams allow visualisation of these values simultaneously, they may be presented on an individual basis with norms as a “treatment goal” and were recently utilised in a patient-assessed programme of therapy.16
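The layout described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' plotting procedure: the function name `spydergram_vertices` and the example scores are hypothetical, and the only facts taken from the text are the clockwise domain order starting with PF at 12 o'clock and the 0 (centre) to 100 (outer edge) scale.

```python
import math

# Domain order stated in the text: PF at 12 o'clock, then clockwise.
DOMAINS = ["PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH"]


def spydergram_vertices(scores):
    """Map 0-100 domain scores to (x, y) points on a spydergram.

    0 sits at the centre and 100 at the outer edge; the eight axes are
    spaced 45 degrees apart, clockwise from the top.
    """
    vertices = {}
    for i, domain in enumerate(DOMAINS):
        score = scores[domain]
        if not 0 <= score <= 100:
            raise ValueError(f"{domain} score must lie between 0 and 100")
        angle = 2 * math.pi * i / len(DOMAINS)  # clockwise from 12 o'clock
        vertices[domain] = (score * math.sin(angle), score * math.cos(angle))
    return vertices


# Hypothetical scores for illustration only; not data from any study.
example_baseline = {"PF": 40, "RP": 25, "BP": 35, "GH": 50,
                    "VT": 45, "SF": 60, "RE": 55, "MH": 65}
example_vertices = spydergram_vertices(example_baseline)
```

Joining the eight points in order, and closing the polygon back to PF, gives one ring; repeating the call for follow-up scores and for matched norms gives the further rings that are compared visually.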
Figure 1.
| | Baseline | 1 Year | 2 Years | A/G norms |
|---|---|---|---|---|
| MTX | 0.391 | 0.495 | 0.682 | 0.786 |
| ADA+MTX | 0.384 | 0.514 | 0.756 | 0.786 |

| | Baseline | 1 Year | A/G norms |
|---|---|---|---|
| PL+MTX | 0.305 | 0.331 | 0.523 |
| CZP200+MTX | 0.305 | 0.423 | 0.523 |
| CZP400+MTX | 0.305 | 0.425 | 0.523 |
QUANTITATIVE SUMMARY OF CHANGES IN SF-36 DOMAIN SCORES
Although the shaded areas of a spydergram are suggestive of an area under the curve analysis, such an interpretation would be misleading. The typical eight-column bar graphs have been linked into a single graphic for ease of interpretation (pattern recognition), but they technically represent categorical data, not a continuum. Nevertheless, a summary metric that combines data from all eight domains into a single score is important for quantifying changes in these patterns.
Statistical analyses require a primary outcome measure, and PCS and MCS scores are often chosen as a single metric for analysis of SF-36 within rheumatology. However, PCS and MCS scores do not fully reflect patterns of change within the domains as they are derived from z-transformed and norm-based domain scores. The model for their derivation assumes that physical and mental health constructs are independent,17 but in the Swedish SF-36 normative database, Taft et al18 illustrated significant correlations between PCS and MCS scores. Farivar et al19 showed there were fewer negative factor scoring coefficients using an oblique factor solution than with standard orthogonal solutions. Hann and Reeves20 recently tested several models in two large databases, again observing correlations between PCS and MCS scores and finding that the relationship between domain and PCS and MCS scores varied significantly by medical condition, supporting the argument against the orthogonal derivation of scores. Furthermore, Ware and Kosinski21, 22 have argued that: “one of the best defenses against inappropriate conclusions based on the summary measures is the thorough comparison with results based on the 8 SF-36 subscales (domains)”.
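The role of the factor scoring coefficients can be shown schematically. This is a sketch under the assumption that each summary score is a weighted sum of the eight standardised domain scores, rescaled to a mean of 50 and a standard deviation of 10; the symbols z_k and w_k are introduced here for illustration, and no coefficient values are implied.

```latex
\mathrm{PCS} = 50 + 10 \sum_{k=1}^{8} w_k^{\mathrm{PCS}}\, z_k, \qquad
\mathrm{MCS} = 50 + 10 \sum_{k=1}^{8} w_k^{\mathrm{MCS}}\, z_k
```

Wherever a coefficient w_k is negative, an improvement in that domain lowers the corresponding summary score, which is one reason the summary scores may not mirror the pattern of change seen across the eight domains.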
An alternative approach to summarise SF-36 domain scores quantitatively could be health state preferences, or utilities valuing health from “0” death, to “1” perfect health, an economic measure critical for evaluation of cost-effectiveness of therapeutic interventions. Ara and Brazier,23, 24 Brazier et al25 and Marra et al26 developed a new calculation of SF-6D, which utilises mean scores across all eight SF-36 domains to yield a single utility measure, which has been validated in longitudinal databases and against EQ-5D within a rheumatic disease population. This single valuation may be used to represent baseline decrements and change scores portrayed by spydergrams.
The use of spydergrams to compare and contrast the impact of multiple rheumatic diseases upon HRQOL, measured by SF-36, has made their value apparent in clarifying decrements in HRQOL compared with matched normative data, as well as treatment-associated improvements.
SF-36 SPYDERGRAM IN DIFFERENT RHEUMATIC DISEASES
Figures 1–3 illustrate SF-36 data, available from published reports and abstract presentations, analysed from RCT in RA, SLE, gout, SSc and osteoarthritis. Age and gender-matched normative data specific to each population were generated based on US norms published in SF-36 manuals and updates.27 Spydergrams were configured for each study, and utility scores were generated following the approach of Ara and Brazier.24 These figures reveal different “polygonal” patterns for each rheumatic disease.
Figure 3.
| | Gout | A/G matched norms |
|---|---|---|
| Veterans with gout and comorbidities (n = 1558) | 0.515 | 0.795 |
| Natural History Study (n = 100) | 0.578 | 0.799 |
| Phase 3 RCT with pegloticase versus placebo | 0.563 | 0.794 |
In early and later disease, RA appears to impact all domains of HRQOL, especially RP, PF and BP, but also RE (figs 1A, C).7, 28 Treatment-associated changes are large in all domains, not just the physical ones, and are greatest in those with the largest decrements at baseline. In SLE (fig 2A), baseline SF-36 scores were low across all domains compared with matched norms.29 In contrast to RA, large decrements in any one domain do not stand out, reflecting the broad impact of active disease on mental as well as physical domains. When baseline as well as treatment-associated changes are viewed as spydergrams, SF-36 data reflect clinical responses defined by the British Isles Lupus Assessment Group (BILAG) disease activity score, patient and physician global scores and decreases in prednisone dose, despite small sample sizes and loss of balanced randomisation.30–32
Figure 2.
| | Baseline | 1 Year | A/G norms |
|---|---|---|---|
| Placebo | 0.391 | 0.495 | 0.825 |
| Epratuzumab 360 mg/m2 | 0.587 | 0.770 | 0.825 |

| | Baseline | 6 Months | A/G norms |
|---|---|---|---|
| Placebo (n = 95) | 0.598 | 0.587 | 0.819 |
| Relaxin (n = 136) | 0.539 | 0.554 | 0.819 |
In SSc (fig 2B), SF-36 scores from a failed RCT with relaxin33 reveal patterns different from the previous examples, with markedly lower PF and RP scores than in SLE and greater decrements in VT, BP and GH domains, despite the heterogeneity and multi-organ involvement shared by both conditions.
In the Vet-QOL survey (fig 3A), veterans with gout reported significantly more medical and arthritic comorbidities, hospitalisations and utilisation of outpatient services than those without, and large decrements across all HRQOL domains compared with matched norms.11 In comparison with a “treatment failure gout” population enrolled in an observational Natural History Study,34 baseline scores in both groups were low and remarkably similar, as were SF-36 scores reported by treatment failure gout patients enrolled in two phase 3 protocols comparing pegloticase versus placebo.35 Values reflect decrements in HRQOL comparable to those reported by subjects with longstanding, debilitating RA, or active SLE.36 In contrast, osteoarthritis appears exclusively to impact PF, RP and BP domains, with preservation of other scores, including VT (fatigue)37 (fig 3B).
DISCUSSION
As we have attempted to demonstrate, spydergrams can provide an effective tool for perceiving patterns of change in complex sets of data more quickly. They are designed to illustrate differences between baseline and normative data, and portray treatment-associated changes in the context of age and gender-matched norms specific to the population studied. Because baseline values and matched norms are included as lower and upper bounds, spydergrams allow visual comparisons of the thickness of “rings” exactly proportional to the degree of change from baseline. Although the shape of the octagon or polygon would change according to the order of presentation of the domains, perceived effects would still remain proportional along each axis, reflecting the impact of disease and facilitating comparisons across conditions.38 Baseline values and treatment-associated changes, in terms of clinical meaningfulness relative to MCID, can easily be discerned by examination of changes along individual domain axes. SF-36 manuals provide data from which age and gender-matched US norms can be generated. Importantly, normative data are available for Great Britain, Denmark, Norway and Sweden, The Netherlands and Turkey, among others.4, 39
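Reading changes along individual axes amounts to simple per-domain arithmetic, which the following Python sketch makes concrete. It is illustrative only: the function `summarise_changes`, all scores and all norm values are hypothetical, and the default threshold of 5 points is an assumption taken from the lower end of the range implied by the statement that a 10-point demarcation represents one to two times MCID.

```python
DOMAINS = ["PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH"]


def summarise_changes(baseline, follow_up, norms, mcid=5.0):
    """Per-domain change, remaining gap to norms, and an MCID flag.

    `mcid` is an assumed threshold in points on the 0-100 scale.
    """
    summary = {}
    for domain in DOMAINS:
        change = follow_up[domain] - baseline[domain]
        summary[domain] = {
            # Ring thickness along this axis.
            "change": change,
            # Distance still separating the follow-up score from the norm.
            "gap_to_norm": norms[domain] - follow_up[domain],
            "meets_mcid": change >= mcid,
        }
    return summary


# Hypothetical values for illustration only; not data from any study.
example_baseline = {"PF": 40, "RP": 25, "BP": 35, "GH": 50,
                    "VT": 45, "SF": 60, "RE": 55, "MH": 65}
example_follow_up = {"PF": 55, "RP": 45, "BP": 50, "GH": 53,
                     "VT": 52, "SF": 70, "RE": 65, "MH": 68}
example_norms = {"PF": 80, "RP": 78, "BP": 72, "GH": 70,
                 "VT": 60, "SF": 83, "RE": 82, "MH": 75}

example_summary = summarise_changes(example_baseline, example_follow_up, example_norms)
```

Passing a different `mcid` value lets the same summary be read against a stricter threshold, for example 10 points.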
Data presented here demonstrate that the pattern of baseline domain scores as well as longitudinal and treatment-associated changes appear to be unique to each rheumatic disease. They also show that improvements tend to occur in domains with the largest decrements at baseline compared with age and gender-matched norms. It is evident that comparing data across all eight domains offers a richness of information not available when solely evaluating PCS and MCS scores or utilising norm-transformed domain scores. Importantly, findings derived from RCT and longitudinal observational studies are similar, supporting the robustness of these observations.
Utilising the recently derived SF-6D utility score to summarise data across all eight domains in a single metric offers a numeric comparison across disease states, in addition to the shape and thickness comparisons offered by spydergrams. The use of SF-6D also facilitates economic evaluations as baseline and change scores can be transformed into utilities for the calculation of quality-adjusted life years, a universal metric in cost-effectiveness analyses. Combining spydergrams with a single metric that generates health utility measures, SF-6D, allows both quantitative and qualitative assessment of the impact of disease and its treatment upon multidimensional function.
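The step from utilities to quality-adjusted life years can be sketched as the area under a utility-versus-time curve. The Python below is a minimal illustration, assuming linear change in utility between assessments (the trapezoidal rule) and no discounting; the function name `qalys` is hypothetical, and the input values are those tabulated for the MTX arm with figure 1, taken here to be mean SF-6D utilities. It is an arithmetic example, not a cost-effectiveness result.

```python
def qalys(times_in_years, utilities):
    """Area under a utility-versus-time curve by the trapezoidal rule.

    Assumes utility changes linearly between assessments and applies
    no discounting; both are simplifying assumptions.
    """
    if len(times_in_years) != len(utilities) or len(utilities) < 2:
        raise ValueError("need matching lists with at least two time points")
    points = list(zip(times_in_years, utilities))
    total = 0.0
    for (t0, u0), (t1, u1) in zip(points, points[1:]):
        if t1 <= t0:
            raise ValueError("time points must be strictly increasing")
        total += (u0 + u1) / 2 * (t1 - t0)
    return total


# Values tabulated for the MTX arm (baseline, 1 year, 2 years).
mtx_qalys = qalys([0, 1, 2], [0.391, 0.495, 0.682])
```

Applying the same function to a comparator arm and taking the difference would give an incremental QALY estimate, subject to the same assumptions.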
Acknowledgments
Funding: DK was supported by a National Institutes of Health Award (NIAMS K23 AR053858-01A1) and the Scleroderma Foundation (New Investigator Award).
Footnotes
Competing interests: VS is a consultant to the following: Abbott Immunology, Alder, Allergan, Almirall, Amgen Corporation, AstraZeneca, Bexel, BiogenIdec, CanFite, Centocor, Chelsea, Crescendo, Cypress Biosciences, Eurodiagnostica, Fibrogen, Forest Laboratories, Genentech, Human Genome Sciences, Idera, Incyte, Jazz Pharmaceuticals, Lexicon Genetics, Logical Therapeutics, Lux Biosciences, Medimmune, Merck Serono, Novartis Pharmaceuticals, NovoNordisk, Nuon, Ono Pharmaceuticals, Pfizer, Procter and Gamble, Rigel, Roche, Sanofi-Aventis, Savient, Schering Plough, SKK, UCB, Wyeth and Xdx. The other authors declare no conflicts of interest.
Provenance and peer review: Not commissioned; externally peer reviewed.
References
- 1. Ware JE, Sherbourne CD. The MOS 36-item short-form health survey (SF-36), I. Conceptual framework and item selection. Med Care. 1992;30:473–83.
- 2. Ware J, Kosinski M. SF-36 Physical and Mental Health Summary Scales: a manual for users of version 1. Boston, MA: The Health Institute, New England Medical Center; 2001.
- 3. McHorney CA, Ware JE, Raczek AE. The MOS 36-item Short Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care. 1993;31:247–63. doi: 10.1097/00005650-199303000-00006.
- 4. Ware JE, Keller S, Bentler PM, et al. Comparisons of health status measurement models and the validity of the SF-36 in Great Britain, Sweden and the USA. Qual Life Res. 1994;3:68.
- 5. Selim AJ, Berlowitz D, Fincke G, et al. Use of risk-adjusted change in health status to assess the performance of integrated service networks in the Veterans Health Administration. Int J Qual Health Care. 2006;18:43–50. doi: 10.1093/intqhc/mzi080.
- 6. Kosinski M, Zhao SZ, Dedhiya S, et al. Determining minimally important changes in generic and disease-specific health-related quality of life questionnaires in clinical trials of rheumatoid arthritis. Arthritis Rheum. 2000;43:1478–87. doi: 10.1002/1529-0131(200007)43:7<1478::AID-ANR10>3.0.CO;2-M.
- 7. Strand V, Singh JA. Improved health-related quality of life with effective disease-modifying antirheumatic drugs: evidence from randomized controlled trials. Am J Manag Care. 2008;14:234–54.
- 8. Thumboo J, Strand V. Health-related quality of life in patients with systemic lupus erythematosus: an update. Ann Acad Med Singapore. 2007;36:115–22.
- 9. Gladman D, Mease PJ, Strand V, et al. Consensus on a core set domains for psoriatic arthritis. J Rheumatol. 2007;34:1167–70.
- 10. Singh JA, Strand V. Spondyloarthritis is associated with poor function and physical health related quality of life. J Rheumatol. 2009;36:1012–20. doi: 10.3899/jrheum.081015.
- 11. Singh JA, Strand V. Gout is associated with more comorbidities, poorer health related quality of life and higher health care utilization in US Veterans. Ann Rheum Dis. 2008;67:1310–16. doi: 10.1136/ard.2007.081604.
- 12. Khanna D, Furst DE, Clements PJ, et al. Responsiveness of the SF-36 and the Health Assessment Questionnaire Disability Index in a systemic sclerosis clinical trial. J Rheumatol. 2005;32:832–40.
- 13. Hoffman DL, Dukes EM. The health status burden of people with fibromyalgia: a review of studies that assessed health status with the SF-36 or the SF-12. Int J Clin Pract. 2008;62:115–26. doi: 10.1111/j.1742-1241.2007.01638.x.
- 14. Kosinski M, Keller SD, Ware JE Jr, et al. The SF-36 health survey as a generic outcome measure in clinical trials of patients with osteoarthritis and rheumatoid arthritis: relative validity of scales in relation to clinical measures of arthritis severity. Med Care. 1999;37:MS23–39. doi: 10.1097/00005650-199905001-00003.
- 15. Strand V, Singh J. Health related quality of life in rheumatoid arthritis (chapter 9C). In: Hochberg MC, Silman A, Smolen J, et al., editors. Rheumatoid arthritis. 1. Philadelphia, PA: Mosby Elsevier; 2008. pp. 237–59.
- 16. Aventis Sanofi. Welcome to Arrive – your Arava© care programme! January 2009. www.arrive-online.org (accessed 29 Sep 2009).
- 17. Ware JE, Kosinski M, Bayliss MS, et al. Comparison of methods for the scoring and statistical analysis of SF-36 health profile and summary measures: summary of results from the Medical Outcomes Study. Med Care. 1995;33:AS264–79.
- 18. Taft C, Karlsson J, Sullivan M. Do SF-36 summary component scores accurately summarize subscale scores? Qual Life Res. 2001;10:395–404. doi: 10.1023/a:1012552211996.
- 19. Farivar SS, Cunningham WE, Hays RD. Correlated physical and mental health summary scores for the SF-36 and SF-12 health survey, V1. Health Qual Life Outcomes. 2007;5:1–8. doi: 10.1186/1477-7525-5-54.
- 20. Hann M, Reeves D. The SF-36 scales are not accurately summarised by independent physical and mental component scores. Qual Life Res. 2008;17:413–23. doi: 10.1007/s11136-008-9310-0.
- 21. Ware JE, Kosinski M. Interpreting SF-36 summary health measures: a response. Qual Life Res. 2001;10:405–13. doi: 10.1023/a:1012588218728.
- 22. Ware JE, Kosinski M. Interpreting SF-36 summary health measures: a response supplemental documentation. doi: 10.1023/a:1012588218728. www.sf-36.org/news/responsetotaft.pdf (accessed 29 Sep 2009).
- 23. Ara R, Brazier J. Predicting the Short Form-6D preference-based index using the eight mean Short Form-36 health dimension scores: estimating preference-based health-related utilities when patient level data are not available. Value in Health. 2009;12:346–53. doi: 10.1111/j.1524-4733.2008.00428.x.
- 24. Ara R, Brazier J. Deriving an algorithm to convert the eight mean SF-36 dimension scores into a mean EQ-5D preference-based score from published studies (where patient level data are not available). Value in Health. 2008;11:1131–43. doi: 10.1111/j.1524-4733.2008.00352.x.
- 25. Brazier JE, Roberts J, Deverill M. The estimation of a preference based measure of health from the SF-36. J Health Econ. 2002;21:271–92. doi: 10.1016/s0167-6296(01)00130-8.
- 26. Marra CA, Woolcott JC, Kopec JA, et al. A comparison of generic, indirect utility measures (the HUI2, HUI3, SF-6D, and the EQ-5D) and disease-specific instruments (the RAQoL and the HAQ) in rheumatoid arthritis. Soc Sci Med. 2005;60:1571–82. doi: 10.1016/j.socscimed.2004.08.034.
- 27. Ware JE Jr, Snow KK, Kosinski M, et al. SF-36 health survey: manual and interpretation guide. Boston, MA: The Health Institute, New England Medical Center; 1993.
- 28. Strand V, Keininger DL, Tahari-Fitzgerald E. Certolizumab pegol results in clinically meaningful improvements in physical function and health-related quality of life in patients with active rheumatoid arthritis despite treatment with methotrexate. Arthritis Rheum. 2007;56:S393.
- 29. Strand V, Petri M, Buyon J, et al. Baseline data from 5 RCTs demonstrate that SLE impacts all domains of HRQOL. Arthritis Rheum. 2006;54:S277.
- 30. Strand V, Gordon C, Kalunian K, et al. Meaningful improvements in health-related quality of life with epratuzumab (anti-CD22 mAb targeting B-cells) in patients with systemic lupus erythematosus with high disease activity: results from 2 randomized controlled trials (RCTs). Arthritis Rheum. 2008;58:S570–1.
- 31. Petri M, Hobbs K, Gordon C, et al. Clinically meaningful improvements with epratuzumab (anti-CD22 mAb targeting B-cells) in patients (pts) with moderate/severe systemic lupus erythematosus (SLE) flares: results from 2 randomized controlled trials. Arthritis Rheum. 2008;58:S571.
- 32. Wallace D, Hobbs K, Houssiau F, et al. Epratuzumab (anti-CD22 mAb targeting B-cells) provides clinically meaningful reductions in corticosteroid (CS) use with a favorable safety profile in patients with moderate/severe flaring systemic lupus erythematosus (SLE): results from randomized controlled trials (RCTs). Arthritis Rheum. 2008;58:S571–2.
- 33. Khanna D, Clements PJ, Furst DE, et al. Recombinant human relaxin in the treatment of systemic sclerosis with diffuse cutaneous involvement. Arthritis Rheum. 2009;60:1102–11. doi: 10.1002/art.24380.
- 34. Becker MA, Schumacher HR, Benjamin KL, et al. Quality of life and disability in patients with treatment-failure gout. J Rheumatol. 2009;36:1041–8. doi: 10.3899/jrheum.071229.
- 35. Edwards NL, Baraf HSB, Becker MA, et al. Improvement in health-related quality of life (HRQL) and disability index in treatment failure gout (TFG) after pegloticase therapy: pooled results from GOUT1 and GOUT2, phase 3, randomized, double-blind, placebo controlled trials. Arthritis Rheum. 2008;58:S178.
- 36. Strand V, Singh JA, Sundy J, et al. HRQOL of patients with refractory gout and US veterans with gout and comorbidities is poor, and comparable to that in other severe conditions. Arthritis Rheum. 2009;58:S177.
- 37. Baraf HS, Strand V, Hosokawa H, et al. Effectiveness and safety of a single intraarticular injection of Gel-200, a new cross linked formulation of hyaluronic acid in the treatment of symptomatic osteoarthritis of the knee. Proceedings of the OARSI World Congress on Osteoarthritis; 10–13 September 2009; Montreal, Canada. OsteoArthritis Research Society International; Poster 326.
- 38. Slatkowsky-Christiansen B, Mowinckel P, Loge JH, et al. Health related quality of life in women with symptomatic hand osteoarthritis: a comparison with rheumatoid arthritis patients, healthy controls and normative data. Arthritis Rheum. 2007;57:1404–9. doi: 10.1002/art.23079.
- 39. Kvien TK, Kaasa S, Smedstad LM. Performance of the Norwegian SF-36 health survey in patients with rheumatoid arthritis. II. A comparison of the SF-36 with disease-specific measures. J Clin Epidemiol. 1998;51:1077–86. doi: 10.1016/s0895-4356(98)00099-7.



