Abstract
Background:
Utility preference scores are standardized generic health related quality of life (HRQOL) measures that quantify disease severity and burden and summarize morbidity on a scale from 0 (death) to 1 (optimum health). Utility scores are widely used to measure HRQOL and in cost-effectiveness research.
Objective:
To determine the responsiveness, validity properties and minimal important difference (MID) of utility scores, as measured by the Short Form 6D (SF-6D) and EuroQol (EQ-5D), in women undergoing surgery for pelvic organ prolapse (POP).
Study Design:
This study combined data from four large U.S. multicenter surgical trials enrolling 1,314 women with pelvic organ prolapse. We collected condition-specific quality of life data using the Pelvic Floor Distress Inventory (PFDI) and Pelvic Floor Impact Questionnaire (PFIQ). A subset of women completed the SF-6D; women in two trials also completed the EQ-5D. Mean utility scores were compared from baseline to 12 months after surgery. Responsiveness was assessed using effect size (ES) and standardized response mean (SRM). Validity properties were assessed by 1) comparing changes in utility scores at 12 months between surgical successes and failures as defined in each study and 2) correlating changes in utility scores with changes in the PFDI and PFIQ. MID was estimated using both anchor-based (SF-36 general health global rating scale “somewhat better” vs. “no change”) and distribution-based methods.
Results:
The mean SF-6D score improved 0.050, from 0.705 +/− 0.126 at baseline to 0.761 +/− 0.131 at 12 months, p<0.01. The mean EQ-5D score improved 0.060, from 0.810 +/− 0.15 at baseline to 0.868 +/− 0.15 at 12 months, p<0.01. ES (0.13-0.61) and SRM (0.13-0.57) were in the small to moderate range, demonstrating responsiveness of the SF-6D and EQ-5D similar to other conditions. SF-6D and EQ-5D scores improved more for prolapse reconstructive surgical successes than failures. The SF-6D and EQ-5D scores correlated with each other (r=0.41; n=645) and with condition-specific instruments. Correlations with the PFDI and PFIQ and their prolapse subscales were in the low to moderate range (r=0.09-0.38), similar to other studies. Using the anchor-based method, the MID was 0.026 for SF-6D and 0.025 for EQ-5D, within the range of MIDs reported in other populations and for other conditions. These findings were supported by distribution-based estimates.
Conclusion:
The SF-6D and EQ-5D have good validity properties and are responsive preference-based utility and general HRQOL measures for women undergoing surgical treatment for prolapse. The MIDs for SF-6D and EQ-5D are similar and within the range found for other medical conditions.
Keywords: EuroQol, Health related quality of life, minimal important difference, Pelvic floor disorders, Pelvic organ prolapse, Short Form 6D, Utility score
Introduction:
Pelvic organ prolapse (POP) is common, with more than 188,659 inpatient procedures performed annually in the United States.1 Treatment is recommended if symptoms are bothersome and impacting quality of life.2 Understanding the impact of POP and its treatment on health-related quality of life (HRQOL) is important both clinically and for cost-effectiveness research. Utility preference scores are generic HRQOL measures that quantify disease severity/burden and treatment impact; they summarize morbidity on a scale from 0 (death) to 1 (optimum health).3 Utility scores allow comparison across a wide range of disease states, populations, and treatment modalities and serve as an integral component to the quality-adjusted life years (QALYs) annualized measure of HRQOL. QALYs are commonly utilized when quantifying the benefits of a medical intervention for cost-utility analysis, the most common health economics evaluation.4 Evaluating the psychometric properties of utility scores in women with POP will allow researchers and health care providers to measure HRQOL, assess the effect of treatment on women’s quality of life, perform health economic evaluations and compare the cost-effectiveness of treatments for pelvic organ prolapse to treatments for other medical conditions.
General scales have been developed to measure utility preference scores for a wide variety of disease conditions and populations. These include the widely used multi-item, multi-attribute EuroQol (EQ-5D) and Short Form 6D (SF-6D).5,6 Use of these indices with varied medical conditions facilitates the interpretation of results and comparison of disease and treatment outcomes. While the SF-6D and EQ-5D have been evaluated in women with pelvic floor disorders, including POP7, the responsiveness and minimally important difference (MID) of these instruments in women undergoing surgery for POP have not been established. It is unknown whether these generic instruments that do not contain POP or pelvic-floor related components and the general scores produced are sensitive to change following surgical treatment of POP.
Our objective was to determine the responsiveness, validity properties and MID of utility scores as measured by the SF-6D and EQ-5D in women undergoing surgical repair for POP.
Materials and Methods:
This study is a retrospective analysis that combined data from four large U.S. multicenter POP surgical trials conducted by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD)-sponsored Pelvic Floor Disorders Network (PFDN): Outcomes following vaginal prolapse repair and mid urethral sling (OPUS)8, Operations and pelvic muscle training in the management of apical support loss (OPTIMAL),9 Colpopexy and Urinary Reduction Effort (CARE),10 and Colpocleisis Trial (COLPO).11 All sites had IRB approval and all women provided written informed consent. All participants underwent surgical correction of stage II-IV prolapse by experienced pelvic surgeons at 17 sites throughout the United States.
Two common preference-based multi-attribute health-status classification system instruments were used to estimate utility preference scores: EuroQol (EQ-5D) (EuroQol Group, http://www.euroqol.org), and Short Form 6D (SF-6D) (QualityMetric Incorporated, http://www.qualitymetric.com). The EQ-5D is scored on a −0.59 to 1.00 scale and has 5 dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression); each dimension has 3 levels for 243 possible unique health states.5,12 The SF-6D is scored on a 0.29 to 1.00 scale and has 6 dimensions (physical functioning, role limitation, social functioning, pain, mental health, vitality); each dimension has 4 to 6 levels for 18,000 possible unique health states.6,12 Higher scores indicate better quality of life.
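The health-state counts quoted above follow from multiplying the number of levels across dimensions. The short sketch below checks that arithmetic. The per-dimension level counts for the SF-6D are an assumption taken from the commonly cited SF-36-based classification; they are not reported in this paper, and the names `EQ5D_LEVELS`, `SF6D_LEVELS` and `n_health_states` are illustrative only.

```python
from math import prod

# EQ-5D as described in the text: 5 dimensions, 3 levels each.
EQ5D_LEVELS = {
    "mobility": 3,
    "self-care": 3,
    "usual activities": 3,
    "pain/discomfort": 3,
    "anxiety/depression": 3,
}

# SF-6D: the level count per dimension is an ASSUMPTION (commonly cited
# SF-36-based classification); the paper only gives the total of 18,000.
SF6D_LEVELS = {
    "physical functioning": 6,
    "role limitation": 4,
    "social functioning": 5,
    "pain": 6,
    "mental health": 5,
    "vitality": 5,
}


def n_health_states(levels):
    """Number of unique health states = product of the levels per dimension."""
    return prod(levels.values())


eq5d_states = n_health_states(EQ5D_LEVELS)
sf6d_states = n_health_states(SF6D_LEVELS)
print(eq5d_states, sf6d_states)
```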
Pelvic floor symptoms were assessed by the Pelvic Floor Distress Inventory (PFDI), a validated, condition-specific questionnaire with 46 items and 3 scales, designed to evaluate distress caused by bowel, urinary and POP complaints.13 Pelvic floor related quality of life was measured by the Pelvic Floor Impact Questionnaire (PFIQ), a validated condition-specific HRQOL questionnaire with 93 items including bladder, bowel and POP domains, each with 4 subscales.13 Higher scores on the PFDI and PFIQ indicate worse symptoms and quality of life.
Surgical success was previously defined in the four individual trials. The three reconstructive studies (OPUS, OPTIMAL and CARE) defined success using the following criteria: 1) absence of bothersome bulge symptoms as measured by the PFDI, 2) no prolapse beyond the hymen on Pelvic Organ Prolapse Quantification (POPQ) examination14 and 3) no subsequent retreatment for prolapse. OPTIMAL had an additional criterion: 4) no descent of the vaginal apex more than one-third into the vaginal canal. Women not meeting all criteria were considered surgical failures. The obliterative study (COLPO) defined success as no prolapse beyond 1 cm inside the hymen on the POPQ. POPQ examinations, EQ-5D, SF-6D, PFDI and PFIQ measures were administered at baseline and 12-month follow-up visits. While the four individual trials had different lengths of follow-up, ranging from one to two years, this analysis uses data from the common 12-month visit.
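The composite definition used by the reconstructive trials can be written as a simple rule in which every criterion must hold. The sketch below is illustrative only: the function name, argument names and the numeric encoding (POPQ leading edge in centimeters relative to the hymen, with positive values beyond the hymen; apical descent as a fraction of the vaginal canal) are assumptions, not the trials' actual data dictionary.

```python
from typing import Optional


def reconstructive_success(
    bothersome_bulge: bool,
    leading_edge_cm: float,
    retreated: bool,
    apex_descent_fraction: Optional[float] = None,
) -> bool:
    """Composite surgical success for the reconstructive trials (sketch).

    bothersome_bulge: bothersome bulge symptoms reported on the PFDI.
    leading_edge_cm: POPQ position of the most prolapsed point relative to
        the hymen; positive values are beyond the hymen (assumed encoding).
    retreated: any subsequent retreatment for prolapse.
    apex_descent_fraction: fraction of the vaginal canal the apex has
        descended; pass it only for an OPTIMAL-style definition.
    """
    success = (not bothersome_bulge) and leading_edge_cm <= 0 and (not retreated)
    if apex_descent_fraction is not None:
        # Additional OPTIMAL criterion: apex not more than one-third down.
        success = success and apex_descent_fraction <= 1 / 3
    return success


# A woman with no bulge symptoms, leading edge 1 cm inside the hymen and
# no retreatment counts as a success; failing any one criterion is a failure.
print(reconstructive_success(False, -1.0, False))
print(reconstructive_success(False, 0.5, False))
```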
Responsiveness of the utility scores, an instrument’s ability to detect change that occurs as the result of therapy (i.e., POP surgery), was assessed in two ways: the effect size (ES) and the standardized response mean (SRM). ES is the mean change in utility score from baseline to 12 months divided by the standard deviation (SD) of the baseline score. SRM is the mean change in score from baseline to 12 months divided by the SD of the change. ES and SRM were classified as small (0.2-0.49), moderate (0.5-0.79), and large (≥0.8).15
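The two responsiveness statistics differ only in their denominator. A minimal sketch of the definitions above follows, applied to paired baseline and 12-month scores. The five patients are hypothetical values chosen for illustration, the function names are not from the study, and treating a value of exactly 0.8 as large is an assumption.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """ES = mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM = mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def classify_magnitude(value):
    """Cutoffs from the text: small 0.2, moderate 0.5, large 0.8."""
    v = abs(value)
    if v >= 0.8:  # 0.8 itself treated as large (assumption)
        return "large"
    if v >= 0.5:
        return "moderate"
    if v >= 0.2:
        return "small"
    return "below small"


# Hypothetical paired utility scores for five patients.
baseline = [0.60, 0.70, 0.80, 0.65, 0.75]
follow_up = [0.68, 0.74, 0.82, 0.75, 0.76]

es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
print(round(es, 3), classify_magnitude(es))
print(round(srm, 3), classify_magnitude(srm))
```

Because the SRM divides by the variability of the change rather than of the baseline, the two statistics can fall in different categories for the same data.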
Validity properties of the utility scores, that is, whether an instrument measures what it is intended to measure, were assessed in two ways. Convergent and discriminant validity, the degree to which two measures that should be related or unrelated are, in fact, related or unrelated, was analyzed by comparing changes in scores of the EQ-5D and SF-6D at 12 months between surgical successes and failures and assessing whether surgical successes had larger utility gains than failures. Concurrent validity, the relationship of an instrument with other measures of the same or similar construct that are measured at the same time, was analyzed by correlating changes in scores of the EQ-5D and SF-6D with each other and with the condition-specific instruments PFDI and PFIQ. Correlations were classified as low (<0.3), moderate (0.3-0.5), and high (>0.5).16 Potential biases in the utility score measures from the SF-6D and EQ-5D were explored using a Bland–Altman plot to examine whether differences between the two measures depended on the initial health status of a patient.17
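A Bland–Altman plot places the average of the two instruments on the horizontal axis and their difference on the vertical axis for each patient. The sketch below computes those quantities. The 95% limits of agreement (bias ± 1.96 SD of the differences) are the conventional companion to the plot and are included as an assumption, since the paper does not report them; the scores are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Return per-patient pair means and differences, the bias, and 95% limits.

    scores_a and scores_b are paired measurements (for example EQ-5D and
    SF-6D) on the same patients at the same visit.
    """
    pair_means = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return pair_means, diffs, bias, limits


# Hypothetical baseline scores for four patients.
eq5d = [0.50, 0.70, 0.85, 0.95]
sf6d = [0.58, 0.70, 0.80, 0.86]

pair_means, diffs, bias, limits = bland_altman(eq5d, sf6d)
for m, d in zip(pair_means, diffs):
    print(round(m, 3), round(d, 3))  # x and y coordinates of each point
print(round(bias, 3), tuple(round(x, 3) for x in limits))
```

A trend in the differences across the range of pair means, rather than a flat scatter around the bias, is what indicates that disagreement depends on baseline health status.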
The minimal important difference (MID), the smallest change that can be regarded as clinically meaningful, was estimated for the utility scores by applying both anchor-based and distribution-based methods per current recommendations.18,19,20,21 Anchor-based methods examine the relationship between an HRQOL measure and an independent external measure (or anchor) to elucidate the meaning of a particular change in the health construct.18,20,21 The anchor-based MID approach used the difference in utility score corresponding to a self-reported small, but important, change on question 2 of the SF-36, 12 months postoperatively. Question 2, which is not part of the SF-6D, asks if general health is much better, somewhat better, the same, somewhat worse, or much worse compared with before surgery. The MID was defined as the difference between the mean change in utility scores for patients whose global rating score was “somewhat better” and the mean change in utility for patients who reported they were “the same”. Distribution-based MID approaches relate utility changes to either variability (e.g., standard deviation) or reliability (e.g., Cronbach’s α).22 Our approach focused on variability and was based on the baseline standard deviation of utility scores. The distribution-based MID was defined as 0.5 × baseline SD (i.e., medium effect size) and 0.2 × baseline SD (i.e., small effect size).16 MID estimates of the anchor- and distribution-based approaches were compared, and recommendations for the MID of the SF-6D and EQ-5D were made by consensus, consistent with the recommendations of Revicki et al.19
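Both MID definitions reduce to short calculations on patient-level data. The sketch below shows one way to compute them, assuming the anchor responses are stored as the text labels used above. The function names and the seven hypothetical patients are for illustration and are not the study's analysis code, which used ANOVA and mixed models.

```python
from statistics import mean, stdev


def anchor_based_mid(changes, anchor):
    """Mean change among 'somewhat better' minus mean change among 'the same'."""
    better = [c for c, a in zip(changes, anchor) if a == "somewhat better"]
    same = [c for c, a in zip(changes, anchor) if a == "the same"]
    return mean(better) - mean(same)


def distribution_based_mid(baseline, fraction):
    """fraction x baseline SD, with fraction 0.2 (small ES) or 0.5 (medium ES)."""
    return fraction * stdev(baseline)


# Hypothetical utility changes and global rating responses.
changes = [0.06, 0.04, 0.05, 0.02, 0.03, 0.01, 0.10]
anchor = [
    "somewhat better", "somewhat better", "somewhat better",
    "the same", "the same", "the same",
    "much better",  # ignored: only the two adjacent categories are used
]
baseline = [0.60, 0.70, 0.80, 0.65, 0.75, 0.72, 0.68]

print(round(anchor_based_mid(changes, anchor), 3))
print(round(distribution_based_mid(baseline, 0.2), 3))
print(round(distribution_based_mid(baseline, 0.5), 3))
```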
Statistical Methods:
Demographic data are presented as percentages or means. Categorical data were compared between studies using Pearson’s Chi-square; continuous variables were compared using ANOVA. Ordinal variables such as pelvic organ prolapse stage were compared using a Kruskal-Wallis test.
Responsiveness:
Utility score change from baseline to 12 months was estimated and compared to zero for each study using a t-test and for the combined studies using a t-test from an individual patient data random effects meta-analysis mixed effects linear model using the Kenward-Roger method to estimate degrees of freedom.23, 24 ES was calculated as the mean change divided by the SD of the baseline score. SRM was calculated as the mean change divided by the SD of the change. For the combined studies, the within-study SD estimate was obtained using ANOVA.
Validity:
Changes in utility scores were compared between surgical successes and surgical failures for each study using ANOVA and for the combined studies using a meta-analysis mixed effects model as described above. Pearson correlations were calculated for changes in scores with each other and with condition-specific instruments.
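The concurrent validity analysis rests on Pearson correlations between change scores, which were then classified as low, moderate or high. A minimal sketch follows. The data are hypothetical, and expressing the condition-specific change as an improvement (baseline minus follow-up) and classifying on the absolute value of r are assumptions made here so that the sign convention does not affect the category.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def classify_correlation(r):
    """Cutoffs from the Methods: low <0.3, moderate 0.3-0.5, high >0.5."""
    magnitude = abs(r)
    if magnitude < 0.3:
        return "low"
    if magnitude <= 0.5:
        return "moderate"
    return "high"


# Hypothetical change scores for five patients.
utility_change = [0.08, 0.02, 0.05, 0.00, 0.10]
pfdi_improvement = [60, 10, 40, 20, 50]  # baseline minus 12-month score

r = pearson_r(utility_change, pfdi_improvement)
print(round(r, 2), classify_correlation(r))
```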
Minimal important difference:
Anchor-based MID and 95% CI in utility score mean changes for those with a global rating score of “somewhat better” minus the mean change for those with a global rating score of “the same” were estimated for each study using ANOVA and for the combined studies using the meta-analysis mixed effects model. Homogeneity of MID across studies was assessed by a test of interaction between the study and global rating score category in a supportive ANOVA model. The percentage of all patients with a change equal to or better than the MID was calculated. Distribution-based MID was calculated as 0.2 and 0.5 times the baseline standard deviation of utility scores. For the combined studies, the within-study SD estimate was obtained using ANOVA. The 95% CI of the distribution-based MID estimates were calculated using a bootstrapping approach.
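The text states only that a bootstrapping approach was used for the confidence intervals of the distribution-based MID. One common variant is the percentile bootstrap, sketched below under that assumption: patients are resampled with replacement, the MID is recomputed on each resample, and the 2.5th and 97.5th percentiles of the recomputed values form the interval. The function name, the number of resamples and the data are illustrative.

```python
import random
from statistics import stdev


def bootstrap_ci_distribution_mid(baseline, fraction=0.5, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for fraction x baseline SD (assumed method)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(baseline)
    estimates = []
    for _ in range(n_boot):
        # Resample patients with replacement and recompute the MID.
        resample = [baseline[rng.randrange(n)] for _ in range(n)]
        estimates.append(fraction * stdev(resample))
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return lower, upper


# Hypothetical baseline utility scores for ten patients.
baseline_scores = [0.52, 0.61, 0.66, 0.70, 0.71, 0.74, 0.78, 0.80, 0.85, 0.93]

lower, upper = bootstrap_ci_distribution_mid(baseline_scores, fraction=0.5)
print(round(lower, 3), round(upper, 3))
```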
All reported p-values were two-sided. All statistical analyses were done using Stata Statistical Software: Release 15 (College Station, TX) or SAS, version 9.4 (Cary, NC).
Results:
The methods and results of OPUS, OPTIMAL, CARE and COLPO have been previously published.8,9,10,11 These four studies enrolled a total of 1,314 women. (Table 1) The SF-6D was included in all four studies; the EQ-5D was included in OPUS and OPTIMAL. Women who had utility data at 12 months for the SF-6D (N=1,100) and EQ-5D (N=715) were included in the current analysis. We included 410/466 (88%) women from OPUS, 309/374 (83%) from OPTIMAL, 284/322 (88%) from CARE and 118/152 (78%) from COLPO in the current analysis; the number of included women from each trial reflects the maximum sample from the SF-6D or the EQ-5D responses. (Figure 1)
Table 1:
Summary of studies
| Study | Study Type | Study N | Prolapse Surgery Type | SF-6D N at 12 Months | EQ-5D N at 12 Months |
|---|---|---|---|---|---|
| OPUS | RCT | 337 RCT + 129 preference arm | Reconstructive | 389 | 410 |
| OPTIMAL | RCT | 374 | Reconstructive | 309 | 305 |
| CARE | RCT | 322 | Reconstructive | 284 | n/a |
| COLPO | Cohort | 152 | Obliterative | 118 | n/a |
| Total | | 1,314 | | 1,100 | 715 |
RCT=Randomized Controlled Trial
Figure 1:
STROBE diagram
The baseline characteristics for women completing the SF-6D and the EQ-5D from each of the four studies are shown in Table 2. Notable differences between study groups include an older average age for COLPO at 78.1 +/− 5.4 years, while the mean age in the other studies ranged from 57.2 +/− 10.9 to 63.8 +/− 9.9 years. There were higher stages of POP in CARE and COLPO and higher rates of prior surgeries at baseline in these cohorts. We found no difference in baseline utility scores and found there were few and small differences in baseline characteristics of subjects who were excluded from our analysis because of missing data and those who were included (data not shown).
Table 2.
Baseline Characteristics for SF-6D and EQ-5D Samples by Study
| Baseline Characteristics | SF-6D Sample: Overall | SF-6D Sample: OPUS | SF-6D Sample: OPTIMAL | SF-6D Sample: CARE | SF-6D Sample: COLPO | Within SF-6D P-value¹ | EQ-5D Sample: Overall | EQ-5D Sample: OPUS | EQ-5D Sample: OPTIMAL | Within EQ-5D P-value¹ |
|---|---|---|---|---|---|---|---|---|---|---|
| N | 1,100 | 389 | 309 | 284 | 118 | | 715 | 410 | 305 | |
| Age, mean (SD) | 62.9 (11.5) | 63.6 (10.0) | 57.2 (10.9) | 61.6 (10.3) | 78.1 (5.4) | <0.01 | 61.2 (10.7) | 63.8 (9.9) | 57.7 (10.8) | <0.01 |
| Race | | | | | | <0.01 | | | | 0.94 |
| White | 89% | 88% | 87% | 94% | 91% | | 87% | 87% | 87% | |
| Black | 6% | 6% | 5% | 5% | 8% | | 6% | 6% | 6% | |
| Other Race | 5% | 6% | 7% | 1% | 1% | | 7% | 7% | 7% | |
| Ethnicity (%) | | | | | | <0.01 | | | | 0.03 |
| Hispanic | 11% | 12% | 20% | 3% | 1% | | 15% | 12% | 18% | |
| Non-Hispanic | 89% | 88% | 80% | 97% | 99% | | 85% | 88% | 82% | |
| Baseline Prolapse Stage (%) | | | | | | <0.01 | | | | <0.01 |
| Stage 2 | 25% | 28% | 38% | 13% | 2% | | 32% | 28% | 38% | |
| Stage 3 | 63% | 63% | 57% | 69% | 64% | | 61% | 63% | 57% | |
| Stage 4 | 13% | 9% | 5% | 18% | 34% | | 7% | 9% | 5% | |
| Prior Surgery for Prolapse (%) | 19% | 14% | 6% | 38% | 24% | <0.01 | 11% | 14% | 8% | <0.01 |
| Prior Surgery for Urinary Incontinence (%) | 5% | 3% | 4% | 7% | 14% | <0.01 | 3% | 2% | 4% | 0.18 |
| Prior Hysterectomy (%) | 46% | 39% | 26% | 71% | 61% | <0.01 | 34% | 38% | 28% | <0.01 |

¹ P-value comparing studies
Responsiveness:
The mean SF-6D and EQ-5D scores improved at 12 months for each of the reconstructive studies (p < 0.01), but not the obliterative study (p=0.15). The overall SF-6D and EQ-5D scores also showed improvement (p<0.01). (Table 3) Effect sizes for the SF-6D and EQ-5D were in the small (0.2-0.49) to moderate (0.5-0.79) ranges for the reconstructive studies but < 0.2 for the obliterative study. (Table 4)
Table 3:
Responsiveness - Changes in SF-6D and EQ-5D scores at 12 months after POP surgery
| Study | SF-6D N | SF-6D Baseline Mean (SD) | SF-6D 12 Months Mean (SD) | SF-6D Change¹ Mean (SD/SE) | SF-6D P-value² | EQ-5D N | EQ-5D Baseline Mean (SD) | EQ-5D 12 Months Mean (SD) | EQ-5D Change¹ Mean (SD/SE) | EQ-5D P-value² |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall | 1,100 | 0.705 (0.126) | 0.761 (0.131) | 0.050 (SE 0.004) | < 0.0001 | 715 | 0.810 (0.149) | 0.868 (0.154) | 0.060 (SE 0.006) | < 0.0001 |
| OPUS | 389 | 0.719 (0.132) | 0.779 (0.132) | 0.060 (0.124) | <0.01 | 410 | 0.806 (0.154) | 0.854 (0.158) | 0.047 (0.146) | <0.01 |
| OPTIMAL | 309 | 0.677 (0.127) | 0.754 (0.137) | 0.077 (0.136) | <0.01 | 305 | 0.815 (0.143) | 0.887 (0.146) | 0.072 (0.151) | <0.01 |
| CARE | 284 | 0.716 (0.117) | 0.763 (0.120) | 0.047 (0.115) | <0.01 | 0 | N/A | N/A | N/A | N/A |
| COLPO | 118 | 0.703 (0.114) | 0.717 (0.122) | 0.014 (0.108) | 0.15 | 0 | N/A | N/A | N/A | N/A |

¹ Change in utility score of 0.03 is generally considered clinically significant.
² P-values in this table represent a paired t-test of utility at baseline and follow-up. Standard errors in the overall results are adjusted for clustering by trial.
Table 4:
Responsiveness - Effect size and standardized response mean for the SF-6D and EQ-5D
| Study | SF-6D N | SF-6D Effect Size² | SF-6D SRM³ | EQ-5D N | EQ-5D Effect Size² | EQ-5D SRM³ |
|---|---|---|---|---|---|---|
| Overall¹ | 1,100 | 0.399 | 0.404 | 715 | 0.390 | 0.393 |
| OPUS | 389 | 0.459 | 0.487 | 410 | 0.308 | 0.324 |
| OPTIMAL | 309 | 0.608 | 0.571 | 305 | 0.502 | 0.477 |
| CARE | 284 | 0.400 | 0.408 | n/a | n/a | n/a |
| COLPO | 118 | 0.126 | 0.133 | n/a | n/a | n/a |

¹ The overall Effect Size and SRM use standard deviations that account for within-trial clustering.
² ES is the mean change in score divided by the standard deviation (SD) of the baseline score.
³ SRM is the mean change in score divided by the SD of the change.
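As a consistency check, the per-study ES and SRM in Table 4 can be approximately recomputed from the rounded means and SDs in Table 3. The sketch below does this for the individual studies only; the overall rows are omitted because they use clustering-adjusted standard deviations that cannot be recovered from the table. Small gaps are expected because the inputs are rounded to three decimals.

```python
# (mean change, baseline SD, SD of change) transcribed from Table 3.
TABLE3 = {
    ("SF-6D", "OPUS"): (0.060, 0.132, 0.124),
    ("SF-6D", "OPTIMAL"): (0.077, 0.127, 0.136),
    ("SF-6D", "CARE"): (0.047, 0.117, 0.115),
    ("SF-6D", "COLPO"): (0.014, 0.114, 0.108),
    ("EQ-5D", "OPUS"): (0.047, 0.154, 0.146),
    ("EQ-5D", "OPTIMAL"): (0.072, 0.143, 0.151),
}

# (reported ES, reported SRM) transcribed from Table 4.
TABLE4 = {
    ("SF-6D", "OPUS"): (0.459, 0.487),
    ("SF-6D", "OPTIMAL"): (0.608, 0.571),
    ("SF-6D", "CARE"): (0.400, 0.408),
    ("SF-6D", "COLPO"): (0.126, 0.133),
    ("EQ-5D", "OPUS"): (0.308, 0.324),
    ("EQ-5D", "OPTIMAL"): (0.502, 0.477),
}


def recompute(change, sd_baseline, sd_change):
    """ES = change / baseline SD; SRM = change / SD of change."""
    return change / sd_baseline, change / sd_change


max_gap = 0.0
for key, (change, sd_baseline, sd_change) in TABLE3.items():
    es, srm = recompute(change, sd_baseline, sd_change)
    reported_es, reported_srm = TABLE4[key]
    max_gap = max(max_gap, abs(es - reported_es), abs(srm - reported_srm))
    print(key, round(es, 3), reported_es, round(srm, 3), reported_srm)

# Largest absolute gap between recomputed and reported values.
print(round(max_gap, 4))
```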
Validity:
SF-6D and EQ-5D scores improved more for POP surgical successes than failures, both overall and for the reconstructive studies. This was not seen for the obliterative study. These particular analyses were performed among the subset of women who had complete surgical success outcomes data at 12 months for the SF-6D (N=991) and EQ-5D (N=666). (Table 5) The SF-6D and EQ-5D scores moderately correlated with each other (r = 0.41) and with the PFDI, PFIQ, and all their subscales. The correlations were low (r = 0.0 to 0.30) or moderate (r = 0.30 to 0.50). (Table 6) The Bland–Altman plot to assess agreement of the SF-6D and EQ-5D in measurement of baseline utility scores suggested that the differences between the two measurements were somewhat dependent on the individual’s baseline health status. Similar to other studies’ findings, women with low baseline quality of life (average utility scores <0.6) had lower scores on the EQ-5D than on the SF-6D, while those with high baseline quality of life (average utility scores >0.8) had higher scores on the EQ-5D. Most women had mid-range utility values where the EQ-5D and SF-6D were more aligned.25 (Figure 2)
Table 5:
Validity - Change of SF-6D and EQ-5D scores for surgical successes vs. failures at 12 months1
| Study | SF-6D Subsample N¹ | SF-6D % with Surgical Success² | SF-6D Surgical Success Utility Score Change³ (SD/SE) | SF-6D Surgical Failure Utility Score Change³ (SD/SE) | SF-6D P-value | EQ-5D Subsample N¹ | EQ-5D % with Surgical Success² | EQ-5D Surgical Success Utility Score Change³ (SD/SE) | EQ-5D Surgical Failure Utility Score Change³ (SD/SE) | EQ-5D P-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall | 991 | 80.8% | 0.052 (0.005) | 0.029 (0.009) | 0.023 | 666 | 76.6% | 0.066 (0.008) | 0.038 (0.014) | 0.034 |
| OPUS | 357 | 75.1% | 0.064 (0.125) | 0.046 (0.113) | 0.211 | 381 | 75.3% | 0.052 (0.142) | 0.037 (0.149) | 0.391 |
| OPTIMAL | 287 | 76.3% | 0.087 (0.138) | 0.045 (0.131) | 0.024 | 285 | 78.2% | 0.081 (0.148) | 0.035 (0.153) | 0.035 |
| CARE | 260 | 92.7% | 0.044 (0.107) | 0.023 (0.139) | 0.538 | n/a | n/a | n/a | n/a | n/a |
| COLPO | 87 | 83.9% | 0.014 (0.110) | 0.016 (0.105) | 0.967 | n/a | n/a | n/a | n/a | n/a |

¹ Uses only the subsample that had complete surgical success outcomes data at 12 months.
² The percentage of the subsample that had surgical success outcome data available whose outcomes reflected successful surgery based on the definition of success used in the trial.
³ Change in utility score of 0.03 is generally considered clinically significant. SD shown for individual study results and SE shown for overall study results. Standard errors in the overall results are adjusted for clustering by trial.
Table 6.
Validity - SF-6D and EQ-5D correlations1 with each other and with indices of condition severity
| Measure | Total SF-6D (N=1,100) | Total EQ-5D (N=715) | OPUS SF-6D (N=389) | OPUS EQ-5D (N=410) | OPTIMAL SF-6D (N=309) | OPTIMAL EQ-5D (N=305) | CARE SF-6D (N=260) | CARE EQ-5D (N/A) | COLPO SF-6D (N=87) | COLPO EQ-5D (N/A) |
|---|---|---|---|---|---|---|---|---|---|---|
| SF-6D | - | 0.41 (N=645) | - | 0.43 (N=382) | - | 0.37 (N=263) | - | - | - | - |
| PFDI | 0.29 (N=1,047) | 0.24 (N=711) | 0.24 (N=384) | 0.15 (N=406) | 0.31 (N=269) | 0.31 (N=305) | 0.34 (N=278) | - | 0.21 (N=116) | - |
| PFDI prolapse subscore | 0.29 (N=1,056) | 0.25 (N=715) | 0.29 (N=389) | 0.19 (N=410) | 0.28 (N=269) | 0.31 (N=305) | 0.35 (N=281) | - | 0.27 (N=117) | - |
| PFDI bladder subscore | 0.26 (N=1,055) | 0.24 (N=713) | 0.23 (N=386) | 0.13 (N=408) | 0.30 (N=269) | 0.32 (N=305) | 0.30 (N=283) | - | 0.16 (N=117) | - |
| PFDI bowel subscore | 0.20 (N=1,050) | 0.14 (N=713) | 0.14 (N=387) | 0.05 (N=408) | 0.23 (N=269) | 0.21 (N=305) | 0.22 (N=278) | - | 0.13 (N=116) | - |
| PFIQ | 0.31 (N=1,043) | 0.27 (N=703) | 0.32 (N=380) | 0.21 (N=398) | 0.31 (N=269) | 0.29 (N=305) | 0.38 (N=279) | - | 0.13 (N=115) | - |
| PFIQ prolapse subscore | 0.30 (N=1,043) | 0.17 (N=703) | 0.33 (N=380) | 0.09 (N=400) | 0.27 (N=269) | 0.22 (N=305) | 0.37 (N=279) | - | 0.13 (N=115) | - |
| PFIQ bladder subscore | 0.31 (N=1,043) | 0.23 (N=703) | 0.32 (N=380) | 0.17 (N=398) | 0.31 (N=269) | 0.26 (N=305) | 0.33 (N=279) | - | 0.19 (N=115) | - |
| PFIQ bowel subscore | 0.19 (N=1,043) | 0.29 (N=705) | 0.15 (N=380) | 0.26 (N=398) | 0.24 (N=269) | 0.29 (N=305) | 0.20 (N=279) | - | 0.05 (N=115) | - |

¹ Pearson’s correlation coefficient
Figure 2:
Bland Altman plot for SF-6D and EQ-5D scores at baseline.
Minimal important difference:
Using the anchor-based method, the MID of the SF-6D ranged from 0.017 to 0.031 with a mean of 0.026; the MID of the EQ-5D ranged from 0.013 to 0.042 with a mean of 0.025. There was no statistical evidence of lack of homogeneity in the anchor-based MID estimates across the four studies that collected SF-6D utility scores (p = 0.95, F=0.13, df=3) or the two studies that collected EQ-5D scores (p=0.36, F=0.84, df=1). MID estimates were therefore combined to produce overall weighted total mean MID estimates for the SF-6D and the EQ-5D. (Table 7) The wide confidence intervals for the MID estimates, including negative values for the individual studies and EQ-5D overall, reflect both the uncertainty in the estimates and the relatively small study sizes.
Table 7:
Minimal important difference of the SF-6D and EQ-5D
| Study | SF-6D N | SF-6D Participants with change > MID¹ | SF-6D MID¹ (95% CI) | EQ-5D² N | EQ-5D² Participants with change > MID¹ | EQ-5D² MID¹ (95% CI) |
|---|---|---|---|---|---|---|
| Overall | 548 | 57.8% | 0.026 (0.007, 0.045) N=548 | 444 | 47.0% | 0.025 (−0.005, 0.054) N=350 |
| OPUS | 198 | 60.9% | 0.031 (0.000, 0.062) | 305 | 47.3% | 0.013 (−0.026, 0.051) |
| OPTIMAL | 154 | 64.1% | 0.030 (−0.009, 0.068) | 139 | 47.5% | 0.042 (−0.005, 0.088) |
| CARE | 132 | 60.2% | 0.017 (−0.019, 0.053) | n/a | n/a | n/a |
| COLPO | 64 | 43.2% | 0.025 (−0.031, 0.080) | n/a | n/a | n/a |

¹ MID anchor-based method = mean difference between the subsample of subjects reporting “somewhat better” on the global measure of change question from the SF-36 and the subsample of subjects reporting “the same.”
Note: A total of 548 subjects were included in the SF-6D MID estimate (267 reporting “better” and 281 reporting “same”). A total of 350 subjects were included in the EQ-5D MID estimate (176 reporting “better” and 174 reporting “same”).
Using the distribution-based method, a small ES (0.2 SD) corresponded to an improvement in SF-6D score that ranged from 0.023 to 0.026, with a mean of 0.025; the corresponding EQ-5D improvement ranged from 0.029 to 0.031, with a mean of 0.030. A medium ES (0.5 SD) corresponded to an improvement in SF-6D score that ranged from 0.057 to 0.066, with a mean of 0.063; the corresponding EQ-5D improvement ranged from 0.072 to 0.077, with a mean of 0.075. Figure 3 shows a forest plot of the SF-6D and EQ-5D anchor-based MID estimates for each trial and overall, with their associated confidence limits. Distribution-based MIDs with small ES (0.2 SD) were similar to the anchor-based MID point estimates, whereas distribution-based MIDs with medium ES (0.5 SD) were substantially larger and corresponded closely to the upper confidence limits of the anchor-based MIDs.
Figure 3:
Minimally important differences for the SF-6D and EQ-5D with Anchor- and Distribution-Based Methods
Comment:
This study shows that the SF-6D and EQ-5D have good validity properties and are responsive instruments for measuring the effect of reconstructive surgical treatment for POP. We observed a moderate correlation of the SF-6D and EQ-5D scores with each other and low to moderate correlations with condition-specific measures of POP symptoms and QOL, the PFDI and PFIQ, similar to other studies, demonstrating concurrent validity.7 For women undergoing reconstructive pelvic surgery for POP, scores for both instruments improved at 12 months, with greater improvement for surgical successes than failures, demonstrating responsiveness and convergent and discriminant validity.
Finally, the MID of both instruments for the surgical treatment of POP were similar to values that have previously been reported for other medical conditions. These findings suggest that SF-6D and EQ-5D are valid preference-based utility and general HRQOL measures that could be used to evaluate the cost-effectiveness of reconstructive surgeries for POP.
The values of two measures of responsiveness, ES and SRM, were in the small to moderate range for reconstructive surgical procedures, ES (0.31-0.61) and SRM (0.32-0.57), comparable to the reported responsiveness of the SF-6D and EQ-5D for a variety of other medical conditions. In an analysis of 7 studies that included irritable bowel syndrome, leg ulcers, knee osteoarthritis, orthopedic limb reconstruction, early rheumatoid arthritis, chronic obstructive pulmonary disease, and adults >65 years, Walters et al. reported that the SRM for SF-6D was 0.30 (range 0.11 to 0.48).26 Similarly, another analysis of 11 studies, also with a variety of medical conditions, showed that the SRM of SF-6D was 0.39 (range 0.12 to 0.87) and 0.24 for EQ-5D (range −0.05 to 0.43).12 Our findings suggest that SF-6D and EQ-5D measure the effect of surgical interventions for POP in a manner similar to treatments for other conditions.
The SF-6D was not responsive to the effect of obliterative surgery; scores did not significantly improve at 12 months (Table 3) and were not higher for surgical successes than failures (Table 5). This may have been due to a relatively small sample size from a single study. Alternatively, the SF-6D may not be sensitive to the effect of obliterative surgical treatment of POP in older women who may have additional comorbidities that prevent them from undergoing reconstructive surgery, greatly reduce their general quality of life or affect recovery after surgery. Our findings suggest that for women undergoing obliterative surgery, condition-specific instruments might be better than generic HRQOL instruments for measuring the efficacy of treatment, even though they do not allow calculation of utilities or comparison to other disease conditions.
MID places the magnitude of change in a context to help clinicians assess whether surgeries result in meaningful improvement in patient HRQOL. The present study used one anchor-based approach and a conservative distribution-based approach to establish the MID for the SF-6D and EQ-5D. The application of multiple methods to determine the MID in a specific patient population generally results in a range of values, as seen in this study. Clinically, a single point estimate or narrow range of the MID is most helpful. Using an integrated approach, we report MIDs of 0.026 (0.017 to 0.031) for the SF-6D and 0.025 (0.013 to 0.042) for the EQ-5D for women undergoing surgery for POP. Approximately half of women across all studies had a utility change at or above the MID. Our reported MIDs are comparable to those for other treatments and are within the range previously reported for other medical conditions. Results from preference-based measures in other populations suggest MID=0.03.27 One analysis with 7 studies reported mean SF-6D MID=0.033 (range 0.010 to 0.048)26; another with 11 studies reported mean SF-6D MID=0.041 (range 0.011 to 0.097) and mean EQ-5D MID=0.074 (range −0.011 to 0.140).12
Though both the SF-6D and EQ-5D weigh health states on a scale of 0 (dead) to 1 (optimal health) and the MID values for both instruments are similar, they are not directly comparable. In the current study, the Bland-Altman plot demonstrated differences in baseline SF-6D and EQ-5D scores. The pattern is similar to other studies that have demonstrated that the SF-6D does not appear to describe health states at the lower end of the scale as well as the EQ-5D but is better able to describe health states and detect improvements towards the top of the utility scale.28
These findings suggest that the SF-6D and EQ-5D scores are similar in their detection of changes in utility but different in the absolute amount of HRQOL measured. Therefore, while both the SF-6D and EQ-5D are able to measure the effect of POP, scores are not directly comparable. The baseline health state of women might inform the choice between SF-6D and EQ-5D utility instruments. EQ-5D may be preferred for women with lower baseline health and SF-6D preferred for women with higher baseline health.
The strengths of this study are the use of four multi-center studies of women undergoing various POP surgical procedures, the use of multiple approaches to establish MID estimates and the use of validated and widely accepted patient-reported utility and condition-specific outcome measures. Our data allow evaluation of the responsiveness and validity of generic HRQOL scales to assess the impact of surgical interventions for POP and allow evaluation of utility measures for each study and in aggregate across multiple studies.
The limitations of this study include a smaller amount of EQ-5D data from two of the four studies and a smaller number of subjects undergoing obliterative procedures from only one study. This retrospective study was limited to the SF-6D and EQ-5D instruments administered as part of the four trials; other possible utility instrument options include the 15-D29 and Health Utilities Index Mark 3 (HUI-3)30. The use of multiple utility instruments in future studies for women undergoing surgery for the treatment of POP can further help inform the choice between these instruments in this population. Confidence in the MID values of SF-6D and EQ-5D from this study should evolve over time through additional research on different populations and contextual characteristics. As with other aspects of construct validity, responsiveness and MID values are confirmed based on accumulating evidence from multiple studies.
In conclusion, the SF-6D and EQ-5D allow valid measurement of the impact of POP on HRQOL, facilitate comparison of HRQOL changes to other diseases and general population norms, and provide utility scores for calculating quality adjusted life years for health economic evaluation. Our data suggest that reasonable estimates of MID in women undergoing surgical treatment for POP are approximately 0.026 for the SF-6D and 0.025 for the EQ-5D. Concern that these generic instruments with non-pelvic-floor specific items may lack sensitivity to the unique aspects of POP and the impact on patients’ lives has limited their use. Our findings support the use of these general utility preference instruments in women undergoing reconstructive surgical treatment for POP; future intervention trials should include these measures to provide HRQOL outcome data and allow for cost-effectiveness analysis.
Condensation:
The SF-6D and EQ-5D are valid and responsive preference-based utility and general health related quality of life measures with similar minimal important differences for women undergoing surgical treatment for pelvic organ prolapse.
Implications and Contributions:
Why was this study conducted:
This study was conducted to determine the responsiveness, validity properties and minimal important difference (MID) of utility scores, as measured by the Short Form 6D (SF-6D) and EuroQol (EQ-5D), in women undergoing surgery for pelvic organ prolapse.
What are the key findings:
The SF-6D and EQ-5D have good validity properties and are responsive preference-based utility and general health related quality of life (HRQOL) measures for women undergoing surgical treatment for prolapse. The MIDs for the SF-6D and EQ-5D are similar and within the range found for other medical conditions.
What does this study add to what is already known:
The SF-6D and EQ-5D provide valid measures of HRQOL and utility scores in women with pelvic organ prolapse, allow comparison of the impact of prolapse with that of other disease states, and provide essential data for cost-effectiveness research.
Acknowledgments
Financial support for this project was provided by:
Eunice Kennedy Shriver National Institute of Child Health and Human Development and the NIH Office of Research on Women’s Health at National Institutes of Health 5U24HD069031-07
Footnotes
Disclosures:
Harvie: None
Honeycutt: None
Neuwahl: None
Barber: Royalties: Elsevier, UpToDate
Richter: Pelvalon, Consultant and grant support; Renovia, Consultant; Royalties from UpToDate; Travel funds IUGA; Travel funds ICS; NICHD, NIA, PCORI
Visco: Ninomed
Sung: None
Shepherd: Site PI for Myrbetriq trial supported by Astellas
Rogers: DSMB chair for the TRANSFORM trial sponsored by American Medical Systems; Stipend and travel from ABOG and IUGA; Royalties from UpToDate
Jakus-Waldman: None
Mazloomdoost: None
Disclaimer: None
References:
- 1. Bradley SL, Weidner AC, Siddiqui NY, Gandhi MP, Wu JM. Shifts in National Rates of Inpatient Prolapse Surgery Emphasize Current Coding Inadequacies. Female Pelvic Med Reconstr Surg 2011;17(4):204–208. doi: 10.1097/SPV.0b013e3182254cf1
- 2. Practice Bulletin No. 185: Pelvic Organ Prolapse. Obstet Gynecol 2017;130(5):e234–e250.
- 3. Drummond MF, Sculpher MJ, Torrance GW, O'Brien BJ, Stoddart GL. Methods for the Economic Evaluation of Health Care Programmes. 3rd ed. New York: Oxford University Press; 2005.
- 4. Mehrez A, Gafni A. Quality-adjusted life years, utility theory, and healthy-years equivalents. Medical decision making : an international journal of the Society for Medical Decision Making. 1989;9(2):142–149.
- 5. Brooks R. EuroQol: the current state of play. Health Policy. 1996;37(1):53–72.
- 6. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21(2):271–292.
- 7. Harvie HS, Lee DD, Andy UU, Shea JA, Arya LA. Validity of utility measures for women with pelvic organ prolapse. Am J Obstet Gynecol October 2017.
- 8. Wei JT, Nygaard I, Richter H, Nager CW, Barber MD, Kenton K, Amundsen CL, Schaffer J, Meikle SF, Spino C. A midurethral sling to reduce incontinence after vaginal prolapse repair. NEJM 2012;366(25):2358–2367.
- 9. Barber MD, Brubaker L, Burgio KL, et al. Comparison of 2 transvaginal surgical approaches and perioperative behavioral therapy for apical vaginal prolapse: the OPTIMAL randomized trial. JAMA 2014;311:1023–34.
- 10. Brubaker L, Cundiff GW, Fine P, Nygaard I, Richter HE, Visco AG, Zyczynski H, Brown MB, Weber AM; Pelvic Floor Disorders Network. Abdominal sacrocolpopexy with Burch colposuspension to reduce urinary stress incontinence. N Engl J Med 2006;354:1557–1566.
- 11. Fitzgerald MP, Richter HE, Bradley CS, Ye W, Visco AC, Cundiff GW, Zyczynski HM, Fine P, Weber AM; Pelvic Floor Disorders Network. Pelvic support, pelvic symptoms, and patient satisfaction after colpocleisis. International urogynecology journal and pelvic floor dysfunction. 2008;19(12):1603–9.
- 12. Walters SJ, Brazier JE. Comparison of the minimally important difference for two health state utility measures: EQ-5D and SF-6D. Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation. 2005;14(6):1523–1532.
- 13. Barber MD, Kuchibhatla MN, Pieper CF, Bump RC. Psychometric evaluation of 2 comprehensive condition-specific quality of life instruments for women with pelvic floor disorders. American journal of obstetrics and gynecology. 2001;185(6):1388–1395.
- 14. Bump RC, Mattiasson A, Bo K, et al. The standardization of terminology of female pelvic organ prolapse and pelvic floor dysfunction. American journal of obstetrics and gynecology. 1996;175(1):10–17.
- 15. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, de Vet HC. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737–745. doi: 10.1016/j.jclinepi.2010.02.006
- 16. Cohen J. Statistical power analysis for the behavioral sciences. Mahwah, NJ: Lawrence Erlbaum Associates; 1988.
- 17. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307–310.
- 18. Crosby RD, Kolotkin RL, Williams GR. Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol 2003;56:395–407. [PubMed: 12812812]
- 19. Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol 2008;61:102–9. [PubMed: 18177782]
- 20. Guyatt GH, Osoba D, Wu AW, Wyrwich KW, Norman GR. Methods to explain the clinical significance of health status measures. Mayo Clin Proc 2002;77:371–83. [PubMed: 11936935]
- 21. Wyrwich KW, Bullinger M, Aaronson N, Hays RD, Patrick DL, Symonds T. Estimating clinically significant differences in quality of life outcomes. Qual Life Res 2005;14:285–95. [PubMed: 15892420]
- 22. Jelovsek JE, Chen Z, Markland AD, Brubaker L, Dyer KY, Meikle S, Rahn DD, Siddiqui NY, Tuteja A, Barber MD. Minimum important differences for scales assessing symptom severity and quality of life in patients with fecal incontinence. Female Pelvic Med Reconstr Surg 2014;20(6):342–8. doi: 10.1097/SPV.0000000000000078
- 23. Higgins JP, Whitehead A, Turner RM, Omar R, Thompson SG. Meta-analysis of continuous outcome data from individual patients. Statist Med 2001;20:2219–2241.
- 24. Kenward MG, Roger JH. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 1997;53:983–997.
- 25. Obradovic M, Lal A, Liedgens H. Health and Quality of Life Outcomes 2013, 11:110 http://www.hqlo.com/content/11/1/110
- 26. Walters SJ, Brazier JE. What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Health and quality of life outcomes. 2003;1:4.
- 27. Drummond MF. Introducing economic and quality of life measures into clinical studies. Ann Med 2001;33:344–349.
- 28. Longworth L, Stirling B. An empirical comparison of EQ-5D and SF-6D in liver transplant patients. Health Econ 2003;12:1061–1067. doi: 10.1002/hec.787
- 29. Sintonen H. The 15D instrument of health-related quality of life: properties and applications. Annals of Medicine 2001;33(5):328–336. doi: 10.3109/07853890109002086
- 30. Feeny D, Furlong W, Boyle M, Torrance GW. Multi-attribute health status classification systems. Health Utilities Index. PharmacoEconomics. 1995;7(6):490–502.



