Health-Related Quality of Life After Elective Surgery: Measurement of Longitudinal Changes

Carol M Mangione; Lee Goldman; E John Orav; Edward R Marcantonio; Alex Pedan; Lynn E Ludwig; Magruder C Donaldson; David J Sugarbaker; Robert Poss; Thomas H Lee

doi:10.1046/j.1525-1497.1997.07142.x

1997 Nov;12(11):686–697.

PMCID: PMC1497188 PMID: 9383137

Abstract

OBJECTIVE

To examine the responsiveness of the 36-Item Short Form Health Survey (SF-36) to clinical changes in three surgical groups and to study how health-related quality of life (HRQL) changes with time among patients who undergo total hip arthroplasty, thoracic surgery for treatment of non-small-cell lung cancer, or abdominal aortic aneurysm (AAA) repair.

DESIGN

Prospective cohort study with serial evaluations of HRQL preoperatively and at 1, 6, and 12 months after surgery.

SETTING

University tertiary care hospital.

PATIENTS

Of 528 patients older than 50 years of age who were admitted for these elective procedures, 454 (86%) provided preoperative health status data and constitute the study cohort. At 12 months after surgery, 439 (93%) of the cohort were successfully contacted and 390 (90%) provided follow-up interviews.

MEASUREMENTS AND MAIN RESULTS

The Medical Outcomes Study SF-36, the Specific Activity Scale, five validated health transition questions, and a 0 to 100 scale measure of global health were used to assess changes in health status at 1, 6, and 12 months after surgery. Change in health status as measured by the SF-36 demonstrated that physical function and role limitations due to physical health problems were worse 1 month after these three surgeries. However, by 6 months after surgery, most patients experienced significant gains in the majority of the dimensions of health, and these gains were sustained at 12 months after surgery. Longitudinal changes in the SF-36 were positively associated with responses to the five health transition questions, to changes on the Specific Activity Scale and global health rating question, and to clinical parameters for persons who had AAA repair. These findings indicate that the SF-36 has evidence of validity and is responsive to expected changes in HRQL after elective surgery for these procedures.

CONCLUSIONS

For the total hip arthroplasty patients, responsiveness was greatest for the SF-36 scales that measure physical constructs. However, for the two other procedures and at various points of recovery, significant changes were observed for all eight subscales, suggesting that responsiveness was dependent on the type of surgery and the timing of follow-up, and that multidimensional measures are needed to fully capture changes in HRQL after surgery.

Keywords: health-related quality of life, elective surgery

In recent years, pressures to prioritize health care spending have amplified the need to assess and compare the outcomes of medical interventions.¹,⁴ To this end, investigators have developed questionnaires that broaden the focus beyond traditional clinical outcomes to include the patient's perception of health-related quality of life (HRQL).⁵,⁹ This investigation assesses the responsiveness of the Medical Outcomes Study (MOS) 36-Item Short Form Health Survey (SF-36 or RAND 36-Item Health Survey 1.0) when it is used to monitor changes in HRQL after major elective surgery and also examines how HRQL changes with time after three specific surgical procedures.¹⁰,¹³ The elective surgical population is of particular interest because of the high costs and uncertain benefits associated with many procedures.

The SF-36 is an attractive measure because of its brevity, patient acceptability, and rigorous psychometric development.⁹,¹⁸ In addition, published population-based data greatly enhance the interpretation of SF-36 scores.¹⁷ The SF-36 has well-established reliability and validity in cross-sectional studies of medical and psychiatric patients,¹¹,¹⁸,²¹ and in longitudinal studies of selected medical conditions such as diabetes mellitus,¹⁵,²² end-stage renal disease,¹⁶ HIV infection,²³,²⁴ asthma,²⁵ coronary disease,²⁶ and psychiatric diseases.²⁷,²⁹

Except for arthroplasty surgery,³⁰,³⁵ however, only a few surgery-specific studies of highly select patients have provided data to support the SF-36's responsiveness to the effects of elective surgery over time.³⁶,³⁹ This is the first investigation to examine simultaneously the responsiveness of the SF-36 across multiple procedures at various postoperative time intervals. These analyses describe preoperative HRQL and observed changes in health status after three major elective surgical procedures: total hip arthroplasty (THA); thoracic surgery for the treatment of non-small-cell lung cancer; and resection of abdominal aortic aneurysm (AAA).

Subjects

Eligible subjects were patients older than 50 years of age who were admitted to Brigham and Women's Hospital, a 720-bed urban teaching hospital, between November 1, 1990, and May 30, 1993, for unilateral THA for the treatment of hip dysfunction caused by osteoarthritis, thoracic surgery (lobectomy or pneumonectomy) for the treatment of non-small-cell lung cancer, or repair of AAA. Eligibility criteria included the ability to speak English and adequate hearing and cognitive function to complete a preoperative self-administered health status questionnaire and postoperative telephone interviews. During the study period, 528 (95%) of the eligible patients who were approached by research staff gave informed consent and were enrolled in the study. Of these, 454 patients (86%) completed the preoperative health status questionnaire and constituted the cohort for this report. The distributions of gender, age, and race among the persons who did not return the preoperative questionnaire were similar to those among persons who did. However, a greater proportion of those who did not return the survey had three or more coexistent chronic medical conditions (p < .05 for linear trend).

Of the 454 participating patients, 5 died before the 1-month follow-up interview, 1 more died before the 6-month follow-up, and 17 more died before the 12-month interview (Fig. 1). Of these 23 deaths, 14 were among thoracic surgery patients, 6 among those who had AAA repair, and 3 among the THA patients. Because changes in HRQL are relevant only for persons who survive, subjects were excluded from the analyses from the time of their death forward.

Data Collection

All subjects were evaluated a median of 1 day before surgery. Data were prospectively collected on medical history, including 26 underlying chronic medical conditions, physical examination, and mental status. The MOS SF-36 was administered preoperatively and at 1, 6, and 12 months after surgery. The SF-36 includes questions that evaluate eight health dimensions: health perception, physical function, role limitations due to physical health problems, role limitations due to emotional problems, mental health, social function, vitality, and bodily pain.⁹ Subjects also completed the Specific Activity Scale (SAS) before surgery and at 6 months after surgery. The SAS is an ordinally scaled, four-class measure of cardiovascular physical functioning (1 = best to 4 = worst) based on the metabolic expenditures of various personal care, housework, occupational, and recreational activities.⁴⁰,⁴¹

To determine whether participants considered their health to be unchanged, better, or worse when compared with their preoperative health, the 6-month follow-up interview included five dimension-specific health transition questions (Appendix A).⁴² Finally, during all four interviews patients were asked the following question as a measure of global health:

If you were to rate your current health on a scale from 0 to 100 with 100 being perfect health and 0 being death, what number would you rate yourself today?

Responses from the non-SF-36 questions were used to evaluate the validity of the observed changes in the SF-36 after each of the elective procedures. Patients were interviewed at 1, 6, and 12 months after surgery to determine whether the SF-36 would be sensitive to short-term disability from elective surgery, and to the long-term benefits of surgery or declines in health due to progression of the underlying condition.

Rationale for the Selection of Surgical Procedures and Expected Changes in Quality of Life

We used the framework provided by the physical, mental, and combined constructs represented in the SF-36¹¹ to classify the expected surgery-specific changes in SF-36 scores over time (Table 1). A range of anticipated changes in HRQL was derived from published condition-specific descriptions of outcomes after THA, thoracic surgery for lung cancer, and repair of AAA. We used these anticipated changes to evaluate the responsiveness of the SF-36 relative to expected longitudinal changes in clinical status.

Table 1.

Hypothesized Changes in the 36-Item Short Form Health Survey's Physical, Mental, and Combined Health Constructs Among Survivors*


Total Hip Arthroplasty

Previous studies of outcomes after THA have demonstrated dramatic improvements in HRQL, particularly for the dimensions of pain and physical functioning.³⁶,³⁹ From published findings, we anticipated that after transient procedure-related declines in physical health, THA patients would report large improvements on the SF-36 subscales that share variance with the physical construct and moderate improvements on the combined construct of the SF-36.¹¹ Because chronic pain can influence the severity of symptoms related to emotional well-being, we also expected subscales that share variance with the mental construct to improve moderately after surgery (Table 1).

Thoracic Surgery for Lung Cancer

As described by Dales et al.,⁴³ we expected patients to report substantial declines in physical functioning during the immediate postoperative period followed by improvements with possible return to preoperative levels by 12 months after surgery for those who remained disease-free. If the SF-36 is a useful tool for the longitudinal assessment of persons with lung cancer, it should be sensitive to the immediate physical and emotional consequences of lost respiratory capacity due to surgery and postoperative pain, as well as the long-term effect of uncertainty about prognosis (Table 1).

Abdominal Aortic Aneurysm Repair

Repair of AAA is a procedure with known moderate to severe short-term morbidity; therefore, we expected improvement among survivors on the subscales that represent the mental health construct. Because vascular disease of the aorta rarely occurs in isolation, we expected these patients to have more coexistent vascular disease at the time of surgery. Previous investigators have demonstrated that the SF-36 is sensitive to the effect of coexistent medical conditions.⁴⁴–⁴⁶ For these reasons we expected the AAA patients to have pronounced short-term declines in SF-36 subscales that represent the physical and combined constructs. For those who survived the procedure, we expected a return to preoperative levels of HRQL, rather than improvements related to the procedure (Table 1).

Statistical Analysis

Health Status Scores

The scoring algorithm for the SF-36 was identical to that described by Ware et al.¹⁷ Mean scores were calculated by surgical procedure. The percentages of persons with the highest and lowest possible preoperative SF-36 scores also were recorded.
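The share of respondents at the highest and lowest possible score on a subscale can be tabulated as in the minimal sketch below. It assumes each subscale has already been scored on a 0 to 100 range and that missing responses are recorded as `None`. The function name and the example scores are hypothetical and are not taken from the study data.

```python
def floor_ceiling_percentages(scores, lowest=0.0, highest=100.0):
    """Return (% at floor, % at ceiling) among non-missing subscale scores."""
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("no observed scores")
    n = len(observed)
    at_floor = sum(1 for s in observed if s == lowest)
    at_ceiling = sum(1 for s in observed if s == highest)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n


# Hypothetical preoperative scores for one subscale (illustration only).
example_scores = [0, 0, 25, 50, 100, 100, 100, None, 75, 0]
floor_pct, ceiling_pct = floor_ceiling_percentages(example_scores)
print(f"floor: {floor_pct:.1f}%  ceiling: {ceiling_pct:.1f}%")
```

A large share at either extreme matters for longitudinal work because scores at the floor cannot fall further and scores at the ceiling cannot rise.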

Changes in Health Status over Time

Mean changes in health status scores at 1, 6, and 12 months after surgery relative to preoperative scores were calculated for the three surgical groups. Student's paired t tests were used to assess the significance of within-subject changes in the SF-36 subscales and the 0 to 100 global rating scale. The Wilcoxon signed-rank test was used to test for significant change in the SAS at 6 months after surgery.⁴⁷

Responsiveness of the SF-36

To assess the responsiveness of each SF-36 subscale, we compared the square of the absolute value of the T statistic ([12-month after-surgery score − preoperative score]/SD of the difference) for each subscale with that of the subscale with the largest T statistic. This comparison simultaneously accounts for the magnitude of change over time and the variability in each subscale. The relative efficiency statistic, (T₁/T₂)², first described by Liang et al.,⁴⁸ identifies which SF-36 subscale at each point in time is most responsive for the three procedures under investigation.
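The relative efficiency comparison can be illustrated with a short sketch. It assumes the T statistic is the usual paired t statistic (mean within-subject change divided by the standard error of the change); the text describes the denominator as the SD of the difference, and when every subscale has the same number of respondents the two versions give identical ratios. All function names and scores below are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t statistic for within-subject change (post minus pre)."""
    diffs = [after - before for before, after in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def relative_efficiency(t_by_subscale):
    """Squared ratio of each |t| to the largest |t| across subscales."""
    t_max = max(abs(t) for t in t_by_subscale.values())
    return {name: (abs(t) / t_max) ** 2 for name, t in t_by_subscale.items()}


# Hypothetical preoperative and 12-month scores for five patients.
t_stats = {
    "physical_function": paired_t([30, 40, 35, 50, 45], [60, 65, 70, 72, 80]),
    "mental_health": paired_t([70, 75, 80, 65, 72], [72, 74, 85, 66, 75]),
}
efficiency = relative_efficiency(t_stats)
for name, value in efficiency.items():
    print(f"{name}: {value:.3f}")
```

By construction the most responsive subscale has a relative efficiency of 1, and every other subscale falls between 0 and 1.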

Validity of the SF-36

To evaluate the validity of the longitudinal changes observed with the SF-36 for the three surgical groups, we conducted three statistical comparisons.⁴⁹ The first test compared the magnitude of the correlations between changes in the SF-36 and changes in the overall-health-rating question (0–100) and SAS class. We interpreted the clinical significance of these observed correlations by adopting a framework described by McHorney et al.,¹¹ in which correlations that are greater than or equal to 0.70 are classified as “high,” those between 0.3 and 0.7 are considered to be “moderate,” and those less than 0.3 are considered to be “low.” Because response to the 0 to 100 rating question was based on a person's synthesis of overall health, our expectation was that changes in the response to this question would be moderately correlated with the SF-36 subscales that contribute to the combined construct (health perception and vitality) and that the remainder of the subscales would have low correlations (<0.3) with the 0 to 100 rating question.

The SAS covers some physical activities that are similar to those found in the SF-36 physical functioning scale; therefore we expected changes in this measure to correlate moderately (0.3–0.7) with changes in the SF-36 subscales that measure the physical construct (physical functioning, role limitations due to physical problems, and bodily pain). Because poorer cardiovascular functioning can lead to fatigue, we expected moderate correlations with changes in SAS and the SF-36 vitality scale. We expected low correlations between the SAS and the remaining SF-36 scales.
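The correlation framework described above can be written as a small classifier. This is a sketch under stated assumptions: the text does not say which correlation coefficient was used, so a Pearson coefficient is shown as one possibility, and the classifier uses the absolute value because SAS class is scored 1 = best to 4 = worst, which would make its correlations with SF-36 change negative. The function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def classify_correlation(r):
    """Label a correlation as high (>= 0.70), moderate (0.30 to < 0.70), or low."""
    magnitude = abs(r)  # assumption: the sign is ignored when classifying
    if magnitude >= 0.70:
        return "high"
    if magnitude >= 0.30:
        return "moderate"
    return "low"


# Hypothetical change scores for six patients (illustration only).
sf36_change = [10, 25, -5, 30, 15, 0]
global_change = [5, 20, 0, 10, 20, -5]
r = pearson_r(sf36_change, global_change)
print(f"r = {r:.2f} ({classify_correlation(r)})")
```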

A person's impression of transition in HRQL is the closest proxy to a “gold standard” for change in HRQL (Appendix A).⁴² Our second statistical comparison uses data from the transition questions to compare the magnitude of the dimension-specific changes in SF-36 scores for participants who reported that they were better, the same, or worse at 6 months after surgery. To provide a reference point for the clinical interpretation of the SF-36 scores, the third comparison uses Student's t tests for pairwise comparisons of preoperative and postoperative scores with published age- and gender-adjusted population-based scores.¹⁷ The published scores were weighted to match the surgery-specific distribution of age and gender in our sample.
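Weighting published norms to a sample's age and gender mix amounts to a weighted average of stratum-specific norms, with weights equal to the number of sample members in each stratum. The sketch below shows that arithmetic only. The strata, norm values, and counts are invented for illustration and are not taken from the published norms or from this cohort, and the function name is hypothetical.

```python
def weighted_norm(norm_by_stratum, sample_counts):
    """Average stratum-specific norms, weighted by the sample's stratum counts."""
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    return sum(norm_by_stratum[s] * n for s, n in sample_counts.items()) / total


# Hypothetical norms for one subscale by (gender, age band).
norms = {
    ("female", "55-64"): 80.0,
    ("female", "65+"): 70.0,
    ("male", "55-64"): 82.0,
    ("male", "65+"): 72.0,
}
# Hypothetical number of patients per stratum in one surgical group.
counts = {
    ("female", "55-64"): 20,
    ("female", "65+"): 40,
    ("male", "55-64"): 10,
    ("male", "65+"): 30,
}
adjusted = weighted_norm(norms, counts)
print(f"weighted population norm: {adjusted:.1f}")
```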

Tests of Clinical Validity: Procedure-Specific Analyses of SF-36 Scores

Within the AAA subgroup, validity of the observed SF-36 subscale changes was also assessed using clinical parameters by comparing mean change at 1 year after surgery among those with more than three coexistent medical conditions relative to those with fewer than three comorbidities. From the work of Greenfield et al.,⁴⁴–⁴⁶ we expected that those with more medical comorbidity would have greater declines in HRQL over time.

Reliability of the SF-36

To determine whether the SF-36 was equally reliable across different surgical categories, we calculated Cronbach's coefficient α as a measure of internal consistency for each of the multi-item subscales of the SF-36 in each of the groups.⁵⁰ Significance was defined as a two-tailed p value ≤.05 for all statistical tests.
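For reference, Cronbach's coefficient α for a subscale with k items is k/(k − 1) × (1 − sum of item variances / variance of the summed score). The sketch below computes it with the standard library. The item responses are hypothetical, and the function name is an illustrative choice rather than anything from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` is a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses from five patients to a three-item subscale.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [5, 5, 4],
    [1, 2, 1],
    [4, 4, 5],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Computing α separately within each surgical group, as the text describes, would mean calling the function once per group and subscale.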

Population

The 454 patients (54% female; 96% white) had a mean (± SD) age of 67 ± 9 years (Table 2). Common comorbid conditions were hypertension (42%), arthritis (22%), ischemic heart disease (22%), and diabetes mellitus (8%); 238 (52%) of the patients had only one or no comorbid conditions (Table 2). Patients in the three surgical groups varied by demographic and medical characteristics.

Table 2.

Description of the Study Population (n= 454)


The preoperative SF-36 scores that contribute to the physical construct (physical function, role limitations due to physical health problems, and bodily pain) were lowest for patients scheduled for THA, whereas scores on the SF-36 health perception subscale were lower for patients with potentially life-threatening conditions (lung cancer or AAA) (Table 3). More than one third of the preoperative scores were at the ceiling or floor of the role-physical and role-emotional subscales and the social-functioning subscale, indicating that for these three subscales, scores after surgery could only stay the same or change in one direction. Reliability estimates based on Cronbach's coefficient α ranged from 0.86 to 0.92 for the SF-36 subscales.

Table 3.

Preoperative 36-Item Short Form Health Survey Scores by Surgical Procedure (n= 454)


Responsiveness of the SF-36 to Within-Procedure Change in Health Status over Time

In this section we first compare the observed changes in SF-36 scores to the expected changes described in Table 1. We then report which SF-36 scale is most responsive for each procedure, and finally we compare the preoperative and postoperative SF-36 scores to a population-based sample. As was hypothesized, for patients undergoing any one of the elective surgeries under consideration, SF-36 scores obtained 1 month after surgery demonstrated significant declines in physical functioning and role limitations due to physical problems (Table 4).

Table 4.

Responsiveness of the 36-Item Short Form Health Survey by Surgical Procedure
