Clinical Orthopaedics and Related Research. 2021 May 20;479(9):2022–2032. doi: 10.1097/CORR.0000000000001794

Test-retest Reliability and Construct Validity of the Satisfaction with Treatment Result Questionnaire in Patients with Hand and Wrist Conditions: A Prospective Study

Willemijn A De Ridder 1,2,3,4, Yara E van Kooij 1,2,4, Guus M Vermeulen 3, Harm P Slijper 1,2,3, Ruud W Selles 1,2, Robbert M Wouters 1,2,4, and the Hand-Wrist Study Group
PMCID: PMC8373545  PMID: 34014631

Abstract

Background

A patient’s satisfaction with a treatment result is an important outcome domain as clinicians increasingly focus on patient-centered, value-based healthcare. However, to our knowledge, there are no validated satisfaction metrics focusing on treatment results for hand and wrist conditions.

Questions/purposes

Among patients who were treated for hand and wrist conditions, we asked: (1) What is the test-retest reliability of the Satisfaction with Treatment Result Questionnaire? (2) What is the construct validity of that outcomes tool?

Methods

This was a prospective study using two samples: a test-retest reliability sample and a construct validity sample. For the test-retest sample, data collection took place between February 2020 and May 2020, and we included 174 patients at the end of their treatment with complete baseline data that included both the primary test and the retest. Test-retest reliability was evaluated with a mean time difference of 7.2 ± 1.6 days. For the construct validity sample, data collection took place between January 2012 and May 2020. We included 3742 patients who completed the Satisfaction with Treatment Result Questionnaire, VAS, and the Net Promotor Score (NPS) at 3 months. Construct validity was evaluated using hypothesis testing in which we correlated the patients’ level of satisfaction to the willingness to undergo the treatment again, VAS scores, and the NPS. We performed additional hypothesis testing on 2306 patients who also completed the Michigan Hand Outcomes Questionnaire (MHQ). Satisfaction with the treatment result was measured as the patients’ level of satisfaction on a 5-point Likert scale and their willingness to undergo the treatment again under similar circumstances.

Results

We found high reliability for level of satisfaction measured on a Likert scale (intraclass correlation coefficient 0.86 [95% CI 0.81 to 0.89]) and almost-perfect agreement for both level of satisfaction measured on the Likert scale (weighted kappa 0.86 [95% CI 0.80 to 0.91]) and willingness to undergo the treatment again (kappa 0.81 [95% CI 0.70 to 0.92]) of the Satisfaction with Treatment Result Questionnaire. Construct validity was good to excellent as seven of the eight hypotheses were confirmed. In the confirmed hypotheses, there was a moderate-to-strong correlation with VAS pain, VAS function, NPS, MHQ pain, and MHQ general hand function (Spearman rho ranged from 0.43 to 0.67; all p < 0.001) and a strong to very strong correlation with VAS satisfaction and MHQ satisfaction (Spearman rho 0.73 and 0.71; both p < 0.001). The rejected hypothesis indicated only a moderate correlation between the level of satisfaction on a 5-point Likert scale and the willingness to undergo the treatment again under similar circumstances (Spearman rho 0.44; p < 0.001).

Conclusion

The Satisfaction with Treatment Result Questionnaire has good-to-excellent construct validity and very high test-retest reliability in patients with hand and wrist conditions.

Clinical Relevance

This questionnaire can be used to reliably and validly measure satisfaction with treatment result in striving for patient-centered care and value-based healthcare. Future research should investigate predictors of variation in satisfaction with treatment results.

Introduction

Patient-centered care and value-based healthcare frameworks have gained recognition globally in recent years [2, 3, 23]. In these frameworks, the patient is central and the aim is to achieve high value at low cost [2, 3, 23]. Using patient-reported outcome measurements (PROMs) is an important aspect of patient-centered care and value-based healthcare [8, 12, 15, 18, 30], and the degree to which a patient is satisfied with his or her treatment result may be one of the most important and relevant PROMs to use [2-4, 23, 24]. Satisfaction with the treatment result is measured worldwide using variations of the question “how satisfied are you with your treatment results so far?” [19], but there are doubts about the reliability and validity of measuring satisfaction with the treatment result [9, 25, 32]. The patient’s opinion about the treatment result seems too difficult to measure and depends on variables such as pretreatment expectations, environmental factors, and psychological factors [25].

Although the same considerations apply to any PROM, several studies have shown that many well-designed PROMs are valid and reliable [10, 16, 22]. However, to the best of our knowledge, no studies on the reliability and validity of a PROM evaluating patient satisfaction with the treatment result are currently available. Further, as satisfaction with treatment result is important for patient-centered care and value-based healthcare, and it is also considered an essential outcome domain by the recently published International Consortium for Health Outcomes Measurement standard set for hand and wrist conditions [11], it is essential to further investigate the psychometric properties of measures for evaluating satisfaction with treatment results.

Among patients who were treated for hand and wrist conditions, we asked: (1) What is the test-retest reliability of the Satisfaction with Treatment Result Questionnaire? (2) What is the construct validity of that outcomes tool?

Patients and Methods

Study Design

This was a prospective study on the test-retest reliability and construct validity of the Satisfaction with Treatment Result Questionnaire using a prospective, population-based sample of patients with hand and wrist conditions from the Hand-Wrist Study Group cohort. This study was reported following the Strengthening the Reporting of Observational Studies in Epidemiology statement [31].

Setting

Data were collected at Xpert Clinic and Handtherapie Nederland, currently comprising 28 clinics for hand surgery and therapy in The Netherlands. Twenty-three surgeons certified by the Federation of European Societies for Surgery of the Hand and more than 150 hand therapists are employed at our treatment centers.

The primary data collection on satisfaction with the treatment result was part of the usual care and occurred between January 2012 and May 2020, and additional retest data for satisfaction with the treatment result were collected between February 2020 and May 2020. Data were collected using GemsTracker electronic data capture tools (GemsTracker 2020, Erasmus MC and Equipe Zorgbedrijven). GemsTracker is a secure internet-based application for the distribution of questionnaires and forms during clinical research and quality registrations. Details on the Hand-Wrist Study Group cohort have been published [26].

Participants

In this study, we used two samples: a sample to evaluate the test-retest reliability and a sample to evaluate the construct validity.

Patients were eligible for inclusion in the test-retest reliability sample if they were treated for any hand or wrist condition and completed the Satisfaction with Treatment Result Questionnaire at the final timepoint of outcome measurement, as defined within the Hand-Wrist Study Group cohort [26]. We chose the final timepoint because we expected little or no change in health status at that point. The following timepoints were included: 3 months after minor surgery or nonsurgical treatment (for example, trigger finger release or exercise therapy), 6 months after treatment for neuropathies or Dupuytren contracture (such as carpal tunnel release or limited fasciectomy), and 12 months after more extensive surgery (for example, thumb carpometacarpal resection arthroplasty). Five to 7 days after completing the Satisfaction with Treatment Result Questionnaire, patients were invited to complete the questionnaire again to evaluate the test-retest reliability. The retest questionnaire was available for 6 days after patients received the invitation, creating a time interval of 5 to 13 days as we hypothesized that the construct of satisfaction with treatment result remained stable over that time frame. The average time between the primary test and retest of the Satisfaction with Treatment Result Questionnaire was 7.2 ± 1.6 days.

Patients were eligible for inclusion in the construct validity sample if they completed the Satisfaction with Treatment Result Questionnaire, VAS for pain during physical load, VAS function, VAS satisfaction with the hand, and the Net Promotor Score (NPS) at 3 months after treatment. We used 3 months as a timepoint because the NPS was only administered at this timepoint. We additionally composed a subset of patients within this sample that also completed the Michigan Hand Outcomes Questionnaire (MHQ). This was only a subset of patients as this questionnaire is not administered for every patient in our cohort. We included patients who underwent one of the following common treatments: nonsurgical treatment for carpometacarpal osteoarthritis of the thumb, surgical treatment for carpometacarpal osteoarthritis of the thumb, nonsurgical treatment for wrist tendonitis or tenosynovitis, three-ligament tenodesis (modified Brunelli procedure), trigger finger release, proximal interphalangeal joint arthroplasty, limited fasciectomy, and carpal tunnel release. We chose these treatments because they are the most common treatments in our eight different measurement tracks [26].

For the test-retest reliability sample, we screened 330 consecutive patients who completed the Satisfaction with Treatment Result Questionnaire at the final timepoint. Of those, 287 had complete baseline sociodemographics and were invited to complete the retest. One hundred thirteen patients did not respond; thus, we included 174 patients in the test-retest reliability sample (Table 1).

Table 1.

Baseline characteristics of the first sample for test-retest reliability (n = 174 patients)

Variable Value
Age in years 56.4 ± 12.8
Male sex 41 (72)
Dominant side treated 53 (93)
Initial (rather than second) opinion 96 (167)
Type of work
 Unemployed 35 (61)
 Light physical labor 29 (51)
 Moderate physical labor 30 (53)
 Heavy physical labor 5 (9)
Symptom duration in months 22.6 ± 45
Measurement track
 Thumb regular 7 (12)
 Thumb extended 9 (15)
 Wrist regular 18 (31)
 Wrist extended 10 (18)
 Finger regular 19 (33)
 Finger extended 5 (8)
 Dupuytren contracture 14 (25)
 Nerve compression or decompression 18 (32)
Primary test of satisfaction with treatment result: Question 1
 Excellent 22 (38)
 Good 36 (62)
 Fair 21 (36)
 Moderate 17 (30)
 Poor 5 (8)
Primary test of satisfaction with treatment result: Question 2 = no 20 (34)

Data are presented as % (n) or mean ± SD, as appropriate.

For the construct validity sample, we screened 31,846 patients with complete sociodemographics, and within that group, 17,166 patients completed the Satisfaction with Treatment Result Questionnaire at 3 months. Of those, 3742 patients also completed the VAS and the NPS and were included in the construct validity sample (Table 2). Within the construct validity sample, 2306 patients also completed the MHQ and were included in the subanalyses for construct validity (Fig. 1).

Table 2.

Baseline characteristics of the second sample: the entire construct validity sample and the subset of patients who also completed the Michigan Hand Outcomes Questionnaire (MHQ)

Variable Entire construct validity sample (n = 3742) Subset that completed MHQ (n = 2306)
Age in years 59 ± 12 62 ± 10
Male sex 41 (1542) 46 (1067)
Dominant side treated 51 (1899) 99 (2272)
Initial (rather than second) opinion 98 (3661) 46 (1061)
Type of work
 Unemployed 42 (1552) 47 (1088)
 Light physical labor 26 (977) 27 (614)
 Moderate physical labor 22 (830) 19 (428)
 Heavy physical labor 10 (383) 8 (176)
Treatment
 Nonsurgical treatment for CMC-1 OA 15 (570) 23 (532)
 Surgical treatment for CMC-1 OA 8 (317) 13 (309)
 Nonsurgical treatment for wrist tendonitis or tenosynovitis 8 (282) 0 (0)
 Three-ligament tenodesis (modified Brunelli) 2 (80) 0 (0)
 Trigger finger release 24 (879) 36 (830)
 PIP joint arthroplasty 1 (28) 1 (21)
 Limited fasciectomy 17 (624) 27 (614)
 Carpal tunnel release 26 (962) 0 (0)
Symptom duration in months 21.8 ± 37 23.9 ± 38
Satisfaction with treatment result: Question 1
 Excellent 21 (766) 18 (412)
 Good 38 (1434) 38 (877)
 Fair 25 (940) 26 (604)
 Moderate 12 (442) 14 (320)
 Poor 4 (160) 4 (93)
Satisfaction with treatment result: Question 2 = no 15 (571) 16 (379)

Data are presented as % (n) or mean ± SD, as appropriate; CMC-1 = carpometacarpal of the thumb; OA = osteoarthritis; PIP = proximal interphalangeal.

Fig. 1.

This flowchart shows the patients who were included in the study. The left side displays the test-retest reliability sample, and the right side displays the construct validity sample.

Variables, Data Sources, and Measurements

The primary outcome of this study was the Satisfaction with Treatment Result Questionnaire, comprising two questions. Question 1 evaluates the patient’s satisfaction with the treatment result thus far, using a 5-point Likert scale (exact question: “How satisfied are you with your treatment result thus far?”; answer options were poor, moderate, fair, good, and excellent). In Question 2, the patient indicates whether he or she would undergo the same procedure again under similar circumstances (exact question: “If you would be in the same circumstances, would you be willing to undergo this treatment again?”; answer options were yes or no).

To evaluate the construct validity, we used several other questionnaires to calculate between-questionnaire correlations. We used a VAS (range 0-100), which is reliable and valid [10], to measure pain (higher scores indicate more pain), hand function (higher scores indicate better function), and satisfaction with the hand (exact question: “How satisfied are you with your hand at this moment?”; higher scores indicate greater satisfaction). We also used the MHQ subscales pain, general hand function, and satisfaction with hand (all subscales: range 0-100, higher scores indicate better performance), but as this questionnaire is not administered for every patient this was only done for a subset of the entire construct validity sample.

Finally, in the hypothesis testing we used the NPS, a metric for assessing the quality of service delivery [5, 17]. It consists of a single question asking the extent to which patients would recommend our clinic to friends and family on a 10-point scale (higher scores indicate a stronger recommendation).

Ethical Approval

Ethical approval for this study was obtained from Erasmus MC, Rotterdam, the Netherlands (approval number 2018-1088). This study was performed in accordance with the Declaration of Helsinki and approved by the local medical research ethical committee. Written informed consent was obtained from all patients.

Study Size

An a priori power analysis for the test-retest reliability sample, testing the null hypothesis (Cohen kappa = 0.7, indicating substantial agreement) against the alternative hypothesis (Cohen kappa > 0.7, given that kappa = 0.85 and ratings classify 50% in agreement), suggested that a sample of 96 participants was required, which was below the 174 patients we included in the test-retest reliability sample.

For the construct validity sample, a post hoc power analysis for the Spearman correlation, with α = 0.05, β = 0.10, and an expected correlation coefficient of r = 0.20, suggested that a sample of 259 participants was required, which was well below the 3742 patients we included in the construct validity sample (Fig. 1).
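The sample size for a correlation coefficient can be approximated with the Fisher z transformation. The sketch below is illustrative only: the article does not state which formula or software routine was used, so the two-sided test, the Fisher z approximation, and the function name `n_for_correlation` are assumptions. Under these assumptions, the inputs given above (α = 0.05, β = 0.10, r = 0.20) yield the reported 259 participants.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, beta=0.10):
    """Approximate sample size needed to detect a correlation r.

    Assumes a two-sided test and the Fisher z approximation, in which
    atanh(r) is roughly normal with variance 1 / (n - 3).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(1 - beta)        # quantile for power = 1 - beta
    fisher_z = math.atanh(r)                       # Fisher z of the expected r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


# Inputs taken from the text: alpha = 0.05, beta = 0.10, r = 0.20
print(n_for_correlation(0.20))
```

Some authors inflate the variance term for rank correlations, which would give a somewhat larger number; the plain Fisher z version is shown because it is the simplest form consistent with the reported figure.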

Statistical Methods

To evaluate whether patients in the test-retest reliability sample who completed the retest systematically differed from patients who did not complete the retest, we performed a nonresponder analysis. In this analysis, we classified nonresponders as patients who did not complete the retest in the predetermined time, and responders were classified as patients who completed the retest. The sociodemographics of responders and nonresponders were compared using independent sample t-tests for continuous data and chi-square tests for dichotomous or categorical data. A p value < 0.05 was considered statistically significant. The sociodemographics of responders and nonresponders were highly similar; the only difference was whether the dominant side was treated, and this was the case in 53% of the responders and in 33% of the nonresponders (p = 0.03) (see Table 1; Supplemental Digital Content 1, http://links.lww.com/CORR/A563).

Test-retest reliability was evaluated using the weighted kappa and intraclass correlation coefficients (ICCs) for question 1 of the Satisfaction with Treatment Result Questionnaire, and the Cohen kappa was used for question 2. We also evaluated test-retest reliability using Cohen kappa for dichotomized modifications of question 1 because these might be used in logistic regression models in future research. For this, the 5-point Likert scale was split into “satisfied” and “dissatisfied” using two classifications, with the answer options of “poor,” “moderate,” and “fair” attributed to “dissatisfied” in the first classification and only “poor” and “moderate” attributed to “dissatisfied” in the second classification. For the weighted kappa determination, we used quadratic weights, implying that misclassification between adjacent categories was less problematic than those between more distant categories. The greater the distance, the larger the penalty for misclassification [6, 14]. For instance, a deviation from “good” to “poor” gets more weight than a deviation from “good” to “fair.” Weighted kappa and Cohen kappa scores can range from -1 to 1, where < 0 indicates no agreement, 0.01 to 0.20 is none to slight, 0.21 to 0.40 is fair, 0.41 to 0.60 is moderate, 0.61 to 0.80 is substantial, and 0.81 to 1.00 is almost-perfect agreement [6].
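The quadratic weighting and the two dichotomizations described above can be sketched as follows. This is a minimal illustration and not the authors' code (the analyses were run in R); the function names and the small example lists are hypothetical. With only two categories the quadratic weights reduce to 0 and 1, so the same function returns the unweighted Cohen kappa for the dichotomized variants.

```python
from collections import Counter

LEVELS = ["poor", "moderate", "fair", "good", "excellent"]


def weighted_kappa(test, retest, levels=LEVELS):
    """Cohen kappa with quadratic weights for paired ordinal answers.

    Raises ZeroDivisionError if every answer falls in a single category,
    because the expected disagreement is then zero.
    """
    k = len(levels)
    idx = {level: i for i, level in enumerate(levels)}
    n = len(test)
    observed = Counter((idx[a], idx[b]) for a, b in zip(test, retest))
    row = Counter(idx[a] for a in test)    # marginal counts, primary test
    col = Counter(idx[b] for b in retest)  # marginal counts, retest
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            # Penalty grows with the squared distance between categories
            w = (i - j) ** 2 / (k - 1) ** 2
            obs_disagreement += w * observed[(i, j)] / n
            exp_disagreement += w * row[i] * col[j] / n ** 2
    return 1.0 - obs_disagreement / exp_disagreement


def dichotomize(answers, dissatisfied):
    """Map Likert answers to 'dissatisfied' or 'satisfied'."""
    return ["dissatisfied" if a in dissatisfied else "satisfied" for a in answers]


# Hypothetical answers for six patients (not study data)
test = ["poor", "moderate", "fair", "good", "excellent", "good"]
retest = ["poor", "fair", "fair", "good", "good", "good"]

print(weighted_kappa(test, retest))

# Variant 1: poor, moderate, and fair count as dissatisfied
variant1 = {"poor", "moderate", "fair"}
print(weighted_kappa(dichotomize(test, variant1),
                     dichotomize(retest, variant1),
                     levels=["dissatisfied", "satisfied"]))
```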

ICC values were calculated using a two-way mixed-effects model [13]. ICC values range from 0 to 1, where 1 is perfect reliability, 0.90 to 0.99 is very high reliability, 0.70 to 0.89 indicates high reliability, 0.50 to 0.69 represents moderate reliability, 0.26 to 0.49 is low reliability, and 0.00 to 0.25 indicates little, if any, reliability [7, 20, 27]. We also calculated the percentage of absolute agreement between the primary test and the retest for both questions and both dichotomized variants to examine the absolute proportion of overlap between the primary test and the retest. The absolute percentage agreement was considered high if it exceeded 75%, moderate if it was between 40% and 75%, and low if it was less than 40%.
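As an illustration of these two measures, the sketch below computes a single-measure ICC from a two-way model and the absolute percentage agreement with the cut-offs given above. The article specifies a two-way mixed-effects model but not whether consistency or absolute agreement was estimated, so the consistency form, often written ICC(3,1), is an assumption here, as are the function names.

```python
def icc_consistency(scores):
    """Single-measure, consistency-type ICC from a two-way ANOVA.

    scores: one row per patient, one column per occasion,
    e.g. [primary test, retest] with Likert answers coded 1 to 5.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def percent_agreement(test, retest):
    """Percentage of patients giving the identical answer twice."""
    return 100.0 * sum(a == b for a, b in zip(test, retest)) / len(test)


def classify_agreement(pct):
    """Cut-offs from the text: above 75% high, 40% to 75% moderate, below 40% low."""
    if pct > 75:
        return "high"
    if pct >= 40:
        return "moderate"
    return "low"


# Hypothetical coded answers for three patients (not study data)
print(icc_consistency([[1, 2], [2, 1], [3, 3]]))
print(classify_agreement(percent_agreement([1, 2, 3], [2, 1, 3])))
```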

Construct validity was evaluated using hypothesis testing, following the guidelines of the Consensus-based Standards for the Selection of Health Measurement Instruments [21]. Construct validity was defined as “the degree to which the scores of a measurement instrument are consistent with the hypotheses, with regard to internal relationships, relationships with scores of other instruments, or differences between relevant groups” [21]. We formed eight hypotheses before the analysis, with a specific and clearly defined direction, magnitude, and rationale (Table 3). First, we hypothesized that there was a strong association between question 1 and question 2 of the Satisfaction with Treatment Result Questionnaire, derived from the rationale that satisfaction with result may dictate the decision to undergo the treatment again in the future. Also, we hypothesized that the level of satisfaction would have at least a moderate correlation with pain and function levels, as, logically, these may determine one’s level of satisfaction. Furthermore, we hypothesized that one’s level of recommendation would moderately correlate with level of satisfaction, as the degree of recommendation may, among other things, be influenced by satisfaction with the treatment result. Lastly, we hypothesized that there would be a strong correlation with satisfaction with the hand, as this construct may overlap with satisfaction with treatment result. For each possible outcome, we also defined the interpretation before the analysis. All authors agreed with the eight independent hypotheses before analysis. We considered each hypothesis with equal weight. To test the hypotheses, we calculated the Spearman rho correlation coefficients between question 1 of the Satisfaction with Treatment Result Questionnaire and question 2, VAS pain during physical load, VAS function, VAS satisfaction with the hand, the NPS, MHQ pain, MHQ general hand function, and MHQ satisfaction. The Spearman correlation coefficients were interpreted as follows: 0.00 to 0.19 was a very weak correlation, 0.20 to 0.39 was a weak correlation, 0.40 to 0.69 was a moderate correlation, 0.70 to 0.89 was a strong correlation, and 0.90 to 1 was a very strong correlation [1, 28]. Confirmation of ≥ 80% of the hypotheses was considered good-to-excellent construct validity [21].
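The decision rule described here can be written out as a short check. The sketch below is illustrative only: it applies the interpretation bands and the 80% criterion to the hypothesized minimum and observed Spearman rho values reported in Table 3, and the names `interpret_rho` and `HYPOTHESES` are hypothetical.

```python
def interpret_rho(rho):
    """Interpretation bands for Spearman rho as defined in the text."""
    r = abs(rho)
    if r >= 0.90:
        return "very strong"
    if r >= 0.70:
        return "strong"
    if r >= 0.40:
        return "moderate"
    if r >= 0.20:
        return "weak"
    return "very weak"


# hypothesis number: (hypothesized minimum rho, observed rho), from Table 3
HYPOTHESES = {
    1: (0.7, 0.44),  # question 2, willingness to undergo treatment again
    2: (0.4, 0.67),  # VAS pain during physical load
    3: (0.4, 0.61),  # VAS function
    4: (0.7, 0.73),  # VAS satisfaction with the hand
    5: (0.4, 0.43),  # NPS
    6: (0.4, 0.57),  # MHQ pain
    7: (0.4, 0.51),  # MHQ general hand function
    8: (0.7, 0.71),  # MHQ satisfaction with the hand
}

# A hypothesis counts as confirmed when the observed rho reaches the minimum
confirmed = [h for h, (minimum, observed) in HYPOTHESES.items()
             if observed >= minimum]
share = len(confirmed) / len(HYPOTHESES)

print(confirmed, share, share >= 0.80)
```

Each hypothesis carries equal weight in this tally, matching the equal weighting stated above.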

Table 3.

Hypotheses for testing the construct validity of satisfaction with the treatment result and associated conclusions

All hypotheses concern question 1 of the Satisfaction with Treatment Result Questionnaire.

Entire construct validity sample (n = 3742)

1. There will be at least a strong association between question 1 and question 2 of the Satisfaction with Treatment Result Questionnaire
 Rationale: Satisfaction with result may dictate the decision to undergo the treatment again in the future
 Hypothesized Spearman rho: ≥ 0.7
 Conclusion if correlation coefficient is lower than hypothesized: Very weak, weak, or moderate correlation; constructs may differ
 Conclusion if correlation coefficient is higher than hypothesized: Strong or very strong correlation; constructs may be the same or overlap
 Actual correlation: 0.44 (p < 0.001)
 Hypothesis confirmed? No

2. There will be at least a moderate association with VAS pain during physical load
 Rationale: Residual pain may correlate with satisfaction with the treatment result
 Hypothesized Spearman rho: ≥ 0.4
 Conclusion if correlation coefficient is lower than hypothesized: Very weak or weak correlation; constructs are very different
 Conclusion if correlation coefficient is higher than hypothesized: Moderate or strong correlation with a different construct
 Actual correlation: 0.67 (p < 0.001)
 Hypothesis confirmed? Yes

3. There will be at least a moderate association with VAS function
 Rationale: Level of function may correlate with satisfaction with treatment result
 Hypothesized Spearman rho: ≥ 0.4
 Conclusion if correlation coefficient is lower than hypothesized: Very weak or weak correlation; constructs are very different
 Conclusion if correlation coefficient is higher than hypothesized: Moderate or strong correlation with a different construct
 Actual correlation: 0.61 (p < 0.001)
 Hypothesis confirmed? Yes

4. There will be at least a strong association with VAS satisfaction with the hand
 Rationale: Satisfaction with treatment result and satisfaction with the hand might measure an overlapping construct
 Hypothesized Spearman rho: ≥ 0.7
 Conclusion if correlation coefficient is lower than hypothesized: Very weak, weak, or moderate correlation, although theoretically, there is overlap of constructs
 Conclusion if correlation coefficient is higher than hypothesized: Strong or very strong correlation with, theoretically, an overlapping construct
 Actual correlation: 0.73 (p < 0.001)
 Hypothesis confirmed? Yes

5. There will be at least a moderate association with the NPS
 Rationale: Satisfaction with the treatment result may correlate to the NPS because the degree of recommendation may, among other things, be influenced by satisfaction with the treatment result
 Hypothesized Spearman rho: ≥ 0.4
 Conclusion if correlation coefficient is lower than hypothesized: Very weak or weak correlation; constructs are very different
 Conclusion if correlation coefficient is higher than hypothesized: Moderate or strong correlation with a different construct
 Actual correlation: 0.43 (p < 0.001)
 Hypothesis confirmed? Yes

Subset of construct validity sample with MHQ data (n = 2306)

6. There will be at least a moderate association with MHQ pain subscale
 Rationale: Residual pain may correlate with satisfaction with the treatment result
 Hypothesized Spearman rho: ≥ 0.4
 Conclusion if correlation coefficient is lower than hypothesized: Very weak or weak correlation; constructs are very different
 Conclusion if correlation coefficient is higher than hypothesized: Moderate or strong correlation with a different construct
 Actual correlation: 0.57 (p < 0.001)
 Hypothesis confirmed? Yes

7. There will be at least a moderate association with MHQ overall hand function subscale
 Rationale: Level of function may correlate with satisfaction with treatment result
 Hypothesized Spearman rho: ≥ 0.4
 Conclusion if correlation coefficient is lower than hypothesized: Very weak or weak correlation; constructs are very different
 Conclusion if correlation coefficient is higher than hypothesized: Moderate or strong correlation with a different construct
 Actual correlation: 0.51 (p < 0.001)
 Hypothesis confirmed? Yes

8. There will be at least a strong association with MHQ satisfaction with the hand
 Rationale: Satisfaction with treatment result and satisfaction with the hand might measure an overlapping construct
 Hypothesized Spearman rho: ≥ 0.7
 Conclusion if correlation coefficient is lower than hypothesized: Very weak, weak, or moderate correlation, although theoretically, there is overlap of constructs
 Conclusion if correlation coefficient is higher than hypothesized: Strong or very strong correlation with, theoretically, an overlapping construct
 Actual correlation: 0.71 (p < 0.001)
 Hypothesis confirmed? Yes

Spearman correlation coefficients are interpreted as follows: 0.00-0.19 = very weak correlation, 0.20-0.39 = weak correlation, 0.40-0.69 = moderate correlation, 0.70-0.89 = strong correlation, and 0.90-1 = very strong correlation. NPS = Net Promotor Score; MHQ = Michigan Hand Outcomes Questionnaire.

All analyses were performed using R Statistical Programming, version 3.3.4 (R Project for Statistical Computing).

Results

Test-retest Reliability

We found high reliability and almost-perfect agreement for test-retest reliability using the 5-point Likert scale of the Satisfaction with Treatment Result Questionnaire, with an ICC value of 0.86 (95% CI 0.81 to 0.89) and a weighted kappa of 0.86 (95% CI 0.80 to 0.91), respectively (Table 4). The distributions of answers at the primary test and the retest were highly similar (Fig. 2A), and most deviations were one step up or down compared with the primary test (Fig. 2B). The first dichotomized variant of question 1, with “poor,” “moderate,” and “fair” attributed to “dissatisfied,” demonstrated an absolute percentage of agreement of 87% and a kappa score of 0.73 (95% CI 0.62 to 0.83), indicating substantial agreement. The second dichotomized variant of question 1, with “poor” and “moderate” attributed to “dissatisfied,” demonstrated an absolute percentage of agreement of 81% and a kappa score of 0.57 (95% CI 0.45 to 0.69), indicating moderate agreement. When patients were asked about their willingness to undergo treatment again, we found an absolute percentage of agreement of 94% and a kappa score of 0.81 (95% CI 0.70 to 0.92), indicating almost-perfect agreement.

Table 4.

Overview of the outcomes of test-retest reliability

Values are listed per question of the Satisfaction with Treatment Result Questionnaire; ICC, weighted kappa, and kappa are given with 95% CIs.
Question 1: level of satisfaction measured with a 5-point Likert scale: ICC 0.86 (0.81-0.89); weighted kappa 0.86 (0.80-0.91); absolute agreement 70%
Question 1: level of satisfaction measured with dichotomized variant 1 (“poor,” “moderate,” and “fair” attributed to “dissatisfied”): kappa 0.73 (0.62-0.83); absolute agreement 87%
Question 1: level of satisfaction measured with dichotomized variant 2 (“poor” and “moderate” attributed to “dissatisfied”): kappa 0.57 (0.45-0.69); absolute agreement 81%
Question 2: willingness to undergo the treatment again: kappa 0.81 (0.70-0.92); absolute agreement 94%

We included both intraclass correlation coefficients (ICC) and weighted kappa as the ICC indicates the reliability and the weighted kappa indicates the weighted agreement of the retest relative to the primary test.

Fig. 2.

A-B (A) This bar plot indicates the distribution of question 1 of the Satisfaction with Treatment Result Questionnaire at the primary test moment and the retest. (B) This balloon plot indicates the degree of deviation between the primary test and the retest of question 1 of the Satisfaction with Treatment Result Questionnaire. In this plot, the primary test is displayed on the x axis, while the retest score is displayed on the y axis. The size of the dots indicates the number of patients. A color image accompanies the online version of this article.

Construct Validity

The Satisfaction with Treatment Result Questionnaire demonstrated good-to-excellent construct validity in this study. Of the eight hypotheses we tested, seven confirmed construct validity (Table 3). In the confirmed hypotheses, there was a moderate-to-strong correlation with VAS pain, VAS function, NPS, MHQ pain, and MHQ general hand function (Spearman rho ranging from 0.43 to 0.67; all p < 0.001) and a strong to very strong correlation with VAS satisfaction and MHQ satisfaction (Spearman rho 0.73 and 0.71; both p < 0.001). Only hypothesis 1 was rejected, as we found only a moderate correlation between question 1 and question 2 of the Satisfaction with Treatment Result Questionnaire (Spearman rho 0.44; p < 0.001).

Discussion

Satisfaction with treatment result is widely used and is considered an essential and patient-centered outcome domain [11, 19]. Before this study, there were doubts about the reliability and validity of measures of satisfaction with the treatment result [25]. In this study, we found that the Satisfaction with Treatment Result Questionnaire had good-to-excellent construct validity and very high test-retest reliability in two large samples of patients with hand and wrist conditions. Our findings indicate that the Satisfaction with Treatment Result Questionnaire is a reliable and valid instrument that can safely be used in daily practice and clinical research for evaluating patients’ satisfaction with their treatment result after treatment for a hand or wrist condition.

Limitations

A limitation of the observational design of this study is that a substantial proportion of patients did not respond, although our nonresponder analysis indicated that there were very few differences between responders and nonresponders. Hence, we are confident that this did not influence our results. An additional limitation is that we evaluated construct validity in the absence of a gold standard. Future research should investigate how to address this. Additionally, although not in the scope of this study, another limitation is that we did not study other important psychometric properties of the Satisfaction with Treatment Result Questionnaire, including responsiveness and other aspects of validity such as content validity [25]. Also, the psychometric properties of this measure in other study populations are still unknown.

Test-retest Reliability

Our study shows that satisfaction with a treatment result can reliably be measured using a one-question, 5-point Likert scale. Because we did not find any other studies investigating the psychometric properties of the Satisfaction with Treatment Result Questionnaire in hand and wrist conditions, we cannot compare our findings with previous studies. However, Ring and Leopold [25] questioned the validity and reliability of assessing satisfaction with treatment results using a PROM, owing to within-person variation in pretreatment expectations as well as environmental and psychological factors. Although variation in these constructs exists among patients with hand and wrist conditions [29, 33, 34], our study shows that satisfaction with a treatment result can be measured reliably using a standardized PROM such as ours. This is supported by our finding that, when deviations between the test and retest measurements occurred, they were, in almost all instances, only one level on the 5-point Likert scale.

We found that a dichotomized variant of a patient’s level of satisfaction, with “poor,” “moderate,” and “fair” attributed to “dissatisfied,” yielded substantial agreement, while the other variant yielded only moderate agreement. Although the agreement decreased when dichotomizing outcomes, which is often suboptimal due to loss of data, use of the first variant may be useful, for example, when aiming to use logistic regression models to explain the variance in levels of satisfaction with a treatment result.

Construct Validity

We found good-to-excellent construct validity of the Satisfaction with Treatment Result Questionnaire in this study as seven of the eight hypotheses we tested were confirmed. However, it should be noted that a gold standard for measuring satisfaction with treatment result is absent. Additionally, although the VAS satisfaction and MHQ satisfaction evaluate satisfaction with one’s hand and not satisfaction with treatment result, there may be circular reasoning. Future studies of construct validity may incorporate additional measures, such as the Global Rating of Change Score.

A remarkable finding in this study is that a patient’s willingness to undergo the treatment again under similar circumstances (question 2 of the Satisfaction with Treatment Result Questionnaire) was only moderately associated with his or her level of satisfaction with the treatment result (question 1). An explanation for this finding may be that a patient might not be completely satisfied with the treatment result but has improved enough to consider the treatment again under similar circumstances (or vice versa). This suggests that these two questions measure different constructs, and future research should investigate how these two constructs relate. Furthermore, the influence of one’s psychological mindset (including aspects such as anxiety or depression) and other factors on levels of satisfaction and willingness to undergo a treatment again should be further explored [25]. Also, future research may investigate which components form the construct of satisfaction with treatment result from both a patient and clinician perspective to optimize validity in measures of satisfaction with treatment result.

Conclusion

In this study, the Satisfaction with Treatment Result Questionnaire had good-to-excellent construct validity and very high test-retest reliability in two large samples of patients with hand and wrist conditions. Satisfaction with treatment result can be measured safely in daily practice and clinical research using these questions in striving for patient-centered care and value-based healthcare. Future research should investigate other psychometric properties such as responsiveness or content validity, other tools such as the International Consortium for Health Outcomes Measurement satisfaction with treatment result questionnaire, as well as independent predictors of variation in satisfaction with the treatment result.

Group Authors

Members of the Hand-Wrist Study Group include R. A. M. Blomme, B. J. R. Sluijter, D. J. J. C. van der Avoort, A. Kroeze, J. Smit, J. Debeij, E. T. Walbeehm, G. M. van Couwelaar, G. M. Vermeulen, J. P. de Schipper, J. F. M. Temming, J. H. van Uchelen, H. L. de Boer, K. P. de Haas, K. Harmsen, O. T. Zöphel, R. Feitz, J. S. Souer, R. Koch, S. E. R. Hovius, T. M. Moojen, X. Smit, R. van Huis, P. Y. Pennehouat, K. Schoneveld, Y. E. van Kooij, J. J. Veltkamp, A. Fink, W. A. de Ridder, H. P. Slijper, R. W. Selles, J. T. Porsius, R. M. Wouters, J. Tsehaie, R. Poelstra, M. C. Jansen, M. J. W. van der Oest, L. Hoogendam, J. S. Teunissen, J. Dekker, M. Jansen-Landheer, M. ter Stege, J. M. Zuidam, C. A. van Nieuwenhoven, L. S. Duraku, C. Hundepool, B. van der Heijden, and J. W. Colaris.

Supplementary Material

abjs-479-2022-s001.docx (99.7KB, docx)

Acknowledgments

We thank all the patients who completed questionnaires as part of their clinical care and who agreed that their data could be anonymously used for the present study. In addition, we thank the members of the Hand-Wrist Study Group, caregivers, and personnel of Xpert Clinic, Handtherapie Nederland, and Equipe Zorgbedrijven for assisting in the routine outcome measurements that are the basis for this study.

Footnotes

Members of the Hand-Wrist Study Group are listed in an Appendix at the end of this article.

The institution of one or more of the authors (RMW) has received, during the study period, funding from ZonMW.

Each author certifies that neither he nor she, nor any member of his or her immediate family, has funding or commercial associations (consultancies, stock ownership, equity interest, patent/licensing arrangements, etc.) that might pose a conflict of interest in connection with the submitted article.

All ICMJE Conflict of Interest Forms for authors and Clinical Orthopaedics and Related Research® editors and board members are on file with the publication and can be viewed on request.

Ethical approval for this study was obtained from Erasmus MC, Rotterdam, the Netherlands (approval number 2018-1088).

This work was performed at Erasmus MC, Rotterdam, the Netherlands.

Contributor Information

Willemijn A. De Ridder, Email: willemijna@gmail.com.

Yara E. van Kooij, Email: yaravk@hotmail.com.

Guus M. Vermeulen, Email: g.vermeulen@xpertclinics.nl.

Harm P. Slijper, Email: harm.slijper@gmail.com.

Ruud W. Selles, Email: r.selles@erasmusmc.nl.

References

  • 1. Balkau B. Practical statistics for nursing and health care. In: European Diabetes Nursing. Taylor & Francis; 2006:27.
  • 2. Barry MJ, Edgman-Levitan S. Shared decision making — the pinnacle of patient-centered care. New Engl J Med. 2012;366:780-781.
  • 3. Basch E. Patient-reported outcomes — harnessing patients’ voices to improve clinical care. New Engl J Med. 2017;376:105-108.
  • 4. Bleich SN, Ozaltin E, Murray CK. How does satisfaction with the health-care system relate to patient experience? Bull World Health Organ. 2009;87:271-278.
  • 5. Breckenridge K, Bekker HL, Gibbons E, et al. How to routinely collect data on patient-reported outcome and experience measures in renal registries in Europe: an expert consensus meeting. Nephrol Dial Transplant. 2015;30:1605-1614.
  • 6. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull. 1968;70:213-220.
  • 7. Fleiss JL. Reliability of measurement. In: Fleiss JL. The Design and Analysis of Clinical Experiments. John Wiley & Sons; 1999:1-32.
  • 8. Goldhahn J, Angst F, Simmen BR. What counts: outcome assessment after distal radius fractures in aged patients. J Orthop Trauma. 2008;22:S126-130.
  • 9. Graham B, Green A, James M, Katz J, Swiontkowski M. Measuring patient satisfaction in orthopaedic surgery. J Bone Joint Surg Am. 2015;97:80-84.
  • 10. Hawker GA, Mian S, Kendzerska T, French M. Measures of adult pain: visual analog scale for pain (VAS pain), Numeric Rating Scale for pain (NRS pain), McGill Pain Questionnaire (MPQ), Short-Form McGill Pain Questionnaire (SF-MPQ), Chronic Pain Grade Scale (CPGS), Short Form-36 Bodily Pain Scale (SF-36 BPS), and measure of Intermittent and Constant OsteoArthritis Pain (ICOAP). Arthritis Care Res (Hoboken). 2011;63(suppl 11):S240-252.
  • 11. ICHOM. Hand and wrist conditions data collection reference guide version 1.0.0. 2020. Available at: https://connect.ichom.org/standard-sets/hand-and-wrist-conditions/. Accessed April 1, 2021.
  • 12. Karnezis IA, Fragkiadakis EG. Association between objective clinical variables and patient-rated disability of the wrist. J Bone Joint Surg Br. 2002;84:967-970.
  • 13. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15:155-163.
  • 14. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159-174.
  • 15. MacDermid JC, Roth JH, Raj Rampersaud Y, Bain GI. Trapezial arthroplasty with silicone rubber implantation for advanced osteoarthritis of the trapeziometacarpal joint of the thumb. Can J Surg. 2003;46:103-110.
  • 16. MacDermid JC, Wessel J, Humphrey R, Ross D, Roth JH. Validity of self-report measures of pain and disability for persons who have undergone arthroplasty for osteoarthritis of the carpometacarpal joint of the hand. Osteoarthritis Cartilage. 2007;15:524-530.
  • 17. Manary MP, Boulding W, Staelin R, Glickman SW. The patient experience and health outcomes. New Engl J Med. 2013;368:201-203.
  • 18. Mandl LA, Galvin DH, Bosch JP, et al. Metacarpophalangeal arthroplasty in rheumatoid arthritis: what determines satisfaction with surgery? J Rheumatol. 2002;29:2488-2491.
  • 19. Marks M, Herren DB, Vliet Vlieland TPM, et al. Determinants of patient satisfaction after orthopedic interventions to the hand: a review of the literature. J Hand Ther. 2011;24:303-312.
  • 20. Mitani AA, Freer PE, Nelson KP. Summary measures of agreement and association between many raters' ordinal classifications. Ann Epidemiol. 2017;27:677-685.e674.
  • 21. Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63:737-745.
  • 22. Poole J. Measures of hand function: Arthritis Hand Function Test (AHFT), AUStralian CANadian osteoarthritis hand index (AUSCAN), Cochin Hand Function Scale, Functional Index for Hand OsteoArthritis (FIHOA), Grip Ability Test (GAT), Jebsen Hand Function Test (JHFT), and Michigan Hand outcomes Questionnaire (MHQ). Arthritis Care Res (Hoboken). 2011;63(suppl 11):S189-S199.
  • 23. Porter ME. What is value in health care? New Engl J Med. 2010;363:2477-2481.
  • 24. Rathert C, Wyrwich MD, Boren SA. Patient-centered care and outcomes: a systematic review of the literature. Med Care Res Rev. 2013;70:351-379.
  • 25. Ring D, Leopold SS. Editorial: Measuring satisfaction: can it be done? Clin Orthop Relat Res. 2015;473:3071-3073.
  • 26. Selles RW, Wouters RM, Poelstra R, et al. Routine health outcome measurement: development, design, and implementation of the Hand and Wrist Cohort. Plast Reconstr Surg. 2020;146:343-354.
  • 27. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420-428.
  • 28. Streiner D, Norman G, Cairney J. Health measurement scales: a practical guide to their development and use, 5th ed. Aust N Z J Public Health. 2016;40:294-295.
  • 29. Tsehaie J, Spekreijse KR, Wouters RM, et al. Predicting outcome after hand orthosis and hand therapy for thumb carpometacarpal osteoarthritis: a prospective study. Arch Phys Med Rehabil. 2019;100:844-850.
  • 30. van Oosterom FJ, Ettema AM, Mulder PG, Hovius SE. Impairment and disability after severe hand injuries with multiple phalangeal fractures. J Hand Surg Am. 2007;32:91-95.
  • 31. von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of OBservational studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007;370:1453-1457.
  • 32. Vranceanu AM, Ring D. Factors associated with patient satisfaction. J Hand Surg Am. 2011;36:1504-1508.
  • 33. Wouters RM, Slijper HP, Esteban Lopez L, Hovius SER, Selles RW. Beneficial effects of nonsurgical treatment for symptomatic thumb carpometacarpal instability in clinical practice: a cohort study. Arch Phys Med Rehabil. 2020;101:434-441.
  • 34. Wouters RM, Vranceanu AM, Slijper HP, et al. Patients with thumb-base osteoarthritis scheduled for surgery have more symptoms, worse psychological profile, and higher expectations than nonsurgical counterparts: a large cohort analysis. Clin Orthop Relat Res. 2019;477:2735-2746.

Articles from Clinical Orthopaedics and Related Research are provided here courtesy of The Association of Bone and Joint Surgeons