Abstract
Study Design
Narrative review.
Objectives
To review the current state-of-the-art in patient reported outcome measurements (PROMs) in adult spinal deformity (ASD) surgery.
Methods
PubMed was queried for publications related to PROM usage in ASD. PROM properties including responsiveness to change and thresholds for clinically relevant change were reviewed.
Results
Despite many reports using PROMs in ASD, there are few data to support the superiority of any particular PROM. The Scoliosis Research Society-22r (SRS-22r) is a disease-specific measure that is responsive to change across pain, function, and self-image domains. The Patient Reported Outcome Measurement Information System (PROMIS) is a domain-specific measure available in computer adaptive tests, which may reduce question burden and ease administration for both patients and providers. Minimum clinically important differences, minimum detectable changes, and patient-acceptable symptom states have been proposed.
Conclusions
PROMs are an essential component of modern, value-based ASD care, irrespective of academic pursuits. The SRS-22r is a validated disease specific measure, though this may be supplanted by computer-adaptive tests such as PROMIS to reduce the question burden. There is no PROMIS question set for self-image, which must be developed to cover all pertinent ASD domains.
Keywords: patient reported outcomes, adult deformity, scoliosis
Introduction
A patient reported outcome (PRO) is a measure of a patient’s health status from their perspective without interpretation by another observer. They are an essential component of any value evaluation in healthcare, where patient perception of their disability or disease state is required to indicate treatment, understand and set expectations, and evaluate treatment success. PROs are captured through patient-reported outcome measurements (PROMs) which may be general measures of health, trait-specific (e.g., pain), or disease-specific (e.g., spinal deformity). The World Health Organization has published the International Classification of Functioning, Disability, and Health (ICF) which calls for the recognition and measurement of the disability caused across health domains by any disease process. 1
Adult spinal deformity (ASD) is a complex, multimodal bio-psychological condition. 2 It affects function, pain, appearance, interpersonal relationships, and mental health among other health domains. This diversity, then, creates challenges as PROMs are developed and chosen to measure disability in ASD patients. PROMs must be valid for the concept examined. 3 Content validity ensures that the PROM instrument measures the concept tested (i.e. a mental health PROM measures mental health only and not pain or function). Construct validity ensures that the questions comprising the PROM are logically related and discriminate the domain or concept tested (i.e. a series of questions test the same concept). Finally, the PROM should follow some other known measure of the domain illustrating criterion validity. This is not always possible, however, if no gold standard measure exists.
PROMs must also be reliable, that is, they must provide a consistent measure of the health state. 3 Test-retest reliability requires a single patient to complete the PROM at 2 distinct time points between which there should be little variation in the patient state. Internal consistency measures the ability of the test to deliver consistent scores across patients within similar disease states. Beyond consistency, a PROM must be able to detect change as a disease worsens or improves. Responsiveness to change is often tested with the effect size (ES) or standardized response mean (SRM), where larger values indicate greater responsiveness to change. With respect to PROMs, it is important to consider the effect the baseline value has on the change that is possible (where you start affects where you can go), and an adjustment for this is required. 4
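The two responsiveness statistics named above can be illustrated with a minimal sketch. It assumes the conventional definitions (ES as mean change divided by the standard deviation of baseline scores, SRM as mean change divided by the standard deviation of the change scores); the function names and the five-patient data set are hypothetical and serve only as illustration.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """ES: mean change divided by the SD of the baseline scores."""
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the change scores."""
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical domain scores on a 1-5 scale for five patients (not study data).
baseline = [2.0, 2.5, 3.0, 2.0, 3.5]
follow_up = [3.0, 3.5, 3.5, 3.5, 4.0]

es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
print(f"ES = {es:.2f}, SRM = {srm:.2f}")
```

Neither statistic adjusts for the baseline value, so the caveat about starting point still applies when comparing instruments.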
Critical to the design of PROM instruments and comprehension of data output are the concepts of “floor” and “ceiling” effects, which affect the dynamic range (the lowest and highest possible scores). 5 The “floor” is the lowest score possible on any instrument, while the “ceiling” is the maximum score attainable. PROMs with short scales (e.g., score range of 1-5) may have more difficulties with these limitations than PROMs with broad scales (e.g., score range of 0-100) where different health states are discriminated. Historically, a well-designed PROM allowed no more than 15% of patients to achieve a floor or ceiling. As PROM development has matured, this tolerance has fallen to 5-10%. Floor and ceiling effects, if present, will limit the ability of the PROM to distinguish different health states at the top and bottom of the test. Continuous measures of health, such as the Patient Reported Outcome Measurement Information System (PROMIS), may minimize these effects, though the working range (values achieved) may not match the dynamic range of the instrument, leading to a false sense of superiority.
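As a rough illustration of how floor and ceiling effects might be screened for, the sketch below counts the share of respondents at each end of the scale and compares it with the historical 15% tolerance mentioned above. The function name, the tolerance constant, and the scores are hypothetical.

```python
def floor_ceiling_rates(scores, minimum, maximum):
    """Return the fraction of respondents at the floor and at the ceiling."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s == minimum) / n
    at_ceiling = sum(1 for s in scores if s == maximum) / n
    return at_floor, at_ceiling


# Hypothetical domain scores on a 1-5 scale for ten patients (not study data).
scores = [5.0, 4.8, 5.0, 3.2, 1.0, 4.4, 5.0, 2.6, 3.8, 4.0]
floor_rate, ceiling_rate = floor_ceiling_rates(scores, minimum=1.0, maximum=5.0)

TOLERANCE = 0.15  # historical 15% tolerance; 0.05-0.10 would be stricter
floor_flag = floor_rate > TOLERANCE
ceiling_flag = ceiling_rate > TOLERANCE
print(f"floor {floor_rate:.0%} (flag={floor_flag}), "
      f"ceiling {ceiling_rate:.0%} (flag={ceiling_flag})")
```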
“Floor” and “ceiling” effects are distinct from PROM clustering, which may occur after ASD reconstructions. While the PROMs are often blamed for lacking sensitivity to change, the reality may be that multilevel fusions can only improve function/pain/self-image by some finite amount. If this is true, then there may be an optimum PROM score for intervention, where improvement is maximized and patients reach an acceptable symptom state. The Adult Symptomatic Lumbar Scoliosis-1 study investigated this, finding that patients with the least amount of disability were at the highest risk for PROM worsening while patients with the greatest disability remained substantially disabled at 2 years after surgery. 6 While the authors were unable to identify a “sweet spot” for intervention, on average patients tended to cross over from nonoperative care to operative care with an Oswestry Disability Index (ODI) score of 40. This value is similar to that found in the Spine Patient Outcomes Research Trial (SPORT). Thus, the ODI value may help inform surgeons about the severity of disease and assist with decision making.
Patient Reported Outcome Measures in Adult Spinal Deformity
A systematic review of PROMs identified 7 PROMs frequently used in ASD research, including general measures and disease-specific measures of health as well as measures of satisfaction and self-reported treatments (TABLE). 2 The 2 most common general measures of health are the SF-36 and the SF-12, scored on a scale from 0-100 where 100 indicates perfect health. The SF-36 measures 8 domains through 36 questions: physical functioning (PF), role physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role emotional (RE), and mental health (MH) and asks 1 question regarding perceived change in health status. The scores associated with these domains can then be reduced to composite physical component summary (PCS) and mental component summary (MCS) scores. This conversion may minimize the chance errors of multiple comparisons in PROMs research by reducing the number of PROM subsets analyzed. The SF-36 has excellent validity and reliability and has been translated into many languages for administration. A shortcoming of the SF-36 is the length of the questionnaire and, as a result, the SF-12 was developed as a 12-question general measure of health taking approximately 1/3 the time of the SF-36 to complete. The SF-12 can also be reduced to PCS and MCS scores. In general, the SF-12 performs like the SF-36 for validity and reliability across diseases and populations.
The Visual Analog Scale (VAS) and Numerical Rating Scale (NRS) were the PROMs identified to measure pain intensity. The VAS requires patients to indicate the severity of their pain using visual cues from good to bad along a line. The clinician then converts this rating to a number where 10 is the worst pain. The NRS asks patients to indicate their pain on a line from 0 to 10. In general, these PROMs are highly correlated and likely interchangeable, though some work indicates that patients prefer the NRS. Both measures have been shown to be valid and reliable.
Disease-specific measures identified were the ODI, the Scoliosis Research Society-22 (SRS-22), the SRS-24, and the SRS-30. The ODI is a 10-question measure of low back disability scored from 0-100 where 100 is the greatest disability. A modified ODI is often used where a single question related to sexual function has been replaced with a pain intensity question. The SRS questionnaires are 22-, 24-, and 30-question tests related to spinal deformity. First developed for adolescent idiopathic scoliosis, the SRS questionnaires have been adapted to ASD.7,8 The SRS-24 was developed first, though Asher et al questioned the properties of this instrument and developed the SRS-22 with excellent validity and reliability. The SRS-22 has served as the “gold standard” for ASD PROMs since its development, though it does lack questions related to the postoperative health state. It measures the following domains: Physical Function, Pain, Self-Image, and Mental Health. These domains can be combined into a single subscore, which reduces the errors associated with multiple comparisons while losing the granularity and trait-specificity of each individual domain. The SRS-30 was developed to include questions related to the postoperative state across the tested domains. A revised version of the SRS-22 (to improve construct validity, denoted SRS-22r) is now the preferred PROM for spinal deformity patients. Other versions of the SRS instrument have been removed from the Scoliosis Research Society website to encourage consistent use among clinicians and researchers.
Not identified in the systematic review of ASD PROMs were the EuroQol instruments: EQ-5D-5L, EQ-5D-3L, and the EQ-5D-Y. 9 These instruments measure mobility, self-care, usual activities, pain/discomfort, and anxiety/depression in 1 question each over 3 levels (-3L, -Y) or 5 levels (-5L) of severity of symptoms. These instruments also include a VAS for overall health for a total of 6 questions. In addition to providing data related to each of these health domains, EuroQol data can be converted to Quality-Adjusted Life Years (QALYs) for use in healthcare economic research. QALYs are reported on a scale from 0 to 1, where 1 represents 1 year of life in perfect health. QALYs gained or lost over a period of time before or during treatment are then summed to estimate the health benefit of the intervention. Costs can be applied to estimate the cost-effectiveness of any treatment. These instruments ask few questions, which limits the ability to identify change in patients’ perceived health and discourages their routine use in practice. This is particularly true of the -3L as patients may choose the middle of the 3 outcome options (i.e., “I have some problems” rather than “I have no problems” or “I have many problems”).
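The QALY arithmetic described above can be sketched as follows. This is a simplified illustration that assumes each period has a constant utility value (so QALYs are utility multiplied by duration, summed over periods) and that cost-effectiveness is expressed as added cost per QALY gained. All utilities, durations, and costs are hypothetical and are not drawn from any study.

```python
def qalys(periods):
    """Sum utility-weighted time; each period is (duration_in_years, utility 0-1)."""
    return sum(years * utility for years, utility in periods)


# Hypothetical two-year horizons (illustrative values only).
operative = [(0.5, 0.50), (1.5, 0.75)]   # recovery period, then improved state
nonoperative = [(2.0, 0.55)]             # unchanged health state

qaly_gain = qalys(operative) - qalys(nonoperative)

extra_cost = 60000.0                     # hypothetical added cost of surgery
cost_per_qaly = extra_cost / qaly_gain   # cost per QALY gained

print(f"QALYs gained: {qaly_gain:.3f}; cost per QALY: {cost_per_qaly:,.0f}")
```

Real analyses typically extend the time horizon and discount future costs and benefits; neither is modeled here.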
Recent work has developed the PROMIS questionnaires, which are a catalog of trait-specific measures (e.g., pain interference, physical function, depression, anxiety). 10 This distinguishes them from general measures of health (e.g., SF-36) and disease-specific measures (e.g., SRS-22). It will allow for customization of questionnaires to capture particular domains of interest to the patient or physicians. Importantly, these questionnaires exist as computer-adaptive tests (CAT), which use item response theory (IRT) to build the question set for each patient. Traditional PROMs, such as the SRS-22, follow classical test theory (CTT). Much like an academic test examining fund of knowledge, CTT PROMs offer an estimation of health state with a fixed set of questions. That is, one must answer all the questions in a test (e.g., 10 questions in the ODI) to estimate the domain tested. Adaptive tests reduce question burden by using a response to inform the next question, thereby obtaining “the score” faster. For example, if one responds “I can run one mile,” subsequent questions will focus on activities more demanding than a 1-mile run rather than reverting to activities that should be easy. As a result, questionnaires are often complete within 4-6 questions, reducing the question burden for patients and time of administration. Interpretation of PROMIS scores is also simpler, as the scores reported are a T-score metric where 50 is the mean of the population tested adjusted for patient age, 10 points is 1 standard deviation, and a higher score indicates “more” of the trait measured. An additional benefit of PROMIS over the SRS-22r and SF-36 is the ability to distinguish depression and anxiety as different concepts, rather than forming a composite of mental health as the 2 legacy instruments do.
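The T-score metric described above (mean 50, standard deviation 10) can be read with a short sketch. The percentile conversion assumes scores are approximately normally distributed in the reference population, which is an assumption of this example rather than a statement from the text; the function names are hypothetical.

```python
from statistics import NormalDist

T_MEAN = 50.0  # reference population mean on the T-score metric
T_SD = 10.0    # 10 points correspond to 1 standard deviation


def sd_from_mean(t_score):
    """Number of standard deviations a T-score lies from the reference mean."""
    return (t_score - T_MEAN) / T_SD


def approx_percentile(t_score):
    """Approximate percentile, assuming a normal reference distribution."""
    return NormalDist(mu=T_MEAN, sigma=T_SD).cdf(t_score)


t = 60.0  # hypothetical pain interference T-score
print(f"T = {t:.0f}: {sd_from_mean(t):+.1f} SD, "
      f"about the {approx_percentile(t):.0%} percentile")
```

Because a higher score indicates more of the trait measured, a T-score of 60 is unfavorable for pain interference but favorable for physical function.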
What to Measure in Adult Spinal Deformity
Given the multitude of questionnaires available, the Scoliosis Research Society supported the Core Outcome Study on Scoliosis (COSSCO) to determine a minimum set of domains to measure (with corresponding PROMs) in ASD. 11 This was performed using the recommendations of the Core Outcome Measures in Effectiveness Trials (COMET) Initiative guidelines. COMET seeks to define core outcome sets (COS) that should be considered the minimum standard for clinical research across disciplines. A systematic review (step 1) of ASD research identified 29 domains classified according to the WHO-ICF and identified 7 PROMs. Experienced spinal deformity surgeons then used a modified Delphi process with 7 rounds (step 2) to determine a COS for ASD to be collected at baseline and at 1 year after surgery. Both the patient and surgeon perspectives were considered when determining the minimum outcomes to collect within these categories: the patient experience (PROM), clinical status (e.g., mortality, new neurological deficit), and long-term results (e.g., pseudarthrosis). The PROMs selected were the EQ-5D-3L, the SRS-22r, the ODI v2.1a, and Numerical Pain Rating Scales (NPRS [0-10]) for back and leg pain. These were chosen because they are validated and readily available in numerous languages, noting that multiple PROM sets are required as no composite PROM captures all relevant domains in ASD. They offer both general health (EQ-5D-3L) and disease-specific (SRS-22r) assessment of a patient’s perceived health status. The panel recommended that the NPRS for back and leg pain be collected at baseline and 1 year. Finally, 2 additional questions regarding time to return to work and time to total recovery were recommended at follow-up visits. A one-year follow-up visit is recommended for routine clinical care, with 6-month and 2-year visits recommended for research purposes. The total question burden of this COS is 40 at baseline and 42 at follow-up.
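The question burden can be tallied directly from the instruments listed. In this sketch the item counts for the EQ-5D-3L (6) and the ODI (10) come from the descriptions earlier in the text, the SRS-22r count follows from its name, and one item per NPRS scale is an assumption that is consistent with the stated totals.

```python
# Items per instrument in the core outcome set at baseline.
baseline_items = {
    "EQ-5D-3L": 6,     # 5 dimension questions plus the overall health VAS
    "SRS-22r": 22,
    "ODI v2.1a": 10,
    "NPRS back": 1,    # assumed single item
    "NPRS leg": 1,     # assumed single item
}

# Questions added at follow-up visits.
follow_up_extras = {
    "Time to return to work": 1,
    "Time to total recovery": 1,
}

baseline_burden = sum(baseline_items.values())
follow_up_burden = baseline_burden + sum(follow_up_extras.values())
print(f"baseline: {baseline_burden} questions, follow-up: {follow_up_burden} questions")
```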
The AO Spine Knowledge Forum Deformity next used the COSSCO framework of PROMs deemed most relevant to ASD care to describe an individual patient profile (Figure 1). 12 This is necessary in ASD due to the number of domains that can be affected and/or unaffected, such as cosmesis, pain, function, and mental health. Using a similar Delphi process, PROMs were selected for a “Spine-specific Status” domain in addition to the PROMs selected by COSSCO (EQ-5D-3L, SRS-22r, ODI v2.1a, and NPRS Back/Leg). Patient expectations play a prominent role in perceived success (or failure) of treatment. To measure this, question 4 from the Credibility/Expectancy Questionnaire (CEQ) was added to the COS, bringing the total number of questions asked to 41. CEQ question 4 asks patients to report their expected amount of improvement as multiples of 10 from 0 to 100%. Unlike the COSSCO set, data regarding return to work and recovery were not requested as this group sought to define an ASD patient profile to classify the heterogeneous ASD population.
Figure 1.
The AO Spine adult spine deformity patient profile.
The questionnaires recommended by both consensus panels will allow for assessment of outcomes with attention to the multitude of ASD-specific domains. Furthermore, EQ-5D-3L data can be converted to QALYs for use in economic analyses. Despite considerable consensus and an overlap of 40 questions, there was no discussion of respondent fatigue, which is a concern when subjecting patients to this volume of questions in routine clinical care. 13 The inclusion of both general measures of health and disease (ASD)-specific measures is an important aspect of these question sets. General measures of health allow patients to understand their disease, and health status, relative to other common conditions such as diabetes mellitus or congestive heart failure. 14 They offer researchers and clinicians a measure of severity of disease, which is important to communicate given the discordance between the burden of disease at patient and societal levels and funding for both care and research in ASD. However, general measures of health create a composite score for domains such as pain and function, which must be distinguished when considering treatments, assessing changes in health state, and addressing patient concerns. Thus, disease- or domain-specific measures are required. Given the heterogeneity of symptomatic presentation, it is important to identify varying levels of pain, function, and satisfaction with appearance.
Interpreting Patient Reported Outcome Measures
As previously discussed, PROMs allow for quantification of change and assessment of surgical success or failure. It is necessary to distinguish statistical change from clinically relevant change. In response, Jaeschke defined the minimum clinically important difference (MCID) as the smallest change in health status deemed clinically relevant to the patient in the absence of extreme risk or expense. 15 MCID can be calculated using several different methodologies. 16 Distribution-based methods assign “no improvement” to a predetermined number of patients using statistical properties of the data, for example the standard deviation or the standard error of measurement. While it is most likely that some patients will fail to improve, an ideal methodology would allow for the “perfect” surgery with all patients improving. This is not possible with distribution-based methods. The anchor-based method requests that patients respond to a question regarding improvement, classifying patients as “better” or “not better.” Receiver operating characteristic (ROC) analyses determine a value for MCID to optimize the sensitivity and specificity of this threshold value. The ability of ROC analyses to properly classify data is assessed by measuring the area under the curve (AUC), where 1 is perfect discrimination and 0.5 is no better than chance. Furthermore, one should consider MCID in context, as ASD surgeries are expensive and are accompanied by risks of neurological injury and unplanned reoperation. Thus, the MCID is not an appropriate outcome target when one understands that this is the smallest detectable improvement in symptoms. With great risk and expense, one should reasonably expect proportionally greater improvements in health status. In response to this shortcoming, substantial clinical benefit (SCB) is proposed as an alternative to MCID. 17 As opposed to MCID, SCB is estimated from patient responses indicating “much better.” There is a middle ground between “minimal” and “substantial” which is relevant and may remain undiscovered with the use of the SCB.
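The anchor-based approach can be sketched in a few lines. This example assumes the cut-off is chosen by maximizing the Youden index (sensitivity + specificity - 1), which is one common way to balance the two and is not necessarily the method used in the cited studies. The AUC is computed as the probability that a patient reporting improvement has a larger change score than one who does not. The data and function names are hypothetical.

```python
def split_by_anchor(changes, improved):
    """Separate change scores by the anchor response (better vs not better)."""
    better = [c for c, flag in zip(changes, improved) if flag]
    not_better = [c for c, flag in zip(changes, improved) if not flag]
    return better, not_better


def roc_threshold(changes, improved):
    """Return (cut-off, Youden index, sensitivity, specificity) for the best cut-off."""
    better, not_better = split_by_anchor(changes, improved)
    best = None
    for cut in sorted(set(changes)):
        sensitivity = sum(c >= cut for c in better) / len(better)
        specificity = sum(c < cut for c in not_better) / len(not_better)
        youden = sensitivity + specificity - 1
        if best is None or youden > best[1]:
            best = (cut, youden, sensitivity, specificity)
    return best


def auc(changes, improved):
    """Probability that a 'better' patient out-scores a 'not better' patient."""
    better, not_better = split_by_anchor(changes, improved)
    wins = sum(1.0 if b > n else 0.5 if b == n else 0.0
               for b in better for n in not_better)
    return wins / (len(better) * len(not_better))


# Hypothetical improvement in points and anchor responses for ten patients.
changes = [2, 5, 8, 10, 12, 15, 18, 20, 25, 30]
improved = [False, False, False, True, False, True, True, True, True, True]

cut, youden, sens, spec = roc_threshold(changes, improved)
area = auc(changes, improved)
print(f"threshold = {cut}, sensitivity = {sens:.2f}, "
      f"specificity = {spec:.2f}, AUC = {area:.2f}")
```

Any threshold short of perfect separation misclassifies some patients; here one patient who reported feeling better falls below the chosen cut-off.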
While the amount of change in a PROM is important, patient satisfaction is ignored by both MCID and SCB. That is, despite achieving some improvement a patient may still be dissatisfied with the outcome of surgery if the improvement is not what they expected, for example a patient moving from a severely disabled to moderately disabled state (e.g., ODI of 60 to 40). The patient acceptable symptom state (PASS) is a threshold value for a PROM at which point a patient considers themselves satisfied with their state of health. 18 A benefit of PASS is that it is a threshold that may be generally applicable across patients, irrespective of their baseline PROM value (or disability). This is an improvement over MCID as it allows for discrimination between patients who are doing well vs those who are improved, but still disabled and dissatisfied or as Dougados wrote “It’s good to feel better but better to feel good.” 19 The various threshold values proposed in the literature are found in Table 1.
Table 1.
Minimum Clinically Important Difference (MCID), Minimum Detectable Measurement Difference (MDMD), Patient Acceptable Symptom State (PASS) and Substantial Clinical Benefit (SCB) Thresholds.
| Patient Reported Outcome Measure | Minimum Clinically Important Difference (MCID)[27-31] | Minimum Detectable Measurement Difference (MDMD)[20] | Patient Acceptable Symptom State (PASS)[18] | Substantial Clinical Benefit (SCB)[28, 32] |
|---|---|---|---|---|
| Scoliosis Research Society-22r | | | | |
| Activity | 0.3 - 0.6 | 0.3 | >3.3 | 0.6 - 0.9 |
| Self-image | 0.6 - 1.2 | 0.5 | >3.3 | 1.6 - 1.7 |
| Pain | 0.4 - 0.6 | 0.6 | >3.5 | |
| Mental health | 0.5 | 0.3 | >3.8 | |
| Subscore | 0.4 | 0.4 | >3.5 | 0.6 - 0.7 |
| Oswestry Disability Index | 15 | 7.0 | ≤18 (<50 yrs: ≤11; ≥50 yrs: ≤29) | |
| SF-36 | | | | |
| Physical component summary | 7.8 | 5.4 | | |
| Numerical rating scale pain* | | | ≤3.0 (<50 yrs: ≤3.0; ≥50 yrs: ≤4.9) | |
*Score Chosen From the Higher of Back and Leg.
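The difference between “feeling better” (MCID) and “feeling good” (PASS) can be illustrated with the ODI thresholds from Table 1 (MCID of 15 points, PASS of ODI ≤18 for all ages). The sketch below treats an improvement equal to the MCID as meeting it, which is an assumption; the patient scores are hypothetical, with the first pair mirroring the 60-to-40 example in the text.

```python
ODI_MCID = 15  # Table 1: minimum clinically important difference
ODI_PASS = 18  # Table 1: acceptable symptom state is ODI <= 18 (all ages)


def classify(baseline, follow_up):
    """Flag whether a patient met the MCID and whether they reached the PASS."""
    improvement = baseline - follow_up  # lower ODI means less disability
    return {"mcid": improvement >= ODI_MCID, "pass": follow_up <= ODI_PASS}


# Hypothetical (baseline ODI, follow-up ODI) pairs.
patients = [(60, 40), (45, 15), (30, 20), (50, 18)]
results = [classify(pre, post) for pre, post in patients]

mcid_rate = sum(r["mcid"] for r in results) / len(results)
pass_rate = sum(r["pass"] for r in results) / len(results)
print(f"achieved MCID: {mcid_rate:.0%}, reached PASS: {pass_rate:.0%}")
```

The first patient improves by 20 points and therefore meets the MCID, yet remains at an ODI of 40, well short of the acceptable symptom state.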
A limitation of MCID, SCB, and PASS is that misclassification may be more common than desirable because of the methods used to determine these thresholds. Distribution-based approaches assign success or failure as some portion of the total patient set. For example, the standard error of measurement (equal to the standard deviation multiplied by the square root of 1 minus the reliability coefficient) is one value proposed for a distribution-based MCID. This simplifies calculation of MCID but ignores any patient perception of the results, which should be the foundation of the threshold. In theory, it should be possible for an intervention to be perfect with all patients improving, though distribution methods make this situation impossible. Anchor-based methods require a compromise of misclassification, using receiver operating characteristic curves to balance the sensitivity and specificity of responses. Unfortunately, these thresholds often have quite poor discriminative properties, with the area under the curve (AUC) approaching 0.5 (chance) rather than 1 (perfect classification), despite widespread adoption. Thus, it is likely best to ask anchor questions at any subsequent follow-up visit to ascertain patient-perceived results, rather than to apply derived classifications of success. Anchor questions should also be trait-specific, rather than generic, so that one may understand patient perception of postoperative pain, function, and self-image independent of one another.
It is important to note that MCID and SCB are both “within patient” concepts. That is, one must consider whether each patient has achieved the threshold, using responder analyses to evaluate success within a group, where the percentage of patients achieving MCID/SCB is reported. It is not appropriate to use group means to examine the success of any intervention relative to the MCID/SCB. This is particularly germane to comparative effectiveness research when one is looking to examine the clinical relevance of any difference in PROMs. To address this issue, the minimum detectable measurement difference (MDMD) is proposed. 20 This is a threshold of difference that exceeds the error of PROM values between 2 time points. It allows one to determine whether one intervention offers greater improvement than another when comparing group means, for example comparing results of minimally invasive ASD reconstructions with traditional open reconstructions. When the MDMD is taken into consideration with a responder analysis, there is a greater understanding of the magnitude and clinical relevance of any difference between interventions or treatments. Also, when PROMs are used as a primary outcome measure in comparative effectiveness research, a sample size estimation is required. It is not infrequent that the MCID is mistakenly used to determine “whether a clinically relevant difference exists” between 2 treatments. The use of MCID as the expected treatment effect (difference in mean PROM change) is a common and inappropriate method. If the control group is treated with a “gold standard” procedure, then PROM change meeting MCID will be expected. In many cases, it is unreasonable to think that a new surgery will be twice as good as the standard existing treatment. Using the MCID will then overestimate the treatment effect and lead to conclusions of “no difference” when a larger sample may be required to detect a difference. This is an important concept, as comparative effectiveness research is now required in many health care systems and an erroneous conclusion of “no difference” may affect the introduction of new technologies. Thus, the use of the MDMD is recommended for sample size estimations when comparing the effectiveness of 2 interventions. The MDMD can also be used as a non-inferiority margin for studies investigating multiple PROMs. That is, provided PROM change falls within the boundary of the MDMD, one can conclude “it is not better, but it is not worse.”
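The effect of the assumed treatment difference on sample size can be shown with a rough sketch. It uses the standard normal-approximation formula for comparing two means, n per group = 2σ²(z₁₋α/₂ + z₁₋β)²/Δ², with the ODI thresholds from Table 1 (MCID 15, MDMD 7.0). The standard deviation of 18 points, the two-sided alpha of 0.05, and the power of 80% are assumptions chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Patients per group to detect a mean difference `delta` (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (sd * (z_alpha + z_beta) / delta) ** 2)


ASSUMED_SD = 18.0  # hypothetical SD of ODI change, for illustration only

n_mcid = n_per_group(delta=15.0, sd=ASSUMED_SD)  # MCID used as expected effect
n_mdmd = n_per_group(delta=7.0, sd=ASSUMED_SD)   # MDMD used as expected effect
print(f"per group: {n_mcid} (delta = MCID) vs {n_mdmd} (delta = MDMD)")
```

Under these assumptions, a study powered on the MCID enrolls far fewer patients than one powered on the MDMD, which shows how an overestimated treatment effect can leave a study unable to detect a smaller but real difference.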
The Smallest Worthwhile Effect (SWE) seeks to improve on the shortcomings of the MCID by incorporating patient expectations, cost, and risk-tolerance. 21 It is a change in health-related quality of life (HRQOL) deemed appropriate given the risks encountered. Ideally, SWE is calculated using the benefit-harm tradeoff method where patients are offered a variety of scenarios with varying amounts of improvement, risk and cost. For example, in ASD these scenarios would include the risk of neurological deficit with a three-column osteotomy. Patient desire, given these risks, may lead to a higher expected change in PROM than the MCID because of the risk for an adverse outcome. Future research in ASD to determine the SWE is required given the expense and risk encountered with these surgeries.
Several studies have sought to provide “normative” PROMs data for patients unaffected by ASD.22-25 Interestingly, despite studying disease-specific measures (SRS-22, -22r, -30), the mean scores across affected domains were slightly below the ceiling of 5 points. The confidence intervals are generally close to the mean, suggesting little variability in healthy patients. It is important to note that there is a tendency for younger patients to report less disability than older patients across -Pain, -Activity, and -Self-Image domains. Similarly, EQ-5D Pain and VAS-Back/Leg scores tended to get worse with age.24,25 Variation is more substantial across nationalities than across age groups, however, with older UK patients reporting substantially lower SRS-22r scores than healthy Swedish counterparts. Rather than describe the “normal” values for the reader, we believe it is important to emphasize that “normative” data points are difficult to interpret and should not be used as triggers for surgical intervention nor as labels for “acceptable” outcomes. While patients are described as unaffected by ASD, the responses suggest either undiagnosed ASD or imperfect PROM questions which lack perfect ASD specificity. More research is required to understand the potential for score improvement across age groups and to understand “interference” from age-related degeneration in other body systems, before accepting lower PROMs results in older patients. Again, it is important to note that PROMs, ultimately, are patient-specific and it is challenging to compare subjective impressions/complaints across patients, ages, or cultures.
Future PROM Directions in Adult Spinal Deformity
PROMs are a critical component of modern patient evaluation and surgeon assessment of outcomes. As such, we agree some form of PROM should be collected by any surgeon performing ASD reconstructions as suggested by the COSSCO group. COSSCO did not, however, consider PROMIS-CAT in their review. Given the ease of administration with CAT such as PROMIS, it is reasonable for surgeons to collect a small number of domains, such as pain and function. Fatigue burden must be considered in both the research and non-research settings and this may dissuade both surgeons and patients from routine collection of PROMs and lead to misclassification of disability.13,26 CAT allow for a precise measure of the disease state often in less than 1 minute with 6 or fewer questions answered. Furthermore, PROMIS scores are reported as age-adjusted measures unlike legacy measures like the SRS-22r. This may allow for improved expectation management, as the goals and expectations may differ between a 25-year-old and a 65-year-old. The ease of administration may encourage non-academic surgeons to collect these within their electronic medical record.
Future directions in ASD surrounding PROMs call for development of CAT to measure self-image/cosmesis and for development of a cervical spine deformity-specific question bank from which CAT will come. Both areas are germane to ASD patients but have no appropriate PROM. In addition to routine collection of PROMs across ASD providers, establishing more accurate definitions of “success” and “failure” for ASD reconstructions is necessary. This will involve patient expectations and the use of prediction modeling to estimate the likelihood of changes important to the patient across the individual domains. While a step away from refined and brief, the patient-generated index may serve as a guide used in concert with PROMIS-CAT (or something similar) to provide personalized care and expectation management as we advance this aspect of ASD care. Finally, more work investigating threshold values for timing of intervention is needed (e.g., nonoperative care for ODI <40, consider operative care for ODI ≥40), as this will solidify PROMs as a “vital sign” in a modern ASD practice, where they will be used to guide care, set expectations, and evaluate the success of treatment for this complex disease. 27
Acknowledgments
The authors would like to thank Dr. Marinus de Kleuver of the AO Spine Knowledge Forum Deformity for his guidance and comments as we wrote this narrative review. The authors declare no competing interest in the Patient Reported Outcome Measurement Information System.
Footnotes
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This supplement was organized and financially supported by AO Spine through the AO Spine Knowledge Forum Deformity, a focused group of international Adult Spinal Deformity experts. AO Spine is a clinical division of the AO Foundation, which is an independent medically-guided not-for-profit organization. Support was provided directly through AO Network Clinical Research.
ORCID iD
Michael P. Kelly https://orcid.org/0000-0001-6221-7406
References
- 1. Kostanjsek N. Use of The International Classification of Functioning, Disability and Health (ICF) as a conceptual framework and common language for disability statistics and health information systems. BMC Public Health. 2011;11(Suppl 4):S3. doi: 10.1186/1471-2458-11-S4-S3.
- 2. Faraj SSA, van Hooff ML, Holewijn RM, Polly DW, Jr., Haanstra TM, de Kleuver M. Measuring outcomes in adult spinal deformity surgery: A systematic review to identify current strengths, weaknesses and gaps in patient-reported outcome measures. Eur Spine J. 2017;26(8):2084-2093. doi: 10.1007/s00586-017-5125-4.
- 3. van der Wees PJ, Verkerk EW, Verbiest MEA, et al. Development of a framework with tools to support the selection and implementation of patient-reported outcome measures. J Patient Rep Outcomes. 2019;3(1):75. doi: 10.1186/s41687-019-0171-9.
- 4. Middel B, van Sonderen E. Statistical significant change versus relevant or important change in (quasi) experimental design: Some conceptual and methodological problems in estimating magnitude of intervention-related change in health services research. Int J Integr Care. 2002;2:e15. doi: 10.5334/ijic.65.
- 5. Bridwell KH, Cats-Baril W, Harrast J, et al. The validity of the SRS-22 instrument in an adult spinal deformity population compared with the Oswestry and SF-12: A study of response distribution, concurrent validity, internal consistency, and reliability. Spine (Phila Pa 1976). 2005;30(4):455-461. doi: 10.1097/01.brs.0000153393.82368.6b.
- 6. Wondra JP, Kelly MP, Yanik EL, et al. Patient-reported outcome measure clustering after surgery for adult symptomatic lumbar scoliosis. J Neurosurg Spine. 2022:1-12. Online ahead of print. doi: 10.3171/2021.11.SPINE21949.
- 7. Asher M, Min Lai S, Burton D, Manna B. The reliability and concurrent validity of the scoliosis research society-22 patient questionnaire for idiopathic scoliosis. Spine (Phila Pa 1976). 2003;28(1):63-69. doi: 10.1097/00007632-200301010-00015.
- 8. Asher MA, Min Lai S, Burton DC. Further development and validation of the Scoliosis Research Society (SRS) outcomes instrument. Spine (Phila Pa 1976). 2000;25(18):2381-2386. doi: 10.1097/00007632-200009150-00018.
- 9. Rabin R, Gudex C, Selai C, Herdman M. From translation to version management: A history and review of methods for the cultural adaptation of the EuroQol five-dimensional questionnaire. Value Health. 2014;17(1):70-76. doi: 10.1016/j.jval.2013.10.006.
- 10. Horn ME, Reinke EK, Couce LJ, Reeve BB, Ledbetter L, George SZ. Reporting and utilization of Patient-Reported Outcomes Measurement Information System® (PROMIS®) measures in orthopedic research and practice: A systematic review. J Orthop Surg Res. 2020;15(1):553. doi: 10.1186/s13018-020-02068-9.
- 11. de Kleuver M, Faraj SSA, Haanstra TM, et al. The Scoliosis Research Society adult spinal deformity standard outcome set. Spine Deform. 2021;9(5):1211-1221. doi: 10.1007/s43390-021-00334-2.
- 12. Naresh-Babu J, Kwan KYH, Wu Y, et al. AO spine adult spinal deformity patient profile: A paradigm shift in comprehensive patient evaluation in order to optimize treatment and improve patient care. Global Spine J. 2021:21925682211037935. Online ahead of print. doi: 10.1177/21925682211037935.
- 13. Egleston BL, Miller SM, Meropol NJ. The impact of misclassification due to survey response fatigue on estimation and identifiability of treatment effects. Stat Med. 2011;30(30):3560-3572. doi: 10.1002/sim.4377.
- 14. Bess S, Line B, Fu KM, et al. The health impact of symptomatic adult spinal deformity: Comparison of deformity types to united states population norms and chronic diseases. Spine (Phila Pa 1976). 2016;41(3):224-233. doi: 10.1097/BRS.0000000000001202.
- 15. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10(4):407-415. doi: 10.1016/0197-2456(89)90005-6.
- 16. Copay AG, Glassman SD, Subach BR, Berven S, Schuler TC, Carreon LY. Minimum clinically important difference in lumbar spine surgery patients: A choice of methods using the Oswestry Disability Index, Medical Outcomes Study questionnaire Short Form 36, and pain scales. Spine J. 2008;8(6):968-974. doi: 10.1016/j.spinee.2007.11.006.
- 17. Glassman SD, Copay AG, Berven SH, Polly DW, Subach BR, Carreon LY. Defining substantial clinical benefit following lumbar spine arthrodesis. J Bone Joint Surg Am. 2008;90(9):1839-1847. doi: 10.2106/JBJS.G.01095.
- 18. Mannion AF, Loibl M, Bago J, et al. What level of symptoms are patients with adult spinal deformity prepared to live with? A cross-sectional analysis of the 12-month follow-up data from 1043 patients. Eur Spine J. 2020;29(6):1340-1352. doi: 10.1007/s00586-020-06365-z.
- 19. Strand V, Boers M, Idzerda L, et al. It’s good to feel better but it’s better to feel good and even better to feel good as soon as possible for as long as possible. Response criteria and the importance of change at OMERACT 10. J Rheumatol. 2011;38(8):1720-1727. doi: 10.3899/jrheum.110392.
- 20. Kelly MP, Kim HJ, Ames CP, et al. Minimum detectable measurement difference for health-related quality of life measures varies with age and disability in adult spinal deformity: Implications for calculating minimal clinically important difference. Spine (Phila Pa 1976). 2018;43(13):E790-E795. doi: 10.1097/BRS.0000000000002519.
- 21. Ferreira M. Research Note: The smallest worthwhile effect of a health intervention. J Physiother. 2018;64(4):272-274. doi: 10.1016/j.jphys.2018.07.008.
- 22. Baldus C, Bridwell K, Harrast J, et al. The Scoliosis Research Society Health-Related Quality of Life (SRS-30) age-gender normative data: an analysis of 1346 adult subjects unaffected by scoliosis. Spine (Phila Pa 1976). 2011;36(14):1154-1162. doi: 10.1097/BRS.0b013e3181fc8f98.
- 23. Berven S, Deviren V, Demir-Deviren S, Hu SS, Bradford DS. Studies in the modified Scoliosis Research Society Outcomes Instrument in adults: Validation, reliability, and discriminatory capacity. Spine (Phila Pa 1976). 2003;28(18):2164-2169. doi: 10.1097/01.BRS.0000084666.53553.D6.
- 24. Diarbakerli E, Grauers A, Gerdhem P. Population-based normative data for the Scoliosis Research Society 22r questionnaire in adolescents and adults, including a comparison with EQ-5D. Eur Spine J. 2017;26(6):1631-1637. doi: 10.1007/s00586-016-4854-0.
- 25. Tsirikos AI, Wordie SJ. Population-based normative data for the Scoliosis Research Society 22r, EQ-5D, and VAS questionnaires among individuals aged 20 to 69 years. Bone Jt Open. 2022;3(2):130-134. doi: 10.1302/2633-1462.32.BJO-2021-0110.R1.
- 26. Cabitza F, Dui LG, Banfi G. PROs in the wild: Assessing the validity of patient reported outcomes in an electronic registry. Comput Methods Programs Biomed. 2019;181:104837. doi: 10.1016/j.cmpb.2019.01.009.
- 27. Chang SS, Movsas B. How vital are patient reported outcomes? J Natl Cancer Inst. 2021;114(3):347-348. doi: 10.1093/jnci/djab178.

