Key Points
Question
What patient-reported outcome measures exist for studying patients with upper airway–related dyspnea?
Findings
In this systematic review, we identified 3 patient-reported outcome measures directly applicable to patients with upper airway–related dyspnea; one was developed de novo for this population, and 2 were adapted from existing pulmonary measures. Thematic deficiencies in current measures are lack of patient involvement in item development (content validity), plan for interpretation, and literacy level assessments.
Meaning
Care must be taken to understand the measurement characteristics and contextual relevance before applying these instruments for clinical, research, or quality initiatives.
Abstract
Importance
Patient-reported outcome (PRO) measures address the need for patient-centered data and are now used in diverse clinical, research, and policy pursuits. They are important in conditions causing upper airway–related dyspnea in which the patient’s reported experience and physiological data can be discrepant.
Objectives
To perform a systematic review of the literature on upper airway dyspnea–related PRO measures and to rigorously evaluate each measure’s developmental properties, validation, and applicability.
Evidence Review
This study strictly adhered to Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines. MEDLINE via the PubMed interface, the Cumulative Index to Nursing and Allied Health Literature (CINAHL), and the Health and Psychosocial Instruments (HaPI) database were searched using relevant vocabulary terms and key terms related to PRO measures and upper airway–related dyspnea. Three investigators performed abstract review, and 2 investigators independently performed full-text review by applying an established checklist to evaluate the conceptual model, content validity, reliability, construct validity, scoring and interpretability, and respondent burden and presentation of each identified instrument. The initial literature search was conducted in November 2014 and was updated in April 2016.
Findings
Of 1269 studies reviewed, 3 upper airway–related dyspnea PRO measures met criteria for inclusion. One PRO measure was designed de novo to assess upper airway–related dyspnea symptoms and monitor treatment outcomes, while 2 were adapted from established instruments designed for lower airway disease. Measurement properties and psychometric characteristics differed, and none met all checklist criteria. Two met at least one criterion in each of the 6 domains evaluated. Two demonstrated test-retest and internal consistency reliability, and 2 showed that their scores were responsive to change. Thematic deficiencies in current upper airway–related dyspnea PRO measures are lack of patient involvement in item development (content validity), plan for interpretation, and literacy level assessments.
Conclusions and Relevance
PRO measures are critical in the assessment of patients with upper airway–related dyspnea. Three instruments with disparate developmental rigor have been designed or adapted to assess this construct. Care must be taken to understand the measurement characteristics and contextual relevance before applying these PRO measures for clinical, research, or quality initiatives.
Introduction
Dyspnea has an estimated point prevalence of 1% to 32% in the United States, and the vast majority is attributed to intrinsic lung disease (eg, chronic obstructive pulmonary disease). In contrast, upper airway–related dyspnea characterizes a group of debilitating rare diseases and conditions that are underrecognized and broadly categorized into structural (eg, airway stenosis), functional (eg, paradoxical vocal fold motion [PVFM]), and neurological (eg, bilateral vocal fold paralysis).
Management of severe upper airway–related dyspnea can require emergent, life-saving intervention (eg, intubation or tracheotomy). However, most affected patients are seen with nonacute, stable, or variable upper airway–related breathing difficulties and represent a more nuanced management challenge. Physiological tests (eg, pulmonary function tests and laryngoscopy) can be useful to assess the severity of the obstructive process. However, understanding associated symptoms and quality-of-life implications is equally or arguably more critical in management decision making. Disparity between patient experience and physiological test results complicates the care of this patient population and highlights the importance of incorporating the patient’s perspective during counseling and treatment decisions.
Patient-reported outcome (PRO) measures, defined as any report of the status of a patient’s health condition that comes directly from the patient without interpretation of the patient’s response by a clinician or anyone else, provide the ideal means of systematically capturing the patient perspective and experience. They are recognized by the US Food and Drug Administration and National Institutes of Health as critical end points in clinical trials and comparative effectiveness research. While routinely used in the study of common pulmonary conditions, their application is less pervasive but no less important in advancing the study of upper airway–related diseases. To date, no study has systematically identified what PRO measures are applicable to upper airway–related dyspnea and, of those, which are designed with appropriate methodological rigor for incorporation into clinical care, research, and quality initiatives. End users should not presume that published PRO measures have comparably strong and appropriate measurement properties, precision, and applicability. Use of poorly developed “validated” PRO measures or those intended for a different application can have significant implications leading to distorted and inaccurate findings. To inform potential end users, the objectives of our study were to perform a systematic review of the literature on upper airway dyspnea–related PRO measures and to rigorously evaluate each measure’s developmental properties, validation, and applicability.
Methods
This study did not involve data collection from or about human participants and was therefore exempt from institutional review board approval. Systematic review methods used herein strictly adhered to Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines.
Search Strategy
We searched MEDLINE via the PubMed interface, the Cumulative Index to Nursing and Allied Health Literature (CINAHL), and the Health and Psychosocial Instruments (HaPI) database using relevant vocabulary terms and key terms related to PRO measures and upper airway–related dyspnea. No restrictions on publication date were used. The initial literature search was conducted in November 2014 and was updated in April 2016. Reference lists of the included articles and recent reviews related to measurement of dyspnea were hand-searched to identify additional relevant articles.
Study Selection
Inclusion and exclusion criteria were developed in consultation with an expert panel that included a statistician with expertise in measurement theory (I.D.F.), systematic review methodologists (N.A.S. and M.L.M.), and researchers and clinicians who treat and study upper airway–related dyspnea (A.G. and D.O.F.). Three investigators (M.N., K.H., and D.O.F.) independently reviewed abstracts of all studies identified in the literature search, and those meeting predetermined criteria were advanced to full-text review. Measures focusing on children or pulmonary-specific conditions were excluded. Articles lacking adequate information in their title or abstract to determine eligibility were also included in the full-text review phase. Two independent reviewers (M.N. and D.O.F.) performed full-text review of articles to determine eligibility for data extraction. Disagreements were resolved through discussion or adjudication by a senior investigator (I.D.F.).
Data Extraction
One reviewer (D.O.F.) extracted all relevant data from studies meeting criteria at the full-text review phase. A second reviewer (M.N.) independently verified data accuracy. Components of PRO measure development were critically examined and entered into evidence tables. These data included the instrument’s name and acronym, authors, years published, objective and intended construct, setting of development (eg, tertiary care) and country, population targeted and involved in development, type of scale used (eg, Likert type or visual analog scale), number of items or questions, and, when present, what subscales or domains they were designed to specifically measure.
PRO Measure Assessment
Two investigators (M.N. and D.O.F.) independently assessed each study’s methods using a criteria checklist developed a priori (Table 1). In brief, the checklist was designed to help systematic reviewers identify components deemed important to the construction of PRO measures, including: (1) conceptual model, (2) content validity, (3) reliability, (4) construct validity, (5) scoring and interpretation, and (6) respondent burden and presentation. Definitions of these concepts are provided in the eTable in the Supplement. Each reviewer was trained and tested on appropriate application of the checklist using a method described separately. They were independently tasked with evaluating all identified PRO measures. On completion, reviewers (M.N. and D.O.F.) met to discuss and come to consensus on scoring discrepancies.
Table 1. Checklist of Key Characteristics to Consider When Evaluating a Patient-Reported Outcome (PRO) Measure.
Characteristic | Score |
---|---|
Conceptual Model | |
1. Has the PRO construct to be measured been specifically defined? | |
2. Has the intended respondent population been described? | |
3. Does the conceptual model address whether a single construct or scale or multiple subscales are expected? | |
Content Validity | |
4. Is there evidence that members of the intended respondent population were involved in the PRO measure’s development? | |
5. Is there evidence that content experts were involved in the PRO measure’s development? | |
6. Is there a description of the method by which items or questions were determined (eg, focus groups and interviews)? | |
Reliability | |
7. Is there evidence that the PRO measure’s reliability was tested (eg, test-retest reliability and internal consistency)? | |
8. Are reported indexes of reliability adequate (eg, r≥0.80 is ideal, and r≥0.70 is adequate or is otherwise justified)? | |
Construct Validity | |
9. Is there reported quantitative justification that a single scale or multiple subscales exist in the PRO measure (eg, factor analysis and item response theory)? | |
10. Is the PRO measure intended to measure change over time? If yes, is there evidence of both test-retest reliability and responsiveness to change? Otherwise, award 1 point if there is an explicit statement that the PRO measure is not intended to measure change over time. | |
11. Are there findings supporting expected associations with existing PRO measures or with other relevant data? | |
12. Are there findings supporting expected differences in scores between relevant known groups? | |
Scoring and Interpretation | |
13. Is there documentation of how to score the PRO measure (eg, a scoring method like summing or an algorithm)? | |
14. Has a plan for managing or interpreting missing responses been described (ie, how to score incomplete surveys)? | |
15. Is information provided about how to interpret the PRO measure scores (eg, scaling or anchors [what high and low scores represent], normative data, or a definition of severity [mild to severe])? | |
Respondent Burden and Presentation | |
16. Is the time to complete reported and reasonable? Or, if it is not reported, is the number of questions appropriate for the intended application? | |
17. Is there a description of the literacy level of the PRO measure? | |
18. Is the entire PRO measure available for public viewing (eg, published with the citation or information provided about how to access a copy)? | |
Instructions: Please indicate in the Score column whether or not the information provided in the citation or source document meets each criterion (0 indicates criterion not met, and 1 indicates criterion met).
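The checklist in Table 1 amounts to 18 binary items grouped into 6 domains, so a per-instrument tally is straightforward to automate. The following is a minimal sketch under that reading of the table; the function name `summarize_checklist` and the example scores are hypothetical and are not taken from the review.

```python
from typing import Dict, List

# Item numbers per domain, following Table 1 (items 1-18).
DOMAINS: Dict[str, List[int]] = {
    "Conceptual Model": [1, 2, 3],
    "Content Validity": [4, 5, 6],
    "Reliability": [7, 8],
    "Construct Validity": [9, 10, 11, 12],
    "Scoring and Interpretation": [13, 14, 15],
    "Respondent Burden and Presentation": [16, 17, 18],
}


def summarize_checklist(scores: Dict[int, int]) -> Dict[str, object]:
    """Tally binary checklist scores (0 = not met, 1 = met) by domain."""
    expected = {item for items in DOMAINS.values() for item in items}
    if set(scores) != expected:
        raise ValueError("scores must cover items 1-18 exactly")
    if any(value not in (0, 1) for value in scores.values()):
        raise ValueError("each score must be 0 or 1")
    by_domain = {
        name: sum(scores[item] for item in items)
        for name, items in DOMAINS.items()
    }
    return {
        "total": sum(scores.values()),
        "by_domain": by_domain,
        # True when at least one criterion is met in every domain.
        "all_domains_touched": all(n > 0 for n in by_domain.values()),
    }


# Hypothetical scores for illustration only (not data from the review).
example = {item: 1 for item in range(1, 19)}
example[4] = 0   # no patient involvement in item development
example[14] = 0  # no plan for missing responses
example[17] = 0  # literacy level not described
summary = summarize_checklist(example)
print(summary["total"], summary["by_domain"]["Content Validity"])
```

A tally of this kind records only whether a criterion was met; it does not grade the rigor with which the property was demonstrated.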
Data Synthesis
Data from unique PRO measures demonstrated wide heterogeneity in constructs, methods, and intended purpose and therefore were not appropriate for aggregation or meta-analysis. Instead, individual PRO measure characteristics were summarized independently with respect to developmental and psychometric rigor.
Results
Flow of the Study
The eFigure in the Supplement shows the study flow as well as inclusions and reasons for exclusions. Of 1269 studies reviewed, 3 upper airway–related dyspnea PRO measures met criteria for inclusion. While 32 dyspnea-related PRO measures were identified, most were excluded because they were dyspnea symptom indexes specifically related to lower airway conditions or other systemic diseases.
One of the 3 PRO measures was designed de novo to assess upper airway–related dyspnea symptoms and monitor treatment outcomes (Dyspnea Index [DI]), and 2 were adapted from established PRO measures for nonspecific lower airway disease (Medical Research Council [MRC] dyspnea scale) or chronic obstructive pulmonary disease (Clinical Chronic Obstructive Pulmonary Disease Questionnaire [CCQ]). The number of participants involved ranged from 33 to 369, they were predominantly female (60%-76%), and they had a mean or median age ranging from 44 to 49 years (Table 2). The MRC dyspnea scale and CCQ were adapted from existing pulmonary applications for use in patients with laryngotracheal stenosis (Table 3). In contrast, the DI was developed for patients with upper airway–related dyspnea of varied etiology.
Table 2. Study Population Characteristics and Setting Involved in Development of Patient-Reported Outcome Measures Related to Upper Airway Dyspnea.
Source | Instrument | Study Population | Setting | No. | Distribution of Pathology | Age, Mean (SD) [Range], y | Female, % | Country |
---|---|---|---|---|---|---|---|---|
Nouraei et al, 2008 | Medical Research Council dyspnea scale | Adult patients with LTS | The National Centre for Airway Reconstruction, Charing Cross Hospital, London, England | 40 | 29 Postintubation injury, 8 idiopathic subglottic stenosis, 2 granulomatosis with polyangiitis, 1 glottic sarcoidosis | 44 (14) [18-75] | 60 | England |
Nouraei et al, 2009 | Clinical Chronic Obstructive Pulmonary Disease Questionnaire | Adult patients with LTS undergoing endoscopic laryngotracheoplasty | Department of Otolaryngology, Charing Cross Hospital, London, England | 33 | 10 Bilateral vocal cord mobility impairment, 23 subglottic or tracheal stenosis | 44 (15) [18-75] | 64 | England |
Gartner-Schmidt et al, 2014 | Dyspnea Index | Adult patients with dyspnea related to upper airway | University of Pittsburgh Voice Center, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania | 369 | 200 Final development (dyspnea NOS), 15 cognitive interviews (dyspnea NOS), 108 reliability and validity (51 dyspnea NOS and 57 healthy controls), 46 pre-post outcomes (22 LTS, 21 PVFM, 3 BVFP) | 49 (17) [13-82] | 76 | United States |
Abbreviations: BVFP, bilateral vocal fold paralysis; LTS, laryngotracheal stenosis; NOS, not otherwise specified; PVFM, paradoxical vocal fold motion.
Table 3. Measurement Aims, Target Population, and Item Characteristics for Patient-Reported Outcome Measures Related to Upper Airway Dyspnea.
Instrument | Year | Measurement Aim | Target Population | Language | No. of Items or Domains | Response Options | Domains |
---|---|---|---|---|---|---|---|
Medical Research Council dyspnea scale | 2008 | To assess sensitivity and responsiveness of the scale, a psychophysical dyspnea assessment instrument, to the presence and treatment of adult laryngotracheal stenosis | Laryngotracheal stenosis | English | 1 Item | Grade (range, 1-5) | NA |
Clinical Chronic Obstructive Pulmonary Disease Questionnaire | 2009 | To validate the questionnaire, a patient-administered instrument developed for bronchopulmonary disease, as a disease-specific psychophysical outcome measure for adult laryngotracheal stenosis | Laryngotracheal stenosis undergoing endoscopic laryngotracheoplasty | English | 10 Items, 3 domains | Likert type (range, 0-6) | Symptom, functional, mental |
Dyspnea Index | 2013 | To develop and validate the index, to quantify the severity of symptoms in upper airway dyspnea, and to validate the index as an outcome measure | Upper airway–related dyspnea | English | 10 Items | Likert type (range, 0-4) | NA |
Abbreviation: NA, not applicable.
Developmental Characteristics
The developmental process and measurement properties differed among identified measures. Two met at least one criterion in each of the 6 domains evaluated. None met all checklist criteria. The DI met the most criteria (15 of 18), followed by the CCQ (14 of 18) and the MRC dyspnea scale (12 of 18). Analyses by domain are shown in the Figure and summarized below.
Figure. Summary Comparison of Measurement Properties Among Identified Patient-Reported Outcome (PRO) Measures.
Blue indicates the criterion is met. CCQ indicates Clinical Chronic Obstructive Pulmonary Disease Questionnaire; DI, Dyspnea Index; and MRC, Medical Research Council dyspnea scale.
Conceptual Model
Each PRO measure defined the construct that it intended to measure and its respective target population (Figure). These measures also prespecified the expected dimensionality within the intended conceptual framework (CCQ and DI) or comprised a single item (MRC dyspnea scale).
Content Validity
Direct input from patients experiencing upper airway–related dyspnea (eg, interviews and focus groups) was not used to develop item content for any PRO measure (Figure). For example, the DI item content was determined in a “clinical consensus conference” that included content experts who care for and study this population (ie, otolaryngologists, laryngologists, and speech-language pathologists) but not affected patients. The DI developers used patient cognitive interviews to refine the final items; however, they did not involve patients in the original item derivation. In fact, content experts (clinicians) devised or adapted the content of all of these PRO measures without patient input. The MRC dyspnea scale and CCQ were specifically designed for patients with lung disease; hence, their content did not derive from patients with upper airway disease–related dyspnea or from upper airway disease content experts. Nonetheless, clinicians with expertise in treating upper airway–related dyspnea oversaw their adaptation. For each measure, investigators explained either the method by which they derived their items (DI) or the rationale and methodology used to adapt existing PRO measures to this patient population (MRC dyspnea scale and CCQ).
Reliability
The CCQ and DI tested and demonstrated adequate test-retest and internal consistency reliability (Figure). For the DI, test-retest reliability was assessed with approximately 45 minutes between the original and repeat testing (at the same clinic visit), whereas the CCQ allowed 2 weeks between administrations. Reliability of the MRC dyspnea scale was not tested in this patient population.
Construct Validity
No PRO measure met all construct validity criteria (Figure). Dimensionality was demonstrated by the DI, which was found to have a common factor (ie, one subscale), indicating that all of its items measure the same dimension or construct. The CCQ claimed multidimensionality (ie, symptom, function, and mental) but did not statistically justify the existence of dimensions in the target population. The MRC dyspnea scale has a single item and therefore is unidimensional de facto.
Two measures adequately evaluated longitudinal validity or responsiveness to change. The CCQ and DI showed significant decreases in scores after behavioral or surgical intervention (as indicated based on pathology). Two measures (MRC dyspnea scale and CCQ) established convergent validity by correlating scores with other related questionnaires or with clinical correlates. The CCQ and DI showed known group validity by establishing their ability to differentiate groups that empirical evidence has shown to be different (eg, disease vs healthy).
Scoring and Interpretation
Scoring approaches differ. Two measures use Likert-type scales with simple summation, with higher total scores indicating greater degrees of the construct being measured (CCQ and DI) (Table 3). The CCQ also uses subscale-specific scoring. In contrast, the MRC dyspnea scale uses a single-item, 5-point grading scale based on dyspnea severity. For example, patients with MRC dyspnea scale grade 1 “only get breathless with strenuous exercise,” whereas those with grade 5 are “too breathless to leave the house” or “get breathless when [they] get dressed/undressed or wash [themselves].”
No measure described a plan for managing missing data. This absence is less relevant for the MRC dyspnea scale because it only includes one question, but a plan was also not discussed for either multi-item measure. Each PRO measure provided some description of score scaling. However, only the MRC dyspnea scale advanced a minimal clinically important difference (MCID) in score, which was determined to be a one-grade change (eg, grade 5 [“too breathless to leave the house”] → grade 4 [“I stop for breath after 100 yards on the level”]). The other measures did not establish what would represent a clinically or minimally important change in score.
Respondent Burden and Presentation
All measures had acceptable degrees of patient burden (1-10 items). Only the DI offered an estimation of the questionnaire literacy level (a Gunning Fog Index of 5.167, which is commensurate with a fifth-grade reading level). All publications provided instructions on use and provided the ability to see all of the measured items.
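For readers unfamiliar with the Gunning Fog Index, it estimates the years of schooling needed to read a text as 0.4 × (average words per sentence + percentage of words with 3 or more syllables). The sketch below is a simplified illustration, not the procedure used by the DI developers: the syllable counter is a rough vowel-group heuristic, and the refinements of the full index (eg, excluding proper nouns and common suffixes) are omitted.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel runs, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text: str) -> float:
    """Simplified Gunning Fog Index for a block of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and word")
    complex_words = [w for w in words if count_syllables(w) >= 3]
    words_per_sentence = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (words_per_sentence + percent_complex)


# A single questionnaire-style item: 6 words, none with 3+ syllables.
item = "My breathing gets worse with stress."
print(round(gunning_fog(item), 1))  # 2.4
```

An index computed on one short item is not meaningful on its own; a literacy estimate would be computed over the full instrument, including its instructions.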
Discussion
PRO measures are central to systematically understanding the patient’s experience with a condition. Consensus is lacking on the optimal outcome measurement for the study and management of upper airway–related dyspnea. A balance is needed between physiological and patient-centered outcomes because they are complementary in defining symptom severity and in comparing treatment effectiveness.
Appropriateness and applicability of PRO measures are dependent on design intent, developmental rigor, and measurement characteristics. The present study systematically reviewed the literature on upper airway–related dyspnea PRO measures to assess their developmental properties, validation, and applicability. Of the 3 measures identified, only one (DI) was originally designed to measure symptoms and quality of life attributed to upper airway disease, and 2 (MRC dyspnea scale and CCQ) were adapted for patients with upper airway dyspnea from established instruments designed to assess patient experience with lower airway disease.
Target Population
Relevant PRO measures for upper airway–related dyspnea used different target populations in de novo development or in adaptation. The DI used patients with the following 3 categories of disease in its development: laryngotracheal stenosis, bilateral vocal fold paralysis, and PVFM. The first 2 represent fixed obstruction, while PVFM is episodic, with periods of normal breathing between episodes. The mixture of fixed and variable obstructive upper airway disease affected which items were included and thus the measure’s specificity. For example, “My breathing gets worse with stress” may be more relevant to patients with PVFM than to those with fixed obstruction. In contrast, the MRC dyspnea scale and CCQ only used patients with laryngotracheal stenosis in their adaptation, reliability, and validation processes. Therefore, these PRO measures have not been tested in, and may not be appropriate for, patients with PVFM. In summary, PRO measure psychometric properties can only be vouched for and understood when applied to patients similar to those used in their development or post hoc adaptation.
Sample Size Considerations
The number of individuals from the target population involved in development or validation of these measures varied widely (range, 33-369). It is generally recommended that variable and participant sampling be optimized for factor or principal components analysis–based methods or that there be more than 100 participants involved in validation. One measure (DI) achieved this standard. It is important to question the acceptability, applicability, and generalizability of measures that include too few individuals from the target population during development.
Psychometric Properties
All published upper airway–related dyspnea PRO measures purport to show reliability or validity. This simple statement is often considered sufficient legitimization of a PRO measure’s quality by end users. Caution should be practiced before assuming equivalence between PRO measures that are validated. It is important to understand that reliability and validity are not discrete concepts but rather exist on a spectrum. Furthermore, both reliability and validity have many different forms of variable rigor. For example, face validity can be asserted if experts “agree” that the content is appropriate. However, establishment of face validity does not imply that the instrument is appropriate for tracking treatment outcomes (ie, responsiveness to change or construct validity).
Lack of Patient Centeredness
Patient centeredness was lacking in devising these measures. None directly engaged patients with upper airway disease in de novo item content development. This lack is concerning because the foundation for PRO measures is the target population’s perspective and experience. Therefore, omitting patients at this stage compromises the content validity and constancy of scores and creates a condition in which patients answer questions designed by and based on the experience and opinions of content experts who do not live with their particular condition. The DI mentioned this omission as a weakness and attempted to remedy it by enlisting patients with upper airway dyspnea problems in cognitive interviews.
Reliability
The CCQ and DI assessed test-retest reliability and internal consistency reliability. Test-retest reliability demonstrates the stability of participant scores when the underlying construct (upper airway–related dyspnea) has not changed. Showing this reliability is critical because it is difficult to determine whether a meaningful change occurs over time or after intervention without demonstrated stability of scores when the condition being measured has not changed. It is recommended that the retest occur at a sufficient interval to minimize the likelihood of recall bias. Recall bias is a concern with the approach taken for the DI, which was administered at the beginning and conclusion of a clinic visit, with approximately 45 minutes between test and retest. The consequence of this bias is that reliability and stability of the PRO measure may be overestimated, which can distort the instrument’s sensitivity to change or longitudinal validity. The general recommendation is to allow several days to a week (or more) to elapse between administrations (depending on the construct being measured) to mitigate this risk.
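To make the two reliability concepts concrete, the sketch below computes Cronbach α for internal consistency and a Pearson correlation between test and retest totals, then grades each coefficient against the thresholds in checklist item 8 (0.70 adequate, 0.80 ideal). It is illustrative only: the data are invented, the function names are hypothetical, and the reviewed studies may have used other statistics (eg, an intraclass correlation coefficient is often preferred for test-retest agreement).

```python
from statistics import mean, variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach alpha; rows are respondents, columns are items."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired test and retest scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rate(coefficient: float) -> str:
    """Grade a reliability coefficient using the checklist thresholds."""
    if coefficient >= 0.80:
        return "ideal"
    if coefficient >= 0.70:
        return "adequate"
    return "below threshold"


# Invented data: 3 respondents answering 3 perfectly parallel items.
items = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(items)

# Invented total scores at test and retest for 4 respondents.
test_scores = [10, 20, 30, 40]
retest_scores = [12, 19, 33, 38]
r = pearson_r(test_scores, retest_scores)
print(rate(alpha), rate(r))
```

A high coefficient obtained with a very short retest interval may partly reflect recall of earlier answers rather than stability of the construct, which is the concern raised above for the DI.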
Construct Validity
Strength of construct validity varied among PRO measures. Only one measure (CCQ) proposed subscales (ie, symptom, function, and mental), but it did not empirically justify their existence. This omission is critical, particularly when claiming that the subscales are measuring discrete aspects of the overarching construct (dyspnea). It is presumed at face value that subscales are discrete; therefore, end users are instructed to score subscales independently. However, statistical justification is lacking to ensure their independence or that they are not measuring overlapping aspects of the construct. Therefore, to avoid dissemination of spurious results, we do not recommend that the CCQ subscale scores be used in this population until these dimensions have been empirically demonstrated.
All measures purport to show longitudinal validity or responsiveness to change. Each showed that scores improved after intervention. However, responsiveness to change requires that the PRO measure show that it is stable in the absence of change. This stability is shown using test-retest reliability in the intended population. Both the CCQ and DI adequately tested this parameter. However, the MRC dyspnea scale did not; while its score changed with intervention, users cannot determine whether that change was due to random chance or a meaningful change without proof of test-retest reliability.
Interpretability
Understanding how to interpret scores is critical to any measure’s usefulness. Developers or adapters of each measure provided guidance on interpreting scores. Simple summation was proposed for the CCQ and DI. A higher score represents a greater burden of the construct (dyspnea). While this summation is useful, it does not provide guidance as to what represents a meaningful change. Is a CCQ score change of 2 clinically meaningful? Lack of clarity on what represents an “important” change invites researchers to equate a “statistically significant change or P < .05” with clinical meaningfulness. This practice is problematic because statistical significance is intrinsically linked to sample size (power) and can distract from the actual effect size (where the focus should be).
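The dependence of statistical significance on sample size can be shown with a small calculation. The sketch below holds a hypothetical standardized mean change fixed at 0.2 and computes a 2-sided P value under a normal approximation for two sample sizes; the effect size, sample sizes, and function name are assumptions chosen for illustration and do not come from any reviewed study.

```python
import math


def two_sided_p(effect_size: float, n: int) -> float:
    """Two-sided P value for a standardized mean change of paired scores.

    Uses a normal approximation: z = effect_size * sqrt(n).
    """
    z = effect_size * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


# Same hypothetical effect size, two different sample sizes.
effect = 0.2
p_small = two_sided_p(effect, n=20)
p_large = two_sided_p(effect, n=500)
print(f"n=20:  P = {p_small:.3f}")
print(f"n=500: P = {p_large:.2g}")
```

The underlying change in score is identical in both cases; only the sample size differs, yet only the larger study crosses the conventional P < .05 threshold.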
Methods to mitigate this recognized limitation exist. Among the identified PRO measures, the MRC dyspnea scale discussed and advanced an MCID, which is defined as the smallest change in a treatment outcome that a patient would recognize as important. Admittedly, several approaches to estimating an MCID have been proposed, and they remain debated by statisticians and experts in PRO measure development. Confidence in the MRC dyspnea scale’s MCID should be tempered because test-retest reliability was not demonstrated for this scale. Nonetheless, establishing what constitutes a clinically important change within the measure is a valuable feature for end users. Otherwise, it is unclear when a statistically significant difference equates with a clinically important difference. Omitting this feature represents a weakness in dyspnea-related PRO measures and limits the interpretability of scores in clinical and research applications.
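One way to see why missing reliability data weakens an MCID is through the standard error of measurement from classical test theory. The relations below are standard psychometric formulas and were not reported by any of the reviewed studies; here σ denotes the standard deviation of scores in the target population and r_xx the test-retest reliability coefficient.

```latex
\mathrm{SEM} = \sigma \sqrt{1 - r_{xx}}, \qquad
\mathrm{MDC}_{95} = 1.96 \times \sqrt{2} \times \mathrm{SEM}
```

A proposed MCID that is smaller than the minimal detectable change (MDC) cannot be distinguished from measurement error, and without an estimate of r_xx that comparison cannot be made.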
Another limitation in these PRO measures was the absence of guidance on how to manage incomplete questionnaires, which are a common occurrence in clinical practice and research applications. Implications of missing data can be significant, particularly if the missingness is systematic and thus introduces bias. Using an example from voice, the Voice Handicap Index and Voice Handicap Index-10 ask respondents whether their voice affects the ability to perform their job. Retirees and unemployed patients may leave this question blank because of its lack of relevance. Therefore, exclusion of incomplete questionnaires would unknowingly bias results toward employed patients and away from older and unemployed patients, an important consideration that is often overlooked in developing the PRO measure. Many techniques for dealing with incomplete data exist, but no measure provided a framework.
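As an illustration of what such a framework could look like, the sketch below implements one simple and commonly used rule: prorate the total from the answered items when at least half of the items are complete, and otherwise treat the questionnaire as unscorable. This rule is an assumption for illustration; none of the reviewed measures specifies it, and the 10-item, 0 to 4 response format is used only as an example shape.

```python
from typing import Optional, Sequence


def prorated_total(
    responses: Sequence[Optional[int]],
    min_answered_fraction: float = 0.5,
) -> Optional[float]:
    """Prorated sum score; None marks a skipped item.

    Returns None when too few items were answered to score the form.
    """
    answered = [r for r in responses if r is not None]
    if not answered or len(answered) < min_answered_fraction * len(responses):
        return None
    # Scale the mean of answered items up to the full item count.
    return sum(answered) / len(answered) * len(responses)


# Hypothetical 10-item questionnaire with 2 skipped items.
partial = [2, 3, None, 1, 4, 2, None, 3, 2, 1]
print(prorated_total(partial))  # 22.5

# Only 2 of 10 items answered: treated as unscorable.
sparse = [None] * 8 + [1, 2]
print(prorated_total(sparse))  # None
```

Prorating assumes that skipped items are missing at random. When an item is skipped systematically, as in the employment example above, prorating does not remove the resulting bias.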
Implications
PRO measures are relied on as an objective, systematic approach to assess patient symptom severity and quality-of-life implications. Care should be exercised to understand each measure’s developmental characteristics before selecting and advocating for its use. Each instrument identified has developmental and psychometric limitations. For example, a critical consideration in adapting measures originally designed for lung disease to patients with upper airway disease is that patients with lower airway diseases can differ substantially from those with upper airway diseases in terms of symptoms, quality of life, clinical course, and prognosis. Administering a measure to a population different from the one for which it was developed inherently threatens content validity and increases the chance of aberrant or inaccurate clinical conclusions.
It is also pragmatic to recognize that most conditions causing upper airway–related dyspnea are rare. In the United States and the European Union, a disease is considered rare if it affects fewer than 200 000 persons or fewer than 1 in 2000 persons, respectively. Assessing treatments from the patient perspective in rare diseases poses unique challenges (eg, heterogeneity of presentation and unknown natural history). The International Society for Pharmacoeconomics and Outcomes Research recognizes that standard methods and strategies of PRO development, validation, and implementation, including those recommended by regulators, need to be interpreted in the context of the unique challenges associated with rare disease [populations]. It is through this lens that our findings should be interpreted.
Limitations
There are limitations to this review process. Despite the careful design, the search may not have captured all available literature because the literature on PRO measures is poorly indexed. Hand searches were used to mitigate this limitation. There is also the risk of subjectivity in scoring PRO measure characteristics. Every effort was made to minimize this risk by using 2 independent reviewers for each instrument considered. In addition, the checklist was used to evaluate the presence of measurement properties within PRO measure development but is not intended to grade the rigor with which they were demonstrated. For example, the DI met criteria for reliability, but many may argue that the short time frame between administrations and the inherent recall bias are invalidating, as we discussed above.
Conclusions
PRO measures allow the systematic and objective assessment of symptom severity and quality of life in patients with upper airway–related dyspnea. Only 3 measures exist (MRC dyspnea scale, CCQ, and DI) despite the fact that patient symptoms and quality of life are often the ultimate arbiters of when to intervene and of treatment effectiveness. Available PRO measures have disparate developmental rigor. Care must be taken to understand the developmental characteristics of candidate PRO measures before using them in research and clinical applications.
eTable. Glossary of Measurement Properties of Patient-Reported Outcome Measures
eFigure. PRISMA Diagram of the Disposition of Studies Identified for This Study
References
- 1. Currow DC, Plummer JL, Crockett A, Abernethy AP. A community population survey of prevalence and severity of dyspnea in adults. J Pain Symptom Manage. 2009;38(4):533-545.
- 2. Patrick DL, Burke LB, Powers JH, et al. Patient-reported outcomes to support medical product labeling claims: FDA perspective. Value Health. 2007;10(suppl 2):S125-S137.
- 3. Patient-Centered Outcomes Research Institute (PCORI). Patient-centered outcomes research working definition. http://www.pcori.org/assets/PCOR-Revised-Definition-v2-042020121.pdf. Accessed December 3, 2016.
- 4. US Food and Drug Administration. Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. Rockville, MD: FDA; 2009.
- 5. Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM, de Vet HC. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res. 2012;21(4):651-657.
- 6. Regnault A, Hamel JF, Patrick DL. Pooling of cross-cultural PRO data in multinational clinical trials: how much can poor measurement affect statistical power? Qual Life Res. 2015;24(2):273-277.
- 7. Penson DF, Litwin MS, Aaronson NK. Health related quality of life in men with prostate cancer. J Urol. 2003;169(5):1653-1661.
- 8. Shamseer L, Moher D, Clarke M, et al; PRISMA-P Group. Preferred Reporting Items for Systematic Review and Meta-analysis Protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015;349:g7647.
- 9. Francis DO, McPheeters ML, Noud M, Penson DF, Feurer ID. Checklist to operationalize measurement characteristics of patient-reported outcome measures. Syst Rev. 2016;5(1):129.
- 10. Nouraei SA, Nouraei SM, Randhawa PS, et al. Sensitivity and responsiveness of the Medical Research Council dyspnoea scale to the presence and treatment of adult laryngotracheal stenosis. Clin Otolaryngol. 2008;33(6):575-580.
- 11. Nouraei SA, Randhawa PS, Koury EF, et al. Validation of the Clinical COPD Questionnaire as a psychophysical outcome measure in adult laryngotracheal stenosis. Clin Otolaryngol. 2009;34(4):343-348.
- 12. Gartner-Schmidt JL, Shembel AC, Zullo TG, Rosen CA. Development and validation of the Dyspnea Index (DI): a severity index for upper airway–related dyspnea. J Voice. 2014;28(6):775-782.
- 13. Fletcher CM, Elmes PC, Fairbairn AF, Wood CH. The significance of respiratory symptoms and the diagnosis of chronic bronchitis in a working population. BMJ. 1959;2(5147):257-266.
- 14. van der Molen T, Willemse BW, Schokker S, ten Hacken NH, Postma DS, Juniper EF. Development, validity and responsiveness of the Clinical COPD Questionnaire. Health Qual Life Outcomes. 2003;1:13.
- 15. DuBay WH. The principles of readability. http://www.impact-information.com/impactinfo/readability02.pdf. Published August 25, 2004. Accessed December 3, 2016.
- 16. Terwee CB, Bot SD, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34-42.
- 17. Velicer WF, Fava JL. Effects of variable and subject sampling on factor pattern recovery. Psychol Methods. 1998;3(2):231-251.
- 18. Newton PE, Shaw SD. Standards for talking and thinking about validity. Psychol Methods. 2013;18(3):301-319.
- 19. Guyatt GH, Osoba D, Wu AW, Wyrwich KW, Norman GR; Clinical Significance Consensus Meeting Group. Methods to explain the clinical significance of health status measures. Mayo Clin Proc. 2002;77(4):371-383.
- 20. de Vet HC, Terwee CB, Ostelo RW, Beckerman H, Knol DL, Bouter LM. Minimal changes in health status questionnaires: distinction between minimally detectable change and minimally important change. Health Qual Life Outcomes. 2006;4:54.
- 21. Jacobson BH, Johnson A, Grywalsky C, et al. The Voice Handicap Index (VHI): development and validation. Am J Speech Lang Pathol. 1997;6:66-70.
- 22. Rosen CA, Lee AS, Osborne J, Zullo T, Murry T. Development and validation of the Voice Handicap Index-10. Laryngoscope. 2004;114(9):1549-1556.
- 23. Richter T, Nestler-Parr S, Babela R, et al; International Society for Pharmacoeconomics and Outcomes Research Rare Disease Special Interest Group. Rare disease terminology and definitions: a systematic global review: report of the ISPOR Rare Disease Special Interest Group. Value Health. 2015;18(6):906-914.
- 24. EURORDIS Position Paper. Patients’ priorities and needs for rare disease research: 2014-2020. http://www.eurordis.org/sites/default/files/publications/what_how%20_are_disease_research_0.pdf. Published October 2011. Accessed December 3, 2016.
- 25. International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Clinical outcomes assessment (COA) in rare disease clinical trials: emerging good practices: report of the ISPOR Rare Disease Trials COA Measurement Task Force. https://www.ispor.org/taskforces/clinicaloutcomesassessment-raredisease-clinicaltrials.asp. Accessed December 3, 2016.