Abstract
Rationale
In interstitial lung disease (ILD), symptoms drive impairments in health-related quality of life. Patient-reported outcome measures (PROMs) can assess whether interventions change symptom severity. The meaningfulness of change in a PROM score is estimated by anchoring it to a related variable for which meaningful change has been previously established. Patient global impressions of severity (PGISs) are single-item PROMs that may make trustworthy anchors, but, for ILD, the meaningfulness of change in PGIS items for shortness of breath (SOB), cough, and fatigue/low energy is unknown.
Objectives
To improve understanding of how patients with ILD rate and categorize symptoms, how differing levels of symptom severity affect lived experiences, and how patients derive and apply meaningfulness to change in symptoms.
Methods
We used one-on-one interviews and an electronic survey to collect data from patients with various forms of ILD. Interviews were conducted to provide richness and context to survey responses. We conducted certain analyses with respondents stratified by supplemental oxygen use.
Results
Interviewees (N = 18) confirmed SOB, cough, and fatigue/low energy as the most bothersome symptoms of ILD. Among 298 survey respondents, on a PGIS for SOB with a 0–4 numeric rating scale, those who used supplemental O2 had, on average, more severe SOB than nonusers, and the largest share of respondents considered a 2-point change meaningful for worsening (45.5%) or improvement (47.2%). On a PGIS with a five-option ordinal response scale, for SOB, the largest share considered a 1-category change meaningful for worsening (49.8%) and a 2-category change meaningful for improvement (42.3%); for cough frequency, the largest share considered a 1-category change meaningful for worsening (48.2%) or improvement (45.0%). Survey responses for SOB at the present time versus 3 months earlier (patient global impressions of change) were biased toward the present state.
Conclusions
PGISs can be used as anchors for meaningful change analyses of PROMs that assess SOB or cough in patients with ILD. Patient global impressions of change demonstrate present-state bias and should not be used. Patients’ descriptions paint a vivid picture of lived experience with varying levels of symptom severity and can help contextualize change scores.
Keywords: ILD, symptoms, patient global impressions, meaningful change
Three symptoms (shortness of breath [SOB], cough, and fatigue/low energy) impair health-related quality of life in patients with interstitial lung disease (ILD) (1). Patient-reported outcome measures (PROMs) are administered in drug trials or other studies to assess how patients/subjects feel and function in their daily lives.
Validation is a process through which we gain understanding of the meaning (and meaningfulness to patients) of a PROM’s score: what a respondent with a given score “looks like” and how their daily life is altered when scores change by varying magnitudes. A PROM’s scores are valid if the inferences made about the respondent, based on those scores, are informative and accurate; thus, validation is about the score, not the measure.
Meaningfulness of scores and their change are typically estimated as part of the PROM validation analysis and describe the change in a PROM’s score that corresponds to a meaningful alteration in how a patient feels or functions. Over time, there has been confusing terminology around this concept, whose labels have included “minimal clinically important difference” and “meaningful within-patient change,” but, currently, the preferred term is “meaningful score difference” (MSD) (2). It is frequently derived as a threshold, so a PROM change score that exceeds the MSD is a meaningful change to the patient/respondent.
There is no consensus on how MSD thresholds for a PROM’s score should be estimated (3–6). It most often involves tying PROM change scores to anchor variables, which are other metrics that measure (or are hypothesized to measure) the same construct as the PROM. In analyses aimed at estimating a PROM score’s MSD, only anchors with established MSDs should be used (7).
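The anchoring idea described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the paired records are hypothetical, the function name `anchor_based_msd` is ours, and the mean-change method it uses is just one of several approaches in the literature, none of which is the agreed standard.

```python
from statistics import mean


def anchor_based_msd(records, anchor_msd):
    """Mean PROM change among respondents whose anchor changed by exactly anchor_msd.

    records: iterable of (anchor_change, prom_change) pairs.
    Returns None when nobody falls in the anchor-defined group.
    """
    group = [prom for anchor, prom in records if anchor == anchor_msd]
    return mean(group) if group else None


# Hypothetical data: (change in anchor rating, change in PROM score).
records = [(0, 1.0), (1, 4.0), (1, 6.0), (2, 9.0), (2, 11.0), (-1, -5.0), (0, -1.0)]

# PROM change that corresponds to a 1-unit change in the anchor.
msd_estimate = anchor_based_msd(records, anchor_msd=1)
print(msd_estimate)
```

The estimate is only as trustworthy as the anchor threshold passed in as `anchor_msd`, which is why an anchor with an established MSD is needed.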
Patient global impressions of severity (PGISs) or change (PGICs) are single-item PROMs aimed at capturing symptom severity or change, and they are recommended for consideration for use as anchors in PROM validation analysis (2). However, in ILD, MSDs for PGI symptom severity items have not been established. Without this critical information, there can be little confidence in the accuracy of MSDs derived for PROMs using PGI items as anchors. In this mixed-methods study, we used interviews and surveys to achieve three objectives: 1) to examine how patients with ILD describe, rate, and categorize symptoms; 2) to assess how patients discern and conceptualize changes in symptoms; and 3) to establish MSDs for SOB and cough PGIS items.
Methods
Survey
The English-language survey was developed in and distributed via REDCap (see data supplement). The link to the survey was presented to patients by emailing it directly or by posting it in advocacy group newsletters or on websites (see data supplement). Because of the length of the survey, fatigue/low energy was only briefly touched on, and there were no items focused on meaningful change of this symptom.
Interviews
Semistructured interviews were conducted to enhance richness and contextualize survey responses. The interview cohort comprised a convenience sample of English-speaking patients with various forms of ILD recruited from the Center for ILD at National Jewish Health (Denver, CO). Interviewees did not complete the survey. See data supplement for more information on the interview, including the guide.
Analysis and Ethics Approval
Microsoft Excel was used to organize interview data. Immersion in and familiarization with the data were achieved by reading and rereading full transcripts and by organizing the data. Data were summarized using the interview question as the organizational unit, and the analytic approach was explicit rather than latent, with the “codes” determined a priori to be severity ratings rather than actively developed in the analytic process. We intended all patient quotes/descriptors to be visible and to stand on their own rather than being integrated into higher-order themes. SAS software (version 9.4) was used for all other analyses. Demographic and clinical data were summarized and tabulated. A χ2 test (or Fisher’s exact test as appropriate) was used to analyze categorical data. With logistic regression, we examined associations between predictors and survey responses while controlling for potentially influential variables. We conducted some analyses with interviewees and survey respondents stratified by supplemental oxygen (O2) use: continuous or with exertion (users) versus no daytime use (nonusers). Interviewees were compensated with gift cards; survey respondents were not. The protocol (HS-4137) was reviewed, approved, and deemed to qualify for exemption from federal regulations by the National Jewish Health Institutional Review Board.
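For readers unfamiliar with Fisher’s exact test, the following is a minimal pure-Python sketch for a 2 × 2 table. It is not the SAS procedure used in the study (which also handles larger tables); the function name `fisher_exact_2x2` and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: rows are two groups, columns are two response categories.
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(round(p_value, 4))
```

The exact test is generally preferred over the χ2 test when expected cell counts are small, which is the usual meaning of "as appropriate" in this context.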
Results
Table 1 summarizes characteristics of interviewees (N = 18) and survey respondents (N = 298). Because of how the survey was administered, we could not calculate the response rate. There were 592 clicks to open the survey.
Table 1.
Clinical characteristics of interviewees and survey respondents
| | Interviewees (n = 18) | Survey Respondents (n = 298) |
|---|---|---|
| Age, yr | 67.8 ± 9.4 | 70.5 ± 9.4 | 
| Female sex* | 10 (56%) | 133 (45%) | 
| Country* | ||
| United States | 18 (100%) | 215 (72%) | 
| Australia | — | 21 (7%) | 
| Canada | — | 3 (1%) | 
| Ireland | — | 1 (<1%) | 
| United Kingdom | — | 53 (18%) | 
| Other | — | 4 (2%) |
| Diagnosis† | ||
| IPF | 3 (17%) | 202 (68%) | 
| HP | 6 (33%) | 27 (9%) | 
| Other | 3 (17%) | 40 (13%) | 
| CTD-ILD | 6 (33%) | 28 (9%) | 
| RA-ILD | 2 | — | 
| Anti-synthetase syndrome | 3 | — | 
| SSc-ILD | 1 | — | 
| Daytime supplemental O2 use | ||
| No | 10 (56%) | 166 (56%) |
| Yes | 8 (44%) | 132 (44%) | 
Definition of abbreviations: CTD-ILD = connective tissue disease–related interstitial lung disease; HP = hypersensitivity pneumonitis; IPF = idiopathic pulmonary fibrosis; RA-ILD = rheumatoid arthritis interstitial lung disease; SSc-ILD = systemic sclerosis–related interstitial lung disease.
*One missing for survey group.
†The survey did not ask patients with CTD-ILD whether their ILD was fibrotic, but it did for HP.
Most Bothersome Symptoms of ILD According to Interviewees
To set the stage, we started by asking interviewees to identify the most bothersome symptom of their lung disease. Fifteen of 18 volunteered SOB, which they described as “shortness of breath” (69-year-old man with rheumatoid arthritis ILD, no O2 use), “hard to breathe” (60-year-old woman with anti-synthetase syndrome, continuous O2 use), “[low] lung intake” (64-year-old man with familial pulmonary fibrosis, no O2 use), “drains my lungs” (66-year-old man with idiopathic pulmonary fibrosis [IPF], continuous O2), “gotta stop and catch my breath” (67-year-old woman with hypersensitivity pneumonitis [HP], sleep O2 use), “not being able to breathe” (73-year-old woman with idiopathic nonspecific interstitial pneumonia, continuous O2), “a little labored on my breathing” (59-year-old man with HP, no O2 use), “inability to get as much breath as I need” (57-year-old woman with HP, no O2 use), and “lack of oxygen” (71-year-old woman with HP, continuous O2). Four interpreted “symptoms” broadly and mentioned O2. Two mentioned cough, and three mentioned “major fatigue,” “lack of energy,” or “tire out easily.”
When asked directly, 5 of 17 interviewees denied cough altogether, and 7 (41%) rated their cough severity a 0 during the previous week on a 0–4 numeric rating scale (NRS). Those who mentioned cough said they cough “if I try to take a deep breath”; “more with activity”; “[in] fits”; “not very much… and when I do, it’s kind of dry”; and “[in] episodes where I felt like, you know, it’s gonna snap my back.”
When asked directly, 16 of 18 interviewees reported fatigue and/or low energy, describing it as “…[feeling like] I could do this, but you can’t. You’re tired, you can’t”; “my energy level is definitely dropped”; “low energy means I don’t wanna go to the gym… but I’ll read a book or watch a movie”; “at the end of day, you’re tired. You just want to rest, you know? That has nothing to do with being short of breath”; “just the blah feeling. You just don’t feel any energy”; and “I think that low energy probably more… captures how I feel than fatigue.”
SOB Severity
Survey
On a 0–4 NRS, most survey respondents (52.0%) categorized response options 1 and 2 as mild, 3 as moderate, and 4 as severe (Figure 1). There was no difference in this categorization scheme between O2 users and nonusers (P = 0.88). O2 users had more severe SOB than nonusers (Fisher’s exact test P < 0.0001; Figure 2). Most respondents (82%) believed their positions for transitions between mild and moderate SOB and between moderate and severe SOB on the 0–4 NRS for the SOB PGIS were the same for a 0–4 NRS PGIS for cough severity.
Figure 1.
Mild, moderate, and severe categorization of (A) 0–4 and (B) 0–5 numeric rating scales for patient global impression of severity for shortness of breath. Shown are the numbers and percentages of survey respondents who categorized the numeric rating scale as shown. Color changes represent thresholds between categories (e.g., in A, the “N = 129, 52.0%” group considered scores of 1 and 2 as mild [green], 3 as moderate [blue], and 4 as severe [red]).
Figure 2.
Patient global impression of severity for shortness of breath for (A) supplemental O2 users (continuous or exertion) and (B) nonusers (none or sleep only). Fisher’s exact test P < 0.0001 for comparison between O2 users and nonusers. CTD = connective tissue disease; HP = hypersensitivity pneumonitis; IPF = idiopathic pulmonary fibrosis; Oth = other.
We asked survey respondents to write their own item to give to patients with ILD to assess SOB severity during the previous week. Responses covered a range of concepts (Table E1 in the data supplement). Of the 281 responses, 94 dealt with assessing SOB during various activities (climbing stairs was mentioned most frequently, n = 12).
Interviews
To add granularity, Table 2 shows interviewees’ descriptions of ratings of 1, 2, 3, and 4 on the 0–4 NRS PGIS for SOB. In general, interviewees considered a severity of 1 mild, something “I can feel,” occurring with various activities but easily controlled and not terribly limiting. With increasing severity, in general, SOB was present with lower-demand activities, forced patients to stop and catch their breath more frequently, and lengthened recovery time.
Table 2.
Quotes from interviewees about dyspnea severity ratings on a 0–4 scale
| When shortness of breath is rated a ____… | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 24/7 O2 use | | | | |
| No daytime O2 use | | | | |
Change in SOB Severity
Survey
For the PGIC item for SOB (“How would you rate the severity of your shortness of breath now compared with 3 months ago?”, with seven response options: much/moderately/minimally worse, no different, minimally/moderately/much better), worsening was reported by a greater number of O2 users than nonusers (P = 0.001; Figure E1). Table E2 shows the biased relationship between current SOB severity and PGIC ratings: respondents with more severe SOB currently were more likely to respond that SOB had worsened during the previous 3 months.
Interviews
Interviewees were presented with the same PGIC item for SOB severity as the survey and asked to read it out loud and “think aloud” as they decided on their responses. Just over half of interviewees recalled thinking about the exact month it was 3 months before the interview (December; “Well, I’m going back to the holidays…”). Despite the wording of the question, the others did not call to mind a specific time frame; one used “just a general feel.” Although some interviewees reflected on a specific activity they performed routinely and used it as a barometer of change over time (“I’m thinking of examples like… how far can I walk the dog”), most, including those who recalled the specific month, did not. A few interviewees recalled their pulmonary function tests (“3 months ago… the [pulmonary function tests] or whatever… they were, in that period, more difficult”), some reflected on the amount of oxygen they were using (“I go by how much air I’m using”), and others drew on data from their exercise equipment (“So… over that time period, I’ve increased my pedaling rate, and I have noticed that my heart rate seems to be to be better”).
Meaningfulness of Change in SOB Severity
Survey
The survey presented a hypothetical scenario: imagine, at baseline and again 3 months later, you were asked to use the 0–4 scale to respond to the question, “over the last week, how severe has your shortness of breath been?” Respondents were then asked by how many points the two ratings (baseline and 3 months) would have to differ for them to consider the change a meaningful improvement or worsening. On the 0–4 NRS, nearly half of respondents considered 2 points meaningful for improvement (n = 135; 47.2%) and worsening (n = 130; 45.5%) (Table 3). Among all respondents, the weighted κ-value for agreement between responses for what constitutes meaningful improvement and worsening was 0.62 (95% confidence limits, 0.54–0.70). Agreement was similar for subgroups stratified on O2 use (users, 0.60; nonusers, 0.63).
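A weighted κ of this kind is computed from the cross-tabulation of each respondent’s threshold for worsening against their threshold for improvement. The sketch below assumes linear agreement weights, because the weighting scheme used in the study is not stated here; the cross-tabulation is hypothetical, since only the marginal counts are reported.

```python
def weighted_kappa(table):
    """Linearly weighted kappa for a square cross-tabulation (list of rows of counts)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement weight: 1 on the diagonal, falling linearly with category distance.
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    p_obs = sum(w[i][j] * table[i][j] for i in range(k) for j in range(k)) / n
    p_exp = sum(w[i][j] * row_tot[i] * col_tot[j] for i in range(k) for j in range(k)) / n**2
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical counts: rows = points needed for meaningful worsening (1 to 4),
# columns = points needed for meaningful improvement (1 to 4).
crosstab = [
    [30, 8, 2, 0],
    [6, 40, 9, 1],
    [1, 7, 12, 3],
    [0, 1, 2, 4],
]
print(round(weighted_kappa(crosstab), 2))
```

Weighting gives partial credit to near misses, so a respondent who chose 1 point for worsening and 2 points for improvement counts as closer to agreement than one who chose 1 and 4.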
Table 3.
Meaningfulness of 3-month change in shortness of breath patient global impression of severity with response options of 0–4 or “not at all” to “very severe”
| Over the Last Week, How Severe Has Your Shortness of Breath Been? | | | | |
|---|---|---|---|---|
| Response options: 0, 1, 2, 3, 4 | | | | |
| Meaningful worsening | 1 point | 2 points | 3 points | 4 points |
| All (N = 286) | 104 (36.4%) | 130 (45.5%) | 39 (13.6%) | 13 (4.6%) |
| O2 users (n = 125) | 46 (36.8%) | 54 (43.2%) | 18 (14.4%) | 7 (5.6%) |
| No O2 (n = 161) | 58 (36.0%) | 76 (47.2%) | 21 (13.0%) | 6 (3.7%) |
| Meaningful improvement | 1 point | 2 points | 3 points | 4 points |
| All (N = 286) | 97 (33.9%) | 135 (47.2%) | 42 (14.7%) | 12 (4.2%) |
| O2 users (n = 125) | 35 (28.0%) | 55 (44.0%) | 28 (22.4%) | 7 (5.6%) |
| No O2 (n = 161) | 62 (38.5%) | 80 (49.7%) | 14 (8.7%) | 5 (3.1%) |
| Response options: Not at all, Mild, Moderate, Severe, Very severe | | | | |
| Meaningful worsening | 1 category | 2 categories | 3 categories | 4 categories |
| All (N = 279) | 139 (49.8%) | 100 (35.8%) | 26 (9.3%) | 14 (5.0%) |
| O2 users (n = 122) | 61 (50.0%) | 38 (31.2%) | 14 (11.5%) | 9 (7.4%) |
| No O2 (n = 157) | 78 (49.7%) | 62 (39.5%) | 12 (7.6%) | 5 (3.2%) |
| Meaningful improvement | 1 category | 2 categories | 3 categories | 4 categories |
| All (N = 279) | 116 (41.6%) | 118 (42.3%) | 31 (11.1%) | 14 (5.0%) |
| O2 users (n = 122) | 46 (37.7%) | 50 (41.0%) | 17 (13.9%) | 9 (7.4%) |
| No O2 (n = 157) | 70 (44.6%) | 68 (43.3%) | 14 (8.9%) | 5 (3.2%) |
“O2 users” indicates respondents who used supplemental oxygen during the day; “no O2” indicates no O2 use or use during sleep only. For the 0–4 numeric rating scale, the survey stated that “0” means “not at all” and higher numbers correspond to greater severity. For the 0–4 numeric rating scale, among all respondents, the weighted κ for agreement between responses for what constitutes meaningful improvement and worsening was 0.62 (95% confidence limits, 0.54–0.70), and agreement was similar for subgroups stratified on O2 use (users, 0.60; nonusers, 0.63). For the “not at all” to “very severe” response scale, among all respondents, the weighted κ for agreement between responses for meaningful improvement and worsening shortness of breath severity (e.g., 1 category for improvement and worsening, 2 categories for improvement and worsening) was 0.81 (95% confidence limits, 0.74–0.88).
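The percentages in Table 3 can be recomputed from the reported counts. The short sketch below does so for the "All" rows and picks out the most frequently chosen change size for each scale and direction; the dictionary layout and the helper name `summarize` are ours, and the counts are copied from the table.

```python
# Counts copied from Table 3 (all respondents); keys are the size of change chosen.
# NRS = 0-4 numeric rating scale; ORS = "not at all" to "very severe" response scale.
table3 = {
    "NRS worsening": {1: 104, 2: 130, 3: 39, 4: 13},
    "NRS improvement": {1: 97, 2: 135, 3: 42, 4: 12},
    "ORS worsening": {1: 139, 2: 100, 3: 26, 4: 14},
    "ORS improvement": {1: 116, 2: 118, 3: 31, 4: 14},
}


def summarize(counts):
    """Return (modal change size, percentage choosing it) for one row of counts."""
    total = sum(counts.values())
    modal = max(counts, key=counts.get)
    return modal, round(100 * counts[modal] / total, 1)


summary = {name: summarize(counts) for name, counts in table3.items()}
for name, (modal, pct) in summary.items():
    print(f"{name}: modal change = {modal}, {pct}% of respondents")
```

Note that for improvement on the ordinal scale the modal choice (2 categories, n = 118) leads the 1-category choice (n = 116) by only two respondents, so neither threshold clearly dominates.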
A majority of survey respondents (70–77%) affirmed that it did not matter where the baseline response was (e.g., 0, 1, 2, 3, 4): over 3 months, all 1-point improvements or worsenings in SOB were equally meaningful (e.g., a 3-month worsening from 0 to 1 carries the same meaning as a worsening from 3 to 4). The same was true for 2- and 3-point changes on the 0–4 NRS for SOB.
When the survey posed the same hypothetical scenario with the response options “not at all,” “mild,” “moderate,” “severe,” and “very severe” instead of 0–4, the largest share of respondents considered a 1-category change meaningful for worsening (49.8%) but a 2-category change meaningful for improvement (42.3%, versus 41.6% for a 1-category change) (Table 3). Among all respondents, the weighted κ-value for agreement between responses for meaningful improvement and worsening SOB severity (e.g., 1 category for improvement and worsening, 2 categories for improvement and worsening) was 0.81 (95% confidence limits, 0.74–0.88). The greatest number of respondents in any off-diagonal cell was 28 (10%): they considered a 1-category worsening meaningful but required a 2-category improvement for it to be meaningful. Three quarters of respondents considered all 1-category worsenings equivalent; the same was true for all 2- and 3-category worsenings. However, as with the 0–4 NRS, a number of survey respondents stated that not all 1-category worsenings are the same, as exemplified by one survey respondent on a write-in item: “any jump to [very severe] is worse than any other increase.” In a multivariable logistic regression model, there was no association between sex, O2 use, diagnosis (IPF yes/no), or age and a response of nonequivalence among all 1- or 2-point or 1- or 2-category worsenings (Table E3).
Interviews
In contrast to the previous findings, five of the six interviewees we asked (and as many as 30% of survey respondents) considered a 1-point worsening on the 0–4 NRS PGIS for SOB meaningful but said the weight of meaningfulness depended on the location on the scale. Generally, any worsening (1- or 2-point) over 3 months that landed them closer to a score of 4 was more meaningful (e.g., change from 2 to 3 was more meaningful than from 1 to 2). From the interviews: “The higher the numbers go… I’m getting closer to not being able to breathe at all”; “I would say the 2 to 4 would be a bigger jump than the 1 to 3 to me”; “The higher the number, the more severe. There’s, to me, there’s a big difference between 2 and 3 and 4.”
Cough Severity
Survey
In contrast to the 0–4 NRS for SOB, cough severity was categorized using a 0–5 NRS. A total of 33.3% of survey respondents categorized 1 and 2 as mild, 3 as moderate, and 4 and 5 as severe. A nearly equal proportion (32.2%) categorized 1 and 2 as mild, 3 and 4 as moderate, and 5 as severe (Figure 1). Nearly 70% of O2 users and 81% of nonusers rated the severity of cough absent or mild, and nearly 40% in each group rated the frequency of their cough during the previous week as occurring “sometimes” (Figure E2). The majority (82%) of respondents believed the positions of their mild-to-moderate and moderate-to-severe transitions on the 0–5 NRS for cough severity would be the same for a 0–5 NRS PGIS for fatigue/low energy severity.
Interviews
Table E4 shows interviewees’ descriptions of ratings of 1, 2, 3, and 4 on the 0–4 NRS PGIS for cough severity. (Note: this is different from the 0–5 NRS on the survey.) In general, interviewees considered cough severity of 1 mild, occurring occasionally, and noninterfering. Several interviewees suggested the “depth” of the cough (“coming up from deep within my lungs”) as, at least partially, driving the severity. Severe cough was one that occurred frequently, was interfering with activities, and was often continuous to the point that patients could not “talk, breathe, eat” or “grab a breath.” Table E5 shows the relationship between perceived cough frequency and perceived cough severity during the previous week.
Meaningfulness of Change in Cough Frequency
Survey
For the hypothetical 3-month change scenario for cough frequency (at baseline and 3 months later, “over the last week, how frequently have you coughed?”, with response options of “none,” “rarely,” “sometimes,” “often,” or “very frequently”), the largest share of respondents (Table 4) considered a 1-category change meaningful for improvement (45.0%) and worsening (48.2%); fewer considered a 2-category change meaningful for improvement (38.9%) and worsening (37.9%). The weighted κ-value for agreement between ratings for improvement and worsening was 0.77 (95% confidence limits, 0.67–0.86).
Table 4.
Meaningfulness of 3-month change in cough frequency patient global impression of severity on a five-option ordinal response scale from “not at all” to “very frequently”
| Over the Last Week, How Frequently Have You Coughed? | | | | |
|---|---|---|---|---|
| Response options: Not at all, Rarely, Sometimes, Often, Very frequently | | | | |
| Meaningful worsening | 1 category | 2 categories | 3 categories | 4 categories | 
| All (N = 280) | 135 (48.2%) | 106 (37.9%) | 27 (9.6%) | 12 (4.3%) | 
| O2 users (n = 123) | 61 (49.6%) | 44 (35.8%) | 13 (10.6%) | 5 (4.0%) | 
| No O2 (n = 157) | 74 (47.1%) | 62 (39.5%) | 14 (8.9%) | 7 (4.5%) | 
| Meaningful improvement | 1 category | 2 categories | 3 categories | 4 categories | 
| All (N = 280) | 126 (45.0%) | 109 (38.9%) | 29 (10.4%) | 16 (5.7%) | 
| O2 users (n = 123) | 52 (42.3%) | 44 (35.8%) | 17 (13.8%) | 10 (8.1%) | 
| No O2 (n = 157) | 74 (47.1%) | 65 (41.4%) | 12 (7.6%) | 6 (3.8%) | 
“O2 users” indicates respondents who used supplemental oxygen during the day; “no O2” indicates no O2 use or use during sleep only. The weighted κ for agreement between ratings for improvement and worsening was 0.77 (95% confidence limits, 0.67–0.86).
Fatigue/Low Energy Severity
Survey
Most survey respondents rated the severity of their fatigue/low energy during the previous week mild or moderate (Figure E3).
Interviews
Table E6 shows interviewees’ descriptions of ratings of 1, 2, 3, or 4 on the 0–4 NRS PGIS for fatigue/low energy severity. In general, interviewees considered a severity of 1 as not interfering with things they liked or needed to do. Severe fatigue/low energy meant “[feeling] exhausted”; there was an emotional component at this level, with interviewees “not caring”; not feeling like “accomplishing anything”; or “just [wanting to] sit on the couch.”
Discussion
We interviewed and surveyed patients with various forms of ILD and learned how they describe, rate, and categorize symptoms and how they assess the meaningfulness of change in symptoms over a 3-month period. This information allowed us to generate first-ever MSD determinations for PGIS items for SOB and cough in this target population. Interviews confirmed that patients with all forms of ILD (IPF and non-IPF fibrosing ILD) are bothered by the same three symptoms and that the impact of those symptoms on wellbeing appears to not differ by diagnosis. To our knowledge, our interviews also provide, among other things, the largest set of qualitative data describing various levels of severity of SOB, cough, and fatigue/low energy in patients with ILD.
Through the survey, we learned from patients (the true ILD experts) that, for the largest share of respondents, on an SOB PGIS item with a 0–4 NRS, over 3 months, a 2-point improvement or 2-point worsening is a meaningful change. However, for an SOB PGIS item with a 5-option ordinal response scale (ORS; from “not at all” to “very severe”), the MSDs were 2 categories for improvement and 1 category for worsening. Because significant proportions of respondents answered differently, if these PGIS items are used as anchors for PROM MSD estimation, it is reasonable to conduct analyses using 1- and 2-unit changes in the anchor as meaningful. For a cough frequency PGIS with a 5-option ORS (from “never” to “very frequently”), a 1-category change could be considered meaningful for improvement or worsening. As with SOB, a 2-category change could also be analyzed. Regardless of symptom, for any PGIS anchor, investigators must select one cutoff a priori for primary analyses and use any other cutoff(s) only for exploratory purposes. Because >80% of interviewees we asked, as well as a significant minority of survey respondents, said not all 1-unit worsenings were equivalent, it will be informative to show results for subgroups defined by baseline PGIS rating.
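One way to put these recommendations into practice is sketched below: each respondent’s PGIS change is classified under a prespecified primary cutoff and, separately, under an exploratory cutoff, with counts broken out by baseline rating. The ratings are hypothetical, the names `classify` and `tabulate` are ours, the assignment of 2 as the primary cutoff is assumed purely for illustration, and treating a change equal to the cutoff as meaningful is also an assumption.

```python
from collections import defaultdict


def classify(change, cutoff):
    """Label a PGIS change (follow-up minus baseline; higher = more severe)."""
    if change >= cutoff:
        return "worsened"
    if change <= -cutoff:
        return "improved"
    return "no meaningful change"


# Hypothetical (baseline, follow-up) PGIS ratings on the 0-4 scale.
pairs = [(1, 3), (2, 3), (3, 4), (2, 0), (0, 0), (4, 3), (1, 2)]

PRIMARY_CUTOFF = 2      # assumed here to be the cutoff chosen a priori
EXPLORATORY_CUTOFF = 1  # reported for exploratory purposes only


def tabulate(pairs, cutoff):
    """Count classifications within each baseline PGIS rating."""
    table = defaultdict(lambda: defaultdict(int))
    for baseline, follow_up in pairs:
        table[baseline][classify(follow_up - baseline, cutoff)] += 1
    return {baseline: dict(counts) for baseline, counts in table.items()}


print("primary:", tabulate(pairs, PRIMARY_CUTOFF))
print("exploratory:", tabulate(pairs, EXPLORATORY_CUTOFF))
```

Keeping the two cutoffs as named constants fixed before any data are examined mirrors the a priori selection described above, and the per-baseline breakdown makes it possible to see whether the same size of change behaves differently at different starting severities.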
One drawback to using single-item PROMs (like PGI items or visual analogue scales) as endpoints themselves is loss of granularity. For example, what drives cough severity ratings? Survey data suggest a relationship between cough frequency and severity, but it is clear from the interview data that even infrequent episodes of cough that induce SOB likely contribute. Because we cannot apply numerals to ORS options (the “distance” between ORS response options is likely different from the distance between consecutive numbers), we could not conduct statistical analyses. However, the data in Table E5 show a convincing but not perfect relationship between cough frequency and severity.
In theory, PGIC responses should be equally (but oppositely) correlated with past and present status, but this is rarely the case. Most often, PGIC responses exhibit present-state bias (8, 9): they have an inappropriately strong correlation with the present state and a far lower-than-expected correlation with past state. Our results confirmed present-state bias in SOB PGIC and—particularly when combined with the somewhat haphazard way many interviewees appeared to develop their responses in the think-aloud exercise—would argue strongly against the use of PGICs as anchors in MSD analyses in ILD.
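The expected correlation pattern, and how present-state bias distorts it, can be illustrated with a small numerical sketch. Everything below is hypothetical: the ratings are invented, and the "biased" transition rating is simulated by down-weighting the past state, which is only one simple way to mimic the phenomenon.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical 0-4 severity ratings at baseline and 3 months later.
past = [0, 1, 2, 3, 4, 1, 2, 3]
present = [1, 0, 3, 2, 3, 2, 1, 4]

# Ideal transition rating: simply the difference between the two states.
ideal_pgic = [now - then for then, now in zip(past, present)]
# Biased rating: the present state dominates and the past is largely discounted.
biased_pgic = [now - 0.5 * then for then, now in zip(past, present)]

for label, pgic in (("ideal", ideal_pgic), ("biased", biased_pgic)):
    r_present = pearson(pgic, present)
    r_past = pearson(pgic, past)
    print(f"{label}: r with present = {r_present:.2f}, r with past = {r_past:.2f}")
```

In this toy data set the past and present ratings have equal variance, so the ideal rating correlates equally and oppositely with the two states, whereas the biased rating tracks the present state far more closely than the past.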
The interview and response data generated here will support the move to greater patient centeredness in trial endpoints (10). What will propel such advancement is increased confidence that candidate PROM endpoints have scores that are reliable, valid, and responsive to change. Even more important is improved interpretability of how changes in a PROM’s scores relate to patients’ experiences (2). Several methods can generate such interpretability, but the one used most involves relating a PROM’s scores to other trustworthy outcome assessment tools (i.e., anchors) whose MSDs have already been established. For ILD, PGIS items for SOB, cough, and fatigue/low energy are attractive anchors because they are brief, easily understood, capture important concepts, and can be administered in various formats at times when other study data are collected. Until now, sponsors and ILD investigators using PGISs in their trials were required to assume what score change is meaningful to patients. Those assumptions no longer need to be made: the experts have spoken.
Our study has limitations. Most respondents were from the United States, and we received relatively few surveys from patients with non-IPF diagnoses. The lack of heterogeneity precluded subgroup analyses by country. The interviews and surveys were not identical, but there was extensive overlap and the responses were generally similar, supporting the internal validity of the results. We cannot assess the fidelity of self-reported demographic and clinical information provided by survey respondents, nor can we confirm that any respondent turned in only one survey. We did not ask for ethnicity and/or race, which was a missed opportunity.
Because interviews were conducted over Zoom and the survey was delivered electronically, only patients with access to the internet could participate. Recruiting through patient advocacy groups selected for a particularly engaged patient group, likely with high health literacy and perhaps a keen understanding of complex terminology associated with ILD; a broader advertisement campaign could have mitigated this bias. The survey was in English, so only English speakers/readers could respond. Many patients are used to filling out questionnaires about symptoms but not questions about response options or items that require intense reflection, as many of our items did; this could have influenced the response rate and created an influential response burden. All of these raise concern about whether the results would apply to the universe of patients with ILD. Nonetheless, more than 300 participants, including 18 interviewees, is a sizable sample.
An inductive approach to the qualitative analysis, with organic and open development of codes and themes (using multiple coders), could generate a rich conceptual framework for ILD symptom severity, but that was not the intent of our analysis. We did not administer other PROMs to use as anchors for respondents’ meaningfulness thresholds; however, we believe the patients’ responses are the gold standard for assessing meaningfulness. The analysis for present-state bias of the PGIC could have been enhanced by collecting PGISs at baseline and then resurveying to collect PGIS and PGIC 3 months later.
Our results should not be viewed as valid for other PGI wordings, time frames, or patients outside the target population. For example, although the modified Medical Research Council dyspnea scale could be considered a 0–4 PGIS for dyspnea and is likely used by many ILD practitioners in the clinic, additional research would be needed to determine whether it is a suitable dyspnea anchor and what its MSD is. Recognizing these limitations, we believe our results advance the field to a new level of understanding of patients’ perceptions of symptom severity and meaningfulness of change. Investigators now have disease-specific guidance to incorporate in the planning of any studies in which PROMs and PGIS items are included. Future research could target diagnostic subgroups (to determine whether there are differences between them) and patients from other countries and cultures to assess whether our results hold there.
Acknowledgments
The authors thank the Pulmonary Fibrosis Foundation, PF Warriors, Tam Corte in Australia, Chris Ryerson in Canada, and Phil Molyneaux in England for their assistance in distributing the survey and all the patients who completed it.
Footnotes
Author Contributions: Conceptualization: J.J. Swigris and K.I.A. Analysis: J.J. Swigris and J.B.P. Interpretation of data: J.J. Swigris, J.B.P., K.I.A., T.A.G., and J.J. Solomon. Initial draft of the manuscript: J.J. Swigris. Critical input, review, editing, and approval of the final draft: J.J. Swigris, J.B.P., K.I.A., T.A.G., and J.J. Solomon.
This article has a data supplement, which is accessible at the Supplements tab.
Author disclosures are available with the text of this article at www.atsjournals.org.
References
- 1. Wijsenbeek M, Molina-Molina M, Chassany O, Fox J, Galvin L, Geissler K, et al. Developing a conceptual model of symptoms and impacts in progressive fibrosing interstitial lung disease to evaluate patient-reported outcome measures. ERJ Open Res. 2022;8:00681-2021. doi: 10.1183/23120541.00681-2021.
- 2. U.S. Department of Health and Human Services and the Food and Drug Administration (CDER). Guidance for industry, Food and Drug Administration staff, and other stakeholders: patient-focused drug development: incorporating clinical outcome assessments into endpoints for regulatory decision-making. Silver Spring, MD: U.S. Food and Drug Administration; 2023.
- 3. Peipert JD, Hays RD, Cella D. Likely change indexes improve estimates of individual change on patient-reported outcomes. Qual Life Res. 2023;32:1341–1352. doi: 10.1007/s11136-022-03200-4.
- 4. Terluin B, Eekhout I, Terwee C, De Vet H. Minimal important change (MIC) based on a predictive modeling approach was more precise than MIC based on ROC analysis. J Clin Epidemiol. 2015;68:1388–1396. doi: 10.1016/j.jclinepi.2015.03.015.
- 5. Terluin B, Eekhout I, Terwee CB. The anchor-based minimal important change, based on receiver operating characteristic analysis or predictive modeling, may need to be adjusted for the proportion of improved patients. J Clin Epidemiol. 2017;83:90–100. doi: 10.1016/j.jclinepi.2016.12.015.
- 6. Wyrwich KW, Norman GR. The challenges inherent with anchor-based approaches to the interpretation of important change in clinical outcome assessments. Qual Life Res. 2023;32:1239–1246. doi: 10.1007/s11136-022-03297-7.
- 7. Devji T, Carrasco-Labra A, Qasim A, Phillips M, Johnston BC, Devasenapathy N, et al. Evaluating the credibility of anchor based estimates of minimal important differences for patient reported outcomes: instrument development and reliability study. BMJ. 2020;369:m1714. doi: 10.1136/bmj.m1714.
- 8. Guyatt GH, Norman GR, Juniper EF, Griffith LE. A critical look at transition ratings. J Clin Epidemiol. 2002;55:900–908. doi: 10.1016/s0895-4356(02)00435-3.
- 9. Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol. 1997;50:869–879. doi: 10.1016/s0895-4356(97)00097-8.
- 10. Aronson KI, Danoff SK, Russell AM, Ryerson CJ, Suzuki A, Wijsenbeek MS, et al. Patient-centered outcomes research in interstitial lung disease: an official American Thoracic Society Research Statement. Am J Respir Crit Care Med. 2021;204:e3–e23. doi: 10.1164/rccm.202105-1193ST.
 


